0:00

Welcome again to the course on Audio Signal Processing for Music Applications. Last week, we talked about the short-time Fourier transform, which offered a sound representation from which we can synthesize sounds without losing any information. At the same time, it's a good tool for understanding, describing, and transforming sound. This week, we go a step further in the direction of obtaining a higher-level representation: in exchange for losing a bit of the identity property of the STFT, we gain quite a lot of flexibility and a higher level of abstraction in the representation. This is what we call the sinusoidal model. We will cover this topic in three theory lectures, and this is the first one. We will first present the sinusoidal model, then talk about how these sinusoids can be expressed in the spectrum

1:03

So the model is quite simple: it's just a sum of time-varying sinusoids. We have seen this equation before, but here we are emphasizing two aspects. One is the idea of summing a finite number of sinusoids, R. So we have R sinusoids. The other is that each of these sinusoids is time-varying: it has an instantaneous amplitude and an instantaneous frequency that change in time.
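
As a minimal sketch of this model, the following Python code sums R time-varying sinusoids. The function name `synthesize_sinusoids`, the sampling rate, and the example tracks are assumptions made for illustration; the phase is accumulated sample by sample so that a changing frequency stays continuous, which is one common way to realize the model and not necessarily the one used in the course.

```python
import math

def synthesize_sinusoids(amps, freqs, fs):
    """Sum R time-varying sinusoids.

    amps[r][n] and freqs[r][n] hold the instantaneous amplitude and
    frequency (in Hz) of sinusoid r at sample n; fs is the sampling rate.
    """
    num_samples = len(amps[0])
    y = [0.0] * num_samples
    for a_r, f_r in zip(amps, freqs):
        phase = 0.0
        for n in range(num_samples):
            y[n] += a_r[n] * math.cos(phase)
            # Accumulate the phase so a changing frequency stays continuous.
            phase += 2.0 * math.pi * f_r[n] / fs
    return y

fs = 8000            # assumed sampling rate in Hz
num_samples = 800
# R = 2: a steady 440 Hz tone and a glide from 600 Hz towards 700 Hz.
amps = [[0.5] * num_samples, [0.3] * num_samples]
freqs = [[440.0] * num_samples,
         [600.0 + 100.0 * n / num_samples for n in range(num_samples)]]
y = synthesize_sinusoids(amps, freqs, fs)
```

Each row of `amps` and `freqs` is one sinusoidal track, so adding a sinusoid means adding one row to each list.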

1:42

So let's see what that looks like. Again, we have seen this equation before. If we start from a signal x that is a real sine wave, and we take the DFT of the windowed version of this sine wave, we see that the sine wave can be expressed as the sum of two complex sine waves, each multiplied by the window we use. Being the sum of two complex exponentials, we can split that into two summations, that is, two separate DFTs: the DFT of the negative frequency and the DFT of the positive frequency, each one, again, multiplied by the window. In each summation the complex exponentials can be grouped together, and the result is basically a shifted version of the transform of the window. So at the end we see that the first summation is the transform of the window, W, with the frequency index shifted, and scaled by half of the amplitude of the cosine. The other element is the same window transform, but shifted to the positive frequency, and also scaled by the same amplitude.
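
The slide equation is not part of this transcript, so the following is a reconstruction of the steps just described. The symbols are assumptions: a real sinusoid of amplitude A_0 at frequency index k_0 with zero initial phase, a window w[n] whose transform is W[k], and a DFT of size N.

```latex
\begin{aligned}
x[n] &= A_0 \cos(2\pi k_0 n / N)
      = \frac{A_0}{2} e^{-j 2\pi k_0 n / N} + \frac{A_0}{2} e^{j 2\pi k_0 n / N} \\
X[k] &= \sum_{n} w[n]\, x[n]\, e^{-j 2\pi k n / N} \\
     &= \frac{A_0}{2} \sum_{n} w[n]\, e^{-j 2\pi (k + k_0) n / N}
      + \frac{A_0}{2} \sum_{n} w[n]\, e^{-j 2\pi (k - k_0) n / N} \\
     &= \frac{A_0}{2} W[k + k_0] + \frac{A_0}{2} W[k - k_0]
\end{aligned}
```

The first term is the window transform centered at the negative frequency and the second is the same transform centered at the positive frequency, each scaled by half of the amplitude.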

So if we start from a sine wave and plot one single spectrum of this windowed sinusoid, we see it like this. This is the positive spectrum, so we don't see the two window transforms, only the positive one: we are seeing the positive frequencies, and so the contribution of the positive exponential. We see the shape of the window that we use, centered at the frequency of the sinusoid, which is 440 hertz. And of course, we can listen to the sine wave. [SOUND] And this is its spectrum: a peak centered at 440 hertz, and a phase that is flat across the main lobe and corresponds to the phase of the sine wave at location zero.
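
A rough way to check this numerically is sketched below: a 440 Hz sine wave is multiplied by a Hamming window, a zero-padded DFT is computed directly, and the largest magnitude bin is located. The sampling rate, window size, and DFT size are assumptions chosen to keep a pure-Python DFT fast, and the helper names are hypothetical.

```python
import cmath
import math

def hamming(M):
    """Hamming window of M samples."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (M - 1)) for n in range(M)]

def magnitude_spectrum(x, N):
    """Magnitudes of bins 0..N/2 of the DFT of x zero-padded to N samples."""
    mags = []
    for k in range(N // 2 + 1):
        acc = sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(len(x)))
        mags.append(abs(acc))
    return mags

fs = 8000     # assumed sampling rate, kept low so the direct DFT stays fast
f0 = 440.0    # frequency of the sine wave in Hz
M = 255       # window size in samples
N = 512       # DFT size after zero-padding

w = hamming(M)
xw = [w[n] * math.cos(2.0 * math.pi * f0 * n / fs) for n in range(M)]
mX = magnitude_spectrum(xw, N)

# The positive spectrum shows a single main lobe; its maximum is the peak.
peak_bin = max(range(len(mX)), key=lambda k: mX[k])
peak_hz = peak_bin * fs / N
```

With these settings `peak_hz` falls within half a bin (fs / N / 2) of 440 Hz, and the peak height is close to half the amplitude times the sum of the window samples.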

4:20

Let's make it a little bit more difficult. What happens when our sound is made up of two sine waves? These are two sine waves together, one at 440 hertz and the other at 490 hertz. We can also listen to that. [SOUND] Okay, so clearly it sounds like a modulated signal. And in the time domain, we can see these modulations: we see the low frequency, which is the modulation, and the high frequency.
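
The modulation can be made explicit with a standard trigonometric identity. This is a sketch assuming both sine waves have unit amplitude and zero initial phase, which the lecture does not state.

```latex
\cos(2\pi \cdot 440\, t) + \cos(2\pi \cdot 490\, t)
  = 2 \cos\!\left(2\pi \cdot \tfrac{490 - 440}{2}\, t\right)
      \cos\!\left(2\pi \cdot \tfrac{490 + 440}{2}\, t\right)
  = 2 \cos(2\pi \cdot 25\, t)\, \cos(2\pi \cdot 465\, t)
```

The slow 25 Hz factor is the envelope and the fast 465 Hz factor is the carrier. Because the magnitude of the envelope repeats every half cycle, the loudness fluctuates 50 times per second, which is the difference between the two frequencies.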

4:54

And if we compute the spectrum of that, the positive part of it, we see the contributions of the two sinusoids: two peaks in the magnitude spectrum, and in the phase spectrum the phases of these two sinusoids.

5:15

And now let's show an example of a real sound, a sound that includes many sine waves, like the sound of an oboe. Let's listen to the oboe first. [SOUND] Okay, so this is an oboe playing an A4 note, so its fundamental is around 440 hertz, and in the spectrum we clearly see all these sinusoids, which are the harmonics of the sound. Here we're only plotting the first 4000 hertz, so we're only seeing the first few harmonics; this sound has many more harmonics. But this is a good way to zoom into the shapes of the window transforms, and we also see the phase spectrum.

But how do we detect the frequency, amplitude, and phase of each of these sine waves? A simple way to identify a sinusoid in the spectrum is by just focusing on the magnitude spectrum, on the location and the height of each peak. The location is the frequency and the height is the amplitude of the sinusoid. Therefore, we consider a sinusoid as a peak in the magnitude spectrum. Of course, the issue is that the resolution of a magnitude spectrum is discrete, it's finite. The best accuracy we'll be able to get is half of the distance between two frequency samples, between two bins. So that's the maximum frequency resolution that we will get in measuring a sinusoid.
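
Treating a sinusoid as a peak in the magnitude spectrum can be sketched as a search for local maxima above a threshold. The function name `detect_peaks`, the threshold, and the toy spectrum values below are made up for illustration and are not taken from the course code.

```python
def detect_peaks(mag_db, threshold_db):
    """Return the bin indices of local maxima of mag_db above threshold_db."""
    peaks = []
    for k in range(1, len(mag_db) - 1):
        is_local_max = mag_db[k] > mag_db[k - 1] and mag_db[k] > mag_db[k + 1]
        if is_local_max and mag_db[k] > threshold_db:
            peaks.append(k)
    return peaks

# Toy magnitude spectrum in dB (made-up values, one entry per bin).
mag_db = [-80, -60, -20, -6, -18, -55, -70, -40, -12, -30, -75]
fs = 44100    # sampling rate in Hz
N = 1024      # assumed FFT size

peak_bins = detect_peaks(mag_db, threshold_db=-50)
# Bin index to frequency in Hz; accurate only to within half a bin.
peak_freqs = [k * fs / N for k in peak_bins]
peak_mags = [mag_db[k] for k in peak_bins]
```

The location of each peak gives the frequency estimate and its height gives the amplitude estimate, both limited by the bin spacing fs / N.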

7:15

We can do zero-padding to get a bigger FFT, so that we get more samples. And we can also do interpolation directly on the resulting samples to further refine the frequency and amplitude values.
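
The lecture does not say which interpolation is used. One common choice, assumed here, is parabolic interpolation: fit a parabola through the dB magnitudes of the peak bin and its two neighbors and take the vertex as the refined peak. The function name `interpolate_peak` is hypothetical.

```python
def interpolate_peak(mag_db, k):
    """Refine a peak at bin k by fitting a parabola through bins k-1, k, k+1.

    Returns (fractional bin location, interpolated magnitude in dB).
    """
    alpha, beta, gamma = mag_db[k - 1], mag_db[k], mag_db[k + 1]
    # Offset of the parabola vertex from bin k, between -0.5 and 0.5.
    p = 0.5 * (alpha - gamma) / (alpha - 2.0 * beta + gamma)
    return k + p, beta - 0.25 * (alpha - gamma) * p

# Synthetic lobe: an exact parabola whose true maximum is at bin 10.3, -5 dB.
mag_db = [-2.0 * (k - 10.3) ** 2 - 5.0 for k in range(21)]
location, height = interpolate_peak(mag_db, 10)
frequency_hz = location * 44100 / 1024   # assumed fs = 44100 Hz, N = 1024
```

On this synthetic lobe the fit recovers the true maximum; on a real window transform the main lobe is only approximately parabolic in dB, so the refinement is an approximation that improves with zero-padding.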

To detect the spectral peaks, we have to understand the effect of the window, and the most important factor is the window size. For a particular window, one important concept is the bandwidth of the main lobe in the spectral domain. The bandwidth of the main lobe expressed in hertz, B sub f, is defined as B sub s, the main-lobe width of the window expressed in samples, multiplied by the sampling rate and divided by the window size.

8:20

So that will be the width of the main lobe. Then, consider a particular delta, the distance between two frequencies that we want to resolve. We have two frequencies, f sub k plus one and f sub k, and the absolute value of their difference is the delta frequency that we want to resolve.

8:44

So what should the window size, M, be so that the two main lobes of the window are disjoint, and we can see these two frequencies as separate peaks in the spectrum? This equation here shows what has to happen: M has to be greater than or equal to B sub s, the number of samples of the window's main lobe, multiplied by the sampling rate and divided by this delta. We can also write this delta as the absolute value of the difference of these two frequencies.
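
Written out with the symbols used in the lecture (B_f, B_s, f_s, M, and the two frequencies f_k and f_{k+1}), the condition follows from requiring the main-lobe bandwidth in hertz to be no larger than the frequency difference. The slide equation itself is not in this transcript, so this is a reconstruction from the spoken description.

```latex
B_f = B_s \frac{f_s}{M}, \qquad
B_f \le \Delta f = |f_{k+1} - f_k|
\;\Longrightarrow\;
M \ge B_s \frac{f_s}{|f_{k+1} - f_k|}
```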

9:25

In many cases this difference between the two frequencies corresponds to the fundamental frequency, because in a harmonic sound the distance between two consecutive harmonics is equal to the fundamental frequency. So if we take the fundamental frequency as the delta that we want to be able to discriminate, then the bandwidth in hertz of the main lobe of the window has to be smaller than or equal to this fundamental frequency, so that we see these lobes as separate. Therefore M, the window size, has to be greater than or equal to B sub s multiplied by f sub s and divided by f sub 0. Or, if we use the period instead of the fundamental frequency, expressing the cycle length as the period in samples, then M has to be greater than or equal to B sub s multiplied by P, the period of the harmonic sound expressed in samples.
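
For the harmonic case, the same condition can be written in terms of the fundamental frequency f_0 or the period P in samples. The relation P = f_s / f_0 is the usual definition of the period in samples and is stated here as an assumption, since the slide is not reproduced in the transcript.

```latex
M \ge B_s \frac{f_s}{f_0} = B_s P, \qquad P = \frac{f_s}{f_0}
```

In words, the window has to span at least B_s periods of the sound, for example four periods for a window whose main lobe is four samples wide.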

So let's show an example. Let's start from a given window, like the Hamming window, for which B sub s is equal to 4, so the main-lobe width is 4 samples. We have a given sampling rate, and we have two particular frequencies that we want to distinguish: the ones we showed before, 440 hertz and 490 hertz, so the difference is 50 hertz.

11:06

So we can calculate the M that allows us to distinguish these two frequencies. M has to be greater than or equal to B sub s, 4, multiplied by the sampling rate, 44,100, and divided by the absolute value of the frequency difference, which is 50 hertz. That gives an M of 3,528 samples.
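
The same arithmetic as a short Python sketch; the function name `min_window_size` is hypothetical, and the values are the ones given in the lecture.

```python
import math

def min_window_size(main_lobe_bins, fs, delta_hz):
    """Smallest integer M with M >= B_s * f_s / delta_f."""
    return math.ceil(main_lobe_bins * fs / delta_hz)

# Hamming window (B_s = 4), 44100 Hz sampling rate, 440 Hz versus 490 Hz.
M = min_window_size(main_lobe_bins=4, fs=44100, delta_hz=abs(490 - 440))
print(M)  # 3528
```

Rounding up with `math.ceil` matters only when the division is not exact; here 4 * 44100 / 50 is exactly 3528.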

So if we take 3,528 samples of this signal and compute the DFT, with some zero-padding so that we see a smooth spectrum, we get this magnitude spectrum. We see two clearly distinct peaks, each one corresponding to the transform of a Hamming window, and in the phase spectrum we also see the corresponding phases.

12:15

which is 440 hertz, what should M be? If we take an M of 401 samples, these 401 samples basically correspond to four periods of this oboe sound, okay? Then, if we compute the DFT of this signal multiplied by a Hamming window, again with zero-padding to get an N equal to 1,024, we see these harmonics quite clearly separated. Each harmonic corresponds to one main lobe of this Hamming window, and they clearly come one after the other.

But now let's see what happens if we increase the window size: instead of 401 samples, we take twice as many, so we have 801 samples. We do the same thing, we apply the Hamming window and take the FFT, which is larger, and we see this spectrum. Here, because we took more samples, the frequency difference that we can now discriminate is smaller. In fact, we see that the main lobes are narrower than what we need: we even see the side lobes in between the main lobes, because we took a bigger window size and are therefore able to discriminate frequency differences even smaller than the fundamental frequency.
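
To compare the two window sizes numerically, the sketch below computes the main-lobe bandwidth in hertz and the number of periods covered. It assumes a 44100 Hz sampling rate, the 440 Hz fundamental, and B_s = 4 for the Hamming window; the function name is hypothetical.

```python
def main_lobe_bandwidth_hz(main_lobe_bins, fs, M):
    """Main-lobe bandwidth in Hz: B_f = B_s * f_s / M."""
    return main_lobe_bins * fs / M

fs = 44100                  # assumed sampling rate in Hz
f0 = 440.0                  # fundamental frequency of the oboe note in Hz
period_samples = fs / f0    # about 100.2 samples per period

bandwidths = {M: main_lobe_bandwidth_hz(4, fs, M) for M in (401, 801)}
periods_covered = {M: M / period_samples for M in (401, 801)}
```

With M = 401 the main lobe is just under 440 Hz wide, so adjacent harmonics sit side by side. With M = 801 it is roughly half that, which leaves room for the side lobes to appear between the harmonics.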

14:00

On the topics covered until now, there are quite a few references, but starting from this lecture on, the techniques are more specific to music applications and quite a bit less has been published. That is good for me and for the course: you'll have to pay more attention to what I'm going to be talking about.

14:23

Anyway, apart from the standard references, you can find a little bit about it in Wikipedia. And of course, again in Julius's references, you can find quite a bit more, with a more in-depth discussion of these things.

14:42

And that's all for this lecture. We have presented the sinusoidal model, a sound representation that can be built on top of the short-time Fourier transform and that can reduce the amount of spectral information to be considered. However, to use it, we have to understand a bit about spectra and about windows. Hopefully, you understood some of that in this lecture.
