This is the Short-time Fourier Transform equation, basically a modified version of the DFT, with a few but important differences. For example, the input to the equation is not just x of n but the multiplication of w, which is our analysis window, by a fragment of x of n. Here x has an argument that includes n, our time index, but also a frame number and a hop size. So l is the frame number, which acts as our time index across frames, and we will be iterating over l. That is how we step through time, and capital H is our hop size: how much we hop from one time instant to the next.

So basically x is going to be changing in time according to l and H, and then at every time instant, at every frame, it is going to be multiplied by this analysis window, w of n. The rest is the DFT. So the only thing that changes is the input signal, and therefore the output is not a single spectrum but a sequence of spectra. That is the X sub l, where the variable l is the frame number. That means the output of the Short-time Fourier Transform is going to be a sequence of spectra, each one of the same size and having magnitudes and phases, but each one different, because the input will be a different fragment of the sound, stepping through the sound in a progressive manner.
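As a rough illustration of the equation just described, the following Python sketch computes a sequence of spectra by stepping through a signal with a hop size H and multiplying each fragment by an analysis window w. The function name stft_frames and the use of NumPy are assumptions made for this example, not part of the lecture, and for simplicity each frame starts at sample l times H instead of being centered around zero.

```python
import numpy as np

def stft_frames(x, w, N, H):
    """Return an array of shape (num_frames, N) holding one spectrum per frame.

    x: input signal, w: analysis window (length M <= N),
    N: DFT size, H: hop size. All names are illustrative.
    """
    M = len(w)
    num_frames = 1 + (len(x) - M) // H          # only full frames are analyzed
    X = np.empty((num_frames, N), dtype=complex)
    for l in range(num_frames):                 # l is the frame number
        fragment = x[l * H : l * H + M]         # fragment of x selected by l and H
        X[l] = np.fft.fft(fragment * w, N)      # the rest is the DFT
    return X
```

With a 64-sample window and a hop size of 32, a 1024-sample signal yields 31 spectra of 64 bins each.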

To emphasize the idea of zero-phase windowing that we have already talked about, from now on we will generally specify the time index to go from minus N over 2 to N over 2 minus 1. So it is always centered around zero and we do not have any phase changes. We do not have any kind of shifting in time, and therefore no shifting in the spectrum.
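One way to picture zero-phase windowing in code is to place the windowed fragment in the DFT buffer so that its center sample sits at index zero, with the first half wrapped around to the end of the buffer. The sketch below is an assumption about how this could be done with NumPy (the helper name zero_phase_buffer is hypothetical). For a symmetric, odd-length window the resulting spectrum is purely real, so the window itself adds no phase slope.

```python
import numpy as np

def zero_phase_buffer(frame, N):
    """Place a frame (length M <= N) in an N-point buffer centered on index 0."""
    M = len(frame)
    hM1 = (M + 1) // 2            # samples from the center to the end
    hM2 = M // 2                  # samples before the center
    buf = np.zeros(N)
    buf[:hM1] = frame[hM2:]       # center and second half go to the start
    buf[N - hM2:] = frame[:hM2]   # first half wraps around to the end
    return buf

w = np.hanning(63)                          # symmetric, odd-length window
W = np.fft.fft(zero_phase_buffer(w, 64))
print(np.max(np.abs(W.imag)))               # numerically close to zero
```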

The windowing is a way to step through the sound, as I mentioned. Here we can see a depiction of that, and if we use the analogy of image and video, we could relate a spectrum with a photograph, a static image, and the Short-time Fourier Transform with video, a time-varying image. So in this picture we see the whole time span of a sound at the bottom, and how we are stepping through the sound by windowing it with this analysis window, therefore being able to obtain the whole sound as a sum of sound fragments.

To better understand the effect of windowing a sound, let's take an example of what happens when we window a real sinusoid and then compute its spectrum. We start from a real sinusoid, which we have already seen. It is a cosine with a frequency index, k sub zero, and an amplitude, A sub zero, which can be expressed as the sum of two complex sinusoids, one with a positive frequency and another with a negative frequency. Then we substitute this signal into the Short-time Fourier Transform equation and we window it.

We can go through the different steps. We first put x of n in the equation. Then we substitute it by the sum of these two complex exponentials. Because of the linearity of the DFT, we can split this into two equal equations, each one with a complex exponential as the input signal, and the amplitudes can be moved outside. What we get is the sum of two DFTs of the window, each with a frequency-shifting operation. So here we see that the result is the spectrum of the window, frequency-shifted by the frequency of the input signal and multiplied by half of the amplitude of the input signal, plus of course the other window at the other complex exponential frequency. One is at the minus frequency and the other at the plus frequency. So this is the result for this cosine: basically the transform of the window, shifted to the frequencies of the input signal and multiplied by half the amplitude of the input signal.
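Written out, the steps just described look as follows. The notation is an assumption based on the spoken description: W[k] denotes the DFT of the analysis window w[n], N is the DFT size, and the frame index is dropped because only a single frame is considered.

```latex
\begin{aligned}
x[n] &= A_0 \cos(2\pi k_0 n / N)
      = \frac{A_0}{2} e^{j 2\pi k_0 n / N} + \frac{A_0}{2} e^{-j 2\pi k_0 n / N} \\
X[k] &= \sum_{n=-N/2}^{N/2-1} w[n]\, x[n]\, e^{-j 2\pi k n / N} \\
     &= \sum_{n=-N/2}^{N/2-1} w[n] \left( \frac{A_0}{2} e^{j 2\pi k_0 n / N}
        + \frac{A_0}{2} e^{-j 2\pi k_0 n / N} \right) e^{-j 2\pi k n / N} \\
     &= \frac{A_0}{2} \sum_{n=-N/2}^{N/2-1} w[n]\, e^{-j 2\pi (k - k_0) n / N}
      + \frac{A_0}{2} \sum_{n=-N/2}^{N/2-1} w[n]\, e^{-j 2\pi (k + k_0) n / N} \\
     &= \frac{A_0}{2} W[k - k_0] + \frac{A_0}{2} W[k + k_0]
\end{aligned}
```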

When we plot this, we can understand the windowing process a little bit better. On the top we have the window. Underneath is the windowed sinusoid that we have as our input signal. The transform of the window is also shown on the top. The window in this case is a Hanning window, and its transform is that magnitude spectrum, centered around zero and symmetric, with a given phase. And now if we take the DFT of the windowed sinusoid, what we see is basically the same shape as the window's, but at the frequency of the sinusoid: at the two frequencies of the sinusoid, the positive and the negative, and at the phase of the sinusoid too. So we have the two values for the two phases, with the anti-symmetry that this analysis results in.

From this discussion we can realize the importance of the analysis window in the spectrum of a sinusoid, and thus of any sound. It is clear that we have to spend some time explaining windows. An analysis window is generally a real function, and it is symmetric around the origin.

And this is the simplest window, the rectangular window. Its time domain is nothing too particular, but its magnitude spectrum is much more interesting. In the time domain it just has a value of one for the duration of the window, in this case 64 samples, and the magnitude spectrum has a shape which we refer to as a sinc, because the transform is a sinc function. It could be described in many different ways, but we focus on two main aspects. One is what we call the main lobe, the peak at the center, and we will mainly be talking about the width of the main lobe. Then we talk about the side lobes, which are these small lobes next to it, and we focus on the level of the highest of these side lobes. So we will be talking about the highest side-lobe level.

There are many windows used in audio signal processing, and this is the list of windows available in the scipy module of Python. We can go through them and see quite a variety of windows. Some of them we are not going to pay much attention to, but, for example, we will be talking about the Blackman window, the Hanning window, the Hamming window, the triangular window, etc. Some others are not used so much in audio. Each window can be distinguished from the others by measuring the main-lobe width and the side-lobe level, and each window offers a different compromise with respect to these two values.
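These two measurements can be estimated numerically. The sketch below is one possible approach, not the method used to produce the lecture's figures: it takes a window from scipy.signal.get_window, zero-pads its DFT heavily, treats the first local minimum of the magnitude spectrum as the edge of the main lobe, and reports the highest value beyond that point. The helper name window_lobes, the window length of 64, and the padding factor are assumptions chosen for illustration. Note that scipy calls the rectangular window "boxcar".

```python
import numpy as np
from scipy.signal import get_window

def window_lobes(w, pad=64):
    """Estimate main-lobe width (in bins) and highest side-lobe level (in dB)."""
    M = len(w)
    N = M * pad                            # heavy zero padding for a smooth curve
    mag = np.abs(np.fft.fft(w, N))
    mag = mag / mag[0]                     # normalize so the peak at DC is 0 dB
    half = mag[: N // 2]
    k = 1
    while k < len(half) - 1 and half[k + 1] < half[k]:
        k += 1                             # walk down to the first local minimum
    width_bins = 2 * k / pad               # both sides, in un-padded bins
    sidelobe_db = 20 * np.log10(np.max(half[k:]))
    return width_bins, sidelobe_db

for name in ("boxcar", "hann", "hamming", "blackman"):
    width, level = window_lobes(get_window(name, 64))
    print(f"{name:10s} main lobe: {width:.1f} bins, highest side lobe: {level:.1f} dB")
```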

So let's show some of them. The first one is the rectangular window, and the equation shows how it is computed. The spectrum is what we call a sinc function: it is the sine of pi k, where k is the frequency index, divided by another sine function. If we look at the plots, what is plotted is the magnitude of the spectrum, so it is the absolute value of this W of k. It is basically a sine function with a kind of weighting function applied to it, and that results in this shape here, the characteristic shape that we are going to call the sinc function.

Talking about how to describe it, we mentioned the width of the main lobe, and this one has two bins. Two bins means two spectral samples, and here we have to be careful, because this is measured when the DFT is the same size as the window. So if we take a DFT size equal to the window size, let's say ten samples, then the main lobe is going to be two bins wide. But since we generally do zero padding, the number of samples in the main lobe is higher. That is only because of the zero padding, which we normally do in order to better visualize the shapes. In fact, this shape has been generated with a lot of zero padding so that we get this smooth visualization but, strictly speaking, the number of bins that we refer to is two. And the highest side-lobe level is minus 13.3 decibels. That is the distance between the center peak and the highest side lobe.
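The remark about zero padding can be checked with a few lines of NumPy. This sketch (the function name and the sizes are illustrative assumptions) counts how many DFT samples separate the two nulls around the main lobe of a rectangular window. It assumes the DFT size N is an integer multiple of the window size M, so that the nulls fall exactly on DFT samples.

```python
import numpy as np

def rect_main_lobe_samples(M, N):
    """Number of DFT samples between the first nulls on either side of DC."""
    mag = np.abs(np.fft.fft(np.ones(M), N))   # fft zero-pads the window to N
    k = 1
    while mag[k] > 1e-9 * mag[0]:             # walk right until the first null
        k += 1
    return 2 * k

for N in (10, 20, 80):
    print(N, rect_main_lobe_samples(10, N))
```

With M equal to 10, the main lobe spans 2 samples when N is 10, and 2 times N over M samples in general, so its width measured in un-padded bins stays at two.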

Maybe the most popular window is the Hanning window, which is a raised cosine. The equation is 0.5 plus 0.5 times the cosine, so this raises the cosine, and it is just one cycle of that cosine. If we compute its spectrum, it can also be expressed as a sum of sinc functions. In fact, all of these windows can be expressed in the time domain by sums of cosines and in the frequency domain by sums of sinc functions. In this case it is the sum, in the frequency domain, of three sinc functions. And again, the two values that characterize this shape are the width of the main lobe, which is four bins, so twice as much as the rectangular window, and the side-lobe level, which is minus 31.5 decibels, so it is lower. So now the main lobe is wider and the side-lobe level is lower.

The Hamming window is very similar to the Hanning, but with a small yet significant difference: it is a raised cosine with a step at the sides. By having these small steps at the sides, we get a magnitude spectrum that maintains the same main-lobe width, so that is good, it does not get wider, but in exchange we get a much lower side-lobe level, minus 42.7 decibels. And this is, as we are going to see, an important thing: ideally we would like to have the lowest side-lobe level and the narrowest possible main lobe. So this is a good window. Of course, nothing comes for free, so the side-lobe levels do not decrease as quickly as they move away from the main lobe.

The Blackman window is the sum of two cosines, and with that we accomplish a significant improvement in terms of the side-lobe level. We see the magnitude spectrum, in which the main lobe is wider, 6 bins, but the side-lobe level is lower, minus 58 decibels. And that is good, because that is starting to be quite a useful side-lobe level for many audio applications. We will come back to that.

And then finally, the window I want to talk about is the Blackman-Harris window, which is a very special one, because you can basically say it has no side lobes. It is a sum of several cosines, in this case four cosines, with different coefficients in the sum.
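The sum-of-cosines construction can be sketched in a few lines. Apart from the 0.5 and 0.5 of the Hanning window, the coefficient sets below are the commonly published ones for these windows and are an assumption here, since the lecture does not read them out. The function name cosine_sum_window is hypothetical, and the time index is centered around zero as elsewhere in this lecture.

```python
import numpy as np

# Coefficients a0, a1, a2, ... (commonly published values, assumed here)
COEFFS = {
    "hanning": [0.5, 0.5],
    "hamming": [0.54, 0.46],
    "blackman": [0.42, 0.5, 0.08],
    "blackman-harris": [0.35875, 0.48829, 0.14128, 0.01168],
}

def cosine_sum_window(name, M):
    """Window of length M built as a0 + a1*cos(2*pi*n/M) + a2*cos(4*pi*n/M) + ..."""
    n = np.arange(M) - M // 2               # time index centered around zero
    w = np.zeros(M)
    for m, a in enumerate(COEFFS[name]):
        w += a * np.cos(2 * np.pi * m * n / M)
    return w
```

Because n is centered on zero, every term enters with a plus sign. With a time index that starts at zero, the same windows are usually written with alternating signs.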

Then, in the frequency domain, in the magnitude spectrum, the main lobe again gets wider. In this case it is 8 bins. But the side-lobe level is minus 92 dB, and if we think about it in terms of signal-to-noise ratio, which is a very important factor in digital signals, 92 decibels is basically around the noise floor of the 16-bit signals that we deal with. So that means that these side lobes, if we consider them as artifacts or noise, are not heard. For the other windows we could say that the side lobes are artifacts that can be heard. Anyway, again, we will come back to that.
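As a quick sanity check on that figure, a common rule of thumb (an assumption here, not something stated in the lecture) takes the dynamic range of B-bit linear PCM as 20 times the base-10 logarithm of 2 to the power B, roughly 6 dB per bit.

```python
import math

def dynamic_range_db(bits):
    """Approximate dynamic range of linear PCM, about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)

print(round(dynamic_range_db(16), 1))  # 96.3
```

Under that assumption, side lobes at minus 92 dB sit within a few decibels of the 16-bit floor.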

And now, to finish, let me just compare some of these windows applied to the same sound. We start with a fragment of a sound of a certain length and apply three different windows: first the rectangular, then the Hamming, and finally the Blackman. We get clearly very distinct spectra. By looking at these, we can see that maybe the best one for this particular analysis is the Blackman. We see a smoother spectrum, and the peaks are much more clearly distinct. In fact, these peaks correspond to the harmonics of the sound.
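A comparison along these lines can be reproduced on a synthetic signal. The sketch below does not use the sound from the lecture. It assumes a made-up test tone with five equal-amplitude harmonics of 440 Hz at a 44100 Hz sampling rate, applies the three windows to the same fragment, and reports the median level, relative to the spectral peak, in a band between the second and third harmonics. A lower value there means less side-lobe leakage between the peaks.

```python
import numpy as np

fs = 44100          # sampling rate in Hz (assumed)
f0 = 440.0          # fundamental of the synthetic test tone (assumed)
M = 2048            # window length in samples
N = 4 * M           # zero-padded DFT size

n = np.arange(M)
# synthetic "harmonic sound": five equal-amplitude harmonics of f0
x = sum(np.cos(2 * np.pi * h * f0 * n / fs) for h in range(1, 6))

windows = {
    "rectangular": np.ones(M),
    "hamming": np.hamming(M),
    "blackman": np.blackman(M),
}

freqs = np.arange(N // 2) * fs / N
# band between the 2nd and 3rd harmonics, away from both main lobes
valley = (freqs > 2.3 * f0) & (freqs < 2.7 * f0)

valley_db = {}
for name, w in windows.items():
    mag = np.abs(np.fft.fft(x * w, N))[: N // 2]
    mag_db = 20 * np.log10(mag / mag.max() + 1e-12)   # dB relative to the peak
    valley_db[name] = float(np.median(mag_db[valley]))

for name, level in valley_db.items():
    print(f"{name:12s} median level between harmonics: {level:7.1f} dB")
```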

So this is all, and there are a lot of references for the topics I covered, especially about windows. In Wikipedia you can find quite a bit of information about the Short-Time Fourier Transform and about windows. Julius Smith, on his website and in his online book, discusses this quite a bit, so that is a very good reference. And these are the resources, with their credits and references.

So this is all for the first part of the lecture on the Short-Time Fourier Transform. We have explained the basic equation of the Short-Time Fourier Transform, and we have focused on the analysis window. In the second part, we will continue with this topic. So I will see you in the next class.
