0:07

And in this demonstration class I want to continue what we started in

the last one, which was to analyze a sound, in that case the sound

of a soprano, using the short-time Fourier transform, the topic of this week.

So in this lecture, I want to analyze another sound,

a sound that can give us another view of the short-time Fourier transform.

So let's open the sound visualizer,

and this is the sound we're going to analyze today.

This is the sound of a piano, so let's listen to that.

[MUSIC]

Okay, so this is a very simple piano phrase, quite clear, five notes.

So let's go directly to the sms-tools and

let's go to the short-time Fourier transform module.

So let's go to the piano sound that is here, piano.wav.

Okay, now let's decide about the parameters, okay?

In the last class we mentioned that the Blackman window was quite a good choice for

what we are doing, so let's keep it. The window size, okay.

This is not as high-pitched as the voice, so

we would need quite a bigger window size.

So, I don't know, let's start with, for example, 1501.

This is an odd-size window, and

this is something that we will do whenever possible.

If we take windows with an odd size,

that means that they can be centered around zero, and

especially for the phase analysis, that's going to be very convenient.

So let's use that, and let's make a habit of always using odd-sized windows.
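
The point about odd-sized windows being centered around zero can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the variable names are made up for this note, the FFT size of 2,048 is the one chosen a little later in the demo, and the buffer layout is one common way of doing zero-phase windowing, not necessarily what the sms-tools code does internally.

```python
import numpy as np

M = 1501                 # odd window size, as in the demo
N = 2048                 # FFT size (chosen a bit later in the demo)
w = np.blackman(M)       # symmetric Blackman window

hM1 = (M + 1) // 2       # center sample plus everything after it (751 samples)
hM2 = M // 2             # samples before the center (750 samples)

# Zero-phase placement: the center of the window goes to index 0,
# the second half follows it, and the first half wraps to the buffer end.
buffer = np.zeros(N)
buffer[:hM1] = w[hM2:]
buffer[-hM2:] = w[:hM2]

# A window that is symmetric around index 0 has a purely real spectrum,
# so it adds no phase of its own to the analysis.
W = np.fft.fft(buffer)
max_imag = np.max(np.abs(W.imag))
print("largest imaginary part:", max_imag)
```

With an even-sized window there is no single center sample to place at index 0, which is why the odd size is the convenient one for phase analysis.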

The FFT size has to be bigger than that.

And of course, normally we will be using a power of 2,

so that it is efficient if you use the FFT algorithm.

So the power of 2 bigger than 1,500 is 2,048.

Okay, that's a good size.

And the hop size, for the Blackman window, has to be chosen

so that the windows overlap correctly.

So, let's say it has to be at most

one-fourth of 1,500, so

that would be around, let's say, 325, okay.

That would be around one-fourth.

And let's compute.
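
The arithmetic behind these choices can be written down as a small sketch. The helper names `next_power_of_two` and `max_hop_for_blackman` are hypothetical, introduced only for this note, and the one-fourth rule is read here as an upper bound on the hop size for a Blackman window.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is greater than or equal to n (n >= 1)."""
    return 1 << (n - 1).bit_length()


def max_hop_for_blackman(window_size: int) -> int:
    """Largest hop that keeps the one-fourth rule of thumb for a Blackman window."""
    return window_size // 4


M = 1501                         # window size used in the demo
N = next_power_of_two(M)         # FFT size: first power of two not below M
H_max = max_hop_for_blackman(M)  # upper bound for the hop size
print(M, N, H_max)
```

For a window of 1,501 samples this gives an FFT size of 2,048 and a hop of at most 375 samples, so the 325 used in the demo sits within that bound.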

Â 2:58

Okay, this is the input sound, the magnitude and phase spectrograms,

and the reconstructed output.

So let's first just listen to the reconstruction.

[SOUND] Okay, that's pretty good.

So, I guess we haven't lost any information from the analysis.

That means that the hop size and the window size were chosen correctly, so

that the windows overlap correctly.
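
One way to check what "overlap correctly" means is to add up shifted copies of the window and see how flat the sum is. The following is a minimal sketch, assuming a symmetric Blackman window from NumPy; the function name `overlap_add_ripple` and the choice of 40 frames are illustrative and not part of sms-tools.

```python
import numpy as np


def overlap_add_ripple(window_size: int, hop_size: int, n_frames: int = 40) -> float:
    """Relative ripple (max minus min, divided by mean) of overlap-added windows."""
    w = np.blackman(window_size)
    total = np.zeros(hop_size * (n_frames - 1) + window_size)
    for i in range(n_frames):
        start = i * hop_size
        total[start:start + window_size] += w
    # Ignore the edges, where fewer windows overlap than in the middle.
    interior = total[window_size:-window_size]
    return float((interior.max() - interior.min()) / interior.mean())


print("M=1501, H=325 :", overlap_add_ripple(1501, 325))
print("M=1501, H=1501:", overlap_add_ripple(1501, 1501))
```

With the hop used in the demo the summed windows are close to constant, so every input sample is weighted almost equally in the reconstruction. With a hop equal to the window size the sum swings between almost zero and one, which would be heard as amplitude modulation.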

The phase spectrogram looks very minimalist, but it's quite interesting.

We see these very clear vertical lines, and these correspond to the attacks.

Basically this means that during these attacks

the phase is quite disrupted, it changes quite a lot.

There is quite a big transition there.

That's something that we see very clearly in the phase information.

And in the more steady parts, in the notes, we see more of this horizontal structure.

That means that the harmonics maintain a kind of phase continuity

that can be identified in the phase spectrogram.

Â 4:13

In the magnitude spectrogram, well, we see the harmonics very clearly.

These are the red lines.

And we see that as the sound evolves, the piano being a percussive instrument, in

the attack there is more energy and so there are more harmonics.

And as time evolves, the harmonics decay, and

especially the high harmonics decay, while the low harmonics stay longer.

We also see quite clearly the attacks of the sound and

what is going on during the attacks, so that's quite interesting.

Okay, now let's zoom in and go into some detail, so

let's use the option of this figure for zooming into a rectangle.

And let's just take this middle note, the fourth note, from a little

bit before the attack to around where the note ends.

Okay, and that's what we're getting, and what we are seeing,

in fact, is the discretization of the analysis.

We have zoomed in enough so that we can see this vertical

kind of quantization, these vertical bars.

These correspond to every frame, every spectrum computed.

So every bar corresponds to the number of samples of the hop size.

Â 5:52

So these were the 325 samples that we are hopping from one frame to the next.

And vertically we also see this kind of discretization,

these horizontal lines that are

narrower because we have taken quite a lot of samples in the DFT.

We have taken 2,048 samples, so

we have a pretty good frequency resolution.

Let's compute with a different set of parameters.

For example, let's use a window size which is smaller.

For example, let's use 201 samples.

Â 6:37

And let's use an FFT size correspondingly smaller; it doesn't have to be that big.

So, let's say 256, and of course the hop size has to be

chosen according to the window size, at most one-fourth, so let's use 50.

And now let's compute it.

Â 6:56

It takes a little while because, of course,

the hop size being smaller, it has to compute more FFTs, and this is what we get.

Basically we are visualizing a similar thing, the analysis and

then the synthesis, and the synthesis is going to be pretty good.

Let's listen to that.

[MUSIC]

Since we have maintained the same relationship between the hop size and

the window size, the identity is preserved.

So the output sound is identical to the original.

But now let's zoom into the same region that we zoomed into before

to try to understand the differences.

Let's get a little bit before the attack and

let's get a little bit of the steady state, okay.

And let's compare it with the previous one.

Â 7:58

If we recall what we were talking about before, the concept of the vertical and

horizontal lines, in terms of the vertical lines, we see them narrower.

There are more frames per second here.

So the time resolution is higher.

Â 8:17

Okay, so we see more in terms of how things evolve in time.

In exchange, on the vertical axis, the frequency resolution is

worse because the FFT size was much smaller,

therefore these boxes are kind of larger along the vertical axis.

So we see less information in the frequency domain.

Â 8:59

In the first case, we had a good frequency resolution and a not so

good time resolution; in exchange, in this second example,

we have a pretty good time resolution but not so good frequency resolution.

And that's quite an important consideration to take into account when we analyze

a sound and decide what is the best set of parameters for a particular sound.
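
The trade-off between the two analyses can be put into numbers. This is a rough sketch that assumes a sampling rate of 44,100 Hz, which the lecture does not state for piano.wav; the `StftSettings` class and the `resolution` function are hypothetical names used only here. The factor 6 is the main-lobe width of the Blackman window measured in bins of the window-length grid.

```python
from dataclasses import dataclass


@dataclass
class StftSettings:
    window_size: int
    fft_size: int
    hop_size: int


def resolution(settings: StftSettings, sample_rate: float):
    """Return (hop in ms, FFT bin spacing in Hz, Blackman main-lobe width in Hz)."""
    hop_ms = 1000.0 * settings.hop_size / sample_rate
    bin_hz = sample_rate / settings.fft_size
    main_lobe_hz = 6.0 * sample_rate / settings.window_size
    return hop_ms, bin_hz, main_lobe_hz


SAMPLE_RATE = 44100.0                   # assumed, not given in the lecture
first = StftSettings(1501, 2048, 325)   # first analysis in the demo
second = StftSettings(201, 256, 50)     # second analysis in the demo

for name, s in (("first", first), ("second", second)):
    hop_ms, bin_hz, lobe_hz = resolution(s, SAMPLE_RATE)
    print(f"{name}: hop {hop_ms:.2f} ms, bin {bin_hz:.1f} Hz, main lobe {lobe_hz:.0f} Hz")
```

Under that assumed sampling rate, the first setting advances about 7.4 ms per frame with bins about 21.5 Hz apart, while the second advances about 1.1 ms per frame with bins about 172 Hz apart. The main-lobe width is the number that tells how close two harmonics can be before they merge, and it depends on the window size rather than on the FFT size.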

Â 9:30

Okay, now let's go into one aspect of the attack,

and try to understand some aspects of the sound by looking at these

kinds of spectrum analyses that we have started to do.

So, for that, let's use the DFT, okay, and

we will just compute the DFT at one location at the attack.

The attack, more or less, was around, let's see,

it was 1.54.

That's kind of where the attack is.

And let's keep the same resolution that we had.

So let's keep the 1,501.

And let's have the FFT size 2,048.

And let's use the piano sound.

Okay, now we'll compute it.
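
What this single-frame analysis does can be sketched roughly as follows. The sketch does not call the sms-tools code; `frame_spectrum_db` is a hypothetical helper, the 44,100 Hz sampling rate is assumed, and a synthetic decaying 440 Hz tone stands in for piano.wav so that the example runs on its own. Only the magnitude is computed, so the zero-phase buffer arrangement is left out.

```python
import numpy as np


def frame_spectrum_db(x, sample_rate, time_s, window_size=1501, fft_size=2048):
    """Magnitude spectrum in dB of one Blackman-windowed frame centered at time_s."""
    center = int(round(time_s * sample_rate))
    half = window_size // 2
    if center - half < 0 or center + half + 1 > len(x):
        raise ValueError("frame does not fit inside the signal")
    frame = x[center - half:center + half + 1]
    w = np.blackman(window_size)
    w = w / w.sum()                                # normalize the window gain
    spectrum = np.fft.rfft(frame * w, fft_size)    # zero-pads up to fft_size
    mag_db = 20.0 * np.log10(np.abs(spectrum) + 1e-12)
    freqs = np.fft.rfftfreq(fft_size, d=1.0 / sample_rate)
    return freqs, mag_db


fs = 44100                                         # assumed sampling rate
t = np.arange(2 * fs) / fs
x = np.exp(-2.0 * t) * np.sin(2 * np.pi * 440.0 * t)   # synthetic stand-in signal

freqs, mag_db = frame_spectrum_db(x, fs, 1.54)     # same time position as in the demo
mask = freqs <= 10000.0                            # keep only the part below 10,000 Hz
peak_freq = freqs[np.argmax(mag_db)]
print("strongest bin at", peak_freq, "Hz")
```

Calling the same function at 1.64 seconds gives the later frame for comparison, which is the second analysis done in the demo.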

Okay, this is the beginning.

We see here there is the attack of the piano.

So we see quite a bit of things going on here.

Â 10:53

So let's just look at the magnitude spectrum up to,

let's say, well, let's get it up to 10,000 hertz.

Okay, so we see quite a bit of things.

Let's now recompute with the same parameters but

a little bit beyond the attack,

so where it is more of a steady state.

So let's say 100 milliseconds after.

So, 1.64 seconds, with the same parameters.

Â 11:30

Okay, and this is another analysis.

And again, let's zoom into the same region.

Let's just zoom into the region that goes up to

10,000 hertz so that we get all the information.

Okay, and let's compare them, and let's see if we can understand

what is going on at the sound level.

The top is the attack, the bottom is the more steady state.

In the time domain we clearly see the difference;

in the frequency domain I believe we can also see a significant difference.

For example, in the top the harmonics are not so

well defined because it's the beginning of the sound and the harmonics

have not started completely; instead, in the steady state,

these peaks are much clearer, much more resolved, okay?

And then another thing is that in the attack, the kind of noise floor, or

basically the energy of the high frequencies, is higher.

So the high frequencies are much louder, or at least substantially louder, than

during the steady state, in which it is the lower harmonics that are clearly louder.

Â 13:24

So we have been looking at a sound,

in this case the piano sound, using the sms-tools.

And, of course, the sound is available on Freesound, and, hopefully,

this has given you another insight into the tool we are building,

in this case, the short-time Fourier transform.

But, at the same time, it has given you some insight into piano sounds.

I believe it's quite an interesting instrument and sound, and

using these tools we can appreciate quite a bit of it.

So anyway, that's it for the demonstrations of this week.

I hope to see you next class.

Thank you.
