0:00

Welcome back to the course on Audio Signal Processing for Music Applications.

In the previous demonstration class, we talked about the harmonic plus residual model. We analyzed a sound with the model using the SMS tools: we identified the harmonics, subtracted them from the original sound, and obtained a residual, which we could then combine with the harmonics to recover the original signal.

0:29

In this demonstration class we want to go a little beyond that by approximating the residual with the stochastic model. When the residual is close to a stochastic signal, the harmonic plus stochastic combination is a good representation of the sound. So let's start by opening the GUI of SMS Tools. We will start with a flute sound, okay? So, let's listen to this flute sound.

1:10

[SOUND] Okay, so it's a very stable note with a clearly defined pitch; in fact, it's an A4. And it clearly has a breathy quality that will be relevant for this idea of the stochastic component. As for the window, well, again, let's use the Blackman window. Since it's a stable sound, Blackman is a good choice: its side lobes are low, so we get a good signal-to-noise ratio in terms of the window.

1:52

So in order to compute the best window size, we just multiply 6, which is the width in bins of the main lobe of the Blackman window, by the sampling rate, 44,100, and divide by the fundamental frequency of this sound. This is an A4, so it's 440 Hertz, okay. So 601 would be a good window size to use. Given that we really want good resolution of the harmonics, we want to minimize the rest of the components, and the sound is quite stable, we can afford to take a longer window, so let's take 801 samples. For the FFT size, again, we can take a bigger FFT so that we get a smoother spectrum, so 2048, okay.
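The window-size arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb from the lecture; the function names are hypothetical and are not part of SMS Tools. It assumes the Blackman main lobe is 6 bins wide and that an odd window length is preferred.

```python
import math


def min_window_size(main_lobe_bins, fs, f0):
    """Window length (samples) whose main lobe is about one f0 wide.

    The value is rounded to the nearest integer and bumped to the next
    odd number if it is even (an assumption: odd lengths are preferred).
    """
    size = int(round(main_lobe_bins * fs / f0))
    return size if size % 2 == 1 else size + 1


def main_lobe_width_hz(main_lobe_bins, fs, window_size):
    """Approximate main-lobe width in Hz for a window of the given length."""
    return main_lobe_bins * fs / window_size


def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n."""
    return 1 << (n - 1).bit_length()


BLACKMAN_BINS = 6   # main-lobe width of the Blackman window, in bins
FS = 44100          # sampling rate in Hz
F0 = 440.0          # fundamental of the flute note (A4) in Hz

print(min_window_size(BLACKMAN_BINS, FS, F0))          # size from the rule of thumb
print(main_lobe_width_hz(BLACKMAN_BINS, FS, 801))      # lobe width for the longer window
print(next_power_of_two(801))                          # smallest FFT size that fits 801 samples
```

With 801 samples the main lobe is narrower than the 440 Hz spacing between harmonics, and an FFT size of 2048 is larger than the smallest power of two that fits the window, so the extra zero-padding only smooths the spectrum.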

And, well, let's analyze just the middle of the sound. [SOUND] This is quite a long sound, so let's do it like that. Okay, so these are the samples of the input signal. In the spectrum, we see the low-frequency harmonics quite well, similar to the organ sound that we analyzed before.

3:11

In the high-frequency areas, there are quite a few unstable, very stochastic components that are not so clearly defined as partials. There might be some harmonics or partials there, but they're masked and difficult to identify here.

Anyway, that's okay, so now let's go directly to the harmonic model and apply the same parameters: 801 and 2048. Now we have to choose the parameters for the pitch detection and the harmonic detection. I don't think we need to go very far down in the spectrum for the harmonics because, as we saw, there are not that many. In terms of the duration of the harmonics, I think it's good to make sure that they're long, as this is a stable note, so we might want to put 0.2 as the minimum duration. In terms of the number of harmonics, there are clearly not that many, so I'm sure 40 will be plenty. And we have to set the minimum and maximum fundamental so that 440 fits in the range. So I'm sure that 300 and 500 will be fine. For the error threshold of the pitch detection, well, let's take the default, 7. And for the deviation, let's start with these values and see what happens.
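As a quick sanity check on these settings, here is a minimal sketch that collects them in one place and verifies that they are consistent with a 440 Hz note. The `HarmonicParams` class and its field names are hypothetical, chosen for illustration only; they do not mirror the SMS Tools interface.

```python
from dataclasses import dataclass


@dataclass
class HarmonicParams:
    """Hypothetical container for the values chosen in the demonstration."""
    fs: int = 44100                  # sampling rate in Hz
    window_size: int = 801           # analysis window length in samples
    fft_size: int = 2048             # FFT size in samples
    min_duration: float = 0.2        # minimum duration of a harmonic track
    max_harmonics: int = 40          # maximum number of harmonics to track
    min_f0: float = 300.0            # lower bound for the fundamental in Hz
    max_f0: float = 500.0            # upper bound for the fundamental in Hz
    f0_error_threshold: float = 7.0  # error threshold for the pitch detection


def check(params, expected_f0):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if not params.min_f0 <= expected_f0 <= params.max_f0:
        problems.append("expected f0 is outside the [min_f0, max_f0] range")
    if params.fft_size < params.window_size:
        problems.append("FFT size is smaller than the window size")
    return problems


def usable_harmonics(params, f0):
    """Number of harmonics of f0 that fit below the Nyquist frequency."""
    nyquist = params.fs / 2
    return min(params.max_harmonics, int(nyquist // f0))


params = HarmonicParams()
print(check(params, 440.0))
print(usable_harmonics(params, 440.0))
```

For an A4 at 44,100 Hz, all 40 requested harmonics fall below the Nyquist frequency, so the limit of 40 is not cut short by the sampling rate.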

4:54

And of course, we need to load the flute sound. Okay, now we can compute it. Okay, so clearly we got quite a few harmonics, and some here that appear and disappear. Of course, we have to realize that here we're just plotting the first 5,000 Hertz. If we want to see more of what's going on in the higher frequencies, we should display it a little bit differently. Okay, so let's just listen to the result.

5:27

[SOUND] Okay, that's pretty good. That's quite an accurate rendition of the sound. Of course, it's different from the original one. In fact, if we listen to the original again, [SOUND] the original has a brighter quality, and this is because it has all these other components. Of course, we can play around with these parameters, for example, to allow more harmonics to appear. So for example, let's put 0.1 here and compute it again. Yeah, now we are allowing these high harmonics, which are very unstable, to appear more, but of course, all these jumps are not really that good. They mean that these harmonics are not really stable, so they are really buried in the noise, this breath that we also hear.

Okay, now we can go ahead and use the harmonic plus residual model so we can listen to the residual. And let's again use the same parameters. The flute sound. Let's put 800, let's put 2048, and the threshold, -80. The minimum duration, let's just put 0.2. Number of harmonics, yeah, we don't need that many. And, well, this will be okay: 350, 700. And this, I think 7 was all right. And let's make the deviation not as open as before. Let's just put maybe 0.2 and see what happens.

7:25

Okay, so here we see the original and the harmonics, and maybe we see a few more than before. So, let's listen to the residual. [SOUND] Okay, that's a pretty nice residual. [SOUND] We hear the attack; in fact, that is this red thing here. During the attack the breath is clearly louder, and then it gets attenuated, and we hear a very clear breath noise throughout. Okay, so this is very much a stochastic signal, it's very noisy, so that means that we can apply the stochastic analysis to it.
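The idea of obtaining a residual by subtracting the harmonics can be sketched with a toy signal. This is a simplified time-domain illustration with synthetic data, not the frame-by-frame analysis the SMS tools perform: here the harmonics are known exactly, whereas in a real analysis they have to be estimated. All names and values below are assumptions made for the example.

```python
import math
import random


def harmonic_tone(f0, amplitudes, fs, n_samples):
    """Sum of sinusoids at f0, 2*f0, 3*f0, ... with the given amplitudes."""
    tone = []
    for n in range(n_samples):
        t = n / fs
        tone.append(sum(a * math.sin(2 * math.pi * (k + 1) * f0 * t)
                        for k, a in enumerate(amplitudes)))
    return tone


def add(x, y):
    """Sample-by-sample sum of two signals of equal length."""
    return [a + b for a, b in zip(x, y)]


def subtract(x, y):
    """Sample-by-sample difference of two signals of equal length."""
    return [a - b for a, b in zip(x, y)]


rng = random.Random(0)
fs, n_samples = 44100, 2048

harmonics = harmonic_tone(440.0, [0.5, 0.25, 0.1], fs, n_samples)
breath = [rng.gauss(0.0, 0.01) for _ in range(n_samples)]  # stand-in for breath noise
x = add(harmonics, breath)                                  # toy "flute" signal

residual = subtract(x, harmonics)   # what is left after removing the harmonics
rebuilt = add(harmonics, residual)  # harmonics + residual gives back the input

print(max(abs(a - b) for a, b in zip(x, rebuilt)))
```

In this toy case the residual is just the added noise, which is the situation where a stochastic approximation of the residual is a reasonable next step.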

So now let's use the harmonic plus stochastic model with the same type of parameters. Of course, we can play around with these parameters to get better values, but the ones we chose looked okay. So again, let's put 0.2 here. For the number of harmonics, yeah, I think 40 harmonics. And this was all right, this was 7, and this was 0.02.

Okay, now the parameter that is specific to the stochastic analysis is the stochastic approximation factor. Here the default is 0.1. So it means that it reduces the whole spectrum
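To make the role of this factor more concrete, here is a minimal sketch that assumes the factor is the fraction of magnitude-spectrum points that are kept, so that 0.1 keeps about one point in ten. That interpretation is an assumption of this example, and the code uses plain linear interpolation on synthetic data; SMS Tools itself may use a different resampling routine. The function names are hypothetical.

```python
import random


def resample_linear(values, n_out):
    """Resample a list to n_out points by linear interpolation.

    Both len(values) and n_out are assumed to be at least 2.
    """
    n_in = len(values)
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)   # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[hi] * frac)
    return out


def stochastic_envelope(mag_db, stocf):
    """Coarse envelope keeping roughly a fraction stocf of the input points."""
    n_coarse = max(2, int(len(mag_db) * stocf))
    return resample_linear(mag_db, n_coarse)


rng = random.Random(0)
# Synthetic, noisy magnitude spectrum in dB (illustrative values only).
mag_db = [-40.0 - 0.02 * k + rng.uniform(-6.0, 6.0) for k in range(1024)]

coarse = stochastic_envelope(mag_db, 0.1)       # compact representation
smooth = resample_linear(coarse, len(mag_db))   # envelope used for resynthesis

print(len(mag_db), len(coarse), len(smooth))
```

A smaller factor gives a more compact and smoother envelope at the cost of spectral detail; a factor of 1 would keep every point of the spectrum.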

9:25

Okay, so clearly now, well, the plot looks different because in this one we are showing the range from 0 to 14,000 Hertz. So in fact, we're seeing quite a bit more, so we see more of the stochastic component. But as we can see, the harmonics are mostly on the lower side; above 5,000 Hertz there is not that much. There are some of these lines, but maybe they should not even be considered harmonics, and maybe they should be discarded. But let's listen to the stochastic component.

10:09

Okay, of course, we have lost a little bit of the detail that the residual had, but it definitely keeps this breathy noise. Of course, it's not that loud, so let's put it together with the harmonics. [SOUND] Okay, so it sounds good, but clearly the harmonics are taking over and they are masking this stochastic component quite a bit. We have to listen quite carefully if we want to be able to distinguish the stochastic component in this type of sound.
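One rough way to put a number on how much the harmonics dominate is to compare the RMS levels of the two components in decibels. The sketch below does that for synthetic signals; the amplitudes are made up for illustration and are not measured from the flute sound, and a level difference is only a crude proxy for perceptual masking.

```python
import math
import random


def rms_db(samples):
    """RMS level of a signal in dB relative to full scale (amplitude 1.0).

    The signal is assumed to be non-silent (at least one non-zero sample).
    """
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


def level_difference_db(a, b):
    """How many dB louder signal a is than signal b (RMS based)."""
    return rms_db(a) - rms_db(b)


fs = 44100
rng = random.Random(0)

# Illustrative components: a strong sinusoid and a quiet noise floor.
harmonic = [0.5 * math.sin(2 * math.pi * 440.0 * n / fs) for n in range(fs)]
stochastic = [rng.gauss(0.0, 0.01) for _ in range(fs)]

print(round(level_difference_db(harmonic, stochastic), 1))
```

When the difference is several tens of decibels, as in this toy setup, the quieter component is easy to miss in the mix even though it is clearly audible on its own.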

And that's basically all I wanted to say.

10:58

Let's go back. So we have talked about the harmonic plus stochastic model, and we have used SMS Tools, the interface that has allowed us to play around with this model. And of course, the sound, this flute sound, is a Freesound sound. So hopefully that has given you a view of the potential of the harmonic plus stochastic model. It's a little bit different from the harmonic plus residual model, but the main difference is that now, with the stochastic representation of the residual, we will be able to do quite a few things. Next week, in fact, we're going to be applying transformations to these sounds, and the stochastic representation will allow us to do that whether in
