0:00

Welcome back to the course on Audio Signal Processing for Music Applications.

Until now, we have been presenting different techniques for analyzing and synthesizing sounds. It's time to put that into practice and make use of it in actual music applications.

So this week we'll be talking about sound transformations: how to transform the representations we have been discussing, and then resynthesize a sound that is a modified version of the input sound. In this first theory lecture we'll talk about the short-time Fourier transform and about two types of manipulations or transformations, filtering and morphing. We will also talk about the sinusoidal model, and how we can use it for frequency scaling and time scaling a particular sound.

Let's start with the short-time Fourier transform. This is the block diagram that we saw, in which, frame by frame, we select a fragment of the sound, window it, compute the spectrum, and obtain a magnitude and phase. What we are introducing now is a transformation block after this spectral analysis. From this transformation we obtain a new magnitude and phase spectrum that can be inverted, so we can obtain a new sound that is a modified version of the input sound.

Let's first talk about filtering. We have talked about filtering in this course before; the idea is that filtering can be implemented in the time domain using convolution, or in the frequency domain using multiplication. That is what we are using the short-time Fourier transform for here: applying the filtering in the spectrum by combining the magnitude and phase spectrum of the input sound with the magnitude and phase spectrum of the filter.

Strictly speaking, since we have the magnitude and phase separate, we sum the phase spectra and multiply the magnitude spectra. So the filter is expressed by its frequency response, which has a magnitude and a phase; the phases are added and the magnitudes are multiplied. The equation below shows these ideas: the complex spectrum of the output is the product of the magnitude spectrum of the filter with one frame of the magnitude spectrum of the input sound, and the phases are summed, again the whole phase spectrum of the filter with one frame of the input sound. Then we can perform the inverse FFT and obtain an output sound.
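As a rough sketch of this per-frame operation (not the lecture's own code; the helper name and the assumption that the filter's magnitude and phase are sampled on the same FFT grid as the frame are mine), the multiply-magnitudes, add-phases idea looks like this in NumPy:

```python
import numpy as np

def spectral_filter_frame(x_frame, h_mag, h_phase):
    """Filter one windowed frame in the frequency domain:
    magnitude spectra multiply, phase spectra add."""
    X = np.fft.rfft(x_frame)
    mY = np.abs(X) * h_mag       # magnitudes multiply
    pY = np.angle(X) + h_phase   # phases add
    # invert the modified spectrum back to a time-domain frame
    return np.fft.irfft(mY * np.exp(1j * pY), n=len(x_frame))
```

With an all-pass filter (unit magnitude, zero phase), the frame comes back unchanged, which is a handy sanity check.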

So let's see that in practice. Here we have, on the top left, a very short fragment of an orchestral sound, of which we compute the magnitude and phase spectrum. Below it, the red line is the magnitude spectrum, and below that is the phase spectrum. Then we have the magnitude spectrum of a filter. A filter can be zero phase, so normally the important aspect of the filter is its magnitude; in many situations we can basically discard the phase because it has no effect. So here, what we're going to do is multiply the magnitude spectrum of the filter with the magnitude spectrum of the input sound. However, since we are on a logarithmic scale, in dB, we add the two together. So we'll be adding the two shapes that we see here, and the shape on the right is the sum of those two shapes. It's a modified version of the input spectrum, and the phases are basically untouched. Then we obtain the output sound y by performing the inverse FFT of that.
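The equivalence used here, that multiplying linear magnitudes is the same as adding dB values, can be checked in a couple of lines (the spectra values are just illustrative numbers):

```python
import numpy as np

mX = np.array([1.0, 0.5, 0.25])   # input-frame magnitudes (linear, illustrative)
mH = np.array([1.0, 2.0, 0.1])    # filter magnitudes (linear, illustrative)

linear_product = mX * mH                          # multiply in linear scale
db_sum = 20 * np.log10(mX) + 20 * np.log10(mH)    # add in dB scale
back_to_linear = 10 ** (db_sum / 20)              # convert the dB sum back
```

Converting the dB sum back to linear gives exactly the linear product.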

Let's now see this for a complete sound. On top, we see the spectrogram of the complete orchestral sound we have heard before, but let's listen to it again.

[MUSIC]

Okay, and now in the middle we have the shape of the filter that we are applying, its magnitude. What we are doing is multiplying this shape with every single frame of the input sound's magnitude spectrum, and we obtain the spectrogram below. So let's listen to the resulting sound.

[MUSIC]

Clearly much softer, because we have attenuated most of the frequencies. Since this is a bandpass filter, we have only let through the frequencies around one thousand and something hertz; the rest are very much attenuated, and those passband frequencies we have even boosted a little bit. But of course, energy-wise, we have reduced the energy quite a bit.

Let's now use the short-time Fourier transform for another type of transformation, what we call morphing. In morphing, we start from a sound x1, on which we are basically doing a short-time Fourier transform analysis-resynthesis, and at every frame we multiply its magnitude spectrum by the magnitude spectrum of another sound that is also changing in time. So we take another sound, x2, and run a similar short-time Fourier transform process in parallel, using only its magnitude spectrum, and we smooth it. The idea is that we are applying the general shape of x2 onto x1.

If we look at the equation below, what we are doing is similar to the concept of filtering, but now x2 is time varying, so it has an l as frame index: we multiply the two magnitude spectra of x2 and x1 at frame l, and the phase spectrum is only that of x1.
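One frame of this morphing could be sketched as follows (an illustration, not the lecture's implementation; the downsample-and-interpolate smoothing and the `n_coarse` parameter are my assumptions about how a "smooth magnitude spectrum" might be obtained):

```python
import numpy as np

def morph_frame(x1_frame, x2_frame, n_coarse=10):
    """Morph one frame: scale x1's magnitude spectrum by a smoothed
    magnitude envelope of x2, keeping x1's phase spectrum."""
    X1 = np.fft.rfft(x1_frame)
    mX2 = np.abs(np.fft.rfft(x2_frame))
    # crude smoothing of x2's magnitude: downsample, then interpolate back
    bins = np.arange(len(mX2))
    coarse_bins = np.linspace(0, len(mX2) - 1, n_coarse)
    env = np.interp(bins, coarse_bins, np.interp(coarse_bins, bins, mX2))
    # apply x2's envelope to x1's magnitude; phases come only from x1
    Y = env * np.abs(X1) * np.exp(1j * np.angle(X1))
    return np.fft.irfft(Y, n=len(x1_frame))
```

If x2 is an impulse, its magnitude spectrum is flat, the envelope is all ones, and x1 passes through unchanged.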

So let's see that in practice. This, again, is the same orchestral sound that we played before, and below it are, again, the magnitude and phase spectra. But now the black line is a smoothed magnitude spectrum of another sound, and it will keep changing, the same way the frames of the orchestral sound keep changing. We'll be adding these two spectra, again because we are on a logarithmic scale, and we will be creating a new set of spectra, from which we do the inverse FFT.

Let's do this for the time-varying sound. On top we have the orchestra, which we have already heard, and let's now listen to the other sound, x2, which is this male speech sound.

>> Do you hear me, they don't lie at all.

>> And the spectrogram we see in the middle is the smoothed version of this x2 sound. It does not have the detail of x2; it just has the general shape. These are the two spectrograms that we sum frame by frame to obtain the lower spectrogram, mY, in which we can definitely see that it is neither x1 nor x2; it's a modified version of the two. Let's listen to this modified version.

[MUSIC]

So it clearly has aspects of the orchestra, and we can basically understand the speech, because the general shape of the magnitude spectrogram is that of the speech. Maybe one way of understanding this sound is as if the orchestra were playing through the vocal tract of this male speaker, and therefore reproducing the phonetic textures of the speech sound.

Now let's go to the sinusoidal model that we have already seen. We have, again, the analysis of the sinusoidal model, with peak detection and sinusoidal tracking, and now what we're going to do is modify the resulting sinusoidal values. But there is a small change here: we are not going to take the phase values. The phase values are very sensitive, and if we want to make any transformations, it's very difficult to take care of them. So instead we regenerate the phase values from the frequency values. We'll modify amplitudes and frequencies, and then generate the phase values after the transformations; this is the input for the synthesis, which is the same synthesis we have been doing until now.

These are the particular operations we perform on the output of the sinusoidal analysis. The new frequencies and new amplitudes are the result of applying a scaling factor, sf, to the input frequencies. Also, the reading of the input frequencies is controlled by a time-scaling factor, which allows us to move inside the input array, the input signal, so that we can slow down or speed up the reading of the sound. We do that for both amplitudes and frequencies. The amplitude scaling is done on the dB scale: we sum the amplitude scaling factor with the amplitude of the input signal.

And the phases are regenerated; they are not taken from the original sound. They are generated starting from the previous phase, and therefore we need an initial phase at the beginning, which can be zero or a random value. Then, at every frame, we add the new frequency that we are generating, so that the phases automatically unwrap and are derived from the frequency values.
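The phase propagation for one sinusoidal track can be sketched like this (a minimal illustration; the function name and arguments are mine, and the frame advance assumed is 2·pi·f·hop/fs):

```python
import numpy as np

def regenerate_phases(track_freqs, hop, fs, phase0=0.0):
    """Regenerate the phases of one sinusoidal track from its per-frame
    frequencies: each frame advances the previous phase by 2*pi*f*hop/fs,
    starting from an arbitrary initial phase."""
    phases = np.empty(len(track_freqs))
    phase = phase0
    for l, f in enumerate(track_freqs):
        phases[l] = phase
        phase += 2 * np.pi * f * hop / fs  # advance by one hop at frequency f
    return phases
```

For a 100 Hz track with a hop of 441 samples at 44100 Hz, each frame advances the phase by exactly 2·pi, i.e. one full cycle per hop.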

So this would be, for example, a time-scaling envelope: a scaling function applied to a particular sound. It is an envelope read against the input time, so the horizontal axis is the input time, and it assigns an output time to every input time, thereby changing the reading position within the input and modifying its length.
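One way such an envelope could drive the reading of the analysis frames is sketched below (an assumption about the mechanics, not the lecture's code: the envelope is given as input-time/output-time pairs, both increasing, and each output frame is mapped back to the nearest input frame):

```python
import numpy as np

def time_scale_read_indices(n_frames, in_times, out_times):
    """For a time-scaling envelope given as matching arrays of input
    times and output times, return, for each output frame, the index
    of the input analysis frame to read."""
    in_t = np.asarray(in_times, dtype=float)
    out_t = np.asarray(out_times, dtype=float)
    # number of output frames grows with the output duration
    n_out = int(round(n_frames * out_t[-1] / in_t[-1]))
    t_out = np.linspace(0.0, out_t[-1], n_out)
    t_in = np.interp(t_out, out_t, in_t)  # invert the envelope
    return np.round(t_in / in_t[-1] * (n_frames - 1)).astype(int)
```

Stretching a 4-frame, 1-second sound to 2 seconds simply reads each frame twice; a non-linear envelope would instead slow down some regions and speed up others.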

So let's see a particular example of this time-scaling effect, in which we start from a mridangam sound and analyze it with the sinusoidal model. In the top spectrogram, we are seeing the original signal and the trajectories of the sinusoids. Let's listen to the sound first.

[MUSIC]

Okay, and below it is the transformed spectrogram, with the transformed sinusoids. These are the modified sinusoids, which we have spaced differently: we have been reading them at a different speed, and if you look at the horizontal axis, the time axis is very different. The original sound was two seconds long, and this output sound is basically three seconds long, so it has been stretched by a factor of 1.5. What we are seeing here is the output sound, but of course with new time information. In terms of frames, there are many more frames, because the length of each frame remains the same. So let's listen to the resynthesized, modified sound.

[MUSIC]

Okay, so we have changed the duration, but not by a constant scale. In fact, we basically made all the onsets roughly equidistant: we warped the time information so as to place the onsets at different positions, making the sound a little bit longer. Of course, the number of possibilities here is enormous, and we can play around with this mapping in any way we want.

In terms of the other transformation, frequency scaling applied to the sinusoids, we also need a scaling envelope, which is time varying. Here the horizontal axis is time, and the vertical axis is the scaling factor. That means that at the beginning of the sound we are multiplying all the frequencies of the sinusoids by 0.8, and by the end of the sound we are multiplying all the frequencies by 1.2. So the frequencies will start lower and end up higher.
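This particular envelope, ramping from 0.8 to 1.2, can be applied to the sinusoidal tracks in a few lines (an illustrative sketch assuming one row of track frequencies per frame and a linear ramp; the function name is mine):

```python
import numpy as np

def frequency_scale(frame_freqs, sf_start=0.8, sf_end=1.2):
    """Apply a time-varying frequency scaling to sinusoidal tracks:
    the scaling factor ramps linearly from sf_start at the first frame
    to sf_end at the last frame."""
    n_frames = frame_freqs.shape[0]
    sf = np.linspace(sf_start, sf_end, n_frames)
    return frame_freqs * sf[:, np.newaxis]  # scale every track per frame
```

A track sitting at 1000 Hz throughout would start at 800 Hz and end at 1200 Hz, exactly the lower-then-higher behavior described above.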

So let's see that with a particular sound. This is the orchestral sound that we have already heard, and we see here its sinusoidal analysis in the spectrogram, where we see the original spectrogram and the sinusoidal tracks. Below it are the transformed sinusoids, obtained from the curve we just showed. Here we see that at the very beginning the sinusoids are more compact, so they are at lower frequencies, and at the end they are higher, because we have multiplied them by 1.2. So let's listen to this modified orchestral sound.

[MUSIC]

Of course, it doesn't sound natural, because this is not something you would normally do, and there are some distortions. But it can be refined to obtain good results by applying these types of techniques. There is not much published information about the details of how to apply these types of transformations to sounds, but if you look in Wikipedia, you can find some pointers and initial references on sound effects, equalization, and how to apply time scaling and pitch modifications to sounds.

That's all I wanted to say in this first theory lecture. We have gone over some particular transformations using the short-time Fourier transform and the sinusoidal model. In the next theory lecture, we will continue with the other models we have presented and use them for some other types of transformations. So I hope to see you in the next lecture.

Bye-bye.