0:00

One simple yet extraordinarily useful class of probabilistic temporal models is the class of hidden Markov models. Although these models can be viewed as a subclass of dynamic Bayesian networks, we'll see that they have their own type of structure that makes them particularly useful for a broad range of applications.

So, a hidden Markov model in its simplest form can be viewed as a probabilistic model that has a state variable S and a single observation variable O. And so the model really has only two probabilistic pieces: there is the transition model, which tells us how the state transitions from one time point to the next, and there is the observation model, which tells us, in a given state, how likely we are to see different observations.

Now, you can unroll this simple 2-TBN. So if this is our 2-TBN, you can unroll it to produce an unrolled network, which has the same repeated structure: the state at time zero transitions to the state at time one, and so on, and at each state, we make an appropriate observation.
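To make the two pieces concrete, here is a minimal sketch in Python. The state names, matrices, and all probabilities are illustrative assumptions, not values from the lecture: the transition model is a matrix whose rows are distributions over the next state, the observation model is a matrix whose rows are distributions over observations, and "unrolling" corresponds to repeatedly applying them over time.

```python
import numpy as np

# Hypothetical 2-state HMM: states S1, S2; observations O1, O2, O3.
# transition[i, j] = P(S' = j | S = i): each row is a distribution
# over the next state, so each row must sum to 1.
transition = np.array([
    [0.3, 0.7],   # from S1
    [0.4, 0.6],   # from S2
])

# observation[i, k] = P(O = k | S = i): each row is a distribution
# over observations given the current state.
observation = np.array([
    [0.8, 0.1, 0.1],  # in S1
    [0.2, 0.3, 0.5],  # in S2
])

assert np.allclose(transition.sum(axis=1), 1.0)
assert np.allclose(observation.sum(axis=1), 1.0)

# "Unrolling" the 2-TBN: sample a trajectory by repeatedly applying
# the transition model, emitting an observation at each step.
rng = np.random.default_rng(0)
state = 0
for t in range(5):
    obs = rng.choice(3, p=observation[state])
    print(f"t={t}: state S{state + 1}, observed O{obs + 1}")
    state = rng.choice(2, p=transition[state])
```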

But what's interesting about hidden Markov models is that they often have a lot of internal structure, which manifests most notably in the transition model, but sometimes also in the observation model. So here is an example of what a structured transition model would look like, and this entire diagram is actually peering into the CPD of the probability of the next state given the current state. So each of the nodes here is not a random variable; rather, it is a particular assignment to the state variable, a state that the model might be in.

And what you see here is the structure of the transition matrix, which tells us that from S1, for example, the model is likely to transition to S2 with probability 0.7 or stay in S1 with probability 0.3. And these two outgoing probabilities have to sum to one, because it's a probability distribution over the next state, given that in the current time point the model is in state S1. And we similarly have that for all other states; so from S4, for example, there is a probability of 0.9 of going to S2 and 0.1 of staying at S4.
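As a sketch, this kind of sparse transition model can be written as a matrix in which most entries are zero. The S1 and S4 rows below use the probabilities from the example; the S2 and S3 rows are made-up placeholders just to complete the matrix:

```python
import numpy as np

# Sparse transition matrix over states S1..S4.
# Rows for S1 and S4 use the probabilities from the example;
# the S2 and S3 rows are illustrative assumptions.
#                S1   S2   S3   S4
P = np.array([
    [0.3, 0.7, 0.0, 0.0],   # from S1: stay (0.3) or go to S2 (0.7)
    [0.0, 0.5, 0.5, 0.0],   # from S2 (assumed)
    [0.0, 0.0, 0.4, 0.6],   # from S3 (assumed)
    [0.0, 0.9, 0.0, 0.1],   # from S4: go to S2 (0.9) or stay (0.1)
])

# Each row is a distribution over the next state, so it sums to 1.
assert np.allclose(P.sum(axis=1), 1.0)

# Sparsity: only a few transitions out of each state are possible.
print("nonzero transitions per state:", (P > 0).sum(axis=1))  # → [2 2 2 2]
```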

So here, the structure is actually sparsity in the transition model, as opposed to something that manifests at the level of the 2-TBN structure, which is actually fairly simple. It turns out that this kind of structure is useful for a broad range of applications: robot localization; speech recognition, where HMMs are really the method of choice for all current speech recognition systems; biological sequence analysis, for example, annotating a DNA sequence with elements that are functionally important and other elements that are not; and similarly annotating a text sequence, for example, annotating words with their role in the sentence. All of these are applications where hidden Markov models have been used with great success.

So, let's look, for example, at what the HMM would look like for robot localization. This might not look exactly like an HMM to begin with, because it has some additional variables, so let's talk about what each of these variables means. Here, what we have is a state variable that represents the robot pose, that is, the position, and potentially the orientation, of the robot within a map at each point in time. We also have an external control signal u, which is the guidance that the robot is given: move left, move right, turn around. Since these variables are observed and externally imposed, they're not really stochastic random variables; they're just inputs to the system, if you will. And then we have, in addition, the observation variables, which are what the robot observes at each point in the process. These depend both on the robot's position and, of course, on the map, so the robot's observations depend on the overall architecture of the space that it's in as well as on its own state. Now, since in many of the applications that we'll be considering the map is observed, which is why I grayed it out, you can effectively think of this as something where the basic structure over which we're reasoning is just the set of variables that represent the S's and the O's, which is why this is really a slight elaboration of the standard HMM model. And here also, we're going to have sparsity in the transition model, because the robot can't jump around from one state to another within a single step, and so there is only going to be a limited set of possible positions at time T plus one given where the robot is at time T.

Speech recognition, as I mentioned, is perhaps the prototypical HMM success story, because it's so much in common use.

The speech recognition problem is to take a sentence, such as "Dorothy lived in...", and imagine somebody is saying it. What is given as input to the probabilistic reasoning system is a very noisy acoustic signal that represents the values at different acoustic frequencies at each point in time. And what we would like to do is take that input and put it into a decoder that is going to evaluate different possible sentences, and hopefully guess what the source sentence is and be able to predict it with reasonable accuracy.

So how does speech recognition work? This is what an acoustic signal looks like: we can see here that at each point in time we have these frequencies, and we can convert that to the frequency spectrum by using a Fourier transform.
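The conversion from the raw waveform to a spectrum over time can be sketched as a short-time Fourier transform: slice the signal into overlapping frames and take the magnitude of the FFT of each frame. The synthetic tone, frame length, and hop size below are arbitrary choices for illustration, not what a real recognizer uses:

```python
import numpy as np

# Synthetic "acoustic signal": a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate          # 1 second of audio
signal = np.sin(2 * np.pi * 440 * t)

frame_len = 256   # samples per analysis frame (arbitrary choice)
hop = 128         # step between frames

# Short-time Fourier transform: FFT magnitude of each windowed frame.
frames = [signal[i:i + frame_len] * np.hanning(frame_len)
          for i in range(0, len(signal) - frame_len, hop)]
spectrogram = np.array([np.abs(np.fft.rfft(f)) for f in frames])

# Each row of the spectrogram is the frequency spectrum at one point in time.
freqs = np.fft.rfftfreq(frame_len, d=1 / sample_rate)
peak = freqs[spectrogram[0].argmax()]
print(f"dominant frequency in first frame: {peak:.0f} Hz")
```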

And what we would like to do is break up this continuous signal into pieces that correspond to words, and recognize, for each piece, which word it corresponds to. So this is a twofold problem, because in general, in continuous speech, we have to identify the boundaries between words at the same time as we are trying to identify the words themselves. And it turns out that the way to do this is not to think about words as the basic unit, but rather to think about smaller entities, called phones or phonemes, as the basic units from which words are composed; then, as we recognize phones, we can put them together to figure out what the words are.

So, here is a phonetic alphabet, one of several that are used to define how words are broken up into these little pieces. And you can see that these don't line up exactly with characters in the alphabet, because the same character can actually be pronounced in different ways, giving rise to different phones, and vice versa, sometimes the same phone can come from different letters. And so what we see here is, for example, the phone called ER, which is pronounced in the same way as the "ur" in "hurt". And so this is a phonetic alphabet, and, as I said, there are many of those that one can consider.

So once we have the phonetic alphabet, we can look at a word; in this case, this is the phonetic breakdown of the word "nine": n, ay, n. So you can see that this is an HMM structure; this is not the DBN, this is the HMM, which tells us at each point in time whether we're within the phone n or ay, and there is a self-transition loop, because you can stay in the same phone for more than one time unit.

And then, eventually, you transition to the next phone, and then the next, and this is a typical HMM for a word. Now, a phone also lasts a certain amount of time, and so within a given phone there is a transition model as well. In this case, the phone has three states: the beginning, the middle, and the final state. And this is a fairly standard breakdown for most phones: the beginning of the phone, the middle, and the end. And each of these typically corresponds to a different distribution over the acoustic signal that you see at that stage in the phone.

So, if you put all these together, if you're trying to do speech recognition, and this is, for the moment, speech recognition for isolated words, then there is a start state, and there is a transition model from the start state that tells us how likely we are, in the current state, to go into one of many words; that would be a language model, which tells us how likely different words are to occur at this point. And then, given that the model has transitioned to, say, the word "one", we can see that, across different points in time, the model transitions through the w, and then through the subsequent phones, and then finally to the n. And then, at the end of it, it transitions to the end of the word, at which point the process circles back and we move on to the next word. The self-transitions, and the transition back to the start state, are what allow us to do continuous speech recognition.

So this is a probabilistic model that tells us how speech might be constructed: first words, then phones within words, and then finally the little pieces of the phone that we saw in the phone HMM previously. This is a generative model of speech, but what happens is that as you feed in evidence about the observed acoustic signal, and you run probabilistic inference over this model, what you get out is the most likely sequence of words that gave rise to the observed speech signal.
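That decoding step, finding the most likely hidden-state sequence given the observations, is typically done with the Viterbi algorithm. Here is a minimal sketch on a toy two-state HMM; all of the numbers are made up for illustration, not taken from any real speech system:

```python
import numpy as np

# Toy HMM: 2 hidden states, 2 possible observations.
# All probabilities here are illustrative assumptions.
start = np.array([0.6, 0.4])            # P(S0)
trans = np.array([[0.7, 0.3],           # P(S' | S)
                  [0.4, 0.6]])
emit = np.array([[0.9, 0.1],            # P(O | S)
                 [0.2, 0.8]])

def viterbi(obs):
    """Most likely hidden-state sequence for the observation sequence."""
    T, N = len(obs), len(start)
    delta = np.zeros((T, N))            # best log-prob ending in each state
    back = np.zeros((T, N), dtype=int)  # backpointers
    delta[0] = np.log(start) + np.log(emit[:, obs[0]])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(trans)  # scores[i, j]
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(emit[:, obs[t]])
    # Trace the best path backwards from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

print(viterbi([0, 0, 1, 1]))  # → [0, 0, 1, 1]
```

In a speech system the hidden states would be the phone states of the word HMMs, and the observations would be the acoustic feature frames; the same dynamic program applies.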

So to summarize: HMMs can, in a somewhat simplistic way, be viewed as a subclass of the framework of dynamic Bayesian networks. And while they seem unstructured when we observe them at the level of random variables, there is a lot of structure that manifests in the sparsity structure of the conditional probabilities, and also in terms of repeated elements. As we saw in the previous slides, the same phone model can occur in multiple words, and we replicate that structure across the different places where the same phone is used in the language. And so there is a lot of structure in the transition model that really doesn't manifest in any way at the level of random variables. And we've seen that HMMs are used in a wide variety of applications for sequence modeling, and they're probably one of the most used forms of probabilistic graphical models out there.
