One of the most generally useful classes of sampling methods, and one that's very commonly used in practice, is the class of Markov chain Monte Carlo methods. These are methods that allow us to design an intuitive sampling process that, through a sequence of steps, generates a sample from a desired target distribution that might be intractable to sample from directly.
So what are these Markov chains, and how do we use them? A Markov chain is a method for sampling from a distribution P that is intractable to sample from directly. We have seen examples of such distributions P: if you'd like to sample from, say, a Bayesian network where we have some evidence, we don't really know how to do that; and if you'd like to sample from a Markov network, we don't really know how to do that either, in general. So here we have examples of distributions P, and we'd like to come up with a way of generating even one sample from that distribution P. A Markov chain gives us a general mechanism for doing that. What the Markov chain does is define an iterative process in which the first sample that you generate is not going to come from the distribution P, but ultimately, as you move along, you get closer and closer to generating a sample from P.
So let's understand what that means. We have a Markov chain, and a Markov chain is defined over a state space, which we are going to denote with x's. Here is an example of such a state space. It's a very simple one: it has the zero point here in the middle, four negative numbers to the left, and four positive numbers to the right. A Markov chain defines a probabilistic transition model T(x → x') which, given that I'm at a given state x, tells me how likely I am to transition to a different state x'. And that is a probability distribution, as indicated in this formula here: for any x, the sum over all states x' of the probability of transitioning to x' is exactly one.
So, for example, in this case we have our little grasshopper that starts out at state zero. We can see that it has a probability of 0.25 of transitioning to the right, a probability of 0.25 of transitioning to the left, and a probability of 0.5 of not making any progress and staying exactly where it is. And in fact, that same probability distribution is replicated across the different states in the chain in this example, with the exception of the states at the ends: if the poor grasshopper tries to go to the left when it's at negative four, it's going to hit the wall and end up staying where it is regardless. And so the probability of staying in this case is actually 0.75, corresponding to the two different cases that we just talked about: staying put with probability 0.5, or trying to move left and bouncing off the wall with probability 0.25.
But anyway, this is a Markov chain, and you can imagine simulating a random process by which the grasshopper traverses this chain. Initially, it starts out at, say, state zero. And then, at each step, it chooses to move left with probability a quarter, right with probability a quarter, and stay at the same place with probability 0.5. Once it moves, say, one step to the left, it does exactly the same thing from its new state.
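To make this concrete, here is a minimal sketch of that random walk in Python, assuming the nine-state grasshopper chain from the example (states -4 through 4, where a move blocked by the wall turns into staying put); the function names are just illustrative.

```python
import random

STATES = list(range(-4, 5))  # the nine states -4, ..., 4

def transition_probs(x):
    """Transition model T(x -> x') for the grasshopper chain."""
    probs = {x: 0.5}                 # stay put with probability 0.5
    for move in (-1, +1):            # try one step left, one step right
        target = x + move
        if target < -4 or target > 4:
            target = x               # hitting the wall means staying put
        probs[target] = probs.get(target, 0.0) + 0.25
    return probs

def step(x):
    """Sample the next state given the current state x."""
    probs = transition_probs(x)
    states, weights = zip(*probs.items())
    return random.choices(states, weights=weights)[0]

# Simulate one trajectory of the grasshopper starting at state 0.
x = 0
trajectory = [x]
for _ in range(10):
    x = step(x)
    trajectory.append(x)
print(trajectory)
```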
So let's think about the temporal dynamics of this type of process.
We can ask: what is the probability that at time step t + 1 the state is equal to some value x'? We can get that by a recurrence relationship that looks at the state at time t. If we had previously computed the probability distribution over where the grasshopper might be at time t, we can sum over all possible states x where the grasshopper might have been and ask: if it was at state x at time t, what is the probability that it ends up going to x'? In other words, P(X^(t+1) = x') = Σ_x P(X^(t) = x) T(x → x'). The product inside the sum gives me a distribution over pairs (x, x'), which measures the probability that at time t the grasshopper is at state x and at time t + 1 it is at x'; and since I'm only interested now in asking about time t + 1, I sum up, or marginalize, over the time-t state x. So if we go back to our grasshopper example, we can simulate this process.
Here are the first three time steps of it. At time zero, the grasshopper is at state zero with probability one. At time one, it's going to be at negative one with probability a quarter, at positive one with probability a quarter, and with probability a half it's stuck at the same state. And now we can simulate the next step. At the next time step, the probability that it manages to move all the way to -2 is 0.25 squared, because that requires two successful moves to the left; similarly, reaching +2 requires two successful moves to the right. At state zero you have, for example, a sum of different events: the probability that it stayed in the same state twice, which is 0.5 squared, plus the two events that correspond to moving to the left once and then back to the right, or moving to the right and then back to the left, each of which happens with probability 0.25 squared. So this is an example of how you would do this.
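To see the recurrence in action, here is a small sketch that propagates the entire distribution forward one step at a time, reusing the transition_probs function and STATES list from the sketch above; it reproduces the numbers just described, for instance 0.25² = 0.0625 for being at -2 and 0.5² + 2 · 0.25² = 0.375 for being back at 0 after two steps.

```python
def propagate(dist):
    """One application of the recurrence
    P(X(t+1) = x') = sum_x P(X(t) = x) * T(x -> x')."""
    new_dist = {s: 0.0 for s in STATES}
    for x, p_x in dist.items():
        for x_next, t_prob in transition_probs(x).items():
            new_dist[x_next] += p_x * t_prob
    return new_dist

# At time 0 the grasshopper is at state 0 with probability 1.
dist = {0: 1.0}
for t in (1, 2):
    dist = propagate(dist)
    print(f"t={t}:", {s: round(p, 4) for s, p in dist.items() if p > 0})
# t=1: {-1: 0.25, 0: 0.5, 1: 0.25}
# t=2: {-2: 0.0625, -1: 0.25, 0: 0.375, 1: 0.25, 2: 0.0625}
```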
Now, it turns out that for many of these chains, and we'll describe the conditions in a moment, ultimately as the process evolves, the probability distribution sort of equalizes, which means that the probability that at time t you're at state x' is almost the same as the probability that at time t + 1 you're at x'. That is, the distribution over where you might be tends to equalize. And so we can then consider what's called the limiting distribution: the distribution that you would get as you simulate the process for more and more steps. That distribution is typically denoted by π and is called the stationary distribution; it also has a bunch of other names, but stationary is the most common. And if you plug π in, basically turning that approximate equality into an exact one, you can see that what we have is a condition on π: the probability of x' in the stationary distribution needs to be equal to this summation over here, π(x') = Σ_x π(x) T(x → x'), where π now appears both on the left-hand side and within the summation on the right-hand side.
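As a rough illustration of that condition, the sketch below (again reusing the helpers from the earlier sketches) simply keeps applying the recurrence until the distribution stops changing, and then checks that the resulting π satisfies π(x') = Σ_x π(x) T(x → x') up to numerical tolerance. For this particular chain, the distribution equalizes to (approximately) the uniform distribution over the nine states.

```python
# Start from the point mass at state 0 and apply the recurrence until it stops changing.
pi = {s: (1.0 if s == 0 else 0.0) for s in STATES}
for _ in range(10000):
    new_pi = propagate(pi)
    if max(abs(new_pi[s] - pi[s]) for s in STATES) < 1e-12:
        break
    pi = new_pi

# Check the stationarity condition pi(x') = sum_x pi(x) * T(x -> x').
assert all(abs(propagate(pi)[s] - pi[s]) < 1e-9 for s in STATES)
print({s: round(p, 4) for s, p in pi.items()})  # roughly 1/9 ≈ 0.1111 for every state
```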