0:02

So previously we defined the clique tree algorithm, which is just a belief propagation algorithm run over a graph that happens to be a tree. As it turns out, we can exploit the structure of the tree to greatly improve the computational behavior of this algorithm, not just its correctness, which is what we showed before. So let's go back to message passing in this very simple tree, which has just these four clusters, {A,B}, {B,C}, {C,D}, and {D,E}, and let's see what the messages do. So here is the message delta 1,2, which is the message that is passed on this edge, and here is the message delta 2,3, and so on. And these are the six messages that we have. And now we can notice one very important property: delta 1,2, once computed, never changes.

Â 1:10

You get instant convergence of delta 1,2: once you compute it for the very first time, it has converged. Now, what about delta 2,3? Well, that depends on when delta 2,3 is passed. If delta 2,3 is passed first, before cluster two gets a message from cluster one, then once it gets that message things might change. But if cluster two is clever, it waits until that message is passed, and once that message is passed, delta 2,3 doesn't change either, ever, if it's smart enough to wait. What about delta 3,4? Delta 3,4 needs to wait a little bit longer, because it needs to wait for delta 2,3. But if it waits both for delta 1,2 to be passed to cluster two and for delta 2,3 to be passed to cluster three, then once it sends delta 3,4, delta 3,4 doesn't change either. So delta 2,3 has to wait for delta 1,2, and delta 3,4 has to wait for delta 2,3, but assuming each message waits, then convergence is achieved in a single message passing pass. Now, there is nothing magical about going left to right; if you go right to left, you get the exact same behavior. So here, this one converges instantly.

Â 2:58

And this one needs to wait for the preceding message. But again, assuming you wait, the messages converge in a single step, and so we can compute all of the messages in this entire tree in one pass in each direction: one pass from left to right, and one pass from right to left. In the context of chain structures like this one, this actually has its own name; for chains, you might see this under the name forward-backward algorithm, and it is very commonly used in things like the hidden Markov model and other similar chain-structured representations. And so the point is, the total effort needed to compute all of these messages is just one message passing step in each direction: one from left to right, and one from right to left. And the result of that, remember, is that once all messages have been passed, we have beliefs.
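The two-pass schedule on a chain can be sketched concretely. The code below is a minimal illustration, not the lecture's own implementation: it uses made-up random potentials over binary variables for the four clusters {A,B}, {B,C}, {C,D}, {D,E}, runs one forward and one backward pass, and reads off a calibrated belief.

```python
# Minimal sketch of forward-backward message passing on a chain of
# clusters {A,B}, {B,C}, {C,D}, {D,E}. Potentials are illustrative
# random tables over binary variables, not from the lecture.
import numpy as np

rng = np.random.default_rng(0)
psi = [rng.random((2, 2)) for _ in range(4)]  # psi[i] over (X_i, X_{i+1})

# Forward pass: fwd[i] is the message from cluster i+1 to cluster i+2,
# a table over the sepset variable shared with the next cluster.
fwd = [None] * 3
fwd[0] = psi[0].sum(axis=0)                      # sum out A -> message over B
fwd[1] = (fwd[0][:, None] * psi[1]).sum(axis=0)  # multiply in, sum out B
fwd[2] = (fwd[1][:, None] * psi[2]).sum(axis=0)  # multiply in, sum out C

# Backward pass, right to left.
bwd = [None] * 3
bwd[2] = psi[3].sum(axis=1)                      # sum out E -> message over D
bwd[1] = (psi[2] * bwd[2][None, :]).sum(axis=1)  # multiply in, sum out D
bwd[0] = (psi[1] * bwd[1][None, :]).sum(axis=1)  # multiply in, sum out C

# After one pass in each direction, every cluster has its final belief,
# e.g. cluster two over (B, C):
belief2 = fwd[0][:, None] * psi[1] * bwd[1][None, :]
marg_B = belief2.sum(axis=1)
marg_B /= marg_B.sum()  # renormalize the unnormalized measure
```

Each message waits exactly as the lecture describes: fwd[1] is computed only after fwd[0], and bwd[1] only after bwd[2], so every message is final the first time it is computed.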

Â 4:54

And so, more generally, as we said, once Ci receives a final message from all of its neighbors except Cj, if it waits to send the message to Cj until it receives a message from everybody else, then that message delta i,j is also final. So if these are final, this is also final. Can you always find a message passing order that will achieve that goal? I mean, maybe everybody ends up waiting for everybody else, and they all get stuck and never send anybody any messages. The point is that you can always start, because the message from a leaf is always immediately final. And once the message from a leaf is finalized, then you can send the message from the parent of the leaf, and so on and so forth. And because this is a tree, it's guaranteed that you will be able to find a legal message passing order. And if we pass the messages in the right order, then we only need to pass 2(K-1) messages, where K is the total number

Â 6:09

of clusters. And why is that? Because if you have a tree over K nodes, in this case K clusters, then there are K-1 edges. And since each edge carries a message in both directions, we have K-1 edges and therefore 2(K-1) messages, one in each direction. Let's look at a couple of message passing orders to see which ones are going to work and which ones aren't.

So we can start with any leaf; any leaf works, so say we start with, oh, I don't know, C2. Once C2 has passed its message, what other clique can pass a message? Well, C3 can't pass messages yet, because it's still waiting for messages from all sorts of other cliques, so we have to go to another leaf, for example C1; C1 can pass a message now. Neither C3 nor C4 is a candidate for message passing yet, because each of them is still waiting for another message. But C6 can pass a message, and now we have the option of activating C4: C4 can now send a message to C3, because it has everything except the message from C3 itself. C7 can pass a message to C3, and you notice there are a lot of arbitrary decisions in how I ordered this; I could have used C5 next. And now C3 has the option of passing a message to C5. And notice that at this point, everyone has received all of the messages, or all of the messages but one. And so now we can start passing in the other direction. So C5 can pass a message to C3, C3 can pass a message to any of C2, C7, or C4, and at this point, C4 can pass a message to both C6 and C1.
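This leaf-first scheduling can be written as a small algorithm. The sketch below is my own illustration: it tracks, for each directed message (i, j), how many other neighbors i is still waiting on, and releases a message as soon as that count hits zero. The adjacency encodes the seven-clique tree as far as it can be read from the walkthrough (C1 and C6 attached to C4, C2, C7, and C5 attached to C3, and an edge between C3 and C4).

```python
# Sketch of finding a legal message-passing order on a clique tree:
# the directed message (i -> j) may be sent once i has received final
# messages from all of its neighbors except j.
from collections import deque

def legal_order(adj):
    """adj: {node: set of neighbors} for a tree. Returns a list of (i, j)."""
    # pending[(i, j)] = number of messages i still waits for before sending to j
    pending = {(i, j): len(adj[i]) - 1 for i in adj for j in adj[i]}
    ready = deque(e for e, n in pending.items() if n == 0)  # leaf messages
    order = []
    while ready:
        i, j = ready.popleft()
        order.append((i, j))
        # receiving (i -> j) unblocks every message j -> k with k != i
        for k in adj[j]:
            if k != i:
                pending[(j, k)] -= 1
                if pending[(j, k)] == 0:
                    ready.append((j, k))
    return order

# Tree reconstructed from the walkthrough's example (my reading of it):
tree = {1: {4}, 2: {3}, 3: {2, 4, 5, 7}, 4: {1, 3, 6}, 5: {3}, 6: {4}, 7: {3}}
order = legal_order(tree)
# A tree over K cliques has K - 1 edges, hence 2(K - 1) messages.
assert len(order) == 2 * (len(tree) - 1)
```

Because each count only reaches zero after all prerequisite messages have been emitted, every ordering this produces is legal, and on a tree every message is eventually released.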

Â 8:36

What is an example of an illegal message passing order? So, for example, if C1 passes the message to C4, and C4 jumps the gun and says, "I'm going to send the message to C3 next," that is an illegal message passing order, because C4 does not yet have all the information that it needs to pass the message to C3; it's still waiting for a message from C6. So that's not a good ordering. Clique trees have other very elegant computational properties that make them a useful data structure. So, let's think about the kinds of queries that one can answer using clique trees; some of these are fairly obvious, but others are a little bit more subtle. So, imagine that we wanted to answer a posterior distribution query on either a single variable or a set of variables that appear together in a clique. Well, in this case, you can take any clique that contains all of those variables, and sum out the variables that you don't care about from the clique to get a posterior over just those variables. Now, remember that that posterior is P tilde Phi, an unnormalized measure, so in order to get the normalized measure you actually need to renormalize.
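As a tiny worked example of that query, assuming an illustrative unnormalized belief table over a clique (C, D) that is not from the lecture:

```python
# Sketch of answering a posterior query from a calibrated clique: take any
# clique containing the query variable, sum out the rest, and renormalize,
# since beliefs represent the unnormalized measure. Numbers are made up.
import numpy as np

belief_CD = np.array([[3.0, 1.0],
                      [2.0, 2.0]])   # unnormalized belief over (C, D)

marg_C = belief_CD.sum(axis=1)       # sum out D, the variable we don't care about
posterior_C = marg_C / marg_C.sum()  # renormalize to a distribution
# posterior_C -> [0.5, 0.5]
```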

Â 9:58

Now, you could also do something a little bit more clever. Say you wanted to introduce new evidence regarding some variable and query another variable. This is a form of incremental inference, where you've already calibrated the clique tree, and you say, oh, wait a minute, I made another observation; let's see what happens now. So it turns out that clique trees are really good for that, and we're going to divide this into two cases: the easy case and the slightly less easy case. So, imagine that we want to query a variable X that appears in the same clique as my new evidence Z. So here is my clique, and here is Z, and here is X, and I want to figure out what the posterior of X would be if I further observe Z. Well, this is actually really easy, because what I have here is P tilde Phi. Let's assume for the moment it's just Z and X; if there are other variables there, it doesn't change anything, because we can just get rid of them. We can now condition, or reduce, the clique (this is a clique reduction) by just restricting attention to the entries that are consistent with my evidence Z = z. And then I end up with P tilde Phi

Â 11:44

of little z, X, and in order to get a posterior, I can sum out the remaining variables and then normalize.
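That easy case, reducing a clique belief to the rows consistent with the evidence and renormalizing, can be sketched with an illustrative table (the numbers and the binary variables are my own, not the lecture's):

```python
# Sketch of the easy incremental-inference case: X and the new evidence
# variable Z live in the same clique. Reduce the belief to the entries
# consistent with Z = z, then renormalize. Values are made up.
import numpy as np

belief_ZX = np.array([[4.0, 4.0],     # rows: Z = 0, 1; columns: X = 0, 1
                      [1.0, 3.0]])

z = 1                                 # newly observed evidence Z = 1
reduced = belief_ZX[z, :]             # keep only entries consistent with Z = z
posterior_X = reduced / reduced.sum() # renormalize
# posterior_X -> [0.25, 0.75]
```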

Well, that's the easy case, where the two variables are together in a clique. What about a somewhat more elaborate case, where I introduce the new evidence and query a variable that doesn't appear in the same clique as Z? So, using an example, let's imagine that what I did is observe evidence on A, and I want to query, say, E. Actually, let's do a different example: let's imagine that I want to query B.

Â 12:30

Well, in this case I can do the following. I can multiply this indicator function, or what we call this reduction of the factor, into a cluster that contains the evidence. In this case, we're going to multiply in the indicator function of A equals little a, which corresponds to removing all entries in the beliefs of cluster one that are not consistent with that evidence: we multiply it into psi one, removing all entries that are inconsistent with it. So now, what happens when we change psi one? Imagine that we're doing this entire computation from scratch. That is, forget everything that we did before, and think about what would happen to this model in the hypothetical case that we were passing messages with the new psi one instead of the old psi one. Well, some messages would change. Which messages would change? Delta 1,2 would change, because psi one changed. Delta 2,3 would change, because delta 1,2 changed. Delta 3,4 would change, but we don't care about delta 3,4 if we're interested in this belief over here. What about other messages? Delta 4,3 doesn't change; neither does delta 3,2, and neither does delta 2,1. And so only a subset of the messages in the clique tree changes as a result of this change, which means that we can reuse at least half the messages, if not more, in computing our beliefs about, in this case, cluster three. And the only messages that we need to recompute are the messages that lie along the path from the changed clique to a clique that contains the variable X that we want to query, which in this case is cluster three.
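That "recompute only the path" rule is easy to state as code. The sketch below, with a clique numbering and a chain of my own choosing, finds the directed messages from the clique whose factor changed to the clique we want to query; every other message can be reused.

```python
# Sketch of which messages must be recomputed after evidence changes psi_1:
# only the directed messages along the path from the changed clique to the
# query clique; all other stored messages remain valid.
def stale_messages(adj, changed, query):
    """Return the directed messages (i -> j) along the path changed -> query."""
    def find_path(node, target, seen):
        # depth-first search for the unique path in the tree
        if node == target:
            return [node]
        seen.add(node)
        for nxt in adj[node]:
            if nxt not in seen:
                rest = find_path(nxt, target, seen)
                if rest:
                    return [node] + rest
        return None

    path = find_path(changed, query, set())
    return list(zip(path, path[1:]))

# A four-cluster chain like the lecture's example:
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
# Evidence multiplied into cluster 1, query in cluster 3:
print(stale_messages(chain, changed=1, query=3))  # [(1, 2), (2, 3)]
```

For this chain the result matches the lecture's reasoning: delta 1,2 and delta 2,3 must be recomputed, while delta 4,3, delta 3,2, and delta 2,1 are reused.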

Â 15:07

So, to summarize: if we have a clique tree with K cliques, and we pass messages in a careful order, which means that we start at the leaves and propagate inward, then we can construct an ordering such that 2(K-1) messages suffice to compute all of the beliefs in the clique tree, which means we get a posterior over every single variable in the model. Contrast that with the computational cost of running variable elimination N times to compute posteriors over N different variables; here, you get a considerable computational savings, in fact a computational savings on the order of the number of variables. And so what we can do is compute all the marginals at only twice the cost of variable elimination, as opposed to N times the cost of variable elimination. And we've also shown that by caching, or storing, the messages, we can reuse inference, for example in the context of incremental queries, which provides us with a useful data structure that we can keep updating as we obtain additional evidence.
