0:01

We previously used the variable elimination algorithm as a way of proving the correctness of the clique tree algorithm.

It turns out that this relationship can also be used to answer another fundamental question regarding clique trees, which is the question of where clique trees come from, or rather how we might construct a good clique tree. In order to do that, we're going to look at a run of variable elimination, and we're going to see how we might reinterpret it as message passing in a clique tree.

So let's look at the variable elimination algorithm and try to take a clique tree view of what it's doing.

So the variable elimination algorithm, as a reminder, takes a series of steps, where in each step it creates a large factor, which for the purposes of this discussion we're going to call lambda i. It does that by taking a bunch of factors and multiplying them together. When that is done, we eliminate a variable from lambda i, and that gives us a new, smaller-scope factor, which we're going to call tau i. And tau i is put into the pool of factors that we have available to us, which means that at some point we're going to use it in the context of computing another large factor, lambda j, when we multiply a bunch of factors together.

So now let's think about this process in terms of passing messages between computational entities, in the same way that we had when we were doing message passing algorithms.

So we're going to view these intermediate factors, the lambda i's, as the cliques in a clique tree. And a tau i that is produced in the context of one clique and utilized by a different clique, we're going to consider to be a message. So these smaller factors that are produced by one lambda and consumed by another lambda can be viewed as messages, passed from the clique that corresponds to the first lambda and received by the clique that corresponds to the second.

Summarizing, the variable elimination algorithm defines a graph. We're going to have a cluster Ci for every factor lambda i, and we're going to draw an edge between Ci and Cj if the factor tau i that was generated by the computation that produced lambda i is used in the computation of lambda j. And so we can think of it as Ci, which corresponds to lambda i, passing a message to Cj, which corresponds to lambda j. And that message is the tau i that was produced by the lambda i computation.
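To make this concrete, here is a minimal sketch, in Python, of the construction just described; the function name and representation are my own, not the lecture's. It works with factor scopes only, since the structure of the induced graph doesn't depend on the actual factor values:

```python
# Sketch: record the cluster graph induced by a run of variable
# elimination, tracking only factor scopes (not factor tables).

def ve_cluster_graph(factor_scopes, elimination_order):
    """factor_scopes: scopes (sets of variables) of the original phis.
    elimination_order: the variables to eliminate, one per step.
    Returns (clusters, edges): cluster i is the scope of lambda_i, and
    edge (i, j) means tau_i was consumed when computing lambda_j."""
    # Pool entries are (scope, producer); producer is the index of the
    # step that produced the tau, or None for an original factor.
    pool = [(set(s), None) for s in factor_scopes]
    clusters, edges = [], []
    for step, var in enumerate(elimination_order):
        used = [entry for entry in pool if var in entry[0]]
        pool = [entry for entry in pool if var not in entry[0]]
        # lambda_step: product of all pooled factors mentioning var.
        lam_scope = set().union(*(scope for scope, _ in used))
        clusters.append(lam_scope)
        for _, producer in used:
            if producer is not None:      # a tau was consumed here
                edges.append((producer, step))
        # tau_step: lambda_step with var summed out, back into the pool.
        tau_scope = lam_scope - {var}
        if tau_scope:
            pool.append((tau_scope, step))
    return clusters, edges
```

Run on the enhanced student network's factors with the elimination ordering C, D, I, H, G, S, L, this reproduces the clusters CD, GID, GSI, HGJ, GJSL, JSL, JL and the edges between them (with 0-based step indices, whereas the lecture counts the lambdas from 1).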

So now let's look at this in the context of an example. We're going to go through the variable elimination process that we had for our network earlier, the enhanced student network.

The first variable elimination step multiplies together the C factor and the CD factor; this is my lambda 1. And lambda 1 produces a clique whose scope is CD, and so that's this clique over here.

The next step eliminates D, and the way in which it does that is it takes tau 1 and multiplies it by one of my original factors, phi G. So now we have a lambda 2; this is my lambda 2. And that gives me a clique whose scope is GID. And since GID used tau 1, we draw an edge between the CD clique and the GID clique.

Â 4:03

The next step in the analysis that we did before eliminated the variable I, and again used two of the original factors, phi I and phi S. And it also used tau 2. And so now we have a factor lambda 3 whose scope is G, S, and I, because those are the variables in this scope over here. And the message GI that we see here, on this edge between GID and GSI, corresponds to this tau 2 that was produced by lambda 2 and consumed by lambda 3.

The next elimination step didn't use any of the factors produced by other cliques; it just used one of the original factors, phi H. And so note that its clique is currently not connected to anything; it's just sitting there all by its lonesome.

And the next step now gets to use factors that were produced in two separate computations. So notice that this computation, for lambda 5, which is this guy, used both tau 3 and tau 4, which is why we have both an edge from cluster 3 and an edge from cluster 4.

The next step that we took marginalized out S; it multiplied in tau 5 and one of the original factors, the one over J, L, and S. And so now we have a message here that is passed between cluster 5 and cluster 6.

And finally there was the marginalization of L, and so we have a JL cluster.

Now if you look at this graph, at this clique tree, you'll notice that it's bigger than the clique tree that we had for this network to begin with. And the reason for that is that we actually have three adjacent cliques where one is a subset of the other. So JL is a subset of JSL, which in turn is a subset of GJSL.

Now if you think about it, there's really no point to including three cliques when one is a subset of the other, because you might as well do all of the computation in the first clique to begin with. There is no value in breaking it out into separate data structures. And so a typical post-processing step is to go ahead and remove redundant cliques: we basically collapse them, and their factors, into the biggest clique that subsumes them.
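This collapsing step can be sketched as follows. The helper below is hypothetical, not the lecture's own code, and for simplicity it merges only the tree structure, leaving implicit the reassignment of the absorbed cliques' factors to the subsuming clique:

```python
# Sketch: post-process a cluster tree by absorbing any clique whose
# scope is contained in an adjacent clique's scope into that neighbor.

def collapse_redundant(clusters, edges):
    """clusters: list of scope sets; edges: pairs of cluster indices.
    Returns the reduced (clusters, edges) with indices renumbered."""
    clusters = [set(c) for c in clusters]
    edges = {frozenset(e) for e in edges}
    changed = True
    while changed:
        changed = False
        for e in list(edges):
            a, b = tuple(e)
            if clusters[b] <= clusters[a]:
                a, b = b, a               # make a the subset, if either is
            if clusters[a] <= clusters[b]:
                edges.discard(e)          # drop the a--b edge
                for f in list(edges):     # reattach a's neighbors to b
                    if a in f:
                        (k,) = tuple(f - {a})
                        edges.discard(f)
                        if k != b:
                            edges.add(frozenset((k, b)))
                clusters[a] = None        # a has been absorbed into b
                changed = True
                break
    # Renumber the surviving cliques and their edges.
    mapping, keep = {}, []
    for old, c in enumerate(clusters):
        if c is not None:
            mapping[old] = len(keep)
            keep.append(c)
    new_edges = sorted(tuple(sorted(mapping[v] for v in f)) for f in edges)
    return keep, new_edges
```

On the tree from our example, this collapses JSL and JL into GJSL, leaving the five cliques CD, GID, GSI, HGJ, GJSL.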

So in this case, since we had the original factor over J, L, and S, which was used in lambda 6, we're now going to stick that in clique 5 instead.

And so we've seen how we can take a run of variable elimination and use that to produce what looks like a clique tree. But is it a clique tree? So what properties can we prove about this output of a variable elimination run?

Well, first is the fact that it is, in fact, a tree. If we want it to be a clique tree, it had better be a tree. Why is it a tree?

Because in variable elimination, every intermediate factor tau, once you produce it, gets consumed exactly once. Once it gets consumed, remember, in the variable elimination algorithm it gets taken out of the set of factors, and then it disappears, because it gets multiplied in. And so if you think about this as a directed tree, where each tau feeds up into the single lambda that consumes it, it means that every cluster has at most one parent. And in fact, every cluster except the last one has exactly one parent, because eventually everything has to be consumed, since everything is multiplied together. And so this variable elimination process induces a directed tree, where we subsequently forget about the directionality of the edges.
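In this directed view, where an edge (i, j) records that tau i was consumed by lambda j, the argument reduces to a degree count. A hypothetical sketch of the check:

```python
# Sketch: in the directed view of a VE run, edge (i, j) means tau_i was
# consumed by lambda_j. Taus only flow to strictly later steps, and
# every cluster except the final one has exactly one outgoing edge.

def is_ve_directed_tree(n_clusters, directed_edges):
    out_degree = [0] * n_clusters
    for i, j in directed_edges:
        if j <= i:                # a tau cannot feed an earlier step
            return False
        out_degree[i] += 1
    # One root (the final computation), everyone else has one parent.
    return (out_degree.count(0) == 1 and
            out_degree.count(1) == n_clusters - 1)
```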

Â 9:00

The second property, which is again not difficult to show, is that the tree is family preserving. And that is because each of my original factors phi in my original factor set big Phi was used somewhere in the variable elimination process; it must have been used at some point, because eventually I'm multiplying all factors together. And therefore it must have been multiplied into one of the lambda i's. And that lambda i by force has a scope that contains the scope of phi, which is what we need for the family preservation property.
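Family preservation is also easy to check mechanically; a hypothetical sketch:

```python
# Sketch: family preservation holds when each original factor's scope
# is contained in the scope of at least one clique in the tree.

def is_family_preserving(clusters, factor_scopes):
    return all(any(set(phi) <= set(c) for c in clusters)
               for phi in factor_scopes)
```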

Â 10:00

The final property we would like to show holds is that the tree obeys the running intersection property. And so, as a reminder, that property says that if you have two clusters Ci and Cj with a variable X that is present in both, then X needs to be in each cluster and each sepset on the unique path between Ci and Cj.

So first of all, let's convince ourselves that the running intersection property holds here. And we've already seen that, because this is actually the familiar clique tree that we had for this example; in fact, we produced that clique tree by running variable elimination, using exactly the process that I showed you before.

Now let's show that, however, more generally. So we want to prove to ourselves that if T is a tree of clusters that was induced by a run of variable elimination, then T obeys the running intersection property. And now remember that this tree came from a directed run, and so what I'm showing you here is actually a directed tree, where we see how the clusters communicate. This is not the same tree that we had in our example; it's a more complicated one, to make the theorem more interesting to show.

So in one of those clusters, and in exactly one of those clusters, the variable X is eliminated, because this is a run of variable elimination: every variable is eliminated exactly once. So let's imagine that X is eliminated here. So this clique over here is the last one to contain X: C4 has X in it; X is in C4, and X is not in C6.

Now let's take any other cluster that contains X. So, for example, assume that cluster 7 contains X. Well, if cluster 7 contains X, then X wasn't eliminated on this edge, because it's eliminated exactly once. That means X needs to be in the sepset, which means it was in the message tau 7 that C7 produced: X is in the scope of tau 7. Which means that X also needs to be in the scope of C3, because tau 7 was multiplied in, and if it was multiplied in, then it has to be in that scope.

And by continuing exactly the same inductive argument, we know that since X has to be in C3, it also needs to be in the next sepset, and so on. And so we've shown that X needs to be on the path from C7 to C4.

And now let's take a different cluster that contains X, for example C5. We can similarly show that X needs to be present at every step along the path from C5 to C4. And since this is a tree, and the unique path between C7 and C5 is contained in the union of their two paths to C4, that means X is also on the path between C7 and C5.
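An equivalent, standard formulation of running intersection is that, for every variable X, the set of cliques containing X forms a connected subtree. A hypothetical sketch of checking this mechanically:

```python
# Sketch: check running intersection by verifying that, for every
# variable, the cliques containing it form a connected subtree.
from collections import defaultdict

def satisfies_rip(clusters, edges):
    """clusters: list of scope sets; edges: index pairs forming a tree."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    for x in set().union(*clusters):
        holders = {i for i, c in enumerate(clusters) if x in c}
        # Walk the tree from one holder, staying inside cliques with x.
        start = next(iter(holders))
        seen, frontier = {start}, [start]
        while frontier:
            u = frontier.pop()
            for v in adj[u]:
                if v in holders and v not in seen:
                    seen.add(v)
                    frontier.append(v)
        if seen != holders:   # x's cliques are not connected
            return False
    return True
```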

Â 13:42

So to summarize: a run of variable elimination implicitly defines a correct clique tree. Which means that, if you want to get a clique tree that has the required properties, family preservation and running intersection, you can simply simulate a run of variable elimination.

What does it mean to simulate? It doesn't mean that we run variable elimination, because that would be an expensive process. We just figure out what factors I would multiply, what scopes those products would have, and what factor gets used where. And that gives me the cliques and the connections between them. And this is the exact process that we went through in the simple example that we had.

And that clique tree has a computational cost which is essentially equivalent to the cost of running the variable elimination algorithm, because if you think about the sizes of the cliques that are produced, they correspond exactly to the sizes of the factors produced by variable elimination. So these costs are directly comparable to each other, except that, as we showed before, the clique trees use dynamic programming, that is, storing or caching of the messages, so that we don't need to recompute the same message more than once. And that allows us to produce marginals over all of the variables in the network at only twice the cost of running variable elimination.

So, taking yet one more step backward, what that means is that if you care to get the values of the marginals for multiple variables, not necessarily even all of them, one efficient way of doing that is to take variable elimination, do the simulation to figure out what clique tree we want to build, and then use that clique tree to compute all of the marginals by appropriate message passing.

And we talked, for variable elimination, about how to use heuristic solutions to construct a good variable elimination ordering, one that hopefully gives us relatively small factors along the way. And that exact same set of heuristics is also used for constructing a small, efficient clique tree, to the extent that that's possible given our graph.
