0:00

We've shown that the clique tree algorithm has some pretty cool properties: it guarantees that we achieve the correct marginal at every single clique, and therefore that these marginals necessarily agree with each other. We've also seen that these marginals can be computed using a single upward and downward pass over the clique tree, so a fairly efficient computation. But we also note that the problem of inference in probabilistic graphical models is an NP-hard problem, and so somewhere over there there must be a catch; that is, there must be a computational cost that occurs, at least in certain cases, when we run the clique tree algorithm. And what we're going to do now is try and understand where that hidden cost might lie.

Â So Well where going to do so in order to

Â analyze the computation complexity of a clique tree we're going to define a

Â little bit of notation. so let's consider an edge IJ in the

Â cligue tree T and let's divide a little bit of notation we're going to divide up

Â the variables into three groups there's this group so here what we have is a

Â clique tree here's CI and CJ. There adjacent kleeks and they're

Â connected to each other via steps at S, I, J and then there's this whole kleek

Â tree, a whole bunch of kleeks on the left side and a whole bunch of kleeks on the

Â right side. We're going to divide the variables into

Â these three groups: W, less, W that's on the I side of the I, J edge are the

Â variables that are just. In this, group here not counting the sub

Â set, which is going to appear in both sets.

Â W that's on the J side, is over there, and SIJ, is the stuff that's in the

Â middle, so these are three, mutually exclusive, and exhaustive groups, that

Â partiiton all the variables in, the tree. And now, here is a,

an interesting and quite enlightening theorem. It says that the clique tree T satisfies the running intersection property if and only if, for every sepset S_ij, the variables on the left side of the edge are independent of the variables on the right side of the edge given the variables in the middle of the edge. That is, these variables, the sepset, separate the left side from the right side.
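To make the notation concrete, here is a small sketch in Python of the three-way partition for one edge of a clique tree. The clique list below is my reconstruction of the student-network clique tree used in this lecture's slides; it is an assumption for illustration, not taken verbatim from them.

```python
# Hypothetical clique tree for the student network: each clique is a set of
# variables; `edges` lists the tree adjacencies between clique ids.
cliques = {
    1: {"C", "D"},
    2: {"D", "I", "G"},
    3: {"G", "I", "S"},
    4: {"G", "S", "L", "J"},
    5: {"G", "J", "H"},
}
edges = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

def partition(i, j):
    """Split all variables into W<(i,j), S_ij, W<(j,i) for tree edge (i, j)."""
    sepset = cliques[i] & cliques[j]
    def side(start, blocked):
        # Collect the variables of all cliques reachable from `start`
        # without crossing over to `blocked` (a DFS on the tree).
        seen, stack, collected = {blocked}, [start], set()
        while stack:
            c = stack.pop()
            if c in seen:
                continue
            seen.add(c)
            collected |= cliques[c]
            stack.extend(edges[c])
        return collected - sepset
    return side(i, j), sepset, side(j, i)

w_left, s_ij, w_right = partition(3, 4)
print(w_left, s_ij, w_right)  # three mutually exclusive, exhaustive groups
```

For the edge between cliques 3 and 4 the sepset comes out as {G, S}, the left group as {C, D, I}, and the right group as {H, J, L}, matching the concrete example discussed next.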

Â 2:39

Now, remember that the running intersection property was the critical property that we used to prove correctness of the clique tree algorithm. And so coming up with this condition, which tells us exactly when running intersection holds, is important, because this is the defining property for all of those nice behaviors that we talked about earlier in terms of the clique tree algorithm. So let's try and convince ourselves of this by looking at a concrete example first.

Let's look at the clique tree that we have over here, and let's for example focus on this sepset, which has the variables G and S. The variables on the left side of that sepset, excluding the variables G and S, are I, D, and C. On the other hand, the variables on the right side of this sepset, again excluding G and S, are H, J, and L. So now let's look at where these variables place themselves on the graph structure over here, which is the induced graph corresponding to the Markov network that we derive from the factors in this Bayesian network. So the Bayesian network's CPDs produce factors, and the factors give us this induced Markov network. So where are the variables G and S? Those are over here. Where are the variables on the left hand side? These are the blue variables, C, I, D, and they're sitting over there. And the green variables, H, J, and L, are sitting over here. And simple inspection can show us that there are no paths, or trails, between the blue variables and the green variables that do not go through the red variables. So we conclude from that that the variables C, I, and D are separated

Â 4:34

from H, J, and L given G and S. And from the connection between the graph structure and independence in Markov networks, we can conclude from that that this independence property holds; that is, that C, I, and D are independent of H, J, and L given G and S. Which is exactly what we wanted to show: that the sepset separates the variables on the left, the blue variables, from the variables on the right, the green variables. Let's try and give a slightly more general argument for this, one that isn't just demonstrating it in the context of a particular example. So let's ignore for a moment the concrete letters inside this example, and just think about why this is going to be true. So let's imagine that this is now a generic sepset, and this is it over here. And we'd like to prove that all the variables on the green side are independent of all the variables on the blue side. Now let's assume otherwise; this is going to be a proof by contradiction. If this were not the case, then that means that in this induced Markov network there needs to be some dependence between the two sides.

Â 6:26

Which means there needs to be a path in the induced Markov network between the blue side and the green side, so between W<(i,j) and W<(j,i). Well, if there exists a path that goes from one side of this graph to the other, it means there eventually has to be an edge where one node is on one side and one node is on the other. So that means there needs to be some edge that goes from the blue side to the green side. Now notice, and I forgot to say, this path that exists in the induced Markov network doesn't

Â 7:27

go through the sepset, because otherwise the sepset would separate it. So it doesn't go through S_ij.

So there needs to be an edge that goes, for example, from here to there; and it doesn't matter which pair of nodes I pick. There needs to be some node on the green side and some node on the blue side, this is the green side and this is the blue side, and an edge that goes between them that doesn't go through the sepset. But now that implies that there needs to be some factor that involves those two variables, so in this case E and H. But now, because of family preservation, that factor must sit in one of the cliques in this clique tree. And that clique is either on the green side or the blue side; it has to be somewhere. So let's assume, without loss of generality, that we put that clique on the green side. Now look at what happens: we have an H that's sitting here, and remember H is a blue variable, and H is also sitting here, because it is on the blue side. So now we have an H in one clique and an H in the other, and the running intersection property tells us that H needs to be everywhere in between, and specifically it needs to be in the sepset, which is a violation of the assumption, either of the running intersection property or of the assumption that the variable H is not in the sepset. And so that's a somewhat formal proof outline that can, with a little bit of extra effort and notation, be made into a rigorous proof of why the running intersection property implies this independence assumption. And I didn't prove the other direction, because this is actually the direction that we care about more.
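The separation claim in the earlier concrete example can also be checked mechanically. The adjacency list below is my reconstruction of the induced Markov network for the student example (an assumption, obtained by moralizing the usual student Bayesian network, not copied from the slide), and the check is a plain breadth-first search that refuses to step into the sepset:

```python
from collections import deque

# Reconstructed (assumed) induced Markov network for the student example.
graph = {
    "C": {"D"},
    "D": {"C", "G", "I"},
    "I": {"D", "G", "S"},
    "G": {"D", "I", "L", "H", "J"},
    "S": {"I", "L", "J"},
    "L": {"G", "S", "J"},
    "J": {"S", "L", "H", "G"},
    "H": {"G", "J"},
}

def separated(graph, left, right, sepset):
    """True iff every trail from `left` to `right` passes through `sepset`."""
    seen = set(sepset)          # never step into the sepset
    queue = deque(v for v in left if v not in seen)
    while queue:
        v = queue.popleft()
        if v in right:
            return False        # found a trail that avoids the sepset
        if v in seen:
            continue
        seen.add(v)
        queue.extend(graph[v] - seen)
    return True

print(separated(graph, {"C", "I", "D"}, {"H", "J", "L"}, {"G", "S"}))  # True
print(separated(graph, {"C", "I", "D"}, {"H", "J", "L"}, set()))       # False
```

With {G, S} removed, the blue variables C, I, D can no longer reach the green variables H, J, L, which is exactly the separation the theorem demands.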

So we started out this whole discussion by saying that these properties have computational implications. But where are these computational implications? What can we conclude from the fact that the sepset needs to separate the graph into two conditionally independent pieces? Well, it turns out that in many graphs this implies a certain minimal complexity, one that can sometimes be quite large. So let's look at this in the context of two quite simple but very commonly used examples.

So here is an example of what's called a complete bipartite graph, where we have two sets of variables with no edges within each set separately, but where all of the cross edges are present. This is a structure that we've actually seen before. We saw it in the context of the model for course difficulty and student intelligence, where we had a bunch of difficulty variables and a bunch of intelligence variables, and there were no edges between the difficulties, or between the intelligences, but for every difficulty-intelligence pair there was an edge, induced by the V-structure corresponding to an observed student grade, between the corresponding course difficulty and student intelligence.

Now, what is the smallest sepset that we could possibly construct for this graph? Can we, for example, just take, say, these two A's and separate the graph into two conditionally independent pieces? Well, no, not really, because if we now look at these two B's, we see that you can connect them via any one of the other A's that I didn't include in the sepset, and so this doesn't decouple the graph at all. With a little bit of extra thought, it's not difficult to convince oneself that the smallest set we could construct that actually breaks up the graph into meaningful pieces is either all the variables on the one side or all the variables on the other. Which means the smallest sepset

Â 12:00

for this graph, in any kind of meaningful clique tree, has size greater than or equal to min(k, m), where k is the number of variables on one side and m is the number on the other.
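We can brute-force this bound on a small instance. The sketch below builds a complete bipartite graph with k = 3 and m = 4 and searches for the smallest vertex set whose removal disconnects what is left; the variable names and sizes are illustrative choices of mine, not taken from the lecture:

```python
from itertools import combinations
from collections import deque

def complete_bipartite(k, m):
    # Two sides A and B with every cross edge and no within-side edges.
    a = [f"A{i}" for i in range(k)]
    b = [f"B{i}" for i in range(m)]
    graph = {v: set() for v in a + b}
    for x in a:
        for y in b:
            graph[x].add(y)
            graph[y].add(x)
    return graph

def disconnected_after_removal(graph, removed):
    """True iff graph minus `removed` keeps >= 2 nodes and is not connected."""
    rest = [v for v in graph if v not in removed]
    if len(rest) < 2:
        return False
    seen, queue = {rest[0]}, deque([rest[0]])
    while queue:
        v = queue.popleft()
        for u in graph[v] - removed:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return len(seen) < len(rest)

def min_cut_size(graph):
    # Smallest vertex set whose removal disconnects the graph.
    nodes = list(graph)
    for size in range(len(nodes)):
        for cut in combinations(nodes, size):
            if disconnected_after_removal(graph, set(cut)):
                return size
    return len(nodes)

print(min_cut_size(complete_bipartite(3, 4)))  # 3, i.e. min(k, m)
```

No set of one or two vertices disconnects the graph; only removing an entire side (here, the smaller side of 3 A's) breaks it apart, matching the min(k, m) lower bound.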

A slightly less obvious example, but one that also imposes some very significant lower bounds, is in the context of a grid, such as we encountered in the context of the Ising model, or, you know, when doing image analysis, where the variables correspond to pixels. And now let's try and think about how we might break up this graph into conditionally independent pieces. Now, we can construct clique trees that have small sepsets. For example, here's a sepset that breaks away A11 from everything else. But notice that that still leaves me a very large everything else. But if we try, for example, to break up the graph so that A11 appears on the one side and A44 appears on the other, then any clique tree that you give me that has this separation, with A11 on the one side and A44 on the other, has to have

Â 13:23

a sepset of size greater than or equal to n, where this is an n-by-n grid. Which means that if you try and break up the n-by-n grid in a way that puts one corner on one side and one corner on the other, then you're going to have a sepset that's at least the dimension of the grid. And breaking it up in other ways doesn't make it any better.
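One way to see a bound of this flavor concretely: in an n-by-n grid, any set of vertices separating the left column from the right column must contain at least n vertices, because there are n vertex-disjoint row paths between them. The brute-force check below verifies this for a 3x3 grid; the left-column versus right-column framing is my own illustration of the idea, not the lecture's exact construction:

```python
from itertools import combinations
from collections import deque

n = 3
nodes = [(r, c) for r in range(n) for c in range(n)]

def neighbors(v):
    # 4-connected grid neighborhood.
    r, c = v
    steps = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(x, y) for x, y in steps if 0 <= x < n and 0 <= y < n]

left = {(r, 0) for r in range(n)}       # left column of the grid
right = {(r, n - 1) for r in range(n)}  # right column of the grid

def separates(cut):
    """True iff no left-column vertex reaches the right column avoiding `cut`."""
    seen = set(cut)
    queue = deque(v for v in left if v not in seen)
    while queue:
        v = queue.popleft()
        if v in right:
            return False
        if v in seen:
            continue
        seen.add(v)
        queue.extend(u for u in neighbors(v) if u not in seen)
    return True

# The smallest separating set has exactly n vertices (e.g. the middle column).
smallest = min(size for size in range(1, n * n + 1)
               if any(separates(set(cut)) for cut in combinations(nodes, size)))
print(smallest)  # 3, i.e. n
```

Any cut of fewer than n vertices misses at least one of the n disjoint row paths, so some left-column vertex still reaches the right column; the middle column, of size exactly n, does separate.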

So here are two cases where we can place a lower bound on the size of the sepset that is required for running clique tree inference, and that is where we pay the exponential blowup that is in some sense required by the fact that the problem is intrinsically NP-hard. So to summarize: we've shown previously

that the correctness of the clique tree inference algorithm relies on having the running intersection property, and we've now shown in turn that the running intersection property implies certain separation properties on the original distribution. These separation properties, in turn, can be used to analyze the complexity of inference in different graphs, and provide certain minimal bounds on the complexity that would have to be incurred by the best possible clique tree on those graphs. And we've already seen before a notion of minimal complexity, which is the minimal induced width of the graph. This notion is a little bit different, because it talks about sepsets, whereas the induced width really talks more about cliques. But these are both different ways of analyzing a certain minimal complexity that has to be incurred by exact inference, which again is related to the NP-hardness of the problem.