0:20

And the actual marginalization. And so what we're going to do now is count the operations used by each of them.

So, let's start with the factor product, and first let's remind ourselves of what a factor product is doing. The factor product is taking, in this case for example, a factor whose scope is AB and one whose scope is BC, and producing a factor whose scope is ABC. Now let's think about how each of the numbers in this result is produced. Each of these is a product, in this case of two numbers: one that comes from this table and one that comes from that table. We have to produce every one of the rows in this new table, this new factor, so let's call Nk the number of rows in this new table. And how many operations do we need to produce each such row? Well, if we need to multiply in, in this case, mk different factors, that's mk - 1 multiplications for each row.
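To make that counting concrete, here is a minimal sketch (my own representation, not the lecture's): each factor is a pair of a scope tuple and a table mapping assignments to values, and the product tallies one multiplication per extra factor per row, for a total of Nk times (mk - 1).

```python
from itertools import product

def factor_product(factors, card):
    """Multiply a list of factors and count the multiplications used.
    Each factor is (scope, table): scope is a tuple of variable names,
    table maps assignment tuples to values; card gives each variable's
    number of values."""
    # Union of scopes, preserving first-seen order.
    scope = []
    for s, _ in factors:
        for v in s:
            if v not in scope:
                scope.append(v)
    mults = 0
    table = {}
    for assignment in product(*(range(card[v]) for v in scope)):
        a = dict(zip(scope, assignment))
        val, first = 1.0, True
        for s, t in factors:
            entry = t[tuple(a[v] for v in s)]
            if first:
                val, first = entry, False
            else:
                val *= entry  # one multiplication per additional factor
                mults += 1
        table[assignment] = val
    return (tuple(scope), table), mults

# phi1(A,B) times phi2(B,C), all variables binary (d = 2)
card = {"A": 2, "B": 2, "C": 2}
phi1 = (("A", "B"), {(a, b): 0.5 for a in range(2) for b in range(2)})
phi2 = (("B", "C"), {(b, c): 0.5 for b in range(2) for c in range(2)})
(tau_scope, tau), mults = factor_product([phi1, phi2], card)
print(len(tau), mults)  # Nk = 8 rows, 8 * (2 - 1) = 8 multiplications
```

The result over ABC has Nk = 2^3 = 8 rows, and with mk = 2 factors each row costs one multiplication, matching the Nk(mk - 1) count.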

Â 2:28

Each one gets added to only one of the rows in the new factor. And so a simple upper bound on the number of additions that we need to do is simply the size of this factor, Nk.

So now let's total up the computational complexity of variable elimination. Let's assume that we start with m factors. For Bayesian networks, m is really effectively n, because we have one factor, the CPD, for every variable. And the reason I wrote less than or equal is because of the reduction by evidence. Now, for Markov networks this number could be larger. If you think of a grid Markov network or, say, a fully connected pairwise Markov network, the number of factors can actually be larger than the number of variables. So that's why we state the complexity in terms of m as opposed to in terms of n.

Now, that's the set of factors that we start out with; then what happens as we run an elimination step? An elimination step picks some of those factors and generates another factor, and each elimination step generates exactly one factor.

Â 3:50

How many elimination steps do we have? Well, each elimination step corresponds to the elimination of one variable, so we have at most n elimination steps. So the total number of factors that we ever produce, which we're going to call m*, is going to be at most m, the set of initial factors, plus the newly generated factors, which is at most n. And so altogether, m* is less than or equal to m plus n.

So, now that we've figured that out, let's look at what the complexity of the algorithm is in terms of various key quantities. N is the size of the largest factor that I ever create, which is the max of these different Nk's that I had. So now, how many product operations do we have? Well, remember that we had a sum over the different elimination steps, the sum over k of Nk(mk - 1), and this was the number of product operations that we have. But now, here is the critical observation: each factor is multiplied in at most once.

Â 5:10

As soon as we multiply it in, it goes away. Which means that the sum over k of (mk - 1) is at most the total number of factors. Said otherwise, we can write that this is less than or equal to N times the sum over k of (mk - 1), and this is less than or equal to N times m*, because the sum is at most the total number of factors in my universe of factors. What about the number of summation operations? Well, here, this is less than or equal to the sum over k of Nk, which is less than or equal to N times the number of elimination steps, which is simply less than or equal to N times n. So altogether, between these two steps, over here we have N times m*, and over here we have N times n, which tells us that the total work that we have is linear in N and in m*.
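To see these two bounds in action, here is a small sketch with made-up per-step counts (the specific numbers are purely illustrative, not from the lecture's example):

```python
# Hypothetical per-step counts (Nk, mk) from some run of variable
# elimination: Nk = rows in the factor generated at step k, mk = number
# of factors multiplied together at that step.
steps = [(8, 3), (16, 2), (8, 2), (4, 2)]

n = len(steps)                     # number of elimination steps
N = max(Nk for Nk, _ in steps)     # size of the largest factor
m_star = 5 + n                     # say m = 5 initial factors, plus one new factor per step

products = sum(Nk * (mk - 1) for Nk, mk in steps)
additions = sum(Nk for Nk, _ in steps)

# Each factor is multiplied in at most once, so sum_k (mk - 1) <= m*,
# and each Nk <= N, giving the two linear bounds from the lecture:
assert products <= N * m_star
assert additions <= N * n
print(products, N * m_star)  # 44 144
print(additions, N * n)      # 36 64
```

Both counts stay under their bounds, and both bounds are linear in N once m* and n are fixed.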

Great. Linear time, aren't we lucky? Well, not quite. Because Nk, which is the contribution to this quantity N, is the total number of values in a factor. And so if we say, just for simplicity, that all variables have d values, then a factor with rk variables in its scope has d to the power rk values, so Nk is exponential in the number of variables in its scope.
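A quick sketch of how fast d to the rk grows, even for binary variables (d = 2):

```python
# If every variable takes d values, a factor over r variables has d**r
# rows, so the "linear in N" bound hides a term exponential in the size
# of the largest scope.
d = 2
for r in (4, 6, 10, 20):
    print(r, d ** r)  # 4 16, 6 64, 10 1024, 20 1048576
```

So "linear in N" is only reassuring if we can keep the largest scope small; that is exactly what elimination ordering is about.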

Â 7:41

So let's understand how this complexity manifests in the context of a real example. This is the run of variable elimination that we did before, just written out all in one slide. And so now let's see what the complexity of this is. We see that we have produced several factors here, so let's write down how many variables are in the scope of each of these factors: this one has two. This one has three: G, I, and D. This one has three: S, G, and I. This one has three: H, G, and J. This one has four: L, G, S, and J. And this one has three: J, L, and S.
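Reading those scope sizes into a quick sketch (sizes only, assuming every variable takes d = 2 values):

```python
# Number of variables in the scope of each factor generated in this run,
# in the order listed in the lecture: 2, 3, 3, 3, 4, 3.
scope_sizes = [2, 3, 3, 3, 4, 3]
d = 2  # assume binary variables for simplicity
rows = [d ** r for r in scope_sizes]
print(max(scope_sizes), max(rows))  # largest scope has 4 variables -> 16 rows
```

The cost of the run is dominated by that single four-variable factor.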

And so the largest factor is this one, which has four variables in it. And that is what is going to, in general, drive the complexity of the algorithm; not in an example as simple as this, but in more realistic examples.

So now let's understand how elimination

ordering plays into this. We previously said that variables can be eliminated in any order, so long as we're careful about multiplying things in at the appropriate time. But now let's see how elimination order might affect the complexity. So assume that in this example I'm going to make a not very judicious decision, and I'm going to start by eliminating G. Which factors do I need to multiply in, in order to eliminate G? Well, phi L of L and G, phi G of G, I, and D, and phi H of H, G, and J. And so if I multiply all these together, it turns out that I now end up with a factor whose scope is, let's see, L, G, I, D, H, and J, so a total of six variables, whereas before the largest factor that we ever generated had four variables. So maybe you might say, six versus four, not a big deal; how much difference does it really make? So let's convince ourselves that in other

graphs it might make a bigger difference. So here is a graph: it's a simple pairwise Markov network with A and C, and then a bunch of variables in the middle, B1 up to Bk. And now imagine that I start by eliminating A first. Well, what are the factors that involve A? We have a factor over A, B1; one over A, B2; one over A, B3; up to A, Bk. And the total scope of the factor that we generate is going to be A, B1, up to Bk. So it's going to be exponential in k: the size of the factor is exponential in k. Maybe this is inevitable.

Well, no. So let's imagine that, instead, we're going to eliminate the Bi's first. Let's say, for example, that we're going to start by eliminating B1. Well, B1 is in a factor with A, and in a factor with C. So we're going to end up with a product of, say, phi A1 of A, B1 times phi C1 of C, B1, and that's going to give me a factor whose scope is A, B1, and C. And the result of summing out B1 is going to be a factor tau 1 of A and C. We're going to get the exact same behavior when we now eliminate B2, and that's going to give me a factor tau 2 of A and C, and so on and so forth. Until, at the very end, I'm going to have a bunch of factors tau i of A and C that are all multiplied together. And I've done this without ever generating a factor whose size is bigger than three.
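The contrast between the two orderings can be checked with a sketch that tracks only factor scopes, not values; the function `max_scope` and the variable names are mine, not the lecture's:

```python
def max_scope(order, factors):
    """Run the scope bookkeeping of variable elimination: at each step,
    collect all factors mentioning the variable, record the size of the
    product's scope, and replace them with the marginalized factor."""
    factors = [set(f) for f in factors]
    worst = max(len(f) for f in factors)
    for var in order:
        involved = [f for f in factors if var in f]
        if not involved:
            continue
        product_scope = set().union(*involved)
        worst = max(worst, len(product_scope))
        factors = [f for f in factors if var not in f]
        factors.append(product_scope - {var})
    return worst

k = 10
# Pairwise network A - Bi - C for i = 1..k: one factor per edge.
factors = [("A", f"B{i}") for i in range(1, k + 1)] + \
          [("C", f"B{i}") for i in range(1, k + 1)]
bs = [f"B{i}" for i in range(1, k + 1)]

print(max_scope(["A", "C"] + bs, factors))  # A first: scope {A, B1..Bk}, size k + 1 = 11
print(max_scope(bs + ["A", "C"], factors))  # Bi's first: never more than 3
```

Same model, same answer, but the largest scope is k + 1 variables under one ordering and three under the other, which is the difference between exponential and constant-size factors here.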
