0:00

So far, we've defined graphical models primarily as a data structure for encoding probability distributions. We talked about how you can take a probability distribution and, using a set of parameters that are tied to the graph structure, represent a distribution over a high-dimensional space in factored form. It turns out that one can view the graph structure in a graphical model from a completely complementary viewpoint: as a representation of the set of independencies that the probability distribution must satisfy. That theme turns out to be really enlightening and thought-provoking, so let's talk about it. We're going to begin by defining the notion of independence that we'll utilize in subsequent presentations. So let's start by defining the very basic notion of independence within a probability distribution.

Initially, we're just going to talk about the independence of events alpha and beta within a probability distribution, and let me just go ahead and introduce this notation: P ⊨ α ⊥ β. The symbol ⊨ is the logical symbol for "satisfies," and the perpendicular symbol ⊥ is standard notation for independence. So this says "P satisfies: alpha is independent of beta." That's how one should read that statement. And there are actually three entirely equivalent definitions of the concept of independence. The first one says that the probability of the conjunction of the two events (there are several different ways to denote conjunction; some people denote it as an intersection, but we typically denote it using a comma) is simply the probability of alpha times the probability of beta. That is, the probability of alpha and beta both holding satisfies P(α, β) = P(α) P(β).

That's the first definition. The second definition is about flow of influence. It says: if you tell me beta, it doesn't affect my probability of alpha. So the probability of alpha given the information about beta is the same as the probability of alpha if you don't give me that information: P(α | β) = P(α). And because probabilistic influence is symmetrical, we also have the exact converse of that: the probability of beta given alpha is the same as the probability of beta, P(β | α) = P(β).

So this is independence of events, and you can take that exact same definition and generalize it to the independence of random variables. Here we're going to read this in the exact same way: P ⊨ X ⊥ Y says that P satisfies "X is independent of Y," for two random variables X and Y. And once again we have the exact same set of definitions. The first one says that P(X, Y) = P(X) P(Y). The second says that P(X | Y) = P(X), and the third that P(Y | X) = P(Y). You can read each of these statements in two different but equivalent forms. The first is as a universal statement.

Â 3:10

So for example, you could read the first statement as saying: for every assignment of values little x and little y to the variables X and Y, we have that P(X = x, Y = y) = P(X = x) P(Y = y). So you can think of it as a conjunction of lots and lots of individual statements of this form. That's the first interpretation. The second interpretation is as an expression over factors. That is, it tells me that the factor that is the joint distribution over X, Y is actually a product of two lower-dimensional factors: one factor whose scope is X, and one factor whose scope is Y. These are all equivalent definitions, but each of them has a slightly different intuition, so it's useful to recognize all of them.
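As a quick sanity check, the three equivalent definitions can be verified numerically on a small joint table. The numbers below are made up purely for illustration; they're chosen so that X and Y come out independent:

```python
# Joint distribution over two binary variables X and Y, chosen so that
# X and Y are independent. These numbers are illustrative only.
p_xy = {
    (0, 0): 0.24, (0, 1): 0.36,   # entries for X = 0
    (1, 0): 0.16, (1, 1): 0.24,   # entries for X = 1
}

# Marginals P(X) and P(Y), obtained by summing out the other variable
p_x = {x: p_xy[(x, 0)] + p_xy[(x, 1)] for x in (0, 1)}
p_y = {y: p_xy[(0, y)] + p_xy[(1, y)] for y in (0, 1)}

close = lambda a, b: abs(a - b) < 1e-9

# Definition 1: P(X, Y) = P(X) P(Y) for every assignment
def1 = all(close(p_xy[(x, y)], p_x[x] * p_y[y]) for x in (0, 1) for y in (0, 1))

# Definition 2: P(X | Y) = P(X);  Definition 3: P(Y | X) = P(Y)
def2 = all(close(p_xy[(x, y)] / p_y[y], p_x[x]) for x in (0, 1) for y in (0, 1))
def3 = all(close(p_xy[(x, y)] / p_x[x], p_y[y]) for x in (0, 1) for y in (0, 1))

print(def1, def2, def3)  # prints: True True True
```

As the lecture notes, all three conditions hold or fail together; the code just spells out each one separately.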

So let's look at some examples of independence. Here is a fragment of our student network. It has three random variables: intelligence, difficulty, and grade, and this is a probability distribution whose scope covers all three variables. But we can go ahead and marginalize it to get a probability distribution over the scope {I, D}, that is, a factor over the scope I, D. As it happens, this is the marginal distribution, which you can confirm for yourselves by just adding up the appropriate entries. So, just as a reminder, to get the entry for i0, d0 we're going to add up this one, this one, and that one, and that's going to give us this factor. And it's not difficult to test that if we then go ahead and marginalize P(I, D) to get P(I) and P(D), then P(I, D) is the product of these two factors.
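The marginalization and product check described above can be sketched in code. The CPD numbers below are hypothetical, not the ones on the slide, but since the joint is built as P(I) P(D) P(G | I, D), the check comes out the same way:

```python
# Hypothetical CPDs for a fragment of the student network: I -> G <- D.
# Numbers are illustrative, not the ones from the lecture slides.
p_i = {"i0": 0.7, "i1": 0.3}
p_d = {"d0": 0.6, "d1": 0.4}
p_g_given_id = {  # P(G | I, D) over three grades g1, g2, g3
    ("i0", "d0"): {"g1": 0.30, "g2": 0.40, "g3": 0.30},
    ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.70},
    ("i1", "d0"): {"g1": 0.90, "g2": 0.08, "g3": 0.02},
    ("i1", "d1"): {"g1": 0.50, "g2": 0.30, "g3": 0.20},
}

# Joint P(I, D, G) in factored form
joint = {
    (i, d, g): p_i[i] * p_d[d] * p_g_given_id[(i, d)][g]
    for i in p_i for d in p_d for g in ("g1", "g2", "g3")
}

# Marginalize G away to get P(I, D): the (i0, d0) entry, for example,
# adds up the three grade entries, as in the lecture.
p_id = {}
for (i, d, g), p in joint.items():
    p_id[(i, d)] = p_id.get((i, d), 0.0) + p

# Check that P(I, D) = P(I) * P(D), i.e. P satisfies I ⊥ D
independent = all(
    abs(p_id[(i, d)] - p_i[i] * p_d[d]) < 1e-9 for i in p_i for d in p_d
)
print(independent)  # prints: True
```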

So here is a good example of a distribution that satisfies an independence property. And here is the graphical model; when you look at it, you can see that there are no direct connections between I and D, and we'll talk later about how that structure tells us about the independencies in this distribution.

Now, independence by itself is not a particularly powerful notion, because it holds only very rarely. That is, only in very few cases are you going to have random variables that are truly independent of each other, at least in the interesting cases; you can always construct examples. So now we're going to define a much broader notion of much greater usefulness, which is the notion of conditional independence.

Conditional independence, which applies equally well to random variables or to sets of random variables, is written like this: P ⊨ (X ⊥ Y | Z). Here we have, once again, the "P satisfies"; here we have, again, the independence sign; but now we also have a conditioning sign. And this is read as: P satisfies "X is independent of Y given Z," okay? And once again, we have three, not identical, sorry, three equivalent definitions of this property.

The first says that the probability of X, Y given Z is equal to the probability of X given Z times the probability of Y given Z: P(X, Y | Z) = P(X | Z) P(Y | Z). Once again, you can view this as a universally quantified statement over all possible values of X, Y, and Z, or as a product of factors.

Definition number two is a definition in terms of information flow: given Z, Y gives me no additional information that changes my probability of X, that is, P(X | Y, Z) = P(X | Z); or, given Z, X gives me no additional information that changes my probability of Y, P(Y | X, Z) = P(Y | Z). Once again, you can view this as an expression involving factors.

Notice that this is very analogous to the definitions that we had for just plain old independence. Z effectively never moves: it always sits there on the right-hand side of the conditioning bar. And so if you find yourself having a hard time remembering conditional independence, just remember that the thing you're conditioning on just sits there on the right-hand side of the conditioning bar, all the time.
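Definition one lends itself directly to a numerical check. Here is a minimal sketch, assuming the joint is given as a table mapping (x, y, z) triples to probabilities; the function name and the tiny test distribution are mine, not from the lecture:

```python
# Minimal sketch of a conditional-independence check, P ⊨ (X ⊥ Y | Z),
# using definition one: P(X, Y | Z) = P(X | Z) P(Y | Z) for all values.
def conditionally_independent(joint, tol=1e-9):
    """joint maps (x, y, z) triples to probabilities; True iff X ⊥ Y | Z."""
    # Marginals needed to form the conditional probabilities
    p_z, p_xz, p_yz = {}, {}, {}
    for (x, y, z), p in joint.items():
        p_z[z] = p_z.get(z, 0.0) + p
        p_xz[(x, z)] = p_xz.get((x, z), 0.0) + p
        p_yz[(y, z)] = p_yz.get((y, z), 0.0) + p
    # Check P(x, y | z) == P(x | z) * P(y | z) for every assignment
    return all(
        abs(p / p_z[z] - (p_xz[(x, z)] / p_z[z]) * (p_yz[(y, z)] / p_z[z])) < tol
        for (x, y, z), p in joint.items()
    )

# Tiny illustrative joint where X and Y are independent given Z,
# constructed as P(z) * P(x | z) * P(y | z) with made-up numbers.
p = {}
for z, pz in [(0, 0.5), (1, 0.5)]:
    for x, px in [(0, 0.9 if z == 0 else 0.2), (1, 0.1 if z == 0 else 0.8)]:
        for y, py in [(0, 0.3 if z == 0 else 0.6), (1, 0.7 if z == 0 else 0.4)]:
            p[(x, y, z)] = pz * px * py
print(conditionally_independent(p))  # prints: True
```

Note that, as the lecture says, Z never moves: every term in the check conditions on the same z.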

Â 8:28

Let's look at an example of conditional independence. Imagine that I give you two coins, and I'm telling you that one of those coins is fair and the other one is biased: it's going to come up heads 90% of the time. But they look the same. So now you have a process by which you first pick a coin out of my hand, and then you toss it twice. So this is which coin you pick, and these are the two tosses.

Now, let's think about dependence and independence in this example. If you don't know which coin you picked, and you toss the coin and it comes out heads, what happens to the probability of heads in the second toss? It'll be higher, right? Because if it came up heads the first time, you're more likely to be holding the biased coin. Heads happens 50-50 with the fair coin, but it happens with greater probability with the biased coin, and so the probability of getting heads in the second toss is going to be higher now.

On the other hand, if I now tell you, no, no, you've picked the fair coin, then you don't really care what the outcome of the first toss is. It doesn't tell you anything about the probability of the second toss. Similarly, if I tell you that it's the biased coin, it also doesn't tell you anything at that point. The first toss and the second toss are no longer correlated. And so what we have is that X1 and X2 are not independent, so P does not satisfy X1 ⊥ X2. But we do have that P satisfies

Â 10:13

X1 is independent of X2 given C: P ⊨ (X1 ⊥ X2 | C). So here's a very simple and intuitive example of conditional independence. Let's go back to another example of conditional independence, one in a distribution that we've seen before.

This is actually a very analogous model, because it also has one common cause, which in this case is the student's intelligence. This is the student example that we've seen before: there are two things that emanate from intelligence, the student's grade in the course and their SAT score. And, once again, you can generate the joint distribution over I, S, G, which is this, and now you can look at the probability of S and G given, for example, i0, and ask yourselves how that decomposes: whether it is the product of the probability of S given i0 and the probability of G given i0.

Â 11:35

Now, one somewhat counterintuitive property of independence, which you kind of don't think about when you hear about conditional independencies for the first time, is that conditioning on something doesn't just gain you independencies, as it did in the case of the coin or in the case of the intelligence. Rather, conditioning can also lose you independencies. So this is the other fragment of our student network, where we had the intelligence and the difficulty both influencing the grade. And we have already seen that although I and D are independent in the original distribution, they are not independent when we condition on the grade. So this is a case, and you can convince yourselves of this by examining this distribution over here, where I and D are not independent in this conditional distribution, even though they were in the marginal distribution.
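This loss of independence can also be checked numerically. The sketch below uses hypothetical CPDs for I, D, and G (the specific numbers are mine, not the slide's); any joint built as P(I) P(D) P(G | I, D) is marginally independent in I and D, but conditioning on a grade generally couples them:

```python
# Losing independence in the fragment I -> G <- D: I and D are marginally
# independent, but dependent once we condition on G. CPD numbers are
# illustrative only, not the ones from the lecture.
p_i = {"i0": 0.7, "i1": 0.3}
p_d = {"d0": 0.6, "d1": 0.4}
p_g1 = {  # P(G = g1 | I, D): a high grade is likeliest when smart and easy
    ("i0", "d0"): 0.30, ("i0", "d1"): 0.05,
    ("i1", "d0"): 0.90, ("i1", "d1"): 0.50,
}

# Distribution over (I, D) after conditioning on G = g1
unnorm = {(i, d): p_i[i] * p_d[d] * p_g1[(i, d)] for i in p_i for d in p_d}
total = sum(unnorm.values())
post = {k: v / total for k, v in unnorm.items()}

# Conditional marginals P(I | g1) and P(D | g1)
post_i = {i: post[(i, "d0")] + post[(i, "d1")] for i in p_i}
post_d = {d: post[("i0", d)] + post[("i1", d)] for d in p_d}

# Marginally independent by construction, but not given G = g1:
factorizes = all(
    abs(post[(i, d)] - post_i[i] * post_d[d]) < 1e-9 for i in p_i for d in p_d
)
print(factorizes)  # prints: False, conditioning on the grade couples I and D
```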
