0:00

So far we've focused solely on the global structure of the distribution: the fact that you can take it and factorize it as a product of factors that correspond to subsets of variables. But it turns out that there are also other types of structure that you might want to encode, and that is actually really important for real-world applications. So to motivate that, let's look at the tabular representation of conditional probability distributions, which is what we've used universally until now in the examples that we've given.

So a tabular representation is one of these examples where we have a row for each assignment. Just as a reminder, this is G with parents I and D, and here we have a row for each assignment to the parents, explicitly enumerating all of the entries that correspond to the probabilities of the variable G. So this is great, but what's the problem? It sounds like a perfectly reasonable and very understandable representation.

Well, that's great, but now let's consider a more realistic example. For instance, in a medical application we might have a variable corresponding to cough. Well, there are lots and lots of things that might make you cough. You might have pneumonia, or the flu, or tuberculosis, or bronchitis, or just the common cold. And by the time you finish enumerating all of the many things that might make you cough, you realize that a variable such as cough or fever might have as many as ten, or fifteen, or even twenty parents.

So when you think about what the implications of that are, relative to a tabular CPD, you realize that if we have k parents, and let's assume for the moment that they're all binary just for simplicity, then the number of entries in the CPD grows as two to the k, or O of two to the k, depending on the number of values of the child variable. Unfortunately, these situations are more common than not, and that means that tabular representations are really not suitable for a lot of real-world applications.
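To make the blow-up concrete, here is a small sketch (my own illustration, not part of the lecture) that counts the rows a tabular CPD would need as the number of binary parents grows:

```python
def tabular_cpd_rows(num_parents: int, parent_card: int = 2) -> int:
    """Rows in a tabular CPD P(X | Y1, ..., Yk): one row per joint
    assignment of the k parents, each row holding a distribution over X."""
    return parent_card ** num_parents

# With 20 binary parents, as in the cough example, the table already
# has over a million rows.
for k in (2, 10, 20):
    print(k, tabular_cpd_rows(k))
```

Each row also stores one probability per value of the child, which is where the extra factor for the child's cardinality comes from.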

So we have to think beyond that. Fortunately, there is nothing in the definition of a Bayesian network that imposes on us a fully specified table as the only representation of a conditional probability distribution. The only thing we need is that the CPD, P(X | Y1, ..., Yk), specify a distribution over X for each assignment y1, ..., yk. And it can do that completely implicitly: it can do that as a little bit of C code that looks at y1, ..., yk and prints out a distribution over X. In fact, we can use any function, parameterized however we like, written as a C routine, or anything else, to specify a factor over the scope X, Y1, ..., Yk, subject to the constraint that it is a probability distribution over X; that is, when you sum over all the values of X for a given assignment y1, ..., yk, it has to sum to one. Anything that satisfies these constraints is a legitimate representation of a CPD.
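As a sketch of that idea (in Python rather than C, with entirely made-up probabilities): any function mapping a parent assignment to a distribution over X is a legitimate CPD, so long as each returned distribution sums to one.

```python
def cough_cpd(pneumonia: bool, flu: bool) -> dict:
    """An implicit CPD P(Cough | Pneumonia, Flu) written as code,
    not as a table. The numbers are hypothetical, for illustration only."""
    p = 0.9 if (pneumonia or flu) else 0.05
    return {True: p, False: 1.0 - p}

# The defining constraint: for every assignment of the parents,
# the probabilities of the child's values sum to one.
for pn in (False, True):
    for fl in (False, True):
        assert abs(sum(cough_cpd(pn, fl).values()) - 1.0) < 1e-9
```

Note that this function describes the same information as a four-row table, but in constant space, and it would stay constant-size even with twenty parents.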

Like I said, fortunately, we don't usually have to resort to C code to specify CPDs. The framework of statistics has defined for us a multitude of different representations of a conditional probability distribution given a set of conditioning variables. Some examples include deterministic CPDs, where X is a deterministic function of Y1, ..., Yk.

We've already seen a couple of examples of that. We can define, and we will define, CPDs that have the form of what's called a decision tree or regression tree, a framework that some of you have seen before. You can think of CPDs that are logistic functions, or more generally, log-linear models. We're going to talk about things that are called noisy ORs and noisy ANDs, which are noisy versions of deterministic CPDs. And then, in the continuous case, we also have frameworks that allow us to represent the probability distribution of a continuous variable given a set of continuous or discrete parents.
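As a preview of the noisy-OR model mentioned above (a standard parameterization, sketched here with made-up numbers): each parent that is true independently "fails" to activate X with some probability, so X is false only if every active cause fails, plus an optional leak term for causes not modeled.

```python
def noisy_or(lambdas, leak, y):
    """P(X = true | y) under a noisy-OR CPD.
    lambdas[i]: probability that parent i, when true, activates X.
    leak: probability that X turns on even with no active parents.
    y: tuple of booleans, one per parent."""
    p_all_fail = 1.0 - leak
    for lam, yi in zip(lambdas, y):
        if yi:
            p_all_fail *= 1.0 - lam
    return 1.0 - p_all_fail

# With no active parents, only the leak can turn X on.
print(round(noisy_or([0.8, 0.6], leak=0.1, y=(False, False)), 3))  # 0.1
```

The appeal is that k parents need only k + 1 parameters instead of 2^k rows.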

And that is really critical because, obviously, when you have a variable that takes on a continuum of values, you can't possibly write down a table that lists every single one of them. Now, one of the things that is intertwined with the notion of structure within a CPD is a useful notion called context-specific independence, and it turns out that this notion arises in some of the representations of CPDs that we're going to talk about.

Context-specific independence is a type of independence where we have a set of variables X, a set of variables Y, a set of variables Z, and then a particular assignment c, which is an assignment to some set of variables, big C. So this is conditioning on a specific assignment: it is an independence statement that only holds for particular values of the conditioning variables C, as opposed to all values of the conditioning variables C. Now, the definition of this is exactly as we have seen before, except that if you remember, when we were doing conditional independence given Z, the Z stayed on the right-hand side of the conditioning bar everywhere. Well, now Z and c stay on the right-hand side of the conditioning bar in all of these forms of the independence statement. So let's look at why context-specific independence might arise when we have particular internal structure within a CPD.

Let's consider the case that X is a deterministic OR of Y1 and Y2. And the question is: what forms of context-specific independence hold when X is this deterministic OR? So let's look at these statements one at a time. Here we have that Y2 is false. What happens when Y2 is false? Well, when Y2 is false, X is the same as Y1, in which case obviously they're not going to be independent. On the other hand, what if Y2 is true? If Y2 is true, I don't care about Y1 anymore, because if Y2 is true, X is true. So here we have a notion of context-specific independence. What about this one?

I'm now asking about Y1 being independent of Y2 given the two possible values of X. What happens when X is false? What do I know about Y1 and Y2? They're both false. Well, if they're both false, then they're independent of each other, because if you tell me that one of them is false, I already know the other one is false; so they're independent. And so this is another context-specific independence that holds here. Does it hold if I tell you that X is true? No, because I don't know which of Y1 and Y2 made X true. And so this last one doesn't hold. So here are two context-specific independencies that hold in the context of a deterministic CPD that wouldn't necessarily hold for a general-purpose CPD where X depends on Y1 and Y2.
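These statements can be checked by brute force (my own sketch, with arbitrary priors on Y1 and Y2): enumerate a joint distribution in which X is a deterministic OR of Y1 and Y2, then test independence within each context.

```python
from itertools import product

# Joint distribution: Y1, Y2 independent priors (made-up numbers),
# X a deterministic OR of Y1 and Y2.
P_Y1, P_Y2 = 0.3, 0.6
joint = {}
for y1, y2 in product((0, 1), repeat=2):
    x = int(y1 or y2)
    p = (P_Y1 if y1 else 1 - P_Y1) * (P_Y2 if y2 else 1 - P_Y2)
    joint[(x, y1, y2)] = joint.get((x, y1, y2), 0.0) + p

def prob(pred):
    """Total probability of assignments (x, y1, y2) satisfying pred."""
    return sum(p for asg, p in joint.items() if pred(*asg))

def independent_in_context(a_idx, b_idx, c_idx, c_val):
    """Check A _|_ B in the context C = c_val, where a_idx, b_idx, c_idx
    index the coordinates (0 = X, 1 = Y1, 2 = Y2)."""
    pc = prob(lambda *v: v[c_idx] == c_val)
    if pc == 0:
        return True  # vacuous context
    for a, b in product((0, 1), repeat=2):
        pab = prob(lambda *v: v[a_idx] == a and v[b_idx] == b
                   and v[c_idx] == c_val) / pc
        pa = prob(lambda *v: v[a_idx] == a and v[c_idx] == c_val) / pc
        pb = prob(lambda *v: v[b_idx] == b and v[c_idx] == c_val) / pc
        if abs(pab - pa * pb) > 1e-9:
            return False
    return True

print(independent_in_context(0, 1, 2, 0))  # X _|_ Y1 | Y2 = false: False
print(independent_in_context(0, 1, 2, 1))  # X _|_ Y1 | Y2 = true:  True
print(independent_in_context(1, 2, 0, 0))  # Y1 _|_ Y2 | X = false: True
print(independent_in_context(1, 2, 0, 1))  # Y1 _|_ Y2 | X = true:  False
```

The two statements from the lecture come out true, and the two counterparts come out false, exactly as argued.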

Â