We described the belief propagation algorithm as passing message over cluster

graph but we left unspecified how that cluster graph would be constructed and

what properties it needs to satisfy in order to support reasonable message

passing. So let's remind ourselves what cluster

graphs are. A cluster graph is an undirected graph

whose nodes are clusters that involve subsets of variables and edges involve a

subset SIJ, which is a subset of the two clusters on

the twin points. What are some properties that the cluster

graph has to satisfy? The first property is the one called

family preservation an obvious value. where remember that we need, given a set

of factors phi, we need to be able to assign each phi K to some cluster,

C alpha K, such that the fact, such that the cluster can accommodate.

The scope. So can accommodate 5K.

oops accommodate.. Well, in order for the cluster graph to

allow this to be done, we need to have an appropriate cluster in the cluster graph.

So, this imposes a constraint on the cluster graph that says that for every

factor phi K in my set of factors phi, there exists some cluster CI that, such

that CI accommodates phi K, which means that you can put phi k inside

this cluster. That's the family preservation property.

The second property is a little bit trickier to understand.

It's called the Running Intersection Property.

The Running Intersection Property, let's first read the definition and then

understand what it says. It says that let's assume that we have a

pair of clusters CI and CJ and a variable X that sub, that belongs to both of them.

So for example we might have the variable X sitting here in C7, and we might have

the variable X sitting here, in C5. This property says that there exists a,

exists a unique path between Ci and Cj for which all clusters and subsets along

the path contain x. What does that mean?

It means that for example should I choose this to be my unique path.

It means that X needs to sit here, here and here.

So, there is a connecting path that involves X and that path is along the

entire route and that path is unique. So let's try and understand the intuition

between both sides of this definition, the existence and the uniqueness.

Imagine that I, let's do the existence first.

And imagine that for whatever reason I decide that there is not going to be a

path over here. So there is no way for C7 to communicate

to C3 about the variable X. Well, that means that we now have these

two separate, isolated communities, each of which has some information about X and

they can never talk to each other about X.

So they're never going to get to agree about X, there's never going to be any

information transfer, of these two pieces of information.

Well, so that's not very good, which is why we need, that path to exist.

Okay, so that's the exists part. What about the unique part?

The unique part is a little bit trickier to understand but let's but let's try and

provide some intuition for it. Let's imagine that I have two paths that

involved X. Let's so now let's think about this

message passing algorithm. So C3 can send the message.

C5 with information about X. For example, I think X is taking the value one, I have

a, I have a factor that suggests that X takes the value 1.

Well, C5, you know, integrates that with it's own information and sends it on to

C2. Which sends it back to C3.

And now C3 hears from C2 that ooh, X needs to take the value one.

Huh? That reinforces its beliefs that X needs

to take the value one and so the probability goes up.

It now goes back and sends that information on to C5, which sends it on

to C2, which sends it back to C3, and then once again the probability in X

taking the value 1 goes up. And so we have this sort of

self-reinforcing loop that's going to give rise to very extreme and very skewed

probabilities in many examples. And so a way of avoiding that is to or at

least reducing that risk is to is to prevent these kinds of feedback loops.

Now, it's important to realize that by preventing these loops that only reduces

opposed to eliminates the problem. And specifically this is kind of a little

bit of a digression but it's important to know,

is that, imagine for example that we have X, and Y, that are very strongly

correlated. So here, we have for example X and Y and

here we have a path. That involves Y.

Okay? So now what's going to happen is that C3

sends information to C5 about X. C5, translates that to information about

Y, Y should take the value one. That information about Y goes back to C3

and increases the probability in X taking the value one.

Now this is a little bit of a forward looking hint about some of the issues

that we'll see with belief propagation later on, which is that belief

propagation does very poorly. When we have strong correlations.