0:00

So we've learned a lot of little bits and pieces of representation that are used to put together a graphical model. Now let's take a big step back and figure out how you might put these all together if you actually wanted to build a graphical model for some application that you care about.

Now, let me start by saying that this is really not a science. Just like any other design, it's much closer to an art, or even black magic, than a scientific endeavor. So the only thing one can do here is provide hints about how one might go about doing this.

So let's first identify some important distinctions, and then we'll get concrete about particular examples. There are at least three main classes of design choices that one needs to make. The first is whether you have a template-based model versus a very specific model for a concrete, fixed set of random variables. The second is whether the model is directed or undirected. And the third is whether it's generative versus discriminative. These are all terms that we've seen before, and we'll talk about them in just a moment.

But before we go into the trade-offs between each of these, let me emphasize this last point, which is probably the most critical thing to remember. It's often not the case that you go just in one direction or the other. In many models you're going to have, for example, template-based pieces as well as some parts that aren't at the template level, or directed as well as undirected components, and so on. So these are not sharp boundaries, and it's useful to keep in mind that in a real problem you don't have to go in only one direction.

Now, the first important distinction is template-based versus specific. What are some examples of specific models? Medical diagnosis, for example, is usually a specific model: you have a particular set of symptoms, diseases, and so on that you want to encode in your model. On the template-based side, you have things like image segmentation.

3:02

Fault diagnosis, for example, you can think of as a specific model, in that you can think about writing a diagnostic model for one particular type of printer. But really, if you're inside a company that's writing a diagnostic tool for its line of fifteen different printers, they're going to have shared components. And if a component inside printer one also appears inside printer two, chances are it's going to have the same fault model. So you're going to have elements that are unique and elements that are shared, and once again the model is going to sit at the intersection between the two.
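One way to picture the shared-component idea is as a library of fault models keyed by component type, reused by every printer that contains that component. The sketch below is a minimal illustration under that assumption; the component names, probabilities, and the helper `build_printer_model` are all hypothetical, and each fault model is reduced to a single probability to keep it short.

```python
# Hypothetical shared fault models, keyed by component type (not from the lecture).
# Each entry is P(fault) for one component of that type.
SHARED_FAULT_MODELS = {
    "paper_feeder": 0.02,
    "toner_cartridge": 0.05,
    "fuser": 0.01,
}


def build_printer_model(printer_name, components, unique_faults=None):
    """Instantiate one printer's fault model from shared and printer-specific parts."""
    model = {}
    for comp in components:
        # Shared component: the same fault parameter is reused in every printer.
        model[f"{printer_name}.{comp}"] = SHARED_FAULT_MODELS[comp]
    for comp, p_fault in (unique_faults or {}).items():
        # Component unique to this printer: it gets its own parameter.
        model[f"{printer_name}.{comp}"] = p_fault
    return model


printer1 = build_printer_model("printer1", ["paper_feeder", "toner_cartridge"])
printer2 = build_printer_model("printer2", ["paper_feeder", "fuser"], {"duplex_unit": 0.03})
```

Both printers point at the same `paper_feeder` parameter, so evidence about that component's reliability from either product line would inform both models.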

That said, once you've decided where on this spectrum you sit, it really changes the way in which you tackle the knowledge engineering problem. Template-based models usually (not always, but usually) have a fairly small number of variable types. In our image segmentation setting, for example, the class label is the one variable type. Nevertheless, we manage to construct very richly expressive models, because of the interesting interactions between the class labels of adjacent pixels in the image. But with so few variable types, most of the effort goes into figuring out things like which features are most predictive.
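As a rough sketch of what "a small number of variable types" means in practice, the code below unrolls a pairwise segmentation template over an image grid. It assumes 4-connected neighbours and one unary factor per pixel; the function name and the shared parameter names `w_unary` and `w_pair` are hypothetical. The point is that the number of variables and factors grows with the image, while the set of parameters stays fixed.

```python
def unroll_segmentation_template(height, width):
    """Unroll a pairwise template over a height-by-width image.

    Every pixel gets one variable of the single type "class label". All unary
    factors share the weights "w_unary" and all pairwise factors share "w_pair".
    """
    variables, unary, pairwise = [], [], []
    for r in range(height):
        for c in range(width):
            v = f"Label[{r},{c}]"
            variables.append(v)
            # Unary factor linking this pixel's features to its label.
            unary.append((v, "w_unary"))
            # Pairwise factors to the right and lower neighbours (4-connectivity).
            if c + 1 < width:
                pairwise.append((v, f"Label[{r},{c + 1}]", "w_pair"))
            if r + 1 < height:
                pairwise.append((v, f"Label[{r + 1},{c}]", "w_pair"))
    return variables, unary, pairwise


variables, unary, pairwise = unroll_segmentation_template(3, 4)
# The distinct parameter names used across all factors.
shared = {factor[-1] for factor in unary + pairwise}
```

For a 3 by 4 image this yields 12 label variables, 12 unary factors, and 17 pairwise factors, all tied to just two parameter sets; the modeling effort goes into what features feed `w_unary` and `w_pair`.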

6:13

a higher-performance model. So you might wonder: when would I use a generative model, given that a discriminative model gets high performance by using richly expressive features? There are multiple answers to that.

One answer is when I don't have a predetermined task, that is, when the task shifts. For example, in a medical diagnosis task, every patient presents differently. In each patient's case, I know a different subset of things about that patient: the symptoms they present with and the tests that I happened to perform. So I don't want to train a discriminative model that uses a predetermined set of variables as inputs and a predetermined set of diseases as outputs.
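By contrast, a generative model encodes the joint distribution, so any subset of variables can be used as evidence and any other variable can be the query. The sketch below assumes a tiny hypothetical model with a disease D, a symptom S, and a test result T, all binary, with made-up CPD values; queries are answered by brute-force summation over the joint, which is only feasible for very small models.

```python
from itertools import product

# Hypothetical CPDs for a tiny generative model with edges D -> S and D -> T.
P_D = {1: 0.1, 0: 0.9}
P_S_given_D = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}   # P_S_given_D[d][s]
P_T_given_D = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.05, 0: 0.95}}  # P_T_given_D[d][t]


def joint(d, s, t):
    """P(D=d, S=s, T=t) from the chain-rule factorization."""
    return P_D[d] * P_S_given_D[d][s] * P_T_given_D[d][t]


def query(target, evidence):
    """Return P(target = 1 | evidence) by summing the joint over all assignments."""
    num = den = 0.0
    for d, s, t in product([0, 1], repeat=3):
        assignment = {"D": d, "S": s, "T": t}
        # Skip assignments inconsistent with whatever happens to be observed.
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(d, s, t)
        den += p
        if assignment[target] == 1:
            num += p
    return num / den
```

Here `query("D", {"S": 1})` and `query("D", {"S": 1, "T": 1})` use the same model with different evidence, and `query("T", {"S": 1})` even predicts a test result, something a discriminative model trained to map one fixed input set to diseases would not provide.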

Rather, I want something that gives me the flexibility to measure different variables and predict others. The second reason for using a generative model, and this is looking way forward in the class, is that generative models turn out to be easier to train in certain regimes. Specifically, just to say it out loud: when the data is not fully labeled, it sometimes turns out that you can't train the discriminative form of the model, but you can train a generative model. We'll definitely see that when we get to that part of the course.

Okay, so having talked about these different regimes, let's think about the key decisions that we have to make in the context of designing a graphical model.

First of all, what variables are we going to include in the model? Regardless of whether we have a fixed or a varying task at hand, we usually have a set of target variables. These are the ones we care about. Even in the medical diagnosis setting, you have a set of disease variables, which are the ones we care to predict. You might not care to predict all of them in any given setting, but they're usually the targets. Next, we have the set of observed variables. Again, they might not always be observed, but you don't necessarily care about predicting them. In the medical setting, these might be things like symptoms and test results. And then the third category might be a little surprising: we might have variables that are latent, or hidden. These are variables that we don't observe.

8:49

Nor do we necessarily care about predicting them; they are just there. Why would we want to model variables that we neither observe nor ever care to look at? Let's look at an example. Imagine that I asked all of you in this class what time your watch shows. Each Wi is the time on the watch of one of you in the class, so we have W1 up to WK.

Now, these variables are all correlated with each other. But really, they're not directly correlated with each other, unless we all had a watch-setting party just before class. What they're really all correlated with is Greenwich Mean Time. So you have a model, in this case a naive Bayes model, where Greenwich Mean Time influences a bunch of random variables that are conditionally independent given it. Greenwich Mean Time is latent, unless we actually end up calling Greenwich to find out what the current time is there right now, which I don't think any of us really cares about. So why would we want to include Greenwich Mean Time in our model? Because if we eliminate Greenwich Mean Time from our model, what happens to the dependency structure? We end up with a model in which every watch is connected to every other watch. So latent variables can sometimes simplify our structure, and they're useful to include even in cases where we don't really care about them, because leaving them out gives us much more complicated models.
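The effect of the latent variable can be checked numerically. The sketch below assumes a hypothetical discretization in which GMT takes one of three values and each watch shows the true time with probability 0.8; these numbers are illustrative, not from the lecture. Summing out GMT leaves two watches marginally dependent, even though they are independent given GMT by construction.

```python
# Hypothetical discretization: GMT takes one of three values, uniformly.
TIMES = [0, 1, 2]
P_G = {g: 1 / 3 for g in TIMES}


def p_watch_given_gmt(w, g):
    """P(W = w | GMT = g): correct with prob. 0.8, else one of the two other values."""
    return 0.8 if w == g else 0.1


def p_w(w):
    """Marginal of a single watch, summing out GMT."""
    return sum(P_G[g] * p_watch_given_gmt(w, g) for g in TIMES)


def p_w1_w2(w1, w2):
    """Joint of two watches after summing out the latent GMT.

    Given GMT the watches factorize; the dependence appears only after the sum.
    """
    return sum(P_G[g] * p_watch_given_gmt(w1, g) * p_watch_given_gmt(w2, g) for g in TIMES)
```

Here P(W1 = 0, W2 = 0) = 0.22, while P(W1 = 0) P(W2 = 0) = 1/9, so once GMT is eliminated the watches can no longer be treated as independent and must be connected directly.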

Which brings us to the topic of structure. When we think about Bayesian networks specifically, the question that comes to mind is: given that the arrows are directed, do they correspond to causality? That is, does an arrow from X to Y indicate a causal connection from X to Y? The answer to that is yes and no. Very satisfactory. So what does "no" mean in this case? Well, we've seen this before. Consider a model where X points to Y; we'll just do the two-variable case. Any distribution that I can model with this graph, where X is a parent of Y, I can equally well model with a Bayes net in which I invert that edge and have Y pointing to X.
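For the two-variable case, the reversed network's CPDs follow from Bayes' rule: P(Y) = Σ_x P(x) P(Y | x) and P(X | Y) = P(X, Y) / P(Y). The minimal sketch below, with hypothetical binary CPDs, computes them and confirms that Y → X encodes the same joint as X → Y.

```python
# Hypothetical CPDs for the model X -> Y with binary X and Y.
P_X = {0: 0.7, 1: 0.3}
P_Y_given_X = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # P_Y_given_X[x][y]


def reverse_edge(p_x, p_y_given_x):
    """Return P(Y) and P(X | Y) so that Y -> X represents the same joint as X -> Y."""
    joint = {(x, y): p_x[x] * p_y_given_x[x][y] for x in p_x for y in (0, 1)}
    p_y = {y: sum(joint[(x, y)] for x in p_x) for y in (0, 1)}
    p_x_given_y = {y: {x: joint[(x, y)] / p_y[y] for x in p_x} for y in (0, 1)}
    return p_y, p_x_given_y


P_Y, P_X_given_Y = reverse_edge(P_X, P_Y_given_X)
```

With two variables the reversed model needs exactly as many parameters as the original; the trouble described next only appears with more variables.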

So in this example, as well as in many others, I can reverse the edges and have a model that's equally expressive. In fact, I can do this in general: you can give me any ordering you want on the random variables, and I can build you a graphical model that represents any distribution using that ordering. So if you want X1 to come before X2, which comes before X3, and you want to represent the distribution P, that's no problem; I can have a graphical model that does that. But that model might be very nasty. We've already seen an example of this, where X1 and X2 were both parents of Y in a simple model that looked like this. Now suppose I want to invert the directionality of the edges and put Y as a parent of, say, X2.

12:39

Then, if I want to capture the distribution that I started out with, the one for which this was the graph, I end up having to add a direct edge between X1 and X2, because X1 and X2, although independent on their own, become dependent once Y is known. So what happens is that the causal directionality is often simpler. To drive this home even further, let's go back to our Greenwich Mean Time example, where Greenwich Mean Time is in some way the cause, or the parent, of the different watch times that we see in different individuals. Imagine that I force you to invert the edges. What's it going to look like? I'm now going to force Greenwich Mean Time to be the child of all of the watches. Is this the correct model? No, because it says that all of the watch times are independent, which we know is not the case. So the model we end up with is the same horrific model that I showed before, where everything is connected to everything else. And so causal ordering, although it's not more correct than a non-causal ordering, is sparser. Causal models are generally sparser, as well as more intuitive and easier to parameterize.
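A rough way to quantify "sparser and easier to parameterize" is to count free parameters. The sketch below assumes K watches and GMT all take m values and that, in the inverted ordering W1, ..., WK, GMT, no independencies can be exploited (the fully connected case described above), so each Wi has all earlier watches as parents and GMT has all K watches as parents. The function names are hypothetical.

```python
def causal_param_count(k, m):
    """GMT -> W_1..W_k: P(GMT) plus one CPD P(W_i | GMT) per watch."""
    prior = m - 1                 # free parameters of P(GMT)
    per_watch = m * (m - 1)       # m rows, each with m - 1 free entries
    return prior + k * per_watch


def inverted_param_count(k, m):
    """Ordering W_1, ..., W_k, GMT with no independencies to exploit.

    W_i has parents W_1..W_{i-1}, and GMT has all k watches as parents.
    """
    watches = sum(m ** (i - 1) * (m - 1) for i in range(1, k + 1))
    gmt = m ** k * (m - 1)
    return watches + gmt
```

With K = 4 and m = 3, the causal model needs 26 parameters and the inverted one 242; the causal count grows linearly in K while the inverted count grows exponentially.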

14:07

Very human. So again, you're not forced to use the causal ordering, and sometimes there are good reasons not to, but it's generally a good tip to follow. So how does one actually construct a graphical model? Do we have in our minds some monolithic P over some set of variables X1 up to XN, and we just need to figure out how to encode it using a graph? Maybe implicitly, but certainly not in any explicit form. The way one typically constructs a graphical model in practice is by starting from some variable, or sometimes a set of variables, that we wish to reason about. For example, we might care about the variable cancer, or maybe even lung cancer.

What influences whether somebody is going to get lung cancer? If we go and ask a doctor, "What is the probability for someone to get lung cancer?", the doctor is going to say, "Well, that depends." And you might ask, "What does it depend on?" And the doctor will say, "Well, whether they smoke, for example." At that point you're likely to add the variable smoking as a parent of the lung cancer variable. The doctor might say that's not the only thing: the probability of cancer also depends, for example, on the kind of work that you do, because some kinds of work involve more dust particles getting into your lungs. So here's another variable that you would add as a parent.

And if I now go and ask either the doctor or an expert in a different domain, "What is the probability that somebody smokes?", if they think about it they're likely to say that depends. And what does it depend on? Well, maybe their age, their gender, maybe the country that they live in, because different countries have different smoking frequencies. So once again we extend the conversation backward to include more variables, up to the point where we can stop. If you now ask, for example, what the probability is of gender being male versus female, anybody can answer that one, and at that point one can stop, because there's no way to extend the conversation further backward. Is that enough?

Usually not, because we also need to consider factors that might indicate to us whether somebody has cancer or not. So we might ask the doctor what pieces of evidence might be indicative here, and the doctor would tell us, for example, coughing, or maybe bloody sputum, and various other potential indicators. At that point, one would say, "Okay, what is the probability of coughing given lung cancer?" And again, one would now extend the conversation backward, because other things may cause coughing, for example, having allergies.
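The backward extension can be recorded as a map from each variable to its parents. The sketch below encodes the variables mentioned in this example (the names, such as `Occupation` for the kind of work someone does, are our own labels, and the variable set is illustrative) and uses Python's standard `graphlib` module (Python 3.9+) to check that the result is acyclic and to produce a topological order, which is a valid order in which to specify the CPDs.

```python
from graphlib import TopologicalSorter

# Parents elicited by "extending the conversation backward" in the lung cancer example.
parents = {
    "LungCancer": ["Smoking", "Occupation"],
    "Smoking": ["Age", "Gender", "Country"],
    "Cough": ["LungCancer", "Allergies"],
    "BloodySputum": ["LungCancer"],
    "Age": [],
    "Gender": [],
    "Country": [],
    "Occupation": [],
    "Allergies": [],
}


def roots(graph):
    """Variables with no parents: the points where the backward extension stopped."""
    return sorted(v for v, ps in graph.items() if not ps)


# static_order() yields parents before children; it raises CycleError if the graph has a cycle.
order = list(TopologicalSorter(parents).static_order())
```

The root variables (age, gender, and so on) are exactly the ones whose marginal probabilities an expert can state without saying "it depends".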

And so once again we would go from here and extend backward, to construct a graphical model that captures all the relevant factors for answering the queries that we care about.

So that's the structure of a graphical model. Now let's talk a little bit about parameters, and which aspects of their values really make a difference. Zeros make a big difference. When we talked about diagnosis, we saw that many of the mistakes made in early medical expert systems derived from the fact that people gave zeros to things that were unlikely but not actually impossible. So zeros are something to be very, very careful about: you should only give probability zero to something that is impossible, perhaps because it's definitional. Otherwise, things really shouldn't have probability zero. Other things that make a difference are weaker versions of this. For example, order-of-magnitude differences: the difference between a probability of 1/10 versus 1/100 makes a big difference, whereas small differences like 0.54 versus 0.57 are unlikely to make a difference to most queries.
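A small Bayes'-rule calculation shows why zeros and orders of magnitude matter more than small differences. The sketch assumes a hypothetical piece of evidence that is 100 times more likely when a binary disease is present; the likelihood values are illustrative.

```python
def posterior(prior, p_e_given_d, p_e_given_not_d):
    """P(disease | evidence) by Bayes' rule for a binary disease variable."""
    num = prior * p_e_given_d
    return num / (num + (1 - prior) * p_e_given_not_d)


# Hypothetical evidence that is 100 times more likely if the disease is present.
LIK_D, LIK_NOT_D = 0.5, 0.005

zero_prior = posterior(0.0, LIK_D, LIK_NOT_D)    # stays exactly 0, whatever the evidence
tiny_prior = posterior(0.001, LIK_D, LIK_NOT_D)  # small but nonzero prior can be updated
```

A prior of exactly 0 stays at 0 no matter how strong the evidence, while a prior of 0.001 rises to about 0.09. Changing the prior from 0.1 to 0.01 moves the posterior from about 0.92 to about 0.50, whereas 0.54 versus 0.57 changes it by less than 0.001.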

Finally, it turns out that the relative values of conditional probabilities make a much bigger difference to the answer than the absolute probabilities.
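For a single binary cause and a single observation, Bayes' rule depends on the CPD entries P(e | d) and P(e | ¬d) only through their ratio. The sketch below, with hypothetical numbers, shows that scaling both entries by the same factor leaves the posterior unchanged, while changing their ratio moves it substantially.

```python
def posterior(prior, p_e_given_d, p_e_given_not_d):
    """P(d | e) = prior / (prior + (1 - prior) * p_e_given_not_d / p_e_given_d)."""
    num = prior * p_e_given_d
    return num / (num + (1 - prior) * p_e_given_not_d)


prior = 0.05
a = posterior(prior, 0.8, 0.1)    # likelihood ratio 8
b = posterior(prior, 0.08, 0.01)  # both entries 10x smaller, same ratio 8
c = posterior(prior, 0.8, 0.4)    # same P(e | d) as in a, but ratio only 2
```

Here `a` and `b` both come out to about 0.30, while `c` drops to about 0.10, even though its P(e | d) entry is identical to the one in `a`.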

That is, comparing different entries in the same CPD relative to each other is a very useful way of evaluating the graphical model and seeing whether the values that you use for those relative ratios really make sense. Finally, conditional probability tables are actually quite rare except in small applications. In most cases one would use structured CPDs of the forms that we've discussed, as well as a variety of other forms.

So let's talk a little bit about structured CPDs, because those are actually quite important. We can break up the types of CPDs that we've talked about along two axes. One is whether they're intended to deal primarily with discrete or with continuous variables. The other is whether the type of structure they encode is context-specific, where a variable might make a difference in some circumstances and not in others, versus aggregating multiple weak influences. So let's give an example of each of these categories. For discrete and context-specific, we had tree CPDs as an example. For discrete and aggregating, we had sigmoid CPDs as well as noisy-OR, noisy-max, or any one of that family.
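Minimal sketches of the three discrete CPD types named above follow, each returning P(Y = 1 | parents) for a binary child. The tree's structure and numbers, and the parent names A and B, are hypothetical; the noisy-OR uses the standard form with a leak probability, P(Y = 0 | x) = (1 - λ0) ∏ (1 - λi)^xi, and the sigmoid CPD applies the logistic function to a weighted sum of the parents.

```python
import math


def tree_cpd(context):
    """Context-specific tree CPD (hypothetical): P(Y = 1 | A, B).

    B only matters in the context A == 1.
    """
    if context["A"] == 0:
        return 0.1
    return 0.9 if context["B"] == 1 else 0.4


def noisy_or(parent_values, lambdas, leak=0.0):
    """P(Y = 1 | x) = 1 - (1 - leak) * prod over active parents of (1 - lambda_i)."""
    p_off = 1.0 - leak
    for x, lam in zip(parent_values, lambdas):
        if x:
            # Each active parent independently fails to turn Y on with prob. 1 - lambda.
            p_off *= 1.0 - lam
    return 1.0 - p_off


def sigmoid_cpd(parent_values, weights, bias):
    """P(Y = 1 | x) = sigmoid(bias + sum_i w_i * x_i)."""
    z = bias + sum(w * x for w, x in zip(weights, parent_values))
    return 1.0 / (1.0 + math.exp(-z))
```

The tree needs only three numbers instead of four table rows, and both aggregating forms need one parameter per parent instead of one per joint parent assignment.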

For continuous CPDs we didn't actually talk about context-specific representations, but one can take the continuous version of a tree CPD, called a regression tree, where one breaks up the context based on some threshold on the continuous variables.
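A regression-tree CPD can be sketched the same way, with threshold tests on continuous parents at the internal nodes and a Gaussian over the child at each leaf. The tree below, its thresholds, and the parent names X1 and X2 are hypothetical.

```python
import math

# Hypothetical regression-tree CPD for continuous Y with continuous parents X1, X2.
# Internal nodes are (variable, threshold, left, right); leaves are (mean, std).
TREE = ("X1", 2.0,                # if X1 < 2.0 go left, else right
        (0.0, 1.0),               # leaf: Y ~ N(0, 1)
        ("X2", -1.0,
         (3.0, 0.5),              # leaf: Y ~ N(3, 0.5^2)
         (6.0, 2.0)))             # leaf: Y ~ N(6, 2^2)


def leaf_for(node, context):
    """Follow threshold tests on the parents' values down to a leaf."""
    while len(node) == 4:
        var, threshold, left, right = node
        node = left if context[var] < threshold else right
    return node


def density(y, context, tree=TREE):
    """Gaussian density of Y at the leaf selected by the parents' values."""
    mean, std = leaf_for(tree, context)
    return math.exp(-0.5 * ((y - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))
```

When X1 < 2.0 the value of X2 is irrelevant, which is exactly the context-specific structure a tree CPD captures.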

21:35

Finally, it's important to realize that a model is rarely done the first time you write it. Just like any code design, model design is an iterative process, where one starts out somewhere, tests it, and then improves it over time. So, importantly, once one constructs a model, the first thing to do is to test it: ask it queries and see whether the answers coming out are reasonable. There's also a suite of tools for what's called sensitivity analysis, which means that one can look at a given query and ask which parameters have the biggest effect on the value of that query. Those are probably the ones we should fine-tune in order to get the best results for the queries we care about. Finally, any iterative refinement process usually depends extensively on error analysis: once we have identified the errors that our model makes, we go back and try to see which improvements to the model are going to make those errors go away. It could be, for example, adding features; in some of the image segmentation work that we did, there are features that might help eliminate certain errors that we see in our segmentation results. Or it might be adding dependencies to the model that can capture the kind of structure that's actually present.
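The sensitivity-analysis step described above can be approximated with finite differences: perturb each parameter slightly and measure how much the query answer moves. The sketch below assumes a hypothetical three-node chain Smoking → LungCancer → Cough with made-up parameters and ranks them by the magnitude of the derivative of P(LungCancer = 1 | Cough = 1). Dedicated sensitivity-analysis tools are more sophisticated; this is only a rough illustration.

```python
# Hypothetical parameters for the chain Smoking -> LungCancer -> Cough, all binary.
params = {
    "P(S=1)": 0.2,
    "P(C=1|S=1)": 0.05,
    "P(C=1|S=0)": 0.005,
    "P(Cough=1|C=1)": 0.8,
    "P(Cough=1|C=0)": 0.1,
}


def query(p):
    """P(LungCancer = 1 | Cough = 1), summing out Smoking."""
    p_c1 = p["P(S=1)"] * p["P(C=1|S=1)"] + (1 - p["P(S=1)"]) * p["P(C=1|S=0)"]
    num = p_c1 * p["P(Cough=1|C=1)"]
    return num / (num + (1 - p_c1) * p["P(Cough=1|C=0)"])


def sensitivities(p, eps=1e-6):
    """Central finite-difference derivative of the query with respect to each parameter."""
    result = {}
    for name in p:
        up, down = dict(p), dict(p)
        up[name] += eps
        down[name] -= eps
        result[name] = (query(up) - query(down)) / (2 * eps)
    return result


# Parameters ordered from most to least influential on this query.
ranked = sorted(sensitivities(params).items(), key=lambda kv: abs(kv[1]), reverse=True)
```

In this made-up model the query is most sensitive to P(C=1|S=0), since non-smokers make up most of the population, which suggests that parameter is the one most worth refining for this query.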
