0:00

We've shown how probabilistic graphical models can be used for a variety of inference tasks, like computing conditional probabilities or finding the MAP assignment. But often the thing that you actually want to do with a probability distribution is make decisions in the world. So, for example, if you're a doctor encountering a patient, it's not enough just to figure out what disease the patient has; ultimately you need to decide what treatment to give the patient. How do we use a probability distribution, and specifically a probabilistic graphical model, in order to make these decisions? It turns out that the theoretical foundations for this kind of inference task were actually established long before probabilistic graphical models came to the forefront, and that framework is called the framework of maximum expected utility. So let's formulate the problem that we are trying to solve, and then we can define the principle of MEU, or maximum expected utility.

So what is a simple decision-making situation comprised of? Well, it's no longer a purely probabilistic model; we have other elements in it. We have an action A, which has several different choices that we might pick: for example, different drugs that we might choose to give the patient.

We have a set of states of the world, X1 up to Xn, and the action can affect which of those states comes about. So this set of states might include things that I can't affect, like the patient's predisposing factors, which disease they have, or which test results came out which way. But it can also include things like the outcome that the patient will have after I administer the drug. So this set of variables that defines the state can involve things that I can affect and things that I cannot affect.

And then the final piece of the puzzle here is something that defines the agent's preferences. That is, the agent cannot choose among different actions without weighing their relative merits. And my way of evaluating the merits of different actions is to have some kind of numerical utility that tells me: if I take action A and I have this particular state of the world, how happy am I? Am I very happy, moderately happy, or extremely unhappy? So this numerical utility function is the thing that's going to allow me to evaluate different actions in terms of how much I prefer them.

So now I can go ahead and write down the notion of an expected utility for a decision problem D, given an action A. And the way in which that's done is a simple expectation. I sum up, over all possible states of the world, the probability that I get that state given the action (and remember, not everything in that state might be affected by the action), and then I multiply that by the utility of the state-action pair. And I sum up over all possible states that I might end up with. That summation, that expectation, is the overall happiness that I have on average, in this decision-making situation, if I take the action a. Clearly, I want to maximize my overall happiness.
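In symbols, the expected utility of taking action a in decision problem D, and the maximizing choice of action, can be written as:

```latex
EU[D[a]] = \sum_{x} P(x \mid a)\, U(x, a),
\qquad
a^{*} = \arg\max_{a} \, EU[D[a]]
```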

And so the principle of maximum expected utility says that I want to choose the action a that maximizes this expected utility.

We can use the ideas that we developed in the context of graphical models to represent decision-making situations in a very interpretable way. This is a structure called an influence diagram, and it's an extension of a Bayesian network with two additional pieces. We have, in addition to random variables, which are denoted as usual as ovals; so here's a random variable, and a random variable is part of my state X.

Â 4:27

We also have two other kinds of nodes in this diagram. We have action variables, which are places where it's not nature that gets to make a choice over the value of the variable, but rather the agent that gets to pick. And then we have the green diamonds at the bottom, which represent utilities.

So let's look at this particular example, which is probably one of the simplest things that you can construct. This is a budding entrepreneur who just graduated from college and wants to decide whether or not to found a widget-making company. So he has two possible actions: F0, which is do not found the company and go take a more conventional job, and F1, which is found the company, which of course is a much riskier strategy. We have a random variable M, the market for widgets, whose values are a poor market, a moderate market, and a great market, and we have a probability distribution over those different values.

The utility of the agent obviously depends both on whether he decides to found the company and on the market, and we denote that by having both of these be parents of the utility variable. So the parents of the utility variable, or the utility function, are the attributes, or variables, on which the agent's utility depends. In this case, our utility has two arguments, F and M. And we can see that if the agent decides not to found the company, then the utility is zero regardless of the market conditions for widgets, because at that point you don't really care. If the agent does decide to found the company and the market for widgets is poor, then his utility is -7; say he invests a bunch of money and ends up losing all of it. If the market is moderate, his utility is 5.

Â 6:51

And if the market is great, his utility is 20.

Okay? So now we can go ahead and compute the expected utility for the two action choices. In this case, the expected utility of F0 is 0, and the expected utility of F1 is going to be equal to 0.5 * -7 + 0.3 * 5 + 0.2 * 20, which, if you add it all up, is equal to 2. So obviously in this case, the optimal action is to found the company.

Let's look at a more complicated influence diagram that involves more random variables.
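Before moving on, the entrepreneur computation above is small enough to check directly; here is a minimal sketch in Python (the variable names are my own, not from the lecture):

```python
# Market distribution over (poor, moderate, great), as given in the example.
p_market = [0.5, 0.3, 0.2]

# Utility U(F, M): F0 = don't found the company, F1 = found it.
utility = {
    "f0": [0, 0, 0],
    "f1": [-7, 5, 20],
}

def expected_utility(action):
    # EU(f) = sum over market states m of P(m) * U(f, m).
    return sum(p * u for p, u in zip(p_market, utility[action]))

eu = {f: expected_utility(f) for f in utility}
best = max(eu, key=eu.get)
print(eu)    # f0 gives 0, f1 gives 2
print(best)  # f1: founding the company is optimal
```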

It also has richer structure. So this is an elaboration of our student network: it has the variables difficulty and intelligence, the grade, and the quality of the letter, and in this case also a variable that represents the student's job prospects, which we're going to assume depends both on the quality of the letter and on the student's grade, because the recruiter has access to the student's transcript.

The student has one decision in this example, which is whether or not to study in the course, which is probably going to affect their grade. And so we have the Study variable; the study action affects the distribution that the student has over grades. Now, in this case you'll notice that we have three different utility diamonds: VG, VS, and VQ. So what do these represent? They represent different components of the utility function; in this case I've broken down the utility function into constituent pieces. One represents the happiness with the grade by itself. This is the utility that you get from saying, "Yeah, I got an A in the course," or, "Ooh, not so good, I got a C." So that's one piece of your utility function. A second piece of your utility function is your overall career aspirations, which is the value of getting a good job versus not getting a good job. And the final one is the component that is ascribed to the quality of life that you have during your studies. So if you don't study, maybe you can go play ultimate Frisbee and go to the movies every night, and that might give you more happiness than sitting there hitting the books every evening. And so we're going to assume that that particular utility component depends on how much you study and on how difficult the course is, because maybe it's more fun to study if it's easy. And we're going to assume that the agent's overall utility is the sum of these different components. This is called a decomposed utility function, and we'll come back to that later.
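To see the elicitation savings concretely, here is a small sketch comparing how many utility values each form requires. The variable cardinalities below are illustrative assumptions, not values given in the lecture:

```python
# Assumed cardinalities: 3 grades, 2 job outcomes, 2 study choices,
# 2 difficulty levels.
card = {"G": 3, "J": 2, "S": 2, "D": 2}

# Decomposed utility: V_G(G) + V_J(J) + V_Q(S, D), one small table each.
decomposed = card["G"] + card["J"] + card["S"] * card["D"]

# Monolithic utility: one table U(G, J, S, D) over all four arguments.
monolithic = card["G"] * card["J"] * card["S"] * card["D"]

print(decomposed)  # 9 values to elicit
print(monolithic)  # 24 values: the monolithic table grows as the product
                   # of the cardinalities, the decomposed form as the sum
```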

Â 10:22

Now, you might wonder why I would bother decomposing the utility function instead of having just a single big monolithic utility function that is affected by all of these different factors. Well, that is exactly why: because it is, in fact, affected by all of these different factors. Whereas here I have one utility component with one argument, another with one argument, and one with two arguments, if I had put them all together I would have a single monolithic utility function with four arguments. And that would be much harder to elicit, because of the exponential, combinatorial growth in the number of possible combinations that I need to consider. So this is a decomposition of a function, in this case the utility function, into factors, in exactly the same way that we like, for example, to decompose a probability distribution as a product of factors.

Another interesting extension that the influence diagram representation allows us to capture is the notion of the information available to the agent when they make their decision.

So let's look at this example over here, which elaborates our entrepreneur example. Here we come in with the assumption that the entrepreneur has the opportunity to conduct a survey regarding the overall market demand for widgets before deciding whether to found the company. Now, the survey is not entirely reliable, and so over here we have the CPD that tells us how likely different survey results are to come up given different values of the true market demand. The point is that, having conducted the survey, the agent can base his decision on the value of the survey. We denote that with the presence of an edge such as this one, which indicates that the entrepreneur can make different decisions depending on how the value of the survey turns out.

From a formal perspective, that means that what the agent is allowed to do is choose a decision rule, which we're going to denote delta, at the action node. And what a decision rule is, is effectively a CPD: given an assignment of values to the parents of A, which are the variables that the agent can observe prior to making a decision, the agent, based on that, can decide on a probability distribution over the action that it takes in that particular situation. So here, for example, we would have a CPD that tells us the probability of founding the company, which in this case is a binary-valued variable, given the three possible values of the survey variable. Now, you might say, why would I need a CPD? Why would the agent ever want to make a random choice between two different actions?

And it turns out, in fact, that in a single-agent decision-making situation there's no value to allowing that extra expressive power, because a deterministic CPD would work just as well. Nevertheless, for reasons that will become clear on the next slide, it's actually useful to think about this as a CPD, even though in most cases it will actually be deterministic in the examples that we're talking about. So now that we've given the agent that expressive power, how do we formulate the decision problem that they need to solve? That is, what do they get to choose, and how do they choose it?

So, given a decision rule delta for an action variable A, if we inject that into the decision network, now all of the variables in our network, both A and the remaining variables X, have a CPD associated with them. So now we have effectively defined a joint probability distribution over all the variables, X union {A}, because each of them has a CPD. And the agent's expected utility, once they've selected the decision rule delta_A, is simply this expectation over here: we average out over all possible scenarios, where a scenario is an assignment to the chance variables X as well as to the action A, and what we're averaging is the agent's utility in that scenario. And so that's the overall value that the agent, prior to anything going on, can expect to gain given a decision rule delta_A. Obviously the agent wants to choose the decision rule delta_A that is going to give him the maximum value for this expression, which denotes his expected utility. And so this is the optimization problem that the agent is trying to solve.

So how do we go about finding the decision rule that maximizes the agent's expected utility?

Well, let's look at this first in the context of the simple example, and then we'll do the general case. Over here we see, once more, the expected utility equation for a given decision rule, and now we're trying to optimize this delta_A. So let's write down the expected utility in this particular example. The Bayesian network in this case has two original CPDs: there is P of M, which comes in from the M variable, and there's P of S given M, which comes from the survey variable. And we have the CPD that is as yet to be selected, which specifies the decision rule at the action F. And then multiplied with all that is the utility, which depends on the decision to found the company and on the true market value. Well, this should look awfully familiar: it is simply a product of factors. Some of these factors are probabilistic factors, and one of them, the U, is a different numerical factor, but it's still a factor, which depends on a scope of two variables, in this case F and M.

So now we can go ahead and apply the same kinds of analysis that we've done in previous computations. Specifically, since we're trying to optimize over this expression over here, the delta_F of F given S, we're going to keep out just F and S, which are the two things that appear in the factor that is the decision rule, and push in, within the summation over M, all of the factors that depend on M. And so that gives us a sum over M of P of M, times P of S given M, times U of F comma M. And if we multiply these three factors and marginalize out M, what we end up with is a factor that we're going to call mu of F, S. And the reason it's called mu is to suggest that it has a utility component U in it, so mu and U. And now we have a fairly simple expression that the agent is trying to optimize: it's a summation, over all possible values of S and F, of delta, the decision rule that the agent takes over F given the observation S, multiplied by this factor mu that I just computed. And now, since our goal is to optimize this expression, what the agent ought to do is pick, within this factor, the F that gives it the highest value in each circumstance S. This is a little bit abstract, so let's look at this in the context of an actual numerical example to see this.

So here are the three factors that we have in this network to begin with: the CPD for M over here, the CPD for S given M in the middle, and here the utility factor U, which comes from here. And if we go through this computation and compute this expression over here, this is what we get: a factor that gives us a value for each value of the action variable F and each value of the survey variable S. And now we want to maximize the summation.

Â 19:29

And looking at this factor over here, it seems clear that what's going to maximize the summation is to put full weight on F0 in the case S0, which, if you look at the CPDs, is the case where the survey suggests that the market demand is bad, while in the other two cases we choose the action of founding the company. And if you look at this, you can see that no other choice of delta is going to give us a higher value for this summation than this deterministic choice, which chooses F0 in the case S0, and F1 in the cases S1 and S2. And so this is the optimal decision rule.
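Here is a sketch of that mu(F, S) computation and the resulting rule in Python. The transcript does not read out the survey CPD P(S given M), so the numbers below are assumptions, chosen to be consistent with the values 1.15, 2.1, and 3.25 quoted in the lecture:

```python
# P(M): poor, moderate, great.
p_m = [0.5, 0.3, 0.2]

# Assumed P(S | M): rows are market states m, columns are survey results s.
p_s_given_m = [
    [0.6, 0.3, 0.1],  # m = poor
    [0.3, 0.4, 0.3],  # m = moderate
    [0.1, 0.4, 0.5],  # m = great
]

# U(F, M): F0 = don't found, F1 = found.
utility = {"f0": [0, 0, 0], "f1": [-7, 5, 20]}

# mu(F, S) = sum_m P(m) * P(s | m) * U(f, m): marginalize out M.
mu = {
    (f, s): sum(p_m[m] * p_s_given_m[m][s] * utility[f][m] for m in range(3))
    for f in utility
    for s in range(3)
}

# Optimal deterministic decision rule: for each survey value s, pick the
# action f that maximizes mu(f, s); the MEU is the sum of those maxima.
rule = {s: max(utility, key=lambda f: mu[(f, s)]) for s in range(3)}
meu = sum(mu[(rule[s], s)] for s in range(3))

print(rule)           # s0 -> f0, s1 -> f1, s2 -> f1
print(round(meu, 2))  # 3.25
```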

And if you ask what the agent's expected utility is with this strategy, before anything happens, before the survey is conducted: well, it's just going to be the sum of 0, 1.15, and 2.1, which sums up to 3.25, as the agent's overall expected utility in this case. Now, think back to the agent's expected utility when he didn't have the information from the survey. That expected utility, if you recall, was 2; without the survey the agent's best expected utility was equal to 2. So by conducting the survey, the agent has gained 1.25 utility points.

So let's generalize this to arbitrary decision diagrams. We're going to focus for the moment on single utility nodes and single action nodes. The more general case is a little more complicated, and we're not going to go through it, but the principles are very similar.

So, once again going into this example, we see that delta_A, the decision rule at A, defines a joint distribution over all of the chance variables and the action variable, and this is multiplied by the utility under the scenario. And so we can open up the probability-distribution piece, P of delta_A, as a product of two types of CPDs: we have the original CPDs, which is simply the product over i of P of Xi given its parents, and then we multiply that by the decision rule delta_A. And we're going to use Z just as a shorthand to denote the parents of A, that is, the observations that the agent makes prior to deciding on A.
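Written out, the expected utility of a decision rule, following the factorization just described, is:

```latex
EU[D[\delta_A]]
  = \sum_{x,\,a}
    \Big( \prod_i P(x_i \mid \mathrm{Pa}_{X_i}) \Big)\,
    \delta_A(a \mid z)\, U(x, a)
```

where z denotes the assignment that (x, a) makes to A's parents Z.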

Â 23:58

So, summarizing how this algorithm goes: to compute the maximum expected utility and find the optimal decision rule at an action A, we take the following steps. We treat A as a random variable with, at this point, an unknown CPD. We introduce an additional factor beyond the CPDs, which is the utility factor, whose scope is the variables that the utility depends on; so now we have, as usual, a collection of factors. We then run, effectively, a variable elimination algorithm, plain, good old variable elimination, to eliminate all variables except A and A's parents, which produces a factor mu of A, Z. And then, for each parent assignment z to the variable A, we choose the action that maximizes over A within that factor.
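The steps just listed can be sketched in Python with a minimal factor representation. This is an illustrative sketch rather than a reference implementation, run here on the no-survey entrepreneur example from earlier:

```python
from itertools import product

# A factor is a pair (scope, table): scope is a tuple of variable names,
# and table maps value tuples (in scope order) to numbers.

def multiply(f, g, domains):
    """Factor product over the union of the two scopes."""
    scope = tuple(dict.fromkeys(f[0] + g[0]))  # order-preserving union
    table = {}
    for vals in product(*(domains[v] for v in scope)):
        asg = dict(zip(scope, vals))
        table[vals] = (f[1][tuple(asg[v] for v in f[0])]
                       * g[1][tuple(asg[v] for v in g[0])])
    return scope, table

def sum_out(f, var):
    """Marginalize a variable out of a factor."""
    scope = tuple(v for v in f[0] if v != var)
    table = {}
    for vals, x in f[1].items():
        key = tuple(v for v, name in zip(vals, f[0]) if name != var)
        table[key] = table.get(key, 0.0) + x
    return scope, table

def optimal_decision(factors, action, parents, domains):
    """Single-action-node algorithm from the lecture: multiply all factors
    (CPDs plus the utility factor), eliminate everything except the action
    and its parents Z to get mu(A, Z), then maximize over A for each z."""
    mu = factors[0]
    for f in factors[1:]:
        mu = multiply(mu, f, domains)
    for v in [v for v in mu[0] if v != action and v not in parents]:
        mu = sum_out(mu, v)
    rule, meu = {}, 0.0
    for z in product(*(domains[p] for p in parents)):
        asg = dict(zip(parents, z))
        scores = {}
        for a in domains[action]:
            asg[action] = a
            scores[a] = mu[1][tuple(asg[v] for v in mu[0])]
        best = max(scores, key=scores.get)
        rule[z] = best
        meu += scores[best]
    return rule, meu

# Tiny instance: P(M) and U(F, M), with no observed parents for F.
domains = {"M": ["poor", "mod", "great"], "F": ["f0", "f1"]}
p_m = (("M",), {("poor",): 0.5, ("mod",): 0.3, ("great",): 0.2})
u = (("F", "M"), {("f0", "poor"): 0, ("f0", "mod"): 0, ("f0", "great"): 0,
                  ("f1", "poor"): -7, ("f1", "mod"): 5, ("f1", "great"): 20})
rule, meu = optimal_decision([p_m, u], action="F", parents=(), domains=domains)
print(rule, meu)  # {(): 'f1'} 2.0
```

Running the same function with a survey CPD and `parents=("S",)` reproduces the optimal rule computed by hand above.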

So, summarizing: the MEU principle provides us with a rigorous foundation for decision making under uncertainty. Building on that, the framework of probabilistic graphical models allows us to provide a compact representation for complicated, high-dimensional decision-making settings where we have multiple chance variables, multiple actions, and utilities that depend on multiple variables. We can build on inference methods in probabilistic graphical models, specifically in this case the variable elimination algorithm, to both find the optimal strategy and figure out the overall value to the agent of a particular decision situation. And although we didn't talk about this, these methods can also be extended to richer scenarios where we have multiple utility components, as in the example we gave, as well as multiple decisions that an agent makes in a sequence.