0:00

Hi, my name is Brian Caffo and this is Mathematical Biostatistics Boot Camp, lecture five, on conditional probability. So in this lecture, we are going to talk about conditional probabilities and then the associated density functions for calculating conditional probabilities, basically just called conditional densities, and we'll talk about conditional mass functions for discrete random variables. We'll talk about Bayes' rule, then briefly talk about an example of Bayes' rule using diagnostic tests, and then we'll talk a little about the so-called diagnostic likelihood ratios.

So let me give you some brief motivation for conditional probabilities. I think we kind of internally do these things pretty easily. So, imagine rolling a standard die, and we're assuming that the probability of each face is one-sixth. Suppose you didn't know the outcome of the die, but someone gave you the information that the die roll is odd. So it had to be a one, a three, or a five; a two, four, or six is not possible, given this extra information. Then, conditional on this new information, everyone would probably agree that the probability is now one-third. And all we're going to do in the next couple of slides is mathematically develop these ideas a little more completely. So let's develop the notion of conditional probability, just generically talking about events.

So let B be any event such that the probability of B is greater than zero. This condition is kind of important, because it doesn't make any sense to condition on an event that cannot occur. So it doesn't make any sense to talk about the probability of A given that B occurred if the probability that B occurred is exactly zero. Just to put this in words, it makes no sense to talk about the probability that a coin is heads given that the coin is on its side, if you're not going to allow for the possibility of the coin landing on its side. So the definition of the conditional probability of an event, the probability of the event A occurring given that the event B has occurred, is the probability of the intersection divided by the probability of B: P(A | B) = P(A ∩ B) / P(B). Now notice what happens if A and B are independent.

Then for the probability of A given B, the numerator component, the intersection A ∩ B, factors into the product of the two probabilities, the probability of A times the probability of B. The probability of B cancels out of the numerator and denominator, and you're left with the probability of A. So this actually makes a lot of sense: if the events A and B are independent, then the probability of A, given that B has occurred, is simply the probability of A without knowledge of whether or not B has occurred. That is, the information about whether B has occurred is irrelevant to the calculation of the probability of A. This matches our intuition as to what independence means, and it's nice that the mathematics works out that way. In fact, in some probability texts, this is their definition of independence, as opposed to the definition that we gave earlier. So let's just work through the formula

with our die-roll example, just to convince ourselves that it's actually working. We want the probability of a one, given that the die roll is odd; so in this case B is a one, three, or five, and A is just a one. So the probability of A given that B has occurred is the probability of the intersection divided by the probability of B. In this case A is the set containing one, and B is the set containing one, three, and five, so A is a subset of B, and when you intersect the two you just get A by itself. So it works out to be the probability of A divided by the probability of B. The probability of A by itself is one-sixth, the probability of B by itself is three-sixths, and we get one-third. Exactly the answer that our intuition told us.
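This die-roll calculation can be checked in a few lines of Python. The sketch below is my own (the variable names are not from the lecture); it just applies the definition P(A | B) = P(A ∩ B) / P(B) directly:

```python
from fractions import Fraction

# A fair die: each face has probability 1/6.
p = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    # P(event) is the sum of the probabilities of its outcomes.
    return sum(p[outcome] for outcome in event)

A = {1}        # the roll is a one
B = {1, 3, 5}  # the roll is odd

# P(A | B) = P(A intersect B) / P(B)
print(prob(A & B) / prob(B))  # 1/3
```

Using exact fractions rather than floats means the answer comes out as exactly one-third, matching the intuition.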

Okay. So that ends our very brief discussion of basic conditional probability calculations using standard events and a generic discussion of probability. Next, we're going to talk about conditional densities, which will be our mathematical formulation of conditional probabilities for continuous random variables.

So welcome back, troops. We're going to be talking now about conditional densities, now that we know a little bit more about conditional probabilities. Conditional densities or mass functions are exactly the densities and mass functions that govern the behavior of one random variable conditional on the value that another random variable has taken.

So just to tie this down a little bit, let's let f(x, y) be a bivariate density or mass function that governs the probabilistic behavior of the random variables, capital X and capital Y. Now, I'm going to abuse notation slightly and let the letter f be the joint density, f(x) be the marginal density or mass function associated with X, and f(y) be the marginal density or mass function associated with Y. And it's probably not the best notation to use f for the joint density and f for the two marginals when they're all referring to different things. So, you know, just keep in mind that the arguments are kind of differentiating what I'm talking about here. This is admittedly very sloppy notation, but I'm using it anyway.

So just to remind you, the marginal density f(y) is the joint density f(x, y) integrated over x, or, if the random variables happen to be discrete, then f(y) is the joint mass function f(x, y) summed over x. In other words, if you want to know, regardless of what happened with respect to X, what the probabilistic behavior of the random variable Y is, you have to integrate over the random variable X, over all the potential values it can take with their probabilities, and then you get the marginal behavior of the random variable Y. Similarly, you get the marginal for X: f(x) is the integral of the joint density over y, or the sum of the joint mass function over y. Well, the conditional density is exactly, say for example, f(x | y), the joint density or mass function f(x, y) divided by the marginal f(y). It follows actually directly from the definition of conditional probabilities that we just gave you a couple of slides ago, and that we sort of all agreed made a lot of sense.
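In the discrete case, marginalizing is just summing over the other variable, and conditioning is just dividing by that sum. Here is a minimal sketch with a made-up joint mass function; the table values are my own invention, chosen only so that everything sums to one:

```python
from fractions import Fraction

# A hypothetical joint mass function f(x, y) on a 2-by-3 grid.
f = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8), (0, 2): Fraction(1, 4),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 8),
}
assert sum(f.values()) == 1  # a valid joint mass function sums to one

def f_y(y):
    # Marginal mass function of Y: sum the joint over x.
    return sum(pxy for (x, yy), pxy in f.items() if yy == y)

def f_x_given_y(x, y):
    # Conditional mass function: f(x | y) = f(x, y) / f(y).
    return f[(x, y)] / f_y(y)

# Each conditional mass function sums to one over x, as a density must.
for y in (0, 1, 2):
    assert sum(f_x_given_y(x, y) for x in (0, 1)) == 1
```

Dividing by the marginal is exactly the renormalization step: the slice of the joint at a fixed y doesn't sum to one on its own, and f(y) is what rescales it so that it does.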

Â 7:01

Let me elaborate on that point. In the discrete case, where X can only take so many values, say one, two, three, four, this definition of conditional probability is exactly the definition that we used for events, where A is the event that X = x and B is the event that Y = y. So there's no confusion; it exactly agrees with our definition of conditional probability. The continuous case is a little bit harder to motivate, as to why this is the definition. The event that X takes on a specific value, or that Y takes on a specific value, has probability zero for continuous random variables, and so that kind of fails our basic premise from conditional probability associated with events, that the probability of the event we're conditioning on has to be greater than zero. Now, note we're not talking about conditional probabilities here; we're talking about the construction of the conditional densities, which govern the behavior of conditional probabilities. So we haven't violated that rule from earlier, but it still kind of seems to break the spirit of the rule. So how do we get at this idea? How can we have a meaningful definition of the probabilistic behavior of a random variable, given that another random variable takes on a specific value?

Well, here's the motivation that I like. Imagine you define the event A that the random variable X is less than or equal to a specific value little x, and the event B that the random variable Y lies in the interval from y to y plus some small amount, say epsilon. Then A and B are events that have positive probability, and we can apply our standard definition of conditional probability to talk about the probability of the event A given that the event B has occurred, right? That would just follow from our standard definition. So, actually, let's formulate this. The probability of A given B is the probability of X being less than or equal to little x, given that Y is in the interval from y to y + epsilon. And now, in this case, nothing has probability zero; we can just directly apply the probability formula. I don't think this is terribly important for this class; I just wanted this argument to be here for those who want to see it. But then you can follow through the arithmetic, it's not hard calculus, and get that this construction yields the conditional distribution function associated with X given that Y = y, as we let epsilon get smaller and smaller.
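If you want to see this limiting argument numerically, the sketch below (my own construction, not from the lecture) conditions on the small-width event y < Y ≤ y + ε for the density f(x, y) = y e^(-xy - y) that appears later in the lecture, and watches the ratio approach the limiting conditional distribution function 1 - e^(-xy) as ε shrinks:

```python
import math

# Joint density used later in the lecture: f(x, y) = y e^(-xy - y), x, y > 0.
def joint(x, y):
    return y * math.exp(-x * y - y)

def integrate(g, a, b, n=400):
    # Simple midpoint-rule quadrature of g over [a, b].
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

def cond_cdf_approx(x, y, eps):
    # Event-based definition: P(X <= x | y < Y <= y + eps)
    #   = P(X <= x, y < Y <= y + eps) / P(y < Y <= y + eps).
    num = integrate(lambda t: integrate(lambda s: joint(s, t), 0.0, x), y, y + eps, n=50)
    den = integrate(lambda t: math.exp(-t), y, y + eps, n=50)  # marginal of Y is e^(-y)
    return num / den

x, y = 1.0, 2.0
limit = 1.0 - math.exp(-x * y)  # limiting conditional distribution function
for eps in (1.0, 0.1, 0.01):
    print(eps, cond_cdf_approx(x, y, eps), limit)
```

As ε shrinks, the conditioning event collapses onto Y = y, and the ratio of ordinary (positive-probability) event probabilities settles down to the conditional distribution function, which is the point of the argument.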

So as the conditioning event gets closer and closer to Y being the specific value y, we limit to the conditional distribution function associated with X. And then remember that density functions are derivatives of distribution functions, so if we take the derivative of this conditional distribution function, we get exactly the definition of the conditional density that we gave you before, f(x, y) / f(y). So if you're interested in this at this level, you can go through those arguments carefully, and, to be fair, they only cover the definition in the continuous case when we have differentiable distribution functions. But this is more than enough for our case. If you're interested in it at a deeper level even than this, where you have mixed continuous and discrete densities, then you can take an advanced probability course somewhere; but for our purposes, this is enough. And so, just to summarize, we have the conditional probability definition associated with events, which kind of governs all of our thinking about conditional probabilities: the probability of A given B is the probability of A intersect B divided by the probability of B. And then when we're talking about random variables, where we want the probabilistic behavior of a random variable X given that the random variable Y has taken on a specific value, it's the joint density or mass function divided by the marginal. It has a nice parallel with the probability associated with events, and here we've gone through the arguments to show how we get from those statements about events to this definition for mass functions and density functions. So conditional densities actually have a

very nice geometric interpretation. If you have a joint density f(x, y), that's a surface: f yields the z value, and (x, y) is the plane. So f(x, y), a joint density, is a surface, and the volume under the surface has to be one for it to be a joint density. Well, what does it mean to get the conditional density of X given that Y takes a particular value? The event that Y takes a particular value, let's say y equals five, is sort of like a plane at the point y = 5, and that plane slices through this surface and yields a function. That function is just f(x, y) evaluated at the point five, f(x, 5), okay? So we have this surface, we have this plane, the y = 5 plane that cuts through the surface, and then we have the function sitting on that plane, f(x, 5). And that is exactly the conditional density, with the exception that it doesn't integrate to one. So we have to normalize it by something so that it integrates to one, and that's exactly what we divide by there, f(5). Let's go through a specific example.

We have f(x, y) = y e^(-xy - y), for x and y both greater than zero. Now, for the marginal density associated with y, let's just perform the integral: we integrate the joint density function over x from zero to infinity, because we want the marginal associated with y. You can perform the integral; it works out to be e^(-y). And then our conditional density f(x | y) is the joint density divided by the marginal, f(x, y) / f(y). So just churn through the calculation and you get y e^(-xy). And so, if you wanted to know the conditional density governing the behavior of the random variable X given that Y is, say, three, that density would be 3e^(-3x); you just plug in y = 3. So now, if you plug in any possible value of Y, this function will give you the associated density function for the random variable X, conditioning on the information that Y takes on that specific value.
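We can sanity-check all three pieces of this example numerically. The quadrature below is my own rough sketch, a midpoint rule with a truncated upper limit standing in for infinity:

```python
import math

def joint(x, y):
    # f(x, y) = y e^(-xy - y), for x, y > 0
    return y * math.exp(-x * y - y)

def integrate(g, a, b, n=100000):
    # Midpoint-rule quadrature; the upper limit b stands in for infinity.
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

y = 3.0

# Marginal of Y: integrate the joint over x.  Should match e^(-y).
marginal = integrate(lambda x: joint(x, y), 0.0, 50.0)
print(marginal, math.exp(-y))

# Conditional density of X given Y = 3: joint / marginal, i.e. 3 e^(-3x) at x = 1.
print(joint(1.0, y) / marginal, 3.0 * math.exp(-3.0))

# A genuine density must integrate to one.
print(integrate(lambda x: joint(x, y) / marginal, 0.0, 50.0))
```

The first pair of numbers agrees to many decimal places, the second pair confirms that dividing by the marginal gives 3e^(-3x), and the last integral comes out essentially one, which is exactly the normalization the geometric picture called for.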
