0:01

Hi, my name is Brian Caffo and this is lecture two of Mathematical Biostatistics Boot Camp. In the last lecture, we covered probability and the basics of biostatistics at a very conceptual level. In this lecture, we are going to get much more down to specifics.

First, we are going to cover probability measures, which are mathematical functions, so we will talk about the specifics of those kinds of mathematical functions. Then in section two, we will talk about random variables. Random variables are just like any other variables that you may have encountered before, say in calculus, with the exception that they are random: they can take a lot of different values. In section three, we're going to talk about probability mass functions and probability density functions. These are mathematical functions that map probabilities to random variables. In section four, we're going to talk about so-called cumulative distribution functions, CDFs, the very associated things called survival functions, and quantiles. And then we'll wrap up with a brief summary.

So a probability measure is the function that's going to govern the rules of probability for us.

And there are basically three rules that a probability measure has to follow. Every probability textbook will give you these three rules, or something equivalent to them.

There is interesting history behind these three rules. The Russian mathematician Kolmogorov, who is generally considered the father of modern probability, basically distilled everything that we thought a probability should have to satisfy down to the minimal set of rules you could possibly have. If you delete any of these rules, you wind up with something that fails in some fundamental way to be a probability; and if you add any other rules, they turn out to be excessive. So it's really kind of an interesting collection of research he did. It's also interesting to note that [inaudible] tried to do something else, which is to figure out what exactly it is we mean by probability. He found that problem to be very hard, and I think if you look into it, the theory of exactly what randomness is, and exactly what a probability measure is, is a very deep problem; philosophers are still debating it, and I question whether it will ever reach a resolution. However, one thing that's much less controversial is what rules probability has to follow, and there Kolmogorov just nailed it; it's done. So let's go over these three rules.

So a probability measure, P, the letter P here in italics, is a function that maps events, which are subsets of the sample space, to numbers between zero and one; that's item one here. So events E have to be mapped to numbers between zero and one, and probability is a function that operates on sets. The second item here says that the probability of the whole sample space has to be one. Basically, what this means is that something has to happen; the sample space has to enumerate everything possible that can happen. So, for example, if you are flipping a coin, the coin can either be heads or tails. The sample space is heads or tails, and when you flip the coin, one of those two things has to happen. The probability of one of them happening is one.

The coin can't land on its side. If you want to allow the coin to land on its side, then the sample space has to be heads, tails, and landing on its side.

The third statement, and we will talk a lot about the third statement because we are giving you an incomplete version of it, says that if two events are mutually exclusive, and recall that events are mutually exclusive if they have no intersection, then the probability of their union is the sum of their probabilities. That is, if two events E1 and E2 are mutually exclusive, then the probability of E1 union E2 is the probability of E1 plus the probability of E2.

So as an example, we just talked about coin flipping, and we said the probability of getting either a head or a tail has to be one. Let's talk about that in the context of rule three. If E1 is the event that you get a head and E2 is the event that you get a tail, then the probability of E1 union E2, the probability of getting a head or a tail, winds up being the probability of getting a head, let's say 0.5, plus the probability of getting a tail, which is 0.5, which adds up to one, exactly what we know it has to be.

So about part three, the third rule that we talked about on the previous slide: I said that there was some concern over it not being complete, so I'm going to elaborate on what I mean by that in this slide.

First of all, let's note the following fact. Part three of the previous slide, the fact that if you have two mutually exclusive events then the probability of their union is the sum of their probabilities, pretty easily extends to so-called finite additivity: instead of two events, if you had three, or four, or five, or, let's just say, n events, the probability of their union equals the sum of their probabilities.

So in this case, I have that the probability of the union of a collection of mutually exclusive events Ai equals the sum of their probabilities. That pretty directly follows from the previous definition. Just to give you a sense of how it works: if you had three events, say A1, A2, and A3, and they are all mutually exclusive, then the probability of A1 union A2 union A3 is the probability of A1 plus the probability of A2 union A3, right, because A1 is mutually exclusive from the union of A2 and A3. And then that second probability, the probability of A2 union A3, is again the probability of the union of two mutually exclusive events, so it is the probability of A2 plus the probability of A3. You can formalize this with mathematical induction if you want. So at any rate, the rule that I gave you implies so-called finite additivity. And it seems like maybe that should be enough to cover everything. Well, probabilists have thought very hard about this, and they said: maybe it should be countable additivity, so that instead of n, the union should go up to infinity.
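To get a feel for countable additivity, here is a small numerical sketch of my own, not from the lecture slides: for a fair coin, the events "the first head occurs on flip k" are mutually exclusive, their probabilities are (1/2)^k, and countable additivity says those probabilities must sum to one over all k. We can only add finitely many terms in code, so we check that the partial sums approach one.

```python
# Countable additivity sketch: the events "first head on flip k" are
# mutually exclusive with probabilities (1/2)^k, and these should sum
# to 1 as k goes to infinity. Here we add the first 50 terms.
partial_sum = sum(0.5 ** k for k in range(1, 51))
print(partial_sum)  # just below 1, off by 2^-50
```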

And it's not the case that the definition we gave implies countable additivity, the statement that if you take an infinite collection of mutually exclusive events, the probability of the union is the sum of the probabilities, which requires ideas of limits and other things that we're not going to cover much in this class. So at any rate, finite additivity does not imply countable additivity, but of course countable additivity implies finite additivity. In standard probability classes, the more theoretical probability classes, they make quite a bit of hay out of this distinction; they discuss it a lot.

And the general definition requires countable additivity rather than finite additivity. If you take a more advanced, measure-theoretic probability class, they will deal with this issue at length. In this class, this will be the last time we discuss it; in general, finite additivity will work just fine for us. On the next slide, we are going to talk in more detail about what the probability function operates on. Again, it's going to be a rather important but maybe unnecessary detail for this class, another thing that we cover very briefly and then tend not to think about for the remainder of the lectures. Recall that our probability function operates on events, which are subsets of the sample space, and maps them to numbers between zero and one. So we need an appropriate domain for our function, and our domain is not an event, it's a collection of events.

So let me go through an example to make this idea a little more clear. Let's suppose the sample space is simply the numbers one, two, and three; imagine somehow you had a three-sided die that you were rolling. Then the probability function operates on all possible events that are subsets of that sample space. So in this case: the null event; the event that you get a one; a two; a three; a one or a two; a one or a three; a two or a three; or the whole sample space, a one, two, or three.
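As a small illustration of my own, we can enumerate that domain, all eight subsets of the sample space {1, 2, 3}, in Python:

```python
from itertools import chain, combinations

omega = (1, 2, 3)  # sample space for a three-sided die

# All possible events: every subset of omega, from the null event
# (the empty tuple) up to the whole sample space.
events = list(chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1)
))
print(events)
# [(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
```

With a finite sample space of size n there are always 2^n such events, which is why the finite case is easy.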

And this is fine. Pretty much whenever you have a finite set, the probability function will operate on all possible subsets of the sample space; in this case, we're using the script letter F to denote this so-called domain. When the sample space is a continuous set, it actually gets a lot harder, and you can no longer say things like "the probability operates on the set of all possible subsets of a continuous set." It turns out that that is an incredibly deep mathematical problem. The mathematician Cantor thought about measure and sets in a very deep way, and if you want to read about an interesting character in the history of mathematics, you should read about Cantor. He came up with interesting sets that, for example, you can't reasonably include in the domain of a probability. So in this class we're not going to think about this at all, but I wanted to raise it for those students who go on to take some of these more advanced classes, so that you'll be prepared for some of these admittedly kind of strange ideas that come up when you try to talk about the set of sets that probabilities operate on.

For our purposes, when our sample space is a continuous set, we are mostly going to be concerned with things like intervals or unions of intervals, and in that case the definitions are very easy. For our definition of the domain that the probability operates on, we are just going to assume that anything we can think of is just fine, and since none of us are Cantor, we probably won't think of anything too crazy. That definition works very well for this class.

On this slide, we're going to give a laundry list of properties that a probability function has to have by virtue of the three rules in its definition. You should find it kind of interesting that those three rules imply all of these things that we know probabilities have to satisfy. So take this first bullet here: the probability of the null set is zero. Basically, the probability that nothing happens is zero. So if you say you're going to roll a die, you actually roll a die; if you say you're going to flip a coin, you actually flip a coin. That's basically what the probability of the null set being zero says. The second bullet says the probability of an event is one minus the probability of its complement. In other words, for example, if E is the event that you get a head when you flip a coin, the probability of getting a head is one minus the probability of getting a tail. That's of course true for a fair coin, where the probability of a head is 0.5 and the probability of a tail is 0.5.

But let's suppose you have an unfair coin; maybe you glued together a nickel and a US dime and made a funny-shaped coin where you didn't know whether the probability of a head was 0.5. Let's suppose the probability of a head in that case was 0.3. Well, this rule says that if the probability of a head is 0.3, then the probability of a tail has to be 0.7.
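The complement rule for that lopsided coin can be written out in one line; a trivial sketch of my own:

```python
# Complement rule: P(E) = 1 - P(E^c).
# For the glued-together coin with P(head) = 0.3,
# "tail" is the complement of "head".
p_head = 0.3
p_tail = 1 - p_head
print(p_tail)  # 0.7
```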

The next bullet says that the probability of the union of two events is the sum of their probabilities minus the probability of their intersection. If the events are mutually exclusive, the sum of the probabilities is all we would need; but if they are not mutually exclusive, we have to subtract off the intersection. The intuition behind this statement is something like this. When you add the probability of A, you've added the part of A that intersects B and the part of A that does not intersect B. Then you've added the probability of B, which includes the part of B that intersects A and the part of B that does not intersect A. So you have added the intersection of A and B twice: once when you added the probability of A, and once when you added the probability of B. You only want to add it once, so you subtract it out. That's how the rule works. The next bullet point is a pretty simple point: if A is a subset of B, then the probability of A is less than or equal to the probability of B. This is analogous to saying that if I am rolling a die, and A is the event that I get a one, and B is the event that I get a one or a two, then the probability of getting a one is less than the probability of getting a one or a two. So I think this rule makes a lot of sense. From De Morgan's laws, we get that the probability of A union B is one minus the probability of A complement intersect B complement.

The next bullet point is kind of along the lines of subtraction. The set A intersect B complement is sort of like subtracting B out of A: the part of A that has nothing to do with B. The probability of A with B removed is the probability of A minus the probability of A intersect B. So this works out to be a nice rule: set-level subtraction is equivalent to subtracting the probabilities.
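These last two rules are easy to sanity-check by counting equally likely outcomes. Here is a small sketch of my own for a fair six-sided die, where the events A and B are made up purely for illustration:

```python
from fractions import Fraction

omega = set(range(1, 7))  # outcomes of a fair six-sided die

def prob(event):
    # With equally likely outcomes, P(event) = |event| / |omega|.
    return Fraction(len(event), len(omega))

A = {1, 2, 3, 4}  # roll a four or less
B = {2, 4, 6}     # roll an even number

# Inclusion-exclusion: P(A u B) = P(A) + P(B) - P(A n B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)

# Set subtraction: P(A n B^c) = P(A) - P(A n B)
assert prob(A - B) == prob(A) - prob(A & B)
print(prob(A - B))  # 1/3
```

Using exact fractions avoids any floating-point fuzz in the equalities.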

The next bullet talks about the probability of the union of events again. It says the probability of the union of a collection of events is less than or equal to the sum of the probabilities of the events. Again, if the events are mutually exclusive, then the probability of the union has to equal the sum of the probabilities; so this rule doesn't violate that rule whatsoever, but it also accounts for the times when the events are not mutually exclusive. The final rule talks, again, about unions of events. In this case, the probability of the union of events is at least as big as the maximum of the collection of probabilities. Again, this rule holds whether or not the events are mutually exclusive, and there's very easy intuition behind it. The union is everything that's in any of the events E1 to En, so it contains each of them, and therefore its probability has to be at least as big as the probability of any of its component events. I think that makes quite a bit of sense.

So let me give you an example. Go back to our die roll: if E1 is the event that you get a one, E2 is the event that you get a two, and E3 is the event that you get a three, then the probability on the left-hand side of the equation is the probability that you get a one, two, or three, and the right-hand side is the maximum of the individual probabilities. If you are talking about a standard die, the probability of a one is 1/6th, the probability of a two is 1/6th, and the probability of a three is 1/6th, so the maximum of them is 1/6th. On the left-hand side, the probability of the union is the probability of a one, two, or three, which is one half. So one half is definitely bigger than 1/6th.
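That die-roll comparison can be written out directly; a tiny sketch of my own:

```python
# For a fair six-sided die, let Ei be the event "roll an i" for i = 1, 2, 3.
# The rule says P(E1 u E2 u E3) >= max of the individual probabilities.
p = {1: 1/6, 2: 1/6, 3: 1/6}

# These events are mutually exclusive, so the union's probability
# is just the sum of the individual probabilities.
p_union = sum(p.values())
assert p_union >= max(p.values())
print(p_union)  # 0.5
```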

So let me give you an example of one of these proofs. Let's take a simple one: the probability of an event is one minus the probability of its complement. Consider line one. Recall that the probability of the whole sample space is one. But the sample space, for any event, is equal to the union of that event and its complement; so omega equals E union E complement. Then consider the next line. An event is always mutually exclusive with its complement: something cannot simultaneously occur and not occur. So E and E complement are mutually exclusive events, and we can take the probability of the union and turn it into the sum of the probabilities, the probability of the event plus the probability of the complement. And then that's simply a restatement of what we want to prove: one equals the probability of the event plus the probability of its complement.

Let's do a more complex example of the consequences of the probability rules. Recall that we discussed that the probability of the union of a collection of events is less than or equal to the sum of the probabilities, and that the inequality becomes an equality if the events are mutually exclusive. Let's prove this using mathematical induction. The way mathematical induction works is that you prove the statement for some small case, say one or two events; then you assume it's true for n minus one, and prove that it's then true for n.

That's how mathematical induction works. So let's consider just two events, the probability of E1 union E2. By one of the other consequences of the probability rules that we investigated, that's equal to the probability of E1 plus the probability of E2 minus the probability of E1 intersect E2; here I'm assuming that we've gone ahead and proved that one as well. Now, this final term that's subtracted off, minus the probability of E1 intersect E2, is a number that has to be non-negative; remember, probabilities have to be between zero and one. So if we throw away that final term, what's left can only get bigger, right? If we're subtracting off a non-negative number and we throw it away, then the expression can only get bigger. So then we've established the result for the case when we have two events.

Now let's assume the result is true when we have n minus one events, and let's consider n events. We want to demonstrate that the probability of the union of the Ei is less than or equal to the sum of the probabilities. So let's write out the probability of the union of the Ei as En union with the union of the rest of them. The union of the rest of them, i equals one to n minus one, is a single set. So we have just two sets, En and the union of the remainder, and we already worked out the result for two sets. So we can say that the probability of the union of E1 to En is less than or equal to the probability of En plus the probability of the union of the remainder. Now consider the next line. There, we keep the probability of En, and by our induction hypothesis, the fact that we assumed the statement is true for n minus one events, we can replace the probability of the union of the remaining events with the sum of their probabilities, and in doing so we have only made the expression bigger. So we can maintain the inequality. Then, just collecting terms, we have that this is the sum of the probabilities. And just to give you a sense of the notation I use: when I write equals on this last line, I mean it equals the previous line, not that it's equal to the first line. So it's less than or equal to, less than or equal to, and then equal to, implying that the final statement is less than or equal to the first statement, but equal to the previous lines. That's notation I commonly use.
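The inequality we just proved is easy to check numerically by counting outcomes on a fair die; this is my own sketch, with deliberately overlapping events so that the inequality is strict:

```python
from fractions import Fraction

omega = set(range(1, 7))  # fair six-sided die

def prob(event):
    return Fraction(len(event), len(omega))

# Three overlapping (not mutually exclusive) events:
E1, E2, E3 = {1, 2}, {2, 3}, {3, 4}

# Boole's inequality: P(E1 u E2 u E3) <= P(E1) + P(E2) + P(E3).
lhs = prob(E1 | E2 | E3)
rhs = prob(E1) + prob(E2) + prob(E3)
assert lhs <= rhs
print(lhs, rhs)  # 2/3 1
```

Because each pair shares an outcome, the left side (2/3) is strictly smaller than the right (1).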

So you should be able to prove all of the probability statements that we outlined on the previous slide. For this particular one, let's go ahead and take a step back from the mathematics and try to put some of this in context. The National Sleep Foundation reports that around three percent of the American population has sleep apnea; this is a sleep disorder where the upper airway collapses. They also report that around ten percent of the North American and European population has restless leg syndrome; for the purpose of our discussion, let's just assume that ten percent of the American population has restless leg syndrome. Similarly, they report that 58 percent of adults in the US experience insomnia. So imagine you were a sleep physician and you wanted to know the probability that a random American has at least one of these three sleep disorders. Can you simply add these probabilities, three percent, ten percent, and 58 percent, and conclude that 71 percent of people have at least one of these sleep problems?

This question is nothing other than a restatement of the probability relationship that we just proved. Here I am using A instead of E, but maybe that's a good thing to do, just so you get used to not using the same letter for everything. So let A1 be the event that the person has sleep apnea, A2 be the event that the person has restless leg syndrome, and A3 be the event that the person has insomnia. I'm going to gloss over the details, but the probability that a person has at least one of these diseases is the probability of the union, A1 union A2 union A3. Well, that's only equal to the sum of the probabilities when A1, A2, and A3 are mutually exclusive. Otherwise, it's the probability of A1 plus the probability of A2 plus the probability of A3, and we have to subtract out other things. In this case, I give you the exact equation relating the probability of the union of three events to the probabilities of A1, A2, and A3: the sum works out to be 0.71, but then there's all the other stuff. You subtract the probabilities of A1 intersect A2, A1 intersect A3, and A2 intersect A3, and then you have to add back in the probability of the triple intersection, A1 intersect A2 intersect A3.
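The three-event formula can be checked numerically. In the sketch below, the marginal probabilities are the lecture's; the overlap values are made up purely for illustration, since the true intersections are unknown here:

```python
# Inclusion-exclusion for three events:
# P(A1 u A2 u A3) = P(A1) + P(A2) + P(A3)
#                   - P(A1 n A2) - P(A1 n A3) - P(A2 n A3)
#                   + P(A1 n A2 n A3)
p1, p2, p3 = 0.03, 0.10, 0.58  # apnea, restless leg, insomnia (lecture's numbers)

# HYPOTHETICAL overlaps, invented only to show the correction terms matter:
p12, p13, p23, p123 = 0.01, 0.02, 0.06, 0.005

p_union = p1 + p2 + p3 - p12 - p13 - p23 + p123
print(round(p_union, 4))  # 0.625 with these made-up overlaps, not 0.71
```

Any nonzero overlap pulls the answer below the naive sum of 0.71.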

I would suggest you go through and figure out exactly why this formula works. But the point is that the other stuff is non-trivial, and it's always there unless A1, A2, and A3 are mutually exclusive; so you can't simply add the probabilities. And in fact, in this case, from a scientific perspective, and we have been talking about it from a mathematical perspective, it's probably the case that there's a non-trivial intersection of people with sleep apnea and restless leg syndrome, a non-trivial intersection of people with restless leg syndrome and insomnia, and so on. So that 0.71 is probably not close at all. And that ends our whirlwind tour of the basics of probability mathematics. Next, we're gonna talk about random variables.