
So we've described some very basic examples of random variables. What we need now is a mathematics of random variables so we can use them. And we have a mathematics of probability, and we've acknowledged that we're at least willing to think of some kinds of variables as if they're random. We'd like to put those two ideas together.

So we need functions that map the rules of probability onto random variables, and for discrete random variables the kind of functions we're talking about are so-called probability mass functions. A probability mass function is simply a function that takes the values a random variable can take and maps them to their associated probabilities. So for a die, p of one, the probability of rolling a one, would be one sixth, for example. And it turns out quite a few functions satisfy the definition of being a probability mass function.

In fact, you only have to satisfy two rules if you'd like to be a probability mass function. The first rule is that you have to be greater than or equal to zero for all of the arguments, where here x ranges over the collection of possible values that the random variable can take. And the second rule is that if you sum over all possible values, then you get one. This is exactly analogous to our probability statement that the probability of the whole sample space has to be one, but here we've put it in terms of a probability mass function.
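The two rules can be checked directly for the die example. Here's a minimal Python sketch (my illustration; the course's own code is in R):

```python
# A minimal sketch of checking the two PMF rules for a fair die.
die_pmf = {x: 1/6 for x in range(1, 7)}  # p(x) = 1/6 for x in {1,...,6}

# Rule 1: every probability is nonnegative.
assert all(p >= 0 for p in die_pmf.values())

# Rule 2: the probabilities sum to one (allowing floating-point slack).
assert abs(sum(die_pmf.values()) - 1) < 1e-12
```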

I want to talk a little bit about this notation. Notice here I have this small x, and when we defined random variables, two pages previously, we used the capital X. This is very common, and maybe slightly unfortunate, notation, but it's used everywhere, so you might as well get used to it instead of fighting it: we use an uppercase letter, typically, to represent the random variable as a conceptual entity. So if we say capital X, we're talking about a die roll that we could have. When we use a lowercase x, or a lowercase y, or a lowercase letter of any sort, we tend to be talking about realized values of the random variable. So the lowercase x should be something that you can plug a number into, where capital X is a conceptual random variable. It's a conceptual flip of a coin; it's a conceptual roll of a die. Lowercase x is one or two or three or zero. Okay, it's slightly unfortunate notation, and it takes a little bit of getting used to, but I think everyone who works in statistics or probability has gotten used to it, and everyone does it, so you might as well do it too.

Let's go over an example of constructing a probability mass function, taking the simplest possible example: a coin flip. So let's let X be the result of a coin flip, where zero represents tails and one represents heads, and let's assume that the coin is fair. So we want a function that maps zero to one half and one to one half, and there are infinitely many ways you could write down that function. Well, we're going to pick one. Here we write it as one half raised to the power x, times one half raised to the power one minus x. And notice that if you plug in x equals zero, you get one half, and if you plug in x equals one, you also get one half.

Now let's go to a slightly more complicated example, where we assume that the coin is potentially biased, i.e., that it's not fair. So let's let theta be the probability of a head, in this case expressed as a proportion between zero and one. Just as an example, imagine if theta was 0.3 instead of a half; then we would think that the probability of a head was 0.3 and the probability of a tail was 0.7. But let's leave it as theta for right now. So we want a function that says the probability of a zero is one minus theta and the probability of a one is theta. And then we see that our function here, theta to the x times one minus theta to the one minus x, exactly satisfies these properties.

This is another common notation in the field of statistics: Greek letters like theta represent the things we don't know that we would like to know. So imagine if you had a coin and you didn't know whether or not it was fair; we would represent that unknown probability of a head as theta.

So I want to give you a sense of where we're going. In this case, the probability mass function is the entity that governs the population of coin flips. And so, if we want to know theta, we're going to collect data to estimate it, and then evaluate the uncertainty in that estimate. And the way we're going to evaluate the uncertainty in that estimate is using this probability distribution. So all the probability distributions we're going to talk about are conceptual models of populations, and they're the entities that are going to tie our data to the population. At any rate, right now this may sound a little heavy, and we'll discuss it in much more detail throughout the entire class, but the one rule I want you to remember right now is that unknown things that we want to know, like in this case the probability of a head, are generally denoted by Greek letters. These are usually called parameters.

I also want to note one other thing. Why is it, among all the possible ways that we could have written out this probability mass function, that we chose theta to the x times one minus theta to the one minus x? There are lots of different ways we could have done this; you can try to figure some of them out yourself. Well, it turns out, and we'll discuss this at length, that in probability, multiplying is very useful. And so we want probability mass functions that make multiplication very easy. If we take things and raise them to powers, then multiplying becomes easy, and that's a general rule. You'll see later on why this is the case, but at any rate, this is why we choose this particular form of the probability mass function when we could write it in so many different ways. And I want to say, people have thought about this a lot, and this is definitely the most useful way to write out this particular probability mass function.

So consider again the unfair coin.

Our probability mass function satisfies p of zero equals one minus theta and p of one equals theta. Let's just go through the exercise of proving to ourselves that this is in fact a probability mass function. It's nonnegative, because it's one minus theta for zero and theta for one, and in this case theta is between zero and one; so it's greater than or equal to zero for x equal to zero and one. And then the sum of the probabilities, the probability of zero plus the probability of one, is in this case theta plus one minus theta, which is one.

So it satisfies the two rules that probability mass functions have to satisfy.

That covers the principal entity we're going to use to model discrete random variables: probability mass functions. So now we need to cover the principal entity we're going to use to model continuous random variables, which is called the probability density function. Probability density functions are abbreviated PDF, by the way; here it stands for probability density function, not portable document format, which is what lots of people think of when they hear PDF, but in statistics no one thinks of PDFs that way. I want you to remember one very important rule, and I put it in italics to make sure everyone remembers it. By the end of the course this will be second nature to you, but if you haven't seen it before, it might seem a little odd. The way probability density functions work is that areas under probability density functions correspond to probabilities for the random variable. And there's definitely one undisputed king of all PDFs, and that is the so-called bell curve.

So if you've ever wondered what a bell curve is, if you hear it talked about a lot, the so-called normal density function, you might wonder what in the world a bell curve is accomplishing. Well, areas under bell curves correspond to probabilities.
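As an aside, here's a small Python sketch of reading a probability off a bell curve (mine, not the lecture's; it uses the standard closed form for the normal CDF in terms of the error function):

```python
# Probabilities as areas under the standard normal bell curve.
from math import erf, sqrt

def normal_cdf(x):
    # CDF of the standard normal, via the closed form with erf.
    return 0.5 * (1 + erf(x / sqrt(2)))

# P(-1 <= Z <= 1): the area under the bell curve between -1 and 1,
# the familiar "about 68%" of a standard normal.
area = normal_cdf(1) - normal_cdf(-1)
assert abs(area - 0.6827) < 0.001
```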

So if you're modeling something as if the population it belongs to follows a bell curve, then you're saying that the probabilities associated with that random variable are governed by areas under that bell curve. That's just one example of a PDF; there are lots of different kinds of PDFs. Just like probability mass functions have to follow two rules, probability density functions have to follow two rules to be valid probability density functions. They have to be nonnegative for all the possible values that the random variable can take, which is usually called the support, and their integral has to be one.

I would also make a small point here. We define probability density functions as if they operate on the whole real line. So even if your random variable can only take values, say, between zero and two, like we talked about earlier with the pencil experiment, we define the probability density function as zero below zero and zero above two, so that there's no associated probability there, but we've defined the density on the whole real line, and we define its integral from minus infinity to plus infinity. And I think in this class we'll tend to be a little bit fuzzy about this: sometimes we'll operate on minus infinity to plus infinity, and other times we'll just write out zero to two, discarding all the region where the function is zero, and I hope from the context it will be clear what we're doing. This final property, property two here, that the integral over the whole real line of the probability density function has to be one, is again simply saying that the random variable has to take some value, that it has to lie in some interval on the real line.

Let's go through a specific example of a PDF, and let's put it in a context. Suppose that the time in years from diagnosis until death with a specific kind of cancer follows a density that looks like this: f of x equals e to the negative x over five, divided by five, for x greater than zero. The "x greater than zero" is contextually clear, because you can't have negative time from diagnosis, and the person is presumably alive at the time of diagnosis. This is a very restricted example of a density that's commonly used in these sorts of analyses of things like survival times. It's called the exponential density function.

And again here you see that we have f(x) written as e to the negative x over five, over five, for x bigger than zero, and zero otherwise. Like I talked about on the previous slide, we often just ditch that zero and talk about f(x) as being the kernel of the function, and then we either explicitly write, or sometimes fudge a little bit, that x has to be greater than zero, when it's clear from the context that that has to be the case. In this case it would be clear from the context. Is this a valid density? Could we model survival time after diagnosis with this density?

Well, first of all, we know that the function is positive, because e raised to any power is always positive. Then let's just check whether or not it integrates to one. We want the integral from minus infinity to plus infinity, but like we said, all of the meat of the distribution starts at zero and goes to infinity, so let's just take the integral from zero to infinity of f(x) dx. In this case the antiderivative is negative e to the negative x over five, which, when evaluated from zero to infinity, yields one.
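The same integral can also be sanity-checked numerically; a small Python sketch (mine, not part of the lecture):

```python
# Numeric check that the exponential density f(x) = exp(-x/5)/5
# integrates to one over x > 0.
from math import exp

def f(x):
    return exp(-x / 5) / 5

# Riemann-sum approximation on [0, 100]; the tail beyond 100 is negligible.
dx = 0.001
total = sum(f(i * dx) * dx for i in range(100_000))
assert abs(total - 1) < 1e-3

# The antiderivative route from the lecture: -exp(-x/5) evaluated
# from 0 to infinity gives -0 - (-1) = 1.
```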

Let's go through an example of using this probability density function to assign probabilities. Imagine that we model this population as if it followed this specific exponential probability distribution, and imagine someone asked us the question, "What's the probability that a randomly selected person from this population survives more than six years?" So if X is the conceptual value that a random person takes, we want to know: what's the probability that X is greater than or equal to six, as represented by this probability statement? Remember again the golden rule for probability density functions: areas under the curve correspond to probabilities. So if we want the probability that X is greater than six, we want the integral from six to infinity of the probability density function, and you can go through the calculus here to find that it works out to be about 30%.

In the statistical programming language R, you can do this automatically; it just does the integral for you, and you write pexp(6, 1/5, lower.tail = FALSE) for the fact that we want the probability of six or larger. The one fifth corresponds to this parameter five that you see in the exponential distribution. lower.tail = FALSE means that we want the probability of being larger than six rather than the probability of being smaller than six. So lower.tail = TRUE will give you six or smaller, and lower.tail = FALSE will give you six or larger.

I want to elaborate on that point, by the way. For a continuous random variable, the probability that it takes any specific value is in fact zero.

Now that seems strange, but it's true. Remember, areas under probability density functions correspond to probabilities. So what's the area of a line? It's zero. Now, you might say that doesn't make any sense at all: specific values have to have probabilities, because we see specific values when we actually observe variables. The point is that our probability density function is a model, and it's defined on continuous random variables. Continuous means measured to infinite precision, and when we observe things, we never measure them to infinite precision; we only measure them to finite precision. And probability density functions are perfectly happy saying the probability that x is between 5.99 and 6.01 is such-and-such, assigning a perfectly valid probability to that. But the probability that it is exactly six is zero, because remember, exactly six means 6.0 followed by an infinite trail of 0s, or 5.99 followed by an infinite trail of 9s.

Either way, that's the idea behind what probability density functions are getting at: they're modeling truly continuous random variables. So just remember that when we observe data, we of course measure it with finite precision, but our continuous model is exactly that, a model. We find it far more useful in many circumstances to model random variables as if they were truly continuous than to account for all the potential specific values they could take.
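That distinction between intervals and single points can be seen numerically; a quick Python sketch (my illustration, using the exponential CDF 1 - e^(-x/5) from this example):

```python
# Intervals get positive probability; single points get zero.
from math import exp

def cdf(x):
    # P(X <= x) for this exponential: 1 - exp(-x/5) for x >= 0.
    return 1 - exp(-x / 5)

# An interval around six gets a small but positive probability...
interval_prob = cdf(6.01) - cdf(5.99)
assert interval_prob > 0

# ...but the "interval" [6, 6], a single point, gets exactly zero.
point_prob = cdf(6.0) - cdf(6.0)
assert point_prob == 0
```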

So in this specific example, a person will only measure how long they survive to the year, maybe to the month, maybe to the day, maybe to the hour, to the minute, to the second, but probably not much further than that. So we're only going to measure to finite precision. Nonetheless, it's still much more useful to model that as if it were continuous, because we don't want to have to assign probabilities to every single value; we want to assign a general function. And that's why continuous random variables are so intrinsically useful.

So the belabored point I'm trying to make here, by the way, is that whether you write the probability of X being greater than or equal to six, or the probability of X being strictly greater than six, in this case it doesn't change the calculation whatsoever. You get 0.301 either way. And so it also doesn't make a difference in pexp for our example: whichever tail you specify, whether or not that tail includes the point six doesn't matter; pexp doesn't care about that. However, for discrete random variables, it makes a big difference, right? Because specific values have actual probabilities assigned to them; a die can take the value one, two, three, four, five, or six. So in R, if you're using these probability functions, pexp for probabilities from the exponential distribution, pbinom for probabilities from the binomial distribution, ppois for probabilities from the Poisson distribution, pgamma for probabilities from the gamma distribution, R follows that rule pretty neatly.
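For comparison, here's a small pure-Python sketch (my illustration; the lecture's own code is R) of the continuous-versus-discrete endpoint distinction just described:

```python
# Continuous case: P(X > 6) = P(X >= 6) = exp(-6/5) for the exponential
# with rate 1/5; including the endpoint changes nothing.
from math import exp

p_continuous = exp(-6 / 5)
assert abs(p_continuous - 0.301) < 0.001

# Discrete case: for a fair die, P(X > 3) and P(X >= 3) differ,
# because the point x = 3 carries probability 1/6.
p_strict = 3 / 6     # P(X > 3) = P(X in {4, 5, 6})
p_inclusive = 4 / 6  # P(X >= 3) = P(X in {3, 4, 5, 6})
assert abs(p_inclusive - p_strict - 1/6) < 1e-12
```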

If it's a discrete random variable, you have to be careful about whether or not you're including the six. For a continuous random variable, you can be very sloppy about it. So here I'm just depicting the area that we're calculating: this grey area corresponds to survival times from six to infinity. It's simply the integral that we're actually calculating, and I'll put the R code to generate exactly this figure in the files for the course.

Â