0:02

So the variance of a random variable is another expected-value property of a

distribution. Recall that the mean measures the center of a

distribution; the variance measures how spread out it is.

If X is a random variable and it has mean μ — that is, the expected value of X equals μ — then the

variance of X is defined as the expected value of the quantity X minus μ, the whole thing

squared. So what does that mean?

So, the expected value is in essence an average, right? It's sort of the

typical value that the random variable takes, the center of the distribution.

The variance, on the other hand, is sort of the average squared distance of the

random variable from its mean. What that implies is that random variables with

higher variances come from distributions that are more spread out than ones that

have a lower variance.

>> That makes sense, and I'm just kind of thinking of that fulcrum point still.

>> Yeah.

>> How things are more spread out [inaudible].

>> Yeah, exactly, exactly, great.

>> Alright.

>> And so, let me just remind you what this variance formula means again.

If you were to take the random variable X and subtract off its population mean,

that turns out to be the exact same distribution, just with all the possible

values of X shifted by μ, so that it has mean zero. Then if I were to take that

random variable, figure out what the distribution of its square is, and take

the expected value of the resulting random variable — that's the variance. And

that's hard, so we don't ever calculate the variance that way.

We typically calculate the variance by a convenient shortcut, and that is that

the variance of a random variable is the expected value of X squared minus the

expected value of X, quantity squared; and this expected value of X quantity

squared is just μ squared.

This shortcut formula, then, requires you to calculate the expected value of

X squared. Typically the more convenient way to do that is, if X is discrete,

to use the summation of t squared times p of t, where p is the probability mass

function; or, if X is continuous, the integral of t squared times f of t, where

f is the density function.

It would be nice for you, as an exercise, to show that the original variance

calculation equals the shortcut variance calculation, just by expanding the

square and using the expected value rules.

It would be convenient if the variance operator were also linear. It's not.
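For reference, the exercise mentioned above works out in a few lines, using linearity of expectation and the fact that μ = E[X] is a constant:

```latex
\begin{aligned}
\mathrm{Var}(X) &= E\left[(X - \mu)^2\right] \\
                &= E\left[X^2 - 2\mu X + \mu^2\right] \\
                &= E[X^2] - 2\mu\,E[X] + \mu^2 \\
                &= E[X^2] - 2\mu^2 + \mu^2 \\
                &= E[X^2] - (E[X])^2 .
\end{aligned}
```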

As an example: if you pull a constant multiplier out of the variance, it gets

squared. So the variance of a times X — where a is not a random variable, just

a constant — is a squared times the variance of X.
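Here's a quick numerical sketch of that scaling property (my own illustration, not from the lecture), using Python's standard `statistics` module and a made-up list of values:

```python
import statistics

# A hypothetical sample, just for illustration.
x = [1, 2, 3, 4, 5, 6]
a = 3

# Population variance of x and of a*x.
var_x = statistics.pvariance(x)
var_ax = statistics.pvariance([a * xi for xi in x])

# Pulling the constant out squares it: Var(aX) = a^2 * Var(X).
print(var_x)          # ≈ 2.9167
print(var_ax)         # ≈ 26.25
print(a**2 * var_x)   # ≈ 26.25, same as var_ax
```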

The square root of the variance is called the standard deviation, and the

reason we often use the standard deviation instead of the variance is that the

standard deviation has the same units as the random variable. So if X as a

random variable has units in inches, the variance has units of inches squared,

whereas the standard deviation has units of inches. It's often quite convenient

to talk about the spread in the same units as the random variable itself, so

the standard deviation is a common summary of the variance.

Well, let's calculate a variance. What's the variance of a toss of a die?

In this case, the expected value of X is 3.5 — we've covered that already.

And the expected value of X squared — let's calculate that. Well, we have one

squared times a sixth, plus two squared times a sixth, plus three squared times

a sixth, plus four squared times a sixth, plus five squared times a sixth, plus

six squared times a sixth. That works out to be about 15.17. And then you

subtract: 15.17 minus 3.5 squared works out to be about 2.92.

Let's go through a very important formula. Let's suppose we flip a coin, but

let's make it slightly more interesting: instead of the coin having probability

one-half of a head, let's say it has probability p of a head. So here the

expected value of X equals zero times the probability of a tail, which is one

minus p, plus one times the probability of a head, which is p — so it works out

to be p. And of course this agrees with our earlier calculation when p happens

to be one-half for a fair coin.

Now let's calculate the expected value of X squared. Actually, it's kind of

interesting: in this case it's pretty easy, because X only takes on the values

zero and one, and if you square zero you get zero, and if you square one you

get one. So X squared is in fact exactly X, and the expected value of X squared

is equal to the expected value of X, which we already calculated to be p.

So the variance of X in this case is the expected value of X squared minus the

expected value of X quantity squared, which is p minus p squared, which works

out to be p times one minus p — a formula you may have encountered before.
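Both calculations above can be sketched numerically. This is my own check in Python, not part of the lecture; it just applies the shortcut formula to the die and the coin:

```python
# Variance of a fair die via the shortcut formula Var(X) = E[X^2] - (E[X])^2.
faces = [1, 2, 3, 4, 5, 6]
ex = sum(x / 6 for x in faces)        # E[X] = 3.5
ex2 = sum(x**2 / 6 for x in faces)    # E[X^2] = 91/6 ≈ 15.17
var_die = ex2 - ex**2                 # ≈ 2.92
print(ex, ex2, var_die)

# Variance of a Bernoulli(p) coin: E[X] = p and E[X^2] = p (since X^2 = X),
# so Var(X) = p - p^2 = p(1 - p).
p = 0.3
ex_coin = 0 * (1 - p) + 1 * p         # = p
ex2_coin = 0**2 * (1 - p) + 1**2 * p  # = p
var_coin = ex2_coin - ex_coin**2      # ≈ 0.21 = p(1 - p)
print(var_coin)
```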

It's interesting to note that this variance formula is maximized when p is 0.5.

Simply plot the function p times one minus p between zero and one, and you'll

see that it's maximized at 0.5. So the most variable a coin flip can be is if

it's in fact exactly a fair coin.

It's interesting to note that, in general, the most variable a random variable

can be is if you shove all of its mass to two endpoints, equally distributed

between those two endpoints. So if you have a continuous random variable and

you want to make it more variable, you kind of chop out the middle and spread

the mass out equally between the two ends.
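As a small numerical illustration of that idea (my own sketch, not from the lecture): a uniform distribution on [0, 1] has variance 1/12, while putting half the mass at each endpoint gives variance 1/4 — the largest possible for a variable confined to [0, 1].

```python
import random

random.seed(0)
n = 100_000

def pvar(xs):
    """Population variance via the shortcut formula E[X^2] - (E[X])^2."""
    m = sum(xs) / len(xs)
    return sum(x * x for x in xs) / len(xs) - m * m

uniform = [random.random() for _ in range(n)]              # mass spread over [0, 1]
endpoints = [random.choice((0.0, 1.0)) for _ in range(n)]  # all mass at the two ends

print(pvar(uniform))    # ≈ 1/12 ≈ 0.083
print(pvar(endpoints))  # ≈ 1/4 = 0.25
```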

And in fact, let's talk about this in greater detail. Suppose that you have any

random variable that takes values between zero and one — like a uniform random

variable — and its expected value is p. Since the variable takes values between

zero and one, p has to be a number between zero and one. And then notice that

if X is a random variable between zero and one, X squared has to be less than

or equal to X, because if you take any number between zero and one and square

it, you get a smaller number. So the expected value of X squared has to be less

than or equal to the expected value of X, which is p.

Therefore the variance of X — which is the expected value of X squared minus

the expected value of X quantity squared — has to be less than or equal to the

expected value of X minus the expected value of X quantity squared, which is

p times one minus p. Basically, this is a proof that the Bernoulli variance —

the variance of the binary random variable that can only take the values zero

and one — is the largest possible for a random variable with expected value p.

And we noted earlier that the maximum value of p times one minus p occurs when

p is in fact 0.5. So this is a simple little proof that the largest variance

you can get for a random variable is to shove its mass to the two endpoints;

and the closer you get to equal mass at both endpoints, the larger the variance.

I'm not sure if I mentioned this previously, but I call a coin flip that comes

up heads with probability p a Bernoulli random variable. It's named after the

mathematician Jacob Bernoulli, who is one of the fathers of probability, and

Jacob Bernoulli is an interesting character — you should read up on him. The

Bernoullis were a very famous mathematical family; they came up with lots of

discoveries, and Jacob was a particularly influential member of the family who

discovered quite a bit of probability theory very, very early on. At any rate,

when you have a random variable that takes the value one with probability p and

zero otherwise, we call it a Bernoulli random variable.

So here we are, back to talking about variances — and, equivalently, standard

deviations. Variances are kind of difficult things to understand, and I prefer

to interpret standard deviations.

Intuitively, we know that bigger variances mean distributions are more spread

out, but we need some way to actually interpret what "bigger" means. Now, in

the context of a specific distribution, we might learn the quantities

associated with that distribution, so as to know what one standard deviation

means, or two standard deviations, or three. That's particularly true of the

Gaussian, or bell-shaped, density: we tend to know the values associated with

those standard deviations sort of by heart. But there is a general rule that

applies to all distributions — the so-called Chebyshev inequality, named after

the Russian mathematician Chebyshev. At any rate, Chebyshev gave a really

useful inequality for interpreting variances.

So, basically, the inequality says that the probability that a random variable

is K standard deviations from its mean, or more, is less than or equal to

1/K^2. Let me repeat that, because it's so important: the probability that a

random variable is more than K standard deviations from its mean is less than

or equal to 1/K^2.

And let's just look at some simple benchmarks for K. The probability that a

random variable is more than two standard deviations from its mean is 25

percent or less; the probability that it's more than three standard deviations

from its mean is about eleven percent or less; the probability that it's more

than four standard deviations from its mean is about six percent or less.

And again, note that this is a bound on the probability statement — it's not an

equality. It's the worst it could possibly be: for lots of distributions, the

probability of being four or more standard deviations beyond the mean is far

lower than six percent, but six percent is the worst it can be. So, for

example, if you observe a random variable, it's quite unlikely that you will

see it fall, say, six standard deviations from the mean — that has probability

less than one over 36, regardless of the distribution.

What's interesting about Chebyshev's inequality is that it's quite easy to

prove. So let's just go through the proof really quickly.
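In symbols, the chain of steps about to be described (in the continuous case, with density f, mean μ, and variance σ²) is:

```latex
P\left(|X - \mu| \ge k\sigma\right)
  = \int_{\{x \,:\, |x - \mu| \ge k\sigma\}} f(x)\,dx
  \;\le\; \int_{\{x \,:\, |x - \mu| \ge k\sigma\}} \frac{(x - \mu)^2}{k^2 \sigma^2}\, f(x)\,dx
  \;\le\; \int_{-\infty}^{\infty} \frac{(x - \mu)^2}{k^2 \sigma^2}\, f(x)\,dx
  = \frac{\sigma^2}{k^2 \sigma^2}
  = \frac{1}{k^2}.
```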

Well, let's look at this probability statement: the probability that a random

variable is more than K standard deviations from its mean. And let's do it in

the continuous case — you can prove it more generally, but this gives you the

intuition behind the proof. Well, that's the integral of f(x) dx over the set

of x that are more than k standard deviations from the mean, where here the

little x in the domain of integration is a dummy variable of integration; we

could replace it by another letter on the right-hand side, but on the left-hand

side it has to be capital X.

Well, notice that on this set, the absolute value of (x − μ)/(kσ) has to be

bigger than one. So if we square it, it has to be bigger than one as well —

you take a number that's bigger than one and square it, it's still bigger than

one. So we can multiply the integrand by (x − μ)² / (k²σ²).

Â 13:28

And we've only made the integral bigger, right? So we can replace the equality

with an inequality — where here the alligator's chomping the bigger part.

[laugh] Okay?

So now we have this quantity, and we'll only make it bigger yet if, instead of

integrating over this restriction of the domain, we integrate over the whole

thing, from minus infinity to plus infinity, because the (x − μ)² term is

non-negative — so we'll only make it bigger. And then notice that the 1/(k²σ²)

part is a scalar that we can just factor out, and we're left with the integral

from minus infinity to plus infinity of (x − μ)² f(x) dx — well, that's just

exactly the definition of the variance. So that equals σ². The σ²'s cancel and

you get 1/k².

So we see that the probability that X is more than K standard deviations from

the mean — we started out with an equals sign, we got bigger, we got bigger,

then we had a final equality — so the whole thing is less than or equal to

1/k². I find it remarkable that Chebyshev's inequality, this powerful result

that applies to all distributions, has such a simple little proof.

Let's go through some numerical examples, just to show why this result is

useful. So: intelligence quotients.

Actually, I would recommend that you look up intelligence quotients — often

called Binet scales. They have a very rich and interesting history that

intersects with statistics, psychology, and several other fields; it's really

quite an interesting literature on intelligence quotients, and I would highly

recommend you look it up just because it's quite fun. But let's skirt that

discussion and just suppose that intelligence quotients really are distributed

with a mean of 100 and a standard deviation of fifteen.

What's the probability that a randomly drawn person from this population —

people with IQs of mean 100 and standard deviation fifteen — has an IQ higher

than 160 or below 40? And of course I picked 160 and 40 specifically: 160 is

four standard deviations above the mean, and 40 is four standard deviations

below the mean, so by Chebyshev's inequality this probability will be no larger

than about six percent.

If, in fact, the IQ distribution is bell-shaped, or Gaussian, this bound is

very, very conservative. Just to give you a sense of how conservative: the

probability that a random draw from a bell curve is four or more standard

deviations from the mean is not six percent, but on the order of ten to the

minus fifth — a thousandth of one percent. Which, again, doesn't violate the

Chebyshev inequality: ten to the minus fifth is less than 0.06, so it's fine.

But it's quite a bit less — just to give you a sense of how conservative the

Chebyshev inequality can be.
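A quick numerical check of how conservative the bound is (my own sketch, not from the lecture): the exact two-sided Gaussian tail probability P(|Z| ≥ k) is erfc(k/√2), which Python's `math` module provides.

```python
import math

# Compare Chebyshev's bound 1/k^2 with the exact two-sided tail
# probability P(|Z| >= k) for a standard normal: erfc(k / sqrt(2)).
for k in (2, 3, 4, 6):
    chebyshev = 1 / k**2
    normal_tail = math.erfc(k / math.sqrt(2))
    print(f"k={k}: Chebyshev bound {chebyshev:.4f}, Gaussian tail {normal_tail:.2e}")
```

At k = 4 the Gaussian tail is on the order of 10⁻⁵ versus the 6.25% bound, matching the lecture's point.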

Let me go through another example. A buzz phrase in industrial quality control

is Motorola's so-called Six Sigma, and I have to admit to being largely

ignorant of exactly what the Six Sigma industrial protocol is. But the gist of

it, as far as I understand, is that businesses are encouraged to control

extreme events — rare defective parts — and the idea is that you go out six

standard deviations. Maybe on your own you can go look up what exactly the Six

Sigma protocol is.

As an intellectual exercise, let's talk about what the probability of

six-sigma events is — the idea of a random variable lying six standard

deviations above or below the mean. By Chebyshev's inequality, the probability

of such an occurrence is less than one over six squared, which is about three

percent. So it's highly unlikely.

But again, remember that Chebyshev's is a bound that applies to all

distributions. If you know something about the distribution — for example, if

you know the distribution is a bell curve — then the probability of a

six-sigma event is on the order of ten to the minus ninth, which, as I

calculated, is one ten-millionth of a percent. So again, that doesn't violate

Chebyshev's inequality: ten to the minus ninth is less than 0.03. But at any

rate, that's what a six-sigma event is discussing.

Â