0:01

Hi. My name is Brian Caffo, and

this is the Expected Values lecture, part of the Statistical Inference course on Coursera, which in turn is part of our Data Science Specialization.

This class is co-taught with my co-instructors, Jeff Leek and Roger Peng, at the Johns Hopkins Bloomberg School of Public Health.

0:18

This class is about statistical inference: the process of making conclusions about populations based on noisy data that we're going to assume was drawn from them.

The way we're going to do this is to assume that the populations, and the randomness governing our sample, are given by densities and mass functions.

0:57

The mean is the most useful expected value. It's the center of a distribution. So you might think of it this way: as the mean changes, the distribution just moves to the left or the right.

1:18

And just like before, in the way that the sample quantiles estimated the population quantiles, the sample expected values are going to estimate the population expected values. So the sample mean will be an estimate of our population mean, our sample variance will be an estimate of our population variance, and our sample standard deviation will be an estimate of our population standard deviation. The expected value, or mean, of a random variable is the center of its distribution.

1:54

The expected value takes its idea from the physical center of mass: if the probabilities were bars, where each bar's weight is the value of the probability and x is its location along an axis, then the expected value would simply be the center of mass. We'll go through some examples of that in a second.
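As a quick illustration of the center-of-mass idea (the lecture's own code is in R, but here is a minimal Python sketch using a made-up example, a fair die, which is not from the lecture):

```python
# Expected value of a discrete random variable: the probability-weighted
# average of its values -- the "center of mass" if probabilities are weights.
xs = [1, 2, 3, 4, 5, 6]  # faces of a fair die (illustrative example)
ps = [1 / 6] * 6         # each face equally likely
expected_value = sum(x * p for x, p in zip(xs, ps))
print(expected_value)    # 3.5, the balancing point of the die's distribution
```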

2:17

This idea of center of mass is actually useful in defining the sample mean. Remember, in this lecture we're talking about the population mean, which is estimated by the sample mean. But it's interesting to note that the sample mean is itself a center of mass, if we treat each data point as equally likely. In other words, each data point x_i gets probability one over n; if we then find the center of mass for the data, it is exactly X bar.
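To make that concrete: the lecture's code is in R, but a minimal Python sketch with made-up data (not the Galton set) shows that weighting each point by 1/n and taking the center of mass reproduces the ordinary sample mean:

```python
# Treating each observation as having probability 1/n, the center of mass
# of the data equals the sample mean, x-bar.
data = [62.0, 65.5, 67.0, 70.5, 71.0]  # hypothetical heights in inches
n = len(data)
weights = [1 / n] * n                  # each point equally likely
center_of_mass = sum(w * x for w, x in zip(weights, data))
sample_mean = sum(data) / n
print(center_of_mass, sample_mean)     # identical, up to floating point
```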

2:52

So, I have some code here to show an example of taking the sample mean of data, and how it represents the center of mass, just by drawing a histogram. So here I have the Galton data. And again, the code can be found in the markdown file associated with the slides, which you can get from GitHub.

3:09

So here, in this case, we have parents' heights and children's heights in a paired data set. Here I have a histogram for the child's height, and here I have the histogram for the parent's height, and I've overlaid a continuous density estimate. So I'd like to go through an example where we actually show how moving our finger around will balance out that histogram. Fortunately, in RStudio there's a neat little function called "manipulate" that will help us do this. So I'm going to load up "manipulate," and then run the code I'm going to show you here. If you go on to take the Data Products class, which is part of the specialization, we'll actually go through the specifics of how you use the manipulate function. But here I'm just going to run it, to show you it working.

3:57

And then we're going to look at the plot. Okay, so here is the plot of the child's heights. It's the histogram, and I've overlaid a continuous density estimate on top of it. And here, let's say this vertical black line is our current estimate of the mean. So here, it's saying that the mean is 62, and it gives us the mean squared error. That's sort of a measure of imbalance, of how teetering or tottering this histogram is. Now notice, as I move the mean around, which I can do now with manipulate, let's move it more towards the center of the distribution. Notice the mean has gone up. Let's move it right here. Notice the mean went up to 67.5, but the mean squared error dropped quite a bit. It helped balance out the histogram. That was almost the point where it would balance out perfectly. And you can see, as I get here, it goes down a little bit more. But then at some point, it starts going back up again. So if I move it all the way over here, this mean squared error, this measure of imbalance, gets quite large. So again, this is just illustrating the point that the empirical mean is the point that balances out the empirical distribution, and we're going to use this to talk about the population mean, which is the point that balances out the population distribution.
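The "imbalance" measure in the demo is the mean squared error around a candidate center mu. A minimal Python sketch with made-up data (not the Galton set) shows it is smallest when mu equals the sample mean:

```python
# The mean squared error around a candidate center mu, mean((x - mu)^2),
# is the imbalance measure from the demo; it is minimized at the sample mean.
data = [62.0, 65.5, 67.0, 70.5, 71.0]  # hypothetical heights in inches

def mse(mu, xs):
    """Mean squared error of the data around a candidate center mu."""
    return sum((x - mu) ** 2 for x in xs) / len(xs)

sample_mean = sum(data) / len(data)
# Sliding mu away from the sample mean in either direction raises the MSE.
print(mse(sample_mean, data) < mse(sample_mean - 1, data))  # True
print(mse(sample_mean, data) < mse(sample_mean + 1, data))  # True
```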
