0:00

In this video,

we will discuss methods of quantifying the centers of numerical distributions,

building on the previous concepts of visualizing numerical variables.

Previously, we discussed shapes of numerical distributions.

In terms of skewness, we categorized distributions into three groups:

left skewed, symmetric, and right skewed.

And in terms of modality, we talked about variables that have a unimodal,

bimodal, uniform, or multimodal distribution.

0:27

Another key characteristic of interest is the center of

the distribution. Commonly used measures of center are the mean,

which is simply the arithmetic average;

the median, which is the midpoint of the distribution, or in other words

the 50th percentile; and the mode, which is the most frequent observation.
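For reference, the three measures can be written out in standard notation (added here, not taken from the video) for n observations x_1, ..., x_n, whose values sorted in increasing order are written x_(1) ≤ x_(2) ≤ ... ≤ x_(n):

```latex
\text{mean} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\text{median} =
\begin{cases}
x_{((n+1)/2)} & \text{if } n \text{ is odd} \\[4pt]
\dfrac{x_{(n/2)} + x_{(n/2+1)}}{2} & \text{if } n \text{ is even}
\end{cases}
\qquad
\text{mode} = \text{the most frequently occurring value}
```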

If these measures are calculated from a sample,

they're called sample statistics.

Sample statistics are point estimates of the unknown population parameters.

The parameters are unknown because it's usually not feasible to have information

on all observations in the population.

These estimates may not be perfect, but if the sample is good,

meaning representative of the population, they're usually good guesses.

We usually use letters from the Latin alphabet to denote sample statistics

and letters from the Greek alphabet to denote population parameters.

For example, the sample mean is x bar and the population mean is mu.
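To make the distinction concrete, here is a small simulation sketch. It assumes a hypothetical population (the integers 0 to 9,999) that, unlike in real life, we can observe completely, so we can compute the population mean mu directly and then estimate it with the sample mean x bar from a random sample of 100 observations. The population, sample size, and seed are arbitrary choices for illustration.

```python
import random
import statistics

# Hypothetical population: in practice we would not have all of these values.
population = list(range(10_000))
mu = statistics.mean(population)          # population parameter (Greek letter)

random.seed(1)                            # fixed seed so the run is repeatable
sample = random.sample(population, 100)   # simple random sample, no replacement
x_bar = statistics.mean(sample)           # sample statistic (Latin letter)

print(f"mu    = {mu}")
print(f"x_bar = {x_bar}")
print(f"error = {x_bar - mu}")
```

Rerunning with a different seed gives a different x bar, which is why point estimates are not perfect; a larger representative sample tends to land closer to mu.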

1:25

Let's give a quick example with some simulated data.

Suppose we have exam scores from 9 students.

The mean of this distribution is simply the arithmetic average of these scores.

The mode is the most frequently observed value.

In this case, two students scored 88, so the mode is 88.

However, with continuous variables,

it may be very unlikely to observe the exact same value multiple times.

Therefore, the mode is not always a very useful measure of center.
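The nine scores from the video aren't listed in this transcript, so the sketch below uses nine hypothetical scores, chosen only so that, as in the video, two students scored 88. It computes the mean as the arithmetic average and finds the mode with `collections.Counter`; the last lines use made-up race times to illustrate why the mode tells us little for continuous measurements.

```python
from collections import Counter

# Hypothetical exam scores for 9 students (not the data from the video).
scores = [70, 75, 80, 85, 87, 88, 88, 92, 95]

# Mean: the arithmetic average.
mean = sum(scores) / len(scores)

# Mode: the most frequently observed value and how often it occurs.
counts = Counter(scores)
mode, freq = counts.most_common(1)[0]

print(f"mean = {mean:.2f}")                      # mean = 84.44
print(f"mode = {mode} (observed {freq} times)")  # mode = 88 (observed 2 times)

# With continuous measurements exact repeats are rare, so every value
# appears once and no single value stands out as "most frequent".
times = [12.31, 11.97, 13.05, 12.48, 12.02]      # hypothetical race times (s)
print(max(Counter(times).values()))              # 1
```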

The median is defined as the midpoint, or the 50th percentile, of the distribution.

In order to calculate the median,

we first need to sort the data in increasing order.

Then we find the midpoint of the ordered data, which in this case happens to be 87.

But what if there were no single middle observation?

Say we have one more student who scored 100.

Now the sample size is 10, and with an even number of observations

there isn't a single value that divides the data in half.

In these cases,

the median is defined as the average of the two middle observations.

Here, 87 and 88 are in the middle of our distribution, so

the median of this new data set is 87.5.
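The by-hand procedure (sort, then take the middle value, or the average of the two middle values when n is even) can be sketched as a small function. `median_by_hand` is a hypothetical helper name, and the scores are made up rather than taken from the video; they were chosen so the results match the numbers discussed above, 87 for nine students and 87.5 after adding a student who scored 100.

```python
def median_by_hand(values):
    """Median via the by-hand method: sort, then find the midpoint."""
    ordered = sorted(values)          # step 1: sort in increasing order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: the single middle value
    # even n: average of the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical, unsorted scores for 9 students (not the data from the video).
scores = [88, 70, 92, 87, 75, 95, 88, 80, 85]
print(median_by_hand(scores))          # 87

# One more student who scored 100 makes n = 10 (even).
print(median_by_hand(scores + [100]))  # 87.5
```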

Learning how these values are calculated by hand is important for

understanding the concepts, but

we should note that calculations like these are rarely done by hand in practice.

Instead, we rely on computation, which makes life much easier when

working with data that have a large number of observations.
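As one example of relying on computation, Python's standard-library `statistics` module already provides these calculations. The sketch applies `statistics.mean`, `statistics.median`, and `statistics.multimode` to ten hypothetical scores (not the video's data), then uses `statistics.quantiles` to confirm that the median is the 50th percentile. The same calls work unchanged on thousands of observations.

```python
import statistics

# Hypothetical scores for 10 students (not the data from the video).
scores = [70, 75, 80, 85, 87, 88, 88, 92, 95, 100]

print(statistics.mean(scores))       # 86
print(statistics.median(scores))     # 87.5
print(statistics.multimode(scores))  # [88]

# quantiles(n=4) returns the three quartile cut points; the middle one
# is the 50th percentile, which matches the median.
q1, q2, q3 = statistics.quantiles(scores, n=4)
print(q2)                            # 87.5
```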

3:01

For example, let's revisit the life expectancy and income per person data.

We established before that the distribution of average life

expectancies is left skewed.

The mean is 70.51 years and the median is 73.34 years.

The mean, indicated by the pink solid line on the plot,

is lower than the median, indicated by the orange dashed line.

This is expected based on the shape of this distribution.

Since there's a long tail to the left, the arithmetic average is

pulled toward the lower end by the observations in the lower tail.
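The pulling effect can be seen in a tiny made-up example (these are not the life expectancy data): start from a roughly symmetric set of values, then add two very low observations to create a left tail. The mean drops by 7 while the median moves by only 1.

```python
import statistics

# Hypothetical, roughly symmetric values (not the life expectancy data).
values = [70, 72, 73, 74, 75, 76, 78]
print(statistics.mean(values), statistics.median(values))  # 74 74

# Add a few very low observations to create a long left tail.
skewed = values + [40, 45]
print(statistics.mean(skewed), statistics.median(skewed))  # 67 73
```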

On the other hand, consider a right-skewed distribution,

like the distribution of average income per person for each country.

The mean is roughly $12,500, while the median is only $7,000.

The mean is much higher than the median because this time the longer tail is

on the right, and the few countries with very high income levels

compared to the others pull the mean up.

3:58

So to recap, in left-skewed distributions,

the mean is generally smaller than the median,

since the few low-valued observations pull the average down.

In symmetric distributions, the mean and the median are roughly equal.

And in right-skewed distributions, the mean is generally higher than the median,

since the few high-valued observations pull the average up.
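The recap can be checked with a rough simulation sketch. It draws hypothetical samples of 10,000 values from three shapes using the standard `random` module: a left-skewed one (100 minus an exponential draw), a symmetric one (normal), and a right-skewed one (lognormal). The distributions, parameters, and seed are arbitrary assumptions, and the pattern is a general tendency rather than a guarantee for every distribution.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 10_000

# Hypothetical samples from three shapes (parameters are arbitrary).
samples = {
    "left skewed": [100 - rng.expovariate(1 / 10) for _ in range(n)],
    "symmetric": [rng.gauss(50, 10) for _ in range(n)],
    "right skewed": [rng.lognormvariate(0, 1) for _ in range(n)],
}

for shape, data in samples.items():
    mean = statistics.fmean(data)
    median = statistics.median(data)
    print(f"{shape:>12}: mean = {mean:8.2f}, median = {median:8.2f}, "
          f"mean - median = {mean - median:+.2f}")
```

Expect a negative mean minus median for the left-skewed sample, a value near zero for the symmetric one, and a positive value for the right-skewed one.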
