In the last video, we discussed mean values of

datasets which represent an average data point.

In this video, we will look at variances to describe other properties of a dataset.

Let us have a look at two different datasets, D1 and D2.

D1 is represented by the blue dots located at 1, 2, 4, and 5,

and D2 is represented by the red squares at minus 1, 3, and 7.

D1 and D2 have the same mean, which is 3,

but the data points in D2 are less

concentrated around the mean than the data points in D1.

Remember the mean value is the data point you would expect on average.

But to describe the concentration of data points around the mean value,

we can use the concept of the variance.

The variance is used to characterise

the variability or spread of data points in a dataset.

In one dimension, we can look at the average squared distance of

a data point from the mean value of this dataset.

Let's do this for D1 and D2.

So, D1 was 1, 2, 4,

and 5, and the mean value or expected value of D1 was 3.

And D2 was minus 1,

3, and 7 with exactly the same mean value.

So now we want to compute the average squared distance of

the data points in D1 from the mean, and then do the same for D2.

So let us do this for D1 first.

So we get (1 minus 3) squared plus (2 minus 3)

squared plus (4 minus 3) squared plus (5 minus 3) squared.

So this is the sum of the squared distances.

And to get the average,

we divide by 4, which is the number of data points in D1.

So if we do the computation,

we get (4 plus 1 plus 1 plus 4) divided by 4,

which is 10 over 4.

So now we do the same for D2.

And for D2, we get (minus 1 minus 3) squared plus

(3 minus 3) squared plus (7 minus 3) squared,

and we divide by the number of data points in D2, which is 3,

and we get (16 plus 0 plus 16) divided by 3,

which is 32 over 3.

So 32 over 3 is bigger than 10 over 4,

which means that the average squared distance of D2 from

the mean value is bigger than the average squared distance of D1 from the mean value,

which indicates that the spread of the data is higher in D2 than in D1.
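As a minimal sketch of the computation we just did by hand, the following Python snippet computes the average squared distance from the mean for D1 and D2. The function name `variance` and the use of exact fractions are illustrative choices, not something from the video.

```python
from fractions import Fraction


def variance(data):
    """Average squared distance of the data points from their mean."""
    n = len(data)
    mean = Fraction(sum(data), n)  # exact arithmetic, so 10/4 and 32/3 stay exact
    return sum((x - mean) ** 2 for x in data) / n


D1 = [1, 2, 4, 5]
D2 = [-1, 3, 7]

var_d1 = variance(D1)  # (4 + 1 + 1 + 4) / 4 = 10/4 = 5/2
var_d2 = variance(D2)  # (16 + 0 + 16) / 3 = 32/3

print(var_d1, var_d2)
print(var_d2 > var_d1)  # D2 is more spread out than D1
```

Both datasets have mean 3, but the value for D2 is larger, which matches the picture of the red squares lying further from the mean than the blue dots.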

So what we have done can be formalised.

So, assuming we have a dataset consisting of N data points,

X1 to XN, then we can define the average squared distance as follows.

So we have X1 up to XN.

And then we define this to be our dataset X.

And we now define the variance of this dataset to be 1 over N times

the sum from n equals 1 to N

of (Xn minus mu) squared,

where mu is the mean value of the dataset X.
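Written out as a formula, the spoken definition above reads as follows. The notation Var[X] and the lowercase x_n are assumptions made here for readability; the video only states the definition in words.

```latex
\[
  \operatorname{Var}[X] = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu)^2,
  \qquad
  \mu = \frac{1}{N} \sum_{n=1}^{N} x_n .
\]
```

For D1, this gives (4 + 1 + 1 + 4) / 4 = 10/4, and for D2 it gives (16 + 0 + 16) / 3 = 32/3, as computed earlier.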

So what we have done here is exactly the same as what we did before with D1 and D2.

We computed an average squared distance of

the data points in the dataset from the mean value of the dataset.

And now we can also make some statements about this.

First, the variance as defined here can never

be negative because we just summed up squared values.

And that also means we can take the square root of the variance,

and this is called the standard deviation.

The standard deviation is expressed in the same units as

the mean value, whereas the variance unfortunately is expressed in squared units.

So, comparing the variance directly with the mean value is quite difficult.

Therefore, when we talk about spread of the data,

we usually look at standard deviations.
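To illustrate, here is a small sketch that takes the square root of the variances of D1 and D2 to get their standard deviations. It assumes Python's standard `statistics` module, whose `pvariance` and `pstdev` functions divide by N, matching the definition used in this video.

```python
import math
import statistics

D1 = [1, 2, 4, 5]
D2 = [-1, 3, 7]

# Standard deviation as the square root of the variance (division by N).
std_d1 = math.sqrt(statistics.pvariance(D1))  # sqrt(10/4)
std_d2 = math.sqrt(statistics.pvariance(D2))  # sqrt(32/3)

# pstdev computes the same quantity in a single call.
std_d2_direct = statistics.pstdev(D2)

print(round(std_d1, 3), round(std_d2, 3))
```

The standard deviations are in the same units as the data points themselves, so they can be read directly as a typical distance from the mean value of 3.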

So, in this video,

we looked at variances of one-dimensional datasets.

And in the next video,

we will generalise this to higher dimensions.