In the last video, we looked at variances for one-dimensional datasets. In this video, we will be looking at variances for high-dimensional datasets. The intuitive definition of the variance that we had earlier does not carry over directly to high dimensions, because squaring vectors is not defined.

Assuming we have a two-dimensional dataset, we can compute variances in the X direction and in the Y direction. But these variances are insufficient to fully describe what is going on in the dataset. In particular, we only capture the variation of the data in each direction independently of the other, but we may also be interested in the relationship between the X and Y variables. This is where the concept of a covariance between these components comes into play.

Let's have a look at an example in two dimensions. For this dataset, we can compute the variance in the Y direction and the variance in the X direction, which are indicated by these vertical and horizontal bars. But this can be insufficient, because we can look at other examples where the variances in the X and Y directions are the same, but the datasets look very different. If we look at this particular example, the dataset has a different shape, but the variance in the X direction and the variance in the Y direction are exactly the same, and the mean values of the two datasets are also identical. And we can look at further sets like this one here and this one. These four datasets look very different. You have seen four different examples with four different shapes, but the variances in X, the variances in Y, and the mean values are all identical. If we exclusively focus on the horizontal and vertical spread of the data, we cannot capture any correlation between X and Y. In this last figure, we can clearly see that if the X value of a data point increases, then on average the Y value decreases, so X and Y are negatively correlated.
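The point above can be illustrated with a small numpy sketch (the two tiny datasets here are made up for illustration): both datasets have identical means and identical per-axis variances, yet in one the Y values increase with X and in the other they decrease.

```python
import numpy as np

# Two tiny 2D datasets (made-up numbers) with identical means and
# identical per-axis variances, but opposite relationships between x and y.
A = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])  # y increases with x
B = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])  # y decreases with x

# Per-axis variances (1/N convention) cannot tell the two datasets apart.
print(np.var(A, axis=0))  # variance in x and in y for A
print(np.var(B, axis=0))  # the same two values for B

# The covariance between x and y does distinguish them:
# positive for A, negative for B.
cov_A = np.mean((A[:, 0] - A[:, 0].mean()) * (A[:, 1] - A[:, 1].mean()))
cov_B = np.mean((B[:, 0] - B[:, 0].mean()) * (B[:, 1] - B[:, 1].mean()))
print(cov_A, cov_B)
```

The per-axis variances are blind to the sign of the relationship; only the cross term sees it.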
This correlation can be captured by extending the notion of the variance to what is called the covariance of the data. The covariance between X and Y is defined as the expected value of X minus the mean in the X direction, times Y minus the mean in the Y direction, where Mu X is the expected value of the X coordinates and Mu Y is the expected value of the Y coordinates.

For 2D data, we therefore obtain four quantities of interest: the variance of X, the variance of Y, and the two covariance terms, the covariance between X and Y and the covariance between Y and X. We summarise these values in a matrix called the covariance matrix, with four entries: the variance of X in the top left corner, the covariance between X and Y in the top right corner, the covariance between Y and X in the bottom left corner, and the variance of Y in the bottom right corner.

If the covariance between X and Y is positive, then on average the Y value increases if we increase X. If the covariance between X and Y is negative, then on average the Y value decreases if we increase X. If the covariance between X and Y is zero, X and Y are uncorrelated, which means there is no linear relationship between them. The covariance matrix is always a symmetric, positive semi-definite matrix, with the variances on the diagonal and the covariances on the off-diagonals.

If we now look at D-dimensional datasets, say a dataset consisting of N vectors x1 to xN where every xi is in R^D, then we can compute one over N, times the sum from i equals 1 to N, of xi minus Mu, times xi minus Mu transpose, where Mu is the mean of the dataset. This is called the covariance matrix of the data, and it is a D-by-D matrix.
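The D-dimensional formula can be written directly in numpy as a sketch (the random dataset and its size are just for illustration); it agrees with numpy's built-in `np.cov` when that function is told to use the same 1/N convention:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # N = 100 data points in D = 3 dimensions

# Covariance matrix as defined above:
# S = (1/N) * sum_i (x_i - mu)(x_i - mu)^T, a D x D matrix.
mu = X.mean(axis=0)
S = (X - mu).T @ (X - mu) / X.shape[0]

# Sanity check: np.cov with rowvar=False treats rows as data points,
# and bias=True uses the same 1/N normalisation as the formula.
print(np.allclose(S, np.cov(X, rowvar=False, bias=True)))

# S is symmetric, with the per-dimension variances on the diagonal.
print(np.allclose(S, S.T))
print(np.allclose(np.diag(S), np.var(X, axis=0)))
```

Note that the diagonal of S recovers exactly the per-dimension variances from before, while the off-diagonals hold the covariances between pairs of dimensions.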