0:00

In this video, I'd like to talk about the Gaussian distribution, which is also called the normal distribution. In case you're already intimately familiar with the Gaussian distribution, it's probably okay to skip this video, but if you're not sure, or if it has been a while since you've worked with the Gaussian or normal distribution, then please do watch this video all the way to the end. And in the video after this, we'll start applying the Gaussian distribution to developing an anomaly detection algorithm.

0:59

And then to denote a Gaussian distribution, sometimes I'm going to write script N, parentheses, mu comma sigma squared. So this script N stands for normal, since Gaussian and normal mean the same thing; they are synonyms. And the Gaussian distribution is parameterized by two parameters: a mean parameter, which we denote mu, and a variance parameter, which we denote sigma squared.

If we plot the Gaussian distribution, or Gaussian probability density, it'll look like the bell-shaped curve which you may have seen before. And so this bell-shaped curve is parameterized by those two parameters, mu and sigma. The location of the center of this bell-shaped curve is the mean, mu. And the width of this bell-shaped curve, roughly that, is this parameter sigma, which is also called one standard deviation. And so this specifies the probability of x taking on different values. So x taking on values here in the middle is pretty likely, since the Gaussian density here is pretty high, whereas x taking on values further and further away will be diminishing in probability.

Finally, just for completeness, let me write out the formula for the Gaussian distribution. So the probability of x, which I'll sometimes write as p(x) and sometimes as p(x; mu, sigma squared), is parameterized by the two parameters mu and sigma squared. And the formula for the Gaussian density is this: 1 over root 2 pi times sigma, times e to the negative of (x minus mu) squared over 2 sigma squared. That is, $p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$.

So there's no need to memorize this formula. This is just the formula for the bell-shaped curve over here on the left, and if you ever need to use it, you can always look it up. And so that figure on the left is what you get if you take a fixed value of mu and a fixed value of sigma and plot p(x). So this curve here is really p(x) plotted as a function of x for a fixed value of mu and of sigma squared.
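As a rough illustration of the density formula described above, here is a minimal Python sketch. The function name `gaussian_pdf` and the sample points are assumptions made for this example; they do not come from the lecture.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Gaussian density p(x; mu, sigma^2), with sigma the standard deviation."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # 1 / (sqrt(2 * pi) * sigma)
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    # -(x - mu)^2 / (2 * sigma^2)
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)


# Evaluate the curve at a few points for fixed mu = 0 and sigma = 1
# (illustrative values only).
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"p({x:+.1f}) = {gaussian_pdf(x, mu=0.0, sigma=1.0):.4f}")
```

The density is highest at x = mu and falls off symmetrically on both sides, which matches the bell-shaped curve described above.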

And by the way, sometimes it's easier to think in terms of sigma squared, which is called the variance, and sometimes it's easier to think in terms of sigma. So sigma is called the standard deviation, and it specifies the width of this Gaussian probability density, whereas the square of sigma, or sigma squared, is called the variance.

4:38

This one is also centered at zero. But now the width of this is much smaller, because with the smaller variance, the width of this Gaussian density is roughly half as wide. But because this is a probability distribution, the area under the curve, that's the shaded area there, must integrate to one; this is a property of probability distributions. So this is a much taller Gaussian density, because it's half as wide, with half the standard deviation, but twice as tall.

Another example: if sigma is equal to 2, then you get a much fatter, much wider Gaussian density, and so here the sigma parameter controls that the Gaussian distribution has a wider width. And once again, the area under the curve, that is the shaded area, will always integrate to one; that's a property of probability distributions. And because it's wider, it's also half as tall, in order to still integrate to the same thing.
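The trade-off between width and height can be checked numerically. The sketch below assumes a baseline of sigma = 1 alongside sigma = 0.5 and sigma = 2, all centered at zero; the helper names and the midpoint-rule integration are choices made for this example, not part of the lecture.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (
        math.sqrt(2.0 * math.pi) * sigma
    )


def area_under_curve(mu, sigma, half_width=10.0, steps=20000):
    """Approximate the integral of the density over mu +/- half_width * sigma
    using the midpoint rule."""
    lo = mu - half_width * sigma
    dx = 2.0 * half_width * sigma / steps
    return sum(gaussian_pdf(lo + (i + 0.5) * dx, mu, sigma) for i in range(steps)) * dx


# Assumed illustrative values: sigma = 1 is the baseline for "half" and "twice".
for sigma in (0.5, 1.0, 2.0):
    peak = gaussian_pdf(0.0, 0.0, sigma)
    area = area_under_curve(0.0, sigma)
    print(f"sigma={sigma}: peak height={peak:.4f}, area={area:.4f}")
```

Halving sigma doubles the peak height (about 0.80 versus about 0.40), doubling sigma halves it (about 0.20), and the area stays at one in every case.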

5:51

Next, let's talk about the parameter estimation problem. So what's the parameter estimation problem? Let's say we have a dataset of m examples, so x 1 through x m, and let's say each of these examples is a real number. Here in the figure I've plotted an example of the dataset, so the horizontal axis is the x axis, and I have a range of examples of x, and I've just plotted them on this figure here.

And the parameter estimation problem is this: let's say I suspect that these examples came from a Gaussian distribution. So let's say I suspect that each of my examples, x i, was distributed (that's what this tilde thing means) according to a normal distribution, or Gaussian distribution, with some parameter mu and some parameter sigma squared. But I don't know what the values of these parameters are. The problem of parameter estimation is, given my data set, I want to estimate what the values of mu and sigma squared are.

So if you're given a data set like this, it looks like, if I estimate what Gaussian distribution the data came from, maybe this might be roughly the Gaussian distribution it came from, with mu being the center of the distribution and sigma, the standard deviation, controlling the width of this Gaussian distribution. That seems like a reasonable fit to the data, because it looks like the data has a very high probability of being in the central region, a low probability of being further out, an even lower probability of being further out still, and so on. So maybe this is a reasonable estimate of mu and sigma squared, that is, if it corresponds to a Gaussian distribution function that looks like this.

7:35

So what I'm going to do is just write out the standard formulas for estimating the parameters mu and sigma squared. The way we're going to estimate mu is going to be just the average of my examples. So mu is the mean parameter: just take my training set, take my m examples, and average them. And that just gives the center of this distribution.

8:01

How about sigma squared? Well, for the variance, I'll just write out the standard formula again: I'm going to estimate sigma squared as 1 over m times the sum from i equals 1 through m of (x i minus mu) squared. And this mu here is actually the mu that I compute over here using this formula. And one interpretation of the variance is that if you look at this term, that's the squared difference between the value I got in my example and the mean, the center of the distribution. And so the variance I'm going to estimate as just the average of the squared differences between my examples and the mean.
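The two estimation formulas can be sketched in a few lines of Python. The function name `estimate_gaussian`, the synthetic data, and the chosen true parameters (mean 3, standard deviation 2) are assumptions for illustration only.

```python
import random


def estimate_gaussian(xs):
    """Return (mu, sigma_sq) estimated from a list of real numbers.

    mu       = (1/m) * sum of x_i
    sigma_sq = (1/m) * sum of (x_i - mu)^2
    """
    m = len(xs)
    if m == 0:
        raise ValueError("need at least one example")
    mu = sum(xs) / m
    sigma_sq = sum((x - mu) ** 2 for x in xs) / m
    return mu, sigma_sq


# Synthetic data drawn from a Gaussian with assumed mean 3 and standard
# deviation 2, so the true variance is 4.
random.seed(0)
data = [random.gauss(3.0, 2.0) for _ in range(10000)]
mu_hat, sigma_sq_hat = estimate_gaussian(data)
print(f"estimated mu = {mu_hat:.3f}, estimated sigma^2 = {sigma_sq_hat:.3f}")
```

With a data set this large, the estimates should land close to the parameters used to generate the data.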

8:37

And as a side comment, only for those of you that are experts in statistics: if you've heard of maximum likelihood estimation, then these estimates are actually the maximum likelihood estimates of the parameters mu and sigma squared. But if you haven't heard of that before, don't worry about it. All you need to know is that these are the two standard formulas for how to figure out what mu and sigma squared are, given the data set.

Finally, one last side comment, again only for those of you that have maybe taken a statistics class before. If you've taken a statistics class before, some of you may have seen the formula here where this is m minus 1 instead of m, so this first term becomes 1 over (m minus 1) instead of 1 over m. In machine learning, people tend to use the 1 over m formula, but in practice, whether it is 1 over m or 1 over (m minus 1) makes essentially no difference, assuming m is reasonably large, that is, a reasonably large training set size. So, just in case you've seen this other version before: either version works just about equally well, but in machine learning most people tend to use 1 over m in this formula. And the two versions have slightly different theoretical properties, slightly different mathematical properties. But in practice it really makes very little difference, if any.
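To see how little the choice of denominator matters as m grows, here is a small sketch using Python's standard `statistics` module, where `pvariance` divides by m and `variance` divides by m minus 1. The helper name `compare_variances` and the sample sizes are assumptions for this example.

```python
import random
import statistics


def compare_variances(xs):
    """Return (variance with 1/m, variance with 1/(m-1)) for the data xs."""
    var_m = statistics.pvariance(xs)         # divides by m
    var_m_minus_1 = statistics.variance(xs)  # divides by m - 1
    return var_m, var_m_minus_1


random.seed(1)
for m in (5, 50, 5000):
    xs = [random.gauss(0.0, 1.0) for _ in range(m)]
    var_m, var_m_minus_1 = compare_variances(xs)
    print(f"m={m}: 1/m -> {var_m:.4f}, 1/(m-1) -> {var_m_minus_1:.4f}, "
          f"ratio = {var_m / var_m_minus_1:.4f}")
```

The ratio of the two estimates is exactly (m minus 1) over m, so it approaches one as the training set gets larger.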

9:56

So, hopefully you now have a good sense of what the Gaussian distribution looks like, as well as how to estimate the parameters mu and sigma squared of a Gaussian distribution if you're given a training set, that is, if you're given a set of data that you suspect comes from a Gaussian distribution with unknown parameters mu and sigma squared. In the next video, we'll start to take this and apply it to develop an anomaly detection algorithm.
