0:15

Hello and welcome to the Gaussian process lesson. A Gaussian process is a machine learning algorithm that uses a set of Gaussian, or normal, distributions to infer an underlying functional relationship among a set of data. This is a little more complex mathematically than some of the other algorithms we've seen, but it's a very popular and useful algorithm, so I wanted to include it in the course. You can actually understand a lot of what this algorithm is trying to do visually, by seeing plots of the actual underlying function, the sample data, and the fitted function. That makes it a little easier to introduce than simply diving straight into the mathematics.

By the end of this lesson, you should understand the basic idea of a Gaussian process and be able to communicate it to somebody else. I also want you to understand the probabilistic underpinning behind the Gaussian process, and you should be able to apply a Gaussian process by using the scikit-learn library. There are two readings for this particular lesson as well as the course notebook.

So let's jump straight into the first reading. It's by a well-known Pythonista, Chris Fonnesbeck. He is the lead developer and creator of a popular probabilistic programming library called PyMC3, which we'll actually use in Accounting 571. The article talks about fitting Gaussian processes in Python, and he demonstrates three different ways to do it. I only want you to look at the first way.

If you do skim through the whole article, you'll see some pretty complex math. I'm not too concerned about you understanding it, but you may be able to follow along; it's actually not as complex as it looks. The main thing I want you to get out of this reading is the core idea. He first covers some setup you would do in Python, and then he demonstrates how to apply Gaussian processes by using the scikit-learn library, which of course is what we're interested in. He also demonstrates two other ways to do the same thing, using GPflow and PyMC3, and you can ignore those. We'll look at the PyMC3 version, as I said, in Accounting 571.

The second reading is a little web page that discusses Gaussian process regression and tries to build intuition for it. When you go to this web page, you'll just see a curve hopping around. What you're watching is an attempt to perform Gaussian process regression on a set of data, but no data points have been entered yet. There are sliders down below that you can use to control how wildly the function behaves: if I slide this up, you can see it gets a little more erratic, you can increase the noise, and you can change the characteristic length scale, at which point the curve really goes crazy.

So what does this have to do with Gaussian processes? Right now there are no data points to constrain the fit, so the curve is, as you can see, just flopping around. But as soon as I put in one data point, say right here, you notice the behavior changes. What's happened is that the fit now has to pass near this point. The fitting is probabilistic, so the curve can still vacillate around the point a little bit, but the point is a restriction on where the fit can lie in that part of parameter space.

Now if I come over here and put in another point, we have two points restricting the fit of our data. And not only that: notice that the black line is the estimated fit of the original signal, and the surrounding curves are the confidence interval limits. I can continue to put points out here and there, and as I add points, the fit gets more and more constrained. With just six or seven points, you can see they've already greatly constrained the function, which was flopping around in all of that space. If I add a few more points, pretty soon the function is very well constrained, and we've defined where the fitted function should be as well as its confidence interval. That's a really powerful illustration of this entire fitting process.

One other thing I want to mention before I leave this particular site: notice how the curve still has the greatest freedom to vacillate or move around at the ends, where there are no data points constraining the fit. This is a characteristic example of a challenge we've talked about before. In interpolation, we have data points and we're asking for the value of the function between the data points; in extrapolation, we're asking for the value of the function past the limits of our data. There's very little information out at the ends, so there's very little to constrain the function we're looking at.

The last item, of course, is the notebook, which introduces Gaussian processes. This is a very powerful probabilistic technique based on the idea of kernel functions, which we're going to use to approximate the actual underlying signal, so first we have to talk about those. Then we'll use the Iris data set to introduce classification, demonstrate decision surfaces with Gaussian processes, and look at hyperparameters and their effect on the overall fitting process. We'll then see an example of a more complex dataset being used for classification. Finally, we'll look at regression and how Gaussian processes can be used to estimate a continuous value; for that, we will use the auto MPG data.

The rest of the notebook just walks through this. The one thing I really want to emphasize here is the idea of how a Gaussian process fits, shown as a static version of exactly what we saw on that website. Here are our data points, the blue line is our functional fit, and the purple dashed line is the actual signal we're trying to fit. With only two observations, the fit doesn't do a very good job: we've fit the data at the endpoints, but we don't do very well in between. As soon as we start adding more points, however, the function does a much better job, and we start to zero in on the actual signal. As we add more and more, the fit keeps getting better. Realize that this is only eight observations, which is not very much data, and by twelve observations it's doing quite well. So you can see how quickly a Gaussian process can start to approximate a function with very little data. That demonstrates one of the reasons people are very excited about using Gaussian processes: it doesn't take a lot of data to get both a reasonable model for your underlying signal and confidence intervals that tell you the limits of your knowledge.
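This shrinking of the confidence band can be sketched with scikit-learn's GaussianProcessRegressor. This is a minimal illustration, not the notebook's actual code: the sine signal, the noise level, the RBF length scale, and the observation counts (2, 8, 12) are all arbitrary choices for the sketch.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def signal(x):
    # Stand-in for the "actual underlying signal" (arbitrary choice)
    return np.sin(2.0 * np.pi * x)

def average_band_width(n_obs, seed=0):
    """Fit a GP to n_obs noisy samples; return the mean predictive std."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n_obs, 1))
    y = signal(X).ravel() + rng.normal(0.0, 0.05, size=n_obs)
    # Fixed kernel (optimizer=None) so runs with different n are comparable
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2),
                                  alpha=0.05 ** 2, optimizer=None)
    gp.fit(X, y)
    grid = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
    _, std = gp.predict(grid, return_std=True)
    return std.mean()

# The confidence band narrows as observations are added,
# just like the sequence of static plots in the notebook
widths = {n: average_band_width(n) for n in (2, 8, 12)}
print(widths)
```

With a fixed kernel and a shared random seed the training sets are nested, so the average band width can only decrease as points are added.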

The next topic is kernel functions: the idea that we have to employ kernel functions in order to approximate this underlying signal. There are a number of different ones that can be used, and the scikit-learn library provides several that you can apply, so we talk a little bit about them. There's a ConstantKernel; a Sum kernel, which allows you to combine different kernel functions; a Product kernel, which multiplies two kernel functions; a kernel (WhiteKernel) that lets you include an estimate of the noise in the signal; a radial basis function (RBF), which we've seen before as a non-linear function that gives us non-linear features; and a Matern kernel, which is a generalization of the RBF. These are discussed in more detail in the scikit-learn documentation.
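Those kernels can be combined directly in scikit-learn with ordinary arithmetic operators. This is just a small sketch; the particular length scales and noise level are arbitrary.

```python
import numpy as np
from sklearn.gaussian_process.kernels import (
    ConstantKernel, Matern, RBF, WhiteKernel)

# A scaled RBF: the * operator builds a Product kernel
scaled_rbf = ConstantKernel(constant_value=1.0) * RBF(length_scale=1.0)

# Adding a WhiteKernel lets the GP account for noise in the signal;
# the + operator builds a Sum kernel
noisy_rbf = scaled_rbf + WhiteKernel(noise_level=0.1)

# The Matern kernel generalizes the RBF via a smoothness parameter nu
matern = Matern(length_scale=1.0, nu=1.5)

# Any kernel can be evaluated as a covariance (Gram) matrix over data
X = np.array([[0.0], [0.5], [1.0]])
K = noisy_rbf(X)
print(type(noisy_rbf).__name__)  # Sum
print(K.shape)                   # (3, 3)
```

The diagonal of `K` is the signal variance plus the noise level, which is how the WhiteKernel term represents noise.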

The rest of the notebook simply walks through applying Gaussian processes, first to the Iris data set. Right off the bat, our accuracy is pretty good, and the confusion matrix confirms it. We can then look at the decision surface; remember that we have to employ one of these kernels, so we can get something quite non-linear, as this example shows.
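A classification run like the notebook's can be sketched as follows. This is an illustrative sketch rather than the notebook's code: the train/test split fraction, random seed, and kernel choice are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Load Iris and hold out a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=23, stratify=y)

# Fit a Gaussian process classifier with a scaled isotropic RBF kernel
gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0),
                                random_state=23)
gpc.fit(X_train, y_train)

# Accuracy "right off the bat", plus the confusion matrix
accuracy = gpc.score(X_test, y_test)
print(f"accuracy = {accuracy:.3f}")
print(confusion_matrix(y_test, gpc.predict(X_test)))
```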

We can then change hyperparameters, and for this we're actually going to use different kernels as our hyperparameter. Here we define several kernels. The first uses an isotropic RBF, because we do not specify different length scales. The second is anisotropic, because it uses different values for the length scales; you can think of this as an ellipse, where the first value is a long axis and the second is a shorter axis, so it's a very narrow, cigar-shaped ellipse. Next is a roughly quadratic kernel built from a ConstantKernel and a dot product. And lastly, a Matern kernel.
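The four kernels just described can be written down in scikit-learn like this. The specific length scales and parameters here are illustrative guesses, not the notebook's exact values.

```python
from sklearn.gaussian_process.kernels import (
    ConstantKernel, DotProduct, Matern, RBF)

# Isotropic RBF: a single length scale shared by all feature dimensions
iso_rbf = RBF(length_scale=1.0)

# Anisotropic RBF: one length scale per feature; a long first axis and a
# short second axis give the narrow, cigar-shaped ellipse described above
aniso_rbf = RBF(length_scale=[10.0, 0.5])

# A roughly quadratic kernel from a ConstantKernel and a squared DotProduct
quad = ConstantKernel(1.0) * DotProduct(sigma_0=1.0) ** 2

# Matern: a generalization of the RBF with a smoothness parameter nu
matern = Matern(length_scale=1.0, nu=1.5)

print(iso_rbf.anisotropic, aniso_rbf.anisotropic)  # False True
```

Each of these could be passed as the `kernel` argument of a GaussianProcessClassifier to produce the different decision surfaces discussed below.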

The details of these are less important. I'm not asking you to become an expert in Gaussian processes, but I do want you to be aware of the richness of the kernels you can apply to this particular problem. We then fit with each of these kernels, and as we go through the resulting decision surfaces, you'll see they're very different, particularly from what we've seen before. This is the very first one, the isotropic radial basis function. This is the anisotropic radial basis function; notice how it has suddenly started to classify the bottom of the plot the same as the top, which is very different, and notice again how non-linear these decision surfaces are. Then we have the dot product one, and lastly the Matern kernel. So again, changing these parameters changes the ability of the algorithm to make classifications, or to perform regressions, as the rest of the notebook will show. You can see that the kernel has a lot of influence on how the algorithm actually operates.

So with that, I'm going to go ahead and stop, and let you read through the material we introduced as well as run through the notebook. This is a very powerful algorithm and something you'll want to be aware of, although I don't expect you to become an expert in this course. It's the sort of thing that's very useful to know about, so that if you're working in a team and people start talking about techniques, you'll be familiar with them and able to converse about their strengths and weaknesses and why you might want to use a Gaussian process for a particular problem. If you have any questions, let us know in the course forums, and good luck.
