To do machine learning, we're going to use TensorFlow.

TensorFlow is a machine learning library that underlies many of Google's products.

We open-sourced this in 2015.

And TensorFlow is actually a C++ engine.

The reason it's C++ is so that we can use GPUs, we can use CPUs,

we can run on Android phones, etc.

But people don't want to write code in C++, so

you have an API, and that API is in Python.

The Python API talks to C++, gets the job done.

And TensorFlow is essentially a numerical processing library, but

it has a variety of features that make it particularly good for

deep neural networks and training of deep neural networks.

So first of all, because we are going to be talking about neural networks,

what exactly is a neural network?

Let's go ahead and look at this pretty cool site called Playground.

So I'm going to go in to playground.tensorflow.org.

Let's go ahead and remove all these so

that we have an idea what it is that we want to do.

What it is that we want to do is that we have some data, and

the data is that we have blue dots and we have orange dots.

And the idea is that given a dot,

we want to be able to predict whether it's orange or blue.

And in order to do that we have two pieces of information.

We have the x and we have the y.

Okay, the x here is from -6 to 6, and the y here is from -6 to 6.

And given the x and y, we want to be able to predict the color of a dot at,

for example, this point, where x is 5 and y is 4.

Is that dot going to be blue or orange?

What do you think?

Well, I think you would say orange,

because everything far away seems to be orange.

But at this point, the background image is a prediction.

And the prediction is that it's going to be blue.

Everything to the right of this is going to be blue.

Everything to the left of this line is going to be orange.

And the way this prediction comes about is by taking the two inputs, this x and

this y, adding them together, and that's basically what my result is going to be.

All right, so it's basically going to be a sum of the x and y with a certain weight.

So -0.18 times x plus -0.28 times y.

Add the two things up.

If it's less than 0, it's orange, and if it's greater than 0, it's blue.
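
In code, that rule might look something like the minimal Python sketch below. The function name `predict_color` is made up for illustration, the two default weights are the ones read out above, and treating a score of exactly 0 as orange is an assumption, since only the less-than and greater-than cases are described.

```python
def predict_color(x, y, w_x=-0.18, w_y=-0.28):
    """Classify a dot by the sign of a weighted sum of its two coordinates."""
    score = w_x * x + w_y * y  # linear combination of the inputs
    # Greater than 0 means blue, less than 0 means orange.
    # Assumption: a score of exactly 0 is treated as orange.
    return "blue" if score > 0 else "orange"


print(predict_color(2, 1))    # score is -0.36 + -0.28 = -0.64
print(predict_color(-2, -1))  # score is +0.64
```

Whatever weights you plug in, the boundary where the score is 0 is always a straight line through the plane, which is why this kind of model can only split the dots into two half-planes.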

And because these two weights are negative, you can kind of see that

all the negative data is here and all the positive data is here.

So that's basically a prediction that's pretty bad, right?

The prediction is that everything here is going to be blue,

everything there is going to be orange, that's not true.

But let's see if we can change these weights.

There's a weight here on x, there's a weight here on y.

Let's say, go ahead and tune these weights to come up with a better prediction.

And as you can see, it is not possible to come up with weights that

linearly combine x and y in a way that separates the blue dots from the orange dots.

So let's stop this, this is going nowhere.

And let's think back about this problem.

Remember when I said when x is 5 and y is 4, what was the color?

You intuitively said it would be orange.

The reason that you thought it would be orange was because of the distance.

Everything that was close to the center was blue.

Everything far away from the center was orange.

And distance is the square root of x squared plus y squared.

So there's an x squared term and there's a y squared term.

Let's, instead of just using x and

y, let's add x squared and let's add y squared.

So now we have four inputs, not just x and y, but x squared and y squared.

So given these four inputs, let's come up with weights for

all four of these in such a way that it separates blue dots and orange dots.

So let's start, and lo and behold, that's my prediction now.

The prediction is that everything inside of this is going to be blue, and

everything outside of that is going to be orange, and

that seems to capture the data pretty well.

It captures our intuition of what the data say very well.
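
Here is a rough Python sketch of why the squared inputs help. The weights, the bias term, and the radius of 3 are all hand-picked assumptions for illustration, not the values the Playground learned; the names `engineered_features` and `predict_color` are hypothetical too.

```python
def engineered_features(x, y):
    """Return the four inputs: x, y, x squared, and y squared."""
    return [x, y, x * x, y * y]


# Assumed values, chosen by hand (not learned):
# score = radius^2 - x^2 - y^2, which is positive inside a circle of that radius.
ASSUMED_RADIUS = 3.0
WEIGHTS = [0.0, 0.0, -1.0, -1.0]  # one weight per engineered feature
BIAS = ASSUMED_RADIUS ** 2


def predict_color(x, y):
    """Weighted sum of the four features plus a bias, thresholded at 0."""
    features = engineered_features(x, y)
    score = BIAS + sum(w * f for w, f in zip(WEIGHTS, features))
    return "blue" if score > 0 else "orange"


print(predict_color(5, 4))  # far from the center
print(predict_color(1, 1))  # close to the center
```

The model is still just a weighted sum. The only change is that the inputs now include x squared and y squared, so the boundary where the score is 0 can be a circle instead of a straight line.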

So this idea is called feature engineering.

So one of the ways that we can improve our machine learning models' predictions

is to kind of get human insight into the problem.

And the insight that we had was that this was based on distance.

We knew that distance involved x squared and y squared.

So we threw that into the network and we said, train yourself with weights.

But let's say we don't have that insight.

All I have is x and y and I want to basically do this prediction.

So rather than do feature engineering,

another thing that we can do is that we can create a neural network.

So I'll create a layer of these and

what it's doing is, this guy is x and

y added together, and to that, I'm applying some function.

I could do rectified linear unit, tan hyperbolic, sigmoid, whatever.

It doesn't really matter which one we choose, let's just pick Tanh.
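
For reference, the three functions just mentioned can be written in a few lines of plain Python. This is only a sketch using the standard `math` module, not the Playground's own code, and the function names are my own.

```python
import math


def relu(z):
    """Rectified linear unit: pass positive values through, clip negatives to 0."""
    return max(0.0, z)


def tanh(z):
    """Hyperbolic tangent: squashes any input into the range -1 to 1."""
    return math.tanh(z)


def sigmoid(z):
    """Logistic sigmoid: squashes any input into the range 0 to 1.

    Note: this naive form can overflow for very large negative z.
    """
    return 1.0 / (1.0 + math.exp(-z))


for z in (-2.0, 0.0, 2.0):
    print(z, relu(z), tanh(z), sigmoid(z))
```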

So I'll do Tanh there, Tanh here, Tanh here, Tanh here, Tanh here,

so five different Tanhs.

Add them all up.

Why did I pick five?

Who knows? I just picked five.

Why did I pick Tanh?

Who knows? I just picked Tanh, right.

I just picked something.

I have a neural network.

I'll basically go ahead and train it.

So go ahead and find the weights.

Note that it's now no longer just two weights.

It's two weights here, two weights here, two weights here.

So that's ten weights here plus five weights here.

So 15 weights that we get to basically tweak around, so

go find me a set of weights that capture this data.
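
To make the structure concrete, here is a minimal Python sketch of that network's forward pass: five Tanh nodes, each with two input weights, and one output weight per node. It leaves out bias terms so that the count matches the 15 weights above, and it fills the weights with random numbers rather than trained ones; both are assumptions, as are the names `init_network`, `forward`, and `count_weights`.

```python
import math
import random


def init_network(n_hidden=5, seed=0):
    """Create random (untrained) weights for a 2-input, one-hidden-layer network."""
    rng = random.Random(seed)
    # Two weights (one for x, one for y) per hidden node.
    hidden = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_hidden)]
    # One weight per hidden node for the final sum.
    output = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return hidden, output


def forward(x, y, hidden, output):
    """Weighted sum of x and y per node, Tanh applied, then a weighted sum of the nodes."""
    activations = [math.tanh(w_x * x + w_y * y) for w_x, w_y in hidden]
    return sum(w_out * a for w_out, a in zip(output, activations))


def count_weights(hidden, output):
    """Total number of tunable weights (biases are not modeled in this sketch)."""
    return sum(len(node) for node in hidden) + len(output)


hidden, output = init_network()
print(count_weights(hidden, output))  # 5 nodes * 2 weights + 5 output weights
print(forward(5, 4, hidden, output))  # sign of the score would decide the color
```

Training is the part that is not shown here: it is the search for values of those 15 numbers that make the sign of the output match the dot colors.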

And it takes a little while, and it basically comes back with, well,

everything inside this triangle-like shape is going to be blue, and

everything outside it is going to be orange.

Yeah, right, it's not going to be perfect, but

it's a pretty reasonable approximation to this data, and it'll help you predict

with pretty darn good accuracy what color a new dot is going to be.

And that's the point of a neural network.

The idea is to basically capture what the data are in such

a way that you can do the prediction later on.

So, notice that what we did here was rather than take our human insight,

we were able to use a neural network to essentially get at

a good enough end result.

So this is a neural network with one hidden layer,

that's one layer of these neurons, so there are five of those nodes.

We could also create extra hidden layers.

So now we have ten weights here, 25 weights here,

and then five weights, so that's now a whole bunch more weights, right?

Now this is ten, but each of these has five, so that's 5 times 5,

that's 25, so 10 plus 25 is 35, plus 5 is 40 weights.
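
That arithmetic generalizes to any stack of layers: multiply the sizes of each pair of neighboring layers and add the products up. Here is a small Python sketch, where `count_weights` is a hypothetical helper and biases are left out, as in the counts above.

```python
def count_weights(layer_sizes):
    """Count connection weights in a fully connected network (no biases).

    layer_sizes lists the number of nodes per layer, inputs first, output last.
    """
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


print(count_weights([2, 5, 1]))     # one hidden layer of five nodes
print(count_weights([2, 5, 5, 1]))  # two hidden layers of five nodes each
```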

So we now have a model that's a lot more complex.

And you basically get, again, reasonably good results.

And the basic rule of thumb is to go with the simplest possible

network that gives you good enough performance.

So in this case we would go with just one hidden layer, but

here we have to choose a number of nodes, and let's say we start with two nodes,

and let's see, does this do well?

And it turns out that no, two nodes are not enough for this problem.

It doesn't do very well.

There are all these errors here in terms of capturing them.

So let's stop that.

Let's say we add a third node, right?

And then we say, start this.

And with three nodes, it seems to be fine.

Except maybe some of these guys.

The ones at the edges are probably not completely right, but it's close enough.

So in this situation, I would probably go with just three nodes, right?

That's the simplest neural network that gives me good enough performance.
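
That search can be written down as a small loop: try the candidate sizes from smallest to largest and stop at the first one that is good enough. The Python sketch below assumes you supply your own `evaluate` function that trains a network of a given size and returns its accuracy. The accuracy numbers in the usage example are invented placeholders, not measurements from the Playground.

```python
def simplest_good_enough(candidate_sizes, evaluate, threshold):
    """Return the smallest hidden-layer size whose score meets the threshold.

    evaluate is a caller-supplied function: size -> accuracy between 0 and 1.
    Returns None if no candidate is good enough.
    """
    for n_hidden in sorted(candidate_sizes):
        if evaluate(n_hidden) >= threshold:
            return n_hidden
    return None


# Placeholder scores, made up purely to show how the function is called.
placeholder_accuracy = {2: 0.80, 3: 0.97, 5: 0.98}
chosen = simplest_good_enough(
    placeholder_accuracy, placeholder_accuracy.get, threshold=0.95
)
print(chosen)
```

What counts as "good enough" is the threshold, and that is a judgment call you make for your own problem.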