0:00

You've seen me draw a few pictures of neural networks. In this video, we'll talk about exactly what those pictures mean. In other words, exactly what the neural networks that we've been drawing represent. We'll start by focusing on the case of neural networks with what's called a single hidden layer.

Here's a picture of a neural network. Let's give different parts of this picture some names. We have the input features x1, x2, x3 stacked up vertically, and this is called the input layer of the neural network. So, maybe not surprisingly, it contains the inputs to the neural network. Then there's another layer of circles, and this is called a hidden layer of the neural network. I'll come back in a second to explain what the word hidden means. The final layer here is formed by, in this case, just one node. This single-node layer is called the output layer, and it is responsible for generating the predicted value y hat.

In a neural network trained with supervised learning, the training set contains values of the inputs x as well as the target outputs y.

So the term hidden layer refers to the fact that in the training set, the true values for these nodes in the middle are not observed. That is, you don't see what they should be in the training set. You see what the inputs are, and you see what the outputs should be, but the things in the hidden layer are not seen in the training set. That kind of explains the name hidden: you just don't see it in the training set.

Let's introduce a bit of notation. Whereas previously we were using the vector x to denote the input features, an alternative notation for the values of the input features will be a superscript square bracket 0. The term a also stands for activations, and it refers to the values that different layers of the neural network are passing on to the subsequent layers. So the input layer passes on the value x to the hidden layer, and we're going to call that the activations of the input layer, a superscript square bracket 0.

The next layer, the hidden layer, will in turn generate some set of activations, which I'm going to write as a superscript square bracket 1. So in particular, this first unit, or this first node, generates a value a superscript square bracket 1, subscript 1. This second node generates a value a superscript square bracket 1, subscript 2, and so on.

And so a superscript square bracket 1 is a four-dimensional vector, or, if you want, in Python, a 4 by 1 matrix, or a column vector, which looks like this. It's four-dimensional because in this case we have four nodes, or four units, or four hidden units in this hidden layer. Then finally, the output layer generates some value a superscript square bracket 2, which is just a real number. And so y hat is going to take on the value of a superscript square bracket 2.
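As a rough sketch of this notation (not from the lecture itself; the numeric values here are made up purely for illustration), the activations of a network with three inputs, four hidden units, and one output unit can be laid out as NumPy column vectors:

```python
import numpy as np

# a^[0]: the input layer activations, i.e. the input features x1, x2, x3
# (illustrative values, not from the lecture)
a0 = np.array([[0.5], [0.1], [0.4]])   # shape (3, 1): a 3 by 1 column vector

# a^[1]: the hidden layer activations a^[1]_1 ... a^[1]_4,
# one value per hidden unit (placeholder zeros here)
a1 = np.zeros((4, 1))                  # shape (4, 1)

# a^[2]: the output layer activation, a single real number, so y hat = a^[2]
a2 = np.zeros((1, 1))                  # shape (1, 1)

print(a0.shape, a1.shape, a2.shape)    # (3, 1) (4, 1) (1, 1)
```

The point is only the shapes: a^[1] is a 4 by 1 column vector because the hidden layer has four units, and a^[2] holds the single real number that becomes y hat.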

This is analogous to how in logistic regression we have y hat equals a. In logistic regression we only had that one output layer, so we didn't use the superscript square brackets. But with our neural network, we're now going to use the superscript square bracket to explicitly indicate which layer a value came from.

One funny thing about notational conventions in neural networks is that this network that you've seen here is called a two-layer neural network. The reason is that when we count layers in neural networks, we don't count the input layer. So the hidden layer is layer one and the output layer is layer two. In our notational convention, we're calling the input layer layer zero, so technically there are maybe three layers in this neural network, because there's the input layer, the hidden layer, and the output layer. But in conventional usage, if you read research papers and elsewhere in the course, you'll see people refer to this particular neural network as a two-layer neural network, because we don't count the input layer as an official layer.

Finally, something that we'll get to later is that the hidden layer and the output layer will have parameters associated with them. So the hidden layer will have associated with it parameters w and b, and I'm going to write superscript square bracket 1 to indicate that these are parameters associated with layer one, the hidden layer. We'll see later that w will be a 4 by 3 matrix and b will be a 4 by 1 vector in this example, where the first coordinate, 4, comes from the fact that we have four nodes, or four hidden units, in the layer, and the 3 comes from the fact that we have three input features. We'll talk later about the dimensions of these matrices, and it might make more sense at that time.

Similarly, the output layer also has associated with it parameters w superscript square bracket 2 and b superscript square bracket 2. It turns out the dimensions of these are 1 by 4 and 1 by 1. It's 1 by 4 because the hidden layer has four hidden units while the output layer has just one unit. But we will go over the dimensions of these matrices and vectors in a later video.
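To make these dimensions concrete, here is a minimal forward-pass sketch. The random initialization and the sigmoid activation are assumptions for illustration only; the lecture hasn't yet specified the computation each node performs:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation (an assumed choice for this sketch)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

x = rng.standard_normal((3, 1))    # three input features, a 3 by 1 column vector

# Layer 1 (hidden layer): w^[1] is 4 by 3, b^[1] is 4 by 1
W1 = rng.standard_normal((4, 3))
b1 = np.zeros((4, 1))

# Layer 2 (output layer): w^[2] is 1 by 4, b^[2] is 1 by 1
W2 = rng.standard_normal((1, 4))
b2 = np.zeros((1, 1))

a1 = sigmoid(W1 @ x + b1)          # hidden activations a^[1], shape (4, 1)
a2 = sigmoid(W2 @ a1 + b2)         # output activation a^[2] = y hat, shape (1, 1)

print(a1.shape, a2.shape)          # (4, 1) (1, 1)
```

Note how the shapes chain together: the 4 by 3 matrix w^[1] maps three inputs to four hidden units, and the 1 by 4 matrix w^[2] maps those four hidden units to the single output.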

So you've just seen what a two-layer neural network looks like: that is, a neural network with one hidden layer. In the next video, let's go deeper into exactly what this neural network is computing. That is, how this neural network takes the input x and goes all the way to computing its output y hat.
