0:00

Welcome back. In the last session, we saw how to reshape a torch Tensor: we called the view method, and this method reshapes our torch Tensor. Today, we're going to discuss a very important topic: computational graphs in PyTorch.

As we know, other deep learning frameworks such as Keras and TensorFlow also deal with computational graphs. But in these frameworks, the computational graph is fixed. You create a model; in Keras, for example, if it is a neural network, you define the layers, the optimizer, the loss function, the number of epochs, and the batch size, and then you call compile. When you call compile, the runtime creates a computational graph for this model, and this computational graph is fixed, so you cannot change it. If you call model.fit, it executes the computational graph, and at runtime you cannot change it anymore; it's fixed.

PyTorch has done something completely different. The creators of PyTorch decided that they needed the flexibility to change the computational graph at runtime. How did they accomplish this? They created a component called autograd.Variable. This is the main building block of the computational graph in PyTorch. How does it work?

Let us execute the next cell. First of all, we are creating an autograd.Variable and passing two parameters. The first parameter is the data of this autograd.Variable: a torch Tensor of size three. The second argument is requires_grad, meaning "requires gradient", true or false. It means the following: by setting requires_grad to true, we are saying that this autograd.Variable should track how it was created. This is very important. We can print out the data inside this autograd.Variable; it is simply the tensor of size three: one, two, three.

Then we are creating the next autograd.Variable, y, in the same way: we pass the data as the first parameter, then we pass the second parameter, requires_grad, as true. Then we add up x and y, and we print the data in the variable z, which is the sum of x and y. So we have just summed up two vectors element-wise. But what is very interesting here is that we can also see how z was created. If we print z.grad_fn, the gradient function, we see that z was created by an add operation. It's very interesting. Let us see the next example.
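The steps so far can be sketched roughly like this. One assumption to flag: in PyTorch 0.4 and later, autograd.Variable was merged into torch.Tensor, so here requires_grad is set on the tensor directly; the behavior is the same as with the Variable API used in the lecture.

```python
import torch

# Tensors that track how they were created; requires_grad=True plays the
# role of autograd.Variable(data, requires_grad=True) from the lecture
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([4.0, 5.0, 6.0], requires_grad=True)

z = x + y          # element-wise sum: [5., 7., 9.]
print(z.data)      # the data held inside z
print(z.grad_fn)   # records that z was created by an add operation
```

Printing z.grad_fn shows an object whose name contains "AddBackward", which is how PyTorch remembers the operation that produced z.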

Here we are summing up the components of the z vector. The z vector, which we created as the sum of x and y, is five, seven, nine; if we add up all those elements, we get 21. Nothing very interesting or special. But if, in the next step, we call s.grad_fn, the gradient function, we see that s was created by a sum operation. So PyTorch knows exactly, for every variable, how it was created.
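The same idea in a short sketch (again using modern tensors in place of autograd.Variable, since the two APIs were merged in PyTorch 0.4):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([4.0, 5.0, 6.0], requires_grad=True)
z = x + y          # [5., 7., 9.]

s = z.sum()        # 5 + 7 + 9 = 21
print(s.data)      # tensor(21.)
print(s.grad_fn)   # records that s was created by a sum operation
```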

Here we are with the next example. Here we have a function s. This function s is the sum of two vectors of size three: we are summing these vectors element-wise, so we sum the first element of vector x with the first element of vector y, the second element with the second, and the third with the third. Now, if we do partial differentiation, if we differentiate s with respect to x_0, for example, then due to the rules of partial differentiation, we treat all variables other than x_0, including y, as constants. So the partial derivative of the function s with respect to the variable x_0 is one.
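Written out, with x = (x_0, x_1, x_2) and y = (y_0, y_1, y_2), the step looks like this:

```latex
s = \sum_{i=0}^{2} (x_i + y_i) = (x_0 + y_0) + (x_1 + y_1) + (x_2 + y_2)

\frac{\partial s}{\partial x_0}
  = \frac{\partial}{\partial x_0}\bigl[(x_0 + y_0) + (x_1 + y_1) + (x_2 + y_2)\bigr]
  = 1
```

Every term except x_0 is treated as a constant and differentiates to zero, so only the single x_0 term survives, contributing one.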

Let us see how it works in PyTorch. First of all, let us recall that s was the element-wise sum. Here we are calling backward, and backward means: start backpropagation from this point backwards. So it starts backpropagation: we have the function s, which is the sum, and we call backward. Here we can pass the argument retain_graph=True; I will not go into details, but it's actually optional. When we run this cell, we see the following. First of all, we print vector x; nothing special, we're just printing out this vector.

But now we print out the gradients, not the gradient function as before, but the gradient itself. The gradient is the value we get if we differentiate the function at a particular point: we get the value of the partial derivative. Here we differentiate this function with respect to the variable x, and x consists of three variables: x zero, x one, x two. If we differentiate partially with respect to x zero, we get one; then again with respect to x one, we get one; and with respect to x two, we get one. We saw that this is a very simple function: if we take the partial derivative, it is one, and all other variables contribute zeros. So here we have one, one, one. The same thing happens if we take the partial derivative with respect to y, because y, like x, is a simple plain variable here without any coefficients or exponents. It's just one, one, one.
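This step can be sketched as follows (modern tensors standing in for autograd.Variable): calling backward from s runs backpropagation and fills the .grad attribute of every tensor that requires gradients.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([4.0, 5.0, 6.0], requires_grad=True)
s = (x + y).sum()

# start backpropagation from s; retain_graph=True is optional and only
# needed if you intend to call backward() again on the same graph
s.backward(retain_graph=True)

print(x.grad)   # tensor([1., 1., 1.]) -- ds/dx_i = 1 for every i
print(y.grad)   # tensor([1., 1., 1.]) -- same for y
```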

But what will happen if we call backward() again, and then print x.grad and y.grad again? You see, the gradient was updated. Now we have two, two, two for x, and for y the same, two, two, two. What has happened? Let us call backward() again and print once more. Now it has changed again; it's three. The reason is that every time you call backward(), the gradient property is accumulated. This is a mechanism of PyTorch that is very convenient for some models. In this introductory session we will not use it, but it's important to know: every time you call backward(), the gradient is accumulated.
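The accumulation behavior can be seen directly: each backward() call adds into .grad rather than overwriting it. A minimal sketch (modern tensors; retain_graph=True keeps the graph alive between calls):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([4.0, 5.0, 6.0], requires_grad=True)
s = (x + y).sum()

s.backward(retain_graph=True)
first = x.grad.clone()     # tensor([1., 1., 1.])

s.backward(retain_graph=True)
second = x.grad.clone()    # tensor([2., 2., 2.]) -- accumulated, not replaced

# in training loops you therefore usually zero the gradients between steps
x.grad.zero_()
```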

The next important point is how to preserve the computational graph. As you already know, an autograd.Variable consists of two components. One is the data; the other is grad_fn, the gradient function. With data, you can get the data out of the autograd.Variable, and with grad_fn, you can find out by which operation this variable was created.

Let us execute the next cell. In this cell, we are creating two Tensors of size two by two, and then we are summing up these Tensors. Very simple: we get a Tensor of size two by two. Now let us execute the next cell. Here, we're creating the autograd variable var_x out of x, and the autograd variable var_y out of y, setting requires_grad=True both times. Now I can sum up these variables, var_x + var_y, and then we print out the gradient function, how the result was created. Let us do this, and we see z was created with an add operation, AddBackward.

But now we do the following: we extract just the data out of this sum and pass it to a new variable called var_z_data. Then we create a new autograd.Variable, passing this new data variable into it, and we try to print out how it was created: we print new_var_z.grad_fn. And you see: None. It's lost, because we have extracted only the data, not the gradient function. The computational graph is already broken at this point: the grad_fn was not passed along, and we have retained only the data, not the gradient function.

Now, if we try to call backward() on this new autograd variable, new_var_z, which was created from the data of var_z, we get an exception, because there's nothing there. Here we get a runtime error saying that the element of the variable does not require grad and does not have a gradient function. This is the important point: it does not have a gradient function.
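The broken-graph situation can be sketched like this (modern tensors in place of autograd.Variable; .data returns the raw values with no history attached):

```python
import torch

var_x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
var_y = torch.tensor([[5.0, 6.0], [7.0, 8.0]], requires_grad=True)

var_z = var_x + var_y
print(var_z.grad_fn)            # AddBackward -- the history is present

var_z_data = var_z.data         # only the values; grad_fn is NOT carried over
new_var_z = var_z_data.clone()  # a fresh tensor wrapped around that data
print(new_var_z.grad_fn)        # None -- the computational graph is broken here
graph_is_broken = new_var_z.grad_fn is None

backward_failed = False
try:
    new_var_z.backward(torch.ones_like(new_var_z))
except RuntimeError as err:
    backward_failed = True
    print(err)  # "... does not require grad and does not have a grad_fn"
```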

The gradient function was lost. Last but not least in this session, I would like to briefly mention the CUDA functionality of the torch Tensor. As you probably remember, Keras and TensorFlow automatically detect whether you have GPU acceleration on your machine or not, and they execute everything on the GPU if a GPU is available, or on the CPU if it is not. With PyTorch, you can decide very granularly what to execute on the GPU: which Tensors you want to process where, on GPU or on CPU.

Here you have a check: torch.cuda.is_available(). You can run this check any time you want to decide where to execute a Tensor. If you want to execute a tensor on CUDA, and CUDA is available, you just call the .cuda() method, and the Tensor will run on the GPU. If you don't want to run the Tensor on the GPU, you don't call .cuda(), and it will run on the CPU.
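A minimal sketch of this per-tensor device choice:

```python
import torch

t = torch.tensor([1.0, 2.0, 3.0])

# check whether a CUDA GPU is available; this check can be run whenever
# you want to decide where a tensor should live
if torch.cuda.is_available():
    t = t.cuda()   # this one tensor moves to the GPU
# without .cuda() the tensor simply stays on the CPU

print(t.device)
print(t.sum())
```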

This is very, very flexible. So in my opinion, the main advantage of PyTorch is its huge flexibility. Okay, I hope you enjoyed the session. Next time we're going to try something really practical: we're going to build a linear model with PyTorch. See you then. Enjoy our sessions. Bye bye.
