Neural Networks — PyTorch Tutorials 1.13.1+cu117 documentation
Neural Networks¶
Neural networks can be constructed using the torch.nn package.
Now that you have had a glimpse of autograd, note that nn depends on autograd to define models and differentiate them. An nn.Module contains layers, and a method forward(input) that returns the output.
For example, look at this network that classifies digit images:
It is a simple feed-forward network. It takes the input, feeds it
through several layers one after the other, and then finally gives the
output.
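A network of this kind could be sketched as follows. This is a minimal sketch, not necessarily the exact model this page refers to: the layer sizes (a 1-channel 32x32 input, 6 and 16 convolution channels, 5x5 kernels, and fully connected layers of 120, 84 and 10 units) are assumptions, chosen to agree with the 6-element conv1 bias and the 10-element target that appear later on this page.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Assumed sizes: 1 input channel, 6 then 16 output channels, 5x5 kernels
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # For a 32x32 input, 16 channels of 5x5 features remain after conv2 + pooling
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1)  # flatten every dimension except the batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


net = Net()
input = torch.randn(1, 1, 32, 32)  # a dummy batch holding one 32x32 image
output = net(input)
print(output.shape)  # torch.Size([1, 10])
```

Only forward needs to be written by hand; autograd derives the backward pass from the operations used in it.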
A typical training procedure for a neural network is as follows:
- Define the neural network that has some learnable parameters (or weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far is the output from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule:
  weight = weight - learning_rate * gradient
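The steps above can be sketched as a short loop. This is only an illustration under stated assumptions: the tiny linear model, the random data, and the learning rate of 0.01 are made up for the example and are not part of this tutorial's network.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny model and a random "dataset"
model = nn.Linear(4, 2)
inputs = torch.randn(8, 4)
targets = torch.randn(8, 2)
criterion = nn.MSELoss()
learning_rate = 0.01

with torch.no_grad():
    initial_loss = criterion(model(inputs), targets).item()

for epoch in range(20):
    for x, y in zip(inputs, targets):      # iterate over the dataset
        output = model(x)                  # process input through the network
        loss = criterion(output, y)        # compute the loss
        model.zero_grad()                  # clear gradients left from the previous step
        loss.backward()                    # propagate gradients back to the parameters
        with torch.no_grad():              # weight = weight - learning_rate * gradient
            for param in model.parameters():
                param -= learning_rate * param.grad

with torch.no_grad():
    final_loss = criterion(model(inputs), targets).item()

print(initial_loss, final_loss)
```

The manual update is wrapped in torch.no_grad() so that the weight change itself is not recorded in the autograd graph.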
Loss Function¶
A loss function takes the (output, target) pair of inputs, and computes a
value that estimates how far away the output is from the target.
There are several different loss functions under the nn package.
A simple loss is nn.MSELoss, which computes the mean-squared error between the output and the target.
For example:
    # `net` and `input` are assumed to be defined already
    output = net(input)
    target = torch.randn(10)  # a dummy target, for example
    target = target.view(1, -1)  # make it the same shape as output
    criterion = nn.MSELoss()

    loss = criterion(output, target)
    print(loss)
tensor(1.0810, grad_fn=<MseLossBackward0>)
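To make the computation concrete, the same value can be reproduced by hand. The sketch below assumes the default reduction of nn.MSELoss (the mean over all elements) and uses made-up tensors rather than the network output above.

```python
import torch
import torch.nn as nn

# Made-up values standing in for a network output and its target
output = torch.tensor([[1.0, 2.0, 3.0]])
target = torch.tensor([[1.5, 2.0, 1.0]])

criterion = nn.MSELoss()  # default reduction='mean'
loss = criterion(output, target)

# Mean of the squared element-wise differences: (0.25 + 0 + 4) / 3
manual = ((output - target) ** 2).mean()

print(loss.item(), manual.item())
```

Both values agree, which is why the target has to be reshaped to match the output before the criterion is called.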
Now, if you follow loss in the backward direction, using its .grad_fn attribute, you will see a graph of computations that looks like this:
    input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
          -> flatten -> linear -> relu -> linear -> relu -> linear
          -> MSELoss
          -> loss
So, when we call loss.backward(), the whole graph is differentiated w.r.t. the neural net parameters, and all Tensors in the graph that have requires_grad=True will have their .grad Tensor accumulated with the gradient.
For illustration, let us follow a few steps backward:
    print(loss.grad_fn)  # MSELoss
    print(loss.grad_fn.next_functions[0][0])  # Linear
    # First input of the Linear node; in the output shown here this is an
    # AccumulateGrad node (a leaf parameter), not ReLU
    print(loss.grad_fn.next_functions[0][0].next_functions[0][0])
<MseLossBackward0 object at 0x7fbc61b81ae0>
<AddmmBackward0 object at 0x7fbc61b81570>
<AccumulateGrad object at 0x7fbc61b83a30>
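Following next_functions by hand gets tedious, so the walk can be automated. The helper below, walk_graph, is a hypothetical name introduced for this sketch; it relies only on the grad_fn and next_functions attributes used above, and it runs on a small made-up computation instead of the tutorial's network. The exact node names printed may differ between PyTorch versions.

```python
import torch
import torch.nn as nn


def walk_graph(fn, depth=0, names=None):
    """Collect the class names of backward nodes reachable from fn, depth-first."""
    if names is None:
        names = []
    if fn is None:  # inputs that do not require gradients have no node
        return names
    names.append("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk_graph(next_fn, depth + 1, names)
    return names


# A small stand-in computation: linear -> relu -> MSELoss
layer = nn.Linear(3, 2)
x = torch.randn(1, 3)
loss = nn.MSELoss()(torch.relu(layer(x)), torch.zeros(1, 2))

names = walk_graph(loss.grad_fn)
print("\n".join(names))
```

The AccumulateGrad entries mark the leaves of the graph, that is, the parameters whose .grad will be filled in by loss.backward().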
Backprop¶
To backpropagate the error, all we have to do is call loss.backward().
You need to clear the existing gradients first, though; otherwise the new gradients will be accumulated into the existing ones.
Now we shall call loss.backward(), and have a look at conv1’s bias gradients before and after the backward pass.
    net.zero_grad()  # zeroes the gradient buffers of all parameters

    print('conv1.bias.grad before backward')
    print(net.conv1.bias.grad)

    loss.backward()

    print('conv1.bias.grad after backward')
    print(net.conv1.bias.grad)
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([ 0.0018, 0.0302, 0.0162, -0.0062, -0.0122, 0.0102])
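The accumulation behaviour mentioned above can be checked on a small example. The sketch below uses a made-up scalar parameter w rather than the network: calling backward twice without clearing doubles the stored gradient, and resetting it in between restores the single-pass value.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # made-up parameter for illustration

# d(w * 3)/dw = 3
(w * 3).backward()
first = w.grad.clone()

# A second backward pass adds to the existing gradient: 3 + 3 = 6
(w * 3).backward()
accumulated = w.grad.clone()

# Clearing the gradient first gives the single-pass value again
w.grad.zero_()
(w * 3).backward()
after_reset = w.grad.clone()

print(first.item(), accumulated.item(), after_reset.item())  # 3.0 6.0 3.0
```

This is why the gradient buffers are zeroed before each backward pass in a training loop.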
Now, we have seen how to use loss functions.
Read Later:
The neural network package contains various modules and loss functions
that form the building blocks of deep neural networks. A full list with
documentation is here.
The only thing left to learn is:
- Updating the weights of the network