Neural Networks

Neural networks can be constructed using the torch.nn package.

Now that you have had a glimpse of autograd, note that nn depends on
autograd to define models and differentiate them.
An nn.Module contains layers and a method forward(input) that
returns the output.

For example, look at this network that classifies digit images:

[Figure: convnet]

It is a simple feed-forward network. It takes the input, feeds it
through several layers one after the other, and then finally gives the
output.
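As a rough sketch of what such a network can look like as an nn.Module, the block below defines layers in __init__ and chains them in forward(input). The layer sizes (a 1-channel 32x32 input, 5x5 kernels, 16 feature maps in the second convolution, hidden sizes 120 and 84) are assumptions chosen for illustration, not values stated in this section.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Assumed sizes: 1 input channel, 6 then 16 feature maps, 5x5 kernels.
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # On a 32x32 input, 16 maps of 5x5 remain after two conv + pool stages.
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)  # 10 outputs, one per digit class

    def forward(self, input):
        x = F.max_pool2d(F.relu(self.conv1(input)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1)  # flatten everything except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


net = Net()
input = torch.randn(1, 1, 32, 32)  # a dummy batch holding one 32x32 image
output = net(input)
print(output.shape)  # torch.Size([1, 10])
```

Only forward needs to be written by hand; autograd derives the backward pass from the operations used inside it.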

A typical training procedure for a neural network is as follows:

- Define the neural network that has some learnable parameters (or weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far the output is from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient
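The steps above can be sketched as a short loop. This is a minimal illustration under stated assumptions: the toy data, the one-layer model, the learning rate of 0.1, and the 50 iterations are all hypothetical, and the whole dataset is treated as a single batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy data: 64 samples, 3 features, a noiseless linear target.
inputs = torch.randn(64, 3)
targets = inputs @ torch.tensor([[1.0], [-2.0], [0.5]])

model = nn.Linear(3, 1)    # a network with learnable weights
criterion = nn.MSELoss()
learning_rate = 0.1

losses = []
for step in range(50):                 # iterate over the (single-batch) dataset
    output = model(inputs)             # process input through the network
    loss = criterion(output, targets)  # compute the loss
    model.zero_grad()                  # clear gradients from the previous step
    loss.backward()                    # propagate gradients to the parameters
    with torch.no_grad():              # apply the update outside of autograd
        for weight in model.parameters():
            weight -= learning_rate * weight.grad
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

The inner loop is the update rule weight = weight - learning_rate * gradient applied to every parameter; in practice an optimizer from torch.optim usually takes over this step.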

Loss Function

A loss function takes the (output, target) pair of inputs, and computes a
value that estimates how far away the output is from the target.

There are several different loss functions under the nn package.
A simple loss is nn.MSELoss, which computes the mean-squared error
between the output and the target.

For example:


output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

tensor(1.0810, grad_fn=<MseLossBackward0>)
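To make the mean-squared error concrete, here is a small check that nn.MSELoss with its default settings matches the formula written out by hand. The tensor values are made up for the example.

```python
import torch
import torch.nn as nn

output = torch.tensor([[1.0, 2.0, 3.0]])
target = torch.tensor([[1.0, 0.0, 5.0]])

criterion = nn.MSELoss()  # the default reduction averages over all elements
loss = criterion(output, target)

# The same quantity by hand: mean of the squared element-wise differences.
manual = ((output - target) ** 2).mean()

print(loss.item(), manual.item())  # both are (0 + 4 + 4) / 3
```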

Now, if you follow loss in the backward direction, using its
.grad_fn attribute, you will see a graph of computations that looks
like this:


input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> flatten -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss

So, when we call loss.backward(), the whole graph is differentiated
w.r.t. the neural net parameters, and all Tensors in the graph that have
requires_grad=True will have their .grad Tensor accumulated with the
gradient.
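A tiny sketch of that behaviour, using two hand-made tensors rather than the network: only the tensor created with requires_grad=True receives a .grad.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])                       # plain data, not tracked
w = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)  # a learnable tensor

loss = (w * x).sum()
loss.backward()

print(w.grad)  # d(loss)/dw, which equals x
print(x.grad)  # None, because x does not require gradients
```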

For illustration, let us follow a few steps backward:


print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # first input of Linear (an AccumulateGrad leaf in the output below, not ReLU)

<MseLossBackward0 object at 0x7fbc61b81ae0>
<AddmmBackward0 object at 0x7fbc61b81570>
<AccumulateGrad object at 0x7fbc61b83a30>
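Instead of chaining next_functions by hand, the graph can also be walked recursively. The helper below is a hypothetical utility written for this illustration (walk is not part of PyTorch), and the two-layer model is a made-up stand-in that is much smaller than the network above.

```python
import torch
import torch.nn as nn


def walk(fn, depth=0, names=None):
    """Print and collect the autograd node names reachable from fn."""
    if names is None:
        names = []
    if fn is None:  # inputs that need no gradient have no node
        return names
    names.append(type(fn).__name__)
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, names)
    return names


# Hypothetical two-layer model, used only to keep the printed graph short.
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
loss = nn.MSELoss()(model(torch.randn(1, 4)), torch.randn(1, 2))
names = walk(loss.grad_fn)
```

Each AccumulateGrad entry in the printout marks a leaf tensor, such as a weight or bias, whose .grad is filled in by backward(). The exact node names can differ between PyTorch versions.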

Backprop

To backpropagate the error, all we have to do is call loss.backward().
You need to clear the existing gradients first, though; otherwise the new
gradients will be accumulated into the existing ones.

Now we shall call loss.backward(), and have a look at conv1’s bias
gradients before and after the backward pass.


net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([ 0.0018,  0.0302,  0.0162, -0.0062, -0.0122,  0.0102])
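Why the gradients must be cleared is easiest to see in isolation. The sketch below uses a single hand-made tensor instead of the network and calls backward() twice without a reset, then once more after zeroing the buffer.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w ** 2).sum().backward()
first = w.grad.clone()        # 2 * w

(w ** 2).sum().backward()     # no reset: the new gradient is added to the old one
accumulated = w.grad.clone()  # 4 * w

w.grad.zero_()                # reset the buffer in place
(w ** 2).sum().backward()
after_reset = w.grad.clone()  # 2 * w again

print(first, accumulated, after_reset)
```

Calling net.zero_grad() plays the same role as w.grad.zero_() here, but for every parameter of the module at once.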

Now, we have seen how to use loss functions.

Read Later:

The neural network package contains various modules and loss functions
that form the building blocks of deep neural networks. A full list with
documentation is available in the torch.nn reference.

The only thing left to learn is: