A gentle explanation of Backpropagation in Convolutional Neural Network (CNN)

Recently, I have read several articles about Convolutional Neural Networks, for example, [1], [2], and the notes of the Stanford class CS231n: Convolutional Neural Networks for Visual Recognition [3]. These articles explain a Convolutional Neural Network's architecture and its layers very well, but they don't include a detailed explanation of Backpropagation in a Convolutional Neural Network. After digging deeper into the Internet, I found two articles, [4] and [5], that explain the Backpropagation phase in depth, but they still felt abstract to me. Because I wanted a more tangible and detailed explanation, I decided to write this article myself. I hope you find it helpful.

A PDF version is here.

1. Prerequisites

To fully understand this article, I highly recommend that you first read articles [1], [2], and [3] to firmly grasp the foundations of Convolutional Neural Networks.

2. Architecture

In this article, I will build a real Convolutional Neural Network from scratch to classify handwritten digits in the MNIST dataset provided by http://yann.lecun.com/exdb/mnist/. At an abstract level, the architecture looks like:

Figure 1: Abstract Architecture

where

  • I is the grayscale input image with size 28×28
  • the kernel of the first Convolution Layer is a 3D array with size 6×5×5
  • the bias of the first Convolution Layer is a 1D array with size 6
  • the kernel of the second Convolution Layer is a 4D array with size 12×6×5×5
  • the bias of the second Convolution Layer is a 1D array with size 12
  • the weight w of the Fully Connected Layer is a 2D array with size 10×192
  • the bias b of the Fully Connected Layer is a 1D array with size 10
  • the output O is a 1D array with size 10
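
As a quick sanity check on these shapes, here is how the parameters could be laid out in NumPy. The variable names and the initialization scheme are my own, for illustration only:

import numpy as np

I  = np.zeros((28, 28))                    # grayscale input image
k1 = np.random.randn(6, 5, 5) * 0.1        # first Convolution Layer's kernels -> 6 feature maps of 24x24
b1 = np.zeros(6)
k2 = np.random.randn(12, 6, 5, 5) * 0.1    # second Convolution Layer's kernels -> 12 feature maps of 8x8
b2 = np.zeros(12)
w  = np.random.randn(10, 192) * 0.1        # Fully Connected Layer: 12 * 4 * 4 = 192 inputs, 10 outputs
b  = np.zeros(10)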

In the first and second Convolution Layers, I use ReLU (Rectified Linear Unit) as the activation function. In the first and second Pooling Layers, I use MaxPool with a pool size of 2×2. And in the Fully Connected Layer, I use Softmax as the activation function.
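
For reference, these are the standard definitions of these two activation functions, with z denoting the 10 pre-activation scores of the Fully Connected Layer:

\mathrm{ReLU}(x) = \max(0, x), \qquad
\mathrm{Softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{10} e^{z_j}}, \quad i = 1, \dots, 10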

Zooming into the abstract architecture, we get a detailed architecture split into the following two parts (I split the detailed architecture into two parts because it is too long to fit on a single page):

Figure 2: Detailed Architecture — part 1

Figure 3: Detailed Architecture — part 2

As with a standard Neural Network, training a Convolutional Neural Network consists of two phases: Feedforward and Backpropagation.
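
In the Backpropagation phase, every parameter θ (the kernels, weights, and biases listed above) is updated by gradient descent on the loss L with learning rate η:

\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}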

5. Implementation

Convolution Layers:
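
The full implementation is in the GitHub repository; below is a minimal NumPy sketch of a convolution layer with both the forward and backward passes, matching the shapes in Figure 1. The class name, the plain nested loops, and the random initialization are my own choices, and I treat the grayscale input as having one channel, so the first layer's 3D kernel from Figure 1 becomes a 6×1×5×5 array here:

import numpy as np

class Conv:
    """Valid cross-correlation: input (C, H, W) -> output (F, H - K + 1, W - K + 1)."""

    def __init__(self, num_filters, num_channels, kernel_size):
        self.kernels = np.random.randn(num_filters, num_channels, kernel_size, kernel_size) * 0.1
        self.biases = np.zeros(num_filters)
        self.k = kernel_size

    def forward(self, x):
        self.x = x                                    # cache the input for the backward pass
        C, H, W = x.shape
        F, k = self.kernels.shape[0], self.k
        out = np.zeros((F, H - k + 1, W - k + 1))
        for f in range(F):
            for i in range(H - k + 1):
                for j in range(W - k + 1):
                    out[f, i, j] = np.sum(x[:, i:i + k, j:j + k] * self.kernels[f]) + self.biases[f]
        return out

    def backward(self, d_out, lr):
        """d_out has the output's shape; updates kernels/biases and returns dL/dx."""
        x, k = self.x, self.k
        d_kernels = np.zeros_like(self.kernels)
        d_x = np.zeros_like(x)
        F, H_out, W_out = d_out.shape
        for f in range(F):
            for i in range(H_out):
                for j in range(W_out):
                    d_kernels[f] += d_out[f, i, j] * x[:, i:i + k, j:j + k]
                    d_x[:, i:i + k, j:j + k] += d_out[f, i, j] * self.kernels[f]
        d_biases = d_out.sum(axis=(1, 2))
        self.kernels -= lr * d_kernels                # gradient-descent update
        self.biases -= lr * d_biases
        return d_x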

Pooling Layer:
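
A matching sketch of the 2×2 MaxPool layer. The backward pass routes each incoming gradient back to the position that produced the maximum in the forward pass (again, names and structure are my own):

import numpy as np

class MaxPool2x2:
    """2x2 max pooling: input (C, H, W) -> output (C, H // 2, W // 2)."""

    def forward(self, x):
        self.x = x                                    # cache the input to locate the max positions later
        C, H, W = x.shape
        out = np.zeros((C, H // 2, W // 2))
        for c in range(C):
            for i in range(H // 2):
                for j in range(W // 2):
                    out[c, i, j] = np.max(x[c, 2 * i:2 * i + 2, 2 * j:2 * j + 2])
        return out

    def backward(self, d_out):
        """Send each gradient only to the input cell that held the maximum."""
        x = self.x
        d_x = np.zeros_like(x)
        C, H_out, W_out = d_out.shape
        for c in range(C):
            for i in range(H_out):
                for j in range(W_out):
                    window = x[c, 2 * i:2 * i + 2, 2 * j:2 * j + 2]
                    mask = (window == np.max(window))
                    d_x[c, 2 * i:2 * i + 2, 2 * j:2 * j + 2] += mask * d_out[c, i, j]
        return d_x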

ReLU, Softmax, and Loss functions:
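
A sketch of the activation and loss functions, assuming the usual pairing of Softmax with cross-entropy loss (the repository's code may organize these differently):

import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_backward(d_out, x):
    """The gradient passes only where the forward input was positive."""
    return d_out * (x > 0)

def softmax(z):
    e = np.exp(z - np.max(z))        # shift for numerical stability
    return e / np.sum(e)

def cross_entropy_loss(probs, label):
    """label is the correct digit (0-9); probs is the Softmax output."""
    return -np.log(probs[label])

def softmax_cross_entropy_backward(probs, label):
    """Combined gradient of Softmax + cross-entropy w.r.t. the Fully Connected pre-activations."""
    d_z = probs.copy()
    d_z[label] -= 1
    return d_z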

CNN (including Feedforward and Backpropagation):
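
Putting the sketches above together into one network, with Feedforward and Backpropagation walking the same pipeline in opposite directions (again a rough sketch, not the repository's actual class):

import numpy as np

class CNN:
    """Wires the sketches above into the pipeline of Figure 1."""

    def __init__(self):
        self.conv1 = Conv(num_filters=6, num_channels=1, kernel_size=5)
        self.pool1 = MaxPool2x2()
        self.conv2 = Conv(num_filters=12, num_channels=6, kernel_size=5)
        self.pool2 = MaxPool2x2()
        self.w = np.random.randn(10, 192) * 0.1
        self.b = np.zeros(10)

    def forward(self, image):
        # Feedforward: conv -> ReLU -> pool, twice, then fully connected + Softmax.
        self.z1 = self.conv1.forward(image.reshape(1, 28, 28))   # (6, 24, 24)
        self.a1 = relu(self.z1)
        self.p1 = self.pool1.forward(self.a1)                    # (6, 12, 12)
        self.z2 = self.conv2.forward(self.p1)                    # (12, 8, 8)
        self.a2 = relu(self.z2)
        self.p2 = self.pool2.forward(self.a2)                    # (12, 4, 4)
        self.flat = self.p2.flatten()                            # (192,)
        self.scores = self.w @ self.flat + self.b                # (10,)
        self.probs = softmax(self.scores)
        return self.probs

    def backward(self, label, lr):
        # Backpropagation: start from the loss gradient and update every parameter on the way back.
        d_scores = softmax_cross_entropy_backward(self.probs, label)
        d_w = np.outer(d_scores, self.flat)
        d_flat = self.w.T @ d_scores
        self.w -= lr * d_w
        self.b -= lr * d_scores
        d_p2 = d_flat.reshape(self.p2.shape)
        d_a2 = self.pool2.backward(d_p2)
        d_z2 = relu_backward(d_a2, self.z2)
        d_p1 = self.conv2.backward(d_z2, lr)
        d_a1 = self.pool1.backward(d_p1)
        d_z1 = relu_backward(d_a1, self.z1)
        self.conv1.backward(d_z1, lr)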

6. Training the Convolutional Neural Network

Training script:
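
A rough sketch of what such a training loop might look like, assuming the CNN class and helpers sketched above and a hypothetical MNIST loader that yields images scaled to [0, 1] and integer labels:

import numpy as np

def train(cnn, train_images, train_labels, test_images, test_labels,
          epochs=10, lr=0.005):
    for epoch in range(1, epochs + 1):
        # One pass over the training set: Feedforward + Backpropagation per image.
        for image, label in zip(train_images, train_labels):
            cnn.forward(image)
            cnn.backward(label, lr)

        # Evaluate on the test set after each epoch.
        losses, correct = [], 0
        for image, label in zip(test_images, test_labels):
            probs = cnn.forward(image)
            losses.append(cross_entropy_loss(probs, label))
            correct += int(np.argmax(probs) == label)

        print(f"Epoch: {epoch}, validate_average_loss: {np.mean(losses)}, "
              f"validate_accuracy: {100.0 * correct / len(test_labels):.2f}%")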

We train the Convolutional Neural Network with 10,000 training images and a learning rate of 0.005. After each epoch, we evaluate the network against 1,000 test images. After 10 epochs, we get the following results:

Epoch: 1, validate_average_loss: 0.21975272097355802, validate_accuracy: 92.60%
Epoch: 2, validate_average_loss: 0.12023064924979249, validate_accuracy: 96.60%
Epoch: 3, validate_average_loss: 0.08324938936477308, validate_accuracy: 96.90%
Epoch: 4, validate_average_loss: 0.11886395613170263, validate_accuracy: 96.50%
Epoch: 5, validate_average_loss: 0.12090886461215948, validate_accuracy: 96.10%
Epoch: 6, validate_average_loss: 0.09011801069693898, validate_accuracy: 96.80%
Epoch: 7, validate_average_loss: 0.09669009218675029, validate_accuracy: 97.00%
Epoch: 8, validate_average_loss: 0.09173558774169109, validate_accuracy: 97.20%
Epoch: 9, validate_average_loss: 0.08829789823772816, validate_accuracy: 97.40%
Epoch: 10, validate_average_loss: 0.07436090860825195, validate_accuracy: 98.10%

As you can see, the Average Loss has decreased from about 0.22 to 0.07 and the Accuracy has increased from 92.60% to 98.10%.

Next, we train the Convolutional Neural Network with the full training set (60,000 images) and, after each epoch, evaluate it against the full test set (10,000 images). After 10 epochs, we get the following results:

Epoch: 1, validate_average_loss: 0.05638172577698067, validate_accuracy: 98.22%
Epoch: 2, validate_average_loss: 0.046379447686687364, validate_accuracy: 98.52%
Epoch: 3, validate_average_loss: 0.04608373226431266, validate_accuracy: 98.64%
Epoch: 4, validate_average_loss: 0.039190748866389284, validate_accuracy: 98.77%
Epoch: 5, validate_average_loss: 0.03521482791549167, validate_accuracy: 98.97%
Epoch: 6, validate_average_loss: 0.040033883784694996, validate_accuracy: 98.76%
Epoch: 7, validate_average_loss: 0.0423066147028397, validate_accuracy: 98.85%
Epoch: 8, validate_average_loss: 0.03472158758304639, validate_accuracy: 98.97%
Epoch: 9, validate_average_loss: 0.0685201646233985, validate_accuracy: 98.09%
Epoch: 10, validate_average_loss: 0.04067345041070258, validate_accuracy: 98.91%

At epoch 8, the Average Loss has decreased to 0.03 and the Accuracy has increased to 98.97%. So it is clear that training the CNN with more training images yields a network with higher accuracy and lower average loss; in other words, the CNN generalizes better.

7. Using the trained Convolutional Neural Network to infer handwritten digits

For example, we execute the inference script with the argument -i 2020 to infer the digit in the test image with index 2020:

$ python test_model.py -i 2020

The result: the trained Convolutional Neural Network inferred the test image with index 2020 correctly, with 100% confidence.
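
For reference, the core of such an inference step can be sketched as a small function over the trained network (the actual contents of test_model.py may differ):

import numpy as np

def infer(cnn, test_images, test_labels, index):
    """Run one test image through the trained network and report the prediction."""
    probs = cnn.forward(test_images[index])
    prediction = int(np.argmax(probs))
    confidence = 100.0 * probs[prediction]
    print(f"Predicted digit: {prediction} with {confidence:.2f}% confidence "
          f"(true label: {test_labels[index]})")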

8. Conclusion

Deriving Backpropagation in a Convolutional Neural Network and implementing it from scratch has helped me understand Convolutional Neural Networks much more deeply and tangibly. Hopefully, you will also gain a deeper understanding of Convolutional Neural Networks after reading this article. If you have any questions or find any mistakes, please drop me a comment. In addition, I have pushed the entire source code to GitHub in the NeuralNetworks repository; feel free to clone it.

References

[1] https://victorzhou.com/blog/intro-to-cnns-part-1/

[2] https://towardsdatascience.com/convolutional-neural-networks-from-the-ground-up-c67bb41454e1

[3] http://cs231n.github.io/convolutional-networks/

[4] http://cbelwal.blogspot.com/2018/05/part-i-backpropagation-mechanics-for.html

[5] Zhifei Zhang. Derivation of Backpropagation in Convolutional Neural Network (CNN). University of Tennessee, Knoxville, TN, October 18, 2016.
https://pdfs.semanticscholar.org/5d79/11c93ddcb34cac088d99bd0cae9124e5dcd1.pdf