Forward and Back Propagation over a CNN… code from Scratch!!
The name “convolutional neural network” indicates that the network employs a mathematical operation called convolution. Convolution is a specialized kind of linear operation. Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers. They are most useful in computer vision algorithms and models.
If you don’t have any idea of how convolutional neural networks or backpropagation work, I strongly recommend watching the whole cs231n course.
Keep in mind: the forward pass computes the result of an operation and saves in memory any intermediates needed for gradient computation. The backward pass applies the chain rule to compute the gradient of the loss function with respect to the inputs.
The intuition behind backpropagation (the chain rule) in a CNN can be summed up in the next two images, which were extremely helpful in my process of figuring it out:
The forward pass calculates z as a function f(x, y) of the input variables x and y. In the backward pass, the gradients of the loss with respect to x and y are calculated by applying the chain rule: the node receives dL/dz, the gradient of the loss function with respect to z, from above.
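As a minimal sketch of that picture (a hypothetical one-node example, not the CNN code below): if z = f(x, y) = x * y, the forward pass caches x and y, and the backward pass turns the incoming dL/dz into dL/dx and dL/dy with the chain rule.

def forward(x, y):
    z = x * y
    cache = (x, y)        # save the intermediates needed for the backward pass
    return z, cache

def backward(dz, cache):  # dz is dL/dz, received from the layer above
    x, y = cache
    dx = dz * y           # chain rule: dL/dx = dL/dz * dz/dx
    dy = dz * x           # chain rule: dL/dy = dL/dz * dz/dy
    return dx, dy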
Goal
Our goal is to find out how the gradient propagates backward through a convolutional layer. In backpropagation, the goal is to find db, dx, and dw from dL/dZ by applying the chain rule!
The forward pass is defined like this:
The input consists of n data points, each with c channels, height h, and width w. We convolve each input with f different filters, where each filter spans all c channels and has height kh and width kw.
Input:
- x: Input data of shape (n, h, w, c)
- w: Filter weights of shape (f, kh, kw, c)
- ‘stride’: The number of pixels between adjacent receptive fields in the horizontal and vertical directions.
- ‘pad’: The number of pixels that will be used to zero-pad the input.
During padding, ‘pad’ zeros should be placed symmetrically (i.e., equally on both sides) along the height and width axes of the input.
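For example, this symmetric zero-padding is exactly what np.pad does in the code further down (a small sketch using the (n, h, w, c) layout of this article):

import numpy as np

x = np.arange(2 * 4 * 4 * 3).reshape(2, 4, 4, 3)     # (n, h, w, c)
pad = 1
x_padded = np.pad(x, [(0, 0), (pad, pad), (pad, pad), (0, 0)],
                  mode='constant', constant_values=0)
print(x_padded.shape)   # (2, 6, 6, 3): 'pad' zeros added on both sides of h and w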
The following convolution operation takes an input X of size 7×7, uses a single filter W of size 3×3 with no padding and stride = 1, and generates an output H of size 5×5. Also note that while performing the forward pass we cache the variables X and the filter W, so each output map is tied to the X and the kernel used to produce it. Here we perform the convolution operation without flipping the filter (i.e., a cross-correlation).
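As a quick sanity check of those numbers with the usual output-size formula (plain arithmetic, nothing specific to the code below):

H, K, P, S = 7, 3, 0, 1                 # input size, kernel size, padding, stride
out = (H - K + 2 * P) // S + 1
print(out)                              # 5, i.e. a 5x5 output map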
The backpropagation:
We need to assume that we receive dZ as input (from the backward pass of the next layer). It is important to understand that the dx we compute here will, in turn, be the input to the backward pass of the previous layer. Any change in a weight of the filter affects all of the output pixels, because each weight in the filter contributes to every pixel in the output map. So how do we get each derivative?
db
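Since the bias of channel f is added to every pixel of that output map, its gradient is simply the sum of dZ over all images and spatial positions. This is the one-liner used in the conv_backward code at the end of the article (toy shapes here just for illustration):

import numpy as np

dZ = np.random.randn(10, 26, 26, 2)              # upstream gradient (m, h_new, w_new, c_new)
db = np.sum(dZ, axis=(0, 1, 2), keepdims=True)   # shape (1, 1, 1, c_new)
print(db.shape)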
dw
We can notice that dW is a convolution of the input x with dZ acting as the filter. Let’s see if it’s still valid with an added dimension.
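In index form, and for a single image, single channel and stride 1, that statement translates into the small sketch below (hypothetical toy shapes; the full conv_backward code at the end loops this over all images, channels and filters):

import numpy as np

x = np.random.randn(5, 5)      # single-channel input
dZ = np.random.randn(3, 3)     # upstream gradient, same shape as the output map
k_h = k_w = 3                  # filter size
h_new = w_new = 3              # output size

dW = np.zeros((k_h, k_w))
for a in range(k_h):
    for b in range(k_w):
        # dW[a, b] = sum over output positions (i, j) of x[i + a, j + b] * dZ[i, j]
        dW[a, b] = np.sum(x[a:a + h_new, b:b + w_new] * dZ)
print(dW)                      # a "valid" convolution of x with dZ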
dx
We can notice that dx is a full convolution of dZ with the filter w flipped by 180°. Let’s see if it’s still valid with an added dimension.
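Again for a single image, single channel and stride 1, the scatter-add below (hypothetical toy shapes) produces exactly that full convolution; it is the same += that appears inside conv_backward:

import numpy as np

W = np.random.randn(3, 3)      # filter
dZ = np.random.randn(3, 3)     # upstream gradient (the output map was 3x3)
k_h = k_w = 3
h_new = w_new = 3

dx = np.zeros((h_new + k_h - 1, w_new + k_w - 1))   # same shape as the 5x5 input
for i in range(h_new):
    for j in range(w_new):
        # each output pixel scatters its gradient through the whole filter
        dx[i:i + k_h, j:j + k_w] += dZ[i, j] * W
print(dx.shape)                # (5, 5)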
Derivative computation (backward pass), since pictures speak louder than words
Back propagation illustration from the article Back Propagation in Convolutional Neural Networks — Intuition and Code, Mayank Agarwal.
Let’s code!!!
#!/usr/bin/env python3
"""Convolutional Neural Networks"""
import numpy as np


def conv_forward(A_prev, W, b, activation, padding="same", stride=(1, 1)):
    """Forward prop over a convolutional layer of 3D (RGB) images
    Args:
        A_prev: output of the previous layer (m, h_prev, w_prev, c_prev)
        W: filters for the convolution (kh, kw, c_prev, c_new)
        b: biases (1, 1, 1, c_new)
        padding: string, 'same' or 'valid'
        stride: tuple (sh, sw)
    Returns: the convolved, activated output as a np.array
    """
    m, h_prev, w_prev, c_prev = A_prev.shape
    k_h, k_w, c_prev, c_new = W.shape
    s_h, s_w = stride

    # amount of zero-padding on each side of the height and width axes
    if padding == 'valid':
        p_h = 0
        p_w = 0
    if padding == 'same':
        p_h = int(np.ceil(((s_h * h_prev) - s_h + k_h - h_prev) / 2))
        p_w = int(np.ceil(((s_w * w_prev) - s_w + k_w - w_prev) / 2))

    A_prev = np.pad(A_prev, [(0, 0), (p_h, p_h), (p_w, p_w), (0, 0)],
                    mode='constant', constant_values=0)

    # output spatial dimensions
    out_h = int(((h_prev - k_h + (2 * p_h)) / s_h) + 1)
    out_w = int(((w_prev - k_w + (2 * p_w)) / s_w) + 1)
    output_conv = np.zeros((m, out_h, out_w, c_new))

    m_A_prev = np.arange(0, m)
    for i in range(out_h):
        for j in range(out_w):
            for f in range(c_new):
                # slide the f-th filter over every image at once
                output_conv[m_A_prev, i, j, f] = activation(
                    np.sum(np.multiply(
                        A_prev[m_A_prev,
                               i * s_h:k_h + (i * s_h),
                               j * s_w:k_w + (j * s_w)],
                        W[:, :, :, f]), axis=(1, 2, 3)) + b[0, 0, 0, f])
    return output_conv
if __name__ == "__main__":
    import matplotlib.pyplot as plt

    np.random.seed(0)
    lib = np.load('../data/MNIST.npz')
    X_train = lib['X_train']
    m, h, w = X_train.shape
    X_train_c = X_train.reshape((-1, h, w, 1))
    W = np.random.randn(3, 3, 1, 2)
    b = np.random.randn(1, 1, 1, 2)

    def relu(Z):
        return np.maximum(Z, 0)

    plt.imshow(X_train[0])
    plt.show()
    A = conv_forward(X_train_c, W, b, relu, padding='valid')
    print(A.shape)
    plt.imshow(A[0, :, :, 0])
    plt.show()
    plt.imshow(A[0, :, :, 1])
    plt.show()
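Assuming the standard 28×28 MNIST images, the ‘valid’ convolution with a 3×3 filter and stride 1 gives (28 - 3) / 1 + 1 = 26, so print(A.shape) should show (m, 26, 26, 2), and the two plots display one activated feature map per filter.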
Backpropagation over a convolutional layer of a neural network:
#!/usr/bin/env python3
"""Convolutional Neural Networks"""
import numpy as np


def conv_backward(dZ, A_prev, W, b, padding="same", stride=(1, 1)):
    """Back prop over a convolutional layer of 3D (RGB) images
    Args:
        dZ: partial derivatives of the loss w.r.t. the layer output
            (m, h_new, w_new, c_new)
        A_prev: output of the previous layer (m, h_prev, w_prev, c_prev)
        W: filters for the convolution (kh, kw, c_prev, c_new)
        b: biases (1, 1, 1, c_new)
        padding: string, 'same' or 'valid'
        stride: tuple (sh, sw)
    Returns: partial derivatives w.r.t. the previous layer (dx),
             the kernels (dW) and the biases (db)
    """
    k_h, k_w, c_prev, c_new = W.shape
    _, h_new, w_new, c_new = dZ.shape
    m, h_x, w_x, c_prev = A_prev.shape
    s_h, s_w = stride
    x = A_prev

    if padding == 'valid':
        p_h = 0
        p_w = 0
    if padding == 'same':
        p_h = int(np.ceil(((s_h * h_x) - s_h + k_h - h_x) / 2))
        p_w = int(np.ceil(((s_w * w_x) - s_w + k_w - w_x) / 2))

    # the bias gradient is the sum of dZ over every image and spatial position
    db = np.sum(dZ, axis=(0, 1, 2), keepdims=True)

    x_padded = np.pad(x, [(0, 0), (p_h, p_h), (p_w, p_w), (0, 0)],
                      mode='constant', constant_values=0)
    dW = np.zeros_like(W)
    dx = np.zeros(x_padded.shape)

    for i in range(m):
        for h in range(h_new):
            for w in range(w_new):
                for f in range(c_new):
                    # each output pixel scatters its gradient through the filter,
                    # and the matching input patch accumulates into dW
                    dx[i,
                       h * s_h:(h * s_h) + k_h,
                       w * s_w:(w * s_w) + k_w,
                       :] += dZ[i, h, w, f] * W[:, :, :, f]
                    dW[:, :, :, f] += x_padded[i,
                                               h * s_h:(h * s_h) + k_h,
                                               w * s_w:(w * s_w) + k_w,
                                               :] * dZ[i, h, w, f]

    if padding == 'same':
        # drop the padded borders so dx matches the unpadded input shape
        dx = dx[:, p_h:-p_h, p_w:-p_w, :]
    return dx, dW, db


if __name__ == "__main__":
    np.random.seed(0)
    lib = np.load('../data/MNIST.npz')
    X_train = lib['X_train']
    _, h, w = X_train.shape
    X_train_c = X_train[:10].reshape((-1, h, w, 1))
    W = np.random.randn(3, 3, 1, 2)
    b = np.random.randn(1, 1, 1, 2)
    dZ = np.random.randn(10, h - 2, w - 2, 2)
    print(conv_backward(dZ, X_train_c, W, b, padding="valid"))
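To convince yourself that conv_backward is consistent with conv_forward, here is a minimal, hypothetical gradient check (not part of the original code; it assumes both functions from above are in scope). With an identity activation, Z is just the convolution plus the bias, so for the loss L = sum(Z) the upstream gradient dL/dZ is a tensor of ones, and the analytic dW can be compared against a finite-difference estimate:

import numpy as np

np.random.seed(0)
x = np.random.randn(2, 5, 5, 1)            # (m, h, w, c_prev)
W = np.random.randn(3, 3, 1, 2)            # (kh, kw, c_prev, c_new)
b = np.random.randn(1, 1, 1, 2)

def identity(z):
    return z

def loss(W_):
    return np.sum(conv_forward(x, W_, b, identity, padding='valid'))

dZ = np.ones((2, 3, 3, 2))                 # dL/dZ when L = sum(Z)
_, dW, _ = conv_backward(dZ, x, W, b, padding='valid')

eps = 1e-5
W_plus = W.copy()
W_plus[1, 1, 0, 0] += eps
numeric = (loss(W_plus) - loss(W)) / eps   # numerical estimate of dW[1, 1, 0, 0]
print(numeric, dW[1, 1, 0, 0])             # the two numbers should be very close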
Other articles that you could find interesting:
Derivation of Backpropagation in Convolutional Neural Network (CNN)
Backpropagation in a convolutional layer
Understanding the backward pass through Batch Normalization Layer
Backpropagation in a Convolutional Neural Network
I hope this article helps you understand the intuition behind forward and backward propagation over a CNN. If you have any comments or fixes, please do not hesitate to contact me or send me an email.
You can find more projects and machine learning paper implementations on my GitHub.