Forward and Back Propagation over a CNN… code from Scratch!!

The name “convolutional neural network” indicates that the network employs a mathematical operation called convolution, a specialized kind of linear operation. Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers. They are most widely used in computer vision.

If you are not familiar with how convolutional neural networks or backpropagation work, I strongly recommend watching the whole CS231n course.

Keep in mind what each pass does. Forward: compute the result of an operation and save in memory any intermediates needed for the gradient computation. Backward: apply the chain rule to compute the gradient of the loss function with respect to the inputs.

The intuition behind backpropagation through a CNN, which is just the chain rule, can be summarized in the next two images. They were extremely helpful while I was figuring it out:

The forward pass calculates z as a function f(x, y) of the input variables ‘x’ and ‘y’. In the backward pass, the gradients of the loss function with respect to ‘x’ and ‘y’ are calculated by applying the chain rule to dL/dz, the gradient of the loss function with respect to z, which is received from the layer above.
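As a minimal sketch of this idea, assume the gate is a plain multiplication, f(x, y) = x * y. That choice and the names `forward`, `backward`, and `cache` are illustrative assumptions, not something taken from the figures.

```python
def forward(x, y):
    """Compute z = x * y and cache the inputs needed by the backward pass."""
    z = x * y
    cache = (x, y)
    return z, cache


def backward(dL_dz, cache):
    """Chain rule: dL/dx = dL/dz * dz/dx and dL/dy = dL/dz * dz/dy."""
    x, y = cache
    dL_dx = dL_dz * y  # dz/dx = y for a multiplication gate
    dL_dy = dL_dz * x  # dz/dy = x for a multiplication gate
    return dL_dx, dL_dy


z, cache = forward(3.0, -4.0)
dL_dx, dL_dy = backward(2.0, cache)  # pretend dL/dz = 2.0 arrives from above
print(z, dL_dx, dL_dy)  # -12.0 -8.0 6.0
```

The gate never needs to know what the loss is: it only multiplies the incoming gradient by its own local derivatives.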

Goal

Our goal is to find out how the gradient propagates backward through a convolutional layer. In backpropagation, that means computing db, dx, and dw from dL/dZ using the chain rule!

The forward pass is defined like this:

The input consists of n data points, each with c channels, height h, and width w. We convolve each input with f different filters, where each filter spans all c channels and has height kh and width kw.

Input:

  • x: Input data of shape (n, h, w, c)
  • w: Filter weights of shape (f, kh, kw, c)
  • ‘stride’: The number of pixels between adjacent receptive fields in the horizontal and vertical directions.
  • ‘pad’: The number of pixels that will be used to zero-pad the input.

During padding, ‘pad’ zeros should be placed symmetrically (i.e., equally on both sides) along the height and width axes of the input.
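Here is a small sketch of symmetric zero-padding with NumPy, together with the usual output-size formula. The helper name `conv_output_size` is hypothetical and introduced only for illustration.

```python
import numpy as np


def conv_output_size(size, kernel, pad, stride):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * pad - kernel) // stride + 1


x = np.arange(2 * 3 * 3 * 1, dtype=float).reshape(2, 3, 3, 1)  # (n, h, w, c)
pad = 1

# Pad only the height and width axes, the same amount on both sides
x_padded = np.pad(x, [(0, 0), (pad, pad), (pad, pad), (0, 0)],
                  mode="constant", constant_values=0)

print(x_padded.shape)                # (2, 5, 5, 1)
print(conv_output_size(7, 3, 0, 1))  # 5
print(conv_output_size(7, 3, 1, 2))  # 4
```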

The following convolution operation takes an input X of size 7×7 and a single filter W of size 3×3, with no padding and stride = 1, and generates an output H of size 5×5. Note that during the forward pass we cache the input X and the filter W, so that each output pixel can be traced back to the patch of X and the kernel that produced it. Here the convolution is performed without flipping the filter (strictly speaking, a cross-correlation).
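The 7×7 case can be sketched for a single channel as follows. The function name `conv2d_valid` and the random test data are assumptions made for this example.

```python
import numpy as np


def conv2d_valid(X, W):
    """Slide W over X without flipping it, with stride 1 and no padding."""
    k_h, k_w = W.shape
    out_h = X.shape[0] - k_h + 1
    out_w = X.shape[1] - k_w + 1
    H = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output pixel is the sum of a patch of X times the filter
            H[i, j] = np.sum(X[i:i + k_h, j:j + k_w] * W)
    cache = (X, W)  # kept for the backward pass
    return H, cache


rng = np.random.default_rng(0)
X = rng.standard_normal((7, 7))
W = rng.standard_normal((3, 3))
H, cache = conv2d_valid(X, W)
print(H.shape)  # (5, 5)
```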

The backpropagation:

We assume that we receive dh as input (from the backward pass of the next layer). It is important to understand that the dx computed by this layer becomes the dh input for the backward pass of the previous layer. Any change to a weight in the filter affects all the output pixels, because each filter weight contributes to every pixel of the output map. So how do we get each derivative?

db
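The bias of a filter is added to every output pixel of that filter's channel, so its gradient is the upstream gradient summed over the batch and the spatial positions. A minimal sketch, assuming the upstream gradient is called `dZ` and has shape (m, h_new, w_new, c_new) as in the code further down:

```python
import numpy as np

rng = np.random.default_rng(0)
dZ = rng.standard_normal((4, 5, 5, 2))  # upstream gradient (m, h_new, w_new, c_new)

# One bias per output channel: sum over examples, rows, and columns
db = np.sum(dZ, axis=(0, 1, 2), keepdims=True)
print(db.shape)  # (1, 1, 1, 2)
```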

dw

Notice that dw is a convolution of the input x with dy (the gradient received from above, called dh earlier) acting as the filter. Let’s see if this still holds with an added dimension.
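For the single-channel 7×7 case, this claim can be checked numerically. The sketch below assumes stride 1 and no padding, and uses the scalar loss L = sum(y * dy) so that the gradient of L with respect to the output y is exactly dy. The helper `correlate_valid` is a hypothetical name.

```python
import numpy as np


def correlate_valid(X, K):
    """Slide K over X without flipping it, with stride 1 and no padding."""
    k_h, k_w = K.shape
    out = np.zeros((X.shape[0] - k_h + 1, X.shape[1] - k_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(X[i:i + k_h, j:j + k_w] * K)
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((7, 7))
w = rng.standard_normal((3, 3))
dy = rng.standard_normal((5, 5))  # upstream gradient, same shape as the output

# Analytic gradient: slide dy over x as if dy were the filter
dw = correlate_valid(x, dy)  # shape (3, 3)

# Numerical gradient by central differences on each filter weight
eps = 1e-6
dw_num = np.zeros_like(w)
for r in range(3):
    for s in range(3):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[r, s] += eps
        w_minus[r, s] -= eps
        loss_plus = np.sum(correlate_valid(x, w_plus) * dy)
        loss_minus = np.sum(correlate_valid(x, w_minus) * dy)
        dw_num[r, s] = (loss_plus - loss_minus) / (2 * eps)

print(np.allclose(dw, dw_num, atol=1e-6))  # True
```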

dx

Notice that dx is a convolution of the filter w with dy: a “full” convolution in which dy is zero-padded and the filter is rotated by 180°. Let’s see if this still holds with an added dimension.
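The same kind of check works for dx in the single-channel case, again assuming stride 1 and no padding in the forward pass. The reference value scatters each dy[i, j] * w back onto the input window that produced y[i, j], which is the accumulation used in the backward code further down. The helper `correlate_valid` is a hypothetical name.

```python
import numpy as np


def correlate_valid(X, K):
    """Slide K over X without flipping it, with stride 1 and no padding."""
    k_h, k_w = K.shape
    out = np.zeros((X.shape[0] - k_h + 1, X.shape[1] - k_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(X[i:i + k_h, j:j + k_w] * K)
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((7, 7))
w = rng.standard_normal((3, 3))
dy = rng.standard_normal((5, 5))  # upstream gradient, same shape as the output

# Full convolution: zero-pad dy by (kernel size - 1) and slide the rotated filter
dy_padded = np.pad(dy, 2)               # shape (9, 9)
w_rot = w[::-1, ::-1]                   # filter rotated by 180 degrees
dx = correlate_valid(dy_padded, w_rot)  # shape (7, 7)

# Reference: scatter each upstream value back onto its receptive field
dx_ref = np.zeros_like(x)
for i in range(5):
    for j in range(5):
        dx_ref[i:i + 3, j:j + 3] += dy[i, j] * w

print(np.allclose(dx, dx_ref))  # True
```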

Derivative computation (backward pass), since a picture is worth a thousand words

Back propagation illustration from the article Back Propagation in Convolutional Neural Networks — Intuition and Code, Mayank Agarwal.

Let’s code!!!

#!/usr/bin/env python3
"""Convolutional Neural Networks"""
import matplotlib.pyplot as plt
import numpy as np


def conv_forward(A_prev, W, b, activation, padding="same", stride=(1, 1)):
    """Forward propagation over a convolutional layer.

    Args:
        A_prev: output of the previous layer (m, h_prev, w_prev, c_prev)
        W: kernels for the convolution (kh, kw, c_prev, c_new)
        b: biases (1, 1, 1, c_new)
        activation: activation function applied to the convolution output
        padding: string, 'same' or 'valid'
        stride: tuple (sh, sw)

    Returns: the activated output, np.ndarray (m, out_h, out_w, c_new)
    """
    m, h_prev, w_prev, c_prev = A_prev.shape
    k_h, k_w, c_prev, c_new = W.shape
    s_h, s_w = stride

    if padding == 'valid':
        p_h, p_w = 0, 0
    elif padding == 'same':
        # Symmetric padding chosen so that the output keeps the input size
        p_h = int(np.ceil((s_h * h_prev - s_h + k_h - h_prev) / 2))
        p_w = int(np.ceil((s_w * w_prev - s_w + k_w - w_prev) / 2))
    else:
        raise ValueError("padding must be 'same' or 'valid'")

    # Zero-pad only the height and width axes
    A_prev = np.pad(A_prev, [(0, 0), (p_h, p_h), (p_w, p_w), (0, 0)],
                    mode='constant', constant_values=0)

    out_h = (h_prev - k_h + 2 * p_h) // s_h + 1
    out_w = (w_prev - k_w + 2 * p_w) // s_w + 1
    output_conv = np.zeros((m, out_h, out_w, c_new))

    for i in range(out_h):
        for j in range(out_w):
            # Receptive field of output pixel (i, j) for all examples at once
            window = A_prev[:, i * s_h:i * s_h + k_h,
                            j * s_w:j * s_w + k_w, :]
            for f in range(c_new):
                # Weighted sum over the window plus the bias, then activation
                z = np.sum(window * W[:, :, :, f], axis=(1, 2, 3))
                output_conv[:, i, j, f] = activation(z + b[0, 0, 0, f])

    return output_conv


if __name__ == "__main__":
    np.random.seed(0)
    lib = np.load('../data/MNIST.npz')
    X_train = lib['X_train']
    m, h, w = X_train.shape
    X_train_c = X_train.reshape((-1, h, w, 1))

    W = np.random.randn(3, 3, 1, 2)
    b = np.random.randn(1, 1, 1, 2)

    def relu(Z):
        return np.maximum(Z, 0)

    plt.imshow(X_train[0])
    plt.show()
    A = conv_forward(X_train_c, W, b, relu, padding='valid')
    print(A.shape)
    plt.imshow(A[0, :, :, 0])
    plt.show()
    plt.imshow(A[0, :, :, 1])
    plt.show()

Backpropagation over a convolutional layer of a neural network:

#!/usr/bin/env python3
"""Convolutional Neural Networks"""
import numpy as np


def conv_backward(dZ, A_prev, W, b, padding="same", stride=(1, 1)):
    """Backpropagation over a convolutional layer.

    Args:
        dZ: partial derivatives of the loss with respect to the layer
            output (m, h_new, w_new, c_new)
        A_prev: output of the previous layer (m, h_prev, w_prev, c_prev)
        W: kernels for the convolution (kh, kw, c_prev, c_new)
        b: biases (1, 1, 1, c_new)
        padding: string, 'same' or 'valid'
        stride: tuple (sh, sw)

    Returns: partial derivatives with respect to the previous layer
        (dA_prev), the kernels (dW), and the biases (db)
    """
    k_h, k_w, c_prev, c_new = W.shape
    _, h_new, w_new, c_new = dZ.shape
    m, h_x, w_x, c_prev = A_prev.shape
    s_h, s_w = stride

    if padding == 'valid':
        p_h, p_w = 0, 0
    elif padding == 'same':
        # Same padding as in the forward pass
        p_h = int(np.ceil((s_h * h_x - s_h + k_h - h_x) / 2))
        p_w = int(np.ceil((s_w * w_x - s_w + k_w - w_x) / 2))
    else:
        raise ValueError("padding must be 'same' or 'valid'")

    # Each bias reaches every output pixel of its channel
    db = np.sum(dZ, axis=(0, 1, 2), keepdims=True)

    x_padded = np.pad(A_prev, [(0, 0), (p_h, p_h), (p_w, p_w), (0, 0)],
                      mode='constant', constant_values=0)

    dW = np.zeros_like(W)
    dx = np.zeros(x_padded.shape)
    for i in range(m):
        for h in range(h_new):
            for w in range(w_new):
                # Window of the padded input that produced output (h, w)
                rows = slice(h * s_h, h * s_h + k_h)
                cols = slice(w * s_w, w * s_w + k_w)
                for f in range(c_new):
                    dx[i, rows, cols, :] += dZ[i, h, w, f] * W[:, :, :, f]
                    dW[:, :, :, f] += x_padded[i, rows, cols, :] * dZ[i, h, w, f]

    # Drop the gradient of the padding; explicit end indices also work
    # when the padding is 0, where a slice like p_h:-p_h would be empty
    dx = dx[:, p_h:p_h + h_x, p_w:p_w + w_x, :]

    return dx, dW, db



if __name__ == "__main__":
    np.random.seed(0)
    lib = np.load('../data/MNIST.npz')
    X_train = lib['X_train']
    _, h, w = X_train.shape
    X_train_c = X_train[:10].reshape((-1, h, w, 1))

    W = np.random.randn(3, 3, 1, 2)
    b = np.random.randn(1, 1, 1, 2)

    dZ = np.random.randn(10, h - 2, w - 2, 2)
    print(conv_backward(dZ, X_train_c, W, b, padding="valid"))

Other articles that you may find interesting:

Derivation of Backpropagation in Convolutional Neural Network (CNN)

Backpropagation in a convolutional layer

Understanding the backward pass through Batch Normalization Layer

Backpropagation in a Convolutional Neural Network

I hope this article helps you understand the intuition behind forward and backpropagation over a CNN. If you have any comments or fixes, please do not hesitate to contact me or send me an email.

You can find more projects and machine learning paper implementations on my GitHub.