Visualizing How Filters Work in Convolutional Neural Networks (CNNs)


In deep learning, a Convolutional Neural Network (CNN) is a special type of neural network designed to process grid-like data, such as the pixel arrays of an image, through multiple layers. A CNN is well suited for applications like image recognition, and in particular is often used in face recognition software.

In a CNN, convolutional layers are the fundamental building blocks that make all the magic happen. In a typical image recognition application, a convolutional layer is made up of several filters that detect the various features of an image. How this works is best illustrated with an analogy.

Suppose you see someone walking towards you from a distance. From afar, your eyes try to detect the edges of the figure, and you try to differentiate that figure from other objects, such as buildings or cars. As the person walks closer, you focus on the shape of the person, trying to deduce if the person is male or female, slim or fat, etc. As the person gets nearer still, your focus shifts towards other features of that person, such as facial features or whether the person is wearing glasses. In general, your focus shifts from broad features to specific features.

Likewise, in a CNN, you have several layers containing various filters (or kernels, as they are commonly called) in charge of detecting specific features of the target you are trying to detect. The early layers try to focus on broad features, while the later layers try to detect very specific features.

In a CNN, the values for the various filters in each convolutional layer are obtained by training on a particular training set. At the end of the training, you have a unique set of filter values that are used for detecting specific features in the dataset. You then apply this set of filter values to new images to predict what each image contains.

One of the challenges in teaching CNNs to beginners is explaining how the filters work. Students often have difficulty visualising (pun not intended) the use of the filters. It is with this goal in mind that I set out to write this article. I hope that by the end of it you will have a much better understanding of how filters work in a CNN.

Getting Our Data

One of the classic examples in deep learning is the MNIST dataset, and I will use it in our example.

The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems.

Using TensorFlow, you can load the MNIST data as follows:

from tensorflow.keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

What I will do now is to use a specific item from this dataset, extract its data and then save it into a CSV file. The following code snippet does that.

item = 66 # index of the digit to load
data = X_train[item] # data is a 2D array

# get the rows and columns of the data
rows = data.shape[0]
columns = data.shape[1]

# used to store all the numbers of the digits
lines = ''

# convert all the cells into lines of values separated by commas
for r in range(rows):
    print(data[r])
    lines += ','.join([f'{i}' for i in data[r]]) + "\n"

# write the lines to a csv file
with open('mnist.csv','w') as file:
    file.write(lines)

If you open up the saved mnist.csv file, you will see the following:

A better way to visualize it is to open it using Excel:

You can now see quite vividly that this dataset represents the digit “2”.
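If you do not have Excel at hand, a rough text rendering can serve the same purpose. The following is only a minimal sketch: the `render` function, its `threshold` parameter, and the tiny 5×5 `sample` grid are hypothetical names and values introduced for illustration, not part of the original walkthrough.

```python
def render(grid, threshold=128):
    """Return a text picture of a grid of 0-255 pixel values.

    Cells at or above the threshold (the bright strokes of the digit)
    are drawn as '#', everything else as '.'.
    """
    return "\n".join(
        "".join("#" if value >= threshold else "." for value in row)
        for row in grid
    )

# Hypothetical 5x5 stand-in for the 28x28 digit data
sample = [
    [0,   0,   0,   0, 0],
    [0, 255, 255, 255, 0],
    [0,   0,   0, 255, 0],
    [0,   0, 255,   0, 0],
    [0, 255, 255, 255, 0],
]

picture = render(sample)
print(picture)
```

With the MNIST data loaded earlier, calling `print(render(X_train[66]))` should give a comparable picture of the full 28×28 digit.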

Applying Filters to an Image

To visualize how filters work, let’s use Excel and create a new worksheet.

I have made the final spreadsheet available for download at: https://bit.ly/2QVLnSS.

First, create a 28×28 grid with the following values (we shall use the data for the MNIST digit later on; for now I am going to show you something easier to understand):

Assume each value in the 28×28 grid represents a color (with 255 representing white and 0 representing black).

Next, create another 28×28 grid, where its values are obtained by dividing each value from the first grid by 255:
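The same normalization step can be sketched in Python. This is only an illustration under an assumption: the first grid is taken to be a white image (255) with a black vertical band (0), and the band's position (columns 10 to 17) is made up for the example rather than read from the spreadsheet.

```python
SIZE = 28

# Assumption: white (255) everywhere, except a black (0) band in columns 10-17
image = [[0 if 10 <= c < 18 else 255 for c in range(SIZE)] for r in range(SIZE)]

# Divide every value by 255 so that each cell lies between 0.0 and 1.0
normalized = [[value / 255 for value in row] for row in image]

# Three cells around the left border of the band
print(normalized[0][9:12])  # [1.0, 0.0, 0.0]
```

After this step, white is 1.0 and black is 0.0, which keeps the numbers produced by the filter small and easy to read.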

Next, we create a 3×3 grid representing the filter (kernel):

The filter represents the kind of pattern that we are looking for in the image, where 1 represents white and -1 represents black. In the filter above, I am looking for a vertical edge in the image where the color changes from white to black, like this:

Applying the filter to the grid is simply a matter of multiplying each value in the filter with the corresponding value in the grid:

Each value in the filter is multiplied by the corresponding value in the grid, and the products are then summed up. The result is the value of the filter applied on the image; its decimal part is then truncated.
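The multiply-and-sum step for a single position can be sketched as follows. The filter values here are an assumption (1 in the left column, 0 in the middle, -1 in the right column), chosen to match the description of a vertical white-to-black edge; the `filter_response` helper and the sample patches are hypothetical.

```python
# Assumed 3x3 filter for a vertical edge: white (1) on the left, black (-1) on the right
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

# A 3x3 patch of normalized pixels: white (1.0) on the left, black (0.0) on the right
edge_patch = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
]

# A patch with no edge at all: plain white
flat_patch = [[1.0, 1.0, 1.0] for _ in range(3)]

def filter_response(patch, kernel):
    """Multiply each patch value by the matching kernel value and sum the products."""
    total = sum(patch[r][c] * kernel[r][c] for r in range(3) for c in range(3))
    return int(total)  # int() drops the decimal part (truncates toward zero)

print(filter_response(edge_patch, kernel))  # 3
print(filter_response(flat_patch, kernel))  # 0
```

A patch that matches the pattern gives a large positive value, while a uniform patch gives 0 because the positive and negative columns cancel out.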

The resultant grid is the feature map that we are looking for. From the raw values alone, it is not easy to see the significance of the feature map. So let’s add some color-coding to both our original image and the feature map so that we can see clearly what we are looking for.

To color the image grid, select the entire grid and then select Format | Conditional Formatting…:

In the Manage Rules window, click the + button at the bottom left of the dialog:

Set the colors as follows and then click OK twice:

The grid will now look like this:

Our image is simply a white image with a black rectangle in the middle.

Observe that there are two edges in the image: one from white to black and another from black to white.

Let’s now also color-code our feature map (the image with the filter applied):

You should now see the following:

The column that is white is what we are looking for, based on our filter (remember we are looking for edges with color that changes from white to black). If we now change the filter to look for a vertical edge that changes from black to white, then the output will look like this:

How about horizontal edges? Well, if you change the filter to look for a horizontal edge, you get nothing (which is not surprising):
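The three cases above can be reproduced with a short sketch. It rests on several assumptions: a small 8×8 normalized image stands in for the 28×28 grid, the black area is a vertical band running the full height of the image (consistent with there being only two edges), the filter values are guesses at the ones in the spreadsheet, and the filter is applied without padding, so the feature map is two cells smaller in each direction. The `apply_filter` function is a hypothetical helper.

```python
def apply_filter(image, kernel):
    """Slide a 3x3 kernel over the image (no padding) and return the feature map."""
    rows, cols = len(image), len(image[0])
    feature_map = []
    for r in range(rows - 2):
        out_row = []
        for c in range(cols - 2):
            # Multiply the 3x3 window by the kernel and sum the products
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(3) for j in range(3)
            )
            out_row.append(int(total))  # truncate the decimal part
        feature_map.append(out_row)
    return feature_map

# Assumption: white (1) image with a black (0) vertical band in columns 3-4
image = [[0 if 3 <= c < 5 else 1 for c in range(8)] for r in range(8)]

# Assumed filter values for each kind of edge
white_to_black = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
black_to_white = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
horizontal = [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]

print(apply_filter(image, white_to_black)[0])  # [0, 3, 3, -3, -3, 0]
print(apply_filter(image, black_to_white)[0])  # [0, -3, -3, 3, 3, 0]
print(apply_filter(image, horizontal)[0])      # [0, 0, 0, 0, 0, 0]
```

The positive values mark the edge each filter is looking for, the negative values mark the opposite edge, and the horizontal filter returns only zeros because every row of this image is identical.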

You are now ready to see how the filter works on the MNIST dataset. Paste the data that you extracted in the previous section (the digit “2”) into the worksheet:

The normalized and color-coded image will now look like this:

If you want to detect horizontal edges, use the following filter:

Now all the horizontal edges in the image are highlighted:

Using another example (for the digit “6”):

You can detect all the vertical edges like this:

You can also detect horizontal edges:

The combination of various filters in each convolutional layer is what makes the prediction possible in a CNN.

Try It Out Yourself

The best way to understand how filters work is to try it out yourself. Download the spreadsheet at https://bit.ly/2QVLnSS and play with different filter values, such as these:

Also, try out different digits in the MNIST dataset. And while you are at it, try out using some other images besides the digits!