Convolutional Neural Networks vs Fully Connected Neural Networks


If the Terminator had a CNN, the other Sarahs wouldn’t have died: image courtesy Adweek.com

I was reading up on the theory behind Convolutional Neural Networks (CNNs) and decided to write a short summary to serve as a general overview of CNNs. This article also highlights the main differences between CNNs and fully connected neural networks.

Convolutional neural networks are applied ubiquitously to a variety of learning problems and are particularly effective for image classification. In this post we will see what differentiates convolutional neural networks (CNNs) from fully connected neural networks, and why CNNs perform so well on image classification tasks.

First, let's look at the similarities. Both convolutional neural networks and fully connected neural networks have learnable weights and biases. In both networks, each neuron receives some input, performs a dot product, and follows it up with a non-linear function such as ReLU (Rectified Linear Unit).
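
As a minimal sketch of what a neuron in either kind of network computes, the NumPy snippet below (with made-up weights and inputs) takes a dot product, adds a bias, and applies ReLU:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def neuron(x, w, b):
    """A single neuron: dot product of inputs and weights, plus bias, then ReLU."""
    return relu(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 3.0])   # input values
w = np.array([0.8, 0.1, -0.4])   # learnable weights
b = 0.2                          # learnable bias
print(neuron(x, w, b))
```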

The main problem with fully connected layers:

When it comes to classifying images, say of size 64x64x3, every neuron in the first fully connected hidden layer needs 64x64x3 = 12,288 weights! The number is even bigger for images of size 225x225x3, where each neuron needs 151,875 weights. Networks with such a large number of parameters face several problems, e.g. slower training and a higher chance of overfitting.
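
A quick back-of-the-envelope calculation in plain Python makes the scale concrete (the hidden layer of 1,000 neurons is an assumption for illustration, not a figure from the article):

```python
# Weights needed by ONE neuron in the first fully connected hidden layer
small = 64 * 64 * 3        # 12,288 weights per neuron
large = 225 * 225 * 3      # 151,875 weights per neuron

# With, say, 1,000 neurons in that hidden layer, the weight count explodes:
print(small * 1000)        # 12,288,000 weights
print(large * 1000)        # 151,875,000 weights
```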

The main functional difference of a convolutional neural network is that the image matrix is successively reduced to matrices of lower dimension through an operation called convolution, instead of being flattened up front. For example, an image of 64x64x3 can eventually be reduced to 1x1x10, on which the subsequent operations (such as the final classification) are performed.
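
To see why this saves parameters, consider a convolutional layer with ten 3x3 filters operating on the same 64x64x3 image (the filter size and count are illustrative assumptions). The filters are shared across every position of the image, so the weight count is tiny compared with a single fully connected neuron:

```python
# Convolutional layer: 10 filters of size 3x3 spanning 3 input channels
conv_weights = 10 * (3 * 3 * 3)       # 270 weights (plus 10 biases), reused across the whole image
fc_weights_per_neuron = 64 * 64 * 3   # 12,288 weights for ONE fully connected neuron
print(conv_weights, fc_weights_per_neuron)
```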

A CNN usually consists of the following components:

  • Input layer — a single raw image is given as an input. For an RGB image its dimensions will be AxBx3, where 3 represents the colour channels Red, Green and Blue.
  • A convolution layer – a convolution layer uses a filter (kernel), a matrix of smaller dimension than the input matrix. The filter slides over the input and performs a convolution operation with the small part of the input matrix having the same dimensions; the sum of the products of the corresponding elements is the output of this layer. (A small sketch of the convolution and maxpool operations follows this list.)
  • ReLU or Rectified Linear Unit — ReLU is mathematically expressed as max(0,x). It means that any number below 0 is converted to 0, while any positive number passes through unchanged.

A ReLU function: courtesy Wikipedia

  • Maxpool — Maxpool passes on the maximum value from within a small window of elements of the incoming matrix to the output. The pooling window is usually a square matrix.

A Maxpool function: courtesy ResearchGate.net

  • Fully connected layer — the final layer is a normal fully connected neural network layer, which produces the output (e.g. the class scores).
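
Here is a minimal NumPy sketch of the convolution and maxpool operations described above, using a made-up 5x5 input and a 2x2 filter (the values are arbitrary and only meant to show the mechanics):

```python
import numpy as np

image = np.array([[1, 2, 0, 1, 2],
                  [3, 1, 1, 0, 1],
                  [0, 2, 4, 1, 0],
                  [1, 0, 2, 3, 1],
                  [2, 1, 0, 1, 2]], dtype=float)

kernel = np.array([[1, 0],
                   [0, 1]], dtype=float)   # a 2x2 filter

def convolve2d(x, k):
    """Slide the filter over x and sum the element-wise products (what CNN frameworks call convolution)."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def maxpool2d(x, size=2):
    """Non-overlapping max pooling with a square window: keep only the maximum of each window."""
    oh, ow = x.shape[0] // size, x.shape[1] // size
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.max(x[i * size:(i + 1) * size, j * size:(j + 1) * size])
    return out

feature_map = convolve2d(image, kernel)   # 5x5 input, 2x2 filter -> 4x4 feature map
pooled = maxpool2d(feature_map)           # 4x4 feature map -> 2x2 pooled output
print(feature_map)
print(pooled)
```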

Usually the convolution, ReLU and maxpool layers are repeated a number of times to form a network with multiple hidden layers, commonly known as a deep neural network.

A Convolutional Neural Network: courtesy MDPI.com
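
As a sketch of such a stack in PyTorch (the filter counts and layer sizes below are illustrative choices, not prescribed by the article), a tiny model that takes a 64x64x3 image down to 10 class scores might look like this:

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Convolution -> ReLU -> Maxpool, repeated twice
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 3x64x64 -> 16x64x64
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 16x64x64 -> 16x32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),   # 16x32x32 -> 32x32x32
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 32x32x32 -> 32x16x16
        )
        # Final fully connected layer producing the class scores
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = x.flatten(1)          # flatten the feature maps into one vector per image
        return self.classifier(x)

model = SmallCNN()
scores = model(torch.randn(1, 3, 64, 64))   # one random RGB image of size 64x64
print(scores.shape)                          # torch.Size([1, 10])
```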

Some well-known convolutional networks

  • LeNet — Developed by Yann LeCun to recognize handwritten digits, LeNet is the pioneering CNN.
  • AlexNet — Developed by Alex Krizhevsky, Ilya Sutskever and Geoff Hinton, AlexNet won the 2012 ImageNet challenge. It was the first CNN to win ImageNet and popularized deep stacks of convolution layers trained at a large scale.
  • GoogLeNet — Developed by Google, GoogLeNet won the 2014 ImageNet competition. Its main advantage over the other networks was that it required far fewer parameters to train, making it faster and less prone to overfitting.
  • VGGNet — This is another popular network, with its most popular version being VGG16. VGG16 has 16 layers that carry weights, i.e. its convolutional and fully connected layers.
  • ResNet — Developed by Kaiming He and colleagues, this network won the 2015 ImageNet competition. The two most popular variants of ResNet are ResNet50 and ResNet34. A more complex variation of ResNet is the ResNeXt architecture.
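
Most of these architectures are available off the shelf. As a small sketch (assuming torchvision is installed), the snippet below instantiates a few of them with randomly initialized weights and compares their parameter counts:

```python
import torchvision.models as models

# Untrained versions of some of the architectures above
# (pretrained ImageNet weights can also be downloaded if desired).
networks = {
    "AlexNet": models.alexnet(),
    "VGG16": models.vgg16(),
    "GoogLeNet": models.googlenet(),
    "ResNet50": models.resnet50(),
}

for name, net in networks.items():
    n_params = sum(p.numel() for p in net.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```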