Fully Connected vs Convolutional Neural Networks

To learn more about the fundamentals of CNNs, check out my earlier blogs on Convolutions and Pooling.

In this post, we will cover the differences between a fully connected neural network and a convolutional neural network (CNN). We will compare the two in terms of model architecture and the results obtained on the MNIST dataset.

Dataset Used

MNIST (Modified National Institute of Standards and Technology database) is a dataset of 60,000 28×28 grayscale training images of the 10 digits, along with a test set of 10,000 images.
It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.
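To give a feel for how little preprocessing is involved, here is a minimal NumPy sketch. It assumes the images arrive as a uint8 array of shape (N, 28, 28); the helper names and the random stand-in batch are my own illustration and are not taken from the code for this post.

```python
import numpy as np

def prepare_for_dense(images):
    """Scale uint8 pixels to [0, 1] and flatten each 28x28 image to 784 values."""
    x = images.astype("float32") / 255.0
    return x.reshape(len(x), -1)

def prepare_for_cnn(images):
    """Scale uint8 pixels to [0, 1] and add a trailing channel axis."""
    x = images.astype("float32") / 255.0
    return x[..., np.newaxis]

# Random stand-in for a small batch of MNIST images (hypothetical data).
rng = np.random.default_rng(0)
fake_batch = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

print(prepare_for_dense(fake_batch).shape)  # (4, 784)
print(prepare_for_cnn(fake_batch).shape)    # (4, 28, 28, 1)
```

The fully connected model sees each image as a flat vector, while the CNN keeps the 2D layout and only gains a channel dimension.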

Model Implementation

A) Using Fully Connected Neural Network Architecture

Model Architecture

For the fully connected architecture, I used three hidden layers with the ‘relu’ activation function, in addition to the input and output layers.

Model Summary

The total number of trainable parameters is around 0.3 million. In a fully connected layer with n inputs and m outputs, the number of weights is n*m. Each output node also has a bias, which gives a total of (n+1)*m parameters per layer.
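The (n+1)*m rule is easy to check by hand. The sketch below applies it layer by layer. The hidden layer widths (256, 256, 128) are hypothetical values I picked for illustration because they land near 0.3 million; they are not necessarily the widths used in the model above.

```python
def dense_params(n_inputs, n_outputs):
    """Weights (n_inputs * n_outputs) plus one bias per output node."""
    return (n_inputs + 1) * n_outputs

# Hypothetical widths: 784 input pixels, three hidden layers, 10 output classes.
layer_sizes = [784, 256, 256, 128, 10]

# Pair each layer with the next one and count the parameters between them.
per_layer = [dense_params(n, m) for n, m in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [200960, 65792, 32896, 1290]
print(total)      # 300938
```

Notice that the first layer alone accounts for about two thirds of the total, because every one of the 784 pixels is connected to every hidden unit.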

Model Accuracy

Training the fully connected model for five epochs with a batch size of 128 and a validation split of 0.3 gave a training accuracy of 98.6% and a validation accuracy of 96.07%. After the second epoch, the training and validation accuracy curves start to move apart, which suggests the model is beginning to overfit.
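To make the training setup concrete, here is the arithmetic behind a 0.3 validation split and a batch size of 128. This is a sketch that assumes the split is carved out of the 60,000 training images, as a Keras-style validation split would do; the function names are my own.

```python
import math

def split_sizes(n_samples, validation_split):
    """Return (n_train, n_val) when a fraction of the data is held out."""
    n_val = round(n_samples * validation_split)
    return n_samples - n_val, n_val

def steps_per_epoch(n_train, batch_size):
    """Number of weight updates per epoch; the last batch may be smaller."""
    return math.ceil(n_train / batch_size)

n_train, n_val = split_sizes(60_000, 0.3)
print(n_train, n_val)                 # 42000 18000
print(steps_per_epoch(n_train, 128))  # 329
```

So each epoch fits on 42,000 images in 329 batches, and the reported validation accuracy is measured on the remaining 18,000.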

Accuracy on Test data

On the test data of 10,000 images, the fully connected neural network reaches an accuracy of 96%.

B) Using Convolutional Neural Network Architecture

Model Architecture

For the convolutional neural network architecture, we added three convolutional layers with ‘relu’ activation and a max pooling layer after the first convolutional layer.
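To see how such a stack changes the spatial size of a 28×28 image, the sketch below traces the feature map size through the layers. The kernel size (3×3), pool size (2×2), stride of 1, and absence of padding are assumptions on my part, since the exact values are not stated here.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_output_size(size, pool):
    """Spatial size after non-overlapping max pooling without padding."""
    return (size - pool) // pool + 1

trace = [28]                                            # input height/width
trace.append(conv_output_size(trace[-1], kernel=3))     # first conv
trace.append(pool_output_size(trace[-1], pool=2))       # max pool
trace.append(conv_output_size(trace[-1], kernel=3))     # second conv
trace.append(conv_output_size(trace[-1], kernel=3))     # third conv

print(trace)  # [28, 26, 13, 11, 9]
```

Under these assumptions the image shrinks from 28×28 to 9×9 before it reaches the classifier, with the pooling layer responsible for the largest drop.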

Model Summary

With the CNN, the differences you will notice in the summary are the output shapes and the number of parameters. Compared to the fully connected model, the total number of parameters is much lower: around 0.1 million.
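The parameter savings come from weight sharing: a convolutional layer has (kernel_height * kernel_width * input_channels + 1) * filters parameters, regardless of the image size. Here is a rough count under assumed settings. The 3×3 kernels, the filter counts (32, 64, 64), and the 9×9 final feature map are hypothetical choices that happen to land near 0.1 million; they are not confirmed values from the model above.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, out_channels):
    """One kernel_h x kernel_w x in_channels filter plus a bias, per output channel."""
    return (kernel_h * kernel_w * in_channels + 1) * out_channels

def dense_params(n_inputs, n_outputs):
    """Weights plus one bias per output node."""
    return (n_inputs + 1) * n_outputs

# Hypothetical (in_channels, out_channels) for three 3x3 conv layers.
conv_layers = [(1, 32), (32, 64), (64, 64)]
conv_total = sum(conv2d_params(3, 3, c_in, c_out) for c_in, c_out in conv_layers)

# Assumed 9x9 feature map with 64 channels, flattened into a 10-class output layer.
classifier = dense_params(9 * 9 * 64, 10)
cnn_total = conv_total + classifier

print(conv_total)  # 55744
print(classifier)  # 51850
print(cnn_total)   # 107594

# For comparison: a dense layer mapping all 784 pixels to 32 units.
print(conv2d_params(3, 3, 1, 32), dense_params(784, 32))  # 320 25120
```

The first convolutional layer needs only 320 parameters to produce 32 feature maps, whereas a dense layer with 32 units on the raw pixels would need 25,120.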

Model Accuracy

Training the CNN for five epochs with a batch size of 128 and a validation split of 0.3 gave a training accuracy of 99.19% and a validation accuracy of 99.63%. Unlike the fully connected model, the training and validation accuracy curves stay close together.

Accuracy on the Test dataset

On the test data of 10,000 images, the convolutional neural network reaches an accuracy of 98.9%.

Final Thoughts

Fully connected networks make no assumptions about the structure of the input, but on image data they tend to perform worse and are less effective at feature extraction. They also have more weights to train, which increases training time. CNNs, on the other hand, learn to identify and extract the most useful features from the images for the problem at hand, with relatively fewer parameters to train.

You can find the code used in this blog here. Along similar lines, you can find an implementation of CNNs on the FMNIST dataset using PyTorch here.