Fully Connected vs Convolutional Neural Networks
For the fundamentals of CNNs, check out my earlier blogs on Convolutions and Pooling.
In this post, we will cover the differences between a fully connected neural network and a convolutional neural network (CNN). We will focus on the differences in model architecture and in the results obtained on the MNIST dataset.
Dataset Used
- The MNIST (Modified National Institute of Standards and Technology database) dataset contains 60,000 28×28 grayscale training images of the 10 digits, along with a test set of 10,000 images.
- It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
- It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.
Model Implementation
A) Using Fully Connected Neural Network Architecture
- Model Architecture
For the fully connected architecture, I used three hidden layers with the ‘relu’ activation function, in addition to the input and output layers.
- Model Summary
The total number of trainable parameters is around 0.3 million. In a fully connected layer with n inputs and m outputs, the number of weights is n*m. Each output node also has a bias, for a total of (n+1)*m parameters.
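The (n+1)*m rule can be checked with a few lines of Python. This is a minimal sketch: the layer sizes below (784 inputs from the flattened 28×28 image, hidden layers of 256, 128, and 64 units, and 10 outputs) are assumptions chosen for illustration and are not necessarily the sizes used in the model above.

```python
def dense_params(n_inputs: int, n_outputs: int) -> int:
    """Parameters in a fully connected layer: n*m weights plus m biases."""
    return (n_inputs + 1) * n_outputs


# Hypothetical layer sizes: flattened 28x28 input, three hidden layers, 10 classes.
layer_sizes = [28 * 28, 256, 128, 64, 10]

# Pair each layer's input size with its output size.
connections = list(zip(layer_sizes, layer_sizes[1:]))
per_layer = [dense_params(n, m) for n, m in connections]
total = sum(per_layer)

for (n, m), params in zip(connections, per_layer):
    print(f"{n:>4} -> {m:<4} {params:>8,} parameters")
print(f"total        {total:>8,} parameters")
```

With these assumed sizes, the first layer alone holds more than 80% of the parameters, because each of the 784 pixels is connected to every unit in the first hidden layer.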
- Model Accuracy
After training the fully connected model for five epochs with a batch size of 128 and a validation split of 0.3, we got a training accuracy of 98.6% and a validation accuracy of 96.07%. Moreover, after the 2nd epoch, the training and validation accuracy curves start to diverge.
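To make the training setup concrete, the arithmetic behind these settings can be sketched as follows. This assumes the split behaves like the `validation_split` argument of Keras's `fit`, which holds out that fraction of the training data for validation; the variable names are illustrative only.

```python
import math

n_train_images = 60_000   # size of the MNIST training set
validation_split = 0.3    # fraction held out for validation
batch_size = 128
epochs = 5

# Images held out for validation vs. images actually used for weight updates.
n_val = round(n_train_images * validation_split)
n_fit = n_train_images - n_val

# The last batch of each epoch may be smaller than batch_size, hence the ceiling.
steps_per_epoch = math.ceil(n_fit / batch_size)
total_updates = steps_per_epoch * epochs

print(f"fit on {n_fit:,} images, validate on {n_val:,}")
print(f"{steps_per_epoch} batches per epoch, {total_updates:,} weight updates in total")
```

Under this assumption, the model is fitted on 42,000 images and validated on the remaining 18,000 after each epoch.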
- Accuracy on Test data
On the test data of 10,000 images, the accuracy of the fully connected neural network is 96%.
B) Using Convolutional Neural Network Architecture
- Model Architecture
For the convolutional neural network architecture, we added three convolutional layers with ‘relu’ activation and a max-pooling layer after the first convolutional layer.
- Model Summary
With the CNN, the differences you will notice in the summary are the output shapes and the number of parameters. Compared to the fully connected model, the total number of parameters is much lower, at around 0.1 million.
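Both quantities can be traced by hand. The sketch below walks a 28×28×1 input through a hypothetical stack that matches the description above (three 3×3 convolutions with a 2×2 max pool after the first), followed by a 10-way dense output layer. The filter counts (32, 64, 64), kernel sizes, lack of padding, and the single dense layer are assumptions for illustration, not the exact configuration used here.

```python
def output_size(size: int, kernel: int, stride: int = 1) -> int:
    """Spatial size after a convolution or pooling window with no padding."""
    return (size - kernel) // stride + 1


def conv_params(kernel: int, in_channels: int, out_channels: int) -> int:
    """Each filter has kernel*kernel*in_channels weights plus one bias."""
    return (kernel * kernel * in_channels + 1) * out_channels


# Hypothetical stack: (layer type, kernel size, number of filters).
layers = [("conv", 3, 32), ("pool", 2, None), ("conv", 3, 64), ("conv", 3, 64)]

size, channels = 28, 1   # MNIST input: 28x28 grayscale
total = 0
shapes = []
for kind, kernel, filters in layers:
    if kind == "conv":
        total += conv_params(kernel, channels, filters)
        size = output_size(size, kernel)
        channels = filters
    else:
        # Max pooling has no parameters; the stride equals the window size.
        size = output_size(size, kernel, stride=kernel)
    shapes.append((size, size, channels))
    print(f"{kind}: output {shapes[-1]}, running total {total:,} parameters")

# Flatten the final feature map and connect it to the 10 output classes.
flat = size * size * channels
total += (flat + 1) * 10
print(f"dense: {flat} -> 10, total {total:,} parameters")
```

Note that the convolutional layers reuse the same small filters at every image position, so their parameter counts depend on the kernel size and channel counts rather than on the 28×28 input size.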
- Model Accuracy
After training the CNN for five epochs with a batch size of 128 and a validation split of 0.3, we got a training accuracy of 99.19% and a validation accuracy of 99.63%. Moreover, unlike with the fully connected model, the training and validation accuracy curves stay close together.
- Accuracy on the Test dataset
On the test data of 10,000 images, the accuracy of the convolutional neural network is 98.9%.
Final Thoughts
Although fully connected networks make no assumptions about the input, they tend to perform worse and are not well suited to feature extraction. They also have more weights to train, which leads to longer training times. CNNs, on the other hand, learn to identify and extract the most useful features from the images for the problem at hand, with relatively few parameters to train.
Please find the relevant code used in this blog here. On similar lines, you can find the implementation of CNNs on the FMNIST dataset using PyTorch here.