Basic CNN Architecture: Explaining 5 Layers of Convolutional Neural Network | upGrad blog


Introduction

In the last few years, there has been a huge demand in the IT industry for one particular skill set known as Deep Learning. Deep Learning is a subset of Machine Learning consisting of algorithms that are inspired by the functioning of the human brain, or neural networks.

Check out our free data science courses to get an edge over the competition.

These structures are called Neural Networks. They teach computers to do what comes naturally to humans. In Deep Learning, there are several types of models such as Artificial Neural Networks (ANN), Autoencoders, Recurrent Neural Networks (RNN) and Reinforcement Learning. But one particular model has contributed a lot to the field of computer vision and image analysis: the Convolutional Neural Network (CNN), or ConvNet.

CNN is very useful as it minimises human effort by automatically detecting features. For example, given images of apples and mangoes, it would detect the distinct features of each class on its own.

You can also consider doing our Python Bootcamp course from upGrad to upskill your career.

CNNs are a class of Deep Neural Networks that can recognize and classify particular features from images and are widely used for analyzing visual imagery. Their applications include image and video recognition, image classification, medical image analysis, computer vision and natural language processing.

CNN has high accuracy, and because of this, it is useful in image recognition. Image recognition has a wide range of uses in industries such as medical image analysis, phones, security, recommendation systems, etc.

The term ‘Convolution’ in CNN denotes the mathematical function of convolution, a special kind of linear operation wherein two functions are multiplied to produce a third function that expresses how the shape of one function is modified by the other. In simple terms, two images, which can be represented as matrices, are multiplied to give an output that is used to extract features from the image.
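To make the sliding-window multiplication concrete, below is a minimal NumPy sketch (not from the original article; the 4x4 image and 2x2 filter values are made up purely for illustration):

import numpy as np

# A toy 4x4 "image" and a 2x2 filter (made-up values for illustration).
image = np.array([[1, 2, 0, 1],
                  [3, 1, 1, 0],
                  [0, 2, 2, 1],
                  [1, 0, 1, 3]])
kernel = np.array([[1, 0],
                   [0, -1]])

# Slide the filter over the image; at each position, multiply element-wise
# and sum to produce one entry of the output (the feature map).
out_h = image.shape[0] - kernel.shape[0] + 1
out_w = image.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        window = image[i:i + kernel.shape[0], j:j + kernel.shape[1]]
        feature_map[i, j] = np.sum(window * kernel)

print(feature_map)  # a 3x3 feature map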

Learn Machine Learning online from the World’s top Universities – Masters, Executive Post Graduate Programs, and Advanced Certificate Program in ML & AI to fast-track your career.

Basic Architecture

There are two main parts to a CNN architecture:

  • A convolution tool that separates and identifies the various features of the image for analysis, in a process called feature extraction.

  • The feature extraction network consists of many pairs of convolutional and pooling layers.

  • A fully connected layer that utilizes the output from the convolution process and predicts the class of the image based on the features extracted in previous stages.

  • This CNN model of feature extraction aims to reduce the number of features present in a dataset. It creates new features that summarise the features contained in the original set. There are many CNN layers, as shown in the CNN architecture diagram (a minimal Keras sketch of this layout follows the list below).
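As a rough illustration (a minimal Keras sketch, not taken from the original article; the filter counts, kernel sizes and the 32x32 grayscale input are assumptions made for demonstration), the two parts look like this when stacked:

from tensorflow import keras
from tensorflow.keras import layers

# Part 1 - feature extraction: pairs of convolution + pooling layers.
# Part 2 - classification: flatten followed by fully connected (Dense) layers.
model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax'),  # one probability per class
])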


Featured Program for you: Fullstack Development Bootcamp Course

Convolution Layers 

There are three types of layers that make up the CNN: the convolutional layers, pooling layers, and fully connected (FC) layers. When these layers are stacked, a CNN architecture is formed. In addition to these three layers, there are two other important components, the dropout layer and the activation function, which are defined below.

Good Read: Introduction to Deep Learning & Neural Networks

1. Convolutional Layer

This layer is the first layer that is used to extract the various features from the input images. In this layer, the mathematical operation of convolution is performed between the input image and a filter of a particular size MxM. By sliding the filter over the input image, the dot product is taken between the filter and the parts of the input image with respect to the size of the filter (MxM).

The output is termed the feature map, which gives us information about the image such as corners and edges. Later, this feature map is fed to other layers to learn several other features of the input image.

The convolution layer in a CNN passes the result to the next layer once the convolution operation has been applied to the input. A key benefit of convolutional layers is that they keep the spatial relationship between pixels intact.
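As a concrete example (a minimal Keras sketch with assumed values: 32 filters of size 3x3 applied to a single dummy 28x28 grayscale image):

import tensorflow as tf
from tensorflow.keras import layers

# 32 filters of size 3x3 slide over a 28x28 grayscale input;
# each filter produces one channel of the output feature map.
conv = layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu')
x = tf.random.normal((1, 28, 28, 1))   # one dummy image
feature_maps = conv(x)
print(feature_maps.shape)              # (1, 26, 26, 32) with the default 'valid' padding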

2. Pooling Layer

In most cases, a Convolutional Layer is followed by a Pooling Layer. The primary aim of this layer is to decrease the size of the convolved feature map to reduce computational costs. This is done by decreasing the connections between layers, and the pooling operation works independently on each feature map. Depending upon the method used, there are several types of pooling operations. Pooling basically summarises the features generated by a convolution layer.

In Max Pooling, the largest element is taken from the feature map. Average Pooling calculates the average of the elements in a predefined-size image section. The total sum of the elements in the predefined section is computed in Sum Pooling. The Pooling Layer usually serves as a bridge between the Convolutional Layer and the FC Layer.

This layer generalises the features extracted by the convolution layer and helps the network to recognise the features independently. With this, the computation in the network is also reduced.
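For instance (a minimal Keras sketch; the 26x26x32 input shape is just an assumed output of a preceding convolution layer):

import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 26, 26, 32))   # dummy feature maps from a conv layer
max_pool = layers.MaxPooling2D(pool_size=(2, 2), strides=2)
avg_pool = layers.AveragePooling2D(pool_size=(2, 2), strides=2)
print(max_pool(x).shape)  # (1, 13, 13, 32): each 2x2 block replaced by its maximum
print(avg_pool(x).shape)  # (1, 13, 13, 32): each 2x2 block replaced by its average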

Must Read: Neural Network Project Ideas

3. Fully Connected Layer

The Fully Connected (FC) layer consists of the weights and biases along with the neurons and is used to connect the neurons between two different layers. These layers are usually placed before the output layer and form the last few layers of a CNN Architecture.

Here, the output of the previous layers is flattened and fed to the FC layer. The flattened vector then passes through a few more FC layers, where the usual mathematical operations take place. At this stage, the classification process begins. Two fully connected layers are used because they tend to perform better than a single one. These layers in a CNN reduce the need for human supervision.
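As a small illustration (a minimal Keras sketch; the 5x5x16 input and the layer sizes are assumed values, chosen to mirror the LeNet-5 example discussed later):

import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 5, 5, 16))            # dummy pooled feature maps
x = layers.Flatten()(x)                        # -> shape (1, 400)
x = layers.Dense(120, activation='relu')(x)    # first fully connected layer
x = layers.Dense(10, activation='softmax')(x)  # class probabilities
print(x.shape)                                 # (1, 10)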

4. Dropout

Usually, when all the features are connected to the FC layer, the model can overfit the training dataset. Overfitting occurs when a model works so well on the training data that it performs poorly when used on new data.

To overcome this problem, a dropout layer is utilised, wherein a few neurons are dropped from the neural network during the training process, resulting in a reduced model size. With a dropout rate of 0.3, 30% of the nodes are dropped out randomly from the neural network.

Dropout improves the performance of a machine learning model, as it prevents overfitting by making the network simpler. It drops neurons from the neural network only during training.
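In Keras, for example (a minimal sketch; the inputs are dummy values):

import tensorflow as tf
from tensorflow.keras import layers

dropout = layers.Dropout(rate=0.3)   # 30% of activations are zeroed during training
x = tf.ones((1, 10))
print(dropout(x, training=True))     # roughly 3 of the 10 values become 0; the rest are scaled up by 1/0.7
print(dropout(x, training=False))    # inputs pass through unchanged at inference time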

Must Read: Free deep learning course!

5. Activation Functions

Finally, one of the most important components of the CNN model is the activation function. Activation functions are used to learn and approximate any kind of continuous and complex relationship between variables of the network. In simple words, an activation function decides which information should be passed forward through the network and which should not.

It adds non-linearity to the network. There are several commonly used activation functions such as ReLU, Softmax, tanh and Sigmoid. Each of these functions has a specific usage. For a binary classification CNN model, sigmoid and softmax functions are preferred, and for multi-class classification, softmax is generally used. In simple terms, activation functions in a CNN model determine whether a neuron should be activated or not. They decide whether the input to the network is important or not for making a prediction, using mathematical operations.
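Below is a minimal NumPy sketch of three of these functions (the input scores are made-up values for illustration):

import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, -1.0])
print(relu(scores))      # [2. 1. 0.]
print(sigmoid(scores))   # each value squashed into (0, 1)
print(softmax(scores))   # probabilities that sum to 1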

Best Machine Learning Courses & AI Courses Online

Master of Science in Machine Learning & AI from LJMU
Executive Post Graduate Programme in Machine Learning & AI from IIITB
Advanced Certificate Programme in Machine Learning & NLP from IIITB
Advanced Certificate Programme in Machine Learning & Deep Learning from IIITB
Executive Post Graduate Program in Data Science & Machine Learning from University of Maryland

To Explore all our courses, visit our page below.

Machine Learning Courses

LeNet-5 CNN Architecture 

In 1998, the LeNet-5 architecture was introduced in a research paper titled “Gradient-Based Learning Applied to Document Recognition” by Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. It is one of the earliest and most basic CNN architectures.

It consists of 7 layers. The first layer takes an input image with dimensions of 32×32. It is convolved with 6 filters of size 5×5, resulting in dimensions of 28×28×6. The second layer is a pooling operation with a filter size of 2×2 and a stride of 2. Hence the resulting image dimension will be 14×14×6.

Similarly, the third layer involves a convolution operation with 16 filters of size 5×5, producing an output of 10×10×16, followed by a fourth pooling layer with a similar filter size of 2×2 and stride of 2. Thus, the resulting image dimension is reduced to 5×5×16.

Once the image dimension is reduced, the fifth layer is a fully connected convolutional layer with 120 filters, each of size 5×5. In this layer, each of the 120 units is connected to all 400 (5×5×16) units from the previous layer. The sixth layer is also a fully connected layer with 84 units.

The final seventh layer will be a softmax output layer with ‘n’ possible classes depending upon the number of classes in the dataset.
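These sizes follow from the standard output-size formula for convolution and pooling (a quick check, assuming no padding): output = (input − filter) / stride + 1. A small Python sketch of the calculation:

def conv_out(size, kernel, stride=1, padding=0):
    return (size - kernel + 2 * padding) // stride + 1

s = conv_out(32, 5)     # 28  (first convolution, 6 filters)
s = conv_out(s, 2, 2)   # 14  (first pooling)
s = conv_out(s, 5)      # 10  (second convolution, 16 filters)
s = conv_out(s, 2, 2)   # 5   (second pooling) -> 5 x 5 x 16 = 400 units
print(s)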


Also visit upGrad’s Degree Counselling page for all undergraduate and postgraduate programs.

(Diagram: the 7 layers of the LeNet-5 CNN Architecture.)

Below is the Python code to build a LeNet-5 CNN architecture using the Keras library with the TensorFlow framework.

In Python, the model type that is most commonly used is the Sequential type. It is the easiest way to build a CNN model in Keras. It permits us to build a model layer by layer. The ‘add()’ function is used to add layers to the model. As explained above, for the LeNet-5 architecture there are two convolution-and-pooling pairs, followed by a Flatten layer, which is usually used as a connection between the convolution layers and the Dense layers.

The Dense layers are the ones that are mostly used for the output layers. The activation used is ‘Softmax’, which gives a probability for each class, and these probabilities sum to 1. The model makes its prediction based on the class with the highest probability.
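Since the original code screenshots are not reproduced here, the following is a minimal sketch of a LeNet-5-style model in Keras (the tanh activations, the average pooling and the 10-class output are assumptions made for illustration; the layer sizes follow the description above):

from tensorflow import keras
from tensorflow.keras import layers

# LeNet-5-style model built layer by layer with the Sequential API.
model = keras.Sequential()
model.add(layers.Conv2D(6, kernel_size=(5, 5), activation='tanh',
                        input_shape=(32, 32, 1)))                    # -> 28x28x6
model.add(layers.AveragePooling2D(pool_size=(2, 2), strides=2))      # -> 14x14x6
model.add(layers.Conv2D(16, kernel_size=(5, 5), activation='tanh'))  # -> 10x10x16
model.add(layers.AveragePooling2D(pool_size=(2, 2), strides=2))      # -> 5x5x16
model.add(layers.Flatten())                                          # -> 400 units
model.add(layers.Dense(120, activation='tanh'))
model.add(layers.Dense(84, activation='tanh'))
model.add(layers.Dense(10, activation='softmax'))  # 10 classes assumed (e.g. digits)

model.summary()   # prints the layer-by-layer summary of the model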

Calling model.summary() displays a layer-by-layer summary of the model, including each layer's output shape and number of parameters.

In-demand Machine Learning Skills

Artificial Intelligence Courses
Tableau Courses
NLP Courses
Deep Learning Courses

Conclusion

Hence, in this article we have understood the basic CNN structure, its architecture and the various layers that make up the CNN model. We have also seen an architectural example of the very famous and traditional LeNet-5 model, along with its Python program, and understood how CNNs reduce the dependence on human supervision when building effective features. The distinct layers in a CNN transform the input to the output using differentiable functions.

If you’re interested to learn more about machine learning courses, check out IIIT-B & upGrad’s Executive PG Programme in Machine Learning & AI which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B Alumni status, 5+ practical hands-on capstone projects & job assistance with top firms.

Popular AI and ML Blogs & Free Courses

IoT: History, Present & Future
Machine Learning Tutorial: Learn ML
What is Algorithm? Simple & Easy
Robotics Engineer Salary in India : All Roles
A Day in the Life of a Machine Learning Engineer: What do they do?
What is IoT (Internet of Things)
Permutation vs Combination: Difference between Permutation and Combination
Top 7 Trends in Artificial Intelligence & Machine Learning
Machine Learning with R: Everything You Need to Know

AI & ML Free Courses

Introduction to NLP
Fundamentals of Deep Learning of Neural Networks
Linear Regression: Step by Step Guide
Artificial Intelligence in the Real World
Introduction to Tableau
Case Study using Python, SQL and Tableau

What are activation functions in CNN?

The activation function is one of the most vital components in the CNN model. They’re utilized to learn and approximate any form of network variable-to-variable association that’s both continuous and complex. In simple terms, it determines which model information should flow in the forward direction and which should not at the network’s end. It gives the network non-linearity. The ReLU, Softmax, tanH, and Sigmoid functions are some of the most often utilized activation functions. All of these functions have distinct uses. For a 2-class CNN model, sigmoid and softmax functions are favored, whereas softmax is typically employed for multi-class classification.

What are the basic components of the convolutional neural network architecture?

An input layer, an output layer, and multiple hidden layers make up convolutional networks. The neurons in the layers of a convolutional network are arranged in three dimensions (width, height, and depth), unlike those in a standard neural network. This enables the CNN to convert a three-dimensional input volume into an output volume. The hidden layers are made up of convolution, pooling, normalization, and fully connected layers. Multiple conv layers are used in a CNN to filter input volumes to higher levels of abstraction.

What is the benefit of standard CNN architectures?

While traditional network architectures consisted solely of stacked convolutional layers, newer architectures look into new and novel ways of constructing convolutional layers in order to improve learning efficiency. These architectures provide general architectural recommendations for machine learning practitioners to adapt in order to handle a variety of computer vision problems. These architectures can be utilized as rich feature extractors for image classification, object identification, picture segmentation, and a variety of other advanced tasks.

What is the architecture of CNN?

It has three layers, namely the convolutional, pooling, and fully connected layers. A CNN is a class of neural networks that processes data with a grid-like topology. The convolution layer is the building block of a CNN and carries the main computational load. Pooling reduces the spatial size of the representation and lessens the number of computations required. The fully connected layer, meanwhile, connects every neuron in one layer to every neuron in the next and usually sits just before the output.

How to draw a CNN architecture?

To effectively communicate about the created models, it is imperative to use visual tools to convey the architecture of a CNN. These tools help to create CNN diagrams by representing the model visually in an appealing manner. There are many tools that can be used to draw the architecture, such as Diagram.net, NN-SVG, PlotNeuralNet, TensorSpace.js and Keras.js.

How to implement new MATLAB code for CNN architecture?

In order to implement a new MATLAB code for a CNN architecture, one should first load and explore the data. Next, define the network architecture, specifying layers such as the image input layer, max pooling layer, softmax layer, etc. Then specify the training options and, lastly, train the network using the training data. Validation is the last and most important step to check the accuracy.

How to increase the accuracy of any CNN architecture?

To improve the performance of a CNN architecture, it is pertinent to improve the accuracy of the model. Some of the ways to build accuracy are: setting and tuning the parameters, data augmentation, increasing the size of the dataset, and fixing overfitting and underfitting problems.

Which CNN architecture is used for YOLO?

YOLO stands for “You Only Look Once”; it uses a CNN to detect objects in real time. It means the prediction for the entire image is made in a single run: the network looks at the image only once, and only one forward pass is required to make the predictions.


Lead the AI Driven Technological Revolution

Apply for Advanced Certification in Machine Learning and Cloud