5 Different Types of Neural Networks
A mostly complete chart of neural networks awaits you below. Understand the idea behind the neural network algorithm, the definition of a neural network, the mathematics behind it, and the different types of neural networks to become a neural network pro.
Let’s Have Some Fun Before That… Game Time!
Instead of starting with a mostly complete neural network chart, let us play a fun game first. Below you’ll find a mixture of red and black balls; your task is to count the number of balls of each color.
Too easy, right? Well, for most humans, it is. But what if I wanted a computer to solve this task? Is it possible for it to do that? It turns out it is. A similar problem, solved by a researcher at Cornell University (CU), is now widely considered the first step towards Artificial Intelligence. In 1958, Frank Rosenblatt of CU successfully demonstrated that a computer could learn to separate cards marked on the left from cards marked on the right after 50 trials. Let us find out in the next section how exactly he did that.
What is a Perceptron?
The Perceptron is one of the simplest binary classifiers; it separates two classes from each other by learning their features. For example, consider the famous Iris Dataset, whose features are the widths and lengths of sepals and petals for three classes of flowers: Iris setosa, Iris virginica, and Iris versicolor. The dataset was collected by Dr. Edgar Anderson and contains 150 instances, each consisting of the four length measurements and the corresponding flower class.
Image: Iris flowers (left) and the four measurements that form the features of the Iris Dataset (right). Source: Freepik.com (left), Digital Image Processing textbook [1]
To keep things simple, let us consider only two features, petal length (cm) and sepal length (cm), for two flowers: Iris setosa and Iris versicolor. If we plot these features on a graph, this is what it will look like:
Carefully observe the graph and note that we can easily separate the two flowers from each other based on these two characteristics. In other words, one can effortlessly draw a straight line between the two classes and read off threshold values of the two lengths for each flower. This is the problem the Perceptron solves: it tries to come up with the equation of that separating line. But how is that possible? We’ll explore the answer now.
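If you want to reproduce such a plot yourself, here is a minimal sketch using scikit-learn’s bundled copy of the Iris dataset (the column order and integer class labels below are scikit-learn conventions):

```python
# A minimal sketch: plot sepal length vs. petal length for Iris setosa
# and Iris versicolor using scikit-learn's bundled copy of the dataset.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target  # columns: sepal len, sepal wid, petal len, petal wid

# Keep only setosa (class 0) and versicolor (class 1)
mask = y < 2
sepal_len, petal_len, labels = X[mask, 0], X[mask, 2], y[mask]

for cls, name, marker in [(0, "Iris setosa", "o"), (1, "Iris versicolor", "x")]:
    sel = labels == cls
    plt.scatter(sepal_len[sel], petal_len[sel], marker=marker, label=name)

plt.xlabel("Sepal length (cm)")
plt.ylabel("Petal length (cm)")
plt.legend()
plt.show()
```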
Mathematical Model of the Perceptron
In essence, a perceptron takes in the features of an instance (x = {x1, x2, x3, …, xn}) from the dataset, multiplies each feature value by a corresponding weight (w = {w1, w2, w3, …, wn}), and adds a bias term (b) to the sum:

h(x) = w1·x1 + w2·x2 + … + wn·xn + b

This function, h(x), produces the value that is fed to the activation function. Look at the figure below; it will help you understand this better.
The output of the function h(x) decides which class the instance belongs to. If the result is above zero, we say the instance x belongs to class A1; otherwise, if the output is below zero, it belongs to class A2. We can write this mathematically as

x ∈ A1 if h(x) > 0, and x ∈ A2 if h(x) < 0
But, how does a perceptron learn these weights so that the instance is labeled with its correct class? We are now ready to answer this.
Algorithm: For simplicity, we take the input features as a vector and append a 1 to it at the end, so that the input to the activation function becomes y = {x1, x2, x3, …, xn, 1} and the weight vector becomes w = {w1, w2, w3, …, wn, b}. We can now write the function h as

h(y) = wᵀy

Here, the vectors y and w are called the augmented input vector and the augmented weight vector, respectively. Using this notation, the algorithm of a perceptron can be written as follows.
Consider an initial weight vector w(1) with arbitrary values. At step k, the weight vector is then updated using the following rule:

w(k+1) = w(k) + β·y(k), if y(k) belongs to A1 and wᵀ(k)·y(k) ≤ 0
w(k+1) = w(k) − β·y(k), if y(k) belongs to A2 and wᵀ(k)·y(k) ≥ 0
w(k+1) = w(k), otherwise
where β > 0 is the correction increment (also called the learning increment or learning rate). The first two cases cover the situation where the Perceptron has misclassified the sample, so the weights must be updated. If the class has been identified correctly, the weights need not change, as the third case shows. The final weights can then be used to plot the line that separates the two classes.
You may wonder how such a simple algorithm can always give the correct answer. Well, it cannot. The Perceptron algorithm is limited to cases where the two classes are linearly separable, that is, where a single straight line suffices to separate the objects of the two classes. Only in that case does the algorithm converge to the correct weights.
Before we move on to the snippet of code that implements this algorithm, let us play a fun quiz.
Question: What was Frank Rosenblatt working on that led to the birth of the idea of a Perceptron?
- Studying the way neurons in a human brain transfer information
- Studying the way a fly’s eye decides the path of its flight when fleeing
- Studying the behavior of a cat towards red and blue balls
- Studying the response of a prey fish to predators
Code:
The code below sketches the algorithm; it is simple and easy to understand, so read the comments for a better explanation.
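This is a minimal NumPy sketch of the update rule above, assuming class A1 is labeled +1 and class A2 is labeled −1; the learning rate and epoch cap are illustrative choices:

```python
import numpy as np

def train_perceptron(X, labels, beta=1.0, max_epochs=100):
    """Perceptron training with augmented vectors.
    X: (n_samples, n_features) array; labels: +1 for class A1, -1 for class A2.
    Returns the augmented weight vector [w1, ..., wn, b]."""
    # Append 1 to every sample so the bias is learned as the last weight.
    Y = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Y.shape[1])  # arbitrary initial weights (zeros here)

    for _ in range(max_epochs):
        errors = 0
        for y_vec, label in zip(Y, labels):
            h = w @ y_vec
            if label == 1 and h <= 0:      # A1 sample misclassified
                w += beta * y_vec
                errors += 1
            elif label == -1 and h >= 0:   # A2 sample misclassified
                w -= beta * y_vec
                errors += 1
        if errors == 0:                    # converged: every sample correct
            break
    return w
```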
Test Yourself! Implement the above code on the two classes of Iris Dataset and classify them on the basis of sepal length and petal length. Also, don’t forget to use the weights to draw the line that separates the two classes on a graph.
We are now ready to move on to one of the most widely used algorithms: Neural Networks. This algorithm builds on the Perceptron algorithm that we just finished learning. If all this was a bit rigorous for you, please go grab a snack and reward yourself for coming this far.
What are Neural Networks?
Definition of a Neural Network: A Neural Network, as the name suggests, is a network of neurons, where each neuron behaves like the perceptron we just finished discussing. The algorithm is inspired by the operation of biological neural systems. It aims at recognizing the pattern between the input features and the expected output by minimizing the error between the predicted and actual outputs.
Neural Networks and Deep Learning: Deep Learning is a subfield of machine learning that consists of algorithms that mimic how a human brain functions. The basis of most such algorithms is the neural network (NN). The reason for its popularity is the large number of problems it has helped solve. From Face Recognition to Object Detection to Stock Prediction, NNs are at the heart of all such solutions. The applications of NNs are no longer limited to images or numbers. With the invention of exciting architectures like LSTM and GRU, neural networks have expanded their applications to Natural Language Processing problems. So, what lies in a neural network algorithm? Continue to find out.
How Do Neural Networks work?
Let us begin with the most common way of visualizing a neural network architecture, as shown in figure 1.
A neural network takes a feature vector from the dataset as input, just like a perceptron. But unlike the perceptron, this algorithm works for more than two classes, and thus it can have more than two outputs. Let us understand this algorithm step by step.
- The first step begins at the input layer (Fig. 1), where the neural network receives the feature vector x = {x1, x2, x3, …, xn} from the dataset. Each orange-colored circle of Fig. 1 represents an element of this feature vector.
- The next step involves connecting the input vector to all the neurons of the next layer. Each neuron j of this layer receives a weighted sum of the input vector elements along with a bias term. Mathematically, this means

net_j = Σ_i w_ij·x_i + b_j

The outcome is then passed through an activation function a(·), so that the output of each neuron is given by

o_j = a(net_j)
Some of the popular activation functions, such as the sigmoid, the hyperbolic tangent (tanh), and the rectified linear unit (ReLU), are listed below.
Image Source: Handbook of Neural Network Signal Processing [2]
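As a quick illustration, here is a small NumPy sketch of a few common activations together with the weighted-sum-plus-bias step of a single fully connected layer (the layer sizes and random weights are illustrative):

```python
import numpy as np

# A few of the popular activation functions.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def dense_layer(x, W, b, activation=sigmoid):
    """One fully connected layer: weighted sum of inputs plus bias,
    passed through an activation function."""
    return activation(W @ x + b)

# Example: 4 input features -> 3 hidden neurons
rng = np.random.default_rng(0)
x = np.array([5.1, 3.5, 1.4, 0.2])   # one Iris-like feature vector
W = rng.normal(size=(3, 4))          # one weight row per neuron
b = rng.normal(size=3)
print(dense_layer(x, W, b))
```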
- Repeat step 2 for all the hidden layers, i.e., the layers that lie between the input and output layers. The key point to remember is that the activation function need not be the same for all the hidden layers. In particular, depending on the problem at hand, the output layer usually has a different activation function, since the neurons of the output layer are responsible for labelling the feature vector with one of the expected classes. The number of neurons in the output layer has to be the same as the number of expected classes, each neuron representing one class. The neuron that generates the highest output value identifies the class of the input feature vector.
- Now that we have figured out how the output is evaluated, the remaining part to unravel is how the network learns the correct weights. For that, we first compute the error function over all the output neurons,

E = Σ_i E_i

where E_i, the error for a single pattern vector x_i, is defined as

E_i = (1/2) Σ_j (z_j − o_j)²,  j = 1, 2, 3, …, N

Here N is the number of different classes in the dataset, o_j is the output value of the j-th neuron of the output layer, and z_j is the desired response for the j-th neuron. But this is not the only error function in use today; there are a variety of options available.
- Once the error is evaluated at the output, it needs to be minimized, and that only becomes possible when the whole network has learned the correct weights. To make the network learn, the error is propagated back to the previous layers. We can understand how this works by considering the gradient descent algorithm: each weight adjusts in proportion to the partial derivative of the error function with respect to it. That is,

Δw_ij^(l) = −α · ∂E/∂w_ij^(l)

where α represents the learning parameter and the superscript denotes the layer whose parameters are being considered.
After performing the necessary algebra, we end up with the following algorithm:

For any two layers l−1 and l, the weights that connect the two layers are modified using

w_ij^(l)(k+1) = w_ij^(l)(k) + α · δ_j^(l) · o_i^(l−1)

If j denotes a neuron of the output layer (l = L), the parameter δ is evaluated as

δ_j^(L) = (z_j − o_j) · a′(net_j^(L))

If j denotes a neuron of a hidden layer l and p runs over the neurons of layer l+1, the parameter δ is evaluated as

δ_j^(l) = a′(net_j^(l)) · Σ_p δ_p^(l+1) · w_jp^(l+1)
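Putting the forward pass and these update rules together, here is a compact NumPy sketch of backpropagation for a single hidden layer with sigmoid activations; the layer sizes, learning rate alpha, and epoch count are illustrative choices:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_one_hidden_layer(X, Z, hidden=5, alpha=0.1, epochs=1000, seed=0):
    """Backpropagation sketch for one hidden layer and a sigmoid output.
    X: (n_samples, n_in) inputs; Z: (n_samples, n_out) desired responses."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, Z.shape[1]))
    b2 = np.zeros(Z.shape[1])

    for _ in range(epochs):
        for x, z in zip(X, Z):
            # Forward pass
            h = sigmoid(x @ W1 + b1)   # hidden-layer outputs
            o = sigmoid(h @ W2 + b2)   # output-layer outputs

            # Output-layer delta: (z - o) * a'(net); a'(net) = o(1 - o) for sigmoid
            delta_out = (z - o) * o * (1.0 - o)
            # Hidden-layer delta: propagate the output deltas back through W2
            delta_hid = (W2 @ delta_out) * h * (1.0 - h)

            # Weight updates: alpha * delta_j * (output of the previous layer)
            W2 += alpha * np.outer(h, delta_out)
            b2 += alpha * delta_out
            W1 += alpha * np.outer(x, delta_hid)
            b1 += alpha * delta_hid
    return W1, b1, W2, b2
```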
That’s all. We are all set with the mathematics. Grab another snack to energize yourself for the next section.
Different Kinds of Neural Networks
Now that you know the basics of a feedforward neural network, let us explore how we can add interesting layers to solve exciting problems.
- Artificial Neural Network (ANN): The neural network that we explained in the previous section is often referred to as an Artificial Neural Network. We can thus easily skip this one, as we have discussed it already.
- Radial Basis Function Neural Network (RBFNN): A special neural network class consisting of only three layers: an input layer, a hidden layer, and an output layer. As is evident from the name, it utilizes Radial Basis Functions (RBFs), such as the Gaussian, thin plate spline, and multiquadric, as the activation function of the hidden layer. Its hidden layer works somewhat like the K-Means clustering algorithm, and the network is used in situations where the instances are not linearly separable. The idea of using RBFs is to transform the variables into a higher dimension where the instances of our dataset become linearly separable. Here is what the architecture of an RBFNN looks like:
The training algorithm for an RBFNN is different from that of an ANN and requires a few more parameters besides the learning increment.
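The sketch below illustrates the idea under some simplifying assumptions: Gaussian RBFs with a single shared width gamma, centres chosen by K-Means, and an output layer fit by ordinary least squares (the class name RBFNN and its defaults are ours, not a library API):

```python
import numpy as np
from sklearn.cluster import KMeans

class RBFNN:
    """Sketch of a radial basis function network: K-Means picks the RBF
    centres, a Gaussian kernel maps inputs to a higher dimension, and a
    linear least-squares fit gives the output weights."""
    def __init__(self, n_centers=10, gamma=1.0):
        self.n_centers, self.gamma = n_centers, gamma

    def _rbf_features(self, X):
        # Gaussian RBF: exp(-gamma * ||x - c||^2) for every centre c
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-self.gamma * d2)

    def fit(self, X, y):
        self.centers = KMeans(n_clusters=self.n_centers, n_init=10).fit(X).cluster_centers_
        Phi = self._rbf_features(X)
        # Solve the linear output layer by least squares
        self.w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        return self

    def predict(self, X):
        return self._rbf_features(X) @ self.w
```

For a two-class problem, you could fit y as ±1 labels and threshold the prediction at zero.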
- Convolutional Neural Network (CNN): As the name suggests, this neural network involves the convolution operation. This type of neural network has wide applications in Image Classification and Object Detection. It receives an image at the input, and the features of the image are extracted through the convolution operation. In one dimension, the operation can be written as

h(x) = Σ_{i=−s}^{s} w(i)·y(x + i)

where y represents the input image vector, w represents the weights (also called the filter or kernel), and s = (t − 1)/2, where 1×t is the (odd) size of the kernel.
As an example, consider the following values for the input vector, y = [2, 1, 2, 3, 4, 6, 8, 1], and the kernel, w = [0, 1, 0, 0, 0].
Note that we have a problem if we start from the first element: the operation is not defined there, because the window extends past the boundary. The solution is the padding operation, which adds zeroes at both ends of the input vector so that the convolution operation is defined everywhere.
Thus, with zero padding, the output for x = 0 would be h(0) = 0·0 + 1·0 + 0·2 + 0·1 + 0·2 = 0, since the two padded positions to the left of the first element contribute nothing.
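The following NumPy sketch implements the sliding-window operation with zero padding as defined above and reproduces this example (the helper name conv1d_same is ours):

```python
import numpy as np

def conv1d_same(y, w):
    """1-D sliding-window operation with zero padding, as defined above:
    the kernel is centred on each position and out-of-range inputs are 0."""
    t = len(w)            # odd kernel size
    s = (t - 1) // 2
    y_pad = np.pad(y, s)  # add s zeroes on each side
    return np.array([y_pad[x:x + t] @ w for x in range(len(y))])

y = np.array([2, 1, 2, 3, 4, 6, 8, 1])
w = np.array([0, 1, 0, 0, 0])
print(conv1d_same(y, w))  # [0 2 1 2 3 4 6 8] -- h(0) = 0, as computed above
```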
Image: Architecture of LeNet-5 [3]
Notice that the input to the network is an image. There are multiple convolution layers, denoted by C, and subsampling layers, denoted by S. The subsampling layers are simple layers that contract the size by using operations like taking the average or maximum of four neighboring elements. This model, LeNet-5, was used by its authors to recognize handwritten and machine-printed characters. There can be many more exciting applications; for instance, you can use it to identify your favorite cartoon.
Image source: seekpng.com
And if you don’t get accurate results using LeNet-5, you may switch to more recent CNNs like AlexNet, VGG, ResNet, Inception, Xception, etc.
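For reference, here is a hedged Keras sketch of the LeNet-5 layer stack; the tanh activations and average pooling stand in for the original paper’s squashing and subsampling details, so treat it as an approximation rather than a faithful reimplementation:

```python
from tensorflow import keras

# Approximate LeNet-5 stack: alternating convolution (C) and
# subsampling (S) layers, followed by fully connected layers.
lenet5 = keras.Sequential([
    keras.Input(shape=(32, 32, 1)),                  # 32x32 grayscale image
    keras.layers.Conv2D(6, 5, activation="tanh"),    # C1: 6 feature maps, 5x5 kernels
    keras.layers.AveragePooling2D(2),                # S2: subsample by 2
    keras.layers.Conv2D(16, 5, activation="tanh"),   # C3
    keras.layers.AveragePooling2D(2),                # S4
    keras.layers.Conv2D(120, 5, activation="tanh"),  # C5
    keras.layers.Flatten(),
    keras.layers.Dense(84, activation="tanh"),       # F6
    keras.layers.Dense(10, activation="softmax"),    # one output per digit class
])
lenet5.summary()
```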
- Recurrent Neural Network (RNN): The word recurrent means “occurring often or repeatedly.” The name suggests that some operation happens many times, a repeated calculation, and that is indeed the case with an RNN. In an RNN, each output element is evaluated as a function of the previous output elements, and all the output elements are calculated by applying the same update rule to the earlier outcomes. This is possible because the layers of an RNN share their weights across time steps. To understand this better, consider the figure below.
This figure sums up the basic idea of an RNN. The input vector of specific dimensions is fed to the hidden layers, and the output is evaluated. However, there is also a circular arrow that points back at the input, which refers to the fact that the output is fed back into the network.
RNNs are used for processing sequential data, for example, in Natural Language Processing (NLP), predicting the next word in a sentence while keeping in mind the sequence of words already entered. We see Google Keyboard helping us with this every day.
So, if there are four words in a sentence and we want to predict the fifth word, we can use an RNN. The network unrolls itself into four copies of its layers, one for each word. The words are, of course, converted to vectors using embedding techniques like word2vec or one-hot encoding. The network starts by processing the first word, x1, at time t = 1, and the output s1 is computed using an activation function. Next, at time t = 2, the previous output is fed back into the network along with the second word of the sentence, the outcome is again evaluated using an activation function, and so on. Notice that the weight parameters remain the same for all these calculations, which is exactly the recurrent behavior of an RNN; the recurrence is with respect to time.
After evaluating the final output, the loss function is computed, and the error is propagated back through time to update the weights. Many recent architectures, like Long Short-Term Memory networks (LSTM), Gated Recurrent Units (GRU), and attention-based models, have RNNs as a part of their architecture.
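A minimal NumPy sketch of the forward pass makes the weight sharing explicit: the same three weight matrices are reused at every time step, and the hidden state carries information forward (the sizes and the toy one-hot “words” are illustrative):

```python
import numpy as np

def rnn_forward(xs, Wxh, Whh, Why, bh, by):
    """Minimal RNN forward pass: the same weight matrices are reused at
    every time step, and each hidden state feeds back into the next one."""
    h = np.zeros(Whh.shape[0])                 # initial hidden state
    outputs = []
    for x in xs:                               # one time step per input word vector
        h = np.tanh(Wxh @ x + Whh @ h + bh)    # new state from input + old state
        outputs.append(Why @ h + by)           # output at this time step
    return outputs, h

# Toy example: 4 one-hot "word" vectors from a 10-word vocabulary
rng = np.random.default_rng(0)
vocab, hidden = 10, 8
xs = [np.eye(vocab)[i] for i in (3, 1, 4, 1)]
Wxh = rng.normal(scale=0.1, size=(hidden, vocab))
Whh = rng.normal(scale=0.1, size=(hidden, hidden))
Why = rng.normal(scale=0.1, size=(vocab, hidden))
bh, by = np.zeros(hidden), np.zeros(vocab)

outputs, _ = rnn_forward(xs, Wxh, Whh, Why, bh, by)
print("Predicted 5th word index:", int(np.argmax(outputs[-1])))
```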
- Autoencoders: These are a special kind of neural network consisting of three main parts: an encoder, a code, and a decoder. For these networks, the target output is the same as the input. They compress the information received at the input into a lower-dimensional code, which they then use to rebuild the output. Both the encoder and decoder have an ANN-based architecture and are usually mirror images of each other. The idea of squeezing the input through a code between the encoder and decoder is that the network learns to tolerate small changes in the input vector and still produce the expected output. It might seem odd at first, but imagine passing a noisy image as input: a trained autoencoder will be able to present you a picture without the noise. Autoencoders are thus widely used for anomaly detection, data denoising, and dimensionality reduction.
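As a sketch, here is a small fully connected autoencoder in Keras; the 784-dimensional input (e.g., a flattened 28×28 image), the 32-dimensional code, and the random training data are all illustrative stand-ins:

```python
import numpy as np
from tensorflow import keras

# Encoder: 784 -> 32-dimensional code; decoder mirrors it back to 784.
encoder = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(32, activation="relu"),      # the "code"
])
decoder = keras.Sequential([
    keras.Input(shape=(32,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(784, activation="sigmoid"),
])
autoencoder = keras.Sequential([encoder, decoder])

# The target is the input itself: the network learns to reconstruct it.
autoencoder.compile(optimizer="adam", loss="mse")

# Toy training data in place of real images (values scaled to [0, 1]).
X = np.random.default_rng(0).random((256, 784)).astype("float32")
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)
```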
Mastering Neural Networks through Hands-On Projects
Congratulations! You are now done learning about one of the most famous algorithms used by Data Scientists. But, as they say, knowledge is incomplete without action; it is thus important that you explore relevant code too, which can guide you on how to apply neural network algorithms to solve real-world problems. Too lazy to google Neural Network project ideas? Don’t worry, we’ve got you covered with some innovative Neural Network Project Ideas that will add great value to your data science or machine learning portfolio.
References
1. Gonzalez, R. C., & Woods, R. E. (2002). Digital Image Processing.
2. Hu, Y. H., & Hwang, J.-N. (2002). Handbook of Neural Network Signal Processing.
3. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, pp. 2278–2324.