Step-by-step Guide to Building Your Own Neural Network From Scratch
Learn the fundamentals of deep learning and build your very own neural network for image classification
Photo by Aron Visuals on Unsplash
For hands-on video tutorials on machine learning, deep learning, and artificial intelligence, check out my YouTube channel.
The “what” and the “why”
What is deep learning?
We have all heard about deep learning before. It has become very popular among data science practitioners and is now used in a variety of settings, thanks to recent advances in computing power, data availability, and algorithms.
But what exactly is deep learning?
Simply, deep learning refers to training a neural network.
Now, what is a neural network?
Well, it is simply a function that fits some data. In its simplest form, there is a single function fitting some data as shown below. This structure is called a neuron.
Schematic of a neuron
The function can be anything, for example a linear function or a sigmoid function. Of course, a single neuron has no advantage over a traditional machine learning algorithm.
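To make this concrete, here is a minimal sketch of a single neuron in NumPy. It assumes the common formulation in which the neuron computes a weighted sum of its inputs plus a bias and then applies its function (the activation). The names `neuron`, `linear`, and `sigmoid` and the numbers are illustrative, not taken from the original notebook.

```python
import numpy as np

def linear(z):
    """Identity activation: the neuron outputs its weighted sum unchanged."""
    return z

def sigmoid(z):
    """Sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, activation):
    """Compute one neuron's output: activation(w . x + b)."""
    z = np.dot(w, x) + b  # weighted sum of the inputs plus a bias
    return activation(z)

x = np.array([1.0, 2.0, 4.0])      # three input features
w = np.array([0.5, -0.25, 0.125])  # one weight per input
b = 0.5

print(neuron(x, w, b, linear))   # 1.0
print(neuron(x, w, b, sigmoid))  # sigmoid(1.0), about 0.731
```

Swapping the activation is the only change needed to turn the same neuron from a linear model into a sigmoid one.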
Therefore, a neural network combines multiple neurons. Think of neurons as the building blocks of a neural network. By stacking them, you can build a neural network as below:
Schematic of a neural network
Notice above how each input is fed to each neuron. The neural network will figure out by itself which function best fits the data. All you need to provide are the inputs and the outputs.
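A minimal sketch of the stacking idea, assuming sigmoid activations everywhere and arbitrary layer sizes (3 inputs, 4 hidden neurons, 1 output). Because each input is fed to each neuron, a whole layer can be written as one matrix product, with one row of weights per neuron. The helper `dense_layer` is hypothetical, and the weights here are random rather than learned.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_layer(x, W, b):
    """One layer of neurons in which every neuron sees every input.

    x: (n_inputs,), W: (n_neurons, n_inputs), b: (n_neurons,)
    Row i of W holds the weights of neuron i.
    """
    return sigmoid(W @ x + b)

rng = np.random.default_rng(0)
x = np.array([0.2, -1.0, 0.5])          # 3 input features

W1 = rng.standard_normal((4, 3)) * 0.1  # hidden layer: 4 neurons, 3 weights each
b1 = np.zeros(4)
W2 = rng.standard_normal((1, 4)) * 0.1  # output layer: 1 neuron fed by the 4 hidden outputs
b2 = np.zeros(1)

hidden = dense_layer(x, W1, b1)         # outputs of the hidden neurons
output = dense_layer(hidden, W2, b2)    # the network's final output
print(hidden.shape, output.shape)       # (4,) (1,)
```

Training would consist of adjusting `W1`, `b1`, `W2`, and `b2` so that `output` matches the provided labels.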
Why use deep learning?
Deep learning has been successfully applied in many supervised learning settings. Standard neural networks are used in online advertising, convolutional neural networks (CNNs) are great for photo tagging, and recurrent neural networks (RNNs) are used for speech recognition and machine translation.
In recent years, our digital activity has increased significantly, generating very large amounts of data. While the performance of traditional machine learning methods tends to plateau as more data is used, the performance of large enough neural networks keeps improving as more data becomes available. At the same time, data storage has become very cheap, and computing power now allows such large neural networks to be trained.
This is why deep learning is so exciting right now. We have access to large amounts of data, and we have the computing power to quickly test an idea and repeat experiments to come up with powerful neural networks!
Now that we know what deep learning is and why it is so awesome, let’s code our very first neural network for image classification! Fire up your Jupyter Notebook!
Yes, our neural network will recognize cats. Classic, but it’s a good way to learn the basics!
Your first neural network
The objective is to build a neural network that will take an image as an input and output whether it is a cat picture or not.
Feel free to grab the entire notebook and the dataset here. It also contains some useful utilities to import the dataset.
Import the data
As always, we start off by importing the relevant packages to make our code work:
Then, we load the data and see what the pictures look like:
And you should see the following:
Example of a cat image in the dataset
Then, let’s print out more information about the dataset:
And you should see:
General information about the dataset
As you can see, we have 209 images in the training set and 50 images in the test set. Each image is a square of 64 × 64 pixels. You may also notice that each image has a third dimension of size 3. This is because the image is composed of three layers: a red layer, a green layer, and a blue layer (RGB).
A picture is composed of three layers
Each value in each layer lies between 0 and 255 and represents how much red, green, or blue that pixel contains; the combination of the three values gives each pixel its color.
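If you want to check these numbers yourself, a sketch like the following reads them off the array shapes. It assumes the images are stored as NumPy arrays of shape (number of images, height, width, channels). The names `train_x_orig` and `test_x_orig` are hypothetical, and the arrays are zero-filled placeholders standing in for the real dataset.

```python
import numpy as np

# Placeholders with the same shapes as the cat dataset described above.
# In the notebook, these arrays would come from the dataset files instead.
train_x_orig = np.zeros((209, 64, 64, 3), dtype=np.uint8)
test_x_orig = np.zeros((50, 64, 64, 3), dtype=np.uint8)

m_train = train_x_orig.shape[0]   # number of training images
m_test = test_x_orig.shape[0]     # number of test images
num_px = train_x_orig.shape[1]    # height (= width) of each image, in pixels
channels = train_x_orig.shape[3]  # colour channels: red, green, blue

print(f"Training examples: {m_train}")
print(f"Test examples: {m_test}")
print(f"Image size: {num_px} x {num_px} x {channels}")
```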
Now, we need to flatten the images before feeding them to our neural network:
Great! You should now see that the training set has a size of (12288, 209). This means that our images were successfully flattened, since 12288 = 64 × 64 × 3.
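One way to do the flattening is sketched below under the same (images, height, width, channels) layout assumption, using a random placeholder array. The idea is to reshape to one row per image and then transpose so that each column holds one example. Writing `reshape(m, -1).T` rather than `reshape(-1, m)` matters: the latter also produces shape (12288, 209), but it mixes pixels from different images within a column.

```python
import numpy as np

# Random placeholder with the training-set shape: (images, height, width, channels)
rng = np.random.default_rng(0)
train_x_orig = rng.integers(0, 256, size=(209, 64, 64, 3)).astype(np.uint8)

# Reshape to (209, 12288): one row per image, -1 lets NumPy infer 64*64*3.
# Transpose to (12288, 209): one column per image, one row per pixel value.
train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T

print(train_x_flatten.shape)  # (12288, 209)
```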
Finally, we standardize our dataset:
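For pixel data, a common choice, assumed in this sketch, is to divide every value by 255 (the maximum pixel value) so that all features lie between 0 and 1. Strictly speaking, this is rescaling rather than z-score standardization (subtracting the mean and dividing by the standard deviation), which is an alternative. The array below is a random placeholder with the flattened training-set shape.

```python
import numpy as np

# Placeholder flattened pixel data: values in [0, 255], shape (features, examples)
rng = np.random.default_rng(0)
train_x_flatten = rng.integers(0, 256, size=(12288, 209))

# Convert to floating point and divide by the maximum pixel value,
# so every feature ends up between 0 and 1.
train_x = train_x_flatten.astype(np.float64) / 255.0

print(train_x.min() >= 0.0, train_x.max() <= 1.0)  # True True
```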
Choose the activation function
One of the first steps in building a neural network is finding the appropriate activation function. In our case, we wish to predict if a picture has a cat or not. Therefore, this can be framed as a binary classification problem. Ideally, we would have a function that outputs 1 for a cat picture, and 0 otherwise.
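You may already know that the sigmoid function makes sense here: it maps any real number to a value between 0 and 1, which we can interpret as the probability that the picture contains a cat. I will assume that you know most of the properties of the sigmoid function. Otherwise, you can learn more here.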
Mathematically, the sigmoid function is expressed as:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Thus, let’s define the sigmoid function, as it will come in handy later on:
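A NumPy version might look like this. Using `np.exp` lets the same function work element-wise on scalars and on whole arrays of weighted inputs, which is what we need when processing every image at once.

```python
import numpy as np

def sigmoid(z):
    """Compute the sigmoid 1 / (1 + exp(-z)) element-wise.

    Accepts a scalar or a NumPy array of any shape.
    """
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))                        # 0.5
print(sigmoid(np.array([-10.0, 10.0])))  # close to 0 and close to 1
```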
Great, but what is z? It is the weighted input and it is expressed as:
$$z = w^{T}x + b$$
where w is the weight vector (one weight per input feature) and b is the bias. Now, we need to initialize the weights and bias.
Think of a weight as the importance of a feature. Usually, we initialize the weights to small, non-zero random values.
The bias is a constant that we add, like an intercept to a linear equation. This gives the neural network an extra parameter to tune in order to improve the fit. The bias can be initialized to 0.
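A minimal sketch of this initialization, assuming weights drawn from a standard normal distribution and scaled down by 0.01 (the scale and the function name `initialize_parameters` are hypothetical choices). For this model, which has a single output neuron, starting the weights at zero would also work; random initialization becomes essential once there are hidden layers, where identical starting weights would make every neuron in a layer learn the same thing.

```python
import numpy as np

def initialize_parameters(dim, scale=0.01, seed=None):
    """Return a (dim, 1) weight vector of small random values and a zero bias.

    dim:   number of input features (12288 for the flattened 64x64x3 images)
    scale: hypothetical factor controlling how small the random weights are
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((dim, 1)) * scale  # small, non-zero random weights
    b = 0.0                                    # the bias starts at zero
    return w, b

w, b = initialize_parameters(12288, seed=0)
print(w.shape, b)  # (12288, 1) 0.0
```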
Now, we need to define a function for forward propagation and for backpropagation.
During forward propagation, a series of calculations is performed to generate a prediction and to calculate the cost. The cost is a function that we wish to minimize. In our case, the cost function will be:
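$$J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\hat{y}^{(i)} + \left(1 - y^{(i)}\right)\log\left(1 - \hat{y}^{(i)}\right)\right]$$
where m is the number of training examples, $y^{(i)}$ is the true label of the i-th example (1 for a cat, 0 otherwise), and $\hat{y}^{(i)}$ is the model's prediction for it.
Then, backpropagation calculates the gradient, that is, the derivatives of the cost with respect to the parameters. This will be useful during the optimization phase: when the derivatives are close to or equal to 0, the parameters are at or near a minimum of the cost function. Since this cost is convex in w and b, that minimum is the global one.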
Hence, we write the following function:
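The function itself isn't reproduced here, so the following is a sketch under stated assumptions: inputs are stored one example per column (as after flattening), labels form a (1, m) array, and the function name `propagate` is hypothetical. The backward pass uses the standard gradients of the cross-entropy cost with a sigmoid output, $\partial J/\partial w = \frac{1}{m}X(A - Y)^{T}$ and $\partial J/\partial b = \frac{1}{m}\sum_{i}(a^{(i)} - y^{(i)})$, where $A$ holds the predictions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    """One forward and one backward pass for the single-neuron model.

    w: (n_x, 1) weights, b: scalar bias
    X: (n_x, m) inputs, one example per column
    Y: (1, m) true labels (0 or 1)
    Returns a dict of gradients and the cross-entropy cost.
    """
    m = X.shape[1]

    # Forward propagation: predictions, then the cost
    A = sigmoid(w.T @ X + b)                                  # (1, m)
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

    # Backpropagation: derivatives of the cost with respect to w and b
    dZ = A - Y                # (1, m)
    dw = (X @ dZ.T) / m       # (n_x, 1)
    db = np.sum(dZ) / m       # scalar

    return {"dw": dw, "db": db}, float(cost)

# Tiny example: 2 features, 3 examples
w = np.zeros((2, 1))
b = 0.0
X = np.array([[1.0, 2.0, -1.0],
              [3.0, -2.0, 0.5]])
Y = np.array([[1, 0, 1]])
grads, cost = propagate(w, b, X, Y)
print(cost)  # about 0.6931 (log 2)
```

With all weights at zero, every prediction is 0.5, so the cost equals log 2, which makes a handy sanity check.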
Great! As mentioned earlier, we need to repeat forward propagation and backpropagation to update the parameters in order to minimize the cost function. This is done using gradient descent. For that, we set a learning rate, which is a small positive value that controls how much the parameters change at each iteration.
It is important to choose an appropriate value for the learning rate, as shown below:
Plot of the cost as a function of the weights. Left: small learning rate. Right: large learning rate.
If it is too small, training your neural network will take longer, as seen on the left. If it is too large, you might never reach the minimum: gradient descent can keep overshooting it and oscillate around it, or even diverge.
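The effect is easy to see on a toy problem. This sketch (not the cat model) runs gradient descent on $f(w) = w^2$, whose minimum is at $w = 0$ and whose gradient is $2w$, so each update multiplies $w$ by $1 - 2\alpha$. The function name and the learning rates tried are arbitrary choices for illustration.

```python
def gradient_descent_1d(alpha, w0=1.0, steps=20):
    """Minimise f(w) = w**2 (minimum at w = 0) with a fixed learning rate.

    The gradient is f'(w) = 2w, so each update is w <- w - alpha * 2w.
    Returns the list of w values, starting with w0.
    """
    w = w0
    history = [w]
    for _ in range(steps):
        w = w - alpha * 2 * w
        history.append(w)
    return history

for alpha in (0.01, 0.4, 1.0, 1.1):
    h = gradient_descent_1d(alpha)
    print(f"alpha={alpha}: w after 20 steps = {h[-1]:.4g}")
```

With alpha = 0.01, w is still far from 0 after 20 steps; with alpha = 0.4 it is essentially 0; with alpha = 1.0 it flips between 1 and -1 forever; and with alpha = 1.1 it grows in magnitude, meaning gradient descent diverges.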
In our case, we will update the parameters like this:
$$w := w - \alpha\frac{\partial J}{\partial w}, \qquad b := b - \alpha\frac{\partial J}{\partial b}$$
where α is the learning rate. In code, we write:
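A sketch of that loop, assuming batch gradient descent with a fixed number of iterations (the names `optimize` and `propagate`, the defaults, and the toy data are hypothetical). It repeats the forward and backward pass, applies the update rule to w and b, and records the cost so it can be plotted later.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    """Forward + backward pass: returns (dw, db, cost) for the cross-entropy cost."""
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dw = (X @ (A - Y).T) / m
    db = np.sum(A - Y) / m
    return dw, db, float(cost)

def optimize(w, b, X, Y, num_iterations=100, learning_rate=0.01):
    """Batch gradient descent: w := w - alpha*dw, b := b - alpha*db."""
    costs = []
    for _ in range(num_iterations):
        dw, db, cost = propagate(w, b, X, Y)
        w = w - learning_rate * dw   # step against the gradient
        b = b - learning_rate * db
        costs.append(cost)           # cost before this iteration's update
    return w, b, costs

# Hypothetical toy data: the label is 1 when the single feature is positive
X = np.array([[-2.0, -1.0, 1.0, 2.0]])
Y = np.array([[0, 0, 1, 1]])
w, b, costs = optimize(np.zeros((1, 1)), 0.0, X, Y,
                       num_iterations=200, learning_rate=0.5)
print(costs[0], costs[-1])  # the second value is smaller
```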
Awesome, we are almost done! All we need to do now is compute a prediction. Since the sigmoid function outputs a value between 0 and 1, we predict a positive example (a cat) if the output is greater than 0.5, and a negative example (not a cat) otherwise.
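In code, the thresholding step could be sketched as follows (the name `predict` and the example values are hypothetical). Following the rule above, an output of exactly 0.5 is classified as "not a cat".

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, X, threshold=0.5):
    """Return a (1, m) array of 0/1 labels: 1 (cat) if the probability > threshold."""
    A = sigmoid(w.T @ X + b)           # predicted probabilities, shape (1, m)
    return (A > threshold).astype(int)

w = np.array([[2.0]])
b = 0.0
X = np.array([[-1.0, 0.0, 3.0]])       # weighted inputs z = -2, 0, 6
print(predict(w, b, X))                # [[0 0 1]]
```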
Amazing! Combining all our functions into a single model should look like this:
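Here is one possible sketch of such a combined model. The function names, the default hyperparameters (2000 iterations, learning rate 0.005), and the toy two-feature data are assumptions for illustration, not the article's notebook; with the cat dataset you would pass the flattened, rescaled training and test arrays instead. Accuracy is the percentage of predictions that match the labels.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    """Forward pass (cost) and backward pass (gradients) for one neuron."""
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)                                  # predictions, (1, m)
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # cross-entropy
    dw = (X @ (A - Y).T) / m                                  # dJ/dw, (n_x, 1)
    db = np.sum(A - Y) / m                                    # dJ/db, scalar
    return dw, db, float(cost)

def optimize(w, b, X, Y, num_iterations, learning_rate):
    """Batch gradient descent; records the cost at every iteration."""
    costs = []
    for _ in range(num_iterations):
        dw, db, cost = propagate(w, b, X, Y)
        w = w - learning_rate * dw
        b = b - learning_rate * db
        costs.append(cost)
    return w, b, costs

def predict(w, b, X):
    """1 if the predicted probability is greater than 0.5, else 0."""
    return (sigmoid(w.T @ X + b) > 0.5).astype(int)

def model(X_train, Y_train, X_test, Y_test,
          num_iterations=2000, learning_rate=0.005, seed=0):
    """Initialise, train, and evaluate the single-neuron classifier.

    X_*: (n_x, m) inputs, one example per column; Y_*: (1, m) labels in {0, 1}.
    The default hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((X_train.shape[0], 1)) * 0.01  # small random weights
    b = 0.0                                                 # zero bias
    w, b, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate)
    return {
        "w": w,
        "b": b,
        "costs": costs,
        "train_accuracy": 100 * np.mean(predict(w, b, X_train) == Y_train),
        "test_accuracy": 100 * np.mean(predict(w, b, X_test) == Y_test),
    }

# Usage on hypothetical toy data (not the cat dataset): label is 1 when x0 + x1 > 0
rng = np.random.default_rng(1)
X_train = rng.standard_normal((2, 200))
Y_train = (X_train.sum(axis=0, keepdims=True) > 0).astype(int)
X_test = rng.standard_normal((2, 50))
Y_test = (X_test.sum(axis=0, keepdims=True) > 0).astype(int)

result = model(X_train, Y_train, X_test, Y_test,
               num_iterations=500, learning_rate=0.1)
print(f"train accuracy: {result['train_accuracy']:.1f}%")
print(f"test accuracy: {result['test_accuracy']:.1f}%")
```

Returning the cost history alongside the parameters makes it easy to plot the cost against the number of iterations afterwards.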
Now, we can train our model and make predictions!
After running the code cell above, you should see that you get 99% training accuracy and 70% accuracy on the test set. Not bad for a simple neural network!
You can even plot the cost as a function of iterations:
And you should see:
Cost function going down as more iterations are performed
You see that the cost is indeed going down after each iteration, which is exactly what we want.
Feel free to experiment with different learning rates and numbers of iterations to see how they affect the training time and the accuracy of the model!