
A step-by-step neural network tutorial for beginners

Photo by Franck V. on Unsplash

Before starting the tutorial, you must have a goal. Why do you need this tutorial?

The answer is simple: maybe you have something in mind to build using a neural network, or maybe you just don’t want to miss out on this technology.

If you don’t yet know what this technology can do, I will give you several examples of implementations.

If you want to explore more about neural networks and deep learning, you can download the ebook here.

Neural network examples

From simple problems to very complicated ones, neural networks have been used in various industries. Here are several examples of where neural networks have been used:

  • banking — you can see many big banks betting their future on this technology, from predicting how much money they need to put inside each ATM (optimizing the refill trips back and forth) to replacing older technology for detecting fraudulent credit card transactions.
  • advertising — big advertising companies like Google Adsense deploy neural networks to further optimize their ad choice in relevancy. This results in better targeting and an associated increase in the Click-Through Rate.
  • healthcare — many academics and start-ups are trying to solve difficult problems that were unsolved before. Examples include clinical imaging to assist doctors in reading MRIs and genomics where DNA sequences are read.
  • automotive — the self-driving car is of huge interest. A huge deal. But I still doubt they can deploy those cars on the congested roads of Jakarta…

Photo by Roman Mager on Unsplash

So, what is a neural network?

A neural network is a technology based on the structure of the neurons inside the human brain.

Neuron by MyBrainNotes

Take a look at the image of the neuron above. Each neuron is a processing unit of our brain. Each neuron tries to stimulate other neurons via its axon terminals, telling them which should become active and which should remain inactive.

By doing that over and over across multiple neurons (FYI, we have around 100 billion neurons in our brains), our brain can process complex things and solve problems.

The artificial neural network

It was around the 1940s when Warren McCulloch and Walter Pitts created the so-called predecessor of all neural networks. But it was Geoffrey Hinton who brought the algorithm to the surface, by helping popularize its learning algorithm, called backpropagation.

Neural Network

In simple terms, a Neural network algorithm will try to create a function to map your input to your desired output.

As an example, you want the program to output “cat”, given an image of a cat.

Take a look at the image. The cat image is the input in the input layer, while “cat” will be on the output layer. The hidden layers are the function that maps the image to the correct category.

Because the structure of the algorithm resembles the structure of human neurons, it is called a neural network.

Don’t worry, after doing this tutorial, you can also build your own Neural network.

So, without delay, let’s start the Neural Network tutorial.

Neural Network Tutorial with Python

Why Python? Well, Python is the language with the most complete set of neural network libraries.

For this tutorial, I will use Keras.

Keras is a higher-level abstraction for the popular neural network library, TensorFlow. Because of the high level of abstraction, you don’t have to implement the low-level linear algebra and multivariate calculus yourself. This simplifies your workflow, and you get what you want anyway.

To get started, you need to set up the required software.

First, of course, you need Python. You can download it from their website. You need version 3.6+ for this neural network tutorial.

After that, use pip to install TensorFlow, which includes Keras, and Jupyter:

pip install tensorflow
pip install jupyter

Now you are ready for action!

Fashion MNIST, the not so common tutorial

MNIST, the handwritten digit dataset, is often used in neural network tutorials. While this is good for starting, what is the use of understanding handwritten digits?

And FYI, solving MNIST with a very simple neural network can get you to 95% accuracy without any fine-tuning. It’s too simple for us to learn anything from.

That’s why we will not use MNIST, but another dataset called Fashion MNIST. Fashion MNIST is a dataset of ten categories of clothing and accessories, in grayscale.

Fashion MNIST.

There are 70,000 of these images available to us. Each image is 28×28 pixels in grayscale.

The image categories are:

  • 0: T-shirt/top
  • 1: Trouser
  • 2: Pullover
  • 3: Dress
  • 4: Coat
  • 5: Sandal
  • 6: Shirt
  • 7: Sneaker
  • 8: Bag
  • 9: Ankle boot

The purpose of the tutorial is to accurately assign each image to one of these ten categories.

The preparation

Now, after getting the data, what’s next?

The next step is the train-validation-test split, which divides your data into three portions.

The training data is self-explanatory. Generally, a larger quantity of training data helps your neural network better understand the data distribution. More data makes your trained network perform better. Always put the priority on this portion of the split.

Next is the validation data. This is the portion of data that the network is evaluated against during the training process, and it is used to estimate the prediction error.

Finally, the test data. This is the data used to evaluate the final neural network model. If the network performs well on the test data, you can bring it to production.

Train-validation-test split

There is a rule of thumb for splitting your data. If you don’t have much data, perhaps thousands or tens of thousands of samples, use 70–10–20 as the split strategy: 70% of the data goes to training, 10% to validation, and 20% to the test set.

However, if you have millions of samples, then 90–5–5 is a better split strategy. And if you have even more than that, you can use 98–1–1.

The data provided by Keras is already split into training and testing sets, with 60K images for training and 10K for testing. For validation, let’s take 10% of the training data.

So, it will be 54K images for training, 6K images for validation, and 10K images for testing.
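Later we will let Keras carve out the validation portion automatically with a validation_split argument, but here is a minimal sketch of doing the same split by hand, using the arrays we will load shortly with fashion_mnist.load_data() and matching Keras’ behavior of taking the validation samples from the end of the arrays:

# Hold out the last 10% (6K) of the 60K training images for validation
val_count = int(0.1 * len(x_train))                       # 6000
x_val, y_val = x_train[-val_count:], y_train[-val_count:]
x_train2, y_train2 = x_train[:-val_count], y_train[:-val_count]
print(len(x_train2), len(x_val))                          # 54000 6000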

Neural network construction

Now, you know what to do to prepare the data. Let’s get into the action.

Type jupyter notebook in your command line to get started.

Your browser will open up a Jupyter window. Using Jupyter Notebook, you can code Python interactively.

Then do the set-up imports:

from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

The Fashion MNIST dataset is already included in Keras’ own collection. For other datasets, you might want to import the images via OpenCV or the Python Imaging Library (Pillow) to make them ready for processing and training.
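As an aside, here is a minimal sketch of preparing one of your own images with Pillow (the file name photo.png is hypothetical):

from PIL import Image
import numpy as np

# Load a hypothetical image, convert to grayscale, resize to 28x28
img = Image.open('photo.png').convert('L').resize((28, 28))
arr = np.array(img)    # shape (28, 28), pixel values 0-255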

For our Fashion MNIST, let’s just load the data:

(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
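A quick sanity check on what was loaded:

print(x_train.shape, y_train.shape)    # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)      # (10000, 28, 28) (10000,)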

Okay, you are ready now to create your own neural network.

But how?

Important points

For every neural network project you will do in the future, these rules always apply.

  1. Start simple. Use a single-layer perceptron and evaluate the result. If it is good, then proceed to deployment.
  2. If the previous step is not good enough, try to make your network wider and/or deeper. Add several neurons to your single-layer perceptron, or add one layer to the existing network. Evaluate and, if it is good, proceed to deployment. If not, iterate by adding more neurons or layers.
  3. If, after adding several more layers, the results are still not good, then maybe you need to change your network architecture. Use a Convolutional Neural Network (CNN) for images or a Recurrent Neural Network (RNN) for time series and text.

Follow those three steps, and your results will keep getting better.

Let’s apply the steps to our problem

Single-layer perceptron

Let’s start our neural network with a perceptron.

What is a perceptron?

Single layer perceptron by LearnOpenCV.

The usual neural network image you see everywhere is the perceptron diagram. There are three layers in the image above: the input layer, one hidden layer, and the output layer. Every network always has exactly one input layer and one output layer.

Therefore, I count only the number of hidden layers to gauge how deep a network is. Because there is only one hidden layer in that image, I call it a single-layer perceptron.

Our images are 28×28 and therefore two-dimensional. Because our perceptron is only able to read one-dimensional data, let’s flatten them:

# Flatten each 28x28 image into a 784-length vector and scale pixels to [0, 1]
x_train = x_train.reshape(x_train.shape[0], -1) / 255.0
x_test = x_test.reshape(x_test.shape[0], -1) / 255.0
# One-hot encode the labels into ten-element vectors
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

You will see that the size changed to 784 because of the flattening. Print x_train.shape and x_test.shape to see your new data size.

Your training data x_train is transformed from 60,000 x 28 x 28 to 60,000 x 784. Your testing data x_test follows suit, from 10,000 x 28 x 28 to 10,000 x 784.
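You can confirm this yourself:

print(x_train.shape)    # (60000, 784)
print(x_test.shape)     # (10000, 784)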

For the hidden layer, let’s set an arbitrary number of neurons. The number should be simple and small enough to follow our step number 1. Let’s choose 10 neurons.

For the output layer, because we have ten categories to predict, we need to set it to 10 output neurons. For each image, the neuron for the correct category should be filled with 1 and the others with 0.

For example, if you have a Sandal image, then the output layer should hold something like [0 0 0 0 0 1 0 0 0 0]. The index for the Sandal category (5) should be 1, and the others should be 0. Remember, the array is zero-indexed, so the sixth item has index 5.

An output target like this is called a one-hot vector: the “hot” position holds the value 1, and all the others are zeros.
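You can see this directly by one-hot encoding the Sandal label yourself:

print(to_categorical([5], num_classes=10))
# [[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]]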

Photo by Thomas Jensen on Unsplash

Back to our architecture

# The Sequential model stacks layers one after another
model = Sequential()
# Hidden layer: 10 cells, fed by the 784 flattened pixel inputs
model.add(Dense(10, input_dim=784, activation='relu'))
# Output layer: 10 cells, one per clothing category
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

The Sequential model is the easiest model Keras has. The first model.add call adds your hidden layer with 10 cells, fed by the 784 input cells.

The second add call adds your output layer to the network. This has 10 cells, as I elaborated before.

The relu and softmax activation options are non-linear. These non-linearities are what make neural networks particularly useful: generally, neural networks can map data distributions at almost any level of complexity.

You don’t have to know exactly what relu and softmax are; they are too complex a topic for a beginner. You just need to follow these tips:

  1. Use Relu whenever possible, on every hidden layer.
  2. Use Softmax on output layers with more than two categories to be predicted.
  3. Use Sigmoid on an output layer with two categories.

After creating your model, call the compile method to finish it. It usually takes three parameters. Always use categorical_crossentropy as the loss for multiple categories, and binary_crossentropy for two categories. Use adam or rmsprop as the optimizer; both are pretty good. And use accuracy as the metric to check your network’s performance.
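For contrast, here is a minimal sketch of how tips 1 and 3 would combine for a hypothetical two-category problem (not part of this tutorial):

# Hypothetical binary classifier: one sigmoid output cell
# and binary_crossentropy instead of categorical_crossentropy
binary_model = Sequential()
binary_model.add(Dense(10, input_dim=784, activation='relu'))
binary_model.add(Dense(1, activation='sigmoid'))
binary_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])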

Let’s train

model.fit(x_train, y_train, epochs=10, validation_split=0.1)

As I said before, we will use 10% of the training data as validation data, hence validation_split is set to 0.1.

epochs is the number of training loops we will do. One epoch exposes all of our training data to the network once. More epochs means the network will get to know our data better.

And the result is:

Epoch 10/10
54000/54000 [==============================] - 9s 160us/step - loss: 0.4075 - acc: 0.8598 - val_loss: 0.4305 - val_acc: 0.8522

Pretty good. You get 85% accuracy on validation data.

The acc number is the accuracy on the training data, while val_acc is the accuracy on the validation data. The validation accuracy is what matters: since the network does not train on the validation data, it shows how well the network can generalize.

Let’s check the testing data:

_, test_acc = model.evaluate(x_test, y_test)
print(test_acc)

And you will get around 84% accuracy on test data. Good enough for this simple architecture.

Accuracy

This is a metric that measures how well your network performs. 84% accuracy on the test data means the network guessed right for around 8,400 of the 10K test images.
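If you want to verify that count yourself, here is a minimal sketch of computing the accuracy by hand from the network’s predictions:

import numpy as np

pred = model.predict(x_test)          # (10000, 10) softmax scores
guessed = np.argmax(pred, axis=1)     # predicted category per image
truth = np.argmax(y_test, axis=1)     # true category per image
print(np.mean(guessed == truth))      # fraction guessed right, around 0.84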

A higher accuracy on test data means a better network. If you think the accuracy should be higher, move on to the next step(s) in building your neural network.

Make the network wider

model2 = Sequential()
model2.add(Dense(50, input_dim=784, activation='relu'))
model2.add(Dense(10, activation='softmax'))

model2.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model2.fit(x_train, y_train, epochs=10, validation_split=0.1)

This changes the number of hidden layer cells: we’ve increased it from 10 to 50.

Let’s check out the results:

Epoch 10/10
54000/54000 [==============================] - 9s 167us/step - loss: 0.2735 - acc: 0.9006 - val_loss: 0.3703 - val_acc: 0.8653

A whopping 86% accuracy on the validation data. Good! It shows that making the network wider can increase performance.

Let’s check our test data:

_, test_acc = model2.evaluate(x_test, y_test)
print(test_acc)

Yup, it increased to 86% too. Pretty good! That is around 200 more of the 10K test images guessed right.

But I want more.

Create a deeper network

model3 = Sequential()
model3.add(Dense(50, input_dim=784, activation='relu'))
model3.add(Dense(50, activation='relu'))
model3.add(Dense(10, activation='softmax'))

model3.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model3.fit(x_train, y_train, epochs=10, validation_split=0.1)

Let’s add one more hidden layer with 50 cells.

And check the results:

Epoch 10/10
54000/54000 [==============================] - 9s 170us/step - loss: 0.2648 - acc: 0.9008 - val_loss: 0.3417 - val_acc: 0.8738

Validation accuracy increased by about 1%. Better performance.

How about our test data?

_, test_acc = model3.evaluate(x_test, y_test)
print(test_acc)

Hmm. It is 86.9% accurate. The improvement is not that big.

What’s wrong?

Maybe using a perceptron on images is not the right approach. How about we change it to a…

Convolutional neural network

A convolutional neural network (CNN) is a neural network that can “see” a subset of our data at a time. It can detect patterns in images better than a perceptron can. Read more in the Convolutional Neural Network Tutorial on my blog.

Let’s apply a CNN to our problem:

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten
import numpy as np

# Reload the data, keep the 2D image shape, and add a channel axis
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train = x_train[:, :, :, np.newaxis] / 255.0
x_test = x_test[:, :, :, np.newaxis] / 255.0
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

Let’s import the necessary methods and reshape our training data. We can’t flatten it this time, because a CNN reads our images as they are.

If you check x_train, you will see it holds 60,000 x 28 x 28 x 1 data.

Why x 1?

The data a CNN reads must be shaped like this: total_data x width x height x channels.

Height and width are self-explanatory. Channels are like Red, Green, and Blue in RGB images. In RGB there are three channels, so such data would end in x 3. But because we work with grayscale images, every value in the Red, Green, and Blue channels is the same, and we reduce it to one channel.
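To make that concrete, here is a minimal sketch (with a hypothetical rgb array) of how three identical channels collapse into one:

import numpy as np

rgb = np.zeros((28, 28, 3))                  # hypothetical RGB image
gray = rgb.mean(axis=-1, keepdims=True)      # identical channels average into one
print(gray.shape)                            # (28, 28, 1)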

Let’s build the architecture:

model4 = Sequential()
# Convolution: 64 filters of size 2x2 slide over the 28x28x1 image
model4.add(Conv2D(filters=64, kernel_size=2, padding='same', activation='relu', input_shape=(28, 28, 1)))
# Pooling: halve the width and height, 28x28 -> 14x14
model4.add(MaxPooling2D(pool_size=2))
# Flatten the 14x14x64 feature maps into one long vector
model4.add(Flatten())
model4.add(Dense(10, activation='softmax'))

model4.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

The code is a little bit different. There are Conv2D, MaxPooling2D, and Flatten.

These are the three most common layers to use in a CNN.

model4.summary()

will show you what is inside the network:

Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 28, 28, 64) 320
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 14, 64) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 12544) 0
_________________________________________________________________
dense_1 (Dense) (None, 10) 125450
=================================================================

conv2d changes your 28x28x1 image to 28x28x64. Just imagine this as 64 hidden layer cells.

MaxPooling2D reduces the width and height so that you will not need to compute all the cells. It reduces the size to 14x14x64.

Finally, flatten just flattens out the output of MaxPooling into a hidden layer of 12544 cells.
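You can verify the parameter counts in the summary with a bit of arithmetic:

# Conv2D: (kernel height * kernel width * input channels + 1 bias) per filter
print((2 * 2 * 1 + 1) * 64)    # 320
# Flatten: 14 * 14 * 64 cells coming out of MaxPooling2D
print(14 * 14 * 64)            # 12544
# Dense: one weight per input cell per output cell, plus 10 biases
print(12544 * 10 + 10)         # 125450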

Let’s check it out:

model4.fit(x_train, y_train, epochs=10, validation_split=0.1)

The validation result is:

Epoch 10/10
54000/54000 [==============================] - 42s 774us/step - loss: 0.1890 - acc: 0.9318 - val_loss: 0.2660 - val_acc: 0.9083

Oh yeah! It is more than 90%. A single CNN layer can do this.

How about the test data?

_, test_acc = model4.evaluate(x_test, y_test)
print(test_acc)

Wow, it gives you 90.25% accuracy.

Changing the architecture into a more suitable one really works. And I always suggest you do so.

Conclusion

The process of building a neural network is pretty much like this. Follow my three steps and you will do just fine.

On traditional datasets, like those in your company database, you can follow my steps from the very beginning and gradually complicate the network. But for images or text, it is actually better to jump straight to the most suitable architecture. Still, keep it as simple as possible for your first step.

Thanks for reading this. I hope this article helps you build better neural networks. And you can learn AI from this AI course here.

Don’t forget to give us your 👏 !