
Neural Networks

Feed Forward Neural Networks — How To Successfully Build Them in Python

A detailed graphical explanation of Neural Networks with a Python example using real-life data

Feed Forward Neural Networks. Image by author.

Intro

Neural Networks have been the central talking point over the last few years. While they may initially seem intimidating, I assure you that you do not need a Ph.D. to understand how they work.

In this article, I will take you through the main ideas behind basic Neural Networks, also known as Feed Forward NNs or Multilayer Perceptrons (MLPs), and show you how to build them in Python using the TensorFlow and Keras libraries.

Contents

  • Feed Forward Neural Network’s place within the universe of Machine Learning
  • A visual explanation of how Feed Forward NNs work
    – Network structure and terminology
    – Parameters and activation functions
    – Loss functions, optimizers, and training
  • Python examples of how to build and train your own Feed Forward Neural Networks

Feed Forward Neural Network’s place within the universe of Machine Learning

Machine Learning is a vast and ever-expanding space with new algorithms developed daily. I have attempted to bring structure to this world by categorizing some of the most commonly used algorithms in the interactive chart below.

While this categorization is not perfect, it provides a general understanding of how different pieces fit together, and hopefully, it can also facilitate your data science learning journey.

I have placed Neural Networks in a distinct category recognizing their unique approach to Machine Learning. However, it is essential to remember that Neural Networks are most frequently employed to solve classification and regression problems using labeled training data. Hence, an alternative approach could be to put them under the Supervised branch of Machine Learning.

Machine Learning algorithm classification. Interactive chart created by the author.


A visual explanation of how Feed Forward NNs work

Network structure and terminology

First, let’s familiarize ourselves with the basic structure of a Neural Network.

Basic structure of a Feed Forward (FF) Neural Network. Image by author.

  • Input Layer — contains one or more input nodes. For example, suppose you want to predict whether it will rain tomorrow and base your decision on two variables, humidity and wind speed. In that case, your first input would be the value for humidity, and the second input would be the value for wind speed.
  • Hidden Layer — this layer houses hidden nodes, each containing an activation function (more on these later). Note that a Neural Network with multiple hidden layers is known as a Deep Neural Network.
  • Output Layer — contains one or more output nodes. Following the same weather prediction example above, you could choose to have only one output node generating a rain probability (where >0.5 means rain tomorrow, and ≤0.5 no rain tomorrow). Alternatively, you could have two output nodes, one for rain and another for no rain. Note, you can use a different activation function for output nodes vs. hidden nodes.
  • Connections — lines joining different nodes are known as connections. Each connection carries a kernel (weight), and each hidden and output node carries a bias; together, these are the parameters that get optimized during the training of a Neural Network. The short Keras sketch after this list shows how these pieces map onto code.
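To make the terminology concrete, below is a minimal Keras sketch of the weather example's structure (two inputs, one hidden layer with two nodes, one output node). It is purely illustrative, not the model we train later; the activation choices are assumptions for now.

```python
from tensorflow import keras

# Illustrative sketch: two input nodes (humidity, wind speed),
# one hidden layer with two nodes, one output node (rain probability)
model = keras.Sequential([
    keras.Input(shape=(2,)),                       # Input Layer
    keras.layers.Dense(2, activation='softplus'),  # Hidden Layer
    keras.layers.Dense(1, activation='sigmoid'),   # Output Layer
])

# The kernels (weights) and biases are the trainable parameters
print(model.count_params())  # 2*2 + 2 + 2*1 + 1 = 9
```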

Parameters and activation functions

Let’s take a closer look at kernels (weights) and biases to understand what they do. For simplicity, we create a basic neural network with one input node, two hidden nodes, and one output node (1–2–1).

Detailed view of how weights and biases are applied within the Feed Forward (FF) Neural Network. Image by author.

  • Kernels (weights) — used to scale input and hidden node values. Each connection typically holds a different weight.
  • Biases — used to adjust scaled values before passing them through an activation function.
  • Activation functions — think of activation functions as standard curves (building blocks) used by the Neural Network to create a custom curve to fit the training data. Passing different input values through the network selects different sections of the standard curve, which are then assembled into a final custom-fit curve.

There are many activation functions to choose from, with Softplus, ReLU, and Sigmoid being the most commonly used. Here are the shapes and equations of six frequently used activation functions in Neural Networks:

Activation functions. Image by author.
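To make the shapes above tangible, here are NumPy definitions of the three most commonly used activation functions (a quick sketch, independent of any trained model):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))    # squashes any input into (0, 1)

def relu(z):
    return np.maximum(0, z)        # zero for negative inputs, identity otherwise

def softplus(z):
    return np.log(1 + np.exp(z))   # a smooth approximation of ReLU

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))   # [0.119 0.5   0.881]
print(relu(z))      # [0. 0. 2.]
print(softplus(z))  # [0.127 0.693 2.127]
```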

As we are now familiar with kernels (weights), biases, and activation functions, let’s use the same Neural Network to calculate the probability of rain tomorrow based on today’s humidity.

Note, I have already trained this Neural Network (see the Python section below). Hence, we already know the values for the kernels (weights) and biases. The illustration below shows a step-by-step process of how an FF Neural Network takes an input value and produces the answer (output value).

Example calculation performed by Feed Forward (FF) Neural Network. Image by author.

As you can see, the above Neural Network tells us that a 50% humidity today implies a 33% probability of rain tomorrow.
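To see the same mechanics in code, here is a toy forward pass through a 1-2-1 network with sigmoid activations. The weights and biases below are made up for illustration, not the trained values from the diagram, so the output will differ from the 33% above.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical parameters, for demonstration only
w_hidden = np.array([0.05, -0.04])  # input -> hidden kernels (weights)
b_hidden = np.array([-1.0, 1.5])    # hidden node biases
w_output = np.array([2.0, -1.5])    # hidden -> output kernels (weights)
b_output = 0.5                      # output node bias

x = 50  # today's humidity (%)

# Scale the input, shift by the bias, then pass through the activation
hidden = sigmoid(w_hidden * x + b_hidden)

# Combine the hidden node outputs the same way to get the final prediction
p_rain = sigmoid(np.dot(w_output, hidden) + b_output)
print(f"P(rain tomorrow) = {p_rain:.2f}")
```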

Loss functions, optimizers, and training

Training Neural Networks involves a complicated process known as backpropagation. I will not go through a step-by-step explanation of how backpropagation works since it is a big enough topic deserving a separate article.

Instead, let me briefly introduce you to loss functions and optimizers and summarize what happens when we “train” a Neural Network.

  • Loss — represents the “size” of the error between the true values/labels and the predicted values/labels. The goal of training a Neural Network is to minimize this loss. The smaller the loss, the closer the match between the true and the predicted data. There are many loss functions to choose from, with BinaryCrossentropy, CategoricalCrossentropy, and MeanSquaredError being the most common (see the worked example after this list).
  • Optimizers — the algorithms used in backpropagation. The goal of an optimizer is to find the optimum set of kernels (weights) and biases to minimize the loss. Optimizers typically use a gradient descent approach, which allows them to iteratively find the “best” possible configuration of weights and biases. The most commonly used ones are SGD, Adam, and RMSprop.
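As a quick illustration of what a loss function actually measures, here is BinaryCrossentropy computed by hand and compared against the Keras implementation. The labels and predicted probabilities are made up for this example.

```python
import numpy as np
import tensorflow as tf

y_true = np.array([1.0, 0.0, 1.0, 1.0])   # actual rain labels
y_pred = np.array([0.9, 0.2, 0.6, 0.4])   # predicted rain probabilities

# Manual binary cross-entropy: average of -[y*log(p) + (1-y)*log(1-p)]
manual = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# The same loss computed by Keras
keras_bce = tf.keras.losses.BinaryCrossentropy()(y_true, y_pred).numpy()

print(manual, keras_bce)  # both ≈ 0.439; better predictions give a lower loss
```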

Training a Neural Network essentially means fitting a custom curve through the training data until the network approximates that data as well as possible. The graph below illustrates what a custom-fitted curve could look like in a specific scenario: a set of data that seems to flip between 0 and 1 as the input value increases.

Fitting a curve to training data. Image by author.

In general, the wide selection of activation functions combined with the ability to add as many hidden nodes as we wish (provided we have sufficient computational power) means that Neural Networks can create a curve of any shape to fit the data.

However, having this extreme flexibility may sometimes lead to overfitting the data. Hence, we must always ensure that we validate the model on the test/validation set before using it to make predictions.

Summarizing what we have learned

Feed Forward Neural Networks take one or multiple input values and apply transformations using kernels (weights) and biases before passing results through activation functions. In the end, we get an output (prediction), which is a result of this complex set of transformations optimized through training.

We train Neural Networks by fitting a custom curve through the training data, guided by loss minimization and achieved through parameter (kernels and biases) optimization.

Building and training Feed Forward Neural Networks in Python

Let’s now have some fun and build our own Neural Network. We will use historical Australian weather data to train a Neural Network that predicts whether it will rain tomorrow.

Setup

We’ll need the following data and libraries: Australian weather data from Kaggle, TensorFlow/Keras to build and train a Neural Network, pandas and NumPy for data manipulation, scikit-learn for splitting the data and evaluating results, and Plotly for visualizations.

Let’s import all the libraries:
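A minimal set of imports covering everything used below, together with a version printout, could look like this (the aliasing choices are assumptions):

```python
import pandas as pd                      # data manipulation
import numpy as np                       # arrays and numeric ranges

import tensorflow as tf                  # neural networks
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

import sklearn                           # train/test split and evaluation
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

import plotly                            # visualizations
import plotly.express as px
import plotly.graph_objects as go

# Print package versions used in this example
print('Tensorflow/Keras: %s' % keras.__version__)
print('pandas: %s' % pd.__version__)
print('numpy: %s' % np.__version__)
print('sklearn: %s' % sklearn.__version__)
print('plotly: %s' % plotly.__version__)
```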

The above code prints package versions used in this example:

Tensorflow/Keras: 2.7.0
pandas: 1.3.4
numpy: 1.21.4
sklearn: 1.0.1
plotly: 5.4.0

Next, we download and ingest Australian weather data (source: Kaggle). We also do some simple data manipulations and derive new variables for our models.
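A sketch of this step is shown below. The mean-imputation of missing values and the derived RainTomorrowFlag variable are assumptions standing in for the original manipulations:

```python
# Read in the Kaggle weather data (assumes weatherAUS.csv has been downloaded)
df = pd.read_csv('weatherAUS.csv', encoding='utf-8')

# Drop records where the target RainTomorrow is missing
df = df[pd.isnull(df['RainTomorrow']) == False]

# For simplicity, impute missing numeric values with column means
# (an assumption; the original manipulations may differ)
df = df.fillna(df.mean(numeric_only=True))

# Derive a 0/1 target flag for the Neural Network
df['RainTomorrowFlag'] = df['RainTomorrow'].apply(lambda x: 1 if x == 'Yes' else 0)

# Show a snippet of the data
print(df.head())
```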

And this is what the data looks like:

A snippet of Kaggle’s Australian weather data with some modifications. Image by author.

Neural Networks

Now we train and evaluate our Feed Forward (FF) Neural Network. I have extensively commented the code below to give you a clear understanding of what each part does, so I will not repeat the same explanations in the body of the article.

Using one input (Humidity3pm)

In short, we are using humidity at 3 pm today to predict whether it will rain tomorrow or not. Our Neural Network has a simple structure (1–2–1) analyzed earlier in this article: one input node, two hidden nodes, and one output node.

A couple of things to note:

  • The code below performs validation twice, once on a portion of the X_train data (see validation_split in step 5) and again on a test sample created in step 2. Of course, there is no need to do it twice, so feel free to use either method to validate your model.
  • The data is imbalanced (more sunny days than rainy days), so I have adjusted the class_weight parameter in step 5. A sketch of the full routine follows below.
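The step numbers in this sketch match the notes above; hyperparameters such as the batch size, epoch count, and class weights are illustrative assumptions rather than the exact values used.

```python
##### Step 1 - Select the data
X = df[['Humidity3pm']]                # input
y = df['RainTomorrowFlag'].values      # target

##### Step 2 - Create training and testing samples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

##### Step 3 - Specify the structure of the Neural Network (1-2-1)
model = Sequential(name="Model-with-One-Input")
model.add(Input(shape=(1,), name='Input-Layer'))
model.add(Dense(2, activation='softplus', name='Hidden-Layer'))
model.add(Dense(1, activation='sigmoid', name='Output-Layer'))

##### Step 4 - Compile the model with a loss function and an optimizer
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

##### Step 5 - Fit the model, holding out part of the training data for
#####          validation and weighting classes to counter the imbalance
model.fit(X_train, y_train,
          batch_size=32,                   # illustrative
          epochs=10,                       # illustrative
          validation_split=0.2,
          class_weight={0: 0.3, 1: 0.7})   # illustrative weights

##### Step 6 - Print the model summary and evaluate on the test sample
model.summary()
pred_labels = (model.predict(X_test) > 0.5).astype(int).flatten()
print(classification_report(y_test, pred_labels))
```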

Training a Feed Forward (FF) Neural Network. Gif image by author.

The above code prints the following summary and evaluation metrics for our 1–2–1 Neural Network:

1–2–1 Feed Forward (FF) Neural Network performance. Image by author.

Note that the weights and biases for this model differ from the ones in the calculated example earlier in this article. This is because Neural Network training uses a stochastic (random) approach within the optimizer algorithms. Hence, your model will be different every time you re-train it.

Let’s now plot the prediction curve on a chart.
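A sketch of the plotting code, assuming the fitted model and the df from the previous steps (the sample size and styling are illustrative):

```python
# Score the model across the full 0-100% humidity range
humidity_range = np.linspace(0, 100, 101).reshape(-1, 1)
rain_proba = model.predict(humidity_range).flatten()

# Scatter a sample of the actual observations and overlay the prediction curve
fig = px.scatter(df.sample(1000, random_state=0),
                 x='Humidity3pm', y='RainTomorrowFlag', opacity=0.2)
fig.add_trace(go.Scatter(x=humidity_range.flatten(), y=rain_proba,
                         mode='lines', name='Prediction curve'))
fig.show()
```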

Prediction curve produced by the Neural Network with one input. Image by author.

Using two inputs (WindGustSpeed and Humidity3pm)

Let’s see how the network and predictions change when we use two inputs (WindGustSpeed and Humidity3pm) to train a Neural Network that has a 2–2–1 structure.

Feel free to experiment in your own time by training a model with 17 inputs and a different number of hidden nodes.
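A sketch of the 2-2-1 version follows the same steps as before, again with illustrative hyperparameters:

```python
# Steps 1-2 - Select the two inputs and split the data
X2 = df[['WindGustSpeed', 'Humidity3pm']]
y2 = df['RainTomorrowFlag'].values
X2_train, X2_test, y2_train, y2_test = train_test_split(
    X2, y2, test_size=0.2, random_state=0)

# Steps 3-4 - Specify and compile a 2-2-1 Neural Network
model2 = Sequential(name="Model-with-Two-Inputs")
model2.add(Input(shape=(2,), name='Input-Layer'))
model2.add(Dense(2, activation='softplus', name='Hidden-Layer'))
model2.add(Dense(1, activation='sigmoid', name='Output-Layer'))
model2.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Step 5 - Fit with the same validation and class-weighting approach
model2.fit(X2_train, y2_train, batch_size=32, epochs=10,
           validation_split=0.2, class_weight={0: 0.3, 1: 0.7})

# Step 6 - Print the summary and evaluate on the test sample
model2.summary()
pred_labels2 = (model2.predict(X2_test) > 0.5).astype(int).flatten()
print(classification_report(y2_test, pred_labels2))
```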

And the results are:

2–2–1 Feed Forward (FF) Neural Network model performance. Image by author.

Since we used two inputs, we can still visualize the predictions. However, this time we need a 3D chart to do it:
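A sketch of the 3D chart, assuming model2 from above (the mesh bounds are illustrative):

```python
# Build a mesh over the two input ranges
wind_range = np.linspace(0, 130, 50)
humidity_range = np.linspace(0, 100, 50)
xx, yy = np.meshgrid(wind_range, humidity_range)

# Predict the rain probability at every point on the mesh
grid = np.c_[xx.ravel(), yy.ravel()]
zz = model2.predict(grid).reshape(xx.shape)

# Plot the curved prediction surface
fig = go.Figure(data=[go.Surface(x=wind_range, y=humidity_range, z=zz)])
fig.update_layout(scene=dict(xaxis_title='WindGustSpeed',
                             yaxis_title='Humidity3pm',
                             zaxis_title='P(rain tomorrow)'))
fig.show()
```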

Curved prediction surface produced by the Neural Network with two inputs. Image by author.

Conclusions

Neural Networks are not as scary as they seem at first. I sincerely hope you enjoyed reading this article and obtained some new knowledge.

Feel free to use the code provided in this article to build your own Neural Networks. Also, you can find the complete Jupyter Notebook in my GitHub repository.

As I try to make my articles more useful for readers, I would appreciate it if you could let me know what has driven you to read this piece and whether it has given you the answers you were looking for. If not, what was missing?

Cheers! 👏
Saul Dobilas