Deep Learning: Feed Forward Neural Networks (FFNNs)

a.k.a. Multi-Layered Perceptrons (MLPs)

A Deep Feed Forward Neural Network (FFNN) — aka Multi-Layered Perceptron (MLP)

An Artificial Neural Network (ANN) is made of many interconnected neurons:

A single Neuron from an Artificial Neural Network (ANN)

Each neuron takes in some floating point numbers (e.g. 1.0, 0.5, -1.0) and multiplies them by some other floating point numbers (e.g. 0.7, 0.6, 1.4) known as weights (1.0 * 0.7 = 0.7, 0.5 * 0.6 = 0.3, -1.0 * 1.4 = -1.4). The weights act as a mechanism to focus on, or ignore, certain inputs.

Weights act as soft gates to ignore some features (0) and focus on others (+1) or even inhibit them (-1)

The weighted inputs then get summed together (e.g. 0.7 + 0.3 + -1.4 = -0.4) along with a bias value (e.g. -0.4 + -0.1 = -0.5).
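The arithmetic above can be reproduced in a few lines of NumPy. This is only a minimal sketch using the example numbers from the text; the variable names are our own.

```python
import numpy as np

inputs = np.array([1.0, 0.5, -1.0])
weights = np.array([0.7, 0.6, 1.4])
bias = -0.1

# Multiply each input by its weight, then add everything up along with the bias
weighted_inputs = inputs * weights            # roughly 0.7, 0.3, -1.4
weighted_sum = weighted_inputs.sum() + bias   # roughly -0.5

# The same calculation written as a dot product
dot_form = inputs @ weights + bias
print(weighted_sum, dot_form)
```

Because these are floating point numbers, the printed values may differ from -0.5 by a tiny rounding error.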

The summed value (x) is now transformed into an output value (y) according to the neuron’s activation function (y = f(x)). Some popular activation functions are shown below:

A small selection of Popular Activation Functions

e.g. -0.5 → -0.05 if we use the Leaky Rectified Linear Unit (Leaky ReLU) activation function: y = f(x) = f(-0.5) = max(0.1*-0.5, -0.5) = max(-0.05, -0.5) = -0.05
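To make the activation step concrete, here is a small sketch of a few well-known activation functions in plain Python. The selection (Leaky ReLU, ReLU, sigmoid, tanh) and the function names are our own choice for illustration, and the Leaky ReLU slope of 0.1 is taken from the example above.

```python
import math

def leaky_relu(x, slope=0.1):
    # Passes positive values through and scales negative values by `slope`
    return max(slope * x, x)

def relu(x):
    # Passes positive values through and replaces negative values with 0
    return max(0.0, x)

def sigmoid(x):
    # Squashes any value into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# Apply each activation function to the summed value from the example
for f in (leaky_relu, relu, sigmoid, math.tanh):
    print(f.__name__, f(-0.5))
```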

The neuron’s output value (e.g. -0.05) is often an input for another neuron.

A Neuron’s output value often feeds in as an input to other Neurons in the Artificial Neural Network (ANN)

The Perceptron, one of the first Neural Networks, is made of just a single Neuron

However, one of the first ANNs was known as the perceptron and it consisted of only a single neuron.

The Perceptron

The output of the perceptron’s (only) neuron acts as the final prediction.

Each Neuron is a linear binary classifier all on its own (e.g. an output value >= 0 indicates the blue class, while an output value < 0 indicates the red class)

Let's code our own Perceptron:

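import numpy as np

class Neuron:

    def __init__(self, n_inputs, bias=0., weights=None):
        self.b = bias
        # Use the given weights if provided, otherwise start from random ones
        if weights is not None:
            self.ws = np.array(weights)
        else:
            self.ws = np.random.rand(n_inputs)

    def __call__(self, xs):
        # Weighted sum of the inputs plus the bias, passed through the activation
        return self._f(xs @ self.ws + self.b)

    def _f(self, x):
        # Leaky ReLU with a slope of 0.1 for negative values
        return max(x * .1, x)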

(Note: we have not included any learning algorithm in our example above — we shall cover learning algorithms in another tutorial)

perceptron = Neuron(n_inputs = 3, bias = -0.1, weights = [0.7, 0.6, 1.4])
perceptron([1.0, 0.5, -1.0])

-0.04999999999999999

Notice that by adjusting the values of the weights and bias, you can adjust the neuron’s decision boundary. (NB: a neuron learns by updating its weights and bias values to reduce the error of its decisions).
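As a rough illustration of how the weights and bias move the decision boundary, the sketch below classifies the same point under three different settings. The helper classify and all the numbers are hypothetical; it uses the sign of the weighted sum, which matches the sign of the Leaky ReLU output.

```python
import numpy as np

def classify(point, weights, bias):
    """Return 'blue' if the weighted sum is >= 0, otherwise 'red'."""
    return "blue" if np.dot(point, weights) + bias >= 0 else "red"

point = [1.0, 2.0]

# Same point, different weights and bias, different decisions
print(classify(point, weights=[1.0, -1.0], bias=0.0))   # 1 - 2 + 0 = -1    -> red
print(classify(point, weights=[1.0, -1.0], bias=1.5))   # 1 - 2 + 1.5 = 0.5 -> blue
print(classify(point, weights=[-1.0, 1.0], bias=0.0))   # -1 + 2 + 0 = 1    -> blue
```

Changing the bias shifts the boundary, while changing the weights rotates it.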

So why do we need so many neurons in an ANN if any one will suffice (as a classifier)?

Limitations: The neuron is a binary classifier since it can distinguish between at most two classes (e.g. blue and red). The neuron is a linear classifier because its decision boundary is a straight line for 2D data (or a flat plane for 3D data, etc).

Unfortunately, individual neurons are only able to classify linearly separable data.

However, by combining neurons together, we essentially combine their decision boundaries. Therefore, an ANN composed of many neurons is able to learn complex, non-linear decision boundaries.

Combining Neurons allows Neural Networks to learn more complex, Nonlinear Decision Boundaries
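A classic case of data that is not linearly separable is XOR, where the class is "blue" only when exactly one of the two inputs is 1. The sketch below shows that three neurons arranged in two layers can handle it. The weights are hand-picked rather than learned, and we assume a plain ReLU in the hidden layer for simplicity.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Hidden layer: two neurons, one row of weights each (hand-picked, not learned)
W_hidden = np.array([[1.0, 1.0],    # neuron 1 computes relu(x1 + x2)
                     [1.0, 1.0]])   # neuron 2 computes relu(x1 + x2 - 1)
b_hidden = np.array([0.0, -1.0])

# Output neuron: combines the two hidden outputs into one decision value
w_out = np.array([1.0, -2.0])
b_out = -0.5

def xor_net(x):
    hidden = relu(W_hidden @ np.asarray(x, dtype=float) + b_hidden)
    return hidden @ w_out + b_out

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "blue" if xor_net(x) >= 0 else "red")
```

No single neuron can draw one straight line that separates these four points correctly, but the combination of two hidden neurons and one output neuron can.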

Neurons are connected together according to a specific network architecture. Though there are different architectures, nearly all of them contain layers. (NB: Neurons in the same layer do not connect with one another)

Neural Networks contain Layers

There is typically an input layer (containing a number of neurons equal to the number of input features in the data), an output layer (containing a number of neurons equal to the number of classes, or to the number of predicted values in a regression task) and a hidden layer (containing any number of neurons).

Deep neural networks contain multiple hidden layers

There can be more than one hidden layer to allow the neural net to learn more complex decision boundaries (Any neural net with more than one hidden layer is considered a deep neural net).
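To show how values flow through the layers, here is a minimal forward pass in NumPy. The layer sizes, the random weights and the function name forward are assumptions for illustration only, and nothing is learned here.

```python
import numpy as np

rng = np.random.default_rng(0)     # fixed seed so the sketch is repeatable
layer_sizes = [2, 4, 4, 3]         # input, two hidden layers, output (illustrative sizes)

# One weight matrix and one bias vector per pair of consecutive layers
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    activation = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        z = activation @ W + b                  # weighted sums for every neuron in the layer
        activation = np.maximum(0.1 * z, z)     # Leaky ReLU applied element-wise
    return activation

output = forward([1.0, -1.0])
print(output.shape)
```

Each layer's outputs become the next layer's inputs, which is what makes the network "feed forward".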

Let's build a deep NN to paint this picture:

An example image that our ANN will learn to paint (It will be learning to associate certain colours to certain regions of the picture)

Let's download the image and load its pixels into an array:

!curl -o twitter-logo.jpg "https://pmcvariety.files.wordpress.com/2018/04/twitter-logo.jpg?w=100&h=100&crop=1"

from PIL import Image
import numpy as np

# Quoting the URL stops the shell from treating "&" as a command separator
image = Image.open('twitter-logo.jpg').convert('RGB')
image_array = np.asarray(image)

Now teaching our ANN to paint is a supervised learning task, so we need to create a labelled training set (Our training data will have inputs and expected output labels for each input). The training inputs will have 2 values (the x,y coordinates of each pixel).

Given the simplicity of the image, we could actually approach this problem in one of two ways: as a classification problem (where the neural net predicts whether a pixel belongs to the “blue” class or the “grey” class, given its xy coordinates) or as a regression problem (where the neural net predicts RGB values for a pixel given its coordinates).

If we treat this as a regression problem, the training outputs will have 3 values (the normalised r,g,b values for each pixel). Let's use this method for now.

training_inputs, training_outputs = [], []
for row, rgbs in enumerate(image_array):
    for column, rgb in enumerate(rgbs):
        training_inputs.append((row, column))
        r, g, b = rgb
        training_outputs.append((r/255, g/255, b/255))
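The same kind of training set can also be built without explicit loops. The sketch below uses a hypothetical 2x3 array, tiny_image, as a stand-in for the real image so that it runs on its own; the variable names are ours.

```python
import numpy as np

# Hypothetical 2x3 RGB image standing in for the real image array
tiny_image = np.array([[[255, 255, 255], [0, 0, 255], [0, 0, 0]],
                       [[255, 0, 0], [0, 255, 0], [128, 128, 128]]], dtype=np.uint8)

height, width, _ = tiny_image.shape

# Row and column index of every pixel, in the same order as the nested loops
rows, columns = np.indices((height, width))
inputs = np.column_stack([rows.ravel(), columns.ravel()])

# One normalised (r, g, b) triple per pixel, each value in [0, 1]
outputs = tiny_image.reshape(-1, 3) / 255

print(inputs.shape, outputs.shape)
```

For larger images this version is usually much faster than the nested loops, though both give the same pairs of inputs and outputs.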

Now let's create our ANN:

A fully-connected feed-forward neural network (FFNN) — aka A multi-layered perceptron (MLP)

It should have 2 neurons in the input layer (since there are 2 values to take in: x & y coordinates).
It should have 3 neurons in the output layer (since there are 3 values to learn: r, g, b).
The number of hidden layers and the number of neurons in each hidden layer are two hyperparameters to experiment with (as well as the number of epochs we will train it for, the activation function, etc) — I’ll use 10 hidden layers with 100 neurons in each hidden layer (making this a deep neural network)

from sklearn.neural_network import MLPRegressor
ann = MLPRegressor(hidden_layer_sizes= tuple(100 for _ in range(10)))
ann.fit(training_inputs, training_outputs)
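For a sense of scale, we can count how many numbers a fully-connected network of this shape (2 inputs, 10 hidden layers of 100 neurons, 3 outputs) has to learn. This is a back-of-the-envelope sketch; it assumes the input layer simply passes its values through and so has no weights or biases of its own.

```python
layer_sizes = [2] + [100] * 10 + [3]

# Every neuron has one weight per neuron in the previous layer, plus one bias
n_weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
n_biases = sum(layer_sizes[1:])

print(n_weights, n_biases, n_weights + n_biases)   # 90500 1003 91503
```

That is over 90,000 adjustable values for an image of roughly 100 by 100 pixels, so the network has plenty of capacity for this task.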

The trained network can now predict the normalised rgb values for any coordinates (e.g. x,y = 1,1).

ann.predict([[1,1]])

array([[0.95479563, 0.95626562, 0.97069882]])

Let's use the ANN to predict the rgb values for every coordinate and display the predicted rgb values for the entire image to see how well it did (qualitatively; we shall leave evaluation metrics for another tutorial).

# Clip to [0, 1] so that out-of-range predictions cannot overflow the 8-bit pixel values
predicted_outputs = np.clip(ann.predict(training_inputs), 0, 1)
predicted_image_array = np.zeros_like(image_array)
i = 0
for row, rgbs in enumerate(predicted_image_array):
    for column in range(len(rgbs)):
        r, g, b = predicted_outputs[i]
        predicted_image_array[row][column] = [r*255, g*255, b*255]
        i += 1
Image.fromarray(predicted_image_array)

Our ANN’s Painting (predicted pixel colours)
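Turning a flat list of predicted colours back into an image can also be done in one step. The sketch below uses a handful of made-up predictions for a 2x2 image (the real ones would come from the trained network) and shows why clipping matters, since a regressor is free to predict values slightly below 0 or above 1.

```python
import numpy as np

# Hypothetical predictions for a 2x2 image; note the values below 0 and above 1
predictions = np.array([[0.5, 0.5, 0.5], [1.2, 0.0, -0.1],
                        [0.0, 1.0, 0.25], [0.9, 0.9, 0.9]])
height, width = 2, 2

# Keep every value inside [0, 1], scale to 0..255, then restore the image shape
clipped = np.clip(predictions, 0.0, 1.0)
pixels = np.rint(clipped * 255).astype(np.uint8).reshape(height, width, 3)

print(pixels[0, 1])   # the out-of-range pixel becomes [255, 0, 0]
```

The resulting array has the same layout as the original image array, so it could be passed to Image.fromarray in the same way.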

Try changing the hyperparameters to get better results.

If instead of treating this as a regression problem, we treat this as a classification problem, then the training outputs will have 2 values (the probabilities of the pixel belonging to each of the two classes: “blue” and “grey”)

training_inputs, training_outputs = [], []
for row, rgbs in enumerate(image_array):
    for column, rgb in enumerate(rgbs):
        training_inputs.append((row, column))
        # rgb.sum() widens the 8-bit values before adding, so the total cannot wrap around
        if rgb.sum() <= 600:
            label = (0, 1)  # blue class
        else:
            label = (1, 0)  # grey class
        training_outputs.append(label)

We can rebuild our ANN as a binary classifier with 2 neurons in the input layer, 2 neurons in the output layer and 10 hidden layers of 100 neurons each.

from sklearn.neural_network import MLPClassifier
ann = MLPClassifier(hidden_layer_sizes= tuple(100 for _ in range(10)))
ann.fit(training_inputs, training_outputs)

We can now use the trained ANN to predict the class which each pixel belongs to (0: “grey” or 1: “blue”). The prediction contains one value per class for each pixel, and the argmax function finds the index of the class with the highest value.

np.argmax(ann.predict([[1,1]]))

(A result of 0 indicates that the pixel with xy-coordinates 1,1 is most likely from class 0: “grey”)
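To see what argmax does on more than one pixel at a time, here is a small sketch with made-up per-class values for three pixels. The scores array and the class_names list are hypothetical; index 0 stands for "grey" and index 1 for "blue", as in the labels above.

```python
import numpy as np

class_names = ["grey", "blue"]   # index 0 and index 1

# Hypothetical per-class values for three pixels (one row per pixel)
scores = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [1.0, 0.0]])

# axis=1 picks the winning class index within each row
predicted_indices = np.argmax(scores, axis=1)
print([class_names[i] for i in predicted_indices])   # ['grey', 'blue', 'grey']
```

Without axis=1, argmax would search the whole flattened array, which only gives the intended answer when there is a single row.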

predicted_outputs = ann.predict(training_inputs)
predicted_image_array = np.zeros_like(image_array)
i = 0
for row, rgbs in enumerate(predicted_image_array):
    for column in range(len(rgbs)):
        prediction = np.argmax(predicted_outputs[i])
        if prediction == 0:
            predicted_image_array[row][column] = [245, 245, 245]
        else:
            predicted_image_array[row][column] = [135, 206, 250]
        i += 1
Image.fromarray(predicted_image_array)

The predicted class for each pixel

The expected class for each pixel