Activation Function Sigmoid
“The S-shaped function”
What is Sigmoid?
The sigmoid function, also called the logistic function, is traditionally a very popular activation function for neural networks.
What does Sigmoid do?
Sigmoid takes a real value as input and transforms it into an output value between 0 and 1. Inputs that are much larger than 1 are transformed to values very close to 1; similarly, inputs much smaller than 0 are snapped toward 0.
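For instance, a minimal sketch of this squashing behavior (assuming NumPy; the formula itself is defined formally in a later section):

import numpy as np

def sigmoid(x):
    # 1 / (1 + e^(-x)), defined formally below
    return 1 / (1 + np.exp(-x))

print(sigmoid(-8))   # ~0.00034 -> large negative inputs snap toward 0
print(sigmoid(0))    # 0.5      -> the midpoint
print(sigmoid(8))    # ~0.99966 -> large positive inputs snap toward 1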
Who invented Sigmoid?
The sigmoid function was introduced in a series of three papers published between 1838 and 1847 by Pierre François Verhulst, who devised it as a model of population growth by adjusting the exponential growth model, under the guidance of Adolphe Quetelet.
source: Wikipedia
What does sigmoid look like?
The shape of the function for all possible inputs is an S-curve, rising from 0.0 up through 0.5 to 1.0.
The sigmoid function:
S(z) = \frac{1}{1 + e^{-z}}
where S denotes the sigmoid function and z is the input for which it is evaluated.
The derivative of the sigmoid function:
S'(z) = S(z)\,(1 - S(z))
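This identity follows directly from the chain rule; a short derivation using the definition above:

S'(z) = \frac{d}{dz}\,(1 + e^{-z})^{-1} = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = S(z)\,(1 - S(z))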
The Implementation
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-10, 10, 100)   # x-axis from -10 to 10
z = 1 / (1 + np.exp(-x))        # sigmoid function formula
y = z * (1 - z)                 # derivative of the sigmoid function

plt.plot(x, z, label="Sigmoid(x)")
plt.plot(x, y, label="Derivative of Sigmoid(x)")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
The Output:
[Plots: Sigmoid Function and Derivative of Sigmoid Function]
Why use sigmoid?
For a long time, up through the early 1990s, it was the default activation function used in neural networks. It is easy to work with and has all the desirable properties of an activation function. Meaning, it:
- is non-linear.
- is continuously differentiable.
- is monotonic.
- has a fixed output range.
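A quick numerical check of these properties (a minimal sketch using NumPy):

import numpy as np

x = np.linspace(-30, 30, 601)
s = 1 / (1 + np.exp(-x))        # sigmoid outputs
ds = s * (1 - s)                # derivative values

print(np.all((s > 0) & (s < 1)))   # True: fixed output range (0, 1)
print(np.all(np.diff(s) >= 0))     # True: monotonically increasing
print(ds.max())                    # 0.25, the derivative's maximum, at x = 0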
What are the limitations?
- It gives rise to the vanishing gradient problem.
- Toward either end of the sigmoid curve, the Y values respond very little to changes in X.
- Its output is not zero-centered, which makes gradient updates go too far in different directions.
- It makes optimization harder.
- The network sometimes refuses to learn further or learns drastically slowly.
- Sigmoids saturate and kill gradients (see the sketch after this list).
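To see the saturation problem concretely, a minimal sketch (assuming NumPy; the helper name is hypothetical):

import numpy as np

def sigmoid_grad(x):
    # derivative of the sigmoid: S(x) * (1 - S(x))
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s)

for v in [0.0, 2.5, 5.0, 10.0]:
    print(f"x = {v:5.1f} -> gradient = {sigmoid_grad(v):.2e}")
# x =   0.0 -> gradient = 2.50e-01  (the maximum is only 0.25)
# x =   2.5 -> gradient = 7.01e-02
# x =   5.0 -> gradient = 6.65e-03
# x =  10.0 -> gradient = 4.54e-05  (saturated: the gradient has all but vanished)

# Gradients multiply across layers during backpropagation, so even the
# best case shrinks exponentially with depth: 0.25 ** 10 is about 9.5e-07.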
Learn more about sigmoid on:
Activation functions on Machine Learning Glossary (ml-cheatsheet.readthedocs.io).
Sigmoid function: Wikipedia