Sigmoid function

What is a Sigmoid function?

The sigmoid function is a mathematical function that takes any real value and maps it to a value between 0 and 1, producing a curve shaped like the letter “S”. The sigmoid function is also known as the logistic function.

y = 1 / (1 + e^(-z))

[Figure: the S-shaped curve of the sigmoid function]

As the value of z approaches positive infinity, the predicted value of y approaches 1. As z approaches negative infinity, the predicted value of y approaches 0.

If the output of the sigmoid function is greater than 0.5, you classify the example as class 1 (the positive class); if it is less than 0.5, you classify it as class 0 (the negative class).

The sigmoid function performs the role of an activation function in machine learning, where it is used to add non-linearity to a model. Essentially, the function determines which values to pass on as output and which not to pass. It is one of several activation functions commonly used in machine learning and deep learning.
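To make the definition and the 0.5 threshold concrete, here is a minimal Python sketch; the function names are illustrative, not from any particular library:

```python
import math

def sigmoid(z):
    """Map any real-valued input z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Label an input as class 1 (positive) or class 0 (negative)."""
    return 1 if sigmoid(z) > threshold else 0

# Large positive z pushes the output towards 1, large negative z towards 0.
print(sigmoid(10))    # ~0.99995
print(sigmoid(-10))   # ~0.00005
print(classify(2.5))  # 1
print(classify(-1))   # 0
```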

What is the history of the sigmoid function?

In 1798, the English cleric and economist Thomas Robert Malthus published An Essay on the Principle of Population. In this book, he asserted that the population was increasing in a geometric progression (doubling every 25 years) while food supplies were increasing arithmetically, and he claimed that this difference between the two would cause widespread famine.

After that, in the late 1830s, Pierre François Verhulst, a Belgian mathematician, was experimenting with various ways of modeling population growth. Verhulst wanted to account for the fact that a population’s growth is ultimately self-limiting: it does not increase exponentially forever. To model the slowing of growth that occurs when a population begins to exhaust its resources, Verhulst picked the logistic function as a logical adjustment to the simple exponential model.

Over the course of the next century, biologists and other scientists adopted the sigmoid function as a standard tool for modeling population growth, from bacterial colonies to human civilizations.

In 1943, Warren McCulloch and Walter Pitts developed an artificial neural network model using a hard cutoff as the activation function: a neuron outputs 1 or 0 depending on whether its input is above or below a threshold.

In 1972, the biologists Hugh Wilson and Jack Cowan at the University of Chicago, who were trying to model biological neurons computationally, published the Wilson–Cowan model, in which a neuron sends a signal to another neuron if it receives a signal greater than an activation potential. Wilson and Cowan employed the logistic sigmoid function to model the activation of a neuron as a function of a stimulus.

During the 1970s and the 1980s, several researchers started to make use of sigmoid functions in formulations of artificial neural networks, taking inspiration from biological neural networks. In 1998, Yann LeCun selected the hyperbolic tangent as an activation function in his groundbreaking convolutional neural network LeNet, which was the first CNN to have the ability to recognize handwritten digits to a practical level of accuracy.

In recent years, ANNs have shifted away from sigmoid functions towards the ReLU (rectified linear unit), because variants of the sigmoid function are computationally expensive to calculate, while ReLU provides the nonlinearity needed to take advantage of the depth of the network and is very fast to compute.
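As a rough illustration of the cost difference, compare the two definitions side by side; this is a simplified sketch, not a benchmark:

```python
import math

def sigmoid(z):
    # Requires evaluating an exponential plus a division.
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # Requires only a single comparison: max(0, z).
    return max(0.0, z)
```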

What are the types of sigmoid functions?

There are several types of sigmoid functions available. Here are three of the most common types of sigmoid functions.

Logistic Sigmoid Function

The logistic sigmoid function is what is normally meant by “the sigmoid function” in the world of machine learning. It takes any real-valued input and outputs a value between zero and one. This is how the logistic sigmoid function is mathematically defined:

σ(x) = 1 / (1 + e^(-x))

Hyperbolic Tangent Function

The hyperbolic tangent function is another commonly used sigmoid function. This function maps any real-valued input to the range between -1 and 1. Here is the mathematical definition of the hyperbolic tangent function:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Arctangent Function

This is yet another type of sigmoid function. The arctangent function is essentially the inverse of the tangent function. This function maps any real-valued input to the range −π/2 to π/2. This is the mathematical definition of the arctangent function:

f(x) = arctan(x) = tan⁻¹(x)
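The three functions can be compared directly. Below is a minimal Python sketch (the wrapper names are illustrative) showing that each squashes the real line into its respective range:

```python
import math

def logistic(x):
    """Logistic sigmoid: maps any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent: maps any real x into (-1, 1)."""
    return math.tanh(x)

def arctan(x):
    """Arctangent: maps any real x into (-pi/2, pi/2)."""
    return math.atan(x)

# Extreme inputs approach each function's limits; x = 0 sits at the center.
for x in (-100, -1, 0, 1, 100):
    print(x, logistic(x), tanh(x), arctan(x))
```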

What is sigmoid in deep learning?

The sigmoid neuron is essentially the building block of deep neural networks. Sigmoid neurons are similar to perceptrons, but they are slightly modified so that the output from a sigmoid neuron is far smoother than the step-function output of a perceptron.

The sigmoid function is smoother (less abrupt) than the perceptron’s step function. In a sigmoid neuron, a minor change in the input causes only a minor change in the output, unlike the stepped output generated by a perceptron.

The inputs to a sigmoid neuron can be real numbers, unlike the boolean inputs of the McCulloch–Pitts (MP) neuron, and the output is also a real number between 0 and 1. In the sigmoid neuron, you are trying to regress the relationship between X and Y in terms of probability. Even though the output is between 0 and 1, you can still make use of the sigmoid function for binary classification tasks by selecting a threshold.
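To illustrate this smoothness difference, here is a small Python sketch of a hypothetical single neuron (not any library’s API) comparing a perceptron’s step output with a sigmoid neuron’s output as the input changes slightly:

```python
import math

def perceptron(x, w, b):
    """Step output: 1 if the weighted sum crosses the threshold, else 0."""
    return 1 if w * x + b > 0 else 0

def sigmoid_neuron(x, w, b):
    """Smooth output: a real number between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

# Near the decision boundary, a tiny change in input flips the perceptron
# from 0 to 1, while the sigmoid neuron's output moves only slightly.
for x in (-0.01, 0.0, 0.01):
    print(x, perceptron(x, w=1.0, b=0.0), round(sigmoid_neuron(x, w=1.0, b=0.0), 4))
```

Running the loop prints perceptron outputs of 0, 0, 1 across the boundary, while the sigmoid neuron’s outputs stay near 0.5 (roughly 0.4975, 0.5, 0.5025), which is exactly the smooth behavior described above.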