A brief introduction to Siamese Neural Networks (SNNs).

The term “Siamese twins,” also known as “conjoined twins,” refers to identical twins joined in utero. These twins are physically connected to each other (i.e., unable to be separated), often sharing the same organs, predominantly the lower intestinal tract, liver, and urinary tract.


Figure 1: Siamese twins.

Just as Siamese twins are connected, so are Siamese networks.

Siamese networks are a special type of neural network architecture. Instead of learning to classify its inputs, the network learns to differentiate between two inputs; it learns the similarity between them.

Where can a Siamese network be used?

We use Siamese networks when performing verification, identification, or recognition tasks, the most popular examples being face recognition and signature verification.

For example, let’s suppose we are tasked with detecting signature forgeries. Instead of training a classification model to correctly classify signatures for each unique individual in a dataset (which would require significant training data), what if we instead took two images from the training set and asked the neural network if the signatures were from the same person or not?

  • If the two signatures are the same, the Siamese network reports “Yes.”
  • Otherwise, if the two signatures are not the same, implying a potential forgery, the Siamese network reports “No.”

This is an example of a verification task (versus classification, regression, etc.), and while it may sound like a harder problem, it actually becomes far easier in practice — we need significantly less training data, and our accuracy actually improves by using Siamese networks rather than classification networks.
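To make the pairing idea above concrete, here is a minimal sketch of how such same-person/different-person training pairs might be sampled. The dataset, writer names, and `make_pair` helper are all hypothetical, invented purely for illustration:

```python
import random

# Hypothetical toy dataset: signature samples (here just string IDs),
# grouped by the person who wrote them.
signatures = {
    "alice": ["a1", "a2", "a3"],
    "bob":   ["b1", "b2"],
}

def make_pair(same, rng=random):
    """Sample one training pair and its label: "yes" if both
    signatures come from the same person, "no" otherwise."""
    if same:
        person = rng.choice(list(signatures))
        s1, s2 = rng.sample(signatures[person], 2)  # two samples, one writer
        return (s1, s2), "yes"
    p1, p2 = rng.sample(list(signatures), 2)        # two distinct writers
    return (rng.choice(signatures[p1]), rng.choice(signatures[p2])), "no"

random.seed(0)
pair, label = make_pair(same=True)   # e.g. two of Alice's signatures, "yes"
```

Every genuine pair becomes a positive example and every cross-writer pair a negative one, so even a small dataset yields many training pairs.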


Figure 2: An example of a Siamese network, SigNet, used for signature verification.

How does a Siamese network work?

A Siamese network consists of two identical neural networks, each taking one of the two input images. The final layers of the two networks are then fed to a contrastive loss function, which calculates the similarity between the two images. I have made an illustration to help explain this architecture.

Figure 3: Siamese Network Architecture.

There are two sister networks, which are identical neural networks, with the exact same weights.

Each image in the image pair is fed to one of these networks.

The networks are optimized using a contrastive loss function (we will get to the exact function shortly).
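The key point above is that the two sister networks share the exact same weights: they are really one function applied twice. A minimal sketch in NumPy, using a single linear-plus-ReLU layer as a toy stand-in for a real embedding network (the shapes and names here are illustrative assumptions, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared weights: both "sister" networks are the SAME function G_W.
W = rng.normal(size=(8, 4))  # toy 8-dim input -> 4-dim embedding
b = np.zeros(4)

def g_w(x):
    """One sister network: maps an input vector to an embedding."""
    return np.maximum(0.0, x @ W + b)

def siamese_forward(x1, x2):
    """Feed each input of the pair through the shared network and
    return D_w, the Euclidean distance between the two embeddings."""
    e1, e2 = g_w(x1), g_w(x2)
    return np.linalg.norm(e1 - e2)

x = rng.normal(size=8)
print(siamese_forward(x, x))        # identical inputs -> distance 0.0
print(siamese_forward(x, x + 1.0))  # different inputs -> typically positive
```

Because `g_w` is shared, an update to `W` during training moves both branches at once, which is what makes the learned distance consistent across pairs.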

Contrastive Loss function:

The objective of the Siamese architecture is not to classify input images, but to differentiate between them. So, a classification loss function (such as cross entropy) would not be the best fit. Instead, this architecture is better suited to use a contrastive function. Intuitively, this function just evaluates how well the network is distinguishing a given pair of images.


The contrastive loss function is given as follows:

Equation 1.0:

L(W, Y, X_1, X_2) = (1 - Y) \frac{1}{2} (D_W)^2 + Y \frac{1}{2} \{\max(0, m - D_W)\}^2

Here Y is the pair label (0 for a similar pair, 1 for a dissimilar pair) and m > 0 is a margin beyond which dissimilar pairs contribute no loss.

where D_W is defined as the Euclidean distance between the outputs of the sister Siamese networks. Mathematically, the Euclidean distance is:

Equation 2.0:

D_W(X_1, X_2) = \| G_W(X_1) - G_W(X_2) \|_2

where G_W is the output of one of the sister networks, and X_1 and X_2 are the input data pair.
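Putting the two equations together, the contrastive loss is straightforward to compute for a single pair of embeddings. A minimal NumPy sketch, using the label convention above (Y = 0 for similar, Y = 1 for dissimilar; the function name and margin default are assumptions for illustration):

```python
import numpy as np

def contrastive_loss(e1, e2, y, margin=1.0):
    """Contrastive loss for one pair of embeddings e1 = G_W(X_1),
    e2 = G_W(X_2). y = 0: similar pair, y = 1: dissimilar pair."""
    d = np.linalg.norm(e1 - e2)  # D_W, the Euclidean distance
    similar_term = (1 - y) * 0.5 * d**2                    # pulls similar pairs together
    dissimilar_term = y * 0.5 * max(0.0, margin - d)**2    # pushes dissimilar pairs apart
    return similar_term + dissimilar_term

e = np.array([1.0, 0.0])
print(contrastive_loss(e, e, y=0))                      # identical, similar -> 0.0
print(contrastive_loss(e, np.array([0.0, 0.0]), y=1))   # distance 1 = margin -> 0.0
print(contrastive_loss(e, e, y=1))                      # identical, dissimilar -> 0.5
```

Note the asymmetry: similar pairs are penalized for any distance, while dissimilar pairs are only penalized while they remain closer than the margin.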