
A Comprehensive Guide to Siamese Neural Networks

Classification and regression are among the most common terms anyone interested in or working with machine learning will have heard. But there is another kind of problem, known as a similarity problem, which asks whether two inputs are similar or not, and a popular technique for solving it is the siamese neural network.

This kind of neural network architecture is scalable and does not require retraining every time a new class is added.

Prerequisites

I assume readers are familiar with convolutional neural networks (CNNs) for image classification and have trained an image classification model before, for example, a model that recognizes images of dogs and cats using a standard deep learning network such as a CNN or a fully connected network.

Image Classification Model using traditional deep learning neural network architecture
What is Similarity Learning?
Use cases of Siamese network
Siamese Neural Networks Architecture
Loss function in siamese networks
Contrastive loss function
Triplet loss function
Pros and Cons of Siamese Networks
Siamese Network Implementation Procedure
How to improve Siamese Networks or similarity learning?
FAQs
Further Learning

By the end of this article, readers will have a clear understanding of the siamese network architecture, its loss functions, and its applications, and will know the steps for implementing an end-to-end model using siamese networks.

Image Classification Model using traditional deep learning neural network architecture

To build such a model, one must first obtain a labeled dataset containing images of dogs and cats. After training, the network can only output one of two labels, dog or cat, for any input image. This is a standard computer vision problem known as image classification.

In classification, the input image is fed into a neural network (for example a CNN or a fully connected network), and the output layer produces a probability distribution over all the classes (using softmax or another activation function suited to the classification problem being solved).

For example, when classifying an image as a cat or a dog, the network generates two probabilities for every input image, one for each of the two classes.
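As a small illustration of those two probabilities, the sketch below applies softmax to a pair of made-up logits (raw scores from the last layer, assumed here purely for illustration) for the classes cat and dog.

```python
import math


def softmax(logits):
    """Turn raw scores (logits) into a probability distribution over classes."""
    # Subtracting the largest logit improves numerical stability
    # without changing the result.
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one image, ordered as [cat, dog]
probs = softmax([2.0, 0.5])
print(probs)  # two probabilities that sum to 1; "cat" gets the larger one
```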

However, training requires an abundant number of images for each class (cats and dogs), and if the network is trained only on these two classes, we cannot expect it to make predictions on any other class, for example “elephant”.

If we want the model to classify images of elephants as well, we first need to collect many elephant images and then retrain the model on them before it can make predictions.

In some applications we do not have enough data for each class, and the number of classes can keep growing, as in an employee attendance system. In such cases the cost of data collection and retraining is paid every time a new class is added or a new employee joins.

For this reason, similarity learning approaches such as siamese networks offer an alternative to traditional classification algorithms.

We will approach a similar binary image classification problem using a siamese neural network and learn about siamese networks along the way. Before diving into siamese networks, let us understand what similarity learning is.

What is Similarity Learning?

Similarity learning is a supervised machine learning technique in which the goal is for the model to learn a similarity function that measures how similar two objects are and returns a similarity score.

A high score is returned when the objects are similar and a low score when they are different. Now let us look at some use cases where similarity learning, i.e. one-shot classification with a siamese network, is used.
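In a siamese network the features are learned, but the final comparison can be a simple fixed function. Below is a minimal sketch of one such similarity function, cosine similarity; the three feature vectors are hypothetical, and the function assumes neither vector is all zeros.

```python
import math


def cosine_similarity(u, v):
    """Return a score in [-1, 1]; 1 means the vectors point the same way.

    Assumes neither vector is all zeros.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical feature vectors for three objects
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.1]    # almost the same direction as a
c = [-3.0, 0.5, -1.0]  # points somewhere else
print(cosine_similarity(a, b))  # close to 1 -> similar
print(cosine_similarity(a, c))  # much lower -> different
```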

Use cases of Siamese network

Here we will look at two use cases of similarity learning: an employee attendance system and a signature verification system.

With a siamese network, as little as one example of each class is enough to recognize that class, hence the name one-shot learning. Let us understand this with a real-world example.

Employee attendance system

Assume that we want to build an attendance system for a small organization with only 20 employees (small numbers keep things simple), where the system has to recognize the face of the employee.

Challenges in building this attendance system

The first problem is training data: a traditional classifier needs many different images of each employee in the organization.

Whenever an employee joins or leaves the organization, we have to go through the pain of collecting data again and retraining the entire model. This does not scale, especially for large organizations such as multinational corporations (MNCs), where people join and resign (i.e. attrition) almost every week.

In scenarios like this, where a scalable system is needed, a siamese network can be a great solution.

Instead of classifying a test image as one of the 20 people in the organization, a siamese network takes the test image together with a reference image of a person as input and generates a similarity score denoting the probability that the two input images show the same person.

A sigmoid function at the output keeps the similarity score between 0 and 1.

A similarity score of 0 denotes no similarity and a score of 1 denotes full similarity; any value in between is interpreted accordingly.

A siamese network does not learn to classify an image into one of the output classes. Instead, it learns a similarity function that takes two images as input and outputs the probability that they are similar.
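One way to turn two encodings into a score between 0 and 1 is to apply a sigmoid to a weighted distance between them. The sketch below assumes a weighted L1 distance; `weights` and `bias` are hypothetical stand-ins for parameters a real network would learn, chosen negative here so that larger distances give lower scores.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def similarity_score(emb_a, emb_b, weights, bias):
    """Map two embeddings to a score in (0, 1): sigmoid of a weighted L1 distance.

    `weights` and `bias` are illustrative; in a real model they are learned.
    """
    weighted = sum(w * abs(x - y) for w, x, y in zip(weights, emb_a, emb_b))
    return sigmoid(weighted + bias)


weights = [-2.0, -2.0, -2.0]  # hypothetical "learned" weights
bias = 2.0                    # hypothetical "learned" bias
same = similarity_score([0.2, 0.9, 0.4], [0.2, 0.9, 0.4], weights, bias)
diff = similarity_score([0.2, 0.9, 0.4], [0.9, 0.1, 0.0], weights, bias)
print(round(same, 3), round(diff, 3))  # identical inputs score higher
```

Note that identical embeddings score sigmoid(bias) rather than exactly 1; how close to 1 that is depends on what the network learns.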

Even Baidu, the Chinese search giant, has developed a face recognition system for its employees using a similar technique.

How does the siamese neural network solve the above-stated problem?

1) Unlike traditional deep learning classifiers, a siamese network does not require many instances of each class; a few are enough to build a good model.

2) The biggest advantage of a siamese network in face recognition applications such as attendance is what happens when a new employee (a new class) is added: for the network to recognize the new person's face, the model needs only a single image of that face. Using this image as the reference, the network calculates the similarity score for any new instance presented to it, without retraining. That is why we say the network predicts the score in one shot, and why it is called a one-shot learning model.
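The one-shot workflow can be sketched as a lookup against a gallery that holds a single reference embedding per employee. Everything here is hypothetical: the embeddings stand in for outputs of a trained encoder, and the names and the distance threshold of 0.5 are illustrative assumptions.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def identify(embedding, gallery, threshold=0.5):
    """Return the closest enrolled employee, or None if nobody is close enough."""
    best_name, best_dist = None, float("inf")
    for name, reference in gallery.items():
        d = euclidean(embedding, reference)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else None


# Hypothetical gallery: ONE reference embedding per employee,
# computed once by the trained encoder from a single photo.
gallery = {
    "alice": [0.1, 0.9, 0.3],
    "bob": [0.8, 0.2, 0.5],
}

# A new employee joins: store one embedding, no retraining needed.
gallery["carol"] = [0.5, 0.5, 0.9]

print(identify([0.52, 0.48, 0.88], gallery))  # carol
print(identify([5.0, 5.0, 5.0], gallery))     # None (unknown face)
```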

Signature verification system using siamese networks

A siamese network can also be used to compare the account holder’s reference signature with the signature on a check, or on any other document requiring the account holder’s signature, a comparison that would otherwise have to be made by a bank employee for security purposes.

If the similarity score is higher than a certain threshold (which needs to be chosen based on the model's performance on labeled data), the check is accepted; if the score is low, the signature is more likely to be forged.
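The threshold itself can be chosen from labeled validation pairs. A minimal sketch follows, assuming the network already produces similarity scores; the scores and labels are made up, and maximizing plain accuracy is only one possible criterion (a bank might weight accepted forgeries more heavily).

```python
def pick_threshold(scores, labels):
    """Choose the threshold that maximizes accuracy on labeled pairs.

    scores: similarity scores in [0, 1] (hypothetical model outputs).
    labels: 1 if the signature is genuine, 0 if forged.
    A pair is accepted when score >= threshold.
    """
    best_t, best_acc = 0.5, -1.0
    for t in sorted(set(scores)):  # every observed score is a candidate cut-off
        correct = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


def verify(score, threshold):
    return "accept" if score >= threshold else "possible forgery"


val_scores = [0.95, 0.88, 0.81, 0.62, 0.40, 0.35, 0.12]
val_labels = [1, 1, 1, 0, 1, 0, 0]
t, acc = pick_threshold(val_scores, val_labels)
print(t, acc)
print(verify(0.9, t), verify(0.2, t))
```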

Here too, the approach makes the system scalable, shortens development time, and reduces both the amount of data needed and the retraining time, making the system more efficient.

Such a system is not limited to banking; it can also be used in the legal and financial sectors and in other government and private institutions.

Siamese Neural Networks Architecture

A siamese network is an artificial neural network that contains two or more identical subnetworks, i.e. subnetworks with the same configuration, parameters, and weights.

In practice, the weights are shared: only one set of parameters is trained, and the same configuration (parameters and weights) is used for all N subnetworks, where N is the number of subnetworks chosen for the problem.

Siamese networks find the similarity of their inputs by comparing the inputs' feature vectors.

How the siamese network architecture works, step by step

We have two images and want to determine whether they form a similar or a dissimilar pair.

The first subnetwork takes image A as input and passes it through convolutional and fully connected layers, producing a vector representation (an encoding) of the image.
The second image, B, is passed through a network that is exactly the same, with the same weights and parameters.
We now have two encodings, E(A) and E(B), which we can compare to find out how similar the two images are. If the images are similar, their encodings will also be similar.
We measure the distance between these two vectors: a small distance means the images are similar (of the same class), and a large distance means they are different.
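The key architectural point, that both branches share exactly the same weights, can be seen in a toy sketch where a single encoder function is called on both images. The one-layer linear + ReLU encoder is a hypothetical stand-in for the convolutional and fully connected layers, and the input vectors are made up.

```python
import math
import random


def make_encoder(in_dim, out_dim, seed=0):
    """Toy one-layer encoder (linear + ReLU) standing in for the CNN."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return encode


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# ONE encoder object is used for both branches, so the "twin"
# subnetworks share exactly the same weights by construction.
encoder = make_encoder(in_dim=4, out_dim=3)

image_a = [0.2, 0.7, 0.1, 0.9]  # hypothetical flattened image A
image_b = [0.2, 0.6, 0.1, 0.9]  # a slightly different image B

e_a = encoder(image_a)  # E(A)
e_b = encoder(image_b)  # E(B)
# With a trained encoder, a small distance would suggest the same class.
print(euclidean(e_a, e_b))
```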

In a siamese network, the loss function plays the main role in teaching the network to distinguish similar pairs from dissimilar ones.

Loss function in siamese networks

Let us go through the two main loss functions used with siamese networks: contrastive loss and triplet loss.

Contrastive loss function

The goal of a siamese network is not to classify input images but to differentiate between them. So classification loss functions such as cross-entropy loss are not the best fit.

Instead, this architecture is better suited to a contrastive loss function.

This function evaluates how well the siamese network distinguishes between the given image pairs.

The contrastive loss function is defined as follows:

$$L = (1 - Y)\,\frac{1}{2}\,D_W^2 + Y\,\frac{1}{2}\left\{\max(0,\ m - D_W)\right\}^2$$

where Dw is defined as the Euclidean distance between the outputs of the sister networks.

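Mathematically, the Euclidean distance is:

$$D_W = \lVert G_W(X_1) - G_W(X_2) \rVert_2 = \sqrt{\sum_{i}\left(G_W(X_1)_i - G_W(X_2)_i\right)^2}$$

where G_W(X_1) and G_W(X_2) are the output vectors of the sister networks (with shared weights W) for the two input images X_1 and X_2.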

Y is either 0 or 1. If the first and second images are from the same class, Y is 0; otherwise, Y is 1.

max() returns the larger of 0 and m - Dw.

m is a margin value greater than 0. The margin means that dissimilar pairs whose distance is already beyond it do not contribute to the loss.
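A direct translation of the contrastive loss into plain Python, for a single pair, might look like this; the margin default of 1.0 and the example vectors are assumptions for illustration.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(out1, out2, y, margin=1.0):
    """Contrastive loss for one pair; y = 0 for same class, y = 1 otherwise.

    out1, out2 are the sister networks' output vectors.
    """
    d = euclidean(out1, out2)  # Dw
    similar_term = (1 - y) * 0.5 * d ** 2                  # pulls same-class pairs together
    dissimilar_term = y * 0.5 * max(0.0, margin - d) ** 2  # pushes others past the margin
    return similar_term + dissimilar_term


print(contrastive_loss([0.0, 0.0], [0.0, 0.0], y=0))  # 0.0: identical same-class pair
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], y=1))  # 0.0: distance 5 is beyond the margin
print(contrastive_loss([0.0, 0.0], [0.0, 0.5], y=1))  # 0.125: distance 0.5 is inside the margin
```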

Triplet loss function

Triplet loss trains the model to map similar images close to each other and dissimilar images far apart.

It does this using triplets consisting of:

1. Anchor Image: a sample image.

2. Positive Image: another image of the same class as the anchor, for example a variation of the anchor image.

This helps the siamese network model learn the similarities between the two images.

3. Negative Image: an image from a different class than the anchor and positive images.

This helps the model learn how it differs from the anchor image.

To push dissimilar outputs apart while keeping similar images close to one another, the loss includes a term known as the margin. The margin enforces a minimum gap between the anchor-positive and anchor-negative distances, and it also rules out trivial solutions, such as mapping every image to the same vector.

Similarity or dissimilarity is measured as the distance between two vectors, for example the L2 (Euclidean) distance or the cosine distance.

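Here is the loss function, where a is the anchor image, p is the positive image, n is the negative image, f(·) is the embedding produced by the network, d(·,·) is the chosen distance (for example the squared L2 distance), and m is the margin:

$$L(a, p, n) = \max\left(d(f(a), f(p)) - d(f(a), f(n)) + m,\ 0\right)$$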
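The triplet loss can be sketched the same way. This version assumes squared L2 as the default distance and a margin of 0.2 (both illustrative choices); a different `dist` function, such as a cosine distance, can be passed in instead. The embeddings are made up.

```python
def squared_l2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(f_a, f_p, f_n, margin=0.2, dist=squared_l2):
    """max(d(a, p) - d(a, n) + margin, 0) for one (anchor, positive, negative) triplet."""
    return max(dist(f_a, f_p) - dist(f_a, f_n) + margin, 0.0)


anchor = [0.0, 0.0]
positive = [0.0, 0.5]  # same class: d(a, p) = 0.25
near_neg = [0.5, 0.0]  # other class but too close: d(a, n) = 0.25
far_neg = [2.0, 0.0]   # other class, well separated: d(a, n) = 4.0

print(triplet_loss(anchor, positive, near_neg))  # 0.2: the margin is violated
print(triplet_loss(anchor, positive, far_neg))   # 0.0: already satisfied
```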

Pros and Cons of Siamese Networks

The main advantages of siamese networks are:

Robustness to class imbalance: thanks to one-shot learning, a few images of a class (very little training data) are sufficient for a siamese network to recognize images of that class in the future.
Ensembling with a classifier: because its learning mechanism differs from that of classification algorithms, ensembling a siamese network with a classifier can do better than averaging two correlated supervised models (e.g. GBM and random forest classifiers).
Semantic similarity: a trained siamese network learns embeddings that place items of the same class close together, so it can capture semantic similarity.

The main disadvantages of siamese networks are:

Requires more training time than traditional neural network architectures and machine learning algorithms: siamese networks learn from pairs, whose number grows quadratically with the size of the dataset, so training is slower than for a conventional classifier.
Doesn’t output probabilities: since training involves pairwise learning, a siamese network does not output a probability distribution over classes but a distance (computed with a formula such as Euclidean distance) from each class.
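To see where the quadratic cost comes from, the sketch below builds every labeled pair from a tiny hypothetical dataset, using the same label convention as the contrastive loss (0 for same class, 1 otherwise), and then prints how fast the pair count grows.

```python
from itertools import combinations


def make_pairs(samples):
    """Build every (x1, x2, y) pair; y = 0 for same class, 1 otherwise."""
    pairs = []
    for (x1, c1), (x2, c2) in combinations(samples, 2):
        pairs.append((x1, x2, 0 if c1 == c2 else 1))
    return pairs


# Hypothetical labeled samples: (image id, class)
samples = [("img1", "cat"), ("img2", "cat"), ("img3", "dog"), ("img4", "dog")]
pairs = make_pairs(samples)
print(len(pairs))  # 6

# n images give n * (n - 1) / 2 pairs, i.e. roughly n^2 / 2
for n in (100, 1_000, 10_000):
    print(n, "images ->", n * (n - 1) // 2, "pairs")
```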

Siamese Network Implementation Procedure

Every machine learning model goes through three stages: training and validation, testing, and deployment.

The steps below remain the same for almost all applications developed with siamese networks.

Training the Network:

The training process of a Siamese network is as follows:

Initialize the siamese network, the loss function, and an optimizer (such as Adam, Adagrad, or SGD).
Pass each image of a pair through the siamese network, since training involves pairwise learning.
Calculate the loss from the outputs for the first and second images.
Backpropagate through the model to calculate the gradients.
Update the weights with the optimizer to minimize the loss, repeating over the data for a number of epochs.
Stop once the maximum number of epochs set for the model is reached or the loss stops decreasing.
Save the model.
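The training steps above can be sketched end to end without a deep learning framework. This is a toy under several assumptions: the shared subnetwork is a single 2x2 linear layer, the gradient of the contrastive loss is derived by hand (the chain rule that backpropagation would apply automatically in a framework), plain SGD stands in for Adam or Adagrad, the 2-D data is made up, and the file name `siamese_toy.json` is hypothetical.

```python
import json
import math
import os
import tempfile


def encode(W, x):
    """Toy shared subnetwork: a single linear layer with weight matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def loss_and_grad(W, x1, x2, y, margin=1.0):
    """Contrastive loss for one pair and its gradient w.r.t. W (y = 0: same class)."""
    dx = [a - b for a, b in zip(x1, x2)]
    u = encode(W, dx)                     # E(x1) - E(x2), because encode is linear
    d = math.sqrt(sum(v * v for v in u))  # Dw
    if y == 0:
        loss, coef = 0.5 * d * d, 1.0     # gradient is u dx^T
    elif 0.0 < d < margin:
        loss, coef = 0.5 * (margin - d) ** 2, -(margin - d) / d
    else:
        loss, coef = 0.0, 0.0             # beyond the margin: no contribution
    grad = [[coef * ui * dj for dj in dx] for ui in u]
    return loss, grad


# Hypothetical data: feature 0 carries the class, feature 1 is noise.
samples = [([0.0, 0.0], 0), ([0.0, 0.3], 0), ([0.0, -0.3], 0),
           ([2.0, 0.0], 1), ([2.0, 0.3], 1), ([2.0, -0.3], 1)]
pairs = [(a, b, 0 if ca == cb else 1)
         for i, (a, ca) in enumerate(samples) for b, cb in samples[i + 1:]]

# Initialize: identity weights (for reproducibility) and plain SGD settings.
W = [[1.0, 0.0], [0.0, 1.0]]
lr, epochs = 0.1, 50
history = []
for epoch in range(epochs):
    total = 0.0
    for x1, x2, y in pairs:
        # Both images go through the same W; compute the loss and its gradient.
        loss, grad = loss_and_grad(W, x1, x2, y)
        # SGD update of the shared weights.
        W = [[w - lr * g for w, g in zip(w_row, g_row)]
             for w_row, g_row in zip(W, grad)]
        total += loss
    history.append(total / len(pairs))

# Save the trained weights.
path = os.path.join(tempfile.gettempdir(), "siamese_toy.json")
with open(path, "w") as fh:
    json.dump(W, fh)
print(f"mean loss {history[0]:.4f} -> {history[-1]:.4f}; saved to {path}")
```

Because the class information lives in the first feature, training shrinks the weight acting on the noisy second feature, pulling same-class pairs together while different-class pairs stay beyond the margin.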

Testing the model:

Load the test data.
Pass the image pairs through the model, keeping their labels for evaluation.
Find the Euclidean distance between the encodings of each pair.
Display the image pairs predicted to be similar.
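The testing steps can be sketched as follows. The encoder and the test pairs are hypothetical stand-ins for a loaded model and test set, the distance threshold of 0.5 is an assumption, and labels follow the contrastive-loss convention (0 means same class).

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def evaluate(encoder, test_pairs, threshold):
    """Return accuracy and the pairs predicted as similar (y = 0 means same class)."""
    correct, similar = 0, []
    for x1, x2, y in test_pairs:
        d = euclidean(encoder(x1), encoder(x2))
        predicted = 0 if d < threshold else 1
        correct += predicted == y
        if predicted == 0:
            similar.append((x1, x2, round(d, 3)))
    return correct / len(test_pairs), similar


def encoder(x):
    """Hypothetical stand-in for the trained, loaded encoder."""
    return [2.0 * v for v in x]


# Hypothetical test set of (image, image, label) triples
test_pairs = [([0.0, 0.1], [0.0, 0.2], 0),
              ([0.0, 0.1], [1.0, 0.1], 1),
              ([1.0, 0.0], [1.0, 0.1], 0)]
accuracy, similar_pairs = evaluate(encoder, test_pairs, threshold=0.5)
print("accuracy:", accuracy)
for pair in similar_pairs:  # "display" the pairs judged similar
    print(pair)
```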

Here is one example of how the siamese network is implemented using Keras along with a dataset link

Here is another example of text similarity measurement using siamese networks Link

How to improve Siamese Networks or similarity learning?

Selection of loss functions: we have seen two types of loss functions, contrastive loss and triplet loss.

Triplet loss is often more effective than contrastive loss because it learns a relative ranking (the positive must be closer to the anchor than the negative), which tends to lead to better results.

But the performance of the network can certainly be improved further with a better loss function.

“Deep Metric Learning with Angular Loss” and “Correcting the Triplet Selection Bias for Triplet Loss” are some interesting methods worth studying.

Some recent research papers show that classification loss functions such as cross-entropy can also be used to train a siamese network and still give good results.

Sampling: triplets can be sampled from the dataset (all the images) in a way that increases the accuracy of the model.

It is a good idea to include hard cases in your triplets, i.e. positives that are far from the anchor and negatives that are close to it.
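One way to include hard cases is to pick, for each anchor, the farthest positive and the closest negative among a set of embeddings (sometimes called batch-hard mining). The embeddings and labels below are made up; in practice, softer variants such as choosing "semi-hard" negatives are also common, since always taking the hardest examples can make training harder.

```python
def squared_l2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def hardest_triplets(embeddings, labels):
    """For each anchor, pick the farthest positive and the closest negative."""
    triplets = []
    for i, (e_a, y_a) in enumerate(zip(embeddings, labels)):
        positives = [j for j, y in enumerate(labels) if y == y_a and j != i]
        negatives = [j for j, y in enumerate(labels) if y != y_a]
        if not positives or not negatives:
            continue  # this anchor cannot form a triplet
        p = max(positives, key=lambda j: squared_l2(e_a, embeddings[j]))
        n = min(negatives, key=lambda j: squared_l2(e_a, embeddings[j]))
        triplets.append((i, p, n))
    return triplets


emb = [[0.0, 0.0], [0.1, 0.0], [0.9, 0.0],  # class "a"
       [1.0, 0.0], [3.0, 0.0]]              # class "b"
lab = ["a", "a", "a", "b", "b"]
print(hardest_triplets(emb, lab))
# [(0, 2, 3), (1, 2, 3), (2, 0, 3), (3, 4, 2), (4, 3, 2)]
```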

Ensemble of siamese networks and other classification algorithms: we can also use different algorithms and networks and train each of them on different triplets from our data.

FAQs

What is a siamese network used for?
As we have seen in this article, it can be used for numerous applications, such as face recognition, signature verification, image classification, object detection, text classification, and voice classification.
Is the Siamese network supervised?
Yes, the siamese network is a supervised, metric-based learning technique.