Real-Time Face Recognition With Siamese Convolutional Neural Networks
It’s no secret that facial recognition is heavily flawed, especially when it comes to implicit racial bias. However, as a nerd who loves programming and AI, I decided to implement a type of convolutional neural network I recently learned about and build a face recognition system myself.
GitHub link to the project: https://github.com/zarif101/face_find
The What
The network I’m referring to is the ‘Siamese’ CNN. Before I jump into how I actually built it, let me explain what it is. (I’m assuming you’re familiar with the basics of neural networks, CNNs, computer vision, AI, etc.) Instead of classifying single inputs into a fixed number of categories like most neural networks, a Siamese network aims to calculate how similar two inputs are. Given two images, the network passes each one through an identical (or Siamese, you might say) ‘base’ CNN to generate a feature vector for each, and then uses a distance measure such as Euclidean distance to calculate the difference between the two feature vectors. This is especially helpful in situations where you don’t have thousands of training images per class, or where the number of classes keeps changing, because instead of learning which class an image belongs to, the network learns how to compare two images. Think of a situation where you want to find out who a person is, given a large database of faces. You could create a traditional CNN to do so, but then you’d have to retrain the whole network every time a new face was added to the database. You’d also need many pictures of each person. I highly recommend that you read this paper to get a better understanding of how Siamese networks function.
Credit for diagram: https://medium.com/@kuzuryu71/improving-siamese-network-performance-f7c2371bdc1e
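To make the idea concrete, here is a minimal sketch in plain Python. It is not the model from the repository: the ‘base network’ is a toy stand-in (one fixed random linear layer with a ReLU), and the names `make_base_network` and `siamese_distance` are made up for illustration. The only point is that both inputs go through the same weights before their embeddings are compared with Euclidean distance.

```python
import math
import random


def make_base_network(input_size, embedding_size, seed=0):
    """Return a toy 'base network': one fixed linear layer followed by ReLU.

    This is a stand-in for the real CNN; only the weight sharing matters here.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(input_size)]
               for _ in range(embedding_size)]

    def embed(pixels):
        # One output per weight row: ReLU(dot(row, pixels)).
        return [max(0.0, sum(w * p for w, p in zip(row, pixels)))
                for row in weights]

    return embed


def siamese_distance(embed, image_a, image_b):
    """Embed both images with the SAME network, then compare the embeddings."""
    return math.dist(embed(image_a), embed(image_b))


# Flattened 2x2 grayscale "images" with pixel values in [0, 1] (made-up data).
face_a = [0.9, 0.1, 0.8, 0.2]
face_a_again = [0.9, 0.1, 0.8, 0.2]
face_b = [0.1, 0.9, 0.2, 0.8]

embed = make_base_network(input_size=4, embedding_size=3)
same_distance = siamese_distance(embed, face_a, face_a_again)
different_distance = siamese_distance(embed, face_a, face_b)
print(same_distance, different_distance)
```

In a real Siamese network the weights are learned so that images of the same person end up close together and images of different people end up far apart; here they are random, so only the identical pair is guaranteed a distance of zero.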
The How
I used Keras as my deep learning library of choice, but feel free to use whichever one you like. Note: I’m using a backend called PlaidML instead of TensorFlow, since I’m running on an AMD GPU. The dataset I used can be found here. The dataset provides CSV files describing pairs of images: some pairs contain two images of different people, while others contain two images of the same person. The network takes these pairs as inputs and attempts to predict a binary target: either the input images are of the same person, or they are not. I used a custom generator to yield images for the train, validation, and test sets. After a great deal of parameter tuning on the base CNN, I was able to achieve about 61% accuracy on the test set without overfitting the model. That may not seem great, but I’ll explain in a bit why it still provides satisfactory results. If you’re interested in the exact architecture of the model, check out a file called ‘best_results.MD’ on the GitHub. I’ve turned the entire experiment into a user-friendly project as well, so feel free to clone the repository and train your own models!
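For a rough idea of what a pair generator like this can look like, here is a small sketch. It is not the generator from the repository: the CSV column names (`image_a`, `image_b`, `same_person`) are assumptions, since the real dataset’s layout may differ, and `load_image` is a placeholder you would swap for real image loading and preprocessing.

```python
import csv
import io


def pair_batches(csv_file, load_image, batch_size=2):
    """Yield ([images_a, images_b], labels) batches from a CSV of image pairs.

    Assumed (hypothetical) columns: image_a, image_b, same_person (1 or 0).
    """
    images_a, images_b, labels = [], [], []
    for row in csv.DictReader(csv_file):
        images_a.append(load_image(row["image_a"]))
        images_b.append(load_image(row["image_b"]))
        labels.append(int(row["same_person"]))
        if len(labels) == batch_size:
            yield [images_a, images_b], labels
            images_a, images_b, labels = [], [], []
    if labels:  # emit the final, possibly smaller, batch
        yield [images_a, images_b], labels


# Made-up CSV content and a placeholder loader that just returns the path.
demo_csv = io.StringIO(
    "image_a,image_b,same_person\n"
    "alice_1.jpg,alice_2.jpg,1\n"
    "alice_1.jpg,bob_1.jpg,0\n"
    "bob_1.jpg,bob_2.jpg,1\n"
)
batches = list(pair_batches(demo_csv, load_image=lambda path: path))
for (batch_a, batch_b), batch_labels in batches:
    print(batch_a, batch_b, batch_labels)
```

This version makes a single pass over the file. A generator used for training usually loops over the data indefinitely and returns arrays rather than lists, so treat this as the skeleton only.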
Now all that’s left to do is use OpenCV to capture a feed from the webcam and attempt to identify people. I saved the best model after training as an h5 file, and loaded it before starting the webcam feed. The program then continuously attempts to detect faces in the frame using OpenCV’s Haar cascade classifier. When a face is found, it goes through the images in the directory of known people, and if the face matches all images of a given person, it declares recognition for that individual. This is why 61% accuracy is perfectly fine here: every single image of a person has to be matched, so if 2–3 are provided (not too many, not too few), there is a relatively low chance that the network will incorrectly predict that the person in the frame is the same as the person in every single image. This entire process might sound a bit complicated, but once again, feel free to peruse the code and entire project on GitHub.
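The ‘match every image’ rule can be sketched in a few lines. This is an illustration rather than the code from the repository: `identify` and `toy_is_same_person` are hypothetical names, and the toy comparison just checks a name prefix where the real program would call the trained model. The arithmetic at the end assumes that each comparison is wrong independently and that about 39% of comparisons (1 minus the 61% accuracy) are false matches, which is a simplification.

```python
def identify(face, known_people, is_same_person):
    """Return the first person whose every reference image matches `face`."""
    for name, reference_images in known_people.items():
        # all() on an empty list is True, so skip people with no images.
        if reference_images and all(
            is_same_person(face, reference) for reference in reference_images
        ):
            return name
    return None


# Stand-in data: strings instead of images, a prefix check instead of a model.
known_people = {
    "alice": ["alice_1", "alice_2", "alice_3"],
    "bob": ["bob_1", "bob_2"],
}


def toy_is_same_person(face, reference):
    return reference.split("_")[0] == face


recognized = identify("bob", known_people, toy_is_same_person)
unknown = identify("carol", known_people, toy_is_same_person)
print(recognized, unknown)

# Chance that 3 independent comparisons are ALL false matches,
# assuming a 39% false-match rate per comparison (a simplification).
false_match_chance = 0.39 ** 3
print(round(false_match_chance, 3))  # about 0.059
```

Under those assumptions, three reference images bring the chance of a false recognition down to roughly 6%. The same rule also makes it harder for a genuine match to pass every comparison, which is one reason to keep the number of reference images small.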
Conclusion
I know that I didn’t exactly achieve state-of-the-art results here, but it’s certainly intriguing to explore this area of machine learning and see how much the field is capable of. With proper review, ethical monitoring, and non-discriminatory datasets, I believe that facial recognition technology built on approaches such as Siamese networks has the power to make a wide variety of tasks, from employee verification at a large company to police identification of criminals, vastly more efficient. Thanks for reading, and make sure to keep creating!