GitHub – wayfair/keras_image_similarity_training: A Jupyter Notebook and python scripts that allows users to easily train a siamese network on image similarity, export the model to a SavedModel file,
Mục Lục
Keras Image Similarity Training
Train a convolutional neural network to determine content-based similarity between images.
This is done with a siamese neural network as shown
here.
The model learns from labeled images of similar and dissimiar pairs. The model’s objective is to
embed similar pairs nearby and dissimilar pairs far apart. This property of the latent space means
kNN searches can find similar images. This idea is based on the paper found
here.
Requirements
- Docker
- Nvidia Docker If using GPUs
Labeled Data
For both training and indexing, labeled data will be needed.
This data needed is multiple images of each unique item. Create a JSON file
such as the one seen below. The key of top level items should be
the item_id
. Each value should have an images
array, which contains
data on each image for that item. Optionally, you can also provide labels
for each item_id
, where two items sharing some label will not be
considered dissimilar.
{"item_id_1"
: {"images"
: [ {"filename"
:"
relative/path/to/item_1_1.jpg"
}, {"filename"
:"
relative/path/to/item_1_2.jpg"
} ],"labels"
: ["
red"
,"
pink"
] },"item_id_2"
: {"images"
: [ {"filename"
:"
relative/path/to/item_2_1.jpg"
}, {"filename"
:"
relative/path/to/item_2_2.jpg"
} ],"labels"
: ["
blue"
] } }
Training
For training a model, you will definitely need a GPU. If you do not have one,
then we suggest only using a pretrained model provided by Keras’s API.
Notebook
We provide a Jupyter notebook that will walk you through how to train a siamese network.
Note you will need a machine with an Nvidia GPU here.
DATA=/path/to/images/and/label/files make notebook
Exporting Model
If you trained a model, run the following
make bash-cpu python utilities.py --export savedmodel --keras-model checkpoints/file_saved_by_notebook.hdf5
Else you can use Google’s pretrained model on classification
make bash-cpu python utilities.py --export savedmodel
Indexing
Images need to be embedded and indexed for fast kNN search.
GPU and a trained model
DATA=/path/to/images/and/label/files make bash-gpu python utilities.py --export balltree \ --keras-model checkpoints/file_saved_by_notebook.hdf5 \ --labeled-data /data/path_to_labeled_images_file.json \ --image-dir /data/whereever_the_base_image_dir_is_mounted
GPU and Google’s pretrained model
DATA=/path/to/images/and/label/files make bash-gpu python utilities.py --export balltree \ --labeled-data /data/path_to_labeled_images_file.json \ --image-dir /data/whereever_the_base_image_dir_is_mounted
CPU and Google’s pretrained model
DATA=/path/to/images/and/label/files make bash-cpu python utilities.py --export balltree \ --labeled-data /data/path_to_labeled_images_file.json \ --image-dir /data/whereever_the_base_image_dir_is_mounted