Convolutional Neural Network using Keras

In this tutorial I’m going to explain how images are classified using a convolutional neural network (CNN). I’ll use the CIFAR-10 data set to train and test the model.

If you are a beginner in image recognition, I would recommend you read this blog to get a theoretical idea of how deep learning works for image classification.

This dataset includes 60,000 pictures (50,000 for training and 10,000 for testing) of 10 different kinds of objects, like airplanes, automobiles, birds, and so on. Each image in the dataset includes a matching label so we know what kind of image it is. Using this dataset, we can train our neural network to recognize any of these 10 different kinds of objects.

The images in the CIFAR-10 dataset are only 32 pixels by 32 pixels. These are very low resolution images. We’re using them here because the lower resolution makes it possible to train the neural network to recognize them relatively quickly. The same code we’ll write also works for larger image sizes.

from keras.datasets import cifar10
import matplotlib.pyplot as plt
# List of names for each CIFAR10 class
cifar10_class_names = {
    0: "Plane",
    1: "Car",
    2: "Bird",
    3: "Cat",
    4: "Deer",
    5: "Dog",
    6: "Frog",
    7: "Horse",
    8: "Boat",
    9: "Truck"
}
# Load the entire data set
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# Loop through the first 1000 pictures in the data set
for i in range(1000):
    # Grab an image from the data set
    sample_image = x_train[i]
    # Grab the image's expected class id
    image_class_number = y_train[i][0]
    # Look up the class name from the class id
    image_class_name = cifar10_class_names[image_class_number]
    # Draw the image as a plot
    plt.imshow(sample_image)
    # Label the image
    plt.title(image_class_name)
    # Show the plot on the screen
    plt.show()

Run the code above to view sample images from the CIFAR-10 dataset.

Preparing image data set

Keras provides a function for easily accessing this data. To load it, we’ll call cifar10.load_data(). This function returns four different arrays. First it returns an x and y array of training data: the x array contains the actual images from the data set, and the y array contains the matching label for each image. The function also returns an x and y array of test data, which we’ll call x_test and y_test.

# Load the entire data set
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

Before we can use this data to train a neural network, we need to normalize it. Neural networks work best when the input data are floating point values between zero and one. Normally images are stored as integer values, where each pixel is a number between 0 and 255. So to use this data, we need to convert it from integer to floating point and then make sure all the values are between zero and one.

# Normalize data set to 0-to-1 range
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
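
To see what this normalization does, here is a minimal sketch on a tiny made-up array instead of the real data set. The array tiny_image and its pixel values are assumptions chosen purely for illustration.

```python
import numpy as np

# A made-up 2x2 RGB "image" with 8-bit pixel values (illustrative data only)
tiny_image = np.array(
    [[[0, 128, 255], [64, 64, 64]],
     [[255, 255, 255], [10, 20, 30]]],
    dtype=np.uint8,
)

# Convert to floating point first, then scale into the 0-to-1 range
normalized = tiny_image.astype("float32") / 255

print(normalized.dtype)                    # float32
print(normalized.min(), normalized.max())  # 0.0 1.0
```

The conversion to float32 has to come first: dividing an integer array in place with /= would fail, because the fractional results cannot be stored back into integer pixels.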

There’s one last bit of cleanup we need to do before we can use our training data. CIFAR-10 provides the label for each image as a single value from zero to nine. But since we are creating a neural network with 10 outputs, we need a separate expected value for each of those outputs. So we need to convert each label from a single number into an array with 10 elements.

from keras.utils import to_categorical
# Convert class vectors to binary class matrices
# Our labels are single values from 0 to 9.
# Instead, we want each label to be an array with one element set to 1 and the rest set to 0.
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
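
As a rough illustration of what this conversion produces, the sketch below builds the same kind of one-hot arrays with plain NumPy. It is not how Keras implements it internally, and the three labels are made up.

```python
import numpy as np

# CIFAR-10 labels arrive with shape (n, 1); these three are made up
labels = np.array([[6], [9], [0]])
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype="float32")[labels.ravel()]

print(one_hot.shape)  # (3, 10)
print(one_hot[0])     # the single 1 sits at index 6
```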

And now we’ve got this data ready to use with the neural network.

Creating Neural Network

First we need to create a new neural network object in Keras. To do that, we create a new Sequential object: model = Sequential(). The Sequential API lets us create a neural network by adding new layers to it one at a time. It’s called sequential because you add each layer in sequence and they automatically get connected together in that order.

We use convolutional layers to make the network better at finding patterns in images. Since we’re working with images, we’ll add two-dimensional convolutional layers. To create one, we create a new Conv2D object and pass in the parameters. The first parameter is how many different filters should be in the layer. Each filter is capable of detecting one pattern in the image. We’ll start with 32. Next, we pass in the size of the window that slides over each image. Let’s use a window size of three pixels by three pixels, so the layer looks at the image one three by three tile at a time. Padding is just extra zeros added to the edge of the image to make the math work out: with padding="same" the output keeps the same width and height as the input, while without it each three by three layer trims one pixel from every edge. We use the relu activation function because it is cheap to compute. The first layer also needs input_shape=(32, 32, 3) so Keras knows the size of the incoming images.

To make our neural network more powerful, let’s add a few more convolutional layers the same way. First, let’s add another one with the same settings, 32 filters and a three by three window size. Then we’ll add two more layers with 64 filters each.

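from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
# Create a model and add layers
model = Sequential()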
model.add(Conv2D(32, (3, 3), padding="same", activation="relu", input_shape=(32, 32, 3)))
model.add(Conv2D(32, (3, 3), activation="relu"))
model.add(Conv2D(64, (3, 3), padding="same", activation="relu"))
model.add(Conv2D(64, (3, 3), activation="relu"))

Whenever we transition between convolutional layers and dense layers, we need to tell Keras that we’re no longer working with 2D data. To do that we create a Flatten layer and add it to our network.

model.add(Flatten())

Before the output layer, we first add a dense layer with 512 nodes and a relu activation to combine the patterns found by the convolutional layers. The CIFAR-10 data set has 10 different kinds of objects. Since we’re detecting 10 different kinds of objects, the output layer is a dense layer with 10 nodes. To create it, we call model.add and pass in a new Dense object with 10 nodes. When doing classification with more than one type of object, the output layer will almost always use a softmax activation function. The softmax activation function is a special function that makes sure all the output values from this layer add up to exactly one.

model.add(Dense(512, activation="relu"))
model.add(Dense(10, activation="softmax"))
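
To make the softmax behaviour concrete, here is a small NumPy sketch of the function itself. The helper name softmax and the ten raw scores are made up for illustration; Keras applies the equivalent computation inside the output layer.

```python
import numpy as np

def softmax(scores):
    # Subtracting the max keeps exp() from overflowing; the result is unchanged
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Ten made-up raw scores, one per CIFAR-10 class
raw_scores = np.array([2.0, 1.0, 0.1, -1.0, 0.0, 0.5, 3.0, -2.0, 1.5, 0.2])
probabilities = softmax(raw_scores)

print(probabilities.sum())            # adds up to 1 (within floating point error)
print(int(np.argmax(probabilities)))  # 6, the class with the largest raw score
```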

# Print a summary of the model
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 32, 32, 32)        896       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 30, 30, 32)        9248      
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 30, 30, 64)        18496     
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 28, 28, 64)        36928     
_________________________________________________________________
flatten (Flatten)            (None, 50176)             0         
_________________________________________________________________
dense (Dense)                (None, 512)               25690624  
_________________________________________________________________
dense_1 (Dense)              (None, 10)                5130      
=================================================================
Total params: 25,761,322
Trainable params: 25,761,322
Non-trainable params: 0
_________________________________________________________________
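
The Output Shape and Param # columns above can be checked by hand. The sketch below is one way to do it, assuming a stride of 1 and the Keras default of no padding when padding="same" is not given. The helper names conv2d_output_size and conv2d_param_count are made up for this example.

```python
def conv2d_output_size(input_size, kernel_size, padding):
    """Width/height after a stride-1 convolution."""
    if padding == "same":
        return input_size
    return input_size - kernel_size + 1  # "valid" (no padding)


def conv2d_param_count(kernel_size, in_channels, filters):
    """Weights per filter (k * k * in_channels) plus one bias per filter."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


size, channels = 32, 3  # CIFAR-10 images are 32x32 with 3 colour channels
layers = [(32, "same"), (32, "valid"), (64, "same"), (64, "valid")]
rows = []
for filters, padding in layers:
    params = conv2d_param_count(3, channels, filters)
    size = conv2d_output_size(size, 3, padding)
    channels = filters
    rows.append((size, filters, params))
    print(size, size, filters, params)

# Flatten turns the last feature maps into one long vector
flattened = size * size * channels
# A dense layer has one weight per input plus one bias, for each node
dense_params = (flattened + 1) * 512
print(flattened, dense_params)
```

Almost all of the parameters sit in the first dense layer, because it connects every one of the 50,176 flattened values to each of its 512 nodes.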

We’ve created a basic convolutional neural network. Let’s improve its efficiency by adding max pooling. Max pooling is where we scale down the output of the convolutional layers by keeping only the largest value in each small window (here two by two) and throwing away the smaller ones. This makes the neural network more efficient by throwing away the least useful data and keeping the most useful data. Typically, we’ll do max pooling right after a block of convolutional layers.
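
Here is a minimal NumPy sketch of two by two max pooling on a single feature map. The helper max_pool_2x2 and the input values are made up for illustration and are not the Keras implementation.

```python
import numpy as np

def max_pool_2x2(feature_map):
    h, w = feature_map.shape
    # Drop a trailing odd row/column, as pooling without padding does
    h, w = h - h % 2, w - w % 2
    # Group the map into 2x2 blocks, then keep the largest value of each block
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
])
pooled = max_pool_2x2(feature_map)
print(pooled)
# [[4 2]
#  [2 8]]
```

Each pooling step halves the width and height, so the number of values passed on to the next layer drops to a quarter.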

We have one more trick left up our sleeves to make our neural network perform better. Let’s add dropout layers to it. One of the problems with neural networks is that they tend to memorize the input data instead of actually learning how to tell different objects apart. We can force the neural network to try harder to learn without memorizing the input data. The idea is that between certain layers, we’ll randomly throw away some of the data by cutting some of the connections between the layers. This is called dropout. Usually we’ll add dropout right after max pooling layers, or after a group of dense layers. The only parameter we need to pass in is the percentage of neural network connections to randomly cut. Usually a value between 25% and 50% works well. We’ll use 25%. To do that, we pass in 0.25.
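
The effect of dropout can be sketched in a few lines of NumPy. This is an illustration under simple assumptions rather than the Keras implementation: the function name dropout, the seed, and the input of all ones are made up. Like Keras, it scales the surviving values up so the overall magnitude stays roughly the same.

```python
import numpy as np

def dropout(activations, rate, rng):
    # Keep each value with probability (1 - rate), zero out the rest
    keep_mask = rng.random(activations.shape) >= rate
    # Scale the survivors so the expected total stays the same
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(seed=0)
activations = np.ones(10_000)
dropped = dropout(activations, rate=0.25, rng=rng)

print((dropped == 0).mean())  # close to 0.25
print(dropped.mean())         # close to 1.0
```

Dropout is only active while training. When the model makes predictions, all connections are used.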

To compile the neural network, we’ll call model.compile(). This function takes several parameters. First, we’ll pass in the loss function, categorical crossentropy, which suits classification into several classes with labels in the 10-element array form we prepared earlier. Next, we need to tell Keras which optimization algorithm we’ll use to train the neural network. For image data like this, a good starting point is an optimization algorithm called Adam, short for Adaptive Moment Estimation. Finally, we’ll ask Keras to report accuracy as a metric during training.

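from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
# Create a model and add layers
model = Sequential()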
model.add(Conv2D(32, (3, 3), padding='same', input_shape=(32, 32, 3), activation="relu"))
model.add(Conv2D(32, (3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same', activation="relu"))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(10, activation="softmax"))
# Compile the model
model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)
# Print a summary of the model
model.summary()

The summary now looks like this:

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_4 (Conv2D)            (None, 32, 32, 32)        896       
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 30, 30, 32)        9248      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32)        0         
_________________________________________________________________
dropout (Dropout)            (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 15, 15, 64)        18496     
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 13, 13, 64)        36928     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64)          0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 6, 6, 64)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 2304)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 512)               1180160   
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                5130      
=================================================================
Total params: 1,250,858
Trainable params: 1,250,858
Non-trainable params: 0
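
The categorical crossentropy loss passed to model.compile is the number that training tries to make smaller. As a rough sketch of how it behaves, the NumPy function below computes it for one image. The function, the clipping constant eps, and the two example predictions are assumptions made for illustration.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip so that log(0) can never occur
    y_pred = np.clip(y_pred, eps, 1.0)
    # Only the predicted probability of the true class contributes
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# One-hot label for class 6 ("Frog")
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 0, 0, 0], dtype="float32")

# A confident, correct prediction versus a completely unsure one
confident = np.full(10, 0.01)
confident[6] = 0.91
unsure = np.full(10, 0.1)

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident)  # about 0.09
print(loss_unsure)     # about 2.30
```

The more probability the network puts on the correct class, the closer the loss gets to zero.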

Train the model

In Keras we use the model.fit() function to train the model. This function takes several parameters. The first two parameters are the training data set and the expected labels for the training data set. We already loaded those in our code as x_train and y_train. Next, we need to pass in a batch size. The batch size is how many images we want to feed into the network at once during training. If we set the number too low, training will take a long time and might not ever finish. If we set the number too high, we’ll run out of memory on our computer. Typical batch sizes are between 32 and 128 images, but feel free to experiment. Next, we need to decide how many times we want to go through our training data set during the process. One full pass through the entire training data set is called an epoch. The more passes through the data we do, the more chance the neural network has to learn, but the longer the training process will take. We also pass the test data as validation_data so that Keras reports the accuracy on images the network has not trained on after each epoch, and we set shuffle=True so the training images are presented in a different order in each epoch.

# Train the model
model.fit(
    x_train,
    y_train,
    batch_size=32,
    epochs=30,
    validation_data=(x_test, y_test),
    shuffle=True
)
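
To get a feel for how batch size and epochs interact, the short sketch below counts how many weight updates a training run performs. The helper name steps_per_epoch is made up for this example. The figure of 50,000 is the size of the CIFAR-10 training set.

```python
import math

def steps_per_epoch(num_images, batch_size):
    # The last batch may be smaller, so round up
    return math.ceil(num_images / batch_size)

num_training_images = 50_000  # size of the CIFAR-10 training split
epochs = 30

for batch_size in (32, 64, 128):
    steps = steps_per_epoch(num_training_images, batch_size)
    print(batch_size, steps, steps * epochs)
# 32 1563 46890
# 64 782 23460
# 128 391 11730
```

A smaller batch size means more, smaller updates per epoch, which is one reason training takes longer when the number is set low.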

Save neural network structure

We save the structure separately from the weights because you’ll often train the same neural network multiple times with different settings or different training datasets. It’s convenient to be able to load different sets of weights using the same neural network structure. There are lots of ways to do this in Python, but here is one easy way to do it using the pathlib library.

from pathlib import Path
# Save neural network structure
model_structure = model.to_json()
f = Path("model_structure.json")
f.write_text(model_structure)

Save neural network’s trained weights

We just need to call model.save_weights() and pass in the file name. I’m going to call the file model_weights.h5. The data that gets saved here is in a binary format called HDF5, which is designed for saving and loading large binary files efficiently. By convention we use the h5 file extension to indicate the format of the file. Note that recent Keras releases (Keras 3) expect the weights file name to end in .weights.h5, so adjust the name if you see an error about the file extension.

# Save neural network's trained weights
model.save_weights("model_weights.h5")

Making predictions using the trained model

Now, we’re ready to load the neural network. First, we need to load the structure of the network itself. One option is to load the neural network structure from a file. Here in the file list, we already have a file called model_structure.json. This file contains the list of layers in our neural network.

from keras.models import model_from_json
from pathlib import Path
from keras.preprocessing import image
import numpy as np
# These are the CIFAR10 class labels from the training data (in order from 0 to 9)
class_labels = [
    "Plane",
    "Car",
    "Bird",
    "Cat",
    "Deer",
    "Dog",
    "Frog",
    "Horse",
    "Boat",
    "Truck"
]
# Load the json file that contains the model's structure
f = Path("model_structure.json")
model_structure = f.read_text()
# Recreate the Keras model object from the json data
model = model_from_json(model_structure)
# Re-load the model's trained weights
model.load_weights("model_weights.h5")
# Load an image file to test, resizing it to 32x32 pixels (as required by this model)
img = image.load_img("frog.png", target_size=(32, 32))
# Convert the image to a numpy array and scale it to the 0-to-1 range used during training
image_to_test = image.img_to_array(img) / 255
# Add a fourth dimension to the image (since Keras expects a list of images, not a single image)
list_of_images = np.expand_dims(image_to_test, axis=0)
# Make a prediction using the model
results = model.predict(list_of_images)
# Since we are only testing one image, we only need to check the first result
single_result = results[0]
# We will get a likelihood score for all 10 possible classes. Find out which class had the highest score.
most_likely_class_index = int(np.argmax(single_result))
class_likelihood = single_result[most_likely_class_index]
# Get the name of the most likely class
class_label = class_labels[most_likely_class_index]
# Print the result
print("This image is a {} - Likelihood: {:.2f}".format(class_label, class_likelihood))
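
Sometimes it is useful to look at more than just the single best guess. The sketch below lists the three most likely classes. The scores in single_result are made up to stand in for a real prediction, so that the example runs without a trained model.

```python
import numpy as np

class_labels = [
    "Plane", "Car", "Bird", "Cat", "Deer",
    "Dog", "Frog", "Horse", "Boat", "Truck",
]

# Made-up likelihood scores standing in for results[0] from model.predict
single_result = np.array([0.01, 0.02, 0.05, 0.10, 0.02, 0.03, 0.70, 0.04, 0.01, 0.02])

# Sort class indexes from highest to lowest score and keep the first three
top_indices = np.argsort(single_result)[::-1][:3]
top_three = [(class_labels[i], float(single_result[i])) for i in top_indices]

for label, score in top_three:
    print("{} - Likelihood: {:.2f}".format(label, score))
# Frog - Likelihood: 0.70
# Cat - Likelihood: 0.10
# Bird - Likelihood: 0.05
```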