Build and Train a Convolutional Neural Network with TensorFlow's Keras API

We’ll be working with the image data we prepared in the last episode. Be sure that you have gone through that episode first to get and prepare the data, and that you still have all of the imports we brought in last time, since we’ll continue to use them here.

In this episode, we’ll demonstrate how to build a simple convolutional neural network (CNN) and train it on images of cats and dogs using TensorFlow’s Keras API.

Build a simple CNN

To build the CNN, we’ll use a Keras Sequential model. Recall that we first introduced the Sequential model in an earlier episode.

model = Sequential([
    Conv2D(filters=32, kernel_size=(3, 3), activation='relu', padding='same', input_shape=(224, 224, 3)),
    MaxPool2D(pool_size=(2, 2), strides=2),
    Conv2D(filters=64, kernel_size=(3, 3), activation='relu', padding='same'),
    MaxPool2D(pool_size=(2, 2), strides=2),
    Flatten(),
    Dense(units=2, activation='softmax')
])

The first layer in the model is a 2-dimensional convolutional layer. This layer will have 32 output filters, each with a kernel size of 3x3, and we’ll use the relu activation function.

Note that the choice of 32 output filters is arbitrary, while a kernel size of 3x3 is a very common choice. You can experiment with different values for these parameters.

We enable zero-padding by specifying padding='same', so that the output of the layer keeps the same height and width as its input.
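
To make the effect of the padding setting concrete, here is a minimal sketch of how the spatial output size of a convolution can be worked out along one dimension. The helper name conv_output_size is hypothetical (it is not part of Keras); the formulas are the standard ones for 'same' and 'valid' padding.

```python
import math

def conv_output_size(n, kernel, stride=1, padding="same"):
    """Spatial output size of a convolution along one dimension.

    Hypothetical helper for illustration only, not a Keras function.
    """
    if padding == "same":
        # Zeros are added around the input, so the size depends only on the stride.
        return math.ceil(n / stride)
    if padding == "valid":
        # No padding: the kernel has to fit entirely inside the input.
        return (n - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding}")

same_size = conv_output_size(224, kernel=3, padding="same")
valid_size = conv_output_size(224, kernel=3, padding="valid")
print(same_size, valid_size)  # 224 222
```

With padding='same' a 224x224 image stays 224x224 after the 3x3 convolution, whereas without padding it would shrink to 222x222.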

On the first layer only, we also specify the input_shape, which is the shape of our data. Our images are 224 pixels high and 224 pixels wide and have 3 color channels: RGB. This gives us an input_shape of (224,224,3).

We then add a max pooling layer to pool and reduce the dimensionality of the data. To gain a fundamental understanding of max pooling, zero padding, convolutional filters, and convolutional neural networks, check out the Deep Learning Fundamentals course.
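
As a rough illustration of what a 2x2 max pooling layer with a stride of 2 does, the sketch below applies it by hand to a tiny, made-up 4x4 feature map. The function max_pool_2x2 and the sample values are assumptions for this example only; Keras performs the same operation per channel on the real data.

```python
def max_pool_2x2(grid):
    """Apply 2x2 max pooling with stride 2 to a 2D list with even height and width."""
    pooled = []
    for r in range(0, len(grid), 2):
        row = []
        for c in range(0, len(grid[0]), 2):
            # Collect the four values in the current 2x2 window and keep the largest.
            window = [grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1]]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Made-up 4x4 feature map for illustration.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [7, 2, 9, 4],
    [0, 1, 3, 5],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[6, 2], [7, 9]]
```

Each dimension is halved, which is why a 224x224 input to the pooling layer becomes 112x112.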

We follow this by adding another convolutional layer with the same specs as the earlier one, except that this second Conv2D layer has 64 filters. The choice of 64 here is again arbitrary, but it is common to use more filters in later layers than in earlier ones. This layer is again followed by the same type of MaxPool2D layer.

We then Flatten the output from the last max pooling layer and pass it to a Dense layer. This Dense layer is the output layer of the network, and so it has 2 nodes, one for cat and one for dog. We’ll use the softmax activation function on our output so that the output for each sample is a probability distribution over the outputs of cat and dog.
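
To see how softmax turns the two raw output values into a probability distribution, here is a small standalone sketch. The raw scores used are made up for illustration, and the softmax function is written by hand rather than taken from Keras.

```python
import math

def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum first keeps exp() numerically stable.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores from the two output nodes.
probs = softmax([2.0, 0.5])
print([round(p, 4) for p in probs])  # [0.8176, 0.1824]
```

The larger raw score receives the larger probability, and the two values always sum to 1.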

We can check out a summary of the model by calling model.summary().

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 224, 224, 32)      896
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 112, 112, 32)      0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 112, 112, 64)      18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 56, 56, 64)        0
_________________________________________________________________
flatten (Flatten)            (None, 200704)            0
_________________________________________________________________
dense (Dense)                (None, 2)                 401410
=================================================================
Total params: 420,802
Trainable params: 420,802
Non-trainable params: 0
_________________________________________________________________
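
The parameter counts in the summary can be checked by hand. The sketch below assumes the usual counting rule: each convolutional filter has one weight per kernel position per input channel plus one bias, and each dense unit has one weight per input plus one bias. The helper names conv2d_params and dense_params are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter (kernel_h * kernel_w * in_channels) plus one bias, times filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input plus one bias, for every unit."""
    return (inputs + 1) * units

conv1 = conv2d_params(3, 3, in_channels=3, filters=32)    # 896
conv2 = conv2d_params(3, 3, in_channels=32, filters=64)   # 18496
flat = 56 * 56 * 64                                       # 200704 values after Flatten
dense = dense_params(flat, units=2)                       # 401410
total = conv1 + conv2 + dense

print(conv1, conv2, flat, dense, total)  # 896 18496 200704 401410 420802
```

The pooling and Flatten layers have no learnable parameters, which is why they show 0 in the summary.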

Now that the model is built, we compile the model using the Adam optimizer with a learning rate of 0.0001, a loss of categorical_crossentropy, and we’ll look at accuracy as our performance metric. Again, if you need a fundamental understanding of any of these topics, check out the Deep Learning Fundamentals course.

model.compile(optimizer=Adam(learning_rate=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])
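
For intuition about what the categorical_crossentropy loss measures, here is a hand-written sketch for a single sample. It assumes one-hot labels and uses made-up predictions; the eps clipping value is an assumption for this example, and Keras additionally averages the loss over the samples in a batch.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and a predicted distribution (one sample)."""
    # Clip predictions away from 0 and 1 so that log() stays finite.
    clipped = [min(max(p, eps), 1 - eps) for p in y_pred]
    return -sum(t * math.log(p) for t, p in zip(y_true, clipped))

# Hypothetical one-hot label with two different hypothetical predictions.
good = categorical_crossentropy([1, 0], [0.9, 0.1])
bad = categorical_crossentropy([1, 0], [0.2, 0.8])
print(round(good, 4), round(bad, 4))  # 0.1054 1.6094
```

The loss is small when the model assigns high probability to the correct class and grows as that probability shrinks.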

Note that when we have only two classes, we could instead configure our output layer to have only one output, rather than two, and use binary_crossentropy as our loss, rather than categorical_crossentropy. For a two-class problem the two setups are mathematically equivalent, so neither is expected to perform better than the other.

With binary_crossentropy, however, the last layer would need to use sigmoid, rather than softmax, as its activation function.
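
One way to see why the two setups line up is that, for two classes, the softmax probability of one class equals the sigmoid of the difference between the two raw scores. The sketch below checks this numerically with made-up scores; the variable names are hypothetical.

```python
import math

def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Squash a single raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical raw scores for the two output nodes.
z_cat, z_dog = 0.3, 1.7

p_dog_softmax = softmax([z_cat, z_dog])[1]   # two-output model
p_dog_sigmoid = sigmoid(z_dog - z_cat)       # one-output model with score z_dog - z_cat

print(abs(p_dog_softmax - p_dog_sigmoid) < 1e-12)  # True
```

A single sigmoid output therefore carries the same information as two softmax outputs, provided the labels are also given as a single 0 or 1 per sample rather than as one-hot vectors.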