2-D convolutional layer - MATLAB

A 2-D convolutional layer applies sliding convolutional filters
to 2-D input. The layer convolves the input by moving the filters along the input
vertically and horizontally, computing the dot product of the weights and the input,
and then adding a bias term.

The dimensions that the layer convolves over depend on the layer input:

For 2-D image input (data with four dimensions corresponding to pixels in two spatial dimensions, the channels, and the observations), the layer convolves over the spatial dimensions.
For 2-D image sequence input (data with five dimensions corresponding to the pixels in two spatial dimensions, the channels, the observations, and the time steps), the layer convolves over the two spatial dimensions.
For 1-D image sequence input (data with four dimensions corresponding to the
pixels in one spatial dimension, the channels, the observations, and the time
steps), the layer convolves over the spatial and time dimensions.
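
One way to make these three layouts concrete is to label example arrays with dlarray data formats, where "S" is spatial, "C" is channel, "B" is batch (observations), and "T" is time. This is only a sketch: the array sizes are arbitrary assumptions, and the code requires Deep Learning Toolbox.

```matlab
% Example sizes are assumptions chosen for illustration only.
X2d    = dlarray(rand(28,28,3,16),    "SSCB");   % 2-D images: height, width, channels, observations
X2dSeq = dlarray(rand(28,28,3,16,10), "SSCBT");  % 2-D image sequences: adds 10 time steps
X1dSeq = dlarray(rand(100,3,16,10),   "SCBT");   % 1-D image sequences: one spatial dimension

% Show the dimension labels attached to each array.
disp(dims(X2d))
disp(dims(X2dSeq))
disp(dims(X1dSeq))
```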

The convolutional layer consists of various components.

Filters and Stride

A convolutional layer consists of neurons that connect to subregions of the input images or
the outputs of the previous layer. The layer learns the features localized by these regions
while scanning through an image. When creating a layer using the convolution2dLayer function, you can specify the size of these regions using
the filterSize input argument.

For each region, the trainNetwork function computes a dot product of the
weights and the input, and then adds a bias term. A set of weights that is applied to a
region in the image is called a filter. The filter moves along the
input image vertically and horizontally, repeating the same computation for each region. In
other words, the filter convolves the input.
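
The computation described above can be sketched by hand for a single-channel input and a single filter. The input values, weights, and bias here are made up for illustration, and the loop assumes a stride of 1 and no padding. It is not how the software implements the layer internally.

```matlab
X = magic(5);      % illustrative 5-by-5 single-channel input
W = ones(3,3);     % illustrative 3-by-3 filter weights
b = 1;             % illustrative bias term

[h, w] = size(W);
outH = size(X,1) - h + 1;   % output height for stride 1, no padding
outW = size(X,2) - w + 1;   % output width for stride 1, no padding
Y = zeros(outH, outW);

for i = 1:outH
    for j = 1:outW
        region = X(i:i+h-1, j:j+w-1);         % subregion currently under the filter
        Y(i,j) = dot(region(:), W(:)) + b;    % dot product of weights and input, plus bias
    end
end
```

Each element of Y is the response of the same filter at one position of the input.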

This image shows a 3-by-3 filter scanning through the input. The lower map represents the
input and the upper map represents the output.

The step size with which the filter moves is called a stride. You can
specify the step size with the 'Stride' name-value pair argument. The
local regions that the neurons connect to can overlap, depending on the
filterSize and 'Stride' values.
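
To see how the filter size and stride together determine overlap, the following sketch lists the input columns that a filter covers at each horizontal position. The input width, filter width, and stride are hypothetical values, and no padding is assumed.

```matlab
inputWidth  = 7;   % hypothetical input width
filterWidth = 3;   % hypothetical filter width
stride      = 2;   % hypothetical horizontal stride

% Column index at which each filter application starts.
starts = 1:stride:(inputWidth - filterWidth + 1);

% Columns covered by each application, one row per filter position.
covered = starts(:) + (0:filterWidth-1);

% Number of columns shared by two neighboring regions.
overlap = max(filterWidth - stride, 0);

disp(covered)
```

With these values, neighboring regions share one column. A stride equal to the filter width would give no overlap.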

This image shows a 3-by-3 filter scanning through the input with a stride of 2. The lower
map represents the input and the upper map represents the output.

The number of weights in a filter is h * w * c, where h is the height of the
filter, w is its width, and c is the number of channels
in the input. For example, if the input is a color image, the number of color channels is 3.
The number of filters determines the number of channels in the output of a convolutional
layer. Specify the number of filters using the numFilters argument with
the convolution2dLayer function.
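
As a minimal sketch, the following creates a layer with a given filter size and number of filters and computes the number of weights per filter. The specific sizes and the three-channel input are assumptions for the example. The code requires Deep Learning Toolbox.

```matlab
filterSize  = [5 5];   % filter height and width (example values)
numFilters  = 8;       % number of filters, and so of output channels (example value)
numChannels = 3;       % assumed number of input channels, as for an RGB image

layer = convolution2dLayer(filterSize, numFilters);

% Number of weights in one filter: h * w * c.
weightsPerFilter = prod(filterSize) * numChannels;

disp(layer.FilterSize)
disp(weightsPerFilter)
```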

Dilated Convolutions

A dilated convolution is a convolution in which the filters are expanded by spaces inserted
between the elements of the filter. Specify the dilation factor using the
'DilationFactor' property.

Use dilated convolutions to increase the receptive field (the area of the input which the
layer can see) of the layer without increasing the number of parameters or
computation.

The layer expands the filters by inserting zeros between the filter elements. The dilation
factor determines the step size for sampling the input or, equivalently, the upsampling factor
of the filter. It corresponds to an effective filter size of
(Filter Size - 1) .* Dilation Factor + 1. For example, a 3-by-3 filter with the
dilation factor [2 2] is equivalent to a 5-by-5 filter with zeros between
the elements.
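
The effective filter size can be checked with a short calculation. The sketch below also builds the equivalent zero-filled filter explicitly. The weight values are made up for illustration.

```matlab
filterSize     = [3 3];
dilationFactor = [2 2];

% Effective size of the dilated filter.
effectiveSize = (filterSize - 1) .* dilationFactor + 1;

% Build the equivalent filter by spreading the weights out with zeros in between.
W  = [1 2 3; 4 5 6; 7 8 9];   % illustrative weights
Wd = zeros(effectiveSize);
Wd(1:dilationFactor(1):end, 1:dilationFactor(2):end) = W;

disp(Wd)
```

The dilated filter spans a 5-by-5 region but still has only nine nonzero weights, which is why the receptive field grows without adding parameters.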

This image shows a 3-by-3 filter dilated by a factor of two scanning through the input.
The lower map represents the input and the upper map represents the output.

Feature Maps

As a filter moves along the input, it uses the same set of
weights and the same bias for the convolution, forming a feature map. Each
feature map is the result of a convolution using a different set of weights and a different
bias. Hence, the number of feature maps is equal to the number of filters. The total number of
parameters in a convolutional layer is
(h*w*c + 1) * Number of Filters, where the 1 accounts for the
bias of each filter.
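
One way to confirm the parameter count is to allocate arrays with the shapes of the weights and biases and count their elements. The layer dimensions below are hypothetical. The layout assumed here is an h-by-w-by-c-by-NumFilters weights array and a 1-by-1-by-NumFilters bias array.

```matlab
h = 3; w = 3; c = 16;   % hypothetical filter height, width, and input channels
numFilters = 32;        % hypothetical number of filters

Weights = zeros(h, w, c, numFilters);   % one h-by-w-by-c filter per feature map
Bias    = zeros(1, 1, numFilters);      % one bias per filter

totalParams = numel(Weights) + numel(Bias);

disp(totalParams)
```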

Padding

You can also apply padding to the input image borders vertically and horizontally
using the 'Padding' name-value pair argument. Padding consists of values
appended to the borders of the input to increase its size. By adjusting the padding, you
can control the output size of the layer.
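
The effect of padding on the output size can be illustrated by padding a small array by hand. This sketch assumes zero padding of size 1 on every side, a 3-by-3 filter, and a stride of 1. The input values are arbitrary.

```matlab
X = magic(4);   % illustrative 4-by-4 input
p = 1;          % padding of size 1 on every side (assumed to be zeros)

Xp = zeros(size(X) + 2*p);        % larger array filled with the padding value
Xp(p+1:end-p, p+1:end-p) = X;     % copy the input into the center

% Output size for a 3-by-3 filter with stride 1 applied to the padded input.
outSize = size(Xp) - 3 + 1;

disp(Xp)
disp(outSize)
```

With these settings, the output has the same height and width as the unpadded input.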

This image shows a 3-by-3 filter scanning through the input with padding of size 1. The
lower map represents the input and the upper map represents the output.

Output Size

The output height and width of a convolutional layer are each given by
(Input Size - ((Filter Size - 1)*Dilation Factor + 1) + 2*Padding)/Stride + 1.
This value must be an integer for the whole image to be fully covered. If the combination of these
options does not lead the image to be fully covered, the software by default ignores the
remaining part of the image along the right and bottom edges in the convolution, so the value is effectively rounded down.
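
The formula can be written as a small helper. The function handle name outputSize is hypothetical, and the floor call models the default behavior of ignoring the uncovered part of the image. The same padding is assumed on both sides of each dimension.

```matlab
% Hypothetical helper: output size along one dimension (or elementwise for vectors).
outputSize = @(inputSize, filterSize, dilation, padding, stride) ...
    floor((inputSize - ((filterSize - 1).*dilation + 1) + 2*padding) ./ stride) + 1;

a = outputSize(28, 3, 1, 1, 1);   % 3-by-3 filter, padding 1, stride 1
b = outputSize(32, 5, 1, 2, 2);   % 5-by-5 filter, padding 2, stride 2
c = outputSize(7,  3, 2, 0, 1);   % 3-by-3 filter dilated by 2, no padding

disp([a b c])
```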

Number of Neurons

The product of the output height and width gives the total number of neurons in a feature map,
say Map Size. The total number of neurons (output size) in a
convolutional layer is Map Size*Number of
Filters.

For example, suppose that the input image is a 32-by-32-by-3 color image. For a convolutional
layer with eight filters and a filter size of 5-by-5, the number of weights per
filter is 5 * 5 * 3 = 75, and the total number of parameters in the layer is
(75 + 1) * 8 = 608. If the stride is 2 in each direction and padding of size 2 is
specified, then each feature map is 16-by-16. This is because (32 - 5 + 2 * 2)/2 + 1
= 16.5, which is not an integer, so some of the outermost padding to the right and bottom
of the image is discarded and the size rounds down to 16. Finally, the total number of
neurons in the layer is 16 * 16 * 8 = 2048.
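
The arithmetic in this example can be reproduced step by step. The variable names are chosen for this sketch only.

```matlab
inputSize  = [32 32 3];   % height, width, channels
filterSize = [5 5];
numFilters = 8;
stride     = 2;
padding    = 2;

weightsPerFilter = prod(filterSize) * inputSize(3);       % 5 * 5 * 3
totalParams      = (weightsPerFilter + 1) * numFilters;   % weights plus one bias per filter

rawSize    = (inputSize(1:2) - filterSize + 2*padding) / stride + 1;   % not an integer here
mapSize    = floor(rawSize);              % uncovered right and bottom edges are ignored
numNeurons = prod(mapSize) * numFilters;  % neurons across all feature maps

fprintf('%d weights per filter, %d parameters, %d neurons\n', ...
    weightsPerFilter, totalParams, numNeurons);
```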

Usually, the results from these neurons pass through some form of nonlinearity, such as rectified linear units (ReLU).
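
As a brief sketch, a convolutional layer followed by a ReLU layer can be written as a layer array. The input size and layer settings reuse the example values above and are otherwise arbitrary. The code requires Deep Learning Toolbox.

```matlab
layers = [
    imageInputLayer([32 32 3])                            % 32-by-32 color images
    convolution2dLayer(5, 8, 'Stride', 2, 'Padding', 2)   % eight 5-by-5 filters
    reluLayer                                             % nonlinearity applied to the feature maps
    ];

disp(layers)
```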

Learnable Parameters

You can adjust the learning rates and regularization options
for the layer using name-value pair arguments while defining the convolutional layer. If you
choose not to specify these options, then trainNetwork uses the global
training options defined with the trainingOptions function. For details on
global and layer training options, see Set Up Parameters and Train Convolutional Neural Network.
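
For instance, the layer-specific factors multiply the corresponding global settings. The following sketch uses example values that are not recommendations, and requires Deep Learning Toolbox.

```matlab
% Layer-specific factors (example values).
layer = convolution2dLayer(3, 16, ...
    'WeightLearnRateFactor', 2, ...   % weights learn at twice the global rate
    'BiasLearnRateFactor', 2, ...     % biases learn at twice the global rate
    'WeightL2Factor', 1, ...          % weights use the global L2 regularization factor as is
    'BiasL2Factor', 0);               % no L2 regularization on the biases

% Global training options (example values).
options = trainingOptions('sgdm', ...
    'InitialLearnRate', 0.01, ...
    'L2Regularization', 1e-4);

% Effective initial learning rate for the weights of this layer.
effectiveWeightLearnRate = options.InitialLearnRate * layer.WeightLearnRateFactor;

disp(effectiveWeightLearnRate)
```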

Number of Layers

A convolutional neural network can consist of one or multiple convolutional layers. The number of convolutional layers depends on the amount and complexity of the data.