Train deep learning neural network – MATLAB trainNetwork

Image data, specified as one of the following:

Data type: Datastore (ImageDatastore)

Description: Datastore of images saved on disk.

Example usage: Train image classification neural network with
images saved on disk, where the images are the same
size.

When the images are different sizes, use an
AugmentedImageDatastore object.

ImageDatastore objects support image classification tasks only. To
use image datastores for regression networks, create
a transformed or combined datastore that contains
the images and responses using the transform and combine functions,
respectively.

Data type: Datastore (AugmentedImageDatastore)

Description: Datastore that applies random affine geometric
transformations, including resizing, rotation,
reflection, shear, and translation.

Example usage:

  • Train image classification neural network
    with images saved on disk, where the images are
    different sizes.

  • Train image classification neural network
    and generate new data using augmentations.

Data type: Datastore (TransformedDatastore)

Description: Datastore that transforms batches of data read from
an underlying datastore using a custom transformation
function.

Example usage:

  • Train image regression neural
    network.

  • Train networks with multiple inputs.

  • Transform datastores with outputs not
    supported by
    trainNetwork.

  • Apply custom transformations to datastore
    output.

Data type: Datastore (CombinedDatastore)

Description: Datastore that reads from two or more underlying
datastores.

Example usage:

  • Train image regression neural
    network.

  • Train networks with multiple inputs.

  • Combine predictors and responses from
    different data sources.

Data type: Datastore (PixelLabelImageDatastore, Computer Vision Toolbox)

Description: Datastore that applies identical affine geometric
transformations to images and corresponding pixel labels.

Example usage: Train neural network for semantic segmentation.

Data type: Datastore (RandomPatchExtractionDatastore, Image Processing Toolbox)

Description: Datastore that extracts pairs of random patches from
images or pixel label images and optionally applies
identical random affine geometric transformations to the pairs.

Example usage: Train neural network for object detection.

Data type: Datastore (DenoisingImageDatastore, Image Processing Toolbox)

Description: Datastore that applies randomly generated Gaussian noise.

Example usage: Train neural network for image denoising.

Data type: Datastore (custom mini-batch datastore)

Description: Custom datastore that returns mini-batches of data.

Example usage: Train neural network using data in a format
that other datastores do not support.

For details, see Develop Custom Mini-Batch Datastore.

Data type: Numeric array

Description: Images specified as a numeric array. If you specify
images as a numeric array, then you must also specify
the responses argument.

Example usage: Train neural network using data that fits in memory
and does not require additional processing like augmentation.

Data type: Table

Description: Images specified as a table. If you specify images as
a table, then you can also specify which columns contain
the responses using the responses argument.

Example usage: Train neural network using data stored in a table.

For networks with multiple inputs, the datastore must be a TransformedDatastore or CombinedDatastore object.

Tip

For sequences of images, for example video data, use the
sequences input argument.

Datastore

Datastores read mini-batches of images and responses. Datastores are
best suited when you have data that does not fit in memory or when you
want to apply augmentations or transformations to the data.

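The following datastores are directly compatible with
trainNetwork for image data: ImageDatastore,
AugmentedImageDatastore, TransformedDatastore, CombinedDatastore,
PixelLabelImageDatastore, RandomPatchExtractionDatastore,
DenoisingImageDatastore, and custom mini-batch datastores.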

For example, you can create an image datastore using the imageDatastore function
and use the names of the folders containing the images as labels by
setting the 'LabelSource' option to
'foldernames'. Alternatively, you can specify the
labels manually using the Labels property of the image datastore.
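
As a minimal sketch of the folder-name labeling described above, the following code writes a few random images into a temporary folder tree and reads them back with labels. The class names cats and dogs, the image size, and the file names are placeholders chosen for illustration, not part of the original documentation.

```matlab
% Build a small labeled folder tree in a temporary location.
% The class names and random pixel data are illustrative assumptions.
rootDir = tempname;
classNames = {'cats', 'dogs'};
for k = 1:numel(classNames)
    classDir = fullfile(rootDir, classNames{k});
    mkdir(classDir);
    for n = 1:3
        imwrite(rand(28, 28, 3), fullfile(classDir, sprintf('img%d.png', n)));
    end
end

% Label each image with the name of the folder that contains it.
imds = imageDatastore(rootDir, ...
    'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');

% The Labels property holds one categorical label per file and can
% also be assigned manually.
labels = imds.Labels;
numFiles = numel(imds.Files);
```

Here labels is a categorical vector with one entry per file, so the datastore can be passed to trainNetwork for classification without a separate responses argument.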

Note that ImageDatastore objects allow for batch
reading of JPG or PNG image files using prefetching. If you use a custom
function for reading the images, then ImageDatastore
objects do not prefetch.

Tip

Use augmentedImageDatastore for efficient preprocessing of images for deep
learning, including image resizing.

Do not use the ReadFcn option of the imageDatastore
function for preprocessing or resizing, as this option is usually significantly
slower.
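
The resizing tip above can be sketched as follows, assuming Deep Learning Toolbox is available. The in-memory array stands in for real image data, and the sizes, labels, and variable names are illustrative only.

```matlab
% Eight random RGB images of size 40-by-60 with two placeholder classes.
X = rand(40, 60, 3, 8);
Y = categorical([1; 2; 1; 2; 1; 2; 1; 2]);

% Resize every image to 32-by-32 as it is read, instead of using a
% custom read function.
augimds = augmentedImageDatastore([32 32], X, Y);
augimds.MiniBatchSize = 4;

% Each read returns a table with one row per observation.
batch = read(augimds);
firstColumn = batch{:, 1};      % cell array of resized images
firstImage = firstColumn{1};
outputSize = size(firstImage);
```

An imageDataAugmenter object can additionally be supplied through the 'DataAugmentation' option when random transformations are wanted during training.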

You can use other built-in datastores for training deep learning
networks by using the transform and combine functions. These functions can convert the data
read from datastores to the format required by
trainNetwork.
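
A rough sketch of how combine and transform can produce predictor and response pairs for image regression is shown below. It assumes arrayDatastore is available (R2020b or later); the data, the rescaling to the range [-1, 1], and all variable names are illustrative assumptions.

```matlab
% Ten random grayscale images and one numeric response per image.
X = rand(28, 28, 1, 10);
Y = rand(10, 1);

% Wrap each array in a datastore that returns one observation per read.
dsX = arrayDatastore(X, 'IterationDimension', 4);
dsY = arrayDatastore(Y);

% Combine predictors and responses into a two-column cell array output.
dsTrain = combine(dsX, dsY);

% Apply a custom transformation to the predictors only: rescale pixel
% values from [0, 1] to [-1, 1] and pass the responses through unchanged.
rescaleFcn = @(data) [cellfun(@(im) 2*im - 1, data(:, 1), ...
    'UniformOutput', false), data(:, 2)];
dsScaled = transform(dsTrain, rescaleFcn);

sample = read(dsScaled);   % 1-by-2 cell array: {image, response}
```

The first column of sample holds the predictor and the second holds the response, which is the layout expected for a network with a single input layer.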

The required format of the datastore output depends on the network
architecture.

Network architecture: Single input layer

Datastore output: Table or cell array with two columns.

The first and second columns specify the predictors and responses,
respectively.

Table elements must be scalars, row vectors, or
1-by-1 cell arrays containing a numeric array.

Custom mini-batch datastores must output tables.

Example output: Table for network with one input and one
output:

data = read(ds)
data =

  4×2 table

        Predictors        Response
    __________________    ________

    {224×224×3 double}       2
    {224×224×3 double}       7
    {224×224×3 double}       9
    {224×224×3 double}       9

Cell array for network with one input and one
output:

data = read(ds)
data =

  4×2 cell array

    {224×224×3 double}    {[2]}
    {224×224×3 double}    {[7]}
    {224×224×3 double}    {[9]}
    {224×224×3 double}    {[9]}

Network architecture: Multiple input layers

Datastore output: Cell array with (numInputs + 1) columns, where
numInputs is the number of network inputs.

The first numInputs columns specify
the predictors for each input and the last column specifies the
responses.

The order of inputs is given by the
InputNames property of the layer graph
layers.

Example output: Cell array for network with two inputs and one
output:

data = read(ds)
data =

  4×3 cell array

    {224×224×3 double}    {128×128×3 double}    {[2]}
    {224×224×3 double}    {128×128×3 double}    {[2]}
    {224×224×3 double}    {128×128×3 double}    {[9]}
    {224×224×3 double}    {128×128×3 double}    {[9]}

The format of the predictors depends
on the type of data.

Data: 2-D images

Format: h-by-w-by-c numeric array, where h, w, and c are
the height, width, and number of channels of the
images, respectively.

Data: 3-D images

Format: h-by-w-by-d-by-c numeric array, where h, w, d, and
c are the height, width, depth,
and number of channels of the images,
respectively.

For predictors returned in tables, the elements must contain a numeric
scalar, a numeric row vector, or a 1-by-1 cell array containing the
numeric array.

The format of the responses depends on the type of task.

Task: Image classification

Response format: Categorical scalar

Task: Image regression

Response format:

  • Numeric scalar

  • Numeric vector

  • 3-D numeric array representing a 2-D
    image

  • 4-D numeric array representing a 3-D
    image

For responses returned in tables, the elements must be a categorical
scalar, a numeric scalar, a numeric row vector, or a 1-by-1 cell array
containing a numeric array.
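
The multiple-input layout described above, with numInputs + 1 columns, can be sketched by combining one datastore per network input with a datastore of responses. This assumes arrayDatastore is available (R2020b or later); the image sizes mirror the example output above, and the data and variable names are illustrative.

```matlab
% Four observations for a hypothetical network with two image inputs.
X1 = rand(224, 224, 3, 4);
X2 = rand(128, 128, 3, 4);
labels = categorical([2; 2; 9; 9]);

% One datastore per input, plus one for the responses.
ds1 = arrayDatastore(X1, 'IterationDimension', 4);
ds2 = arrayDatastore(X2, 'IterationDimension', 4);
dsLabels = arrayDatastore(labels);

% Predictors come first, in the order of the network inputs,
% and the responses go in the last column.
dsMulti = combine(ds1, ds2, dsLabels);

data = read(dsMulti);   % 1-by-3 cell array: {input 1, input 2, response}
```

Each predictor is an h-by-w-by-c numeric array and the response is a categorical scalar, matching the classification format listed above.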

For more information, see Datastores for Deep Learning.

Numeric Array

For data that fits in memory and does not require additional
processing like augmentation, you can specify a data set of images as a
numeric array. If you specify images as a numeric array, then you must
also specify the responses argument.

The size and shape of the numeric array depends on the type of image
data.

Data: 2-D images

Format: h-by-w-by-c-by-N numeric array, where h, w, and c are
the height, width, and number of channels of the
images, respectively, and N is
the number of images.

Data: 3-D images

Format: h-by-w-by-d-by-c-by-N numeric array, where h, w, d, and
c are the height, width, depth,
and number of channels of the images, respectively,
and N is the number of images.
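
As a minimal sketch of the numeric array format for 2-D images, the code below trains a very small classifier on random data, assuming Deep Learning Toolbox is available. The layer stack, training options, and data are placeholders chosen to keep the example short, not a recommended architecture.

```matlab
% 99 random grayscale images in an h-by-w-by-c-by-N array.
X = rand(28, 28, 1, 99, 'single');

% Responses are required with numeric array input: one label per image,
% with every class guaranteed to appear.
Y = categorical(repmat([1; 2; 3], 33, 1));

% Illustrative layer stack for three classes.
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(3, 8, 'Padding', 'same')
    reluLayer
    fullyConnectedLayer(3)
    softmaxLayer
    classificationLayer];

options = trainingOptions('sgdm', ...
    'MaxEpochs', 1, ...
    'MiniBatchSize', 33, ...
    'Verbose', false);

net = trainNetwork(X, Y, layers, options);
```

The last dimension of X must match the number of elements in Y, because each slice along that dimension is one observation.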

Table

As an alternative to datastores or numeric arrays, you can also
specify images and responses in a table. If you specify images as a
table, then you can also specify which columns contain the responses
using the responses argument.

When specifying images and responses in a table, each row in the table
corresponds to an observation.

For image input, the predictors must be in the first column of the
table, specified as one of the following:

  • Absolute or relative file path to an image, specified as a
    character vector

  • 1-by-1 cell array containing an
    h-by-w-by-c
    numeric array representing a 2-D image, where
    h, w, and
    c correspond to the height, width,
    and number of channels of the image, respectively

The format of the responses depends on the type of task.

Task: Image classification

Response format: Categorical scalar

Task: Image regression

Response format:

  • Numeric scalar

  • Two or more columns of scalar values

  • 1-by-1 cell array containing an
    h-by-w-by-c
    numeric array representing a 2-D image

  • 1-by-1 cell array containing an
    h-by-w-by-d-by-c
    numeric array representing a 3-D image

For neural networks with image input, if you do not specify
responses, then the function, by default, uses
the first column of tbl for the predictors and the
subsequent columns as responses.
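
A sketch of the table format for image regression follows, assuming Deep Learning Toolbox is available. The column names Image and Angle, the random data, and the tiny layer stack are hypothetical and serve only to show the layout.

```matlab
% Twenty observations: each row holds one image and one numeric response.
N = 20;
images = cell(N, 1);
for i = 1:N
    images{i} = rand(28, 28, 1);   % 1-by-1 cell per row containing an h-by-w-by-c array
end
angle = 90 * rand(N, 1);

% Predictors must be in the first column of the table.
tbl = table(images, angle, 'VariableNames', {'Image', 'Angle'});

% Illustrative regression network with a single scalar output.
layers = [
    imageInputLayer([28 28 1])
    fullyConnectedLayer(1)
    regressionLayer];

options = trainingOptions('sgdm', ...
    'MaxEpochs', 1, ...
    'MiniBatchSize', 10, ...
    'Verbose', false);

% Name the response column explicitly through the responses argument.
net = trainNetwork(tbl, 'Angle', layers, options);
```

Because Angle is the only column after the predictors, omitting the responses argument would select the same column by default.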

Tip

  • If the predictors or the responses contain
    NaNs, then they are propagated
    through the network during training. In these cases, the
    training usually fails to converge.

  • For regression tasks, normalizing the responses often
    helps to stabilize and speed up training.
    For more information, see Train Convolutional Neural Network for Regression.

  • To input complex-valued data into a network, the SplitComplexInputs option of the input layer must be 1.
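
The first two tips can be sketched with base MATLAB functions. The data, the injected NaN, and the choice of z-score normalization are illustrative assumptions; other normalization schemes may suit a given task better.

```matlab
% Illustrative predictors and regression responses.
X = rand(28, 28, 1, 10);
Y = [10; 20; 30; 40; 50; 60; 70; 80; 90; 100];

% Inject a NaN to show the check; real data would be inspected as is.
X(3, 3, 1, 2) = NaN;

% Detect NaNs in the predictors or responses before training.
hasNaN = any(isnan(X), 'all') || any(isnan(Y));

% Normalize the responses to zero mean and unit standard deviation.
mu = mean(Y);
sigma = std(Y);
Ynorm = (Y - mu) / sigma;

% Predictions made on the normalized scale can be mapped back with
% Ypred = YpredNorm * sigma + mu.
```

Keeping mu and sigma from the training set is what allows predictions to be converted back to the original units.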

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | table
Complex Number Support: Yes