Loss Functions

A loss function is used to optimize the parameter values in a neural network
model. Loss functions map a set of parameter values for the network onto a
scalar value that indicates how well those parameters accomplish the task the
network is intended to do.

There are several common loss functions provided by theanets. These losses
often measure the squared or
absolute error between a network’s
output and some target or desired output. Other loss functions are designed
specifically for classification models; the cross-entropy is a common loss designed to minimize the
distance between the network’s distribution over class labels and the
distribution that the dataset defines.
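As a rough reference (the exact averaging conventions in theanets may differ),
for a network output \(y\) and target \(t\) with \(n\) components these losses
take the form

\[
\mathrm{MSE}(y, t) = \frac{1}{n}\sum_{i=1}^{n} (y_i - t_i)^2,
\qquad
\mathrm{MAE}(y, t) = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - t_i\rvert,
\qquad
\mathrm{XE}(y, t) = -\sum_{c} t_c \log y_c .
\]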

Models in theanets have at least one loss to optimize during
training. There are default losses for each of the built-in model types, but you
can often override these defaults just by providing a non-default value for the
loss keyword argument when creating your model. For example, to create a
regression model with a mean absolute error loss:

import theanets

net = theanets.Regressor([10, 20, 3], loss='mae')

This creates a regression model that uses the mean absolute error as its loss.

Predefined Losses

These loss functions are available for neural network models.

Loss(target[, weight, weighted, output_name])
    A loss function base class.

CrossEntropy(target[, weight, weighted, …])
    Cross-entropy (XE) loss function for classifiers.

GaussianLogLikelihood([mean_name, …])
    Gaussian Log Likelihood (GLL) loss function.

Hinge(target[, weight, weighted, output_name])
    Hinge loss function for classifiers.

KullbackLeiblerDivergence(target[, weight, …])
    Kullback-Leibler (KL) divergence loss function, computed over probability distributions.

MaximumMeanDiscrepancy([kernel])
    Maximum Mean Discrepancy (MMD) loss function.

MeanAbsoluteError(target[, weight, …])
    Mean-absolute-error (MAE) loss function.

MeanSquaredError(target[, weight, weighted, …])
    Mean-squared-error (MSE) loss function.

Multiple Losses

A theanets model can actually have more than one loss that it attempts to
optimize simultaneously, and these losses can change between successive calls to
train(). In fact, a model has a
losses attribute that’s just a list of theanets.Loss instances; these losses are weighted by a weight
attribute, then summed and combined with any applicable regularizers during each call to train().
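Schematically, each call to train() then minimizes a combined objective roughly
of the form

\[
\mathcal{L} = \sum_i w_i \, \mathcal{L}_i + \text{regularizers},
\]

where \(w_i\) is the weight attribute of the \(i\)-th loss in losses. (The
exact bookkeeping is up to the implementation; this is just the shape of the
computation.)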

Let’s say that you want to optimize a model using both the mean absolute and the
mean squared error. You could first create a regular regression model:

net = theanets.Regressor([10, 20, 3])

and then add a new loss to the model:

net.add_loss('mae')

Then, when you call:

net.train(...)

the model will attempt to minimize the sum of the two losses.

You can specify the relative weight of the two losses by manipulating the
weight attribute of each loss instance. For instance, if you want the MAE
loss to be twice as strong as the MSE loss:

net.losses[1].weight = 2
net.train(...)

Finally, if you want to reset the loss to the standard MSE:

net.set_loss('mse', weight=1)

(Here we’ve also shown how to specify the weight of the loss when adding or
setting it on the model.)
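For example, assuming add_loss accepts the same weight keyword argument as
set_loss (as the parenthetical above suggests), the two steps shown earlier
could be collapsed into one:

net.add_loss('mae', weight=2)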

Using Weighted Targets

By default, the network models available in theanets treat all inputs as
equal when computing the loss for the model. For example, a regression model
treats an error of 0.1 in component 2 of the output just the same as an error of
0.1 in component 3, and each example of a minibatch is treated with equal
importance when training a classifier.

However, there are times when not all inputs to a neural network model should
be treated equally. This is especially evident in recurrent models: the inputs
to a recurrent network might not all contain the same number of time steps, but
because inputs are presented to the model in a rectangular minibatch array,
they must somehow be made to have the same size. One way to address this would
be to truncate all inputs to the length of the shortest one, but then the
network is not exposed to all input/output pairs during training.
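A better approach is to pad the shorter sequences and give the padded time
steps zero weight, so that they contribute nothing to the loss. Here is a
minimal numpy sketch; the (batch, time, features) axis order is an assumption,
so check how your version of theanets lays out recurrent minibatches:

import numpy as np

# Three sequences of lengths 5, 3, and 4, padded to 5 time steps of 3 features.
lengths = [5, 3, 4]
weights = np.zeros((3, 5, 3), 'f')
for b, n in enumerate(lengths):
    weights[b, :n, :] = 1  # real time steps get weight 1; padding stays at 0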

Weighted targets can be used for any model in theanets. For example, an
autoencoder could use an array of
weights containing zeros and ones to solve a matrix completion task, where the
input array contains some “unknown” values. In such a case, the network is
required to reproduce the known values exactly (so these could be presented to
the model with weight 1), while filling in the unknowns with statistically
reasonable values (which could be presented to the model during training with
weight 0).
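A minimal sketch of such a setup, assuming a weighted (non-recurrent)
Autoencoder whose training dataset consists of a sample array followed by a
weight array:

import numpy as np

rng = np.random.RandomState(13)
data = rng.randn(100, 10).astype('f')
known = rng.rand(100, 10) < 0.8  # roughly 80% of the entries are "known"
data[~known] = 0                 # blank out the unknown entries
weights = known.astype('f')      # weight 1 for known values, 0 for unknowns
# net.train([data, weights]) would then ignore errors on the unknown entries.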

As another example, suppose a classifier model is being trained in a binary
classification task where one of the classes—say, class A—is only present
0.1% of the time. In such a case, the network can achieve 99.9% accuracy by
always predicting class B, so during training it might be important to ensure
that errors in predicting A are “amplified” when computing the loss. You could
provide a large weight for training examples in class A to encourage the model
not to miss these examples.
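A sketch of building such amplified weights from an integer label vector; the
factor of 1000 mirrors the 0.1% frequency of class A and is only illustrative:

import numpy as np

labels = np.array([1, 1, 0, 1, 1], 'i')  # class 0 ("A") is rare
weights = np.where(labels == 0, 1000.0, 1.0).astype('f')  # amplify class-A errors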

All of these cases are possible to model in theanets; just include
weighted=True when you create your model:

net = theanets.recurrent.Autoencoder([3, (10, 'rnn'), 3], weighted=True)

When training a weighted model, the training and validation datasets require an
additional component: an array of floating-point values with the same shape as
the expected output of the model. For example, a non-recurrent Classifier model
would require a weight vector with each minibatch, of the same shape as the
labels array, so that the training and validation datasets would each have three
pieces: sample, label, and weight. Each value in the weight array is
used as the weight for the corresponding error when computing the loss.
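Putting the pieces together, training a weighted classifier might look like the
following sketch; the layer sizes and the random data are purely illustrative:

import numpy as np
import theanets

net = theanets.Classifier([10, 5, 2], weighted=True)
samples = np.random.randn(100, 10).astype('f')
labels = np.random.randint(0, 2, size=100).astype('i')
weights = np.ones(100, 'f')            # one weight per training example
net.train([samples, labels, weights])  # sample, label, weight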

Custom Losses

It’s pretty straightforward to create models in theanets that use losses other
than those of the predefined theanets.Classifier, theanets.Autoencoder, and
theanets.Regressor models. (The classifier uses categorical cross-entropy (XE)
as its default loss, and the other two both use mean squared error, MSE.)

To define a model with a new loss, just create a new theanets.Loss subclass and specify its name when you create your
model. For example, to create a regression model that uses a step function
averaged over all of the model inputs:

class Step(theanets.Loss):
    def __call__(self, outputs):
        # Fraction of positive outputs, averaged over the minibatch.
        return (outputs[self.output_name] > 0).mean()

net = theanets.Regressor([5, 6, 7], loss='step')

Your loss function implementation must return a Theano expression that reflects
the loss for your model. If you wish to make your loss work with weighted
outputs, you will also need to handle the case where weights are present:

class Step(theanets.Loss):
    def __call__(self, outputs):
        step = outputs[self.output_name] > 0
        if self._weights is not None:
            # Weighted mean: scale each error by its weight, then normalize.
            return (self._weights * step).sum() / self._weights.sum()
        else:
            return step.mean()
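With the weighted branch in place, the custom loss can be combined with
weighted targets just like the built-in losses; a minimal sketch:

net = theanets.Regressor([5, 6, 7], weighted=True, loss='step')
# The training data would then include a weight array shaped like the targets.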