Multiple Outputs and Keras

Date: 2022-03-18
Author: Helmuth

At some point, your model might need to produce multiple outputs. For example, when performing object detection, your model predicts both the localization of the objects and the classes they belong to.

Keras has built-in support for multiple outputs; however, it makes some assumptions that do not always fit your situation well.

The Keras Way

Let's first review what the Keras way of dealing with a model with multiple outputs looks like. For this article, let's assume a simple model with one input tensor and two output tensors.

Defining a Model with Multiple Outputs

It's rather simple to define a model with multiple outputs in Keras. When creating the model, one can simply specify a Python list of output tensors instead of a single output tensor.

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
import numpy as np

inp = Input(shape=(1,))
out_1 = Dense(1)(inp)
out_2 = Dense(2)(inp)

# create model with one input and two outputs
model = Model(inp, [out_1, out_2])

Loss Functions

Keras assumes that each output contributes to the loss of the model and adds up all the losses. If you specify a single loss function in compile, Keras will apply this loss function to each output separately.

# calculate loss once for out_1, and once for out_2 and then sum up the losses
model.compile(loss='mean_absolute_error', optimizer='adam')

In addition, Keras allows you to specify a different loss function for each output.

# use different losses for each output
model.compile(
    loss=['mean_absolute_error', 'mean_squared_error'],
    optimizer='adam')

For training, we similarly have to provide a list of target values.

# get some random "training" data; the shapes have to match the
# model's input shape (1,) and output shapes (1,) and (2,)
rng = np.random.default_rng()
in_train = rng.standard_normal((100, 1))
out1_train = rng.standard_normal((100, 1))
out2_train = rng.standard_normal((100, 2))

# train the model
model.fit(in_train, [out1_train, out2_train])

Here we have to keep the positions of the outputs, the loss functions, and the targets in sync across the lists. Another, more intuitive way is to name the output layers and to use dicts instead of lists for the loss functions and the targets.

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
import numpy as np

inp = Input(shape=(1,))
out_1 = Dense(1, name='out1')(inp)
out_2 = Dense(2, name='out2')(inp)

# create model with one input and two outputs
model = Model(inp, [out_1, out_2])

# use different losses for each output
model.compile(
    loss={
        'out1': 'mean_absolute_error',
        'out2': 'mean_squared_error',
    },
    optimizer='adam')

# get some random "training" data; the shapes have to match the
# model's input shape (1,) and output shapes (1,) and (2,)
rng = np.random.default_rng()
in_train = rng.standard_normal((100, 1))
out1_train = rng.standard_normal((100, 1))
out2_train = rng.standard_normal((100, 2))

# train the model
model.fit(in_train, {'out1': out1_train, 'out2': out2_train})

Multiple Outputs Used in One Loss Function

The Keras way is easy to use, and as long as each loss is calculated from a single output, it leads to simple and straightforward code.

However, some models have a loss component that depends on two or more outputs. E.g., with the SSD loss, calculating the localization loss involves a step that looks at the classification outputs. Here, a model that produces localization and classification as separate outputs cannot be trained in the usual Keras way.
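To make this concrete, here is a minimal sketch of such a loss, in which the localization error of each sample is weighted by the confidence of its classification. The function and its exact weighting scheme are made up for this article; they merely stand in for the real SSD masking step. Note that a loss with this signature does not fit Keras' per-output (y_true, y_pred) scheme, which is exactly the problem.

import tensorflow as tf

# hypothetical combined loss: the localization error of each sample is
# weighted by its predicted class confidence, so the result depends on
# both outputs at once
def combined_loss(loc_true, loc_pred, cls_pred):
    conf = tf.reduce_max(tf.nn.softmax(cls_pred, axis=-1),
                         axis=-1, keepdims=True)
    return tf.reduce_mean(conf * tf.abs(loc_true - loc_pred))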

In the following, I'm presenting a few ways of working around this issue.

Approach 1: Add losses via model.add_loss

When building a model, one can add losses not only via compile but also via the add_loss method of the Model class. These losses are specified as TensorFlow tensors, which can depend on several outputs at once. Since the targets are not available as model tensors, they have to be fed in as additional inputs.
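Here is a minimal sketch of this approach for our toy model, using the hypothetical confidence-weighted loss from above. The localization target is fed in as an extra input, and compile no longer needs a loss argument:

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
import numpy as np

inp = Input(shape=(1,))
loc_target = Input(shape=(1,))  # target provided as an extra input
out_1 = Dense(1)(inp)
out_2 = Dense(2)(inp)

model = Model([inp, loc_target], [out_1, out_2])

# loss tensor depending on both outputs
conf = tf.reduce_max(tf.nn.softmax(out_2, axis=-1), axis=-1, keepdims=True)
model.add_loss(tf.reduce_mean(conf * tf.abs(loc_target - out_1)))

# no loss argument needed in compile
model.compile(optimizer='adam')

rng = np.random.default_rng()
in_train = rng.standard_normal((100, 1))
loc_train = rng.standard_normal((100, 1))

# no separate targets: they are already part of the inputs
model.fit([in_train, loc_train])

A drawback of this pattern is that the targets become part of the model's inputs; for inference you would typically build a second model that shares the layers but omits the target input.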

Approach 2: Using a custom training loop

When using a custom training loop, we are no longer compiling the model with a loss at all. Instead, we compute the combined loss ourselves inside a tf.GradientTape and apply the gradients manually, so the loss is free to combine the outputs in any way.
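A minimal sketch of such a loop, again with the hypothetical confidence-weighted loss (a realistic loop would iterate over batches of a tf.data.Dataset rather than over the full arrays):

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
import numpy as np

inp = Input(shape=(1,))
out_1 = Dense(1)(inp)
out_2 = Dense(2)(inp)
model = Model(inp, [out_1, out_2])

optimizer = tf.keras.optimizers.Adam()

rng = np.random.default_rng()
in_train = rng.standard_normal((100, 1)).astype('float32')
loc_train = rng.standard_normal((100, 1)).astype('float32')

for epoch in range(10):
    with tf.GradientTape() as tape:
        loc_pred, cls_pred = model(in_train, training=True)
        # the loss is free to combine both outputs in any way
        conf = tf.reduce_max(tf.nn.softmax(cls_pred, axis=-1),
                             axis=-1, keepdims=True)
        loss = tf.reduce_mean(conf * tf.abs(loc_train - loc_pred))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))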

Approach 3: Concatenating outputs

The easiest solution is often to just create a model with one combined output. This might require reshaping the outputs (e.g. using a Flatten layer) before concatenating them, and it might require the loss function to split and reshape the combined tensor again.
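Here is a minimal sketch for our toy model: a Concatenate layer joins the two outputs into one tensor of shape (batch, 3), and the custom loss function, again the hypothetical confidence-weighted one, splits that tensor apart again:

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model
import numpy as np

inp = Input(shape=(1,))
out_1 = Dense(1)(inp)
out_2 = Dense(2)(inp)

# one combined output of shape (batch, 3)
combined = Concatenate()([out_1, out_2])
model = Model(inp, combined)

def combined_loss(y_true, y_pred):
    # split the combined tensor back into localization and classification
    loc_pred, cls_pred = y_pred[:, :1], y_pred[:, 1:]
    loc_true = y_true[:, :1]
    conf = tf.reduce_max(tf.nn.softmax(cls_pred, axis=-1),
                         axis=-1, keepdims=True)
    return tf.reduce_mean(conf * tf.abs(loc_true - loc_pred))

model.compile(loss=combined_loss, optimizer='adam')

rng = np.random.default_rng()
in_train = rng.standard_normal((100, 1))
# pad the targets to the combined shape; the class columns are unused here
out_train = np.concatenate(
    [rng.standard_normal((100, 1)), np.zeros((100, 2))], axis=1)
model.fit(in_train, out_train)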
