How to replace the loss function during training in tensorflow.keras

Francesco Scala :

I want to replace the loss function related to my neural network during training, this is the network:

model = tensorflow.keras.models.Sequential()
model.add(tensorflow.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=input_shape))
model.add(tensorflow.keras.layers.Conv2D(64, (3, 3), activation="relu"))
model.add(tensorflow.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tensorflow.keras.layers.Dropout(0.25))
model.add(tensorflow.keras.layers.Flatten())
model.add(tensorflow.keras.layers.Dense(128, activation="relu"))
model.add(tensorflow.keras.layers.Dropout(0.5))
model.add(tensorflow.keras.layers.Dense(output_classes, activation="softmax"))
model.compile(loss=tensorflow.keras.losses.categorical_crossentropy, optimizer=tensorflow.keras.optimizers.Adam(0.001), metrics=['accuracy'])
history = model.fit(x_train, y_train, batch_size=128, epochs=5, validation_data=(x_test, y_test))

Now I want to replace tensorflow.keras.losses.categorical_crossentropy with another loss, so I did this:

model.compile(loss=tensorflow.keras.losses.mse, optimizer=tensorflow.keras.optimizers.Adam(0.001), metrics=['accuracy'])
history = model.fit(x_improve, y_improve, epochs=1, validation_data=(x_test, y_test)) #FIXME bug during training

but I have this error:

ValueError: No gradients provided for any variable: ['conv2d/kernel:0', 'conv2d/bias:0', 'conv2d_1/kernel:0', 'conv2d_1/bias:0', 'dense/kernel:0', 'dense/bias:0', 'dense_1/kernel:0', 'dense_1/bias:0'].

Why? How can I fix it? Is there another way to change the loss function?

Thanks

Edoardo Guerriero :

So, a straightforward answer I would give is: switch to PyTorch if you want to play this kind of game. Since in PyTorch you define your own training and evaluation loops, switching from one loss function to another takes just an if statement.
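A minimal PyTorch sketch of what I mean (the model, data, and loss choices here are made up purely to show the pattern):

import torch
import torch.nn as nn

# Toy regression model and random data, just to illustrate switching losses.
model = nn.Sequential(nn.Linear(13, 6), nn.ReLU(), nn.Linear(6, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
mse = nn.MSELoss()
mae = nn.L1Loss()

x = torch.randn(32, 13)
y = torch.randn(32, 1)

for epoch in range(4):
    optimizer.zero_grad()
    pred = model(x)
    # switching the loss mid-training is just an if statement
    loss = mse(pred, y) if epoch < 2 else mae(pred, y)
    loss.backward()
    optimizer.step()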

Also, I see in your code that you want to switch from cross_entropy to mean_square_error; the former is suitable for classification, the latter for regression, so this is not really something you can do. In the code that follows I instead switch from mean squared error to mean squared logarithmic error, which are both losses suitable for regression.

Although other answers offer solutions to your question (see change-loss-function-dynamically-during-training), it is not clear whether you can trust the results or not. Some people found that even with a customised function Keras sometimes keeps training with the first loss.

Solution:

My solution is based on train_on_batch, which allows us to train a model in a for loop and therefore stop training whenever we like in order to recompile the model with a new loss function. Please note that recompiling the model does not reset the weights (see: Does recompiling a model re-initialize the weights?).
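As a quick sanity check of that claim, here is a minimal sketch (with a throwaway model, not the one from the question) showing that the weights before and after a recompile are identical:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# throwaway model just to verify that compile() leaves the weights untouched
m = Sequential([Dense(4, input_dim=3, activation='relu'), Dense(1)])
m.compile(loss='mean_squared_error', optimizer='adam')

w_before = m.get_weights()
m.compile(loss='mean_squared_logarithmic_error', optimizer='adam')  # recompile with a new loss
w_after = m.get_weights()

print(all(np.array_equal(a, b) for a, b in zip(w_before, w_after)))  # True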

The dataset can be found here: Boston housing dataset

# Regression Example With Boston Dataset: Standardized and Larger
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from keras.losses import mean_squared_error, mean_squared_logarithmic_error
from matplotlib import pyplot

# load dataset
dataframe = read_csv("housing.csv", delim_whitespace=True, header=None)
dataset = dataframe.values

# split into input (X) and output (Y) variables
X = dataset[:,0:13]
y = dataset[:,13]

trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.33, random_state=42)

# create model
model = Sequential()
model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
model.add(Dense(6, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))

batch_size = 25

# have to define manually a dict to store all epochs scores 
history = {}
history['history'] = {}
history['history']['loss'] = []
history['history']['mean_squared_error'] = []
history['history']['mean_squared_logarithmic_error'] = []
history['history']['val_loss'] = []
history['history']['val_mean_squared_error'] = []
history['history']['val_mean_squared_logarithmic_error'] = []

# first compiling with mse
model.compile(loss='mean_squared_error', optimizer='adam', metrics=[mean_squared_error, mean_squared_logarithmic_error])

# define number of iterations in training and test
train_iter = round(trainX.shape[0]/batch_size)
test_iter = round(testX.shape[0]/batch_size)

for epoch in range(2):

    # train iterations 
    loss, mse, msle = 0, 0, 0
    for i in range(train_iter):

        start = i*batch_size
        end = i*batch_size + batch_size
        batchX = trainX[start:end,]
        batchy = trainy[start:end,]

        loss_, mse_, msle_ = model.train_on_batch(batchX,batchy)

        loss += loss_
        mse += mse_
        msle += msle_

    history['history']['loss'].append(loss/train_iter)
    history['history']['mean_squared_error'].append(mse/train_iter)
    history['history']['mean_squared_logarithmic_error'].append(msle/train_iter)

    # test iterations 
    val_loss, val_mse, val_msle = 0, 0, 0
    for i in range(test_iter):

        start = i*batch_size
        end = i*batch_size + batch_size
        batchX = testX[start:end,]
        batchy = testy[start:end,]

        val_loss_, val_mse_, val_msle_ = model.test_on_batch(batchX,batchy)

        val_loss += val_loss_
        val_mse += val_mse_
        val_msle += val_msle_

    history['history']['val_loss'].append(val_loss/test_iter)
    history['history']['val_mean_squared_error'].append(val_mse/test_iter)
    history['history']['val_mean_squared_logarithmic_error'].append(val_msle/test_iter)

# recompiling the model with new loss
model.compile(loss='mean_squared_logarithmic_error', optimizer='adam', metrics=[mean_squared_error, mean_squared_logarithmic_error])

for epoch in range(2):

    # train iterations 
    loss, mse, msle = 0, 0, 0
    for i in range(train_iter):

        start = i*batch_size
        end = i*batch_size + batch_size
        batchX = trainX[start:end,]
        batchy = trainy[start:end,]

        loss_, mse_, msle_ = model.train_on_batch(batchX,batchy)

        loss += loss_
        mse += mse_
        msle += msle_

    history['history']['loss'].append(loss/train_iter)
    history['history']['mean_squared_error'].append(mse/train_iter)
    history['history']['mean_squared_logarithmic_error'].append(msle/train_iter)

    # test iterations 
    val_loss, val_mse, val_msle = 0, 0, 0
    for i in range(test_iter):

        start = i*batch_size
        end = i*batch_size + batch_size
        batchX = testX[start:end,]
        batchy = testy[start:end,]

        val_loss_, val_mse_, val_msle_ = model.test_on_batch(batchX,batchy)

        val_loss += val_loss_
        val_mse += val_mse_
        val_msle += val_msle_

    history['history']['val_loss'].append(val_loss/test_iter)
    history['history']['val_mean_squared_error'].append(val_mse/test_iter)
    history['history']['val_mean_squared_logarithmic_error'].append(val_msle/test_iter)

# Some plots to check what is going on   
# loss function 
pyplot.subplot(311)
pyplot.title('Loss')
pyplot.plot(history['history']['loss'], label='train')
pyplot.plot(history['history']['val_loss'], label='test')
pyplot.legend()

# Only mean squared error 
pyplot.subplot(312)
pyplot.title('Mean Squared Error')
pyplot.plot(history['history']['mean_squared_error'], label='train')
pyplot.plot(history['history']['val_mean_squared_error'], label='test')
pyplot.legend()

# Only mean squared logarithmic error 
pyplot.subplot(313)
pyplot.title('Mean Squared Logarithmic Error')
pyplot.plot(history['history']['mean_squared_logarithmic_error'], label='train')
pyplot.plot(history['history']['val_mean_squared_logarithmic_error'], label='test')
pyplot.legend()
pyplot.tight_layout()
pyplot.show()

The resulting plot confirms that the loss function changes after the second epoch:

[Plot: training and test curves for loss, mean squared error, and mean squared logarithmic error over the four epochs]

The drop in the loss is due to the model switching from plain mean squared error to the logarithmic one, which takes much lower values on targets of this scale.
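Here is a small check (with made-up target values on the same scale as the Boston house prices, which are in the tens) of why the logarithmic error is so much smaller than the plain squared error:

import numpy as np

# made-up predictions/targets on the scale of Boston house prices
y_true = np.array([24.0, 21.6, 34.7])
y_pred = np.array([20.0, 25.0, 30.0])

mse  = np.mean((y_true - y_pred) ** 2)                      # ~16.6
msle = np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)  # ~0.02
print(mse, msle)

Printing the recorded scores also confirms that the loss in use really changed: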

print(history['history']['loss'])
[599.5209197998047, 570.4041115897043, 3.8622902120862688, 2.1578191178185597]
print(history['history']['mean_squared_error'])
[599.5209197998047, 570.4041115897043, 510.29034205845426, 425.32058388846264]
print(history['history']['mean_squared_logarithmic_error'])
[8.624503476279122, 6.346359729766846, 3.8622902120862688, 2.1578191178185597]

In the first two epochs the values of the loss are equal to those of mean_squared_error, and during the third and fourth epochs they become equal to those of mean_squared_logarithmic_error, the new loss that was set. So it seems that using train_on_batch allows changing the loss function; nevertheless, I want to stress again that this is basically what one would do in PyTorch to achieve the same result, with the difference that the behaviour of PyTorch (in this scenario, and in my opinion) is more reliable.
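As a final sanity check, a small sketch that, for each recorded epoch, reports which metric the loss column tracked (it only uses the history dict built above):

# for each epoch, check whether the recorded loss matches MSE or MSLE
for i, l in enumerate(history['history']['loss']):
    mse_i  = history['history']['mean_squared_error'][i]
    msle_i = history['history']['mean_squared_logarithmic_error'][i]
    tracked = 'mse' if abs(l - mse_i) < abs(l - msle_i) else 'msle'
    print('epoch %d: loss tracked %s' % (i + 1, tracked))
# expected output: mse, mse, msle, msle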
