In short
- Don't use softmax.
- Use sigmoid for the activation of your output layer.
- Use binary_crossentropy for the loss function.
- Use predict for evaluation, then threshold each label's probability (e.g. at 0.5) to get 0/1 predictions.
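As a rough sketch of what binary_crossentropy does in this setup: every output unit is scored as its own yes/no problem and the per-label losses are averaged. The helper name `multilabel_bce` and the toy arrays are hypothetical; Keras's own implementation also clips predictions, but its exact epsilon and reduction details may differ.

```python
import numpy as np

def multilabel_bce(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all samples and labels (hypothetical helper).

    Each label is treated as an independent binary problem, which is what
    pairing sigmoid outputs with binary_crossentropy amounts to.
    """
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    per_label = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return per_label.mean()

# Two samples, three labels; rows are multi-hot targets, not one-hot
y_true = np.array([[1, 0, 1],
                   [0, 1, 0]], dtype=float)
y_prob = np.array([[0.9, 0.2, 0.8],
                   [0.1, 0.7, 0.3]])
print(round(multilabel_bce(y_true, y_prob), 4))  # 0.2284
```

Note that a sample may have any number of 1s in its target row; nothing forces the labels to compete.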
Why
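In softmax, increasing the score for one label lowers all the others, because the outputs form a probability distribution that sums to 1. You don't want that when you have multiple labels: with sigmoid, each output unit is an independent probability, so several labels can be predicted as present at the same time.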
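A small NumPy illustration of the difference, using made-up logits for three labels where the first two are meant to be present. With softmax, two equally likely labels each end up below 0.5, and raising one label's logit pushes the others down; with sigmoid, each label is unaffected by the others.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical logits: labels 0 and 1 present, label 2 absent
logits = np.array([2.0, 2.0, -1.0])
print(softmax(logits))  # labels 0 and 1 share the mass, each below 0.5
print(sigmoid(logits))  # labels 0 and 1 are each above 0.5

# Raise the score of label 0 only
bumped = logits.copy()
bumped[0] += 2.0
print(softmax(bumped))  # label 1 drops even though its logit did not change
print(sigmoid(bumped))  # labels 1 and 2 are unchanged
```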
Complete Code
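from keras.models import Sequential
from keras.layers import Input, Dense, Dropout
from keras.optimizers import SGD
from keras.optimizers.schedules import InverseTimeDecay

# X_train, y_train, X_test, y_test are assumed to be NumPy arrays;
# y_train / y_test are multi-hot matrices of shape (n_samples, n_labels).
model = Sequential()
model.add(Input(shape=(X_train.shape[1],)))
model.add(Dense(5000, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(600, activation='relu'))
model.add(Dropout(0.1))
# One sigmoid unit per label: each output is an independent probability
model.add(Dense(y_train.shape[1], activation='sigmoid'))

# Older Keras accepted SGD(lr=0.01, decay=1e-6, ...); newer versions use
# learning_rate, and InverseTimeDecay reproduces the old lr / (1 + decay * step) rule
lr_schedule = InverseTimeDecay(0.01, decay_steps=1, decay_rate=1e-6)
sgd = SGD(learning_rate=lr_schedule, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd)

model.fit(X_train, y_train, epochs=5, batch_size=2000)

# Threshold each label's probability at 0.5 to get 0/1 predictions
preds = model.predict(X_test)
preds[preds >= 0.5] = 1
preds[preds < 0.5] = 0
# score = compare preds and y_test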
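The final comment leaves the scoring step open. Two common multi-label metrics are the exact-match ratio (the whole label vector must be right) and Hamming loss (fraction of individual label slots that are wrong). Below is a minimal NumPy sketch; the function names and toy arrays are hypothetical stand-ins for model.predict(X_test) and y_test.

```python
import numpy as np

def threshold(probs, cutoff=0.5):
    """Turn sigmoid outputs into 0/1 label indicators."""
    return (probs >= cutoff).astype(int)

def exact_match_ratio(y_true, y_pred):
    """Fraction of samples whose entire label vector is predicted correctly."""
    return np.mean(np.all(y_true == y_pred, axis=1))

def hamming_loss(y_true, y_pred):
    """Fraction of individual (sample, label) slots that are wrong."""
    return np.mean(y_true != y_pred)

# Toy sigmoid outputs for 3 samples x 3 labels, and their true labels
probs = np.array([[0.9, 0.4, 0.7],
                  [0.2, 0.6, 0.1],
                  [0.8, 0.3, 0.2]])
y_test = np.array([[1, 0, 1],
                   [0, 1, 1],
                   [1, 0, 0]])

preds = threshold(probs)
print(exact_match_ratio(y_test, preds))  # 2 of 3 rows fully correct
print(hamming_loss(y_test, preds))       # 1 of 9 label slots wrong
```

If scikit-learn is available, `sklearn.metrics.hamming_loss` and `accuracy_score` (which computes subset accuracy for multi-label indicator arrays) cover the same two metrics.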