Most people who open this post probably already know what Keras is. Still, as a beginner myself, I will start by explaining it.
Like TensorFlow, Keras is a Python library for building neural networks. When we train a deep learning model, the model has many components to choose, such as:
- Which activation function to use: ReLU or sigmoid
- Which optimizer to use: gradient descent or Adam
- Whether to add regularization to avoid overfitting, and whether to do it with L2 regularization or dropout
- Whether to use batch normalization
- .....
Writing all of these by hand is time-consuming. Keras has already implemented these components, so you only need to call them.
Table of Contents
First, the model definition phase
Second, the model initialization and compilation
Building, training, and testing a model in Keras breaks down into the following steps:
- Define the model
- Initialize and compile the model
- Train with the training set
- Test with the test set
First, the model definition phase
from keras.layers import (Input, ZeroPadding2D, Conv2D, BatchNormalization,
                          Activation, MaxPooling2D, Flatten, Dense)
from keras.models import Model

def HappyModel(input_shape):
    """
    Implementation of the HappyModel.

    Arguments:
    input_shape -- shape of the images of the dataset
        (height, width, channels) as a tuple.
        Note that this does not include the 'batch' as a dimension.
        If you have a batch like 'X_train',
        then you can provide the input_shape using
        X_train.shape[1:]

    Returns:
    model -- a Model() instance in Keras
    """
    # Define the input placeholder as a tensor with shape input_shape. Think of this as your input image!
    X_input = Input(input_shape)
    # Zero-Padding: pads the border of X_input with zeroes
    X = ZeroPadding2D((3, 3))(X_input)
    # CONV -> Batch Normalization -> RELU Block applied to X
    X = Conv2D(32, (7, 7), strides=(1, 1), name='conv0')(X)
    X = BatchNormalization(axis=3, name='bn0')(X)
    X = Activation('relu')(X)
    # MAXPOOL
    X = MaxPooling2D((2, 2), name='max_pool')(X)
    # FLATTEN X (means convert it to a vector) + FULLYCONNECTED
    X = Flatten()(X)
    X = Dense(1, activation='sigmoid', name='fc')(X)
    # Create model. This creates your Keras model instance, you'll use this instance to train/test the model.
    model = Model(inputs=X_input, outputs=X, name='HappyModel')
    return model
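To see what each layer in HappyModel does to the data, it can help to trace the tensor shape by hand. The sketch below is plain Python, not Keras: `happy_model_shapes` is a hypothetical helper, and the 64x64x3 input is only an assumed example size. It assumes the Keras defaults used above ('valid' padding for Conv2D, pooling stride equal to the pool size) and counts four values per channel for batch normalization (gamma, beta, moving mean, moving variance).

```python
def happy_model_shapes(input_shape):
    """Trace (height, width, channels) through the HappyModel layers."""
    h, w, c = input_shape
    shapes = {"input": (h, w, c)}

    # ZeroPadding2D((3, 3)) adds 3 pixels on every side.
    h, w = h + 2 * 3, w + 2 * 3
    shapes["zero_padding"] = (h, w, c)

    # Conv2D(32, (7, 7), strides=(1, 1)) with default 'valid' padding.
    conv_params = 7 * 7 * c * 32 + 32  # kernel weights + biases
    h, w, c = h - 7 + 1, w - 7 + 1, 32
    shapes["conv0"] = (h, w, c)

    # BatchNormalization keeps the shape and stores 4 values per channel.
    bn_params = 4 * c

    # MaxPooling2D((2, 2)) halves height and width.
    h, w = h // 2, w // 2
    shapes["max_pool"] = (h, w, c)

    # Flatten turns the feature map into one vector per image.
    flat = h * w * c
    shapes["flatten"] = (flat,)

    # Dense(1) has one weight per input plus one bias.
    fc_params = flat + 1
    shapes["fc"] = (1,)

    return shapes, conv_params + bn_params + fc_params


shapes, total_params = happy_model_shapes((64, 64, 3))
for name, shape in shapes.items():
    print(name, shape)
print("total parameters:", total_params)
```

If the real model is built with the same input shape, model.summary() should report matching shapes, with the batch dimension added in front.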
Second, the model initialization and compilation
# Initialize the model
model = HappyModel(X_train.shape[1:])
# Compile the model
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
Here is the model.compile() function in detail:
compile(optimizer, loss=None, metrics=None, loss_weights=None, sample_weight_mode=None, weighted_metrics=None, target_tensors=None)
Parameters in detail:
2.1 optimizer
The optimizer argument accepts either a string name or an optimizer instance, so there are two ways to specify it:
1. Pass a string name, which uses that optimizer with its default parameters:
model.compile(loss='mean_squared_error', optimizer='sgd')
2. Create an optimizer instance and pass it in:
from keras import optimizers
sgd = optimizers.SGD(learning_rate=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)
Five commonly used optimizers are described below:
2.1.1 SGD
keras.optimizers.SGD(learning_rate=0.01, momentum=0.0, nesterov=False)
Stochastic gradient descent optimizer.
Includes support for momentum, learning rate decay, and Nesterov momentum.
Arguments
- learning_rate: float >= 0. Learning rate.
- momentum: float >= 0. Parameter that accelerates SGD in the relevant direction and dampens oscillations.
- nesterov: boolean. Whether to apply Nesterov momentum.
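The effect of these arguments is easier to see in a small plain-Python sketch of the update rule. This is not the Keras source: `sgd_step` and `minimize` are hypothetical helpers, the formulation (velocity = momentum * velocity - learning_rate * gradient) is the classical one, and the objective f(w) = w^2 is a toy example chosen only for illustration.

```python
def sgd_step(w, velocity, grad, learning_rate=0.01, momentum=0.0, nesterov=False):
    """One SGD update for a single scalar weight."""
    velocity = momentum * velocity - learning_rate * grad
    if nesterov:
        # Nesterov: look ahead along the new velocity before stepping.
        w = w + momentum * velocity - learning_rate * grad
    else:
        w = w + velocity
    return w, velocity


def minimize(momentum, nesterov, steps=100):
    """Minimize the toy objective f(w) = w**2 starting from w = 5."""
    w, velocity = 5.0, 0.0
    for _ in range(steps):
        grad = 2.0 * w  # derivative of w**2
        w, velocity = sgd_step(w, velocity, grad, 0.01, momentum, nesterov)
    return w


print("plain SGD:     ", minimize(0.0, False))
print("momentum 0.9:  ", minimize(0.9, False))
print("nesterov 0.9:  ", minimize(0.9, True))
```

On this toy problem both momentum variants end much closer to the minimum at 0 than plain SGD does after the same number of steps.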
2.1.2 RMSprop
keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9)
RMSProp optimizer.
It is recommended to leave the parameters of this optimizer at their default values (except the learning rate, which can be freely tuned).
Arguments
- learning_rate: float >= 0. Learning rate.
- rho: float >= 0.
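As a rough illustration of what rho controls, here is a plain-Python sketch of the usual RMSProp rule, where rho is the decay rate of a moving average of squared gradients. `rmsprop_step` and `first_step_size` are hypothetical helpers, and the epsilon value and zero initial average are assumptions rather than values taken from this post.

```python
import math


def rmsprop_step(w, avg_sq, grad, learning_rate=0.001, rho=0.9, epsilon=1e-7):
    """One RMSProp update for a single scalar weight."""
    # Moving average of squared gradients; rho is its decay rate.
    avg_sq = rho * avg_sq + (1.0 - rho) * grad ** 2
    # Divide the gradient by the root of that average.
    w = w - learning_rate * grad / (math.sqrt(avg_sq) + epsilon)
    return w, avg_sq


def first_step_size(grad):
    """Size of the very first update for a given gradient."""
    w, avg_sq = 0.0, 0.0
    new_w, _ = rmsprop_step(w, avg_sq, grad)
    return abs(new_w - w)


# A huge gradient and a tiny gradient produce almost the same step size.
print(first_step_size(100.0))
print(first_step_size(0.01))
```

Because the gradient is normalized by its own recent magnitude, the step size depends mostly on the learning rate, which fits the advice to tune only that parameter.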
2.1.3 Adagrad
keras.optimizers.Adagrad(learning_rate=0.01)
Adagrad optimizer.
Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. The more updates a parameter receives, the smaller the learning rate.
It is recommended to leave the parameters of this optimizer at their default values.
Arguments
- learning_rate: float >= 0. Initial learning rate.
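The statement that more updates lead to a smaller learning rate can be checked with a plain-Python sketch of the standard Adagrad rule. `adagrad_step` and `step_sizes` are hypothetical helpers, and the epsilon value and the zero initial accumulator are assumptions (some implementations start the accumulator at a small positive value).

```python
import math


def adagrad_step(w, accum, grad, learning_rate=0.01, epsilon=1e-7):
    """One Adagrad update for a single scalar weight."""
    # The accumulator only grows: it sums all past squared gradients.
    accum = accum + grad ** 2
    w = w - learning_rate * grad / (math.sqrt(accum) + epsilon)
    return w, accum


def step_sizes(num_steps, grad=1.0):
    """Step sizes when the same gradient is applied repeatedly."""
    w, accum, sizes = 0.0, 0.0, []
    for _ in range(num_steps):
        new_w, accum = adagrad_step(w, accum, grad)
        sizes.append(abs(new_w - w))
        w = new_w
    return sizes


# With a constant gradient of 1, step k has size learning_rate / sqrt(k).
print(step_sizes(4))
```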
2.1.4 Adadelta
keras.optimizers.Adadelta(learning_rate=1.0, rho=0.95)
Adadelta optimizer.
Adadelta is a more robust extension of Adagrad that adapts learning rates based on a moving window of gradient updates, instead of accumulating all past gradients. This way, Adadelta continues learning even when many updates have been done. Compared to Adagrad, in the original version of Adadelta you don't have to set an initial learning rate. In this version, initial learning rate and decay factor can be set, as in most other Keras optimizers.
It is recommended to leave the parameters of this optimizer at their default values.
Arguments
- learning_rate: float >= 0. Initial learning rate, defaults to 1. It is recommended to leave it at the default value.
- rho: float >= 0. Adadelta decay factor, corresponding to fraction of gradient to keep at each time step.
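The moving-window idea can be sketched in plain Python following the rule from the Adadelta paper: one moving average tracks squared gradients, a second tracks squared updates, and their ratio sets the step. `adadelta_step` and `run_adadelta` are hypothetical helpers, the epsilon value is an assumption, and f(w) = w^2 is a toy objective.

```python
import math


def adadelta_step(w, avg_sq_grad, avg_sq_delta, grad,
                  learning_rate=1.0, rho=0.95, epsilon=1e-7):
    """One Adadelta update for a single scalar weight."""
    # Moving average of squared gradients (replaces Adagrad's growing sum).
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad ** 2
    # Scale the gradient by RMS(previous updates) / RMS(gradients).
    delta = grad * math.sqrt(avg_sq_delta + epsilon) / math.sqrt(avg_sq_grad + epsilon)
    w = w - learning_rate * delta
    # Moving average of squared updates, used by the next step.
    avg_sq_delta = rho * avg_sq_delta + (1.0 - rho) * delta ** 2
    return w, avg_sq_grad, avg_sq_delta


def run_adadelta(steps=200, start=1.0):
    """Minimize the toy objective f(w) = w**2 from the given start."""
    w, avg_sq_grad, avg_sq_delta = start, 0.0, 0.0
    for _ in range(steps):
        grad = 2.0 * w
        w, avg_sq_grad, avg_sq_delta = adadelta_step(w, avg_sq_grad, avg_sq_delta, grad)
    return w


final_w = run_adadelta()
print(final_w)
```

Since old gradients fade out of the average instead of piling up, the step size does not shrink toward zero the way it does in Adagrad.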
2.1.5 Adam
keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)
Adam optimizer.
Default parameters follow those provided in the original paper.
Arguments
- learning_rate: float >= 0. Learning rate.
- beta_1: float, 0 < beta < 1. Generally close to 1.
- beta_2: float, 0 < beta < 1. Generally close to 1.
- amsgrad: boolean. Whether to apply the AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and Beyond".
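To show where beta_1 and beta_2 enter, here is a plain-Python sketch of the Adam update as described in the original paper, with bias correction. `adam_step` is a hypothetical helper, the epsilon value is an assumption, and the AMSGrad variant is left out.

```python
import math


def adam_step(w, m, v, t, grad,
              learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """One Adam update for a single scalar weight; t is the step count from 1."""
    # beta_1 controls the moving average of gradients (first moment).
    m = beta_1 * m + (1.0 - beta_1) * grad
    # beta_2 controls the moving average of squared gradients (second moment).
    v = beta_2 * v + (1.0 - beta_2) * grad ** 2
    # Bias correction: both averages start at 0 and are too small early on.
    m_hat = m / (1.0 - beta_1 ** t)
    v_hat = v / (1.0 - beta_2 ** t)
    w = w - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return w, m, v


# On the first step the update has a size of about learning_rate,
# regardless of how large the gradient is.
w, m, v = adam_step(0.0, 0.0, 0.0, 1, grad=250.0)
print(w)
```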
2.2 loss
A loss function is a TensorFlow/Theano symbolic function that takes two parameters (y_true, y_pred) and returns a scalar (a single tensor value). Both parameters are TensorFlow/Theano tensors with the same shape.
There are two ways to specify the loss:
# Method 1: pass the loss function object
from keras import losses
model.compile(loss=losses.mean_squared_error, optimizer='sgd')
# Method 2: pass the name as a string (these are all loss functions predefined in Keras)
model.compile(loss='mean_squared_error', optimizer='sgd')
Keras defines too many loss functions to list them all here, so I describe only two common ones. For the rest, see the Keras losses documentation.
2.2.1 binary_crossentropy
Used for models that make a binary (0 or 1) decision.
keras.losses.binary_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0)
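For intuition, binary cross-entropy can be written out in plain Python. This is a sketch of the formula, not the Keras implementation: `binary_crossentropy` here is a hypothetical stand-alone function that works on lists of floats, it assumes y_pred holds probabilities (the from_logits=False case), and the clipping constant is an assumption.

```python
import math


def binary_crossentropy(y_true, y_pred, epsilon=1e-7):
    """Mean binary cross-entropy over a list of labels and probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        # Clip so that log(0) can never occur.
        p = min(max(p, epsilon), 1.0 - epsilon)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


# Confident, correct predictions give a small loss.
print(binary_crossentropy([1, 0], [0.9, 0.1]))
# Confident, wrong predictions give a large loss.
print(binary_crossentropy([1, 0], [0.1, 0.9]))
```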
2.2.2 categorical_crossentropy
Used for multi-class classifiers (e.g. with a softmax output).
keras.losses.categorical_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0)
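The same idea extends to several classes. The sketch below is plain Python rather than Keras: `softmax` and `categorical_crossentropy` are hypothetical stand-alone helpers, the labels are assumed to be one-hot encoded, and the clipping constant is an assumption.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = [x - max(logits) for x in logits]  # for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(y_true, y_pred, epsilon=1e-7):
    """Per-sample cross-entropy for one-hot labels and predicted probabilities."""
    losses = []
    for t_row, p_row in zip(y_true, y_pred):
        loss = -sum(t * math.log(max(p, epsilon)) for t, p in zip(t_row, p_row))
        losses.append(loss)
    return losses


probs = softmax([1.0, 3.0, 2.0])
print(probs)
# Only the probability assigned to the true class (index 1) matters.
print(categorical_crossentropy([[0, 1, 0]], [probs]))
```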
2.3 metrics
Metrics are used to judge the quality of the model. Like a loss function, a metric is a TensorFlow/Theano symbolic function that takes two parameters (y_true, y_pred) and returns a scalar (a single tensor value). Both parameters are TensorFlow/Theano tensors.
There are also two ways to specify metrics: by instance or by string name.
# Method 1: pass metric functions
from keras import metrics
model.compile(loss='mean_squared_error',
              optimizer='sgd',
              metrics=[metrics.mae, metrics.categorical_accuracy])
# You can also define your own metric to evaluate the model.
import keras.backend as K
def mean_pred(y_true, y_pred):
    return K.mean(y_pred)
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])
# Method 2: pass metric names as strings
model.compile(loss='mean_squared_error',
              optimizer='sgd',
              metrics=['mae', 'acc'])
One common metric is described below. For the rest, see the Keras metrics documentation.
2.3.1 accuracy
keras.metrics.accuracy(y_true, y_pred)
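For a model with a single sigmoid output, such as HappyModel, accuracy amounts to thresholding the predicted probability and counting matches. The following plain-Python sketch shows that idea; `binary_accuracy` here is a hypothetical stand-alone function on lists, and the 0.5 threshold is an assumption.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions that match the labels after thresholding."""
    correct = 0
    for t, p in zip(y_true, y_pred):
        predicted_label = 1 if p > threshold else 0
        if predicted_label == t:
            correct += 1
    return correct / len(y_true)


# Three of the four predictions land on the right side of the threshold.
print(binary_accuracy([1, 0, 1, 1], [0.9, 0.2, 0.4, 0.7]))
```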
2.4 The remaining parameters
I have not used the remaining parameters yet. I will add them here once I do.
Third, training the model
Finally we reach the training step: call the fit function to train the model. If fit has been called before, calling it again continues training from the parameters learned in the previous call.
model.fit(x = X_train, y = Y_train, epochs=10, batch_size = 64)
The fit function actually has a lot of parameters. The full signature is as follows (see the model.fit documentation):
fit(x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None, validation_freq=1, max_queue_size=10, workers=1, use_multiprocessing=False)
Parameters are as follows:
- x: training set data
- y: training set labels
- epochs: how many passes to make over the training set
- batch_size: number of samples in one batch
- .....
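How epochs and batch_size interact can be sketched in plain Python. `batch_slices` is a hypothetical helper, and the 600-sample training set is an assumed size used only for illustration. It shows how one epoch is cut into batches, with a smaller final batch when the sizes do not divide evenly.

```python
import math


def batch_slices(num_samples, batch_size):
    """(start, end) index pairs for the batches of one epoch."""
    return [(start, min(start + batch_size, num_samples))
            for start in range(0, num_samples, batch_size)]


num_samples, batch_size, epochs = 600, 64, 10  # assumed example values
slices = batch_slices(num_samples, batch_size)

print("batches per epoch:", len(slices))
print("last batch covers:", slices[-1])
print("weight updates in total:", len(slices) * epochs)
```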
Return value:
A History object. Its History.history attribute is a record of training loss values and metrics values at successive epochs, as well as validation loss values and validation metrics values (if applicable).
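History.history is a dictionary that maps each metric name to a list with one value per epoch. The sketch below uses a hand-written dictionary with made-up numbers in place of a real training run, and `best_epoch` is a hypothetical helper. The exact key names (for example 'accuracy' versus 'acc') depend on the Keras version.

```python
# Made-up values shaped like History.history after 3 epochs.
history = {
    "loss": [0.92, 0.51, 0.33],
    "accuracy": [0.61, 0.80, 0.88],
}


def best_epoch(history, key="loss"):
    """1-based number of the epoch with the lowest value for the given key."""
    values = history[key]
    best_index = min(range(len(values)), key=values.__getitem__)
    return best_index + 1


print("epochs recorded:", len(history["loss"]))
print("epoch with lowest loss:", best_epoch(history))
```

With a real model the dictionary would come from history = model.fit(...).history.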
Fourth, testing the model
model.evaluate(x=X_test, y=Y_test)
Of course, the full signature of this function is also very long:
evaluate(x=None, y=None, batch_size=None, verbose=1, sample_weight=None, steps=None, callbacks=None, max_queue_size=10, workers=1, use_multiprocessing=False)
Parameters:
- x: test set data
- y: test set labels
- ...
Returns:
Scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has multiple outputs and/or metrics). The attribute model.metrics_names will give you the display labels for the scalar outputs.
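A convenient way to read the result is to pair each value with its label. The sketch below is plain Python with made-up numbers standing in for the output of evaluate and for model.metrics_names, and `label_results` is a hypothetical helper.

```python
def label_results(metrics_names, results):
    """Pair each evaluation result with its display label."""
    # evaluate returns a bare scalar when there is only a loss.
    if not isinstance(results, (list, tuple)):
        results = [results]
    return dict(zip(metrics_names, results))


# Made-up values; with a real model these would come from
# model.metrics_names and model.evaluate(x=X_test, y=Y_test).
labeled = label_results(["loss", "accuracy"], [0.12, 0.95])
for name, value in labeled.items():
    print(name, "=", value)
```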
Fifth, visualizing the model structure
Once everything is done, you can inspect the structure of the model.
There are two ways to view it:
- As text:
model.summary()
- As an image:
plot_model(model, to_file='Model.png')
Putting both together:
from keras.utils import plot_model
from keras.utils.vis_utils import model_to_dot
from IPython.display import SVG
# Print the model structure as text
model.summary()
# Draw the model structure and save it as an image (requires pydot and Graphviz)
plot_model(model, to_file='HappyModel.png')
# Render the model graph as an SVG, e.g. inside a Jupyter notebook
SVG(model_to_dot(model).create(prog='dot', format='svg'))
Reference: https://keras.io/