Keras Parameters

EarlyStopping

from keras.callbacks import EarlyStopping

Arguments

  • monitor: quantity to be monitored. Measures calculated on the validation dataset carry the 'val_' prefix, such as 'val_loss' for the loss on the validation dataset. Additional metrics can be monitored during training; they are specified via the "metrics" argument of the compile function, which takes a Python list of known metric functions, such as 'mse' for mean squared error and 'acc' for accuracy.
  • min_delta: minimum change in the monitored quantity to qualify as an improvement; an absolute change smaller than min_delta counts as no improvement.
  • patience: number of epochs with no improvement after which training will be stopped, i.e. training stops once this many consecutive epochs pass without improvement.
  • verbose: verbosity mode.
  • mode: one of {auto, min, max}. In min mode, training will stop when the quantity monitored has stopped decreasing; in max mode it will stop when the quantity monitored has stopped increasing; in auto mode, the direction is automatically inferred from the name of the monitored quantity. By default, mode is set to 'auto' and knows that you want to minimize loss or maximize accuracy.
  • baseline: baseline value for the monitored quantity to reach. Training will stop if the model doesn't show improvement over the baseline. It may be desirable to stop training only once performance stays above or below a given threshold. For example, if you are familiar with how the model trains (e.g. from its learning curves) and know that once a given validation loss is reached there is no point in continuing, you can encode that by setting the "baseline" argument (see the usage sketch after this list).
  • restore_best_weights: whether to restore model weights from the epoch with the best value of the monitored quantity. If False, the model weights obtained at the last step of training are used.
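
A minimal usage sketch (the monitored quantity, threshold, and patience value here are illustrative, not taken from the original post):

from keras.callbacks import EarlyStopping

# stop once val_loss has failed to improve by at least 0.001 for 5 consecutive epochs,
# and roll the model back to the weights from the best epoch
early_stop = EarlyStopping(monitor='val_loss', min_delta=0.001, patience=5,
                           verbose=1, mode='min', restore_best_weights=True)

# the callback is then passed to fit() via the callbacks argument
# (model, x_train, y_train assumed defined elsewhere):
# model.fit(x_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])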

fit

from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.layers import GlobalMaxPooling1D, Conv1D, MaxPooling1D, Flatten, Bidirectional, SpatialDropout1D

Arguments

  • x: Numpy array of training data (if the model has a single input), or list of Numpy arrays (if the model has multiple inputs). If input layers in the model are named, you can also pass a dictionary mapping input names to Numpy arrays. x can be None (default) if feeding from framework-native tensors (e.g. TensorFlow data tensors).
  • y: Numpy array of target (label) data (if the model has a single output), or list of Numpy arrays (if the model has multiple outputs). If output layers in the model are named, you can also pass a dictionary mapping output names to Numpy arrays. y can be None (default) if feeding from framework-native tensors (e.g. TensorFlow data tensors).
  • batch_size: Integer or None. Number of samples per gradient update. If unspecified, batch_size will default to 32.
  • epochs: Integer. Number of epochs to train the model. An epoch is an iteration over the entire x and y data provided. Note that in conjunction with initial_epoch, epochs is to be understood as "final epoch": the model is not trained for a number of iterations given by epochs, but merely until the epoch of index epochs is reached.
  • verbose: Integer. 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch.
  • callbacks: List of keras.callbacks.Callback instances. List of callbacks to apply during training and validation. See callbacks.
  • validation_split: Float between 0 and 1. Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples in the x and y data provided, before shuffling.
  • validation_data: tuple (x_val, y_val) or tuple (x_val, y_val, val_sample_weights) on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data. validation_data will override validation_split.
  • shuffle: Boolean (whether to shuffle the training data before each epoch) or str (for 'batch'). 'batch' is a special option for dealing with the limitations of HDF5 data; it shuffles in batch-sized chunks. Has no effect when steps_per_epoch is not None.
  • class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.
  • sample_weight: Optional Numpy array of weights for the training samples, used for weighting the loss function (during training only). You can either pass a flat (1D) Numpy array with the same length as the input samples (1:1 mapping between weights and samples), or in the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample. In this case you should make sure to specify sample_weight_mode="temporal" in compile().
  • initial_epoch: Integer. Epoch at which to start training (useful for resuming a previous training run).
  • steps_per_epoch: Integer or None. Total number of steps (batches of samples) before declaring one epoch finished and starting the next epoch. When training with input tensors such as TensorFlow data tensors, the default None is equal to the number of samples in your dataset divided by the batch size, or 1 if that cannot be determined.
  • validation_steps: Only relevant if steps_per_epoch is specified. Total number of steps (batches of samples) to validate before stopping.
  • validation_freq: Only relevant if validation data is provided. Integer or list/tuple/set. If an integer, specifies how many training epochs to run before a new validation run is performed, e.g. validation_freq=2 runs validation every 2 epochs. If a list, tuple, or set, specifies the epochs on which to run validation, e.g. validation_freq=[1, 2, 10] runs validation at the end of the 1st, 2nd, and 10th epochs. (A combined usage sketch follows this list.)
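
A minimal sketch combining several of the arguments above (the toy data, network, and hyperparameter values are illustrative assumptions, not from the original post):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping

# toy data: 1000 samples with 20 features each, binary labels
x = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=(1000, 1))

model = Sequential()
model.add(Dense(16, activation='relu', input_shape=(20,)))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])

history = model.fit(x, y,
                    batch_size=32,           # samples per gradient update
                    epochs=100,              # upper bound; EarlyStopping may stop sooner
                    validation_split=0.2,    # last 20% of x/y held out for validation
                    shuffle=True,            # shuffle training data before each epoch
                    callbacks=[EarlyStopping(monitor='val_loss', patience=5)],
                    verbose=2)               # one line per epoch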

Embedding layer

The Embedding layer turns positive integers into dense vectors of fixed size, e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]. The Embedding layer can only be used as the first layer of a model.

Arguments

1. input_dim: the size of the vocabulary in the text data. For example, if your data is integer-encoded to values between 0 and 10, the size of the vocabulary is 11 words.

2. output_dim: the size of the vector space in which words will be embedded. It defines the size of the output vector from this layer for each word. For example, it could be 32, 100, or even larger, depending on your problem.

3. input_length: the length of the input sequences, just as you would define for any input layer of a Keras model. For example, if all of your input documents contain 1000 words, this would be 1000.

These three are the main arguments (input_dim and output_dim are required; input_length must be given if a Flatten and then a Dense layer are to follow).

The output of the Embedding layer is a 2D matrix, with one embedding vector for each word in the input sequence (input document). If you wish to connect a Dense layer directly to the Embedding layer, you must first flatten this 2D output matrix to 1D.

from numpy import array
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers.embeddings import Embedding

# define documents
docs = ['Well done!',
		'Good work',
		'Great effort',
		'nice work',
		'Excellent!',
		'Weak',
		'Poor effort!',
		'not good',
		'poor work',
		'Could have done better.']
# define class labels
labels = array([1,1,1,1,1,0,0,0,0,0])

# integer encode the documents
vocab_size = 50  # estimate of the vocabulary size, larger than the true number of unique words
encoded_docs = [one_hot(d, vocab_size) for d in docs]
print(encoded_docs)

# The sequences have different lengths, but Keras prefers vectorized inputs of equal length,
# so pad every document to a max length of 4 words
max_length = 4
padded_docs = pad_sequences(encoded_docs, maxlen=max_length, padding='post')
print(padded_docs)

# define the model
model = Sequential()
model.add(Embedding(vocab_size, 8, input_length=max_length))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
# summarize the model
model.summary()

# fit the model
model.fit(padded_docs, labels, epochs=50, verbose=0)
# evaluate the model
loss, accuracy = model.evaluate(padded_docs, labels, verbose=0)
print('Accuracy: %f' % (accuracy*100))

input_dim: integer >= 0. Size of the vocabulary, i.e. the largest integer index in the input data + 1.
output_dim: integer > 0. Dimension of the dense embedding.
embeddings_initializer: initializer for the embedding matrix, given either as the string name of a predefined initializer or as an initializer instance. See initializers.
embeddings_regularizer: regularizer applied to the embedding matrix, a Regularizer object.
embeddings_constraint: constraint applied to the embedding matrix, a Constraints object.
mask_zero: boolean, whether the input value 0 should be treated as a 'padding' value to be ignored. This is useful when using recurrent layers on variable-length input. If set to True, all subsequent layers in the model must support masking, otherwise an exception will be raised; in that case index 0 cannot be used in the vocabulary, so input_dim should be set to the vocabulary size + 1.
input_length: the length of the input sequences, when that length is constant. This argument is required if you want to connect Flatten and then Dense layers after this layer; without it, the output shape of the Dense layers cannot be inferred.
Input shape
2D tensor with shape (samples, sequence_length)
Output shape
3D tensor with shape (samples, sequence_length, output_dim); a quick shape-check sketch follows.
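
A quick sketch (with made-up sizes) that checks the input/output shapes described above:

from keras.models import Sequential
from keras.layers import Embedding

# vocabulary of 1000 indices, 64-dimensional embeddings, sequences of 10 integers
model = Sequential()
model.add(Embedding(input_dim=1000, output_dim=64, input_length=10))

print(model.output_shape)   # (None, 10, 64) = (samples, sequence_length, output_dim)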

[Figure: representing the words of a document with word vectors]

The figure above illustrates how the words of a document are represented with word vectors:
(1) Extract all the words of the document and sort them by frequency in descending order (here only the top 50,000 are kept). For example, if the word 'network' occurs most often, it is assigned ID 0, and so on…
(2) Each ID can then be represented as a 50,000-dimensional one-hot (binary) vector.
(3) Finally, we produce a matrix M whose number of rows is the vocabulary size (50,000) and whose number of columns is the word-vector dimension (typically 128 or 300). The first row of the matrix, for instance, is the word vector for ID = 0, i.e. for 'network'.
How is this matrix M obtained? In the Skip-Gram model, we initialize it randomly and then train this weight matrix with a neural network.

So what are the input data and the labels? As shown in the figure below, the input is the one-hot encoding of the centre word (the blue one), and the labels are the one-hot encodings of the words around it (here window_size = 2, taking two words on each side).

[Figure: a centre word (blue) and its surrounding context words, window_size = 2]
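
A small sketch (my own illustration, not from the original post) of how such (centre word, context word) training pairs are generated with window_size = 2:

sentence = ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
window_size = 2

pairs = []
for i, center in enumerate(sentence):
    # take up to window_size words on each side of the centre word
    for j in range(max(0, i - window_size), min(len(sentence), i + window_size + 1)):
        if j != i:
            pairs.append((center, sentence[j]))

print(pairs[:4])  # [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ('quick', 'brown')]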

For the Word2Vec demo described above, the vocabulary size is 10,000 and the word-vector dimension is 300, so the Embedding layer's arguments would be input_dim=10000 and output_dim=300.
Back to the original point: the Embedding layer turns positive integers (indices) into dense vectors of fixed size, e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]].
For example: suppose the vocabulary size is 1000 and the word-vector dimension is 2. After counting word frequencies, tom is assigned id = 4 and jerry id = 20. After the transformation described above, we obtain a 1000 × 2 matrix M; tom corresponds to row 4 of that matrix, and reading off that row gives [0.25, 0.1].
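
A minimal sketch of that lookup in Keras (until the layer is trained, the vectors returned are random initial values, not literally [0.25, 0.1]):

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

# vocabulary of 1000 ids, 2-dimensional embeddings, one id per input sequence
model = Sequential()
model.add(Embedding(input_dim=1000, output_dim=2, input_length=1))
model.compile('rmsprop', 'mse')

ids = np.array([[4], [20]])        # tom -> 4, jerry -> 20
print(model.predict(ids))          # shape (2, 1, 2): one 2-d vector per id

# the same vectors sit in rows 4 and 20 of the layer's weight matrix M
M = model.get_weights()[0]         # shape (1000, 2)
print(M[4], M[20])
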
If the input data does not need pre-learned semantic features for the words, simply using an Embedding layer (trained along with the model) gives you a suitable word-vector matrix. If you do want such semantic features, you can load pre-trained word-vector weights straight into the Embedding layer; for details see the Keras example "Using pre-trained word embeddings in a Keras model".
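
A hedged sketch of that idea (the sizes and the names vocab_size, embedding_dim, max_length, embedding_matrix are placeholders; in practice embedding_matrix is filled row by row from a pre-trained source such as GloVe rather than random numbers):

import numpy as np
from keras.layers import Embedding

vocab_size, embedding_dim, max_length = 10000, 100, 50        # illustrative sizes
embedding_matrix = np.random.rand(vocab_size, embedding_dim)  # stand-in for pre-trained vectors

embedding_layer = Embedding(input_dim=vocab_size,
                            output_dim=embedding_dim,
                            weights=[embedding_matrix],   # initialise with the pre-trained matrix
                            input_length=max_length,
                            trainable=False)              # freeze so the vectors are not updated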

Let's look at a concrete example below.

[Figure: screenshot of a small Embedding example and its output]

As the screenshot shows, the dictionary size of my input array is 3, i.e. it contains three distinct values, so input_dim has to be at least one larger than 3. output_dim is the output dimension, playing the same role as the final fully-connected layer of a CNN: setting it to 5 above turns each value into a 1 x 5 vector. (A sketch reconstructing this example follows.)
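
The original screenshot is not available, but a minimal sketch of what it describes (three distinct values, input_dim = 4, output_dim = 5; the numbers in the output are random until the layer is trained) could look like this:

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

data = np.array([[0, 1, 2, 1, 0]])   # three distinct values: 0, 1 and 2
model = Sequential()
model.add(Embedding(input_dim=4, output_dim=5, input_length=5))   # input_dim = 3 + 1
model.compile('rmsprop', 'mse')

out = model.predict(data)
print(out.shape)   # (1, 5, 5): each of the 5 positions becomes a 1 x 5 vector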

Compile

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True))

Usage of metrics

A metric is a function that is used to judge the performance of your model. Metric functions are to be supplied in the metrics parameter when a model is compiled.

model.compile(loss='mean_squared_error',
              optimizer='sgd',
              metrics=['mae', 'acc'])
from keras import metrics

model.compile(loss='mean_squared_error',
              optimizer='sgd',
              metrics=[metrics.mae, metrics.categorical_accuracy])

A metric function is similar to a loss function, except that the results from evaluating a metric are not used when training the model. You may use any of the loss functions as a metric function.

You can either pass the name of an existing metric, or pass a Theano/TensorFlow symbolic function (see Custom metrics).

Arguments

  • y_true: True labels. Theano/TensorFlow tensor.
  • y_pred: Predictions. Theano/TensorFlow tensor of the same shape as y_true.

Returns

Single tensor value representing the mean of the output array across all datapoints.


Available metrics

binary_accuracy

keras.metrics.binary_accuracy(y_true, y_pred)

categorical_accuracy

keras.metrics.categorical_accuracy(y_true, y_pred)

sparse_categorical_accuracy

keras.metrics.sparse_categorical_accuracy(y_true, y_pred)

top_k_categorical_accuracy

keras.metrics.top_k_categorical_accuracy(y_true, y_pred, k=5)

sparse_top_k_categorical_accuracy

keras.metrics.sparse_top_k_categorical_accuracy(y_true, y_pred, k=5)

In addition to the metrics above, you may use any of the loss functions described in the loss function page as metrics.


Custom metrics

Custom metrics can be passed at the compilation step. The function would need to take (y_true, y_pred) as arguments and return a single tensor value.

import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])

Reposted from blog.csdn.net/qq_40136685/article/details/90750441