Analysis of Dropout in Deep Learning

I first came across the concept of dropout in the deep learning "flower book" (the Deep Learning textbook), and I have also used it in projects through Caffe and Keras, but I never looked carefully at what it actually does; I simply used it. Having recently read some related blog posts, I would like to summarize it here.

The role of dropout is to prevent overfitting, which is especially severe when the training set is small and the model's generalization ability drops sharply. The operation of dropout is to temporarily make some neurons inactive: during each training step, every neuron has a certain probability of being removed from the network.

In fact, dropout can be considered a practical bagging method for ensembling a large number of deep neural networks. Bagging involves training multiple models and evaluating every model on each test sample. This seems impractical when each model is a large neural network, since training and evaluating such networks takes a lot of runtime and memory. Dropout provides an inexpensive approximation to bagging that can train and evaluate an exponentially large number of neural networks.

What dropout ensembles during training are all the sub-networks that can be formed by removing non-output units from the base network. The simplest way to remove a unit is to multiply its output by zero, which effectively deletes it, and some frameworks, such as Keras, do exactly that.
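
For context, this is how a dropout layer is typically inserted between ordinary layers in a Keras model (a minimal sketch; the layer sizes and the 0.5 rate are arbitrary choices for illustration):

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(128, activation='relu', input_dim=20))
# Each of the 128 units is dropped with probability 0.5 during training;
# at test time this layer does nothing.
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))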

Recall how bagging works: we define k different models, construct k different datasets by sampling from the training set with replacement, and then train model i on dataset i. The goal of dropout is to approximate this process over an exponentially large number of neural networks. Specifically, when training with dropout we use a mini-batch based learning algorithm with a small step size, such as gradient descent. Each time we load a sample (or mini-batch), we randomly sample a different binary mask to apply to all the input and hidden units in the network. The mask for each unit is sampled independently, and the probability that a mask value is 1 (meaning the unit is kept) is a hyperparameter fixed before training begins. This keep probability (or equivalently its complement, the drop probability) is in effect the dropout parameter, and it lies in the range [0, 1].
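
To make the masking concrete, here is a small NumPy sketch of what happens for one mini-batch (an illustration only, not code from any framework; keep_prob here stands for 1 minus the dropout parameter):

import numpy as np

keep_prob = 0.8                       # probability that a mask value is 1 (unit is kept)
activations = np.random.randn(4, 5)   # activations of 5 units for a mini-batch of 4 samples

# Sample an independent 0/1 flag for every unit of every sample.
mask = np.random.binomial(1, keep_prob, size=activations.shape)
masked = activations * mask           # dropped units contribute nothing downstream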

The following is the implementation of the dropout function in the Keras (Theano) backend. It uses a random number generator to produce a tensor of 0s and 1s and multiplies it element-wise with the neuron values. Note that at the end each value is also divided by (1 - level), where level is the dropout parameter (a value in [0, 1] giving the probability that an entry is dropped).

# Imports needed to run this snippet standalone (Keras Theano backend).
import numpy as np
from theano import tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams


def dropout(x, level, noise_shape=None, seed=None):
    """Sets entries in `x` to zero at random,
    while scaling the entire tensor.

    # Arguments
        x: tensor
        level: fraction of the entries in the tensor
            that will be set to 0.
        noise_shape: shape for randomly generated keep/drop flags,
            must be broadcastable to the shape of `x`
        seed: random seed to ensure determinism.
    """
    if level < 0. or level >= 1:
        raise ValueError('Dropout level must be in interval [0, 1[.')
    if seed is None:
        seed = np.random.randint(1, 10e6)
    if isinstance(noise_shape, list):
        noise_shape = tuple(noise_shape)

    rng = RandomStreams(seed=seed)
    retain_prob = 1. - level          # probability of keeping an entry

    if noise_shape is None:
        # Sample an independent 0/1 flag for every entry of x.
        random_tensor = rng.binomial(x.shape, p=retain_prob, dtype=x.dtype)
    else:
        # Sample flags with the given shape and broadcast them over x
        # (e.g. to drop the same entries across all timesteps).
        random_tensor = rng.binomial(noise_shape, p=retain_prob, dtype=x.dtype)
        random_tensor = T.patternbroadcast(random_tensor,
                                           [dim == 1 for dim in noise_shape])
    x *= random_tensor                # zero out the dropped entries
    x /= retain_prob                  # rescale so the expected value is unchanged
    return x

As for why each value needs to be divided by (1 - level), check out this blog for an explanation.
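
The short version: dividing by (1 - level) keeps the expected value of each activation the same with and without dropout, so no extra rescaling is needed at test time. A quick numerical check with NumPy (just an illustration):

import numpy as np

level = 0.4
x = 2.0
# Average the masked and rescaled activation over many trials.
samples = np.random.binomial(1, 1. - level, size=100000) * x / (1. - level)
print(samples.mean())  # ~2.0, i.e. the expected output of the unit is unchanged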
