Xiaobai Learns PyTorch Series – torch.nn API: Dropout Layers (11)

method	note
nn.Dropout	During training, randomly zeroes some elements of the input tensor with probability p, using samples from a Bernoulli distribution.
nn.Dropout1d	Randomly zeroes entire channels (a channel is a 1D feature map, e.g. the j-th channel of the i-th sample in the batched input is a 1D tensor input[i,j]).
nn.Dropout2d	Randomly zeroes entire channels (a channel is a 2D feature map, e.g. the j-th channel of the i-th sample in the batched input is a 2D tensor input[i,j]).
nn.Dropout3d	Randomly zeroes entire channels (a channel is a 3D feature map, e.g. the j-th channel of the i-th sample in the batched input is a 3D tensor input[i,j]).
nn.AlphaDropout	Applies Alpha Dropout to the input.
nn.FeatureAlphaDropout	Randomly masks entire channels (a channel is a feature map).

nn.Dropout

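During training, some elements of the input tensor are randomly zeroed with probability p, using samples from a Bernoulli distribution. Each element is zeroed independently on every forward call. The surviving elements are scaled by a factor of 1/(1-p) during training, so during evaluation the module simply computes an identity function.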

This has been shown to be an effective technique for regularization and for preventing the co-adaptation of neurons, as described in the paper Improving neural networks by preventing co-adaptation of feature detectors.

>>> import torch
>>> import torch.nn as nn
>>> m = nn.Dropout(p=0.2)
>>> input = torch.randn(20, 16)
>>> output = m(input)
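To make the training and evaluation behaviour concrete, here is a minimal sketch. It assumes an all-ones input (chosen only so the rescaling is easy to read off) and a fixed seed; the variable names are illustrative and not part of the PyTorch API.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

p = 0.2
m = nn.Dropout(p=p)
x = torch.ones(1000, 100)  # all ones, so surviving values show the scale factor

# Training mode (the default): elements are zeroed with probability p,
# and the survivors are scaled by 1 / (1 - p).
m.train()
y_train = m(x)
zero_fraction = (y_train == 0).float().mean().item()
kept_values = y_train[y_train != 0]

# Evaluation mode: the module is an identity function.
m.eval()
y_eval = m(x)

print(f"fraction zeroed: {zero_fraction:.3f}")
print(f"value of kept elements: {kept_values[0].item():.3f}")
print(f"eval output equals input: {torch.equal(y_eval, x)}")
```

The fraction of zeroed elements should land close to p, and every surviving element should equal 1/(1-p) = 1.25.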

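import torch
import torch.nn as nn
from torch import Tensor


class SelfDropout(nn.Module):
    def __init__(self, p: float = 0.5, inplace: bool = False) -> None:
        super().__init__()
        if p < 0 or p > 1:
            raise ValueError("dropout probability has to be between 0 and 1, "
                             "but got {}".format(p))
        self.p = p
        self.inplace = inplace
        # Custom part starts here.
        # nn.Parameter registers the tensor as a model parameter: it is updated
        # during backpropagation and is moved to the GPU together with the model
        # when model.cuda() is called. Parameters require gradients by default.
        # Note: the size 4096 is hard-coded, so the last input dimension must be
        # 4096, and a mask of zeros gives an all-zero output until it is trained.
        self.dropout_mask = nn.Parameter(torch.zeros(4096))

    # Overrides nn.Module.extra_repr to print user-defined extra information.
    def extra_repr(self) -> str:
        return 'p={}, inplace={}'.format(self.p, self.inplace)

    def forward(self, input: Tensor) -> Tensor:
        if self.training:
            # Print the current mask for inspection during training.
            print(self.dropout_mask)
        # The learned mask is applied in both training and evaluation modes.
        return input * self.dropout_mask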
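For comparison with the learned mask above, the following is a rough sketch of how standard (inverted) dropout could be written by hand. ManualDropout is a hypothetical name used only for illustration, and this is not how nn.Dropout is implemented internally; it also restricts p to [0, 1) to avoid dividing by zero, which is an assumption of this sketch.

```python
import torch
import torch.nn as nn
from torch import Tensor


class ManualDropout(nn.Module):
    """Hypothetical hand-written inverted dropout, for illustration only."""

    def __init__(self, p: float = 0.5) -> None:
        super().__init__()
        if p < 0 or p >= 1:
            raise ValueError(f"p must be in [0, 1), but got {p}")
        self.p = p

    def forward(self, input: Tensor) -> Tensor:
        if not self.training or self.p == 0:
            return input  # identity in evaluation mode
        # Sample a keep-mask: 1 with probability (1 - p), 0 with probability p.
        keep = torch.bernoulli(torch.full_like(input, 1 - self.p))
        # Rescale so the expected value of the output matches the input.
        return input * keep / (1 - self.p)


torch.manual_seed(0)
layer = ManualDropout(p=0.5)
x = torch.ones(4, 8)
y_train = layer(x)   # every entry is either 0.0 or 2.0
layer.eval()
y_eval = layer(x)    # unchanged in evaluation mode
```

Unlike SelfDropout, the mask here is resampled on every forward call and is not a trainable parameter.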

nn.Dropout1d

Randomly zeroes out entire channels (a channel is a 1D feature map, e.g. the j-th channel of the i-th sample in the batched input is a 1D tensor input[i,j]). Each channel is zeroed independently on every forward call with probability p, using samples from a Bernoulli distribution.

>>> m = nn.Dropout1d(p=0.2)
>>> input = torch.randn(20, 16, 32)
>>> output = m(input)
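A quick way to see that whole channels are dropped, rather than single elements, is to sum each channel along its length. This sketch assumes an all-ones input and p=0.5, so a kept channel sums to 32 * 2 = 64 and a dropped one sums to 0.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

m = nn.Dropout1d(p=0.5)
x = torch.ones(20, 16, 32)       # (batch, channels, length)
y = m(x)

# Sum over the length dimension: a dropped channel sums to 0,
# a kept channel sums to 32 * 1 / (1 - 0.5) = 64.
channel_sums = y.sum(dim=2)      # shape (20, 16)
dropped = channel_sums == 0
kept = channel_sums == 64

print(f"dropped channels: {dropped.sum().item()} of {dropped.numel()}")
```

No channel ends up with a sum strictly between 0 and 64, which is what element-wise dropout would produce.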

nn.Dropout2d

Randomly zeroes out entire channels (a channel is a 2D feature map, e.g. the j-th channel of the i-th sample in the batched input is a 2D tensor input[i,j]). Each channel is zeroed independently on every forward call with probability p, using samples from a Bernoulli distribution.

Usually the input comes from nn.Conv2d modules.

As described in the paper Efficient Object Localization Using Convolutional Networks, if adjacent pixels within a feature map are strongly correlated (as is normally the case in early convolutional layers), then i.i.d. dropout will not regularize the activations and will otherwise just result in a lower effective learning rate.

In this case, nn.Dropout2d() helps promote independence between feature maps and should be used instead.

>>> m = nn.Dropout2d(p=0.2)
>>> input = torch.randn(20, 16, 32, 32)
>>> output = m(input)
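The difference between i.i.d. dropout and channel-wise dropout can be checked directly. The helper partially_zeroed_channels below is a hypothetical name introduced for this sketch; it counts feature maps that contain both zero and non-zero entries.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

x = torch.ones(20, 16, 32, 32)   # (batch, channels, height, width)


def partially_zeroed_channels(y: torch.Tensor) -> int:
    """Count channels that contain both zero and non-zero entries."""
    zero_frac = (y == 0).float().mean(dim=(2, 3))   # per-channel fraction of zeros
    return int(((zero_frac > 0) & (zero_frac < 1)).sum())


# Freshly constructed modules are in training mode, so dropout is active.
y_iid = nn.Dropout(p=0.5)(x)          # element-wise dropout
y_channel = nn.Dropout2d(p=0.5)(x)    # channel-wise dropout

n_iid = partially_zeroed_channels(y_iid)
n_channel = partially_zeroed_channels(y_channel)
print(f"partially zeroed channels: nn.Dropout={n_iid}, nn.Dropout2d={n_channel}")
```

With nn.Dropout every one of the 320 feature maps loses some pixels, while nn.Dropout2d leaves each feature map either fully intact (and rescaled) or fully zeroed.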

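nn.Dropout3d

Randomly zeroes out entire channels (a channel is a 3D feature map, e.g. the j-th channel of the i-th sample in the batched input is a 3D tensor input[i,j]). Each channel is zeroed independently on every forward call with probability p, using samples from a Bernoulli distribution.

Usually the input comes from nn.Conv3d modules.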

>>> m = nn.Dropout3d(p=0.2)
>>> input = torch.randn(20, 16, 4, 32, 32)
>>> output = m(input)
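In practice nn.Dropout3d typically follows a 3D convolution. The sketch below assumes a small, arbitrarily sized nn.Conv3d layer (the layer sizes are illustrative) and compares the training and evaluation outputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

conv = nn.Conv3d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
drop = nn.Dropout3d(p=0.5)
x = torch.randn(4, 3, 4, 8, 8)   # (batch, channels, depth, height, width)

with torch.no_grad():
    features = conv(x)           # shape (4, 16, 4, 8, 8)
    drop.train()
    y_train = drop(features)     # whole 3D feature maps zeroed, the rest scaled by 2
    drop.eval()
    y_eval = drop(features)      # identity in evaluation mode

# A channel counts as dropped when every one of its voxels is zero.
is_dropped = (y_train == 0).flatten(2).all(dim=2)   # shape (4, 16)
print(f"dropped {int(is_dropped.sum())} of {is_dropped.numel()} feature maps")
```

The kept feature maps equal the convolution output multiplied by 1/(1-p) = 2, and the evaluation output is the convolution output unchanged.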

nn.AlphaDropout

Applies Alpha Dropout to the input.

Alpha Dropout is a type of dropout that maintains the self-normalizing property. For an input with zero mean and unit standard deviation, the output of Alpha Dropout keeps the original mean and standard deviation of the input. Alpha Dropout goes hand in hand with the SELU activation function, which ensures that the outputs have zero mean and unit standard deviation.

During training, it randomly masks some elements of the input tensor with probability p, using samples from a Bernoulli distribution. The elements to mask are randomized on each forward call, and scaled and shifted to maintain zero mean and unit standard deviation.

During evaluation, the module simply computes an identity function.

More details can be found in the paper Self-Normalizing Neural Networks.

>>> m = nn.AlphaDropout(p=0.2)
>>> input = torch.randn(20, 16)
>>> output = m(input)
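The claim about preserved statistics can be checked numerically. This sketch assumes a large standard-normal input so that sample statistics are stable, and compares against plain nn.Dropout, whose output standard deviation grows to roughly sqrt(1/(1-p)).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

p = 0.2
x = torch.randn(1000, 1000)          # zero mean, unit standard deviation

alpha_drop = nn.AlphaDropout(p=p)
alpha_out = alpha_drop(x)            # training mode by default
plain_out = nn.Dropout(p=p)(x)

alpha_mean = alpha_out.mean().item()
alpha_std = alpha_out.std().item()
plain_std = plain_out.std().item()

alpha_drop.eval()
eval_out = alpha_drop(x)             # identity in evaluation mode

print(f"AlphaDropout: mean={alpha_mean:.3f}, std={alpha_std:.3f}")
print(f"Dropout:      std={plain_std:.3f}")
```

The AlphaDropout output should stay close to mean 0 and standard deviation 1, whereas plain dropout inflates the standard deviation.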

nn.FeatureAlphaDropout

Randomly masks out entire channels (a channel is a feature map, e.g. the j-th channel of the i-th sample in the batched input is a tensor input[i,j]). Instead of setting the activations to zero as in regular dropout, the activations are set to the negative saturation value of the SELU activation function. More details can be found in the paper Self-Normalizing Neural Networks.

Each channel is masked independently on every forward call with probability p, using samples from a Bernoulli distribution. The channels to be masked are randomized on every forward call, and the output is scaled and shifted to maintain zero mean and unit variance.

Usually the input comes from nn.AlphaDropout modules.

>>> m = nn.FeatureAlphaDropout(p=0.2)
>>> input = torch.randn(20, 16, 4, 32, 32)
>>> output = m(input)
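One way to observe the channel-wise masking is to look for feature maps that hold a single constant value after the forward pass. This is a sketch under the assumption that p=0.5 is used to get a good mix of masked and kept channels; the variable names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

m = nn.FeatureAlphaDropout(p=0.5)
x = torch.randn(20, 16, 4, 32, 32)
y = m(x)                                            # training mode by default

flat = y.flatten(2)                                 # (20, 16, 4 * 32 * 32)
# A masked channel holds one constant value; a kept channel is a scaled and
# shifted copy of the random input, so its values differ.
is_constant = (flat == flat[:, :, :1]).all(dim=2)   # shape (20, 16)
masked_values = flat[is_constant][:, 0]             # one value per masked channel

print(f"masked channels: {int(is_constant.sum())} of {is_constant.numel()}")
print(f"value in masked channels: {masked_values[0].item():.4f}")
```

All masked channels share the same negative value, which reflects the scaled and shifted negative saturation value rather than zero.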