PyTorch: weight initialization methods

PyTorch provides the commonly used weight initialization functions in torch.nn.init; this post briefly summarizes them for easy reference.

It is organized in two parts:

1. the Xavier and Kaiming families;

2. other distribution-based methods

 

Xavier initialization, from the paper "Understanding the difficulty of training deep feedforward neural networks".

The formula is derived from the principle of "variance consistency", and it comes in two variants: a uniform distribution and a normal distribution.

1. Xavier uniform distribution

torch.nn.init.xavier_uniform_(tensor, gain=1)

Xavier initialization with a uniform distribution U(-a, a), where the bound is a = gain * sqrt(6 / (fan_in + fan_out)).

The gain is set according to the type of activation function,

e.g.: nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu'))

PS: this initialization method is also known as Glorot initialization.
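A minimal, runnable sketch of the usage (the layer fc and its sizes are illustrative assumptions, not from the original post):

import torch.nn as nn

fc = nn.Linear(128, 64)  # weight shape (64, 128): fan_in = 128, fan_out = 64
nn.init.xavier_uniform_(fc.weight, gain=nn.init.calculate_gain('relu'))
nn.init.constant_(fc.bias, 0.0)  # biases are usually handled separately, e.g. set to 0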

 

2. Xavier normal distribution

torch.nn.init.xavier_normal_(tensor, gain=1)

Xavier initialization with a normal distribution:

mean = 0, std = gain * sqrt(2 / (fan_in + fan_out))
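A short sketch along the same lines (the layer fc and the tanh gain are illustrative assumptions):

import torch.nn as nn

fc = nn.Linear(128, 64)
nn.init.xavier_normal_(fc.weight, gain=nn.init.calculate_gain('tanh'))  # gain for tanh is 5/3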

 

Kaiming initialization, from the paper "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification". The formula is likewise derived from the principle of "variance consistency". Kaiming initialization was proposed as an improvement for activation functions such as ReLU, on which Xavier initialization performs poorly; see the paper for details.

 

3. Kaiming uniform distribution

torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')

This is a uniform distribution U(-bound, bound), with bound = sqrt(6 / ((1 + a^2) * fan_in)).

Here a is the slope of the activation function on the negative axis; for ReLU it is 0.

mode can be 'fan_in' or 'fan_out': 'fan_in' keeps the variance consistent in the forward pass, while 'fan_out' keeps the variance consistent during back-propagation.

nonlinearity can be 'relu' or 'leaky_relu'; the default is 'leaky_relu'.

nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
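A minimal sketch applying it to a convolution (the layer conv and its shape are illustrative assumptions):

import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3)  # fan_in = 3 * 3 * 3 = 27
nn.init.kaiming_uniform_(conv.weight, mode='fan_in', nonlinearity='relu')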

 

4. Kaiming normal distribution

torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')

This is a zero-mean normal distribution N(0, std), with std = sqrt(2 / ((1 + a^2) * fan_in)).

The parameters a, mode, and nonlinearity have the same meaning as for kaiming_uniform_ above.

nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')
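In practice this is often applied to every convolution in a model; a sketch with a toy model (the model definition here is purely illustrative):

import torch.nn as nn

net = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))  # toy model
for m in net.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        if m.bias is not None:
            nn.init.constant_(m.bias, 0.0)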

2. Other distribution methods

 

5. Uniform distribution initialization

torch.nn.init.uniform_(tensor, a=0, b=1)

Fills the tensor with values drawn from the uniform distribution U(a, b).
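A tiny sketch (the tensor shape and bounds are illustrative assumptions):

import torch

w = torch.empty(3, 5)
torch.nn.init.uniform_(w, a=-0.1, b=0.1)  # values drawn from U(-0.1, 0.1)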

 

6. Normal distribution initialization

torch.nn.init.normal_(tensor, mean=0, std=1)

Fills the tensor with values drawn from the normal distribution N(mean, std); the defaults are mean=0 and std=1.
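A tiny sketch (the shape and std value are illustrative assumptions):

import torch

w = torch.empty(3, 5)
torch.nn.init.normal_(w, mean=0.0, std=0.02)  # a small std is common for e.g. embedding weights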

 

7. Constant initialization

torch.nn.init.constant_(tensor, val)

Fills the tensor with the constant value val, e.g. nn.init.constant_(w, 0.3).
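A tiny sketch of a common use, zeroing the biases of a layer (the layer fc is an illustrative assumption):

import torch.nn as nn

fc = nn.Linear(10, 5)
nn.init.constant_(fc.bias, 0.0)  # set all biases to 0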

 

8. Identity matrix initialization

torch.nn.init.eye_(tensor)

Initializes a two-dimensional tensor to the identity matrix.
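A tiny sketch (the 4x4 shape is an illustrative assumption):

import torch

w = torch.empty(4, 4)
torch.nn.init.eye_(w)  # w is now the 4x4 identity matrix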

 

9. Orthogonal initialization

torch.nn.init.orthogonal_(tensor, gain=1)

Makes the tensor (semi-)orthogonal. See the paper "Exact solutions to the nonlinear dynamics of learning in deep linear neural networks" - Saxe, A. et al. (2013).
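A short sketch; recurrent weights are a typical target for orthogonal initialization, and the RNN layer here is an illustrative assumption:

import torch.nn as nn

rnn = nn.RNN(input_size=16, hidden_size=32)
nn.init.orthogonal_(rnn.weight_hh_l0)  # hidden-to-hidden weights of layer 0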

 

10. Sparse initialization

torch.nn.init.sparse_(tensor, sparsity, std=0.01)

Fills the tensor as a sparse matrix: the non-zero elements are drawn from the normal distribution N(0, std), and part of each column is set to 0.

sparsity: the fraction of elements in each column that are set to 0.

nn.init.sparse_(w, sparsity=0.1)
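A tiny sketch checking the effect (the shape and sparsity value are illustrative assumptions):

import torch

w = torch.empty(10, 5)
torch.nn.init.sparse_(w, sparsity=0.1, std=0.01)
print((w == 0).float().mean(dim=0))  # fraction of zeros per column, roughly equal to sparsity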

 

Reprinted from: https://www.cnblogs.com/jfdwd/p/11269622.html
