Neural Network Weight Initialization Methods

From: http://blog.csdn.net/u013989576/article/details/76215989

The common weight initialization methods are: constant initialization (constant), Gaussian initialization (gaussian), positive_unitball initialization, uniform initialization (uniform), xavier initialization, msra initialization, and bilinear initialization (bilinear).

Constant initialization (constant)

       The weights or biases are initialized to a single constant; the specific constant value can be chosen by the user.
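A minimal NumPy sketch of the idea (the function name and arguments below are illustrative, not Caffe's actual ConstantFiller code):

```python
import numpy as np

def constant_init(shape, value=0.0):
    """Fill a weight (or bias) tensor with a single user-chosen constant."""
    return np.full(shape, value, dtype=np.float32)

b = constant_init((50,), value=0.1)  # e.g. biases of a 50-unit layer
```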

Gaussian initialization (gaussian)

       The weights are drawn from a Gaussian distribution with a given mean and standard deviation.
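An illustrative NumPy sketch (the function name and default values are assumptions, not a library API):

```python
import numpy as np

def gaussian_init(shape, mean=0.0, std=0.01):
    """Draw each weight from N(mean, std^2)."""
    return np.random.normal(loc=mean, scale=std, size=shape).astype(np.float32)

W = gaussian_init((100, 50), mean=0.0, std=0.01)
```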

positive_unitball initialization

       Makes the incoming weights of each neuron sum to 1. For example, if a neuron has 100 inputs, the 100 incoming weights are first assigned values drawn from a uniform distribution on (0, 1), and then each weight is divided by the sum of the 100 weights. This helps prevent the initial weights from being too large, which would push the activation function (e.g. the sigmoid function) into its saturation region. It is therefore better suited to sigmoid-like activation functions.
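A sketch of this idea in NumPy, assuming each row of the weight matrix holds one output neuron's incoming weights (the layout and function name are assumptions for illustration):

```python
import numpy as np

def positive_unitball_init(fan_out, fan_in):
    """Draw each neuron's incoming weights from U(0, 1), then divide by
    their sum so that every row sums to 1."""
    W = np.random.uniform(0.0, 1.0, size=(fan_out, fan_in))
    W /= W.sum(axis=1, keepdims=True)
    return W.astype(np.float32)

W = positive_unitball_init(fan_out=50, fan_in=100)
assert np.allclose(W.sum(axis=1), 1.0)  # each neuron's weights sum to 1
```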

Uniform initialization (uniform)

       The weights or biases are initialized from a uniform distribution, with min and max controlling the lower and upper bounds; the default range is (0, 1).
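An illustrative NumPy sketch with configurable bounds (names and defaults are assumptions):

```python
import numpy as np

def uniform_init(shape, low=0.0, high=1.0):
    """Draw each weight from U(low, high); (0, 1) mirrors the default range."""
    return np.random.uniform(low, high, size=shape).astype(np.float32)

W = uniform_init((100, 50), low=-0.05, high=0.05)
```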

xavier initialization (paper: "Understanding the difficulty of training deep feedforward neural networks")

       The weights are drawn from a uniform distribution with mean 0 and variance 1 / n, where n is the number of inputs. If we care more about the forward pass, we can choose n = fan_in, the number of inputs in forward propagation; if we care more about the backward pass, we choose n = fan_out, because during back-propagation fan_out plays the role of the neuron's input count; if both are considered, take the average n = (fan_in + fan_out) / 2. Xavier (XavierFiller) initialization also works reasonably well for the ReLU activation function. For details of this method see article 1 and article 2; the derivation assumes a linear activation function.
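A sketch of the corresponding sampling rule in NumPy: for a uniform distribution on (-a, a) the variance is a^2 / 3, so setting a = sqrt(3 / n) gives the desired variance 1 / n. The `mode` argument and the (fan_out, fan_in) layout below are assumptions for illustration:

```python
import numpy as np

def xavier_init(fan_in, fan_out, mode="average"):
    """Uniform distribution with mean 0 and variance 1/n, where n is
    fan_in, fan_out, or their average depending on `mode`."""
    n = {"fan_in": fan_in,
         "fan_out": fan_out,
         "average": (fan_in + fan_out) / 2.0}[mode]
    limit = np.sqrt(3.0 / n)  # Var(U(-a, a)) = a^2 / 3 = 1 / n
    return np.random.uniform(-limit, limit,
                             size=(fan_out, fan_in)).astype(np.float32)

W = xavier_init(fan_in=256, fan_out=128, mode="average")
```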

msra initialization (paper: "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification")

       The weights are drawn from a Gaussian distribution with mean 0 and variance 2 / n, where n is the number of inputs. It is particularly well suited to the ReLU activation function, since the method was proposed mainly with ReLU in mind; the derivation is similar to Xavier's, see the referenced blog post for details.
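A minimal NumPy sketch of this rule (illustrative only, not the Caffe MSRAFiller source; the (fan_out, fan_in) layout is an assumption):

```python
import numpy as np

def msra_init(fan_in, fan_out):
    """Gaussian with mean 0 and variance 2 / fan_in (He et al.),
    intended for layers followed by ReLU."""
    std = np.sqrt(2.0 / fan_in)
    return np.random.normal(0.0, std, size=(fan_out, fan_in)).astype(np.float32)

W = msra_init(fan_in=256, fan_out=128)
```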

Bilinear initialization (bilinear)

      Commonly used to initialize the weights of deconvolution (transposed convolution) layers in a neural network.
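A common recipe for such an initializer is the 2-D bilinear interpolation kernel, which makes the deconvolution layer start out as bilinear upsampling. The sketch below follows that typical recipe rather than any specific library's implementation:

```python
import numpy as np

def bilinear_kernel(kernel_size):
    """2-D bilinear interpolation kernel, often used to initialize the
    weights of a deconvolution (upsampling) layer."""
    factor = (kernel_size + 1) // 2
    if kernel_size % 2 == 1:
        center = factor - 1
    else:
        center = factor - 0.5
    og = np.ogrid[:kernel_size, :kernel_size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor)).astype(np.float32)

K = bilinear_kernel(4)  # e.g. a 4x4 kernel for 2x upsampling
```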
