tf.variance_scaling_initializer() and friends: TensorFlow learning notes on parameter initialization

The most important parameters in a CNN are the weights W and the biases b. The ultimate goal of training a CNN is to obtain the parameters that minimize the objective function. Parameter initialization is equally important, which is why fine-tuning receives so much attention. So which initialization methods does tf provide, and can we initialize parameters ourselves?

All initialization methods are defined in tensorflow/python/ops/init_ops.py.

1、tf.constant_initializer()

It can also be abbreviated as tf.Constant().

Initializes a variable to a constant. This is very useful; the bias term is usually initialized with it.

Two initialization methods are derived from it:

a. tf.zeros_initializer(), which may be abbreviated as tf.Zeros()

b. tf.ones_initializer(), which may be abbreviated as tf.Ones()

Example: in a convolutional layer, initializing the bias term b to 0 can be written in several ways:

  conv1 = tf.layers.conv2d(batch_images,
                           filters=64,
                           kernel_size=7,
                           strides=2,
                           activation=tf.nn.relu,
                           kernel_initializer=tf.TruncatedNormal(stddev=0.01),
                           bias_initializer=tf.Constant(0),
                           )

or:

  bias_initializer=tf.constant_initializer(0)

or:

  bias_initializer=tf.zeros_initializer()

or:

  bias_initializer=tf.Zeros()

Example: how do we initialize W to the Laplace operator?

  value = [1, 1, 1, 1, -8, 1, 1, 1, 1]
  init = tf.constant_initializer(value)
  W = tf.get_variable('W', shape=[3, 3], initializer=init)

2、tf.truncated_normal_initializer()

It may also be abbreviated as tf.TruncatedNormal().

Generates random numbers from a truncated normal distribution. This initialization method is used quite often in tf.

It has four parameters (mean=0.0, stddev=1.0, seed=None, dtype=dtypes.float32), which specify the mean, standard deviation, random seed, and data type of the random numbers. In general only stddev needs to be set.

Example:

  conv1 = tf.layers.conv2d(batch_images,
                           filters=64,
                           kernel_size=7,
                           strides=2,
                           activation=tf.nn.relu,
                           kernel_initializer=tf.TruncatedNormal(stddev=0.01),
                           bias_initializer=tf.Constant(0),
                           )

or:

  conv1 = tf.layers.conv2d(batch_images,
                           filters=64,
                           kernel_size=7,
                           strides=2,
                           activation=tf.nn.relu,
                           kernel_initializer=tf.truncated_normal_initializer(stddev=0.01),
                           bias_initializer=tf.zeros_initializer(),
                           )

3、tf.random_normal_initializer()

It may be abbreviated as tf.RandomNormal().

Generates random numbers from a standard normal distribution; its parameters are the same as those of truncated_normal_initializer.
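
A minimal usage sketch, assuming the TF 1.x API (the variable name and shape here are arbitrary):

  import tensorflow as tf

  # Weights drawn from N(0, 0.01^2).
  init = tf.random_normal_initializer(mean=0.0, stddev=0.01)
  W = tf.get_variable('W', shape=[784, 256], initializer=init)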

4、tf.random_uniform_initializer()

It may be abbreviated as tf.RandomUniform().

Generates uniformly distributed random numbers. There are four parameters (minval=0, maxval=None, seed=None, dtype=dtypes.float32), which specify the minimum, maximum, random seed, and data type of the random numbers.
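
A minimal sketch, assuming the TF 1.x API (the interval and kernel shape below are just examples):

  import tensorflow as tf

  # Convolution kernel drawn uniformly from [-0.05, 0.05].
  init = tf.random_uniform_initializer(minval=-0.05, maxval=0.05)
  W = tf.get_variable('W', shape=[3, 3, 64, 128], initializer=init)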

5、tf.uniform_unit_scaling_initializer()

It may be abbreviated as tf.UniformUnitScaling().

Much like the uniform distribution above, but this initialization does not require specifying the minimum and maximum values; they are computed instead. The parameters are (factor=1.0, seed=None, dtype=dtypes.float32):

  max_val = math.sqrt(3 / input_size) * factor

Here input_size refers to the dimensionality of the input data. Assuming the input is x and the computation is x * W, then input_size = W.shape[0].

The distribution interval is [-max_val, max_val].
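
A minimal sketch, assuming the TF 1.x API; for the shape below, input_size = 256, so max_val = sqrt(3 / 256) * 1.0 ≈ 0.108:

  import tensorflow as tf

  # Uniform over [-max_val, max_val]; max_val is computed from the input size.
  init = tf.uniform_unit_scaling_initializer(factor=1.0)
  W = tf.get_variable('W', shape=[256, 128], initializer=init)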

6、tf.variance_scaling_initializer()

It may be abbreviated as tf.VarianceScaling().

The parameters are (scale=1.0, mode="fan_in", distribution="normal", seed=None, dtype=dtypes.float32):

scale: scaling factor (a positive float).

mode: a "fan_in", "fan_out", "fan_avg" is used to calculate the value of the standard deviation stddev.

distribution: the distribution type, either "normal" or "uniform".

When the distribution = "normal" when generating truncated normal distribution (normal distribution truncated) random number, wherein stddev = sqrt (scale / n), n is calculated with the mode parameters.

      If mode = "fan_in", n is the number of nodes in the input unit;         

      If mode = "fan_out", n is the number of nodes of the output unit;

       If mode = "fan_avg", n means an average value of the input and output points of junction.

When the distribution = "uniform" when generating uniformly distributed random numbers, assuming that the distribution interval [-limit, limit], then

      limit = sqrt(3 * scale / n)
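
For example, the popular He initialization for ReLU layers corresponds to scale=2.0 with mode="fan_in" and a normal distribution. A minimal sketch, assuming the TF 1.x API and reusing batch_images from the earlier examples:

  import tensorflow as tf

  # He initialization: truncated normal with stddev = sqrt(2 / fan_in).
  init = tf.variance_scaling_initializer(scale=2.0, mode='fan_in',
                                         distribution='normal')
  conv = tf.layers.conv2d(batch_images, filters=64, kernel_size=3,
                          activation=tf.nn.relu, kernel_initializer=init)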

7、tf.orthogonal_initializer()

Abbreviated as tf.Orthogonal().

Generates a random orthogonal matrix.

When the parameter to be generated is two-dimensional, the orthogonal matrix is obtained from the SVD of a matrix of uniformly distributed random numbers.
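
A minimal sketch, assuming the TF 1.x API (the square shape here is arbitrary; orthogonal initialization is often used for recurrent weight matrices):

  import tensorflow as tf

  # For a square W, the columns are orthonormal up to the gain factor,
  # i.e. W^T W = gain^2 * I.
  init = tf.orthogonal_initializer(gain=1.0)
  W = tf.get_variable('W', shape=[128, 128], initializer=init)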

8、tf.glorot_uniform_initializer()

Also known as the Xavier uniform initializer. It initializes values with a uniform distribution.

Assuming the uniform distribution interval is [-limit, limit], then

limit = sqrt(6 / (fan_in + fan_out))

where fan_in and fan_out are the numbers of input and output units, respectively.
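
A minimal sketch, assuming the TF 1.x API; for the shape below, fan_in = 784 and fan_out = 256, so limit = sqrt(6 / 1040) ≈ 0.076:

  import tensorflow as tf

  # Xavier/Glorot uniform initialization for a fully connected layer.
  init = tf.glorot_uniform_initializer()
  W = tf.get_variable('W', shape=[784, 256], initializer=init)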

9、tf.glorot_normal_initializer()

Also known as the Xavier normal initializer. It initializes values with a truncated normal distribution:

stddev = sqrt(2 / (fan_in + fan_out))

where fan_in and fan_out are the numbers of input and output units, respectively.
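
A minimal sketch, assuming the TF 1.x API; for the shape below, stddev = sqrt(2 / (784 + 256)) ≈ 0.044:

  import tensorflow as tf

  # Xavier/Glorot normal initialization for a fully connected layer.
  init = tf.glorot_normal_initializer()
  W = tf.get_variable('W', shape=[784, 256], initializer=init)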


Source: www.cnblogs.com/jfdwd/p/11184117.html