The most important components of a CNN are its parameters, W and b. The ultimate goal of training a CNN is to find the parameters that minimize the objective function, so initializing those parameters well is equally important — which is why fine-tuning attracts so much attention. So which parameter-initialization methods does TensorFlow provide, and can we initialize parameters ourselves?
All initialization methods are defined in tensorflow/python/ops/init_ops.py.
1、tf.constant_initializer()
Can also be abbreviated as tf.Constant().
Initializes a variable to a constant. This is very useful; the bias term is usually initialized with it.
Two initialization methods are derived from it:
a. tf.zeros_initializer(), which may be abbreviated as tf.Zeros()
b. tf.ones_initializer(), which may be abbreviated as tf.Ones()
Example: in a convolutional layer, initializing the bias term b to 0 can be written in several ways:
conv1 = tf.layers.conv2d(batch_images,
                         filters=64,
                         kernel_size=7,
                         strides=2,
                         activation=tf.nn.relu,
                         kernel_initializer=tf.TruncatedNormal(stddev=0.01),
                         bias_initializer=tf.Constant(0),
                         )
or:
    bias_initializer=tf.constant_initializer(0)
or:
    bias_initializer=tf.zeros_initializer()
or:
    bias_initializer=tf.Zeros()
Example: how do we initialize W as the 3×3 Laplacian operator?

value = [1, 1, 1, 1, -8, 1, 1, 1, 1]
init = tf.constant_initializer(value)
W = tf.get_variable('W', shape=[3, 3], initializer=init)
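As a sanity check, the constant fill can be sketched in plain Python, with no TensorFlow required:

```python
# Flat coefficient list from the example above
value = [1, 1, 1, 1, -8, 1, 1, 1, 1]

# Fill a 3x3 matrix row by row, mimicking how
# tf.constant_initializer(value) fills a variable of shape [3, 3]
W = [value[row * 3:(row + 1) * 3] for row in range(3)]

print(W)           # [[1, 1, 1], [1, -8, 1], [1, 1, 1]]
print(sum(value))  # 0 -- the Laplacian kernel's coefficients sum to zero
```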
2、tf.truncated_normal_initializer()
May also be abbreviated as tf.TruncatedNormal().
Generates random numbers from a truncated normal distribution. This is one of the most frequently used initialization methods in tf.
It has four parameters (mean=0.0, stddev=1.0, seed=None, dtype=dtypes.float32), which specify the mean, standard deviation, random seed, and data type of the random numbers. Usually only stddev needs to be set.
Example:
conv1 = tf.layers.conv2d(batch_images,
                         filters=64,
                         kernel_size=7,
                         strides=2,
                         activation=tf.nn.relu,
                         kernel_initializer=tf.TruncatedNormal(stddev=0.01),
                         bias_initializer=tf.Constant(0),
                         )
or:

conv1 = tf.layers.conv2d(batch_images,
                         filters=64,
                         kernel_size=7,
                         strides=2,
                         activation=tf.nn.relu,
                         kernel_initializer=tf.truncated_normal_initializer(stddev=0.01),
                         bias_initializer=tf.zeros_initializer(),
                         )
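The truncation rule itself is easy to sketch in plain Python: tf's truncated normal redraws any sample falling more than two standard deviations from the mean. A minimal illustration (not TF's actual implementation):

```python
import random

def truncated_normal(mean=0.0, stddev=1.0, seed=None):
    """Draw from a normal distribution, redrawing until the sample lies
    within 2 standard deviations of the mean -- the same truncation rule
    tf.truncated_normal_initializer applies."""
    rng = random.Random(seed)
    while True:
        x = rng.gauss(mean, stddev)
        if abs(x - mean) <= 2 * stddev:
            return x

samples = [truncated_normal(stddev=0.01, seed=i) for i in range(1000)]
print(max(abs(s) for s in samples) <= 0.02)  # True: every draw is within 2*stddev
```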
3、tf.random_normal_initializer()
May be abbreviated as tf.RandomNormal().
Generates random numbers from a normal distribution; its parameters are the same as those of truncated_normal_initializer.
4、tf.random_uniform_initializer()
May be abbreviated as tf.RandomUniform().
Generates uniformly distributed random numbers. It has four parameters (minval=0, maxval=None, seed=None, dtype=dtypes.float32), which specify the minimum, maximum, random seed, and data type.
5、tf.uniform_unit_scaling_initializer()
May be abbreviated as tf.UniformUnitScaling().
Similar to the uniform distribution, but there is no need to specify the minimum and maximum at initialization; they are computed from the input size. Parameters: (factor=1.0, seed=None, dtype=dtypes.float32)

    max_val = math.sqrt(3 / input_size) * factor

Here input_size is the dimension of the input data. Assuming the input is x and we compute x * W, then input_size = W.shape[0].
The distribution interval is [-max_val, max_val].
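The max_val formula can be checked with a few lines of plain Python:

```python
import math

def uniform_unit_scaling_bound(input_size, factor=1.0):
    """Half-width of the sampling interval used by
    tf.uniform_unit_scaling_initializer: values are drawn
    uniformly from [-max_val, max_val]."""
    return math.sqrt(3 / input_size) * factor

# For a layer whose weight matrix has 300 input rows, with the default factor:
print(round(uniform_unit_scaling_bound(300), 4))  # 0.1
```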
6、tf.variance_scaling_initializer()
May be abbreviated as tf.VarianceScaling().
Parameters: (scale=1.0, mode="fan_in", distribution="normal", seed=None, dtype=dtypes.float32)
scale: scaling factor (a positive float)
mode: one of "fan_in", "fan_out", or "fan_avg"; used to compute the standard deviation stddev.
distribution: the distribution type, one of "normal" or "uniform".
When distribution="normal", it generates truncated-normal random numbers with stddev = sqrt(scale / n), where n is computed from the mode parameter:
If mode = "fan_in", n is the number of input units;
If mode = "fan_out", n is the number of output units;
If mode = "fan_avg", n is the average of the numbers of input and output units.
When distribution="uniform", it generates uniformly distributed random numbers on the interval [-limit, limit], where

    limit = sqrt(3 * scale / n)
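The stddev/limit rules above can be sketched in plain Python (an illustration of the formulas, not TF's actual code):

```python
import math

def variance_scaling_params(fan_in, fan_out, scale=1.0, mode="fan_in",
                            distribution="normal"):
    """Compute the stddev (normal) or limit (uniform) that
    tf.variance_scaling_initializer derives from its arguments."""
    if mode == "fan_in":
        n = fan_in
    elif mode == "fan_out":
        n = fan_out
    else:  # "fan_avg"
        n = (fan_in + fan_out) / 2
    if distribution == "normal":
        return math.sqrt(scale / n)   # stddev of the truncated normal
    return math.sqrt(3 * scale / n)   # half-width of the uniform interval

print(round(variance_scaling_params(100, 50), 3))  # 0.1  (stddev, fan_in)
print(round(variance_scaling_params(100, 50, mode="fan_avg",
                                    distribution="uniform"), 3))  # 0.2  (limit)
```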
7、tf.orthogonal_initializer()
Abbreviated as tf.Orthogonal().
Generates a random orthogonal matrix.
When the shape to generate is two-dimensional, the orthogonal matrix is obtained from the SVD of a matrix of uniformly distributed random numbers.
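To illustrate what "orthogonal" means here, the following plain-Python sketch orthonormalizes random rows with Gram–Schmidt. This only demonstrates the orthogonality property; tf.orthogonal_initializer itself uses an SVD as described above:

```python
import random

def orthonormalize(rows):
    """Gram-Schmidt: turn linearly independent rows into orthonormal rows
    (unit length, mutually perpendicular)."""
    basis = []
    for v in rows:
        # Subtract the projection of v onto each earlier basis vector
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        # Normalize to unit length
        norm = sum(x * x for x in v) ** 0.5
        basis.append([x / norm for x in v])
    return basis

rng = random.Random(0)
Q = orthonormalize([[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)])

# Rows of Q are orthonormal: dot products between distinct rows are ~0
dot01 = sum(a * b for a, b in zip(Q[0], Q[1]))
print(abs(dot01) < 1e-9)  # True
```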
8、tf.glorot_uniform_initializer()
Also known as the Xavier uniform initializer; initializes with a uniform distribution.
Assuming the uniform distribution interval is [-limit, limit], then

    limit = sqrt(6 / (fan_in + fan_out))

where fan_in and fan_out are the numbers of input and output units, respectively.
9、tf.glorot_normal_initializer()
Also known as the Xavier normal initializer; initializes with a truncated normal distribution.
    stddev = sqrt(2 / (fan_in + fan_out))

where fan_in and fan_out are the numbers of input and output units, respectively.
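Both Xavier formulas are easy to check in plain Python:

```python
import math

def glorot_uniform_limit(fan_in, fan_out):
    """Half-width of the uniform interval used by tf.glorot_uniform_initializer."""
    return math.sqrt(6 / (fan_in + fan_out))

def glorot_normal_stddev(fan_in, fan_out):
    """Stddev of the truncated normal used by tf.glorot_normal_initializer."""
    return math.sqrt(2 / (fan_in + fan_out))

# A fully connected layer with 200 inputs and 100 outputs:
print(round(glorot_uniform_limit(200, 100), 4))   # 0.1414
print(round(glorot_normal_stddev(200, 100), 4))   # 0.0816
```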