Neural Network 03 (parameter initialization)

1. Parameter initialization

For a given neuron, there are two kinds of parameters to initialize: the weights W and the bias b. The bias b can simply be initialized to 0, so the initialization of the weights W is the more important question; we focus on the common initialization methods below.
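As a quick illustration (a minimal sketch; the layer size of 4 units is an arbitrary choice), a tf.keras Dense layer exposes both choices directly, kernel_initializer for the weights W and bias_initializer for the bias b:

# Minimal sketch (4 units is an arbitrary choice): a Dense layer with an
# explicit weight initializer and the bias initialized to zeros
import tensorflow as tf
layer = tf.keras.layers.Dense(
    units=4,
    kernel_initializer=tf.keras.initializers.GlorotNormal(),  # weights W
    bias_initializer=tf.keras.initializers.Zeros()            # bias b = 0
)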

(1) Random initialization

Random initialization samples from a Gaussian distribution (also called a normal distribution) with a mean of 0 and a standard deviation of 1, and scales the samples down so that the weights W are initialized to very small values.
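A minimal sketch of this idea, assuming a commonly used scale factor of 0.01 and a neuron with 9 inputs (neither value is specified above):

# Minimal sketch of random initialization (the 0.01 scale and the 9 inputs
# are assumptions): sample small values from a scaled normal distribution
import tensorflow as tf
initializer = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.01)
values = initializer(shape=(9, 1))
print(values)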

(2) Standard initialization

The weight parameters are initialized with values drawn uniformly at random from the interval (-1/√d, 1/√d), where d is the number of inputs to each neuron.
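A minimal sketch of this rule, assuming a neuron with d = 9 inputs (the value of d is an arbitrary choice):

# Minimal sketch of standard initialization (d = 9 is an arbitrary choice):
# draw weights uniformly from (-1/sqrt(d), 1/sqrt(d))
import math
import tensorflow as tf
d = 9
limit = 1.0 / math.sqrt(d)
initializer = tf.keras.initializers.RandomUniform(minval=-limit, maxval=limit)
values = initializer(shape=(d, 1))
print(values)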

(3) Xavier initialization (used by default in tf.keras)

The basic idea of this method, also called Glorot initialization, is to keep the variance of the activations and of the gradients consistent across layers during propagation. tf.keras implements two variants:

① Normalized Xavier initialization

Glorot normal distribution initializer, also known as Xavier normal distribution initializer. It draws samples from a truncated normal distribution centered at 0 with standard deviation stddev = sqrt(2 / (fan_in + fan_out)), where fan_in is the number of input neurons and fan_out is the number of output neurons. The tf.keras implementation is:

# Import the toolkit
import tensorflow as tf
# Instantiate the initializer
initializer = tf.keras.initializers.glorot_normal()
# Sample to obtain the weight values
values = initializer(shape=(9, 1))
# Print the result
print(values)

② Standardized Xavier initialization

Glorot uniform distribution initializer, also known as Xavier uniform distribution initializer. It draws samples from a uniform distribution within [-limit, limit], where limit = sqrt(6 / (fan_in + fan_out)), fan_in is the number of input neurons, and fan_out is the number of output neurons. The tf.keras implementation is:

# Import the toolkit
import tensorflow as tf
# Instantiate the initializer
initializer = tf.keras.initializers.glorot_uniform()
# Sample to obtain the weight values
values = initializer(shape=(9, 1))
# Print the result
print(values)

(4) He initialization

He initialization, also known as Kaiming initialization, was proposed by Kaiming He. Its basic idea is that during forward propagation the variance of the activations remains unchanged, and during backward propagation the variance of the gradients of the state values remains unchanged. tf.keras also provides two variants:

① Normalized He initialization

The He normal distribution initializer draws samples from a truncated normal distribution centered at 0 with standard deviation stddev = sqrt(2 / fan_in), where fan_in is the number of input neurons. The tf.keras implementation is:

# Import the toolkit
import tensorflow as tf
# Instantiate the initializer
initializer = tf.keras.initializers.he_normal()
# Sample to obtain the weight values
values = initializer(shape=(9, 1))
# Print the result
print(values)

② Standardized He initialization

The He uniform variance scaling initializer draws samples from a uniform distribution within [-limit, limit], where limit = sqrt(6 / fan_in) and fan_in is the number of input neurons. The tf.keras implementation is:

# Import the toolkit
import tensorflow as tf
# Instantiate the initializer
initializer = tf.keras.initializers.he_uniform()
# Sample to obtain the weight values
values = initializer(shape=(9, 1))
# Print the result
print(values)
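To close the loop, here is a hedged usage sketch (the layer size and activation are arbitrary choices) showing how these initializers are typically passed to a layer rather than called directly:

# Minimal usage sketch (10 units and ReLU are arbitrary choices): pass an
# initializer to a layer by name or by instance
import tensorflow as tf
layer = tf.keras.layers.Dense(
    units=10,
    activation="relu",
    kernel_initializer="he_normal",   # He initialization, common with ReLU
    bias_initializer="zeros"          # bias initialized to 0
)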
