A summary of neural network optimization methods
As a neural network goes from shallow to deep, simply stacking more layers does not by itself improve the model's performance. This post summarizes some common optimization techniques:
Neural Network Optimization I: using regularization to improve the model's generalization ability
Common regularization methods:
- L1, L2 regularization
- dropout regularization
- Data augmentation to increase the number of training samples
- Early stopping to choose an appropriate number of training iterations
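Of these, dropout is the easiest to sketch in code. Below is a minimal NumPy sketch of inverted dropout; the function name `dropout_forward` and the keep probability are illustrative, not from the original:

```python
import numpy as np

def dropout_forward(a, keep_prob=0.8, rng=None):
    """Inverted dropout: randomly zero units, then divide by keep_prob
    so the expected value of the activations is unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(a.shape) < keep_prob  # keep each unit with prob keep_prob
    return (a * mask) / keep_prob

# Apply to a batch of activations
a = np.ones((4, 5))
a_drop = dropout_forward(a, keep_prob=0.8, rng=np.random.default_rng(0))
# surviving entries are scaled to 1/0.8 = 1.25, dropped entries are 0
```

Because of the 1/keep_prob rescaling, no change is needed at test time: dropout is simply switched off.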
Neural Network Optimization II: gradient optimization
Common gradient methods:
- Gradient descent
  - Batch gradient descent
  - Mini-batch gradient descent
  - Stochastic gradient descent (SGD)
- Gradient descent with momentum (Momentum GD)
- Nesterov momentum
- AdaGrad
- RMSprop
- Adam
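The momentum and Adam updates in the list above can be sketched as single-step NumPy functions. This is a minimal sketch: the function names, learning rates, and the toy objective f(w) = w² are illustrative assumptions, not from the original.

```python
import numpy as np

def momentum_step(w, grad, v, lr=0.01, beta=0.9):
    """One step of gradient descent with momentum: v is an
    exponentially weighted moving average of past gradients."""
    v = beta * v + (1 - beta) * grad
    return w - lr * v, v

def adam_step(w, grad, m, s, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: first-moment (m) and second-moment (s) estimates
    with bias correction; t is the 1-based iteration count."""
    m = beta1 * m + (1 - beta1) * grad
    s = beta2 * s + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# Minimize f(w) = w**2 (gradient 2w) with each optimizer
w_m = np.array([5.0]); v = np.zeros_like(w_m)
for _ in range(200):
    w_m, v = momentum_step(w_m, 2 * w_m, v, lr=0.1)

w_a = np.array([5.0]); m = np.zeros_like(w_a); s = np.zeros_like(w_a)
for t in range(1, 1001):
    w_a, m, s = adam_step(w_a, 2 * w_a, m, s, t, lr=0.05)
```

Both runs drive w toward the minimum at 0; Adam combines the momentum idea (m) with RMSprop's per-parameter scaling (s), which is why its parameter list is β1, β2, ε.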
Neural Network Optimization III: network initialization tricks and hyperparameter tuning
- Network initialization tricks
  - Normalize the network inputs (so every feature has the same range of values)
  - Weight initialization
    - Set the variance of the weights W[l] to 1/n[l-1] (Xavier initialization, suited to tanh):
      W[l] = np.random.randn(n[l], n[l-1]) * np.sqrt(1/n[l-1])
    - Set the variance of the weights W[l] to 2/n[l-1] (He initialization, suited to ReLU):
      W[l] = np.random.randn(n[l], n[l-1]) * np.sqrt(2/n[l-1])
    - Set the variance of the weights W[l] to 2/(n[l-1]+n[l]) (Glorot variant):
      W[l] = np.random.randn(n[l], n[l-1]) * np.sqrt(2/(n[l-1]+n[l]))
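Putting the np.sqrt(2/n[l-1]) rule above into a loop over all layers gives a complete initializer. A minimal sketch; the function name `init_params` and the layer sizes are illustrative:

```python
import numpy as np

def init_params(layer_dims, rng=None):
    """He initialization: W[l] ~ N(0, 2/n[l-1]), biases start at zero.
    layer_dims = [n0, n1, ..., nL] gives the size of each layer."""
    rng = np.random.default_rng() if rng is None else rng
    params = {}
    for l in range(1, len(layer_dims)):
        params[f"W{l}"] = (rng.standard_normal((layer_dims[l], layer_dims[l - 1]))
                           * np.sqrt(2 / layer_dims[l - 1]))
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params

# A 784-128-64-10 network (sizes are illustrative)
params = init_params([784, 128, 64, 10], rng=np.random.default_rng(0))
```

Scaling by the fan-in keeps the variance of each layer's pre-activations roughly constant, which is what prevents activations and gradients from exploding or vanishing as depth grows.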
- Hyperparameter tuning
  Deeper networks have more hyperparameters to tune; common ones include:
  - Learning rate α
  - Momentum factor β for gradient descent with momentum
  - Adam parameters β1, β2, ε
  - Number of network layers
  - Number of neurons in each hidden layer
  - Learning rate decay parameters
  - Mini-batch size (number of samples per training batch)
  - L1/L2 regularization factor λ
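When searching over these hyperparameters, random search with log-scale sampling is a common recommendation for multiplicative parameters like α and β. A small sketch; the ranges, function name, and the fixed set of batch sizes are illustrative assumptions:

```python
import numpy as np

def sample_hyperparams(rng):
    """One random-search draw: α and (1-β) are sampled on a log scale
    because their effect on training is multiplicative."""
    alpha = 10 ** rng.uniform(-4, -1)            # learning rate in [1e-4, 1e-1)
    beta = 1 - 10 ** rng.uniform(-3, -1)         # momentum in (0.9, 0.999]
    batch_size = int(2 ** rng.integers(5, 10))   # 32, 64, ..., 512
    return {"alpha": alpha, "beta": beta, "batch_size": batch_size}

rng = np.random.default_rng(0)
trials = [sample_hyperparams(rng) for _ in range(20)]
```

Sampling β via 1 - 10^u concentrates draws near 1 (0.9, 0.99, 0.999), where small changes matter most; sampling β uniformly in [0.9, 1) would waste most trials on nearly identical behavior.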