Article Directory
1. The advantages of standardization
2. The purpose of standardization
Keeps the data distributions of the network's input layer, hidden layers, and output layer within a specified range, which helps the model converge. Sometimes, so that the output better approximates the true result, the label data is standardized correspondingly as well.
3. Standardization method
1、batch norm
- Normalizes over each channel, i.e. normalization is done per channel
- Explanation:
- 1. BN takes out the N×H×W values of each channel separately and normalizes them.
- 2. Each channel has its own pair of γ and β, so there are 2C learnable parameters.
- 3. The smaller the batch size, the worse BN performs, because the computed mean and variance no longer represent the overall distribution.
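The per-channel computation described above can be sketched in NumPy (a minimal illustration assuming NCHW layout; shapes and the ε value are illustrative):

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(4, 3, 8, 8)          # N=4, C=3, H=W=8

# one mean/variance per channel, computed over N, H, W
mean = x.mean(axis=(0, 2, 3), keepdims=True)
var = x.var(axis=(0, 2, 3), keepdims=True)
x_hat = (x - mean) / np.sqrt(var + 1e-5)

# gamma and beta: one pair per channel -> 2*C learnable parameters
gamma = np.ones((1, 3, 1, 1))
beta = np.zeros((1, 3, 1, 1))
y = gamma * x_hat + beta
```

With γ=1 and β=0, each channel of `y` has mean ≈ 0 and variance ≈ 1 across the batch.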
2、layer norm
- Normalizes each sample individually
- Explanation
- 1. LN normalizes each sample's C×H×W values separately, so it is unaffected by batch size.
- 2. It is commonly used in RNNs, but if the input features differ greatly from one another, it is not recommended.
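For contrast with BN, the same sketch with LN's axes (a minimal illustration, same assumed NCHW layout): statistics are taken per sample over C, H, W, so the batch dimension never enters the computation.

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(4, 3, 8, 8)          # N=4, C=3, H=W=8

# one mean/variance per sample, over C, H, W -> independent of batch size
mean = x.mean(axis=(1, 2, 3), keepdims=True)
var = x.var(axis=(1, 2, 3), keepdims=True)
x_hat = (x - mean) / np.sqrt(var + 1e-5)
```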
3、instance norm
- Normalizes each channel of each individual sample
- Explanation:
- 1. IN takes out each H×W plane separately and normalizes it, unaffected by both channel count and batch size.
- 2. Commonly used in style transfer; but if the correlations between channels of the feature map can be exploited, it is not recommended for normalization.
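The corresponding IN sketch (same assumed NCHW layout, illustrative shapes): statistics are taken over each H×W plane alone, one mean/variance per (sample, channel) pair.

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(4, 3, 8, 8)          # N=4, C=3, H=W=8

# one mean/variance per (sample, channel) pair, over H, W only
mean = x.mean(axis=(2, 3), keepdims=True)
var = x.var(axis=(2, 3), keepdims=True)
x_hat = (x - mean) / np.sqrt(var + 1e-5)
```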
4、group norm
- Group the channels and normalize each group
- Explanation:
- 1. GN first divides the channels into g groups, takes out each (C/g)×H×W group separately and normalizes it, and finally merges the g normalized groups back into C×H×W.
- 2. GN sits between LN and IN; in fact LN and IN can be seen as special cases of GN, with g equal to 1 or C respectively.
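The grouping step can be sketched with a reshape (a minimal illustration; g and the tensor shapes are illustrative, and C must be divisible by g):

```python
import numpy as np

np.random.seed(0)
N, C, H, W, g = 4, 6, 8, 8, 3
x = np.random.randn(N, C, H, W)

# split the C channels into g groups of C//g channels each
xg = x.reshape(N, g, C // g, H, W)
mean = xg.mean(axis=(2, 3, 4), keepdims=True)
var = xg.var(axis=(2, 3, 4), keepdims=True)

# normalize per group, then merge back into N, C, H, W
x_hat = ((xg - mean) / np.sqrt(var + 1e-5)).reshape(N, C, H, W)
```

Setting `g = 1` here reproduces LN's statistics, and `g = C` reproduces IN's, matching point 2 above.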
5、switchable norm
- Rarely used in practice at present
- Explanation:
- 1. Combines BN, LN, and IN with learned weights, letting the network decide for itself which normalization each layer should use.
- 2. It takes the best of all three, but training is more complicated.
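A simplified sketch of the idea (illustrative only: the real method learns separate weights for the means and the variances, whereas this collapses it to one softmax-weighted blend of the three normalized outputs):

```python
import numpy as np

def softmax(w):
    e = np.exp(w - w.max())
    return e / e.sum()

def norm(x, axes, eps=1e-5):
    m = x.mean(axis=axes, keepdims=True)
    v = x.var(axis=axes, keepdims=True)
    return (x - m) / np.sqrt(v + eps)

np.random.seed(0)
x = np.random.randn(4, 6, 8, 8)

x_bn = norm(x, (0, 2, 3))   # batch norm statistics
x_ln = norm(x, (1, 2, 3))   # layer norm statistics
x_in = norm(x, (2, 3))      # instance norm statistics

# learnable logits; zeros here -> equal weights over the three variants
w = softmax(np.zeros(3))
y = w[0] * x_bn + w[1] * x_ln + w[2] * x_in
```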
6. The mathematical formula of standardization
- Two steps:
- 1. Standardization
  - Makes the distribution more uniform, which eases training
- 2. Anti-standardization (scale and shift)
  - Restores the layer's expressive ability so that nonlinearity is not lost
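The two steps above are conventionally written as follows (μ and σ² are computed over whichever axes the chosen method dictates; ε is a small constant for numerical stability; γ and β are the learnable scale and shift):

```latex
\hat{x} = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}},
\qquad
y = \gamma \hat{x} + \beta
```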
7. The standardization process
Process: non-linear data → transform → linear data → w·x+b → non-linear data → BN: removes scale differences while preserving distinctions / unifies dimensions → linear (learns the distinctions / converges faster) → anti-BN: restores the data → non-linear (makes larger features larger / more expressive) → activation (provides more nonlinearity / widens the gaps)
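The linear → standardize → scale-and-shift → activation pipeline above can be sketched for a single fully-connected layer (a minimal illustration; all shapes and initializations are assumptions):

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(32, 16)              # batch of 32 feature vectors
w = np.random.randn(16, 8) * 0.1
b = np.zeros(8)

z = x @ w + b                            # linear: w*x + b
mu, var = z.mean(0), z.var(0)
z_hat = (z - mu) / np.sqrt(var + 1e-5)   # standardize per feature
gamma, beta = np.ones(8), np.zeros(8)
z_tilde = gamma * z_hat + beta           # anti-standardize: scale and shift
a = np.maximum(z_tilde, 0)               # activation (ReLU) widens the gaps
```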
4. Weight standardization methods
y=wx
Not yet studied in detail; to be supplemented later.
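As a placeholder until that supplement, here is a hedged sketch of the common form of weight standardization: instead of normalizing the activations, each output filter of w in y = wx is standardized over its own fan-in before the multiply (shapes are illustrative; this is one possible formulation, not necessarily the one the author intends to cover):

```python
import numpy as np

np.random.seed(0)
w = np.random.randn(8, 16)               # 8 output units, fan-in of 16

# standardize each row (output filter) of the weight matrix
mu = w.mean(axis=1, keepdims=True)
sigma = w.std(axis=1, keepdims=True)
w_hat = (w - mu) / (sigma + 1e-5)

x = np.random.randn(16)
y = w_hat @ x                            # y = w_hat * x
```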