Data standardization in PyTorch neural networks

1. The advantages of standardization

(figure: the advantages of standardization)

2. The purpose of standardization

Keeps the data distributions of the network's input layer, hidden layers, and output layer within a specified range, which helps the model converge. Sometimes, to make the output better approximate the true result, the label data is standardized as well.

3. Standardization methods
1. Batch Norm
  • Normalizes over each channel, i.e., each channel is normalized separately (see the PyTorch sketch after this list)
  • Explanation:
    • 1. BN takes the N×H×W values of each channel separately and normalizes them.
    • 2. Each channel has its own pair of γ and β, so there are 2C learnable parameters.
    • 3. The smaller the batch size, the worse BN performs, because the mean and variance computed from the batch no longer represent the overall distribution.
2. Layer Norm
  • Normalizes over each sample (all of its channels together)
  • Explanation:
    • 1. LN normalizes each sample's C×H×W values separately, so it is not affected by the batch size.
    • 2. It is commonly used in RNNs; however, if the input features differ greatly from one another, it is not recommended for normalization.
3. Instance Norm
  • Normalizes each feature map of a single sample individually
  • Explanation:
    • 1. IN takes each H×W map separately and normalizes it, so it is affected by neither the channels nor the batch size.
    • 2. It is commonly used in style transfer; however, if the correlations between the channels of the feature maps are useful, it is not recommended for normalization.
4. Group Norm
  • Groups the channels and normalizes each group
  • Explanation:
    • 1. GN first divides the C channels into g groups, then normalizes each group's (C/g)×H×W values separately, and finally merges the g normalized groups back into C×H×W.
    • 2. GN sits between LN and IN; in fact, LN and IN can be seen as special cases of GN where g equals 1 or C, respectively.
5. Switchable Norm
  • Rarely used at present
  • Explanation:
    • 1. Combines BN, LN, and IN with learned weights, letting the network decide for itself which normalization each layer should use.
    • 2. It gathers the strengths of all of the methods, but training is more complicated.
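A minimal PyTorch sketch contrasting the four layers above; the tensor sizes and group count are arbitrary example values, and the printed parameter counts follow the descriptions (e.g. 2C for BN):

```python
import torch
import torch.nn as nn

N, C, H, W = 8, 16, 32, 32                            # example batch, channels, height, width
x = torch.randn(N, C, H, W)

layers = {
    "BN": nn.BatchNorm2d(C),                          # normalizes each channel over N*H*W; 2C learnable params
    "LN": nn.LayerNorm([C, H, W]),                    # normalizes each sample over C*H*W; independent of batch size
    "IN": nn.InstanceNorm2d(C, affine=True),          # normalizes each H*W map per sample and per channel
    "GN": nn.GroupNorm(num_groups=4, num_channels=C), # splits the C channels into 4 groups of C/4 channels each
}

for name, layer in layers.items():
    y = layer(x)                                      # the output shape is unchanged: (N, C, H, W)
    n_params = sum(p.numel() for p in layer.parameters())
    print(name, tuple(y.shape), "learnable params:", n_params)
```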
6. The mathematical formula of standardization
  • Two steps (written out in the sketch below):
    • 1. Standardization
      • Makes the distribution more even, which makes training easier
    • 2. De-standardization (the scale-and-shift with γ and β)
      • Restores non-linear expressive ability
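In formula form, the two steps for one channel are x̂ = (x − μ) / √(σ² + ε) followed by y = γ·x̂ + β, where ε is the usual small constant for numerical stability. A hand-written sketch of the BN case:

```python
import torch

def batch_norm_manual(x, gamma, beta, eps=1e-5):
    """x: (N, C, H, W); statistics are taken per channel over N, H, W (the BN case)."""
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
    x_hat = (x - mean) / torch.sqrt(var + eps)   # step 1: standardization
    return gamma * x_hat + beta                  # step 2: de-standardization (scale and shift)

x = torch.randn(4, 3, 8, 8)
gamma = torch.ones(1, 3, 1, 1)                   # learnable in a real layer
beta = torch.zeros(1, 3, 1, 1)                   # learnable in a real layer
y = batch_norm_manual(x, gamma, beta)
print(round(y.mean().item(), 4), round(y.std().item(), 4))  # roughly 0 and 1 with γ=1, β=0
```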

De-standardization: gives the layer back its non-linear expressive ability.

7. The standardization process

Process: non-linear data → transform → linear data → w·x + b → non-linear data → BN: remove what is common and keep what differs / unify the scale → linear (learn the differences / converge faster) → inverse BN: restore the data → non-linear (make larger features larger / better expressiveness) → activation (provide more non-linearity / widen the gap)
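A small sketch of where a normalization layer typically sits in that flow (linear transform, then BN's standardize and scale-shift, then the activation); the layer sizes below are arbitrary example values:

```python
import torch
import torch.nn as nn

# linear transform -> BN (standardize, then scale/shift) -> activation, following the flow above
model = nn.Sequential(
    nn.Linear(64, 128),       # w*x + b
    nn.BatchNorm1d(128),      # standardize over the batch, then γ/β restore expressiveness
    nn.ReLU(),                # activation provides the extra non-linearity
    nn.Linear(128, 10),
)

x = torch.randn(32, 64)       # a batch of 32 samples with 64 features
print(model(x).shape)         # torch.Size([32, 10])
```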

4. The weight standardization method

y=wx

I have not studied this carefully yet; a supplement will follow.

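As a placeholder until that supplement, a minimal sketch of the usual weight-standardization idea (my assumption of what this section will cover): instead of normalizing the activations, the weight w in y = w·x is standardized per output unit before the forward pass.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSLinear(nn.Linear):
    """Linear layer whose weight rows are standardized (zero mean, unit std) before computing y = w*x + b."""
    def forward(self, x, eps=1e-5):
        w = self.weight
        w = (w - w.mean(dim=1, keepdim=True)) / (w.std(dim=1, keepdim=True) + eps)
        return F.linear(x, w, self.bias)

layer = WSLinear(64, 128)
print(layer(torch.randn(4, 64)).shape)           # torch.Size([4, 128])
```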


Origin blog.csdn.net/qq_43586192/article/details/111199799