It is commonly held that data fed to a machine learning model must first be normalized.
Normalization "flattens" the data onto a uniform interval, typically rescaling values to between 0 and 1. The conventional wisdom is that this makes the optimization landscape smoother, so the model converges more easily and reliably to a good solution.
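The rescaling to the [0, 1] interval described above can be sketched with min-max normalization (a minimal NumPy illustration; the function name and sample data are our own, not from the article):

```python
import numpy as np

def min_max_normalize(x):
    """Rescale each feature column to the [0, 1] interval (min-max normalization)."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    # Guard against division by zero when a feature column is constant.
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (x - x_min) / span

# Two features on very different scales end up on the same [0, 1] range.
data = np.array([[1.0, 200.0],
                 [3.0, 400.0],
                 [5.0, 600.0]])
print(min_max_normalize(data))
```

After normalization, both columns span exactly [0, 1], so no single feature dominates the gradient updates purely because of its units.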
However, this conventional wisdom has recently been challenged. DeepMind researchers have proposed NFNet, a deep learning model that requires no normalization yet achieves state-of-the-art (SOTA) results on large-scale image classification tasks.
Table 5: Comparison of ImageNet transfer performance after large-scale pre-training with additional data.
Andrew Brock said that although much about signal propagation and training dynamics in neural networks remains to be explored, the normalizer-free approach offers a powerful point of reference, demonstrating that developing this deeper understanding can translate into real efficiency gains in production environments.