I’m not going to go deep into the theory today — once I start writing it tends to run long and becomes hard to follow.
In brief:
- Training is much faster, and convergence is greatly accelerated;
- It can also improve classification accuracy. One explanation is that it acts as a regularizer similar to Dropout, preventing over-fitting, so comparable results can be achieved without Dropout;
- In addition, tuning becomes much simpler: initialization is less critical, and a larger learning rate can be used.
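The benefits listed above describe Batch Normalization. As a minimal sketch of the idea (the function name and parameters here are my own, not from the original text), the forward pass normalizes each feature over the mini-batch and then applies a learned scale and shift:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale by gamma and shift by beta."""
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalized activations
    return gamma * x_hat + beta              # learnable affine transform

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))  # a batch of 32 samples, 4 features
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0))  # close to 0 for every feature
print(out.std(axis=0))   # close to 1 for every feature
```

With `gamma = 1` and `beta = 0` the output is simply the standardized batch; during training these two vectors are learned, so the network can recover the original distribution if that turns out to be useful.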
I also found some good illustrations online that may be inspiring.
Data preprocessing
Data can be rescaled either by normalization or by standardization; both convert features with different ranges into a comparable scale.
Normalization: maps the data into [0, 1].
Standardization: transforms the data to follow a standard normal distribution (zero mean, unit variance).
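The two transforms above can be written in a few lines of NumPy (a minimal sketch; the variable names are mine):

```python
import numpy as np

x = np.array([1.0, 2.0, 5.0, 10.0])

# Normalization (min-max scaling): map values into [0, 1]
x_norm = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): zero mean, unit variance
x_std = (x - x.mean()) / x.std()

print(x_norm)                      # all values lie in [0, 1]
print(x_std.mean(), x_std.std())   # ~0.0 and 1.0
```

Note that min-max scaling is sensitive to outliers (a single extreme value compresses everything else), while standardization only assumes the mean and variance are meaningful summaries of the feature.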
Why do we need to normalize and standardize?
Different features can differ by orders of magnitude, so their influence on a linear combination varies widely: features with larger magnitudes dominate the result.
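A small example makes this concrete (the feature names and values here are invented for illustration): with equal weights, a large-magnitude feature swamps a small one in a linear combination, but after standardizing each feature both contribute on the same scale.

```python
import numpy as np

# Two features with very different magnitudes: income (~1e4) and age (~1e1)
income = np.array([30000.0, 50000.0, 80000.0])
age = np.array([25.0, 40.0, 60.0])
w = np.array([1.0, 1.0])  # equal weights for both features

# Raw linear combination: the result is almost entirely determined by income
raw = w[0] * income + w[1] * age
print(raw)

# Standardize each feature first, then combine: both now contribute comparably
def z(v):
    return (v - v.mean()) / v.std()

balanced = w[0] * z(income) + w[1] * z(age)
print(balanced)
```

In the raw combination, changing age by its full range shifts the output by tens of units, while income shifts it by tens of thousands; after standardization, a one-standard-deviation change in either feature moves the output by the same amount.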