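1. During training, Batch Norm processes one mini-batch of m examples at a time, so μ and σ² are computed over that mini-batch. At test time, the network may have to process a single example (1) at a time, and the mean and variance of one example are not meaningful
Therefore, an exponentially weighted average of μ and σ² is tracked across mini-batches during training, and these running estimates replace the mini-batch statistics at test time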
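
The idea above can be sketched for a single scalar activation in plain Python. This is only an illustration under stated assumptions, not a reference implementation: the class name `RunningBatchNorm1D`, the decay rate `momentum=0.9`, the stabiliser `eps`, and the initial running values (0 for the mean, 1 for the variance) are all choices made for this example.

```python
import math


class RunningBatchNorm1D:
    """Batch Norm for one scalar feature (hypothetical helper for illustration)."""

    def __init__(self, momentum=0.9, eps=1e-5):
        self.momentum = momentum   # assumed decay rate of the weighted average
        self.eps = eps             # assumed small constant for numerical stability
        self.gamma = 1.0           # scale parameter (learned in a real network)
        self.beta = 0.0            # shift parameter (learned in a real network)
        self.running_mean = 0.0    # exponentially weighted average of mu
        self.running_var = 1.0     # exponentially weighted average of sigma^2

    def forward_train(self, batch):
        """Normalise with the mini-batch statistics and update the running averages."""
        m = len(batch)
        mu = sum(batch) / m
        var = sum((x - mu) ** 2 for x in batch) / m
        # Exponentially weighted average: v = momentum * v + (1 - momentum) * new
        self.running_mean = self.momentum * self.running_mean + (1 - self.momentum) * mu
        self.running_var = self.momentum * self.running_var + (1 - self.momentum) * var
        scale = math.sqrt(var + self.eps)
        return [self.gamma * (x - mu) / scale + self.beta for x in batch]

    def forward_test(self, x):
        """Normalise a single example with the tracked running statistics."""
        scale = math.sqrt(self.running_var + self.eps)
        return self.gamma * (x - self.running_mean) / scale + self.beta


bn = RunningBatchNorm1D(momentum=0.9)
for _ in range(200):
    # The same mini-batch is repeated so the running averages converge
    # to its mean (5.0) and variance (5.0).
    bn.forward_train([2.0, 4.0, 6.0, 8.0])

# At test time a single example can be normalised without any batch.
single_output = bn.forward_test(5.0)
```

Here `forward_test` never looks at other test examples: it relies only on the averages accumulated during training, which is why a batch of size 1 is no longer a problem.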