Image data preprocessing in deep learning: image mean or pixel mean


First, let me quote the following explanation found online:

For a grayscale image, every pixel in the mean image is computed from the average of all corresponding pixels (i.e. same coordinates) across all images of your dataset. "Mean image" subtraction means that this mean image is subtracted from any input image you feed to the neural network. The intention is to have inputs that are (on average) centred around zero.

The mean pixel is simply the average of all pixels in the mean image. "Mean pixel" subtraction means that you subtract the *same* mean pixel value from all pixels of the input to the neural network.

Now the same applies to RGB images, except that every channel is processed independently (this means we don't compute averages across channels, instead every channel independently goes through the same transformations as for a grayscale image).

Intuitively, it feels like mean image subtraction should perform better (that is what I noticed on the auto-encoder example in DIGITS) although I don't know of research papers that back this up.


image mean:

For example, given RGB input images of size N * N * 3, the image mean is still N * N * 3: for every spatial position (and every channel separately, never across channels), the pixels at that position are averaged over all images in the training set.
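As a minimal sketch of how the mean image might be computed and subtracted (assuming the whole training set fits in memory as a NumPy array of shape (num_images, N, N, 3); the array name and sizes are illustrative):

```python
import numpy as np

# Hypothetical training set: 100 RGB images of size 32 x 32,
# stored as a float array of shape (num_images, N, N, 3).
train_images = np.random.rand(100, 32, 32, 3).astype(np.float32)

# Image mean: average over the image axis only, so the result keeps
# the spatial layout and the channels -> shape (N, N, 3).
mean_image = train_images.mean(axis=0)

# "Mean image" subtraction: subtract the same mean image from every input.
centered = train_images - mean_image
print(mean_image.shape)  # (32, 32, 3)
```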

pixel mean:

The pixel mean averages all pixels of the R channel over all images, ignoring spatial position, and does the same for the G and B channels. The result is three scalars, R_mean, G_mean and B_mean, which is equivalent to averaging the image mean once more.
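Again as an illustrative sketch with the same assumed array layout, the pixel mean also averages away the spatial positions, leaving one value per channel; the assertion below checks the claim that it equals the mean of the mean image:

```python
import numpy as np

train_images = np.random.rand(100, 32, 32, 3).astype(np.float32)

# Pixel mean: average over images and both spatial axes, but not over
# channels -> one value each for R, G and B.
pixel_mean = train_images.mean(axis=(0, 1, 2))  # shape (3,): R_mean, G_mean, B_mean

# Averaging the image mean once more gives the same three values.
mean_image = train_images.mean(axis=0)
assert np.allclose(pixel_mean, mean_image.mean(axis=(0, 1)))

# "Mean pixel" subtraction: broadcast the three values over every pixel.
centered = train_images - pixel_mean
```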

Reasons for subtracting the mean:

(1) From the perspective of PCA

Subtracting the mean is part of feature standardization, which means giving each dimension of the data zero mean and unit variance; it is the most common form of normalization. In practice, feature standardization works as follows: first compute the mean of each dimension over the whole dataset, then subtract that mean from the corresponding dimension, and finally divide each dimension by its standard deviation.
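As a small sketch of this per-dimension standardization (assuming a 2-D data matrix with one sample per row; all names and sizes are illustrative):

```python
import numpy as np

# Hypothetical dataset: 1000 samples, 20 feature dimensions.
X = np.random.rand(1000, 20).astype(np.float32)

# Per-dimension statistics computed over the whole dataset.
mean = X.mean(axis=0)
std = X.std(axis=0)

# Feature standardization: zero mean and unit variance in every dimension.
X_standardized = (X - mean) / std
```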

For natural images, however, zero-mean processing is usually enough, and there is no need to estimate the sample variance. When training on natural images, it makes little sense to estimate a mean and variance for each pixel separately, because (in theory) the statistical properties of any part of the image should be the same as those of any other part; this property is called stationarity.

For images, this normalization removes the average intensity of the image. In many cases we are not interested in the illumination of the image but in its content; in object recognition, for example, the overall brightness of the image does not affect which objects are present in it. In such cases it makes sense to subtract the mean pixel value from each data point.

(2) From the perspective of the back-propagation computation

In deep learning, if you train a model with gradient descent, you essentially have to normalize the data during preprocessing, and there is a good reason for this.

According to the back-propagation formula for a single linear unit $y = w_1 x_1 + w_2 x_2 + b$, the gradient with respect to each weight is $\partial L / \partial w_i = x_i \cdot \partial L / \partial y$: the gradient that reaches a weight is scaled by the corresponding input.

If the input x is large, the gradients propagated back during back propagation become large as well. With large gradients the learning rate has to be very small, otherwise the update overshoots the optimum. In that case the choice of learning rate has to take the magnitude of the input values into account, whereas normalizing the data first makes choosing a learning rate straightforward. Moreover, since each gradient is scaled by its own input, the gradients for different weights can differ by orders of magnitude, and so do the learning rates they need: a learning rate that suits w1 may be too small for w2, so descent along the w2 direction becomes very slow and wastes a great deal of time, while a learning rate that suits w2 is too large for w1 and never finds a good solution for w1.
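To make this concrete, here is a small illustrative sketch (all numbers invented) of a single linear unit with a squared-error loss, showing that each weight's gradient is scaled by its own input, so features of very different magnitudes would need learning rates of very different magnitudes, while standardized features do not:

```python
import numpy as np

# Two features with very different scales (illustrative values).
x = np.array([1000.0, 0.5])   # x1 is huge, x2 is tiny
w = np.array([0.1, 0.1])
b = 0.0
target = 1.0

# Forward pass of a single linear unit with squared-error loss.
y = w @ x + b
loss = 0.5 * (y - target) ** 2

# Back propagation: dL/dw_i = x_i * dL/dy, so each weight's gradient
# is scaled by its own input.
dL_dy = y - target
dL_dw = x * dL_dy
print(dL_dw)       # gradient for w1 is ~2000x larger than for w2

# After standardizing the features, the gradients share a scale and a
# single learning rate works for both weights.
x_norm = (x - x.mean()) / x.std()
y_norm = w @ x_norm + b
dL_dw_norm = x_norm * (y_norm - target)
print(dL_dw_norm)
```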
