Questions regarding image processing for a pre-trained image classifier in PyTorch

DanielOh:

I am trying to use a popular pre-trained VGG model for image classification in PyTorch, and I noticed that most programs resize the image to 256 and then center-crop it to 224 during pre-processing. I am curious why we resize to 256 first and then crop, instead of resizing directly to 224.

from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),                      # scale the shorter side to 256 px
    transforms.CenterCrop(224),                  # crop the central 224x224 region
    transforms.ToTensor(),                       # PIL image -> float tensor in [0, 1]
    transforms.Normalize([0.485, 0.456, 0.406],  # ImageNet per-channel means
                         [0.229, 0.224, 0.225])  # ImageNet per-channel std devs
])
GPhilo:

For image classification tasks, the object of interest is typically located near the center of the image. It is thus common practice (for inference) to take a central crop, cutting away some of the border (this does not hold in general, though: the exact preprocessing depends strongly on how the network was trained).

As for why we crop rather than resize directly, this is a byproduct of data augmentation during training: taking a random crop of the image is a very common augmentation technique. At inference time, resizing the whole image to the input size instead of applying a crop changes the scale of the objects in the image, which hurts the network's performance: you would be evaluating on data with a different "format" from the one the network was trained on, and CNNs are not scale-invariant. A sketch of a matching training pipeline is shown below.
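As an illustration, here is a rough sketch of such a training-time pipeline. This is an assumption for illustration only; the exact augmentation recipe depends on how the pre-trained weights were actually produced:

from torchvision import transforms

# Assumed training-time augmentation, for illustration; the real recipe
# depends on how the pre-trained model was trained.
train_preprocess = transforms.Compose([
    transforms.Resize(256),                  # same rescaling as at inference
    transforms.RandomCrop(224),              # random 224x224 crop (augmentation)
    transforms.RandomHorizontalFlip(),       # another common augmentation
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])
])
# At inference, CenterCrop(224) stands in for RandomCrop(224), so test images
# keep the same object scale the network saw during training.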
