Dive into Deep Learning: Image Augmentation and Model Fine-Tuning (study notes)

These are study notes based on the "Dive into Deep Learning" course on the Boyu AI learning platform.
Original link: https://www.boyuai.com/elites/course/cZu18YmweLv10OeV/lesson/6zsAsvcJ58UKvWtKArTqwq
Thanks to the Boyu platform, Datawhale, Heywhale (Kesci), and AWS for giving us the opportunity to learn for free!
Overall impression: the Boyu courses are well made and very systematic. Each advanced course starts by listing the prerequisites you need to master, so it suits students like me with a weak foundation. Students with a weak foundation may also want to look at these other Boyu courses:
Mathematical foundations: https://www.boyuai.com/elites/course/D91JM0bv72Zop1D3
Machine learning basics: https://www.boyuai.com/elites/course/5ICEBwpbHVwwnK3C

Image augmentation

In Section 5.6 (deep convolutional neural networks, AlexNet) we mentioned that large-scale datasets are a prerequisite for the success of deep neural networks. Image augmentation generates similar but distinct training examples by applying a series of random changes to the training images, thereby enlarging the training set. Another way to view image augmentation is that randomly changing the training examples reduces the model's reliance on particular properties, which improves its generalization ability. For example, we can crop an image in different ways so that the object of interest appears in different positions, reducing the model's dependence on where the object appears. We can also adjust factors such as brightness and color to reduce the model's sensitivity to color. It is fair to say that image augmentation contributed to the success of AlexNet back in the day. In this section we discuss augmentation techniques that are widely used in computer vision.

First, import the modules and packages needed for the experiments in this section.
import os
os.listdir("/home/kesci/input/img2083/")

%matplotlib inline
import os
import time
import torch
from torch import nn, optim  # nn defines the model, optim defines the optimizer
from torch.utils.data import Dataset, DataLoader  # utilities for loading and batching data
import torchvision  # image transforms such as brightness adjustment
import sys
from PIL import Image

sys.path.append("/home/kesci/input/")
# use only GPU device 0
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import d2lzh1981 as d2l

# define the device; automatically choose GPU or CPU depending on the machine
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(torch.__version__)
print(device)

Common image augmentation methods

We read an image with shape 400 × 500 (400 pixels high and 500 pixels wide) as the sample for our experiments.

d2l.set_figsize()
img = Image.open('/home/kesci/input/img2083/img/cat1.jpg')
d2l.plt.imshow(img)  # draw with the default axes

Next we define the plotting function show_images.

# this function is saved in the d2lzh_pytorch package for later use
def show_images(imgs, num_rows, num_cols, scale=2):
    figsize = (num_cols * scale, num_rows * scale)
    _, axes = d2l.plt.subplots(num_rows, num_cols, figsize=figsize)
    for i in range(num_rows):
        for j in range(num_cols):
            axes[i][j].imshow(imgs[num_cols * i + j])
            axes[i][j].axes.get_xaxis().set_visible(False)  # hide the x axis
            axes[i][j].axes.get_yaxis().set_visible(False)  # hide the y axis
    return axes
Most image augmentation methods involve some randomness. To make it easier to observe their effects, we next define a helper function apply. It runs the augmentation method aug several times on the input image img and shows all the results.

def apply(img, aug, num_rows=2, num_cols=4, scale=1.5):  # aug is the augmentation to apply
    Y = [aug(img) for _ in range(num_rows * num_cols)]
    show_images(Y, num_rows, num_cols, scale)

Flip and crop

Flipping an image left and right usually does not change the category of the object. This is one of the earliest and most widely used image augmentation methods. Next we use the torchvision.transforms module to create a RandomHorizontalFlip instance, which flips the image horizontally (left and right) with probability 0.5.

apply(img, torchvision.transforms.RandomHorizontalFlip())

Flipping up and down is not as widely used as flipping left and right, but at least for this sample image, flipping it upside down does not hinder recognition. Next we create a RandomVerticalFlip instance, which flips the image vertically (up and down) with probability 0.5.

apply(img, torchvision.transforms.RandomVerticalFlip())

In the sample image we used, the cat is in the middle of the image, but in general this may not be the case. In Section 5.4 (pooling layers) we explained that pooling can reduce the sensitivity of convolutional layers to the target position. In addition, we can randomly crop the image so that objects appear at different positions and in different proportions, which also reduces the model's sensitivity to the target position.

In the code below, each time we randomly crop a region whose area is 10% to 100% of the original area and whose width-to-height ratio is randomly sampled from 0.5 to 2, and then scale the width and height of that region to 200 pixels. Unless otherwise specified, in this section a random number between a and b means a value drawn uniformly and continuously from the interval [a, b].

shape_aug = torchvision.transforms.RandomResizedCrop(200, scale=(0.1, 1), ratio=(0.5, 2))
apply(img, shape_aug)

Change color

Another augmentation method is changing colors. We can change four aspects of an image's color: brightness, contrast, saturation, and hue. In the example below, we randomly change the brightness of the image to a value between 50% (1 − 0.5) and 150% (1 + 0.5) of the original brightness.
apply(img, torchvision.transforms.ColorJitter(brightness=0.5, contrast=0, saturation=0, hue=0))

We can also randomly change the hue of the image.

apply(img, torchvision.transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0.5))

Similarly, we can randomly change the contrast of the image.

apply(img, torchvision.transforms.ColorJitter(brightness=0, contrast=0.5, saturation=0, hue=0))

We can also set all of brightness, contrast, saturation, and hue to be changed randomly at the same time.

color_aug = torchvision.transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5)
apply(img, color_aug)

Overlaying multiple image augmentation methods

In practice we often overlay multiple image augmentation methods. We can combine several of the augmentation methods defined above with a Compose instance, and then apply it to each image.

augs = torchvision.transforms.Compose([
torchvision.transforms.RandomHorizontalFlip(), color_aug, shape_aug])
apply(img, augs)

Training a model with image augmentation

Next let us look at a practical example of training with image augmentation. Here we use the CIFAR-10 dataset instead of the Fashion-MNIST dataset used so far. This is because the positions and sizes of the objects in Fashion-MNIST have been normalized, whereas the colors and sizes of the objects in CIFAR-10 differ much more noticeably. The first 32 training images of CIFAR-10 are shown below.

CIFAR_ROOT_PATH = '/home/kesci/input/cifar102021'
all_imges = torchvision.datasets.CIFAR10(train=True, root=CIFAR_ROOT_PATH, download=True)
# each element of all_imges is an (image, label) pair
show_images([all_imges[i][0] for i in range(32)], 4, 8, scale=0.8);

In order to obtain deterministic results at prediction time, we usually apply image augmentation only to the training examples and do not use augmentation with random operations during prediction. Here we use only the simplest random horizontal flip. In addition, we use ToTensor to convert mini-batches of images into the format expected by PyTorch: a 32-bit floating-point tensor of shape (batch size, number of channels, height, width) with values between 0 and 1.

flip_aug = torchvision.transforms.Compose([
torchvision.transforms.RandomHorizontalFlip(),
torchvision.transforms.ToTensor()])

no_aug = torchvision.transforms.Compose([
torchvision.transforms.ToTensor()])
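
As a quick sanity check (not part of the original notes; it assumes all_imges from the cell above is still available), we can apply flip_aug to a single PIL image and confirm the tensor format described above:

# hypothetical check: the composed transform returns a float32 tensor of shape
# (channels, height, width) with values in [0, 1]
sample = flip_aug(all_imges[0][0])
print(sample.shape, sample.dtype, sample.min().item(), sample.max().item())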

Next we define a helper function that makes it easy to read images and apply image augmentation. For details about DataLoader, refer to Section 3.5 on the image classification dataset (Fashion-MNIST).
num_workers = 0 if sys.platform.startswith('win32') else 4
def load_cifar10(is_train, augs, batch_size, root=CIFAR_ROOT_PATH):
    dataset = torchvision.datasets.CIFAR10(root=root, train=is_train, transform=augs, download=False)
    return DataLoader(dataset, batch_size=batch_size, shuffle=is_train, num_workers=num_workers)

Training the model with image augmentation

We train the ResNet-18 model from Section 5.11 (residual networks) on the CIFAR-10 dataset.

We first define a train function that trains and evaluates the model using the GPU.
# this function is saved in the d2lzh_pytorch package for later use
def train(train_iter, test_iter, net, loss, optimizer, device, num_epochs):
    net = net.to(device)
    print("training on", device)
    batch_count = 0
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, start = 0.0, 0.0, 0, time.time()
        for X, y in train_iter:
            X = X.to(device)  # remember to move the data to the same device as the model
            y = y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
            train_l_sum += l.cpu().item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()
            n += y.shape[0]
            batch_count += 1
        test_acc = d2l.evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'
              % (epoch + 1, train_l_sum / batch_count, train_acc_sum / n, test_acc, time.time() - start))

We can then define a train_with_data_aug function that trains the model with image augmentation. It uses Adam as the optimization algorithm, applies the image augmentation to the training dataset, and finally calls the train function just defined to train and evaluate the model.


def train_with_data_aug(train_augs, test_augs, lr=0.001):
    batch_size, net = 256, d2l.resnet18(10)
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    loss = torch.nn.CrossEntropyLoss()
    train_iter = load_cifar10(True, train_augs, batch_size)
    test_iter = load_cifar10(False, test_augs, batch_size)
    train(train_iter, test_iter, net, loss, optimizer, device, num_epochs=10)

Next we train the model using image augmentation with random horizontal flips.

train_with_data_aug(flip_aug, no_aug)

Fine tuning

In earlier chapters we described how to train models on the Fashion-MNIST training set, which contains only 60,000 images. We also described ImageNet, the most widely used large-scale image dataset in academia, with more than 10 million images and 1,000 object classes. However, the datasets we usually work with fall somewhere between these two in size.

Suppose we want to recognize different kinds of chairs in images and then recommend purchase links to users. One possible approach is to first find 100 common kinds of chairs, take 1,000 images of each chair from different angles, and then train a classification model on the collected image dataset. Although this chair dataset may be larger than Fashion-MNIST, the number of examples is still less than one tenth of that in ImageNet. This may cause models that are sophisticated enough for ImageNet to overfit on the chair dataset. At the same time, because the amount of data is limited, the accuracy of the trained model may not meet practical requirements.

One obvious solution to these problems is to collect more data. However, collecting and labeling data costs a great deal of time and money. For example, collecting the ImageNet dataset cost researchers millions of dollars in research funding. Although data collection has become much cheaper, the cost still cannot be ignored.

Another solution is transfer learning, which transfers knowledge learned from a source dataset to a target dataset. For example, although most images in ImageNet have nothing to do with chairs, a model trained on this dataset can extract more general image features that help identify edges, textures, shapes, and object composition. These features may be equally effective for recognizing chairs.

In this section we introduce a common technique in transfer learning: fine-tuning. As shown in Figure 9.1, fine-tuning consists of the following four steps.

  1. Pre-train a neural network model, the source model, on a source dataset (for example, ImageNet).
  2. Create a new neural network model, the target model. It copies all model designs and parameters of the source model except the output layer. We assume these parameters contain the knowledge learned on the source dataset and that this knowledge is equally applicable to the target dataset. We also assume that the output layer of the source model is closely tied to the labels of the source dataset and is therefore not reused in the target model.
  3. Add to the target model an output layer whose size equals the number of categories in the target dataset, and randomly initialize its parameters.
  4. Train the target model on the target dataset (for example, the chair dataset). The output layer is trained from scratch, while the parameters of the remaining layers are fine-tuned starting from the parameters of the source model.

[Figure 9.1: Fine tuning]

When the target dataset is much smaller than the source dataset, fine-tuning helps improve the generalization ability of the model.
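
To make the four steps concrete, below is a minimal sketch of the pattern using torchvision's pretrained ResNet-18. It is a generic illustration rather than the exact code used later for the hot dog example; the target class count (2) and the learning rates are assumptions chosen for illustration.

import torch
from torch import nn, optim
from torchvision import models

# step 1: a model pre-trained on the source dataset (ImageNet) is the source model
source_model = models.resnet18(pretrained=True)

# steps 2 and 3: reuse all layers except the output layer, then attach a new,
# randomly initialized output layer sized to the target dataset (2 classes assumed)
target_model = source_model
target_model.fc = nn.Linear(target_model.fc.in_features, 2)

# step 4: fine-tune on the target dataset; reused layers get a small learning rate,
# the new output layer gets a larger one
head_ids = set(map(id, target_model.fc.parameters()))
backbone_params = [p for p in target_model.parameters() if id(p) not in head_ids]
optimizer = optim.SGD([{'params': backbone_params, 'lr': 0.001},
                       {'params': target_model.fc.parameters(), 'lr': 0.01}])
# ...followed by an ordinary training loop on the target dataset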

Identify hot dogs

Next we work through a concrete example: hot dog recognition. We will fine-tune a ResNet model pre-trained on ImageNet using a small dataset containing a few thousand images with and without hot dogs. We will then use the fine-tuned model to recognize whether an image contains a hot dog.

First, import the modules and packages needed for the experiments. The torchvision models package provides common pre-trained models. If you want more pre-trained models, you can use the pretrained-models.pytorch repository.
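
The original notes omit the import cell here; a plausible set of imports for the rest of this section, inferred from the names used below, is:

import os
import sys
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder
from torchvision import transforms, models

sys.path.append("/home/kesci/input/")
import d2lzh1981 as d2l

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')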

Obtaining a data set

The hot dog dataset we use (click to download) was collected from the Internet. It contains 1,400 positive-class images containing hot dogs and an equal number of negative-class images of other foods. 1,000 images of each class are used for training and the rest for testing.

We first download the compressed dataset to the path data_dir and unzip it there, obtaining the two folders hotdog/train and hotdog/test. Both contain the two category folders hotdog and not-hotdog, and each category folder contains the image files of that category.

import os
os.listdir('/home/kesci/input/resnet185352')
data_dir = '/home/kesci/input/hotdog4014'
os.listdir(os.path.join(data_dir, "hotdog"))
We create two ImageFolder instances to read all the image files in the training dataset and the test dataset, respectively.

train_imgs = ImageFolder(os.path.join(data_dir, 'hotdog/train'))
test_imgs = ImageFolder(os.path.join(data_dir, 'hotdog/test'))

Below we show the first 8 positive images and the last 8 negative images. As you can see, they differ in size and aspect ratio.
hotdogs = [train_imgs[i][0] for i in range(8)]
not_hotdogs = [train_imgs[-i - 1][0] for i in range(8)]
d2l.show_images(hotdogs + not_hotdogs, 2, 8, scale=1.4);

During training, we first crop a region of random size and random aspect ratio from the image and then scale it to a 224 × 224 input. During testing, we scale the height and width of the image to 256 pixels and then crop a central 224 × 224 region as input. In addition, we normalize the values of the three RGB (red, green, blue) color channels: each value has the channel's mean subtracted and is then divided by the channel's standard deviation.

Note: when using a pre-trained model, make sure to use the same preprocessing as was used during pre-training.
If you use models from torchvision, its documentation requires:
All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225].

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
train_augs = transforms.Compose([
    transforms.RandomResizedCrop(size=224),  # increases the generalization ability of the model
    transforms.RandomHorizontalFlip(),       # increases the generalization ability of the model
    transforms.ToTensor(),
    normalize
])

test_augs = transforms.Compose([
transforms.Resize(size=256),
transforms.CenterCrop(size=224),
transforms.ToTensor(),
normalize
])

Define and initialize the model

We use ResNet-18 pre-trained on the ImageNet dataset as the source model. Specifying pretrained=True automatically downloads and loads the pre-trained model parameters; the first time it is used, an Internet connection is needed for the download. (In the code below we instead load the parameters from a local file.)
pretrained_net = models.resnet18(pretrained=False)
pretrained_net.load_state_dict(torch.load('/home/kesci/input/resnet185352/resnet18-5c106cde.pth'))

Below we print the source model's member variable fc. As a fully connected layer, it transforms the output of ResNet's final global average pooling layer into 1000-class output for the ImageNet dataset.

print(pretrained_net.fc)

Note: if you use a different model, it may not have a member variable named fc (for example, the pre-trained VGG models in torchvision), so the right approach is to look at the model's definition in the source code; this also deepens our understanding of the model. The pretrained-models.pytorch repository appears to offer a unified interface, but I still recommend reading the source code of the corresponding model before using it.
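
For example, under the assumption that you are using torchvision's standard VGG definition (where the classification head is an nn.Sequential attribute named classifier whose last element is the final Linear layer), replacing the output layer would look roughly like this:

vgg = models.vgg16(pretrained=False)
print(vgg.classifier)                   # an nn.Sequential ending in Linear(4096, 1000)
vgg.classifier[6] = nn.Linear(4096, 2)  # replace the last layer for a 2-class target task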

As we can see, the final output of pretrained_net equals the number of classes in the source dataset, 1000. So we modify the last fc layer to output the number of categories we need:

pretrained_net.fc = nn.Linear(512, 2)
print(pretrained_net.fc)

At this point, the fc layer of pretrained_net has been randomly initialized, while the other layers still hold the parameters obtained by pre-training. Because those parameters were pre-trained on the large ImageNet dataset, they are already quite good, so usually only a small learning rate is needed to fine-tune them. The randomly initialized parameters in fc, on the other hand, are trained from scratch and generally need a larger learning rate. PyTorch makes it easy to set different learning rates for different parts of the model; in the code below, the learning rate of the fc layer is 10 times that of the pre-trained part.

output_params = list(map(id, pretrained_net.fc.parameters()))  # the new output layer, trained with a 10x learning rate
feature_params = filter(lambda p: id(p) not in output_params, pretrained_net.parameters())
# the feature layers inherit the pre-trained parameters and need a smaller learning rate
lr = 0.01
optimizer = optim.SGD([{'params': feature_params},
                       {'params': pretrained_net.fc.parameters(), 'lr': lr * 10}],
                      lr=lr, weight_decay=0.001)  # weight_decay applies L2 regularization

Fine-tune the model

def train_fine_tuning(net, optimizer, batch_size=128, num_epochs=5):
    train_iter = DataLoader(ImageFolder(os.path.join(data_dir, 'hotdog/train'), transform=train_augs),
                            batch_size, shuffle=True)
    test_iter = DataLoader(ImageFolder(os.path.join(data_dir, 'hotdog/test'), transform=test_augs),
                           batch_size)
    loss = torch.nn.CrossEntropyLoss()
    d2l.train(train_iter, test_iter, net, loss, optimizer, device, num_epochs)

train_fine_tuning(pretrained_net, optimizer)
For comparison, we define an identical model but initialize all of its parameters to random values. Since the entire model is trained from scratch, we can use a larger learning rate.

scratch_net = models.resnet18(pretrained=False, num_classes=2)
lr = 0.1
optimizer = optim.SGD(scratch_net.parameters(), lr=lr, weight_decay=0.001)
train_fine_tuning(scratch_net, optimizer)
