Dive into Deep Learning: Image Classification Cases 1-2

Study notes for the "Dive into Deep Learning" course on the Boyu learning platform.
Original link: https://www.boyuai.com/elites/course/cZu18YmweLv10OeV/lesson/ZDRJ8BaRpFmqDwJafJAYGn
Thanks to the Boyu platform, Datawhale, Kesci, and AWS for giving us the opportunity to learn for free!
Overall learning experience: the Boyu courses are well made and very systematic. Each advanced course begins by listing the prerequisites you need to master, so it suits students like me with a weak foundation. Students with a weak foundation may also want to focus on these other Boyu courses:
Mathematical foundations: https://www.boyuai.com/elites/course/D91JM0bv72Zop1D3
Machine learning foundations: https://www.boyuai.com/elites/course/5ICEBwpbHVwwnK3C

Introduction

Image classification, as the name suggests, takes an image as input and outputs a label describing the image's content. It is a core problem of computer vision with a wide range of practical applications. Traditional image classification methods rely on hand-crafted feature detection and description; such methods can work for some simple images, but real-world situations are complex, and traditional methods quickly become overwhelmed. Instead of trying to describe each image category in code, we now use machine learning to handle image classification. The main task: given an input image, assign it one label from a known set of categories.

Image classification (CIFAR-10) on Kaggle

Now we will apply the knowledge learned in the preceding sections to take part in a Kaggle competition that tackles CIFAR-10 image classification. The competition website is https://www.kaggle.com/c/cifar-10
# The network in this section requires a long training time
# The notebook can be accessed on Kaggle:
# https://www.kaggle.com/boyuai/boyu-d2l-image-classification-cifar-10

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torch.nn.functional as F
import os
import time

print("PyTorch Version: ", torch.__version__)

Obtaining and organizing the dataset

The competition data is split into a training set and a test set. The training set contains 50,000 images; the test set contains 300,000 images. Both datasets are PNG images, 32 pixels in height and width, with three color channels (RGB). The images cover 10 categories: airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks. To make it easier to get started, we provide a small sample of the dataset: "train_tiny.zip" contains 80 training samples, and "test_tiny.zip" contains 100 test samples. Their uncompressed folder names are "train_tiny" and "test_tiny".
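If you experiment with the small sample locally, here is a minimal sketch for unpacking it (assuming "train_tiny.zip" and "test_tiny.zip" sit in the current working directory; adjust the paths to your setup):

import zipfile

for name in ['train_tiny', 'test_tiny']:
    with zipfile.ZipFile(name + '.zip') as z:
        z.extractall('.')  # produces the train_tiny/ and test_tiny/ folders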

Image augmentation

data_transform = transforms.Compose([
    transforms.Resize(40),
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32),
    transforms.ToTensor()
])
trainset = torchvision.datasets.ImageFolder(root='/home/kesci/input/CIFAR102891/cifar-10/train',
                                            transform=data_transform)
trainset[0][0].shape  # the first image of the first class
data = [d[0].data.cpu().numpy() for d in trainset]
np.mean(data)
np.std(data)
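Note that np.mean(data) and np.std(data) above aggregate over all pixels and channels at once, while transforms.Normalize below expects one mean and one std per channel. A sketch for computing per-channel statistics instead (assuming trainset fits in memory):

data = np.stack([d[0].numpy() for d in trainset])  # shape (N, 3, 32, 32)
channel_mean = data.mean(axis=(0, 2, 3))  # one value per RGB channel
channel_std = data.std(axis=(0, 2, 3))
print(channel_mean, channel_std)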

# Image augmentation
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),  # first pad each side with 4 pixels of zeros, then randomly crop the image to 32x32
    transforms.RandomHorizontalFlip(),  # flip the image horizontally with probability 0.5, otherwise keep it unchanged
    transforms.ToTensor(),  # convert to tensor format
    transforms.Normalize((0.4731, 0.4822, 0.4465), (0.2212, 0.1994, 0.2010)),  # normalize each of the R, G, B channels with the computed mean and std
])

transform_test = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4731, 0.4822, 0.4465), (0.2212, 0.1994, 0.2010)),
])

Importing the dataset

train_dir = '/home/kesci/input/CIFAR102891/cifar-10/train'
test_dir = '/home/kesci/input/CIFAR102891/cifar-10/test'

trainset = torchvision.datasets.ImageFolder(root=train_dir, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=256, shuffle=True)

testset = torchvision.datasets.ImageFolder(root=test_dir, transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size=256, shuffle=False)

classes = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
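As a quick sanity check that the augmentation really is random, we can load one raw image (a sketch, assuming train_dir exists as above) and apply transform_train to it twice:

raw_set = torchvision.datasets.ImageFolder(root=train_dir)  # no transform: yields PIL images
img, _ = raw_set[0]
print(torch.equal(transform_train(img), transform_train(img)))  # almost always False due to random crop/flip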

Defining the model

ResNet-18 network architecture: ResNet stands for Residual Network. Kaiming He's "Deep Residual Learning for Image Recognition" won the CVPR Best Paper Award. The deep residual network he proposed in 2015 swept the major image-recognition competitions, winning a number of championships overwhelmingly. While preserving accuracy, it pushed the network depth to 152 layers, and later even beyond 1,000 layers.

[Figure: ResNet-18 network architecture]
class ResidualBlock(nn.Module):  # to define a network we generally create a subclass of torch.nn.Module

    def __init__(self, inchannel, outchannel, stride=1):
        super(ResidualBlock, self).__init__()
        # torch.nn.Sequential is a sequential container; modules are added to it in the order passed to the constructor
        self.left = nn.Sequential(
            nn.Conv2d(inchannel, outchannel, kernel_size=3, stride=stride, padding=1, bias=False),
            # the first convolutional layer, using nn.Conv2d()
            nn.BatchNorm2d(outchannel),  # batch normalization
            nn.ReLU(inplace=True),  # rectified linear unit, a commonly used activation function in neural networks
            nn.Conv2d(outchannel, outchannel, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(outchannel)
        )
        self.shortcut = nn.Sequential()
        if stride != 1 or inchannel != outchannel:
            self.shortcut = nn.Sequential(
                nn.Conv2d(inchannel, outchannel, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(outchannel)
            )
        # for the later addition, Y = self.left(X) must have the same shape as the shortcut output

    def forward(self, x):  # combine the features of the two branches and apply ReLU to get the final features
        out = self.left(x)
        out += self.shortcut(x)
        out = F.relu(out)
        return out
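A quick shape check of the block with a dummy input (a sketch, not part of the course code): when stride=2 and the channel count changes, the 1x1 shortcut convolution makes the two branches match.

block = ResidualBlock(64, 128, stride=2)
x = torch.randn(1, 64, 32, 32)  # dummy input: batch of 1, 64 channels, 32x32
print(block(x).shape)  # torch.Size([1, 128, 16, 16])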

class ResNet(nn.Module):
    def __init__(self, ResidualBlock, num_classes=10):
        super(ResNet, self).__init__()
        self.inchannel = 64
        self.conv1 = nn.Sequential(  # use 3x3 convolution kernels instead of the 7x7 kernel to reduce model parameters
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(),
        )
        self.layer1 = self.make_layer(ResidualBlock, 64, 2, stride=1)
        self.layer2 = self.make_layer(ResidualBlock, 128, 2, stride=2)
        self.layer3 = self.make_layer(ResidualBlock, 256, 2, stride=2)
        self.layer4 = self.make_layer(ResidualBlock, 512, 2, stride=2)
        self.fc = nn.Linear(512, num_classes)

    def make_layer(self, block, channels, num_blocks, stride):
        # the stride of the first ResidualBlock is given by make_layer's stride argument;
        # the remaining num_blocks - 1 ResidualBlocks use stride 1
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.inchannel, channels, stride))
            self.inchannel = channels
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out

def ResNet18():
    return ResNet(ResidualBlock)
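As a sanity check (again a sketch with dummy CIFAR-sized inputs), the network maps a 3x32x32 image to 10 class scores:

net = ResNet18()
x = torch.randn(2, 3, 32, 32)  # two dummy CIFAR-10 images
print(net(x).shape)  # torch.Size([2, 10])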

Training and Testing

Define whether to use the GPU

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Hyperparameter settings

EPOCH = 20     # number of passes over the dataset
pre_epoch = 0  # number of epochs already completed
LR = 0.1       # learning rate

Model definition: ResNet-18

net = ResNet18().to(device)

Define the loss function and optimizer

criterion = nn.CrossEntropyLoss()  # cross-entropy loss function, used for multi-class classification
optimizer = optim.SGD(net.parameters(), lr=LR, momentum=0.9, weight_decay=5e-4)
# the optimizer is mini-batch momentum SGD, with L2 regularization (weight decay)

Training

if __name__ == "__main__":
    print("Start Training, Resnet-18!")
    num_iters = 0
    best_acc = 0.0  # best test accuracy seen so far
    for epoch in range(pre_epoch, EPOCH):
        print('\nEpoch: %d' % (epoch + 1))
        net.train()  # switch the network to training mode
        sum_loss = 0.0
        correct = 0.0
        total = 0
        for i, data in enumerate(trainloader, 0):
            # enumerate wraps an iterable (list, string, tuple, DataLoader, ...) and yields
            # (index, item) pairs, with the index starting at 0

            num_iters += 1
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()  # clear the gradients

            # forward + backward
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            sum_loss += loss.item() * labels.size(0)
            _, predicted = torch.max(outputs, 1)  # take the highest-scoring class as the prediction
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            # print the loss and accuracy every 20 batches
            if (i + 1) % 20 == 0:
                print('[epoch:%d, iter:%d] Loss: %.03f | Acc: %.3f%% '
                      % (epoch + 1, num_iters, sum_loss / total, 100. * correct / total))

        # measure the test accuracy after each training epoch
        print("Waiting Test!")
        net.eval()
        with torch.no_grad():
            correct = 0
            total = 0
            for data in testloader:
                images, labels = data
                images, labels = images.to(device), labels.to(device)
                outputs = net(images)
                # take the highest-scoring class (the index along dimension 1 of outputs.data)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()
            acc = 100. * correct / total
            print("EPOCH ACC = %.3f%%" % acc)
            # record the best test accuracy
            if acc > best_acc:
                best_acc = acc
    print("best_acc = %.3f%%" % best_acc)
    print("Training Finished, TotalEPOCH=%d" % EPOCH)

For an introduction to ResNet, we can also refer to Sun Jian's presentation of ResNet in the course "Deep Learning and Practice".
Before ResNet, deep learning had already explored model depth to some extent: AlexNet had 8 layers (ILSVRC 2012) and VGG had 19 layers (ILSVRC 2014). Google and the VGG authors did want to push the layer count higher, but technically, beyond roughly 20 layers, stacking more layers stopped helping and even hurt results. The main reason is that there was no way to optimize such a deep nonlinear system.
In 2015, Kaiming He and Sun Jian's team at Microsoft proposed ResNet (ILSVRC 2015), raising the depth to 152 layers, chiefly by addressing this optimization issue: in principle, it eases optimization to a large extent.
In theory, the deeper the network, the smaller the change between successive layers, so learning that change signal directly becomes harder. Residual learning instead learns the residual between layers directly, i.e., the delta between layers, which is easier to learn. Implementing it only requires adding some skip-like branches to the network.


ResNet has two core ideas:
• Skip connection ("residual function")
• The shortest path contains only a few layers
For more explanation of the principles, see: https://zhuanlan.zhihu.com/p/91385516
As for why ResNet works, there are many interpretations. Sun Jian's own team gives an explanation from the training dynamics of going "shallow to deep": the residual connections keep the gradients large enough, which makes optimization easier to some extent. Many other researchers have since published their own explanations of why this work succeeds; see: https://zhuanlan.zhihu.com/p/80226180
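A toy sketch (not from the course) makes the gradient argument concrete: stacking 20 plain linear layers shrinks the gradient reaching the input, while adding skip connections keeps it large.

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 8, requires_grad=True)
layers = nn.Sequential(*[nn.Linear(8, 8) for _ in range(20)])

layers(x).sum().backward()  # plain deep stack
g_plain = x.grad.norm().item()

x.grad = None
out = x
for layer in layers:
    out = out + layer(out)  # skip connection: y = x + F(x)
out.sum().backward()
g_residual = x.grad.norm().item()
print(g_plain, g_residual)  # the residual path typically yields a far larger input gradient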
Combining ResNet with other CNN ideas has also produced many interesting and effective models, such as:
Bottleneck (He et al. 2015)
ResNeXt (Xie et al. 2017)
Xception / MobileNets (Francois Chollet 2017; Howard et al. 2017)
ShuffleNet (Zhang et al. 2017)
and so on.

Dog breed identification on Kaggle (ImageNet Dogs)

In this section, we tackle the Kaggle dog breed identification challenge; the competition URL is https://www.kaggle.com/c/dog-breed-identification. In this competition, we try to identify 120 different breeds of dogs. The dataset used in the competition is in fact a subset of the famous ImageNet dataset.

# In this section's notebook, training the model on the complete training set with the parameters set later takes roughly 40-50 minutes
# Please plan your GPU quota accordingly, and make sure you have switched to a GPU resource before training
# This section's notebook can also be accessed on Kaggle:
# https://www.kaggle.com/boyuai/boyu-d2l-dog-breed-identification-imagenet-dogs

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torchvision.models as models
import os
import shutil
import time
import pandas as pd
import random

# set the random seeds
random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed(0)

Organizing the dataset

We can download the dataset from the competition website. Its directory structure is:

| Dog Breed Identification
    | train
    |   | 000bec180eb18c7604dcecc8fe0dba07.jpg
    |   | 00a338a92e4e7bf543340dc849230e75.jpg
    |   | ...
    | test
    |   | 00a3edd22dc7859c487a64777fc8d093.jpg
    |   | 00a6892e5c7f92c1f465e213fd904582.jpg
    |   | ...
    | labels.csv
    | sample_submission.csv

The train and test directories contain the images of the training and test sets respectively. The training set contains 10,222 images and the test set contains 10,357 images; the images are in JPEG format, and each image's file name is a unique id. labels.csv contains the labels of the training images: the file has 10,222 rows, each with two columns, the first being the image id and the second the dog breed. There are 120 breeds in total.
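Before reorganizing anything, it is worth peeking at the label file (a sketch; it assumes labels.csv has been downloaded into the working directory, and that its columns are id and breed as on Kaggle):

labels = pd.read_csv('labels.csv')
print(labels.head())               # columns: id, breed
print(labels['breed'].nunique())   # expect 120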

We want to organize the data to facilitate subsequent reading. Our main goals are:

  • Split a validation set out of the training set for tuning hyperparameters. After the split, the dataset should contain four parts: the split training set, the split validation set, the complete training set, and the complete test set.
  • For these four parts, create four folders: train, valid, train_valid, test. In each folder, create one subfolder per category, storing the images that belong to that category. The labels of the first three parts are known, so each has 120 subfolders; the labels of the test set are unknown, so we create only a single subfolder named unknown to store all the test data.

After organizing, we want the dataset to have the following directory structure:

| train_valid_test
    | train
    |   | affenpinscher
    |   |   | 00ca18751837cd6a22813f8e221f7819.jpg
    |   |   | ...
    |   | afghan_hound
    |   |   | 0a4f1e17d720cdff35814651402b7cf4.jpg
    |   |   | ...
    |   | ...
    | valid
    |   | affenpinscher
    |   |   | 56af8255b46eb1fa5722f37729525405.jpg
    |   |   | ...
    |   | afghan_hound
    |   |   | 0df400016a7e7ab4abff824bf2743f02.jpg
    |   |   | ...
    |   | ...
    | train_valid
    |   | affenpinscher
    |   |   | 00ca18751837cd6a22813f8e221f7819.jpg
    |   |   | ...
    |   | afghan_hound
    |   |   | 0a4f1e17d720cdff35814651402b7cf4.jpg
    |   |   | ...
    |   | ...
    | test
    |   | unknown
    |   |   | 00a3edd22dc7859c487a64777fc8d093.jpg
    |   |   | ...

data_dir = '/home/kesci/input/Kaggle_Dog6357/dog-breed-identification'  # dataset directory
label_file, train_dir, test_dir = 'labels.csv', 'train', 'test'  # files and folders inside data_dir
new_data_dir = './train_valid_test'  # directory storing the organized data
valid_ratio = 0.1  # proportion of the validation set

def mkdir_if_not_exist(path):
    # create the directory given by path if it does not exist yet
    if not os.path.exists(os.path.join(*path)):
        os.makedirs(os.path.join(*path))

def reorg_dog_data(data_dir, label_file, train_dir, test_dir, new_data_dir, valid_ratio):
    # read the labels of the training data
    labels = pd.read_csv(os.path.join(data_dir, label_file))
    id2label = {Id: label for Id, label in labels.values}  # (key: value) = (id: label)

    # randomly shuffle the training data
    train_files = os.listdir(os.path.join(data_dir, train_dir))
    random.shuffle(train_files)

    # original training set
    valid_ds_size = int(len(train_files) * valid_ratio)  # size of the validation set
    for i, file in enumerate(train_files):
        img_id = file.split('.')[0]  # file is a string of the form id.jpg
        img_label = id2label[img_id]
        if i < valid_ds_size:
            mkdir_if_not_exist([new_data_dir, 'valid', img_label])
            shutil.copy(os.path.join(data_dir, train_dir, file),
                        os.path.join(new_data_dir, 'valid', img_label))
        else:
            mkdir_if_not_exist([new_data_dir, 'train', img_label])
            shutil.copy(os.path.join(data_dir, train_dir, file),
                        os.path.join(new_data_dir, 'train', img_label))
        mkdir_if_not_exist([new_data_dir, 'train_valid', img_label])
        shutil.copy(os.path.join(data_dir, train_dir, file),
                    os.path.join(new_data_dir, 'train_valid', img_label))

    # test set
    mkdir_if_not_exist([new_data_dir, 'test', 'unknown'])
    for test_file in os.listdir(os.path.join(data_dir, test_dir)):
        shutil.copy(os.path.join(data_dir, test_dir, test_file),
                    os.path.join(new_data_dir, 'test', 'unknown'))

reorg_dog_data(data_dir, label_file, train_dir, test_dir, new_data_dir, valid_ratio)
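A sanity check on the result (a sketch using the expected counts): with valid_ratio = 0.1, the valid split should hold int(10222 * 0.1) = 1022 images, train the remaining 9200, train_valid all 10222, and test 10357.

for part in ['train', 'valid', 'train_valid', 'test']:
    n = sum(len(files) for _, _, files in os.walk(os.path.join(new_data_dir, part)))
    print(part, n)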

Image augmentation

transform_train = transforms.Compose([
    # randomly crop a region whose area is 0.08 to 1 times that of the original image and whose
    # aspect ratio is between 3/4 and 4/3, then scale the crop to a new 224x224 image
    transforms.RandomResizedCrop(224, scale=(0.08, 1.0),
                                 ratio=(3.0 / 4.0, 4.0 / 3.0)),
    # flip horizontally with probability 0.5
    transforms.RandomHorizontalFlip(),
    # randomly change the brightness, contrast, and saturation
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.ToTensor(),
    # standardize each channel; (0.485, 0.456, 0.406) and (0.229, 0.224, 0.225) are the
    # per-channel mean and std computed on ImageNet
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# apply only deterministic transforms to the test set
transform_test = transforms.Compose([
    transforms.Resize(256),
    # crop out the central square region of height and width 224
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

Reading the data

# new_data_dir contains four directories: train, valid, train_valid, and test
# in each of the four directories, every subdirectory represents one category and stores all images belonging to that category
train_ds = torchvision.datasets.ImageFolder(root=os.path.join(new_data_dir, 'train'),
                                            transform=transform_train)
valid_ds = torchvision.datasets.ImageFolder(root=os.path.join(new_data_dir, 'valid'),
                                            transform=transform_test)
train_valid_ds = torchvision.datasets.ImageFolder(root=os.path.join(new_data_dir, 'train_valid'),
                                                  transform=transform_train)
test_ds = torchvision.datasets.ImageFolder(root=os.path.join(new_data_dir, 'test'),
                                           transform=transform_test)

batch_size = 128
train_iter = torch.utils.data.DataLoader(train_ds, batch_size=batch_size, shuffle=True)
valid_iter = torch.utils.data.DataLoader(valid_ds, batch_size=batch_size, shuffle=True)
train_valid_iter = torch.utils.data.DataLoader(train_valid_ds, batch_size=batch_size, shuffle=True)
test_iter = torch.utils.data.DataLoader(test_ds, batch_size=batch_size, shuffle=False) # shuffle=False

Defining the model

The data in this competition is a subset of the ImageNet dataset, so we use fine-tuning: we choose a model pre-trained on the complete ImageNet dataset to extract image features, and feed its output into a small custom output network, which we train ourselves.

Here we use a pre-trained ResNet-34 model. We directly reuse the pre-trained model's layers up to the output layer, i.e., the feature-extraction part, and redefine the output layer. This time we train only the parameters of the redefined output layer; for the feature-extraction part, we keep the parameters of the pre-trained model.

def get_net(device):
    finetune_net = models.resnet34(pretrained=False)  # resnet34 network; pre-trained weights are loaded from a local file below
    finetune_net.load_state_dict(torch.load('/home/kesci/input/resnet347742/resnet34-333f7ec4.pth'))
    for param in finetune_net.parameters():  # freeze the pre-trained parameters
        param.requires_grad = False
    # finetune_net.fc is originally a fully connected layer with 512 input units and 1000 output units
    # replace the original finetune_net.fc; the parameters of the new finetune_net.fc record gradients
    finetune_net.fc = nn.Sequential(
        nn.Linear(in_features=512, out_features=256),
        nn.ReLU(),
        nn.Linear(in_features=256, out_features=120)  # 120 is the number of output categories
    )
    return finetune_net
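To confirm that only the new output layers will be trained, we can count the parameters that still require gradients (a sketch; it assumes the pre-trained weight file above is available and that device is defined as in the tuning section below):

net = get_net(device)
n_trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(n_trainable)  # only the new fc layers: 512*256 + 256 + 256*120 + 120 = 162168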

Defining the training functions

def evaluate_loss_acc(data_iter, net, device):
    # compute the average loss and accuracy on data_iter
    loss = nn.CrossEntropyLoss()  # cross-entropy loss function
    is_training = net.training  # bool, whether net is in train mode
    net.eval()
    l_sum, acc_sum, n = 0, 0, 0
    with torch.no_grad():
        for X, y in data_iter:
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            l_sum += l.item() * y.shape[0]
            acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
    net.train(is_training)  # restore net's train/eval mode
    return l_sum / n, acc_sum / n

def train(net, train_iter, valid_iter, num_epochs, lr, wd, device, lr_period,
          lr_decay):
    loss = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.fc.parameters(), lr=lr, momentum=0.9, weight_decay=wd)
    net = net.to(device)
    for epoch in range(num_epochs):
        train_l_sum, n, start = 0.0, 0, time.time()
        if epoch > 0 and epoch % lr_period == 0:  # decay the learning rate every lr_period epochs
            lr = lr * lr_decay
            for param_group in optimizer.param_groups:
                param_group['lr'] = lr
        for X, y in train_iter:
            X, y = X.to(device), y.to(device)
            optimizer.zero_grad()
            y_hat = net(X)
            l = loss(y_hat, y)
            l.backward()
            optimizer.step()
            train_l_sum += l.item() * y.shape[0]
            n += y.shape[0]
        time_s = "time %.2f sec" % (time.time() - start)
        if valid_iter is not None:
            valid_loss, valid_acc = evaluate_loss_acc(valid_iter, net, device)
            epoch_s = ("epoch %d, train loss %f, valid loss %f, valid acc %f, "
                       % (epoch + 1, train_l_sum / n, valid_loss, valid_acc))
        else:
            epoch_s = ("epoch %d, train loss %f, "
                       % (epoch + 1, train_l_sum / n))
        print(epoch_s + time_s + ', lr ' + str(lr))

Hyperparameter tuning

# Tuning procedure: feed the split training set and validation set into the training function above,
# and keep adjusting the hyperparameters based on the printed results. Once good hyperparameters are
# found, train on the complete training set, and finally output the classification results on the test set.
num_epochs, lr_period, lr_decay = 20, 10, 0.1
lr, wd = 0.03, 1e-4
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

net = get_net(device)
train(net, train_iter, valid_iter, num_epochs, lr, wd, device, lr_period, lr_decay)

Training the model on the complete dataset

# with the hyperparameters above, training the model on the complete dataset takes roughly 40-50 minutes
net = get_net(device)
train(net, train_valid_iter, None, num_epochs, lr, wd, device, lr_period, lr_decay)

Classifying the test set and submitting results

Use the trained model to predict on the test data. The competition requires that, for each image in the test set, we predict the probability of it belonging to each category.

preds = []
net.eval()
with torch.no_grad():
    for X, _ in test_iter:
        X = X.to(device)
        output = net(X)
        output = torch.softmax(output, dim=1)
        preds += output.tolist()
ids = sorted(os.listdir(os.path.join(new_data_dir, 'test/unknown')))
with open('submission.csv', 'w') as f:
    f.write('id,' + ','.join(train_valid_ds.classes) + '\n')
    for i, output in zip(ids, preds):
        f.write(i.split('.')[0] + ',' + ','.join(
            [str(num) for num in output]) + '\n')
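A final check on the submission file before uploading (a sketch): the header plus one row per test image, with each row of softmax probabilities summing to about 1.

import csv
with open('submission.csv') as f:
    rows = list(csv.reader(f))
print(len(rows) - 1)                       # expect 10357 prediction rows
print(sum(float(v) for v in rows[1][1:]))  # expect approximately 1.0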
