Hands-on computer vision with pytorch -- Ali Tianchi Competition -- Street View Character Recognition -- a step-by-step guide

Table of contents

Foreword:

Libraries used:

1. Data preparation

2. Data loading

3. Create the Dataset class

pytorch -- Detailed Explanation of Dataset and DataLoader for Data Loading

4. Data augmentation and creating the DataLoader

5. Build the model:

6. Model training

7. Model prediction results

8. Score submission

Foreword:

At the time of writing, the official round of this Ali Tianchi competition has ended, but there is a long-term version of the competition that you can still join to build up basic CV knowledge.

Tianchi Big Data Competition_Tianchi Competition-Alibaba Cloud Tianchi

The link above goes to the official Tianchi website; anyone who wants to take part can register there. It hosts many competitions on data analysis, computer vision, and algorithms, including plenty of beginner-friendly ones worth trying.

Without further ado, let's get started.

Note: I assume you already have a working PyTorch environment; if you run into a missing library along the way, just install it.

The basic process is: data preparation → data loading → Dataset → data augmentation and DataLoader → model building → training → prediction → submission.


Libraries used:

import os, sys, glob, shutil, json
os.environ["CUDA_VISIBLE_DEVICES"] = '0'
import cv2
import time

from PIL import Image
import numpy as np
import pandas as pd

from tqdm import tqdm, tqdm_notebook
# %pylab inline

import torch
import torch.nn as nn
from torch.utils.data.dataset import Dataset
from torchvision import transforms, models

torch.manual_seed(0)
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.benchmark = True

1. Data preparation

Go to the official Tianchi website; after registering you can download the data. The CSV file on the competition page contains download URLs for the training set, validation set, and test set:
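The article does not include a download script, but if you want to fetch the files programmatically, here is a minimal sketch. The CSV file name and the column names 'file' and 'link' are assumptions of mine; check them against the actual file on the competition page:

import io, zipfile
import pandas as pd
import requests

# Hypothetical CSV name and column names -- adjust them to the real file from Tianchi.
data_list = pd.read_csv('mchar_data_list.csv')
for _, row in data_list.iterrows():
    resp = requests.get(row['link'])
    if row['file'].endswith('.zip'):
        # image archives (train/val/test) get unpacked into the Data directory
        zipfile.ZipFile(io.BytesIO(resp.content)).extractall('Data')
    else:
        # the annotation JSON files are saved as-is
        with open('Data/' + row['file'], 'wb') as f:
            f.write(resp.content)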

2. Data loading

Go directly to the code:

Data loading for the training set:

train_path = sorted(glob.glob('D:/1wangyong\pytorchtrains\街景字符\Data\mchar_train/*.png'))
train_json = json.load(open('D:/1wangyong\pytorchtrains\街景字符\Data\mchar_train.json'))

train_label = [train_json[x]['label'] for x in train_json]

The validation set is loaded the same way as the training set:

val_path = sorted(glob.glob('D:/1wangyong\pytorchtrains\街景字符\Data\mchar_val/*.png'))
val_json = json.load(open('D:/1wangyong\pytorchtrains\街景字符\Data\mchar_val.json'))
val_label = [val_json[x]['label'] for x in val_json]
print(len(val_path), len(val_label))

Many articles recommend keeping Chinese characters out of file paths. It caused no problems when running this, so I left the path unchanged, but if you want to be safe you can use an English-only path.
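Each entry in the annotation JSON maps an image file name to its character annotations; only the 'label' field is used in this baseline (the bounding-box fields provided by the competition are ignored here). A quick sanity check to see what one entry looks like:

# Print one annotation entry and confirm that paths and labels line up.
sample_key = list(train_json.keys())[0]
print(sample_key, train_json[sample_key])
print(len(train_path), len(train_label))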

3. Create the Dataset class

In PyTorch, after the data has been loaded, you need to wrap it in a Dataset class; a detailed explanation can be found in my blog post:

pytorch -- Detailed Explanation of Dataset and DataLoader for Data Loading

See that post for the details. The Dataset class for this competition looks like this:

class SVHNDataset(Dataset):
    def __init__(self, img_path, img_label, transform=None):
        self.img_path = img_path
        self.img_label = img_label
        self.transform = transform

    def __getitem__(self, index):
        img = Image.open(self.img_path[index]).convert('RGB')

        if self.transform is not None:
            img = self.transform(img)

        # Pad every label to a fixed length of 5; class 10 means "no character".
        lbl = np.array(self.img_label[index], dtype=np.int_)
        lbl = list(lbl) + (5 - len(lbl)) * [10]
        return img, torch.from_numpy(np.array(lbl[:5]))

    def __len__(self):
        return len(self.img_path)
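A quick way to see what __getitem__ returns; the resize size below is just for illustration:

# Each item is a (transformed image tensor, fixed-length label tensor) pair.
demo_dataset = SVHNDataset(train_path, train_label,
                           transforms.Compose([transforms.Resize((64, 128)),
                                               transforms.ToTensor()]))
img, lbl = demo_dataset[0]
print(img.shape)  # torch.Size([3, 64, 128])
print(lbl)        # e.g. tensor([ 1,  9, 10, 10, 10]) -- padded with class 10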


4. Data augmentation and creating the DataLoader

Here too, the training set and validation set are handled separately:

Validation set:

val_loader = torch.utils.data.DataLoader(
    SVHNDataset(val_path, val_label,
                transforms.Compose([
                    transforms.Resize((80, 160)),
                    transforms.RandomCrop((64, 128)),
                    # transforms.ColorJitter(0.3, 0.3, 0.2),
                    # transforms.RandomRotation(5),
                    transforms.ToTensor(),
                    transforms.Normalize([0.485, 0.456, 0.406], [
                                         0.229, 0.224, 0.225])
                ])),
    batch_size=64,
    shuffle=False,
    num_workers=0,
)

Training set:

train_loader = torch.utils.data.DataLoader(
    SVHNDataset(train_path, train_label,
                transforms.Compose([
                    transforms.Resize((80, 160)),
                    transforms.RandomCrop((64, 128)),
                    transforms.ColorJitter(0.3, 0.3, 0.2),
                    transforms.RandomRotation(10),
                    transforms.ToTensor(),
                    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
                ])),
    batch_size=64,
    shuffle=True,
    num_workers=0,
)
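To confirm the loaders produce what the model expects, we can peek at one batch; the shapes follow the batch size and crop size set above:

# One batch from the training loader: images (64, 3, 64, 128), labels (64, 5).
images, labels = next(iter(train_loader))
print(images.shape, labels.shape)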

5. Build the model:

The baseline model given in the official tutorial:

class SVHN_Model1(nn.Module):
    def __init__(self):
        super(SVHN_Model1, self).__init__()

        model_conv = models.resnet18(pretrained=True)
        model_conv.avgpool = nn.AdaptiveAvgPool2d(1)
        model_conv = nn.Sequential(*list(model_conv.children())[:-1])  # drop the final fc layer
        self.cnn = model_conv

        # five parallel heads, one per character position, 11 classes each (digits 0-9 plus "no character")
        self.fc1 = nn.Linear(512, 11)
        self.fc2 = nn.Linear(512, 11)
        self.fc3 = nn.Linear(512, 11)
        self.fc4 = nn.Linear(512, 11)
        self.fc5 = nn.Linear(512, 11)

    def forward(self, img):        
        feat = self.cnn(img)
        #print(feat.shape)
        feat = feat.view(feat.shape[0], -1)
        c1 = self.fc1(feat)
        c2 = self.fc2(feat)
        c3 = self.fc3(feat)
        c4 = self.fc4(feat)
        c5 = self.fc5(feat)
        return c1, c2, c3, c4, c5
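Each of the five heads predicts one character position over 11 classes (digits 0-9 plus the padding class). A quick shape check with a dummy batch; the batch size of 2 is arbitrary:

# Each head outputs logits of shape (batch, 11).
dummy = torch.zeros(2, 3, 64, 128)
outputs = SVHN_Model1()(dummy)
print([o.shape for o in outputs])  # five tensors of shape torch.Size([2, 11])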

The official baseline model is fairly basic; if you submit it as-is, the score won't be very meaningful.

So we make the following improvements to the network:

We can improve the backbone in several ways:

1. Replace resnet18 with the larger resnet152

2. Add a fully connected hidden layer to each classification head

3. Add dropout to the hidden layer

4. Add a ReLU activation to the fully connected hidden layer to increase nonlinearity

Switching from resnet18 to resnet152 gives a deeper model with stronger expressive power, and the extra hidden layer further increases the model's fitting capacity. Dropout on the hidden layer balances this out and helps prevent overfitting to some extent. (These are just a few improvement techniques, not necessarily the optimal ones.)

The improved model definition code is as follows:

class SVHN_Model2(nn.Module):
    def __init__(self):
        super(SVHN_Model2, self).__init__()

        # resnet152 backbone
        model_conv = models.resnet152(pretrained=True)
        model_conv.avgpool = nn.AdaptiveAvgPool2d(1)
        model_conv = nn.Sequential(*list(model_conv.children())[:-1])  # drop the final fc layer
        self.cnn = model_conv

        # resnet152's pooled feature dimension is 2048 (resnet18's is 512)
        self.hd_fc1 = nn.Linear(2048, 256)
        self.hd_fc2 = nn.Linear(2048, 256)
        self.hd_fc3 = nn.Linear(2048, 256)
        self.hd_fc4 = nn.Linear(2048, 256)
        self.hd_fc5 = nn.Linear(2048, 256)
        self.dropout_1 = nn.Dropout(0.25)
        self.dropout_2 = nn.Dropout(0.25)
        self.dropout_3 = nn.Dropout(0.25)
        self.dropout_4 = nn.Dropout(0.25)
        self.dropout_5 = nn.Dropout(0.25)
        self.fc1 = nn.Linear(256, 11)
        self.fc2 = nn.Linear(256, 11)
        self.fc3 = nn.Linear(256, 11)
        self.fc4 = nn.Linear(256, 11)
        self.fc5 = nn.Linear(256, 11)

    def forward(self, img):
        feat = self.cnn(img)
        feat = feat.view(feat.shape[0], -1)

        feat1 = torch.relu(self.hd_fc1(feat))
        feat2 = torch.relu(self.hd_fc2(feat))
        feat3 = torch.relu(self.hd_fc3(feat))
        feat4 = torch.relu(self.hd_fc4(feat))
        feat5 = torch.relu(self.hd_fc5(feat))
        feat1 = self.dropout_1(feat1)
        feat2 = self.dropout_2(feat2)
        feat3 = self.dropout_3(feat3)
        feat4 = self.dropout_4(feat4)
        feat5 = self.dropout_5(feat5)

        c1 = self.fc1(feat1)
        c2 = self.fc2(feat2)
        c3 = self.fc3(feat3)
        c4 = self.fc4(feat4)
        c5 = self.fc5(feat5)

        return c1, c2, c3, c4, c5

6. Model training

With data loading, data augmentation, and model building done, we can start training properly:

Some of you may be wondering what the Dataset class and DataLoader created earlier are actually used for.

Again, you can take a look at my earlier post, pytorch -- Detailed Explanation of Dataset and DataLoader for Data Loading, which covers this in detail.
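One thing to note: the training loop below calls three helpers, train, validate, and predict, which this write-up does not define. Here is a minimal sketch of what they might look like, assuming the loss is the sum of cross-entropy over the five heads and that predict concatenates the five head outputs into an (N, 55) array, accumulating tta rounds of test-time augmentation:

def train(train_loader, model, criterion, optimizer, epoch):
    # one epoch over the training set; returns the mean loss
    model.train()
    train_loss = []
    for data, target in train_loader:
        data, target = data.cuda(), target.cuda()
        c0, c1, c2, c3, c4 = model(data)
        loss = (criterion(c0, target[:, 0]) + criterion(c1, target[:, 1]) +
                criterion(c2, target[:, 2]) + criterion(c3, target[:, 3]) +
                criterion(c4, target[:, 4]))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_loss.append(loss.item())
    return np.mean(train_loss)


def validate(val_loader, model, criterion):
    # mean loss over the validation set, no gradient updates
    model.eval()
    val_loss = []
    with torch.no_grad():
        for data, target in val_loader:
            data, target = data.cuda(), target.cuda()
            c0, c1, c2, c3, c4 = model(data)
            loss = (criterion(c0, target[:, 0]) + criterion(c1, target[:, 1]) +
                    criterion(c2, target[:, 2]) + criterion(c3, target[:, 3]) +
                    criterion(c4, target[:, 4]))
            val_loss.append(loss.item())
    return np.mean(val_loss)


def predict(test_loader, model, tta=10):
    # concatenate the five heads to shape (N, 55) and sum tta augmentation rounds
    model.eval()
    test_pred_tta = None
    for _ in range(tta):
        test_pred = []
        with torch.no_grad():
            for data, _ in test_loader:
                data = data.cuda()
                c0, c1, c2, c3, c4 = model(data)
                output = torch.cat([c0, c1, c2, c3, c4], dim=1).cpu().numpy()
                test_pred.append(output)
        test_pred = np.vstack(test_pred)
        test_pred_tta = test_pred if test_pred_tta is None else test_pred_tta + test_pred
    return test_pred_tta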

Training code:

model = SVHN_Model2()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), 0.001)
best_loss = 1000.0

use_cuda = True
if use_cuda:
    model = model.cuda()

for epoch in range(100):
    start = time.time()
    print('start', time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(start)))
    train_loss = train(train_loader, model, criterion, optimizer, epoch)
    val_loss = validate(val_loader, model, criterion)
    val_label = [''.join(map(str, x)) for x in val_loader.dataset.img_label]
    val_predict_label = predict(val_loader, model, 1)
    val_predict_label = np.vstack([
        val_predict_label[:, :11].argmax(1),
        val_predict_label[:, 11:22].argmax(1),
        val_predict_label[:, 22:33].argmax(1),
        val_predict_label[:, 33:44].argmax(1),
        val_predict_label[:, 44:55].argmax(1),
    ]).T
    val_label_pred = []
    for x in val_predict_label:
        val_label_pred.append(''.join(map(str, x[x != 10])))

    val_char_acc = np.mean(np.array(val_label_pred) == np.array(val_label))
    end = time.time()
    print('end', time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(end)))
    time_cost = end - start
    print(
        'Epoch: {0}, Train loss: {1} \t Val loss: {2}, time_cost: {3}'.format(
            epoch,
            train_loss,
            val_loss,
            time_cost))
    print('Val Acc', val_char_acc)
    # save the checkpoint whenever the validation loss improves
    if val_loss < best_loss:
        best_loss = val_loss
        # print('Find better model in Epoch {0}, saving model.'.format(epoch))
        torch.save(model.state_dict(), './model.pt')

I added start-time and end-time logging to the training code to see how long one epoch takes; if you don't need it, just comment those lines out.

With Adam as the optimizer, training converges within about 20 epochs; with SGD it would take longer to train.

The code above uses Adam.
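If you do want to try SGD instead, it is a one-line swap; the learning rate and momentum below are illustrative values, not tuned for this task:

# Illustrative SGD setup -- lr and momentum are not tuned for this competition.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)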

7. Model prediction results

In the steps above we finished training the model and saved the best checkpoint; now we just need to load it and run prediction on the test set.

The code is as follows:

model = SVHN_Model2().cuda()  # must match the architecture trained and saved in step 6
test_path = sorted(glob.glob('D:/1wangyong\pytorchtrains\街景字符\Data\mchar_test_a/*.png'))
# test_json = json.load(open('../input/test.json'))
test_label = [[1]] * len(test_path)  # placeholder labels -- the test set has no annotations
# print(len(test_path), len(test_label))

test_loader = torch.utils.data.DataLoader(
    SVHNDataset(test_path, test_label,
                transforms.Compose([
                    transforms.Resize((68, 136)),
                    transforms.RandomCrop((64, 128)),
                    # transforms.ColorJitter(0.3, 0.3, 0.2),
                    # transforms.RandomRotation(5),
                    transforms.ToTensor(),
                    transforms.Normalize([0.485, 0.456, 0.406], [
                                         0.229, 0.224, 0.225])
                ])),
    batch_size=40,
    shuffle=False,
    num_workers=0,
)

# load the best model saved in step 6 (adjust the path to wherever model.pt was written)
model.load_state_dict(torch.load('D:/Projects/wordec/model.pt'))

test_predict_label = predict(test_loader, model, 1)
print(test_predict_label.shape)
print('test_predict_label', test_predict_label)

test_label = [''.join(map(str, x)) for x in test_loader.dataset.img_label]
# print('test_label', test_label)
test_predict_label = np.vstack([
    test_predict_label[:, :11].argmax(1),
    test_predict_label[:, 11:22].argmax(1),
    test_predict_label[:, 22:33].argmax(1),
    test_predict_label[:, 33:44].argmax(1),
    test_predict_label[:, 44:55].argmax(1),
]).T

test_label_pred = []
for x in test_predict_label:
    test_label_pred.append(''.join(map(str, x[x != 10])))
# print("test_label_pred", len(test_label_pred))
df_submit = pd.read_csv('D:/Projects/wordec/input/test_A_sample_submit.csv')
df_submit['file_code'] = test_label_pred
df_submit.to_csv('submit_1018.csv', index=None)
print("finished")

With that, you have completed a full basic training-and-prediction run. Feeling a little excited?

8. Score submission

Go back to the official Tianchi website, find the competition page, and submit your result!

Note: the result to submit is the CSV file saved in step 7.

Now go check your ranking!
