Foreword
The models discussed so far usually focus on a single task, such as predicting the category of a picture, and training optimizes one specific metric. Sometimes, however, we need several answers from the same input: given a picture, we may want to know both the type of news it belongs to (politics/sports/entertainment) and whether it is male news or female news.
Optimizing a single metric can also ignore useful information, namely the additional signal that comes from training on related tasks. By sharing representations across several related tasks, we can make the model generalize better on the original task. This method is called multi-task learning.
1. Multi-task learning
1.1 Definition
Multi-task learning makes several predictions at the same time from a shared representation, that is, from shared feature extraction. This helps the model focus on the features that matter. In practice, a single feature-extraction network combined with several loss functions gives a multi-task loss. Image localization on its own is a single task; if you also need to know the category, it becomes multi-task learning.
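Written out, the idea of one feature extractor combined with several loss functions can be sketched as a weighted sum. The symbols are notation introduced here for illustration and do not come from the original text: \theta_s denotes the shared parameters, \theta_t the parameters specific to task t, and w_t an optional per-task weight (w_t = 1 for every task gives a plain sum).

```latex
\mathcal{L}(\theta_s, \theta_1, \dots, \theta_T)
  = \sum_{t=1}^{T} w_t \, \mathcal{L}_t(\theta_s, \theta_t)
```

For the localization-plus-category example, T = 2: one term for the localization loss and one for the classification loss, both depending on the same \theta_s.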
1.2 Principle
Multi-task models are usually implemented by sharing the hidden layers (the feature-extraction layers) across all tasks and using a separate output layer for each task. The more tasks are learned at the same time, the more the model has to find a representation that captures all of them, and the smaller the risk of overfitting on the original task.
Because the other tasks also constrain which features are extracted for any one task, multi-task learning helps the model focus on the features that really matter.
The model learns features that serve as many of the tasks as possible, and such features tend to generalize well.
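The sharing described above can be checked numerically. The following is a minimal sketch, assuming a hypothetical toy model (one shared linear layer and two heads, with random data); none of the names come from the original article. It shows that when the two task losses are added, the gradient reaching the shared layer is the sum of the gradients from each task, so both tasks shape the shared features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy model: one shared layer feeding two task-specific heads.
shared = nn.Linear(8, 16)      # shared hidden (feature-extraction) layer
head_a = nn.Linear(16, 3)      # output layer for task A (3 classes)
head_b = nn.Linear(16, 4)      # output layer for task B (4 classes)

x = torch.randn(5, 8)                  # 5 random input samples
y_a = torch.randint(0, 3, (5,))        # random labels for task A
y_b = torch.randint(0, 4, (5,))        # random labels for task B

def task_losses():
    features = F.relu(shared(x))       # computed once, used by both heads
    loss_a = F.cross_entropy(head_a(features), y_a)
    loss_b = F.cross_entropy(head_b(features), y_b)
    return loss_a, loss_b

# Gradient of each task's loss with respect to the shared weights.
loss_a, loss_b = task_losses()
grad_a = torch.autograd.grad(loss_a, shared.weight, retain_graph=True)[0]
grad_b = torch.autograd.grad(loss_b, shared.weight)[0]

# Gradient of the summed loss: it should match grad_a + grad_b.
loss_a, loss_b = task_losses()
(loss_a + loss_b).backward()
print(torch.allclose(shared.weight.grad, grad_a + grad_b, atol=1e-6))
```

Each head, by contrast, only receives a gradient from its own loss, which is why the heads stay task-specific while the shared layer is pulled toward features useful for both.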
2. Multi-task learning code
The example below predicts the color and the category of a clothing item at the same time.
2.1 Preliminary Study on Datasets
One branch classifies the kind of clothing in an input image (shirt, skirt, jeans, shoes, etc.); the other branch classifies the color of that clothing (black, red, blue, etc.).
In total, the dataset consists of 2525 images divided into 7 "color+category" combinations:
Black jeans (344 images)
Black shoes (358 images)
Blue skirt (386 images)
Blue jeans (356 images)
Blue shirt (369 images)
Red skirt (380 images)
Red shirt (332 images)
Dataset download link: https://pan.baidu.com/s/1JtKt7KCR2lEqAirjIXzvgg Extraction code: 2kbc
2.2 Preprocessing
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import torchvision
import glob
from torchvision import transforms
from torch.utils import data
from PIL import Image
img_paths = glob.glob(r"F:\multi-output-classification\dataset\*\*.jpg")
img_paths[:5]
The name of each image's parent folder is its label (in the form color_item), so the label can be read from the path:
label_names = [img_path.split("\\")[-2] for img_path in img_paths]
label_names[:5]
label_array = np.array([la.split("_") for la in label_names])
label_array
label_color = label_array[:,0]
label_color
label_item = label_array[:,1]
label_item
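The label extraction above splits on a backslash, which works only for Windows-style paths. A minimal sketch of a separator-agnostic variant follows; the helper name `split_label` and the example paths are hypothetical and only illustrate the idea.

```python
from pathlib import PurePosixPath, PureWindowsPath

def split_label(img_path):
    """Return (color, item) from a path whose parent folder is named color_item."""
    # Pick the path flavour from the separator so both styles are handled.
    path_cls = PureWindowsPath if "\\" in img_path else PurePosixPath
    folder = path_cls(img_path).parent.name
    color, item = folder.split("_", 1)
    return color, item

# Hypothetical example paths, not taken from the real dataset.
examples = [
    r"F:\data\black_jeans\0001.jpg",
    "/data/red_shirt/0002.jpg",
]
labels = [split_label(p) for p in examples]
print(labels)  # → [('black', 'jeans'), ('red', 'shirt')]
```

Using the pure path classes keeps the parsing independent of the operating system the code runs on.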
Convert the labels to integer indices, because PyTorch loss functions expect numeric class labels:
unique_color = np.unique(label_color)
unique_color
unique_item = np.unique(label_item)
unique_item
item_to_idx = {name: idx for idx, name in enumerate(unique_item)}
item_to_idx
color_to_idx = {name: idx for idx, name in enumerate(unique_color)}
color_to_idx
label_item = [item_to_idx[name] for name in label_item]
label_color = [color_to_idx[name] for name in label_color]
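To make the encoding step concrete, here is a small worked example on a hypothetical five-element label list (a stand-in for `label_color`, not real data). It also builds the reverse mapping, which is an addition useful for turning predicted indices back into names.

```python
import numpy as np

# Hypothetical label list standing in for label_color.
colors = ["red", "black", "blue", "black", "red"]

unique_colors = np.unique(colors)      # sorted unique values
color_to_idx = {str(name): idx for idx, name in enumerate(unique_colors)}
idx_to_color = {idx: name for name, idx in color_to_idx.items()}

encoded = [color_to_idx[c] for c in colors]
decoded = [idx_to_color[i] for i in encoded]

# np.unique can also return the indices directly.
_, inverse = np.unique(colors, return_inverse=True)

print(color_to_idx)      # → {'black': 0, 'blue': 1, 'red': 2}
print(encoded)           # → [2, 0, 1, 0, 2]
print(inverse.tolist())  # → [2, 0, 1, 0, 2]
```

Because `np.unique` sorts its output, the index assigned to each name is stable across runs as long as the set of labels does not change.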
transform = transforms.Compose([
    transforms.Resize((96,96)),
    transforms.ToTensor(),
])
Custom dataset
class Multi_dataset(data.Dataset):
    def __init__(self, imgs_path, label_color, label_item) -> None:
        super().__init__()
        self.imgs_path = imgs_path
        self.label_color = label_color
        self.label_item = label_item
    def __getitem__(self, index):
        img_path = self.imgs_path[index]
        pil_img = Image.open(img_path)
        # Guard against grayscale images: always convert to 3-channel RGB
        pil_img = pil_img.convert('RGB')
        pil_img = transform(pil_img)
        label_c = self.label_color[index]
        label_i = self.label_item[index]
        # Return the image together with both labels (color, item)
        return pil_img, (label_c, label_i)
    def __len__(self):
        return len(self.imgs_path)
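Split into training and test sets
# Instantiate the dataset class defined above
multi_dataset = Multi_dataset(img_paths, label_color, label_item)
count = len(multi_dataset)
count
# 80% for training, the rest for testing
train_count = int(count*0.8)
test_count = count - train_count
train_ds, test_ds = data.random_split(multi_dataset,[train_count, test_count])
len(train_ds),len(test_ds)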
BATCHSIZE = 32
train_dl = data.DataLoader(train_ds,batch_size=BATCHSIZE,shuffle=True)
test_dl = data.DataLoader(test_ds,batch_size=BATCHSIZE)
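It helps to see what one batch looks like when each sample carries a tuple of two labels. The sketch below uses a hypothetical stand-in dataset (`ToyMultiLabel`, random tensors) rather than the real images. The seeded generator passed to `random_split` is an addition that makes the split reproducible; it is not part of the original code.

```python
import torch
from torch.utils import data

class ToyMultiLabel(data.Dataset):
    """Hypothetical stand-in for the image dataset: random tensors, two labels each."""
    def __init__(self, n):
        self.images = torch.randn(n, 3, 96, 96)
        self.colors = [i % 3 for i in range(n)]
        self.items = [i % 4 for i in range(n)]
    def __getitem__(self, index):
        return self.images[index], (self.colors[index], self.items[index])
    def __len__(self):
        return len(self.images)

dataset = ToyMultiLabel(20)
train_count = int(len(dataset) * 0.8)
test_count = len(dataset) - train_count
train_ds, test_ds = data.random_split(
    dataset, [train_count, test_count],
    generator=torch.Generator().manual_seed(0),   # reproducible split
)
train_dl = data.DataLoader(train_ds, batch_size=8, shuffle=True)

# The default collate function stacks the images and turns the label tuple
# into two integer tensors, one per task.
imgs, (colors, items) = next(iter(train_dl))
print(imgs.shape, colors.shape, items.shape, colors.dtype)
```

The nested unpacking `imgs, (colors, items)` is the same pattern a training loop over `train_dl` would use.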
2.3 Network structure design
# Define the network
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3,16,3)
        self.conv2 = nn.Conv2d(16,32,3)
        self.conv3 = nn.Conv2d(32,64,3)
        self.fc = nn.Linear(64*10*10, 1024)
        self.fc1 = nn.Linear(1024,3)   # color head: 3 colors
        self.fc2 = nn.Linear(1024,4)   # item head: 4 clothing categories
    def forward(self,x):
        # Spatial size: 96 -> 94 -> 47 -> 45 -> 22 -> 20 -> 10
        # (unpadded 3x3 conv followed by 2x2 max pool, three times)
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x,2)
        x = F.relu(self.conv3(x))
        x = F.max_pool2d(x,2)
        x = x.view(-1,64*10*10)
        x = F.relu(self.fc(x))   # shared 1024-dimensional representation
        c = self.fc1(x)          # color logits
        i = self.fc2(x)          # item logits
        return c,i
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = Net().to(device)
model
Net(
(conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1))
(conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1))
(conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
(fc): Linear(in_features=6400, out_features=1024, bias=True)
(fc1): Linear(in_features=1024, out_features=3, bias=True)
(fc2): Linear(in_features=1024, out_features=4, bias=True)
)
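The `in_features=6400` of the `fc` layer in the printout follows from the input size and the layer settings. A short sketch of the arithmetic, assuming 96x96 inputs, unpadded 3x3 convolutions with stride 1, and 2x2 max pooling (the helper names are hypothetical):

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2):
    """Output size of max pooling with stride equal to the kernel size."""
    return size // kernel

size = 96
trace = [size]
for _ in range(3):                 # three conv + pool stages
    size = conv_out(size)
    trace.append(size)
    size = pool_out(size)
    trace.append(size)

flat_features = 64 * size * size   # 64 channels after conv3
print(trace)          # → [96, 94, 47, 45, 22, 20, 10]
print(flat_features)  # → 6400
```

If the input resolution or the padding changes, the same calculation gives the new size for the first fully connected layer.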
2.4 Training
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)
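The original text stops at the loss function and the optimizer. The following is a minimal sketch of how a training loop for two outputs could look, not the author's actual loop. It assumes the two cross-entropy losses are simply added with equal weight, and it uses a hypothetical small model (`TinyTwoHead`) and random data so that it runs on its own. With the real `Net`, the loop would iterate over `train_dl`, unpack `imgs, (label_c, label_i)`, and move the tensors to `device`.

```python
import torch
import torch.nn as nn
from torch.utils import data

torch.manual_seed(0)

class TinyTwoHead(nn.Module):
    """Hypothetical small stand-in for Net: shared layer, color head, item head."""
    def __init__(self):
        super().__init__()
        self.shared = nn.Linear(20, 32)
        self.color_head = nn.Linear(32, 3)
        self.item_head = nn.Linear(32, 4)
    def forward(self, x):
        x = torch.relu(self.shared(x))
        return self.color_head(x), self.item_head(x)

# Synthetic data: 64 random feature vectors with random color/item labels.
features = torch.randn(64, 20)
colors = torch.randint(0, 3, (64,))
items = torch.randint(0, 4, (64,))
loader = data.DataLoader(
    data.TensorDataset(features, colors, items), batch_size=16, shuffle=True
)

model = TinyTwoHead()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)

epoch_losses = []
for epoch in range(3):
    model.train()
    running = 0.0
    for x, y_color, y_item in loader:
        color_logits, item_logits = model(x)
        # Assumption: the two task losses are added with equal weight.
        loss = loss_fn(color_logits, y_color) + loss_fn(item_logits, y_item)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running += loss.item() * x.size(0)
    epoch_losses.append(running / len(loader.dataset))

# Per-task accuracy, evaluated without gradient tracking.
model.eval()
with torch.no_grad():
    color_logits, item_logits = model(features)
    color_acc = (color_logits.argmax(dim=1) == colors).float().mean().item()
    item_acc = (item_logits.argmax(dim=1) == items).float().mean().item()
print(epoch_losses, color_acc, item_acc)
```

Tracking accuracy separately for each task makes it visible when one task improves at the expense of the other, in which case unequal loss weights may be worth trying.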
3. Summary
To be continued.