Dogs vs. Cats is a classic Kaggle competition and a common first project when starting out with CNNs, so the approach here is simple and rough.
I divided the whole project into four files:
config holds the parameters, Dataset builds the data sets,
Main trains and saves the model,
and Module builds the model. The contents of config are as follows:
TRAIN_PATH = r'D:\temp\train'
PRE_PATH = r'D:\temp\test1'
BATCH_SIZE = 200 # batch_size
PIC_SIZE = 100 # side length of the input images, in pixels
Data preparation & processing
All of this code lives in the Dataset file.
The data can be downloaded from the Kaggle competition page, https://www.kaggle.com/c/dogs-vs-cats.
There are three files in total: a csv file used for submission, and two archives, one for training and one for prediction.
If you look at the training set, you will find that the label is part of each picture's file name, so the label has to be extracted by string matching.
Furthermore, Kaggle only labels the training data, so a test set has to be split off manually. (Strictly speaking, it should be split into training, validation and test sets, but I forgot to do that at the time.)
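As a minimal sketch of the string matching mentioned above: the helper name `label_from_name` and the sample file names are assumptions for illustration, and the convention of 1 for cat and 0 for dog follows the dataset class used later in this article.

```python
def label_from_name(name: str) -> int:
    """Return 1 for a cat image and 0 otherwise, based on the file name prefix."""
    return 1 if name.startswith('cat') else 0

# Hypothetical file names in the style of the training folder
names = ['cat.0.jpg', 'dog.0.jpg', 'cat.12499.jpg']
labels = [label_from_name(n) for n in names]
print(labels)  # [1, 0, 1]
```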
First import the necessary packages
import os
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms
import torch
import random
import config
A PyTorch data set is a class that inherits from Dataset, so we start by defining one:
class Cat_Vs_DogDatasets(Dataset):
    pass
The general idea is to pass in a list of picture file names, join each name with the folder path to get the full picture path, and load the pictures one at a time on demand.
class Cat_Vs_DogDatasets(Dataset):
    def __init__(self, path: str, imglist: list, s: int=config.PIC_SIZE):
        self.path = path # path of the image folder
        self.compose = transforms.Compose([ # image preprocessing
            transforms.Resize(size=s), # scale the shorter side to s, keeping the aspect ratio
            transforms.CenterCrop(size=s), # crop the center to an s x s image
            transforms.ToTensor(),
            transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
        ])
        self.imglist = imglist # list of image file names
        self.len = len(imglist)

    def __getitem__(self, idx: int):
        name = self.imglist[idx]
        label = 0 # dog
        if name[:3] == 'cat': # the label is the file name prefix
            label = 1
        img = Image.open(os.path.join(self.path, name), mode='r')
        return self.compose(img), torch.tensor(label, dtype=torch.int64)

    def __len__(self):
        return self.len
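To make the preprocessing concrete, here is a small pure-Python sketch of the arithmetic behind the pipeline above. The helpers `resized_shape` and `normalize` and the 500x375 photo are assumptions for illustration, and truncating the scaled side to an integer is also an assumption about rounding. It only mirrors what Resize and Normalize compute; it does not call torchvision.

```python
def resized_shape(width: int, height: int, s: int) -> tuple:
    """Size after scaling the shorter side to s while keeping the aspect ratio."""
    if width <= height:
        return s, int(s * height / width)
    return int(s * width / height), s

def normalize(x: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Per-channel normalisation applied after ToTensor: (x - mean) / std."""
    return (x - mean) / std

w, h = resized_shape(500, 375, 100)    # hypothetical 500x375 photo
print(w, h)                            # 133 100
print(normalize(0.0), normalize(1.0))  # -1.0 1.0
```

After the resize, the center crop cuts the longer side down to s, so every picture ends up s x s, and the normalisation maps pixel values from [0, 1] to [-1, 1].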
With the data set class in place, the next step is to split the data. This can be done directly by splitting the list of picture names.
def train_test_split(path, test_size=0.15, random_state: int=666, s: int=config.PIC_SIZE):
    imglist = sorted(os.listdir(path)) # all image names; sorted so the seeded shuffle is reproducible
    random.seed(random_state)
    random.shuffle(imglist) # shuffle the order
    train_size = int((1 - test_size) * len(imglist)) # size of the training set
    train = imglist[:train_size] # training set
    test = imglist[train_size:] # test set
    return Cat_Vs_DogDatasets(path, imglist=train, s=s), Cat_Vs_DogDatasets(path, imglist=test, s=s) # build both data sets
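The splitting logic can be tried without any pictures on disk. The sketch below applies the same shuffle-then-slice idea to a hypothetical list of names; `split_names` is an illustrative helper, not part of the project, and it uses a local `random.Random` instance instead of the global seed.

```python
import random

def split_names(names, test_size=0.15, random_state=666):
    """Shuffle a sorted copy of the names with a fixed seed, then cut it in two."""
    names = sorted(names)                       # fixed starting order
    random.Random(random_state).shuffle(names)  # seeded, so the split is repeatable
    train_size = int((1 - test_size) * len(names))
    return names[:train_size], names[train_size:]

# Hypothetical file names standing in for os.listdir(path)
names = [f'cat.{i}.jpg' for i in range(10)] + [f'dog.{i}.jpg' for i in range(10)]
train, test = split_names(names)
print(len(train), len(test))  # 17 3
```

Because the split happens on file names, the two data sets never share a picture, and calling the function again with the same seed gives the same split.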
So far, the data is ready.
Model building
The model is an arbitrary small one: two convolutional layers, each followed by a max-pooling layer, and then a two-layer fully connected network.
import torch
from torch import nn
import config
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.cv1 = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=5, stride=1, padding=0)
        self.cv2 = nn.Conv2d(in_channels=10, out_channels=20, kernel_size=5, stride=1, padding=0)
        self.maxpooling = nn.MaxPool2d(kernel_size=2)
        self.acFun = nn.functional.relu
        self.liner1 = nn.Linear(9680, 80) # 9680 = 20 channels * 22 * 22 for a 100x100 input
        self.liner2 = nn.Linear(80, 2) # two outputs: dog (0) and cat (1)

    def forward(self, x):
        x = self.acFun(self.cv1(x))
        x = self.maxpooling(x)
        x = self.acFun(self.cv2(x))
        x = self.maxpooling(x)
        x = x.view(x.shape[0], -1) # flatten to (batch, 9680)
        x = self.acFun(self.liner1(x))
        # The original applies ReLU to the output logits as well; it is kept here,
        # although it is usually left out before cross_entropy.
        x = self.acFun(self.liner2(x))
        return x
The model expects 100×100 pictures. The size is this small because of the limited memory of my graphics card, but 100×100 is still a workable resolution.
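The input size also fixes the 9680 input features of the first linear layer. The sketch below redoes that arithmetic with the standard output-size formula for convolution and pooling layers; `conv_out` is a hypothetical helper, and the pooling stride of 2 relies on MaxPool2d defaulting its stride to the kernel size.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a convolution or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1

side = 100                                  # PIC_SIZE
side = conv_out(side, kernel=5)             # cv1        -> 96
side = conv_out(side, kernel=2, stride=2)   # maxpooling -> 48
side = conv_out(side, kernel=5)             # cv2        -> 44
side = conv_out(side, kernel=2, stride=2)   # maxpooling -> 22
flat = 20 * side * side                     # 20 output channels from cv2
print(flat)  # 9680
```

If PIC_SIZE is changed in config, the first argument of liner1 has to be recomputed in the same way.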
Training
For training, the optimizer is SGD and the loss function is cross-entropy.
import Dataset
import Module
import config
import torch
device = torch.device('cuda:0')
train, test = Dataset.train_test_split(config.TRAIN_PATH)
train_loader = torch.utils.data.DataLoader(dataset=train, batch_size=config.BATCH_SIZE, num_workers=4, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test, batch_size=config.BATCH_SIZE)
moudle = Module.CNN().to(device)
sgd = torch.optim.SGD(params=moudle.parameters(), lr=0.01, momentum=0.15)
loss_fun = torch.nn.functional.cross_entropy
One problem came up here. With SGD, when lr is too large the loss stops decreasing or even rises. I have not yet worked out how to explain this, so I lowered lr and increased the momentum a little.
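One common cause of this behaviour can be seen on a toy problem. The sketch below is not a diagnosis of this CNN: it runs plain gradient descent without momentum on the quadratic f(w) = 0.5 * c * w^2, where the curvature c = 10 and the two learning rates are arbitrary values chosen for illustration. Each step multiplies w by (1 - lr * c), so the loss shrinks only while that factor stays between -1 and 1.

```python
def run_gd(lr: float, curvature: float = 10.0, steps: int = 20, w: float = 1.0) -> float:
    """Plain gradient descent on f(w) = 0.5 * curvature * w**2; returns the final loss."""
    for _ in range(steps):
        grad = curvature * w   # derivative of f at w
        w = w - lr * grad      # multiplies w by (1 - lr * curvature)
    return 0.5 * curvature * w ** 2

start_loss = 0.5 * 10.0 * 1.0 ** 2
small = run_gd(lr=0.01)   # factor 0.9 per step: the loss shrinks
large = run_gd(lr=0.25)   # factor -1.5 per step: the loss grows
print(small < start_loss, large > start_loss)  # True True
```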
Then I wrote a function to compute the accuracy (ACC) on the test set:
def OutPutACC():
    with torch.no_grad():
        A = 0
        for batch in test_loader:
            X, y = batch
            X = X.to(device)
            y = y.to(device)
            pre = torch.argmax(moudle(X), dim=1)
            A += (pre == y).sum().item()
        print(A / len(test))
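The function counts correct predictions across all batches and divides once by the number of test samples, rather than averaging per-batch accuracies. A pure-Python sketch of the same bookkeeping, with made-up predictions and a hypothetical `accuracy` helper:

```python
def accuracy(batches) -> float:
    """Accumulate correct predictions over batches of (predictions, labels) pairs."""
    correct, total = 0, 0
    for pre, y in batches:
        correct += sum(int(p == t) for p, t in zip(pre, y))
        total += len(y)
    return correct / total

# Hypothetical predictions: a full batch of 4 and a final batch of 2
batches = [([1, 0, 1, 1], [1, 0, 0, 1]), ([0, 1], [0, 1])]
print(accuracy(batches))  # 0.8333333333333334
```

Averaging the two batch accuracies instead would give (0.75 + 1.0) / 2 = 0.875, which overweights the smaller last batch.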
Finally, train the model:
if __name__ == '__main__':
    for epoch in range(40):
        for i, batch in enumerate(train_loader):
            X, y = batch
            X = X.to(device)
            y = y.to(device)
            pre = moudle(X)
            loss = loss_fun(pre, y)
            sgd.zero_grad()
            loss.backward()
            sgd.step()
            if i % 10 == 0:
                print(loss.item()) # training loss every 10 batches
        OutPutACC() # test accuracy after each epoch
After 40 epochs, which took about 20 minutes, the final model reached an accuracy of 77.2% on the test set. The model is clearly still underfitting, so the number of training epochs could be increased further.
Finally, here is the link to the whole project:
https://github.com/zipper112/CatvsDog