[PytorchLearning] Build your own data set

Build your own dataset from local data

Using the Cats vs. Dogs dataset as an example

1 The Cats vs. Dogs Dataset

The Cats vs. Dogs dataset is an introductory Kaggle task on a par with Titanic: using the given data and an algorithm of your choice, the goal is to tell cats from dogs.
The original dataset contains a training set and a test set. The training set holds 12,500 cat images and 12,500 dog images, sorted in order. The test set holds 12,500 images of cats and dogs in mixed order.
To simplify the task, I use only 4,000 cat images and 4,000 dog images here, 8,000 samples in total. The link is as follows:
Link: https://pan.baidu.com/s/1cJ-f-q5CQV5rdgQVAGv72Q
Extraction code: ultv

2 Data preprocessing

Before data is propagated through a neural network, all inputs must first have the same size (some pyramid models can handle inputs of different sizes, but they are not discussed here). For MNIST, the format and size of the data are already uniform, so we only need to download and load it. In this task, although all cat and dog images are in jpg format, their sizes differ, so a preliminary conversion is required, for example by resizing or cropping the images.
Here only the simpler method, resize, is shown.
All images are resized to 100x100 with the following code:

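import os

import cv2
from tqdm import tqdm

DataPath = r'TrainingData'  # original training data, one subfolder per class
SavePath = r'ResizedData'   # output folder for the resized images

for img_dir in os.listdir(DataPath):
    # Folder whose name carries the class information
    img_label_path = os.path.join(DataPath, img_dir)
    # Folder that stores the processed images
    save_label_path = os.path.join(SavePath, img_dir)
    # makedirs also creates SavePath itself if it does not exist yet
    os.makedirs(save_label_path, exist_ok=True)

    # Use tqdm to show the processing progress for each class
    tbar = tqdm(os.listdir(img_label_path), ncols=100)
    for image in tbar:
        # Walk through the images inside the class folder
        image_path = os.path.join(img_label_path, image)
        # Read the image with cv2; imread returns None if the file cannot be decoded
        image_old = cv2.imread(image_path)
        if image_old is None:
            continue
        image_new = cv2.resize(image_old, (100, 100))
        # Save the resized image
        save_path = os.path.join(save_label_path, image)
        cv2.imwrite(save_path, image_new)
    print("Finished class {}; images saved to {}".format(img_dir, save_label_path))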

3 Create your own Dataset

3.1 Dataset and DataLoader

PyTorch provides two main classes for working with datasets: Dataset and DataLoader.
Dataset plays two roles. Ready-made datasets are provided through it (for example, MNIST and CIFAR ship in torchvision.datasets), and it is also the abstract class that represents a dataset, which we subclass to build our own;
DataLoader is essentially an iterable (it defines the __iter__ method internally) that reads samples from a Dataset, assembles them into batches, and returns tensors.
In short, a Dataset is the key argument needed to construct a DataLoader.
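
To make the relationship concrete, here is a small sketch in which a DataLoader batches the samples of a Dataset. It uses the built-in TensorDataset with made-up tensors (ten blank 3x100x100 "images" and alternating 0/1 labels) purely as an assumption for illustration; it is not the cat and dog data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten fake "images" of shape 3x100x100 with alternating labels 0/1 (made-up data)
images = torch.zeros(10, 3, 100, 100)
labels = torch.tensor([0, 1] * 5)

# The Dataset knows how to return one sample; the DataLoader groups samples into batches
dataset = TensorDataset(images, labels)
loader = DataLoader(dataset, batch_size=4, shuffle=False)

batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in loader]
for x_shape, y_shape in batch_shapes:
    print(x_shape, y_shape)
```

With 10 samples and batch_size=4, the last batch is smaller than the others unless drop_last=True is passed to the DataLoader.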

3.2 How to define Dataset

Dataset is the parent class that every dataset-loading class in PyTorch should inherit from. A subclass must define the following two special methods, otherwise an error is raised when they are called:

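def __getitem__(self, index):
    # Return the sample (data and label) at position index
    raise NotImplementedError

def __len__(self):
    # Return the size of the dataset
    raise NotImplementedError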
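
As a minimal sketch of these two methods in a complete class, the toy dataset below returns the pair (i, i*i) for index i. The class name SquaresDataset is hypothetical and only serves to show the shape of a Dataset subclass.

```python
import torch
from torch.utils.data import Dataset


class SquaresDataset(Dataset):
    """Toy dataset (hypothetical): sample i is the pair (i, i * i)."""

    def __init__(self, n):
        super().__init__()
        self.n = n

    def __getitem__(self, index):
        # Raising IndexError lets plain iteration over the dataset stop cleanly
        if not 0 <= index < self.n:
            raise IndexError(index)
        x = torch.tensor(float(index))
        return x, x * x

    def __len__(self):
        return self.n


ds = SquaresDataset(5)
x, y = ds[3]
print(len(ds), x.item(), y.item())
```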

The key point is __getitem__: it receives an index and returns the image data and its label. The index usually refers to a position in a list, and each element of that list holds the path and the label of one image. A common approach is therefore:

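1- Create a txt file that stores the path and label of each image
2- Read this information into a list in which each element corresponds to one sample
3- In __getitem__, read the data and label for the given index and return them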
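
The three steps can be sketched with the standard library alone. The helper names write_index, read_index and get_sample, as well as the two file names, are assumptions made for this illustration; a real Dataset would additionally open the image in the last step.

```python
import os
import tempfile


def write_index(txt_path, samples):
    # Step 1: one "path label" pair per line
    with open(txt_path, "w") as f:
        for path, label in samples:
            f.write(f"{path} {label}\n")


def read_index(txt_path):
    # Step 2: turn the file into a list with one (path, label) element per sample
    samples = []
    with open(txt_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split on the last space only, so paths containing spaces survive
            path, label = line.rsplit(" ", 1)
            samples.append((path, int(label)))
    return samples


def get_sample(samples, index):
    # Step 3: look up the sample by index (image loading is omitted here)
    path, label = samples[index]
    return path, label


with tempfile.TemporaryDirectory() as tmp:
    txt = os.path.join(tmp, "train.txt")
    write_index(txt, [("cat/cat.0.jpg", 0), ("dog/dog.0.jpg", 1)])
    samples = read_index(txt)

print(samples)
print(get_sample(samples, 1))
```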

3.3 Make a txt file containing paths and labels

Absolute paths are recommended, so that reading does not fail when the relative location of the files changes.
The label must be an integer (it is later converted to an integer tensor) and must start from 0.
This is important enough to say three times, so please remember it: if a label falls outside the range 0 to n_classes - 1, training fails with the error Assertion `cur_target >= 0 && cur_target < n_classes' failed.
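
One way to avoid this error is to derive the labels from the class folder names and to check them before training. The sketch below assumes two hypothetical helpers, build_label_map and check_labels, which are not part of the original project.

```python
def build_label_map(class_names):
    """Map class names to consecutive integers starting from 0 (sorted for stability)."""
    return {name: idx for idx, name in enumerate(sorted(class_names))}


def check_labels(labels, n_classes):
    """Raise ValueError if any label is not an integer in [0, n_classes)."""
    bad = [l for l in labels if not isinstance(l, int) or not 0 <= l < n_classes]
    if bad:
        raise ValueError(f"labels outside [0, {n_classes}): {bad}")


label_map = build_label_map(["dog", "cat"])
print(label_map)

# Valid labels pass silently
check_labels([0, 1, 1, 0], n_classes=len(label_map))

# Labels that start from 1 are rejected before they can reach the loss function
try:
    check_labels([1, 2], n_classes=len(label_map))
except ValueError as err:
    print("rejected:", err)
```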


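import os
import random

DataPath = r'ResizedData'  # resized training data
absolutePath = r'D:\Code\pytorch\Pytorch_Train\catvsdog\data'  # replace with the absolute path of your catvsdog data folder
label = 0

if __name__ == '__main__':

    # List the class folders through the absolute path, so the script does not
    # depend on the current working directory; sort them so that each class
    # gets the same label on every run
    data_root = os.path.join(absolutePath, DataPath)
    for dataClass in sorted(os.listdir(data_root)):

        # Read the image file names of this class (cat or dog)
        data = os.listdir(os.path.join(data_root, dataClass))

        # Shuffle the order
        random.shuffle(data)

        # The training set takes 4/5 of all data
        train_len = len(data) * 4 // 5

        # Split the images 8:2 into training and validation sets
        train_list, val_list = data[:train_len], data[train_len:]

        # Append the training samples to train.txt
        # (append mode: delete old txt files before running the script again)
        with open(os.path.join(absolutePath, 'train.txt'), 'a') as f:
            for img in train_list:
                f.write(os.path.join(data_root, dataClass, img) + ' ' + str(label) + '\n')
        print("Training images with label {} have been written".format(dataClass))

        # Append the validation samples to val.txt
        with open(os.path.join(absolutePath, 'val.txt'), 'a') as f:
            for img in val_list:
                f.write(os.path.join(data_root, dataClass, img) + ' ' + str(label) + '\n')
        print("Validation images with label {} have been written".format(dataClass))

        # The next class folder gets the next integer label
        label += 1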
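
Because random.shuffle gives a different split on every run, it can be useful to make the 8:2 split reproducible. The following sketch assumes a hypothetical helper split_train_val with a fixed seed; the file names are made up for the example.

```python
import random


def split_train_val(items, train_ratio=0.8, seed=0):
    """Shuffle a copy of `items` with a fixed seed and split it into train/val lists."""
    rng = random.Random(seed)  # local generator, leaves the global random state alone
    shuffled = list(items)
    rng.shuffle(shuffled)
    train_len = int(len(shuffled) * train_ratio)
    return shuffled[:train_len], shuffled[train_len:]


file_names = [f"cat.{i}.jpg" for i in range(10)]
train, val = split_train_val(file_names)
print(len(train), len(val))
```

Calling the function again with the same seed returns the same split, which keeps train.txt and val.txt stable across runs.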

3.4 Read the txt file and subclass Dataset

The hard part is already done, so this section is easy to follow. Here is the code directly:

import torch
from PIL import Image
from torch.utils.data import Dataset
import torchvision.transforms as transforms

def read_txt(path):
    # Read the txt file and return the image paths and labels as two lists
    ims, labels = [], []
    with open(path, 'r') as f:
        for sample in f:
            sample = sample.strip()
            if not sample:
                continue
            # Split on the last space only, so paths containing spaces still work
            im, label = sample.rsplit(" ", 1)
            ims.append(im)
            labels.append(int(label))
    return ims, labels

class ImageDataset(Dataset):
    # Custom dataset that subclasses Dataset

    def __init__(self, txtpath, transform=None):

        super().__init__()
        self.ims, self.labels = read_txt(txtpath)
        # Fall back to ToTensor when no transform is given
        self.transform = transform if transform is not None else transforms.ToTensor()


    def __getitem__(self, index):

        im_path = self.ims[index]
        label = self.labels[index]

        # Open the image with PIL, force 3 channels, and convert it to a tensor
        image = Image.open(im_path).convert('RGB')
        image = self.transform(image)

        # Classification losses expect the label as an int64 (long) tensor
        label = torch.tensor(label, dtype=torch.long)

        return image, label

    def __len__(self):
        return len(self.ims)


The model and training files were introduced in earlier posts, so I won't repeat them here.

Appendix: Project Links

Link: https://pan.baidu.com/s/1f-Gy4CAzrB0c7fUhvsKing
Extraction code: lo59

