PyTorch Beginner's Notes (1): Loading Data and Hands-On Practice with Dataset

Table of contents

1. Introduction to Dataset and preparation for the project

2. MyData class

2.1 Define classes and methods in Python

2.2 Define the MyData class

Dataset

2.3 Get pictures

2.4 Use the console to inspect the relevant information

1. Obtain the absolute path of the first picture in the ants folder

2. Read the picture at that path

3. Display pictures: the show method

4. Get the list of image names

3. Improve the MyData class

3.1 Parameters and methods needed in the initializer

3.2 Writing the __init__ method

3.3 Writing the __getitem__ method

3.4 Generate an instance

3.5 Generating and adding two datasets

1. Generate the ant and bee datasets

2. Adding datasets

4. Complete code

5. Code exercise using the modified dataset


The difference between a Python file, the Python console, and Jupyter Notebook

Problems encountered:

1. Configuring PyTorch in Jupyter Notebook

Reference: the blog post "Using pytorch in jupyter notebook"

2. Failing to import matplotlib in PyCharm

Reference: "The solution to the failure of Pycharm to import matplotlib" (c472769019's blog on CSDN)

1. Introduction to Dataset and preparation for the project

Running help(Dataset) in the notebook shows the documentation of the Dataset class. The key points:

  • To use Dataset, write a class that inherits from (subclasses) Dataset
  • The subclass must override __getitem__ and should override __len__ (many samplers and the default DataLoader options rely on it)
  • __getitem__(): returns the sample (for example, one picture) for a given key, usually an index
  • __len__(): returns the number of samples in the dataset

Preparation:

1. Move the dataset into the folder where the project is located

2. Right-click the folder or picture whose path you want to see:

You can then copy the absolute or relative path you need

2. MyData class

2.1 Define classes and methods in Python

  • Defining a class: the class keyword is followed by the class name, and the parent class goes in parentheses, as in class MyData(Dataset):. If there is no suitable parent class, use object, the class that all classes inherit from (in Python 3, class A: and class A(object): are equivalent).
  • Defining methods: the first parameter of every method is the instance itself, named self by convention.
  • Calling methods: self is not passed explicitly because Python fills it in; the other parameters are passed normally.

example: 
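The original example is an image that is not reproduced here. A minimal sketch of the three rules above, using a hypothetical Person class (the class and method names are illustrative only):

```python
class Person:  # same as "class Person(object):" in Python 3
    def __init__(self, name):
        # self is the new instance; name is an ordinary argument
        self.name = name

    def greet(self, greeting):
        # when calling p.greet("Hello"), Python passes p as self automatically
        return f"{greeting}, {self.name}"


p = Person("Alice")
print(p.greet("Hello"))          # Hello, Alice
print(Person.greet(p, "Hi"))     # Hi, Alice  (the same call with self written out)
```

The second call shows what Python does behind the scenes: p.greet(...) is shorthand for Person.greet(p, ...).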

2.2 Define the MyData class

  • Import the Dataset class from torch.utils.data:
from torch.utils.data import Dataset

Dataset

Dataset is an abstract class. To make data easy to read, wrap it in a Dataset: a custom dataset inherits from Dataset and implements two methods:

1.  __getitem__(): given an index from 0 to len(self) - 1, returns one piece of data (a sample); this is what makes the object[item] syntax work

2.  __len__(): returns the total number of samples in the dataset
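Before building MyData, a minimal sketch of the two-method contract on a hypothetical toy dataset (SquaresDataset is illustrative, not part of the tutorial): dataset[idx] calls __getitem__ and len(dataset) calls __len__.

```python
from torch.utils.data import Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i * i)."""

    def __init__(self, n):
        self.n = n  # number of samples

    def __getitem__(self, idx):
        # dataset[idx] calls this; valid indices are 0 .. len(self) - 1
        if not 0 <= idx < self.n:
            raise IndexError(f"index {idx} out of range")
        return idx, idx * idx

    def __len__(self):
        # len(dataset) calls this
        return self.n


ds = SquaresDataset(5)
print(len(ds))  # 5
print(ds[3])    # (3, 9)
```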

 

We first write __init__ and __getitem__, and add __len__ later.

2.3 Get pictures

To load a picture, we need the image itself, its label, and img_path, the path where the picture is stored.

The module needed to read pictures:

# read images
from PIL import Image

2.4 Use the console to inspect the relevant information

The Python console shows the variables you have defined and their attributes, which makes it convenient for debugging.

1. Obtain the absolute path of the first picture in the ants folder

Store the path in the img_path variable. A Windows path copied from the file browser contains backslashes, which must be escaped by doubling them (\\).
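Why the escaping matters: in a normal Python string a backslash starts an escape sequence, so "\t" in "\train" silently becomes a tab. A small sketch with a hypothetical path showing that doubled backslashes and a raw string (r"...") give the same result:

```python
# Hypothetical path copied from a Windows file browser
escaped = "D:\\project\\dataset\\train\\ants\\0013035.jpg"  # every backslash doubled
raw = r"D:\project\dataset\train\ants\0013035.jpg"          # raw string: no escaping needed
print(escaped == raw)  # True

# Without escaping, "\t" is read as a tab character
unescaped = "dataset\train"
print("\t" in unescaped)  # True
```

Forward slashes ("dataset/train/ants") also work on Windows and avoid the problem entirely, which is why the relative paths later in this post use them.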

2. Read the picture at that path

Open it with Image.open(img_path) and store the result in img.

The attributes of the img variable then appear in the variables panel on the right.

For example, size is the image size as (width, height) in pixels, and it can also be printed in the console.
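A self-contained sketch of the same inspection. Because the ants pictures are not available here, it first writes a hypothetical 500×375 test image to a temporary folder, then opens it the same way and reads its attributes:

```python
import os
import tempfile

from PIL import Image

# Create a small test image so the example runs without the ants dataset
tmp_dir = tempfile.mkdtemp()
img_path = os.path.join(tmp_dir, "demo.jpg")
Image.new("RGB", (500, 375), color=(200, 150, 100)).save(img_path)

img = Image.open(img_path)
print(img.size)    # (500, 375): (width, height) in pixels
print(img.mode)    # RGB
print(img.format)  # JPEG
```

Note that size lists width first, then height.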

 

3. Display pictures: show method

Calling img.show() pops up a window that displays the picture.

4. Get the list of image names

  • Import the os library: import os
  • Store the relative path of the folder: dir_path="dataset/train/ants"
  • Get the list of pictures: os.listdir(dir_path) returns a list of the names of all files in that folder, e.g. img_path_list=os.listdir(dir_path)

In the console, the img_path_list object contains the names of all pictures in the ants folder. There are 124 pictures in total, so the list has 124 elements.

Accessing an element of img_path_list, for example img_path_list[0] (the first element, index 0), gives the name of the first picture.
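A runnable sketch of the listing step, using a throwaway folder with three hypothetical file names instead of the real ants folder. One detail worth knowing: os.listdir returns names in arbitrary order, so sort the list if the order matters.

```python
import os
import tempfile

# Build a throwaway folder that mimics dataset/train/ants
dir_path = os.path.join(tempfile.mkdtemp(), "ants")
os.makedirs(dir_path)
for name in ["b.jpg", "a.jpg", "c.jpg"]:
    open(os.path.join(dir_path, name), "wb").close()  # create an empty file

img_path_list = os.listdir(dir_path)  # file names only, order not guaranteed
print(len(img_path_list))             # 3
print(sorted(img_path_list)[0])       # a.jpg
```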

3. Improve the MyData class

3.1 Parameters and methods needed in the initializer

  • root_dir: the root folder path, root_dir="dataset/train"
  • label_dir: the label of the pictures; because the label name is the folder name, the parameter is called label_dir, label_dir="ants"
  • os.path.join(x,y): joins the path strings x and y with the correct separator, so the path of any image can be built by joining its parts; the effect is shown in the figure.
  • os.listdir(path): returns a list of the names of the pictures in that folder
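A short sketch of how os.path.join builds the paths used in MyData (the picture name is hypothetical). On Linux and macOS the result is "dataset/train/ants"; on Windows, join inserts a backslash instead, giving "dataset/train\ants", which Windows still accepts.

```python
import os

root_dir = "dataset/train"
label_dir = "ants"

# Folder of one class
path = os.path.join(root_dir, label_dir)

# Full path of one picture inside it (hypothetical file name)
img_item_path = os.path.join(root_dir, label_dir, "0013035.jpg")

print(path)
print(img_item_path)
```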

3.2 Writing the __init__ method

After storing the root directory and the label directory, join them with os.path.join to get the path of the image folder, then use os.listdir to get the list of images in it.

    # initializer of the class
    def __init__(self,root_dir,label_dir):
        # store the arguments
        self.root_dir=root_dir
        self.label_dir=label_dir
        # path of the image folder
        self.path=os.path.join(self.root_dir,self.label_dir)
        # list of image file names in that folder
        self.img_list=os.listdir(self.path)

3.3 Writing the __getitem__ method

Purpose: return a single picture from the image list together with its label.

idx: the index of the image

Join the folder path and the picture name to get the path of one specific picture.

Image.open then creates the corresponding image object.

Python basics: when a function returns several comma-separated values, they are packed into a tuple, so __getitem__ returns the tuple (img, label).
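A tiny sketch of tuple packing and unpacking, using a hypothetical function that returns placeholder values in place of a real image:

```python
def get_pair():
    # Two comma-separated return values are packed into one tuple
    return "img-object", "ants"


result = get_pair()
print(type(result).__name__)  # tuple
print(result)                 # ('img-object', 'ants')

img, label = get_pair()       # unpacking, the same pattern as img, label = ants_datasets[1]
print(label)                  # ants
```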

    # override the class's __getitem__ method
    def __getitem__(self, idx):
        # name of a single image
        img_name=self.img_list[idx]
        # path of that image, built by joining its parts
        img_item_path=os.path.join(self.root_dir,self.label_dir,img_name)
        # create the image object
        img = Image.open(img_item_path)
        # the label is the folder name
        label = self.label_dir
        # return the image and label as a tuple
        return img,label

3.4 Generate an instance 

root_dir="dataset/train"
label_dir="ants"
# instantiate the MyData class
ants_datasets=MyData(root_dir,label_dir)

Testing in the console shows that the generated ants_datasets object has all the attributes defined in __init__ above, such as img_list and path.

Each item of ants_datasets is a picture object together with its label; ants_datasets[0] is the first picture.

img,label=ants_datasets[1] unpacks the tuple of the item at index 1 (the second picture) into img and label, and the variables panel then shows their values.

3.5 Generating and adding two datasets

1. Generate the ant and bee datasets

root_dir="dataset/train"
ants_label_dir="ants"
bees_label_dir="bees"
# create instances of the MyData class
ants_datasets=MyData(root_dir,ants_label_dir)
bees_datasets=MyData(root_dir,bees_label_dir)

2. Adding datasets

Adding the two datasets with train_datasets=ants_datasets+bees_datasets produces a ConcatDataset, and the length of train_datasets is the sum of the lengths of the two datasets.
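A sketch of what the + does, with a hypothetical ListDataset standing in for MyData so it runs without image files. Dataset.__add__ wraps both datasets in a ConcatDataset, whose indices run through the first dataset and then continue into the second:

```python
from torch.utils.data import ConcatDataset, Dataset


class ListDataset(Dataset):
    """Hypothetical stand-in for MyData: wraps a list of (name, label) pairs."""

    def __init__(self, items):
        self.items = items

    def __getitem__(self, idx):
        return self.items[idx]

    def __len__(self):
        return len(self.items)


ants = ListDataset([("ant0.jpg", "ants"), ("ant1.jpg", "ants")])
bees = ListDataset([("bee0.jpg", "bees"), ("bee1.jpg", "bees"), ("bee2.jpg", "bees")])

train = ants + bees                      # Dataset.__add__ returns ConcatDataset([ants, bees])
print(isinstance(train, ConcatDataset))  # True
print(len(train))                        # 5 (2 + 3)
print(train[2])                          # ('bee0.jpg', 'bees'): index 2 is the first bee
```

ConcatDataset needs __len__ on each part to compute where one dataset ends and the next begins, which is one reason to implement __len__.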

 

4. Complete code

from torch.utils.data import Dataset
# for reading images
from PIL import Image
# operating-system utilities (paths, directory listings)
import os
class MyData(Dataset):
    # initializer of the class
    def __init__(self,root_dir,label_dir):
        # store the arguments
        self.root_dir=root_dir
        self.label_dir=label_dir
        # path of the image folder
        self.path=os.path.join(self.root_dir,self.label_dir)
        # list of image file names in that folder
        self.img_list=os.listdir(self.path)

    # override the class's __getitem__ method
    def __getitem__(self, idx):
        # name of a single image
        img_name=self.img_list[idx]
        # path of that image, built by joining its parts
        img_item_path=os.path.join(self.root_dir,self.label_dir,img_name)
        # create the image object
        img = Image.open(img_item_path)
        # the label is the folder name
        label = self.label_dir
        # return the image and label as a tuple
        return img,label

    # number of images in the dataset
    def __len__(self):
        return len(self.img_list)

root_dir="dataset/train"
ants_label_dir="ants"
bees_label_dir="bees"
# create instances of the MyData class
ants_datasets=MyData(root_dir,ants_label_dir)
bees_datasets=MyData(root_dir,bees_label_dir)
# concatenate the two datasets
train_datasets=ants_datasets+bees_datasets

5. Code exercise using the modified dataset

The structure of the modified dataset is shown in the figure below: images and labels are stored in separate folders.

Each image has a label in the label folder: a txt file with the same name as the image, containing a single line with the label, for example ants.

Therefore, to get the label, the file has to be opened and read.
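A minimal sketch of reading one such label file, assuming (as described above) that it holds a single line. It writes a hypothetical label file to a temporary folder first; the with statement closes the file automatically, and strip() removes a trailing newline if the file has one:

```python
import os
import tempfile

# Create a hypothetical label file like datasets2/train/ants_label/<name>.txt
label_dir = tempfile.mkdtemp()
label_item_path = os.path.join(label_dir, "0013035.txt")
with open(label_item_path, "w") as f:
    f.write("ants\n")

# Read the single line back; the file is closed when the with block ends
with open(label_item_path, "r") as f:
    label = f.readline().strip()
print(repr(label))  # 'ants'
```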

from torch.utils.data import Dataset
from PIL import Image
import os

class MyDataset(Dataset):
    def __init__(self,root_dir,img_dir,label_dir):
        # root folder path
        self.root_dir=root_dir
        # image folder name
        self.img_dir=img_dir
        # label folder name
        self.label_dir=label_dir
        # image folder path and a list of image names
        # (sorted, because os.listdir returns names in arbitrary order)
        self.img_path=os.path.join(self.root_dir,self.img_dir)
        self.img_list=sorted(os.listdir(self.img_path))
        # label folder path
        self.label_path=os.path.join(self.root_dir,self.label_dir)

    def __getitem__(self, item):
        img_name=self.img_list[item]
        img_item_path=os.path.join(self.img_path,img_name)
        # read the image at this path into an image object stored in img
        img=Image.open(img_item_path)

        # the label file has the image's name with a .txt extension;
        # deriving it from img_name keeps each image paired with its own label
        label_name=os.path.splitext(img_name)[0]+".txt"
        label_item_path=os.path.join(self.label_path,label_name)
        # open the txt file and read its single line into label (without the newline);
        # "with" closes the file afterwards
        with open(label_item_path,"r") as file1:
            label=file1.readline().strip()
        return img,label

    def __len__(self):
        return len(self.img_list)

root_dir="datasets2/train"
ants_img_dir="ants_image"
ants_label_dir="ants_label"
bees_img_dir="bees_image"
bees_label_dir="bees_label"
ants_datasets=MyDataset(root_dir,ants_img_dir,ants_label_dir)
bees_datasets=MyDataset(root_dir,bees_img_dir,bees_label_dir)
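Pairing images with labels only works if every image has a label file of the same name. A quick sanity check, sketched as a hypothetical helper find_unpaired (not part of the original code) that compares file names without their extensions:

```python
import os


def find_unpaired(img_names, label_names):
    """Hypothetical helper: compare file names without extensions.

    Returns (images without a label file, label files without an image).
    """
    img_stems = {os.path.splitext(name)[0] for name in img_names}
    label_stems = {os.path.splitext(name)[0] for name in label_names}
    return sorted(img_stems - label_stems), sorted(label_stems - img_stems)


# In practice, pass os.listdir(...) of the image folder and of the label folder
missing_labels, orphan_labels = find_unpaired(["img1.jpg", "img2.jpg"], ["img1.txt"])
print(missing_labels)  # ['img2']
print(orphan_labels)   # []
```

For the ants data this would be called as find_unpaired(os.listdir(os.path.join(root_dir, ants_img_dir)), os.listdir(os.path.join(root_dir, ants_label_dir))); two empty lists mean every image has exactly one matching label file.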


Source: blog.csdn.net/weixin_45662399/article/details/127386185