Download the data set:
Link: https://pan.baidu.com/s/1l1AnBgkAAEhh0vI5_loWKw
Extraction code: 2xq4
Creating the dataset: https://www.cnblogs.com/xiximayou/p/12398285.html
Reading the dataset: https://www.cnblogs.com/xiximayou/p/12422827.html
Training: https://www.cnblogs.com/xiximayou/p/12448300.html
Saving the model and resuming training: https://www.cnblogs.com/xiximayou/p/12452624.html
Loading a saved model and testing: https://www.cnblogs.com/xiximayou/p/12459499.html
Splitting training and validation sets and validating while training: https://www.cnblogs.com/xiximayou/p/12464738.html
Using a learning rate schedule and decay while training and testing: https://www.cnblogs.com/xiximayou/p/12468010.html
Using TensorBoard to visualize the training and testing process: https://www.cnblogs.com/xiximayou/p/12482573.html
Accepting command-line parameters: https://www.cnblogs.com/xiximayou/p/12488662.html
Using top-1 and top-5 accuracy to evaluate the model: https://www.cnblogs.com/xiximayou/p/12489069.html
Using the pretrained resnet18 model: https://www.cnblogs.com/xiximayou/p/12504579.html
Calculating the mean and variance of the dataset: https://www.cnblogs.com/xiximayou/p/12507149.html
The relationship between epoch, batch size, and step: https://www.cnblogs.com/xiximayou/p/12405485.html
PyTorch can read datasets in two ways; this section describes the second.
The dataset is stored with the following directory structure:
First, we need to write each image's path and label to a txt file. Create a new file Img_to_txt.py under utils:
import os
from glob import glob

root = "/content/drive/My Drive/colab notebooks/data/dogcat/"
train_path = root + "train"
val_path = root + "val"
test_path = root + "test"

def img_to_txt(path):
    tmp = path.strip().split("/")[-1]
    filename = tmp + ".txt"          # e.g. "train" -> train.txt
    with open(filename, 'a', encoding="utf-8") as fp:
        i = 0
        for f in sorted(os.listdir(path)):           # one class folder per label
            for image in glob(path + "/" + str(f) + "/*.jpg"):
                fp.write(image + " " + str(i) + "\n")
            i += 1                                   # next class gets the next label

img_to_txt(train_path)
#img_to_txt(val_path)
#img_to_txt(test_path)
Here os.listdir() returns the list of folders under path, ['cat', 'dog'], and glob() returns all files matching the pattern in a directory. To assign numeric labels to the classes in a fixed order, we sort the directory list; cat is then labeled 0 and dog 1. Each image's path and label are appended to the txt file.
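The labeling rule above can be sketched in isolation. This uses an in-memory list in place of os.listdir(), so the folder names are just stand-ins:

```python
# Sorted class-folder names get consecutive integer labels,
# mirroring the counter in img_to_txt(): cat -> 0, dog -> 1.
classes = sorted(["dog", "cat"])   # os.listdir() order is not guaranteed, hence sorted()
labels = {name: i for i, name in enumerate(classes)}
```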
After running, we obtain results similar to the following:
Next we implement our own dataset class, which must inherit from Dataset and override the __getitem__() and __len__() methods. Create a new file read_from_txt.py under utils:
from torch.utils.data import Dataset
from PIL import Image

class Dogcat(Dataset):
    def __init__(self, txt_path, transform=None, target_transform=None):
        super(Dogcat, self).__init__()
        self.txt_path = txt_path
        self.transform = transform
        self.target_transform = target_transform
        fp = open(txt_path, 'r')
        imgs = []
        for line in fp:
            line = line.strip().split()
            # Because the path contains spaces, it splits into several fields:
            # ['/content/drive/My', 'Drive/colab', 'notebooks/data/dogcat/train/cat/cat.9997.jpg', '0']
            # so rejoin the first three fields to recover the full path.
            img = line[0] + " " + line[1] + " " + line[2]
            imgs.append((img, int(line[-1])))
        self.imgs = imgs

    def __getitem__(self, index):
        image, label = self.imgs[index]
        image = Image.open(image).convert('RGB')
        if self.transform is not None:
            image = self.transform(image)
        return image, label

    def __len__(self):
        return len(self.imgs)
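Stripped of the image loading, what Dogcat implements is just the map-style dataset protocol: __getitem__ and __len__. A minimal pure-Python sketch (no torch needed to see the idea; the sample tuples are made up):

```python
class ToyDataset:
    """Any object with __getitem__ and __len__ can serve as a map-style dataset."""
    def __init__(self, samples):
        self.samples = samples          # list of (path, label) pairs

    def __getitem__(self, index):
        # Dogcat additionally opens the image and applies the transform here
        return self.samples[index]

    def __len__(self):
        return len(self.samples)

ds = ToyDataset([("cat.0.jpg", 0), ("dog.0.jpg", 1)])
```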
Since the paths contain spaces, take care when extracting the image path and label.
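Because the label is always the last space-separated field, an alternative (a sketch, not what the class above does) is to split only at the last space with rsplit, which keeps the path intact however many spaces it contains:

```python
# A sample line from train.txt whose path contains spaces:
line = "/content/drive/My Drive/colab notebooks/data/dogcat/train/cat/cat.9997.jpg 0"

# maxsplit=1 from the right: everything before the last space is the path
path, label = line.strip().rsplit(" ", 1)
```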
Then in rdata.py:
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
import torch
from utils import read_from_txt

def load_dataset_from_dataset(batch_size):
    # Preprocessing
    print(batch_size)
    train_transform = transforms.Compose([transforms.RandomResizedCrop(224),
                                          transforms.ToTensor()])
    val_transform = transforms.Compose([transforms.Resize((224, 224)),
                                        transforms.ToTensor()])
    test_transform = transforms.Compose([transforms.Resize((224, 224)),
                                         transforms.ToTensor()])
    root = "/content/drive/My Drive/colab notebooks/utils/"
    train_loader = DataLoader(read_from_txt.Dogcat(root + "train.txt", train_transform),
                              batch_size=batch_size, shuffle=True, num_workers=6)
    val_loader = DataLoader(read_from_txt.Dogcat(root + "val.txt", val_transform),
                            batch_size=batch_size, shuffle=True, num_workers=6)
    test_loader = DataLoader(read_from_txt.Dogcat(root + "test.txt", test_transform),
                             batch_size=batch_size, shuffle=True, num_workers=6)
    return train_loader, val_loader, test_loader
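transforms.Compose simply chains the transforms, feeding each output into the next. Conceptually (a pure-Python sketch with toy callables, not torchvision's actual implementation):

```python
class Compose:
    """Apply each transform in sequence, feeding each output into the next."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x

# Toy stand-ins for RandomResizedCrop / ToTensor:
pipeline = Compose([lambda x: x + 1, lambda x: x * 2])
```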
Then it is ready to use in main.py:
train_loader,val_loader,test_loader=rdata.load_dataset_from_dataset(batch_size)
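Conceptually, what each DataLoader then does is group dataset indices into batches (shuffling and worker processes aside). A pure-Python sketch over a toy dataset:

```python
def batches(dataset, batch_size):
    # Group consecutive indices into fixed-size batches; the last may be smaller
    for start in range(0, len(dataset), batch_size):
        yield [dataset[i] for i in range(start, min(start + batch_size, len(dataset)))]

# A toy "dataset" of 5 samples (lists already support __getitem__ and __len__)
toy = [("img%d.jpg" % i, i % 2) for i in range(5)]
batched = list(batches(toy, 2))
```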
Checking the error, I found duplicate file names in train.txt and deleted those duplicate entries.
Finally, run it:
At the end, an error is reported:
Could the image paths not have been read in before being handed to the DataLoader? A thread-safety problem? I haven't found a solution yet, but this is the overall process of creating a dataset this way.