[Dogs and Cats] Reading the dataset: the second way

Download the dataset:

Link: https://pan.baidu.com/s/1l1AnBgkAAEhh0vI5_loWKw
Extraction code: 2xq4

Creating the dataset: https://www.cnblogs.com/xiximayou/p/12398285.html

Reading the dataset: https://www.cnblogs.com/xiximayou/p/12422827.html

Training: https://www.cnblogs.com/xiximayou/p/12448300.html

Saving the model and resuming training: https://www.cnblogs.com/xiximayou/p/12452624.html

Loading a saved model and testing: https://www.cnblogs.com/xiximayou/p/12459499.html

Splitting training and validation sets, and validating while training: https://www.cnblogs.com/xiximayou/p/12464738.html

Using a learning-rate schedule and weight decay while training and testing: https://www.cnblogs.com/xiximayou/p/12468010.html

Using TensorBoard to visualize the training and testing process: https://www.cnblogs.com/xiximayou/p/12482573.html

Accepting command-line arguments: https://www.cnblogs.com/xiximayou/p/12488662.html

Using top-1 and top-5 accuracy to evaluate the model: https://www.cnblogs.com/xiximayou/p/12489069.html

Using a pretrained resnet18 model: https://www.cnblogs.com/xiximayou/p/12504579.html

Computing the mean and variance of the dataset: https://www.cnblogs.com/xiximayou/p/12507149.html

The relationship between epoch, batch size, and step: https://www.cnblogs.com/xiximayou/p/12405485.html

 

PyTorch can read datasets in two ways. This section describes the second one.

The dataset is stored with the following directory structure:

First, we need to store each image's path and label in a txt file. Create a new file img_to_txt.py in utils:

import os
from glob import glob

root="/content/drive/My Drive/colab notebooks/data/dogcat/"
train_path=root+"train"
val_path=root+"val"
test_path=root+"test"

def img_to_txt(path):
  # name the txt file after the last path component: train.txt / val.txt / test.txt
  tmp=path.strip().split("/")[-1]
  filename=tmp+".txt"
  with open(filename,'a',encoding="utf-8") as fp:
    i=0
    # sort the class folders so labels are assigned in a fixed order: cat->0, dog->1
    for f in sorted(os.listdir(path)):
      for image in glob(path+"/"+str(f)+"/*.jpg"):
        fp.write(image+" "+str(i)+"\n")
      i+=1

img_to_txt(train_path)
#img_to_txt(val_path)
#img_to_txt(test_path)

Here os.listdir() returns the list of subfolders under path: ['cat', 'dog']. glob() returns all matching files in a directory. So that the classes get numeric labels in a consistent order, we sort the directory list; cat is then labeled 0 and dog is labeled 1. Each image's path and its label are appended to the txt file.
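The labeling logic above can be sketched in a self-contained way using a temporary directory in place of the real dataset (the folder and file names below are made up for the demo):

```python
import os
import tempfile
from glob import glob

def build_label_lines(path):
    """Mimic img_to_txt: sorted subfolders get labels 0, 1, ... in order."""
    lines = []
    for i, f in enumerate(sorted(os.listdir(path))):
        for image in sorted(glob(os.path.join(path, f, "*.jpg"))):
            lines.append(image + " " + str(i))
    return lines

# Build a fake train/ folder: 'cat' sorts before 'dog', so cat -> 0, dog -> 1.
with tempfile.TemporaryDirectory() as root:
    for cls in ("dog", "cat"):
        os.makedirs(os.path.join(root, cls))
        open(os.path.join(root, cls, cls + ".0.jpg"), "w").close()
    for line in build_label_lines(root):
        print(line.split(os.sep)[-1])  # -> "cat.0.jpg 0" then "dog.0.jpg 1"
```

Note that the creation order of the folders does not matter; only the sorted order determines the labels.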

After running it, we get output similar to this:

Next we implement our own dataset class. It must inherit from Dataset and override the __getitem__() and __len__() methods. Create a new file read_from_txt.py in utils:

from torch.utils.data import Dataset
from PIL import Image

class Dogcat(Dataset):
  def __init__(self,txt_path,transform=None,target_transform=None):
    super(Dogcat,self).__init__()
    self.txt_path=txt_path
    self.transform=transform
    self.target_transform=target_transform
    imgs=[]
    with open(txt_path,'r') as fp:
      for line in fp:
        line=line.strip().split()
        # the path contains spaces, so split() breaks it into several tokens:
        # ['/content/drive/My', 'Drive/colab', 'notebooks/data/dogcat/train/cat/cat.9997.jpg', '0']
        # rejoin the path tokens and keep the last token as the label
        img=" ".join(line[:-1])
        imgs.append((img,int(line[-1])))
    self.imgs=imgs
  def __getitem__(self,index):
    image,label=self.imgs[index]
    image=Image.open(image).convert('RGB')
    if self.transform is not None:
      image=self.transform(image)
    return image,label
  def __len__(self):
    return len(self.imgs)

Note that since the path contains spaces, we must take care when separating the image path from the label.
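Because only the final space separates the path from the label, an alternative (not in the original code) is to split once from the right with rsplit, which works for any number of spaces inside the path:

```python
def parse_line(line):
    """Split an 'image_path label' line, tolerating spaces inside the path."""
    path, label = line.strip().rsplit(" ", 1)
    return path, int(label)

path, label = parse_line(
    "/content/drive/My Drive/colab notebooks/data/dogcat/train/cat/cat.9997.jpg 0")
print(path)   # the full path, spaces intact
print(label)  # -> 0
```

With this helper, the loop in __init__ could simply do `imgs.append(parse_line(line))` instead of rejoining the split tokens.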

Then in rdata.py:

from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
import torch
from utils import read_from_txt

def load_dataset_from_dataset(batch_size):
  # preprocessing
  print(batch_size)
  train_transform = transforms.Compose([transforms.RandomResizedCrop(224),transforms.ToTensor()])
  val_transform = transforms.Compose([transforms.Resize((224,224)),transforms.ToTensor()])
  test_transform = transforms.Compose([transforms.Resize((224,224)),transforms.ToTensor()])
  root="/content/drive/My Drive/colab notebooks/utils/"
  train_loader = DataLoader(read_from_txt.Dogcat(root+"train.txt",train_transform), batch_size=batch_size, shuffle=True, num_workers=6)
  val_loader = DataLoader(read_from_txt.Dogcat(root+"val.txt",val_transform), batch_size=batch_size, shuffle=True, num_workers=6)
  test_loader = DataLoader(read_from_txt.Dogcat(root+"test.txt",test_transform), batch_size=batch_size, shuffle=True, num_workers=6)
  return train_loader,val_loader,test_loader

Then it is ready to use in main.py:

train_loader,val_loader,test_loader=rdata.load_dataset_from_dataset(batch_size)

Checking the error, I found duplicate file names in train.txt and deleted those duplicate lines.
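One way to locate such duplicates before training is to count repeated image paths in the txt lines; a small sketch (the sample paths below are made up):

```python
from collections import Counter

def find_duplicate_paths(lines):
    """Return image paths that appear more than once in the txt lines."""
    paths = [line.strip().rsplit(" ", 1)[0] for line in lines if line.strip()]
    return [p for p, n in Counter(paths).items() if n > 1]

sample = [
    "/data/dogcat/train/cat/cat.1.jpg 0",
    "/data/dogcat/train/cat/cat.1.jpg 0",
    "/data/dogcat/train/dog/dog.1.jpg 1",
]
print(find_duplicate_paths(sample))  # -> ['/data/dogcat/train/cat/cat.1.jpg']
```

Running this over the lines of train.txt would list exactly the entries that need to be removed.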

Run it again:

This time the error is:

Could it be that some image paths were not read and were still passed into the DataLoader? A thread-safety issue? I have not found a solution yet. But this is the overall process of creating a dataset this way.

 


Origin: www.cnblogs.com/xiximayou/p/12516735.html