What does a deep learning project need?

DataLoader

1. Data preprocessing

The DataLoader needs to preprocess the data before feeding it to the model. Preprocessing can include operations such as data augmentation, normalization, cropping, and scaling, which can improve the performance and accuracy of the model. When processing point cloud data, farthest point sampling can be used to down-sample the cloud to a fixed number of points.
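
For point clouds, here is a minimal NumPy sketch of farthest point sampling; the function name and the (N, 3) shape are illustrative assumptions, not a fixed API:

import numpy as np

def farthest_point_sample(points, n_samples):
    # points: (N, 3) array; returns n_samples points spread as far apart as possible
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)
    dist = np.full(n, np.inf)             # distance from each point to the chosen set
    farthest = np.random.randint(n)       # start from a random point
    for i in range(n_samples):
        selected[i] = farthest
        d = np.sum((points - points[farthest]) ** 2, axis=1)
        dist = np.minimum(dist, d)        # keep the distance to the nearest chosen point
        farthest = int(np.argmax(dist))   # next pick: the point farthest from the chosen set
    return points[selected]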

2. Read the label file

For example, suppose the label file w123.txt contains the following three lines:

我 1 2 3
爱 45 6
python

matrix_file = open(r"D:\py_code\w123.txt", 'r', encoding='utf-8')  # open the file (raw string, so the backslashes are kept literally)
lines = matrix_file.readlines()  # read all lines of the file
print(lines)
print(len(lines))

Output:
['我 1 2 3\n', '爱 45 6\n', 'python']
3
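
To turn those raw lines into usable fields, they can then be split on whitespace; a small follow-on sketch:

matrix = [line.split() for line in lines]  # str.split() also strips the trailing '\n'
print(matrix)  # [['我', '1', '2', '3'], ['爱', '45', '6'], ['python']]
matrix_file.close()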

3. Dataset division

Generate file lists for the training set, the test set, and the whole dataset, and save each list as a txt file. The proportion assigned to the training set can be chosen freely.
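
A minimal sketch of such a split, assuming one sample per file under data_dir; the helper name and the 0.8 ratio are illustrative assumptions:

import os
import random

def split_dataset(data_dir, train_ratio=0.8, seed=0):
    files = sorted(os.listdir(data_dir))          # file list of the whole dataset
    random.Random(seed).shuffle(files)
    n_train = int(len(files) * train_ratio)
    splits = {"train.txt": files[:n_train],
              "test.txt": files[n_train:],
              "all.txt": files}
    for name, subset in splits.items():
        with open(name, "w", encoding="utf-8") as f:   # save each list as a txt file
            f.write("\n".join(subset))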

4. The data loading class

The MyDataset(Dataset) class implements methods such as __init__(), __len__(), and __getitem__().

Dataset: the parent class, a template (abstraction) for every dataset that developers use for training and testing. It is an abstract class: any dataset that wants to establish a mapping between data and labels must inherit from it. Every subclass must override the __getitem__ method, which returns one data item and its corresponding label given an index value; subclasses may also override the __len__ method to return the size of the dataset.

MyDataset: the subclass, a concrete dataset that inherits all the methods and properties of the parent class.

How to override Dataset?

[Deep Learning] The use and example analysis of the PyTorch Dataset class - Zhihu (zhihu.com)

Once it is clear how to assemble the path, list the file names under it, and load each concrete data object, override the methods:

__init__() stores all file names under the data directory

__len__() returns the length of the dataset

__getitem__() returns a single data item rather than the entire dataset, so that each index corresponds to one specific sample

from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self):  # variables in one method are not visible in another; self acts like a class-wide global
        print("1")
    def __len__(self):
        print("2")
    def __getitem__(self, idx):
        print("3")
dataset = MyDataset()  # __init__ is called automatically

Output:
1
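
A slightly fuller sketch of an overridden Dataset, assuming (hypothetically) that each line of the file list reads "<filename> <label>" and that each sample on disk is a text file of numbers:

import os
import numpy as np
from torch.utils.data import Dataset

class MyFileDataset(Dataset):
    def __init__(self, data_dir, list_file):
        # store all file names (and labels) listed for this split
        self.data_dir = data_dir
        with open(list_file, encoding="utf-8") as f:
            self.items = [line.split() for line in f if line.strip()]
    def __len__(self):
        return len(self.items)                 # size of the dataset
    def __getitem__(self, idx):
        name, label = self.items[idx]          # one item, not the whole set
        path = os.path.join(self.data_dir, name)
        data = np.loadtxt(path, dtype=np.float32)  # assumed on-disk format
        return data, int(label)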

5. Create a data loader

import torch

data = MyDataset()  # instantiate the dataset
loader = torch.utils.data.DataLoader(data, batch_size=64, shuffle=True)  # renamed so it does not shadow the DataLoader class

The loader divides the samples into batches according to batch_size (the number of samples contained in each batch) and, with shuffle=True, randomly reshuffles the data at the start of each epoch (one epoch traverses all samples).
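
In use, the loader is simply iterated; a short sketch, assuming __getitem__ returns (data, label) pairs and with an illustrative epoch count:

for epoch in range(10):             # shuffle=True reshuffles here, once per epoch
    for batch, labels in loader:    # each batch holds up to 64 samples
        pass                        # forward / backward / optimizer step go here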

Train

1. def parse_args()

Parameters include batch_size/model/epoch/learning_rate/gpu/optimizer/data_path/result_savepath/check_savepath/log_dir/decay_rate, etc.
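
A hedged argparse sketch with the parameters above; every default value here is an assumption for illustration:

import argparse

def parse_args():
    parser = argparse.ArgumentParser("training")
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--model", type=str, default="pointnet_cls")
    parser.add_argument("--epoch", type=int, default=200)
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--gpu", type=str, default="0")
    parser.add_argument("--optimizer", type=str, default="Adam")
    parser.add_argument("--data_path", type=str, default="./data")
    parser.add_argument("--log_dir", type=str, default="./log")
    parser.add_argument("--decay_rate", type=float, default=1e-4)
    return parser.parse_args()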

2. def valid(model, loader)

Tests the performance of the network model on the validation set and returns the accuracy.
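
A minimal sketch of such a function, assuming a classification model whose output row per sample is a vector of class scores:

import torch

def valid(model, loader):
    # accuracy of the model over a validation loader
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for data, labels in loader:
            pred = model(data).argmax(dim=1)
            correct += (pred == labels).sum().item()
            total += labels.size(0)
    return correct / total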

3. def main(args)

Set which GPU to run on

Create a directory

Set the log file to record some information during the training process

Load training and test data

Load the model

Adjust the learning rate over the course of training (see JNingWei's CSDN blog post on learning rates)

Training: train once per epoch (one full pass over the training set)

Validate: the validation set is used to check how the network performs after each epoch of training. The test set can stand in for the validation set, but the validation set must never come from the training set (see doubleslow's CSDN blog post on cross-validation and the leave-one-out method). A sketch of the whole training driver follows below.
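
Putting the steps above together, a minimal sketch of main(args); the model and data here are dummy stand-ins, not the real ones, and valid() is the sketch from earlier:

import os
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def main(args):
    os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu      # set which GPU to run on
    os.makedirs(args.log_dir, exist_ok=True)           # create the log directory
    # dummy stand-ins for the real dataset and model
    train_set = TensorDataset(torch.randn(512, 16), torch.randint(0, 4, (512,)))
    valid_set = TensorDataset(torch.randn(128, 16), torch.randint(0, 4, (128,)))
    train_loader = DataLoader(train_set, batch_size=args.batch_size, shuffle=True)
    valid_loader = DataLoader(valid_set, batch_size=args.batch_size)
    model = nn.Linear(16, 4)
    optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate,
                                 weight_decay=args.decay_rate)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.7)
    best_acc = 0.0
    for epoch in range(args.epoch):
        model.train()
        for data, labels in train_loader:              # train once per epoch
            optimizer.zero_grad()
            loss = nn.functional.cross_entropy(model(data), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()                               # adjust the learning rate
        acc = valid(model, valid_loader)               # check after every epoch
        if acc > best_acc:                             # keep the best checkpoint
            best_acc = acc
            torch.save(model.state_dict(), "best_model.pth")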

Test

1. def parse_args()

Parameters include batch_size/model/gpu/data_path/save_path/pth_path, etc.

2. def main(args)

Create a data storage directory

Load the test data

Load the model: the best model obtained from training. A sketch of the test driver follows below.
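
A matching sketch for the test driver; the data is a dummy stand-in again, and valid() is reused from the training sketch:

import os
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def test_main(args):
    os.makedirs(args.save_path, exist_ok=True)         # data storage directory
    test_set = TensorDataset(torch.randn(128, 16), torch.randint(0, 4, (128,)))
    test_loader = DataLoader(test_set, batch_size=args.batch_size)
    model = nn.Linear(16, 4)
    model.load_state_dict(torch.load(args.pth_path))   # the best model from training
    print("test accuracy:", valid(model, test_loader))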


Source: blog.csdn.net/m0_67357141/article/details/129556437