PyTorch Learning, Installment Seven: Batch Training

What is batch training? In the previous installments, the training code iterated like this:

for t in range(100):
    out = net(x)                  # forward pass on the full training set
    loss = loss_func(out, y)      # compute the loss
    optimizer.zero_grad()         # clear gradients from the previous step
    loss.backward()               # backpropagate
    optimizer.step()              # update the weights

In each iteration, this loop feeds every training sample through the network at once. When the training set is very large, or the samples cannot all be loaded at the same time, training this way becomes difficult; that is when the batch method comes in.

What is a batch? Each iteration uses only part of the training set, as a representative sample, to train the whole network (see the sketch below). This speeds up training, while the accuracy does not drop by much.
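Here is a minimal sketch (my addition, assuming the net, loss_func, and optimizer objects from the earlier installments, plus the loader built later in this post) of how the full-batch loop above changes once batches are supplied:

for epoch in range(100):
    for batch_x, batch_y in loader:       # one mini-batch per inner step
        out = net(batch_x)                # forward pass on this batch only
        loss = loss_func(out, batch_y)    # loss on the batch
        optimizer.zero_grad()             # clear old gradients
        loss.backward()                   # backpropagate
        optimizer.step()                  # update the weights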

PyTorch provides two main utilities for batch training (a short TensorDataset sketch follows this list):

  • Data.TensorDataset
    Wraps the training samples x and y into a single dataset object.

  • Data.DataLoader
    Splits a dataset into batches; the split is generally random.
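As a quick illustration (my addition, not part of the original code), TensorDataset simply pairs the two tensors up by index, so the dataset can be indexed like a list of (x, y) tuples:

import torch
import torch.utils.data as Data

x = torch.linspace(1, 10, 10)
y = torch.linspace(10, 1, 10)
ds = Data.TensorDataset(x, y)
print(len(ds))    # 10 -- one sample per row
print(ds[0])      # (tensor(1.), tensor(10.)) -- the first (x, y) pair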

The Code

import torch                       # import the modules
import torch.utils.data as Data

BATCH_SIZE = 5                     # number of samples per batch

x = torch.linspace(1, 10, 10)      # inputs: 1, 2, ..., 10
y = torch.linspace(10, 1, 10)      # targets: 10, 9, ..., 1


# Wrap the data in a Dataset that torch can work with
torch_dataset = Data.TensorDataset(x, y)


# torch.utils.data.DataLoader is defined in dataloader.py. Almost any model
# trained with PyTorch uses this interface: it takes the output of a custom
# data-reading interface (or one PyTorch already provides) and packages it
# into batch-size tensors that can be fed to the model (in old PyTorch
# versions, after wrapping them in a Variable), so it acts as the bridge
# between the data and the model.

loader = Data.DataLoader(
        dataset=torch_dataset,     # the dataset to draw batches from
        batch_size=BATCH_SIZE,     # batch size: five samples per batch
        shuffle=True,              # whether to shuffle the data each epoch
        num_workers=0,             # worker processes; 0 means the main process reads the data
        )

for epoch in range(3):             # run through the whole dataset 3 times
    for step, (batch_x, batch_y) in enumerate(loader):
        print('Epoch:', epoch, '|Step:', step, '|batch x:', batch_x.numpy(),
              '|batch y:', batch_y.numpy())

loader = Data.DataLoader(
        dataset=torch_dataset,     # the dataset to draw batches from
        batch_size=BATCH_SIZE,     # batch size: a batch of five
        shuffle=True,              # whether to shuffle the data
        num_workers=0,             # worker processes; 0 means the main process reads the data
        )

dataset is the dataset we just created; batch_size is the size of each batch; shuffle controls whether each batch is drawn at random from the dataset. num_workers sets how many worker processes read the data; on my machine any nonzero value fails to run, and I do not know why.
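A likely cause (my guess, not from the original post): on Windows, DataLoader workers are started by spawning new processes, so the loading code must sit under an if __name__ == '__main__': guard, otherwise num_workers > 0 crashes at startup. A minimal sketch:

import torch
import torch.utils.data as Data

def main():
    x = torch.linspace(1, 10, 10)
    y = torch.linspace(10, 1, 10)
    loader = Data.DataLoader(Data.TensorDataset(x, y),
                             batch_size=5, shuffle=True,
                             num_workers=2)          # worker processes now allowed
    for batch_x, batch_y in loader:
        print(batch_x.numpy(), batch_y.numpy())

if __name__ == '__main__':   # required on Windows when num_workers > 0
    main()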

for epoch in range(3):
    for step, (batch_x, batch_y) in enumerate(loader):
        print('Epoch:', epoch, '|Step:', step, '|batch x:', batch_x.numpy(),
              '|batch y:', batch_y.numpy())

for step, (batch_x, batch_y) in enumerate(loader):

This line creates an iterator over loader and then steps through it. step is the batch index within the epoch, and batch_x, batch_y are each a random slice of the training set, of size 5 (the batch size). The printed output is as follows:

Epoch: 0 |Step: 0 |batch x: [1. 2. 3. 4. 5.] |batch y: [10.  9.  8.  7.  6.]
Epoch: 0 |Step: 1 |batch x: [ 6.  7.  8.  9. 10.] |batch y: [5. 4. 3. 2. 1.]
Epoch: 1 |Step: 0 |batch x: [1. 2. 3. 4. 5.] |batch y: [10.  9.  8.  7.  6.]
Epoch: 1 |Step: 1 |batch x: [ 6.  7.  8.  9. 10.] |batch y: [5. 4. 3. 2. 1.]
Epoch: 2 |Step: 0 |batch x: [1. 2. 3. 4. 5.] |batch y: [10.  9.  8.  7.  6.]
Epoch: 2 |Step: 1 |batch x: [ 6.  7.  8.  9. 10.] |batch y: [5. 4. 3. 2. 1.]
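One more detail worth knowing (my addition, not in the original post): when the dataset size is not evenly divisible by batch_size, DataLoader simply makes the final batch smaller, unless drop_last=True is passed:

# 10 samples with batch_size=4: batches of size 4, 4, and 2
loader4 = Data.DataLoader(torch_dataset, batch_size=4, shuffle=False)
for step, (batch_x, batch_y) in enumerate(loader4):
    print(step, batch_x.numpy())          # the last batch has only 2 elements

# drop_last=True discards the incomplete final batch (2 steps instead of 3)
loader4 = Data.DataLoader(torch_dataset, batch_size=4, shuffle=False,
                          drop_last=True)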

Origin: blog.csdn.net/ronaldo_hu/article/details/91958278