Resolving the PyTorch BrokenPipeError: [Errno 32] Broken pipe


1. The reason for the error

On Windows, this is a multiprocessing problem related to the torch.utils.data.DataLoader class: the num_workers parameter is set improperly.

from torch.utils.data import DataLoader
...
# num_workers=16 spawns 16 loader subprocesses, which fails on Windows
# when this code runs at module top level
dataset_train = DataLoader(train_data, batch_size=batch_size, shuffle=True, num_workers=16)
dataset_test = DataLoader(test_data, batch_size=batch_size, shuffle=False, num_workers=16)

The official API documentation explains the num_workers parameter as follows: "num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)"

This parameter sets the number of worker subprocesses used to load the dataset. num_workers must be greater than or equal to 0; the default is 0. If it is 0, the dataset is loaded in the main process; if it is greater than 0, loading is accelerated by multiple subprocesses. On Windows, multiprocessing uses the spawn start method, which re-imports the main script in every worker process, so top-level DataLoader code that is not guarded triggers the broken pipe error.
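To see why the guard matters, here is a minimal sketch of the failing pattern, using a synthetic placeholder dataset rather than the original script's data:

import torch
from torch.utils.data import DataLoader, TensorDataset

# synthetic stand-in for train_data (not from the original script)
train_data = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))

# No `if __name__ == '__main__':` guard: on Windows, each spawned worker
# re-imports this module, and iterating the loader can raise
# BrokenPipeError: [Errno 32] Broken pipe
loader = DataLoader(train_data, batch_size=10, shuffle=True, num_workers=4)
for batch_x, batch_y in loader:
    pass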

2. The solution

  1. Set the num_workers value to 0 (the dataset is then loaded in the main process, which is slower but always safe):
from torch.utils.data import DataLoader
...
dataset_train = DataLoader(train_data, batch_size=batch_size, shuffle=True, num_workers=0)
dataset_test = DataLoader(test_data, batch_size=batch_size, shuffle=False, num_workers=0)
  2. If the num_workers value is greater than 0, put the code that runs the loaders inside an if __name__ == '__main__': guard so that no error is raised:
from torch.utils.data import DataLoader
...
if __name__ == '__main__':
    dataset_train = DataLoader(train_data, batch_size=batch_size, shuffle=True, num_workers=16)
    dataset_test = DataLoader(test_data, batch_size=batch_size, shuffle=False, num_workers=16)
  3. If you have put the running code inside the __main__ guard and an error is still raised, it is usually because num_workers is set too large; make it smaller (a sizing sketch follows the error message below). A typical symptom is:

    OSError: [WinError 1455] The paging file is too small to complete the operation. Error loading "F:\anaconda3\envs\xxx\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies.
    train(model, device, dataset_train, optimizer, epoch + 1, FocalLoss, batch_size)
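If you are unsure how much smaller to make it, one common heuristic (an assumption here, not from the original post) is to derive the value from the CPU count; train_data and batch_size are the same placeholders as in the snippets above:

import os

from torch.utils.data import DataLoader

# one worker per CPU thread, minus one for the main process;
# os.cpu_count() can return None, hence the fallback to 1
cpu_threads = os.cpu_count() or 1
num_workers = max(0, cpu_threads - 1)

dataset_train = DataLoader(train_data, batch_size=batch_size,
                           shuffle=True, num_workers=num_workers)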


Tips for setting the num_workers parameter:

When the dataset is small (fewer than about 20,000 samples), it is recommended to leave num_workers at its default of 0, because the overhead of the worker processes makes loading slower rather than faster.
When the dataset is large, multiprocess loading is recommended. A good starting point is to set num_workers to the number of CPU threads ± 1. You can use the following code to find the best num_workers value:

import time
import torch.utils.data as d
import torchvision
import torchvision.transforms as transforms

if __name__ == '__main__':
    BATCH_SIZE = 100
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (0.5,))])
    train_set = torchvision.datasets.MNIST('./mnist', download=True, train=True, transform=transform)

    # time one full pass over the data for each num_workers value from 0 to 19
    for num_workers in range(20):
        train_loader = d.DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True, num_workers=num_workers)
        # iterate through one epoch and measure the elapsed time
        start = time.time()
        for epoch in range(1):
            for step, (batch_x, batch_y) in enumerate(train_loader):
                pass
        end = time.time()
        print('num_workers is {} and it took {} seconds'.format(num_workers, end - start))
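
Pick the num_workers value with the lowest reported time. The optimum depends on the CPU, disk speed, and batch size of the machine, so re-run the test on the target machine, and on Windows keep the whole loop inside the if __name__ == '__main__': guard as shown above.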


