DDP error: IndexError: list index out of range

The full error is as follows:

Traceback (most recent call last):
  File "/opt/conda/envs/env_cp36_microDL/lib/python3.6/multiprocessing/process.py", line 261, in _bootstrap
    util._exit_function()
  File "/opt/conda/envs/env_cp36_microDL/lib/python3.6/multiprocessing/util.py", line 319, in _exit_function
    p.join()
  File "/opt/conda/envs/env_cp36_microDL/lib/python3.6/multiprocessing/process.py", line 124, in join
    res = self._popen.wait(timeout)
  File "/opt/conda/envs/env_cp36_microDL/lib/python3.6/multiprocessing/popen_fork.py", line 50, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/opt/conda/envs/env_cp36_microDL/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
  File "/opt/conda/envs/env_cp36_microDL/lib/python3.6/site-packages/torch/utils/data/_utils/signal_handling.py", line 63, in handler
    def handler(signum, frame):
  File "/root/.vscode-server/extensions/ms-python.python-2022.6.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_trace_dispatch_regular.py", line 352, in __call__
    py_db, t, additional_info, cache_skips, frame_skips_cache = self._args
  File "/opt/conda/envs/env_cp36_microDL/lib/python3.6/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 1852385) is killed by signal: Terminated.
Process Process-2:
Traceback (most recent call last):
  File "/opt/conda/envs/env_cp36_microDL/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/opt/conda/envs/env_cp36_microDL/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/yingmuzhi/microDL_3_0/train/train_ddp.py", line 267, in run_ddp
    validation__mean_loss, metrics = MDL_trainer.evaluate_one_epoch(data_loader=validation_loader)
  File "/home/yingmuzhi/microDL_3_0/train/trainer/microDL_trainer.py", line 19, in wrapper
    result = func(*args, **kwargs)
  File "/home/yingmuzhi/microDL_3_0/train/trainer/microDL_trainer.py", line 246, in evaluate_one_epoch
    for step, (signal, target) in enumerate(data_loader):
  File "/opt/conda/envs/env_cp36_microDL/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/opt/conda/envs/env_cp36_microDL/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/opt/conda/envs/env_cp36_microDL/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/opt/conda/envs/env_cp36_microDL/lib/python3.6/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/envs/env_cp36_microDL/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/envs/env_cp36_microDL/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/envs/env_cp36_microDL/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/yingmuzhi/microDL_3_0/train/dataset.py", line 144, in __getitem__
    input = np.load(self.input[index])
IndexError: list index out of range

Solution: stop using dataset + batch_sampler to fetch batch_size samples per step. Instead, pass dataset + sampler + batch_size to the DataLoader, as follows:

train_loader = torch.utils.data.DataLoader(
        train_dataset,
        # batch_sampler=train_batch_sampler,  # replaced by sampler + batch_size below
        sampler=train_sampler,
        batch_size=args.batch_size,
        # shuffle=args.shuffle_images,  # must not be True together with sampler (DataLoader raises ValueError)
        num_workers=num_workers,
        pin_memory=args.pin_memory,
        collate_fn=train_dataset.collate_fn,
    )

The batch_sampler argument has been commented out.
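The post does not say why the batch_sampler produced a bad index. Note that the traceback fails while iterating validation_loader, in self.input[index]. One possible cause, which is an assumption and not confirmed by the source, is a batch sampler built over a different or longer dataset than the one handed to the DataLoader. A minimal plain-Python sketch for checking that is shown here; the helper name find_out_of_range_indices and the toy data are hypothetical and not part of the original project.

```python
def find_out_of_range_indices(sampler, dataset_len):
    """Return every index yielded by `sampler` that lies outside [0, dataset_len).

    `sampler` is any iterable that yields either single ints (like a sampler)
    or lists/tuples of ints (like a batch sampler).
    """
    bad = []
    for item in sampler:
        # A batch sampler yields a list of indices; a plain sampler yields one int.
        indices = item if isinstance(item, (list, tuple)) else [item]
        for idx in indices:
            if not 0 <= idx < dataset_len:
                bad.append(idx)
    return bad


# Toy stand-ins (hypothetical): a 5-element dataset and batches built for 8 elements.
toy_dataset = ["a", "b", "c", "d", "e"]
toy_batches = [[0, 1, 2, 3], [4, 5, 6, 7]]
print(find_out_of_range_indices(toy_batches, len(toy_dataset)))  # prints [5, 6, 7]
```

In a real run, the same check could be applied to the validation batch sampler and len(validation_dataset) before training starts. An empty result would suggest the cause lies elsewhere.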


Reposted from blog.csdn.net/qq_43369406/article/details/130769158