[Solved] DDP multi-GPU training hangs at "All distributed processes registered. Starting with 8 processes"

When using DDP to speed up training across multiple GPUs, the run hangs at the following output:

----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 8 processes
----------------------------------------------------------------------------------------------------

Solution

Set the following environment variable in the shell before launching the training job:

export NCCL_P2P_DISABLE=1
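
NCCL_P2P_DISABLE=1 tells NCCL not to use direct GPU-to-GPU peer-to-peer transport, so communication falls back to other paths such as shared memory. If exporting the variable in the shell is inconvenient, the same setting can probably be applied from inside the training script. The following is a minimal sketch, assuming the script is written in Python and that the variable is set before any distributed process group or trainer is created; the helper name disable_nccl_p2p is hypothetical and is not part of PyTorch or NCCL.

```python
import os


def disable_nccl_p2p():
    """Hypothetical helper: set NCCL_P2P_DISABLE=1 for this process.

    Child processes started afterwards inherit the variable from this
    process's environment.
    """
    os.environ["NCCL_P2P_DISABLE"] = "1"
    return os.environ["NCCL_P2P_DISABLE"]


# Assumption: this runs at the very top of the training script, before
# the distributed backend is initialized, so NCCL sees the value when
# it sets up its communicators.
disable_nccl_p2p()
```

Disabling P2P may reduce inter-GPU bandwidth, so this is best treated as a workaround for the hang rather than a default setting.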

Reposted from blog.csdn.net/CCCDeric/article/details/133993371