[python] Pitfalls and solutions for Python parallelization with multiprocessing.Pool

Pit 1: The function called by apply_async does not execute, or executes only partially

Solution: pass an error_callback to apply_async so that errors are actually reported


import os
import pickle

import numpy as np
from multiprocessing import Pool

def processFolder(idx, folders, o_dir):
    # TrainMeshes is defined elsewhere in the project
    train_mesh = TrainMeshes(folders)
    output_path = os.path.join(o_dir, str(idx) + '.pkl')
    with open(output_path, 'wb') as f:
        pickle.dump(train_mesh, f)


	
if __name__ == '__main__':
    train_mesh_folders = ['mesh1', 'mesh2']
    output_dir = 'output'  # destination directory for the .pkl files
    n_processes = os.cpu_count()
    n_processes = 2  # override the CPU count for this example
    print('n_processes: ', n_processes)
    pool = Pool(processes=n_processes)  # process pool
    # split the folder list into n_processes parts
    split_folders = np.array_split(train_mesh_folders, n_processes)

    pool.apply_async(processFolder, args=(0, split_folders[0], output_dir))
    pool.apply_async(processFolder, args=(1, split_folders[1], output_dir))

    pool.close()
    pool.join()

Running this multi-process program, the worker function exits after only a small part of it has run, and no error is reported. This is confusing: it looks as if nothing went wrong, but in fact an exception was raised and silently swallowed.

Add an error_callback to each pool.apply_async call in the program above, and the problem reveals itself:


import os
import pickle

import numpy as np
from multiprocessing import Pool

def processFolder(idx, folders, o_dir):
    # TrainMeshes is defined elsewhere in the project
    train_mesh = TrainMeshes(folders)
    output_path = os.path.join(o_dir, str(idx) + '.pkl')
    with open(output_path, 'wb') as f:
        pickle.dump(train_mesh, f)

def error_callback(e):
    print('error_callback: ', e)
	
	
if __name__ == '__main__':
    train_mesh_folders = ['mesh1', 'mesh2']
    output_dir = 'output'  # destination directory for the .pkl files
    n_processes = os.cpu_count()
    n_processes = 2  # override the CPU count for this example
    print('n_processes: ', n_processes)
    pool = Pool(processes=n_processes)  # process pool
    # split the folder list into n_processes parts
    split_folders = np.array_split(train_mesh_folders, n_processes)

    pool.apply_async(processFolder, args=(0, split_folders[0], output_dir),
                     error_callback=error_callback)
    pool.apply_async(processFolder, args=(1, split_folders[1], output_dir),
                     error_callback=error_callback)

    pool.close()
    pool.join()

This time the error is reported.

Strangely, the same code raises no error when run without multiprocessing, so perhaps my program is simply not well written. In any case, once the reported error was fixed, the multi-process version ran smoothly.
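
A complementary check: apply_async returns an AsyncResult, and calling .get() on it re-raises any exception from the worker in the parent process. A minimal sketch, with a hypothetical worker fail() that deliberately raises:

from multiprocessing import Pool

def fail(x):
    # deliberately raise, so there is an exception to propagate
    raise ValueError('boom in worker %d' % x)

if __name__ == '__main__':
    pool = Pool(processes=2)
    result = pool.apply_async(fail, args=(3,))
    pool.close()
    pool.join()
    result.get()  # re-raises the worker's ValueError here

Unlike error_callback, .get() also returns the worker's result on success, so it is an easy way to make silent failures loud.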

References:

  • The callback function of apply_async in a Python process pool is not executed: a solution
  • Python concurrent programming: why is the target function passed to the process pool not executed, with no error reported?

Pit 2: torch functions hang in worker processes created by multiprocessing.Pool

In my tests, both of these calls hang inside a worker process:

torch.min(V, 0)
torch.sparse.FloatTensor(i, v, torch.Size(shape))

Solution: use threads instead of processes: from multiprocessing.pool import ThreadPool as Pool (see the sketch below).
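
Since ThreadPool exposes the same interface as Pool, the change is a one-line import swap. A minimal sketch, with a hypothetical worker work() standing in for the real torch workload:

# ThreadPool has the same API as multiprocessing.Pool, but runs
# workers as threads in the current process, so the torch calls
# never cross a fork boundary
from multiprocessing.pool import ThreadPool as Pool

def work(idx):
    return idx * idx  # stand-in for the real torch workload

if __name__ == '__main__':
    pool = Pool(processes=2)  # number of threads
    results = [pool.apply_async(work, args=(i,)) for i in range(4)]
    pool.close()
    pool.join()
    print([r.get() for r in results])

The two functions below are the ones that actually hung for me in worker processes.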

import torch

def normalizeUnitCube(V):
    '''
    NORMALIZEUNITCUBE normalizes a shape to fit inside a bounding
    box of side length 0.5

    Inputs:
        V (|V|,3) torch array of vertex positions

    Outputs:
        V (|V|,3) torch array of normalized vertex positions
    '''
    # this reduction hangs when run in a worker process
    V = V - torch.min(V, 0)[0].unsqueeze(0)
    # the per-column variant below works; this is the workaround:
    # x_min = torch.min(V[:, 0])
    # y_min = torch.min(V[:, 1])
    # z_min = torch.min(V[:, 2])
    # min_bound = torch.tensor([x_min, y_min, z_min]).unsqueeze(0)
    # V = V - min_bound

    V = V / torch.max(V.view(-1)) / 2.0
    return V

The function above normalizes a set of points; V is a two-dimensional (N, 3) tensor of vertex positions. In my tests, torch.min(V, 0) gets stuck inside a worker process, and only the commented-out per-column variant, torch.min(V[:, 0]), runs.
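
Another workaround often suggested for torch hangs in forked worker processes, which I have not verified for this exact code and so offer only as an assumption, is the 'spawn' start method, which starts fresh interpreters instead of forking:

import multiprocessing as mp

def work(_):
    import torch  # imported inside the freshly spawned worker
    v = torch.rand(10, 3)
    return torch.min(v, 0)[0]  # the reduction that hung under fork

if __name__ == '__main__':
    # 'spawn' avoids inheriting torch's internal thread state
    ctx = mp.get_context('spawn')
    with ctx.Pool(processes=2) as pool:
        print(pool.map(work, range(2)))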

import numpy as np
import torch

def tgp_midPointUp(V, F, subdIter=1):
    """
    perform midpoint upsampling
    """
    # midPointUpsampling is defined elsewhere in the project and
    # returns numpy arrays plus a scipy sparse matrix
    Vnp = V.data.numpy()
    Fnp = F.data.numpy()
    VVnp, FFnp, SSnp = midPointUpsampling(Vnp, Fnp, subdIter)
    VV = torch.from_numpy(VVnp).float()
    FF = torch.from_numpy(FFnp).long()

    SSnp = SSnp.tocoo()
    values = SSnp.data
    indices = np.vstack((SSnp.row, SSnp.col))
    i = torch.LongTensor(indices)
    v = torch.FloatTensor(values)
    shape = SSnp.shape
    SS = torch.sparse.FloatTensor(i, v, torch.Size(shape))  # hangs here in a worker process
    return VV, FF, SS

Common solutions

  • Use the multiprocessing module from the pathos package in place of the standard multiprocessing. pathos rewrites it on top of dill, which can serialize almost all Python types, so they can be pickled (see the sketch after this list).
  • Use threads instead of processes: from multiprocessing.pool import ThreadPool as Pool
  • Register custom pickling functions with copy_reg (copyreg in Python 3) to avoid the exception
  • Define the called function at module top level so it can be pickled
  • Rewrite inner functions of a class (e.g. move them out of the class) to avoid the pickling problem
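
A minimal sketch of the pathos option, assuming pathos is installed (pip install pathos):

# pathos's pools serialize with dill instead of pickle, so lambdas,
# closures, and locally defined functions can all be sent to workers
from pathos.multiprocessing import ProcessingPool as Pool

if __name__ == '__main__':
    pool = Pool(nodes=2)
    # a lambda would make the standard multiprocessing.Pool raise
    # a PicklingError; dill serializes it without complaint
    print(pool.map(lambda x: x * 2, [1, 2, 3]))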

References:

  • The pits I stepped on with Python multiprocessing

Original post: blog.csdn.net/weixin_43693967/article/details/131404109