Pit 1: The function passed to apply_async does not execute, or executes only partially
Solution: pass an error_callback to apply_async so that errors are actually reported.
import os
import pickle
import numpy as np
from multiprocessing import Pool

def processFolder(idx, folders, o_dir):
    train_mesh = TrainMeshes(folders)  # TrainMeshes comes from the author's codebase
    output_path = os.path.join(o_dir, str(idx) + '.pkl')
    with open(output_path, "wb") as f:
        pickle.dump(train_mesh, f)

if __name__ == '__main__':
    train_mesh_folders = ['mesh1', 'mesh2']
    output_dir = 'output'  # output directory; defined elsewhere in the original
    n_processes = os.cpu_count()
    n_processes = 2  # overridden to 2 for this example
    print('n_processes: ', n_processes)
    pool = Pool(processes=n_processes)  # process pool
    # split the folders into n_processes parts
    split_folders = np.array_split(train_mesh_folders, n_processes)
    pool.apply_async(processFolder, args=(0, split_folders[0], output_dir,))
    pool.apply_async(processFolder, args=(1, split_folders[1], output_dir,))
    pool.close()
    pool.join()
Running this multi-process program, the worker function exits after only a small part of it has run, and no error is reported. This is confusing: it looks as if nothing went wrong, but in fact an exception was raised inside the worker. Because apply_async runs the function asynchronously, any exception is captured by the pool rather than printed; unless you attach an error_callback (or call .get() on the returned AsyncResult), it is silently discarded. Add error_callback to each pool.apply_async call and the problem shows up:
import os
import pickle
import numpy as np
from multiprocessing import Pool

def processFolder(idx, folders, o_dir):
    train_mesh = TrainMeshes(folders)  # TrainMeshes comes from the author's codebase
    output_path = os.path.join(o_dir, str(idx) + '.pkl')
    with open(output_path, "wb") as f:
        pickle.dump(train_mesh, f)

def error_callback(e):
    print('error_callback: ', e)

if __name__ == '__main__':
    train_mesh_folders = ['mesh1', 'mesh2']
    output_dir = 'output'  # output directory; defined elsewhere in the original
    n_processes = os.cpu_count()
    n_processes = 2  # overridden to 2 for this example
    print('n_processes: ', n_processes)
    pool = Pool(processes=n_processes)  # process pool
    # split the folders into n_processes parts
    split_folders = np.array_split(train_mesh_folders, n_processes)
    pool.apply_async(processFolder, args=(0, split_folders[0], output_dir,), error_callback=error_callback)
    pool.apply_async(processFolder, args=(1, split_folders[1], output_dir,), error_callback=error_callback)
    pool.close()
    pool.join()
As you can see, there is an error after all! Strangely, the same code raises no error when run without multiprocessing, so perhaps my program is simply not well written. In any case, once the error is surfaced it can be fixed, and the multi-process version then runs smoothly.
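For reference, an alternative to error_callback: apply_async returns an AsyncResult, and calling .get() on it re-raises the worker's exception in the parent process. A minimal sketch with a deliberately failing stand-in worker (not the processFolder above):

from multiprocessing import Pool

def worker(x):
    return 1 / x  # fails for x == 0, to show how the exception surfaces

if __name__ == '__main__':
    pool = Pool(processes=2)
    results = [pool.apply_async(worker, args=(x,)) for x in (1, 0)]
    pool.close()
    pool.join()
    for r in results:
        try:
            print(r.get())  # re-raises any exception from the worker here
        except Exception as e:
            print('worker failed:', e)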
Pit 2: torch-related functions get stuck in a process pool (from multiprocessing import Pool)
In testing, both of the following calls get stuck when run inside a pool worker:

torch.min(V, 0)
torch.sparse.FloatTensor(i, v, torch.Size(shape))
Solution: use threads instead of processes; change the import to from multiprocessing.pool import ThreadPool as Pool.
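ThreadPool exposes the same interface as Pool (apply_async, map, close, join), so the swap is usually a one-line change. A minimal sketch of the swap, using a small torch-based worker invented for illustration:

from multiprocessing.pool import ThreadPool as Pool  # threads instead of processes
import torch

def shift_to_origin(V):
    # illustrative worker: the same torch.min reduction that hung in a subprocess
    return V - torch.min(V, 0)[0].unsqueeze(0)

if __name__ == '__main__':
    pool = Pool(processes=4)  # 4 worker threads, despite the parameter name
    results = [pool.apply_async(shift_to_origin, args=(torch.rand(10, 3),)) for _ in range(2)]
    pool.close()
    pool.join()
    for r in results:
        print(r.get().shape)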
import torch

def normalizeUnitCube(V):
    '''
    NORMALIZEUNITCUBE normalize a shape so that its bounding box
    fits within 0.5 x 0.5 x 0.5
    Inputs:
        V (|V|,3) torch array of vertex positions
    Outputs:
        V (|V|,3) torch array of normalized vertex positions
    '''
    V = V - torch.min(V, 0)[0].unsqueeze(0)  # gets stuck here when run in a subprocess
    # x_min = torch.min(V[:, 0])
    # y_min = torch.min(V[:, 1])
    # z_min = torch.min(V[:, 2])
    # min_bound = torch.tensor([x_min, y_min, z_min]).unsqueeze(0)
    # V = V - min_bound
    V = V / torch.max(V.view(-1)) / 2.0
    return V
The function above normalizes a set of points; V is a two-dimensional (N, 3) tensor of vertex positions. In testing, torch.min(V, 0) gets stuck when called in a worker process, and only the commented-out per-column form, torch.min(V[:, 0]), runs.
import numpy as np
import torch

def tgp_midPointUp(V, F, subdIter=1):
    """
    perform mid-point upsampling
    """
    Vnp = V.data.numpy()
    Fnp = F.data.numpy()
    # midPointUpsampling comes from the author's codebase
    VVnp, FFnp, SSnp = midPointUpsampling(Vnp, Fnp, subdIter)
    VV = torch.from_numpy(VVnp).float()
    FF = torch.from_numpy(FFnp).long()
    SSnp = SSnp.tocoo()  # scipy sparse matrix in COO format
    values = SSnp.data
    indices = np.vstack((SSnp.row, SSnp.col))
    i = torch.LongTensor(indices)
    v = torch.FloatTensor(values)
    shape = SSnp.shape
    SS = torch.sparse.FloatTensor(i, v, torch.Size(shape))  # gets stuck here in a subprocess
    return VV, FF, SS
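As an aside, torch.sparse.FloatTensor is the legacy constructor; newer PyTorch versions offer torch.sparse_coo_tensor for the same scipy-COO-to-torch conversion. A minimal sketch of just that step (whether it also avoids the hang in a subprocess is untested here):

import numpy as np
import scipy.sparse
import torch

S = scipy.sparse.random(4, 4, density=0.5).tocoo()  # stand-in for SSnp

indices = torch.from_numpy(np.vstack((S.row, S.col))).long()
values = torch.from_numpy(S.data).float()
SS = torch.sparse_coo_tensor(indices, values, size=S.shape)
print(SS)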
Common solutions
- Use the multiprocessing module from the pathos package in place of the standard multiprocessing. pathos's version is rewritten on top of the dill package, and dill can serialize almost all Python types, so objects that standard pickle rejects can still be sent to workers (see the sketch after this list).
- Use threads instead of processes: from multiprocessing.pool import ThreadPool as Pool
- Use copyreg (copy_reg in Python 2) to register pickling support and avoid the exception.
- Define the function passed to the pool at module top level, so that it can be pickled.
- Rewrite inner functions of a class (instance methods, nested functions) so they do not have to be pickled.
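For the pathos route, a minimal sketch (assuming pathos is installed; even a lambda can be dispatched here, because dill can serialize it, unlike the standard pickle):

from pathos.multiprocessing import ProcessingPool as Pool

if __name__ == '__main__':
    pool = Pool(2)
    # dill serializes the lambda, which standard multiprocessing cannot pickle
    print(pool.map(lambda x: x * x, [1, 2, 3, 4]))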