How to use multiprocessing.Manager().Value to store a sum?

kjs :

I want to accumulate a sum using multiprocessing.Pool. Here's how I tried:

import multiprocessing

def add_to_value(addend, value):
    value.value += addend

with multiprocessing.Manager() as manager:
    value = manager.Value(float, 0.0)
    with multiprocessing.Pool(2) as pool:
        pool.starmap(add_to_value,
                     [(float(i), value) for i in range(100)])
    print(value.value)

This gives incorrect and even inconsistent results. For instance, one time it gives 2982.0 and another it gives 2927.0. The correct output is 4950.0, and I do get this when I use only one process in the call to Pool, rather than 2. I'm using Python 3.7.5.

filbranden :

The multiprocessing documentation (under multiprocessing.Value) is quite explicit about this:

Operations like += which involve a read and write are not atomic. So if, for instance, you want to atomically increment a shared value it is insufficient to just do counter.value += 1.
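
To make the quoted warning concrete, here is a minimal sketch of a lost update. It uses no real processes: the interleaving is written out by hand, and the names shared, read_a and read_b are purely illustrative. It assumes two workers that both read the value before either one writes it back, which is the kind of overlap that can occur with value.value += addend.

```python
# Deterministic illustration of a lost update (no real processes involved).
# `value.value += addend` is a read, an addition, and a write back.
shared = 0.0

# Worker A and worker B both read the current value before either writes.
read_a = shared          # A reads 0.0
read_b = shared          # B reads 0.0

shared = read_a + 1.0    # A writes back 1.0
shared = read_b + 2.0    # B writes back 2.0, overwriting A's update

print(shared)            # 2.0 rather than the expected 3.0
```

With real workers the interleaving varies from run to run, which would explain why the totals in the question differ each time.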

In short, you need to grab a lock to be able to do this.

You can do that with:

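import multiprocessing

def add_to_value(addend, value, lock):
    # Hold the lock across the whole read-modify-write so no update is lost.
    with lock:
        value.value += addend

if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        # A manager lock is a proxy, so it can be passed to pool workers as an argument.
        lock = manager.Lock()
        value = manager.Value(float, 0.0)
        with multiprocessing.Pool(2) as pool:
            pool.starmap(add_to_value,
                         [(float(i), value, lock) for i in range(100)])
        print(value.value)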

This will correctly output 4950.0.

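But note that this approach is expensive: every access to value.value is a round trip to the manager process, and the lock serializes the workers so that only one of them can update the value at a time. Most probably, it will take more time to finish than having a single process do the whole operation.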

NOTE: I also added an if __name__ == '__main__': guard, which is required when using a start method other than fork. spawn is the default on Windows, and on macOS since Python 3.8, so the guard is needed to make this code portable to those platforms. The spawn and forkserver start methods are also available on Linux/Unix, so in some situations the guard is needed there too.
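
If you are unsure which start method applies on your platform, a quick check along these lines may help. This is only a sketch: the variable names method and available are illustrative, and the values depend on the operating system and Python version.

```python
import multiprocessing

# Start method that multiprocessing uses by default on this platform.
method = multiprocessing.get_start_method()

# All start methods this platform supports.
available = multiprocessing.get_all_start_methods()

print(method, available)
```

If the reported method is anything other than fork, the guard is needed for the locked version to run.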

Multiprocessing is more efficient when you can hand each worker a job it can complete on its own, for example calculating partial sums and then adding them together in the main process. If possible, consider rethinking your approach to fit that model.
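
As a rough sketch of that partial-sum model (not a drop-in replacement for the code in the question): the helper name parallel_sum and the strided chunking are my own assumptions. Each worker sums its own chunk with the built-in sum, and the main process adds up the partial results, so no shared value or lock is involved.

```python
import multiprocessing

def parallel_sum(values, workers=2):
    """Sum `values` by giving each worker one chunk (hypothetical helper)."""
    # Split the input into `workers` strided chunks: values[0::n], values[1::n], ...
    chunks = [values[i::workers] for i in range(workers)]
    with multiprocessing.Pool(workers) as pool:
        # Each worker applies the built-in sum to its own chunk independently.
        partial_sums = pool.map(sum, chunks)
    # Combine the partial results in the main process.
    return sum(partial_sums)

if __name__ == '__main__':
    print(parallel_sum([float(i) for i in range(100)], workers=2))
```

For only 100 numbers the cost of starting the pool will likely outweigh any gain; the pattern only pays off when each chunk involves substantial work.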
