Deadlocked child processes when mixing multithreading, multiprocessing and logging in Python

References:

https://www.ctolib.com/topics/85107.html

https://twiki.cern.ch/twiki/bin/view/Main/PythonLoggingThreadingMultiprocessingIntermixedStudy

https://github.com/google/python-atfork

The service uses a multithreading + multiprocessing + logging pattern, and occasionally a child process ends up deadlocked.

(Screenshot of the hung child process omitted.)

Analysis showed that the child process deadlocks because the fork (os.fork, called inside multiprocessing) copies the parent's memory, including the locks used by the logging module. If such a lock happens to be held at the moment of the fork, the copy in the child stays locked forever: the thread that owned it does not exist in the child, and releasing the lock in the parent has no effect on the child's copy. The child then blocks forever trying to acquire the lock and can never write another log line.
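To make the failure mode concrete, here is a minimal, self-contained sketch (POSIX only, using a plain threading.Lock instead of logging's internal lock; not taken from the service code) of the same mechanism:

import os
import signal
import threading
import time

lock = threading.Lock()

def hold_lock():
    # Simulates a logging handler that is in the middle of emit() when the
    # fork happens: the lock is held by a thread that will not exist in the
    # child process.
    with lock:
        time.sleep(5)

threading.Thread(target=hold_lock).start()
time.sleep(0.1)                # make sure the lock is already held

pid = os.fork()
if pid == 0:
    lock.acquire()             # blocks forever: the owning thread was not copied
    print("child: never reached")
    os._exit(0)
else:
    time.sleep(1)
    print("parent: child %d is still hung on the inherited lock" % pid)
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)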

The problem never showed up in the test environment, and even in production the deadlock is only occasional, so the first step is to find a way to reproduce it.

Test script:

import os
import sys

# Make the project's logging wrapper (common.log) importable from this script.
curdir = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.dirname(curdir))

from multiprocessing import Process
import threading
from common import log
import time



def worker(id):
    # Target function for the child processes: log in a tight loop.
    # logger = logging.getLogger("worker '%s'" % id)
    while True:
        t = time.time()
        msg = "worker '%s', time: %s" % (id, t)
        # print msg
        log.info(msg)
        # logger.debug(msg)


def startProcessWorkers(numWorkers):
    workers = []
    for i in range(numWorkers):
        id = "process %02i" % i
        w = Process(target=worker, args=(id,))
        w.start()
        workers.append((id, w))
    return workers


# Thread workers log in a tight loop inside the parent process, so the
# logging locks are frequently held at the moment a child process is forked.
class Worker(threading.Thread):
    def __init__(self, id, iterations=None):
        threading.Thread.__init__(self)
        self.id = id
        self.iterations = iterations
        self._stopFlag = False

    def run(self):
        # logger = logging.getLogger("worker '%s'" % id)
        counter = self.iterations
        while not self._stopFlag:
            t = time.time()
            msg = "worker '%s', time: %s" % (self.id, t)
            # print msg
            log.info(msg)
            # logger.debug(msg)
            if self.iterations:
                counter -= 1
                if not counter:
                    self._stopFlag = True

    def terminate(self):
        self._stopFlag = True


def startThreadWorkers(numWorkers, iterations=None):
    workers = []
    for i in range(numWorkers):
        id = "thread %02i" % i
        w = Worker(id, iterations=iterations)
        w.start()
        workers.append((id, w))
    return workers


if __name__ == "__main__":
    workers = []
    # Thread workers hammer the logging lock in the parent; `iterations`
    # makes them stop eventually (the deadlocked child processes never will).
    workers.extend(startThreadWorkers(2, iterations=10000))
    # Process workers are forked while the threads are logging -- this is
    # where the deadlock shows up.
    workers.extend(startProcessWorkers(2))

The process listing shows three processes running (the parent plus the two Process workers), but the log file only contains entries from the thread workers; the process workers never manage to write a single log line.
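The race only fires when the fork lands in the small window where a logging handler's lock is held by another thread, which is why the script reproduces it only occasionally. A more deterministic variant is to hold a handler lock on purpose while starting the child process. The sketch below reuses worker() from the script above and assumes common.log attaches at least one handler to the root logger (adjust the handler lookup to whatever the real wrapper does):

import logging
import threading
import time
from multiprocessing import Process

# Assumption: common.log installs its handler(s) on the root logger.
handler = logging.getLogger().handlers[0]

def hold_handler_lock():
    # Hold the handler's I/O lock across the fork below.
    handler.acquire()
    time.sleep(2)
    handler.release()

threading.Thread(target=hold_handler_lock).start()
time.sleep(0.1)                     # make sure the lock is held

p = Process(target=worker, args=("deterministic",))
p.start()                           # forked while the handler lock was held
# The child hangs on its first log.info() call and never writes a line.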

Solution: https://github.com/google/python-atfork


The idea is that whenever fork() is called in Python, a set of registered callbacks is run. atfork keeps three internal call lists:

_prepare_call_list

_parent_call_list

_child_call_list
which correspond to three trigger points:


1. prepare: called in the parent just before fork().
2. parent: called in the parent immediately after fork() returns (whether it succeeded or failed).
3. child: called in the child immediately after fork().

To use it, first patch the os fork functions so that the registered hooks actually run:

import atfork
atfork.monkeypatch_os_fork_functions()

Then register the three callbacks for a lock via atfork.atfork (my_lock here stands for any lock that must be protected across a fork):

atfork.atfork(prepare=my_lock.acquire,
              parent=my_lock.release,
              child=my_lock.release)
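As an aside (not part of the original atfork-based fix): since Python 3.7 the standard library exposes the same three hooks directly as os.register_at_fork, and recent CPython releases use that hook inside the logging module itself to deal with handler locks in the forked child. On a modern interpreter the registration above can be written without any third-party package:

import os
import threading

my_lock = threading.Lock()          # whatever lock needs protecting across fork()

# Python 3.7+ equivalent of the three atfork trigger points.
os.register_at_fork(before=my_lock.acquire,           # "prepare"
                    after_in_parent=my_lock.release,  # "parent"
                    after_in_child=my_lock.release)   # "child"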

For the locks inside the logging module specifically, atfork ships a ready-made fixer:

from atfork import stdlib_fixer
stdlib_fixer.fix_logging_module()
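Putting it together for the test script above: the patching has to happen once, early at startup, before any worker threads are started and before the first Process is created. A sketch, assuming the atfork package from the repository above is installed:

import atfork
from atfork import stdlib_fixer

# Apply the fix before any threads exist and before the first fork.
atfork.monkeypatch_os_fork_functions()   # wrap os.fork() so the registered hooks run
stdlib_fixer.fix_logging_module()        # register acquire/release hooks for logging's locks

if __name__ == "__main__":
    workers = []
    workers.extend(startThreadWorkers(2, iterations=10000))
    workers.extend(startProcessWorkers(2))   # child processes now log normally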

Reposted from blog.csdn.net/qq_35462323/article/details/89086842