Learning Python multiprocessing

1, Basics:

  Computer hardware components:

    Motherboard (with registers: hardware that interacts directly with the CPU)

    CPU: computation (arithmetic and logic operations) and control (coordinating all the other hardware)

    Hard disk and memory: storage devices

    Keyboard, mouse, microphone: input devices

    Monitor, printer, speakers: output devices

  Early computers were built around computation; today's computers are built around storage.

  An operating system is software that directly operates the hardware.

  Goals of an OS: make the computer easier to use (high availability, low coupling), encapsulate all hardware interfaces so users can work with them more easily, and allocate and schedule all the resources in the computer reasonably.

  Basic knowledge of processes:

    Process: a program in execution, i.e. the running program together with its related data sets. It is also described as one execution of a program and is a dynamic concept.

    A process consists of: the PCB (process control block), the code segment, and the data segment.

    Three basic process states:

      Ready: the process has all the resources it needs to run except the CPU.

      Running: the process has all the resources, including the CPU, and is currently executing.

      Blocked: for some reason the process has given up the CPU and cannot continue for now; it stays in memory and waits until it can run again.

      A special state, suspended: for some reason the process has given up the CPU and cannot continue, and it has been swapped out of memory.

2, Processes

   multiprocessing is Python's built-in module for multi-process programming: from multiprocessing import Process

  Parallelism: two or more tasks run at the same point in time.

  Concurrency: two or more tasks are executed within the same time interval.

  Synchronous: a task must wait for the result returned by another task before it can proceed.

  Asynchronous: a task does not depend on another task's return value; it only needs to notify the other task and then move on.

  Blocking: the program is waiting for something like I/O and cannot continue while it waits.

  Non-blocking: when an I/O-like operation is encountered, the program does not block and wait; if the I/O is not ready, it raises an error or skips it.

  Process methods and attributes (a short sketch follows this list):

    os.getpid() gets the current process's pid; os.getppid() gets the pid of the current process's parent

    start() starts the child process

    join() turns asynchronous into synchronous: the parent process waits for the child process to finish before continuing (when the main process reaches this statement, it blocks until the child process ends, then continues). join must be called after start.

    is_alive() checks whether the process is still alive

    terminate() kills the process

    Attributes:

      name: the name of the child process

      pid: the pid of the child process

      daemon: set to True to make the process a daemon; the default is False (not a daemon)

      Daemon processes: p.daemon = True

        A daemon process ends as soon as the parent process's code finishes executing (key point: when the code finishes; it does not wait for the main process to finish blocking on anything)

        A daemon process cannot create child processes of its own

        daemon must be set before start is called
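
  A minimal sketch putting these methods and attributes together; the task function and the sleep time are made up for illustration:

from multiprocessing import Process
import os
import time


def task(n):
    print("child pid:%s, parent pid:%s, n=%s" % (os.getpid(), os.getppid(), n))
    time.sleep(1)


if __name__ == '__main__':
    p = Process(target=task, args=(1,), name='worker-1')
    p.daemon = False      # must be set before start(); True would make it a daemon
    p.start()             # start the child process
    print(p.name, p.pid, p.is_alive())
    p.join()              # the main process blocks here until the child finishes
    print(p.is_alive())   # False: the child has already exited
    # p.terminate()       # would kill the child process if it were still running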

  IPC: Inter-Process Communication

    Lock mechanism: protects data safety when multiple processes access shared resources at the same time (a sketch follows below)

      from multiprocessing import Lock

      l = Lock()

      l.acquire() acquires the lock (while it is held, other processes cannot access the locked resource)

      l.release() releases the lock (other processes can access it again)
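
    A minimal sketch of the lock protecting a shared resource; here a JSON file acts as shared storage, and the file name 'tickets.json' is made up for illustration:

from multiprocessing import Process, Lock
import json
import os


def buy(lock, path):
    lock.acquire()            # other processes cannot enter this section now
    try:
        with open(path) as f:
            data = json.load(f)
        if data['count'] > 0:
            data['count'] -= 1
            with open(path, 'w') as f:
                json.dump(data, f)
            print("%s got a ticket, %s left" % (os.getpid(), data['count']))
        else:
            print("%s got nothing, sold out" % os.getpid())
    finally:
        lock.release()        # let other processes in again


if __name__ == '__main__':
    path = 'tickets.json'     # hypothetical file used as shared storage
    with open(path, 'w') as f:
        json.dump({'count': 3}, f)
    lock = Lock()
    ps = [Process(target=buy, args=(lock, path)) for _ in range(5)]
    [p.start() for p in ps]
    [p.join() for p in ps]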

    Semaphore mechanism:

      sem = Semaphore(n)

      n: an int, the number of "keys" the lock is initialized with

      sem.acquire()

      sem.release()

      A semaphore is a lock plus a counter: the counter records how many keys are left. When the counter is 0, no keys remain and acquire() blocks. Each acquire() decrements the counter by 1, and each release() increments it by 1 (see the sketch below).
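
    A minimal sketch of Semaphore, assuming a made-up scenario where at most 2 processes may enter a section at once:

from multiprocessing import Process, Semaphore
import os
import time


def enter(sem):
    sem.acquire()                     # take a key; blocks if the counter is 0
    print("%s entered" % os.getpid())
    time.sleep(1)                     # at most 2 processes are in this section at once
    print("%s left" % os.getpid())
    sem.release()                     # give the key back; counter + 1


if __name__ == '__main__':
    sem = Semaphore(2)                # initialize the lock with 2 keys
    ps = [Process(target=enter, args=(sem,)) for _ in range(5)]
    [p.start() for p in ps]
    [p.join() for p in ps]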

    Event mechanism (a short sketch follows):

      e = Event()

      Initially the flag is False (the blocking state)

      e.set() sets is_set() to True, the non-blocking state

      e.clear() sets is_set() to False, the blocking state

      e.wait() checks the value of is_set(): if True it does not block, if False it blocks

      e.is_set() returns the flag
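
    A minimal sketch of Event: one process blocks on wait() until another process sets the flag (the sleep time is arbitrary):

from multiprocessing import Process, Event
import time


def waiter(e):
    print("waiting, is_set =", e.is_set())    # False at first
    e.wait()                                  # blocks until the flag becomes True
    print("released, is_set =", e.is_set())   # True now


if __name__ == '__main__':
    e = Event()
    p = Process(target=waiter, args=(e,))
    p.start()
    time.sleep(1)
    e.set()        # flip the flag to True; the waiting process continues
    p.join()
    e.clear()      # flip the flag back to False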

3, The producer-consumer model

   Brief introduction: it is mainly used for decoupling; the producer-consumer model is implemented with the help of a queue

        Stack: first in, last out

        Queue: first in, first out

   import queue   (the standard queue module) cannot pass data between multiple processes

   from multiprocessing import Queue   use Queue to build the producer-consumer model

   The queue is process-safe

   q = Queue(num)   num is the maximum length of the queue

   q.get() blocks while waiting for data: if data is available it is returned immediately, otherwise it blocks and waits

   q.put() blocks: if there is still room in the queue the item is put in directly, otherwise it blocks and waits

   q.get_nowait() does not block: if data is available it is returned, otherwise an error is raised

   q.put_nowait() does not block: if there is room in the queue the item is put in, otherwise an error is raised (a short sketch of these four methods follows)
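
   A minimal sketch of these four methods in a single process (the values are arbitrary); note that multiprocessing reuses queue.Full and queue.Empty from the standard queue module for the errors:

from multiprocessing import Queue
import queue

if __name__ == '__main__':
    q = Queue(2)           # maximum length of 2
    q.put('a')             # there is room, so it goes in directly
    q.put('b')
    try:
        q.put_nowait('c')  # the queue is full: raises queue.Full instead of blocking
    except queue.Full:
        print("the queue is full")

    print(q.get())         # 'a': data is available, returned immediately
    print(q.get())         # 'b'
    try:
        q.get_nowait()     # the queue is empty: raises queue.Empty instead of blocking
    except queue.Empty:
        print("the queue is empty")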

  from multiprocessing import JoinableQueue   a joinable queue

    It inherits from Queue, so all the Queue methods can be used

    Additional methods:

      q.join() is used by the producer to receive acknowledgments from the consumer for everything that was produced, so it knows when all the data in the queue has been consumed

      q.task_done(): each time one item is consumed, an acknowledgment is sent back, so the producer can tell how many items the consumer has consumed; every item taken from the queue sends one acknowledgment to join()

  Pipes:

    from multiprocessing import Pipe

    con1, con2 = Pipe()

    Pipes are not process-safe

    A pipe is one way for processes to communicate with each other

    Within a single process:

      whatever con1 sends, con2 receives, and whatever con2 sends, con1 receives

    Between multiple processes:

      the parent process sends on con1 and the child process receives on con2

    EOFError in pipes: if the sending end is closed in the parent process while the child process keeps trying to receive, an EOFError is raised (see the sketch below)
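
    A minimal sketch of the parent-child pattern above, including the EOFError case; both ends are passed to the child so each side can close the end it does not use (the messages are arbitrary):

from multiprocessing import Process, Pipe


def child(con1, con2):
    con1.close()                       # the child only receives, so close its copy of the sending end
    while True:
        try:
            print("child received:", con2.recv())
        except EOFError:               # raised once every copy of the sending end is closed
            con2.close()
            break


if __name__ == '__main__':
    con1, con2 = Pipe()
    p = Process(target=child, args=(con1, con2))
    p.start()
    con2.close()                       # the parent only sends, so close its copy of the receiving end
    for msg in ('hello', 'world'):
        con1.send(msg)                 # the parent sends on con1, the child receives on con2
    con1.close()                       # once all sending ends are closed, the child's recv() raises EOFError
    p.join()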

4, Process pools

  A pool containing a fixed number of processes that sit on standby; as soon as a task arrives, a process picks it up and handles it

   Starting a process costs the operating system a lot of time to manage it and a lot of time for the CPU to schedule it

   A process pool manages the processes in the pool on the programmer's behalf

    from multiprocessing import Pool

      p = Pool(os.cpu_count() + 1)

  The three methods of a process pool:
    map(func, iterable)

      func: the task function executed by the processes in the pool

      iterable: an iterable object; each element of the iterable is passed to the task function as an argument

from multiprocessing import Pool
import os


def func(num):
    num += 1
    print(num)
    return num


if __name__ == '__main__':
    p = Pool(os.cpu_count() + 1)
    print(os.cpu_count())
    res = p.map(func, [i for i in range(20)])  # each i is passed into func as an argument
    p.close()  # no more tasks can be submitted to the pool
    p.join()   # wait until all tasks in the pool have finished
    print(type(res))

 

    apply(func, args=()) executes synchronously, i.e. the processes in the pool execute the tasks one at a time

      func: the task function executed by the processes in the pool

      args: a tuple of arguments passed to the task function

      Synchronous execution does not need close and join; all the processes in the pool are ordinary (non-daemon) processes, and the main process waits for them to finish

 

from multiprocessing import Pool
import requests
import time


def func(url):
    res = requests.get(url)
    if res.status_code == 200:
        return 'ok'


if __name__ == '__main__':
    p = Pool(5)
    l = ['https://www.baidu.com',
         'http://www.jd.com',
         'http://www.taobao.com',
         'http://www.mi.com',
         'http://www.cnblogs.com',
         'https://www.bilibili.com',
         ]
    start = time.time()
    for i in l:
        p.apply(func, args=(i,))  # even with n processes in the pool, the tasks execute one at a time
    print(time.time() - start)

    start = time.time()
    for i in l:
        p.apply_async(func, args=(i, ))
    p.close()
    p.join()
    print(time.time() - start)

 

    apply_async(func, args=(), callback=None) executes asynchronously: the processes in the pool take on tasks all at once

      func: the task function executed by the processes in the pool

      args: a tuple of arguments passed to the task function

      callback: a callback function. When a process in the pool finishes a task, the returned result can be handed to the callback for further processing. Only asynchronous execution supports this.

      Asynchronous task processing requires close and join

      When tasks are processed asynchronously, all processes in the pool are daemon processes

      Callback functions:

        The return value of the process's task function is received as the callback's parameter, for further processing

        The callback is called by the main process, not by the child process; the child process only hands its result to the callback

 

from multiprocessing import Pool
import requests
import time
import os


def func(url):
    res = requests.get(url)
    print("子进程的pid:%s, 父进程的pid:%s" % (os.getpid(), os.getppid()))
    if res.status_code == 200:
        return url


def cal_back(sta):
    url = sta
    print("回调函数的pid:%s" % os.getpid())
    with open('content.txt', 'a+') as f:
        f.write(url + "\n")


if __name__ == '__main__':
    p = Pool(5)
    l = ['https://www.baidu.com',
         'http://www.jd.com',
         'http://www.taobao.com',
         'http://www.mi.com',
         'http://www.cnblogs.com',
         'https://www.bilibili.com',
         ]
    print("主进程pid:%s" % os.getpid())
    for i in l:
        p.apply_async(func, args=(i,), callback=cal_back)
    p.close()
    p.join()

 

  Comparing plain processes with a process pool

 

from multiprocessing import Process, Pool
import time


def func(num):
    num += 1
    # print(num)


if __name__ == '__main__':
    p = Pool(5)
    start = time.time()
    p.map(func, [i for i in range(1000)])
    p.close()  # no more tasks can be submitted to the pool
    p.join()   # wait until all tasks in the pool have finished
    print(time.time() - start)

    p_l = []
    start = time.time()
    for i in range(1000):
        p = Process(target=func, args=(i,))
        p.start()
        p_l.append(p)
    [i.join() for i in p_l]
    print(time.time() - start)

 

 

 

5, Producer-consumer model examples

  Implementing the producer-consumer model with a Queue

from multiprocessing import Queue, Process
import time


def producer(q, name):
    for i in range(20):
        q.put(name)
        print("生产第%s个%s" % (i, name))
    # q.put(None)


def consumer(q, name, color):
    while 1:
        info = q.get()
        if info:
            print("%s %s拿走来%s\033[0m" % (color, name, info))
        else:
            break


if __name__ == '__main__':
    q = Queue(10)
    p = Process(target=producer, args=(q, 'number one'))
    p1 = Process(target=producer, args=(q, 'number two'))
    p2 = Process(target=producer, args=(q, 'number three'))
    c1 = Process(target=consumer, args=(q, 'alex', '\033[31m'))
    c2 = Process(target=consumer, args=(q, 'wusir', '\033[32m'))
    p_l = [p, p1, p2, c2, c1]
    [i.start() for i in p_l]
    p.join()
    p1.join()
    p2.join()
    q.put(None)
    q.put(None)  # sentinel marking that there is no more data and the producers have stopped producing

 

  Implementing the producer-consumer model with JoinableQueue

from multiprocessing import JoinableQueue, Process


def producer(q, name):
    for i in range(20):
        q.put(name)
        print("生产第%s个%s" % (i, name))
    q.join() 生产者进程等待消费者进程消费完成


def consumer(q, name):
    while 1:
        q.get()
        print("%s 拿走了一个" % name)
        q.task_done()


if __name__ == '__main__':
    q = JoinableQueue(10)
    p1 = Process(target=producer, args=(q, 'one'))
    c1 = Process(target=consumer, args=(q, 'alex'))
    c1.daemon = True  # make the consumer a daemon process
    p1.start()
    c1.start()
    p1.join()  # the main process waits for the producer process

    # The main process waits for the producer process to end.
    # The program has three processes: the main process, the producer process, and the consumer process.
    # When the main process reaches p1.join(), it waits for the producer process to end,
    # while the producer process, at its q.join(), waits for the consumer process to consume all the data.
    # So the state is: the main process waits for the producer, and the producer waits for the consumer to finish consuming.
    # That is why the consumer is made a daemon process: once the main process's code finishes, the producer process has ended,
    # which means the consumer has already consumed everything in the queue.
    # At that point, when the main process ends, the daemon (consumer) process ends with it, and the whole program exits normally.

 

      

6, Other related knowledge

  Sharing memory between processes (a short sketch follows this section)

    from multiprocessing import Manager, Value

    m = Manager()

    num = m.dict({})  or  num = m.list([])

  IPC: pipes and queues (plus locks, semaphores, and events)

  By default, processes do not share memory and cannot modify each other's global variables

  from multiprocessing import Value

    num = Value("i", num)   "i" is the type code (signed int), num is the initial data

    num.value is the value (an attribute, not a method)

  Manager: for sharing data between multiple processes

    m = Manager()

    num = m.list([1, 2, 3])
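
  A minimal sketch of sharing data with Value and a Manager list; the counter scenario is made up, and a Lock is still used because incrementing is a read-modify-write operation:

from multiprocessing import Process, Manager, Value, Lock


def work(num, shared, lock):
    with lock:                 # shared data still needs a lock for read-modify-write
        num.value += 1         # .value is an attribute, not a method
        shared.append(num.value)


if __name__ == '__main__':
    lock = Lock()
    num = Value('i', 0)        # 'i' is the type code (signed int), 0 is the initial value
    m = Manager()
    shared = m.list([])        # a list proxy that all processes can modify
    ps = [Process(target=work, args=(num, shared, lock)) for _ in range(5)]
    [p.start() for p in ps]
    [p.join() for p in ps]
    print(num.value)           # 5
    print(list(shared))        # [1, 2, 3, 4, 5]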

  

 

      
