1, Basics:
Computer hardware components:
Motherboard: the fixed board that connects the components. Registers are the hardware that interacts most directly with the CPU.
CPU (central processing unit): computation (arithmetic and logic operations) and control (coordinating all the other hardware)
Storage: hard disk, memory
Input devices: keyboard, mouse, microphone
Output devices: monitor, speakers, printer
Early computers were built around computation; today's computers are built around storage.
The operating system is software: the layer of software that operates the hardware directly.
OS goals: make the computer easier to use (high availability, low coupling). It encapsulates all the hardware interfaces so that users can work with the machine's resources more easily, and it allocates and schedules those resources sensibly.
Process basics:
Process: a program in execution. It is one run of a program over a set of data, so it is a dynamic concept (the program itself is static).
Composition of a process: code segment, data segment, and the PCB (process control block)
The three basic states of a process:
Ready: the process has every resource it needs to run except the CPU
Running: the process has every resource, including the CPU, and is being executed
Blocked: for some reason (for example, waiting for I/O) the process has given up the CPU and cannot continue. It stays in memory and waits; once the event it needs occurs, it returns to the ready state to wait for the CPU.
One special state, suspended: for some reason the process has given up the CPU and cannot continue, and it has also been swapped out of memory
2, Processes
multiprocessing: Python's built-in module for multi-process programming
from multiprocessing import Process
Parallel: two or more things happen at the same instant
Concurrent: two or more things make progress within the same time interval
Synchronous: a task must wait for the result of another task before it can continue
Asynchronous: a task does not depend on another task's result; it only needs to tell the other task to start and then carries on
Blocking: the program is waiting (for I/O or something similar) and cannot continue during the wait
Non-blocking: when the program reaches I/O or a similar operation it does not wait; if the operation cannot complete right away, it raises an error or is skipped
Process methods:
os.getpid(): gets the pid of the current process
os.getppid(): gets the pid of the current process's parent
start(): starts the child process
join(): turns asynchronous into synchronous. The parent blocks at this statement until the child has finished, then continues. join() must come after start().
is_alive(): reports whether the process is still alive
terminate(): kills the process
Attributes:
name: the name of the child process
pid: the pid of the child process
daemon: True makes the process a daemon; the default is False (not a daemon)
Daemon processes: p.daemon = True
A daemon ends as soon as the parent process's code has finished running. The key point is that this means the code, not the process: the daemon is stopped at that moment even if the parent stays around afterwards waiting for non-daemon children.
A daemon cannot create child processes
daemon must be set before start() is called
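The methods and attributes above can be tried together in a short sketch. The task functions work and heartbeat and the helper describe are hypothetical names made up for this illustration; only Process, start(), join(), is_alive(), name and daemon come from the notes.

```python
import os
import time
from multiprocessing import Process


def work(seconds):
    """Hypothetical task: report the pids, then sleep to simulate work."""
    print("child pid=%s, parent pid=%s" % (os.getpid(), os.getppid()))
    time.sleep(seconds)


def heartbeat():
    """Hypothetical daemon task: loops until the main process stops it."""
    while True:
        time.sleep(0.1)


def describe(p):
    """Summarise a Process object's name, daemon flag and liveness."""
    return "%s daemon=%s alive=%s" % (p.name, p.daemon, p.is_alive())


if __name__ == "__main__":
    d = Process(target=heartbeat, name="heartbeat")
    d.daemon = True              # must be set before start()
    d.start()

    p = Process(target=work, args=(0.2,), name="worker")
    p.start()
    print(describe(p))
    p.join()                     # the main process blocks here until the child ends
    print(describe(p))
    print(describe(d))           # the daemon never returns; it is stopped when main ends
```

The heartbeat function never returns on its own, so the program can only exit because the process running it was marked as a daemon.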
IPC: Inter-Process Communication
Lock: protects shared data when several processes access it at the same time
from multiprocessing import Lock
l = Lock()
l.acquire(): acquires the lock (other processes cannot access the locked resource until it is released)
l.release(): releases the lock (other processes can now acquire it)
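A minimal sketch of acquire() and release() guarding a shared counter. The function deposit and the counter balance are hypothetical; the counter uses Value, which these notes only introduce in section 6, and it is used here simply because it is the shortest way to get data that several processes can see.

```python
from multiprocessing import Lock, Process, Value


def deposit(balance, lock, times):
    """Add 1 to the shared balance `times` times, holding the lock each time."""
    for _ in range(times):
        lock.acquire()           # other processes block here until release()
        try:
            balance.value += 1   # read-modify-write: not atomic on its own
        finally:
            lock.release()       # always release, even if the update raises


if __name__ == "__main__":
    lock = Lock()
    balance = Value("i", 0)      # shared integer (assumption: see section 6)
    workers = [Process(target=deposit, args=(balance, lock, 1000))
               for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(balance.value)         # every increment was made under the lock
```

The try/finally is a design choice of this sketch: if the protected code raises, the lock is still released and the other processes are not left blocked forever.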
Semaphore:
from multiprocessing import Semaphore
sem = Semaphore(n)
n: int, how many keys the lock starts with
sem.acquire()
sem.release()
A semaphore is a lock plus a counter. The counter records how many keys are left. Each acquire() decreases the counter by 1 and each release() increases it by 1; when the counter is 0 no key is available, and acquire() blocks.
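The counter behaviour can be seen in a small sketch where five processes share two keys. The task use_room and the process names are hypothetical, chosen only for this illustration.

```python
import time
from multiprocessing import Process, Semaphore


def use_room(sem, name, seconds):
    """Hypothetical task: only as many processes as there are keys get in."""
    sem.acquire()                # take a key; blocks while the counter is 0
    try:
        print("%s entered" % name)
        time.sleep(seconds)
    finally:
        sem.release()            # hand the key back: counter + 1
    print("%s left" % name)


if __name__ == "__main__":
    sem = Semaphore(2)           # two keys
    people = [Process(target=use_room, args=(sem, "p%d" % i, 0.2))
              for i in range(5)]
    for p in people:
        p.start()
    for p in people:
        p.join()
```

With two keys, at most two processes are between "entered" and "left" at any moment; the others wait in acquire().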
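Event:
from multiprocessing import Event
e = Event()
The flag starts as False, so wait() blocks
e.set(): sets is_set() to True (non-blocking state)
e.clear(): sets is_set() to False (blocking state)
e.wait(): checks is_set(); returns at once if it is True, blocks while it is False
e.is_set(): returns the current flag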
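A rough sketch of the four Event methods, using a traffic light as the picture: cars wait while the flag is False and pass once it is set. The function car and the names light and cars are hypothetical.

```python
import time
from multiprocessing import Event, Process


def car(e, name):
    """Hypothetical task: wait at the light until the event is set."""
    print("%s waiting, is_set=%s" % (name, e.is_set()))
    e.wait()                     # blocks while is_set() is False
    print("%s passes" % name)


if __name__ == "__main__":
    light = Event()              # the flag starts as False (red)
    cars = [Process(target=car, args=(light, "car%d" % i)) for i in range(3)]
    for c in cars:
        c.start()
    time.sleep(0.5)
    light.set()                  # green: every waiting process wakes up
    for c in cars:
        c.join()
    light.clear()                # back to red
```

Unlike a lock, one set() call releases every waiting process at once, which is why an event suits "start now" style signals.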
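3, The producer-consumer model
In brief: it is mainly used for decoupling, and it is implemented with a queue between producers and consumers
Stack: last in, first out (LIFO)
Queue: first in, first out (FIFO)
import queue: this standard-library queue cannot pass data between processes
from multiprocessing import Queue: use this Queue to build the producer-consumer model
This queue is process-safe
q = Queue(num): num is the maximum length of the queue
q.get(): blocking. Returns an item if one is available, otherwise waits
q.put(): blocking. Adds the item if there is room, otherwise waits
q.get_nowait(): non-blocking. Returns an item if one is available, otherwise raises queue.Empty
q.put_nowait(): non-blocking. Adds the item if there is room, otherwise raises queue.Full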
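The blocking and non-blocking calls can be compared in a short sketch. The functions produce and drain are hypothetical, and ending the stream with a None sentinel is an assumption of this example, not something Queue requires. The standard queue module is imported only for its Empty and Full exception classes.

```python
import queue                     # only for the Empty / Full exception classes
from multiprocessing import Process, Queue


def produce(q, items):
    """Put every item on the queue, then a None sentinel to mark the end."""
    for item in items:
        q.put(item)              # blocks while the queue is full
    q.put(None)                  # sentinel (an assumption of this sketch)


def drain(q):
    """Collect items until the None sentinel arrives."""
    got = []
    while True:
        item = q.get()           # blocks while the queue is empty
        if item is None:
            return got
        got.append(item)


if __name__ == "__main__":
    q = Queue(3)
    p = Process(target=produce, args=(q, ["a", "b", "c", "d"]))
    p.start()
    print(drain(q))
    p.join()

    try:
        q.get_nowait()           # the queue is empty now
    except queue.Empty:
        print("get_nowait() raised queue.Empty")

    full = Queue(1)
    full.put("x")
    try:
        full.put_nowait("y")     # no room left
    except queue.Full:
        print("put_nowait() raised queue.Full")
```

The queue holds only three items while the producer sends five, so some put() calls have to wait for the consumer; that back-pressure is what the maximum length is for.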
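from multiprocessing import JoinableQueue: a queue that can be joined
It inherits from Queue, so all the Queue methods are available
Added methods:
q.join(): used by the producer. It blocks until every item that was put on the queue has been marked as done, so the producer knows when all of its data has been consumed
q.task_done(): called by the consumer once for each item it has finished with. Each call tells join() that one more item is done, which is how the producer side learns how much has been consumed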
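Pipes:
from multiprocessing import Pipe
con1, con2 = Pipe()
Pipes are not process-safe: data can be corrupted if several processes read from or write to the same end at once
A pipe is one way for processes to communicate
In a single process:
con1 sends and con2 receives, or con2 sends and con1 receives
Across processes:
the parent sends on con1 and the child receives on con2
EOFError: if the sending end has been closed and the child is still calling recv(), recv() raises EOFError. Every copy of the sending end has to be closed for this to happen, including any copy the child holds.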
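A minimal sketch of the parent-sends, child-receives pattern, ending on EOFError. The function receiver is hypothetical. It is handed both ends so that it can close its own copy of the sending end; without that, the child's recv() may never see end-of-file.

```python
from multiprocessing import Pipe, Process


def receiver(send_end, recv_end):
    """Read from recv_end until the pipe reports end-of-file."""
    send_end.close()             # drop this process's copy of the sending end
    received = []
    while True:
        try:
            received.append(recv_end.recv())
        except EOFError:         # nothing left and every sender is closed
            break
    print("received:", received)
    return received


if __name__ == "__main__":
    con1, con2 = Pipe()
    child = Process(target=receiver, args=(con1, con2))
    child.start()
    con2.close()                 # the parent only sends
    for word in ["hello", "world"]:
        con1.send(word)
    con1.close()                 # last open sender closed: the child gets EOFError
    child.join()
```

Here EOFError is the normal way for the loop to end rather than a failure, so it is caught and used as the stop signal.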
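4, Process pools
A pool holds a fixed number of processes on standby; as soon as a task arrives, one of them handles it
Starting a process is expensive: the operating system spends a lot of time managing it and the CPU spends a lot of time scheduling it
The pool manages its processes for the programmer
from multiprocessing import Pool
p = Pool(os.cpu_count() + 1)
The three pool methods:
map(func, iterable)
func: the task function run by the pool's processes
iterable: each element of the iterable is passed to the task function as its argument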
    from multiprocessing import Pool
    import os


    def func(num):
        num += 1
        print(num)
        return num


    if __name__ == '__main__':
        p = Pool(os.cpu_count() + 1)
        print(os.cpu_count())
        res = p.map(func, [i for i in range(20)])  # each i is passed to func as its argument
        p.close()  # no more tasks can be added to the pool
        p.join()   # wait for every task in the pool to finish
        print(type(res))
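apply(func, args=()): synchronous. The call blocks until the task is done, so the pool's processes run the tasks one after another
func: the task function run by the pool's processes
args: a tuple of arguments passed to the task function
Because each apply() call blocks until its result is ready, close() and join() are not needed to wait for the work: the main process has already waited for every task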
    from multiprocessing import Pool
    import requests
    import time


    def func(url):
        res = requests.get(url)
        if res.status_code == 200:
            return 'ok'


    if __name__ == '__main__':
        p = Pool(5)
        l = ['https://www.baidu.com',
             'http://www.jd.com',
             'http://www.taobao.com',
             'http://www.mi.com',
             'http://www.cnblogs.com',
             'https://www.bilibili.com',
             ]

        # Synchronous: even with n processes in the pool, tasks run one at a time
        start = time.time()
        for i in l:
            p.apply(func, args=(i,))
        print(time.time() - start)

        # Asynchronous: all tasks are submitted at once
        start = time.time()
        for i in l:
            p.apply_async(func, args=(i,))
        p.close()
        p.join()
        print(time.time() - start)
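apply_async(func, args=(), callback=None): asynchronous. Tasks are handed to the pool all at once and the pool's processes run them in parallel
func: the task function run by the pool's processes
args: a tuple of arguments passed to the task function
callback: a callback function. When a pool process finishes a task, its return value is handed to the callback for further processing. Only the asynchronous call has this
Asynchronous tasks need close() and join()
The pool's worker processes are daemon processes, so if the main process ends without close() and join(), unfinished asynchronous tasks are killed along with the workers
Callback functions:
The return value of the task function is received as the callback's argument, for one further processing step
The callback is invoked in the main process, not in the child; the child only hands over the result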
    from multiprocessing import Pool
    import requests
    import os


    def func(url):
        res = requests.get(url)
        print("child pid: %s, parent pid: %s" % (os.getpid(), os.getppid()))
        if res.status_code == 200:
            return url


    def cal_back(sta):
        url = sta
        print("callback pid: %s" % os.getpid())
        if url is None:  # func returns None when the status is not 200
            return
        with open('content.txt', 'a+') as f:
            f.write(url + "\n")


    if __name__ == '__main__':
        p = Pool(5)
        l = ['https://www.baidu.com',
             'http://www.jd.com',
             'http://www.taobao.com',
             'http://www.mi.com',
             'http://www.cnblogs.com',
             'https://www.bilibili.com',
             ]
        print("main process pid: %s" % os.getpid())
        for i in l:
            p.apply_async(func, args=(i,), callback=cal_back)
        p.close()
        p.join()
Comparing plain processes with a process pool
    from multiprocessing import Process, Pool
    import time


    def func(num):
        num += 1


    if __name__ == '__main__':
        # Pool: 5 workers handle 1000 tasks
        p = Pool(5)
        start = time.time()
        p.map(func, [i for i in range(1000)])
        p.close()  # no more tasks can be added to the pool
        p.join()   # wait for every task in the pool to finish
        print(time.time() - start)

        # Process: one new process per task
        p_l = []
        start = time.time()
        for i in range(1000):
            p = Process(target=func, args=(i,))
            p.start()
            p_l.append(p)
        for p in p_l:
            p.join()
        print(time.time() - start)
5, Producer-consumer examples
Producer-consumer model with a Queue
    from multiprocessing import Queue, Process


    def producer(q, name):
        for i in range(20):
            q.put(name)
            print("produced item %s of %s" % (i, name))


    def consumer(q, name, color):
        while True:
            info = q.get()
            if info is None:  # sentinel: there is no more data
                break
            print("%s %s took %s\033[0m" % (color, name, info))


    if __name__ == '__main__':
        q = Queue(10)
        p = Process(target=producer, args=(q, 'number one'))
        p1 = Process(target=producer, args=(q, 'number two'))
        p2 = Process(target=producer, args=(q, 'number three'))
        c1 = Process(target=consumer, args=(q, 'alex', '\033[31m'))
        c2 = Process(target=consumer, args=(q, 'wusir', '\033[32m'))
        p_l = [p, p1, p2, c2, c1]
        for i in p_l:
            i.start()
        p.join()
        p1.join()
        p2.join()
        # All producers have finished: put one sentinel per consumer
        q.put(None)
        q.put(None)
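Producer-consumer model with a JoinableQueue
    from multiprocessing import JoinableQueue, Process


    def producer(q, name):
        for i in range(20):
            q.put(name)
            print("produced item %s of %s" % (i, name))
        q.join()  # the producer waits until the consumer has consumed everything


    def consumer(q, name):
        while True:
            q.get()
            print("%s took one" % name)
            q.task_done()


    if __name__ == '__main__':
        q = JoinableQueue(10)
        p1 = Process(target=producer, args=(q, 'one'))
        c1 = Process(target=consumer, args=(q, 'alex'))
        c1.daemon = True  # make the consumer a daemon
        p1.start()
        c1.start()
        p1.join()  # the main process waits for the producer to finish
        # There are three processes: main, producer and consumer.
        # At p1.join() the main process waits for the producer to finish.
        # At q.join() the producer waits for the consumer to consume every item.
        # So main waits for the producer, and the producer waits for the consumer.
        # The consumer loops forever, which is why it is made a daemon: once
        # p1.join() returns, the producer has ended, so the queue has been fully
        # consumed, and when the main process finishes the daemon consumer is
        # stopped with it. The whole program can then exit normally.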
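6, Other related knowledge
Sharing memory between processes
from multiprocessing import Manager, Value
m = Manager()
num = m.dict({})
num = m.list([])
IPC: pipes, queues (plus locks, semaphores, events)
Processes do not share memory by default, so a child cannot modify the parent's global variables
from multiprocessing import Value
num = Value("i", n): "i" is the type code (int) and n is the initial value
num.value: the current value (an attribute, not a method)
Manager: shares data between processes
m = Manager()
num = m.list([1, 2, 3])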
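Value and Manager can be used side by side, as in the sketch below. The function bump and the variable names are hypothetical. It assumes the default Value, which carries its own lock and exposes it through get_lock().

```python
from multiprocessing import Manager, Process, Value


def bump(counter, shared_list, n):
    """Add n to the shared counter and append n to the shared list."""
    with counter.get_lock():     # a default Value carries its own lock
        counter.value += n
    shared_list.append(n)


if __name__ == "__main__":
    counter = Value("i", 0)      # shared C int, initial value 0
    with Manager() as m:         # the manager runs a helper server process
        items = m.list([1, 2, 3])
        workers = [Process(target=bump, args=(counter, items, n))
                   for n in (10, 20)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print(counter.value)
        print(list(items))       # copy the proxy's contents into a plain list
```

Both children change data that the parent can read afterwards, which a plain global variable would not allow. Value fits single numbers; Manager fits containers such as lists and dicts, at the cost of an extra server process.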