1. Python's GIL (Global Interpreter Lock)

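A Python thread corresponds to one C-level (operating-system) thread.
The GIL allows only one thread to execute bytecode at any given moment, so the threads of a single process cannot be mapped onto several CPUs to run in parallel.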

# Inspect a function's bytecode
import dis

def add(a):
    return a + 1

dis.dis(add)  # dis.dis prints the disassembly itself and returns None

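The GIL is released after a certain amount of bytecode has run or a time slice has elapsed (Python 2 counted bytecode instructions; Python 3 uses a time interval, 5 ms by default), and a thread gives it up voluntarily when it blocks on I/O.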
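A rough way to see the I/O case is to time several threads that do nothing but sleep. This is a minimal sketch rather than a benchmark: `sleep_in_threads` is a hypothetical helper name, and `time.sleep` stands in for a real blocking I/O call. Because a sleeping thread does not hold the GIL, the total time should be close to one sleep rather than the sum of all of them. `sys.getswitchinterval()` reports the time slice after which CPython 3 asks a running thread to give up the GIL.

```python
import sys
import threading
import time


def sleep_in_threads(n_threads, seconds):
    """Run n_threads threads that each block for `seconds`; return elapsed wall time."""
    threads = [threading.Thread(target=time.sleep, args=(seconds,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("switch interval:", sys.getswitchinterval())
    # The four 0.2 s sleeps overlap, so this is far below the 0.8 s a serial run needs.
    print("elapsed:", round(sleep_in_threads(4, 0.2), 2))
```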

(For more on the GIL, see the article "python中的GIL详解", a detailed explanation of the GIL in Python.)

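2. Ways to write multithreaded code in Python:

1. Instantiating the Thread class

This approach suits small scripts, threads that are created dynamically, and thread pools. It is also the natural choice when it is inconvenient to change how the function or class being run is initialized.

Creating a thread

target takes the function the thread should run. Pass the function object itself (just its name), not a call to it.

args takes the function's arguments as a tuple, so a single argument needs a trailing comma. It can be omitted if the function takes no arguments.

my_thread = threading.Thread(target=get_detail, args=(a,))

Starting the thread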

my_thread.start()

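Ending the other threads as soon as the main thread ends

However many threads you create yourself, the program always has at least one more: the main thread.

All threads run concurrently. If the remaining threads should be killed as soon as the main thread finishes, mark each of them as a daemon thread before starting it. The older call for this is setDaemon(True); it has been deprecated since Python 3.10 in favor of the daemon attribute.

my_thread.daemon = True  # equivalent to the older my_thread.setDaemon(True); set before start()

To make the main thread wait for a particular thread to finish before it continues, call join: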

my_thread.join()
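Putting these calls together, a minimal sketch might look like the following. `get_detail` and its `url` argument are placeholders for whatever work the thread should do, and the `results` list and `run_worker` helper are assumptions added only so the example has something to check.

```python
import threading
import time

results = []


def get_detail(url):
    """Placeholder worker: pretend to fetch a page, then record that it finished."""
    time.sleep(0.1)
    results.append("detail of {}".format(url))


def run_worker(url):
    # Pass the function object and a one-element tuple of arguments.
    my_thread = threading.Thread(target=get_detail, args=(url,))
    my_thread.daemon = True   # must be set before start()
    my_thread.start()
    my_thread.join()          # block until the worker is done
    return my_thread.is_alive()


if __name__ == "__main__":
    print(run_worker("http://example.com/1"))  # False: the thread has finished
    print(results)
```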

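2. Subclassing threading.Thread

This is more flexible: the subclass can change attributes of the class, override methods, and carry more complex state.

A thread created from a Thread subclass runs the subclass's run method when start is called, so target and args are no longer needed.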

import time
import threading

class getUrl(threading.Thread):
    def __init__(self, name):
        super().__init__(name=name)

    def run(self):
        print("I am the url")
        time.sleep(4)  # sleep lives in the time module
        print("Bye!")

if __name__ == "__main__":
    my_thread = getUrl("hi")
    start_time = time.time()
    my_thread.start()
    my_thread.join()

    print("Execute time is {}".format(time.time() - start_time))

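3. Communication between threads

Shared variables

Threads can share a variable either by passing the same object to each thread's function or by declaring it global.

The GIL does not make shared variables thread-safe: it can be released between any two bytecode instructions, so an update that takes several instructions can be interleaved with another thread's.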
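As an illustration of the first option, the sketch below hands the same list to a producer thread and a consumer thread. The names `produce`, `consume`, `detail_urls`, and `run` are hypothetical. It relies on `list.append` and `list.pop` each being a single operation in CPython; anything that reads, computes, and writes back in separate steps would need a lock or a `Queue` instead.

```python
import threading


def produce(detail_urls, count):
    """Append `count` fake URLs to the shared list."""
    for i in range(count):
        detail_urls.append("http://example.com/{}".format(i))


def consume(detail_urls, seen):
    """Drain the shared list into `seen`."""
    while detail_urls:
        seen.append(detail_urls.pop())


def run():
    detail_urls = []   # the one list object both threads work on
    seen = []
    producer = threading.Thread(target=produce, args=(detail_urls, 5))
    producer.start()
    producer.join()    # wait, so the consumer sees all five items
    consumer = threading.Thread(target=consume, args=(detail_urls, seen))
    consumer.start()
    consumer.join()
    return detail_urls, seen
```

The producer is joined before the consumer starts only to keep the sketch deterministic; running them at the same time is exactly the situation where a `Queue` is the safer tool.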

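Queue

Instead of sharing a plain list, the threads exchange data through a Queue object: one side adds an item with queue.put, and the other takes an item out with queue.get.

This is thread-safe because the Queue object itself is thread-safe. put checks whether the queue is full and blocks if it is. get cannot go wrong when several threads call it at once, and it also blocks, in its case while the queue is empty. Internally, Queue combines the locking primitives from thread synchronization with a deque (whose append and pop operations are themselves thread-safe). The qsize() method returns the approximate number of items currently in the queue.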

from queue import Queue

# Source code of Queue.get (CPython standard library)
def get(self, block=True, timeout=None):
        '''Remove and return an item from the queue.

        If optional args 'block' is true and 'timeout' is None (the default),
        block if necessary until an item is available. If 'timeout' is
        a non-negative number, it blocks at most 'timeout' seconds and raises
        the Empty exception if no item was available within that time.
        Otherwise ('block' is false), return an item if one is immediately
        available, else raise the Empty exception ('timeout' is ignored
        in that case).
        '''
        with self.not_empty:
            if not block:
                if not self._qsize():
                    raise Empty
            elif timeout is None:
                while not self._qsize():
                    self.not_empty.wait()
            elif timeout < 0:
                raise ValueError("'timeout' must be a non-negative number")
            else:
                endtime = time() + timeout
                while not self._qsize():
                    remaining = endtime - time()
                    if remaining <= 0.0:
                        raise Empty
                    self.not_empty.wait(remaining)
            item = self._get()
            self.not_full.notify()
            return item

# Source code of Queue.put (CPython standard library)
def put(self, item, block=True, timeout=None):
        '''Put an item into the queue.

        If optional args 'block' is true and 'timeout' is None (the default),
        block if necessary until a free slot is available. If 'timeout' is
        a non-negative number, it blocks at most 'timeout' seconds and raises
        the Full exception if no free slot was available within that time.
        Otherwise ('block' is false), put an item on the queue if a free slot
        is immediately available, else raise the Full exception ('timeout'
        is ignored in that case).
        '''
        with self.not_full:
            if self.maxsize > 0:
                if not block:
                    if self._qsize() >= self.maxsize:
                        raise Full
                elif timeout is None:
                    while self._qsize() >= self.maxsize:
                        self.not_full.wait()
                elif timeout < 0:
                    raise ValueError("'timeout' must be a non-negative number")
                else:
                    endtime = time() + timeout
                    while self._qsize() >= self.maxsize:
                        remaining = endtime - time()
                        if remaining <= 0.0:
                            raise Full
                        self.not_full.wait(remaining)
            self._put(item)
            self.unfinished_tasks += 1
            self.not_empty.notify()

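To block the main thread until the work in the queue is finished, call the queue's join method. join returns only when task_done() has been called once for every item that was put, so a consumer has to call task_done() after processing each item it gets. If it never does (for example, a worker that loops forever without calling it), join never returns. The two calls therefore always appear as a pair.

my_queue.task_done()
my_queue.join()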
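The pairing of `task_done()` and `join()` is easier to follow in a complete producer and consumer. The following is a sketch under assumptions: the `consumer` and `run` functions, the doubling of each item, and the `None` sentinel used to stop the worker are all illustrative choices, not part of the original notes.

```python
import threading
from queue import Queue


def consumer(my_queue, results):
    """Take items until the None sentinel arrives; mark each one done."""
    while True:
        item = my_queue.get()      # blocks while the queue is empty
        try:
            if item is None:       # sentinel: no more work
                return
            results.append(item * 2)
        finally:
            my_queue.task_done()   # exactly one task_done() per get()


def run(items):
    my_queue = Queue(maxsize=2)    # put() blocks while two items are waiting
    results = []
    worker = threading.Thread(target=consumer, args=(my_queue, results))
    worker.start()
    for item in items:
        my_queue.put(item)
    my_queue.put(None)
    my_queue.join()                # returns once every item was marked done
    worker.join()
    return results
```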

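4. Thread synchronization

Lock

Only one thread at a time can run a section of code protected by a lock; any other thread that needs the same lock has to wait until it is released.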

from threading import Lock
import threading

total = 0
lock = Lock()

def add():
    global total
    for i in range(1000000):
        lock.acquire()  # take the lock
        total += 1
        lock.release()  # release it so the other thread can proceed

def desc():
    global total
    for i in range(1000000):
        lock.acquire()  # take the lock
        total -= 1
        lock.release()  # release it so the other thread can proceed

thread1 = threading.Thread(target=add)
thread2 = threading.Thread(target=desc)
thread1.start()
thread2.start()
thread1.join()
thread2.join()
print(total)

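# Output: 0

Without the lock, the result can be anywhere between -1000000 and 1000000. In bytecode, total += 1 and total -= 1 each take four steps: load total, load the constant 1, compute, and store the result back into total. If the interpreter switches threads after one thread has loaded total but before it has stored the result, the other thread works on a stale value and one of the two updates is lost. With the lock, each update runs to completion before the other can start, so the result is always the same.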
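The separate load and store steps can be checked with the `dis` module. This is a small sketch: `add_one` and `opnames` are hypothetical names, and the exact opcode names differ between CPython versions, which is why the example only prints them.

```python
import dis

total = 0


def add_one():
    global total
    total += 1


def opnames(func):
    """Return the names of the bytecode instructions that make up `func`."""
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    # Whatever the version, loading `total` (LOAD_GLOBAL) and storing it back
    # (STORE_GLOBAL) are separate instructions, with the addition in between.
    print(opnames(add_one))
```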

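Locks are not perfect, though. They have two main drawbacks: (1) they cost performance, and (2) they can deadlock.

Two simple ways to deadlock:

(1) A function calls acquire more than once on the same lock, and the earlier acquire has not been released before the next one.

(2) Several threads wait for each other:

Thread A (a, b): acquire(a), then acquire(b)

Thread B (a, b): acquire(b), then acquire(a)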
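One common way to avoid the second case is to make every thread take the locks in the same order. The sketch below illustrates that idea only; `lock_a`, `lock_b`, `worker`, and `run` are hypothetical names, and the timeouts on `join` are there so that a mistake would show up as a missing result instead of a hang.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
finished = []


def worker(name, first, second):
    """Take two locks, always in the order the caller passes them."""
    with first:
        with second:
            finished.append(name)


def run():
    # Both threads ask for lock_a before lock_b, so neither can end up holding
    # one lock while waiting for the thread that holds the other.
    t1 = threading.Thread(target=worker, args=("A", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("B", lock_a, lock_b))
    t1.start()
    t2.start()
    t1.join(timeout=5)
    t2.join(timeout=5)
    return sorted(finished)
```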

RLock

A reentrant lock. The same thread can call acquire several times in a row, but the number of release calls must match the number of acquire calls.
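The difference from a plain Lock can be shown without risking a hang by making the second acquire non-blocking. `can_acquire_twice` is a hypothetical helper written for this sketch.

```python
import threading


def can_acquire_twice(lock):
    """Acquire `lock`, then try again from the same thread without blocking."""
    lock.acquire()
    try:
        second = lock.acquire(blocking=False)  # returns at once instead of hanging
        if second:
            lock.release()                     # one release per successful acquire
        return bool(second)
    finally:
        lock.release()


if __name__ == "__main__":
    print(can_acquire_twice(threading.Lock()))   # False
    print(can_acquire_twice(threading.RLock()))  # True
```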

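Condition

A condition variable, used for more complex synchronization where threads have to cooperate.

wait: makes the thread wait until it is notified, and only then continue.

notify: wakes up one thread that is waiting in wait.

Example: synchronizing two threads that hold a conversation. With a plain lock, one side would print all of its lines before the other got a turn.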

import threading


class XiaoAi(threading.Thread):
    def __init__(self, condition):
        super().__init__(name="XiaoAi")
        self.condition = condition

    def run(self):
        # `with` calls the condition's __enter__/__exit__ magic methods;
        # explicit acquire() and release() calls would have the same effect.
        with self.condition:
            self.condition.wait()  # wait for Tmall Genie to speak
            print("{}: Here".format(self.name))
            self.condition.notify()

            self.condition.wait()  # wait for Tmall Genie to speak
            print("{}: Sure".format(self.name))
            self.condition.notify()

            self.condition.wait()  # wait for Tmall Genie to speak
            print("{}: You live at its tail".format(self.name))
            self.condition.notify()

            self.condition.wait()  # wait for Tmall Genie to speak
            print("{}: Yet we drink the same Yangtze water".format(self.name))
            self.condition.notify()

            self.condition.wait()  # wait for Tmall Genie to speak
            print("{}: When will this sorrow end".format(self.name))
            self.condition.notify()

            self.condition.wait()  # wait for Tmall Genie to speak
            print("{}: Then I will never betray your longing".format(self.name))
            self.condition.notify()


class SkyCat(threading.Thread):
    def __init__(self, condition):
        super().__init__(name="Tmall Genie")
        self.condition = condition

    def run(self):
        with self.condition:
            print("{}: XiaoAi".format(self.name))
            self.condition.notify()  # ask XiaoAi to respond
            self.condition.wait()

            print("{}: Let's recite a poem together".format(self.name))
            self.condition.notify()  # ask XiaoAi to respond
            self.condition.wait()

            print("{}: I live at the head of the Yangtze".format(self.name))
            self.condition.notify()  # ask XiaoAi to respond
            self.condition.wait()

            print("{}: Day after day I think of you but never see you".format(self.name))
            self.condition.notify()  # ask XiaoAi to respond
            self.condition.wait()

            print("{}: When will this river stop flowing".format(self.name))
            self.condition.notify()  # ask XiaoAi to respond
            self.condition.wait()

            print("{}: I only wish your heart were like mine".format(self.name))
            self.condition.notify()  # ask XiaoAi to respond
            self.condition.wait()


if __name__ == "__main__":
    con = threading.Condition()
    xiaoai = XiaoAi(con)
    skycat = SkyCat(con)

    # Starting SkyCat first would deadlock: its first notify() would fire
    # before XiaoAi is waiting, and XiaoAi would then wait forever.
    # wait() and notify() may only be called while the condition is held (inside `with`).
    # A Condition involves two kinds of lock. The underlying lock is released when a thread calls wait().
    # In addition, each wait() call creates a new lock, puts it in the condition's
    # waiting queue, and blocks on it until notify() releases it.
    xiaoai.start()
    skycat.start()
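The dependence on start order comes from calling `wait()` without checking any shared state. A possible variation, sketched here as an assumption and not as the method used in the notes, keeps a shared `turn` value and uses `Condition.wait_for`, which re-checks a predicate each time the thread wakes. The `Speaker` class, the `state` dictionary, and `converse` are hypothetical names.

```python
import threading


class Speaker(threading.Thread):
    """Says its lines in order, but only when the shared state says it is its turn."""

    def __init__(self, name, my_turn, lines, state, condition, transcript):
        super().__init__(name=name)
        self.my_turn = my_turn        # 0 speaks first, 1 answers
        self.lines = lines
        self.state = state            # shared dict: {"turn": 0 or 1}
        self.condition = condition
        self.transcript = transcript

    def run(self):
        for line in self.lines:
            with self.condition:
                # wait_for checks the predicate before sleeping, so a notify
                # sent before this thread started waiting is not lost.
                self.condition.wait_for(lambda: self.state["turn"] == self.my_turn)
                self.transcript.append("{}: {}".format(self.name, line))
                self.state["turn"] = 1 - self.my_turn
                self.condition.notify()


def converse():
    condition = threading.Condition()
    state = {"turn": 0}
    transcript = []
    asker = Speaker("Tmall Genie", 0, ["XiaoAi", "Shall we recite a poem?"],
                    state, condition, transcript)
    answerer = Speaker("XiaoAi", 1, ["Here", "Sure"], state, condition, transcript)
    answerer.start()
    asker.start()                     # the start order no longer matters
    asker.join()
    answerer.join()
    return transcript
```

With only two threads, a single `notify()` always reaches the other speaker; with more participants, `notify_all()` would be the safer call.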

Semaphore

A semaphore is a lock with a counter: it limits how many threads may hold it at the same time.

import threading
import time

class HtmlSpider(threading.Thread):
    def __init__(self, url, sem):
        super().__init__()
        self.url = url
        self.sem = sem

    def run(self):
        time.sleep(2)
        print("got html text success")
        self.sem.release()  # every acquire() must be matched by a release()

class UrlProducer(threading.Thread):
    def __init__(self, sem):
        super().__init__()
        self.sem = sem

    def run(self):
        for i in range(20):
            self.sem.acquire()  # each call decrements the counter; blocks once it reaches 0
            html_thread = HtmlSpider("http://www.baidu.com/{}".format(i), self.sem)
            html_thread.start()

if __name__ == "__main__":
    sem = threading.Semaphore(3)  # at most three threads may run at a time
    url_producer = UrlProducer(sem)
    url_producer.start()
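To confirm that the limit really holds, one option is to count how many workers are inside the guarded section at once. In this sketch `fetch`, `run`, and the `stats` dictionary are hypothetical, `time.sleep` stands in for the download, and `with sem:` is used so that the release cannot be forgotten.

```python
import threading
import time


def fetch(sem, stats, lock):
    """Pretend to download a page while holding one of the semaphore's slots."""
    with sem:                          # acquire on entry, release on exit
        with lock:
            stats["running"] += 1
            stats["peak"] = max(stats["peak"], stats["running"])
        time.sleep(0.05)               # stand-in for the real download
        with lock:
            stats["running"] -= 1


def run(n_tasks=10, limit=3):
    sem = threading.Semaphore(limit)
    lock = threading.Lock()            # protects the counters in `stats`
    stats = {"running": 0, "peak": 0}
    threads = [threading.Thread(target=fetch, args=(sem, stats, lock))
               for _ in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats["peak"]
```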

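5. Thread pools (ThreadPoolExecutor)

A thread pool also limits how many threads run at once. In addition, the main thread can find out the state of any task, and get its return value, as soon as the task completes.

The thread pool lives in the concurrent.futures package, which gives multithreaded and multiprocess code the same interface. A Future (a "future object") is the container for a task's eventual result.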

import time
from concurrent.futures import ThreadPoolExecutor, as_completed, wait


def get_html(times):
    time.sleep(times)
    print("get page {} success".format(times))
    return times


executor = ThreadPoolExecutor(max_workers=2)
# submit() hands a function to the pool for execution.
# It returns immediately, without blocking.
# Submitting tasks one at a time:
# task1 = executor.submit(get_html, 3)
# task2 = executor.submit(get_html, 2)

# print(task1.done())  # has the task finished?
# print(task2.cancel())  # cancel a task; call it on the object submit() returned. A task that is running or finished cannot be cancelled
# time.sleep(3)
# print(task1.done())
# print(task1.result())  # the task's return value

# Output:
# False
# False
# get page 2 success
# get page 3 success
# True
# 3

# Getting the results of tasks as they finish
# Submitting tasks in bulk:
urls = [3, 2, 4]
all_tasks = [executor.submit(get_html, url) for url in urls]
for future in as_completed(all_tasks):  # as_completed is a generator
    data = future.result()
    print("get {} page success".format(data))

# Output:
# get page 2 success
# get 2 page success
# get page 3 success
# get 3 page success
# get page 4 success
# get 4 page success

# map() also yields the return values of the tasks
for data in executor.map(get_html, urls):
    print("get {} page success".format(data))

# Output: map yields results in the same order as urls
# get page 2 success
# get page 3 success
# get 3 page success
# get 2 page success
# get page 4 success
# get 4 page success

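wait(): blocks the calling thread until the futures passed to it have completed. It takes a collection of futures (for a single task, a one-element list).

The return_when argument controls when it returns: with FIRST_COMPLETED, the main thread stops blocking as soon as the first task in the collection finishes. The other values are ALL_COMPLETED (the default) and FIRST_EXCEPTION.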
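A short sketch of both modes follows. `get_html` mirrors the placeholder task used earlier, `first_then_all` is a hypothetical helper, and the delays are arbitrary values chosen so that one task clearly finishes first.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED, ALL_COMPLETED


def get_html(seconds):
    """Placeholder task: sleep for `seconds`, then return it."""
    time.sleep(seconds)
    return seconds


def first_then_all(delays):
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        tasks = [executor.submit(get_html, d) for d in delays]
        # Returns as soon as at least one future has finished.
        done_first, _pending = wait(tasks, return_when=FIRST_COMPLETED)
        first_results = sorted(f.result() for f in done_first)
        # ALL_COMPLETED (the default) waits for every future.
        done_all, still_pending = wait(tasks, return_when=ALL_COMPLETED)
        return first_results, len(done_all), len(still_pending)
```

`wait` returns a pair of sets, the finished futures and the unfinished ones, so the caller can decide what to do with tasks that are still running.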

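6. Multiprocess programming

1) Threads versus processes

Use multiple processes for CPU-bound work and multiple threads for I/O-bound work. Switching between processes costs more than switching between threads.

① CPU-bound example: computing the 25th through 34th Fibonacci numbers:

# CPU-bound example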
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fib(n):
    if n <= 2:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)


if __name__ == "__main__":
    with ThreadPoolExecutor(3) as executor:  # ThreadPoolExecutor subclasses Executor and implements the context-manager protocol
        all_tasks = [executor.submit(fib, n) for n in range(25, 35)]
        start_time = time.time()
        for future in as_completed(all_tasks):
            data = future.result()
            print("result is {}".format(data))

        print("Total time is {}".format(time.time() - start_time))

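Output: (the thread-pool timing is not preserved in this text)

When every ThreadPoolExecutor in the code is replaced with ProcessPoolExecutor (a process pool), the reported total time is 1.7055528163909912 seconds. For CPU-bound work, multiple processes are the more efficient choice.

② I/O-bound example:

# I/O-bound example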
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def random_sleep(n):
    time.sleep(n)
    return n


if __name__ == "__main__":
    with ThreadPoolExecutor(3) as executor:  # ThreadPoolExecutor subclasses Executor and implements the context-manager protocol
        all_tasks = [executor.submit(random_sleep, n) for n in [2]*30]
        start_time = time.time()
        for future in as_completed(all_tasks):
            data = future.result()

        print("Total time is {}".format(time.time() - start_time))

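With threads: Total time is 20.04104208946228

With processes: Total time is 20.10332202911377

At this small scale the difference is slight, but it already points the same way: for I/O-heavy work, threads do a little better than processes. Threads are also cheaper for the operating system than processes, and starting too many processes can slow the machine down or even bring it to a halt.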
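Because both executors share one interface, the comparison can be written once and run with either class. The helper below is a sketch: `timed_run` is a hypothetical name, `Executor.map` replaces the `submit` and `as_completed` loop for brevity, and the timings it prints depend entirely on the machine, so none are shown.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fib(n):
    return 1 if n <= 2 else fib(n - 1) + fib(n - 2)


def timed_run(executor_cls, func, args, max_workers=3):
    """Run func over args with the given executor class; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers) as executor:
        results = list(executor.map(func, args))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # The __main__ guard matters for ProcessPoolExecutor: worker processes
    # may import this module, and must not start pools of their own.
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        _, seconds = timed_run(cls, fib, range(25, 30))
        print("{}: {:.2f}s".format(cls.__name__, seconds))
```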

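2) Multiprocess programming

Threads can communicate through global variables, but processes cannot, because each process's data is isolated from every other's.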

① Using the lower-level multiprocessing module:

import time
import multiprocessing  # lower-level than ProcessPoolExecutor



def random_sleep(n):
    time.sleep(n)
    print("sub_process success")
    return n


if __name__ == "__main__":
    process = multiprocessing.Process(target=random_sleep, args=(2,))
    process.start()
    print(process.pid)  # the process ID, which uniquely identifies the process
    process.join()
    print("main process ended")

Output:

61653
sub_process success
main process ended

② ProcessPoolExecutor was covered above.

③ Using multiprocessing.Pool:

import time
import multiprocessing  # lower-level than ProcessPoolExecutor


def random_sleep(n):
    time.sleep(n)
    print("sub_process success")
    return n


if __name__ == "__main__":
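    # Use a process pool
    pool = multiprocessing.Pool(multiprocessing.cpu_count())  # one worker per CPU is usually the most efficient size
    result = pool.apply_async(random_sleep, args=(3,))

    pool.close()  # stop accepting new tasks first; join() raises an error otherwise
    pool.join()  # wait for all tasks to finish
    print(result.get())
    print(result.successful())  # whether the task completed without raising

Output: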

sub_process success
3
True

# Submitting a batch of tasks with imap
import time
import multiprocessing  # lower-level than ProcessPoolExecutor


def random_sleep(n):
    time.sleep(n)
    # print("sub_process success")
    return n


if __name__ == "__main__":
    # Use a process pool
    pool = multiprocessing.Pool(multiprocessing.cpu_count())  # one worker per CPU is usually the most efficient size

    # imap
    for result in pool.imap(random_sleep, [1, 5, 3]):
        print("{} sleep success".format(result))

Output (results come back in the order the tasks were submitted):

1 sleep success
5 sleep success
3 sleep success

With imap_unordered, the output is instead (printed in the order the tasks finish):

1 sleep success
3 sleep success
5 sleep success

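3) Inter-process communication

Sharing global variables works only between threads, not between processes.

Processes can also communicate through a Queue, but not the one used between threads: it has to be the Queue from the multiprocessing module. That Queue, however, cannot be used with a process pool (Pool).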

import time
from multiprocessing import Queue, Process


def producer(queue):
    queue.put("a")
    time.sleep(2)


def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)


if __name__ == "__main__":
    queue = Queue(10)
    my_producer = Process(target=producer, args=(queue,))
    my_consumer = Process(target=consumer, args=(queue,))
    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()

# Output: a

For communication between the processes of a pool, use Manager, which provides its own Queue.

import time
from multiprocessing import Manager, Pool


def producer(queue):
    queue.put("a")
    time.sleep(2)


def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)


if __name__ == "__main__":
    queue = Manager().Queue(10)
    pool = Pool(2)
    pool.apply_async(producer, args=(queue,))
    pool.apply_async(consumer, args=(queue,))
    pool.close()
    pool.join()

# Output: a

Pipe is a simplified Queue. A Pipe can only connect two processes, but it performs better than a Queue:

import time
from multiprocessing import Pipe, Process


def producer(pipe):
    pipe.send("Lil_Hoe")


def consumer(pipe):
    print(pipe.recv())


if __name__ == "__main__":
    recv_pipe, send_pipe = Pipe()

    my_producer = Process(target=producer, args=(send_pipe,))
    my_consumer = Process(target=consumer, args=(recv_pipe,))
    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()

# Output: Lil_Hoe

Processes can also share variables through Manager:

import time
from multiprocessing import Manager, Process


def add_data(p_dict, key, value):
    p_dict[key] = value


if __name__ == "__main__":
    process_dict = Manager().dict()

    first_process = Process(target=add_data, args=(process_dict, "bobby1", 22))
    second_process = Process(target=add_data, args=(process_dict, "bobby2", 33))

    first_process.start()
    second_process.start()

    first_process.join()
    second_process.join()

    print(process_dict)  # both processes modified the same dict

# Output: {'bobby1': 22, 'bobby2': 33}

These notes are based on: https://www.bilibili.com/video/BV17p4y117N6?p=65

Advanced Python Programming: Multithreading and Multiprocessing
