Python Multithreading and Multiprocessing (Part 1): It's That Simple

"""

<axiner> Disclaimer:
(If I got something wrong, please don't hit me)
(If you find any mistakes, please remember to point them out. Thanks!!!)

"""

First, a concept: what is the GIL? GIL stands for Global Interpreter Lock.

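Execution of Python code is controlled by the Python virtual machine (also called the interpreter main loop; this describes CPython). From the start, Python was designed so that only one thread executes in the interpreter main loop at a time, i.e. at any given moment only one thread is running in the interpreter. Access to the Python virtual machine is controlled by the Global Interpreter Lock (GIL), and it is this lock that guarantees only one thread runs at a time.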

In other words, CPython threads never execute Python bytecode truly in parallel.....

So why do we still bother with multithreaded programming?

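Because the GIL gets released: the thread holding the GIL does not keep it until it finishes. It releases the GIL at certain points during execution, and at that moment other threads get a chance to acquire it.....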

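Internally, CPython releases the GIL in several situations:

1. When the time slice runs out (Python 3.2+: the "switch interval", 5 ms by default, see sys.getswitchinterval())
2. After a certain number of bytecode instructions (Python 2: every 100 "ticks" by default, see sys.setcheckinterval())
3. During blocking I/O operations (files, sockets, time.sleep(), etc.)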
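A minimal sketch of the I/O case, assuming Python 3.2 or later: time.sleep() stands in for blocking I/O and releases the GIL while waiting, so five threads that each sleep 1 second should finish in roughly 1 second in total rather than 5. The io_task function is a hypothetical placeholder.

```python
import sys
import threading
import time

# Python 3.2+: how often (in seconds) the running thread is asked to drop the GIL
print("switch interval:", sys.getswitchinterval())  # 0.005 unless changed

def io_task():
    # hypothetical I/O-bound job; sleep() releases the GIL just like blocking I/O
    time.sleep(1)

start = time.perf_counter()
threads = [threading.Thread(target=io_task) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print("5 threads took {:.2f}s".format(elapsed))  # about 1s, not 5s
```

Replacing time.sleep(1) with a pure-Python CPU-bound loop would not show this speedup, because only one thread can execute bytecode at a time.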

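Ways to create a thread:

from threading import Thread

   1. Pass the target function as an argument
       thread = Thread(target=func, args=())  # func(*args) runs in the new thread
   2. Subclass Thread
       class MyThread(Thread):
           def __init__(self):
               super().__init__()  # the parent initializer must be called
           def run(self):
               # override run(); it is called in the new thread after start()
               pass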
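Both approaches can be sketched in runnable form; func, MyThread and the results dict are hypothetical names used only for illustration.

```python
import threading

results = {}  # collects each thread's output for this demo (one key per thread)

def func(name, n):
    # Way 1: pass the callable and its arguments to Thread(target=..., args=...)
    results[name] = sum(range(n))

class MyThread(threading.Thread):
    # Way 2: subclass Thread and override run()
    def __init__(self, name, n):
        super().__init__(name=name)  # must call the parent __init__
        self.n = n

    def run(self):
        results[self.name] = sum(range(self.n))

t1 = threading.Thread(target=func, args=("t1", 10))
t2 = MyThread("t2", 100)
for t in (t1, t2):
    t.start()  # start() launches the thread, which then calls run()
for t in (t1, t2):
    t.join()   # wait for both threads to finish
print(results)
```

Note that you call start(), not run(): calling run() directly would just execute it in the current thread.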

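thread = Thread(target=func)
thread.daemon = True  # daemon thread: it is killed when the main program exits; must be set before start() (older code uses thread.setDaemon(True), deprecated since Python 3.10)
thread.start()
thread.join()  # block the calling thread until this thread finishes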
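A small sketch contrasting a daemon thread with join(), assuming Python 3.3+ for the daemon= keyword argument. background() and worker() are hypothetical functions.

```python
import threading
import time

def background():
    # runs forever; as a daemon it does not keep the program alive
    while True:
        time.sleep(0.1)

def worker(out):
    time.sleep(0.2)
    out.append("worker done")

out = []
bg = threading.Thread(target=background, daemon=True)  # same as setting bg.daemon = True before start()
bg.start()

t = threading.Thread(target=worker, args=(out,))
t.start()
t.join()  # the main thread blocks here until worker() returns
print(out, bg.is_alive())  # ['worker done'] True
```

When the main program ends, the still-running daemon thread is stopped abruptly, so daemon threads should not hold resources that need a clean shutdown.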

---------
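Inter-thread communication:

1. Global shared variables: prone to race conditions, not thread-safe
2. queue: thread-safe (from queue import Queue)

Note:
queue.join()  # blocks until every item that was put() has been processed; each get() must be matched by one queue.task_done() call (they come in pairs), otherwise join() never returns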
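A minimal producer/consumer sketch with queue.Queue, showing how task_done() pairs with get() so that join() can return. The consumer function and the processed list are illustrative.

```python
import threading
from queue import Queue

q = Queue()
processed = []

def consumer():
    while True:
        item = q.get()             # blocks until an item is available
        processed.append(item * 2)
        q.task_done()              # exactly one task_done() per get()

# daemon, so the endless consumer loop does not prevent the program from exiting
threading.Thread(target=consumer, daemon=True).start()

for i in range(5):                 # the main thread acts as the producer
    q.put(i)

q.join()                           # returns once task_done() has been called for all 5 items
print(processed)                   # [0, 2, 4, 6, 8]
```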

---------
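Inter-thread synchronization:

   1. Lock, RLock
   Usage: wrap the code block that needs protection in a lock: acquire() to take the lock, release() to release it (or simply use with lock:)
   Drawbacks of locks:
       1. Locks hurt performance;
       2. Locks can cause deadlocks.
       Common causes of deadlock: a. calling acquire() multiple times without release(); b. threads waiting on each other, each holding a lock the other needs; c. a function called inside a locked block tries to acquire the same lock again (use RLock in this case)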
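A sketch of both cases: a plain Lock keeping a shared counter correct, and an RLock being re-acquired by the thread that already holds it (cause c.). counter, add, inner and outer are hypothetical names.

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        with lock:          # acquire() on entry, release() on exit
            counter += 1    # read-modify-write is not atomic without the lock

threads = [threading.Thread(target=add, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)              # 400000

# RLock: the thread that holds it may acquire it again
rlock = threading.RLock()

def inner():
    with rlock:             # with a plain Lock this second acquire would deadlock
        return "inner ok"

def outer():
    with rlock:
        return inner()

print(outer())              # inner ok
```

An RLock must be released as many times as it was acquired; the with statements take care of that here.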

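   2. Condition (condition variable)
   from threading import Condition

   Example: a dialogue between A and B (A speaks first)
   from threading import Thread, Condition

   class A(Thread):
       def __init__(self, cond):
           super().__init__(name="A")
           self.cond = cond
       def run(self):
           with self.cond:  # acquire the condition's underlying lock
               print("{}: hello!".format(self.name))
               self.cond.notify()  # wake up a thread waiting on this condition
               self.cond.wait()    # release the lock and block until notified

   class B(Thread):
       def __init__(self, cond):
           super().__init__(name="B")
           self.cond = cond
       def run(self):
           with self.cond:  # acquire the condition's underlying lock
               self.cond.wait()  # wait for A to speak first
               print("{}: hi!".format(self.name))
               self.cond.notify()  # wake A; B's run() then ends and releases the lock

   if __name__ == "__main__":
       cond = Condition()
       b, a = B(cond), A(cond)
       b.start()  # B must already be waiting when A calls notify(), otherwise the notification is lost
       a.start()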
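The dialogue above only works if B is already waiting when A calls notify(); a notification sent while nobody is waiting is simply lost. A more robust sketch, assuming a hypothetical shared turn variable, uses Condition.wait_for() so each thread waits on a predicate instead of on a single notification:

```python
import threading

cond = threading.Condition()
turn = "A"  # hypothetical shared state: whose turn it is to speak
log = []    # collects the dialogue so the order can be checked

def speaker(name, other, lines):
    global turn
    for line in lines:
        with cond:
            # wait_for() checks the predicate first and re-checks it after every
            # wake-up, so a notify() sent before this thread waited is not lost
            cond.wait_for(lambda: turn == name)
            log.append("{}: {}".format(name, line))
            turn = other        # hand the turn to the other speaker
            cond.notify_all()   # wake whoever is waiting for their turn

a = threading.Thread(target=speaker, args=("A", "B", ["hello!", "how are you?"]))
b = threading.Thread(target=speaker, args=("B", "A", ["hi!", "fine, thanks."]))
a.start()  # the start order no longer matters
b.start()
a.join()
b.join()
print("\n".join(log))
```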

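   3. Semaphore
   # A lock that limits how many threads may enter a section at the same time
   # (e.g. for file access, usually only one thread writes, while several threads may read)
   Example: fetching URLs and parsing the pages
   import time
   from threading import Thread, Semaphore

   class HtmlSpider(Thread):
       def __init__(self, url, sem):
           super().__init__()
           self.url = url
           self.sem = sem
       def run(self):
           time.sleep(2)  # simulate downloading and parsing self.url
           print("success: {}".format(self.url))
           self.sem.release()  # release the slot when finished

   class UrlProducer(Thread):
       def __init__(self, sem):
           super().__init__(name="UrlProducer")
           self.sem = sem
       def run(self):
           for i in range(20):
               self.sem.acquire()  # blocks while 3 spiders are already running
               html_thread = HtmlSpider("http://www.foo/{0}".format(i), self.sem)
               html_thread.start()

   if __name__ == "__main__":
       sem = Semaphore(3)  # at most 3 spider threads run at the same time
       url_producer = UrlProducer(sem)
       url_producer.start()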
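The same limit can be written with the semaphore as a context manager, which releases the slot even if the work raises an exception. In this sketch, fetch() and the active/max_active counters are hypothetical, added only to check that no more than 3 threads are inside at once:

```python
import threading
import time

sem = threading.Semaphore(3)   # at most 3 threads inside the guarded block
state_lock = threading.Lock()  # protects the two counters below
active = 0
max_active = 0

def fetch(i):
    global active, max_active
    with sem:  # acquire() on entry, release() on exit, even on exceptions
        with state_lock:
            active += 1
            max_active = max(max_active, active)
        time.sleep(0.05)  # simulate downloading page i (hypothetical workload)
        with state_lock:
            active -= 1

threads = [threading.Thread(target=fetch, args=(i,)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("max concurrent fetches:", max_active)  # never more than 3
```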

---------
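About thread pools

Why use a thread pool?
   1. It limits how many threads run concurrently (Semaphore can do this too)
   2. You can check each task's status and get its return value
   3. When a task finishes, the main thread can find out immediately
   (4. futures gives multithreading and multiprocessing a consistent interface)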
from concurrent import futures

Example:
# Simulate fetching HTML
import time
from concurrent.futures import ThreadPoolExecutor

def get_html(t):
    time.sleep(2)
    print("get page success: {0}".format(t))
    return t

executor = ThreadPoolExecutor(max_workers=2)  # at most 2 worker threads

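1. Submitting tasks
# 1 - submit tasks one at a time
task1 = executor.submit(get_html, 2)  # extra positional arguments are passed on to get_html
task2 = executor.submit(get_html, 4)

# 2 - submit tasks in bulk
ts = [2, 4]
all_task = [executor.submit(get_html, t) for t in ts]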

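# Summary
executor.submit()  # submits a callable to the thread pool; non-blocking, immediately returns a Future

Also (these are methods of the Future returned by submit(), not of the executor):
task1.done()    # check whether the task has finished
task1.result()  # get the task's return value; blocks until it is available
task1.cancel()  # cancel a task that has not started running yet; returns False if it is already running or finished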
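A short sketch of the three Future methods, assuming a single-worker pool so the second task is still queued (and therefore cancellable) while the first one runs. The sleep durations are arbitrary:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def get_html(t):
    time.sleep(t)  # simulate fetching a page
    return t

executor = ThreadPoolExecutor(max_workers=1)  # one worker: tasks run one at a time
task1 = executor.submit(get_html, 0.5)  # returns a Future immediately
task2 = executor.submit(get_html, 0.5)  # waits in the queue behind task1

print(task1.done())    # False: task1 is still sleeping
print(task2.cancel())  # True: task2 has not started, so it can be cancelled
print(task1.result())  # blocks about 0.5 s, then prints 0.5
print(task1.done())    # True
executor.shutdown()
```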

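from concurrent.futures import wait
wait(all_task)  # block the main thread until the given futures have completed
wait() has two parameters you can set: timeout and return_when.
timeout is the maximum number of seconds wait() waits before returning (by default it waits indefinitely).
return_when decides when the call returns: with the default ALL_COMPLETED, it blocks until all of the given tasks have finished; with FIRST_COMPLETED, it returns as soon as any one of them finishes, without waiting for the rest. wait() returns two sets, (done, not_done).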
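A sketch of wait() with both return_when values and a timeout; the sleep times are arbitrary and chosen only so the tasks finish at clearly different moments:

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def get_html(t):
    time.sleep(t)  # simulate a page that takes t seconds to fetch
    return t

with ThreadPoolExecutor(max_workers=3) as executor:
    all_task = [executor.submit(get_html, t) for t in (0.1, 1, 1.5)]

    # FIRST_COMPLETED: return as soon as any one future is done
    done, not_done = wait(all_task, return_when=FIRST_COMPLETED)
    print(len(done), len(not_done))  # 1 2

    # timeout: stop waiting after 0.3 s even though tasks are still running
    done, not_done = wait(all_task, timeout=0.3)
    print(len(done), len(not_done))  # the 1 s and 1.5 s tasks are not finished yet

    # default ALL_COMPLETED, no timeout: block until every future is done
    done, not_done = wait(all_task)
    print(len(done), len(not_done))  # 3 0
```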

2. Getting results
from concurrent.futures import as_completed
# Option 1: as_completed
for future in as_completed(all_task):
    data = future.result()
    print("get success page: {0}".format(data))

------------
Option 2: submit tasks and get results in one step
for data in executor.map(get_html, ts):
    print(data)

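# as_completed vs executor.map
as_completed is a function in concurrent.futures; it yields Future objects in the order the tasks complete
map is a method of ThreadPoolExecutor (inherited from Executor); it yields the results directly, in the order the tasks were submitted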
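A sketch that makes the ordering difference visible: with sleep times submitted as [0.3, 0.1, 0.2], map() gives results in that submission order, while as_completed() gives them roughly in the order they finish. The durations are arbitrary:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def get_html(t):
    time.sleep(t)  # simulate a page that takes t seconds to fetch
    return t

ts = [0.3, 0.1, 0.2]
with ThreadPoolExecutor(max_workers=3) as executor:
    # map: yields results (not Futures) in submission order
    map_order = list(executor.map(get_html, ts))
    # as_completed: yields Futures in completion order
    futures = [executor.submit(get_html, t) for t in ts]
    done_order = [f.result() for f in as_completed(futures)]

print(map_order)   # [0.3, 0.1, 0.2], always the order of ts
print(done_order)  # typically [0.1, 0.2, 0.3]
```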