py knowledge (updated daily) 7.26

Blocking, non-blocking, asynchronous, synchronous, and coroutines

1. Blocking and non-blocking
1.1 The states a process or thread passes through while running:
① blocked
② running
③ ready
1.2 Blocking:
when a process or thread hits blocking IO, the program stops (hangs) immediately and the CPU switches away at once; execution resumes only after the IO wait ends.
1.3 Non-blocking:
the process or thread either performs no IO, or, when it does hit IO, uses some means to let the CPU work on other tasks, occupying the CPU as much as possible.
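A minimal sketch of the difference (my addition, not from the original notes), using a socket; the server address is hypothetical:

import socket

sock = socket.socket()
sock.connect(('127.0.0.1', 8080))  # hypothetical server address

data = sock.recv(1024)     # blocking: the thread hangs here until data arrives

sock.setblocking(False)    # switch the socket to non-blocking mode
try:
    data = sock.recv(1024)  # returns immediately; raises if no data is ready
except BlockingIOError:
    pass                    # the CPU is free to do other work and retry later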
2. Synchronous and asynchronous
(viewed from the standpoint of how tasks are issued)
2.1 Synchronous
This can be seen in two situations:
① an indirect, mutually constraining relationship between processes or threads.
For example a printer: after process A obtains the printer, process B is blocked and must wait; only after A releases the printer does process B enter the ready state and wait for the CPU to run it.
② a direct, mutually constraining relationship between processes or threads.
This case stems from a cooperative relationship between the processes or threads. For example, process A sends messages to process B through a buffer. When the buffer is empty, B blocks because it cannot obtain the data it needs, and only when A puts data into the buffer, so that the buffer is no longer empty, is process B woken up. Conversely, when the buffer is full, A blocks because it cannot place data into it, and only when process B takes data out of the buffer, so that the buffer is no longer full, is process A woken up.
Conclusion: synchronous and blocking are equivalent in some respects. If two processes are in a cooperative relationship and one of them stops working or is blocked for some reason, the other will be driven down the road to blocking as well. (A minimal sketch of the buffer behaviour follows below.)
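The buffer example can be imitated with a bounded queue.Queue between two threads; a minimal sketch (my illustration, not from the original notes):

import queue
import threading
import time

buf = queue.Queue(maxsize=1)  # the bounded buffer between A and B

def process_a():              # the sender
    for i in range(3):
        buf.put(i)            # blocks while the buffer is full
        print('A produced', i)

def process_b():              # the receiver
    for _ in range(3):
        time.sleep(1)         # a slow receiver forces the sender to wait
        print('B consumed', buf.get())  # blocks while the buffer is empty

threading.Thread(target=process_a).start()
threading.Thread(target=process_b).start()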
2.2 Asynchronous:
asynchronous mode does not block the current process or thread while it waits for a result; it lets subsequent operations proceed until some other process or thread finishes the work and returns a result, at which point the waiting process or thread is notified to receive it. From this point of view, asynchronous and non-blocking may be equivalent in some respects.
For example, when a crawler crawls images: the main thread A performs data analysis and other work, while thread B crawls the web page information. Thread A is not blocked while it has yet to receive the page information from thread B; when B finishes crawling, it sends the page information and notifies A to analyze the data.
2.3 Asynchronous calls + a callback mechanism
Crawlers:
What the browser does is actually very simple:
the browser packages the request headers and sends a request ---> www.taobao.com (127.42.34.56) ---> server
the server receives the request, parses it, and if it is valid ----> returns you a file ---> the browser renders the file's code into what you see on screen.
A crawler: use the requests module to simulate the browser and package the request headers, then send a request to the server; once the server is fooled, it returns a file, and your crawler takes that file and cleans the data to extract the information you want.
A crawler works in two steps:
Step 1: crawl the file from the server (IO-blocking).
Step 2: take the file and analyze the data (no IO, or very little IO).

import requests
from concurrent.futures import ProcessPoolExecutor
import time
import random
import os

def get(url):
    response = requests.get(url)
    print(f'{os.getpid()} is crawling: {url}')
    time.sleep(random.randint(1, 3))
    if response.status_code == 200:
        return response.text

def parse(text):
    '''
    Analyze the crawled string.
    Simulated simply with len().
    :param text:
    :return:
    '''
    print(f'{os.getpid()} analysis result: {len(text)}')


if __name__ == '__main__':
    url_list = [
        'http://www.taobao.com',
        'http://www.JD.com',
        'http://www.JD.com',
        'http://www.JD.com',
        'http://www.baidu.com',
        'https://www.cnblogs.com/jinxin/articles/11232151.html',
        'https://www.cnblogs.com/jinxin/articles/10078845.html',
        'http://www.sina.com.cn',
        'https://www.sohu.com',
        'https://www.youku.com',
    ]
    pool = ProcessPoolExecutor(4)
    obj_list = []
    for url in url_list:
        obj = pool.submit(get, url)
        obj_list.append(obj)
    pool.shutdown(wait=True)
    for obj in obj_list:
        parse(obj.result())
        
'''
Serial:
obj_list[0].result()
obj_list[1].result()
obj_list[2].result()
obj_list[3].result()
obj_list[4].result()
'''
Problems with this version:
  1. The analysis of the results is serial, so efficiency is low.
  2. Analysis only begins after every crawl has succeeded and all the results have been collected in a list.

Opening a process pool, and then the processes inside it, is resource-intensive.
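Since the crawl step is almost pure IO, a thread pool would avoid that cost; a one-line variant (my suggestion, not from the original notes):

from concurrent.futures import ThreadPoolExecutor
pool = ThreadPoolExecutor(4)  # drop-in replacement for ProcessPoolExecutor(4) above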

'''
Crawling one page takes 2s, so crawling 10 pages concurrently takes a little over 2s.
Analysis: 1s per task, so 10 tasks take 10s serially; in total a little over 12s.
The current version of the program:
    asynchronously issues the 10 page-crawling tasks, and 4 processes concurrently (in parallel) work on 4 crawling tasks at a time; whichever finishes first takes the next crawling task, until all 10 crawls have succeeded;
    then the 10 crawl results are collected in a list and analyzed serially.
The next version of the program:
    asynchronously issues 10 crawl-the-page + analyze tasks, and 4 processes concurrently (in parallel) work on 4 crawl + analyze tasks at a time; whichever finishes first takes the next crawl + analyze task, until all 10 crawl + analyze tasks have completed successfully.
    Crawling one page takes 2s and analysis 1s, so the total is a little over 3s (plus the cost of opening the processes).
'''
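One common way to write that next version is add_done_callback on the futures, so each result is parsed the moment its crawl finishes; a hedged sketch (my reconstruction, not code from the original post). Note that with a ProcessPoolExecutor the callback itself runs back in the main process:

import requests
from concurrent.futures import ProcessPoolExecutor
import os

def get(url):
    response = requests.get(url)
    print(f'{os.getpid()} is crawling: {url}')
    if response.status_code == 200:
        return response.text

def parse(future):            # the callback receives the finished future
    text = future.result()
    if text:
        print(f'{os.getpid()} analysis result: {len(text)}')

if __name__ == '__main__':
    url_list = ['http://www.taobao.com', 'http://www.baidu.com']
    pool = ProcessPoolExecutor(4)
    for url in url_list:
        pool.submit(get, url).add_done_callback(parse)
    pool.shutdown(wait=True)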

3. Thread queues
3.1 Queue (FIFO: first in, first out)

import queue
q = queue.Queue(3)
q.put(1)
q.put(2)
q.put('太白')
print(q.get())  # 1
print(q.get())  # 2
print(q.get())  # 太白
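A small extra sketch (my addition, not in the original notes): put and get block when the queue is full or empty, and both accept a timeout:

import queue

q = queue.Queue(3)
q.put(1)
q.put(2)
q.put(3)
try:
    q.put(4, timeout=2)  # the queue is full: wait at most 2s, then raise queue.Full
except queue.Full:
    print('the queue is full')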
3.2 Stack (LIFO: last in, first out)
import queue
q = queue.LifoQueue()
q.put(1)
q.put(3)
q.put('barry')
print(q.get()) # barry
print(q.get()) # 3
print(q.get()) # 1

3.3 Priority queue
The items must be tuples of the form (int, data): the int represents the priority, and the lower the number, the higher the priority.

import queue
q = queue.PriorityQueue(3)
q.put((10, 'junk message'))
q.put((-9, 'urgent message'))
q.put((3, 'normal message'))
print(q.get())  # (-9, 'urgent message')
print(q.get())  # (3, 'normal message')
print(q.get())  # (10, 'junk message')
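Worth knowing (my addition, not in the original notes): when priorities are equal, the tuples fall back to comparing the data, so the items must be comparable with each other:

import queue

q = queue.PriorityQueue()
q.put((3, 'b'))
q.put((3, 'a'))
print(q.get())  # (3, 'a'): equal priorities are broken by comparing the data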
4. Event
Tasks execute concurrently (multi-thread or multi-process, almost simultaneously), and a thread that has reached a certain point in its execution can notify another thread to start.

import time
from threading import Thread
from threading import current_thread
from threading import Event

event = Event()  # instantiate the object; the flag defaults to False

def task():
    print(f'{current_thread().name} is checking whether the server is up....')
    time.sleep(3)
    event.set()  # set the flag to True

def task1():
    print(f'{current_thread().name} is trying to connect to the server')
    event.wait()  # block until the event flag becomes True, then run the next line
    # event.wait(1)  # with a 1s timeout: if the event becomes True within 1s, the
    # code continues at once; if 1s passes with no change, it continues anyway
    print(f'{current_thread().name} connected successfully')

if __name__ == '__main__':
    t1 = Thread(target=task1)
    t2 = Thread(target=task1)
    t3 = Thread(target=task1)
    t = Thread(target=task)
    t.start()
    t1.start()
    t2.start()
    t3.start()
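A hedged variant of task1 (my sketch, building on the event.wait(timeout) note above): retry the connection a few times instead of blocking forever.

def task1():
    count = 1
    while not event.is_set():      # is_set() reports the flag without blocking
        if count > 3:
            print(f'{current_thread().name} gave up after 3 attempts')
            return
        event.wait(1)              # wait at most 1s per attempt
        count += 1
    print(f'{current_thread().name} connected successfully')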

5. First look at coroutines
Concurrency within a single thread.
Concurrent, parallel, serial:
Serial: multiple tasks run one after another; the first task executes from the start, blocking and waiting whenever it hits IO, and only after it finishes does the next task begin.
Parallel: multiple cores; multiple threads or processes execute simultaneously, e.g. four CPUs executing four tasks at the same time.
Concurrent: multiple tasks look as if they run simultaneously; the CPU switches back and forth among them (when it meets IO blocking, or when a compute-intensive task has run too long).
The essence of concurrency:

  1. Switch when IO blocking is met, or when a compute-intensive task has executed too long.
  2. Save the original state.

Concurrency within a single thread:
    Multi-process: the operating system controls the switching + state-keeping of the tasks of multiple processes.
    Multi-thread: the operating system controls the switching + state-keeping of the tasks of multiple threads.
    Coroutine: the program itself controls the switching + state-keeping of multiple tasks within one thread.
    A coroutine is concurrency in miniature; the tasks it handles should not be too heavy.
    A coroutine schedules the CPU itself: if a task it controls meets blocking, it switches to another task quickly (faster than the operating system would) and can suspend the blocked task, so that as far as the operating system sees, the CPU is always working.
    Have we met a coroutine before? yield is one: although yield can switch two tasks back and forth and save their state within a single thread, it only switches when it encounters yield; it does not switch when it meets IO or blocking.
Verifying that yield switches between two tasks and saves state:

import time
def func1():
    for i in range(11):
        yield
        print('this is my print number %s' % i)
        time.sleep(1)
def func2():
    g = func1()
    # next(g)
    for k in range(10):
        print('haha, I printed for the %sth time' % k)
        time.sleep(1)
        next(g)
func2()

Without yield, the two tasks would run one after the other: all of func1 would execute before any of func2. With yield, we achieve switching + state-saving between the two tasks.
Compute-intensive tasks: comparing the efficiency of serial execution and a yield coroutine.

Serial:

import time
def task1():
    res = 1
    for i in range(1, 100000):
        res += i
def task2():
    res = 1
    for i in range(1, 100000):
        res -= i
start_time = time.time()
task1()
task2()
print(f'serial time: {time.time()-start_time}')  # serial time: 0.012000560760498047

Coroutine (yield):

import time
def task1():
    res = 1
    for i in range(1, 100000):
        res += i
        yield res
def task2():
    g = task1()
    res = 1
    for i in range(1, 100000):
        res -= i
        next(g)
start_time = time.time()
task2()
print(f'coroutine time: {time.time() - start_time}')  # coroutine time: 0.0260012149810791

For pure computation the yield version is actually slower than the serial one (0.026s vs 0.012s above): the constant switching costs time. The point of coroutines is IO-bound concurrency.
The advantages of coroutines:
#1. Coroutine switching is cheaper: it is program-level switching that the operating system cannot even perceive, so it is far more lightweight.
#2. Concurrency can be achieved inside a single thread, making maximal use of the CPU.
#3. Modifying shared data needs no locks.
Multithreaded concurrency: if one process opens 4 threads, it can handle at most, say, 30 tasks.
Multi-coroutine concurrency: one process opens 4 threads, each of the 4 threads runs coroutines, and each thread's coroutines can execute 30 tasks: 120 tasks. (For general awareness.)

6. Two ways of writing coroutines besides yield
6.1 greenlet and switch

from greenlet import greenlet
import time

# greenlet cannot switch automatically and does not switch on IO,
# but it does preserve each task's state.
def eat(name):
    print('%s eat 1' % name)   # 2
    g2.switch('alex')          # 3 (the first switch to a task must pass its arguments)
    time.sleep(3)
    print('%s eat 2' % name)   # 6
    g2.switch()                # 7
def play(name):
    print('%s play 3' % name)  # 4
    g1.switch()                # 5
    print('%s play 4' % name)  # 8
g1 = greenlet(eat)
g2 = greenlet(play)
g1.switch('太白')  # 1 (the first switch to a task must pass its arguments)
  
6.2 gevent and monkey
import threading
from gevent import monkey
monkey.patch_all()  # mark all the IO in your code so gevent can recognize it
import gevent  # just import it directly
import time

def eat():
    print(f'thread 1: {threading.current_thread().getName()}')
    print('eat food 1')
    time.sleep(3)  # with monkey patching, gevent can recognize time.sleep as IO
    print('eat food 2')
def play():
    print(f'thread 2: {threading.current_thread().getName()}')
    print('play 1')
    time.sleep(1)  # switch back and forth until each IO wait ends; the switching
                   # is done by gevent here, no longer by the OS we cannot control
    print('play 2')
g1 = gevent.spawn(eat)  # submit a task with spawn
g2 = gevent.spawn(play)
gevent.joinall([g1, g2])  # wait for all of the tasks to finish
print(f'main: {threading.current_thread().getName()}')
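Tying the two halves of these notes together, the same idea applies to the crawler from section 2; a minimal sketch (my addition, assuming requests and gevent are installed):

from gevent import monkey
monkey.patch_all()  # patch IO before importing requests
import gevent
import requests

def get(url):
    response = requests.get(url)
    print(url, len(response.text))

url_list = ['http://www.taobao.com', 'http://www.baidu.com', 'https://www.sohu.com']
tasks = [gevent.spawn(get, url) for url in url_list]
gevent.joinall(tasks)  # while one request waits on the network, gevent runs another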

7.1 Summary: comparing processes, threads, and coroutines
7.1.1 Process
Characteristics: large overhead; data is isolated, so data is safe; can use multiple cores; operating-system level; the smallest unit of resource allocation.
7.1.2 Thread
Characteristics: small overhead; data is shared, so data is unsafe; can use multiple cores; operating-system level; the smallest unit of CPU scheduling.

7.1.3 Coroutine
Characteristics: small overhead; data is shared, yet data is safe (a single thread, switched by the program itself); cannot use multiple cores; user (program) level.


Origin www.cnblogs.com/lyoko1996/p/11328878.html