Python thread pool, custom exceptions, and request disguise

Thread pool concept:

A thread pool holds a fixed number of worker threads. When more tasks are submitted than there are workers, the extra tasks wait in a queue and are handed to a worker as soon as one finishes its current task. The pool can therefore run several tasks at the same time, and because the same threads are reused, it avoids the cost of repeatedly creating and destroying threads and conserves system resources.
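The queueing and thread reuse described above can be observed with a short sketch. The task name record_thread, the pool size of 2, and the 0.1-second sleep are illustrative choices, not part of the original code.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def record_thread(index):
    # Hypothetical task: sleep briefly, then report which worker thread ran it.
    time.sleep(0.1)
    return index, threading.current_thread().name


start = time.perf_counter()
# Six tasks are submitted to a pool of two workers, so most of them wait in the queue.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(record_thread, i) for i in range(6)]
    results = [f.result() for f in futures]
elapsed = time.perf_counter() - start

# Collect the distinct worker thread names that actually ran the tasks.
thread_names = {name for _, name in results}
print(f'{len(results)} tasks ran on {len(thread_names)} threads in {elapsed:.2f}s')
```

With two workers and six 0.1-second tasks, the run should take roughly 0.3 seconds rather than 0.6, and the set of thread names never grows beyond two.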

1. Ordinary code runs its tasks on a single thread. As in most other languages, the statements execute from top to bottom.

import time


def test_data(index):
    # Simulate a slow task, then fail on even indices.
    time.sleep(5)
    if index % 2 == 0:
        print(f'{index} failed.')
        raise Exception('I raised an error')
    print(f'{index} finished.')


# Run the tasks one after another on the main thread.
for i in range(1, 50):
    test_data(i)
  • Output:

1 finished.
Traceback (most recent call last):
  File "/Users/peakchao/Code/Py/ReptileForPython/pool_test.py", line 15, in <module>
    test_data(i)
  File "/Users/peakchao/Code/Py/ReptileForPython/pool_test.py", line 11, in test_data
    raise Exception('I raised an error')
Exception: I raised an error
2 failed.

Analysis: The loop calls test_data once per iteration. Each call sleeps for five seconds and then takes its argument modulo 2; if the remainder is 0, it raises an exception. With i = 1 the call completes normally. On the second iteration, i = 2, the remainder is 0, the exception is raised, and because nothing catches it the program terminates.

Note: Because the calls run one after another, a line is printed once every 5 seconds.

2. Multi-threaded code runs its tasks on several threads at once. Most languages provide a similar thread pool implementation.

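import time
from concurrent.futures import ThreadPoolExecutor

# A pool with two worker threads: at most two tasks run at the same time.
pool = ThreadPoolExecutor(max_workers=2)


def test_data(index):
    time.sleep(5)
    if index % 2 == 0:
        print(f'{index} failed.')
        raise Exception('I raised an error')
    print(f'{index} finished.')


# submit() returns immediately; tasks beyond the two running ones wait in the queue.
for i in range(0, 50):
    pool.submit(test_data, i)
  • Output:

1 finished.0 failed.

3 finished.
2 failed.
4 failed.
5 finished.
6 failed.
7 finished.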

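Analysis: Two lines are printed at the same time, which shows that two tasks are running concurrently. The exceptions raised by the even-numbered tasks do not stop the program, because each one is stored in the Future returned by submit rather than propagated to the main thread. If we set the number of worker threads to n, tasks like these, which spend their time waiting, complete roughly n times faster than on a single thread. CPU-bound code in CPython gains far less, because the global interpreter lock lets only one thread execute Python bytecode at a time.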

Request disguise:

Sometimes a server returns an error when we crawl a site, even though the same page opens normally in a browser. The server inspects the request data, decides that the client is a crawler, and refuses to respond normally. Even a disguised request will occasionally fail when we crawl a site frequently, because the request information is always the same and is eventually recognized and blocked. Choosing the User-Agent at random for each request makes the requests less uniform.

import random

# Pool of browser User-Agent strings to choose from.
USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Safari/605.1.15',
    'Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1',
    'Mozilla/5.0 (iPad; CPU OS 11_0 like Mac OS X) AppleWebKit/604.1.34 (KHTML, like Gecko) Version/11.0 Mobile/15A5341f Safari/604.1',
    'Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Mobile Safari/537.36'
]


def get_request_headers():
    # Pick a User-Agent at random on each call so requests look less uniform.
    headers = {
        'User-Agent': random.choice(USER_AGENTS),
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive',
    }
    return headers
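
As a hedged sketch of how such headers might be attached to a request, the block below uses the standard library's urllib.request. The URL https://example.com/ is a placeholder, the two-entry SAMPLE_AGENTS list and the build_request helper are hypothetical stand-ins for the full code above, and no request is actually sent.

```python
import random
import urllib.request

# Shortened stand-in for the USER_AGENTS list above (illustrative values only).
SAMPLE_AGENTS = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Safari/605.1.15',
]


def build_request(url):
    # Hypothetical helper: build a Request with a randomly chosen User-Agent.
    headers = {
        'User-Agent': random.choice(SAMPLE_AGENTS),
        'Accept-Language': 'zh-CN,zh;q=0.9',
    }
    return urllib.request.Request(url, headers=headers)


# Placeholder URL; the request is only constructed here, not sent.
request = build_request('https://example.com/')
# urllib stores header names capitalized, e.g. 'User-agent'.
print(request.get_header('User-agent'))
```

Passing the object to urllib.request.urlopen would send it, and each call to build_request may pick a different User-Agent.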

Custom exception:

The built-in exceptions often cannot express an application's own error conditions. To raise more specific errors, and to catch and handle exactly those errors later, we define custom exceptions.

class CustomException(Exception):
    # Named CustomException so that it does not shadow Python's built-in BaseException.
    def __init__(self, msg):
        super().__init__(msg)
        self.msg = msg

    def __str__(self):
        # __str__ must return the message; printing it instead returns None and raises a TypeError.
        return self.msg


try:
    input_data = input('Please enter: ')
    if len(input_data) > 0:
        raise CustomException('Haha, no input is allowed~')
    print('Finished')
except CustomException as err:
    print(f'Caught custom exception: {err}')
  • Output:

Please enter: 666
Caught custom exception: Haha, no input is allowed~
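
One way to make errors more specific is a small exception hierarchy, so that callers can catch either one particular failure or the whole family. The class names CrawlError and BlockedError, the status_code attribute, and the fetch function below are hypothetical, chosen only to illustrate the idea.

```python
class CrawlError(Exception):
    """Hypothetical base class for all errors raised by a crawler."""


class BlockedError(CrawlError):
    """Hypothetical error for a request the server refused."""

    def __init__(self, url, status_code):
        # Passing the message to Exception makes str(err) work without overriding __str__.
        super().__init__(f'{url} was blocked with status {status_code}')
        self.url = url
        self.status_code = status_code


def fetch(url, status_code):
    # Stand-in for a real request: the status code is supplied by the caller.
    if status_code == 403:
        raise BlockedError(url, status_code)
    if status_code != 200:
        raise CrawlError(f'{url} failed with status {status_code}')
    return 'ok'


messages = []
for code in (200, 403, 500):
    try:
        messages.append(fetch('https://example.com/', code))
    except BlockedError as err:
        # The most specific handler comes first and can use the extra attribute.
        messages.append(f'blocked ({err.status_code})')
    except CrawlError as err:
        messages.append(f'error: {err}')

print(messages)
```

Because BlockedError derives from CrawlError, a caller that does not care about the distinction can catch CrawlError alone and still handle both cases.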

Origin blog.csdn.net/c__chao/article/details/104805370