This handy tool has saved a lot of crawler programmers!

Hello everyone, I am Payson sauce.

I expect most of you have written a crawler. A simple one needs nothing more than requests, but for a more complex site you also have to add request headers and parameters to the program.

The usual approach is to find the request we need in the browser's network panel, and then copy its request headers and parameters into the program one by one.

Doing this every time is tedious, and it is easy to make mistakes.
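To make the manual step concrete, here is a minimal sketch of turning header lines copied from the browser into the dict that requests expects. The helper name headers_from_text is hypothetical (it is not part of requests or of the tool introduced below), and the sample lines are taken from the request used later in this article.

```python
def headers_from_text(raw: str) -> dict:
    """Turn 'Name: value' lines copied from the browser into a headers dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ':authority'.
        if not line or line.startswith(":"):
            continue
        # Split at the first colon only, so values like URLs stay intact.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers


raw_headers = """
Connection: keep-alive
Cache-Control: max-age=0
Referer: http://www.shixi.com/
Accept-Language: zh-CN,zh;q=0.9
"""

headers = headers_from_text(raw_headers)
print(headers)
```

Even with a helper like this, cookies and query parameters still have to be handled separately, which is where most of the copy-and-paste mistakes come from.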

Today I will introduce a tool that automatically converts the request information copied from the browser into the code our crawler needs.

Install

First, we need to install the tool:

pip install filestools -U

You can also use the Alibaba Cloud mirror to speed up the download:

pip install filestools --index-url=https://mirrors.aliyun.com/pypi/simple/ -U

Mirrors can lag behind, so if you want to be sure of getting the latest version, install from the official PyPI index:

pip install filestools --index-url https://pypi.org/simple/ -U
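Before moving on, it can be worth confirming that the install worked. The sketch below assumes, based on the import statement used in the next section, that filestools provides the top-level module curl2py. The helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: curl2py is the module name installed by filestools.
print("curl2py available:", is_installed("curl2py"))
```

If this prints False, re-run the pip command with the same Python interpreter that runs your crawler.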

Use

Using the tool is also very simple.

The first step is to open the network panel in the browser's developer tools, find the request we need, and copy it as a cURL command.

Then paste the copied content into our conversion program:

from curl2py.curlParseTool import curlCmdGenPyScript

curl_cmd = """curl 'http://www.shixi.com/search/index?key=python'
-H 'Connection: keep-alive'
-H 'Cache-Control: max-age=0'
-H 'Upgrade-Insecure-Requests: 1'
-H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
-H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
-H 'Referer: http://www.shixi.com/'
-H 'Accept-Language: zh-CN,zh;q=0.9'
-H 'Cookie: UM_distinctid=17a50a2c8ea537-046c01e944e72f-6373267-100200-17a50a2c8eb4ff; PHPSESSID=rpprvtdrcrvt54fkr7msgcde17; CNZZDATA1261027457=1711789791-1624850487-https%253A%252F%252Fwww.baidu.com%252F%7C1627741311; Hm_lvt_536f42de0bcce9241264ac5d50172db7=1627741268; Hm_lpvt_536f42de0bcce9241264ac5d50172db7=1627741334'
--compressed
--insecure"""

output = curlCmdGenPyScript(curl_cmd)
print(output)

Just paste the copied command into curl_cmd.
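One practical detail when pasting: a command copied with the bash variant of the browser's copy-as-cURL option usually ends each line with a backslash. In an ordinary Python string a backslash starts an escape sequence, so a raw string is the safer container. The sketch below only shows the string handling. The shortened command and the names pasted and one_line are illustrative, and this article does not say whether curlCmdGenPyScript needs the continuations removed.

```python
# A raw string keeps every backslash exactly as it was copied.
pasted = r"""curl 'http://www.shixi.com/search/index?key=python' \
  -H 'Connection: keep-alive' \
  --compressed"""

# Join the continued lines into a single command line.
one_line = " ".join(part.strip() for part in pasted.split("\\\n"))
print(one_line)
```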

Finally, run the program, and the output window shows the generated code:

#######################################
#      The generated by curl2py.      
#      author:小小明
#######################################

import requests
import json

headers = {
    "Connection": "keep-alive",
    "Cache-Control": "max-age=0",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Referer": "http://www.shixi.com/",
    "Accept-Language": "zh-CN,zh;q=0.9"
}
cookies = {
    "UM_distinctid": "17a50a2c8ea537-046c01e944e72f-6373267-100200-17a50a2c8eb4ff",
    "PHPSESSID": "rpprvtdrcrvt54fkr7msgcde17",
    "CNZZDATA1261027457": "1711789791-1624850487-https%253A%252F%252Fwww.baidu.com%252F%7C1627741311",
    "Hm_lvt_536f42de0bcce9241264ac5d50172db7": "1627741268",
    "Hm_lpvt_536f42de0bcce9241264ac5d50172db7": "1627741334"
}
params = {
    "key": "python"
}

res = requests.get(
    "http://www.shixi.com/search/index",
    params=params,
    headers=headers,
    cookies=cookies
)
print(res.text)

Just copy this code into your crawler as needed.
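To give a feel for what such a conversion involves, here is a minimal sketch using only the standard library. It is not curl2py's actual implementation. The function parse_curl is a hypothetical name, it understands only the URL and -H/--header options, and it ignores every other flag (options that take a value, such as a request body, are not handled). The sample command is a shortened version of the one above.

```python
import shlex
from urllib.parse import parse_qsl, urlsplit


def parse_curl(curl_cmd: str) -> dict:
    """Split a curl command into URL, params, headers and cookies (sketch only)."""
    # Treat line breaks (with or without a trailing backslash) as plain spaces.
    one_line = curl_cmd.replace("\\\n", " ").replace("\n", " ")
    tokens = shlex.split(one_line)

    url, headers, cookies = "", {}, {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in ("-H", "--header") and i + 1 < len(tokens):
            # Split at the first colon only, so URL values stay intact.
            name, _, value = tokens[i + 1].partition(":")
            name, value = name.strip(), value.strip()
            if name.lower() == "cookie":
                # "a=1; b=2" becomes {"a": "1", "b": "2"}
                for pair in value.split(";"):
                    key, _, val = pair.strip().partition("=")
                    if key:
                        cookies[key] = val
            else:
                headers[name] = value
            i += 2
            continue
        # The first token that is neither "curl" nor a flag is taken as the URL.
        if not token.startswith("-") and token != "curl" and not url:
            url = token
        i += 1

    parts = urlsplit(url)
    return {
        "url": f"{parts.scheme}://{parts.netloc}{parts.path}",
        "params": dict(parse_qsl(parts.query)),
        "headers": headers,
        "cookies": cookies,
    }


sample = """curl 'http://www.shixi.com/search/index?key=python'
-H 'Connection: keep-alive'
-H 'Referer: http://www.shixi.com/'
-H 'Cookie: PHPSESSID=rpprvtdrcrvt54fkr7msgcde17; Hm_lvt_536f42de0bcce9241264ac5d50172db7=1627741268'
--compressed
--insecure"""

result = parse_curl(sample)
print(result)
```

The split mirrors the shape of the generated code: the query string becomes params, the Cookie header becomes cookies, and everything else stays in headers.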

Summary

This is actually a very simple program, but it removes a real pain point in writing crawlers and makes us more efficient.

I am very grateful to its author, "Xiao Xiaoming" (小小明), and I admire him; he is one of the truly thoughtful people among programmers.

Origin blog.csdn.net/weixin_48923393/article/details/128928506