Learn to crawl from Luffy (Come on, you are the best!)

Cracking Baidu Translate

Requirements:
# POST request (carries parameters)
# The response data is a set of JSON data

Writing steps:
1. Specify the URL
2. Disguise the User-Agent (UA spoofing)
3. Process the POST request parameters (same as for a GET request)
4. Send the request
5. Obtain the response data
6. Persist the data to storage

POST request:

import requests
import json

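if __name__ == "__main__":
    # 1. Specify the URL
    post_url = 'https://fanyi.baidu.com/sug'
    # 2. Disguise the User-Agent so the request looks like it comes from a browser
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36 SE 2.X MetaSr 1.0'
    }
    # 3. Process the POST request parameters (same as for a GET request)
    data = {
        'kw': 'dog'
    }
    # 4. Send the request
    response = requests.post(url=post_url, data=data, headers=headers)
    # 5. Obtain the response data: json() returns a Python object
    #    (only call json() once you have confirmed that the response body is JSON)
    dic_obj = response.json()
    # print(dic_obj)
    # 6. Persist the data; the with block closes the file even if json.dump fails
    with open('./dog.json', 'w', encoding='utf-8') as fp:
        json.dump(dic_obj, fp=fp, ensure_ascii=False)

    print('over!!!')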
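
Step 3 says the POST parameters are handled the same way as GET parameters. A minimal sketch of what that means, using the standard library's urlencode as a stand-in for the form encoding that requests applies to a dict (no request is sent here, and the GET URL is built only to show the encoding, not because the endpoint is known to accept GET):

```python
from urllib.parse import urlencode

# Same dict as in the script above.
data = {'kw': 'dog'}

# Form encoding: key=value pairs joined by "&".
encoded = urlencode(data)

# POST: the encoded string travels in the request body.
post_body = encoded
# GET: the same string would be appended to the URL after "?".
get_url = 'https://fanyi.baidu.com/sug' + '?' + encoded

print(post_body)  # kw=dog
print(get_url)    # https://fanyi.baidu.com/sug?kw=dog
```

The dict is written the same way in both cases; only the place where the encoded string ends up differs.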

I had a question about the if __name__ == "__main__": line at the beginning, so I looked it up. There are the following explanations:
1) It plays the role of the program's main entry point, like main in languages such as Java, and tells other programmers that the code starts here.
2) __name__ is a built-in attribute of Python that holds a string.
If the file is the one being run directly, __name__ is "__main__". For example, printing __name__ inside the hello file and running that file shows __main__.
If the file is imported, __name__ is the name of the module. For example, when the test file imports the hello module and prints the hello module's __name__, the module name hello is displayed.
Therefore, __name__ == '__main__' is true only when the file is run directly. Test code can be written under if __name__ == '__main__':, which prevents it from being executed when the module is imported.
Summary:
if __name__ == '__main__': often looks unnecessary, but it is worth keeping as a matter of code convention.

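The two cases can be reproduced in a single script. This is only a sketch: it writes a throwaway hello.py into a temporary directory, then loads it once as an imported module and once as a script. The helper names name_when_imported and name_when_run are made up for this illustration.

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# Source of a throwaway hello module; it just records the __name__ it sees.
HELLO_SOURCE = "seen_name = __name__\n"


def name_when_imported(path):
    """Load the file as a module called 'hello' and return the __name__ it saw."""
    spec = importlib.util.spec_from_file_location("hello", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.seen_name


def name_when_run(path):
    """Execute the file as a script would be and return the __name__ it saw."""
    return runpy.run_path(str(path), run_name="__main__")["seen_name"]


with tempfile.TemporaryDirectory() as tmp:
    hello_path = Path(tmp) / "hello.py"
    hello_path.write_text(HELLO_SOURCE, encoding="utf-8")
    imported_name = name_when_imported(hello_path)
    script_name = name_when_run(hello_path)

print(imported_name)  # hello
print(script_name)    # __main__
```
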
Douban Movies

GET request:

import requests
import json

if __name__ == "__main__":
    url = "https://movie.douban.com/j/chart/top_list?"
    # Query String Parameters: the parameters a GET request carries in the URL,
    # i.e. the content after the question mark in the URL above.
    # They are passed to requests.get() as a dict through its params argument.
    param = {
        'type': '24',
        'interval_id': '100:90',
        'action': '',
        'start': '0',   # index of the first movie to fetch from the library
        'limit': '20',  # number of movies to fetch per request
    }
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36 SE 2.X MetaSr 1.0'
    }
    response = requests.get(url=url, params=param, headers=headers)
    list_data = response.json()
    # The json module provides a simple way to encode and decode JSON data.
    # json.dump() writes a Python data structure to a file as JSON
    # (json.dumps() returns the JSON as a string instead);
    # json.loads() converts a JSON-encoded string back into a Python data structure.
    with open('./douban.json', 'w', encoding='utf-8') as fp:
        json.dump(list_data, fp=fp, ensure_ascii=False)

    print('over!')
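
The json functions mentioned in the comments can be tried without sending any request. A small sketch, using a made-up record in place of the real response and an in-memory buffer in place of the output file:

```python
import io
import json

# Hypothetical record, standing in for one entry of the parsed response.
movie = {"title": "电影", "rank": 1}

# json.dumps() returns a JSON string; by default non-ASCII characters are escaped.
escaped = json.dumps(movie)
# With ensure_ascii=False the characters are written as they are.
readable = json.dumps(movie, ensure_ascii=False)

# json.dump() writes the same JSON text to a file-like object instead of returning it.
buffer = io.StringIO()
json.dump(movie, buffer, ensure_ascii=False)

# json.loads() turns a JSON string back into a Python data structure.
restored = json.loads(buffer.getvalue())

print(escaped)
print(readable)
print(restored == movie)  # True
```

This is why the scripts pass ensure_ascii=False: without it, Chinese text in the saved file would appear as \u escape sequences rather than readable characters.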


Origin blog.csdn.net/langezuibang/article/details/113949973