Crack Baidu Translate
Requirements:
# POST request (with parameters)
# The response data is JSON
Writing steps:
1. Specify the URL
2. Perform UA camouflage (send a browser-like User-Agent header)
3. Process the POST request parameters (handled the same way as for a GET request)
4. Send the request
5. Obtain the response data
6. Store the data persistently
POST request:
import requests
import json

if __name__ == "__main__":
    # 1. Specify the URL
    post_url = 'https://fanyi.baidu.com/sug'
    # 2. Perform UA camouflage (send a browser-like User-Agent header)
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36 SE 2.X MetaSr 1.0'
    }
    # 3. Process the POST request parameters (same as for a GET request)
    data = {
        'kw': 'dog'
    }
    # 4. Send the request
    response = requests.post(url=post_url, data=data, headers=headers)
    # 5. Obtain the response data: json() returns a Python object
    #    (only call json() once you are sure the response body is JSON)
    dic_obj = response.json()
    # print(dic_obj)  # inspect the response data
    # 6. Persistent storage; the with block closes the file when it finishes
    with open('./dog.json', 'w', encoding='utf-8') as fp:
        json.dump(dic_obj, fp=fp, ensure_ascii=False)
    print('over!!!')
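Step 3 says that POST parameters are processed the same way as GET parameters. Here is a minimal sketch of what that means, using only the standard library. It assumes that urllib.parse.urlencode is a fair stand-in for the encoding requests applies to a dict passed as data or params. The helper names build_get_url and build_post_body are hypothetical and are not part of requests.

```python
from urllib.parse import urlencode


def build_get_url(base_url, fields):
    """GET: the encoded fields are appended to the URL after a '?'."""
    return base_url + '?' + urlencode(fields)


def build_post_body(fields):
    """POST: the same encoded fields travel in the request body instead."""
    return urlencode(fields)


fields = {'kw': 'dog'}
get_url = build_get_url('https://fanyi.baidu.com/sug', fields)
post_body = build_post_body(fields)

print(get_url)    # https://fanyi.baidu.com/sug?kw=dog
print(post_body)  # kw=dog
```

In both cases the parameters are written as the same dict. Only the place where the encoded string ends up (URL or request body) differs.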
I had a question about the if __name__ == "__main__": line at the beginning, so I looked it up. I found the following explanations:
1) It plays the same role as the main entry point of a program in languages such as Java: it tells other programmers that execution starts here.
2) __name__ is a built-in attribute of every Python module, and its value is a string.
If the file is the one being run directly, __name__ is "__main__".
For example, printing the __name__ attribute inside the hello file and then running the hello file shows __main__.
If the file has been imported, __name__ is the name of the module.
For example, if the test file imports the hello module and prints the __name__ attribute of the hello module, the module name hello is shown.
Therefore __name__ == "__main__" is true only when the current file is the one being run. Test code written under if __name__ == "__main__": is skipped when the module is imported.
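The two cases above can be checked with a small experiment. This is only a sketch under a few assumptions: the hello module is a hypothetical file written to a temporary directory, runpy.run_path with run_name="__main__" stands in for running the file directly, and importlib stands in for an import statement in a test file.

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# Hypothetical hello module: it records its own __name__ and only
# executes its "test code" when it is run directly.
HELLO_SOURCE = '''
seen_name = __name__
test_code_ran = False
if __name__ == "__main__":
    test_code_ran = True
'''

with tempfile.TemporaryDirectory() as tmp:
    hello_path = Path(tmp) / "hello.py"
    hello_path.write_text(HELLO_SOURCE, encoding="utf-8")

    # Case 1: execute the file as the main program.
    direct = runpy.run_path(str(hello_path), run_name="__main__")

    # Case 2: load the same file as an imported module named "hello".
    spec = importlib.util.spec_from_file_location("hello", hello_path)
    hello = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(hello)

print(direct["seen_name"], direct["test_code_ran"])  # __main__ True
print(hello.seen_name, hello.test_code_ran)          # hello False
```

The guarded block runs in the first case and is skipped in the second, which is the behaviour described above.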
Summary:
if __name__ == "__main__": often seems useless in a single-file script, but it is a standard convention worth keeping, because it keeps the file safe to import from other code.
Douban Movies
GET request:
import requests
import json

if __name__ == "__main__":
    url = "https://movie.douban.com/j/chart/top_list?"
    # Query string parameters: the parameters a GET request carries after
    # the question mark in the URL.
    # They are passed as a dict through the params argument of requests.get().
    param = {
        'type': '24',
        'interval_id': '100:90',
        'action': '',
        'start': '0',   # index of the first movie to fetch
        'limit': '20',  # number of movies to fetch in one request
    }
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36 SE 2.X MetaSr 1.0'
    }
    response = requests.get(url=url, params=param, headers=headers)
    list_data = response.json()
    # The json module provides a simple way to encode and decode JSON data.
    # json.dump() writes a Python data structure to a file as JSON;
    # json.dumps() returns the JSON as a string;
    # json.loads() converts a JSON-encoded string back into a Python data structure.
    with open('./douban.json', 'w', encoding='utf-8') as fp:
        json.dump(list_data, fp=fp, ensure_ascii=False)
    print('over!')
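Both scripts save their result with json.dump and ensure_ascii=False. Here is a minimal sketch of how json.dump, json.dumps and json.loads relate, and of what ensure_ascii changes. The sample dict is made up for illustration and is not real response data, and io.StringIO stands in for an opened file.

```python
import io
import json

# Made-up sample data containing a non-ASCII character.
sample = {'kw': 'dog', 'v': '狗'}

# json.dumps() returns a string. By default non-ASCII characters are
# written as \uXXXX escapes; ensure_ascii=False keeps them as they are.
text_escaped = json.dumps(sample)
text_readable = json.dumps(sample, ensure_ascii=False)

# json.dump() writes the same JSON text to a file-like object.
buffer = io.StringIO()
json.dump(sample, buffer, ensure_ascii=False)

# json.loads() turns a JSON string back into a Python data structure.
restored = json.loads(text_escaped)

print(text_escaped)
print(text_readable)
print(restored == sample)
```

Without ensure_ascii=False the saved file is still valid JSON, but Chinese text in it is stored as escape sequences and is hard to read in an editor.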