Crawl Baidu content:
import requests

url = "https://www.baidu.com"

if __name__ == '__main__':
    try:
        kv = {'user-agent': 'Mozilla/5.0'}
        r = requests.get(url, headers=kv)
        r.raise_for_status()  # raise an exception if the status code is not 200
        r.encoding = r.apparent_encoding  # use the encoding detected from the content
        print(r.text)
        # print(r.request.headers)
    except requests.RequestException:
        print("Crawler failed")
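What `raise_for_status()` does can be seen without touching the network. A minimal sketch, using a manually constructed `Response` object to simulate a failed reply (the 404 status here is an assumed example, not something Baidu returns):

```python
import requests

# Simulate a "Not Found" reply: on any non-2xx status code,
# raise_for_status() raises requests.HTTPError, which is what
# the try/except in the crawler above catches.
r = requests.models.Response()
r.status_code = 404
try:
    r.raise_for_status()
except requests.HTTPError:
    print("Crawler failed")
```

A 2xx status code would make `raise_for_status()` return `None` and the program would continue normally.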
Use http://www.baidu.com/s?wd=keyword as the URL, where keyword is the term we want to search for on Baidu. requests accepts a params argument whose key-value pairs are appended to the URL as the query string.
import requests

url = "http://www.baidu.com/s"
keyword = "python"

if __name__ == '__main__':
    try:
        kv = {'user-agent': 'Mozilla/5.0'}
        wd = {'wd': keyword}
        r = requests.get(url, headers=kv, params=wd)
        print(r.request.url)  # the full URL with the query string appended
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        print(len(r.text))
    except requests.RequestException:
        print("Crawler failed")
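How requests builds that URL from params can also be checked offline. A small sketch using `requests.Request` and `prepare()`, which construct the final request URL without sending anything:

```python
import requests

# requests joins the params dict onto the base URL as a query string,
# percent-encoding values as needed; prepare() builds the request
# object without sending it over the network.
req = requests.Request('GET', 'http://www.baidu.com/s', params={'wd': 'python'})
prepared = req.prepare()
print(prepared.url)  # http://www.baidu.com/s?wd=python
```

This is the same URL that `print(r.request.url)` shows in the crawler above.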