Requests Library Web crawler combat

Example one: Jingdong commodity page crawling

import requests
url = "https://item.jd.com/100004770237.html"
try:
   r = requests.get(url)
   r.raise_for_status()
   r.encoding = r.apparent_encoding
   print(r.text[:1000])
except:
    print("爬取失败") 

Example 2: Amazon product page crawling

import requests
url = "https://www.amazon.cn/dp/B071HXVPXG/ref=lp_659039051_1_2?s=books&ie=UTF8&qid=1580353560&sr=1-2"
try:
   kv = {'user-agent' :'Mozilla/5.0'}    
   r = requests.get(url , headers = kv)
   r.raise_for_status()
   r.encoding = r.apparent_encoding
   print(r.text[1000:2000])
except:
    print("爬取失败") 

Three examples: 360 Baidu search keywords submitted

import requests
keyword = "python"
try:
    kv = {'q' : keyword}
    r = requests.get("http://www.so.com/s",params = kv)
    print(r.request.url)
    r.raise_for_status()
    print(len(r.text))   
except:
    print("爬取失败")

Note: Search engine keywords submitted Interface

Baidu keywords Interface: http: //www.baidu.com/s wd = keyword?

360 Keywords Interface: http: //www.so.com/s q = keyword?

Four examples: network picture of crawling and storage

Requests Import 
Import os 
url = "http://img1.3lian.com/2015/w7/97/d/25.jpg" 
# Set the storage location and name of the crawling pictures, names can use a picture of the original names can also be Customizing 
the root = "E: // // Python" 
path = + url.split the root ( '/') [-. 1] 
the try: 
    IF Not os.path.exists (the root): 
        os.mkdir (the root) 
    IF Not os.path.exists (path): 
        r = requests.get (url) 
        with Open (path, 'wb') AS f: 
            f.write (r.content) 
            f.close () 
            Print ( "document saved successfully.") 
    the else: 
        Print ( "file already exists") 
the except: 
    Print ( "crawling failed")
        

Examples of five: IP address attribution of automatic query

import requests
url = "http://m.ip138.com/ip.asp?ip="
try:
    r = requests.get(url+'202.204.80.112')
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[-500:])
except:
    print("爬取失败")

  

Guess you like

Origin www.cnblogs.com/py2019/p/12242318.html