Example 1 - crawling a page
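import requests
url = "https://item.jd.com/2646846.html"
try:
    r = requests.get(url)
    r.raise_for_status()              # raise HTTPError on a 4xx/5xx status
    r.encoding = r.apparent_encoding  # decode with the encoding detected from the body
    print(r.text[:1000])              # show only the first 1000 characters
except requests.RequestException:
    print("Crawl failed")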
This page can be crawled with a plain request; no special headers are needed.
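The same request pattern recurs in the later examples, so it can be wrapped in a function. This is a minimal sketch, not part of the original example: the name `get_html_text`, the `timeout` default, and the choice to return `None` on failure are all assumptions made for illustration.

```python
import requests

def get_html_text(url, headers=None, timeout=10):
    """Return the decoded page text, or None if the request fails."""
    try:
        r = requests.get(url, headers=headers, timeout=timeout)
        r.raise_for_status()              # raise HTTPError on a 4xx/5xx status
        r.encoding = r.apparent_encoding  # decode with the encoding detected from the body
        return r.text
    except requests.RequestException:
        # covers connection errors, timeouts, bad URLs and HTTP error statuses
        return None
```

A caller would then write something like `text = get_html_text(url)` and check for `None` before slicing the result.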
Example 2 - crawling a page
import requests
url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
try:
    kv = {'user-agent': 'Mozilla/5.0'}   # present the request as coming from a browser
    r = requests.get(url, headers=kv)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[1000:2000])
except requests.RequestException:
    print("Crawl failed")
The site restricts access based on the User-Agent header, so the request sets a browser-like User-Agent to simulate a browser visiting the site.
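To see what the header override changes, the sketch below compares the User-Agent that `requests` would send by default with the overridden one. It only prepares the requests through a `Session` and never sends them, so it needs no network access. The helper name `effective_user_agent` is an assumption made for illustration.

```python
import requests

URL = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"

def effective_user_agent(url, headers=None):
    """Return the User-Agent a default Session would send, without sending anything."""
    session = requests.Session()
    request = requests.Request("GET", url, headers=headers)
    prepared = session.prepare_request(request)  # merges session defaults with our headers
    return prepared.headers.get("User-Agent")

default_ua = effective_user_agent(URL)
browser_ua = effective_user_agent(URL, {"user-agent": "Mozilla/5.0"})

print(default_ua)  # a string that identifies the client as python-requests
print(browser_ua)  # the overridden value
```

Header names are matched case-insensitively, which is why the lowercase `user-agent` key replaces the default.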
Example 3 - submitting a keyword to a search engine
# Baidu keyword interface: http://www.baidu.com/s?wd=keyword
# 360 keyword interface: http://www.so.com/s?q=keyword
import requests
keyword = "Python"
try:
    kv = {'wd': keyword}
    r = requests.get("http://www.baidu.com/s", params=kv)
    print(r.request.url)   # the final URL, including the query string
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print("Crawl failed")
--------------------------------------------------
import requests
keyword = "python"
try:
    kv = {'q': keyword}
    r = requests.get("http://www.so.com/s", params=kv)
    print(r.request.url)   # the final URL, including the query string
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print("Crawl failed")
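Both interfaces follow the same pattern: a base URL plus one query parameter (`wd` for Baidu, `q` for 360). As a rough sketch using only the standard library, the URL that `params=kv` is expected to produce can be built by hand. The function name `build_search_url` is hypothetical and is not part of `requests`.

```python
from urllib.parse import urlencode

def build_search_url(base, param, keyword):
    """Append a single percent-encoded query parameter to the base URL."""
    return base + "?" + urlencode({param: keyword})

baidu_url = build_search_url("http://www.baidu.com/s", "wd", "Python")
so_url = build_search_url("http://www.so.com/s", "q", "Python")

print(baidu_url)  # http://www.baidu.com/s?wd=Python
print(so_url)     # http://www.so.com/s?q=Python
```

Comparing these strings with the output of `print(r.request.url)` is a quick way to check that the keyword was submitted under the right parameter name.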
Example 4 - crawling and saving an image
import requests
import os
url = "http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg"
root = "F://pics//"
path = root + url.split('/')[-1]   # name the local file after the last URL segment
try:
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(url)
        r.raise_for_status()          # do not save an error page as an image
        with open(path, 'wb') as f:   # binary mode; the with block closes the file
            f.write(r.content)
        print("File saved successfully")
    else:
        print("File already exists")
except (requests.RequestException, OSError):
    print("Crawl failed")
This downloads the image and stores it locally under its original file name, skipping the download if the file already exists.
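The file-naming and skip-if-present logic can be pulled into small helpers so that it is easier to test. This is a sketch under assumptions: `local_path_for` and `save_if_missing` are hypothetical names, the download step is passed in as a callable so the logic runs without network access, and the demo writes fake bytes into a temporary directory rather than `F://pics//`.

```python
import os
import tempfile

URL = "http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg"

def local_path_for(url, root):
    """Use the last path segment of the URL as the local file name."""
    return os.path.join(root, url.split("/")[-1])

def save_if_missing(path, fetch):
    """Write the bytes returned by fetch() to path unless it exists; return True if written."""
    if os.path.exists(path):
        return False
    # makedirs also creates missing parent directories, unlike os.mkdir
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:   # binary mode; closed automatically on exit
        f.write(fetch())
    return True

# Demo with fake content in a throwaway directory (no download happens here).
root = os.path.join(tempfile.mkdtemp(), "pics")
path = local_path_for(URL, root)
first = save_if_missing(path, lambda: b"fake image bytes")
second = save_if_missing(path, lambda: b"other bytes")
print(first, second)  # True False
```

With `requests`, the callable could be something like `lambda: requests.get(URL).content`.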
Example 5 - looking up the location of an IP address
Query interface: http://m.ip138.com/ip.asp?ip=ipaddress (replace ipaddress with the address to look up)
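import requests
url = "http://www.ip138.com/iplookup.asp?ip="
try:
    r = requests.get(url + '202.204.80.112' + '&action=2')
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[-500:])   # the result appears near the end of the page
except requests.RequestException:
    print("Crawl failed")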
The site has anti-crawling measures, so this request may be rejected.
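Because the interface takes the address as a plain query parameter, it can help to validate the address before building the URL. The sketch below uses the standard-library `ipaddress` module and the mobile interface quoted in this example; the function name `build_lookup_url` is hypothetical, and whether the site still answers at this URL is not verified here.

```python
import ipaddress

LOOKUP_URL = "http://m.ip138.com/ip.asp?ip="  # interface quoted in the text

def build_lookup_url(ip):
    """Return the lookup URL for ip; raise ValueError if ip is not a valid address."""
    address = ipaddress.ip_address(ip)  # rejects malformed input with ValueError
    return LOOKUP_URL + str(address)

lookup_url = build_lookup_url("202.204.80.112")
print(lookup_url)  # http://m.ip138.com/ip.asp?ip=202.204.80.112
```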