Python Web Crawler and Information Extraction MOOC ------ Examples

Example One - Crawling a Product Page

import requests

url = "https://item.jd.com/2646846.html"
try:
    r = requests.get(url)
    r.raise_for_status()                 # raise an exception for non-2xx responses
    r.encoding = r.apparent_encoding     # decode using the encoding detected from the body
    print(r.text[:1000])
except:
    print("crawl failed")

A normal page crawl; no special request headers are needed.
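The try/except pattern above repeats in every example below, so it can be wrapped in a reusable helper. A minimal sketch, assuming a getHTMLText function name and a 30-second timeout (neither appears in the original example):

import requests

def getHTMLText(url, timeout=30):
    # Fetch a page and return its decoded text, or an error marker on failure.
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return "crawl failed"

print(getHTMLText("https://item.jd.com/2646846.html")[:1000])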

 

Example Two - Crawling an Amazon Product Page

import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
try:
    kv = {'user-agent': 'Mozilla/5.0'}   # present ourselves as a browser
    r = requests.get(url, headers=kv)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[1000:2000])
except:
    print("crawl failed")

The site restricts access based on the request's User-Agent header, so we simulate a browser when sending the request.
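To see what the header change actually does, you can compare the User-Agent that requests sends by default with the spoofed one. A minimal sketch (the exact default value, such as python-requests/2.x, depends on your installed version):

import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"

# Default: the User-Agent identifies the client as a script, e.g. "python-requests/2.x".
r = requests.get(url)
print(r.request.headers['User-Agent'])

# Spoofed: the same request now claims to be a browser.
r = requests.get(url, headers={'user-agent': 'Mozilla/5.0'})
print(r.request.headers['User-Agent'])
print(r.status_code)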

 

Example Three - Submitting Keywords to a Search Engine

# Baidu keyword interface: http://www.baidu.com/s?wd=keyword
# 360 keyword interface:   http://www.so.com/s?q=keyword
import requests

keyword = "Python"
try:
    kv = {'wd': keyword}                 # Baidu passes the keyword via the 'wd' parameter
    r = requests.get("http://www.baidu.com/s", params=kv)
    print(r.request.url)                 # the URL that was actually requested
    r.raise_for_status()
    print(len(r.text))
except:
    print("crawl failed")
--------------------------------------------------
import requests

keyword = "python"
try:
    kv = {'q': keyword}                  # 360 passes the keyword via the 'q' parameter
    r = requests.get("http://www.so.com/s", params=kv)
    print(r.request.url)
    r.raise_for_status()
    print(len(r.text))
except:
    print("crawl failed")
 

 

Example Four - Crawling and Saving an Image

import requests
import os

url = "http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg"
root = "F://pics//"                      # local directory to store images in
path = root + url.split('/')[-1]         # file name taken from the last URL segment
try:
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(url)
        with open(path, 'wb') as f:      # binary mode; 'with' closes the file automatically
            f.write(r.content)
        print("file saved successfully")
    else:
        print("file already exists")
except:
    print("crawl failed")

The image is downloaded and stored locally.
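For large files, r.content loads the whole body into memory at once. A minimal sketch of the same save logic using streaming instead (stream=True and iter_content are standard requests features; the 8 KB chunk size is an arbitrary choice):

import os
import requests

url = "http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg"
root = "F://pics//"
path = root + url.split('/')[-1]

os.makedirs(root, exist_ok=True)         # create the directory if it is missing
r = requests.get(url, stream=True)       # defer downloading the response body
r.raise_for_status()
with open(path, 'wb') as f:
    for chunk in r.iter_content(chunk_size=8192):   # write the image 8 KB at a time
        f.write(chunk)
print("file saved successfully")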

Example Five - Looking Up the Location of an IP Address

Query interface: http://m.ip138.com/ip.asp?ip=ipaddress

url="http://www.ip138.com/iplookup.asp?ip="
try:
    r=requests.get(url+'202.204.80.112'+'&action=2')
    r.raise_for_status()
    r.encoding=r.apparent_encoding
    print(r.text[-500:])
except:
    print("爬取失败")

The site has anti-crawling measures, so a plain request may fail.
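One possible workaround is to combine the browser User-Agent trick from Example Two with a timeout. A minimal sketch; whether ip138.com accepts this particular header is an assumption, and the site may still block automated requests:

import requests

url = "http://www.ip138.com/iplookup.asp?ip=202.204.80.112&action=2"
kv = {'user-agent': 'Mozilla/5.0'}       # present ourselves as a browser, as in Example Two
try:
    r = requests.get(url, headers=kv, timeout=30)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[-500:])
except requests.RequestException:
    print("crawl failed")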

 
