Small example - crawling car home info

1, https: //www.autohome.com.cn/news/1/#liststart
  Check Code

 

 

2, the code crawling

# Crawling address 
# https://www.autohome.com.cn/news/1/#liststart 
from BS4 Import the BeautifulSoup
 Import Requests 
URL = ' https://www.autohome.com.cn/news/1/# liststart ' 
RES = requests.get (URL) 

# generates a target bs4 

Soup = the BeautifulSoup (res.text, ' lxml ' ) 

div = soup.find (ID = ' Auto-Channel-lazyload-Article This article was ' )
 # div is an object 

ul div.find = (name = ' UL ' ) 

li_list= ul.find_all(name='li')
for li in li_list:
    h3 = li.find(name='h3')
    if h3:
        title = h3.text
        print(title)

    img = li.find(name='img')
    if img:
        img_url = img.get('src')
        print(img_url)

    a = li.find(name='a')
    if a:
        article_url = a.get('href')
        print(article_url)

    p = li.find(name='p')
    if p:
        content = p.text
        print(content)

3, the results of crawling

 

 

 

Guess you like

Origin www.cnblogs.com/xiaowangba9494/p/11938038.html