Getting Started with Python Crawlers (Parsing)

A summary based on the previous (reprinted) article:

 

The main steps of a Python crawler:

1. Visit the website to fetch the HTML data

2. Parse the HTML data and extract the values you want

   Parse the data using BeautifulSoup
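The two steps above can be sketched as follows. The HTML string here is a hardcoded stand-in for a real downloaded page, and the `requests` call in the comment assumes that third-party library is installed; the URL is a placeholder.

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Step 1 would normally fetch the page over HTTP, e.g. with the
# third-party `requests` library (hypothetical URL):
#   import requests
#   html_text = requests.get("https://example.com").text
# Here a hardcoded string stands in for the downloaded page.
html_text = "<html><head><title>Example</title></head><body><p>hi</p></body></html>"

# Step 2: parse the HTML and pull out the value you want
soup = BeautifulSoup(html_text, "html.parser")
print(soup.title.string)  # → Example
```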

 

The data-parsing steps in detail:

1. Use BeautifulSoup to parse the HTML data into an object

    

from bs4 import BeautifulSoup  # pip install beautifulsoup4

soup = BeautifulSoup(html_text, "html.parser")  # Create a BeautifulSoup object

 

2. Get tag content through the soup object

    myHead = soup.head  # Get the first <head> tag

    myBody = soup.body  # Get the first <body> tag

    myB = soup.b  # Get the first <b> tag


    myPara = soup.find_all('p')  # Get a list of all <p> tags
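A small self-contained sketch of both access styles; the HTML string is a hypothetical sample. Attribute access (`soup.b`) returns only the first matching tag, while `find_all()` returns a list-like result of every match.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document for illustrating tag access
html_text = "<html><head><title>T</title></head><body><b>bold</b><p>one</p><p>two</p></body></html>"
soup = BeautifulSoup(html_text, "html.parser")

myHead = soup.head           # first <head> tag
myB = soup.b                 # first <b> tag
myPara = soup.find_all('p')  # list-like ResultSet of all <p> tags

print(myB.string)   # → bold
print(len(myPara))  # → 2
```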

 

3. Get the text through the tag object

   text = soup.b.string  # Note: .string works on a single tag; find_all() returns a list, so iterate over it instead

 

4. When one div is nested inside another it can be hard to locate directly; use find_all with a class filter to get it

 

data = body.find_all('div', {'class': 'weather_li_left'})
print(data)
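A runnable sketch of the snippet above; the markup is hypothetical, loosely resembling the weather page the article scrapes. Because `find_all` searches the entire subtree, the nesting depth of the target divs does not matter.

```python
from bs4 import BeautifulSoup

# Hypothetical markup resembling the weather page from the article
html_text = """
<body>
  <div class="outer">
    <div class="weather_li_left">Monday</div>
    <div class="weather_li_left">Tuesday</div>
  </div>
</body>
"""
soup = BeautifulSoup(html_text, "html.parser")
body = soup.body

# find_all searches the whole subtree, so nesting depth is irrelevant
data = body.find_all('div', {'class': 'weather_li_left'})
print([d.string for d in data])  # → ['Monday', 'Tuesday']
```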

 

 5. Get all the <li> tags under a parent tag and print the text of each child tag

     

child = parent.find_all('li')
for li in child:
    print(li.string)
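The loop above can be run end to end like this; the `<ul>` list is a hypothetical stand-in for the parent tag from the article.

```python
from bs4 import BeautifulSoup

# A hypothetical list standing in for the parent tag from the article
soup = BeautifulSoup("<ul><li>apple</li><li>pear</li></ul>", "html.parser")
parent = soup.ul

child = parent.find_all('li')      # all <li> tags under the parent
texts = [li.string for li in child]
print(texts)  # → ['apple', 'pear']
```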

 
