A summary based on a reprint of the previous article:
The main steps of a Python crawler
1. Visit the website and fetch the HTML data
2. Read and parse the HTML data, extracting the values you want
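The two steps above can be sketched as a pair of small functions. This is a minimal sketch, assuming the third-party `requests` and `beautifulsoup4` packages are installed; the function names `fetch_html` and `parse_html` are illustrative choices, not from the article.

```python
import requests
from bs4 import BeautifulSoup

def fetch_html(url):
    """Step 1: visit the website and return the raw HTML text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text

def parse_html(html_text):
    """Step 2: parse the HTML data into a navigable soup object."""
    return BeautifulSoup(html_text, "html.parser")
```

Splitting fetching from parsing keeps the parsing code testable without network access.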
Parsing the data with BeautifulSoup
The parsing steps in detail:
1. Use BeautifulSoup to parse the HTML data into an object
soup = BeautifulSoup(html_text, "html.parser")  # create a BeautifulSoup object
2. Get tag objects through the soup object
myHead = soup.head  # get the first <head> tag
myBody = soup.body  # get the first <body> tag
myBold = soup.b  # get the first <b> tag
myParas = soup.find_all('p')  # get all <p> tags (returns a list)
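Putting the accessors above into a self-contained snippet makes the difference between attribute access (first match) and find_all (all matches) concrete. The HTML string is made up for illustration; the variable names mirror the examples above.

```python
from bs4 import BeautifulSoup

# A made-up page, standing in for fetched HTML
html_text = """
<html>
  <head><title>Demo</title></head>
  <body>
    <b>bold text</b>
    <p>first paragraph</p>
    <p>second paragraph</p>
  </body>
</html>
"""
soup = BeautifulSoup(html_text, "html.parser")

myHead = soup.head            # first <head> tag
myBody = soup.body            # first <body> tag
myBold = soup.b               # first <b> tag
myParas = soup.find_all('p')  # ALL <p> tags, as a list
```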
3. Get the text through a single tag object (note: the list returned by find_all has no .string; call it on one tag)
text = soup.b.string
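One pitfall worth a quick sketch: .string only returns text when a tag has exactly one text child; on a tag with several children it is None, and get_text() is the usual fallback. The HTML below is invented for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<p>only child</p><div><p>a</p><p>b</p></div>",
    "html.parser",
)

single = soup.p.string    # one text child -> the text itself
nested = soup.div.string  # several children -> None
joined = soup.div.get_text()  # concatenates all descendant text
```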
4. When a div nests inside another div, it is hard to locate by attribute access alone; find_all searches the whole subtree, so just use it
data = soup.body.find_all('div', {'class': 'weather_li_left'})
print(data)
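A runnable sketch of that nested-div case, with an invented HTML fragment (the class name weather_li_left is taken from the example above):

```python
from bs4 import BeautifulSoup

html_text = """
<body>
  <div class="weather">
    <div class="weather_li_left">Monday</div>
    <div class="weather_li_left">Tuesday</div>
  </div>
</body>
"""
soup = BeautifulSoup(html_text, "html.parser")

# find_all descends through every level of nesting,
# so the inner divs are found even though they are not direct children
data = soup.body.find_all('div', {'class': 'weather_li_left'})
for div in data:
    print(div.string)
```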
5. Get all the <li> tags under a parent tag and print the text of each child
children = parent.find_all('li')
for tag in children:
    print(tag.string)
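The same pattern end to end, with a made-up list so the snippet runs on its own (`parent` here is just the <ul> tag; in a real crawl it would be whatever container you located first):

```python
from bs4 import BeautifulSoup

html_text = "<ul><li>sun</li><li>rain</li><li>snow</li></ul>"
soup = BeautifulSoup(html_text, "html.parser")

parent = soup.ul                    # the containing tag
children = parent.find_all('li')    # all <li> tags under it
texts = [tag.string for tag in children]
for text in texts:
    print(text)
```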