Notes from the Elephant College course. Thanks to teacher Liang Bin and teacher Yu Mangmang, and to the author of the blog at https://my.oschina.net/u/3914536 (OSCHINA).
First, import the libraries:
import requests
from bs4 import BeautifulSoup
Next, assign the address of the target page to the variable url:
url = 'xxx.html'
# for example, with city_pinyin holding a city's name in pinyin:
url = 'http://www.air-level.com/air/' + city_pinyin
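Use the get function of requests to fetch the page content and store it in a variable with a name of your choosing, such as wb_data or r.
The timeout argument makes the request give up if the server has not responded within 30 seconds: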
wb_data=requests.get(url,timeout=30)
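The fetch step above can be sketched with basic error handling. This is a minimal sketch, not the original author's code: the helper name fetch_html is hypothetical, and returning None on failure is just one possible design choice.

```python
import requests


def fetch_html(url, timeout=30):
    """Return the page source as text, or None if the request fails.

    fetch_html is a hypothetical helper name used only for illustration.
    """
    try:
        wb_data = requests.get(url, timeout=timeout)
        # Turn 4xx/5xx status codes into an exception
        wb_data.raise_for_status()
    except requests.RequestException as exc:
        print('request failed:', exc)
        return None
    return wb_data.text


# The placeholder 'xxx.html' has no scheme (http:// or https://),
# so requests rejects it before any network access and we get None back.
page = fetch_html('xxx.html')
```

Catching requests.RequestException covers timeouts, connection errors, and malformed URLs in one place, which is usually enough for a small scraper.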
The request can also carry headers, for example a User-Agent and a cookie holding your login information:
# headers = {'User-Agent': XXX, 'Cookie': XXX}
# wb_data = requests.get(url, headers=headers)
# Information stored in wb_data:
'''
wb_data.text - page source as text
wb_data.status_code - status code
wb_data.url - request URL
wb_data.headers - response headers
wb_data.cookies - cookie information
wb_data.content - byte stream
'''
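To check what headers a request would carry without contacting the server, one option is to build and prepare the request by hand. This is a sketch under assumptions: the header values are placeholders, not real credentials, and the HTTP header that carries cookies is named Cookie.

```python
import requests

# Placeholder values (assumptions); replace with your own User-Agent and cookie
headers = {
    'User-Agent': 'Mozilla/5.0 (example)',
    'Cookie': 'sessionid=example',
}

# Build the request without sending it, to inspect what would go on the wire
request = requests.Request('GET', 'http://www.air-level.com/air/', headers=headers)
prepared = request.prepare()

print(prepared.url)
print(prepared.headers['User-Agent'])
```

In normal use you would simply pass headers=headers to requests.get, as in the commented lines above.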
Then parse the page with BeautifulSoup and save the result in the soup variable. Remember to add .text so that the page source is passed in as a string, and set the parser to lxml:
soup = BeautifulSoup(wb_data.text,'lxml')
Note that calling find() directly on the page text (a string) only returns the position of the first match, counted in characters from the start of the string.
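The difference between the string method find and the find method of a parsed soup can be sketched as follows. The HTML snippet is made up for illustration, and html.parser is used here instead of lxml only so that the sketch runs without the lxml package installed.

```python
from bs4 import BeautifulSoup

# Made-up page source for illustration
html = '<html><body><p>first</p><p>second</p></body></html>'

# str.find returns the character index of the first match, or -1 if absent
position = html.find('<p>')
print(position)

# BeautifulSoup's find returns the first matching tag object instead
soup = BeautifulSoup(html, 'html.parser')
first_p = soup.find('p')
print(first_p.get_text())
```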
If you look up a tag directly without specifying a class, find_all returns every matching tag, arranged as a list:
soup = BeautifulSoup(r.text,'lxml')
td_list = soup.find_all('td')
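As a small sketch of what the resulting list looks like, the example below parses a made-up table (standing in for r.text) and pulls the text out of each cell. The table contents are invented placeholders, and html.parser is an assumption chosen so the sketch does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Made-up table standing in for r.text
html = '''
<table>
  <tr><td>Beijing</td><td>85</td></tr>
  <tr><td>Shanghai</td><td>60</td></tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')
td_list = soup.find_all('td')

# Each item is a Tag; get_text(strip=True) returns its text without whitespace
cells = [td.get_text(strip=True) for td in td_list]
print(cells)
```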
When several divs share the same class, find_all finds all of them and returns a list, so you can use an index to pick the one you want,
for example:
city_div = soup.find_all('div', {'class': 'bottom'})[1]  # meaning: get the second div whose class is 'bottom'
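Indexing into the result raises IndexError when the page has fewer matches than expected, so a guarded version can be useful. This is a minimal sketch on a made-up snippet; the variable name bottom_divs is hypothetical, and html.parser is used so the sketch runs without lxml.

```python
from bs4 import BeautifulSoup

# Made-up markup with two divs of class "bottom"
html = '''
<div class="bottom">first block</div>
<div class="top">other block</div>
<div class="bottom">second block</div>
'''

soup = BeautifulSoup(html, 'html.parser')
bottom_divs = soup.find_all('div', {'class': 'bottom'})

# Take the second match only if it exists; otherwise fall back to None
city_div = bottom_divs[1] if len(bottom_divs) > 1 else None

if city_div is not None:
    print(city_div.get_text())
```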