About web crawlers _v1.0

Notes from the Elephant Academy crawler course by teacher Xie Liangbin and teacher Yu; the teacher's blog: https://my.oschina.net/u/3914536 (OSCHINA)

 

First, import the required libraries:

import requests
from bs4 import BeautifulSoup

 

Then define the URL to crawl:

url = 'xxx.html'  # placeholder for the page you want to crawl
url = 'http://www.air-level.com/air/' + city_pinyin  # e.g. build the URL from a city's pinyin

Use requests.get() to fetch the page content and store the response in a variable, for example wb_data or r, with a 30-second timeout:

wb_data = requests.get(url, timeout=30)
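The timeout only bounds how long requests waits for the server; a slow or unreachable site raises an exception instead of hanging. A minimal sketch of guarding against that (the try/except wrapper is my own addition, not part of the course notes):

try:
    wb_data = requests.get(url, timeout=30)     # give up if no response within 30 s
except requests.exceptions.RequestException as e:
    # covers timeouts, connection errors and other request failures
    print('request failed:', e)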

 

 

You can also carry user information such as a username or cookies along with the request by sending headers:

# headers = {'User-Agent': 'xxx', 'Cookie': 'xxx'}
# wb_data = requests.get(url, headers=headers)
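As a runnable sketch of the same idea (the User-Agent string and cookie value below are placeholders I made up, not real credentials):

headers = {
    'User-Agent': 'Mozilla/5.0',   # pretend to be a normal browser
    'Cookie': 'sessionid=xxx'      # session cookie copied from a logged-in browser
}
wb_data = requests.get(url, headers=headers, timeout=30)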

# Information stored on wb_data
'''
wb_data.text         - page source (decoded text)
wb_data.status_code  - HTTP status code
wb_data.url          - the requested URL
wb_data.headers      - response headers
wb_data.cookies      - cookie information
wb_data.content      - raw byte stream
'''
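A quick sketch of reading a few of these attributes after a request (the fields printed here are just examples):

print(wb_data.status_code)                  # e.g. 200 on success
print(wb_data.url)                          # final URL after any redirects
print(wb_data.headers.get('Content-Type'))  # look up a response header
print(wb_data.text[:200])                   # first 200 characters of the page source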

Then parse the page with BeautifulSoup and store the result in a soup variable. Remember to pass the .text attribute and set the parser to 'lxml':

soup = BeautifulSoup(wb_data.text, 'lxml')

Note: calling find() directly on the raw text returns the position, counted from the beginning, of the characters given in the parentheses.

If you cannot grab the content directly by its class, use find_all(), which lays the matching content out as a list:

soup = BeautifulSoup(r.text, 'lxml')
td_list = soup.find_all('td')  # all <td> tags, returned as a list
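Each element of td_list is a Tag, so you can pull the visible text out with get_text(); a small sketch:

td_texts = [td.get_text(strip=True) for td in td_list]  # text of every <td>, whitespace stripped
print(td_texts)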

 

When a class name repeats under a div, find_all() finds all of them and returns a list, so you can use an index to pick the one you want.

For example:

city_div = soup.find_all('div', {'class': 'bottom'})[1]
# meaning: get the contents of the second div whose class is "bottom"
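Putting the steps together, a minimal end-to-end sketch might look like the following (the city pinyin 'beijing' and the exact layout of air-level.com are assumptions for illustration only):

import requests
from bs4 import BeautifulSoup

city_pinyin = 'beijing'                                   # assumed example city
url = 'http://www.air-level.com/air/' + city_pinyin

wb_data = requests.get(url, timeout=30)
soup = BeautifulSoup(wb_data.text, 'lxml')

# find_all() returns a list, so an index picks out one specific match
bottom_divs = soup.find_all('div', {'class': 'bottom'})
if len(bottom_divs) > 1:
    city_div = bottom_divs[1]                             # second div with class "bottom"
    print(city_div.get_text(strip=True))

# grab every <td> on the page and print its text
print([td.get_text(strip=True) for td in soup.find_all('td')])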

 
