Installation method
C:\Users\username\AppData\Local\Programs\Python\Python37\Scripts
pip.exe
pip3.7.exe
pip3.exe
These three commands are exactly the same program under different names. It is recommended to add this Scripts directory to the PATH environment variable, so that pip can be run conveniently from any directory.
Install third-party libraries:
pip install libraryname
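After installing, one way to see which libraries are still missing is to ask Python whether each module can be found. The sketch below uses the standard library's importlib.util.find_spec; the helper name missing_modules and the module list are illustrative choices, not something the installation itself requires.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the libraries covered in this post.
# Note: the beautifulsoup4 package is imported as bs4.
CRAWLER_MODULES = ["urllib", "re", "requests", "selenium", "lxml", "bs4", "pyquery"]

for name in missing_modules(CRAWLER_MODULES):
    print("not installed:", name)
```

If nothing is printed, every module in the list can be imported.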
1. urllib
Included with Python; no extra installation is needed.
import urllib
import urllib.request
response=urllib.request.urlopen("http://www.baidu.com")
print(response)
A result like the following means the request succeeded:
<http.client.HTTPResponse object at 0x0000021B8D6D8CF8>
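Printing the response object only shows its type. To look at the page itself, the body can be read and decoded. This is a minimal sketch; the helper name fetch_text, the 10-second timeout, and the UTF-8 fallback are assumptions made for illustration.

```python
import urllib.request


def fetch_text(url, timeout=10):
    """Open `url` and return the response body decoded to a string."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header if present;
        # falling back to UTF-8 is an assumption, not a guarantee.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_text("http://www.baidu.com") should return the page's HTML as a string, provided the network is reachable.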
2. requests
import requests
response=requests.get("http://www.baidu.com")
print(response)
A result like the following means the request succeeded:
<Response [200]>
3. Regular Expressions module
re is included with Python; no extra installation is needed, just import it directly.
If the import raises no error, the library is installed correctly.
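As a quick check that re works, the sketch below pulls link targets out of a small HTML snippet. The pattern, the helper name extract_links, and the sample HTML are illustrative assumptions; a regular expression like this only handles simple double-quoted href attributes, not arbitrary HTML.

```python
import re

# Matches href="..." and captures the URL between the double quotes.
HREF_PATTERN = re.compile(r'href="([^"]+)"')


def extract_links(html):
    """Return every double-quoted href value found in `html`, in order."""
    return HREF_PATTERN.findall(html)


sample = '<a href="http://www.baidu.com">Baidu</a> <a href="https://cnblogs.com/">cnblogs</a>'
print(extract_links(sample))
```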
4. selenium
This library is mainly used to drive a browser and is usually used for automated testing. When writing crawlers, we sometimes run into pages that are rendered by JavaScript, and requests cannot get the correct content for them. In that case selenium can drive a browser directly: the browser executes the JavaScript itself, and once the page has finished rendering we can read the rendered content.
5. chromedriver
Pay attention to the version when downloading: the chromedriver release should match the installed Chrome version. Download the 32-bit build, then unzip it into a directory that is already on the PATH environment variable.
import selenium
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("http://www.baidu.com")
print(driver.page_source)
6. PhantomJS or headless
selenium opens a visible browser window when it runs, while PhantomJS drives a browser with no interface.
However, PhantomJS appears to have been abandoned and replaced by headless mode:
import selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument('--headless')
options.add_argument('--disable-gpu')
driver = webdriver.Chrome(options=options)
driver.get("https://cnblogs.com/")
print(driver.page_source)
7. lxml
Used for parsing web pages.
8. beautifulsoup4
9. pyquery
Parses the DOM tree and supports jQuery-style selectors.