Installing commonly used Python web-crawler libraries

Disclaimer: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/u011033906/article/details/90737639

Installation method

C:\Users\username\AppData\Local\Programs\Python\Python37\Scripts
pip.exe
pip3.7.exe
pip3.exe

These three commands are exactly the same program under different names. It is recommended to add this directory to the PATH environment variable so the commands can be run from anywhere.
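To confirm that the Scripts directory really is on the PATH, a quick check from Python might look like the sketch below. It relies only on the standard-library shutil.which; the helper name find_on_path is made up for this illustration.

```python
import shutil


def find_on_path(commands):
    """Map each command name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in commands}


# The three pip launchers from the Scripts directory listed above.
for name, path in find_on_path(["pip", "pip3", "pip3.7"]).items():
    print(name, "->", path or "not found on PATH")
```

If all three names resolve to the same Scripts directory, the environment variable is set up as intended.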

Install third-party libraries:

pip install libraryname
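After installing, one way to see which libraries are available is to ask Python whether it can locate each module. The following is a minimal sketch using the standard-library importlib.util.find_spec; the helper name is_installed and the list of modules are assumptions chosen to match the libraries covered in this article.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# These are import names, which can differ from the pip package name
# (for example, "pip install beautifulsoup4" provides the module "bs4").
for module in ["requests", "selenium", "lxml", "bs4", "pyquery"]:
    print(module, "installed" if is_installed(module) else "missing")
```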

1. urllib

Ships with Python; no extra installation is needed.

import urllib.request

# Send a GET request; urlopen returns an http.client.HTTPResponse object
response = urllib.request.urlopen("http://www.baidu.com")
print(response)

Output like the following indicates that the request succeeded:
<http.client.HTTPResponse object at 0x0000021B8D6D8CF8>
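Printing the response object only shows that a connection was made. To actually read the status code and the page body, something like the sketch below could be used. The helper names build_request and fetch_text, the default User-Agent string, and the 10-second timeout are all assumptions for illustration, not part of the original example.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Create a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_text(url, timeout=10):
    """Fetch a URL and return (status code, body decoded as UTF-8)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Undecodable bytes are replaced rather than raising an error.
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body
```

Calling fetch_text("http://www.baidu.com") needs network access and should return the HTTP status together with the page HTML when the request succeeds.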

2. requests

import requests

# Send a GET request; printing the Response shows its status code
response = requests.get("http://www.baidu.com")
print(response)

Output like the following indicates that the request succeeded:
<Response [200]>

3. Regular Expressions module

re ships with Python; no extra installation is needed. If import re runs without raising an error, the library is available.
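As a small illustration of what re is typically used for when crawling, the sketch below pulls link targets out of a page fragment. The HTML string is a made-up example, and the pattern only handles double-quoted href attributes.

```python
import re

# Hypothetical HTML fragment used only for this illustration.
html = '<a href="http://www.baidu.com">Baidu</a> <a href="https://cnblogs.com/">cnblogs</a>'

# Capture the value of every double-quoted href attribute.
links = re.findall(r'href="([^"]+)"', html)
print(links)
```

Regular expressions are handy for quick extraction like this, but for nested or irregular HTML a real parser such as lxml or beautifulsoup4 (sections 7 and 8) is usually more robust.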

4. selenium

This library drives a real browser and is mainly used for automated testing. When crawling, we run into pages whose content is rendered by JavaScript, and requests alone cannot retrieve the correct content for them. In that case selenium can drive a browser directly: the browser executes the JavaScript, and once the page has been rendered we can read the resulting content.

5. chromedriver

Pay attention to the version when downloading: chromedriver needs to match the installed Chrome version. Download the 32-bit build, then extract it into a directory that is already on the PATH environment variable.

import selenium
from selenium import webdriver

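# Start Chrome through chromedriver (section 5)
driver = webdriver.Chrome()
driver.get("http://www.baidu.com")
print(driver.page_source)  # HTML after the browser has rendered the page
driver.quit()  # close the browser and stop the driver process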

6. phantomjs or headless

selenium opens a visible browser window, while phantomjs drives a browser that has no interface.

However, phantomjs appears to have been deprecated and replaced by headless mode.

import selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

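options = Options()
options.add_argument('--headless')  # run Chrome without a visible window
options.add_argument('--disable-gpu')
driver = webdriver.Chrome(options=options)
driver.get("https://cnblogs.com/")
print(driver.page_source)
driver.quit()  # close the browser and stop the driver process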

7. lxml

Used for parsing web pages.

8. beautifulsoup4

9. pyquery

Parses the DOM tree and supports jQuery-style selectors.

10. pymysql | pymongo | redis | flask | django | jupyter
