Python3 Web crawler - environment configuration

This post mainly covers:

  Installing Python, the MongoDB, Redis, and MySQL databases, and the common Python crawler libraries; the graphical tools used are Robo 3T, Redis, and Navicat for MySQL

Python:

  My computer has Python 3.5 and 3.7 installed. The editor is Sublime Text 3.

  Several libraries need to be installed with pip:

pip3 install lxml

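# BeautifulSoup comes from the beautifulsoup4 package (installed further down); "lxml" names the parser
from bs4 import BeautifulSoup
soup = BeautifulSoup("<html></html>", "lxml")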

   For details on usage, see: https://www.cnblogs.com/zhangxinqi/p/9210211.html#_label6
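
As a slightly fuller illustration of the lxml parser used through BeautifulSoup, here is a minimal sketch that pulls a heading and the link targets out of a small page. The HTML fragment and the variable names are made up for this example, and it assumes both lxml and beautifulsoup4 are already installed.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment, used only for illustration.
html = """
<html><body>
  <h1>Demo</h1>
  <a href="/page/1">First</a>
  <a href="/page/2">Second</a>
</body></html>
"""

# "lxml" selects the lxml-backed HTML parser.
soup = BeautifulSoup(html, "lxml")

# Text of the first <h1>, and the href attribute of every <a> tag.
title = soup.h1.get_text()
links = [a["href"] for a in soup.find_all("a")]

print(title)
print(links)
```

In a real crawler the html string would come from a downloaded page rather than a literal.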

pip3 install pyquery

from pyquery import PyQuery as pq
doc = pq("<html>Hello</html>")
result = doc("html").text()
print(result)  # Hello

   For details on usage, see: https://www.cnblogs.com/zhaof/p/6935473.html

pip3 install pymysql

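import pymysql
# Connect to a local MySQL server; the credentials and the cookbook database are from my own setup
conn = pymysql.connect(host='localhost', user='root', password='123456', port=3306, database='cookbook')
cursor = conn.cursor()
cursor.execute('select * from color')
print(cursor.fetchall())  # every row of the color table
conn.close()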

   pymysql is a third-party library for working with a MySQL database; the database it connects to must already exist.

pip3 install pymongo

import pymongo
client = pymongo.MongoClient('localhost')
db = client['newtestdb']
db['table'].insert_one({'name':'Bob'})  # insert() was removed in PyMongo 4; insert_one() replaces it
db['table'].find_one({'name':'Bob'})

pip3 install redis

import redis
r = redis.Redis('localhost', 6379)
r.set('name', 'Bob')
r.get('name')  # b'Bob' (bytes, unless decode_responses=True is passed to Redis)

For the differences between MongoDB, Redis, and MySQL, see: https://www.cnblogs.com/noah0532/p/10943120.html

 

pip3 install beautifulsoup4

  For details on usage, see: https://www.cnblogs.com/hanmk/p/8724162.html

 

pip3 install flask

pip3 install django

  For a feature comparison of Flask and Django, see: https://www.cnblogs.com/crss/p/8532950.html

pip3 install jupyter

Jupyter is a nice Python editor.
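
Once the pip installs above have run, a quick way to confirm the environment is to ask Python whether each package can be found. The sketch below is only one way to do that: REQUIRED and missing_modules are names invented here, and jupyter_core as the import name for the jupyter install is an assumption. Note that the import name can differ from the pip name (beautifulsoup4 is imported as bs4).

```python
import importlib.util

# Import names for the packages installed above (assumed mapping from pip names).
REQUIRED = [
    "lxml", "bs4", "pyquery", "pymysql", "pymongo",
    "redis", "flask", "django", "jupyter_core",
]

def missing_modules(names):
    """Return the top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED)
print("missing:", ", ".join(missing) if missing else "none")
```

Anything listed as missing can be installed with the matching pip3 install command and the check run again.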

How to open an .ipynb file?

  In cmd, change to the folder that contains the .ipynb file -> run jupyter notebook; the web interface opens in the browser, where the file can be viewed.
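
An .ipynb file is plain JSON, so its code can also be inspected without starting the notebook server. The following is a minimal sketch using only the standard library; code_cells is a hypothetical helper name, and it assumes the notebook uses the nbformat 4 layout (a top-level "cells" list).

```python
import json

def code_cells(path):
    """Return the source of each code cell in a notebook file, one string per cell."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)

    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The source is stored either as one string or as a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    return sources
```

For example, calling code_cells on a notebook path and printing each returned string shows the code without any of the cell outputs.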

 

 

How to use Selenium + Chrome?

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
driver = webdriver.Chrome(options=chrome_options)  # the old chrome_options= keyword is no longer accepted by current Selenium
driver.get("http://www.baidu.com")
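html = driver.page_source  # HTML of the page after the browser has rendered it
driver.quit()  # close the headless browser when done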

 


Origin www.cnblogs.com/HannahGreen/p/11929209.html