Web crawler - selenium
Features
A browser-based automation module
It can simulate manual operations in the browser
1. Makes it easy to obtain data that a site loads dynamically
2. Makes it easy to simulate logging in
3. Can capture data that is loaded dynamically by JavaScript
Driver downloads and driver/browser version compatibility
http://chromedriver.storage.googleapis.com/index.html
http://blog.csdn.net/huilan_same/article/details/51896672
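As a rough way to check that a downloaded driver fits the installed browser, the major version numbers can be compared. The sketch below assumes a recent ChromeDriver release that follows Chrome's own version numbering (older 2.x drivers need a lookup table such as the one in the second link above). The helper names major_version and versions_match and the sample version strings are illustrative only and are not part of selenium.

```python
import re


def major_version(version_output):
    """Return the major version found in text such as
    'ChromeDriver 114.0.5735.90 (...)' or 'Google Chrome 114.0.5735.198'."""
    # Assumption: the text contains a four-part dotted version number.
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_match(browser_output, driver_output):
    """Heuristic check: browser and driver share the same major version."""
    return major_version(browser_output) == major_version(driver_output)


# Sample strings (illustrative, not taken from a real machine)
print(versions_match("Google Chrome 114.0.5735.198",
                     "ChromeDriver 114.0.5735.90"))
```

The input strings would typically come from running the browser and the driver with their --version flag; only the major number is compared here, which is a coarse check rather than a guarantee of compatibility.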
Simple example
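from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from lxml import etree
# Instantiate a browser object; the driver must be passed in.
# Selenium 4 takes the driver path through a Service object rather than executable_path.
chrome=webdriver.Chrome(service=Service(executable_path="chromedriver"))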
url='https://www.ixigua.com/i6701605562779435533/'
url2='http://125.35.6.84:81/xk'
# Have the browser send the request
chrome.get(url=url2)
# Get the source of the page as currently rendered by the browser
page_text=chrome.page_source
print(page_text)
tree=etree.HTML(page_text)
c_name=tree.xpath("//ul[@id='gzlist']/li/dl/@title")
print(c_name)
# Close the browser
chrome.quit()
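If an exception is raised between creating the browser and the final quit() call, the browser process is left running. One possible way to guard against that is a small context manager. This is only a sketch: managed_browser and fetch_page_source are hypothetical helper names, and factory is assumed to be any zero-argument callable that returns a driver object, for example a function that builds webdriver.Chrome.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(factory):
    """Create a driver via factory() and always call quit() on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        # Runs on normal exit and when an exception propagates
        driver.quit()


def fetch_page_source(factory, url):
    """Open url and return the rendered page source, closing the browser after."""
    with managed_browser(factory) as driver:
        driver.get(url)
        return driver.page_source


# Possible usage with a real browser (assumes selenium and a driver are installed):
# from selenium import webdriver
# page_text = fetch_page_source(webdriver.Chrome, "http://125.35.6.84:81/xk")
```

Passing a factory rather than a ready-made driver keeps creation and cleanup in one place, so the caller cannot forget to quit the browser.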
Commonly used methods
1. Send a request: get(url)
2. Locate elements: the find family of methods
3. Interact with an element (for example, typing a value into an input box): send_keys()
4. Execute JavaScript from the program: execute_script("jscode")
5. Go back and forward: back() / forward()
6. Close the browser: quit()
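from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
# Selenium 4 style: the driver path goes through a Service object
chrome=webdriver.Chrome(service=Service(executable_path="chromedriver"))
url="https://www.jd.com"
url2="https://www.baidu.com"
# Send the request
chrome.get(url)
# Locate the search box
search_box=chrome.find_element(By.ID,"key")
# Type the search term into the search box ("显卡" means "graphics card")
search_box.send_keys("显卡")
# Locate the search button
button=chrome.find_element(By.XPATH,'//*[@id="search"]/div/div[2]/button')
# Click it to run the search
button.click()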
time.sleep(2)
# Navigate to Baidu
chrome.get(url2)
time.sleep(2)
# Go back
chrome.back()
time.sleep(2)
# Go forward
chrome.forward()
chrome.back()
# Scroll to the bottom of the page three times, pausing after each scroll
for i in range(3):
    chrome.execute_script('window.scrollTo(0,document.body.scrollHeight)')
    time.sleep(2)
time.sleep(3)
# Quit the browser
chrome.quit()
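A fixed number of scrolls may stop too early, or keep scrolling after nothing new loads. A possible alternative, sketched below, keeps scrolling until document.body.scrollHeight stops growing. scroll_until_stable and its parameters are hypothetical names; the function only assumes that the driver has the execute_script method used above, and the sleep argument is injectable so the logic can be tried without a real browser.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=10, sleep=time.sleep):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scrolls performed (at most max_rounds).
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        rounds += 1
        # Give lazily loaded content time to arrive
        sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # Nothing new was loaded, so stop scrolling
            break
        last_height = new_height
    return rounds
```

With a real browser this would be called as scroll_until_stable(chrome) before reading chrome.page_source. The max_rounds cap is there because some pages load content without end.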