Python crawlers from scratch (6): the Selenium library

What is the Selenium library:

  An automated testing tool that supports multiple browsers, including IE (7, 8, 9, 10, 11), Mozilla Firefox, Safari, Google Chrome, Opera, and so on.

In crawlers it is mainly used to solve the problem of JavaScript-rendered pages: it drives a real browser and makes the browser perform operations.

Installing the Selenium library: pip3 install selenium

Using the Selenium library in detail:

  Before using WebDriver we need to install the driver for our browser. The installation steps depend on the browser and platform, so search for them yourself, and remember to pick the driver version that corresponds to your browser version.
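
To make "the corresponding version" concrete, here is a minimal sketch of a version check. It assumes the common rule of thumb for Chrome and ChromeDriver, namely that the major version numbers should be equal. The helper names and the version strings are made up for illustration; they are not part of Selenium.

```python
def major_version(version: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version.strip().split(".")[0])


def driver_matches_browser(browser_version: str, driver_version: str) -> bool:
    """Assumption: a driver fits a browser when their major versions are equal."""
    return major_version(browser_version) == major_version(driver_version)


# Made-up version strings, for illustration only.
print(driver_matches_browser("114.0.5735.199", "114.0.5735.90"))  # True
print(driver_matches_browser("115.0.5790.102", "114.0.5735.90"))  # False
```

If the check fails, the usual fix is to download the driver build that matches the installed browser.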

  Basic use:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Basic usage
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

browser = webdriver.Chrome()
try:
    browser.get("http://www.baidu.com")
    # find_element(By.ID, ...) replaces find_element_by_id, which was removed in Selenium 4.3
    input_box = browser.find_element(By.ID, 'kw')
    input_box.send_keys('Python')
    input_box.send_keys(Keys.ENTER)
    # Wait up to 10 seconds for the result list to appear
    wait = WebDriverWait(browser, 10)
    wait.until(EC.presence_of_element_located((By.ID,'content_left')))
    print(browser.current_url)
    print(browser.get_cookies())
    print(browser.page_source)
finally:
    browser.close()

If this code runs, your WebDriver version is correct (Google Chrome must be installed).

  Declaring the browser object:

We have just said that Selenium supports multiple browsers; here is how to declare each of them:

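#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Declaring the browser object (each line launches that browser; pick the one you need)
from selenium import webdriver

browser = webdriver.Chrome()
browser = webdriver.Safari()
browser = webdriver.Edge()
browser = webdriver.Firefox()
browser = webdriver.PhantomJS()  # Selenium 3 only; PhantomJS support was removed in Selenium 4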

I don't have those other browsers installed here, so I won't run this code for you. I recommend using the Chrome browser (Google Chrome).

Visiting a page:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Visiting a page
from selenium import webdriver
browser = webdriver.Chrome()
browser.get("http://baidu.com")
print(browser.page_source)
browser.close()

Finding elements:

  Single element:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Finding elements: single element
# Note: the find_element_by_* helpers exist in Selenium 3 only;
# they were removed in Selenium 4.3.
from selenium import webdriver
browser = webdriver.Chrome()
browser.get("http://taobao.com")
# Three ways of locating the same search box
input_first = browser.find_element_by_id('q')
input_second = browser.find_element_by_css_selector('#q')
input_three = browser.find_element_by_xpath('//*[@id="q"]')
print(input_first)
print(input_second)
print(input_three)
browser.close()

  • find_element_by_name  
  • find_element_by_xpath  
  • find_element_by_link_text
  • find_element_by_partial_link_text
  • find_element_by_tag_name
  • find_element_by_class_name
  • find_element_by_css_selector

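These are all ways of looking up an element. They are the Selenium 3 helper methods and were removed in Selenium 4.3.

You can also look elements up with the generic find_element method and a By locator, which works in both Selenium 3 and Selenium 4: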

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Finding elements: single element, generic form
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get("http://taobao.com")
input_first = browser.find_element(By.ID,'q')
print(input_first)
browser.close()
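
As a sketch of how the helper names relate to the generic form, the table below maps each legacy helper to a locator strategy string. The strings are the values of the By constants (By.ID is "id", By.CSS_SELECTOR is "css selector", and so on); the dictionary and the to_locator function are my own illustration, not part of Selenium.

```python
# Locator strategy strings; these are the values of the By constants.
LEGACY_TO_STRATEGY = {
    "find_element_by_id": "id",
    "find_element_by_name": "name",
    "find_element_by_xpath": "xpath",
    "find_element_by_link_text": "link text",
    "find_element_by_partial_link_text": "partial link text",
    "find_element_by_tag_name": "tag name",
    "find_element_by_class_name": "class name",
    "find_element_by_css_selector": "css selector",
}


def to_locator(legacy_name: str, value: str) -> tuple:
    """Translate a legacy helper call into a (strategy, value) locator tuple."""
    return (LEGACY_TO_STRATEGY[legacy_name], value)


print(to_locator("find_element_by_id", "q"))             # ('id', 'q')
print(to_locator("find_element_by_css_selector", "#q"))  # ('css selector', '#q')
```

A tuple in this shape can be unpacked straight into the generic call, for example browser.find_element(*to_locator("find_element_by_id", "q")).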

Multiple elements:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Finding elements: multiple elements
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get("http://taobao.com")
# find_elements (plural) returns a list of every matching element
items = browser.find_elements(By.CSS_SELECTOR, '.service-bd li')
for item in items:
    print(item)
browser.close()

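The find_elements methods are used in exactly the same way as find_element, but they return a list of elements.

Interacting with elements:

Call interaction methods on the element you have obtained: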

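#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Interacting with elements

from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get("http://baidu.com")
input_first = browser.find_element(By.ID,'kw')
# The search text means "Python, from getting started to giving up"
input_first.send_keys('python从入坑到放弃')
# The button carries two classes ("bg s_btn"); a class-name lookup takes only
# one class, so match both with a CSS selector instead
button = browser.find_element(By.CSS_SELECTOR, '.bg.s_btn')
button.click()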

Running this code opens Chrome, types in the search text, and then clicks the search button. For more operations see: https://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.remote.webelement
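
One thing to watch when locating a button like this one: a class-name lookup expects a single class name, so a class attribute with a space in it, such as "bg s_btn", has to be expressed as a CSS selector. A minimal sketch of that conversion follows; classes_to_css is a hypothetical helper of mine, not a Selenium function.

```python
def classes_to_css(class_attr: str) -> str:
    """Turn a class attribute such as 'bg s_btn' into the selector '.bg.s_btn'."""
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    return "".join("." + name for name in names)


print(classes_to_css("bg s_btn"))  # .bg.s_btn
print(classes_to_css("logo"))      # .logo
```

The resulting string can be passed to find_element together with By.CSS_SELECTOR.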

Action chains:

Append actions to an action chain, which then executes them one after another:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Action chains
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
url = 'https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable'
browser.get(url)
# The demo lives inside an iframe, so switch into it first
browser.switch_to.frame('iframeResult')
source = browser.find_element(By.ID, 'draggable')
target = browser.find_element(By.ID, 'droppable')
actions = ActionChains(browser)
# Queue the drag-and-drop, then run everything queued so far
actions.drag_and_drop(source, target)
actions.perform()

Running this code drags the inner block onto the drop target. For more detailed operations see: https://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.common.action_chains

Executing JavaScript: ⭐️⭐️⭐️⭐️⭐️

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Executing JavaScript
from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://www.zhihu.com/explore')
# Scroll to the bottom of the page, then show an alert box
browser.execute_script('window.scrollTo(0,document.body.scrollHeight)')
browser.execute_script('alert("Popup")')

Running this code scrolls the page to the bottom and then shows an alert box.

Getting element information:

  Getting attributes:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Getting element information: attributes
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
url = "http://www.zhihu.com/explore"
browser.get(url)
logo = browser.find_element(By.ID, 'zh-top-link-logo')
print(logo)
# get_attribute returns the value of the named HTML attribute
print(logo.get_attribute('class'))

Getting the text value:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Getting the text value
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
url = "http://www.zhihu.com/explore"
browser.get(url)
question = browser.find_element(By.CLASS_NAME, 'zu-top-add-question')
print(question.text)

Getting the ID, location, tag name, and size:

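#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Getting the ID, location, tag name, and size
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
url = "http://www.zhihu.com/explore"
browser.get(url)
question = browser.find_element(By.CLASS_NAME, 'zu-top-add-question')
print(question.id)        # Selenium's internal element ID, not the HTML id attribute
print(question.location)  # dict with 'x' and 'y'
print(question.tag_name)
print(question.size)      # dict with 'height' and 'width'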
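
The location and size values are plain dictionaries, so they are easy to combine. As a small sketch, the hypothetical helper below computes the centre point of an element from the two; the sample numbers are made up and only follow the shape of what element.location and element.size return.

```python
def element_center(location: dict, size: dict) -> tuple:
    """Return the (x, y) centre of an element's bounding box in page pixels."""
    return (
        location["x"] + size["width"] / 2,
        location["y"] + size["height"] / 2,
    )


# Made-up values in the shape returned by element.location and element.size.
location = {"x": 675, "y": 7}
size = {"height": 32, "width": 66}
print(element_center(location, size))  # (708.0, 23.0)
```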

Frame:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Frame
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
url = 'https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable'
browser.get(url)
# Inside the iframe only the frame's own elements can be found
browser.switch_to.frame('iframeResult')
source = browser.find_element(By.ID, 'draggable')
print(source)
try:
    logo = browser.find_element(By.CLASS_NAME, 'logo')
except NoSuchElementException:
    print("NO LOGO")
# Back in the parent frame the logo is reachable again
browser.switch_to.parent_frame()
logo = browser.find_element(By.CLASS_NAME, 'logo')
print(logo)
print(logo.text)

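Waits:

Implicit wait:

When an implicit wait is set and WebDriver does not find an element in the DOM, it keeps waiting, and raises a "no such element" exception once the configured time is exceeded. In other words, when the element being looked up does not appear immediately, the implicit wait keeps retrying the DOM lookup for a while before giving up. The default time is 0.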

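#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Implicit wait
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
# Every element lookup from here on waits up to 10 seconds
browser.implicitly_wait(10)
url = "http://www.zhihu.com/explore"
browser.get(url)
element = browser.find_element(By.CLASS_NAME, 'zu-top-add-question')
print(element)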

Explicit wait (the more commonly used kind):

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Explicit wait
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

browser = webdriver.Chrome()

browser.get("http://www.taobao.com")
wait = WebDriverWait(browser, 10)
# until() returns the element once the condition holds, so keep the result
input_box = wait.until(EC.presence_of_element_located((By.ID,'q')))
button = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR,'.btn-search')))
print(input_box, button)
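
Wait conditions provided by expected_conditions:

  • title_is: the title equals the given text
  • title_contains: the title contains the given text
  • presence_of_element_located: the element is present in the DOM; takes a locator tuple such as (By.ID,'p')
  • visibility_of_element_located: the element is visible; takes a locator tuple
  • visibility_of: the element is visible; takes an element object
  • presence_of_all_elements_located: all matching elements are present in the DOM
  • text_to_be_present_in_element: the element's text contains the given text
  • text_to_be_present_in_element_value: the element's value contains the given text
  • frame_to_be_available_and_switch_to_it: the frame is available, and the driver switches to it
  • invisibility_of_element_located: the element is not visible
  • element_to_be_clickable: the element is clickable
  • staleness_of: checks whether an element is no longer attached to the DOM; useful for detecting that the page has refreshed
  • element_to_be_selected: the element is selected; takes an element object
  • element_located_to_be_selected: the element is selected; takes a locator tuple
  • element_selection_state_to_be: takes an element object and a state; returns True if they match, otherwise False
  • element_located_selection_state_to_be: takes a locator tuple and a state; returns True if they match, otherwise False
  • alert_is_present: an alert is present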

  For the details, read the official documentation: https://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.support.expected_conditions
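
To show the idea behind an explicit wait, here is a simplified sketch of a polling loop. It is not Selenium's actual implementation (the real WebDriverWait also ignores certain exceptions while polling); wait_until, WaitTimeout, FakeDriver, and this title_contains are hypothetical stand-ins so that the example runs without a browser. The point is that a condition is just a callable that receives the driver and returns something truthy when it is satisfied.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(driver, condition, timeout=10.0, poll=0.5):
    """Call condition(driver) repeatedly until it returns something truthy."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll)


class FakeDriver:
    """Hypothetical driver whose title only becomes available on the third read."""

    def __init__(self):
        self.polls = 0

    @property
    def title(self):
        self.polls += 1
        return "Python - search" if self.polls >= 3 else ""


def title_contains(text):
    """Build a condition: a callable that takes the driver and returns a bool."""
    return lambda driver: text in driver.title


driver = FakeDriver()
print(wait_until(driver, title_contains("Python"), timeout=2, poll=0.01))  # True
print(driver.polls)  # 3
```

Because a condition is only a callable, you can write your own in the same shape when none of the built-in ones fits.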

Back and forward:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Back and forward
from selenium import webdriver

browser = webdriver.Chrome()

browser.get("http://www.taobao.com")
browser.get("http://www.baidu.com")
browser.get("http://www.zhihu.com")
browser.back()
browser.forward()

Running this code first opens taobao.com, then baidu.com, and finally zhihu.com, and then performs the back and forward actions.

Cookies:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Cookies
from selenium import webdriver

browser = webdriver.Chrome()

browser.get("http://www.zhihu.com")
print(browser.get_cookies())
browser.add_cookie({'name':'admin','domain':'www.zhihu.com','value':'cxiaocai'})
print(browser.get_cookies())
browser.delete_all_cookies()
print(browser.get_cookies())
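
get_cookies() returns a list of dictionaries, one per cookie. In a crawler it is often handy to flatten that list, for example to reuse the cookies in another HTTP client. A minimal sketch follows; both helper functions are my own, and the sample data is made up to mirror the shape of the real return value.

```python
def cookies_to_dict(cookies: list) -> dict:
    """Collapse a list of cookie dicts into a simple {name: value} mapping."""
    return {cookie["name"]: cookie["value"] for cookie in cookies}


def cookies_to_header(cookies: list) -> str:
    """Format cookies as the value of an HTTP Cookie request header."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


# Made-up sample in the shape returned by browser.get_cookies().
sample = [
    {"name": "admin", "value": "cxiaocai", "domain": "www.zhihu.com", "path": "/"},
    {"name": "lang", "value": "en", "domain": "www.zhihu.com", "path": "/"},
]
print(cookies_to_dict(sample))    # {'admin': 'cxiaocai', 'lang': 'en'}
print(cookies_to_header(sample))  # admin=cxiaocai; lang=en
```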

Tab management:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Tab management
from selenium import webdriver

browser = webdriver.Chrome()

browser.get("http://www.baidu.com")
browser.execute_script('window.open()')
print(browser.window_handles)
browser.switch_to.window(browser.window_handles[1])
browser.get('http://www.taobao.com')
browser.switch_to.window(browser.window_handles[0])
browser.get('http://www.zhihu.com')

You can also open windows by sending the browser's keyboard shortcuts (I don't recommend this; manage tabs the way shown above instead).
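
Indexing window_handles with [1] assumes the new tab is listed second. As far as I know the WebDriver specification does not promise any particular order, so a more defensive sketch is to record the handles before opening the tab and look for the one that was added. new_handle is a hypothetical helper, and the handle strings are made up.

```python
def new_handle(before: list, after: list) -> str:
    """Return the single handle present in `after` but not in `before`."""
    added = set(after) - set(before)
    if len(added) != 1:
        raise ValueError(f"expected exactly one new window, found {len(added)}")
    return added.pop()


# Made-up handles; note that the new one is listed first here.
before = ["CDwindow-A"]
after = ["CDwindow-B", "CDwindow-A"]
print(new_handle(before, after))  # CDwindow-B
```

With a real browser, before would be browser.window_handles read prior to window.open(), and after the same property read afterwards.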

Exception handling:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Exception handling
from selenium import webdriver
from selenium.common.exceptions import TimeoutException,NoSuchElementException
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
try:
    browser.get("http://www.baidu.com")
except TimeoutException:
    print("Request timed out")
try:
    # No element with this id exists, so the lookup raises
    browser.find_element(By.ID, 'hello')
except NoSuchElementException:
    print("NoSuchElementException")

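Exception handling is fairly involved and there are many exception types, so I won't list them one by one here. I suggest looking them up in the official documentation: https://selenium-python.readthedocs.io/api.html#module-selenium.common.exceptions

The code above is available at: https://gitee.com/dwyui/senlenium.git

That wraps up the Selenium library, and with it the Python libraries used for crawling: urllib, Requests, BeautifulSoup, and PyQuery from the earlier posts, plus today's Selenium. Starting tomorrow I will go straight to real cases; I will be putting together a few simple crawler examples soon.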

Origin www.cnblogs.com/cxiaocai/p/10946900.html