Python crawler study notes (seven) - Selenium

Table of contents

1. What is selenium?

2. Why use selenium?

3. selenium installation

4. How to use selenium

5. Element positioning of selenium

6. Access element information

7. Interaction

1. What is selenium?

(1) Selenium is a tool for testing web applications.

(2) Selenium tests run directly in the browser, just as if a real user were operating it.

(3) It can drive real browsers through browser-specific drivers (FirefoxDriver, InternetExplorerDriver, OperaDriver, ChromeDriver) to carry out tests.

(4) Selenium also supports headless browsers, i.e. running a browser without a visible window.

2. Why use selenium?

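Selenium drives a real browser, so the JavaScript on a web page runs automatically and dynamically loaded content ends up in the page. A plain HTTP request only returns the initial HTML and misses that content.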
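Dynamically loaded content is not there the instant the page opens, so a crawler has to wait for it. The sketch below is a simplified, hand-rolled explicit wait: it polls until an element appears instead of sleeping a fixed time. It assumes `driver` is a Selenium WebDriver (anything with a `find_elements(by, value)` method works); `wait_for_elements` is a hypothetical helper name, not part of Selenium.

```python
import time


def wait_for_elements(driver, by, value, timeout=10.0, poll=0.5):
    """Poll until at least one element matching (by, value) exists.

    Returns the list of matching elements, or raises TimeoutError
    if nothing appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        # find_elements returns an empty list (no exception) when nothing matches
        found = driver.find_elements(by, value)
        if found:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no element for {by}={value!r} after {timeout}s")
        time.sleep(poll)  # give the page's JavaScript time to add the element
```

For example, `wait_for_elements(browser, "id", "kw")` waits for Baidu's search box. Selenium itself ships the same idea as `WebDriverWait` plus `expected_conditions`, e.g. `WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID, 'kw')))`, which is usually preferable in real code.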

3. selenium installation

(1) ChromeDriver (the driver for Google Chrome) download address (this index covers Chrome 114 and earlier; drivers for newer versions are published through the Chrome for Testing dashboard)

http://chromedriver.storage.googleapis.com/index.html

(2) Version mapping table between ChromeDriver and Google Chrome

http://blog.csdn.net/huilan_same/article/details/51896672

(3) Check your Google Chrome version

Chrome menu (top-right corner) -> Help -> About Google Chrome

(4) Install the library: pip install selenium
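These notes use the older Selenium API, so it is worth checking which version `pip` installed. A minimal sketch: `parse_version` and `legacy_api_notes` are hypothetical helpers, and the two thresholds (4.3.0 removed the `find_element_by_*` methods, 4.10.0 removed passing the driver path directly to `webdriver.Chrome`) come from my reading of the Selenium Python changelog, so double-check them against the release notes.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '4.2.0' into (4, 2, 0); trailing tags such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)  # leading digits only
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def legacy_api_notes(selenium_version):
    """List which Selenium 3 idioms from these notes no longer work."""
    v = parse_version(selenium_version)
    notes = []
    if v >= (4, 3, 0):
        notes.append("find_element_by_* removed: use find_element(By.ID, 'su') etc.")
    if v >= (4, 10, 0):
        notes.append("webdriver.Chrome(path) removed: use webdriver.Chrome(service=Service(path))")
    return notes


if __name__ == "__main__":
    try:
        installed = version("selenium")
    except PackageNotFoundError:
        print("selenium is not installed: pip install selenium")
    else:
        print("selenium", installed)
        for note in legacy_api_notes(installed):
            print(" -", note)
```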

4. How to use selenium

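(1) Import: from selenium import webdriver (in Selenium 4, also: from selenium.webdriver.chrome.service import Service)

(2) Create a Chrome browser object:

path = path to the ChromeDriver executable

browser = webdriver.Chrome(service=Service(path))   (the Selenium 3 form webdriver.Chrome(path) no longer works in Selenium 4.10+)

(3) Visit a URL

url = URL to visit

browser.get(url)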


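# (1) Import selenium
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# (2) Create the browser object
path = 'chromedriver.exe'
browser = webdriver.Chrome(service=Service(path))

# (3) Visit a website
# url = 'https://www.baidu.com'
# browser.get(url)

url = 'https://www.jd.com/'
browser.get(url)

# page_source returns the page's current HTML (after its JavaScript has run)
content = browser.page_source
print(content)

# Close the browser and stop chromedriver
browser.quit()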
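If the script fails halfway, Chrome stays open and chromedriver keeps running. A minimal sketch that guarantees `quit()` is called, assuming you pass in a zero-argument function that creates the driver; `browser_session` and `fetch_source` are hypothetical helper names, not Selenium API.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(make_driver):
    """Create a driver with the given factory and always call quit() on exit."""
    driver = make_driver()
    try:
        yield driver
    finally:
        driver.quit()  # closes every window and stops the driver process


def fetch_source(make_driver, url):
    """Open `url` in a fresh browser and return its page_source."""
    with browser_session(make_driver) as driver:
        driver.get(url)
        return driver.page_source


# Usage with a real browser (needs selenium and chromedriver):
# from selenium import webdriver
# from selenium.webdriver.chrome.service import Service
# html = fetch_source(lambda: webdriver.Chrome(service=Service('chromedriver.exe')),
#                     'https://www.jd.com/')
```

Passing a factory rather than a ready-made driver keeps creation and cleanup in one place, and makes the helper easy to test with a stand-in object.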

5. Element positioning of selenium

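Element positioning: automation works by simulating mouse and keyboard actions (clicking, typing, and so on) on page elements. Before you can operate on an element you must first locate it, and WebDriver provides several methods for doing so.

Methods (Selenium 3 style):

(1) find_element_by_id

                e.g. button = browser.find_element_by_id('su')

(2) find_element_by_name

                e.g. name = browser.find_element_by_name('wd')

(3) find_elements_by_xpath

                e.g. xpath1 = browser.find_elements_by_xpath('//input[@id="su"]')

(4) find_elements_by_tag_name

                e.g. names = browser.find_elements_by_tag_name('input')

(5) find_elements_by_css_selector

                e.g. my_input = browser.find_elements_by_css_selector('#kw')[0]

(6) find_element_by_link_text

                e.g. browser.find_element_by_link_text('新闻')   # the "News" link

Note: in Selenium 4 these helpers are merged into find_element(by, value) and find_elements(by, value), where by is the locator strategy (usually a By constant such as By.ID, By.NAME, By.XPATH, By.TAG_NAME, By.CSS_SELECTOR or By.LINK_TEXT, imported from selenium.webdriver.common.by) and value is what to match. The old find_element_by_* methods were deprecated in 4.0 and removed in 4.3, so find_element_by_id('su') becomes find_element(By.ID, 'su'). find_element returns the first match (and raises NoSuchElementException if there is none), while find_elements returns a list, which is why example (5) takes [0].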
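To port older scripts mechanically, a small translation table helps. A sketch: `find_legacy` and `LEGACY_SUFFIX_TO_BY` are hypothetical names, and the strategy strings are the values of Selenium 4's By constants (By.ID is `'id'`, By.CSS_SELECTOR is `'css selector'`, and so on), which `find_element(by=..., value=...)` accepts directly.

```python
# Old method suffix -> locator strategy string (the value of the matching By constant)
LEGACY_SUFFIX_TO_BY = {
    "id": "id",
    "name": "name",
    "xpath": "xpath",
    "tag_name": "tag name",
    "class_name": "class name",
    "css_selector": "css selector",
    "link_text": "link text",
    "partial_link_text": "partial link text",
}


def find_legacy(driver, legacy_method, value):
    """Run an old call such as 'find_elements_by_xpath' through find_element(s)."""
    prefix, sep, suffix = legacy_method.partition("_by_")
    if not sep or prefix not in ("find_element", "find_elements"):
        raise ValueError(f"not a legacy locator method: {legacy_method!r}")
    by = LEGACY_SUFFIX_TO_BY[suffix]  # KeyError for an unknown strategy
    # prefix picks single (find_element) or list (find_elements) lookup
    return getattr(driver, prefix)(by=by, value=value)
```

For example, `find_legacy(browser, 'find_element_by_id', 'su')` performs the same lookup as `browser.find_element(By.ID, 'su')`.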


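from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

path = 'chromedriver.exe'
browser = webdriver.Chrome(service=Service(path))

url = 'https://www.baidu.com'
browser.get(url)

# Element positioning

# Find an element by its id
button = browser.find_element(By.ID, 'su')
print(button)

# Find an element by the value of its name attribute
search_box = browser.find_element(By.NAME, 'wd')
print(search_box)

# Find elements with an XPath expression (find_elements returns a list)
buttons = browser.find_elements(By.XPATH, '//input[@id="su"]')
print(buttons)

# Find elements by tag name
inputs = browser.find_elements(By.TAG_NAME, 'input')
print(inputs)

# Find elements with a CSS selector (the same syntax as bs4's select())
buttons = browser.find_elements(By.CSS_SELECTOR, '#su')
print(buttons)

# Find a link by its visible text ('直播' is the "Live" link)
link = browser.find_element(By.LINK_TEXT, '直播')
print(link)

browser.quit()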
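Because `find_element` raises `NoSuchElementException` when nothing matches, optional elements (a link that only appears on some pages, for instance) are easier to handle with `find_elements`, which just returns an empty list. A small sketch; `find_first_or_none` is a hypothetical helper, and `driver` is assumed to be a Selenium WebDriver.

```python
def find_first_or_none(driver, by, value):
    """Return the first element matching (by, value), or None if there is none.

    Uses find_elements, which returns [] instead of raising when nothing matches.
    """
    matches = driver.find_elements(by, value)
    return matches[0] if matches else None
```

Usage: `link = find_first_or_none(browser, 'link text', '直播')`, then `if link is not None: link.click()`.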

6. Access element information

Get an element attribute

            .get_attribute('class')

Get element text

            .text

Get tag name

            .tag_name
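The three accessors above can be gathered into one record per element. A sketch, assuming `element` is a Selenium WebElement; `element_info` is a hypothetical helper. Note that `get_attribute` returns `None` for a missing attribute, and that for an `<input>` such as Baidu's search button `.text` is usually empty, with the visible label stored in the `value` attribute.

```python
def element_info(element, attributes=("class",)):
    """Collect tag name, visible text and selected attributes of one element."""
    return {
        "tag_name": element.tag_name,
        "text": element.text,  # visible text only; empty for most <input> elements
        "attributes": {name: element.get_attribute(name) for name in attributes},
    }
```

Usage: `element_info(browser.find_element(By.ID, 'su'), attributes=('class', 'value'))`.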


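from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

path = 'chromedriver.exe'
browser = webdriver.Chrome(service=Service(path))

url = 'http://www.baidu.com'
browser.get(url)

button = browser.find_element(By.ID, 'su')

# Get an attribute of the tag
print(button.get_attribute('class'))
# Get the tag name
print(button.tag_name)

# Get the element's text ('新闻' is the "News" link)
a = browser.find_element(By.LINK_TEXT, '新闻')
print(a.text)

browser.quit()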

7. Interaction

Click: element.click()

Input: element.send_keys()

Back: browser.back()

Forward: browser.forward()

Simulate scrolling with JS:

js = 'document.documentElement.scrollTop=100000'

browser.execute_script(js)   # execute the JS code

Get the page source: browser.page_source

Quit: browser.quit()
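Setting scrollTop once jumps to the bottom a single time. For pages that keep loading content as you scroll (infinite scroll), a common pattern is to scroll, pause, and compare `document.body.scrollHeight` until it stops growing. A minimal sketch: `scroll_until_stable` is a hypothetical helper, `driver` is assumed to be a Selenium WebDriver, and the fixed pause is a simplification.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scroll rounds performed (at most max_rounds).
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # let the page's JavaScript load more content
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was loaded, so we reached the real bottom
        last_height = new_height
    return max_rounds
```

`max_rounds` keeps the loop from running forever on a feed that never ends.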

Example: automatically search Baidu for Jay Chou, go to the next page of results, then navigate back and forward.



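import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Create the browser object
path = 'chromedriver.exe'
browser = webdriver.Chrome(service=Service(path))

# Open Baidu
url = 'https://www.baidu.com'
browser.get(url)

time.sleep(2)

# Get the search box
search_box = browser.find_element(By.ID, 'kw')

# Type '周杰伦' (Jay Chou) into the search box
search_box.send_keys('周杰伦')

time.sleep(2)

# Get the "百度一下" (search) button
button = browser.find_element(By.ID, 'su')

# Click the button
button.click()

time.sleep(2)

# Scroll to the bottom of the page
js_bottom = 'document.documentElement.scrollTop=100000'
browser.execute_script(js_bottom)

time.sleep(2)

# Get the "next page" link
next_link = browser.find_element(By.XPATH, '//a[@class="n"]')

# Go to the next page
next_link.click()

time.sleep(2)

# Go back to the previous page
browser.back()

time.sleep(2)

# Go forward again
browser.forward()

time.sleep(3)

# Quit the browser
browser.quit()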
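The example turns only one page. To keep paging, the same steps can run in a loop that saves each page's `page_source`. A sketch under assumptions: `collect_pages` is a hypothetical helper and `driver` a Selenium WebDriver; the locator must match only the "next page" link (once both previous and next links are shown, a class-based XPath like the one above may match both, so a text-based one such as `'//a[contains(text(), "下一页")]'` is a hypothetical alternative); the fixed pause stands in for a proper wait.

```python
import time


def collect_pages(driver, next_by, next_value, max_pages=3, pause=2.0):
    """Save page_source, click the next-page link, and repeat.

    Stops after max_pages pages or when no next-page link is found.
    """
    pages = []
    while True:
        pages.append(driver.page_source)
        if len(pages) >= max_pages:
            break
        links = driver.find_elements(next_by, next_value)
        if not links:
            break  # last page: no next-page link
        links[0].click()
        time.sleep(pause)  # wait for the next page to load before reading it
    return pages
```

Usage: `pages = collect_pages(browser, 'xpath', '//a[contains(text(), "下一页")]', max_pages=5)`, then parse each string with XPath or bs4 as in the earlier notes.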