Article Directory

1. Driver installation
- 1.1 Quick start with PhantomJS
2. Chromedriver Quick Start
3. Positioning elements
4. Manipulate form elements & select tags
5. Manipulate non-select tags
6. Simulate login to Douban

1. Driver installation

Selenium introduction: Selenium is a web automation testing tool, originally developed for automated website testing. It drives a real browser directly, supports all mainstream browsers, and can be instructed to load pages, extract the data you need, and even take screenshots of the page. Selenium needs a browser driver to do this. chromedriver, for example, is the driver that controls the Chrome browser, and each browser has its own driver. The browsers and the download pages for their drivers are listed below:

Chrome: https://sites.google.com/a/chromium.org/chromedriver/downloads
Firefox: https://github.com/mozilla/geckodriver/releases
Edge: https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/
Safari: https://webkit.org/blog/6900/webdriver-support-in-safari-10/
Download chromedriver: search Baidu for the Taobao mirror (https://npm.taobao.org/).
Installation summary: https://www.jianshu.com/p/a383e8970135
Install Selenium with pip install selenium. It can also be installed from a mirror index:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple selenium
This uses the Tsinghua mirror; other mirrors work as well.
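After installing, it can help to confirm that the package is actually visible to the interpreter you will run the scripts with. The following is a minimal sketch using only the standard library; the helper name installed_version is made up for this illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("selenium")
    if version is None:
        print("selenium is not installed; run: pip install selenium")
    else:
        print(f"selenium {version} is installed")
```

The version matters for this article: the find_element_by_* helpers and PhantomJS support used below belong to the Selenium 3 line and were removed in Selenium 4.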

1.1 Quick start with PhantomJS

Headless browser: a complete browser kernel, including a JavaScript engine, a rendering engine, request handling, and so on, but with no interface displayed to the user.
We need to download the PhantomJS driver, unzip it, and put the driver exe file in the Python root folder, because that folder has already been added to the PATH environment variable.
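Whether the driver really is reachable through PATH can be checked before starting a browser. This is a small sketch with the standard library; find_driver is a hypothetical helper name, and the executable names phantomjs and chromedriver are assumed to be the ones shipped in the downloads.

```python
import shutil
from typing import Optional


def find_driver(executable: str) -> Optional[str]:
    """Return the full path of `executable` if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    # On Windows, shutil.which also tries extensions such as .exe via PATHEXT.
    for name in ("phantomjs", "chromedriver"):
        path = find_driver(name)
        print(f"{name}: {path if path else 'not found on PATH'}")
```

If a name is reported as not found, Selenium will most likely fail to start that driver as well.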
Example: open Baidu.

from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get('https://www.baidu.com/')

Result:

D:\Python38\python.exe D:/work/爬虫/Day10/my_code/Phantojs_getin.py
D:\Python38\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '

Process finished with exit code 0

The Baidu page was in fact opened; you just did not see it because PhantomJS has no display. It can take screenshots, though. Let's demonstrate.

from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get('https://www.baidu.com/')
driver.save_screenshot('baidu.png')

A new file, "baidu.png", now appears in my directory. Opening it shows a screenshot of the Baidu home page in its logged-out state.
Next, let's operate the input box. To do that we first have to locate it: right-click on the Baidu page, choose "Inspect", and use the element picker to find the input box. It can be located by its id="kw" attribute.
Code:

from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get('https://www.baidu.com/')
# Locate the input box first, then type into it
driver.find_element_by_id('kw').send_keys('python')
# Take a screenshot
driver.save_screenshot('baidu_1.png')

To run the search, we also need to click the search button, which we locate by its id="su" attribute. The code:

from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get('https://www.baidu.com/')
# Locate the input box first, then type into it
driver.find_element_by_id('kw').send_keys('python')
# Click the search button
button_tag = driver.find_element_by_id('su')
button_tag.click()
# Print the URL of the current page
print(driver.current_url)
# Take a screenshot
driver.save_screenshot('baidu_2.png')

Result:

https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&tn=baidu&wd=python&fenlei=256&rsv_pq=97a4e5570000d3f9&rsv_t=f318U3sVVCvcZeSx6xhJZiX2%2B5LU7NW2CMwM1gQfD08RPAjfrW3%2F6VUwDzI&rqlang=cn&rsv_enter=0&rsv_dl=ib&rsv_sug3=6&rsv_btype=i&inputT=136&rsv_sug4=136

This is the URL of the Baidu results page for the search we just ran.
That is all we need to know about PhantomJS.
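The result URL can be taken apart to see where the keyword ended up. The sketch below uses urllib.parse on a shortened copy of the URL printed above (several parameters are omitted for readability); it suggests that the wd parameter carries the search term.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the result URL printed above (some parameters omitted).
url = "https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&tn=baidu&wd=python&rqlang=cn"

parts = urlsplit(url)
params = parse_qs(parts.query)

print(parts.netloc)     # www.baidu.com
print(parts.path)       # /s
print(params["wd"][0])  # python
```

The same check works on driver.current_url, which is a plain string.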

2. Chromedriver Quick Start

Similarly, chromedriver can be downloaded from the Taobao mirror. After unzipping it, put the exe file in the Python root directory. Then we can call it from code. Let's try:

from selenium import webdriver
import time
driver = webdriver.Chrome() # Note the capital C in Chrome; a lowercase name raises an error.
# Open the Baidu page
driver.get('https://baidu.com/')
# Wait 15 seconds before the next step
time.sleep(15)
# Quit the browser
driver.quit()

After running, the Baidu page opens and closes automatically after 15 seconds. The quit() method exits the browser, that is, it closes all windows, while close() closes only the current window. driver.maximize_window() maximizes the window. To learn more methods, use Ctrl+Alt in the editor to open the webdriver source and browse it.
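Because quit() is what releases the browser, it is worth making sure it runs even when a step in between fails. One possible pattern is a small context manager; this is only a sketch, managed_driver and FakeDriver are names invented here, and FakeDriver stands in for a real browser so the example runs without one. In a real script the factory passed in would be webdriver.Chrome.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with `factory` and always call quit() at the end."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()  # runs even if the body of the with block raised


class FakeDriver:
    """Stand-in used only for this demo; a real run would pass webdriver.Chrome."""

    def __init__(self):
        self.visited = []
        self.closed = False

    def get(self, url):
        self.visited.append(url)

    def quit(self):
        self.closed = True


with managed_driver(FakeDriver) as driver:
    driver.get("https://baidu.com/")

print(driver.closed)  # True
```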

3. Positioning elements

As with PhantomJS, let's go straight to the code. The following code opens Baidu, waits 3 seconds, types python into the input box, waits 2 seconds, clicks search, and closes the browser after another 5 seconds.

from selenium import webdriver
import time
driver = webdriver.Chrome() # Note the capital C in Chrome; a lowercase name raises an error.
# Open the Baidu page
driver.get('https://baidu.com/')
time.sleep(3)
driver.find_element_by_id('kw').send_keys('python')
button_tag = driver.find_element_by_id('su')
time.sleep(2)
button_tag.click()
time.sleep(5)
driver.quit()
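The fixed time.sleep() calls above always wait the full duration, even when the page is ready sooner. A common alternative is to poll for a condition with a timeout. The sketch below shows the idea in plain Python; wait_until and ready are hypothetical names, and the condition is faked so the example runs without a browser. Selenium itself ships WebDriverWait in selenium.webdriver.support.ui for the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Demo with a fake condition that succeeds on the third call.
calls = {"n": 0}


def ready():
    calls["n"] += 1
    return "ready" if calls["n"] >= 3 else None


print(wait_until(ready, timeout=2.0, poll=0.01))  # ready
```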

3.1 Another positioning method

There is another way to locate elements. Look at the code:

from selenium import webdriver
from selenium.webdriver.common.by import By   # Look here: call driver.find_element() and pass a By constant;
# By offers many different locator strategies
import time
driver = webdriver.Chrome() # Note the capital C in Chrome; a lowercase name raises an error.
# Open the Baidu page
driver.get('https://baidu.com/')
time.sleep(3)
# driver.find_element_by_id('kw').send_keys('python')
driver.find_element(By.ID,'kw').send_keys('miantaoge') # Typing By. brings up many locator strategies to choose from;
# here we still locate by id: By.ID, then a comma, then the id value. The effect is the same as before.
button_tag = driver.find_element_by_id('su')
time.sleep(2)
button_tag.click()
time.sleep(5)
driver.quit()

3.2 Locate by class_name

This time we use the class attribute to locate the Baidu input box.
We locate it with class="s_ipt". The code:

from selenium import webdriver
from selenium.webdriver.common.by import By   # Look here: call driver.find_element() and pass a By constant;
# By offers many different locator strategies
import time
driver = webdriver.Chrome() # Note the capital C in Chrome; a lowercase name raises an error.
# Open the Baidu page
driver.get('https://baidu.com/')
time.sleep(3)
# driver.find_element_by_id('kw').send_keys('python')
# driver.find_element(By.ID,'kw').send_keys('miantaoge') # Typing By. brings up many locator strategies to choose from;
# here we still locate by id: By.ID, then a comma, then the id value. The effect is the same as before.
driver.find_element_by_class_name('s_ipt').send_keys('石家庄疫情')
# driver.find_element(By.CLASS_NAME,'s_ipt').send_keys('石家庄疫情')   This form works the same way
button_tag = driver.find_element_by_id('su')
time.sleep(2)
button_tag.click()
time.sleep(5)
driver.quit()

The effect is the same after running.

3.3 Locate by name

The repeated code is omitted here; only the key statements are shown. The two lines are equivalent alternatives:

driver.find_element_by_name('wd').send_keys('乌鲁木齐疫情')
driver.find_element(By.NAME,'wd').send_keys('北京疫情')

The effect is the same after running.

3.4 Positioning by tag_name

It is best not to locate elements this way. Many elements on a page share the same tag name, so a tag-name lookup can match a whole batch of elements and is not precise.
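The ambiguity is easy to see on a small piece of markup. The sketch below counts tags in a made-up form with the standard library's HTMLParser; SAMPLE_HTML is a hypothetical fragment written for this illustration, not Baidu's real source.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical page fragment, made up for this illustration.
SAMPLE_HTML = """
<form>
  <input type="hidden" name="ie" value="utf-8">
  <input type="hidden" name="tn" value="baidu">
  <input type="text" id="kw" name="wd" class="s_ipt">
  <input type="submit" id="su" value="Search">
</form>
"""


class TagCounter(HTMLParser):
    """Count start tags by name and count every id attribute seen."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        for name, value in attrs:
            if name == "id":
                self.ids[value] += 1


counter = TagCounter()
counter.feed(SAMPLE_HTML)
print(counter.tags["input"])  # 4 elements share the tag name
print(counter.ids["kw"])      # 1 element has id="kw"
```

Four elements match the tag name input, while the id kw identifies exactly one of them, which is why the id, name, or class lookups are usually the safer choice.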

3.5 Positioning via xpath

If none of the methods above can find the element, try locating it with an XPath expression:

driver.find_element_by_xpath('//input[@id="kw"]').send_keys('哈尔滨疫情')

Add this line to the rest of the code and run it to get the same effect. Locating by tag_name with 'input' is unreliable here, because the page has many tags named input and the first match may not be the search box. The plural form, find_elements_by_tag_name, returns a list of all matches, as printing its result shows.

3.6 Positioning via css selector

This requires some knowledge of CSS selector syntax.

driver.find_element_by_css_selector('.s_ipt').send_keys('长春疫情')

Here the selector .s_ipt matches by class name.
In addition, driver.find_elements_by_css_selector() returns every element that matches the selector.

4. Manipulate form elements & select tags

Let's do the following: type "无痕的雨" into the input box, click "search", clear the input box again, and finally close the window.

from selenium import webdriver
import time
driver = webdriver.Chrome()
# Type "无痕的雨" into the Baidu input box
driver.get('https://www.baidu.com/')
time.sleep(3)
driver.find_element_by_id('kw').send_keys('无痕的雨')
time.sleep(3)
# Click search
button_tag = driver.find_element_by_id('su')
button_tag.click()
time.sleep(3)
# Clear the input box
driver.find_element_by_id('kw').clear()
time.sleep(3)
driver.close()

The result is as described.
Now let's look at an example page: https://www.17sucai.com/pins/demo-show?id=5926.
This page contains drop-down menus. Our goal is to choose one of the menu options, for example "Japan" in the first "try me!" box. This calls for what we know about operating the select tag.
We right-click the box, choose Inspect, and open the developer tools. There we see a tag with class="nojs".
Expand this tag and a series of value attributes appears. These are the available options, and the one we want has the value JP. A select element cannot be operated by a plain click, because an option still has to be chosen after the click. For this, Selenium provides a class: from selenium.webdriver.support.ui import Select. Pass the located element to this class to create a Select object, then use that object to make the selection. Let's look at the code, starting with a naive attempt:

from selenium import webdriver
import time
driver = webdriver.Chrome()
# Open the example page
driver.get('https://www.17sucai.com/pins/demo-show?id=5926')
time.sleep(3)
# Locate the select tag
select_tag = driver.find_element_by_class_name('nojs')
# Operate the select
# 1 Select by value
select_tag.select_by_value('JP') # A wrong example first: written like this, it raises an error
'''
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".nojs"}
  (Session info: chrome=88.0.4324.104)
The error says the element was not found: the element "nojs" cannot be located.
'''

Looking further up in the page source, we find an iframe tag.
iframe is an HTML tag that embeds one document inside another. It creates an inline frame (sometimes called a floating frame) that contains a separate document.
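A small parsing experiment shows why a lookup on the outer page cannot see elements inside the frame. The sketch below parses a made-up outer document with the standard library; OUTER_HTML and its src value are hypothetical, and only the id "iframe" mirrors the id used in this article's code.

```python
from html.parser import HTMLParser

# Hypothetical outer page; the inner document lives at the iframe's src.
OUTER_HTML = """
<html><body>
  <h1>Demo page</h1>
  <iframe id="iframe" src="inner/index.html"></iframe>
</body></html>
"""


class IframeFinder(HTMLParser):
    """Collect the attributes of every iframe and note any select tag."""

    def __init__(self):
        super().__init__()
        self.iframes = []
        self.has_select = False

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            self.iframes.append(dict(attrs))
        elif tag == "select":
            self.has_select = True


finder = IframeFinder()
finder.feed(OUTER_HTML)
print(finder.iframes)     # [{'id': 'iframe', 'src': 'inner/index.html'}]
print(finder.has_select)  # False: the select is not part of the outer document
```

The outer document only holds a reference to the inner one, so any select tag inside the frame belongs to a different document.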
The tag has src='https://www.17sucai.com/preview/157524/2014-07-08/jQuery custom drop-down menu plug-in dropkick/index.html'. Opening that address shows the same page.
This means an identical page is nested inside this page. To reach the select tag, Selenium has to switch into that nested page, so we need to switch to the iframe:

driver.switch_to_frame()

This method is deprecated, but it still works in the Selenium version used here. Pass the located iframe element to it as the argument.

from selenium import webdriver
import time
driver = webdriver.Chrome()
# Open the example page
driver.get('https://www.17sucai.com/pins/demo-show?id=5926')
time.sleep(3)
# Switch to the iframe
driver.switch_to_frame(driver.find_element_by_id('iframe')) # Pass the located iframe element as the argument
# Locate the select tag
select_tag = driver.find_element_by_class_name('nojs')
# Operate the select
# 1 Select by value
select_tag.select_by_value('JP') # This still raises an error
'''
AttributeError: 'WebElement' object has no attribute 'select_by_value'
The error says the 'WebElement' object has no 'select_by_value' method.
'''

This is because we have not used the Select class mentioned above. Let's import it now. Read the comments in the code carefully:

from selenium import webdriver
import time
from selenium.webdriver.support.ui import Select  # Import the Select class
driver = webdriver.Chrome()
# Open the example page
driver.get('https://www.17sucai.com/pins/demo-show?id=5926')
time.sleep(3)
# Switch to the iframe
driver.switch_to_frame(driver.find_element_by_id('iframe')) # Pass the located iframe element as the argument
# Locate the select tag
# select_tag = driver.find_element_by_class_name('nojs')
select_tag = Select(driver.find_element_by_class_name('nojs')) # Pass the located 'nojs' element to Select
# Operate the select
# 1 Select by value
# select_tag.select_by_value('JP')
select_tag.select_by_value('JP')  # Autocomplete now offers select_by_value(), but a deprecation warning still appears
'''
 DeprecationWarning: use driver.switch_to.frame instead
  driver.switch_to_frame(driver.find_element_by_id('iframe')) 
  The warning suggests using driver.switch_to.frame in place of driver.switch_to_frame
'''

This time, the Japan option is selected after running.
However, a deprecation warning still appears: DeprecationWarning: use driver.switch_to.frame instead
driver.switch_to_frame(driver.find_element_by_id('iframe')) (it suggests using driver.switch_to.frame in place of driver.switch_to_frame). Let's try that:

from selenium import webdriver
import time
from selenium.webdriver.support.ui import Select  # Import the Select class
driver = webdriver.Chrome()
# Open the example page
driver.get('https://www.17sucai.com/pins/demo-show?id=5926')
time.sleep(3)
# Switch to the iframe
# driver.switch_to_frame(driver.find_element_by_id('iframe')) # Pass the located iframe element as the argument
driver.switch_to.frame(driver.find_element_by_id('iframe')) # driver.switch_to_frame replaced with driver.switch_to.frame
# Locate the select tag
# select_tag = driver.find_element_by_class_name('nojs')
select_tag = Select(driver.find_element_by_class_name('nojs')) # Pass the located 'nojs' element to Select
# Operate the select
# 1 Select by value
# select_tag.select_by_value('JP')
select_tag.select_by_value('JP')  # No warning or error this time.

Above we selected the option by its value. A second way is to select it by its index:

from selenium import webdriver
import time
from selenium.webdriver.support.ui import Select  # Import the Select class
driver = webdriver.Chrome()
# Open the example page
driver.get('https://www.17sucai.com/pins/demo-show?id=5926')
time.sleep(3)
# Switch to the iframe
# driver.switch_to_frame(driver.find_element_by_id('iframe')) # Pass the located iframe element as the argument
driver.switch_to.frame(driver.find_element_by_id('iframe')) # driver.switch_to_frame replaced with driver.switch_to.frame
# Locate the select tag
# select_tag = driver.find_element_by_class_name('nojs')
select_tag = Select(driver.find_element_by_class_name('nojs')) # Pass the located 'nojs' element to Select
# Operate the select
# 1 Select by value
# select_tag.select_by_value('JP')
select_tag.select_by_value('JP')
# 2 Select by index
select_tag.select_by_index(4)

We get the result we wanted.
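The two lookups differ in what they match on: one compares the option's value attribute, the other uses its position, which is counted from zero. The sketch below mimics both on a plain list; the option list is invented for illustration (only JP at index 4 mirrors what the code above relies on), and pick_by_value and pick_by_index are hypothetical helpers, not Selenium functions.

```python
# Hypothetical option list as (value, label) pairs, invented for illustration;
# only the JP entry at index 4 mirrors what the code above relies on.
OPTIONS = [
    ("AA", "Option A"),
    ("BB", "Option B"),
    ("CC", "Option C"),
    ("DD", "Option D"),
    ("JP", "Japan"),
]


def pick_by_value(options, value):
    """Return the label of the first option whose value matches."""
    for option_value, label in options:
        if option_value == value:
            return label
    raise LookupError(f"no option with value {value!r}")


def pick_by_index(options, index):
    """Return the label at a zero-based position."""
    return options[index][1]


print(pick_by_value(OPTIONS, "JP"))  # Japan
print(pick_by_index(OPTIONS, 4))     # Japan (fifth entry, index 4)
```

Selecting by value is usually the more robust choice, since it keeps working if options are added or reordered.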

5. Manipulate non-select tags

Now open the second drop-down box on the example page by hand. It differs a little from the first one: it is not a select tag.
It is still inside the iframe tag, though, so we still need to switch to the iframe. Let's try it with Select first:

from selenium import webdriver
import time
from selenium.webdriver.support.ui import Select  # Import the Select class
driver = webdriver.Chrome()
# Open the example page
driver.get('https://www.17sucai.com/pins/demo-show?id=5926')
time.sleep(3)
# Switch to the iframe
driver.switch_to.frame(driver.find_element_by_id('iframe')) # driver.switch_to.frame replaces the deprecated driver.switch_to_frame
# Try to wrap the drop-down element in Select
select_tag = Select(driver.find_element_by_id('dk_container_country-nofake')) # Pass the located element to Select

select_tag.select_by_index(4)
'''
Message: Select only works on <select> elements, not on <div>
The error says Select only works on <select> elements.
'''

This time we drop Select and locate the elements the way we locate ordinary tags. Pay attention to the comments:

from selenium import webdriver
import time
from selenium.webdriver.support.ui import Select  # Import the Select class (not used in this version)
driver = webdriver.Chrome()
# Open the example page
driver.get('https://www.17sucai.com/pins/demo-show?id=5926')
time.sleep(3)
# Switch to the iframe
driver.switch_to.frame(driver.find_element_by_id('iframe')) # driver.switch_to.frame replaces the deprecated driver.switch_to_frame
# Locate the drop-down box
# select_tag = Select(driver.find_element_by_id('dk_container_country-nofake')) # The failed Select attempt
div_tag = driver.find_element_by_id('dk_container_country-nofake')  # Find the drop-down box first
div_tag.click() # This is not a select tag, so a click is required to open the list
time.sleep(3)
aim_tag = driver.find_element_by_xpath('//*[@id="dk_container_country-nofake"]/div/ul/li[5]/a') # The XPath can be copied by right-clicking the tag in the developer tools
time.sleep(3)
aim_tag.click() # A click is still needed to choose the option
time.sleep(3)
driver.close()

This time the page opens, Japan is selected, and the window closes automatically after 3 seconds.

6. Simulate login to Douban

Below we use a simulated Douban login to review what we have learned. Read the comments carefully:

from selenium import webdriver
import time
from selenium.webdriver.support.ui import Select  # Import the Select class (not used in this example)
driver = webdriver.Chrome()
# Open the Douban page
driver.get('https://www.douban.com/')
time.sleep(1)
# Switch to the account/password login view
driver.find_element_by_xpath('/html/body/div[1]/div[1]/ul[1]/li[2]').click()
time.sleep(1)
# Switch to the iframe
login_tag = driver.find_element_by_xpath('//*[@id="anony-reg-new"]/div/div[1]/iframe')
driver.switch_to.frame(login_tag) # driver.switch_to.frame replaces the deprecated driver.switch_to_frame

# Switch to account/password login; this XPath applies inside the iframe document
driver.find_element_by_xpath('/html/body/div[1]/div[1]/ul[1]/li[2]').click()
time.sleep(1)
# Enter the account name; this XPath applies inside the iframe document
account_tag = driver.find_element_by_xpath('//*[@id="username"]')
account_tag.send_keys('###########')
time.sleep(1)
# Enter the password; this XPath applies inside the iframe document
code_tag = driver.find_element_by_xpath('//*[@id="password"]')
code_tag.send_keys('#########')
time.sleep(2)
# Locate the button, then click it (click() returns None, so the two steps are kept separate)
button_tag = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[1]/div[5]/a')
button_tag.click()
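The account and password are masked with # characters in the listing. One way to avoid writing real credentials into a script at all is to read them from environment variables. This is only a sketch: load_credentials is a hypothetical helper, and DOUBAN_USER and DOUBAN_PASSWORD are variable names made up for this example.

```python
import os


def load_credentials(env=os.environ):
    """Read the login name and password from environment variables.

    DOUBAN_USER and DOUBAN_PASSWORD are hypothetical variable names.
    """
    try:
        return env["DOUBAN_USER"], env["DOUBAN_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None


if __name__ == "__main__":
    # Demo with an in-memory mapping instead of the real environment.
    user, password = load_credentials({"DOUBAN_USER": "demo", "DOUBAN_PASSWORD": "secret"})
    print(user)  # demo
```

The returned values can then be passed to send_keys() in place of the masked strings.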

That concludes this lesson.

Crawler (10) Simulated login Douban case on selenium