Using Selenium

Preface:
Selenium is a browser automation tool originally built for testing; put simply, it simulates the actions a real user performs in a browser. It is very useful for crawlers, because many sites load their data from API endpoints that are encrypted, and many pages are rendered dynamically by JavaScript. Driving a real browser with Selenium sidesteps both obstacles, and it also lets you read the rendered source of the current page directly.
Readers who want to understand how Selenium works internally can refer to the article "The realization principle of Selenium", which explains it thoroughly.
Selenium documentation: https://selenium-python-zh.readthedocs.io/en/latest/

1. Preparations

This article uses Chrome as an example to explain how to use Selenium. Before starting, the environment and the driver must be set up; the steps below show how.

Make sure that Chrome is installed and that ChromeDriver is configured. You also need the Python selenium library (run pip install selenium in a terminal).

1.1 Environment installation

Google Chrome download: http://chorm.sdswrj.cn/browser.html
Install the selenium library for Python: run pip install selenium in a terminal

1.2 Install the driver (see the note first)

Official website: http://chromedriver.storage.googleapis.com/index.html

1.3 Driver placement
After downloading, unzip the archive; it contains the driver's exe file.
Copy the exe file into your Python installation directory.
The location differs from machine to machine, so use the directory of your own installation.

Notice:

The driver version must correspond to the browser version, otherwise the browser will fail to start.
A small difference is tolerated, but the driver version should be as close as possible to the browser version.
Because Selenium relies on the driver to open the browser, it helps to stop the browser from updating itself: open cmd, run services.msc to open the Windows Services console, and disable the browser's automatic update service.
If automatic updates cannot be turned off, download the matching driver version promptly each time Chrome updates. Builds of Chrome with updates disabled can also be found online.
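
One way to read the version rule above is "the major version numbers must agree". The sketch below checks that for two version strings. The helper names and the sample version strings are made up for illustration, and treating a major-version match as sufficient is an assumption rather than an official rule. Recent Selenium releases (4.6 and later) also ship a Selenium Manager component that tries to fetch a matching driver for you, which may make the manual check unnecessary.

```python
def major_version(version: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version.strip().split(".")[0])


def versions_match(browser_version: str, driver_version: str) -> bool:
    """True when browser and driver share the same major version."""
    return major_version(browser_version) == major_version(driver_version)


# Hypothetical version strings, for illustration only.
print(versions_match("114.0.5735.199", "114.0.5735.90"))  # True
print(versions_match("115.0.5790.102", "114.0.5735.90"))  # False
```

The browser version can be read from chrome://version, and the driver version from running chromedriver --version in a terminal.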

2. Declare the browser object

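Selenium supports a variety of browsers, such as Chrome, Firefox, Edge, and Safari, and older releases also supported mobile browsers such as Android and BlackBerry. The headless browser PhantomJS was supported as well, but that support was dropped in Selenium 4, where the headless mode of Chrome or Firefox is used instead.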

2.1 We can initialize as follows:

from selenium import webdriver

# Keep only the line for the browser you actually use;
# each uncommented call would launch a separate browser.
browser = webdriver.Chrome()       # Chrome
# browser = webdriver.Firefox()    # Firefox
# browser = webdriver.Edge()       # Edge
# browser = webdriver.Safari()     # Safari
# browser = webdriver.PhantomJS()  # PhantomJS (Selenium 3 and earlier only)

This creates the browser object and assigns it to the variable browser. From here on, we call methods on browser to simulate browser operations.
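
Since only one of the initialization lines should run, the choice can also be made by name. The following is a minimal sketch of that idea; create_browser and SUPPORTED are hypothetical names, not part of Selenium, and the function assumes the matching driver is already installed.

```python
SUPPORTED = ("chrome", "firefox", "edge", "safari")


def create_browser(name: str = "chrome"):
    """Start exactly one browser, chosen by name (hypothetical helper)."""
    key = name.strip().lower()
    if key not in SUPPORTED:
        raise ValueError(f"unsupported browser: {name!r}")
    # Imported lazily so that an invalid name fails before Selenium is loaded.
    from selenium import webdriver

    factories = {
        "chrome": webdriver.Chrome,
        "firefox": webdriver.Firefox,
        "edge": webdriver.Edge,
        "safari": webdriver.Safari,
    }
    return factories[key]()


# Usage (opens a real browser window):
# browser = create_browser("chrome")
# browser.get("http://www.baidu.com/")
# browser.quit()
```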

3. Basic use

3.1. Load the specified page and close it

from selenium import webdriver
import time
from selenium.webdriver.common.by import By
# Open the specified browser (Chrome)
browser = webdriver.Chrome()
# Load the page
browser.get("http://www.baidu.com/")
# Old-style method, removed in recent Selenium 4 releases
# browser.find_element_by_id('kw').send_keys('python')
# Select the text box by its name attribute and type into it
browser.find_element(By.NAME,'wd').send_keys("selenium")
# Get the "百度一下" (search) button by its ID and click it
browser.find_element(By.ID,"su").click()
# Print the page source
print(browser.page_source.encode('utf-8'))
# Print the cookies
print(browser.get_cookies())
# Save a screenshot of the current page (prints True on success)
print(browser.get_screenshot_as_file('123.png'))
# Print the current URL
print(browser.current_url)
# Wait five seconds before the next step
time.sleep(5)
# Close the browser
browser.quit()

After running the code, a Chrome window pops up automatically. The browser first opens Baidu, types selenium into the search box, and then jumps to the search results page.
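
In a crawler, the cookies printed above are often reused outside the browser. browser.get_cookies() returns a list of dictionaries, each with at least a name and a value key. The sketch below flattens such a list; the two helper names and the sample cookies are invented for illustration.

```python
def cookies_to_dict(cookies):
    """Flatten Selenium's list of cookie dicts into {name: value}."""
    return {cookie["name"]: cookie["value"] for cookie in cookies}


def cookie_header(cookies):
    """Build the value of an HTTP Cookie header from the same list."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


# Made-up cookies in the shape returned by browser.get_cookies().
sample = [
    {"name": "BAIDUID", "value": "abc123", "domain": ".baidu.com"},
    {"name": "token", "value": "xyz", "domain": ".baidu.com"},
]
print(cookies_to_dict(sample))  # {'BAIDUID': 'abc123', 'token': 'xyz'}
print(cookie_header(sample))    # BAIDUID=abc123; token=xyz
```

With a live session, the call would be cookies_to_dict(browser.get_cookies()).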

**Selenium 4 new features:** https://www.dilatoit.com/zh/2020/02/02/selenium-4-xintexingqianzhan.html

4. Initial configuration

from selenium import webdriver
options = webdriver.ChromeOptions()

# Disable image loading
prefs = {"profile.managed_default_content_settings.images": 2}
options.add_experimental_option("prefs", prefs)

# Headless mode: run in the background without a window
# options.add_argument("--headless")

# Set a custom user-agent
user_ag='MQQBrowser/26 Mozilla/5.0 (Linux; U; Android 2.3.7; zh-cn; MB200 Build/GRJ22;CyanogenMod-7) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1'
options.add_argument('user-agent=%s'% user_ag)

# Hide the "Chrome is being controlled by automated test software" banner
options.add_experimental_option('useAutomationExtension', False) # remove the developer warning
options.add_experimental_option('excludeSwitches', ['enable-automation'])

# Set a proxy
# options.add_argument("--proxy-server=http://58.20.184.187:9091")

# Create the browser with these options
# (Selenium 4 expects options=; the older chrome_options= keyword is rejected by recent releases)
browser = webdriver.Chrome(options=options)

# Maximize the browser window
browser.maximize_window()
# Set the window width and height (this overrides the maximize call above)
browser.set_window_size(480, 800)

# Open a new window via JavaScript
browser.execute_script('window.open("http://httpbin.org/ip");')
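
When the same settings are switched on and off between runs, it can help to collect the command-line switches in one place. This is a sketch under that assumption: build_chrome_arguments and create_configured_chrome are hypothetical helpers, and the proxy address in the usage comment is a placeholder. Only the three switches shown in the configuration above are covered.

```python
def build_chrome_arguments(headless=False, user_agent=None, proxy=None):
    """Return the list of Chrome switches for the requested settings."""
    arguments = []
    if headless:
        arguments.append("--headless")
    if user_agent:
        arguments.append(f"--user-agent={user_agent}")
    if proxy:
        arguments.append(f"--proxy-server={proxy}")
    return arguments


def create_configured_chrome(**settings):
    """Start Chrome with the switches produced by build_chrome_arguments."""
    # Imported lazily so the pure helper above works without Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in build_chrome_arguments(**settings):
        options.add_argument(argument)
    return webdriver.Chrome(options=options)


# Usage (opens a real browser; the proxy address is a placeholder):
# browser = create_configured_chrome(headless=True, proxy="http://127.0.0.1:8080")
```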

5. Find Nodes

Selenium can drive the browser to complete various operations, such as filling in forms and simulating clicks. To type text into an input box or to grab data, we first have to find the node in question, and Selenium provides a set of methods for that. Once we have the node, we can perform actions on it or read information from it.

Selenium provides two families of methods:

find_element(): locates a single page element.
find_elements(): locates a group of page elements and returns them as a list.
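
The two families also differ when nothing matches: find_element() raises NoSuchElementException, while find_elements() returns an empty list. A small helper built on the second behaviour avoids a try/except when an element is optional. The name find_or_none is made up for this sketch.

```python
def find_or_none(browser, by, value):
    """Return the first matching element, or None when nothing matches."""
    matches = browser.find_elements(by, value)
    return matches[0] if matches else None


# Usage with a live browser:
# from selenium.webdriver.common.by import By
# box = find_or_none(browser, By.ID, "kw")
# if box is not None:
#     box.send_keys("selenium")
```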

5.1 Single node

Let's implement it with code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys  # keyboard key constants
from selenium.webdriver.common.by import By
# Start the browser and open the page
browser = webdriver.Chrome()
browser.get("http://www.baidu.com")
# Select the text box by its name attribute and type into it
s = browser.find_element(By.NAME,'wd')
s.send_keys('衣服')
s.send_keys(Keys.ENTER)   # press Enter to submit

Demos of the different locator strategies:

browser.get("http://www.baidu.com")
# Locate by ID
input_text = browser.find_element(By.ID, "kw")
input_text.send_keys("selenium")
# Locate by CSS selector
s = browser.find_element(By.CSS_SELECTOR,'input.s_ipt')
s.send_keys('衣服')
# Locate by XPath
s = browser.find_element(By.XPATH,'//input[@id="kw"]')
s.send_keys('衣服')

5.2 Multiple nodes

To find all nodes that meet a condition, use find_elements(). Note the extra s after element in the method name; it is easy to confuse the two.

This can be done as follows:

from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get('https://www.icswb.com/channel-list-channel-161.html')
# Collect every li under the news list container
lis = browser.find_elements(By.CSS_SELECTOR,'#NewsListContainer li')
print(lis)

The result is a list, and each item in the list is a WebElement.
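
Printing the list only shows WebElement objects, so the next step is usually to read something from each one. The sketch below collects the visible text through the text attribute of each element; element_texts is a hypothetical helper name.

```python
def element_texts(elements):
    """Return the non-empty, stripped text of each WebElement in a list."""
    texts = []
    for element in elements:
        text = element.text.strip()
        if text:
            texts.append(text)
    return texts


# Usage with the list obtained above:
# for title in element_texts(lis):
#     print(title)
```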

6. Node interaction

Selenium can also make the browser act on the nodes it has found. The most common methods are send_keys for typing text, clear for clearing text, and click for clicking a button. For example:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
browser = webdriver.Chrome()
browser.get('https://www.baidu.com')
# Named input_box to avoid shadowing Python's built-in input()
input_box = browser.find_element(By.ID,'kw')
input_box.send_keys('iPhone')
time.sleep(1)
input_box.clear()
input_box.send_keys('iPad')
button = browser.find_element(By.ID,'su')
button.click()

The methods above cover the most common node interactions. For more operations, see the description of interactive actions in the official documentation: http://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.remote.webelement
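
Since send_keys appends to whatever the input already contains, clearing first and then typing is a frequent pair of calls. It can be wrapped in a small function; replace_text is a made-up name for this sketch.

```python
def replace_text(element, text):
    """Clear an input element, then type `text` into it."""
    element.clear()
    element.send_keys(text)


# Usage with a live browser:
# from selenium.webdriver.common.by import By
# replace_text(browser.find_element(By.ID, "kw"), "iPad")
```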

7. Switching into an iframe (a common reason why a node cannot be selected)

A web page can contain iframe nodes. An iframe is a sub-frame, in effect a page embedded in the page, with the same kind of structure as the outer page. After Selenium opens a page it works in the parent frame by default, and from there it cannot find nodes that live inside a child frame. To reach them, switch frames with the switch_to.frame() method. For example:

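from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get('https://www.douban.com/')
# Locate the login iframe and switch into it before touching its elements
login_iframe = browser.find_element(By.XPATH,'//div[@class="login"]/iframe')
browser.switch_to.frame(login_iframe)
browser.find_element(By.CLASS_NAME,'account-tab-account').click()
browser.find_element(By.ID,'username').send_keys('123123123')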

**Note:** Elements inside an iframe can only be located after you have switched into that iframe.
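
The reverse also holds: after switching in, elements of the outer page are out of reach until you switch back with browser.switch_to.default_content(). A context manager can make the switch back automatic. The sketch below is one possible way to write it; inside_frame is a hypothetical name.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(browser, frame):
    """Switch into `frame`, then always return to the top-level page."""
    browser.switch_to.frame(frame)
    try:
        yield browser
    finally:
        # Runs even if the body raises, so later lookups start from the top page.
        browser.switch_to.default_content()


# Usage with a live browser:
# with inside_frame(browser, login_iframe):
#     browser.find_element(By.ID, "username").send_keys("123123123")
# # Back in the outer page here.
```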

8. Action chain

The examples so far act on one particular node: for an input box we call its methods for typing and clearing text, and for a button we call its click method. Other operations, such as dragging with the mouse or pressing keys, are not tied to a single node in the same way. These are performed through a different mechanism, the action chain.

For example, dragging a node from one place to another can be implemented like this:

from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
url = 'http://www.runoob.com/try/try.php?filename=jqueryui-api-droppable'
browser.get(url)
# The demo lives inside an iframe, so switch into it first
log = browser.find_element(By.XPATH, '//div[@id="iframewrapper"]/iframe')
browser.switch_to.frame(log)
source = browser.find_element(By.CSS_SELECTOR,'#draggable')
target = browser.find_element(By.CSS_SELECTOR,'#droppable')
actions = ActionChains(browser)
# Queue the drag, then execute everything queued so far
actions.drag_and_drop(source, target)
actions.perform()

The drag_and_drop() method takes two arguments: the element to drag (the starting point) and the element to drop it on (the end point).

The code first opens a drag-and-drop demo page, then selects the node to drag and the target node in turn. It creates an ActionChains object and assigns it to the actions variable, calls drag_and_drop() on it to queue the drag, and finally calls perform() to execute the action, which completes the drag.
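
Action chain methods return the chain itself, so several steps can be queued in one expression before perform() runs them. As a sketch, the same drag can be spelled out as press, move, and release, using the ActionChains methods click_and_hold, move_to_element, and release. The function name drag_step_by_step is made up, and whether a given page reacts to the explicit steps in the same way as to drag_and_drop depends on the page.

```python
def drag_step_by_step(actions, source, target):
    """Queue press, move and release on an ActionChains object, then run them."""
    (
        actions.click_and_hold(source)   # press the mouse button on the source
        .move_to_element(target)         # move to the middle of the target
        .release()                       # let go at the current position
        .perform()                       # execute the queued steps in order
    )


# Usage with a live browser:
# from selenium.webdriver import ActionChains
# drag_step_by_step(ActionChains(browser), source, target)
```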

9. Page scrolling

Address: https://36kr.com/

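# Scroll to position 10000 (far enough to reach the bottom of many pages)
browser.execute_script("document.documentElement.scrollTop = 10000")
# Scroll back to the top
browser.execute_script("document.documentElement.scrollTop = 0")

# Scroll to the very bottom of the page
browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")

# Scroll relative to the current position
browser.execute_script("window.scrollBy(0, 700)")
# Combined with the scrollBy above, the page has moved 700 + 800 = 1500 pixels in total
browser.execute_script("window.scrollBy(0, 800)")

# Scroll to an absolute position: vertical coordinate 1600
browser.execute_script("window.scrollTo(0, 1600)")
# scrollTo is absolute, so this moves to vertical coordinate 1200 regardless of the call above
browser.execute_script("window.scrollTo(0, 1200)")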
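
On pages that load more content as you scroll, one jump to the bottom is usually not enough. A common pattern is to keep scrolling until the page height stops growing. The sketch below implements that pattern with execute_script only; the function name, the one-second pause, and the limit of 20 rounds are arbitrary choices for illustration.

```python
import time


def scroll_until_stable(browser, pause=1.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls performed.
    """
    last_height = browser.execute_script("return document.body.scrollHeight")
    for round_number in range(1, max_rounds + 1):
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = browser.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_number
        last_height = new_height
    return max_rounds


# Usage with a live browser:
# browser.get("https://36kr.com/")
# scroll_until_stable(browser, pause=2.0)
```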

Summary:

Personally, I find Selenium a very interesting tool. For example, when the Chaoxing (Superstar) learning platform does not let you play a course at double speed or drag the progress bar, you can turn to Selenium: it simulates a real person watching the classes while you sleep soundly at night, which is pretty cool. Still, readers should mind how they use it: learn it, but don't do bad things with it!
