A must-read for beginners: a detailed tutorial on the Selenium library for Python crawlers

When we crawl web pages, we often find that the data we want cannot be obtained simply by parsing the HTML source. Such data is loaded asynchronously through AJAX or rendered onto the page by JavaScript.

Selenium is an automated testing tool that supports multiple browsers. In a crawler, we can use it to drive a real browser through the page, which solves the problem of JavaScript-rendered content.

1. Usage examples

2. Detailed introduction

2.1 Declare the browser object

That is, tell the program which browser it should drive.

2.2 Access page
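
A page is opened with `get()`, after which the driver exposes the rendered source, the title, and the current URL. A small sketch, where `fetch_rendered_page` is an illustrative helper and `driver` is any WebDriver instance such as `webdriver.Chrome()`:

```python
def fetch_rendered_page(driver, url):
    """Navigate to url and collect basic facts about the loaded page."""
    driver.get(url)  # returns once the main document has loaded
    return {
        "url": driver.current_url,   # may differ from url after redirects
        "title": driver.title,
        "html": driver.page_source,  # the DOM as rendered by the browser
    }
```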

2.3 Find elements

After a page has loaded, we usually need to operate on it, for example finding the search box, typing a keyword, and pressing Enter. To do that, we first have to locate the element with Selenium.

2.3.1 Single element

Selenium offers two ways to find an element. The first is to call a method whose name says which lookup strategy to use, for example by CSS selector or by XPath.

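The lookup methods are listed below. Note that these find_element_by_* helpers were deprecated in Selenium 4 and removed in Selenium 4.3, so they only work with older releases:

find_element_by_id
find_element_by_name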
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector

The second way is to call find_element() directly. Its first argument is the lookup strategy to use (for example By.ID or By.CSS_SELECTOR), and its second argument is the value to search for.

2.3.2 Multiple elements

Finding multiple elements works almost the same way as finding a single one: just add an s to the method name, so find_element() becomes find_elements(). The result is a list of elements, which is empty when nothing matches.

2.4 Element interaction

To interact with an element, first locate it and then call an interaction method on it. For example, to enter text in the search box:

2.5 Interactive actions

Interactive actions are queued on an action chain and then executed one after another. This requires the ActionChains class.

2.6 Execute JavaScript

For example, you can drag the scroll bar down to the bottom of the page by running a script in the browser.
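
JavaScript is run in the current page with `execute_script()`. As a sketch, the helper below (an illustrative name) scrolls to the bottom of the page and then reads back the vertical scroll offset.

```python
def scroll_to_bottom(driver):
    """Scroll to the bottom of the page and return the new scroll offset."""
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # A script that uses 'return' hands its value back to Python.
    return driver.execute_script("return window.pageYOffset;")
```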

2.7 Get element information

After you have obtained the element through element search, you may also need to obtain the attributes and text of this element.

2.7.1 Get attributes

2.8 Frame

While the driver is focused on the parent frame, it cannot find elements inside a child frame, so you need to switch to the child frame and search again. In the same way, elements of the parent frame cannot be found from inside the child frame.

2.9 Waiting

A page may keep loading content through AJAX after it has been requested. Selenium returns control once the main document has loaded and does not wait for those asynchronous requests to finish. Therefore, you need to wait until the content you want has actually appeared before proceeding.

2.9.1 Implicit wait

With an implicit wait, if WebDriver cannot find the requested element right away, it keeps retrying the lookup until the configured timeout expires. If the element still has not been found by then, a NoSuchElementException is raised. The default timeout is 0.

An implicit wait is global: it applies to every element lookup rather than to one particular element.

Because an implicit wait stays in effect for the whole lifetime of the driver, it only needs to be set once.

2.9.2 Explicit wait

An explicit wait combines a wait condition with a maximum wait time.

The condition is checked first. If it already holds, the wait returns immediately. Otherwise the condition is checked again repeatedly until it holds or the maximum wait time has elapsed. If the condition is still not met when the time runs out, a TimeoutException is raised.

An explicit wait applies only to the specific element or condition it was given.

2.10 Browser forward/backward

back() returns to the previous page, and forward() moves on to the next page.
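
A small sketch that visits two pages and then moves through the history. `visit_then_go_back` is an illustrative name and the URLs are supplied by the caller.

```python
def visit_then_go_back(driver, first_url, second_url):
    """Visit two pages, go back, then forward; return the URL after each step."""
    driver.get(first_url)
    driver.get(second_url)
    driver.back()     # like clicking the browser's Back button
    after_back = driver.current_url
    driver.forward()  # like clicking the browser's Forward button
    after_forward = driver.current_url
    return after_back, after_forward
```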

2.11 Operating Cookies
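
Cookies can be added, read, and deleted through the driver. The sketch below uses placeholder cookie values and an illustrative helper name. On a real browser, `add_cookie()` only works after a page from the cookie's domain has been loaded.

```python
def cookie_round_trip(driver, name="example_name", value="example_value"):
    """Add a cookie, read it back, then clear all cookies."""
    driver.add_cookie({"name": name, "value": value})
    stored = driver.get_cookie(name)  # dict describing the cookie, or None
    driver.delete_all_cookies()
    return stored, driver.get_cookies()  # second item is the remaining cookies
```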

2.12 Tab management

Tab management means working with the browser's tabs. Sometimes we need to open a new tab or close an existing one, and Selenium can do both.
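
One way to do this, sketched below, is to open a blank tab with JavaScript, find its handle in `window_handles`, and switch to it. The helper names are illustrative.

```python
def open_in_new_tab(driver, url):
    """Open url in a new tab and return (original_handle, new_handle)."""
    original = driver.current_window_handle
    before = set(driver.window_handles)
    driver.execute_script("window.open('');")  # opens a blank tab
    new_handle = next(h for h in driver.window_handles if h not in before)
    driver.switch_to.window(new_handle)        # focus the new tab
    driver.get(url)
    return original, new_handle


def close_current_tab(driver, return_to):
    """Close the focused tab and switch back to the given handle."""
    driver.close()
    driver.switch_to.window(return_to)
```

Note that opening a tab does not move the driver's focus by itself, which is why the explicit `switch_to.window()` call is needed.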

Origin blog.csdn.net/qiqi1220/article/details/128669555