Skill Tree: Web Crawler: Selenium

Foreword

Hello everyone, I am Kongkong star. In this article I will walk through the "Skill Tree: Web Crawler: Selenium" (《技能树-网络爬虫-selenium》) exercises.

1. Selenium

Selenium is a set of web automation testing tools, and crawlers can use it to collect dynamic resources on pages. Which of the following statements about it is wrong?

A. In essence, Selenium drives a browser to send requests, simulating the browser's behavior
B. Content that appears only after the page executes JavaScript can be collected with the help of Selenium
C. After a request, you often have to wait a while for resources to load and rendering to finish
D. Selenium, like requests, can be used to collect data, and at the same speed

Analysis:
A is right: Selenium is an automated testing tool that drives a browser from a programming language (such as Python or Java) and simulates manual operations to visit web pages and obtain data.
B is right: Selenium drives a real browser, so the page's JavaScript runs before the content is read; there is also an execute_script() method for running JavaScript directly.
C is right: you can control the waiting time with explicit waits, implicit waits, or forced waits (time.sleep()).
D is wrong: unlike requests, Selenium does not send HTTP requests directly. It accesses a page by simulating a user's operations in the browser, so it is slower than requests.
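
The difference between a forced wait and a condition-based (explicit) wait mentioned in C can be sketched without a browser. The helper below is a hypothetical, stdlib-only illustration: the name wait_until and its parameters are assumptions, not part of Selenium. It re-checks a condition until the condition returns a truthy value or a timeout expires, whereas time.sleep() always pauses for the full duration.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    Hypothetical helper for illustration; not a Selenium API.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Simulate a resource that becomes available on the third check.
    checks = {"count": 0}

    def resource_ready():
        checks["count"] += 1
        return checks["count"] >= 3

    wait_until(resource_ready, timeout=2.0, poll_interval=0.05)
    print("ready after", checks["count"], "checks")
```

In Selenium itself, the same pattern is available as WebDriverWait(driver, timeout).until(...), usually combined with the expected_conditions helpers, so a hand-written loop like this is rarely needed in real tests.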

2. Selenium test cases

Selenium is a set of web automation testing tools that crawlers can use to collect dynamic resources on pages. Follow these steps in order:

  1. Install the Python Selenium package: pip install selenium
  2. Install the Chrome driver: https://npm.taobao.org/mirrors/chromedriver/ (if you use another browser, download the driver for that browser instead)
  3. Write the test with Python's unittest, using Selenium to carry out the automation

The automated web page test performs the following operations:

  1. Use Selenium's Chrome driver to open the CSDN homepage; a Chrome test window opens
  2. Verify that the page title contains the string "CSDN"
  3. Find the search box on the page
  4. Enter "OpenCV技能树" ("OpenCV Skill Tree")
  5. Press Enter to run the search
  6. Wait 10 seconds, then exit

The code framework is as follows:

# -*- coding: UTF-8 -*-
import unittest
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

class PythonOrgSearch(unittest.TestCase):

    def setUp(self):
        self.driver = webdriver.Chrome()

    def test_search_in_python_org(self):
        # TODO(You): implement the browser automation test requirements correctly
        time.sleep(10)

    def tearDown(self):
        self.driver.close()

if __name__ == "__main__":
    unittest.main()

Which of the following implementations is correct?
A.

def test_search_in_python_org(self):
    driver = self.driver
    driver.get("https://www.csdn.net/")
    self.assertIn("CSDN", driver.title)
    elem = driver.find_element_by_id("toolbar-search-input")
    elem.send_keys(Keys.RETURN)
    assert "No results found." not in driver.page_source
    time.sleep(10)

B.

def test_search_in_python_org(self):
    driver = self.driver
    driver.get("https://www.csdn.net/")
    self.assertIn("CSDN", driver.title)
    elem = driver.find_element_by_id("toolbar-search-input")
    elem.send_keys("OpenCV技能树")
    elem.send_keys(Keys.RETURN)
    assert "No results found." not in driver.page_source
    time.sleep(10)

C.

def test_search_in_python_org(self):
    driver = self.driver
    driver.get("https://www.csdn.net/")
    self.assertIn("CSDN", driver.title)
    elem = driver.find_element_by_id("toolbar-search-input")
    elem.send_keys("OpenCV技能树")
    assert "No results found." not in driver.page_source
    time.sleep(10)

D.

def test_search_in_python_org(self):
    driver = self.driver
    driver.get("https://www.csdn.net/")
    self.assertIn("CSDN", driver.title)
    elem = driver.find_element_by_name("toolbar-search-input")
    elem.send_keys("OpenCV 技能树")
    elem.send_keys(Keys.RETURN)
    assert "No results found." not in driver.page_source
    time.sleep(10)

Analysis:
A is wrong: "OpenCV技能树" is never typed into the input box.
B is right.
C is wrong: Enter is never pressed, so the search is not submitted.
D is wrong: the locator is wrong. The screenshot of the input box shows id="toolbar-search-input", so that value is an id, not a name.
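
Options A to D call find_element_by_id and find_element_by_name, which were removed in Selenium 4.3. A minimal sketch of option B in the newer find_element(By.ID, ...) style follows. The function name search_csdn and the mock-based check are assumptions for illustration, and the fallback constants exist only so the sketch runs where selenium is not installed.

```python
from unittest import mock

try:
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    BY_ID, RETURN_KEY = By.ID, Keys.RETURN
except ImportError:
    # Fallback so the sketch runs without selenium installed; these are
    # the string values that By.ID and Keys.RETURN stand for.
    BY_ID, RETURN_KEY = "id", "\ue006"


def search_csdn(driver, query="OpenCV技能树"):
    """Type `query` into the CSDN search box and submit it with Enter."""
    driver.get("https://www.csdn.net/")
    assert "CSDN" in driver.title
    # Selenium 4 style: one find_element() call plus a locator strategy.
    elem = driver.find_element(BY_ID, "toolbar-search-input")
    elem.send_keys(query)       # step 4: enter the search text
    elem.send_keys(RETURN_KEY)  # step 5: press Enter to search
    return elem


if __name__ == "__main__":
    # Browser-free check with a mock driver: shows the order of key input.
    fake_driver = mock.MagicMock()
    fake_driver.title = "CSDN"
    elem = search_csdn(fake_driver)
    print([call.args[0] for call in elem.send_keys.call_args_list])
```

Inside the unittest framework above, the body of test_search_in_python_org could call search_csdn(self.driver) with a real webdriver.Chrome() instance and then keep the original page_source assertion and time.sleep(10).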

Summary

Origin blog.csdn.net/weixin_38093452/article/details/131352699