Python Web Crawler Study Notes, Part 3: Getting Started with Selenium

Disclaimer: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/bowei026/article/details/90724538

There are two ways to crawl dynamic web content. One is to use the browser's developer tools to find the interface that serves the dynamic content, analyze its parameters and return values, and then fetch the data from that interface directly. The other is to fetch the data by simulating a browser: Python's Selenium library lets code drive a real browser to retrieve the data.
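
As a rough illustration of the first approach, the sketch below builds a request URL and parses a JSON response. The endpoint https://example.com/api/list, the page and size parameters, and the items/title fields are all hypothetical; the real names have to be read from the developer tools for the site in question.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, standing in for whatever the developer tools reveal.
API_BASE = "https://example.com/api/list"


def build_url(page, size=20):
    """Build the URL the page itself would request for one page of data."""
    return f"{API_BASE}?{urlencode({'page': page, 'size': size})}"


def parse_titles(body):
    """Extract the 'title' field of each item from an assumed JSON payload."""
    payload = json.loads(body)
    return [item["title"] for item in payload.get("items", [])]


# A canned response keeps the sketch runnable without network access;
# a real crawler would download build_url(...) and pass the body along.
sample_body = '{"items": [{"title": "first"}, {"title": "second"}]}'
print(build_url(1))
print(parse_titles(sample_body))
```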

I. Overview

Running Selenium requires three things: the Python selenium library, a driver (WebDriver), and the browser that the driver corresponds to.

Install the selenium library

pip install selenium
Project page: https://pypi.org/project/selenium/
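
One way to confirm that the installation worked is to ask Python whether the package can be found. This is a minimal sketch; the helper name is_installed is made up for illustration.

```python
import importlib.util


def is_installed(package):
    """Return True if the named top-level package can be located for import."""
    return importlib.util.find_spec(package) is not None


print("selenium installed:", is_installed("selenium"))
```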

Download WebDriver

A WebDriver can be thought of, loosely, as a browser plug-in; in practice it is a standalone executable. Each browser has its own WebDriver: Firefox's is geckodriver (geckodriver.exe on Windows), and Chrome's is ChromeDriver (chromedriver.exe on Windows).

After downloading the WebDriver, unzip it and copy the exe file into the Python directory (any directory listed in the PATH environment variable will do).
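
To check that the driver really is reachable through PATH, a small sketch using the standard library's shutil.which may help. The helper name locate_drivers is an assumption for illustration.

```python
import shutil

# Executable names as used in this article; on Windows, shutil.which
# also tries the extensions listed in PATHEXT, such as .exe.
DRIVER_NAMES = ("geckodriver", "chromedriver")


def locate_drivers(names=DRIVER_NAMES):
    """Map each driver name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in locate_drivers().items():
    print(f"{name}: {path or 'not found on PATH'}")
```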

Firefox WebDriver download:
https://github.com/mozilla/geckodriver/

Chrome WebDriver download (pick the WebDriver that matches your browser version; if the chromedriver.exe version does not match the Chrome version, the Python Selenium program will fail to run):
http://chromedriver.storage.googleapis.com/index.html
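
The version check can be sketched as a comparison of major version numbers. This assumes that agreeing on the major version is enough, which is a simplification, and the version strings below are illustrative samples in the shape the browser and the driver print for --version.

```python
import re


def major_version(version_text):
    """Return the leading number of the first dotted version in the text."""
    match = re.search(r"(\d+)(?:\.\d+)+", version_text)
    if match is None:
        raise ValueError(f"no version number in {version_text!r}")
    return int(match.group(1))


def versions_match(chrome_text, driver_text):
    """Compare only the major versions of the browser and the driver."""
    return major_version(chrome_text) == major_version(driver_text)


# Sample strings, not read from a real installation.
print(versions_match("Google Chrome 74.0.3729.169", "ChromeDriver 74.0.3729.6"))
print(versions_match("Google Chrome 75.0.3770.80", "ChromeDriver 74.0.3729.6"))
```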

II. Examples

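Example 1:
from selenium import webdriver
from selenium.webdriver.common.by import By

# Start Chrome (chromedriver must be on the PATH) and open the Baidu home page
browser = webdriver.Chrome()
browser.get('http://www.baidu.com')
assert '百度一下' in browser.title

# Locate the search box and type the query
#elem = browser.find_element(By.NAME, "wd")
elem = browser.find_element(By.XPATH, '//*[@id="kw"]')
elem.send_keys("selenium")

# Click the search button
btn = browser.find_element(By.ID, "su")
btn.click()

# Uncomment to close the browser when the script finishes
#browser.quit()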
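
If the browser should always be closed, even when an assertion fails halfway through, the quit call can be wrapped in a context manager. The sketch below is one possible way to do that; the helper quitting and the stand-in class FakeBrowser are made up here so the example runs without a real browser.

```python
from contextlib import contextmanager


@contextmanager
def quitting(browser):
    """Yield the browser and always call its quit() method afterwards."""
    try:
        yield browser
    finally:
        browser.quit()


class FakeBrowser:
    """Stand-in for a WebDriver object, used only for this demonstration."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


fake = FakeBrowser()
try:
    with quitting(fake) as browser:
        assert "expected" in "actual title"  # simulate a failing check
except AssertionError:
    pass
print(fake.closed)
```

With a real driver the same helper would be used as: with quitting(webdriver.Chrome()) as browser: followed by the steps of Example 1.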

 

Example 2:
import unittest

from selenium import webdriver
from selenium.webdriver.common.by import By


class BaiduTest(unittest.TestCase):

    def setUp(self):
        # Each test gets a fresh Firefox session (geckodriver must be on the PATH)
        self.browser = webdriver.Firefox()
        self.browser.get("http://www.baidu.com")
        # Uncomment to close the browser after each test
        #self.addCleanup(self.browser.quit)

    def testTitle(self):
        self.assertIn("百度一下", self.browser.title)

    def testSearch(self):
        #self.browser.get("http://www.baidu.com")
        # Type the query into the search box
        searchInput = self.browser.find_element(By.ID, "kw")
        searchInput.send_keys("selenium")

        # Click the search button
        searchBtn = self.browser.find_element(By.ID, "su")
        searchBtn.click()

        self.assertIn("selenium", self.browser.current_url)


if __name__ == '__main__':
    unittest.main(verbosity=2)
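
The final assertion in testSearch reads current_url right after the click, so it may run before the results page has loaded. Selenium ships its own explicit-wait helper (WebDriverWait) for this; the sketch below is not that API but a library-independent illustration of the same polling idea, with a simulated URL sequence standing in for the browser.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Simulated sequence of URLs: the address changes only on the third poll.
urls = iter([
    "http://www.baidu.com/",
    "http://www.baidu.com/",
    "http://www.baidu.com/s?wd=selenium",
])


def url_has_query():
    return "selenium" in next(urls)


print(wait_until(url_has_query, timeout=2.0, interval=0.01))
```

In the test itself the condition would be something like lambda: "selenium" in self.browser.current_url, called before the assertIn check.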

Other resources:
https://www.seleniumhq.org/download/
http://ftp.mozilla.org/pub/firefox/releases/ (Firefox releases)
https://www.cnblogs.com/givemelove/p/8482361.html (Firefox and Google Chrome software and their WebDrivers)

This concludes the article.
