Python third-party libraries: the selenium library

Introduction

Selenium is an automated testing tool that supports many browsers. In short, Selenium can drive a real browser, so it can load pages whose content is rendered dynamically.
After installing the selenium library, you also need to install the driver for the browser you want to control.

Basic operation of the selenium webdriver

After locating an element, you still need to perform an operation on it. Some commonly used methods are described below.

Common methods

| Method | Description | Example |
| --- | --- | --- |
| get(url) | Visit a URL | driver.get('http://www.baidu.com') |
| back() | Go back one step | driver.back() returns to the previous page, like the browser's ← button |
| forward() | Go forward one step | driver.forward() is the opposite of back(), like the browser's → button |
| quit() | Exit the driver and close all windows | driver.quit() |
| close() | Close the current window | driver.close() |
| maximize_window() | Maximize the browser window | driver.maximize_window() |
| refresh() | Reload the current page | driver.refresh() |
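
As a rough sketch of how the navigation methods above fit together, the helper below walks the browser history and records the URL after each step. The name visit_and_return is hypothetical (it is not part of Selenium), and the second URL in the usage comment is only a placeholder.

```python
def visit_and_return(driver, first_url, second_url):
    """Open two pages in turn, then walk the history with back/forward.

    `driver` is assumed to be a Selenium WebDriver, e.g. webdriver.Chrome().
    Returns the URL the browser reports after each step.
    """
    visited = []
    driver.get(first_url)                 # load the first page
    visited.append(driver.current_url)
    driver.get(second_url)                # load the second page
    visited.append(driver.current_url)
    driver.back()                         # return to the first page
    visited.append(driver.current_url)
    driver.forward()                      # go to the second page again
    visited.append(driver.current_url)
    driver.refresh()                      # reload; the URL stays the same
    visited.append(driver.current_url)
    return visited


# Usage with a real browser (needs selenium and a matching browser driver):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   try:
#       print(visit_and_return(driver, 'http://www.baidu.com', 'http://www.lagou.com/'))
#   finally:
#       driver.quit()
```

Wrapping the calls in try/finally, as in the usage comment, makes sure quit() runs and the browser closes even if a page fails to load.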

Element operations

| Method | Description | Example |
| --- | --- | --- |
| send_keys() | Type text into an input box | driver.find_element('tag name','input').send_keys('123') types 123 into the input box |
| clear() | Clear the entered text | driver.find_element('tag name','input').clear() clears the content of the input box |
| click() | Click the element | driver.find_element('tag name','input').click() |
| send_keys(Keys.ENTER) | Press the Enter key. WebElement has no enter() method; Keys is imported from selenium.webdriver.common.keys | driver.find_element('tag name','input').send_keys(Keys.ENTER) |
| text | Text content of an element (a property, not a method) | driver.find_element('id','name').text returns the text of the element whose id is name |
| page_source | HTML of the current page (a property of the driver) | driver.page_source returns the HTML of the page |
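
The element operations can be combined roughly as follows. This is a minimal sketch: fill_and_submit and element_text are hypothetical helpers, the element ids are supplied by the caller, and the locator strategies are passed as the plain strings 'id' and 'css selector' (the values of By.ID and By.CSS_SELECTOR in selenium.webdriver.common.by).

```python
def fill_and_submit(driver, box_id, button_id, text):
    """Type `text` into an input box, click a button, return the page HTML.

    `driver` is assumed to be a Selenium WebDriver; `box_id` and
    `button_id` are the id attributes of the input box and the button.
    """
    box = driver.find_element('id', box_id)
    box.clear()                 # remove any text already in the box
    box.send_keys(text)         # type the new text
    driver.find_element('id', button_id).click()
    return driver.page_source   # a property, so no parentheses


def element_text(driver, css_selector):
    """Return the text of the first element matching a CSS selector."""
    return driver.find_element('css selector', css_selector).text


# Usage with a real browser (the ids are the ones used in the Lagou case):
#   html = fill_and_submit(driver, 'search_input', 'search_button', 'python')
```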

Cookie operations

| Method | Description | Example |
| --- | --- | --- |
| get_cookies() | Get all cookies for the current page | driver.get_cookies() returns a list of cookie dicts |
| add_cookie() | Add a cookie, passed as a dict with name and value keys | driver.add_cookie({'name':'time','value':'1612354154.7383971'}) adds a cookie named time with the value 1612354154.7383971 |
| delete_cookie() | Delete one cookie by name | driver.delete_cookie('time') deletes the cookie named time |
| delete_all_cookies() | Delete all cookies | driver.delete_all_cookies() |
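
A short sketch of the cookie calls working together is given below. The helper names replace_cookie and cookies_as_dict are hypothetical; the driver is assumed to have already loaded a page on the domain the cookie belongs to, since a cookie is added for the current domain.

```python
def replace_cookie(driver, name, value):
    """Set cookie `name` to `value`, dropping any older cookie of that name.

    `driver` is assumed to be a Selenium WebDriver with a page loaded.
    """
    driver.delete_cookie(name)                          # remove the old one
    driver.add_cookie({'name': name, 'value': value})   # takes a single dict
    return driver.get_cookie(name)                      # dict, or None if missing


def cookies_as_dict(driver):
    """Flatten get_cookies() (a list of dicts) into {name: value}."""
    return {c['name']: c['value'] for c in driver.get_cookies()}


# Usage with a real browser:
#   driver.get('http://www.baidu.com')
#   replace_cookie(driver, 'time', '1612354154.7383971')
#   print(cookies_as_dict(driver))
```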

Case study: scraping job listings from Lagou.com

from selenium import webdriver
import time
from bs4 import BeautifulSoup
import re
class Job:		# holds the fields of one job listing
    def __init__(self):
        self.name=None
        self.company=None
        self.condition=None
        self.salary=None
    def get(self):
        return (self.name,self.company,self.condition,self.salary)
class Lagou:
    def __init__(self):
        self.driver=webdriver.Chrome()
        self.driver.maximize_window()
        self.url='http://www.lagou.com/'
    def search(self,keyword):
        self.driver.get(self.url)
        time.sleep(3)
        self.driver.find_element('id','cboxClose').click()
        time.sleep(3)
        self.driver.find_element('id','search_input').send_keys(keyword)
        self.driver.find_element('id','search_button').click()
        time.sleep(2)
        page_source=self.driver.page_source
        self.driver.quit()
        return page_source
    def get_jobs(self,page_source):
        soup=BeautifulSoup(page_source,'html.parser')
        hot_item=soup.find_all('li',class_=re.compile('con_list_item'))
        for item in hot_item:
            myjob=Job()		# create a separate Job instance for each listing
            myjob.name=item.find('h3').get_text().strip()	# strip whitespace from both ends
            myjob.company=item.select_one('.company_name>a').get_text().strip()
            myjob.salary=item.select_one('.money').get_text().strip()
            myjob.condition=item.find('div',class_='industry').get_text().strip()
            print(myjob.get())
if __name__ == '__main__':
    hot=Lagou()
    time.sleep(5)
    page_source=hot.search('python')
    hot.get_jobs(page_source)
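
The fixed time.sleep() calls in the case above either wait too long or not long enough, depending on how fast the page loads. One possible alternative is to poll until the element is available. The wait_until helper below is a hand-rolled sketch using only the standard library, not a Selenium API; Selenium ships its own WebDriverWait class in selenium.webdriver.support.ui for the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Exceptions raised by `condition` (for example when an element has not
    been rendered yet) are treated as "not ready". Raises TimeoutError if
    nothing truthy is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            pass  # not ready yet; try again after a short pause
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within %.1f s' % timeout)
        time.sleep(interval)


# Possible use inside Lagou.search(), in place of time.sleep(3):
#   box = wait_until(lambda: self.driver.find_element('id', 'search_input'))
#   box.send_keys(keyword)
```

The helper returns as soon as the lookup succeeds, so a fast page costs only one polling interval, while a slow page still gets up to the full timeout.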

Run result

[Image: screenshot of the program output]


Origin blog.csdn.net/m0_54510474/article/details/121205214