Introduction
Selenium is a browser automation tool, originally built for automated testing, that supports a range of browsers. In short, Selenium can drive a real browser, so it can load pages whose content is rendered dynamically.
After installing the selenium library, you also need to install the driver for the corresponding browser.
Basic operations of Selenium WebDriver
After locating an element, you still need to perform an operation on it. Some commonly used methods are described below.
Common methods
Method | Description | Example |
---|---|---|
get(url) | Open a URL | driver.get('http://www.baidu.com') |
back() | Go back one page | driver.back() returns to the previous page, like the browser's ← button |
forward() | Go forward one page | driver.forward() is the opposite of back(), like the browser's → button |
quit() | Quit the driver and close all windows | driver.quit() |
close() | Close the current window | driver.close() |
maximize_window() | Maximize the browser window | driver.maximize_window() |
refresh() | Reload the current page | driver.refresh() |
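
The navigation calls in the table can be strung together into one short session. The sketch below is only an illustration: `visit_and_return` and `run_in_chrome` are hypothetical helpers, not part of Selenium, and the first one accepts any object that exposes the WebDriver methods listed above. `run_in_chrome` assumes Selenium and a Chrome driver are installed.

```python
def visit_and_return(driver, first_url, second_url):
    """Open two pages, step back, step forward, and record the URL each time.

    `driver` is expected to behave like a Selenium WebDriver; this helper is
    hypothetical and only chains get/back/forward/refresh.
    """
    visited = []
    driver.get(first_url)                # load the first page
    visited.append(driver.current_url)
    driver.get(second_url)               # load the second page
    visited.append(driver.current_url)
    driver.back()                        # like the browser's back arrow
    visited.append(driver.current_url)
    driver.forward()                     # like the browser's forward arrow
    visited.append(driver.current_url)
    driver.refresh()                     # reload the current page
    return visited


def run_in_chrome():
    """Run the helper in a real browser (assumes selenium and a Chrome driver)."""
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.maximize_window()
        return visit_and_return(driver, "http://www.baidu.com", "http://www.lagou.com/")
    finally:
        driver.quit()                    # close every window and end the session
```

Calling `run_in_chrome()` should return four URLs. Wrapping the session in `try`/`finally` means `quit()` still runs if a page fails to load.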
Element manipulation
Method | Description | Example |
---|---|---|
send_keys() | Type text into an input element | driver.find_element('tag name','input').send_keys('123') types 123 into the input box |
clear() | Clear the entered text | driver.find_element('tag name','input').clear() clears the input box |
click() | Click the element | driver.find_element('tag name','input').click() |
send_keys(Keys.ENTER) | Press the Enter key on the element | driver.find_element('tag name','input').send_keys(Keys.ENTER), with Keys imported from selenium.webdriver.common.keys |
text | Get the text content of an element (a property, not a method) | driver.find_element('id','name').text returns the text of the element whose id is name |
page_source | Get the HTML of the current page (a property of the driver) | driver.page_source |
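
A typical interaction combines several of these element methods. The following is a minimal sketch rather than a definitive recipe: `search_and_read` is a hypothetical helper, and the default element ids (in particular `result`) are placeholders to replace with the ids used by the page you are automating. The string `"id"` is the value behind Selenium's `By.ID` locator.

```python
def search_and_read(driver, query, box_id="search_input",
                    button_id="search_button", result_id="result"):
    """Type `query` into a text box, click a button, and return some result text.

    The default ids are placeholders (assumptions), not ids of any real page.
    """
    box = driver.find_element("id", box_id)        # locate the input box by id
    box.clear()                                    # drop any pre-filled text
    box.send_keys(query)                           # type the query
    driver.find_element("id", button_id).click()   # click the search button
    # .text is a property of the element, so there are no parentheses
    return driver.find_element("id", result_id).text
```

With a real driver, the call looks like `search_and_read(driver, 'python')` once the page has loaded.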
Cookie manipulation
Method | Description | Example |
---|---|---|
get_cookies() | Get all cookies visible to the current page | driver.get_cookies() |
add_cookie() | Add a cookie, passed as a dict | driver.add_cookie({'name': 'time', 'value': '1612354154.7383971'}) adds a cookie named time with the value 1612354154.7383971 |
delete_cookie() | Delete one cookie by name | driver.delete_cookie('time') deletes the cookie named time |
delete_all_cookies() | Delete all cookies | driver.delete_all_cookies() |
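
The cookie methods can be exercised together as a round trip: add a cookie, read it back, then delete it. The sketch below assumes a driver object with the methods in the table plus `get_cookie(name)`, which returns a single cookie as a dict. `roundtrip_cookie` is a hypothetical helper written for this illustration.

```python
def roundtrip_cookie(driver, name, value):
    """Add one cookie, read it back, delete it, and report what was seen.

    Hypothetical helper: with a real WebDriver, a page on the cookie's domain
    must already be loaded with driver.get() before add_cookie() is called.
    """
    driver.add_cookie({"name": name, "value": value})   # add_cookie takes a dict
    stored = driver.get_cookie(name)                    # dict, or None if absent
    driver.delete_cookie(name)                          # remove just this cookie
    remaining = [c["name"] for c in driver.get_cookies()]
    return stored["value"], remaining
```

The second item of the returned tuple lists the names of the cookies still present, which makes it easy to confirm that only the named cookie was removed.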
Case: scraping job listings from Lagou.com
from selenium import webdriver
import time
from bs4 import BeautifulSoup
import re


class Job:  # holds the fields of one job posting
    def __init__(self):
        self.name = None
        self.company = None
        self.condition = None
        self.salary = None

    def get(self):
        return (self.name, self.company, self.condition, self.salary)


class Lagou:
    def __init__(self):
        self.driver = webdriver.Chrome()
        self.driver.maximize_window()
        self.url = 'http://www.lagou.com/'

    def search(self, keyword):
        self.driver.get(self.url)
        time.sleep(3)
        self.driver.find_element('id', 'cboxClose').click()  # close the pop-up overlay
        time.sleep(3)
        self.driver.find_element('id', 'search_input').send_keys(keyword)
        self.driver.find_element('id', 'search_button').click()
        time.sleep(2)  # give the result page time to render
        page_source = self.driver.page_source
        self.driver.quit()
        return page_source

    def get_jobs(self, page_source):
        soup = BeautifulSoup(page_source, 'html.parser')
        hot_item = soup.find_all('li', class_=re.compile('con_list_item'))
        for item in hot_item:
            myjob = Job()  # create a Job for each listing and store its fields
            myjob.name = item.find('h3').get_text().strip()  # strip surrounding whitespace
            myjob.company = item.select_one('.company_name>a').get_text().strip()
            myjob.salary = item.select_one('.money').get_text().strip()
            myjob.condition = item.find('div', class_='industry').get_text().strip()
            print(myjob.get())


if __name__ == '__main__':
    hot = Lagou()
    time.sleep(5)
    page_source = hot.search('python')
    hot.get_jobs(page_source)
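
The case above pauses with fixed `time.sleep` calls, which either waste time or fail when the page loads slowly. Selenium ships its own explicit-wait support (`WebDriverWait`) for this purpose. The sketch below is not that API but a standard-library polling loop that illustrates the same idea; `wait_for` is a hypothetical helper.

```python
import time


def wait_for(condition, timeout=10.0, poll=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)                 # short pause before checking again
```

With a real driver, one possible use is `wait_for(lambda: driver.find_elements('id', 'search_input'))` in place of a fixed sleep. `find_elements` returns an empty list while the element is missing, so the loop keeps polling until it appears.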
Run result