Python + Selenium practice (1): extracting all email addresses from a page

Exercise scenario: a page contains certain fields that we want to extract for further processing, but those fields may be scattered across different parts of the page. For example, here we want to extract every email address on a Baidu page.

 

Breaking the problem down:

1. First, get the source of the current page, that is, what you see when you open a page, right-click, and choose to view the page source.

2. Find the pattern the fields share, match them with a regular expression, and store the matches in a dictionary or list.

3. Loop over the dictionary or list and print its contents, using Python's for statement.

 

Methods used in the implementation:

1. To get the page source, Selenium provides the driver.page_source property;

2. To use regular expressions in Python, import the re module;

3. for email in emails:

   print(email)
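Before involving a browser, the three steps can be tried on a plain string. This is only a minimal sketch: SAMPLE_SOURCE is a made-up stand-in for what driver.page_source would return, and extract_emails is a hypothetical helper name, not something Selenium provides.

```python
import re

# Hypothetical stand-in for driver.page_source, so the sketch runs without a browser.
SAMPLE_SOURCE = """
<html><body>
  <p>Business: <a href="mailto:sales@example.com">sales@example.com</a></p>
  <p>Press: press@example.org</p>
</body></html>
"""

# Same pattern as the one used in this post.
EMAIL_PATTERN = re.compile(r'[\w]+@[\w\.-]+')


def extract_emails(source):
    """Return every substring of `source` that matches the email pattern."""
    return EMAIL_PATTERN.findall(source)


emails = extract_emails(SAMPLE_SOURCE)  # step 2: match and collect into a list
for email in emails:                    # step 3: loop over the list and print
    print(email)
```

Note that sales@example.com is printed twice here, once for the mailto: link target and once for the link text, because the pattern is run over the raw source rather than the visible text.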

First, the basic code:

from selenium import webdriver
import re

driver = webdriver.Chrome()
driver.maximize_window()
driver.implicitly_wait(6)

driver.get("http://home.baidu.com/contact.html")
# get the page source
doc = driver.page_source
emails = re.findall(r'[\w]+@[\w\.-]+', doc)
# regular expression: finds strings of the form xxx@xxx.xxx
for email in emails:  # loop over the matches and print each address
    print(email)

 

Explanation:

In Python's regular expression syntax, an r prefix on a string literal marks it as a raw string, so backslashes reach the re module unchanged. \w matches a letter, a digit, or an underscore. The re module's findall method returns a list of all matching substrings.
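To see how this pattern behaves at the edges, here is a small sketch run on a made-up sentence (the text and addresses are assumptions for illustration only). It shows two things worth knowing: a full stop right after an address is captured too, because "." is in the second character class, and a dotted local part such as first.last is cut at the dot, because the first class only allows \w.

```python
import re

PATTERN = r'[\w]+@[\w\.-]+'  # the pattern used in this post

# Hypothetical sample text, not taken from any real page.
text = "Write to ir@example.com. Or to first.last@example.org, thanks."

matches = re.findall(PATTERN, text)
print(matches)  # ['ir@example.com.', 'last@example.org']

# One simple way to trim trailing punctuation picked up by the second class.
cleaned = [m.rstrip('.-') for m in matches]
print(cleaned)  # ['ir@example.com', 'last@example.org']

# A raw string keeps the backslash literal: both spellings are two characters.
print(len(r'\w'), len('\\w'))  # 2 2
```

If addresses with dots or hyphens before the @ matter for your page, the first character class would need widening; the pattern above is kept as the post uses it.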

Result of the run (screenshot not preserved):

 

 

Second, the advanced version

from selenium import webdriver
from selenium.webdriver.common.by import By
import re
import time
import xlwt

# workbook and sheet that will hold the extracted addresses
wb = xlwt.Workbook()
ws = wb.add_sheet('E-mails')

driver = webdriver.Chrome()
driver.maximize_window()
time.sleep(2)
driver.get('https://www.baidu.com/')
# find_element(By.XPATH, ...) is used because Selenium 4.3+ removed find_element_by_xpath
driver.find_element(By.XPATH, "//*[@class='lh']/a[text()='关于百度']").click()
print(driver.current_window_handle)
handles = driver.window_handles
for handle in handles:
    if handle != driver.current_window_handle:
        driver.close()  # close the first window
        print("switching to the new page", handle)
        driver.switch_to.window(handle)  # switch to the newly opened window
driver.find_element(By.XPATH, "//*[@id='indexAdmin']/div[1]/div/div/div/div[2]/ul/li[4]/a").click()
# full path: /html/body/div[1]/div/div/div/div[2]/ul/li[4]/a
print(driver.current_window_handle)
handles = driver.window_handles
print(handles)
for handle in handles:
    if handle != driver.current_window_handle:
        driver.close()  # close the first window
        print('switching to the new tab', handle)
        driver.switch_to.window(handle)  # switch to the second window
# get the page source
doc = driver.page_source
emails = re.findall(r'[\w]+@[\w\.-]+', doc)
for index, email in enumerate(emails):
    ws.write(index, 0, email)  # one address per row, in column 0
    print(email)
wb.save('Baidu Contact email.xls')
print('extraction is complete')
driver.close()
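The close-and-switch loop appears twice in the script above, so it could be pulled into a helper. The following is only a sketch under assumptions: other_handles and switch_to_new_window are hypothetical names, and the helper assumes exactly one other window is open at the moment it is called, which is the situation in this exercise. It reads the current handle once, before closing anything, so the comparison never runs against a closed window.

```python
def other_handles(current, handles):
    """Return the handles in `handles` that differ from `current`, in order."""
    return [h for h in handles if h != current]


def switch_to_new_window(driver):
    """Close the current window and switch to the other open one.

    Assumes `driver` is a Selenium WebDriver and that exactly one other
    window is open; with more windows, the first other handle is used.
    """
    current = driver.current_window_handle  # read once, before closing
    candidates = other_handles(current, driver.window_handles)
    if not candidates:
        raise RuntimeError("no other window to switch to")
    driver.close()                          # closes the window `current` refers to
    driver.switch_to.window(candidates[0])
    return candidates[0]
```

With this in place, each of the two loops in the script would shrink to a single call such as switch_to_new_window(driver) right after the click().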

  

Result of the run (screenshot not preserved):

 

 

Reference article: https://blog.csdn.net/u011541946/article/details/68485981

Origin www.cnblogs.com/zhaocbbb/p/12609332.html