Selenium and Python crawler (5) [data processing (target 4): Baidu Translate example]

How Selenium's page source differs from requests

This was already covered in the first post of this series, but I want to reiterate the difference here.


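from selenium import webdriver

drive = webdriver.Chrome()  # launches a real Chrome window
drive.get('https://www.baidu.com/')
# page_source is the DOM as currently rendered, including content inserted by JavaScript
print(drive.page_source)
drive.quit()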

page_source is the page source as Selenium sees it. The difference from what the requests library returns is that Selenium's source is dynamic: it reflects the page as currently rendered and keeps changing as the page updates. Selenium behaves much more like a browser, so it can get content that the page displays via Ajax, just as a browser would.
For example, if you crawl Lagou (拉勾网) with requests, the data you are after cannot be found in the returned source.

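Yet we can find it in the Elements panel of the browser's developer tools. This is because that content is loaded into the browser by a separate request, so Selenium, which drives a real browser, naturally gets it. requests, on the other hand, fetches only the one URL you give it, so you get only that one response. The relationship is shown in the figure: if you use requests to fetch only url_1 while the content is actually in url_2, the content will be missing.
[Figure: the page at url_1 loads its content from url_2]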
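To make the url_1/url_2 relationship concrete, here is a minimal, self-contained sketch. It uses only Python's standard library (http.server, plus urllib.request standing in for requests) to serve a hypothetical page whose list is filled in by an Ajax call to a hypothetical /api/jobs endpoint. Fetching url_1 alone returns the empty shell; fetching url_2 directly returns the data.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page: the HTML shell (url_1) fills its list via an Ajax call to /api/jobs (url_2).
PAGE = b"""<html><body><ul id="jobs"></ul>
<script>
fetch('/api/jobs').then(r => r.json()).then(jobs => {
  document.getElementById('jobs').innerHTML = jobs.map(j => '<li>' + j + '</li>').join('');
});
</script></body></html>"""
JOBS = ["Python crawler engineer", "Data analyst"]


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/api/jobs":
            body, ctype = json.dumps(JOBS).encode(), "application/json"
        else:
            body, ctype = PAGE, "text/html"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the console quiet
        pass


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"


def fetch(url):
    """Plain HTTP GET, like requests.get(url).text: no JavaScript is executed."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


url_1_html = fetch(base + "/")          # what a plain HTTP client sees
url_2_json = fetch(base + "/api/jobs")  # the Ajax endpoint that actually holds the data

print("Data analyst" in url_1_html)  # False: the list stays empty until JS runs
print("Data analyst" in url_2_json)  # True
server.shutdown()
server.server_close()
```

With a plain HTTP client you therefore usually have to find url_2 (for example in the browser's Network panel) and request it directly, whereas Selenium lets the browser run the JavaScript and hands you the finished page.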

Example

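from selenium import webdriver
import time

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as Ec

drive = webdriver.Chrome()
drive.get('https://fanyi.baidu.com/?aldtype=16047#zh/en')


def set_word():
    try:
        # Wait up to 10 s for the input box to be present in the DOM
        InputTag = WebDriverWait(drive, 10).until(
            Ec.presence_of_element_located((By.ID, 'baidu_translate_input'))
        )
        InputTag.clear()  # avoid typing the word twice if the first entry survived
        InputTag.send_keys('你好')
    except Exception as e:
        print(e)


set_word()  # Baidu Translate reloads the page once after opening, so enter the word twice
time.sleep(2)
set_word()

time.sleep(2)  # give the Ajax translation request time to return before reading the source
print(drive.page_source)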

The result now contains the translation ("Hello" is returned). Compare this with the source fetched earlier with requests, where the translation does not appear.
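In set_word, WebDriverWait(drive, 10).until(...) is an explicit wait: it keeps re-checking the condition until it returns something truthy or 10 seconds pass, and then raises Selenium's TimeoutException. The sketch below reimplements that polling idea with only the standard library so it runs without a browser. wait_until and fake_find_input are hypothetical names for illustration; this is not Selenium's own code.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    Illustrates the idea behind WebDriverWait(...).until(...), not its implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated page: the "input box" only appears on the third check,
# like an element inserted after the page finishes loading.
checks = {"n": 0}


def fake_find_input():
    checks["n"] += 1
    return "baidu_translate_input" if checks["n"] >= 3 else None


print(wait_until(fake_find_input, timeout=5, poll=0.01))  # baidu_translate_input
print(checks["n"])  # 3
```

Polling with a timeout like this is why the explicit wait is usually preferable to a fixed time.sleep: it returns as soon as the element exists and fails clearly if it never appears.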

Next, we load the source into lxml's etree and extract the translation result with XPath, or simply use re directly.
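A minimal sketch of the re route, assuming the translation sits in a <p> whose class list contains target-output. That markup is a hypothetical stand-in: this post does not show Baidu's actual HTML, which may change, so check the Elements panel and adjust the pattern. The comment inside shows an equivalent lxml XPath query for readers who prefer etree.

```python
import re

# Hypothetical fragment of drive.page_source; the real class names may differ.
page_source = """
<div class="output-wrap">
  <p class="ordinary-output target-output"><span>Hello</span></p>
</div>
"""


def extract_translation(html):
    """Return the text of the first <p> whose class list contains target-output, or None.

    With lxml installed, a rough XPath equivalent would be:
        etree.HTML(html).xpath('//p[contains(@class, "target-output")]//text()')
    """
    match = re.search(
        r'<p[^>]*class="[^"]*\btarget-output\b[^"]*"[^>]*>\s*(?:<span>)?([^<]+)',
        html,
    )
    return match.group(1).strip() if match else None


print(extract_translation(page_source))  # Hello
```

In the real script you would call extract_translation(drive.page_source) after the waits.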

Note

Due to limited time, I won't write out the remaining extraction code; it is fairly simple. That is basically all you need from Selenium here.


Source: blog.csdn.net/FUTEROX/article/details/108503604