The difference between Selenium and requests when getting web page source code
This was already covered in the first blog post, but what I want to reiterate here is the difference.
from selenium import webdriver

drive = webdriver.Chrome()
drive.get('https://www.baidu.com/')
print(drive.page_source)
page_source is the page source returned by Selenium, but how does it differ from what we get through the requests library? Selenium behaves more like a browser: the page it holds is updated dynamically and continuously, so it can directly get content loaded via Ajax, just as a browser does.
For example, when you crawl Lagou (拉勾网) with requests, you can't find the job data in the response, yet we can find it in the browser's Elements panel. That is because the data is loaded into the page by the browser, so Selenium gets it naturally; requests, on the other hand, only visits a single URL and only gets what that URL returns. The relationship is shown in the figure: requests accesses only url_1, while the content actually lives at url_2.
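To make the url_1 / url_2 split concrete, here is a minimal sketch. The HTML and JSON strings below are hypothetical stand-ins for what requests would receive from each URL; the real Lagou responses differ, but the pattern is the same.

```python
import json

# Hypothetical response from url_1: the page skeleton that requests gets.
# The job list is empty here; the browser fills it in later via Ajax.
html_from_url_1 = """
<html><body>
  <div id="job-list"><!-- filled in by JavaScript after an Ajax call --></div>
</body></html>
"""

# Hypothetical response from url_2 (the Ajax endpoint): the actual data, as JSON.
json_from_url_2 = '{"jobs": [{"title": "Python Developer", "city": "Beijing"}]}'

# The content we want is absent from the initial HTML...
print("Python Developer" in html_from_url_1)   # False

# ...but present in the Ajax response, which requests can also fetch directly
# if we locate the endpoint in the browser's network panel.
data = json.loads(json_from_url_2)
print(data["jobs"][0]["title"])                # Python Developer
```

This is why Selenium "just works" on such pages: it drives a real browser that performs the Ajax call for us, so page_source already contains the merged result.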
Example
from selenium import webdriver
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

drive = webdriver.Chrome()
drive.get('https://fanyi.baidu.com/?aldtype=16047#zh/en')

def set_word():
    try:
        # Wait up to 10 seconds for the input box to be present
        input_tag = WebDriverWait(drive, 10).until(
            EC.presence_of_element_located((By.ID, 'baidu_translate_input'))
        )
        input_tag.send_keys('你好')
    except Exception as e:
        print(e)

set_word()  # Baidu refreshes the page once, so the word must be entered twice
time.sleep(2)
set_word()
print(drive.page_source)
The current result (the translation "Hello" now appears in the page source), compared with the earlier attempt using requests.
Then we can load the page source into lxml's etree and use XPath, or use re directly, to extract the translation result.
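As a quick sketch of the re route: the snippet below runs the extraction on a hypothetical fragment of page_source, since the real class names and markup on fanyi.baidu.com may differ.

```python
import re

# Hypothetical fragment of drive.page_source after the translation appears;
# the actual element and class names on fanyi.baidu.com may be different.
page_source = '<p class="ordinary-output target-output"><span>Hello</span></p>'

# Non-greedy match: grab the text of the first <span> after the output class
match = re.search(r'target-output.*?<span>(.*?)</span>', page_source)
if match:
    print(match.group(1))  # Hello
```

The same extraction could be done with lxml's etree and an XPath such as a class-based selector; re is just the shorter option for a one-off grab.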
Closing note
Due to limited time, the remaining code will not be written out here; it is relatively simple. Well, that is basically all you need to know about Selenium.