Crawling Baidu search results with Python and Selenium

Selenium is a tool for testing web applications. Selenium tests run directly in the browser, performing the same operations a real user would.

(The complete code is at the end of this article.)

0. Install the selenium library

pip install selenium

1. Download the driver for your browser

Here I use Google Chrome.
Download the Chrome driver (ChromeDriver) that matches your Chrome version,
then extract it directly into your Python directory.

2. Import the libraries and set up a headless browser (optional)

A headless browser is a browser that runs without a visible window.

from time import sleep
from selenium import webdriver

# Headless browser settings
co = webdriver.ChromeOptions()
co.headless = True
browser = webdriver.Chrome(options=co)

When co.headless = False, the browser runs with a visible window.
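
On newer Selenium releases the headless attribute on the options object may no longer be available; passing Chrome's own --headless switch through add_argument() is a commonly used alternative. Below is a minimal sketch, assuming Selenium 4 and a Chrome driver that Selenium can locate. make_browser is a hypothetical helper name, not part of Selenium.

```python
def make_browser(headless=True):
    """Create a Chrome WebDriver, optionally without a visible window."""
    # Imported inside the function so this snippet can be loaded even on a
    # machine where selenium is not installed yet.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    if headless:
        # Chrome's own command-line switch for running without a window.
        options.add_argument('--headless')
    return webdriver.Chrome(options=options)
```

Calling make_browser(headless=False) would open a normal window, which is handy while debugging.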

3. Open the Baidu website

url = 'https://www.baidu.com'
browser.get(url)

Our code runs faster than Baidu's server responds, so a later line could run
before Baidu has had time to return the search results. To avoid that, we execute the following code:

browser.implicitly_wait(3)

If an element is not found right away, Selenium keeps re-checking the page (about every half second) until the element appears or the maximum wait of 3 seconds has passed.
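
The retry behaviour described above can be pictured as a simple polling loop. This is a plain-Python sketch of the idea only, not Selenium's actual implementation; poll_until and fake_lookup are hypothetical names, and the half-second default interval is taken from the description above.

```python
import time


def poll_until(find, timeout=3.0, interval=0.5):
    """Call find() repeatedly until it returns a non-None value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = find()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'nothing found within {timeout} seconds')
        time.sleep(interval)


# Stand-in for a page whose element only shows up on the third look.
attempts = []


def fake_lookup():
    attempts.append(1)
    return 'search box' if len(attempts) >= 3 else None


# A short interval keeps the demonstration fast.
found = poll_until(fake_lookup, timeout=3.0, interval=0.01)
```

The loop returns as soon as the lookup succeeds, so a generous timeout costs nothing when the page loads quickly.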

4. Use Chrome's F12 tools to find the ids of the search box and the search button

That is, find the places on the page where the program needs to type or click.

First press F12 to open the Developer Tools, select the Elements tab, and then click the arrow icon next to it.
Then hover the mouse over the input box. The Elements panel jumps to the tag of the input box and shows its id, which is unique within this page.

In the same way, hover the mouse over the search button to find its id in the Elements panel.
With these two ids we can continue writing the program.

browser.find_element_by_id('kw').send_keys('kali搭建钓鱼wifi csdn')  # input box
browser.find_element_by_id('su').click()  # click the search button

Here:
find_element_by_id() finds a page element by its id
send_keys() types a string into the element
click() clicks the element
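
Newer Selenium 4 releases no longer provide the find_element_by_* helpers; the same lookups are written as find_element(by, value). Below is a sketch under that assumption. submit_search is a hypothetical helper, and the locator string 'id' is the value behind Selenium's By.ID constant (with selenium installed you would normally import By from selenium.webdriver.common.by and pass By.ID instead).

```python
def submit_search(browser, query):
    """Type query into Baidu's search box and click the search button."""
    # 'kw' and 'su' are the ids found with the developer tools in this step.
    browser.find_element('id', 'kw').send_keys(query)  # input box
    browser.find_element('id', 'su').click()           # search button
```

Taking the browser as a parameter keeps the helper independent of how the driver was created.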

5. Find the ids of the elements on the results page

If you have followed along correctly so far, the page should now have jumped to the search results.
Then, as before, find the id of each result element.
We find that the result ids follow a pattern: ids 1 to 10 correspond to the 10 search results, and they all belong to the c-container class,
so we simply find all elements of class c-container.

elements = browser.find_elements_by_class_name('c-container')  # find all elements of class c-container

Note:
Here we use find_elements_by_class_name(), which has one more s than the find_element_by_id() used earlier. It returns all elements that match the condition (three elements here) as a list.
If we used find_element_by_class_name() (note: one less s), it would return only the first matching element.

6. Print the results with a for loop

Here we see that the title of each search result sits inside an a tag,
so we use element.find_element_by_tag_name() to find the a tag; its text is the search result.

for element in elements:
    span = element.find_element_by_tag_name('a')
    print(span.text)
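
The selection logic of steps 5 and 6 (collect every c-container element, then take the first a tag inside each) can be tried offline. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in for Selenium, and the markup in SAMPLE is made up for illustration; it is not Baidu's real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup that only mimics the structure described above.
SAMPLE = """
<div id="content_left">
  <div class="result c-container" id="1"><h3><a href="#">First title</a></h3><a href="#">more</a></div>
  <div class="result c-container" id="2"><h3><a href="#">Second title</a></h3></div>
  <div class="other">not a result</div>
</div>
"""

root = ET.fromstring(SAMPLE.strip())

# Like find_elements_...: every matching element, returned as a list.
containers = [el for el in root.iter('div')
              if 'c-container' in el.get('class', '').split()]

titles = []
for container in containers:
    # Like find_element_...: only the first matching descendant.
    link = container.find('.//a')
    if link is not None:
        titles.append(''.join(link.itertext()))

print(titles)  # ['First title', 'Second title']
```

Note that the second link in the first container ("more") is skipped, because only the first a tag of each container is read.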

7. Crawling finished; quit the browser

sleep(2)
browser.quit()
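
If an exception is raised before browser.quit() is reached, the browser process can keep running in the background, which is easy to miss in headless mode. One possible safeguard is a try/finally wrapper. This is a sketch; run_then_quit is a hypothetical helper, and task stands for any function that takes the browser.

```python
def run_then_quit(browser, task):
    """Run task(browser) and always quit the browser afterwards."""
    try:
        return task(browser)
    finally:
        # Runs even if task() raises, so no browser process is left behind.
        browser.quit()
```

The crawling steps above would then go into a function that is passed in as task.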

Isn't that simple? It takes only about twenty lines of code!

The complete code

"""
Show Baidu search results with selenium
"""
from time import sleep
from selenium import webdriver

# Headless browser settings
co = webdriver.ChromeOptions()
co.headless = True
browser = webdriver.Chrome(options=co)

browser.implicitly_wait(3)
url = 'https://www.baidu.com'
browser.get(url)

browser.find_element_by_id('kw').send_keys('kali搭建钓鱼wifi csdn')  # input box
browser.find_element_by_id('su').click()  # click the search button
elements = browser.find_elements_by_class_name('c-container')  # find all elements of class c-container
for element in elements:
    span = element.find_element_by_tag_name('a')
    print(span.text)
    # print(span.get_attribute('innerHTML'))
sleep(2)
browser.quit()

If you liked this article, please give it a like.


Origin blog.csdn.net/yuaneuro/article/details/104517848