"College Entrance Examination Website + Python + Selenium": automating a desktop browser to collect information on universities that offer computer science majors. I hope this walkthrough gives you some ideas.

Preface

The target this time is a website devoted to China's college entrance examination (the gaokao). It offers exam news, admission score line lookups, guidance on filling in application preferences, university profiles, and other services, which makes it very useful to Chinese high school students and their parents.

The specific steps are as follows:

Importing the libraries

First, we need to import some required libraries:

# Time module
import time
# Browser automation (automated testing) module
from selenium import webdriver
# Saving data
import csv
  • time module: used to slow the program down and reduce the risk of the IP being blocked by the website.
  • selenium module: used to simulate browser operations, which gets around some anti-crawling mechanisms such as content rendered by JavaScript.
  • csv module: used to store the data in CSV files.

Open the browser and visit the web page

Then, we need to open the Chrome browser and visit the computer major page on the college entrance examination website:

# Open the browser
driver = webdriver.Chrome()
# Visit the website
driver.get('https://www.gaokao.cn/special?fromcoop=pddh&subjectCategory=%E5%B7%A5%E5%AD%A6&subjectName=%E8%AE%A1%E7%AE%97%E6%9C%BA%E7%B1%BB')
# Implicit wait: allow up to 10 seconds for page elements to load
driver.implicitly_wait(10)

The webdriver class from the selenium module opens the Chrome browser, and its get method loads the computer major page on the college entrance examination website. The implicitly_wait method sets a wait time of 10 seconds: element lookups keep retrying for up to that long, which gives the page time to render before an element is reported as missing.
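
The long query string in the URL above is ordinary percent-encoded UTF-8. As a minimal sketch using only the standard library, the following rebuilds the same URL from readable parameters. The values 工学 ("engineering") and 计算机类 ("computer science category") come from decoding the original URL; the helper name build_special_url is made up for illustration, and whether the site accepts other values has not been checked.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://www.gaokao.cn/special"


def build_special_url(subject_category, subject_name, fromcoop="pddh"):
    """Rebuild the majors-page URL from readable query parameters.

    Hypothetical helper: the parameter names are copied from the URL in
    the article; other values are untested against the real site.
    """
    query = urlencode({
        "fromcoop": fromcoop,
        "subjectCategory": subject_category,
        "subjectName": subject_name,
    })
    return f"{BASE_URL}?{query}"


url = build_special_url("工学", "计算机类")
# Decode the query string again to confirm the round trip.
params = parse_qs(urlsplit(url).query)
print(url)
print(params["subjectName"][0])
```

The resulting string can be passed to driver.get in place of the hard-coded literal.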

Get the elements of all universities and open the information page of each university one by one.

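   # 'css selector' is the locator strategy string (the value of By.CSS_SELECTOR);
   # the find_elements_by_* helpers were removed in Selenium 4.3
   lis = driver.find_elements('css selector', '.major-list_setSchool__3Nr1N')
   for li in lis:
       # Clicking a school opens its page in a new window
       li.click()
       handles = driver.window_handles
       # Switch to the newly opened window
       driver.switch_to.window(handles[-1])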

A for loop iterates over the school elements that were found. The click method simulates a mouse click on the current school's name, which opens a sub-page containing the data in a new window; the driver then switches to that window through the last entry in window_handles.

After entering the university's information page, use an endless loop to continuously turn the pages and extract the basic information of each university on each page.

   while True:
       driver.implicitly_wait(10)
       time.sleep(1)
       divs = driver.find_elements('css selector', '.school-tab_schoolInfo__1mNye')
       for div in divs:
           # Extract each university's basic information and save it to the CSV file
           # ...
       frame = driver.find_element('css selector', '.ant-pagination-next')
       next_page = frame.get_attribute('aria-disabled')
       if next_page == 'true':
           break
       elif next_page == 'false':
           frame.click()

A while True statement starts an infinite loop. On each pass, a CSS selector query retrieves the elements that hold each school's information, try/except statements handle extraction errors, and the extracted data is written to the CSV file.

At the end of each pass, the code locates the next-page button (.ant-pagination-next) and reads its aria-disabled attribute. If the value is 'true', there is no next page and the loop exits; if it is 'false', frame.click() moves on to the next page.
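
The pagination logic described here can be pulled into a small reusable function. The following is only a sketch: the name click_through_pages, its callable parameters, and the max_pages safety cap are assumptions that do not appear in the original code. It also stops unless aria-disabled is explicitly 'false', because the original loop would keep re-reading the same page forever if the attribute were missing.

```python
def click_through_pages(find_next_button, handle_page, max_pages=500):
    """Call handle_page() on every page, clicking "next" in between.

    find_next_button: zero-argument callable returning the pager button,
        e.g. lambda: driver.find_element("css selector", ".ant-pagination-next")
    handle_page: zero-argument callable that scrapes the current page.
    max_pages: safety cap (an assumption, not part of the original code).
    Returns the number of pages handled.
    """
    pages = 0
    while pages < max_pages:
        handle_page()
        pages += 1
        button = find_next_button()
        # Continue only when the button explicitly reports it is enabled.
        if button.get_attribute("aria-disabled") != "false":
            break
        button.click()
    return pages
```

Because the function only relies on get_attribute and click, it can be exercised with a stand-in object before pointing it at a real browser.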

Close the current information page, return to the initial page, and continue traversing the next university's information page.

   driver.close()
   driver.switch_to.window(handles[0])
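
One way to make sure the detail page is always closed, even when extraction raises an error, is to wrap the open/close pair in a context manager. This is a sketch rather than the article's own code: the name opened_tab is hypothetical, and it keeps the original's assumption that the newest window is the last entry in window_handles.

```python
from contextlib import contextmanager


@contextmanager
def opened_tab(driver, element):
    """Click element, switch to the window it opens, and always switch back."""
    original = driver.current_window_handle
    element.click()
    # Assumption carried over from the article: the new window is listed last.
    driver.switch_to.window(driver.window_handles[-1])
    try:
        yield driver
    finally:
        # Close only if the click really moved us to a different window.
        if driver.current_window_handle != original:
            driver.close()
        driver.switch_to.window(original)
```

With this helper, the body of the for loop could be written as "with opened_tab(driver, li): ..." and the explicit close and switch calls would no longer be needed.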

Create CSV file

Next, we need to create a CSV file and write the header:

# Create the file
f = open('data1.csv', mode='w', encoding='utf-8', newline='')
# Set the header fields ('学校' means "school")
csv_writer = csv.DictWriter(f, fieldnames=[
    '学校',
    'tag1',
    'tag2',
    'tag3',
    'tags',
])
# Write the header row
csv_writer.writeheader()

Python's built-in open function creates a file named data1.csv. The csv module's DictWriter class then provides a writer object for that file. Its fieldnames parameter sets the column names, which form the header, the first line of the CSV file. Finally, the writeheader method writes the header row to the file.
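
To show how a row lines up with this header, here is a small sketch that writes to an in-memory buffer instead of data1.csv. The article does not show how it fills tag1, tag2, tag3, and tags, so the make_row helper and its layout (first three tags in their own columns, all tags joined by spaces in the last one) are purely illustrative assumptions, as is the sample data.

```python
import csv
import io

FIELDNAMES = ["学校", "tag1", "tag2", "tag3", "tags"]


def make_row(school, tags):
    """Map a school name and its tag strings onto the header columns.

    Assumed layout (not from the article): the first three tags fill
    tag1..tag3, and every tag, joined by spaces, goes into 'tags'.
    """
    tags = list(tags)
    padded = (tags + ["", "", ""])[:3]  # pad so short lists still fit
    return {
        "学校": school,
        "tag1": padded[0],
        "tag2": padded[1],
        "tag3": padded[2],
        "tags": " ".join(tags),
    }


buffer = io.StringIO(newline="")  # stands in for open('data1.csv', ...)
writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
writer.writeheader()
writer.writerow(make_row("Example University", ["public", "comprehensive"]))
print(buffer.getvalue())
```

DictWriter raises a ValueError if a row contains a key that is not in fieldnames, so keeping the key names in one place helps avoid mismatches.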

In this way, we can scrape information on universities offering computer science majors from the college entrance examination website and save it locally as a CSV file for later analysis and processing.

Origin blog.csdn.net/m0_48405781/article/details/131310178