Foreword
12306 is the sole official website of China Railway's passenger service and the main online platform for buying train tickets. This article shows how to use the Python libraries Selenium and BeautifulSoup to crawl train ticket information from 12306 and save it to an Excel document, making it easy to view and compare prices and remaining tickets across train numbers and seat types.
Preparation
Before we start, we need to do some preparatory work.
Install the following components:
- Python 3.x
- BeautifulSoup 4
- Selenium
- Chrome browser or other browsers that support Selenium
The two Python packages can be installed with pip using the following commands:
pip install beautifulsoup4
pip install selenium
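To confirm that both packages were installed into the interpreter you intend to use, a quick check like the following may help. This is only a minimal sketch: the helper name missing_packages is made up for illustration, and it relies on the import names bs4 (for beautifulsoup4) and selenium.

```python
import importlib.util


def missing_packages(names):
    """Return the import names from `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # beautifulsoup4 is imported as "bs4"; selenium is imported as "selenium".
    missing = missing_packages(["bs4", "selenium"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

If a package is reported as not installed, rerun the matching pip command with the same Python interpreter that runs the script.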
Note: Selenium controls a real browser, so you also need to download a browser driver whose version matches your installed browser. This article uses the Chrome browser, and the driver (ChromeDriver) download link is: http://npm.taobao.org/mirrors/chromedriver/.
After the download is complete, extract the driver to any location and add the folder containing it to the system's PATH environment variable.
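Whether the driver folder really ended up on PATH can be checked from Python before starting the crawler. The snippet below is a sketch under the assumption that the executable is named chromedriver (chromedriver.exe on Windows, which shutil.which resolves automatically); the helper name find_driver is hypothetical.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable found on PATH, or None."""
    # shutil.which searches the PATH directories the same way the shell does.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path is None:
        print("chromedriver was not found on PATH; check the environment variable.")
    else:
        print("Using driver at:", path)
```

On most systems a terminal opened before PATH was changed will not see the new value, so open a new terminal before running the check.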
Crawl train ticket information
Set request parameters
First, set the request parameters: the departure city, arrival city, departure date, and related information. The code uses the following parameters:
fro