Data explosion! Use Python to grab hot product data from Alibaba auctions in one click and save it to a database!

Preface

Alibaba Foreclosure is a well-known online auction website in mainland China that offers auctions and fixed-price sales of a wide range of goods and services. If you want to collect product information from the Ali Foreclosure website, such as product names, prices, and pictures, you can write a Python script to fetch this data.

Before implementing, please make sure you have installed the following Python libraries and tools:

  • requests: used to send HTTP requests to the Ali Foreclosure website and obtain the response content.
  • beautifulsoup4: used to parse HTML web page content and extract data.
  • pandas: used to create data tables and organize data.
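If any of these are missing, they can be installed with pip:

pip install requests beautifulsoup4 pandas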

When you complete these steps, you are ready to start writing Python code.

Code implementation for retrieving the data

Step 1: Get the target URL

In this example, we will visit a specific product page on the Alibaba Foreclosure website and extract its information. First we need to find the product's URL and note it down for later use; here we will use this URL:

https://sf-item.taobao.com/sf_item/69947813772.htm

In practical applications, you need to obtain the URLs of different products as needed.

Step 2: Send a request to the target URL and get the response content

Next, we will use Python's requests library to send an HTTP request to the Ali Foreclosure website and obtain the response content. First, we need to set the HTTP request header information (Headers) so that the server can recognize our request.

import requests

# Set the request headers
headers = {
    'User-Agent': 'Mozilla/5.0'
}

# Send the HTTP request and get the response content
url = 'https://sf-item.taobao.com/sf_item/69947813772.htm'
page = requests.get(url, headers=headers)

In the code above, we set a simple User-Agent header, which tells the server that we are using a Mozilla-compatible browser. Then we use the requests library to send a GET request for the specified product page on the Ali Foreclosure website and save the response in the page variable.
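Before parsing the response, it is worth confirming that the request actually succeeded; sites like this may block or redirect automated clients. A minimal check using the requests API:

# Raise an exception if the server returned a 4xx/5xx status code
page.raise_for_status()
print(page.status_code)  # 200 means the page was fetched successfully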

Step 3: Parse web content and extract product information

We now have the HTML content of the product page, so the next step is to extract the product information from it. In this example, we will extract the product's name, price, image, and description.

from bs4 import BeautifulSoup

# Parse the page content and extract the product information
soup = BeautifulSoup(page.content, 'html.parser')
item_name = soup.find_all('h3', class_='title')[0].get_text().strip()
item_price = soup.find_all('span', class_='price')[0].get_text().strip()
item_picture = soup.find_all('img', class_='og-image')[0]['src'].strip()
item_desc = soup.find_all('div', class_='desc desc-more')[0].get_text().strip()

Here we use Python's BeautifulSoup4 library to parse the HTML content of the product page and locate the information we want by tag name and class attribute. For each element, we call get_text() (or read a tag attribute such as src) to get its value, and use the strip() method to remove surrounding spaces and newlines.
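One caveat: find_all(...)[0] raises an IndexError if a tag is missing, for example when the page layout changes or the request is redirected. A small hypothetical helper, safe_find, makes the extraction more forgiving; this is a sketch under the assumption that the tag and class names above match the live page:

# Hypothetical helper: return the text (or an attribute) of the first matching tag, or None
def safe_find(soup, tag, class_name, attr=None):
    element = soup.find(tag, class_=class_name)
    if element is None:
        return None
    if attr:
        return element.get(attr, '').strip()
    return element.get_text().strip()

item_name = safe_find(soup, 'h3', 'title')
item_picture = safe_find(soup, 'img', 'og-image', attr='src')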

Step 4: Save product information into DataFrame

Once we have extracted the product information from the page, we can use Python's pandas library to organize it into a DataFrame and save it to a CSV file or process it further.

import pandas as pd

# Save the product information into a DataFrame
auction_dict = {
    'Name': [item_name],
    'Price': [item_price],
    'Picture': [item_picture],
    'Description': [item_desc]
}
auction_df = pd.DataFrame(auction_dict)

Here, we create a Python dictionary named "auction_dict" containing the product information as key-value pairs, and pass it to pd.DataFrame to build a one-row table.
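From here, writing the table to disk is one call to pandas' to_csv method; the file name below is arbitrary:

# Write the product information to a CSV file (utf-8-sig so Excel displays Chinese text correctly)
auction_df.to_csv('auctions.csv', index=False, encoding='utf-8-sig')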

Save product information to the database

If you want to save the data to a MySQL database, you need to use a MySQL database driver for Python to connect to the database and insert the data. The following sample code connects Python to MySQL and saves the Ali Foreclosure product information to the database:

Step 1: Install MySQL Connector

To use MySQL from Python, you need the MySQL Connector driver, which can be installed with the following command:

pip install mysql-connector-python

Step 2: Connect to MySQL database

First we need to connect to the MySQL database and get the cursor:

# Import MySQL Connector
import mysql.connector

# Connect to the database
mydb = mysql.connector.connect(
  host="localhost",
  user="yourusername",
  password="yourpassword",
  database="mydatabase"
)

# Get a cursor
mycursor = mydb.cursor()

Please replace "yourusername", "yourpassword", and "mydatabase" with your database username, password, and database name.
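Connection attempts can fail (wrong credentials, server not running), so in practice it is sensible to wrap the connection in a try/except using the driver's Error class; a minimal sketch:

from mysql.connector import Error

try:
    mydb = mysql.connector.connect(
        host="localhost",
        user="yourusername",
        password="yourpassword",
        database="mydatabase"
    )
    mycursor = mydb.cursor()
except Error as e:
    print(f"Failed to connect to MySQL: {e}")
    raise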

Step 3: Create database table

Next, we need to create a database table to save product information. The following is a code example to create a database table:

# Create the table (skipped if it already exists)
mycursor.execute(
    "CREATE TABLE IF NOT EXISTS auctions "
    "(name VARCHAR(255), price VARCHAR(255), picture VARCHAR(255), description VARCHAR(255))"
)

Here, we create a table called "auctions" (if the table already exists, it will not be created again) and define four columns: product name, product price, product image, and product description.
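As an optional refinement (not part of the original example), you could give the table an auto-increment primary key and use a TEXT column for the description, which may well exceed 255 characters:

# Alternative schema with a surrogate key and a larger description column
mycursor.execute(
    "CREATE TABLE IF NOT EXISTS auctions "
    "(id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255), price VARCHAR(255), "
    "picture VARCHAR(255), description TEXT)"
)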

Step 4: Insert data

Now we are ready to save the product information to the MySQL database. Here is a code example:

# Insert the data
sql = "INSERT INTO auctions (name, price, picture, description) VALUES (%s, %s, %s, %s)"
val = (item_name, item_price, item_picture, item_desc)
mycursor.execute(sql, val)

# Commit the data to the database
mydb.commit()

Here, we use the MySQL cursor mycursor obtained above to perform the insert. We insert a single row with a parameterized query, passing the product name, price, image, and description as a tuple to the cursor's execute() method; the %s placeholders let the driver escape the values safely. Finally, commit() writes the change to the database.
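If you scrape several products and want to insert them in one round trip, the cursor's executemany() method accepts a list of tuples; a sketch, assuming a hypothetical rows list holding multiple product tuples:

# Batch-insert multiple rows in a single call
rows = [
    (item_name, item_price, item_picture, item_desc),
    # ... more (name, price, picture, description) tuples
]
mycursor.executemany(sql, rows)
mydb.commit()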

Note that in real applications you will need to adapt this code to your own database setup and connection method, and remember to close the cursor and the connection (mycursor.close() and mydb.close()) when you are done.
