[Python] Using the MySQLdb library and formatting field values for output

1. Brief introduction of the project

There are many ways to obtain the distinct values of a field, but most of them come down to copying with Ctrl+C, pasting with Ctrl+V, and then editing the result by hand.

Using Python as the processing tool is much faster. The libraries that need to be installed for this project are MySQLdb (provided by the mysqlclient package on PyPI), pandas, and numpy.

For example, suppose we want to wrap each distinct value in double quotes ("") after copying the data as shown below.

Done by hand, this means adding the quotes and a comma to every line, which is very tedious. The question is how to turn the values into a list of strings and then print them without the list brackets.
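As a rough illustration of the goal, the sketch below contrasts printing a Python list directly with printing the same values quoted and comma-separated. The sample values and the names brands and quote_values are hypothetical and only stand in for the real brand column.

```python
def quote_values(values):
    """Wrap each value in double quotes and join them with commas."""
    return ", ".join(f'"{value}"' for value in values)


# Hypothetical sample; the real values come from the brand column.
brands = ["vivo", "Glory", "Xiaomi"]

# Printing the list directly keeps the brackets and Python's own quoting.
print(brands)   # ['vivo', 'Glory', 'Xiaomi']

# Formatting each element removes the brackets and uses double quotes.
quoted = quote_values(brands)
print(quoted)   # "vivo", "Glory", "Xiaomi"
```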

2. Solution: formatting the output with added symbols

1. Import the libraries; the name after as is an alias

import MySQLdb
import pandas as pd
import numpy as np

2. Read the file data with pandas:

E:/py/txt/var.txt is the file in which I stored the data. It contains the distinct values of the 品牌 (brand) field, extracted from the table with SQL. The source table holds one year of mobile phone sales data from Jingdong (JD.com):

# Read the file; index_col=None keeps the default integer index
df = pd.read_csv('E:/py/txt/var.txt', index_col=None, engine='python', names=['品牌'])
print(df)

At this point the data is a DataFrame, which is a two-dimensional table.

The index_col parameter accepts an integer or string, a sequence of integers or strings, or False (the default is None).

index_col=None means that no column is designated as the index, so the rows get the default index 0, 1, 2, 3, ...

index_col=0 means that the first column is used as the index, that is, the values of the first column become the row labels.
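The difference between the two settings is easier to see on a file with more than one column. The following is a minimal sketch that reads a small in-memory sample instead of a real file; the column names brand and count and the numbers are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical two-column sample standing in for a real file.
text = "vivo,1\nGlory,2\nXiaomi,3\n"

# index_col=None: no column becomes the index; rows are numbered 0, 1, 2.
df_default = pd.read_csv(io.StringIO(text), index_col=None, names=["brand", "count"])
print(df_default.index.tolist())    # [0, 1, 2]

# index_col=0: the first column ("brand") becomes the index.
df_indexed = pd.read_csv(io.StringIO(text), index_col=0, names=["brand", "count"])
print(df_indexed.index.tolist())    # ['vivo', 'Glory', 'Xiaomi']
print(df_indexed.columns.tolist())  # ['count']
```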

The result of the operation is as follows:

    Brand
0   vivo
1   Glory
2   Xiaomi
3   Apple
4   Newman
5   Huawei
6   realme
7   oppo
8   Samsung
9   Nubia
10  OnePlus
11  Meizu
12  Motorola
13  Others
14  Coolpad
15  Duowei
16  Nokia
17  ZTE
18  Philips
19  Nikane
20  Tianyu
21  Coolby
22  Candy
23  Gionee
24  nzone
25  Black Shark

# Convert the DataFrame to a NumPy array
array = np.array(df)
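It may help to look at the shape of the converted array before looping over it. The sketch below uses a hypothetical three-row DataFrame with the same single column; it shows that every row of the array is itself a one-element row, and that selecting the column first gives a flat list of strings.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample with the same single column as the article.
df = pd.DataFrame({"品牌": ["vivo", "Glory", "Xiaomi"]})

array = np.array(df)
print(array.shape)    # (3, 1): three rows, one column
print(array[0][0])    # vivo (row 0, column 0)

# Selecting the column first gives a flat list of plain strings.
flat = df["品牌"].tolist()
print(flat)           # ['vivo', 'Glory', 'Xiaomi']
```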

3. Loop over the rows and format the output

The loop bounds must stay within the array. With 26 rows the valid indexes are 0 to 25, so a hard-coded range that goes further, such as range(1, 27), raises the following error:

IndexError: index 26 is out of bounds for axis 0 with size 26

The rows printed before the error are unaffected, but it is safer to take the bound from len(array). The loop only reads from the array; writing slices of df back into it inside the loop (for example array[i] = np.array(df[i-1:i])) would shift every value by one position, duplicate the first row, and drop the last one.

for i in range(len(array)):
    # array[i] is a one-element row; %c prints the character with ASCII code 34, the double quote
    print("%c%s%c," % (34, array[i][0], 34))

Here array[i][0] is the brand stored in row i, since each row of the array holds a single column.

%c is the format specifier for a single character; passing 34 outputs the character whose ASCII code is 34, so each value is printed in the form "vivo",

Output as a list:

# Convert the array to a list
a_list = array.tolist()
print(a_list)

[['vivo'], ['Glory'], ['Xiaomi'], ['Apple'], ['Newman'], ['Huawei'], ['realme'], ['oppo'], ['Samsung'], ['Nubia'], ['OnePlus'], ['Meizu'], ['Motorola'], ['Others'], ['Coolpad'], ['Duowei'], ['Nokia'], ['ZTE'], ['Philips'], ['Nikane'], ['Tianyu'], ['Coolby'], ['Candy'], ['Gionee'], ['nzone'], ['Black Shark']]
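The %c quoting used in this step can be checked in isolation. The short sketch below shows that formatting 34 with %c gives the same character as chr(34), that an f-string produces the same line, and how the one-element rows returned by tolist() can be flattened. The sample values are hypothetical.

```python
code = 34
quote = "%c" % code          # %c converts an integer code to its character
print(quote == chr(code) == '"')       # True

name = "vivo"
with_percent_c = "%c%s%c," % (34, name, 34)
with_fstring = f'"{name}",'
print(with_percent_c)                  # "vivo",
print(with_percent_c == with_fstring)  # True

# tolist() on an (n, 1) array gives one-element lists; flatten them.
nested = [["vivo"], ["Glory"]]
flat = [row[0] for row in nested]
print(flat)                            # ['vivo', 'Glory']
```

The f-string form is usually easier to read; the %c form is kept here because it mirrors the approach described above.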

3. Using the MySQLdb library to extract the data directly from the database

1. The required libraries must be installed and imported, as described above

2. The following code queries the database and prints the result

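# Open the database connection
db = MySQLdb.connect("localhost", "root", "489000", "test", charset='utf8')
# Use cursor() to obtain a cursor for running statements
cursor = db.cursor()
# Use execute() to run an SQL statement
cursor.execute("SELECT VERSION()")
# Use fetchone() to fetch a single row
version = cursor.fetchone()
print("Database version : %s " % version)
# SQL statement: group by brand so that each brand is returned once
sql = """SELECT 品牌 FROM SHEET1 \
       GROUP BY 品牌 """
cursor.execute(sql)
# Loop with a deliberately large upper bound; stop when no rows remain
for i in range(1, 10000):
    data = cursor.fetchone()
    if data is None:
        break
    else:
        print(*data)  # unpack the row tuple and print the plain value
# Close the database connection
db.close()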

3. To make sure every row is read, the loop is given a deliberately large range, and a termination condition is added on top of it.

Each row fetched by the cursor is assigned to data. When fetchone() returns None there is no more data to read, so the loop ends. Each row comes back as a tuple, so *data unpacks it and only the plain value is printed.
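The fetch-until-None pattern does not need a fixed upper bound. The sketch below shows the same idea with a while loop. To stay runnable without a MySQL server it uses the standard-library sqlite3 module as a stand-in (an assumption, not the article's setup); both modules follow the Python DB-API, so fetchone() also returns None at the end of the result set. The table contents are hypothetical, and ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# sqlite3 is used here only as a stand-in so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE sheet1 (brand TEXT)")
cursor.executemany(
    "INSERT INTO sheet1 (brand) VALUES (?)",
    [("vivo",), ("Glory",), ("vivo",), ("Xiaomi",)],
)

cursor.execute("SELECT brand FROM sheet1 GROUP BY brand ORDER BY brand")

brands = []
while True:
    row = cursor.fetchone()   # returns None once every row has been read
    if row is None:
        break
    brands.append(row[0])     # each row is a tuple; take its only column

conn.close()
print(brands)  # ['Glory', 'Xiaomi', 'vivo']
```

Because the loop stops on None rather than on a counter, it reads the whole result set regardless of how many rows the query returns.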

4. The complete code and a comparison of the whole process:

# _*_ coding:utf-8 _*_
# @Time    : 2022/9/1 9:30
# @Author  : ice_Seattle
# @File    : testprogram.py
# @Software: PyCharm

import MySQLdb
import pandas as pd
import numpy as np
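# Open the database connection
db = MySQLdb.connect("localhost", "root", "489000", "test", charset='utf8')
# Use cursor() to obtain a cursor for running statements
cursor = db.cursor()
# Use execute() to run an SQL statement
cursor.execute("SELECT VERSION()")
# Use fetchone() to fetch a single row
version = cursor.fetchone()
print("Database version : %s " % version)
# SQL statement: group by brand so that each brand is returned once
sql = """SELECT 品牌 FROM SHEET1 \
       GROUP BY 品牌 """
cursor.execute(sql)
# Loop with a deliberately large upper bound; stop when no rows remain
for i in range(1, 10000):
    data = cursor.fetchone()
    if data is None:
        break
    else:
        print(*data)  # unpack the row tuple and print the plain value
# Close the database connection
db.close()
# Read the file; index_col=None keeps the default integer index
df = pd.read_csv('E:/py/txt/var.txt', index_col=None, engine='python', names=['品牌'])
# Convert the DataFrame to a NumPy array
array = np.array(df)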
# Print every brand wrapped in double quotes (ASCII code 34), followed by a comma
for i in range(len(array)):
    print("%c%s%c," % (34, array[i][0], 34))
# Convert the array to a list
a_list = array.tolist()
print(a_list)
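The two halves of the script can also be joined so that the query result is formatted directly, without the intermediate text file. The following is only a sketch under stated assumptions: quoted_column is a hypothetical helper, and sqlite3 with made-up rows stands in for the MySQL connection so that the example runs anywhere. The helper relies only on the cursor methods execute() and fetchall(), which MySQLdb cursors also provide.

```python
import sqlite3


def quoted_column(cursor, query):
    """Run a one-column query and return the values as '"a","b",...'."""
    cursor.execute(query)
    values = [row[0] for row in cursor.fetchall()]
    return ",".join(f'"{value}"' for value in values)


# Stand-in database (assumption): the article's code connects to MySQL instead.
conn = sqlite3.connect(":memory:")
try:
    cur = conn.cursor()
    cur.execute("CREATE TABLE sheet1 (brand TEXT)")
    cur.executemany(
        "INSERT INTO sheet1 (brand) VALUES (?)",
        [("oppo",), ("vivo",), ("oppo",)],
    )
    result = quoted_column(
        cur, "SELECT DISTINCT brand FROM sheet1 ORDER BY brand"
    )
finally:
    conn.close()   # always release the connection, even on errors

print(result)  # "oppo","vivo"
```

Closing the connection in a finally block means it is released even if the query fails partway through.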

For comparison, running SELECT 品牌, count(品牌) FROM sheet1 GROUP BY 品牌 in Navicat gives the following results:

Running the same test through spoon.bat in Kettle gives the following results:

5. Summary

In summary, if you want to extract the distinct values of a field, writing and running the Python code is much faster than using Navicat or Kettle. The same code can add the " " quotes or convert the data into other forms such as lists, which removes the tedious manual steps.