[Tencent Cloud TDSQL-C Serverless Product Experience] Using Python to Write and Read Data in TDSQL-C to Build Word Clouds

Foreword

TDSQL-C for MySQL is a new-generation cloud-native relational database developed by Tencent Cloud. It combines the strengths of traditional databases, cloud computing and new hardware technology to provide highly elastic, high-performance, secure and reliable database services with massive storage. TDSQL-C for MySQL is 100% compatible with MySQL 5.7 and 8.0. It supports throughput of over one million QPS and offers intelligent storage of up to PB scale, while keeping data secure and reliable.
TDSQL-C for MySQL separates storage from compute. All compute nodes share a single copy of the data, so the configuration can be scaled up or down in seconds and failures are recovered from in seconds. A single node can handle millions of QPS, data and backups are maintained automatically, and parallel rollback runs at up to GB per second.
TDSQL-C for MySQL combines the stability, reliability, high performance and scalability of commercial databases with the simplicity, openness and fast iteration of open-source cloud databases. Its engine is fully compatible with native MySQL, so you can migrate a MySQL database to TDSQL-C for MySQL without modifying any application code or configuration.

In this article, we use Python step by step to write word-frequency data to TDSQL-C, read it back and generate word clouds from it.

What you will learn

  1. How to apply for a TDSQL-C database. This covers logging in to Tencent Cloud, choosing the purchase configuration, completing the purchase and using the management page.
  2. Creating the project, connecting to the TDSQL-C database and creating a database.
  3. Code walkthroughs for reading the word-frequency Excel files, creating tables, saving data to TDSQL-C and reading the data back.
  4. Related Python knowledge.

Preparation

Apply for TDSQL database

1. Click to log in to Tencent Cloud

Tencent Cloud address

2. Click Buy Now, as shown below:

[Screenshot: the Buy Now button]

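3. The database configuration options on the purchase page are as follows:

**Note**: Here we choose Serverless as the instance type.

 - Instance type **(Serverless)**
 - Database engine **(MySQL)**
 - Region **(Beijing)**  *choose the region that suits your own situation*
 - Primary availability zone **(Beijing Zone 3)**  *choose the zone that suits your own situation*
 - Multi-AZ deployment **(No)**
 - Transmission link
 - Network
 - Database version **(MySQL 5.7)**
 - Compute configuration **minimum (0.25), maximum (0.5)**
 - Auto-pause **configure as needed**
 - Compute billing mode **(pay-as-you-go)**
 - Storage billing mode **(pay-as-you-go)**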

A screenshot of my configuration is as follows:

[Screenshot: purchase configuration]

4. Basic Information

Here we directly set our own password (设置自己的密码) and make table names case-insensitive (表名大小写不敏感), as shown in the figure below:
[Screenshot: basic information settings]

5. After the configuration is complete, click Buy Now in the lower right corner

6. After you click Buy Now, the confirmation pop-up below appears. Click again to confirm the purchase.

[Screenshot: purchase confirmation pop-up]

7. After the purchase is complete, a pop-up window appears. Click Go to Management Page (前往管理页面).

[Screenshot: purchase complete pop-up]

8. In the read-write instance section, click Enable (开启外部) to turn on external network access.

[Screenshot: enabling external access for the read-write instance]

9. Create and authorize

[Screenshot: create and authorize]

That completes our preparation. It is actually quite simple!

Data preparation

The required data is as follows:

  • Word-frequency files (Excel)
  • Background image
  • Font file

The download link is at the end of the article. Download the files if you need them!

Create the project

The project directory is as follows:
[Screenshot: project directory]

Explanation:

  1. The 词云图 (word cloud) folder stores the generated images.
  2. background.png is the background (mask) image of the word cloud.
  3. The font file provides the font used to draw the words in the word cloud.
  4. The 词频 (word frequency) folder holds the Excel files that supply the data.
  5. wordPhoto.py is the script file.

Connect to TDSQL

Open the read-write instance of the database to find the connection settings, as shown in the figure:

[Screenshot: connection settings of the read-write instance]

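# MySQL connection settings
db_config = {
    'host': "XXXXXX",  # the external host name of your instance
    'port': xxxx,  # the external port of your instance (an integer)
    'user': "root",  # account name
    'password': "",  # the password you set when creating the instance
    'database': 'tdsql',  # the database you create yourself in your TDSQL-C instance (see below)
}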
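Hard-coding the password in the script makes it easy to leak, for example when the script is shared. The sketch below builds the same db_config dictionary from environment variables instead. It assumes you export TDSQL_HOST, TDSQL_PORT, TDSQL_PASSWORD and, optionally, TDSQL_USER and TDSQL_DATABASE. These variable names and the helper load_db_config are hypothetical choices of ours, not a Tencent Cloud convention.

```python
import os


def load_db_config(env=None):
    """Build a pymysql-style connection dict from environment variables.

    The TDSQL_* variable names are hypothetical; rename them to suit your setup.
    """
    env = os.environ if env is None else env
    return {
        'host': env['TDSQL_HOST'],                       # external host name (required)
        'port': int(env['TDSQL_PORT']),                  # pymysql expects an int port
        'user': env.get('TDSQL_USER', 'root'),           # defaults to root, as in the article
        'password': env['TDSQL_PASSWORD'],               # kept out of the source file
        'database': env.get('TDSQL_DATABASE', 'tdsql'),  # the database created below
    }


# Usage: db_config = load_db_config(); conn = pymysql.connect(**db_config)
```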

Create a database

  1. Click the login button shown in the figure to log in to the database instance we created.
    [Screenshot: login button]
  2. Once logged in, click 新建库 (New Database).
    [Screenshot: the 新建库 button]
  3. Click 新建数据库 (Create Database). A pop-up window appears.
    [Screenshot: the 新建数据库 pop-up]
  4. In the pop-up window, enter any name you like under 数据库名称 (Database Name). Here we use tdsql. Then click 确定创建 (Confirm and Create).
    [Screenshot: entering the database name]
  5. Once the new database appears in the list, it has been created and we can start writing code!
    [Screenshot: database list]
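If you prefer SQL to the console, you can probably create the same database with a CREATE DATABASE statement. This assumes your account has the CREATE privilege. The helper build_create_database_sql below is hypothetical. Identifiers cannot be bound with %s placeholders, so it validates the name before quoting it. The utf8mb4 character set is our own choice, made so that the tables can store Chinese words.

```python
import re


def build_create_database_sql(name: str) -> str:
    """Return a CREATE DATABASE statement for a simple identifier.

    Identifiers cannot be passed as %s parameters, so the name is checked
    against a conservative pattern (letters, digits, underscore, max 64 chars).
    """
    if not re.fullmatch(r"[A-Za-z0-9_]{1,64}", name):
        raise ValueError(f"unsupported database name: {name!r}")
    # utf8mb4 lets the tables hold Chinese words
    return f"CREATE DATABASE IF NOT EXISTS `{name}` DEFAULT CHARACTER SET utf8mb4"


# Usage (connect WITHOUT the 'database' key, because it does not exist yet):
# cfg = {k: v for k, v in db_config.items() if k != 'database'}
# conn = pymysql.connect(**cfg)
# with conn.cursor() as cur:
#     cur.execute(build_create_database_sql('tdsql'))
```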

Function modules

Read the word-frequency Excel files


import os

import pandas as pd


def excelTomysql():
    path = '词频'  # folder containing the word-frequency Excel files
    # List the file names in the folder and join each with the folder path
    files = [os.path.join(path, i) for i in os.listdir(path)]
    for file_path in files:
        print(file_path)
        filename = os.path.basename(file_path)
        table_name = os.path.splitext(filename)[0]  # the file name without its extension becomes the table name
        # Read the Excel file with pandas; the first row holds the column names
        data = pd.read_excel(file_path, engine="openpyxl", header=0)
        # Every column is stored as VARCHAR(255)
        columns = {col: "VARCHAR(255)" for col in data.columns}

        create_table(table_name, columns)  # create the table
        save_to_mysql(data, table_name)  # save the rows, using the file name as the table name
        print(filename + ' uploaded and saved to MySQL successfully')


Code explanation

  1. Set the folder path to '词频' (word frequency) and assign it to the variable path.
  2. Use the os.listdir() function to get all file names in the folder, join each one with the folder path to form the full path, and store the results in the list files.
  3. Use a for loop to go through each file path in files and print it.
  4. Use the os.path.basename() function to get the file name and assign it to the variable filename.
  5. Use the os.path.splitext() function to split the file name into its name and extension. Take the name part (index 0) as the table name and assign it to the variable table_name.
  6. Use the pandas read_excel() function to read the Excel file into the variable data. The openpyxl engine is used, and the first row is treated as the column names.
  7. Use a dictionary comprehension to build the dictionary columns. Its keys are the column names of data and its values are the data type "VARCHAR(255)".
  8. Call create_table() with table_name and columns to create the corresponding table.
  9. Call save_to_mysql() with data and table_name to save the data into the MySQL database, using the file name as the table name.
  10. Print the file name followed by 'uploaded and saved to MySQL successfully'.
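os.listdir() returns every entry in the folder. That includes hidden files such as .DS_Store and the ~$ lock files Excel creates while a workbook is open, and read_excel() would fail on them. The sketch below is a stricter file scan. It keeps only .xlsx files and pairs each one with its table name. The helper name excel_files_with_table_names is hypothetical, and sorting the names is our own choice so that the upload order is stable.

```python
import os


def excel_files_with_table_names(folder):
    """Return sorted (full_path, table_name) pairs for the .xlsx files in folder."""
    pairs = []
    for name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, name)
        # Skip sub-folders, hidden files and Excel's temporary "~$" lock files
        if not os.path.isfile(full_path) or name.startswith(('.', '~$')):
            continue
        stem, ext = os.path.splitext(name)
        if ext.lower() == '.xlsx':
            pairs.append((full_path, stem))  # file name without extension = table name
    return pairs
```

Inside excelTomysql(), the loop could then become `for file_path, table_name in excel_files_with_table_names('词频'):`.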

Create the table


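import pymysql


def create_table(table_name, columns):
    # Open a connection to the MySQL (TDSQL-C) database
    conn = pymysql.connect(**db_config)
    cursor = conn.cursor()
    # Assemble the CREATE TABLE statement; backticks quote the identifiers
    # so names with spaces or reserved words still work
    query = f"CREATE TABLE IF NOT EXISTS `{table_name}` ("
    for col_name, col_type in columns.items():
        query += f"`{col_name}` {col_type}, "
    query = query.rstrip(", ")  # strip the trailing comma and space
    query += ")"

    # Run the CREATE TABLE statement
    cursor.execute(query)

    # Commit and close the connection
    conn.commit()
    cursor.close()
    conn.close()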

Code explanation

  1. Connect to the MySQL database using the connection parameters in the db_config variable.
  2. Create a cursor object, cursor, for executing SQL statements.
  3. Assemble the CREATE TABLE statement. First insert the table name table_name, then loop over each key-value pair in the columns dictionary and append the column name and data type.
  4. Remove the trailing comma and space from the statement.
  5. Add the closing parenthesis to complete the statement.
  6. Use cursor to execute the assembled CREATE TABLE statement.
  7. Commit the transaction to persist the change to the database.
  8. Close the cursor and the database connection.

The code uses the pymysql module to connect to the MySQL database and creates the table by executing a SQL statement. The connection parameters come from the db_config variable. The columns parameter is the dictionary generated by the previous code, and it contains the table's column names and data types.
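The table and column names come straight from file names and Excel headers, and identifiers cannot be bound with %s placeholders. The sketch below quotes them with backticks and doubles any backtick inside a name, which is MySQL's escaping rule for quoted identifiers. It builds the column list with join() instead of trimming with rstrip(). The helper names quote_identifier and build_create_table_sql are hypothetical, and the column types are assumed to be trusted, as they are in the article ("VARCHAR(255)").

```python
def quote_identifier(name) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + str(name).replace("`", "``") + "`"


def build_create_table_sql(table_name, columns) -> str:
    """Build CREATE TABLE IF NOT EXISTS from a {column_name: column_type} dict."""
    column_defs = ", ".join(
        f"{quote_identifier(col)} {col_type}" for col, col_type in columns.items()
    )
    return f"CREATE TABLE IF NOT EXISTS {quote_identifier(table_name)} ({column_defs})"
```

Inside create_table(), you could then run `cursor.execute(build_create_table_sql(table_name, columns))` directly.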

Save data to TDSQL


def save_to_mysql(data, table_name):
    # Open a connection to the MySQL (TDSQL-C) database
    conn = pymysql.connect(**db_config)
    cursor = conn.cursor()
    # Write the rows into the table (assumes the workbook has a single sheet)
    for index, row in data.iterrows():
        query = f"INSERT INTO `{table_name}` ("
        for col_name in data.columns:
            query += f"`{col_name}`, "
        query = query.rstrip(", ")  # strip the trailing comma and space
        query += ") VALUES ("
        values = tuple(row)
        query += ("%s, " * len(values)).rstrip(", ")  # one %s placeholder per value
        query += ")"
        cursor.execute(query, values)  # pymysql substitutes and escapes the values

    # Commit and close the connection
    conn.commit()
    cursor.close()
    conn.close()

Code explanation

  1. Connect to the MySQL database using the connection parameters in the db_config variable.
  2. Create a cursor object, cursor, for executing SQL statements.
  3. Loop over the rows of data with a for loop, getting the index and the row data.
  4. Assemble the INSERT statement. First insert the table name table_name, then loop over the column names of data and append each one.
  5. Remove the trailing comma and space.
  6. Append ") VALUES (" to close the column list and open the value list.
  7. Use tuple(row) to convert the row data into a tuple, and generate one %s placeholder per value.
  8. Append the placeholders and the closing parenthesis to the statement.
  9. Call cursor.execute() to run the statement. pymysql replaces the placeholders with the actual row values, escaping them as needed.
  10. Commit the transaction to persist the changes to the database.
  11. Close the cursor and the database connection.
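The INSERT statement is identical for every row, so it can be built once and sent with cursor.executemany(). pymysql batches simple INSERT ... VALUES statements into a multi-row insert. The sketch below also turns empty Excel cells, which pandas reads as float NaN, into None so that they are stored as NULL. The helper names are hypothetical, and only float NaN is handled. Other pandas missing-value markers such as pd.NA are left as they are.

```python
import math


def quote_identifier(name) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backtick."""
    return "`" + str(name).replace("`", "``") + "`"


def build_insert_sql(table_name, column_names) -> str:
    """One INSERT statement with a %s placeholder per column; pymysql binds the values."""
    cols = ", ".join(quote_identifier(c) for c in column_names)
    placeholders = ", ".join(["%s"] * len(column_names))
    return f"INSERT INTO {quote_identifier(table_name)} ({cols}) VALUES ({placeholders})"


def clean_rows(rows):
    """Replace float NaN (how pandas marks an empty cell) with None, sent as SQL NULL."""
    return [
        tuple(None if isinstance(v, float) and math.isnan(v) else v for v in row)
        for row in rows
    ]


# Usage with the article's DataFrame `data` (one batch instead of one query per row):
# sql = build_insert_sql(table_name, list(data.columns))
# cursor.executemany(sql, clean_rows(data.itertuples(index=False, name=None)))
# conn.commit()
```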

Read TDSQL data

 
def query_data():
    # Open a connection to the MySQL (TDSQL-C) database
    conn = pymysql.connect(**db_config)
    cursor = conn.cursor()
    # List all tables in the database
    cursor.execute("SHOW TABLES")
    tables = cursor.fetchall()

    data = []
    dic_list = []
    table_name_list = []
    for table in tables:
        table_name = table[0]
        table_name_list.append(table_name)
        query = f"SELECT * FROM `{table_name}`"
        # Run the query and fetch the result
        cursor.execute(query)
        result = cursor.fetchall()
        table_data = []  # rows of the current table only
        if len(result) > 0:
            columns = [desc[0] for desc in cursor.description]
            table_data = [{columns[i]: row[i] for i in range(len(columns))} for row in result]
            data.extend(table_data)
        # Build the word -> frequency dict from this table's rows only,
        # so each word cloud reflects a single Excel file
        dic = {}
        for i in table_data:
            dic[i['word']] = float(i['count'])
        dic_list.append(dic)

    conn.commit()
    cursor.close()
    conn.close()
    return dic_list, table_name_list

Code explanation

  1. Connect to the MySQL database using the connection parameters in the db_config variable.
  2. Create a cursor object, cursor, for executing SQL statements.
  3. Use cursor.execute() to run "SHOW TABLES" and get all table names.
  4. Use cursor.fetchall() to get the result and store it in the variable tables.
  5. Create the empty lists data, dic_list and table_name_list. They hold the query rows, the word-frequency dictionaries and the table names.
  6. Loop over tables. For each entry, take the table name and append it to table_name_list.
  7. Build a SQL statement that selects all data from the table and run it with cursor.execute().
  8. Use cursor.fetchall() to get the result and store it in the variable result.
  9. If result is not empty, perform the following operations:
    • Use cursor.description to get the column names of the result and store them in the variable columns.
    • Use a list comprehension combined with a dictionary comprehension to convert each row into a dictionary, and store the list in table_data.
    • Append table_data to the list data.
  10. Build a dictionary, dic, that maps each word (the word column) to its frequency (the count column, converted to float).
  11. Append dic to dic_list, giving one dictionary per table.
  12. Commit the transaction. This is not strictly necessary for a read-only query, but it is harmless.
  13. Close the cursor and the database connection.
  14. Return dic_list and table_name_list.
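The word cloud needs one {word: frequency} dictionary per table. The sketch below builds that dictionary from the rows and cursor.description of a single SELECT, which keeps each table's words separate. Like query_data(), it assumes the columns are named word and count. The helper name rows_to_frequencies is hypothetical. Two behaviors are our own design choices: it skips rows whose count is NULL or non-numeric, and it adds up the counts of duplicate words instead of overwriting them.

```python
def rows_to_frequencies(rows, description, word_col="word", count_col="count"):
    """Turn the result of one SELECT into {word: float(count)}.

    rows        -- the tuples returned by cursor.fetchall()
    description -- cursor.description; element [0] of each entry is the column name
    """
    columns = [desc[0] for desc in description]
    word_idx, count_idx = columns.index(word_col), columns.index(count_col)
    freqs = {}
    for row in rows:
        word, count = row[word_idx], row[count_idx]
        try:
            value = float(count)  # counts were stored as VARCHAR(255), so convert back
        except (TypeError, ValueError):
            continue  # skip NULL or non-numeric counts instead of crashing
        if word:
            # If a word appears more than once, add up its counts
            freqs[str(word)] = freqs.get(str(word), 0.0) + value
    return freqs
```

Inside the loop of query_data(), the call could look like `dic_list.append(rows_to_frequencies(cursor.fetchall(), cursor.description))`.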

Calling the code


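import os

import numpy as np
import wordcloud
import matplotlib.pyplot as plt
from PIL import Image

if __name__ == '__main__':
    excelTomysql()
    result_list, table_name_list = query_data()
    os.makedirs("词云图", exist_ok=True)  # make sure the output folder exists
    maskImage = np.array(Image.open('background.png'))  # background (mask) image of the word cloud
    for i in range(len(result_list)):
        if not result_list[i]:
            continue  # skip empty tables: WordCloud cannot draw zero words
        # Word cloud style
        wc = wordcloud.WordCloud(
            font_path='PingFangBold.ttf',  # font file; it must contain glyphs for Chinese words
            mask=maskImage,  # background (mask) image
            max_words=800,  # maximum number of words shown
            max_font_size=200)  # largest font size
        # Generate the word cloud from the word -> frequency dict
        wc.generate_from_frequencies(result_list[i])
        # Save the image to the output folder
        wc.to_file("词云图/{}.png".format(table_name_list[i]))
        print("Word cloud [{}] saved successfully!".format(table_name_list[i] + '.png'))
        plt.imshow(wc)  # draw the word cloud
        plt.axis('off')  # hide the axes
        plt.show()  # display the figure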

Code explanation

  1. Use Image.open() to open the background image background.png and convert it to a NumPy array. Store it in the variable maskImage as the mask of the word cloud.
  2. Create a WordCloud object, wc, and set the font path, the mask image, the maximum number of words to display and the maximum font size.
  3. Call wc.generate_from_frequencies() to generate the word cloud from the dictionary result_list[i].
  4. Use wc.to_file() to save the generated word cloud as "词云图/{}.png", where {} is replaced by the corresponding table name.
  5. Print the file name of the generated word cloud.
  6. Use plt.imshow() to draw the word cloud.
  7. Use plt.axis('off') to hide the axes.
  8. Use plt.show() to display the image.
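The max_words=800 setting caps how many words are drawn. As far as we can tell from the wordcloud source, generate_from_frequencies() keeps the max_words most frequent entries, scales them so that the most frequent word becomes 1.0, and rejects an empty dictionary. The sketch below reproduces that selection step so you can check which words will make it into the picture. The helper top_relative_frequencies is hypothetical and is not part of the wordcloud API.

```python
def top_relative_frequencies(freqs, max_words=800):
    """Keep the max_words most frequent words, scaled so the top word is 1.0."""
    if not freqs:
        return []  # WordCloud refuses an empty dict, so check before drawing
    top = sorted(freqs.items(), key=lambda item: item[1], reverse=True)[:max_words]
    max_freq = top[0][1] or 1.0  # avoid dividing by zero if every count is 0
    return [(word, freq / max_freq) for word, freq in top]
```

For example, `print(top_relative_frequencies(result_list[0])[:10])` shows the ten largest words of the first table.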

Full code

import pymysql
import pandas as pd
import os
import wordcloud
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# MySQL connection settings
db_config = {
    'host': "XXXXXX",  # the external host name of your instance
    'port': xxxx,  # the external port of your instance (an integer)
    'user': "root",  # account name
    'password': "",  # the password you set when creating the instance
    'database': 'tdsql',  # the database you created yourself in your TDSQL-C instance
}


def create_table(table_name, columns):
    # Open a connection to the MySQL (TDSQL-C) database
    conn = pymysql.connect(**db_config)
    cursor = conn.cursor()
    # Assemble the CREATE TABLE statement; backticks quote the identifiers
    query = f"CREATE TABLE IF NOT EXISTS `{table_name}` ("
    for col_name, col_type in columns.items():
        query += f"`{col_name}` {col_type}, "
    query = query.rstrip(", ")  # strip the trailing comma and space
    query += ")"

    # Run the CREATE TABLE statement
    cursor.execute(query)

    # Commit and close the connection
    conn.commit()
    cursor.close()
    conn.close()


def excelTomysql():
    path = '词频'  # folder containing the word-frequency Excel files
    # List the file names in the folder and join each with the folder path
    files = [os.path.join(path, i) for i in os.listdir(path)]
    for file_path in files:
        print(file_path)
        filename = os.path.basename(file_path)
        table_name = os.path.splitext(filename)[0]  # the file name without its extension becomes the table name
        # Read the Excel file with pandas; the first row holds the column names
        data = pd.read_excel(file_path, engine="openpyxl", header=0)
        # Every column is stored as VARCHAR(255)
        columns = {col: "VARCHAR(255)" for col in data.columns}

        create_table(table_name, columns)  # create the table
        save_to_mysql(data, table_name)  # save the rows, using the file name as the table name
        print(filename + ' uploaded and saved to MySQL successfully')


def save_to_mysql(data, table_name):
    # Open a connection to the MySQL (TDSQL-C) database
    conn = pymysql.connect(**db_config)
    cursor = conn.cursor()
    # Write the rows into the table (assumes the workbook has a single sheet)
    for index, row in data.iterrows():
        query = f"INSERT INTO `{table_name}` ("
        for col_name in data.columns:
            query += f"`{col_name}`, "
        query = query.rstrip(", ")  # strip the trailing comma and space
        query += ") VALUES ("
        values = tuple(row)
        query += ("%s, " * len(values)).rstrip(", ")  # one %s placeholder per value
        query += ")"
        cursor.execute(query, values)  # pymysql substitutes and escapes the values

    # Commit and close the connection
    conn.commit()
    cursor.close()
    conn.close()


def query_data():
    # Open a connection to the MySQL (TDSQL-C) database
    conn = pymysql.connect(**db_config)
    cursor = conn.cursor()
    # List all tables in the database
    cursor.execute("SHOW TABLES")
    tables = cursor.fetchall()

    data = []
    dic_list = []
    table_name_list = []
    for table in tables:
        table_name = table[0]
        table_name_list.append(table_name)
        query = f"SELECT * FROM `{table_name}`"
        # Run the query and fetch the result
        cursor.execute(query)
        result = cursor.fetchall()
        table_data = []  # rows of the current table only
        if len(result) > 0:
            columns = [desc[0] for desc in cursor.description]
            table_data = [{columns[i]: row[i] for i in range(len(columns))} for row in result]
            data.extend(table_data)
        # Build the word -> frequency dict from this table's rows only,
        # so each word cloud reflects a single Excel file
        dic = {}
        for i in table_data:
            dic[i['word']] = float(i['count'])
        dic_list.append(dic)

    conn.commit()
    cursor.close()
    conn.close()
    return dic_list, table_name_list


if __name__ == '__main__':
    excelTomysql()
    result_list, table_name_list = query_data()
    os.makedirs("词云图", exist_ok=True)  # make sure the output folder exists
    maskImage = np.array(Image.open('background.png'))  # background (mask) image of the word cloud
    for i in range(len(result_list)):
        if not result_list[i]:
            continue  # skip empty tables: WordCloud cannot draw zero words
        # Word cloud style
        wc = wordcloud.WordCloud(
            font_path='PingFangBold.ttf',  # font file; it must contain glyphs for Chinese words
            mask=maskImage,  # background (mask) image
            max_words=800,  # maximum number of words shown
            max_font_size=200)  # largest font size
        # Generate the word cloud from the word -> frequency dict
        wc.generate_from_frequencies(result_list[i])
        # Save the image to the output folder
        wc.to_file("词云图/{}.png".format(table_name_list[i]))
        print("Word cloud [{}] saved successfully!".format(table_name_list[i] + '.png'))
        plt.imshow(wc)  # draw the word cloud
        plt.axis('off')  # hide the axes
        plt.show()  # display the figure

Note

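Install the required packages before running the code! The openpyxl package is needed because pd.read_excel() is called with engine="openpyxl".


pip install pymysql
pip install pandas
pip install openpyxl
pip install wordcloud
pip install numpy
pip install pillow
pip install matplotlib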
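For some of these packages, the pip name differs from the import name. For example, pillow is imported as PIL. The sketch below uses importlib.util.find_spec() to report which packages are still missing before the main script runs. The helper name missing_modules and the REQUIRED mapping are our own.

```python
import importlib.util

# Import name -> pip package name for everything the script uses
REQUIRED = {
    "pymysql": "pymysql",
    "pandas": "pandas",
    "openpyxl": "openpyxl",  # needed by pd.read_excel(..., engine="openpyxl")
    "wordcloud": "wordcloud",
    "numpy": "numpy",
    "PIL": "pillow",
    "matplotlib": "matplotlib",
}


def missing_modules(required=REQUIRED):
    """Return the pip package names whose import name cannot be found."""
    return [pkg for mod, pkg in required.items() if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("pip install " + " ".join(missing) if missing else "All packages installed.")
```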

Run the code

Screenshot of writing the data

[Screenshot: console output while writing data]

Screenshot of the data in the database

[Screenshot: table data in TDSQL-C]

Generated word cloud

[Screenshot: word cloud]

The word clouds saved to the folder:
[Screenshot: output folder]

Delete TDSQL

The experience is complete. Since the current business no longer needs the database, we delete it to avoid unnecessary billing.

Click the Destroy button shown in the figure:

[Screenshot: the Destroy button]
A pop-up window for destroying the instance appears. Click OK.

[Screenshot: destroy confirmation pop-up]

Download

The resources are hosted on Baidu Netdisk:

Link: https://pan.baidu.com/s/1hClOJI07HUuGBQ2SwZfWjw Extraction code: 5mm9

Summary

Using TDSQL-C, you will find that access is truly seamless and very smooth. Of course, there are still some shortcomings, and I hope they will be improved!

Advantages

  1. Overall, Tencent Cloud's TDSQL-C database is very pleasant to use. The operations are fairly simple, and I got everything set up just by following the concise official documentation. It is also very cost-effective, especially for beginners.
  2. Compared with traditional databases, TDSQL-C Serverless bills more flexibly. You pay for the resources you actually use, which avoids the cost of keeping a server running for long periods. It can also pause automatically when idle, further reducing unnecessary costs.

Shortcomings

  1. TDSQL-C Serverless allocates and starts resources only when a request arrives, so the first request may see some delay. In scenarios with strict real-time requirements, this delay may affect the user experience.
  2. Compared with traditional databases, TDSQL-C Serverless offers fewer configuration and tuning options, and users have limited control over the underlying resources. As a result, some specific requirements may not be met.
  3. TDSQL-C Serverless can scale compute automatically with demand, but high-concurrency traffic may drive costs up. A large burst of concurrent requests in a short period may incur additional charges.

Note that these three shortcomings are only guesses based on my experience. Please correct me if any of them are wrong!
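The first shortcoming suggests a defensive pattern. When auto-pause is enabled, the first connection after an idle period may be slow, or may fail while the instance resumes. The sketch below is a retry wrapper that doubles the wait after each failure. You pass it a zero-argument connect function, for example `lambda: pymysql.connect(**db_config)`, together with the exception types worth retrying, for example `(pymysql.err.OperationalError,)`. The helper name connect_with_retry, the attempt count and the delays are our own assumptions, not Tencent Cloud recommendations.

```python
import time


def connect_with_retry(connect, retry_on=(Exception,), attempts=5, first_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure.

    connect  -- zero-argument callable, e.g. lambda: pymysql.connect(**db_config)
    retry_on -- exception types worth retrying, e.g. (pymysql.err.OperationalError,)
    sleep    -- injectable so the back-off can be tested without real waiting
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == attempts:
                raise  # give up and surface the last error
            sleep(delay)
            delay *= 2
```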


Origin blog.csdn.net/qq_33681891/article/details/132211647