[Tencent Cloud TDSQL-C Serverless Product Experience] Use Python to write data to and read data from TDSQL-C and generate a word cloud

Introduction to TDSQL-C Serverless

TDSQL-C is a new generation of cloud-native relational database independently developed by Tencent Cloud.

It combines the advantages of traditional databases, cloud computing and new hardware technologies, is 100% compatible with MySQL, and provides users with database services with extreme elasticity, high performance, high availability, high reliability and security.

TDSQL-C delivers a throughput of more than one million queries per second (QPS), supports PB-level distributed intelligent storage, and can scale serverless resources within seconds, which helps enterprises accelerate their digital transformation.

Its Serverless service is built on TDSQL-C MySQL Edition, Tencent Cloud's self-developed new-generation cloud-native relational database, and gives the database a fully serverless architecture.

Serverless services are billed for the computing and storage resources actually used. When the database is not in use, there is no charge, which makes Tencent Cloud's cloud-native technology accessible to all users.

Introduction to applicable scenarios

Because this type of database is pay-as-you-go, it is well suited to testing and R&D environments. Its elastic scaling also suits workloads with obvious fluctuations. Cloud development for mini programs and website building for small businesses are other good candidates.

Database purchase

  1. Here is a brief introduction on how to find and purchase this database:
  • Search for Tencent Cloud, register an account, and log in

  • Enter "TDSQL-C MySQL" in the search box and click Search

  • Click Buy Now

  • Adjust the configuration as needed, and make sure to select Serverless as the instance form

  • Configure the TDSQL-C cluster

  • Connect to the database using the instance information

  2. If you just want to try out the features, you can get a free trial through the following link:

https://mc.tencent.com/uQHh7pDI
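The "connect to the database" step above can be sketched in Python with PyMySQL, the same driver used later in this article. This is only a minimal sketch: `build_db_config` and `check_connection` are hypothetical helper names, and the host, port, and credentials in the usage comment are placeholders that you would replace with the values shown for your own instance.

```python
def build_db_config(host, port, user, password, database=None):
    """Collect connection settings in the shape pymysql.connect() accepts."""
    config = {
        "host": host,
        "port": int(port),
        "user": user,
        "password": password,
        "charset": "utf8mb4",
        "connect_timeout": 10,
    }
    if database:
        config["database"] = database
    return config


def check_connection(config):
    """Connect, run SELECT VERSION(), and return the version string."""
    import pymysql  # imported lazily so build_db_config works without the driver

    conn = pymysql.connect(**config)
    try:
        with conn.cursor() as cursor:
            cursor.execute("SELECT VERSION()")
            return cursor.fetchone()[0]
    finally:
        conn.close()


# Usage (placeholder values, replace them with your own instance information):
# cfg = build_db_config("your-instance-host", 3306, "root", "your-password")
# print(check_connection(cfg))
```

If the call returns a version string, the network path and the credentials are both working, which is worth confirming before starting the stress test.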

Database stress test

  1. sysbench installation

Let's run a simple stress test with sysbench to look at some of the database's performance metrics.

Install it with the following commands:

curl -s https://packagecloud.io/install/repositories/akopytov/sysbench/script.rpm.sh | bash

yum install -y sysbench

Verify the installation with the following command:

sysbench --version

  2. Writing stress test data

The following command creates 20 tables and fills each one with 1 million rows of test data. Change the host, port, user, and password to match your own instance; the number of tables and rows can also be adjusted as needed.

sysbench --db-driver=mysql --time=300 --threads=10 --report-interval=1 --mysql-host=gz-cynosdbmysql-grp-d27hp6vl.sql.tencentcdb.com --mysql-port=27529 --mysql-user=root --mysql-password=password --mysql-db=experience-15 --tables=20 --table_size=1000000 oltp_read_write --db-ps-mode=disable prepare

  3. Overall read/write test

This test measures the combined read/write TPS of the database using the oltp_read_write mode.

The following command prints the stress test results to the console. If you prefer, you can also redirect the output to a file.

Because the database is accessed over the public network here, the results only illustrate the testing approach. If you are interested, you can repeat the test over the private network.

sysbench --db-driver=mysql --time=300 --threads=10 --report-interval=1  --mysql-host=gz-cynosdbmysql-grp-d27hp6vl.sql.tencentcdb.com --mysql-port=27529 --mysql-user=root --mysql-password=password --mysql-db=experience-15 --tables=20 --table_size=1000000 oltp_read_write --db-ps-mode=disable run
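Because `--report-interval=1` makes sysbench print one summary line per second, it can be handy to turn that console output into numbers. The sketch below assumes the interval-line layout printed by sysbench 1.0.x (`[ 1s ] thds: ... tps: ... qps: ... lat (ms,95%): ...`); other versions may format the line differently. The function names are hypothetical and the sample line is made up for illustration, not a real measurement.

```python
import re

# Pattern for one interval line, assuming the sysbench 1.0.x output layout.
INTERVAL_RE = re.compile(
    r"\[\s*(?P<second>\d+)s\s*\]\s+thds:\s*(?P<threads>\d+)\s+"
    r"tps:\s*(?P<tps>[\d.]+)\s+qps:\s*(?P<qps>[\d.]+).*?"
    r"lat \(ms,\s*(?P<pct>\d+)%\):\s*(?P<latency_ms>[\d.]+)"
)


def parse_interval_line(line):
    """Return the metrics of one interval line, or None for any other line."""
    m = INTERVAL_RE.search(line)
    if m is None:
        return None
    return {
        "second": int(m.group("second")),
        "threads": int(m.group("threads")),
        "tps": float(m.group("tps")),
        "qps": float(m.group("qps")),
        "latency_percentile": int(m.group("pct")),
        "latency_ms": float(m.group("latency_ms")),
    }


def summarize(lines):
    """Average TPS and worst percentile latency over all interval lines."""
    rows = [r for r in map(parse_interval_line, lines) if r is not None]
    if not rows:
        return None
    return {
        "samples": len(rows),
        "avg_tps": sum(r["tps"] for r in rows) / len(rows),
        "max_latency_ms": max(r["latency_ms"] for r in rows),
    }


# Made-up sample line, used only to show the expected input shape.
sample = ("[ 1s ] thds: 10 tps: 100.00 qps: 2000.00 "
          "(r/w/o: 1400.00/400.00/200.00) lat (ms,95%): 50.00 "
          "err/s: 0.00 reconn/s: 0.00")
print(parse_interval_line(sample))
```

If the console output was redirected to a file (say, a hypothetical `run.log`), passing the open file to `summarize` would give a quick overview of the whole run.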

Console stress test data:

Note:
When using sysbench to run read/write tests against the database, keep the following points in mind:

    1. Choose a test mode that matches the actual business scenario, such as sequential read/write or random read/write.
    2. Adjust the number of threads and the test duration, increasing the pressure gradually until you find the database's bottleneck.
    3. Reload the test data before and after each test so that caching does not skew the results.
    4. Test with different database parameters, such as buffer pool size and index settings.
    5. Record metrics under each level of pressure, such as TPS, latency, and resource utilization.
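The advice above to raise the pressure gradually and to record metrics at each level can be scripted. The following is only a sketch: the option names are the same ones used in the sysbench commands in this article, while the function names, host, credentials, and thread counts are placeholders made up for illustration.

```python
import shlex


def build_sysbench_cmd(test, command, *, host, port, user, password, db,
                       threads=10, seconds=300, tables=20, table_size=1000000):
    """Return the sysbench argument list for one prepare/run/cleanup step."""
    return [
        "sysbench",
        "--db-driver=mysql",
        f"--time={seconds}",
        f"--threads={threads}",
        "--report-interval=1",
        f"--mysql-host={host}",
        f"--mysql-port={port}",
        f"--mysql-user={user}",
        f"--mysql-password={password}",
        f"--mysql-db={db}",
        f"--tables={tables}",
        f"--table_size={table_size}",
        test,
        "--db-ps-mode=disable",
        command,
    ]


def thread_sweep(test, thread_counts, **options):
    """Yield (threads, argv) pairs, one 'run' step per concurrency level."""
    for n in thread_counts:
        yield n, build_sysbench_cmd(test, "run", threads=n, **options)


# Placeholder connection values: replace them with your own instance details.
conn = dict(host="your-instance-host", port=3306, user="root",
            password="your-password", db="your-database")
for n, argv in thread_sweep("oltp_read_write", [10, 20, 40], seconds=60, **conn):
    # Only print each command here; on a machine with sysbench installed,
    # argv could be passed to subprocess.run() to execute the step.
    print(n, shlex.join(argv))
```

Keeping every option except the thread count fixed makes the runs comparable, so a drop in TPS or a jump in latency can be attributed to the added concurrency.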

  4. Read-only performance test

To test the read-only performance of the database, use the oltp_read_only mode and execute the following command:

sysbench --db-driver=mysql --time=300 --threads=10 --report-interval=1 --mysql-host=gz-cynosdbmysql-grp-d27hp6vl.sql.tencentcdb.com --mysql-port=27529 --mysql-user=root --mysql-password=password --mysql-db=experience-15 --tables=20 --table_size=1000000 oltp_read_only --db-ps-mode=disable run

  5. Insert performance test

To test the insert performance of the database, use the oltp_insert mode with the following command:

sysbench --db-driver=mysql --time=300 --threads=10 --report-interval=1 --mysql-host=gz-cynosdbmysql-grp-d27hp6vl.sql.tencentcdb.com --mysql-port=27529 --mysql-user=root --mysql-password=password --mysql-db=experience-15 --tables=20 --table_size=1000000 oltp_insert --db-ps-mode=disable run

Tencent Cloud also publishes official performance test data that you can use as a reference.

Hands-on practice

Use Python to write data to and read data from TDSQL-C and generate a word cloud

The overall steps are as follows:

  1. Prepare the Python environment and install the dependency packages
pip install PyMySQL==1.1.0
pip install pandas==2.0.1
pip install wordcloud==1.9.1.1
pip install numpy==1.23.5
pip install matplotlib==3.7.2
pip install Pillow==9.5.0
pip install openpyxl
  2. Configure the database connection information
  3. Create a function to read the Excel files
  4. Create a database table named after each Excel file
  5. Save the data read from Excel into the corresponding database table
    Once the Excel data is stored, the tables can be used just like those in a conventional database.

  6. Read the data stored in the database
  7. Run the functions and generate the word cloud images
    One word cloud image is generated for each table.
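The step that names each table after its Excel file deserves a little care, because file names can contain spaces, hyphens, or backticks that are not valid in an unquoted MySQL identifier. One option is to quote the name before building SQL with it. The helper names below are hypothetical and are not part of the article's code; the quoting rule itself (wrap the name in backticks and double any embedded backtick) is standard MySQL identifier quoting.

```python
import os


def table_name_from_path(file_path):
    """Use the file name without its extension as the table name."""
    return os.path.splitext(os.path.basename(file_path))[0]


def quote_identifier(name):
    """Quote a MySQL identifier: wrap in backticks, double inner backticks."""
    if not name or len(name) > 64:
        raise ValueError("MySQL identifiers must be 1 to 64 characters long")
    return "`" + name.replace("`", "``") + "`"


def build_create_table(table_name, columns):
    """Build a CREATE TABLE statement from {column_name: column_type}."""
    cols = ", ".join(f"{quote_identifier(c)} {t}" for c, t in columns.items())
    return f"CREATE TABLE IF NOT EXISTS {quote_identifier(table_name)} ({cols})"


# Example with a made-up file name that contains a space and a hyphen.
print(build_create_table(table_name_from_path("词频/my words-2023.xlsx"),
                         {"word": "VARCHAR(255)", "count": "VARCHAR(255)"}))
```

The same `quote_identifier` helper could be applied to column names in the INSERT and SELECT statements as well.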

The complete code is as follows:

import pymysql
import pandas as pd
import os
import wordcloud
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# MySQL database connection settings
db_config = {
    'host': "gz-cynosdbmysql-grp-d27hp6vl.sql.tencentcdb.com",  # host name
    'port': 27529,  # port
    'user': "root",  # account
    'password': "pass",  # password
    'database': 'experience-16',  # database name
}


def create_table(table_name, columns):
    # Open a connection to the MySQL database
    conn = pymysql.connect(**db_config)
    cursor = conn.cursor()
    # Assemble the CREATE TABLE statement
    query = f"CREATE TABLE IF NOT EXISTS {table_name} ("
    for col_name, col_type in columns.items():
        query += f"{col_name} {col_type}, "
    query = query.rstrip(", ")  # strip the trailing comma and space
    query += ")"

    # Execute the CREATE TABLE statement
    cursor.execute(query)

    # Commit the transaction and close the connection
    conn.commit()
    cursor.close()
    conn.close()


def excelTomysql():
    path = '词频'  # folder that holds the Excel files (the name means "word frequency")
    files = [path + "/" + i for i in os.listdir(path)]  # list the files in the folder and build their full paths
    for file_path in files:
        print(file_path)
        filename = os.path.basename(file_path)
        table_name = os.path.splitext(filename)[0]  # use the file name without its extension as the table name
        # Read the Excel file with pandas
        data = pd.read_excel(file_path, engine="openpyxl", header=0)  # assumes the first row holds the column names
        columns = {col: "VARCHAR(255)" for col in data.columns}  # generate column names and types dynamically

        create_table(table_name, columns)  # create the table
        save_to_mysql(data, table_name)  # save the data to MySQL, using the file name as the table name
        print(filename + ' uploaded and saved to MySQL successfully')


def save_to_mysql(data, table_name):
    # Open a connection to the MySQL database
    conn = pymysql.connect(**db_config)
    cursor = conn.cursor()
    # Write the data into the MySQL table (assumes the workbook has a single sheet)
    for index, row in data.iterrows():
        query = f"INSERT INTO {table_name} ("
        for col_name in data.columns:
            query += f"{col_name}, "
        query = query.rstrip(", ")  # strip the trailing comma and space
        query += ") VALUES ("
        values = tuple(row)
        query += ("%s, " * len(values)).rstrip(", ")  # generate one placeholder per value
        query += ")"
        cursor.execute(query, values)

    # Commit the transaction and close the connection
    conn.commit()
    cursor.close()
    conn.close()


def query_data():
    # Open a connection to the MySQL database
    conn = pymysql.connect(**db_config)
    cursor = conn.cursor()
    # Query all table names
    cursor.execute("SHOW TABLES")
    tables = cursor.fetchall()

    dic_list = []
    table_name_list = []
    for table in tables:
        # for table in [tables[-1]]:
        table_name = table[0]
        table_name_list.append(table_name)
        query = f"SELECT * FROM {table_name}"
        # Execute the query and fetch the result
        cursor.execute(query)
        result = cursor.fetchall()
        data = []  # reset for every table so that each table gets its own dictionary
        if len(result) > 0:
            columns = [desc[0] for desc in cursor.description]
            table_data = [{columns[i]: row[i] for i in range(len(columns))} for row in result]
            data.extend(table_data)
        # Map each word to its frequency; the tables are expected to have 'word' and 'count' columns
        dic = {}
        for i in data:
            dic[i['word']] = float(i['count'])
        dic_list.append(dic)

    conn.commit()
    cursor.close()
    conn.close()
    return dic_list, table_name_list


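if __name__ == '__main__':
    # excelTomysql() writes the Excel data to MySQL
    excelTomysql()
    print("Excel data written to MySQL successfully!")
    # query_data() reads the data back from MySQL, one dictionary per table, which is then drawn as a word cloud
    result_list, table_name_list = query_data()
    print("Data fetched from MySQL successfully!")
    os.makedirs("词云图", exist_ok=True)  # make sure the output folder (the name means "word cloud images") exists
    for i in range(len(result_list)):
        maskImage = np.array(Image.open('background.PNG'))  # background image used as the word cloud mask
        # Define the word cloud style
        wc = wordcloud.WordCloud(
            font_path='PingFangBold.ttf',  # font file
            mask=maskImage,  # mask image
            max_words=500,  # maximum number of words to show
            max_font_size=100)  # maximum font size
        # Generate the word cloud
        wc.generate_from_frequencies(result_list[i])  # build the word cloud from the dictionary
        # Save the image to the output folder
        wc.to_file("词云图/{}.png".format(table_name_list[i]))
        print("Word cloud [{}] saved successfully!".format(table_name_list[i] + '.png'))
        # Display the word cloud in a notebook
        plt.imshow(wc)  # show the word cloud
        plt.axis('off')  # hide the axes
        plt.show()  # display the figure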
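The `save_to_mysql` function above sends one INSERT per row. For larger spreadsheets, a common alternative is to build the statement once and hand the rows to the cursor's `executemany` method, which PyMySQL cursors provide. The sketch below is an assumption about how that could look, not the author's code: `build_insert_sql` and `insert_rows` are hypothetical names, and the stand-in cursor exists only so the snippet runs without a database.

```python
def build_insert_sql(table_name, columns):
    """Build one parameterized INSERT statement for the given columns."""
    cols = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO `{table_name}` ({cols}) VALUES ({placeholders})"


def insert_rows(cursor, table_name, columns, rows, batch_size=1000):
    """Send the rows in batches through executemany; return the row count."""
    sql = build_insert_sql(table_name, columns)
    total = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        cursor.executemany(sql, batch)
        total += len(batch)
    return total


class RecordingCursor:
    """Stand-in for a DB-API cursor that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def executemany(self, sql, params):
        self.calls.append((sql, list(params)))


# Made-up rows, used only to exercise the helper without a database.
cursor = RecordingCursor()
rows = [("cloud", "12"), ("database", "7"), ("python", "5")]
count = insert_rows(cursor, "demo_words", ["word", "count"], rows, batch_size=2)
print(count, len(cursor.calls))
```

With a pandas DataFrame, the rows could be produced with `list(data.itertuples(index=False, name=None))` and passed to `insert_rows` together with a real PyMySQL cursor, followed by a single `conn.commit()`.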

Summary

  1. Tencent Cloud TDSQL-C MySQL Serverless Edition is the first and largest MySQL serverless database product in China. Its biggest advantage is its highly elastic and flexible usage model: it is billed on actual usage, and there is no charge when it is not in use. This makes it a good fit for small and medium-sized enterprises or individual developers whose business volume fluctuates widely and unpredictably, because on-demand usage and billing greatly reduce cost and wasted resources. It is also 100% compatible with MySQL, so database queries, applications, and tools can be migrated smoothly with almost no code changes.

  2. TDSQL-C MySQL Serverless Edition is especially suitable for newly launched services or services whose business volume is hard to predict, as well as applications whose load fluctuates cyclically. Resources are adjusted in real time between peak and off-peak periods, with no need to reserve fixed capacity, which is both flexible and economical. With up to 400 TB of storage and automatic scaling, the serverless architecture easily copes with changing and continuously growing data volumes.

  3. Compared with traditional databases, TDSQL-C MySQL Serverless Edition can start, stop, and scale within seconds, adjusting flexibly to actual usage. Its pay-as-you-go billing is accurate to the second, so resources are not wasted.

  4. If the business runs mainly within the WeChat ecosystem, for example as WeChat mini programs, TDSQL-C MySQL Serverless Edition integrates deeply with that ecosystem and offers developers on WeChat platforms a one-stop back-end cloud database service, which makes development and operations convenient and efficient. Compute nodes can be quickly upgraded or downgraded according to business needs, with scaling completed in seconds. Combined with elastic storage, this optimizes the cost of computing resources.

  5. For existing databases or data, TDSQL-C MySQL Serverless Edition offers several fast migration options. Besides the Data Transmission Service (DTS) provided by Tencent Cloud, data can also be migrated with command-line tools such as mysqldump, so the whole migration process is fast and convenient.
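As a rough illustration of the mysqldump route mentioned in the last point, the sketch below builds the two commands involved: one to dump a database from the source server and one to replay the dump with the `mysql` client against the TDSQL-C instance. The host names, database name, and function names are placeholders made up for this example; the flags shown are standard mysqldump and mysql client options, but check them against the client version you have installed.

```python
import shlex


def build_mysqldump_cmd(host, port, user, database, single_transaction=True):
    """Return the argv for dumping one database from the source server."""
    argv = [
        "mysqldump",
        f"--host={host}",
        f"--port={port}",
        f"--user={user}",
        "--password",  # no value given, so mysqldump prompts for it
    ]
    if single_transaction:
        # Takes a consistent snapshot of InnoDB tables without locking them.
        argv.append("--single-transaction")
    argv += ["--databases", database]
    return argv


def build_import_cmd(host, port, user):
    """Return the argv for the mysql client that replays the dump."""
    return ["mysql", f"--host={host}", f"--port={port}", f"--user={user}",
            "--password"]


# Placeholder values: replace them with your own source and target details.
dump_cmd = build_mysqldump_cmd("source-host", 3306, "root", "your_database")
load_cmd = build_import_cmd("your-tdsql-c-host", 3306, "root")
print(shlex.join(dump_cmd), "> dump.sql")
print(shlex.join(load_cmd), "< dump.sql")
```

The script only prints the commands so they can be reviewed first; running them requires the MySQL client tools and network access to both servers.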

Serverless service architecture


Reposted from: blog.csdn.net/smallbird108/article/details/132297741