MySQL database development practice

Author: Zen and the Art of Computer Programming

1. Introduction

MySQL is one of the most popular relational database management systems (RDBMS). Whether you are building applications for a small business or operating a data center at a large Internet company, you are likely to rely on the power and stability of MySQL. This document aims to improve R&D personnel's understanding and mastery of MySQL by detailing how it is used in real production environments, and thereby to speed up the development of enterprise IT services.
　MySQL has mature commercial offerings and open source community support. Many well-known companies and organizations at home and abroad run their businesses on it. It is used worldwide in industries such as finance, government, telecommunications, transportation, and retail, and it continues to be updated and improved. It offers reliable, high-performance data processing and is widely used for data storage, data analysis, mobile applications, the Internet of Things, and other fields.
　This article analyzes MySQL from the following aspects and shares some problems encountered in actual work:
　　- MySQL data types and storage methods;
　　- Common MySQL SQL statements;
　　- MySQL transaction isolation levels and lock mechanism;
　　- MySQL optimization methods;
　　- MySQL sub-database and sub-table (sharding) schemes;
　　- MySQL query optimization for large data volumes;
　　- MySQL slow log troubleshooting techniques;
　　- MySQL master-slave replication configuration;
　　- MySQL read-write separation configuration;
　　- MySQL session parameter tuning;
　　- MySQL query cache configuration;
　　- MySQL cluster construction and maintenance.
In addition, this document will also provide a detailed explanation of the management, monitoring and optimization solutions for our internally deployed MySQL cluster. This article will serve as an important supplement to enterprise IT services and play a positive role in promoting the construction and development of enterprise IT technology.

2. Data type and storage method

2.1 Three normal forms of relational databases

Relational database design follows three standard requirements, applied in order: the first normal form (1NF), the second normal form (2NF), and the third normal form (3NF). 1NF requires every attribute to be atomic, that is, a column cannot be divided any further and holds a single value per row. 2NF additionally requires every non-key column to depend on the whole primary key rather than on only part of a composite key. 3NF additionally requires every non-key column to depend on the primary key directly, so there can be no functional dependencies between non-primary-key fields (no transitive dependencies).
　The data types supported by MySQL include integer types (TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT), floating-point types (FLOAT, DOUBLE), the fixed-point type DECIMAL, string types (CHAR, VARCHAR, TEXT), date and time types (DATE, DATETIME, TIMESTAMP), binary string types (BINARY, VARBINARY), and the JSON type. Numeric types differ in the range they can represent exactly: integer types with a larger range can store larger values but need more space, and DECIMAL stores exact decimal values while FLOAT and DOUBLE are approximate. CHAR is fixed-length, whereas VARCHAR and TEXT store variable-length strings; choosing a type much larger than the data needs wastes space and can slow queries.
　All tables in MySQL are composed of columns and rows (records). Each table should have a primary key, which uniquely identifies a record within that table. MySQL's storage engines include MyISAM and InnoDB. InnoDB, the default engine since MySQL 5.5, supports ACID transactions and crash recovery, and it writes redo and undo logs during inserts, deletes, and updates to ensure data integrity and consistency.
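　If an InnoDB table is created without a primary key, InnoDB uses the first UNIQUE index whose columns are all NOT NULL as the clustered index, and if there is none it generates a hidden clustered index on an internal row ID. Beyond the primary key and any UNIQUE or foreign key constraints, MySQL does not create indexes automatically, so in general it is recommended to create a combined (multi-column) index or a covering index for frequent queries. When creating a combined index, pay attention to column order: a query can use the index only through its leftmost columns, so the columns that queries filter on most often, usually the more selective ones, should come first. As the table grows, its indexes grow with it; the resulting performance degradation can be relieved by removing unused indexes, by capacity expansion (for example more memory), or by splitting the table (see Section 6).
　In a query, the conditions in the WHERE clause should avoid applying functions or expressions to indexed columns, for example DATE(create_time) = '2024-01-01' or price * 2 > 100 (hypothetical columns), because an ordinary index on that column can then no longer be used for the lookup, which slows the query. Rewrite such conditions so that the bare column is compared with a constant or a range. In addition, for paging, a LIMIT with a large OFFSET still reads and discards all skipped rows, so filtering on an indexed column such as the primary key before applying LIMIT is usually more efficient.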
　To protect data in a MySQL database, restrict access with the privilege system (GRANT and REVOKE) and encrypt sensitive information, for example with InnoDB data-at-rest encryption, which depends on a keyring plugin or component.

3. SQL operation statements

3.1 DDL - Data Definition Language

DDL (Data Definition Language) is responsible for defining database objects such as databases, tables, views, and indexes. It is distinct from DML (Data Manipulation Language), which operates on the data in tables with statements such as SELECT, INSERT, UPDATE, and DELETE. Common DDL and related statements:
　CREATE DATABASE creates a database: CREATE DATABASE test_db;
　DROP DATABASE deletes a database: DROP DATABASE test_db;
　CREATE TABLE creates a table: CREATE TABLE table_name (column1 datatype constraint, column2 datatype constraint, ...);
　DROP TABLE deletes a table: DROP TABLE table_name;
　ALTER TABLE modifies the table structure: ALTER TABLE table_name ADD COLUMN column_definition; ALTER TABLE table_name MODIFY COLUMN column_definition; ALTER TABLE table_name DROP PRIMARY KEY; ALTER TABLE table_name DROP FOREIGN KEY fk_name;
　CREATE INDEX creates an index: CREATE [UNIQUE] INDEX index_name ON table_name (column1, column2, ...);
　DROP INDEX deletes an index: DROP INDEX index_name ON table_name;
　SHOW CREATE DATABASE shows the statement that creates a database: SHOW CREATE DATABASE db_name;
　SHOW TABLES lists all tables in the current database: SHOW TABLES;
　DESCRIBE shows the structure of a table: DESCRIBE table_name;

4. Transaction isolation level and lock mechanism

4.1 Four attributes of transactions

A transaction is an indivisible unit of work. It is started with START TRANSACTION and ended with either COMMIT (submit the changes) or ROLLBACK (undo them).
　Four attributes of transactions:
　　- Atomicity: A transaction is an atomic operation: all of its operations either complete or none do, and it never ends at an intermediate step. If the transaction fails or is rolled back, the database returns to the state it was in before the transaction started.
　　- Consistency: The integrity constraints of the database are not violated before and after the transaction is run. This requires that the database must always be in a consistent state.
　　- Isolation: The modifications made by one transaction are not visible to other transactions until it commits. In other words, the operations and data used within a transaction are isolated from other concurrent transactions, and concurrently executed transactions cannot interfere with each other. How strictly this holds depends on the isolation level: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ (the InnoDB default), or SERIALIZABLE.
　　- Durability: Once a transaction is committed, the changes it makes to the database are saved permanently. Later operations or crash recovery do not undo them.
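Atomicity can be observed with a small experiment. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in because it needs no server, and the account table, its CHECK constraint, and the transfer helper are made up for illustration. Against MySQL the same pattern would go through a MySQL driver and an InnoDB table.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` inside one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # This statement fails the CHECK constraint when funds are short.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer(conn, 1, 2, 30)       # both updates are committed together
failed = transfer(conn, 1, 2, 500)  # the credit to account 2 is rolled back
print(ok, failed, balances(conn))
```

In the failed transfer the first UPDATE has already run when the second one is rejected, yet neither change survives, which is the all-or-nothing behavior described above.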

4.2 Lock mechanism

The lock mechanism is an important concept in computer science and the basis of database concurrency control. Locks are a mechanism used to control access to shared resources. Different locks protect different resources so that the database stays correct when multiple users access it at the same time.
　MySQL uses two main types of locks, row-level locks and table-level locks.
　- Row-level lock: Locks only the rows a statement touches; other transactions wait only when they need the same rows. The lock granularity is the smallest, the probability of lock conflict is the lowest, and the concurrency is the highest, at the price of the highest locking overhead.
　- Table-level lock: Locks the entire table at once, and other sessions that need a conflicting lock on the table must wait. The lock granularity is the largest, the probability of lock conflict is the highest, and the concurrency is the lowest, but the locking overhead is also the lowest.
　InnoDB uses row-level locking by default, with shared (S) and exclusive (X) locks on rows. Before locking a row, a transaction first takes an intention lock on the table: IS before an S row lock and IX before an X row lock. Intention locks do not block each other; they only let a request for a full table lock detect that row locks exist in the table.
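How the S, X, IS, and IX modes interact can be written down as a small lookup table. The sketch below encodes the table-level lock compatibility matrix as documented for InnoDB; the function names compatible and can_grant are hypothetical helpers for illustration, not part of MySQL.

```python
# Lock modes: IS and IX are table-level intention locks,
# S is a shared lock and X is an exclusive lock.
# The matrix is symmetric, so each pair is stored once.
COMPATIBLE = {
    ("X", "X"): False, ("X", "IX"): False, ("X", "S"): False, ("X", "IS"): False,
    ("IX", "IX"): True, ("IX", "S"): False, ("IX", "IS"): True,
    ("S", "S"): True, ("S", "IS"): True,
    ("IS", "IS"): True,
}

def compatible(requested, held):
    """Return True if `requested` can coexist with a lock already `held`."""
    if (requested, held) in COMPATIBLE:
        return COMPATIBLE[(requested, held)]
    return COMPATIBLE[(held, requested)]

def can_grant(requested, held_locks):
    """A request is granted only if it is compatible with every held lock."""
    return all(compatible(requested, held) for held in held_locks)

# Two writers on different rows both hold IX on the table: no conflict.
print(can_grant("IX", ["IX"]))
# A full-table shared lock must wait while any writer holds IX.
print(can_grant("S", ["IX", "IS"]))
```

The matrix shows why intention locks are cheap: IS and IX never conflict with each other, so transactions working on different rows do not block at the table level.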

4.3 MVCC

In the traditional lock-based concurrency control mechanism, a transaction that wants to read a row requests a shared (S) lock on it, and a transaction that wants to modify it requests an exclusive (X) lock. Only the transaction holding a lock can proceed; transactions requesting a conflicting lock must wait. This mode is called pessimistic concurrency control.
　In MVCC, each row can have multiple versions: every update creates a new version and keeps the previous one in the undo log, so multiple transactions can read the same row at the same time without blocking the writer. A consistent read uses a snapshot (Read View) that describes which transactions' changes are visible. Under REPEATABLE READ the snapshot is established by the first consistent read in the transaction; under READ COMMITTED each consistent read gets a fresh snapshot.
Writers still take X locks on the rows they modify, so two transactions cannot update the same row at the same time. Old row versions are not removed as soon as a transaction ends; a background purge removes them once no Read View needs them any longer.
In practice, because readers do not block writers and writers do not block readers, MVCC usually suits read-heavy, highly concurrent workloads better than locking alone.
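The visibility rule behind a snapshot can be illustrated with a toy model. This is a deliberately simplified sketch, not InnoDB's implementation: the class ToyMVCC and its methods are invented for illustration, the snapshot is taken when the transaction begins, and there are no locks, undo logs, or purge.

```python
import itertools

class ToyMVCC:
    """Keeps every version of a key and filters versions by snapshot."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._committed = set()
        self._versions = {}  # key -> list of (writer_trx_id, value), oldest first

    def begin(self):
        # The snapshot records which transactions had committed at this moment.
        return {"id": next(self._next_id), "visible": frozenset(self._committed)}

    def write(self, trx, key, value):
        self._versions.setdefault(key, []).append((trx["id"], value))

    def commit(self, trx):
        self._committed.add(trx["id"])

    def read(self, trx, key):
        # Walk from newest to oldest and return the first visible version.
        for writer, value in reversed(self._versions.get(key, [])):
            if writer == trx["id"] or writer in trx["visible"]:
                return value
        return None

db = ToyMVCC()
t1 = db.begin(); db.write(t1, "x", 1); db.commit(t1)
reader = db.begin()               # snapshot sees t1 only
writer = db.begin()
db.write(writer, "x", 2)
before_commit = db.read(reader, "x")
db.commit(writer)
after_commit = db.read(reader, "x")   # snapshot is unchanged
late_reader = db.begin()
print(before_commit, after_commit, db.read(late_reader, "x"))
```

The reader keeps seeing the value that was committed when its snapshot was taken, even after the writer commits, while a transaction that starts later sees the new value. Neither reader had to wait for the writer.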

5. Optimization methods

5.1 Index optimization

Indexing is the key to improving database query performance, but index design takes skill. Index design principles include selectivity, uniqueness, fast filtering, and joint indexes (indexes on multiple columns).

Selectivity: The higher the selectivity of an index, the fewer rows each lookup has to examine and the more efficient the query. Selectivity is the ratio of the number of distinct values in the indexed column to the total number of rows in the table: selectivity = number of distinct values in indexed column / number of rows. A unique column has selectivity 1, the best possible value.
Uniqueness: For a unique index, identical values are not allowed in the corresponding columns. This ensures the uniqueness of the index.
Quick filtering: When range conditions (such as ">", "<") are used on an indexed column, the optimizer can use the index to filter quickly, provided the range is narrow enough for the index to be cheaper than a table scan.
Joint index: An index created on multiple columns is called a joint index. A joint index is matched from the leftmost column first, then the middle columns, and finally the rightmost column, so a query benefits only if its conditions include a leftmost prefix of the index columns.
Through the above principles, a suitable index can be designed. In addition, optimization strategies such as partitioning, clustered indexes, and covering indexes can also be considered.
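Two of these principles are easy to check by hand. The sketch below takes selectivity as the number of distinct values divided by the number of rows, so values close to 1 are the most selective, and models the leftmost-prefix rule for equality conditions only. Both function names are hypothetical helpers, and the real optimizer considers more than this.

```python
def selectivity(values):
    """Distinct values divided by total rows; 1.0 means every row is unique."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)

def usable_prefix(index_columns, equality_columns):
    """Leading columns of a joint index that equality conditions can use."""
    used = []
    for column in index_columns:
        if column not in equality_columns:
            break  # a gap ends the usable prefix
        used.append(column)
    return used

# A gender-like column is a poor index candidate, an ID column a good one.
print(selectivity(["M", "F", "M", "F"]))   # 0.5
print(selectivity([101, 102, 103, 104]))   # 1.0

# Joint index on (a, b, c): conditions on a and c can only use column a.
print(usable_prefix(("a", "b", "c"), {"a", "c"}))   # ['a']
print(usable_prefix(("a", "b", "c"), {"b", "c"}))   # []
```

In a real table the same ratio can be measured with COUNT(DISTINCT column) divided by COUNT(*).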

5.2 SQL optimization

SQL optimization involves the generation of SQL execution plans, query parsing, query optimization, use of query cache, maintenance of slow query logs, maintenance of statistical information, etc.
　- Query optimizer: The query optimizer determines the execution order of a query, determines which indexes to access, how to utilize the indexes, etc. The goal of the optimizer is to find the best execution plan for a query, thereby minimizing response time.
　- SQL rewriting: For some common query patterns, the optimizer automatically rewrites the query into an equivalent form with a more efficient execution plan, for example by converting certain subqueries into joins. Queries that the optimizer handles poorly can be rewritten by hand.

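Query cache: Before MySQL 8.0, MySQL provides a query cache that stores the result of a SELECT statement and returns it when the identical statement is issued again. Any write to a table invalidates all cached results for that table, so in write-heavy, high-concurrency scenarios the hit rate is low and the cache itself becomes a point of contention; in that case, turn the query cache off. The query cache has no timeout setting. It was deprecated in MySQL 5.7.20 and removed in MySQL 8.0.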
Slow query log: Slow query log records those queries whose response time exceeds a certain threshold. Analyzing these queries can help locate problems and improve database performance.
Statistical information: The optimizer relies on statistical information for query optimization, such as estimating the disk IO, CPU consumption, network transmission, sorting and aggregation operation costs of a query, and selecting an execution plan with the least cost.

6. Sub-database and sub-table

When the amount of data exceeds the processing capabilities of a single database, performance bottlenecks will occur. At this time, the database can be split horizontally, that is, the data is distributed to different databases, so that the amount of data in each database will be reduced.
The typical horizontal splitting method is to divide the rows of one large table by a sharding key, such as a user ID, so that every shard has the same table structure but holds only part of the rows.
In MySQL, range sharding can be used to divide databases and tables. Range sharding allocates contiguous ranges of the sharding key to shards, with each shard responsible for one range. It makes range queries and adding new shards easy, but it is prone to data skew. Data skew refers to the uneven distribution of data across shards, so that a large number of requests concentrate on a few shards (for example the one holding the newest range) and query efficiency drops.
Range sharding requires knowing the data range in advance. If the range is unknown, the data is not easy to divide, or skew is a concern, hash sharding can be used instead: a hash of the sharding key decides the shard, which spreads the data more evenly.
In addition, vertical splitting can be used, that is, dividing by business module or data model so that the data of different entities lives in different databases. For example, user data and order data can be stored in separate databases. This keeps the related data of one module together, reduces the load on each database, and makes better use of the cache.
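The two horizontal routing strategies can be sketched in a few lines. This is an illustration under assumptions: the function names, the boundary values, and the choice of CRC32 as the hash are all made up here, and a production system would also need to handle resharding.

```python
import bisect
import zlib

def hash_shard(key, shard_count):
    """Map a key to a shard by hashing it."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(str(key).encode("utf-8")) % shard_count

def range_shard(key, upper_bounds):
    """Map a key to a shard by range.

    `upper_bounds` is a sorted list of exclusive upper bounds; keys at or
    above the last bound go to one extra, final shard.
    """
    return bisect.bisect_right(upper_bounds, key)

bounds = [1000, 2000]  # shard 0: key < 1000, shard 1: 1000..1999, shard 2: the rest
print([range_shard(k, bounds) for k in (1, 999, 1000, 1999, 2000)])

# Newly generated, ever-increasing IDs all land on the last range shard,
# which is the data skew described above.
print([range_shard(k, bounds) for k in (5000, 5001, 5002)])

# The same IDs routed by hash are assigned independently of their order.
print([hash_shard(k, 4) for k in (5000, 5001, 5002)])
```

Range routing keeps neighboring keys together, which helps range queries, while hash routing trades that locality for a more even spread.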

7. Large data query optimization

MySQL can solve query efficiency problems by dividing large data query tasks.

LIMIT clause paging: The LIMIT clause limits the number of rows returned, which avoids loading an excessive result set into memory at once. For deep pages, combine it with a condition on an indexed column rather than a large OFFSET.
UNION ALL clause merges query results: The UNION ALL clause combines the results of two or more SELECT statements into one result set without removing duplicates, which makes it cheaper than UNION.
IN clause conditional filtering: The IN clause matches a column against a list of values in one statement, so several point lookups on an index can be done in a single query instead of many.
WHERE clause index scan: Equality conditions, range conditions, and LIKE conditions with a fixed prefix (such as 'abc%') in the WHERE clause can use an index; a LIKE pattern that starts with a wildcard cannot.
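Dividing a large query into pages can be sketched as follows. The example pages by primary key, remembering the last ID seen instead of using a growing OFFSET. It is only a sketch: it runs against Python's built-in sqlite3 as a stand-in, the item table is made up, IDs are assumed to be positive, and a MySQL driver would typically use %s placeholders instead of ?.

```python
import sqlite3

def fetch_page(conn, last_id, page_size):
    """Return the next page of (id, name) rows with id greater than last_id."""
    return conn.execute(
        "SELECT id, name FROM item WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()

def iterate_pages(conn, page_size):
    """Yield pages until the table is exhausted."""
    last_id = 0  # assumes IDs start above 0
    while True:
        page = fetch_page(conn, last_id, page_size)
        if not page:
            return
        yield page
        last_id = page[-1][0]  # continue after the last row seen

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 11)],
)
conn.commit()

pages = list(iterate_pages(conn, 4))
print([len(page) for page in pages])
```

Each page is found through the primary key index, so the cost of a page does not grow with how far into the table it lies.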

8. Slow query troubleshooting

When a MySQL slow query occurs, first check whether the slow query log is turned on (the slow_query_log variable). If it is, check the slow query log file (slow_query_log_file).
The slow query log records SQL requests in the database that take longer to execute than the long_query_time setting. The default value of long_query_time is 10 seconds; it can be changed in the configuration file my.cnf or at runtime with SET GLOBAL.
Each slow query log entry records the time, the user and client host, the query time, the lock time, the number of rows sent and examined, and the SQL statement.
There are several ways to troubleshoot slow queries:

Analysis based on slow query logs: Check the size of the slow query log files, segment them by time window, find the SQL statements that appear most often or run longest (for example with mysqldumpslow), analyze their execution plans, and look for unsuitable indexes or poorly written SQL.
Analysis through SHOW PROFILE: SHOW PROFILE outputs the time spent in each stage of a statement's execution and, on request, resource details such as CPU, block I/O, and context switches. Use it to see which stage dominates the execution time. SHOW PROFILE is deprecated in favor of the Performance Schema.
Combine SHOW PROCESSLIST and EXPLAIN: use SHOW PROCESSLIST to see the running threads with their current statements and states, and use EXPLAIN to analyze the execution plan of a suspicious statement.
Analyze inside the server: The Performance Schema and the sys schema collect performance information about the MySQL service, and the optimizer trace (the optimizer_trace system variable) traces how the optimizer chose the plan for a statement.
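Log-based analysis can start with a small parser. The sketch below reads the Query_time, Lock_time, Rows_sent, and Rows_examined header line of each entry and the statement that follows it. The sample log text is made up for illustration, and the parser is an assumption about the common file format rather than a complete implementation; tools such as mysqldumpslow handle more variations.

```python
import re

HEADER = re.compile(
    r"^# Query_time: (?P<query_time>[\d.]+)\s+Lock_time: (?P<lock_time>[\d.]+)\s+"
    r"Rows_sent: (?P<rows_sent>\d+)\s+Rows_examined: (?P<rows_examined>\d+)"
)

def parse_slow_log(text):
    """Return one dict per slow log entry found in `text`."""
    entries = []
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = {
                "query_time": float(match["query_time"]),
                "lock_time": float(match["lock_time"]),
                "rows_sent": int(match["rows_sent"]),
                "rows_examined": int(match["rows_examined"]),
                "sql": "",
            }
            entries.append(current)
        elif current is not None and not line.startswith(("#", "SET timestamp")):
            # Remaining lines of the entry are the statement itself.
            current["sql"] = (current["sql"] + " " + line.strip()).strip()
    return entries

# Made-up sample in the usual slow query log layout.
SAMPLE_LOG = """\
# Time: 2024-01-01T10:00:00.000000Z
# User@Host: app[app] @ localhost []  Id:     8
# Query_time: 12.500000  Lock_time: 0.000100 Rows_sent: 10  Rows_examined: 500000
SET timestamp=1704103200;
SELECT * FROM orders WHERE status = 'open';
# Time: 2024-01-01T10:05:00.000000Z
# User@Host: app[app] @ localhost []  Id:     9
# Query_time: 3.200000  Lock_time: 0.000000 Rows_sent: 1  Rows_examined: 120
SET timestamp=1704103500;
SELECT name FROM users WHERE id = 7;
"""

entries = parse_slow_log(SAMPLE_LOG)
worst = max(entries, key=lambda entry: entry["rows_examined"])
print(worst["sql"], worst["rows_examined"])
```

A large gap between rows examined and rows sent, as in the first sample entry, is a typical sign of a missing or unused index.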

9. Master-slave replication

In MySQL, master-slave replication is used to achieve high availability and to scale reads. It requires two or more servers.
The master server handles all write operations and sends its changes to the slave servers, which can serve read operations. When the master server fails, a slave server can be promoted to take over; in plain replication this failover is not automatic and needs a manual step or an external tool.
Advantages of master-slave replication:

Read and write separation: Improve the load capacity of the database and improve the read performance of the database.
Disaster recovery: When a problem occurs on the master server, it can be switched to the slave server to ensure service availability.
Data redundancy: Master-slave replication can provide data redundancy backup. When the data on the master server is lost, the data on the slave server can be used for quick recovery.
Disadvantages of master-slave replication:
Latency: Replication is asynchronous by default, so it takes some time for changes to reach the slave server, and the delay grows under heavy write load or poor network conditions.
Consistency: While the slave server lags behind the master server, reads from the slave return stale data, and changes not yet replicated can be lost if the master fails.

10. Separation of reading and writing

In MySQL, read-write separation relieves pressure on the database server: writes go to the master server, and reads are spread over one or more slave servers, so heavy read traffic no longer competes with writes on a single machine.
Advantages of reading and writing separation:

Throughput improvement: The separation of read and write can significantly improve the throughput of the database server, and read requests will not put the database server under too much pressure.
Less resource competition: Reads and writes run on different servers, so long-running read queries no longer compete with writes for locks, CPU, and I/O on the master server.
Disadvantages of read-write separation:
Data inconsistency: Because of replication delay, a read from a slave server may not yet see data just written to the master server.
Complex configuration and management: Read-write separation requires routing logic in the application or a proxy as well as replication between the servers, which makes the system harder to configure and operate.
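The routing decision at the heart of read-write separation can be sketched as below. ReadWriteRouter is a hypothetical helper, not a MySQL or driver API, and the server names are placeholders. It assumes that reads can be recognized by their first keyword and that anything inside a transaction must stay on the master.

```python
import itertools

class ReadWriteRouter:
    """Choose a server name for each statement."""

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; None means reads fall back to the master.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql, in_transaction=False):
        text = sql.strip().upper()
        is_read = text.startswith(("SELECT", "SHOW")) and "FOR UPDATE" not in text
        if is_read and not in_transaction and self._slaves is not None:
            return next(self._slaves)
        # Writes, locking reads, and reads inside a transaction go to the master.
        return self.master

router = ReadWriteRouter("master", ["slave1", "slave2"])
print(router.route("SELECT * FROM orders"))
print(router.route("SELECT * FROM orders"))
print(router.route("UPDATE orders SET status = 'paid' WHERE id = 1"))
print(router.route("SELECT * FROM orders WHERE id = 1", in_transaction=True))
```

Keeping transactional reads on the master is one simple way to avoid reading stale data right after a write, at the cost of sending more traffic to the master.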

11. Session parameter tuning

MySQL server parameters are used to control the behavior of the database server. By adjusting these parameters, you can optimize the operation of the MySQL server.

max_connections parameter: This parameter sets the maximum number of simultaneous client connections the MySQL server accepts. The default value is 151. Once this limit is reached, new connections are refused with a "Too many connections" error. Raising it allows more clients but requires more memory, because each connection has its own buffers.
thread_cache_size parameter: This parameter sets how many threads the server keeps in its thread cache for reuse. By default it is sized automatically from max_connections (8 + max_connections / 100, capped at 100). When the cache is empty, the server must create a new thread for each new connection, which costs CPU time.
tmp_table_size and max_heap_table_size parameters: tmp_table_size limits the size of internal in-memory temporary tables, and max_heap_table_size limits the size of user-created MEMORY (heap) tables. Both default to 16M. An internal temporary table that outgrows the limit (for the MEMORY engine, the smaller of the two values) is converted to an on-disk temporary table, which slows the query down.
key_buffer_size and innodb_buffer_pool_size parameters: key_buffer_size sets the size of the key cache for MyISAM index blocks, and innodb_buffer_pool_size sets the size of the InnoDB buffer pool, which caches InnoDB data and indexes. By default, innodb_buffer_pool_size is 128M and key_buffer_size is 8M. The two caches are independent; on a server that mainly uses InnoDB, innodb_buffer_pool_size is the parameter that matters most.
sort_buffer_size and read_rnd_buffer_size parameters: These two parameters set the size of the sort buffer and the random read buffer respectively. Both are allocated per session when needed and default to 256K, so large values multiply across connections.
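Because some of these buffers are global and others are allocated per connection, it helps to estimate how they add up before raising them. The following is a rough upper-bound sketch only: the function name is hypothetical, the numbers are example inputs to be replaced with your server's actual settings, and real memory use includes many buffers not listed here.

```python
KB = 1024
MB = 1024 * KB

def estimate_peak_memory(global_buffers, per_session_buffers, max_connections):
    """Global buffers plus per-session buffers for every allowed connection."""
    return (
        sum(global_buffers.values())
        + max_connections * sum(per_session_buffers.values())
    )

# Example inputs only; read the real values with SHOW VARIABLES.
global_buffers = {
    "innodb_buffer_pool_size": 128 * MB,
    "key_buffer_size": 8 * MB,
}
per_session_buffers = {
    "sort_buffer_size": 256 * KB,
    "read_rnd_buffer_size": 256 * KB,
}

peak = estimate_peak_memory(global_buffers, per_session_buffers, 151)
print(f"{peak / MB:.1f} MB")  # 211.5 MB
```

The estimate makes the trade-off visible: doubling a per-session buffer costs that amount once for every connection, while doubling a global buffer costs it only once.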

12. Query cache

The MySQL query cache is mainly used to improve the query efficiency of the database server. When the query cache is turned on, the result of an identical query is returned directly from the cache instead of re-executing the query. This can help read-mostly workloads whose tables rarely change. The query cache exists only in versions before MySQL 8.0, where it was removed.
How to enable query cache:

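query_cache_type parameter: This parameter sets the query cache mode. It accepts three values: 0 (OFF) disables caching, 1 (ON) caches all cacheable results except statements that start with SELECT SQL_NO_CACHE, and 2 (DEMAND) caches only statements that start with SELECT SQL_CACHE. Results are cached in memory only, within the size set by query_cache_size. In MySQL 5.6.8 and later, the default is OFF.
Qcache_hits and Qcache_inserts status variables: Qcache_hits counts query cache hits, and Qcache_inserts counts results added to the cache after a miss. View them with SHOW STATUS LIKE 'Qcache%'.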

13. Cluster construction

The cluster building process of MySQL is relatively complicated, especially in complex environments. We can refer to the article "MySQL Cluster Solution Based on Docker" to build it.

14. Monitoring and Optimization

IT operation and maintenance engineers need to monitor the health of the database, the usage of database resources, the query efficiency of the database, etc. Monitoring data can be fed back to the DBA and targeted optimization measures can be taken.
Monitoring data includes:

Server performance indicators: CPU, memory, disk, network, etc. These data reflect the hardware performance of the server and can determine whether the server is abnormal.
Database performance indicators: TPS, QPS, number of connections, connection pool status, SQL execution efficiency, etc. These data can reflect the overall performance of the database.
Database operation events: login failure, SQL execution error, etc. These events reflect the operating status of the database.
There are many directions for database performance optimization, including index optimization, SQL optimization, stored procedure optimization, etc.
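Indicators such as QPS and TPS are usually derived from two samples of the server's cumulative status counters. The sketch below assumes the counters come from SHOW GLOBAL STATUS, uses Questions for QPS, and counts Com_commit plus Com_rollback as transactions, which is one common convention rather than the only definition. The function name and sample numbers are made up for illustration.

```python
def rates(before, after, interval_seconds):
    """Compute QPS and TPS from two samples of status counters."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    queries = after["Questions"] - before["Questions"]
    transactions = sum(
        after[name] - before[name] for name in ("Com_commit", "Com_rollback")
    )
    return {
        "qps": queries / interval_seconds,
        "tps": transactions / interval_seconds,
    }

# Two made-up samples taken 60 seconds apart.
sample_1 = {"Questions": 1000, "Com_commit": 100, "Com_rollback": 5}
sample_2 = {"Questions": 7000, "Com_commit": 400, "Com_rollback": 5}

result = rates(sample_1, sample_2, 60)
print(result)  # {'qps': 100.0, 'tps': 5.0}
```

Collecting these rates at a fixed interval and plotting them over time gives the DBA a baseline against which abnormal load is easy to spot.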