MySQL interview essentials (2): Database tuning

1. SQL optimization is necessary

Database optimization is a very important part of improving system performance, whether the database is MySQL, MongoDB, or something else.
SQL optimization is the lowest-cost and most visible way to improve system performance: it yields higher throughput and faster responses. A team that is good at SQL optimization can markedly improve the availability of a large-scale system and save the business real money.
I ran into this in one project: the database held so much data that queries timed out and several modules could not serve requests. The temporary fix was to delete old data, but that treated the symptom rather than the root cause.

2. Direction of SQL optimization

Optimization cost: hardware > system configuration > database table structure > SQL and indexes.
Optimization effect: hardware < system configuration < database table structure < SQL and indexes.

Database optimization therefore proceeds along the following lines:

  • SQL tuning
  • Database indexes
  • Regularly clearing unnecessary data and defragmenting
  • Database design: the three normal forms, fields, and table structures
  • Database and table sharding (horizontal split, vertical split)
  • MySQL configuration tuning (set the maximum number of concurrent connections in my.ini, adjust cache sizes)
  • Stored procedures (modular programming, which can improve speed)
  • Master-slave replication and read-write separation
  • And so on

2.1. SQL statement tuning

Reasons for SQL performance degradation:
1. Poorly written query statements
2. Index failure (for example, after data changes)
3. Too many joins in related queries (a design defect or an unavoidable requirement)
4. Server tuning and parameter settings (buffers, number of threads, etc.)

Typical SQL tuning process:

  • Observe the system in production for at least one day to see which SQL statements are slow.
  • Enable the slow query log and set a threshold. For example, treat anything that takes more than 5 seconds as slow SQL and capture it in the log (the slow query log location can be set in my.ini).
  • Analyze each slow statement with explain.
  • Use show profile to inspect the execution details and life cycle of the statement inside the MySQL server.
  • Have the operations engineer or DBA tune the parameters of the database server.
  • Check the execution time and execution plan after optimization. If the improvement is not obvious, repeat the process.
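
The threshold idea behind the slow query log can be sketched on the application side. This is only an illustration, not how MySQL implements it: in MySQL itself the feature is controlled by the slow_query_log and long_query_time settings. The helper name run_and_log_slow, the orders table, and the use of Python's built-in sqlite3 module as a stand-in database are all assumptions made for the example.

```python
import sqlite3
import time


def run_and_log_slow(conn, sql, params=(), threshold_s=5.0, slow_log=None,
                     clock=time.perf_counter):
    """Run a query; record (elapsed, sql) in slow_log if it exceeds the threshold."""
    start = clock()
    rows = conn.execute(sql, params).fetchall()
    elapsed = clock() - start
    if slow_log is not None and elapsed > threshold_s:
        slow_log.append((elapsed, sql))
    return rows


# Hypothetical demo data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)",
                 [(i,) for i in range(100)])

slow_log = []
# Same 5-second threshold as the example in the text; this tiny query
# finishes far below it, so nothing is recorded.
rows = run_and_log_slow(conn, "SELECT COUNT(*) FROM orders",
                        threshold_s=5.0, slow_log=slow_log)
```

Statements collected this way are the ones worth feeding into explain in the next step.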

2.2. SQL index

Indexes are also part of database design.

1. Generally speaking, indexes should be created on these columns:

  • Columns that are searched often, to speed up the search.
  • Primary key columns, to enforce uniqueness and to organize how the data in the table is arranged.
  • Columns that are often used in joins, mainly foreign keys, to speed up the joins.
  • Columns that are often searched by range: the index is already sorted, so the requested range is contiguous.
  • Columns that are often sorted or grouped (order by or group by): the query can reuse the order of the index instead of sorting again.
  • Columns that are often used in the WHERE clause, to speed up condition checks.

In summary: index fields that are unique, non-null, and frequently queried.

2. Indexes should not be created for some columns:

  • Columns that are rarely used or referenced in queries.
  • Columns that have only a few distinct values.
  • Columns defined with text, image, or bit data types that hold large amounts of data.
  • Tables where write performance matters far more than read performance. The two pull in opposite directions: adding indexes speeds up retrieval but slows down modification, and removing indexes does the reverse.

3. Index failure

In the following situations the optimizer may give up on the index and perform a full table scan:

  • Using the != or <> operator in the where clause.
  • Connecting conditions with or in the where clause: if any of the connected fields has no index, the indexes on the other fields are not used either.
  • Testing a field for null in the where clause.
  • A like pattern in the where clause that starts with %.
  • Applying an expression or a function to the indexed column in the where clause.
  • The optimizer estimating that a full table scan will be faster than using the index.
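
The function case can be observed with a small experiment. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN purely as a stand-in; MySQL's optimizer and its explain output differ, so treat it as an illustration of the principle only. The users table, the idx_users_name index, and the query_plan helper are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("CREATE INDEX idx_users_name ON users (name)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [(f"user{i}", f"user{i}@example.com") for i in range(1000)],
)


def query_plan(sql):
    """Return the plan detail text SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Comparing the bare column lets the engine search the index.
indexed_plan = query_plan("SELECT * FROM users WHERE name = 'user42'")

# Wrapping the column in a function prevents the index on name from being used.
function_plan = query_plan("SELECT * FROM users WHERE upper(name) = 'USER42'")

print(indexed_plan)
print(function_plan)
```

The first plan mentions idx_users_name while the second falls back to scanning the table, which is the behavior the list above warns about.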

2.3. The three normal forms of database design

(1) Database design: the three normal forms, fields, and table structures

1. Design the table structure according to the three normal forms, and consider at design time how the tables will be queried efficiently.

First normal form: each field in a table must be the smallest unit that cannot be split further, which guarantees the atomicity of every column.
Second normal form: on top of the first normal form, every row must be uniquely identifiable, and every non-key column must depend on the whole primary key.
Third normal form: on top of the second normal form, every column must depend on the primary key directly rather than indirectly (a foreign key counts as a direct relation), so that the fields carry no redundancy.

2. Others:

Prefer TINYINT, SMALLINT, or MEDIUMINT over INT when the value range allows it, and add UNSIGNED if the values are never negative.
Give VARCHAR columns only the length they really need.
Prefer integer types over string types where possible.
Do not put too many fields in a single table; keeping it within 20 is recommended.
Avoid nullable fields: NULL makes queries harder to optimize and takes up extra index space.
Avoid select * from t. List the specific fields instead of "*" and do not return fields you do not use. Try to avoid returning large amounts of data to the client; if the amount of data is too large, consider whether the requirement itself is reasonable.
Relating tables through a redundant field can perform better than using JOIN directly, at the cost of relaxing the third normal form.
select count(*) from table; with InnoDB, an unconditional count like this has to scan the whole table or an index, so it is slow on large tables.

2.4. Master-slave replication and read-write separation

In a real production environment, serving all reads and writes from a single database server cannot meet requirements for security, high availability, or high concurrency. Data is therefore synchronized through master-slave replication, and the concurrent load capacity of the database is raised through read-write separation.

1. Purpose: database backup, read-write separation, high availability, clustering.

2. Process:

The master records changes in the binary log (binlog) before each data-updating transaction completes. Once the binary log has been written, the master tells the storage engine to commit the transaction.

The slave copies the master's binary log into its own relay log. The slave first starts a worker thread, the I/O thread, which opens an ordinary connection to the master; the master then starts a binlog dump thread for it. The binlog dump thread reads events from the master's binary log; if it has caught up with the master, it sleeps and waits for new events. The I/O thread writes the events it receives to the relay log.

The SQL slave thread handles the last step. It reads events from the relay log and replays them so that the slave's data matches the master's. As long as the SQL thread keeps pace with the I/O thread, the relay log usually stays in the OS cache, so its overhead is very small.
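
On the application side, read-write separation comes down to a routing decision per statement. The following is a minimal sketch under stated assumptions: the ReadWriteRouter class, the server names, and the rule "statements starting with SELECT are reads" are all hypothetical simplifications, not part of MySQL or of any particular middleware.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master when no slave is configured.
        self._read_targets = itertools.cycle(slaves or [master])

    def route(self, sql):
        """Return the name of the server that should run this statement."""
        first_word = sql.lstrip().split(None, 1)[0].upper()
        if first_word == "SELECT":
            # Round-robin over the slaves for read traffic.
            return next(self._read_targets)
        return self.master


router = ReadWriteRouter("master", ["slave-1", "slave-2"])
targets = [
    router.route("SELECT * FROM orders WHERE id = 1"),
    router.route("UPDATE orders SET amount = 5 WHERE id = 1"),
    router.route("select count(*) from orders"),
]
```

Because the slave replays events after the master has committed them, a read routed to a slave immediately after a write may still see the old data. Reads that must observe the latest write, and locking reads such as select ... for update, are usually sent to the master instead.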


2.5. Database and table sharding

With master-slave replication, read capacity can keep growing by adding more slaves, but the master cannot be scaled out so easily. At that point you can consider splitting tables and databases.

1. Split table method

Tables can be split horizontally (by row) or vertically (by column).

Vertical split: divide the tables among different databases by module and function, which fits a service-oriented design.

Horizontal split: distribute the rows of one table across several tables or databases according to a rule, for example by time, account, year, or a modulo algorithm. It is mainly used when a single table or database holds too much data.

2. Table splitting scenarios

As a rule of thumb, once a MySQL table grows to millions of rows, query efficiency can drop noticeably.
Some fields in a table hold large values but are rarely read. These fields can be moved into a separate table and linked through a foreign key. With exam results, for example, we usually care about the scores rather than the exam details.

3. Horizontal table splitting strategies

Split by time: when the data is strongly time-sensitive, such as Weibo posts, it can be split by month.
Split by range: for example, put users 1 to 1 million in one table and users 1 million to 2 million in the next.
Split by hash: apply a hash algorithm to an ID or name from the original data to compute the name of the table that stores the row.
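
The three strategies above can each be reduced to a small function that maps a row to a table name. This is a sketch with hypothetical function names, table prefixes, and defaults (one million rows per table, four hash tables); a real system would also need to handle resizing and cross-table queries.

```python
import zlib


def table_by_month(base, year, month):
    """Time-based split: one table per month."""
    return f"{base}_{year:04d}_{month:02d}"


def table_by_range(base, user_id, rows_per_table=1_000_000):
    """Range split: IDs 1..1,000,000 go to table 0, the next million to table 1."""
    return f"{base}_{(user_id - 1) // rows_per_table}"


def table_by_hash(base, key, table_count=4):
    """Hash split: a stable hash of the key, modulo the number of tables."""
    # zlib.crc32 is used because Python's built-in hash() of a string
    # changes between processes.
    return f"{base}_{zlib.crc32(str(key).encode('utf-8')) % table_count}"


monthly = table_by_month("weibo", 2023, 8)     # "weibo_2023_08"
first_range = table_by_range("user", 1_000_000)   # "user_0"
second_range = table_by_range("user", 1_000_001)  # "user_1"
hashed = table_by_hash("user", 12345)
```

With the hash strategy, changing table_count moves most keys to a different table, so the number of tables is best chosen generously up front.
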

4. Disadvantages of table splitting:

Paging queries become difficult.
Queries become much more limited.

2.6. Architecture optimization

Add a caching service, such as Redis or Memcache, between the application and the database.


When a query request arrives, the application first checks whether the data is in the cache. If it is, the data is returned directly. If it is not, the application queries the database and loads the result into the cache. This greatly reduces the number of database accesses and so improves database performance. Note, however, that once a distributed cache is introduced, the system has to deal with cache penetration, cache breakdown, and cache avalanche.
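
The lookup flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the CacheAside class is hypothetical, a plain dictionary stands in for Redis or Memcache, and another dictionary stands in for the database.

```python
class CacheAside:
    """Check the cache first; on a miss, load from the database and cache it."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # callable: key -> value or None
        self._cache = {}                   # stands in for Redis or Memcache
        self.db_hits = 0                   # counts how often the database is read

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.db_hits += 1
        value = self._load_from_db(key)
        if value is not None:
            self._cache[key] = value
        return value


fake_db = {"user:1": "Alice"}
store = CacheAside(fake_db.get)
first = store.get("user:1")   # miss: reads the database, then fills the cache
second = store.get("user:1")  # hit: served from the cache
```

In this sketch a key that does not exist in the database is never cached, so repeated lookups for it reach the database every time. That is the cache penetration problem in miniature.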

Origin blog.csdn.net/weixin_42774617/article/details/132245346