Java interview must-know points, Lecture 09 (Part 2): MySQL tuning and best practices

Detailed explanation of MySQL

Let’s learn about MySQL, the most widely used relational database in the Internet industry. Its knowledge points are organized as shown below.

Commonly used SQL statements

There is no trick to writing common SQL statements by hand; just practice each of the listed statement types.

Data types

You need to know which basic data types MySQL provides and how much storage each one occupies. Memorize them by category rather than one by one.
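To make the storage question concrete, the sketch below derives the signed and unsigned value ranges of MySQL's integer types from their storage sizes (TINYINT 1 byte, SMALLINT 2, MEDIUMINT 3, INT 4, BIGINT 8). It only does the arithmetic; `IntType` is an illustrative name, not a MySQL or JDBC API.

```java
import java.math.BigInteger;

// Storage size in bytes of MySQL's integer types, and the value ranges that size implies.
// IntType is an illustrative name, not part of any MySQL or JDBC API.
enum IntType {
    TINYINT(1), SMALLINT(2), MEDIUMINT(3), INT(4), BIGINT(8);

    final int bytes;

    IntType(int bytes) {
        this.bytes = bytes;
    }

    // Signed minimum, -2^(8*bytes - 1): arithmetic right shift of Long.MIN_VALUE (-2^63)
    long signedMin() {
        return Long.MIN_VALUE >> (64 - 8 * bytes);
    }

    // Signed maximum, 2^(8*bytes - 1) - 1: the bitwise complement of the minimum
    long signedMax() {
        return ~signedMin();
    }

    // UNSIGNED maximum, 2^(8*bytes) - 1; BigInteger because BIGINT UNSIGNED exceeds a Java long
    BigInteger unsignedMax() {
        return BigInteger.ONE.shiftLeft(8 * bytes).subtract(BigInteger.ONE);
    }

    public static void main(String[] args) {
        for (IntType t : values()) {
            System.out.printf("%-9s %d byte(s)  signed %d..%d  unsigned 0..%s%n",
                    t, t.bytes, t.signedMin(), t.signedMax(), t.unsignedMax());
        }
    }
}
```

For example, TINYINT covers -128..127 signed or 0..255 unsigned in a single byte, which is plenty for an age column.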

Storage engines

The main storage engines in MySQL are as follows.

  • MyISAM, the default storage engine before MySQL 5.5, supports full-text indexing and offers relatively high query efficiency. Its drawbacks are that it does not support transactions and uses table-level locks.

  • InnoDB has been the default storage engine since MySQL 5.5. It supports ACID transactions, foreign keys, and row-level locks, which improve concurrency.

  • TokuDB is an open-source storage engine developed by a third party. It writes very quickly, supports compressed data storage, and can add indexes online without blocking reads and writes. Because of the compression, TokuDB is well suited to infrequently accessed data or historical archives, and is not suitable for read-heavy workloads.

Locks

As mentioned above, MyISAM uses table-level locks, while InnoDB uses row-level locks.

  • Table locks have low overhead, fast locking, and no deadlocks; however, the granularity of the locks is large, the probability of lock conflicts is high, and the concurrent access efficiency is relatively low.

  • Row-level locks have higher overhead, are slower to acquire, and can deadlock. However, because their granularity is the smallest, the probability of lock conflicts is low and concurrent access efficiency is relatively high.

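  • A shared lock (S lock) is a read lock: other transactions can still read the locked data and take shared locks on it, but cannot modify it. In MySQL, a shared lock can be taken explicitly with SELECT ... LOCK IN SHARE MODE (MySQL 8.0 also accepts SELECT ... FOR SHARE).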

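  • An exclusive lock (X lock) is a write lock: other transactions can acquire neither a shared nor an exclusive lock on the same data, so their locking reads and writes must wait. (By default, a plain SELECT in InnoDB is a non-locking consistent read served from an MVCC snapshot, so it is not blocked.) For UPDATE, DELETE, and INSERT statements, InnoDB automatically places exclusive locks on the affected rows; SELECT ... FOR UPDATE takes exclusive locks explicitly.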
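The compatibility rules between shared and exclusive locks can be illustrated with a Java analogy: in the sketch below, threads stand in for transactions and a `java.util.concurrent.locks.ReentrantReadWriteLock` stands in for the lock on one row. This is only an analogy under that assumption, not how InnoDB implements locking, and `LockCompat` is a hypothetical name.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Analogy only: a thread plays a transaction, a ReentrantReadWriteLock plays the lock on one row.
class LockCompat {
    // Transaction A holds a shared (aShared = true) or exclusive lock on the row;
    // returns whether transaction B can immediately get the lock it asks for.
    static boolean compatible(boolean aShared, boolean bShared) {
        ReentrantReadWriteLock row = new ReentrantReadWriteLock();
        Lock held = aShared ? row.readLock() : row.writeLock();
        held.lock();
        try {
            boolean[] granted = new boolean[1];
            Thread b = new Thread(() -> {
                Lock wanted = bShared ? row.readLock() : row.writeLock();
                granted[0] = wanted.tryLock(); // non-blocking attempt
                if (granted[0]) {
                    wanted.unlock();
                }
            });
            b.start();
            b.join(); // join() also makes granted[0] visible to this thread
            return granted[0];
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            held.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("S held, S requested: " + compatible(true, true));   // granted
        System.out.println("S held, X requested: " + compatible(true, false));  // refused
        System.out.println("X held, S requested: " + compatible(false, true));  // refused
        System.out.println("X held, X requested: " + compatible(false, false)); // refused
    }
}
```

Only the S/S combination is granted: shared locks are compatible with each other, while an exclusive lock conflicts with both kinds. Unlike InnoDB, the Java lock has no MVCC, so it has no counterpart of a non-locking consistent read.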

Stored procedures and functions

MySQL stored procedures and functions save developers from writing the same SQL statements over and over. Because they run inside the MySQL server, they also reduce data transfer between the client and the server.

Stored procedures can implement more complex logic, while functions are usually used for narrower tasks, such as a custom summation rule. A stored procedure can perform a whole series of database operations, including modifying tables, whereas a function is meant to compute and return a single value, and MySQL restricts what it may do (for example, a function cannot return a result set or commit a transaction).

A stored procedure is executed on its own with CALL, whereas a function can be called as part of a query, for example in a SELECT list or a WHERE clause; a stored procedure cannot be used inside a SQL statement this way. Stored procedures are tied to a specific database, so relying on them reduces the portability of the program; use them with caution.

New features

You should also know some of the new features in MySQL 8.0, such as:

  • The default character set changed from latin1 to utf8mb4 (full UTF-8);

  • Invisible (hidden) indexes were added. The query optimizer ignores an invisible index, so you can hide an index to test the performance impact of dropping it;

  • Common table expressions (CTEs, the WITH clause) are supported, making nested subqueries in complex queries clearer;

  • Window functions were added, enabling new ways of querying data.

Window functions resemble aggregate functions such as SUM and COUNT, but instead of collapsing multiple rows into one result row, they keep every row and attach the computed value to each one. In other words, a window function does not need GROUP BY; an OVER clause defines which rows each calculation covers.
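The difference between GROUP BY and a window function can be sketched in plain Java over an in-memory list. The `Score` rows, the `class_name` column in the comments, and the class name `WindowVsGroupBy` are hypothetical; each method's comment shows the SQL it imitates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WindowVsGroupBy {
    // One row of a hypothetical score table
    record Score(String student, String className, int score) {}

    // SELECT class_name, SUM(score) FROM score GROUP BY class_name
    // -> one output row per class
    static Map<String, Integer> groupBySum(List<Score> rows) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Score r : rows) {
            totals.merge(r.className(), r.score(), Integer::sum);
        }
        return totals;
    }

    // SELECT student, class_name, score, SUM(score) OVER (PARTITION BY class_name) FROM score
    // -> every input row is kept, with its partition's total attached
    static List<String> windowSum(List<Score> rows) {
        Map<String, Integer> totals = groupBySum(rows);
        List<String> out = new ArrayList<>();
        for (Score r : rows) {
            out.add(r.student() + " " + r.className() + " " + r.score() + " " + totals.get(r.className()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Score> rows = List.of(
                new Score("Alice", "c1", 90),
                new Score("Bob", "c1", 80),
                new Score("Carol", "c2", 70));
        System.out.println(groupBySum(rows)); // {c1=170, c2=70}
        windowSum(rows).forEach(System.out::println);
    }
}
```

Three input rows give two GROUP BY rows but three window-function rows, each still carrying its own score next to the class total.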

Indexes

Indexes can greatly improve database query performance and are used to some degree in almost every real business scenario. However, indexes have costs. First, they need extra disk space. Second, inserts, updates, and deletes must also maintain the indexes, which adds overhead. Indexes therefore suit read-heavy, write-light workloads.

First, let's look at the types of MySQL indexes.

  • A unique index requires the values in the indexed column to be unique, although NULL values are allowed. It is generally used to guarantee data uniqueness. For example, in a table that stores account information, each account ID must be unique; if the same account ID is inserted twice, MySQL returns a duplicate-key error.

  • The primary key index is a special unique index, but it does not allow null values.

  • A normal index, unlike a unique index, allows duplicate values in the indexed column. For example, in a table of student grades, scores in each subject can repeat, so a normal index is appropriate.

  • A joint (composite) index is a single index built over several columns in a specific order; several single-column indexes on the same table do not make a joint index. When using a joint index, follow the leftmost prefix principle: the query conditions must use the index columns starting from the leftmost one, without skipping columns. For example, if a user information table has a joint index on (name, age), the condition "name equals Zhang San" satisfies the leftmost prefix principle and can use the index, whereas the condition "age is greater than 20" cannot use the index to seek to matching rows, because the leftmost column of the index is name, not age.

  • Full-text indexes, as mentioned earlier, are supported by the MyISAM engine. Since MySQL 5.6, InnoDB also supports full-text indexes, and since 5.7.6 a built-in ngram parser supports Chinese. Full-text indexes can only be created on CHAR, VARCHAR, and TEXT columns and are implemented with an inverted index. Note that on tables with large data volumes, building a full-text index takes a lot of time and disk space.
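The leftmost prefix principle follows from the order in which a joint index stores its entries. The sketch below models a joint index on (name, age) as a Java `TreeSet` sorted by name, then age, with a row id as a tiebreaker. This is a simplification (a real InnoDB index is a B+ tree), and `LeftmostPrefix` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class LeftmostPrefix {
    // One entry of a joint index on (name, age), pointing at a row id
    record Entry(String name, int age, long rowId) {}

    // Entries are sorted by name first, then age; rowId only breaks ties
    private final NavigableSet<Entry> index = new TreeSet<>(
            Comparator.comparing(Entry::name)
                    .thenComparingInt(Entry::age)
                    .thenComparingLong(Entry::rowId));

    void add(String name, int age, long rowId) {
        index.add(new Entry(name, age, rowId));
    }

    // WHERE name = ?: all matches sit in one contiguous range, found by seeking to its bounds
    NavigableSet<Entry> byName(String name) {
        return index.subSet(new Entry(name, Integer.MIN_VALUE, Long.MIN_VALUE), true,
                new Entry(name, Integer.MAX_VALUE, Long.MAX_VALUE), true);
    }

    // WHERE age > ?: entries are not ordered by age, so every entry has to be examined
    List<Entry> byAgeAbove(int age) {
        List<Entry> matches = new ArrayList<>();
        for (Entry e : index) {
            if (e.age() > age) {
                matches.add(e);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        LeftmostPrefix idx = new LeftmostPrefix();
        idx.add("Zhang San", 18, 1);
        idx.add("Li Si", 25, 2);
        idx.add("Zhang San", 30, 3);
        idx.add("Wang Wu", 22, 4);
        System.out.println(idx.byName("Zhang San"));
        System.out.println(idx.byAgeAbove(20));
    }
}
```

`byName` jumps straight to one contiguous range, while `byAgeAbove` must visit every entry because entries with age > 20 are scattered across all names.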

Next, let's look at how indexes are implemented.

B+ tree: a B+ tree keeps keys in sorted order, which makes it well suited to range queries such as > and <. It is the most commonly used index implementation in MySQL.

R-tree: a data structure for multidimensional data that can build spatial indexes over geographic data. It is rarely used in actual business scenarios.

Hash: a hash table indexes the data. Unlike a B+ tree, a hash index does not have to descend through several tree levels to locate a record, so it is faster for exact-match lookups, but it supports neither range searches nor sorting. It is rarely used in practice; in MySQL, explicit hash indexes are mainly offered by the MEMORY engine.
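The trade-off between an ordered index and a hash index can be sketched with Java collections. `TreeMap` (a red-black tree, used here only as a stand-in for the ordered B+ tree) can jump to a key range, while `HashMap` answers equality lookups just as well but has to examine every key for a range. The class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class HashVsOrdered {
    // Ordered index (TreeMap standing in for a B+ tree): seek to lo, then walk keys in order up to hi
    static Map<Integer, String> rangeOrdered(TreeMap<Integer, String> idx, int lo, int hi) {
        return idx.subMap(lo, true, hi, true);
    }

    // Hash index: keys have no order, so a range predicate means checking every key
    static Map<Integer, String> rangeHash(HashMap<Integer, String> idx, int lo, int hi) {
        Map<Integer, String> matches = new TreeMap<>(); // sorted only so results are easy to compare
        for (Map.Entry<Integer, String> e : idx.entrySet()) {
            if (e.getKey() >= lo && e.getKey() <= hi) {
                matches.put(e.getKey(), e.getValue());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> ordered = new TreeMap<>();
        HashMap<Integer, String> hashed = new HashMap<>();
        for (int id : new int[] {35, 18, 27, 22, 41}) {
            ordered.put(id, "row-" + id);
            hashed.put(id, "row-" + id);
        }
        // Equality lookups work on both structures
        System.out.println(ordered.get(27) + " " + hashed.get(27));
        // Range lookups: same answer, but only the ordered structure can seek to the range
        System.out.println(rangeOrdered(ordered, 20, 30).keySet()); // [22, 27]
        System.out.println(rangeHash(hashed, 20, 30).keySet());     // [22, 27]
    }
}
```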

FullText: the full-text index mentioned earlier, an inverted index that records which documents each keyword appears in.
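An inverted index maps each term to the documents that contain it. Below is a minimal Java sketch, assuming a naive tokenizer that lower-cases text and splits on anything that is not a letter or digit; MySQL's real full-text parsers (including the ngram parser used for Chinese) tokenize differently, and `InvertedIndex` is a hypothetical class.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class InvertedIndex {
    // term -> sorted ids of the documents containing that term (its "posting list")
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) { // split() can yield a leading empty string
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Ids of documents containing every query term: intersect the posting lists
    SortedSet<Integer> search(String... terms) {
        SortedSet<Integer> result = null;
        for (String term : terms) {
            SortedSet<Integer> docs = postings.getOrDefault(term.toLowerCase(), Collections.emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.add(1, "MySQL index tuning");
        idx.add(2, "Kafka message queue");
        idx.add(3, "MySQL full-text index");
        System.out.println(idx.search("mysql", "index")); // [1, 3]
        System.out.println(idx.search("kafka"));          // [2]
    }
}
```

A query never scans the documents themselves: it looks up each term's posting list and intersects them. The work moves to index-build time, which is why building a full-text index on a large table is expensive.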

Tuning

MySQL tuning is also a skill developers need to master. It is generally approached from four dimensions, as shown in the figure below.

  • The first dimension is database design: table structure design and index setup;

  • The second dimension is to optimize the SQL statements used in our business, such as adjusting where query conditions;

  • The third dimension is MySQL server configuration, such as managing the number of connections and tuning cache and buffer sizes (index cache, query cache, sort buffer, and so on; note that the query cache was removed in MySQL 8.0);

  • The fourth dimension is hardware and operating system settings, such as adjusting kernel parameters, disabling swap, adding memory, and switching to solid-state drives.

In terms of cost, optimization gets more expensive from the first dimension to the fourth (left to right in the figure); in terms of effect, the order is reversed, and the first dimensions usually bring the biggest gains.

For developers, the first two dimensions are closely tied to the business, so they must be mastered. The last two are better suited to in-depth study by DBAs; a basic understanding is enough.

So let’s focus on the first two dimensions, as shown in the figure below.

Start with the module on the left of the figure. For optimizing table structures and indexes, master the following principles.

  1. When designing table structures, consider the database's horizontal and vertical scalability: estimate in advance how data volume and read/write traffic will grow over the next year, and plan a database and table sharding scheme. For example, suppose a user information table is expected to reach 1 billion rows within a year, with about 5,000 write QPS and 30,000 read QPS. You could shard by hashing the UID into 4 databases with 32 tables each, keeping each table at around ten million rows or fewer (1 billion / 128 tables ≈ 7.8 million).

  2. Choose appropriate data types for fields, preferring smaller types while leaving room to grow. For example, store an age field as TINYINT rather than INT.

  3. A table with many fields can be split into several tables, with an intermediate table added to relate them if necessary. A table with 40 to 50 fields is clearly not a good design.

  4. Relational database design generally follows the third normal form (3NF), but satisfying 3NF may split data across many tables that must then be joined at query time. To improve query efficiency, you can sometimes relax normalization and store some redundant information in a table; this is called denormalization. Denormalization must be applied in moderation.

  5. Use indexes well: index fields that are frequently used as query conditions; when creating joint indexes, design them for reuse according to the leftmost prefix principle and avoid redundant indexes; create unique indexes on fields whose values must not repeat. However, indexes add cost to writes such as inserts and updates, so do not overuse them. For example, a low-selectivity field such as gender is not a good candidate for an index.

  6. Declare columns NOT NULL whenever possible. MySQL has a harder time optimizing queries on nullable columns: NULLs make indexes, index statistics, and value comparisons more complex, and nullable columns need extra storage and special handling inside MySQL.

Now look at the module on the right side of the figure: principles for optimizing SQL statements.

  7. Find the SQL statements that most need optimization: either the most frequently executed ones or those that improve most after optimization. MySQL's slow query log is the usual place to find them.

  8. Learn to use MySQL's analysis tools. For example, use EXPLAIN to inspect a statement's execution plan: whether an index is used, which one, how many rows are scanned, whether a filesort is needed, and so on. You can also use the SHOW PROFILE command to see how long each step of a statement's execution takes.

  9. Avoid SELECT * in queries; list the specific columns you need. This avoids fetching columns you do not use, and it saves MySQL from looking up the table's column metadata to expand the *.

  10. Use prepared statements where possible: they can perform better, and they prevent SQL injection.

  11. Let the index do the sorting where possible: sort on indexed columns so that MySQL can read rows in index order instead of performing a filesort.
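The capacity plan in item 1 can be checked with a little arithmetic, and the routing can be sketched in Java. The scheme below is an assumption for illustration: it maps a numeric UID to one of 128 slots with `uid mod 128` (the simplest possible hash), then splits the slot into a database index and a table index. The names `user_db_N` and `user_info_N` are hypothetical, and the per-table figures assume traffic spreads evenly across UIDs.

```java
// Hypothetical router for the 4-database x 32-table layout described in item 1.
class ShardRouter {
    static final int DB_COUNT = 4;
    static final int TABLES_PER_DB = 32;
    static final int TOTAL_TABLES = DB_COUNT * TABLES_PER_DB; // 128 physical tables

    // Assumed rule: slot = uid mod 128; the slot picks the database, then the table within it
    static int[] route(long uid) {
        int slot = (int) Math.floorMod(uid, (long) TOTAL_TABLES);
        return new int[] {slot / TABLES_PER_DB, slot % TABLES_PER_DB};
    }

    static String physicalTable(long uid) {
        int[] r = route(uid);
        return "user_db_" + r[0] + ".user_info_" + r[1];
    }

    public static void main(String[] args) {
        long rows = 1_000_000_000L;
        System.out.println("rows per table: " + rows / TOTAL_TABLES);       // 7812500
        System.out.println("write QPS per table: " + 5_000 / TOTAL_TABLES); // 39
        System.out.println("read QPS per table: " + 30_000 / TOTAL_TABLES); // 234
        System.out.println(physicalTable(10_086L));                         // user_db_3.user_info_6
    }
}
```

In this scheme the table count is fixed up front, so adding databases later means moving whole tables rather than rehashing every row; that is one reason to size the shard count for future growth at design time.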
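Item 10 can be illustrated with plain JDBC (`java.sql`, part of the JDK). The sketch assumes a hypothetical table `user_info(uid, name, age)` and an already-open `Connection`; it contrasts building SQL by string concatenation with binding a parameter through `PreparedStatement`. With MySQL Connector/J, prepared statements are emulated on the client by default; server-side preparation, which is where the performance benefit comes from, is enabled with the `useServerPrepStmts=true` connection property (often together with `cachePrepStmts=true`).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical DAO over an assumed table user_info(uid BIGINT, name VARCHAR(64), age TINYINT)
class UserDao {
    static final String FIND_BY_NAME = "SELECT uid, name, age FROM user_info WHERE name = ?";

    // Unsafe: user input is spliced into the SQL text itself
    static String unsafeQuery(String name) {
        return "SELECT uid, name, age FROM user_info WHERE name = '" + name + "'";
    }

    // Safe: the SQL text is fixed; `name` travels as a bound parameter value
    static int countByName(Connection conn, String name) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(FIND_BY_NAME)) {
            ps.setString(1, name);
            try (ResultSet rs = ps.executeQuery()) {
                int n = 0;
                while (rs.next()) {
                    n++;
                }
                return n;
            }
        }
    }

    public static void main(String[] args) {
        // A crafted input turns the concatenated WHERE clause into an always-true condition
        System.out.println(unsafeQuery("x' OR '1'='1"));
        // prints: SELECT uid, name, age FROM user_info WHERE name = 'x' OR '1'='1'
    }
}
```

Passed to `countByName`, the same input is treated as a literal value and simply matches no user with that odd name.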

Assessment points and bonus points
Assessment points

Let’s look at the interview points for this lesson.

  1. Understand the basic principles and use cases of message queues and databases, and the characteristics of the commonly used ones. For example, message queues suit asynchronous processing and peak shaving; Kafka provides a high-performance, highly available distributed queue service that can be configured to avoid message loss; MySQL offers several storage engines that provide both transactional and non-transactional relational database services.

  2. Understand Kafka's architecture and message flow: how partitions provide concurrency, redundancy, and disaster recovery, and how consumer groups ensure that instances in the same group do not receive duplicate messages (within a group, each partition is consumed by only one consumer instance).

  3. It is necessary to have a deep understanding of the ACID characteristics of database transactions, understand the concurrency problems that may be caused by concurrent transactions, and how different database isolation levels solve these concurrency problems.

  4. You must firmly master commonly used MySQL statements, such as WHERE conditional query statements, JOIN association statements, ORDER BY sorting statements, etc. Also be familiar with commonly used built-in functions, such as SUM, COUNT, etc.

  5. Understand the characteristics of MySQL's storage engines and index implementations. For example, know that InnoDB, the most commonly used engine, excels at transaction processing while MyISAM suits simple non-transactional query scenarios; and know MySQL's index types, such as unique, joint, and full-text indexes, as well as the most commonly used B+ tree implementation.

Bonus points

If you want to perform better in interviews, you should also know the following bonus points.

  1. Understand new features: for both Kafka and MySQL, know what recent versions add. For example, MySQL 8.0 provides window functions for new kinds of queries and supports common table expressions, which make nested subqueries in complex queries clearer.

  2. Know the principles of database table design; experience designing databases for online services is even better. You will then know how to estimate capacity and how to split databases and tables appropriately so that the service can scale in the future, which leaves a positive impression in interviews.

  3. Ideally, have database tuning experience. For example: an index has clearly been created, but queries are still slow. EXPLAIN shows that the table has several indexes and the MySQL optimizer picked the wrong one; you then name the index explicitly in the SQL statement with USE INDEX (or FORCE INDEX).

  4. Have experience with mainstream message queues such as Kafka, and know how to tune them for business scenarios. For example, in log shipping a small probability of message loss is acceptable, so messages can be sent asynchronously. Financial services need synchronous sending and the highest reliability setting: request.required.acks = -1 (acks=all in the newer Java producer).
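The two Kafka scenarios in item 4 map to producer settings. The sketch below only builds `java.util.Properties` objects (so it runs without the Kafka library) using the standard producer keys `bootstrap.servers`, `acks`, and `enable.idempotence`; the broker address and the class name `ProducerProfiles` are placeholder assumptions. With the kafka-clients library, synchronous sending means blocking on the `Future` returned by `KafkaProducer.send(record)` via `.get()`, while asynchronous sending passes a `Callback` instead.

```java
import java.util.Properties;

// Hypothetical helper producing two Kafka producer configurations
class ProducerProfiles {
    // Placeholder broker address; replace with the real cluster's bootstrap servers
    static final String BROKERS = "localhost:9092";

    // Log shipping: occasional loss is tolerable, so only the partition leader must acknowledge
    static Properties logProfile() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", BROKERS);
        p.setProperty("acks", "1");
        return p;
    }

    // Financial data: all in-sync replicas must acknowledge ("all" is equivalent to -1)
    static Properties financeProfile() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", BROKERS);
        p.setProperty("acks", "all");
        p.setProperty("enable.idempotence", "true"); // keeps producer retries from writing duplicates
        return p;
    }

    public static void main(String[] args) {
        System.out.println("log: " + logProfile());
        System.out.println("finance: " + financeProfile());
    }
}
```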

Summary of real interview questions

Finally, the actual interview questions are summarized as follows.

  • Question 2: message reliability can be ensured in three places: the producer makes sure the message is delivered to the queue, the queue keeps the stored message highly available, and the consumer commits the offset only after it has processed the message. Answer by combining Kafka's synchronous and asynchronous sending with its reliability settings.

  • Question 3: message duplication can be handled in two ways: make message processing idempotent so that duplicates have no effect, or deduplicate messages (for example, with Redis) so that duplicates are never processed.

  • Question 4: start from creating indexes, reducing join queries, and optimizing SQL query conditions.

  • Question 6: answer using the principles covered in the MySQL tuning section.
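The idempotence idea from Question 3 can be sketched in Java: record each message's unique id and skip ids that were already processed. The in-memory `HashSet` is an assumption standing in for a shared store such as a Redis set or a database unique key; it is not shared across consumer instances, and `IdempotentConsumer` is a hypothetical class.

```java
import java.util.HashSet;
import java.util.Set;

class IdempotentConsumer {
    // Ids of messages already applied; in-memory stand-in for Redis or a unique-key table
    private final Set<String> processedIds = new HashSet<>();
    private long balance = 0;

    // Applies a credit at most once per message id; returns false for a duplicate delivery
    boolean handle(String messageId, long amount) {
        if (!processedIds.add(messageId)) { // add() returns false when the id is already present
            return false;
        }
        balance += amount;
        return true;
    }

    long balance() {
        return balance;
    }

    public static void main(String[] args) {
        IdempotentConsumer consumer = new IdempotentConsumer();
        consumer.handle("msg-1", 100);
        consumer.handle("msg-1", 100); // redelivered duplicate: ignored
        consumer.handle("msg-2", 50);
        System.out.println(consumer.balance()); // 150
    }
}
```

In a real deployment, the "seen this id?" check and the processing should be atomic, for example by claiming the id with Redis `SET key value NX` or by inserting it under a unique key in the same database transaction as the business update; otherwise two concurrent deliveries can both pass the check.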

In the next lesson, we will learn about system architecture and detailed explanations of project cases.


Source: blog.csdn.net/g_z_q_/article/details/129940793