Summary of Java interview questions 6 - MySQL module (continuously updated)

Mysql


The difference between relational databases and non-relational databases

  • Advantages of relational databases
    • Easy to understand. Because it uses a relational model to organize data.
    • Data consistency can be maintained.
    • The overhead of data update is relatively small.
    • Supports complex queries (queries with where clause)
  • Advantages of non-relational databases
    • It does not need to be parsed by the SQL layer, and the reading and writing efficiency is high.
    • Based on key-value pairs, the data is very scalable.
    • It can support the storage of various types of data, such as pictures, documents, etc.

What is ORM? - mybatis

The Object Relational Mapping (ORM) pattern is a technique designed to resolve the mismatch between object-oriented programming and relational databases. An ORM framework acts as a bridge between the application and the database: as long as the mapping between the persistent classes and the tables is provided, the ORM framework can consult the mapping information at runtime to persist objects into the database.

  • advantage:
    • 1) Improve development efficiency and reduce development costs
    • 2) Make development more object-oriented
    • 3) Portable
    • 4) Additional functions such as data caching can be easily introduced
  • shortcoming:
    • 1) Automated mapping to relational databases consumes some system performance. In practice this overhead is small and can generally be ignored.
    • 2) When processing queries such as multi-table join queries and complex where conditions, the ORM syntax will become complicated.

How to evaluate whether an index is properly created?

It is recommended to design indexes according to the following principles:

  1. Avoid excessive indexing on frequently updated tables, and keep the number of columns in the index to a minimum. Indexes should be created on fields that are frequently used in queries, but avoid adding unnecessary fields.
  2. It is best not to use indexes for tables with small amounts of data. Since there is less data, the query may take less time than traversing the index, and the index may not produce an optimization effect.
  3. Create indexes on columns with many different values that are often used in conditional expressions, and do not create indexes on columns with few different values. For example, the "Gender" field of the student table only has two different values: "Male" and "Female", so there is no need to create an index. If you create an index, not only will it not improve the query efficiency, but it will seriously reduce the data update speed.
  4. Specify a unique index when uniqueness is a characteristic of the data itself. The use of unique indexes must ensure the data integrity of the defined columns to improve query speed.
  5. Create indexes on columns that are frequently sorted or grouped (that is, perform group by or order by operations). If there are multiple columns to be sorted, you can create a combined index on these columns.

Count function

In terms of execution effect:

count(*) counts all rows and is equivalent to the number of rows in the result; rows whose column values are NULL are not ignored.
count(1) ignores all columns and uses the constant 1 to represent each row; NULL values are likewise not ignored when counting.
count(column name) counts only the named column; rows where that column is NULL (NULL here meaning a real NULL, not an empty string or 0) are ignored, that is, a row is not counted when that field's value is NULL.

In terms of execution efficiency:

If the column is the primary key, count(column name) is faster than count(1).
If the column is not the primary key, count(1) is faster than count(column name).
If the table has multiple columns and no primary key, count(1) performs better than count(*).
If there is a primary key, select count(primary key) performs best.
If the table has only one field, select count(*) performs best.

count(nullable field) < count(non-nullable field) = count(primary key id) < count(1) ≈ count(*)

count (primary key) and count (column name)

The primary key can never be NULL, so if the named column contains NULL values, count(primary key) and count(column name) return different results.
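A minimal sketch, using a hypothetical employee table with a nullable bonus column, of how NULL affects the three count variants:

-- Hypothetical table used only to illustrate count(*) vs count(1) vs count(column)
CREATE TABLE employee (
    id    INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name  VARCHAR(50) NOT NULL,
    bonus DECIMAL(10, 2) NULL          -- nullable on purpose
);

INSERT INTO employee (name, bonus) VALUES
    ('alice', 100.00),
    ('bob',   NULL),
    ('carol', NULL);

SELECT COUNT(*)     AS cnt_star,   -- 3: counts every row
       COUNT(1)     AS cnt_one,    -- 3: same result as COUNT(*)
       COUNT(bonus) AS cnt_bonus   -- 1: rows with NULL bonus are ignored
FROM employee;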

Three major paradigms of database

First Normal Form (1NF): Each column of the database table is required to be an indivisible atomic data item.

Second Normal Form (2NF): based on 1NF, the table has a primary key and every non-key field fully depends on the primary key (no partial dependency), so every instance or record in the table can be uniquely identified.

Third Normal Form (3NF): based on 2NF, no non-key attribute depends on another non-key attribute (transitive dependencies are eliminated on top of 2NF). Third normal form requires every non-key column to be directly related to the primary key, not indirectly related to it.
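As a hedged illustration of eliminating a transitive dependency (3NF), suppose a hypothetical student table stored the department name and the department phone; the phone depends on the department, not on the student, so it is split into its own table:

-- Before: dept_phone depends on dept_name, which depends on the key -> transitive dependency
-- CREATE TABLE student (student_id INT PRIMARY KEY, name VARCHAR(50), dept_name VARCHAR(50), dept_phone VARCHAR(20));

-- After: every non-key column depends directly on the key of its own table
CREATE TABLE department (
    dept_id    INT PRIMARY KEY,
    dept_name  VARCHAR(50) NOT NULL,
    dept_phone VARCHAR(20)
);

CREATE TABLE student (
    student_id INT PRIMARY KEY,
    name       VARCHAR(50) NOT NULL,
    dept_id    INT,                                          -- references department, no transitive dependency
    FOREIGN KEY (dept_id) REFERENCES department (dept_id)
);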

The difference between char and varchar in Mysql

char is fixed length, varchar is variable length

For char(M), if a value is shorter than M characters, MySQL pads it on the right with spaces until it reaches length M.

Each varchar value occupies only the bytes it actually needs, plus a length prefix: the prefix takes one byte when the column's maximum length is 255 bytes or less, and two bytes when it is greater than 255.
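A small sketch (the table and values are made up) contrasting the two declarations; note that under the default sql_mode MySQL strips the padding spaces of CHAR values when they are read back:

CREATE TABLE str_demo (
    c CHAR(10),      -- always stored as 10 characters, padded with spaces on the right
    v VARCHAR(10)    -- stores only the inserted characters, plus a 1-byte length prefix here
);

INSERT INTO str_demo VALUES ('ab', 'ab');

-- CHAR padding is removed on retrieval under the default sql_mode,
-- so both columns typically report a length of 2 when read back.
SELECT CHAR_LENGTH(c) AS c_len, CHAR_LENGTH(v) AS v_len FROM str_demo;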

What details should be considered in database design or function development?

  • The character sets of databases and tables use UTF8 uniformly.
  • All tables and fields need to be annotated
  • Try to separate hot and cold data and reduce the width of the table
  • Define all columns as NOT NULL if possible
  • It is prohibited to store large binary data such as pictures and files in the database.
  • Prioritize the smallest data type that meets storage needs
  • Avoid using the TEXT and BLOB data types. The common TEXT type can store up to 64 KB of data; if large text must be stored, split it out into a separate table.

Database create table statement

CREATE TABLE `table_name` (
	`id` INT UNSIGNED AUTO_INCREMENT,
	`field1` VARCHAR(100) NOT NULL,
	PRIMARY KEY (`id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8;


Classification of Mysql index

Primary key index, unique index, joint index, ordinary index, full-text index, spatial index

Classification of non-clustered indexes

Secondary (auxiliary) indexes, and the indexes of MyISAM (all MyISAM indexes are non-clustered)

Please tell me about MySQL indexes and their advantages and disadvantages

An index is a separate data structure stored on disk that contains reference pointers to the records in the data table. Using an index can improve query efficiency. Indexes are implemented at the storage-engine level, so the indexes of different storage engines are not exactly the same. Common index structures include Hash and B+Tree.

Benefits of indexing:

  • Improve query speed
  • When using grouping and sorting clauses for data query, you can significantly reduce the time of grouping and sorting in the query
  • By creating a unique index, you can ensure the uniqueness of each row of data in the database table

shortcoming:

  • Creating and maintaining indexes takes time, and the time spent grows as the amount of data increases.
  • Indexes require disk space
  • When inserting, deleting, and modifying operations on the table, the index also needs to be dynamically maintained, which reduces the speed of data maintenance.

The implementation principle of index

B+Tree index and Hash index

The non-leaf nodes of a B+Tree store only key information, so each node can hold more keys, which reduces the height of the tree. The leaf nodes store both keys and values. Every lookup travels from the root node down to a leaf node, so the cost of each query is stable. In addition, leaf nodes hold pointers to their adjacent leaf nodes, which makes range scans efficient.

A Hash index builds a hash table from the hash values of the indexed field: the key is the hash value and the value is the address of the corresponding data row.
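A hedged sketch of declaring both structures (table and index names are made up): InnoDB uses B+Tree indexes (USING BTREE is the default), while an explicit HASH index is honored by the MEMORY engine:

-- B+Tree index on an InnoDB table (BTREE is the default structure)
CREATE TABLE users_btree (
    id   INT PRIMARY KEY,
    name VARCHAR(50),
    INDEX idx_name (name) USING BTREE
) ENGINE = InnoDB;

-- Hash index on a MEMORY table: fast equality lookups, but no range scans
CREATE TABLE users_hash (
    id   INT PRIMARY KEY,
    name VARCHAR(50),
    INDEX idx_name (name) USING HASH
) ENGINE = MEMORY;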

The underlying structure of Mysql

Service layer: connectors, query cache, analyzer, optimizer, executor

Storage engine: responsible for data storage and retrieval


How to specify index in sql statement

SELECT column_list
FROM table_name FORCE INDEX (index_name)   -- or USE INDEX (index_name) to merely suggest it
WHERE query_condition


sql statement

Student performance table (name, subject, score), query the names of students whose scores in each course are >80 points.

select name from table group by name having min(score) > 80;
select name from table group by name having count(score) = sum(case when score > 80 then 1 else 0 end)

Add index

ALTER TABLE `table_name` ADD PRIMARY KEY ( `column` );                               -- primary key index
ALTER TABLE `table_name` ADD UNIQUE ( `column` );                                    -- unique index
ALTER TABLE `table_name` ADD INDEX index_name ( `column` );                          -- ordinary index
ALTER TABLE `table_name` ADD INDEX index_name ( `column1`, `column2`, `column3` );   -- composite (joint) index
ALTER TABLE `table_name` ADD INDEX ( `column` );

SQL injection

SQL injection means that a web application does not validate or strictly filter user input, so an attacker can append extra SQL statements to the predefined query statements of the application without the administrator's knowledge, tricking the database server into executing unauthorized arbitrary queries and thereby obtaining data.

How to block

(1) Filter special characters in user input parameters to reduce the risk of SQL injection.

(2) String-concatenated SQL statements are prohibited; incoming SQL parameters must be passed using parameter binding.

(3) Make proper use of the anti-injection mechanisms provided by the database access framework. For example, MyBatis provides #{} parameter binding to prevent SQL injection; use ${} with caution, because ${} is equivalent to concatenating strings into the SQL.
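As a sketch of parameter binding at the SQL level (the server-side counterpart of MyBatis #{}), MySQL prepared statements keep user input out of the statement text; the table and variable names are hypothetical:

-- The statement text is fixed; user input is passed only as a bound parameter,
-- so input such as "' OR '1'='1" cannot change the query structure.
PREPARE find_user FROM 'SELECT id, username FROM users WHERE username = ?';
SET @input = 'alice';
EXECUTE find_user USING @input;
DEALLOCATE PREPARE find_user;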

Several ways to tune database SQL

Reference: Several ways to tune database SQL (lss0555, CSDN blog)

  1. Create index

    1. To avoid full table scans, you should first consider creating indexes on the columns involved in where and order by.
    2. Create indexes on fields that are frequently used for retrieval. For example, if you often search by the username field, create an index on that field; if you often need to search by employee department and employee position level, create indexes on those two fields.
      The performance improvement an index brings to retrieval is often huge, so when you find that retrieval is too slow, the first thing to consider is creating an index.
    3. It is best not to have more than 6 indexes on a table; if there are more, consider whether indexes on rarely used columns are really necessary. More indexes are not always better: although indexes improve the efficiency of the corresponding SELECT statements, they reduce the efficiency of INSERT and UPDATE, because the indexes may need to be rebuilt on insert or update. Which indexes to build therefore needs careful consideration, depending on the specific situation.
  2. Avoid doing calculations on indexed columns

  3. Use precompiled queries

  4. Adjust the connection order in the Where clause

    Following this principle, table join conditions should be written before the other WHERE conditions, and the conditions that can filter out the largest number of records should be placed at the end of the WHERE clause.

  5. Try to compress multiple SQL statements into one SQL statement

    Every time a SQL statement is executed, a network connection must be established, permissions verified, the query optimized, and the results sent back. This process is time-consuming, so try to avoid executing too many SQL statements: where possible, compress the work into a single SQL statement instead of using several.

  6. Use the WHERE clause instead of the HAVING clause where possible. HAVING filters the result set only after all records have been retrieved and aggregated, whereas WHERE filters records before aggregation, so if the number of records can already be limited by the WHERE clause, the cost of this step is reduced. Conditions in HAVING should generally be reserved for filtering on aggregate functions; other conditions should be written in the WHERE clause (see the sketch after this list).

  7. Using table aliases

    When joining multiple tables in a SQL statement, use table aliases and prefix each column name with the alias. This can reduce parsing time and
    reduce syntax errors caused by ambiguities in column names.

  8. Replace UNION with UNION ALL.
    When a SQL statement needs to combine two query result sets, UNION will attempt to merge the two result sets and then sort them to remove duplicates before producing the final result, even if the results contain no duplicate records. So if you can be sure the results will contain no duplicates, use UNION ALL instead and efficiency will improve.

  9. Use varchar/nvarchar instead of char/nchar
    Prefer varchar/nvarchar over char/nchar where possible: variable-length fields take less storage space, and for queries, searching within a smaller field is obviously more efficient.
    Do not assume that NULL requires no space. For a char(100) column, for example, the space is fixed when the row is created: whether or not a value is inserted (including NULL), it occupies 100 characters of space. For a variable-length field such as varchar, NULL takes up no data space.

  10. Query select statement optimization

  11. Update Update statement optimization

    If you only change 1 or 2 fields, do not update all fields, otherwise frequent calls will cause obvious performance consumption and bring a lot of logs.

  12. Insert statement optimization

    When creating a new temporary table, if the amount of data inserted at once is large, you can use select into instead of create table to avoid generating a large number of logs and to improve speed; if the amount of data is small, then to ease the load on the system tables, create the table first and then insert.
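A hedged sketch of two of the rewrites above (points 6 and 8); the orders table and its columns are hypothetical:

-- Point 6: filter with WHERE before aggregation instead of HAVING after it
-- Less efficient: every group is built first, then filtered
SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id HAVING customer_id = 42;
-- Better: rows are filtered before grouping; keep HAVING for aggregate conditions only
SELECT customer_id, SUM(amount) FROM orders WHERE customer_id = 42 GROUP BY customer_id;

-- Point 8: when the two result sets cannot overlap, UNION ALL skips the dedup/sort step
SELECT order_id FROM orders WHERE created_at <  '2023-01-01'
UNION ALL
SELECT order_id FROM orders WHERE created_at >= '2023-01-01';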

SQL statement execution order

SQL statement execution order

  • FROM: When executing a query on an SQL statement, the tables on both sides of the keyword are first connected in the form of a Cartesian product and a virtual table V1 is generated. A virtual table is a view, and data will come from the execution results of multiple tables.
  • ON: Perform ON filtering on the results of FROM connection and create virtual table V2
  • JOIN: if an OUTER JOIN is used, the rows of the outer table that were filtered out by ON are added back, creating virtual table V3
  • WHERE: Perform WHERE filter on virtual table V3 and create virtual table V4
  • GROUP BY: Group the records in V4 and create virtual table V5
  • HAVING: Filter V5 and create virtual table V6
  • SELECT: Filter the results in V6 according to SELECT and create virtual table V7
  • DISTINCT: Deduplicate the results in the V7 table and create a virtual table V8. If the GROUP BY clause is used, there is no need to use DISTINCT, because grouping puts the unique values in the column into groups and each group returns only one row of records, so all records are already distinct.
  • ORDER BY: Sort the results in the V8 table.
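A sketch annotating a query with the logical execution order described above (tables and columns are hypothetical):

SELECT   d.dept_name, COUNT(*) AS emp_cnt      -- 7. SELECT (then 8. DISTINCT, if present)
FROM     employee e                            -- 1. FROM: build the Cartesian product
JOIN     department d                          -- 3. JOIN: add back outer rows (for OUTER JOIN)
       ON e.dept_id = d.dept_id                -- 2. ON: filter the joined rows
WHERE    e.hired_at >= '2020-01-01'            -- 4. WHERE: filter rows
GROUP BY d.dept_name                           -- 5. GROUP BY: group rows
HAVING   COUNT(*) > 10                         -- 6. HAVING: filter groups
ORDER BY emp_cnt DESC;                         -- 9. ORDER BY: sort the final result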

How to optimize SQL queries

Use index:

If no index is used during the query, the query statement will scan all records in the table. In the case of large amounts of data, the query speed will be very slow. If an index is used for query, the query statement can quickly locate the record to be queried based on the index, thereby reducing the number of records to be queried and improving the query speed.

Optimize subquery:

You can use subqueries to perform nested queries of SELECT statements, that is, the results of one SELECT query serve as the conditions for another SELECT statement. Subqueries can complete many SQL operations that logically require multiple steps to complete in one go.

Although subqueries can make query statements very flexible, their execution efficiency is not high. When executing a subquery, MySQL needs to create a temporary table for the query results of the inner query statement. The outer query statement then retrieves records from the temporary table. After the query is completed, these temporary tables are revoked. Therefore, the speed of the subquery will be affected to a certain extent. If the amount of data queried is relatively large, this impact will increase.

In MySQL, you can use join (JOIN) queries instead of subqueries. Join queries do not need to create temporary tables and are faster than subqueries. If indexes are used in the query, the performance will be better.
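A sketch of rewriting a subquery as a join, using hypothetical orders and customers tables:

-- Subquery version: MySQL may materialize the inner result into a temporary table
SELECT name FROM customers
WHERE id IN (SELECT customer_id FROM orders WHERE amount > 1000);

-- Join version: no temporary table; benefits further from an index on orders(customer_id, amount)
SELECT DISTINCT c.name
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.amount > 1000;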

Mysql delete data

Delete is used to delete all or part of the data rows in the table. After executing delete, the user needs to commit or rollback to perform the delete or undelete, which will trigger all delete triggers on the table.

DELETE FROM <table_name> [WHERE clause] [ORDER BY clause] [LIMIT clause]
DELETE FROM tb_name;   -- deletes all data in the table

Truncate deletes all data in the table. This operation cannot be rolled back and will not trigger triggers on the table. Truncate is faster than delete and takes up less space.

TRUNCATE [TABLE] table_name

The Drop command deletes the table from the database. All data rows, indexes and permissions will also be deleted. All DML triggers will not be fired, and this command cannot be rolled back.

The difference between TRUNCATE and DELETE

Logically, the TRUNCATE statement has the same effect as the DELETE statement, but in some cases, there are differences in their usage.

  • DELETE is a DML type statement; TRUNCATE is a DDL type statement. They are both used to clear the data in the table.
  • DELETE deletes records row by row; TRUNCATE drops the original table and recreates an identical empty table rather than deleting the data row by row, so it deletes data faster than DELETE. Therefore, when you need to delete all data rows in a table, prefer the TRUNCATE statement to shorten the execution time.
  • Data deleted with DELETE can be recovered by rolling back the transaction; TRUNCATE does not support transaction rollback, and the data cannot be recovered after deletion.
  • After DELETE deletes data, the system will not reset the counter of the auto-increment field; after TRUNCATE clears the table records, the system will reset the counter of the auto-increment field.
  • DELETE has a wider scope of use because it can delete part of the data by specifying conditions through the WHERE clause; TRUNCATE does not support the WHERE clause and can only clear the entire table.
  • DELETE will return the number of rows deleted, but TRUNCATE will only return 0, which is meaningless.

How to optimize MySQL?

For queries, we can improve query speed by using indexes and using joins instead of subqueries.

For slow queries, we can discover the causes of slow queries by analyzing slow query logs, so as to perform targeted optimization.

For insertion, we can improve the insertion speed by disabling indexes, disabling checks, etc., and then enabling indexes and checks after insertion.
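A hedged sketch of the "disable checks during bulk insert" idea mentioned above; ALTER TABLE ... DISABLE KEYS affects non-unique indexes on MyISAM tables, while the session variables apply generally (table name is hypothetical):

-- Before a large bulk load
SET unique_checks = 0;                 -- skip uniqueness checks during the load
SET foreign_key_checks = 0;            -- skip foreign key checks during the load
ALTER TABLE big_table DISABLE KEYS;    -- MyISAM: stop maintaining non-unique indexes

-- ... run the bulk INSERT / LOAD DATA here ...

-- After the load, restore normal behaviour
ALTER TABLE big_table ENABLE KEYS;
SET foreign_key_checks = 1;
SET unique_checks = 1;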

Regarding the database structure, we can optimize it by splitting a table with many fields into multiple tables, adding intermediate tables, and adding redundant fields.

explainWhat to focus on?

Focus on the following columns:

  • type: the join/access type of this query; it gives a rough idea of the query's efficiency.
  • key: the index finally chosen; if no index is chosen, the query is usually very inefficient.
  • key_len: the actual length of the index used for filtering in this query.
  • rows: the estimated number of records that need to be scanned; the smaller, the better.
  • Extra: additional information; mainly check whether Using filesort or Using temporary appears.

In addition, Extra columns need to pay attention to the following situations:

  • Using filesort: external sorting is used instead of reading rows in index order. When the data set is small the sort happens in memory; otherwise it has to be done on disk, which is expensive, and a suitable index should be added.
  • Using temporary: a temporary table has to be created to hold the results. This usually happens when GROUP BY is performed on columns without indexes, or the ORDER BY columns are not all in the index; appropriate indexes need to be added.
  • Using index: MySQL uses a covering index, avoiding a full table scan and a second lookup back into the table. This is one of the better results; do not confuse it with the index value of the type column.
  • Using where: usually means a full table or full index scan is performed and the WHERE clause then filters the results; appropriate indexes need to be added.
  • Impossible WHERE: the WHERE clause is always false and no rows can be selected, for example where 1 = 0; no special attention is needed.
  • Select tables optimized away: when an aggregate function such as MIN()/MAX() accesses an indexed field, the optimizer locates the required rows directly through the index and completes the whole query at once. This is also one of the better results.
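A small sketch of reading these columns; the table and index are hypothetical:

EXPLAIN SELECT name FROM employee WHERE dept_id = 3 ORDER BY hired_at;
-- Things to check in the output:
--   type:   ref/range is usually fine, ALL means a full table scan
--   key:    which index was actually chosen (NULL means none)
--   rows:   estimated rows to examine, the smaller the better
--   Extra:  watch for "Using filesort" / "Using temporary"; an index on
--           (dept_id, hired_at) would typically remove the filesort here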

What to do with tens of millions of data in the table

It is recommended to optimize in the following order:

  1. Optimize SQL and indexes;
  2. Add cache, such as memcached and redis;
  3. To separate reading and writing, master-slave replication or master-master replication can be used;
  4. Use the partition tables that come with MySQL; this is transparent to the application and requires no code changes, but the SQL statements must be optimized for the partitioned table (a partitioning sketch follows this list);
  5. Do vertical splitting, that is, divide a large system into multiple smaller systems based on how tightly the modules are coupled;
  6. When doing horizontal splitting, choose a reasonable sharding key. To keep query efficiency good, the table structure must also be changed, a certain amount of redundancy introduced, and the application changed as well: try to include the sharding key in the SQL so the data can be located in a limited set of tables instead of scanning all of them.
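A hedged sketch of step 4 (MySQL native partitioning); note that the partitioning column must be part of every unique key, so it is included in the primary key here (the table is hypothetical):

CREATE TABLE orders (
    id          BIGINT UNSIGNED NOT NULL,
    customer_id BIGINT UNSIGNED NOT NULL,
    created_at  DATE NOT NULL,
    amount      DECIMAL(10, 2),
    PRIMARY KEY (id, created_at)              -- the partition column must be in the primary key
) ENGINE = InnoDB
PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2021 VALUES LESS THAN (2022),
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);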

Do you know about MySQL’s slow query optimization?

To optimize MySQL's slow queries, you can follow the following steps:

Enable slow query log:

The slow query log in MySQL is turned off by default. It can be turned on through the log-slow-queries option in the configuration file my.ini or my.cnf, or by passing --log-slow-queries[=file_name] when the MySQL service is started.

When starting the slow query log, you need to configure the long_query_time option in the my.ini or my.cnf file to specify the recording threshold. If the query time of a certain query statement exceeds this value, the query process will be recorded in the slow query log file.
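A sketch of enabling the slow query log at runtime; newer MySQL versions use the slow_query_log / long_query_time variable names shown here, and the log file path is hypothetical:

SET GLOBAL slow_query_log = 'ON';                             -- enable the slow query log
SET GLOBAL long_query_time = 2;                               -- record queries slower than 2 seconds
SET GLOBAL slow_query_log_file = '/var/lib/mysql/slow.log';   -- hypothetical path

SHOW VARIABLES LIKE 'slow_query_log%';                        -- confirm the settings
SHOW VARIABLES LIKE 'long_query_time';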

Analyze slow query logs:

Directly analyze the MySQL slow query log, and use the explain keyword to simulate the optimizer's execution of SQL query statements to analyze the SQL slow query statements.

Common slow query optimization:

  1. When the index doesn't work

    • In the query statement using the LIKE keyword, if the first character of the matching string is "%", the index will not work. The index will only work if "%" is not in the first position.
    • MySQL can create indexes for multiple fields. An index can include 16 fields. For multi-column indexes, the index will be used only if the first field among these fields is used in the query condition.
    • When the query condition contains the OR keyword, the index is used only if the columns in both conditions around the OR are indexed; otherwise the query will not use an index.
  2. Optimize database structure

    • For tables with many fields, if some fields are rarely used, these fields can be separated to form a new table. Because when a table has a large amount of data, it will slow down due to the presence of infrequently used fields.
    • For tables that require frequent joint queries, intermediate tables can be established to improve query efficiency. By establishing an intermediate table, insert data that requires frequent joint queries into the intermediate table, and then change the original joint query to a query on the intermediate table to improve query efficiency.
  3. Decompose related queries

    Many high-performance applications will decompose the correlation query, that is, a single table query can be performed on each table, and then the query results will be correlated in the application. This will be more efficient in many scenarios.

  4. Optimize LIMIT paging

    When the offset is very large, for example a query such as limit 10000,20, MySQL needs to fetch 10020 records and then return only the last 20, discarding the first 10000, which is very costly. One of the simplest ways to optimize such queries is to use a covering index scan whenever possible rather than querying all columns, then join back as needed to return the required columns, as sketched below. When the offset is large this greatly improves efficiency.
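A sketch of the covering-index / deferred-join rewrite for deep offsets, assuming a hypothetical table t with primary key id:

-- Costly: reads and discards the first 10000 full rows
SELECT * FROM t ORDER BY id LIMIT 10000, 20;

-- Better: page through the covering primary-key index first, then join back for the full rows
SELECT t.*
FROM t
JOIN (SELECT id FROM t ORDER BY id LIMIT 10000, 20) AS page
  ON t.id = page.id;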


Mysql query statement locking

lock in share mode adds a shared lock

for update adds exclusive lock
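A sketch of both locking reads inside a transaction (the table is hypothetical):

START TRANSACTION;
SELECT stock FROM product WHERE id = 1 LOCK IN SHARE MODE;  -- shared (S) lock
-- or, when the row will be updated afterwards:
SELECT stock FROM product WHERE id = 1 FOR UPDATE;          -- exclusive (X) lock
UPDATE product SET stock = stock - 1 WHERE id = 1;
COMMIT;   -- locks are released when the transaction ends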

leftmost matching principle

When MySQL builds a composite (joint) index it follows the leftmost-prefix matching principle, that is, leftmost first: when retrieving data, matching starts from the leftmost column of the composite index. In other words, if your SQL statement uses the leftmost column(s) of the composite index in its conditions, the statement can use that composite index.
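A sketch of the leftmost-prefix rule with a hypothetical composite index on (a, b, c):

ALTER TABLE t ADD INDEX idx_abc (a, b, c);

-- Can use the composite index (the leftmost column a appears in the condition):
SELECT * FROM t WHERE a = 1;
SELECT * FROM t WHERE a = 1 AND b = 2;
SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;

-- Cannot use it (the leftmost column a is missing from the condition):
SELECT * FROM t WHERE b = 2 AND c = 3;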

Cache and database consistency

Delete redis first, then update the database; or update the database first, then delete redis. Each has its own benefits. If you choose the first approach, you usually combine it with delayed double deletion and binlog subscription; if you choose the second, the implementation is simple, but there is short-term inconsistency.

1) Delete the cache first, then update the database

If two threads want to "read and write" data concurrently, the following scenario may occur:

  1. Thread A wants to update X = 2 (original value X = 1)
  2. Thread A deletes the cache first
  3. Thread B reads the cache and finds that it does not exist, and reads the old value from the database (X = 1)
  4. Thread A writes the new value to the database (X = 2)
  5. Thread B writes the old value to cache (X = 1)

The final value of X is 1 (old value) in the cache and 2 (new value) in the database, causing inconsistency.

It can be seen that if the cache is deleted first and then the database is updated, when "read + write" concurrency occurs, data inconsistency still exists.

2) Update the database first, then delete the cache

It is still 2 threads "reading and writing" data concurrently:

  1. X does not exist in cache (database X = 1)
  2. Thread A reads the database and gets the old value (X = 1)
  3. Thread B updates the database (X = 2)
  4. Thread B deletes cache
  5. Thread A writes the old value to cache (X = 1)

The final value of X is 1 (old value) in the cache and 2 (new value) in the database, which is also inconsistent.

This situation is "theoretically" possible, but is it really possible?

In fact, the probability is "very low" because it must meet three conditions:

  1. The cache just expired
  2. Read request + write request concurrency
  3. The time to update the database + delete the cache (steps 3-4) is shorter than the time to read the database + write the cache (steps 2 and 5)

If you think about it carefully, the probability of condition 3 happening is actually very low.

Because writing to the database is usually "locked" first, writing to the database usually takes longer than reading the database.

From this point of view, the solution of "first updating the database + then deleting the cache" can ensure data consistency.

Therefore, we should adopt this solution to operate the database and cache.

Okay, now that the concurrency problem has been solved, let’s continue to look at the remaining problem of data inconsistency caused by the “failure” of the second step execution .

How to ensure the success of the second step

Retry asynchronously, subscribe to the database change log, and then operate the cache

Specifically, when our business application modifies data, it "only" needs to modify the database and does not need to operate the cache.

So when should the cache be operated? This is related to the "change log" of the database.

Take MySQL as an example. When a piece of data is modified, MySQL will generate a change log (Binlog). We can subscribe to this log, get the data of the specific operation, and then delete the corresponding cache based on this data.


To subscribe to the change log, there are now relatively mature open source middlewares, such as Alibaba’s canal. The advantages of using this solution are:

  • There is no need to worry about failing to write to the message queue: as long as the write to MySQL succeeds, the binlog is guaranteed to be generated
  • Automatically deliver to the downstream queue : canal automatically "delivers" the database change log to the downstream message queue

Of course, at the same time, we need to invest energy in maintaining the high availability and stability of canal.

At this point, we can conclude that in order to ensure the consistency of the database and cache, it is recommended to adopt the "update the database first, then delete the cache" solution and cooperate with the "Message Queue" or "Subscribe to the Change Log" method .

Master-slave library delay and delayed double deletion issues

At this point, there are still two issues that we have not focused on.

The first question is , do you still remember the "delete the cache first, then update the database" solution mentioned earlier, which leads to inconsistency scenarios?

Here I will bring you another example for you to review:

If two threads want to "read and write" data concurrently, the following scenarios may occur:

  1. Thread A wants to update X = 2 (original value X = 1)
  2. Thread A deletes the cache first
  3. Thread B reads the cache and finds that it does not exist, and reads the old value from the database (X = 1)
  4. Thread A writes the new value to the database (X = 2)
  5. Thread B writes the old value to cache (X = 1)

The final value of X is 1 (old value) in the cache and 2 (new value) in the database, causing inconsistency.

The second question : It is about the issue of cache and database consistency in the case of "read-write separation + master-slave replication delay".

In the "update the database first, then delete the cache" solution, "read-write separation + master-slave library delay" will actually lead to inconsistency:

  1. Thread A updates the main library X = 2 (original value X = 1)
  2. Thread A deletes cache
  3. Thread B queries the cache, but there is no hit. It queries the "slave library" to get the old value (slave library X = 1)
  4. The slave database "synchronization" is completed (master-slave database X = 2)
  5. Thread B writes the "old value" to the cache (X = 1)

The final value of X is 1 (the old value) in the cache and 2 (the new value) in the database, which is also inconsistent.

Did you see it? The core of these two problems is that the cache has been restored to "old values" .

So how to solve this kind of problem?

The most effective way is to delete the cache .

However, it cannot be deleted immediately, but "delayed deletion" is required. This is the solution given by the industry: cache delayed double deletion strategy .

According to the delayed double deletion strategy, the solutions to these two problems are as follows:

Solving the first problem: after thread A deletes the cache and updates the database, it "sleeps for a while" and then deletes the cache again.

Solving the second problem: thread A can generate a "delayed message" and write it to the message queue, and the consumer deletes the cache after the delay.

The purpose of these two solutions is to clear the cache, so that the latest value can be read from the database and written to the cache next time.

But here comes the question, how long does it take to set the delay time for this "delayed deletion" cache?

  • For problem 1: the delay time must be greater than the master-slave replication delay
  • For problem 2: the delay time must be greater than the time for thread B to read the database and write to the cache

However, this time is actually difficult to evaluate in distributed and high-concurrency scenarios .

Many times, we roughly estimate this delay time based on experience, such as a delay of 1-5 seconds, which can only reduce the probability of inconsistency as much as possible.

So you see, using this solution is just to ensure consistency as much as possible. In extreme cases, inconsistencies may still occur.

Therefore, in actual use, I still recommend that you adopt the solution of "update the database first, then delete the cache". At the same time, you should try your best to ensure that the "master-slave replication" does not have too much delay to reduce the probability of problems.

Summarize

Regarding the issue of inconsistency between the cache and the database, two methods can be adopted. The first is to delete the cache first, and then update the database. However, inconsistencies may occur under concurrent conditions. You can use the strategy of delayed double deletion, which is the basic idea of ​​delayed double deletion. as follows:

  1. delete cache;
  2. Update database;
  3. sleep N milliseconds;
  4. Delete cache again.

After blocking for a period of time, delete the cache again to delete the inconsistent data in the cache during this process. As for the specific time, you need to evaluate the approximate time of your business and set it according to this time.

If you update the database first and then delete the cache, in order to ensure that both steps are executed successfully, you need to cooperate with the "message queue" or "subscription change log" solution. The essence is to ensure data consistency through "retry";

Subscribing to binlog When a piece of data is modified, MySQL will generate a change log (Binlog). We can subscribe to this log to get the data of the specific operation, and then delete the corresponding cache based on this data.


In case of cache deletion failure: Add cache update retry mechanism (commonly used): If the cache service is currently unavailable and cache deletion fails, we will retry after a period of time. The number of retries can be set by yourself. If it still fails after multiple retries, we can store the currently updated key that failed in the queue, and then delete the corresponding key in the cache after the cache service is available.

Under the "update the database first, then delete the cache" solution, "read-write separation + master-slave library delay" will also cause inconsistency between the cache and the database. The solution to alleviate this problem is "delayed double deletion" and sending "delayed messages" based on experience into the queue, delay deletion of the cache, and also control the master-slave library delay to reduce the probability of inconsistency as much as possible.

Use the approach of updating the database first and then deleting the cache, plus canal + a message queue.


The process is shown in the figure below:
(1) Update the database data
(2) The database will write the operation information into the binlog log
(3) The subscription program extracts the required data and key
(4) Start a new section of non-business code to obtain the information
(5) Try to delete the cache operation and find that the deletion failed.
(6) Send the information to the message queue.
(7) Obtain the data from the message queue again and try the operation again.

MySQL's locking mechanism

  1. "Table lock" : It is the lock with the largest granularity. It means that the current operation locks the entire table. The overhead is small, the lock is fast, and deadlock will not occur. However, because the granularity is too large, the probability of lock conflict is high and concurrency is high. Low performance. In Mysql** "MyISAM storage engine supports table locks" . There are two table lock modes of MyISAM: "table shared read lock" and "table exclusive write lock"**. Table locks are supported by most mysql engines, and both MyISAM and InnoDB support table-level locks.

  2. The granularity of "page lock" is a kind of lock between row lock and table lock.

  3. **"Row lock"** is the locking mechanism with the smallest granularity. Row lock has high locking overhead, slow locking, and deadlock. However, row lock has low probability of lock conflict and high concurrency performance. Row lock is the default lock mechanism supported by InnoDB. MyISAM does not support row lock. This is also one of the differences between InnoDB and MyISAM.

In terms of usage, row locks can be divided into shared read locks (S locks) and exclusive write locks (X locks).

When a transaction adds a read lock to a data row in MySQL, the current transaction cannot modify the row and can only read it, and other transactions can only add read locks to that row, not write locks. If a transaction adds a write lock to a row, that transaction can both read and write the row, while other transactions cannot add any lock to the row and can neither read nor write it.


Optimistic locking and pessimistic locking

  1. Optimistic locking: optimistic locking is not provided by the database itself; we implement it ourselves. It takes an optimistic view of update operations, assuming they will not conflict, so no special handling (no locking) is done while operating on the data, and the conflict check happens at update time. The usual implementation: add a version field to the table and increase the version of a record by 1 on every update. First query the record and obtain its version; when updating, check whether the current version equals the version read earlier. If they are equal, no other program has modified the record in the meantime, so the update is performed and the version field is incremented by 1; if the version values differ, another program has already modified the record, and the update is not performed.

  2. Pessimistic locking: the implementation of pessimistic locking usually relies on the locking mechanism provided by the database. To use it, first execute set autocommit = 0 to turn off MySQL's autocommit attribute, because the data must be locked right after it is queried; with autocommit off, the transaction must be opened and committed manually.

    MySQL uses shared locks and exclusive locks to implement pessimistic locks, select...lock in share mode (shared lock), select...for update (exclusive lock)

Summary: Use optimistic locking for reading and pessimistic locking for writing.
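A hedged sketch of both approaches on a hypothetical account table with a version column:

-- Optimistic locking: no lock is held; the update succeeds only if the version is unchanged
SELECT balance, version FROM account WHERE id = 1;           -- suppose it returns version = 7
UPDATE account
SET    balance = balance - 100, version = version + 1
WHERE  id = 1 AND version = 7;                                -- 0 rows affected => someone else updated first

-- Pessimistic locking: lock the row up front inside a transaction
SET autocommit = 0;
START TRANSACTION;
SELECT balance FROM account WHERE id = 1 FOR UPDATE;          -- exclusive lock on the row
UPDATE account SET balance = balance - 100 WHERE id = 1;
COMMIT;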

Data redundancy

Data redundancy: Duplicate data in a data set is called data redundancy

For example, when designing a database, if a certain field belongs to one table, but it appears in another or multiple tables at the same time, and is completely equivalent to its meaning in the table to which it originally belongs, then this field is a redundant field.

shortcoming:

  • Waste of storage space.
  • Large system overhead and high maintenance costs
  • Data consistency is difficult to ensure: When redundant fields are used too much, data consistency is difficult to ensure. In order to ensure consistency, a large performance overhead will be incurred. If some redundant fields are manually maintained (developers) , data consistency will be difficult to guarantee.

benefit:

  • Use space for time to improve query speed
  • Can be used for data recovery

What should I do after Mysql crashes? Which log should be used to restore it? Do you know how to restore it specifically? two-phase commit

Use redo log for recovery


  • The first phase of the two-phase commit (prepare phase): write the redo log and mark it as prepare status.
  • Then write the binlog.
  • The second phase of the two-phase commit (commit phase): mark the redo log as commit status.

Two-phase commit

1. In the prepare stage, write redo log;

2. In the commit phase, write binlog and change the status of redo log to commit status;

During the MySQL crash recovery process, transaction rollback will be done based on the redo log and binlog records:

1. If both redo log and binlog exist and are logically consistent, then commit the transaction;

2. If the redo log exists but the binlog does not exist, which is logically inconsistent, then the transaction will be rolled back;

Finally, note that the two-phase commit described here only exists when both the redo log and the binlog are involved. When the binlog is not enabled, committed transactions are written directly to the redo log, so whether the redo log commit happens in two phases depends on the scenario.

Two-phase commit in detail

During the execution of an update statement, two logs are written within the transaction: the redo log and the binlog. The redo log can be written continuously while the transaction is executing, while the binlog is only written when the transaction is committed, so the write timing of the redo log and the binlog is not the same.


If you accidentally delete a table, do you know how to restore it?

Find the binlog file. The binlog file currently in use contains the data we want to recover. In production, binlog files are usually hundreds of MB or even several GB in size, so we cannot locate the deleted data line by line; it is therefore important to remember the time of the mis-operation. We can use the --start-datetime parameter of the mysqlbinlog command to quickly locate the position of the data.

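A hedged sketch of narrowing down the position inside the binlog from the SQL side (the file name is hypothetical); the actual extraction and replay are then done with the mysqlbinlog command-line tool and its --start-datetime / --stop-datetime options:

SHOW BINARY LOGS;                        -- list the binlog files and their sizes
SHOW MASTER STATUS;                      -- which binlog file is currently being written
SHOW BINLOG EVENTS IN 'binlog.000042'    -- hypothetical file name
    LIMIT 100;                           -- browse events to narrow down the position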

What is a transaction? Usage scenarios of transactions

A transaction is a set of logical operations, either all succeed or all fail.

Scenario 1: If in actual business, a piece of data needs to be stored in two tables at the same time, and the data in the two tables are required to be synchronized, then a transaction management mechanism needs to be used to ensure data synchronization. If an error occurs, for example, data insertion into table 1 is successful but data insertion into table 2 fails, then it will be rolled back and the data persistence operation will be terminated.

Scenario 2: Software development in the financial industry places strict emphasis on transaction processing. For example, in our common transfer operations, the amount of one party's account decreases, corresponding to the increase in the amount of the other party's account. This process requires the use of a transaction mechanism, otherwise the transfer cannot be successful.
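A sketch of scenario 2, a transfer wrapped in a transaction on a hypothetical account table:

START TRANSACTION;
UPDATE account SET balance = balance - 500 WHERE id = 1;   -- debit the sender
UPDATE account SET balance = balance + 500 WHERE id = 2;   -- credit the receiver
-- If both updates succeed:
COMMIT;
-- If anything fails in between, undo both changes instead:
-- ROLLBACK;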


transaction isolation level


  1. Dirty reading: The current transaction (A) can read uncommitted data (dirty data) from other transactions (B). This phenomenon is dirty reading.
  2. Non-repeatable read: The same data is read twice in transaction A, and the results of the two reads are different. This phenomenon is called non-repeatable read. The difference between dirty read and non-repeatable read is that the former reads uncommitted data by other transactions, while the latter reads data that has been submitted by other transactions.
  3. Phantom reading: In transaction A, the database is queried twice according to a certain condition, and the number of results of the two queries is different. This phenomenon is called phantom reading. The difference between non-repeatable reading and phantom reading can be easily understood as: the former means that the data has changed, and the latter means that the number of rows of data has changed.
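A short sketch of checking and changing the isolation level that governs which of these anomalies can occur; the variable is named transaction_isolation in MySQL 8.0 (tx_isolation in older 5.x versions):

SELECT @@transaction_isolation;                           -- InnoDB default is REPEATABLE-READ
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT @@transaction_isolation;                           -- now READ-COMMITTED for this session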

MVCC

MVCC stands for multi-version concurrency control. It maintains multiple versions of each piece of data per transaction so that read-write conflicts in concurrent transactions can be resolved, and it uses snapshot reads to provide non-blocking reads, making it a lock-free concurrency-control mechanism for resolving read-write conflicts.

(1). Hidden columns: There are hidden columns in each row of InnoDB data. The hidden columns contain the transaction ID of the data in this row, the pointer to the undo log, etc.;

(2). Version chain based on undo log: the hidden columns of each data row contain a pointer to an undo log, and each undo log in turn points to an earlier undo log, forming a version chain of undo logs;

(3). ReadView: through the hidden columns and the version chain, MySQL can restore the data to a specified version, but which version to restore to is determined by the ReadView. A ReadView is a snapshot that the transaction takes of the entire transaction system (trx_sys) at a certain moment; when reads are performed later, the transaction IDs in the data being read are compared against the trx_sys snapshot to determine whether that version of the data is visible to the ReadView, that is, whether it is visible to the current transaction.

ACID and locks

A: Atomicity. The key to achieving atomicity is the undo log. When a transaction modifies the database, the modified operations will be recorded in the undo log. If the transaction fails or rolls back, then the undo log will be read. information to recover

D: Durability. The key to achieving durability is the redo log. When data is modified, the redo log records the modification. When the transaction is committed, the fsync interface is called to flush the redo log. If MySQL crashes, the data in the redo log can be read on restart to restore the database, and redo logs are written sequentially.

I: Isolation, transactions do not interfere with each other in concurrent situations

The first aspect is the impact of write operations (of one transaction) on write operations of (another transaction): the lock mechanism ensures isolation.

Isolation is achieved through locks. Before a transaction can modify data, it needs to obtain the corresponding lock. After acquiring the lock, the transaction can modify the data. During the transaction operation, this part of the data is locked. If other transactions need to modify the data, they need to wait for the current transaction to commit or roll back to release the lock.

The second aspect is the impact of write operations (of one transaction) on read operations of (another transaction): MVCC guarantees isolation.

MVCC stands for Multi-Version Concurrency Control, which is a multi-version concurrency control protocol. Its biggest advantage is that reading is not locked, so there is no conflict between reading and writing, and the concurrency performance is good.

Principle: maintain multiple versions of each piece of data; snapshot reads provide non-blocking reads, which is how MySQL implements MVCC.

  1. Hidden columns: Each row of data in InnoDB has hidden columns. The hidden columns include the transaction ID of the row of data, the pointer to the undo log, etc.
  2. Version chain based on undo log: The hidden column of each row of data contains a pointer to the undo log, and each undo log also points to an earlier version of the undo log, thus forming a version chain.
  3. ReadView: By hiding columns and version chains, MySQL can restore data to a specified version. But which version to restore to specifically needs to be determined based on ReadView. The so-called ReadView means that a transaction (recorded as transaction A) takes a snapshot of the entire transaction system (trx_sys) at a certain moment. When a read operation is performed later, the transaction ID in the read data will be compared with the trx_sys snapshot, thus Determine whether the data is visible to the ReadView, that is, whether it is visible to transaction A.

C: Consistency means that the integrity constraints of the database are not violated before and after the transaction.

It can be said that consistency is the ultimate goal pursued by transactions. The atomicity, persistence and isolation mentioned earlier are all to ensure the consistency of the database state. In addition, in addition to guarantees at the database level, the implementation of consistency also requires guarantees at the application level. Measures to achieve consistency include:

  • Guarantee atomicity, durability, and isolation. If these characteristics cannot be guaranteed, transaction consistency cannot be guaranteed.
  • The database itself provides guarantees, for example, it is not allowed to insert string values ​​into integer columns, and the string length cannot exceed the column limit, etc.
  • Guarantee at the application level. For example, if the transfer operation only deducts the balance of the transferor but does not increase the balance of the recipient, no matter how perfect the database is implemented, the status cannot be guaranteed to be consistent.

gap lock

Gap locks are used to lock a range, but not the records themselves. Its purpose is to prevent multiple transactions from inserting records into the same range, which can lead to phantom read problems.


MVCC + Next-Key Lock prevents phantom reads

At the RR (REPEATABLE READ) level, the InnoDB storage engine solves the phantom read problem through MVCC and Next-Key Locks:

1. For an ordinary select, MVCC reads the data as a snapshot read.

Under snapshot reads, the RR isolation level generates a Read View only on the first query after the transaction starts and keeps using it until the transaction commits. Therefore, record versions updated or inserted by other transactions after the Read View was generated are not visible to the current transaction, which achieves repeatable reads and prevents "phantom reads" under snapshot reads.

2. For current reads such as select ... for update / lock in share mode, insert, update and delete

A current read always reads the latest data. If another transaction inserts new records that happen to fall within the query range of the current transaction, phantom reads occur. InnoDB uses Next-Key Locks to prevent this: when executing a current read, it locks not only the records that are read but also the gaps around them, so other transactions cannot insert data within the query range. Since no inserts are allowed into the range, phantom reads cannot happen.

When is gap lock used?

When the query uses only a unique index and locks a single existing record, InnoDB uses an ordinary record (row) lock.

Next-Key Locks are used when the query uses a unique index but the search condition is a range, or when the lookup is for a unique value that does not exist (an attempt to lock non-existent data).

When an ordinary (non-unique) index is used, any locking query will generate gap locks.

When a unique index and an ordinary index are used together, gap locks are also generated, because the rows are sorted first by the ordinary index and then by the unique index.
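A small sketch of these cases, assuming a hypothetical t_user table with a primary key and an ordinary index:

```sql
-- Hypothetical table: id is the primary key, age has an ordinary (non-unique) index
CREATE TABLE t_user (id INT PRIMARY KEY, age INT, KEY idx_age (age)) ENGINE=InnoDB;
INSERT INTO t_user VALUES (1, 10), (5, 20), (10, 30);

-- 1) Unique index, single existing record: plain record lock
SELECT * FROM t_user WHERE id = 5 FOR UPDATE;

-- 2) Unique index, range condition or non-existent value: next-key / gap locks
SELECT * FROM t_user WHERE id BETWEEN 5 AND 10 FOR UPDATE;
SELECT * FROM t_user WHERE id = 7 FOR UPDATE;    -- locks the gap between 5 and 10

-- 3) Ordinary index: gap locks around the matched entries
SELECT * FROM t_user WHERE age = 20 FOR UPDATE;  -- blocks inserts of age values in the surrounding gaps
```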

Tell me about your understanding of redo log, undo log, and binlog

binlog (Binary Log)

The binary log file is usually referred to as the binlog. It records all operations that modify the MySQL database, stored in binary form in the log file, together with the execution time and resources consumed by each statement and related transaction information.

Binary logging is enabled by default as of MySQL 8.0 (in earlier versions it must be enabled explicitly). You can configure the --log-bin[=file_name] option at startup to change the directory and file name in which the binary log is stored.
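A quick way to check the binlog state from a client session (the log-bin option itself is set at server startup, typically in my.cnf):

```sql
-- Check whether binary logging is on and where the log files are
SHOW VARIABLES LIKE 'log_bin%';
SHOW MASTER STATUS;   -- current binlog file name and position
SHOW BINARY LOGS;     -- list of existing binlog files
```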

redo log

Redo logs are used to achieve transaction durability, which is the D in transaction ACID. It consists of two parts: one is the redo log buffer in memory, which is volatile; the other is the redo log file, which is persistent.

InnoDB is a transactional storage engine. It achieves transaction durability through the Force Log at Commit mechanism: when a transaction commits (COMMIT), all of the transaction's logs must first be written to the redo log file for persistence, and only then is the COMMIT considered complete. The log here refers to the redo log; in the InnoDB storage engine it consists of two parts, the redo log and the undo log.

The redo log guarantees the durability of transactions, while the undo log supports transaction rollback and MVCC. Redo logs are basically written sequentially, and the redo log files do not need to be read during normal operation; the undo log, in contrast, requires random reads and writes.

undo log

The redo log records the behavior of the transaction and can be used to "redo" the page. However, transactions sometimes need to be rolled back, and undo is needed in this case. Therefore, when modifying the database, the InnoDB storage engine will not only generate redo, but also generate a certain amount of undo. In this way, if the transaction or statement executed by the user fails for some reason, or the user uses a ROLLBACK statement to request a rollback, the undo information can be used to roll back the data to the way it was before modification.

Redo is stored in the redo log file. Unlike redo, undo is stored in a special segment inside the database. This segment is called the undo segment, and the undo segment is located in the shared table space.

How does the database ensure consistency?

Divided into two levels.

  • From the database level: the database ensures consistency through atomicity, isolation and durability. Among the four ACID properties, C (consistency) is the goal, while A (atomicity), I (isolation) and D (durability) are the means the database provides to achieve it; only by implementing A, I and D can consistency be achieved. For example, if atomicity is not guaranteed, consistency obviously cannot be guaranteed either.
  • From the application level: the application code checks whether the data is valid and then decides whether to commit or roll back the transaction.

How does the database ensure atomicity?

Mainly by means of InnoDB's undo log. The undo log, also called the rollback log, is the key to atomicity: when a transaction is rolled back, it can undo all SQL statements that have already executed successfully. To do that, it records the information needed to reverse each change. For example:

  • When you delete a piece of data, you need to record the information of the data. When rolling back, insert the old data.
  • When you update a piece of data, you need to record the old value. When rolling back, perform the update operation based on the old value.
  • When inserting a piece of data, the primary key of this record is needed. When rolling back, the delete operation is performed based on the primary key.

The undo log records the information required for rollback. When the transaction execution fails or rollback is called, causing the transaction to be rolled back, the information in the undo log can be used to roll back the data to the way it was before modification.
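A minimal sketch with a hypothetical table, showing the all-or-nothing behavior that the undo log makes possible:

```sql
-- Hypothetical table used only for illustration
CREATE TABLE order_tab (id INT PRIMARY KEY, amount INT) ENGINE=InnoDB;

START TRANSACTION;
INSERT INTO order_tab VALUES (1, 100);            -- undo log records the primary key; rollback becomes a delete
UPDATE order_tab SET amount = 200 WHERE id = 1;   -- undo log records the old value 100
ROLLBACK;                                         -- all changes are reversed using the undo log

SELECT * FROM order_tab;                          -- empty result: atomicity is preserved
```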

How does the database ensure durability?

Mainly by means of InnoDB's redo log. As mentioned before, MySQL first loads data pages from disk into memory, modifies the data in memory, and then writes it back to disk. If the machine crashes at that moment, the modified data in memory is lost. How do we solve this problem? A naive answer is to write the data directly to disk before the transaction commits. What is wrong with doing that?

  • To modify even a single byte in a page, the entire page must be flushed to disk, which wastes resources: a page is 16KB, so changing a tiny part of it still flushes 16KB to disk.
  • The SQL in one transaction may modify many data pages, and those pages are usually not adjacent, so the writes are random IO, which is obviously slower.

Therefore, the redo log is used to solve this problem. When data is modified, the change is not only applied in memory but also recorded in the redo log. When the transaction commits, the redo log is flushed (part of the redo log lives in memory and part on disk). If the database crashes and restarts, the contents of the redo log are replayed into the database, and then the data is rolled back or committed based on the undo log and the binlog.

What are the benefits of using redo log?

In fact, the advantage is that flushing the redo log is more efficient than flushing the data page. The specific performance is as follows:

  • A redo log record is small: it only records which page was modified and how, so it can be flushed quickly.
  • The redo log is appended to the end and belongs to sequential IO. The efficiency is obviously faster than random IO.

Database left and right joins

Left join: returns all records from the left table plus the records from the right table that satisfy the join condition.

Right join: returns all records from the right table plus the records from the left table that satisfy the join condition.

Full outer join: returns all records from both tables, matched where the join condition is met.

Inner join: returns only the records from the left and right tables that satisfy the join condition.
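A sketch with hypothetical t_user and t_order tables; note that MySQL has no native FULL OUTER JOIN, so it is usually emulated with a UNION of the two outer joins:

```sql
-- Left join: all users, with their orders where present
SELECT u.name, o.amount
FROM t_user u LEFT JOIN t_order o ON o.user_id = u.id;

-- Right join: all orders, with their users where present
SELECT u.name, o.amount
FROM t_user u RIGHT JOIN t_order o ON o.user_id = u.id;

-- Inner join: only matched rows
SELECT u.name, o.amount
FROM t_user u INNER JOIN t_order o ON o.user_id = u.id;

-- Full outer join emulated with UNION
SELECT u.name, o.amount FROM t_user u LEFT JOIN  t_order o ON o.user_id = u.id
UNION
SELECT u.name, o.amount FROM t_user u RIGHT JOIN t_order o ON o.user_id = u.id;
```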

Mysql query statement locking

lock in share mode adds a shared (S) lock.

for update adds an exclusive (X) lock.
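A short sketch against the hypothetical t_user table; in MySQL 8.0, FOR SHARE is the newer spelling of LOCK IN SHARE MODE:

```sql
START TRANSACTION;
-- Shared (S) lock: other transactions can still read the rows but cannot modify them
SELECT * FROM t_user WHERE id = 1 LOCK IN SHARE MODE;

-- Exclusive (X) lock: other transactions can neither lock nor modify the rows until this transaction ends
SELECT * FROM t_user WHERE id = 1 FOR UPDATE;
COMMIT;
```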

What is a transaction? Usage scenarios of transactions

A transaction is a set of logical operations, either all succeed or all fail.

Scenario 1: In actual business, if a piece of data needs to be written to two tables at the same time and the data in the two tables must stay in sync, a transaction management mechanism is needed to guarantee it. If an error occurs, for example the insert into table 1 succeeds but the insert into table 2 fails, the whole operation is rolled back and persistence is aborted.

Scenario 2: Software development in the financial industry places strict emphasis on transaction processing. For example, in our common transfer operations, the amount of one party's account decreases, corresponding to the increase in the amount of the other party's account. This process requires the use of a transaction mechanism, otherwise the transfer cannot be successful.
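A sketch of the transfer scenario, again using the hypothetical account table: either both updates take effect or neither does.

```sql
START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE id = 1;  -- payer's balance decreases
UPDATE account SET balance = balance + 100 WHERE id = 2;  -- payee's balance increases
-- if either statement fails, issue ROLLBACK instead of COMMIT
COMMIT;
```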


Mysql commonly used data types

Numeric type

Common numeric types: TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL.

date type

Common date and time types: DATE, TIME, YEAR, DATETIME, TIMESTAMP.

string type

Common string types: CHAR, VARCHAR, TINYTEXT/TEXT/MEDIUMTEXT/LONGTEXT, BLOB, ENUM, SET.

MySQL data types | Novice Tutorial (runoob.com)
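A small illustrative table pulling these together (table and column names are hypothetical, not from the original tables):

```sql
CREATE TABLE t_product (
  id         BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,   -- integer type
  price      DECIMAL(10, 2) NOT NULL,                      -- exact fixed-point, suitable for money
  weight     FLOAT,                                        -- approximate floating point
  name       VARCHAR(100) NOT NULL,                        -- variable-length string
  sku        CHAR(12),                                     -- fixed-length string
  detail     TEXT,                                         -- long text
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP,           -- date/time type
  on_sale    TINYINT(1) DEFAULT 1                          -- commonly used as a boolean flag
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```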

Large table optimization

Read and write separation

Read-write separation is also a commonly used optimization: reads go to the slave (replica) libraries and writes go to the master library. In general, do not use dual-master or multi-master setups, which introduce a lot of complexity; try the other solutions in this article first to improve performance. Many current sharding solutions also take read-write separation into account.

cache

split vertically

Vertical splitting is based on the correlation between the tables in a database. For example, if a database contains both user data and order data, a vertical database split puts the user data into a user library and the order data into an order library. Vertical table splitting splits a single table by columns; a common approach is to split a wide table into a table of frequently used fields and a table of rarely used fields. After the split, the tables usually contain the same number of rows but different columns, and they are associated through the primary key.
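A sketch of a vertical table split, with hypothetical table and column names:

```sql
-- Frequently used fields
CREATE TABLE t_user_base (
  id       BIGINT PRIMARY KEY,
  username VARCHAR(50),
  password VARCHAR(100)
) ENGINE=InnoDB;

-- Rarely used / large fields, keyed by the same primary key
CREATE TABLE t_user_extra (
  id      BIGINT PRIMARY KEY,
  address VARCHAR(255),
  profile TEXT
) ENGINE=InnoDB;

-- When both parts are occasionally needed, join them on the primary key
SELECT b.username, e.address
FROM t_user_base b JOIN t_user_extra e ON e.id = b.id
WHERE b.id = 1;
```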

split horizontally

Horizontal splitting stores data in shards according to some strategy; it comes in two forms, table sharding and database sharding, and each shard of data is distributed to a different MySQL table or library. It is usually combined with master-slave replication and read-write separation:

  1. Deploy multiple databases, select one of them as the master database, and one or more other databases as slave databases.
  2. Ensure that the data between the master database and the slave database are synchronized in real time. This process is what we often call master-slave replication .
  3. The system hands write requests to the master database for processing, and read requests to the slave database for processing.

The principle of master-slave replication

MySQL's binlog (binary log) records all changes to the data in the MySQL database (all DDL and DML statements executed). Therefore, the master library's data can be synchronized to the slave library based on the master's binlog.

  1. The main library writes changes to its data into the binlog.
  2. The slave library connects to the main library.
  3. The slave library creates an I/O thread that requests the updated binlog from the main library.
  4. The main library creates a binlog dump thread to send the binlog, and the slave library's I/O thread receives it.
  5. The slave library's I/O thread writes the received binlog into the relay log.
  6. The slave library's SQL thread reads the relay log and replays it locally (that is, executes the SQL again) to synchronize the data.

canal tool: The principle of canal is to simulate the MySQL master-slave replication process, parse binlog and synchronize data to other data sources.

The working principle of replication is not complicated: it is essentially a restore of a full backup plus binary log backups, except that the binary log is applied more or less in real time. What needs special attention is that replication is not fully synchronous in real time but asynchronous: there is an execution delay between the master and slave servers, and if the master is under heavy load the delay can become large. The slave server has two threads: an I/O thread, which reads the master's binary log and saves it as the relay log, and a SQL thread, which replays the relay log to perform the replication.
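A minimal setup sketch for the slave side (classic replication syntax; host, credentials, file name and position are placeholders taken from SHOW MASTER STATUS on the master):

```sql
CHANGE MASTER TO
  MASTER_HOST = '192.168.1.10',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '***',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;
START SLAVE;
SHOW SLAVE STATUS\G  -- Slave_IO_Running and Slave_SQL_Running should both be Yes
```

(Newer MySQL versions also provide CHANGE REPLICATION SOURCE TO / START REPLICA as replacements for these statements.)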


How to solve high concurrency in projects

  • Use redis cache to cache homepage data
  • Use Nginx to solve load balancing problems
  • You can use the master-slave read-write separation mode of the database

Splitting databases and tables in this way has a distributed effect and can support a very large amount of data; table partitioning, by contrast, is essentially a special kind of in-database splitting.

redo log

The redo log is unique to the InnoDB storage engine and gives MySQL its crash recovery capability.

For example, if the MySQL instance hangs or crashes, the InnoDB storage engine uses the redo log to recover the data when it restarts, which guarantees the durability and integrity of the data.


MySQL manages data in units of pages. When you query a record, a whole page of data is loaded from disk; the loaded page is called a data page and is placed into the Buffer Pool.

Subsequent queries first look in the Buffer Pool and only go to disk when there is no hit, which reduces disk IO overhead and improves performance.

The same applies when updating table data: if the page that needs to be updated is already in the Buffer Pool, it is updated directly there.

Then the record of "what modification was made on which data page" is written to the redo log buffer, and later flushed to the redo log file.


Ideally, the flush would happen as soon as the transaction commits, but in practice the timing of flushing is determined by the configured flush strategy.

Tips: Each redo record consists of "table space number + data page number + offset + modified data length + specific modified data"

For the redo log flushing strategy, the InnoDB storage engine provides the innodb_flush_log_at_trx_commit parameter, which supports three values:

  • 0: no flush is performed when a transaction commits.
  • 1: the log is flushed to disk every time a transaction commits (the default).
  • 2: only the contents of the redo log buffer are written to the page cache when a transaction commits.

The innodb_flush_log_at_trx_commit parameter defaults to 1, which means that fsync is called when a transaction commits to flush the redo log to disk.

In addition, the InnoDB storage engine has a background thread that, every 1 second, writes the contents of the redo log buffer into the file system cache (page cache) and then calls fsync to flush it to disk.
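To inspect or (illustratively) change the setting at runtime:

```sql
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SET GLOBAL innodb_flush_log_at_trx_commit = 1;  -- 1 = fsync on every commit (default, safest)
-- values 0 and 2 trade up to about one second of transactions for higher throughput
```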

Detailed explanation of MySQL's three major logs (binlog, redo log and undo log) | JavaGuide

Why do we need the redo log at all? Can't we just flush the data pages to disk directly?

Data page flushing is a random write, because the corresponding location of a data page may be at a random location in the hard disk file, so the performance is very poor.

Writing the redo log instead, one record may occupy only a few dozen bytes, containing just the tablespace number, data page number, offset within the page, and the updated value. In addition, it is written sequentially, so flushing it to disk is very fast.

Therefore, recording the modifications in the redo log performs far better than flushing the data pages themselves, which also gives the database stronger concurrency capability.

The difference between varchar and char

1. The length of CHAR is fixed, while the length of VARCHAR is variable. For example, define CHAR(10) and VARCHAR(10) and store 'ABCD' in each: the CHAR column still occupies 10 characters ('ABCD' padded with six trailing spaces), while the VARCHAR column's length becomes 4. When reading the data back, a CHAR value may carry trailing spaces that need trim() to remove, whereas a VARCHAR value does not.

2. CHAR access is faster than VARCHAR because its fixed length makes storage and lookup simpler. The price is space: the padding characters occupy storage, so CHAR trades space for time, while VARCHAR puts space efficiency first. VARCHAR needs 1 or 2 extra bytes to record the length of the string.

3. How many bytes a character occupies depends on the character set of the column rather than on CHAR vs VARCHAR: under GBK an ASCII character occupies 1 byte and a Chinese character 2 bytes, while under utf8mb4 a common Chinese character usually occupies 3 bytes.

4. Both types store character data, and the actual encoding is determined by the column's character set.
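A tiny sketch of the fixed-vs-variable length behavior, using a hypothetical table:

```sql
CREATE TABLE t_str (c CHAR(10), v VARCHAR(10)) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO t_str VALUES ('ABCD', 'ABCD');

-- On disk the CHAR(10) column is always padded to its fixed width of 10 characters,
-- while the VARCHAR column stores only the 4 characters plus 1-2 length bytes.
-- Note: by default MySQL strips the trailing pad spaces of CHAR on retrieval
-- (unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled), so both report 4 here.
SELECT CHAR_LENGTH(c) AS c_len, CHAR_LENGTH(v) AS v_len FROM t_str;
```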

How many Chinese characters are stored in varchar(255)

When the character set is UTF-8:

MySQL version < 4.1: VARCHAR(n) is measured in bytes. Assuming all characters are common Chinese characters (3 bytes each in UTF-8), VARCHAR(255) can store roughly 85 Chinese characters.

MySQL version >= 4.1: VARCHAR(n) is measured in characters, so VARCHAR(255) can store 255 Chinese characters.

MySQL's first-level cache and second-level cache

The first-level cache caches SQL statements, and the second-level cache caches result objects.

The first-level cache belongs to the SqlSession, and its scope is the SqlSession level. When operating the database you construct a SqlSession object, which can store cached data; the cached data areas of different SqlSession objects do not affect each other, and the cache only works within the same SqlSession.

How level one cache works:

The first query is checked in the database and then placed in the cache. The second query is directly checked in the cache.

If there are additions, deletions, and modifications between the two queries, the data in the cache will be cleared, and you need to check it in the database again.

The second-level cache is a mapper-level cache. When the second-level cache is used, multiple SqlSession objects that execute the SQL statements of the same Mapper store the resulting data in the second-level cache area, which also uses a HashMap for storage. Compared with the first-level cache, the second-level cache has a wider scope: multiple SqlSession objects can share it, so it spans SqlSession.

The scope of the second-level cache is a mapper namespace. If different SqlSession objects execute the same SQL statement under the same namespace with the same parameters, so that the same statement is ultimately executed, then after the first execution the data queried from the database is written to the cache, and the second query obtains the data from the cache instead of hitting the underlying database, improving efficiency.


Mybatis's first-level cache and second-level cache

Do you understand the MyBatis caching mechanism?

Reference answer

MyBatis's cache is divided into first-level cache and second-level cache.

Level 1 cache:

The first-level cache is also called local cache. It is enabled by default and cannot be turned off. The first-level cache exists in the life cycle of SqlSession, that is, it is a SqlSession-level cache. When querying in the same SqlSession, MyBatis will use the algorithm to generate cache key values ​​for the executed methods and parameters, and store the key values ​​and query results in a Map object. If the methods and parameters executed in the same SqlSession are exactly the same, the same key value will be generated through the algorithm. When the key value already exists in the Map cache object, the object in the cache will be returned.

Second level cache:

The second-level cache exists in the life cycle of SqlSessionFactory, that is, it is a cache at the SqlSessionFactory level. If you want to use the second level cache, you need to configure it in the following two places.

There is a parameter cacheEnabled in the global configuration settings of MyBatis. This parameter is the global switch of the second-level cache. The default value is true and the initial state is enabled.

MyBatis's second-level cache is bound to the namespace, that is, the second-level cache needs to be configured in the Mapper.xml mapping file. Under the condition that the global configuration of the second-level cache is enabled, to enable the second-level cache for Mapper.xml, you only need to add the following code to Mapper.xml:

<cache />

Second level cache has the following effects:

  • All SELECT statements in the mapped statement file will be cached.
  • All INSERT, UPDATE, and DELETE statements in the mapping statement file will refresh the cache.
  • The cache will be evicted using the Least Recently Used (LRU) algorithm.
  • The cache will not be flushed on any time-based schedule (that is, there is no flush interval).
  • The cache stores 1024 references to collections or objects regardless of the type of value returned by the query method.
  • The cache is considered read/write, meaning that object retrieval is not shared and can be safely modified by callers without interfering with potential modifications by other callers or threads.
