A thorough guide to the classic MySQL interview questions

The content is excerpted from my learning website: topjavaer.cn

What is MySQL?

MySQL is a relational database that stores data in the form of tables. You can think of a table as an Excel sheet. Since data is stored as a table, it has a table structure of rows and columns: each row is one record, and each column is one value of that record. Column values have data types, such as integer, string, and date.

The three normal forms of database design

First normal form 1NF

Ensure the atomicity of database table fields: every field must be indivisible.

For example, a field userInfo: '广东省 10086' must be split into two fields, userInfo: '广东省' and userTel: '10086', to satisfy the first normal form.

Second normal form 2NF

First, it must satisfy the first normal form. It adds two requirements: the table must have a primary key, and non-primary-key columns must depend on the whole primary key, not only on part of it.

For example, suppose the course-selection table is student_course(student_no, student_name, age, course_name, grade, credit), with primary key (student_no, course_name). Here credit depends only on course_name, and student_name and age depend only on student_no, so the table does not satisfy the second normal form. This causes data redundancy (if a student takes n courses, the name and age are stored n times) and insertion anomalies (a new course cannot be saved, because there is no student number for it yet).

It should be split into three tables: students: student(student_no, student_name, age); courses: course(course_name, credit); course-selection relations: student_course_relation(student_no, course_name, grade).
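The split as a SQL sketch (the column types are assumptions for illustration):

CREATE TABLE student (
    student_no   INT PRIMARY KEY,
    student_name VARCHAR(50),
    age          INT
);

CREATE TABLE course (
    course_name VARCHAR(50) PRIMARY KEY,
    credit      INT
);

CREATE TABLE student_course_relation (
    student_no  INT,
    course_name VARCHAR(50),
    grade       INT,
    PRIMARY KEY (student_no, course_name)
);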

Third normal form 3NF

First, it must satisfy the second normal form. In addition, non-primary-key columns must depend directly on the primary key; there must be no transitive dependency. That is, there must be no case where non-primary-key column A depends on non-primary-key column B, and non-primary-key column B depends on the primary key.

Suppose the student table is Student(student_no, student_name, age, academy_id, academy_telephone), with primary key student_no. academy_id depends on the student number, and academy_telephone depends on academy_id: a transitive dependency exists, so the table does not satisfy the third normal form.

The student table can be split into the following two tables: student: (student_no, student_name, age, academy_id); academy: (academy_id, academy_telephone).

What is the difference between 2NF and 3NF?

  • 2NF is about whether non-primary-key columns depend on the whole primary key or only on part of it.
  • 3NF is about whether non-primary-key columns depend on the primary key directly or transitively through another non-primary-key column.


What are the four major characteristics of transactions?

A transaction has four ACID properties: atomicity (Atomicity), consistency (Consistency), isolation (Isolation), and durability (Durability).

  • Atomicity means that all operations in a transaction either all succeed, or all fail and are rolled back.
  • Consistency means a transaction must leave the database in a consistent state before and after execution. For example, if accounts A and B hold 1,000 yuan in total, their sum is still 1,000 after a transfer between them, whether it succeeds or fails.
  • Isolation relates to the isolation level; for example, under read committed, a transaction can only read modifications that have been committed.
  • Durability means that once a transaction commits, its changes to the database are permanent; committed changes are not lost even if the database system fails.

What are the transaction isolation levels?

First, understand the following concepts: dirty read, non-repeatable read, and phantom read.

  • A dirty read means reading, within one transaction, data that another transaction has not yet committed.
  • A non-repeatable read means that, for a given row in the database, multiple queries within one transaction return different values, because another transaction modified the row and committed between the queries.
  • A phantom read occurs when a transaction reads records in some range while another transaction inserts new records into that range. A useful way to understand it: the conclusion of a read inside a transaction fails to support the business that follows. Suppose a transaction wants to insert a new record whose primary key is id; a select executed beforehand finds no record with that id, yet the insert fails with a primary-key conflict. That is a phantom read: no record can be read, but the conflict shows the record was actually inserted by another transaction and is simply not visible to the current one.

The difference between a non-repeatable read and a dirty read is that a dirty read reads another transaction's uncommitted data, while a non-repeatable read reads data that the other transaction has already committed.
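The phantom-read scenario above can be sketched in SQL (a minimal illustration; the table t and the id value are made up):

-- transaction A (REPEATABLE READ)
begin;
select * from t where id = 100;  -- returns no rows
-- meanwhile transaction B runs: insert into t(id) values (100); commit;
insert into t(id) values (100);  -- fails with a duplicate-key error: a phantom read
commit;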

Transaction isolation exists to solve the dirty read, non-repeatable read, and phantom read problems described above.

The four isolation levels provided by MySQL database are:

  • Serializable: solves the phantom read problem by forcing transactions to execute in order so that they cannot conflict with each other.
  • Repeatable read: MySQL's default transaction isolation level; it ensures that multiple reads within the same transaction see the same rows even under concurrency, solving the non-repeatable read problem.
  • Read committed: a transaction can only see changes made by already committed transactions, avoiding dirty reads.
  • Read uncommitted: a transaction can see the uncommitted changes of other transactions.

Check the isolation level:

select @@transaction_isolation;

Set isolation level:

set session transaction isolation level read uncommitted;

What isolation level is generally used for production environment databases?

Most production environments use RC. Why not RR?

Repeatable Read, referred to as RR
Read Committed, referred to as RC

Reason 1: under the RR isolation level there are gap locks, which makes the probability of deadlock much higher than under RC!
Reason 2: under the RR isolation level, if the condition column misses the index, the whole table is locked! Under RC, only the matched rows are locked!

In other words, RC has higher concurrency than RR.

Moreover, in most scenarios non-repeatable reads are acceptable: the data has already been committed, so reading it is generally not a problem!


The relationship between encoding and character set

We normally type Chinese and English characters into an editor, but those are for humans to read, not for computers. Computers actually store and transmit data in binary 0s and 1s.

So there must be a rule for converting letters into binary. For example, the letter d corresponds to hexadecimal 0x64, which is 01100100 in binary. Mapping letters and numbers one to one like this is the ASCII encoding.

ASCII uses one byte (8 bits) per character: 128 basic symbols and 128 extended symbols. It can only represent English letters, digits, and a few symbols.

That is obviously not enough. So, to represent Chinese, the GB2312 encoding appeared; to represent Greek, the greek encoding appeared; and to represent Russian, the cp866 encoding appeared.

To unify them all, the Unicode encoding appeared. It uses 2 to 4 bytes to represent a character, so in theory it can cover every symbol, and it is fully compatible with ASCII code points: the letter d is 64 in ASCII and still 64 in Unicode.

The difference is that ASCII encodes it in 1 byte, while Unicode here uses two bytes.

For the same letter d, Unicode uses one more byte than ASCII:

d   ASCII:           01100100
d Unicode:  00000000 01100100

As you can see, the leading bits of the Unicode encoding above are all 0: they carry no information yet still occupy a byte, which is wasteful. If the useless bytes could be omitted where possible, a lot of space would be saved. Following this idea, the UTF-8 encoding appeared.

To sum up, mapping symbols to binary codes according to certain rules is called an encoding, and gathering many such encoded characters together is what we call a character set.

For example, the utf-8 character set is the collection of all characters in the utf-8 encoding format.

To see which character sets MySQL supports, execute show charset;.

The difference between utf8 and utf8mb4

As mentioned above, utf-8 is an optimization built on Unicode. Since Unicode can represent all characters, so can utf-8. To avoid confusion with MySQL's character set names, I will write it as utf8 below.

The character sets supported by mysql include utf8 and utf8mb4.

Let's talk about the utf8mb4 encoding first. mb4 means "most bytes 4". From the Maxlen column in the output of show charset you can see that it uses at most 4 bytes per character, which is enough to represent almost all currently known characters.

Now for the utf8 in MySQL's character sets. Note that this utf8 is not that utf8; call it the "small" utf8 character set. Why? Its Maxlen shows that it uses at most 3 bytes per character, so by utf8mb4's naming convention it would more accurately be called utf8mb3.

utf8 is like a cut-down version of utf8mb4 that supports only part of the characters. For example, it does not support emoji.

In the list of character sets MySQL supports, the third column, collation, is the character set's comparison rule.

For example, "debug" and "Debug" are the same word, but their capitalization is different. Should they be regarded as the same word?

This is when you need to use collation.

You can run SHOW COLLATION WHERE Charset = 'utf8mb4'; to check which comparison rules utf8mb4 supports.

If collation = utf8mb4_general_ci, it means that, under the utf8mb4 character set, comparison is done character by character (general) and is case-insensitive (_ci, case insensitive).

In this case, "debug" and "Debug" are the same word.

If collation = utf8mb4_bin is used instead, strings are compared bit by bit by their binary values.

So "debug" and "Debug" are not the same word.

So what are the disadvantages of utf8mb4 compared to utf8?

We know that in a table definition, if a field's type is char(2), the 2 refers to the number of characters: whatever character set the table uses, the column can hold 2 characters.

And char is fixed-length. To fit 2 utf8mb4 characters, char reserves 2 * 4 (maxlen = 4) = 8 bytes of space by default.

With utf8mb3, 2 * 3 (maxlen = 3) = 6 bytes are reserved by default. In this case, then, utf8mb4 uses somewhat more space than utf8mb3.

Indexes

What is an index?

An index is a data structure the storage engine uses to speed up access to the data in a table. It can be compared to a dictionary's table of contents, which helps you find the record you need quickly.

Indexes are generally stored in files on disk, which occupy physical space.

What are the advantages and disadvantages of indexing?

Advantages:

  • Speed up data lookups
  • Adding indexes to fields used for sorting or grouping speeds up grouping and sorting
  • Speed up joins between tables

Disadvantages:

  • Indexes take up physical space
  • They reduce the efficiency of inserts, updates, and deletes: every time a record is added, deleted, or modified, the indexes must be maintained dynamically, which lengthens those operations


What is the role of index?

The data is stored on disk. Without an index, a query loads all the data into memory and scans it sequentially, requiring many disk reads. With an index there is no need to load all the data: a B+ tree is generally 2-4 levels high, so at most 2-4 disk reads are needed, which greatly improves query speed.

Under what circumstances is it necessary to create an index?

  1. Fields frequently used in queries
  2. Indexing fields frequently used for connections can speed up the connection
  3. Indexing is often required for fields that need to be sorted, because the index is already sorted, which can speed up sorting queries.

Under what circumstances is indexing not created?

  1. Fields that do not appear in where conditions are not suitable for indexing
  2. Tables with few records; a few hundred rows, for example, do not need an index
  3. Tables with frequent inserts, updates, and deletes; evaluate whether indexing is worthwhile
  4. Columns that participate in calculations (expressions) are not suitable for indexing
  5. Fields with low selectivity are not suitable for indexing; gender, for example, has only three values (male/female/unknown), and indexing it will not improve query efficiency

Index data structures

The data structures of the index mainly include B+ tree and hash table, and the corresponding indexes are B+ tree index and hash index respectively. The index types of the InnoDB engine include B+ tree index and hash index. The default index type is B+ tree index.

B+tree index

The B+ tree is implemented on top of the B tree, adding sequential access pointers between leaf nodes. It keeps the B tree's balance and improves range-query performance through the sequential access pointers.

In a B+ tree, the keys within a node are arranged in ascending order from left to right. If a pointer's left and right neighboring keys are key_i and key_{i+1}, then the pointer points to a node whose keys are all greater than or equal to key_i and less than or equal to key_{i+1}.

(Figure: B+ tree index structure)

When searching, first do a binary search in the root node to find the pointer for the key, then recursively search the node the pointer refers to. On reaching a leaf node, do a binary search in it to find the data entry corresponding to the key.

The most commonly used index type in MySQL is the BTREE index, which is implemented with the B+ tree data structure.

mysql> show index from blog\G;
*************************** 1. row ***************************
        Table: blog
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 1
  Column_name: blog_id
    Collation: A
  Cardinality: 4
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
      Visible: YES
   Expression: NULL

Hash index

A hash index is implemented on a hash table. For each row, the storage engine hashes the index columns to get a hash code; the hash algorithm tries to make different column values produce different hash codes. The hash code is used as the key of the hash table, and a pointer to the data row as its value. Looking up a record this way is O(1), so hash indexes are generally used for exact-match queries.
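As a sketch, the MEMORY engine lets you request a hash index explicitly (the table and column names are made up; InnoDB accepts the USING HASH syntax but still builds a B+ tree index):

CREATE TABLE hash_demo (
    id   INT,
    name VARCHAR(50),
    KEY idx_name (name) USING HASH  -- hash index on the name column
) ENGINE = MEMORY;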

What is the difference between Hash index and B+ tree index?

  • Hash indexes do not support sorting because hash tables are unordered.
  • Hash indexes do not support range lookups .
  • Hash indexes do not support fuzzy queries and leftmost prefix matching for multi-column indexes.
  • Because hash collisions occur in a hash table, hash index performance is unstable, while B+ tree index performance is relatively stable: every query goes from the root node down to a leaf node.

Why is B+ tree more suitable for implementing database index than B tree?

  • In a B+ tree all the data is stored in the leaf nodes, and the internal nodes hold only index entries, so scanning the whole table means scanning just the leaf-node level once. A B tree also stores data in its branch nodes, so retrieving data in order requires an in-order traversal of the whole tree. The B+ tree is therefore better suited to range queries, which are very frequent in databases, and that is why B+ trees are usually used for database indexes.

  • The internal nodes of a B+ tree store only keys; the actual rows live in (or are addressed from) the leaf nodes. A page-sized node can therefore hold more index entries, reducing I/O.

  • The query efficiency of a B+ tree is more stable: every key lookup is a path from the root node to a leaf node, and all key lookups have the same path length, so every row costs roughly the same to find.

What are the categories of indexes?

1. Primary key index: the unique, non-null index named PRIMARY; null values are not allowed.

2. Unique index : The value in the index column must be unique, but null values ​​are allowed. The difference between a unique index and a primary key index is that the unique index field can be null and there can be multiple null values, while the primary key index field cannot be null. The purpose of the unique index: to uniquely identify each record in the database table, mainly to prevent repeated insertion of data. The SQL statement to create a unique index is as follows:

ALTER TABLE table_name
ADD CONSTRAINT constraint_name UNIQUE KEY(column_1,column_2,...);

3. Combined index: an index built on a combination of several fields of the table; it is used only when the leftmost of those fields appear in the query conditions. Combined indexes follow the leftmost prefix principle.

4. Full-text index: full-text indexes can only be created on CHAR, VARCHAR, and TEXT fields.

5. Ordinary index : Ordinary index is the most basic index. It has no restrictions and the value can be empty.
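A sketch showing how these index types are declared (the table and column names are made up; FULLTEXT on InnoDB requires MySQL 5.6+):

CREATE TABLE article (
    id      INT PRIMARY KEY,               -- primary key index
    title   VARCHAR(100),
    author  VARCHAR(50),
    content TEXT,
    UNIQUE KEY uk_title (title),           -- unique index
    KEY idx_author (author),               -- ordinary index
    KEY idx_author_title (author, title),  -- combined index
    FULLTEXT KEY ft_content (content)      -- full-text index
) ENGINE = InnoDB;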

What is the leftmost matching principle?

If a SQL statement uses the leftmost fields of a combined index, it can use that index for matching. When a range query (>, <, between, like) is encountered, matching stops, and the fields after it cannot use the index.

For an index on (a,b,c), query conditions on a / a,b / a,b,c will use the index, but conditions on b,c alone will not.

For an index on (a,b,c,d), with the condition a = 1 and b = 2 and c > 3 and d = 4, the three fields a, b, and c can use the index, but d cannot, because a range query is encountered at c.

As shown in the figure below, with an index on (a, b): a is globally ordered in the index tree, while b is globally unordered but locally ordered (rows with equal a are sorted by b). The index cannot be used directly to execute the condition b = 2.

(Figure: leftmost prefix matching on an (a, b) index)

When the value of a is fixed, b is ordered: for a = 1, the b values 1, 2 are ordered; for a = 2, the b values 1, 4 are ordered as well. When executing a = 1 and b = 2, both the a and b fields can use the index. When executing a > 1 and b = 2, only the a field can use the index: since a spans a range rather than a fixed value, b is not ordered within that range, so the index cannot be used for b.
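This can be verified with explain (a sketch; the table and data are made up):

CREATE TABLE t (a INT, b INT, KEY idx_ab (a, b));

EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2;  -- both columns can use idx_ab
EXPLAIN SELECT * FROM t WHERE a > 1 AND b = 2;  -- only a uses the index
EXPLAIN SELECT * FROM t WHERE b = 2;            -- cannot seek on idx_ab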

What is a clustered index?

InnoDB uses the primary key of the table to construct a primary key index tree, and the leaf nodes store the record data of the entire table. The storage of clustered index leaf nodes is logically continuous and uses a doubly linked list connection. The leaf nodes are sorted in the order of the primary key, so the sorting search and range search of the primary key are faster.

The leaf nodes of the clustered index hold the full row records of the table; in InnoDB the primary key serves as the clustered index. Queries through the clustered index are much more efficient than through a non-clustered (secondary) index.

For InnoDB, the clustered index is normally the table's primary key index. If no primary key is explicitly defined, the first unique index that does not allow NULL is chosen. If there is no primary key and no suitable unique index, InnoDB internally generates a hidden primary key as the clustered index; this hidden key is 6 bytes long and its value increments automatically as data is inserted.

What is a covering index?

A covering index means the selected columns can be obtained entirely from the index, with no second lookup back to the table: the queried columns must be covered by the index being used. For an InnoDB secondary index, if the index covers the queried columns, the second lookup on the primary key index is avoided.

Not every index type can act as a covering index. A covering index must store the values of the index columns; hash and full-text indexes do not store those values, so MySQL uses B+ tree indexes as covering indexes.

For a query that uses a covering index, putting explain in front of the query shows using index in the Extra column of the output.

For example, in the user-likes table user_like, the combined index is (user_id, blog_id), and neither user_id nor blog_id is null.

explain select blog_id from user_like where user_id = 13;

The Extra column of the explain result shows Using index: the queried column is covered by the index, and the where condition follows the leftmost prefix principle, so the matching rows are found directly through an index seek, with no need to go back to the table.

explain select user_id from user_like where blog_id = 1;

The Extra column of the explain result shows Using where; Using index: the queried column is covered by the index, but the where condition does not follow the leftmost prefix principle, so the matching rows cannot be found by an index seek. They can still be found by scanning the index, and there is still no need to go back to the table.

Index design principles?

  • Create indexes on fields that are often used as query conditions, to improve query speed
  • Index fields that are frequently used for sorting, grouping, and union operations
  • The higher the selectivity of the index column, the better the index works. Using a low-selectivity column such as gender as an index works very poorly.
  • Avoid indexing "large fields"; prefer fields with small values. MySQL maintains the field values together with the index, so large values make the index occupy more space and make comparisons during sorting slower.
  • Prefer short indexes. When indexing long strings, specify a shorter prefix length: a smaller index means less disk I/O and faster queries.
  • More indexes are not always better. Each index takes extra physical space and costs time to maintain.
  • Do not index fields that are frequently added, deleted, or modified. A frequently modified field means the index must be rebuilt frequently, which hurts MySQL performance.
  • Use the leftmost prefix principle.

When does an index become invalid?

Situations that lead to index failure:

  • For a composite index, if the leftmost field of the index is not used, the index is not used.
  • A like query starting with %, such as %abc, cannot use the index; a like query not starting with %, such as abc%, is equivalent to a range query and can use the index.
  • The column in the condition is a string but the quotation marks are omitted; the type mismatch may trigger an implicit conversion, which invalidates the index.
  • Testing that an index column is not equal to some value
  • Performing calculations or applying functions on index columns
  • Query conditions joined with or can also invalidate the index
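A few of these cases sketched with explain (assuming a table user with an index on the varchar column name):

EXPLAIN SELECT * FROM user WHERE name LIKE '%bin';    -- leading %: index not used
EXPLAIN SELECT * FROM user WHERE name LIKE 'bin%';    -- prefix match: index can be used
EXPLAIN SELECT * FROM user WHERE name = 123;          -- implicit conversion: index not used
EXPLAIN SELECT * FROM user WHERE upper(name) = 'BIN'; -- function on the column: index not used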

What is a prefix index?

Sometimes it is necessary to create an index on a very long character column, which makes the index extremely large and slow. Using prefix indexes avoids this problem.

Prefix index refers to indexing the first few characters of text or string, so that the length of the index is shorter and the query speed is faster.

The key to creating a prefix index is to choose a long enough prefix to ensure high index selectivity . The higher the index selectivity, the higher the query efficiency, because a highly selective index allows MySQL to filter out more data rows when searching.

How to create a prefix index:

-- create a prefix index on the email column
ALTER TABLE table_name ADD KEY(column_name(prefix_length));
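One common way to pick prefix_length is to compare the selectivity of candidate prefixes with that of the full column (a sketch; the table and column names are made up):

-- selectivity of the full column
SELECT COUNT(DISTINCT email) / COUNT(*) FROM user;
-- selectivity of a 10-character prefix
SELECT COUNT(DISTINCT LEFT(email, 10)) / COUNT(*) FROM user;

Choose the shortest prefix whose selectivity is close to that of the full column.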

Index pushdown

Refer to my other article: Illustrated Index Pushdown!

What are the common storage engines?

Four commonly used MySQL storage engines are: MyISAM, InnoDB, MEMORY, and ARCHIVE. Since MySQL 5.5 the default storage engine has been InnoDB.
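You can list the engines your server supports and pick one per table (a quick sketch; the demo table is made up):

SHOW ENGINES;
CREATE TABLE demo (id INT) ENGINE = MyISAM;  -- the engine is chosen per table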

InnoDB storage engine

InnoDB is MySQL's default transactional storage engine , the most widely used, and is built based on clustered indexes. InnoDB has made many optimizations internally, such as being able to automatically create adaptive hash indexes in memory to speed up read operations.

Advantages : Supports transactions and crash recovery capabilities; introduces row-level locks and foreign key constraints.

Disadvantages : The data space occupied is relatively large.

Applicable scenarios : transaction support is required and there is a high frequency of concurrent reads and writes.

MyISAM storage engine

Data is stored in a compact format. For read-only data, or tables that are small and can tolerate repair operations, the MyISAM engine can be used. MyISAM stores a table in two files: the data file .MYD and the index file .MYI.

Advantages : Fast access.

Disadvantages : MyISAM does not support transactions and row-level locks, does not support safe recovery after a crash, and does not support foreign keys.

Applicable scenarios : There is no requirement for transaction integrity; all table data will be read-only.

MEMORY storage engine

The MEMORY engine puts all the data in the memory, and the access speed is faster, but once the system crashes, the data will be lost.

The MEMORY engine uses hash indexes by default, saving the hash value of the key and the pointer to the data row in the hash index.

Advantages : Faster access.

Disadvantages :

  1. Hash index data is not stored in index-value order, so it cannot be used for sorting.
  2. Partial index match lookups are not supported because hash indexes use the entire contents of the index column to calculate the hash value.
  3. Only equality comparison is supported, range query is not supported.
  4. When a hash conflict occurs, the storage engine needs to traverse all row pointers in the linked list and compare them row by row until a row that meets the conditions is found.

ARCHIVE storage engine

The ARCHIVE storage engine is very suitable for storing large amounts of independent data as historical records. ARCHIVE provides compression function and has efficient insertion speed, but this engine does not support indexes, so query performance is poor.

What is the difference between MyISAM and InnoDB?

  1. Storage structure. Each MyISAM table is stored on disk as three files, named after the table with extensions indicating the file type: .frm stores the table definition, .MYD (MYData) is the data file, and .MYI (MYIndex) is the index file. InnoDB tables are stored in a shared tablespace (one or more files) or in independent tablespace files; an InnoDB table's size is limited only by the operating system's file size limit.
  2. Storage space. MyISAM supports three storage formats: static tables (the default; note that trailing spaces in the data are removed), dynamic tables, and compressed tables. If a table will not be modified after it is created and loaded, a compressed table greatly reduces disk usage. InnoDB needs more memory and storage: it builds its own dedicated buffer pool in main memory to cache data and indexes.
  3. Portability, backup, and recovery. MyISAM data is stored as plain files, which is convenient for cross-platform transfer, and a single table can be backed up and restored on its own. For InnoDB the feasible options are copying the data files, backing up the binlog, or using mysqldump, which becomes cumbersome once the data reaches tens of gigabytes.
  4. Row-level locking. MyISAM supports only table-level locks: select, update, delete, and insert statements automatically lock the whole table, though if the table satisfies the concurrent-insert condition new rows can still be appended at its end. InnoDB supports both row-level and table-level locks, defaulting to row-level; row locks greatly improve performance under concurrent multi-user operation.
  5. Transactions and safe crash recovery. MyISAM does not support transactions. InnoDB does, with commit, rollback, and crash-recovery capabilities.
  6. Foreign keys. MyISAM does not support them; InnoDB does.
  7. MVCC. MyISAM does not support it; InnoDB does. For highly concurrent transactions, MVCC is more efficient than plain locking.
  8. Clustered indexes. MyISAM does not support clustered indexes; InnoDB does.
  9. Full-text indexes. MyISAM supports FULLTEXT full-text indexes. InnoDB did not support them before MySQL 5.6 (later versions do); the sphinx plug-in can also provide full-text search for InnoDB with good results.
  10. Table primary keys. MyISAM allows a table to exist without any index or primary key; its indexes store row addresses. For InnoDB, if no primary key or non-null unique index is defined, a hidden 6-byte primary key is generated automatically (invisible to the user).
  11. Row count. MyISAM stores the table's total row count, so select count(*) from table; reads the value directly. InnoDB does not store the total, so the same statement scans the whole table, which is expensive. With a where condition, MyISAM and InnoDB handle count the same way.

What locks does MySQL have?

Classified by lock granularity , there are row-level locks, table-level locks and page-level locks.

  1. Row-level locks are the finest-grained locks in MySQL: only the row being operated on is locked. Row-level locking greatly reduces conflicts between database operations; its granularity is the smallest, but its locking overhead is the largest. There are three main kinds of row-level locks:
    • Record Lock, record lock, that is, only locking one record;
    • Gap Lock, gap lock, locks a range, but does not include the record itself;
    • Next-Key Lock: A combination of Record Lock + Gap Lock, locks a range and locks the record itself.
  2. Table-level lock is the lock with the largest granularity in MySQL, which means locking the entire table of the current operation. It is simple to implement, consumes less resources, and is supported by most MySQL engines. The most commonly used MyISAM and InnoDB support table-level locking.
  3. Page-level locks are a type of lock in MySQL whose locking granularity is between row-level locks and table-level locks. Table-level locks are fast but have many conflicts. Row-level locks have few conflicts but are slow. Therefore, a compromised page-level lock is adopted, locking a group of adjacent records at a time.

Classified by lock level , there are shared locks, exclusive locks and intention locks.

  1. Shared locks, also called read locks, are acquired by read operations. Other users can read the data concurrently, but no transaction can modify it (acquire an exclusive lock on it) until all shared locks are released.
  2. Exclusive locks, also called write locks. If transaction T holds an exclusive lock on data A, no other transaction can place any kind of lock on A. The transaction holding the exclusive lock can both read and modify the data.
  3. Intention locks are table-level locks whose purpose is to announce the kind of row lock a transaction is about to request. InnoDB has two intention locks:

Intention shared lock (IS): Indicates that the transaction is preparing to add a shared lock to the data row, which means that before adding a shared lock to a data row, the IS lock of the table must first be obtained;

Intention exclusive lock (IX): Similar to the above, it indicates that the transaction is preparing to add an exclusive lock to the data row, indicating that the transaction must first obtain the IX lock of the table before adding an exclusive lock to a data row.

Intention locks are automatically added by InnoDB and do not require user intervention.

For INSERT, UPDATE and DELETE, InnoDB will automatically add exclusive locks to the data involved; for general SELECT statements, InnoDB will not add any locks, and transactions can explicitly add shared locks or exclusive locks through the following statements.

Shared lock: SELECT … LOCK IN SHARE MODE;

Exclusive lock: SELECT … FOR UPDATE;

MVCC implementation principle?

MVCC (Multiversion Concurrency Control) keeps multiple versions of the same record to achieve concurrency control. When a query runs, the read view locates the data of the appropriate version through the version chain.

Function: Improve concurrency performance. For high-concurrency scenarios, MVCC is less expensive than row-level locks.

The implementation principle of MVCC is as follows:

The implementation of MVCC relies on the version chain, which is built from three hidden fields of each row.

  • DB_TRX_ID: the id of the transaction that last changed the row; the time order of transactions is judged by comparing transaction ids.
  • DB_ROLL_PTR: the rollback pointer, which points to the previous version of the row; these pointers link the versions in the undo log together into a version chain.
  • DB_ROW_ID: the primary key. If the table has no primary key, InnoDB generates one automatically.

Each table record probably looks like this:

When a transaction is used to update a row record, a version chain will be generated. The execution process is as follows:

  1. Lock the row with an exclusive lock;
  2. Copy the row's original value to the undo log, as the old version used for rollback;
  3. Modify the current row's value, generating a new version; update the transaction id and point the rollback pointer at the old version's record, thereby forming a version chain.

Here is an example for everyone to understand.

1. The initial data is as follows; DB_ROW_ID and DB_ROLL_PTR are empty.

2. Transaction A modifies the row, changing age to 12:

3. Then transaction B also modifies the row, changing age to 8:

4. At this point the undo log contains two records, linked together by the rollback pointers.

Next, understand the concept of read view.

A read view can be understood as a "photo" that records the state of the data at a moment in time. Reading data as of time t means reading from the photo taken at time t.

A read view internally maintains an active-transaction list: the transactions that had started but not yet committed when the read view was generated. Transactions that had already committed before the read view was created are not in the list, and changes committed only after its creation are not visible to it.

Different isolation levels have different timings for creating read views.

  • read committed: a new read view is created every time a select executes, ensuring that modifications committed by other transactions can be read.

  • repeatable read: within a transaction, the read view is created at the first select and never updated afterwards; all later selects reuse it. This guarantees that the transaction reads the same content every time, i.e., repeatable reads.

How the read view filters records

Premise: DATA_TRX_ID is the latest transaction id recorded on each row version; up_limit_id is the earliest started transaction in the current snapshot; low_limit_id is the latest started transaction in the current snapshot, i.e., the last transaction.

  • If DATA_TRX_ID < up_limit_id: the transaction that modified this row version had already committed when the read view was created, so this version is visible to the current transaction.
  • If DATA_TRX_ID >= low_limit_id: this row version was created after the read view, so it is not visible to the current transaction. Follow the version chain to the previous version, then re-check that version's visibility to the current transaction.
  • If up_limit_id <= DATA_TRX_ID < low_limit_id:
    1. Check whether the active-transaction list contains a transaction whose id is DATA_TRX_ID.
    2. If it does, that transaction is still uncommitted, so this version is not visible. Follow the version chain to the previous version and re-check its visibility.
    3. If it does not, the transaction trx_id has already committed, and this row version is visible.

Summary: InnoDB's MVCC is implemented with read views and version chains. The version chain keeps the historical versions of a record; the read view decides whether the current version is visible to the transaction. If it is not, move to the previous version along the chain and check again, until a visible version is found.

Snapshot read and current read

Table records can be read in two ways.

  • Snapshot read: reads a snapshot version. A plain SELECT is a snapshot read. Concurrency is controlled through MVCC, without locking.

  • Current read: reads the latest version. UPDATE, DELETE, INSERT, SELECT … LOCK IN SHARE MODE, and SELECT … FOR UPDATE are current reads.

Under snapshot reads, InnoDB avoids phantom reads through the MVCC mechanism. But MVCC cannot prevent phantom reads that occur under current reads: a current read always reads the latest data, so if another transaction inserts rows between two queries, a phantom read appears.

An example:

1. Initially, the user table has only two records:

2. Transaction a and transaction b both begin with start transaction.

3. Transaction a inserts a row and then commits:

insert into user(user_name, user_password, user_mail, user_state) values('tyson', 'a', 'a', 0);

4. Transaction b then updates the whole table:

update user set user_name = 'a';

5. Transaction b then runs a query and sees the row inserted by transaction a. (In the figure below, the left side is transaction b and the right side is transaction a. Before the transactions started there were only two records; after transaction a inserted one, transaction b's query returned three rows.)

(Figure: phantom read under a current read)

This is the phantom read phenomenon under a current read.

So how does MySQL avoid phantom reads?

  • Under snapshot reads, MySQL avoids phantom reads through mvcc.
  • Under current reads, MySQL avoids phantom reads through next-key locks (implemented as row locks plus gap locks).

A next-key lock consists of two parts: a record (row) lock plus a gap lock. Record locks are placed on index records; gap locks are placed in the gaps between index records.

The Serializable isolation level also avoids phantom reads, but it locks the whole table; its concurrency is extremely low, so it is generally not used.
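A sketch of a next-key lock blocking a phantom insert under REPEATABLE READ (run in two sessions; the table, the index on age, and the values are assumptions):

-- session 1
begin;
select * from user where age between 10 and 20 for update;  -- next-key locks the range

-- session 2
insert into user(age) values (15);  -- blocks until session 1 commits or rolls back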

Shared locks and exclusive locks

SELECT can take read locks in two ways: shared lock and exclusive lock.

select * from table where id<6 lock in share mode; -- shared lock
select * from table where id<6 for update; -- exclusive lock

The main difference between the two is that LOCK IN SHARE MODE easily causes deadlock when multiple transactions update the same row at the same time.

An exclusive lock can be acquired only if no other thread holds an exclusive or shared lock on any row in the result set; otherwise the request blocks. During a transaction, MySQL places an exclusive lock on every row in the query's result set, and other threads' updates or deletes of those rows are blocked (reads are still allowed) until the transaction that issued the statement ends with commit or rollback.

Notes on using SELECT … FOR UPDATE:

  1. for update applies only to InnoDB, and it takes effect only within a transaction.
  2. When querying by primary key, if the condition is like or not-equal, a table lock is produced instead of a row lock.
  3. Querying by a non-indexed field produces a table lock.

bin log/redo log/undo log

MySQL's logs mainly include the query log, slow query log, transaction logs, error log, and binary log. The most important ones are the bin log (binary log), redo log, and undo log (rollback log).

bin log

The bin log is a MySQL database-level log file. It records all operations that modify the MySQL database (select and show statements are not recorded) and is mainly used to restore the database and to replicate it.

redo log

The redo log is an InnoDB engine-level log that records the transaction log of the InnoDB storage engine. It is written whether or not the transaction has committed, so that data can be recovered: when a database failure occurs, InnoDB uses the redo log to recover to the moment before the failure, ensuring data integrity. If the parameter innodb_flush_log_at_trx_commit is set to 1, the redo log is flushed to disk synchronously at commit.
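A quick sketch of inspecting and setting this parameter (value 1 flushes at every commit; 0 and 2 trade durability for speed):

SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SET GLOBAL innodb_flush_log_at_trx_commit = 1;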

undo log

Besides the redo log, modifying data also writes an undo log, which keeps the content of the record before the modification and is used to roll data back. The undo log makes transaction rollback possible, and MVCC is implemented by following the undo log back to a specific version of the data.

What is the difference between bin log and redo log?

  1. The bin log records logs for all storage engines, including InnoDB and MyISAM; the redo log records only InnoDB's own transaction log.
  2. The bin log is written to disk only before the transaction commits, and a transaction is written only once; while a transaction is in progress, the redo log is written to disk continuously.
  3. The bin log is a logical log, recording the original logic of the SQL statements; the redo log is a physical log, recording what modifications were made to which data page.

Tell me about MySQL architecture?

MySQL is mainly divided into Server layer and storage engine layer:

  • Server layer: mainly includes the connector, query cache, analyzer, optimizer, executor, and so on. All cross-engine functionality is implemented in this layer, such as stored procedures, triggers, views, and functions, plus a general logging module, the binlog module.
  • Storage engine : Mainly responsible for data storage and reading. The server layer communicates with the storage engine through the API.

Server layer basic components

  • Connector: When the client connects to MySQL, the server layer will perform identity authentication and permission verification.
  • Query cache: When executing a query statement, the cache will be queried first to verify whether the sql has been executed. If the sql is cached, it will be returned directly to the client. If there is no hit, subsequent operations will be performed.
  • Analyzer: If the cache is not hit, the SQL statement will go through the analyzer, which is mainly divided into two steps, lexical analysis and syntax analysis. First, see what the SQL statement does, and then check whether the syntax of the SQL statement is correct.
  • Optimizer: The optimizer optimizes the query, including rewriting the query, determining the reading and writing order of the table, selecting appropriate indexes, etc., and generating an execution plan.
  • Executor: First, before execution, it will check whether the user has permission. If there is no permission, an error message will be returned. If there is permission, the engine interface will be called according to the execution plan and the result will be returned.

Sub-database and sub-table

When a single table reaches roughly 10 million rows or 100GB, optimizing indexes or adding read replicas may no longer noticeably improve performance, and splitting the data must be considered. The purpose of splitting is to reduce the database's load and shorten query time.

Data segmentation can be divided into two ways: vertical partitioning and horizontal partitioning.

vertical division

Vertical database splitting follows the business: in a shopping scenario, for example, the product, order, and user tables can each be moved into their own database, improving performance by shrinking each individual database. Vertical table splitting likewise splits one large table into several by business function, such as basic product information versus product description: the basic information is shown in product lists and the description on the detail page, so the two can be split into separate tables.

(Figure: vertical division)

Advantages : row records become smaller, data pages can store more records, and I/O times are reduced during queries.

Disadvantages :

  • The primary key is redundant and redundant columns need to be managed;
  • It will cause table connection JOIN operation, which can be performed on the business server to reduce database pressure;
  • There is still the problem of excessive data volume in a single table.

Horizontal division

Horizontal division splits the data by some rule, such as time or ranges of id values: for example, into different databases by year. Each database has the same schema, but the data is split across them, which improves performance.

(Figure: horizontal division)
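A minimal sketch of an id-based horizontal split (the shard count, table names, and routing rule are assumptions; the routing itself lives in application code):

-- two shards with identical structure, cloned from an existing t_order table
CREATE TABLE t_order_0 LIKE t_order;
CREATE TABLE t_order_1 LIKE t_order;
-- the application routes each row: shard = order_id % 2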

Advantages : The amount of data in a single database (table) is reduced and performance is improved; the divided tables have the same structure and fewer program changes.

Disadvantages :

  • Transaction consistency across shards is hard to guarantee
  • Cross-node join performance is poor and the logic is complex
  • Data must be migrated and re-sharded when capacity is expanded

What is a partition table?

Partitioning is to divide the data of a table into N blocks. A partitioned table is an independent logical table, but the underlying layer is composed of multiple physical sub-tables.

When the data of the query condition is distributed in a certain partition, the query engine will only query a certain partition instead of traversing the entire table. At the management level, if you need to delete data in a certain partition, you only need to delete the corresponding partition.

Partitions generally live on a single machine, and partitioning by time range is the most common, as it makes archiving convenient. The difference is that sub-database/sub-table must be implemented in application code, while partitioning is implemented inside MySQL. Sub-database/sub-table and partitioning do not conflict and can be used together.

Partition table type

Range partitioning partitions by ranges of values, for example by time range:

CREATE TABLE test_range_partition(
       id INT auto_increment,
       createdate DATETIME,
       primary key (id,createdate)
   ) 
   PARTITION BY RANGE (TO_DAYS(createdate) ) (
      PARTITION p201801 VALUES LESS THAN ( TO_DAYS('20180201') ),
      PARTITION p201802 VALUES LESS THAN ( TO_DAYS('20180301') ),
      PARTITION p201803 VALUES LESS THAN ( TO_DAYS('20180401') ),
      PARTITION p201804 VALUES LESS THAN ( TO_DAYS('20180501') ),
      PARTITION p201805 VALUES LESS THAN ( TO_DAYS('20180601') ),
      PARTITION p201806 VALUES LESS THAN ( TO_DAYS('20180701') ),
      PARTITION p201807 VALUES LESS THAN ( TO_DAYS('20180801') ),
      PARTITION p201808 VALUES LESS THAN ( TO_DAYS('20180901') ),
      PARTITION p201809 VALUES LESS THAN ( TO_DAYS('20181001') ),
      PARTITION p201810 VALUES LESS THAN ( TO_DAYS('20181101') ),
      PARTITION p201811 VALUES LESS THAN ( TO_DAYS('20181201') ),
      PARTITION p201812 VALUES LESS THAN ( TO_DAYS('20190101') )
   );

The corresponding data files can be found under /var/lib/mysql/data/. Each partition has its own data file, named with # as the separator:

   -rw-r----- 1 MySQL MySQL    65 Mar 14 21:47 db.opt
   -rw-r----- 1 MySQL MySQL  8598 Mar 14 21:50 test_range_partition.frm
   -rw-r----- 1 MySQL MySQL 98304 Mar 14 21:50 test_range_partition#P#p201801.ibd
   -rw-r----- 1 MySQL MySQL 98304 Mar 14 21:50 test_range_partition#P#p201802.ibd
   -rw-r----- 1 MySQL MySQL 98304 Mar 14 21:50 test_range_partition#P#p201803.ibd
...

List partition

List partitioning is similar to range partitioning; the main difference is that list partitions are defined by lists of enumerated values, while range partitions are continuous intervals. With list partitioning, the partition field's values must be known in advance: inserting a value that is not in any partition's enumeration list fails.

create table test_list_partiotion
   (
       id int auto_increment,
       data_type tinyint,
       primary key(id,data_type)
   )partition by list(data_type)
   (
       partition p0 values in (0,1,2,3,4,5,6),
       partition p1 values in (7,8,9,10,11,12),
       partition p2 values in (13,14,15,16,17)
   );

Hash partition

Hash partitioning distributes data evenly across a predefined number of partitions:

create table test_hash_partiotion
   (
       id int auto_increment,
       create_date datetime,
       primary key(id,create_date)
   )partition by hash(year(create_date)) partitions 10;

Problems with partitioning?

  1. Opening and locking all underlying tables can be costly. When a query accesses a partitioned table, MySQL needs to open and lock all the underlying tables. This happens before partition pruning, so pruning cannot reduce this overhead, and it affects query speed. The overhead can be reduced with batch operations, such as batch inserts, LOAD DATA INFILE, and deleting many rows in one statement.
  2. Maintaining partitions can be costly. For example, to reorganize a partition, a temporary partition will be created first, then the data will be copied to it, and finally the original partition will be deleted.
  3. All partitions must use the same storage engine.

Query statement execution process?

The execution process of the query statement is as follows: permission verification, query cache, analyzer, optimizer, permission verification, executor, and engine.

For example, the query statement is as follows:

select * from user where id > 1 and name = '大彬';
  1. First check permissions; without permission, an error is returned;
  2. Before MySQL 8.0, the query cache was checked: on a hit the result is returned directly, otherwise execution continues;
  3. Lexical and syntactic analysis: extract the table name and query conditions, and check for syntax errors;
  4. There are two possible execution plans: evaluate id > 1 first, or name = '大彬' first; the optimizer picks the more efficient one according to its optimization algorithms;
  5. Check permissions again; if permitted, call the storage engine interface and return the engine's results.

Update statement execution process?

The update statement execution flow is: analyzer, permission check, executor, engine, redo log (prepare state), bin log, redo log (commit state).

For example, the update statement is as follows:

update user set name = '大彬' where id = 1;
  1. First look up the record with id 1, using the cache if available.
  2. Take the query result, change name to 大彬, then call the engine interface to write the new data. The InnoDB engine keeps the data in memory and records the redo log at the same time; the redo log enters the prepare state.
  3. When the executor is notified, it records the bin log, then calls the engine interface to set the redo log to the commit state.
  4. The update is complete.

Why is the redo log first put into the prepare state instead of being committed directly?

Suppose the redo log were written and committed directly, and the bin log written afterwards. If the machine crashes after the redo log is written but before the bin log is, the restarted server still recovers the row from the redo log, yet the bin log has no record of it. Any backup restored later from the bin log will be missing this row, and master-slave replication, which is based on the bin log, will lose it as well.

What is the difference between exist and in?

exists filters by the outer table's records: it traverses the outer table, substituting each row of the outer query into the inner query for evaluation. If the exists subquery returns any row, the condition is true and the current outer row is kept; if it returns no rows, the condition is false and the current row is discarded.

select a.* from A a where exists(select 1 from B b where a.id = b.id)

in first evaluates the inner statement and puts its result into a temporary table, then traverses the temporary table, substituting each of its rows into the outer query to search.

select * from A where id in (select id from B)

When the subquery's table is large, exists effectively reduces the total number of loops and is faster; when the outer query's table is large, in effectively reduces the loops over the outer table and is faster.

What is the difference between int(10) and char(10) in MySQL?

The 10 in int(10) is the display width of the number; it affects neither the storage size nor the value range. The 10 in char(10) is the number of characters that can be stored.
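
A minimal sketch of the difference, assuming a hypothetical table t (display width only becomes visible with zerofill, and it is deprecated in recent MySQL 8.0 releases):

create table t (a int(10) zerofill, b char(10));
insert into t values (42, 'abc');
select * from t;
-- a displays as 0000000042 (padded to width 10) but still stores a 4-byte integer;
-- b stores up to 10 characters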

What is the difference between truncate, delete and drop?

Similarities:

  1. truncate, delete without a where clause, and drop all delete the data in the table.

  2. drop and truncate are both DDL statements (data definition language) and are committed automatically after execution.

Differences:

  1. truncate and delete only remove the data and keep the table structure; drop removes the table structure together with the constraints, triggers, and indexes that depend on it;
  2. Generally speaking, execution speed: drop > truncate > delete.

What is the difference between having and where?

  • They act on different objects: the where clause acts on tables and views, while having acts on groups.
  • where filters before the data is grouped, while having filters after the data is grouped (see the sketch below).
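
A small illustration, assuming a hypothetical orders(user_id, amount) table:

select user_id, sum(amount) as total
from orders
where amount > 0            -- where: applied to each row before grouping
group by user_id
having sum(amount) > 100;   -- having: applied to each group's aggregate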

Why do we need to do master-slave synchronization?

  1. The separation of reading and writing enables the database to support greater concurrency.
  2. Real-time data is generated on the master server and analyzed on the slave server, thereby improving the performance of the master server.
  3. Data backup to ensure data security.

What is MySQL master-slave synchronization?

Master-slave synchronization allows data to be copied from one database server to other servers. During replication, one server acts as the master server (master) and the remaining servers act as slave servers (slave).

Because replication is performed asynchronously, the slave server does not need to be connected to the master server all the time. The slave server can even be connected to the master server intermittently through dial-up. Through the configuration file, you can specify to copy all databases, a certain database, or even a certain table on a certain database.
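
The replication filters mentioned above can be set in the configuration file, and since MySQL 5.7 they can also be changed at runtime with SQL. A minimal sketch on the slave, using a hypothetical database shop (the replication SQL thread must be stopped first):

-- replicate only one database
change replication filter replicate_do_db = (shop);
-- or replicate only a single table
change replication filter replicate_do_table = (shop.orders);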

What are optimistic locking and pessimistic locking?

Concurrency control in the database ensures that transaction isolation and consistency, as well as the consistency of the database itself, are not broken when multiple transactions access the same data at the same time. Optimistic locking and pessimistic locking are the main techniques used for concurrency control.

  • Pessimistic locking: assumes that concurrency conflicts will occur, so the data being operated on is locked. The lock is released only when the transaction commits; only then can other transactions modify the data. Implementation: the locking mechanism provided by the database.
  • Optimistic locking: assumes that no concurrency conflicts will occur and only checks whether the data has been modified when the operation is committed. Add a version field to the table and, before committing a modification, check whether version still equals the value that was originally read. If it does, the data has not been modified and can be updated; otherwise the data is stale and must not be updated. Implementation: optimistic locking is usually implemented with a version-number mechanism or the CAS algorithm (see the sketch below).
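
A minimal sketch of version-based optimistic locking, assuming a hypothetical account table with a version column:

-- read the row together with its current version
select balance, version from account where id = 1;   -- suppose version = 5
-- try to commit the change; it succeeds only if nobody updated the row in between
update account
set balance = 900, version = version + 1
where id = 1 and version = 5;
-- if the affected row count is 0, the data was modified concurrently: retry or abort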

Have you ever used processlist?

show processlist or show full processlist can be used to check whether MySQL is currently under pressure, which SQL statements are running, and whether any SQL is executing slowly. The returned columns are as follows:

  1. id : thread ID; a thread can be terminated with kill id
  2. db : database name
  3. user : database user
  4. host : IP of the database instance
  5. command : the currently executed command, such as Sleep, Query, Connect etc.
  6. time : consumption time, unit seconds
  7. state : execution state, mainly including the following states:
    • Sleep, the thread is waiting for the client to send a new request
    • Locked, the thread is waiting for the lock
    • Sending data, processing the records of a SELECT query and sending results to the client at the same time
    • Kill, executing a kill statement to terminate the specified thread
    • Connect, a slave node is connected to the master node
    • Quit, the thread is exiting
    • Sorting for group, sorting for GROUP BY
    • Sorting for order, sorting for ORDER BY
  8. info : the SQL statement being executed

Is MySQL query limit 1000,10 as fast as limit 10?

These correspond to two query forms: limit offset, size and limit size.

In fact, limit size is equivalent to limit 0, size, i.e., take size rows starting from offset 0.

In other words, the difference between the two methods is whether offset is 0.

Let’s first look at the internal execution logic of a limit statement.

MySQL is internally divided into a server layer and a storage engine layer. Under normal circumstances, the storage engine is InnoDB.

There are many modules in the server layer. What needs attention is that the executor is the component used to deal with the storage engine.

The executor retrieves rows of data by calling the interfaces provided by the storage engine. When a row fully meets the requirements (for example, it matches the other where conditions), it is placed in the result set and finally returned to the client that called MySQL.

Take the limit execution process of the primary key index as an example:

Execute select * from xxx order by id limit 0, 10;. The select is followed by an asterisk, which means all fields of each row are required.

The server layer calls the InnoDB interface, obtains the complete rows 0 through 10 from the primary key index in InnoDB, returns them to the server layer one by one, puts them into the server layer's result set, and returns them to the client.

Now make the offset larger, for example: select * from xxx order by id limit 500000, 10;

The server layer calls the InnoDB interface. Because offset = 500000 this time, the complete rows from 0 to (500000 + 10) are read from the primary key index in InnoDB and returned to the server layer one by one. The server layer discards rows according to the offset value, so only the last size rows, i.e. 10 rows, remain in the result set and are returned to the client.

It can be seen that when the offset is non-zero, the server layer obtains a lot of useless data from the engine layer, and obtaining this useless data is time-consuming.

Therefore, limit 1000,10 in a MySQL query is slower than limit 10. The reason is that limit 1000,10 fetches 1000 + 10 rows and discards the first 1000, which takes more time.

How much data can be stored in a B+ tree with a height of 3?

The InnoDB storage engine has its own minimum storage unit - page.

The command to query the InnoDB page size is as follows:

mysql> show global status like 'innodb_page_size';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Innodb_page_size | 16384 |
+------------------+-------+

It can be seen that the default InnoDB page size is 16384 B = 16384 / 1024 = 16 KB.

In MySQL, it is most appropriate to set the size of a B+ tree node to one page or a multiple of a page, because if a node were smaller than one page, reading that node would still read a whole page, wasting resources.

Non-leaf nodes in the B+ tree store keys + pointers; leaf nodes store data rows.

For leaf nodes, if the data size of a row is 1k, then 16 pieces of data can be stored on one page.

For non-leaf nodes, if the key is a bigint it takes 8 bytes, and a pointer takes 6 bytes in MySQL, 14 bytes in total. So a 16 KB page can store 16 * 1024 / 14 ≈ 1170 index entries.

So it can be calculated that for a B+ tree of height 2, the root node stores index entries pointing to 1170 leaf nodes, and each leaf node stores 16 rows, for a total of 1170 x 16 = 18720 rows. For a B+ tree of height 3, it can store 1170 x 1170 x 16 = 21,902,400 rows (more than 20 million). In other words, a B+ tree of height 3 is enough for more than 20 million rows, and a primary key lookup needs only 3 I/O operations to find the data.

Therefore, in InnoDB, when the height of the B+ tree is generally 3 levels, it can meet the needs of tens of millions of data storage.

How to optimize deep paging?

Take the SQL above as an example: select * from xxx order by id limit 500000, 10;

Method one:

As the analysis above shows, when the offset is very large, the server layer obtains a lot of useless data from the engine layer, and when select is followed by *, the complete row has to be copied. Copying complete rows is more time-consuming than copying only one or two columns of a row.

Because the first offset rows are not needed in the end, there is no need to copy their complete fields, so the SQL statement can be rewritten as:

select * from xxx where id >= (select id from xxx order by id limit 500000, 1) order by id limit 10;

First the subquery select id from xxx order by id limit 500000, 1 is executed. This actually reads 500000 + 1 rows from the primary key index in InnoDB; the server layer then discards the first 500,000 rows and keeps only the id of the last row.

The difference is that, on the way back to the server layer, only the id column of each row is copied rather than all of its columns. When the data volume is large, the time this saves is quite noticeable.

After getting the id above, and assuming this id happens to equal 500000, the SQL becomes:

select * from xxx where id >= 500000 order by id limit 10;

In this way, InnoDB walks the primary key index again, quickly locates the row with id = 500000 through the B+ tree with time complexity O(log n), and then reads the next 10 rows.

Method Two:

Sort all the data by the primary key id and retrieve it in batches, using the maximum id of the current batch as the filter condition for the next query.

select * from xxx where id > start_id order by id limit 10;

Each query locates the position of start_id through the primary key index and then scans the next 10 rows. This way, no matter how large the data volume is, query performance stays relatively stable.
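
Usage sketch: each batch feeds its maximum id into the next query (xxx and start_id are the placeholders from the text above):

select * from xxx order by id limit 10;                  -- first batch
-- suppose the largest id returned is 10
select * from xxx where id > 10 order by id limit 10;    -- second batch
-- the next batch uses the new maximum id, and so on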

How to optimize large table query if it is slow?

A certain table has nearly 10 million data, and the query is slow. How to optimize it?

When the number of records in a single MySQL table is too large, the performance of the database will decrease significantly. Some common optimization measures are as follows:

  • Index properly. Create indexes on the appropriate fields, for example the columns involved in WHERE and ORDER BY clauses. Use EXPLAIN to check whether an index is used or a full table scan occurs.
  • Index optimization, SQL optimization. Leftmost-prefix matching principle, etc. See: https://topjavaer.cn/database/mysql.html#%E4%BB%80%E4%B9%88%E6%98%AF%E8%A6%86%E7%9B%96%E7%B4%A2%E5%BC%95
  • Create partitions. Build horizontal partitions on key fields such as time fields. If queries often filter by time range, this can improve performance a lot (see the sketch after this list).
  • Take advantage of caching. Use Redis or similar caches for hotspot data to improve query efficiency.
  • Limit the scope of data. For example, when users query historical information, restrict the range to within one month.
  • Separation of reading and writing. Classic database splitting scheme, the master database is responsible for writing, and the slave database is responsible for reading
  • Optimization is carried out through sub-database and sub-table, mainly including vertical split and horizontal split.
  • Replicate the data heterogeneously to ES (Elasticsearch).
  • Separate hot and cold data: move rarely used data from months ago to cold storage and keep the latest data in hot storage.
  • Upgrade to a MySQL-compatible distributed database (OceanBase, TiDB).
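
A hedged sketch of the time-based partitioning idea, with a hypothetical orders table (in MySQL, the partition key must be part of every unique key, hence the composite primary key):

create table orders (
  id bigint not null,
  created_at datetime not null,
  amount decimal(10, 2),
  primary key (id, created_at)
) partition by range (year(created_at)) (
  partition p2022 values less than (2023),
  partition p2023 values less than (2024),
  partition pmax values less than maxvalue
);
-- a query with a time-range condition only scans the matching partitions
select * from orders where created_at >= '2023-01-01' and created_at < '2023-02-01';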

How big is a single table in MySQL to split databases and tables?

There are currently two mainstream theories:

  1. If the data volume of a single MySQL table is greater than 20 million rows, the performance will be significantly reduced. Consider splitting databases and tables.
  2. Alibaba's "Java Development Manual" states that database and table sharding is only recommended when the number of rows in a single table exceeds 5 million or the capacity of a single table exceeds 2GB.

In fact, this threshold has little to do with the absolute number of records; it depends on MySQL's configuration and the machine's hardware. To improve performance, MySQL loads the table's indexes into memory. As long as the InnoDB buffer pool is large enough to hold them fully, queries are not a problem. But once a single table grows past the point where its indexes no longer fit in memory, subsequent SQL queries generate disk I/O and performance degrades. Of course, this is also related to the design of the specific table structure; the ultimate constraint is memory.

Therefore, splitting databases and tables should be driven by actual needs and not over-designed. Do not adopt a sharded design at the start of a project; instead, consider splitting databases and tables to improve system performance once the business has grown to the point where no further optimization is possible. On this point, Alibaba's "Java Development Manual" adds: if the data volume is not expected to reach this level within three years, do not shard when creating the table.

As for the size of a single MySQL table to be divided into databases and tables, it should be evaluated based on machine resources.

Let’s talk about the differences between count(1), count(*) and count(field name)

Well, let’s first talk about the difference between count(1) and count(field name).

The main differences between the two are:

  1. count(1) will count all records in the table, including records with null fields.
  2. count(field name) will count the number of times this field appears in the table, ignoring the case where the field is null. That is, records with null fields are not counted.

Next let’s take a look at the differences between the three.

In terms of execution effect:

  • count(*) covers all columns and is equivalent to the row count. Rows whose column values are NULL are not ignored.
  • count(1) ignores all columns and uses the constant 1 to represent each row. Rows whose column values are NULL are not ignored either.
  • count(field name) covers only the named column. Rows whose value in that column is empty are ignored (empty here does not mean an empty string or 0, but NULL). That is, rows where the field value is NULL are not counted (see the sketch after these lists).

In terms of execution efficiency:

  • If the column is the primary key, count(field name) is faster than count(1).
  • If the column is not the primary key, count(1) is faster than count(column name).
  • If the table has multiple columns and no primary key, count(1) performs better than count(*).
  • If there is a primary key, select count(primary key) has the best execution efficiency.
  • If the table has only one field, select count(*) is optimal.
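
To make the NULL behavior concrete, a small sketch with a hypothetical table t(name):

-- t contains three rows: ('a'), ('b'), (null)
select count(*), count(1), count(name) from t;
-- returns 3, 3, 2: count(*) and count(1) count every row,
-- while count(name) skips the row where name is null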

What is the difference between DATETIME and TIMESTAMP in MySQL?

Well, both TIMESTAMP and DATETIME can be used to store time. Their main differences are as follows:

1. Representation range

  • DATETIME: 1000-01-01 00:00:00.000000 to 9999-12-31 23:59:59.999999
  • TIMESTAMP: '1970-01-01 00:00:01.000000' UTC to '2038-01-19 03:14:07.999999' UTC

TIMESTAMP supports a smaller time range than DATETIME, and it is easy to exceed.

2. Space occupation

  • TIMESTAMP: 4 bytes
  • DATETIME: Before MySQL 5.6.4, it occupies 8 bytes, and in subsequent versions, it occupies 5 bytes

3. Whether the stored time is converted/filled in automatically

By default, when inserting or updating data, a TIMESTAMP column is automatically filled or updated with the current time (CURRENT_TIMESTAMP). DATETIME does no conversion of any kind and does not detect the time zone; it stores whatever data you give it (see the sketch at the end of this answer).

4. TIMESTAMP is more affected by the time zone, the MySQL version, and the server's SQL mode. Because TIMESTAMP stores a timestamp, the times read in different time zones differ.

5. If NULL is stored, the two actually store different values.

  • TIMESTAMP: The current time now() will be automatically stored.
  • DATETIME: The current time will not be automatically stored, but the NULL value will be stored directly.
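
A minimal sketch of the auto-fill behavior, assuming a hypothetical table t (the NULL-to-now() behavior in point 5 additionally depends on the explicit_defaults_for_timestamp setting, which is off in older versions):

create table t (
  id int primary key,
  ts timestamp default current_timestamp on update current_timestamp,
  dt datetime
);
insert into t (id, dt) values (1, '2023-01-01 00:00:00');
-- ts is filled with the current time automatically; dt stores exactly what was given
update t set dt = '2023-06-01 00:00:00' where id = 1;
-- ts is refreshed to the new current time as well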

Tell me why it is not recommended to use foreign keys?

A foreign key is a constraint. This constraint ensures that the relationships between data in different tables remain intact. Foreign keys are not entirely without advantages.

Foreign keys can ensure data integrity and consistency, and cascading operations are convenient. Moreover, the use of foreign keys can entrust the judgment of data integrity to the database, reducing the amount of code in the program.

Although foreign keys can ensure data integrity, they can bring many defects to the system.

1. Concurrency issues. When using foreign keys, every time you modify the data, you need to check the data in another table and acquire additional locks. If you are in a high-concurrency and high-traffic transaction scenario, using foreign keys is more likely to cause deadlock.

2. Scalability issues. For example, when migrating from MySQL to another database such as Oracle, foreign keys depend on features of the database itself, which can make migration inconvenient.

3. It is not conducive to splitting databases and tables. With horizontal splitting and sharding, foreign keys cannot take effect. Keeping the maintenance of relationships between data in the application layer saves a lot of trouble for future sharding.

What are the benefits of using auto-incrementing primary keys?

An auto-increment primary key keeps inserts into the primary key index in increasing order as much as possible, avoiding page splits, so the index stays more compact and queries are more efficient.

Why can’t InnoDB’s auto-increment value be rolled back?

Mainly to improve the efficiency and parallelism of inserting data.

Suppose two transactions execute in parallel. When they apply for auto-increment values, a lock must be taken and values handed out sequentially to prevent the two transactions from receiving the same auto-increment ID.

Assume that transaction A applies for id=2, and transaction B applies for id=3. Then the auto-increment value of table t is 4 at this time, and execution continues thereafter.

Transaction B committed correctly, but transaction A had a unique key conflict.

If transaction A is allowed to roll back the auto-increment id, that is, change the current auto-increment value of table t back to 2, then there will be a situation like this: there is already a row with id=3 in the table, and the current auto-increment id value is 2.

Next, other transactions will apply for id = 2 and then for id = 3. At that point the insert statement fails with a "primary key conflict" error.

In order to resolve this primary key conflict, there are two methods:

  • Before each application for an id, first determine whether the id already exists in the table. If it exists, skip this id. However, this method is costly. Because originally applying for an id was a quick operation, but now we have to go to the primary key index tree to determine whether the id exists.
  • To expand the lock range of the auto-increment ID, you must wait until a transaction is completed and submitted before the next transaction can apply for the auto-increment ID. The problem with this method is that the lock granularity is too large and the system's concurrency capability is greatly reduced.

It can be seen that both methods will cause performance problems.

Therefore, InnoDB gave up the design of "allowing auto-increment ID to be rolled back", and the auto-increment ID will not be rolled back if the statement fails to execute.

Where is the auto-incrementing primary key stored?

Different engines have different storage strategies for auto-incremented values:

  • The auto-increment value of the MyISAM engine is saved in the data file.
  • Before MySQL 8.0, the auto-increment value of the InnoDB engine was stored only in memory, and this value is lost after MySQL restarts. The first time the table is opened after a restart, InnoDB finds the maximum auto-increment value max(id) and uses max(id) + 1 as the table's auto-increment value. MySQL 8.0 records auto-increment changes in the redo log and relies on the redo log to restore the value on restart.

Does the auto-incrementing primary key have to be continuous?

Not necessarily, there are several situations that can cause the auto-incrementing primary key to be discontinuous.

1. A unique key conflict makes the auto-increment primary key discontinuous. When we insert data into an InnoDB table with an auto-increment primary key, if the insert violates a unique constraint defined on the table, the insert fails, but the table's auto-increment counter has already advanced by 1. The next insert cannot reuse the value consumed by the failed insert and must use a newly generated value, leaving a gap (see the sketch after point 4).

2. Transaction rollback makes the auto-increment primary key discontinuous. When we insert data into an InnoDB table with an auto-increment primary key inside an explicitly started transaction, and the transaction is eventually rolled back for some reason, the auto-increment values it consumed are not returned. The next insert cannot reuse the rolled-back values and must apply for new ones.

3. Batch inserts make the auto-increment values discontinuous. MySQL has a batch allocation strategy for auto-increment IDs:

  • During the execution of the statement, the first time you apply for an auto-increment ID, 1 auto-increment ID will be allocated.
  • After 1 is used up, if you apply for the second time, 2 self-increasing IDs will be allocated.
  • After 2 are used up, if you apply for the third time, 4 self-increasing IDs will be allocated.
  • And so on, each application is twice the amount of the previous one (the last application may not all be used)

If the next transaction inserts data again, it applies for values starting after the previous transaction's allocation, so the auto-increment values become discontinuous.

4. If the auto-increment step size is not 1, it will also cause the auto-increment primary key to be discontinuous.
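
A small sketch of case 1, assuming a hypothetical table t with a unique column c:

create table t (id int auto_increment primary key, c int unique);
insert into t (c) values (1);  -- succeeds, gets id = 1
insert into t (c) values (1);  -- duplicate key error, but id 2 is already consumed
insert into t (c) values (2);  -- succeeds, gets id = 3: the sequence has a gap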

How to synchronize MySQL data to Redis cache?

Reference: https://cloud.tencent.com/developer/article/1805755

There are two options:

1. MySQL automatically refreshes Redis synchronously, implemented with a MySQL trigger + UDF function (a sketch follows the steps below).

The process is roughly as follows:

  1. Define a trigger in MySQL on the data to be monitored
  2. When the client writes data to MySQL, the trigger fires, and after firing it calls a MySQL UDF function
  3. The UDF function writes the data to Redis, achieving synchronization
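
A hedged sketch of the trigger half; redis_set is a hypothetical UDF that would have to be installed separately (it is not built into MySQL), and the user table is reused from earlier examples:

delimiter //
create trigger user_after_update
after update on user
for each row
begin
  -- push the new value to Redis through the (hypothetical) UDF
  set @r = redis_set(concat('user:', new.id), new.name);
end//
delimiter ;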

2. Parse MySQL's binlog to synchronize data in the database to Redis. This can be achieved through canal. Canal is an open source project under Alibaba that provides incremental data subscription & consumption based on database incremental log analysis.

The principle of canal is as follows:

  1. Canal simulates the interaction protocol of mysql slave, disguises itself as mysql slave, and sends the dump protocol to mysql master.
  2. mysql master receives the dump request and starts pushing binary log to canal
  3. canal parses the binary log object (originally a byte stream) and writes the data to Redis synchronously.

Why does the Alibaba Java manual prohibit the use of stored procedures?

Let’s first look at what a stored procedure is.

A stored procedure is a set of SQL statements that are used to complete specific functions in a large database system. It is stored in the database and is permanently valid after being compiled once. The user specifies the name of the stored procedure and gives parameters (if the stored procedure has parameters) to execute it.

Stored procedures mainly have the following disadvantages.

  1. Stored procedures are difficult to debug . The development of stored procedures has always lacked an effective IDE environment. The SQL itself is often very long, and debugging requires splitting the sentences and executing them independently, which is very troublesome.
  2. Poor portability . It is difficult to transplant stored procedures. Generally, business systems will inevitably use the unique characteristics and syntax of the database. When replacing the database, this part of the code needs to be rewritten, which is costly.
  3. Difficulties in management . The directory of stored procedures is flat, not a tree structure like a file system. It is easy to handle when there are few scripts, but once there are too many, the directory will fall into chaos.
  4. Stored procedures are only optimized once . Sometimes, as data volume grows or the data structure changes, the execution plan chosen when the procedure was compiled may no longer be optimal, requiring manual intervention or recompilation.

Finally, I would like to share a Github repository with you. It contains more than 300 classic computer book PDFs compiled by Dabin, covering C, C++, Java, Python, front-end, databases, operating systems, computer networks, data structures and algorithms, machine learning, the programming life, and more. Star it, and the next time you are looking for a book, search there directly. The repository is continuously updated! Github address

If you cannot access Github, you can access the Gitee address. Gitee address
