MySQL interview questions (recitation version)

Article directory:

  • What is MySQL?

  • What are the commonly used storage engines for MySQL? What's the difference between them? ***

  • The three normal forms of a database**

  • What are the data types of MySQL**

  • index***

    • What is an index?

    • What are the pros and cons of indexing?

    • index data structure?

    • Difference between Hash index and B+ tree?

    • What are the types of indexes?

    • What are the types of indexes (by purpose)?

    • What is the difference between a B tree and a B+ tree?

    • Why do databases use B+ trees instead of B trees?

    • What is a clustered index and what is a non-clustered index?

    • Will non-clustered indexes perform back-to-table queries?

    • What are the usage scenarios of indexes?

    • Index design principles?

    • How to optimize the index?

    • How to create/drop indexes?

    • Will performance definitely improve when querying with an index?

    • What is a prefix index?

    • What is the leftmost matching principle?

    • When does an index fail?

  • Database transactions***

    • What is a database transaction?

    • What are the four characteristics of transactions?

    • Database Concurrency Consistency Issues

    • What are the isolation levels of the database?

    • How is the isolation level achieved?

    • What is MVCC?

  • Database lock***

    • What is a database lock?

    • The relationship between database locks and isolation levels?

    • What are the types of database locks?

    • The row lock mode of the InnoDB engine in MySQL and how is it implemented?

    • What are optimistic locks and pessimistic locks in databases, and how to implement them?

    • What is deadlock? How to avoid it?

  • SQL Statement Basics

    • What are the main categories of SQL statements?

    • What are the SQL constraints? **

    • What is a subquery? **

    • Do you know several connection queries of MySQL? ***

    • The difference between in and exists in mysql? **

    • Difference between varchar and char? ***

    • Difference between int(10) and char(10) and varchar(10) in MySQL? ***

    • What is the difference between drop, delete and truncate? **

    • Difference between UNION and UNION ALL? **

    • What is a temporary table, when will a temporary table be used, and when should a temporary table be deleted?

    • How to optimize large table data query? ***

    • Know about slow log queries? Are statistics too slow for queries? How to optimize for slow queries? ***

    • Why set a primary key? **

    • The primary key generally uses an auto-incrementing ID or a UUID? **

    • Why should the field be set to not null? **

    • How to optimize data access during query? ***

    • How to optimize long and difficult queries? **

    • How to optimize LIMIT pagination? **

    • How to optimize UNION queries**

    • How to optimize the WHERE clause ***

    • What is the reason for the slow execution of the SQL statement? ***

    • Execution order of SQL statements? *

  • Database optimization

    • How to optimize large tables? ***

    • What is vertical sub-table, vertical sub-library, horizontal sub-table, horizontal sub-library? ***

    • How to deal with the ID key after sub-database and sub-table? ***

    • MySQL replication principle and process? How to implement master-slave replication? ***

    • Understand read-write separation? ***

What is MySQL?

According to Baidu Encyclopedia, MySQL is an open source relational database management system (RDBMS) that is managed with Structured Query Language (SQL), the most widely used database language. Because MySQL is open source, anyone can download it under the GNU General Public License (GPL) and modify it to suit their needs.

What are the commonly used storage engines for MySQL? What's the difference between them? ***

  • InnoDB

    InnoDB is MySQL's default storage engine and supports operations such as transactions, row locks, and foreign keys.

  • MyISAM

    MyISAM was the default storage engine before MySQL 5.5. It handles concurrency poorly, does not support transactions or foreign keys, and locks at table granularity only.

Feature | InnoDB | MyISAM
Foreign keys | Supported | Not supported
Transactions | Supported | Not supported
Locks | Table locks and row locks | Table locks only
Recoverability | Recovers from the transaction log | No transaction log
Table structure | Data and indexes stored together (.ibd and .frm) | Data and indexes stored separately (data in .MYD, indexes in .MYI)
Query performance | Generally slower than MyISAM for simple read-heavy workloads | Generally faster than InnoDB for simple read-heavy workloads
Index | Clustered index | Non-clustered index
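
The difference in transaction support can be seen directly in SQL. The following is a minimal sketch, not a benchmark; the table names orders_innodb and log_myisam are hypothetical, and it assumes a server where the MyISAM engine is enabled.

```sql
-- List the engines this server supports; the Support column marks the default.
SHOW ENGINES;

-- Hypothetical tables: the engine is chosen per table.
CREATE TABLE orders_innodb (
    id     BIGINT PRIMARY KEY,
    amount DECIMAL(10, 2) NOT NULL
) ENGINE = InnoDB;

CREATE TABLE log_myisam (
    id      BIGINT PRIMARY KEY,
    message VARCHAR(255)
) ENGINE = MyISAM;

-- Only the InnoDB table takes part in the transaction.
START TRANSACTION;
INSERT INTO orders_innodb VALUES (1, 9.99);
INSERT INTO log_myisam VALUES (1, 'created order 1');
ROLLBACK;

SELECT COUNT(*) FROM orders_innodb;  -- the InnoDB insert was rolled back
SELECT COUNT(*) FROM log_myisam;     -- the MyISAM insert is not undone by ROLLBACK
```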

The three normal forms of a database**

  • First Normal Form (1NF): Every column is atomic; each field value in the table cannot be decomposed further.

  • Second Normal Form (2NF): On top of 1NF, every non-key column depends on the whole primary key, not on just part of a composite key.

  • Third Normal Form (3NF): On top of 2NF, every non-key column depends on the primary key directly, not transitively through another non-key column.
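
As an illustration of the third normal form, here is a small sketch with hypothetical tables (employee_flat, department, employee are made-up names). In the flat table dept_name depends on dept_id rather than directly on the key emp_id, so it is moved into its own table.

```sql
-- Violates 3NF: dept_name depends on dept_id, not directly on emp_id.
CREATE TABLE employee_flat (
    emp_id    INT PRIMARY KEY,
    emp_name  VARCHAR(50) NOT NULL,
    dept_id   INT NOT NULL,
    dept_name VARCHAR(50) NOT NULL
);

-- 3NF: the transitively dependent column lives in its own table.
CREATE TABLE department (
    dept_id   INT PRIMARY KEY,
    dept_name VARCHAR(50) NOT NULL
);

CREATE TABLE employee (
    emp_id   INT PRIMARY KEY,
    emp_name VARCHAR(50) NOT NULL,
    dept_id  INT NOT NULL,
    FOREIGN KEY (dept_id) REFERENCES department (dept_id)
);
```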

What are the data types of MySQL**

  • integer

    TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT occupy 8, 16, 24, 32, and 64 bits of storage respectively. Note that the 10 in INT(10) is only a display width and does not change the storage size or the value range. It only has a visible effect together with UNSIGNED ZEROFILL. For example, if the column type is INT(3) UNSIGNED ZEROFILL and the value 3 is inserted, the stored value is still 3 but it is displayed as 003. Display width and ZEROFILL are deprecated as of MySQL 8.0.17.

  • floating point number

    FLOAT and DOUBLE are approximate floating-point types, while DECIMAL is an exact fixed-point type that can store exact decimals. DECIMAL is less efficient than FLOAT and DOUBLE. All three accept a precision and scale. For example, FLOAT(5,2) means 5 digits in total, of which 2 are for the fractional part and 3 for the integer part.

  • string

    Commonly used strings are CHAR and VARCHAR. VARCHAR is mainly used to store variable-length strings, which saves space compared to fixed-length CHAR. CHAR is fixed length and allocates space according to the defined string length.

    Application scenario: CHAR suits frequently updated data because fixed-length values are less prone to fragmentation. CHAR also suits very short columns, where it is more efficient than VARCHAR. Avoid TEXT/BLOB where possible: queries that need a temporary table on such columns can fall back to on-disk temporary tables, which carries a serious performance cost.

  • date

    The commonly used types are YEAR, TIME, DATE, DATETIME, and TIMESTAMP. DATETIME stores values from year 1000 to 9999 with second precision and is independent of the time zone; it takes 8 bytes (5 bytes plus optional fractional-second storage since MySQL 5.6.4). TIMESTAMP matches the UNIX timestamp: it covers 1970-01-01 to 2038 with second precision, takes 4 bytes, and depends on the time zone.

    Application scenario: prefer TIMESTAMP when the 1970 to 2038 range is enough, since it is more space-efficient than DATETIME.
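
The points above about display width and time zones can be tried out with a small sketch. The table type_demo and its columns are hypothetical; on MySQL 8.0.17 and later the INT(3) ZEROFILL definition is accepted but raises a deprecation warning.

```sql
CREATE TABLE type_demo (
    n          INT(3) UNSIGNED ZEROFILL,  -- display width 3, padded with zeros
    price      DECIMAL(5, 2),             -- exact: 5 digits in total, 2 after the point
    code       CHAR(4),                   -- fixed length
    name       VARCHAR(50),               -- variable length
    created_dt DATETIME,                  -- stored as given, no time zone conversion
    created_ts TIMESTAMP                  -- converted to UTC for storage
);

-- Insert while the session is in UTC.
SET time_zone = '+00:00';
INSERT INTO type_demo (n, price, code, name, created_dt, created_ts)
VALUES (3, 123.45, 'ab', 'ab', '2024-01-01 12:00:00', '2024-01-01 12:00:00');

-- n is displayed as 003 although the stored value is 3.
SELECT n, price FROM type_demo;

-- Read back in another session time zone: created_dt is unchanged,
-- created_ts is shown shifted by eight hours.
SET time_zone = '+08:00';
SELECT created_dt, created_ts FROM type_demo;
```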

index***

What is an index?

According to Baidu Encyclopedia, an index is a structure that sorts the values of one or more columns in a database table. With an index, specific information in the table can be accessed quickly.

What are the pros and cons of indexing?

Advantages:

  • Greatly speeds up data retrieval.

  • Turns random I/O into sequential I/O (because the leaf nodes of the B+ tree are linked together).

  • Speeds up joins between tables.

Disadvantages:

  • Space: an index takes up physical storage.

  • Time: creating and maintaining an index takes time. For example, every insert, delete, and update must also maintain the index.

index data structure?

The main index data structures are the B+ tree and the hash table, giving B+ tree indexes and hash indexes respectively. InnoDB uses B+ tree indexes by default. Its hash index is the adaptive hash index, which the engine builds and maintains internally; it cannot be created by hand.

  • B+ tree index

    B+ trees, balanced binary trees, and red-black trees are all classic data structures. In a B+ tree, all records are placed in the leaf nodes in key order, as shown in the following figure.

As the figure shows, because the B+ tree is ordered and all data sits in the leaf nodes, lookups are efficient and sorting and range searches are supported.

B+ tree indexes are divided into primary indexes and secondary (auxiliary) indexes. The primary index is a clustered index and a secondary index is a non-clustered index. The clustered index is a B+ tree keyed on the primary key, and its leaf nodes store the complete data records. A non-clustered index is a B+ tree keyed on a non-primary-key column, and its leaf nodes store the primary key value. A query through a non-clustered index therefore finds the primary key value first and then uses the clustered index to find the row for that primary key. The leaf nodes in the figure above store data records, so it shows the structure of a clustered index. The structure of a non-clustered index is as follows:

The letters in the above figure are values of a non-primary-key column. To query the row whose column value is B, first find its primary key 7 in the non-clustered index, then look up the row for primary key 7 in the clustered index.

  • hash index

    A hash index is implemented on a hash table. For each row, the storage engine hashes the index column to obtain a hash code; the hash function should make different column values produce different hash codes as far as possible. The hash code is the key of the hash table and a pointer to the data row is the value. Finding a row this way takes O(1) time on average, so hash indexes are generally used for exact-match lookups.

Difference between Hash index and B+ tree?

Because the two data structures differ, so do their use cases. A hash index is generally used for exact equality lookups, while a B+ tree index handles the other kinds of search as well. In most cases a B+ tree index is the one to choose.

  • Hash indexes do not support sorting because hash tables are unordered.

  • Hash indexes do not support range lookups.

  • Hash indexes do not support fuzzy queries and leftmost prefix matching for multicolumn indexes.

  • Because there will be hash conflicts in the hash table, the performance of the hash index is unstable, while the performance of the B+ tree index is relatively stable, and each query is from the root node to the leaf node.
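
The contrast between the two index types can be sketched as follows. The MEMORY engine is used here only because it accepts an explicit USING HASH; the table session_cache and its columns are hypothetical, and the actual plans depend on the data.

```sql
-- Hypothetical table with one hash index and one B-tree index.
CREATE TABLE session_cache (
    token   CHAR(32) NOT NULL,
    user_id INT NOT NULL,
    INDEX idx_token (token) USING HASH,
    INDEX idx_user  (user_id) USING BTREE
) ENGINE = MEMORY;

-- Equality lookup: the hash index is a candidate.
EXPLAIN SELECT * FROM session_cache WHERE token = 'abc';

-- Range lookup: served by the B-tree index.
EXPLAIN SELECT * FROM session_cache WHERE user_id BETWEEN 10 AND 20;

-- Range lookup on the hashed column: the hash index cannot be used.
EXPLAIN SELECT * FROM session_cache WHERE token > 'abc';
```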

What are the types of indexes?

The main index types of MySQL are FULLTEXT, HASH, BTREE, RTREE.

  • FULLTEXT

    FULLTEXT is the full-text index. MyISAM supports it, and InnoDB supports it in MySQL 5.6.4 and above. It is used to find keywords inside text rather than to compare whole values for equality, and is mostly created on CHAR, VARCHAR, and TEXT columns. The full-text index mainly addresses the poor efficiency of fuzzy queries on text such as WHERE name LIKE "%zhang%".

  • HASH

    HASH is a hash index. Hash index is mostly used for equivalent query. The time complexity is O(1), which is very efficient, but does not support sorting, range query, and fuzzy query.

  • BTREE

    BTREE is the B+ tree index and the default index type of the InnoDB storage engine. It supports sorting, grouping, range queries, fuzzy queries, and so on, with stable performance.

  • RTREE

    RTREE is spatial data index, which is mostly used for storage of geographic data. Compared with other indexes, the advantage of spatial data index lies in range search.
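
A minimal sketch of a full-text index in use, assuming a hypothetical table article. Note that MATCH ... AGAINST matches whole words, so it is not a drop-in replacement for an arbitrary substring search.

```sql
CREATE TABLE article (
    id    INT PRIMARY KEY,
    title VARCHAR(200),
    body  TEXT,
    FULLTEXT KEY ft_title_body (title, body)
) ENGINE = InnoDB;

-- Keyword search through the full-text index; the column list in MATCH
-- must be the same as the column list of the FULLTEXT index.
SELECT id, title
FROM article
WHERE MATCH(title, body) AGAINST('zhang' IN NATURAL LANGUAGE MODE);

-- The LIKE form it replaces; a leading % rules out an ordinary index.
SELECT id, title
FROM article
WHERE title LIKE '%zhang%';
```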

What are the types of indexes (by purpose)?

  • Primary key index: Data columns are not allowed to be duplicated, cannot be NULL, and a table can only have one primary key index

  • Composite Index: An index that consists of multiple column values.

  • Unique index: Values in the indexed column must be unique, but NULL is allowed. For a composite unique index, the combination of column values must be unique.

  • Full-text index: Searches the content of text.

  • Ordinary index: basic index type, can be NULL
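
The five kinds listed above can all be declared in one table definition. This is only a sketch; the table member and its index names are hypothetical.

```sql
CREATE TABLE member (
    id         INT NOT NULL,
    email      VARCHAR(100),
    first_name VARCHAR(50),
    last_name  VARCHAR(50),
    city       VARCHAR(50),
    bio        TEXT,
    PRIMARY KEY (id),                      -- primary key index: unique, not NULL
    UNIQUE KEY uk_email (email),           -- unique index: no duplicates, NULL allowed
    KEY idx_name (last_name, first_name),  -- composite index on two columns
    KEY idx_city (city),                   -- ordinary index
    FULLTEXT KEY ft_bio (bio)              -- full-text index
);

-- A unique index accepts several rows whose indexed column is NULL.
INSERT INTO member (id, email) VALUES (1, NULL), (2, NULL);
```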

What is the difference between a B tree and a B+ tree?

There are two main differences between B-trees and B+ trees:

  • The internal nodes and leaf nodes in the B-tree both store keys and values, while the internal nodes of the B+ tree have only keys and no values, and leaf nodes store all keys and values.

  • The leaf nodes of the B+ tree are connected together to facilitate sequential retrieval.

    The structure diagrams of the two are as follows.

Why do databases use B+ trees instead of B trees?

  • B-tree is suitable for random retrieval, while B+ tree is suitable for random and sequential retrieval

  • The B+ tree uses space more efficiently. Every node of a B-tree stores keys and values, while the internal nodes of a B+ tree store only keys, so one B+ tree node holds more index entries. This lowers the height of the tree, reduces the number of I/Os, and makes retrieval faster.

  • The leaf nodes of the B+ tree are all connected together, so range search and sequential search are more convenient

  • The performance of the B+ tree is more stable, because in the B+ tree, each query is from the root node to the leaf node, while in the B tree, the value to be queried may not be in the leaf node, but has been found in the internal node.

When is a B-tree the better fit? Because internal nodes of a B-tree can also store values, frequently accessed values can be placed close to the root, which improves lookups for them. Overall, though, the B+ tree is the better fit for a database index.
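
A rough back-of-the-envelope calculation shows why key-only internal nodes keep the tree shallow. The numbers below are assumptions for illustration, not measurements: a 16 KB page (the InnoDB default), an 8-byte key, a 6-byte child pointer, about 1 KB per row, and no allowance for page headers or fill factor.

```sql
SELECT
    FLOOR(16384 / (8 + 6))                             AS keys_per_internal_page,
    16384 DIV 1024                                     AS rows_per_leaf_page,
    FLOOR(16384 / (8 + 6)) * (16384 DIV 1024)          AS rows_at_height_2,
    POW(FLOOR(16384 / (8 + 6)), 2) * (16384 DIV 1024)  AS rows_at_height_3;
```

Under these assumptions an internal page holds about 1,170 keys, so a tree of height three addresses roughly 1,170 × 1,170 × 16, which is about 21.9 million rows. If internal nodes also carried 1 KB rows, as in a B-tree, each node would hold only about 16 entries and the tree would need to be much taller for the same row count.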

What is a clustered index and what is a non-clustered index?

The main difference between a clustered index and a non-clustered index is whether the data and the index are stored separately.

  • Clustered index: The data and the index are stored together, and the leaf nodes of the index structure retain the data rows.

  • Non-clustered index: The data entry and the index are stored separately, and the index leaf node stores the address pointing to the data row.

In the InnoDB storage engine, the default index is the B+ tree index. The index built on the primary key is the primary index, which is also the clustered index; an index built on top of the primary index is a secondary index, which is a non-clustered index. A secondary index is said to be built on top of the primary index because its leaf nodes store the primary key.

In the MyISAM storage engine, the default index is also a B+ tree index, but both primary and secondary indexes are non-clustered: the leaf nodes store an address pointing to the data row. A lookup through a secondary index therefore does not need to go through the primary key index.

You can see the difference between them from two very classic pictures (pictures are from the Internet):

Will non-clustered indexes perform back-to-table queries?

As described above, the leaf nodes of a non-clustered index store the primary key. A query first finds the primary key through the non-clustered index and then finds the row for that primary key through the clustered index. This second lookup through the clustered index is the back-to-table query. So does a non-clustered index always require a back-to-table query?

Not necessarily. This is where a covering index comes in: if everything the query needs can be read from the secondary index, there is no need to go back to the table. For example, take a table user that stores personal information with the columns id, name, and age. Assume the clustered index is keyed on id and the non-clustered index is keyed on name. The query select id,name from user where name = 'zhangsan'; needs no back-to-table lookup, because both columns can be read from the non-clustered index; this is index coverage. The query select id,name,age from user where name = 'zhangsan'; does need a back-to-table lookup, because age is not in the non-clustered index. To avoid it, make the index cover the query: create a joint index on (name, age), after which select id,name,age from user where name = 'zhangsan'; is answered from the index alone.

Therefore, a covering index removes the back-to-table query of a non-clustered index.
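
The user-table example can be checked with EXPLAIN. This is a sketch under the assumptions in the text; the index names idx_name and idx_name_age are made up. When a query is served from the index alone, the Extra column of EXPLAIN is expected to show "Using index".

```sql
CREATE TABLE user (
    id   INT PRIMARY KEY,
    name VARCHAR(50),
    age  INT,
    KEY idx_name (name)
);

-- Covered: the secondary index on name already contains id and name.
EXPLAIN SELECT id, name FROM user WHERE name = 'zhangsan';

-- Not covered: age has to be fetched from the clustered index.
EXPLAIN SELECT id, name, age FROM user WHERE name = 'zhangsan';

-- Put name first so the WHERE clause can still use the leftmost column.
ALTER TABLE user ADD INDEX idx_name_age (name, age);

-- Covered again: id, name, and age are all available in idx_name_age.
EXPLAIN SELECT id, name, age FROM user WHERE name = 'zhangsan';
```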

What are the usage scenarios of indexes?

  • It is very effective to build indexes for medium and large tables. For very small tables, full table scans are generally faster.

  • For very large tables, the cost of creating and maintaining indexes will also become high, and partitioning technology can be considered at this time.

  • If a table sees many inserts, deletes, and updates but very few queries, there is no need to create an index, because maintaining the index also has a cost.

  • Columns that rarely appear in WHERE conditions do not need to be indexed.

  • If several columns are frequently queried together, consider a joint (composite) index.

  • When a column has many values and none of them repeat, consider a unique index.

  • When a column has many values and some of them repeat, consider an ordinary index.

Index design principles?

  • The most suitable columns for indexing are the columns that appear after the where or the columns specified in the join sentence, not the columns in the select list that appear after the SELECT keyword.

  • The larger the cardinality of the index column, the better the effect of the index, in other words, the higher the degree of discrimination of the index column, the better the effect of the index. For example, using a column with a low degree of discrimination, such as gender, as an index, the effect will be very poor, because the cardinality of the column is at most three, and most of them are either male or female.

  • Use short indexes where possible. When indexing long strings, specify a prefix length: a smaller index needs less disk I/O, and the index cache can hold more key values per block, which makes queries faster.

  • Use the leftmost prefix as much as possible.

  • Do not over-index. Each index takes extra physical space and also takes time to maintain, so more indexes are not always better.

How to optimize the index?

The key to optimizing indexes is to follow the design principles and usage scenarios above, and to rework any index that does not follow them.

Beyond those principles and scenarios, the following points are also worth considering.

  • In a query, the indexed column must not be part of an expression or an argument of a function, otherwise the index cannot be used. E.g. select * from table_name where a + 1 = 2

  • In a composite index, put the most selective column first

  • Use select * sparingly
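
The three points above can be sketched on a hypothetical table t with an index on column a. The plans that EXPLAIN reports depend on the data and the MySQL version, so the comments describe what is typically expected rather than guaranteed output.

```sql
CREATE TABLE t (
    id INT PRIMARY KEY,
    a  INT,
    b  INT,
    c  VARCHAR(20),
    KEY idx_a (a)
);

-- Expression on the column: idx_a cannot be used for the lookup.
EXPLAIN SELECT * FROM t WHERE a + 1 = 2;

-- Same predicate with the arithmetic moved to the constant side: idx_a is usable.
EXPLAIN SELECT * FROM t WHERE a = 2 - 1;

-- Compare the selectivity of candidate columns before ordering a composite index;
-- the column with the value closer to 1 is the more selective one.
SELECT COUNT(DISTINCT a) / COUNT(*) AS selectivity_a,
       COUNT(DISTINCT b) / COUNT(*) AS selectivity_b
FROM t;

-- Selecting only the needed columns lets the index answer the query by itself.
EXPLAIN SELECT a FROM t WHERE a = 1;
```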

The usage scenarios of indexes, the index design principles, and how to optimize indexes can be treated as one question.

How to create/drop indexes?

Create an index:

  • Using the CREATE INDEX statement

    CREATE INDEX index_name ON table_name (column_list);

  • Defined in CREATE TABLE

    CREATE TABLE user(
        id INT PRIMARY KEY,
        information TEXT,
        FULLTEXT KEY (information)
    );

  • Using ALTER TABLE

    ALTER TABLE table_name ADD INDEX index_name (column_list);

Drop index:

  • Drop the primary key index

    ALTER TABLE table_name DROP PRIMARY KEY;

  • Drop other indexes

    ALTER TABLE table_name DROP KEY index_name;
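
Two related statements are often useful alongside the ones above; table_name and index_name are the same placeholders used in the text.

```sql
-- Inspect the indexes that currently exist on a table before dropping one.
SHOW INDEX FROM table_name;

-- Equivalent standalone form for dropping a non-primary index.
DROP INDEX index_name ON table_name;
```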

Will performance definitely improve when querying with an index?

Not necessarily. As discussed under the usage scenarios and design principles, creating and maintaining an index costs space and time. If an index is used unreasonably, query performance can get worse instead.

What is a prefix index?

A prefix index indexes only the first few characters of a text or string column, so the index is shorter and lookups are faster.

Usage scenario: when the prefix itself is highly selective.

How to build a prefix index:

ALTER TABLE table_name ADD KEY(column_name(prefix_length));

The hard part is choosing prefix_length, the number of leading characters to index. A common approach is to first compute the selectivity of the whole column:

SELECT COUNT(DISTINCT column_name) / COUNT(*) FROM table_name;

Then compute the selectivity for a given prefix length and compare it with the selectivity of the whole column:

SELECT COUNT(DISTINCT LEFT(column_name, prefix_length)) / COUNT(*) FROM table_name;

Keep adjusting prefix_length until the prefix selectivity is close to the selectivity of the whole column.
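
Instead of rerunning the query once per length, several candidate lengths can be compared in one pass. This is a sketch on a hypothetical table customer with a string column email; the lengths 4, 8, and 12 and the final choice of 8 are arbitrary examples.

```sql
SELECT
    COUNT(DISTINCT email) / COUNT(*)            AS full_selectivity,
    COUNT(DISTINCT LEFT(email, 4)) / COUNT(*)   AS selectivity_4,
    COUNT(DISTINCT LEFT(email, 8)) / COUNT(*)   AS selectivity_8,
    COUNT(DISTINCT LEFT(email, 12)) / COUNT(*)  AS selectivity_12
FROM customer;

-- Pick the shortest length whose selectivity is close to full_selectivity,
-- for example 8, and build the prefix index with it.
ALTER TABLE customer ADD KEY idx_email_prefix (email(8));
```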

What is the leftmost matching principle?

The leftmost matching principle: matching starts from the leftmost index column and continues column by column; it stops after the first column that is matched by a range condition (<, >, between, like).

For example, given an index on (a, b, c), consider whether the index is used in each of the following cases.

  • Case 1

    select * from table_name where a = 1 and b = 2 and c = 3
    select * from table_name where b = 2 and a = 1 and c = 3

    Both queries use the index for all three columns. The order of the conditions after where does not matter, because the MySQL optimizer reorders them automatically.

  • Case 2

    select * from table_name where a = 1
    select * from table_name where a = 1 and b = 2
    select * from table_name where a = 1 and b = 2 and c = 3

    All three queries use the index, because each of them matches from the leftmost column.

  • Case 3

    select * from table_name where b = 1
    select * from table_name where b = 1 and c = 2

    Neither query uses the index, because matching does not start from the leftmost column.

  • Case 4

    select * from table_name where a = 1 and c = 2

    This query uses the index only for column a and not for column c, because column b is skipped in the middle, so the match is not continuous from the left.

  • Case 5

    select * from table_name where a = 1 and b < 3 and c < 1

    Only columns a and b use the index; column c does not, because matching stops after the range condition on b.

  • Case 6

    select * from table_name where a like 'ab%'
    select * from table_name where a like '%ab'
    select * from table_name where a like '%ab%'

    For a string column, only a prefix match ('ab%') can use the index; suffix and infix matches ('%ab', '%ab%') require a full table scan.
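
These cases can be checked with EXPLAIN. The sketch below assumes a hypothetical definition of table_name with INT NOT NULL columns, so each index part contributes 4 bytes to key_len; values of 4, 8, and 12 would then indicate one, two, or three parts in use. The exact plan depends on the data and the MySQL version.

```sql
CREATE TABLE table_name (
    id INT PRIMARY KEY,
    a  INT NOT NULL,
    b  INT NOT NULL,
    c  INT NOT NULL,
    d  VARCHAR(20),
    KEY idx_a_b_c (a, b, c)
);

-- Cases 1 and 2: all three index parts are expected to be used.
EXPLAIN SELECT * FROM table_name WHERE b = 2 AND a = 1 AND c = 3;

-- Case 3: no leftmost column, so idx_a_b_c is not used for the lookup.
EXPLAIN SELECT * FROM table_name WHERE b = 1 AND c = 2;

-- Case 4: b is skipped, so only the part for a is used.
EXPLAIN SELECT * FROM table_name WHERE a = 1 AND c = 2;

-- Case 5: matching stops after the range on b, so the parts for a and b are used.
EXPLAIN SELECT * FROM table_name WHERE a = 1 AND b < 3 AND c < 1;
```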

When does an index fail?

The cases above that break the leftmost matching principle make the index unusable. The following situations can also prevent an index from being used.

  • An or in the condition when one of the or-ed columns has no usable index, e.g. select * from table_name where a = 1 or b = 3

  • Doing a calculation on an indexed column, e.g. select * from table_name where a + 1 = 2

  • An implicit type conversion on the indexed column. For example, strings must be quoted: if a is a string column, select * from table_name where a = '1' uses the index, but select * from table_name where a = 1 does not.

  • Applying a function to an indexed column, e.g. select * from table_name where abs(a) = 1

  • A like pattern that starts with %

  • A != or <> comparison on the indexed column, for example select * from table_name where a != 1

  • An is null / is not null test on the indexed column, depending on the version and the data distribution, for example select * from table_name where a is null
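
Some of these cases have straightforward rewrites. The following sketch uses a hypothetical table account in which a is a string column with its own index; whether the optimizer actually picks an index still depends on the data.

```sql
CREATE TABLE account (
    id INT PRIMARY KEY,
    a  VARCHAR(20),
    b  INT,
    KEY idx_a (a)
);

-- Implicit conversion: comparing the string column with a number disables idx_a.
EXPLAIN SELECT * FROM account WHERE a = 1;
-- Quoting the literal keeps the comparison on strings, so idx_a is usable.
EXPLAIN SELECT * FROM account WHERE a = '1';

-- OR across an indexed and an unindexed column forces a scan.
EXPLAIN SELECT * FROM account WHERE a = '1' OR b = 3;

-- Give b its own index, then either keep the OR or split it into a UNION
-- so that each branch can use its own index.
ALTER TABLE account ADD KEY idx_b (b);
EXPLAIN SELECT * FROM account WHERE a = '1' OR b = 3;
EXPLAIN SELECT * FROM account WHERE a = '1'
        UNION
        SELECT * FROM account WHERE b = 3;

-- Leading wildcard versus prefix pattern.
EXPLAIN SELECT * FROM account WHERE a LIKE '%ab';
EXPLAIN SELECT * FROM account WHERE a LIKE 'ab%';
```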

Database transactions***

What is a database transaction?




 


Origin blog.csdn.net/weixin_70730532/article/details/125745202