MySQL
Fundamentals of Indexing
Indexes are used to find specific records quickly: they turn unordered data into ordered data that can be searched efficiently
Sort the values of the indexed column
Build an inverted list (posting list) from the sorted values
Attach to each entry of the inverted list the chain of addresses of the rows that hold that value
When querying, first find the entry in the inverted list, then follow its address chain, and finally fetch the data
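The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows, their addresses, and the helper names build_index and lookup are hypothetical, and a real storage engine uses a tree of pages rather than two parallel lists.

```python
from bisect import bisect_left

# Hypothetical table: row address -> row. The addresses are made up.
rows = {
    100: {"name": "carol", "city": "Paris"},
    101: {"name": "alice", "city": "Rome"},
    102: {"name": "bob", "city": "Paris"},
}

def build_index(rows, column):
    """Sort the column values and attach to each value its chain of row addresses."""
    postings = {}
    for address, row in rows.items():
        postings.setdefault(row[column], []).append(address)
    keys = sorted(postings)                     # ordered key list
    return keys, [postings[k] for k in keys]    # address chains, parallel to keys

def lookup(index, rows, value):
    """Binary-search the key, follow its address chain, then fetch the rows."""
    keys, chains = index
    pos = bisect_left(keys, value)
    if pos == len(keys) or keys[pos] != value:
        return []
    return [rows[address] for address in chains[pos]]

city_index = build_index(rows, "city")
paris_rows = lookup(city_index, rows, "Paris")
```

Here paris_rows holds the rows stored at addresses 100 and 102, found without scanning the whole table.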
The difference between clustered and non-clustered indexes
Clustered index: the data and the index are stored together, so finding the index entry finds the data
Non-clustered index: the data and the index are stored separately; the leaf nodes of the index structure point to the location of the data, and the data is found through that location
Differences:
A query on the clustered index obtains the data directly, while a non-clustered index requires a second lookup
Clustered indexes suit range queries because the rows are stored in key order; a non-clustered index can help with sorting on its own columns
The structure of MySQL indexes and their respective advantages and disadvantages
The data structure of an index depends on the specific storage engine. Hash indexes and B+ tree indexes are the most commonly used;
Hash index: a hash algorithm converts the key value into a hash code. For equality queries the hash index has a clear advantage, provided the key values are unique. If they are not unique, the lookup must first find the position of the key and then scan the linked list at that position to find the matching values. Hash indexes are not suitable for range queries
B+ tree index: lookup efficiency is very even. Every search travels from the root node to a leaf node, so the cost is basically the same for every key and does not fluctuate much. When scanning along the index, the bidirectional pointers between leaf nodes also allow moving left and right quickly, which makes range scans very efficient
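The contrast between the two structures can be sketched as follows. This is a minimal illustration, not how MySQL implements either index: a Python dict stands in for the hash index, a sorted list stands in for the linked leaf level of a B+ tree, and the data and the names hash_equal and ordered_range are hypothetical.

```python
from bisect import bisect_left, bisect_right

ages = {"ann": 31, "ben": 25, "cat": 42, "dan": 25}   # hypothetical rows: name -> age

# Hash-style index: age -> bucket of names that share that key.
hash_index = {}
for name, age in ages.items():
    hash_index.setdefault(age, []).append(name)

# Ordered index standing in for the B+ tree leaf level: sorted (age, name) pairs.
ordered_index = sorted((age, name) for name, age in ages.items())
ordered_keys = [age for age, _ in ordered_index]

def hash_equal(age):
    """Equality lookup: one hash probe, then a scan of the bucket."""
    return hash_index.get(age, [])

def ordered_range(low, high):
    """Range lookup low <= age <= high: two binary searches, then a contiguous scan."""
    start = bisect_left(ordered_keys, low)
    end = bisect_right(ordered_keys, high)
    return [name for _, name in ordered_index[start:end]]
```

A range such as ordered_range(25, 31) reads one contiguous slice of the ordered index, whereas the hash index would have to probe every possible key in the range.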
Index Design Principles
Goal: make queries faster while taking up as little space as possible
Index columns that appear in WHERE clauses or in JOIN conditions
Columns with low cardinality benefit little from an index and usually do not need one
Do not over-index. Indexes require disk space
Columns defined as foreign keys must be indexed
Frequently updated columns are not suitable for indexing
Do not create indexes on columns that are rarely queried and contain many repeated values
Columns defined with data types such as text, image, and bit should not be indexed
Basic characteristics and isolation levels of transactions
Basic characteristics (ACID):
Atomicity: a transaction is the smallest unit of execution and cannot be split. Its actions are either all completed or not performed at all
Consistency: the data remains consistent before and after the transaction executes, and multiple transactions reading the same data get the same result
Isolation: modifications made by a transaction are not visible to other transactions until the transaction is committed
Durability: after a transaction is committed, its changes are stored permanently in the database
Isolation levels:
Read uncommitted: a transaction may read uncommitted data of other transactions, which is called a dirty read
Read committed (the Oracle default): only committed data is read, but two reads within one transaction may return different results, which is called a non-repeatable read
Repeatable read (the MySQL default): the data read each time is consistent, but phantom reads may occur
Serializable: generally not used; every row read is locked, which causes many timeouts and heavy lock contention
How to split MySQL into multiple databases and tables (sharding)? How much data makes splitting necessary? What are the sharding methods and strategies? What is the execution sequence of SQL after sharding?
What splitting databases and tables means: when the amount of data is too large, query speed drops. To improve efficiency, the data of one table is distributed across multiple tables in multiple databases.
Commonly used sharding tools: MyCat, ShardingSphere
Data sharding methods:
Vertical sharding: split different tables into different databases along business lines. This solves the problem of overly large database data files, but it does not fundamentally solve the query problem.
Horizontal sharding: split the data of one table into different databases or tables. This fundamentally solves the low query efficiency caused by excessive data volume.
Sharding strategies:
Modulo (remainder): stores data evenly, but scaling out is very troublesome
By range: scaling out is easier, but the data distribution is not uniform enough
By time: makes it easier to separate hot data
By enumeration value: for example, sharding by region
By prefix of a target field: custom sharding according to business rules
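The trade-off between the first two strategies can be made concrete with a small sketch. The function names, the ids 1 to 12, and the shard counts are hypothetical choices for illustration only; real sharding middleware has its own configuration.

```python
def shard_by_modulo(user_id, shard_count):
    """Remainder strategy: spreads consecutive ids evenly over all shards."""
    return user_id % shard_count

def shard_by_range(user_id, rows_per_shard):
    """Range strategy: each shard owns one contiguous block of ids."""
    return user_id // rows_per_shard

ids = range(1, 13)

# Scaling the modulo scheme from 4 to 5 shards changes the target of most ids.
before = {i: shard_by_modulo(i, 4) for i in ids}
after = {i: shard_by_modulo(i, 5) for i in ids}
moved = [i for i in ids if before[i] != after[i]]

# With ranges, existing ids keep their shard; new ids simply land on a new shard.
range_placement = {i: shard_by_range(i, 5) for i in ids}
```

In this example 9 of the 12 ids change shard when the modulo scheme grows from 4 to 5 shards, which is why expansion is troublesome. The range scheme moves nothing, but all new (and usually hottest) ids end up on the last shard, which is why its distribution is uneven.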
In theory, horizontal sharding breaks through the data volume bottleneck of a single machine and can be expanded freely. It is the standard solution for splitting databases and tables
The Alibaba development manual suggests sharding when a table exceeds 5 million rows or its data file reaches 2 GB (before the business launches, estimate the business volume for the next 3 years)
Execution process after sharding (ShardingSphere):
SQL parsing -> query optimization -> SQL routing -> SQL rewriting -> SQL execution -> result merging
Problems introduced by sharding:
Cross-database queries, cross-database sorting, distributed transactions, public tables, duplicate primary keys...
MySQL's master-slave synchronization principle
Master-slave synchronization: when data in the master database changes, the changes are propagated to the slave database in near real time (replication is asynchronous by default);
Benefits of master-slave synchronization:
The ability to scale the database horizontally
Fault tolerance and high availability
Data backup
Implementation: on the master, every change is written as an event to a special log file (the binary log); the slave reads these events and applies the corresponding changes to its own data
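The log-and-replay idea can be sketched with two small classes. This is a toy model under stated assumptions: the class names, the in-memory list used as the log, and the position counter are hypothetical, and the real mechanism (separate threads, a relay log on the slave, network transfer) is left out.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.binlog = []              # append-only list of change events

    def set(self, key, value):
        self.data[key] = value
        self.binlog.append(("set", key, value))

    def delete(self, key):
        self.data.pop(key, None)
        self.binlog.append(("delete", key, None))

class Slave:
    def __init__(self):
        self.data = {}
        self.position = 0             # index of the next event to read

    def sync(self, master):
        """Read the events written since the last sync and replay them locally."""
        for op, key, value in master.binlog[self.position:]:
            if op == "set":
                self.data[key] = value
            else:
                self.data.pop(key, None)
        self.position = len(master.binlog)

master, slave = Master(), Slave()
master.set("a", 1)
master.set("b", 2)
slave.sync(master)                    # slave now mirrors the master
master.delete("a")                    # not visible on the slave until the next sync
```

Because the slave only remembers a position in the log, it can fall behind and catch up later, which is also why a slave may briefly serve stale data.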
The difference between MyISAM and InnoDB
The InnoDB index is a clustered index; the MyISAM index is a non-clustered index
The leaf nodes of InnoDB's primary key index store the row data, so the primary key index is very efficient
The leaf nodes of a MyISAM index store the address of the row data, so a second access is needed to get the data
The leaf nodes of an InnoDB non-primary-key index store the indexed columns and the primary key, so queries that can be answered by a covering index are very efficient
Index types in MySQL and their impact on database performance
Primary key index: a special unique index; a table can have only one
Unique index: guarantees the uniqueness of the data
Ordinary index: allows the indexed column to contain duplicate values
Indexes can greatly improve query speed and system performance; however, inserts, deletes, and updates become slower, and each index takes up physical space.
What each field in the EXPLAIN output represents
id: each SELECT in the statement is assigned a unique id; some subqueries are optimized into joins, in which case the ids are the same
select_type: the query type corresponding to the SELECT keyword
table: table name
partitions: matching partition information
type: the access method used for a single table (full table scan, index)
possible_keys: indexes that may be used
key: the index actually used
key_len: the length of the index actually used
ref: when an index is used for the lookup, the columns or constants that are matched against the index column
rows: the estimated number of records to read
filtered: the estimated percentage of records that remain after the table's conditions are applied
Extra: additional information
What is a covering index
When a SQL statement is executed and all the columns it needs are contained in the B+ tree of the index being used, there is no need to go back to the table to look them up, and the result is returned directly from the index.
What is the leftmost prefix principle
Leftmost first: a composite index is matched starting from its leftmost column, so when creating the index, put the most frequently queried column on the left
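Why the leftmost column matters can be seen by treating a composite index as a sorted list of tuples. The sketch below is an illustration under assumptions: the index on (last_name, first_name, age), its entries, and the helper names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical composite index on (last_name, first_name, age), kept sorted.
index = sorted([
    ("Lee", "Ann", 30),
    ("Lee", "Bob", 25),
    ("Kim", "Cho", 41),
    ("Park", "Ann", 30),
])

def seek_prefix(index, prefix):
    """Binary-search to the first entry >= prefix, then scan while it still matches."""
    start = bisect_left(index, prefix)
    matches = []
    for entry in index[start:]:
        if entry[:len(prefix)] != prefix:
            break
        matches.append(entry)
    return matches

def scan_without_prefix(index, first_name):
    """Without the leftmost column the sort order does not help: check every entry."""
    return [entry for entry in index if entry[1] == first_name]

by_last = seek_prefix(index, ("Lee",))             # uses the sort order
by_last_first = seek_prefix(index, ("Lee", "Bob")) # uses two leading columns
by_first_only = scan_without_prefix(index, "Ann")  # has to look at everything
```

Entries with the same first_name are scattered across the list, so a condition on first_name alone cannot be answered by a seek; only conditions that start from last_name can.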
How InnoDB implements transactions
Take update as an example:
After InnoDB receives the update statement, it finds the page that holds the data according to the conditions and caches it in the Buffer Pool
Execute the update statement, modifying the data in the Buffer Pool
Generate a redo log record for the update statement and store it in the Log Buffer
Generate an undo log record for the update statement, to be used for transaction rollback
If the transaction is committed, the redo log records are persisted to disk, and other mechanisms later flush the modified data pages to disk; if the transaction is rolled back, the undo log is used to undo the changes
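The update flow above can be modelled with a toy engine. This is a heavily simplified sketch, not InnoDB's implementation: the class MiniEngine and all its attributes are hypothetical, dictionaries stand in for pages and disk, and the undo record is written just before the in-memory change because the old value is needed.

```python
class MiniEngine:
    def __init__(self, disk):
        self.disk = dict(disk)        # persisted data pages
        self.buffer_pool = {}         # cached, possibly modified pages
        self.log_buffer = []          # redo records not yet persisted
        self.redo_log = []            # persisted redo records
        self.undo_log = []            # old values, kept for rollback

    def update(self, key, value):
        if key not in self.buffer_pool:               # load the page into the cache
            self.buffer_pool[key] = self.disk.get(key)
        self.undo_log.append((key, self.buffer_pool[key]))
        self.buffer_pool[key] = value                 # modify in memory only
        self.log_buffer.append((key, value))          # redo record for this change

    def commit(self):
        self.redo_log.extend(self.log_buffer)         # persist the redo records
        self.log_buffer.clear()
        self.undo_log.clear()

    def rollback(self):
        while self.undo_log:                          # undo in reverse order
            key, old_value = self.undo_log.pop()
            self.buffer_pool[key] = old_value
        self.log_buffer.clear()

    def flush_pages(self):
        self.disk.update(self.buffer_pool)            # happens later, apart from commit

engine = MiniEngine({"x": 1})
engine.update("x", 2)
engine.rollback()                     # buffer pool is back to the old value
engine.update("x", 3)
engine.commit()                       # redo is persisted, data page is not yet
```

After the commit the change exists only in the redo log and the buffer pool; the data page on disk still holds the old value until flush_pages runs, which is the point of persisting the redo log first.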
The difference between B tree and B+ tree, and why MySQL uses the B+ tree
B tree: the nodes are sorted; a node can store multiple elements, and those elements are also sorted
B+ tree: has the characteristics of the B tree; in addition, the leaf nodes are linked by pointers, and the elements of the non-leaf nodes are stored redundantly in the leaf nodes, in sorted order
An index exists to speed up queries, and the B+ tree speeds up queries by keeping the data sorted. Because one node can store many elements, the B+ tree is shorter and wider and therefore needs fewer IO operations. A page is only 16 KB, yet a B+ tree with a depth of 3 can generally store about 20 million rows. The ordered linked list formed by the leaf nodes supports range searches and full table scans well.
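The figure of about 20 million rows can be reproduced with a back-of-the-envelope calculation. The key size, pointer size, and row size below are assumptions chosen for illustration (an 8-byte integer key, a 6-byte page pointer, and rows of about 1 KB), and page headers and fill factor are ignored.

```python
PAGE_SIZE = 16 * 1024      # 16 KB page, as stated in the text
KEY_SIZE = 8               # assumption: 8-byte integer primary key
POINTER_SIZE = 6           # assumption: 6-byte pointer to a child page
ROW_SIZE = 1024            # assumption: about 1 KB per row

fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)   # entries per non-leaf page
rows_per_leaf = PAGE_SIZE // ROW_SIZE             # rows per leaf page

def capacity(depth):
    """Rows reachable in a tree with depth - 1 non-leaf levels and one leaf level."""
    return fanout ** (depth - 1) * rows_per_leaf

rows_at_depth_3 = capacity(3)
```

Under these assumptions each non-leaf page holds 1170 entries and each leaf page holds 16 rows, so a tree of depth 3 reaches 1170 * 1170 * 16 = 21,902,400 rows with only three page reads per lookup.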
What types of locks does MySQL have
Row lock: locks one or more rows of a table. Other transactions cannot access the locked rows, while the remaining rows can be accessed normally.
Table lock: locks the entire table. Under a table read lock, other requests can only read, not write; writes must wait until the read lock is released
Deadlock: during execution, multiple processes compete for resources, end up waiting for each other, and none can continue
Optimistic lock: assumes that the data will not conflict; the conflict check happens when the update is submitted, and an error is returned if a conflict is found
Pessimistic lock: when modifying a piece of data, lock it directly so that nobody else can modify it concurrently
Shared lock: once data holds a read lock, other transactions can add further read locks but not a write lock; a write lock can be added only after all read locks are released
Exclusive lock: once a transaction holds a write lock on data, other requests cannot add any lock until that lock is released
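An optimistic lock is commonly implemented with a version number that is checked at write time. The sketch below is a minimal single-threaded illustration; the class VersionedRow and the function optimistic_update are hypothetical names, and in a database the check and the write would have to happen atomically in one statement.

```python
class VersionedRow:
    def __init__(self, value):
        self.value = value
        self.version = 0              # incremented on every successful update

def optimistic_update(row, read_version, new_value):
    """Apply the update only if nobody changed the row since it was read."""
    if row.version != read_version:
        return False                  # conflict: the caller should re-read and retry
    row.value = new_value
    row.version += 1
    return True

row = VersionedRow(100)
seen_by_a = row.version               # transaction A reads the row
seen_by_b = row.version               # transaction B reads the same version

a_succeeded = optimistic_update(row, seen_by_a, 90)   # A writes first
b_succeeded = optimistic_update(row, seen_by_b, 80)   # B's version is now stale
```

No lock is held between the read and the write; the second writer simply detects that its version is out of date and gets a failure it can handle, for example by retrying.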
How to optimize slow queries in MySQL?
Check whether the query uses an index; if not, rewrite the SQL so that it does
Check whether the optimal index is used
Check whether all the queried fields are really needed, or whether too many fields are selected and more data than necessary is fetched
Check whether the data needs to be split into multiple databases and tables
Check the database configuration and whether more resources need to be added
What indexes are there in MySQL?
Primary key index: the column values must not repeat and must not be NULL. A table can have only one
Unique index: the column values must not repeat, NULL is allowed, and a table may have multiple unique indexes
Ordinary index: the basic index type, with no uniqueness restriction; NULL values are allowed
Under what circumstances does an index fail in MySQL
Not matching the leftmost prefix (a joint index can only be used when the query starts from its leftmost column)
Incorrect fuzzy queries (only a trailing wildcard, such as LIKE 'abc%', can use the index)
Operations on the column (arithmetic is applied to the indexed column)
Functions (a function is applied to the indexed column)
Type conversion (the column is a string type but an int value is passed in, so the index is not used)
Using IS NOT NULL
Using != or <>
Using OR on the indexed column
The difference between primary key and unique index
A primary key index does not allow NULL and is unique; a unique index allows NULL, and a table may have unique indexes on multiple columns
A primary key always creates a unique index, but a column with a unique index is not necessarily the primary key
A table can have only one primary key index, but it can have multiple unique indexes
A primary key is the usual target of foreign key references from other tables, although a foreign key can also reference columns that carry a unique index
The execution order of the primary key is higher than that of the unique index
A primary key is a constraint, but a unique index is an index
What is MVCC
Multi-version concurrency control: data is read from retained versions, in a way similar to snapshots, so that reads and writes do not block each other; each transaction session sees its own specific version of the data along the version chain;
MVCC only works under the read committed and repeatable read isolation levels
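The difference between the two isolation levels under MVCC can be sketched with a version chain and a set of visible transaction ids. This is a simplified model, not InnoDB's actual read view: the class VersionChain, the transaction ids, and the use of a plain set for visibility are all assumptions made for illustration.

```python
class VersionChain:
    def __init__(self):
        self.versions = []            # oldest first: (trx_id, value)

    def write(self, trx_id, value):
        self.versions.append((trx_id, value))

    def read(self, visible_trx_ids):
        """Return the newest value written by a transaction the reader can see."""
        for trx_id, value in reversed(self.versions):
            if trx_id in visible_trx_ids:
                return value
        return None

committed = set()
row = VersionChain()
row.write(1, "v1")
committed.add(1)                      # transaction 1 commits

snapshot = frozenset(committed)       # repeatable read: snapshot taken at first read

row.write(2, "v2")
committed.add(2)                      # transaction 2 commits afterwards

repeatable_read_value = row.read(snapshot)              # keeps using the old snapshot
read_committed_value = row.read(frozenset(committed))   # fresh snapshot on each read
```

The reader under repeatable read keeps seeing "v1" because its snapshot predates transaction 2, while the reader under read committed takes a new snapshot and sees "v2". Neither reader takes a lock, and the writer is never blocked by them.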