Important MySQL knowledge points and common questions

Command to view the indexes on a table:

     show index from mytable;

The principle of indexing

Indexes are used to quickly find records with specific values. If there is no index, generally speaking, the entire table is traversed when executing a query.

The principle of indexing is to turn unordered data into ordered data that can be queried efficiently:

  1. Sort the contents of the indexed column

  2. Generate an inverted list of the sorted results

  3. Attach data address chains to the entries of the inverted list

  4. When querying, first get the content of the inverted list, and then take out the data address chain to get the specific data

Advantages and disadvantages of indexing

Advantages:    1. It speeds up data retrieval and reduces database I/O cost; this is the main reason for creating indexes, since queries no longer need a full table scan

           2. By creating a unique index, the uniqueness of each row of data in the database can also be guaranteed

           3. Optimize sorting, aggregation, and grouping operations. An index provides pre-sorting and pre-grouping, which avoids a large number of comparison and computation operations when sorting or grouping. Specifically, an index orders data by the values of one or more columns, so the database can return data directly in the index order without extra comparisons or computations, reducing CPU consumption.

           4. Improve concurrency performance. Indexes can improve the concurrency performance of the database. It can reduce the time spent on locking tables, rows, pages and other resources

           5. It can speed up the connection between tables

Disadvantages:   1. It takes time to create and maintain indexes, which is more obvious when the amount of data is large.

           2. The index needs to occupy storage space, because it needs to store the value of the index column and the pointer to the data row

           3. When the data in the table is added, deleted, or changed, the index must also be dynamically maintained, which increases the CPU consumption

Add an index to the field

When to recommend creating an index

1. Columns that are frequently used as query conditions. If a column is often used as a query condition, creating an index on this column can improve query efficiency. For example, in a user table, if user information is often queried based on the user ID, creating an index on the ID column can improve query efficiency.

2. Join columns. If table joins must be performed frequently, an index on the join column can speed up the join. For example, in an order table, if you often need to join to the user table to query user information, creating an index on the user ID column improves join efficiency.

3. Columns that are often used for sorting. If a column is often used as a sorting condition, creating an index on the column can improve the efficiency of sorting. For example, in a product table, if you often need to sort by price, creating an index on the price column can improve the efficiency of sorting.

4. Infrequently accessed columns in large tables. An index is a data structure used to speed up queries. Without an index, a query on a large table must sequentially scan the entire table to find matching rows, which is very slow when the table is large. An index can greatly improve query speed, because it reduces the search from linear time to logarithmic time, so indexing such columns in large tables helps the database find matching rows more quickly. Note, however, that indexes also slow down write operations, because every insert, update, and delete must update the index as well. When deciding whether to index a column, weigh query speed against write speed and decide based on the specific situation.
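As a minimal sketch of the scenarios above, assuming hypothetical tables `users`, `orders`, and `products`:

```sql
-- 1. Column frequently used as a query condition:
CREATE INDEX idx_users_id ON users (id);

-- 2. Join column: orders are often joined to users on user_id:
CREATE INDEX idx_orders_user_id ON orders (user_id);

-- 3. Column frequently used for sorting:
CREATE INDEX idx_products_price ON products (price);
```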

Situations where indexing is not recommended

1. Do not create indexes on fields that are rarely referenced in queries, or on fields with many duplicate values.

2. For a table with a very small amount of data, an index improves data access very little, so it is not necessary to create one.

3. Do not create too many indexes on a base table. When rows are inserted, deleted, or updated, every index must change accordingly. Indexes occupy file directories and storage space and must be maintained; too many of them increase the burden on the system.

When is it better to use a composite index?

1. Composite index

       A composite index is an index created on multiple columns. Unlike a single-column index, a composite index contains several columns at once and can be searched and sorted according to the order of those columns. When a composite index is used, the database sorts first by the first column, then by the second column, and so on. This ordering can speed up queries, especially when conditions on several columns must be filtered at the same time. Note that more composite indexes are not always better: typically you only need indexes on columns that are frequently used in queries. When selecting the columns of a composite index, consider query frequency, the number of filter conditions, and column order. An unreasonable combination of indexes can degrade performance and even slow queries down, so weigh the trade-offs and choose according to the actual situation.

A composite index is also called a combined index: one index that contains multiple columns. For the same set of columns, a composite index costs less than several single-column indexes, because it reduces the number of indexes and therefore the maintenance and storage overhead.

2. The leftmost prefix principle of composite indexes

     When using the columns of a composite index as query conditions, the leftmost column must appear in the condition, otherwise the index does not take effect. A composite index is sorted by the order of its columns, and a query can only match conditions in that column order. If the query conditions do not include the leftmost prefix columns, the index cannot be used and the query degrades to a full table scan. Therefore, when designing a composite index, choose the column order according to the actual workload so that the most common queries can use the index. At the same time, avoid creating too many composite indexes, since that can reduce write performance and increase index maintenance cost.
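A minimal sketch of the leftmost prefix rule, assuming a hypothetical table t1 with a composite index on (a, b, c):

```sql
CREATE INDEX idx_abc ON t1 (a, b, c);

-- These can use the index (the leftmost prefix is present):
SELECT * FROM t1 WHERE a = 1;
SELECT * FROM t1 WHERE a = 1 AND b = 2;
SELECT * FROM t1 WHERE a = 1 AND b = 2 AND c = 3;

-- These cannot use the index (the leftmost column a is missing):
SELECT * FROM t1 WHERE b = 2;
SELECT * FROM t1 WHERE b = 2 AND c = 3;
```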

Under what circumstances will the index fail

1. The principle of the leftmost prefix of the composite index: when using the column of the composite index as a condition, the leftmost column must appear as the condition, otherwise the index will not take effect

2. The condition contains OR: even if the columns in the condition are indexed, the index may not take effect

      This is because the OR operator breaks the query condition into sub-conditions, each of which may require a different index. The index can only be used if every sub-condition can use one. If one of the sub-conditions has no index, or the indexes of the sub-conditions cannot be used together, the index is invalidated.

Avoiding index failure: consider merging multiple single-column indexes into one composite index, so the index can be used when the query condition involves several columns. Alternatively, convert the OR into a UNION that splits the query into multiple subqueries, each containing only one condition, thereby avoiding index invalidation.
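The OR-to-UNION rewrite described above can be sketched like this (t1, a, and b are hypothetical names, each column with its own single-column index):

```sql
-- Original query: the OR may prevent either index from being used.
SELECT * FROM t1 WHERE a = 1 OR b = 2;

-- Rewritten: each branch contains a single condition and can use its
-- own index; UNION also removes rows that match both conditions.
SELECT * FROM t1 WHERE a = 1
UNION
SELECT * FROM t1 WHERE b = 2;
```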

3. LIKE queries with a leading wildcard, such as '%Zhangsan' or '%Zhangsan%'

Avoiding index invalidation: consider changing the LIKE pattern so that the wildcard appears only at the end, or replace LIKE with an equality (=) comparison, so the index can be used. Other techniques, such as a full-text search engine, can also be used to optimize fuzzy queries.
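A sketch of these alternatives, assuming a hypothetical users table with a name column:

```sql
-- Leading wildcard: a B-tree index on name cannot be used.
SELECT * FROM users WHERE name LIKE '%Zhangsan';

-- Trailing wildcard only: the index can be used (prefix match).
SELECT * FROM users WHERE name LIKE 'Zhangsan%';

-- For genuine fuzzy search, a FULLTEXT index is an alternative:
ALTER TABLE users ADD FULLTEXT INDEX ft_name (name);
SELECT * FROM users WHERE MATCH(name) AGAINST ('Zhangsan');
```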

Full-text search engine: The principle of a full-text search engine is to segment each word in the text content and index the word segmentation results. When a user enters a query, the search engine tokenizes the query and looks for matching words in the index. The index used by the full-text search engine is usually an inverted index, that is, the position where each word appears is mapped to the word, so that the position containing the word in the text content can be quickly located.

4. For a string-type field, the index becomes invalid when an int parameter is passed in; for an int-type field, however, passing a string does not invalidate the index. MySQL converts values according to its own implicit type conversion rules.
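A sketch of the implicit conversion cases, assuming a hypothetical users table where phone is an indexed VARCHAR column and id is an indexed INT column:

```sql
-- phone is VARCHAR: passing a number forces MySQL to convert every
-- stored string to a number, so the index on phone is NOT used.
SELECT * FROM users WHERE phone = 13800000000;

-- Passing a string keeps the index usable:
SELECT * FROM users WHERE phone = '13800000000';

-- id is INT: the string constant is converted once,
-- so the index on id is still used.
SELECT * FROM users WHERE id = '42';
```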

Why does fuzzy matching fail

        This is because a LIKE pattern that starts with a wildcard cannot be optimized with an index. Data in the index is sorted by the index column, and a leading wildcard means the sorted order of the index cannot be exploited.

Indexing is essentially sorting, i.e. putting the values in order.

like 'Zhang San%' looks for 'Zhang San XXX': it returns everything starting with 'Zhang San', and this range is contiguous in the index, so no full table scan is needed. like '%Zhang San' looks for 'XXX Zhang San': these rows are not contiguous in the index, so the only way to return the required results is to scan the whole table.

What are the benefits of a joint index?

1. Reduce the overhead of indexing

           If a composite index on (a, b, c) is built, it is effectively equivalent to having three indexes: (a), (a, b), and (a, b, c). Since every additional index increases write overhead and disk space, this saving is not small for a table with a large amount of data!

2. Index coverage

           Index coverage means that a query can be done using only the index without accessing the actual data rows in the table. When the queried columns are included in the index, the query can use index coverage to avoid accessing the actual data rows in the table, thereby improving query performance.

The advantage of using index coverage is that you can avoid querying a large number of data rows, thereby saving I/O operations and CPU time. Also, since indexes are typically smaller than data rows, queries can also read index data from disk more quickly.

It should be noted that when choosing whether to use index coverage, it needs to be weighed according to the actual situation. If the query needs to access most of the data rows in the table or needs to return multiple columns of data in the table, then using index covering may not be the optimal choice, because it will increase the number of scans of the index and the number of I/O operations.
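A sketch of index coverage, assuming a hypothetical orders table:

```sql
CREATE INDEX idx_user_status ON orders (user_id, status);

-- Covered: every selected and filtered column is in the index, so
-- EXPLAIN shows "Using index" and no table rows need to be read.
SELECT user_id, status FROM orders WHERE user_id = 1;

-- Not covered: amount is not in the index, so each matching index
-- entry requires a lookup back to the table row.
SELECT user_id, amount FROM orders WHERE user_id = 1;
```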

3. Reduce the number of scanned rows

(1) When using a joint index for query, if the clustered index column is included in the joint index, the query can directly obtain the required data from the clustered index without scanning the data rows, thereby reducing I/O operations and CPU time.

For example, suppose there is a table with two columns A and B, a single-column index is created on each of them, and column A is the clustered index column. If the query only needs to return the value of column A, it can be written as:

SELECT A FROM table WHERE B = 'value'

In this query, although the condition only involves column B, column A is the clustered index column, so the query can obtain the required data directly from the index without scanning the data rows, thereby reducing the number of scanned rows.

(2) A non-clustered index stores the index key values separately from the data rows, with pointers to the rows. Since the index and the data rows live in different pages, queries through a non-clustered index are relatively less efficient. When a joint index contains only non-clustered index columns, the query must find the pointer to the data row through the index key value and then access the actual row, which adds I/O operations and CPU time.

For example, suppose there is a table with two columns A and B, a single-column index is created on each of them, and column B is a non-clustered index column. If the query only needs to return the value of column A, it can be written as:

SELECT A FROM table WHERE B = 'value'

In this query, the condition is on column B, which is a non-clustered index column, so the query must find the pointer to the data row through the index key value and then access the actual data row, which increases the number of scanned rows.

Note that the efficiency of a joint index depends on the specific query and the index type. If the joint index includes the clustered index columns, the query can obtain the required data directly from the index, reducing the number of scanned rows; if it contains only non-clustered index columns, the query must follow the pointer from the index key to the actual data row, increasing the number of scanned rows. Therefore, when creating a joint index, weigh the trade-offs and select appropriate columns so the index can play to its strengths.

Scenarios for joint index use

         Suppose an order management system has an order table containing the following columns: order number, user ID, order status, order time, product name, product quantity, product price, and so on. If we need to query all completed orders of a user by user ID and order status, we can create a joint index on user ID and order status.

When we issue this query, the system looks up all orders matching the user ID and order status in the joint index. Since a joint index is composed of multiple columns and can be searched on all of them, the system can complete the query using the joint index, reducing the number of scanned rows and improving query efficiency. Without the joint index, the system might need to scan the entire order table and filter eligible orders by user ID and order status, increasing query time and resource consumption.

Therefore, using a joint index in the order management system can improve query efficiency, speed up order processing, and improve user experience.
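The scenario above might be sketched like this (table and column names are hypothetical):

```sql
CREATE INDEX idx_user_status ON orders (user_id, order_status);

-- Finds all completed orders of one user through the joint index,
-- instead of scanning the whole orders table:
SELECT order_no, order_time
FROM orders
WHERE user_id = 10001 AND order_status = 'COMPLETED';
```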

How to optimize Mysql slow query

1. Check whether the query uses an index; if not, optimize the SQL so that it does

2. Check whether the index used is the optimal index

3. Check whether all the selected fields are necessary, and whether too many columns are queried and redundant data returned

4. If the queried table holds too much data, consider whether it should be sharded into multiple databases and tables

5. Check whether the performance configuration of the machine where the database instance is located is too low, and whether resources can be appropriately increased

EXPLAIN and SQL optimization

Several important fields:

possible_keys: represents the index that may be used

key: the index actually used

key_len: the actual length of the index used

The type column shows the access type used by the query; its possible values include the following:

From best to worst are:

system > const > eq_ref > ref > range > index > all

1. system: the table has only one row (a system table). This is a special case of const that rarely appears in practice and can be ignored.
2. const: at most one matching row in a single table (matched via the primary key or a unique index); the data can be read during the optimization phase.
3. eq_ref: unique index scan; for each index key, exactly one row in the table matches. Common with primary key or unique index scans.
4. ref: ordinary (non-unique) index lookup.
5. range: range query, typically when the WHERE clause contains BETWEEN, <, >, IN, etc.
6. index: a full scan of the index tree. Usually faster than ALL, but not by much.
7. all: full table scan, the worst case.

Using index: Indicates that the corresponding select operation uses a covering index (Covering Index)

Using where: Indicates where filtering is used
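A minimal EXPLAIN sketch (t1 and its index idx_a are hypothetical; the exact output depends on the data):

```sql
EXPLAIN SELECT * FROM t1 WHERE a = 1;
-- Columns worth inspecting in the output:
--   type          access type, e.g. ref (see the list above)
--   possible_keys candidate indexes, e.g. idx_a
--   key           index actually chosen
--   key_len       bytes of the index actually used
--   Extra         e.g. Using where / Using index
```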

How does Mysql determine whether to use an index or a full table scan

        When we query on an indexed field, sometimes we find that the index is not used and a full table scan is performed instead. This is because MySQL estimated that the full table scan would be faster than using the index, so it chose the full table scan.

For example: we add a joint index on the fields b, c, d, and then run the SQL: select * from t1 where b > 1

 

For example: going through the joint index finds seven eligible records, which then require seven back-to-table lookups (because the query is select *, and the other fields are not in the current index tree). MySQL therefore decides that walking the index and then performing many back-to-table lookups is slower than a direct full table scan, so even though an index exists, the full table is scanned.

Summary: MySQL calculates the cost of using the index versus the cost of a full table scan; if using the index would cause too many back-to-table lookups, it chooses the full table scan.

The difference between relational and non-relational databases, and usage scenarios

1. Relational database: A database that uses a relational model to organize data. The relational model is a two-dimensional table model. The table name of a two-dimensional table is the relationship, a row in the two-dimensional table is a record, and a column in the two-dimensional table is a field.

Advantages: easy to understand, easy to use, common sql language, easy to maintain, rich integrity (entity integrity, referential integrity and user-defined integrity).

Disadvantages: disk I/O is the bottleneck under concurrency; queries over massive data are inefficient; horizontal expansion is difficult, since performance and load capacity cannot be scaled simply by adding hardware and service nodes, and upgrading or expanding the database requires downtime maintenance and data migration; multi-table join queries and complex analytical SQL perform poorly, because ACID must be guaranteed.

2. Non-relational database: a distributed data storage system that generally does not guarantee ACID properties. Data is stored as key-value pairs, and the structure is not fixed.

Advantages: simple structure and easy expansion, high performance and flexible data model

Disadvantages: only suitable for storing relatively simple data; not suitable for complex queries, and not suitable for persistent storage of massive data. Some non-relational databases cannot persist data at all, so they need to be combined with a relational database.

For example, I use the relational database MySQL to store data, and the non-relational database Redis for caching and distributed locks; my email verification codes and news feed are stored in Redis.

LEFT JOIN, RIGHT JOIN, INNER JOIN: what is the difference?

The differences:

1. LEFT JOIN takes the table on the left as the main table: every record in the left table appears in the result, and if there is no matching row in the right table, the right-side columns are filled with NULL

2. RIGHT JOIN takes the table on the right as the main table: every record in the right table appears in the result, and if there is no matching row in the left table, the left-side columns are filled with NULL.

3. INNER JOIN returns only the rows that the left and right tables share
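A sketch with two hypothetical tables, users and orders:

```sql
-- All users; order columns are NULL for users without orders:
SELECT u.name, o.order_no
FROM users u LEFT JOIN orders o ON o.user_id = u.id;

-- All orders; user columns are NULL for orders with no matching user:
SELECT u.name, o.order_no
FROM users u RIGHT JOIN orders o ON o.user_id = u.id;

-- Only the rows that match on both sides:
SELECT u.name, o.order_no
FROM users u INNER JOIN orders o ON o.user_id = u.id;
```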

What is a small table driving a large table?

Driving large datasets with small datasets

1. When using left join, the left table is the driving table, and the right table is the driven table;

2. When using right join, the right table is the driving table, and the left table is the driven table;

3. When using inner join, mysql will select a table with a relatively small amount of data as the driving table, and a large table as the driven table;

For example: there are two tables A and B; table A has 200 rows and table B has 200,000 rows. Illustrated as nested loops:

Small table drives large table: for (200 rows) { for (200,000 rows) { ... } }

Large table drives small table: for (200,000 rows) { for (200 rows) { ... } }

Summary:

1. If the small loop is on the outside, only 200 connections to the inner table are needed; 2. if the large loop is on the outside, 200,000 connections are needed, which wastes resources and increases consumption.

To sum up: the main purpose of having a small table drive a large table is to speed up queries by reducing the number of connections created between the tables.

foreign key

Foreign key constraints can implement the following functions:

  1. Mandatory referential integrity: Foreign key constraints can enforce that data between associated tables must be consistent, thereby avoiding inconsistent data.

  2. Prevent orphaned rows: Foreign key constraints can prevent orphaned rows from appearing when data is deleted in associated tables, thereby ensuring data integrity.

  3. Automatic update of associated data: foreign key constraints can automatically update data in associated tables, thereby ensuring data consistency.

1. The foreign key can ensure the integrity and consistency of the data, and will not get orphan rows

2. You get "cascade delete, cascade update" behavior, which cleans up related tables automatically

3. The data integrity judgment is entrusted to the database, which reduces the code amount of the program

4. Foreign keys provide a very important hint as to which statistics are most important to collect in the database.
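A sketch of a foreign key with cascading behavior, using a hypothetical student/orders schema:

```sql
CREATE TABLE student (
  id   INT PRIMARY KEY,
  name VARCHAR(50)
) ENGINE=InnoDB;

CREATE TABLE orders (
  id         INT PRIMARY KEY,
  student_id INT,
  CONSTRAINT fk_orders_student FOREIGN KEY (student_id)
    REFERENCES student (id)
    ON DELETE CASCADE   -- deleting a student also deletes their orders
    ON UPDATE CASCADE   -- changing student.id propagates to orders
) ENGINE=InnoDB;
```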

Reasons not to use foreign keys

1. The database needs to maintain internal management of foreign keys

2. A foreign key effectively implements data consistency as a transaction, all done by the database server. Every CRUD operation incurs extra work, because foreign key consistency must be checked, which consumes resources; large batch updates are especially painful.

3. By enforcing relationships, foreign keys dictate the order in which rows must be added or removed. For example, if a student is referenced by an order, the order data must be removed first, and only then the student data.

4. Foreign keys are also prone to deadlocks due to the need to request internal locks on other tables.

Specifically, suppose there are two tables A and B with a foreign key constraint between them, and a record in table A is referenced by a record in table B. During concurrent operations, if two transactions want to modify related records in table A and table B respectively, the following deadlock will occur:

  • Transaction 1 first acquires the lock of table A, and then wants to acquire the lock of table B;
  • Transaction 2 first acquires the lock of table B, and then wants to acquire the lock of table A;
  • Since the two transactions get in the opposite order, they get deadlocked and cannot proceed.

To avoid this situation, the following methods can be adopted:

  1. Optimize the order and manner of transactions to avoid concurrent operations on multiple tables.

  2. Use finer-grained lock granularity to reduce lock competition and conflicts.

  3. Use the database's deadlock detection and automatic rollback mechanisms to ensure transaction execution order and consistency.

The difference between InnoDB and MyISAM

InnoDB and MyISAM are the two most commonly used storage engines in MySQL databases. They have different characteristics and applicable scenarios in terms of data storage, indexing, and transaction processing.

1. Locking

The MyISAM storage engine uses table-level locking, which is mutually exclusive for read and write operations and cannot be performed at the same time. The InnoDB storage engine uses row-level locking, which is concurrent for read and write operations and can be performed at the same time.

2. Supported transactions

The MyISAM storage engine does not support transactions, while the InnoDB storage engine supports transactions and has ACID properties, guaranteeing atomicity, consistency, isolation, and durability of data.

3. Performance

The MyISAM storage engine has good performance for a large number of query operations, especially in full-text indexing. The InnoDB storage engine has good performance for a large number of write operations and transaction processing, especially in the case of high concurrency.

4. Foreign key constraints

The MyISAM storage engine does not support foreign key constraints, while the InnoDB storage engine supports foreign key constraints, and can guarantee data integrity and consistency between associated tables.

5. Data Security

The MyISAM storage engine is prone to data corruption and data loss when it fails. The InnoDB storage engine uses mechanisms such as transaction logs and rollback logs to ensure data security and integrity.

To sum up, the MyISAM storage engine is suitable for scenarios with many read operations and low requirements for transaction processing and data security, while the InnoDB storage engine is suitable for scenarios that require transaction processing and data security guarantees, especially for concurrent writes. Scenarios with high requirements for operations and foreign key constraints.
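The engine is chosen per table; a sketch (table names are hypothetical):

```sql
-- Transactional table with foreign key support: InnoDB.
CREATE TABLE account (
  id      INT PRIMARY KEY,
  balance DECIMAL(10,2)
) ENGINE=InnoDB;

-- Read-heavy table with no transaction requirements: MyISAM.
CREATE TABLE article_archive (
  id    INT PRIMARY KEY,
  title VARCHAR(200)
) ENGINE=MyISAM;

-- Check which engine an existing table uses:
SHOW TABLE STATUS LIKE 'account';
```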

database isolation level

1. Dirty read: Transaction A reads the data updated by transaction B, and then B rolls back the operation, then the data read by A is dirty data.

2. Non-repeatable read: Transaction A reads the same data multiple times, and transaction B updates and submits the data during the multiple readings of transaction A, resulting in inconsistent results when transaction A reads the same data multiple times.

3. Phantom read: system administrator A changes the grades of all students in the database from numeric scores to ABCDE grades, but system administrator B inserts a record with a numeric score at the same time. When administrator A finishes the change, he finds one record that was not changed, as if he were hallucinating; this is called a phantom read.

Summary: non-repeatable reads and phantom reads are easy to confuse. A non-repeatable read concerns modification, while a phantom read concerns insertion or deletion. To solve non-repeatable reads, it is enough to lock the rows that meet the condition; to solve phantom reads, the table needs to be locked.

Note: under the InnoDB storage engine, the phantom read problem is largely solved by the introduction of MVCC.
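Isolation levels can be inspected and changed per session; a sketch:

```sql
-- Current isolation level (the variable is transaction_isolation in
-- MySQL 8.0; older versions use tx_isolation):
SELECT @@transaction_isolation;

-- Change the level for the current session:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;  -- InnoDB default
```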

What are Mysql locks?

Locks in the MySQL database include the following types:

1. Table-level lock

A table-level lock is the most basic lock, which can lock the entire table. It is mutually exclusive for read and write operations and cannot be performed at the same time. Table-level locks are suitable for batch operations on data and can effectively control concurrent access.

2. Row-level locks

Row-level lock is a more fine-grained lock, which can lock a row of data in the table, and read and write operations are concurrent and can be performed at the same time. Row-level locks are suitable for operations such as single-row modification, deletion, and insertion of data, which can improve concurrency.

3. Shared locks and exclusive locks

A shared lock is a read lock. Under a shared lock, multiple transactions can read the same row of data at the same time, but cannot write. An exclusive lock is a write lock. Under an exclusive lock, only one transaction can perform write operations, and other transactions cannot perform read and write operations.
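A sketch of acquiring the two lock types explicitly inside a transaction (account is a hypothetical table):

```sql
BEGIN;

-- Shared (read) lock: other transactions may also read-lock the row,
-- but cannot modify it. (MySQL 8.0 also accepts FOR SHARE.)
SELECT * FROM account WHERE id = 1 LOCK IN SHARE MODE;

-- Exclusive (write) lock: other transactions can neither read-lock
-- nor modify the row until this transaction ends.
SELECT * FROM account WHERE id = 1 FOR UPDATE;

COMMIT;
```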

4. Gap lock

A gap lock is a special lock that locks a gap in an index and prevents other transactions from inserting data in the gap. Gap locks are suitable for scenarios where multiple transactions insert data at the same time, and can avoid duplication of data.

5. Intention lock

Intention lock is a kind of auxiliary lock, which can inform other transactions what operation (read or write) will be performed on the data, so as to avoid lock conflicts. When the intent lock is applied to the locked table, it can improve the efficiency and performance of the lock.

It should be noted that different locks have different application scenarios and usage methods. The appropriate lock type should be selected according to the actual situation, and the database table structure and indexes should be reasonably designed to improve the concurrency and scalability of the database.

MVCC stands for multi-version concurrency control. In the database it is used to resolve read-write conflicts. Its implementation relies mainly on three things: the implicit fields in each record, the undo log, and the Read View. Let's look at each of these three concepts.

MVCC implementation principle

Read View: simply put, the Read View is the read view produced when a transaction performs a snapshot read. At the moment the snapshot read executes, a snapshot of the database system is taken, recording the IDs of the transactions currently active in the system (each transaction is assigned an ID when it starts, and IDs increase monotonically, so the newest transaction has the largest ID).

undo log: the chain of old record versions, which actually lives in the rollback segment.

Implicit fields: besides user-defined columns, each row also carries fields implicitly defined by the database: the ID of the transaction that last modified (updated/inserted) the row, a rollback pointer to the previous version of this record, and an implicit auto-increment ID (hidden primary key).

The difference in when the Read View is generated is what causes snapshot reads to differ between the Read Committed and Repeatable Read levels. In short, under Read Committed each snapshot read generates and uses a fresh Read View, while under Repeatable Read only the first snapshot read in a transaction creates a Read View, and all subsequent snapshot reads reuse that same Read View.

Walk through how MySQL processes a SELECT query

MySQL is divided into a server layer and a storage engine layer. The server layer includes connectors, analyzers, optimizers, and executors.

Next, introduce the functions of each part with the execution process of a sql query statement. The client executes a sql:

1. Connector: connect to database, authentication, rights management

2. Analyzer: before execution, MySQL needs to know what you want to do. It first performs lexical analysis to identify keywords, then grammatical analysis to check whether the SQL statement contains syntax errors.

3. Optimizer: Through the analyzer, we know what the SQL needs to do, but obtaining the result directly according to the SQL may consume a lot of performance, so it needs to be optimized by the optimizer. Generate execution plans, select indexes and other operations, and select the optimal execution plan

4. Executor: opens the table and calls the storage engine interface, judging row by row whether the query conditions are satisfied, putting matching rows into the result set, and finally returning it to the client; if an index is used, rows are filtered according to the index.

The redo log and undo log

Role of the redo log: to guarantee transaction durability. It protects against dirty pages that had not yet been written to disk at the time of a failure: when the MySQL service restarts, it redoes work according to the redo log, thereby achieving transaction durability.

content:

The redo log is a physical-format log: it records modifications to physical data pages and is written sequentially into the redo log files. When: the redo log is generated after the transaction starts; it is not written only at commit time, but is written into the redo log file gradually during the execution of the transaction.

Undo log function: If the transaction fails or rolls back for some reason, you can use this undo to roll back.

Content: conceptually, when a record is deleted, a corresponding insert record is written to the undo log, and vice versa; when a record is updated, a corresponding reverse update record is written. On rollback, the logical records in the undo log are read and applied to undo the changes.
