[MySQL] Summary of key points on indexes and transactions

1. Index:

  1. An index exists to make queries more efficient.
  2. An index works like a book's table of contents: you use it to jump straight to the content you want. Without one, you can only go through the pages one by one (a full traversal).
  3. The cost of an index (the trade-off): a) it takes up more space; b) it speeds up lookups but slows down inserts, deletes, and updates, because changing a record means not only modifying the data on disk but also adjusting the index.
  4. Despite these costs, an index is usually still worth it, because in most cases queries are more frequent than inserts, deletes, and updates.
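
The trade-off in the list above can be sketched in a few lines of Python. This is only a toy model, not how MySQL stores anything: the `rows` list, the `name_index` dictionary, and the function names are all hypothetical. It shows that an indexed lookup skips the full scan, and that every insert has to touch both the table and the index.

```python
# Hypothetical in-memory "table": a list of rows, each row a dict.
rows = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "carol"},
]

# The "index": maps a name to the positions of the rows that hold it.
# It costs extra memory on top of the table itself.
name_index = {}
for pos, row in enumerate(rows):
    name_index.setdefault(row["name"], []).append(pos)


def find_by_scan(name):
    """No index: check every row, like turning every page of a book."""
    return [row for row in rows if row["name"] == name]


def find_by_index(name):
    """With the index: jump straight to the matching positions."""
    return [rows[pos] for pos in name_index.get(name, [])]


def insert_row(row):
    """An insert must update the table and the index: the write cost."""
    rows.append(row)
    name_index.setdefault(row["name"], []).append(len(rows) - 1)


insert_row({"id": 4, "name": "bob"})
```

Both lookup functions return the same rows; the difference is how much of the table each one has to visit.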

1.1 Use of the index:

  1. For relatively large tables in a production environment, indexes are generally planned when the table is first created, which avoids many expensive operations later.

  2. View indexes: show index from table_name;

  3. Create an index: create index index_name on table_name(column_name);

  4. Creating an index is an expensive operation. If the table holds little data, the cost is small; if it holds a lot of data, creating an index is very time-consuming, generates a lot of disk I/O, and can even make the database hang.

  5. Creating an index also creates the data structures that back it.

  6. Delete an index: drop index index_name on table_name;

  7. Like creating an index, deleting one is a relatively expensive operation.

1.2 The core data structure behind the index:

Which data structures can make lookups more efficient?

1. Hash table: insert, delete, lookup, and update are all O(1) on average

It can only answer equality queries. Range queries such as <, >, or between ... and do not work.

2. Binary tree / binary search tree: the worst-case lookup is O(N)

AVL tree / red-black tree (balanced binary search trees): O(logN)

If the database holds a very large amount of data, these trees are still quite tall even at O(logN), and every level visited can mean another disk read.

For this reason, database indexes use a data structure tailored to the job: the B+ tree.
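
The difference between equality lookups and range lookups can be illustrated with a short sketch. The data and names here are made up, and a sorted Python list searched with `bisect` stands in for an ordered tree; it is not a B+ tree, only the simplest ordered structure available in the standard library.

```python
import bisect

ages = [31, 18, 45, 27, 60, 22]

# Hash-style lookup: a set answers "is this exact value present?" only.
age_set = set(ages)
has_27 = 27 in age_set

# Ordered structure: keep keys sorted, then binary-search both ends of the range.
sorted_ages = sorted(ages)


def range_query(low, high):
    """Return all keys k with low <= k <= high using two binary searches."""
    start = bisect.bisect_left(sorted_ages, low)
    end = bisect.bisect_right(sorted_ages, high)
    return sorted_ages[start:end]


between_20_and_40 = range_query(20, 40)
```

With the hash-based set, answering the same range question would mean testing every stored value, because hashing destroys the ordering of the keys.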

1.2.1 First, the B-tree (an N-ary search tree):

A B-tree is an N-ary search tree. Each node may hold up to N-1 values (possibly fewer), and N-1 values divide the key range into N intervals. The point of branching N ways is that, for the same data set, the tree is much shorter than a binary tree, which greatly reduces the number of disk I/O operations.
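
A rough calculation shows how much the branching factor matters. The function below is a back-of-the-envelope estimate, not a model of any real storage engine: it counts how many levels of N-way branching are needed before a tree can distinguish a given number of keys. The table size and the fan-out of 1000 are hypothetical figures chosen for illustration.

```python
def levels_needed(num_keys, fanout):
    """Smallest number of levels h such that fanout**h >= num_keys."""
    levels = 0
    capacity = 1
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


num_keys = 1_000_000_000  # hypothetical table size
binary_levels = levels_needed(num_keys, 2)     # binary tree: 30 levels
wide_levels = levels_needed(num_keys, 1000)    # hypothetical fan-out: 3 levels
```

If each level costs one disk read in the worst case, that is roughly 30 reads against 3 for the same lookup.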

1.2.2 Next, the B+ tree (also an N-ary search tree):

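Differences between the B-tree and the B+ tree:

  1. Values per node: in a B-tree, N-1 values in a node split it into N intervals; in a B+ tree, N values split it into N intervals.
  2. Repeated values: in a B-tree a value never appears twice; in a B+ tree it may, because a parent's value reappears in its child as that child's maximum or minimum.
  3. Leaf nodes: the B+ tree links all of its leaf nodes end to end as a linked list, which makes range lookups very convenient.
  4. Where the data lives: the leaf nodes together hold the full data set, so each complete row (all columns of the record) is attached only to a leaf node; non-leaf nodes store only the indexed column (for example, just an id).
  5. Non-leaf nodes therefore take up very little space compared with the full data set and can be cached in memory, which further reduces disk I/O during queries.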
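
The linked leaves can be sketched as follows. This is a hand-built toy with three leaves and one non-leaf level, not a real B+ tree (there is no insertion or node splitting), and the `Leaf` class, the `separators` list, and the row labels are all hypothetical. It shows the two points made above: the non-leaf level stores keys only, and a range query descends once and then simply follows the leaf links.

```python
import bisect


class Leaf:
    """A B+ tree leaf: sorted keys, their rows, and a link to the next leaf."""

    def __init__(self, keys, rows):
        self.keys = keys
        self.rows = rows
        self.next = None


# Three hypothetical leaves, chained left to right.
leaf_a = Leaf([1, 3], ["row1", "row3"])
leaf_b = Leaf([5, 7], ["row5", "row7"])
leaf_c = Leaf([9, 11], ["row9", "row11"])
leaf_a.next, leaf_b.next = leaf_b, leaf_c

# A single non-leaf level: it stores keys only (here, each leaf's smallest key),
# so the same values appear again in the leaves.
separators = [1, 5, 9]
leaves = [leaf_a, leaf_b, leaf_c]


def range_scan(low, high):
    """Descend once to the leaf that may hold `low`, then follow the links."""
    idx = max(bisect.bisect_right(separators, low) - 1, 0)
    leaf = leaves[idx]
    result = []
    while leaf is not None:
        for key, row in zip(leaf.keys, leaf.rows):
            if key > high:
                return result
            if key >= low:
                result.append(row)
        leaf = leaf.next
    return result
```

For example, `range_scan(3, 9)` starts in the first leaf and walks the chain until it meets a key greater than 9, without going back up to the non-leaf level.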

2. Transactions:

  1. Transactions exist to guarantee atomicity.

  2. Atomicity: an atom is the smallest unit that cannot be divided, so "atomic" is used to describe an indivisible basic unit.

  3. Some database operations also need to be executed atomically, and transactions are the way to achieve that.

  4. A bank transfer, for example, must be completed atomically: either every step is executed or none is. ("None is executed" here does not mean that nothing actually ran; it means that if something goes wrong halfway through, the work done so far is automatically restored to the original state.)

  5. A transaction guarantees that when something goes wrong during execution, the effects of the SQL statements already executed are automatically undone and the data is restored to its original state. This operation is called rollback.

  6. While a transaction executes, MySQL records what each step did, so that if a problem occurs it can roll back based on that record.

  7. Since rollback is possible, why is there no general undo for ordinary operations? Supporting transactions already comes at a considerable cost, and a general undo would mean paying that cost on every single step. It is not impossible, but the cost is too high to be worth it.

  8. Atomicity is the core of a transaction. Starting, committing, and rolling back a transaction are generally controlled from code.

  9. The four properties of a transaction:

    Property: explanation
    Atomicity: This is the reason transactions exist. Multiple SQL statements are bundled into a single unit that either executes completely or not at all (if an error occurs during execution, it is rolled back automatically).
    Consistency: Before and after the transaction executes, the data is in a consistent state (the data stays correct).
    Durability: The changes made by the transaction are written to disk and are not lost when the program or the host restarts.
    Isolation: When multiple transactions execute concurrently, they are kept "isolated" so that they do not interfere with each other.
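
Atomicity and rollback can be tried out with a small transfer example. The sketch below uses Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for MySQL, which is an assumption made so the example runs without a server; the `account` table, the names, and the balances are hypothetical. The debit and the credit either both take effect or neither does.

```python
import sqlite3

# In-memory SQLite database, standing in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(src, dst, amount):
    """Move `amount` from src to dst; roll back both steps on failure."""
    try:
        # Step 1: debit the sender (this implicitly opens a transaction).
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        # Step 2: credit the receiver.
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        conn.commit()
        return True
    except ValueError:
        conn.rollback()  # undo the debit that was already applied
        return False


def balances():
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

A transfer that would overdraw the sender fails after the debit has already run, and the rollback restores the sender's balance, so no money disappears halfway through.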

2.1 Isolation:

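  1. Concurrent execution, simply put, means doing many things at the same time. Executing transactions concurrently can cause problems, which is why isolation is needed.
  2. The purpose of isolation is to keep concurrently executing transactions as free of problems as possible (or at least to keep the problems within a controllable range).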

2.1.1 Dirty read problem:

  1. Imagine a scenario: my roommate asks me for my homework, and I send him a version I have not finished revising. After he has used it, I change the homework.
  2. This is a dirty read: the data that was read is temporary data and does not represent the final result.
  3. Dirty read: transaction A has modified some data but has not yet committed, and transaction B reads that data. A may well change the data again by the time it commits, so what B read is invalid ("dirty") data.
  4. How to solve dirty reads: in the scenario above, I agree with my roommate that he only asks for the homework once I have finished writing it. This is equivalent to locking the write operation.
  5. Before write locking, my writes and my roommate's reads are fully concurrent. Concurrency is at its highest and isolation is at its lowest.
  6. After write locking, my roommate cannot get the homework while I am working on it. Concurrency goes down, but isolation goes up.
  7. But a new problem then appears: the non-repeatable read.
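
The homework scenario can be turned into a toy model. The `Cell` class below is purely illustrative and says nothing about how MySQL implements isolation: it keeps one committed value and one uncommitted write, and offers two ways of reading, one that sees the in-progress write and one that does not.

```python
class Cell:
    """One data item with a committed value and an optional uncommitted write."""

    def __init__(self, value):
        self.committed = value
        self.pending = None  # written by a transaction that has not committed

    def write(self, value):
        self.pending = value

    def commit(self):
        if self.pending is not None:
            self.committed = self.pending
            self.pending = None

    def rollback(self):
        self.pending = None

    def read_uncommitted(self):
        """Lowest isolation: sees the in-progress write (a possible dirty read)."""
        return self.pending if self.pending is not None else self.committed

    def read_committed(self):
        """Readers only ever see committed data."""
        return self.committed


homework = Cell("draft v1")
homework.write("draft v2 (unfinished)")  # transaction A writes, no commit yet
dirty = homework.read_uncommitted()      # transaction B sees A's temporary data
clean = homework.read_committed()        # B sees only the last committed version
homework.rollback()                      # A abandons its change
```

After the rollback, the value held in `dirty` corresponds to data that never officially existed, which is exactly why it is called a dirty read.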

2.1.2 Non-repeatable read problem:

  1. Concept: within a single transaction A, reading the same data several times gives different results (someone modified the data between the reads).
  2. Imagine a scenario: we have agreed on write locking, but while my roommate is reading my homework I get a new idea and change it. When I send it to him again, he finds that the homework has changed. This is a non-repeatable read.
  3. Non-repeatable reads are solved with read locking. My roommate and I agree that while I am writing my homework he does not ask for it, and while he is reading it I do not change it.
  4. With read locking, concurrency drops further (lower efficiency) and isolation rises again (better data accuracy).

2.1.3 Phantom read problem:

  1. Imagine a scenario: my roommate and I have just agreed on write locking and read locking, but I still cannot sit idle. While my roommate reads file A, I modify file B, or add and delete files. As long as I do not touch the data being read, that should be fine (or so I think).
  2. The data my roommate is reading directly is indeed unaffected: its contents are the same on both reads. But the result set has changed (the first time he saw only one .java file, and now he sees two).
  3. This situation is called a phantom read and can be regarded as a special case of the non-repeatable read.
  4. To solve phantom reads, I agree with my roommate that while he is reading the data, I shut down my computer, go and slack off, and do not touch the homework at all.
  5. At this point concurrency is at its lowest (execution is serial), efficiency is lowest, isolation is highest, and data accuracy is highest.

2.1.4 Summary:

  1. The dirty read, non-repeatable read, and phantom read problems above are all possible effects of executing transactions concurrently. These effects are not necessarily bugs.
  2. If the requirements do not demand high data accuracy, these problems are not bugs, so concurrency can be higher and isolation lower, which improves efficiency.
  3. If the requirements demand high accuracy, these problems may well be bugs, so concurrency has to be lower and isolation higher to keep the data reliable.
  4. Something like a money transfer must be highly accurate, and lower efficiency is acceptable.
  5. Something like the number of likes or coins on Douyin does not need high accuracy.

2.1.5 Isolation level:

MySQL provides isolation levels as an option with four settings, so that we can choose the one that suits the actual requirements. The level is configured in the MySQL configuration file my.ini and can be set differently for different scenarios.

Level: description
read uncommitted: Uncommitted data may be read. Concurrency is highest and isolation is lowest; dirty reads, non-repeatable reads, and phantom reads are all possible.
read committed: Only committed data can be read, which is equivalent to write locking. Concurrency is lower and isolation is higher; dirty reads are solved.
repeatable read (default): Equivalent to write locking plus read locking. Concurrency drops again and isolation rises again; dirty reads and non-repeatable reads are solved.
serializable: Transactions are executed strictly serially. Concurrency is lowest and isolation is highest; dirty reads, non-repeatable reads, and phantom reads are all solved, at the cost of the lowest efficiency.
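
The table above can be written down as a small lookup, which makes the "choose a setting according to your needs" idea concrete. This is only a restatement of the table in code, with hypothetical names; it does not talk to MySQL or change any setting.

```python
# Anomalies still possible at each level, as listed in the table above.
POSSIBLE_ANOMALIES = {
    "read uncommitted": {"dirty read", "non-repeatable read", "phantom read"},
    "read committed": {"non-repeatable read", "phantom read"},
    "repeatable read": {"phantom read"},
    "serializable": set(),
}

# Levels ordered from most concurrent / least isolated to the opposite.
LEVELS = ["read uncommitted", "read committed", "repeatable read", "serializable"]


def weakest_level_avoiding(unwanted):
    """Pick the most concurrent level at which none of `unwanted` can occur."""
    for level in LEVELS:
        if not (POSSIBLE_ANOMALIES[level] & set(unwanted)):
            return level
    return "serializable"
```

A like counter that tolerates every anomaly can stay at the most concurrent level, while a workload that cannot tolerate phantom reads ends up at serializable.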

Origin blog.csdn.net/qq_68993495/article/details/128378294