MySQL Indexes and Transactions

The next article will cover JDBC programming.

Stay tuned!

Contents

1. Index

1.1 What is an index

1.2 Usage scenarios of indexes

1.3 Index related costs

1.4 The data structure behind the index (B+ tree)

2. Transactions

2.1 What is a transaction

2.2 Why use transactions

2.3 Four Characteristics of Transactions

2.3.1 Atomicity

2.3.2 Consistency

2.3.3 Durability

2.3.4 Isolation (focus on understanding!!!)


1. Index

Indexes are an important MySQL topic that comes up often in interviews. This section explains how they work.

1.1 What is an index

① In theory:

An index is a special data structure, stored on disk, that holds pointers to the records in a table. You can create an index on one or more columns of a table and specify the index type. Each index type has its own underlying data structure.

② Real-life examples:

a. A book's table of contents is an index: it tells us which page to turn to.

b. A student number is an index: the class teacher can find a student by their number.

1.2 Usage scenarios of indexes

① Before creating an index, consider the following:
The table holds a large amount of data, and the columns in question often appear in query conditions.
Inserts into the table and updates to those columns are relatively infrequent.
Indexes consume additional disk space, so that space must be available.
When these conditions are met, consider creating indexes on those columns to improve query efficiency.
Conversely, if the columns are rarely used in query conditions, if inserts and updates are frequent, or if disk space is short, do not create the index.
② SQL operations related to indexes:
a. View the indexes on a table: show index from table_name;

Querying the student1 table shows that it already has an index, even though we never created one. Why?

Because we declared a primary key when we created the table, and MySQL creates an index for a primary key automatically. The same happens for any column declared unique, as shown in the following figure:

 b. Create an index on a column:

create index index_name on table_name (column_name);

Note! Indexes are usually defined at the same time as the table, as shown in the following figure:

 We generally avoid adding an index to a table after it has been created. If the table already holds a lot of data, building the index has to read every row, and the extra load can overwhelm a busy database. The following figure creates an index after the table already exists; on a large table this is not recommended and can be very dangerous!

 c. Delete an index:

drop index index_name on table_name;

The same caution applies as for creating an index. When the table already holds a lot of data, rashly dropping an index forces the queries that relied on it to scan the whole table, and the resulting load can overwhelm the database.

 After the index is deleted, it no longer exists on the table. Note! This is a fairly dangerous operation.
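The three operations above can be tried out safely in a throwaway database. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL, so the syntax differs slightly: SQLite lists indexes with PRAGMA index_list instead of show index, and its drop index takes no "on table_name" clause. The index name idx_student1_name and the columns are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL

# The UNIQUE column gets an automatic index. (In SQLite an INTEGER PRIMARY KEY
# is the table's own row key, so it is not listed as a separate index.)
conn.execute(
    "CREATE TABLE student1 (id INTEGER PRIMARY KEY, sn TEXT UNIQUE, name TEXT)"
)

def index_names(table):
    # SQLite's counterpart to MySQL's: show index from table_name;
    return [row[1] for row in conn.execute(f"PRAGMA index_list({table})")]

auto_indexes = index_names("student1")

# Create an index on the name column (same syntax as in MySQL).
conn.execute("CREATE INDEX idx_student1_name ON student1 (name)")
after_create = index_names("student1")

# Ask the query planner whether a lookup by name would use the new index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM student1 WHERE name = 'Tom'"
).fetchall()
uses_index = any("idx_student1_name" in row[-1] for row in plan)

# Drop the index again (MySQL: drop index idx_student1_name on student1;).
conn.execute("DROP INDEX idx_student1_name")
after_drop = index_names("student1")

print(auto_indexes, after_create, uses_index, after_drop)
```

On an empty in-memory table all of this is instant; the warnings above apply to large tables that are serving live traffic.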

1.3 Index related costs

The main purpose of an index is to make lookups faster, but that speed comes at a cost. Let's look at the costs using the real-life examples above:

① For a book's table of contents:

A table of contents does make it faster to find a chapter, but it also uses extra paper. In other words, as the amount of indexed data grows, the index takes up more and more extra storage.

② For student numbers:

Giving each student a number makes it easy for teachers to look students up, but as the number of students grows, more space is needed to record the numbers.

That being the case, why do we use indexes at all?

Once a book is published, its table of contents almost never changes, and a class only occasionally receives a transfer student, so new student numbers are rarely added. Meanwhile, we constantly look up pages through the table of contents and students through their numbers. An index makes each write slightly more expensive, because every insert, update, or delete must also update the index, but it makes reads much faster. When reads greatly outnumber writes, the advantages of an index far outweigh its disadvantages.

1.4 The data structure behind the index (B+ tree)

① Introduction:

Because the fundamental purpose of an index is to make queries efficient, the data structure behind it must support fast lookup. Lookup here means finding an entry by comparing key values, not following an address as with a pointer.

② Why choose a B+ tree as the data structure behind the index?

(1) Why not use a binary search tree?

The biggest problem with a binary search tree is that each node has at most two children, so the tree grows tall as the number of elements increases. For a database, each level visited during a lookup can mean one disk I/O.

(2) Why not use a hash table?

A hash table offers O(1) lookup on average, but it is only suited to equality lookups, that is, finding one exact value. It cannot answer range queries efficiently, because it does not keep its keys in order.
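To make the difference concrete, here is a small Python sketch (not MySQL code; the keys and row labels are made up). A dict plays the role of the hash index and a sorted list searched with bisect plays the role of an ordered index.

```python
import bisect

ids = [3, 8, 15, 21, 27, 34, 42]

# Hash index: perfect for "id = 21".
hash_index = {key: f"row-{key}" for key in ids}
found = hash_index.get(21)

def range_via_hash(lo, hi):
    # A hash table has no key order, so a range query must look at every key.
    return sorted(k for k in hash_index if lo <= k <= hi)

# Ordered index: two binary searches find both ends of the range.
sorted_keys = sorted(ids)

def range_via_sorted(lo, hi):
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[start:end]

print(found)
print(range_via_hash(10, 30))
print(range_via_sorted(10, 30))
```

Both range functions return the same keys, but the hash version touches all seven entries while the ordered version only touches the slice it returns plus a few comparisons.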

(3) Why not use a heap?

A heap is designed for finding its top element, that is, the maximum or minimum value. It cannot efficiently find an arbitrary value.

(4) Why use a B+ tree?

B-trees and B+ trees are multi-way search trees: each node has many children, so for the same number of elements they are much shorter than an ordinary binary search tree.
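A rough calculation shows how much shorter. The sketch below assumes an idealized, perfectly balanced tree and counts how many levels are needed before the tree can distinguish a given number of keys. The figure of 100 children per node is an illustrative assumption, not a MySQL constant.

```python
def levels_needed(num_keys, fanout):
    """Smallest number of levels h such that fanout**h >= num_keys."""
    levels, capacity = 0, 1
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels

binary_levels = levels_needed(1_000_000, 2)    # binary tree: 2 children per node
wide_levels = levels_needed(1_000_000, 100)    # multi-way tree: 100 children per node

print(binary_levels, wide_levels)  # 20 3
```

If every level costs one disk read, that is about 20 reads against 3 for a million keys.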

Before going further, let's look at what these two trees are.

③ What is a B-tree (also written B-tree with a hyphen)?

a. Characteristics of a B-tree: each node stores N keys, and the N keys divide the key space into N+1 intervals. Each interval corresponds to one subtree.

b. Searching a B-tree:

Start from the root node and use the value being searched for to determine which interval it falls into, then descend into that interval's subtree and repeat. At this point many people will object that locating the interval also takes several comparisons inside the node, so why is a binary search tree any worse?

c. Why is a binary search tree not good enough?

In a binary search tree, the number of comparisons is tied to the height of the tree: each comparison moves from a parent node to a child node. In a B-tree, each subtree covers a range and each level holds many ranges, so the tree is much shorter, although more comparisons happen within each node. What we care about is the number of disk I/Os rather than the number of comparisons. The disk is read one node at a time, so the comparisons inside a node happen in memory, and fewer levels mean fewer I/Os.
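The search described above can be sketched in a few lines of Python. This is a toy in-memory model, not how MySQL stores pages: BTreeNode and search are hypothetical names, and each visited node stands for one disk read.

```python
import bisect

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                # N sorted keys
        self.children = children or []  # N + 1 subtrees (empty for a leaf)

def search(node, target):
    """Return (found, nodes_visited); one visited node models one disk read."""
    visited = 0
    while node is not None:
        visited += 1
        # Binary search inside the node picks the interval the target falls in.
        i = bisect.bisect_left(node.keys, target)
        if i < len(node.keys) and node.keys[i] == target:
            return True, visited
        node = node.children[i] if node.children else None
    return False, visited

# Two keys in the root split the key space into three intervals.
root = BTreeNode(
    [20, 40],
    [BTreeNode([5, 10]), BTreeNode([25, 30]), BTreeNode([45, 50])],
)

print(search(root, 30))  # (True, 2)
print(search(root, 40))  # (True, 1)
print(search(root, 33))  # (False, 2)
```

Note that a key stored in the root (40) is found after one read, while a key in a leaf (30) needs two. This uneven cost is one of the things a B+ tree changes.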

d. Structure diagram of a B-tree:

④ What is a B+ tree?

a. Structure of a B+ tree:

A B+ tree is also an N-way search tree. Each node contains multiple keys; a node with N keys is divided into N intervals.

b. How a B+ tree differs from a B-tree:

Unlike in a B-tree, every key in a parent node also appears in one of its child nodes, so every key in a non-leaf node eventually appears in a leaf node. In other words, any value that appears in a B+ tree appears in the leaf nodes. A key in a parent node is the maximum (or minimum) key of the corresponding child node. The leaf nodes at the bottom level are linked together in key order as a linked list.

c. Structure diagram of a B+ tree:

d. Why a B+ tree is better suited to indexing:

1. Like a B-tree, a B+ tree needs relatively few I/Os per lookup.
2. Every query ends at a leaf node, so each query costs a similar number of I/Os and query speed is stable.
3. The leaf nodes are connected by a linked list, which makes range queries easy.
4. All row data (the payload) is stored in the leaf nodes, so non-leaf nodes only need to store keys. Non-leaf nodes therefore take up little space overall and can even be cached in memory.
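Points 2 to 4 can be seen in a toy model. The sketch below is a simplified, hand-built B+ tree in Python (no insertion or splitting; Leaf, Internal, find_leaf and range_query are hypothetical names). Following the description above, each key in the internal node is the maximum key of the corresponding child, rows live only in the leaves, and the leaves are chained together.

```python
import bisect

class Leaf:
    def __init__(self, keys, rows):
        self.keys = keys   # sorted keys
        self.rows = rows   # row data lives only in the leaves
        self.next = None   # link to the next leaf in key order

class Internal:
    def __init__(self, keys, children):
        self.keys = keys           # keys[i] is the largest key in children[i]
        self.children = children

def find_leaf(node, key):
    while isinstance(node, Internal):
        i = bisect.bisect_left(node.keys, key)  # first child whose max >= key
        if i == len(node.keys):
            return None                         # key is larger than every key
        node = node.children[i]
    return node

def range_query(root, lo, hi):
    leaf = find_leaf(root, lo)   # one descent to reach the start of the range
    result = []
    while leaf is not None:      # then walk the leaf chain, no more descents
        for key, row in zip(leaf.keys, leaf.rows):
            if key > hi:
                return result
            if key >= lo:
                result.append((key, row))
        leaf = leaf.next
    return result

leaves = [Leaf([5, 10], ["a", "b"]), Leaf([20, 25], ["c", "d"]), Leaf([40, 45], ["e", "f"])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([10, 25, 45], leaves)

print(range_query(root, 10, 40))
```

The range query descends the tree once and then only follows next links, which is why range scans are cheap on a B+ tree.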

2. Transactions

2.1 What is a transaction

A transaction is a logical group of operations that either all succeed or all fail.
Transactions exist in many kinds of systems; in a database, they are called database transactions.

2.2 Why use transactions

Transactions exist to package several separate operations into a single unit.

In SQL, some tasks require several statements in combination. If one statement fails partway through, the statements that already ran before it leave the data half-changed, and the task as a whole loses its meaning.

2.3 Four Characteristics of Transactions

2.3.1 Atomicity

① What is atomicity?

Either all operations in the transaction are executed or none of them are; a transaction cannot be divided any further.

② A real example:

A bank transfer. When A transfers 1,000 yuan to B, A's balance decreases by 1,000 yuan and B's balance increases by 1,000 yuan. The two changes form a single unit: it must never happen that A's money is deducted but B's is not credited.

③ How is atomicity ensured?

Suppose an earlier SQL statement has already executed and then a later statement fails, interrupting the transaction.

The database automatically performs a "restore" operation (a rollback) to undo the effects of the earlier statements, so that in the end it looks as if none of them had ever been executed.
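Here is a minimal sketch of the transfer with a rollback. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the account table, the starting balances, and the fail_midway switch that simulates a crash are all made up for illustration.

```python
import sqlite3

# isolation_level=None: we issue BEGIN / COMMIT / ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 5000), ('B', 1000)")

def balance(name):
    row = conn.execute(
        "SELECT balance FROM account WHERE name = ?", (name,)
    ).fetchone()
    return row[0]

def transfer(src, dst, amount, fail_midway=False):
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        if fail_midway:
            raise RuntimeError("simulated failure between the two statements")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # undo the debit that already ran
        raise

try:
    transfer("A", "B", 1000, fail_midway=True)
except RuntimeError:
    pass
after_failure = (balance("A"), balance("B"))

transfer("A", "B", 1000)
after_success = (balance("A"), balance("B"))

print(after_failure)  # (5000, 1000)
print(after_success)  # (4000, 2000)
```

After the failed attempt A still has 5,000: the debit was undone, so the half-finished transfer left no trace.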

2.3.2 Consistency

Consistency means that the data in the database is valid both before and after a transaction executes. Taking the transfer above as an example: whether or not the transfer goes through, an account balance can never end up negative.
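One way to express the "balance can never be negative" rule is a CHECK constraint. The sketch below again uses Python's sqlite3 as a stand-in for MySQL, with a made-up account table; it shows one possible way to enforce such a rule, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"  # the validity rule
)
conn.execute("INSERT INTO account VALUES ('A', 500)")
conn.commit()

try:
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("UPDATE account SET balance = balance - 1000 WHERE name = 'A'")
    overdraft_rejected = False
except sqlite3.IntegrityError:
    overdraft_rejected = True

final_balance = conn.execute(
    "SELECT balance FROM account WHERE name = 'A'"
).fetchone()[0]

print(overdraft_rejected, final_balance)  # True 500
```

The update that would have produced a balance of -500 is refused, and the data stays in its last valid state.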

2.3.3 Durability

Once a transaction is committed, its changes are permanent: they are written to disk and survive a restart or crash.
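A small sketch of the idea, using Python's sqlite3 with a database file in a temporary directory (the file name bank.db and the table are made up). Closing the connection stands in for the program exiting; it does not simulate a real crash.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bank.db")

    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES ('A', 4000)")
    conn.commit()   # from here on the change is on disk
    conn.close()    # stands in for the program exiting

    # A brand-new connection still sees the committed data.
    reopened = sqlite3.connect(path)
    persisted = reopened.execute(
        "SELECT balance FROM account WHERE name = 'A'"
    ).fetchone()[0]
    reopened.close()

print(persisted)  # 4000
```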

2.3.4 Isolation (focus on understanding!!!)

①What is isolation?

Isolation describes what happens when transactions execute concurrently. When several transactions run at the same time, they are very likely to read or modify the same data, which causes problems. Isolation exists to solve these problems.

For example:

When students fill in a shared online form at the same time, two of them may end up writing to the same cell.

② Dirty reads:

a. What is a dirty read?

Transaction A is modifying some data, and at the same time transaction B reads it. The value B reads may be only a temporary one: A has not committed yet and may change the value again or roll it back, so it is not the final result. Such a value is "dirty data", and reading it is a dirty read.

b. Why do dirty reads occur?

Because there is no isolation between transaction A and transaction B.

c. How can dirty reads be avoided?

Add constraints that isolate transactions from each other.

To prevent dirty reads, lock write operations: while a transaction is writing, others cannot read the data, and they can read it only after the change has been committed. With this write lock in place, isolation increases and concurrency decreases.
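The effect can be observed with two connections to the same database. The sketch below uses Python's sqlite3 as a stand-in for MySQL, with a made-up account table. One difference from the description above: in its default mode SQLite does not make the reader wait; it simply keeps showing the last committed value until the writer commits. Either way, the reader never sees the temporary value.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bank.db")
    # isolation_level=None: transactions are controlled with explicit BEGIN/COMMIT.
    writer = sqlite3.connect(path, isolation_level=None)   # plays transaction A
    reader = sqlite3.connect(path, isolation_level=None)   # plays transaction B

    writer.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    writer.execute("INSERT INTO account VALUES ('A', 1000)")

    def read_balance():
        rows = reader.execute(
            "SELECT balance FROM account WHERE name = 'A'"
        ).fetchall()
        return rows[0][0]

    writer.execute("BEGIN")
    writer.execute("UPDATE account SET balance = 0 WHERE name = 'A'")
    seen_during_write = read_balance()   # A has not committed yet

    writer.execute("COMMIT")
    seen_after_commit = read_balance()   # now the change is visible

    writer.close()
    reader.close()

print(seen_during_write, seen_after_commit)  # 1000 0
```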

③ Non-repeatable reads:

a. What is a non-repeatable read?

The fix for dirty reads locks write operations: no one can read while we are writing. But nothing yet prevents someone from writing while we are reading.

So when one transaction reads the same data several times, another transaction may commit a change in between, and the reads return different results.

 b. How is this resolved?

Lock read operations as well as write operations: while a transaction is reading the data, others cannot modify it. Concurrency decreases further and isolation increases.
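The read lock can be seen in action with two sqlite3 connections (again a stand-in for MySQL, with a made-up account table). While the reader's transaction is open it holds a shared lock on the database, so the writer's update is refused. The writer is opened with timeout=0 so that it fails immediately instead of waiting; that setting is a choice made for this demo.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bank.db")
    reader = sqlite3.connect(path, isolation_level=None)
    writer = sqlite3.connect(path, isolation_level=None, timeout=0)

    writer.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    writer.execute("INSERT INTO account VALUES ('A', 1000)")

    def read_balance():
        rows = reader.execute(
            "SELECT balance FROM account WHERE name = 'A'"
        ).fetchall()
        return rows[0][0]

    reader.execute("BEGIN")
    first_read = read_balance()          # takes a shared (read) lock
    try:
        writer.execute("UPDATE account SET balance = 500 WHERE name = 'A'")
        write_blocked = False
    except sqlite3.OperationalError:     # "database is locked"
        write_blocked = True
    second_read = read_balance()         # same value as the first read
    reader.execute("COMMIT")             # releases the read lock

    # With the reader's transaction finished, the same update goes through.
    writer.execute("UPDATE account SET balance = 500 WHERE name = 'A'")
    after_reader_commit = read_balance()

    reader.close()
    writer.close()

print(first_read, write_blocked, second_read, after_reader_commit)
```

Both reads inside the transaction return the same value, at the price of making the writer wait or retry.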

④ Phantom reads:

a. What is a phantom read?

Within one transaction, the same query is run several times and returns a different number of rows each time. It is a special case of the non-repeatable read.

While we were reading, another transaction performed a write. The difference from before is that it did not modify the rows we had read; it inserted (or deleted) other rows that match our query. When we read again, the number of rows has changed.

b. How to solve the phantom read problem:

Full serialization.

Full serialization means that transactions run strictly one after another: no reads of any kind while a transaction is writing, and no writes of any kind while a transaction is reading.

 This gives the highest isolation and the lowest concurrency: the data is the most reliable, and execution is the slowest.
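As a toy model of full serialization (plain Python threads, not MySQL's actual implementation), the sketch below guards a shared list with a single lock that each "transaction" holds from start to finish. Because only one transaction runs at a time, a reader that counts the rows twice always gets the same count, so no phantom rows can appear in between. All names here are made up.

```python
import threading

rows = []                      # toy "table"
table_lock = threading.Lock()  # held for a whole transaction: strictly serial
observed = []                  # (first_count, second_count) per reader

def reader_txn():
    with table_lock:
        first = len(rows)      # first query in the transaction
        second = len(rows)     # same query again, later in the transaction
        observed.append((first, second))

def writer_txn(value):
    with table_lock:
        rows.append(value)     # inserting a row is what creates phantoms

threads = [threading.Thread(target=writer_txn, args=(i,)) for i in range(50)]
threads += [threading.Thread(target=reader_txn) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

no_phantoms = all(first == second for first, second in observed)
print(no_phantoms, len(rows))  # True 50
```

The cost is visible in the structure itself: 100 transactions take turns on one lock, so nothing ever runs in parallel.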

This also shows that concurrency and isolation pull against each other: the more we have of one, the less we have of the other.

 

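 MySQL provides the following isolation levels, from lowest to highest isolation: read uncommitted, read committed, repeatable read (the default for InnoDB), and serializable.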

Thanks for watching~~

 


Origin blog.csdn.net/weixin_58850105/article/details/123727619