MySQL 45 Lectures study notes: ordinary index or unique index, how should you choose? (Lecture 9)

First, today's overview

Before starting today's text, I want to give special thanks to several students who left high-quality comments. One student organized the article's knowledge points and then raised a question about transaction visibility, namely: what effect does a transaction that commits only after another transaction has started have on data visibility? @Summer Rain also raised this issue; I replied in the pinned comment, and I expand on it at the end of today's article. @Justin and @Ni each raised a good question as well.

For questions that can prompt deeper thinking, I add the words "Good question" in my reply to make them easy to search for; you can also go and read those comments.

Thank you all for reading the articles so carefully and leaving so many high-quality comments. Knowing that the articles bring new understanding to everyone is a great encouragement to me. It also gives other students who read the comments section carefully a chance to discover knowledge points they had not noticed, or could not yet articulate clearly, which raises the quality of the whole column. Thank you again.

Okay, now back to today's main content.

In the earlier foundation articles, I introduced the basic concepts of indexes, so I believe you already know the difference between a unique index and an ordinary index. Today we will continue the discussion: in different business scenarios, should you choose an ordinary index or a unique index?

Suppose you maintain a citizen information system in which each person has a unique ID-card number, and the business code already guarantees that two identical ID-card numbers will not be written. When the system needs to look up a name by ID-card number, it executes a SQL statement like this:

select name from CUser where id_card = 'xxxxxxxyyyyyyzzzzz';

So you would consider building an index on the id_card field.

Since the ID-card number field is fairly long, I do not recommend using it as the primary key. So you now have two choices: either create a unique index on the id_card field, or create an ordinary index. Given that the business code already guarantees that duplicate ID-card numbers will not be written, both choices are logically correct.
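As a concrete sketch, assuming the CUser table from the query above (the index names uk_id_card and idx_id_card are illustrative, and in practice you would pick one of the two, not both):

-- Option 1: a unique index on id_card
alter table CUser add unique index uk_id_card(id_card);
-- Option 2: an ordinary (non-unique) index on id_card
alter table CUser add index idx_id_card(id_card);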

The question I want to ask you now is: from a performance point of view, would you choose a unique index or an ordinary index? And on what basis?

For simplicity, we will use the example from Lecture 4, "Indexes explained simply (Part 1)", and assume that the values of field k are not repeated.

Figure 1: InnoDB index organization

Next, we will analyze how these two kinds of index affect the performance of query statements and update statements.

Second, the data query process

Suppose the query statement is select id from T where k = 5. The query searches the index tree by starting from the root of the B+ tree and descending level by level to a leaf node, which in the figure is the data page in the lower right corner; inside the data page, the record can then be located by binary search.

  1. For an ordinary index, after finding the first record that satisfies the condition, (5,500), it must fetch the next record, and keep going until it hits the first record that does not satisfy the condition k = 5.
  2. For a unique index, because the index is defined to be unique, retrieval stops as soon as the first record that satisfies the condition is found.
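If you want to confirm which access path such a query takes, you can ask the optimizer; a minimal sketch against the table T assumed above:

explain select id from T where k = 5;

With a unique index, the plan typically shows a const lookup, while a non-unique index typically shows a ref lookup; either way it is a single index search.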

So how much of a performance gap does this difference bring? The answer is: very little.

You should know that InnoDB reads and writes data in units of data pages. That is, when a single record needs to be read, the record is not read from disk by itself; instead the whole page containing it is read into memory. In InnoDB, the default size of each data page is 16KB.

Because the engine reads and writes by page, by the time the record with k = 5 is found, the data page containing it is already in memory. So for the ordinary index, the one extra "fetch the next record and check it" operation requires only one pointer move and one comparison.

Of course, if the record with k = 5 happens to be the last record on its data page, then fetching the next record requires reading the next data page, and the operation is slightly more complicated.

However, as we calculated earlier, for an integer field a single data page can hold nearly a thousand keys (roughly, a 16KB page divided by an entry of a dozen or so bytes), so this situation is rare. When we compute the average performance difference, this cost is still negligible for a modern CPU.

Third, the data update process

To explain the impact of ordinary and unique indexes on the performance of update statements, I first need to introduce the change buffer.

When a data page needs to be updated, and the page is in memory, InnoDB updates it directly. If the page is not in memory, then, provided data consistency is not affected, InnoDB caches the update operations in the change buffer, so that the data page does not have to be read from disk. The next time a query needs to access this data page, the page is read into memory and the change-buffer operations related to the page are applied to it. In this way the correctness of the data logic is guaranteed.

Note that although the name is change buffer, it is in fact persistent data. In other words, the change buffer has a copy in memory and is also written to disk.

Applying the operations in the change buffer to the original data page, thereby obtaining the latest result, is a process called merge. Besides being triggered when the data page is accessed, merge is also run periodically by a background thread, and merge operations are performed during a normal database shutdown.

Obviously, if an update operation can be recorded in the change buffer first, reducing disk reads, the execution speed of the statement improves significantly. Moreover, reading a data page into memory takes space in the buffer pool, so this approach also avoids occupying memory and improves memory utilization.

So, under what conditions can the change buffer be used?

For a unique index, every update operation must first determine whether it violates the uniqueness constraint. For example, to insert the record (4,400), we must first check whether a record with k = 4 already exists in the table, and that check requires reading the data page into memory. And once the page has been read into memory anyway, updating it directly in memory is faster, so there is no need to use the change buffer.

Therefore, updates to a unique index cannot use the change buffer; in fact only ordinary indexes can use it. The change buffer takes memory from the buffer pool, so it cannot grow without limit. Its size can be set dynamically with the parameter innodb_change_buffer_max_size; setting this parameter to 50 means the change buffer may occupy at most 50% of the buffer pool.
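A minimal sketch of checking and adjusting the parameter (the value 25 below is only an illustration; the unit is a percentage of the buffer pool):

show variables like 'innodb_change_buffer_max_size';
set global innodb_change_buffer_max_size = 25;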

Now that you understand the mechanism of the change buffer, let's see what InnoDB's processing flow looks like when we insert a new record (4,400) into this table.

The first case is that the target page for this record is in memory. Here InnoDB's process is as follows:

  1. For a unique index, find the position between 3 and 5, check that there is no conflict, insert the value; statement execution ends.
  2. For an ordinary index, find the position between 3 and 5, insert the value; statement execution ends.

As you can see, the difference an ordinary versus a unique index makes to update-statement performance here is just one check, which consumes only a tiny amount of CPU time. But this is not our focus.

The second case is that the target page for this record is not in memory. Here InnoDB's process is as follows:

  1. For a unique index, the data page must be read into memory; check that there is no conflict, insert the value; statement execution ends.
  2. For an ordinary index, the update is simply recorded in the change buffer; statement execution ends.

Reading data from disk into memory involves random IO, one of the most expensive operations inside a database. Because the change buffer reduces random disk accesses, the improvement in update performance is very noticeable. I once ran into this in practice: a DBA student reported to me that the memory hit rate of a library he was responsible for had suddenly dropped from 99% to 75%, the whole system was stalled, and all update statements were blocked. After exploring the cause, we found that this business did a large number of insert operations, and the day before he had changed one of the table's ordinary indexes into a unique index.

Fourth, change buffer mechanism and application scenarios

From the analysis above, you understand the accelerating role the change buffer plays in the update process, and you also know that it applies only to the ordinary-index case, not to unique indexes. So now there is a question: can the change buffer accelerate things in every scenario where ordinary indexes are used?

Because merge is the moment the data is truly updated, and the main purpose of the change buffer is to cache the change operations, the more changes the change buffer has recorded for a data page before that page is merged (that is, the more times the page has to be updated), the greater the benefit.

Therefore, for businesses that write much and read little, the probability that a page is accessed right after being written is small, and this is when the change buffer works best. Typical examples of this business model are billing systems and logging systems.

Conversely, suppose a business's update pattern is to query a record immediately after writing it. Then, even though the conditions are met and the update is first recorded in the change buffer, the data page is accessed right afterward, which immediately triggers the merge process. The number of random IO accesses is not reduced, and the cost of maintaining the change buffer is added on top. So for this business model, the change buffer actually has a negative effect.

Fifth, index selection in practice

Back to the question at the beginning of the article: how should we choose between an ordinary index and a unique index? In fact there is no difference in query capability between the two types; the main consideration is the impact on update performance. So I suggest you choose an ordinary index whenever possible.

If every update is immediately followed by a query of the same record, you should turn the change buffer off. In all other cases, the change buffer improves update performance.
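The switch is the innodb_change_buffering parameter; a minimal sketch, assuming you have the privilege to change global variables:

set global innodb_change_buffering = 'none';  -- disable change buffering
set global innodb_change_buffering = 'all';   -- the default: buffer all eligible changes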

In actual use, you will find that an ordinary index combined with the change buffer gives an obvious optimization when updating tables with large volumes of data.

In particular, when mechanical hard disks are used, the benefit of the change buffer mechanism is very significant. So if you have a "historical data" library that uses mechanical disks for cost reasons, you should pay special attention to the indexes on those tables: use ordinary indexes as much as possible, make the change buffer as large as you can, and thereby guarantee the write speed of these "historical data" tables.

Sixth, change buffer and redo log

Having understood the principle of the change buffer, you may think of the redo log and WAL, which I described for you in an earlier article.

In the comments on an earlier article, I noticed that some students confuse the redo log with the change buffer. Indeed, the core mechanism by which WAL improves performance is precisely minimizing random reads and writes, so the two concepts are easy to mix up. So here I will put them into the same update sequence, to make it easier for you to tell them apart.

Note: at this point you can look back at the relevant content of Lecture 2, "The logging system: how is a SQL update statement executed?".

Now, suppose we execute this insert statement on a table:

mysql> insert into t(id,k) values(id1,k1),(id2,k2);
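For reference, a plausible definition of such a table t (an assumption for illustration; all the analysis needs is a primary key id and an ordinary index on k):

create table t(
  id int primary key,
  k int not null,
  index k(k)
) engine=InnoDB;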

Here we assume that, in the current state of the k index tree, after the insert positions are found, the data page for k1 is in memory (the InnoDB buffer pool) while the data page for k2 is not. Figure 2 shows the update process together with the change buffer.

Figure 2: the update process with the change buffer

Analyzing this update statement, you will find it involves four parts: memory, the redo log (ib_log_fileX), the data tablespace (t.ibd), and the system tablespace (ibdata1).

This update statement performs the following operations (in the numbered order in the figure):

1. Page 1 is in memory, so it is updated directly in memory;
2. Page 2 is not in memory, so the information "I want to insert a row into Page 2" is recorded in the change buffer area of memory;
3. The above two actions are recorded in the redo log (steps 3 and 4 in the figure).

Once the above is done, the transaction can complete. So you can see that the cost of executing this update statement is very low: two memory writes plus one disk write (the two redo records are written to disk together), and that disk write is sequential.

Meanwhile, the two dashed arrows in the figure are background operations and do not affect the update's response time.

So how is a read request that arrives after this handled?

For example, suppose we now execute select * from t where k in (k1, k2). Figure 3 shows the flow of these two read requests. If the read statement happens shortly after the update statement, the data is still in memory, so the two read operations have nothing to do with the system tablespace (ibdata1) or the redo log (ib_log_fileX). That is why I did not draw those two parts in the figure.

Figure 3: the read process with the change buffer

From the figure you can see:

1. When reading Page 1, the result is returned directly from memory. Several students asked in the comments on the WAL article whether a subsequent read must go to disk, or must first apply the redo log to the data before it can return. As you can see, neither is needed: looking at the state in Figure 3, although the old data is still on disk, the result here is returned directly from memory, and that result is correct.

2. When reading Page 2, the page must be read from disk into memory, and then the operations recorded for it in the change buffer are applied, generating the correct version, which is returned. As you can see, the data page is read into memory only when Page 2 actually needs to be read.

So, to simply compare the gains these two mechanisms bring to update performance: the redo log mainly saves the IO cost of random disk writes (converting them into sequential writes), while the change buffer mainly saves the IO cost of random disk reads.
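Incidentally, if you want to observe change buffer activity on a running instance, you can look at the InnoDB status output:

mysql> show engine innodb status\G

In that output, the "INSERT BUFFER AND ADAPTIVE HASH INDEX" section reports the change buffer's size and its merged-operation counters (the change buffer grew out of the earlier insert buffer, hence the section name).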

Seventh, summary

Today I started from the choice between an ordinary index and a unique index, shared the data query and update processes with you, then explained the mechanism and application scenarios of the change buffer, and finally talked about index selection in practice.

Because a unique index cannot use the change buffer optimization, if the business can accept it, I recommend from a performance point of view that you give priority to a non-unique index.

Finally, it's time for this issue's question.

As you can see from Figure 2, a change to the change buffer starts out as a write to memory. So if the machine loses power and restarts at that moment, will the change buffer be lost? Losing the change buffer would be no small matter: the data later read back from disk would be missing the merge process, which would effectively be lost data. Can this happen?

You can write your thoughts and conclusions in the comments section, and I will discuss this question with you at the end of the next article. Thank you for listening; you are welcome to share this article with more friends to read together.

Supplement:

There has been a lot of discussion in the comments about "whether to use a unique index", mostly tangled around the case where "the business may not be able to guarantee uniqueness". Let me explain this here:

First, business correctness comes first. The premise of this article is "the business code already guarantees that duplicate data will not be written", under which we discuss performance. If the business cannot guarantee that, or the business simply requires the database to enforce the constraint, then you have no choice: you must create a unique index. In that case, the significance of this article is that, if you run into large volumes of slow inserts and a low memory hit rate, it gives you one more idea for troubleshooting.

Second, in some "archive library" scenarios you can consider using ordinary indexes. For example, suppose online data is kept for only half a year, and historical data is moved into an archive library. At that point, the archived data is already guaranteed to have no unique-key conflicts. To improve archiving efficiency, you can consider changing the unique indexes in the archive tables into ordinary indexes.
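A minimal sketch of that conversion, with hypothetical table and index names:

alter table t_archive drop index uk_id_card, add index idx_id_card(id_card);

This trades a uniqueness check that the archival process has already made redundant for change-buffer-accelerated inserts.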

Eighth, last issue's question

Last issue's question was: how do you construct a scenario in which "the data cannot be modified"? Quite a few students in the comments gave the correct answer; let me describe it again here.

With that operation sequence (shown in my screenshot), session A sees exactly the "cannot modify" effect.

In fact, there is another scenario, which no student mentioned in the comments:

(Figure: the operation sequence for the second scenario, in which session B' starts its transaction before session A.)

When this operation sequence is run, what session A sees also reproduces the "data cannot be modified" effect. Having session B' start its transaction earlier than session A is actually an easter egg I left when describing the transaction-version visibility rules last time: those rules also include a judgment about "active transactions", which I planned to fill in here.

But when I tried to spell out the complete rules here, I found that Lecture 8, "Are transactions isolated, or not?", had introduced too many concepts, which made the analysis very complicated.

So I rewrote Lecture 8, making it easier for everyone to judge visibility. [Having read this far, I suggest you reopen Lecture 8 and study it again carefully. If you have any questions during that process, you are welcome to leave me a comment.]

Using the new way of analysis, here is why the update made by session B' is not visible to session A: at the instant session A's consistent view array is created, session B' is active, which falls under the "uncommitted version, invisible" case. If your business wants to get around this kind of problem, @Joshua offered an "optimistic locking" solution in the previous article's comments; you can go take a look.
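For reference, the general shape of optimistic locking, sketched generically here (the version column and the values are hypothetical; this is the common pattern, not necessarily @Joshua's exact scheme):

select c, version from t where id = 1;  -- suppose this returns version = 5
update t set c = c + 1, version = version + 1
where id = 1 and version = 5;           -- affects 0 rows if someone updated first; re-read and retry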

Ninth, classic comments

Should you choose an ordinary index or a unique index?

For the query process:

a. Ordinary index: after finding the first record that satisfies the condition, continue to the next record, until the first record that does not satisfy the condition is hit.
b. Unique index: because the index is unique, stop retrieving after the first record that satisfies the condition is found.
However, the performance gap between the two is minimal, because InnoDB reads and writes data by page.

For the update process:

Key concept: the change buffer.

When a data page needs to be updated: if the page is in memory, update it directly; if it is not in memory, then, without affecting data consistency, InnoDB caches these update operations in the change buffer. The next time a query needs to access the data page, the page is read into memory and the change-buffer operations associated with the page are applied.

The change buffer is persistent data: it has a copy in memory and is also written to disk.

merge: the process of applying the operations in the change buffer to the original data page and obtaining the latest result is called merge. Accessing the data page triggers a merge; the system also has a background thread that merges periodically; and merge is performed during a normal database shutdown.

Updates to a unique index cannot use the change buffer.

The change buffer uses memory from the buffer pool. Its size can be set dynamically with the parameter innodb_change_buffer_max_size; setting this parameter to 50 means the change buffer may occupy at most 50% of the buffer pool.

Reading data from disk into memory involves random IO, one of the most expensive operations inside a database. Because the change buffer reduces random disk accesses, the improvement to update performance is obvious.

Change buffer usage scenarios:

Before a data page is merged, the more changes the change buffer has recorded for it, the greater the benefit.
For businesses that write much and read little, the probability that a page is accessed right after being written is small, and this is when the change buffer works best; billing and logging systems are common examples of this model.

Conversely, if a business's update pattern is to query a record immediately after writing it, then even though the update is first recorded in the change buffer, the immediate access to the data page immediately triggers a merge.
The number of random IO accesses is not reduced, while the cost of maintaining the change buffer is added; for this business model the change buffer has a negative effect.

Index selection in practice:
Whenever possible, use an ordinary index.
The redo log mainly saves the IO cost of random disk writes (converting them into sequential writes), while the change buffer mainly saves the IO cost of random disk reads.


Source: www.cnblogs.com/luoahong/p/11611255.html