Three reasons why SELECT * is not recommended


In fact, there are three points explained in the Alibaba manual:

1) It increases the parsing cost of the query analyzer

What is the analyzer, and what does it cost? Here is a simple diagram to show everyone:

picture

This is the analyzer, which performs lexical and syntax analysis on your SQL.

For example, given select * from user, the analyzer sees the *, works out which table user refers to, then queries the table metadata for its columns (Query Table Metadata For Columns) and expands the statement into something like select id, name, age, phone from user. (Of course it does other analysis as well, such as checking the syntax, the fields, and the table name.)
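The expansion step described above can be sketched in a few lines. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the helper `expand_star` is hypothetical, and the table layout simply follows the id, name, age, phone example.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the user table mirrors the example above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, phone TEXT)"
)


def expand_star(conn, table):
    """Hypothetical helper: build the explicit statement that SELECT * stands for."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return f"SELECT {', '.join(columns)} FROM {table}"


expanded = expand_star(conn, "user")
print(expanded)

# Both forms resolve to the same result columns.
star_cols = [d[0] for d in conn.execute("SELECT * FROM user").description]
explicit_cols = [d[0] for d in conn.execute(expanded).description]
print(star_cols == explicit_cols)
```

The metadata lookup is one extra step per statement, which fits the feeling that this cost is real but small.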

To be honest, I can understand the claim that this increases the parsing cost.

But I don't think the cost is that great, unless the table is very wide, with a huge number of columns to look up?

So I can accept this reason, but only partly.

2) Adding or removing fields easily becomes inconsistent with the resultMap configuration

I don't have much to say about this one. To be honest, I sometimes write select * (when I need every column of the table): when I add a field to the entity, I update the resultMap and don't have to touch the SQL at all.

This point is something that everyday coding conventions already steer you away from, so I won't say much.
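For readers who want to see the mismatch the manual is worried about, here is a rough sketch. It is not MyBatis: `EXPECTED_FIELDS` is a hypothetical stand-in for a resultMap, and SQLite via Python's sqlite3 stands in for MySQL.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; EXPECTED_FIELDS plays the role of a resultMap.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO user (id, name, age) VALUES (1, 'alice', 30)")

EXPECTED_FIELDS = ("id", "name", "age")  # hypothetical mapping configuration


def unmapped_columns(conn, sql):
    """Return the result columns that the mapping does not know about."""
    cols = [d[0] for d in conn.execute(sql).description]
    return [c for c in cols if c not in EXPECTED_FIELDS]


before = unmapped_columns(conn, "SELECT * FROM user")

# Someone adds a column to the table but forgets the mapping.
conn.execute("ALTER TABLE user ADD COLUMN phone TEXT")

after_star = unmapped_columns(conn, "SELECT * FROM user")
after_explicit = unmapped_columns(conn, "SELECT id, name, age FROM user")
print(before, after_star, after_explicit)
```

With an explicit column list the query result keeps matching the mapping after the schema change; with select * it silently drifts.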

3) Useless fields increase network consumption and disk IO overhead

This is important.

Look again at the sketch from the first point. Leaving the cache out of consideration, the statement eventually reaches the executor, and behind the executor is the engine layer.

I will not expand on the engine layer here. It involves writing various logs (undo, redo, and so on; the binlog is written by the server layer rather than the engine) as well as finding data in memory.

To put it simply, a query like this ultimately has to read data from disk, which involves disk I/O overhead.

So when that data is read, does select * really increase the disk I/O overhead?

The answer is definitely yes. But how much it increases deserves a closer look:

Say the table has only three fields, id, name, and age, and the query originally needed just id and name. Because of select *, it now fetches age as well. Does that increase the disk I/O overhead?

I think it does, but the increase is almost negligible. These are ordinary data types, so the overhead barely grows.

So where is the real landmine?

Large fields.

For example:

tinytext, text, mediumtext, longtext
tinyblob, blob, mediumblob, longblob

In MySQL, each value of these types is handled as an independent object.

That's when you really have to be cautious.

Take a table with many fields, such as a feedback table: the length of a message is unpredictable, so the message field uses text, and the reply field uses text as well.

Another example is a blog post table, which uses these large fields to store the content.

I originally only wanted the name of the person who gave the feedback, or the title of the blog post, but because I was lazy or careless I wrote select *, and the large fields came along with the query.

So the data actually read is much larger than what I intended (perhaps Xiaodan, a property owner, complained about the security guards and wrote a short essay in the feedback message). Because so much content is read, the disk I/O overhead is high and the data packets returned to the client are large as well, and that really does have an impact.
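A small sketch makes the size gap concrete. The assumptions: SQLite (Python's sqlite3) stands in for MySQL, the feedback table and its columns are made up for illustration, and the number of characters handed to the client is used as a crude proxy for network and read volume. It does not measure real disk I/O.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feedback ("
    "id INTEGER PRIMARY KEY, author TEXT, message TEXT, reply TEXT)"
)
# 100 feedback rows, each with a long message and a long reply.
rows = [(i, f"user{i}", "m" * 20000, "r" * 20000) for i in range(100)]
conn.executemany("INSERT INTO feedback VALUES (?, ?, ?, ?)", rows)


def payload_chars(conn, sql):
    """Total number of characters returned to the client for a query."""
    return sum(len(str(value)) for row in conn.execute(sql) for value in row)


narrow = payload_chars(conn, "SELECT author FROM feedback")
wide = payload_chars(conn, "SELECT * FROM feedback")
print(narrow, wide)
```

The query that only asks for the author name returns a few hundred characters, while select * drags every message and reply along with it.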

4) Supplement

Index coverage cannot be used

With select *, you basically say goodbye to index coverage (a covering index).

What is index coverage?

Example:

Build an index on the name field. If a query uses only the indexed field, that is index coverage.

picture

That is, the index alone already contains every field the query has to return, so no additional lookup is needed. That is index coverage, and it is bound to be fast.

If the intention was only to look up name but the query was written as select *, many other fields have to be fetched too. Those fields are not in the index, so index coverage cannot apply, and an extra lookup back to the table is required, which is slower.

Back to the topic: because of select *, more fields have to be fetched, those fields are not indexed, and the result is a slow trip back to the table.
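The difference shows up in the query plan. The sketch below uses SQLite through Python's sqlite3 as a stand-in, so the plan wording is SQLite's and not MySQL's (in MySQL you would look for "Using index" in the Extra column of EXPLAIN). The table, the index name idx_user_name, and the helper `plan` are assumptions for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, phone TEXT)"
)
conn.execute("CREATE INDEX idx_user_name ON user (name)")


def plan(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


covered = plan(conn, "SELECT name FROM user WHERE name = 'alice'")
not_covered = plan(conn, "SELECT * FROM user WHERE name = 'alice'")
print(covered)
print(not_covered)
```

Both queries use the index to find the row, but only the first is answered from the index alone; the second still has to go back to the table for age and phone.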

picture

So where is the problem? Is it that the other fields are not indexed?

Then just build indexes on the other fields and be done with it, right, brothers?

Don't mess around like that. The maintenance cost of indexes cannot be ignored.

Every insert, update, and delete carries the cost of maintaining the indexes, including the splitting and merging of index pages. Indexes also have to be stored, so they take up disk space. And if all N fields are indexed, changing a single row can mean maintaining up to N indexes.
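The storage side of that cost is easy to see in a toy setup. This sketch again assumes SQLite (Python's sqlite3) in place of MySQL, with made-up data and a hypothetical helper `pages_used`; it counts database pages and says nothing about MySQL's actual page splits or timings.

```python
import sqlite3


def pages_used(index_columns):
    """Load the same 20,000 rows and report how many database pages are in use."""
    # Assumption: SQLite stands in for MySQL; names and data are illustrative.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, phone TEXT)"
    )
    for col in index_columns:
        conn.execute(f"CREATE INDEX idx_user_{col} ON user ({col})")
    # Every insert below must also update each index created above.
    conn.executemany(
        "INSERT INTO user VALUES (?, ?, ?, ?)",
        [(i, f"name{i}", i % 90, f"555{i:07d}") for i in range(20000)],
    )
    conn.commit()
    return conn.execute("PRAGMA page_count").fetchone()[0]


no_index = pages_used([])
one_index = pages_used(["name"])
three_indexes = pages_used(["name", "age", "phone"])
print(no_index, one_index, three_indexes)
```

Each extra index adds pages that have to be stored and kept up to date on every write, which is the price of indexing fields just to rescue a select *.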

What does that look like in practice? It is like writing a Word document with a table of contents: every change to the second-level headings, third-level headings, body text, or pagination means the table of contents has to be refreshed.

So is the impact of index coverage really that big?

Take a table with 2 million rows (200W), drop all of its indexes, and add an index on platform_sn alone.

First try the query that the index covers and look at the time spent: 0.02 seconds.

Then replace it with select *: 0.179 seconds.

Of course, this is with 2 million rows, but the gap is still obvious: 0.02 versus 0.179 seconds.

And if we added a few large fields, such as text? That would be really outrageous.

Objective summary:

  1. If the table has large fields, such as the TEXT and BLOB families, be careful about using SELECT *.
  2. If you only query one or two commonly used fields, you can build a single-column or composite index on them. In that case avoid SELECT * and trigger index coverage wherever possible.
  3. If the table has few fields, none of them of a special type, and the query needs multiple columns anyway so that index coverage cannot be triggered, I think SELECT * is fine. Alternatively, write out all the fields once, which is also convenient to copy (sometimes a field exists in the database but must not be returned; with an explicit list you only need to delete that one field from the select, which SELECT * cannot do).


Origin blog.csdn.net/qq_43842093/article/details/131869411