MySQL in Practice 36 | Why can temporary tables have the same name?

In the previous article, we used a temporary table while optimizing a join query. This is how we used it:

create temporary table temp_t like t1;
alter table temp_t add index(b);
insert into temp_t select * from t2 where b>=1 and b<=2000;
select * from t1 join temp_t on (t1.b=temp_t.b);

You may wonder why a temporary table is needed here. Wouldn't an ordinary table work just as well?

Today we start from this question: what are the characteristics of a temporary table, and why is it suitable for this scenario?

First, I need to clear up a common misunderstanding: some people think that a temporary table is a memory table. The two concepts are completely different.

  • A memory table is a table that uses the Memory engine, created with the syntax create table ... engine=memory. Its data is kept in memory and is cleared when the system restarts, but the table structure remains. Apart from these two somewhat "odd" properties, it behaves like a normal table.
  • A temporary table can use any of several engine types. If it uses the InnoDB or MyISAM engine, its data is written to disk. Of course, a temporary table can also use the Memory engine.

Now that the difference between memory tables and temporary tables is clear, let's look at the characteristics of a temporary table.

Characteristics of a temporary table

To make this easier to understand, let's look at the following sequence of operations:


Figure 1: Example of temporary table characteristics

As you can see, a temporary table has the following characteristics:

  1. The creation syntax is create temporary table ....
  2. A temporary table can be accessed only by the session that created it and is invisible to other threads. So the temporary table t created by session A in the figure is invisible to session B.
  3. A temporary table can have the same name as an ordinary table.
  4. When session A has both a temporary table and an ordinary table with the same name, show create statements and insert, delete, update, and select statements access the temporary table.
  5. The show tables command does not display temporary tables.

Because a temporary table can be accessed only by the session that created it, it is dropped automatically when that session ends. This property is what makes temporary tables particularly suitable for the join optimization scenario at the beginning of this article. Why?

There are two main reasons:

  1. Temporary tables in different sessions can have the same name. If multiple sessions run the join optimization at the same time, there is no need to worry about table creation failing because of duplicate table names.
  2. There is no need to worry about cleaning up data. With an ordinary table, if the client disconnects abnormally or the database restarts abnormally during the process, the intermediate table left behind has to be cleaned up separately. Because temporary tables are reclaimed automatically, this extra step is unnecessary.

Application of temporary tables

Because there is no need to worry about name conflicts between threads, temporary tables are often used when optimizing complex queries. Cross-database queries in a sharded system (one that splits data across databases and tables) are a typical use case.

In a typical sharding scenario, one logically large table is spread across different database instances. For example, a large table ht is split by field f into 1024 sub-tables, which are then distributed across 32 database instances, as shown below:


Figure 2: Schematic of sharding across databases and tables

In general, such a sharded system has a proxy as an intermediate layer. However, some designs let the client connect to the databases directly, with no proxy layer.

In this architecture, the partition key is chosen so as to reduce cross-database and cross-table queries. If most statements include an equality condition on f, then f should be the partition key. That way, once the proxy layer has parsed a SQL statement, it can determine which sub-table to route the statement to.

For example, the following statement:

select v from ht where f=N;

Here, the sub-table rule (for example, N % 1024) tells us which sub-table holds the required data. A statement like this needs to access only one sub-table, and it is the kind of statement a sharding scheme welcomes most.
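As a rough illustration of the routing step, the sketch below maps a value of f to a sub-table using the N % 1024 rule. How sub-tables are assigned to the 32 instances is not stated in this article, so the contiguous-block layout, the ht_<n> naming, and the function name route are assumptions made only for this example.

```python
from __future__ import annotations

# Hypothetical routing helper for a sharded table `ht`.
# From the article: 1024 sub-tables, 32 instances, rule N % 1024.
# Assumed for illustration: sub-tables are named ht_0..ht_1023 and
# each instance holds a contiguous block of 1024 / 32 = 32 sub-tables.

SUB_TABLES = 1024
INSTANCES = 32
TABLES_PER_INSTANCE = SUB_TABLES // INSTANCES  # 32


def route(f_value: int) -> tuple[int, str]:
    """Return (instance index, sub-table name) for a partition-key value."""
    table_no = f_value % SUB_TABLES
    instance_no = table_no // TABLES_PER_INSTANCE
    return instance_no, f"ht_{table_no}"


print(route(5))     # (0, 'ht_5')
print(route(1029))  # (0, 'ht_5')
print(route(2047))  # (31, 'ht_1023')
```

A query with an equality condition on f touches exactly one (instance, sub-table) pair, which is why such statements are cheap to route.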

However, suppose the table also has an index on k, and the query looks like this:

select v from ht where k >= M order by t_modified desc limit 100;

Since this query does not use the partition field f, the only option is to find all rows that satisfy the condition in every partition and then do the order by in one place. There are two common approaches.

The first approach is to implement the sorting in the proxy layer's process code.

Its advantage is speed: once the data from the shards is fetched, the computation is done directly in memory. But the drawbacks are also fairly obvious:

  1. The development effort is relatively large. The statement in our example is fairly simple; if more complex operations such as group by or even join are involved, the demands on the middle layer's development capability are high.
  2. The pressure on the proxy is relatively high, and it can easily run short of memory or hit CPU bottlenecks.
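A minimal sketch of this first approach, assuming each shard returns its own rows already sorted by t_modified descending and cut to the limit. The row layout (v, k, t_modified), the sample data, and the function name merge_top_n are illustrative and not part of any real proxy.

```python
import heapq
from itertools import islice

# Each row is (v, k, t_modified). Every shard result is assumed to be
# already sorted by t_modified descending (each shard ran ORDER BY ... LIMIT).


def merge_top_n(shard_results, n=100):
    """Merge per-shard sorted results and keep the n newest rows overall."""
    merged = heapq.merge(*shard_results, key=lambda row: row[2], reverse=True)
    return [row[0] for row in islice(merged, n)]  # the query selects only v


shard_a = [("a3", 10, 300), ("a1", 11, 100)]
shard_b = [("b2", 12, 250), ("b0", 13, 50)]
print(merge_top_n([shard_a, shard_b], n=3))  # ['a3', 'b2', 'a1']
```

Even in this simple form, all partial results sit in the proxy's memory at once, which hints at the memory and CPU pressure described above.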

The other approach is to gather the data from each shard into a table on a single MySQL instance, and then do the logical operations on that summary instance.

For the statement above, the execution flow looks something like this:

  • On the summary instance, create a temporary table temp_ht with three fields: v, k, and t_modified;
  • On each shard, execute

select v,k,t_modified from ht_x where k >= M order by t_modified desc limit 100;

  • Insert the results from each shard into temp_ht;
  • Execute

select v from temp_ht order by t_modified desc limit 100;

to get the result.

The flowchart for this process is as follows:


Figure 3: Flow of a cross-database query
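The flow above can be tried end to end in a small sketch. It uses Python's built-in sqlite3 module as a stand-in for the MySQL summary instance, purely so the example runs without a server. The shard data is made up, the limit is reduced to 3 to keep the output short, and SQLite's temporary tables are not the same implementation as MySQL's.

```python
import sqlite3

# Made-up per-shard results of:
#   select v,k,t_modified from ht_x where k >= M order by t_modified desc limit 100;
shard_results = [
    [("a3", 10, 300), ("a1", 11, 100)],
    [("b2", 12, 250), ("b0", 13, 50)],
]

# Stand-in for the summary instance (an assumption: SQLite, not MySQL).
summary = sqlite3.connect(":memory:")
summary.execute("create temporary table temp_ht (v text, k int, t_modified int)")

# Insert every shard's partial result into the temporary table.
for rows in shard_results:
    summary.executemany("insert into temp_ht values (?, ?, ?)", rows)

# The final ordering and limit are applied once, on the summary instance.
top = [v for (v,) in summary.execute(
    "select v from temp_ht order by t_modified desc limit 3")]
print(top)  # ['a3', 'b2', 'a1']
summary.close()
```

Compared with sorting in the proxy, the sorting work here is handed to a database engine, at the cost of an extra round of inserts.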

In practice, we often find that the computing capacity of each shard is not saturated, so the temporary table temp_ht is placed directly on one of the 32 shards. The query logic is then similar to Figure 3, and you can work through the specific flow yourself.

Why can temporary tables have the same name?

You may ask: how is it that different threads can create temporary tables with the same name?

Let's look at this next.

When we execute the statement

create temporary table temp_t(id int primary key)engine=innodb;

MySQL creates an frm file for this InnoDB table to store the table structure definition, and it also needs a place to store the table data.

The frm file is placed in the temporary file directory. Its suffix is .frm and its prefix is "#sql{process id}_{thread id}_{serial number}". You can use the select @@tmpdir command to display the instance's temporary file directory.

As for how the table data is stored, different MySQL versions handle it differently:

  • In version 5.6 and earlier, MySQL creates a file with the same prefix and the suffix .ibd in the temporary file directory to hold the data;
  • Starting with version 5.7, MySQL introduced a temporary tablespace dedicated to temporary table data, so there is no longer any need to create an ibd file.

From the file name prefix rule we can see that when we create an InnoDB temporary table called t1, MySQL's storage layer sees a table whose name differs from that of the ordinary table t1. So even if the same database already contains an ordinary table t1, a temporary table t1 can still be created.

For ease of discussion later, I'll give you an example.


Figure 4: Table names of temporary tables

The process id of this MySQL process is 1234, the thread id of session A is 4, and the thread id of session B is 5. So, as you can see, the temporary tables created by session A and session B do not have the same file name on disk.
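The naming rule can be mimicked in a few lines. This is only a model of the rule as described in this article, with the three parts joined by underscores. The exact on-disk formatting MySQL uses (for example, how the numbers are encoded) is not covered here, the serial number 0 is an assumed value, and temp_frm_name is a hypothetical helper.

```python
def temp_frm_name(process_id, thread_id, serial):
    """Model of the rule '#sql{process id}_{thread id}_{serial number}.frm'."""
    return f"#sql{process_id}_{thread_id}_{serial}.frm"


# Values from the example: process id 1234, session A is thread 4,
# session B is thread 5. The serial number 0 is an assumption.
name_a = temp_frm_name(1234, 4, 0)
name_b = temp_frm_name(1234, 5, 0)
print(name_a)
print(name_b)
print(name_a != name_b)  # True: same table name, different files
```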

To maintain tables, MySQL needs more than the physical files; it also needs an in-memory mechanism to tell tables apart. Each table corresponds to a table_def_key.

  • The table_def_key of an ordinary table is derived from "database name + table name". So if you try to create two ordinary tables with the same name in the same database, creation of the second one finds that the table_def_key already exists.
  • For a temporary table, the table_def_key adds "server_id + thread_id" on top of "database name + table name".

In other words, the two temporary tables t1 created by session A and session B have different table_def_key values and different file names on disk, so they can coexist.

In the implementation, each thread maintains its own linked list of temporary tables. Whenever a session operates on a table, it first traverses this list to check for a temporary table with that name. If one exists, the temporary table takes priority; if not, the ordinary table is used. When the session ends, a "DROP TEMPORARY TABLE + table name" operation is performed for each temporary table in the list.
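To make the lookup order concrete, here is a toy model of the mechanism just described. The class and method names are invented for this sketch, the keys are plain tuples, and a Python list stands in for the linked list; it does not reflect MySQL's actual data structures.

```python
class Server:
    """Toy model: ordinary tables keyed by (db, table)."""

    def __init__(self, server_id):
        self.server_id = server_id
        self.tables = {}  # table_def_key -> description

    def create_table(self, db, name):
        key = (db, name)
        if key in self.tables:
            raise ValueError(f"table {name} already exists")
        self.tables[key] = f"ordinary {db}.{name}"


class Session:
    """Toy model: each session keeps its own list of temporary tables."""

    def __init__(self, server, thread_id):
        self.server = server
        self.thread_id = thread_id
        self.temp_tables = []  # list of (table_def_key, description)

    def create_temporary_table(self, db, name):
        # The key adds server_id and thread_id to "db + table name".
        key = (db, name, self.server.server_id, self.thread_id)
        if any(existing == key for existing, _ in self.temp_tables):
            raise ValueError(f"temporary table {name} already exists")
        desc = f"temporary {db}.{name} of thread {self.thread_id}"
        self.temp_tables.append((key, desc))

    def open_table(self, db, name):
        # Temporary tables of this session are checked first.
        for key, desc in self.temp_tables:
            if key[:2] == (db, name):
                return desc
        return self.server.tables[(db, name)]

    def close(self):
        # On session end, every temporary table in the list is dropped.
        dropped = [f"DROP TEMPORARY TABLE {key[1]}" for key, _ in self.temp_tables]
        self.temp_tables.clear()
        return dropped


server = Server(server_id=1)
server.create_table("test", "t1")
a, b = Session(server, 4), Session(server, 5)
a.create_temporary_table("test", "t1")

seen_by_a = a.open_table("test", "t1")
seen_by_b = b.open_table("test", "t1")
dropped = a.close()
print(seen_by_a)  # temporary test.t1 of thread 4
print(seen_by_b)  # ordinary test.t1
print(dropped)    # ['DROP TEMPORARY TABLE t1']
```

Session B never sees session A's list, which is the whole reason the same name can be reused across sessions.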

At this point you will notice that the DROP TEMPORARY TABLE command is also recorded in the binlog. You may wonder: since a temporary table can be accessed only within its own thread, why does it need to be written to the binlog?

To answer that, we need to talk about primary-standby replication.

Temporary tables and primary-standby replication

Since it is written to the binlog, the standby must need it.

Imagine executing the following sequence of statements on the primary:

create table t_normal(id int primary key, c int)engine=innodb;/*Q1*/
create temporary table temp_t like t_normal;/*Q2*/
insert into temp_t values(1,1);/*Q3*/
insert into t_normal select * from temp_t;/*Q4*/

If operations on the temporary table were not logged, the standby would receive only the binlog entries for two statements: create table t_normal and insert into t_normal select * from temp_t. When the standby executed insert into t_normal, it would fail with a "table temp_t does not exist" error.

You might say: wouldn't setting the binlog format to row solve this? With the row format, what is recorded for insert into t_normal is the data of the operation itself, that is, a write_row event whose logical content is "insert a row (1,1)".

That is indeed the case. If binlog_format=row, statements involving temporary tables are not recorded in the binlog. In other words, operations on temporary tables are recorded in the binlog only when binlog_format=statement or mixed.

In that case, the statement that creates the temporary table is passed to the standby and executed there, so the standby's replication thread creates the temporary table as well. When the thread on the primary exits, the temporary table is dropped automatically, but the standby's replication thread keeps running. So at that moment the primary needs to write a DROP TEMPORARY TABLE to the binlog for the standby to execute.

Someone once asked me an interesting question: when MySQL records the binlog, create table and alter table statements are recorded exactly as written, with even the spaces unchanged. But if you execute drop table t_normal, the binlog records:

DROP TABLE `t_normal` /* generated by server */

That is, the statement has been rewritten into a standard format. Why?

Now you know the reason: a drop table command can drop several tables at once. In the example above, with binlog_format=row, if "drop table t_normal, temp_t" is executed on the primary, the binlog can record only:

DROP TABLE `t_normal` /* generated by server */

Because the standby has no table temp_t, the command must be rewritten before it is passed to the standby; only then will it not cause the standby's replication thread to stop.

So when a drop table command is recorded in the binlog, the statement has to be rewritten. The "/* generated by server */" comment indicates that this command was rewritten by the server.

Primary-standby replication raises another question: creating temporary tables with the same name in different threads on the primary is fine, but how is that handled when the statements are passed to the standby and executed there?

Here is an example. In the sequence below, instance S is the standby of M.


Two sessions on primary M each create a temporary table named t1, and both create temporary table t1 statements are passed to standby S.

However, the standby's log-applying thread is shared, which means the create statement is executed twice within the same applier thread. (Even with multi-threaded replication enabled, both may be assigned to the same worker on the standby.) Won't this cause the replication thread to report an error?

Clearly not; otherwise temporary tables would be a bug. In other words, while executing, the standby thread has to treat these two t1 tables as two different temporary tables. How is that done?

When MySQL records the binlog, it also writes the thread id of the primary thread that executed the statement. The standby's applier thread therefore knows the primary thread id for each statement, and it uses that thread id to construct the temporary table's table_def_key:

  1. The table_def_key of session A's temporary table t1 on the standby is: database name + t1 + "server_id of M" + "thread_id of session A";
  2. The table_def_key of session B's temporary table t1 on the standby is: database name + t1 + "server_id of M" + "thread_id of session B".

Because the table_def_key values differ, the two tables do not conflict inside the standby's applier thread.
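A sketch of why a single applier thread on the standby does not see a conflict. The event layout (a dict with db, table, server_id, and thread_id), the server_id value 1, and the class name ReplicaApplier are made up for illustration; only the idea of building the key from the primary's thread id comes from the text.

```python
def table_def_key(db, table, server_id, thread_id):
    """Key as described in the text: db + table name + server_id + thread_id."""
    return (db, table, server_id, thread_id)


class ReplicaApplier:
    """Toy model of one applier thread holding temporary tables
    that originate from many sessions on the primary."""

    def __init__(self):
        self.temp_tables = {}

    def apply_create_temporary(self, event):
        key = table_def_key(event["db"], event["table"],
                            event["server_id"], event["thread_id"])
        if key in self.temp_tables:
            raise ValueError("temporary table already exists")
        self.temp_tables[key] = []


applier = ReplicaApplier()
# Two sessions on primary M (assumed server_id 1) both ran:
#   create temporary table t1 ...
applier.apply_create_temporary(
    {"db": "test", "table": "t1", "server_id": 1, "thread_id": 4})
applier.apply_create_temporary(
    {"db": "test", "table": "t1", "server_id": 1, "thread_id": 5})
print(len(applier.temp_tables))  # 2
```

If the key were built from the applier's own thread id instead, the second create would collide with the first.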

Summary

In this article, I introduced the usage and characteristics of temporary tables.

In practice, temporary tables are generally used for more complex computation logic. Because a temporary table is visible only to its own thread, there is no need to worry about duplicate temporary table names when multiple threads run the same processing logic. When the thread exits, the temporary table is dropped automatically, which saves cleanup and exception-handling work.

With binlog_format='row', operations on temporary tables are not recorded in the binlog, which also saves a lot of trouble. This may be one more thing to consider when choosing binlog_format.

Note that the temporary tables discussed above are created by users themselves, so they can also be called user temporary tables. Their counterpart is the internal temporary table, which I introduced in article 17.

Finally, I'll leave you with a question to think about.

The following sequence of statements creates a temporary table and then renames it:

Figure 6: Question about renaming a temporary table

As we can see, the alter table syntax can be used to change a temporary table's name, but the rename syntax cannot. Do you know why?

You can write your analysis in the comments, and I will discuss the question with you at the end of the next article. Thank you for listening, and feel free to share this article with more friends.

Last time's question

The question from the previous article was about the following three-table join statement:

select * from t1 join t2 on(t1.a=t2.a) join t3 on (t2.b=t3.b) where t1.c>=X and t2.c>=Y and t3.c>=Z;

If it is rewritten with straight_join, how should the join order be specified, and what indexes should be created on the three tables?

The first principle is to use the BKA algorithm wherever possible. Note that with BKA, the execution is not "join two tables first, then join the result with the third table"; the lookups are nested directly.

Concretely: among the three conditions t1.c>=X, t2.c>=Y, and t3.c>=Z, pick the table that has the least data left after filtering and make it the first driving table. Two situations can then arise.

In the first case, if the chosen table is t1 or t3, the rest is fixed.

  1. If the driving table is t1, the join order is t1->t2->t3, and indexes are needed on the join fields of the driven tables, that is, on t2.a and t3.b;
  2. If the driving table is t3, the join order is t3->t2->t1, and indexes are needed on t2.b and t1.a.

In both cases, an index is also needed on field c of the first driving table.

In the second case, if the chosen first driving table is t2, the filtering effect of the other two conditions needs to be evaluated.

In short, the overall idea is to make the data set of the driving table that takes part in each join as small as possible, because that gives us a smaller driving table.

Reproduced from: https://juejin.im/post/5d05ef7fe51d4577407b1d2e


Origin blog.csdn.net/weixin_34301307/article/details/93183426