MySQL workflow


1 MySQL architecture

  MySQL is divided into two layers: the server layer and the storage engine layer.

1.1 Server layer

  

  • Connector: manages connections and authenticates clients

  • Query cache: on a cache hit, returns the stored query result directly

  • Analyzer: performs lexical and syntax analysis

  • Optimizer: generates the execution plan and chooses indexes

  • Executor: calls the storage engine to execute the plan and returns the result

1.2 Storage engine

  The storage engine layer is responsible for storing and retrieving data, and it uses a pluggable architecture. Since MySQL 5.5.5, InnoDB has been the default storage engine.

  

Comparison of the common storage engines:

  • InnoDB: supports transactions and foreign keys. InnoDB uses a clustered index: data and index are stored together and every table must have a primary key, so primary-key lookups are very efficient. A query through a secondary index, however, needs two lookups: first find the primary key in the secondary index, then fetch the row by primary key. It did not support full-text indexes before MySQL 5.6.

  • MyISAM: does not support transactions or foreign keys. MyISAM uses a non-clustered index: the data and index files are separate, and the index stores pointers into the data file. The primary-key index and secondary indexes are independent of each other. MyISAM read queries are generally faster than InnoDB's, so in a read/write-splitting setup InnoDB is commonly chosen for the master and MyISAM for the slaves.

  • Memory: has relatively few usage scenarios because of a major defect: data is stored only in memory, so if the mysqld process crashes, or the machine restarts or shuts down, the data disappears.

1.3 How a SQL statement executes

  Step 1: the client connects to the connector, which authenticates the client and maintains and manages the connection. If a connection stays idle for too long without receiving instructions, it is disconnected; the default idle timeout is 8 hours, controlled by the wait_timeout parameter.
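  The timeout can be checked and adjusted with ordinary variable commands; a quick sketch (28800 seconds is the documented default):

    show variables like 'wait_timeout';
    set global wait_timeout=28800;  -- seconds; applies to connections opened after the change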

  Step 2: the SQL statement is sent to the MySQL server, and the query cache gets to work first. If the same SQL has been executed before and its result is cached, the result is returned directly to the client. Any update to a table invalidates every cached entry for that table, so the query cache is only worth considering for rarely-updated tables; for frequently-updated tables it does more harm than good. (The query cache was removed entirely in MySQL 8.0.) Caching can be requested per statement with SQL_CACHE:

  select SQL_CACHE * from table where xxx=xxx;

  Step 3: on a cache miss, the analyzer takes over. It identifies whether the statement is a SELECT, UPDATE, INSERT, and so on, and checks that the syntax is correct.

  Step 4: the optimizer decides which index to use and, for multi-table statements, the join order, based on the statement and the indexes available on the tables.

  Step 5: the executor runs the statement, calling the storage-engine interfaces to scan or traverse the table and read, update, or insert data.

2 MySQL logs

  2.1 Introduction to the logs

  MySQL has two important logs: the redo log and the binlog. The redo log belongs to InnoDB only, while the binlog belongs to the server layer. What are they for? When we update data in the database, both log files record the update.

  The redo log records the changes a transaction makes: it records the value after modification, and it is written whether or not the transaction has committed yet. It is used for crash recovery: when the database restarts, InnoDB uses this log to restore the state before the crash, guaranteeing that committed data is not lost. The redo log is a physical log: it records what changed in which data page. It has a fixed size and is written in a circular fashion, so newer records eventually overwrite the oldest ones.

  The binlog, also called the archive log, records all operations that change the MySQL database, excluding statements such as SELECT and SHOW. The binlog is a logical log: it records which operations were performed on which tables. It is append-only; later entries never overwrite earlier ones.

  2.2 The data update process

  An update proceeds roughly as follows: read the row into memory -> update the data -> write the redo log and set its state to prepare -> write the binlog -> commit the transaction -> set the redo log state to commit, at which point the change is official. The redo log commit is thus a "two-phase commit". Its purpose is to guarantee that the redo log and the binlog stay consistent: data recovery is performed from binlog backups, so the two logs must agree, or recovery would be inaccurate.

 

4 MySQL indexes

  4.1 Introduction to indexes

  By data structure, indexes can be divided into hash tables, ordered arrays, search trees, and skip lists:

  • Hash tables are suitable only for equality-query scenarios

  • Ordered arrays suit both equality and range queries, but updating an ordered-array index is very expensive, so it is best used for static tables

  • Search trees have stable lookup efficiency without large fluctuations, and for ordered index scans the sibling pointers let you move left and right quickly, which is very efficient

  • A skip list can be understood as a linked list optimized with multiple levels of forward pointers

  InnoDB uses the B+ tree as its index model, which is an N-ary tree. Although a binary tree is the most efficient search structure in memory, an index must be written to disk, and with a binary tree the disk I/O would become far too frequent. InnoDB divides its indexes into primary-key indexes (clustered indexes) and non-primary-key indexes (secondary indexes). A primary-key index leaf holds the entire row; a secondary index leaf holds only the primary key of the row. So a query through a secondary index first finds the primary key and then goes back to the table to fetch the row (a back-to-table lookup), while a query through the primary-key index needs no such extra step.

With secondary indexes, a covering index can be used to optimize SQL; compare the following two statements:

  select * from table where key=1;

  select id from table where key=1;

  Here key is a secondary index. The first statement first finds the ids from the index and then fetches the actual rows by id, going back to the table. The second statement needs no back-to-table lookup: the index returns the data directly. In the second statement the index on key covers everything the query needs; this is called a covering index.
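  A minimal sketch of the difference, using a hypothetical table t (all names are illustrative):

    create table t (
      id int primary key,
      key1 int,
      val varchar(32),
      index idx_key1 (key1)
    ) engine=InnoDB;

    explain select * from t where key1=1;   -- uses idx_key1, then back-to-table lookups by id
    explain select id from t where key1=1;  -- Extra shows "Using index": covered, no back-to-table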

  4.2 Normal indexes and unique indexes

  InnoDB reads and writes data in units of data pages: when a row is needed for the first time, the whole page containing it is read into memory, rather than reading the single row from disk. The default page size is 16KB.

When a data page needs to be updated: if the page is already in memory, it is updated directly. If not, then provided data consistency is not compromised, the update is first cached in the change buffer, so the page does not have to be read from disk immediately. The next time a query accesses that page, the page is read into memory and the cached change buffer operations are applied to it; a background thread also periodically writes the change buffer contents to disk. For a unique index, every update must first check whether it violates the uniqueness constraint, which requires reading the page into memory anyway, so unique-index updates cannot use the change buffer; only normal indexes can. A unique-index update therefore does one more uniqueness check than a normal-index update.
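  The relevant InnoDB settings can be inspected directly; a quick check (the defaults noted are for MySQL 5.7):

    show variables like 'innodb_change_buffering';        -- which operations are buffered, default 'all'
    show variables like 'innodb_change_buffer_max_size';  -- max percent of the buffer pool, default 25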

  4.3 Composite (joint) indexes

  An index on two or more columns is called a composite index (joint index). A composite index can reduce indexing costs: taking a composite index (a, b, c) as an example, creating it effectively gives you three indexes: a, (a, b), and (a, b, c). MySQL matches index columns from left to right, so a query can use just part of the index, but only the leftmost part, and the index is most effective when the leftmost column is compared to a constant. This is the leftmost-prefix principle. It follows that the column order in a composite index matters, so which column goes first deserves some thought. Composite indexes also bring another piece of knowledge: index condition pushdown (ICP; see the sketch after the feature list below). Assuming a composite index (a, b, c), consider this SQL:

    select * from table where a=xxx and b=xxx;

  This SQL filters on two conditions, a=xxx and b=xxx. Without index condition pushdown, the index is used only for a=xxx: the matching primary keys are found, the full rows are fetched back from the table, and those rows are then filtered on b=xxx. With index condition pushdown, the b=xxx condition is evaluated inside the storage engine while scanning the index, so only the entries satisfying both conditions trigger a back-to-table lookup, reducing the number of table accesses.

Features of index condition pushdown:

  • For InnoDB tables, index condition pushdown applies only to secondary indexes

  • ICP is used when the WHERE clause has conditions on multiple columns of the composite index but the queried fields are not fully covered by the index (otherwise a covering index would apply)
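  A sketch of the leftmost-prefix principle and of index condition pushdown on a hypothetical composite index (table and column names are illustrative):

    alter table t add index idx_abc (a, b, c);

    select * from t where a=1;           -- can use idx_abc (leftmost column)
    select * from t where a=1 and b=2;   -- can use idx_abc (leftmost prefix a, b)
    select * from t where b=2 and c=3;   -- cannot use idx_abc: it skips the leftmost column a

    -- a>1 uses the index as a range; b=2 cannot narrow that range, but with index
    -- condition pushdown it is filtered inside the engine during the index scan,
    -- and EXPLAIN shows "Using index condition"
    explain select * from t where a>1 and b=2;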

  4.4 How the optimizer chooses indexes

  Once indexes exist, a statement may be able to use more than one of them; the optimizer then selects the appropriate index. Its goal is to find the optimal execution plan and execute the statement at the lowest cost. How does it decide? The optimizer prefers the index that scans the fewest rows, while also weighing factors such as whether a temporary table or a sort would be needed. Before executing the SQL, MySQL does not know exactly how many records satisfy the conditions; it can only estimate from statistics, and those statistics are obtained by sampling the data.

  4.5 Other index knowledge

  Sometimes you need to index a long character column; such an index is slow and takes a lot of space. You can usually index just the first part of the column's characters: this is a prefix index. It greatly reduces index space, which improves index efficiency, but it also reduces the index's selectivity.
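  A sketch on a hypothetical user table: first estimate how selective a prefix is, then create the prefix index:

    -- fraction of distinct values that remains if only the first 8 characters are indexed
    select count(distinct left(email, 8)) / count(*) from user;

    -- index only the first 8 characters of email
    alter table user add index idx_email (email(8));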

  The impact of dirty pages on the database:

  When a data page in memory differs from its copy on disk, the memory page is called a dirty page; once the page has been written back and the two copies match, it is called a clean page. When data to be read is not in memory and memory is full, some pages must be evicted: a clean page can be evicted directly, but a dirty page must be flushed to disk first. If a query forces too many dirty pages to be evicted, the query takes noticeably longer. To reduce the impact of dirty pages on performance, InnoDB controls the proportion of dirty pages and the timing of dirty-page flushing.
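  Two of the knobs InnoDB exposes for this, checked with ordinary commands (the defaults noted are for 5.7):

    show variables like 'innodb_max_dirty_pages_pct';  -- flush more aggressively above this ratio, default 75
    show variables like 'innodb_io_capacity';          -- disk IOPS hint used to pace background flushing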

5 MySQL statement analysis and optimization

 5.1 count(*)

  For InnoDB, count(*) has to read data from disk and count it up row by row; the MyISAM engine stores the table's row count on disk, so count(*) without a WHERE condition returns that number directly (with a WHERE condition, MyISAM behaves the same as InnoDB). So how can count(*) be optimized? One idea is a cache, but then you must take care of dual-write consistency between cache and database (dual-write consistency is covered in a later chapter). You can also design a dedicated table to store the count.
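  A sketch of the counter-table idea: because InnoDB transactions are atomic, updating the counter in the same transaction as the insert keeps the count consistent (table and column names are illustrative):

    begin;
    insert into orders (user_id, amount) values (1, 99);
    update table_counts set cnt = cnt + 1 where table_name = 'orders';
    commit;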

  For count(primary key id), InnoDB traverses the whole table, takes the id value out of each row, and returns it to the server layer; the server layer receives the id, judges that it cannot be null, and accumulates the row. For count(1), InnoDB traverses the whole table but takes out no values; for each row returned, the server layer puts the number "1" in, judges it cannot be null, and accumulates. Comparing just these two, count(1) is faster than count(primary key id), because returning the id involves parsing the data row and copying the field value out of the engine. For count(field): if the field is defined NOT NULL, it is read out of each record row by row, judged not null, and accumulated; if the field is defined as nullable, each value must additionally be checked when executed, and only non-null values are accumulated. As for count(*), no field is taken out at all: it is specially optimized to accumulate rows without fetching values. So the efficiency ranking is:

count(*) = count(1) > count(id) > count(field)

 5.2 order by

  MySQL allocates a block of memory to each thread for sorting, called the sort_buffer. A SQL statement that sorts goes through roughly this process: allocate the sort_buffer, query the qualifying rows, put the needed fields into the sort memory, quick-sort the data in memory, and return the result to the client. When there is too much data, the sort memory overflows and temporary disk files are used for an external merge sort. When single rows are too long (longer than max_length_for_sort_data), MySQL switches to rowid sorting to optimize: compared with full-field sorting, rowid sorting puts only the sort key and the row id into the sort_buffer, so after sorting it must go back to the table to fetch the rows. In some cases, a composite index plus a covering index can be used to optimize ORDER BY away entirely.
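  Whether a given sort fit in memory can be probed with standard variables and status counters:

    show variables like 'sort_buffer_size';         -- per-thread sort memory
    show variables like 'max_length_for_sort_data'; -- rows longer than this switch to rowid sorting
    -- after running the query:
    show status like 'Sort_merge_passes';           -- greater than 0 means the sort spilled to temp files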

 5.3 join

  Before understanding join, we must first understand the concept of the driving table: when two tables are joined, there is a driving table and a driven table. The driving table is also called the outer table (R table); the driven table is also called the inner table (S table). In general the smaller table is chosen as the driving table (with a join condition specified, the table with fewer rows satisfying the condition becomes the driving table; with no join condition, the table with fewer rows does; this is what the MySQL optimizer itself does).

Suppose we have this SQL (xxx is an indexed column):

    select * from table1 left join table2 on table1.xxx=table2.xxx;

  This statement executes by traversing table1 and, for each row, taking the value of xxx and looking up the matching records in table2. The process is just like the nested loops we write in a program, and the index on the driven table can be used; this query algorithm is called NLJ (Index Nested-Loop Join). When xxx is not indexed, using NLJ would mean a full scan of table2 for every row taken from table1, and the number of scanned rows would explode. In that case MySQL uses a different strategy: it first reads the table1 data into a memory area called the join_buffer, then takes each row of table2 in turn and compares it with the join_buffer contents, collecting the rows that satisfy the join condition as the result set (Block Nested-Loop Join).
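  EXPLAIN makes the two strategies visible; a sketch against the query above:

    explain select * from table1 left join table2 on table1.xxx=table2.xxx;
    -- with an index on table2.xxx: table2 shows type=ref (NLJ, index on the driven table)
    -- without one: Extra shows "Using join buffer (Block Nested Loop)"
    show variables like 'join_buffer_size';  -- size of the join_buffer used by the fallback strategy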

When using join, follow these points:

  • Let the small table drive the large table.

  • Prefer join when the driven table's join column is indexed (so the NLJ algorithm is used)

 5.4 sql optimization

  1) In MySQL, if you apply a function to a column in the WHERE clause, the index on that column cannot be used.

    For example (data is an indexed column):

    select * from tradelog where month(data)=1;

    For such SQL the optimizer gives up walking the search tree, because it cannot map month(data)=1 to a range of data values.
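    The usual rewrite expresses the same condition as ranges on the bare column, so the search tree can be used (the years listed are illustrative; month(data)=1 matches January of every year in the table):

    select * from tradelog
    where (data >= '2018-01-01' and data < '2018-02-01')
       or (data >= '2019-01-01' and data < '2019-02-01');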

  2) Implicit type conversion can defeat the index.

    For example:

    select * from table where xxx=110717;

    Here xxx is a varchar column. When MySQL compares a string with a number, it converts the string to a number first, so this statement is equivalent to applying CAST(xxx AS SIGNED) to the column, which means the index cannot be used.
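    The fix is to keep the literal in the column's own type, so no cast is applied to the indexed column:

    select * from table where xxx='110717';  -- string literal vs varchar column: the index is usable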

  3) An indexed column involved in a calculation will not use the index.

  4) LIKE '%xxx' will not use the index; LIKE 'xxx%' will.

  5) Using OR in the WHERE clause prevents index use in InnoDB, though MyISAM can still use the index.

6 Execution plans and the slow query log

 6.1 Execution plans

  Prefix a query with explain to view that statement's execution plan, for example:

    EXPLAIN SELECT * FROM table

  This sql returns a table like this:

id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | extra
---|-------------|-------|------------|------|---------------|-----|---------|-----|------|----------|------
1  | simple      |       |            |      |               |     |         |     |      |          |

  This table is the SQL execution plan; by analyzing it we can understand how our SQL runs. The columns are:

    1) id: the sequence in which the select clauses or table operations are executed.

    2) select_type: the type of each SELECT clause (from simple to complex), including:

      • SIMPLE: the query contains no subqueries or UNION;

      • PRIMARY: the outermost query when the statement contains complex subparts;

      • SUBQUERY: a subquery in the SELECT list or WHERE clause is marked SUBQUERY;

      • DERIVED: a subquery contained in the FROM list is marked DERIVED;

      • UNION: the second and later SELECTs after a UNION are marked UNION;

      • UNION RESULT: the SELECT that reads the result from the UNION temporary table is marked UNION RESULT;

    3) type: how MySQL finds the desired rows in the table, also known as the "access type", including:

      • ALL: full table scan; MySQL traverses the whole table to find matching rows;

      • index: full index scan; differs from ALL only in that the index tree is traversed instead of the table;

      • range: index range scan; the scan starts at some point in the index and returns rows matching a range of values, common with BETWEEN, <, > and similar queries;

      • ref: non-unique index scan; returns all rows matching a single value, common when using a non-unique index or a non-unique prefix of a unique index;

      • eq_ref: unique index scan; for each index key there is at most one matching row in the table, common with primary-key or unique-index lookups;

      • const, system: used when MySQL can optimize part of the query into a constant, for example a primary key compared to a literal in the WHERE clause; system is a special case of const where the table has only one row;

      • NULL: MySQL resolves the statement during optimization without accessing any table or index.

    4) possible_keys: the indexes MySQL could use to find the rows in this table. An index is listed when it relates to a queried field, but it is not necessarily used by the query.

    5) key: the index MySQL actually used in the query; shown as NULL if no index was used.

    6) key_len: the number of bytes used in the index, indicating the length of the index that may be used in the query; it is calculated from the column definitions.

    7) ref: shows which columns or constants are compared against the index to select rows.

    8) rows: the estimated number of rows MySQL must examine to execute the query.

    9) Extra: other important information, including:

      • Using index: the corresponding select uses a covering index;

      • Using where: MySQL filters the result set with the WHERE clause;

      • Using temporary: MySQL needs a temporary table to store the result set, common with sorting and grouping queries;

      • Using filesort: MySQL cannot use an index to complete the sort; this is called a "filesort".

 6.2 slow query log

  MySQL supports a slow query log: SQL statements whose execution time exceeds a threshold are written, with related information, to this log. The threshold is specified by the long_query_time parameter, whose default value is 10, so statements taking more than 10 seconds are recorded in the slow query log. By default, MySQL does not enable the slow query log; it must be switched on manually by setting the parameter. The log can be written to a file or recorded into a database table.

Whether the slow query log is enabled can be checked with the following SQL:

    show variables like '%slow_query_log%';

  Enable the slow query log with:

    set global slow_query_log=1;

  Changing the setting via SQL takes effect only for the running server and is lost when MySQL restarts. To make it permanent, you must modify the my.cnf configuration file.
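  A minimal my.cnf sketch for enabling it permanently (the log file path is illustrative):

    [mysqld]
    slow_query_log = 1
    slow_query_log_file = /var/log/mysql/slow.log
    long_query_time = 4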

  View and modify the slow-query threshold with:

    show variables like 'long_query_time%';

    set global long_query_time=4;

7 Master-slave replication

 7.1 The principle of master-slave replication

  Master-slave replication means one database server acts as the master while one or more other servers act as slaves, and data is copied from the master to the slaves automatically. With this we can separate reads from writes: writes go to the master, reads go to the slaves, increasing database availability. MySQL replication involves three threads: one runs on the master node (the log dump thread) and the other two (the I/O thread and the SQL thread) run on the slave nodes.

The master node's binlog dump thread:

  When a slave connects, the master creates a log dump thread to send it the binlog content. While reading a binlog event, this thread takes a lock on the binlog on the master; as soon as the event has been read, even before it is sent to the slave, the lock is released.

  The slave's I/O thread: the slave copies the master's binlog into its local relay log. To do this it first starts a worker thread, the I/O thread, which establishes an ordinary client connection to the master. If the I/O thread has caught up with the master, it goes to sleep until the master signals that a new event has been produced; when woken, it writes the received events into the relay log.

The slave's SQL thread:

  The SQL thread reads the relay log, parses it into concrete operations, and executes them, ultimately keeping the slave's data consistent with the master's.
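  A sketch of pointing a slave at these threads (pre-8.0 syntax; host, credentials, and binlog coordinates are placeholders):

    change master to
      master_host='192.168.0.10',
      master_user='repl',
      master_password='repl_password',
      master_log_file='mysql-bin.000001',
      master_log_pos=154;
    start slave;
    show slave status\G  -- check Slave_IO_Running, Slave_SQL_Running, Seconds_Behind_Master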

 7.2 Replication delay

  The most direct symptom of replication delay is that the slave consumes the relay log more slowly than the master produces the binlog. Possible causes include:

    • Large transactions: the master must finish executing a transaction before it is written to the binlog and passed to the slave, so a long-running transaction produces a delay while the slave replays it.

    • Heavy load on the slave.

  Replication delay is clearly undesirable, so what are the ways to minimize it? There are several:

    • One master, many slaves: attach several more slaves and let them share the read load. This suits situations where read pressure on the slaves is high.

    • Export the binlog to an external system, such as Hadoop, and let that system serve statistical and analytical queries.

8 Distributed transactions

  We will not rehash the general concepts of distributed transactions here, but directly describe two kinds: XA distributed transactions and TCC distributed transactions.

  8.1 XA distributed transactions

    XA is a strongly consistent, two-phase-commit transaction. In MySQL 5.7.7, Oracle fixed a long-standing "bug" in MySQL's official XA support, making MySQL's XA distributed transactions standards-compliant.

    The roles in an XA transaction:

      • Resource manager: manages system resources and is the gateway to transactional resources. A database is a resource manager. A resource manager must also be able to commit or roll back the transactions it manages.

      • Transaction manager: the core coordinator of the distributed transaction. The transaction manager communicates with each resource manager, coordinating and completing the transaction's work. Each transaction branch is identified by a unique name.

    The two-phase commit protocol is the basis of the XA specification:

  In the first phase, the transaction middleware asks all participating databases to pre-commit their transaction branches, confirming whether each of them can commit. When a database receives the pre-commit request and can commit its own branch, it durably records what it would do for that branch and replies to the middleware with an agreement to commit; from this point no further operations can be added to the branch, yet the database has not really committed: its locks on shared resources are still held. If a database cannot commit its branch for some reason, it rolls back all of its work, releases its locks on shared resources, and returns a failure response to the middleware.

In the second phase, the transaction middleware reviews the pre-commit results returned by all the databases. If all of them can commit, the middleware tells every database to commit formally, and the global transaction is committed. If any database returned a pre-commit failure, the middleware tells all the other databases to roll back their operations, and the global transaction is rolled back.

MySQL allows multiple instances to participate in a global transaction. The set of MySQL XA commands is as follows:

      -- Start a transaction and place it in the ACTIVE state; the SQL statements
      -- executed afterwards all run inside this transaction.
      XA START xid

      -- Place the transaction in the IDLE state, meaning its SQL operations are complete.
      XA END xid

      -- Prepare to commit, placing the transaction in the PREPARED state. If the
      -- transaction cannot complete the pre-commit preparation, this statement fails.
      XA PREPARE xid

      -- Final commit, making the changes durable.
      XA COMMIT xid

      -- Roll back and terminate the transaction.
      XA ROLLBACK xid

      -- List the XA transactions in MySQL that are in the PREPARED state.
      XA RECOVER

    MySQL plays the role of participant in an XA transaction; the transaction coordinator drives it. Compared with an ordinary local transaction, an XA transaction has one extra state, PREPARE: an ordinary transaction is begin -> commit, while a distributed transaction is begin -> PREPARE, then waits until the transactions in all participating databases have reached the PREPARE state before doing PREPARE -> commit. A distributed-transaction SQL example:

    xa start 'aaa';

    insert into table(xxx) values(xxx);

    xa end 'aaa'; 

    xa prepare 'aaa';

    xa commit 'aaa';

Problems with XA transactions:

  • Single point of failure: the transaction manager plays a critical role in the whole process. If it goes down, for example after the first phase has completed and just as the second-phase commit is about to start, the resource managers stay blocked and the databases become unusable.

  • Synchronous blocking: after preparing, a resource manager's resources remain blocked until the commit completes and the resources are released.

  • Data inconsistency: although two-phase commit is designed for strong consistency of distributed data, inconsistency is still possible. For example, in the second phase, suppose the coordinator has sent the commit notice, but because of network problems only some participants receive it and execute the commit; the remaining participants, having received nothing, stay blocked, and the data is now inconsistent.

 8.2 TCC distributed transactions

  TCC, also known as a flexible transaction, achieves eventual consistency through a transaction compensation mechanism; it is not a strongly consistent transaction. A TCC transaction is divided into two phases, i.e. it is composed of two local transactions. Compared with XA, TCC has better concurrency: XA is one global transaction, while TCC is composed of local transactions.

  Suppose we buy a product, and the backend must update two tables, a credits table (add points) and an inventory table (deduct stock), which live in two different databases. Executing this with a TCC transaction:

  1) TCC phase one: Try

  In the Try phase we do not directly deduct stock and add points; instead the data is put into a prepared state. In the inventory table we lock one unit of stock, which can be done with a lock field: when the field is 1, the unit is locked. In the credits table we insert a row that is also in a locked state, using the same locking approach as the inventory table. The SQL looks like:

    update stock set `lock`=1 where id=1;

    insert into credits (`lock`, ...) values (1, ...);

  If both statements succeed, we enter the Confirm phase; if execution fails, we enter the Cancel phase.

  2) TCC phase two: Confirm

  This phase officially deducts the stock, adds the points, and changes the order status to paid: it executes the SQL that deducts the locked stock, accumulates the points, and performs the other business logic.

  3) TCC phase three: Cancel

  When the Try phase fails, this phase runs: it releases the locked stock deduction and removes the locked points row, returning to the state before the transaction started.

  The principle of TCC transactions is very simple; using them is not. First, TCC is highly invasive to a system, and it makes the business logic more complex. In practice you must rely on TCC middleware to implement TCC transactions. A typical TCC implementation looks like this: a service exposes one external interface, which outside callers invoke normally without perceiving that a TCC transaction exists; internally, the service implements the three interfaces Try, Confirm, and Cancel, and registers them with the TCC middleware. When the external interface is called, the service and the TCC middleware work together to complete the transaction.

  The TCC middleware also does other work, such as ensuring that Confirm or Cancel eventually succeeds: if it finds that a service's Confirm or Cancel has not succeeded, it keeps retrying that Confirm or Cancel logic until it does, and if repeated attempts still fail, it notifies the system so the exception can be handled manually. TCC transaction processing must also consider unusual situations, such as the order service suddenly dying and restarting; the TCC framework must ensure that a distributed transaction left unfinished before the crash can still be driven to completion. The framework also needs to log distributed transactions and persist their state at each stage, so that problems can be diagnosed and data restored after going live. Currently open-source TCC transaction frameworks include Seata, ByteTCC, and tcc-transaction.


Origin www.cnblogs.com/open-yang/p/11376655.html