MyCAT XA Distributed Transaction

1. Overview

After a database is sharded, the business will run into scenarios that require distributed transactions. MyCAT implements distributed transactions based on XA. Sharding-JDBC, another database middleware popular in China, plans to implement distributed transactions based on TCC.

This article is divided into three parts:

  1. A brief introduction to XA concepts
  2. How the MyCAT code implements XA
  3. Some flaws in MyCAT's implementation of XA

2. XA Concept


The X/Open organization (now the Open Group) defines a distributed transaction processing model. The X/Open DTP model (1994) includes:

  1. Application (AP)
  2. Transaction Manager (TM)
  3. Resource Manager (RM)
  4. Communication Resource Manager (CRM)

In general, the transaction manager (TM) is transaction middleware, the resource manager (RM) is a database, and the communication resource manager (CRM) is message middleware. The following figure shows the X/Open DTP model:

The general programming method is like this:

  1. Configure the TM and register each RM with it, using the method provided by the TM or the RM. This can be understood as registering the RM as a data source of the TM. One TM can register multiple RMs.
  2. The AP obtains a proxy for the resource manager from the TM (for example, using the JTA interface to obtain, from the context managed by the TM, a JDBC connection or JMS connection to an RM managed by that TM).
  3. The AP initiates a global transaction with the TM. The TM then notifies each RM of the XID (global transaction ID).
  4. The AP performs business operations on the RM indirectly, through the connection obtained from the TM. On every operation, the TM passes the XID (including the branch it belongs to) to the RM, and the RM uses this XID to associate the operation with the transaction.
  5. When the AP ends the global transaction, the TM notifies the RMs that the global transaction has ended and starts the two-phase commit, that is, the prepare-commit process.

The XA protocol refers to the interface between TM (Transaction Manager) and RM (Resource Manager). The current mainstream relational database products all implement the XA interface. JTA (Java Transaction API) conforms to the X/Open DTP model, and the XA protocol is also used between the transaction manager and the resource manager. In essence, distributed transactions are realized by means of a two-phase commit protocol. Let’s take a look at the model diagrams of XA transaction success and failure:

success

fail
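
The success and failure flows can be sketched in plain Java. This is a minimal in-memory simulation under stated assumptions, not MyCAT or JTA code: Participant, FakeParticipant, and TwoPhaseCommit are hypothetical names introduced for illustration, and a real resource manager would carry out these steps through XA PREPARE, XA COMMIT, and XA ROLLBACK.

```java
import java.util.List;

/** Hypothetical participant interface; it stands in for a resource manager (RM). */
interface Participant {
    boolean prepare();   // phase 1: vote yes (true) or no (false)
    void commit();       // phase 2, success path
    void rollback();     // phase 2, failure path
}

/** In-memory participant that only records its last state, for illustration. */
class FakeParticipant implements Participant {
    private final boolean voteYes;
    String state = "ACTIVE";

    FakeParticipant(boolean voteYes) {
        this.voteYes = voteYes;
    }

    public boolean prepare() {
        state = voteYes ? "PREPARED" : "PREPARE_FAILED";
        return voteYes;
    }

    public void commit() {
        state = "COMMITTED";
    }

    public void rollback() {
        state = "ROLLED_BACK";
    }
}

/** Hypothetical coordinator playing the TM role. */
class TwoPhaseCommit {
    /** Returns true if the global transaction committed, false if it rolled back. */
    static boolean run(List<? extends Participant> participants) {
        // Phase 1: every participant must vote yes; stop asking at the first no.
        boolean allYes = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allYes = false;
                break;
            }
        }
        // Phase 2: commit everywhere only if all voted yes, otherwise roll back everywhere.
        for (Participant p : participants) {
            if (allYes) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allYes;
    }
}
```

The point to notice is that phase 2 does not begin until phase 1 has finished for all participants; section 4.2 comes back to this.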

If this still feels confusing at this point, don't worry. Let's look at how MyCAT implements XA at the code level. If you would like to learn more about the concepts first, you can refer to the following articles:

  1. "XA Transaction Processing"
  2. "XA Transaction SQL Syntax"
  3. "MySQL XA Transaction Support Survey"

3. MyCAT code implementation

  • MyCAT: TM, coordinator.
  • Data Nodes: RM, Participants.

3.1 JDBC Demo code

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Random;
import java.util.UUID;

public class MyCATXAClientDemo {
    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        // 1. Obtain a database connection
        Class.forName("com.mysql.jdbc.Driver");
        Connection conn = DriverManager.getConnection("jdbc:mysql://127.0.0.1:8066/dbtest", "root", "123456");
        conn.setAutoCommit(false);
        // 2. Start a MyCAT XA transaction
        conn.prepareStatement("set xa=on").execute();
        // 3. Insert SQL
        // 3.1 SQL1, routed to database A
        long uid = Math.abs(new Random().nextLong());
        String username = UUID.randomUUID().toString();
        String password = UUID.randomUUID().toString();
        String sql1 = String.format("insert into t_user(id, username, password) VALUES (%d, '%s', '%s')",
                uid, username, password);
        conn.prepareStatement(sql1).execute();
        // 3.2 SQL2, routed to database B
        long orderId = Math.abs(new Random().nextLong());
        String nickname = UUID.randomUUID().toString();
        String sql2 = String.format("insert into t_order(id, uid, nickname) VALUES(%d, %d, '%s')", orderId, uid, nickname);
        conn.prepareStatement(sql2).execute();
        // 4. Commit the XA transaction
        conn.commit();
        // 5. Release the connection
        conn.close();
    }
}
  • set xa=on: MyCAT starts an XA transaction.
  • conn.commit(): commits the XA transaction.

3.2 MyCAT starts an XA transaction

When MyCAT receives the set xa=on command, it enables the XA transaction and generates an XA transaction ID. The ID is a UUID with its hyphens removed, wrapped in single quotes. The core code is as follows:

// SetHandler.java
public static void handle(String stmt, ServerConnection c, int offset) {
		int rs = ServerParseSet.parse(stmt, offset);
		switch (rs & 0xff) {
		// ... code omitted
		case XA_FLAG_ON: {
			if (c.isAutocommit()) {
				c.writeErrMessage(ErrorCode.ERR_WRONG_USED, "set xa cmd on can't used in autocommit connection ");
				return;
			}
			c.getSession2().setXATXEnabled(true);
			c.write(c.writeToBuffer(OkPacket.OK, c.allocate()));
			break;
		}
		case XA_FLAG_OFF: {
			c.writeErrMessage(ErrorCode.ERR_WRONG_USED,
					"set xa cmd off not for external use ");
			return;
		}
		// ... code omitted
	}
}
// NonBlockingSession.java
public void setXATXEnabled(boolean xaTXEnabled) {
   if (xaTXEnabled) {
       if (this.xaTXID == null) {
           xaTXID = genXATXID(); // generate the XA transaction ID
       }
   } else {
       this.xaTXID = null;
   }
}
private String genXATXID() {
   return MycatServer.getInstance().getXATXIDGLOBAL();
}
// MycatServer.java
public String getXATXIDGLOBAL() {
   return "'" + getUUID() + "'";
}
public static String getUUID() { // random UUID with the four hyphens stripped
   String s = UUID.randomUUID().toString();
   return s.substring(0, 8) + s.substring(9, 13) + s.substring(14, 18) + s.substring(19, 23) + s.substring(24);
}
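
To make the shape of the generated ID concrete, here is a small standalone sketch. XaIdDemo and its method names are hypothetical; the logic mirrors getUUID() and getXATXIDGLOBAL() above, using replace("-", "") in place of the substring concatenation.

```java
import java.util.UUID;

class XaIdDemo {
    /** A random UUID with the four hyphens removed, leaving 32 hex characters. */
    static String compactUuid() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    /** Wraps the ID in single quotes so it can be spliced directly into XA statements. */
    static String newXaTxId() {
        return "'" + compactUuid() + "'";
    }

    public static void main(String[] args) {
        String id = newXaTxId();
        // The value differs on every run; the length is always 34 (32 hex characters plus two quotes).
        System.out.println(id + " length=" + id.length());
    }
}
```

Because the quotes are part of the stored ID, later code can build commands such as XA START by plain string concatenation.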

3.3 MyCAT receives SQL

Here, SQL refers to insert, update, and delete operations.

When SQL is sent to a data node for the first time, XA START 'xaTranId' is prepended to the SQL, and the XA status of the connection to that data node is set to TxState.TX_STARTED_STATE (the distributed transaction states are explained below). The core code is as follows:

// MySQLConnection.java
private void synAndDoExecute(String xaTxID, RouteResultsetNode rrn,
                                 int clientCharSetIndex, int clientTxIsoLation,
                                 boolean clientAutoCommit) {
   String xaCmd = null;
   boolean conAutoComit = this.autocommit;
   String conSchema = this.schema;
   // never executed modify sql,so auto commit
   boolean expectAutocommit = !modifiedSQLExecuted || isFromSlaveDB() || clientAutoCommit;
   if (expectAutocommit == false && xaTxID != null && xaStatus == TxState.TX_INITIALIZE_STATE) {        
       // first modifying SQL on this connection within an XA transaction: prepend XA START
       xaCmd = "XA START " + xaTxID + ';';
       this.xaStatus = TxState.TX_STARTED_STATE;
   }
   // .... code omitted
   StringBuilder sb = new StringBuilder();
   // .... code omitted
   if (xaCmd != null) {
       sb.append(xaCmd);
   }
   // and our query sql to multi command at last
   sb.append(rrn.getStatement() + ";");
   // syn and execute others
   this.sendQueryCmd(sb.toString());
}

An example of the contents of the variable sb:

SET names utf8;SET autocommit=0;
XA START '1f2da7353e8846e5833b8d8dd041cfb1','db2';
insert into t_user(id, username, password) 
VALUES (3400, 'b7c5ec1f-11cc-4599-851c-06ad617fec42', 'd2694679-f6a2-4623-a339-48d4a868be90');

3.4 MyCAT receives COMMIT

3.4.1 Single-node transaction or multi-node transaction

When executing COMMIT, MyCAT determines the number of data nodes involved in the XA transaction.

  • If the number of nodes is 1, it is a single-node transaction and is handled by CommitNodeHandler.
  • If the number of nodes is greater than 1, it is a multi-node transaction and is handled by MultiNodeCoordinator.

Compared with MultiNodeCoordinator, CommitNodeHandler deals with only one data node and needs no multi-node coordination, so its logic is relatively simple. Interested readers can study it on their own. We focus on MultiNodeCoordinator.

3.4.2 Coordination log

The coordination log records the XA transaction state of each data node during coordination. It is used to recover state when MyCAT crashes abnormally, or when some data nodes have completed XA COMMIT while others are still at XA PREPARE.

An XA transaction has five states:

  1. TX_INITIALIZE_STATE : transaction initialized
  2. TX_STARTED_STATE : transaction started
  3. TX_PREPARED_STATE : transaction prepare completed
  4. TX_COMMITED_STATE : transaction commit completed
  5. TX_ROLLBACKED_STATE : transaction rollback completed

State transition flow: TX_INITIALIZE_STATE => TX_STARTED_STATE => TX_PREPARED_STATE => TX_COMMITED_STATE / TX_ROLLBACKED_STATE.
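
The transition flow can be written down as a small table of allowed moves. The sketch below is an illustration only: XaState and XaStateMachine are hypothetical names (MyCAT itself uses int constants in TxState), and the table encodes only the flow stated in this article, not every transition that MySQL permits.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical enum mirroring the five TxState constants listed above. */
enum XaState {
    INITIALIZE, STARTED, PREPARED, COMMITTED, ROLLBACKED
}

class XaStateMachine {
    private static final Map<XaState, Set<XaState>> NEXT = new EnumMap<>(XaState.class);

    static {
        NEXT.put(XaState.INITIALIZE, EnumSet.of(XaState.STARTED));
        NEXT.put(XaState.STARTED, EnumSet.of(XaState.PREPARED));
        NEXT.put(XaState.PREPARED, EnumSet.of(XaState.COMMITTED, XaState.ROLLBACKED));
        // COMMITTED and ROLLBACKED are terminal in this flow.
        NEXT.put(XaState.COMMITTED, EnumSet.noneOf(XaState.class));
        NEXT.put(XaState.ROLLBACKED, EnumSet.noneOf(XaState.class));
    }

    /** Returns true if the flow allows moving directly from one state to the other. */
    static boolean canMove(XaState from, XaState to) {
        return NEXT.get(from).contains(to);
    }
}
```

For example, canMove(XaState.STARTED, XaState.COMMITTED) is false: a branch has to pass through PREPARED before it can be committed.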

The coordination log consists of two parts:

  1. CoordinatorLogEntry : coordinator log
  2. ParticipantLogEntry : participant log. Here, data nodes play the role of participants. In the rest of this article, the terms participant and data node are used interchangeably.

One XA transaction corresponds to one CoordinatorLogEntry, and one CoordinatorLogEntry contains N ParticipantLogEntry records. The core code is as follows:

// CoordinatorLogEntry: coordinator log
public class CoordinatorLogEntry implements Serializable {
    /**
     * XA transaction ID
     */
    public final String id;
    /**
     * Array of participant logs
     */
    public final ParticipantLogEntry[] participants;
}
// ParticipantLogEntry: participant log
public class ParticipantLogEntry implements Serializable {
    /**
     * XA transaction ID
     */
    public String coordinatorId;
    /**
     * Database uri
     */
    public String uri;
    /**
     * Expiration
     */
    public long expires;
    /**
     * XA transaction state
     */
    public int txState;
    /**
     * Participant name
     */
    public String resourceName;
}

MyCAT writes coordination logs to a file in JSON format, one CoordinatorLogEntry per line. For example:

{"id":"'e827b3fe666c4d968961350d19adda31'","participants":[{"uri":"127.0.0.1","state":"3","expires":0,"resourceName":"db3"},{"uri":"127.0.0.1","state":"3","expires":0,"resourceName":"db1"}]}
{"id":"'f00b61fa17cb4ec5b8264a6d82f847d0'","participants":[{"uri":"127.0.0.1","state":"3","expires":0,"resourceName":"db2"},{"uri":"127.0.0.1","state":"3","expires":0,"resourceName":"db1"}]}
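
As an illustration of the one-line-per-transaction layout, the sketch below builds a line with the same shape as the samples. It is not MyCAT's serializer, which this article does not show: LogLineDemo and its nested Participant class are hypothetical, the field names are copied from the sample lines, and no JSON escaping is done because the sample values contain nothing that needs it.

```java
import java.util.List;

class LogLineDemo {
    /** Hypothetical holder for one participant; field names follow the sample lines. */
    static final class Participant {
        final String uri;
        final int state;
        final long expires;
        final String resourceName;

        Participant(String uri, int state, long expires, String resourceName) {
            this.uri = uri;
            this.state = state;
            this.expires = expires;
            this.resourceName = resourceName;
        }
    }

    /** Builds one log line: the XA transaction ID followed by all of its participants. */
    static String toLine(String id, List<Participant> participants) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"id\":\"").append(id).append("\",\"participants\":[");
        for (int i = 0; i < participants.size(); i++) {
            Participant p = participants.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"uri\":\"").append(p.uri)
              .append("\",\"state\":\"").append(p.state)
              .append("\",\"expires\":").append(p.expires)
              .append(",\"resourceName\":\"").append(p.resourceName)
              .append("\"}");
        }
        return sb.append("]}").toString();
    }
}
```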

The implementation class is:

The current way of writing the log file performs poorly. We do not analyze it here; it is discussed in section 4, MyCAT implementation flaws.

3.4.3 MultiNodeCoordinator

Pay attention: this is one of the key points of this article.

Stage 1: Initiate PREPARE.

public void executeBatchNodeCmd(SQLCtrlCommand cmdHandler) {
   this.cmdHandler = cmdHandler;
   final int initCount = session.getTargetCount();
   runningCount.set(initCount);
   nodeCount = initCount;
   failed.set(false);
   faileCount.set(0);
   //recovery nodes log
   ParticipantLogEntry[] participantLogEntry = new ParticipantLogEntry[initCount];
   // execute
   int started = 0;
   for (RouteResultsetNode rrn : session.getTargetKeys()) {
       if (rrn == null) {
           continue;
       }
       final BackendConnection conn = session.getTarget(rrn);
       if (conn != null) {
           conn.setResponseHandler(this);
           //process the XA_END XA_PREPARE Command
           MySQLConnection mysqlCon = (MySQLConnection) conn;
           String xaTxId = null;
           if (session.getXaTXID() != null) {
               xaTxId = session.getXaTXID() + ",'" + mysqlCon.getSchema() + "'";
           }
           if (mysqlCon.getXaStatus() == TxState.TX_STARTED_STATE) { // XA transaction
               //recovery Log
               participantLogEntry[started] = new ParticipantLogEntry(xaTxId, conn.getHost(), 0, conn.getSchema(), ((MySQLConnection) conn).getXaStatus());
               String[] cmds = new String[]{"XA END " + xaTxId, // XA END command
                       "XA PREPARE " + xaTxId}; // XA PREPARE command
               mysqlCon.execBatchCmd(cmds);
           } else { // not an XA transaction
               // recovery Log
               participantLogEntry[started] = new ParticipantLogEntry(xaTxId, conn.getHost(), 0, conn.getSchema(), ((MySQLConnection) conn).getXaStatus());
               cmdHandler.sendCommand(session, conn);
           }
           ++started;
       }
   }
   // xa recovery log
   if (session.getXaTXID() != null) {
       CoordinatorLogEntry coordinatorLogEntry = new CoordinatorLogEntry(session.getXaTXID(), false, participantLogEntry);
       inMemoryRepository.put(session.getXaTXID(), coordinatorLogEntry);
       fileRepository.writeCheckpoint(inMemoryRepository.getAllCoordinatorLogEntries());
   }
   if (started < nodeCount) { // TODO: unclear how this can be triggered
       runningCount.set(started);
       LOGGER.warn("some connection failed to execute " + (nodeCount - started));
       /**
        * assumption: only caused by front-end connection close. <br/>
        * Otherwise, packet must be returned to front-end
        */
       failed.set(true);
   }
}
  • Send the XA END and XA PREPARE commands to each data node. Here is an example of the variable cmds:
XA END '4cbb18214d0b47adbdb0658598666677','db3';
XA PREPARE '4cbb18214d0b47adbdb0658598666677','db3';
  • Record a coordination log. The state of each participant log is TxState.TX_STARTED_STATE.

Stage 2: Initiate COMMIT.

@Override
public void okResponse(byte[] ok, BackendConnection conn) {
   // process the XA Transaction 2pc commit
   if (conn instanceof MySQLConnection) {
       MySQLConnection mysqlCon = (MySQLConnection) conn;
       switch (mysqlCon.getXaStatus()) {
           case TxState.TX_STARTED_STATE:
               //if there have many SQL execute wait the okResponse,will come to here one by one
               //should be wait all nodes ready ,then send xa commit to all nodes.
               if (mysqlCon.batchCmdFinished()) {
                   String xaTxId = session.getXaTXID();
                   String cmd = "XA COMMIT " + xaTxId + ",'" + mysqlCon.getSchema() + "'";
                   if (LOGGER.isDebugEnabled()) {
                       LOGGER.debug("Start execute the cmd :" + cmd + ",current host:" + mysqlCon.getHost() + ":" + mysqlCon.getPort());
                   }
                   // recovery log
                   CoordinatorLogEntry coordinatorLogEntry = inMemoryRepository.get(xaTxId);
                   for (int i = 0; i < coordinatorLogEntry.participants.length; i++) {
                       LOGGER.debug("[In Memory CoordinatorLogEntry]" + coordinatorLogEntry.participants[i]);
                       if (coordinatorLogEntry.participants[i].resourceName.equals(conn.getSchema())) {
                           coordinatorLogEntry.participants[i].txState = TxState.TX_PREPARED_STATE;
                       }
                   }
                   inMemoryRepository.put(xaTxId, coordinatorLogEntry);
                   fileRepository.writeCheckpoint(inMemoryRepository.getAllCoordinatorLogEntries());
                   // send commit
                   mysqlCon.setXaStatus(TxState.TX_PREPARED_STATE);
                   mysqlCon.execCmd(cmd);
               }
               return;
           case TxState.TX_PREPARED_STATE: {
               // recovery log
               String xaTxId = session.getXaTXID();
               CoordinatorLogEntry coordinatorLogEntry = inMemoryRepository.get(xaTxId);
               for (int i = 0; i < coordinatorLogEntry.participants.length; i++) {
                   if (coordinatorLogEntry.participants[i].resourceName.equals(conn.getSchema())) {
                       coordinatorLogEntry.participants[i].txState = TxState.TX_COMMITED_STATE;
                   }
               }
               inMemoryRepository.put(xaTxId, coordinatorLogEntry);
               fileRepository.writeCheckpoint(inMemoryRepository.getAllCoordinatorLogEntries());
               // XA reset status now
               mysqlCon.setXaStatus(TxState.TX_INITIALIZE_STATE);
               break;
           }
           default:
       }
   }
   // release the connection
   if (this.cmdHandler.relaseConOnOK()) {
       session.releaseConnection(conn);
   } else {
       session.releaseConnectionIfSafe(conn, LOGGER.isDebugEnabled(), false);
   }
   // have all nodes finished commit? if so, return success to the client
   if (this.finished()) {
       cmdHandler.okResponse(session, ok);
       if (cmdHandler.isAutoClearSessionCons()) {
           session.clearResources(false);
       }
       /* 1. After the transaction is committed, the XA transaction ends */
       if (session.getXaTXID() != null) {
           session.setXATXEnabled(false);
       }
       /* 2. If preAcStates is true, autocommit must be set back to true after the transaction ends. preAcStates is the previous autocommit state */
       if (session.getSource().isPreAcStates()) {
           session.getSource().setAutocommit(true);
       }
   }
}
  • mysqlCon.batchCmdFinished(): for each data node, the first response is for XA END and the second is for XA PREPARE. Once XA PREPARE succeeds, the data node's participant log state is recorded as TxState.TX_PREPARED_STATE, and then the XA COMMIT command is sent to that data node.
  • After XA COMMIT returns successfully, the data node's participant log state is recorded as TxState.TX_COMMITED_STATE.
  • When all data nodes (participants) have returned from XA COMMIT, that is, this.finished() == true, MyCAT reports to the MySQL client that the XA transaction was committed successfully.

Note: a data node may return a failure for XA PREPARE or XA COMMIT; this case has not been simulated yet. The corresponding method is #errorResponse(....).

3.5 MyCAT rolls back XA transactions at startup

When MyCAT starts, it scans the coordination log. For every XA transaction that has a ParticipantLogEntry in TxState.TX_PREPARED_STATE, it issues XA ROLLBACK to the corresponding data nodes. The code is as follows:

// MycatServer.java
private void performXARecoveryLog() {
   // fetch the recovery log
   CoordinatorLogEntry[] coordinatorLogEntries = getCoordinatorLogEntries();
   for (int i = 0; i < coordinatorLogEntries.length; i++) {
       CoordinatorLogEntry coordinatorLogEntry = coordinatorLogEntries[i];
       boolean needRollback = false;
       for (int j = 0; j < coordinatorLogEntry.participants.length; j++) {
           ParticipantLogEntry participantLogEntry = coordinatorLogEntry.participants[j];
           if (participantLogEntry.txState == TxState.TX_PREPARED_STATE) {
               needRollback = true;
               break;
           }
       }
       if (needRollback) {
           for (int j = 0; j < coordinatorLogEntry.participants.length; j++) {
               ParticipantLogEntry participantLogEntry = coordinatorLogEntry.participants[j];
               //XA rollback
               String xacmd = "XA ROLLBACK " + coordinatorLogEntry.id + ';';
               OneRawSQLQueryResultHandler resultHandler = new OneRawSQLQueryResultHandler(new String[0], new XARollbackCallback());
               outloop:
               for (SchemaConfig schema : MycatServer.getInstance().getConfig().getSchemas().values()) {
                   for (TableConfig table : schema.getTables().values()) {
                       for (String dataNode : table.getDataNodes()) {
                           PhysicalDBNode dn = MycatServer.getInstance().getConfig().getDataNodes().get(dataNode);
                           if (dn.getDbPool().getSource().getConfig().getIp().equals(participantLogEntry.uri)
                                   && dn.getDatabase().equals(participantLogEntry.resourceName)) {
                               //XA STATE ROLLBACK
                               participantLogEntry.txState = TxState.TX_ROLLBACKED_STATE;
                               SQLJob sqlJob = new SQLJob(xacmd, dn.getDatabase(), resultHandler, dn.getDbPool().getSource());
                               sqlJob.run();
                               break outloop;
                           }
                       }
                   }
               }
           }
       }
   }
   // init into in memory cached
   for (int i = 0; i < coordinatorLogEntries.length; i++) {
       MultiNodeCoordinator.inMemoryRepository.put(coordinatorLogEntries[i].id, coordinatorLogEntries[i]);
   }
   // discard the recovery log
    MultiNodeCoordinator.fileRepository.writeCheckpoint(MultiNodeCoordinator.inMemoryRepository.getAllCoordinatorLogEntries());
}

4. MyCAT implementation flaws

MyCAT 1.6.5 implements a weak form of XA transactions. In the author's opinion, it still falls somewhat short of production use. The possible defects are listed below; if anything here is wrong, please point it out. Hopefully MyCAT's distributed transaction support will become more and more robust.

4.1 Coordination log write performance

1. On every write, all CoordinatorLogEntry and ParticipantLogEntry records held in memory are rewritten to the file. As the number of XA transactions grows, write performance gets worse and worse, which makes the overall performance of XA transactions very poor. In addition, the write method is synchronized, which further increases write latency.

Recommendation: first reserve the OFFSET in the file at which the entry can be written, then write the coordination log at that offset, and keep the mapping between XA transaction ID and OFFSET in memory. This allows sequential, parallel writes.

2. All coordination logs are kept in memory with no release mechanism, so memory usage keeps growing. Even after a restart, the coordination logs are loaded back into memory.

Recommendation: coordination logs that have been fully rolled back or committed need not be kept in memory. In addition, use a separate file to store the mapping between XA transaction ID and OFFSET.

3. The coordination log is only written to a single file.

Recommendation: Split the coordination log file.
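
The offset-based recommendation can be sketched as follows. This is one possible shape under stated assumptions, not MyCAT code: AppendOnlyXaLog and its methods are hypothetical, records are newline-terminated strings, and durability (for example calling force on the channel) and file splitting are left out to keep the example short.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical append-only coordination log with an in-memory offset index. */
class AppendOnlyXaLog implements AutoCloseable {
    private final FileChannel channel;
    private final AtomicLong nextOffset;
    private final Map<String, Long> offsets = new ConcurrentHashMap<>();

    AppendOnlyXaLog(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        nextOffset = new AtomicLong(channel.size());
    }

    /** Reserves a region first, then writes into it; returns the record's start offset. */
    long append(String xaTxId, String record) throws IOException {
        byte[] bytes = (record + "\n").getBytes(StandardCharsets.UTF_8);
        // Reserving the offset is the only step where writers contend.
        long offset = nextOffset.getAndAdd(bytes.length);
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long pos = offset;
        while (buf.hasRemaining()) {
            // Positional writes do not move the channel's own position,
            // so several threads can write their reserved regions concurrently.
            pos += channel.write(buf, pos);
        }
        offsets.put(xaTxId, offset);
        return offset;
    }

    /** Offset of the latest record for this XA transaction, or null if unknown. */
    Long offsetOf(String xaTxId) {
        return offsets.get(xaTxId);
    }

    /** Drops a finished (fully committed or rolled back) transaction from memory. */
    void forget(String xaTxId) {
        offsets.remove(xaTxId);
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Each state change costs one small write instead of a rewrite of every entry, and forget keeps the in-memory index limited to transactions that are still in flight.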

PS: Interested readers can look at how RocketMQ stores its CommitLog; its performance is very good!

4.2 XA COMMIT is sent before all data nodes have finished XA PREPARE

By definition, XA COMMIT may be issued only after all participants have successfully completed XA PREPARE. Currently, MyCAT sends XA COMMIT to a data node immediately after that node completes XA PREPARE. For example: when XA COMMIT is sent to the first data node, the second data node hangs before executing XA END; XA PREPARE, yet the first node's XA END; XA PREPARE; XA COMMIT still succeeds.

Recommendation: Follow the strict XA transaction definition.
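
One way to enforce this in a callback-driven coordinator is to count outstanding XA PREPARE responses and decide only when the last one arrives. The sketch below is hypothetical and is not a patch for MultiNodeCoordinator: PrepareBarrier is an invented name, and the commands are collected in a list instead of being sent to data nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical gate that delays phase 2 until every participant has answered XA PREPARE. */
class PrepareBarrier {
    private final List<String> nodes;
    private final AtomicInteger pending;
    private final AtomicBoolean failed = new AtomicBoolean(false);
    /** Commands that would be sent to the data nodes; kept here so the sketch is testable. */
    final List<String> sent = new ArrayList<>();

    PrepareBarrier(List<String> nodes) {
        this.nodes = nodes;
        this.pending = new AtomicInteger(nodes.size());
    }

    /** Called once per data node when its XA PREPARE response arrives. */
    void onPrepareResponse(boolean ok) {
        if (!ok) {
            failed.set(true);
        }
        // Only the caller that brings the counter to zero runs phase 2.
        if (pending.decrementAndGet() == 0) {
            String verb = failed.get() ? "XA ROLLBACK" : "XA COMMIT";
            for (String node : nodes) {
                sent.add(verb + " on " + node);
            }
        }
    }
}
```

With two nodes, nothing is sent after the first successful response; XA COMMIT goes to both nodes only after the second one succeeds, and a single failure turns the decision into XA ROLLBACK for all of them.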

4.3 Rolling back PREPARE-state XA transactions at MyCAT startup

1. When MyCAT starts, it rolls back every XA transaction that has a participant in PREPARE. However, an XA transaction may be partly COMMITted and partly PREPAREd; rolling back directly in that case results in data inconsistency.

Recommendation: when a participant of an XA transaction is found in PREPARE, also check the state of the other participants in that XA transaction, as well as the XA transaction state on the data node (MySQL) itself. For example, XA RECOVER can be used on a participant to query all of its XA transactions that are in the PREPARE state.

2. The rollback of PREPARE transactions is performed asynchronously, yet the log file is marked as rolled back before the rollback has completed. A failure during the asynchronous process leaves the XA transaction state inconsistent.

Recommendation: update the XA transaction state only after the rollback callback succeeds.
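
As an illustration of checking the other participants before rolling back, here is one possible decision rule. It is an assumption of this sketch, not MyCAT behavior: RecoveryDecision and its enums are hypothetical, and the rule looks only at logged states, whereas a fuller version would also consult XA RECOVER on each data node.

```java
import java.util.List;

/** Hypothetical startup-recovery rule based only on the logged participant states. */
class RecoveryDecision {
    enum State { STARTED, PREPARED, COMMITTED, ROLLBACKED }

    enum Action { NONE, COMMIT_PREPARED, ROLLBACK_PREPARED }

    static Action decide(List<State> participants) {
        if (!participants.contains(State.PREPARED)) {
            // Nothing is left hanging in PREPARE, so there is nothing to resolve.
            return Action.NONE;
        }
        if (participants.contains(State.COMMITTED)) {
            // A committed branch means the global decision was commit:
            // finish the job instead of rolling back the remaining branches.
            return Action.COMMIT_PREPARED;
        }
        // No branch has committed yet, so rolling back stays consistent.
        return Action.ROLLBACK_PREPARED;
    }
}
```

Under this rule, a transaction logged as COMMITTED on db1 and PREPARED on db2 is completed with XA COMMIT on db2 instead of being rolled back.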

4.4 Single-node transactions do not record coordination logs

This case is rather extreme. If MyCAT crashes after sending XA PREPARE, then after the restart the XA transaction has "disappeared" from MyCAT, while the participant's XA transaction stays in the PREPARE state forever. In theory, this XA transaction needs to be rolled back.

Recommendation: Keep a coordination log.

4.5 No further processing after nodes that failed during XA COMMIT recover

When some nodes have completed XA COMMIT and the others crash at that moment, the corresponding XA transaction is not processed any further after the administrator restarts the failed nodes, which results in data inconsistency.

Recommendation: none for now. I am also curious how this case should be handled properly. If you know, please let me know.
