1. Overview
Once a database has been split across multiple instances, the business will run into scenarios that require distributed transactions. MyCAT implements distributed transactions based on XA. Sharding-JDBC, another database middleware popular in China, plans to implement distributed transactions based on TCC.
This article is divided into three parts:
- XA Concept Brief
- How MyCAT Code Implements XA
- Some flaws in MyCAT's implementation of XA
2. XA Concept
The X/Open organization (now the Open Group) defines a distributed transaction processing model. The X/Open DTP model (1994) includes:
- Application ( AP )
- Transaction Manager ( TM )
- Resource Manager ( RM )
- Communication resource manager ( CRM )
In general, the transaction manager ( TM ) is transaction middleware, the resource manager ( RM ) is a database, and the communication resource manager ( CRM ) is message middleware. The following figure shows the X/Open DTP model:
The general programming flow is as follows:
- Configure the TM and register each RM with the TM through the methods provided by the TM or the RM. This can be understood as registering the RM as a data source of the TM. One TM can register multiple RMs.
- The AP obtains a proxy for the resource manager from the TM (for example, using the JTA interface to obtain, from the context managed by the TM, a JDBC or JMS connection to an RM managed by that TM).
- The AP initiates a global transaction with the TM. The TM then notifies each RM of the XID (global transaction ID).
- The AP operates on the RMs indirectly, through the connections obtained from the TM, to perform business operations. On every operation the TM passes the XID (including the branch it belongs to) to the RM, and the RM uses this XID to associate the operation with the transaction.
- When the AP ends the global transaction, the TM notifies the RMs that the global transaction has ended and starts the two-phase commit, that is, the prepare-commit process.
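The prepare-commit process in the last step can be sketched as a toy coordinator. This is only an illustration under simplifying assumptions: `Participant`, `RecordingParticipant` and `TwoPhaseCommit` are hypothetical names, the participants live in memory, and there is no logging or crash recovery.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical participant interface; stands in for an RM such as a database. */
interface Participant {
    boolean prepare(String xid);   // phase 1: vote yes (true) or no (false)
    void commit(String xid);       // phase 2: make the work durable
    void rollback(String xid);     // phase 2: undo the work
}

/** In-memory participant that records the calls it receives. */
final class RecordingParticipant implements Participant {
    final List<String> calls = new ArrayList<>();
    private final boolean vote;

    RecordingParticipant(boolean vote) { this.vote = vote; }

    @Override public boolean prepare(String xid) { calls.add("prepare"); return vote; }
    @Override public void commit(String xid) { calls.add("commit"); }
    @Override public void rollback(String xid) { calls.add("rollback"); }
}

final class TwoPhaseCommit {
    /** Returns true if the global transaction committed, false if it rolled back. */
    static boolean run(String xid, List<? extends Participant> participants) {
        // Phase 1: ask every participant to prepare; stop at the first "no" vote.
        boolean allPrepared = true;
        for (Participant p : participants) {
            if (!p.prepare(xid)) {
                allPrepared = false;
                break;
            }
        }
        // Phase 2: commit only if every vote was "yes", otherwise roll everything back.
        for (Participant p : participants) {
            if (allPrepared) {
                p.commit(xid);
            } else {
                p.rollback(xid);
            }
        }
        return allPrepared;
    }
}
```

The important property is that no participant is told to commit until every participant has voted yes in phase 1.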
The XA protocol refers to the interface between TM (Transaction Manager) and RM (Resource Manager). The current mainstream relational database products all implement the XA interface. JTA (Java Transaction API) conforms to the X/Open DTP model, and the XA protocol is also used between the transaction manager and the resource manager. In essence, distributed transactions are realized by means of a two-phase commit protocol. Let’s take a look at the model diagrams of XA transaction success and failure:
If all of this still leaves you with a big question mark, stay calm! Let's see how MyCAT implements XA at the code level. In addition, if you are interested in learning more about the concept, you can refer to the following articles:
3. MyCAT code implementation
- MyCAT: TM, coordinator.
- Data Nodes: RM, Participants.
3.1 JDBC Demo code
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Random;
import java.util.UUID;

public class MyCATXAClientDemo {
    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        // 1. Obtain a database connection
        Class.forName("com.mysql.jdbc.Driver");
        Connection conn = DriverManager.getConnection("jdbc:mysql://127.0.0.1:8066/dbtest", "root", "123456");
        conn.setAutoCommit(false);
        // 2. Start a MyCAT XA transaction
        conn.prepareStatement("set xa=on").execute();
        // 3. Insert SQL
        // 3.1 SQL1, routed to database A
        long uid = Math.abs(new Random().nextLong());
        String username = UUID.randomUUID().toString();
        String password = UUID.randomUUID().toString();
        String sql1 = String.format("insert into t_user(id, username, password) VALUES (%d, '%s', '%s')",
                uid, username, password);
        conn.prepareStatement(sql1).execute();
        // 3.2 SQL2, routed to database B
        long orderId = Math.abs(new Random().nextLong());
        String nickname = UUID.randomUUID().toString();
        String sql2 = String.format("insert into t_order(id, uid, nickname) VALUES(%d, %d, '%s')", orderId, uid, nickname);
        conn.prepareStatement(sql2).execute();
        // 4. Commit the XA transaction
        conn.commit();
        conn.close();
    }
}
- set xa=on : MyCAT starts an XA transaction.
- conn.commit : commits the XA transaction.
3.2 MyCAT starts an XA transaction
When MyCAT receives the set xa = on command, it starts an XA transaction and generates an XA transaction number. The XA transaction number is generated from a UUID. The core code is as follows:
// SetHandler.java
public static void handle(String stmt, ServerConnection c, int offset) {
int rs = ServerParseSet.parse(stmt, offset);
switch (rs & 0xff) {
// ... code omitted
case XA_FLAG_ON: {
if (c.isAutocommit()) {
c.writeErrMessage(ErrorCode.ERR_WRONG_USED, "set xa cmd on can't used in autocommit connection ");
return;
}
c.getSession2().setXATXEnabled(true);
c.write(c.writeToBuffer(OkPacket.OK, c.allocate()));
break;
}
case XA_FLAG_OFF: {
c.writeErrMessage(ErrorCode.ERR_WRONG_USED,
"set xa cmd off not for external use ");
return;
}
// ... code omitted
}
}
// NonBlockingSession.java
public void setXATXEnabled(boolean xaTXEnabled) {
if (xaTXEnabled) {
if (this.xaTXID == null) {
xaTXID = genXATXID(); // obtain the XA transaction number
}
} else {
this.xaTXID = null;
}
}
private String genXATXID() {
return MycatServer.getInstance().getXATXIDGLOBAL();
}
// MycatServer.java
public String getXATXIDGLOBAL() {
return "'" + getUUID() + "'";
}
public static String getUUID() { // UUID string with the four hyphens removed
String s = UUID.randomUUID().toString();
return s.substring(0, 8) + s.substring(9, 13) + s.substring(14, 18) + s.substring(19, 23) + s.substring(24);
}
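To make the ID format concrete, here is a small standalone sketch. `XaTxIdDemo` and its method names are hypothetical; only the slicing itself is copied from the `getUUID()` code above. It shows that the substring concatenation simply drops the four hyphens of the canonical UUID string, so `replace("-", "")` gives the same result.

```java
import java.util.UUID;

final class XaTxIdDemo {
    /** Same slicing as getUUID() above: keeps everything except the hyphens at 8, 13, 18, 23. */
    static String sliced(String s) {
        return s.substring(0, 8) + s.substring(9, 13) + s.substring(14, 18)
                + s.substring(19, 23) + s.substring(24);
    }

    /** Simpler equivalent for a canonical 36-character UUID string. */
    static String stripped(String s) {
        return s.replace("-", "");
    }

    /** Wraps the id in single quotes so it can be concatenated into an XA statement. */
    static String quoted(String id) {
        return "'" + id + "'";
    }

    public static void main(String[] args) {
        String s = UUID.randomUUID().toString();
        System.out.println(quoted(sliced(s)));
    }
}
```

Because the quotes are part of the stored ID, later code can append it to `XA START`, `XA END` and so on without further escaping.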
3.3 MyCAT receives SQL
Here SQL refers to insert , update , and delete operations.
When SQL is sent to a data node for the first time, XA START 'xaTranId' is prepended to the SQL, and the XA transaction status of that data node's connection is set to TxState.TX_STARTED_STATE ( the distributed transaction states are covered in detail below ). The core code is as follows:
// MySQLConnection.java
private void synAndDoExecute(String xaTxID, RouteResultsetNode rrn,
int clientCharSetIndex, int clientTxIsoLation,
boolean clientAutoCommit) {
String xaCmd = null;
boolean conAutoComit = this.autocommit;
String conSchema = this.schema;
// never executed modify sql,so auto commit
boolean expectAutocommit = !modifiedSQLExecuted || isFromSlaveDB() || clientAutoCommit;
if (expectAutocommit == false && xaTxID != null && xaStatus == TxState.TX_INITIALIZE_STATE) {
//
xaCmd = "XA START " + xaTxID + ';';
this.xaStatus = TxState.TX_STARTED_STATE;
}
// .... code omitted
StringBuilder sb = new StringBuilder();
// .... code omitted
if (xaCmd != null) {
sb.append(xaCmd);
}
// and our query sql to multi command at last
sb.append(rrn.getStatement() + ";");
// syn and execute others
this.sendQueryCmd(sb.toString());
}
An example value of the variable sb :
SET names utf8;SET autocommit=0;
XA START '1f2da7353e8846e5833b8d8dd041cfb1','db2';
insert into t_user(id, username, password)
VALUES (3400, 'b7c5ec1f-11cc-4599-851c-06ad617fec42', 'd2694679-f6a2-4623-a339-48d4a868be90');
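The "prepend XA START only once per connection" behaviour can be isolated in a few lines. This is a simplified sketch, not MyCAT's code: `XaStartSketch` is a hypothetical class, and the boolean `started` stands in for the `xaStatus` field of `MySQLConnection`.

```java
final class XaStartSketch {
    /** Per-connection flag; MyCAT tracks the equivalent as xaStatus on MySQLConnection. */
    private boolean started = false;

    /**
     * Builds the text sent to one data node. xaTxId must already be quoted,
     * for example "'abc','db2'"; pass null when no XA transaction is active.
     */
    String build(String xaTxId, String sql) {
        StringBuilder sb = new StringBuilder();
        if (xaTxId != null && !started) {
            sb.append("XA START ").append(xaTxId).append(';');
            started = true; // later statements on this connection must not repeat XA START
        }
        sb.append(sql).append(';');
        return sb.toString();
    }
}
```

The first modifying statement on a connection carries the `XA START` prefix; every later statement in the same transaction is sent unchanged.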
3.4 MyCAT receives COMMIT
3.4.1 Single-node transaction or multi-node transaction
When COMMIT is executed, MyCAT determines the number of database nodes involved in the XA transaction.
- If the number of nodes is 1, it is a single-node transaction and is handled by CommitNodeHandler .
- If the number of nodes is greater than 1, it is a multi-node transaction and is handled by MultiNodeCoordinator .
Compared with MultiNodeCoordinator , CommitNodeHandler deals with only one data node and needs no multi-node coordination, so its logic is relatively simple. Interested readers can study it on their own. We mainly analyze MultiNodeCoordinator .
3.4.2 Coordination log
The coordination log records the XA transaction state of each data node during coordination. It is used for state recovery when MyCAT crashes abnormally, or when some data nodes have completed XA COMMIT while others are still at XA PREPARE.
An XA transaction has five states :
- TX_INITIALIZE_STATE : Transaction initialized
- TX_STARTED_STATE : Transaction start completed
- TX_PREPARED_STATE : Transaction prepare completed
- TX_COMMITED_STATE : Transaction commit completed
- TX_ROLLBACKED_STATE : Transaction rollback completed
State transition flow : TX_INITIALIZE_STATE => TX_STARTED_STATE => TX_PREPARED_STATE => TX_COMMITED_STATE / TX_ROLLBACKED_STATE .
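The transition flow above can be written down as a small lookup table. This is a sketch that encodes only the flow stated in this article: `XaState` and `XaStateMachine` are hypothetical names, and MyCAT itself represents the states as constants in `TxState` rather than as an enum.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical enum mirroring the five states listed above. */
enum XaState { INITIALIZE, STARTED, PREPARED, COMMITTED, ROLLBACKED }

final class XaStateMachine {
    private static final Map<XaState, Set<XaState>> NEXT = new EnumMap<>(XaState.class);

    static {
        NEXT.put(XaState.INITIALIZE, EnumSet.of(XaState.STARTED));
        NEXT.put(XaState.STARTED, EnumSet.of(XaState.PREPARED));
        NEXT.put(XaState.PREPARED, EnumSet.of(XaState.COMMITTED, XaState.ROLLBACKED));
        NEXT.put(XaState.COMMITTED, EnumSet.noneOf(XaState.class));   // terminal
        NEXT.put(XaState.ROLLBACKED, EnumSet.noneOf(XaState.class));  // terminal
    }

    /** Returns true if the flow above allows moving directly from one state to the other. */
    static boolean canMove(XaState from, XaState to) {
        return NEXT.get(from).contains(to);
    }
}
```

A check such as `canMove(STARTED, COMMITTED)` returning false makes the rule explicit that a branch cannot be committed before it has been prepared.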
The coordination log consists of two parts :
- CoordinatorLogEntry : Coordinator log
- ParticipantLogEntry : Participant log. Here, data nodes play the role of participants, so the terms "participant" and "data node" are used interchangeably below.
One XA transaction corresponds to one CoordinatorLogEntry . One CoordinatorLogEntry contains N ParticipantLogEntry records. The core code is as follows:
// CoordinatorLogEntry: coordinator log
public class CoordinatorLogEntry implements Serializable {
    /**
     * XA transaction number
     */
    public final String id;
    /**
     * Array of participant logs
     */
    public final ParticipantLogEntry[] participants;
}
// ParticipantLogEntry: participant log
public class ParticipantLogEntry implements Serializable {
    /**
     * XA transaction number
     */
    public String coordinatorId;
    /**
     * Database uri
     */
    public String uri;
    /**
     * Expiration
     */
    public long expires;
    /**
     * XA transaction state
     */
    public int txState;
    /**
     * Participant name
     */
    public String resourceName;
}
MyCAT records coordination logs to a file in JSON format. Each line contains one CoordinatorLogEntry . For example:
{"id":"'e827b3fe666c4d968961350d19adda31'","participants":[{"uri":"127.0.0.1","state":"3","expires":0,"resourceName":"db3"},{"uri":"127.0.0.1","state":"3","expires":0,"resourceName":"db1"}]}
{"id":"'f00b61fa17cb4ec5b8264a6d82f847d0'","participants":[{"uri":"127.0.0.1","state":"3","expires":0,"resourceName":"db2"},{"uri":"127.0.0.1","state":"3","expires":0,"resourceName":"db1"}]}
The implementation class is:
The current method of writing log files has poor performance. We will not analyze it here, but will talk about it together in [4. MyCAT Implementation Defects].
3.4.3 MultiNodeCoordinator
Pay attention, this is one of the key points of this article.
Stage 1: Initiate PREPARE.
public void executeBatchNodeCmd(SQLCtrlCommand cmdHandler) {
this.cmdHandler = cmdHandler;
final int initCount = session.getTargetCount();
runningCount.set(initCount);
nodeCount = initCount;
failed.set(false);
faileCount.set(0);
//recovery nodes log
ParticipantLogEntry[] participantLogEntry = new ParticipantLogEntry[initCount];
// execute
int started = 0;
for (RouteResultsetNode rrn : session.getTargetKeys()) {
if (rrn == null) {
continue;
}
final BackendConnection conn = session.getTarget(rrn);
if (conn != null) {
conn.setResponseHandler(this);
//process the XA_END XA_PREPARE Command
MySQLConnection mysqlCon = (MySQLConnection) conn;
String xaTxId = null;
if (session.getXaTXID() != null) {
xaTxId = session.getXaTXID() + ",'" + mysqlCon.getSchema() + "'";
}
if (mysqlCon.getXaStatus() == TxState.TX_STARTED_STATE) { // XA transaction
//recovery Log
participantLogEntry[started] = new ParticipantLogEntry(xaTxId, conn.getHost(), 0, conn.getSchema(), ((MySQLConnection) conn).getXaStatus());
String[] cmds = new String[]{"XA END " + xaTxId, // XA END command
"XA PREPARE " + xaTxId}; // XA PREPARE command
mysqlCon.execBatchCmd(cmds);
} else { // not an XA transaction
// recovery Log
participantLogEntry[started] = new ParticipantLogEntry(xaTxId, conn.getHost(), 0, conn.getSchema(), ((MySQLConnection) conn).getXaStatus());
cmdHandler.sendCommand(session, conn);
}
++started;
}
}
// xa recovery log
if (session.getXaTXID() != null) {
CoordinatorLogEntry coordinatorLogEntry = new CoordinatorLogEntry(session.getXaTXID(), false, participantLogEntry);
inMemoryRepository.put(session.getXaTXID(), coordinatorLogEntry);
fileRepository.writeCheckpoint(inMemoryRepository.getAllCoordinatorLogEntries());
}
if (started < nodeCount) { // TODO question: how can this branch be triggered?
runningCount.set(started);
LOGGER.warn("some connection failed to execute " + (nodeCount - started));
/**
* assumption: only caused by front-end connection close. <br/>
* Otherwise, packet must be returned to front-end
*/
failed.set(true);
}
}
- Send the XA END + XA PREPARE commands to each data node. Here is an example value of the variable cmds :
XA END '4cbb18214d0b47adbdb0658598666677','db3';
XA PREPARE '4cbb18214d0b47adbdb0658598666677','db3';
- Record a coordination log. The status of each participant log is TxState.TX_STARTED_STATE .
Stage 2: Initiate COMMIT.
@Override
public void okResponse(byte[] ok, BackendConnection conn) {
// process the XA Transatcion 2pc commit
if (conn instanceof MySQLConnection) {
MySQLConnection mysqlCon = (MySQLConnection) conn;
switch (mysqlCon.getXaStatus()) {
case TxState.TX_STARTED_STATE:
//if there have many SQL execute wait the okResponse,will come to here one by one
//should be wait all nodes ready ,then send xa commit to all nodes.
if (mysqlCon.batchCmdFinished()) {
String xaTxId = session.getXaTXID();
String cmd = "XA COMMIT " + xaTxId + ",'" + mysqlCon.getSchema() + "'";
if (LOGGER.isDebugEnabled()) {
LOGGER.debug("Start execute the cmd :" + cmd + ",current host:" + mysqlCon.getHost() + ":" + mysqlCon.getPort());
}
// recovery log
CoordinatorLogEntry coordinatorLogEntry = inMemoryRepository.get(xaTxId);
for (int i = 0; i < coordinatorLogEntry.participants.length; i++) {
LOGGER.debug("[In Memory CoordinatorLogEntry]" + coordinatorLogEntry.participants[i]);
if (coordinatorLogEntry.participants[i].resourceName.equals(conn.getSchema())) {
coordinatorLogEntry.participants[i].txState = TxState.TX_PREPARED_STATE;
}
}
inMemoryRepository.put(xaTxId, coordinatorLogEntry);
fileRepository.writeCheckpoint(inMemoryRepository.getAllCoordinatorLogEntries());
// send commit
mysqlCon.setXaStatus(TxState.TX_PREPARED_STATE);
mysqlCon.execCmd(cmd);
}
return;
case TxState.TX_PREPARED_STATE: {
// recovery log
String xaTxId = session.getXaTXID();
CoordinatorLogEntry coordinatorLogEntry = inMemoryRepository.get(xaTxId);
for (int i = 0; i < coordinatorLogEntry.participants.length; i++) {
if (coordinatorLogEntry.participants[i].resourceName.equals(conn.getSchema())) {
coordinatorLogEntry.participants[i].txState = TxState.TX_COMMITED_STATE;
}
}
inMemoryRepository.put(xaTxId, coordinatorLogEntry);
fileRepository.writeCheckpoint(inMemoryRepository.getAllCoordinatorLogEntries());
// XA reset status now
mysqlCon.setXaStatus(TxState.TX_INITIALIZE_STATE);
break;
}
default:
}
}
// release the connection
if (this.cmdHandler.relaseConOnOK()) {
session.releaseConnection(conn);
} else {
session.releaseConnectionIfSafe(conn, LOGGER.isDebugEnabled(), false);
}
// if all nodes have finished commit, return success to the client
if (this.finished()) {
cmdHandler.okResponse(session, ok);
if (cmdHandler.isAutoClearSessionCons()) {
session.clearResources(false);
}
/* 1. once the transaction is committed, the XA transaction ends */
if (session.getXaTXID() != null) {
session.setXATXEnabled(false);
}
/* 2. if preAcStates is true, autocommit must be set back to true after the transaction ends; preAcStates is the previous autocommit state */
if (session.getSource().isPreAcStates()) {
session.getSource().setAutocommit(true);
}
}
}
- mysqlCon.batchCmdFinished() : for each data node, the first response is the success of XA END and the second is the success of XA PREPARE . After XA PREPARE succeeds, the data node's participant log status is recorded as TxState.TX_PREPARED_STATE . After that, the XA COMMIT command is sent to the data node.
- After XA COMMIT returns successfully, the data node's participant log status is recorded as TxState.TX_COMMITED_STATE .
- When all data nodes (participants) have returned from XA COMMIT , that is, this.finished() == true , success of the XA transaction commit is returned to the MySQL client.
- [x] XA PREPARE and XA COMMIT may fail on a data node; this has not been simulated for the time being. The corresponding method is #errorResponse(....) .
3.5 MyCAT rolls back XA transactions at startup
When MyCAT starts, it rolls back the XA transactions of the data nodes whose ParticipantLogEntry is in TxState.TX_PREPARED_STATE. The code is as follows:
// MycatServer.java
private void performXARecoveryLog() {
// fetch the recovery log
CoordinatorLogEntry[] coordinatorLogEntries = getCoordinatorLogEntries();
for (int i = 0; i < coordinatorLogEntries.length; i++) {
CoordinatorLogEntry coordinatorLogEntry = coordinatorLogEntries[i];
boolean needRollback = false;
for (int j = 0; j < coordinatorLogEntry.participants.length; j++) {
ParticipantLogEntry participantLogEntry = coordinatorLogEntry.participants[j];
if (participantLogEntry.txState == TxState.TX_PREPARED_STATE) {
needRollback = true;
break;
}
}
if (needRollback) {
for (int j = 0; j < coordinatorLogEntry.participants.length; j++) {
ParticipantLogEntry participantLogEntry = coordinatorLogEntry.participants[j];
//XA rollback
String xacmd = "XA ROLLBACK " + coordinatorLogEntry.id + ';';
OneRawSQLQueryResultHandler resultHandler = new OneRawSQLQueryResultHandler(new String[0], new XARollbackCallback());
outloop:
for (SchemaConfig schema : MycatServer.getInstance().getConfig().getSchemas().values()) {
for (TableConfig table : schema.getTables().values()) {
for (String dataNode : table.getDataNodes()) {
PhysicalDBNode dn = MycatServer.getInstance().getConfig().getDataNodes().get(dataNode);
if (dn.getDbPool().getSource().getConfig().getIp().equals(participantLogEntry.uri)
&& dn.getDatabase().equals(participantLogEntry.resourceName)) {
//XA STATE ROLLBACK
participantLogEntry.txState = TxState.TX_ROLLBACKED_STATE;
SQLJob sqlJob = new SQLJob(xacmd, dn.getDatabase(), resultHandler, dn.getDbPool().getSource());
sqlJob.run();
break outloop;
}
}
}
}
}
}
}
// init into in memory cached
for (int i = 0; i < coordinatorLogEntries.length; i++) {
MultiNodeCoordinator.inMemoryRepository.put(coordinatorLogEntries[i].id, coordinatorLogEntries[i]);
}
// discard the recovery log
MultiNodeCoordinator.fileRepository.writeCheckpoint(MultiNodeCoordinator.inMemoryRepository.getAllCoordinatorLogEntries());
}
4. MyCAT implementation flaws
MyCAT 1.6.5 implements weak XA transactions. In the author's view, there is still some distance from real production use. The possible defects are listed below; if anything here is wrong, please point it out. Hopefully MyCAT's support for distributed transactions will keep getting stronger.
4.1 Coordination log write performance
1. Every time the log file is written, all CoordinatorLogEntry and ParticipantLogEntry records held in memory are rewritten, so write performance gets worse and worse as the number of XA transactions grows, and the overall performance of XA transactions becomes very poor. In addition, the write method is synchronized, which further increases write latency.
Suggestion: first obtain the OFFSET at which the file can be written, write the coordination log at that position, and keep the mapping between XA transaction number and OFFSET in memory, so that writes become sequential and can proceed in parallel.
2. All coordination logs are kept in memory, so memory usage keeps growing and nothing is ever released. Even after a restart, the coordination logs are reloaded into memory.
Recommendation: Coordination logs that have been fully rolled back or committed should not be kept in memory. In addition, use a separate file to store the mapping between XA transaction number and OFFSET.
3. The coordination log is only written to a single file.
Recommendation: Split the coordination log file.
PS: Interested readers can look at how RocketMQ stores its CommitLog ; the performance is very good!
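The offset-based suggestion under item 1 can be sketched as an append-only file plus an in-memory index. This is a minimal illustration, not a replacement for MyCAT's repository classes: `AppendOnlyXaLog` and all of its methods are hypothetical, it covers sequential appends only (no parallel writes, no fsync, no index file), and it assumes each record is a single ASCII line.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class AppendOnlyXaLog implements AutoCloseable {
    private final RandomAccessFile file;
    /** XA transaction number -> offset of its latest record in the file. */
    private final Map<String, Long> offsets = new HashMap<>();

    AppendOnlyXaLog(File path) {
        try {
            this.file = new RandomAccessFile(path, "rw");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience factory for experiments: opens a log backed by a temporary file. */
    static AppendOnlyXaLog openTemp() {
        try {
            File f = File.createTempFile("xa-coordinator", ".log");
            f.deleteOnExit();
            return new AppendOnlyXaLog(f);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Appends one record at the end of the file and remembers where it starts. */
    synchronized long append(String xid, String line) {
        try {
            long offset = file.length();
            file.seek(offset);
            file.write((line + "\n").getBytes(StandardCharsets.US_ASCII));
            offsets.put(xid, offset);
            return offset;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back the latest record of a transaction, or null if it is not indexed. */
    synchronized String read(String xid) {
        Long offset = offsets.get(xid);
        if (offset == null) {
            return null;
        }
        try {
            file.seek(offset);
            return file.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Drops a finished transaction from memory; the file keeps its history. */
    synchronized void forget(String xid) {
        offsets.remove(xid);
    }

    @Override
    public synchronized void close() {
        try {
            file.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each state change costs one short append instead of a rewrite of every entry, and calling `forget` once a transaction is fully committed or rolled back keeps the in-memory index from growing without bound.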
4.2 COMMIT is issued before all data nodes have finished PREPARE
By the definition of XA transactions, XA COMMIT must be initiated only after all participants have completed XA PREPARE successfully. Currently MyCAT issues XA COMMIT to a data node immediately after that node completes XA PREPARE . For example: when the first data node has completed XA END;XA PREPARE; and the second data node hangs before executing XA END;XA PREPARE; , the first node will still XA COMMIT successfully.
Recommendation: Follow the strict XA transaction definition.
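One way to follow the strict definition is to count PREPARE responses and take the commit decision only when the last one arrives. The sketch below is an assumption about how this could look, not MyCAT code: `PrepareBarrier` and `Decision` are hypothetical names, and a real implementation would also need a timeout for nodes that never answer.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class PrepareBarrier {
    enum Decision { WAIT, COMMIT_ALL, ROLLBACK_ALL }

    private final AtomicInteger pending;
    private final AtomicBoolean failed = new AtomicBoolean(false);

    PrepareBarrier(int nodeCount) {
        this.pending = new AtomicInteger(nodeCount);
    }

    /** Call exactly once per data node when its XA PREPARE response arrives. */
    Decision onPrepareResponse(boolean ok) {
        if (!ok) {
            failed.set(true);
        }
        // Only the caller that brings the counter to zero takes the global decision.
        if (pending.decrementAndGet() > 0) {
            return Decision.WAIT;
        }
        return failed.get() ? Decision.ROLLBACK_ALL : Decision.COMMIT_ALL;
    }
}
```

With this shape, the response handler would send XA COMMIT to every node only on `COMMIT_ALL`, instead of committing each node as soon as its own PREPARE returns.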
4.3 Rollback of PREPARE XA transactions at MyCAT startup
1. When MyCAT starts, all XA transactions in the PREPARE state are rolled back. However, a given XA transaction may be partly in COMMIT and partly in PREPARE . Rolling back directly in that case results in data inconsistency.
Suggestion: When an XA transaction has a participant in PREPARE , also check the transaction status of the other participants in that XA transaction and the XA transaction status on the data node itself. For example, if the participant is MySQL , XA RECOVER can be used to query all XA transactions in the PREPARE state.
2. The rollback of PREPARE transactions is performed asynchronously, and the log file is already marked as rolled back before the rollback has completed. A failure in the asynchronous process leaves the recorded XA transaction state incorrect.
Recommendation: Update the XA transaction state only after the rollback callback reports success.
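The check suggested under item 1 could take roughly the following shape. This is a sketch based on ordinary two-phase commit reasoning rather than on anything MyCAT does today: `XaRecoveryDecision`, `State` and `Action` are hypothetical names, and in practice the logged states should be cross-checked against what XA RECOVER reports on each data node.

```java
import java.util.List;

final class XaRecoveryDecision {
    enum State { STARTED, PREPARED, COMMITTED, ROLLBACKED }
    enum Action { COMMIT_PREPARED, ROLLBACK_PREPARED, NOTHING }

    /** Decides what to do with the PREPARED branches of one XA transaction at startup. */
    static Action decide(List<State> participants) {
        if (!participants.contains(State.PREPARED)) {
            return Action.NOTHING; // nothing is left in doubt
        }
        // If any branch already committed, the global decision was "commit",
        // so the remaining prepared branches must be committed as well.
        if (participants.contains(State.COMMITTED)) {
            return Action.COMMIT_PREPARED;
        }
        // No branch committed yet, so rolling back cannot contradict an earlier commit.
        return Action.ROLLBACK_PREPARED;
    }
}
```

The difference from the current startup logic is the middle case: a transaction that is partly committed and partly prepared is completed rather than rolled back.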
4.4 Single-node transactions do not record coordination logs
This situation is rather extreme. After XA PREPARE has been sent, MyCAT crashes. After the restart, the XA transaction has "disappeared" from MyCAT, while the participant's XA transaction stays in the PREPARE state forever. In theory, the XA transaction needs to be rolled back.
Recommendation: Record a coordination log for single-node transactions as well.
4.5 Some nodes crash during XA COMMIT and recover, but nothing further is done
When some nodes have completed XA COMMIT and the other nodes crash at that moment, the corresponding XA transaction is not processed any further after the administrator restarts the failed nodes, resulting in data inconsistency.
Advice: none for now. I am also curious how this case should best be handled. If you know, please let me know.