MyCAT cross-database two-table query

1. Overview

MyCAT supports cross-database table joins. The current version only supports cross-database joins between two tables, but even so this covers most business scenarios. Moreover, limiting the join to two tables sidesteps the troublesome performance problems that joins over many tables can cause.

This article mainly shares:

  1. Overall process, call sequence diagram
  2. Analysis of core code

Recommended pre-reading: "MyCAT Single Database Single Table Query".

2. Main Process

When executing a cross-database two-table Join SQL, the general process is as follows:

The SQL must be prefixed with an annotation: /*!mycat:catlet=io.mycat.catlets.ShareJoin */ ${SQL}.
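For example, using the statement analyzed throughout this article:

/*!mycat:catlet=io.mycat.catlets.ShareJoin */ SELECT o.id, u.username from t_order o join t_user u on o.uid = u.id;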

RouteService#route(...) parses the mycat:catlet annotation and hands the route over to HintCatletHandler for further processing.

HintCatletHandler obtains the Catlet implementation class named in the annotation. io.mycat.catlets.ShareJoin is one such implementation (currently the only one); it provides cross-database two-table joins. Judging by the class name, ShareJoin may well grow into a full cross-database multi-table join in the future.

The core code is as follows:

// HintCatletHandler.java
public RouteResultset route(SystemConfig sysConfig, SchemaConfig schema,
                           int sqlType, String realSQL, String charset, ServerConnection sc,
                           LayerCachePool cachePool, String hintSQLValue, int hintSqlType, Map hintMap)
       throws SQLNonTransientException {
   String cateletClass = hintSQLValue;
   if (LOGGER.isDebugEnabled()) {
       LOGGER.debug("load catelet class:" + hintSQLValue + " to run sql " + realSQL);
   }
   try {
       Catlet catlet = (Catlet) MycatServer.getInstance().getCatletClassLoader().getInstanceofClass(cateletClass);
       catlet.route(sysConfig, schema, sqlType, realSQL, charset, sc, cachePool);
       catlet.processSQL(realSQL, new EngineCtx(sc.getSession2()));
   } catch (Exception e) {
       LOGGER.warn("catlet error " + e);
       throw new SQLNonTransientException(e);
   }
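    // The catlet executes the SQL and writes the result set back to the
    // client itself (via EngineCtx), so no RouteResultset is returned here.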
   return null;
}

3. ShareJoin

Currently, only two-table joins across databases are supported. ShareJoin splits the SQL into a left-table SQL and a right-table SQL, sends them to each data node for execution, then merges the results and returns them.

The pseudo code is as follows:

// SELECT u.id, o.id FROM t_order o 
// INNER JOIN t_user u ON o.uid = u.id
// [sequential] query the left table
String leftSQL = "SELECT o.id, o.uid FROM t_order o";
List leftList = dn[0].select(leftSQL) + dn[1].select(leftSQL) + ... + dn[n].select(leftSQL);
// [parallel] query the right table
String rightSQL = "SELECT u.id FROM t_user u WHERE u.id IN (${leftList.uid})";
for (dn : dns) { // executed in parallel, driven by callbacks
    for (rightRecord : dn.select(rightSQL)) { // query the right table
        // merge the results
        for (leftRecord : leftList) {
            if (leftRecord.uid == rightRecord.id) {
                write(leftRecord + rightRecord); // write the concatenated row
            }
        }
    }
}

The actual implementation is more complicated; let's walk through it step by step.

3.1 JoinParser

JoinParser is responsible for parsing the SQL into a TableFilter.

For example, after parsing /*!mycat:catlet=io.mycat.catlets.ShareJoin */ SELECT o.id, u.username from t_order o join t_user u on o.uid = u.id;, the resulting TableFilter is as follows:

  • tName : the table name.
  • tAlia : the table alias.
  • where : the filter condition.
  • order : the sort condition.
  • parenTable : the alias of the parent (left-hand) table of the join. For the t_user TableFilter (held in the join attribute), parenTable is "o", i.e. t_order.
  • joinParentkey : the join column of the parent (left-hand) table. For the t_user TableFilter, joinParentkey is uid.
  • joinKey : the join column of this table. For the t_user TableFilter, joinKey is id.
  • join : the child TableFilter, i.e. the table on the right-hand side of the join.
  • parent : the counterpart of join, pointing back to the parent TableFilter.

Seeing this, you may wonder why the SQL is parsed into a TableFilter: the TableFilter is what generates the SQL actually sent to each data node. The code is as follows:

// TableFilter.java
public String getSQL() {
   String sql = "";
   // fields
   for (Entry<String, String> entry : fieldAliasMap.entrySet()) {
       String key = entry.getKey();
       String val = entry.getValue();
       if (val == null) {
           sql = unionsql(sql, getFieldfrom(key), ",");
       } else {
           sql = unionsql(sql, getFieldfrom(key) + " as " + val, ",");
       }
   }
   // where
    if (parent == null) {    // the table on the left of the on/where equals sign
       String parentJoinKey = getJoinKey(true);
        // fix sharejoin bug:
        // (AbstractConnection.java:458) -close connection,reason:program err:java.lang.IndexOutOfBoundsException:
        // cause: the left table's select list did not include the join column, which triggered the above error when reading results
       if (sql != null && parentJoinKey != null &&
               !sql.toUpperCase().contains(parentJoinKey.trim().toUpperCase())) {
           sql += ", " + parentJoinKey;
       }
       sql = "select " + sql + " from " + tName;
       if (!(where.trim().equals(""))) {
           sql += " where " + where.trim();
       }
    } else {    // the table on the right of the on/where equals sign
       if (allField) {
           sql = "select " + sql + " from " + tName;
       } else {
           sql = unionField("select " + joinKey, sql, ",");
           sql = sql + " from " + tName;
           //sql="select "+joinKey+","+sql+" from "+tName;
       }
       if (!(where.trim().equals(""))) {
           sql += " where " + where.trim() + " and (" + joinKey + " in %s )";
       } else {
           sql += " where " + joinKey + " in %s ";
       }
   }
   // order
   if (!(order.trim().equals(""))) {
       sql += " order by " + order.trim();
   }
   // limit
   if (parent == null) {
       if ((rowCount > 0) && (offset > 0)) {
            sql += " limit " + offset + "," + rowCount;
       } else {
           if (rowCount > 0) {
               sql += " limit " + rowCount;
           }
       }
   }
   return sql;
}
  • When parent is null, this TableFilter is the table on the left of the on/where equals sign. The generated SQL looks like: select id, uid from t_order.
  • When parent is not null, this TableFilter is the table on the right of the on/where equals sign. The generated SQL looks like: select id, username from t_user where id in (1, 2, 3).
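Strictly speaking, the right-table SQL that getSQL() produces is a template: the IN list is left as a %s placeholder (see the where branch above), and ShareJoin fills it in once the left-table IDs are known. For the example SQL, the two generated statements look like this:

-- left-table SQL, executed first, sequentially per data node
select id, uid from t_order
-- right-table SQL template; ShareJoin substitutes the collected ID list for %s
select id, username from t_user where id in %s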

3.2 ShareJoin.processSQL(…)

After the SQL is parsed, the left-table SQL is generated and sent to the corresponding data nodes to query the data.

When the SQL is

/*!mycat:catlet=io.mycat.catlets.ShareJoin */
SELECT o.id, u.username from t_order o join t_user u on o.uid = u.id;

sql = getSql() returns select id, uid from t_order.

Once the left-table SQL is generated, it is sent to the corresponding data nodes and executed sequentially. How this sequential execution is implemented is covered in the next section, BatchSQLJob.

3.3 BatchSQLJob

EngineCtx wraps BatchSQLJob and exposes two higher-level methods:

  1. executeNativeSQLSequnceJob : executes SQL tasks on the data nodes sequentially (not concurrently)
  2. executeNativeSQLParallJob : executes SQL tasks on the data nodes concurrently

The core code is as follows:

// EngineCtx.java
public void executeNativeSQLSequnceJob(String[] dataNodes, String sql,
		SQLJobHandler jobHandler) {
	for (String dataNode : dataNodes) {
		SQLJob job = new SQLJob(jobId.incrementAndGet(), sql, dataNode,
				jobHandler, this);
		bachJob.addJob(job, false);
	}
}
public void executeNativeSQLParallJob(String[] dataNodes, String sql,
		SQLJobHandler jobHandler) {
	for (String dataNode : dataNodes) {
		SQLJob job = new SQLJob(jobId.incrementAndGet(), sql, dataNode,
				jobHandler, this);
		bachJob.addJob(job, true);
	}
}

BatchSQLJob implements sequential/concurrent execution through a list of running jobs (runningJobs) and a queue of waiting jobs (waitingJobs). The core code is as follows:

// BatchSQLJob.java
/**
 * List of jobs currently running
 */
private ConcurrentHashMap<Integer, SQLJob> runningJobs = new ConcurrentHashMap<Integer, SQLJob>();
/**
 * Queue of jobs waiting to run
 */
private ConcurrentLinkedQueue<SQLJob> waitingJobs = new ConcurrentLinkedQueue<SQLJob>();
public void addJob(SQLJob newJob, boolean parallExecute) {
   if (parallExecute) {
       runJob(newJob);
   } else {
       waitingJobs.offer(newJob);
        if (runningJobs.isEmpty()) { // if no job is currently running, take the next job from the waiting queue and run it
           SQLJob job = waitingJobs.poll();
           if (job != null) {
               runJob(job);
           }
       }
   }
}
public boolean jobFinished(SQLJob sqlJob) {
	runningJobs.remove(sqlJob.getId());
	SQLJob job = waitingJobs.poll();
	if (job != null) {
		runJob(job);
		return false;
	} else {
		if (noMoreJobInput) {
			return runningJobs.isEmpty() && waitingJobs.isEmpty();
		} else {
			return false;
		}
	}
}
  • When executing sequentially, if runningJobs already contains a running task, #addJob(...) does not run the new job immediately but appends it to waitingJobs; when a SQLJob completes, jobFinished(...) takes the next waiting job and runs it, so the jobs execute one by one (see the pseudo-trace after this list).
  • When executing concurrently, #addJob(...) runs the job immediately.
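A minimal pseudo-trace of the sequential mode (batchJob, jobA and jobB are hypothetical names, for illustration only):

// Hypothetical trace of BatchSQLJob in sequential mode
batchJob.addJob(jobA, false); // runningJobs is empty -> jobA starts immediately
batchJob.addJob(jobB, false); // jobA is still running -> jobB waits in waitingJobs
// When jobA completes, jobFinished(jobA) polls waitingJobs and starts jobB.
// jobFinished(...) returns true only after noMoreJobInput is set and both lists are empty.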

SQLJob executes an SQL task asynchronously. Its jobHandler (SQLJobHandler) attribute is called back when the SQL execution returns results, which is how the asynchronous execution is achieved.
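The callback contract looks roughly like the sketch below. This is a simplified assumption: onRowData matches the ShareRowOutPutDataHandler#onRowData signature shown in section 3.5, while the other two methods merely paraphrase the header/EOF callbacks described in sections 3.4 and 3.5 and may not match the real interface verbatim.

import java.util.List;

// Simplified sketch of the SQLJobHandler contract (not the verbatim interface)
public interface SQLJobHandler {
    // receives the result-set header (field packets) from a data node
    void onHeader(String dataNode, byte[] header, List<byte[]> fields);
    // receives one result row; see ShareRowOutPutDataHandler#onRowData below
    boolean onRowData(String dataNode, byte[] rowData);
    // invoked when a data node has returned all rows, or on failure
    void finished(String dataNode, boolean failed);
}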

In ShareJoin, SQLJobHandler has two implementations: ShareDBJoinHandler and ShareRowOutPutDataHandler. The former handles the callbacks for the left-table SQL; the latter handles the callbacks for the right-table SQL.

3.4 ShareDBJoinHandler

ShareDBJoinHandler is the callback for the left-table SQL. The process is as follows:

  • #fieldEofResponse(...) : receives the fields returned by a data node and keeps them in memory.
  • #rowResponse(...) : receives each row returned by a data node and keeps it in memory.
  • #rowEofResponse(...) : invoked when a data node has returned all its rows. Once all data nodes have finished executing the left-table SQL, the right-table SQL tasks are submitted and executed in parallel, i.e. #createQryJob(...).

When the SQL is

/*!mycat:catlet=io.mycat.catlets.ShareJoin */
SELECT o.id, u.username from t_order o join t_user u on o.uid = u.id;

sql = getChildSQL() returns select id, username from t_user where id in (1, 2, 3).
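More precisely, getChildSQL() returns the template ending in "in %s" shown earlier; createQryJob(...) (below) splices in each batch of left-table IDs with String.format. Roughly (the ID values are illustrative):

// How the IN-list placeholder gets filled (values illustrative)
String childSQL = "select id, username from t_user where id in %s";
String sql = String.format(childSQL, "(1,2,3)");
// sql => "select id, username from t_user where id in (1,2,3)"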

The core code is as follows:

// ShareJoin.java
private void createQryJob(int batchSize) {
   int count = 0;
   Map<String, byte[]> batchRows = new ConcurrentHashMap<String, byte[]>();
   String theId = null;
   StringBuilder sb = new StringBuilder().append('(');
   String svalue = "";
   for (Map.Entry<String, String> e : ids.entrySet()) {
       theId = e.getKey();
       byte[] rowbyte = rows.remove(theId);
       if (rowbyte != null) {
           batchRows.put(theId, rowbyte);
       }
       if (!svalue.equals(e.getValue())) {
           if (joinKeyType == Fields.FIELD_TYPE_VAR_STRING
                   || joinKeyType == Fields.FIELD_TYPE_STRING) { // the join key is a varchar
               sb.append("'").append(e.getValue()).append("'").append(','); // e.g. ('digdeep','yuanfang')
           } else { // otherwise the join key is treated as int/long
               sb.append(e.getValue()).append(','); // e.g. (1,2,3)
           }
       }
       svalue = e.getValue();
       if (count++ > batchSize) {
           break;
       }
   }
   if (count == 0) {
       return;
   }
   jointTableIsData = true;
   sb.deleteCharAt(sb.length() - 1).append(')');
   String sql = String.format(joinParser.getChildSQL(), sb);
   getRoute(sql);
   ctx.executeNativeSQLParallJob(getDataNodes(), sql, new ShareRowOutPutDataHandler(this, fields, joinindex, joinParser.getJoinRkey(), batchRows, ctx.getSession()));
}

3.5 ShareRowOutPutDataHandler

ShareRowOutPutDataHandler is the callback for the right-table SQL. The process is as follows:

  • #fieldEofResponse(...) : receives the fields returned by a data node and writes the header back to the MySQL client.
  • #rowResponse(...) : receives each row returned by a data node, matches it against the left-table records, and writes the merged row back to the MySQL client.
  • #rowEofResponse(...) : once all rows have been returned, writes eof back to the MySQL client.

The core code is as follows:

// ShareRowOutPutDataHandler.java
public boolean onRowData(String dataNode, byte[] rowData) {
   RowDataPacket rowDataPkgold = ResultSetUtil.parseRowData(rowData, bfields);
   // make a copy of batchRows
   Map<String, byte[]> batchRowsCopy = new ConcurrentHashMap<String, byte[]>();
   batchRowsCopy.putAll(arows);
   // get the id (join-key) field
   String id = ByteUtil.getString(rowDataPkgold.fieldValues.get(joinR));
   // look up the left-table (A-table) record matching this id
   byte[] arow = getRow(batchRowsCopy, id, joinL);
   while (arow != null) {
       RowDataPacket rowDataPkg = ResultSetUtil.parseRowData(arow, afields);//ctx.getAllFields());
       for (int i = 1; i < rowDataPkgold.fieldCount; i++) {
            // append the b.name field from the right-table row
           byte[] bname = rowDataPkgold.fieldValues.get(i);
           rowDataPkg.add(bname);
           rowDataPkg.addFieldCount(1);
       }
       // huangyiming add
       MiddlerResultHandler middlerResultHandler = session.getMiddlerResultHandler();
       if (null == middlerResultHandler) {
           ctx.writeRow(rowDataPkg);
       } else {
           if (middlerResultHandler instanceof MiddlerQueryResultHandler) {
               byte[] columnData = rowDataPkg.fieldValues.get(0);
               if (columnData != null && columnData.length > 0) {
                   String rowValue = new String(columnData);
                   middlerResultHandler.add(rowValue);
               }
               //}
           }
       }
       arow = getRow(batchRowsCopy, id, joinL);
   }
   return false;
}
