Implementation plan for Vitess global unique ID generation | JD Cloud technical team

To identify a piece of data, we usually assign it a unique ID, for example an auto-increment primary key in MySQL. When the data volume becomes very large, however, the database's auto-increment primary key alone is not enough. In a distributed database, each MySQL instance maintains its own auto-increment counter, so different shards can generate the same ID and the values are not globally unique. Various solutions have been developed for this, such as UUID and Snowflake. This article describes how Vitess solves the problem.

Vitess globally unique ID generation

In Vitess, each table with a globally unique column is paired with a sequence table. For example, the user table is paired with a sequence table named user_seq, and the relationship between the two is recorded in the metadata (the VSchema). The metadata for the user and user_seq tables is as follows:

user table metadata: the sharding key is the name column and the sharding function is hash; the globally unique column is id, whose values are generated from the user_seq table.

{
    "tables": {
        "user": {
            "column_vindexes": [
                {
                    "column": "name",
                    "name": "hash"
                }
            ],
            "auto_increment": {
                "column": "id",
                "sequence": "user_seq"
            }
        }
    }
}

user_seq table metadata: the table type is sequence.

{
  "tables": {
    "user_seq": {
      "type": "sequence"
    }
  }
}
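To make the mapping concrete, here is a minimal Go sketch that decodes the two metadata fragments above and looks up which sequence feeds the user table's id column. The struct types (columnVindex, autoIncrement, table, keyspaceSchema) and the helpers mustParse and sequenceFor are hypothetical and cover only the fields shown here; they are not Vitess's own VSchema types. In a real deployment the two fragments belong to different keyspaces, with user_seq living in an unsharded one.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical types covering only the VSchema fields shown above; they are
// not Vitess's own vschema types.
type columnVindex struct {
	Column string `json:"column"`
	Name   string `json:"name"`
}

type autoIncrement struct {
	Column   string `json:"column"`
	Sequence string `json:"sequence"`
}

type table struct {
	Type           string         `json:"type,omitempty"`
	ColumnVindexes []columnVindex `json:"column_vindexes,omitempty"`
	AutoIncrement  *autoIncrement `json:"auto_increment,omitempty"`
}

type keyspaceSchema struct {
	Tables map[string]table `json:"tables"`
}

// The two metadata fragments from the text.
const userVSchema = `{
  "tables": {
    "user": {
      "column_vindexes": [{"column": "name", "name": "hash"}],
      "auto_increment": {"column": "id", "sequence": "user_seq"}
    }
  }
}`

const seqVSchema = `{"tables": {"user_seq": {"type": "sequence"}}}`

// mustParse decodes a VSchema fragment and panics on malformed JSON.
func mustParse(s string) keyspaceSchema {
	var ks keyspaceSchema
	if err := json.Unmarshal([]byte(s), &ks); err != nil {
		panic(err)
	}
	return ks
}

// sequenceFor reports which sequence table fills a table's auto-increment
// column, and which column that is; ok is false if there is none.
func sequenceFor(ks keyspaceSchema, tableName string) (seq, column string, ok bool) {
	t, found := ks.Tables[tableName]
	if !found || t.AutoIncrement == nil {
		return "", "", false
	}
	return t.AutoIncrement.Sequence, t.AutoIncrement.Column, true
}

func main() {
	seq, col, _ := sequenceFor(mustParse(userVSchema), "user")
	fmt.Println(col, "<-", seq, "type:", mustParse(seqVSchema).Tables[seq].Type)
}
```

Running it prints `id <- user_seq type: sequence`: the user table only names its sequence, and it is the sequence table's own metadata that marks it as type sequence.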

All sequence tables have the same structure, as shown below:

CREATE TABLE user_seq (
	id int,
	next_id bigint,
	cache bigint,
	PRIMARY KEY (id)
) COMMENT 'vitess_sequence';

Each sequence table contains exactly one row, with id 0:

mysql> select * from user_seq;
+----+---------+-------+
| id | next_id | cache |
+----+---------+-------+
|  0 |    1000 |   100 |
+----+---------+-------+

The sequence table can be thought of as a number dispenser that hands out ID segments. The cache column is the size of each segment handed out, and the next_id column is the starting value of the next segment. On initialization, VtTablet (the proxy service in front of each MySQL instance) takes a segment from the sequence table based on next_id and cache and keeps it in memory. When that in-memory segment is exhausted, VtTablet fetches a new segment from the sequence table.
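The dispenser idea can be sketched in a few lines of Go. This is a simplified, single-threaded model under stated assumptions: seqRow, segment and reserve are hypothetical names, and the atomicity that the real read-modify-write needs is deliberately left out (the Vitess code below provides it with a transaction). Each call hands out the half-open range [next_id, next_id + cache) and advances next_id, so successive segments never overlap.

```go
package main

import "fmt"

// seqRow models the single row (id = 0) of a sequence table such as user_seq.
type seqRow struct {
	NextID int64 // first value of the next segment to hand out
	Cache  int64 // number of IDs handed out per request
}

// segment is a half-open range [Start, End) of IDs owned by one caller.
type segment struct {
	Start, End int64
}

// reserve hands out the next segment and advances next_id. In a real system
// this read-modify-write must be atomic; this sketch is single-threaded.
func reserve(row *seqRow) segment {
	s := segment{Start: row.NextID, End: row.NextID + row.Cache}
	row.NextID = s.End
	return s
}

func main() {
	row := &seqRow{NextID: 1000, Cache: 100} // the row shown in the table above
	a := reserve(row)
	b := reserve(row)
	fmt.Println(a, b, row.NextID) // {1000 1100} {1100 1200} 1200
}
```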

Let’s dig into the code and take a look at the specific implementation logic:

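// execNextval hands out sequence values (excerpt from VtTablet's query executor)
func (qre *QueryExecutor) execNextval() (*sqltypes.Result, error) {
	// Get inc (the number of IDs requested) and the table name from the plan
	inc, err := resolveNumber(qre.plan.NextCount, qre.bindVars)
	if err != nil {
		return nil, err
	}
	tableName := qre.plan.TableName()
	if inc < 1 {
		return nil, fmt.Errorf("invalid increment for sequence %s: %d", tableName, inc)
	}
	t := qre.plan.Table
	t.SequenceInfo.Lock()
	defer t.SequenceInfo.Unlock()
	if t.SequenceInfo.NextVal == 0 || t.SequenceInfo.NextVal+inc > t.SequenceInfo.LastVal {
		// Refill the cache inside a transaction
		_, err := qre.execAsTransaction(func(conn *StatefulConnection) (*sqltypes.Result, error) {
			// Lock the row with select ... for update so that no other session can
			// modify it while the new value is computed and written back
			query := fmt.Sprintf("select next_id, cache from %s where id = 0 for update", sqlparser.String(tableName))
			qr, err := qre.execSQL(conn, query, false)
			if err != nil {
				return nil, err
			}
			if len(qr.Rows) != 1 {
				return nil, fmt.Errorf("unexpected rows from reading sequence %s: %d", tableName, len(qr.Rows))
			}
			nextID, err := evalengine.ToInt64(qr.Rows[0][0])
			if err != nil {
				return nil, err
			}

			if t.SequenceInfo.LastVal != nextID {
				// If next_id read from the _seq table is below the maximum cached in
				// the tablet, continue from the cached maximum (written back below)
				if nextID < t.SequenceInfo.LastVal {
					log.Warningf("Sequence next ID value %v is below the currently cached max %v, updating it to max", nextID, t.SequenceInfo.LastVal)
					nextID = t.SequenceInfo.LastVal
				}
				t.SequenceInfo.NextVal = nextID
				t.SequenceInfo.LastVal = nextID
			}
			cache, err := evalengine.ToInt64(qr.Rows[0][1])
			if err != nil {
				return nil, err
			}
			if cache < 1 {
				return nil, fmt.Errorf("invalid cache value for sequence %s: %d", tableName, cache)
			}

			// Grow the reservation in multiples of cache until it covers inc;
			// the result is the new upper bound newLast
			newLast := nextID + cache
			for newLast < t.SequenceInfo.NextVal+inc {
				newLast += cache
			}
			// Write the new upper bound back to the _seq table
			query = fmt.Sprintf("update %s set next_id = %d where id = 0", sqlparser.String(tableName), newLast)
			if _, err := qre.execSQL(conn, query, false); err != nil {
				return nil, err
			}
			t.SequenceInfo.LastVal = newLast
			return nil, nil
		})
		if err != nil {
			return nil, err
		}
	}
	// Return the first reserved value and advance NextVal in SequenceInfo
	ret := t.SequenceInfo.NextVal
	t.SequenceInfo.NextVal += inc
	return &sqltypes.Result{
		Fields: sequenceFields,
		Rows:   [][]sqltypes.Value{{sqltypes.NewInt64(ret)}},
	}, nil
}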

The source code shows that:

Vitess uses row locking inside a transaction (select ... for update) so that concurrent reads and updates of the sequence table do not interfere with each other.
When the sequence values cached in VtTablet are not yet initialized or are insufficient for the request, VtTablet reads a new range from the sequence table and advances the next_id field.
Depending on inc, the number of IDs requested, VtTablet reserves the smallest multiple of cache (n*cache IDs) that covers the request and caches those IDs in memory.
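The refill logic in execNextval can be modeled without a database. In this minimal Go sketch, which is a simplified model rather than Vitess code, fakeSeqTable stands in for the one-row sequence table and holding its mutex plays the role of the select ... for update row lock; sequenceCache, nextval and refill are hypothetical names that mirror SequenceInfo's NextVal and LastVal. With next_id = 1000 and cache = 100, a request for 250 IDs reserves three blocks (up to 1300) in a single round trip.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// fakeSeqTable is a hypothetical stand-in for the one-row sequence table.
// Holding mu plays the role of the select ... for update row lock that is
// kept until the transaction commits.
type fakeSeqTable struct {
	mu     sync.Mutex
	nextID int64
	cache  int64
}

// sequenceCache mirrors SequenceInfo: IDs in [NextVal, LastVal) are cached.
type sequenceCache struct {
	mu      sync.Mutex
	NextVal int64
	LastVal int64
	table   *fakeSeqTable
}

// nextval returns the first of inc consecutive IDs, refilling the cache from
// the table when it is uninitialized or too small, following execNextval.
func (c *sequenceCache) nextval(inc int64) (int64, error) {
	if inc < 1 {
		return 0, errors.New("inc must be at least 1")
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.NextVal == 0 || c.NextVal+inc > c.LastVal {
		if err := c.refill(inc); err != nil {
			return 0, err
		}
	}
	ret := c.NextVal
	c.NextVal += inc
	return ret, nil
}

// refill models the transaction: lock the row, read, compute, write back.
func (c *sequenceCache) refill(inc int64) error {
	t := c.table
	t.mu.Lock()         // begin; select next_id, cache ... for update
	defer t.mu.Unlock() // commit
	nextID, cache := t.nextID, t.cache
	if cache < 1 {
		return errors.New("invalid cache value")
	}
	if c.LastVal != nextID {
		// First use, or next_id was changed underneath us: never go below
		// the highest value this cache has already reserved.
		if nextID < c.LastVal {
			nextID = c.LastVal
		}
		c.NextVal, c.LastVal = nextID, nextID
	}
	// Reserve whole multiples of cache until the range covers inc.
	newLast := nextID + cache
	for newLast < c.NextVal+inc {
		newLast += cache
	}
	t.nextID = newLast // update ... set next_id = newLast where id = 0
	c.LastVal = newLast
	return nil
}

func main() {
	c := &sequenceCache{table: &fakeSeqTable{nextID: 1000, cache: 100}}
	a, _ := c.nextval(250) // one refill reserves up to 1300
	b, _ := c.nextval(50)  // served from memory
	d, _ := c.nextval(10)  // needs 1300..1309, beyond LastVal, so refills to 1400
	fmt.Println(a, b, d, c.table.nextID) // 1000 1250 1300 1400
}
```

Only requests that overflow LastVal touch the table; the second call is served entirely from memory.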

Additional notes:

1. The sequence table must be an unsharded table.

2. The generated IDs are globally unique but not guaranteed to be contiguous; for example, values still cached in memory are lost if VtTablet restarts.
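One common source of gaps can be shown with a toy Go model. It assumes a hypothetical idSource that caches a segment in memory and a restart method that simulates a VtTablet restart discarding its cache; neither name comes from Vitess. IDs left in the discarded segment are never issued, so the sequence jumps from 1002 to 1100.

```go
package main

import "fmt"

// idSource is a hypothetical in-memory cache over the shared next_id value.
type idSource struct {
	shared     *int64 // next_id in the sequence table
	cache      int64  // segment size
	next, last int64  // cached range [next, last)
}

// get returns one ID, reserving a new segment when the cache is empty.
func (s *idSource) get() int64 {
	if s.next == s.last {
		s.next = *s.shared
		s.last = s.next + s.cache
		*s.shared = s.last
	}
	id := s.next
	s.next++
	return id
}

// restart models a process restart: the rest of the segment is forgotten.
func (s *idSource) restart() { s.next, s.last = 0, 0 }

func main() {
	nextID := int64(1000)
	s := &idSource{shared: &nextID, cache: 100}
	ids := []int64{s.get(), s.get(), s.get()} // 1000, 1001, 1002
	s.restart()                               // 1003..1099 are never issued
	ids = append(ids, s.get())                // a new segment starts at 1100
	fmt.Println(ids)                          // [1000 1001 1002 1100]
}
```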

VtDriver implementation

In VtDriver, the SDK-based client solution for Vitess, the sequence-generation logic is encapsulated in the MySQL driver package itself. As in Vitess, a table with a global auto-increment setting relies on a corresponding sequence table with the same structure (see above), but next_id is read and updated using a CAS (compare-and-swap) scheme:

public long[] querySequenceValue(Vcursor vCursor, ResolvedShard resolvedShard, String sequenceTableName) throws SQLException, InterruptedException {
    // Upper bound on CAS retries
    int retryTimes = DEFAULT_RETRY_TIMES;
    while (retryTimes > 0) {
        // Read the sequence settings from the _seq table; cache is the size of the local cache
        String querySql = "select next_id, cache from " + sequenceTableName + " where id = 0";
        VtResultSet vtResultSet = (VtResultSet) vCursor.executeStandalone(querySql, new HashMap<>(), resolvedShard, false);
        long[] sequenceInfo = getVtResultValue(vtResultSet);
        long next = sequenceInfo[0];
        long cache = sequenceInfo[1];

        // Try to advance next_id by cache, but only if no one has changed it since it was read.
        // If no row is affected, another client won the race: re-read and try again
        String updateSql = "update " + sequenceTableName + " set next_id = " + (next + cache) + " where next_id = " + next;
        VtRowList vtRowList = vCursor.executeStandalone(updateSql, new HashMap<>(), resolvedShard, false);
        if (vtRowList.getRowsAffected() == 1) {
            // This caller now owns the IDs [next, next + cache)
            return sequenceInfo;
        }
        retryTimes--;
        // Back off for a random 1-5 ms before retrying
        Thread.sleep(ThreadLocalRandom.current().nextInt(1, 6));
    }
    throw new SQLException("Update sequence cache failed within retryTimes: " + DEFAULT_RETRY_TIMES);
}

The source code shows that:

Unlike the Vitess implementation, querying and updating the sequence table opens no transaction and holds no lock across statements; a CAS-style update is used instead.
The affected-row count returned by update user_seq set next_id = ? where next_id = ? tells the driver whether the update succeeded. If it did not, the driver re-queries next_id, computes the new value, and tries the update again. Under concurrent contention, VtDriver retries at most DEFAULT_RETRY_TIMES times, which is 100.
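The CAS loop can be reproduced in Go with an in-memory table to see why no transaction is needed. In this sketch (casTable, reserveCAS, reserveConcurrently and defaultRetryTimes are hypothetical names), read and casUpdate each lock the row only for the duration of one statement, like autocommit SQL, and casUpdate returns the affected-row count that the driver inspects. The random 1-5 ms back-off and the limit of 100 retries follow the Java code and the description above.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"sort"
	"sync"
	"time"
)

// casTable is a hypothetical in-memory stand-in for the one-row sequence
// table. Each method locks only for a single "statement", like autocommit
// SQL, so nothing is held between the read and the conditional update.
type casTable struct {
	mu     sync.Mutex
	nextID int64
	cache  int64
}

// read models: select next_id, cache from user_seq where id = 0
func (t *casTable) read() (next, cache int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.nextID, t.cache
}

// casUpdate models: update user_seq set next_id = newNext where next_id = expected
// It returns the number of affected rows (0 or 1).
func (t *casTable) casUpdate(expected, newNext int64) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.nextID != expected {
		return 0
	}
	t.nextID = newNext
	return 1
}

// defaultRetryTimes mirrors DEFAULT_RETRY_TIMES (100) described in the text.
const defaultRetryTimes = 100

// reserveCAS returns the start and size of the segment [start, start+cache)
// won by this caller, retrying with a random 1-5 ms back-off on conflict.
func reserveCAS(t *casTable) (start, cache int64, err error) {
	for i := 0; i < defaultRetryTimes; i++ {
		next, c := t.read()
		if t.casUpdate(next, next+c) == 1 {
			return next, c, nil
		}
		time.Sleep(time.Duration(1+rand.Intn(5)) * time.Millisecond)
	}
	return 0, 0, errors.New("update sequence cache failed within retry limit")
}

// reserveConcurrently runs n callers in parallel and returns, sorted, the
// segment start each successful caller obtained.
func reserveConcurrently(t *casTable, n int) []int64 {
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex
		starts []int64
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if s, _, err := reserveCAS(t); err == nil {
				mu.Lock()
				starts = append(starts, s)
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	sort.Slice(starts, func(i, j int) bool { return starts[i] < starts[j] })
	return starts
}

func main() {
	tbl := &casTable{nextID: 1000, cache: 100}
	starts := reserveConcurrently(tbl, 8)
	fmt.Println("segment starts:", starts)
	fmt.Println("next_id is now", tbl.nextID)
}
```

Because every winning update advances next_id by exactly cache, the segments obtained by concurrent callers are disjoint even though none of them held a lock between the read and the update.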

Using a sequence in VtDriver is similar to using a MySQL auto-increment key. When a row is inserted into a table that has a sequence configured and the auto-increment column is given no explicit value, the driver takes the ID directly from its local cache. If there is no cache, or the cache is insufficient, the request is routed to the MySQL instance that hosts the sequence table to obtain new sequence values.
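A minimal Go sketch of such a driver-side cache follows. It is an illustration under assumptions, not VtDriver's code: localSequence, take and the fetchFunc type are hypothetical, and fetch stands for whatever obtains a new segment from the sequence table (for example, the CAS update above). In this sketch a multi-row insert that outgrows the current segment uses the leftover IDs and then fetches another segment, so one insert may receive non-contiguous IDs; the source does not say how VtDriver handles that case.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// fetchFunc reserves a new segment in the sequence table and returns its
// first ID and its size. Hypothetical signature.
type fetchFunc func() (start, size int64, err error)

// localSequence serves IDs from an in-memory segment [next, last) and calls
// fetch only when there is no cache or the cache runs out.
type localSequence struct {
	mu         sync.Mutex
	next, last int64
	fetch      fetchFunc
}

// take returns n IDs, e.g. for a multi-row insert whose auto-increment
// column has no explicit value.
func (l *localSequence) take(n int) ([]int64, error) {
	if n < 1 {
		return nil, errors.New("n must be at least 1")
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	ids := make([]int64, 0, n)
	for len(ids) < n {
		if l.next == l.last { // no cache, or cache exhausted
			start, size, err := l.fetch()
			if err != nil {
				return nil, err
			}
			if size < 1 {
				return nil, errors.New("fetched an empty segment")
			}
			l.next, l.last = start, start+size
		}
		ids = append(ids, l.next)
		l.next++
	}
	return ids, nil
}

func main() {
	nextID, fetches := int64(1000), 0
	seq := &localSequence{fetch: func() (int64, int64, error) {
		fetches++ // stands in for a round trip to the sequence table
		start := nextID
		nextID += 3 // pretend cache = 3
		return start, 3, nil
	}}
	a, _ := seq.take(2)
	b, _ := seq.take(2)
	fmt.Println(a, b, "fetches:", fetches) // [1000 1001] [1002 1003] fetches: 2
}
```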

Transaction + row lock, or CAS?

In Vitess's sequence implementation, the sequence table is updated by starting a transaction, executing select ... for update, and relying on the resulting row lock (which, since the table has only one row, effectively locks the whole sequence) for thread safety. Reality, however, is full of uncertainty. Imagine an application that locks the sequence row and then fails to commit, either because of its own performance problems or because its node goes down. In that case:

The locks it holds are not released until the transaction finally commits or MySQL detects the dead session and rolls it back! Until then, every other connection that tries to update or lock that row stays blocked!

VtDriver is a client-side solution for Vitess, so every application connects directly to MySQL. If its sequence implementation used transactions and locking, every application fetching a sequence segment would itself lock the sequence row. If one application held that lock longer than expected for some reason, or even crashed while holding it, all applications would see obvious performance degradation or even stall while waiting for the lock. With CAS, the whole process needs no explicit transaction and holds no row lock between statements, so a stalled client cannot block everyone else. The trade-off is that when CAS concurrency exceeds a certain level, threads contend with each other: updates fail and are retried, which puts extra pressure on the CPU. This can be mitigated by setting a larger cache value, so that each successful update reserves more IDs for the local cache and clients return to the sequence table less often.

Author: JD Retail Jinyue

Source: JD Cloud Developer Community. Please credit the source when reprinting.