Zero-downtime database and table sharding migration

Requirements

Like order tables and user tables, massive tables that hold hundreds of millions of rows today, and may grow to billions or tens of billions in the future, are usually designed as a single table at the start of a project so that the project can launch quickly; sharding (splitting into multiple databases and tables) is not considered at that point. As the business grows and a single table exceeds ten million or even a hundred million rows, sharding has to be considered, and migrating to the sharded schema without downtime should be the most basic requirement of any sharding effort. After all, an Internet product cannot put up a billboard saying "the system will be down for maintenance from 10:00 tonight until 10:00 tomorrow"; that looks very unprofessional. And when you change jobs later and describe such a migration plan to an interviewer, what will the interviewer think?

 

Learn from codis

The author happened to run into exactly this problem and borrowed some ideas from codis to implement a zero-downtime sharding migration. codis itself is not the focus of this article; only the part that was borrowed is mentioned here -- rebalance:

 

When data is accessed during the migration, the proxy sends the SLOTSMGRTTAGSLOT migration command to Redis, forcing the key that the client is about to access to be migrated immediately, and only then processes the client's request. (SLOTSMGRTTAGSLOT is a command customized by codis on top of Redis.)
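Conceptually, the borrowed idea is "migrate on access": if the data being accessed has not been migrated yet, migrate it first, then serve the request. A minimal sketch, purely illustrative pseudocode rather than codis code (isMigrated, migrateKey and readFromTarget are assumed helpers):

public Object handleRequest(String key) {
    if (!isMigrated(key)) {
        // Force the key that is about to be accessed to be migrated immediately
        // (codis does this with its custom SLOTSMGRTTAGSLOT command)
        migrateKey(key);
    }
    // Only then is the client's request served from the target node
    return readFromTarget(key);
}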

 

Database and table sharding

With that idea in mind, the zero-downtime sharding migration is easier to understand. Next, the author introduces the original implementation for the installed_app table, i.e. the table that records which APPs each user has installed.

 

1. Determine the sharding column

Choosing the sharding column is by far the most important part of sharding. The sharding column directly determines whether the whole sharding scheme can ultimately succeed. A well-chosen sharding column allows most of the high-traffic interfaces that touch this table to reach the sharded tables through that column. The most common sharding column is user_id, and user_id is chosen here as well (a routing sketch follows below).
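For illustration only, routing by user_id could look like the sketch below; the database count, table count and naming convention are assumptions, not values from the original scheme:

public class ShardingRouter {
    // Assumed layout: 8 databases x 16 tables per database
    private static final int DB_COUNT = 8;
    private static final int TABLES_PER_DB = 16;

    // Returns a physical table name such as "installed_app_3.installed_app_12" for a given user_id
    public static String route(long userId) {
        long slot = userId % (DB_COUNT * TABLES_PER_DB);
        long dbIndex = slot / TABLES_PER_DB;
        long tableIndex = slot % TABLES_PER_DB;
        return "installed_app_" + dbIndex + ".installed_app_" + tableIndex;
    }
}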

 

2. Sharding and migration scheme

After choosing the sharding column that best fits the business, the next step is to decide on the migration scheme. The author uses a combination of active migration and passive migration:

 

  1. Active migration: an independent program traverses the installed_app table that needs to be sharded and migrates its data into the target sharded tables.

  2. Passive migration: before executing its own logic, the business code that touches the installed_app table migrates the relevant user's data into the corresponding sharded table.

 

The two schemes are described in detail below.

 

2.1 Active Migration

Active migration is an independent, pluggable migration program. Its job is to traverse the installed_app table that needs to be sharded and copy the data into the target sharded tables. Since active migration and passive migration run at the same time, collisions between them have to be handled. The author's active-migration pseudocode is as follows:

public void migrate(){
    // Query the current max(id) of the table; it is used to decide when the traversal is finished
    long maxId = execute("select max(id) from installed_app");
    long tempMinId = 0L;
    long stepSize = 1000;
    long tempMaxId = 0L;
    do{
        try {
            tempMaxId = tempMinId + stepSize;
            // Thanks to InnoDB's clustered index, a range scan "where id>=? and id<?" gives the best performance
            String scanSql = "select * from installed_app where id>=#{tempMinId} and id<#{tempMaxId}";
            List<InstalledApp> installedApps = executeSql(scanSql);
            Iterator<InstalledApp> iterator = installedApps.iterator();
            while (iterator.hasNext()) {
                InstalledApp installedApp = iterator.next();
                // remove each element once taken, to help GC
                iterator.remove();
                long userId = installedApp.getUserId();
                String status = executeRedis("get MigrateStatus:${userId}");
                if ("COMPLETED".equals(status)) {
                    // this user has already been migrated, nothing to do
                    continue;
                }
                if ("MIGRATING".equals(status)) {
                    // a "passive migration" is already migrating this user, nothing to do
                    continue;
                }
                // Acquire the lock before migrating, e.g. set MigrateStatus:18 MIGRATING ex 86400 nx
                String result = executeRedis("set MigrateStatus:${userId} MIGRATING ex 86400 nx");
                if ("OK".equals(result)) {
                    // After acquiring the lock, query all installed APPs of this user [the migration is done per user_id]
                    String sql = "select * from installed_app where user_id=#{userId}";
                    List<InstalledApp> userInstalledApps = executeSql(sql);
                    // Migrate all installed APPs of this user into the sharded tables (user_id determines the target table)
                    shardingInsertSql(userInstalledApps);
                    // After this user's migration is complete, update the cached status (kept for 10 days)
                    executeRedis("setex MigrateStatus:${userId} 864000 COMPLETED");
                } else {
                    // The lock was not acquired, so a passive migration holds it; leave this user to the passive migration [the probability of this is very low]
                    // This branch could be strengthened: a passive migration cannot last long, so the status could be polled a few times to confirm that it completes
                    logger.info("Migration conflict. userId = {}", userId);
                }
            }
            if (tempMaxId >= maxId) {
                // Refresh max(id) to confirm whether the traversal is really finished (new rows may have been inserted meanwhile)
                maxId = execute("select max(id) from installed_app");
            }
            logger.info("Migration process id = {}", tempMaxId);
        }catch (Throwable e){
            // If any exception occurs (it can only come from redis or mysql here), exit, fix the problem, then restart the migration
            // On restart, initialize tempMinId from the last "Migration process id = ..." log line to avoid re-migrating
            System.exit(0);
        }
        tempMinId += stepSize;
    }while (tempMaxId < maxId);
}

 There are a few things to note here:

 

  1. max(id) is queried first so that the number of max(id) queries is minimized. If the first query returns 10,000,000, there is no need to query max(id) again until the traversed id reaches 10,000,000.

  2. Traverse with id >= ? and id < ? instead of id >= ? limit n or limit m, n. LIMIT performance is mediocre and degrades as the traversal proceeds, whereas the id >= ? and id < ? range scan is unaffected even when some id ranges are empty, and its performance curve stays smooth with no jitter. The migration program is, after all, an auxiliary program and must not put excessive pressure on the business.

  3. The List<InstalledApp> returned by the id-range query is iterated through an Iterator<InstalledApp>, and each element is removed as soon as its userId has been processed; otherwise the retained objects may cause GC pressure or even an OOM.

 

2.2 Passive Migration

Passive migration inserts the migration logic in front of the normal business logic that touches the installed_app table. Taking "a user installs a new APP" as an example, the pseudocode is as follows:

// Passive migration is shared logic, so this method must be called before any business logic that touches the `installed_app` table
public void migratePassive(long userId) throws Exception{
    String status = executeRedis("get MigrateStatus:${userId}");
    if ("COMPLETED".equals(status)) {
        // The user's data has already been migrated, nothing to do
        logger.info("user's installed app migration completed. user_id = {}", userId);
    }else if ("MIGRATING".equals(status)) {
        // Another migration of this user is in progress; wait until it completes (a maximum waiting time can be added to avoid an endless loop)
        do{
            Thread.sleep(10);
            status = executeRedis("get MigrateStatus:${userId}");
        }while (!"COMPLETED".equals(status));
    }else {
        // No status yet: prepare to migrate this user
        String result = executeRedis("set MigrateStatus:${userId} MIGRATING ex 86400 nx");
        if ("OK".equals(result)) {
            // After acquiring the lock, query all installed APPs of this user [the migration is done per user_id]
            String sql = "select * from installed_app where user_id=#{userId}";
            List<InstalledApp> userInstalledApps = executeSql(sql);
            // Migrate all installed APPs of this user into the sharded tables (user_id determines the target table)
            shardingInsertSql(userInstalledApps);
            // After the migration is complete, update the cached status (kept for 10 days)
            executeRedis("setex MigrateStatus:${userId} 864000 COMPLETED");
        }else {
            // The lock was not acquired, so someone else holds it and is migrating; waiting here until that migration completes is also an option
        }
    }
}
// Business logic that touches the `installed_app` table -- recording a newly installed APP
public void addInstalledApp(InstalledApp installedApp) throws Exception{
    // Trigger passive migration first
    migratePassive(installedApp.getUserId());
    // Insert the user's newly installed APP (installedApp) into the target sharded table
    shardingInsertSql(installedApp);
}

Whatever the CRUD operation is, it first checks the value of MigrateStatus:${userId} in the cache (a read-path sketch follows the list below):

 

  1. If the value is COMPLETED, the migration has finished and the request can be routed directly to the sharded tables.

  2. If the value is MIGRATING, a migration is in progress; wait in a loop until the value becomes COMPLETED, i.e. until the migration has finished, and then route the request to the sharded tables.

  3. Otherwise the value is empty: try to acquire the lock and perform the migration, update the cached value to COMPLETED once it finishes, and finally route the request to the sharded tables.
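As a minimal sketch of a read path following the same three-way check (shardingSelectSql is an assumed helper, mirroring the pseudocode style above):

// Queries the user's installed APPs from the sharded tables,
// triggering passive migration first so the data is guaranteed to be in the target table
public List<InstalledApp> listInstalledApps(long userId) throws Exception {
    // Same passive-migration entry point as in addInstalledApp()
    migratePassive(userId);
    // With user_id, the concrete sharded table can be located
    String sql = "select * from installed_app where user_id=#{userId}";
    return shardingSelectSql(sql, userId);
}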

 

3. Perfecting the plan

Even after all data has been migrated, every CRUD operation would still check the value of MigrateStatus:${userId} in the cache, which by then is redundant. A global switch can be added: once all data has been migrated, the new value of the switch is broadcast, for example via a message TOPIC, and after receiving the message every service caches the switch locally. From then on, CRUD operations no longer need to check MigrateStatus:${userId} in the cache.
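A minimal sketch of such a switch, assuming some message listener subscribes to the broadcast TOPIC (the class and method names are assumptions, not part of the original article):

public class MigrationSwitch {
    // Cached locally in each service instance after the broadcast is received
    private static volatile boolean migrationCompleted = false;

    // Called by the message listener that subscribes to the broadcast TOPIC
    public static void onMigrationCompletedMessage() {
        migrationCompleted = true;
    }

    public static boolean isMigrationCompleted() {
        return migrationCompleted;
    }
}

// In migratePassive(), return immediately once the switch is on:
// if (MigrationSwitch.isMigrationCompleted()) { return; }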

 

4. Remaining work

After the migration is complete, take the active-migration program offline, remove all calls to migratePassive() from the business code, and optionally integrate a third-party sharding middleware such as sharding-jdbc; see the sharding-jdbc integration tutorial for details.

 

Review and summary

Looking back at this solution, its biggest drawback is that if some sharding-column value (such as one userId) has a very large number of records and a passive migration collides with the active migration while that user is being migrated, the passive migration may have to wait for a long time.

 

However, judging by typical DB performance, batch-inserting 1,000 rows is usually at the 10 ms level, and all records with the same sharding-column value land in a single sharded table, so no cross-table writes are involved. Therefore, as long as SQL statistics run before the migration confirm that the table contains no such abnormal sharding-column values, you can migrate with confidence.
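Such a pre-migration check could simply count rows per sharding-column value; a sketch in the same pseudocode style as above (the threshold of 10,000 is an assumed value, not from the original article):

// Find sharding-column values with an abnormally large number of rows before migrating
String checkSql = "select user_id, count(*) as cnt from installed_app " +
                  "group by user_id having count(*) > 10000";
List<Map<String, Object>> abnormalUsers = executeSql(checkSql);
// If abnormalUsers is not empty, handle those users separately (e.g. migrate them off-peak) before the general migration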

 

When the author migrated the installed_app table, no user had more than 200 installed APPs, so the performance impact of collisions did not need much thought. There is no universal solution, only the solution that fits your situation.

 

If some sharding-column values do have tens of thousands of records, those values can be collected and cached first, and the migration program can go online at night and migrate the data for those cached values first, so that the impact of the migration on those users is reduced as much as possible. You may of course come up with an even better solution.

 

Article source: https://mp.weixin.qq.com/s/uKGdK-1jP0q6xiJyczGwFw

 

