Online Data Migration, a Compulsory Course in the Digital Age: JD Cloud's Data Migration Practice

"Breaking down data boundaries" is a phrase often heard in the digital age. The value of data is realized in its flow, and so are its applications. To meet the needs of development, testing, data protection, disaster recovery, and data analysis, we constantly replicate, back up, and migrate data, which makes data migration critically important.

Data migration needs and scenarios surge in the hybrid multi-cloud era

Today, let's focus on data migration in the hybrid multi-cloud era and look at several common enterprise data migration requirements and scenarios:

  1. Traditional enterprises moving to the cloud: equipment is aging and needs replacement, a hardware refresh is not cost-effective, and the cloud is more economical;

  2. Price-sensitive customers: comprehensively compare the prices of multiple vendors and flexibly select the most cost-effective solution;

  3. Disaster recovery-driven: multiple or heterogeneous clouds are required to build one's own disaster recovery system and ensure data security;

  4. Game customers: servers are opened in different regions to serve users there. Because network quality varies from place to place, a multi-cloud model is needed to build game servers close to the users they serve;

  5. Pan-financial customers: data migration is required to meet financial security and compliance policies.

These customers have adopted hybrid multi-cloud solutions due to factors such as system and technology upgrades, business development, and security compliance. They place high demands on data migration, and they face a variety of challenges under different business models and needs.

The Dilemma of Data Migration in the Hybrid Multi-Cloud Era

The development and diversity of databases raise the threshold for migration

In the hybrid multi-cloud era, migration is a major challenge, and secure data migration is often what enterprises care about most. To understand why data migration is hard, let's first briefly review the evolution from relational to non-relational databases.

A relational database organizes data using the relational model, first proposed by Dr. E. F. Codd, a researcher at IBM, in 1970. Over the following decades, the relational model was fully developed and gradually became mainstream. Relational databases offer transactional consistency, real-time reads and writes, and structured, normalized data, so they are easy to understand, use, and maintain. Typical representatives include Oracle, Microsoft SQL Server, DB2, and MySQL.

However, with the rapid development of the Internet, high user concurrency and massive data volumes meant traditional relational databases could no longer meet enterprises' storage needs. Databases that relax structural consistency, offer more flexible usage, and scale well gradually became the first choice for many enterprises.

The term NoSQL was first coined by Carlo Strozzi in 1998. Typical representatives are Redis, Amazon DynamoDB, Memcached, Microsoft Azure Cosmos DB, etc.

When it comes to migrating these two kinds of databases, relational (SQL) RDBMSs have a long history, a complete ecosystem of migration tools, and most database products ship their own migration utilities. Non-relational NoSQL databases, by contrast, have looser data definitions and lightweight products; by abandoning consistency checks they invite misuse of some data structures, which increases the difficulty of migration. Most NoSQL migration tools come from the open-source ecosystem and, given the shorter development history, are not as mature as those for RDBMSs.

Vendors' attempts at "data lock-in" make migration even harder

Having reviewed the evolution from relational to non-relational databases, we can see that data storage structures themselves have changed dramatically, which greatly increases the difficulty of migration at the root. On top of that, some cloud vendors have re-engineered their databases, making opaque changes to the underlying relational engine that greatly increase complexity. They try to use the high cost of data migration to lock in customers for the long term, making migration even harder.

"Individual differences" lead to lack of universal solutions

Because companies' needs differ and usage scenarios vary, each customer is a new case for us, and services must be "tailor-made". In the process, however, we have summarized several common difficulties:

  • Difficulty 1: multi-node database migration, where the number of nodes on each side is inconsistent

  • Difficulty 2: cross-version problems in native products, with inconsistent versions and insufficient compatibility between older and newer releases

  • Difficulty 3: Cached data is more volatile

Faced with challenges, JD Cloud breaks through the migration dilemma

Below, we share through actual cases how we broke through the migration dilemma and helped users free themselves from the shackles of their data.

Major challenges, met with composure

In 2019, to achieve its three major goals of asset-light operation, cost reduction, and architecture upgrades, JD Logistics began migrating its Elasticsearch (ES) clusters from local data centers to the cloud. Relying on the high availability, easy scaling, and near-real-time characteristics of JD Cloud Search ES, JD Logistics successfully moved hundreds of systems to the cloud, including sorting-center automation, cold-chain process monitoring, and open order tracking.

During the migration, JD Cloud offered not only a conventional downtime migration solution but also a special non-stop migration solution to keep the logistics business in service. The non-stop solution works as follows: create a new cluster in the cloud; join the cloud cluster and the user's cluster into one large cluster; use Elasticsearch's shard allocation filtering API (cluster.routing.allocation.exclude) to move the index data from the user's nodes onto the cloud cluster's data nodes; finally, split the large cluster and shut down the user's cluster to complete the migration. (This approach requires two conditions to hold at the same time: the user cluster and the cloud cluster run the same version, and all nodes of both clusters can reach each other over the network.)
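The allocation-filtering step above can be sketched as a minimal illustration. The snippet below (node IPs are hypothetical placeholders) only builds the documented `_cluster/settings` body that drains shards off the listed nodes; in a real migration it would be sent to the cluster over HTTP:

```python
import json

def build_exclude_settings(node_ips):
    """Build the _cluster/settings body that tells Elasticsearch to relocate
    shards away from the listed nodes (the cluster.routing.allocation.exclude
    filter mentioned above)."""
    return {
        "transient": {
            "cluster.routing.allocation.exclude._ip": ",".join(node_ips)
        }
    }

# Hypothetical on-premises data nodes to be drained:
payload = build_exclude_settings(["10.0.0.1", "10.0.0.2"])
print(json.dumps(payload))
```

In practice this body is sent with `PUT /_cluster/settings`, and relocation progress can be watched via the `_cat/shards` API until no shards remain on the excluded nodes, at which point the old nodes can be removed safely.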

Through this approach, nearly 100 JD Logistics systems, comprising hundreds of clusters and thousands of nodes, were quickly moved to the cloud. On top of this, JD Cloud also provides core functions such as one-click alerting, offering round-the-clock, all-round protection for the workloads on the cloud.

To date, more than 90% of JD Logistics' applications have been deployed on the public cloud. During last year's 11.11 shopping festival, business volume was more than three times the daily level, and the whole system ran stably.

Migration tool, a must for cloud migration

Relational databases remain one of the mainstream choices across industries, and how to quickly migrate data from a traditional relational database to the cloud concerns many users. To this end, JD Cloud built a dedicated migration tool for relational databases: Data Transfer (DTS).

Data Transfer (DTS) provides real-time data streaming, supporting data migration, data subscription, and data synchronization, and can easily serve business scenarios such as migration to the cloud, asynchronous business decoupling, remote disaster recovery, and data flow between business systems. DTS currently supports migrating MySQL, MariaDB, Percona, SQL Server, PostgreSQL, and other databases, making it easy and fast to move a locally self-built database to JD Cloud. The source database can keep running normally during the migration, minimizing application downtime.
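The reason the source can keep serving traffic is the general two-phase pattern such tools follow: copy a full snapshot while recording ongoing changes (in MySQL's case, the binlog), then replay those changes on the target so it catches up. A toy sketch of that pattern, with plain dicts standing in for databases and a hypothetical change log (this illustrates the idea, not DTS's actual implementation):

```python
class ChangeLog:
    """Records writes made to the source while the snapshot copy is running
    (a stand-in for something like the MySQL binlog)."""
    def __init__(self):
        self.entries = []

    def record(self, key, value):
        self.entries.append((key, value))

def migrate(source, target, changelog):
    # Phase 1: full snapshot copy. The source keeps accepting writes,
    # which land in the change log.
    target.update(dict(source))
    # Phase 2: replay the changes captured during the copy so the target
    # converges; only the final cutover needs a brief pause.
    for key, value in changelog.entries:
        target[key] = value
    return target
```

With this split, downtime shrinks from "the whole copy" to "the moment of cutover", which is why the article stresses that the source database can continue running during migration.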

Continuous breakthrough, technological innovation

An online advertising company needed to migrate Redis from its own data center to the cloud. Because the customer's systems carried a large volume of settlement and service caches, the migration could not interrupt service. Open-source tools existed at the time but did not meet the requirements, mainly because of version issues: the customer ran Redis 4.0, while the open-source tools then supported only 3.2.8 and below. In line with JD's customer-first principle and its spirit of encouraging technical innovation, we asked whether we could build a general-purpose tool for customers covering most Redis data-flow scenarios. Thus RedisSyncer 1.0 was born in July 2019, completing basic functions such as source and target verification, native cluster synchronization, and splitting of large KVs.
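The "splitting of large KVs" mentioned above addresses keys (say, a huge hash) that are too big to move in one command. A minimal sketch of the idea, using a plain dict in place of a Redis hash; in a real tool the fields would be read incrementally with HSCAN and written back with batched HSET commands:

```python
def split_big_hash(big_hash, batch_size):
    """Split a large hash's fields into fixed-size batches so the key can be
    written to the target as several small commands instead of one huge one
    that would block the server or overflow buffers."""
    items = list(big_hash.items())
    return [dict(items[i:i + batch_size])
            for i in range(0, len(items), batch_size)]

# Hypothetical oversized hash with 10 fields, moved 4 fields at a time:
batches = split_big_hash({f"field{i}": i for i in range(10)}, batch_size=4)
```

Each batch can then be sent as its own write, keeping every individual command small while preserving the full key contents.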

After 1.0, we soon welcomed several customers:

  • The first was an Internet-industry user with a single Redis instance whose data volume was modest, about 20 GB. By optimizing details such as fixing startup parameters and tuning the batch size, we successfully completed the migration;

  • The second was a user in the game industry who needed to migrate Redis from their own IDC to JD Cloud. Before using our product, the user had tried several open-source tools, none of which met the requirements. Because they had a large number of instances, after learning about RedisSyncer's features the user decided to run the migration themselves with our tool.

Staying close to users, helping in every way

After an afternoon of remote training, the user quickly got the hang of the first instance, and its migration went smoothly. Over the next few days, the user completed the remaining migrations with our tool, praised the product highly in their feedback, and even sent a thank-you letter.

Recognition from customers is the biggest driving force for us to keep moving forward!

Constantly polish, keep improving

After analyzing more customers' pain points and needs, at the end of November 2019 we completed version 2.0, adding functions such as split synchronization modes, resumable transfer, offline file loading, cross-version migration, and streaming loading.

Soon afterwards, in December 2019, we welcomed another user, this time in finance. The user needed to migrate a native Redis cluster to a self-developed Redis cluster. The target cluster had 16*2 nodes, that is, a cluster composed of 16 master-slave pairs. The migration went very smoothly, and the application cutover was completed after 15 minutes of preparation.

(migration deployment diagram)
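Migrating into a 16-master cluster means every key must land on the node that owns its hash slot. The key-to-slot mapping is defined by the Redis Cluster specification: CRC16 (XMODEM variant) of the key, modulo 16384, with an optional {hash tag} so related keys can share a slot. A self-contained sketch of that mapping:

```python
def crc16(data: bytes) -> int:
    """CRC16/XMODEM (poly 0x1021, init 0), the checksum Redis Cluster
    uses for key hashing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 Redis Cluster hash slots. If the key
    contains a non-empty {hash tag}, only the tag is hashed, which forces
    related keys onto the same slot."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

A migration tool uses this mapping to route each key read from the source to the target master that owns the corresponding slot.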

After polishing against real scenarios, we fixed a number of bugs that are hard to hit in testing and added new features. The product supports not only upgrade migration but also downgrade migration; to improve the user experience, we also built a command-line client named redissyncer-cli, taking cues from excellent open-source products such as Redis and MySQL. With that, the RedisSyncer 3.x upgrade was complete, and the project can basically meet most needs in Redis migration and synchronization scenarios.

More than that, breakthrough innovation

We initially positioned RedisSyncer as a synchronization tool for Redis. With development and practice on the user side, we next want to build it into Redis data-synchronization middleware with enterprise-grade disaster recovery capabilities. There is a gap between a tool and enterprise-grade disaster recovery, so our next focus is the distributed transformation of the software: when any node fails, its tasks should fail over and continue automatically, realizing Redis data-synchronization middleware with enterprise-grade disaster recovery capabilities.
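One common way to get the "tasks continue when a node fails" behavior is lease-based task ownership: each worker periodically renews a lease on its sync task, and if the lease expires, another worker may take the task over. The following is a hypothetical, minimal sketch of that mechanism (not RedisSyncer's actual design), with an explicit clock parameter for clarity:

```python
class TaskLease:
    """Lease on a sync task: the owner must renew before the TTL elapses,
    otherwise any other worker may acquire the lease and resume the task."""
    def __init__(self, ttl: float):
        self.ttl = ttl
        self.owner = None
        self.expires_at = 0.0

    def acquire(self, worker: str, now: float) -> bool:
        # Take ownership only if the task is unowned or the lease lapsed.
        if self.owner is None or now >= self.expires_at:
            self.owner = worker
            self.expires_at = now + self.ttl
        return self.owner == worker

    def renew(self, worker: str, now: float) -> bool:
        # Only the current owner can extend a still-valid lease.
        if self.owner == worker and now < self.expires_at:
            self.expires_at = now + self.ttl
            return True
        return False
```

In a distributed deployment the lease record would live in shared storage, and the worker taking over would resume the sync task from its last checkpoint, which is precisely why resumable transfer (added in 2.0) matters for this roadmap.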

Embrace open source, be inclusive and open

JD Cloud has by now accumulated migration experience covering the Internet, gaming, finance, logistics, retail, and other scenarios. With the hybrid multi-cloud trend upon us, we understand the pain of user migration and are willing to serve customers with a compatible, open mindset, truly handing the right of choice back to users. To let more people enjoy the convenience technology brings, we have fully open-sourced RedisSyncer (open-source address: https://github.com/TraceNature/redissyncer-server ), giving the technology back to the community and bringing convenience to more users and developers!


Origin my.oschina.net/u/4090830/blog/5572326