What has OceanBase achieved in the year since it entered the public cloud?

*This article is reproduced from the WeChat public account "Heart of the Machine" (ID: almosthuman2014).

Today, the database market is entering a new stage of competition: competition in the cloud.

In 2022, public cloud databases accounted for more than half of China's database market for the first time [1], and that share is expected to keep growing. Many Chinese database vendors have seized on the cloud computing trend and moved actively into cloud databases.

The move is not easy, however. Many enterprises already run traditional database products. To persuade them to adopt or migrate to the cloud, vendors must reassure customers about the new technology, solve the technical and business challenges of data migration, provide strong data security and privacy protection, and offer competitive prices while maintaining product quality and service levels. Given the inherent "stickiness" of database products, this is extremely difficult.

The cloud database market has many participants, each striving to offer better products and services and win a larger market share.

As a leading native distributed database, OceanBase launched its cloud database, OB Cloud, in August 2022. After a year of development, OB Cloud has won many customers around the world, including Haidilao, Qidianhuo, and Keruyun in new retail; Li Auto in manufacturing; Amap, Ctrip, Kuaishou, Zuoyebang, Yiou Education, and GCash in the Internet industry; and Onion Group, Zongteng Group, Di Sifang, and others in the cross-border sector.

So what has OB Cloud done right? This article examines four aspects.

In May 2023, Li Auto began moving its autonomous driving and car cloud systems to OB Cloud in batches to meet the challenges posed by a large number of cloud scenarios. Behind this decision were more than a year of stable operation of its production line operation and maintenance system on OceanBase and a recovery time objective (RTO) measured in seconds.

Why is RTO important to automobile manufacturing? 

An automobile assembly line is a highly complex, automated system that generates large amounts of data. This includes machine data (operating data from the equipment on the line, such as temperature, pressure, rotation speed, and current), process data (parameters of the manufacturing process, such as processing time, material usage, and product quality data), and Manufacturing Execution System (MES) data (such as production plans, inventory management, order information, and material requirements planning).

Some large car manufacturers may generate up to several petabytes (1PB=1000TB) of data every day.

Analyzing this data helps manufacturers understand their production processes, optimize them, and improve efficiency and product quality. However, handling such a huge volume of data requires a powerful database system, and obtaining one is a major problem many manufacturers face during digital transformation.

As Li Auto has grown rapidly in recent years, the amount of data in its production line systems has increased dramatically. Its database system began to hit performance bottlenecks when processing large numbers of concurrent requests or large-scale data, seriously threatening the stable operation of the production line. For car companies, the production line is a lifeline, and keeping it running smoothly and efficiently is crucial. A failure in any system on the line can halt production, and every second of downtime means heavy losses in manpower and resources.

Against this background, Li Auto began developing its own smart manufacturing operating system, Li-MOS, and set out to find an extremely stable, reliable, and scalable database to meet its stability and high-availability requirements. "Since OceanBase went into production, it has run stably without a single failure," recalled Zhao Haijun, head of DBA at Li Auto.

RTO is a key measure of how long a system takes to return to normal operation after a failure. It covers the entire period needed to detect the problem, start recovery, and resume operations until the system is running normally again. It has therefore become a core indicator of how well a database recovers from failures in online applications.
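
To make the definition concrete, the minimal Python sketch below measures recovery time from four timestamps matching the phases just described (failure, detection, start of recovery, and return to normal service) and compares it with an RTO target. The class name, field names, and example timings are illustrative assumptions, not OceanBase measurements or APIs.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FailureTimeline:
    """Hypothetical timestamps, in seconds, recorded around a single failure."""
    failed_at: float            # the fault occurs
    detected_at: float          # the fault is detected
    recovery_started_at: float  # recovery (for example, failover) begins
    restored_at: float          # the system is serving normally again

    def phases(self) -> Dict[str, float]:
        # Break the outage into the phases that make up recovery time.
        return {
            "detection": self.detected_at - self.failed_at,
            "recovery_start": self.recovery_started_at - self.detected_at,
            "resumption": self.restored_at - self.recovery_started_at,
        }

    def recovery_time(self) -> float:
        # The whole window from failure to normal operation.
        return self.restored_at - self.failed_at


def meets_rto(timeline: FailureTimeline, rto_seconds: float) -> bool:
    """True if the measured recovery time is below the RTO target."""
    return timeline.recovery_time() < rto_seconds


# Hypothetical outage: detected after 4 s, failover starts 0.5 s later,
# normal service resumes 2.5 s after that.
outage = FailureTimeline(failed_at=0.0, detected_at=4.0,
                         recovery_started_at=4.5, restored_at=7.0)
print(outage.phases())
print(outage.recovery_time(), meets_rto(outage, rto_seconds=8.0))
```

Because the phases add up to the total, shortening any one of them, detection in particular, lowers the recovery time directly.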

In 2014, OceanBase became the first in the industry to propose an RTO of under 30 seconds, with the entire fault recovery process fully automated and no manual intervention required. During that year's Double Eleven shopping festival, Alipay transactions ran on a distributed database with no data loss (RPO = 0) and uninterrupted service (RTO < 30 s), a world first. Today, RTO < 30 s has become the de facto standard in the distributed database industry.

In 2022, OceanBase 4.0 achieved an RTO of under 8 seconds for the first time, truly reducing fault recovery time from minutes to seconds.

Cutting 22 seconds, from 30 seconds to 8, may look simple, but it involves many technical and engineering challenges. As with a tire change in a Formula 1 race, every second saved reflects a deep understanding and precise control of technology, infrastructure, and team collaboration, and above all of the application scenarios and business processes.

In version 4.0, OceanBase made major architectural changes, redesigning and reimplementing its lowest-level election and consistency protocols with many optimizations. For election, it no longer relies on absolute time across nodes; the process is fully message-driven, which shortens the overall lease-based election time to under 4 seconds. In addition, a fault detection mechanism was redesigned within the higher-level RPC framework: when the leader fails, the system immediately elects a new one, and service can switch to the new leader within about a hundred milliseconds. For consistency, all follower replicas replay the leader's writes in real time and in parallel, so a follower can take over service immediately after the leader fails. Moreover, thanks to innovations built on the Paxos algorithm and dynamic log stream technology, OceanBase can outperform MySQL in stand-alone mode, reaching close to 2 million TPS in test scenarios.
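
The following minimal Python sketch illustrates the general idea of a message-driven lease: a follower treats every message from the leader as a lease renewal and measures elapsed time only on its own monotonic clock, so nodes never need to agree on absolute time. The 4-second lease, the class names, and the injectable fake clock are assumptions for illustration; the article does not describe OceanBase's election protocol at this level of detail.

```python
import time
from typing import Callable


class LeaderLease:
    """Follower-side view of the leader's lease, renewed only by received messages.

    Elapsed time is measured on this node's own monotonic clock, so no
    agreement on wall-clock time between nodes is required.
    """

    def __init__(self, lease_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.last_renewed = clock()

    def on_leader_message(self) -> None:
        # Any message from the leader (heartbeat or log entry) renews the lease.
        self.last_renewed = self.clock()

    def leader_expired(self) -> bool:
        # After the lease lapses locally, this follower may start an election.
        return self.clock() - self.last_renewed > self.lease_seconds


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
lease = LeaderLease(lease_seconds=4.0, clock=clock)

clock.now = 3.0
print(lease.leader_expired())   # False: still within the lease
lease.on_leader_message()       # leader message arrives at t = 3.0
clock.now = 6.5
print(lease.leader_expired())   # False: 3.5 s since renewal
clock.now = 7.5
print(lease.leader_expired())   # True: 4.5 s since renewal, leader presumed failed
```

A real protocol must also stop the old leader from serving once its lease lapses, typically by having the leader give up slightly before followers consider the lease expired so that clock drift cannot create two leaders; this sketch leaves that out.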

Paxos is regarded as the "origin" of consensus protocols and the most fault-tolerant of them, but it is also the hardest to implement. As early as version 1.0, OceanBase fully and independently implemented a log synchronization mechanism based on the Multi-Paxos algorithm, and it has refined that mechanism in extreme scenarios for many years. Because the implementation was entirely in-house from the start, OceanBase was able to make these innovations in its underlying architecture.
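
Why majority-based replication such as Multi-Paxos can deliver RPO = 0 while tolerating slow replicas can be shown with a little arithmetic. In the sketch below, a log entry counts as committed once a majority of replicas has persisted it, so any surviving majority still overlaps the replicas that hold it; and commit latency is set by the majority-th fastest acknowledgment, not the slowest. This captures only the commit rule (leader election, log ordering, and recovery are omitted), and the latency figures are hypothetical.

```python
from typing import Sequence


def majority(n_replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return n_replicas // 2 + 1


def is_committed(acks: int, n_replicas: int) -> bool:
    """An entry is committed once a majority has persisted it.

    The leader's own durable write counts as one acknowledgment.
    """
    return acks >= majority(n_replicas)


def tolerable_failures(n_replicas: int) -> int:
    """Replicas that can fail while a majority remains to keep serving."""
    return n_replicas - majority(n_replicas)


def commit_latency_ms(ack_latencies_ms: Sequence[float]) -> float:
    """Time until the majority-th acknowledgment arrives (non-empty input).

    Replicas slower than that do not delay the commit.
    """
    ordered = sorted(ack_latencies_ms)
    return ordered[majority(len(ordered)) - 1]


# Hypothetical three-replica group: the leader's local write,
# a same-city replica, and a distant remote replica.
print(commit_latency_ms([1.0, 2.0, 30.0]))              # 2.0
print(commit_latency_ms([1.0, 2.0, 3.0, 40.0, 45.0]))   # 3.0
print(tolerable_failures(3), tolerable_failures(5))      # 1 2
```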

After upgrading to OceanBase, the database jitter frequency of Li Auto's production line execution system dropped by about 80% on average. For common fault events, the team has truly achieved "recover first, analyze later," greatly improving the stability of the system. Combined with an intelligent operation and maintenance system, Li Auto's production line execution system can recover from faults automatically and quickly without on-duty staff, achieving "autonomous driving" for the production line database.

OB Cloud fully supports OceanBase 4.x and provides the same high-availability service. After the production line operation and maintenance system had run stably for 17 months, Li Auto decided to move its cloud-based database systems, such as those for autonomous driving and the car cloud, to the cloud version of OceanBase and keep meeting strict RTO targets in the cloud.

Let’s look at another story.

As the largest e-wallet app in the Philippines, GCash, with 60 million registered users, is known as the "Alipay of the Philippines." However, as the business expanded rapidly, storage and computing costs grew just as fast, putting the company under heavy cost pressure.

By 2020, GCash was handling on the order of a million transactions per day, with more than 18 TB of new data arriving every month and still growing at about 10%. To process this data, the operations team had to invest heavily in data splitting (sharding), which consumed a great deal of manpower and time and could also hurt system performance and stability. Meanwhile, pressure on storage space kept rising, and database administrators (DBAs) often had to spend whole nights cleaning up and archiving data to free space. This was only a stopgap that could not solve the underlying problem, and it raised operation and maintenance costs further.

At its busiest, the operations team had to manage more than 200 MySQL instances. With such a large business volume, it was hard to evolve the system smoothly to support new businesses, and in extreme cases data loss could occur.

GCash urgently needed a new cloud storage solution to cope with the cost of rapid data growth.

In the end, attracted by OB Cloud's efficient, scalable, and cost-effective data storage services and by OceanBase's extensive experience in financial payments, GCash chose OB Cloud as its next-generation storage foundation. OB Cloud offers the same data compression capabilities as OceanBase.

OceanBase's self-developed storage engine, built on an LSM-tree architecture, applies adaptive encoding and compression based on the characteristics of the stored data. In past customer deployments, it has reduced storage to as little as one-tenth of what the customer's original database system used.

Compression is a natural way to reduce storage costs. But the ultimate goal of compression is to cut costs and improve efficiency together, and cost reduction must not come at the expense of efficiency. A high compression ratio therefore has to rest first on high performance, and second on compression that suits real business scenarios.

Using its self-developed encoding and compression technology, OceanBase automatically selects the most suitable encoding for each data type and distribution, achieving efficient compression without sacrificing performance. Suppose you need to store these 15 numbers: "0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377". You could store them directly, one number per storage unit, for a total of 15 units. Alternatively, noticing that each number is the sum of the previous two, you could store the single description "the first 15 Fibonacci numbers" in one unit. The same information then takes far less space, and the compression is lossless. This is, of course, a simplified example; the real process is much more complex.
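
Choosing an encoding from the data's own characteristics can be illustrated with a toy cost model. The Python sketch below estimates, in bits, the size of a non-negative integer column under three common encodings (plain fixed-width, run-length, and delta with zigzag mapping) and picks the smallest. The encodings and cost formulas are illustrative assumptions; the article does not say which encodings OceanBase uses or how it chooses among them.

```python
from itertools import groupby
from typing import Callable, Dict, List, Tuple


def bit_width(x: int) -> int:
    """Bits needed to store a non-negative integer (at least 1)."""
    return max(1, x.bit_length())


def zigzag(d: int) -> int:
    """Map signed deltas to non-negative ints: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return 2 * d if d >= 0 else -2 * d - 1


def plain_cost(values: List[int]) -> int:
    # Every value stored at the width of the largest value.
    return len(values) * bit_width(max(values))


def rle_cost(values: List[int]) -> int:
    # One (value, run length) pair per run of identical values.
    runs = [len(list(group)) for _, group in groupby(values)]
    return len(runs) * (bit_width(max(values)) + bit_width(max(runs)))


def delta_cost(values: List[int]) -> int:
    # First value stored as-is, then zigzag-mapped differences.
    deltas = [zigzag(b - a) for a, b in zip(values, values[1:])]
    if not deltas:
        return bit_width(values[0])
    return bit_width(values[0]) + len(deltas) * bit_width(max(deltas))


ENCODINGS: Dict[str, Callable[[List[int]], int]] = {
    "plain": plain_cost,
    "rle": rle_cost,
    "delta": delta_cost,
}


def choose_encoding(values: List[int]) -> Tuple[str, int]:
    """Return (encoding name, estimated bits) for the cheapest candidate."""
    if not values:
        raise ValueError("column must be non-empty")
    costs = {name: cost(values) for name, cost in ENCODINGS.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]


print(choose_encoding(list(range(1000, 1010))))  # ('delta', 28): steadily rising IDs
print(choose_encoding([3] * 50 + [7] * 50))      # ('rle', 18): long runs of status codes
print(choose_encoding([200, 5, 210, 3]))         # ('plain', 32): no exploitable pattern
```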

Beyond column-wise encoding that exploits data characteristics, OceanBase also supports general-purpose compression that does not depend on them. A data block can be encoded first and then compressed with a general-purpose algorithm, yielding an even higher compression ratio.
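
The "encode first, then compress" pipeline can be sketched with Python's standard library. Here delta encoding stands in for the data-aware stage and zlib (DEFLATE) for the general-purpose stage; both choices, and the hypothetical timestamp column, are assumptions for illustration rather than OceanBase's actual formats.

```python
import struct
import zlib
from itertools import accumulate
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Stage 1 (data-aware): keep the first value, then store differences."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    # Running sums restore the original values exactly (lossless).
    return list(accumulate(deltas))


def pack(values: List[int]) -> bytes:
    # Fixed-width 64-bit little-endian signed integers.
    return struct.pack(f"<{len(values)}q", *values)


def unpack(data: bytes) -> List[int]:
    return list(struct.unpack(f"<{len(data) // 8}q", data))


def two_stage_compress(values: List[int]) -> bytes:
    """Stage 2: general-purpose zlib compression of the encoded block."""
    return zlib.compress(pack(delta_encode(values)), level=6)


def two_stage_decompress(blob: bytes) -> List[int]:
    return delta_decode(unpack(zlib.decompress(blob)))


# Hypothetical column: 10,000 timestamps growing by small, irregular steps.
ts = list(accumulate([1_693_000_000] + [1 + i % 3 for i in range(9_999)]))
raw = pack(ts)
zlib_only = zlib.compress(raw, level=6)
two_stage = two_stage_compress(ts)
assert two_stage_decompress(two_stage) == ts  # round trip is lossless
print(len(raw), len(zlib_only), len(two_stage))
```

On this column the encoded-then-compressed output is far smaller than zlib alone, because delta encoding turns slowly rising timestamps into a short repeating pattern that a general-purpose compressor handles very well; exact byte counts depend on the zlib build.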

These encoding formats are designed with query performance in mind, and the compression path is designed so that it does not degrade computational efficiency or decompression performance.

As a result, with an efficient storage engine and high-speed cloud block storage, OB Cloud reduced GCash's data storage footprint by 70% and its database resource costs by 40%. These results far exceeded GCash's expectations, brought it significant benefits, and allowed it to serve its growing user base more effectively.

In today's consumer market, new traffic patterns and consumption habits are constantly emerging.

Annual holidays, especially the Qixi Festival (Chinese Valentine's Day) and "Shuangdan" (Christmas and New Year's Day), are very important to catering and retail companies such as Haidilao. During digital transformation, a system's ability to handle traffic peaks and troughs directly affects the business.

Haidilao's purchase, sales, and inventory system is a typical example. In the first half of 2021 alone, Haidilao purchased more than 2.8 billion yuan worth of fruit, vegetables, and meat, sourced from 29 provinces, municipalities, and autonomous regions including Xinjiang, Guizhou, and Yunnan. The data volume is enormous, and handling it smoothly directly affects the quality and timely supply of ingredients.

As the business grew rapidly, the purchase, sales, and inventory system, built on a traditional database, faced mounting challenges. Data on the purchasing, storage, sales, and supply of ingredients and materials in stores nationwide changes in real time, creating high concurrency. Changes to ingredients and materials in store sales orders must match the quantities in the inventory module; any mismatch can cause overstocking or stockouts. And if order status is unclear, customer service suffers and dining satisfaction drops.

Every Qixi Festival and Shuangdan is the busiest time not only for Haidilao's employees but also for its data processing systems. Because inventory for some best-selling items changes very quickly, a single record may need to handle thousands of updates per second. The system must also analyze and summarize changes in item quantities in real time so that stock can be prepared and supplied promptly.

Haidilao's biggest concern became how to scale the database up and down more flexibly, safely, and cheaply, so that it could handle every holiday traffic peak smoothly.

To cope with rapid changes in traffic, Haidilao's business database needed flexible scaling: running stably at a smaller scale during off-peak periods to avoid wasting resources, and expanding quickly during peak periods to stay stable through the holidays.

Based on Haidilao's business characteristics and needs, OB Cloud built a cloud solution around the multi-level elastic scaling it inherits from OceanBase.

In OB Cloud, each business (tenant) has its own isolated resources. These resources are drawn from a shared resource pool and can be adjusted dynamically to match the business's actual needs. This design allocates resources more precisely and avoids waste, and it lets each business respond to traffic changes more quickly.
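
The tenant model described above can be pictured as isolated quotas carved out of one shared pool. The Python sketch below is a hypothetical model, not OB Cloud's API: the class names, the two resource dimensions (vCPUs and memory), and the rule that a resize succeeds only if the pool has room are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Resources:
    cpu: float        # vCPUs
    memory_gb: float  # memory in GB

    def fits_within(self, other: "Resources") -> bool:
        return self.cpu <= other.cpu and self.memory_gb <= other.memory_gb


@dataclass
class ResourcePool:
    """Hypothetical model: tenants draw isolated quotas from one shared pool."""
    capacity: Resources
    tenants: Dict[str, Resources] = field(default_factory=dict)

    def free(self, exclude: str = "") -> Resources:
        # Capacity left after all tenants except `exclude` take their quotas.
        others = [r for name, r in self.tenants.items() if name != exclude]
        return Resources(self.capacity.cpu - sum(r.cpu for r in others),
                         self.capacity.memory_gb - sum(r.memory_gb for r in others))

    def resize_tenant(self, name: str, new_quota: Resources) -> bool:
        """Create, grow, or shrink one tenant's quota if the pool has room."""
        if not new_quota.fits_within(self.free(exclude=name)):
            return False  # pool exhausted: the cluster itself must be scaled
        self.tenants[name] = new_quota
        return True


pool = ResourcePool(Resources(cpu=32, memory_gb=128))
print(pool.resize_tenant("inventory", Resources(8, 32)))   # True
print(pool.resize_tenant("orders", Resources(8, 32)))      # True
print(pool.resize_tenant("inventory", Resources(20, 80)))  # True: holiday peak
print(pool.resize_tenant("orders", Resources(16, 64)))     # False: only 12 vCPUs left
print(pool.resize_tenant("inventory", Resources(8, 32)))   # True: shrink after the peak
```

When a resize fails because the pool is exhausted, the remaining option is to change the cluster itself, which is the next step the article describes.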

When business traffic is very large, adjusting tenant specifications alone may not be enough, and the cluster itself must be adjusted. In OB Cloud, you can adapt to changing needs by changing server configurations (vertical scaling) or by adding or removing servers (horizontal scaling); the latter is difficult to achieve with primary/standby architectures such as MySQL. The two kinds of scaling can be combined for greater flexibility, handling sudden traffic surges and drops while keeping the business stable and efficient.
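
Vertical and horizontal scaling can be combined as in the hypothetical planner below, which first tries to resize the existing servers and adds servers only when the largest available server spec is not enough. It assumes throughput grows linearly with total CPU cores, which real databases only approximate; the names, the per-core throughput figure, and the scale-up-first policy are illustrative assumptions, not OB Cloud behavior.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterShape:
    servers: int
    cores_per_server: int

    def capacity(self, tps_per_core: float) -> float:
        # Assumes throughput scales linearly with total cores.
        return self.servers * self.cores_per_server * tps_per_core


def plan_scaling(current: ClusterShape, target_tps: float,
                 tps_per_core: float, max_cores_per_server: int) -> ClusterShape:
    """Hypothetical planner: vertical scaling first, horizontal if needed."""
    if current.servers < 1:
        raise ValueError("cluster needs at least one server")
    cores_needed = math.ceil(target_tps / tps_per_core)
    # Vertical: resize every existing server (up for peaks, down for troughs).
    per_server = math.ceil(cores_needed / current.servers)
    if per_server <= max_cores_per_server:
        return ClusterShape(current.servers, max(per_server, 1))
    # Horizontal: use the largest spec and add servers.
    servers = math.ceil(cores_needed / max_cores_per_server)
    return ClusterShape(servers, max_cores_per_server)


now = ClusterShape(servers=3, cores_per_server=8)  # 24,000 TPS at 1,000 TPS/core
print(plan_scaling(now, 40_000, 1000.0, 32))   # 3 servers x 14 cores
print(plan_scaling(now, 150_000, 1000.0, 32))  # 5 servers x 32 cores
print(plan_scaling(now, 6_000, 1000.0, 32))    # 3 servers x 2 cores (off-peak)
```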

During the recent Qixi Festival, Haidilao's purchase, sales, and inventory system saw twice last year's peak traffic. With the support of OB Cloud, the system's real-time analytical computing power increased by 45% and overall database costs fell by 50%, allowing it to pass the holiday test.

OceanBase is also exploring flexibility of business and architecture at a higher level. It has introduced an innovative "integrated single-machine and distributed architecture" that lets users, from small personal sites on the public cloud to large bank core systems and giant e-commerce websites on private and hybrid clouds, balance cost-effectiveness and high availability according to their own characteristics at each stage of business growth, instead of being forced by technical constraints to accept capabilities they do not need.

This raises a new challenge: how to provide a unified, efficient cloud database solution across complex cloud architectures and diverse computing scenarios.

Cloud computing has evolved from separate public and private clouds to hybrid cloud architectures spanning multiple data centers, a shift toward more complex architectures and hybrid cloud scenarios. More and more enterprises deploy applications and data across multiple infrastructures. On the one hand, they benefit from the flexibility and fast response of hybrid cloud environments; on the other, they can choose different cloud infrastructures for different application scenarios and make full use of each cloud service's unique strengths.

For example, Li Auto deploys its production line manufacturing system privately in its own data center, while its car cloud and autonomous driving systems run on several different cloud infrastructures across multiple public cloud regions. Even if some functions fail, the overall service is unaffected, keeping car owners safe on the road.

However, this model also brings many technical challenges. Differences in features and performance among database products on different cloud infrastructures increase operational complexity and make resource integration harder. Traditional standalone databases are hard to scale and suffer from single-point bottlenecks, so they cannot meet the low-latency, multi-location access needs of systems such as the Internet of Vehicles. And although some database products solve the scalability problem, their consistency protocols are sensitive to network latency, which can cause write jitter and unstable service in remote data centers or on unreliable networks, making it hard to meet the low-latency requirements of Internet of Vehicles and autonomous driving services.

Facing the challenge of multiple infrastructures requires a flexible, scalable hybrid cloud architecture solution that can unify and simplify these environments while delivering consistent performance and functionality.

OceanBase does not depend on dedicated hardware, supports different cloud infrastructures, and uses a shared-nothing architecture. With OB Cloud, Li Auto can deploy the full OceanBase platform in its data center while getting consistent features and management interfaces across different cloud infrastructures and services, greatly improving the integration and management efficiency of its storage foundation. At the same time, OB Cloud's native high-availability architecture recovers quickly and automatically from local single-point failures and provides stable service even in cross-region deployments, keeping key systems such as connected cars running safely and protecting the driving experience of car owners.

Today, Li Auto has built a world-leading manufacturing system with OceanBase and runs its car cloud business across clouds and in multi-region active-active mode on OB Cloud, ensuring production line continuity and business stability.

OB Cloud's first year shows that OceanBase keeps innovating and iterating with a focus on real results.

China has the world's largest data volumes, and its user scenarios are the most likely to give rise to original innovation. Both technology and engineering must stay grounded in real-world use: a single parameter change or a millisecond-level error can cause all kinds of problems that have to be worked out step by step and continuously improved. Over the past few years, OceanBase has embraced open source and its community and launched cloud databases. Alongside product research and development, it has also been advancing documentation, training, and other supporting measures.

As Bjarne Stroustrup, the creator of C++, put it: "There are only two kinds of languages: the ones people complain about and the ones nobody uses."

Cloud databases have huge development potential and prospects, and they need to rely on the collaboration of the entire ecosystem to succeed. This is a difficult process that requires significant investment and sustained effort.

In the end, word of mouth from customers will be the most powerful proof.

* References:

[1] "Database Development Research Report (2023)", China Communications Standards Association, July 2023

Origin blog.csdn.net/OceanBaseGFBK/article/details/132743406