YashanDB: Only sustained hard work, not shortcuts, brings breakthroughs in core database technology

Among the three main categories of basic software, the database is often described as the hardest to crack. It has a high technical threshold, a long R&D cycle, and demanding engineering requirements, and the market has long been dominated by a few giants.

Achieving a breakthrough has therefore long been the aspiration of China's database industry. Since the 1980s, the industry has gone through more than 40 years of hardships and ups and downs and has finally earned a place in the market. But as China gradually grows into the world's largest datasphere, its database industry faces a new situation:

On the one hand, the digital economy has brought sustained and abundant demand for databases, and the outlook for China's database market is generally optimistic. On the other hand, hundreds of database companies have emerged, redundant development is prominent, and a certain degree of chaos has appeared. More importantly, in the face of increasingly complex and diverse data processing needs, database theory and core technologies urgently need breakthroughs to better meet future market needs.

Only when the sand is washed away does the gold appear. Where should China's database industry go? Recently, the YashanDB team of the Shenzhen Institute of Computing Science gave an exclusive interview to Big Data Online and discussed the development of China's database industry and related topics. Wang Nan, product director of YashanDB, believes that database development must break through key core technologies, and that the only way forward is steady hard work. At present, YashanDB is exploring a new path that integrates industry, academia and research: it is committed to breakthroughs in database theory and core technologies, and to applying cutting-edge research results closely to market needs in order to create world-class database products.

Unchecked growth is not advisable

IDC data shows that China's relational database market was worth US$3.43 billion in 2022, up 23.9% year on year; by 2027, it is expected to reach US$10.27 billion, a compound annual growth rate of 24.5%. A CICC research report also estimates the overall domestic replacement market for databases from 2023 to 2027 at about 40 billion yuan.
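The IDC growth figures can be sanity-checked with the standard compound annual growth rate formula. The short sketch below is only an arithmetic check on the numbers quoted above; the function name `cagr` is chosen for illustration, and the five-year span (2022 to 2027) is an assumption about how the rate was computed.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


# Market size in billions of US dollars, as quoted from IDC above.
size_2022 = 3.43
size_2027 = 10.27

# Assumption: the rate is compounded over the five years from 2022 to 2027.
rate = cagr(size_2022, size_2027, 2027 - 2022)
print(f"Implied CAGR: {rate:.1%}")
```

Under that assumption the implied rate rounds to the 24.5% reported, so the two market-size figures and the growth rate are consistent with each other.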

There is no doubt that China's database market has huge potential. At the same time, the China Academy of Information and Communications Technology's "White Paper on Database Development" reports that China now has 150 database companies and as many as 238 database products. Against a backdrop of growing uncertainty in the external environment, this flourishing of a hundred flowers has indeed made the market prosperous, but it has also made unchecked growth and redundant development increasingly prominent.

As basic software, databases have their own rules. The emergence of such a large number of companies in a short period of time may cause two challenges:

First, the influx of short-term capital creates an illusion of prosperity, but the overall market is not large enough to support so many companies, and the prospects of most of them are doubtful. Second, the database is a field of basic software that requires continuous investment; redundant development fragments the market's talent and funding and damages overall competitiveness.

A close look at Chinese database companies shows that most of them are closely tied to the two major open source databases, MySQL and PostgreSQL. Open source has undeniably played a key role in the rapid development of China's database industry, and it will certainly remain one of the industry's important trends. But open source ≠ free. With the rise of cloud computing, disputes over interests have arisen frequently. MySQL's GPL license is among the stricter open source licenses, and how MySQL develops in the future depends on Oracle's attitude. Quickly packaging open source code into "quick" products in order to seize the market carries huge risks for the future.

This approach of “taking shortcuts” has already had consequences. For example, CSDN's "2022-2023 China Basic Software and Hardware-Database Developer Survey Report" shows that only 31% of developers have a positive view of domestic databases, while 69% have a negative view.

"There are no shortcuts for basic software such as databases. For databases to keep developing, we need sufficient strategic resolve and a focus on theoretical innovation and technological breakthroughs, so as to truly solve the fundamental problems of databases," said Wang Nan, product director of YashanDB.

There is no shortcut to breakthroughs in database core technology

In essence, databases are the heavy industry of the software world: they are engineering-intensive, require large investment, are slow to produce results, and offer highly uncertain returns. Anyone who wants to make a difference in the database field must face the four most critical challenges: capital, technology, talent and commercialization.

For example, database R&D requires sustained, large-scale funding. With low investment and reliance on open source "shortcuts", it is fundamentally difficult to build core competitiveness, and such companies also face problems such as a shortage of core talent and weak commercialization.

But the most important challenge is undoubtedly technological breakthrough. The current database market is similar to the early stage of the new energy vehicle market. There are a large number of companies in the market, but there are not many companies that truly master the core technology. Among the key technical challenges of databases, innovation in database theory is the most critical, and the development of core technologies depends on innovation in database theory.

Today, user business types, scenario scale and data volumes have all changed dramatically, so theoretical innovation in databases is urgent. This is also the direction on which Chinese database companies need to concentrate. Only innovation and breakthroughs in database theory can bring comprehensive change to product technology and thereby support the needs of future business scenarios.

In the current Chinese database market, the Shenzhen Institute of Computing Science is one of the few institutions dedicated to research and innovation in database theory. Its theoretical research team has originated a series of innovative theories, including bounded evaluation, data-driven approximation, and concurrent transaction scheduling theory, and is committed to continuing to explore breakthroughs in core database technology.

For example, bounded computing theory reduces computation over big data to processing over small data, while approximate computing can answer queries over big data accurately and efficiently with limited hardware. These theoretical results from the Shenzhen Institute of Computing Science are of great practical value to many industry users in the era of big data.
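The idea of reducing a big-data computation to small-data processing can be illustrated with a toy sketch. It is not YashanDB's implementation: the `BoundedIndex` class, the `customer_id` and `amount` fields, and the bound of 10 rows per key are all hypothetical, chosen only to show how a query can touch a bounded number of rows regardless of table size when the data satisfies such an access constraint.

```python
from collections import defaultdict


class BoundedIndex:
    """Hypothetical index: each key maps to at most `bound` rows."""

    def __init__(self, rows, key, bound):
        self.bound = bound
        self._buckets = defaultdict(list)
        for row in rows:
            bucket = self._buckets[row[key]]
            if len(bucket) >= bound:
                # The data does not satisfy the assumed access constraint.
                raise ValueError("more than `bound` rows for one key")
            bucket.append(row)

    def fetch(self, value):
        """Return the (at most `bound`) rows stored under `value`."""
        return self._buckets.get(value, [])


def full_scan_total(rows, customer_id):
    """Baseline: read every row; cost grows with the table size."""
    touched, total = 0, 0
    for row in rows:
        touched += 1
        if row["customer_id"] == customer_id:
            total += row["amount"]
    return total, touched


def bounded_total(index, customer_id):
    """Bounded plan: read only the rows the index returns for this key."""
    fetched = index.fetch(customer_id)
    return sum(row["amount"] for row in fetched), len(fetched)


# Synthetic table: 100,000 rows, exactly 10 rows per customer.
rows = [{"customer_id": i % 10_000, "amount": i} for i in range(100_000)]
index = BoundedIndex(rows, key="customer_id", bound=10)

scan_total, scan_touched = full_scan_total(rows, 42)
fast_total, fast_touched = bounded_total(index, 42)
print(scan_total == fast_total, scan_touched, fast_touched)
```

Both plans return the same answer, but the bounded plan reads at most 10 rows however large the table grows, while the scan reads all of them. Building the index is a one-time cost that this sketch does not count.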

Performance and cost remain the core factors in database product selection. However, computing resources grow far more slowly than data. Even when machines are stacked to add computing power, it is difficult to meet the computing requirements of massive data, and doing so multiplies operation and maintenance problems and costs. Bounded computing and approximate computing are expected to break the constraints of traditional database theory and take database performance and cost-effectiveness to a new level.

For example, testing showed that in one business scenario involving real-time queries over billions of records, 91% of queries could be answered using bounded computing, and more than 70% of queries ran 25 to 140,000 times faster. The remaining 9% of queries, which do not meet the conditions for bounded computing, can be handled through data-driven approximate computing theory.
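For queries that cannot be answered within a fixed data bound, the general trade-off behind approximate answers can be shown with a much simpler stand-in. The sketch below uses plain uniform random sampling, which is not the data-driven approximation theory described here; the function names, the synthetic data and the sample size are all assumptions made for illustration.

```python
import random


def exact_average(values):
    """Exact answer: reads every value."""
    return sum(values) / len(values)


def sampled_average(values, sample_size, seed=0):
    """Approximate answer: reads only `sample_size` randomly chosen values."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    sample = rng.sample(values, sample_size)
    return sum(sample) / sample_size


# Synthetic column of 200,000 values; the sample reads 1% of them.
values = list(range(200_000))
exact = exact_average(values)
estimate = sampled_average(values, sample_size=2_000)
relative_error = abs(estimate - exact) / exact
print(f"exact={exact:.1f} estimate={estimate:.1f} error={relative_error:.2%}")
```

The sample size is fixed in advance, so the work done stays the same as the data grows; the price is an answer that is close to, rather than equal to, the exact one.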

However, the path from theoretical innovation to a commercial product is by no means easy and requires continuous verification, iteration and optimization. The YashanDB R&D team started verification from a prototype, worked through various difficulties and challenges, and gradually integrated these two theoretical results into YashanDB. In the latest version, YashanDB does not need to access all of the data during big data analysis; it only needs to fetch a small data set to get the desired result. In measurements, as the data volume grew from 10GB to 1TB, YashanDB's response latency remained sub-second, with performance improved more than a thousandfold and no degradation, delivering excellent performance and cost-effectiveness.

It is reported that YashanDB is original from its core theory through to its key technologies, and is highly compatible with mainstream databases. Its product capabilities are relatively comprehensive: built on the YashanDB core, it offers a variety of product forms such as stand-alone/active-standby, shared cluster and distributed deployments, covers OLTP, HTAP and OLAP workloads, and provides a complete tool system. Wang Nan revealed that YashanDB recommends different product forms depending on the user's scenario.

"We design products with the goal of improving computing performance per unit of resource cost, rather than stacking machines to pursue the 'upper limit of scale,'" Wang Nan said. In OLTP scenarios, YashanDB uses technologies such as fine-grained concurrency control, lock-free transaction optimization and adaptive concurrency scheduling algorithms to maximize single-machine transaction processing performance, and provides benchmark test configurations and test data that can be used in production. Its performance exceeds that of mainstream commercial databases by more than 30%.

"A few years ago, you might have thought it would take many years for China's database core to mature," Wang Nan said. "But judging from how some of our database products now perform in core business scenarios, as long as we settle down and tackle core database technology, we will definitely be able to conquer it."

Just as domestic new energy vehicles have gradually taken the lead by conquering core technologies such as autonomous driving, smart vehicles and chassis, and have stood out in the market, database companies that commit to R&D and breakthroughs in key technologies lay a solid foundation from the start and can be expected to gradually achieve market leadership. "There is no overtaking on the bend in databases. Mastering the core technology is the key. If the core technology is insufficient, even if you take a 'shortcut' at the beginning, you will not get far in the future," Wang Nan said.

Commercialization cannot be done “on paper”

Overall, China's database industry is in a period of prosperity and is accelerating the key shift from "quantity" to "quality". In this shift, commercialization is a question that many Chinese database companies must answer.

In the database market, breakthrough core technology and powerful products are not enough; commercialization is the key to realizing the value of product technology. China's database companies are still relatively young. In the past, because database giants such as Oracle had long occupied the market, many Chinese database companies, even those with many technological and product innovations, found it hard to get opportunities to be verified in core business scenarios such as finance. They thus fell into a vicious circle in which technology, products and scenarios could not reinforce one another, and the road to commercialization was extremely tortuous.

Nowadays, as independent and controllable technology systems have become an important support for the development of China's digital economy, Chinese databases have also ushered in an opportunity to break the vicious circle. In Wang Nan's view, Chinese database companies need to focus on four aspects: scenario verification, application transformation, selection cost, and service capabilities to accelerate commercialization.

The first is scenario verification. For example, core financial business scenarios place extremely high demands on database performance, reliability and stability. As independent and controllable technology at the hardware level gradually enters core business scenarios, it brings a series of challenges such as database adaptation and performance fluctuations. Wang Nan said bluntly: "If a database is to achieve large-scale replication, it must be verified in key industries and key scenarios. Only by moving forward step by step can we achieve large-scale replication across the breadth of industries and business scenarios."

YashanDB, for example, has already achieved considerable coverage and verification with customers and key scenarios in key industries such as finance and central and state-owned enterprises.

The second is the challenge of application transformation. Because of their long histories and complex business systems, financial institutions such as banks must solve the cost problems that come with scale, for example when migrating to distributed architectures. "This is a key contradiction and a huge challenge for database companies and users," Wang Nan added.

The third is to reduce customers' selection costs. Because product quality is uneven, customers currently pay too high a price to evaluate and choose products. The way to break the deadlock is to provide honest, trustworthy and cost-effective products, fair and transparent prices, a complete ecosystem, and worry-free service.

Finally, service capabilities must be improved. A common dilemma for domestic database companies is that complex scenarios demand heavy service investment, which relies heavily on DBA teams.

Compared with other commercial database companies, YashanDB, backed by the Shenzhen Institute of Computing Science, is a representative of the "integrated" industry-academia-research database, and its commercialization path has attracted particular attention from the industry. Wang Nan said that YashanDB has the Institute's strong scientific research resources behind it. Going forward, it hopes to accelerate commercialization, bring good innovations in the database field to market, and deliver more value to the digital transformation of Chinese enterprises. It is reported that YashanDB will further accelerate its marketization and commercialization, and that work on productization, key industries and ecosystem partners is also being pushed forward intensively and methodically.

"We have enough confidence and strategic determination to make YashanDB a success!" Wang Nan concluded.

Origin blog.csdn.net/dobigdata/article/details/132732650