The August "China Database Industry Analysis Report" has been released, focusing on data warehouses and debuting the first [Global Data Warehousing Industry Map]

To help readers keep up with the development of China's database industry and make sense of the current database market and product ecosystem, the industry analysis and research team of the Motianlun Community has published a monthly "China Database Industry Analysis Report" since April 2022, with the aim of spreading data technology knowledge and promoting technological innovation and the growth of the industry ecosystem. The series has now reached its sixteenth issue, and a 122-page 2022 annual analysis report has also been released.

Motianlun's August "China Database Industry Analysis Report" has been officially released, and everyone is welcome to download and read it. The report reviews Motianlun's "China Database Popularity Ranking", new product releases, investment and financing, and other industry news to show current development trends in the database market.

This issue focuses on the data warehouse. It introduces the architectural evolution and technical principles of data warehouses in detail, summarizes five major technical characteristics and six major development trends, and publishes the [Global Data Warehousing Industry Map] for the first time. It closes by selecting typical data warehouse products from China and abroad and explaining their principles and characteristics, with the aim of giving readers a more comprehensive and in-depth grasp of data warehouse technology and its application in practice.

1. Database rankings and cutting-edge trends

 Contents of this chapter

  • Analysis of Chinese database popularity rankings in August

A total of 286 databases took part in the Motianlun Chinese Database Popularity Ranking in August 2023. Movement within the top ten intensified this month: OceanBase held first place for the ninth consecutive month, TiDB rose one place from the previous month to second, and Alibaba Cloud PolarDB climbed for the second consecutive month to reach fourth place.

A number of promising products also ranked higher this month than last, with many databases in the 10th to 50th range climbing strongly. For example, Apache Doris, the open source OLAP database originally developed by Baidu, rose one place from the previous month to 16th; Alibaba Cloud Hologres joined the ranking for the first time in August and entered at 22nd, close to the top 20; KunDB, a distributed relational database from Xinghuan Technology, rose 3 places to 28th; gStore, an open source native graph database system for RDF knowledge graphs developed by the Data Management Laboratory of the Wangxuan Institute of Peking University, rose to 31st; and BigInsights, an intelligent database (AiSQL) product written in C++ and independently designed and developed by Bigmath, climbed 63 places from the previous month and now ranks 33rd.

  • Database industry development trends

The report collates recent industry news that has attracted attention, including investment and financing and new product launches. In August 2023, the Ministry of Finance, together with the Ministry of Industry and Information Technology, studied and drafted government procurement requirement standards for databases, operating systems, general servers, anti-virus software, middleware, portable computers, desktop computers, all-in-one computers, and workstations, among others. The procurement requirement standards for databases cover both distributed and centralized databases. In addition, Transwarp Scope 2.5, an enterprise-level interactive platform for data retrieval and statistical analysis independently developed by Xinghuan Technology, was released, and the report explains its features and functions; the database startup Neon received US$46 million in financing; and Oracle announced the full launch of MySQL HeatWave Lakehouse, which enables customers to query data in object storage as quickly as data inside the database. Due to space limitations, only a few screenshots are shown here. Please refer to the report for details.

2. Overview and technology evolution of data warehouse

 Contents of this chapter

  • Basic overview of data warehouse

Before data warehouses existed, data analysts had to collect, clean, and integrate data from multiple sources and make partial copies of the data for each decision support environment. The process was time-consuming and its accuracy was low. And because systems were iterated and replaced quickly, the data source was often an old business system that had already been taken offline, which made data analysis even harder. It was against this background that the data warehouse came into being.

Chapter 2 of the report introduces the origin, layered architecture, and basic characteristics of the data warehouse, as well as the evolution of its architecture. A data warehouse is a central repository of integrated data from one or more disparate sources. It stores current and historical data in one place and is used to create analytical reports for employees across the enterprise. It is subject-oriented, integrated, non-volatile, and time-variant.

Since Inmon proposed the concept of the data warehouse in 1990, data warehouse architecture has gone through many rounds of evolution: from the original traditional (offline) data warehouse architecture, to the offline big data architecture, the Lambda architecture, the Kappa architecture, and the unified stream-batch architecture that the popularity of Flink brought to prominence. Each step has made it easier for users to perform real-time computation in the most natural way and at the lowest cost.

In addition, the report summarizes the development history of data warehouses in stages: early exploration, the era of enterprise-wide integration, the era of enterprise data integration, the era of chaos (the debate among the "fathers of the data warehouse"), the era in which the theoretical models were settled, and the era in which many competing data warehouse products emerged. The aim is to help readers follow this history from beginning to end; the details can be found in the report.

  • Analysis of Data Warehouse Technology

The report uses an architecture diagram to show the core components of the data warehouse: the central database, ETL (extract, transform, load) tools, metadata, and access tools. It then analyzes in detail the five key technologies of the data warehouse: the query optimizer, the MPP architecture, vectorization, columnar storage, and data compression.

The main goal of the query optimizer is to select the execution plan that minimizes the cost of running a query, thereby improving query performance. The MPP architecture speeds up the preprocessing of data from multiple sources so that the data can be organized into a form suitable for analysis. Vectorization improves the efficiency of data analysis, so it is widely used in data loading, conversion, data analysis, complex queries, and other operations. According to the report, columnar storage offers a higher compression ratio and faster read and write efficiency than row storage, and can process higher quality data. In data warehouses, compression is usually performed using a combination of row-based and column-based methods to improve storage efficiency. Only some of these features are briefly listed here. For more detailed information, please refer to the report.
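To make the columnar storage and compression points concrete, here is a minimal sketch in Python. It is not taken from the report: the sample table, the column names, and the choice of run-length encoding are all assumptions made for illustration, and real data warehouses use far more elaborate storage formats. It shows why grouping values by column helps: repeated values in one column sit next to each other and compress well, and an aggregate only needs to read the column it uses.

```python
from itertools import groupby

# Hypothetical sample table, stored row by row (row-oriented layout).
rows = [
    ("north", "2023-08-01", 120),
    ("north", "2023-08-02", 80),
    ("north", "2023-08-03", 100),
    ("south", "2023-08-01", 50),
    ("south", "2023-08-02", 70),
]


def to_columns(rows, names):
    """Pivot row tuples into one list per column (column-oriented layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


columns = to_columns(rows, ("region", "day", "amount"))

# Adjacent repeated values in the "region" column collapse into two runs.
encoded_region = rle_encode(columns["region"])

# An aggregate over one column touches only that column's values.
total_amount = sum(columns["amount"])
```

Run-length encoding is only one of several column-friendly schemes; it is used here because it is the simplest way to show that a sorted or low-cardinality column shrinks when its values are stored together.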

The wave of digital transformation has brought many new concepts with it. To help readers tell them apart, this chapter also introduces terms that are often confused with the data warehouse, such as the data lake, the integrated lake and warehouse (lakehouse), and the intelligent lakehouse. Interested readers can download the report to learn more.

3. Current status and future trends of data warehouses

 Contents of this chapter

Chapter 3 of the report analyzes the current state and development trends of the data warehouse. At present, China's data warehouse market still faces problems such as vendors' short development history, a small market size, and cloud migration that lags behind the United States. However, Chinese enterprises have richer digital scenarios and a more urgent need for digitalization. Overall, China's data warehouse market has great development potential and is expected to grow rapidly. IDC predicts that by 2027, China's data warehouse software market will reach US$2.73 billion, with a five-year compound annual growth rate (CAGR) of 25.7% from 2022 to 2027.
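As a quick check on what the IDC figures imply, the short Python sketch below works backwards from the 2027 forecast and the stated CAGR to an approximate 2022 market size. The 2022 value is derived here and is not stated in this article, and the function names are hypothetical helpers for illustration.

```python
def implied_start_value(end_value, cagr, years):
    """Starting value that grows to end_value after `years` at rate `cagr`."""
    return end_value / (1 + cagr) ** years


def compound_annual_growth_rate(start_value, end_value, years):
    """CAGR implied by growing from start_value to end_value over `years`."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above: US$2.73 billion in 2027, 25.7% CAGR over 2022-2027.
market_2027 = 2.73  # US$ billion
cagr = 0.257
years = 5

# Derived estimate of the 2022 market size (an inference, not a quoted figure).
market_2022 = implied_start_value(market_2027, cagr, years)
print(f"Implied 2022 market size: about US${market_2022:.2f} billion")
```

Under these assumptions the implied 2022 base is a little under US$0.9 billion, which is consistent with the article's remark that the market is still small.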

As new technologies keep emerging, data warehouses will develop in the direction of real-time analysis, cloud-native serverless, lake-warehouse integration, HTAP, digital intelligence integration, and streaming data warehouses. The report provides a detailed interpretation of these six development trends. Due to space limitations, detailed screenshots are not shown here. You can download the report to read more.

Finally, the report compiles and publishes the [Global Data Warehousing Industry Map], which classifies global data warehouse products along two dimensions: open source versus commercial, and Chinese versus foreign. We hope it helps everyone gain an in-depth understanding of the development of the data warehouse industry. You can download the report to view the high-definition version.

4. Analysis of typical cases of database products

The last chapter of the report selects typical data warehouse products from China and abroad as cases and introduces their core architecture, functional features, and application practices.

The foreign products include Snowflake, an elastic data warehouse with complete SQL support and support for semi-structured and schema-less data; it is a multi-tenant, transactional, secure, and highly scalable elastic system. Teradata, a pioneer of the data warehouse market, is mainly suited to building large-scale data warehouse applications and officially announced in 2023 that it will gradually end its direct operations in China. BigQuery, Google's fully managed enterprise data warehouse, helps users manage and analyze data with built-in capabilities such as machine learning, geospatial analysis, and business intelligence, and uses the cloud data warehouse to power data-driven innovation.

The domestic products include Apache Doris, a modern data warehouse for real-time analysis. It is a high-performance, real-time analytical database based on the MPP architecture that supports both high-concurrency point query scenarios and high-throughput complex analysis scenarios. GBase 8a is a distributed logical data warehouse whose main market is business analysis and business intelligence; it can be applied in sectors with massive volumes of business data, such as government, party committees, security-sensitive departments, national defense, and statistics. GaussDB (DWS), Huawei Cloud's enterprise-level cloud distributed data warehouse service, is an online data processing database based on cloud infrastructure and platforms that provides ready-to-use, scalable, and fully managed services. Finally, the chapter introduces ArgoDB, developed by Xinghuan to help enterprises build a one-stop real-time data warehouse, and Hologres, a one-stop real-time data warehouse engine developed by Alibaba Cloud. Only part of this chapter is shown here. You can download the report for more.

This article excerpts and organizes only part of the August "China Database Industry Analysis Report". For the complete and detailed content, you can download the full text of the report. We also welcome colleagues in the data industry to get in touch, join the discussion, and offer suggestions. Let's witness and support the development and growth of China's database industry together!

Download address for the full text of the report: https://www.modb.pro/doc/116039


More content can be found in the Motianlun Data Community, which provides one-stop services for the learning and growth of data professionals and continues to promote knowledge sharing and technological innovation in the data field. Add the community's Motianlun Assistant (WeChat: modb666) to get more technical information.


Origin blog.csdn.net/Era666/article/details/132625399