"2022 Open Source Big Data Heat Report" released

On November 5, at the Integrated Big Data Intelligence Summit of the Yunqi Conference, the "2022 Open Source Big Data Heat Report" was released. The report was jointly produced by the Open Atom Open Source Foundation, X-lab Open Laboratory, and the Alibaba Open Source Committee.

Liu Jingjuan, deputy secretary-general of the Open Atom Open Source Foundation, gave an in-depth interpretation of the report. Based on the 102 most active open source big data projects identified from public data, the report explores the "Moore's Law" behind the development of open source big data technology: every 40 months, the heat value of open source projects doubles and the technology completes a round of iteration. In the past 8 years there have been 5 large-scale shifts in technology heat, and diversification, integration, and cloud native have become the most prominent features of the current development trend of open source big data.

Quantitative analysis of open source trends in the "post-Hadoop era"

As the origin of open source big data technology, Hadoop emerged in 2006 and now has a 16-year history. The report collects public data from Hadoop's 10th year (2015) to the present, performs correlation analysis, defines a research model for the heat value of open source projects, and uses quantitative indicators to describe how actively each project is developed and iterated and how popular it is with developers.

The open source big data heat map presented in the report examines the heat performance of the shortlisted projects at three levels: the overall technology landscape, technology stack categories, and individual projects. It correlates key events in each project's history with its heat performance and draws on interviews with experts from open source foundations and well-known open source projects, in an attempt to identify general patterns of healthy project development and to summarize a methodology for effectively increasing a project's influence.

"Moore's Law" of open source big data technology

The report found that the heat value doubles every 40 months, with open source big data completing one round of technology iteration in that time, and that the technology cycle is shortening at an accelerating pace. Over 8 years there have been multiple rounds of heat shifts, each reflecting an upgrade in the underlying technology. Developers have sustained their enthusiasm for "data query and analysis", which has ranked first in heat value for 8 consecutive years. In 2017, the heat value of "stream processing" surpassed that of "batch processing", and big data processing entered the real-time stage. As data volumes keep growing and data structures become more diverse, "data integration" has surged since 2020.
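To put the 40-month doubling figure in more familiar terms, the short Python sketch below converts it into an equivalent annual growth rate and an 8-year growth factor. It assumes steady exponential growth, which is a simplification: this article does not describe the report's actual heat model, and the function name `growth_factor` is purely illustrative.

```python
DOUBLING_MONTHS = 40  # doubling period quoted in the report


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Multiplicative growth after `months`, ASSUMING constant exponential growth."""
    return 2.0 ** (months / doubling_months)


# Equivalent yearly growth rate implied by a 40-month doubling period.
annual_rate = growth_factor(12) - 1

# Growth over the 8-year (96-month) window covered by the report.
eight_year_factor = growth_factor(96)

print(f"Equivalent annual growth: {annual_rate:.1%}")
print(f"Growth over 8 years: {eight_year_factor:.2f}x")
```

Under this assumption, a 40-month doubling period corresponds to roughly 23% growth per year, or a little more than a fivefold increase over 8 years.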

Three Hot Trends: Diversification, Integration, and Cloud Native

The diversification of user needs drives the diversification of technology. "Data Lake" ranks first in heat value growth with a compound annual growth rate of 34%, followed by "Interactive Analysis" and "DataOps" in second and third place. Product iteration in the original Hadoop ecosystem has stabilized, with a compound annual growth rate of 1% in heat value.
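The two compound annual growth rates quoted above can be compared by converting each into a doubling time. The following is a minimal sketch using the standard compound-growth relationship; the helper name `doubling_time_years` is hypothetical, and the calculation assumes each rate stays constant, which the report does not claim.

```python
import math


def doubling_time_years(cagr: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + cagr)


# CAGR figures quoted in the report.
data_lake_cagr = 0.34
hadoop_cagr = 0.01

data_lake_months = doubling_time_years(data_lake_cagr) * 12
hadoop_years = doubling_time_years(hadoop_cagr)

print(f"Data Lake doubles in about {data_lake_months:.0f} months")
print(f"Hadoop ecosystem doubles in about {hadoop_years:.0f} years")
```

At a constant 34% per year, heat value would double in a little over 28 months, faster than the 40-month overall figure, while at 1% per year doubling would take about 70 years.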

Since 2015, the computing layer has led the way into "integration". Its typical representative, unified stream and batch processing, reached its first heat peak in 2019. Storage integration, represented by data lake storage, has entered a new stage of development since 2019, and hot projects such as Delta Lake, Iceberg, and Hudi have emerged.

Cloud native is reshaping the open source technology stack at scale. Open source projects born in the cloud-native era have sprung up rapidly. Fields such as "data integration", "data storage", and "data development and management" have seen major turnover in projects, and new projects now account for more than 80% of the heat value.

Open source big data heat ranking: Top 30

The report selects a Top 30 heat ranking from the 102 shortlisted projects. Kibana topped the list with a heat value of 989.40. ClickHouse (data query and analysis), Airflow (data scheduling and orchestration), Flink (stream processing), and Airbyte (data integration) each ranked first in their respective segments. A number of Chinese open source projects, such as Pulsar, Doris, StarRocks, DolphinScheduler, and SeaTunnel, also showed strong heat trends. Treating the solution of user pain points as their core competitive strength is a common feature of these excellent open source projects. It is what keeps them in step with the times and makes them "evergreens" in the heat trend.

Thanks to Kaiyuan China (OSCHINA), InfoQ, and the Alibaba Cloud developer community for their strategic support; thanks to the 32 experts and contributors who made important contributions to this report; and thanks to CSDN, DataFun, SegmentFault, Kaiyuanshe, and other communities for their cooperation.

Report download address:

https://www.openatom.org/other/%E5%BC%80%E6%BA%90%E5%A4%A7%E6%95%B0%E6%8D%AE%E7%83%AD%E5%8A%9B%E6%8A%A5%E5%91%8A2022.pdf

Origin blog.csdn.net/OpenAtomFund/article/details/128236526