Real-time and serverless are inevitable choices in the open source big data 3.0 era

At the recent 2023 Yunqi Conference, Alibaba Cloud made its annual release of open source big data products: E-MapReduce, Elasticsearch, and other open source big data products are now fully serverless; a new generation of streaming lakehouse built on the partnership of Flink and Paimon was launched; and, embracing AI, Alibaba Cloud launched a fully managed Milvus service and upgraded the intelligent operations and maintenance tools EMR Doctor and Flink Advisor.

Core components are fully serverless

At the conference, Wang Feng, head of Alibaba Cloud's open source big data platform, reviewed the evolution of Alibaba Cloud's open source big data technology. Since 2009, the platform has passed through the 1.0 era, represented by moving big data onto the cloud, and the 2.0 era, represented by data lakes and real-time processing, and it has now entered the 3.0 era. With the in-depth adoption of cloud-native architecture, the platform's core computing components (Flink, EMR Spark, and StarRocks) and storage components such as OSS-HDFS are all serverless.

Product cost performance improves by more than 2 times

Chen Shouyuan, director of Alibaba Cloud's open source big data products, said that the products have been optimized at the level of underlying technology, including integration of the Yitian 710 chip and enhancements to the self-developed engines. As a result, user cost has been reduced by 50%, engine performance has improved by 1 to 3 times compared with the open source versions, and overall cost performance has improved by more than 2 times.
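The relationship between these figures can be checked with a little arithmetic. The sketch below assumes that "cost performance" means performance per unit of cost, which the announcement does not define, and the function name and the 1.5x figure are hypothetical values chosen for illustration only.

```python
def cost_performance_gain(perf_multiplier: float, cost_multiplier: float) -> float:
    """Return the relative change in performance per unit of cost.

    Assumption: cost performance = performance / cost. Both arguments are
    multipliers relative to a baseline of 1.0.
    """
    if cost_multiplier <= 0:
        raise ValueError("cost_multiplier must be positive")
    return perf_multiplier / cost_multiplier


# Cost halved (the 50% reduction quoted above), performance unchanged.
cost_only = cost_performance_gain(perf_multiplier=1.0, cost_multiplier=0.5)

# Cost halved and a hypothetical 1.5x engine speedup (illustrative value).
with_speedup = cost_performance_gain(perf_multiplier=1.5, cost_multiplier=0.5)

print(cost_only, with_speedup)
```

Under this assumed definition, halving cost alone already doubles cost performance, so any engine speedup on top of it pushes the ratio above 2, which is consistent with the "more than 2 times" claim.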

· Alibaba Cloud E-MapReduce has newly launched Serverless StarRocks and Serverless Spark, providing users with fully managed, operations-free services. The newly upgraded lake storage OSS-HDFS and the lake management platform DLF together provide one-stop services for enterprises building modern, open data lakehouses on open source technology.

· Alibaba Cloud's Realtime Compute for Apache Flink has launched an enterprise-grade data integration solution. Drawing on Flink's pipeline capabilities and its rich ecosystem of upstream and downstream connectors, it enables efficient real-time integration of massive volumes of data.

· Alibaba Cloud's search and analytics service, Elasticsearch, has launched a serverless version that is compatible with open source and can be used on demand. The platform automatically schedules resources and decides when to scale up or down according to fluctuations in business traffic, scaling elastically within seconds so that resources dynamically match load and users pay only for what they use.

Golden partners Flink + Paimon: a new generation of streaming lakehouse

Data analysis is moving from the traditional Hive model to the lakehouse architecture. From a large body of practical experience, Alibaba Cloud has concluded that real-time processing is the next direction in which lakehouse analysis will evolve. Following this trend, Alibaba Cloud has built a new generation of streaming lakehouse solution on the golden partnership of Flink and Paimon, providing users with one-stop data ingestion into the lake, real-time processing, and exploratory analysis, and broadening the scenarios in which a data lake can be used. In addition to real-time computing, Flink batch computing is now production-ready on the cloud, supporting batch data processing and job scheduling on the lake. In a scenario where 500 million records are ingested into the lake, the Upsert performance of Alibaba Cloud's streaming lakehouse solution is more than 4 times that of the open source Hudi solution, and its Scan performance is more than 10 times higher.

Smarter open source big data

With the current AI boom in full swing, Alibaba Cloud's open source big data platform has also brought AI technology into its big data platform system. The upgraded intelligent operations and maintenance tools EMR Doctor and Flink Advisor are now widely used in the operation and maintenance of both customer platforms and Alibaba Cloud's internal platforms. On average, the time needed to identify cluster problems has been reduced by 30%, and the effective utilization rate of cluster resources has increased by 75%, helping the Alibaba Cloud open source big data platform achieve intelligent operations, maintenance, and data management. At the same time, Alibaba Cloud launched a fully managed Milvus service, which provides vector retrieval capabilities for multi-modal data and accelerates customers' AI applications.

Origin my.oschina.net/u/5583868/blog/10143506