Cutting Spark compute costs by 50.18%: building a cloud-native big data foundation with the Kyligence lakehouse engine and doubling computing speed

The 2023 China Open Source Future Development Summit was successfully held on May 13. At the conference's sub-forum on open-source-native business, Zhang Xiaolong, Senior Director of Solution Architecture at Kyligence, delivered a keynote speech titled "The Evolution of the Cloud-Native Big Data Foundation". He shared his views on open source development trends and explained how the Kyligence lakehouse engine can play an important role in building the next generation of cloud-native data foundations: by improving computing performance, it can greatly reduce computing costs. The following is the content of the speech:

Hello everyone, this speech consists of three parts:

In the first part, I will share some of my views on the development of open source, based on my personal experience and on what I have seen and heard.

In the second part, I will explain why I think fundamental, key core technologies will gain new development opportunities.

In the last part, I will introduce how data foundations are evolving in the industry and some of our company's practices around these new opportunities.

In the first part, on my views of open source, I use Kyligence's own experience as supporting evidence and put forward three points:

The first point of view is: open source software technology and its commercialization are important forces driving digital transformation in various fields.

First, let me introduce Apache Kylin and Kyligence.

Apache Kylin™ is an open source, distributed analytical data warehouse.

Founded in 2016 by the founding team of Apache Kylin, Kyligence is a leading provider of big data analytics and metrics platforms.

You can see many company logos here. At present, more than 1,500 companies around the world use Apache Kylin and Kyligence commercial products to solve pain points in digital operations, analysis, and decision-making. The rich practice of these enterprises keeps driving the development of open source and its commercialization, and at the same time open source and commercialization are driving digital transformation in more fields.

Apache Kylin and Kyligence are relatively mature, but they are only one of countless open source and open source commercialization forces. Open source and the commercialization behind it are important forces driving digital transformation in various fields, and vigorously developing these forces is of great significance and value.

The second point of view is: the ecological prosperity of open source software relies on the spillover effects brought about by the vigorous development of the digital economy.

The digital economy consists of two parts: digital industrialization and industrial digitization.

First, digital industrialization builds up a reserve of high-level technology and a large pool of talent, and moves the industry from a zero-sum game toward coordinated development. This is the basic condition for the development of open source.

Furthermore, the technologies and talent produced by digital industrialization play a huge role in industrial digitization. Traditional industries can accelerate their digital transformation by using open source projects and their supporting commercial offerings, which in turn injects sustainable momentum into open source.

I think the timeline of Apache Kylin and Kyligence illustrates this point well. Before 2015, the Kylin project was developed at eBay and then contributed to the Apache Software Foundation. This was the process of digital industrialization. Then, just as industrial digitization was advancing rapidly, the digital transformation of traditional industries supported the commercial development of Kyligence and injected power and vitality into it, enabling it to contribute further to industrial digitization. Since 2016, Kyligence has been an important force in the evolution of open source Kylin, and it has since contributed two more open source projects, Byzer and Gluten. In my opinion, the fundamental factor affecting the prosperity of the open source ecosystem is the level of development of the digital economy and the business environment. We need to adhere to long-termism and win-win cooperation.

The third point of view is: for the open source software ecosystem to create social value, it especially requires long-term, planned, and organized investment.

GitHub surveys and analyzes the open source projects it hosts every year, and its latest conclusions are worth paying attention to. The report mentions that almost all large-scale open source projects are led and maintained by technology companies, and that most of them are key foundational technologies such as frameworks, compilers, and programming languages. Almost all of the open source projects with the largest number of contributors have commercial support behind them.

Still taking Kyligence as an example, the two projects other than Apache Kylin currently led by Kyligence have also achieved good results.

Byzer is a low-code development platform for data and AI. Because commercial support is available, open source contributors from the financial industry are also deeply involved. The project is currently used in production in the financial industry and other industries.

Gluten is a vectorized computing engine whose goal is to deliver several times the computing performance of native Spark. Because Apache Spark is one of the most widely used open source distributed computing engines in big data, Gluten aims to improve the return on IT computing investment for existing Spark users, and to save them cost, through better performance combined with the elasticity of cloud computing.

Now for the second part. The country has proposed a grand strategy for building a digital China. Against this background, I pay special attention to the new development opportunities that data technology will gain, and I will discuss them with you in this part.

The plan for building a digital China is ambitious, systematic, and comprehensive. I think the strongest push for the development of key foundational technologies comes from one of the "two foundations" in the "2522" framework: the strategy of consolidating digital infrastructure. Take the "East Data, West Computing" project, which has been vigorously developed in recent years, as an example. It has laid out a large number of general-purpose data centers, supercomputing centers, intelligent computing centers, and edge data centers, and has put forward a series of development requirements such as "heterogeneous computing power fusion, cloud-network integration, multi-cloud scheduling, east-west collaboration, secure data circulation...". These will clearly promote innovation in foundational technologies such as artificial intelligence, big data, and cloud computing, and the integrated, collaborative application of these technologies will be an important direction of future development.

The "East Data, West Computing" project comprises 8 national computing power hub nodes and 10 national data center clusters. The Chongqing cluster and the Chengdu-Chongqing hub are important computing power nodes, and related industries in Chongqing will have very good development opportunities.

The policies and measures that some regions have recently released for the computing power industry show that domestically developed cloud computing and big data platforms, built on core software and hardware such as servers, compute and storage, cloud platforms, and data circulation, will enter the fast lane. This will further promote innovation in foundational technologies such as artificial intelligence, big data, and cloud computing, which is a very rare development opportunity.

We believe that combining big data, artificial intelligence, and cloud-native technology is a good starting point for seizing these opportunities. In the third part, I will share our practical experience in this area.

We have seen that leading companies at home and abroad, such as Alibaba Cloud and Databricks, are promoting data lakes and integrated lakehouses on cloud-native architecture. Combined with Kyligence's practical experience, we believe that, after Hadoop, cloud-native Kubernetes technology will form a new generation of big data foundation, with stream and batch computing technologies such as Spark and Flink as the unified computing engine and the lakehouse as the unified storage core. This greatly simplifies the data stack, and building low-code, low-barrier data applications on top of it is the general trend of the future.

To follow this trend, Kyligence launched its lakehouse engine. It uses vectorized computing technology and is compatible with applications in the Spark ecosystem, making it a high-performance, agile, elastic, and open engine that supports the operation of lakehouse platforms.

Users can currently deploy and try this technology on a Kubernetes container cloud. There they can see vectorized Spark deliver twice the computing performance of native Spark at 50% lower computing cost.
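As a rough illustration of how a speedup figure and a cost figure relate (this sketch is not from the talk), assume that compute is billed by resource-time and that both engines run on the same resources. Under that assumption, a job that runs s times faster costs 1/s as much, so the saving is 1 - 1/s. The function names below are hypothetical.

```python
def cost_reduction(speedup: float) -> float:
    """Fraction of compute cost saved when a job runs `speedup` times
    faster on the same resources, assuming billing by resource-time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


def implied_speedup(reduction: float) -> float:
    """Speedup needed to reach a given cost reduction (0 <= reduction < 1)
    under the same billing assumption."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


if __name__ == "__main__":
    # A 2x speedup halves the bill under this simple model.
    print(f"{cost_reduction(2.0):.2%}")
    # The 50.18% figure in the title corresponds to a speedup of about 2x.
    print(f"{implied_speedup(0.5018):.3f}")
```

Real bills also depend on storage, data transfer, and how quickly a cluster can scale down, so this simple model should be read as an approximation only.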

This technology is currently in open trial. Some enterprise users have already tried it to reduce the cost of offline computing on the public cloud or to improve the computing performance of Hadoop clusters, and have achieved good results in some scenarios.

Next, I will play a 5-minute demo video to show you: 1. how to deploy the lakehouse engine; 2. how to compare its performance with native Spark; 3. how users can use the new engine to run custom SQL to query or process their own data; 4. how users can quickly add a custom version of the computing engine and compare its cost with that of the lakehouse engine.

In the future, we plan to further enhance the performance and compatibility of the vectorized Spark engine and to strengthen its connection and collaboration with various Spark applications. By integrating fully with cloud-native technology, we will improve the engine's elasticity, agility, and performance, greatly improve computing energy efficiency, and greatly reduce computing costs. Through an open strategy, we will provide users with reliable and sustainable support.

  • In terms of elasticity, resources will be accessed on demand, loads can be scaled extremely quickly, and resources are highly isolated;
  • In terms of high performance, the operators of vectorized computing will be further enhanced, compatible with general computing platforms, and support the use of chips with multiple architectures for computing acceleration;
  • In terms of agility, it will support heterogeneous computing platforms running in different places, and support multi-cloud and cross-cloud;
  • In terms of openness, we will always keep interface standards open, stay compatible with Spark's standard interfaces, and integrate with other technologies in the Spark ecosystem. We will take on domestic-substitution and Xinchuang (IT application innovation) requirements and, alongside commercial cooperation with enterprises, open the core source code, helping enterprises achieve independent control of core foundational technologies and ensuring a reliable and trustworthy software supply chain.

Next, I will share the test report comparing the Kyligence lakehouse engine with Apache Spark in the TPC-H scenario. You can see that the vectorized Spark engine performs better, saving half of the computing resources and reducing users' cost by 50%. Since deciding to support its commercialization, Kyligence has been doubling down on resources to advance the technology even faster. We hope that more users who currently use Spark as their computing engine will try the Kyligence lakehouse engine to get lower costs and a better user experience. We invite everyone to work together to advance this new technology and generate value.

You can scan the QR code on the screen to follow Kyligence, join the lakehouse engine trial discussion group, or add me on WeChat for further communication. This is the end of my talk today. Thank you all!

Thank you again to the conference organizers for the invitation. We are willing to work with you to contribute to the sustainable development of China's open source industry and to the construction of a digital China.

If you are interested in trying the Kyligence lakehouse engine for free, please scan the QR code or click the link and fill in the relevant information. After you submit it, we will send a free trial link for the Kyligence lakehouse engine to your email.

                                                                        


Origin my.oschina.net/cicixing/blog/8805223