Volcano Engine ByteHouse: A 4,000-Word Summary of Five Thoughts on Applying Serverless in the OLAP Field

As the next iteration of cloud computing, Serverless lets developers focus on building applications rather than on the underlying stack. As the related technologies have matured in recent years, market acceptance of Serverless has grown steadily. Serverless is now on a fast track toward maturity and stability.

ByteHouse is a cloud-native data warehouse launched by Volcano Engine. Built on open source ClickHouse, it has further upgraded the OLAP engine's capabilities, performance, operations and maintenance, and architecture, tested against ByteDance's internal and external scenarios. ByteHouse is also exploring Serverless: it has built a new-generation data warehouse on cloud-native principles, with an architecture decoupled at three levels. With Serverless support, it aims to provide more stable, reliable, and trustworthy analysis services, freeing developers' time and energy from infrastructure operations and optimization so they can focus on core business functions.

This article is based on a talk by Li Qun, product manager of Volcano Engine ByteHouse. It presents five reflections on applying Serverless in the OLAP field, covering scenario selection, barriers to adoption, and practical implementation, among other topics.

Which application scenarios are suitable for choosing Serverless architecture?

In the field of OLAP data analysis, let’s first look at which analysis modes are not suitable for Serverless architecture:

  1. Long tasks and large jobs: If an analysis task needs to run for a long time (for example, more than 20 minutes), Serverless technology is of limited use, because serverless platforms usually cap the maximum running time and interrupt tasks that exceed it.

  2. Compute-intensive workloads: Serverless technology is usually suited to lightweight tasks, whereas compute-intensive tasks need more computing resources. No commercial Serverless data warehouse in the industry currently offers more than 2000 vcores of computing power, and 2000 vcores, converted into general-purpose physical or bare-metal machines, amounts to only about 20 servers. The computing power requirements of some medium-sized analytical systems often far exceed this scale.

  3. High-concurrency reads and writes: Serverless technology is characterized by resource sharing, so analysis tasks with high concurrency requirements are likely to hit performance bottlenecks. This is partly because the shared resource pool has an upper size limit, and partly because multiple tenants compete for the shared resources.

  4. Stable load patterns with few fluctuations: Serverless platforms usually run on demand. For applications that must run continuously, Serverless technology is not a good fit.

In short, Serverless technology suits lightweight, short-running, low-concurrency analysis workloads whose load patterns fluctuate noticeably. It also suits pipeline-type and middleware-type workloads, such as Flink real-time computing, Kafka message queues, and ETL task execution.

Serverless technology is not suitable for long-running, compute-intensive, high-concurrency read-write analysis services that must operate continuously.
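
The four criteria above can be turned into a rough screening checklist. The following is only a minimal sketch: the 20-minute and 2000-vcore figures come from the text, while the concurrency and load-variability thresholds, the `WorkloadProfile` type, and the 100-vcores-per-server figure (implied by "2000 vcores is about 20 servers") are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Figures quoted in the article.
MAX_RUNTIME_MINUTES = 20      # example threshold for a "long" task
MAX_POOL_VCORES = 2000        # largest commercial serverless pool cited
VCORES_PER_SERVER = 100       # assumption implied by "2000 vcores ~ 20 servers"

# Hypothetical thresholds, chosen only for illustration.
MAX_CONCURRENT_QUERIES = 50
MIN_PEAK_TO_AVERAGE = 2.0


@dataclass
class WorkloadProfile:
    longest_task_minutes: float
    peak_vcores: int
    concurrent_queries: int
    peak_to_average_load: float   # 1.0 means a perfectly flat load


def serverless_concerns(w: WorkloadProfile) -> List[str]:
    """Return the reasons, if any, that a workload is a poor serverless fit."""
    concerns = []
    if w.longest_task_minutes > MAX_RUNTIME_MINUTES:
        concerns.append("long-running tasks")
    if w.peak_vcores > MAX_POOL_VCORES:
        concerns.append("compute-intensive")
    if w.concurrent_queries > MAX_CONCURRENT_QUERIES:
        concerns.append("high concurrency")
    if w.peak_to_average_load < MIN_PEAK_TO_AVERAGE:
        concerns.append("stable load")
    return concerns


# A bursty, lightweight workload raises no concerns.
bursty = WorkloadProfile(longest_task_minutes=5, peak_vcores=64,
                         concurrent_queries=4, peak_to_average_load=6.0)
print(serverless_concerns(bursty))
```

An empty list means none of the four warning signs applies. Any real evaluation would of course weigh more factors than these four numbers.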

What are the barriers to applying Serverless technology?

In the OLAP field, both the evolution from the classic MPP architecture to a Serverless architecture and a Serverless architecture newly built on cloud-native principles face the same technical challenges:

  1. Separation of storage and compute

Decoupling compute from storage is a critical first step toward a serverless architecture, but the technical challenges are considerable. How can performance degradation be kept small, or avoided altogether? With near-data processing (NDP), which operators should be pushed down to the storage side? How can distributed caching improve the cache hit rate? All of these aim to reduce the network overhead between compute and storage as much as possible.

In addition, at the network level, from 25GE networks to high-speed networks such as RDMA/RoCE, and on to the next step of memory-network convergence, reducing latency and improving throughput remains a difficulty that the industry continues to work on.
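
The link between cache hit rate and network overhead can be illustrated with a toy model. This is a sketch under stated assumptions, not ByteHouse's cache: `CachedReader` is a hypothetical compute-side LRU cache, and a plain dictionary stands in for remote storage.

```python
from collections import OrderedDict


class CachedReader:
    """Compute-side LRU cache in front of a (simulated) remote object store."""

    def __init__(self, remote: dict, capacity: int):
        self.remote = remote            # stand-in for remote storage
        self.capacity = capacity        # maximum number of cached blocks
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0
        self.network_bytes = 0          # bytes fetched across the "network"

    def read(self, block_id: str) -> bytes:
        if block_id in self.cache:
            self.cache.move_to_end(block_id)   # mark as most recently used
            self.hits += 1
            return self.cache[block_id]
        data = self.remote[block_id]           # simulated network fetch
        self.misses += 1
        self.network_bytes += len(data)
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least recently used
        return data

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


remote = {"a": b"x" * 100, "b": b"y" * 100, "c": b"z" * 100}
reader = CachedReader(remote, capacity=2)
for block in ["a", "b", "a", "a", "c", "b"]:
    reader.read(block)
print(reader.hit_rate(), reader.network_bytes)
```

Every hit is a block that did not cross the network, which is why cache policy, prefetching, and operator pushdown all target the same cost.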

  2. Stateless compute

The computing side usually adopts the classic shared-nothing architecture, which scales well horizontally. However, how stateless the computing side is directly determines the quality of its elasticity. Key technical difficulties include managing and synchronizing metadata, automating the collection of statistics, and making the optimizer more intelligent.

Put vividly: during scaling, the more state a node carries, the heavier it is, the lower the scaling efficiency, and the worse the user experience.

  3. Global resource scheduling

Storage, compute, network, and eventually memory resources will all be pooled. Moreover, an ideal Serverless architecture must scale dynamically and intelligently according to the load users request, releasing resources automatically when they are not needed and allocating more automatically during business surges. All of this places higher demands on global resource scheduling.
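
The scaling behaviour described above (release when idle, grow during surges, stay within the shared pool) can be sketched as a simple target-utilization rule. The function name, the 70% target, and the 8-vcore step are hypothetical. The 2000-vcore ceiling is the figure cited in this article.

```python
import math


def desired_vcores(demand_vcores: float,
                   target_utilization: float = 0.7,
                   step: int = 8,
                   pool_ceiling: int = 2000) -> int:
    """Size the allocation so that demand sits near the target utilization."""
    if demand_vcores <= 0:
        return 0                                  # release everything when idle
    needed = demand_vcores / target_utilization
    rounded = math.ceil(needed / step) * step     # allocate in whole steps
    return min(rounded, pool_ceiling)             # cannot exceed the shared pool


for demand in (0, 10, 50, 5000):
    print(demand, desired_vcores(demand))
```

A production scheduler would additionally smooth the demand signal and account for other tenants. This rule only shows the basic shape of the decision.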

  4. Mixed workloads

Under a Serverless architecture, different tenants submit many kinds of analysis tasks to the same computing resource pool. Providing stable and reliable SLA guarantees to upper-layer applications makes mixed workload management even harder.

A static quota strategy is difficult to apply in Serverless multi-tenant mode. Workload management must instead overcome technical difficulties such as intelligent, dynamic resource allocation, rate limiting, and circuit breaking.

For example, the long-standing problem of "inefficient SQL exhausting resources" has a larger blast radius in Serverless mode and may even have catastrophic effects.
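
One basic way to contain a runaway query is to combine a per-tenant concurrency limit with a per-query CPU budget. The sketch below uses fixed limits purely for illustration. `TenantGuard` and its parameters are hypothetical, and a real serverless platform would need to adjust such limits dynamically, which is exactly the difficulty described above.

```python
class TenantGuard:
    """Per-tenant admission control with a per-query CPU budget."""

    def __init__(self, max_concurrent: int, cpu_budget_s: float):
        self.max_concurrent = max_concurrent
        self.cpu_budget_s = cpu_budget_s
        self.running = {}          # query_id -> CPU seconds consumed so far

    def admit(self, query_id: str) -> bool:
        """Reject new queries while the tenant is at its concurrency cap."""
        if len(self.running) >= self.max_concurrent:
            return False
        self.running[query_id] = 0.0
        return True

    def report_cpu(self, query_id: str, cpu_seconds: float) -> bool:
        """Record usage; return False and drop the query once over budget."""
        used = self.running[query_id] + cpu_seconds
        if used > self.cpu_budget_s:
            del self.running[query_id]   # stop the runaway query
            return False
        self.running[query_id] = used
        return True

    def finish(self, query_id: str) -> None:
        """Free the slot held by a completed query."""
        self.running.pop(query_id, None)


guard = TenantGuard(max_concurrent=2, cpu_budget_s=10.0)
print(guard.admit("q1"), guard.admit("q2"), guard.admit("q3"))
```

With this guard, an inefficient query can consume at most its own budget, and a single tenant cannot occupy more than its share of slots in the pool.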

  5. Resource pool upper limit

In serverless mode, multiple tenants share a resource pool. Ideally, this pool would be infinitely expandable, but at present only the storage side comes close. The compute-side pool is still constrained by software capabilities and has a ceiling. For example, the Serverless data warehouses of several mainstream cloud vendors have not yet exceeded 2000 vCPUs of computing power. Once multi-tenant concurrency is factored in, the current serverless architecture is hard to promote at scale in OLAP analysis.

In addition, introducing new hardware and offering it as pooled services, such as FPGA resource pools that further offload the computing side, is another current direction for the cloud. Full-scenario, multi-level data security under the Serverless architecture is also a key issue to consider.

Here I would like to briefly share with you some of ByteHouse’s thoughts and practices in this regard:


ByteHouse builds a new generation of data warehouse on cloud-native principles, with an architecture decoupled into three layers. From bottom to top:

  1. At the storage layer, ByteHouse is already serverless, with elastic scaling and unlimited capacity expansion. To address the performance cost of separating storage and compute, a series of optimizations have been made on the storage side, such as:

    For HDFS semantics, merging small files to reduce the file count, together with improved Hedge Read, Fast Switch Read, and similar techniques, can reduce latency by a factor of three while increasing bandwidth by only 10%;

    For S3 semantics, data access performance is improved through techniques such as in-memory caching and a dedicated I/O thread pool.

  2. In terms of network communications, technologies such as connection multiplexing, RDMA, and transmission compression have greatly alleviated the problem of network amplification.

  3. In the middle computing layer, ByteHouse provides users with elastic computing services through virtual warehouses (VWs) and a pay-as-you-go billing model, saving costs for users.

    Technically, the ByteHouse computing layer is stateless and deployed in containers, with elastic scaling and on-demand start and stop within seconds. ByteHouse's enhanced local caching makes data preheating and prefetching more intelligent and efficient, and raises the cache hit rate. At the computing layer, ByteHouse uses different VWs to isolate workloads, for example separating reads from writes or isolating by application category. Although this tenant-aware isolation is not a serverless model, it meets users' needs to a certain extent and is one of the paths for evolving toward a serverless architecture.

  4. At the top cloud services layer, ByteHouse provides centralized catalog metadata services, cluster management services, etc. We decouple metadata from the computing layer, making the computing layer stateless and achieving second-level elastic scaling and start-stop capabilities. Metadata storage based on distributed KV and efficient part caching technology also further improves metadata access performance.
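
The hedged-read idea mentioned for the storage layer can be illustrated generically: send a read to one replica and, if it has not answered within a short delay, send the same read to another replica and take whichever finishes first. This is a minimal sketch of the general technique, not ByteHouse's or HDFS's implementation. The `hedged_read` function, the replica callables, and the delay value are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def hedged_read(replicas, key, hedge_delay_s=0.05):
    """Read `key` from the first replica; hedge to the next one if it is slow.

    `replicas` is a non-empty list of callables, each taking a key and
    returning data. Failed reads are not retried in this sketch.
    """
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    try:
        pending = set()
        for read in replicas:
            pending.add(pool.submit(read, key))
            # Give the outstanding reads a short head start before hedging.
            done, pending = wait(pending, timeout=hedge_delay_s,
                                 return_when=FIRST_COMPLETED)
            if done:
                return next(iter(done)).result()
        # All replicas have been asked; wait for the first answer.
        done, _ = wait(pending, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        pool.shutdown(wait=False)   # do not block on the slower replica


def slow_replica(key):
    time.sleep(0.5)                 # simulated straggler
    return ("slow", key)


def fast_replica(key):
    time.sleep(0.01)
    return ("fast", key)


print(hedged_read([slow_replica, fast_replica], "part-1"))
```

The duplicate request is the price of the lower tail latency, which is why hedging trades a modest amount of extra bandwidth for faster reads.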

How do you view the conflict between observability and serverless philosophy?

As Serverless adoption has deepened, people have found that troubleshooting under a Serverless architecture is harder than in traditional applications. Some therefore argue that observability should be supported, while others argue that observability runs against the essence of Serverless, which is designed so that users need not care about the underlying computing resources.

I think this issue essentially comes down to the current maturity of Serverless technology. We use water and electricity every day, for example, but few people pay attention to how electricity is generated and distributed or how drinking water is treated. Because the service we receive is stable, credible, and reliable, we no longer focus on the details of the process.

Similarly, the goal of Serverless is to provide stable, reliable, and trustworthy analysis services, so that developers no longer spend time and energy on the underlying infrastructure and on operations and optimization, and can focus on implementing business functions.

However, Serverless technology in OLAP data analysis is still far from reaching this goal, and the technical difficulties described above have not yet been fully solved. The simplest example is the problem of "inefficient SQL exhausting resources", which has plagued the industry for more than 40 years. In the serverless model, billing is tied closely to resource usage, so whether the resource usage shown on the bill is reasonable and credible is currently customers' biggest concern.

In addition, giving users visibility into the process through tools such as logging, tracing, monitoring, and visualized metrics is a capability a Serverless platform should have, and it also increases users' trust in the system.

Therefore, the two are not contradictory. We believe that one day Serverless will bring users standard, stable, reliable and trustworthy analysis services, just like we use water and electricity today.

When implementing Serverless, how to choose between self-developed and cloud vendor solutions?

The most valuable thing in the 21st century is talent. For enterprises, the goal of every investment is to gain deeper analytical insights, more sensitive risk perception and early warning, and faster user growth. Enterprise IT therefore tends to view investment decisions from the perspective of business development: enabling the business and moving IT from a traditional cost center toward an enablement center and a profit center. Its talent reserves are focused on technology development.

The business logic of cloud vendors is to provide users with standard cloud computing services and, through sustained, high-intensity R&D investment, with differentiated cloud services. Their talent reserves are focused on technology research and development. "Development" and "research and development" differ by only one word, but their meanings are very different.

This is especially true of implementing Serverless technology in the OLAP field, which touches almost the full IT stack, including storage, networking, operating systems, databases, and AI. It requires vendors to make continuous, high-cost R&D investments that are unlikely to pay off in the market in the short term. Stopping midway means that all the earlier investment is wasted.

Therefore, small and medium-sized enterprises are still advised to be cautious about investing in Serverless technology in the OLAP field. The research, development, evolution, and iteration of Serverless technology should be left to large cloud vendors with deeper technical talent reserves and more specialized technology investment.

How far is Serverless from large-scale application?

In the field of OLAP data analysis, several commercial data warehouses with a serverless architecture already exist, but the technical difficulties described above have not been overcome, and the computing power they can provide is, for the foreseeable future, unlikely to meet the needs of medium and large data warehouses or analytics platforms.

However, the Serverless architectural concept is oriented toward the future. Better solutions to its technical challenges will emerge over time, and it can already be applied and promoted for some small and medium-sized analysis workloads.

Finally, beyond continued technical evolution and iteration, another critical factor in the large-scale application of Serverless is the standardization of Serverless services, especially in OLAP analysis. The original intention of Serverless is to let users focus on business implementation, but without standardized specifications users are locked into a platform and cannot port or seamlessly migrate their applications. For example, users cannot seamlessly migrate applications built on MySQL to PostgreSQL. The underlying database may be serverless, but the interface through which it interacts with business logic has not yet been standardized. The large-scale application of Serverless therefore also requires supporting standards and specifications.

All in all, Serverless architecture is becoming more and more popular. As cloud computing and Serverless technology develop and improve further, Serverless will become one of the preferred architectures for more large-scale applications. Users will then enjoy serverless OLAP data analysis services as conveniently and quickly as they use water and electricity today.


Origin my.oschina.net/u/5588928/blog/10143487