After studying the notes on real-time stream computing system design and implementation recommended by an Alibaba P8 engineer, I finally got into Taobao

This article looks past surface details to the essentials, and covers the design and trade-offs of high-performance, high-concurrency, real-time systems.

It abstracts the technical foundations, architecture patterns, programming model, system implementation, and supporting systems of real-time stream computing, and builds a distributed real-time stream computing system from scratch.

This article summarizes the general architecture patterns of real-time streaming computing systems.

By building a stream computing programming framework from scratch, readers learn the kinds of tasks that stream computing applications perform and how to solve the problems that arise during computation.

This article aims to show readers the advantages and fun of stream-style programming in Java development. In addition, by extending a single-node stream computing application to a distributed cluster, readers learn the architecture patterns of distributed systems. They can then assess the many stream computing frameworks in the open source community, see what those frameworks have in common underneath, and avoid decision paralysis when choosing one.

This article also discusses which problems real-time stream computing can and cannot solve, so that readers understand the capabilities of a stream computing system and do not overestimate it.

In short, after reading this article, readers should have a clear understanding of real-time stream computing systems, be able to handle architecture design, system implementation, and concrete applications, and ultimately build an excellent real-time stream computing product.

Chapter 1 Real-time stream computing; overall, the content of this article is organized in a general-to-specific structure. This chapter gives an overall picture of the usage scenarios and general architecture of real-time stream computing systems. The following chapters analyze and discuss each part of the system in detail.

Chapter 2 Data Acquisition; This chapter focuses on the data acquisition module and analyzes the issues related to NIO and asynchronous programming in detail.

Readers may be puzzled at this point. The topic of this article is real-time stream computing, yet so far the discussion has mostly been about asynchronous programming and NIO. Is this a digression? In fact, "stream" and "asynchronous" are closely linked: a stream is an important way of expressing asynchronous execution, and asynchrony is the intrinsic nature of a stream at run time. Stream-style programming is becoming more and more popular for two reasons. First, a stream is a natural representation of how events unfold in the real world. Second, a stream executes asynchronously and in parallel internally, which can maximize resource efficiency and program execution performance. The earlier explanation of asynchronous programming already used terms such as "upstream" and "downstream", which clearly come from the stream model. A thorough understanding of NIO and asynchrony is the basis for writing high-performance programs, and this knowledge is useful even if you never implement a stream computing system. That is why this chapter spends so much time on NIO and asynchrony.
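To make the link between "asynchronous" and "upstream/downstream" concrete, here is a minimal sketch using the JDK's CompletableFuture. The three stages (decode, extractLength, label) and the class name are hypothetical and chosen only for illustration; they are not taken from the book.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: a three-stage asynchronous pipeline (decode -> extract -> label).
class AsyncPipelineSketch {
    // Hypothetical stage 1: turn raw bytes into an event string.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Hypothetical stage 2: extract a trivial "feature" (the trimmed length).
    static int extractLength(String event) {
        return event.trim().length();
    }

    // Hypothetical stage 3: label the event based on the feature.
    static String label(int length) {
        return length > 10 ? "long" : "short";
    }

    // Each downstream stage is scheduled on the pool when its upstream stage completes.
    static CompletableFuture<String> process(byte[] raw, ExecutorService pool) {
        return CompletableFuture
                .supplyAsync(() -> decode(raw), pool)
                .thenApplyAsync(AsyncPipelineSketch::extractLength, pool)
                .thenApplyAsync(AsyncPipelineSketch::label, pool);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            byte[] raw = "hello stream".getBytes(StandardCharsets.UTF_8);
            System.out.println(process(raw, pool).join()); // prints "long"
        } finally {
            pool.shutdown();
        }
    }
}
```

The calling code never blocks between stages; it only describes which stage feeds which, and that description is already a small stream.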

In the following chapters, we explain in detail how to design and implement the feature extraction module of a financial risk control system in a streaming style. Along the way, the similarities and differences between the stream and asynchronous programming styles become much more concrete.

Chapter 3 Implementing single-node stream computing applications; by constructing a single-node real-time stream computing framework, this chapter analyzes the two most important basic components of stream computing: the queue used to deliver events and the thread used to execute computing logic. Later chapters introduce more complex distributed stream computing frameworks, but these do not change the basic structure of a stream computing application.

Stream computing is an asynchronous system, so we must strictly control the problem of subsystems executing out of step with one another. For this reason, we repeatedly emphasize the importance of backpressure in a stream computing system. Only a stream computing application that supports backpressure internally can run stably and reliably over the long term. Compared with "asynchronous", the "stream" computing model describes more naturally how things happen in the real world, and it matches the way we think when analyzing business processes. Stream-style programming therefore makes asynchronous, high-concurrency systems easier to build.
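One simple way to obtain backpressure between two stages is to connect them with a bounded blocking queue, so that a fast producer is forced to wait for a slow consumer. The sketch below assumes this technique; the class name, the END_OF_STREAM marker, and the 1 ms simulated processing delay are hypothetical and not the book's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative only: a producer and a slower consumer joined by a bounded queue.
class BackpressureSketch {
    private static final int END_OF_STREAM = -1; // hypothetical end marker

    // Sends eventCount events through a queue of the given capacity and
    // returns them in the order the consumer processed them.
    static List<Integer> run(int eventCount, int queueCapacity) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(queueCapacity);
        List<Integer> processed = new ArrayList<>();

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    int event = queue.take();      // blocks while the queue is empty
                    if (event == END_OF_STREAM) {
                        break;
                    }
                    Thread.sleep(1);               // simulate a slow downstream stage
                    processed.add(event);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (int i = 0; i < eventCount; i++) {
                queue.put(i);                      // blocks while the queue is full: backpressure
            }
            queue.put(END_OF_STREAM);
            consumer.join();                       // join makes 'processed' safely visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while producing", e);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(run(20, 4).size()); // prints 20
    }
}
```

Because the queue holds at most queueCapacity events, memory use stays bounded no matter how much faster the producer is. An unbounded queue would accept every event immediately and grow until the process ran out of memory.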

Optimizing a stream computing application is meaningful and valuable work. It gives us a deeper understanding of the system we build, in both its business logic and its technical details, so program optimization deserves attention.

This chapter constructs the stream computing application framework for the feature extraction module but does not cover the feature calculations themselves. Feature calculation belongs to stream data processing, which the following chapters discuss from several angles.

Chapter 4 Data Processing; This chapter discusses data processing in real-time stream computing systems from five aspects: stream data operations, time-dimension aggregation feature calculation, association graph feature calculation, event sequence analysis, and model learning and prediction.

In general, most of the computing tasks encountered when developing stream computing applications fall into one of these five types. Implementing them exposes several conflicts: between a long time window for join operations and limited memory, between high-cardinality data and limited storage space in time-dimension aggregation feature calculation, and between complex graph algorithms and real-time requirements in association graph feature calculation. In each case the conflict is resolved through optimization, trade-offs, or compromise. However, given the limits of the author's knowledge and ability, and the continuing advance of technology, many of the solutions introduced in this chapter may not be optimal. Readers may wish to use this chapter as a basis or reference and keep exploring better ways to solve data processing problems in real-time stream computing.
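As a small illustration of time-dimension aggregation and its memory trade-off, the following sketch counts events per key inside a sliding time window. It assumes that timestamps for a given key arrive in non-decreasing order; the class and method names are hypothetical, and the approach (one stored timestamp per event) is only one of several possible designs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: per-key event count over a sliding time window.
class SlidingWindowCounter {
    private final long windowMillis;
    private final Map<String, Deque<Long>> timestampsByKey = new HashMap<>();

    SlidingWindowCounter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Records one event and returns how many events for this key fall in
    // the interval (timestampMillis - windowMillis, timestampMillis].
    // Assumption: timestamps per key are non-decreasing.
    int addAndCount(String key, long timestampMillis) {
        Deque<Long> timestamps = timestampsByKey.computeIfAbsent(key, k -> new ArrayDeque<>());
        timestamps.addLast(timestampMillis);
        // Evict timestamps that have slid out of the window. The deque is never
        // empty here because the event just added is always inside the window.
        while (timestamps.peekFirst() <= timestampMillis - windowMillis) {
            timestamps.removeFirst();
        }
        return timestamps.size();
    }

    public static void main(String[] args) {
        SlidingWindowCounter counter = new SlidingWindowCounter(1000);
        System.out.println(counter.addAndCount("user-1", 0));    // prints 1
        System.out.println(counter.addAndCount("user-1", 500));  // prints 2
        System.out.println(counter.addAndCount("user-1", 2500)); // prints 1
    }
}
```

This version is exact but keeps one timestamp per event, so memory grows with both the window length and the number of distinct keys. That is the conflict between window size, high-cardinality data, and limited storage described above. A common compromise is to store coarse per-bucket counts instead, trading some precision for a fixed memory footprint.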

Chapter 5 State Management of Real-Time Stream Computing; This chapter discusses state management in real-time stream computing applications. We divide the state in such applications into stream data state and stream information state.

These two kinds of state manage the stream along two different dimensions. The stream data state manages the stream from the perspective of time, while the stream information state manages it from the perspective of space. The stream information state compensates for the fact that the stream data state manages only the events in a time series, and extends stream state to an arbitrary space.

Separating the two concepts of stream data state and stream information state will guide us to decouple the execution process of the stream computing application itself from the information management mechanism of stream data, which makes the overall structure of the real-time stream computing system clearer. If we understand the former as the execution pipeline of the CPU, then the latter is equivalent to memory.

Chapter 6 Open Source Stream Computing Frameworks; In addition to the open source stream computing frameworks covered earlier, there are many other stream computing frameworks and platforms, such as Akka Streams and Apache Beam. Each has its own characteristics. For example, Akka Streams offers an impressively rich and flexible stream programming API, while Apache Beam brings together the existing stream computing models and aims to unify them.

At present, most stream computing frameworks support SQL queries or plan to. This is a very good feature, since it gives stream computing a familiar interface. However, because this article focuses on the essence of the "stream" computing model, it omits discussion of the SQL "skin" on top. If SQL suits your usage scenario, it is well worth learning and using, as SQL is likely to become a common way of programming stream computations.

Chapter 7 When Real-Time Is Not Achievable; this chapter mainly discusses how to use the Lambda architecture to reach real-time computing goals indirectly when a single stream computing framework cannot meet them.

The Lambda architecture is an approach to constructing a data system. It defines data analysis as the computation of a pure function over an immutable data set. Building the data system takes two steps: first, collect a batch of data to form an immutable data set; second, process and analyze that immutable data set. This approach applies not only to offline processing but also to real-time processing. The offline part and the real-time part give rise to the batch layer and the speed layer of the Lambda architecture, respectively.
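The idea of "a pure function over an immutable data set" can be sketched in a few lines. In the example below, the same hypothetical function countByKey produces both a batch view (from the historical data set) and a speed view (from recent events), and a query merges the two. All names and the sample data are illustrative assumptions, and the code uses List.of, which requires Java 9 or later.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: batch view + speed view merged at query time.
class LambdaSketch {
    // Pure function: counts occurrences per key without modifying its input.
    static Map<String, Long> countByKey(List<String> events) {
        Map<String, Long> counts = new HashMap<>();
        for (String key : events) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    // Query: combine the precomputed batch view with the speed view.
    static long query(String key, Map<String, Long> batchView, Map<String, Long> speedView) {
        return batchView.getOrDefault(key, 0L) + speedView.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        // Hypothetical immutable data sets: historical events and recent events.
        List<String> historical = List.of("a", "b", "a");
        List<String> recent = List.of("a", "c");

        Map<String, Long> batchView = countByKey(historical); // batch layer
        Map<String, Long> speedView = countByKey(recent);     // speed layer

        System.out.println(query("a", batchView, speedView)); // prints 3
        System.out.println(query("c", batchView, speedView)); // prints 1
    }
}
```

Counting merges cleanly because it is additive. Aggregates that are not additive, such as distinct counts, need a different merge step or an approximate data structure.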

Chapter 8 Data Transmission; In this chapter, we focus on data transmission in a stream computing system and describe three messaging middleware with different functions and roles.

Among them, Apache Kafka is well suited to the role of data bus in the big data era because of its excellent throughput and its ability to store streaming data. RabbitMQ complies with the AMQP standard, offers higher data reliability and better real-time performance, and supports clients in many languages, so it is well suited to serve as the configuration bus in a real-time stream computing system. Apache Camel provides flexible routing and a consistent wrapper over the underlying message middleware and various data protocol endpoints, which makes it a good message service layer that can effectively manage the underlying message middleware.

Although this chapter explains three specific message middleware products, their functional roles differ. The author's main concern is that readers understand the distinct roles and responsibilities these three kinds of middleware have in a stream computing system. As for concrete products, there are many besides the three explained in this chapter, such as ZeroMQ, ActiveMQ, Apache RocketMQ, and Apache Pulsar.

Chapter 9 Data Storage; This chapter discusses various data storage issues involved in real-time streaming computing systems.

In fact, the design of the data storage scheme is very important not only in real-time stream computing systems but in almost any relatively complex system. If the storage scheme is poorly designed, then once the data in the system accumulates past a certain volume, service latency will inevitably rise and the service will eventually become unavailable. By that point the system already holds a considerable amount of data and state, so any repair or modification becomes time-consuming and laborious, and the business may even have to be suspended while the problem is fixed offline.

Chapter 10 Service governance and configuration management; this chapter mainly discusses service governance and dynamic configuration issues.

These two topics lie outside the stream computing system proper. They are discussed because the system we build to solve a specific business problem is an organic whole, not just a stream computing application. Even when a real-time stream computing application is the core, poor integration with the surrounding systems makes subsequent development, operations, and product iteration difficult.

Chapter 11 Real-time stream computing application cases; this chapter uses two real-time stream computing application cases to pull together the scattered knowledge points from the previous chapters, so that readers can see more clearly the role and place of each point within a stream computing system.

The feature engine with a DSL user interface, implemented on the CompletableFuture framework, is a general tool for extracting features from data streams. Although the tool is still fairly rudimentary, it represents a general composition model for building a feature engine. For example, we could use Flink to replace our own stream computing framework in the layer that executes the execution plan. Doing so would make up for many shortcomings of our home-built framework, providing guarantees on event processing order, guarantees of state consistency after failure recovery, more flexible resource scheduling, and more convenient distributed state management.

The full document, [Real-time Streaming Computing System Design and Implementation], has 418 pages in total. Because of the limited length of an article, the rest is not covered here.

Audience

This article is mainly suitable for the following readers:

  • Java software developers;
  • Real-time computing engineers and architects;
  • Distributed system engineers and architects.

Origin blog.csdn.net/bjmashibing001/article/details/111996742