Pipeline Design Pattern: Advantages, Disadvantages, and Practical Cases

Author: Mingming Ruyue, CSDN blog expert, senior Java engineer at Ant Group, and columnist behind "Performance Optimization Methodology", "Unlocking Big-Company Thinking: An Analysis of the Alibaba Java Development Manual", and "Re-learning the Classics: An Exclusive Analysis of Effective Java".

1. Overview

We covered the Pipeline design pattern earlier in the article "Pipeline Design Pattern in Java". Its core idea is to define a sequence of operations (the pipeline) and pass data from one operation to the next. Each operation works independently, and the pipeline can also process multiple data streams at the same time.

Several readers raised good follow-up questions, which this article discusses briefly.

  • (1) The Pipeline code in the example could also be written as hard-coded logic. Why use this pattern, and what are the benefits?
  • (2) How does the Pipeline design pattern show up in real code?
  • (3) What are the disadvantages of the Pipeline design pattern, and how can they be addressed?

2. Q & A

2.1 Why use the Pipeline design pattern instead of hardcoding?

The question "Why use design pattern XXX instead of hardcoding?" applies to every design pattern, not just this one.
It can be answered along two dimensions:

  • (1) What are the advantages of this design pattern?
  • (2) What are the applicable scenarios of this design pattern?

2.1.1 Advantages of the Pipeline design pattern

The Pipeline design pattern has three main advantages: lower coupling, more flexibility, and better performance.

  • (1) Lower coupling.
    High cohesion, loose coupling: The Pipeline design pattern encapsulates each piece of processing logic in an independent stage. Each stage only cares about its own input and output and does not need to know the details of other stages, so stages can be added, removed, or modified without affecting the rest of the process.
    Easier troubleshooting: For long or complex code that fits this pattern, a failure in a middle step of hard-coded logic usually forces you to understand the surrounding context before you dare to change anything. With the Pipeline design pattern the steps are decoupled, so you only need to look at the step that failed.
    Better testability: Because the steps are relatively independent, loosely coupled, and closer to the single responsibility principle, it is easier to write a unit test for each step, and easier for those tests to cover the step's logic fully.
  • (2) More flexibility. A pipeline can be driven by configuration, so different businesses can follow different processes without code changes. The process can then be adjusted quickly as requirements change, which improves development efficiency and maintainability.
  • (3) Better performance. The Pipeline design pattern can use multithreading or asynchronous mechanisms to run independent stages in parallel, which improves the throughput and response time of the whole process.

If you hard-code similar functionality instead, the code is more tightly coupled and you have to read more of it to troubleshoot a problem; it is harder to write high-quality unit tests; and steps cannot be flexibly combined and reused. Some steps can still be made asynchronous with a thread pool, but that capability is not reusable, so you have to write it again for the next scenario.
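To make the decoupling and testability points concrete, here is a minimal sketch. The class `TextStages` and its two stages are hypothetical names chosen for illustration; they are not from the earlier article. Each stage is an ordinary function that can be checked alone and recombined freely.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical example: each step is a small, independently testable unit
class TextStages {
    // Stage 1: strip surrounding whitespace
    static final UnaryOperator<String> TRIM = String::trim;
    // Stage 2: normalize to lower case
    static final UnaryOperator<String> LOWER = String::toLowerCase;

    // Run the given stages in order, feeding each output to the next stage
    static String run(List<UnaryOperator<String>> stages, String input) {
        String current = input;
        for (UnaryOperator<String> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Each stage can be exercised alone, without the rest of the flow
        System.out.println(TRIM.apply("  Hello "));   // Hello
        System.out.println(LOWER.apply("Hello"));     // hello
        // The same stages can be combined into a flow
        System.out.println(run(Arrays.asList(TRIM, LOWER), "  Hello ")); // hello
    }
}
```

A failing step can be reproduced by calling that one stage with the offending input, which is the troubleshooting benefit described above.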

2.1.2 Common Scenarios of the Pipeline Design Pattern

Generally speaking, consider the Pipeline design pattern when a processing flow can be split into several relatively independent steps, data is passed from one step to the next, and a complex task is completed by arranging those steps in a particular order.

Some common scenarios are given below:

  • (1) Data processing: Processing a large amount of data usually means dividing the work into multiple stages. For example, data cleaning, transformation, normalization, and feature extraction can each be a stage of the Pipeline.

  • (2) Image processing: An image typically goes through several processing stages, such as color space conversion, filtering, edge detection, and feature extraction. These steps can be combined into a Pipeline so that an entire image dataset can be processed easily.

  • (3) DevOps pipelines: During software development, code goes through multiple stages, such as compilation, unit testing, code analysis, and deployment. These steps can be combined into a Pipeline that makes up the whole delivery process.

2.2 How to implement it in actual work?

The scenarios above may feel too abstract, so what are the common ways a Pipeline shows up in day-to-day work?

2.2.1 Java Function API

We can use Function to implement a simple and easy-to-use Pipeline.
Sample code:

    Function<Integer, Integer> square = s -> s * s;
    Function<Integer, Integer> half = s -> s / 2;
    Function<Integer, String> toString = Object::toString;
    Function<Integer, String> pipeline = square.andThen(half)
        .andThen(toString);
    String result = pipeline.apply(5);

    String expected = "12";
    assertEquals(expected, result);

BiFunction extends the same idea to steps that take two arguments and convert them into one result.
Sample code:

    BiFunction<Integer, Integer, Integer> add = Integer::sum;
    BiFunction<Integer, Integer, Integer> mul = (a, b) -> a * b;
    Function<Integer, String> toString = Object::toString;
    BiFunction<Integer, Integer, String> pipeline = add.andThen(a -> mul.apply(a, 2))
        .andThen(toString);
    String result = pipeline.apply(1, 2);
    String expected = "6";
    assertEquals(expected, result);
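When the steps all share one type and arrive as a list (for example, read from configuration), they can be folded into a single function. The following is a minimal sketch; the class name `FunctionChain` and the helper `chain` are hypothetical, while `Function.identity()`, `Function.andThen`, and `Stream.reduce` are standard JDK APIs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class FunctionChain {
    // Fold a list of same-typed steps into one function:
    // identity.andThen(f1).andThen(f2)...
    static <T> Function<T, T> chain(List<Function<T, T>> steps) {
        return steps.stream().reduce(Function.identity(), Function::andThen);
    }

    public static void main(String[] args) {
        Function<Integer, Integer> square = s -> s * s;
        Function<Integer, Integer> half = s -> s / 2;
        Function<Integer, Integer> pipeline = chain(Arrays.asList(square, half));
        System.out.println(pipeline.apply(5)); // 12
    }
}
```

An empty list yields the identity function, so a pipeline with no configured steps simply returns its input.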

2.2.2 Java Stream API

The Java Stream API is a typical real-world form of the pipeline.
The following simple Java Stream sample uses the filter, map, and collect operations to select the strings that start with the letter "A" from a list, convert them to uppercase, and collect them into a new list.

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamExample {

    public static void main(String[] args) {
        // Create a list of strings
        List<String> list = Arrays.asList("Apple", "Banana", "Orange", "Pear", "Avocado");

        // Create a stream from the list
        // Filter the strings that start with "A"
        // Map the strings to upper case
        // Collect the results into a new list
        List<String> result = list.stream()
                .filter(s -> s.startsWith("A"))
                .map(s -> s.toUpperCase())
                .collect(Collectors.toList());

        // Print the result
        System.out.println(result); // [APPLE, AVOCADO]
    }
}

In daily development, we usually query the underlying data and then filter and map it into the structure we need. Since Java's Stream API is simple and widely used, I won't go into more detail here.
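As an illustration of that query, filter, and map flow, here is a small sketch. The `User` type and its fields are hypothetical stand-ins for rows loaded from storage.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class UserNameQuery {
    // Hypothetical row type standing in for data loaded from storage
    static class User {
        final String name;
        final boolean active;

        User(String name, boolean active) {
            this.name = name;
            this.active = active;
        }
    }

    // Keep active users only and project each one to its name
    static List<String> activeNames(List<User> users) {
        return users.stream()
                .filter(u -> u.active)
                .map(u -> u.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User("alice", true),
                new User("bob", false),
                new User("carol", true));
        System.out.println(activeNames(users)); // [alice, carol]
    }
}
```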

2.2.3 Business Orchestration

Suppose you need to build a recommendation system for materials (news, short videos, and so on) with the following steps: material recall (query candidate materials from MySQL, ES, or second-party interfaces according to business needs), blacklist filtering (some materials must not be shown), viewing-record filtering (materials the user has already viewed must be filtered out), rough sorting by type (keep at most M materials per category or topic), fine sorting by algorithm (call the algorithm system for scoring), business topping (pin some materials to the top according to business needs), and truncation by size (return only as many items as the request asks for).

The pseudocode is as follows:

// Pipeline interface: represents a pipeline
public interface Pipeline<T> {
    // Add a stage to the pipeline
    void addStage(Stage<T> stage);
    // Execute the pipeline
    void execute(T input);
}

// Stage interface: represents one stage
public interface Stage<T> {
    // Process the input data and return the output data
    T process(T input);
}

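// PipelineContext class: holds the stages of a pipeline and runs them
import java.util.ArrayList;
import java.util.List;

public class PipelineContext<T> {
    // The stages of the pipeline, in execution order
    private List<Stage<T>> stages;
    // Index of the next stage to run
    private int index;

    public PipelineContext() {
        stages = new ArrayList<>();
        index = 0;
    }

    // Add a stage to the context
    public void addStage(Stage<T> stage) {
        stages.add(stage);
    }

    // Run the remaining stages in order, feeding each stage's output to the next.
    // (The original version ran only one stage per call, so execute() stopped
    // after the first stage.)
    public void invokeNext(T input) {
        T current = input;
        while (index < stages.size()) {
            Stage<T> stage = stages.get(index++);
            current = stage.process(current);
        }
        // Reset so the same pipeline can be executed again
        index = 0;
    }
}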


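// RecContext class: the context of one recommendation request
import java.util.ArrayList;
import java.util.List;

public class RecContext<T> {
    // The materials currently in the recommendation result
    private List<T> items;

    // Other fields

    public RecContext() {
        items = new ArrayList<>();
    }

    // Other methods omitted
}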


// DefaultPipeline class: implements the Pipeline interface
public class DefaultPipeline<T> implements Pipeline<T> {
    // The context that stores and runs the stages
    private PipelineContext<T> context;

    public DefaultPipeline() {
        context = new PipelineContext<>();
    }

    @Override
    public void addStage(Stage<T> stage) {
        context.addStage(stage);
    }

    @Override
    public void execute(T input) {
        context.invokeNext(input);
    }
}

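// Material class: the input and output data of the recommendation system
public class Material {
    // Material id
    private String id;

    // Material type (news, video, etc.)
    private String type;

    // Material score (the result of algorithmic fine sorting)
    private double score;

    // Constructors, getters and setters omitted
}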


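// Material recall stage: implements the Stage interface
public class MaterialRecallStage implements Stage<RecContext<Material>> {

    @Override
    public RecContext<Material> process(RecContext<Material> context) {
        // Based on the user's interests, behavior and other features, recall a batch of
        // candidate materials from the material store (MySQL, ES, or second-party
        // interfaces) and set them into the context's items
        // Implementation details omitted
        return context;
    }
}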

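// Blacklist filtering stage: implements the Stage interface
public class BlacklistFilterStage implements Stage<RecContext<Material>> {

    @Override
    public RecContext<Material> process(RecContext<Material> context) {
        // Filter out materials that match the user's blacklist settings
        // and set the remaining ones into the context's items
        // Implementation details omitted
        return context;
    }
}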


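// Viewing-record filtering stage: implements the Stage interface
public class WatchRecordFilterStage implements Stage<RecContext<Material>> {

    @Override
    public RecContext<Material> process(RecContext<Material> context) {
        // Filter out materials the user has already viewed, based on the viewing
        // record, and set the remaining ones into the context's items
        // Implementation details omitted
        return context;
    }
}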

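// Rough sorting by type stage: implements the Stage interface
public class TypeSortStage implements Stage<RecContext<Material>> {

    @Override
    public RecContext<Material> process(RecContext<Material> context) {
        // Roughly sort the materials by rule, based on the user's preferences and
        // the material types, and set the result into the context's items
        // Implementation details omitted
        return context;
    }
}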

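// Algorithmic fine sorting stage: implements the Stage interface
public class AlgorithmSortStage implements Stage<RecContext<Material>> {

    @Override
    public RecContext<Material> process(RecContext<Material> context) {
        // Score the materials with a machine learning model, based on user and
        // material features, then sort them and set the result into the context's items
        // Implementation details omitted
        return context;
    }
}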

// Business topping stage: implements the Stage interface
public class BusinessTopStage implements Stage<RecContext<Material>> {

    @Override
    public RecContext<Material> process(RecContext<Material> context) {
        // Pin some materials to the top according to business needs
        // and set the result into the context's items
        // Implementation details omitted
        return context;
    }
}

// Truncation by size stage: implements the Stage interface
public class SizeCutStage implements Stage<RecContext<Material>> {

    @Override
    public RecContext<Material> process(RecContext<Material> context) {
        // Truncate the materials to the size given in the request
        // and set the result into the context's items
        // Implementation details omitted
        return context;
    }
}

// Test class: builds and executes the pipeline
public class Test {

    public static void main(String[] args) {
        // Create the recommendation context that serves as the pipeline's input
        RecContext<Material> recContext = new RecContext<Material>();

        // Create the pipeline
        Pipeline<RecContext<Material>> pipeline = new DefaultPipeline<>();

        // Add the stages to the pipeline
        pipeline.addStage(new MaterialRecallStage());
        pipeline.addStage(new BlacklistFilterStage());
        pipeline.addStage(new WatchRecordFilterStage());
        pipeline.addStage(new TypeSortStage());
        pipeline.addStage(new AlgorithmSortStage());
        pipeline.addStage(new BusinessTopStage());
        pipeline.addStage(new SizeCutStage());

        // Execute the pipeline
        pipeline.execute(recContext);

        // Print the result; the stages update the context in place
        System.out.println(recContext);
    }
}

Each stage can be registered as a Spring Bean, and the orchestration of the stages can be controlled through dynamic configuration so that the flow can be adjusted flexibly.
For example, steps such as rough sorting and topping often have several alternative implementations, which can be swapped by changing the dynamic configuration according to business needs.
You can also write your own framework or parsing code that allows some steps to run in parallel. For example, configure the bean names of the stages above dynamically, and treat the names inside square brackets as a group to be executed in parallel to improve performance:

[videoRecall,newsRecall,topicRecall],blacklist,
    recordFilter,typeSort,algorithmSort,businessTop,sizeCut

I hope this example helps you better understand the advantages of the Pipeline design pattern: the steps are independent of each other, which reduces coupling; they can be flexibly combined and reused; and some steps can run in parallel to improve performance.
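One possible way to interpret the bracket syntax above is sketched below. This is an assumption-laden illustration, not the author's framework: the class `StageConfigRunner`, the methods `parse` and `run`, and the use of a plain `Map` as a stand-in for Spring's bean lookup are all hypothetical. Stages in one bracketed group run concurrently through `CompletableFuture`, and the groups themselves run in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

class StageConfigRunner {
    // Parse "[a,b],c" into groups [[a, b], [c]].
    // Names inside one pair of square brackets form a parallel group.
    static List<List<String>> parse(String config) {
        List<List<String>> groups = new ArrayList<>();
        List<String> parallel = null; // non-null while inside brackets
        for (String raw : config.split(",")) {
            String token = raw.trim();
            if (token.isEmpty()) {
                continue;
            }
            boolean opens = token.startsWith("[");
            boolean closes = token.endsWith("]");
            String name = token.replace("[", "").replace("]", "").trim();
            if (opens) {
                parallel = new ArrayList<>();
            }
            if (parallel != null) {
                parallel.add(name);
                if (closes) {
                    groups.add(parallel);
                    parallel = null;
                }
            } else {
                groups.add(new ArrayList<>(Arrays.asList(name)));
            }
        }
        return groups;
    }

    // Run the groups in order; stages within one group run concurrently.
    // The registry maps a configured name to its stage (a stand-in for bean lookup).
    static <C> void run(List<List<String>> groups, Map<String, Consumer<C>> registry, C context) {
        for (List<String> group : groups) {
            CompletableFuture<?>[] futures = group.stream()
                    .map(name -> CompletableFuture.runAsync(() -> registry.get(name).accept(context)))
                    .toArray(CompletableFuture[]::new);
            // Wait for the whole group before starting the next one
            CompletableFuture.allOf(futures).join();
        }
    }

    public static void main(String[] args) {
        Map<String, Consumer<List<String>>> registry = new HashMap<>();
        registry.put("videoRecall", items -> items.add("video-1"));
        registry.put("newsRecall", items -> items.add("news-1"));
        registry.put("sizeCut", items -> {
            while (items.size() > 1) {
                items.remove(items.size() - 1);
            }
        });
        // The shared context must be thread-safe because recall stages run in parallel
        List<String> items = Collections.synchronizedList(new ArrayList<>());
        run(parse("[videoRecall,newsRecall],sizeCut"), registry, items);
        System.out.println(items.size()); // 1
    }
}
```

The sketch leaves out error handling for unknown stage names and uses the common fork-join pool; a production version would likely validate the configuration up front and supply its own executor.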

2.3 What are the disadvantages of the Pipeline design pattern?

Every design pattern has its limitations. Here are some disadvantages of the Pipeline design pattern:

  • (1) Poor readability. Because the pipeline is configurable, and the configuration usually lives outside the code (for example, as JSON in a database or in Ctrip's Apollo dynamic configuration), it is hard to see the logic and details of the whole process at a glance.
  • (2) Harder debugging. Because multiple stages cooperate, when a problem appears it is not always easy to tell quickly which stage caused it and fix it.
  • (3) Performance loss. Data has to be passed between stages, which costs memory; if the stages run on different machines, it also adds network overhead.

Of course, these shortcomings can be mitigated:

  • (1) For poor readability, we can note the location of the configuration at the entry point of the request, so that code and configuration are easy to relate. Since each step is highly independent, keeping the code of each step readable also helps to some extent.
  • (2) For debugging and troubleshooting difficulties, we can log at key points of the pipeline so that problems can be located quickly.
  • (3) For performance loss, we can make targeted adjustments. For example, the material recommendation business above needs to call the algorithm platform for scoring. We can do the rough sorting before scoring and pass only the rough-sorted materials to the algorithm platform. User and material features do not need to be passed along, because the algorithm platform can query them itself before scoring. Some memory usage is unavoidable whichever approach is used, so there is no need to worry about it too much.
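For the logging suggestion, one option is to wrap every stage in a decorator rather than adding log statements inside each stage. The following is a minimal sketch using `java.util.logging`; the class `LoggingStage` and the helper `logged` are hypothetical names, and stages are modeled as plain `Function` objects for brevity.

```java
import java.util.function.Function;
import java.util.logging.Logger;

class LoggingStage {
    private static final Logger LOG = Logger.getLogger(LoggingStage.class.getName());

    // Wrap a stage so that every call logs its name, input, output and duration
    static <T, R> Function<T, R> logged(String name, Function<T, R> stage) {
        return input -> {
            long start = System.nanoTime();
            try {
                R output = stage.apply(input);
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                LOG.info(name + " ok in " + elapsedMs + " ms: " + input + " -> " + output);
                return output;
            } catch (RuntimeException e) {
                // Record which stage failed and on what input, then rethrow
                LOG.warning(name + " failed on input " + input + ": " + e);
                throw e;
            }
        };
    }

    public static void main(String[] args) {
        Function<Integer, Integer> square = logged("square", s -> s * s);
        Function<Integer, Integer> half = logged("half", s -> s / 2);
        System.out.println(square.andThen(half).apply(5)); // 12
    }
}
```

Because the wrapper is applied when the pipeline is assembled, the stages themselves stay free of logging code, and the log shows which stage a failure came from.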

3. The difference between the Pipeline design pattern and the Chain of Responsibility pattern

Both the Pipeline design pattern and the Chain of Responsibility pattern process a series of related tasks or operations. Their main difference lies in how they process the data and how they are structured.

The Pipeline design pattern is usually a linear process. Each step is independent and processes the entire data set, and the output of each step becomes the input of the next step. It is typically used in data processing flows such as ETL (Extract, Transform, Load).

The Chain of Responsibility pattern is more flexible. Each step has a handler that decides whether it can handle the request: if it can, it processes the request and may end the chain there; otherwise it passes the request on to the next handler. It is commonly used for handling requests and commands, such as web requests and exception handling.

Therefore, the Pipeline design pattern suits linear processing flows, while the Chain of Responsibility pattern suits flexible flows in which each handler decides, based on conditions, whether processing should continue.
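The contrast can be sketched in a few lines. This is a simplified illustration with hypothetical names (`ChainVsPipeline`, `pipeline`, `chain`); the chain variant shown is the common "first handler that can answer wins" form, which is only one of several ways the Chain of Responsibility pattern is implemented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

class ChainVsPipeline {
    // Pipeline: every stage runs, and each output feeds the next stage
    static String pipeline(List<Function<String, String>> stages, String input) {
        String current = input;
        for (Function<String, String> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    // Chain of Responsibility: handlers are tried in order; the first one that
    // can handle the request answers it, and later handlers are skipped
    static Optional<String> chain(List<Function<String, Optional<String>>> handlers, String request) {
        for (Function<String, Optional<String>> handler : handlers) {
            Optional<String> result = handler.apply(request);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Function<String, String> trim = String::trim;
        Function<String, String> upper = String::toUpperCase;
        System.out.println(pipeline(Arrays.asList(trim, upper), " etl ")); // ETL

        Function<String, Optional<String>> get =
                r -> r.startsWith("GET") ? Optional.of("read") : Optional.<String>empty();
        Function<String, Optional<String>> post =
                r -> r.startsWith("POST") ? Optional.of("write") : Optional.<String>empty();
        System.out.println(chain(Arrays.asList(get, post), "POST /items"));   // Optional[write]
        System.out.println(chain(Arrays.asList(get, post), "DELETE /items")); // Optional.empty
    }
}
```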

4. Summary

The purpose of learning is application. When you study design patterns, actively relate them to the JDK source code, to the design of the second-party and third-party frameworks you use, and to your daily business scenarios, so that you can truly apply what you have learned.
Every design pattern has its own applicable scenarios, advantages, and disadvantages. It is not enough to know the problems of a given pattern; we should also actively think about how to solve them.
Generally speaking, direct coding is sufficient for very simple scenarios. For complex scenarios, it is recommended to follow design principles and use classic design patterns first, in order to improve the reusability, readability, flexibility, scalability, and security of the code and to reduce its complexity.



Origin blog.csdn.net/w605283073/article/details/129512487