The Beauty of Design Patterns 61-Strategy Pattern (Part 2): How to implement a small program that supports sorting files of different sizes?

61 | Strategy Pattern (Part 2): How to implement a small program that supports sorting files of different sizes?

In the last lesson, we covered the principle and implementation of the strategy pattern, and how to use it to remove if-else or switch-case branching logic. Today, we will use a concrete example, sorting files, to look at the design intent and application scenarios of the strategy pattern in detail.

In addition, in today's lesson I will show you how a design pattern is "created" through step-by-step analysis and refactoring. You will find that design principles and ideas are actually more universal and more important than design patterns. Once we have mastered them, we can even create new design patterns ourselves.

Without further ado, let's officially start today's study!

Problems and Solutions

Suppose we have the following requirement: write a small program that sorts a file. The file contains only integers, and adjacent numbers are separated by commas. How would you implement it? Treat it as an interview question, think it over yourself, and then read my explanation below.

You might say this is very simple: read the contents of the file, split them on commas into individual numbers, put them into an in-memory array, sort the array with a sorting algorithm you write yourself (such as quick sort) or with the sorting function provided by the programming language, and finally write the sorted data back to a file.
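
The simple approach described above can be sketched roughly as follows. This is only an illustration: the class name InMemorySortSketch and its methods are hypothetical and do not come from the original lesson, and it leans on the language's built-in sort rather than a hand-written quick sort.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of the original lesson code.
class InMemorySortSketch {

  // Parses comma-separated integers, sorts them in memory, and joins them back.
  static String sortCsv(String content) {
    String trimmed = content.trim();
    if (trimmed.isEmpty()) {
      return "";
    }
    long[] numbers = Arrays.stream(trimmed.split(","))
        .map(String::trim)
        .mapToLong(Long::parseLong)
        .toArray();
    Arrays.sort(numbers); // built-in sort from the standard library
    return Arrays.stream(numbers)
        .mapToObj(n -> Long.toString(n))
        .collect(Collectors.joining(","));
  }

  // Loads the whole file into memory, so it only suits files that fit in memory.
  static void sortFile(Path input, Path output) throws IOException {
    String content = new String(Files.readAllBytes(input), StandardCharsets.UTF_8);
    Files.write(output, sortCsv(content).getBytes(StandardCharsets.UTF_8));
  }
}
```

For example, sortCsv("5,3,9,1") returns "1,3,5,9". Everything is held in memory at once, which is exactly the limitation the next paragraphs address.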

But what if the file is huge? Say it is 10GB. Because memory is limited (for example, only 8GB), we cannot load all the data into memory at once. In that case we need an external sorting algorithm (for details, see the "Sorting" chapters in my other column, "The Beauty of Data Structures and Algorithms").
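
As a rough sketch of the external sorting idea (not the implementation from the column mentioned above), one common variant reads the input in fixed-size chunks, sorts each chunk in memory, spills it to a temporary "run" file, and then performs a k-way merge over the runs with a priority queue. The class ExternalSortSketch, the chunkSize parameter, and the sortText helper are all hypothetical names chosen for this illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Scanner;

// Hypothetical sketch: sorted runs on disk, then a k-way merge.
class ExternalSortSketch {
  // One open run file plus the value currently at its head.
  private static final class Run {
    final Scanner in;
    long head;
    Run(Scanner in) { this.in = in; this.head = Long.parseLong(in.next()); }
  }

  static void sort(Path input, Path output, int chunkSize) throws IOException {
    if (chunkSize < 1) throw new IllegalArgumentException("chunkSize must be positive");
    List<Path> runs = new ArrayList<>();
    // Phase 1: hold at most chunkSize numbers in memory, sort them, spill to a temp file.
    try (Scanner in = new Scanner(input, "UTF-8").useDelimiter("[,\\s]+")) {
      List<Long> chunk = new ArrayList<>();
      while (in.hasNext()) {
        chunk.add(Long.parseLong(in.next()));
        if (chunk.size() == chunkSize) { runs.add(writeRun(chunk)); chunk.clear(); }
      }
      if (!chunk.isEmpty()) runs.add(writeRun(chunk));
    }
    // Phase 2: repeatedly emit the smallest head among all runs.
    PriorityQueue<Run> heap = new PriorityQueue<>((a, b) -> Long.compare(a.head, b.head));
    for (Path run : runs) heap.add(new Run(new Scanner(run, "UTF-8").useDelimiter(",")));
    try (BufferedWriter out = Files.newBufferedWriter(output)) {
      boolean first = true;
      while (!heap.isEmpty()) {
        Run run = heap.poll();
        if (!first) out.write(',');
        out.write(Long.toString(run.head));
        first = false;
        if (run.in.hasNext()) {
          run.head = Long.parseLong(run.in.next());
          heap.add(run);
        } else {
          run.in.close();
        }
      }
    }
    for (Path run : runs) Files.deleteIfExists(run);
  }

  // Sorts one chunk in memory and writes it out as a comma-separated run file.
  private static Path writeRun(List<Long> chunk) throws IOException {
    Collections.sort(chunk);
    Path run = Files.createTempFile("run-", ".txt");
    try (BufferedWriter out = Files.newBufferedWriter(run)) {
      for (int i = 0; i < chunk.size(); i++) {
        if (i > 0) out.write(',');
        out.write(Long.toString(chunk.get(i)));
      }
    }
    return run;
  }

  // Convenience wrapper for trying the sketch on small strings via temp files.
  static String sortText(String content, int chunkSize) {
    try {
      Path input = Files.createTempFile("unsorted-", ".txt");
      Path output = Files.createTempFile("sorted-", ".txt");
      Files.write(input, content.getBytes(StandardCharsets.UTF_8));
      sort(input, output, chunkSize);
      String result = new String(Files.readAllBytes(output), StandardCharsets.UTF_8);
      Files.delete(input);
      Files.delete(output);
      return result;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In this sketch only one chunk, plus one head value per run, is in memory at any time, so the chunk size can be tuned to the available memory. A production version would also need to clean up temp files when an error occurs midway.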

If the file is larger still, say 100GB, we can build on external sorting and add multi-threaded concurrent sorting to take advantage of multi-core CPUs. This is similar to a "single-machine version" of MapReduce.

If the file is very large, say 1TB, even multi-threaded sorting on a single machine is too slow. In that case we can use a real MapReduce framework and the processing power of multiple machines to speed up the sort.
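
To make the concurrent step a little more concrete, here is a minimal sketch under simplifying assumptions: the chunks are slices of an in-memory array standing in for on-disk chunks, each chunk is sorted by a task on a fixed thread pool, and the sorted chunks are then merged on the calling thread. The names ConcurrentChunkSortSketch, chunkSize, and threads are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: sort chunks in parallel ("map"), then merge them ("reduce").
class ConcurrentChunkSortSketch {

  static long[] sort(long[] data, int chunkSize, int threads) {
    if (chunkSize < 1 || threads < 1) {
      throw new IllegalArgumentException("chunkSize and threads must be positive");
    }
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<long[]>> futures = new ArrayList<>();
      int start = 0;
      while (start < data.length) {
        final int lo = start;
        final int hi = start + Math.min(chunkSize, data.length - start);
        // Each task copies and sorts its own slice, so tasks share no mutable state.
        futures.add(pool.submit(() -> {
          long[] chunk = Arrays.copyOfRange(data, lo, hi);
          Arrays.sort(chunk);
          return chunk;
        }));
        start = hi;
      }
      List<long[]> runs = new ArrayList<>();
      for (Future<long[]> future : futures) {
        runs.add(future.get()); // wait for every chunk to finish
      }
      return merge(runs, data.length);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while sorting", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("chunk sort failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }

  // k-way merge; each heap entry is {run index, position inside that run}.
  private static long[] merge(List<long[]> runs, int total) {
    PriorityQueue<int[]> heap = new PriorityQueue<>(
        (a, b) -> Long.compare(runs.get(a[0])[a[1]], runs.get(b[0])[b[1]]));
    for (int i = 0; i < runs.size(); i++) {
      heap.add(new int[] {i, 0});
    }
    long[] out = new long[total];
    int n = 0;
    while (!heap.isEmpty()) {
      int[] top = heap.poll();
      long[] run = runs.get(top[0]);
      out[n++] = run[top[1]];
      if (++top[1] < run.length) {
        heap.add(top); // this run still has elements left
      }
    }
    return out;
  }
}
```

The shape mirrors MapReduce on one machine: the per-chunk sort plays the role of the map phase and the merge plays the role of the reduce phase. In a real external sort the chunks would be read from and written to disk rather than copied from an array.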

Code Implementation and Analysis

That covers the solution ideas, which are not hard to understand. Next, let's look at how to turn them into code.

I will first implement it in the simplest, most direct way. The code is below; take a look first. Because we are discussing design patterns rather than algorithms, I only give the skeleton code related to the design pattern and omit the implementation of each sorting algorithm. If you are interested, you can implement them yourself.

import java.io.File;

public class Sorter {
  private static final long GB = 1000 * 1000 * 1000;

  public void sortFile(String filePath) {
    // Validation logic omitted
    File file = new File(filePath);
    long fileSize = file.length();
    if (fileSize < 6 * GB) { // [0, 6GB)
      quickSort(filePath);
    } else if (fileSize < 10 * GB) { // [6GB, 10GB)
      externalSort(filePath);
    } else if (fileSize < 100 * GB) { // [10GB, 100GB)
      concurrentExternalSort(filePath);
    } else { // [100GB, ~)
      mapreduceSort(filePath);
    }
  }

  private void quickSort(String filePath) {
    // Quick sort
  }

  private void externalSort(String filePath) {
    // External sort
  }

  private void concurrentExternalSort(String filePath) {
    // Multi-threaded external sort
  }

  private void mapreduceSort(String filePath) {
    // Multi-machine sort using MapReduce
  }
}

public class SortingTool {
  public static void main(String[] args) {
    Sorter sorter = new Sorter();
    sorter.sortFile(args[0]);
  }
}

In the "Coding Specifications" part, we said that a function should not have too many lines, ideally no more than fit on one screen. So, to keep sortFile() from growing too long, we pulled each sorting algorithm out of sortFile() into its own function, giving 4 independent sorting functions.

If you are just developing a simple tool, the code above is good enough. After all, there is not much code and little need for later modification or extension, so however you write it, it will not become unmaintainable. However, if we are developing a large project and file sorting is only one of its functional modules, then we have to put effort into code design and code quality. Only when every small module is well written can the code of the whole project be good.

In the code above, we did not give the implementation of each sorting algorithm. If you implement them yourself, you will find that each one has fairly complex logic and quite a few lines of code. Piling the implementations of all the sorting algorithms into the single Sorter class makes that class very large. In the "Coding Specifications" part, we also mentioned that too much code in one class hurts readability and maintainability. In addition, all the sorting algorithms are private functions of Sorter, which hurts code reusability.

Code Optimization and Refactoring

As long as we have mastered the design principles and ideas discussed earlier, we should know how to solve the problems above even if we cannot think of a design pattern to refactor with: split part of the code out of the Sorter class into separate classes, each with a more focused responsibility. In fact, splitting is the standard way to deal with a class or function that has too much code or too much complexity. Following this idea, we refactor the code. The refactored code looks like this:

public interface ISortAlg {
  void sort(String filePath);
}

public class QuickSort implements ISortAlg {
  @Override
  public void sort(String filePath) {
    //...
  }
}

public class ExternalSort implements ISortAlg {
  @Override
  public void sort(String filePath) {
    //...
  }
}

public class ConcurrentExternalSort implements ISortAlg {
  @Override
  public void sort(String filePath) {
    //...
  }
}

public class MapReduceSort implements ISortAlg {
  @Override
  public void sort(String filePath) {
    //...
  }
}

public class Sorter {
  private static final long GB = 1000 * 1000 * 1000;

  public void sortFile(String filePath) {
    // Validation logic omitted
    File file = new File(filePath);
    long fileSize = file.length();
    ISortAlg sortAlg;
    if (fileSize < 6 * GB) { // [0, 6GB)
      sortAlg = new QuickSort();
    } else if (fileSize < 10 * GB) { // [6GB, 10GB)
      sortAlg = new ExternalSort();
    } else if (fileSize < 100 * GB) { // [10GB, 100GB)
      sortAlg = new ConcurrentExternalSort();
    } else { // [100GB, ~)
      sortAlg = new MapReduceSort();
    }
    sortAlg.sort(filePath);
  }
}

After splitting, no class has too much code or overly complex logic, and the readability and maintainability of the code improve. In addition, we designed each sorting algorithm as an independent class, decoupled from the specific business logic (the if-else logic in the code), which makes the sorting algorithms reusable. This is in fact the first step of the strategy pattern: separating out the definition of the strategies.

In fact, the code above can be optimized further. Each sorting class is stateless, so there is no need to create a new object every time we use one. We can therefore use the factory pattern to encapsulate object creation. Following this idea, we refactor the code. The refactored code looks like this:

public class SortAlgFactory {
  private static final Map<String, ISortAlg> algs = new HashMap<>();

  static {
    algs.put("QuickSort", new QuickSort());
    algs.put("ExternalSort", new ExternalSort());
    algs.put("ConcurrentExternalSort", new ConcurrentExternalSort());
    algs.put("MapReduceSort", new MapReduceSort());
  }

  public static ISortAlg getSortAlg(String type) {
    if (type == null || type.isEmpty()) {
      throw new IllegalArgumentException("type should not be empty.");
    }
    return algs.get(type);
  }
}

public class Sorter {
  private static final long GB = 1000 * 1000 * 1000;

  public void sortFile(String filePath) {
    // Validation logic omitted
    File file = new File(filePath);
    long fileSize = file.length();
    ISortAlg sortAlg;
    if (fileSize < 6 * GB) { // [0, 6GB)
      sortAlg = SortAlgFactory.getSortAlg("QuickSort");
    } else if (fileSize < 10 * GB) { // [6GB, 10GB)
      sortAlg = SortAlgFactory.getSortAlg("ExternalSort");
    } else if (fileSize < 100 * GB) { // [10GB, 100GB)
      sortAlg = SortAlgFactory.getSortAlg("ConcurrentExternalSort");
    } else { // [100GB, ~)
      sortAlg = SortAlgFactory.getSortAlg("MapReduceSort");
    }
    sortAlg.sort(filePath);
  }
}

After these two refactorings, the code now matches the structure of the strategy pattern. Through the strategy pattern we decouple the definition, creation, and use of strategies, so that no single part becomes too complicated. However, the sortFile() function in the Sorter class still contains a chain of if-else logic. There are not many branches here and they are not complicated, so writing it this way is fine. But if you really want to remove the if-else branching, there is a way. I will give the code directly; you should understand it at a glance. It is again based on the table lookup method, where "algs" is the "table".

public class Sorter {
  private static final long GB = 1000 * 1000 * 1000;
  private static final List<AlgRange> algs = new ArrayList<>();
  static {
    algs.add(new AlgRange(0, 6*GB, SortAlgFactory.getSortAlg("QuickSort")));
    algs.add(new AlgRange(6*GB, 10*GB, SortAlgFactory.getSortAlg("ExternalSort")));
    algs.add(new AlgRange(10*GB, 100*GB, SortAlgFactory.getSortAlg("ConcurrentExternalSort")));
    algs.add(new AlgRange(100*GB, Long.MAX_VALUE, SortAlgFactory.getSortAlg("MapReduceSort")));
  }

  public void sortFile(String filePath) {
    // Validation logic omitted
    File file = new File(filePath);
    long fileSize = file.length();
    ISortAlg sortAlg = null;
    for (AlgRange algRange : algs) {
      if (algRange.inRange(fileSize)) {
        sortAlg = algRange.getAlg();
        break;
      }
    }
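    if (sortAlg == null) {
      // No range matched; fail clearly instead of with a NullPointerException
      throw new IllegalStateException("No sorting algorithm covers file size " + fileSize);
    }
    sortAlg.sort(filePath);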
  }

  private static class AlgRange {
    private long start;
    private long end;
    private ISortAlg alg;

    public AlgRange(long start, long end, ISortAlg alg) {
      this.start = start;
      this.end = end;
      this.alg = alg;
    }

    public ISortAlg getAlg() {
      return alg;
    }

    public boolean inRange(long size) {
      return size >= start && size < end;
    }
  }
}

The current implementation is more elegant still. We have isolated the parts that vary into the static code blocks of the strategy factory class and the Sorter class. When adding a new sorting algorithm, we only need to modify those two static code blocks; no other code has to change, so the changes are minimized and centralized.

You might say that even so, adding a new sorting algorithm still requires modifying code, which does not fully comply with the open-closed principle. Is there a way to satisfy the open-closed principle completely?

For the Java language, we can avoid modifying the strategy factory class by using reflection. Specifically, we use a configuration file or a custom annotation to mark which strategy classes are available. The strategy factory class reads the configuration file, or searches for the strategy classes marked with the annotation, and then dynamically loads these classes and creates the strategy objects through reflection. When we add a new strategy, we only need to add the new strategy class to the configuration file or mark it with the annotation. Remember the class discussion question from the last lesson? It can be solved the same way.

For Sorter, we can avoid modification in the same way: we put the mapping between file size ranges and algorithms into a configuration file. When adding a new sorting algorithm, we only need to change the configuration file, not the code.
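
A minimal sketch of the configuration-based variant might look like the following. It assumes the configuration has already been read into a map from strategy name to fully qualified class name; that format, the class name ReflectiveSortAlgFactory, and the stand-in strategy classes are assumptions made for this illustration. The annotation-based variant is not shown because scanning the classpath for annotated classes typically needs extra tooling.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-ins for the lesson's strategy types, redeclared so the sketch is self-contained.
interface ISortAlg {
  void sort(String filePath);
}

class QuickSort implements ISortAlg {
  @Override
  public void sort(String filePath) { /* omitted */ }
}

class ExternalSort implements ISortAlg {
  @Override
  public void sort(String filePath) { /* omitted */ }
}

// Hypothetical factory that builds its table from configuration instead of hard-coded put() calls.
class ReflectiveSortAlgFactory {
  private final Map<String, ISortAlg> algs = new HashMap<>();

  // config: strategy name -> fully qualified class name (assumed format).
  ReflectiveSortAlgFactory(Map<String, String> config) {
    for (Map.Entry<String, String> entry : config.entrySet()) {
      algs.put(entry.getKey(), instantiate(entry.getValue()));
    }
  }

  // Loads the class by name and calls its no-argument constructor via reflection.
  private static ISortAlg instantiate(String className) {
    try {
      Class<?> clazz = Class.forName(className);
      return clazz.asSubclass(ISortAlg.class).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalArgumentException("Cannot load strategy class: " + className, e);
    }
  }

  ISortAlg getSortAlg(String type) {
    ISortAlg alg = algs.get(type);
    if (alg == null) {
      throw new IllegalArgumentException("Unknown strategy: " + type);
    }
    return alg;
  }
}
```

With this shape, registering a new strategy means adding one entry to the configuration; the factory code itself stays untouched. The sketch requires every strategy class to have an accessible no-argument constructor.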
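
One possible shape for such a configuration is sketched below. The line format "startGB,endGB,algorithmName" (with * meaning "no upper bound") and the class SizeRangeConfigSketch are invented for this illustration; the sketch returns the algorithm name, which could then be passed to the strategy factory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the size-range table is parsed from configuration lines.
class SizeRangeConfigSketch {
  private static final long GB = 1000L * 1000 * 1000;

  private static final class Rule {
    final long start;      // inclusive, in bytes
    final long end;        // exclusive, in bytes
    final String algName;

    Rule(long start, long end, String algName) {
      this.start = start;
      this.end = end;
      this.algName = algName;
    }
  }

  private final List<Rule> rules = new ArrayList<>();

  // Each rule line: "<startGB>,<endGB or *>,<algorithm name>" (assumed format).
  // Blank lines and lines starting with # are ignored.
  SizeRangeConfigSketch(List<String> lines) {
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue;
      }
      String[] parts = trimmed.split(",");
      if (parts.length != 3) {
        throw new IllegalArgumentException("Bad rule: " + line);
      }
      long start = Long.parseLong(parts[0].trim()) * GB;
      String endText = parts[1].trim();
      long end = endText.equals("*") ? Long.MAX_VALUE : Long.parseLong(endText) * GB;
      rules.add(new Rule(start, end, parts[2].trim()));
    }
  }

  // Returns the name of the first algorithm whose range [start, end) contains fileSize.
  String algNameFor(long fileSize) {
    for (Rule rule : rules) {
      if (fileSize >= rule.start && fileSize < rule.end) {
        return rule.algName;
      }
    }
    throw new IllegalArgumentException("No rule covers file size " + fileSize);
  }
}
```

Changing a threshold or adding an algorithm then becomes an edit to the configuration lines rather than to the static block in Sorter.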

Key Review

Well, that's all for today's content. Let's summarize and review what you need to focus on.

When it comes to if-else branching, some people consider it bad code. If the if-else branching is not complicated and there is not much code, there is nothing wrong with it. After all, if-else is a syntax provided by almost every programming language, and it exists for a reason. Following the KISS principle, the simplest design that works is the best design. Insisting on the strategy pattern and creating n extra classes for it is over-design.

When it comes to the strategy pattern, some people think its purpose is to avoid if-else branching logic. This understanding is very one-sided. The main role of the strategy pattern is to decouple the definition, creation, and use of strategies and to control code complexity, so that no part becomes too complicated or contains too much code. In addition, for complex code, the strategy pattern helps satisfy the open-closed principle: when a new strategy is added, it minimizes and centralizes code changes and reduces the risk of introducing bugs.

In fact, design principles and ideas are more universal and more important than design patterns. Once we have mastered them, we can understand more clearly why a certain design pattern is needed, and we can apply design patterns more appropriately.

Class Discussion

  1. Have you used the strategy pattern in past project development? What problems did you use it to solve?
  2. Under what circumstances do you think it is necessary to remove if-else or switch-case branching logic from code?

You are welcome to leave a message and share your thoughts with me. If you found this useful, feel free to share this article with your friends.

Origin blog.csdn.net/fegus/article/details/130519140