Geek Time, The Beauty of Design Patterns. Strategy Pattern (Part 2): How do you implement a small program that sorts files of different sizes?

Using a concrete example, sorting a file, let's look in detail at the design intent and application scenarios of the strategy pattern.

Along the way, I will analyze and refactor the code step by step to show you how a design pattern is "created". Through today's study, you will find that design principles and ideas are more universal and more important than design patterns. Once we have mastered the principles and ideas of code design, we can even create new design patterns ourselves.

Problems and solutions

Suppose we have the following requirement: write a small program that sorts a file. The file contains only integers, and adjacent numbers are separated by commas. If you were asked to write such a program, how would you do it? Treat it as an interview question, think it over yourself first, and then read my explanation below.

You might say this is simple: read the content of the file, split it on commas into individual numbers, put them in an in-memory array, then either write a sorting algorithm yourself (such as quicksort) or call the sorting function provided by the programming language, and finally write the sorted array back to the file.

But what if the file is large, say 10GB? Because memory is limited (for example, only 8GB), we cannot load all the data in the file into memory at once. In that case we have to use an external sorting algorithm (for details, see the "sorting" chapters of my other column, "The Beauty of Data Structures and Algorithms").

If the file is larger still, say 100GB, we can build on external sorting and add multi-threaded concurrent sorting to take advantage of a multi-core CPU. This is a bit like a "stand-alone version" of MapReduce.

If the file is very large, say 1TB, even multi-threaded sorting on a single machine is very slow. At that point we can use a real MapReduce framework and the processing power of multiple machines to speed up the sort.
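The in-memory approach described above can be sketched roughly as follows. This is only a minimal illustration, not code from the original column: the class name `InMemoryFileSorter` and its methods are hypothetical, the numbers are assumed to fit in a `long`, and `Files.readString` / `Files.writeString` assume Java 11 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of the article's Sorter class.
class InMemoryFileSorter {

  // Parses comma-separated integers, sorts them, and joins them with commas again.
  static String sortCsv(String content) {
    String trimmed = content.trim();
    if (trimmed.isEmpty()) {
      return "";
    }
    long[] numbers = Arrays.stream(trimmed.split(","))
        .map(String::trim)
        .mapToLong(Long::parseLong)
        .toArray();
    Arrays.sort(numbers); // the JDK's built-in sort for primitive arrays
    return Arrays.stream(numbers)
        .mapToObj(Long::toString)
        .collect(Collectors.joining(","));
  }

  // Loads the whole file into memory, sorts it, and overwrites the file.
  static void sortFile(Path path) throws IOException {
    String content = Files.readString(path);
    Files.writeString(path, sortCsv(content));
  }
}
```

Because the whole file is held in memory twice (once as text, once as an array), this only works when the file is comfortably smaller than the available heap.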
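To make the idea of external sorting a little more concrete, here is a rough sketch under several assumptions of my own: the class `ExternalSortSketch`, the `maxNumbersInMemory` parameter, and the one-number-per-line format of the temporary chunk files are all hypothetical, and a production implementation would need more careful buffering and error handling. The sketch sorts fixed-size chunks in memory, writes each to a temporary file, and then merges the chunks with a min-heap.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.PriorityQueue;
import java.util.Scanner;

// Hypothetical sketch of a chunk-and-merge external sort.
class ExternalSortSketch {

  // One open chunk file plus the number currently at its front.
  private static final class ChunkCursor {
    final BufferedReader reader;
    long current;

    ChunkCursor(BufferedReader reader) { this.reader = reader; }

    // Loads the next number; returns false when the chunk is exhausted.
    boolean advance() throws IOException {
      String line = reader.readLine();
      if (line == null) return false;
      current = Long.parseLong(line);
      return true;
    }
  }

  static void sort(Path input, Path output, int maxNumbersInMemory) throws IOException {
    if (maxNumbersInMemory < 1) throw new IllegalArgumentException("maxNumbersInMemory must be >= 1");
    List<Path> chunks = splitIntoSortedChunks(input, maxNumbersInMemory);
    try {
      mergeChunks(chunks, output);
    } finally {
      for (Path chunk : chunks) Files.deleteIfExists(chunk);
    }
  }

  // Phase 1: read at most maxNumbersInMemory numbers at a time, sort them, spill to disk.
  private static List<Path> splitIntoSortedChunks(Path input, int maxNumbersInMemory) throws IOException {
    List<Path> chunks = new ArrayList<>();
    long[] buffer = new long[maxNumbersInMemory];
    try (Scanner scanner = new Scanner(input)) {
      scanner.useLocale(Locale.ROOT);
      scanner.useDelimiter("[,\\s]+"); // commas and any surrounding whitespace
      int count = 0;
      while (scanner.hasNextLong()) { // stops at the first token that is not an integer
        buffer[count++] = scanner.nextLong();
        if (count == buffer.length) {
          chunks.add(writeSortedChunk(buffer, count));
          count = 0;
        }
      }
      if (count > 0) chunks.add(writeSortedChunk(buffer, count));
    }
    return chunks;
  }

  private static Path writeSortedChunk(long[] buffer, int count) throws IOException {
    Arrays.sort(buffer, 0, count);
    Path chunk = Files.createTempFile("sort-chunk-", ".txt");
    try (BufferedWriter writer = Files.newBufferedWriter(chunk)) {
      for (int i = 0; i < count; i++) {
        writer.write(Long.toString(buffer[i]));
        writer.newLine();
      }
    }
    return chunk;
  }

  // Phase 2: k-way merge; the heap always holds the smallest unread number of each chunk.
  private static void mergeChunks(List<Path> chunks, Path output) throws IOException {
    PriorityQueue<ChunkCursor> heap =
        new PriorityQueue<>((a, b) -> Long.compare(a.current, b.current));
    List<BufferedReader> readers = new ArrayList<>();
    try (BufferedWriter writer = Files.newBufferedWriter(output)) {
      for (Path chunk : chunks) {
        BufferedReader reader = Files.newBufferedReader(chunk);
        readers.add(reader);
        ChunkCursor cursor = new ChunkCursor(reader);
        if (cursor.advance()) heap.add(cursor);
      }
      boolean first = true;
      while (!heap.isEmpty()) {
        ChunkCursor smallest = heap.poll();
        if (!first) writer.write(",");
        writer.write(Long.toString(smallest.current));
        first = false;
        if (smallest.advance()) heap.add(smallest);
      }
    } finally {
      for (BufferedReader reader : readers) reader.close();
    }
  }
}
```

Only one chunk's worth of numbers, plus one number per chunk during the merge, is held in memory at any time, which is what lets this approach handle files larger than RAM.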
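The "stand-alone MapReduce" idea can be illustrated with a simplified sketch. To keep it short, it works on an in-memory array rather than on chunk files, which is my own simplification; the class name `ConcurrentChunkSortSketch` and the `threads` parameter are hypothetical. Each slice is sorted on a worker thread (the "map" step), and the sorted slices are then merged (the "reduce" step).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: parallel chunk sorting followed by a sequential merge.
class ConcurrentChunkSortSketch {

  static long[] sort(long[] data, int threads) {
    if (threads < 1) throw new IllegalArgumentException("threads must be >= 1");
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // "Map" step: sort each slice on its own worker thread.
      int chunkSize = Math.max(1, (data.length + threads - 1) / threads);
      List<Future<long[]>> futures = new ArrayList<>();
      for (int start = 0; start < data.length; start += chunkSize) {
        long[] chunk = Arrays.copyOfRange(data, start, Math.min(data.length, start + chunkSize));
        futures.add(pool.submit(() -> {
          Arrays.sort(chunk);
          return chunk;
        }));
      }
      // "Reduce" step: fold the sorted chunks together with two-way merges.
      long[] merged = new long[0];
      for (Future<long[]> future : futures) {
        merged = merge(merged, future.get());
      }
      return merged;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while sorting", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("a sorting task failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }

  // Standard merge of two sorted arrays into a new sorted array.
  static long[] merge(long[] left, long[] right) {
    long[] out = new long[left.length + right.length];
    int i = 0, j = 0, k = 0;
    while (i < left.length && j < right.length) {
      out[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
    }
    while (i < left.length) out[k++] = left[i++];
    while (j < right.length) out[k++] = right[j++];
    return out;
  }
}
```

In a real concurrent external sort, each task would read, sort, and spill its own chunk file, and the final merge would stream from disk as in ordinary external sorting.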

Code implementation and analysis

That covers the solution, and it is not hard to understand. Next, let's look at how to translate it into code.

I will start with the simplest and most direct implementation. The code is posted below; take a look at it first. Because we are discussing design patterns rather than algorithms, the code below gives only the skeleton related to the design pattern and omits the implementation of each sorting algorithm. If you are interested, you can implement them yourself.


import java.io.File;

public class Sorter {
  private static final long GB = 1000 * 1000 * 1000;

  public void sortFile(String filePath) {
    // Validation logic omitted
    File file = new File(filePath);
    long fileSize = file.length();
    if (fileSize < 6 * GB) { // [0, 6GB)
      quickSort(filePath);
    } else if (fileSize < 10 * GB) { // [6GB, 10GB)
      externalSort(filePath);
    } else if (fileSize < 100 * GB) { // [10GB, 100GB)
      concurrentExternalSort(filePath);
    } else { // [100GB, ~)
      mapreduceSort(filePath);
    }
  }

  private void quickSort(String filePath) {
    // Quicksort
  }

  private void externalSort(String filePath) {
    // External sort
  }

  private void concurrentExternalSort(String filePath) {
    // Multi-threaded external sort
  }

  private void mapreduceSort(String filePath) {
    // Multi-machine sort using MapReduce
  }
}

public class SortingTool {
  public static void main(String[] args) {
    Sorter sorter = new Sorter();
    sorter.sortFile(args[0]);
  }
}

As we said in the "coding specification" part, a function should not have too many lines, ideally no more than fit on one screen. Therefore, to keep the sortFile() function from growing too long, we extracted each sorting algorithm from sortFile() into its own function, giving 4 independent sorting functions.

If you are just developing a simple tool, the implementation above is sufficient. After all, there is not much code, and there will be few later modifications or extensions, so however you write it, the code will not become unmaintainable. However, if we are developing a large project and file sorting is only one of its functional modules, then we have to put real effort into code design and code quality. Only when every small functional module is written well can the code of the whole project be good.

The code above does not include the implementation of each sorting algorithm. If you implement them yourself, you will find that the logic of each algorithm is fairly complicated and takes a fair number of lines. Piling the implementations of all the sorting algorithms into the single Sorter class makes that class very large. In the "coding specification" part, we also mentioned that a class with too much code suffers in readability and maintainability. In addition, all the sorting algorithms are designed as private functions of Sorter, which hurts code reusability.

Code optimization and refactoring

As long as we have mastered the design principles and ideas discussed earlier, then even if we cannot think of which design pattern to refactor toward, we should still know how to solve the problems above: split some of the code out of the Sorter class into smaller classes with more focused responsibilities. In fact, splitting is a common way to deal with classes or functions that contain too much code, and with code complexity in general. Following this approach, we refactored the code. The refactored code is as follows:


import java.io.File;

public interface ISortAlg {
  void sort(String filePath);
}

public class QuickSort implements ISortAlg {
  @Override
  public void sort(String filePath) {
    //...
  }
}

public class ExternalSort implements ISortAlg {
  @Override
  public void sort(String filePath) {
    //...
  }
}

public class ConcurrentExternalSort implements ISortAlg {
  @Override
  public void sort(String filePath) {
    //...
  }
}

public class MapReduceSort implements ISortAlg {
  @Override
  public void sort(String filePath) {
    //...
  }
}

public class Sorter {
  private static final long GB = 1000 * 1000 * 1000;

  public void sortFile(String filePath) {
    // Validation logic omitted
    File file = new File(filePath);
    long fileSize = file.length();
    ISortAlg sortAlg;
    if (fileSize < 6 * GB) { // [0, 6GB)
      sortAlg = new QuickSort();
    } else if (fileSize < 10 * GB) { // [6GB, 10GB)
      sortAlg = new ExternalSort();
    } else if (fileSize < 100 * GB) { // [10GB, 100GB)
      sortAlg = new ConcurrentExternalSort();
    } else { // [100GB, ~)
      sortAlg = new MapReduceSort();
    }
    sortAlg.sort(filePath);
  }
}

After the split, no class contains too much code, no class has overly complicated logic, and the readability and maintainability of the code are improved. In addition, each sorting algorithm is now a separate class, decoupled from the business logic that uses it (the if-else part of the code), so the sorting algorithms can be reused. This is the first step of the strategy pattern: separating out the definition of the strategies.

The code above can be optimized further. Each sorting class is stateless, so we do not need to create a new object every time we use one. We can therefore use the factory pattern to encapsulate object creation and cache the instances. Following this idea, we refactored the code again. The refactored code is as follows:


import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class SortAlgFactory {
  // Strategy objects are stateless, so one cached instance per type is enough.
  private static final Map<String, ISortAlg> algs = new HashMap<>();

  static {
    algs.put("QuickSort", new QuickSort());
    algs.put("ExternalSort", new ExternalSort());
    algs.put("ConcurrentExternalSort", new ConcurrentExternalSort());
    algs.put("MapReduceSort", new MapReduceSort());
  }

  public static ISortAlg getSortAlg(String type) {
    if (type == null || type.isEmpty()) {
      throw new IllegalArgumentException("type should not be empty.");
    }
    return algs.get(type);
  }
}

public class Sorter {
  private static final long GB = 1000 * 1000 * 1000;

  public void sortFile(String filePath) {
    // Validation logic omitted
    File file = new File(filePath);
    long fileSize = file.length();
    ISortAlg sortAlg;
    if (fileSize < 6 * GB) { // [0, 6GB)
      sortAlg = SortAlgFactory.getSortAlg("QuickSort");
    } else if (fileSize < 10 * GB) { // [6GB, 10GB)
      sortAlg = SortAlgFactory.getSortAlg("ExternalSort");
    } else if (fileSize < 100 * GB) { // [10GB, 100GB)
      sortAlg = SortAlgFactory.getSortAlg("ConcurrentExternalSort");
    } else { // [100GB, ~)
      sortAlg = SortAlgFactory.getSortAlg("MapReduceSort");
    }
    sortAlg.sort(filePath);
  }
}

After these two refactorings, the code now conforms to the structure of the strategy pattern. Through the strategy pattern we have decoupled the definition, creation, and use of the strategies, so that no single part is too complicated. However, the sortFile() function in the Sorter class still contains a chain of if-else logic. There are only a few branches here and they are not complicated, so writing it this way is perfectly fine. But if you particularly want to remove the if-else branching, there is a way. I will give the code directly; you should be able to understand it at a glance. It is again based on the table lookup method, where "algs" is the "table".


import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class Sorter {
  private static final long GB = 1000 * 1000 * 1000;
  // Lookup table: each entry maps a file-size range [start, end) to a strategy.
  private static final List<AlgRange> algs = new ArrayList<>();

  static {
    algs.add(new AlgRange(0, 6*GB, SortAlgFactory.getSortAlg("QuickSort")));
    algs.add(new AlgRange(6*GB, 10*GB, SortAlgFactory.getSortAlg("ExternalSort")));
    algs.add(new AlgRange(10*GB, 100*GB, SortAlgFactory.getSortAlg("ConcurrentExternalSort")));
    algs.add(new AlgRange(100*GB, Long.MAX_VALUE, SortAlgFactory.getSortAlg("MapReduceSort")));
  }

  public void sortFile(String filePath) {
    // Validation logic omitted
    File file = new File(filePath);
    long fileSize = file.length();
    ISortAlg sortAlg = null;
    for (AlgRange algRange : algs) {
      if (algRange.inRange(fileSize)) {
        sortAlg = algRange.getAlg();
        break;
      }
    }
    if (sortAlg == null) {
      // Guard against a size that falls outside every configured range.
      throw new IllegalStateException("No sort algorithm for file size: " + fileSize);
    }
    sortAlg.sort(filePath);
  }

  private static class AlgRange {
    private final long start;
    private final long end;
    private final ISortAlg alg;

    public AlgRange(long start, long end, ISortAlg alg) {
      this.start = start;
      this.end = end;
      this.alg = alg;
    }

    public ISortAlg getAlg() {
      return alg;
    }

    public boolean inRange(long size) {
      return size >= start && size < end;
    }
  }
}

The current implementation is more elegant still. We have isolated the parts that change into the static initializer blocks of the strategy factory class and the Sorter class. When adding a new sorting algorithm, we only need to modify the static initializer blocks in the strategy factory class and the Sorter class, and no other code needs to change, so code changes are minimized and centralized.

You might say that even so, adding a new sorting algorithm still requires modifying code, which does not fully comply with the open-closed principle. Is there any way to satisfy the open-closed principle completely?

In Java, we can avoid modifying the strategy factory class by using reflection. Specifically, we use a configuration file or a custom annotation to declare which strategy classes exist. The strategy factory class reads the configuration file, or searches for the classes marked with the annotation, and then loads those strategy classes dynamically through reflection and creates the strategy objects. When we add a new strategy, we only need to add the new strategy class to the configuration file or mark it with the annotation. Remember the discussion question from the previous lesson? It can be solved in the same way.

For Sorter, we can avoid modification in the same way. We put the mapping between file size ranges and algorithms into a configuration file. When adding a new sorting algorithm, we only need to change the configuration file, not the code.
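As a rough illustration of the configuration-file variant, the sketch below builds the factory from a `java.util.Properties` object that maps a strategy name to a class name. Everything here is an assumption made for the example: the class `ReflectiveSortAlgFactory`, the property format, and the empty stand-in strategy classes (declared only so that the snippet is self-contained). The annotation-scanning variant is not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Minimal stand-ins so the sketch compiles on its own.
interface ISortAlg {
  void sort(String filePath);
}

class QuickSort implements ISortAlg {
  @Override
  public void sort(String filePath) { /* ... */ }
}

class ExternalSort implements ISortAlg {
  @Override
  public void sort(String filePath) { /* ... */ }
}

// Hypothetical factory that discovers strategies from configuration instead of hard-coding them.
class ReflectiveSortAlgFactory {
  private final Map<String, ISortAlg> algs = new HashMap<>();

  // Each property is assumed to map a strategy name to a fully qualified class name,
  // and each strategy class is assumed to have a no-argument constructor.
  ReflectiveSortAlgFactory(Properties config) {
    for (String name : config.stringPropertyNames()) {
      String className = config.getProperty(name);
      try {
        Class<?> clazz = Class.forName(className);
        ISortAlg alg = (ISortAlg) clazz.getDeclaredConstructor().newInstance();
        algs.put(name, alg);
      } catch (ReflectiveOperationException | ClassCastException e) {
        throw new IllegalStateException("Cannot load strategy " + className, e);
      }
    }
  }

  ISortAlg getSortAlg(String type) {
    ISortAlg alg = algs.get(type);
    if (alg == null) {
      throw new IllegalArgumentException("Unknown sort algorithm: " + type);
    }
    return alg;
  }
}
```

With this shape, adding a strategy means writing the new class and adding one line to the properties file, for example a hypothetical `HeapSort=com.example.HeapSort`; the factory code itself stays untouched.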
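One possible shape for such a configuration is sketched below. The line format `start,end,algorithmName` (sizes in bytes, with `max` meaning no upper bound) and the class `SizeRangeTable` are assumptions invented for this example, not something the article specifies. The table returns an algorithm name, which could then be passed to the strategy factory's getSortAlg().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lookup table built from configuration lines instead of a static block.
class SizeRangeTable {

  // One half-open range [start, end) and the name of the algorithm that handles it.
  private static final class Entry {
    final long start;
    final long end;
    final String algName;

    Entry(long start, long end, String algName) {
      this.start = start;
      this.end = end;
      this.algName = algName;
    }
  }

  private final List<Entry> entries = new ArrayList<>();

  // Parses lines such as "0,6000000000,QuickSort"; blank lines and '#' comments are skipped.
  static SizeRangeTable parse(List<String> lines) {
    SizeRangeTable table = new SizeRangeTable();
    for (String raw : lines) {
      String line = raw.trim();
      if (line.isEmpty() || line.startsWith("#")) continue;
      String[] parts = line.split(",");
      if (parts.length != 3) {
        throw new IllegalArgumentException("Bad range line: " + raw);
      }
      long start = Long.parseLong(parts[0].trim());
      String endText = parts[1].trim();
      long end = endText.equals("max") ? Long.MAX_VALUE : Long.parseLong(endText);
      table.entries.add(new Entry(start, end, parts[2].trim()));
    }
    return table;
  }

  // Returns the algorithm name of the first range that contains fileSize.
  String algNameFor(long fileSize) {
    for (Entry entry : entries) {
      if (fileSize >= entry.start && fileSize < entry.end) {
        return entry.algName;
      }
    }
    throw new IllegalArgumentException("No algorithm configured for size " + fileSize);
  }
}
```

The lines could come from `Files.readAllLines` on a configuration file, so changing a threshold or adding a range becomes an edit to that file rather than to Sorter.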