A Performance Optimization Exercise for a Global ID Filter

Background

Suppose there is a dynamic ID filtering rule. The rule contains multiple IDs to be filtered (separated by commas) as well as ID ranges (a hyphen denotes a range).
Implement a global filter that decides whether an incoming ID matches the rule.
The ID is of type long; the rule returned by the configuration center is a string.

Rule example:
1,3,5,7-10

This rule means that IDs 1, 3, 5, 7, 8, 9, and 10 should all be filtered.
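To make the rule semantics concrete, here is a minimal, self-contained sketch (the class and method names are illustrative, not part of the actual filter) that parses the example rule with plain JDK calls and lists every ID it covers:

```java
import java.util.Arrays;
import java.util.stream.LongStream;

public class RuleDemo {
    // Returns true if id is covered by the comma-separated rule,
    // where "a-b" denotes the inclusive range [a, b].
    public static boolean matches(String rule, long id) {
        for (String part : rule.replaceAll("\\s", "").split(",")) {
            if (part.contains("-")) {
                String[] bounds = part.split("-", 2);
                long floor = Long.parseLong(bounds[0]);
                long ceiling = Long.parseLong(bounds[1]);
                if (floor <= id && id <= ceiling) return true;
            } else if (part.equals(String.valueOf(id))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long[] covered = LongStream.rangeClosed(1, 12)
                .filter(id -> matches("1,3,5,7-10", id))
                .toArray();
        // Prints [1, 3, 5, 7, 8, 9, 10]
        System.out.println(Arrays.toString(covered));
    }
}
```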

Implementation

At first glance the requirement looks simple, and the implementation below is equally direct:

  1. Read the configuration
  2. Parse the configuration
  3. Evaluate the rule
// Configuration center client, used to read the rule configuration
private Config config = ConfigService.getConfig();

/**
 * Main filter method
 */
public boolean isMatch(long id) {
    if (id <= 0)
        return false;

    String configStr = config.getProperty("idRule", "");
    if (StringUtils.isBlank(configStr))
        return false;

    return Stream.of(configStr.replaceAll("\\s", "").split(","))
            .anyMatch(range -> {
                if (range.contains("-")) {
                    String[] boundaries = range.split("-", 2);
                    if (NumberUtils.isDigits(boundaries[0]) && NumberUtils.isDigits(boundaries[1])) {
                        long floor = NumberUtils.toLong(boundaries[0]);
                        long ceiling = NumberUtils.toLong(boundaries[1]);
                        return floor <= id && id <= ceiling;
                    }
                }
                return range.equals(String.valueOf(id));
            });
}

Benchmark the implementation above; the mocked rule for the test case is: “1, 3, 5, 7, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100-120, 200-220, 300-320”

    @BenchmarkMode(Mode.SampleTime)
    @Warmup(iterations = 2, time = 2)
    @Measurement(iterations = 5, time = 5)
    @Threads(1)
    @Fork(1)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @Benchmark
    public void testMatchByStr() {
        isMatch(RandomUtils.nextInt(1, 10000));
    }

Test results

Benchmark                                            Mode     Cnt        Score    Error  Units
BenchmarkTest.testMatchByStr                         sample  867795     1843.805 ± 21.407  ns/op
BenchmarkTest.testMatchByStr:testMatchByStr·p0.00    sample             1000.000           ns/op
BenchmarkTest.testMatchByStr:testMatchByStr·p0.50    sample             1800.000           ns/op
BenchmarkTest.testMatchByStr:testMatchByStr·p0.90    sample             1800.000           ns/op
BenchmarkTest.testMatchByStr:testMatchByStr·p0.95    sample             1900.000           ns/op
BenchmarkTest.testMatchByStr:testMatchByStr·p0.99    sample             2900.000           ns/op
BenchmarkTest.testMatchByStr:testMatchByStr·p0.999   sample             4696.000           ns/op
BenchmarkTest.testMatchByStr:testMatchByStr·p0.9999  sample            24021.158           ns/op
BenchmarkTest.testMatchByStr:testMatchByStr·p1.00    sample          1243136.000           ns/op

The results look decent: 99.99% of calls complete within 0.02 ms.

But this is a global filter that sits on every request across the site, in a high-concurrency business, so the computation should be as fast as the constraints allow. That raises the question: is there a better implementation? The matching logic itself is already simple, with seemingly nothing left to trim. Looking closer, though, the string-parsing work does not have to happen on every filter call; it can be done ahead of time. The next version separates the two:

  1. Parse the configuration into a Set + Map at initialization time, and listen for configuration changes
  2. Have the filter evaluate the rule against the pre-parsed collections
// Uses Apollo's configuration-change listener to convert the rule string into collections ahead of time
private Set<Long> configArray = new HashSet<>();
private Map<Long, Long> configRange = new HashMap<>();

/**
 * Read the configuration at initialization, parse it up front, and listen for changes
 */
public void init() {
    String configStr = config.getProperty("idRule", "");
    resolveConfig(configStr);

    config.addChangeListener(configChangeEvent -> {
        if (configChangeEvent.isChanged("idRule")) {
            resolveConfig(config.getProperty("idRule", ""));
        }
    });
}

/**
 * Parse the Apollo configuration
 */
private void resolveConfig(String configStr) {
    Set<Long> tmpConfigArray = new HashSet<>();
    Map<Long, Long> tmpConfigRange = new HashMap<>();

    if (StringUtils.isNotBlank(configStr)) {
        Stream.of(configStr.replaceAll("\\s", "").split(","))
                .forEach(range -> {
                    if (range.contains("-")) {
                        String[] boundaries = range.split("-", 2);
                        if (NumberUtils.isDigits(boundaries[0]) && NumberUtils.isDigits(boundaries[1])) {
                            long floor = NumberUtils.toLong(boundaries[0]);
                            long ceiling = NumberUtils.toLong(boundaries[1]);
                            tmpConfigRange.put(floor, ceiling);
                        }
                    } else if (NumberUtils.isDigits(range)) {
                        tmpConfigArray.add(NumberUtils.toLong(range));
                    }
                });
    }

    // Swap in the freshly parsed collections via reference assignment
    this.configArray = tmpConfigArray;
    this.configRange = tmpConfigRange;
}

/**
 * Main filter method
 */
public boolean isMatch(long id) {
    if (id <= 0)
        return false;

    if (configArray.isEmpty() && configRange.isEmpty())
        return false;

    if (configArray.contains(id))
        return true;

    // Linear scan over the configured ranges
    return configRange.entrySet().stream()
            .anyMatch(entry -> entry.getKey() <= id && id <= entry.getValue());
}
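The pre-parsed version still scans every configured range linearly on each miss. If the ranges are known not to overlap, one further refinement (a sketch of my own, not from the original post) is to key them by lower bound in a TreeMap and resolve a lookup with floorEntry in O(log n):

```java
import java.util.Map;
import java.util.TreeMap;

public class RangeLookup {
    // Ranges keyed by their lower bound; assumes configured ranges never overlap.
    private final TreeMap<Long, Long> ranges = new TreeMap<>();

    public void addRange(long floor, long ceiling) {
        ranges.put(floor, ceiling);
    }

    // O(log n): find the range with the greatest lower bound <= id,
    // then check whether id is still under that range's upper bound.
    public boolean covers(long id) {
        Map.Entry<Long, Long> entry = ranges.floorEntry(id);
        return entry != null && id <= entry.getValue();
    }

    public static void main(String[] args) {
        RangeLookup lookup = new RangeLookup();
        lookup.addRange(100, 120);
        lookup.addRange(200, 220);
        // Prints true false
        System.out.println(lookup.covers(115) + " " + lookup.covers(150));
    }
}
```

With only 17 entries in the mocked rule the linear scan is already cheap, so this only pays off as the number of ranges grows; the HashSet branch for single IDs stays unchanged.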

Benchmark this implementation with the same mocked rule: “1, 3, 5, 7, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100-120, 200-220, 300-320”

Benchmark code

    @BenchmarkMode(Mode.SampleTime)
    @Warmup(iterations = 2, time = 2)
    @Measurement(iterations = 5, time = 5)
    @Threads(1)
    @Fork(1)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @Benchmark
    public void testMatchByList() {
        isMatch(RandomUtils.nextInt(1, 10000));
    }

Test results

Benchmark                                              Mode     Cnt       Score   Error  Units
BenchmarkTest.testMatchByList                          sample  581645      67.503 ± 5.858  ns/op
BenchmarkTest.testMatchByList:testMatchByList·p0.00    sample                 ≈ 0          ns/op
BenchmarkTest.testMatchByList:testMatchByList·p0.50    sample             100.000          ns/op
BenchmarkTest.testMatchByList:testMatchByList·p0.90    sample             100.000          ns/op
BenchmarkTest.testMatchByList:testMatchByList·p0.95    sample             100.000          ns/op
BenchmarkTest.testMatchByList:testMatchByList·p0.99    sample             100.000          ns/op
BenchmarkTest.testMatchByList:testMatchByList·p0.999   sample             300.000          ns/op
BenchmarkTest.testMatchByList:testMatchByList·p0.9999  sample            3283.540          ns/op
BenchmarkTest.testMatchByList:testMatchByList·p1.00    sample          815104.000          ns/op

The benchmark shows the pre-parsed version is roughly 27x faster, dropping from about 1,844 ns/op to about 68 ns/op. Even at these already-low latencies, that is only about 1,800 ns saved per call, but small steps add up: if every piece of business code shaves off a little at development time, the cumulative gain for a high-concurrency system is well worth having.

Summary: in a high-concurrency system, keep the computation on the user request path as lean as possible, while of course weighing the return against the effort.


Reposted from blog.csdn.net/weixin_43983762/article/details/105737496