Efficiently counting substring occurrences in a Java String

Conclusion: when calling substring, prefer the two-index form, substring(beginIndex, endIndex).

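Reason: substring(beginIndex, endIndex) copies only endIndex - beginIndex characters, whereas the single-index form substring(beginIndex) copies everything from beginIndex to the end of the string. Called in a loop over a large text, the single-index form therefore copies far more characters.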
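To make the difference concrete, here is a minimal sketch that compares the length of the string each form produces, which is the number of characters it has to copy. The class name SubstringCopyDemo, its helper methods, and the sample text are hypothetical and not part of the original test.

```java
class SubstringCopyDemo {
    // Length of the string built by the two-index form: end - begin characters.
    public static int copiedByTwoIndex(String content, int begin, int end) {
        return content.substring(begin, end).length();
    }

    // Length of the string built by the single-index form:
    // everything from begin to the end of content.
    public static int copiedByOneIndex(String content, int begin) {
        return content.substring(begin).length();
    }

    public static void main(String[] args) {
        String content = "abcabcabcabc"; // hypothetical sample text, 12 characters
        System.out.println("two-index: " + copiedByTwoIndex(content, 3, 6));
        System.out.println("one-index: " + copiedByOneIndex(content, 3));
    }
}
```

With a text of many thousands of characters, the single-index call copies nearly the whole text on every call, while the two-index call copies only as many characters as the key has.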

Performance check:

import org.apache.commons.io.FileUtils;

import java.io.File;
import java.io.IOException;
import java.time.Duration;
import java.time.LocalTime;

public class TestProcessing {
    public static void main(String[] args) throws IOException {

        // Small file for functional testing
        //String oldfile = "I:\\StudyProject\\5sProject\\filesearch\\test-source\\test.txt";

        // Large file for performance testing
        String oldfile = "I:\\StudyProject\\5sProject\\filesearch\\test-source\\深入理解JVM-学习笔记.txt";
        String[] keys = {"加载", "接口", "使用", "初始化", "文件"};
        String content = FileUtils.readFileToString(new File(oldfile), "utf-8");
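        // Approach 1: slide a window over content and compare each key-length slice
        // (two-index substring); this also counts overlapping matches.
        int count = 0;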
        LocalTime start = LocalTime.now();
        for (String key : keys) {
            for (int i = 0, length = content.length(), keyLength = key.length(); i + keyLength <= length; i++) {
                if (content.substring(i, i + keyLength).equals(key)) {
                    count++;
                }
            }
        }
        Duration between = Duration.between(start, LocalTime.now());
        System.out.println("count1: " + count + "  between1: " + between);

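        // Approach 2: repeatedly cut content just past the next match
        // (single-index substring); each cut copies the whole remaining text.
        int sum = 0;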
        LocalTime start2 = LocalTime.now();
        for (String key : keys) {
            String temp = content;
            while (temp.contains(key)) {
                temp = temp.substring(temp.indexOf(key) + key.length());
                sum++;
            }
        }
        Duration between2 = Duration.between(start2, LocalTime.now());
        System.out.println("count2: " + sum + "  between2: " + between2);


    }

}

Test results:

count1: 262890  between1: PT0.663S
count2: 262890  between2: PT4M55.925S
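Both loops above still build a new String for every comparison or cut. As a possible alternative, not taken from the original test, the sketch below counts matches with String.indexOf(String, int), which searches from a given offset and creates no substrings at all. The class name SubstringCounter and its method names are hypothetical. Two variants are shown because the two loops above differ in semantics: the first counts overlapping matches, the second only non-overlapping ones (the counts happen to agree for the keys used in the test).

```java
class SubstringCounter {
    // Counts non-overlapping occurrences: after a match, the search resumes
    // just past it (same semantics as the single-index substring loop).
    public static int countNonOverlapping(String content, String key) {
        if (key.isEmpty()) {
            return 0; // avoid an endless loop on an empty key
        }
        int count = 0;
        int from = 0;
        int idx;
        while ((idx = content.indexOf(key, from)) >= 0) {
            count++;
            from = idx + key.length();
        }
        return count;
    }

    // Counts overlapping occurrences: after a match, the search resumes one
    // character later (same semantics as the sliding-window loop).
    public static int countOverlapping(String content, String key) {
        if (key.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        int idx;
        while ((idx = content.indexOf(key, from)) >= 0) {
            count++;
            from = idx + 1;
        }
        return count;
    }

    public static void main(String[] args) {
        String content = "aaaa"; // hypothetical sample where the two counts differ
        System.out.println("non-overlapping: " + countNonOverlapping(content, "aa"));
        System.out.println("overlapping: " + countOverlapping(content, "aa"));
    }
}
```

No timing is claimed for this variant; it would need to be measured against the same file to compare it with the two results above.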


Reposted from www.cnblogs.com/tyxuanCX/p/12592762.html