Counting characters and bytes under different encodings in Java

It has been a while since my last blog post. Recently I have been working on a Windows exe program.


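// Counts each Chinese character (U+4E00..U+9FA5) as 2 and every other char as 1.
public static int String_length(String value) {
    int valueLength = 0;
    // Regex matching a single char in the basic CJK ideograph range.
    String chinese = "[\u4e00-\u9fa5]";
    for (int i = 0; i < value.length(); i++) {
        // Examine one UTF-16 code unit at a time.
        String temp = value.substring(i, i + 1);
        if (temp.matches(chinese)) {
            valueLength += 2;
        } else {
            valueLength += 1;
        }
    }
    return valueLength;
}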

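// Run this inside a method that declares "throws UnsupportedEncodingException",
// because getBytes(String charsetName) throws that checked exception.
String s1 = "abcd我们";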
String s2 = "abcdef";
String s3 = "啊波次得我们";
System.out.println("s1 default " + s1.length() + " s.byte " + s1.getBytes().length);
System.out.println("s1 gbk " + s1.length() + " s.byte " + s1.getBytes("GBK").length);
System.out.println("s1 utf-8 " + s1.length() + " s.byte " + s1.getBytes("UTF-8").length);
System.out.println("s2 " + s2.length() + " s.byte " + s2.getBytes().length);
System.out.println("s3 " + s3.length() + " s.byte " + s3.getBytes().length);

System.out.println("func s1 " + String_length(s1));
System.out.println("func s2 " + String_length(s2));
System.out.println("func s3 " + String_length(s3));

The output is:

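s1 default 6 s.byte 10  // the no-argument getBytes() uses the platform default charset, which is UTF-8 here
s1 gbk 6 s.byte 8  // GBK: 2 bytes per Chinese character, 1 byte per ASCII letter
s1 utf-8 6 s.byte 10 // UTF-8 is variable-width: these Chinese characters take 3 bytes each, ASCII letters 1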
s2 6 s.byte 6
s3 6 s.byte 18
func s1 8
func s2 6
func s3 12
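
The "default" line in the output depends on the machine's default charset, so the same code can print a different byte count elsewhere. Here is a minimal sketch of passing the charset explicitly. It assumes the JDK in use ships the GBK charset (GBK is not one of the guaranteed StandardCharsets). The class name ByteLengthDemo and the helper byteLength are made up for illustration, and the Chinese characters are written as Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ByteLengthDemo {
    // Byte length of s under an explicitly named charset; never uses the platform default.
    static int byteLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Same text as s1 in the post: "abcd" followed by U+6211 and U+4EEC.
        String s1 = "abcd\u6211\u4eec";

        // UTF-8: 4 ASCII bytes plus 2 Chinese characters at 3 bytes each.
        System.out.println("utf-8 " + byteLength(s1, StandardCharsets.UTF_8)); // utf-8 10

        // GBK: 4 ASCII bytes plus 2 Chinese characters at 2 bytes each.
        // Assumption: this runtime supports GBK; Charset.forName throws otherwise.
        System.out.println("gbk " + byteLength(s1, Charset.forName("GBK"))); // gbk 8

        // The charset that the no-argument getBytes() would use on this machine.
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

Unlike getBytes(String), the getBytes(Charset) overload does not throw a checked exception, so no throws clause is needed.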

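So String.length() returns the number of characters (more precisely, UTF-16 code units), while String.getBytes().length returns the number of bytes, which depends on the encoding.
If the goal is a count in which each Chinese character weighs 2, a custom function that tests the Unicode range works best.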
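
The Unicode-range approach can also be written without a regex. The following is only a sketch, not the code from this post: the class WidthCounter and the method displayWidth are hypothetical names, and it walks code points instead of chars. That is a deliberate change in behavior: a character outside the Basic Multilingual Plane (stored as a surrogate pair) counts as 1 here, whereas String_length counts it as 2 because it sees two chars.

```java
final class WidthCounter {
    // Counts 2 for each code point in U+4E00..U+9FA5 (the range used by the
    // regex in String_length) and 1 for every other code point.
    static int displayWidth(String value) {
        int width = 0;
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            width += (cp >= 0x4E00 && cp <= 0x9FA5) ? 2 : 1;
            // Advance by 1 char, or by 2 when cp was stored as a surrogate pair.
            i += Character.charCount(cp);
        }
        return width;
    }

    public static void main(String[] args) {
        // "abcd" followed by U+6211 and U+4EEC, the same text as s1 in the post.
        System.out.println(displayWidth("abcd\u6211\u4eec")); // 8
        System.out.println(displayWidth("abcdef"));           // 6
    }
}
```

The plain integer comparison avoids creating a substring and running a regex match for every character.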

Reposted from blog.csdn.net/jzlhll123/article/details/81708571