字符串数据结构的实现

上一篇文章

简介

在Java的基本数据类型中,并不包含字符串这个关键字。字符串的底层实际上是由字符数组实现的。String对象也是一个不可变的对象。如果查看JDK文档,你就会发现String类中的每一个看起来是修改了对象的方法,其本质上都是创建了一个全新的String对象以包含修改后的字符串内容。而原先的字符串丝毫未动。

String

实现字符串引用的字符数组

private final char value[];

当我们new了一个字符串对象的时候。创建的字符串其实是参数字符串的副本。因为字符串是不可变的。

public String(String original) {
    
    
        this.value = original.value;
        this.hash = original.hash;
    }

charAt方法

这个方法的作用主要是根据索引返回指定位置的字符,由于字符串底层是一个字符数组,所以只需要判断索引是否合法就可以直接返回了。

public char charAt(int index) {
    
    
        if ((index < 0) || (index >= value.length)) {
    
    
            throw new StringIndexOutOfBoundsException(index);
        }
        return value[index];
    }

length方法

字符数组的大小,也就是字符串的长度。所以length方法只需要直接返回数组的大小即可

public int length() {
    
    
        return value.length;
    }

isEmpty方法

判断字符串的是不是空串,只需要判断数组的长度是不是等于0就好了

public boolean isEmpty() {
    
    
        return value.length == 0;
    }

equals方法

equals方法和= =就不太一样了。当元素内容相同的时候地址不一样,那么= =返回的结果依然是false。因为 = =是通过地址来进行比较的。而equals呢,即使地址不一样,只要内容完全一样,就会返回true。

public boolean equals(Object anObject) {
    
    
        if (this == anObject) {
    
    
            return true;
        }
        if (anObject instanceof String) {
    
    
            String anotherString = (String)anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
    
    
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
    
    
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }

indexOf方法

通过字符返回该字符所在字符串的位置。其实现的方法则是通过遍历字符数组来实现的,通过底层的一层层调用,最后调用的方法实际是indexOfSupplementary

private int indexOfSupplementary(int ch, int fromIndex) {
    
    
        if (Character.isValidCodePoint(ch)) {
    
    
            final char[] value = this.value;
            final char hi = Character.highSurrogate(ch);
            final char lo = Character.lowSurrogate(ch);
            final int max = value.length - 1;
            for (int i = fromIndex; i < max; i++) {
    
    
                if (value[i] == hi && value[i + 1] == lo) {
    
    
                    return i;
                }
            }
        }
        return -1;
    }

compareTo方法

这个方法的功能主要是比较两个字符串的大小。为了保证效率,避免很多不必要的操作,因此底层先用lim变量存储了两个字符串长度小的那个数据,这样一来循环的次数可以减少。if(c1 != c2){ return c1 - c2 }的意思主要是,假如两个字符串一样长的话,就用ASCll码来进行比较。

public int compareTo(String anotherString) {
    
    
        int len1 = value.length;
        int len2 = anotherString.value.length;
        int lim = Math.min(len1, len2);
        char v1[] = value;
        char v2[] = anotherString.value;

        int k = 0;
        while (k < lim) {
    
    
            char c1 = v1[k];
            char c2 = v2[k];
            if (c1 != c2) {
    
    
                return c1 - c2;
            }
            k++;
        }
        return len1 - len2;
    }

replace方法

replace方法则是将字符串中老的字符替换成新的字符。首先它会判断老字符和新的字符是否相同,如果相同就没必要修改。不相同的话,那么它会先用len得到字符数组的长度,再用val来保存新的字符数组。用变量i作为索引来定位到老字符的位置。最后再用新的buf字符数组保存新的字符串进行返回。也就是说,虽然这个方法是修改功能。但其本质上也只是创建了一个新的字符数组而已，原先的那个依然是丝毫未动。

public String replace(char oldChar, char newChar) {
    
    
        if (oldChar != newChar) {
    
    
            int len = value.length;
            int i = -1;
            char[] val = value; /* avoid getfield opcode */

            while (++i < len) {
    
    
                if (val[i] == oldChar) {
    
    
                    break;
                }
            }
            if (i < len) {
    
    
                char buf[] = new char[len];
                for (int j = 0; j < i; j++) {
    
    
                    buf[j] = val[j];
                }
                while (i < len) {
    
    
                    char c = val[i];
                    buf[i] = (c == oldChar) ? newChar : c;
                    i++;
                }
                return new String(buf, true);
            }
        }
        return this;
    }

StringBuilder

简介

StringBuilder是一个可变的字符序列,底层继承了AbstractStringBuilder的抽象类,其本质也依然是采用字符数组实现的。但StringBuilder的实现和String又不太相同,StringBuilder看起来更像是ArrayList一般的动态数组。因为,StringBuilder一开始会先给数组一个初始长度,然后会用一个变量来统计字符串的长度。也就是说,在String里面,字符串的长度就等于字符数组的长度。而在StringBuilder里面,字符串的长度就未必会等于字符数组的长度。而且,一旦字符串的长度超过了字符数组的长度,那么它底层则会调用一个方法对该字符数组进行扩容。

我们初次new的时候,它的初始长度会是字符串的大小+16

public StringBuilder(String str) {
    
    
        super(str.length() + 16);
        append(str);
    }

append方法

append方法的功能是在原先的字符串上添加字符或者字符串,实现原理就是直接在字符数组后面加一个元素,因为前面也说了。字符串的长度并不等于字符数组的长度。所以直接value[count++] = c;就好了。那么问题来了,如果长度超出了字符数组的范围怎么办?其实它底层是调用了ensureCapacityInternal方法。

private void ensureCapacityInternal(int minimumCapacity) {
    
    
        // overflow-conscious code
        if (minimumCapacity - value.length > 0) {
    
    
            value = Arrays.copyOf(value,
                    newCapacity(minimumCapacity));
        }
    }

这个方法会让字符数组的长度进行扩容,到底扩容多少呢?我们在newCapacity方法里看到,是扩容了1倍+2。之所以使用位运算是因为位运算的效率相对乘法而言更高。

private int newCapacity(int minCapacity) {
    
    
        // overflow-conscious code
        int newCapacity = (value.length << 1) + 2;
        if (newCapacity - minCapacity < 0) {
    
    
            newCapacity = minCapacity;
        }
        return (newCapacity <= 0 || MAX_ARRAY_SIZE - newCapacity < 0)
            ? hugeCapacity(minCapacity)
            : newCapacity;
    }

delete方法

delete方法会删除起始索引和终点索引内的元素。start是开区间,end则是闭区间。它的实现原理是先判断两个索引是否合法,如果合法就会调用System.arraycopy方法来实现

public AbstractStringBuilder delete(int start, int end) {
    
    
        if (start < 0)
            throw new StringIndexOutOfBoundsException(start);
        if (end > count)
            end = count;
        if (start > end)
            throw new StringIndexOutOfBoundsException();
        int len = end - start;
        if (len > 0) {
    
    
            System.arraycopy(value, start+len, value, start, count-end);
            count -= len;
        }
        return this;
    }

deleteCharAt方法

这个方法和刚才的删除原理是一样的,先判断索引是否合法,再调用System.arraycopy实现

public AbstractStringBuilder deleteCharAt(int index) {
    
    
        if ((index < 0) || (index >= count))
            throw new StringIndexOutOfBoundsException(index);
        System.arraycopy(value, index+1, value, index, count-index-1);
        count--;
        return this;
    }

insert方法

这个方法看起来和append很像。不同的是,append是默认在字符串的尾部添加元素,而insert则是根据索引添加元素。如果长度不够了,又会调用方法进行扩容

public AbstractStringBuilder insert(int offset, String str) {
    
    
        if ((offset < 0) || (offset > length()))
            throw new StringIndexOutOfBoundsException(offset);
        if (str == null)
            str = "null";
        int len = str.length();
        ensureCapacityInternal(count + len);
        System.arraycopy(value, offset, value, offset + len, count - offset);
        str.getChars(value, offset);
        count += len;
        return this;
    }

replace方法

在StringBuilder里面的replace和String里面的实现方式就有所不同。在StringBuilder里面,replace可以根据起始索引和终止索引范围内的字符串进行修改。其实现方式依然是先判断索引,再重新定位大小,接着调用arraycopy方法进行操作。

 public AbstractStringBuilder replace(int start, int end, String str) {
    
    
        if (start < 0)
            throw new StringIndexOutOfBoundsException(start);
        if (start > count)
            throw new StringIndexOutOfBoundsException("start > length()");
        if (start > end)
            throw new StringIndexOutOfBoundsException("start > end");

        if (end > count)
            end = count;
        int len = str.length();
        int newCount = count + len - (end - start);
        ensureCapacityInternal(newCount);

        System.arraycopy(value, end, value, start + len, count - end);
        str.getChars(value, start);
        count = newCount;
        return this;
    }

reverse方法

这个方法的功能主要是实现字符串的反转。其最底层调用的是reverseAllValidSurrogatePairs方法

private void reverseAllValidSurrogatePairs() {
    
    
        for (int i = 0; i < count - 1; i++) {
    
    
            char c2 = value[i];
            if (Character.isLowSurrogate(c2)) {
    
    
                char c1 = value[i + 1];
                if (Character.isHighSurrogate(c1)) {
    
    
                    value[i++] = c1;
                    value[i] = c2;
                }
            }
        }
    }

StringBuffer

StringBuffer和StringBuilder的方法基本上都大同小异,但他们本质上的区别还是挺大的。StringBuffer是JDK1.0就存在的类，线程安全，做线程同步检查，效率较低。而StringBuilder是JDK1.5才提供的类，线程不安全，不做线程同步检查，因此效率较高。当然,实际使用中还是比较推荐使用StringBuilder

格式化输出

在JavaSE5的时候推出了C语言风格的printf();printf里面不使用 "+"来连接,而是使用逗号分隔

public static void main(String[] args) {
    
    
        System.out.printf("%d + %d = %d",5,9,5 + 9);
    }

System.out.format()

format方法可以用于PrintStream和PrintWriter对象,其中也包括System.out对象。formate方法和printf方法是等价的。

public static void main(String[] args) {
    
    
        System.out.format("%d + %d = %d",5,9,5 + 9);
    }

Formatter类

在Java中,所有新的格式化功能都由java.util.Formatter类处理。可以将Formatter看作一个翻译器,它将你的格式化字符串与数据翻译成需要的结果。当你创建一个Formatter对象的时候,需要向其构造器传递一些信息，告诉它最终的结果向哪里输出。

Java深入篇~02.字符串数据结构的实现(JDK1.8)

字符串数据结构的实现

上一篇文章

简介

本人过去曾写过一篇关于串的数据结构博客,点击该链接可以进去

String

实现字符串引用的字符数组

charAt方法

length方法

isEmpty方法

equals方法

indexOf方法

compareTo方法

replace方法

StringBuilder

简介

append方法

delete方法

deleteCharAt方法

insert方法

replace方法

reverse方法

StringBuffer

格式化输出

System.out.format()

Formatter类

猜你喜欢