Reading the ArrayList Source (Part 2): Iterators and Sublists

Part 2 focuses on ArrayList's iterators and sublists.
First, let's trace back to the root of the collections hierarchy, the Collection interface:

public interface Collection<E> extends Iterable<E> {
    int size();
    boolean isEmpty();
    boolean contains(Object o);
    Iterator<E> iterator();
    Object[] toArray();
    <T> T[] toArray(T[] a);
    boolean add(E e);
    boolean remove(Object o);
    boolean containsAll(Collection<?> c);
    boolean addAll(Collection<? extends E> c);
    boolean removeAll(Collection<?> c);
    default boolean removeIf(Predicate<? super E> filter) {
        Objects.requireNonNull(filter);
        boolean removed = false;
        final Iterator<E> each = iterator();
        while (each.hasNext()) {
            if (filter.test(each.next())) {
                each.remove();
                removed = true;
            }
        }
        return removed;
    }
    boolean retainAll(Collection<?> c);
    boolean equals(Object o);

    int hashCode();

    @Override
    default Spliterator<E> spliterator() {
        return Spliterators.spliterator(this, 0);
    }

    default Stream<E> stream() {
        return StreamSupport.stream(spliterator(), false);
    }

    default Stream<E> parallelStream() {
        return StreamSupport.stream(spliterator(), true);
    }
}

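Collection extends the Iterable interface, whose purpose is to mark the implementing class as something that can be iterated over.
The source of Iterable is as follows: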

public interface Iterable<T> {
    Iterator<T> iterator();
    default void forEach(Consumer<? super T> action) {
        Objects.requireNonNull(action);
        for (T t : this) {
            action.accept(t);
        }
    }
    default Spliterator<T> spliterator() {
        return Spliterators.spliteratorUnknownSize(iterator(), 0);
    }
}
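
To make the role of Iterable concrete, here is a minimal sketch of a class that implements it and can therefore be used in a for-each loop. IntRange and its collect helper are hypothetical names made up for this illustration; they are not part of the JDK.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical example class: the integer range [from, to) as an Iterable.
public class IntRange implements Iterable<Integer> {
    private final int from;
    private final int to;

    public IntRange(int from, int to) {
        this.from = from;
        this.to = to;
    }

    // The only abstract method of Iterable: hand out a fresh Iterator on each call.
    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int next = from;

            @Override
            public boolean hasNext() {
                return next < to;
            }

            @Override
            public Integer next() {
                if (next >= to) {
                    throw new NoSuchElementException();
                }
                return next++;
            }
        };
    }

    // Collects the range with a for-each loop, which calls iterator() behind the scenes.
    static List<Integer> collect(IntRange range) {
        List<Integer> out = new ArrayList<>();
        for (int value : range) {
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(new IntRange(1, 4))); // prints [1, 2, 3]
    }
}
```

Because iterator() returns a new object each time, the same IntRange can be traversed repeatedly, which is also how ArrayList behaves.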

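Iterable declares an iterator() method, so any class that implements it can be iterated over. The object that does the actual iteration, though, is the Iterator it returns.
The Iterator interface defines the following methods:
+ boolean hasNext(): reports whether the collection has another element.
+ E next(): returns the next element of the collection.
+ void remove(): removes the element most recently returned by next(); the default implementation simply throws UnsupportedOperationException.
+ void forEachRemaining(Consumer<? super E> action): traverses the remaining elements and applies the given action to each of them.
The source of Iterator is shown below: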

public interface Iterator<E> {
    boolean hasNext();
    E next();
    default void remove() {
        throw new UnsupportedOperationException("remove");
    }
    default void forEachRemaining(Consumer<? super E> action) {
        Objects.requireNonNull(action);
        while (hasNext())
            action.accept(next());
    }
}
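
The two default methods are easy to overlook, so here is a small sketch of how they behave. The class and method names (IteratorMethodsDemo, restAfterFirst, defaultRemoveThrows) are invented for this example; only the java.util calls are real.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class IteratorMethodsDemo {
    // Consumes one element with next(), then hands everything left to forEachRemaining.
    static List<Integer> restAfterFirst(List<Integer> source) {
        Iterator<Integer> it = source.iterator();
        List<Integer> rest = new ArrayList<>();
        if (it.hasNext()) {
            it.next();
        }
        it.forEachRemaining(rest::add);
        return rest;
    }

    // An iterator that does not override remove() inherits the default, which throws.
    static boolean defaultRemoveThrows() {
        Iterator<String> it = new Iterator<String>() {
            private boolean done = false;

            @Override
            public boolean hasNext() {
                return !done;
            }

            @Override
            public String next() {
                if (done) {
                    throw new NoSuchElementException();
                }
                done = true;
                return "only";
            }
        };
        it.next();
        try {
            it.remove();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(restAfterFirst(Arrays.asList(1, 2, 3))); // prints [2, 3]
        System.out.println(defaultRemoveThrows());                  // prints true
    }
}
```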

ArrayList has a method that returns an iterator:

public Iterator<E> iterator() {
    return new Itr();
}

Itr is an inner class of ArrayList that implements the Iterator interface.
The source of this class is as follows:

private class Itr implements Iterator<E> {
    int cursor;       // index of next element to return
    int lastRet = -1; // index of last element returned; -1 if no such
    // expectedModCount is a snapshot of modCount taken when the iterator is created;
    // it works as a version check rather than a real lock
    int expectedModCount = modCount;
    Itr() {
    }
    // Is there a next element?
    public boolean hasNext() {
        return cursor != size;
    }
    // Returns the next element of the list
    @SuppressWarnings("unchecked")
    public E next() {
        checkForComodification();
        int i = cursor;
        if (i >= size)
            throw new NoSuchElementException();
        Object[] elementData = ArrayList.this.elementData;
        // Second check: if the backing array is now too short, the list changed underneath us
        if (i >= elementData.length)
            throw new ConcurrentModificationException();
        cursor = i + 1;  // index of the next element
        return (E) elementData[lastRet = i];
    }
    public void remove() {
        if (lastRet < 0)
            throw new IllegalStateException();
        checkForComodification();
        try {
            // Remove the element from the list
            ArrayList.this.remove(lastRet);
            // After next(), cursor is one greater than lastRet, so this moves the cursor back by one
            cursor = lastRet;
            lastRet = -1;
            expectedModCount = modCount;
        } catch (IndexOutOfBoundsException ex) {
            throw new ConcurrentModificationException();
        }
    }
    /**
     * Added in JDK 1.8.
     * Passes every remaining element of the list to the consumer, in order.
     *
     * @param consumer the action to apply to each remaining element
     */
    @Override
    @SuppressWarnings("unchecked")
    public void forEachRemaining(Consumer<? super E> consumer) {
        Objects.requireNonNull(consumer);
        final int size = ArrayList.this.size;
        int i = cursor;
        if (i >= size) {
            return;
        }
        final Object[] elementData = ArrayList.this.elementData;
        if (i >= elementData.length) {
            throw new ConcurrentModificationException();
        }
        while (i != size && modCount == expectedModCount) {
            consumer.accept((E) elementData[i++]);
        }
        // update once at end of iteration to reduce heap write traffic
        cursor = i;
        lastRet = i - 1;
        checkForComodification();
    }
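    /**
     * Checks that modCount still equals expectedModCount.
     * The two differ if elements were added to or removed from the list after the
     * iterator was created, other than through the iterator itself.
     */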
    final void checkForComodification() {
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
    }
}
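
The difference between removing through the iterator and removing through the list follows directly from the source above: Itr.remove() resets expectedModCount, while ArrayList.remove() only changes modCount. A minimal sketch (IteratorRemoveDemo and its method names are made up for this example):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class IteratorRemoveDemo {
    // Safe: every removal goes through the iterator, which resyncs expectedModCount.
    static List<Integer> removeEvensWithIterator(List<Integer> source) {
        List<Integer> list = new ArrayList<>(source);
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return list;
    }

    // Unsafe: the removal bypasses the iterator, so the next call to next() fails.
    static boolean removeThroughListThrows() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        Iterator<Integer> it = list.iterator();
        it.next();
        list.remove(0); // remove(int index): modCount changes, expectedModCount does not
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeEvensWithIterator(Arrays.asList(1, 2, 3, 4, 5, 6))); // prints [1, 3, 5]
        System.out.println(removeThroughListThrows());                                // prints true
    }
}
```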

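Many blog posts claim that the iterator works in a separate thread, without offering any explanation. For example:

Iterator works in a separate thread and holds a mutex lock. Once created, an Iterator builds a singly linked index table pointing to the original object; when the number of elements in the original object changes, the contents of this index table are not updated in sync, so when the index pointer moves forward it cannot find the object to iterate over. Following the fail-fast principle, the Iterator then immediately throws java.util.ConcurrentModificationException.
So the object being iterated over must not be changed while the Iterator is working. You can, however, use the Iterator's own remove() method to delete objects; Iterator.remove() deletes the current element while keeping the index consistent.

This is not accurate. The iterator does not work in a separate thread; it runs in whichever thread calls it, normally the same thread that uses the ArrayList. Nor does it hold a real mutex: the closest thing to a "lock" is expectedModCount, a plain int snapshot of modCount whose job is to detect that elements were added to or removed from the ArrayList while the iterator was in use.
For example: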

import java.util.*;

public class TestIterator {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        list.add(1);
        list.add(2);
        Iterator<Integer> iterator = list.iterator();
        list.add(3);
        if (iterator.hasNext()) {
            System.out.println(iterator.next());
        }
    }
}

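Running the program above throws a ConcurrentModificationException. When an iterator is created, its field expectedModCount is initialized to modCount. As described in Part 1, modCount records how many times the ArrayList has been structurally modified. If elements are added or removed through the ArrayList's own methods after the iterator has been returned, modCount increases. The next call to the iterator's next() or remove() (hasNext() performs no such check) compares expectedModCount with modCount and throws ConcurrentModificationException if they differ.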
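
The mechanism is small enough to rebuild from scratch. The sketch below is a toy list written for this article, not the JDK code: TinyList and all of its members are hypothetical, and it supports only add and iteration, which is just enough to show the modCount / expectedModCount comparison.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical toy list: illustrates the fail-fast idea only.
public class TinyList implements Iterable<String> {
    private String[] items = new String[8];
    private int size;
    private int modCount; // incremented on every structural change

    public void add(String s) {
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2);
        }
        items[size++] = s;
        modCount++;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int cursor;
            private final int expectedModCount = modCount; // snapshot taken at creation

            @Override
            public boolean hasNext() {
                return cursor != size;
            }

            @Override
            public String next() {
                // The whole "lock" is this one comparison.
                if (modCount != expectedModCount) {
                    throw new ConcurrentModificationException();
                }
                if (cursor >= size) {
                    throw new NoSuchElementException();
                }
                return items[cursor++];
            }
        };
    }

    // Adds an element after the iterator exists, then reports whether next() fails.
    static boolean addAfterIteratorThrows() {
        TinyList list = new TinyList();
        list.add("a");
        Iterator<String> it = list.iterator();
        list.add("b");
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(addAfterIteratorThrows()); // prints true
    }
}
```

No thread or mutex is involved anywhere: the check is an integer comparison made by whichever thread calls next().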

Below is the official documentation's description of ConcurrentModificationException.

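public class ConcurrentModificationException extends RuntimeException

This exception may be thrown by methods that have detected concurrent modification of an object when such modification is not permissible.

For example, it is not generally permissible for one thread to modify a Collection while another thread is iterating over it. In general, the results of the iteration are undefined under these circumstances. Some Iterator implementations (including those of all the general purpose collection implementations provided by the JRE) may choose to throw this exception if this behavior is detected. Iterators that do this are known as fail-fast iterators, as they fail quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Note that this exception does not always indicate that an object has been concurrently modified by a different thread. If a single thread issues a sequence of method invocations that violates the contract of an object, the object may throw this exception. For example, if a thread modifies a collection directly while it is iterating over the collection with a fail-fast iterator, the iterator will throw this exception.

Note that fail-fast behavior cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast operations throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: ConcurrentModificationException should be used only to detect bugs.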

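The documentation quoted above mentions the fail-fast mechanism, an error-detection mechanism in the Java collections (Collection). It is triggered when one thread is traversing a collection with an iterator and another thread modifies the contents of that collection; the "other" thread may in fact be the same thread that is doing the traversal.
The two programs below demonstrate the single-threaded and multi-threaded cases.
1. Single thread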

import java.util.*;

public class TestIterator {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        list.add(1);
        list.add(2);
        Iterator<Integer> iterator = list.iterator();
        list.add(3);
        if (iterator.hasNext()) {
            System.out.println(iterator.next());
        }
    }
}

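Here an element is added after the iterator has been created, so the following call to next() throws ConcurrentModificationException.
2. Multiple threads (whether the exception actually appears depends on how the two threads happen to interleave):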

import java.util.*;

public class TestIteratorThread extends Thread{
    static List<Integer> list;

    public static void main(String[] args) {
        TestIteratorThread test = new TestIteratorThread();
        list = new ArrayList<>();
        list.add(1);
        list.add(2);
        test.start();
        for (int i = 3; i < 100; i++) {
            System.out.println("main");    
            list.add(i);
        }
    }

    public void run() {
        Iterator<Integer> itr = list.iterator();
        while (itr.hasNext()) {
            System.out.println(itr.next());
        }
    }

}
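
The "best-effort" caveat in the documentation is not limited to threads. Because Itr.hasNext() only compares cursor with size and never checks modCount, a modification can slip through unnoticed even in a single thread. The sketch below relies on that detail of the Itr source shown earlier; BestEffortDemo and its method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

public class BestEffortDemo {
    // Removes the second-to-last element inside a for-each loop.
    // Afterwards cursor == size, so hasNext() returns false and next() is never
    // called again: no exception is thrown, and the last element is silently skipped.
    static List<Integer> visitedWhenRemovingSecondToLast() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> visited = new ArrayList<>();
        for (Integer value : list) {
            visited.add(value);
            if (value == 2) {
                list.remove(value); // remove(Object), bypassing the iterator
            }
        }
        return visited;
    }

    // Removing the first element instead leaves cursor != size, so next() runs
    // again and its modCount check fails.
    static boolean removingFirstThrows() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        try {
            for (Integer value : list) {
                if (value == 1) {
                    list.remove(value);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(visitedWhenRemovingSecondToLast()); // prints [1, 2]
        System.out.println(removingFirstThrows());             // prints true
    }
}
```

This is exactly why the documentation says the exception should be used only to detect bugs, never as part of a program's logic.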

ArrayList also has a second iterator, ListItr, which is returned by the following methods.

/**
 * Returns a list iterator that starts at the given index and runs to the end of the list.
 * <p>The returned list iterator is <a href="#fail-fast"><i>fail-fast</i></a>.
 *
 * @throws IndexOutOfBoundsException {@inheritDoc}
 */
public ListIterator<E> listIterator(int index) {
    // Bounds check on index
    if (index < 0 || index > size)
        throw new IndexOutOfBoundsException("Index: " + index);
    return new ListItr(index);
}

/**
 * Returns a list iterator that starts at index 0.
 * Returns a list iterator over the elements in this list (in proper
 * sequence).
 *
 * <p>The returned list iterator is <a href="#fail-fast"><i>fail-fast</i></a>.
 *
 * @see #listIterator(int)
 */
public ListIterator<E> listIterator() {
    return new ListItr(0);
}
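
Note that the bounds check in listIterator(int) accepts index == size. That is what makes backward traversal possible: start just past the last element and walk with previous(). A short sketch (ReverseIterationDemo and its helpers are names chosen for this example):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

public class ReverseIterationDemo {
    // Walks the list backwards by starting the list iterator at index == size.
    static <T> List<T> reversed(List<T> source) {
        List<T> out = new ArrayList<>();
        ListIterator<T> it = source.listIterator(source.size());
        while (it.hasPrevious()) {
            out.add(it.previous());
        }
        return out;
    }

    // index == size is accepted, but index == size + 1 is rejected.
    static boolean pastEndIsRejected(List<?> source) {
        try {
            source.listIterator(source.size() + 1);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        System.out.println(reversed(list));          // prints [3, 2, 1]
        System.out.println(pastEndIsRejected(list)); // prints true
    }
}
```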

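ListItr is also an inner class of ArrayList. It is an enhanced version of Itr: it has everything Itr has, plus three capabilities that Itr lacks.
1. Iterating in both directions.
2. Adding elements through the iterator.
3. Starting the iteration at a specified index.
ListItr extends Itr and implements ListIterator; ListIterator can be regarded as the List-specific enhancement of Iterator.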

public interface ListIterator<E> extends Iterator<E> {
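    boolean hasNext();      // is there a next element?
    E next();               // returns the next element
    boolean hasPrevious();  // is there a previous element?
    E previous();           // returns the previous element
    int nextIndex();        // index of the element next() would return
    int previousIndex();    // index of the element previous() would return
    void remove();          // removes the last element returned by next() or previous()
    void set(E e);          // replaces the last element returned by next() or previous()
    void add(E e);          // inserts an element at the current cursor position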
}

Below is the source of ListItr in ArrayList:

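/**
 * AbstractList has a ListItr as well; the one in ArrayList is an optimized version of it.
 * How ListItr differs from a plain Iterator:
 * - it can move in both directions, while a plain iterator only moves forward
 * - it can add elements, which a plain iterator cannot
 * An optimized version of AbstractList.ListItr
 */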
private class ListItr extends Itr implements ListIterator<E> {
    // This constructor takes an index argument,
    // which makes the iterator start at position index
    ListItr(int index) {
        super();
        cursor = index;
    }
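    // Is there a previous element?
    public boolean hasPrevious() {
        return cursor != 0;
    }
    // Index of the next element.
    // No bounds check is performed here.
    public int nextIndex() {
        return cursor;
    }
    /**
     * Index of the previous element.
     * No bounds check is performed here.
     * @return cursor - 1, which is -1 at the start of the list
     */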
    public int previousIndex() {
        return cursor - 1;
    }

    // Returns the element just before cursor
    @SuppressWarnings("unchecked")
    public E previous() {
        // Same comodification check as in next()
        checkForComodification();
        int i = cursor - 1;
        if (i < 0)  // bounds check
            throw new NoSuchElementException();
        // Grab the ArrayList's backing array
        Object[] elementData = ArrayList.this.elementData;
        if (i >= elementData.length)    // second check
            throw new ConcurrentModificationException();
        cursor = i;
        return (E) elementData[lastRet = i];
    }  

    public void set(E e) {
        // lastRet is reset to -1 after add() or remove(), and it is also -1 initially
        if (lastRet < 0)
            throw new IllegalStateException();
        checkForComodification();
        try {
            // Replace the element at the last returned position with e
            ArrayList.this.set(lastRet, e);
        } catch (IndexOutOfBoundsException ex) {
            throw new ConcurrentModificationException();
        }
    }
    public void add(E e) {
        checkForComodification();
        try {
            // Insert at cursor; the element currently there shifts one position to the right
            int i = cursor;
            ArrayList.this.add(i, e);   // insert e at position i
            cursor = i + 1;
            lastRet = -1;
            expectedModCount = modCount;
        } catch (IndexOutOfBoundsException ex) {
            throw new ConcurrentModificationException();
        }
    }
}

This is a fail-fast iterator as well.
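
The extra ListItr operations can be traced against the source above with a short sketch. ListIteratorDemo and its two helper methods are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

public class ListIteratorDemo {
    // Starts in the middle of the list, then replaces one element and inserts another.
    static List<String> editInPlace() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        ListIterator<String> it = list.listIterator(1); // cursor = 1
        String current = it.next();                     // "b"; cursor = 2, lastRet = 1
        it.set(current.toUpperCase());                  // position 1 becomes "B"
        it.add("x");                                    // inserted at index 2; cursor = 3, lastRet = -1
        return list;
    }

    // After add(), previous() returns the element that was just inserted.
    static String stepBackAfterAdd() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        ListIterator<String> it = list.listIterator(1);
        it.next();
        it.add("x");
        return it.previous();
    }

    public static void main(String[] args) {
        System.out.println(editInPlace());      // prints [a, B, x, c]
        System.out.println(stepBackAfterAdd()); // prints x
    }
}
```

Since add() resets lastRet to -1, calling set() or remove() right after add() would throw IllegalStateException, as the lastRet < 0 checks in the source show.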

Sublists in ArrayList

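Part 1 noted, in its discussion of toArray(), that modifying the array returned by toArray() does not affect the original list. The sublist discussed here, SubList, does affect the original list. The following method returns a SubList.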

/**
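 * Returns the sublist between fromIndex (inclusive) and toIndex (exclusive).
 * - If fromIndex == toIndex, the returned list is empty.
 * - Operations on the sublist affect the original list.
 * - If the original list is structurally modified after subList() is called, later use of
 * the sublist throws java.util.ConcurrentModificationException. As with Itr, the cause is
 * that modCount has changed; modifications should go through the sublist's own methods.
 * - The sublist supports all list operations.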
 * @throws IndexOutOfBoundsException {@inheritDoc}
 * @throws IllegalArgumentException  {@inheritDoc}
 */
public List<E> subList(int fromIndex, int toIndex) {
    subListRangeCheck(fromIndex, toIndex, size);
    return new ArrayList.SubList(this, 0, fromIndex, toIndex);
}

A helper method checks that the arguments are valid:

// Checks that the arguments are valid
// Note that the range is half-open: [fromIndex, toIndex)
static void subListRangeCheck(int fromIndex, int toIndex, int size) {
    if (fromIndex < 0)
        throw new IndexOutOfBoundsException("fromIndex = " + fromIndex);
    if (toIndex > size)
        throw new IndexOutOfBoundsException("toIndex = " + toIndex);
    if (fromIndex > toIndex)
        throw new IllegalArgumentException("fromIndex(" + fromIndex +
                ") > toIndex(" + toIndex + ")");
}
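
The three branches of the check map onto two different exception types, which is easy to confirm from the outside. In this sketch, SubListRangeDemo and outcome are hypothetical names; only subList itself is the real API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubListRangeDemo {
    // Calls subList and reports either the resulting size or the exception type.
    static String outcome(List<Integer> list, int fromIndex, int toIndex) {
        try {
            return "ok, size " + list.subList(fromIndex, toIndex).size();
        } catch (IndexOutOfBoundsException e) {
            return "IndexOutOfBoundsException";
        } catch (IllegalArgumentException e) {
            return "IllegalArgumentException";
        }
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(0, 1, 2, 3, 4));
        System.out.println(outcome(list, 1, 3));  // prints ok, size 2
        System.out.println(outcome(list, 2, 2));  // prints ok, size 0
        System.out.println(outcome(list, -1, 2)); // prints IndexOutOfBoundsException
        System.out.println(outcome(list, 0, 6));  // prints IndexOutOfBoundsException
        System.out.println(outcome(list, 3, 1));  // prints IllegalArgumentException
    }
}
```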

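SubList is a private inner class of ArrayList.
First, let's look at its fields:

private final AbstractList<E> parent;   // the parent list; the ArrayList itself for a top-level sublist
private final int parentOffset;         // offset relative to the parent list, i.e. fromIndex
private final int offset;               // offset into elementData: the constructor's offset argument (0 for a top-level sublist) plus fromIndex
int size;                               // number of elements in the SubList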

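Below is the SubList constructor:

/*
 * The sublist only maps onto the parent's elements; nothing is copied.
 * The offsets are final and cannot change once the sublist is created, which is why
 * the parent must not add or remove elements while the sublist is in use.
 *
 * @param parent    the list this sublist is a view of
 * @param offset    offset of the parent within elementData
 * @param fromIndex start of the range (inclusive), relative to parent
 * @param toIndex   end of the range (exclusive), relative to parent
 */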
SubList(AbstractList<E> parent,
        int offset, int fromIndex, int toIndex) {
    this.parent = parent;
    this.parentOffset = fromIndex;
    this.offset = offset + fromIndex;
    this.size = toIndex - fromIndex;
    this.modCount = ArrayList.this.modCount;
}

SubList does not copy the ArrayList; it maps its indices one-to-one onto the ArrayList's elementData array. The following method shows this:

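/**
 * Sets a new value and returns the old one.
 *
 * @param index index within the sublist
 * @param e     the new element
 * @return the element previously at that position
 */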
public E set(int index, E e) {
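    rangeCheck(index);          // bounds check
    checkForComodification();   // fail if the parent was structurally modified
    // As this shows, setting an element in the sublist writes directly into the parent's elementData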
    E oldValue = ArrayList.this.elementData(offset + index);
    ArrayList.this.elementData[offset + index] = e;
    return oldValue;
}

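As you can see, the element is accessed directly at offset + index. Like the iterators above, SubList has a checkForComodification() method, and it serves the same purpose. In other words, while a program is using a SubList of an ArrayList, it must not structurally modify the original ArrayList directly.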

private void checkForComodification() {
    if (ArrayList.this.modCount != this.modCount)
        throw new ConcurrentModificationException();
}
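
Both properties, write-through and the comodification check, can be observed with a few lines. SubListViewDemo and its method names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

public class SubListViewDemo {
    // set() on the sublist writes straight into the parent's storage.
    static List<Integer> writeThrough() {
        List<Integer> parent = new ArrayList<>(Arrays.asList(0, 1, 2, 3, 4));
        List<Integer> sub = parent.subList(1, 4); // view of [1, 2, 3]
        sub.set(0, 10);                           // sub index 0 is parent index 1
        return parent;
    }

    // A non-structural change to the parent (set) is visible through the sublist.
    static int parentSetIsVisible() {
        List<Integer> parent = new ArrayList<>(Arrays.asList(0, 1, 2, 3, 4));
        List<Integer> sub = parent.subList(1, 4);
        parent.set(1, 42); // set() does not touch modCount
        return sub.get(0);
    }

    // A structural change to the parent (add) invalidates the sublist.
    static boolean parentAddInvalidatesSub() {
        List<Integer> parent = new ArrayList<>(Arrays.asList(0, 1, 2, 3, 4));
        List<Integer> sub = parent.subList(1, 4);
        parent.add(5); // modCount changes behind the sublist's back
        try {
            sub.get(0);
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThrough());            // prints [0, 10, 2, 3, 4]
        System.out.println(parentSetIsVisible());      // prints 42
        System.out.println(parentAddInvalidatesSub()); // prints true
    }
}
```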

Next, let's look at the add() method:

// Inserts an element
public void add(int index, E e) {
    rangeCheckForAdd(index);
    checkForComodification();
    // Here too, parentOffset + index gives the corresponding index in the parent list, and the parent's add() is then called.
    parent.add(parentOffset + index, e);
    this.modCount = parent.modCount;
    this.size++;
}

This method directly calls ArrayList's add() method. The remaining methods are similar to the two shown above, so they are not described here.
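
Since structural changes made through the sublist are forwarded to the parent, a sublist can be used to edit a range of the original list. The sketch below shows adding through a sublist and the common subList(from, to).clear() idiom for deleting a range; SubListStructuralDemo and its helpers are names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubListStructuralDemo {
    // Removes the half-open range [from, to) from a copy of source via a sublist.
    static List<Integer> removeRange(List<Integer> source, int from, int to) {
        List<Integer> list = new ArrayList<>(source);
        list.subList(from, to).clear();
        return list;
    }

    // Inserts through the sublist; the element lands at parentOffset + index in the parent.
    static List<Integer> addThroughSubList() {
        List<Integer> parent = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        List<Integer> sub = parent.subList(1, 3); // view of [2, 3]
        sub.add(1, 99);                           // sub index 1 is parent index 2
        return parent;
    }

    // The sublist stays usable afterwards because add() resyncs its modCount and size.
    static int subSizeAfterAdd() {
        List<Integer> parent = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        List<Integer> sub = parent.subList(1, 3);
        sub.add(1, 99);
        return sub.size();
    }

    public static void main(String[] args) {
        System.out.println(removeRange(Arrays.asList(1, 2, 3, 4, 5), 1, 3)); // prints [1, 4, 5]
        System.out.println(addThroughSubList());                             // prints [1, 2, 99, 3, 4]
        System.out.println(subSizeAfterAdd());                               // prints 3
    }
}
```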