[Java container source code series] Collection application summary: iterator & batch operation & thread safety issues

The class diagrams of all collections are listed below:
[Image: class diagram of the Java collection framework]

  • Each interface has one clear job. Serializable only marks a class as serializable, Cloneable only marks it as copyable, and Map only defines the operations of a map. The diagram contains many interfaces, but the responsibility of each one is clear;
  • Richer behavior comes from combining interfaces. For example, ArrayList extends the abstract class AbstractList and implements List, RandomAccess, Cloneable and Serializable, which gives it the list operations, fast random access, copying, and serialization;
  • The class diagram only shows inheritance; composition is not visible in it. For example, Set implementations such as HashSet wrap a Map (HashMap) by composition and reuse its capabilities.

The biggest advantage of this design is that each interface carries a single responsibility, and a class gains capabilities by accumulating interfaces. If we need to implement another data structure, we can pick the capabilities we need from the existing interfaces and assemble them with little extra work, which speeds up development.

The same idea is useful in daily work. We abstract common blocks of code into a shared pool, and when a new scenario comes up we take the blocks we need from that pool and arrange them to build the required feature.

1. Iterator

1.1 Iterable

  • A class can be traversed if it implements the Iterable interface; the top-level interface Collection extends Iterable
  • There are two ways to traverse, and each concrete class provides its own implementation
    • Iterator: a class can offer several Iterators for different traversal orders (preorder, postorder...)
    • The forEach method, which concrete subclasses usually override
public interface Iterable<T> {

    Iterator<T> iterator();

    default void forEach(Consumer<? super T> action) {
        Objects.requireNonNull(action);
        for (T t : this) {
            action.accept(t);
        }
    }

    default Spliterator<T> spliterator() {
        return Spliterators.spliteratorUnknownSize(iterator(), 0);
    }
}
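
To make the interface above concrete, here is a minimal sketch of a class that becomes traversable just by implementing iterator(). IntRange and its helper methods are hypothetical names invented for this illustration; only Iterable, Iterator and the default forEach come from the JDK.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical example class: an inclusive integer range that can be traversed.
class IntRange implements Iterable<Integer> {
    private final int from;
    private final int to;

    IntRange(int from, int to) {
        this.from = from;
        this.to = to;
    }

    // The only abstract method of Iterable; each call returns a fresh cursor.
    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int next = from;

            @Override
            public boolean hasNext() {
                return next <= to;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return next++;
            }
        };
    }

    // Traverses with the enhanced for loop, which calls iterator() behind the scenes.
    static List<Integer> collectWithForLoop(IntRange range) {
        List<Integer> out = new ArrayList<>();
        for (int value : range) {
            out.add(value);
        }
        return out;
    }

    // Traverses with the default forEach inherited from Iterable.
    static List<Integer> collectWithForEach(IntRange range) {
        List<Integer> out = new ArrayList<>();
        range.forEach(out::add);
        return out;
    }

    public static void main(String[] args) {
        IntRange range = new IntRange(1, 3);
        System.out.println(collectWithForLoop(range)); // [1, 2, 3]
        System.out.println(collectWithForEach(range)); // [1, 2, 3]
    }
}
```

Because IntRange does not override forEach, the second helper runs the default implementation shown in the interface, which is itself just an enhanced for loop over iterator().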

1.1.1 Iterator interface

Usually implemented as an inner class of the collection and returned by the collection's iterator() method; typically used when elements have to be deleted during traversal

  • Iterator is a design pattern that encapsulates traversal, so different collections can be traversed in the same way without knowing their internal details
  • While iterating, the collection's own methods must not be used to add or remove elements, but the elements themselves may be modified (for example through their setters), and the iterator's remove() may be used
    • Do not call the collection's add/put or remove methods
    • Element attributes can be modified through the elements' set methods
    • To remove an element, use the Iterator's remove method

Note in particular that remove() may only be called after next(). To delete the first element, for example, call next() first and then remove()

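public interface Iterator<E> {

    // Call before each next() to check whether the iteration has reached the end
    boolean hasNext();

    // Returns the current element and advances the cursor
    E next();

    /* Removes the element most recently returned by next().
       remove() may only be called after next() has run. To delete the first
       element, for example, call next() first instead of calling remove()
       directly; calling remove() without a preceding next() throws an exception.
       This is similar to a ResultSet in MySQL (JDBC).
    */
    default void remove() {
        throw new UnsupportedOperationException("remove");
    }
}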
  • The use of iterators:
// iterator() is the collection's own factory method for its Iterator
Iterator it = list.iterator();
while (it.hasNext()) {
    it.next();
}
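
The rule that remove() must follow next() can be sketched as follows. IteratorRemoveDemo and its method names are made up for this example; the behavior shown is that of ArrayList's iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class for Iterator.remove().
class IteratorRemoveDemo {

    // Removes every even number through the iterator, which is safe during traversal.
    static List<Integer> removeEvens(List<Integer> source) {
        List<Integer> list = new ArrayList<>(source);
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            int value = it.next();   // advance the cursor first
            if (value % 2 == 0) {
                it.remove();         // removes the element just returned by next()
            }
        }
        return list;
    }

    // Calling remove() before any next() is rejected with IllegalStateException.
    static boolean removeBeforeNextFails() {
        Iterator<Integer> it = new ArrayList<>(Arrays.asList(1, 2, 3)).iterator();
        try {
            it.remove();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeEvens(Arrays.asList(1, 2, 3, 4, 5))); // [1, 3, 5]
        System.out.println(removeBeforeNextFails());                    // true
    }
}
```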

1.1.2 Enhanced for loop

In essence a simplified wrapper around Iterator; generally used when the collection is only traversed

  • Enhanced for loop:
    • Can be used to modify attribute values of the elements in the collection: through the elements' set methods
    • Cannot add to or remove from the collection: this is guarded by modCount
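
The two rules for the enhanced for loop (element attributes may change, the collection's structure may not) can be illustrated with a small sketch. EnhancedForDemo and its nested User class are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class for the enhanced for loop.
class EnhancedForDemo {

    // Hypothetical element type used only for this illustration.
    static class User {
        String name;

        User(String name) {
            this.name = name;
        }
    }

    // Changing a field of each element is fine: the list structure is untouched.
    static List<String> renameAll(List<User> users) {
        for (User u : users) {
            u.name = u.name.toUpperCase();
        }
        List<String> names = new ArrayList<>();
        for (User u : users) {
            names.add(u.name);
        }
        return names;
    }

    // Removing through the list inside the loop changes modCount,
    // so the hidden iterator fails on its next call to next().
    static boolean removeInsideLoopFails() {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        list.add("c");
        try {
            for (String s : list) {
                if (s.equals("a")) {
                    list.remove(s);
                }
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<User> users = new ArrayList<>();
        users.add(new User("zs"));
        users.add(new User("ls"));
        System.out.println(renameAll(users));        // [ZS, LS]
        System.out.println(removeInsideLoopFails()); // true
    }
}
```

The exception is thrown by the iterator that the compiler generates for the loop, not by list.remove itself, which is why it surfaces one iteration later.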

1.1.3 forEach method

In essence a wrapper around the for loop, used together with a lambda; generally used to modify the properties of the objects in the collection

// ArrayList.forEach()
@Override
public void forEach(Consumer<? super E> action) {
    // Reject a null action
    Objects.requireNonNull(action);
    // Take a snapshot of the current modCount
    final int expectedModCount = modCount;
    final E[] elementData = (E[]) this.elementData;
    final int size = this.size;
    // Every iteration checks whether the list was modified; the loop stops as soon as it was
    for (int i = 0; modCount == expectedModCount && i < size; i++) {
        // Run the loop body; action is the work we want to perform
        action.accept(elementData[i]);
    }
    // If the list was modified, throw
    if (modCount != expectedModCount) {
        throw new ConcurrentModificationException();
    }
}
  • forEach is declared with the default modifier in the Iterable interface, so the interface supplies an implementation and subclasses are not required to override it. As of JDK 8, every collection therefore has a forEach method
  • forEach:
    • Modify object properties: through the elements' set methods
    • Cannot add or remove elements: guarded by modCount
  • demo:
list.forEach(l -> {
    l.setName("zs");
    l.setAge(18);
});

1.2 Map iteration

Map does not implement the Iterable interface, but it can still be iterated

  • Iterate through the Iterator of one of its Set views, such as entrySet()

  • The top-level Map interface defines the forEach method

1.2.1 Set.iterator

For example, HashMap's EntrySet and EntryIterator are listed here

final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
    public final Iterator<Map.Entry<K,V>> iterator() {
        return new EntryIterator();
    }
}

final class EntryIterator extends HashIterator
        implements Iterator<Map.Entry<K,V>> {
    public final Map.Entry<K,V> next() { return nextNode(); }
}

Usage example:

Iterator<Map.Entry<String,String>> it = map.entrySet().iterator();
while (it.hasNext()) {
    Map.Entry<String,String> me = it.next();
    // get the key
    me.getKey();
    // get the value
    me.getValue();
}

1.2.2 Map.forEach

The forEach that Map declares with the default modifier simply iterates over entrySet()

default void forEach(BiConsumer<? super K, ? super V> action) {
    Objects.requireNonNull(action);
    for (Map.Entry<K, V> entry : entrySet()) {
        K k;
        V v;
        try {
            k = entry.getKey();
            v = entry.getValue();
        } catch(IllegalStateException ise) {
            // this usually means the entry is no longer in the map.
            throw new ConcurrentModificationException(ise);
        }
        action.accept(k, v);
    }
}

HashMap's forEach

@Override
public void forEach(BiConsumer<? super K, ? super V> action) {
    Node<K,V>[] tab;
    if (action == null)
        throw new NullPointerException();
    if (size > 0 && (tab = table) != null) {
        int mc = modCount;
        for (int i = 0; i < tab.length; ++i) {
            for (Node<K,V> e = tab[i]; e != null; e = e.next)
                action.accept(e.key, e.value);
        }
        if (modCount != mc)
            throw new ConcurrentModificationException();
    }
}
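
The two ways of iterating a Map described in this section can be put side by side in a short sketch. MapIterationDemo and its method names are hypothetical; the calls themselves (Map.forEach, entrySet().iterator(), Iterator.remove) are standard JDK API.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical demo class for the two Map iteration styles.
class MapIterationDemo {

    // Read-only traversal with Map.forEach and a two-argument lambda.
    static int sumValues(Map<String, Integer> map) {
        int[] sum = {0}; // array cell because lambdas cannot assign to local variables
        map.forEach((key, value) -> sum[0] += value);
        return sum[0];
    }

    // Traversal with the entry-set iterator, which also allows safe removal;
    // removing through the iterator writes through to the underlying map.
    static Map<String, Integer> dropNegatives(Map<String, Integer> source) {
        Map<String, Integer> map = new HashMap<>(source);
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() < 0) {
                it.remove();
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("a", 1);
        scores.put("b", -2);
        scores.put("c", 3);
        System.out.println(sumValues(scores));            // 2
        System.out.println(dropNegatives(scores).size()); // 2
    }
}
```

As a rule of thumb, forEach suits read-only passes, while the entry-set iterator is the option to reach for when entries have to be removed along the way.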

2. Batch operation

2.1 Batch add

The source code of the ArrayList.addAll method is listed below:

public boolean addAll(Collection<? extends E> c) {
    Object[] a = c.toArray();
    int numNew = a.length;
    // Make sure the capacity is sufficient; the whole operation expands at most once
    ensureCapacityInternal(size + numNew);
    // Copy the new elements into the backing array
    System.arraycopy(a, 0, elementData, size, numNew);
    size += numNew;
    return numNew != 0;
}

As we can see, the capacity is expanded at most once during the whole batch addition. HashMap's putAll works along the same lines: it sizes the table up front from the number of incoming entries. This greatly reduces the time spent on batch additions and improves performance.

So when collections are copied or filled in batches and insertion performance matters, a good place to start is the initialization of the target collection.

This also reminds us that it is best to give a container an initial capacity when it is created, so that it does not keep expanding while elements are put in. The HashSet source code in the previous chapter shows how an initial capacity is passed to HashMap. The formula is the larger of two values: max(expected size / 0.75 + 1, default value 16).
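
As a rough sketch of that sizing formula in use: the class and method names below (PresizeDemo, initialCapacityFor, copyIntoPresizedMap) are hypothetical, and the formula mirrors the one the text attributes to HashSet. HashMap rounds the requested capacity up to the next power of two internally, so the value computed here is a lower bound rather than the exact table size.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class for pre-sizing a HashMap.
class PresizeDemo {

    // max(expected size / 0.75 + 1, 16), with 0.75 being HashMap's default load factor.
    static int initialCapacityFor(int expectedSize) {
        return Math.max((int) (expectedSize / 0.75f) + 1, 16);
    }

    // Fills a map whose capacity was chosen up front, so the puts below
    // should not trigger a resize for this number of keys.
    static Map<Integer, String> copyIntoPresizedMap(List<Integer> keys) {
        Map<Integer, String> map = new HashMap<>(initialCapacityFor(keys.size()));
        for (Integer key : keys) {
            map.put(key, "v" + key);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(initialCapacityFor(3));   // 16
        System.out.println(initialCapacityFor(100)); // 134

        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            keys.add(i);
        }
        System.out.println(copyIntoPresizedMap(keys).size()); // 100
    }
}
```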

Usage example:

When adding a large amount of data to a List or Map, we should avoid a for loop with add/put, which incurs repeated expansion costs, and prefer the addAll and putAll methods. The following demo uses ArrayList to compare the performance of the two approaches:

// Assumes JUnit's @Test and an SLF4J-style logger named log are available in the test class
@Test
public void testBatchInsert() {
    // Prepare the data to copy
    ArrayList<Integer> list = new ArrayList<>();
    for (int i = 0; i < 3000000; i++) {
        list.add(i);
    }

    // for loop + add
    ArrayList<Integer> list2 = new ArrayList<>();
    long start1 = System.currentTimeMillis();
    for (int i = 0; i < list.size(); i++) {
        list2.add(list.get(i));
    }
    log.info("for loop + add of 3,000,000 elements took {} ms", System.currentTimeMillis() - start1);

    // Batch add
    ArrayList<Integer> list3 = new ArrayList<>();
    long start2 = System.currentTimeMillis();
    list3.addAll(list);
    log.info("batch add of 3,000,000 elements took {} ms", System.currentTimeMillis() - start2);
}

The log printed at the end is:

16:52:59.865 [main] INFO demo.one.ArrayListDemo - for loop + add of 3,000,000 elements took 1518 ms

16:52:59.880 [main] INFO demo.one.ArrayListDemo - batch add of 3,000,000 elements took 8 ms

In this run the batch addition was roughly 190 times faster than adding one element at a time (1518 ms versus 8 ms; this is a single measurement, so the exact ratio will vary between runs and machines). The main reason is that the batch addition expands the array only once, whereas the single-element approach expands every time the capacity threshold is reached and keeps doing so throughout the loop, which wastes a lot of time.

2.2 Batch delete

ArrayList provides a removeAll method for batch deletion; HashMap does not provide a batch delete method. Let's look at the source code of removeAll to see how it improves performance:

// Batch delete: removeAll delegates to batchRemove
// removeAll passes complement = false, which means elements NOT contained in c are moved toward the head (kept)
// complement = true (used by retainAll) means elements contained in c are moved toward the head (kept)
private boolean batchRemove(Collection<?> c, boolean complement) {
    final Object[] elementData = this.elementData;
    // r is the current read position; everything before w is data to keep,
    // everything from w onward is data to be deleted
    int r = 0, w = 0;
    boolean modified = false;

    try {
        // Starting at index 0, check whether each element is to be deleted; if not, move it toward the head
        for (; r < size; r++)
            if (c.contains(elementData[r]) == complement)
                elementData[w++] = elementData[r];
    } finally {
        // r != size means an exception was thrown inside the try block and the loop stopped at r
        // Move the elements from r onward to position w (they have not been checked yet, so they
        // are kept as they are, while the elements already checked can still be deleted)
        if (r != size) {
            System.arraycopy(elementData, r,
                             elementData, w,
                             size - r);
            w += size - r;
        }

        // w != size means some elements have to be deleted
        // if w equals size, nothing has to be deleted
        if (w != size) {
            // Everything from w onward is to be deleted; null it out to help GC
            for (int i = w; i < size; i++)
                elementData[i] = null;
            modCount += size - w;
            size = w;
            modified = true;
        }
    }
    return modified;
}

We can see that when ArrayList deletes in batches, a normal run needs only one for loop; if an exception interrupts the loop, one extra array copy is performed. A single remove call, by contrast, copies the array every time it runs (except when the removed element happens to be the last one), so the larger the array and the more elements there are to delete, the worse repeated single removes perform. For batch deletion from an ArrayList, the removeAll method is therefore strongly recommended.
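
A minimal usage sketch of the batch delete described above. BatchRemoveDemo and its method names are hypothetical. Wrapping the values to delete in a HashSet is a choice made for this sketch, not something the article prescribes: batchRemove calls c.contains once per element, and a HashSet keeps each of those lookups cheap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class for ArrayList.removeAll and retainAll.
class BatchRemoveDemo {

    // Deletes every element of toDelete from a copy of source in one pass.
    static List<Integer> removeAllOf(List<Integer> source, List<Integer> toDelete) {
        List<Integer> list = new ArrayList<>(source);
        Set<Integer> lookup = new HashSet<>(toDelete); // fast contains() for batchRemove
        list.removeAll(lookup);
        return list;
    }

    // retainAll is the complement = true case: keep only what is in toKeep.
    static List<Integer> retainAllOf(List<Integer> source, List<Integer> toKeep) {
        List<Integer> list = new ArrayList<>(source);
        list.retainAll(new HashSet<>(toKeep));
        return list;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6);
        List<Integer> evens = Arrays.asList(2, 4, 6);
        System.out.println(removeAllOf(data, evens)); // [1, 3, 5]
        System.out.println(retainAllOf(data, evens)); // [2, 4, 6]
    }
}
```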

3. Thread safety issues

We say that the collections are not thread-safe. This means that when a collection is a shared variable that multiple threads read and write, its behavior is not safe. For a thread-safe collection, the JDK class comments consistently recommend the Collections.synchronized* methods. Collections provides thread-safe wrappers for List, Set, and Map, as shown in the following figure:
[Image: the Collections.synchronized* methods]
The figure lists the thread-safe wrapper methods for the various collection types. Taking synchronizedList as an example, let's see from the source code how Collections achieves thread safety:

// mutex is the object we lock on (declared in SynchronizedCollection)
final Object mutex;

// The Synchronized* classes are all static nested classes of Collections
static class SynchronizedList<E> extends SynchronizedCollection<E> implements List<E> {
    private static final long serialVersionUID = -7754090372962971524L;
    // The class to be made thread-safe (a List) is passed in by composition
    // Collections.synchronizedList(list)
    final List<E> list;
    SynchronizedList(List<E> list, Object mutex) {
        super(list, mutex);
        this.list = list;
    }

    // Every List operation takes the lock with the synchronized keyword
    // synchronized is a pessimistic lock: only one thread can hold it at any moment
    public E get(int index) {
        synchronized (mutex) { return list.get(index); }
    }
    public E set(int index, E element) {
        synchronized (mutex) { return list.set(index, element); }
    }
    public void add(int index, E element) {
        synchronized (mutex) { list.add(index, element); }
    }
    …………
}

From the source code we can see that Collections wraps each List operation in a synchronized block on the mutex to achieve thread safety.
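
The following sketch shows how such a wrapper might be used. SynchronizedListDemo and its methods are hypothetical names. One detail goes beyond the excerpt above: the Collections.synchronizedList documentation requires callers to synchronize on the returned list themselves while iterating, because a traversal is many calls rather than one locked call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class for Collections.synchronizedList.
class SynchronizedListDemo {

    // Two threads append concurrently; the wrapper serializes every add().
    static int concurrentAdds(int perThread) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                list.add(i);
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return list.size();
    }

    // Iteration is not a single locked call, so the caller holds the lock manually.
    static int sum(List<Integer> syncList) {
        int total = 0;
        synchronized (syncList) {
            for (int value : syncList) {
                total += value;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(concurrentAdds(10000)); // 20000

        List<Integer> numbers = Collections.synchronizedList(new ArrayList<>());
        numbers.add(1);
        numbers.add(2);
        numbers.add(3);
        System.out.println(sum(numbers)); // 6
    }
}
```

With a plain ArrayList in place of the wrapper, the first method could lose updates or throw, which is the non-thread-safety this section describes.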

4. Two points of attention

To close the article, here are two points to keep in mind when using collections:

  1. Override equals & hashCode
    When the elements of a collection are instances of a custom class, that class should override both equals and hashCode; overriding only one is not enough. Apart from TreeMap and TreeSet, which compare elements through a comparator, the collection classes use equals and hashCode to determine index positions and equality, as mentioned in the earlier source code analysis. So when collection elements are a custom class, we strongly recommend overriding both methods. IDEA can generate them directly, which is very convenient

  2. Deleting during iteration
    For the collection classes discussed here, calling the collection's own remove method while looping over it with a for loop fails fast and throws ConcurrentModificationException. It is therefore recommended to delete through an iterator whenever elements are deleted inside a loop;
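
Point 1 can be illustrated with a small sketch. Person and RawPerson are hypothetical classes written for this example: the first overrides equals and hashCode, the second inherits the identity-based versions from Object.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical demo class for the equals/hashCode recommendation.
class EqualsHashCodeDemo {

    // Value class: two persons are equal when name and age match.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Person)) {
                return false;
            }
            Person other = (Person) o;
            return age == other.age && Objects.equals(name, other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, age);
        }
    }

    // Same fields, no overrides: equality falls back to object identity.
    static final class RawPerson {
        private final String name;
        private final int age;

        RawPerson(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // The HashSet recognizes the second Person as a duplicate.
    static int distinctPersons() {
        Set<Person> set = new HashSet<>();
        set.add(new Person("zs", 18));
        set.add(new Person("zs", 18));
        return set.size();
    }

    // Without the overrides, both objects are stored.
    static int distinctRawPersons() {
        Set<RawPerson> set = new HashSet<>();
        set.add(new RawPerson("zs", 18));
        set.add(new RawPerson("zs", 18));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctPersons());    // 1
        System.out.println(distinctRawPersons()); // 2
    }
}
```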

Origin blog.csdn.net/weixin_43935927/article/details/108543312