Analysis of the working principle of LinkedList in JDK8

Although LinkedList is not used frequently in daily development, it is a fundamentally different data structure from the array, so its underlying implementation is worth understanding.

Before that, let's review the advantages and disadvantages of ArrayList. ArrayList is implemented as a dynamically managed array. An array occupies a contiguous block of memory, so access by index and traversal are very fast. The disadvantage lies in adding and deleting elements: inserting or removing anywhere but the end means copying and shifting the elements that follow, and an insertion may also trigger a capacity expansion, so these operations are relatively inefficient.

The linked list addresses exactly these shortcomings. Its elements are not contiguous in memory; instead, each element is linked to its neighbours by references. The advantage is that adding and deleting elements is cheaper, because only a few references need to be updated and there is no capacity to check or expand. The disadvantage is that locating an element by position is relatively slow, because the list has to be walked node by node.

First, let's take a look at the inheritance structure of LinkedList:

    public class LinkedList<E>
        extends AbstractSequentialList<E>
        implements List<E>, Deque<E>, Cloneable, java.io.Serializable

As you can see from the declaration above:

LinkedList extends AbstractSequentialList and is a doubly linked list that can be used as a queue, a stack, or a double-ended queue.

Implementing the List interface gives it the ordinary list operations, such as access by index.

Implementing the Deque interface gives it double-ended queue operations.

Implementing the Cloneable interface allows shallow cloning.

Implementing the Serializable interface allows it to be sent over the network or persisted; serialization can also be used for deep cloning.
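
As a rough illustration of the points above, the following sketch uses a single LinkedList through its List and Deque types and shows that clone() is shallow. The class name InterfaceViewsDemo and its method names are invented for this example.

```java
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;

class InterfaceViewsDemo {

    // Uses the same LinkedList object as a List and as a Deque.
    static String listAndDequeViews() {
        LinkedList<String> linked = new LinkedList<>();
        List<String> asList = linked;    // positional operations
        Deque<String> asDeque = linked;  // operations at both ends

        asList.add("b");
        asDeque.addFirst("a");
        asDeque.addLast("c");
        return asList.get(0) + asList.get(1) + asList.get(2);
    }

    // clone() copies the nodes but not the elements they refer to.
    static boolean cloneSharesElements() {
        LinkedList<StringBuilder> original = new LinkedList<>();
        original.add(new StringBuilder("x"));

        @SuppressWarnings("unchecked")
        LinkedList<StringBuilder> copy = (LinkedList<StringBuilder>) original.clone();

        copy.getFirst().append("y");       // mutates the element shared by both lists
        copy.add(new StringBuilder("z"));  // structural change, visible in the copy only
        return original.getFirst().toString().equals("xy") && original.size() == 1;
    }

    public static void main(String[] args) {
        System.out.println(listAndDequeViews());
        System.out.println(cloneSharesElements());
    }
}
```

The second method returns true: the two lists have separate nodes, but both refer to the same StringBuilder.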

Let's first look at the core data structure of LinkedList:

    private static class Node<E> {
        E item; // element stored in this node
        Node<E> next;// successor node
        Node<E> prev;// predecessor node

        Node(Node<E> prev, E element, Node<E> next) {
            this.item = element;
            this.next = next;
            this.prev = prev;
        }
    }

Then look at its member variables and constructors:

     
    transient int size = 0;// number of elements currently stored

    transient Node<E> first;// first node

    transient Node<E> last;// last node

    // no-argument constructor
    public LinkedList() {
    }

    // constructor that takes a collection
    public LinkedList(Collection<? extends E> c) {
        this();
        addAll(c);
    }

As can be seen above, LinkedList has two constructors, one without parameters and one that takes a collection. The second constructor inserts all the elements of the given collection into the new list. Note that this is described as insertion rather than initialization: the constructor first calls this() to create an empty list and then delegates to addAll, the same general-purpose method that inserts elements into a list that may already contain data.
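
A small usage sketch of the collection constructor follows. The class and method names are invented for the example; it only shows that the new list receives the elements in the source collection's order and is independent of the source afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

class CopyConstructorDemo {

    static String copyThenModifySource() {
        List<String> source = new ArrayList<>(Arrays.asList("a", "b", "c"));

        // Same effect as creating an empty list and calling addAll(source).
        LinkedList<String> copy = new LinkedList<>(source);

        source.add("d"); // later changes to the source do not affect the copy
        return copy.toString();
    }

    public static void main(String[] args) {
        System.out.println(copyThenModifySource());
    }
}
```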

(1) Analysis of the addAll method

Let's take a look at the call chain of the addAll method:

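    //1
    public boolean addAll(Collection<? extends E> c) {
        return addAll(size, c);
    }

    //2
    public boolean addAll(int index, Collection<? extends E> c) {
        // check that index is within [0, size]
        checkPositionIndex(index);
        // convert the collection argument to an array
        Object[] a = c.toArray();
        // number of elements to add
        int numNew = a.length;
        if (numNew == 0)
            return false;

        // pred: node before the insertion point; succ: node after it
        Node<E> pred, succ;
        // inserting at the tail
        if (index == size) {
            succ = null;// nothing follows the insertion point
            pred = last;// the predecessor is the current last node
        } else {// inserting before an existing node
            succ = node(index);// the node currently at index becomes the successor
            pred = succ.prev;// its predecessor becomes the predecessor of the new nodes
        }

        // iterate over the array
        for (Object o : a) {
            @SuppressWarnings("unchecked") E e = (E) o;// cast to the element type
            // create the new node, linked back to pred
            Node<E> newNode = new Node<>(pred, e, null);
            // no predecessor means the new node is the first node of the list
            if (pred == null)
                first = newNode;
            else// otherwise link pred forward to the new node
                pred.next = newNode;
            // the new node becomes the predecessor for the next iteration
            pred = newNode;
        }
        // after the loop, pred is the last node that was inserted
        if (succ == null) {// tail insertion: the last inserted node becomes the new last node
            last = pred;
        } else {// otherwise join the last inserted node to succ
            pred.next = succ;// the last inserted node points forward to the node that was at index
            succ.prev = pred;// and that node points back to the last inserted node
        }
        // update size
        size += numNew;
        modCount++;// count the structural modification
        return true;
    }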


    
    //3 return the node at the given index
    Node<E> node(int index) {
        // assert isElementIndex(index);
        // if index is in the first half, walk forward from the first node
        if (index < (size >> 1)) {
            Node<E> x = first;// start at the first node
            for (int i = 0; i < index; i++)
                x = x.next;// follow next index times
            return x;
        } else {// otherwise walk backward from the last node
            Node<E> x = last;// start at the last node
            for (int i = size - 1; i > index; i--)
                x = x.prev;// follow prev (size - 1 - index) times
            return x;
        }
    }

There are two main methods here:

The first is addAll(int index, Collection c). It first checks that the index is within bounds, converts the collection to an array, and then sets up two temporary references, pred and succ, for the nodes immediately before and after the insertion point. Leaving aside insertion into an empty list, the code works like cutting a wooden stick in two: the end of the first piece is pred and the start of the second piece is succ. The new elements are placed between the two pieces and linked on one after another starting from pred, and finally the last new node is joined to succ. The result is a single, longer stick. The process is easiest to follow by drawing it on paper.
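
The wooden-stick picture can be checked from the outside with a short usage sketch. The class and method names below are invented for the example; the comments relate each call to the pred and succ variables in the source.

```java
import java.util.Arrays;
import java.util.LinkedList;

class AddAllDemo {

    static String insertInMiddle() {
        LinkedList<String> list = new LinkedList<>(Arrays.asList("a", "b", "e", "f"));
        // index 2: pred is the node holding "b", succ is the node holding "e".
        list.addAll(2, Arrays.asList("c", "d"));
        return list.toString();
    }

    static String appendAtTail() {
        LinkedList<String> list = new LinkedList<>(Arrays.asList("a", "b"));
        // index == size: succ is null and pred is the old last node.
        list.addAll(Arrays.asList("c", "d"));
        return list.toString();
    }

    static boolean addAllOfEmptyCollection() {
        LinkedList<String> list = new LinkedList<>(Arrays.asList("a"));
        // numNew == 0, so the method returns false without touching the list.
        return list.addAll(0, new LinkedList<String>());
    }

    public static void main(String[] args) {
        System.out.println(insertInMiddle());
        System.out.println(appendAtTail());
        System.out.println(addAllOfEmptyCollection());
    }
}
```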

The second is node(int index), which finds the node at a given index. The source optimizes the traversal by halving it: if the index is less than half of size, the search walks forward from the first node; otherwise it walks backward from the last node. Even so, locating a node by index remains the weak point of a linked list and is still an O(n) operation.
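
To see why the halving does not change the O(n) bound, the following sketch counts how many links node(index) would follow. It is a standalone helper written for this article, not part of the JDK; the name NodeLookupCost is made up.

```java
class NodeLookupCost {

    // Number of next/prev hops that node(index) performs on a list of the
    // given size, using the same rule as the source: indexes in the front
    // half are reached from first, the rest from last.
    static int hops(int index, int size) {
        if (index < (size >> 1)) {
            return index;             // loop runs for i = 0 .. index - 1
        } else {
            return size - 1 - index;  // loop runs for i = size - 1 .. index + 1
        }
    }

    public static void main(String[] args) {
        int size = 10;
        for (int index = 0; index < size; index++) {
            System.out.println("index " + index + ": " + hops(index, size) + " hops");
        }
    }
}
```

The cost is zero at both ends and peaks near the middle at about size / 2 hops, which still grows linearly with the size of the list.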

(2) Analysis of the add method

The add method is undoubtedly a frequently used method for operating linked lists. Its source code is as follows:

    //1
    public boolean add(E e) {
        linkLast(e);
        return true;
    }

    //2
    void linkLast(E e) {
        // remember the current last node
        final Node<E> l = last;
        // create a new node whose predecessor is the current last node
        final Node<E> newNode = new Node<>(l, e, null);
        // the new node becomes the last node of the list
        last = newNode;
        // the list was empty: the new node is also the first node
        if (l == null)
            first = newNode;
        else// otherwise link the old last node forward to the new node
            l.next = newNode;
        size++;// update size
        modCount++;
    }

As the code shows, add always places the new node at the end of the list. Because the list keeps a reference to its last node, no traversal is needed, so appending can be regarded as an O(1) operation.
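
The same idea can be tried in isolation. The class below is a stripped-down sketch written for this article, not the JDK class: MiniLinkedList, forward() and backward() are invented names, and modCount is left out for brevity.

```java
class MiniLinkedList<E> {

    private static class Node<E> {
        E item;
        Node<E> next;
        Node<E> prev;

        Node(Node<E> prev, E item, Node<E> next) {
            this.prev = prev;
            this.item = item;
            this.next = next;
        }
    }

    private Node<E> first;
    private Node<E> last;
    private int size;

    // Appends in constant time: only the old tail and the new node are touched.
    void linkLast(E e) {
        Node<E> l = last;
        Node<E> newNode = new Node<>(l, e, null);
        last = newNode;
        if (l == null) {
            first = newNode;   // the list was empty
        } else {
            l.next = newNode;  // hook the old tail to the new node
        }
        size++;
    }

    int size() {
        return size;
    }

    // Walks the next links from first, to check the forward chain.
    String forward() {
        StringBuilder sb = new StringBuilder();
        for (Node<E> x = first; x != null; x = x.next) sb.append(x.item);
        return sb.toString();
    }

    // Walks the prev links from last, to check the backward chain.
    String backward() {
        StringBuilder sb = new StringBuilder();
        for (Node<E> x = last; x != null; x = x.prev) sb.append(x.item);
        return sb.toString();
    }
}
```

After appending "a", "b" and "c", forward() yields "abc" and backward() yields "cba", which confirms that both link directions are maintained by linkLast.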

(3) Analysis of the remove method

There are two commonly used removal methods: one removes by index and the other removes by Object. The source code is as follows:

    //1
    public E remove(int index) {
        checkElementIndex(index);
        return unlink(node(index));
    }

    //2
    public boolean remove(Object o) {
        // the element to remove is null
        if (o == null) {
            // find the first node whose item is null and unlink it
            for (Node<E> x = first; x != null; x = x.next) {
                if (x.item == null) {
                    unlink(x);
                    return true;
                }
            }
        } else {
            // otherwise find the first node whose item equals o and unlink it
            for (Node<E> x = first; x != null; x = x.next) {
                if (o.equals(x.item)) {
                    unlink(x);
                    return true;
                }
            }
        }
        return false;
    }


    //3
    E unlink(Node<E> x) {
        // assert x != null;
        // the element being removed
        final E element = x.item;
        // successor of the removed node
        final Node<E> next = x.next;
        // predecessor of the removed node
        final Node<E> prev = x.prev;

        // no predecessor: x was the first node, so its successor becomes first
        if (prev == null) {
            first = next;
        } else {// otherwise the predecessor skips over x
            prev.next = next;
            x.prev = null;// clear x's link to its predecessor
        }
        // no successor: x was the last node, so its predecessor becomes last
        if (next == null) {
            last = prev;
        } else {// otherwise the successor points back past x
            next.prev = prev;
            x.next = null;// clear x's link to its successor
        }
        // clear the item to help GC
        x.item = null;
        size--;
        modCount++;
        return element;
    }

As the source code shows, removal by index calls node(index) to find the node to remove, while removal by Object walks the list from the first node until it finds a matching element. In both cases the node is then detached with unlink.

In addition, the linked list has the methods remove(), removeFirst(), and removeLast(), which take no parameters. The remove() method simply calls removeFirst().

The conclusion is that deleting at the head or tail of the list is an O(1) operation, while deleting anywhere else is O(n) in the worst case, because the node must first be found by traversal.
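
The two overloads are easy to confuse when the elements are Integers, because an int argument selects remove(int index) and not remove(Object o). The sketch below, with invented class and method names, shows each removal variant on a small list.

```java
import java.util.Arrays;
import java.util.LinkedList;

class RemoveDemo {

    static String removeByIndex() {
        LinkedList<Integer> list = new LinkedList<>(Arrays.asList(10, 20, 30));
        list.remove(1);                    // int argument: removes the element at index 1
        return list.toString();
    }

    static String removeByObject() {
        LinkedList<Integer> list = new LinkedList<>(Arrays.asList(10, 20, 30, 10));
        list.remove(Integer.valueOf(10));  // Object argument: removes only the first 10
        return list.toString();
    }

    static String removeAtEnds() {
        LinkedList<Integer> list = new LinkedList<>(Arrays.asList(10, 20, 30));
        list.remove();                     // same as removeFirst()
        list.removeLast();
        return list.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeByIndex());
        System.out.println(removeByObject());
        System.out.println(removeAtEnds());
    }
}
```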

(4) Analysis of the get method

There are three get-style methods: get(index), getFirst(), and getLast(). The get(index) method is as follows:

    public E get(int index) {
        checkElementIndex(index);// bounds check
        return node(index).item;// halved traversal
    }

The get(index) method simply calls node(index), whose halved traversal was analyzed above, so it is an O(n) operation. getFirst and getLast need no further explanation: both are O(1) operations.
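
One practical consequence is that an index-based loop over a LinkedList calls node(index) on every step. The sketch below contrasts that with iteration; the class and method names are invented, and both methods return the same sum while doing very different amounts of work.

```java
import java.util.LinkedList;

class TraversalDemo {

    // Each get(i) walks from one end of the list, so the whole loop is O(n^2).
    static long sumByIndex(LinkedList<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // The for-each loop uses the list's iterator, which follows next links: O(n).
    static long sumByIterator(LinkedList<Integer> list) {
        long sum = 0;
        for (int value : list) {
            sum += value;
        }
        return sum;
    }

    // Builds the list 1, 2, ..., n.
    static LinkedList<Integer> range(int n) {
        LinkedList<Integer> list = new LinkedList<>();
        for (int i = 1; i <= n; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        LinkedList<Integer> list = range(100);
        System.out.println(sumByIndex(list));
        System.out.println(sumByIterator(list));
    }
}
```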

(5) Analysis of the set method

The source code is as follows:

    public E set(int index, E element) {
        checkElementIndex(index);// bounds check
        Node<E> x = node(index);// halved traversal
        E oldVal = x.item;// read the old value
        x.item = element;// store the new value
        return oldVal;// return the old value
    }

The set method also relies on node(index), so updating the element at a given position costs O(n), like get(index).
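
When many elements need to be replaced, a ListIterator can update each one in place without a fresh node(index) lookup. This alternative is not discussed in the article; the sketch below uses invented class and method names and the standard ListIterator.set method.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.ListIterator;

class SetDemo {

    static String setByIndex() {
        LinkedList<String> list = new LinkedList<>(Arrays.asList("a", "b", "c"));
        String old = list.set(1, "B");   // returns the value that was replaced
        return old + " -> " + list;
    }

    // Replaces every element in a single pass over the list.
    static String upperCaseAll() {
        LinkedList<String> list = new LinkedList<>(Arrays.asList("a", "b", "c"));
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            String s = it.next();
            it.set(s.toUpperCase());     // replaces the element just returned by next()
        }
        return list.toString();
    }

    public static void main(String[] args) {
        System.out.println(setByIndex());
        System.out.println(upperCaseAll());
    }
}
```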

(6) Analysis of the clear method

    public void clear() {
        // walk every node and null out its fields
        for (Node<E> x = first; x != null; ) {
            Node<E> next = x.next;
            x.item = null;
            x.next = null;
            x.prev = null;
            x = next;
        }
        first = last = null;
        size = 0;
        modCount++;
    }

The clear method is relatively simple: it sets the item, next, and prev fields of every node to null, which helps garbage collection, and then resets first, last, and size.

(7) Analysis of the toArray method

    public Object[] toArray() {
        // allocate an array of the same length as the list
        Object[] result = new Object[size];
        int i = 0;
        for (Node<E> x = first; x != null; x = x.next)
            result[i++] = x.item;
        return result;
    }

The method allocates an array of the same length as the list and copies the elements into it in order.
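
A short usage sketch follows, with invented class and method names. It also uses the toArray(T[]) overload, which is not covered in the article but is part of the standard Collection interface and keeps the element type.

```java
import java.util.Arrays;
import java.util.LinkedList;

class ToArrayDemo {

    static String untypedCopy() {
        LinkedList<String> list = new LinkedList<>(Arrays.asList("a", "b"));
        Object[] copy = list.toArray();
        copy[0] = "changed";   // the array is a copy, so the list is not affected
        return list.getFirst() + "/" + copy.length;
    }

    static String typedCopy() {
        LinkedList<String> list = new LinkedList<>(Arrays.asList("a", "b"));
        String[] copy = list.toArray(new String[0]);  // typed overload
        return String.join(",", copy);
    }

    public static void main(String[] args) {
        System.out.println(untypedCopy());
        System.out.println(typedCopy());
    }
}
```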

(8) Analysis of serialization and deserialization methods

    // serialization
    private void writeObject(java.io.ObjectOutputStream s)
        throws java.io.IOException {
        s.defaultWriteObject();
        // write the size first
        s.writeInt(size);
        // then walk the list and write each element to the stream
        for (Node<E> x = first; x != null; x = x.next)
            s.writeObject(x.item);
    }

    // deserialization
    private void readObject(java.io.ObjectInputStream s)
        throws java.io.IOException, ClassNotFoundException {
        // Read in any hidden serialization magic
        s.defaultReadObject();

        // read the size first
        int size = s.readInt();
        // then read the elements one by one, appending each to the end of the list
        for (int i = 0; i < size; i++)
            linkLast((E)s.readObject());
    }

As we can see, LinkedList customizes both serialization and deserialization. When serializing, only x.item is written for each node, not the whole Node. This bypasses Java's default serialization of the node structure: since each Node of a doubly linked list also holds prev and next references, writing the nodes themselves would store those links as well and waste space.

When deserializing, the size is read first, then that many items are read in order, each one appended to the end of the list with linkLast, which rebuilds the doubly linked structure.
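
The round trip can be exercised with in-memory streams, which is also the usual way to obtain a deep copy through serialization. The sketch below is only an illustration: SerializationDemo, roundTrip and isDeepCopy are invented names, and the elements must themselves be Serializable (StringBuilder is).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.LinkedList;

class SerializationDemo {

    // Serializes the list to a byte array and reads it back as a new list.
    @SuppressWarnings("unchecked")
    static <T> LinkedList<T> roundTrip(LinkedList<T> list) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(list);   // ends up in LinkedList.writeObject
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (LinkedList<T>) in.readObject();  // ends up in LinkedList.readObject
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true when changing an element of the copy leaves the original untouched.
    static boolean isDeepCopy() {
        LinkedList<StringBuilder> original = new LinkedList<>();
        original.add(new StringBuilder("x"));

        LinkedList<StringBuilder> copy = roundTrip(original);
        copy.getFirst().append("y");

        return original.getFirst().toString().equals("x")
                && copy.getFirst().toString().equals("xy");
    }

    public static void main(String[] args) {
        System.out.println(isDeepCopy());
    }
}
```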

(9) Methods for operating queues or stacks

As mentioned at the beginning of the article, LinkedList can be used as a double-ended queue or a stack, and it provides a series of methods for this purpose. Only a few commonly used ones are listed here; their principles are simple, so they are not analyzed in detail:

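    pop()       // removes and returns the first element
    push(E e)   // adds an element at the head
    poll()      // removes and returns the first element, or null if the list is empty
    offer(E e)  // adds an element at the tail
    peek()      // returns the first element without removing it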
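
The methods listed above can be combined into stack and queue behaviour as follows. This is a usage sketch with invented class and method names.

```java
import java.util.LinkedList;

class QueueAndStackDemo {

    // push and pop both work on the head, so the list behaves as a LIFO stack.
    static String asStack() {
        LinkedList<Integer> stack = new LinkedList<>();
        stack.push(1);
        stack.push(2);
        stack.push(3);
        return "" + stack.pop() + stack.pop() + stack.pop();
    }

    // offer adds at the tail and poll removes from the head: a FIFO queue.
    static String asQueue() {
        LinkedList<Integer> queue = new LinkedList<>();
        queue.offer(1);
        queue.offer(2);
        queue.offer(3);
        Integer head = queue.peek();   // looks at the head without removing it
        return head + ":" + queue.poll() + queue.poll() + queue.poll();
    }

    // On an empty list, poll and peek return null instead of throwing.
    static boolean emptyPollAndPeekReturnNull() {
        LinkedList<Integer> empty = new LinkedList<>();
        return empty.poll() == null && empty.peek() == null;
    }

    public static void main(String[] args) {
        System.out.println(asStack());
        System.out.println(asQueue());
        System.out.println(emptyPollAndPeekReturnNull());
    }
}
```

Unlike poll, pop throws NoSuchElementException when the list is empty, so the two are not interchangeable.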

Summary:

This article introduced the working principle of LinkedList in JDK8 and analyzed its commonly used methods. Underneath, LinkedList is a doubly linked list. Its nodes are not contiguous in memory and are allocated one at a time as elements are added, so, unlike ArrayList, it never reserves unused capacity. Adding and removing at the head or tail is very fast, but lookup by index is time-consuming. LinkedList can also serve as a double-ended queue or a stack. Finally, note that LinkedList is not thread-safe; if thread safety is needed, use the classes provided by the concurrency utilities instead.
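
The article does not name specific thread-safe replacements. As one possible sketch, the example below fills a Collections.synchronizedList wrapper and a ConcurrentLinkedDeque from several threads; the class name, method names, and the thread and element counts are chosen only for illustration.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;

class ThreadSafeAlternativesDemo {

    // Starts the given number of threads, each adding perThread elements,
    // waits for all of them, and returns the final size of the target.
    static int fill(Collection<Integer> target, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    target.add(i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return target.size();
    }

    // A LinkedList wrapped so that every method call is synchronized.
    static int withSynchronizedWrapper() {
        List<Integer> list = Collections.synchronizedList(new LinkedList<Integer>());
        return fill(list, 4, 1000);
    }

    // A concurrent linked deque from java.util.concurrent.
    static int withConcurrentDeque() {
        return fill(new ConcurrentLinkedDeque<Integer>(), 4, 1000);
    }

    public static void main(String[] args) {
        System.out.println(withSynchronizedWrapper());
        System.out.println(withConcurrentDeque());
    }
}
```

Both methods return 4000. Running fill on a plain LinkedList is not safe and may lose elements or corrupt the links.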
