Java TreeMap source code analysis

Following the introduction of HashMap in the previous article, this article introduces another important class in the Map family: TreeMap. You may notice that there are many articles about HashMap online but far fewer about TreeMap. There are reasons for this. On the one hand, HashMap has many more usage scenarios; on the other hand, the data structure behind TreeMap is more complex than HashMap's. Without further ado, let's get to the point.

Signature

 
        public class TreeMap<K,V>
            extends AbstractMap<K,V>
            implements NavigableMap<K,V>, Cloneable, java.io.Serializable

Compared with HashMap, TreeMap additionally implements the NavigableMap interface, and it is this interface that determines the key difference between the two:

The keys of HashMap are unordered, while the keys of TreeMap are ordered.
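To make the difference concrete, here is a small sketch that inserts the same keys into a HashMap and a TreeMap and compares their iteration order. The class name OrderDemo and the sample keys are made up for illustration. Only the TreeMap order is guaranteed; the HashMap order depends on hash codes and table capacity and should not be relied on.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative demo: insert the same keys into two Map implementations
// and compare the order in which their keys are iterated.
final class OrderDemo {
    // Puts each key (with its length as the value) into the map and
    // returns the keys in the map's iteration order.
    static List<String> keysInIterationOrder(Map<String, Integer> map, String... keys) {
        for (String k : keys) {
            map.put(k, k.length());
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        String[] keys = {"pear", "apple", "orange", "banana"};
        // HashMap: order follows the hash buckets, not the keys' natural order.
        System.out.println(keysInIterationOrder(new HashMap<>(), keys));
        // TreeMap: keys come back in ascending natural (String) order.
        System.out.println(keysInIterationOrder(new TreeMap<>(), keys)); // [apple, banana, orange, pear]
    }
}
```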

interface NavigableMap

First, let's look at the signature of NavigableMap:

 
        public interface NavigableMap<K,V> extends SortedMap<K,V>

NavigableMap extends SortedMap, so next let's look at the signature of SortedMap:

interface SortedMap

 
        public interface SortedMap<K,V> extends Map<K,V>

As its name suggests, SortedMap is a Map whose keys are kept in order. The order is either the natural ordering of the keys, provided by the Comparable interface, or an ordering determined by a Comparator supplied when the SortedMap instance is created. The order of the keys shows up when we iterate over a SortedMap through its collection views (as with HashMap, these are provided by the entrySet, keySet and values methods). As an aside, here is the difference between Comparable and Comparator (refer to here):

Comparable generally represents the natural ordering of a class. For example, if we define a Student class, ordering by student number would be the default.
Comparator generally represents a custom ordering that a class needs in a particular situation. For example, we may want to sort Student objects by age instead.
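A minimal sketch of the two approaches, assuming a hypothetical Student class with id (student number), name and age fields: compareTo gives the natural order by student number, while a separate Comparator orders by age, breaking ties by id so that two different students never compare as equal. equals and hashCode are defined on id so that the natural ordering stays consistent with equals.

```java
import java.util.Comparator;
import java.util.TreeMap;

final class StudentOrderDemo {
    // Hypothetical Student class: the natural ordering (Comparable) is by student number.
    static final class Student implements Comparable<Student> {
        final int id; // student number
        final String name;
        final int age;

        Student(int id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }

        @Override
        public int compareTo(Student other) {
            return Integer.compare(id, other.id);
        }

        // equals/hashCode use the same field as compareTo, so the natural
        // ordering is consistent with equals.
        @Override
        public boolean equals(Object o) {
            return o instanceof Student && ((Student) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }

        @Override
        public String toString() {
            return name;
        }
    }

    // Situation-specific ordering: by age, ties broken by student number.
    static final Comparator<Student> BY_AGE =
            Comparator.comparingInt((Student s) -> s.age).thenComparingInt(s -> s.id);

    public static void main(String[] args) {
        Student[] students = {
            new Student(3, "Alice", 20),
            new Student(1, "Bob", 22),
            new Student(2, "Carol", 19),
        };
        TreeMap<Student, Integer> byId = new TreeMap<>();        // uses compareTo
        TreeMap<Student, Integer> byAge = new TreeMap<>(BY_AGE); // uses the Comparator
        for (Student s : students) {
            byId.put(s, s.age);
            byAge.put(s, s.age);
        }
        System.out.println(byId.keySet());  // [Bob, Carol, Alice]
        System.out.println(byAge.keySet()); // [Carol, Alice, Bob]
    }
}
```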

The key class of a SortedMap must implement the Comparable interface (or a Comparator must be supplied) so that the map knows how to compare two keys, via k1.compareTo(k2) or comparator.compare(k1, k2); otherwise a ClassCastException is thrown on insertion. Because a SortedMap compares keys only through compareTo (or compare) rather than equals, the ordering of its keys should be consistent with equals: whenever k1.compareTo(k2) (or comparator.compare(k1, k2)) returns 0, k1.equals(k2) should also be true. With SortedMap covered, let's return to NavigableMap. NavigableMap was added in JDK 1.6; on top of SortedMap it adds "navigation methods" that return the entry closest to a search target, for example:

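lowerEntry: returns the entry with the greatest key strictly less than the given key (or null if there is none)
floorEntry: returns the entry with the greatest key less than or equal to the given key (or null if there is none)
ceilingEntry: returns the entry with the least key greater than or equal to the given key (or null if there is none)
higherEntry: returns the entry with the least key strictly greater than the given key (or null if there is none)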
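The following sketch calls the four navigation methods on a TreeMap<Integer, String> with keys 10, 20 and 30; the class name NavigationDemo and the sample data are illustrative. Note that each method returns a single Map.Entry (or null), not a collection; if you need all entries below a key, the range view headMap is the tool for that.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative demo of NavigableMap's navigation methods on a small TreeMap.
final class NavigationDemo {
    static TreeMap<Integer, String> sample() {
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(10, "ten");
        map.put(20, "twenty");
        map.put(30, "thirty");
        return map;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> map = sample();
        // Each navigation method returns ONE entry (or null), never a collection.
        System.out.println(map.lowerEntry(20));   // 10=ten     (greatest key <  20)
        System.out.println(map.floorEntry(20));   // 20=twenty  (greatest key <= 20)
        System.out.println(map.ceilingEntry(25)); // 30=thirty  (least key >= 25)
        System.out.println(map.higherEntry(30));  // null       (no key > 30)

        // To get ALL entries below a key, use a range view instead.
        SortedMap<Integer, String> below30 = map.headMap(30);
        System.out.println(below30);              // {10=ten, 20=twenty}
    }
}
```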

Design concept

Red–black tree

TreeMap is built on a red-black tree, which is a kind of binary search tree. Let's first recall some properties of binary search trees.

Binary search tree

Let's look at what a binary search tree (BST) looks like:

[Figure: a binary search tree]

This picture is probably familiar to everyone. The key property is:

For every node, all values in its left subtree are less than the node's value, and all values in its right subtree are greater than it.
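That ordering property is what makes lookup cheap: a search follows a single path from the root downward. The sketch below uses a hypothetical int-keyed BstSearch.Node class; its loop loosely mirrors the lookup loop in the JDK's TreeMap.getEntry (which compares keys with compareTo or the map's Comparator), but it is an illustration, not the JDK source.

```java
// Minimal BST search on a hypothetical int-keyed node class.
final class BstSearch {
    static final class Node {
        final int key;
        Node left, right;

        Node(int key) {
            this.key = key;
        }
    }

    // Iterative search: at each node one comparison decides whether to stop
    // or to continue into the left or right subtree.
    static Node find(Node root, int key) {
        Node p = root;
        while (p != null) {
            int cmp = Integer.compare(key, p.key);
            if (cmp < 0) {
                p = p.left;   // smaller keys live in the left subtree
            } else if (cmp > 0) {
                p = p.right;  // larger keys live in the right subtree
            } else {
                return p;     // found
            }
        }
        return null;          // fell off the tree: key absent
    }

    // Builds the small tree
    //         8
    //       /   \
    //      3     10
    //     / \      \
    //    1   6      14
    static Node sampleTree() {
        Node root = new Node(8);
        root.left = new Node(3);
        root.right = new Node(10);
        root.left.left = new Node(1);
        root.left.right = new Node(6);
        root.right.right = new Node(14);
        return root;
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        System.out.println(find(root, 6) != null); // true
        System.out.println(find(root, 7) != null); // false
    }
}
```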

The advantage of a binary search tree is that each comparison can cut the problem size in half, so if the tree is balanced, finding an element takes O(log n) time, which is proportional to the height of the tree. This raises a more interesting question: if a binary search tree halves the problem with each step, a ternary search tree would shrink it to one third, which seems even better. By the same logic we could have 4-ary search trees, 5-ary search trees, and so on. More generally:

With n elements, for which k does a k-ary search tree give the best efficiency? Is it k = 2?

[Figure: a k-ary search tree]

If you follow my reasoning above, you may fall into a trap:

For a ternary search tree to shrink the problem to one third, each node needs two comparisons, whereas a binary search tree needs only one comparison to halve the problem.

We can't ignore that extra comparison. In the general case:

With n elements, a k-ary search tree has height about $\log_k n$ and needs up to about k comparisons per level, so a search costs on the order of $k \log_k n$ comparisons.

In the extreme case k = n, the k-ary tree degenerates into a linear list and the complexity becomes O(n), the same as a linear scan. Viewed mathematically, the question is equivalent to:

For fixed n, which value of k minimizes $k \log_k n$?

By the change-of-base rule, $k \log_k n = \ln n \cdot \frac{k}{\ln k}$. Since $\ln n$ is a constant, this amounts to minimizing $\frac{k}{\ln k}$. This is an easy exercise for anyone who has taken first-year calculus, so let's go straight to the result:

$\frac{k}{\ln k}$ attains its minimum at $k = e$.
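For completeness, the calculus step can be written out ($f$ is just a name for the function being minimized):

```latex
\begin{aligned}
f(k)  &= \frac{k}{\ln k}, \qquad k > 1 \\
f'(k) &= \frac{\ln k \cdot 1 - k \cdot \frac{1}{k}}{(\ln k)^2} = \frac{\ln k - 1}{(\ln k)^2} \\
f'(k) &= 0 \iff \ln k = 1 \iff k = e \\
f'(k) &< 0 \ \text{for}\ 1 < k < e, \qquad f'(k) > 0 \ \text{for}\ k > e \\
f(e)  &= \frac{e}{1} = e \approx 2.718
\end{aligned}
```

So $f$ decreases up to $e$ and increases afterwards, which leaves $f(2) \approx 2.885$ and $f(3) \approx 2.731$ as the integer candidates.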

The natural constant e is about 2.718, so among integers the best choice should be 2 or 3, which suggests a binary tree is close to optimal. Let's check in the Node.js REPL:

 
        > function foo(k) { return k / Math.log(k); }
        > foo(2)
        2.8853900817779268
        > foo(3)
        2.730717679880512
        > foo(4)
        2.8853900817779268
        > foo(5)
        3.1066746727980594

The value at k = 3 is smaller than at k = 2, which suggests that a ternary search tree should outperform a binary one. So why are binary trees more popular? I later found an answer on Stack Overflow; the gist is:

Modern CPUs are optimized for binary (two-way) branching logic; three-way logic has to be decomposed into several binary decisions.

This explains why binary trees are so popular: a single two-way comparison can shrink the problem by at most half, and a binary tree achieves exactly that. That was a bit of a detour; let's get back to the red-black tree.

Red-black tree properties

First, let's look at a red-black tree:

[Figure: red-black tree example]

The figure above is taken from Wikipedia. Note that:

The leaf nodes are the NIL nodes in the figure. Some textbooks do not include these NIL nodes, and we sometimes omit them when drawing, but be clear that when we talk about leaf nodes, we mean these NIL nodes.

A red-black tree keeps itself (approximately) balanced by enforcing the following five rules:

Every node is either red or black.
The root node is black.
Every leaf (NIL) node is black.
Both children of a red node are black.
Every path from a given node down to any of its descendant leaves contains the same number of black nodes.
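As an illustration of what the rules mean in code, here is a sketch of a validator for a hypothetical RbCheck.Node class (key, color, left, right), where null children stand in for the NIL leaves. The RED = false / BLACK = true encoding matches the constants used in the JDK's TreeMap, but everything else is invented for this example, and the BST ordering of keys is not checked.

```java
// Hypothetical red-black node plus a checker for the five rules.
// null children play the role of the black NIL leaves.
final class RbCheck {
    // Same boolean encoding as the constants in the JDK's TreeMap.
    static final boolean RED = false;
    static final boolean BLACK = true;

    static final class Node {
        final int key;
        final boolean color; // rule 1: a boolean can only be RED or BLACK
        Node left, right;

        Node(int key, boolean color) {
            this.key = key;
            this.color = color;
        }
    }

    static boolean isRed(Node n) {
        return n != null && n.color == RED;
    }

    // Returns the number of black nodes on every path from n down to a NIL leaf
    // (counting the leaf itself), or -1 if rule 4 or rule 5 fails in this subtree.
    static int blackHeight(Node n) {
        if (n == null) {
            return 1; // rule 3: NIL leaves are black
        }
        if (isRed(n) && (isRed(n.left) || isRed(n.right))) {
            return -1; // rule 4: a red node must have black children
        }
        int lh = blackHeight(n.left);
        int rh = blackHeight(n.right);
        if (lh < 0 || rh < 0 || lh != rh) {
            return -1; // rule 5: same black count on every downward path
        }
        return lh + (n.color == BLACK ? 1 : 0);
    }

    // Checks rule 2 (black root) plus rules 3-5 via blackHeight.
    // Note: the BST ordering of keys is NOT checked here.
    static boolean isValid(Node root) {
        return (root == null || root.color == BLACK) && blackHeight(root) > 0;
    }

    //        2(B)
    //       /    \
    //     1(B)   4(R)
    //           /    \
    //         3(B)   5(B)
    static Node validSample() {
        Node root = new Node(2, BLACK);
        root.left = new Node(1, BLACK);
        root.right = new Node(4, RED);
        root.right.left = new Node(3, BLACK);
        root.right.right = new Node(5, BLACK);
        return root;
    }

    public static void main(String[] args) {
        Node root = validSample();
        System.out.println(isValid(root)); // true
        root.right.right = null;           // under 4(R), paths now hold 2 and 1 black nodes
        System.out.println(isValid(root)); // false
    }
}
```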

Once these five rules hold, it is guaranteed that the longest path from the root to a leaf is no more than twice as long as the shortest path from the root to a leaf. This is easy to see; it mainly relies on properties 4 and 5. Here is a brief argument:

Suppose the shortest path from the root to a leaf contains B black nodes. By property 5, the longest root-to-leaf path also contains exactly B black nodes. By property 4, no two red nodes can be adjacent, so the longest possible path alternates red and black, with a red node between every two consecutive black nodes; since both ends of the path (the root and the NIL leaf) are black, it contains at most B-1 red nodes. The longest path therefore has at most 2B-1 nodes, fewer than twice the length of the shortest path (which has at least B nodes). This proves the claim.
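The same counting argument as a short inequality chain, where B is the number of black nodes on the shortest root-to-leaf path (NIL leaf included), followed by a concrete case with B = 3:

```latex
\begin{aligned}
\text{shortest} &\ge B \\
\text{longest}  &\le \underbrace{B}_{\text{black}} + \underbrace{(B-1)}_{\text{red}} = 2B - 1 < 2B \le 2\cdot\text{shortest} \\
B = 3:&\quad \text{black, black, black}\ (3\ \text{nodes}) \quad\text{vs.}\quad \text{black, red, black, red, black}\ (5 = 2\cdot 3 - 1\ \text{nodes})
\end{aligned}
```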

Red-black tree operations

[Figure: red-black tree rotation example (NIL nodes are not drawn)]

For the insertion, deletion, left rotation and right rotation operations of a red-black tree, I think visualization works best; describing them in text is rather cumbersome (for a thorough text treatment, see the article "You know the red-black tree thoroughly"). I recommend an SWF tutorial video of about 7 minutes; it is in English, but don't worry, the key is to follow the pictures. There is also an interactive red-black tree visualization web page where you can insert and delete a few nodes yourself and watch how left and right rotations work.
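Since rotations are the structural operation that the insertion and deletion fix-ups rely on, here is a sketch of left and right rotation on a hypothetical parent-linked Rotations.Node class. The pointer updates follow the standard textbook (CLR) rotation, which is also the shape of the JDK's TreeMap.rotateLeft and rotateRight, but this is a standalone illustration rather than the JDK source; unlike the JDK version, it simply returns if the required child is missing. A rotation changes the shape of the tree without changing the in-order (sorted) sequence of keys.

```java
// Left/right rotation on a hypothetical parent-linked node class (illustrative sketch).
final class Rotations {
    static final class Node {
        final int key;
        Node left, right, parent;

        Node(int key) {
            this.key = key;
        }
    }

    Node root;

    //     p                r
    //    / \              / \
    //   a   r     =>     p   c
    //      / \          / \
    //     b   c        a   b
    void rotateLeft(Node p) {
        if (p == null || p.right == null) return;
        Node r = p.right;
        p.right = r.left;                      // subtree b moves under p
        if (r.left != null) r.left.parent = p;
        r.parent = p.parent;                   // r takes p's place
        if (p.parent == null) root = r;
        else if (p.parent.left == p) p.parent.left = r;
        else p.parent.right = r;
        r.left = p;                            // p becomes r's left child
        p.parent = r;
    }

    // Mirror image of rotateLeft.
    void rotateRight(Node p) {
        if (p == null || p.left == null) return;
        Node l = p.left;
        p.left = l.right;
        if (l.right != null) l.right.parent = p;
        l.parent = p.parent;
        if (p.parent == null) root = l;
        else if (p.parent.right == p) p.parent.right = l;
        else p.parent.left = l;
        l.right = p;
        p.parent = l;
    }

    // Links child under parent on the given side; helper for building sample trees.
    static Node attach(Node parent, Node child, boolean leftSide) {
        if (leftSide) parent.left = child; else parent.right = child;
        child.parent = parent;
        return child;
    }

    public static void main(String[] args) {
        Rotations t = new Rotations();
        Node p = new Node(2);
        t.root = p;
        attach(p, new Node(1), true);
        Node r = attach(p, new Node(4), false);
        attach(r, new Node(3), true);
        attach(r, new Node(5), false);

        t.rotateLeft(p);
        System.out.println(t.root.key);            // 4
        System.out.println(t.root.left.key);       // 2
        System.out.println(t.root.left.right.key); // 3

        t.rotateRight(t.root);                     // undoes the left rotation
        System.out.println(t.root.key);            // 2
    }
}
```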

Source code analysis

Since I won't go through the red-black tree operations here, there is not much source code left to discuss: the important algorithms are all marked "From CLR" in the source. CLR refers to Cormen, Leiserson, and Rivest, the authors of Introduction to Algorithms; in other words, the algorithms in TreeMap are adapted from the pseudocode in Introduction to Algorithms. Because a red-black tree is a balanced binary search tree, put (including updates of existing keys), get, and remove all run in O(log n) time.
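To see where the O(log n) cost of put comes from, including the update case, here is a sketch of the search-then-attach phase of a TreeMap-style put on a hypothetical SimpleTreeMap class. It walks one root-to-leaf path, replaces the value if an equal key is found, and otherwise attaches a new node. The red-black rebalancing that TreeMap performs after attaching a node (fixAfterInsertion) is deliberately left out, so this tree by itself is not balanced.

```java
import java.util.Comparator;

// Hypothetical, UNBALANCED sketch of the search-then-attach phase of a
// TreeMap-style put. The red-black fix-up after attaching is omitted.
final class SimpleTreeMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> left, right, parent;

        Entry(K key, V value, Entry<K, V> parent) {
            this.key = key;
            this.value = value;
            this.parent = parent;
        }
    }

    private final Comparator<? super K> comparator;
    private Entry<K, V> root;
    private int size;

    SimpleTreeMap(Comparator<? super K> comparator) {
        this.comparator = comparator;
    }

    // Returns the previous value if key was present (an update), else null.
    V put(K key, V value) {
        if (root == null) {
            root = new Entry<>(key, value, null);
            size = 1;
            return null;
        }
        Entry<K, V> t = root;
        Entry<K, V> parent;
        int cmp;
        do { // walk a single root-to-leaf path: O(height) comparisons
            parent = t;
            cmp = comparator.compare(key, t.key);
            if (cmp < 0) {
                t = t.left;
            } else if (cmp > 0) {
                t = t.right;
            } else {
                V old = t.value; // equal key found: update in place
                t.value = value;
                return old;
            }
        } while (t != null);
        Entry<K, V> e = new Entry<>(key, value, parent);
        if (cmp < 0) {
            parent.left = e;
        } else {
            parent.right = e;
        }
        size++;
        // A red-black tree would recolor/rotate here to restore the five rules.
        return null;
    }

    V get(K key) {
        Entry<K, V> p = root;
        while (p != null) {
            int cmp = comparator.compare(key, p.key);
            if (cmp < 0) p = p.left;
            else if (cmp > 0) p = p.right;
            else return p.value;
        }
        return null;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        SimpleTreeMap<String, Integer> m = new SimpleTreeMap<>(Comparator.<String>naturalOrder());
        m.put("b", 1);
        m.put("a", 2);
        System.out.println(m.put("b", 3)); // 1 (update returns the old value)
        System.out.println(m.get("b"));    // 3
        System.out.println(m.size());      // 2
    }
}
```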

Summary

We have now covered the implementations of both TreeMap and HashMap. Their implementations differ, and that determines where each is appropriate:

TreeMap keys are ordered, and insertion, deletion, update and lookup all take O(log n) time. To keep the red-black tree balanced, rotations are performed when necessary.
HashMap keys are unordered, and insertion, deletion, update and lookup take O(1) time on average. To support dynamic growth, the table is resized when necessary.

http://www.importnew.com/16679.html