In-depth understanding of the underlying principles of HashMap (Part 1)

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time. This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it is very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
- From "Baidu Encyclopedia"
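The claim about nulls is easy to check. This is a minimal sketch; the class and method names are made up for this example. HashMap accepts a null key and null values, while Hashtable throws NullPointerException for either.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Sketch: HashMap permits a null key and null values; Hashtable rejects both.
class NullKeyDemo {

    // Stores a null key and a null value in a HashMap, then reads the null key back.
    static String hashMapNullKey() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for null key");
        map.put("k", null);                  // null values are allowed as well
        return map.get(null);
    }

    // Returns true if Hashtable throws NullPointerException for a null key.
    static boolean hashtableRejectsNullKey() {
        Map<String, String> table = new Hashtable<>();
        try {
            table.put(null, "v");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Returns true if Hashtable throws NullPointerException for a null value.
    static boolean hashtableRejectsNullValue() {
        Map<String, String> table = new Hashtable<>();
        try {
            table.put("k", null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap get(null): " + hashMapNullKey());
        System.out.println("Hashtable rejects null key: " + hashtableRejectsNullKey());
        System.out.println("Hashtable rejects null value: " + hashtableRejectsNullValue());
    }
}
```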

Foreword

Small steps add up to big strides; I believe that accumulated knowledge eventually leads to a qualitative leap.
I am writing this article to record my own learning. If more experienced readers find anything wrong, I hope you will point it out promptly. Thank you!

In this part

This chapter records the purpose of some of the attributes defined in HashMap, to give a first impression of the concepts and deepen our understanding of them.
Let's first look at a few key variables defined in HashMap:

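/**
*	Default initial capacity; it must be a power of two. If no capacity is specified
*	when the map is created, this value is used.
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
/**
*	Maximum capacity, i.e. the table size must be <= MAXIMUM_CAPACITY.
*/
static final int MAXIMUM_CAPACITY = 1 << 30;
/**
*	Default load factor: it determines how full the map may get before it grows. The value
*	is not arbitrary; it is a trade-off between time and space. If no load factor is specified
*	when the map is created, the constructor assigns this one (this.loadFactor = DEFAULT_LOAD_FACTOR).
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;
/**
*	Threshold: when the number of key-value mappings exceeds it, the map is resized.
*	threshold = capacity * load factor
*/
int threshold;
/**
*	Load factor of this map instance
*/
final float loadFactor;
/**
* The number of key-value mappings in the map
*/
transient int size;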
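As a worked example of how these fields relate, the sketch below mirrors the constants in a hypothetical class (it is not the JDK class) and computes threshold = capacity * loadFactor. The JDK's resize() sometimes simply doubles the old threshold instead of multiplying, but for power-of-two capacities of 4 or more with the default 0.75 both give the same numbers.

```java
// Hypothetical mirror of the HashMap constants above; not the JDK class itself.
class HashMapConstants {
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;   // 16
    static final int MAXIMUM_CAPACITY = 1 << 30;          // 1073741824
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // threshold = capacity * loadFactor, truncated to an int
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println("default capacity:  " + DEFAULT_INITIAL_CAPACITY);
        System.out.println("maximum capacity:  " + MAXIMUM_CAPACITY);
        // 16 * 0.75 = 12: a default map resizes when the 13th mapping is added
        System.out.println("default threshold: "
                + threshold(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR));
    }
}
```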
First, the initial capacity

In practice, we are encouraged to specify an initial capacity when instantiating a HashMap; the Alibaba Java Development Manual states this explicitly in its section on collections. So why should we specify one?
Let's create four maps: one with the default capacity, one with an initial capacity of 20,000, one with 16, and one with 10,000.

import java.util.HashMap;
import java.util.Map;

public class InitialCapacityDemo {

    // Times 10,000,000 put() calls on maps created with different initial capacities.
    // A single run without JVM warm-up only gives a rough indication.
    public static void compareInitial() {
        Map<Integer, Integer> map1 = new HashMap<>();        // default capacity (16)
        long start1 = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            map1.put(i, i);
        }
        long end1 = System.currentTimeMillis();
        System.out.println("map1 capacity->default time(ms):" + (end1 - start1));

        Map<Integer, Integer> map2 = new HashMap<>(20000);
        long start2 = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            map2.put(i, i);
        }
        long end2 = System.currentTimeMillis();
        System.out.println("map2 capacity->20000 time(ms):" + (end2 - start2));

        Map<Integer, Integer> map3 = new HashMap<>(16);      // same configuration as map1
        long start3 = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            map3.put(i, i);
        }
        long end3 = System.currentTimeMillis();
        System.out.println("map3 capacity->16 time(ms):" + (end3 - start3));

        Map<Integer, Integer> map4 = new HashMap<>(10000);
        long start4 = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            map4.put(i, i);
        }
        long end4 = System.currentTimeMillis();
        System.out.println("map4 capacity->10000 time(ms):" + (end4 - start4));
    }

    public static void main(String[] args) {
        compareInitial();
    }
}

Result:

map1 capacity->default time(ms):8049
map2 capacity->20000 time(ms):4812
map3 capacity->16 time(ms):1388
map4 capacity->10000 time(ms):4685

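The results suggest that choosing a reasonable initial capacity can noticeably improve performance. The reason is the threshold: once the number of key-value pairs in the map exceeds it, the map is resized. The threshold is computed as threshold = capacity * load factor, and every resize rebuilds the HashMap's hash table, redistributing all existing entries into a new, larger array. Treat these particular numbers with caution, though: map1 (new HashMap<>()) and map3 (new HashMap<>(16)) are configured identically, so the gap between them most likely comes from run-order effects such as JIT warm-up and garbage collection rather than from capacity. Also, 10,000 and 20,000 are far below 10,000,000 entries, so every map in this test still resizes many times.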
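To see how many resizes each map in the benchmark actually performs, here is a hypothetical model (not JDK code; class and method names are made up). It assumes the default load factor 0.75, that the requested capacity is rounded up to a power of two, and that the table doubles whenever size exceeds threshold after an insert.

```java
// Hypothetical model of HashMap growth; not JDK code. Assumes load factor 0.75.
class ResizeCounter {

    // Smallest power of two >= requested (what tableSizeFor computes for these inputs).
    static int roundUpToPowerOfTwo(int requested) {
        int capacity = 1;
        while (capacity < requested) {
            capacity <<= 1;
        }
        return capacity;
    }

    // Counts how often the table doubles while `entries` distinct keys are inserted.
    // A resize happens when size > threshold; each resize doubles the capacity.
    static int countDoublings(int initialCapacity, int entries) {
        int capacity = roundUpToPowerOfTwo(initialCapacity);
        int threshold = (int) (capacity * 0.75f);
        int doublings = 0;
        while (entries > threshold) {
            capacity <<= 1;
            threshold = (int) (capacity * 0.75f);
            doublings++;
        }
        return doublings;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        for (int initial : new int[] {16, 10_000, 20_000, 13_333_334}) {
            System.out.println("initial " + initial + " -> doublings: " + countDoublings(initial, n));
        }
    }
}
```

Under these assumptions the default map doubles 20 times, the 10,000 map 10 times and the 20,000 map 9 times, so presizing to 10,000 or 20,000 only skips early resizes that move comparatively few entries. Sizing for the expected count (for example expected / 0.75 + 1, here 13,333,334, which rounds up to 2^24 buckets) avoids resizing entirely; on Java 19 and later, HashMap.newHashMap(int numMappings) performs a similar calculation.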

Second, size and capacity

At first glance, size and capacity seem to describe the same thing, but they mean different things:
size: the number of key-value mappings stored in the map
capacity: the number of buckets, i.e. the length of the map's internal table
See the demo:

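import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public class SizeCapacityDemo {

    // On JDK 16 and later, setAccessible on java.util classes fails unless the JVM
    // is started with --add-opens java.base/java.util=ALL-UNNAMED
    public static void sizeCompareCapacity() throws Exception {
        Map<String, String> map = new HashMap<>();
        map.put("name", "Zhang San");
        Map<String, String> map1 = new HashMap<>(5);
        map1.put("name", "Zhang San 1");
        Map<String, String> map2 = new HashMap<>(16);
        map2.put("name", "Zhang San 2");
        // Use reflection to call HashMap's package-private capacity() method
        Class<?> capaci = map.getClass();
        Method method = capaci.getDeclaredMethod("capacity");
        method.setAccessible(true);

        System.out.println("map capacity: " + method.invoke(map) + "------> map size:" + map.size());
        System.out.println("map1 capacity: " + method.invoke(map1) + "------> map1 size:" + map1.size());
        System.out.println("map2 capacity: " + method.invoke(map2) + "------> map2 size:" + map2.size());
    }

    public static void main(String[] args) throws Exception {
        sizeCompareCapacity();
    }
}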

Output:

map capacity: 16------> map size:1
map1 capacity: 8------> map1 size:1
map2 capacity: 16------> map2 size:1

size is 1 for every map, and capacity follows the capacity we passed in.
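The demo reads capacity through HashMap's package-private capacity() method. In JDK 8 that method returns table.length once the bucket array exists; before that, it falls back to threshold (which temporarily holds the rounded initial capacity) or to 16. The class below is a hypothetical model of that decision order, not the JDK class, with names chosen for this example.

```java
// Hypothetical model of the decision order in JDK 8's HashMap.capacity(); not the JDK class.
class CapacityModel {
    static final int DEFAULT_INITIAL_CAPACITY = 16;

    // table:     the bucket array, or null if no put has happened yet
    // threshold: before the table exists, holds the rounded initial capacity
    //            (0 when the no-argument constructor was used)
    static int capacity(Object[] table, int threshold) {
        if (table != null) {
            return table.length;              // buckets actually allocated
        }
        return (threshold > 0) ? threshold : DEFAULT_INITIAL_CAPACITY;
    }

    public static void main(String[] args) {
        System.out.println(capacity(null, 0));             // new HashMap<>() before any put: 16
        System.out.println(capacity(null, 8));             // new HashMap<>(5) before any put: 8
        System.out.println(capacity(new Object[8], 6));    // new HashMap<>(5) after the first put: 8
    }
}
```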
Careful readers may notice that I created map1 with a capacity of 5, so why is its capacity 8?
Let's look at the source first:

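// No-argument constructor
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
// Constructor with an initial capacity
public HashMap(int initialCapacity) {
    // This is the constructor used in the demo above; it delegates to the one below
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
// Constructor with an initial capacity and a load factor
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)  // the initial capacity must not be negative, otherwise throw
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    // cap the capacity at the maximum
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // the load factor must be positive and a real number;
    // Float.isNaN() checks for NaN ("not a number"), e.g. the result of 0f / 0f
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    // store the load factor
    this.loadFactor = loadFactor;
    // the table is allocated lazily on the first put; until then threshold
    // holds the initial capacity, rounded up to a power of two
    this.threshold = tableSizeFor(initialCapacity);
}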
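The argument checks at the start of HashMap(int, float) can be observed directly. In this sketch, rejects is a hypothetical helper that reports whether the constructor throws IllegalArgumentException for a given pair of arguments.

```java
import java.util.HashMap;

// Sketch: exercises the argument checks in HashMap(int initialCapacity, float loadFactor).
class ConstructorChecks {

    // Returns true if the constructor throws IllegalArgumentException for these arguments.
    static boolean rejects(int initialCapacity, float loadFactor) {
        try {
            new HashMap<String, String>(initialCapacity, loadFactor);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("capacity -1:      " + rejects(-1, 0.75f));      // negative capacity
        System.out.println("loadFactor 0:     " + rejects(16, 0f));         // loadFactor <= 0
        System.out.println("loadFactor NaN:   " + rejects(16, Float.NaN));  // not a number
        System.out.println("capacity 0:       " + rejects(0, 0.75f));       // allowed
        System.out.println("capacity too big: " + rejects(Integer.MAX_VALUE, 0.75f)); // clamped, allowed
    }
}
```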

Above are three of HashMap's overloaded constructors (there is also a copy constructor, HashMap(Map<? extends K, ? extends V> m)), with my rough comments added to the source. Now let's look at the method called on the
last line, this.threshold = tableSizeFor(initialCapacity).
Here is its source:

static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }
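tableSizeFor is a good example of the optimizations Java 8 made over Java 7.
|= works like +=, so n |= n >>> 1 is equivalent to n = n | (n >>> 1).
>>> is the unsigned right shift operator. Unlike >>, it always fills the vacated high-order bits with 0, whether the number is positive or negative.
| is the bitwise OR operator: a result bit is 0 only when both input bits are 0; otherwise it is 1.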
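A quick way to see these operators in action; the class and method names are made up for this sketch. orShift performs one "smearing" step of tableSizeFor.

```java
// Sketch of the bit operators used by tableSizeFor: >>, >>> and |=.
class BitOps {

    // One step from tableSizeFor: n |= n >>> shift, i.e. n = n | (n >>> shift).
    static int orShift(int n, int shift) {
        n |= n >>> shift;
        return n;
    }

    public static void main(String[] args) {
        int negative = -8;
        System.out.println(negative >> 1);    // -4: >> copies the sign bit into the high bits
        System.out.println(negative >>> 1);   // 2147483644: >>> always fills with 0
        System.out.println(Integer.toBinaryString(orShift(4, 1)));  // 0100 | 0010 -> 110
        System.out.println(Integer.toBinaryString(orShift(6, 2)));  // 0110 | 0001 -> 111
    }
}
```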
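Let's walk through it with the value from our demo:
We pass in cap = 5, so n = cap - 1 = 4.
Step 1: n |= n >>> 1
n          0100
n >>> 1    0010
n =        0110
Step 2: n |= n >>> 2
n          0110
n >>> 2    0001
n =        0111
Step 3: n |= n >>> 4
n          0111
n >>> 4    0000
n =        0111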
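After this, the remaining shifts (>>> 8 and >>> 16) also yield 0, so n stays 0111, which is 2^0 + 2^1 + 2^2 = 7 in decimal. Together, the shifts by 1, 2, 4, 8 and 16 cover all 32 bits of an int.
The purpose of these operations is to turn every bit below the highest 1 bit of n into a 1, e.g. 0100 -> 0111.
The return statement then adds 1: 0111 + 1 = 1000, which is 8 in decimal.
Why subtract 1 from cap first? To handle the case where cap is already a power of 2.
Using the same procedure, suppose we pass in cap = 4. Without subtracting 1, the steps above turn 0100 into 0111 = 7,
and return gives 8, twice the capacity we asked for. With the subtraction, n = 3 (0011) stays 0011, and return gives 4.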

In conclusion, when the capacity we pass in is not a power of 2, tableSizeFor finds the smallest power of 2 that is greater than or equal to it (a value that is already a power of 2 is kept as is).
This also explains why passing in 5 gives a capacity of 8.
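To check the reasoning above, the following sketch copies the JDK 8 tableSizeFor into a standalone class, adds a hypothetical variant that skips the cap - 1 step, and compares the real method against a brute-force loop for every cap from 1 to 2^20. Class and helper names other than tableSizeFor are made up for this example.

```java
// Sketch: JDK 8's tableSizeFor (copied), a hypothetical variant without "cap - 1",
// and a brute-force cross-check.
class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of JDK 8 HashMap.tableSizeFor: smallest power of two >= cap (at least 1).
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // Hypothetical variant without "cap - 1": doubles the result when cap is already a power of two.
    static int withoutMinusOne(int cap) {
        int n = cap;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // Brute-force reference: keep doubling 1 until it reaches cap.
    static int bruteForce(int cap) {
        int p = 1;
        while (p < cap) {
            p <<= 1;
        }
        return p;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(5));     // 8
        System.out.println(tableSizeFor(4));     // 4
        System.out.println(withoutMinusOne(4));  // 8
        for (int cap = 1; cap <= (1 << 20); cap++) {
            if (tableSizeFor(cap) != bruteForce(cap)) {
                throw new AssertionError("mismatch at " + cap);
            }
        }
        System.out.println("tableSizeFor matches brute force for 1..2^20");
    }
}
```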

Third, the resize mechanism: threshold and loadFactor

Let's look at the output of the following demo:

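import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {

    // Reads HashMap internals via reflection; on JDK 16 and later run with
    // --add-opens java.base/java.util=ALL-UNNAMED
    public static void loadAndThres() throws Exception {
        Map<Integer, Integer> map = new HashMap<>(4);
        for (int i = 0; i < 3; i++) {
            map.put(i, i);
        }

        Class<?> cls = map.getClass();
        Method capacity = cls.getDeclaredMethod("capacity");   // package-private method
        capacity.setAccessible(true);
        System.out.println("capacity:" + capacity.invoke(map));

        System.out.println("size:" + map.size());

        Field threshold = cls.getDeclaredField("threshold");   // package-private field
        threshold.setAccessible(true);
        System.out.println("threshold:" + threshold.get(map));

        Field loadFactor = cls.getDeclaredField("loadFactor");
        loadFactor.setAccessible(true);
        System.out.println("loadFactor:" + loadFactor.get(map));
        System.out.println("---------------------------------------");
        map.put(4, 4);   // 4th entry: size 4 > threshold 3, so the map resizes
        System.out.println("capacity:" + capacity.invoke(map));
        System.out.println("size:" + map.size());
        System.out.println("threshold:" + threshold.get(map));
        System.out.println("loadFactor:" + loadFactor.get(map));
    }

    public static void main(String[] args) throws Exception {
        loadAndThres();
    }
}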
capacity:4
size:3
threshold:3
loadFactor:0.75
---------------------------------------
capacity:8
size:4
threshold:6
loadFactor:0.75

From the results we can see:
1. threshold: the basis for deciding whether the map is full enough to grow. When the number of entries (size) exceeds threshold after a put, the map resizes.
threshold = capacity * loadFactor (here 4 * 0.75 = 3 before the resize, and 8 * 0.75 = 6 after it).
2. capacity: each resize doubles the capacity, so it is always a power of 2.
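The same rule can be replayed without reflection. This hypothetical model (not JDK code; names made up) assumes that each put inserts a new key, that the capacity doubles whenever size exceeds threshold after an insert, and that threshold = capacity * 0.75. It starts from the state right after the table is first allocated; for new HashMap<>(4) it reproduces the capacity and threshold values printed above.

```java
// Hypothetical model of HashMap growth for one map; not JDK code.
class GrowthTrace {
    static final float LOAD_FACTOR = 0.75f;

    int capacity;
    int threshold;
    int size;

    // Capacity is the requested value rounded up to a power of two.
    GrowthTrace(int initialCapacity) {
        capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        threshold = (int) (capacity * LOAD_FACTOR);
    }

    // Models inserting one new key: grow when size exceeds threshold.
    void putNewKey() {
        size++;
        if (size > threshold) {
            capacity <<= 1;
            threshold = (int) (capacity * LOAD_FACTOR);
        }
    }

    public static void main(String[] args) {
        GrowthTrace m = new GrowthTrace(4);
        for (int i = 0; i < 4; i++) {
            m.putNewKey();
            System.out.println("size:" + m.size + " capacity:" + m.capacity + " threshold:" + m.threshold);
        }
    }
}
```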

If you spot any problems, I hope you will point them out. Thank you!

Next: In-depth understanding of the underlying principles of HashMap (Part 2), which gives a rough introduction to the HashMap resize process. Interested readers are welcome to discuss it and point out any shortcomings!


Source: blog.csdn.net/qq_40409260/article/details/104813519