Analysis of the Implementation Principles of HashMap and Hashtable

The difference between HashMap and Hashtable

The main difference between the two is that Hashtable is thread-safe, while HashMap is not.

The main methods of Hashtable are declared synchronized to guarantee thread safety, so HashMap, which does no locking, generally performs better. Unless there is a specific requirement, HashMap is the recommended choice in everyday use. If a HashMap has to be used from multiple threads, call Collections.synchronizedMap() to obtain a thread-safe map. (How Collections.synchronizedMap() works: Collections defines an inner class, SynchronizedMap, that implements the Map interface and wraps every method call in a synchronized block. The actual work is still done by the HashMap instance we passed in. Put simply, Collections.synchronizedMap() adds the synchronization for us whenever the HashMap is accessed. The other Collections.synchronizedXxx() methods work the same way.)
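As a minimal sketch of this approach (the class SyncMapDemo and its methods are hypothetical; only Collections.synchronizedMap and the standard java.util types are real APIs), the code below wraps a HashMap and updates it from several threads. Single calls on the wrapper are synchronized, but a compound get-then-put step and iteration still need an explicit synchronized block on the wrapper, as the Collections.synchronizedMap Javadoc advises.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SyncMapDemo {
    // Each call on the returned view locks the wrapper before delegating to the HashMap.
    static Map<String, Integer> newSharedMap() {
        return Collections.synchronizedMap(new HashMap<String, Integer>());
    }

    // get-then-put is a compound action, so the wrapper's lock must be held explicitly.
    static void increment(Map<String, Integer> map, String key) {
        synchronized (map) {
            Integer old = map.get(key);
            map.put(key, old == null ? 1 : old + 1);
        }
    }

    // Iteration must also happen while holding the wrapper's lock.
    static int total(Map<String, Integer> map) {
        int sum = 0;
        synchronized (map) {
            for (int v : map.values()) {
                sum += v;
            }
        }
        return sum;
    }

    // Four threads each perform 1000 increments on the shared map.
    static int runWorkers() {
        Map<String, Integer> map = newSharedMap();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    increment(map, "k" + (i % 10));
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total(map);
    }
}
```

With the explicit lock around get-then-put, runWorkers() returns 4000; without it, two threads could read the same old count and one increment would be lost.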

HashMap allows null as a key (and null values), while Hashtable allows neither a null key nor a null value.

Even though HashMap supports a null key, avoid relying on it where possible: if one slips in by accident and causes a problem, it can be very hard to track down.
When null is used as a key, HashMap always stores it in the bucket at table[0], the first slot of the table array.
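A quick way to see the null-handling difference is to try both maps directly. The sketch below (NullKeyDemo is a hypothetical name) relies only on standard java.util classes; Hashtable signals a null key or a null value by throwing NullPointerException.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {
    // HashMap accepts a null key and returns its value on lookup.
    static String hashMapNullKey() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "stored under null");
        return map.get(null);
    }

    // Hashtable.put calls key.hashCode(), so a null key fails with NullPointerException.
    static boolean hashtableRejectsNullKey() {
        Map<String, String> table = new Hashtable<>();
        try {
            table.put(null, "x");
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // Hashtable also rejects null values explicitly.
    static boolean hashtableRejectsNullValue() {
        Map<String, String> table = new Hashtable<>();
        try {
            table.put("k", null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```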

HashMap extends AbstractMap and implements the Map interface; Hashtable implements the Map interface and extends the abstract class Dictionary.

The default initial capacity of HashMap is 16 and that of Hashtable is 11; both have a default load factor of 0.75.

When HashMap grows, its capacity doubles (capacity*2); when Hashtable grows, its new capacity is double the old one plus one (capacity*2+1).
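The two growth rules produce quite different size sequences. The hypothetical GrowthDemo below just applies each rule repeatedly from the default initial capacities; it models the arithmetic only, not the actual resize code. HashMap's sizes stay powers of two, which its bitmask indexing relies on, while Hashtable's sizes stay odd.

```java
import java.util.ArrayList;
import java.util.List;

class GrowthDemo {
    // HashMap rule: newCapacity = oldCapacity * 2
    static List<Integer> hashMapCapacities(int initial, int resizes) {
        List<Integer> caps = new ArrayList<>();
        int cap = initial;
        caps.add(cap);
        for (int i = 0; i < resizes; i++) {
            cap = cap * 2;
            caps.add(cap);
        }
        return caps;
    }

    // Hashtable rule: newCapacity = oldCapacity * 2 + 1
    static List<Integer> hashtableCapacities(int initial, int resizes) {
        List<Integer> caps = new ArrayList<>();
        int cap = initial;
        caps.add(cap);
        for (int i = 0; i < resizes; i++) {
            cap = cap * 2 + 1;
            caps.add(cap);
        }
        return caps;
    }
}
```

Starting from the defaults, three resizes give 16, 32, 64, 128 for HashMap and 11, 23, 47, 95 for Hashtable.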

The two compute the hash (and the bucket index) differently

Hashtable uses the key's hashCode directly: it clears the sign bit and takes the result modulo the length of the table array

int hash = key.hashCode();
int index = (hash & 0x7FFFFFFF) % tab.length;

HashMap applies a secondary (supplemental) hash to the key's hashCode to obtain a better-distributed hash value, and then maps it onto the table array by masking with table.length - 1, which acts as a modulo because the length is always a power of two

static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

 static int indexFor(int h, int length) {
        return h & (length-1);
    }
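To make the two approaches concrete, the sketch below puts both index calculations side by side for the same key. IndexDemo and its method names are hypothetical; spread() reproduces the JDK 6 hash() function quoted above, and hashMapIndex assumes a power-of-two table length.

```java
class IndexDemo {
    // Hashtable style: clear the sign bit, then take the remainder (any table length works).
    static int hashtableIndex(Object key, int tableLength) {
        int hash = key.hashCode();
        return (hash & 0x7FFFFFFF) % tableLength;
    }

    // HashMap (JDK 6) style, step 1: the supplemental hash.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // HashMap (JDK 6) style, step 2: mask with length - 1 (length must be a power of two).
    static int hashMapIndex(Object key, int tableLength) {
        return spread(key.hashCode()) & (tableLength - 1);
    }
}
```

For a negative hashCode such as -7 (Integer.valueOf(-7)), clearing the sign bit keeps Hashtable's index non-negative: it comes out as 6 for an 11-slot table.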

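Both HashMap and Hashtable are built on an array of linked lists (array + linked list). This describes the JDK 6-era source quoted in this article; since Java 8, HashMap additionally converts long bucket lists into red-black trees.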

The difference between HashSet and HashMap/Hashtable

Besides HashMap and Hashtable, there is also the hash-based set HashSet. Unlike the other two, HashSet is not a key-value structure: it only stores unique elements, so it is essentially a simplified HashMap that keeps only the keys.

The source code confirms this: HashSet is implemented on top of a HashMap in which every value is the same shared Object, so HashSet is not thread-safe either. The difference between HashSet and Hashtable follows from this: since HashSet is a simplified HashMap, it differs from Hashtable in the same ways HashMap does.
The main methods of HashSet are implemented as follows:

private transient HashMap<E,Object> map;
// Dummy value shared by every key in the backing map
private static final Object PRESENT = new Object();

public HashSet() {
    map = new HashMap<E,Object>();
}

public boolean contains(Object o) {
    return map.containsKey(o);
}

// put returns null only if the element was not already present
public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}

public boolean remove(Object o) {
    return map.remove(o)==PRESENT;
}

public void clear() {
    map.clear();
}
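The add method's contract follows directly from map.put: put returns null only when the key was not already present. The sketch below (SetViaMapDemo and addViaMap are hypothetical names) compares a real HashSet with a HashMap used the same way, with a shared dummy object standing in for PRESENT.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SetViaMapDemo {
    // Stand-in for HashSet's shared PRESENT value.
    private static final Object PRESENT = new Object();

    // Hypothetical helper mirroring HashSet.add: true only if the key was absent before.
    static boolean addViaMap(Map<String, Object> backing, String element) {
        return backing.put(element, PRESENT) == null;
    }

    // Adds the same element twice to a real HashSet and reports both results.
    static boolean[] addTwiceToHashSet(String element) {
        Set<String> set = new HashSet<>();
        return new boolean[] { set.add(element), set.add(element) };
    }

    // Does the same through a HashMap, the way HashSet uses its backing map.
    static boolean[] addTwiceViaMap(String element) {
        Map<String, Object> map = new HashMap<>();
        return new boolean[] { addViaMap(map, element), addViaMap(map, element) };
    }
}
```

Both variants report true for the first add and false for the duplicate.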

Implementation principle of HashMap and Hashtable

HashMap and Hashtable share exactly the same underlying structure: an array of linked lists.

For add, remove, and get operations, the hash is computed first; the index (the subscript into the table array) is then derived from the hash and table.length, and the operation is carried out on the bucket at that index. The following uses HashMap as an example to walk through a simplified view of its implementation.

/**
 * Default initial capacity of HashMap; must be a power of two
 */
static final int DEFAULT_INITIAL_CAPACITY = 16;

/**
 * Maximum capacity of HashMap: 2^30, the largest power of two that fits in a positive int
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * Default load factor
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
 * The array HashMap uses to store its entries
 */
transient Entry[] table;

Creation of HashMap

When HashMap is created with the default constructor, it allocates an Entry array with the default capacity of 16, uses the default load factor of 0.75, and sets the resize threshold to 16*0.75 = 12.

/**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
        table = new Entry[DEFAULT_INITIAL_CAPACITY];
        init();
    }
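The constants above require the capacity to be a power of two. The sketch below shows one way a requested capacity can be rounded up to a power of two and how the threshold then follows from the load factor. CapacitySketch and roundUpToPowerOfTwo are hypothetical names; the rounding loop is modeled loosely on what the JDK 6 HashMap(int, float) constructor does, not copied from it.

```java
class CapacitySketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Hypothetical helper: smallest power of two >= requested, capped at MAXIMUM_CAPACITY.
    static int roundUpToPowerOfTwo(int requested) {
        if (requested >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        int capacity = 1;
        while (capacity < requested) {
            capacity <<= 1;
        }
        return capacity;
    }

    // Number of entries the table may hold before it is doubled.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }
}
```

A request for 100 slots becomes a 128-slot table with a threshold of 96; the default 16-slot table gets a threshold of 12.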

put method

HashMap handles a null key specially and always stores it in the bucket at table[0].
For other keys, put first computes the hash, derives the index from the hash and table.length, and stores the entry at table[index]. If table[index] already holds other elements, they form a linked list: the new element becomes the head at table[index], and the existing elements are chained behind it through each Entry's next field. This is how hash collisions are resolved, by chaining. When the number of elements reaches the threshold (capacity*loadFactor), the table is expanded and its length becomes table.length*2.

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value); // handle the null key
    int hash = hash(key.hashCode()); // compute the hash
    int i = indexFor(hash, table.length); // compute the storage position in the array
    // Walk the linked list at table[i] looking for an equal key; if found, replace oldValue with the new value and return oldValue
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // No equal key at table[i]: add the entry there; the new element always becomes the first one at table[i] and existing elements move back
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}


void addEntry(int hash, K key, V value, int bucketIndex) {
    // Add the key at table[bucketIndex]; the new element becomes the first one in the bucket and existing elements move back
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    // If the element count has reached the threshold, expand: the table length doubles
    if (size++ >= threshold)
        resize(2 * table.length);
}
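To tie the put and addEntry logic together, here is a minimal chained hash map sketch. ChainedMapSketch is hypothetical and simplified: it skips the supplemental hash() step, starts with a deliberately small capacity of 4 so resizing is easy to trigger, and resizes once the size exceeds capacity*0.75, which is equivalent to the size++ >= threshold check in addEntry. As in the JDK 6 code, new entries are inserted at the head of their bucket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class ChainedMapSketch<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private static final float LOAD_FACTOR = 0.75f;

    // Deliberately tiny power-of-two capacity so that resizing is easy to observe.
    @SuppressWarnings("unchecked")
    private Node<K, V>[] table = (Node<K, V>[]) new Node[4];
    private int size;
    private int threshold = (int) (4 * LOAD_FACTOR);

    // No bit spreading here; a null key hashes to 0 and therefore lands in bucket 0.
    private static int hashOf(Object key) {
        return key == null ? 0 : key.hashCode();
    }

    V put(K key, V value) {
        int hash = hashOf(key);
        int i = hash & (table.length - 1);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                V old = e.value; // existing key: replace the value and return the old one
                e.value = value;
                return old;
            }
        }
        table[i] = new Node<>(hash, key, value, table[i]); // head insertion
        if (++size > threshold) {
            resize(table.length * 2);
        }
        return null;
    }

    V get(Object key) {
        int hash = hashOf(key);
        for (Node<K, V> e = table[hash & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    // Recompute every entry's index from its stored hash for the new length.
    @SuppressWarnings("unchecked")
    private void resize(int newCapacity) {
        Node<K, V>[] newTable = (Node<K, V>[]) new Node[newCapacity];
        for (Node<K, V> head : table) {
            Node<K, V> e = head;
            while (e != null) {
                Node<K, V> next = e.next;
                int i = e.hash & (newCapacity - 1);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
        table = newTable;
        threshold = (int) (newCapacity * LOAD_FACTOR);
    }

    int capacity() {
        return table.length;
    }

    // Keys stored in one bucket, head first.
    List<K> bucketKeys(int index) {
        List<K> keys = new ArrayList<>();
        for (Node<K, V> e = table[index]; e != null; e = e.next) {
            keys.add(e.key);
        }
        return keys;
    }
}
```

With Integer keys 1 and 5 and a capacity of 4, both land in bucket 1, and bucketKeys(1) returns [5, 1] because the newer entry sits at the head. Adding a fourth distinct key pushes the size past the threshold of 3, and the capacity doubles to 8.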

get method

As with put, a null key is handled specially: it is looked up in the linked list at table[0].
For other keys, get computes the hash, derives the index from the hash and table.length, and then walks the list at table[index] until it finds the key and returns its value; if the key is absent, it returns null.

public V get(Object key) {
    if (key == null)
        return getForNullKey(); // handle the null key
    int hash = hash(key.hashCode()); // compute the hash
    // Walk the list at table[index] looking for key; return its value if found, otherwise null
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}
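The check e.hash == hash && ((k = e.key) == key || key.equals(k)) in get matters when different keys share a bucket. The strings "Aa" and "BB" happen to have the same String.hashCode() (2112), so in the sketch below (CollisionDemo is a hypothetical name) they always land in the same bucket, yet get still returns the right value because equals decides once the hashes match.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // "Aa" -> 65*31 + 97 = 2112 and "BB" -> 66*31 + 66 = 2112
    static boolean sameHash() {
        return "Aa".hashCode() == "BB".hashCode();
    }

    // After the hash matches, equals() tells the colliding keys apart.
    static String[] lookups() {
        Map<String, String> map = new HashMap<>();
        map.put("Aa", "first");
        map.put("BB", "second");
        return new String[] { map.get("Aa"), map.get("BB"), map.get("Ab") };
    }
}
```

lookups() yields "first", "second", and null for the absent key "Ab".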

remove method

remove works much like put and get: it computes the hash and the index, then walks the list at table[index] and unlinks the matching entry from it.

public V remove(Object key) {
        Entry<K,V> e = removeEntryForKey(key);
        return (e == null ? null : e.value);
    }
    final Entry<K,V> removeEntryForKey(Object key) {
        int hash = (key == null) ? 0 : hash(key.hashCode());
        int i = indexFor(hash, table.length);
        Entry<K,V> prev = table[i];
        Entry<K,V> e = prev;

        while (e != null) {
            Entry<K,V> next = e.next;
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k)))) {
                modCount++;
                size--;
                if (prev == e)
                    table[i] = next;
                else
                    prev.next = next;
                e.recordRemoval(this);
                return e;
            }
            prev = e;
            e = next;
        }

        return e;
    }
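The key detail in removeEntryForKey is the prev pointer: removing the head means the bucket slot itself must point to the next node, while removing any other node means prev.next skips over it. The hypothetical sketch below isolates that unlinking logic on a single bucket of String keys, returning the bucket's (possibly new) head instead of writing into a table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class BucketRemoveDemo {
    static final class Node {
        final String key;
        Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Builds a bucket whose nodes appear in the given order, head first.
    static Node bucketOf(String... keys) {
        Node head = null;
        for (int i = keys.length - 1; i >= 0; i--) {
            head = new Node(keys[i], head);
        }
        return head;
    }

    // Unlinks the first node with an equal key and returns the bucket's head afterwards.
    static Node removeFromBucket(Node head, String key) {
        Node prev = head;
        Node e = head;
        while (e != null) {
            Node next = e.next;
            if (Objects.equals(e.key, key)) {
                if (prev == e) {
                    return next; // the head was removed: the slot now points to its successor
                }
                prev.next = next; // bypass the matching node
                return head;
            }
            prev = e;
            e = next;
        }
        return head; // no match: bucket unchanged
    }

    static List<String> keys(Node head) {
        List<String> out = new ArrayList<>();
        for (Node e = head; e != null; e = e.next) {
            out.add(e.key);
        }
        return out;
    }
}
```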

resize method

resize is not part of HashMap's public API (it is package-private), but it implements HashMap's all-important expansion. The process is: create a new table with capacity table.length*2, update the threshold, then take each element of the old table and use its hash together with the new length to recompute its index, placing it in the new table. Note that the index is recomputed from each element's hash; elements are not simply copied from index j of the old table to index j of the new one.

void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable);
        table = newTable;
        threshold = (int)(newCapacity * loadFactor);
    }

    void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        for (int j = 0; j < src.length; j++) {
            Entry<K,V> e = src[j];
            if (e != null) {
                src[j] = null;        
                do {
                    Entry<K,V> next = e.next;
                    // recompute the index of each element for the new capacity
                    int i = indexFor(e.hash, newCapacity);
                    e.next = newTable[i];
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }
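Because the capacity is always a power of two, recomputing indexFor(hash, newCapacity) after doubling follows a simple pattern: an entry from old bucket j lands either in bucket j or in bucket j + oldCapacity, depending on whether the hash bit selected by oldCapacity is set. The hypothetical ResizeSplitDemo below checks that observation over a range of hash values; it illustrates the arithmetic and is not code from HashMap.

```java
class ResizeSplitDemo {
    // Same masking rule as HashMap.indexFor; length must be a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // True if the new index is j (hash bit oldCapacity clear) or j + oldCapacity (bit set).
    static boolean landsInJOrJPlusOld(int hash, int oldCapacity) {
        int oldIndex = indexFor(hash, oldCapacity);
        int newIndex = indexFor(hash, oldCapacity * 2);
        int expected = (hash & oldCapacity) == 0 ? oldIndex : oldIndex + oldCapacity;
        return newIndex == expected;
    }

    // Checks the property for every hash in [from, to]; keep the range small.
    static boolean holdsForRange(int oldCapacity, int from, int to) {
        for (int h = from; h <= to; h++) {
            if (!landsInJOrJPlusOld(h, oldCapacity)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, hash 21 sits in bucket 5 of a 16-slot table and moves to bucket 21 (5 + 16) after doubling, while hash 5 stays in bucket 5.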

clear() method

The clear method is very simple: it walks the table, sets every slot to null, and resets the element count to 0. Note that clear only removes the elements; it does not shrink the table, so the capacity stays what it was.

public void clear() {
        modCount++;
        Entry[] tab = table;
        for (int i = 0; i < tab.length; i++)
            tab[i] = null;
        size = 0;
    }

containsKey and containsValue

containsKey computes the hash, derives the index from the hash and table.length, and walks the list at table[index] to check whether any entry has a key equal to the given key.

public boolean containsKey(Object key) {
        return getEntry(key) != null;
    }
final Entry<K,V> getEntry(Object key) {
        int hash = (key == null) ? 0 : hash(key.hashCode());
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }

containsValue is brute-force by comparison: it walks every element in every bucket until it finds the value. In this respect HashMap's containsValue is essentially the same as contains on an ordinary array or list, a linear O(n) scan, so do not expect it to be as efficient as containsKey.

public boolean containsValue(Object value) {
    if (value == null)
        return containsNullValue();

    Entry[] tab = table;
    for (int i = 0; i < tab.length ; i++)
        for (Entry e = tab[i] ; e != null ; e = e.next)
            if (value.equals(e.value))
                return true;
    return false;
}
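The cost difference can be made concrete by counting how many nodes each kind of lookup visits. In this hypothetical sketch (ContainsCostDemo and its methods are not HashMap code), the table uses hashCode directly as the hash with no spreading, and nodesVisitedForValue uses Objects.equals so that a null value is handled in the same loop; the JDK code instead delegates the null case to containsNullValue, which is not shown here.

```java
import java.util.Objects;

class ContainsCostDemo {
    static final class Node {
        final int hash;
        final Object key;
        final Object value;
        final Node next;

        Node(Object key, Object value, Node next) {
            this.hash = key == null ? 0 : key.hashCode();
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Builds a table with Integer keys 0..entries-1 and values "v0", "v1", ...
    // capacity must be a power of two; hashCode is used directly (no spreading).
    static Node[] build(int capacity, int entries) {
        Node[] table = new Node[capacity];
        for (int k = 0; k < entries; k++) {
            Integer key = k;
            int i = key.hashCode() & (capacity - 1);
            table[i] = new Node(key, "v" + k, table[i]);
        }
        return table;
    }

    // containsKey-style lookup: only the key's own bucket is examined.
    static int nodesVisitedForKey(Node[] table, Object key) {
        int hash = key == null ? 0 : key.hashCode();
        int visited = 0;
        for (Node e = table[hash & (table.length - 1)]; e != null; e = e.next) {
            visited++;
            if (e.hash == hash && Objects.equals(e.key, key)) {
                break;
            }
        }
        return visited;
    }

    // containsValue-style lookup: every bucket may have to be scanned.
    static int nodesVisitedForValue(Node[] table, Object value) {
        int visited = 0;
        for (Node head : table) {
            for (Node e = head; e != null; e = e.next) {
                visited++;
                if (Objects.equals(e.value, value)) {
                    return visited;
                }
            }
        }
        return visited;
    }
}
```

With 16 Integer keys in a 16-slot table, finding key 15 examines a single node, while finding the value "v15" (or a value that is not present) walks all 16.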

hash and indexFor

In indexFor, h & (length-1) computes the index, that is, the subscript into the table array. Because the table length is always a power of two, it is equivalent to h % length for non-negative h (and, unlike %, it never yields a negative index).
The hash method hashes the hashCode a second time to obtain a better-distributed hash value.
For easier understanding, the two methods can be simplified to int index = key.hashCode() % table.length. Taking put as an example, its two lines can be replaced like this:

int hash = hash(key.hashCode()); // compute the hash
int i = indexFor(hash, table.length); // compute the storage position in the array
// Conceptually, the two lines above simplify to the following
// (a rough equivalent: it skips the bit spreading and can go negative for negative hashCodes)
int i = key.hashCode() % table.length;

static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

static int indexFor(int h, int length) {
    return h & (length-1);
}
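Two points from this section can be checked directly: the supplemental hash helps keys whose hashCodes differ only in the high bits, and for power-of-two lengths h & (length-1) matches a non-negative modulo. HashSpreadDemo is a hypothetical wrapper around the two JDK 6 methods quoted here, plus Math.floorMod (Java 8+) for comparison.

```java
class HashSpreadDemo {
    // Supplemental hash from the JDK 6 HashMap source.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // For a power-of-two length, masking equals the non-negative remainder.
    static boolean maskMatchesFloorMod(int h, int length) {
        return indexFor(h, length) == Math.floorMod(h, length);
    }
}
```

For a 16-slot table, the hashCodes 0x10000 and 0x20000 both mask to index 0, but after hash() they map to indices 1 and 2. For h = -7, the mask gives 9, matching Math.floorMod(-7, 16), while -7 % 16 is -7.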


Origin http://43.154.161.224:23101/article/api/json?id=326455069&siteId=291194637