Java Programming Logic (40) - Analyzing HashMap

The previous two sections introduced ArrayList and LinkedList. They share a common trait: finding an element is relatively inefficient, because elements must be compared one by one. This section introduces HashMap, whose lookup efficiency is much higher. What is HashMap? How is it used? How is it implemented? This section covers each of these in detail.

Literally, the name HashMap is made of two words, Hash and Map. Map here does not mean a geographic map but a mapping relationship. Map is an interface, and there are several ways to implement it; HashMap implements it using hashing.

We will first look at the Map interface, then at how to use HashMap, then at its implementation principles, and finally summarize HashMap's characteristics.

Map Interface

Basic concepts

Map has the concept of keys and values: each key maps to a value, and Map stores and retrieves values by key. Keys cannot be duplicated; a key is stored only once, and putting a new value for an existing key overwrites the original value. A Map makes it easy to handle scenarios that access objects by key, for example:

  • A dictionary application: the key can be the word, and the value can be information about the word, including its meaning, pronunciation and example sentences.
  • Counting how many times each word appears in a book: the key can be the word, and the value the number of occurrences.
  • Managing the items of a configuration file: configuration items are typical key-value pairs.
  • Querying personnel information by ID number: the ID number is the key and the personnel information is the value. 

Arrays, ArrayList and LinkedList can be regarded as a special kind of Map in which the index is the key and the element is the value.

Interface definition

The Map interface is defined as:

public interface Map<K,V> {
    V put(K key, V value);
    V get(Object key);
    V remove(Object key);
    int size();
    boolean isEmpty();
    boolean containsKey(Object key);
    boolean containsValue(Object value);
    void putAll(Map<? extends K, ? extends V> m);
    void clear();
    Set<K> keySet();
    Collection<V> values();
    Set<Map.Entry<K, V>> entrySet();
    interface Entry<K,V> {
        K getKey();
        V getValue();
        V setValue(V value);
        boolean equals(Object o);
        int hashCode();
    }
    boolean equals(Object o);
    int hashCode();
}

The Map interface has two type parameters, K and V, representing the type of the key and the type of the value respectively. We explain its methods below.

Saving a key-value pair

V put(K key, V value);

Saves the pair with key key and value value. If the key already exists in the Map, its value is overwritten and the original value is returned; if the key did not exist before, null is returned. Two keys are considered the same if both are null or if the equals method returns true.

Getting a value by key

V get(Object key);

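Returns the value associated with the key, or null if the key is not found (null is also returned when the key is mapped to a null value).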

Removing a key-value pair by key

V remove(Object key);

Returns the value that was associated with the key; if the key does not exist in the Map, returns null.
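The return-value rules of put, get and remove described above can be checked with a small program against java.util.HashMap. This is only an illustration; the class name MapReturnValues is made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

class MapReturnValues {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();

        // New key: put returns null
        Integer previous = map.put("apple", 1);
        System.out.println(previous);            // null

        // Existing key: the value is overwritten and the old value is returned
        previous = map.put("apple", 2);
        System.out.println(previous);            // 1
        System.out.println(map.get("apple"));    // 2

        // get on a missing key returns null
        System.out.println(map.get("pear"));     // null

        // remove returns the removed value, or null if the key was absent
        System.out.println(map.remove("apple")); // 2
        System.out.println(map.remove("apple")); // null
        System.out.println(map.size());          // 0
    }
}
```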

Checking the size of the Map

int size();
boolean isEmpty();

Checking whether a key is present

boolean containsKey(Object key);

Checking whether a value is present

boolean containsValue(Object value);

Saving in batch

void putAll(Map<? extends K, ? extends V> m);

Stores all key-value pairs of the parameter m into the current Map.

Clearing all key-value pairs in the Map

void clear();

Getting the set of keys in the Map

Set<K> keySet();

Set is an interface representing the mathematical concept of a set, that is, a collection with no duplicate elements. It is defined as:

public interface Set<E> extends Collection<E> {
}

It extends Collection without defining any new methods, but it requires every implementation to honor Set's semantic constraint: elements cannot be duplicated. We will cover Set in detail in the next section.

Keys in a Map are never duplicated, so keySet() returns a Set.

Getting a collection of all values in the Map

Collection<V> values();

Getting all key-value pairs in the Map

Set<Map.Entry<K, V>> entrySet();

Map.Entry<K,V> is a nested interface defined inside the Map interface that represents a key-value pair. Its main methods are:

K getKey();
V getValue();

keySet(), values() and entrySet() have one thing in common: they return views, not copies of the data, so modifying the returned collection directly modifies the Map itself. For example:

map.keySet().clear();

This deletes all key-value pairs from the map.
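The view behaviour can be seen in a short sketch; the class name MapViews and the sample keys are illustrative only. Each operation below goes through a view returned by keySet(), values() or entrySet(), yet it changes the underlying map.

```java
import java.util.HashMap;
import java.util.Map;

class MapViews {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        // Removing a key from the keySet view removes the pair from the map
        map.keySet().remove("a");
        System.out.println(map.containsKey("a")); // false

        // setValue on an entry from entrySet writes through to the map
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            e.setValue(e.getValue() * 10);
        }
        System.out.println(map.get("b"));         // 20

        // Removing a value from the values view removes its pair
        map.values().remove(30);
        System.out.println(map.containsKey("c")); // false

        // Clearing the keySet view empties the map
        map.keySet().clear();
        System.out.println(map.isEmpty());        // true
    }
}
```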

HashMap

Usage example

HashMap implements the Map interface. Let's look at how to use it through a simple example.

In the section on random numbers, we introduced how to generate random numbers. Now let's write a program that checks whether randomly generated numbers are evenly distributed: for example, generate 1000 random numbers between 0 and 3 and count how many times each one appears. The code can be written as:

Random rnd = new Random();
Map<Integer, Integer> countMap = new HashMap<>();

for(int i=0; i<1000; i++){
    int num = rnd.nextInt(4);
    Integer count = countMap.get(num);
    if(count==null){
        countMap.put(num, 1);
    }else{
        countMap.put(num, count+1);
    }
}

for(Map.Entry<Integer, Integer> kv : countMap.entrySet()){
    System.out.println(kv.getKey()+","+kv.getValue());
}

The output of one run was:

0,269
1,236
2,261
3,234

The code is relatively simple, so we won't explain it.
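For reference, since Java 8 the same counting loop is often written with Map.merge, which stores 1 for a new key and otherwise combines the old value with 1 using Integer::sum. A minimal sketch over a fixed input instead of random numbers, so the result is predictable; CountWithMerge is a hypothetical name:

```java
import java.util.HashMap;
import java.util.Map;

class CountWithMerge {
    // Counts occurrences of each number using Map.merge (Java 8+)
    static Map<Integer, Integer> count(int[] nums) {
        Map<Integer, Integer> countMap = new HashMap<>();
        for (int num : nums) {
            // Absent key: store 1; present key: store oldValue + 1
            countMap.merge(num, 1, Integer::sum);
        }
        return countMap;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counts = count(new int[]{0, 3, 1, 3, 3, 0});
        System.out.println(counts.get(3)); // 3
        System.out.println(counts.get(2)); // null: 2 never appeared
    }
}
```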

Constructors

In addition to the default constructor, HashMap has the following constructors:

public HashMap(int initialCapacity)
public HashMap(int initialCapacity, float loadFactor)
public HashMap(Map<? extends K, ? extends V> m)

The last one builds a new HashMap from an existing Map by copying all of its key-value pairs, which is easy to understand. The first two involve two parameters, initialCapacity and loadFactor. What do they mean? To answer that, we need to look at how HashMap is implemented.
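A small illustration of the copy constructor, assuming nothing beyond java.util.HashMap: the new map gets its own key-value pairs (the key and value objects themselves are shared references), so later puts on the copy do not affect the original. The class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class CopyConstructorDemo {
    public static void main(String[] args) {
        Map<String, Integer> original = new HashMap<>();
        original.put("x", 1);

        // HashMap(Map) copies the existing pairs into a new, independent map
        Map<String, Integer> copy = new HashMap<>(original);
        copy.put("y", 2);
        copy.put("x", 100);

        System.out.println(original);    // {x=1}: the original is unchanged
        System.out.println(copy.size()); // 2

        // The capacity/load-factor constructors only tune the internal table
        Map<String, Integer> presized = new HashMap<>(64, 0.75f);
        System.out.println(presized.isEmpty()); // true
    }
}
```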

Implementation principle

Internal composition

A HashMap instance has the following main instance variables:

transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
transient int size;
int threshold;
final float loadFactor;

size represents the actual number of key-value pairs.

table is an array of type Entry. Each element points to a singly linked list, and each node in the list represents a key-value pair. Entry is an inner class; its instance variables and constructor are as follows:

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;

    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }
} 

key and value are the key and the value, next points to the next Entry node, and hash is the hash value of the key, whose computation we describe later. The hash value is stored directly in the node so that it does not have to be recomputed when comparing, as we will see in the code.

The initial value of table is EMPTY_TABLE, an empty array, defined as:

static final Entry<?,?>[] EMPTY_TABLE = {};

Once key-value pairs are added, table is no longer empty, and it grows as more pairs are added. The growth strategy is similar to ArrayList's: when the first element is added, an array of length 16 is allocated by default. However, the table is not extended as soon as the number of pairs exceeds 16; when the next expansion happens depends on threshold.

threshold is the threshold value: expansion is considered when the number of key-value pairs, size, is greater than or equal to threshold. Where does threshold come from? In general, threshold equals table.length multiplied by loadFactor. For example, if table.length is 16 and loadFactor is 0.75, threshold is 12.

loadFactor is the load factor, indicating how fully the table as a whole may be occupied. It is a float, defaults to 0.75, and can be changed through the constructor.

Below, we go through the code of some main methods to see how HashMap uses these internal data to implement the Map interface, starting with the default constructor. Note that, for clarity and simplicity, we may omit some non-essential code.

The default constructor

Code:

public HashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}

DEFAULT_INITIAL_CAPACITY is 16 and DEFAULT_LOAD_FACTOR is 0.75. The main code of the constructor it calls is:

public HashMap(int initialCapacity, float loadFactor) {
    this.loadFactor = loadFactor;
    threshold = initialCapacity;
}

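Its main job is to set the initial values of threshold and loadFactor. Note that until the table is actually allocated, threshold temporarily holds the initial capacity (16).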

Saving a key-value pair

Below, let's look at how HashMap saves a key-value pair. The code is:

public V put(K key, V value) {
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}  

On the first save, the inflateTable() method is called to actually allocate space for table. The main code of inflateTable is:

private void inflateTable(int toSize) {
    // Find a power of 2 >= toSize
    int capacity = roundUpToPowerOf2(toSize);

    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
}

By default, capacity is 16, threshold becomes 12, and table is allocated as an Entry array of length 16.
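The "round up to a power of 2" step and the threshold formula can be sketched as follows. The helper below is our own reimplementation using Integer.highestOneBit, not the JDK's private method, and MAXIMUM_CAPACITY = 1 << 30 is assumed to match the JDK constant.

```java
class InflateSketch {
    // Assumed to match the JDK constant MAXIMUM_CAPACITY = 1 << 30
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Hypothetical helper: the smallest power of 2 that is >= toSize
    static int roundUpToPowerOf2(int toSize) {
        if (toSize >= MAXIMUM_CAPACITY) return MAXIMUM_CAPACITY;
        if (toSize <= 1) return 1;
        // highestOneBit(toSize - 1) is the largest power of 2 <= toSize - 1;
        // doubling it gives the smallest power of 2 >= toSize
        return Integer.highestOneBit(toSize - 1) << 1;
    }

    // Same formula as inflateTable: capacity * loadFactor, capped
    static int threshold(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        int capacity = roundUpToPowerOf2(16);
        System.out.println(capacity);                   // 16
        System.out.println(threshold(capacity, 0.75f)); // 12
        System.out.println(roundUpToPowerOf2(17));      // 32
    }
}
```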

Next, the code checks whether key is null; if so, putForNullKey handles it separately. We ignore that case here.

For a non-null key, the next step calls the hash method to compute the key's hash value. The code of hash is:

final int hash(Object k) {
    int h = 0;
    h ^= k.hashCode();
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

It starts from the return value of the key's own hashCode method and then applies some bit operations to it, to make the result more random and more uniformly distributed.
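To see the bit mixing on concrete keys, the following sketch copies the simplified hash() and indexFor() shown in this section (the JDK's hash seed is assumed to be 0) and applies them to "hello" and "world", the keys used in the worked example later in this section.

```java
class SupplementalHash {
    // Same bit mixing as the simplified hash() above (hash seed assumed to be 0)
    static int hash(Object k) {
        int h = 0;
        h ^= k.hashCode();
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Position in a table whose length is a power of 2
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        System.out.println(hash("hello"));               // 96207088
        System.out.println(indexFor(hash("hello"), 16)); // 0
        System.out.println(hash("world"));               // 111207038
        System.out.println(indexFor(hash("world"), 16)); // 14
    }
}
```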

With the hash value, the indexFor method is called to compute which position in table the key should go to:

static int indexFor(int h, int length) {
    return h & (length-1);
}

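In HashMap, length is always a power of 2, so for a non-negative h, h & (length-1) is equivalent to the modulo operation h % length and cheaper to compute. For a negative h, % would give a negative result, whereas the mask always yields an index between 0 and length-1.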
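A quick check of the mask-versus-modulo claim, using a hypothetical IndexForDemo class. For non-negative h, h & 15 equals h % 16; for a negative h such as -7, h % 16 is -7 while h & 15 is 9, which matches Math.floorMod(h, 16).

```java
class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // must be a power of 2 for the equivalence to hold
        for (int h : new int[]{0, 5, 16, 96207088, 111207038, -7}) {
            System.out.println(h + ": & gives " + indexFor(h, length)
                    + ", % gives " + (h % length)
                    + ", floorMod gives " + Math.floorMod(h, length));
        }
    }
}
```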

Once the position i is found, table[i] points to a singly linked list. The list is then searched node by node to see whether the key already exists. The traversal code is:

for (Entry<K,V> e = table[i]; e != null; e = e.next) 

The comparison first compares hash values and, only when the hashes are equal, compares the keys with the equals method. The code is:

if (e.hash == hash && ((k = e.key) == key || key.equals(k)))

Why compare the hash first? Because hash is an integer, comparing it is generally much faster than calling equals. If the hashes differ, there is no need to call equals at all, which improves overall comparison performance.

If the key is found, the value of that Entry is modified directly.

modCount++ has the same meaning as described for ArrayList and LinkedList: it records the number of modifications so that structural changes can be detected during iteration.

If the key is not found, addEntry is called to add the pair at the given position. The code is:

void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}

If there is enough space and no resize is needed, createEntry is called to add the pair. The code of createEntry is:

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}

The code is straightforward: it creates a new Entry object, inserts it at the head of the singly linked list, and increments size.

Expansion is needed only when size has reached the threshold and the corresponding position in table already holds a node. The check is:

if ((size >= threshold) && (null != table[bucketIndex]))

In that case, resize is called to expand table, and the expansion strategy is to double its length. The main code of resize is:

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

It allocates an Entry array with twice the original capacity, calls the transfer method to move the original key-value pairs over, and then updates the internal table variable and the value of threshold. The code of transfer is:

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while(null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

The parameter rehash is usually false. The code traverses every original key-value pair, computes its new position and saves it there; the code is fairly straightforward, so we won't explain it further.
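Because the capacity doubles, the recomputed index follows a simple pattern: the larger mask adds exactly one hash bit (the bit equal to oldCapacity), so each entry either keeps its old index or moves to old index + oldCapacity. The sketch below checks this on random hashes; it illustrates the arithmetic and is not code taken from HashMap. Note also that transfer inserts at the head of each new list, so entries that stay in the same list end up in reversed order.

```java
import java.util.Random;

class ResizeSplitDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // After doubling, an entry keeps its index or moves up by oldCapacity,
    // depending on the one hash bit that the larger mask adds
    static boolean splitsCleanly(int h, int oldCapacity) {
        int oldIndex = indexFor(h, oldCapacity);
        int newIndex = indexFor(h, 2 * oldCapacity);
        int expected = (h & oldCapacity) == 0 ? oldIndex : oldIndex + oldCapacity;
        return newIndex == expected;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        boolean allOk = true;
        for (int i = 0; i < 100_000; i++) {
            allOk &= splitsCleanly(rnd.nextInt(), 16);
        }
        System.out.println(allOk); // true
    }
}
```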

That is the main code for saving key-value pairs. To summarize briefly, the basic steps are:

  1. Compute the hash value of the key
  2. Get the position from the hash value (modulo)
  3. Insert the pair at the head of the list at that position, or update the existing value
  4. Expand the table if necessary 
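The four steps can be put together into a small, self-contained chained hash table. This is a teaching sketch under simplifying assumptions, not HashMap itself: MiniHashMap is a hypothetical name, null keys, remove and modCount are left out, and the table of 16 is allocated in the constructor rather than lazily. It reuses the hash() mixing and the addEntry resize condition shown above.

```java
class MiniHashMap<K, V> {
    // A node of a bucket's singly linked list, modeled on Entry above
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private static final float LOAD_FACTOR = 0.75f;
    private Node<K, V>[] table;
    private int size;
    private int threshold;

    @SuppressWarnings("unchecked")
    MiniHashMap() {
        table = (Node<K, V>[]) new Node[16];            // allocated eagerly here
        threshold = (int) (table.length * LOAD_FACTOR); // 12
    }

    // Step 1: same bit mixing as the simplified hash() above (no null keys)
    private static int hash(Object key) {
        int h = key.hashCode();
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public V put(K key, V value) {
        int hash = hash(key);
        int i = hash & (table.length - 1); // Step 2: index via the mask
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || key.equals(e.key))) {
                V old = e.value;           // Step 3a: key exists, replace value
                e.value = value;
                return old;
            }
        }
        if (size >= threshold && table[i] != null) { // Step 4: addEntry's condition
            resize(2 * table.length);
            i = hash & (table.length - 1);
        }
        table[i] = new Node<>(hash, key, value, table[i]); // Step 3b: head insertion
        size++;
        return null;
    }

    public V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> e = table[hash & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || key.equals(e.key))) {
                return e.value;
            }
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    private void resize(int newCapacity) {
        Node<K, V>[] newTable = (Node<K, V>[]) new Node[newCapacity];
        for (Node<K, V> e : table) {       // like transfer(): move every node
            while (e != null) {
                Node<K, V> next = e.next;
                int i = e.hash & (newCapacity - 1);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
        table = newTable;
        threshold = (int) (newCapacity * LOAD_FACTOR);
    }

    public int size() { return size; }

    int capacity() { return table.length; }

    public static void main(String[] args) {
        MiniHashMap<String, Integer> map = new MiniHashMap<>();
        for (int i = 0; i < 100; i++) {
            map.put("k" + i, i);
        }
        System.out.println(map.size());          // 100
        System.out.println(map.get("k42"));      // 42
        System.out.println(map.capacity() > 16); // true
    }
}
```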

The description above may be rather abstract, so let's walk through an example step by step. The code is:

Map<String,Integer> countMap = new HashMap<>();
countMap.put("hello", 1);
countMap.put("world", 3);

countMap.put("position", 4);

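After the object is created with new HashMap<>(), table still points to the empty array EMPTY_TABLE, size is 0, threshold holds the default initial capacity 16, and loadFactor is 0.75.

Next, we execute:

countMap.put("hello", 1);

The hash value of "hello" is 96207088, and 96207088 modulo 16 is 0. Because this is the first save, inflateTable first allocates an Entry array of length 16 and sets threshold to 12; the new entry is then inserted at the head of the list at table[0], so table[0] points to the "hello" node.

The hash value of "world" is 111207038, and modulo 16 it is 14, so after "world" is saved, table[14] points to a new "world" node.

The hash value of "position" is 771782464, and modulo 16 it is 0. table[0] already holds a node, so the new node is inserted at the head of that list: table[0] now points to the "position" node, whose next field points to the "hello" node, and size is 3.

Once you understand how key-value pairs are laid out in memory, the other methods are easier to follow. Let's look at get next.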

Getting a value by key

Code:

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

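HashMap supports null keys. A null key is stored in table[0], and getForNullKey() is called to get its value. If the key is not null, getEntry() is called to get the key-value node entry, and then the node's getValue() method returns the value. The code of getEntry is: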

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }

    int hash = (key == null) ? 0 : hash(key);
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

The logic is also fairly simple:

1. Compute the hash value of the key:

int hash = (key == null) ? 0 : hash(key);

2. Use the hash to find the corresponding linked list in table:

table[indexFor(hash, table.length)];

3. Traverse the linked list to search for the key:

for (Entry<K,V> e = table[indexFor(hash, table.length)];
       e != null;
       e = e.next)

4. Compare nodes one by one: first a quick comparison of hash values, and only when the hashes match, a comparison with equals:

if (e.hash == hash &&
    ((k = e.key) == key || (key != null && key.equals(k))))
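The point of step 4, that equals is only called when the hashes match, can be observed with a sketch that counts equals calls. Everything here is hypothetical: Key returns a fixed hash so that three keys collide in bucket 0, and get skips the extra bit mixing for simplicity.

```java
class LookupSketch {
    static int equalsCalls = 0;

    // A key with a chosen hash code whose equals() counts its calls
    static final class Key {
        final String name;
        final int fixedHash;

        Key(String name, int fixedHash) {
            this.name = name;
            this.fixedHash = fixedHash;
        }

        @Override public int hashCode() { return fixedHash; }

        @Override public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && ((Key) o).name.equals(name);
        }
    }

    static final class Node {
        final int hash;
        final Object key;
        final Object value;
        final Node next;

        Node(int hash, Object key, Object value, Node next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Steps 1-4: hash, locate the bucket, walk its list, compare hash before equals
    static Object get(Node[] table, Object key) {
        int hash = key.hashCode(); // no extra bit mixing in this sketch
        for (Node e = table[hash & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || key.equals(e.key))) {
                return e.value;
            }
        }
        return null;
    }

    // Hashes 0, 16 and 32 all map to index 0 of a length-16 table: chain c -> b -> a
    static Node[] buildTable() {
        Node[] table = new Node[16];
        String[] names = {"a", "b", "c"};
        for (int i = 0; i < names.length; i++) {
            Key k = new Key(names[i], 16 * i);
            table[0] = new Node(k.hashCode(), k, i + 1, table[0]); // head insertion
        }
        return table;
    }

    public static void main(String[] args) {
        Node[] table = buildTable();
        equalsCalls = 0;
        Object v = get(table, new Key("a", 0)); // equal but distinct key instance
        System.out.println(v + " found with " + equalsCalls + " equals call(s)");
    }
}
```

The lookup walks past "c" and "b" using only the integer comparison, so equals runs once, on the node whose hash matches.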

Checking whether a key is present

The logic of containsKey is similar to get: the key exists if the node found is not null. The code is:

public boolean containsKey(Object key) {
    return getEntry(key) != null;
}

Checking whether a value is present

HashMap makes operations by key convenient and efficient, but operations by value require a traversal. The code of containsValue is:

public boolean containsValue(Object value) {
    if (value == null)
        return containsNullValue();

    Entry[] tab = table;
    for (int i = 0; i < tab.length ; i++)
        for (Entry e = tab[i] ; e != null ; e = e.next)
            if (value.equals(e.value))
                return true;
    return false;
}

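If the value to find is null, containsNullValue is called to handle it separately. In the non-null case, the traversal logic is simple: starting from the first linked list in table, it visits the lists in order and each node within a list from head to tail, comparing values with equals until a match is found.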
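A sketch of the same full scan over a hand-built bucket array; ContainsValueSketch and its Node class are illustrative, not HashMap's. It folds the null case into Objects.equals instead of a separate containsNullValue, and its cost grows with the table length plus the number of entries, unlike containsKey.

```java
import java.util.Objects;

class ContainsValueSketch {
    static final class Node {
        final Object key;
        final Object value;
        final Node next;

        Node(Object key, Object value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Full scan: every bucket, then every node in that bucket's list.
    // Objects.equals also handles null, which HashMap delegates to containsNullValue.
    static boolean containsValue(Node[] table, Object value) {
        for (Node head : table) {
            for (Node e = head; e != null; e = e.next) {
                if (Objects.equals(value, e.value)) {
                    return true;
                }
            }
        }
        return false;
    }

    static Node[] sampleTable() {
        Node[] table = new Node[4];
        table[1] = new Node("b", 2, new Node("a", 1, null));
        table[3] = new Node("c", null, null);
        return table;
    }

    public static void main(String[] args) {
        Node[] table = sampleTable();
        System.out.println(containsValue(table, 1));    // true
        System.out.println(containsValue(table, null)); // true
        System.out.println(containsValue(table, 5));    // false
    }
}
```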

Removing a key-value pair by key

The code is:

public V remove(Object key) {
    Entry<K,V> e = removeEntryForKey(key);
    return (e == null ? null : e.value);
}

The code of removeEntryForKey is:

final Entry<K,V> removeEntryForKey(Object key) {
    if (size == 0) {
        return null;
    }
    int hash = (key == null) ? 0 : hash(key);
    int i = indexFor(hash, table.length);
    Entry<K,V> prev = table[i];
    Entry<K,V> e = prev;

    while (e != null) {
        Entry<K,V> next = e.next;
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k)))) {
            modCount++;
            size--;
            if (prev == e)
                table[i] = next;
            else
                prev.next = next;
            e.recordRemoval(this);
            return e;
        }
        prev = e;
        e = next;
    }

    return e;
}

The basic logic is:

1. Compute the hash and use it to find the corresponding table index:

int hash = (key == null) ? 0 : hash(key);
int i = indexFor(hash, table.length);

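2. Traverse table[i] to find the node to remove, using the variable prev for the previous node, next for the next node, and e for the current node. The traversal structure is: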

Entry<K,V> prev = table[i];
Entry<K,V> e = prev;
while (e != null) {
    Entry<K,V> next = e.next;
    if (found) {
        // unlink e
        return e;
    }
    }
    prev = e;
    e = next;
}

3. Check whether the node is found: as before, compare hash values first, and use equals only when the hashes match.

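4. Removal decrements size and links the nodes before and after the removed node together. If the node to remove is the first node, table[i] is pointed directly at the next node. The code is: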

size--;
if (prev == e)
    table[i] = next;
else
    prev.next = next;

e.recordRemoval(this); has an empty body in HashMap; it exists mainly so that subclasses of HashMap can extend it.
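The unlinking logic can be tried in isolation. In this hypothetical sketch a single bucket holds the list c -> b -> a; removing the middle node exercises prev.next = next, and removing the first node exercises table[i] = next. modCount, size and recordRemoval are omitted.

```java
class RemoveSketch {
    static final class Node {
        final int hash;
        final String key;
        Node next;

        Node(String key, Node next) {
            this.hash = key.hashCode();
            this.key = key;
            this.next = next;
        }
    }

    // Unlinks key's node from the list at table[i]; returns it, or null if absent
    static Node remove(Node[] table, int i, String key) {
        int hash = key.hashCode();
        Node prev = table[i];
        Node e = prev;
        while (e != null) {
            Node next = e.next;
            if (e.hash == hash && key.equals(e.key)) {
                if (prev == e) {
                    table[i] = next;  // e is the first node: the bucket skips it
                } else {
                    prev.next = next; // link e's predecessor to e's successor
                }
                return e;
            }
            prev = e;
            e = next;
        }
        return null;
    }

    // Renders a bucket's list as "c->b->a" for printing
    static String chain(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) sb.append("->");
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node[] table = new Node[1];
        table[0] = new Node("c", new Node("b", new Node("a", null)));
        remove(table, 0, "b");                     // middle node
        System.out.println(chain(table[0]));       // c->a
        remove(table, 0, "c");                     // first node
        System.out.println(chain(table[0]));       // a
        System.out.println(remove(table, 0, "x")); // null
    }
}
```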

Summary of the implementation

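That is the basic implementation principle of HashMap. Internally there is an array, table, and each element table[i] points to a singly linked list. To store or retrieve a value by key, HashMap computes the hash from the key, takes it modulo the table length to get the array index bucketIndex, and then operates on the singly linked list that table[bucketIndex] points to.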

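When storing and retrieving, HashMap works only on the linked list selected by the key's hash value and never touches other lists. Within that list it compares hash values first and calls equals only when they match. This requires that equal objects (according to equals) return the same hashCode() value; pay special attention to this when the key is a custom class. This is a key constraint between hashCode and equals, which we also mentioned when introducing the wrapper classes.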
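For a custom key class this means overriding equals and hashCode together. A minimal sketch with a hypothetical Point class, using java.util.Objects.hash to combine the fields:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class PointKeyDemo {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        // Equal points must return equal hash codes, or a lookup would search the wrong bucket
        @Override public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        Map<Point, String> names = new HashMap<>();
        names.put(new Point(1, 2), "A");
        // A different but equal instance finds the same entry
        System.out.println(names.get(new Point(1, 2))); // A
    }
}
```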

Analysis of HashMap's characteristics

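HashMap implements the Map interface and is internally built from an array of linked lists plus hashing, which gives it the following characteristics:

  • Saving and getting values by key are both very efficient, O(1) on average: each singly linked list usually holds only one or a few nodes, and the hash value locates it directly.
  • Key-value pairs in a HashMap have no defined order, because their positions are determined by hash values, which are effectively random.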

If you frequently need to store and retrieve values by key and do not require any ordering, HashMap is an ideal choice.

Summary

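This section introduced the usage and implementation principles of HashMap. It implements the Map interface and makes it convenient to store and retrieve values by key. Its implementation uses hashing, which locates an entry directly from the key itself, so access is very efficient.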

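Storing, retrieving and comparing objects by hash value is an important way of thinking in computer programs. It makes access to an object depend mainly on its own hash value rather than on comparisons with other objects, so access efficiency is independent of the collection's size, typically O(1). Even when comparisons are needed, hash values are used to make them faster.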

However, HashMap has no order. If you want to keep the insertion order, you can use LinkedHashMap, a subclass of HashMap, which we will introduce later. Map has another important implementation class, TreeMap, which keeps its keys sorted; we leave it to later chapters.
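As a preview only, the difference in iteration order can be seen with java.util.LinkedHashMap and java.util.TreeMap; OrderedMaps is a hypothetical class name. HashMap is left out of the output because its iteration order is not specified.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class OrderedMaps {
    // Joins the keys in the map's own iteration order
    static String keys(Map<String, Integer> map) {
        return String.join(",", map.keySet());
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "fig"};
        Map<String, Integer> linked = new LinkedHashMap<>();
        Map<String, Integer> sorted = new TreeMap<>();
        for (int i = 0; i < words.length; i++) {
            linked.put(words[i], i);
            sorted.put(words[i], i);
        }
        System.out.println(keys(linked)); // pear,apple,fig (insertion order)
        System.out.println(keys(sorted)); // apple,fig,pear (sorted by key)
    }
}
```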

This section mentioned the Set interface; in the next section, let's explore one of its important implementation classes, HashSet.

 


Source: www.cnblogs.com/ivy-xu/p/12389777.html