HashMap深入学习
一、简单介绍
HashMap是一种比较常用的容器,其结构是数组加链表。外层数组,每个数组元素是链表。用链表解决hash冲突。结合了数组易寻址及链表易插入删除的优点。
二、HahMap的结构
1、主要属性:
/** * The default initial capacity - MUST be a power of two. */ static final int DEFAULT_INITIAL_CAPACITY = 16; /** * The maximum capacity, used if a higher value is implicitly specified * by either of the constructors with arguments. * MUST be a power of two <= 1<<30. */ static final int MAXIMUM_CAPACITY = 1 << 30; /** * The load factor used when none specified in constructor. */ static final float DEFAULT_LOAD_FACTOR = 0.75f; /** * The table, resized as necessary. Length MUST Always be a power of two. */ transient Entry[] table; /** * The number of key-value mappings contained in this map. */ transient int size; /** * The next size value at which to resize (capacity * load factor). * @serial */ int threshold; /** * The load factor for the hash table. * * @serial */ final float loadFactor; /** * The number of times this HashMap has been structurally modified * Structural modifications are those that change the number of mappings in * the HashMap or otherwise modify its internal structure (e.g., * rehash). This field is used to make iterators on Collection-views of * the HashMap fail-fast. (See ConcurrentModificationException). */ transient volatile int modCount;
分类 | 属性名 | 详细 |
类变量(default static final) | DEFAULT_INITIAL_CAPACITY | 默认初始容量,初始化不设置初始值以16为初始值,初始值必须是2的幂值。(table数组的size) |
类变量(default static final) | MAXIMUM_CAPACITY | 最大容量,1>>>30(table数组size) |
类变量(default static final) | DEFAULT_LOAD_FACTOR | 默认比例因子 0.75,当初始化没有传值时使用。(怎么计算的待研究) |
实例变量(default transient Entry[]) | table | 数组,用来存放hashMap单元数据,每个数组存一个链表结构。数组大小为2的幂值,大小不够可以resize |
实例变量(default transient int) | size | HashMap数据总数 |
实例变量 (default) | threshold | table resize 的阈值=capacity * load factor |
实例变量(final float) | loadFactor | 加载因子 size/capacity, 默认为0.75 |
实例变量(transient volatile int) | modCount | 有新增entry就会加1,remove也加1。 用来做简单的并发控制。 |
2、主要方法实现
1、初始化
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and load factor.
*
* @param initialCapacity the initial capacity
* @param loadFactor the load factor
* @throws IllegalArgumentException if the initial capacity is negative
* or the load factor is nonpositive
*/
public HashMap(int initialCapacity, float loadFactor) {
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);
// Find a power of 2 >= initialCapacity
int capacity = 1;
while (capacity < initialCapacity)
capacity <<= 1;
this.loadFactor = loadFactor;
threshold = (int)(capacity * loadFactor);
table = new Entry[capacity];
init();
}
入参: (1)initialCapacity 初始容量,指外层数组的初始大小。 (2)loadFactor 加载因子,当 size/capacity>loadFactor 需要resize扩大数组大小,默认为0.75。
注:数组容量为大于传入初始容量的最小2的幂值。 2、hash方法
/** * Applies a supplemental hash function to a given hashCode, which * defends against poor quality hash functions. This is critical * because HashMap uses power-of-two length hash tables, that * otherwise encounter collisions for hashCodes that do not differ * in lower bits. Note: Null keys always map to hash 0, thus index 0. */ static int hash(int h) { // This function ensures that hashCodes that differ only by // constant multiples at each bit position have a bounded // number of collisions (approximately 8 at default load factor). h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4); }入参是key值的hashcode,目的是让hash表更松散,具体原理 待研究。 因为hashmap表长度是2的幂,经过indexfor的计算,就是取hashcode的末几位,hash方法的作用,是让末几位受到其他位差异的影响更多,让hashmap更松散。
3、indexfor方法
/** * Returns index for hash code h. */ static int indexFor(int h, int length) { return h & (length-1); }
因为hashmap表长度是2的幂,经过indexfor的计算,就是取hashcode的末几位
4、get方法
/** * Returns the value to which the specified key is mapped, * or {@code null} if this map contains no mapping for the key. * * <p>More formally, if this map contains a mapping from a key * {@code k} to a value {@code v} such that {@code (key==null ? k==null : * key.equals(k))}, then this method returns {@code v}; otherwise * it returns {@code null}. (There can be at most one such mapping.) * * <p>A return value of {@code null} does not <i>necessarily</i> * indicate that the map contains no mapping for the key; it's also * possible that the map explicitly maps the key to {@code null}. * The {@link #containsKey containsKey} operation may be used to * distinguish these two cases. * * @see #put(Object, Object) */ public V get(Object key) { if (key == null) return getForNullKey(); int hash = hash(key.hashCode()); for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) { Object k; if (e.hash == hash && ((k = e.key) == key || key.equals(k))) return e.value; } return null; }
HashMap允许key为null,key为null放在数组第一位。
获取方法就是,先根据key算出数组下标,取到链表,再遍历链表获取到key相同的值,获取其value值。
遍历的时候,会先比较hash值,我理解这是为了提高性能,因为遍历的时候大多数情况应该是不等的,而大多数的不等通过hash值得比较就能过滤掉。不用通过equals比较,因为hash值是算好的,equals还得再算一遍,有些复杂的可能性能消耗不少。
5、put方法
/** * Associates the specified value with the specified key in this map. * If the map previously contained a mapping for the key, the old * value is replaced. * * @param key key with which the specified value is to be associated * @param value value to be associated with the specified key * @return the previous value associated with <tt>key</tt>, or * <tt>null</tt> if there was no mapping for <tt>key</tt>. * (A <tt>null</tt> return can also indicate that the map * previously associated <tt>null</tt> with <tt>key</tt>.) */ public V put(K key, V value) { if (key == null) return putForNullKey(value); int hash = hash(key.hashCode()); int i = indexFor(hash, table.length); for (Entry<K,V> e = table[i]; e != null; e = e.next) { Object k; if (e.hash == hash && ((k = e.key) == key || key.equals(k))) { V oldValue = e.value; e.value = value; e.recordAccess(this); return oldValue; } } modCount++; addEntry(hash, key, value, i); return null; }根据key算出table下标;取到对应链表,遍历链表,获取key值相同的entry更新value值返回oldValue;如果没有新增entry,返回null。 6、addEntry 方法
/** * Adds a new entry with the specified key, value and hash code to * the specified bucket. It is the responsibility of this * method to resize the table if appropriate. * * Subclass overrides this to alter the behavior of put method. */ void addEntry(int hash, K key, V value, int bucketIndex) { Entry<K,V> e = table[bucketIndex]; table[bucketIndex] = new Entry<K,V>(hash, key, value, e); if (size++ >= threshold) resize(2 * table.length); }根据指定的hash值、key、value、bucketIndex(table下标)新增entry;如果大小超过阈值(capacity*loadFactor),table阔成原来的两倍。 7、resize方法
/** * Rehashes the contents of this map into a new array with a * larger capacity. This method is called automatically when the * number of keys in this map reaches its threshold. * * If current capacity is MAXIMUM_CAPACITY, this method does not * resize the map, but sets threshold to Integer.MAX_VALUE. * This has the effect of preventing future calls. * * @param newCapacity the new capacity, MUST be a power of two; * must be greater than current capacity unless current * capacity is MAXIMUM_CAPACITY (in which case value * is irrelevant). */ void resize(int newCapacity) { Entry[] oldTable = table; int oldCapacity = oldTable.length; if (oldCapacity == MAXIMUM_CAPACITY) { threshold = Integer.MAX_VALUE; return; } Entry[] newTable = new Entry[newCapacity]; transfer(newTable); table = newTable; threshold = (int)(newCapacity * loadFactor); }如果旧表容量已经到MAXIMUM_CAPACITY, threshold设置成最大int值,table不做扩大。 否则根据传入新capacity,创建新table,将旧table数据迁移到新table,重新计算threshold(阈值)。
8、transfer方法,将oldTable数据迁移到newTable
/**
* Transfers all entries from current table to newTable.
*/
void transfer(Entry[] newTable) {
Entry[] src = table;
int newCapacity = newTable.length;
for (int j = 0; j < src.length; j++) {
Entry<K,V> e = src[j];
if (e != null) {
//oldtable数据设置为null,我理解能够快速垃圾回收。
src[j] = null;
do {
Entry<K,V> next = e.next;
int i = indexFor(e.hash, newCapacity);
e.next = newTable[i];
newTable[i] = e;
e = next;
} while (e != null);
}
}
}
遍历旧table的entry,重新计算index,存入到new table。
9、HashIterator内部类,HashMap迭代器基类。
private abstract class HashIterator<E> implements Iterator<E> { Entry<K,V> next; // next entry to return int expectedModCount; // For fast-fail int index; // current slot Entry<K,V> current; // current entry HashIterator() { expectedModCount = modCount; if (size > 0) { // advance to first entry Entry[] t = table; while (index < t.length && (next = t[index++]) == null) ; } } public final boolean hasNext() { return next != null; } final Entry<K,V> nextEntry() { if (modCount != expectedModCount) throw new ConcurrentModificationException(); Entry<K,V> e = next; if (e == null) throw new NoSuchElementException(); if ((next = e.next) == null) { Entry[] t = table; while (index < t.length && (next = t[index++]) == null) ; } current = e; return e; } //可以用这个方法一边遍历一边删除。 public void remove() { if (current == null) throw new IllegalStateException(); if (modCount != expectedModCount) throw new ConcurrentModificationException(); Object k = current.key; current = null; HashMap.this.removeEntryForKey(k); expectedModCount = modCount; } } private final class ValueIterator extends HashIterator<V> { public V next() { return nextEntry().value; } } private final class KeyIterator extends HashIterator<K> { public K next() { return nextEntry().getKey(); } } private final class EntryIterator extends HashIterator<Map.Entry<K,V>> { public Map.Entry<K,V> next() { return nextEntry(); } } // Subclass overrides these to alter behavior of views' iterator() method Iterator<K> newKeyIterator() { return new KeyIterator(); } Iterator<V> newValueIterator() { return new ValueIterator(); } Iterator<Map.Entry<K,V>> newEntryIterator() { return new EntryIterator(); }
三、优势
1、功能方面:
(1)根据key值寻找对应value,应用范围非常广。
2、性能方面
(1)结合数组寻址快速,链表插入删除快速的优点,寻找删除快速。
四、劣势
1、功能
(1)无序
2、性能
3、可靠性
(1)非线程安全的,modCout参数有简单的防并发功能。
五、应用场景和使用注意
(1)大小已知时,HashMap初始化需要传入长度参数,因为不传参数,HashMap的bucket个数不够会做resize比较耗性能。