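This article uses the JDK 1.8 source code as its example.
I. The Underlying Structure of HashMap
HashMap is implemented with an array plus singly linked lists plus red-black trees. The structure is shown in the following diagram:
We explain the diagram below with reference to the source code. The HashMap class implements the Map interface (and extends AbstractMap). It has four public constructors; the three discussed here are listed below (the fourth, HashMap(Map<? extends K, ? extends V> m), copies the mappings of another map).
The first: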
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and load factor.
*
* @param initialCapacity the initial capacity
* @param loadFactor the load factor
* @throws IllegalArgumentException if the initial capacity is negative
* or the load factor is nonpositive
*/
public HashMap(int initialCapacity, float loadFactor){
......
}
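This constructor lets you specify both the initial capacity (initialCapacity) and the load factor (loadFactor) of the HashMap. The requested capacity is rounded up to the next power of two.
The second: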
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and the default load factor (0.75).
*
* @param initialCapacity the initial capacity.
* @throws IllegalArgumentException if the initial capacity is negative.
*/
public HashMap(int initialCapacity) {
this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
This constructor lets you specify the initial capacity, while the load factor takes the default value DEFAULT_LOAD_FACTOR=0.75.
The third:
/**
* Constructs an empty <tt>HashMap</tt> with the default initial capacity
* (16) and the default load factor (0.75).
*/
public HashMap() {
this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
A HashMap created with this constructor uses the default initial capacity and load factor, 16 and 0.75 respectively. The relevant source is:
/**
* The default initial capacity - MUST be a power of two.
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
/**
* The load factor used when none specified in constructor.
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;
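As a quick illustration of the three constructors above, here is a minimal sketch. The class name ConstructorDemo and the helper methods fill and rejectsNegativeCapacity are made up for this example; only the HashMap constructors themselves come from the JDK.

```java
import java.util.HashMap;
import java.util.Map;

public class ConstructorDemo {
    // Puts `count` distinct keys into the map and returns the resulting size.
    static int fill(Map<Integer, String> map, int count) {
        for (int i = 0; i < count; i++) {
            map.put(i, "v" + i);
        }
        return map.size();
    }

    // Returns true if the constructor rejects a negative initial capacity.
    static boolean rejectsNegativeCapacity() {
        try {
            new HashMap<Integer, String>(-1);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> a = new HashMap<>(64, 0.5f); // capacity and load factor
        Map<Integer, String> b = new HashMap<>(64);       // capacity only, load factor 0.75
        Map<Integer, String> c = new HashMap<>();         // defaults: 16 and 0.75

        // All three behave the same from the caller's point of view;
        // the parameters only influence when the internal table is resized.
        System.out.println(fill(a, 100) + " " + fill(b, 100) + " " + fill(c, 100));
        System.out.println(rejectsNegativeCapacity());
    }
}
```

The constructor arguments are a tuning knob rather than a limit: each map above accepts all 100 entries regardless of its initial capacity.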
Now let us build a HashMap of our own with the third constructor and add a (key,value) pair to it. What happens?
Map<Integer,String> map = new HashMap<>();
map.put(1,null);
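The first line constructs an empty map; at this point it contains nothing, and the internal array has not even been allocated yet. The second line calls put to add the pair (1,null). (HashMap accepts entries whose key or value is null, whereas Hashtable does not; this is one of the differences between the two.) The source of put is as follows: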
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}
put delegates to putVal, passing hash(key), the hash computed from the key, as the first argument. The source of putVal is as follows:
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table is null or has length 0, call resize() to allocate it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Derive array index i from the hash of the key being put.
    // If slot i is empty, create a Node and store it there directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // Slot i already holds data
    else {
        // e will hold the existing node whose key equals the key being put
        Node<K,V> e; K k;
        // The first node in slot i has the same key: remember it in e
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Slot i holds a red-black tree: insert with the tree's insertion routine,
        // which returns the existing node with the same key, if there is one
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Slot i holds a singly linked list
        else {
            // Walk the list; if no node has the same key, append the new node at the tail
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If the list already held TREEIFY_THRESHOLD (8) nodes before
                    // this append, call treeifyBin to convert it to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // A node with the same key was found; it is now held in e
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // An existing node e with the same key was found: update its value
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
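The basic idea of putVal:
1. Before the first key-value pair is added, the table of the empty map is allocated by resize(). With the no-argument constructor the capacity is 16 and the load factor 0.75.
2. The array index i is derived from the hash of the key. Depending on whether that slot already holds data, there are two cases:
(1) The slot is empty: a Node<Integer,String> is created from the pair (key,value) and stored there directly.
(2) The slot holds data, which again splits into two cases (a single node can be regarded as a list of length one):
a. The data is a linked list of Node<Integer,String>: the list is searched for a node with the same key, and if none exists the new node is appended to the tail. If the list already held 8 nodes before the append, treeifyBin is called, which converts the list to a red-black tree (or, when the table has fewer than 64 slots, resizes the table instead). Any existing node with the same key is kept in a temporary variable.
b. The data is a red-black tree: the entry is inserted with the tree's insertion routine, and any existing node with the same key is likewise kept in a temporary variable.
3. In both sub-cases of (2), if the map already contains the key, no new node is inserted; the value of the existing node held in the temporary variable is updated instead.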
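The return values of put follow directly from the code path summarized above. The sketch below is only an illustration (the class PutSemanticsDemo and its run method are invented for this example); it also shows putIfAbsent, which in JDK 1.8 calls putVal with onlyIfAbsent set to true.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PutSemanticsDemo {
    // Collects what each call returns so the sequence can be inspected.
    static List<String> run() {
        Map<Integer, String> map = new HashMap<>();
        List<String> returned = new ArrayList<>();
        returned.add(map.put(1, null));            // no previous mapping: returns null
        returned.add(map.put(1, "hello"));         // previous value was null: returns null
        returned.add(map.put(1, "world"));         // overwrites "hello" and returns it
        returned.add(map.putIfAbsent(1, "other")); // keeps "world" and returns it
        returned.add(map.get(1));                  // still "world"
        return returned;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [null, null, hello, world, world]
    }
}
```

Note that a null return from put is ambiguous: it can mean "no previous mapping" or "the previous value was null". Use containsKey when the distinction matters.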
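In putVal, the statement
n = (tab = resize()).length;
calls the resize method shown below, which allocates the table on first use and grows it on later calls: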
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
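The lo/hi split in resize relies on the capacity being a power of two: after doubling, a node either stays at index j or moves to j + oldCap, depending on the single bit hash & oldCap. The following sketch reproduces only that arithmetic; the class ResizeSplitDemo and its methods are illustrative names, not part of the JDK.

```java
public class ResizeSplitDemo {
    // Same index computation as in putVal: (n - 1) & hash.
    static int indexFor(int hash, int capacity) {
        return (capacity - 1) & hash;
    }

    // Mirrors the test in resize(): true means the node goes to the "lo" list
    // and keeps index j; false means the "hi" list at index j + oldCap.
    static boolean staysInLowBin(int hash, int oldCap) {
        return (hash & oldCap) == 0;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1;
        int[] hashes = {1, 17, 33, 49}; // all map to index 1 when the capacity is 16
        for (int h : hashes) {
            System.out.println("hash=" + h
                    + " old index=" + indexFor(h, oldCap)
                    + " new index=" + indexFor(h, newCap)
                    + " lo=" + staysInLowBin(h, oldCap));
        }
    }
}
```

Hashes 1 and 33 stay at index 1, while 17 and 49 move to index 17 = 1 + 16, so a resize never needs to recompute the hash and tends to halve long bins.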
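Once execution reaches
@SuppressWarnings({"rawtypes","unchecked"})
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
the real meaning of the 16 small boxes inside the yellow dashed line in the diagram becomes clear: they are simply an array of length 16 whose element type is Node<Integer,String>.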
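So what is the load factor for? It determines when the table is resized. With 0.75, once the number of key-value pairs in the map exceeds 16*0.75=12, the map automatically doubles its capacity to 32. The next time, once the number of pairs exceeds 32*0.75=24, the capacity grows to 64, and so on. Why not wait until 16 entries have been stored before resizing? To answer this, let us first look at how data is placed into the array. In putVal we saw that putting a key-value pair into the map requires the hash of the key, which is computed as follows: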
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
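To make the interplay between hash and the index expression (n - 1) & hash from putVal concrete, here is a small sketch. The spread method copies the body of hash shown above; the class name HashIndexDemo and the method names are chosen for this example only.

```java
public class HashIndexDemo {
    // Same computation as HashMap.hash(): XOR the high 16 bits into the low 16 bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Index of the key in a table of length n (n must be a power of two).
    static int indexFor(Object key, int n) {
        return (n - 1) & spread(key);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(1, 16));     // 1
        System.out.println(indexFor(13, 16));    // 13
        System.out.println(indexFor(17, 16));    // 1, collides with key 1
        System.out.println(indexFor(null, 16));  // 0, the null key always lands in slot 0
        // 65536 has only high bits set; without spreading it would map to index 0.
        System.out.println(indexFor(65536, 16)); // 1
    }
}
```

Because the mask n - 1 keeps only the low bits, the XOR with h >>> 16 is what lets the high bits of hashCode influence the slot in small tables.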
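To keep the explanation simple, let us make an assumption: suppose hash simply takes the key modulo some integer, say 12, and that the result is used directly as the array index. (This is only an illustration; the real index is (n - 1) & hash, as seen in putVal.) Under this assumption, the pair (1,null) we just put is handled as follows. First the table is allocated, giving an array of capacity 16. Then the hash of 1 is computed as hash(1)=1%12=1, which selects index 1, the second small box in the diagram. That box is still empty, so the entry is stored there directly. In the same way, the pair (0,"h") goes into the first small box.
How are the pairs (13,"apple") and (1,"hello") stored? Take (13,"apple") first: hash(13)=13%12=1 selects index 1, but that slot already holds (1,null). This is a conflict, known as a hash collision. Discarding the existing entry would clearly be wrong, so (13,"apple") is appended after (1,null) as a linked-list node. (1,"hello") also lands in the second small box. As we saw, putVal has a parameter onlyIfAbsent, which put passes as false, meaning that when the key already exists its value is overwritten, so null is replaced by "hello".
If many entries collide, for example (25,"h"), (37,"h"), (49,"h"), (61,"h") and so on all map to index 1, appending each of them to the tail produces a very long list, and lookups become slow. To improve lookup efficiency, HashMap takes a different approach: once the number of elements in a slot reaches a threshold (8 by default; more precisely, when a node is appended to a list that already holds 8 nodes, and provided the table has at least 64 slots), the bin is stored as a red-black tree. The source is as follows: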
/**
* The bin count threshold for using a tree rather than list for a
* bin. Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
*/
static final int TREEIFY_THRESHOLD = 8;
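In other words, once the list at index 1 holds 8 elements, the next node appended to it causes the singly linked list to be converted to a red-black tree (subject to the table-size check in treeifyBin), as the diagram shows. Now back to the load factor. The larger the load factor, the more elements are stored before a resize and the better the space utilization, but the more likely collisions become. For example, with a load factor of 1 the resize threshold is 16*1=16, so the table is resized only after 16 entries have been stored. Space utilization is clearly high, but once 15 slots are occupied the probability that the next insertion collides is as high as 15/16 (not necessarily exact, because it depends on the hash function). Collisions are then very likely, lists and red-black trees form easily, and lookups slow down. Conversely, the smaller the load factor, the fewer elements are stored before a resize and the fewer collisions occur, but more space is wasted. For example, with a load factor of 1/16 the resize threshold is 16*1/16=1, so the table is resized as soon as more than one element has been added. That wastes space, although the larger array makes collisions less likely. A load factor that is too large or too small is therefore undesirable; a balance has to be struck between collision probability and space utilization, and the default of 0.75 is a good (though not necessarily the best) compromise.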
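The threshold arithmetic used in this discussion can be reproduced with a few lines. This is a sketch of the formula (int)(capacity * loadFactor) that appears in resize, not a way to inspect a live HashMap; the class ThresholdDemo and its threshold method are names invented for this example.

```java
public class ThresholdDemo {
    // Resize threshold as computed in resize(): capacity * loadFactor, truncated to int.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Default load factor: thresholds 12, 24, 48 for capacities 16, 32, 64.
        for (int capacity = 16; capacity <= 64; capacity <<= 1) {
            System.out.println(capacity + " -> " + threshold(capacity, 0.75f));
        }
        // The two extremes discussed in the text.
        System.out.println(threshold(16, 1.0f));      // 16
        System.out.println(threshold(16, 1.0f / 16)); // 1
    }
}
```

A resize is triggered when the entry count exceeds the threshold (++size > threshold), so with the defaults the 13th entry causes the first doubling.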
II. Differences Between HashMap and Hashtable
The official documentation (the class comment of HashMap) reads:
* Hash table based implementation of the <tt>Map</tt> interface. This
* implementation provides all of the optional map operations, and permits
* <tt>null</tt> values and the <tt>null</tt> key. (The <tt>HashMap</tt>
* class is roughly equivalent to <tt>Hashtable</tt>, except that it is
* unsynchronized and permits nulls.) This class makes no guarantees as to
* the order of the map; in particular, it does not guarantee that the order
* will remain constant over time.
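HashMap and Hashtable are roughly equivalent. The documentation names two differences, and a third (the default initial capacity) is added at the end of this section:
(1) permits nulls
HashMap allows a null key and null values, whereas Hashtable allows neither, as mentioned earlier. A test: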
package com.leboop;

import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.Map.Entry;

public class HashMapTest {
    public static void main(String[] args) {
        // Create the map and the table
        Map<Integer,String> map = new HashMap<>();
        Map<Integer,String> table = new Hashtable<>();
        // Adding key=null, value=null to the map works
        map.put(null, null);
        // Prints 1
        System.out.println(map.size());
        // Adding key=null to the table throws a NullPointerException
        try {
            table.put(null, "hello");
        } catch (NullPointerException e) {
            // Prints: null key rejected: java.lang.NullPointerException
            System.out.println("null key rejected: " + e);
        }
        map.put(1, null);
        map.put(2, null);
        /**
         * Prints:
         * null=null
         * 1=null
         * 2=null
         */
        for (Entry<Integer, String> e : map.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
        try {
            table.put(1, null);
        } catch (NullPointerException e) {
            // Prints: null value rejected: java.lang.NullPointerException
            System.out.println("null value rejected: " + e);
        }
    }
}
(2)unsynchronized
HashMap is not synchronized, which means it is not thread-safe, whereas Hashtable is thread-safe. Part of the Hashtable source is shown below:
public synchronized V put(K key, V value) {
    // Make sure the value is not null
    if (value == null) {
        throw new NullPointerException();
    }
    // Makes sure the key is not already in the hashtable.
    Entry<?,?> tab[] = table;
    int hash = key.hashCode();
    int index = (hash & 0x7FFFFFFF) % tab.length;
    @SuppressWarnings("unchecked")
    Entry<K,V> entry = (Entry<K,V>)tab[index];
    for(; entry != null ; entry = entry.next) {
        if ((entry.hash == hash) && entry.key.equals(key)) {
            V old = entry.value;
            entry.value = value;
            return old;
        }
    }
    addEntry(hash, key, value, index);
    return null;
}

public synchronized V remove(Object key) {
    Entry<?,?> tab[] = table;
    int hash = key.hashCode();
    int index = (hash & 0x7FFFFFFF) % tab.length;
    @SuppressWarnings("unchecked")
    Entry<K,V> e = (Entry<K,V>)tab[index];
    for(Entry<K,V> prev = null ; e != null ; prev = e, e = e.next) {
        if ((e.hash == hash) && e.key.equals(key)) {
            modCount++;
            if (prev != null) {
                prev.next = e.next;
            } else {
                tab[index] = e.next;
            }
            count--;
            V oldValue = e.value;
            e.value = null;
            return oldValue;
        }
    }
    return null;
}
Nearly all of the public methods of Hashtable carry the synchronized keyword, whereas the methods of HashMap do not. Consider the following test code first:
package com.leboop;

import java.util.HashMap;
import java.util.Map;

public class Difference {
    private static Map<String, String> map=new HashMap<>();

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            Thread t = new Thread("线程" + i){
                public void run() {
                    double i = Math.random() * 100000;
                    map.put("键" + i, "值" + i);
                    map.remove("键" + i);
                    System.out.println(Thread.currentThread().getName() + " size = " + map.size());
                }
            };
            t.start();
        }
    }
}
Output of one run (线程 is the thread-name prefix set in the code; results vary from run to run):
线程24 size = 0
线程23 size = 0
线程11 size = 0
线程12 size = 2
线程0 size = 1
线程20 size = 0
线程26 size = 0
线程3 size = 1
线程15 size = 1
线程1 size = 1
线程17 size = 1
线程6 size = 0
线程21 size = 0
线程22 size = 0
线程27 size = 0
线程25 size = 0
线程4 size = 1
线程29 size = 0
线程10 size = 1
线程8 size = 1
线程9 size = 2
线程2 size = 1
线程5 size = 1
线程32 size = 0
线程7 size = 2
线程43 size = 0
线程35 size = 2
线程41 size = 0
线程18 size = 2
线程19 size = 4
线程16 size = 0
线程37 size = 0
线程14 size = 1
线程13 size = 2
线程30 size = 0
线程36 size = 0
线程42 size = 0
线程40 size = 0
线程39 size = 0
线程31 size = 1
线程28 size = 0
线程53 size = 1
线程44 size = 0
线程46 size = 0
线程52 size = 0
线程50 size = 0
线程51 size = 0
线程49 size = 0
线程45 size = 0
线程47 size = 0
线程34 size = 0
线程48 size = 0
线程33 size = -1
线程38 size = -1
线程54 size = -1
线程56 size = -1
线程59 size = -1
线程57 size = -1
线程58 size = -1
线程60 size = -1
线程63 size = -1
线程55 size = -1
线程64 size = -1
线程62 size = -1
线程61 size = -1
线程65 size = -1
线程66 size = -1
线程67 size = -1
线程68 size = -1
线程69 size = -1
线程70 size = -1
线程71 size = -1
线程72 size = -1
线程73 size = -1
线程74 size = -1
线程75 size = -1
线程76 size = -1
线程77 size = -1
线程78 size = -1
线程79 size = -1
线程80 size = -1
线程81 size = -1
线程82 size = -1
线程84 size = -1
线程83 size = -1
线程85 size = -1
线程86 size = -1
线程87 size = -1
线程88 size = -1
线程89 size = -1
线程90 size = 0
线程91 size = -1
线程92 size = -1
线程93 size = -1
线程94 size = -1
线程95 size = -1
线程96 size = -1
线程97 size = -1
线程98 size = -1
线程99 size = -1
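The size = -1 lines in the output make the problem obvious: a map can never legitimately have a negative size. put and remove update the size field with plain ++size and --size, which are unsynchronized read-modify-write operations, and each thread may be working on its own copy of the variable read from main memory, so concurrent updates get lost. We will not go into further detail here.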
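For comparison, the same put-then-remove workload becomes deterministic once every access is synchronized. The sketch below is an illustration under stated assumptions: the class SynchronizedMapDemo and its run method are hypothetical names, each thread uses its own distinct key, and the main thread joins all workers before reading the size.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class SynchronizedMapDemo {
    // Starts `threads` workers that each put and then remove their own key,
    // waits for all of them, and returns the final size of the map.
    static int run(Map<String, String> map, int threads) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final String key = "key" + i;
            workers[i] = new Thread(() -> {
                map.put(key, "value");
                map.remove(key);
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.size();
    }

    public static void main(String[] args) {
        // Both maps synchronize every call, so the final size is always 0.
        System.out.println(run(new Hashtable<>(), 100));
        System.out.println(run(Collections.synchronizedMap(new HashMap<>()), 100));
    }
}
```

Passing a plain HashMap to run would reintroduce the race from the test above and could leave a non-zero or even negative size.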
(3) Default initial capacity
The default capacity of HashMap is 16; the default capacity of Hashtable is 11.
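The different default capacities go hand in hand with different index computations: Hashtable uses (hash & 0x7FFFFFFF) % tab.length, as in the put source above, while HashMap masks with (n - 1) & hash and therefore needs a power-of-two length. The sketch below contrasts the two; BucketIndexComparison and its method names are invented for this example.

```java
public class BucketIndexComparison {
    // Hashtable style: clear the sign bit, then take the remainder.
    static int hashtableIndex(int hashCode, int tableLength) {
        return (hashCode & 0x7FFFFFFF) % tableLength;
    }

    // HashMap style: spread the high bits, then mask (length must be a power of two).
    static int hashMapIndex(int hashCode, int tableLength) {
        int h = hashCode ^ (hashCode >>> 16);
        return (tableLength - 1) & h;
    }

    public static void main(String[] args) {
        int[] hashCodes = {13, 27, -7};
        for (int hc : hashCodes) {
            System.out.println("hashCode=" + hc
                    + " Hashtable(11)=" + hashtableIndex(hc, 11)
                    + " HashMap(16)=" + hashMapIndex(hc, 16));
        }
    }
}
```

Both expressions always yield a valid, non-negative index, even for negative hash codes; the mask is cheaper than the remainder but only works for power-of-two table lengths.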
III. A Brief Introduction to ConcurrentHashMap
* If a thread-safe implementation is not needed, it is recommended to use
* {@link HashMap} in place of {@code Hashtable}. If a thread-safe
* highly-concurrent implementation is desired, then it is recommended
* to use {@link java.util.concurrent.ConcurrentHashMap} in place of
* {@code Hashtable}.
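The passage above comes from the class comment of Hashtable: if thread safety is not required, HashMap is recommended; if thread safety is required, ConcurrentHashMap is recommended. ConcurrentHashMap does not accept null as a key or as a value. In JDK 1.8 it achieves thread safety with volatile fields and CAS operations, and uses synchronized only on the head node of an individual bin rather than on the whole map. Because threads contend for a lock far less often than with the fully synchronized methods of Hashtable (blocking and thread context switches are expensive), it is generally more efficient under concurrent access.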
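A minimal sketch of these two properties follows. The class ConcurrentHashMapDemo and its methods are illustrative names; the calls put and merge are part of the standard ConcurrentHashMap API, and merge performs its update atomically.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentHashMapDemo {
    // Returns true if a null value is rejected with a NullPointerException.
    static boolean rejectsNullValue() {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        try {
            map.put("key", null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Lets `threads` workers each increment one shared counter `perThread` times
    // and returns the final count after all workers have finished.
    static int countConcurrently(int threads, int perThread) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counts.merge("hits", 1, Integer::sum); // atomic read-modify-write
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(rejectsNullValue());         // true
        System.out.println(countConcurrently(8, 1000)); // 8000
    }
}
```

Individual operations such as merge are atomic, but a sequence like get followed by put is not, so compound updates should go through merge, compute, or putIfAbsent.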