HashMap: Do You Know These Questions?

HashMap is one of the most frequently tested topics in Java interviews, and its <Key, Value> structure is one of the most commonly used in development. You may have used HashMap, but do you know the answers to these questions?

  • What is the underlying structure of HashMap?

If you can answer "array + linked list", do you also know about the red-black tree introduced in version 1.8?

  • If you mention the red-black tree, do you know its structure?

If you know the red-black tree, do you know that it combines the advantages of the balanced binary tree and the 2-3 tree? And do you know the structure of those two trees?

  • Now that you know about tree index structures, have you looked into the index structures used by various databases?

You probably know that MySQL uses a B+ tree structure, but do you know why it uses that structure? And winding back: why does HashMap use a red-black tree instead of a B+ tree? Why do databases use B+ trees?

  • Do you understand HashMap's resizing mechanism? And do you know why HashMap's capacity is always kept at a power of two?

  • HashMap is not thread-safe; do you know the main situations in which things go wrong?

Well, from here on, let's go over these issues.


Contents

  • What is the underlying structure of HashMap?
  • Understanding red-black trees, starting from 2-3 trees
    • 2-3 tree
    • Red-black tree
  • Do you know the index structures used by various databases?
  • Why do databases choose B+ tree indexes? Why does HashMap choose a red-black tree?
  • Do you understand HashMap's resizing mechanism? And do you know why its capacity is kept at a power of two?
  • What are the main situations in which HashMap is not thread-safe?
  • A small Easter egg

What is the underlying structure of HashMap?

This question takes us back through the JDK versions. Before JDK 1.8, HashMap used an array + linked list structure, and the main reason for that structure is the hash algorithm. HashMap aims to keep data access at O(1) complexity. It is a <Key, Value> structure: in memory, the key is run through a hash algorithm to produce a hashcode, and that hashcode gives the index of the array slot where the value is stored. When we want to look up the value for a key, we only need to compute the hash once to obtain the array subscript, with no tedious traversal.

Because different objects may produce the same hashcode after hashing, a linked list structure is used here: when several entries hit the same index, the slot is extended into a linked list.


If the hash function is not carefully designed, or the inserted data itself is problematic, many keys may end up with the same hashcode. In that case, after obtaining the array subscript, we still have to traverse the linked list at that slot to find the specific value, which hurts HashMap's access speed.
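To make that picture concrete, here is a minimal sketch of the "array + linked list" idea (a toy class written only for illustration; it has no resizing, no null handling, and none of HashMap's real internals):

public class TinyChainedMap<K, V> {
    // One chain node per stored entry; collisions are linked through `next`.
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;
        Entry(K key, V value, Entry<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16]; // power-of-two length

    // Hash the key once and mask it into an array subscript.
    private int indexFor(Object key) {
        return (table.length - 1) & key.hashCode();
    }

    public void put(K key, V value) {
        int i = indexFor(key);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.key.equals(key)) { e.value = value; return; }   // key already present
        }
        table[i] = new Entry<>(key, value, table[i]);             // collision -> grow the chain
    }

    public V get(Object key) {
        for (Entry<K, V> e = table[indexFor(key)]; e != null; e = e.next) {
            if (e.key.equals(key)) return e.value;                // usually the first node: ~O(1)
        }
        return null;
    }
}

When many keys share an index, get() degenerates into walking that chain, which is exactly the problem described next.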

Therefore, JDK 1.8 introduced the red-black tree structure to improve efficiency: when the length of a bucket's linked list reaches 8 (the default) and the table length is at least 64 (otherwise the table is resized first), the linked list is converted into a red-black tree.

Suppose hash collisions are severe and one array slot is followed by a very long linked list; then the lookup time complexity is O(n). With a red-black tree, it is O(log n).
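To feel how much the hash function matters, here is a small demo; the BadKey class is hypothetical and exists only to force every entry into one bucket:

import java.util.HashMap;
import java.util.Map;

public class BadHashDemo {
    // A deliberately terrible key: every instance has the same hashcode,
    // so every entry lands in the same bucket.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) { return o instanceof BadKey && ((BadKey) o).id == id; }
        @Override public int hashCode() { return 42; }
        @Override public int compareTo(BadKey o) { return Integer.compare(id, o.id); }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            map.put(new BadKey(i), i);
        }
        // Before JDK 1.8 this single bucket would be a 10,000-node linked list,
        // so every lookup is O(n). Since 1.8 the bucket is treeified once the
        // chain passes TREEIFY_THRESHOLD, so lookups stay close to O(log n).
        System.out.println(map.get(new BadKey(9_999)));   // 9999
    }
}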


Before moving on to the next question, here is some of HashMap's source code; these are the key fields you need to know.

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

DEFAULT_INITIAL_CAPACITY is the default initial capacity, i.e. the array size you get when you simply call new HashMap().

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

DEFAULT_LOAD_FACTOR is the load factor. The product load factor * current capacity determines when the container is resized: for example, with a capacity of 16 and a load factor of 0.75, the threshold is 16 * 0.75 = 12, and once the number of entries exceeds 12 the container is resized.
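As a small sketch of that arithmetic (these are just the documented default values, not copied JDK code):

public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;              // DEFAULT_INITIAL_CAPACITY
        float loadFactor = 0.75f;       // DEFAULT_LOAD_FACTOR
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold);  // 12 -> inserting the 13th entry triggers a resize
        // After the resize the capacity doubles to 32 and the new threshold is 24.
    }
}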

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

MAXIMUM_CAPACITY is the maximum capacity the table can ever be resized to.

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

TREEIFY_THRESHOLD is the chain length at which a bucket's linked list is converted into a red-black tree.

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;

UNTREEIFY_THRESHOLD: during a resize operation, a red-black tree bin whose node count drops below this value is degraded back into a linked list.

/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 */
static final int MIN_TREEIFY_CAPACITY = 64;

MIN_TREEIFY_CAPACITY: before a bin is converted into a tree, there is a check that the table capacity is at least 64; only then does the conversion happen. This avoids unnecessary conversions early in the hash table's life, when several keys may happen to land in the same bucket.

/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 */
transient Node<K,V>[] table;

table is the array we have been talking about.

Understanding red-black trees, starting from 2-3 trees

2-3 tree

You should already know the binary search tree. In a 2-3 tree, a node that holds a single key (and its value) is called a 2-node, and a node that holds two keys is called a 3-node.

[Figure: 2-nodes and 3-nodes]

A 2-node holds one key (and its associated value) and two links: the left link points to a 2-3 subtree whose keys are all smaller than the node's key, and the right link points to a 2-3 subtree whose keys are all larger.

A 3-node holds two keys (and their associated values) and three links: the left link points to a 2-3 subtree whose keys are smaller than both of the node's keys, the middle link points to a subtree whose keys lie between the node's two keys, and the right link points to a subtree whose keys are larger than both.

A link that points to an empty subtree is called a null link.

In a perfectly balanced 2-3 search tree, every null link is the same distance from the root.
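As a structural sketch only (a hypothetical class, with the insertion and splitting logic deliberately left out), the two node types can be pictured like this:

// A 2-3 tree node is either a 2-node (one key) or a 3-node (two keys).
class Node23<K extends Comparable<K>, V> {
    K leftKey;                  // always present
    V leftValue;
    K rightKey;                 // null for a 2-node, set for a 3-node
    V rightValue;

    Node23<K, V> left;          // keys smaller than leftKey
    Node23<K, V> middle;        // keys between leftKey and rightKey (3-node only)
    Node23<K, V> right;         // keys larger than rightKey (or larger than leftKey in a 2-node)

    boolean isTwoNode()   { return rightKey == null; }
    boolean isThreeNode() { return rightKey != null; }
}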

So the question is: what is the point of a 2-3 tree?

You may have noticed a drawback of the binary search tree: although looking up a node is fast, it has a big disadvantage in that inserting a new node may require adjusting the whole tree to keep it balanced.

The 2-3 tree is different. When we want to insert a new key, if the search ends at a 2-node, we can simply turn that 2-node into a 3-node, avoiding any rebalancing work.

[Figure: inserting into a 2-node]

If the search ends at a 3-node, that 3-node can be split into three 2-nodes.

[Figure: inserting into a 3-node]

If the new key goes into a 3-node whose parent is a 2-node, the 3-node is split into three 2-nodes, and one of those 2-nodes is then merged with the parent 2-node to form a 3-node.

[Figure: inserting into a 3-node whose parent is a 2-node]

So the 2-3 tree stays efficient for lookups while also being efficient for insertions.

Red-black tree

The basic idea behind the red-black tree is to represent a 2-3 tree with a standard binary search tree (made up entirely of 2-nodes) plus some extra information (to stand in for the 3-nodes). Links in the tree come in two kinds: a red link joins two 2-nodes to form a 3-node, while a black link is an ordinary link of the 2-3 tree. More precisely, a 3-node is represented as two 2-nodes connected by a left-leaning red link. With this representation, the red-black tree can look up nodes with the standard binary search tree get method, and when inserting we can transform nodes so that the tree always corresponds to an equivalent 2-3 tree.

[Figure: a red-black tree and the 2-3 tree it represents]

You can observe that:

  • Red links are always left links
  • No node is connected to two red links at the same time

If you "flatten" the drawing of a red-black tree, you can see what is going on.

[Figure: the red-black tree drawn flat]

Red-black trees have a notorious pain point: rotation. You have probably been annoyed by it before, so let's look at the essence of rotation from the 2-3 tree's point of view.

[Figures: left rotation and right rotation]

The essential purpose of left and right rotations is to guarantee that red links are always left links.
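Here is a minimal sketch of those two rotations, written in the left-leaning red-black tree style described above (the textbook formulation, not HashMap's internal TreeNode code):

class LLRB<K extends Comparable<K>, V> {
    private static final boolean RED = true;

    private final class Node {
        K key; V value;
        Node left, right;
        boolean color;          // color of the link from the parent to this node
        Node(K key, V value, boolean color) { this.key = key; this.value = value; this.color = color; }
    }

    // Turn a right-leaning red link into a left-leaning one.
    private Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        x.color = h.color;      // x takes over h's link color
        h.color = RED;          // the link between them stays red, now leaning left
        return x;
    }

    // Turn a left-leaning red link into a right-leaning one
    // (used temporarily while rebalancing).
    private Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }
}

Notice that a rotation only re-links three references; the in-order sequence of keys, and therefore the underlying 2-3 tree, is unchanged.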

Do you know the index structures used by various databases?

Here we will cover the binary search tree, the balanced binary tree, the B-Tree, the B+-Tree, and the hash structure.

  • Binary search tree

Each node has at most two subtrees. For any given node, the nodes in its left subtree are smaller than it and the nodes in its right subtree are larger than it.


  • Balanced binary tree

Built on the binary search tree, with the extra requirement that for any node the heights of its left and right subtrees differ by at most 1.

However, because each node of a binary tree has only two children, the tree becomes very tall and the number of disk I/O operations grows with its height; sometimes this is even less efficient than a full table scan. This is where the B-Tree comes in (see the height sketch after this list).

  • B-Tree


In a B-Tree of order m, each node has at most m children, the root has at least two children, and all leaf nodes sit on the same level. The goal is for every index block to store as much information as possible, so that the number of I/O operations is as small as possible.

  • B+-Tree

A node has as many pointers as it has keys, and all the data is stored in the leaf nodes.


So the B+ Tree is better suited to index storage: its disk read/write cost is low and its query performance is stable. This is also the index MySQL uses; in addition, to speed up queries, MySQL introduces data pointers so the underlying data can be accessed directly.

  • Hash index

The target is located directly through a hash computation.


  • Bitmap index

Modifying data has a very large impact on other data.


At present only Oracle uses this kind of index.
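To make the "tree height means disk reads" argument from the list above concrete, here is a rough back-of-the-envelope sketch; the fan-out of 500 children per node is an illustrative assumption, not a measured MySQL figure:

public class TreeHeightDemo {
    // Levels needed to index n keys when each node can have `fanout` children,
    // i.e. roughly log base `fanout` of n.
    static int height(long n, int fanout) {
        return (int) Math.ceil(Math.log(n) / Math.log(fanout));
    }

    public static void main(String[] args) {
        long keys = 100_000_000L;               // 100 million rows
        System.out.println(height(keys, 2));    // ~27 levels for a binary tree
        System.out.println(height(keys, 500));  // ~3 levels for a wide B/B+ tree node
        // Each level costs roughly one disk read, which is why databases prefer
        // wide, shallow B+ trees over tall binary trees.
    }
}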

Why do databases choose B+ tree indexes? Why does HashMap choose a red-black tree?

The answer to this question is: the disk.

A database query works on data that lives on disk; the data is read from disk and then organized into the index structure.

HashMap, on the other hand, lives in memory.

Reading data from disk and from memory is very different. The smallest unit a disk reads at a time is a cluster, which may be 2, 4, 8, 16, 32 or 64 sectors of data, whereas memory can be read at a much finer granularity.

Under these conditions, if we used a red-black tree in a database, the index we build could become unimaginably deep; and if we used a B+ tree in HashMap, HashMap's frequent insertions would force the B+ tree to be restructured constantly.

Do you understand HashMap's resizing mechanism? And do you know why HashMap's capacity is kept at a power of two?

HashMap is resized mainly when the current number of entries reaches load factor * capacity.

The default load factor is 0.75. This value is used because a value that is too small makes the map resize before it really needs to, while a value that is too large makes it resize too late and collisions hurt performance, so 0.75 was chosen as a compromise.

The other question is why HashMap keeps its capacity at a power of two.

You can think of it as a way to reduce collisions when computing the index: with a power-of-two capacity, the array subscript computation spreads keys very evenly.

Here is an illustration from an article I read earlier:

[Figure: computing the array index with lengths 16 and 15]

In the figure, the two groups on the left use an array length of 16 (2 to the 4th power) and the two groups on the right use a length of 15; the hashcodes in both are 8 and 9. Clearly, when 8 and 9 are ANDed with 1110 (length 15 minus 1), they produce the same result, which means they land in the same array slot: a collision. 8 and 9 are put on the same linked list, and a query then has to traverse that list to find 8 or 9, which lowers query efficiency.

We can also see that when the array length is 15, the hashcode is ANDed with 14 (1110), so the last bit of the result is always 0. The slots 0001, 0011, 0101, 1001, 1011, 0111 and 1101 can never hold an element, which wastes a great deal of space. Worse, the number of usable slots is far smaller than the array length, which further increases the chance of collisions and slows queries down!

So when the array length is a power of two, different keys are less likely to compute the same index, the data is distributed more evenly over the array, collisions are rarer, and a query seldom has to traverse a linked list at a slot, so query efficiency is higher.
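A small demo of that masking; indexFor below is a hypothetical helper that mirrors the hash & (length - 1) step, and the lengths 16 and 15 match the figure:

public class PowerOfTwoDemo {
    static int indexFor(int hash, int length) {
        return hash & (length - 1);    // how the array subscript is derived
    }

    public static void main(String[] args) {
        // length 16 -> mask 1111: hashcodes 8 and 9 get different slots
        System.out.println(indexFor(8, 16) + " " + indexFor(9, 16));   // 8 9
        // length 15 -> mask 1110: 8 and 9 collide, and odd slots are never used
        System.out.println(indexFor(8, 15) + " " + indexFor(9, 15));   // 8 8
    }
}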

What are the main situations in which HashMap is not thread-safe?

The main thread-safety problem in HashMap occurs during resizing: when resize() rehashes the entries (most notoriously in JDK 1.7), concurrent modification can easily produce a circular linked list.

Then, when you look up a key that does not exist and the computed index happens to point at the bucket holding that circular list, the traversal never terminates and you get an infinite loop.

A rehash operation rebuilds the internal data structure, roughly doubling the number of buckets in the hash table. As a general rule, the default load factor (0.75) offers a good tradeoff between time and space costs. A higher value reduces the space overhead but increases the lookup cost (reflected in most operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting the initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operation will ever occur.
If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large initial capacity lets the mappings be stored more efficiently than letting the table grow through automatic rehashing as needed.
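A quick way to see the lack of thread safety in practice is the demo below. The outcome is nondeterministic: you will usually see a size below 100000 because entries are lost during concurrent resizing, and on old JDK 1.7 builds the program can even hang on the circular list described above.

import java.util.HashMap;
import java.util.Map;

public class UnsafeHashMapDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new HashMap<>();

        // Two writers insert disjoint key ranges while the map resizes repeatedly.
        Thread t1 = new Thread(() -> { for (int i = 0;      i < 50_000;  i++) map.put(i, i); });
        Thread t2 = new Thread(() -> { for (int i = 50_000; i < 100_000; i++) map.put(i, i); });

        t1.start(); t2.start();
        t1.join();  t2.join();

        // Should be 100000, but without synchronization entries are routinely lost.
        System.out.println(map.size());
        // Fix: Collections.synchronizedMap(new HashMap<>()) or ConcurrentHashMap.
    }
}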


A small Easter egg

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

This is HashMap's hash function. Notice the ^ (h >>> 16) operation.

The purpose of ^ (h >>> 16) is that the high 16 bits of the hashcode otherwise play almost no part in choosing the bucket, because only the low bits survive the index mask. XOR-ing the hashcode with its own high 16 bits lets the high 16 bits also take part in the hash calculation.
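A quick demonstration; the two sample hashcodes are made up so that they differ only in their high 16 bits:

public class HashSpreadDemo {
    // The same spreading step as HashMap.hash(): XOR the hashcode with its high 16 bits.
    static int spread(int h) { return h ^ (h >>> 16); }

    public static void main(String[] args) {
        int a = 0x0001_0001;            // differs from b only in the high 16 bits
        int b = 0x0002_0001;
        int mask = 16 - 1;              // table length 16 -> only the low 4 bits pick the bucket

        System.out.println((a & mask) + " " + (b & mask));                  // 1 1 -> collision
        System.out.println((spread(a) & mask) + " " + (spread(b) & mask));  // 0 3 -> spread apart
    }
}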


Origin www.cnblogs.com/LexMoon/p/HashMapDep.html