[Collections] Java Fundamentals Interview Questions (2)

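1. What collections does Java have

Java containers fall into two broad families, Collection and Map, each with many subtypes. The Iterable interface is the root interface of the Collection hierarchy, and any class that implements Iterable can be used in an enhanced for loop.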
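As a small illustration of the enhanced for loop over an Iterable, here is a minimal sketch. The class name `IterationDemo` and the helper `join` are made up for this example. Note that Map itself does not extend Iterable, so it is iterated through one of its collection views such as `entrySet()`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IterationDemo {
    // Hypothetical helper: joins the elements of any Iterable with commas,
    // using the enhanced for loop.
    static String join(Iterable<String> items) {
        StringBuilder sb = new StringBuilder();
        for (String item : items) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(item);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        System.out.println(join(list)); // prints "a,b"

        // Map is not Iterable itself; iterate over its entry set instead.
        Map<String, Integer> map = new HashMap<>();
        map.put("x", 1);
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue()); // prints "x=1"
        }
    }
}
```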

2. HashMap's data structure

What does the data structure of HashMap look like?

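In Java, HashMap is a Map implementation backed by a hash table that stores data as key-value pairs. In Java 7 its structure is an array plus linked lists. In Java 8 it is an array plus linked lists or red-black trees. Each array element (a bucket) holds the head node of a linked list, and each node stores one key-value pair. When a key-value pair is added, HashMap first computes the array position from the key's hashCode value and then adds the pair to the list in that bucket: Java 8 appends at the tail, whereas Java 7 inserted at the head.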

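When the number of nodes in a list reaches a threshold (8 by default) and the array length is at least 64, the list is converted into a red-black tree to speed up lookups. When the node count becomes small again (6 or fewer), the red-black tree is converted back into a linked list to save space.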

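For a lookup, HashMap first computes the array position from the key's hashCode value, then walks the nodes of the corresponding linked list (or red-black tree), comparing keys until it finds the target node or runs out of nodes. Thanks to the hash table, a lookup is O(1) on average; in the worst case, when many keys fall into one bucket, it is O(n) for a linked list and O(log n) for a red-black tree.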

Note that HashMap is not thread-safe. In a multithreaded environment, either synchronize access yourself or use a thread-safe Map implementation.
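One possible way to follow this advice is sketched below, using `ConcurrentHashMap` from `java.util.concurrent` (wrapping a HashMap with `Collections.synchronizedMap` is another option). The class `ConcurrentCountDemo`, the method `countWith` and the key `"hits"` are hypothetical names chosen for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ConcurrentCountDemo {
    // Hypothetical helper: increments the same key from several threads
    // and returns the final count stored in the map.
    static int countWith(Map<String, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // merge is atomic on ConcurrentHashMap
                    map.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.getOrDefault("hits", 0);
    }

    public static void main(String[] args) {
        // With a thread-safe map no increment is lost.
        System.out.println(countWith(new ConcurrentHashMap<>(), 4, 10_000)); // prints 40000
    }
}
```

Passing a plain `HashMap` to the same method may lose updates or corrupt the map, which is the situation the paragraph above warns about.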

The linked-list node used inside HashMap is implemented as follows:

```java
/**
 * Basic hash bin node, used for most entries.  (See below for
 * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}
```

3. HashMap's put method

Describe what happens when HashMap's put method inserts an element.

Calling put on a HashMap goes through the following steps:

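First, the bucket (the position in the array) is computed from the hashCode of the key being inserted: the hash method applies a perturbation function to the key's hashCode, and the result is masked with (n - 1), which is equivalent to taking it modulo the array length n because n is always a power of two.
If the bucket already contains elements, they are traversed to check whether the same key is already present. If it is, its value is replaced with the new value and the old value is returned; if not, the new key-value pair is appended to the tail of the linked list (or inserted into the red-black tree).
After the new pair is added, if the list length has reached the threshold (8 by default), HashMap checks whether the list should become a red-black tree: it is converted only if the array length is at least 64; otherwise the table is resized instead.
Finally, if the size of the HashMap exceeds the threshold (array length * loadFactor) after the insertion, the HashMap is resized: a new array of twice the length is created and every element of the old array is moved to its position in the new one. In Java 8 the hash is not recomputed; the hash stored in each node decides whether the node keeps its index or moves to index + old capacity. The whole old array still has to be traversed, so resizing is expensive.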
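The relocation rule used when the table doubles can be sketched as follows. This is a simplified illustration rather than the JDK's `resize()` code; `ResizeSplitDemo`, `indexFor` and `indexAfterDoubling` are hypothetical names. It shows that the new index is either the old index or the old index plus the old capacity, depending on a single bit of the stored hash.

```java
final class ResizeSplitDemo {
    // Bucket index for a table whose capacity is a power of two.
    static int indexFor(int hash, int capacity) {
        return (capacity - 1) & hash;
    }

    // Index after the capacity doubles, derived from the old index and the
    // one extra hash bit that the larger mask exposes.
    static int indexAfterDoubling(int hash, int oldCapacity) {
        int oldIndex = indexFor(hash, oldCapacity);
        return (hash & oldCapacity) == 0 ? oldIndex : oldIndex + oldCapacity;
    }

    public static void main(String[] args) {
        int oldCapacity = 16;
        // All four hashes share bucket 5 in a table of 16 buckets.
        for (int hash : new int[] {5, 21, 37, 53}) {
            System.out.println(hash + ": " + indexFor(hash, oldCapacity)
                    + " -> " + indexAfterDoubling(hash, oldCapacity));
        }
        // prints 5: 5 -> 5, 21: 5 -> 21, 37: 5 -> 5, 53: 5 -> 21
    }
}
```

The result always agrees with recomputing `(2 * oldCapacity - 1) & hash`, so no key's hashCode has to be evaluated again.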

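Note that, because of hash collisions, different keys may have the same hashCode value, or different hashCode values that map to the same bucket, so one bucket can hold several key-value pairs. A lookup therefore first locates the bucket from the key's hashCode value and then traverses the elements in that bucket, comparing keys with equals, to find the matching pair.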
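A minimal sketch of this collision handling from the caller's side: the hypothetical key type `BadKey` below deliberately returns a constant hashCode, so every instance lands in the same bucket, yet the map still tells the keys apart through `equals`. All names in the example are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class CollisionDemo {
    // Hypothetical key type whose hashCode is deliberately constant,
    // so all instances collide in one bucket.
    static final class BadKey {
        final String name;

        BadKey(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 42;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && Objects.equals(name, ((BadKey) o).name);
        }
    }

    static Map<BadKey, Integer> build() {
        Map<BadKey, Integer> map = new HashMap<>();
        map.put(new BadKey("a"), 1);
        map.put(new BadKey("b"), 2);
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = build();
        System.out.println(map.size());               // prints 2
        System.out.println(map.get(new BadKey("b"))); // prints 2
        System.out.println(map.get(new BadKey("c"))); // prints null
    }
}
```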

```java
/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with key, or
 *         null if there was no mapping for key.
 *         (A null return can also indicate that the map
 *         previously associated null with key.)
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

/**
 * Implements Map.put and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;

    // If the table is null or empty, initialize it via resize()
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the bucket is empty, create a new node there.
    // (n - 1) & hash is the actual bucket index in the array;
    // older versions (e.g. Java 6) used the indexFor method
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {  // hash collision: the bucket is already occupied
        Node<K,V> e; K k;

        // If the head node has the same hash and key as the entry being
        // inserted, remember it in e
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // If the head node is a red-black tree node, insert via the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Otherwise the bucket is a linked list
        else {
            for (int binCount = 0; ; ++binCount) {
                // Reached the end of the list
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // binCount starts at 0, so binCount >= 7 means the bin
                    // already held at least 8 nodes; TREEIFY_THRESHOLD = 8
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Found a node with the same hash and key: stop traversing;
                // e already refers to that node
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }

        // A mapping for this key already exists
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            // Callback hook invoked when an existing mapping is accessed;
            // a no-op in HashMap, overridden by LinkedHashMap
            afterNodeAccess(e);
            return oldValue;
        }
    }

    ++modCount;
    // Resize when the number of entries exceeds the threshold
    if (++size > threshold)
        resize();
    // Callback hook invoked after an insertion; overridden by LinkedHashMap
    afterNodeInsertion(evict);
    return null;
}
```
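From the caller's side, the return value and the `onlyIfAbsent` flag of `putVal` show up as the different behavior of `put` and `putIfAbsent`. A minimal sketch (the class `PutReturnDemo` and method `run` are hypothetical names):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class PutReturnDemo {
    // Returns the three return values plus the final mapping for "k".
    static Integer[] run() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("k", 1);         // no previous mapping: null
        Integer second = map.put("k", 2);        // replaces 1 and returns it
        Integer third = map.putIfAbsent("k", 3); // keeps 2 and returns it
        return new Integer[] {first, second, third, map.get("k")};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // prints [null, 1, 2, 2]
    }
}
```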

4. HashMap's hash method

How does HashMap compute the hash of an element's key?

HashMap computes the hash of a key with the hash method:

```java
/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```

When key == null, the hash is 0, which is why a HashMap key may be null.

Hashtable, by contrast, calls hashCode directly on the key to compute the hash, so a null key throws a NullPointerException. This is why a Hashtable key cannot be null.

```java
// Hashtable's put method
public synchronized V put(K key, V value) {
    // Make sure the value is not null
    if (value == null) {
        throw new NullPointerException();
    }

    // Makes sure the key is not already in the hashtable.
    Entry<?,?> tab[] = table;
    int hash = key.hashCode();
    int index = (hash & 0x7FFFFFFF) % tab.length;
    @SuppressWarnings("unchecked")
    Entry<K,V> entry = (Entry<K,V>)tab[index];
    for(; entry != null ; entry = entry.next) {
        if ((entry.hash == hash) && entry.key.equals(key)) {
            V old = entry.value;
            entry.value = value;
            return old;
        }
    }

    addEntry(hash, key, value, index);
    return null;
}
```
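The difference in null-key handling can be checked with a short sketch like the one below. `NullKeyDemo` and `acceptsNullKey` are hypothetical names used only for this illustration.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

final class NullKeyDemo {
    // Hypothetical helper: true if the map accepts a null key,
    // false if put throws a NullPointerException.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "v");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<>()));   // prints true
        System.out.println(acceptsNullKey(new Hashtable<>())); // prints false
    }
}
```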

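When key != null, hashCode is called first and its result is stored in h. The hash code is then perturbed: h shifted right by 16 bits (unsigned) is XORed (^) with the original hash code to produce the final hash.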

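5. How the insertion position is computed

How does HashMap compute the position at which an element is inserted?

The index is computed as hash & (n - 1), where hash is the hash of the element's key and n is the array length.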

```java
// (n - 1) & hash is the actual bucket index in the array;
// older versions (e.g. Java 6) used the indexFor method
if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);
```
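Because the array length n is always a power of two, masking with (n - 1) gives the same result as a non-negative modulo, including for negative hashes. A minimal sketch to check this (`IndexDemo` and `indexFor` are names made up for the example, and the sample hashes are arbitrary):

```java
final class IndexDemo {
    // Bucket index for a table of length n, where n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        for (int hash : new int[] {17, -17, 78897121}) {
            // Math.floorMod always returns a non-negative result for positive n,
            // unlike the % operator, which would give -1 for -17 % 16.
            System.out.println(hash + ": " + indexFor(hash, n)
                    + " == " + Math.floorMod(hash, n));
        }
        // prints 17: 1 == 1, -17: 15 == 15, 78897121: 1 == 1
    }
}
```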

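6. Shifting right by 16 bits and XORing

Why is the hash computed by shifting right 16 bits and XORing with the original hash code?

h >>> 16 extracts the high 16 bits of h, i.e. its upper half:

$$
\begin{array}{rl}
h = & 0000 \, 0100 \, 1011 \, 0011 \, 1101 \, 1111 \, 1110 \, 0001 \\
h >>> 16 = & 0000 \, 0000 \, 0000 \, 0000 \, 0000 \, 0100 \, 1011 \, 0011
\end{array}
$$

The final hash is ANDed with (n - 1), and the table length n is almost always smaller than 2 to the power of 16. Without further processing, only the low 16 bits of hashCode (or even fewer) would ever take part in the index calculation, and the high 16 bits would go unused. Letting the high bits contribute as well makes the resulting indexes more evenly spread. That is the purpose of the hash method: (h >>> 16) yields the high 16 bits, which are then XORed (^) with the value returned by hashCode().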

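To make this easy to verify, assume length is 8 (the default initial capacity of HashMap is 16). Then (length - 1) = 7, which is 111 in binary. Suppose a key's hashCode is 78897121, which is 100101100111101111111100001 in binary. ANDing it with (length - 1) gives:

$$
\begin{array}{rl}
 & 0000 \, 0100 \, 1011 \, 0011 \, 1101 \, 1111 \, 1110 \, 0001 \\
\& & 0000 \, 0000 \, 0000 \, 0000 \, 0000 \, 0000 \, 0000 \, 0111 \\
\hline
= & 0000 \, 0000 \, 0000 \, 0000 \, 0000 \, 0000 \, 0000 \, 0001
\end{array}
$$

In effect this is 001 & 111: only the low three bits of the hash are ANDed with (length - 1). The more random those low three bits are, the more random the result of the & is, and the way to make them more random is to XOR them with the high bits. A right shift of 16 bits is exactly half of 32 bits, so XORing the upper half with the lower half mixes the high and low bits of the original hash code and increases the randomness of the low bits. The mixed low bits also carry some features of the high bits, so the information in the high bits is indirectly preserved.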
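The worked example can be reproduced with a few lines of Java. The sketch below reuses the spreading expression from `HashMap.hash()`; the names `HashSpreadDemo`, `spread` and `indexFor`, and the two extra sample values `0x10000` and `0x20000`, are assumptions made for the illustration.

```java
final class HashSpreadDemo {
    // Same spreading step as HashMap.hash(), applied to a raw hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table of length n, where n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int h = 78897121; // 0x04B3DFE1
        int n = 8;
        System.out.println(indexFor(h, n));         // prints 1 (low bits 001)
        System.out.println(indexFor(spread(h), n)); // prints 2 (low bits 010)

        // Two hash codes that differ only in their high 16 bits collide
        // without spreading, but not with it.
        int a = 0x10000;
        int b = 0x20000;
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));                 // prints "0 0"
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16)); // prints "1 2"
    }
}
```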

7. Why XOR is used

Why does the hash perturbation function use XOR rather than & or |?

Assuming uniformly random one-bit inputs, AND outputs 0 with probability 75% and 1 with probability 25%. OR, conversely, outputs 0 with probability 25% and 1 with probability 75%. XOR outputs 0 and 1 with probability 50% each, which makes it well suited to combining uniform probability distributions.

$$
\begin{array}{cc|c}
A & B & A \& B \\ \hline
0 & 0 & 0 \\
0 & 1 & 0 \\
1 & 0 & 0 \\
1 & 1 & 1 \\
\end{array}
$$

$$
\begin{array}{cc|c}
A & B & A | B \\ \hline
0 & 0 & 0 \\
0 & 1 & 1 \\
1 & 0 & 1 \\
1 & 1 & 1 \\
\end{array}
$$

$$
\begin{array}{cc|c}
A & B & A \oplus B \\ \hline
0 & 0 & 0 \\
0 & 1 & 1 \\
1 & 0 & 1 \\
1 & 1 & 0 \\
\end{array}
$$
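The same counts can be obtained by enumerating the four input pairs in code. This is only a small sketch of the truth tables above; `BitMixDemo` and `onesOutOfFour` are hypothetical names.

```java
import java.util.function.IntBinaryOperator;

final class BitMixDemo {
    // Number of the four one-bit input pairs (a, b) for which op yields 1.
    static int onesOutOfFour(IntBinaryOperator op) {
        int ones = 0;
        for (int a = 0; a <= 1; a++) {
            for (int b = 0; b <= 1; b++) {
                ones += op.applyAsInt(a, b) & 1;
            }
        }
        return ones;
    }

    public static void main(String[] args) {
        System.out.println("AND: " + onesOutOfFour((a, b) -> a & b)); // prints AND: 1 (25% ones)
        System.out.println("OR:  " + onesOutOfFour((a, b) -> a | b)); // prints OR:  3 (75% ones)
        System.out.println("XOR: " + onesOutOfFour((a, b) -> a ^ b)); // prints XOR: 2 (50% ones)
    }
}
```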

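8. Why red-black trees were introduced

In Java 7 and earlier, HashMap's data structure was an array plus linked lists.

A linked list stores the key-value pairs whose hashes fall into the same bucket. As nodes are added the list grows, so the cost of looking up a node grows with it: O(n) for a list of n nodes. To improve lookup efficiency, Java 8 introduced red-black trees, whose lookup cost is $O(\log n)$.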

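9. Converting a linked list into a red-black tree

Why does HashMap, from Java 8 on, turn a linked list into a red-black tree once its length reaches 8?

In Java 8 and later, HashMap's underlying structure changed from a plain array plus linked lists to one in which a list starts being converted into a red-black tree once its length reaches 8. Looking up an element takes $O(n)$ in a linked list and $O(\log n)$ in a red-black tree, so the tree clearly wins on time complexity. However, a tree node takes about twice the space of a regular node, so tree nodes are used only when a bin holds enough nodes. With few nodes, the tree's advantage in lookup time is small while its extra space is significant. Only when a bin has grown large enough that the space overhead is outweighed does HashMap give up the list in favor of a tree, which is also why it does not use red-black trees everywhere.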

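Then why is 8 chosen as the threshold for converting a list into a red-black tree?

Ideally, with randomly distributed hashCodes, the number of nodes in a bin follows a Poisson distribution. According to the figures in the HashMap source comments, the probability of a bin holding 8 nodes is about 0.00000006, less than one in ten million, and a list of that length already performs poorly. So a list is turned into a red-black tree only in this rare, extreme case. Converting a list into a tree has a cost of its own, so the special case is handled specially: the tree is used only where the trade-off pays off.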

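So what is the Poisson distribution?

The probability mass function of the Poisson distribution is:

$$ P(X=k)=\frac{\lambda^k}{k!}e^{-\lambda},\quad k=0,1,\ldots $$

The parameter $\lambda$ is the average number of random events per unit of time or area. The Poisson distribution is suited to describing the number of random events that occur in a unit of time. Its mean and variance are both $\lambda$.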
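As a rough check of the bin-size probabilities, the formula can be evaluated directly. The sketch below assumes $\lambda = 0.5$, the value the HashMap source comments give for the default load factor of 0.75; the class and method names are made up for the illustration.

```java
final class PoissonDemo {
    // P(X = k) = lambda^k / k! * e^(-lambda)
    static double poisson(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.pow(lambda, k) / factorial * Math.exp(-lambda);
    }

    public static void main(String[] args) {
        // Assumed parameter: lambda = 0.5, as stated in the HashMap source comments.
        double lambda = 0.5;
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, poisson(lambda, k));
        }
    }
}
```

For k = 8 this gives roughly 6 in 100 million, which is why treeification is expected to be rare when hashCodes are well distributed.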

Its characteristic function is:

$$ \psi(t)=e^{\lambda(e^{it}-1)} $$

If a list has more than 8 nodes, is it always converted into a red-black tree?

HashMap's treeifyBin method:

```java
/**
 * Replaces all linked nodes in bin at index for given hash unless
 * table is too small, in which case resizes instead.
 */
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // First check whether the table length is below MIN_TREEIFY_CAPACITY = 64:
    // if so, resize; otherwise convert the bin into a red-black tree
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}
```

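As the method shows, treeifyBin does not simply convert the list into a red-black tree. It first checks whether the table length is at least 64; if the table is shorter than 64, it resizes the table instead of converting the bin into a tree.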
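The combined decision can be summarized in a small sketch. This is a simplification for illustration, not JDK code: `TreeifyDecisionDemo`, `Action` and `afterAppend` are hypothetical, and the condition on `existingNodes` mirrors the `binCount >= TREEIFY_THRESHOLD - 1` check in putVal, which fires when the bin already held at least 8 nodes before the new one was appended.

```java
final class TreeifyDecisionDemo {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { NONE, RESIZE, TREEIFY }

    // Hypothetical helper: what happens after a new node is appended to a
    // linked-list bin that held existingNodes nodes before the append.
    static Action afterAppend(int existingNodes, int tableLength) {
        if (existingNodes < TREEIFY_THRESHOLD) {
            return Action.NONE;    // the bin stays a linked list
        }
        if (tableLength < MIN_TREEIFY_CAPACITY) {
            return Action.RESIZE;  // table too small: grow it instead
        }
        return Action.TREEIFY;     // convert the bin into a red-black tree
    }

    public static void main(String[] args) {
        System.out.println(afterAppend(7, 64)); // prints NONE
        System.out.println(afterAppend(8, 16)); // prints RESIZE
        System.out.println(afterAppend(8, 64)); // prints TREEIFY
    }
}
```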

Why use a red-black tree rather than an AVL tree?

AVL trees and red-black trees compare as follows:

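AVL trees are more strictly balanced and therefore offer faster lookups, so they generally suit read- and lookup-intensive workloads.
Red-black trees are better suited to insertion- and modification-intensive workloads.
AVL tree rotations are generally harder to implement and debug than red-black tree rotations.
Both AVL trees and red-black trees are self-balancing binary search trees. They are very similar; the real difference lies in the number of rotations performed on an insert or delete.
Both scale as $O(\log n)$, where n is the number of nodes, but in practice AVL trees are faster for lookup-intensive workloads: thanks to the tighter balance, tree traversals are shorter on average. For inserts and deletes, on the other hand, AVL trees are slower: more rotations are needed to rebalance the structure after a modification.
In an AVL tree, the heights of the two subtrees of any node differ by at most 1. In a red-black tree, the longest path from the root to a leaf can be up to twice as long as the shortest.
Both give $O(\log n)$ lookups, but rebalancing an AVL tree after a deletion may take $O(\log n)$ rotations, whereas a red-black tree needs at most two rotations for an insertion and at most three for a deletion (although $O(\log n)$ nodes may need to be examined and recolored to decide where to rotate). A rotation itself is an $O(1)$ operation because it only moves pointers.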