C#: Dictionary

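After reading related material online and going through Microsoft's source code, I have a deeper understanding of Dictionary.

As the name suggests, and as the source code confirms, its internal structure really does follow the same idea as the paper dictionary we use every day.

A paper dictionary has two main parts: an index and a body. The index supports a first, coarse lookup, and the body supports a second, exact lookup. The data is divided into groups: the index lists the groups, and the body holds the grouped entries.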

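Dictionary's counterparts are int[] buckets and Entry[] entries. buckets records where each group starts. (I put it this way to make it easier to follow. What it actually stores is the index of the most recently inserted entry in the group, or -1 if the group is empty; the other entries in the group are reached by following the next field of each entry, as described later.) entries holds all the elements. The group is chosen by taking the hash code of the element's Key modulo the length of buckets: the remainder is the group number and also the index into buckets. A lookup first reads buckets to find where the group starts, then walks the entries in that group to find the exact position.

This corresponds to the index and body of a paper dictionary, except that the two arrays are not different sizes: in the Resize code shown later, buckets and entries are allocated with the same length. That length is computed with HashHelpers.GetPrime(capacity), which returns a prime number no smaller than capacity. capacity is the capacity of the dictionary and is greater than or equal to the number of elements actually stored.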
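To make the grouping rule concrete, here is a minimal sketch of the bucket computation. BucketDemo and BucketIndex are names made up for this illustration, and the bucket count of 7 is an assumed value; the mask and the modulo are the same two operations that appear in Insert.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Dictionary source.
public static class BucketDemo
{
    // Clear the sign bit so the hash code is non-negative, then take the remainder.
    public static int BucketIndex(int rawHashCode, int bucketCount)
    {
        int hashCode = rawHashCode & 0x7FFFFFFF;
        return hashCode % bucketCount;
    }

    public static void Main()
    {
        int bucketCount = 7; // assumed bucket count
        foreach (int key in new[] { 1, 8, 15, 3 })
        {
            // Int32.GetHashCode() returns the value itself.
            int bucket = BucketIndex(key.GetHashCode(), bucketCount);
            Console.WriteLine($"key {key} -> bucket {bucket}");
        }
    }
}
```

Keys 1, 8 and 15 all land in bucket 1, so they form one group; key 3 lands in bucket 3.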

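Where Dictionary differs from a paper dictionary is that in a paper dictionary the members of a group are physically contiguous, while in Dictionary they are not. The physical order of entries is simply insertion order (as long as no slot freed by a removal has been reused), and the grouping information is kept in the next field of each entry. next is an int that holds the index of the next entry in the same group. The first entry inserted into a group stores -1, so it is the last one visited when the group is walked.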
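The layout described above can be sketched with a tiny fixed-size table. TinyChainTable and its members are hypothetical names for this illustration; it deliberately leaves out resizing, removal and duplicate checks, and only handles int keys, so it is a simplified model rather than the real implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model: no resizing, no removal, no duplicate check.
public sealed class TinyChainTable
{
    private struct Entry
    {
        public int Key;
        public int Next; // index of the next entry in the same group, or -1
    }

    private readonly int[] buckets;
    private readonly Entry[] entries;
    private int count;

    public TinyChainTable(int size)
    {
        buckets = new int[size];
        for (int i = 0; i < size; i++) buckets[i] = -1; // -1 marks an empty group
        entries = new Entry[size];
    }

    public void Add(int key)
    {
        if (count == entries.Length) throw new InvalidOperationException("Table is full.");
        int bucket = (key & 0x7FFFFFFF) % buckets.Length;
        entries[count].Key = key;
        entries[count].Next = buckets[bucket]; // point at the previous head of the group
        buckets[bucket] = count;               // the newest entry becomes the head
        count++;
    }

    // Keys in physical (array) order.
    public List<int> KeysInStorageOrder()
    {
        var keys = new List<int>();
        for (int i = 0; i < count; i++) keys.Add(entries[i].Key);
        return keys;
    }

    // Keys in the order they are visited when walking one group.
    public List<int> KeysInGroup(int bucket)
    {
        var keys = new List<int>();
        for (int i = buckets[bucket]; i >= 0; i = entries[i].Next) keys.Add(entries[i].Key);
        return keys;
    }
}

public static class Program
{
    public static void Main()
    {
        var table = new TinyChainTable(5);
        foreach (int key in new[] { 1, 2, 6, 11 }) table.Add(key);

        Console.WriteLine(string.Join(", ", table.KeysInStorageOrder())); // prints "1, 2, 6, 11"
        Console.WriteLine(string.Join(", ", table.KeysInGroup(1)));       // prints "11, 6, 1"
    }
}
```

Keys 1, 6 and 11 share group 1 but are not adjacent in the array, and the walk visits them newest first, ending at the first one inserted.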

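Let's walk through a few of Dictionary's key methods.

1. Add (Insert: the add-and-update method)

Add and assignment through [] both simply call Insert. The code is below.

Insert first computes the hash code of key and takes it modulo the length of buckets to find the group, then reads buckets to get the starting position in entries and walks the group. If the key is already present, Add throws and the indexer overwrites the value. Otherwise the new entry is written to a free slot (or to the next unused one) and becomes the new head of its group.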

public void Add(TKey key, TValue value)
{
    Insert(key, value, true);
}

public TValue this[TKey key]
{
    get
    {
        int i = FindEntry(key);
        if (i >= 0) return entries[i].value;
        ThrowHelper.ThrowKeyNotFoundException();
        return default(TValue);
    }
    set
    {
        Insert(key, value, false);
    }
}

private void Insert(TKey key, TValue value, bool add)
{

    if (key == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
    }

    if (buckets == null) Initialize(0);
    int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
    int targetBucket = hashCode % buckets.Length;

#if FEATURE_RANDOMIZED_STRING_HASHING
    int collisionCount = 0;
#endif

    for (int i = buckets[targetBucket]; i >= 0; i = entries[i].next)
    {
        if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key))
        {
            if (add)
            {
                ThrowHelper.ThrowArgumentException(ExceptionResource.Argument_AddingDuplicate);
            }
            entries[i].value = value;
            version++;
            return;
        }

#if FEATURE_RANDOMIZED_STRING_HASHING
        collisionCount++;
#endif
    }
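    // Key not found: choose the slot for the new entry.
    int index;
    if (freeCount > 0)
    {
        // Reuse a slot from the free list.
        index = freeList;
        freeList = entries[index].next;
        freeCount--;
    }
    else
    {
        if (count == entries.Length)
        {
            // entries is full: grow, then recompute the bucket for the new length.
            Resize();
            targetBucket = hashCode % buckets.Length;
        }
        index = count;
        count++;
    }

    // Link the new entry in front of the group's current head.
    entries[index].hashCode = hashCode;
    entries[index].next = buckets[targetBucket];
    entries[index].key = key;
    entries[index].value = value;
    buckets[targetBucket] = index;
    version++;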

#if FEATURE_RANDOMIZED_STRING_HASHING

#if FEATURE_CORECLR
    // In case we hit the collision threshold we'll need to switch to the comparer which is using randomized string hashing
    // in this case will be EqualityComparer<string>.Default.
    // Note, randomized string hashing is turned on by default on coreclr so EqualityComparer<string>.Default will
    // be using randomized string hashing

    if (collisionCount > HashHelpers.HashCollisionThreshold && comparer == NonRandomizedStringEqualityComparer.Default)
    {
        comparer = (IEqualityComparer<TKey>) EqualityComparer<string>.Default;
        Resize(entries.Length, true);
    }
#else
    if (collisionCount > HashHelpers.HashCollisionThreshold && HashHelpers.IsWellKnownEqualityComparer(comparer))
    {
        comparer = (IEqualityComparer<TKey>) HashHelpers.GetRandomizedEqualityComparer(comparer);
        Resize(entries.Length, true);
    }
#endif // FEATURE_CORECLR

#endif

}

private int FindEntry(TKey key)
{
    if (key == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
    }

    if (buckets != null)
    {
        int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
        for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next)
        {
            if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) return i;
        }
    }
    return -1;
}
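From the caller's side, the difference between the two entry points into Insert looks like this. It is a small usage sketch against the public Dictionary<TKey, TValue> API; the class name AddVersusIndexerDemo and the sample keys are made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class AddVersusIndexerDemo
{
    public static string Run()
    {
        var ages = new Dictionary<string, int>();
        ages.Add("alice", 30);  // new key: inserted
        ages["alice"] = 31;     // indexer set on an existing key: value is overwritten
        string result = ages["alice"].ToString();

        try
        {
            ages.Add("alice", 32); // Add on an existing key throws
        }
        catch (ArgumentException)
        {
            result += ",duplicate";
        }

        try
        {
            _ = ages["bob"]; // indexer get on a missing key throws
        }
        catch (KeyNotFoundException)
        {
            result += ",missing";
        }

        return result;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints "31,duplicate,missing"
    }
}
```

If a missing key is an expected case rather than an error, TryGetValue avoids the exception.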

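2. Resize

Although this is a private method, I consider it a key one. It is called when the number of elements is about to exceed the capacity. The code is below, after a brief explanation.

The method allocates newBuckets and newEntries to replace the existing buckets and entries. It copies the contents of entries into newEntries and then rebuilds newBuckets by re-linking every entry into its group under the new length. Triggering this frequently is expensive, so it is advisable to pass a reasonable capacity when creating a Dictionary.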

private void Resize(int newSize, bool forceNewHashCodes)
{
    Contract.Assert(newSize >= entries.Length);

    // Allocate the replacement arrays; -1 marks an empty group.
    int[] newBuckets = new int[newSize];
    for (int i = 0; i < newBuckets.Length; i++) newBuckets[i] = -1;
    Entry[] newEntries = new Entry[newSize];
    Array.Copy(entries, 0, newEntries, 0, count);

    // Recompute the stored hash codes only when the caller asks for it
    // (Insert does so after switching the comparer).
    if (forceNewHashCodes)
    {
        for (int i = 0; i < count; i++)
        {
            if (newEntries[i].hashCode != -1)
            {
                newEntries[i].hashCode = (comparer.GetHashCode(newEntries[i].key) & 0x7FFFFFFF);
            }
        }
    }

    // Re-link every entry with a non-negative hash code into its group under the new length.
    for (int i = 0; i < count; i++)
    {
        if (newEntries[i].hashCode >= 0)
        {
            int bucket = newEntries[i].hashCode % newSize;
            newEntries[i].next = newBuckets[bucket];
            newBuckets[bucket] = i;
        }
    }
    buckets = newBuckets;
    entries = newEntries;
}
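As a usage sketch of the capacity advice, the snippet below passes the expected element count to the constructor so the internal arrays can be sized once up front. CapacityDemo, BuildLookup and the sample values are made up for the example, and the exact internal size chosen for a given capacity is an implementation detail.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Builds a lookup of a known size; the capacity argument is a sizing hint, not a limit.
    public static Dictionary<int, string> BuildLookup(int expectedCount)
    {
        var lookup = new Dictionary<int, string>(expectedCount);
        for (int i = 0; i < expectedCount; i++)
        {
            lookup.Add(i, "item" + i);
        }
        return lookup;
    }

    public static void Main()
    {
        Dictionary<int, string> lookup = BuildLookup(10000);
        Console.WriteLine(lookup.Count); // prints "10000"
    }
}
```

The dictionary still grows if more elements are added than the capacity given, so a rough estimate is enough.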

Figure: how a Dictionary is actually laid out in storage.
