[Daily Check-in] From a Brief Look at Hash Tables to How Map Works, Plus a Hand-Written Implementation

Foreword

Recently I have been studying algorithms and came across the hash table (also known as a hash map), which turns out to be a very interesting structure.

Generally hash tables are used to quickly determine whether an element appears in a set, and when we want to use hashing to solve problems, we generally choose the following three data structures:

  • array
  • set (collection)
  • map

Both Map and Set are closely tied to hash tables, and I realized that although I have used these ES6 features many times, I had never taken the time to study them. Since I am learning hash tables anyway, this is a good moment to look into how they work underneath and to write a simple Map by hand that implements all of its methods. Let's go!


Hash table

First, let's talk about the hash table as a data structure. The job of hashing is to find a value in a data structure as fast as possible. Finding an item by enumerating every element takes O(n) time, whereas a hash function maps the key directly to an index, so the value can be retrieved in O(1) time on average.
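
As a rough illustration of that difference, the sketch below checks membership first by scanning an array and then with a Set, which engines typically back with a hash table. The names `ids`, `containsByScan`, and `idSet` are made up for this example.

```javascript
// Hypothetical data: the values themselves do not matter.
const ids = [17, 4, 42, 8, 15];

// Linear scan: in the worst case every element is visited, so O(n).
function containsByScan(list, target) {
  for (const item of list) {
    if (item === target) return true;
  }
  return false;
}

// Hash-based lookup: the value is hashed to locate it, so `has` is O(1) on average.
const idSet = new Set(ids);

console.log(containsByScan(ids, 42)); // true
console.log(idSet.has(42));           // true
console.log(idSet.has(99));           // false
```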

The hash function computes the position for a key with a hashCode method: it encodes keys of different data types as numbers in a specific way, and that number is then used as the index to fetch the corresponding value. If the number produced by hashCode is larger than the size of the hash table, a modulo operation is applied to it so that the result always falls within the table's index range. Applying the modulo during the computation also keeps intermediate values small, which avoids the risk of exceeding the range a numeric variable can represent.
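
A minimal sketch of such a function for string keys might look like the following. The function name `hashCode`, the multiplier 31, and the table size of 8 are assumptions chosen for illustration, not something prescribed by the article.

```javascript
// A simple string hash: combine the character codes, then reduce modulo the
// table size so the result is always a valid array index.
// `hashCode` and `tableSize` are illustrative names.
function hashCode(key, tableSize) {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    // Multiplying by 31 lets character order influence the result, and taking
    // the remainder at every step keeps the running value small.
    hash = (hash * 31 + key.charCodeAt(i)) % tableSize;
  }
  return hash;
}

const tableSize = 8;
const table = new Array(tableSize);

// Store and retrieve a value through the computed index.
table[hashCode("apple", tableSize)] = "red";
console.log(table[hashCode("apple", tableSize)]); // "red"
```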

hash table.jpg

Collision resolution

However, different keys sometimes produce the same hash value, so collisions are inevitable. There are two main ways to resolve them: separate chaining (the "zipper" method) and linear probing.

  1. Separate chaining: the simplest way to resolve collisions. Each position in the hash table holds a linked list, and elements are stored in that list. This is the so-called array + linked list form. With a reasonable table size, it neither wastes much memory on empty array slots nor spends too much time on lookups because a list has grown too long.
  2. Linear probing: another effective way to resolve collisions. Elements are stored directly in the table rather than in a separate data structure, and the name comes from probing the table one position at a time. When adding a new element, if the target position is occupied, we step through the table until a free position is found. One thing to watch for with linear probing is that the array can run out of free positions, so it may be necessary to create a larger array and copy the elements into it.
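
Since the hand-written Map later in this article uses separate chaining, here is a small sketch of the other approach, linear probing. It is only an illustration under a few assumptions: `ProbingTable` is a hypothetical name, keys are strings, the table doubles once it is half full, and deletion is left out.

```javascript
// Minimal open-addressing table with linear probing.
// `ProbingTable` is a hypothetical name; keys are assumed to be strings.
class ProbingTable {
  constructor(capacity = 8) {
    this.slots = new Array(capacity).fill(null);
    this.count = 0;
  }

  hash(key) {
    let sum = 0;
    for (let i = 0; i < key.length; i++) sum += key.charCodeAt(i);
    return sum % this.slots.length;
  }

  set(key, value) {
    // Grow before the table gets more than half full, so that a free slot
    // always exists and probing is guaranteed to terminate.
    if (this.count + 1 > this.slots.length / 2) this.resize();
    let i = this.hash(key);
    // Step forward (wrapping around) until we hit the key or a free slot.
    while (this.slots[i] !== null && this.slots[i].key !== key) {
      i = (i + 1) % this.slots.length;
    }
    if (this.slots[i] === null) this.count++;
    this.slots[i] = { key, value };
    return this;
  }

  get(key) {
    let i = this.hash(key);
    while (this.slots[i] !== null) {
      if (this.slots[i].key === key) return this.slots[i].value;
      i = (i + 1) % this.slots.length;
    }
    return undefined;
  }

  resize() {
    // Copy every entry into an array twice as large, re-hashing each key.
    const old = this.slots;
    this.slots = new Array(old.length * 2).fill(null);
    this.count = 0;
    for (const slot of old) {
      if (slot !== null) this.set(slot.key, slot.value);
    }
  }
}

const t = new ProbingTable();
t.set("ab", 1).set("ba", 2); // same character sum, so the two keys collide
console.log(t.get("ab")); // 1
console.log(t.get("ba")); // 2
console.log(t.get("zz")); // undefined
```

Supporting deletion would need extra care (for example, marking removed slots) so that later probes do not stop too early.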

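A good hash function is judged on two things: the time it takes to insert and retrieve elements (its performance), and a low probability of collisions.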


Map

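Introducing Map

Arrays, sets, and maps are the three hash structures we meet most often. Now let's look at the star of this article, Map.

A Map object holds key-value pairs and remembers the original insertion order of the keys. Any value (object or primitive) can be used as either a key or a value.

A traditional JavaScript Object uses strings as keys, which limits how it can be used.

To address this, ES6 introduced the Map data structure. Unlike Object, whose keys can only be strings or symbols (numeric keys are converted to strings), a Map accepts any JavaScript value as a key. In other words, keys are not limited to strings: values of any type, including objects, can be keys. Internally, Map compares keys with the SameValueZero algorithm, which behaves like strict equality (===) except that NaN is considered equal to NaN.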
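
The following short example, using only the built-in Map and a plain object, shows what these key rules mean in practice. The variable names are arbitrary.

```javascript
const keyed = new Map();

// Keys keep their type: the number 1 and the string "1" are different keys.
keyed.set(1, "number one");
keyed.set("1", "string one");
console.log(keyed.get(1));   // "number one"
console.log(keyed.get("1")); // "string one"

// SameValueZero treats NaN as equal to NaN, unlike ===.
keyed.set(NaN, "not a number");
console.log(NaN === NaN);    // false
console.log(keyed.get(NaN)); // "not a number"

// +0 and -0 are the same key.
keyed.set(0, "zero");
console.log(keyed.get(-0)); // "zero"

// Objects are compared by reference, not by content.
const user = { id: 7 };
keyed.set(user, "found");
console.log(keyed.get(user));      // "found"
console.log(keyed.get({ id: 7 })); // undefined

// A plain object, by contrast, converts non-symbol keys to strings.
const plain = {};
plain[1] = "a";
plain["1"] = "b";
console.log(Object.keys(plain)); // ["1"]
```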

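As with Object, there is no restriction on the values a Map can hold, but compared with Object, Map is a more complete hash structure. If you need a key-value data structure, Map is usually a better fit than Object.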

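Using Map

Map properties and methods

Instances of Map have the following properties and methods, listed here with what each one does.

  • size: an accessor property that returns the number of entries in the Map.
  • set(key, value): sets the value for key to value and returns the Map itself. If key already has a value it is updated; otherwise a new entry is created.
  • get(key): reads the value for key, or returns undefined if key is not found.
  • has(key): returns a boolean indicating whether the key is present in the Map.
  • delete(key): removes the key and returns true if it existed; returns false if there was nothing to remove.
  • clear(): removes all entries and returns undefined.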
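
A quick run through these methods on the built-in Map; the `scores` data is invented for the example.

```javascript
const scores = new Map();

// set returns the Map itself, so calls can be chained.
scores.set("alice", 90).set("bob", 72);
console.log(scores.size); // 2

// Setting an existing key updates the value without changing the size.
scores.set("alice", 95);
console.log(scores.get("alice")); // 95
console.log(scores.size);         // 2

console.log(scores.has("carol")); // false
console.log(scores.get("carol")); // undefined

// delete reports whether the key was actually present.
console.log(scores.delete("bob")); // true
console.log(scores.delete("bob")); // false

// clear empties the Map and returns undefined.
console.log(scores.clear()); // undefined
console.log(scores.size);    // 0
```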

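Map iteration methods

  • keys(): returns a new Iterator object that contains the key of each entry in the Map, in insertion order.
  • values(): returns a new Iterator object that contains the value of each entry in the Map, in insertion order.
  • entries(): returns a new Iterator object that contains the [key, value] pair of each entry; the iteration order matches the Map's insertion order.
  • forEach(): calls the given function once for each key/value pair in the Map, in insertion order.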
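
The iteration methods can be tried out on the built-in Map as follows. The `letters` data is arbitrary; note that the output follows insertion order, not key order.

```javascript
const letters = new Map([
  ["b", 2],
  ["a", 1],
]);
letters.set("c", 3);

console.log([...letters.keys()]);    // ["b", "a", "c"]
console.log([...letters.values()]);  // [2, 1, 3]
console.log([...letters.entries()]); // [["b", 2], ["a", 1], ["c", 3]]

// Each call returns a fresh iterator object.
console.log(letters.keys() === letters.keys()); // false

// forEach passes (value, key, map) to the callback, in that order.
const seen = [];
letters.forEach((value, key) => seen.push(`${key}=${value}`));
console.log(seen); // ["b=2", "a=1", "c=3"]

// A Map is itself iterable; for...of yields [key, value] pairs.
const pairs = [];
for (const [key, value] of letters) pairs.push(key + value);
console.log(pairs); // ["b2", "a1", "c3"]
```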

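Writing a Map by hand

Next, let's follow Map's properties and methods and write a simplified Map by hand.

Initialization

First define MyMap, then add an init method to its prototype to perform the initialization. Of course, MyMap could also be written as a class.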

function MyMap() {
  this.init();
}

MyMap.prototype.init = function () {
  // Number of entries stored in the map
  this.size = 0;

  // bucket is the hash table itself: an array of linked lists.
  // Each list starts with a sentinel head node whose next is null.
  this.bucket = new Array(8).fill(0).map(() => {
    return {
      next: null
    }
  });
};

// Usage
const map = new MyMap();

console.log(map);
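
For comparison, the same initialization written with class syntax could look like the sketch below. `MyMapClass` is a hypothetical name used only here so that it does not clash with MyMap, and `Array.from` is simply an alternative way to build the bucket array.

```javascript
// Class-based equivalent of the constructor function above.
// `MyMapClass` is a hypothetical name for this sketch.
class MyMapClass {
  constructor() {
    this.init();
  }

  init() {
    // Number of entries stored in the map
    this.size = 0;
    // Array.from calls the callback once per slot, so every bucket gets
    // its own sentinel object rather than a shared one.
    this.bucket = Array.from({ length: 8 }, () => ({ next: null }));
  }
}

const classMap = new MyMapClass();
console.log(classMap.size);                             // 0
console.log(classMap.bucket.length);                    // 8
console.log(classMap.bucket[0] === classMap.bucket[1]); // false
```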

Initialize map.jpg

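Define the hash method

Add a hash method to MyMap's prototype that sorts incoming data into categories so each entry is stored in a specific linked list. This makes lookups fast, and the method is reused by get, set, delete, and the other methods below.

MyMap.prototype.hash = function (key) {
  // Default bucket for types not handled below (function, symbol, bigint)
  let index = 0;

  // Classify the key by type and return a bucket index; other schemes work too
  if (key === null) {
    // typeof null is "object", so null has to be checked explicitly and first
    index = 2;
  } else if (typeof key === "object") {
    index = 0;
  } else if (typeof key === "undefined") {
    index = 1;
  } else if (typeof key === "boolean") {
    index = 3;
  } else if (typeof key === "number") {
    // Reduce numbers modulo the bucket count. Negative and fractional numbers
    // are normalized first; NaN and ±Infinity go to bucket 0.
    index = Number.isFinite(key) ? Math.abs(Math.floor(key)) % this.bucket.length : 0;
  } else if (typeof key === "string") {
    for (let i = 0; i < key.length; i++) {
      // Sum the UTF-16 code units of every character in the string
      index += key.charCodeAt(i);
    }

    // Reduce the string's sum modulo the bucket count
    index = index % this.bucket.length;
  }

  return index;
}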

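Create the set method

MyMap.prototype.set = function (key, value) {
  // Hash the key to find the bucket (linked list) to operate on
  const i = this.hash(key);
  let listNode = this.bucket[i];

  // Walk the list; new entries are appended at the tail
  while (listNode.next) {
    // If the key already exists, update its value and return the map itself.
    // The sentinel holds no data, so always inspect listNode.next.
    if (listNode.next.key === key) {
      listNode.next.value = value;

      return this;
    }

    // Not found yet: move to the next node
    listNode = listNode.next;
  }

  // The key was not found anywhere in the list: append a new node
  listNode.next = {
    key,
    value,
    next: null
  };

  this.size++;
  return this;
};

// Usage
const map = new MyMap();

map.set('0', 'foo');
map.set(1, 'bar');
map.set({}, "baz");

console.log(map);

Map set method.jpg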

Create the get method

MyMap.prototype.get = function (key) {
  const i = this.hash(key);
  let listNode = this.bucket[i];

  // Walk the list; return the value of the matching key, or undefined if none matches
  while (listNode.next) {
    if (listNode.next.key === key) {
      return listNode.next.value;
    }

    // Not found yet: move to the next node
    listNode = listNode.next;
  }

  return undefined;
};

// Usage
const map = new MyMap();

map.set('0', 'foo');
map.set(1, 'bar');
map.set({}, "baz");

console.log('get 0', map.get(0));  // get 0 undefined
console.log('get 1', map.get(1));  // get 1 bar

Create the has method

MyMap.prototype.has = function (key) {
  const i = this.hash(key);
  let listNode = this.bucket[i];

  // Walk the list; return true if the key is found, false otherwise
  while (listNode.next) {
    if (listNode.next.key === key) {
      return true;
    }

    // Not found yet: move to the next node
    listNode = listNode.next;
  }

  return false;
};

// Usage
const map = new MyMap();

map.set('0', 'foo');
map.set(1, 'bar');
map.set({}, "baz");

console.log('has 0', map.has(0));  // has 0 false
console.log('has 1', map.has(1));  // has 1 true

Create the delete method

MyMap.prototype.delete = function (key) {
  const i = this.hash(key);
  let listNode = this.bucket[i];

  // Walk the list; if the key is found, unlink its node and return true, otherwise return false
  while (listNode.next) {
    if (listNode.next.key === key) {
      listNode.next = listNode.next.next;
      this.size--;

      return true;
    }

    // Not found yet: move to the next node
    listNode = listNode.next;
  }

  return false;
};

// Usage
const map = new MyMap();

map.set('0', 'foo');
map.set(1, 'bar');
map.set({}, "baz");

console.log('delete "0"', map.delete('0'));  // delete "0" true
console.log(map);

Map delete method.jpg

Create the clear method

Simply run the initialization again.

MyMap.prototype.clear = function () {
  this.init();
};

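Create the entries method

We can simply walk every linked list in the array and return a new Iterator object that yields [key, value] pairs.

MyMap.prototype.entries = function* () {
  for (let i = 0; i < this.bucket.length; i++) {
    let listNode = this.bucket[i];

    while (listNode.next) {
      // Every node after the sentinel is a real entry, so yield it even when
      // the key is falsy (0, "", false, null, undefined)
      yield [listNode.next.key, listNode.next.value];

      listNode = listNode.next;
    }
  }
};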

However, while this gives a simple entries method, it does not preserve the map's original insertion order. To record the order, we can add head and tail nodes.

  1. Modify the init method to create the head and tail nodes, and update them in the set and delete methods

    MyMap.prototype.init = function () {
      // Number of entries stored in the map
      this.size = 0;

      // bucket is the hash table itself: an array of linked lists.
      // Each list starts with a sentinel head node whose next is null.
      this.bucket = new Array(8).fill(0).map(() => {
        return {
          next: null
        }
      });

      // Head (sentinel) and tail of a separate list that records insertion order
      this.head = {
        next: null
      };
      this.tail = null;
    };

    MyMap.prototype.set = function (key, value) {
      // Hash the key to find the bucket (linked list) to operate on
      const i = this.hash(key);
      let listNode = this.bucket[i];

      let flag = false;
      // Walk the bucket list; new entries are appended at the tail
      while (listNode.next) {
        // If the key already exists, update its value in the bucket list
        if (listNode.next.key === key) {
          listNode.next.value = value;

          flag = true;
          break;
        }

        // Not found yet: move to the next node
        listNode = listNode.next;
      }

      // If the key exists, also update its value in the insertion-order list
      if (flag) {
        listNode = this.head;
        while (listNode.next) {
          if (listNode.next.key === key) {
            listNode.next.value = value;
            break;
          }

          listNode = listNode.next;
        }

        return this;
      }

      // The key was not found: append a new node to the bucket list
      listNode.next = {
        key,
        value,
        next: null
      };

      // Append a separate node to the insertion-order list. The two lists must
      // not share node objects, because each uses `next` for its own chain.
      const orderNode = {
        key,
        value,
        next: null
      };
      if (this.size === 0) {
        this.head.next = orderNode;
        this.tail = orderNode;
      } else {
        this.tail.next = orderNode;
        this.tail = orderNode;
      }

      this.size++;
      return this;
    };

    MyMap.prototype.delete = function (key) {
      const i = this.hash(key);
      let listNode = this.bucket[i];

      let flag = false;
      // Walk the bucket list; if the key is found, unlink its node
      while (listNode.next) {
        if (listNode.next.key === key) {
          listNode.next = listNode.next.next;
          this.size--;

          flag = true;
          break;
        }

        // Not found yet: move to the next node
        listNode = listNode.next;
      }

      // If the key existed, unlink it from the insertion-order list as well
      if (flag) {
        listNode = this.head;
        while (listNode.next) {
          if (listNode.next.key === key) {
            // If the removed node was the tail, move the tail back one node
            if (listNode.next === this.tail) {
              this.tail = listNode;
            }
            listNode.next = listNode.next.next;
            break;
          }

          listNode = listNode.next;
        }
      }

      return flag;
    };
  2. Traverse from the head node and return a new Iterator object that yields [key, value] pairs.

    MyMap.prototype.entries = function* () {
      let listNode = this.head.next;

      // Walk the insertion-order list from the head
      while (listNode) {
        // Every node after the sentinel is a real entry, so yield it even when
        // the key is falsy (0, "", false, null, undefined)
        yield [listNode.key, listNode.value];

        listNode = listNode.next;
      }
    };

    // Usage
    const map = new MyMap();

    map.set('0', 'foo');
    map.set(1, 'bar');
    map.set({}, "baz");

    const iterator = map.entries();
    console.log(iterator.next().value);  // ['0', 'foo']
    console.log(iterator.next().value);  // [1, 'bar']
    console.log(iterator.next().value);  // [{}, 'baz']

    console.log(map);

    Map entries method.jpg

Create the values method

The values method is similar to the entries method.

MyMap.prototype.values = function* () {
  let listNode = this.head.next;

  // Walk the insertion-order list from the head
  while (listNode) {
    // Yield every value, including falsy ones such as 0, "" or false
    yield listNode.value;

    listNode = listNode.next;
  }
};

Create the keys method

The keys method is also similar to the entries method.

MyMap.prototype.keys = function* () {
  let listNode = this.head.next;

  // Walk the insertion-order list from the head
  while (listNode) {
    // Yield every key, including falsy ones such as 0, "" or false
    yield listNode.key;

    listNode = listNode.next;
  }
};

Create the @@iterator method

The initial value of the @@iterator property is the same function object as the initial value of the entries property.

MyMap.prototype[Symbol.iterator] = MyMap.prototype.entries;
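
To see why aliasing entries as the default iterator is useful, here is a self-contained sketch. `Pairs` is a hypothetical stand-in for a map-like container, not part of MyMap; it only exists to show what `Symbol.iterator` enables.

```javascript
// `Pairs` is a hypothetical, minimal map-like container for this sketch.
class Pairs {
  constructor() {
    this.items = [];
  }

  add(key, value) {
    this.items.push([key, value]);
    return this;
  }

  // Generator method: calling it returns an iterator over [key, value] pairs.
  *entries() {
    for (const [key, value] of this.items) yield [key, value];
  }
}

// Same trick as above: reuse entries as the default iterator.
Pairs.prototype[Symbol.iterator] = Pairs.prototype.entries;

const p = new Pairs().add("x", 1).add("y", 2);

// for...of, spread, and destructuring all go through Symbol.iterator.
for (const [key, value] of p) console.log(key, value);
console.log([...p]); // [["x", 1], ["y", 2]]
const [first] = p;
console.log(first);  // ["x", 1]

// Any iterable of pairs can also seed a built-in Map.
console.log(new Map(p).get("y")); // 2
```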

Create the forEach method

// forEach(fn, context)
MyMap.prototype.forEach = function (fn, context = this) {
  let listNode = this.head.next;

  // Walk the insertion-order list from the head
  while (listNode) {
    // Pass (value, key, map), matching the native Map callback signature,
    // and do not skip entries whose key is falsy
    fn.call(context, listNode.value, listNode.key, this);

    listNode = listNode.next;
  }
};

// Usage
const map = new MyMap();

map.set('0', 'foo');
map.set(1, 'bar');
map.set({}, "baz");

function logMapElements(value, key, map) {
  console.log(`m[${key}] = ${value}`);
}
map.forEach(logMapElements);

console.log(map);

Map forEach method.jpg

Full code

The code is fairly long. If you are interested, you can find it at gitee.com/sosir/handw…


Summary

Most of what Map offers can also be done with a plain Object, but when the code involves a large number of lookups, insertions, and deletions, Map generally performs better. In other words, when we need to quickly determine whether an element exists, or need to add and remove entries frequently, Map's hash table structure is worth considering.

However, a hash table trades space for time: we need an extra array, set, or map to store the data in order to get fast lookups. Its lookup efficiency depends mainly on the hash function chosen when the table is built and on how collisions are handled.

In short, the hash table is not a panacea, but whenever you need to quickly determine whether an element exists, it should come to mind.


Origin juejin.im/post/7083677992290877448