Java Data Structures and Algorithms: Hash Tables

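A hash table is a data structure that gives direct access to data by key. It is built on an array and speeds up lookups by mapping each key to an index of that array. The function that performs this mapping is called a hash function.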

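Every hash table has its own hash function. Hash functions are user-defined and there is no single standard, so the analysis below is based on a simple one: hashValue = key % arraySize, where hashValue is the hash value, key is the key of the entry, and arraySize is the size of the table's underlying array. First, what is a hash collision? Different keys can produce the same hash value when run through the hash function. With the function above, if the keys are 1 and 11 and arraySize is 10, both hash to 1, and that is a collision. A hash table therefore cannot be built by mapping hash values straight to array cells alone; it needs a way to resolve collisions.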
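To make the collision concrete, here is a minimal sketch of the hash function described above. The class name `CollisionDemo` and the method name `hash` are illustrative and do not come from the article's source code; keys are assumed to be non-negative.

```java
// Minimal sketch of hashValue = key % arraySize (names are illustrative).
class CollisionDemo {

    // Maps a non-negative key to an index in [0, arraySize).
    static int hash(int key, int arraySize) {
        return key % arraySize;
    }

    public static void main(String[] args) {
        int arraySize = 10;
        System.out.println(hash(1, arraySize));  // 1
        System.out.println(hash(11, arraySize)); // 1, the same cell as key 1
        System.out.println(hash(25, arraySize)); // 5, no collision
    }
}
```

Keys 1 and 11 land in the same cell, which is exactly the situation a collision-resolution strategy has to handle.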

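What can be done about it? The two common approaches are open addressing and separate chaining. We start with open addressing.

1. Open addressing

With open addressing, when the array cell at the index computed by the hash function is already occupied, we look for another position. The main techniques are linear probing, quadratic probing, and double hashing.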
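The three techniques differ only in which cell they try next. The sketch below shows the index tried on each attempt; the class and method names are illustrative, and the quadratic formula is the common textbook form, assumed here because the article does not spell it out.

```java
// Sketch of the three probing strategies (all names here are illustrative).
class ProbeSequences {

    // Linear probing: h, h+1, h+2, ...
    static int linear(int home, int attempt, int size) {
        return (home + attempt) % size;
    }

    // Quadratic probing (assumed textbook form): h, h+1, h+4, h+9, ...
    static int quadratic(int home, int attempt, int size) {
        return (home + attempt * attempt) % size;
    }

    // Double hashing: h, h+step, h+2*step, ... where step comes from a second hash function
    static int doubleHash(int home, int attempt, int step, int size) {
        return (home + attempt * step) % size;
    }

    public static void main(String[] args) {
        int size = 13, home = 3, step = 4;
        for (int attempt = 0; attempt < 4; attempt++) {
            System.out.println(linear(home, attempt, size) + " "
                    + quadratic(home, attempt, size) + " "
                    + doubleHash(home, attempt, step, size));
        }
    }
}
```

With a table of 13 cells, a home index of 3 and a step of 4, the four attempts print `3 3 3`, `4 4 7`, `5 7 11` and `6 12 2`.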

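1.1 Linear probing

In linear probing, we search for an empty cell linearly. If position a is occupied, we try a+1; if a+1 is occupied too, we try a+2, and so on, walking along the array indexes one step at a time (wrapping around at the end of the array) until an empty cell is found. The following code shows what linear probing looks like.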

public class LinearProbingHashTable {

    private DataItem[] hashArray;   // DataItem holds the data of one entry
    private int arraySize; // size of the array (grows when the table is extended)
    private int itemNum; // number of items actually stored in the array
    private DataItem nonItem; // marker placed in cells whose item was deleted

    public LinearProbingHashTable(int arraySize) {
        this.arraySize = arraySize;
        hashArray = new DataItem[arraySize];
        nonItem = new DataItem(-1); // deleted cells get key -1, so real keys must be non-negative
    }

    // Check whether the array is full
    public boolean isFull() {
        return (itemNum == arraySize);
    }

    // Check whether the array is empty
    public boolean isEmpty() {
        return (itemNum == 0);
    }

    // Print the contents of the array
    public void display() {
        System.out.println("Table:");
        for (int j = 0; j < arraySize; j++) {
            if (hashArray[j] != null) {
                System.out.print(hashArray[j].getKey() + " ");
            } else {
                System.out.print("** ");
            }
        }
    }

    // Convert a key to an array index with the hash function
    public int hashFunction(int key) {
        return key % arraySize; // remainder modulo the array size
    }

    // Insert an item
    public void insert(DataItem item) {
        if (isFull()) {
            // Extend the hash table
            System.out.println("Hash table is full, rehashing...");
            extendHashTable();
        }
        int key = item.getKey();
        int hashVal = hashFunction(key);
        // Stop at the first empty cell or deleted marker (key -1)
        while (hashArray[hashVal] != null && hashArray[hashVal].getKey() != -1) {
            // Linear probing: move to the next cell
            ++hashVal;
            // Wrap around at the end of the array
            hashVal %= arraySize;
        }
        hashArray[hashVal] = item;
        itemNum++;
    }

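    /**
     * An array has a fixed size and cannot grow, so extending the hash table means creating a new, larger array
     * and moving the data from the old array into it.
     * Because the position of an item is computed from the array size, items cannot stay at the same index
     * in the new array as in the old one.
     * A plain copy is therefore not possible: we walk the old array in order and call insert for every item.
     * This process is called rehashing. It is expensive, but unavoidable whenever the array has to grow.
     */
    public void extendHashTable() {
        int num = arraySize;
        itemNum = 0; // restart the count; insert() below increments it again for every moved item
        arraySize *= 2; // double the array size
        DataItem[] oldHashArray = hashArray;
        hashArray = new DataItem[arraySize];
        for (int i = 0; i < num; i++) {
            DataItem item = oldHashArray[i];
            // Skip empty cells and deleted markers; only real items are re-inserted
            if (item != null && item.getKey() != -1) {
                insert(item);
            }
        }
    }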

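    // Delete an item
    public DataItem delete(int key) {
        if (isEmpty()) {
            System.out.println("Hash Table is Empty!");
            return null;
        }
        int hashVal = hashFunction(key);
        // Probe at most arraySize cells so the loop also ends when no cell is null
        for (int probes = 0; probes < arraySize && hashArray[hashVal] != null; probes++) {
            if (hashArray[hashVal].getKey() == key) {
                DataItem temp = hashArray[hashVal];
                hashArray[hashVal] = nonItem; // nonItem is an empty item with key -1 that marks a deleted cell
                itemNum--;
                return temp;
            }
            ++hashVal;
            hashVal %= arraySize;
        }
        return null;
    }

    // Find an item
    public DataItem find(int key) {
        int hashVal = hashFunction(key);
        // Probe at most arraySize cells so the loop also ends when no cell is null
        for (int probes = 0; probes < arraySize && hashArray[hashVal] != null; probes++) {
            if (hashArray[hashVal].getKey() == key) {
                return hashArray[hashVal];
            }
            // Key not at this cell: keep going with the same linear probing used by insert
            ++hashVal;
            hashVal %= arraySize;
        }
        return null;
    }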


    public static class DataItem {
        private int iData;

        public DataItem(int iData) {
            this.iData = iData;
        }

        public int getKey() {
            return iData;
        }
    }
}

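A linear-probing hash table like this has a serious drawback: as the array fills up, the number of probes tends to grow, because empty cells become scarce and occupied cells merge into long runs. The main remedies are quadratic probing and double hashing. Double hashing is the better of the two, so it is the one covered next.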
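The effect can be seen by counting how many occupied cells each insert has to skip. This is a stripped-down sketch, not the article's class: `ClusterDemo` and `insert` are illustrative names, the table is a bare `Integer[]` with `null` for empty cells, and it assumes non-negative keys and at least one free cell.

```java
// Sketch: counting skipped cells under linear probing (names are illustrative).
class ClusterDemo {

    // Inserts key with linear probing and returns how many occupied cells were skipped.
    // Assumes a non-negative key and at least one empty (null) cell.
    static int insert(Integer[] table, int key) {
        int index = key % table.length;
        int skipped = 0;
        while (table[index] != null) {
            index = (index + 1) % table.length;
            skipped++;
        }
        table[index] = key;
        return skipped;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        int[] keys = {1, 11, 21, 31, 2};
        for (int key : keys) {
            System.out.println("key " + key + " skipped " + insert(table, key) + " cells");
        }
    }
}
```

Keys 1, 11, 21 and 31 all hash to cell 1 and skip 0, 1, 2 and 3 cells respectively. Key 2 has a different home cell, yet it also skips 3 cells because the run built by the earlier keys covers cells 2 to 4.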

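1.2 Double hashing

Before probing, we use a second hash function to compute the probe step size, instead of always stepping by 1 as linear probing does. A formula that has proven effective in practice is stepSize = constant - key % constant, where constant is a prime smaller than the size of the hash table's array, and the array size is itself a prime. The formula never yields a step of 0, and a prime array size keeps the probe sequence from cycling through only part of the array. Below is an implementation of double hashing.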

public class HashDouble {

    private DataItem[] hashArray;   // DataItem holds the data of one entry
    private int arraySize; // size of the array (grows when the table is extended)
    private int itemNum; // number of items actually stored in the array
    private DataItem nonItem; // marker placed in cells whose item was deleted

    public HashDouble() {
        this.arraySize = 13; // default size 13, a prime
        hashArray = new DataItem[arraySize];
        nonItem = new DataItem(-1); // deleted cells get key -1, so real keys must be non-negative
    }

    // Check whether the array is full
    public boolean isFull() {
        return (itemNum == arraySize);
    }

    // Check whether the array is empty
    public boolean isEmpty() {
        return (itemNum == 0);
    }

    // Print the contents of the array
    public void display() {
        System.out.println("Table:");
        for (int j = 0; j < arraySize; j++) {
            if (hashArray[j] != null) {
                System.out.print(hashArray[j].getKey() + " ");
            } else {
                System.out.print("** ");
            }
        }
    }

    // First hash function: convert a key to an array index
    public int hashFunction1(int key) {
        return key % arraySize;
    }

    // Second hash function: compute the probe step size, always in the range 1..5
    public int hashFunction2(int key) {
        return 5 - key % 5;
    }

    // Insert an item
    public void insert(DataItem item) {
        if (isFull()) {
            // Extend the hash table
            System.out.println("Hash table is full, rehashing...");
            extendHashTable();
        }
        int key = item.getKey();
        int hashVal = hashFunction1(key);
        int stepSize = hashFunction2(key); // the second hash function gives the probe step size
        while (hashArray[hashVal] != null && hashArray[hashVal].getKey() != -1) {
            hashVal += stepSize;
            hashVal %= arraySize; // probe forward by the computed step, wrapping around
        }
        hashArray[hashVal] = item;
        itemNum++;
    }

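    /**
     * An array has a fixed size and cannot grow, so extending the hash table means creating a new, larger array
     * and moving the data from the old array into it.
     * Because the position of an item is computed from the array size, items cannot stay at the same index
     * in the new array as in the old one.
     * A plain copy is therefore not possible: we walk the old array in order and call insert for every item.
     * This process is called rehashing. It is expensive, but unavoidable whenever the array has to grow.
     */
    public void extendHashTable() {
        int num = arraySize;
        itemNum = 0; // restart the count; insert() below increments it again for every moved item
        // Grow to the first prime at or above twice the old size; plain doubling (13 -> 26)
        // would give a non-prime size, where an even step size reaches only half of the cells
        arraySize = nextPrime(arraySize * 2);
        DataItem[] oldHashArray = hashArray;
        hashArray = new DataItem[arraySize];
        for (int i = 0; i < num; i++) {
            DataItem item = oldHashArray[i];
            // Skip empty cells and deleted markers; only real items are re-inserted
            if (item != null && item.getKey() != -1) {
                insert(item);
            }
        }
    }

    // Smallest prime greater than or equal to n (trial division is enough for table sizes)
    private static int nextPrime(int n) {
        while (!isPrime(n)) {
            n++;
        }
        return n;
    }

    private static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }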

    // Delete an item
    public DataItem delete(int key) {
        if (isEmpty()) {
            System.out.println("Hash Table is Empty!");
            return null;
        }
        int hashVal = hashFunction1(key);
        int stepSize = hashFunction2(key);
        // Probe at most arraySize cells so the loop also ends when no cell is null
        for (int probes = 0; probes < arraySize && hashArray[hashVal] != null; probes++) {
            if (hashArray[hashVal].getKey() == key) {
                DataItem temp = hashArray[hashVal];
                hashArray[hashVal] = nonItem; // nonItem is an empty item with key -1 that marks a deleted cell
                itemNum--;
                return temp;
            }
            hashVal += stepSize;
            hashVal %= arraySize;
        }
        return null;
    }

    // Find an item
    public DataItem find(int key) {
        int hashVal = hashFunction1(key);
        int stepSize = hashFunction2(key);
        // Probe at most arraySize cells so the loop also ends when no cell is null
        for (int probes = 0; probes < arraySize && hashArray[hashVal] != null; probes++) {
            if (hashArray[hashVal].getKey() == key) {
                return hashArray[hashVal];
            }
            hashVal += stepSize;
            hashVal %= arraySize;
        }
        return null;
    }

    public static class DataItem {
        private int iData;

        public DataItem(int iData) {
            this.iData = iData;
        }

        public int getKey() {
            return iData;
        }
    }

}
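The requirement that the array size be prime can be checked with a small experiment. The sketch below uses the step formula from this section and counts how many distinct cells a probe sequence can reach; `StepCoverage`, `stepSize` and `slotsReached` are illustrative names, and 26 (13 doubled) is chosen only as a convenient non-prime size for comparison.

```java
// Sketch: how many cells a double-hashing probe sequence can reach (names are illustrative).
class StepCoverage {

    // stepSize = constant - key % constant, always in the range 1..constant
    static int stepSize(int key, int constant) {
        return constant - key % constant;
    }

    // Counts the distinct cells visited when probing from home with a fixed step.
    static int slotsReached(int home, int step, int size) {
        boolean[] seen = new boolean[size];
        int count = 0;
        int index = home % size;
        while (!seen[index]) {
            seen[index] = true;
            count++;
            index = (index + step) % size;
        }
        return count;
    }

    public static void main(String[] args) {
        int key = 6;
        int step = stepSize(key, 5); // 5 - 6 % 5 = 4
        System.out.println("size 13: reached " + slotsReached(key % 13, step, 13) + " of 13");
        System.out.println("size 26: reached " + slotsReached(key % 26, step, 26) + " of 26");
    }
}
```

With a prime size of 13 the sequence reaches all 13 cells. With a size of 26 the step 4 shares the factor 2 with the size, so only 13 of the 26 cells are ever probed, and an insert could loop forever if those cells were all occupied.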

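That covers hash tables built with open addressing. Next, let's look at how separate chaining implements a hash table.

2. Separate chaining

Here every cell of the hash table holds a linked list. An item's key is still mapped to an array cell as before, but the item is inserted into that cell's linked list rather than into the cell itself. Items whose hash values collide no longer need to search for an empty cell; they are simply added to the list of the same cell. Let the code speak:

First, define a sorted linked list: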

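public class SortLink {

    private LinkNode first;

    public SortLink() {
        first = null;
    }

    public boolean isEmpty() {
        return (first == null);
    }

    // Insert a node, keeping the list sorted by key
    public void insert(LinkNode node) {
        int key = node.getKey();
        LinkNode previous = null;
        LinkNode current = first;
        // Walk to the insertion point: previous and current end up on either side of it
        while (current != null && current.getKey() < key) {
            previous = current;
            current = current.next;
        }
        // The new node always points at the rest of the list, so no existing node is lost
        node.next = current;
        if (previous == null) {
            // Insert at the head (the list may or may not be empty)
            first = node;
        } else {
            // Insert in the middle or at the tail
            previous.next = node;
        }
    }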

    // Delete the node with the given key, if it is present
    public void delete(int key) {
        LinkNode previous = null;
        LinkNode current = first;
        if (isEmpty()) {
            System.out.println("Linked list is empty!");
            return;
        }
        while (current != null && current.getKey() != key) {
            previous = current;
            current = current.next;
        }
        if (current == null) {
            // Key not found: nothing to delete
            return;
        }
        if (previous == null) {
            first = first.next;
        } else {
            previous.next = current.next;
        }
    }

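    // Find a node
    public LinkNode find(int key) {
        LinkNode current = first;
        // Search from the head; the list is sorted, so stop once the keys exceed the target
        while (current != null && current.getKey() <= key) {
            if (current.getKey() == key) {
                return current;
            }
            current = current.next; // advance, otherwise the loop never ends
        }
        return null;
    }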

    public void displayLink() {
        System.out.println("Link(First->Last)");
        LinkNode current = first;
        while (current != null) {
            current.displayLink();
            current = current.next;
        }
        System.out.println("");
    }

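    // Static and public so that other classes can create nodes with new SortLink.LinkNode(key)
    public static class LinkNode {
        private int iData;
        public LinkNode next;

        public LinkNode(int iData) {
            this.iData = iData;
        }

        public int getKey() {
            return iData;
        }

        public void displayLink() {
            System.out.print(iData + " "); // print, not println, so a chain stays on one line
        }
    }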
}

Built on this sorted linked list, here is the code for separate chaining:

public class HashChaining {

    private SortLink[] hashArray; // each array cell holds a linked list
    private int arraySize;

    public HashChaining(int size) {
        arraySize = size;
        hashArray = new SortLink[arraySize];
        // Fill the array with a new empty list per cell
        for (int i = 0; i < arraySize; i++) {
            hashArray[i] = new SortLink();
        }
    }

    public void displayTable() {
        for (int i = 0; i < arraySize; i++) {
            System.out.print(i + ": ");
            hashArray[i].displayLink();
        }
    }

    public int hashFunction(int key) {
        return key % arraySize;
    }

    // Insert
    public void insert(SortLink.LinkNode node) {
        int key = node.getKey();
        int hashVal = hashFunction(key);
        hashArray[hashVal].insert(node); // just add the node to the list of its cell
    }

    // Delete; returns the removed node, or null if the key was not present
    public SortLink.LinkNode delete(int key) {
        int hashVal = hashFunction(key);
        SortLink.LinkNode temp = find(key);
        hashArray[hashVal].delete(key); // locate the item in the list and unlink it
        return temp;
    }

    // Find
    public SortLink.LinkNode find(int key) {
        int hashVal = hashFunction(key);
        SortLink.LinkNode node = hashArray[hashVal].find(key); // search the list of the cell directly
        return node;
    }

}
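For a quick experiment with separate chaining that does not depend on the classes above, the idea can also be sketched with the standard `java.util` lists. This is an illustrative variant, not the article's implementation: `ChainingSketch` and its methods are made-up names, the chains are unsorted, and `Math.floorMod` is used so that negative keys also map to a valid index.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch of separate chaining with java.util lists (class and method names are illustrative).
class ChainingSketch {

    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainingSketch(int size) {
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Picks the chain for a key; floorMod keeps the index non-negative.
    private LinkedList<Integer> bucketFor(int key) {
        return buckets.get(Math.floorMod(key, buckets.size()));
    }

    void insert(int key) {
        bucketFor(key).add(key); // colliding keys simply share a chain
    }

    boolean contains(int key) {
        return bucketFor(key).contains(key);
    }

    // Removes one occurrence of key; returns false if it was not present.
    boolean delete(int key) {
        return bucketFor(key).remove(Integer.valueOf(key));
    }

    int chainLength(int index) {
        return buckets.get(index).size();
    }

    public static void main(String[] args) {
        ChainingSketch table = new ChainingSketch(10);
        table.insert(1);
        table.insert(11);
        table.insert(21);
        System.out.println(table.chainLength(1)); // 3: all three keys share cell 1
        System.out.println(table.contains(11));   // true
        table.delete(11);
        System.out.println(table.contains(11));   // false
    }
}
```

No probing is needed at any point: a collision only makes one chain longer, which is why insertion never has to search for a free cell.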

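Generally speaking, separate chaining holds up better than open addressing and takes less time, especially as the table fills: its lookup cost grows gradually with the length of the chains, while open addressing slows down sharply once empty cells run out. The HashMap class we use all the time is also based on separate chaining. When I have time, I will write an article on how HashMap works. That is all for this brief introduction to hash tables.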

Source code: https://github.com/jiusetian/DataStructureDemo/tree/master/app/src/main/java/hash
