(23) Data Structures: Hash Table

1 Basic introduction to hash tables

1.1 Data structures used for storage

In a computer, both arrays and linked lists can store data, and wherever data is stored it must also be queried. Querying a linked list takes O(n) time, and so does querying an unordered array. Arrays, however, leave room for improvement, while linked lists do not. A linked list occupies non-contiguous memory, so it can only be searched node by node, whether or not it is ordered. An array lets us reach any element directly by its index, so when its data is ordered we can use binary search, which saves a great deal of query time. Binary search is not hard to understand: each comparison discards half of the remaining data, so its time complexity is O(log n). As the amount of data grows, O(log n) becomes far lower than O(n).
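A minimal sketch of the binary search described above, assuming an `int[]` that is already sorted in ascending order. The class and method names are illustrative, not taken from the original article.

```java
import java.util.Arrays;

public class BinarySearchDemo {

    // Returns the index of target in the ascending array a, or -1 if it is absent.
    // Each iteration discards half of the remaining range, so it takes O(log n) steps.
    static int binarySearch(int[] a, int target) {
        int lo = 0;
        int hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2; // avoids int overflow of (lo + hi)
            if (a[mid] == target) {
                return mid;
            } else if (a[mid] < target) {
                lo = mid + 1; // target can only be in the right half
            } else {
                hi = mid - 1; // target can only be in the left half
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sorted = {3, 8, 15, 21, 42, 57, 90};
        System.out.println(binarySearch(sorted, 42)); // 4
        System.out.println(binarySearch(sorted, 5));  // -1
        // The JDK provides the same algorithm as Arrays.binarySearch
        // (it returns a negative value when the key is absent).
        System.out.println(Arrays.binarySearch(sorted, 42)); // 4
    }
}
```

Note that this only works because the array is sorted; on an unordered array, discarding half the range would skip elements that might be the target.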

Therefore, for arrays, the question is no longer how to find a better search method but how to sort faster, which leads to sorting algorithms. Eight sorting algorithms are usually taught as the mainstream ones; in fact there are more, and at least ten are worth learning. Reality tells us that contradictions are hard to eliminate; they only shift. Binary search on an array does not directly remove the cost of querying, because sorting also takes time: the time saved on queries is spent on sorting instead. The lowest time complexity for comparison-based sorting is O(n log n), which is not especially low. As a result, sorting followed by binary search costs more in total than simply scanning the unordered array, at least when only a single query is made.
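A rough operation count makes this trade-off concrete. Assuming a comparison sort costs about n log2(n) steps, a binary search about log2(n), a linear scan about n, and ignoring constant factors, answering k lookups on n items costs:

$$
\begin{aligned}
C_{\text{scan}}(k) &\approx k\,n \\
C_{\text{sort}}(k) &\approx n\log_2 n + k\log_2 n \\
C_{\text{sort}}(k) < C_{\text{scan}}(k) &\iff k\,(n - \log_2 n) > n\log_2 n \iff k > \frac{n\log_2 n}{n - \log_2 n} \approx \log_2 n
\end{aligned}
$$

For example, with n = 2^20 (about one million) items, log2(n) = 20. A single lookup costs about 10^6 steps by scanning but about 2 × 10^7 steps if we sort first, so scanning wins; sorting only starts to pay off after roughly 20 lookups.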

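To address this problem, someone proposed a creative new idea: when storing data, do not simply store each item as it arrives; instead, use a clever classification method to divide the data into categories, so that, as with binary search, the range to be searched shrinks dramatically. This is hash storage.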

1.2 Hash table

Hash tables can be built on many different physical structures; the buckets used in radix sort, for example, are essentially a hash table. A hash table usually classifies the stored data according to some rule and stores each class under a different index. To query an item, we first obtain its index, then go to the storage table with that index, and search only within that table instead of traversing all the data. This is the benefit of a hash table.

To make hash tables easier to remember, here is an analogy. A library holds many books of various categories: novels, dictionaries, magazines, professional books, and so on. In the early days the librarians did not look after them and piled them together in complete disorder, so anyone who wanted to borrow a book had to rummage through the pile one book at a time until finding it. Then a new librarian arrived and sorted the books into four areas, one per category. From then on, a borrower would first give the title, and the librarian would determine the category from it. If someone wanted to borrow "People's Literature", the librarian would say: "That is a magazine; please look in the magazine area." The reader would then walk straight to the magazine area instead of searching the whole pile, which saves a great deal of time.

A hash table works in essentially the same way. Its main idea is indexed storage: data is classified according to some rule and placed into different categories, each with a corresponding index value. A query first determines the index value and then searches directly in the storage structure for that index, which narrows the query range and reduces the amount of data that has to be examined.
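The library analogy translates almost directly into code. The sketch below is hypothetical: it swaps the librarian's category judgment for a simpler, mechanical rule (shelve each title by its first letter), because a hash function must compute the index from the key alone. Each letter gets its own list, and a lookup searches only that one list.

```java
import java.util.ArrayList;
import java.util.List;

public class ShelfDemo {

    // 26 "areas" of the library, one per initial letter A-Z (a hypothetical rule).
    private final List<List<String>> shelves = new ArrayList<>();

    public ShelfDemo() {
        for (int i = 0; i < 26; i++) {
            shelves.add(new ArrayList<>());
        }
    }

    // The classification rule: map a (non-empty) title to a shelf index.
    // Titles that do not start with a letter A-Z share shelf 0 in this sketch.
    int shelfOf(String title) {
        char c = Character.toUpperCase(title.charAt(0));
        return (c >= 'A' && c <= 'Z') ? c - 'A' : 0;
    }

    void store(String title) {
        shelves.get(shelfOf(title)).add(title);
    }

    // Only the matching shelf is scanned, not the whole collection.
    boolean contains(String title) {
        return shelves.get(shelfOf(title)).contains(title);
    }

    public static void main(String[] args) {
        ShelfDemo library = new ShelfDemo();
        library.store("People's Literature");
        library.store("Oxford Dictionary");
        library.store("Dream of the Red Chamber");
        System.out.println(library.shelfOf("People's Literature")); // 15
        System.out.println(library.contains("People's Literature")); // true
        System.out.println(library.contains("Pride and Prejudice")); // false
    }
}
```

A rule like this spreads books unevenly (many titles start with "The"), which is why real hash functions mix every character of the key into the index.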

1.3 What is a hash table

A hash table is a data structure that is accessed directly by key value. That is, it maps a key to a position in the table in order to access the record stored there, which speeds up lookups. This mapping function is called the hash function, and the array that stores the records is called the hash table.
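A minimal sketch of the key-to-position mapping for string keys. The names `hash` and `indexFor` are hypothetical; the polynomial formula is the one `java.lang.String.hashCode()` is documented to use, and `Math.floorMod` turns the hash into an array index.

```java
public class HashFunctionDemo {

    // Polynomial hash h = s[0]*31^(n-1) + ... + s[n-1], computed with int overflow.
    // This is the same formula java.lang.String.hashCode() is documented to use.
    static int hash(String key) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = 31 * h + key.charAt(i);
        }
        return h;
    }

    // Map the (possibly negative) hash to a slot in [0, capacity).
    // Math.floorMod is used because the % operator can return a negative value.
    static int indexFor(String key, int capacity) {
        return Math.floorMod(hash(key), capacity);
    }

    public static void main(String[] args) {
        String[] table = new String[16]; // the array that stores the records
        String key = "hash";
        int slot = indexFor(key, table.length);
        table[slot] = key; // direct access: the slot is computed, not searched for
        System.out.println(hash(key) == key.hashCode()); // true
        System.out.println(table[indexFor("hash", table.length)]); // hash
    }
}
```

This sketch ignores collisions: two different keys can map to the same slot. Handling that is why the application example in section 2 keeps a linked list in every slot.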

2 Application examples

A company wants to record each new employee's information (id, gender, age, name, address...) when the employee reports for work. When an employee's id is entered, all of that employee's information must be found.

Requirements:

Do not use a database; the faster, the better => use a hash table.
When adding to the hash table, make sure employees are inserted in ascending order of id.

(1) Use linked lists to implement the hash table. The lists have no header node (that is, the first node of each list already stores employee information).
(2) Idea analysis
(3) Code implementation
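The diagram that accompanied the idea analysis is not reproduced here, so as a rough stand-in: the table is an array of 7 linked lists, and an employee is placed in list number `id % 7`. The hypothetical sketch below uses made-up ids and plain `ArrayList`s in place of `EmpLinkedList` to show which list each id lands in.

```java
import java.util.ArrayList;
import java.util.List;

public class BucketLayoutDemo {

    // Distribute ids over `size` chains using the hash function id % size.
    // Plain lists stand in for EmpLinkedList here; ids are assumed non-negative.
    static List<List<Integer>> layout(int[] ids, int size) {
        List<List<Integer>> chains = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            chains.add(new ArrayList<>());
        }
        for (int id : ids) {
            chains.get(id % size).add(id);
        }
        return chains;
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 8, 9, 15, 21}; // made-up sample ids
        List<List<Integer>> chains = layout(ids, 7); // 7 matches new HashTab(7) in the code
        for (int i = 0; i < chains.size(); i++) {
            System.out.println(i + ": " + chains.get(i));
        }
    }
}
```

Running `main` prints `0: [21]`, `1: [1, 8, 15]`, `2: [2, 9]`, and empty lists for 3 to 6. Ids 1, 8 and 15 share list 1, so a lookup for id 15 only has to walk that one short list.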
Code implementation:

import java.util.Scanner;

public class HashTabDemo {

    public static void main(String[] args) {
        // Create a hash table with 7 linked lists
        HashTab hashTab = new HashTab(7);

        // A simple text menu
        String key = "";
        Scanner scanner = new Scanner(System.in);
        while (true) {
            System.out.println("add: add an employee");
            System.out.println("list: list all employees");
            System.out.println("find: find an employee");
            System.out.println("exit: exit the program");

            key = scanner.next();

            switch (key) {
                case "add":
                    System.out.println("Enter id");
                    int id = scanner.nextInt();
                    System.out.println("Enter name");
                    String name = scanner.next();
                    // Create the employee and add it to the hash table
                    Emp emp = new Emp(id, name);
                    hashTab.add(emp);
                    break;
                case "list":
                    hashTab.list();
                    break;
                case "find":
                    System.out.println("Enter id");
                    int no = scanner.nextInt();
                    hashTab.findEmpById(no);
                    break;
                case "exit":
                    scanner.close();
                    System.out.println("Exiting the program");
                    System.exit(0);
                default:
                    break;
            }
        }
    }
}

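// HashTab manages an array of linked lists (separate chaining)
class HashTab {
    private EmpLinkedList[] empLinkedListArray;
    private int size;

    // Constructor
    public HashTab(int size) {
        this.size = size;
        // Allocate the array of lists
        empLinkedListArray = new EmpLinkedList[size];
        // Initialize every list; otherwise each slot would be null
        for (int i = 0; i < size; i++) {
            empLinkedListArray[i] = new EmpLinkedList();
        }
    }

    // Add an employee
    public void add(Emp emp) {
        // Use the employee's id to decide which list the employee belongs to
        int empLinkedListNo = hashFun(emp.id);
        // Add emp to that list
        empLinkedListArray[empLinkedListNo].add(emp);
    }

    // Traverse every list, i.e. the whole hash table
    public void list() {
        for (int i = 0; i < size; i++) {
            empLinkedListArray[i].list(i);
        }
    }

    // Hash function: id modulo the table size.
    // Math.floorMod keeps the index in [0, size) even for a negative id,
    // where id % size would be negative and cause an ArrayIndexOutOfBoundsException.
    public int hashFun(int id) {
        return Math.floorMod(id, size);
    }

    // Find an employee by the id that was entered
    public void findEmpById(int id) {
        // Use the hash function to pick the list to search
        int i = hashFun(id);
        Emp emp = empLinkedListArray[i].findEmpById(id);
        if (emp != null) {
            System.out.printf("Found employee in list %d: id=%d name=%s%n", i, emp.id, emp.name);
        } else {
            System.out.println("Employee not found");
        }
    }
}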


// Represents one employee (also a node of a linked list)
class Emp {
    public int id;
    public String name;
    public Emp next; // next employee in the same list; null by default

    public Emp(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

// EmpLinkedList: a singly linked list of employees
class EmpLinkedList {
    // Head pointer. This list has no dummy header node, so head points
    // directly at the first Emp. It is null by default.
    private Emp head;

    // Add an employee to the list.
    // Note:
    // 1. We assume ids are auto-incremented, i.e. always assigned from small to large,
    //    so the new employee can simply be appended to the end of the list.
    public void add(Emp emp) {
        // Adding the first employee
        if (head == null) {
            head = emp;
            return;
        }
        // Otherwise use a helper pointer to walk to the last node
        Emp curEmp = head;
        while (curEmp.next != null) {
            curEmp = curEmp.next; // move forward
        }
        // curEmp is now the last node; append emp after it
        curEmp.next = emp;
    }

    // Print the employees in this list
    public void list(int no) {
        // The list is empty
        if (head == null) {
            System.out.println("List " + no + " is empty!");
            return;
        }

        System.out.println("List " + no + " contains:");
        Emp curEmp = head;
        while (curEmp != null) {
            System.out.printf("=> id=%d name=%s \t", curEmp.id, curEmp.name);
            curEmp = curEmp.next; // move forward
        }
        System.out.println();
    }

    // Find an employee by id.
    // Returns the Emp if found, otherwise null.
    public Emp findEmpById(int id) {
        // Check whether the list is empty
        if (head == null) {
            System.out.println("List is empty");
            return null;
        }

        // Helper pointer
        Emp curEmp = head;
        while (curEmp != null) {
            if (curEmp.id == id) {
                return curEmp; // found
            }
            curEmp = curEmp.next;
        }
        // Reached the end of the list without finding the employee
        return null;
    }
}
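In the code above, each list stays ordered by id only because `add` assumes ids arrive in increasing order. If that assumption does not hold, a sorted insert is needed instead of a plain append. The sketch below is hypothetical (`SortedEmpList`, `Node` and `addSorted` are illustrative names) and carries its own minimal node type so it compiles on its own; like `EmpLinkedList`, it has no dummy header node.

```java
public class SortedEmpList {

    // Minimal node type mirroring Emp from the code above.
    static class Node {
        final int id;
        final String name;
        Node next;

        Node(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    private Node head; // no sentinel: head is the first real employee

    // Insert so that ids stay in ascending order, whatever order they arrive in.
    // Returns false (and inserts nothing) if the id is already present.
    public boolean addSorted(int id, String name) {
        Node node = new Node(id, name);
        // Case 1: empty list, or the new id belongs before the current head.
        if (head == null || id < head.id) {
            node.next = head;
            head = node;
            return true;
        }
        if (head.id == id) {
            return false; // duplicate id
        }
        // Case 2: stop at the last node whose successor is missing or has id >= new id.
        Node cur = head;
        while (cur.next != null && cur.next.id < id) {
            cur = cur.next;
        }
        if (cur.next != null && cur.next.id == id) {
            return false; // duplicate id
        }
        node.next = cur.next;
        cur.next = node;
        return true;
    }

    // Render the ids in list order, e.g. "[1, 8, 15]".
    public String ids() {
        StringBuilder sb = new StringBuilder("[");
        for (Node cur = head; cur != null; cur = cur.next) {
            sb.append(cur.id);
            if (cur.next != null) {
                sb.append(", ");
            }
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        SortedEmpList list = new SortedEmpList();
        list.addSorted(15, "Tom");
        list.addSorted(1, "Ann");
        list.addSorted(8, "Bob");
        System.out.println(list.ids()); // [1, 8, 15]
        System.out.println(list.addSorted(8, "Dup")); // false
    }
}
```

To use this in the original program, the body of `EmpLinkedList.add` would follow the same walk, comparing `curEmp.next.id` with `emp.id`. Rejecting duplicate ids is a design choice made only in this sketch; the original `add` accepts duplicates.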