Detailed explanation of data structure search

1. What is a lookup table?

1.1 Definition

A lookup table is a collection of data elements of the same type. For example, a telephone directory and a dictionary can be viewed as a lookup table.

1.2 Common operations on a lookup table:

1) Find a specific data element in the lookup table;
2) Insert a data element into the lookup table;
3) Delete a data element from the lookup table.

1.3 Static lookup table and dynamic lookup table

A lookup table on which only search operations are performed, without changing the data elements in the table, is called a static lookup table. Conversely, a lookup table in which data elements are inserted or deleted along with searching is called a dynamic lookup table.

1.4 Keywords

To look for a specific element in a lookup table, you need to know some attribute of that element. For example, every student is given a unique student number when enrolling, because a name or an age may be shared with other students but a student number is never repeated. These attributes of a student (student number, name, age, etc.) can all be called keywords.

Keywords are further divided into primary keywords and secondary keywords. A keyword that uniquely identifies a data element is called a primary keyword; a student's student number is an example. Keywords such as a student's name and age, on the other hand, do not uniquely identify a student and are called secondary keywords.

2. Search algorithms: detailed explanations and examples

A keyword is used to identify a data element. Searching means, for a given value, finding the record or data element in the table whose keyword equals that value. How a search is carried out on a computer depends on how the records in the table are organized.

2.1 Sequential search

Sequential search is also called linear search. Starting from one end of the linear table, it scans the nodes in order and compares each node's keyword with the given value k. If they are equal, the search succeeds; if the scan finishes without finding a node whose keyword equals k, the search fails.

C language code example:

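#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int key; // value of each data element in the lookup table
    // other attributes can be added here if needed
}ElemType;

typedef struct{
    ElemType *elem; // array that stores the data elements of the lookup table
    int length; // total number of data elements in the lookup table
}SSTable;

// Create the lookup table
void create(SSTable **st,int length){

    (*st)=(SSTable*)malloc(sizeof(SSTable));
    if (*st == NULL) {
        exit(EXIT_FAILURE);
    }
    (*st)->length=length;
    (*st)->elem =(ElemType*)malloc((length+1)*sizeof(ElemType));
    if ((*st)->elem == NULL) {
        exit(EXIT_FAILURE);
    }
    printf("Enter the data elements of the table:\n");

    // Data is stored from array index 1 onward; index 0 is reserved for the sentinel
    for (int i=1; i<=length; i++) {
        scanf("%d", &((*st)->elem[i].key));
    }
}

// Search function of the lookup table; key is the keyword to look for
int search_seq(SSTable *st,int key){
    st->elem[0].key=key; // store the keyword at index 0 as a sentinel, so the loop always stops
    int i=st->length;
    // Scan from the last data element toward array index 0
    while (st->elem[i].key!=key) {
        i--;
    }
    // i == 0 means the search failed; otherwise i is the position of the element whose keyword is key
    return i;
}

int main() {
    SSTable *st;

    int length;
    printf("Enter the number of data elements:\n");
    if (scanf("%d", &length) != 1 || length < 0) {
        return 1;
    }
    create(&st, length);

    printf("Enter the keyword to search for:\n");
    int key;
    scanf("%d", &key);

    int location = search_seq(st, key);
    if (location == 0) {
        printf("Search failed\n");
    }else{
        printf("Position of the data in the lookup table: %d \n\n",location);
    }
    free(st->elem);
    free(st);
    return 0;
}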


2.2 Binary search (half-interval search)

Binary search requires the nodes of the linear table to be sorted in ascending or descending order of keyword value. The given value k is first compared with the keyword of the middle node, which divides the linear table into two sub-tables. If they are equal, the search succeeds; if not, the result of the comparison determines which sub-table to search next. This step is repeated until the node is found or the remaining sub-table is empty, which means the table contains no such node.

Process description:

The process of using the binary search algorithm to find the keyword 10 in the static lookup table {3,5,9,10,12,14,17,20,22,25,28} is as follows:

Figure 1 Process of half search (a)


As shown in Figure 1 above, the pointers low and high point to the first and last keywords of the lookup table, and the pointer mid points to the keyword midway between low and high. At each step the target is compared with the keyword that mid points to. Because the whole table is ordered, each comparison tells which half can still contain the target.

For example, when searching for the keyword 10, it is first compared with 14. Since 10 < 14 and the lookup table is sorted in ascending order, the keyword 10, if it exists in the static lookup table, must lie in the area between low and mid.

Therefore, before the next comparison, the high and mid pointers are updated: high moves to the position immediately to the left of mid, and mid is set again to the middle position between low and high, as shown in Figure 2:

Figure 2 Process of half search (b)

Similarly, 10 is compared with 9, the keyword that mid now points to. Since 9 < 10, the keyword 10, if it exists, must lie in the area between mid and high. So low moves to the position immediately to the right of mid, and mid is updated again.

Figure 3 Process of half search (c)

On the third comparison, mid points to the keyword 10, and the search ends.

Note: If the midpoint of low and high falls between two keywords, that is, (low + high) / 2 is not an integer, it must be rounded in the same direction every time. The C code in this section uses integer division, which rounds down.

C language code example:

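#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int key; // value of each data element in the lookup table
    // other attributes can be added here if needed
}ElemType;

typedef struct{
    ElemType *elem; // array that stores the data elements of the lookup table
    int length; // total number of data elements in the lookup table
}SSTable;

// Create the lookup table
void create(SSTable **st, int length){
    (*st) = (SSTable*)malloc(sizeof(SSTable));
    if (*st == NULL) {
        exit(EXIT_FAILURE);
    }
    (*st)->length = length;
    (*st)->elem = (ElemType*)malloc((length+1)*sizeof(ElemType));
    if ((*st)->elem == NULL) {
        exit(EXIT_FAILURE);
    }

    printf("Enter the data elements of the table in ascending order:\n");
    // Data is stored from array index 1 onward; index 0 is not used
    for (int i=1; i<=length; i++) {
        scanf("%d",&((*st)->elem[i].key));
    }
}

// Binary search algorithm
int search_bin(SSTable *ST, int key){
    int low=1; // initially low points to the first keyword
    int high=ST->length; // high points to the last keyword
    int mid;
    while (low<=high) {
        mid=(low+high)/2; // integer division, so mid is always rounded down
        if (ST->elem[mid].key==key) // the keyword at mid equals the target: return its position
        {
            return mid;
        }else if(ST->elem[mid].key>key) // the keyword at mid is larger: move high to the left of mid
        {
            high=mid-1;
        }
        // otherwise move low to the right of mid
        else{
            low=mid+1;
        }
    }
    return 0;
}

int main(void) {
    SSTable *st;

    int length;
    printf("Enter the number of data elements:\n");
    if (scanf("%d", &length) != 1 || length < 0) {
        return 1;
    }
    create(&st, length);

    printf("Enter the keyword to search for:\n");
    int key;
    scanf("%d",&key);

    int location = search_bin(st, key);

    // A return value of 0 means key was not found in the lookup table
    if (location == 0) {
        printf("The element is not in the lookup table\n");
    }else{
        printf("Position of the data in the lookup table: %d \n\n",location);
    }
    free(st->elem);
    free(st);
    return 0;
}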


2.3 Block search

Block search is also called index search. The linear table is divided into several blocks. The data elements inside each block may be stored in any order, but the blocks themselves must be ordered by key value: every key in one block must be smaller than every key in the next block. An index table is also built, arranged in increasing order of keyword value, in which each item corresponds to one block of the linear table. An index item has two fields: ① the key field, which stores the largest keyword of the corresponding block; ② the chain field, which stores a pointer to the first node of that block. Block search is performed in two steps: first determine which block the node being searched for belongs to, then search for the node within that block.

Process description:

1. First select the largest keyword in each block to form an index table. 
2. The search is divided into two parts. First, perform a binary search or sequential search on the index table to determine which block the record to be searched is in. Then, search sequentially in the identified blocks.

C language code example:


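#include <stdio.h>

struct index /* one index-table entry per block */
{
	int key;   /* largest keyword in the block */
	int start; /* first array position of the block */
	int end;   /* last array position of the block */
} index_table[4]; /* entries 1..3 are used; entry 0 is unused */

int block_search(int key, int a[]) /* block search; returns the position of key, or 0 */
{
	int i, j;
	i = 1;
	while (i <= 3 && key > index_table[i].key) /* determine which block may hold key */
	{
		i++;
	}

	if (i > 3) /* key is larger than the maximum of every block */
	{
		return 0;
	}

	j = index_table[i].start; /* j starts at the first position of the block */

	while (j <= index_table[i].end && a[j] != key) /* sequential search inside the block */
	{
		j++;
	}

	if (j > index_table[i].end) /* ran past the end of the block: key is not present */
	{
		j = 0;
	}

	return j;
}

int main()
{
	int i, j = 0, k, key, a[16]; /* a[0] is unused; the data is stored in a[1..15] */
	printf("Enter 15 data elements in ascending order:\n");

	for (i = 1; i <= 15; i++)
	{
		scanf("%d", &a[i]);
	}

	for (i = 1; i <= 3; i++) /* three blocks of five elements each */
	{
		index_table[i].start = j + 1; /* first position of the block */
		j = j + 1;
		index_table[i].end = j + 4; /* last position of the block */
		j = j + 4;
		index_table[i].key = a[j]; /* with ascending input the last element is the block maximum */
	}

	printf("Enter the keyword to search for:\n");
	scanf("%d", &key);
	k = block_search(key, a);
	if (k != 0)
	{
		printf("Position of the data in the lookup table: %d \n\n",k);
	}else{
		printf("Not found\n");
	}
	return 0;
}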


2.4 Hash table lookup

Hash table lookup finds the address of a node directly by computing on the keyword value of the record. It converts a keyword straight into an address, without repeated comparisons. Assume that table f contains n nodes, that R_{i} is one of them (1≤i≤n), and that key_{i} is its keyword value. A functional relationship H is established between key_{i} and the address of R_{i}, so that a keyword value can be converted into the address of the corresponding node: addr(R_{i}) = H(key_{i}), where H is the hash function.

The hash address only represents the storage location within the lookup table, not the actual physical storage location. H is the function through which the hash address of the data corresponding to a keyword can be found quickly, which is why it is called a "hash function".

2.4.1 Methods of constructing hash functions

There are 6 commonly used methods of constructing a hash function: the direct addressing method, the numerical analysis method, the square-middle method, the folding method, the division with remainder method and the random number method.

Building a hash function requires taking appropriate methods based on the actual lookup table situation. The following factors are usually considered:

  • The length of the keyword. If the lengths are unequal, use the random number method. If the keyword has a large number of digits, use the folding method or the numerical analysis method; conversely, if the keyword has a short number of digits, consider the square-middle method;
  • The size of the hash table. If the size is known, you can choose to use the division with remainder method;
  • Distribution of keywords;
  • The lookup frequency of the lookup table;
  • The time required to calculate the hash function (including factors for hardware instructions)

Direct addressing method: the hash function is a linear function of the keyword, in one of the following two forms:

H(key) = key or H(key) = a * key + b

Here H(key) is the hash address corresponding to the keyword key, and a and b are constants.

For example, there is a table of population statistics for ages 1 to 100, as shown in Table 1:

Table 1 Demographic table
 

Assuming its hash function takes the first form, the value of the keyword itself is the storage location. To find the number of people aged 25, substitute the age 25 into the hash function to obtain the hash address 25 directly (meaning that the record is at the 25th position in the lookup table).

Numerical analysis method: if the keyword consists of several characters or digits, 2 or more of them can be extracted to form the hash address of the keyword. When choosing, prefer the positions whose values vary the most, to avoid conflicts.

For example, Table 2 lists some keywords, each keyword consists of 8 decimal digits:

Table 2
 

Analyzing the composition of the keywords shows that the 1st and 2nd digits are fixed, the 3rd digit is either 3 or 4, and the last digit can only be 2, 7 or 5. Only the middle 4 digits take approximately random values, so, to avoid conflicts, any 2 of those 4 digits can be chosen as the hash address.

Square-middle method: square the keyword and take the middle digits of the result as the hash address. This is also a commonly used way of constructing a hash function.

For example, if the keyword sequence is {421,423,436}, squaring each keyword gives {177241,178929,190096}, and the middle two digits {72,89,00} can be taken as the hash addresses.

The folding method is to divide the keyword into several parts with the same number of digits (the number of digits in the last part can be different), and then take the superposition sum of these parts (with the carry rounded off) as the hash address. This method is suitable for situations where there are many keyword digits.

For example, books in libraries are numbered using a 10-digit decimal number as the key. If a hash table is created for the lookup table, the folding method can be used.

If the number of a book is 0-442-20586-4, it is divided as shown in Figure 1. There are two ways to fold it, shift folding and boundary folding:

  • Shift folding aligns the divided parts at their lowest digit and then adds them, as shown in Figure 1(a);
  • Boundary folding folds the parts back and forth along the dividing lines from one end to the other and then adds them, as shown in Figure 1(b).

Figure 1 Shift folding and boundary folding

Division with remainder method: if the maximum length m of the whole hash table is known, take a number p no greater than m and use the remainder of the keyword divided by p as the hash address, that is: H(key) = key % p.

In this method the choice of p is very important. Experience shows that p should be a prime number no greater than m, or a composite number with no prime factor smaller than 20.

Random number method: take a random function value of the keyword as its hash address, that is: H(key) = random(key). This method is suitable when the keywords have unequal lengths.

Note: the random function here is actually a pseudo-random function. With a truly random function, H(key) would differ from call to call even for the same key; with a pseudo-random function, each key always maps to the same H(key).

2.4.2 Methods of handling conflicts

Building a hash table requires choosing an appropriate hash function, but conflicts cannot be avoided entirely, so appropriate measures are also needed to handle them.

Commonly used methods for handling conflicts include the following:

  • Open addressing method: H_i = (H(key) + d_i) MOD m, where H(key) is the hash function, m is the table length of the hash table, and d_i is an increment. When the hash address obtained conflicts, one of the following three methods is chosen to generate the values of d_i, and the calculation is repeated until the resulting hash address no longer conflicts. The three methods are:
    • Linear detection method: d = 1, 2, 3, …, m-1
    • Secondary detection method: d = 1², -1², 2², -2², 3², …
    • Pseudo-random number detection method: d = a pseudo-random number
    For example, three data items, 17, 60 and 29, have already been stored in a hash table of length 11 (as shown in Figure 2(a)), using the hash function H(key) = key MOD 11. The fourth item is 38. Its hash address is 5, which conflicts with 60, so each of the three methods above is used to find an insertion position, as shown in Figure 2(b):

    Figure 2 Open addressing method

    Note: In the linear detection method, detection starts from the conflicting position and moves forward by 1 each time until a free position is found; in the secondary detection method, detection starts from the conflicting position and tries the offsets +1², -1², +2², … in turn until a free position is found; in pseudo-random detection, a pseudo-random number is added each time until a free position is found. In this example, linear detection tries addresses 6 and 7, which are occupied by 17 and 29, and stops at address 8; secondary detection tries address 6 and then finds address 4 free.

  • Rehash method:
    When the hash address produced by the hash function conflicts with that of another keyword, a different hash function is used to compute a new address, and this is repeated until no conflict occurs.
  • Chain address method:
    All data whose keywords conflict are stored in the same linear linked list. For example, for the group of keywords {19,14,23,01,68,20,84,27,55,11,10,79} with the hash function H(key) = key MOD 13, the hash table constructed using the chain address method is shown in Figure 3:

    Figure 3 Hash table constructed by chain address method

  • Create a public overflow area.
    Create two tables, one is the basic table and the other is the overflow table. The basic table stores data without conflicts. When the hash address generated by the hash function of a key conflicts, the data is filled into the overflow table.

C language code example:

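#include <stdio.h>
#include <stdlib.h>
#define HASHSIZE 15 // length of the hash table, equal to the array length
#define NULLKEY -1  // marks an empty slot, so -1 itself cannot be stored as a key
typedef struct{
    int *elem; // storage for the data elements, a dynamically allocated array
    int count; // current number of data elements
}HashTable;

// Initialize the hash table
void init(HashTable *hashTable){
    int i;
    hashTable->elem= (int *)malloc(HASHSIZE*sizeof(int));
    if (hashTable->elem == NULL) {
        exit(EXIT_FAILURE);
    }
    hashTable->count=0;
    for (i=0;i<HASHSIZE;i++){
        hashTable->elem[i]=NULLKEY;
    }
}

// Hash function (division with remainder method); keys are assumed to be non-negative
int Hash(int data){
    return data % HASHSIZE;
}

// Insert function of the hash table, which can be used to build the table
void insert(HashTable *hashTable,int data){

    // A full table has no free slot, so detection would never end
    if (hashTable->count >= HASHSIZE) {
        return;
    }

    // Compute the hash address
    int hashAddress=Hash(data);

    // A conflict occurred
    while(hashTable->elem[hashAddress]!=NULLKEY){
        // Resolve the conflict with open addressing (linear detection)
        hashAddress=(hashAddress+1)%HASHSIZE;
    }

    hashTable->elem[hashAddress]=data;
    hashTable->count++;
}

// Search algorithm of the hash table
int search(HashTable *hashTable,int data){
    // Compute the hash address
    int hashAddress=Hash(data);

    // A conflict occurred
    while(hashTable->elem[hashAddress]!=data){

        // Resolve the conflict with open addressing (linear detection)
        hashAddress=(hashAddress+1) % HASHSIZE;

        // If the slot is empty, or detection has gone all the way round to the start, the search fails
        if (hashTable->elem[hashAddress] == NULLKEY || hashAddress==Hash(data)){
            return -1;
        }
    }
    return hashAddress;
}

int main(){
    int i, result, key;
    HashTable hashTable;
    int arr[HASHSIZE];

    printf("Enter 15 data elements:\n");
    for (i = 0; i < HASHSIZE; i++)
    {
        scanf("%d", &arr[i]); /* read 15 non-negative integers */
    }

    // Initialize the hash table
    init(&hashTable);

    // Build the hash table with the insert function
    for (i=0;i<HASHSIZE;i++){
        insert(&hashTable,arr[i]);
    }

    // Call the search algorithm
    printf("Enter the keyword to search for:\n");
    scanf("%d", &key);
    result= search(&hashTable, key);

    if (result==-1){
        printf("Search failed\n");
    }
    else {
        printf("Position in the hash table: %d \n\n", result+1);
    }
    free(hashTable.elem);
    return  0;
}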



Origin blog.csdn.net/m0_68949064/article/details/128495417