Algorithms and Data Structures About Lookups

Basic concepts of lookup
Linear table lookup
Tree table lookup
- Binary sorted tree:
- Balanced Binary Tree (AVL):
Hash table lookup

Basic concepts of lookup

Lookup table : a data structure whose logical structure is a collection of records of the same type and whose core operation is lookup.

Keyword : the value of a data item within a data element, also called the key value. It can identify a data element or one data item (field) of a record.

Primary key : a keyword that uniquely identifies a record. A keyword that can identify several data elements (or records) is called a secondary keyword.

Find : given a value, locate the record in a set of records whose key value equals it, or locate the records whose attribute values satisfy given conditions.

Dynamic lookup table and static lookup table:
Static : a lookup table that supports lookup operations only.
Dynamic : a lookup table whose structure is built up during the lookup process itself. During a lookup, a data element that is not yet in the table can be inserted, and an existing data element can be deleted from the table.

Average lookup length : to locate a record in the lookup table, the given value has to be compared with a number of keywords. The expected value of that number of comparisons, for a successful lookup, is called the average lookup length (ASL) of the lookup algorithm.
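The definition above can be written as a formula. This is the usual textbook form; the symbols n (number of records), P_i (probability that the i-th record is the one being looked up) and C_i (number of comparisons needed to find the i-th record) are notation assumed here, not taken from the text above.

```latex
\mathrm{ASL} = \sum_{i=1}^{n} P_i \, C_i , \qquad \sum_{i=1}^{n} P_i = 1

% Sequential search scanning from the first record, equal probabilities:
% P_i = 1/n and C_i = i, so
\mathrm{ASL} = \frac{1}{n} \sum_{i=1}^{n} i = \frac{n+1}{2}
```

So, under the equal-probability assumption, a successful sequential search inspects about half of the table on average.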

Linear table lookup

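1. Sequential search:
	a. Search process:
		Starting from one end of the table, compare the keyword of each record with the given value in turn. If the keyword of some record equals the given value, the search succeeds. If the whole table has been scanned without a match, the search fails.
	b. Applicable to:
		Linear tables with sequential storage, and also linear tables with linked storage.
2. Halved (binary) search:
	a. Search process:
		Start from the middle record of the table. If the given value equals the keyword of the middle record, the search succeeds. If the given value is greater or smaller than that keyword, continue in the half of the table whose keywords are greater or smaller than the middle record. Repeat until the search succeeds, or until the search interval becomes empty at some step, which means the search fails.
	b. Applicable to:
		Ordered tables only, and only with sequential storage (a halved search cannot be performed on a linked linear list).
3. Block search:
	a. Search process:
		First, use the index table to determine which block the target record belongs to; the index table can be searched sequentially or by halved search. Second, search sequentially inside that block.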
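The two steps of block search can be sketched as follows. This is only a minimal illustration under stated assumptions: the IndexEntry layout, the function name BlockSearch and the use of int keys are all hypothetical, and the blocks are assumed to be ordered relative to each other (every key in one block is smaller than every key in the next), while keys inside a block may be in any order.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical index entry: one per block.
struct IndexEntry {
    int maxKey;          // largest key stored in the block
    std::size_t start;   // position of the block's first record in the data array
    std::size_t length;  // number of records in the block
};

// Returns the position of key in data, or -1 if it is absent.
int BlockSearch(const std::vector<int>& data,
                const std::vector<IndexEntry>& index, int key)
{
    // Step 1: halved search on the index table for the first block
    // whose maxKey is not smaller than key.
    std::size_t low = 0, high = index.size();
    while (low < high) {
        std::size_t mid = low + (high - low) / 2;
        if (index[mid].maxKey < key)
            low = mid + 1;
        else
            high = mid;
    }
    if (low == index.size())
        return -1;  // key is larger than every block maximum

    // Step 2: sequential search inside the chosen block.
    const IndexEntry& blk = index[low];
    for (std::size_t i = blk.start; i < blk.start + blk.length; ++i)
        if (data[i] == key)
            return static_cast<int>(i);
    return -1;
}
```

Only the index table has to be ordered, which is why inserting into a block does not require keeping the whole table sorted.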

Sequential search algorithm:

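template<class ElemType> int SqSearch(ElemType elem[],int n,ElemType key)
// Plain sequential search: compare each element with key in turn
{
    int i;
    for(i=0; i<n && elem[i]!=key ;i++);
    if(i<n)
        return i;
    else
        return -1;
}
template<class ElemType> int SqSearch(ElemType elem[],int n)
// Sentinel version: the caller stores the key in elem[0] as the sentinel.
// n is the length including the sentinel, so the records are elem[1..n-1]
{
    int i;
    for(i=n-1;elem[i]!=elem[0];i--);   // start at the last valid index, n-1
    if(i==0)
        return -1;   // stopped at the sentinel: not found
    else
        return i;
}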

Notice:
1. The sentinel is an optimization of sequential search. Without it, every iteration must also check whether i is out of bounds (whether i is still less than n). With a sentinel in place the loop is guaranteed to stop, so that check can be dropped. The larger the table being searched, the more noticeable the saving.
2. If the data elements are of a structure type, the not-equal (!=) relational operator must be overloaded for that structure.
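To illustrate the second point, here is a small sketch of a structure type with an overloaded != operator, used with a sentinel search. The Student type and the function name SentinelSearch are hypothetical, and the sketch assumes that only the id field acts as the keyword.

```cpp
#include <string>

// Hypothetical record type: only the keyword takes part in comparisons.
struct Student {
    int id;            // keyword
    std::string name;  // payload, ignored when comparing
};

// Two records differ when their keywords differ.
bool operator!=(const Student& a, const Student& b)
{
    return a.id != b.id;
}

// Sentinel sequential search over elem[1..n-1]; elem[0] holds the sentinel.
// n is the array length including the sentinel slot.
template <class ElemType>
int SentinelSearch(ElemType elem[], int n)
{
    int i = n - 1;
    while (elem[i] != elem[0])  // no bounds check: the sentinel stops the loop
        --i;
    return i == 0 ? -1 : i;     // reaching slot 0 means "not found"
}
```

Before each search the caller copies the keyword being sought into elem[0], for example table[0].id = 3.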

The halved search algorithm:

// Recursive algorithm
template<class ElemType> int BinSearch(ElemType elem[],int low,int high,ElemType key)
{
    int mid;
    if(low>high)
        mid=-1; // search failed
    else
    {
        mid=(low+high)/2;
        if(key<elem[mid]) // continue in the left half
            mid=BinSearch(elem,low,mid-1,key);
        else if(key>elem[mid]) // continue in the right half
            mid=BinSearch(elem,mid+1,high,key);
        // if neither branch ran, key==elem[mid]; otherwise mid holds the
        // result of the recursive call. Either way mid is the final answer
    }
    return mid;
}
// Non-recursive algorithm
template<class ElemType> int BinSearch(ElemType elem[],int n,ElemType key)
{
    int low=0,high=n-1; // left and right bounds of the search interval
    int mid;
    while(low<=high)
    {
        mid=(low+high)/2;
        if(key==elem[mid])
            return mid;
        else if(key<elem[mid])
            high=mid-1;
        else
            low=mid+1;
    }
    return -1;
}

Note: the idea behind the halved search algorithm

1) If the search interval is empty (low>high), the search fails and -1 is returned; otherwise, continue with the following steps.

2) Compute the subscript mid (mid=(low+high)/2) of the data element in the middle of the search interval.

3) Compare the keyword elem[mid] of that middle element with the given value key. There are three possible outcomes.
① If elem[mid]=key, the search succeeds and the subscript mid is returned.
② If elem[mid]<key, the element being sought, if it is in the table at all, must lie to the right of mid. Narrow the search interval to the second half of the table (low=mid+1) and continue the halved search (go to step 1).
③ If elem[mid]>key, the element being sought, if it is in the table at all, must lie to the left of mid. Narrow the search interval to the first half of the table (high=mid-1) and continue the halved search (go to step 1).

Each comparison in which the keyword differs from the given value halves the search interval. If the interval has shrunk to a single data element and that element still does not match, the search fails.
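To watch the interval shrink, the steps above can be instrumented so that every probed subscript is recorded. This is a sketch for illustration only: the function name BinSearchTrace and the probes parameter are hypothetical, and the keys are assumed to be ints stored in ascending order.

```cpp
#include <vector>

// Halved search that also records every mid subscript it probes.
// Returns the subscript of key in the ascending array elem, or -1 if absent.
int BinSearchTrace(const std::vector<int>& elem, int key, std::vector<int>& probes)
{
    int low = 0, high = static_cast<int>(elem.size()) - 1;
    while (low <= high) {
        // Same value as (low+high)/2 for valid bounds, but cannot overflow.
        int mid = low + (high - low) / 2;
        probes.push_back(mid);
        if (elem[mid] == key)
            return mid;          // outcome 1: found
        else if (elem[mid] < key)
            low = mid + 1;       // outcome 2: keep the right half
        else
            high = mid - 1;      // outcome 3: keep the left half
    }
    return -1;                   // interval became empty
}
```

For the 11-element table {5, 13, 19, 21, 37, 56, 64, 75, 80, 88, 92}, looking up 21 probes subscripts 5, 2, 3 and succeeds, while looking up 85 probes 5, 8, 9 and then fails with an empty interval.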

Comparison of three search methods:

a. Sequential search: the algorithm is simple and places no requirement on the storage structure of the table. Its drawback is low search efficiency when n is large.
b. Halved search: it is fast and efficient, and it suits tables that seldom change but are searched often.
c. Block search: to insert or delete a record, only the block the record belongs to has to be found, and the operation is then performed inside that block. It is not well suited to a linked storage structure.

Tree table lookup

Binary sorted tree:

1. Definition of binary sorting tree

A binary sorting tree (BST for short), also known as a binary search tree, is defined as follows: it is either an empty tree or a binary tree that satisfies the following properties:

(1) If its left subtree is not empty, the values of all records in the left subtree are less than the value of the root record;
(2) If its right subtree is not empty, the values of all records in the right subtree are greater than the value of the root record;
(3) The left and right subtrees are each a binary sorting tree.

2. Search in binary sorting tree

Because the binary sorting tree can be regarded as an ordered table, searching on the binary sorting tree is similar to binary search, and it is also a process of gradually narrowing the search range.

The recursive search algorithm SearchBST() is as follows (find the record with the keyword k in the binary sorting tree bt, and return the node pointer when successful, otherwise return NULL):

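#include <cstddef>        // NULL

typedef int KeyType;      // assumed key type; the original does not define it
struct BSTNode            // assumed node layout; the original does not define it
{
    KeyType key;
    BSTNode *lchild,*rchild;
};

BSTNode *SearchBST(BSTNode *bt,KeyType k)
{
    if (bt==NULL || bt->key==k)         // end of recursion: empty subtree or key found
        return bt;
    if (k<bt->key)
        return SearchBST(bt->lchild,k); // search the left subtree recursively
    else
        return SearchBST(bt->rchild,k); // search the right subtree recursively
}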

The following non-recursive algorithm can also be used:

BSTNode *SearchBST1(BSTNode *bt,KeyType k)
{
    while (bt!=NULL)
    {
        if (k==bt->key)
            return bt;
        else if (k<bt->key)
            bt=bt->lchild;  // continue in the left subtree
        else
            bt=bt->rchild;  // continue in the right subtree
    }
    return NULL;            // not found
}

3. Insertion of binary sorting tree

To insert a new record with the key k in the binary sorting tree, it is necessary to ensure that the BST property is still satisfied after the insertion.

Insertion process:

(1) If the binary sorting tree T is empty, create a node whose key field is k and make it the root node.
(2) Otherwise, compare k with the key of the root node. If the two are equal, the tree already contains the keyword k, so nothing is inserted and 0 is returned directly.

(3) If k is less than T->key, insert k into the left subtree of the root node.

(4) Otherwise, insert k into the right subtree of the root node.

The corresponding recursive algorithm InsertBST() is as follows:

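#include <cstdlib>   // malloc

int InsertBST(BSTNode *&p,KeyType k)
// Insert a node with keyword k into the BST rooted at *p.
// Returns 1 if the insertion succeeds, otherwise 0
{
    if (p==NULL)  // the (sub)tree is empty: the new record becomes its root
    {
        p=(BSTNode *)malloc(sizeof(BSTNode));
        p->key=k;p->lchild=p->rchild=NULL;
        return 1;
    }
    else if (k==p->key) // a node with the same keyword already exists
        return 0;
    else if (k<p->key)
        return InsertBST(p->lchild,k);  // insert into the left subtree
    else
        return InsertBST(p->rchild,k);  // insert into the right subtree
}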

4. Deletion of binary sorting tree

(1) The deleted node is a leaf node: delete the node directly.
(2) If the deleted node has only left subtree or only right subtree, replace it with its left subtree or right subtree.
(3) The deleted node has both left subtree and right subtree: replace it with its predecessor, and then delete the predecessor node. The predecessor is the largest node in the left subtree. You can also replace it with its successor, and then delete the successor node. The successor is the smallest node in the right subtree.
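The three deletion cases can be sketched as below. This is one possible implementation rather than a definitive one: it uses the predecessor variant of case (3), assumes int keys and a node layout with key, lchild and rchild, and allocates with new instead of malloc. DeleteBST and InOrder are names introduced here for illustration.

```cpp
#include <vector>

typedef int KeyType;  // assumed key type

struct BSTNode {      // assumed node layout
    KeyType key;
    BSTNode *lchild, *rchild;
};

// Insert k; returns 1 on success, 0 if k is already present.
int InsertBST(BSTNode *&p, KeyType k)
{
    if (p == nullptr) {
        p = new BSTNode{k, nullptr, nullptr};
        return 1;
    }
    if (k == p->key) return 0;
    return InsertBST(k < p->key ? p->lchild : p->rchild, k);
}

// Delete the node with key k; returns 1 if a node was removed, 0 if k is absent.
int DeleteBST(BSTNode *&p, KeyType k)
{
    if (p == nullptr) return 0;
    if (k < p->key) return DeleteBST(p->lchild, k);
    if (k > p->key) return DeleteBST(p->rchild, k);

    if (p->lchild != nullptr && p->rchild != nullptr) {
        // Case (3): copy the predecessor (largest key in the left subtree)
        // into this node, then delete the predecessor from the left subtree.
        BSTNode *pred = p->lchild;
        while (pred->rchild != nullptr) pred = pred->rchild;
        p->key = pred->key;
        return DeleteBST(p->lchild, pred->key);
    }
    // Cases (1) and (2): replace the node by its only child
    // (or by nullptr when it is a leaf).
    BSTNode *old = p;
    p = (p->lchild != nullptr) ? p->lchild : p->rchild;
    delete old;
    return 1;
}

// In-order traversal; for a BST this visits the keys in ascending order.
void InOrder(const BSTNode *p, std::vector<KeyType> &out)
{
    if (p == nullptr) return;
    InOrder(p->lchild, out);
    out.push_back(p->key);
    InOrder(p->rchild, out);
}
```

Because p is a reference to the parent's child pointer, assigning to p relinks the parent directly, so no separate parent pointer is needed. Checking the in-order sequence after each deletion is a quick way to confirm that the BST property still holds.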

Balanced Binary Tree (AVL):

1. Definition of Balanced Binary Tree

If the heights of the left and right subtrees of each node in a binary tree differ by at most 1, the binary tree is called a balanced binary tree.

In algorithms, the balance factor (denoted bf) is used to implement the above definition of a balanced binary tree.

Balance factor: each node in a balanced binary tree has a balance factor field, whose value is the height of the node's left subtree minus the height of its right subtree. In terms of the balance factor, a binary tree is balanced if the absolute value of the balance factor of every node is at most 1, that is, if every balance factor is 1, 0 or -1.
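The definition of the balance factor can be checked directly in code. The sketch below recomputes heights on every call instead of storing a bf field in each node, which is simpler but slower than what a real AVL implementation would do. The Node layout and the function names are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstdlib>

struct Node {  // hypothetical node layout
    int key;
    Node *lchild, *rchild;
};

// Height of a subtree; an empty subtree has height 0.
int Height(const Node *p)
{
    if (p == nullptr) return 0;
    return 1 + std::max(Height(p->lchild), Height(p->rchild));
}

// Balance factor of a non-null node: left height minus right height.
int BalanceFactor(const Node *p)
{
    return Height(p->lchild) - Height(p->rchild);
}

// A tree is balanced when every node has a balance factor of -1, 0 or 1.
bool IsBalanced(const Node *p)
{
    if (p == nullptr) return true;
    return std::abs(BalanceFactor(p)) <= 1
        && IsBalanced(p->lchild)
        && IsBalanced(p->rchild);
}
```

A chain of three nodes linked only through lchild gives the top node a balance factor of 2, so that tree is not balanced; attaching a right child to the top node brings its balance factor down to 1.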

Hash table lookup

1. The basic concept of hash table
a. Hash technology establishes a definite correspondence f between the storage location of a record and its keyword, so that each keyword key corresponds to a storage location f(key). The mapping between keywords and storage locations is given by: storage location = f(keyword)
Here, this correspondence f is called a hash function.

b. Hash technology stores the records in a contiguous block of storage, which is called a hash table. The storage location of the record that corresponds to a keyword is called its hash address.

c. Hashing is both a storage method and a lookup method. The records in a hash table have no logical relationship to one another; where a record is stored depends only on its keyword, so hashing is mainly a search-oriented storage structure.
2. The construction method of the hash function

2.1 Direct address method: take the value of some linear function of the keyword as the hash address, that is, f(key) = a × key + b, where a and b are constants.

Advantages: simple, uniform, and free of collisions.
Disadvantages: the distribution of the keywords must be known in advance; suitable for small lookup tables with contiguous keys.

Usage: because of these restrictions, this method, although simple, is not commonly used in practice.

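2.2 Digital analysis method : use this when the keyword is a number with many digits. Take an 11-digit mobile phone number such as "130****1234": the first three digits are the access number; the middle four digits are the HLR identification number, which indicates where the number is registered; the last four digits are the actual subscriber number. The digits that vary most from key to key (here, the last four) are a natural choice to extract for the hash address.

Usage: the digital analysis method is usually suited to keywords with many digits. If the distribution of the keywords is known in advance and several digits of the keywords are fairly evenly distributed, this method is worth considering.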

2.3 Folding method : split the keyword from left to right into several parts with the same number of digits (the last part may be shorter if there are not enough digits), add these parts together, and then, according to the length of the hash table, take the last few digits of the sum as the hash address.

Usage: the folding method does not need to know the distribution of the keywords in advance, and it suits keywords with many digits.
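A minimal sketch of the folding method is shown below, assuming decimal keys that fit in an unsigned long long. The function name FoldHash and its two parameters (digits per part, and number of digits kept for the address) are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Split key into groups of partDigits decimal digits from left to right,
// add the groups, and keep the last addrDigits digits of the sum.
unsigned long long FoldHash(unsigned long long key, int partDigits, int addrDigits)
{
    const std::string digits = std::to_string(key);
    unsigned long long sum = 0;
    for (std::size_t pos = 0; pos < digits.size(); pos += partDigits)
        sum += std::stoull(digits.substr(pos, partDigits));  // last group may be shorter

    unsigned long long modulus = 1;
    for (int i = 0; i < addrDigits; ++i)
        modulus *= 10;           // 10^addrDigits
    return sum % modulus;        // last addrDigits digits of the sum
}
```

For example, the key 9876543210 split into three-digit parts gives 987 + 654 + 321 + 0 = 1962, and keeping the last three digits yields the address 962.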

2.4 Mid-square method: this method is very simple to compute. Suppose the keyword is 1234; its square is 1522756, and the middle 3 digits, 227, are extracted and used as the hash address.

Usage: the mid-square method suits cases where the distribution of the keywords is unknown and the number of digits is not very large.
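The mid-square computation can be sketched as follows. The function name MidSquareHash is hypothetical, the key is assumed to be small enough that its square fits in an unsigned long long, and when the square has an even number of digits this sketch arbitrarily leans toward the left when picking the middle.

```cpp
#include <cstddef>
#include <string>

// Square the key and return the middle `count` decimal digits of the square.
// If the square has no more than `count` digits, the whole square is returned.
unsigned long long MidSquareHash(unsigned long long key, std::size_t count)
{
    const unsigned long long square = key * key;
    const std::string digits = std::to_string(square);
    if (digits.size() <= count)
        return square;
    const std::size_t start = (digits.size() - count) / 2;  // first middle digit
    return std::stoull(digits.substr(start, count));
}
```

With the keyword 1234 and count 3, the square 1522756 yields the address 227, matching the example above.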

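2.5 Division remainder method : this is the most commonly used way to construct a hash function. For a hash table of length m, f(key) = key mod p (p ≤ m). The modulo can be applied to the keyword directly, or after the keyword has been folded or squared.

Usage: the key to this method is choosing a suitable p; a poorly chosen p can easily produce collisions. If the length of the hash table is m, p is usually taken to be a prime less than or equal to m (preferably close to m), or a composite number with no prime factor less than 20.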
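As a small sketch of this rule of thumb, the code below picks the largest prime not exceeding the table length and uses it as the divisor. The helper names IsPrime, LargestPrimeAtMost and DivisionHash are hypothetical, and keys are assumed to be non-negative ints.

```cpp
// Trial-division primality test; adequate for table-sized numbers.
bool IsPrime(int x)
{
    if (x < 2) return false;
    for (int d = 2; d * d <= x; ++d)
        if (x % d == 0) return false;
    return true;
}

// Largest prime p with p <= m; returns -1 if m < 2 (no such prime).
int LargestPrimeAtMost(int m)
{
    for (int p = m; p >= 2; --p)
        if (IsPrime(p)) return p;
    return -1;
}

// Division remainder hash: f(key) = key mod p, for key >= 0 and p > 0.
int DivisionHash(int key, int p)
{
    return key % p;
}
```

For a table of length 12 this selects p = 11, so the key 29 hashes to 29 mod 11 = 7.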
3. Methods of handling conflicts

3.1 Concept: ideally, the hash function computes a different address for every keyword, but in reality this is only an ideal. We often encounter two keywords key1 != key2 for which f(key1) = f(key2); this phenomenon is called a conflict (collision). Left unhandled, collisions would cause search errors. A carefully designed hash function can make collisions as rare as possible, but they cannot be avoided completely.

3.2 Open address method : once a conflict occurs, look for the next empty hash address. As long as the hash table is large enough, an empty hash address can always be found and the record stored there.
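One common way to "look for the next empty address" is linear probing, where the probe sequence is (f(key) + d) mod m for d = 0, 1, 2, and so on. The sketch below uses that variant, which the text above does not name specifically. It assumes non-negative int keys, f(key) = key mod m, and a reserved value EMPTY that never occurs as a key; InsertOpen and SearchOpen are names introduced for the example.

```cpp
#include <vector>

const int EMPTY = -1;  // assumed marker for an unused slot; keys must be >= 0

// Insert key using linear probing. Returns the slot used, or -1 if the table is full.
int InsertOpen(std::vector<int>& table, int key)
{
    const int m = static_cast<int>(table.size());
    for (int d = 0; d < m; ++d) {
        const int addr = (key % m + d) % m;   // d-th address in the probe sequence
        if (table[addr] == EMPTY || table[addr] == key) {
            table[addr] = key;
            return addr;
        }
    }
    return -1;  // every slot is occupied by other keys
}

// Search for key along the same probe sequence. Returns its slot, or -1 if absent.
int SearchOpen(const std::vector<int>& table, int key)
{
    const int m = static_cast<int>(table.size());
    for (int d = 0; d < m; ++d) {
        const int addr = (key % m + d) % m;
        if (table[addr] == EMPTY) return -1;  // an empty slot ends the probe sequence
        if (table[addr] == key) return addr;
    }
    return -1;
}
```

Inserting 12, 67, 56, 16, 25, 37, 22, 29, 15, 47, 48, 34 in that order into a table of length 12 fills every slot: 37 collides with 25 at address 1 and moves to 2, 48 collides at 0 and ends up at 6, and 34 collides at 10 and wraps around to 9. The search must follow the same probe sequence as the insertion, which is why this sketch does not support deletion.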

3.3 Chain address method : store all the records whose keywords are synonyms in a singly linked list, called a synonym sub-table; the hash table itself stores only the head pointers of the synonym sub-tables. For the keyword set {12, 67, 56, 16, 25, 37, 22, 29, 15, 47, 48, 34}, take 12 as the divisor and apply the division remainder method.
With this method, a conflict never forces a record to move to a different address. However many conflicts there are, each one only adds a node to the singly linked list at the current position. For hash functions that may cause many collisions, the chain address method guarantees that an address can always be found. The price is the cost of traversing the singly linked list during a search.
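A compact sketch of the chain address method follows, using std::forward_list as the singly linked synonym sub-table. The ChainedHashTable type and its member names are hypothetical, keys are assumed to be non-negative ints, and the hash function is assumed to be key mod m with m equal to the number of buckets.

```cpp
#include <cstddef>
#include <forward_list>
#include <vector>

struct ChainedHashTable {
    // One singly linked list (synonym sub-table) per hash address.
    std::vector<std::forward_list<int>> buckets;

    explicit ChainedHashTable(std::size_t m) : buckets(m) {}

    // Division remainder hash; assumes key >= 0.
    std::size_t Address(int key) const
    {
        return static_cast<std::size_t>(key) % buckets.size();
    }

    // Walk the list at the key's address.
    bool Search(int key) const
    {
        for (int stored : buckets[Address(key)])
            if (stored == key) return true;
        return false;
    }

    // Add key to the front of its list; returns false if it is already present.
    bool Insert(int key)
    {
        if (Search(key)) return false;
        buckets[Address(key)].push_front(key);
        return true;
    }
};
```

With 12 buckets and the keyword set above, 12 and 48 share the list at address 0, 25 and 37 share address 1, and 22 and 34 share address 10; every other key has a list to itself.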
