"Hey Wei said," Data Structure - Summary of Chapter VII of learning content

 Contents of this article:

 First, basic search concepts and terminology

Second, the sequential search algorithm

Third, the binary search algorithm

Fourth, the binary sort tree algorithm

Fifth, the balanced binary tree algorithm

Sixth, an introduction to B-trees

Seventh, hash table lookup

Eighth, a worked hashing example

Ninth, self-summary and reflection

 

 First, basic search concepts and terminology:

  (1) Search: given a value, determine whether the lookup table contains a data element (or record) whose key equals that value.

  (2) Classification of search algorithms:

    1) Static search and dynamic search;
      Note: "static" and "dynamic" describe the lookup table. A dynamic lookup table is one that supports insertion and deletion in addition to searching.

      Common static-table methods: sequential search, binary search, interpolation search, indexed search, etc.

      Common dynamic-table structures: binary sort tree, balanced binary tree, B-tree, hash table

    2) Unordered search and ordered search.
      Unordered search: the sequence being searched may be sorted or unsorted;
      Ordered search: the sequence being searched must be sorted.

  (3) Average search length (Average Search Length, ASL): the expected number of key comparisons needed to locate a given key is called the average search length of the algorithm for a successful search.

  (4) Key (Key): the value of a data item in a data element, which can be used to identify that data element. If a key uniquely identifies a record, it is called a primary key (Primary Key); if a key corresponds to several records, it is called a secondary key (Secondary Key).

 

 

Second, the sequential search algorithm:

  (1) Definition: sequential search (Sequential Search), also called linear search, is the most basic search technique. The process is: starting from the first (or last) record in the table, compare each record's key with the given value one by one. If a record's key equals the given value, the search succeeds and the record has been found. If the last (or first) record is reached and no key has matched the given value, the table does not contain the record and the search fails.

  

  (2) The most basic implementation:

int Search_Seq(SSTable ST,keyType key)
{
    int i; /* loop index, scanning from the last record down to the first */
    for(i = ST.length;i>=1;--i)
        if(ST.R[i].key == key)
            return i;
    return 0;
}
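
  The function above relies on `SSTable` and `keyType`, which the text does not define. The following is a minimal sketch of what they might look like, assuming integer keys stored in positions 1..length. The `Record` struct, the `MAX_RECORDS` capacity, and the name `search_seq` are hypothetical and chosen only for illustration.

```c
#define MAX_RECORDS 100          /* hypothetical capacity, not from the text */

typedef int keyType;             /* assumption: keys are plain integers */

typedef struct {
    keyType key;                 /* the field compared during a search */
} Record;

typedef struct {
    Record R[MAX_RECORDS + 1];   /* R[0] is unused; records live in R[1..length] */
    int length;                  /* number of records currently stored */
} SSTable;

/* Same logic as Search_Seq: scan from the last record down to the first.
   Returns the 1-based position of key, or 0 if it is absent. */
int search_seq(const SSTable *st, keyType key)
{
    int i;
    for (i = st->length; i >= 1; --i)
        if (st->R[i].key == key)
            return i;
    return 0;
}
```

  Passing the table by pointer avoids copying the whole array on every call; returning 0 for "not found" works only because position 0 is never used for data.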

  

  (3) Implementation with a sentinel:

// Sequential search with a sentinel: searches a[1] through the end of the array for key without checking on every iteration whether i is out of range, i.e. without comparing i against a.length.
// Setting a sentinel removes the need to compare i with a.length each time.
// Returns -1 if the search fails. Note: valid data start at array index 1.
int seqSearchWithGuard(int[] a, int key)
{
    int i = a.length - 1; // start the loop from the tail of the array
    a[0] = key;           // store the key in a[0]; this is the "sentinel"
    while (a[i] != key)
        i--;
    if (i > 0)
        return i;
    else
        return -1;
}

  

  (4) Time complexity:

  For sequential search, the best case of a successful search is finding the key at the first position examined, which is O(1). The worst case is finding it at the last position examined, which takes n comparisons and is O(n). An unsuccessful search takes n + 1 comparisons (counting the comparison with the sentinel), which is also O(n). Since the key is equally likely to be at any position, the average number of comparisons for a successful search is (n + 1) / 2, so the overall time complexity is O(n). When n is large, search efficiency is very low; for small data sets, this method is perfectly usable.
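
  The (n + 1) / 2 figure can be written out as a short derivation, assuming each of the n records is searched for with equal probability p_i = 1/n and that the i-th record examined costs c_i = i comparisons:

```latex
\mathrm{ASL}_{\text{success}}
  = \sum_{i=1}^{n} p_i \, c_i
  = \frac{1}{n} \sum_{i=1}^{n} i
  = \frac{1}{n} \cdot \frac{n(n+1)}{2}
  = \frac{n+1}{2}
```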

 

 

 Third, the binary search algorithm:

  (1) Definition: binary search (Binary Search) is also known as half-interval search. It requires that the records in the linear table be ordered by key (usually ascending) and that the table use sequential storage. The basic idea is: in the ordered table, take the middle record as the comparison target. If the given value equals the middle record's key, the search succeeds; if the given value is less than the middle record's key, continue searching in the left half; if it is greater, continue searching in the right half. Repeat this process until the search succeeds or the search region contains no records, in which case the search fails.

  

  (2) Basic implementation:

// Binary search: returns the index of key in the sorted array a, or -1 if key is not present
int binarySearch(int[] a, int key)
{
    int low = 0;             // lowest index: the first record
    int high = a.length - 1; // highest index: the last record
    while (low <= high)
    {
        int mid = (high + low) / 2; // index of the middle record
        if (key < a[mid])           // key is smaller than the middle value
            high = mid - 1;         // move the upper bound to just below the middle
        else if (key > a[mid])      // key is larger than the middle value
            low = mid + 1;          // move the lower bound to just above the middle
        else
            return mid;             // equal: mid is the position found
    }
    return -1;
}

  

  (3) Time complexity:

  Binary search is equivalent to arranging the static ordered table as a binary tree with two subtrees at each node: every comparison discards half of the remaining records, so the work is halved at each step and the search is very efficient.

  By property 4 of binary trees, "a complete binary tree with n nodes has depth ⌊log2 n⌋ + 1", the worst case (finding a key, or failing to find it) takes ⌊log2 n⌋ + 1 comparisons, and the best case takes 1. The time complexity of binary search is therefore O(log n), clearly far better than the O(n) of sequential search.
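
  The ⌊log2 n⌋ + 1 bound can be checked empirically. The sketch below is a C variant of the binary search above that also counts how many elements it examines; the names `binary_search_count` and `max_probes` are hypothetical, and the midpoint is computed as `low + (high - low) / 2`, a common substitution that avoids integer overflow for very large arrays.

```c
/* Binary search over a sorted int array that also reports how many
   elements were examined. Returns the index of key, or -1 if absent. */
int binary_search_count(const int *a, int n, int key, int *probes)
{
    int low = 0, high = n - 1;
    *probes = 0;
    while (low <= high) {
        int mid = low + (high - low) / 2;  /* same midpoint, no overflow */
        ++*probes;                         /* one more element examined */
        if (key < a[mid])
            high = mid - 1;
        else if (key > a[mid])
            low = mid + 1;
        else
            return mid;
    }
    return -1;
}

/* floor(log2(n)) + 1 for n >= 1, computed by counting binary digits. */
int max_probes(int n)
{
    int bound = 0;
    while (n > 0) {
        ++bound;
        n >>= 1;
    }
    return bound;
}
```

  For a table of 1000 sorted values, `max_probes(1000)` is 10, and no search, successful or not, examines more than 10 elements.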

  The disadvantage is that it places strict demands on the table structure: it works only on an ordered table in sequential storage, so the data must be sorted before searching, and sorting is itself time-consuming. It is therefore not suitable for linear tables that change dynamically.

 

 

Fourth, the binary sort tree algorithm:

  (1) Definition: a binary sort tree is also known as a binary search tree. It is either an empty tree or a binary tree with the following properties:

      => If its left subtree is not empty, the values of all nodes in the left subtree are less than the value of its root;

      => If its right subtree is not empty, the values of all nodes in the right subtree are greater than the value of its root;

      => Its left and right subtrees are themselves binary sort trees

      An in-order traversal of a binary sort tree yields an ordered sequence, for example {35,37,47,51,58,62,73,88,93,99}
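
  The in-order property can be sketched in C as follows. This is a minimal illustration rather than the article's own code: the `Node` type and the functions `bst_insert` and `bst_inorder` are hypothetical names, and the keys are assumed to be distinct integers.

```c
#include <stdlib.h>

typedef struct Node {
    int data;
    struct Node *lchild, *rchild;
} Node;

/* Insert key into the tree rooted at *root; duplicates are ignored.
   Returns 1 if a node was added, 0 otherwise. */
int bst_insert(Node **root, int key)
{
    while (*root != NULL) {
        if (key == (*root)->data)
            return 0;
        root = (key < (*root)->data) ? &(*root)->lchild : &(*root)->rchild;
    }
    *root = malloc(sizeof(Node));
    if (*root == NULL)
        return 0;                       /* allocation failed */
    (*root)->data = key;
    (*root)->lchild = (*root)->rchild = NULL;
    return 1;
}

/* In-order traversal: left subtree, then the node, then the right subtree.
   Appends visited values to out and returns the updated count. */
int bst_inorder(const Node *t, int *out, int count)
{
    if (t == NULL)
        return count;
    count = bst_inorder(t->lchild, out, count);
    out[count++] = t->data;
    return bst_inorder(t->rchild, out, count);
}
```

  Inserting 62, 88, 58, 47, 35, 73, 51, 99, 37, 93 in that order and then calling `bst_inorder(root, out, 0)` fills `out` with 35, 37, 47, 51, 58, 62, 73, 88, 93, 99.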

  

  (2) Implementation:

      => If the search tree is empty, the search fails;

      => If the search tree is not empty, then:

        ① If the given key equals the key of the root, the search succeeds and the process ends; otherwise go to ②

        ② If the given key is less than the key of the root, continue searching in the left subtree of the root; otherwise go to ③

        ③ If the given key is greater than the key of the root, continue searching in the right subtree of the root.

// Search for key in the binary sort tree; return true if the search succeeds and false if it fails
// false is returned in two cases: 1. the tree is empty; 2. the tree is not empty but does not contain key
boolean searchBST(int key)
{
    BiTreeNode current = root;
    while (current != null)
    {
        if (key == current.data)
            return true;
        else if (key < current.data)
            current = current.lchild;
        else
            current = current.rchild;
    }
    return false;
}

 

   (3) Time complexity:

    Ideally the binary sort tree is fairly balanced, with the same depth as a complete binary tree, ⌊log2 n⌋ + 1, so searching takes O(log n) time, similar to binary search. In the extreme unbalanced case the tree is a skewed tree, and searching takes O(n) time, equivalent to sequential search. How to keep a binary sort tree balanced therefore becomes a question that has to be considered.
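
    The effect of insertion order on depth can be seen with a small experiment. The sketch below uses hypothetical names (`HNode`, `h_insert`, `h_depth`, `depth_after_inserts`) and plain, unbalanced insertion; it builds a tree from a key sequence and reports its depth in nodes.

```c
#include <stdlib.h>

typedef struct HNode {
    int data;
    struct HNode *lchild, *rchild;
} HNode;

/* Plain (unbalanced) insertion; duplicates and failed allocations are skipped. */
static void h_insert(HNode **root, int key)
{
    while (*root != NULL) {
        if (key == (*root)->data)
            return;
        root = (key < (*root)->data) ? &(*root)->lchild : &(*root)->rchild;
    }
    *root = malloc(sizeof(HNode));
    if (*root != NULL) {
        (*root)->data = key;
        (*root)->lchild = (*root)->rchild = NULL;
    }
}

/* Depth counted in nodes: an empty tree has depth 0. */
static int h_depth(const HNode *t)
{
    int l, r;
    if (t == NULL)
        return 0;
    l = h_depth(t->lchild);
    r = h_depth(t->rchild);
    return 1 + (l > r ? l : r);
}

/* Release every node of the tree. */
static void h_free(HNode *t)
{
    if (t == NULL)
        return;
    h_free(t->lchild);
    h_free(t->rchild);
    free(t);
}

/* Build a tree from keys[0..n-1] in the given order and return its depth. */
int depth_after_inserts(const int *keys, int n)
{
    HNode *root = NULL;
    int i, depth;
    for (i = 0; i < n; ++i)
        h_insert(&root, keys[i]);
    depth = h_depth(root);
    h_free(root);
    return depth;
}
```

    With the ten keys 62, 88, 58, 47, 35, 73, 51, 99, 37, 93 inserted in that order the depth is 5; inserting the same keys in ascending order produces a right-skewed tree of depth 10, which is the O(n) case described above.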

  

  (4) Basic operations on a binary sort tree:

#include <stdio.h>
#include <stdlib.h>

#define OK 1
#define ERROR 0
#define TRUE 1
#define FALSE 0
#define MAXSIZE 100 /* initial allocation of storage space */

typedef int Status; /* Status is the return type of functions; its value is a result status code such as OK */

/* Binary linked list node structure for a binary tree */
typedef struct BiTNode /* node structure */
{
    int data;                        /* node data */
    struct BiTNode *lchild, *rchild; /* left and right child pointers */
} BiTNode, *BiTree;

/* Recursively search the binary sort tree T for key. */
/* Pointer f points to the parent of T; its initial value on the first call is NULL. */
/* If the search succeeds, p points to the node holding that data element and TRUE is returned; */
/* otherwise p points to the last node visited on the search path and FALSE is returned. */
Status SearchBST(BiTree T, int key, BiTree f, BiTree *p)
{
    if (!T) /* search failed */
    {
        *p = f;
        return FALSE;
    }
    else if (key == T->data) /* search succeeded */
    {
        *p = T;
        return TRUE;
    }
    else if (key < T->data)
        return SearchBST(T->lchild, key, T, p); /* continue in the left subtree */
    else
        return SearchBST(T->rchild, key, T, p); /* continue in the right subtree */
}

/* If the binary sort tree T contains no data element whose key equals key, */
/* insert key and return TRUE; otherwise return FALSE. */
Status InsertBST(BiTree *T, int key)
{
    BiTree p, s;
    if (!SearchBST(*T, key, NULL, &p)) /* search failed */
    {
        s = (BiTree)malloc(sizeof(BiTNode));
        s->data = key;
        s->lchild = s->rchild = NULL;
        if (!p)
            *T = s;        /* insert s as the new root */
        else if (key < p->data)
            p->lchild = s; /* insert s as the left child */
        else
            p->rchild = s; /* insert s as the right child */
        return TRUE;
    }
    else
        return FALSE; /* a node with the same key already exists; do not insert */
}

/* Delete node p from the binary sort tree and reattach its left or right subtree. */
Status Delete(BiTree *p)
{
    BiTree q, s;
    if ((*p)->rchild == NULL) /* right subtree empty: just reattach the left subtree (a leaf also takes this branch) */
    {
        q = *p;
        *p = (*p)->lchild;
        free(q);
    }
    else if ((*p)->lchild == NULL) /* left subtree empty: just reattach the right subtree */
    {
        q = *p;
        *p = (*p)->rchild;
        free(q);
    }
    else /* neither subtree is empty */
    {
        q = *p;
        s = (*p)->lchild;
        while (s->rchild) /* go left, then right to the end (find the predecessor of the node to delete) */
        {
            q = s;
            s = s->rchild;
        }
        (*p)->data = s->data; /* s is the immediate predecessor; its value replaces the deleted node's value */
        if (q != *p)
            q->rchild = s->lchild; /* reattach the right subtree of q */
        else
            q->lchild = s->lchild; /* reattach the left subtree of q */
        free(s);
    }
    return TRUE;
}

/* If the binary sort tree T contains a data element whose key equals key, */
/* delete that node and return TRUE; otherwise return FALSE. */
Status DeleteBST(BiTree *T, int key)
{
    if (!*T) /* no data element with this key exists */
        return FALSE;
    else
    {
        if (key == (*T)->data) /* found the data element with this key */
            return Delete(T);
        else if (key < (*T)->data)
            return DeleteBST(&(*T)->lchild, key);
        else
            return DeleteBST(&(*T)->rchild, key);
    }
}

int main(void)
{
    int i;
    int a[10] = {62,88,58,47,35,73,51,99,37,93};
    BiTree T = NULL;

    for (i = 0; i < 10; i++)
        InsertBST(&T, a[i]);
    DeleteBST(&T, 93);
    DeleteBST(&T, 47);

    return 0;
}

 

 

 Fifth, the balanced binary tree algorithm:

  (1) Definition:

    A balanced binary tree (Self-Balancing Binary Search Tree or Height-Balanced Binary Search Tree) is a binary sort tree in which the heights of the left and right subtrees of every node differ by at most 1. In other words, it is a height-balanced binary sort tree: it is either an empty tree, or its left and right subtrees are both balanced binary trees and the absolute difference between their depths does not exceed 1. The depth of a node's left subtree minus the depth of its right subtree is called the balance factor BF (Balance Factor), so the balance factor of every node in a balanced binary tree can only be -1, 0, or 1. The subtree rooted at the node that is nearest to the inserted node and whose balance factor has absolute value greater than 1 is called the minimal unbalanced subtree.
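
    The balance factor can be computed directly from the definition. The following sketch is only an illustration with hypothetical names (`AvlNode`, `avl_depth`, `balance_factor`, `is_balanced`); it recomputes depths recursively for clarity, whereas practical implementations usually cache a height or balance factor in each node.

```c
#include <stddef.h>

typedef struct AvlNode {
    int data;
    struct AvlNode *lchild, *rchild;
} AvlNode;

/* Depth counted in nodes: an empty tree has depth 0. */
int avl_depth(const AvlNode *t)
{
    int l, r;
    if (t == NULL)
        return 0;
    l = avl_depth(t->lchild);
    r = avl_depth(t->rchild);
    return 1 + (l > r ? l : r);
}

/* BF = depth of left subtree minus depth of right subtree. t must not be NULL. */
int balance_factor(const AvlNode *t)
{
    return avl_depth(t->lchild) - avl_depth(t->rchild);
}

/* Returns 1 if every node has a balance factor of -1, 0, or 1; otherwise 0.
   An empty tree counts as balanced. */
int is_balanced(const AvlNode *t)
{
    int bf;
    if (t == NULL)
        return 1;
    bf = balance_factor(t);
    if (bf < -1 || bf > 1)
        return 0;
    return is_balanced(t->lchild) && is_balanced(t->rchild);
}
```

    For a left-leaning chain 3 -> 2 -> 1 the root has balance factor 2, so the tree is not balanced; for a root 2 with children 1 and 3 the balance factor is 0.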

   (2) Basic algorithm:

   (This part is skipped for now; I will write it up after taking more time to read thoroughly about balancing adjustments.)

 

 

Sixth, an introduction to B-trees:

  (1) Background:

      The search algorithms discussed so far all run in memory. They suit smaller files, but they are a poor fit for larger files stored in external storage; for files of that scale, even a balanced binary tree gives low search efficiency.

      If the data set to be processed is so large that it cannot fit in memory, the data has to be moved in and out of memory pages from a hard disk or another storage device again and again. Once such an external device is involved, the way time complexity is calculated changes: the time to access an element of the collection is no longer just a function of the number of comparisons needed to find it. The access time of the external storage device (such as a hard disk) and the number of separate accesses made to that device must also be taken into account.

   (2) Definition:

      In the trees discussed so far, a node may have several children but stores only one element; a binary tree is more restrictive still, since a node can have at most two children. Because each node stores only one element, when there are very many elements either the degree of the tree (the largest number of children of any node) is very large, or the height of the tree is very large, or both. This leads to a very large number of accesses to external storage, which clearly becomes a bottleneck for time efficiency. We therefore need to lift the restriction that each node stores only one element.

      Multi-way search tree (multi-way search tree): each node may have more than two children, and each node may store several elements.
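
      A node of a multi-way search tree might be laid out as in the sketch below. This is not a B-tree implementation (it has no insertion, splitting, or balancing); it only illustrates the idea of several ordered keys and several children per node. The names `MNode` and `mway_search` and the order `ORDER` are hypothetical.

```c
#include <stddef.h>

#define ORDER 4                    /* hypothetical: at most 4 children per node */

typedef struct MNode {
    int nkeys;                     /* number of keys in use, at most ORDER - 1 */
    int keys[ORDER - 1];           /* keys[0..nkeys-1] in ascending order */
    struct MNode *child[ORDER];    /* child[i] holds keys below keys[i];
                                      child[nkeys] holds keys above the last key */
} MNode;

/* Search the tree for key. On success returns the node containing it and
   stores the key's index within that node in *slot; otherwise returns NULL. */
const MNode *mway_search(const MNode *t, int key, int *slot)
{
    while (t != NULL) {
        int i = 0;
        while (i < t->nkeys && key > t->keys[i])
            ++i;                   /* skip keys smaller than the target */
        if (i < t->nkeys && key == t->keys[i]) {
            *slot = i;
            return t;
        }
        t = t->child[i];           /* descend into the matching key range */
    }
    return NULL;
}
```

      Each step down the tree corresponds to reading one node, so storing several keys per node reduces the number of nodes visited, which is the point of lifting the one-element-per-node restriction.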

   (3) The upgraded version, the B+ tree:

      A B+ tree is a variant of the B-tree. It differs from a B-tree in the following ways:

    • A node with k children must have k keys;
    • Non-leaf nodes serve only as an index; the information about the records is stored in the leaf nodes.
    • All leaf nodes of the tree form an ordered linked list, so all records can be traversed in key order.

      

         [Figure: comparison of a B-tree and a B+ tree]

 

 

 Seventh, hash table lookup (Hash Table) ★★★★★★★★ important

   (1) Definition:

      Hashing establishes a fixed correspondence f between a record's storage location and its key, so that every key corresponds to one storage location f(key). During a search, the given key is mapped through this correspondence to f(key); if the record exists in the set being searched, it must be at position f(key). The correspondence f is called the hash function. Records stored with this technique are placed in one contiguous block of storage, and this block is called a hash table (Hash Table). The storage location corresponding to a key is called its hash address.

   (2) Search steps:

      Hashing is both a storage method and a search method. The process has two steps: when storing, compute the record's hash address with the hash function and store the record at that address; when searching, compute the record's hash address with the same hash function and access the record at that address.

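   (3) Suitable scenarios

      Hashing differs from linear lists, trees, and graphs in that there is no logical relationship among the records; each record is related only to its key. Hashing is therefore mainly a search-oriented storage structure.   It is best suited to finding the record equal to a given value. It is not suitable when the same key corresponds to many records, or for range queries.   When two different keys key1 ≠ key2 nevertheless have f(key1) = f(key2), this is called a collision, and key1 and key2 are called synonyms of this hash function.

  (4) Ways to construct a hash function

      => Direct addressing: f(key) = a × key + b (a and b are constants)

      => Digit analysis

      => Mid-square method

      => Folding method

      => Division (remainder) method (the most commonly used hash function): for a hash table of length m, the hash function is f(key) = key mod p (p ≤ m).

      => Random number method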

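  (5) Ways to handle hash collisions

  Even the best-designed hash function cannot avoid collisions entirely.

  1. Open addressing: fi(key) = (f(key) + di) MOD m (di = 1,2,3,...,m-1)

  With open addressing, once a collision occurs, the next empty hash address is sought; as long as the hash table is large enough, an empty address can always be found, and the record is stored there.

  Suppose the key set is {12,67,56,16,25,37,22,29,15,47,48,34}, the table length is 12, and f(key) = key MOD 12

  Then f(12) = 0, f(67) = 7, f(56) = 8, f(16) = 4, f(25) = 1, but f(37) = 1 collides, so f(37) = (f(37) + 1) MOD 12 = 2. Continuing, f(22) = 10, f(29) = 5, f(15) = 3, f(47) = 11

  Next, f(48) = 0 collides; f(48) = (f(48) + 1) MOD 12 = 1 also collides; f(48) = (f(48) + 2) MOD 12 = 2 still collides; only at f(48) = (f(48) + 6) MOD 12 = 6 is there no collision.

  This form of open addressing is called linear probing.

  Cases like 48 and 37, which are not synonyms yet compete for the same address, are called clustering. Clustering means collisions must be handled over and over, and both storing and searching become much less efficient.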
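
  The worked example above can be reproduced with a short C sketch of linear probing. The names `lp_init`, `lp_insert`, and `lp_search` are hypothetical, and the sketch assumes non-negative integer keys so that -1 can mark an empty slot.

```c
#define TABLE_SIZE 12
#define EMPTY (-1)                 /* assumption: keys are non-negative */

/* Mark every slot as empty. */
void lp_init(int *table)
{
    int i;
    for (i = 0; i < TABLE_SIZE; ++i)
        table[i] = EMPTY;
}

/* Insert key using f(key) = key MOD 12 with offsets d = 0, 1, 2, ...
   Returns the slot used, or -1 if the table is full. */
int lp_insert(int *table, int key)
{
    int d;
    for (d = 0; d < TABLE_SIZE; ++d) {
        int pos = (key % TABLE_SIZE + d) % TABLE_SIZE;
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}

/* Follow the same probe sequence; stop at an empty slot or after a full cycle.
   Returns the slot holding key, or -1 if key is absent. */
int lp_search(const int *table, int key)
{
    int d;
    for (d = 0; d < TABLE_SIZE; ++d) {
        int pos = (key % TABLE_SIZE + d) % TABLE_SIZE;
        if (table[pos] == EMPTY)
            return -1;
        if (table[pos] == key)
            return pos;
    }
    return -1;
}
```

  Inserting the twelve keys in the order given places 37 in slot 2, 48 in slot 6, and 34 in slot 9, matching the addresses worked out by hand.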

  

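  When key = 34, f(key) = 10, but there is no empty position after 22, while there is an empty position just before it. Repeatedly taking remainders does eventually give a result, but it is inefficient. This can be improved with di = 1², -1², 2², -2², ..., q², -q² (q ≤ m/2), which amounts to looking for a possible empty position in both directions.

  The purpose of squaring is to keep keys from gathering in one region. This method is called quadratic probing: fi(key) = (f(key) + di) MOD m (di = 1², -1², 2², -2², ..., q², -q² (q ≤ m/2))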
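
  A minimal sketch of two-directional quadratic probing follows. The name `qp_insert` is hypothetical, keys are assumed to be non-negative, and -1 marks an empty slot. The backward probe adds m before taking the remainder because the `%` operator in C can return a negative value for a negative operand.

```c
#define QP_EMPTY (-1)              /* assumption: keys are non-negative */

/* Insert key into a table of length m, probing the home slot first and then
   offsets +1^2, -1^2, +2^2, -2^2, ..., up to q = m / 2.
   Returns the slot used, or -1 if no empty slot was found. */
int qp_insert(int *table, int m, int key)
{
    int home = key % m;
    int q;
    if (table[home] == QP_EMPTY) {
        table[home] = key;
        return home;
    }
    for (q = 1; q <= m / 2; ++q) {
        int sq = (q * q) % m;              /* offset reduced modulo m */
        int fwd = (home + sq) % m;         /* probe in the positive direction */
        int back = (home - sq + m) % m;    /* probe in the negative direction */
        if (table[fwd] == QP_EMPTY) {
            table[fwd] = key;
            return fwd;
        }
        if (table[back] == QP_EMPTY) {
            table[back] = key;
            return back;
        }
    }
    return -1;
}
```

  With the table left by the linear probing example (only slot 9 empty), inserting 34 probes slot 10, then 11, then 9, and stores the key in slot 9 after two collisions instead of eleven.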

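  The displacement di can also be computed with a random function, which is called random probing. With the same random seed, each series of calls to the random function produces the same sequence of numbers, so during a search the same seed yields the same di values and therefore the same hash addresses. fi(key) = (f(key) + di) MOD m (di is a pseudo-random sequence)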

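  2. Rehashing

  Rehashing means preparing several hash functions in advance, fi(key) = RHi(key) (i = 1,2,...,k). Whenever a hash address collides, switch to another hash function. This method keeps keys from clustering, but it also increases computation time.

  3. Separate chaining

  All records whose keys are synonyms are stored in one singly linked list, called a synonym sublist, and the hash table stores only the head pointers of all the synonym sublists. However many collisions occur, each one simply adds a node to the linked list at the current position.

  For example: index 0 → 48 → 12, index 1 → 37 → 25 ...

  For hash functions that may cause many collisions, separate chaining guarantees that an address can always be found. Of course, it also brings the performance cost of traversing a singly linked list during a search.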
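
  Separate chaining can be sketched in C as follows. The names `ChainNode`, `chain_insert`, and `chain_search` are hypothetical; new nodes are inserted at the head of each list, which is one way to obtain the order shown in the example (48 before 12, 37 before 25).

```c
#include <stdlib.h>

#define BUCKETS 12

typedef struct ChainNode {
    int key;
    struct ChainNode *next;
} ChainNode;

/* Insert key at the head of its synonym sublist.
   Returns 1 on success, 0 if memory could not be allocated. */
int chain_insert(ChainNode **heads, int key)
{
    ChainNode *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->key = key;
    n->next = heads[key % BUCKETS];    /* old head becomes the second node */
    heads[key % BUCKETS] = n;
    return 1;
}

/* Walk the sublist for key's bucket. Returns the node, or NULL if absent. */
const ChainNode *chain_search(ChainNode **heads, int key)
{
    const ChainNode *n = heads[key % BUCKETS];
    while (n != NULL && n->key != key)
        n = n->next;
    return n;
}
```

  The table itself is just an array of head pointers, for example `ChainNode *heads[BUCKETS] = {NULL};`, and a search costs one hash computation plus a walk along a single sublist.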

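  4. Common overflow area

  With a common overflow area, all keys that collide with a previously placed key, here {37,48,34}, are stored in one shared overflow table.

  During a search, after the hash address of the given value has been computed with the hash function, the value is first compared with the corresponding position in the base table. If they are equal, the search succeeds; if not, the overflow table is searched sequentially.

  When there are few colliding records relative to the base table, the common overflow area structure still gives very good search performance.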
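
  A sketch of the common overflow area in C is shown below. The type `OverflowTable`, the functions `ot_init`, `ot_insert`, and `ot_contains`, and the overflow capacity are all hypothetical; keys are assumed to be non-negative so that -1 can mark an empty base slot.

```c
#define BASE_SIZE 12
#define OVERFLOW_CAP 12            /* hypothetical capacity of the overflow table */
#define SLOT_EMPTY (-1)            /* assumption: keys are non-negative */

typedef struct {
    int base[BASE_SIZE];           /* base table, indexed by key MOD 12 */
    int overflow[OVERFLOW_CAP];    /* colliding keys, in arrival order */
    int overflow_count;
} OverflowTable;

void ot_init(OverflowTable *t)
{
    int i;
    for (i = 0; i < BASE_SIZE; ++i)
        t->base[i] = SLOT_EMPTY;
    t->overflow_count = 0;
}

/* Returns 0 if key went into the base table, 1 if it went into the
   overflow table, and -1 if the overflow table is full. */
int ot_insert(OverflowTable *t, int key)
{
    int home = key % BASE_SIZE;
    if (t->base[home] == SLOT_EMPTY) {
        t->base[home] = key;
        return 0;
    }
    if (t->overflow_count == OVERFLOW_CAP)
        return -1;
    t->overflow[t->overflow_count++] = key;
    return 1;
}

/* Check the base slot first, then scan the overflow table sequentially. */
int ot_contains(const OverflowTable *t, int key)
{
    int i;
    if (t->base[key % BASE_SIZE] == key)
        return 1;
    for (i = 0; i < t->overflow_count; ++i)
        if (t->overflow[i] == key)
            return 1;
    return 0;
}
```

  Inserting the twelve example keys in order leaves exactly 37, 48, and 34 in the overflow table, so most searches finish with a single comparison against the base table.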

 

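Eighth, a worked hashing example: Hashing

Problem statement

    This problem can be solved with a single array. First initialize every element of the array to 0, then map each input into the array by hashing. If the element at the mapped index is 0, set it to the input value; otherwise step through quadratic probing in the positive direction. You may wonder when the probing loop can stop. In the code below the loop tries cnt = 1 to size - 1, because (cnt + size)² MOD size equals cnt² MOD size, so larger values of cnt only revisit positions that have already been tried.

    As for 1, it is not a prime (a prime is a natural number greater than 1 that has no factors other than 1 and itself), so for 1 the smallest prime is 2.

    First we need a function that returns a prime: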

int GetPrime(int x)
{
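    if (x <= 2) return 2;            /* 1 is not prime, and 2 is itself prime */
    int p, i;
    if (x % 2 == 1) p = x;           /* odd: start testing at x */
    else p = x + 1;                  /* even: start at the next odd number */
    while (1)
    {
        for (i = (int)sqrt(p); i >= 2; i--)
            if (p % i == 0) break;   /* found a divisor, so p is not prime */
        if (i == 1) break;           /* no divisor found, so p is prime */
        else p += 2;                 /* try the next odd number */
    }
    return p;
}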

    After that we can start writing the hashing part

    for(int i = 0; i < N; i++)
    {
        if (i != 0) printf(" ");
        scanf("%d", &x);
        pos = x % size;
        tempPos = pos;
        if (A[tempPos] == 0)
        {
            A[tempPos] = x;
            printf("%d", pos);
        }
        else
        {
            int cnt, flag = 0;
            for (cnt = 1; cnt < size; cnt++)
            {
                pos = (tempPos + cnt * cnt) % size; /* positive quadratic probe */
                if (A[pos] == 0)
                {
                    flag = 1;
                    A[pos] = x;
                    printf("%d", pos);
                    break;
                }
            }
            if (flag == 0) printf("-"); /* no free position was found */
        }
    }

    The complete accepted (AC) solution is shown below:

#include<stdio.h>
#include<stdlib.h>
#include<math.h>
#define MAX 99999999 /* much larger than needed; only the first `size` entries are used */
int size,N,x,pos,tempPos,A[MAX];
int GetPrime(int x)
{
    if (x <= 2) return 2;            /* 1 is not prime, and 2 is itself prime */
    int p, i;
    if (x % 2 == 1) p = x;
    else p = x + 1;
    while (1)
    {
        for (i = (int)sqrt(p); i >= 2; i--)
            if (p % i == 0) break;
        if (i == 1) break;
        else p += 2;
    }
    return p;
}

int main()
{
    scanf("%d %d", &size, &N);
    size = GetPrime(size);
    for (int i = 0; i < size; i++) /* initialization */
        A[i] = 0;
    for (int i = 0; i < N; i++)
    {
        if (i != 0) printf(" ");
        scanf("%d", &x);
        pos = x % size;
        tempPos = pos;
        if (A[tempPos] == 0)
        {
            A[tempPos] = x;
            printf("%d", pos);
        }
        else
        {
            int cnt, flag = 0;
            for (cnt = 1; cnt < size; cnt++)
            {
                pos = (tempPos + cnt * cnt) % size;
                if (A[pos] == 0)
                {
                    flag = 1;
                    A[pos] = x;
                    printf("%d", pos);
                    break;
                }
            }
            if (flag == 0) printf("-");
        }
    }
    return 0;
}

 

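 Ninth, self-summary and reflection:

  The data mining course is drawing to a close, the final project is due on June 15, and my freshman year is almost over, so it is time to focus and start wrapping things up.

  (1) Finish studying and reviewing basic Python, and complete the data mining web crawler assignment

  (2) Keep working through statistical learning, and complete the data mining algorithm assignment

  (3) Prepare next Tuesday's paper presentation on multimodal scene analysis

  (4) Continue studying ACM algorithms: the sequence automaton

  (5) Review coursework and prepare for the related exams and certification tests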


Origin www.cnblogs.com/WinniyGD/p/10941439.html