Data Structures and Algorithms·Chapter 4 [String]

A string is a finite sequence of zero or more characters, usually written enclosed in a pair of double quotes, such as: “a string”
It can be understood as the counterpart of C++'s string.

Basic operations

StrAssign, StrCompare, StrLength, Concat and SubString form the basic operation set.
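Other operations can be built from these basic ones. The C sketch below writes Index using only StrLength, SubString and StrCompare. It assumes the fixed-length SString layout described in the next section, with the length stored in element 0. The StrAssign helper (which copies from a C string), the pointer-style parameters and the 0/1 return codes are illustrative assumptions. They are not the text's own definitions.

```c
#include <string.h>

#define MAXSTRLEN 255
typedef unsigned char SString[MAXSTRLEN + 1];   /* S[0] holds the length */

/* Assumed helper: copy a C string into S, truncating beyond MAXSTRLEN. */
void StrAssign(SString S, const char *chars) {
    size_t n = strlen(chars);
    if (n > MAXSTRLEN) n = MAXSTRLEN;
    S[0] = (unsigned char)n;
    for (size_t k = 1; k <= n; k++) S[k] = (unsigned char)chars[k - 1];
}

int StrLength(const SString S) { return S[0]; }

/* Returns >0, 0 or <0 as S is greater than, equal to or less than T. */
int StrCompare(const SString S, const SString T) {
    for (int k = 1; k <= S[0] && k <= T[0]; k++)
        if (S[k] != T[k]) return S[k] - T[k];
    return S[0] - T[0];
}

/* Sub = S[pos..pos+len-1]; returns 0 if pos or len is out of range. */
int SubString(SString Sub, const SString S, int pos, int len) {
    if (pos < 1 || pos > S[0] || len < 0 || len > S[0] - pos + 1) return 0;
    for (int k = 1; k <= len; k++) Sub[k] = S[pos + k - 1];
    Sub[0] = (unsigned char)len;
    return 1;
}

/* Index expressed only through StrLength, SubString and StrCompare. */
int Index(const SString S, const SString T, int pos) {
    int n = StrLength(S), m = StrLength(T);
    SString sub;
    if (pos < 1 || m == 0) return 0;          /* T is assumed non-empty */
    for (int i = pos; i <= n - m + 1; i++) {
        SubString(sub, S, i, m);              /* always in range here */
        if (StrCompare(sub, T) == 0) return i;
    }
    return 0;
}
```

For example, after StrAssign(s, "ababcabcacbab") and StrAssign(t, "abcac"), Index(s, t, 1) returns 6.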

Fixed-length sequential storage representation

  #define  MAXSTRLEN  255  // the user may define any maximum string length up to 255
  typedef unsigned char SString[MAXSTRLEN + 1];   // element 0 stores the string length

A char array is used to store the string, with the string length kept in element 0 at the beginning of the array.
The actual length of the string can be anything within this predefined maximum. Any part of a string value that exceeds the predefined length is discarded; this is called "truncation".
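Concatenation on the fixed-length representation shows where truncation happens. Below is a minimal C sketch. It uses plain C pointer parameters instead of C++ references. The name Concat matches the basic operation listed earlier. The bool return value (true when nothing was cut off) and the assumption that T does not alias S2 are illustrative choices.

```c
#include <stdbool.h>

#define MAXSTRLEN 255
typedef unsigned char SString[MAXSTRLEN + 1];   /* element 0 holds the length */

/* T = S1 followed by S2. Characters of S2 that do not fit within
   MAXSTRLEN are discarded ("truncation"). Returns true if nothing was
   cut off. Assumes T is not the same array as S2. */
bool Concat(SString T, const SString S1, const SString S2) {
    int n1 = S1[0], n2 = S2[0];
    int keep = n2;                                /* characters of S2 that fit */
    if (n1 + n2 > MAXSTRLEN) keep = MAXSTRLEN - n1;
    for (int k = 1; k <= n1; k++) T[k] = S1[k];
    for (int k = 1; k <= keep; k++) T[n1 + k] = S2[k];
    T[0] = (unsigned char)(n1 + keep);
    return keep == n2;
}
```

Take a 254-character S1 and a 2-character S2. Only the first character of S2 fits, so the result has length 255 and Concat returns false.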

Basic operations

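Status SubString(SString &Sub, SString S, int pos, int len) {
    // Return in Sub the substring of S that starts at the pos-th character and has length len,
    // where 1 ≤ pos ≤ StrLength(S) and 0 ≤ len ≤ StrLength(S) - pos + 1.
    if (pos < 1 || pos > S[0] || len < 0 || len > S[0] - pos + 1)
        return ERROR;
    for (int k = 1; k <= len; ++k)   // Sub[1..len] = S[pos..pos+len-1]
        Sub[k] = S[pos + k - 1];
    Sub[0] = len;
    return OK;
} // SubString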

Note the +1 in S[0]-pos+1: positions pos through S[0] inclusive contain S[0]-pos+1 characters.

Heap-allocated storage representation

typedef struct {
   char *ch;      // for a non-empty string, storage is allocated to fit its length; otherwise ch is NULL
   int  length;   // string length
} HString;

The string type provided by C is usually implemented with this storage method. The system uses malloc() and free() to manage string storage dynamically, allocating a storage area for each newly created string. The free storage area from which all string values are allocated is called the "heap". Strings in C are terminated by a null character, so their length is implicit. In short: storage for a string is allocated dynamically with malloc() and released with free().
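The following C sketch assigns a C string to an HString, which makes the two length conventions easy to compare. strlen() finds the implicit length by scanning for the null character. The HString stores that length explicitly in length and allocates exactly that many bytes, with no terminator. The function name StrAssign, the pointer parameter and the 1/0 return codes are assumptions for illustration.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *ch;      /* NULL for the empty string */
    int length;    /* explicit length, no '\0' stored */
} HString;

/* Assumes T was initialised (ch == NULL or a block from malloc). */
int StrAssign(HString *T, const char *chars) {
    size_t n = strlen(chars);   /* implicit length: scan up to '\0' */
    free(T->ch);                /* free(NULL) is a no-op */
    T->ch = NULL;
    T->length = 0;
    if (n == 0) return 1;       /* empty string: ch stays NULL */
    T->ch = malloc(n);          /* exactly n bytes, no terminator */
    if (T->ch == NULL) return 0;
    memcpy(T->ch, chars, n);
    T->length = (int)n;
    return 1;
}
```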

Basic operations

Status StrInsert(HString &S, int pos, HString T) {
    // Insert T before the pos-th character of S, where 1 ≤ pos ≤ StrLength(S) + 1.
    if (pos < 1 || pos > S.length + 1) return ERROR;   // pos is invalid
    if (T.length) {                                     // T is non-empty: enlarge S, then insert T
        if (!(S.ch = (char *) realloc(S.ch, (S.length + T.length) * sizeof(char))))
            exit(OVERFLOW);
        for (int i = S.length - 1; i >= pos - 1; --i)   // shift the tail right to make room for T
            S.ch[i + T.length] = S.ch[i];
        for (int k = 0; k < T.length; ++k)              // S.ch[pos-1..pos+T.length-2] = T.ch[0..T.length-1]
            S.ch[pos - 1 + k] = T.ch[k];
        S.length += T.length;
    }
    return OK;
} // StrInsert

Pay attention to the target range S.ch[pos-1..pos+T.length-2] in the insertion step; details like this are easy to overlook.
The shifting loop starts at i = S.length-1 because pos counts positions from 1, i.e. [1]……[length], whereas the allocated characters are actually stored from index 0, i.e. [0]……[length-1].
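Here is a self-contained C version of StrInsert for experimenting with the index arithmetic. It rests on a few assumptions. Pointer parameters replace the C++ references, and failure returns 0 instead of calling exit(). realloc's result goes through a temporary, so S is left intact if allocation fails. T must not be the same object as S.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *ch;      /* NULL for the empty string */
    int length;
} HString;

/* Insert T before the pos-th character of S (1 <= pos <= S->length + 1).
   Assumes T is a different object from S. */
int StrInsert(HString *S, int pos, const HString *T) {
    if (pos < 1 || pos > S->length + 1) return 0;      /* invalid pos */
    if (T->length == 0) return 1;                      /* nothing to insert */
    char *p = realloc(S->ch, (size_t)(S->length + T->length));
    if (p == NULL) return 0;                           /* S is unchanged */
    S->ch = p;
    /* 1-based pos maps to 0-based index pos-1: shift ch[pos-1..length-1] right */
    for (int i = S->length - 1; i >= pos - 1; --i)
        S->ch[i + T->length] = S->ch[i];
    /* fill ch[pos-1 .. pos+T->length-2] with T */
    for (int k = 0; k < T->length; k++)
        S->ch[pos - 1 + k] = T->ch[k];
    S->length += T->length;
    return 1;
}
```

Take S = "abef" and insert "cd" at pos = 3. The characters "ef" move from indices 2..3 to 4..5. Indices 2..3 (that is, pos-1..pos+T.length-2) then receive "cd", giving "abcdef".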

realloc()

  • If ptr is NULL, realloc(NULL, size) behaves like malloc(size).
  • If size is 0, many implementations behave like free(ptr) and release the memory that ptr points to, but the C standard does not guarantee this, so portable code should call free() explicitly.
  • If ptr is not NULL and size is larger than the original allocation, realloc tries to extend the existing block to size bytes. If the block cannot be extended in place, it allocates a new block, copies the old contents into it, frees the old block and returns the new address. If no memory is available at all, it returns NULL and leaves the original block untouched.
  • If ptr is not NULL and size is not larger than the original allocation, realloc shrinks the block to size bytes, and the excess may be returned to the memory management system.
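These rules suggest a defensive wrapper. The StrInsert above assigns realloc's result straight back to S.ch. That is acceptable there only because the program exits on failure. Code that wants to recover should hold the result in a temporary first. The helper name Resize is hypothetical. Size 0 is handled with an explicit free() rather than by relying on realloc(ptr, 0).

```c
#include <stdlib.h>

/* Resize the block *buf to n bytes.
   On failure, returns 0 and leaves *buf valid and unchanged.
   n == 0 uses free() directly, since realloc(ptr, 0) is not portable. */
int Resize(char **buf, size_t n) {
    if (n == 0) {
        free(*buf);
        *buf = NULL;
        return 1;
    }
    char *p = realloc(*buf, n);   /* realloc(NULL, n) acts like malloc(n) */
    if (p == NULL) return 0;      /* the old block is still allocated */
    *buf = p;                     /* contents up to min(old, n) are preserved */
    return 1;
}
```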

Status SubString(HString &Sub, HString S, int pos, int len) {
    // Return in Sub the substring of S that starts at the pos-th character and has length len.
    // Assumes Sub was initialised (ch == NULL or a block from malloc).
    if (pos < 1 || pos > S.length || len < 0 || len > S.length - pos + 1) return ERROR;
    if (Sub.ch) free(Sub.ch);                       // release the old storage of Sub
    if (!len) {                                     // empty substring
        Sub.ch = NULL;
        Sub.length = 0;
    } else {                                        // non-empty substring
        Sub.ch = (char *)malloc(len * sizeof(char));
        if (!Sub.ch) exit(OVERFLOW);
        for (int i = 0; i < len; i++) Sub.ch[i] = S.ch[pos + i - 1];
        Sub.length = len;
    }
    return OK;
} // SubString

Once again: special cases (such as an empty substring, len = 0) must be checked and handled explicitly.
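Other heap-string operations need the same care with special cases. Below is a minimal C sketch of Concat for HString, one of the basic operations listed at the start of the chapter. It assumes pointer parameters and 1/0 return codes. It covers two special cases. First, an empty result is stored as ch == NULL. Second, the new buffer is filled before the old one is freed, so T may be the same object as S1 or S2.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *ch;      /* NULL for the empty string */
    int length;
} HString;

/* T = S1 + S2. Assumes T was initialised (ch == NULL or a malloc'd block). */
int Concat(HString *T, const HString *S1, const HString *S2) {
    int n1 = S1->length, n2 = S2->length;
    char *p = NULL;                             /* stays NULL for an empty result */
    if (n1 + n2 > 0) {
        p = malloc((size_t)(n1 + n2));
        if (p == NULL) return 0;                /* T is left unchanged */
        if (n1) memcpy(p, S1->ch, (size_t)n1);  /* skip memcpy for empty inputs */
        if (n2) memcpy(p + n1, S2->ch, (size_t)n2);
    }
    free(T->ch);                                /* safe even if T aliases S1 or S2 */
    T->ch = p;
    T->length = n1 + n2;
    return 1;
}
```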

Block-linked (chunk) storage representation of strings

A linked list can also be used to store a string value. Since each data element of a string is a single character, each node of the list can hold either one character or several characters.


#define CHUNKSIZE 80  // block size, user-definable
typedef struct Chunk {
    char ch[CHUNKSIZE];
    struct Chunk* next;
} Chunk;
typedef struct {
    Chunk *head, *tail; // head and tail pointers of the string
    int curlen;         // current length of the string
} LString;

In practice, the node size can be chosen to suit the problem.
For example, in a text editor the whole editing area can be treated as one string and each line as a substring occupying one node: the characters of a line are kept in a fixed-length block (80 characters), and the lines are linked together by pointers.
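Below is a minimal C sketch of building and reading a block-linked string. It sets CHUNKSIZE to 4 (instead of 80) so that a short string spans several nodes. The unused slots of the last node are padded with '#'. That padding is a common convention, assumed here because the text does not say what fills them. The helper names LStrAssign, LStrAt and LStrDestroy are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define CHUNKSIZE 4   /* small block size for illustration; the text uses 80 */

typedef struct Chunk {
    char ch[CHUNKSIZE];
    struct Chunk *next;
} Chunk;

typedef struct {
    Chunk *head, *tail;   /* first and last node */
    int curlen;           /* number of real characters */
} LString;

/* Free every node and reset L to the empty string. */
void LStrDestroy(LString *L) {
    Chunk *c = L->head;
    while (c) { Chunk *next = c->next; free(c); c = next; }
    L->head = L->tail = NULL;
    L->curlen = 0;
}

/* Build L from a C string. Assumes L holds no nodes yet. Slots past the
   end of the string in the last node are padded with '#'. */
int LStrAssign(LString *L, const char *chars) {
    int n = (int)strlen(chars);
    L->head = L->tail = NULL;
    L->curlen = 0;
    for (int k = 0; k < n; k += CHUNKSIZE) {
        Chunk *c = malloc(sizeof *c);
        if (c == NULL) { LStrDestroy(L); return 0; }
        for (int j = 0; j < CHUNKSIZE; j++)
            c->ch[j] = (k + j < n) ? chars[k + j] : '#';
        c->next = NULL;
        if (L->tail) L->tail->next = c; else L->head = c;
        L->tail = c;
    }
    L->curlen = n;
    return 1;
}

/* Character at 1-based position pos; assumes 1 <= pos <= curlen. */
char LStrAt(const LString *L, int pos) {
    const Chunk *c = L->head;
    int idx = pos - 1;
    while (idx >= CHUNKSIZE) { c = c->next; idx -= CHUNKSIZE; }
    return c->ch[idx];
}
```

For "abcdefghij", the list holds the nodes "abcd", "efgh" and "ij##". LStrAt reaches the ⌈pos/CHUNKSIZE⌉-th node before indexing into it.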

String pattern matching algorithms

Naive pattern matching

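int Index(SString S, SString T, int pos) {
    // Return the position of the first occurrence of T in S at or after the pos-th character,
    // or 0 if there is none. T is non-empty and 1 <= pos <= StrLength(S).
    int i = pos;
    int j = 1;
    while (i <= S[0] && j <= T[0]) {   // element 0 stores the string length
        if (S[i] == T[j]) {
            ++i;
            ++j;                       // go on comparing the following characters
        } else {
            i = i - j + 2;
            j = 1;                     // back up the pointers and restart matching
        }
    }
    if (j > T[0]) return i - T[0];
    else return 0;
} // Index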

The meaning of i = i - j + 2: when a mismatch occurs, the pattern slides one position to the right relative to the current alignment, and matching restarts from T[1]. At that moment i has already advanced past the j - 1 characters matched so far, so the current alignment began at i - (j - 1) = i - j + 1. The next alignment begins one position later, at i - j + 2, and j is reset to 1.
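Here is a worked instance of the formula, using S = "ababcabcacbab" and T = "abcac" as an assumed example. The first alignment starts at position 1. It matches S[1..2] and fails at i = 3, j = 3, so:

$$
i_{\text{start}} = i - (j - 1) = 3 - 2 = 1, \qquad
i_{\text{new}} = i_{\text{start}} + 1 = i - j + 2 = 3 - 3 + 2 = 2
$$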

Index details like this deserve careful attention. I don't yet know whether exams will deduct points for them, but when writing your own code you must still avoid out-of-bounds accesses.

In the worst case the time complexity is O(n*m), where n and m are the lengths of S and T.
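The O(n*m) worst case appears when almost every alignment matches up to the last pattern character. The C sketch below adds a comparison counter to the naive Index. The counter parameter and the StrAssign helper are assumptions. Let S be n - 1 zeros followed by a 1, and T be m - 1 zeros followed by a 1. Then each of the n - m + 1 alignments costs exactly m comparisons.

```c
#include <string.h>

#define MAXSTRLEN 255
typedef unsigned char SString[MAXSTRLEN + 1];   /* S[0] holds the length */

/* Assumed helper: copy a C string into S (assumes strlen(chars) <= MAXSTRLEN). */
void StrAssign(SString S, const char *chars) {
    int n = (int)strlen(chars);
    S[0] = (unsigned char)n;
    for (int k = 1; k <= n; k++) S[k] = (unsigned char)chars[k - 1];
}

/* The naive Index from the text, counting character comparisons in *cmp. */
int IndexCount(const SString S, const SString T, int pos, long *cmp) {
    int i = pos, j = 1;
    *cmp = 0;
    while (i <= S[0] && j <= T[0]) {
        (*cmp)++;
        if (S[i] == T[j]) { ++i; ++j; }
        else { i = i - j + 2; j = 1; }   /* restart one position further on */
    }
    return j > T[0] ? i - T[0] : 0;
}
```

For n = 8 and m = 3 the counter reaches (8 - 3 + 1) * 3 = 18.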

KMP algorithm

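int Index_KMP(SString S, SString T, int pos) {
    // KMP algorithm: using the next function of pattern T, return the position of T in S
    // at or after the pos-th character, or 0 if none. T is non-empty and 1 ≤ pos ≤ StrLength(S).
    int next[MAXSTRLEN + 1];
    GetNext(T, next);
    int i = pos;
    int j = 1;
    while (i <= S[0] && j <= T[0]) {    // element 0 stores the string length
        if (j == 0 || S[i] == T[j]) {
            ++i;
            ++j;                        // go on comparing the following characters
        } else {
            j = next[j];                // slide the pattern to the right
        }
    }
    if (j > T[0]) {
        return i - T[0];                // match found: i is one past the last matched character
    } else {
        return 0;
    }
} // Index_KMP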

This algorithm is somewhat more involved.
Computing next is a typical exam item, so it is worth working through a few patterns by hand.
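For reference, here is the definition of next as it is commonly stated for 1-based patterns. The exact formulation is an addition, not from the original text. One pattern is worked out by hand below it.

$$
\mathrm{next}[j] =
\begin{cases}
0, & j = 1\\
\max\{\,k \mid 1 < k < j \text{ and } T_1 \cdots T_{k-1} = T_{j-k+1} \cdots T_{j-1}\,\}, & \text{if this set is non-empty}\\
1, & \text{otherwise}
\end{cases}
$$

$$
\begin{array}{c|cccccccc}
j & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8\\ \hline
T_j & a & b & a & a & b & c & a & c\\
\mathrm{next}[j] & 0 & 1 & 1 & 2 & 2 & 3 & 1 & 2
\end{array}
$$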

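void GetNext(SString T, int *next) {
    // next[j]: which pattern position to compare next after a mismatch at T[j]
    int i = 1;
    int j = 0;
    next[1] = 0;
    while (i < T[0]) {
        if (j == 0 || T[i] == T[j]) {   // the matched prefix grows by one, so next[i+1] = j+1
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];                // fall back to a shorter matching prefix
        }
    }
}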
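Here is a self-contained C version of GetNext and Index_KMP for checking hand-computed cases. It assumes a StrAssign helper that copies a C string into an SString, and it uses plain C (non-reference) parameters. The matching loop is the one shown above, with the success value i - T[0].

```c
#include <string.h>

#define MAXSTRLEN 255
typedef unsigned char SString[MAXSTRLEN + 1];   /* S[0] holds the length */

/* Assumed helper: copy a C string into S (assumes strlen(chars) <= MAXSTRLEN). */
void StrAssign(SString S, const char *chars) {
    int n = (int)strlen(chars);
    S[0] = (unsigned char)n;
    for (int k = 1; k <= n; k++) S[k] = (unsigned char)chars[k - 1];
}

/* next[1..T[0]] as in the text; next[1] = 0 means "advance in S". */
void GetNext(const SString T, int next[]) {
    int i = 1, j = 0;
    next[1] = 0;
    while (i < T[0]) {
        if (j == 0 || T[i] == T[j]) { ++i; ++j; next[i] = j; }
        else j = next[j];
    }
}

/* Position of T in S at or after pos, or 0. Assumes T is non-empty. */
int Index_KMP(const SString S, const SString T, int pos) {
    int next[MAXSTRLEN + 1];
    int i = pos, j = 1;
    GetNext(T, next);
    while (i <= S[0] && j <= T[0]) {
        if (j == 0 || S[i] == T[j]) { ++i; ++j; }
        else j = next[j];                 /* i never moves backwards */
    }
    return j > T[0] ? i - T[0] : 0;
}
```

For T = "abaabcac" the computed next values are 0 1 1 2 2 3 1 2. For S = "acabaabaabcacaabc", Index_KMP(S, T, 1) returns 6.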

Exercise

KMP match



Origin blog.csdn.net/qq_61786525/article/details/130934784