Article Directory

Preface
1. What is a string
2. Fixed-length sequential storage of strings
3. Heap allocation storage of strings
4. Block chain storage of strings
5. BF algorithm (string pattern matching algorithm)
Summary

Preface

In data structures, strings are stored in a dedicated storage structure, called the string storage structure. Strictly speaking, the string storage structure is also a linear storage structure, because the characters in a string have a "one-to-one" logical relationship. Unlike the linear storage structures covered earlier, however, the string structure is used only to store character data.

1. What is a string

No matter which programming language you learn, strings are among the most frequently manipulated data. In data structures, some special strings are named according to the number and characteristics of the characters they store, for example:

Empty string: a string that stores 0 characters, such as S = "" (the two double quotes are adjacent);
Space string: a string that contains only space characters, such as S = "     " (the double quotes enclose 5 spaces);
Substring and main string: suppose there are two strings a and b. If a run of consecutive characters in a is exactly the same as b, then a is the main string of b, and b is a substring of a. For example, if a = "shujujiegou" and b = "shuju", then a contains "shuju", so a is the main string and b is its substring.

Note that a space string is different from an empty string: a space string does contain characters, but they are all spaces. In addition, b is a substring of a only when the whole of b appears in a as consecutive characters. For example, "shujiejugou" and "shuju" are not a main string and its substring.
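The difference between an empty string and a space string can be checked in a few lines of C. This is only a minimal sketch: the helper names is_empty_string and is_space_string are made up for illustration and are not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when s stores 0 characters. */
bool is_empty_string(const char *s) {
    assert(s != NULL);
    return s[0] == '\0';
}

/* Hypothetical helper: true when s is non-empty and every character is a space. */
bool is_space_string(const char *s) {
    assert(s != NULL);
    if (s[0] == '\0') {
        return false;               /* an empty string is not a space string */
    }
    /* strspn returns the length of the leading run made up only of spaces */
    return strspn(s, " ") == strlen(s);
}
```

With these helpers, "" is an empty string but not a space string, while a string of 5 spaces is a space string but not an empty string.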

In addition, for two strings with the relationship between the main string and the substring, you will usually be asked to use an algorithm to find the position of the substring in the main string. The position of the substring in the main string refers to the position of the first character of the substring in the main string.

For example, take string a = "shujujiegou" and string b = "jiegou". By inspection, a is the main string and b is a substring. The substring b is located at position 6 in the main string a, because the first character 'j' of string b is at position 6 in string a (counting from 1).
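The position described above can also be obtained with the standard library function strstr. The sketch below assumes positions are counted from 1 and that 0 means "not a substring"; the helper name substring_position is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1-based position of sub in main_str, or 0 if sub does not occur. */
size_t substring_position(const char *main_str, const char *sub) {
    assert(main_str != NULL && sub != NULL);
    const char *hit = strstr(main_str, sub);   /* pointer to the first occurrence, or NULL */
    if (hit == NULL) {
        return 0;
    }
    return (size_t)(hit - main_str) + 1;       /* convert the 0-based offset to a 1-based position */
}
```

Called with "shujujiegou" and "jiegou" it returns 6, and with "shujiejugou" and "shuju" it returns 0, which agrees with the examples in this section.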

2. Fixed-length sequential storage of strings

Generally speaking, an array means a static array, such as str[10], whose length is fixed. In contrast, a dynamic array uses the malloc and free functions to request and release space dynamically, so its length can vary.

The fixed-length sequential storage structure of a string can be simply understood as using a "fixed-length sequential storage structure" to store strings, so the underlying implementation can only use static arrays.

When using a fixed-length sequential storage structure to store a character string, it is necessary to apply for sufficient memory space in advance based on the length of the target character string.

The following C code shows a string stored in a fixed-length sequential storage structure (example):

#include <stdio.h>
int main()
{
    // 11 characters plus the terminating '\0' need 12 bytes
    char str[12]="shujujiegou";
    printf("%s\n",str);
    return 0;
}
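Textbook versions of fixed-length storage often keep the current length next to the array. The following is a minimal sketch under that assumption. The names SString, MAXLEN and sstring_assign are chosen only for illustration, and the capacity of 15 is arbitrary.

```c
#include <assert.h>
#include <string.h>

#define MAXLEN 15  /* assumed capacity, chosen only for illustration */

typedef struct {
    char data[MAXLEN + 1];  /* fixed-size buffer, +1 for the terminating '\0' */
    int length;             /* number of characters actually stored */
} SString;

/* Copy src into s. Returns 1 if src fit, 0 if it was truncated to MAXLEN characters. */
int sstring_assign(SString *s, const char *src) {
    assert(s != NULL && src != NULL);
    size_t n = strlen(src);
    int fits = (n <= MAXLEN);
    if (!fits) {
        n = MAXLEN;          /* the array cannot grow, so the extra characters are dropped */
    }
    memcpy(s->data, src, n);
    s->data[n] = '\0';
    s->length = (int)n;
    return fits;
}
```

The truncation branch shows the main limitation of this structure: once the array is full, extra characters are simply lost, which is why enough space has to be reserved in advance.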

3. Heap allocation storage of strings

The specific implementation of the heap allocation storage of strings is to use dynamic arrays to store strings.

Generally, a programming language divides the memory occupied by a program into several areas, and the program's data is stored in the area that matches its category. In C, for example, memory is divided into 4 areas: the heap area, the stack area, the data area and the code area. The heap area is the focus of this section.

Unlike other areas, the memory space of the heap area needs to be manually applied for by the programmer using the malloc function, and it must be released manually through the free function after it is not used.

The most common use of the malloc function in C is to allocate space for an array. Such arrays are called dynamic arrays. For example:

char * a = (char*)malloc(5*sizeof(char));

This line of code creates a dynamic array a and uses malloc to request heap storage for 5 values of type char.

The advantage of dynamic arrays over ordinary arrays (static arrays) is that the length is variable. In other words, dynamic arrays can apply for more heap space as needed (using the realloc function):

a = (char*)realloc(a, 10*sizeof(char));
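Putting malloc and realloc together, a heap-allocated string that grows when text is appended might look like the sketch below. The helper names hstr_new and hstr_append are hypothetical. The result of realloc is stored in a temporary pointer first, so the original block is not lost if the request fails.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL if allocation fails. */
char *hstr_new(const char *src) {
    assert(src != NULL);
    size_t n = strlen(src);
    char *s = malloc(n + 1);            /* +1 for the terminating '\0' */
    if (s == NULL) {
        return NULL;
    }
    memcpy(s, src, n + 1);
    return s;
}

/* Hypothetical helper: appends tail to the heap string *s, growing it with realloc.
   tail must not point into *s. Returns 1 on success; on failure returns 0 and
   leaves *s unchanged. */
int hstr_append(char **s, const char *tail) {
    assert(s != NULL && *s != NULL && tail != NULL);
    size_t old_len = strlen(*s);
    size_t add_len = strlen(tail);
    char *grown = realloc(*s, old_len + add_len + 1);
    if (grown == NULL) {
        return 0;                        /* the original block is still valid */
    }
    memcpy(grown + old_len, tail, add_len + 1);
    *s = grown;
    return 1;
}
```

For example, creating the string "shuju" with hstr_new and then appending "jiegou" yields "shujujiegou". The caller releases the memory with free when the string is no longer needed.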

4. Block chain storage of strings

The block chain storage of strings refers to the use of a linked list structure to store strings.

How many characters each node of the linked list stores can be decided with the following factors in mind:

The length of the string and the size of the storage space: if the string is long and the storage space available to the linked list is limited, each node should store as many characters as possible to improve space utilization (every additional node needs extra space for its pointer field); conversely, if the string is not particularly long or storage space is plentiful, other factors need to be weighed;
The function realized by the program: if the stored string will undergo many insertion or deletion operations in practice, each node should store as few characters as possible; otherwise, other factors need to be weighed.

These two points are only some of the factors that affect how many characters a node stores. In practice, they need to be weighed together against the actual environment.
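A block chain can be sketched in C as a linked list whose nodes each hold a small character array. The code below is only an illustration under stated assumptions: the node size CHUNK is fixed at 4, unused slots in the last node are filled with '#', and the names Chunk, chain_from, chain_nodes and chain_free are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 4      /* assumed number of characters per node */
#define PAD   '#'    /* assumed filler for unused slots in the last node */

typedef struct Chunk {
    char data[CHUNK];       /* characters stored in this node */
    struct Chunk *next;     /* pointer field: the per-node overhead */
} Chunk;

/* Release every node of the chain. */
void chain_free(Chunk *head) {
    while (head != NULL) {
        Chunk *next = head->next;
        free(head);
        head = next;
    }
}

/* Build a block chain holding src. Returns NULL for an empty string or if allocation fails. */
Chunk *chain_from(const char *src) {
    assert(src != NULL);
    size_t len = strlen(src);
    Chunk *head = NULL, *tail = NULL;
    for (size_t pos = 0; pos < len; pos += CHUNK) {
        Chunk *node = malloc(sizeof *node);
        if (node == NULL) {
            chain_free(head);
            return NULL;
        }
        for (size_t k = 0; k < CHUNK; k++) {
            node->data[k] = (pos + k < len) ? src[pos + k] : PAD;
        }
        node->next = NULL;
        if (tail == NULL) {
            head = node;
        } else {
            tail->next = node;
        }
        tail = node;
    }
    return head;
}

/* Count the nodes in the chain. */
size_t chain_nodes(const Chunk *head) {
    size_t n = 0;
    for (; head != NULL; head = head->next) {
        n++;
    }
    return n;
}
```

With CHUNK set to 4, the string "shujujiegou" takes 3 nodes ("shuj", "ujie", "gou#"). A larger CHUNK means fewer pointer fields, while a smaller CHUNK means fewer characters to shift on insertion or deletion, which is the trade-off described above.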

5. BF algorithm (string pattern matching algorithm)

1. BF algorithm principle

The ordinary pattern matching algorithm (the BF, or brute-force, algorithm) uses no special tricks. It simply compares the characters of one string with the characters of the other, one by one, until it reaches a result.
For example, the ordinary pattern matching algorithm determines whether string A ("abcac") is a substring of string B ("ababcabcacbab") as follows:

First, align the first character of string A with the first character of string B, then compare the corresponding characters one by one, as shown in Figure 1:

Figure 1 Schematic diagram of the first pattern matching of a string

In Figure 1, the third characters of string A and string B do not match, so string A is moved back by one character and matched against string B again, as shown in Figure 2:

Figure 2 Schematic diagram of the second pattern matching of a string

In Figure 2 the two strings fail to match again, so string A moves back by one more character, as shown in Figure 3:

Figure 3 Schematic diagram of the third pattern matching of a string

In Figure 3 the match fails once more. String A keeps moving back one character at a time until it reaches the position shown in Figure 4, where the match succeeds:

Figure 4 Schematic diagram of successful string pattern matching

Therefore, string A is matched against string B 6 times before the match succeeds. The pattern matching process shows that string A is a substring of string B (string B is the main string of string A).
Next, we write code that carries out the pattern matching shown in Figure 1 ~ Figure 4.

2. BF algorithm implementation

The idea behind the implementation of the BF algorithm is: store the two strings A and B specified by the user using the fixed-length sequential storage structure, then carry out the pattern matching process in a loop. The C implementation is as follows:

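#include <stdio.h>
#include <string.h>
// Ordinary (BF) pattern matching: B is the candidate main string, A is the candidate substring.
// Returns the 1-based position of A in B, or 0 if A does not occur in B.
int mate(const char *B, const char *A)
{
    // Compute the lengths once instead of calling strlen on every iteration
    int lenB = (int)strlen(B), lenA = (int)strlen(A);
    int i = 0, j = 0;
    while (i < lenB && j < lenA)
    {
        if (B[i] == A[j])
        {
            i++;
            j++;
        }
        else
        {
            // Mismatch: move A back by one character relative to the current alignment
            i = i - j + 1;
            j = 0;
        }
    }
    // The loop ends in one of two ways: i == lenB means the main string is exhausted and the match failed;
    // j == lenA means the whole substring was matched inside the main string
    if (j == lenA)
    {
        return i - lenA + 1;
    }
    // Reaching this point means i == lenB without a complete match
    return 0;
}
int main()
{
    int number = mate("ababcabcacbab", "abcac");
    printf("%d", number);
    return 0;
}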

Program running results: 6

3. BF algorithm time complexity

The best-case time complexity of this algorithm is O(n), where n is the length of string A. This happens when the very first alignment matches.

The worst-case time complexity of the BF algorithm is O(n*m), where n is the length of string A and m is the length of string B. For example, take string B as "0000000001" and string A as "01". At every alignment, the comparison must reach the last character of string A before the mismatch is detected, so the total number of character comparisons is on the order of n*m.
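To make these counts concrete, the BF loop can be instrumented with two counters. This is a sketch for illustration only: the struct BFStats and the function bf_stats are hypothetical names, and the loop follows the same steps as the BF implementation in this article.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    int position;     /* 1-based match position, 0 if A does not occur in B */
    int alignments;   /* how many start positions in B were tried */
    int comparisons;  /* how many single-character comparisons were made */
} BFStats;

/* BF matching of substring A against main string B, with counters added. */
BFStats bf_stats(const char *B, const char *A) {
    assert(B != NULL && A != NULL);
    int lenB = (int)strlen(B), lenA = (int)strlen(A);
    BFStats st = {0, 0, 0};
    int i = 0, j = 0;
    while (i < lenB && j < lenA) {
        if (j == 0) {
            st.alignments++;     /* j == 0 means a new alignment of A starts here */
        }
        st.comparisons++;
        if (B[i] == A[j]) {
            i++;
            j++;
        } else {
            i = i - j + 1;       /* move A back by one character */
            j = 0;
        }
    }
    if (j == lenA) {
        st.position = i - lenA + 1;
    }
    return st;
}
```

For B = "ababcabcacbab" and A = "abcac", this reports position 6 after 6 alignments, matching the walkthrough in Figure 1 ~ Figure 4. For B = "0000000001" and A = "01", it reports 9 alignments and 18 comparisons, close to the n*m = 20 bound.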

Summary

That is everything covered today. This article gives only a brief introduction to strings in data structures.
