[C language data structures 7] Implementing strings

string

1. What is a string

A string is also a kind of linear list. One might think that a string is simply a linear list of characters, but that is not quite accurate. With an ordinary linear list, we tend to focus on single elements, each of which has an independent meaning. For example, if we use a linear list to store the scores of a class, the element type could be defined as follows:

typedef struct{
    char num[11];	// student ID (10 digits plus the terminating '\0')
    char name[10];	// name
    float scores;	// score
}Student;

Suppose the table stores the following elements:

Student ID    Name     Score
1809111001    zack     97.5
1809111002    rudy     94
1809111003    alice    96
1809111004    atom     99

Now suppose we take out the records of the two students whose IDs end in 01 and 04. We would only say that these are the scores of two students, or the top two students in the ranking, rather than treating the two records as a single whole. This is usually the case for linear lists.

For strings, things are a little different. For example this string:

Do not go gentle into that good night!

We take out some data:

gentle

We can call it a word. Let's take out another part:

good night!

We can call it a sentence. Precisely because a part of a string can have meaning as a whole, we need operations that work on substrings and on patterns within a string. These are described in detail later.

2. Representation of strings

Here we use a sequential storage structure to represent a string, which is similar to a sequential list:

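#define MAXSIZE 20
typedef struct{
    // ch[0] is unused; the characters are stored in ch[1..length]
    char ch[MAXSIZE+1];
    // number of characters currently stored
    int length;
}SString;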

Here we create a char array of length MAXSIZE+1. We do not store data in the element with subscript 0, so that the logical position of a character corresponds to its physical position in the array. Otherwise the structure is the same as a sequential list.

Besides the representation above, you can also represent a string the way the C language itself does: instead of storing the length, the character \0 marks the end of the string. However, obtaining the length of a string stored this way takes O(n) time.
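As a rough sketch of that difference, the two functions below obtain the length under each representation. The names SStringLength and CStringLength are hypothetical and chosen only for this illustration; CStringLength does the same job as the standard strlen.

```c
#include <stddef.h>

#define MAXSIZE 20

typedef struct {
    char ch[MAXSIZE + 1];  // ch[0] is unused; characters are stored in ch[1..length]
    int length;
} SString;

// Length stored in the structure: a single member read, O(1)
int SStringLength(const SString *S) {
    return S->length;
}

// C-style string: walk the array until '\0' is found, O(n)
size_t CStringLength(const char *str) {
    size_t n = 0;
    while (str[n] != '\0') {
        ++n;
    }
    return n;
}
```

The price of the O(1) version is that every operation that changes the string must keep the length member up to date.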

3. Implementing string operations

(1) Assignment of strings

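String assignment is very simple; it is just a loop:

void StringAssign(SString *S, char *str){
    int i = 0;
    // Start from an empty string
    S->length = 0;
    // Copy until '\0' is reached or the string is full
    while (str[i] != '\0' && i < MAXSIZE){
        // Assign the character from the array to the string
        S->ch[i+1] = str[i];
        ++S->length;
        ++i;
    }
}

Because C array subscripts start from 0 while the string stores its characters from position 1, str[i] is assigned to S->ch[i+1].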
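Going the other way, from an SString back to an ordinary C string, is the same loop with the offset reversed. The following is a minimal sketch; StringToC is a hypothetical helper that does not appear in the original code, and it assumes the caller supplies a large enough buffer.

```c
#define MAXSIZE 20

typedef struct {
    char ch[MAXSIZE + 1];  // ch[0] is unused; characters are stored in ch[1..length]
    int length;
} SString;

// Hypothetical helper: write S into buf as a '\0'-terminated C string.
// buf must have room for at least S->length + 1 characters.
void StringToC(const SString *S, char *buf) {
    for (int i = 1; i <= S->length; ++i) {
        buf[i - 1] = S->ch[i];  // position i in the string is index i-1 in a C array
    }
    buf[S->length] = '\0';
}
```

A helper like this makes it easy to print a string with printf("%s") while testing the other operations.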

(2) Copying of strings

The copy operation is similar to assignment. It is also a simple loop, but the source is another SString instead of a character array:

int StringCopy(SString *S1, SString S2){
    // Return 0 if the source string is empty
    if (!S2.length){
        return 0;
    }
    // Walk through S2 and assign its characters to S1 one by one
    for(int i = 1; i <= S2.length; i++){
        S1->ch[i] = S2.ch[i];
    }
    // Update the length of the destination string S1
    S1->length = S2.length;
    return 1;
}

We don't need to care about the original content of S1. We just overwrite it character by character and then update the length of S1. Logically, the copy is now complete. Physically, there may still be old characters after the new end of S1, but since they lie beyond its length we can ignore them.
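The point about leftover characters can be checked with a small sketch. StringCopy below follows the same logic as the function above; LeftoverAfterCopy is a hypothetical function written only for this illustration, and it fills the structures directly with initializers instead of calling StringAssign.

```c
#define MAXSIZE 20

typedef struct {
    char ch[MAXSIZE + 1];  // ch[0] is unused; characters are stored in ch[1..length]
    int length;
} SString;

// Overwrite S1 with the characters of S2, then update the length
int StringCopy(SString *S1, SString S2) {
    if (!S2.length) {
        return 0;
    }
    for (int i = 1; i <= S2.length; i++) {
        S1->ch[i] = S2.ch[i];
    }
    S1->length = S2.length;
    return 1;
}

// Hypothetical check: copy "go" over "gentle" and return the character that
// physically remains just past the new logical end of S1
char LeftoverAfterCopy(void) {
    SString S1 = { " gentle", 6 };  // the leading space fills the unused ch[0]
    SString S2 = { " go", 2 };
    StringCopy(&S1, S2);
    return S1.ch[S1.length + 1];    // S1.ch[3], still 'n' from "gentle"
}
```

After the copy, S1 logically holds go, while the old n is still sitting in S1.ch[3]. Every operation only looks at positions 1 to length, so the stale character is never seen.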

(3) Find the length

We simply return the length member of the string:

int StringLength(SString S){
    return S.length;
}

(4) String comparison

Strings are compared by comparing the ASCII values of their characters one by one:

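int StringCompare(SString S1, SString S2){
    // Compare S1 and S2 character by character, up to the end of the shorter string
    for(int i = 1; i <= S1.length && i <= S2.length; i++){
        // If the characters at this position differ
        if(S1.ch[i] != S2.ch[i]){
            // Return their difference
            return S1.ch[i] - S2.ch[i];
        }
    }
    // Otherwise return the difference of the lengths
    return S1.length - S2.length;
}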

In the loop, we check whether the characters at the same position are equal. If they are not, we return the difference between the current character of S1 and the current character of S2. In other words, the result is decided by the first mismatched character. For example, for the following pairs:

abc    <    abd
acd    <    add

If every pair of corresponding characters matches, we compare the lengths of the strings and return their difference. If the two strings are the same, the function returns 0. If S1 is "greater than" S2, the function returns a number greater than 0; otherwise it returns a number less than 0.
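The rule can be tried out with a small sketch. StringComparePtr is a hypothetical variant written for this illustration: it takes pointers instead of copying the whole structures, and it compares positions 1 up to the length of the shorter string before falling back to the difference of the lengths.

```c
#define MAXSIZE 20

typedef struct {
    char ch[MAXSIZE + 1];  // ch[0] is unused; characters are stored in ch[1..length]
    int length;
} SString;

// Hypothetical pointer-based variant of the comparison
int StringComparePtr(const SString *S1, const SString *S2) {
    // Only positions that exist in both strings can be compared
    int n = S1->length < S2->length ? S1->length : S2->length;
    for (int i = 1; i <= n; i++) {
        if (S1->ch[i] != S2->ch[i]) {
            return S1->ch[i] - S2->ch[i];  // decided by the first mismatch
        }
    }
    return S1->length - S2->length;        // one string is a prefix of the other
}
```

With this sketch, comparing abc with abd returns -1 (the difference 'c' - 'd'), and comparing abc with ab returns 1, the difference of the lengths.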

(5) Extracting a substring

A substring is a string consisting of any number of consecutive characters in the string. For example, we have a string:

Do not go gentle into that good night!

The following are its substrings:

Do
 not
t go gentle
good

A substring must occur in the original string, and its characters must be contiguous.

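Extracting a substring is very simple. Here we extract by position: the function copies the characters from position pos1 up to, but not including, position pos2, so the result has pos2-pos1 characters:

int SubString(SString S1, SString *S2, int pos1, int pos2){
    // Return 0 if the positions are invalid
    if(pos1 < 1 || pos2 > S1.length+1 || pos2 <= pos1){
        return 0;
    }
    // Copy the characters S1.ch[pos1..pos2-1] into S2, starting at S2->ch[1]
    for(int i = pos1, j = 1; i < pos2; i++, j++){
        S2->ch[j] = S1.ch[i];
    }
    // Set the length of S2
    S2->length = pos2-pos1;
    return 1;
}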
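Another common way to describe the part to extract is a start position plus a number of characters. The following is a minimal sketch of that convention; SubStringLen is a hypothetical name and is not part of the original code.

```c
#define MAXSIZE 20

typedef struct {
    char ch[MAXSIZE + 1];  // ch[0] is unused; characters are stored in ch[1..length]
    int length;
} SString;

// Hypothetical variant: copy len characters of S, starting at position pos, into Sub.
// Returns 1 on success, 0 if the requested range does not lie inside S.
int SubStringLen(SString S, SString *Sub, int pos, int len) {
    if (pos < 1 || len < 0 || pos + len - 1 > S.length) {
        return 0;
    }
    for (int j = 1; j <= len; j++) {
        Sub->ch[j] = S.ch[pos + j - 1];
    }
    Sub->length = len;
    return 1;
}
```

For the string good night!, position 1 with length 4 gives good, and position 6 with length 6 gives night!. Asking for 7 characters from position 6 is rejected because the string has only 11 characters.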

The last two operations, locating a substring and pattern matching, are discussed separately below.

4. Locating substrings and pattern matching

Locating a substring means finding the position where the substring first appears in the original string. For example, take the following string and three of its substrings:

Do not go gentle into that good night!
Do
not
nt

The position of Do is 1 and the position of not is 4. The substring nt appears twice in the string, so we use the position of its first occurrence, which is 13. Let's take a look at how to find a substring.

(1) Locating a substring

Locating a substring is a process of repeated string comparison. We first point the pointer i at the beginning of the original string and compare the len characters starting at i with the substring (where len is the length of the substring). As shown in the figure:

[Figure: the part of the original string starting at i that is compared with the substring, outlined in red]

The part in the red box is the part that is taken out and compared with the substring. If it is equal to the substring, we return i as the position of the substring in the original string. If it is not, we increase i and try again, until fewer than len characters remain in the original string.

The code is implemented as follows:

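int IndexSubString(SString S1, SString S2){
    // Holds the part cut out of the original string
    SString temp;
    // InitString is not defined in this article; it is assumed to make temp an empty string
    InitString(&temp);
    // The substring cannot be longer than the original string
    if(S1.length < S2.length){
        return 0;
    }
    // Try every start position i that leaves room for S2.length characters
    for(int i = 1; i+S2.length-1 <= S1.length; i++){
        // Cut out the S2.length characters starting at i; skip this position if the range is rejected
        if(!SubString(S1, &temp, i, i+S2.length)){
            continue;
        }
        // Compare the cut-out part with the substring
        int result = StringCompare(temp, S2);
        // If they are equal, return i (the position of the substring)
        if(result == 0){
            return i;
        }
    }
    return 0;
}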

Strictly speaking, locating a substring presupposes that the substring occurs in the original string. The code above also handles the case where it does not, by returning 0. A search performed when we do not know in advance whether the substring occurs in the original string is called pattern matching. Pattern matching also covers matching against special rules, so the term is broader than locating a substring.

(2) Pattern matching

In the algorithm above, we cut out a temporary string for each comparison. This step is not actually necessary; it was included only to make the code easier to follow. The code without the auxiliary string is as follows:

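int IndexSubString(SString S, SString T) {
    // k points to the first position of the part of S being compared
    int k = 1;
    // i and j point to the positions being compared in the original string and in the pattern string
    int i = k, j = 1;
    // Compare until either string runs out
    while (i <= S.length && j <= T.length){
        if(S.ch[i] == T.ch[j]){
            // The current characters match, so move on to the next pair
            i++;
            j++;
        }else{
            // Mismatch: move k to the next start position, then restart i from k and j from 1
            k++;
            i=k;
            j=1;
        }
    }
    // The loop also ends when S runs out of characters, so check that all of T was matched
    if(j > T.length)
        return k;
    else
        return 0;
}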

The positions pointed to by the three pointers k, i and j are shown in the figure:

[Figure: positions of the pointers k, i and j in the original string and the pattern string]

After the loop, we still have to check whether j is greater than T.length. To see why, trace the matching of the following two strings:

abaccdo
cdoo

When the original string runs out, the loop exits normally. But j still points to the last o of the pattern string, which means the pattern has not been matched completely, so the algorithm should report a failed match. This is what the final if statement does.
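To follow this trace in code, here is a sketch of the same matching loop with a comparison counter added. IndexCount and its count parameter are hypothetical names introduced for this illustration, the loop stops as soon as i runs past the end of the original string, and MAXSIZE is raised to 64 only so that the longer example sentence fits.

```c
#define MAXSIZE 64  // assumption: larger than the 20 used earlier, so longer examples fit

typedef struct {
    char ch[MAXSIZE + 1];  // ch[0] is unused; characters are stored in ch[1..length]
    int length;
} SString;

// Hypothetical instrumented matcher: returns the position of the first occurrence
// of T in S (0 if there is none) and stores the number of character comparisons in *count
int IndexCount(SString S, SString T, int *count) {
    int k = 1;          // start of the current candidate in S
    int i = k, j = 1;   // positions being compared in S and in T
    *count = 0;
    while (i <= S.length && j <= T.length) {
        ++*count;
        if (S.ch[i] == T.ch[j]) {
            i++;
            j++;
        } else {
            k++;        // give up on this candidate and restart one position later
            i = k;
            j = 1;
        }
    }
    // Success only if the whole pattern was consumed
    return j > T.length ? k : 0;
}
```

For abaccdo and cdoo, this sketch returns 0 after 8 comparisons: one each at start positions 1 to 3, two at position 4, and three at position 5 before the original string runs out. Searching for cdo instead returns 5.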

Strictly speaking, both algorithms above are pattern matching algorithms. There is also an important string algorithm called KMP. Due to space limitations, it will be covered in a separate article.

Origin blog.csdn.net/ZackSock/article/details/119571032