0 Introduction
In C, strings and arrays have a lot in common, and the standard library provides many functions for working with them. So how closely are strings and arrays related? And as the main subject of this article, what makes strings special?
C has no explicit string data type: strings appear either as string constants or as the contents of character arrays. String constants suit strings that the program never modifies. All other strings must be stored in character arrays or in dynamically allocated memory.
This article introduces some commonly used string library functions, so that you can pick the most suitable one for each situation.
1 String Basics
A string is a sequence of zero or more characters ending in a byte whose bit pattern is all 0, the NUL byte. For example:
char message[] = "hello word";
2 String length
The length of a string is the number of characters it contains, excluding the terminating NUL byte. This is a common interview question.
The length of a string can be automatically calculated through the library function strlen().
For example:
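#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    char message[] = "hello word";
    // strlen returns a size_t, so print it with %zu rather than %d
    printf("Length of the string: %zu\n", strlen(message));
    system("pause");
    return 0;
}
Print output:
Length of the string: 10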
Note that strlen returns an unsigned value (of type size_t), so take special care when you use it to compare the lengths of two strings:
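#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    char message1[] = "hello word";
    char message2[] = "hello Shanghai";
    // Comparison method 1: compare the two lengths directly
    if (strlen(message1) >= strlen(message2))
        printf("String 1 is longer\n");
    else
        printf("String 2 is longer\n");
    // Comparison method 2: subtract the lengths (wrong, the unsigned result is never negative)
    if (strlen(message1) - strlen(message2) >= 0)
        printf("String 1 is longer\n");
    else
        printf("String 2 is longer\n");
    system("pause");
    return 0;
}
Print output:
String 2 is longer
String 1 is longer
Because strlen returns an unsigned value, the subtraction in comparison method 2 can never be negative: when the first string is shorter, the result wraps around to a very large positive number. The condition is therefore always true, and the conclusion is wrong.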
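One way to avoid this pitfall is to compare the two lengths directly instead of subtracting them. The sketch below wraps that comparison in a small helper; the name longer_or_equal is made up for this illustration and is not a library function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true when a is at least as long as b.
 * Comparing the two size_t values directly avoids the unsigned
 * wrap-around that strlen(a) - strlen(b) >= 0 suffers from. */
bool longer_or_equal(const char *a, const char *b)
{
    return strlen(a) >= strlen(b);
}
```

With the strings from the example above, longer_or_equal(message1, message2) is false, which matches comparison method 1.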
3 Unrestricted string functions
An unrestricted string function is one that takes no length argument: it decides where to stop solely from the NUL terminator of the strings passed to it.
3.1 Copying strings
Copying strings with strcpy is very common in development. The copy overwrites the beginning of the destination string, so some care is required.
For example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    char message1[] = "hello word";
    char message2[] = "hello Shanghai";
    size_t message2_len = strlen(message2);   // length before the copy
    printf("Length of string 2: %zu\n", strlen(message2));
    strcpy(message2, message1);
    printf("Length of string 2: %zu\n", strlen(message2));
    // Print every byte of the original length, including those after the new terminator
    for (size_t i = 0; i < message2_len; i++)
        printf("%c", message2[i]);
    system("pause");
    return 0;
}
Print output:
Length of string 2: 14
Length of string 2: 10
hello word hai
(The character between "word" and "hai" on the last line is the NUL byte, which most consoles display as a blank or not at all.)
You can see that after copying message1 to message2, the length of message2 has changed. Why is this?
This is because strcpy copies the terminator as well. When strlen() processes message2, it stops at that terminator and returns 10. Judging from the printed result, the rest of the original contents of message2 is still in the array.
Copying a longer string into a shorter array is a bug: strcpy does not check the size of the destination, so the extra characters are written past the end of the array and the behavior is undefined. A crash or corrupted neighboring data is the typical result.
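Since strcpy itself does no checking, the caller has to make sure the source fits. A minimal sketch of such a check follows; copy_if_fits is a hypothetical helper written for this article, and it assumes the caller knows the size of the destination array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits,
 * including the terminating NUL byte. dst_size is the total
 * size of the destination array in bytes. */
bool copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    if (strlen(src) + 1 > dst_size)
        return false;      /* not enough room: leave dst untouched */
    strcpy(dst, src);      /* safe, the size was just checked */
    return true;
}
```

A call such as copy_if_fits(message2, sizeof(message2), message1) then reports failure instead of overflowing the array.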
3.2 Concatenating strings
To concatenate strings, use the strcat() function. Its prototype is as follows:
char *strcat(char *dst, char const *src);
For example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    // The destination needs room for both strings plus the terminator
    char message1[32] = "hello word";
    char message2[] = "hello Shanghai";
    strcat(message1, message2);
    printf("%s\n", message1);
    system("pause");
    return 0;
}
Print output:
hello wordhello Shanghai
You can see that the two strings are joined directly. The length of the new string is the sum of the lengths of the two original strings. strcat does not check the size of the destination either, so the destination array must be large enough to hold the result, which is why message1 is declared with an explicit size here.
3.3 Function return values
strcpy and strcat both return their first argument, that is, a pointer to the destination array. Because a string passed as an argument is really just the address of its first character, this return value can be passed straight on as a string argument, so calls to these functions can be nested.
For example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    // The destination needs room for all three strings plus the terminator
    char message1[32] = "hello ";
    char message2[] = "word ";
    char message3[] = "Shanghai";
    // The inner strcat returns message1, which becomes the destination of the outer call
    strcat(strcat(message1, message2), message3);
    printf("%s\n", message1);
    system("pause");
    return 0;
}
Print output:
hello word Shanghai
For the sake of readability, though, it is often better not to nest the calls.
3.4 String comparison
For string comparison, there is one commonly used library function, strcmp. Its prototype is as follows:
int strcmp(char const *s1, char const *s2);
This function compares the characters of the two strings one by one until it finds a mismatch. There are two cases:
- At the first mismatch, the string whose character comes earlier in the character set (has the smaller ASCII code) is considered the smaller string;
- If one string is a prefix of the other, the shorter string is considered the smaller string.
Once a mismatched character is found, the result is known without comparing the rest of the strings. strcmp returns a value less than zero if s1 is smaller than s2, zero if the two strings are equal, and a value greater than zero if s1 is larger.
Look at the actual code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    char temp1[] = "hello";
    char temp2[] = "hello world";
    char temp3[] = "hello worLd";
    // Compare temp1 with temp2
    if (strcmp(temp1, temp2) == 0)
    {
        printf("temp1 = temp2\n");
    }
    else if (strcmp(temp1, temp2) > 0)
    {
        printf("temp1 > temp2\n");
    }
    else if (strcmp(temp1, temp2) < 0)
    {
        printf("temp1 < temp2\n");
    }
    printf("------------------\n");
    // Compare temp2 with temp3
    if (strcmp(temp2, temp3) == 0)
    {
        printf("temp2 = temp3\n");
    }
    else if (strcmp(temp2, temp3) > 0)
    {
        printf("temp2 > temp3\n");
    }
    else if (strcmp(temp2, temp3) < 0)
    {
        printf("temp2 < temp3\n");
    }
    printf("\n");
    system("pause");
    return 0;
}
Printout:
temp1 < temp2
------------------
temp2 > temp3
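Because strcmp reports an ordering and not just equality, it can serve as the basis of a sort. The following sketch sorts an array of strings with the standard qsort function; sort_strings and compare_strings are names invented for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* qsort hands the comparator pointers to the array elements;
 * each element here is itself a pointer to a string. */
static int compare_strings(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Hypothetical helper: sort an array of count strings in strcmp order. */
void sort_strings(const char **strings, size_t count)
{
    qsort(strings, count, sizeof strings[0], compare_strings);
}
```

Sorting the three strings from the example above puts "hello" first, then "hello worLd", then "hello world", because 'L' has a smaller ASCII code than 'l'.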
4 Limited-length string functions
When calling some library functions, you need to pass in the length of the string to be processed, so they are called length-limited string functions.
These functions provide a convenient mechanism to keep unexpectedly long strings from overflowing their target arrays.
There are several common functions:
char *strncpy(char *dst, char const *src, size_t len);
char *strncat(char *dst, char const *src, size_t len);
int strncmp(char const *s1, char const *s2, size_t len);
For example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    char message1[] = "hello ";
    char message2[32] = "hello Beijing ";   // large enough to hold the appended text
    char message3[] = "Shanghai";
    char message_all[] = "hello Beijing Shanghai";
    // Append message3 to message2, copy message1 over the start of the result,
    // then compare the result with message_all
    if (strncmp(strncpy(strncat(message2, message3, strlen(message3)), message1, strlen(message1)), message_all, strlen(message_all)) == 0)
        printf("The two are equal\n");
    else
        printf("The two are not equal\n");
    system("pause");
    return 0;
}
Printout:
The two are equal
This example is somewhat contrived, because each length passed in is simply the full length of the string involved, but it still shows how the functions are called.
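One detail is worth a separate illustration: when the source is at least len characters long, strncpy copies exactly len characters and does not append a NUL byte. The sketch below shows one common way to deal with that; copy_truncated is a hypothetical helper, and it assumes dst_size is greater than zero.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most dst_size - 1 characters of src
 * into dst and always terminate the result. dst_size must be > 0. */
void copy_truncated(char *dst, size_t dst_size, const char *src)
{
    strncpy(dst, src, dst_size - 1);   /* may leave dst without a NUL */
    dst[dst_size - 1] = '\0';          /* so add the terminator ourselves */
}
```

Copying "hello Beijing" into a 6-byte array this way yields "hello" instead of an unterminated array.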
5 Basics of string search
There are many functions in the standard library that search strings in various ways. These tools give C programmers a great deal of flexibility.
5.1 Find a character
There are two library functions for finding a specific character in a string.
char *strchr(char const *str, int ch);
char *strrchr(char const *str, int ch);
The former finds the first occurrence of a character and the latter finds its last occurrence. Each returns a pointer to that position, or NULL if the character does not occur in the string.
These two functions can be used like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    char message_all[] = "hello Beijing Shanghai";
    char *first_site, *last_site;
    first_site = strchr(message_all, 'h');
    last_site = strrchr(message_all, 'h');
    printf("Length of the string: %zu\n", strlen(message_all));
    // Subtract the start of the string to turn each pointer into an index (%td prints a ptrdiff_t)
    printf("First position of h: %td\n", first_site - message_all);
    printf("Last position of h: %td\n", last_site - message_all);
    system("pause");
    return 0;
}
Print output:
Length of the string: 22
First position of h: 0
Last position of h: 19
Note that these functions do not return the index of the character but a pointer to it, so you have to subtract the pointer to the first element of the string to get the index.
Note: The search is case-sensitive.
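Because strchr returns a pointer, the search can be resumed just past each hit. The sketch below uses this to count every occurrence of a character; count_char is a made-up helper name, not a library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count how many times ch occurs in str. */
size_t count_char(const char *str, int ch)
{
    size_t count = 0;
    const char *p;

    if (ch == '\0')
        return 0;   /* strchr would match the terminator; not counted here */

    p = strchr(str, ch);
    while (p != NULL) {
        count++;
        p = strchr(p + 1, ch);   /* continue just past the previous hit */
    }
    return count;
}
```

For "hello Beijing Shanghai" and 'h' this gives 3, while 'H' gives 0 because the search is case-sensitive.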
5.2 Find any of several characters
strpbrk is a more general function: it finds the first character in the target string that matches any character of a given group. Its prototype is as follows:
char *strpbrk(char const *str, char const *group);
It can be used like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    char message_all[] = "hello Beijing Shanghai";
    char *first_site;
    // Find the first character of message_all that is one of a, b, c, d, or e
    first_site = strpbrk(message_all, "abcde");
    printf("Length of the string: %zu\n", strlen(message_all));
    printf("Position of the first character matching abcde: %td\n", first_site - message_all);
    system("pause");
    return 0;
}
Print output:
Length of the string: 22
Position of the first character matching abcde: 1
It is easy to see that the first matched character is e, and its position is 1.
5.3 Find a substring
To find a substring in a string, we can use the strstr function, whose prototype is as follows:
char *strstr(char const *s1, char const *s2);
Here is an example of its use:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    char message_all[] = "hello Beijing Shanghai";
    char *first_site;
    first_site = strstr(message_all, "Beijing");
    printf("Length of the string: %zu\n", strlen(message_all));
    printf("Position of the first occurrence of Beijing: %td\n", first_site - message_all);
    system("pause");
    return 0;
}
Print output:
Length of the string: 22
Position of the first occurrence of Beijing: 6
In this search the whole substring has to match, not just some of its characters.
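strstr returns NULL when the substring is not found, so the result should be checked before it is used in pointer arithmetic. A minimal sketch, using a hypothetical helper named find_substring:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: index of the first occurrence of needle in
 * haystack, or -1 if needle does not occur. */
long find_substring(const char *haystack, const char *needle)
{
    const char *hit = strstr(haystack, needle);
    if (hit == NULL)
        return -1;               /* no match: there is no valid index */
    return (long)(hit - haystack);
}
```

Searching "hello Beijing Shanghai" for "Beijing" gives 6, while searching for "beijing" gives -1 because the match is case-sensitive.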
6 Advanced string search
The next set of functions simplifies the process of finding and extracting a substring from the beginning of a string.
6.1 Find a string prefix
The strspn and strcspn functions count characters at the beginning of a string. Their prototypes are as follows:
size_t strspn(char const *str, char const *group);
size_t strcspn(char const *str, char const *group);
Note that these two functions do not return pointers but a number of characters: strspn returns the number of leading characters of str that belong to group, and strcspn returns the number of leading characters that do not belong to group.
For specific usage, see an example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    size_t len1, len2;
    char buffer[] = "25,142,330,smith,J,239-4123";
    len1 = strspn(buffer, "0123456789");   // leading characters that are digits
    len2 = strcspn(buffer, ",");           // leading characters that are not commas
    printf("Leading characters matching 0123456789: %zu\n", len1);
    printf("Leading characters not matching ,: %zu\n", len2);
    system("pause");
    return 0;
}
Print output:
Leading characters matching 0123456789: 2
Leading characters not matching ,: 2
As the example shows, strspn counts characters from the beginning of the string for as long as they belong to the group. Here the comma at index 2 is not a digit, so the count stops there and the result is 2.
The strcspn function does the opposite: it counts leading characters that do not belong to the group. The 2 and the 5 at the beginning are not commas, but the next character is, so the result is again 2.
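The two functions combine naturally: strspn can skip unwanted leading characters and strcspn can then measure the field that follows. The sketch below locates the first whitespace-delimited word of a string; first_word and its parameters are invented for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store the start of the first word of str in
 * *start and return the length of that word (0 if there is none). */
size_t first_word(const char *str, const char **start)
{
    const char *whitespace = " \t\n";

    *start = str + strspn(str, whitespace);   /* skip leading whitespace */
    return strcspn(*start, whitespace);       /* count up to the next whitespace */
}
```

For the input "  hello Beijing" the helper points *start at the h and returns 5.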
6.2 Find tokens
A string often contains several separate parts, divided from each other by separator characters. To process these parts, they must first be extracted from the string.
The strtok function does exactly this: it isolates the individual parts, called tokens, from the string and discards the separators. Its prototype is as follows:
char *strtok(char *str, char const *sep);
Note:
- While the strtok function performs its task, it modifies the string it processes. If the source string must not be modified, make a copy and pass the copy to strtok.
- If the first parameter of strtok is not NULL, the function finds the first token of that string and saves its position in the string. If the first parameter is NULL, the function searches for the next token in the same string, starting from the saved position.
For example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    int add = 0;
    char buffer[] = "25,142,330,smith,J,239-4123";
    char *token = NULL;
    // The first call passes the string; later calls pass NULL to continue in the same string
    for (token = strtok(buffer, ","); token != NULL; token = strtok(NULL, ","))
    {
        printf("%s\n", token);
        add++;
    }
    printf("--------------------------\n");
    printf("Value of add: %d\n", add);
    system("pause");
    return 0;
}
Print output:
25
142
330
smith
J
239-4123
--------------------------
Value of add: 6
As the example shows, with the comma as the separator, each loop iteration yields one token until the whole string has been split. There are 6 tokens in total.
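Since strtok overwrites separators in the string it scans, a string that must stay intact has to be tokenized through a copy, as the note above says. A minimal sketch of that approach follows; count_tokens is a hypothetical helper, and returning -1 on allocation failure is a convention chosen only for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the sep-delimited tokens in str without
 * modifying str. Returns -1 if the temporary copy cannot be allocated. */
int count_tokens(const char *str, const char *sep)
{
    size_t size = strlen(str) + 1;
    char *copy = malloc(size);
    int count = 0;

    if (copy == NULL)
        return -1;
    memcpy(copy, str, size);   /* strtok will modify the copy, not str */

    for (char *token = strtok(copy, sep); token != NULL; token = strtok(NULL, sep))
        count++;

    free(copy);
    return count;
}
```

Called on the buffer from the example with "," as the separator, it returns 6 and leaves the buffer unchanged.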
7 Error messages
When a C library function fails, the operating system reports an error code (0, 1, 2, 3, ...) by setting the external integer variable errno. Each error code corresponds to one type of error. The strerror function takes one of these error codes as its parameter and returns a pointer to a string describing the error. The prototype of this function is as follows:
char *strerror(int error_number);
For example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    // Print the message for each of the first ten error codes
    for (int i = 0; i < 10; i++)
        printf("%s\n", strerror(i));
    system("pause");
    return 0;
}
The output lists one message per error code; the exact wording depends on the platform and its C library. Error code 0 indicates that there is no error, and the other codes indicate various errors. It is enough to understand this part; there is no need to memorize which error each code represents.
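In practice strerror is usually applied to the current value of errno right after a call has failed. The sketch below does this for fopen; try_open is a made-up helper, and it relies on the assumption (true on POSIX systems, but not promised by the C standard) that fopen sets errno when it fails.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: try to open path for reading and report why it
 * failed. Returns 0 on success and a nonzero value on failure. */
int try_open(const char *path)
{
    FILE *fp;

    errno = 0;                       /* clear any stale error code */
    fp = fopen(path, "r");
    if (fp == NULL) {
        int code = errno;            /* save it before other calls change it */
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(code));
        return code != 0 ? code : -1;
    }
    fclose(fp);
    return 0;
}
```

Saving errno in a local variable first matters because later library calls, including fprintf, are allowed to change it.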
8 Character operations
The standard library contains two sets of functions for manipulating individual characters, and their prototypes are located in the header file ctype.h. The first set of functions is used to classify characters, while the second set is used to convert characters.
8.1 Character classification
Each classification function accepts an integer parameter containing a character value. The function tests this character and returns an integer value representing true or false. The table below lists each function and the condition required for it to return true.
function | Conditions required to return true |
---|---|
iscntrl | Any control character |
isspace | White space characters: space, form feed '\f', line feed '\n', carriage return '\r', tab '\t' or vertical tab '\v' |
isdigit | Decimal digits 0~9 |
isxdigit | Hexadecimal digits, including the decimal digits and the uppercase and lowercase letters a~f |
islower | Lowercase letters |
isupper | Uppercase letters |
isalpha | Letters (uppercase or lowercase) |
isalnum | Letters or digits |
ispunct | Any graphic character (printable symbol) that is not a digit or letter |
isgraph | Any graphic character |
isprint | Any printable character, including the graphic characters and the space character |
These functions can be used to classify the elements of a string, for example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
int main()
{
    char temp[] = "To carry things with great virtue";
    size_t len = strlen(temp);
    for (size_t i = 0; i < len; i++)
    {
        // The <ctype.h> functions require a non-negative value, so convert to unsigned char first
        unsigned char ch = (unsigned char)temp[i];
        if (islower(ch))
            printf("temp[%zu] : %c is a lowercase letter\n", i, temp[i]);
        else if (isupper(ch))
            printf("temp[%zu] : %c is an uppercase letter\n", i, temp[i]);
        else if (isspace(ch))
            printf("temp[%zu] : %c is a white space character\n", i, temp[i]);
    }
    printf("\n");
    system("pause");
    return 0;
}
The output reports, for each element of temp, whether it is an uppercase letter, a lowercase letter, or a space.
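One practical point when calling the classification functions: their argument must be representable as an unsigned char (or be EOF), so a plain char that may be negative should be cast first. The sketch below counts uppercase letters this way; count_upper is a hypothetical helper.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the uppercase letters in str. */
size_t count_upper(const char *str)
{
    size_t count = 0;

    for (; *str != '\0'; str++) {
        /* Cast to unsigned char: passing a negative char value to
         * the <ctype.h> functions is undefined behavior. */
        if (isupper((unsigned char)*str))
            count++;
    }
    return count;
}
```

For "To carry things with great virtue" the result is 1, since only the leading T is uppercase.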
8.2 Character conversion
The conversion functions convert uppercase letters to lowercase or lowercase letters to uppercase. There are two of them: the toupper function returns the uppercase form of its parameter, and the tolower function returns the lowercase form of its parameter.
int tolower(int ch);
int toupper(int ch);
Here is a practical example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
int main()
{
    char temp1[] = "To carry things with great virtue";
    char temp2[] = "To carry things with great virtue";
    // Convert everything to uppercase
    for (size_t i = 0; i < strlen(temp1); i++)
    {
        // Convert to unsigned char before calling the <ctype.h> functions
        if (islower((unsigned char)temp1[i]))
            temp1[i] = (char)toupper((unsigned char)temp1[i]);
        printf("%c", temp1[i]);
    }
    printf("\n-----------------------------\n");
    // Convert everything to lowercase
    for (size_t i = 0; i < strlen(temp2); i++)
    {
        if (isupper((unsigned char)temp2[i]))
            temp2[i] = (char)tolower((unsigned char)temp2[i]);
        printf("%c", temp2[i]);
    }
    printf("\n");
    system("pause");
    return 0;
}
Print output:
TO CARRY THINGS WITH GREAT VIRTUE
-----------------------------
to carry things with great virtue
As you can see, we can change the case of the letters in a string as we wish.
9 Memory operations
Strings normally end with NUL, so the functions above are of little or no use if we want to process data that contains NUL bytes in the middle, or byte sequences of arbitrary length. For that purpose there is another set of functions that covers these needs in actual development. Their prototypes are shown below.
Note: These functions can handle not just strings but also structures, arrays, and other data types. Which data types can be processed sensibly depends on the specific function.
void *memcpy(void *dst, void const *src, size_t length);
void *memmove(void *dst, void const *src, size_t length);
int memcmp(void const *a, void const *b, size_t length);
void *memchr(void const *a, int ch, size_t length);
void *memset(void *a, int ch, size_t length);
For example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZE 10
int main()
{
    char temp1[] = "hello world";
    char temp2[] = "hello world";
    char temp3[] = "hello world";
    char temp4[] = "hello world";
    unsigned int int_array[SIZE];
    char *p = NULL;
    // Copy within a string; source and destination overlap on purpose (discussed in 9.1)
    memcpy(temp1 + 2, temp1, 5);    // overlapping memcpy: result not guaranteed by the C standard
    memmove(temp2 + 2, temp2, 5);   // memmove handles overlap correctly
    printf("temp1 = %s\n", temp1);
    printf("temp2 = %s\n", temp2);
    printf("---------------------------------------\n");
    // Compare the first 6 bytes
    if (!memcmp(temp1, temp2, 6))
        printf("temp1 = temp2\n");
    else
        printf("temp1 != temp2\n");
    printf("---------------------------------------\n");
    // Find a character
    p = (char *)memchr(temp3, 'e', strlen(temp3));
    if (p != NULL)
        printf("Position of character e in temp3: %td\n", p - &temp3[0]);
    printf("---------------------------------------\n");
    // Initialize an array to all zeros
    memset(int_array, 0, sizeof(int_array));
    for (int i = 0; i < SIZE; i++)
        printf("int_array[%d] = %u\t", i, int_array[i]);
    printf("\n");
    printf("---------------------------------------\n");
    // Fill a string with one character, leaving the terminator in place
    memset(temp4, 'a', sizeof(temp4) - 1);
    printf("temp4 = %s\n", temp4);
    system("pause");
    return 0;
}
Printout:
9.1 Are memcpy and memmove really different?
There is a question worth discussing:
Are the memcpy and memmove functions really different? "Pointers on C" and many opinions on the Internet say that they are: if src and dst overlap, memcpy can produce wrong results, while memmove always behaves as intended. Yet in our program both functions produced the expected result. Why?
The explanation is that the C standard leaves the behavior of memcpy undefined when the two regions overlap, so the result depends on the runtime environment and on the underlying C library; some libraries implement memcpy in a way that happens to handle overlap correctly. This does not stop us from studying the simple implementation of memcpy, in which the two functions really do differ.
Let's first take a look at what overlap means and why copying can go wrong when there is overlap.
In our program, src covers temp1[0] to temp1[4] and dst covers temp1[2] to temp1[6], so three bytes of the two regions overlap. If we copy in the conventional way, from front to back, the following happens.
When the regions overlap, a value that is still needed is overwritten by a new value before it has been read, so the wrong value gets copied. With such a copy temp1 becomes hehehehorld. This is how a simple memcpy copies the string.
Now let's look at how memmove (and an optimized memcpy) solves this problem neatly.
The order of copying is changed: this time the bytes are copied from back to front, which avoids the problem.
That raises another question. Here the destination lies behind the source; what should we do if it lies in front of it, as in memmove(temp1, temp1 + 2, 5)? The answer is that the copy order is reversed again, that is, the bytes are copied from front to back.
In this case a value that is still needed is never overwritten before it is read, so the copy produces the expected result: llo w world.
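The choice of copy direction described above can be sketched in a few lines. This is only an illustration, not how any particular C library implements memmove; the name my_memmove is made up, and the pointer comparison assumes that overlapping dst and src point into the same array.

```c
#include <stddef.h>

/* Hypothetical byte-by-byte sketch of an overlap-safe copy.
 * Assumes that dst and src point into the same array whenever
 * they overlap, so that comparing the pointers is meaningful. */
void *my_memmove(void *dst, const void *src, size_t length)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (d < s) {
        /* Destination is in front of the source: copy front to back. */
        for (size_t i = 0; i < length; i++)
            d[i] = s[i];
    } else if (d > s) {
        /* Destination is behind the source: copy back to front. */
        for (size_t i = length; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

Applied to "hello world", my_memmove(temp + 2, temp, 5) produces hehelloorld, and my_memmove(temp, temp + 2, 5) produces llo w world.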
9.2 memcmp: simple comparison
memcmp compares the first length bytes of memory areas a and b. The comparison method and return value are basically the same as for strcmp. For details, please refer to the strcmp section of this article.
9.3 memchr: simple search
memchr searches the first length bytes starting at a for the first occurrence of the character ch and returns a pointer to that position. The search method and return value are basically the same as for strchr. For details, please refer to the strchr section of this article.
9.4 memset: The initialized values can only be 0 and -1?
Judging from its description, this function can in theory set a continuous memory area to any single value, but in actual development it is almost always used to set the area to 0 or -1. Why is this?
This is because memset assigns values to memory byte by byte. Using it on strings causes no problems, because each element of a string occupies exactly one byte. Arrays of other types are different: the elements of short, int, and long arrays occupy more than one byte, so the result may not be what we expect. When the value is 0, every byte is 0, so every element is 0 no matter how many bytes it has; when the value is -1, every bit is 1, so every element is -1. For any other value, the elements do not end up equal to that value. Refer to the following program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZE 10
int main()
{
    unsigned int int_array[SIZE];
    // Initialize the array with the byte value 0
    printf("--------------Initial value set to 0-------------------------\n");
    memset(int_array, 0, sizeof(int_array));
    for (int i = 0; i < SIZE; i++)
        printf("int_array[%d] = %u\t", i, int_array[i]);
    printf("\n");
    // Initialize the array with the byte value 1
    printf("--------------Initial value set to 1-------------------------\n");
    memset(int_array, 1, sizeof(int_array));
    for (int i = 0; i < SIZE; i++)
        printf("int_array[%d] = %u\t", i, int_array[i]);
    printf("\n");
    system("pause");
    return 0;
}
Printout: with the value 0 every element is printed as 0, but with the value 1 every element is printed as 16843009.
Why is this?
This is because memset assigns the value byte by byte, and an int occupies 4 bytes in memory here, so each element becomes 0x01010101, which is exactly 16843009 in decimal.
Therefore, if you want to use memset to set a memory area to a single value, 0 or -1 is generally the best choice.
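When every element of an int array really has to receive some other value, a plain loop that assigns whole elements does the job. A minimal sketch, with fill_ints as a hypothetical helper name:

```c
#include <stddef.h>

/* Hypothetical helper: set every element of an int array to value.
 * Unlike memset, this assigns whole elements, not single bytes. */
void fill_ints(int *array, size_t count, int value)
{
    for (size_t i = 0; i < count; i++)
        array[i] = value;
}
```

After fill_ints(int_array, SIZE, 1) on an int array, every element is 1 and not 16843009.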
10 Summary
Strings themselves are not very complicated. To make development easier, C provides a large number of library functions for them, so the main task is to master those functions. In particular, note that some functions return pointers rather than numeric values, and that some functions, such as strtok, are used in a rather special way.