"C and Pointers" Reading Notes (Chapter 9 Strings, Characters and Bytes)

0 Introduction

In C, strings and arrays have a lot in common, and the standard library provides many functions for working with strings. How closely are strings and arrays related, and what makes strings, the subject of this chapter, special?

C has no explicit string data type; strings appear either as string constants or stored in character arrays. String constants are suitable for strings that the program does not modify. All other strings must be stored in character arrays or in dynamically allocated memory.

This article introduces the commonly used library functions for strings, so that you can choose the most suitable one for each situation.

1 String Basics

A string is a sequence of zero or more characters ending in a NUL byte, that is, a byte whose bit pattern is all 0. For example:

	char message[] = "hello word";

2 String length

The length of a string is the number of characters it contains, not counting the terminating NUL byte. This often comes up in interviews.

The library function strlen() computes the length of a string.
For example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
	char message[] = "hello word";
	// strlen returns a size_t, so it is printed with %zu
	printf("The length of the string is: %zu\n", strlen(message));
	system("pause");
	return 0;
}

The program prints a length of 10.
Note that strlen returns a value of the unsigned type size_t, so take special care when using it to compare the lengths of two strings:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

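int main()
{
	char message1[] = "hello word";
	char message2[] = "hello Shanghai";
	// Comparison method 1: compare the two lengths directly
	if (strlen(message1) >= strlen(message2))
		printf("String 1 is longer\n");
	else
		printf("String 2 is longer\n");
	// Comparison method 2 (wrong on purpose): an unsigned difference is never negative
	if (strlen(message1) - strlen(message2) >= 0)
		printf("String 1 is longer\n");
	else
		printf("String 2 is longer\n");
	system("pause");
	return 0;
}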

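Method 1 correctly reports that string 2 is longer, but method 2 reports that string 1 is longer. Because strlen returns an unsigned value, the subtraction in comparison method 2 is also unsigned and can never be negative. The condition is therefore always true, and the result of the comparison is wrong.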
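The pitfall above can be avoided by never subtracting the two lengths. The following is a minimal sketch; the helper names at_least_as_long and length_order are made up for this illustration and are not library functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a is at least as long as b, else 0.
 * The two size_t values are compared directly, so nothing can wrap around. */
int at_least_as_long(char const *a, char const *b)
{
	return strlen(a) >= strlen(b);
}

/* Hypothetical helper: returns -1, 0 or 1 depending on whether a is
 * shorter than, as long as, or longer than b. No subtraction of
 * unsigned lengths is involved. */
int length_order(char const *a, char const *b)
{
	size_t la = strlen(a);
	size_t lb = strlen(b);

	return (la > lb) - (la < lb);
}
```

Calling length_order(message1, message2) with the strings from the example returns -1, because "hello word" is the shorter of the two.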

3 Unrestricted string functions

The so-called unrestricted string functions find the end of their string arguments by looking for the terminating NUL byte, so no length has to be passed to them. It is up to the caller to make sure that the result fits in the destination.

3.1 Copying strings

Copying strings is a very common operation in development, but note that the copy overwrites whatever the destination held before.
For example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

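int main()
{
	char message1[] = "hello word";
	char message2[] = "hello Shanghai";

	// Remember the original length so that the whole array can be printed later
	size_t message2_len = strlen(message2);

	printf("Length of string 2: %zu\n", strlen(message2));

	// message2 (15 bytes) is large enough to hold message1 (11 bytes)
	strcpy(message2, message1);

	printf("Length of string 2: %zu\n", strlen(message2));
	// Print every original position, including the bytes after the new NUL
	for (size_t i = 0; i < message2_len; i++)
		printf("%c", message2[i]);
	printf("\n");
	system("pause");
	return 0;
}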

After message1 is copied into message2, the reported length of message2 drops from 14 to 10. Why is this?

This is because strcpy copies the terminating NUL byte along with the characters, so strlen() stops at the new terminator and returns 10. The character-by-character printout shows that the tail of the old string ("hai") is still in the array after that terminator.

Copying a longer string into a shorter array is an error: there is not enough space for the characters being copied, strcpy writes past the end of the array, and the behavior is undefined (the program often crashes).
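One way to guard against such an overflow is to check the size before copying. The sketch below assumes the caller knows the size of the destination array; copy_if_fits is a hypothetical helper name, not a standard function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, terminator
 * included. dst_size is the total size of the destination array in bytes.
 * Returns 1 on success, 0 if src is too long (dst is left untouched). */
int copy_if_fits(char *dst, size_t dst_size, char const *src)
{
	if (strlen(src) + 1 > dst_size)
		return 0;
	strcpy(dst, src);
	return 1;
}
```

With a destination declared as an array, the call would typically look like copy_if_fits(message2, sizeof(message2), message1).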

3.2 Concatenating strings

To append one string to the end of another, use the strcat() function. Its prototype is as follows:

char *strcat(char *dst, char const *src);

For example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
	// The destination needs room for both strings plus the NUL: 10 + 14 + 1 bytes
	char message1[32] = "hello word";
	char message2[] = "hello Shanghai";

	strcat(message1, message2);
	printf("%s\n", message1);

	system("pause");
	return 0;
}

The two strings are joined directly, with nothing inserted between them. The length of the new string is the sum of the lengths of the two original strings, so the destination array must have room for all of those characters plus the terminating NUL.
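Because the destination has to hold both strings, it can help to check the remaining room before appending. This is a minimal sketch under the assumption that the caller passes the real size of the destination array; append_if_fits is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append src to the string already in dst, but only
 * if the result, including the terminating NUL, fits in dst_size bytes.
 * Returns 1 on success, 0 if there is not enough room (dst is unchanged). */
int append_if_fits(char *dst, size_t dst_size, char const *src)
{
	size_t used = strlen(dst);
	size_t extra = strlen(src);

	if (used + extra + 1 > dst_size)
		return 0;
	strcat(dst, src);
	return 1;
}
```

The length of the result is used + extra, which is the sum-of-lengths rule from the text above.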

3.3 Return value of the functions

strcpy and strcat both return a copy of their first argument, that is, a pointer to the destination array. When an array name is passed as an argument, what is actually passed is the address of its first element, so the returned pointer can be used directly as the argument of another call and the calls can be nested.
For example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

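int main()
{
	// Large enough for the final result "hello word Shanghai" plus the NUL
	char message1[32] = "hello ";
	char message2[] = "word ";
	char message3[] = "Shanghai";

	// The inner strcat returns message1, which the outer call then appends to
	strcat(strcat(message1, message2), message3);
	printf("%s\n", message1);

	system("pause");
	return 0;
}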

The three strings end up joined in message1. For the sake of readability, though, it is perfectly fine to write the calls one after another instead of nesting them.

3.4 String comparison

For comparing strings there is one commonly used library function, strcmp. Its prototype is as follows:

int strcmp(char const *s1, char const *s2);

strcmp compares the two strings character by character until it finds a mismatch. There are two cases:

  1. At the first mismatch, the string whose character has the smaller code (the one that comes earlier in the character set, for example ASCII) is considered the smaller string;
  2. If one string ends before any mismatch is found, that is, one string is a prefix of the other, the shorter string is considered the smaller string.

Once a mismatch is found, the result is known without comparing the rest of the strings. The function returns a value less than zero if s1 is smaller than s2, a value greater than zero if s1 is greater than s2, and zero if the two strings are equal.
Look at the actual code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
	char temp1[] = "hello";
	char temp2[] = "hello world";
	char temp3[] = "hello worLd";
	int result;

	// Compare temp1 with temp2
	result = strcmp(temp1, temp2);
	if (result == 0)
	{
		printf("temp1 = temp2\n");
	}
	else if (result > 0)
	{
		printf("temp1 > temp2\n");
	}
	else
	{
		printf("temp1 < temp2\n");
	}
	printf("------------------\n");
	// Compare temp2 with temp3
	result = strcmp(temp2, temp3);
	if (result == 0)
	{
		printf("temp2 = temp3\n");
	}
	else if (result > 0)
	{
		printf("temp2 > temp3\n");
	}
	else
	{
		printf("temp2 < temp3\n");
	}

	printf("\n");
	system("pause");
	return 0;
}

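The program reports temp1 < temp2, because temp1 is a prefix of temp2, and temp2 > temp3, because the lowercase 'l' has a larger ASCII code than the uppercase 'L'.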
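A common use of the strcmp ordering is sorting. The following sketch sorts an array of string pointers with the standard qsort function; sort_strings and compare_strings are hypothetical names chosen for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort. Each array element is a pointer to a
 * string, so the callback receives pointers to those pointers. */
static int compare_strings(void const *a, void const *b)
{
	char const *const *sa = a;
	char const *const *sb = b;

	return strcmp(*sa, *sb);
}

/* Hypothetical helper: sort count string pointers into strcmp order. */
void sort_strings(char const **strings, size_t count)
{
	qsort(strings, count, sizeof strings[0], compare_strings);
}
```

Only the sign of the strcmp result matters to qsort, which matches the "less than, equal to, greater than zero" contract described above.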

4 Limited-length string functions

Some library functions take an explicit length argument that limits how many characters they process, so they are called length-limited string functions.

These functions provide a convenient mechanism for preventing unexpectedly long strings from overflowing their destination arrays.
The common ones are:

char *strncpy(char *dst, char const *src, size_t len);
char *strncat(char *dst, char const *src, size_t len);
int strncmp(char const *s1, char const *s2, size_t len);

For example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

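int main()
{
	char message1[] = "hello ";
	// Large enough to hold message2 and message3 together plus the NUL
	char message2[32] = "hello Beijing ";
	char message3[] = "Shanghai";
	char message_all[] = "hello Beijing Shanghai";

	// strncat appends message3 to message2, strncpy then overwrites the first
	// characters of message2 with message1, and strncmp compares the result
	// with message_all
	if (strncmp(strncpy(strncat(message2, message3, strlen(message3)), message1, strlen(message1)), message_all, strlen(message_all)) == 0)
		printf("The two are equal\n");
	else
		printf("The two are not equal\n");

	system("pause");
	return 0;
}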

The two strings compare equal. This example is not ideal, because every length passed in is simply the full length of the source string, but it is enough to show how the functions are called.
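One detail of strncpy deserves a sketch of its own: when src contains len or more characters, strncpy copies exactly len characters and does not append a NUL byte, so the result is not a valid string unless the caller terminates it. The helper below, with the hypothetical name copy_truncated, shows one common way to handle this.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most dst_size - 1 characters of src into
 * dst and always NUL-terminate the result. dst_size must be at least 1. */
char *copy_truncated(char *dst, size_t dst_size, char const *src)
{
	strncpy(dst, src, dst_size - 1);
	/* strncpy adds no terminator when src is too long, so add one here */
	dst[dst_size - 1] = '\0';
	return dst;
}
```

A long source string is silently cut off at dst_size - 1 characters; whether truncation is acceptable depends on the program.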

5 Basics of string search

The standard library contains many functions that search strings in various ways. This variety of tools gives C programmers a great deal of flexibility.

5.1 Finding a character

There are two library functions for finding a specific character in a string.

char *strchr(char const *str, int ch);
char *strrchr(char const *str, int ch);

The former finds the first occurrence of the character ch in the string, and the latter finds its last occurrence. Each returns a pointer to that position, or NULL if the character does not occur in the string.

These two functions can be used like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
	char message_all[] = "hello Beijing Shanghai";

	char *first_site, *last_site;

	first_site = strchr(message_all, 'h');
	last_site = strrchr(message_all, 'h');

	printf("Length of the string: %zu\n", strlen(message_all));
	// Subtracting the start of the array turns each pointer into an index
	printf("Position of the first h: %td\n", first_site - message_all);
	printf("Position of the last h: %td\n", last_site - message_all);
	system("pause");
	return 0;
}

Note that these functions return a pointer to the character, not its index, so subtract the pointer to the first element of the string to get the position. Here the string is 22 characters long, the first h is at position 0, and the last h is at position 19.

Note: The search is case-sensitive.
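Because strchr returns a pointer, a search can be resumed just past the previous match. The sketch below uses this to count occurrences; count_char is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count how many times ch occurs in str.
 * ch must not be '\0', because the search restarts one past each match. */
size_t count_char(char const *str, int ch)
{
	size_t count = 0;
	char const *p = strchr(str, ch);

	while (p != NULL)
	{
		count++;
		p = strchr(p + 1, ch);
	}
	return count;
}
```

Since the search is case-sensitive, counting 'h' and counting 'H' in the same string can give different results.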

5.2 Finding any of several characters

strpbrk is a more general function: instead of searching for one specific character, it finds the first occurrence in str of any of the characters in the string group. Its prototype is as follows:

char *strpbrk(char const *str, char const *group);

It can be used like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

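int main()
{
	char message_all[] = "hello Beijing Shanghai";
	char *first_site;

	first_site = strpbrk(message_all, "abcde");

	printf("Length of the string: %zu\n", strlen(message_all));
	// strpbrk returns NULL when none of the characters is present
	if (first_site != NULL)
		printf("Position of the first character matching any of abcde: %td\n", first_site - message_all);
	system("pause");
	return 0;
}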

The first matching character is e, and its position is 1.
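When an index is more convenient than a pointer, the pointer arithmetic can be wrapped in a small function. This is only a sketch; first_of is a hypothetical name, and returning -1 for "not found" is a convention chosen here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: index of the first character of str that also
 * appears in group, or -1 if there is no such character. */
long first_of(char const *str, char const *group)
{
	char const *p = strpbrk(str, group);

	return p == NULL ? -1 : (long)(p - str);
}
```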

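5.3 Finding a substring

To find a substring in a string, use the strstr function, whose prototype is as follows:

char *strstr(char const *s1, char const *s2);

It finds the first place in s1 where the whole string s2 occurs and returns a pointer to it, or NULL if s2 does not occur in s1. Here is an example: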

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
	char message_all[] = "hello Beijing Shanghai";
	char *first_site;

	first_site = strstr(message_all, "Beijing");

	printf("Length of the string: %zu\n", strlen(message_all));
	// strstr returns NULL when the substring is not present
	if (first_site != NULL)
		printf("Position of the first occurrence of Beijing: %td\n", first_site - message_all);
	system("pause");
	return 0;
}

Beijing is found at position 6. In this search the entire substring has to match, not just one or some of its characters.
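As with strchr, the returned pointer can be used to continue the search. The sketch below counts non-overlapping occurrences of a substring; count_substring is a hypothetical name, and skipping a whole match before searching again is a design choice made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the non-overlapping occurrences of sub in str. */
size_t count_substring(char const *str, char const *sub)
{
	size_t count = 0;
	size_t sub_len = strlen(sub);
	char const *p;

	if (sub_len == 0)
		return 0;	/* an empty substring would match everywhere */
	for (p = strstr(str, sub); p != NULL; p = strstr(p + sub_len, sub))
		count++;
	return count;
}
```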

6 Advanced string search

The next group of functions simplifies finding and extracting individual substrings from the beginning of a string.

6.1 Finding string prefixes

The strspn and strcspn functions count characters at the beginning of a string. Their prototypes are as follows:

size_t strspn(char const *str, char const *group);
size_t strcspn(char const *str, char const *group);

Note that these two functions do not return a pointer to an element, but the number of characters counted.

For specific usage, see an example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

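int main()
{
	size_t len1, len2;
	char buffer[] = "25,142,330,smith,J,239-4123";

	// Number of leading characters that are digits
	len1 = strspn(buffer, "0123456789");
	// Number of leading characters that are not a comma
	len2 = strcspn(buffer, ",");

	printf("Leading characters that match 0123456789: %zu\n", len1);
	printf("Leading characters that do not match a comma: %zu\n", len2);
	system("pause");
	return 0;
}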

As the example shows, strspn starts at the beginning of the string and counts consecutive characters that appear in group, stopping at the first character that does not. Here the comma is not one of the digits, so only the leading 2 and 5 are counted and the result is 2.

The strcspn function is just the opposite: it counts the leading characters that do not appear in group. The leading 2 and 5 are not commas, but the third character is, so counting stops there and the result is also 2.
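These counts are handy for cutting a prefix out of a string. The sketch below shows two small uses; first_field and skip_spaces are hypothetical names, and the set of whitespace characters passed to strspn is an assumption of this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the first comma-separated field of line into
 * out (truncating if needed) and return the number of characters copied.
 * out_size must be at least 1. */
size_t first_field(char const *line, char *out, size_t out_size)
{
	size_t len = strcspn(line, ",");

	if (len >= out_size)
		len = out_size - 1;
	memcpy(out, line, len);
	out[len] = '\0';
	return len;
}

/* Hypothetical helper: pointer to the first character of str that is not
 * a space, tab or newline. */
char const *skip_spaces(char const *str)
{
	return str + strspn(str, " \t\n");
}
```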

6.2 Finding tokens

A string often consists of several separate parts divided by delimiters. To process these parts, you first have to extract them from the string.
The strtok function does exactly this: it isolates the individual parts, called tokens, from the string and discards the delimiters. Its prototype is as follows:

char *strtok(char *str, char const *sep);

Note:

  1. While strtok does its work, it modifies the string it is processing. If the source string must not be modified, make a copy and pass the copy to strtok.
  2. If the first argument of strtok is not NULL, the function finds the first token of that string and saves its position in the string. If the first argument is NULL, the function continues in the same string from the saved position and finds the next token. When there are no more tokens, it returns NULL.

For example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
	int add = 0;
	char buffer[] = "25,142,330,smith,J,239-4123";
	char *token = NULL;
	// The first call passes the string; later calls pass NULL to continue in it
	for (token = strtok(buffer, ","); token != NULL; token = strtok(NULL, ","))
	{
		printf("%s\n", token);
		add++;
	}
	printf("--------------------------\n");
	printf("The value of add is: %d\n", add);
	system("pause");
	return 0;
}

As the example shows, with the comma as the delimiter each pass through the loop yields one token until the whole string has been split. There are 6 tokens in total.
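Because strtok modifies the string it scans, a string that must stay intact has to be copied first, as the first note above says. The following sketch does this with a heap-allocated copy; count_tokens is a hypothetical helper name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the tokens in str without modifying it.
 * strtok writes NUL bytes into the string it scans, so it is given a
 * copy. Returns -1 if the copy cannot be allocated. */
int count_tokens(char const *str, char const *sep)
{
	size_t size = strlen(str) + 1;
	char *copy = malloc(size);
	char *token;
	int count = 0;

	if (copy == NULL)
		return -1;
	memcpy(copy, str, size);
	for (token = strtok(copy, sep); token != NULL; token = strtok(NULL, sep))
		count++;
	free(copy);
	return count;
}
```

Note that strtok keeps its saved position in internal state, so this helper should not be called in the middle of another strtok loop.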

7 Error messages

When a C library function fails, the operating system reports the failure by setting an external integer variable, errno, to an error code (0, 1, 2, 3, ...). In other words, each error code corresponds to one type of error. The strerror function takes one of these error codes as its argument and returns a pointer to a string describing the error. Its prototype is as follows:

char *strerror(int error_number);

For example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
	// Print the messages for error codes 0 through 9
	for (int i = 0; i < 10; i++)
		printf("%s\n", strerror(i));
	system("pause");
	return 0;
}

Different error codes correspond to different error types: error code 0 indicates that there is no error, and the other codes indicate various errors. A general understanding of this part is enough; there is no need to memorize which error each code represents.
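In practice strerror is usually called with the current value of errno right after a failed library call. The sketch below assumes a platform on which fopen sets errno when it fails (POSIX systems do; ISO C does not require it); can_open is a hypothetical helper name.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: try to open path for reading. On failure, write a
 * message describing the error into msg and return 0; on success return 1. */
int can_open(char const *path, char *msg, size_t msg_size)
{
	FILE *fp = fopen(path, "r");

	if (fp == NULL)
	{
		/* Save errno right away: later library calls may change it */
		int saved = errno;

		snprintf(msg, msg_size, "%s: %s", path, strerror(saved));
		return 0;
	}
	fclose(fp);
	return 1;
}
```

The exact wording of the message comes from the library and differs between systems.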

8 Character operations

The standard library contains two groups of functions for manipulating individual characters; their prototypes are in the header file ctype.h. The first group classifies characters, and the second group converts them.

8.1 Character classification

Each classification function takes an integer argument containing a character value. The function tests the character and returns an integer value representing true or false. The table below lists each function and the condition under which it returns true.

Function    Condition required to return true
iscntrl     Any control character
isspace     A whitespace character: space, form feed '\f', line feed '\n', carriage return '\r', tab '\t' or vertical tab '\v'
isdigit     A decimal digit, 0 through 9
isxdigit    A hexadecimal digit: any decimal digit or the letters a~f in uppercase or lowercase
islower     A lowercase letter
isupper     An uppercase letter
isalpha     A letter (uppercase or lowercase)
isalnum     A letter or a digit
ispunct     Any graphic character (printable symbol) that is not a letter or a digit
isgraph     Any graphic character
isprint     Any printable character: the graphic characters and the space character

These functions are used to test the individual characters of a string, for example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

int main()
{
	char temp[] = "To carry things with great virtue";
	size_t len = strlen(temp);

	for (size_t i = 0; i < len; i++)
	{
		// The ctype functions expect an unsigned char value, so convert first
		unsigned char ch = (unsigned char)temp[i];

		if (islower(ch))
			printf("temp[%zu] : %c is a lowercase letter\n", i, ch);
		else if (isupper(ch))
			printf("temp[%zu] : %c is an uppercase letter\n", i, ch);
		else if (isspace(ch))
			printf("temp[%zu] : %c is a space\n", i, ch);
	}

	printf("\n");
	system("pause");
	return 0;
}

Each element of temp is identified as an uppercase letter, a lowercase letter, or a space.
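The same classification functions can also be used to build statistics about a string. The sketch below is one possible way to do it; the struct char_counts type and the classify function are hypothetical names introduced for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type: how many characters of each kind were seen. */
struct char_counts
{
	size_t letters;
	size_t digits;
	size_t spaces;
	size_t others;
};

/* Hypothetical helper: classify every character of str. */
struct char_counts classify(char const *str)
{
	struct char_counts counts = { 0, 0, 0, 0 };

	for (; *str != '\0'; str++)
	{
		/* Convert to unsigned char before calling the ctype functions */
		unsigned char ch = (unsigned char)*str;

		if (isalpha(ch))
			counts.letters++;
		else if (isdigit(ch))
			counts.digits++;
		else if (isspace(ch))
			counts.spaces++;
		else
			counts.others++;
	}
	return counts;
}
```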

8.2 Character conversion

The conversion functions convert uppercase letters to lowercase or lowercase letters to uppercase. There are two of them: toupper returns the uppercase form of its argument, and tolower returns the lowercase form of its argument. If the argument is not a letter of the appropriate case, it is returned unchanged.

int tolower(int ch);
int toupper(int ch);

Here is a practical example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

int main()
{
	char temp1[] = "To carry things with great virtue";
	char temp2[] = "To carry things with great virtue";
	size_t len1 = strlen(temp1);
	size_t len2 = strlen(temp2);

	// Convert everything to uppercase
	for (size_t i = 0; i < len1; i++)
	{
		if (islower((unsigned char)temp1[i]))
			temp1[i] = (char)toupper((unsigned char)temp1[i]);
		printf("%c", temp1[i]);
	}
	printf("\n-----------------------------\n");
	// Convert everything to lowercase
	for (size_t i = 0; i < len2; i++)
	{
		if (isupper((unsigned char)temp2[i]))
			temp2[i] = (char)tolower((unsigned char)temp2[i]);
		printf("%c", temp2[i]);
	}
	printf("\n");
	system("pause");
	return 0;
}

In this way, the case of the letters in a string can be changed as needed.
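The conversion functions are also useful for comparing strings without regard to case, something strcmp cannot do on its own. This is a minimal sketch; compare_ignore_case is a hypothetical name, and the function returns -1, 0 or 1 in the spirit of strcmp.

```c
#include <ctype.h>

/* Hypothetical helper: compare two strings, ignoring the case of letters.
 * Returns -1, 0 or 1, like the sign of a strcmp result. */
int compare_ignore_case(char const *s1, char const *s2)
{
	for (;; s1++, s2++)
	{
		int c1 = tolower((unsigned char)*s1);
		int c2 = tolower((unsigned char)*s2);

		if (c1 != c2)
			return c1 < c2 ? -1 : 1;
		if (c1 == '\0')
			return 0;	/* both strings ended together */
	}
}
```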

9 Memory operations

Strings end with a NUL byte, so they cannot contain NUL bytes in the middle. To process data that does contain NUL bytes, or byte sequences of arbitrary length, the string functions above are of little or no use. A different group of functions handles these cases. Their prototypes are below.

Note: These functions can process not only strings but also structures, arrays and other data types. Which data types make sense depends on the specific function.

void *memcpy(void *dst, void const *src, size_t length);
void *memmove(void *dst, void const *src, size_t length);
int memcmp(void const *a, void const *b, size_t length);
void *memchr(void const *a, int ch, size_t length);
void *memset(void *a, int ch, size_t length);

For example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 10

int main()
{
	char temp1[] = "hello world";
	char temp2[] = "hello world";
	char temp3[] = "hello world";
	char temp4[] = "hello world";
	unsigned int int_array[SIZE];
	char *p = NULL;

	// Copy within a string. The source and destination overlap, which memcpy
	// does not support (undefined behavior); it is called here only for
	// comparison with memmove, which is guaranteed to handle overlap.
	memcpy(temp1 + 2, temp1, 5);
	memmove(temp2 + 2, temp2, 5);

	printf("temp1 = %s\n", temp1);
	printf("temp2 = %s\n", temp2);
	printf("---------------------------------------\n");
	// Compare the first 6 bytes
	if (!memcmp(temp1, temp2, 6))
		printf("temp1 = temp2\n");
	else
		printf("temp1 != temp2\n");
	printf("---------------------------------------\n");
	// Find a character
	p = (char *)memchr(temp3, 'e', strlen(temp3));
	if (p != NULL)
		printf("Position of the character e in temp3: %td\n", p - &temp3[0]);
	printf("---------------------------------------\n");
	// Initialize the array to zero
	memset(int_array, 0, sizeof(int_array));
	for (int i = 0; i < SIZE; i++)
		printf("int_array[%d] = %u\t", i, int_array[i]);
	printf("\n");
	printf("---------------------------------------\n");
	// Fill the string with 'a', leaving the terminating NUL in place
	memset(temp4, 'a', sizeof(temp4) - 1);
	printf("temp4 = %s\n", temp4);
	system("pause");
	return 0;
}

9.1 Are memcpy and memmove really different?

One question is worth discussing:
are memcpy and memmove really different? "C and Pointers" and many articles on the Internet say that they are: if src and dst overlap, memcpy can go wrong, while memmove always produces the expected result. Yet in our program both functions produced the expected result. Why?

The explanation is that the behavior of memcpy on overlapping regions is undefined, so the result depends on the runtime environment and on the underlying library. Some library implementations of memcpy happen to handle overlap correctly, and the one used here is among them. This does not stop us from studying the classic behavior in which memcpy and memmove differ.

Let's first look at what overlap means and why copying can go wrong when the regions overlap.
In our program, src is the first five bytes of the string ("hello") and dst starts two bytes later, so three bytes of the source region and the destination region overlap. If the bytes are copied in the conventional order, from front to back, the bytes written to dst overwrite source bytes that have not been read yet.
The values we wanted to copy have then already been replaced by new ones, and temp1 becomes hehehehorld after the copy. This is how a simple memcpy copies.

Now let's see how memmove (and an optimized memcpy) solves this problem neatly.
The order of copying changes: this time the bytes are copied from back to front, so every source byte is read before it is overwritten, and the result is the expected hehelloorld.

That raises a question. Here the destination lies behind the source. What if it lies in front of the source? The answer is that the copy order is reversed again, and the bytes are copied from front to back.
The bytes still to be read are then never overwritten before they are used, and the copy gives the expected result. For example, moving the five bytes "llo w" of "hello world" two positions toward the front gives: llo w world.
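The direction rule described above can be sketched in a few lines. This is an illustration of the idea only, not the code of any real library's memmove; copy_bytes is a hypothetical name, and comparing the addresses through uintptr_t assumes a platform that provides that optional type with a flat address space.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of an overlap-safe byte copy. */
void *copy_bytes(void *dst, void const *src, size_t length)
{
	unsigned char *d = dst;
	unsigned char const *s = src;
	size_t i;

	if ((uintptr_t)d < (uintptr_t)s)
	{
		/* Destination in front of the source: copy from front to back */
		for (i = 0; i < length; i++)
			d[i] = s[i];
	}
	else if ((uintptr_t)d > (uintptr_t)s)
	{
		/* Destination behind the source: copy from back to front */
		for (i = length; i > 0; i--)
			d[i - 1] = s[i - 1];
	}
	return dst;
}
```

In both branches every source byte is read before the copy can overwrite it, which is exactly what the front-to-back copy in the failing case does not guarantee.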

9.2 memcmp: simple comparison

memcmp compares the first length bytes of memory areas a and b. The comparison method and return value are basically the same as for strcmp, except that the comparison does not stop at a NUL byte. For details, see the strcmp section of this article.

9.3 memchr: simple search

memchr searches the length bytes starting at a for the first occurrence of the character ch and returns a pointer to that position. The search method and return value are basically the same as for strchr. For details, see the strchr section of this article.

9.4 memset: can the value only be 0 or -1?

Judging from its description, memset can set a contiguous memory area to any single value (in theory), but in actual development the value is almost always 0 or -1. Why is this?

This is because memset assigns values to memory byte by byte. Filling a string with it is never a problem, because each element of a string occupies exactly one byte. Arrays of other types are different: the elements of common short, int and long arrays occupy more than one byte, so the result is not what we might expect. With 0, every byte is 0 (with -1, every bit is 1), so each element ends up as 0 (or -1) no matter how many bytes it occupies. With any other value, the elements do not end up equal to that value. Refer to the following program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 10

int main()
{
	unsigned int int_array[SIZE];

	// Initialize the array
	printf("--------------Initial value set to 0-------------------------\n");
	memset(int_array, 0, sizeof(int_array));
	for (int i = 0; i < SIZE; i++)
		printf("int_array[%d] = %u\t", i, int_array[i]);
	printf("\n");
	printf("--------------Initial value set to 1-------------------------\n");
	// Every byte, not every element, is set to 1
	memset(int_array, 1, sizeof(int_array));

	for (int i = 0; i < SIZE; i++)
		printf("int_array[%d] = %u\t", i, int_array[i]);
	printf("\n");
	system("pause");
	return 0;
}

Every element is 0 after the first memset, but after the second each element is 16843009 rather than 1.
Why is this?

This is because memset assigns the value byte by byte, and an int occupies 4 bytes in memory here, so each of its 4 bytes is set to 0x01. The element therefore holds 0x01010101, which is exactly 16843009 in decimal.

So if you want to set a memory area to a single value with memset, 0 or -1 is the best choice.
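For any other value, an ordinary loop that assigns element by element gives the expected result. The sketch below combines the two approaches; fill_int is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: set every element of an int array to value. */
void fill_int(int *array, size_t count, int value)
{
	size_t i;

	if (value == 0)
	{
		/* All-zero bytes represent 0, so memset is safe for this value */
		memset(array, 0, count * sizeof array[0]);
		return;
	}
	/* Any other value is assigned element by element, not byte by byte */
	for (i = 0; i < count; i++)
		array[i] = value;
}
```

After fill_int(array, count, 1) every element is 1, unlike the 16843009 produced by memset with the value 1.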

10 Summary

Strings themselves are not very complicated. To make development easier, the library provides many functions for them, so the main task is to master those library functions. Note in particular that some functions return pointers rather than numeric values, and that some functions, such as strtok, are used in a special way.

------------------------------------------------------------------------end-------------------------------------------------------------------------

Origin blog.csdn.net/weixin_43719763/article/details/130913227