How a strtok pitfall taught me the importance of reading the source code

Author: Li Xiao Yao

In the previous article, "Because a function strtok stepped on a pit, I was ruthlessly ridiculed by old engineers (1)", we analyzed the strtok() function and its thread-safe versions on Windows and Linux. This article focuses on strtok() itself: what are its pitfalls?

Look at the source code

If you want to learn more about its characteristics, you must look at the source code. The following code is taken from the strtok.c file of glibc-2.20.

 1#include <string.h>
 2
 3static char *olds; 
 4
 5#undef strtok
 6
 7#ifndef STRTOK
 8# define STRTOK strtok
 9#endif
10
11/* Parse S into tokens separated by characters in DELIM.
12   If S is NULL, the last string strtok() was called with is
13   used.  For example:
14    char s[] = "-abc-=-def";
15    x = strtok(s, "-");     // x = "abc"
16    x = strtok(NULL, "-=");     // x = "def"
17    x = strtok(NULL, "=");      // x = NULL
18        // s = "abc\0=-def\0"
19*/
20char *
21STRTOK (char *s, const char *delim)
22{
23  char *token;
24
25  if (s == NULL)
26    s = olds;
27
28  /* Scan leading delimiters.  */  
29  s += strspn (s, delim);
30  if (*s == '\0')
31    {
32      olds = s;
33      return NULL;
34    }
35
36  /* Find the end of the token.  */
37  token = s;
38  s = strpbrk (token, delim);  
39  if (s == NULL)
40    /* This token finishes the string.  */
41    olds = __rawmemchr (token, '\0');
42  else
43  {      
44    /* Terminate the token and make OLDS point past it.  */
45    *s = '\0';        /* Set the delimiter's position to 0; this is the TOP 2 pitfall */
46    olds = s + 1;
47  }
48  return token;
49}
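
The example in the comment at the top of the source can be replayed directly. The following is only a minimal sketch: the helper name replay_comment_example is made up for illustration, and it assumes it is given a writable buffer holding "-abc-=-def" and an implementation that behaves like the glibc code above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: replays the example from the glibc comment on the
   caller's writable buffer, which is expected to hold "-abc-=-def".
   Returns how many tokens were produced. */
int replay_comment_example(char *s)
{
    int tokens = 0;
    char *x;

    x = strtok(s, "-");        /* the leading '-' is skipped */
    assert(x != NULL && strcmp(x, "abc") == 0);
    tokens++;

    x = strtok(NULL, "-=");    /* the delimiter set may change between calls */
    assert(x != NULL && strcmp(x, "def") == 0);
    tokens++;

    x = strtok(NULL, "=");     /* nothing is left to scan */
    assert(x == NULL);

    return tokens;
}
```

Called on an array initialized to "-abc-=-def", this returns 2. Afterwards the '-' at index 4 has been overwritten with '\0', while the "=-" that follows it is untouched.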

Baidu Encyclopedia also mentions that strtok has been replaced by the faster strsep(), so I checked the strsep function as well:

#include <string.h>

#undef __strsep
#undef strsep

char *
__strsep (char **stringp, const char *delim)
{
  char *begin, *end;

  begin = *stringp;
  if (begin == NULL)
    return NULL;

  /* A frequent case is when the delimiter string contains only one
     character.  Here we don't need to call the expensive `strpbrk'
     function and instead work using `strchr`.  */
  if (delim[0] == '\0' || delim[1] == '\0')
  {
      char ch = delim[0];

      if (ch == '\0')
          end = NULL;
      else
      {
          if (*begin == ch)
              end = begin;
          else if (*begin == '\0')
              end = NULL;
          else
              end = strchr (begin + 1, ch);
      }
  }
  else
    /* Find the end of the token.*/
    end = strpbrk (begin, delim);

  if (end)
  {
      /* Terminate the token and set *STRINGP past NUL character.  */
      *end++ = '\0';
      *stringp = end;
  }
  else
    /* No more delimiters; this is the last token.  */
    *stringp = NULL;

  return begin;
}

Pitfall guide 1 - not reentrant

Most programs today run in a multi-threaded environment, and this is one of the reasons we make mistakes with strtok.

In the previous article, when we split char buffer[INFO_MAX_SZ]="Aob male 18,Bob male 19,Cob female 20"; only the first person's information was extracted.

This shows that strtok is not reentrant. Why?

Look at line 3 of the source code above, which defines a file-scope static variable:

static char *olds; 

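The function keeps its scan position in this static variable, which is shared by every caller. A second split started before the first one finishes overwrites olds, so the function is not reentrant.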
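
The effect can be reproduced in a single thread by nesting two splits. This is a hedged sketch rather than the code from the previous article: the helper names count_records_strtok and count_records_strtok_r are invented here, and strtok_r assumes a POSIX system such as Linux.

```c
#define _POSIX_C_SOURCE 200809L  /* for strtok_r; assumes a POSIX system */
#include <assert.h>
#include <string.h>

/* Hypothetical helper: counts the records the OUTER loop sees when the
   inner loop also uses strtok. Both loops share the one static pointer. */
int count_records_strtok(char *text)
{
    int records = 0;
    assert(text != NULL);

    for (char *rec = strtok(text, ","); rec != NULL; rec = strtok(NULL, ",")) {
        records++;
        /* The inner scan overwrites the position saved for the outer one. */
        for (char *f = strtok(rec, " "); f != NULL; f = strtok(NULL, " "))
            ;
    }
    return records;
}

/* Same loops with strtok_r: each loop keeps its position in its own
   caller-owned variable, so neither disturbs the other. */
int count_records_strtok_r(char *text)
{
    int records = 0;
    char *outer_pos = NULL;
    assert(text != NULL);

    for (char *rec = strtok_r(text, ",", &outer_pos); rec != NULL;
         rec = strtok_r(NULL, ",", &outer_pos)) {
        char *inner_pos = NULL;
        records++;
        for (char *f = strtok_r(rec, " ", &inner_pos); f != NULL;
             f = strtok_r(NULL, " ", &inner_pos))
            ;
    }
    return records;
}
```

With the buffer "Aob male 18,Bob male 19,Cob female 20", the strtok version sees 1 record on an implementation like the glibc code above, because the inner loop leaves olds pointing at the end of the first record. The strtok_r version sees all 3.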

Pitfall guide 2 - separators at the beginning and the end of the string are ignored

The strtok() function separates strings according to a given character set and returns each substring.

At lines 28-29 of the source code, the separators at the front of the string are skipped. If only separators remain at the end of the string, skipping the leading separators on the next call consumes them too, which amounts to ignoring the separators at the end.

Let's take an example and split ";Hello ;I'm lixiaoyao;;".

//https://tool.lu/coderunner/
// Source: Technology makes dreams greater
// Author: Li Xiao Yao
#include <stdio.h>
#include <string.h>

int main(void)
{
    char  szTest[]  = ";Hello ;I'm lixiaoyao;;";
    char *pSentence = NULL;

    pSentence = strtok(szTest, ";");
    while (NULL != pSentence)
    {
        printf("%s\n\n", pSentence);
        pSentence = strtok(NULL, ";");
    }

    return 0;
}

The program prints only "Hello " and "I'm lixiaoyao", each followed by a blank line. The ";" at the front and the two ";" at the end are gone.

Pitfall guide 3 - consecutive separators are treated as one separator

Look at lines 42-48 of the source code: when a delimiter is found, it is overwritten and the token is returned, and any leading delimiters are skipped the next time we enter the function.

So when two delimiters appear consecutively, should the split produce an empty string between them, or should the extra delimiter be ignored? strtok does the latter.

Let's look at an example:

//https://tool.lu/coderunner/
// Source: Technology makes dreams greater
// Author: Li Xiao Yao
#include <stdio.h>
#include <string.h>

int main(void)
{
    char  szTest[]  = "Hello;;I'm lixiaoyao"; // two consecutive ';' separate the sentences
    char *pSentence = NULL;

    pSentence = strtok(szTest, ";");
    while (NULL != pSentence)
    {
        printf("%s\n\n", pSentence);
        pSentence = strtok(NULL, ";");
    }

    return 0;
}

The program prints only "Hello" and "I'm lixiaoyao". The two consecutive separators are treated as one, and no empty string is produced between them.
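
If the empty string between two separators should be kept, the strsep function shown earlier behaves differently, because it never skips leading delimiters. The following is a small sketch under some assumptions: count_fields_strsep is a made-up helper name, and strsep is a BSD extension available in glibc, not part of ISO C.

```c
#define _DEFAULT_SOURCE 1  /* exposes strsep on glibc; not part of ISO C */
#include <assert.h>
#include <string.h>

/* Hypothetical helper: splits text on ';' with strsep and reports how many
   fields were returned in total and how many of them were empty. */
int count_fields_strsep(char *text, int *empty_fields)
{
    int fields = 0;
    char *rest = text;   /* strsep advances this pointer itself */
    char *field;

    assert(empty_fields != NULL);
    *empty_fields = 0;

    while ((field = strsep(&rest, ";")) != NULL) {
        fields++;
        if (*field == '\0')
            (*empty_fields)++;
    }
    return fields;
}
```

For "Hello;;I'm lixiaoyao" this reports 3 fields, 1 of them empty, where strtok returned 2 tokens. For ";Hello ;I'm lixiaoyao;;" it reports 5 fields, 3 of them empty. Note that strsep still writes '\0' into the buffer, just as strtok does.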

Pitfall guide 4 - the source string is modified

If a string is modified without our noticing, a series of weird things may happen. At line 45 of the source code, the position of the separator is set to 0, that is, the source string is modified. Look at the following code:

//https://tool.lu/coderunner/
// Source: Technology makes dreams greater
// Author: Li Xiao Yao
#include <stdio.h>
#include <string.h>

int main(void)
{
    char  szTest[]  = "Hello;I'm lixiaoyao";
    char *pSentence = NULL;

    printf("The original string is %s.\n", szTest);

    pSentence = strtok(szTest, ";");
    while (NULL != pSentence)
    {
        pSentence = strtok(NULL, ";");
    }

    printf("The final string is %s.\n", szTest);

    return 0;
}

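We might assume that strtok only cuts the string and leaves the string itself unchanged, but it does change. The first line printed is "The original string is Hello;I'm lixiaoyao." and the last is "The final string is Hello.", because the ";" has been overwritten with '\0'.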

Conclusion: avoiding the pitfalls

  1. Prefer the reentrant versions of strtok: the strtok_s function on Windows and the strtok_r function on Linux;

  2. There are many other ways to split strings, such as the sscanf function, which I will share after studying them;

  3. Keep in mind the splitting rules of the strtok function family: separators at the start and end of the string are ignored, and consecutive separators are treated as one;

  4. Before using strtok, make a backup of the source string; otherwise the modification is easy to overlook when you are hunting for a bug.
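
Point 4 can be sketched as follows. This is only one possible approach, with an invented helper name, count_tokens_preserving: it runs strtok on a heap copy so that the caller's string stays intact.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in text without modifying it,
   by running strtok on a heap copy. Returns -1 if the copy cannot be
   allocated. */
int count_tokens_preserving(const char *text, const char *delims)
{
    assert(text != NULL && delims != NULL);

    size_t size = strlen(text) + 1;   /* include the terminator */
    char *copy = malloc(size);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, size);

    int tokens = 0;
    for (char *tok = strtok(copy, delims); tok != NULL;
         tok = strtok(NULL, delims))
        tokens++;

    free(copy);                       /* the tokens pointed into the copy */
    return tokens;
}
```

For "Hello;I'm lixiaoyao" with ";" as the separator, this returns 2 and the original string still reads "Hello;I'm lixiaoyao" afterwards. The copy only protects the source string; the helper still goes through strtok's static pointer, so it is no more reentrant than strtok itself.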

Finally

There is still a lot to learn about string splitting, and the key points and implementation of strtok and strtok_r are worth studying in depth. This also shows that we need to read the source code to find the root of a problem. Especially in embedded development, we often don't know the real cause, but once we find the source code it becomes clear.

Origin blog.csdn.net/u012846795/article/details/108114623