Tracking down a memory overwrite caused by scanf()

1. Preliminary summary

Some time ago I had a one-off request to provide a demo for third-party users to test with. I did not know the plaintext of the key those users held, and I also had to remove our internal library from the demo. The easiest approach seemed to be to switch to the open source OpenSSL library and call scanf() so that the users could type in the key plaintext themselves. scanf() should be simple even for someone who has just learned C, but after I entered a second string of the specified length, the first character of the first string was overwritten. I should have found the cause immediately, but I struggled with it for a long time, so I decided to write it down.

2. Problem exploration

1. Source code reproduction

Let's construct similar source code and try to reproduce the problem. I prepared the following main.c:

#include <stdio.h>
#include <string.h>

// Print a buffer as hexadecimal bytes
void print_hex(char* buf, int len) {
  int idx = 0;
  while (idx < len) {
    // Cast so that bytes above 0x7f are not sign-extended
    printf("%02x ", (unsigned char)buf[idx++]);
  }
  printf("\n");
}

// Simulate a string conversion
void strTrans(char* dst, const char* src, const int len) {
  memcpy(dst, src, len);
}

int main(int argc, char** argv) {
  // 1111111111111111111111111111111111111111111111111111111111111111
  char strA_src[64] = {0};
  char strA_dst[64] = {0};
  // 2222222222222222222222222222222222222222222222222222222222222222
  char strB_src[64] = {0};
  char strB_dst[64] = {0};

  printf("Input strA:");
  scanf("%s", strA_src);
  strTrans(strA_dst, strA_src, 64);

  printf("Input strB:");
  scanf("%s", strB_src);
  strTrans(strB_dst, strB_src, 64);

  printf("after trans strA:\n");
  print_hex(strA_dst, 64);
  printf("after trans strB:\n");
  print_hex(strB_dst, 64);

  return 0;
}

Compile it with gcc -g -o exec main.c and run it. Following the comments in the code, enter 64 '1' characters for strA and 64 '2' characters for strB. In the resulting output, the first byte of strA_dst is abnormal: in theory, the value of strA_dst[0] should be 0x31. Below I use gdb to locate the problem.

2. Locating the problem

I set breakpoints on two lines: strTrans(strA_dst, strA_src, 64); and scanf("%s", strB_src);. At the first breakpoint I step over the call and then print strA_src and strA_dst. Both hold the expected values.

Then I continue to the second breakpoint (scanf("%s", strB_src);) and print strA_dst once more before the call runs. It is still correct. I then step over the call, enter the input for strB_src, and print strA_dst again.

This time strA_dst has changed, so the scanf() call that fills strB_src has overwritten memory belonging to strA_dst, which is what we commonly call "stepping on memory". Now that the problem is located, let's first review how scanf() is used.

3. Exploring how scanf() works

scanf() is a general-purpose function that reads from the standard input stream stdin (the standard input device, usually the keyboard). It reads characters according to a format description and stores the converted values in the variables at the supplied addresses. The first parameter is the format string, which specifies the expected input format; scanf() parses the input according to each conversion specifier and stores the result at the location pointed to by the corresponding pointer in the variable argument list. Each pointer must be non-null and must correspond one-to-one with the conversion specifiers in the format string.

I read through the scanf() documentation but could not find the key to the problem. Then a colleague supplied the missing piece: scanf() automatically appends '\0' after the string it reads. If so, there really is an overflow here: the array I provided is 64 bytes long and the input is also 64 characters, so the appended '\0' becomes a 65th byte written one position past the end of the array. Let's write a small demo to verify this.

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
  char ch[10];
  memset(ch, 0x31, sizeof(ch));
  printf("init array by 0x31:\n");
  for (int chIdx = 0; chIdx < 10; chIdx++) {
    printf("0x%02x, ", ch[chIdx]);
  }
  printf("\n\nstarting scanf test!\n\n");
  // At most 9 characters, so the appended '\0' still fits in ch
  scanf("%9s", ch);
  printf("after scanf ch is:\n");
  for (int chIdx = 0; chIdx < 10; chIdx++) {
    printf("0x%02x, ", ch[chIdx]);
  }
  printf("\n");
  return 0;
}

The code is very simple. It defines a 10-byte char array and initializes every element to 0x31 (the ASCII code of the character '1' in hexadecimal). It then calls scanf() so that we can type a string shorter than 10 characters and check whether '\0' is appended after the last character. Compiling needs nothing more than a plain gcc invocation.
After entering four '2' characters, the program prints every element of the array: the fifth element has changed from 0x31 to 0x00. This confirms that scanf() appends '\0' to the string it reads.

With the cause located, let's try to fix it.

3. Solve the problem

The simplest idea: since the write goes out of bounds, we follow the way scanf() works and make each of the two input arrays one byte larger.

  // 1111111111111111111111111111111111111111111111111111111111111111
  char strA_src[65] = {0};
  char strA_dst[64] = {0};
  // 2222222222222222222222222222222222222222222222222222222222222222
  char strB_src[65] = {0};
  char strB_dst[64] = {0};

This fixes the problem above, but it is only a crude workaround: any input longer than 64 characters still overflows the array. Is there a more robust solution?

  1. Use std::string:
    If C++ is an option, we can use std::string directly; because it grows to fit its contents, it solves this problem cleanly.
  2. getchar()
    Reading the input character by character in a loop with getchar(), and storing it in heap memory that grows as needed, is also a very good solution.

Of the two methods above, the first is too simple to need a demonstration, so here is a simple demo of the second.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Initial buffer size
#define INIT_SIZE 20
// Number of bytes added on each expansion
#define EXPEND_SIZE 5

void print_hex(char* buf, int len) {
  int idx = 0;
  while (idx < len) {
    if (0 == idx % 16 && 0 != idx) {
      printf("\n");
    }
    printf("0x%02x ", (unsigned char)buf[idx++]);
  }
  printf("\n");
}

int main (int argc, char **argv) {
  char *pStr = NULL;
  pStr = (char *)malloc(INIT_SIZE * sizeof(char));
  if (!pStr) {
    printf("[err] malloc failed!\n");
    return -1;
  }
  // Current size of the buffer in bytes
  int strSz = INIT_SIZE * sizeof(char);
  memset(pStr, 0, INIT_SIZE * sizeof(char));
  // Current number of valid characters in the string
  int strLen = 0;
  // Most recently read character; int so that EOF can be told apart from data
  int ch = 0;
  // Enter (or end of input) ends the input; the newline is not stored
  while ('\n' != (ch = getchar()) && EOF != ch) {
    // Grow the buffer when the next character plus the trailing '\0' would not fit
    if (strLen + 2 > strSz) {
      char *pNewStr = (char *)realloc(pStr, strSz + EXPEND_SIZE);
      if (!pNewStr) {
        printf("[err] realloc failed!\n");
        free(pStr);
        return -1;
      }
      pStr = pNewStr;
      strSz += EXPEND_SIZE;
    }
    *(pStr + strLen) = (char)ch;
    strLen++;
  }
  // realloc() does not zero the new bytes, so terminate the string explicitly
  *(pStr + strLen) = '\0';
  printf("str: \n");
  print_hex(pStr, strLen);
  printf("%s\n", pStr);
  free(pStr);
  return 0;
}

Here I use the newline character as the condition that ends the input, and the newline itself is not stored as a valid character.
When run, the program picks up whatever string is typed in and expands its storage as needed.

4. Summary

This problem dragged on for a long time, and I finished this write-up in bits and pieces. The problem itself is simple, but it is extremely easy to get wrong, so the fundamentals need to be reinforced again and again. There are many ways to solve it beyond the two shown here; for example, storing the characters in a std::vector also works, although that again relies on the more flexible facilities of C++. When we run into a problem, we should not limit ourselves to the immediate symptom or think only about how to sidestep it. What matters most is how to actually solve it, and that is what a competent programmer should consider. My solution above may still have shortcomings, and I welcome corrections and guidance from colleagues and experts.

Finally, I will also post this summary on my personal website at the following link:
http://www.ccccxy.top/coding/archives/2021/01/06/scanf-overflow_98/

Origin blog.csdn.net/qq_38894585/article/details/109960753