A bug caused by "size_type"

Problem Description

#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main(int argc, char* argv[]) {
  vector<int> v = {10};
  // This is an extreme example
  int pos = 0;
  if (pos <= v.size() - 2) {
    cout << "pos <= v.size() - 2" << endl;
  } else {
    cout << "pos > v.size() - 2" << endl;
  }
  return 0;
}

Looking at the sample code above, what does the program print: "pos <= v.size() - 2" or "pos > v.size() - 2"?

Some people, like me, will first reason that the std::vector v holds only one element, so v.size() == 1, and comparing 0 with 1 - 2 must take the ">" branch. But running the program shows that the output is pos <= v.size() - 2!

Tested on Ubuntu 20.04 and Debian 10.2.1, the output is pos <= v.size() - 2 in both cases.

Identifying the Problem

Since the program prints pos <= v.size() - 2, let's print the values on both sides of the "<=" sign to check whether they match our expectation that pos == 0 and v.size() - 2 == -1. Change the log output line to:

...
  if (pos <= v.size() - 2) {
    cout << "pos[" << pos << "] <= v.size() - 2[" << v.size() - 2 << ']' << endl;
  }
...

Compile and run again.

The console output shows that v.size() - 2 is not -1 (1 - 2) as we expected, but a very large value (0xffffffffffffffff on these 64-bit systems, as the gdb session below confirms). Experienced programmers will already recognize the pattern: an expected negative number showing up as a huge value almost always means that a signed value was implicitly converted to an unsigned type, and the unsigned subtraction wrapped around!

We can trace exactly what happens by disassembling and debugging the program with gdb.

Note: disassemble only after the program has started running. Before that, the addresses gdb shows are offsets within the binary rather than the real runtime addresses.

First, set a breakpoint at the program entry to make sure the program is running. We set it at the entry of main with b *main, then run until it is hit:

(gdb) b *main
Breakpoint 1 at 0x11f5: file ./size_t.cc, line 7.
(gdb) r
Starting program: /home/openwrt/tmp/size_t/unittest_size_t

Breakpoint 1, main (argc=1, argv=0x11bf) at ./size_t.cc:7
7       int main(int argc, char* argv[]) {
(gdb)

Next, find the address range and the disassembly of the if statement. In gdb, type disas /m main:

(gdb) disas /m main
Dump of assembler code for function main(int, char**):
7       int main(int argc, char* argv[]) {
=> 0x00005555555551f5 <+0>:     push   %rbp

......

11        if (pos <= v.size() - 2) {
   0x0000555555555261 <+108>:   mov    -0x24(%rbp),%eax
   0x0000555555555264 <+111>:   movslq %eax,%rbx
   0x0000555555555267 <+114>:   lea    -0x50(%rbp),%rax
   0x000055555555526b <+118>:   mov    %rax,%rdi
   0x000055555555526e <+121>:   call   0x5555555554d4 <_ZNKSt6vectorIiSaIiEE4sizeEv>
   0x0000555555555273 <+126>:   sub    $0x2,%rax
   0x0000555555555277 <+130>:   cmp    %rax,%rbx
   0x000055555555527a <+133>:   setbe  %al
   0x000055555555527d <+136>:   test   %al,%al
   0x000055555555527f <+138>:   je     0x5555555552f5 <main(int, char**)+256>

......
(gdb)

Let's look at this fragment first:

   0x000055555555526e <+121>:   call   0x5555555554d4 <_ZNKSt6vectorIiSaIiEE4sizeEv>
   0x0000555555555273 <+126>:   sub    $0x2,%rax
   0x0000555555555277 <+130>:   cmp    %rax,%rbx

The first instruction, call, invokes std::vector::size(). We set a breakpoint there with b *0x000055555555526e and continue until it is hit:

(gdb) b *0x000055555555526e
Breakpoint 2 at 0x55555555526e: file ./size_t.cc, line 11.
(gdb) c
Continuing.

Breakpoint 2, 0x000055555555526e in main (argc=1, argv=0x7fffffffe4a8)
    at ./size_t.cc:11
11        if (pos <= v.size() - 2) {
(gdb) x/3i $pc
=> 0x55555555526e <main(int, char**)+121>:
    call   0x5555555554d4 <_ZNKSt6vectorIiSaIiEE4sizeEv>
   0x555555555273 <main(int, char**)+126>:      sub    $0x2,%rax
   0x555555555277 <main(int, char**)+130>:      cmp    %rax,%rbx

The pc has reached the breakpoint address. From these three instructions we can see that the result of the expression v.size() - 2 ends up in register rax, and the value of the variable pos is in rbx. We single-step and, after each step, check these two registers and the flags register:

(gdb) ni  ## call   0x5555555554d4 <_ZNKSt6vectorIiSaIiEE4sizeEv>
11        if (pos <= v.size() - 2) {
(gdb) i r rax rbx eflags
rax            0x1                 1
rbx            0x0                 0
eflags         0x202               [ IF ]
(gdb) ni  ## sub    $0x2,%rax
11        if (pos <= v.size() - 2) {
(gdb) i r rax rbx eflags
rax            0xffffffffffffffff  -1
rbx            0x0                 0
eflags         0x297               [ CF PF AF SF IF ]
(gdb) x/3i $pc
=> 0x555555555277 <main(int, char**)+130>:      cmp    %rax,%rbx
   0x55555555527a <main(int, char**)+133>:      setbe  %al
   0x55555555527d <main(int, char**)+136>:      test   %al,%al
(gdb) ni  ## cmp    %rax,%rbx
0x000055555555527a      11        if (pos <= v.size() - 2) {
(gdb) i r rax rbx eflags
rax            0xffffffffffffffff  -1
rbx            0x0                 0
eflags         0x213               [ CF AF IF ]
(gdb)

The flags tell the story. After sub $0x2,%rax, the flag CF is set: interpreted as unsigned numbers, 1 - 2 needs a borrow, so the result wrapped around. SF is also set, meaning the top bit of the result is 1, so the same bits read as a signed number are negative (0xffffffffffffffff is the two's-complement encoding of -1; in sign-magnitude form it would be 0x8000000000000001).

After the following cmp %rax,%rbx, CF is set again: cmp computes rbx - rax, that is 0x0 - 0xffffffffffffffff, and as unsigned numbers this subtraction also borrows. The next instruction, setbe ("set if below or equal"), is the unsigned form of "<=" and is true when CF or ZF is set, so the condition holds. In effect the computer concludes "0 <= -1"!

So the root cause is that the comparison is carried out on unsigned numbers: the value we think of as the signed number "-1" is treated as the unsigned number 0xffffffffffffffff, and the size comparison gives an unexpected result.

Issue Tracking

Then why does the computer treat "-1" as an unsigned number? Let's look at the declaration of std::vector::size():

size_type size() const noexcept;

So the 1 returned by v.size() has type size_type. The cplusplus website describes this type as an unsigned integral type, usually the same as size_t. Now the truth is clear: because v.size() returns an unsigned integer, C++ implicitly converts the signed operands (2 and pos) to unsigned in the subsequent subtraction and comparison, and every operation becomes an unsigned operation. The compiler gives no warning about the wraparound of the unsigned subtraction itself, although GCC and Clang can flag the mixed signed/unsigned comparison when -Wsign-compare is enabled.

Together with the conclusions from the disassembly and debugging session above, this confirms our initial guess about the cause.

Problem Solved

Now that the root cause is known, the fix is relatively simple: make sure the type conversions are valid when performing the operation.

Because the unsigned value returned by v.size() in the sample code is small, we can explicitly convert it to a signed integer:

...
  if (pos <= (int)v.size() - 2) {
...

But an explicit conversion does not by itself guarantee correctness. Converting a negative number to an unsigned type, or converting an unsigned value larger than the signed maximum to a signed type, are two typical conversions that produce a wrong value.

Before converting between data types, we should make sure the value still fits after the conversion!