[C++ Summary and Extraction 0x02] Memory Management (1): Raw Pointer


foreword

C++ inherited C's support for pointers and arrays. In early C++ code, a pointer or an array was
typically defined like this:

int n;
int *p = &n;
int digits[10] = {5, 5, 5, 1, 2, 3, 4, 5, 6, 7};

In modern C++, however, we tend to avoid these raw pointers and arrays.
Instead, we use smart pointers (e.g. unique_ptr) and container classes (e.g. vector, list). They offer several clear benefits:

  • Richer, more complete interfaces
  • Ability to more clearly express the developer's intent
  • Their destructors automatically release resources

Of course, this does not mean that "raw pointers" have been completely abandoned.
When we use libraries or frameworks built on raw pointers, or read code that uses them, we still have to deal with raw pointers.

This article therefore starts with the raw pointer and covers pointers to objects, null pointers, dereferencing, pointer lifetime, pointers in arrays, size_t and ptrdiff_t, const and pointers, and pointer type conversion.

pointer to object

pointers and addresses

Memory is perhaps the most important resource in a computer. Whenever you create a variable or run a program, you are dealing with memory, and pointers are our tools for manipulating and managing it. Many people find pointers complicated, but conceptually a pointer of any type is just a number: it stores a memory address, nothing more.

A more detailed statement is:

  • A pointer is an object that holds the address of another object
  • Every pointer has a type, which represents the type of object it points to:
int *pi;                  // a "pointer to int"
unsigned long *pul;       // a "pointer to unsigned long"

So when I have an arbitrary object x, and I want to get the address of this object, I use &x to get the address of x. In other words, if x is of type T, then &x is of type "pointer to T":

int i;
unsigned long ul;           // object T

int *pi = &i;
unsigned long *pul = &ul;   // pointer to T

With these two points in mind, a natural question follows: since a pointer is just a number, can its value be modified? In most cases, yes. A pointer can point to different objects at different stages of its lifetime. The exceptions are introduced later in this article.
For example, suppose we sometimes want a pointer p to point to object a and at other times to object b. We can re-point p like this:

int a = 1, b = 2;
int *p = &a;

p = &b;
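
To make the effect of re-pointing concrete, here is a minimal sketch. The function name retarget_demo is made up for this illustration; it shows that a write through p lands in whichever object p currently points to.

```cpp
#include <cassert>

// Hypothetical helper, for illustration only: writes through a pointer
// before and after it is re-pointed, then returns a + b.
int retarget_demo() {
    int a = 1, b = 2;
    int *p = &a;      // p holds the address of a
    *p = 10;          // modifies a
    p = &b;           // p now holds the address of b; a is untouched
    *p = 20;          // modifies b
    assert(p == &b);
    return a + b;     // 10 + 20
}
```

Calling retarget_demo() returns 30: the first write changed a and the second changed b, although both went through the same pointer variable.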

null pointer

There is also a kind of pointer that does not point to any object. It is called a null pointer.

int *p1 = NULL;       // traditional C
int *p2 = 0;          // traditional C++
int *p3 = nullptr;    // modern C++

p1, p2, and p3 above are all null pointers; only the spelling differs between eras. All three work, but nullptr is the preferred choice.

NULL comes from C. In C++, NULL is defined as an integer constant such as 0 or 0L (or, since C++11, possibly nullptr), so on most implementations NULL has an integer type rather than a pointer type. This can lead to surprises with overloaded functions:

void f(int i);
void f(char *s);

f(NULL);		// calls f(int) or is rejected as ambiguous, depending on how NULL is defined; not reliably f(char *)

In the above example, when NULL is the int 0, f(int) is an exact match and wins over f(char *), which needs a conversion. To fix this, C++11 introduced nullptr, which has its own type (std::nullptr_t) rather than an integer type and converts implicitly to any pointer type.
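
The following sketch mirrors the two overloads above to show how nullptr resolves. The names Picked, which, and overload_demo are assumptions made for this example, not part of any library.

```cpp
#include <cassert>

enum class Picked { Int, Pointer };

// Two overloads mirroring f(int) and f(char *) from the text.
Picked which(int)    { return Picked::Int; }
Picked which(char *) { return Picked::Pointer; }

void overload_demo() {
    // nullptr has type std::nullptr_t, which converts to char * but not
    // to int, so only the pointer overload is viable.
    assert(which(nullptr) == Picked::Pointer);

    // The literal 0 is an int, so which(int) is an exact match.
    assert(which(0) == Picked::Int);
}
```

NULL is deliberately left out of the sketch, because its behavior depends on how the implementation defines it.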

dereference

The next topic is dereferencing: when we have a pointer variable p, we can use *p to get the object pointed to by p.

int i = 13;
unsigned long ul = 42;           
int *pi = &i;
unsigned long *pul = &ul;

*pi = 14;                         // sets i to 14
*pul += 2;                        // adds 2 to ul

Take care never to dereference a null pointer: doing so is undefined behavior. We cannot know what will happen, but it will certainly not be anything good.
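
One common way to stay safe is to test a pointer before dereferencing it. The sketch below assumes a hypothetical helper value_or that falls back to a default when it receives a null pointer.

```cpp
#include <cassert>

// Hypothetical helper: returns *p when p points to an object, otherwise
// returns fallback instead of dereferencing a null pointer.
int value_or(const int *p, int fallback) {
    return p != nullptr ? *p : fallback;
}

void null_check_demo() {
    int i = 13;
    assert(value_or(&i, -1) == 13);       // valid pointer: dereferenced
    assert(value_or(nullptr, -1) == -1);  // null pointer: never dereferenced
}
```

A null check only guards against null pointers. It cannot detect a pointer that holds the address of an object that no longer exists.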

pointer lifetime

In general, the lifetime of a pointer has nothing to do with the lifetime of the object it points to.

void f(int *p){
	return;
}

int i = 10;
f(&i);

In the above example, every call to f creates a new instance of p. When the call returns, that instance is destroyed, so its lifetime is only that one call. The object i that p points to, however, already exists before the call and still exists after the call ends.
When the object outlives the pointer, there is no problem. But if the pointer outlives the object, things go wrong:

int *g(){
	int i = 0;    // i lifetime begins
	return &i;
}                 // i lifetime ends

int *pi = g();    // pi points to dead i

In the above example, the pointer pi points to a dead object. Such a pointer is called a dangling pointer, and accessing *pi is undefined behavior.
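
Two simple ways to avoid the dangling pointer from g are sketched below, under the assumption that the caller only needs the value. The names make_value and fill_value are hypothetical.

```cpp
#include <cassert>

// Option 1: return the value itself. The caller receives a copy, so
// nothing refers to the local variable after it dies.
int make_value() {
    int i = 42;
    return i;
}

// Option 2: write into an object owned by the caller, which outlives
// this call.
void fill_value(int *out) {
    assert(out != nullptr);
    *out = 42;
}
```

In both cases the object that holds the result lives at least as long as every pointer that refers to it.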

pointers in the array

First, we define the simplest fixed-length array:

char x[N];     // char array of length N

On the surface, arrays in C++ are no different from arrays in other languages. The first thing to notice, though, is that the length N must be an integer constant expression (for example, a const int initialized with a constant): the compiler has to be able to evaluate it at compile time to determine how much memory to allocate for the array.

Then we can randomly access the array elements indexed from 0 to N-1:

int k = 1;     // any valid index, 0 to N-1 (assumes N > 1)

x[0] = 'a';
x[k] = 'c';

Pointer calculations in arrays

We can also use a pointer pc to access the elements of an array. ++pc makes it point to the next element, regardless of the size of an array element. In this way, we can also iterate over the entire array:

char x[N];
char *pc = &x[0];

*pc = 'a';      // same as : x[0] = 'a'

++pc;           // pc now points to x[1]
*pc = 'c';      // same as : x[1] = 'c'

int in[5] = {1, 2, 3, 4, 5};
for(int *p = in; p < in + 5; ++p){
	// step through the array
	~~~
}

The loop above works as follows: the pointer initially points to the first element of the array and then moves forward one element at a time.
When the loop ends, the pointer points just past the last element of the array, that is, to the location immediately after the array's storage. Forming this one-past-the-end pointer is allowed, but reading or writing through it is undefined behavior.
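
The same traversal can be written as a function over a half-open range, sketched below. The names sum_range and sum_demo are assumptions made for this example; the one-past-the-end pointer is only compared against, never dereferenced.

```cpp
#include <cassert>

// Sums the elements in [first, last). Both pointers must point into (or
// one past the end of) the same array.
int sum_range(const int *first, const int *last) {
    assert(first <= last);
    int total = 0;
    for (const int *p = first; p != last; ++p) {
        total += *p;    // p is always strictly before last here
    }
    return total;
}

int sum_demo() {
    int in[5] = {1, 2, 3, 4, 5};
    return sum_range(in, in + 5);   // in + 5 is one past the last element
}
```

The standard library algorithms use the same convention: a range is given by a pointer (or iterator) to its first element and one just past its last.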

The phrase "regardless of the size of an array element" above means that pointer arithmetic within an array is always measured in array elements, no matter how many bytes each element actually occupies. See the example below:

int i = 1, j = 4;            // any valid indices into x
int x[5];
int *p = &x[i], *q = &x[j];

int n = q - p;
int m = j - i;
if(n == m) {
	~~~
}                            // always true

Note in particular that subtracting two pointers is only valid when both point to elements of the same array (or one past its end). The result is a signed integer value; its exact type, ptrdiff_t, is introduced later in this article.
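
As a small check of the claim that pointer subtraction counts elements, here is a sketch with a hypothetical helper element_distance. It returns std::ptrdiff_t, the type the language actually uses for pointer differences.

```cpp
#include <cassert>
#include <cstddef>

// Element distance between two pointers into the same array.
std::ptrdiff_t element_distance(const int *p, const int *q) {
    return q - p;   // counted in elements, not in bytes
}

void distance_demo() {
    int x[5] = {0, 0, 0, 0, 0};
    const int i = 1, j = 4;
    assert(element_distance(&x[i], &x[j]) == j - i);   // forward: positive
    assert(element_distance(&x[j], &x[i]) == i - j);   // backward: negative
    assert(element_distance(x, x + 5) == 5);           // one past the end is allowed
}
```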

Array subscripts and pointers

You can treat the x in the above example as a pointer to x[0]:

int *pi = x;   // same as : pi = &x[0]
*x = 4;        // same as : x[0] = 4

Going one step further, the array subscript [] is really a pointer operation rather than an array operation:

x[i]   // same as : *(x + i)

for(int i = 0; i < n; ++i){
	sum += x[i];
	sum += *(x + i);   // equivalent but not recommended
}

Having said that, you might think that an array is really just a pointer. It is not. We can assign x directly to a pointer because, in that expression, the compiler converts the array x to a pointer to its first element. This conversion is called "decay".

The decay is temporary: it produces a pointer value for that one expression, and x itself remains an array. It is just like adding an int to a double: the compiler converts the int value to a double for the calculation, but the int variable does not become a double.

double d;
int i;
~~~
d += i;     // is treated as:
{
	double t = static_cast<double>(i);
	d += t;
}           // t's lifetime ends here

Similarly, the compiler temporarily converts x to &x[0].
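
The difference between an array and the pointer it decays to can be observed with sizeof. The sketch below uses a made-up function name, decay_demo.

```cpp
#include <cassert>
#include <cstddef>

void decay_demo() {
    int x[5] = {1, 2, 3, 4, 5};
    int *p = x;                            // x decays to &x[0] for this initialization

    assert(p == &x[0]);
    assert(sizeof(x) == 5 * sizeof(int));  // x itself is still an array of 5 ints
    assert(sizeof(p) == sizeof(int *));    // p is only a pointer
    assert(*(x + 2) == x[2]);              // subscripting is pointer arithmetic
}
```

sizeof is one of the few contexts in which an array does not decay, which is why sizeof(x) still reports the size of the whole array.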

It is worth mentioning, however, that an array used as a function parameter really is a pointer. The three forms below declare exactly the same function, so only one of them may be defined in a program:

void foo(int *x){
	cout << sizeof(x);           // sizeof(x) = sizeof(int *)
}
void foo(int x[]){
	cout << sizeof(x);           // sizeof(x) = sizeof(int *)
}
void foo(int x[10]){
	cout << sizeof(x);           // sizeof(x) = sizeof(int *)
}
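
Because the length is lost when an array is passed this way, it has to be supplied separately, or the array has to be passed by reference. A minimal sketch of both options follows; the names element_count, sum_n, and param_demo are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Takes the array by reference, so it does not decay and N is deduced
// from the argument's type.
template <std::size_t N>
std::size_t element_count(const int (&)[N]) {
    return N;
}

// Takes a pointer: the length must be passed as a second argument.
int sum_n(const int *x, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += x[i];
    }
    return total;
}

void param_demo() {
    int a[10] = {};
    int b[3] = {1, 2, 3};
    assert(element_count(a) == 10);
    assert(sum_n(b, element_count(b)) == 6);
}
```

Standard containers such as vector avoid the problem entirely, because they carry their own size.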

size_t and ptrdiff_t

Many standard library functions take parameters or return values that are object sizes in bytes, for example:

T *p = static_cast<T *>(malloc(N));   // N is the number of bytes to allocate
memcpy(dst,src,N);                    // N is the number of bytes to copy from src to dst

What is the type of N here? The first type that comes to mind is int.
But we want these functions to work with any object, so N must be able to represent the size of any object. The designers therefore introduced a dedicated type, size_t, which is also the type of the value produced by the built-in sizeof operator:

#include <cstddef>
using namespace std;

size_t n = sizeof(widget);

On different platforms, the definition of size_t may be slightly different, but there are two general principles:

  • size_t is an unsigned integer type, since the size of an object cannot be negative
  • size_t may be unsigned int, unsigned long, or even unsigned long long; it must be able to represent the size of the largest object on the target machine.

Here is an example of size_t as a return type:

size_t strlen(char const *);

sizeof(array) returns the number of bytes occupied by an array, which is also an upper bound on how many elements the array can contain (each element occupies at least 1 byte).

As mentioned above, size_t can never be negative, but the result of subtracting two pointers into an array can be. For example, if p points three elements before q, then p - q = -3. To represent such results, the designers provide another type, ptrdiff_t, a signed integer type that holds the result of subtracting two pointers.

size_t is unsigned and ptrdiff_t is signed. When a signed value is compared with an unsigned value of the same width, the compiler converts the signed value to unsigned, which may cause unexpected bugs:

char buffer[64];
char const *field_end = strchr(field,',');   // field: the input string, declared elsewhere
ptrdiff_t length = field_end - field;
if(length < sizeof(buffer)){
	// a signed value is compared with an unsigned value
	~~~
}
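
One way to make such a comparison safe is to deal with the sign explicitly before converting. The sketch below assumes a hypothetical helper, fits_in_buffer, that checks whether a field of the given length plus a terminating '\0' fits in a buffer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true when `length` bytes plus a terminating '\0'
// fit into a buffer of `capacity` bytes.
bool fits_in_buffer(std::ptrdiff_t length, std::size_t capacity) {
    if (length < 0) {
        return false;                 // handle the sign before converting
    }
    // length is known to be non-negative, so this conversion keeps its value.
    return static_cast<std::size_t>(length) < capacity;
}

void compare_demo() {
    assert(fits_in_buffer(10, 64));
    assert(!fits_in_buffer(64, 64));  // no room left for the terminator
    assert(!fits_in_buffer(-1, 64));  // a negative length is rejected explicitly
}
```

With the sign check in place, the unsigned comparison can no longer be affected by a negative length being reinterpreted as a very large number.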

const and pointers

When const meets pointers, many programmers get confused. This section summarizes how const affects pointers.

Let's start with T *p:

const T *p;            // (1)
T const *p;            // (2)
T *const p;            // (3)
const T *const p;      // (4)
T const *const p;      // (5)

What do they mean respectively?

For (1) and (2), const is on the left side of the *, and p is a "pointer to constant T". We cannot change the T object through p, but we can change where p points:

T x,y;
p = &x;   // OK : can modify p
*p = y;   // NO : can not modify T referenced by *p

For (3), const is on the right side of the *, and p is a "constant pointer to T". In other words, p can only ever point to the one T object it was initialized with, but that T object can be modified through p:

T x,y;
p = &x;   // NO : can not modify p
*p = y;   // OK : can modify T referenced by *p

Then, for (4)(5), it is obvious that neither T nor p can be modified, it is a constant pointer p pointing to a constant T.
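
The three cases can be tried out with T = int. In the sketch below (const_demo is a made-up name), the lines that the compiler would reject are left as comments.

```cpp
#include <cassert>

void const_demo() {
    int x = 1, y = 2;

    const int *p1 = &x;        // (1)/(2): pointer to const int
    p1 = &y;                   // OK: the pointer itself can change
    // *p1 = 3;                // error: cannot modify the int through p1

    int *const p2 = &x;        // (3): const pointer to int, must be initialized
    *p2 = 3;                   // OK: the int can change
    // p2 = &y;                // error: p2 cannot be re-pointed

    const int *const p3 = &y;  // (4)/(5): neither can change through p3

    assert(*p1 == 2);
    assert(x == 3);
    assert(*p3 == 2);
}
```

Note that x itself is not const: the const in p1 and p3 only restricts what can be done through those pointers.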

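Next comes constexpr. constexpr and const are not interchangeable here: constexpr applies to the object being declared, so in a pointer declaration it always makes the pointer itself constant, never the pointee (initializers are omitted below for brevity):

char constexpr *p;
~~~
char const     *p;  // not equivalent: the pointee is const
char *const     p;  // equivalent constness: the pointer is const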

The last point is conversion between const and non-const pointers. One example is enough to remember the rule:

T *p;
T const *pc;

pc = p;    // OK
p = pc;    // NO : lost const

If p = pc were allowed and pc pointed to a const T object, we could then modify that constant object through p, which must not be possible.

pointer type conversion

In the early days of the C language, unintended conversions between different pointer types caused many bugs. For such a conversion, a C compiler usually only gives a warning, but in C++ the conversion is not allowed and the compiler reports an error.

gadget *pg;
widget *pw;  // gadget and widget are distinct types
~~~
pg = pw;
pw = pg;     // warning in C, error in C++

Of course, you can use reinterpret_cast to make the compiler shut up and force the conversion to be completed, but this does not mean that this conversion is safe.

pg = reinterpret_cast<gadget *>(pw); 

So, which pointer conversions are safe?

  • A pointer to a derived class can be converted to a pointer to its base class
  • Any object pointer can be assigned directly to void * (as long as const is not dropped)

The first point needs no elaboration.
Regarding the second point, some functions in the C language (such as malloc and free) are designed to operate on pointers to arbitrary objects, and C and C++ provide void * to represent pointers that can point to arbitrary data types:

void *malloc(size_t n);
void free(void *p);

Above we said that any object pointer can be assigned directly to void *, but the reverse is not true: in C++, converting a void * back to a typed pointer requires an explicit cast such as static_cast. A void * carries no information about the type it points to, so the program cannot access, assign, or do arithmetic on the pointed-to object through it, and the compiler cannot check that the conversion back is correct.
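
The round trip through void * can be sketched with malloc and free. The helper names alloc_ints and void_ptr_demo are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdlib>

// Allocates room for n ints with std::malloc. The result is a void *, so
// C++ needs an explicit cast to get an int * back.
int *alloc_ints(std::size_t n) {
    void *raw = std::malloc(n * sizeof(int));
    return static_cast<int *>(raw);   // void * -> int * must be explicit
}

void void_ptr_demo() {
    int *p = alloc_ints(4);
    assert(p != nullptr);
    p[0] = 7;

    void *v = p;                      // int * -> void * is implicit
    assert(static_cast<int *>(v) == p);
    assert(p[0] == 7);

    std::free(v);                     // free takes a void *
}
```

The cast back to int * is only correct because the memory really is used as ints. The compiler takes the programmer's word for it.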

Summary

This article introduced the usage of raw pointers. Understanding smart pointers requires more background knowledge, so the author will cover smart pointers after introducing that background.

Origin blog.csdn.net/qq_35595611/article/details/126433137