In-depth analysis of pointers: dangling pointers and pointers explained simply, and why you can't do without pointers

Table of Contents

  • Introduction
  • Background
  • What is a pointer?
  • What is a pointer to a pointer?
  • Dangling pointer
  • Where do the pointers start?
  • The difference between local variables and pointers
  • So when is it impossible to do without pointers?
  • However, there are so many magical smart pointers!
  • What are the call stack and stack overflow?
  • Cannot do without pointers: case 1
  • Cannot do without pointers: case 2
  • Cannot do without pointers: case 3
  • Cannot do without pointers: case 4
  • Cannot do without pointers: case 5
  • Cannot do without pointers: case 6
  • Why pointers are so powerful
  • Difference between object pointer and function pointer
  • Pass parameters by value
  • Pass parameters by reference
  • Points of Interest

Introduction

Pointers are the sworn enemy of many beginners; they could be called the Jiayuguan of C++, the fortress pass every learner has to get through. In this article, I will try to clarify some points about pointers and their usage. I hope it helps those of you who are learning.

Background

On the Web, I keep running into countless discussions about pointers in languages such as C and C++, with constant debates about whether using pointers is worthwhile or even wise. The question seems to live on to this day. So, hoping to clarify a few things, I decided to write this article. I am not pedantic about the subject, nor do I religiously insist that every new must have its matching delete or that every malloc must have a corresponding free.

This article applies to C and C++, where you have no choice but to deal with raw pointers. Other languages (such as Java, C#, etc.) hide all the voodoo magic behind the scenes. It is also suitable for beginners who do not write code in ASM, C, or C++ but are confused by the topic.

I will try to cover all situations without getting too deep into technical details. If you want very deep technical knowledge of pointers, you can find plenty of information everywhere from Wikipedia to countless blogs. There is a great article on pointers on the cplusplus.com website, so I won't repeat it here. Assuming you develop in C or C++, I will cover the following scenarios:

  • When is it impossible to achieve something without using pointers
  • Advantages of using pointers
  • Pass variables by value, reference or pointer
  • Dangling pointer

What is a pointer?

A pointer is a variable of a specific integer width that holds the address in computer memory where a value (float, double, int, struct, class, etc.) is stored. A pointer is therefore something the computer understands natively, at the bare-metal level. A pointer is always an unsigned integer with a width of 8, 16, 32, 64, 128 bits, etc. The width depends largely on the width of the CPU's main registers, and also on the bitness of the operating system running on it. It is entirely possible to run a 16-bit OS on a 64-bit CPU, but not vice versa. However, in a 16-bit OS you will be limited to the 16-bit address space, even if the CPU is 64 bits wide.

Side note: because a 64-bit register is more than wide enough for the 16-bit OS mentioned above, a special segmented memory manager could be written to peek beyond the 64 KB limit, provided the physical registers are wide enough (ah, the good old 16-bit Windows). The topic of 16-bit memory segments and offsets is obsolete today; in this discussion I focus only on flat memory.

The width of a pointer directly determines how much memory it can address: the highest address an n-bit pointer can hold is 2^n - 1, so it can address 2^n bytes.
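To check the formula, here is a small C++ sketch. The helper names max_address and pointer_bits are hypothetical, and the code stops at 64 bits because standard C++ has no portable 128-bit integer type.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Highest address an n-bit pointer can hold: 2^n - 1.
// Hypothetical helper, valid for 1 <= bits <= 64 (std::uint64_t is the
// widest portable unsigned type, so 128-bit widths are out of reach here).
constexpr std::uint64_t max_address(unsigned bits)
{
    return bits >= 64 ? ~std::uint64_t{0}
                      : (std::uint64_t{1} << bits) - 1;
}

// Width in bits of a data pointer in the current build, e.g. 32 or 64.
constexpr unsigned pointer_bits()
{
    return static_cast<unsigned>(sizeof(void*) * CHAR_BIT);
}

// Compile-time checks of the formula for two classic widths.
static_assert(max_address(16) == 65535, "16-bit: 64 KB address space");
static_assert(max_address(32) == 4294967295ULL, "32-bit: 4 GB address space");
```

On a typical 64-bit desktop build, pointer_bits() returns 64.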

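The list below shows the addressable memory for common pointer widths; in it you can see both the past and the future:

  • 8-bit: 2^8 = 256 bytes
  • 16-bit: 2^16 = 65,536 bytes (64 KB)
  • 32-bit: 2^32 = 4,294,967,296 bytes (4 GB)
  • 64-bit: 2^64 bytes (16 EB)
  • 128-bit: 2^128 bytes (about 3.4 × 10^38 bytes)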

If you ever wake up in the world of a sci-fi novel, the first thing to check is the addressable pointer memory of the computers they use, so you understand what you are dealing with.

128-bit registers do exist in today's 64-bit processors, usually for SIMD (Single Instruction, Multiple Data) instructions. Long story short: they let you load four 32-bit integers side by side into the same register and apply one operation, such as a multiply or an add, to all four with a single instruction. Also worth noting: 1 EB equals about 1 million terabytes.

What is a pointer to a pointer?

A pointer to a pointer is an integer variable that holds the address of another pointer. It is commonly used to return a pointer from a function through a parameter. It is also used wherever a pointer could otherwise be left dangling.

//
// Calling a function that returns a pointer through a pointer to a pointer
//
A* ptr = nullptr;
// declared as void SomeFuncReturnsPtr(A** p) { *p = value; } where value is an A*
SomeFuncReturnsPtr(&ptr);
// ptr is no longer null
ptr->DoStuff();
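Here is a self-contained version of that call pattern. It is only a sketch: the type A, its DoStuff() member, and this particular SomeFuncReturnsPtr are hypothetical stand-ins, and the function simply allocates a new A and hands it back through the pointer-to-pointer parameter.

```cpp
#include <cassert>

// Hypothetical type standing in for whatever the real API returns.
struct A
{
    int calls = 0;
    void DoStuff() { ++calls; }
};

// Returns a newly allocated A through a pointer to a pointer.
// Returns false (and writes nothing) if p itself is null.
bool SomeFuncReturnsPtr(A** p)
{
    if (p == nullptr)
        return false;
    *p = new A;                 // write into the caller's pointer variable
    return true;
}

// Example caller: the pointer starts null and is filled in by the callee.
int UseReturnedPointer()
{
    A* ptr = nullptr;
    if (!SomeFuncReturnsPtr(&ptr))
        return -1;
    ptr->DoStuff();             // ptr is no longer null
    int calls = ptr->calls;
    delete ptr;                 // the caller now owns the object
    return calls;
}
```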

Dangling pointer

A pointer passed to another function directly (that is, as a copy of the pointer) faces the following danger: if that function deletes the pointer, the other copy of the pointer, held by the calling function, is not invalidated. It still holds the address of the freed memory and therefore becomes a dangling pointer, because its state looks valid (it is not null). Any use of such a pointer results in undefined behavior. In an unmanaged environment this is a real danger, because it is hard to track down at runtime, so it is best to deal with it while writing the code.

The following code shows how a dangling pointer is born, and how to handle that possibility defensively:

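#include <cassert>

struct A
{
        void DoThis() {}
        void DoThat() {}
};

void NukeA(A* p)
{
        p->DoThis();
        p->DoThat();

        delete p; // Kaboom! A dangling pointer is born in the caller

        p = nullptr; // Only nulls this local copy; the caller's pointer still dangles
}

void NukeSafelyA(A** p)
{
        (*p)->DoThis();
        (*p)->DoThat();
        delete *p; // Nice
        *p = nullptr; // Nulls the caller's own pointer
}

// The two main functions below are alternatives: compile only one of them.

// Disastrous main
int main()
{
        A* ptr = new A;
        NukeA(ptr);
        // ptr now is a dangling pointer
        assert(ptr == nullptr); // Kaboom. Still points to wiped memory
        delete ptr; // Kaboom: double delete, undefined behavior
}

// Safe main
int main()
{
        A* ptr = new A;
        NukeSafelyA(&ptr);
        // ptr now is null
        assert(ptr == nullptr);
}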
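To see the difference without touching freed memory, the sketch below counts live objects instead of dereferencing anything. The live_count counter and the reference-to-pointer variant NukeSafelyRef are illustrative additions, not part of the original example; A*& is plain C++ and, as far as the caller is concerned, has the same effect as A**.

```cpp
#include <cassert>

struct A
{
    static int live_count;          // number of A objects currently alive
    A()  { ++live_count; }
    ~A() { --live_count; }
    void DoThis() {}
    void DoThat() {}
};
int A::live_count = 0;

// Pointer-to-pointer version: deletes the object and nulls the caller's pointer.
void NukeSafelyA(A** p)
{
    (*p)->DoThis();
    (*p)->DoThat();
    delete *p;
    *p = nullptr;
}

// C++ alternative: a reference to the caller's pointer does the same job.
void NukeSafelyRef(A*& p)
{
    p->DoThis();
    p->DoThat();
    delete p;
    p = nullptr;
}

// True if both variants free the object and leave the caller's pointer null.
bool BothVariantsNullTheCallerPointer()
{
    A* first = new A;
    NukeSafelyA(&first);
    A* second = new A;
    NukeSafelyRef(second);
    return first == nullptr && second == nullptr && A::live_count == 0;
}
```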

In fact, if you hold dynamically allocated memory through a pointer, you should never pass that pointer by value to another function, especially if that function can or might delete it. Pass it only through a pointer to a pointer; then, if the other function does delete it, it can change your pointer to null. This is a very simple example, but it does show how a dangling pointer can happen.


Where do the pointers start?

The bare metal of the computer doesn't care about, or even know, what a variable is. All it knows are CPU registers and memory addresses, that is, pointers. The compiler creates the illusion of variables through declarations, and maps each variable to CPU registers and to the memory address its value is loaded from.

The difference between local variables and pointers

All local variables are declared in, and reside in, the function's stack frame. A pointer to dynamically allocated memory is also located in the stack frame, but it points into the program's global heap. The unmanaged global heap is the program's responsibility to manage, so any leaked memory will eventually exhaust resources and, sooner or later, crash your program.
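A small sketch of that difference: the pointer variable below lives in the function's stack frame and disappears when the function returns, but the heap object it points to does not. The Tracked type, its live counter, and the LeakedBy helper are hypothetical, added only to make the leak observable.

```cpp
#include <cassert>

struct Tracked
{
    static int live;               // heap objects not yet deleted
    Tracked()  { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

void LeakOne()
{
    int local = 42;                // automatic variable: gone at return
    Tracked* p = new Tracked;      // p is in the stack frame, *p is on the heap
    (void)local;
    (void)p;
}                                  // p is destroyed here, *p is not: leaked

void NoLeak()
{
    Tracked* p = new Tracked;
    delete p;                      // the program must free heap memory itself
}

// Returns how many Tracked objects a call to f() left behind.
int LeakedBy(void (*f)())
{
    int before = Tracked::live;
    f();
    return Tracked::live - before;
}
```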

So when is it impossible to do without pointers?

I can list a few such situations, but first let me lay some groundwork. Each program has multiple function stack frames and a global heap. The global heap is basically virtual memory managed by the OS. The stack is a LIFO (Last In, First Out) data structure of limited size. As you can imagine, any overflow beyond it into adjacent memory will effectively wipe out the data saved there and crash the program.

However, there are so many magical smart pointers!

Remember this: no code looks and behaves as badly as code riddled with smart pointers, written by someone who doesn't know what a pointer actually is or represents. Please read through all of the following situations before you start sprinkling smart pointers all over your program. Also, go back and re-read the "Dangling pointer" section. Smart pointers are notorious for creating exactly this kind of dangling pointer underneath, when raw pointers are attached to and detached from smart pointers by reflex rather than with brain cells.

What are the call stack and stack overflow?

The stack is a LIFO data structure. Each thread has its own stack, so every process has at least one. The stack is subdivided into stack frames: one frame for each function call currently in progress (not one per function in the program), so a function nested ten calls deep in recursion occupies ten frames at once. A stack frame is a smaller block within the stack. The stack size is limited: the default is 1 MB on Windows and typically 8 MB on Linux. On UNIX systems it is a per-process resource limit that you can inspect or change with ulimit -s. The Visual C++ compiler lets you change the stack size with the /F flag.
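One way to see that every active call gets its own frame is to record the address of a local variable at each recursion depth. This is a sketch assuming the recursion is shallow enough to fit comfortably on the stack; the names record_frames and count_distinct_frames are hypothetical. It relies only on the guarantee that distinct live objects have distinct addresses; the direction in which the stack grows is platform-specific, so the code does not depend on it.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Records the address of one local variable per active call, down to depth 0.
// Each insert happens while all enclosing frames are still alive, so every
// recorded address belongs to a different frame.
void record_frames(int depth, std::set<const void*>& addresses)
{
    int local = depth;                 // lives in this call's stack frame
    addresses.insert(&local);
    if (depth > 0)
        record_frames(depth - 1, addresses);
}                                      // frame released here, in LIFO order

std::size_t count_distinct_frames(int depth)
{
    std::set<const void*> addresses;
    record_frames(depth, addresses);
    return addresses.size();           // one entry per frame: depth + 1
}
```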

Stack overflow is best characterized by the description of the _chkstk() function, taken from the comment header of its assembly source in the Microsoft C runtime:

;***
;_chkstk - check stack upon procedure entry
;
;Purpose:
;       Provide stack checking on procedure entry. Method is to simply probe
;       each page of memory required for the stack in descending order. This
;       causes the necessary pages of memory to be allocated via the guard
;       page scheme, if possible. In the event of failure, the OS raises the
;       _XCPT_UNABLE_TO_GROW_STACK exception.
;
;       NOTE:  Currently, the (EAX < _PAGESIZE_) code path falls through
;       to the "lastpage" label of the (EAX >= _PAGESIZE_) code path.  This
;       is small; a minor speed optimization would be to special case
;       this up top.  This would avoid the painful save/restore of
;       ecx and would shorten the code path by 4-6 instructions.
;
;Entry:
;       EAX = size of local frame
;
;Exit:
;       ESP = new stackframe, if successful
;
;Uses:
;       EAX
;
;Exceptions:
;       _XCPT_GUARD_PAGE_VIOLATION - May be raised on a page probe. NEVER TRAP
;                                    THIS!!!! It is used by the OS to grow the
;                                    stack on demand.
;       _XCPT_UNABLE_TO_GROW_STACK - The stack cannot be grown. More precisely,
;                                    the attempt by the OS memory manager to
;                                    allocate another guard page in response
;                                    to a _XCPT_GUARD_PAGE_VIOLATION has
;                                    failed.
;
;*******************************************************************************

Since the stack size is limited, its growth is guarded: each page the stack needs is probed, touching the guard page makes the OS commit more stack, and if the stack cannot grow any further, an exception is raised. _PAGESIZE_ is 4 KB on x86 and x64 Windows (8 KB on some 64-bit architectures, such as Itanium). Therefore, any local frame larger than one page gets probed, but that does not necessarily mean a stack overflow.

The alloca() function (spelled _alloca() in Visual C++) allocates memory from the current function's stack frame at runtime, so it is subject to the same limit.

Cannot do without pointers: case 1

The object is larger than the stack. This restriction forces you to use the global heap, so if an object is too large, or may grow larger over time, you need to use pointers. Everyone has run into a stack overflow exception or segmentation fault at least once.

Just to name a few cases:

  • A character array declared on the stack to read in a file larger than 1 MB (or larger than the default stack size):
//
// Kaboom – Stack overflow
//
char file_readin[2000000]; // to read in a file of that size or smaller
...
  • C++ objects of nested classes and arrays whose cumulative size is greater than 1 MB (or greater than the default stack size):
class CGiganticClass{…}; // sizeof(CGiganticClass) >= 1,048,576 bytes

//
// Kaboom – Stack overflow
// 
CGiganticClass a;
  • Recursive functions that nest thousands of calls deep. In this case even pointers cannot save you, though moving large locals to the heap may delay the inevitable:
//
// Kaboom – Stack overflow
//
void recursive_function(int value)
{
        char file_readin[200]; // 200 bytes per frame, on top of the call overhead
        for(int i = 0; i < 100000; i++)
        {
                 recursive_function(i); // no base case: every call recurses again
        }
}
  • Implementing a collection that uses the stack only:
//
// Futile attempt to write a collection that uses the stack only
//
template <class T>
class futile_array
{
        T arr[1000];            // fixed capacity, stored inside the object itself
        size_t avail_index = 0; // next free slot
public:
        void add(const T& val)
        {
                if(avail_index >= 1000)
                        return; // full: a stack-only collection cannot grow

                arr[avail_index++] = val;
        }
};

int main()
{
        //
        // Kaboom – Stack overflow: 1000 * sizeof(CGiganticClass) bytes on the stack
        //
        futile_array<CGiganticClass> a;
}
  • Subsystems that are not available at compile time and exist only at runtime, and therefore can only hand you pointers (Windows API, DirectX API, Linux system calls, OpenGL, etc.):
int main()
{
        //
        // Available only at runtime, not at compile time
        //
        void* ptr = ::SomeOperatingSysAPI();

        // cast and do whatever you need
        SomeStruct* p = static_cast<SomeStruct*>(ptr);
}
  • Any other kind of object creation or destruction decided at runtime.
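The usual cure for the first two bullets is to move the big data to the heap and keep only a pointer (or a small owning handle) in the stack frame. A minimal sketch, assuming C++14 for std::make_unique; the function names and the Gigantic type (a stand-in for CGiganticClass) are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Heap buffer for reading a large file: the bytes live on the heap and only
// the unique_ptr lives in the caller's frame.
// make_unique<char[]> value-initializes, so the buffer starts zeroed.
std::unique_ptr<char[]> make_file_buffer(std::size_t size)
{
    return std::make_unique<char[]>(size);
}

// The same idea with std::vector, which also keeps its elements on the heap.
std::vector<char> make_file_vector(std::size_t size)
{
    return std::vector<char>(size);   // zero-initialized
}

// Hypothetical stand-in for CGiganticClass: about 2 MB, too big for a 1 MB stack.
struct Gigantic { char bytes[2000000]; };

// Allocated on the heap, so no stack frame ever holds its 2 MB.
std::unique_ptr<Gigantic> make_gigantic()
{
    return std::unique_ptr<Gigantic>(new Gigantic());  // value-initialized (zeroed)
}
```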

If you work on embedded systems, the situation is even worse: there, you will be lucky to get a stack of 4 KB, and it may be even smaller.

Cannot do without pointers: case 2

Controlled creation. Declaring a pointer creates no object. And even if it points to a large object, the pointer itself occupies only 4 bytes (32-bit pointers) or 8 bytes (64-bit pointers) in memory. You may want to create an object only at runtime, only under certain conditions, and only when absolutely necessary. That way you control the program's memory consumption.
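A sketch of controlled creation: the owner below holds only a pointer, and the expensive object is allocated the first time it is actually needed. HeavyResource and LazyOwner are hypothetical names; copying is disabled so two owners can never delete the same pointer.

```cpp
#include <cassert>

struct HeavyResource
{
    char data[1 << 20];                 // 1 MB payload, allocated only on demand
};

class LazyOwner
{
    HeavyResource* res_ = nullptr;      // costs one pointer until first use
public:
    LazyOwner() = default;
    LazyOwner(const LazyOwner&) = delete;            // no shared ownership
    LazyOwner& operator=(const LazyOwner&) = delete;
    ~LazyOwner() { delete res_; }                    // deleting null is a no-op

    bool created() const { return res_ != nullptr; }

    HeavyResource& get()
    {
        if (res_ == nullptr)
            res_ = new HeavyResource;   // created only when really needed
        return *res_;
    }
};

// True if nothing is allocated before get() and the object exists afterwards.
bool DemoLazyCreation()
{
    LazyOwner owner;
    bool empty_before = !owner.created();
    owner.get().data[0] = 'a';
    return empty_before && owner.created() && owner.get().data[0] == 'a';
}
```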

Cannot do without pointers: case 3

The opposite of case 2: controlled destruction. Automatic variables are destroyed in reverse order of creation when the function exits, and this applies to main as well. Sometimes you need to clean up before the end of the function: release some resources, unload dynamic libraries, and so on. Imagine, for example, a class that holds a pointer to an object from a DLL, and an instance of that class declared as a stack variable. You might unload the DLL before the function returns, that is, before the class destructor is called. If the destructor happens to perform cleanup tasks using that same DLL, your program will crash. Such objects must be destroyed before the DLL is released, which may well be before the function returns. Therefore, you need to control when the object is destroyed.
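The DLL scenario can be simulated without a real DLL. In this sketch a global flag stands in for "the library is loaded", and LoadLibraryStub() and UnloadLibrary() are hypothetical placeholders for calls such as LoadLibrary/FreeLibrary or dlopen/dlclose; only the ordering matters. Because the object lives on the heap, it can be deleted before the library goes away instead of at the end of the function.

```cpp
#include <cassert>

bool g_library_loaded = false;            // stand-in for a loaded DLL
bool g_cleanup_ok = false;                // did the destructor still see the DLL?

void LoadLibraryStub() { g_library_loaded = true; }   // hypothetical
void UnloadLibrary()   { g_library_loaded = false; }  // hypothetical

struct UsesLibrary
{
    ~UsesLibrary()
    {
        // Cleanup that needs the library to still be loaded.
        g_cleanup_ok = g_library_loaded;
    }
};

// Returns true if the object was destroyed while the library was loaded.
bool ControlledShutdown()
{
    LoadLibraryStub();
    UsesLibrary* obj = new UsesLibrary;
    // ... work with obj ...
    delete obj;          // destroyed while the library is still loaded
    obj = nullptr;
    UnloadLibrary();     // safe: nothing left that depends on it
    return g_cleanup_ok && !g_library_loaded;
}
```

Had obj been a stack variable instead, its destructor would run only after UnloadLibrary(), and g_cleanup_ok would end up false.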

I know you are smart, and you will argue that you can use nested braces inside the function to control when automatic variables are destroyed. Yes, but only if you don't need the variable outside those braces, and it looks ugly.

Whatever the underlying situation, the most important thing to remember is that with a pointer you physically control the death of the object: at a time, or under specific circumstances, of your choosing.

Narrator: By the way, smart pointers do not give you "controlled destruction" out of the box. They essentially turn your pointer into a "stack variable", and (in the shared, reference-counted case) every copy, wherever it is made, bumps the reference count. The pointer's lifetime can be controlled only through the smart pointer's own call for deleting the underlying object (such as reset()), which is uglier and more obscure than simply calling the delete operator. That is my opinion, anyway.

Likewise, weak pointers are observers of strong (shared) smart pointers. In the standard library, std::shared_ptr and std::weak_ptr share a control block that holds a strong and a weak reference count; a weak pointer consults this block to find out whether the object is still alive, and the block itself lives on until the last weak pointer is gone. Every copy of a shared or weak pointer updates these counts atomically, which makes them noticeably more expensive than regular raw pointers.
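For reference, here is what observing through std::weak_ptr looks like with the standard library (C++11 and later). The sketch shows the observer learning that the object has died; the cost is not visible here, since it comes from the shared control block and its atomically updated counts. The Mesh type is hypothetical.

```cpp
#include <cassert>
#include <memory>

struct Mesh { int vertices = 0; };

// Returns true if the weak observer correctly reports the object's death.
bool WeakObserverSeesDeath()
{
    std::shared_ptr<Mesh> owner = std::make_shared<Mesh>();
    std::weak_ptr<Mesh> observer = owner;       // does not keep Mesh alive

    // A weak_ptr does not add to the strong count.
    bool alive_before = !observer.expired() && owner.use_count() == 1;

    owner.reset();                              // explicit destruction
    bool dead_after = observer.expired() && observer.lock() == nullptr;

    return alive_before && dead_after;
}
```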

I am not arguing against them, but you must know what they cost and whether it is worth it.

Cannot do without pointers: case 4

A single collection of C or C++ objects so large that any unnecessary copying or moving of them would badly fragment the heap, until successive calls to operator new (or malloc) eventually fail with an out-of-memory error. What could this be? A database engine, or a game with 50,000 objects in the scene containing meshes, textures, and other data, to name a few. In this case, once these objects are allocated, any copying or moving of them should be avoided. They must stay where they were originally created, and any operations on them (passing them to functions, sorting, etc.) must be done only through pointers, not through the objects themselves. For example, you can sort the object pointers by some property of the objects without rearranging the objects in memory. Such objects are stored in collections by pointer: a std::vector<CGiganticClass*> stores pointers and is sorted by pointer instead of by the actual class objects.

This can be confusing, so let's clear it up. Any industrial-strength collection (std::vector, for example) internally uses the heap to create its elements. What it allocates on the heap is space for its element "type", which is either the object itself or a pointer to the object. If you declare a vector of pointers, no objects are created, only pointers to them. Therefore, you can allocate the objects later. Or, if they have already been created, adding their pointers to the collection does not copy or move the objects themselves; only the pointers, 4 or 8 bytes wide, are copied and moved.
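A sketch of sorting by pointer: the heavy objects stay exactly where they were allocated, and only the 4- or 8-byte pointers are rearranged. GameObject, its distance field, and the demo function are hypothetical; std::sort with a comparison lambda is standard C++.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct GameObject
{
    float distance = 0.0f;        // sort key
    char payload[4096];           // stands in for meshes, textures, ...
};

// Sorts pointers by distance; the GameObjects themselves never move.
void SortByDistance(std::vector<GameObject*>& objects)
{
    std::sort(objects.begin(), objects.end(),
              [](const GameObject* a, const GameObject* b)
              { return a->distance < b->distance; });
}

// Three objects allocated once, then ordered through their pointers only.
bool DemoSortByPointer()
{
    GameObject* a = new GameObject; a->distance = 3.0f;
    GameObject* b = new GameObject; b->distance = 1.0f;
    GameObject* c = new GameObject; c->distance = 2.0f;

    std::vector<GameObject*> objects{a, b, c};
    SortByDistance(objects);
    bool ok = objects[0] == b && objects[1] == c && objects[2] == a;

    delete a; delete b; delete c;
    return ok;
}
```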

Cannot do without pointers: case 5

As mentioned in case 4, storing objects by pointer brings another great advantage. Not only can you store the pointers in an array, you can store the very same pointers in a std::map, a std::unordered_map, a std::list, or a hash map at the same time, without ever copying an object unnecessarily. You can then access an object by index, by key, or by any other efficient search pattern and get a pointer to what you are looking for, while the object itself stays put in heap memory. This makes even data-heavy programs extremely fast and responsive.
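A sketch of several indexes over the same objects: a vector for ordered access, a std::map keyed by name, and a std::unordered_map keyed by id all hold the same pointers, so every lookup lands on the one object in heap memory. The Record and RecordIndex types are hypothetical; ownership is kept in a vector of std::unique_ptr here, although plain new/delete would work just as well.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct Record
{
    int id;
    std::string name;
};

struct RecordIndex
{
    std::vector<std::unique_ptr<Record>> storage;      // owns the objects
    std::vector<Record*> by_position;                  // index 1
    std::map<std::string, Record*> by_name;            // index 2
    std::unordered_map<int, Record*> by_id;            // index 3

    Record* add(int id, const std::string& name)
    {
        storage.push_back(std::unique_ptr<Record>(new Record{id, name}));
        Record* r = storage.back().get();  // the Record never moves from here on
        by_position.push_back(r);
        by_name[name] = r;
        by_id[id] = r;
        return r;
    }
};

// True if all three indexes lead to the very same object.
bool DemoSharedIndexes()
{
    RecordIndex index;
    Record* alice = index.add(7, "alice");
    index.add(9, "bob");
    return index.by_position[0] == alice
        && index.by_name.at("alice") == alice
        && index.by_id.at(7) == alice;
}
```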

Cannot do without pointers: case 6

Parameters for asynchronous execution passed by pointer, better known as thread parameters, when they are actually automatic (stack) variables of the calling function. Any attempt to pass a stack variable to a separate thread by address is a recipe for disaster: when the calling function exits, the variable is destroyed, and the thread's pointer is left looking up and visiting no man's land. The same applies to misused smart pointers, which delete the object when the function exits and leave the thread's pointer dangling.
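A sketch of the safe version with std::thread: the parameters are allocated on the heap, and the thread takes ownership and deletes them when it is done, so it no longer matters when the starting function returns. WorkerParams, Worker, StartWorker, and RunWorker are hypothetical names; the broken variant would pass the address of a local WorkerParams declared inside StartWorker.

```cpp
#include <cassert>
#include <thread>
#include <utility>
#include <vector>

// Parameters for the worker thread; allocated on the heap so they outlive
// the function that starts the thread.
struct WorkerParams
{
    std::vector<int> data;
    long long* result_out;   // where the worker writes its answer
};

// Thread entry point: takes ownership of p and deletes it when done.
void Worker(WorkerParams* p)
{
    long long sum = 0;
    for (int v : p->data)
        sum += v;
    *p->result_out = sum;
    delete p;
}

// The local variable 'params' dies when this returns, but the heap object
// it points to stays valid for the thread.
std::thread StartWorker(std::vector<int> data, long long* out)
{
    WorkerParams* params = new WorkerParams{std::move(data), out};
    return std::thread(Worker, params);
}

long long RunWorker()
{
    long long result = 0;          // outlives the thread because we join below
    std::thread t = StartWorker({1, 2, 3, 4}, &result);
    t.join();
    return result;
}
```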

Why pointers are so powerful

Pointers are stateful. A pointer not only gives you access to the object itself, it can also tell you at the same time whether the object exists (a null pointer means there is no object).
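A common example of that state: a lookup that returns either a pointer to the match or nullptr. The Item type, FindItem, and the small catalog are hypothetical, chosen only to illustrate the idea.

```cpp
#include <cassert>
#include <vector>

struct Item { int id; int price; };

// Returns a pointer to the matching item, or nullptr if there is none.
// The pointer answers two questions at once: "does it exist?" and "where is it?".
const Item* FindItem(const std::vector<Item>& items, int id)
{
    for (const Item& item : items)
        if (item.id == id)
            return &item;
    return nullptr;
}

// Demo lookup against a fixed catalog; -1 means "no such item".
int PriceOrMinusOne(int id)
{
    static const std::vector<Item> catalog{{1, 10}, {2, 20}};
    const Item* found = FindItem(catalog, id);
    return found ? found->price : -1;
}
```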

Difference between object pointer and function pointer

The declaration syntax differs. An object pointer points to data (in the data segment, on the stack, or on the heap), while a function pointer points into the code segment. In addition, function pointers never need to be dynamically allocated or deleted.
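A short sketch of the function pointer side: the raw declaration syntax is int (*op)(int, int), spelled here through a type alias, and calling through it needs no new or delete. Add, Mul, Apply, and ApplyByName are hypothetical example functions.

```cpp
#include <cassert>

int Add(int a, int b) { return a + b; }
int Mul(int a, int b) { return a * b; }

// A pointer to a function taking (int, int) and returning int.
using BinaryOp = int (*)(int, int);

int Apply(BinaryOp op, int a, int b)
{
    return op(a, b);                 // call through the pointer
}

int ApplyByName(bool multiply, int a, int b)
{
    BinaryOp op = multiply ? &Mul : &Add;   // points into code, not data
    return Apply(op, a, b);
}
```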

Pass parameters by value

What did you say? Everyone knows this! Why bother covering it? Okay, okay, think back to the stack and its parameters. Before jumping to the function's label, each parameter is pushed onto the stack (modern calling conventions pass the first few small arguments in registers instead). Depending on the composition of the object, that can take many PUSH assembly instructions. This is also why C arrays are always passed by pointer, even if you don't say so explicitly. By the way, this only applies to C arrays: if you pass something like a std::vector by value, it is copied. The number of CPU cycles the call takes therefore depends on the size of the object. And because the parameter is a complete, independent copy, pushing it may also cause a stack overflow. Sometimes you have no choice but to pass by value. However, when you don't have to, passing by reference or by pointer can be hundreds of times faster for large objects, because it is a single 4- or 8-byte push instead of hundreds.
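A sketch of both points: an array parameter is silently adjusted to a pointer, while a std::vector passed by value is a full copy. The function names are hypothetical; compilers may warn about sizeof on an array parameter, which is exactly the surprise being demonstrated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The parameter "int arr[100]" is really "int* arr": only a pointer is passed.
std::size_t SizeSeenInside(int arr[100])
{
    return sizeof(arr);            // size of a pointer, not of 100 ints
}

// A std::vector passed by value is copied; changes don't reach the caller.
void ModifyCopy(std::vector<int> v)     { v[0] = 99; }

// Passed by reference: no copy, changes are visible to the caller.
void ModifyInPlace(std::vector<int>& v) { v[0] = 99; }

bool DemoPassing()
{
    int arr[100] = {};
    std::vector<int> a{1, 2, 3};
    std::vector<int> b{1, 2, 3};
    ModifyCopy(a);
    ModifyInPlace(b);
    return SizeSeenInside(arr) == sizeof(int*) && a[0] == 1 && b[0] == 99;
}
```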

Pass parameters by reference

It lets you pass a huge object into a function with a single PUSH (under the hood, a reference is passed as an address) and access it as if it were not a pointer. But this comes at another price. First, you may accidentally pass in a dereferenced null pointer, and your function will crash. Second, because references are stateless by nature, there is absolutely no way to check whether a reference is good. This leads to an interesting point: if the object lives in dynamically allocated memory and you hold it by pointer, don't dereference it to pass it by reference; pass the pointer instead.
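A sketch of that trade-off: the reference version cannot test for a missing object, while the pointer version can check for null before using it. BigObject and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct BigObject { std::string name; };

// By reference: convenient syntax, but there is no way to test for "no object".
std::size_t NameLengthRef(const BigObject& obj)
{
    return obj.name.size();
}

// By pointer: the function can check the pointer before using it.
std::size_t NameLengthPtr(const BigObject* obj)
{
    if (obj == nullptr)
        return 0;                  // graceful handling of a missing object
    return obj->name.size();
}

// Heap object held by pointer: pass the pointer, not *p.
std::size_t DemoHeapObject()
{
    BigObject* p = new BigObject{"terrain"};
    std::size_t n = NameLengthPtr(p);
    delete p;
    return n;
}
```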

Points of Interest

It is worth mentioning that even if your computer has only 1 GB of physical memory installed, a 32-bit operating system gives each process a full 4 GB address space. This magic is performed through virtual memory: content that does not fit in physical RAM is written to disk, or "paged out" (which makes performance somewhat dependent on the disk's read/write speed). In fact, each process has its own separate 4 GB (virtual) address space, regardless of how much RAM your hardware has (within limits). Not only that, each process effectively gets its own virtual processor: the operating system swaps the processor register values in and out whenever it switches process/thread contexts. Because the pointers held in a process refer only to that process's own memory, not to the memory of neighboring processes, a crash in one process does not harm any other process.

In old 16-bit systems (such as MS-DOS or Windows 1.0, 2.0, 3.0, and 3.1), this was not the case. These operating systems ran directly on the physical hardware, and a crashing program could take down any other program and the OS itself.


Origin blog.csdn.net/linuxguitu/article/details/112472257