C/C++ Function Calling Conventions and Function Name Decoration Rules

Foreword

Programmers who develop software in C/C++ often run into two kinds of problems. Sometimes a program compiles cleanly, but the linker reports that a function does not exist (the classic LNK2001 error). Other times the program compiles and links without errors, but a stack exception occurs as soon as a function in a library is called. These symptoms usually appear when C and C++ code is mixed, or when a C++ program uses a third-party library that was not written in C++. The culprits are calling conventions and name decoration rules. The calling convention determines the order in which arguments are pushed onto the stack and whether the caller or the callee removes them afterwards. The name decoration rules determine how the compiler encodes a function's name so that different functions can be told apart. If two modules disagree on either the calling convention or the name decoration, the problems above follow. This article explains the calling conventions and name decoration rules of C and C++ in detail, compares them, and uses examples to show how these problems arise.

Function calling conventions

A calling convention determines not only the order in which arguments are pushed onto the stack when a function is called, but also whether the caller or the callee removes those arguments and restores the stack. There are many calling conventions: besides the common __cdecl, __fastcall, and __stdcall, C++ compilers also use thiscall, and many C/C++ compilers support naked calls. With so many conventions it is easy to lose track of what each one does and when it is used, so they are introduced one by one below.

1.__cdecl

The compiler command-line option is /Gd. __cdecl is the default calling convention of C/C++ compilers: every function that is not a C++ member function and is not declared __stdcall or __fastcall uses __cdecl. It is the C calling convention: arguments are pushed onto the stack from right to left, and the caller removes them from the stack. Because the compiler must generate stack-cleanup code at every call site, a program compiled with __cdecl is larger than the same program compiled with __stdcall. On the other hand, because the caller cleans up, __cdecl supports variable argument lists; printf and the Windows API wsprintf both use __cdecl. For C functions, __cdecl decorates the name by prefixing an underscore; C++ functions use a different decoration scheme unless they are declared extern "C".
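The caller-cleanup rule is what makes variadic functions possible. The following is a minimal sketch, not code from the original article: sum_ints and the CALL_CDECL macro are hypothetical names, and the macro spells out __cdecl only for MSVC targeting 32-bit x86, where the keyword matters; elsewhere it expands to nothing and the platform default applies.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical helper macro: name __cdecl explicitly only on 32-bit MSVC,
// and fall back to the platform's default convention everywhere else.
#if defined(_MSC_VER) && defined(_M_IX86)
#define CALL_CDECL __cdecl
#else
#define CALL_CDECL
#endif

// A variadic function. The callee cannot know how many bytes of arguments
// the caller pushed, so only the caller can remove them: this is why
// variadic functions such as printf rely on caller cleanup (__cdecl).
int CALL_CDECL sum_ints(int count, ...)
{
    assert(count >= 0);
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(args, int);  // each extra argument is read as int
    va_end(args);
    return total;
}
```

For example, sum_ints(3, 1, 2, 3) returns 6. The call site pushed four int arguments, and under __cdecl that same call site is the one that removes them.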

2.__fastcall

The compiler command-line option is /Gr. __fastcall passes arguments in registers whenever possible. The first two arguments of DWORD size or smaller, found scanning left to right, are passed in ECX and EDX, and the remaining arguments are pushed onto the stack from right to left. The callee removes the stack arguments before returning. For C functions, the compiler decorates the name with two @ characters, followed by the size of the parameter list in bytes as a decimal number, for example @function_name@number. Note that __fastcall may be implemented differently by different compilers (for example, 16-bit versus 32-bit compilers). Also, inline assembly code must take care not to clobber the registers the compiler uses to pass arguments.
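The register-assignment rule can be modelled in a few lines. This is a simplified sketch under stated assumptions: it takes only argument sizes in bytes, assumes integer or pointer arguments (floating-point arguments are not modelled), and the names ArgSlot and fastcall_layout are hypothetical. It reports which arguments land in ECX and EDX and the order in which the rest are pushed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Where one argument of a 32-bit __fastcall call ends up (simplified model).
struct ArgSlot {
    std::size_t index;     // position in the parameter list, 0-based
    std::string location;  // "ECX", "EDX" or "stack"
};

// Scanning left to right, the first two arguments of 4 bytes (DWORD) or less
// go to ECX and EDX; every other argument goes on the stack, pushed right to
// left. Returns register assignments first, then stack pushes in push order.
std::vector<ArgSlot> fastcall_layout(const std::vector<std::size_t>& arg_sizes)
{
    const char* const reg_names[] = {"ECX", "EDX"};
    std::vector<ArgSlot> layout;        // register assignments come first
    std::vector<std::size_t> on_stack;  // indices of arguments left for the stack
    for (std::size_t i = 0; i < arg_sizes.size(); ++i) {
        assert(arg_sizes[i] > 0);
        if (arg_sizes[i] <= 4 && layout.size() < 2)
            layout.push_back({i, reg_names[layout.size()]});
        else
            on_stack.push_back(i);
    }
    // Right-to-left pushing: the last stack argument is pushed first.
    for (auto it = on_stack.rbegin(); it != on_stack.rend(); ++it)
        layout.push_back({*it, "stack"});
    return layout;
}
```

With argument sizes {4, 8, 4, 4}, the 8-byte second argument is skipped for registers, so argument 0 goes to ECX, argument 2 to EDX, and arguments 3 and 1 are pushed in that order. The callee then removes those 12 stack bytes itself.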

3.__stdcall

The compiler command-line option is /Gz. __stdcall is derived from the Pascal calling convention (in both, the callee cleans the stack, although Pascal pushes arguments left to right), and most Windows APIs use it. __stdcall pushes arguments onto the stack from right to left; arguments are passed by value unless a pointer or reference type is used, and the callee removes the arguments from the stack. For C functions, __stdcall decorates the name by prefixing an underscore and appending @ followed by the size of the parameters in bytes, for example _functionname@number.
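Because the __stdcall callee removes its own arguments, it has to know their total size at compile time. Below is a small sketch of that arithmetic, assuming 32-bit x86 and plain scalar parameters (no structs returned through a hidden pointer). The function names are hypothetical. The result is the N in the callee's "ret N" instruction, and for such parameters it is also the number in the decorated name _name@N.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// On 32-bit x86 every stack argument occupies whole 4-byte slots,
// so a char still costs 4 bytes and a double costs 8.
std::size_t stack_slot_bytes(std::size_t arg_size)
{
    assert(arg_size > 0);
    return (arg_size + 3) / 4 * 4;  // round up to a multiple of 4
}

// Bytes a __stdcall callee pops on return (the N in "ret N").
std::size_t stdcall_param_bytes(const std::vector<std::size_t>& arg_sizes)
{
    std::size_t total = 0;
    for (std::size_t size : arg_sizes)
        total += stack_slot_bytes(size);
    return total;
}
```

For long MakeFun(long lFun), a single 4-byte parameter gives 4, which is where the "@4" in "_MakeFun@4" comes from. A (char, int, double) parameter list gives 4 + 4 + 8 = 16.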

4.thiscall

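thiscall is used only for calls to C++ member functions. Arguments are pushed onto the stack from right to left, the this pointer of the object is passed in the ECX register, and the callee cleans the stack. Member functions with variable argument lists use __cdecl instead, with this pushed onto the stack last. Note that thiscall is not a standard C++ keyword. In Visual C++ versions before 2005 it could not be written in a declaration at all and was applied only by the compiler; later versions accept the Microsoft-specific __thiscall keyword.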
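The hidden this argument can be made visible by writing a free function that receives the object explicitly. The sketch below is only a model of what a member function receives, not of the machine code a compiler emits. Counter and Counter_add are hypothetical names. Under 32-bit MSVC thiscall, the pointer that Counter_add takes as its first parameter is the one that travels in ECX.

```cpp
#include <cassert>

// A member function receives the object's address as a hidden argument.
struct Counter {
    int add(int delta) { value_ += delta; return value_; }
    int value_ = 0;
};

// Hypothetical free-function equivalent with the hidden argument spelled out.
int Counter_add(Counter* self, int delta)
{
    assert(self != nullptr);
    self->value_ += delta;
    return self->value_;
}
```

Calling c.add(2) and then Counter_add(&c, 3) on the same object leaves c.value_ at 5. Both calls operate on the same object because both receive its address.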

5.naked call

With the calling conventions above, the compiler automatically adds code at the beginning of a function to save the ESI, EDI, EBX, and EBP registers when necessary, and restores them when the function exits. A function declared naked gets no such code, which is why it is called naked; its author must write the prologue, epilogue, and return instruction. naked is not a type modifier, so it must be used together with __declspec, as in __declspec(naked).
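Below is a minimal sketch of a naked function, assuming 32-bit MSVC with inline assembly (__asm is not available for x64 MSVC targets). The name add_naked is hypothetical. Because no prologue is generated, the body reads its arguments directly relative to ESP and must issue its own ret. On any other compiler or target, the sketch falls back to an ordinary function with the same behaviour.

```cpp
#include <cassert>

#if defined(_MSC_VER) && defined(_M_IX86)
// No prologue/epilogue is generated. Under __cdecl the return address is at
// [esp], a at [esp+4] and b at [esp+8]; the caller removes the arguments.
__declspec(naked) int __cdecl add_naked(int a, int b)
{
    __asm {
        mov eax, [esp + 4]   // eax = a
        add eax, [esp + 8]   // eax += b; the return value lives in EAX
        ret                  // plain ret: caller cleans the stack (__cdecl)
    }
}
#else
// Other compilers/targets: an ordinary function with the same behaviour.
int add_naked(int a, int b) { return a + b; }
#endif
```

Because the function never moves EBP or pushes any register, the argument offsets are fixed. Adding a push inside the __asm block would shift every offset by 4.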

The VC development environment uses __cdecl by default. You can change the default in the compiler settings (Project Settings... => C/C++ => Code Generation), or put a keyword such as __stdcall, __cdecl, or __fastcall directly in front of a function declaration to set the convention for that function alone. Windows code often uses the WINAPI macro, which expands to the appropriate calling convention for the build settings; on Win32 it is defined as __stdcall.
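A project can define its own WINAPI-style macro in the same spirit. The sketch below assumes a hypothetical macro MYAPI that expands to __stdcall only for 32-bit MSVC. It also gives MakeFun a hypothetical body for illustration. The point it shows is that the convention is part of the function's type, so any function pointer used to call the function must carry the same macro.

```cpp
#include <cassert>

// Hypothetical project macro: __stdcall on 32-bit MSVC, default elsewhere.
#if defined(_MSC_VER) && defined(_M_IX86)
#define MYAPI __stdcall
#else
#define MYAPI
#endif

// The pointer type names the convention too; if it said __cdecl instead,
// caller and callee would disagree about who cleans the stack.
typedef long (MYAPI *MakeFunPtr)(long lFun);

// Hypothetical implementation used only for illustration.
long MYAPI MakeFun(long lFun) { return lFun * 2; }

long call_through_pointer(MakeFunPtr fn, long value)
{
    assert(fn != nullptr);
    return fn(value);
}
```

Here call_through_pointer(&MakeFun, 21) returns 42. On 32-bit MSVC, assigning MakeFun to a pointer type declared without MYAPI would be rejected by the compiler unless it is forced with a cast, which is exactly the cast that hides the mismatch.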

Function name mangling rules

A function's decorated name is a string that the compiler creates during compilation to identify the function's definition or prototype. The linker and other tools sometimes need the decorated name to locate the right function. In most cases the programmer does not need to know it, because the linker and other tools match names automatically. In some cases, however, the decorated name must be specified explicitly. For example, in a C++ program, overloaded functions and special member functions such as constructors and destructors must be referred to by their decorated names so that the linker or other tools can match them. Another case is calling a C or C++ function from assembly code. If the function's name, calling convention, return type, or parameters change, the old decorated name is no longer valid and the new one must be used. C and C++ use different decoration schemes internally; each is described below.

1. C compiler function name decoration rules

For the __stdcall calling convention, the compiler prefixes the output function name with an underscore and appends an "@" followed by the number of bytes in its parameter list, as in _functionname@number. The __cdecl convention only prefixes an underscore, as in _functionname. The __fastcall convention prefixes an "@" and appends an "@" followed by the number of bytes in its parameter list, as in @functionname@number.
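The three C rules fit in one small function. This is a sketch of the rules as stated above, assuming a 32-bit MSVC C compiler. The names CallConv and c_decorated_name are hypothetical, and param_bytes is the parameter-list size already rounded to 4-byte slots.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class CallConv { Cdecl, Stdcall, Fastcall };

// Decorated name a 32-bit MSVC C compiler gives a function, per the rules above.
std::string c_decorated_name(const std::string& name, CallConv conv,
                             std::size_t param_bytes)
{
    assert(!name.empty());
    switch (conv) {
    case CallConv::Cdecl:     // _name
        return "_" + name;
    case CallConv::Stdcall:   // _name@bytes
        return "_" + name + "@" + std::to_string(param_bytes);
    case CallConv::Fastcall:  // @name@bytes
        return "@" + name + "@" + std::to_string(param_bytes);
    }
    return name;  // not reached for valid enum values
}
```

For long MakeFun(long), which has 4 bytes of parameters, this gives "_MakeFun" for __cdecl, "_MakeFun@4" for __stdcall, and "@MakeFun@4" for __fastcall.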

2. C++ compiler function name decoration rules

The C++ decoration rules are more complicated, but the decorated name carries more information: from it you can read the calling convention, the return type, the number of parameters, and even their types. Whether the convention is __cdecl, __fastcall, or __stdcall, the decorated name starts with "?", followed by the function name, then a marker for the start of the parameter list, then the parameter list spelled out as type codes. The start marker is "@@YG" for __stdcall, "@@YA" for __cdecl, and "@@YI" for __fastcall. The type codes are as follows:
X–void
D–char
E–unsigned char
F–short
H–int
I–unsigned int
J–long
K–unsigned long (DWORD)
M–float
N–double
_N–bool
U–struct
...
Pointers are special: PA denotes a pointer and PB a pointer to const, followed by the code of the type pointed to. If pointers of the same type appear consecutively, the repeats are replaced by "0", one "0" per repetition. U denotes a struct type and is followed by the struct's name, with "@@" marking the end of the name. The return type is not treated specially. It is encoded the same way as a parameter and placed right after the parameter-list start marker, so the first item in the parameter list is actually the return type. After the parameter list, "@Z" marks the end of the whole name; if the function has no parameters, the parameter list is written as "X" (void) and the name ends with just "Z". Here are two examples. Suppose you have the following function declaration:

int Function1(char *var1,unsigned long);

Under __stdcall, its decorated name is "?Function1@@YGHPADK@Z".

And for the declaration:

void Function2();

its decorated name is "?Function2@@YGXXZ".
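The assembly order just described (?, name, start marker, return code, parameter codes, terminator) can be written as a small builder. This is a sketch, not a full MSVC mangler. The caller passes type codes already spelled out ("H", "PAD", "K", ...), the "0" back-references for repeated types are not modelled, and CallConv, start_marker, and cpp_decorated_name are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class CallConv { Cdecl, Stdcall, Fastcall };

// Parameter-list start markers for non-member functions, as listed above.
std::string start_marker(CallConv conv)
{
    switch (conv) {
    case CallConv::Cdecl:    return "@@YA";
    case CallConv::Stdcall:  return "@@YG";
    case CallConv::Fastcall: return "@@YI";
    }
    return "";  // not reached for valid enum values
}

// "?" + name + start marker + return code + parameter codes + terminator.
std::string cpp_decorated_name(const std::string& name, CallConv conv,
                               const std::string& return_code,
                               const std::vector<std::string>& param_codes)
{
    assert(!name.empty() && !return_code.empty());
    std::string out = "?" + name + start_marker(conv) + return_code;
    if (param_codes.empty())
        return out + "XZ";       // no parameters: "X" (void), then "Z"
    for (const std::string& code : param_codes)
        out += code;
    return out + "@Z";           // "@" ends the parameter list, "Z" the name
}
```

Feeding in the two declarations above under __stdcall reproduces "?Function1@@YGHPADK@Z" (return H, parameters PAD and K) and "?Function2@@YGXXZ". Likewise, long MakeFun(long) becomes "?MakeFun@@YGJJ@Z".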

For C++ class member functions (which use thiscall), the decoration differs slightly from that of non-member functions. First, the class name, introduced by "@", is inserted between the function name and the parameter list. Second, the parameter-list start marker is different: "@@QAE" for public member functions, "@@IAE" for protected ones, and "@@AAE" for private ones. If the member function is declared const, the markers become "@@QBE", "@@IBE", and "@@ABE" respectively. A parameter that is a reference to an instance of the class is encoded as "AAV1", and a const reference as "ABV1". Take the class CTest as an example of how C++ member functions are decorated:

class CTest
{
    // ......
private:
    void Function(int);
protected:
    void CopyInfo(const CTest &src);
public:
    long DrawText(HDC hdc, long pos, const TCHAR* text, RGBQUAD color, BYTE bUnder, bool bSet);
    long InsightClass(DWORD dwClass) const;
    // ......
};

For the member function Function, the decorated name is "?Function@CTest@@AAEXH@Z", where "@@AAE" indicates a private function. The member function CopyInfo has a single parameter, a const reference to CTest, and its decorated name is "?CopyInfo@CTest@@IAEXABV1@Z". DrawText is a more complex declaration with a string parameter, a struct parameter, and an HDC handle parameter. HDC is actually a pointer to the struct type HDC__, so that parameter is encoded as "PAUHDC__@@". RGBQUAD is a typedef for struct tagRGBQUAD, hence "UtagRGBQUAD@@", and the string parameter appears as "PBD" because TCHAR is char in a non-Unicode build. The complete decorated name is "?DrawText@CTest@@QAEJPAUHDC__@@JPBDUtagRGBQUAD@@E_N@Z". InsightClass is a public const member function, so its marker is "@@QBE" and its complete decorated name is "?InsightClass@CTest@@QBEJK@Z".
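The member-function variant only changes two things: the "@class" part and the access/const/thiscall marker. The sketch below follows the rules above under the same limits as a plain builder (type codes passed in pre-spelled, no back-reference handling). Access, member_marker, and member_decorated_name are hypothetical names. In the marker, the first letter encodes access (Q, I, A), the second encodes whether the object is const (A or B), and E stands for thiscall.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Access { Public, Protected, Private };

// "@@" + access letter + const-ness letter + "E", e.g. @@QAE, @@IBE, @@AAE.
std::string member_marker(Access access, bool is_const)
{
    const char* level = access == Access::Public    ? "Q"
                      : access == Access::Protected ? "I"
                                                    : "A";
    return std::string("@@") + level + (is_const ? "B" : "A") + "E";
}

std::string member_decorated_name(const std::string& func, const std::string& cls,
                                  Access access, bool is_const,
                                  const std::string& return_code,
                                  const std::vector<std::string>& param_codes)
{
    assert(!func.empty() && !cls.empty());
    std::string out = "?" + func + "@" + cls + member_marker(access, is_const)
                    + return_code;
    if (param_codes.empty())
        return out + "XZ";       // no parameters: "X" (void), then "Z"
    for (const std::string& code : param_codes)
        out += code;
    return out + "@Z";
}
```

Applied to CTest, the builder reproduces all four names given above. One example is DrawText with return code J and parameter codes PAUHDC__@@, J, PBD, UtagRGBQUAD@@, E, _N.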

Neither the C nor the C++ decoration scheme changes the case of the characters in the function name. This differs from the PASCAL calling convention, whose output names are not decorated but are converted to all uppercase.

Viewing decorated function names

There are two ways to see the decorated names of the functions in a program: the compiler's listing output or the dumpbin tool. The /FAc, /FAs, or /FAcs command-line options make the compiler write an assembly listing in which the decorated names of functions and variables appear. The dumpbin.exe /SYMBOLS command lists the decorated function and variable names in an .obj or .lib file. In addition, undname.exe converts a decorated name back to its undecorated form.

Common problems caused by mismatched calling conventions and name decoration rules

Stack exceptions caused by different function calling conventions:

If a stack exception occurs when a function is called, it is most likely caused by mismatched calling conventions. For example, suppose the dynamic link library a exports the following function:

long MakeFun(long lFun);

The dynamic library was built with __stdcall, so MakeFun in the compiled a.dll uses __stdcall: its arguments are pushed onto the stack from right to left, and the function itself restores the stack when it returns. Now suppose module b needs to call MakeFun. b is compiled as C++ just like a, but its default calling convention is __cdecl. Because b includes the MakeFun declaration from a's header, every function in b that calls MakeFun treats it as __cdecl and restores the stack again after the call. MakeFun has already restored the stack before returning, so this extra cleanup leaves the stack pointer wrong and causes a stack exception. The visible symptom is that the call itself seems fine (the arguments are passed in the same order) and MakeFun does its work, but an error occurs after it returns. The fix is simple: make sure both modules use the same calling convention when they are compiled. Note that when b links against a's import library, the different C++ decorated names (@@YG versus @@YA) usually turn this mismatch into a link error instead. The run-time stack corruption typically appears when the decorated name does not reveal the mismatch, for example when the function is obtained with GetProcAddress and called through a function pointer declared with the wrong convention.
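The double cleanup can be traced with a toy model of the stack pointer. This sketch does not come from a debugger session. The starting ESP value is arbitrary, StackModel and esp_after_call are hypothetical names, and only the 4 bytes of MakeFun's single long argument are tracked (the call/ret pair pushes and pops the return address, which nets to zero).

```cpp
#include <cassert>
#include <cstdint>

// Toy model of ESP around one call to MakeFun(long): 4 bytes of arguments.
struct StackModel {
    std::uint32_t esp = 0x0012FF00;   // arbitrary starting stack pointer
    void push(std::uint32_t bytes) { esp -= bytes; }  // x86 stack grows down
    void pop(std::uint32_t bytes)  { esp += bytes; }
};

// callee_cleans: a.dll was built with __stdcall and ends with "ret 4".
// caller_cleans: b's declaration says __cdecl, so b emits "add esp, 4".
std::uint32_t esp_after_call(bool callee_cleans, bool caller_cleans)
{
    const std::uint32_t arg_bytes = 4;   // one long parameter
    StackModel s;
    s.push(arg_bytes);                   // caller pushes lFun
    if (callee_cleans) s.pop(arg_bytes); // __stdcall epilogue: ret 4
    if (caller_cleans) s.pop(arg_bytes); // __cdecl call site: add esp, 4
    return s.esp;
}
```

When exactly one side cleans up, ESP returns to 0x0012FF00. In the scenario above both sides clean up, so ESP ends 4 bytes too high at 0x0012FF04, and every later access relative to ESP in b's function is off by one slot. That is why the failure appears only after MakeFun returns.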

Different name decoration rules lead to link errors:

Once you understand calling conventions and name decoration, the LNK2001 error that often appears when a C++ program uses a library compiled as C is easy to explain. Take the two modules from the example above. This time both modules are compiled with __stdcall, but a is compiled as C, so the decorated name of MakeFun in the library a.lib is "_MakeFun@4". b includes the MakeFun declaration from a's header, but because b is compiled as C++, MakeFun is decorated in b as "?MakeFun@@YGJJ@Z" according to the C++ rules. Compilation succeeds, but at link time the linker looks for "?MakeFun@@YGJJ@Z" in a.lib. It finds only "_MakeFun@4", so it reports:

error LNK2001: unresolved external symbol ?MakeFun@@YGJJ@Z

The solution is equally simple: tell module b that this function was compiled as C, which is what extern "C" does. A library written in C should anticipate being used from C++ programs (compiled with a C++ compiler), so its header should normally be written like this:

#ifdef __cplusplus
extern "C" {
#endif

long MakeFun(long lFun);

#ifdef __cplusplus
}
#endif

This way the C++ compiler knows that the decorated name of MakeFun is "_MakeFun@4", and the link error goes away. (For more on extern "C", see https://blog.csdn.net/weixin_44049823/article/details/127279125.)
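The same header can also state the calling convention, so that a C++ caller does not depend on its own default (/Gd) happening to match the library's. The following is a sketch under assumptions: the guard name A_H_INCLUDED, the macro A_API, and the body of MakeFun are hypothetical, and A_API expands to __stdcall only for MSVC on 32-bit x86. The macro tested for C++ is __cplusplus, with two leading underscores.

```cpp
#include <cassert>

/* ---- what a C library header (hypothetical a.h) might contain ---- */
#ifndef A_H_INCLUDED
#define A_H_INCLUDED

#if defined(_MSC_VER) && defined(_M_IX86)
#define A_API __stdcall   /* the convention a.dll was built with */
#else
#define A_API
#endif

#ifdef __cplusplus
extern "C" {              /* C++ compilers: C linkage and C decoration */
#endif

long A_API MakeFun(long lFun);

#ifdef __cplusplus
}
#endif

#endif /* A_H_INCLUDED */

/* ---- hypothetical implementation, normally compiled as C in a.c ---- */
/* It inherits C linkage from the declaration above. */
long A_API MakeFun(long lFun) { return lFun + 1; }
```

With both the linkage and the convention written into the header, a C++ caller asks the linker for "_MakeFun@4" on 32-bit MSVC regardless of its own /Gd, /Gz or /Gr setting. That addresses both of the problems described in this section.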

Many people wonder why they still get LNK2001 errors even though every module is built with the VC compiler. The reason is that the VC compiler chooses the language from the source file's extension: a file ending in ".c" is compiled as C, and a file ending in ".cpp" is compiled as C++. Since both kinds of files can end up in one project, the most reliable approach is to use extern "C".
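The extern "C" guard works because C++ compilers predefine __cplusplus and C compilers do not. A minimal check of which language a translation unit was compiled as is shown below. compiled_as is a hypothetical name, and the block deliberately uses only headers that are valid in both languages, so the same text can be saved as a .c file and as a .cpp file. MSVC's /TC and /TP options override the extension and treat all sources as C or as C++ respectively.

```cpp
#include <assert.h>
#include <string.h>

/* __cplusplus is predefined only when compiling as C++. */
const char* compiled_as(void)
{
#ifdef __cplusplus
    return "C++";
#else
    return "C";
#endif
}
```

Compiled as check.cpp, compiled_as() returns "C++". Compiled as check.c, it returns "C", and the same function would also receive a different decorated name.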

Reprinted from: https://www.cnblogs.com/cnlmjer/archive/2008/12/05/4099891.html


Origin blog.csdn.net/weixin_44049823/article/details/131120276