Calling Conventions and Name Decoration in C++ DLLs

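A calling convention is a protocol that a programming language establishes in order to implement function calls. It specifies how a function's parameters are passed, whether the parameter list may vary in length, and who is responsible for cleaning up the stack. Different languages define different calling conventions.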

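To support operator overloading and function overloading, a C++ compiler usually rewrites the symbol name of every entry point according to some rule, so that the same name (with different parameter types or in a different scope) can be used more than once without breaking existing C-based linkers. This technique is usually called name mangling or name decoration. Many C++ compiler vendors have chosen their own name decoration scheme.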

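Therefore, for modules written in other languages (such as Visual Basic, Pascal or Fortran applications) to call functions in a DLL written in C/C++, the functions must be exported with the correct calling convention, and the compiler must not apply any name decoration to the exported functions.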
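The advice above can be sketched as follows, assuming an MSVC-style toolchain. The macros DEMO_EXPORT and DEMO_CALL are hypothetical names introduced only for this sketch; on other compilers they expand to nothing so that the snippet still builds.

```cpp
#include <cassert>

// Hypothetical helper macros (not part of any SDK).
#if defined(_MSC_VER)
#  define DEMO_EXPORT __declspec(dllexport)
#  define DEMO_CALL   __stdcall
#else
#  define DEMO_EXPORT
#  define DEMO_CALL
#endif

// extern "C" switches off C++ name decoration for this function;
// the calling convention is stated explicitly.
extern "C" DEMO_EXPORT int DEMO_CALL Add(int a, int b)
{
    return a + b;
}

// Small self-check that doubles as a usage example.
void CheckAdd()
{
    assert(Add(1, 2) == 3);
}
```

Note that extern "C" removes only the C++ decoration. With __stdcall on 32-bit x86 the C-style decoration described in section 2.1 (_Add@8) still applies, and a module definition (.def) file is the usual way to export the plain name Add.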

1. Calling Convention

  • A calling convention settles questions such as the order in which function parameters are pushed onto the stack, whether the caller or the callee pops them off again, and the name decoration the compiler uses to identify the function. Microsoft VC++ 6.0 defines the following calling conventions, which are analyzed below one by one with the help of assembly listings:

__cdecl

  • __cdecl is the default calling convention for C/C++ and MFC programs. It can also be specified explicitly by adding the __cdecl keyword to the function declaration. Under __cdecl, parameters are pushed onto the stack from right to left, and the caller pops them off the stack after the call returns. Because only the caller knows how many arguments it actually pushed, functions with variable argument lists can only use this convention. Since every call to a __cdecl function must be followed by its own stack cleanup code, the generated executable is somewhat larger. __cdecl can also be written as _cdecl.
  • The following example illustrates the __cdecl convention. Create a new Win32 Console project in VC++ and name it cdecl. The code is as follows:
int __cdecl Add(int a, int b);        // function declaration

int main()
{
    Add(1, 2);                        // function call
    return 0;
}

int __cdecl Add(int a, int b)         // function definition
{
    return (a + b);
}
  • The disassembly code at the function call is as follows:
;Add(1,2);
push       2          ; parameters are pushed right to left: 2 first
push       1          ; then 1
call       @ILT+0(Add) (00401005)     ; call the function
add        esp,8          ; the caller cleans up the stack
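The link between caller cleanup and variable argument lists can be illustrated with a small variadic function. This is a minimal sketch; Sum is a hypothetical name, and the function relies on the platform's default C calling convention rather than spelling out __cdecl, so that it builds on any compiler.

```cpp
#include <cassert>
#include <cstdarg>

// Sums `count` int arguments. The callee cannot know at compile time how
// many arguments were pushed; only the caller does, which is why the
// caller has to clean up the stack for a function like this.
int Sum(int count, ...)
{
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}

// Usage example: two calls with different argument counts.
void CheckSum()
{
    assert(Sum(2, 1, 2) == 3);
    assert(Sum(3, 1, 2, 3) == 6);
}
```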

__stdcall

  • The __stdcall calling convention is used to call Win32 API functions. Under __stdcall, parameters are pushed onto the stack from right to left, and the called function removes them from the stack before it returns. The number of parameters is fixed, so the function body knows exactly how many bytes were passed and can clean up with a single ret n instruction. __stdcall can also be written as _stdcall.

  • The same example again, with __cdecl replaced by __stdcall:

int __stdcall Add(int a, int b)
{
    return (a + b);
}
  • Disassembly code at the function call:
; Add(1,2);
push      2          ; parameters are pushed right to left: 2 first
push      1          ; then 1
call      @ILT+10(Add) (0040100f)          ; call the function
  • Disassembly code of function implementation part:
;int __stdcall Add(int a, int b)

push        ebp
mov         ebp,esp
sub         esp,40h
push        ebx
push        esi
push        edi
lea         edi,[ebp-40h]
mov         ecx,10h
mov         eax,0CCCCCCCCh
rep stos       dword ptr[edi]
;return (a + b);
mov         eax,dword ptr [ebp+8]
add         eax,dword ptr [ebp+0Ch]
pop         edi
pop         esi
pop         ebx
mov         esp,ebp
pop         ebp
ret         8       ; the callee cleans up the stack
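The operand of ret n is simply the number of bytes the caller pushed. The following sketch computes it from the parameter sizes, assuming 32-bit x86 where every stack argument occupies a multiple of 4 bytes. StdcallRetBytes is a hypothetical helper written for this illustration, not a compiler facility.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Bytes that a __stdcall callee removes with `ret n`, given the sizes of
// its parameters in bytes. Each argument is rounded up to a 4-byte slot.
std::size_t StdcallRetBytes(std::initializer_list<std::size_t> paramSizes)
{
    std::size_t total = 0;
    for (std::size_t size : paramSizes) {
        total += (size + 3) / 4 * 4;
    }
    return total;
}

// Usage example: Add(int, int) from the listing above yields `ret 8`.
void CheckRetBytes()
{
    assert(StdcallRetBytes({4, 4}) == 8);
}
```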

__fastcall

  • The __fastcall convention is intended for situations with very high performance requirements. Under __fastcall, the first two parameters, counted from the left, that are no larger than 4 bytes (DWORD) are placed in the ECX and EDX registers respectively. The remaining parameters are still pushed onto the stack from right to left, and the called function removes them from the stack before it returns. __fastcall can also be written as _fastcall.

  • The example is similar, but the function now uses __fastcall and takes two more parameters:

int __fastcall Add(int a, double b, int c, int d)
{
    return (a + b + c + d);
}
  • Assembly code for the function call part:
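;Add(1, 2, 3, 4);
push        4                   ; stack parameters are pushed right to left: 4 first
mov         edx,3               ; the int 3 goes into EDX
push        40000000h           ; push the double 2.0 (high dword)
push        0                   ; push the double 2.0 (low dword)
mov         ecx,1               ; the int 1 goes into ECX
call        @ILT+0(Add) (00401005)               ; call the function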

  • Disassembly code of function implementation part:

; int __fastcall Add(int a, double b, int c, int d)
push       ebp
mov        ebp,esp
sub        esp,48h
push       ebx
push       esi
push       edi
push       ecx
lea        edi,[ebp-48h]
mov        ecx,12h
mov        eax,0CCCCCCCCh
rep stos   dword ptr[edi]
pop        ecx
mov        dword ptr [ebp-8],edx
mov        dword ptr [ebp-4],ecx
;return (a + b + c + d);
fild       dword ptr [ebp-4]
fadd       qword ptr [ebp+8]
fiadd      dword ptr [ebp-8]
fiadd      dword ptr [ebp+10h]
call       __ftol (004011b8)
pop         edi
pop         esi
pop         ebx
mov         esp,ebp
pop         ebp
ret         0Ch                             ; the callee cleans up the stack
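The register assignment in the listing above can be reproduced with a small model of the rule as stated in the text. This is only a sketch: ClassifyFastcall and FastcallLayout are hypothetical names, and the model looks at parameter size alone, ignoring any type-specific exceptions a real compiler applies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct FastcallLayout {
    std::vector<std::string> location;  // "ECX", "EDX" or "stack", per parameter
    std::size_t stackBytes = 0;         // operand of the callee's `ret n`
};

// paramSizes lists the parameter sizes in bytes, from left to right.
FastcallLayout ClassifyFastcall(const std::vector<std::size_t>& paramSizes)
{
    FastcallLayout layout;
    int registersUsed = 0;
    for (std::size_t size : paramSizes) {
        if (size <= 4 && registersUsed < 2) {
            layout.location.push_back(registersUsed == 0 ? "ECX" : "EDX");
            ++registersUsed;
        } else {
            layout.location.push_back("stack");
            layout.stackBytes += (size + 3) / 4 * 4;  // 4-byte stack slots
        }
    }
    return layout;
}

// Usage example: Add(int a, double b, int c, int d) from the listing.
void CheckFastcall()
{
    FastcallLayout layout = ClassifyFastcall({4, 8, 4, 4});
    assert(layout.location[0] == "ECX");   // a
    assert(layout.location[2] == "EDX");   // c
    assert(layout.stackBytes == 12);       // matches `ret 0Ch`
}
```

For the example function this gives a in ECX, c in EDX, and b and d on the stack, 12 bytes in total, which agrees with the ret 0Ch at the end of the listing.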
  • The keywords __cdecl, __stdcall and __fastcall can be written directly before the function to be exported, or the convention can be selected under Setting...->C/C++->Code Generation in the development environment. The corresponding command line options are /Gd, /Gz and /Gr respectively. The default is /Gd, that is, __cdecl. When the keyword written before a function differs from the setting in the development environment, the keyword on the function takes precedence.

thiscall

  • thiscall is the default calling convention for non-static C++ member functions. In VC++ 6.0 it is applied only by the compiler and has no corresponding keyword, so the programmer cannot specify it (later versions of the Microsoft compiler added a __thiscall keyword). Under thiscall, parameters are pushed onto the stack from right to left and the called function removes them from the stack before returning. In addition, one extra parameter, the this pointer, is passed in the ECX register.

  • In this example we define a class with one member function. The code is as follows:

class CSum
{
public:
    int Add(int a, int b)
    {
        return (a + b);
    }
};

int main()
{
    CSum sum;
    sum.Add(1, 2);
    return 0;
}
  • Function calling part assembly code:
;CSum sum;
;sum.Add(1, 2);
push        2                               ; parameters are pushed right to left: 2 first
push        1                               ; then 1
lea         ecx,[ebp-4]                     ; ECX holds the this pointer
call        @ILT+5(CSum::Add) (0040100a)    ; call the member function
  • Function implementation part assembly code:
;int Add(int a, int b)
push                    ebp
mov         ebp,esp
sub         esp,44h       ; 4 extra bytes of local space hold the this pointer
push              ebx
push              esi
push              edi
push              ecx
lea          edi,[ebp-44h]
mov        ecx,11h
mov         eax,0CCCCCCCCh
rep stos       dword ptr[edi]
pop         ecx
mov         dword ptr [ebp-4],ecx

;return (a + b);
mov         eax,dword ptr [ebp+8]
add         eax,dword ptr [ebp+0Ch]
pop          edi
pop         esi
pop          ebx
mov         esp,ebp
pop          ebp
ret         8             ; the callee cleans up the stack
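One way to picture the hidden this parameter is to write the member function as a free function that takes the object pointer explicitly. The sketch below does that. The member m_bias and the free function CSum_Add are hypothetical additions, introduced so that the this pointer is actually used; they are not part of the original example.

```cpp
#include <cassert>

class CSum
{
public:
    explicit CSum(int bias) : m_bias(bias) {}

    // The compiler passes `this` implicitly (in ECX under thiscall).
    int Add(int a, int b) const { return a + b + m_bias; }

private:
    int m_bias;  // hypothetical member, so that `this` is really needed
    friend int CSum_Add(const CSum* self, int a, int b);
};

// Roughly what the member function looks like at the machine level:
// the object pointer becomes an explicit extra parameter.
int CSum_Add(const CSum* self, int a, int b)
{
    return a + b + self->m_bias;
}

// Usage example: both forms compute the same result.
void CheckThis()
{
    CSum sum(10);
    assert(sum.Add(1, 2) == CSum_Add(&sum, 1, 2));
}
```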

naked attribute

On entry to a function that uses any of the four calling conventions above, the compiler generates code to save the ESI, EDI, EBX and EBP registers, and on exit it generates code to restore them. For a function declared with the naked attribute, the compiler generates no such code; the function has to manage the stack itself with inline assembly. Because naked is not a type modifier, it must be used together with __declspec. The following code defines a function that uses the naked attribute:

__declspec(naked) void func()
{
    int i;
    int j;

    _asm                // hand-written prolog
    {
        push    ebp
        mov     ebp, esp
        sub     esp, __LOCAL_SIZE
    }

    _asm                // hand-written epilog
    {
        mov     esp, ebp
        pop     ebp
        ret
    }
}
  • The naked attribute is only loosely related to the topic of this section; refer to MSDN for details.

WINAPI

  • Also worth mentioning is the WINAPI macro, which expands to the appropriate calling convention for a function. It is defined in windef.h. The following is an excerpt from windef.h:
#define CDECL           _cdecl
#define WINAPI          CDECL
#define CALLBACK        __stdcall
#define WINAPI          __stdcall
#define APIENTRY        WINAPI
  • The two definitions of WINAPI belong to different conditional-compilation branches of the header. In ordinary Win32 builds, WINAPI, CALLBACK and APIENTRY all expand to __stdcall.
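In practice these macros appear in function declarations and in callback pointer types. The sketch below shows the usual pattern. BinaryOp, AddCallback and Apply are hypothetical names, and the empty fallback definitions exist only so that the snippet also builds outside Windows.

```cpp
#include <cassert>

#if defined(_WIN32)
#  include <windows.h>   // provides WINAPI and CALLBACK
#else
#  define WINAPI         // fallback so that the sketch builds elsewhere
#  define CALLBACK
#endif

// Hypothetical callback type: the calling convention is part of the
// function pointer type.
typedef int (CALLBACK *BinaryOp)(int, int);

int CALLBACK AddCallback(int a, int b)
{
    return a + b;
}

// Hypothetical API-style function that invokes the callback.
int WINAPI Apply(BinaryOp op, int a, int b)
{
    return op(a, b);
}

// Usage example.
void CheckCallback()
{
    assert(Apply(AddCallback, 1, 2) == 3);
}
```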

2. Name Decoration

  • C and C++ functions are identified internally (during compilation and linking) by their decorated names. A function's decorated name is a string that the compiler generates when it compiles the function definition or prototype and writes into the .obj file. In some situations the decorated name has to be used directly, for example when listing overloaded C++ functions, constructors or destructors as exports in a module definition file, or when calling C or C++ functions from assembly code.

  • In VC++, a function's decorated name depends on several factors: the compilation mode (C or C++), the function name, the class name, the calling convention, the return type, the parameters, and so on. Three cases are described below: C compilation, C++ compilation of non-member functions, and C++ compilation of classes and their member functions:

2.1 Name decoration when compiling as C

When a function uses the __cdecl calling convention, the compiler only adds an underscore prefix to the original function name, in the format _functionname. For example, the function int __cdecl Add(int a, int b) is exported as _Add.

When a function uses the __stdcall calling convention, the compiler adds an underscore prefix to the original function name and appends an @ symbol followed by the number of bytes occupied by the parameters, in the format _functionname@number. For example, the function int __stdcall Add(int a, int b) is exported as _Add@8.

When a function uses the __fastcall calling convention, the compiler adds an @ prefix to the original function name and appends an @ symbol followed by the number of bytes occupied by the parameters, in the format @functionname@number. For example, the function int __fastcall Add(int a, int b) is exported as @Add@8.

None of these decorations changes the case of the characters in the original function name.
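The three C rules above are simple enough to express directly in code. The following is a minimal sketch; Convention and DecorateC are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>

enum class Convention { Cdecl, Stdcall, Fastcall };

// Builds the C decorated name from the three rules above.
// argBytes is the total size of the parameters in bytes.
std::string DecorateC(const std::string& name, Convention conv, unsigned argBytes)
{
    switch (conv) {
    case Convention::Cdecl:
        return "_" + name;
    case Convention::Stdcall:
        return "_" + name + "@" + std::to_string(argBytes);
    case Convention::Fastcall:
        return "@" + name + "@" + std::to_string(argBytes);
    }
    return name;  // not reached for valid enum values
}

// Usage example: the __stdcall case from the text.
void CheckDecorateC()
{
    assert(DecorateC("Add", Convention::Stdcall, 8) == "_Add@8");
}
```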

2.2 Name decoration for C++ functions (non-member functions)

  • When a function uses the __cdecl calling convention, the compiler does the following:

1. Begins with ? to mark the start of the name, followed by the function name;

2. The function name is followed by the @@YA marker, and then by the return type and parameter list;

3. When the return value or parameters of a function have nothing to do with a C++ class, the return type and parameter list are represented by the following codes:

B: const
D: char
E: unsigned char
F: short
G: unsigned short
H: int
I: unsigned int
J: long
K: unsigned long
M: float
N: double
_N: bool
PA: pointer (*); the code that follows gives the pointed-to type. If pointers of the same type appear consecutively, each repetition is replaced by 0
PB: pointer to const
AA: reference (&)
AB: reference to const
U: struct
V: class
W4: enum
X: void

4. The @@YA marker is followed by the code for the function's return type and then by the codes for the parameter types; a pointer code precedes the code of the type it points to. This rule applies when the return value and parameters have nothing to do with a C++ class; otherwise rules 5 and 6 apply;

5. When the return value is an object of some class, or a const object of some class, the return type is encoded as ?A or ?B + V + class name + @@ (without the plus signs). When the return value is a pointer or reference to a class, or a pointer or reference to a const class, it is encoded as PA/AA or PB/AB + V + class name + @@ (without the plus signs);

6. When a parameter is of class type and that class has already appeared (as the return type or as the type of an earlier parameter), the parameter is encoded as V + 1 + @ (without the plus signs). If the class has not appeared before, the parameter is encoded as V + class name + @@ (without the plus signs). When the parameter is a pointer or reference to a class, or a pointer or reference to a const class, the same encoding is used with the pointer or reference code (PA/AA or PB/AB) placed before the V;

7. After the parameter list, @Z marks the end of the whole name. If the function has no parameters, the parameter list is the single code X and the name ends with Z alone.

  • When a function uses the __stdcall calling convention, the compiler follows the same rules as for __cdecl above, except that the marker introducing the parameter list changes from @@YA to @@YG.

  • When a function uses the __fastcall calling convention, the compiler follows the same rules as for __cdecl above, except that the marker introducing the parameter list changes from @@YA to @@YI.
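For functions whose return type and parameters are all built-in types, the rules above reduce to string concatenation. The sketch below covers only that simple case: DecorateFreeFunction is a hypothetical helper, it expects single-letter type codes from the list in rule 3, and it does not implement class types or the abbreviation of repeated types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// convCode: "A" = __cdecl, "G" = __stdcall, "I" = __fastcall.
// returnCode and paramCodes use the single-letter type codes of rule 3.
std::string DecorateFreeFunction(const std::string& name,
                                 const std::string& convCode,
                                 const std::string& returnCode,
                                 const std::vector<std::string>& paramCodes)
{
    std::string result = "?" + name + "@@Y" + convCode + returnCode;
    if (paramCodes.empty()) {
        return result + "XZ";   // no parameters: X (void), then Z alone
    }
    for (const std::string& code : paramCodes) {
        result += code;
    }
    return result + "@Z";       // @Z ends the parameter list
}

// Usage example: int __cdecl Add(int, int).
void CheckFreeFunction()
{
    assert(DecorateFreeFunction("Add", "A", "H", {"H", "H"}) == "?Add@@YAHHH@Z");
}
```

With these inputs, int Add(int, int) becomes ?Add@@YAHHH@Z under __cdecl, ?Add@@YGHHH@Z under __stdcall and ?Add@@YIHHH@Z under __fastcall.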

2.3 Name decoration for C++ classes and their member functions

  • For an exported C++ class itself, the programmer does not choose a calling convention. While compiling an exported class, for example class __declspec(dllexport) MyClass, the compiler also generates and exports the implicit copy assignment operator, class MyClass & MyClass::operator=(class MyClass const &). The decorated name of that operator is built as follows:

1. Begins with ? to mark the start of the name, followed by ?4 + class name (without the plus sign);

2. The class name is followed by the @@QAE marker, which is fixed for exported classes;

3. @@QAE is followed by AAV0@ABV0@: the reference code AA + V + 0 (the marker for the repeated class) + @ for the return type, and the const reference code AB + V + 0 (the marker for the repeated class) + @ for the parameter (all without the plus signs);

4. Finally, @Z marks the end of the whole name.
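The four steps above can be sketched as a single concatenation. DecorateCopyAssignment is a hypothetical helper name; the pieces it joins are exactly the ones listed in the rules.

```cpp
#include <cassert>
#include <string>

// Builds the decorated name of the implicit copy assignment operator of
// an exported class, following rules 1 to 4 above.
std::string DecorateCopyAssignment(const std::string& className)
{
    return "??4" + className   // rule 1: ? followed by ?4 and the class name
         + "@@QAE"             // rule 2: fixed marker
         + "AAV0@"             // rule 3: return type, reference to the class
         + "ABV0@"             // rule 3: parameter, const reference to the class
         + "@Z";               // rule 4: end of the name
}

// Usage example for the class MyClass from the text.
void CheckCopyAssignment()
{
    assert(DecorateCopyAssignment("MyClass") == "??4MyClass@@QAEAAV0@ABV0@@Z");
}
```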

  • Member functions (other than constructors and destructors) of an exported C++ class can use different calling conventions. When a member function of an exported C++ class uses its default calling convention, the compiler does the following:

1. Begins with ? to mark the start of the name, followed by function name + @ + class name (without the plus signs);

2. The class name is followed by the @@QAE marker (Q: public member function; A: no const qualifier on this; E: thiscall, the default for member functions, whereas a member explicitly declared __cdecl gets @@QAA), and then by the return type and parameter list;

3. When the return value or parameters of a function have nothing to do with a C++ class, the return type and parameter list are represented by the same type codes as in the list given under rule 3 of section 2.2;

4. The @@QAE marker is followed by the code for the function's return type and then by the codes for the parameter types; a pointer code precedes the code of the type it points to. This rule applies when the return value and parameters have nothing to do with a C++ class; otherwise rules 5 to 7 apply;

5. When the return value is an object of the current class, or a const object of the current class, the return type is encoded as ?A or ?B + V + 1 + @@ (without the plus signs). When the return value is a pointer or reference to the current class, or a pointer or reference to the const current class, it is encoded as PA/AA or PB/AB + V + 1 + @@ (without the plus signs);

6. When the return value is an object of some other class, or a const object of some other class, the return type is encoded as ?A or ?B + V + class name + @@ (without the plus signs). When the return value is a pointer or reference to such a class, or a pointer or reference to such a const class, it is encoded as PA/AA or PB/AB + V + class name + @@ (without the plus signs);

7. When a parameter is of class type and that class has already appeared (that is, it is the class currently being exported, the class of the return value, or the class of an earlier parameter), the parameter is encoded as V + 1 + @ (without the plus signs). If the class is not the class currently being exported, the parameter is encoded as V + class name + @@ (without the plus signs). When the parameter is a pointer or reference to a class, or a pointer or reference to a const class, the same encoding is used with the pointer or reference code (PA/AA or PB/AB) placed before the V;

8. After the parameter list, @Z marks the end of the whole name. If the function has no parameters, the parameter list is the single code X and the name ends with Z alone.

  • When a member function uses the __stdcall calling convention, the compiler follows the same rules as above, except that the marker introducing the parameter list changes from @@QAE to @@QAG.

  • When a member function uses the __fastcall calling convention, the compiler follows the same rules as above, except that the marker introducing the parameter list changes from @@QAE to @@QAI.
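For a public, non-const member function with built-in parameter types, the member function rules can be sketched the same way as for free functions. DecorateMemberFunction is a hypothetical helper; the convention letter follows @@QA, with E as in the rules above and G and I being the codes for __stdcall and __fastcall. Class-typed parameters and repeated-type abbreviations are not handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// convCode: "E" = default (thiscall), "G" = __stdcall, "I" = __fastcall.
// returnCode and paramCodes use single-letter built-in type codes.
std::string DecorateMemberFunction(const std::string& name,
                                   const std::string& className,
                                   const std::string& convCode,
                                   const std::string& returnCode,
                                   const std::vector<std::string>& paramCodes)
{
    std::string result =
        "?" + name + "@" + className + "@@QA" + convCode + returnCode;
    if (paramCodes.empty()) {
        return result + "XZ";   // no parameters: X (void), then Z alone
    }
    for (const std::string& code : paramCodes) {
        result += code;
    }
    return result + "@Z";       // @Z ends the parameter list
}

// Usage example: int CSum::Add(int, int) with the default convention.
void CheckMemberFunction()
{
    assert(DecorateMemberFunction("Add", "CSum", "E", "H", {"H", "H"})
           == "?Add@CSum@@QAEHHH@Z");
}
```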

2.4 Name decoration for exported data in C++

  • Exported data involves no choice of calling convention. When the C++ compiler decorates the name of exported data, it does the following:

1. Begins with ? to mark the start of the name, followed by the data name;

2. The data name is followed by the @@3 marker, and then by the data type;

3. When the data type has nothing to do with a C++ class, it is represented by the same type codes as in the list given under rule 3 of section 2.2;

4. If the data type is a class, the data type is encoded as V + class name + @@ (without the plus signs). When the data type is a pointer or reference to a class, or a pointer or reference to a const class, it is encoded as PA/AA or PB/AB + V + class name + @@ (without the plus signs);

5. Finally, if the data is const, the decorated name ends with B. If the data is non-const, the decorated name ends with A.
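For data of a built-in type, the rules above again amount to a short concatenation. The sketch below assumes a global variable with a single-letter type code; DecorateGlobalData and the variable name g_n are hypothetical.

```cpp
#include <cassert>
#include <string>

// Builds the decorated name of exported global data of a built-in type.
// typeCode is a type code from the list in section 2.2, e.g. "H" for int.
std::string DecorateGlobalData(const std::string& name,
                               const std::string& typeCode,
                               bool isConst)
{
    return "?" + name          // rule 1: ? followed by the data name
         + "@@3"               // rule 2: data marker
         + typeCode            // rule 3: type code
         + (isConst ? "B" : "A");  // rule 5: B for const, A otherwise
}

// Usage example: int g_n and const int g_n.
void CheckGlobalData()
{
    assert(DecorateGlobalData("g_n", "H", false) == "?g_n@@3HA");
    assert(DecorateGlobalData("g_n", "H", true) == "?g_n@@3HB");
}
```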

Origin blog.csdn.net/yohnyang/article/details/134659556