In-depth reading: A detailed analysis of extern "C"

 

 

[Guide]: This article analyzes the underlying principle and practical application of extern "C" in detail.


In the systems you have worked on, you may well have seen code similar to the following.
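
The listing this paragraph refers to did not survive in this copy. As a minimal sketch of the pattern under discussion, assume a header that wraps both its #include directives and its declarations in a guarded extern "C" block. The file name my_module.h and the function my_strlen are hypothetical, and the definition at the bottom only stands in for the matching .c file so that the sketch builds on its own:

```cpp
/* my_module.h (hypothetical name): the pattern under discussion */
#ifndef MY_MODULE_H
#define MY_MODULE_H

#ifdef __cplusplus
extern "C" {
#endif

#include <stddef.h>   /* note: the #include sits INSIDE extern "C" {} */

size_t my_strlen(const char* s);   /* hypothetical C function */

#ifdef __cplusplus
}
#endif

#endif /* MY_MODULE_H */

/* Stand-in for my_module.c so the sketch builds as a single file. */
size_t my_strlen(const char* s) {
    size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}
```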

 

This looks harmless, and you may well think: "Hmm... yes, our code is written like this, and I have never run into any trouble because of it."

You are right, provided your header file has never been included by any C++ program.

What does this have to do with C++? Look at the name __cplusplus (note the two leading underscores) and you can tell it has a lot to do with C++. __cplusplus is a predefined macro specified by the C++ standard. What you can rely on is that every modern C++ compiler predefines it and no C compiler does. According to the C++98 standard its value should be 199711L (later standards specify larger values, such as 201103L for C++11), but not every compiler has followed this; g++ versions before 4.7, for example, defined it as 1.
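
To see what a particular compiler actually predefines, a small probe along the following lines can help. It is only a sketch, and the function names cplusplus_value and looks_standard_conforming are made up for illustration:

```cpp
// Reports the value of the predefined macro __cplusplus.
// A C compiler does not define the macro at all, so this file is C++ only.
long cplusplus_value() {
    return static_cast<long>(__cplusplus);
}

// True when the value is at least the one C++98 specifies (199711L).
// An old compiler that defines __cplusplus as 1 would yield false here.
bool looks_standard_conforming() {
    return cplusplus_value() >= 199711L;
}
```

Printing cplusplus_value() from a test program built with each toolchain of interest shows whether the value can be trusted or only the macro's presence.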

Therefore, if the above code is included by a C program, its content is equivalent to the following code.
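
The listing is missing here too. Assuming a header that wraps #include <stddef.h> and a single hypothetical declaration, my_strlen, in an #ifdef __cplusplus guarded extern "C" block, a C compiler would effectively be left with only the following, because the preprocessor drops both guarded fragments:

```cpp
/* What remains for a C compiler: no trace of extern "C" {} */
#include <stddef.h>

size_t my_strlen(const char* s);   /* hypothetical C function */
```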

 

In this case, since extern "C" {} no longer exists after preprocessing, there is simply nothing for it to interact with, and its relationship with the #include directive does not even arise.

The past and present of extern "C"

Inside every C++ compiler lives a demon whose specialty is a job called "name mangling". When a C++ source file is compiled, it goes to work, mangling every externally visible name it finds in the source file beyond recognition and then storing the result in the symbol table of the binary object file.

The reason such a monster exists in the world of C++ is that C++ allows one name to have several different definitions, as long as there is no semantic ambiguity. For example, two functions may share a name as long as their parameter lists differ; this is function overloading. Two functions may even have exactly the same prototype, as long as they live in different namespaces. In fact, across different namespaces any name can be reused, whether it is a function name, a variable name, or a type name.

In addition, the way C++ programs are built still follows the C tradition: the compiler treats each source file given on the command line as an independent compilation unit and generates an object file for it; the linker then searches the symbol tables of these object files and links them together into an executable program.

Compiling and linking are two separate stages; indeed, the compiler and the linker are two completely independent tools. The compiler can tell same-named symbols apart through semantic analysis, whereas the linker can identify an object only by the name stored in the object file's symbol table.

The purpose of name mangling, therefore, is to keep the linker from getting confused while it works. The compiler re-encodes every name into a new, globally unique one, so that the linker can accurately identify the object each name refers to.
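
As a small illustration of the idea, the sketch below defines three functions that share one source-level name. The function name describe and the namespace audio are hypothetical. The mangled names in the comments are those produced under the Itanium C++ ABI, which g++ and clang++ use on most Unix-like systems; MSVC uses a different scheme.

```cpp
// Two overloads: legal in C++ because the parameter lists differ.
int describe(int)    { return 1; }   // Itanium mangling: _Z8describei
int describe(double) { return 2; }   // Itanium mangling: _Z8described

namespace audio {
// Same name and parameter list as ::describe(int), but in another namespace.
int describe(int) { return 3; }      // Itanium mangling: _ZN5audio8describeEi
}
```

The compiler picks the right function from the argument types and the namespace, while the linker only ever sees the three distinct encoded names.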

However, C is a language with a single namespace and no function overloading; that is, within the scope of one compile-and-link, C does not allow two objects to share a name. For example, within one compilation unit, two functions with the same name are not allowed, whether or not they are declared static; and across all the object files that make up an executable, two externally visible objects with the same name are not allowed, whether they are global variables or functions. A C compiler therefore has no need to do anything complicated to names (at most it applies a simple, uniform decoration, such as prefixing a single underscore _).

From the very beginning, Bjarne Stroustrup, the creator of C++, listed compatibility with C, including the ability to reuse the large body of existing C libraries, as an important goal of the C++ language. But the compilers of the two languages treat names differently, and that causes trouble at link time.

For example, there is a header file named my_handle.h with the following content:
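
The header's content is missing from this copy. The sketch below assumes three functions, as the later text mentions. The names create_handle, operate_on_handle and close_handle, the two typedefs, and the toy implementation standing in for my_handle.c are all illustrative assumptions, not the original listing:

```cpp
/* my_handle.h (contents assumed for illustration) */
#ifndef MY_HANDLE_H
#define MY_HANDLE_H

typedef unsigned int result_t;
typedef void*        my_handle_t;

my_handle_t create_handle(const char* name);
result_t    operate_on_handle(my_handle_t handle);
void        close_handle(my_handle_t handle);

#endif /* MY_HANDLE_H */

/* my_handle.c (toy stand-in so the sketch builds as a single file) */
#include <stdlib.h>
#include <string.h>

my_handle_t create_handle(const char* name) {
    /* The handle is simply a heap copy of the name. */
    size_t len = strlen(name) + 1;
    char* copy = (char*)malloc(len);
    if (copy != NULL) {
        memcpy(copy, name, len);
    }
    return copy;
}

result_t operate_on_handle(my_handle_t handle) {
    /* Returns the length of the stored name, or 0 for a null handle. */
    return handle != NULL ? (result_t)strlen((const char*)handle) : 0u;
}

void close_handle(my_handle_t handle) {
    free(handle);
}
```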

 

Then compile my_handle.c with the C compiler to produce the object file my_handle.o. Since the C compiler does not mangle names, the three functions appear in the symbol table of my_handle.o under exactly the names declared in the source file.

 

Later, we want a C++ program to call these functions, so it too includes the header file my_handle.h. Suppose the C++ source file is named my_handle_client.cpp, with the following content:
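
The client's source is missing from this copy as well. A plausible sketch follows, assuming the header declares create_handle, operate_on_handle and close_handle (hypothetical names) without any linkage specification. The comments show the names that a compiler following the Itanium C++ ABI, such as g++, would emit for these functions; other ABIs, such as MSVC's, mangle differently. The stub definitions exist only so that the listing links on its own.

```cpp
// --- what #include "my_handle.h" would bring in (assumed declarations) ---
typedef unsigned int result_t;
typedef void*        my_handle_t;

my_handle_t create_handle(const char* name);        // _Z13create_handlePKc
result_t    operate_on_handle(my_handle_t handle);  // _Z17operate_on_handlePv
void        close_handle(my_handle_t handle);       // _Z12close_handlePv

// --- my_handle_client.cpp ---
// Hypothetical client function: opens a handle, uses it, closes it.
int my_handle_client(const char* name) {
    my_handle_t handle = create_handle(name);
    result_t result = operate_on_handle(handle);
    close_handle(handle);
    return static_cast<int>(result);
}

// --- stubs standing in for the library, compiled here as C++ ---
static int g_open_handles = 0;

my_handle_t create_handle(const char*) {
    ++g_open_handles;
    return &g_open_handles;
}

result_t operate_on_handle(my_handle_t handle) {
    return handle != 0 ? 1u : 0u;
}

void close_handle(my_handle_t) {
    --g_open_handles;
}
```

In the scenario the article describes, the stubs would instead live in my_handle.o under their plain C names, which is exactly why the mangled references above cannot be resolved.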

 

In the symbol table of the resulting object file my_handle_client.o, the names of the three functions appear in their mangled form.

Then, for the program to work, you must link my_handle.o and my_handle_client.o together. Since the two object files use different names for the same objects, the linker reports "symbol not defined" errors.

 

To solve this problem, C++ introduced the concept of the linkage specification, written extern "language string". The language strings that C++ compilers generally support are "C" and "C++", corresponding to the C language and the C++ language respectively.

A linkage specification tells the C++ compiler that every declaration or definition it applies to should be handled in the manner of the specified language, for example with respect to names and calling conventions.

There are two ways to use a linkage specification:

1. A linkage specification applied to a single declaration, such as:

extern "C" void foo();

2. A linkage specification applied to a group of declarations, such as:

extern "C" {
    void foo();
    int bar();
}

For our previous example, if we change the content of the header file my_handle.h to:
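
The modified header is missing from this copy. Assuming the same three hypothetical functions (create_handle, operate_on_handle, close_handle), it might look like this:

```cpp
/* my_handle.h after the change (contents assumed for illustration).
   extern "C" is C++ syntax, so in this form only a C++ compiler accepts it. */
extern "C" {

typedef unsigned int result_t;
typedef void*        my_handle_t;

/* With C linkage these are referenced under their plain, unmangled names,
   the same names a C compiler would have put into my_handle.o. */
my_handle_t create_handle(const char* name);
result_t    operate_on_handle(my_handle_t handle);
void        close_handle(my_handle_t handle);

}
```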

 

Then recompile my_handle_client.cpp with the C++ compiler; the symbol table of the generated object file my_handle_client.o becomes:

 

From this we can see that the symbols generated for declarations marked extern "C" now match those generated by the C compiler. When you link my_handle.o and my_handle_client.o together again, the "symbol not defined" errors are gone.

But if you now recompile my_handle.c, the C compiler will report a syntax error, because extern "C" is C++ syntax that a C compiler does not recognize. This is where the macro __cplusplus discussed earlier comes in: it lets the header tell C and C++ compilers apart. The revised my_handle.h is as follows:
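
The revised listing is missing from this copy. With the same assumed declarations (the function names and typedefs are hypothetical), a version that both a C and a C++ compiler accept could be sketched as:

```cpp
/* my_handle.h, usable from both C and C++ (contents assumed for illustration) */
#ifndef MY_HANDLE_H
#define MY_HANDLE_H

#ifdef __cplusplus
extern "C" {            /* seen only by C++ compilers */
#endif

typedef unsigned int result_t;
typedef void*        my_handle_t;

my_handle_t create_handle(const char* name);
result_t    operate_on_handle(my_handle_t handle);
void        close_handle(my_handle_t handle);

#ifdef __cplusplus
}                       /* closes the extern "C" block, again C++ only */
#endif

#endif /* MY_HANDLE_H */
```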

 

Beware of the unknown world behind the door

Now that we understand the origin and purpose of extern "C", let us return to the original question: why shouldn't #include directives be placed inside extern "C" { ... }?

Let's first look at an example. There are four files, a.h, b.h, c.h and foo.cpp, where foo.cpp includes c.h, c.h includes b.h, and b.h includes a.h, as follows:

 

 

Now run foo.cpp through the C++ compiler with its preprocess-only option, which gives the following result:
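
The four files and the preprocessor output are missing from this copy. Assuming each header declares a single function (hypothetical names a, b and c) inside its own guarded extern "C" block and places its #include inside that block, the output of the preprocessor (for example from g++ -E foo.cpp, with line markers removed) would look roughly like this:

```cpp
extern "C" {            /* opened by c.h */
    extern "C" {        /* opened by b.h, included from inside c.h's block */
        extern "C" {    /* opened by a.h, included from inside b.h's block */
            void a();
        }
        void b();
    }
    void c();
}
```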

 

As you can see, placing #include directives inside extern "C" {} causes extern "C" {} blocks to nest. The C++ standard allows this nesting; when it occurs, the innermost linkage specification applies. For example, in the following code the function foo has C++ linkage and the function bar has C linkage.
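
The code for this example is missing from this copy; a minimal sketch of such nesting follows. The int return types and the second overload of foo are additions made here so the difference can be observed: only a function with C++ linkage can be overloaded this way.

```cpp
extern "C" {
    extern "C++" {
        // Innermost specification wins: foo has C++ linkage,
        // so its name is mangled and it may be overloaded.
        int foo();
        int foo(int);
    }
    // bar has C linkage: it is emitted under its plain name.
    int bar();
}

int foo()      { return 1; }
int foo(int x) { return x + 1; }
int bar()      { return 3; }
```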

 

If every header that a C header depends on, directly or indirectly, is itself a C header, then according to the C++ standard this nesting should cause no problems. In practice, however, some compilers, MSVC2005 for example, may report an error when extern "C" {} is nested too deeply. Don't blame Microsoft for this, because as far as this issue is concerned the nesting is pointless anyway. You can avoid it entirely by placing #include directives outside extern "C" {}. Taking the previous example, if we move the #include directive in each header outside extern "C" {} and run foo.cpp through the preprocessor again, we get the following result:
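
This output is missing from this copy too. Under the same assumption of three headers that each declare one hypothetical function (a, b, c), but with every #include moved outside the extern "C" block, the preprocessed result would be a flat sequence of blocks:

```cpp
extern "C" {    /* from a.h */
    void a();
}
extern "C" {    /* from b.h */
    void b();
}
extern "C" {    /* from c.h */
    void c();
}
```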

 

Such a result will certainly not cause compilation problems, even with MSVC.

Another major risk of placing #include directives inside extern "C" {} is that you may inadvertently change the linkage specification of a function declaration. For example, take two header files a.h and b.h, where b.h includes a.h, as follows:

 

The author of a.h intended foo to be a C++ free function with "C++" linkage. But because b.h places #include "a.h" inside extern "C" {}, the linkage specification of foo is silently and incorrectly changed to "C".

Every #include directive hides an unknown world behind it. Unless you deliberately explore it, you will never know what happens when you put that #include directive inside extern "C" {}: what results it produces and what risks it brings. Maybe you will say, "I can check the included header files, and I can guarantee that they will not cause trouble." But why bother? After all, we don't have to pay for things we don't need, do we?
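
The two headers are missing from this copy. As a sketch of the effect, assume a.h declares a C++ function foo(int) and b.h declares a C function b (the parameter type and the name b are assumptions). A C++ source file that includes b.h would then see the following after preprocessing:

```cpp
extern "C" {                /* opened by b.h */

    /* pulled in by #include "a.h": its author meant a C++ function,
       i.e. the symbol _Z3fooi under the Itanium ABI ... */
    void foo(int);          /* ... but here it gets C linkage: plain foo */

    /* declared by b.h itself */
    void b();
}
```

A source file that includes a.h directly and defines foo would still emit the mangled symbol, so callers that saw the declaration through b.h would look for a different name at link time.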

 

Q&A

 

Q: Can no #include directive ever be placed inside extern "C"?

 

A: Just like most rules in this world, there are always special circumstances.

 

Sometimes you may use the header-file mechanism as a "clever" way to solve certain problems, the #pragma pack issue for example. Such headers do not serve the same purpose as regular headers: they contain no C function declarations or variable definitions, and the linkage specification has no effect on their content. In that case you do not have to follow this rule.
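
One common form of this trick is a pair of tiny headers that only switch structure packing on and off. The sketch below inlines what two such hypothetical headers, pack_push.h and pack_pop.h, would contain; the struct and function names are made up as well. #pragma pack(push, 1) and #pragma pack(pop) are supported by MSVC, GCC and Clang.

```cpp
extern "C" {

/* #include "pack_push.h" would expand to: */
#pragma pack(push, 1)

struct wire_header {        /* hypothetical on-the-wire layout */
    unsigned char tag;
    unsigned int  length;
};

/* #include "pack_pop.h" would expand to: */
#pragma pack(pop)

unsigned int wire_header_size(void);   /* a C function using the struct */

}

unsigned int wire_header_size(void) {
    /* With 1-byte packing there is no padding between tag and length. */
    return (unsigned int)sizeof(struct wire_header);
}
```

The pragmas behave the same inside or outside the linkage block, which is why including such headers there is harmless.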

 

The more general principle is this: once you understand all of the above, and as long as you know what you are doing, go ahead and do it.


Q: You have only said what should not be put inside extern "C". What can be put inside it?

 

A: A linkage specification applies only to functions, variables, and function types. So, strictly speaking, only these three kinds of entities belong inside extern "C".

 

However, putting other C language elements inside extern "C", such as non-function type definitions (structures, enumerations and so on), does no harm, to say nothing of preprocessing directives such as macro definitions.

 

So if you value good organization and discipline, use extern "C" only around the declarations that actually need it. Even if you are lazier than that, putting all of a header's own definitions and declarations inside extern "C" is, in most cases, not a big problem.

 

Q: What if a C header file with function or variable declarations has no extern "C" declaration at all?

 

A: If you can judge that this header file will never be used by C++ code, then just leave it alone.

 

But in reality you usually cannot predict the future accurately. Adding extern "C" now costs you very little; if you leave it out and the header is one day included by someone else's C++ program, that person will probably pay a much higher price to locate the error and fix it.

 

Q: What should I do if my C++ program needs to include a C header file a.h that contains C function or variable declarations but does not use the extern "C" linkage specification?

 

A: Add it to a.h.

 

Some people may suggest that if a.h has no extern "C" and b.cpp includes a.h, you can write this in b.cpp:

extern "C"
{
  #include "a.h"
}

This is an evil scheme, for the reasons explained earlier. What is worth discussing is the assumption that may lie behind it, namely that a.h cannot be modified. There are two possible reasons for that:

 

1. The header file code belongs to other teams or third-party companies, and you do not have the authority to modify the code;


2. Although you have the authority to modify the code, since this header file belongs to the legacy system, rash modification may bring unpredictable problems.

 

In the first case, don't try to work around it yourself, because that will only bring you unnecessary trouble. The right approach is to treat it as a bug and send a defect report to the team or third-party company responsible. If it is a team in your own company, or a third party you are paying, they are obliged to make the change for you. If they don't understand why it matters, tell them. If the headers belong to free, open-source software, make the correct change yourself and send the patch to its development team.

 

In the second case, you need to let go of this unnecessary caution. First, for most header files this is not a complex, high-risk change; everything stays within a controllable range. Second, if a header file is messy and complicated, then although the philosophy for legacy systems should be "don't touch it until it causes trouble", the trouble has now arrived, and it is better to face it than to avoid it. The best strategy is to treat this as a good opportunity to tidy the header into a clean and reasonable state.

 

Q: The wording of extern "C" in our code is as follows. Is this correct?
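
The listing being asked about is missing from this copy. Judging from the answer that follows, it presumably nested a test of the macro's value inside a test of its existence, roughly like this (a reconstruction, not the original; legacy_call is a hypothetical declaration):

```cpp
#ifdef __cplusplus
#if __cplusplus
extern "C" {
#endif
#endif

void legacy_call(void);   /* hypothetical declaration */

#ifdef __cplusplus
#if __cplusplus
}
#endif
#endif
```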

 

A: Not necessarily.

 

According to the C++ standard, __cplusplus should be defined as 199711L (or a larger value for later standards), which is non-zero. Although some compilers do not follow the standard here, they still guarantee that __cplusplus is non-zero; at least, I have not yet seen a compiler that defines it as 0. In that case, #if __cplusplus ... #endif is completely redundant.

 

However, there are so many C++ compiler vendors that nobody can guarantee that no compiler, or no early version of some compiler, defines __cplusplus as 0. Even so, as long as the macro __cplusplus is predefined only by C++ compilers, #ifdef __cplusplus ... #endif alone is enough to express the intent correctly; adding #if __cplusplus ... #endif on top of it is then actually wrong, because on exactly such a compiler it would strip extern "C" from C++ builds.

 

Only in one situation is #if __cplusplus ... #endif both correct and necessary: when a vendor's C and C++ compilers both predefine __cplusplus and distinguish themselves by giving it the value 0 and a non-zero value respectively.

Because the real world is this complicated, you need to be clear about your goal and then choose a strategy to match. For example, if your goal is for your code to compile with the handful of mainstream compilers that follow the standard correctly, then plain #ifdef __cplusplus ... #endif is all you need.

 

But if your product is an ambitious cross-platform product that tries to be compatible with all sorts of compilers (including ones not yet known), you may have to handle the various cases as follows, where __ALIEN_C_LINKAGE__ identifies those compilers that define the __cplusplus macro in both C and C++ mode.
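
The listing is missing from this copy. One way such handling could look is sketched below, under the assumption that the build system defines __ALIEN_C_LINKAGE__ when it detects such a compiler; the function portable_add is hypothetical and only there to make the sketch complete:

```cpp
/* Opening part, repeated in every header */
#if defined(__ALIEN_C_LINKAGE__)
#  if __cplusplus            /* both compilers define it; only the value differs */
extern "C" {
#  endif
#elif defined(__cplusplus)   /* normal case: only C++ compilers define it */
extern "C" {
#endif

int portable_add(int a, int b);   /* hypothetical declaration */

/* Closing part, again repeated in every header */
#if defined(__ALIEN_C_LINKAGE__)
#  if __cplusplus
}
#  endif
#elif defined(__cplusplus)
}
#endif

/* Stand-in for the matching source file. */
int portable_add(int a, int b) { return a + b; }
```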

 

This should work, but writing such a long block in every header file is not only unsightly; it also means that once the strategy changes, it has to be changed everywhere. Violate the DRY (Don't Repeat Yourself) principle and you always pay extra for it. A simple remedy is to define a dedicated header file, such as linkage.h, and put this definition in it:
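
The definition itself is missing from this copy. A minimal sketch of such a linkage.h follows; the macro names MY_USE_C_LINKAGE, MY_EXTERN_C_BEGIN and MY_EXTERN_C_END and the function scaled are hypothetical, and __ALIEN_C_LINKAGE__ is again assumed to be supplied by the build system:

```cpp
/* linkage.h: the compiler-detection strategy lives in exactly one place */
#ifndef LINKAGE_H
#define LINKAGE_H

#if defined(__ALIEN_C_LINKAGE__)
#  if __cplusplus
#    define MY_USE_C_LINKAGE 1
#  endif
#elif defined(__cplusplus)
#  define MY_USE_C_LINKAGE 1
#endif

#ifdef MY_USE_C_LINKAGE
#  define MY_EXTERN_C_BEGIN extern "C" {
#  define MY_EXTERN_C_END   }
#else
#  define MY_EXTERN_C_BEGIN
#  define MY_EXTERN_C_END
#endif

#endif /* LINKAGE_H */

/* Usage in any other header, after #include "linkage.h": */
MY_EXTERN_C_BEGIN
int scaled(int value);   /* hypothetical declaration */
MY_EXTERN_C_END

/* Stand-in for the matching source file. */
int scaled(int value) { return value * 2; }
```

If the strategy ever changes, only linkage.h needs editing; the headers that use the two macros stay untouched.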

 

In the following example, the C function is declared in cfun.h and defined in cfun.c, and it prints the string "this is c fun call"; the C++ function is declared in cppfun.h and defined in cppfun.cpp, and it prints the string "this is cpp fun call". The build environment is VC2010.

 

Calling C from C++ (the key is to have the C function treated the C way, not the C++ way)

 

(1) cfun.h is as follows:

#ifndef _C_FUN_H_
#define _C_FUN_H_


    void cfun();


#endif

   cppfun.cpp is as follows:

//#include "cfun.h"  // no need to include cfun.h
#include "cppfun.h"
#include <iostream>
using namespace std;
extern "C"     void cfun(); // declaring it as `extern void cfun();` is an error


void cppfun()
{
    cout<<"this is cpp fun call"<<endl;
}


int main()
{
    cfun();
    return 0;
}

(2) cfun.h is the same as above

 

  cppfun.cpp is as follows:

extern "C"
{
    #include "cfun.h" // note: the #include directive must be on a line of its own
}
#include "cppfun.h"
#include <iostream>
using namespace std;


void cppfun()
{
    cout<<"this is cpp fun call"<<endl;
}


int main()
{
    cfun();
    return 0;
}

(3) cfun.h is as follows:

#ifndef _C_FUN_H_
#define _C_FUN_H_


#ifdef __cplusplus
extern "C"
{
#endif


    void cfun();


#ifdef __cplusplus
}
#endif


#endif

cppfun.cpp is as follows:

#include "cfun.h"
#include "cppfun.h"
#include <iostream>
using namespace std;


void cppfun()
{
    cout<<"this is cpp fun call"<<endl;
}


int main()
{
    cfun();
    return 0;
}

Calling C++ from C (the key is that the C++ side provides a function that follows the C calling convention)

 

When tested with VS2010, no extern declaration was needed at all: cfun.c only had to include cppfun.h and could then call cppfun(), and the program compiled and ran. Under gcc the same code fails to compile, and by the C and C++ standards it should indeed be an error. The following approach works with both compilers.

 

cppfun.h is as follows:

#ifndef _CPP_FUN_H_
#define _CPP_FUN_H_


extern "C" void cppfun();




#endif

cfun.c is as follows:

//#include "cppfun.h" // do not include this header, otherwise compilation fails (a C compiler rejects extern "C")
#include "cfun.h"
#include <stdio.h>


void cfun()
{
    printf("this is c fun call\n");
}


extern void cppfun();


int main()
{
#ifdef __cplusplus
    cfun();
#endif
    cppfun();
    return 0;
}

 

https://www.cnblogs.com/TenosDoIt/p/3163621.html

 


Origin blog.csdn.net/miaozenn/article/details/113063355