extern "C" is not as simple as it looks!


 

Foreword

This article analyzes in detail the underlying principles and practical use of extern "C". In the systems you have worked on, you may well have seen headers that wrap their entire contents, #include directives included, in an extern "C" { } block guarded by #ifdef __cplusplus.
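A minimal sketch of such a header might look like this. The file name, the include guard, and the function are hypothetical and exist only for illustration; the one point that matters is that the #include directive sits inside the extern "C" block.

```cpp
/* my_module.h (hypothetical header, for illustration only) */
#ifndef MY_MODULE_H
#define MY_MODULE_H

#ifdef __cplusplus
extern "C" {
#endif

/* The #include sits INSIDE the extern "C" block:
   this is the habit the article questions. */
#include <stddef.h>

/* Hypothetical function, defined inline so the sketch is self-contained. */
static inline size_t my_module_buffer_size(void) { return 64; }

#ifdef __cplusplus
}
#endif

#endif /* MY_MODULE_H */
```

The header compiles as both C and C++; the question raised in the rest of the article is what the extern "C" block does to whatever the #include pulls in.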

 

This doesn't look like a problem, and you may be thinking: "Well, yes, our code is written like this, and it has never caused us any trouble."

You are right, as long as your header file is never included by any C++ program.

What does this have to do with C++? Look at the name __cplusplus (note the two leading underscores) and you can tell that it has a lot to do with C++. __cplusplus is a predefined macro required by the C++ standard.

You can rely on this: every modern C++ compiler predefines it, and C compilers do not. According to the C++98 standard, the value of __cplusplus should be 199711L (later standards specify larger values, such as 201103L for C++11), but not every compiler has followed this; for example, older versions of g++ defined it as 1.
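To see what a particular compiler does, a small probe along these lines can help. The function names are hypothetical, and the table lists only a few well-known values; it is a sketch, not a complete mapping.

```cpp
#include <string>

// Returns the value of the predefined __cplusplus macro as seen by
// whichever C++ compiler builds this file.
long cplusplus_value() { return static_cast<long>(__cplusplus); }

// Maps a few well-known values to a readable name (hypothetical helper).
// Anything not listed is reported as "unknown".
std::string standard_name(long value) {
    if (value == 1L)      return "pre-standard value (old g++)";
    if (value == 199711L) return "C++98/03";
    if (value == 201103L) return "C++11";
    if (value == 201402L) return "C++14";
    if (value == 201703L) return "C++17";
    if (value == 202002L) return "C++20";
    return "unknown";
}
```

Calling standard_name(cplusplus_value()) reports what the current build uses. Note that MSVC keeps reporting 199711L unless the /Zc:__cplusplus option is given, which is one more reason to test only whether the macro is defined.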

Therefore, if the code above is included by a C program, the preprocessor drops the extern "C" { and } lines, and what remains is just the #include directives and declarations that sat between them.

In this case, since extern "C" { } no longer exists after preprocessing, there is no interaction between it and the #include directive to worry about.

The past and present of extern "C"

In C++ compilers there lives a demon whose specialty is a job called "name mangling". Whenever a C++ source file is compiled, it goes to work, mangling every externally visible name it finds in the source file beyond recognition and storing the result in the symbol table of the binary object file.

The reason such a monster exists in the C++ world is that C++ allows one name to have several definitions, as long as there is no semantic ambiguity.

For example, two functions can share a name as long as their parameter lists differ, which is function overloading. Two functions can even have exactly the same prototype, as long as they live in different namespaces.

In fact, any name can be reused across different namespaces, whether it is a function name, a variable name, or a type name.

In addition, C++ programs are still built in the tradition of C: the compiler treats each source file named on the command line as an independent compilation unit and generates an object file for it; the linker then uses the symbol tables of these object files to link them into an executable program.

Compiling and linking are two separate stages; in fact, the compiler and the linker are two completely separate tools. The compiler can tell same-named symbols apart through semantic analysis, but the linker can identify an object only by the name stored in the symbol table of the object file.

Therefore, the compiler mangles names to keep the linker from getting confused: it re-encodes every name into a new, globally unique one, so that the linker can accurately identify the object each name refers to.
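The overloading and namespace cases described above can be sketched as follows. All function and namespace names are hypothetical. The mangled names in the comments are what the Itanium C++ ABI scheme (used by g++ and clang++ on most Unix-like systems) is expected to produce; other compilers, MSVC in particular, use different schemes.

```cpp
// Overloads: same name, different parameter lists.
int    area(int side)           { return side * side; }  // typically _Z4areai
double area(double w, double h) { return w * h; }        // typically _Z4areadd

// Identical prototypes are fine when the namespaces differ.
namespace metric {
    int unit_count(int n) { return n * 100; }  // typically _ZN6metric10unit_countEi
}
namespace imperial {
    int unit_count(int n) { return n * 12; }   // typically _ZN8imperial10unit_countEi
}
```

Each of the four functions ends up with a distinct symbol, which is all the linker needs to tell them apart.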

But C is a language with a single namespace and no function overloading. In other words, within the scope of one compile-and-link, C does not allow different objects to share a name.

For example, within one compilation unit no two functions may have the same name, whether or not they are declared static; and across all the object files that make up an executable, no two externally visible objects may have the same name, whether they are global variables or functions.

Therefore, a C compiler does not need to do any complex processing of names (at most it applies a simple, uniform decoration, such as prefixing every name with a single underscore _).

Bjarne Stroustrup, the creator of C++, made compatibility with C, and the ability to reuse the large body of existing C libraries, an important goal of the C++ language from the very beginning.

However, the compilers of the two languages treat names differently, which causes trouble at link time.

For example, there is a header file named my_handle.h with the following content:
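A plausible sketch of such a header, together with a toy my_handle.c, is given here. The type names, function names, and implementation are assumptions made for illustration; only the file names and the fact that three functions are declared come from the text. The code stays within the common subset of C and C++.

```cpp
/* my_handle.h (reconstruction; all names below are assumptions) */
#ifndef MY_HANDLE_H
#define MY_HANDLE_H

typedef unsigned int result_t;
typedef void*        my_handle_t;

my_handle_t create_handle(const char* name);
result_t    operate_on_handle(my_handle_t handle);
void        close_handle(my_handle_t handle);

#endif /* MY_HANDLE_H */

/* my_handle.c (toy definitions so the sketch links on its own) */
#include <stdlib.h>
#include <string.h>

/* The handle is simply a heap copy of the name. */
my_handle_t create_handle(const char* name) {
    size_t len = strlen(name) + 1;
    char* copy = (char*)malloc(len);  /* cast keeps this valid as C++ too */
    if (copy != NULL) memcpy(copy, name, len);
    return copy;
}

/* The "operation" just reports the length of the stored name. */
result_t operate_on_handle(my_handle_t handle) {
    return handle != NULL ? (result_t)strlen((const char*)handle) : 0u;
}

void close_handle(my_handle_t handle) { free(handle); }
```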

 

Then use the C language compiler to compile my_handle.c to generate the object file my_handle.o .

Since the C compiler does not mangle names, the three function names in the symbol table of my_handle.o are exactly the same as in the declarations in the source file.

 

Later, we want a C++ program to call these functions, so it also includes the header file my_handle.h .

Suppose the name of this C++ source code file is my_handle_client.cpp , and its content is as follows:

 

In the symbol table of the object file compiled from it, my_handle_client.o, the three function names appear in their mangled form.

Then, for the program to work, you must link my_handle.o and my_handle_client.o together. Because the same object has different names in the two object files, the linker reports an "undefined symbol" error for each of them.

 

To solve this problem, C++ introduced the concept of a linkage specification, written extern "language string". The language strings commonly supported by C++ compilers are "C" and "C++", which correspond to the C language and the C++ language respectively.

A linkage specification tells the C++ compiler that every declaration or definition it applies to should be handled the way the specified language handles it, for example with respect to names and calling conventions.

Linkage specifications can be used in two ways:

1. A linkage specification applied to a single declaration, such as:

extern "C" void foo();

2. A linkage specification applied to a group of declarations, such as:

extern "C"
{
    void foo();
    int bar();
}
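Both forms can be tried out in a single file, as in the sketch below. The demo_ function names are hypothetical. The definitions carry no linkage specification of their own; they keep the "C" linkage given by the earlier declarations.

```cpp
// Form 1: linkage specification on a single declaration.
extern "C" int demo_twice(int x);

// Form 2: linkage specification covering a group of declarations.
extern "C" {
    int demo_add_one(int x);
    int demo_negate(int x);
}

// A later declaration or definition without a linkage specification
// keeps the linkage of the earlier declaration, so these have C linkage.
int demo_twice(int x)   { return 2 * x; }
int demo_add_one(int x) { return x + 1; }
int demo_negate(int x)  { return -x; }
```

Inspecting the object file with a symbol-listing tool should show these three names without C++ mangling.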

For our previous example, suppose we wrap the declarations in the header file my_handle.h in extern "C" { }.

Then recompile my_handle_client.cpp with the C++ compiler and look at the symbol table of the generated object file my_handle_client.o.

This time, the symbols generated for the declarations marked extern "C" are identical to the symbols generated by the C compiler. So when you link my_handle.o and my_handle_client.o together again, the earlier "undefined symbol" errors are gone.

But if you now recompile my_handle.c, the C compiler reports a syntax error, because extern "C" is C++ syntax and the C compiler does not recognize it. This is where the __cplusplus macro discussed earlier comes in: it lets the header tell C and C++ compilers apart. In the modified my_handle.h, the extern "C" { line and the closing } are each wrapped in #ifdef __cplusplus ... #endif.
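A sketch of the guarded header follows. As before, the type and function names are assumptions, and the toy definitions after the header only stand in for my_handle.c so that the example links by itself.

```cpp
/* my_handle.h with the __cplusplus guard (names are assumptions) */
#ifndef MY_HANDLE_H
#define MY_HANDLE_H

#ifdef __cplusplus
extern "C" {          /* seen only by C++ compilers */
#endif

typedef unsigned int result_t;
typedef void*        my_handle_t;

my_handle_t create_handle(const char* name);
result_t    operate_on_handle(my_handle_t handle);
void        close_handle(my_handle_t handle);

#ifdef __cplusplus
}                     /* closes the extern "C" block */
#endif

#endif /* MY_HANDLE_H */

/* Stand-in for my_handle.c: a counter plays the role of real state. */
static int open_handles = 0;

my_handle_t create_handle(const char* name) {
    (void)name;
    ++open_handles;
    return &open_handles;
}

result_t operate_on_handle(my_handle_t handle) {
    /* 0 means success for the one handle this toy knows about. */
    return handle == (my_handle_t)&open_handles ? 0u : 1u;
}

void close_handle(my_handle_t handle) {
    (void)handle;
    --open_handles;
}
```

A C compiler sees only the typedefs and prototypes; a C++ compiler additionally sees the extern "C" block and emits unmangled symbols.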

 

Beware of the unknown world behind the door

Now that we understand the origin and purpose of extern "C", let us return to the original question: why shouldn't we put #include directives inside extern "C" { ... }?

Let's look at an example first. There are three headers, a.h, b.h, and c.h, and a source file foo.cpp, where foo.cpp includes c.h, c.h includes b.h, and b.h includes a.h, with b.h and c.h each placing their #include directive inside their own extern "C" { } block.

Now run foo.cpp through the C++ compiler's preprocessing-only option and look at the output.

 

As you can see, putting #include directives inside extern "C" { } leads to nested extern "C" { } blocks. The C++ standard allows this nesting. When linkage specifications nest, the innermost one takes effect. For example, if an extern "C++" { } block declaring a function foo is nested inside an extern "C" { } block that also declares a function bar, then foo uses the C++ linkage specification and bar uses the C linkage specification.
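The nesting rule can be checked with a small sketch. The overload of foo is an addition made here to make the effect visible: overloading is possible only because the innermost specification gives foo C++ linkage.

```cpp
extern "C" {
    extern "C++" {
        // Innermost specification wins: foo has C++ linkage,
        // so it can be overloaded.
        int foo(int x);
        int foo(double x);
    }
    // Only the outer "C" specification applies here: bar has C linkage.
    int bar(int x);
}

int foo(int x)    { return x + 1; }
int foo(double x) { return static_cast<int>(x) + 2; }
int bar(int x)    { return x * 10; }
```

Adding a second overload of bar inside the outer block would be rejected, since at most one function with a given name can have C linkage.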

 

If it can be guaranteed that every header file a C header depends on, directly or indirectly, is also a C header, then according to the C++ standard this kind of nesting should cause no problems.

But particular compiler implementations, such as MSVC 2005, may report an error when extern "C" { } is nested too deeply.

Don't blame Microsoft for this: as far as this problem is concerned, the nesting serves no purpose. You can avoid it entirely by placing #include directives outside extern "C" { }.

Returning to the previous example, if we move the #include directive in each header file out of extern "C" { } and then preprocess foo.cpp with the C++ compiler again, the extern "C" { } blocks in the output sit one after another instead of being nested.

Output like this will certainly not cause compilation problems, even with MSVC.

Another significant risk of placing #include directives inside extern "C" { } is that you may inadvertently change the linkage specification of a function declaration. For example, take two header files, a.h and b.h, where b.h includes a.h from inside an extern "C" { } block.

The author of a.h intended the function foo to be a C++ free function with "C++" linkage. But because b.h places #include "a.h" inside extern "C" { }, the linkage specification of foo is silently changed to "C".
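The effect can be imitated in one file by pasting the contents of a.h where the #include would be. The function b_helper and the bodies below are hypothetical; only a.h, b.h, and foo come from the text.

```cpp
// ---- b.h ----
extern "C" {
    // ---- contents of a.h, as pasted in by #include "a.h" ----
    int foo(int x);      // the author of a.h meant this to be a C++ function
    // ---- end of a.h ----

    int b_helper(int x); // b.h's own C function (hypothetical)
}
// ---- end of b.h ----

// foo now has C linkage, although a.h never asked for it.
// If a.h later gained an overload such as
//     int foo(double x);
// every file that includes b.h would stop compiling, because two
// functions with C linkage cannot share a name.

int foo(int x)      { return x + 1; }
int b_helper(int x) { return foo(x) * 2; }
```

The same foo compiled from a file that includes a.h directly would get C++ linkage, so the two object files would disagree about its symbol name.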

This unknown world hides behind every #include directive. Unless you deliberately go and explore it, you will never know what results, and what risks, you bring in each time you put an #include directive inside extern "C" { }.

You might say, "I can go and look at the included header files, and I can guarantee they won't cause trouble." But why bother? After all, there is no need to pay for something you can do without, is there?

Q & A

Q:  Is it true that no #include directive may ever be placed inside extern "C"?

A:  Like most rules in this world, there are always exceptions.

Sometimes you may use the header file mechanism to "cleverly" solve certain problems, #pragma pack being one example. Such header files differ from regular ones: they contain no C function declarations or variable definitions, so a linkage specification has no effect on their contents. In that case you do not have to follow this rule.

The more general principle is: once you understand the principles and know what you are doing, go ahead and do it.

Q:  You have only said what should not be put inside extern "C". What can be put in?

A:  Linkage specifications apply only to functions, variables, and function types. So, strictly speaking, only these three kinds of entities belong inside extern "C".

However, putting other elements of C, such as non-function type definitions (structures, enumerations, and so on), inside extern "C" has no effect. The same goes for macro definitions and other preprocessing directives.

So, if you value good organization and tidy habits, use extern "C" only where it is needed. And even if you are lazy, in most cases putting all of a header's own definitions and declarations inside extern "C" will not cause much of a problem.

Q:  What if there is no extern "C" declaration in a C header file with function / variable declarations ?

A:  If you can judge that this header file will never be used by C++ code, then leave it alone.

But the reality is that, most of the time, you cannot predict the future accurately. Adding extern "C" now costs you very little; if you leave it out and the header is one day included in someone else's C++ program, that person will probably pay a much higher price to locate the error and fix the problem.

Q:  What should I do if my C++ program wants to include a C header file a.h , which contains C function / variable declarations, but they do not use the extern "C" linkage specification?

A:  Add it to a.h.

Some people may suggest that if a.h has no extern "C" and b.cpp includes a.h, you can write this in b.cpp:

extern "C"
{
#include "a.h"
}

This is an evil scheme, for the reasons explained earlier. It is worth discussing, though, because there may be an assumption behind it: that we cannot modify a.h. There are two possible reasons why it cannot be modified:

1. The header file code belongs to other teams or third-party companies, and you do not have permission to modify the code;

2. Although you have the authority to modify the code, since this header file belongs to the legacy system, modifying it rashly may cause unpredictable problems.

In the first case, don't try to work around the problem yourself, as that will cause you unnecessary trouble. The correct approach is to treat it as a bug and send a defect report to the responsible team or third-party company.

If it is a team in your own company, or a third-party company that you pay, they are obliged to make the change for you. If they do not understand why it matters, tell them. If the header files belong to free and open source software, make the correct change yourself and submit the patch to its development team.

In the second case, you need to let go of this unnecessary caution.

First, for most header files this is not a complicated, high-risk change; everything stays within a controllable range.

Second, if a header file really is messy and complicated, then although the philosophy for legacy systems is "don't touch it until it causes trouble", the trouble has now arrived, and facing it is better than avoiding it. The best policy is to treat this as a good opportunity to tidy the header into a clean and reasonable state.

Q:  The extern "C" in our code is guarded by #if __cplusplus ... #endif nested inside #ifdef __cplusplus ... #endif. Is this correct?

 

A:  Not sure.

According to the C++ standard, the value of __cplusplus should be defined as 199711L, which is non-zero. Some compilers do not follow the standard here, but they still guarantee that the value of __cplusplus is non-zero; at least, I have not yet seen a compiler that defines it as 0.

In this case, #if __cplusplus ... #endif is completely redundant.

However, there are so many C++ compiler vendors that no one can guarantee that some compiler, or some early version of a compiler, does not define __cplusplus as 0.

But even then, as long as the macro __cplusplus is predefined only by C++ compilers, #ifdef __cplusplus ... #endif alone is enough to express the intent correctly; adding #if __cplusplus ... #endif on top of it is actually wrong.

Only in one situation is #if __cplusplus ... #endif correct and necessary: when a vendor's C and C++ compilers both predefine __cplusplus and distinguish the two languages by whether its value is 0 or non-zero.

Because the real world is this complicated, you need to be clear about your goal and then choose a strategy to match. For example, if your goal is for the code to compile with the few mainstream compilers that follow the standard correctly, then simply using #ifdef __cplusplus ... #endif is enough.

But if your product is an ambitious cross-platform one that tries to support all kinds of compilers (including unknown ones), you may have to combine both checks to cover every situation, using a macro such as __ALIEN_C_LINKAGE__ to identify compilers that define __cplusplus in both C and C++ compilation.

 

This should work, but writing such a long block in every header file is not only unsightly; it also means that whenever the policy changes, every header has to change. That violates the DRY (Don't Repeat Yourself) principle, and you always pay extra for that. A simple solution is to put the logic in one dedicated header file, such as clinkage.h, and define a pair of macros there that every other header can use.
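One possible shape for clinkage.h is sketched below. The macro names C_LINKAGE_BEGIN and C_LINKAGE_END and the demo function are hypothetical, and __ALIEN_C_LINKAGE__ is assumed to be defined by the build system for compilers that predefine __cplusplus even when compiling C.

```cpp
/* clinkage.h (sketch; macro names are assumptions) */
#ifndef CLINKAGE_H
#define CLINKAGE_H

#if defined(__cplusplus) && !defined(__ALIEN_C_LINKAGE__)
   /* Ordinary compiler: __cplusplus exists only when compiling C++. */
#  define C_LINKAGE_BEGIN extern "C" {
#  define C_LINKAGE_END   }
#elif defined(__ALIEN_C_LINKAGE__) && __cplusplus
   /* "Alien" compiler: the macro always exists, so test its value. */
#  define C_LINKAGE_BEGIN extern "C" {
#  define C_LINKAGE_END   }
#else
   /* Compiling as C: the macros expand to nothing. */
#  define C_LINKAGE_BEGIN
#  define C_LINKAGE_END
#endif

#endif /* CLINKAGE_H */

/* A client header then needs only two short lines of boilerplate. */
C_LINKAGE_BEGIN
int clinkage_demo_sum(int a, int b);
C_LINKAGE_END

/* Toy definition so the sketch links by itself. */
int clinkage_demo_sum(int a, int b) { return a + b; }
```

If the detection policy ever changes, only clinkage.h has to be edited; the client headers stay as they are.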

 

In the following example, the C function is declared in cfun.h and defined in cfun.c, and it prints the string "this is c fun call". The C++ function is declared in cppfun.h and defined in cppfun.cpp, and it prints the string "this is cpp fun call". The build environment is VC2010.

Calling C from C++ (the key is to have the C function treated as C, not as C++)

( 1 ) cfun.h is as follows:

#ifndef _C_FUN_H_
#define _C_FUN_H_

void cfun();

#endif

  cppfun.cpp is as follows:

//#include "cfun.h"  // cfun.h does not need to be included
#include "cppfun.h"
#include <iostream>
using namespace std;

extern "C" void cfun(); // declaring it as plain extern void cfun(); would be wrong

void cppfun()
{
    cout << "this is cpp fun call" << endl;
}

int main()
{
    cfun();
    return 0;
}

( 2 ) cfun.h same as above

  cppfun.cpp is as follows:

extern "C"
{
    #include "cfun.h" // note: the #include directive must be on a line by itself
}
#include "cppfun.h"
#include <iostream>
using namespace std;

void cppfun()
{
    cout << "this is cpp fun call" << endl;
}

int main()
{
    cfun();
    return 0;
}

( 3 ) cfun.h is as follows:

#ifndef _C_FUN_H_
#define _C_FUN_H_

#ifdef __cplusplus
extern "C"
{
#endif

    void cfun();

#ifdef __cplusplus
}
#endif

#endif

cppfun.cpp is as follows:

#include "cfun.h"
#include "cppfun.h"
#include <iostream>
using namespace std;

void cppfun()
{
    cout << "this is cpp fun call" << endl;
}

int main()
{
    cfun();
    return 0;
}

Calling C++ from C (the key is that the C++ side provides a function that follows the C convention)

When tested with VS2010, cfun.c compiled and ran without any extern declaration: it simply included cppfun.h and called cppfun(). Under gcc the same code fails to compile, and by the C/C++ standards it should indeed be an error. The following method works with both compilers.

cppfun.h is as follows:

#ifndef _CPP_FUN_H_
#define _CPP_FUN_H_

extern "C" void cppfun();

#endif

cfun.c is as follows:

//#include "cppfun.h" // do not include this header, or compilation fails
#include "cfun.h"
#include <stdio.h>

void cfun()
{
    printf("this is c fun call\n");
}

extern void cppfun();

int main()
{
#ifdef __cplusplus
    cfun();
#endif
    cppfun();
    return 0;
}


Origin blog.csdn.net/Rocky006/article/details/131001044