C++ template declaration and definition


Preface

I am often asked whether templates are easy to use. My answer is: "Templates are easy to use, but not easy to organize and write." Look at the template classes we meet almost every day, such as those in STL, ATL, WTL, and Boost, and you can appreciate what I mean: the interface is simple, but the workings behind it are complicated.

I started using templates five years ago, when I came across the MFC container classes. Until last year, however, I never had to write a template class myself. When I finally needed to, the first thing I ran into was that the "traditional" way of organizing code (declarations in *.h files, definitions in *.cpp files) does not work for templates. So I spent some time understanding the problem and its solutions.

This article is aimed at programmers who are familiar with using templates but have little experience writing them. It covers only class templates, not function templates, but the principles discussed apply to both.

How the problem arises

The following example illustrates the problem. Suppose the file array.h contains the class template array:

// array.h
template <typename T, int SIZE>
class array
{
  T data_[SIZE];
  array (const array& other);
  const array& operator = (const array& other);
public:
  array(){};
  T& operator[](int i) {return data_[i];}
  const T& get_elem (int i) const {return data_[i];}
  void set_elem(int i, const T& value) {data_[i] = value;}
  operator T*() {return data_;}   
};

Then use the template in the main function of main.cpp:

// main.cpp
#include "array.h"

int main(void)
{
array<int, 50> intArray;
intArray.set_elem(0, 2);
int firstElem = intArray.get_elem(0);
int* begin = intArray;
}

At this point the program compiles and runs normally. It creates an array of 50 integers, sets the first element to 2, reads the first element back, and finally obtains a pointer to the beginning of the array.

But what happens if we organize the code the traditional way? Let's take a look.

Split array.h into two files, array.h and array.cpp (main.cpp stays unchanged):

// array.h    
template <typename T, int SIZE>
class array
{
   T data_[SIZE];
   array (const array& other);
   const array& operator = (const array& other);
 public:
   array(){};
   T& operator[](int i);
   const T& get_elem (int i) const;
   void set_elem(int i, const T& value);
   operator T*();   
};

// array.cpp
#include "array.h"

template<typename T, int SIZE> T& array<T, SIZE>::operator [](int i)
  {
  return data_[i];
  }

template<typename T, int SIZE> const T& array<T, SIZE>::get_elem(int i) const
  {
  return data_[i];
  }

template<typename T, int SIZE> void array<T, SIZE>::set_elem(int i, const T& value)
  {
  data_[i] = value;
  }
template<typename T, int SIZE> array<T, SIZE>::operator T*()
  {
  return data_;
  }

Building the program now produces 3 errors, which raises two questions:

 Why are all the errors link errors?
 Why are there only 3 link errors, when array.cpp contains 4 member functions?

To answer these questions, we need to understand how template instantiation works.

Template instantiation

The most common mistake programmers make with class templates is to treat the template itself as a data type. The term "parameterized types" encourages this misunderstanding. A template is not a type; a template is, as its name says, a template:

The compiler creates types from a template by substituting the template arguments for its parameters. This process is called template instantiation.
A type created from a class template is called a specialization.
Instantiation happens where the code needs a specialization, and the compiler must be able to find the template's code at that place, which is called the point of instantiation.
To create a specialization, the compiler must see not only the declaration of the template but also its definition.
Template instantiation is lazy: only the member functions that are actually used get instantiated.
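To make the terminology concrete, here is a minimal sketch (not from the original article) using a trimmed-down version of the array template, placed in a hypothetical namespace demo so it cannot clash with std::array. Each distinct set of template arguments names a distinct specialization, and using one in a translation unit that can see the full definition is what triggers instantiation. The helper use_specialization() is a made-up name for illustration.

```cpp
#include <type_traits>

namespace demo {  // hypothetical namespace, only to avoid clashing with std::array

// A class template: a recipe for types, not a type by itself.
template <typename T, int SIZE>
class array {
    T data_[SIZE];
public:
    T& operator[](int i) { return data_[i]; }
    static constexpr int size() { return SIZE; }
};

}  // namespace demo

// Substituting different template arguments yields different specializations.
static_assert(!std::is_same<demo::array<int, 50>, demo::array<int, 10>>::value,
              "a different SIZE gives a different type");
static_assert(std::is_same<demo::array<int, 50>, demo::array<int, 50>>::value,
              "the same arguments name the same specialization");

// Hypothetical helper: this use of demo::array<int, 50> is a point of
// instantiation, and it compiles because the whole definition is visible here.
inline int use_specialization() {
    demo::array<int, 50> a;
    a[0] = 2;
    return a[0] + demo::array<int, 50>::size();  // 2 + 50
}
```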

Looking back at the example, array is a template, and array<int, 50> is a template instance, that is, a type. Creating array<int, 50> from array is the process of instantiation, and the point of instantiation is in main.cpp. With the traditional layout, the compiler sees the declaration of the template in array.h but not the definitions of its member functions, so it cannot generate the member functions of array<int, 50>. Yet no error is reported at this point, because the compiler assumes the definitions are in another file and leaves the problem to the linker.

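The compile stage checks for syntax errors and looks up function declarations, including those from third-party libraries.

The link stage is matchmaking ("you have the illness, I have the cure"): it pairs declarations with their definitions. This explains most of the build process, and the same holds for builds driven by CMake.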

Now, what happens when array.cpp is compiled? The compiler can parse the template definition and check its syntax, but it cannot generate code for the member functions, because generating code requires knowing the template arguments: it needs a type, not the template itself.

As a result, the linker finds the definitions of array<int, 50>'s member functions in the object files of neither main.cpp nor array.cpp, and reports them as undefined.

That answers the first question. Now for the second: array.cpp contains 4 member functions, so why does the linker report only 3 errors? The answer is lazy instantiation. main.cpp never uses operator[], so the compiler never needs its definition and the linker has no reference to it to resolve.
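Lazy instantiation can be observed directly. In the sketch below (an illustration, not code from the article), the hypothetical member negate_all() would not compile for std::string, yet demo::array<std::string, 3> is perfectly usable as long as nobody calls it: the compiler instantiates a member function's body only when that member is used, just as operator[] was never instantiated for main.cpp above. The namespace demo and both helper functions are assumptions for illustration.

```cpp
#include <string>

namespace demo {  // hypothetical namespace, only to avoid clashing with std::array

template <typename T, int SIZE>
class array {
    T data_[SIZE];
public:
    T& operator[](int i) { return data_[i]; }
    // Hypothetical member: only valid for element types with a unary minus.
    void negate_all() {
        for (int i = 0; i < SIZE; ++i) data_[i] = -data_[i];
    }
};

}  // namespace demo

// std::string has no unary minus, but negate_all() is never called for
// array<std::string, 3>, so its body is never instantiated and no error occurs.
inline std::string first_word() {
    demo::array<std::string, 3> words;
    words[0] = "hello";
    return words[0];
}

// For int, calling negate_all() instantiates it, and that compiles fine.
inline int negated_first() {
    demo::array<int, 2> nums;
    nums[0] = 5;
    nums[1] = -1;
    nums.negate_all();
    return nums[0];
}
```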

Solution
Once the problem is understood, it can be solved in one of three ways:

Let the compiler see the template definition at the point of instantiation.
Explicitly instantiate the type in a separate file, so the linker can find its member definitions.
Use the export keyword.

The first two methods are usually called the inclusion model, and the third the separation model.

The first method means that every translation unit that uses the template must include not only the template declaration but also its definition. In our example, that is the first version, in which all member functions are defined inline in array.h. Alternatively, main.cpp can include array.cpp. Either way, the compiler sees both the declaration and the definition of the template and can generate the array<int, 50> instance. The disadvantage is that the included files become very large, which slows down compilation and linking.
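As a sketch of the inclusion model, the code below squeezes the would-be files into one translation unit; the comments mark which part would live in array.h and which in main.cpp (the file split is only simulated). The member definitions stay outside the class body, as in array.cpp, but they are visible wherever the header is included, so the compiler can instantiate array<int, 50> at the point of use. The namespace demo and the function use_array() are hypothetical.

```cpp
// ===== array.h (inclusion model: declarations AND definitions) =====
namespace demo {  // hypothetical namespace, only to avoid clashing with std::array

template <typename T, int SIZE>
class array {
    T data_[SIZE];
public:
    T& operator[](int i);
    const T& get_elem(int i) const;
    void set_elem(int i, const T& value);
    operator T*();
};

// Out-of-class definitions kept in the header (or in a file that the header
// #includes at its end), so every translation unit that uses array sees them.
template <typename T, int SIZE>
T& array<T, SIZE>::operator[](int i) { return data_[i]; }

template <typename T, int SIZE>
const T& array<T, SIZE>::get_elem(int i) const { return data_[i]; }

template <typename T, int SIZE>
void array<T, SIZE>::set_elem(int i, const T& value) { data_[i] = value; }

template <typename T, int SIZE>
array<T, SIZE>::operator T*() { return data_; }

}  // namespace demo

// ===== main.cpp (after #include "array.h") =====
// Hypothetical helper mirroring the article's main(): the point of
// instantiation sees the definitions, so nothing is left for the linker to miss.
inline int use_array() {
    demo::array<int, 50> intArray;
    intArray.set_elem(0, 2);
    int firstElem = intArray.get_elem(0);
    int* begin = intArray;        // uses operator T*()
    return firstElem + begin[0];  // 2 + 2
}
```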

In the second method, the type is produced through explicit template instantiation. It is best to put all explicit instantiations in a separate file. In our example, you could create a new file, templateinstantiations.cpp:

// templateinstantiations.cpp
#include "array.cpp"

template class array <int, 50>; // explicit instantiation

The type array<int, 50> is now produced not in main.cpp but in templateinstantiations.cpp, so the linker can find its member definitions. This avoids huge headers and speeds up compilation, and the header itself is cleaner and more readable. However, this method gives up the benefits of lazy instantiation: it generates all member functions, whether they are used or not. In addition, the templateinstantiations.cpp file has to be maintained by hand.
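Here is a compressed sketch of the explicit-instantiation approach, again with the files simulated in a single translation unit (the comments mark the boundaries; namespace demo and read_back() are hypothetical). The line template class array<int, 50>; is an explicit instantiation definition: it makes the compiler emit every defined member of that specialization, which is why this approach forgoes lazy instantiation. In a real project that line would sit in templateinstantiations.cpp after #include "array.cpp", and main.cpp would include only array.h.

```cpp
// ===== array.h: declarations only =====
namespace demo {  // hypothetical namespace, only to avoid clashing with std::array
template <typename T, int SIZE>
class array {
    T data_[SIZE];
public:
    const T& get_elem(int i) const;
    void set_elem(int i, const T& value);
};
}  // namespace demo

// ===== array.cpp: member definitions =====
namespace demo {
template <typename T, int SIZE>
const T& array<T, SIZE>::get_elem(int i) const { return data_[i]; }

template <typename T, int SIZE>
void array<T, SIZE>::set_elem(int i, const T& value) { data_[i] = value; }
}  // namespace demo

// ===== templateinstantiations.cpp: #include "array.cpp", then =====
namespace demo {
// Explicit instantiation definition: emits code for all defined members of
// array<int, 50>, whether or not anything calls them.
template class array<int, 50>;
}  // namespace demo

// ===== main.cpp: sees only array.h; the linker finds the members above =====
inline int read_back(int value) {
    demo::array<int, 50> a;
    a.set_elem(0, value);
    return a.get_elem(0);
}
```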

The third method is to use the export keyword in the template definition and let the compiler take care of the rest. When I read about export in Stroustrup's book, I was very excited. But I soon found that VC 6.0 did not support it, and later that no compiler supported it at all (the first compiler to support it did not appear until the end of 2002). Since then I have read many articles about export and learned that it solves hardly any problem that the inclusion model cannot. (export was eventually removed from the language in C++11.) For more on export, I recommend reading Herb Sutter's article on the subject.

Conclusion
To develop a template library, you need to understand that a class template is not an ordinary ("primitive") type, and you need to think about code organization differently. The purpose of this article is not to scare off programmers who want to write templates; on the contrary, it is to help them avoid the mistakes that everyone makes when starting out with template programming.

Even when defining non-inline functions, you will usually put all declarations and definitions of a template in its header file. This seems to violate the usual header-file rule "Don't put in anything that allocates storage", which exists to prevent multiple-definition errors at link time. But template definitions are special. Anything preceded by template<...> means the compiler does not allocate storage for it at that point; instead, it waits until told to by a template instantiation. Somewhere in the compiler and linker there is a mechanism for removing multiple definitions of an identical template, so for ease of use you will almost always put the entire template declaration and definition in the header file.

Why can't C++ compilers support separate compilation of templates?

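First, the C++ standard says that a translation unit is a `.cpp` file together with all the `.h` files it includes. The code in the `.h` files is expanded into the `.cpp` file that includes them, and the compiler then compiles that `.cpp` file into an `.obj` file. On Windows, the `.obj` file is in COFF format (the object-file counterpart of PE, the Portable Executable format of Windows executables) and already contains binary code, but it is not necessarily executable, because nothing guarantees it contains a `main` function. After the compiler has compiled every `.cpp` file in a project separately, the linker links them into an `.exe` file.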

For example:

 
//---------------test.h-------------------//
void f(); // declares a function f
//---------------test.cpp--------------//
#include "test.h"
void f()
{
    // ... do something
} // implements the f declared in test.h
//---------------main.cpp--------------//
#include "test.h"
int main()
{
    f(); // calls f; f has external linkage
}

In this example, test.cpp and main.cpp are each compiled into a separate .obj file (call them test.obj and main.obj). main.cpp calls f, but when the compiler compiles main.cpp, all it knows about f is the declaration void f(); from test.h. So the compiler treats f as a symbol with external linkage, meaning its implementation lives in another .obj file, test.obj in this case. In other words, main.obj does not contain a single line of binary code for f; that code is in test.obj, the result of compiling test.cpp. The call to f in main.obj compiles to a call instruction such as call f (in C++ the name is, of course, mangled).

At compile time this call instruction cannot be complete, because main.obj contains no implementation of f. So what happens? This is the linker's job. The linker finds the implementation of f in another .obj (here, test.obj) and replaces the target address of the call f instruction with the actual entry-point address of f. Note that the linker "links" the project's .obj files into an .exe file, and its most important task is to find the address of each externally linked symbol in another .obj and substitute it for the original placeholder ("fake") address.
In more depth (and somewhat simplified), the process looks like this:

The call f instruction is actually not quite what it seems: it is a so-called stub, a jmp 0x23423 (the address is arbitrary; what matters is that the instruction at that address performs the real call f). In other words, every call to f in the .obj file jumps to the same address, and only the instruction there is the real call f. The advantage is that when the linker fixes up the address, it only needs to change the target of that one call XXX. But how does the linker find the actual address of f (in test.obj here)? Because .obj and .exe files use closely related formats, such a file contains a symbol import table and a symbol export table that associate symbols with their addresses. The linker therefore only needs to look up the address of the symbol f (mangled, of course) in test.obj's export table, apply an offset (merging the two .obj files shifts addresses by an amount the linker knows), and write the result into the entry that f occupies in main.obj's symbol import table.
That is roughly the process. The key points are:

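When compiling main.cpp, the compiler does not know the implementation of f, so when it encounters a call to f it only emits an instruction telling the linker to find f's implementation. That is, main.obj contains no binary code for f.
When compiling test.cpp, the compiler finds the implementation of f, so f's implementation (binary code) ends up in test.obj.
At link time, the linker finds the address of f's implementation (binary code) in test.obj through the symbol export table, then changes the pending call XXX address in main.obj to f's actual address.
Done.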

With templates, however, the code of a function template cannot be compiled directly into binary code; it must first be instantiated ("realized"). For example:

//----------main.cpp------//
template<class T>
void f(T t)
{}
int main()
{
    // ... do something
    f(10); // calls f<int>: here the compiler decides to instantiate f<int>
    // ... do other things
}

In other words, if main.cpp never calls f, f is never instantiated, and main.obj contains no binary code for f at all. If you call it like this:
f(10); // f<int> is instantiated
f(10.0); // f<double> is instantiated
then main.obj contains binary code for two functions, f<int> and f<double>, and so on.
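A small sketch (not from the original article) shows that f<int> and f<double> really are two separate functions: a function-local static variable exists once per instantiation, so the two counters below advance independently. The counting body and the helper call_pattern() are assumptions added only to make the instantiations observable.

```cpp
// Each instantiation f<T> is a separate function, so each has its own
// function-local static counter.
template <class T>
int f(T)
{
    static int calls = 0;  // one counter per instantiation, not one in total
    return ++calls;
}

// Hypothetical helper: calls f<int> twice and f<double> once.
inline int call_pattern()
{
    int a = f(10);    // f<int>: first call, returns 1
    int b = f(20);    // f<int>: second call, returns 2
    int c = f(10.0);  // f<double>: its own first call, returns 1
    return a * 100 + b * 10 + c;  // 121
}
```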
But instantiation requires the compiler to know the definition of the template, doesn't it?
Consider the following example, which separates the template from its implementation:

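//-------------test.h----------------//
template<class T>
class A
{
public:
    void f(); // only a declaration
};
//---------------test.cpp-------------//
#include "test.h"
template<class T>
void A<T>::f() // the template's implementation, but note: not an instantiation
{
    // ... do something
}
//---------------main.cpp---------------//
#include "test.h"
int main()
{
    A<int> a;
    a.f(); // Here the compiler does not know the definition of A<int>::f,
           // because it is not in test.h. So the compiler can only hope that
           // the linker will find the implementation of A<int>::f in some other
           // .obj, test.obj in this case. But does test.obj really contain
           // binary code for A<int>::f? NO! The C++ standard says that a
           // template is not instantiated unless it is used. Does test.cpp use
           // A<int>::f? No! So test.obj, compiled from test.cpp, does not
           // contain a single line of binary code for A<int>::f, and the
           // linker, stuck, can only report a link error.
           // However, if test.cpp contained a function that calls A<int>::f,
           // the compiler would instantiate it there, because at that point
           // (in test.cpp) it knows the template's definition. test.obj's
           // symbol export table would then contain the address of A<int>::f,
           // and the linker could do its job.
}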
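The comment above suggests forcing the instantiation by calling A<int>::f somewhere in test.cpp. That usually works, but it is fragile: an implicitly instantiated function may be inlined and never emitted as a standalone symbol, and the standard only guarantees the link when the specialization is explicitly instantiated in some translation unit. Below is a sketch of that sturdier variant (explicit instantiation, swapped in for the call-based trick), with test.h, test.cpp, and main.cpp simulated in one file; the body of f and the helper call_a_f() are assumptions for illustration.

```cpp
// ===== test.h =====
template <class T>
class A
{
public:
    int f();  // declaration only
};

// ===== test.cpp =====
template <class T>
int A<T>::f()  // definition of the template member (not yet an instantiation)
{
    return static_cast<int>(sizeof(T));  // placeholder body for illustration
}

// Explicit instantiation definition: test.obj now contains A<int>::f (and
// every other defined member of A<int>), even though nothing in test.cpp
// calls it.
template class A<int>;

// ===== main.cpp (would see only test.h) =====
inline int call_a_f()
{
    A<int> a;
    return a.f();
}
```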


The key point: in a separate-compilation environment, when the compiler compiles one .cpp file it does not know about any other .cpp file, nor does it look for one (when it encounters an unresolved symbol, it pins its hopes on the linker). This model works well without templates but breaks down with them, because a template is instantiated only where it is needed. When the compiler sees only the declaration of a template, it cannot instantiate it; it can only create a symbol with external linkage and expect the linker to resolve its address. But if the .cpp file that implements the template never uses that instantiation, the compiler does not bother to instantiate it there either. As a result, no .obj in the entire project contains any binary code for that instantiation, and the linker is powerless.

Organizing C++ template code: the inclusion model

Keywords: template, inclusion model
Source: C++ Templates: The Complete Guide

Note: This article is translated from part of Chapter 6 of "C++ Templates: The Complete Guide". I have recently seen many posts on C++ forums about the inclusion model for templates. As a template beginner, I was confused by similar problems myself, so I translated this section in the hope that it will help other beginners.

There are several ways to organize template source code. This section presents the most popular one: the inclusion model.

Link error

Most C/C++ programmers organize their non-template code like this:

Classes and other types are placed entirely in header files, which have a .hpp (or .H, .h, .hh, .hxx) extension.
For global variables and (non-inline) functions, only the declaration goes in a header file, and the definition goes in a so-called dot-C file, which has a .cpp (or .C, .c, .cc, .cxx) extension.

This works well: it makes the needed type definitions easily available throughout the program, and it avoids "duplicate definition of variable or function" errors from the linker.

With these conventions in mind, template novices often make the same mistake, illustrated by the following small program. As we would for "ordinary code", we declare the template in a header file:

// basics/myfirst.hpp

#ifndef MYFIRST_HPP
#define MYFIRST_HPP

// declaration of template
template <typename T>
void print_typeof (T const&);

#endif // MYFIRST_HPP

 

print_typeof() declares a simple auxiliary function that prints some type information. The definition of the function is placed in a dot-C file:

// basics/myfirst.cpp

#include <iostream>
#include <typeinfo>
#include "myfirst.hpp"

// implementation/definition of template
template <typename T>
void print_typeof (T const& x)
{
    std::cout << typeid(x).name() << std::endl;
}

This example uses the typeid operator to print a string that describes the type information of the passed parameter.

Finally, we use the template in another dot-C file, into which our template declaration is #included:

// basics/myfirstmain.cpp 

#include "myfirst.hpp"

// use of the template

int main() {
  double ice = 3.0; 
  print_typeof(ice); // call function template for type double
}

A C++ compiler will most likely accept this program without any problems, but the linker will probably report an error, indicating that there is no definition of the function print_typeof().

The reason for this error is that the definition of the function template print_typeof() has not been instantiated. To instantiate a template, the compiler must know which definition to instantiate and for which template arguments. Unfortunately, in the previous example, these two pieces of information are in files that are compiled separately. Therefore, when the compiler sees the call to print_typeof() but has no definition in sight to instantiate for double, it simply assumes that such a definition is provided elsewhere and creates a reference to it for the linker to resolve. On the other hand, when the compiler processes myfirst.cpp, nothing in that file indicates that it must instantiate the template definition for any particular arguments.
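Besides the inclusion model presented next, the explicit-instantiation approach described earlier in this document also fixes this particular link error for function templates. A sketch, with the three files simulated in one translation unit: in a real build, the explicit instantiation line would go at the end of myfirst.cpp, and myfirstmain.cpp would see only the declaration (that placement is an assumption, not the book's code).

```cpp
#include <iostream>
#include <typeinfo>

// ===== myfirst.hpp: declaration only =====
template <typename T>
void print_typeof (T const&);

// ===== myfirst.cpp: definition =====
template <typename T>
void print_typeof (T const& x)
{
    std::cout << typeid(x).name() << std::endl;
}

// Explicit instantiation definition for T = double: forces myfirst.cpp's
// object file to contain print_typeof<double>, so the call in
// myfirstmain.cpp can link. Every other argument type needs its own line.
template void print_typeof<double>(double const&);
```

The drawback mirrors the class-template case: every argument type the program uses must be listed by hand.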

Templates in the header file

The common solution to this problem is to use the same approach we use with macros or inline functions: we include the definitions of the template in the header file that declares it. For our example, we can do this by adding #include "myfirst.cpp" at the end of myfirst.hpp, or by including myfirst.cpp in every dot-C file that uses the template. A third way is to drop myfirst.cpp altogether and rewrite myfirst.hpp so that it contains all template declarations and definitions:

// basics/myfirst2.hpp

#ifndef MYFIRST_HPP
#define MYFIRST_HPP

#include <iostream>
#include <typeinfo>

// declaration of template
template <typename T>
void print_typeof (T const&);

// implementation/definition of template
template <typename T>
void print_typeof (T const& x)
{
    std::cout << typeid(x).name() << std::endl;
}

#endif // MYFIRST_HPP

This way of organizing template code is called the inclusion model. With it, our program now compiles, links, and runs correctly.

A few observations are in order. The most notable is that this approach considerably increases the cost of including myfirst.hpp. In this example, the cost comes not from the size of the template definition itself but from the fact that we must also include the headers our definition uses, here <iostream> and <typeinfo>. This can amount to thousands of lines of code, because headers such as <iostream> contain many template definitions of their own.

This is a real problem in practice, because it considerably increases the time the compiler needs to compile significant programs. Later sections therefore examine other possible ways to approach it. Still, real-world programs can quickly end up taking hours to compile and link (we have seen cases in which it literally took days to build a program from its source code).

Despite this build-time issue, we recommend following the inclusion model to organize your template code whenever possible.

Another observation is that the most important difference between a non-inline function template and inline functions or macros is that it is not expanded at the call site. Instead, instantiating the template produces a new copy of the function. Because this is an automatic process, a compiler may end up generating two copies in two different files, and some linkers may report an error when they find two distinct definitions of the same function. In theory this should not concern us: it is a problem for compiler designers. In practice, things work well most of the time and we do not need to deal with it at all. For large projects that create their own libraries, however, the problem occasionally shows up.

Finally, note that what we said about the ordinary function template in our example also applies to member functions and static data members of class templates, as well as to member function templates.
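For instance, a class template's static data member and a member function template are organized the same way under the inclusion model: their out-of-class definitions go in the header next to the declarations. The class counter, its members, and the namespace demo below are hypothetical, chosen only to show the syntax.

```cpp
namespace demo {  // hypothetical namespace for illustration

template <typename T>
class counter {
public:
    static int instances;      // static data member of a class template
    counter() { ++instances; }

    template <typename U>      // member function template
    U convert(const T& value) const;
};

// Both definitions below belong in the header, just like print_typeof's
// definition. Each specialization (counter<int>, counter<double>, ...)
// gets its own instances variable.
template <typename T>
int counter<T>::instances = 0;

// Two template parameter lists: one for the class, one for the member.
template <typename T>
template <typename U>
U counter<T>::convert(const T& value) const
{
    return static_cast<U>(value);
}

}  // namespace demo
```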