Chapter 4. Compiling

Introduction

The GNU Compiler Collection

Other Compilers

Compiling the Linux Kernel

Assembly Listings

Compiler Optimizations

Conclusion

4.1. Introduction

Because the Linux kernel and all GNU software are fully open source, compiling and working with compilers is an important part of becoming a Linux expert. There will often be times when a particular software package, or a particular feature in a software package, isn’t included in your distribution, so the only option is to obtain the source and compile it yourself. Another common task in Linux is recompiling your kernel. The reasons for doing this could be to disable or enable a particular feature of the current kernel, to apply a patch that fixes a problem or adds a new feature, or to migrate the system to a completely different kernel source level. This chapter does not give complete instructions on how to carry out these tasks. Rather, it provides information on some of the lesser known details related to these actions. Furthermore, the intent is to help arm you with some of the skills to enable you to solve compilation difficulties on your own. First, a bit of background on the primary tool used to perform compilation: the compiler.

4.2. The GNU Compiler Collection

The GNU Compiler Collection, or GCC as it is commonly referred to, is currently the most widely used compiler for developing GNU/Linux, BSDs, Mac OS X, and BeOS systems. GCC is free (as in freedom) software and is freely available for anyone to use for any purpose. A large number of developers around the world contribute to GCC, which is guided by the GCC Steering Committee.

4.2.1. A Brief History of GCC

GCC originally started out as the “GNU C Compiler” back when the GNU Project was first started in 1984 by the GNU Project founder Richard Stallman. With funding from the Free Software Foundation (FSF), the first release of GCC was in 1987. At that time it was the first portable, optimizing compiler freely available, which quickly paved the way for the open source movement and ultimately the GNU/Linux operating system.

Version 2.0 was released in 1992 and provided support for C++. In 1997, the Experimental/Enhanced GNU Compiler System (EGCS) project was spun off of GCC by a group of developers who wanted to focus more on expanding GCC, improving C++ support, and improving optimization. The success of EGCS resulted in it being named the official version of GCC in April of 1999. The release of GCC 3.0 in 2001 made the new compiler widely available.

The evolution of GCC has resulted in support for several different languages including Objective C, Java, Ada, and Fortran. This prompted the renaming of the GCC acronym to the more appropriate “GNU Compiler Collection.”

Today, GCC has been ported to more hardware architectures than any other compiler and is a crucial part of the success of the GNU/Linux operating system. GCC continues to grow and stabilize. SuSE Linux Enterprise Server 9 currently uses GCC version 3.3.3, and Red Hat Enterprise Linux 3 uses GCC version 3.2.3.

4.2.2. GCC Version Compatibility

When new versions of GCC are released, there is always the potential for library incompatibilities to be introduced, particularly with C++ code. For example, when compiling a C++ application with GCC version 3.3.1, the system library libstdc++.so.5 is automatically linked in. When compiling the same C++ application with GCC 3.4.1, the system library libstdc++.so.6 is automatically linked in. This is because the C++ Application Binary Interface (ABI) changed between the GCC 3.3 and 3.4 release series (see the “Calling Conventions” section for more information), so the two versions of the compiler generate binaries that use differing C++ interfaces.

To resolve this issue, both versions of the libstdc++.so library must be available on any system that could potentially run an application compiled with differing versions of GCC.

4.3. Other Compilers

There are other compilers available for Linux, but none is nearly as portable as GCC, and most are available only on specific hardware platforms. GCC’s main competition on the i386, IA64, and x86-64 architectures is the Intel C++ Compiler. This compiler is a commercial product and must be properly licensed. It claims to have better performance than GCC and offers differing support packages to the purchaser.

An alternative compiler to GCC on the PowerPC platform running Linux is the IBM xlC C++ Compiler. Just as with the Intel compiler, this one also claims greater performance and enhanced support but is, again, a commercial product requiring proper licensing.

4.4. Compiling the Linux Kernel

Thanks to Linux’s open source policy, users are free to choose any kernel they wish to use and even make changes to it if they desire! Making changes to a kernel does not require that you are a kernel developer or even a programmer. There are a plethora of patches available on the Internet from those who are programmers and kernel developers.

Note: A patch is a set of any number of source code changes commonly referred to as diffs because they are created by the diff(1) utility. A patch can be applied to a set of source files effectively acting as an automatic source code modification system to add features, fix bugs, and make other changes.

Caution does need to be exercised if you are not confident with this task, as the kernel is the brain of the operating system; if it is not working properly, bad things will happen, and your computer may not even boot.

It is important to know roughly how to compile a Linux kernel in case the need arises. Here’s a perfect example—when the 2.6 kernel was officially released in December 2003, the new kernel did not appear in any distributions for several months. Even though a lot of features found in the mainline 2.6 kernel source have been back-ported to the major distributions’ 2.4-based products, there are still many fundamental features of the 2.6 kernel that have not been back-ported, which makes running it very attractive. So the only option at that point in time was to obtain the 2.6 source code and compile the kernel manually. It is not the intention of this book to go into detail on how to do this because there is a great deal of information commonly available to do so. Rather, the intention of this book is to help you troubleshoot and solve problems that may arise through this process on your own and with the help of others if need be. Linux is a massive project, and there are an infinite number of different system configurations—so problems are a very real possibility.

4.4.1. Obtaining the Kernel Source

The Linux kernel source for all releases, including development/test releases, is found at kernel.org. Kernel source can be downloaded as one complete archive or as patches to a specific kernel level. The archives are available as compressed tar files. For example, the full 2.4.24 source tree is available on kernel.org in the pub/linux/kernel/v2.4 directory as both linux-2.4.24.tar.gz and linux-2.4.24.tar.bz2.

Note: bz2 is an alternate compression method to gzip that generally provides higher compression. Use the bunzip2 utility to uncompress the file. Alternatively, the archive can be untarred with a single command such as:

     bzcat linux-2.4.24.tar.bz2 | tar -xf -

 

The patch files are also available for every release and basically contain the difference between the release for which they’re named and one release previous. The idea is that for those who want to always have the latest officially released kernel, they don’t need to download the full source archive each time; rather they can download the patch file, apply it to their current kernel source tree, rename the directory to reflect the new version (though not required), and rebuild only what’s changed with their existing configurations.

4.4.2. Architecture Specific Source

The kernel source trees contain a directory in the top level called “arch” under which is all the architecture-specific code. Usually all code required for a particular architecture is included, but there are occasions, especially when a particular architecture is relatively new, when the architecture specific code is incomplete and/or buggy. In this case, there is often a dedicated server on the Internet that holds patchkits to be applied to the mainline kernel source.

Note: A patchkit is a term used to refer to a large set of patches provided in a single downloadable archive.

A prime example of the need for a patchkit is the x86-64 architecture and the early 2.6 kernel releases. The 2.6.0 kernel source on kernel.org does not contain all fixes needed to properly run on the x86-64 architecture, so a patchkit must be downloaded from x86-64.org in the pub/linux/v2.6 directory. The need for doing this will vary by architecture, so be sure to check for the architecture that you’re interested in.

4.4.3. Working with Kernel Source Compile Errors

Even though the mainline kernel source is tested a great deal and compiled on many machines around the world before being declared an official release and posted to kernel.org, there is still no guarantee that it will compile flawlessly for you on your particular machine. Compile failures can occur for various reasons. Some of the more common reasons are:

  1. Environment/setup errors or differences
  2. Compiler version differences
  3. Currently running kernel is incompatible with the kernel being compiled
  4. User error
  5. Code error

If you experience a compile error while compiling a fresh kernel, fear not; it may not be as difficult to fix as it might seem. That’s the beauty of Linux: there is always a wealth of help and information at your fingertips, and it’s quite possible that someone else has already had and fixed the very same problem.

4.4.3.1. A Real Kernel Compile Error Example

Let’s work through a compile problem encountered when compiling the 2.4.20 kernel source downloaded directly from kernel.org as an example. The error message encountered was

gcc -D__KERNEL__ -I/usr/src/linux-2.4.20/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -pipe -mpreferred-stack-boundary=2 -march=i686 -nostdinc -iwithprefix include -DKBUILD_BASENAME=ide_probe -DEXPORT_SYMTAB -c ide-probe.c

gcc -D__KERNEL__ -I/usr/src/linux-2.4.20/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -pipe -mpreferred-stack-boundary=2 -march=i686 -nostdinc -iwithprefix include -DKBUILD_BASENAME=ide_geometry -c -o ide-geometry.o ide-geometry.c

ld -m elf_i386 -r -o ide-probe-mod.o ide-probe.o ide-geometry.o

gcc -D__KERNEL__ -I/usr/src/linux-2.4.20/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -pipe -mpreferred-stack-boundary=2 -march=i686 -nostdinc -iwithprefix include -DKBUILD_BASENAME=ide_disk -c -o ide-disk.o ide-disk.c

gcc -D__KERNEL__ -I/usr/src/linux-2.4.20/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -pipe -mpreferred-stack-boundary=2 -march=i686 -nostdinc -iwithprefix include -DKBUILD_BASENAME=ide_cd -c -o ide-cd.o ide-cd.c

In file included from ide-cd.c:318:

ide-cd.h:440: error: long, short, signed or unsigned used invalidly for 'slot_tablelen'

make[3]: *** [ide-cd.o] Error 1

make[3]: Leaving directory '/usr/src/linux-2.4.20/drivers/ide'

make[2]: *** [first_rule] Error 2

make[2]: Leaving directory '/usr/src/linux-2.4.20/drivers/ide'

make[1]: *** [_subdir_ide] Error 2

make[1]: Leaving directory '/usr/src/linux-2.4.20/drivers'

make: *** [_dir_drivers] Error 2

 

To the user unfamiliar with looking at compile and make errors, this can look quite daunting at first. The first thing to do when seeing error output like this is to identify its root cause. In the preceding output, it is important to see that some of the compilations shown are successful and only one failed. Each line that begins with gcc is a compilation, and if no error output follows the line, then the compilation was successful. So we can rule out the first three gcc compilation lines, as they were successful. The compilation that failed was

gcc -D__KERNEL__ -I/usr/src/linux-2.4.20/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -pipe -mpreferred-stack-boundary=2 -march=i686 -nostdinc -iwithprefix include -DKBUILD_BASENAME=ide_cd -c -o ide-cd.o ide-cd.c

 

The next thing to do is to examine this compile line in more detail. The first objective is to determine the name of the source file being compiled. To do this, scan through the compile line and ignore all of the command line arguments that begin with a dash. The remaining command line arguments are include, ide-cd.o, and ide-cd.c. The include is actually the argument of -iwithprefix, and ide-cd.o is the output file named by -o, so both can be ignored, leaving us with the source file being compiled: ide-cd.c.

Next, we need to look at the error message dumped by gcc:

In file included from ide-cd.c:318:

ide-cd.h:440: error: long, short, signed or unsigned used invalidly for 'slot_tablelen'

 

From this output, we can see that the code that failed to compile isn’t actually in ide-cd.c; rather it’s in ide-cd.h, which is #include’d by ide-cd.c at line 318. Line 440 of ide-cd.h is the line the compiler could not understand. The following is a snippet of code from ide-cd.h surrounding line 440, with line 440 highlighted.

        byte     curlba[3];

        byte     nslots;

        __u8 short slot_tablelen;

};

 

struct atapi_slot {

#if defined(__BIG_ENDIAN_BITFIELD)

 

By looking at the failing line, it might not be obvious right away what the problem could be without looking at the definition of __u8. Using cscope (see the section “Setting Up cscope to Index Kernel Sources” for more information), we see that __u8 is defined for each particular architecture. Because this was encountered on i386, the appropriate file to look at is /usr/src/linux-2.4.20/include/asm-i386/types.h. By selecting this file in cscope, the file is opened, and the cursor is placed directly on the line containing the definition:

typedef unsigned char __u8;

 

So now substituting this definition into the failing line to see exactly what the compiler sees, we get:

unsigned char short slot_tablelen;

 

This certainly doesn’t look right given that a char and a short are two completely different primitive C data types. This appears to be a coding error or typo. The problem now is to try to figure out whether the code author actually wanted a char or a short. Just by considering the pathname of the failing code (drivers/ide/ide-cd.h) and the fact that this is a structure, I’m very hesitant to guess at the correct value since—if it is incorrect—that could be risking disk and filesystem corruption (IDE is a commonly used disk controller on x86-based computers).

With this kind of issue, it’s very likely that others have had the same problem. Because of this, searching the Internet should be the next step. Plugging ide-cd.c slot_tablelen into a search engine instantly shows that this has in fact been reported by others.

It took a bit of time to read through the top five or so results to find the solution, but it turned out that it was in fact a typo, and the line should instead have read:

__u16 slot_tablelen;

 

When making this correction to ide-cd.h and re-running the compilation, ide-cd.c successfully compiled!

So what is the real cause of this particular compile failure? I did a little more investigative work and found that it could be categorized as both a code error and a compiler difference. The code itself really is incorrect; two primitive data types should never be used to declare a single variable as was done in this case. It seems, however, that some compilers allow this particular case, and some don’t. Moreover, in this particular case, older versions of gcc allowed this where newer versions did not.

This type of compilation error is very common when porting applications from one platform to another, for example from IBM AIX to Linux or Sun Solaris to HP-UX. This simply means that the developers of each respective compiler interpret the C/C++ Standards and implement them differently.

4.4.4. General Compilation Problems

When compiling your own code, compilation errors are often very easy to resolve. When downloading source from reputable locations on the Internet, resolving compilation errors can be very difficult. It can be especially difficult if the source that has been downloaded has been officially released and tested thoroughly. The instant thought is that it is a user error and you must be doing something wrong. This isn’t always the case though, as I’ll attempt to point out.

Reiterating what was just stated, compilation errors are generally due to one or a combination of the following reasons:

  1. Environment/setup errors or differences
  2. Compiler version differences or bugs
  3. User error
  4. Code error

4.4.4.1. Environment/Setup Errors or Differences

There are an infinite number of ways to configure Linux system environments, so the chances for a setup or environment error are fairly high. Further adding to the possibility for problems is a correctly set up environment—but one that differs from the environment in which the source code was written. This is a very real problem, especially with Linux, simply because things change so quickly and are constantly evolving.

Some examples of environment errors or differences that could easily lead to compilation problems are

  • missing or outdated system include files
  • outdated or differing glibc libraries
  • insufficient disk space

Many modern software packages include scripts generated by the GNU autoconf package, which will automatically configure the Makefile(s) and source code based on the system’s environment. The use of autoconf will immediately flag any differences or problems it finds with header files or libraries before compilation even begins, which greatly simplifies the problem determination process. Sample autoconf output for the gcc 3.3.2 source is shown here. The output can be quite lengthy, so a large chunk in the middle has been cut out (<<...>>) to show the beginning and end output.

linux> ./configure 2>&1 | tee conf.out

Configuring for a i686-pc-linux-gnu host.

Created "Makefile" in /home/dbehman/gcc-3.3.2 using "mt-frag"

Configuring libiberty...

creating cache ../config.cache

checking whether to enable maintainer-specific portions of Makefiles... no

checking for makeinfo... no

checking for perl... perl

checking host system type... i686-pc-linux-gnu

checking build system type... i686-pc-linux-gnu

checking for ar... ar

checking for ranlib... ranlib

checking for gcc... gcc

checking whether we are using GNU C... yes

checking whether gcc accepts -g... yes

checking whether gcc and cc understand -c and -o together... yes

 

<<...>>

 

checking size of short... (cached) 2

checking size of int... (cached) 4

checking size of long... (cached) 4

checking size of long long... (cached) 8

checking byte ordering... (cached) little-endian

updating cache ../config.cache

creating ./config.status

creating Makefile

creating install-defs.sh

creating config.h

 

This isn’t a foolproof method, though, and compilation problems could still occur even after a successful configuration. If running configure with or without a series of command line parameters is documented as one of the first steps for compiling and installing the software, then it’s highly likely that autoconf is being used and your compilation experience has a greater chance of being problem-free.

4.4.4.2. Compiler Version Differences or Bugs

A compilation failure due to a compiler version difference or a bug can be very tricky to diagnose. For version differences, the good news is that the GNU Compiler Collection (GCC) is the most commonly used set of compilers on Linux systems, so the scope of determining differences is much smaller than on other systems. There are, however, a growing number of alternative compilers available for the various architectures on which Linux runs, so using a compiler other than GCC increases the chances of a compile failure. Because GCC is almost always installed on a Linux system, even when additional compilers are present, a good first step in diagnosing a compile failure with a different compiler is to re-attempt compilation with GCC. If GCC compiles without error, you’ve identified a compiler difference. It doesn’t necessarily mean that either of the compilers is wrong or has a bug. It could simply mean that the compilers interpret the programming standard differently.

Version differences within GCC can easily result in compile failures. The following example illustrates how the same code compiles clean, compiles with a warning, and fails with a compile error when using different versions of GCC. The source code is:

#include <stdio.h>

 

static const char msg[] = "This is a string

which spans a

couple of lines

to demonstrates differences

between gcc 2.96,

gcc 3.2,

and gcc 3.3";

 

int main( void )

{

   printf( "%s\n", msg );

 

   return 0;

}

 

Compiling and running this code with an older, pre-3.0 gcc (the session below reports version 2.95.3) produces the following:

penguin> gcc -v

Reading specs from /usr/lib/gcc-lib/i386-suse-linux/2.95.3/specs

gcc version 2.95.3 20010315 (SuSE)

penguin> gcc multiline.c

penguin> ./a.out

This is a string

which spans a

couple of lines

to demonstrates differences

between gcc 2.96,

gcc 3.2,

and gcc 3.3

 

The compilation was successful with no warnings, and running the resulting executable displays the desired message.

Compiling with gcc 3.2 produces the following:

penguin> gcc -v

Reading specs from /usr/lib64/gcc-lib/x86_64-suse-linux/3.2.2/specs

Configured with: ../configure --enable-threads=posix --prefix=/usr --with-local-prefix=/usr/local --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --enable-languages=c,c++,f77,objc,java,ada --enable-libgcj --with-gxx-include-dir=/usr/include/g++ --with-slibdir=/lib --with-system-zlib --enable-shared --enable-__cxa_atexit x86_64-suse-linux

Thread model: posix

gcc version 3.2.2 (SuSE Linux)

penguin> gcc multiline.c

multiline.c:3:27: warning: multi-line string literals are deprecated

penguin> ./a.out

This is a string

which spans a

couple of lines

to demonstrates differences

between gcc 2.96,

gcc 3.2,

and gcc 3.3

 

As the warning message states, multi-line string literals have been deprecated, but given this is just a warning, the compilation completes, and running the program produces our desired output.

Compiling the source with gcc 3.3 produces this:

penguin> /opt/gcc33/bin/gcc -v

Reading specs from /opt/gcc33/lib64/gcc-lib/x86_64-suse-linux/3.3/specs

Configured with: ../configure --enable-threads=posix --prefix=/opt/gcc33 --with-local-prefix=/usr/local --infodir=/opt/gcc33/share/info --mandir=/opt/gcc33/share/man --libdir=/opt/gcc33/lib64 --enable-languages=c,c++,f77,objc,java,ada --disable-checking --enable-libgcj --with-gxx-include-dir=/opt/gcc33/include/g++ --with-slibdir=/lib64 --with-system-zlib --enable-shared --enable-__cxa_atexit x86_64-suse-linux

Thread model: posix

gcc version 3.3 20030312 (prerelease) (SuSE Linux)

penguin> /opt/gcc33/bin/gcc multiline.c

multiline.c:3:27: missing terminating " character

multiline.c:4: error: parse error before "which"

multiline.c:9:11: missing terminating " character

 

Clearly, the error is due to the string spanning multiple lines. If gcc 3.3 is the compiler version to be used, the only solution is to fix the code as shown in this updated source:

#include <stdio.h>

 

static const char msg[] = "This is a string\n"

"which spans a\n"

"couple of lines\n"

"to demonstrates differences\n"

"between gcc 2.96,\n"

"gcc 3.2,\n"

"and gcc 3.3";

 

int main( void )

{

   printf( "%s\n", msg );

 

   return 0;

}

 

The point here is that code with strings spanning multiple lines certainly existed when gcc 2.96 was the most current version. If that code doesn’t get updated by its author(s) and users attempt to compile it with a newer version of gcc, they will get compile errors directly related to a compiler version difference. Some C purists could argue that the first version of the sample code is incorrect and should not have been used in the first place. However, the fact remains that at one time the compiler allowed it without warning; therefore, it will be used by many programmers. In fact, there were several instances in kernel code, mostly drivers, which had multi-line strings that have since been fixed.

Compiler bugs are certainly another very real possibility for compilation errors. A compiler is a piece of software written in a high-level language as well, so it is by no means exempt from the same kinds of bugs that exist in programs being compiled. As with all unexpected compilation errors and compiler behavior, the best way to determine the cause of the problem is to eliminate anything nonessential and set up a minimalist scenario; this generally means making a standalone test program that is made up of a very small number of source files and functions. The test program should do nothing but clearly demonstrate the problem. When this is achieved, this test program can be sent to the parties supporting the compiler. In the case of gcc, this would generally be the distribution’s support team, but it could be gcc developers at gnu.org directly.

4.4.4.3. User Error

Compilation failures due to user error are extremely common and could be the result of the user doing almost anything incorrectly. Some examples include:

  • Incorrectly expanding the source archive
  • Incorrectly applying a necessary patch
  • Incorrectly setting required flags, options, or environment variables
  • Executing make incorrectly
  • Using insufficient permissions
  • Downloading incorrect or insufficient packages

The list is endless...

Generally, software packages come with documents in the highest level directory of the archive with a name to catch your attention. INSTALL, README, or COMPILE are examples. These files usually contain excellent documentation and instructions for building the software package quickly and easily. If the instructions are followed along with a little care and common knowledge, building a software package should be an error free experience.

4.4.4.4. Code Error

Compilation failures due to code errors or bugs are the simplest way for a compilation to fail. In general, a source package gets thoroughly tested before it is made available. Testing cannot occur without compilation, so a very high percentage of compilation bugs, if not all, are flushed out during the testing phase. However, because of the huge variety of environments and compilation needs, real compile-time bugs can slip through the cracks to the end user. When this happens, the user can attempt to fix the problem on their own, but often correspondence with the author is required. A word of caution, though: The reason for discussing this cause last is to stress the importance of ensuring that a compilation failure isn’t due to any of the causes listed earlier before assuming it is a code error. Reporting a problem as a bug to an author when it is actually user error, for example, is frustrating for everyone, so it is important to be sure of the diagnosis.

It’s also very important to note that a compilation failure could very easily be due to a combination of code error and any of the other causes mentioned such as compiler version differences. Using the example of the string spanning multiple lines, even though gcc 2.96 happily compiled it without warning, this doesn’t necessarily mean that it is 100% “correct” code.

4.5. Assembly Listings

A key to debugging various software problems is the ability to generate an assembly listing of a particular source file. Assembly is one step above machine language and is therefore extremely terse and difficult to program in. Compilers for third-generation languages (3GL) such as C include a phase that converts the source code into assembly language. This assembly source is then passed to the assembler for conversion into machine language. Because assembly involves the direct management of low-level system hardware such as registers, machine instructions, and memory, examining the assembly listing produced by the compiler will illustrate exactly what will happen during execution. This is often invaluable in debugging large applications, as the C code can get quite complex and difficult to read.

4.5.1. Purpose of Assembly Listings

Applications, especially large ones such as database management systems or Web servers, will often include problem determination features used to dump information to log files, which can be examined later to determine the location and cause of a particular problem. In smaller applications, a debugger can be used to gather necessary information for tracking down the cause of the problem (see Chapter 6, “The GNU Debugger (GDB)” for more information). For speed and binary size reasons, an application’s binaries are as close to machine language as possible and contain very little human readable information. When a problem arises somewhere in the execution of the binaries, techniques are needed to convert the machine language into something somewhat human readable to be able to determine the cause of the problem. This could mean compiling the application with the -g flag, which adds extra debugging symbols to the resulting machine code binaries. These debug symbols are read by debuggers, which combine them with the assembly language interpretation of the machine language to make the binaries as human readable as possible. This adds a great deal of size overhead to the resulting binaries and is often not practical for everyday production use. What this means is that extra knowledge and skill are required of whoever is examining the problem because the highest level of readability that can be achieved is usually the assembly level.

The ultimate objective whenever a problem such as a segmentation fault or bus error occurs is to convert the machine language location of the trap into a high-level source line of code. When the source line of code is obtained, the developer can examine the area of code to see why the particular trap may have occurred. Sometimes the problem will be instantly obvious, and other times more diagnostics will be needed.

So how then does one even determine the machine language or assembly location of the trap? One way is to dump diagnostic information to a log file from an application’s problem determination facilities. Generally, the diagnostic information will include a stack traceback (see Chapter 5 for more information) as well as an offset into each function in the stack traceback that represents where the execution currently is or will return to in that function. This stack traceback information is usually very similar to the information that is obtainable using a debugger. The following output shows a stack traceback obtained after attaching gdb to an xterm process:

(gdb) where

#0  0x40398f6e in select () from /lib/i686/libc.so.6

#1  0x08053a47 in in_put ()

#2  0x080535d5 in VTparse ()

#3  0x080563d5 in VTRun ()

#4  0x080718f4 in main ()

(gdb) print &select

$1 = (<text variable, no debug info> *) 0x40398f50 <select>

(gdb) print /x 0x40398f6e - 0x40398f50

$2 = 0x1e

 

The important thing to understand is that from this simple output we know that execution is currently at offset 0x1e into the function select(). With the source code used to compile the /lib/i686/libc.so.6 library, we could then easily determine the exact line of source code by creating an assembly listing.

We will focus more on stack tracebacks, problem determination facilities, and using a debugger in other chapters. For now, we will concentrate on the steps needed to create an assembly listing and how to correlate an offset with a line of code.

4.5.2. Generating Assembly Listings

For most high-level language programmers, looking at raw assembly isn’t very easy and could take a great deal of time. Fortunately, mixing the high-level source code with the assembly is an option.

There are two ways to achieve this. One is with the objdump(1) utility. The other is to tell the compiler to stop at the assembly phase and write the listing to a file. This raw assembly listing can then be run through the system assembler with certain command line parameters to produce an output file that intermixes assembly and high-level source code. As an example, consider the source code saved in the file as_listing.c and listed here:

#include <stdio.h>

 

int main( void )

{

   int a = 5;

   int b = 3;

   int c = 0;

   char s[] = "The result is";

 

   c = a + b;

 

   printf( "%s %d\n", s, c );

   return 0;

}


 

A typical compilation of this source code would consist of running:

gcc -o as_listing as_listing.c

 

This produces the executable file as_listing. For gcc, specifying the -S flag causes the compilation to stop before the assembling phase and dump out the generated assembly code. By default, if -o is not used to specify an output filename, gcc converts the input source filename extension (such as .c) to .s. To be able to properly intermix the assembly and high-level source code, it is also required to use the -g flag to produce debug symbols and line number information. For the code in as_listing.c, to produce an assembly output, run the following command:

 

gcc as_listing.c -S -g

 

The resulting as_listing.s text file can be examined, but unless you know assembly very well, it likely won’t make much sense. This is where mixing in the high-level language comes into play. To do this, run the system assembler, as, with the command line arguments that turn on listings that include both assembly and high-level source:

 

as -alh as_listing.s > as_listing.s_c

Note: Certain compilations done using makefiles will compile a source file from a directory other than the current one. In this case, it may be necessary to run objdump or as -alh from the same directory in which the make process compiled the file.

4.5.3. Reading and Understanding an Assembly Listing

For the most part, the reason for examining an assembly listing is to determine how the compiler interpreted the high-level language and what assembly language resulted. An assembly listing can be quite intimidating at first glance. It’s important to understand that there is a lot of information in this file that is only of use to the system’s assembler. Much of this data can generally be ignored because often only a very specific area of the code is of interest, identified by a function name and an offset from a stack dump or stack traceback, for example. With this information and the assembly listing in hand, the first thing to do is to search for the function name in the assembly listing. The assembly listing from the code in as_listing.c is shown below:

  16                   .globl main

  17                           .type   main, @function

  18                   main:

  19                   .LFB3:

  20                           .file 1 "as_listing.c"

   1:as_listing.c **** #include <stdio.h>

   2:as_listing.c ****

   3:as_listing.c **** int main( void )

   4:as_listing.c **** {

  21                           .loc 1 4 0

  22 0000 55                   pushl   %ebp

  23                   .LCFI0:

  24 0001 89E5                 movl    %esp, %ebp

  25                   .LCFI1:

  26 0003 83EC28               subl    $40, %esp

  27                   .LCFI2:

  28 0006 83E4F0               andl    $-16, %esp

  29 0009 B8000000             movl    $0, %eax

  29      00

  30 000e 29C4                 subl    %eax, %esp

   5:as_listing.c ****    int a = 5;

  31                           .loc 1 5 0

  32                   .LBB2:

  33 0010 C745F405             movl    $5, -12(%ebp)

  33      000000

   6:as_listing.c ****    int b = 3;

  34                           .loc 1 6 0

  35 0017 C745F003             movl    $3, -16(%ebp)

  35      000000

   7:as_listing.c ****    int c = 0;

  36                           .loc 1 7 0

  37 001e C745EC00             movl    $0, -20(%ebp)

  37      000000

   8:as_listing.c ****    char s[] = "The result is";

^LGAS LISTING as_listing.s                     page 2

 

 

  38                           .loc 1 8 0

  39 0025 A1000000             movl    .LC0, %eax

  39      00

  40 002a 8945D8               movl    %eax, -40(%ebp)

  41 002d A1040000             movl    .LC0+4, %eax

  41      00

  42 0032 8945DC               movl    %eax, -36(%ebp)

  43 0035 A1080000             movl    .LC0+8, %eax

  43      00

  44 003a 8945E0               movl    %eax, -32(%ebp)

  45 003d 66A10C00             movw    .LC0+12, %ax

  45      0000

  46 0043 668945E4             movw    %ax, -28(%ebp)

   9:as_listing.c ****

  10:as_listing.c ****    c = a + b;

 

As you can see, assembly is quite terse and much more lengthy than C! The numbers at the far left that start at 16 and go up to 46 are the assembly listing line numbers. If you open up as_listing.s and look at line 16 for example, you will see:

.globl main

 

Some of the assembly listing lines have a four-digit hexadecimal number immediately to the right of the listing line number. This is the offset, and it is a very important number. In the preceding assembly listing we can see that the start of the main() function has an offset of 0:

 

22 0000 55                 pushl    %ebp

 

It’s also important to understand that the first assembly instruction in a function on x86-based hardware is normally pushl %ebp, which pushes the calling function’s frame pointer onto the stack. The exception is code compiled without frame pointers, as described in the following note.

Note: GCC provides the -fomit-frame-pointer optimization flag. On some architectures, such as x86-64, it is turned on by default when optimizing, for example at the -O2 optimization level (see the section, “Compiler Optimizations,” for more information). When an object is compiled with this flag, there will not be any instructions at the beginning of a function to save and set up the frame registers. Doing this is advantageous for performance reasons because an extra register is freed up for other uses and a few fewer instructions are run. Compiling with this option can make debugging a little more difficult, so caution may be warranted depending on your application’s needs. The SuSE distributions on x86-64 are compiled with this flag, so manual stack analysis using the frame registers is not possible. On the 32-bit x86 architecture, the GCC versions discussed here do not turn the flag on automatically at any optimization level because it interferes with debugging there; it takes effect only if specified explicitly.

 

Note also that the start of a function is not necessarily at offset 0; it could be any value. If it isn’t 0 and you have an offset into the function to look for, simply subtract the function’s starting offset from the offsets shown in the listing (or, equivalently, add it to the offset you are looking for) before comparing. This is a rule of thumb to always follow. In the example of main() just shown, the function starts at 0, so the listing offsets can be used as they are.

Now, looking at the line immediately above assembly line 22, we see this:

21            .loc 1 4 0

 

The .loc is an assembly directive that stands for “line of code.” Each of the numbers that follow it has its own significance. The first number, 1 in this example, indicates which source file this line of code comes from. To determine which file 1 represents, you need to look through the assembly listing for the .file directives. Looking again at the assembly listing, we see the following line:

20            .file 1 "as_listing.c"

 

The 1 after .file indicates which file number this is, and the string that follows is the filename itself. The code in as_listing.c is very simple, so there is only one file. In more complex programs, though, especially when inline functions and macros defined in #include’d files are used, there could be several .file directives, so it’s important to understand this.

The next number after the 1 in the .loc directive, 4 in this example, is the line number within the file referred to by 1. For our example, line 4 of as_listing.c is in fact

{

 

which indicates the start of the main() function.

The final number in the .loc directive, 0, is meant to be the column number; however, for objects compiled with the GCC versions discussed here, this value is always 0.

As an aside, during this writing, I did not yet know what the final number in the .loc directive meant. Searching through documentation did not uncover any information other than indicating it was a “column” number. I then decided to harness the power of open source and look at the source code itself to see where that number came from. I found that the .loc directive is indeed emitted by GCC. I downloaded the gcc-3.3.2.tar.gz source tar-ball from ftp.gnu.org and untarred it. I then searched through the source for anything to do with .loc. The function dwarf2out_source_line, as the name implies, writes out information related to the source code line and is found in the file gcc/dwarf2out.c. The lines of interest from that function are

/* Emit the .loc directive understood by GNU as.  */

fprintf (asm_out_file, "\t.loc %d %d 0\n", file_num, line);

 

As you can see, the “0” is constant; therefore I can only assume that it’s not used for anything important. In any case, the point of discussing this is to show just how powerful open source can be. With a little knowledge and motivation, learning what makes things tick for any purpose is easily within everyone’s reach.

4.6. Compiler Optimizations

High-level source compilers have the complex task of converting human readable source code into machine-specific assembly language. This task is complicated even more by various optimization options and levels that are applied during compilation. The intent here is not to give an exhaustive reference of what each optimization option and level does, but rather to make you aware of what effects compiler optimizations can have on your resulting binaries and the ability to debug them. For simplicity, only GCC will be discussed.

There are several optimization levels that perform varying degrees of optimization. For GCC, these levels are specified with the -O parameter immediately followed by an optional level specifier. -O with no specifier implies -O1, which is the first optimization level. Refer to the gcc(1) man page for more information and to see exactly which individual optimization flags are enabled at each level. It’s important to understand that the -O2 and -O3 levels perform additional optimizations that consume more resources during compile time but will likely result in faster binaries.

A crucial aspect of compiler optimizations to understand is that debugging abilities are inhibited as more optimizations are performed. This is because for optimizations to work, the compiler must be free to rearrange and manipulate the resulting assembly code in any way it wishes, while of course not changing the desired programming logic. Because of this, correlating a line of C code, for example, directly to a small set of assembly instructions becomes more difficult. This results in a trade-off that the developer must deal with. Having the application run as fast as it possibly can is of course an important goal, but if the resulting application cannot be easily debugged and serviced, huge customer satisfaction issues could result. A good compromise is to use the -O2 optimization level. Excellent optimizations are performed while the ability to debug is still within reach. It’s also important to understand that a real coding bug in the source exists regardless of the optimization level. Therefore, when a bug is found in binaries compiled with -O2, for example, simply recompile with -O0 and the same bug should normally occur, although its symptoms may differ. The application can then be debugged with a debugger, which will easily allow the developer to examine all variables and step through individual lines of source.

Everything good has its cost, and with optimizations one of the costs is usually increased code size. Again, the GCC man page and manual detail exactly what each of the options and levels does and how it affects size, but in general code size tends to increase with optimization levels 1, 2, and 3. There is also -Os, which optimizes the code for size. This is a good consideration for the AMD Opteron-based architecture, for example, as the cache size is relatively small at 1MB.

To illustrate how much the compiler optimization can affect binaries, let’s use the following very simple code and call it simple.c:

#include <stdio.h>

 

int add_them_up( int a, int b, int c )

{

   return a + b + c;

}

 

int main( void )

{

   int a = 1;

   int b = 2;

   int c = 3;

   int z = 0;

 

   z = add_them_up( 1, 2, 3);

 

   printf( "Answer is: %d\n", z );

 

   return 0;

}

 

Next, produce two separate assembly listings, one compiled with -O0 and one with -O3, by doing the following:

penguin$ gcc -O0 -S -g simple.c

penguin$ as -alh simple.s > simple.out.no-opt

penguin$ gcc -O3 -S -g simple.c

penguin$ as -alh simple.s > simple.out.opt

 

The as -alh command produces a lot of additional symbol information that we’re generally not interested in. Omitting the uninteresting parts of simple.out.no-opt, here’s the output:

  10      .globl add_them_up

  11             .type   add_them_up, @function

  12      add_them_up:

  13      .LFB3:

  14             .file 1 "simple.c"

   1:simple.c      **** #include <stdio.h>

   2:simple.c      ****

   3:simple.c      **** int add_them_up( int a, int b, int c )

   4:simple.c      **** {

  15             .loc 1 4 0

  16 0000 55            pushl  %ebp

  17      .LCFI0:

  18 0001 89E5          movl   %esp, %ebp

  19      .LCFI1:

   5:simple.c      ****   return a + b + c;

  20             .loc 1 5 0

  21 0003 8B450C        movl   12(%ebp), %eax

  22 0006 034508        addl   8(%ebp), %eax

  23 0009 034510        addl   16(%ebp), %eax

   6:simple.c      **** }

  24             .loc 1 6 0

  25 000c 5D            popl   %ebp

  26 000d C3            ret

  27      .LFE3:

  28             .size  add_them_up, .-add_them_up

  29             .section      .rodata

  30      .LC0:

  31 0000 416E7377             .string "Answer is: %d\n"

  31      65722069

  31      733A2025

  31      640A00

  32             .text

  33      .globl main

  34             .type  main, @function

  35      main:

  36      .LFB5:

   7:simple.c      ****

   8:simple.c      **** int main( void )

   9:simple.c      **** {

  37             .loc 1 9 0

  38 000e 55            pushl  %ebp

  39      .LCFI2:

  40 000f 89E5          movl   %esp, %ebp

  41      .LCFI3:

GAS LISTING simple.s                  page 2

 

  42 0011 83EC18        subl   $24, %esp

  43      .LCFI4:

  44 0014 83E4F0        andl   $-16, %esp

  45 0017 B8000000             movl   $0, %eax

  45      00

  46 001c 29C4          subl   %eax, %esp

  10:simple.c      ****    int a = 1;

  47             .loc 1 10 0

  48      .LBB2:

  49 001e C745FC01             movl   $1, -4(%ebp)

  49      000000

  11:simple.c      ****    int b = 2;

  50             .loc 1 11 0

  51 0025 C745F802             movl   $2, -8(%ebp)

  51      000000

  12:simple.c      ****    int c = 3;

  52             .loc 1 12 0

  53 002c C745F403             movl   $3, -12(%ebp)

  53      000000

  13:simple.c      ****    int z = 0;

  54             .loc 1 13 0

  55 0033 C745F000             movl   $0, -16(%ebp)

  55      000000

  14:simple.c      ****

  15:simple.c      ****    z = add_them_up( 1, 2, 3);

  56             .loc 1 15 0

  57 003a 83EC04        subl   $4, %esp

  58 003d 6A03          pushl  $3

  59 003f 6A02          pushl  $2

  60 0041 6A01          pushl  $1

  61      .LCFI5:

  62 0043 E8FCFFFF             call add_them_up

  62      FF

  63 0048 83C410        addl   $16, %esp

  64 004b 8945F0        movl   %eax, -16(%ebp)

  16:simple.c      ****

  17:simple.c      ****   printf( "Answer is: %d\n", z );

  65             .loc 1 17 0

  66 004e 83EC08        subl   $8, %esp

  67 0051 FF75F0        pushl  -16(%ebp)

  68 0054 68000000             pushl  $.LC0

  68      00

  69 0059 E8FCFFFF             call   printf

  69      FF

  70 005e 83C410        addl   $16, %esp

  18:simple.c      ****

  19:simple.c      ****   return 0;

  71             .loc 1 19 0

  72 0061 B8000000             movl   $0, %eax

  72      00

  20:simple.c      **** }

  73             .loc 1 20 0

  74 0066 C9            leave

  75 0067 C3            ret

  76      .LBE2:

  77      .LFE5:

  78             .size  main, .-main

GAS LISTING simple.s                  page 3

 

Next, let’s cut the uninteresting parts of simple.out.opt and examine it more closely:

  15      .globl main

  16             .type main, @function

  17      main:

  18      .LFB14:

  19             .file 1 "simple.c"

   1:simple.c      **** #include <stdio.h>

   2:simple.c      ****

   3:simple.c      **** int add_them_up( int a, int b, int c )

   4:simple.c      **** {

   5:simple.c      ****    return a + b + c;

   6:simple.c      **** }

   7:simple.c      ****

   8:simple.c      **** int main( void )

   9:simple.c      **** {

  20             .loc 1 9 0

  21 0000 55            pushl   %ebp

  22      .LCFI0:

  23 0001 89E5          movl    %esp, %ebp

  24      .LCFI1:

  25 0003 52            pushl   %edx

  26 0004 52            pushl   %edx

  27 0005 83E4F0        andl    $-16, %esp

  10:simple.c      ****   int  a = 1;

  11:simple.c      ****   int  b = 2;

  12:simple.c      ****   int  c = 3;

  13:simple.c      ****   int  z = 0;

  14:simple.c      ****

  15:simple.c      ****   z =  add_them_up( 1, 2, 3);

  16:simple.c      ****

  17:simple.c      ****   printf( "Answer is: %d\n", z );

  28             .loc 1 17 0

  29      .LBB2:

  30 0008 50            pushl   %eax

  31 0009 50            pushl   %eax

  32 000a 6A06          pushl   $6

  33 000c 68000000              pushl $.LC0

GAS LISTING simple.s                  page 2

 

 

  33      00

  34      .LCFI2:

  35 0011 E8FCFFFF              call  printf

  35      FF

  18:simple.c      ****

  19:simple.c      ****    return 0;

  20:simple.c      **** }

  36             .loc 1 20 0

  37      .LBE2:

  38 0016 31C0          xorl    %eax, %eax

  39 0018 C9            leave

  40 0019 C3            ret

  41      .LFE14:

  42             .size  main, .-main

  43 001a 8DB60000             .p2align 4,,15

  43      0000

  44      .globl add_them_up

  45             .type  add_them_up,  @function

  46      add_them_up:

  47      .LFB25:

  48             .loc 1 4 0

  49 0020 55            pushl   %ebp

  50      .LCFI3:

  51 0021 89E5          movl    %esp, %ebp

  52      .LCFI4:

  53 0023 8B450C        movl    12(%ebp), %eax

  54             .loc 1 5 0

  55 0026 8B4D08        movl    8(%ebp), %ecx

  56 0029 8B5510        movl    16(%ebp), %edx

  57 002c 01C8          addl    %ecx, %eax

  58 002e 01D0          addl    %edx, %eax

  59             .loc 1 6 0

  60 0030 5D            popl    %ebp

  61 0031 C3            ret

  62      .LFE25:

  63             .size  add_them_up,  .-add_them_up

 

Comparing these two assembly listings, we can quickly see that the interesting parts in the optimized version are smaller than the interesting parts in the non-optimized version. For a small and simple program such as simple.c, this is expected because the compiler optimizations strip out unnecessary instructions and make other changes in favor of performance. Remember that every assembly instruction in a program takes a finite amount of time to run on a processor, so fewer assembly instructions generally lead to greater performance. Of course, optimization techniques go well above and beyond that statement and could easily fill a specialized book.

A very interesting observation about the two assembly listings is that the optimized listing file is almost exactly twice the size of the non-optimized one:

penguin> ls -l simple.out.opt simple.out.no-opt

-rw-r--r--  1 dbehman  users   17482 2004-08-30 21:46 simple.out.no-opt

-rw-r--r--  1 dbehman  users   35528 2004-08-30 21:46 simple.out.opt

 

So even though the assembly instructions are streamlined, the optimized listing contains a lot of extra data compared to the non-optimized one. A significant portion of this extra data is additional debug information. When the compiler applies its various optimization techniques, debugging capability is often sacrificed, as mentioned earlier. To compensate, the compiler adds more debugging information to the assembly, hence the larger file size.

Examining the instructions in the two listings more closely shows that simple.out.opt looks considerably more compact and sophisticated than the non-optimized assembly. You should also notice right away that something strange has happened to the add_them_up() function in simple.out.opt: it was placed after main instead of before main, as it is in the non-optimized version. This confuses the as -alh command, so the C source code is not properly intermixed with the assembly for that function. In the non-optimized listing, the C source is nicely intermixed with the add_them_up() assembly, which makes it easy to read and to associate assembly instructions with lines of C source code.

Let’s look a little closer at the assembly generated in each listing for this line of C source code:

z = add_them_up( 1, 2, 3);

 

In the associated assembly we would expect to see a call add_them_up instruction. We do see one in simple.out.no-opt, but not in simple.out.opt! What happened? Let’s look closer at the area of the optimized assembly listing where we would expect the call to add_them_up():

  15:simple.c      ****    z = add_them_up( 1, 2, 3);

  16:simple.c      ****

  17:simple.c      ****    printf( "Answer is: %d\n", z );

  28             .loc 1 17 0

  29      .LBB2:

  30 0008 50            pushl   %eax

  31 0009 50            pushl   %eax

  32 000a 6A06          pushl   $6

  33 000c 68000000              pushl $.LC0

GAS LISTING simple.s                  page 2

 

  33      00

  34      .LCFI2:

  35 0011 E8FCFFFF              call printf

 

We can see that no assembly is associated with line 15 of the C source code, which is where we call add_them_up() with the constant values 1, 2, and 3. Note the two pushl instructions that immediately precede the call printf instruction. These instructions are part of the procedure calling convention; see the section on calling conventions for more details. The basic idea is that on the i386 architecture, procedure arguments are pushed onto the stack in reverse order. Our call to printf takes two arguments: the string “Answer is: %d\n” and the variable z. Because z is the last argument, it is pushed first, so the first of these two pushes is for the variable z:

32 000a 6A06          pushl $6
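The reverse-order rule can be sketched with a toy model in C. This is only an illustration under stated assumptions: the names toy_stack, toy_push, toy_load, and toy_add_them_up_call are hypothetical, the stack is modeled as a small array of 4-byte words, and the return address and saved %ebp are represented by placeholder zeros. It mimics a caller that pushes the arguments 3, 2, 1 and a callee that reads them back at 8(%ebp), 12(%ebp), and 16(%ebp), as the add_them_up listing does.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a downward-growing i386 stack; each slot is one 4-byte word.
 * Illustration only: names and layout are assumptions, not real hardware. */
#define STACK_WORDS 16

typedef struct {
    int32_t words[STACK_WORDS];
    int     sp;                 /* index of the current top-of-stack slot */
} toy_stack;

static void toy_init(toy_stack *s) { s->sp = STACK_WORDS; }

/* pushl: move the stack pointer down one slot, then store the value. */
static void toy_push(toy_stack *s, int32_t value)
{
    assert(s->sp > 0);          /* guard against overflowing the toy stack */
    s->words[--s->sp] = value;
}

/* Read the word at byte offset `off` from frame pointer `ebp`,
 * mirroring operands such as 8(%ebp). */
static int32_t toy_load(const toy_stack *s, int ebp, int off)
{
    return s->words[ebp + off / 4];
}

/* Simulate `pushl $3; pushl $2; pushl $1; call add_them_up` followed by
 * the callee prologue `pushl %ebp; movl %esp, %ebp`. */
int toy_add_them_up_call(void)
{
    toy_stack s;
    toy_init(&s);

    toy_push(&s, 3);            /* last argument is pushed first */
    toy_push(&s, 2);
    toy_push(&s, 1);            /* first argument is pushed last */
    toy_push(&s, 0);            /* placeholder: return address pushed by call */
    toy_push(&s, 0);            /* placeholder: saved %ebp */
    int ebp = s.sp;             /* movl %esp, %ebp */

    int a = toy_load(&s, ebp, 8);   /* first argument  */
    int b = toy_load(&s, ebp, 12);  /* second argument */
    int c = toy_load(&s, ebp, 16);  /* third argument  */
    return a + b + c;
}
```

Pushing in reverse order is what puts the first argument at the lowest offset from the frame pointer, so the callee can find it at a fixed location regardless of how many arguments follow.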

Note: By default, GCC for Linux uses the AT&T assembly syntax rather than the Intel syntax. The primary differences between the two syntaxes are listed in Table 4.1.

Table 4.1. Assembly Syntax Comparison.

 

                                       | AT&T                | Intel
Operand order                          | source, destination | destination, source
Register naming                        | Prefix of %         | No prefix
Constant values                        | Prefix of $         | No prefix
Example (move the constant 1 into eax) | mov $1, %eax        | mov eax, 1

 

So the compiler’s optimizer was smart enough to recognize that add_them_up() was simply adding constant values and returning the result. Function calls are relatively expensive in terms of performance, so avoiding one where possible is a real gain. This is why the compiler skips the call to add_them_up() entirely and simply calls printf with the precomputed value of 6.
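Expressed back in C, the effect of this optimization looks roughly like the sketch below. The signature of add_them_up is an assumption based on how it is called, and answer_as_written, answer_as_optimized, and check_forms_agree are hypothetical names introduced only for illustration. The compiler does not rewrite the source; this merely shows two forms that generate equivalent results.

```c
#include <assert.h>

/* Assumed shape of the function in simple.c: sum three integers. */
static int add_them_up(int a, int b, int c)
{
    return a + b + c;
}

/* What the source says: call the function with three constants. */
int answer_as_written(void)
{
    return add_them_up(1, 2, 3);
}

/* What the optimized listing effectively does: the call is gone and
 * 1 + 2 + 3 has been computed at compile time, leaving `pushl $6`. */
int answer_as_optimized(void)
{
    return 6;
}

/* The transformation is only valid because both forms always agree. */
void check_forms_agree(void)
{
    assert(answer_as_written() == answer_as_optimized());
}
```

The optimizer can only do this because every argument is a compile-time constant and the function has no side effects; with values read at runtime, the additions would still have to be performed when the program runs.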

To take our examination a step further, let’s create an assembly listing for simple.c at the -O2 optimization level to see how it affects our calls to add_them_up() and printf(). The section of interest is:

15:simple.c      ****   z =  add_them_up( 1, 2, 3);

49                            .loc 1 15 0

50                   .LBB2:

51 0028 50                    pushl   %eax

52 0029 6A03                  pushl   $3

53 002b 6A02                  pushl   $2

54 002d 6A01                  pushl   $1

55                   .LCFI4:

56 002f E8FCFFFF              call    add_them_up

56      FF

57 0034 5A                    popl    %edx

58 0035 59                    popl    %ecx

16:simple.c      ****

17:simple.c      ****   printf( "Answer is: %d\n", z );

59                            .loc 1 17 0

60 0036 50                    pushl   %eax

61 0037 68000000              pushl   $.LC0

61      00

62 003c E8FCFFFF              call    printf

 

The setup prior to the printf call looks very similar to that in the non-optimized assembly listing, but there is one significant difference. At the -O2 level, the %eax register is pushed onto the stack directly for our variable z. Recall from the section on calling conventions that %eax always holds a procedure’s return value. The GCC optimizer at -O2 is smart enough to know that it can simply leave the return value from add_them_up() in %eax and push that register directly onto the stack for the call to printf. The non-optimized assembly instead copies the return value from %eax to the location on the stack where the variable z is stored, and then pushes the value from that location in preparation for the call to printf. The extra round trip through memory is much more expensive than working with registers.
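The difference between the two code shapes can be sketched in C as follows. This is an analogy, not a guarantee: the C language does not dictate where z lives, and the compiler is free to generate the same code for both functions. The names format_via_local, format_direct, and check_same_output are hypothetical, and snprintf is used instead of printf only so that the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed shape of the function in simple.c. */
static int add_them_up(int a, int b, int c) { return a + b + c; }

/* Non-optimized shape: the result is stored in the local variable z
 * (a stack slot at -O0) and read back when it is passed on. */
int format_via_local(char *buf, size_t len)
{
    int z = add_them_up(1, 2, 3);
    return snprintf(buf, len, "Answer is: %d\n", z);
}

/* -O2 shape: the return value is handed straight to the next call,
 * with no named intermediate storage. */
int format_direct(char *buf, size_t len)
{
    return snprintf(buf, len, "Answer is: %d\n", add_them_up(1, 2, 3));
}

/* Both shapes must produce identical text; only the generated code differs. */
void check_same_output(void)
{
    char via_local[32], direct[32];
    format_via_local(via_local, sizeof via_local);
    format_direct(direct, sizeof direct);
    assert(strcmp(via_local, direct) == 0);
}
```

There is no need to write source code in the second style for performance reasons; at -O2 the optimizer removes the intermediate store on its own, which is exactly what the listing shows.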

Another major difference between the assembly at the -O2 and -O3 levels is that -O2 still makes a call to add_them_up(), just as the non-optimized assembly does. This tells us that some optimization specific to the -O3 level eliminates the unnecessary function call. Looking at the GCC(1) man page, we see the following:

-O3 Optimize yet more.  -O3 turns on all optimizations specified by

   -O2 and also turns on the -finline-functions, -fweb,

   -funit-at-a-time, -ftracer, -funswitch-loops and

   -frename-registers options.

 

Of the options enabled at -O3, -finline-functions looks very interesting. The man page documents it as follows:

-finline-functions

   Integrate all simple functions into their callers.  The compiler

   heuristically decides which functions are simple enough to be

   worth integrating in this way.

 

   If all calls to a given function are integrated, and the function

   is declared "static," then the function is normally not output as

   assembler code in its own right.

 

   Enabled at level -O3.

 

This explains exactly what we observed with the call to add_them_up() at the -O3 level. To confirm, we can produce another assembly listing with the following commands:

penguin$ gcc -O2 -finline-functions -S -g simple.c

penguin$ as -alh simple.s > simple.out.opt-O2-finline_functions

 

The interesting part of simple.out.opt-O2-finline_functions shows:

  15:simple.c      ****    z = add_them_up( 1, 2, 3);

  16:simple.c      ****

  17:simple.c      ****    printf( "Answer is: %d\n", z );

  28                            .loc 1 17 0

  29                    .LBB2:

  30 0008 50                    pushl   %eax

  31 0009 50                    pushl   %eax

  32 000a 6A06                  pushl   $6

  33 000c 68000000              pushl   $.LC0

GAS LISTING simple.s.opt-O2-inline-functions          page 2

 

  33      00

  34                    .LCFI2:

  35 0011 E8FCFFFF              call    printf

 

Bingo! We have identified a specific change in the generated assembly caused by one additional compiler optimization switch.
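Inlining can also be influenced per function from the source code. The sketch below is a minimal illustration, assuming GCC or a compatible compiler: __attribute__((noinline)) is a GCC extension rather than standard C, and add_them_up_noinline, inlinable_answer, called_answer, and check_same_result are hypothetical names. Comparing the listings of the two callers at -O3 is one way to see the effect, although other optimizations may still alter or remove a call to a function that has no side effects.

```c
#include <assert.h>

/* `static` gives the function internal linkage. Per the man page text
 * quoted earlier, once every call has been inlined GCC normally emits
 * no standalone copy of such a function. */
static int add_them_up(int a, int b, int c)
{
    return a + b + c;
}

/* GCC extension: ask the compiler never to consider this function for
 * inlining, so a real call is more likely to remain in the listing. */
static int __attribute__((noinline)) add_them_up_noinline(int a, int b, int c)
{
    return a + b + c;
}

/* Candidate for inlining and constant folding at -O3. */
int inlinable_answer(void) { return add_them_up(1, 2, 3); }

/* Calls the variant that is marked as not inlinable. */
int called_answer(void) { return add_them_up_noinline(1, 2, 3); }

/* Inlining changes the generated code, never the result. */
void check_same_result(void)
{
    assert(inlinable_answer() == called_answer());
}
```

Marking a function this way can be handy when investigating a problem, because it keeps the function visible as a separate unit in listings and debuggers even at high optimization levels.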

4.7. Conclusion

With the information presented in this chapter, you should now be much better equipped to handle the many compilation-related problems that may come your way. Combined with more knowledge of problem determination at runtime, this puts you well on your way to becoming a completely self-sufficient Linux user.

 


Reposted from blog.csdn.net/mounter625/article/details/102712026