U-Boot: the u-boot.lds file

Note: this article is based on u-boot 2016.03 for i.MX6, built with Yocto.

0. Basic concepts

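Before going through the script, we need some basic concepts of the linker script language.
The linker combines all of its input files into one output file. The output file and each input file are in a special data format known as an object file format, and each file is called an object file. The output file is often called an executable, but we will still call it an object file. Each object file contains a number of sections. A section in an input file is called an input section; likewise, a section in the output file is called an output section.
Each section in an object file has a name and a size. Most sections also have associated data, the section contents. A section may be loadable, meaning its contents are loaded into memory when the output file is run. A section with no contents may be allocatable, meaning memory is set aside for it but nothing is loaded there (for example, memory that must be zeroed, such as .bss). A section that is neither loadable nor allocatable typically contains debugging information.
Every loadable or allocatable output section has two addresses. The VMA (virtual memory address) is the run-time address, the address the section has when the program runs. The LMA (load memory address) is the address at which the section is loaded. In most cases the two are the same. They differ when, for example, a data section is loaded into ROM and copied into RAM when the program starts: the ROM address is then the LMA and the RAM address is the VMA.
Use objdump -h to view the sections of an object file, including each section's VMA and LMA.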
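To make VMA/LMA and loadable vs. allocate-only sections concrete, here is a minimal C sketch of what startup code on a ROM-based system does with them. The struct section_desc type, the startup_copy() function and the flat mem array standing in for the address space are all hypothetical, not taken from U-Boot or ld.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical section descriptor with just the fields needed here. */
struct section_desc {
    uint32_t vma;          /* run-time address */
    uint32_t lma;          /* load address (where the bytes sit after loading) */
    uint32_t size;         /* size in bytes */
    int      has_contents; /* 1: loadable, e.g. .data; 0: allocate-only, e.g. .bss */
};

/* Simulated startup code of a ROM-based system. mem models the whole
   address space, so an address is simply an index into it. */
void startup_copy(uint8_t *mem, const struct section_desc *secs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const struct section_desc *s = &secs[i];
        if (s->has_contents) {
            /* Loadable: copy the contents from the LMA to the VMA if they differ. */
            if (s->lma != s->vma)
                memmove(mem + s->vma, mem + s->lma, s->size);
        } else {
            /* Allocate-only: nothing to copy, the memory is just zeroed. */
            memset(mem + s->vma, 0, s->size);
        }
    }
}
```

A .data section would be described with its RAM address as vma and its ROM address as lma; for most sections vma equals lma and nothing is copied.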
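Every object file also has a list of symbols, called the symbol table. A symbol may be defined or undefined. Every symbol has a name, and every defined symbol has an address, among other information. When you compile a C/C++ program, every defined function, global variable and static variable gets a defined symbol; every function or global variable that is referenced but not defined in that file becomes an undefined symbol.
Use objdump -t to view the symbol table.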

1. Annotated u-boot.lds

./u-boot.lds or build_output_file_dir/u-boot.lds

OUTPUT_FORMAT("elf32-littlearm", "elf32-littlearm", "elf32-littlearm")

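/* The output file is ELF, for 32-bit little-endian ARM.
The syntax of OUTPUT_FORMAT is:
OUTPUT_FORMAT(default, big, little)
From the ld manual:
If neither '-EB' nor '-EL' are used, then the output format will be the first argument, default. If '-EB' is used, the output format will be the second argument, big. If '-EL' is used, the output format will be the third argument, little.
In other words: if the link command line contains -EB, the second argument is used; if it contains -EL, the third; if neither, the first. The i.MX6Q U-Boot build passes neither option (and since all three arguments here are "elf32-littlearm", the result would be the same anyway). For the meaning of -EB/-EL, see the GNU ld documentation.
ELF is the storage format for executables, relocatable files (.o), shared objects (.so) and core dumps; readelf shows the contents of such a file. See the References for details.
*/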
OUTPUT_ARCH(arm)
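/* The target platform of the output file is ARM.
Syntax: OUTPUT_ARCH(bfdarch)
Specify a particular output machine architecture. The argument is one of the names used by the BFD library.
objdump -f shows the architecture of a file.

BFD (Binary File Descriptor library) is the base library of many binary tools (nm, objdump, ar, as, ...). It can analyze, create and modify binary files, and supports many platforms (x86, ARM, ...) and binary formats (ELF, core files, shared objects, ...).
*/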
ENTRY(_start)
/*
The first instruction to execute in a program is called the entry point. You can use the ENTRY linker script command to set the entry point. The argument is a symbol name:
    ENTRY(symbol)
There are several ways to set the entry point. The linker will set the entry point by trying each of the following methods in order, and stopping when one of them succeeds:
• the ‘-e’ entry command-line option;
• the ENTRY(symbol) command in a linker script;
• the value of the symbol start, if defined;
• the address of the first byte of the ‘.text’ section, if present;
• The address 0.

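Sets the entry point. _start is the first address in System.map, i.e. the first U-Boot instruction executed (on i.MX6 the boot ROM runs first and then jumps here). In this version _start is the label at the top of the exception vector table in arch/arm/lib/vectors.S, placed first by *(.vectors) below; its reset vector branches to reset in arch/arm/cpu/armv7/start.S.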
*/


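SECTIONS
{
 . = 0x00000000; /* start address: set the location counter (".") to 0x00000000 */
 . = ALIGN(4); /* align to 4 bytes */
 .text : /* the first output section is the code section */
 {
  *(.__image_copy_start) /* start of the image to be copied. It pairs with __image_copy_end; both are defined in arch/arm/lib/sections.c. Together they enclose the part of the image that is copied during relocation, which can also be seen in the map file. Relocation will be covered later. */
  *(.vectors) /* exception vectors, in arch/arm/lib/vectors.S (covered later) */
  arch/arm/cpu/armv7/start.o (.text*) /* code of start.o */
  *(.text*) /* all other code */
 }
 . = ALIGN(4);
 .rodata : { *(SORT_BY_ALIGNMENT(SORT_BY_NAME(.rodata*))) } /* read-only data. SORT_BY_NAME and SORT_BY_ALIGNMENT are ld sorting keywords; nested this way, input sections are sorted by alignment first and by name second. */
 . = ALIGN(4);
 .data : { /* read-write data */
  *(.data*)
 }
 . = ALIGN(4);
 . = .;
 . = ALIGN(4);
 .u_boot_list : { /* all U-Boot commands (and other linker-list entries) are collected here; at run time a command is looked up by walking this address range */
  KEEP(*(SORT(.u_boot_list*))); /* every entry has its own section, see the definitions in include/linker_lists.h. SORT (an alias for SORT_BY_NAME) sorts them by name; KEEP is explained below. */
 }
 . = ALIGN(4);
 .image_copy_end :
 {
  *(.__image_copy_end)
 }
 .rel_dyn_start :
 {
  *(.__rel_dyn_start)
 }
 .rel.dyn : {
  *(.rel*) /* dynamic relocation entries. ARM U-Boot is linked with -pie, so the linker emits R_ARM_RELATIVE entries into .rel.dyn; relocate_code() uses them to fix up absolute addresses after U-Boot has copied itself to its final address in RAM. */
 }
 .rel_dyn_end :
 {
  *(.__rel_dyn_end)
 }
 .end :
 {
  *(.__end)
 }
 _image_binary_end = .;
 . = ALIGN(4096);
 .mmutable : {
  *(.mmutable)
 }
 .bss_start __rel_dyn_start (OVERLAY) : { /* the explicit address __rel_dyn_start puts .bss at the same place as .rel.dyn. (OVERLAY) is a backward-compatibility section type that marks the section as not allocatable (see 2.1), so it takes no space in the image. Sharing the space is safe because .bss is cleared only after relocation, when .rel.dyn is no longer needed. */
  KEEP(*(.__bss_start));
  __bss_base = .;
 }
 .bss __bss_base (OVERLAY) : {
  *(.bss*)
   . = ALIGN(4);
   __bss_limit = .;
 }
 .bss_end __bss_limit (OVERLAY) : {
  KEEP(*(.__bss_end));
 }
 .dynsym _image_binary_end : { *(.dynsym) }
 .dynbss : { *(.dynbss) }
 .dynstr : { *(.dynstr*) }
 .dynamic : { *(.dynamic*) }
 .plt : { *(.plt*) }
 .interp : { *(.interp*) }
 .gnu.hash : { *(.gnu.hash) }
 .gnu : { *(.gnu*) }
 .ARM.exidx : { *(.ARM.exidx*) }
 .gnu.linkonce.armexidx : { *(.gnu.linkonce.armexidx.*) }
}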
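The __rel_dyn_start/__rel_dyn_end and __bss_start/__bss_end markers in the script are consumed at run time. The following is a minimal host-side C sketch of that use, assuming the behaviour of relocate_code() in arch/arm/lib/relocate.S for this version: for every R_ARM_RELATIVE entry (type 23), the word at the relocated location is increased by the relocation offset, and .bss is cleared afterwards. The function names, struct elf32_rel and the flat mem array are hypothetical; the real code is ARM assembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define R_ARM_RELATIVE 23u /* relocation type number from the ARM ELF ABI */

/* One .rel.dyn entry, Elf32_Rel layout: offset first, then info. */
struct elf32_rel {
    uint32_t r_offset; /* link-time address of the word to patch */
    uint32_t r_info;   /* low 8 bits: relocation type */
};

/* Walk [start, end) like relocate_code() walks __rel_dyn_start..__rel_dyn_end.
   mem models memory; reloc_off = new address - link address. */
void apply_rel_dyn(uint8_t *mem, const struct elf32_rel *start,
                   const struct elf32_rel *end, uint32_t reloc_off)
{
    for (const struct elf32_rel *r = start; r < end; r++) {
        if ((r->r_info & 0xffu) != R_ARM_RELATIVE)
            continue;                            /* only relative fixups handled */
        uint32_t where = r->r_offset + reloc_off; /* word inside the relocated copy */
        uint32_t val;
        memcpy(&val, mem + where, sizeof val);   /* holds a link-time address */
        val += reloc_off;                        /* make it point into the copy */
        memcpy(mem + where, &val, sizeof val);
    }
}

/* Zero [bss_start, bss_end). Because .bss overlays .rel.dyn,
   this must only run after apply_rel_dyn() is finished. */
void clear_bss(uint8_t *mem, uint32_t bss_start, uint32_t bss_end)
{
    memset(mem + bss_start, 0, bss_end - bss_start);
}
```

The order of the two calls is the point: relocation first, BSS clearing second, which is what allows the script to overlay the two regions.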

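2. The SECTIONS command

The most important command in a linker script is SECTIONS. The official description:
The SECTIONS command controls exactly where input sections are placed into output sections, their order in the output file, and to which output sections they are allocated.
In short, SECTIONS decides where each input section goes in the output file and in what order. A script may contain at most one SECTIONS command.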

2.1 SECTIONS syntax ----- format

SECTIONS {
...
secname start BLOCK(align) (NOLOAD) : AT ( ldadr )
  { contents } >region =fill
...
}

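secname and contents are required; everything else is optional.
start: the absolute address (VMA) of the output section
BLOCK(align): the alignment of the output section
(NOLOAD): the section is not loaded into memory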

NOLOAD
The section should be marked as not loadable, so that it will not be loaded into memory when the program is run.
DSECT
COPY
INFO
OVERLAY
These type names are supported for backward compatibility, and are rarely used. They all have the same effect: the section should be marked as not allocatable, so that no memory is allocated for the section when the program is run.

AT(ldadr): the load address (LMA) of the section
=fill: the value used to fill unused space in the section

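2.2 SECTIONS syntax ----- "."

The ld manual calls "." the "location counter". I think of it as something like a program counter (PC) for the linker: it holds the current output address and advances as sections are placed.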
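A small C model of the location counter, assuming power-of-two alignments: align_up() computes what ALIGN() does in the script, and place_sections() (a hypothetical helper) shows "." being aligned, used as a section's start address, and then advanced by the section's size.

```c
#include <stddef.h>
#include <stdint.h>

/* Round addr up to a multiple of align (align must be a power of two),
   which is what ". = ALIGN(align);" does to the location counter. */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Place n input sections starting at dot; addrs[i] receives each start
   address. Returns the final value of the location counter. */
uint32_t place_sections(uint32_t dot, const uint32_t *sizes,
                        const uint32_t *aligns, uint32_t *addrs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dot = align_up(dot, aligns[i]); /* like ". = ALIGN(n);" */
        addrs[i] = dot;                 /* the section starts at "." */
        dot += sizes[i];                /* "." moves past the section */
    }
    return dot;
}
```

For example, starting at 0x100 with sections of 6, 3 and 16 bytes aligned to 4, 1 and 8, the start addresses are 0x100, 0x106 and 0x110, and "." ends at 0x120.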

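2.3 SECTIONS syntax ----- "KEEP()"

When link-time garbage collection is in use ('--gc-sections'), it is often useful to mark sections that should not be eliminated. This is accomplished by surrounding an input section's wildcard entry with KEEP(), as in KEEP(*(.init)) or KEEP(SORT_BY_NAME(*)(.ctors)).

KEEP() keeps matching sections even when nothing references them. This matters for U-Boot commands: no code refers to a command entry directly (commands are looked up at run time), and not every command is necessarily used, so without KEEP() they could be discarded.

If you compile with -ffunction-sections and -fdata-sections, the compiler puts each function, and each data object (global or static variable), into its own section. Linking with --gc-sections then discards every section that is not referenced, so the final executable contains only the functions and data that are actually used. In other words, the unit of linking becomes the individual function, which is what makes it possible to drop unused functions.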
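Why command tables need KEEP() can be shown with a host-side C sketch (GCC and GNU ld on an ELF system assumed). Entries are placed in one named section and found by walking it, so no code ever references an entry by name. The section name ll_cmd, struct cmd_entry, DECLARE_CMD, cmd_count() and find_cmd() are hypothetical. Real U-Boot entries are declared with the ll_entry_declare() macro in include/linker_lists.h and use .u_boot_list* section names (not valid C identifiers) with sorted start/end marker entries, rather than the linker-generated __start_/__stop_ symbols used here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical command entry, loosely modelled on U-Boot's command table. */
struct cmd_entry {
    const char *name;
    int (*fn)(int arg);
};

static int do_version(int arg) { return arg + 1; }
static int do_echo(int arg)    { return arg; }

/* Put every entry into the same named section. "used" keeps the compiler
   from dropping these unreferenced objects; the explicit alignment stops it
   from padding entries apart, so the section is a plain array. */
#define DECLARE_CMD(n)                                              \
    static struct cmd_entry cmd_##n                                 \
    __attribute__((used, section("ll_cmd"), aligned(sizeof(void *)))) \
    = { #n, do_##n }

DECLARE_CMD(version);
DECLARE_CMD(echo);

/* GNU ld defines __start_<sec> and __stop_<sec> for sections whose names
   are valid C identifiers; they bracket the section like a start/end marker. */
extern struct cmd_entry __start_ll_cmd[];
extern struct cmd_entry __stop_ll_cmd[];

size_t cmd_count(void)
{
    return (size_t)(__stop_ll_cmd - __start_ll_cmd);
}

/* Look a command up by walking the section's memory range. */
const struct cmd_entry *find_cmd(const char *name)
{
    for (const struct cmd_entry *c = __start_ll_cmd; c < __stop_ll_cmd; c++)
        if (strcmp(c->name, name) == 0)
            return c;
    return NULL;
}
```

The order of entries inside the section is not guaranteed here, which is why the lookup compares names; U-Boot additionally sorts its sections with SORT() so the marker entries land at the ends.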

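2.4 SECTIONS syntax ----- wildcards

Wildcards work the same way as in the Unix shell:

'*': matches any number of characters
'?': matches any single character
'[chars]': matches a single instance of any of the chars; '-' specifies a range, e.g. [a-z] matches any lowercase letter
'\': quotes the following character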
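The wildcard rules listed above are the rules of POSIX shell patterns, so fnmatch() from fnmatch.h is a convenient way to try a pattern against a section name. This is a sketch under the assumption that ld's matching agrees with fnmatch() called with no flags; section_matches() is a hypothetical helper.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Does input section name match the wildcard pattern?
   flags = 0: '/' and a leading '.' get no special treatment,
   and '\' escapes the next character. */
bool section_matches(const char *pattern, const char *name)
{
    return fnmatch(pattern, name, 0) == 0;
}
```

With this, ".text*" matches both ".text" and ".text.unlikely", and ".u_boot_list*" matches every linker-list section name.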

3. Examples

Example 1:

     SECTIONS {
       outputa 0x10000 :
         {
         all.o
         foo.o (.input1)
         }
       outputb :
         {
         foo.o (.input2)
         foo1.o (.input1)
         }
       outputc :
         {
         *(.input1)
         *(.input2)
         }
     }

https://sourceware.org/binutils/docs-2.21/ld/Input-Section-Example.html#Input-Section-Example
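All sections of input file all.o are placed at the start of output section outputa, which starts at address 0x10000. They are followed in outputa by the .input1 section of foo.o. The .input2 section of foo.o goes at the start of output section outputb, followed by the .input1 section of foo1.o. All remaining .input1 and .input2 sections from any other files go into output section outputc.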
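The placement in Example 1 follows a first-match rule: when an input section matches more than one description, ld uses the first match in the script, so foo.o(.input1) lands in outputa even though *(.input1) in outputc would also match it. A minimal C sketch of that rule, with a hypothetical struct section_rule and find_rule(); a section pattern of "*" stands for "all sections of the file", as in the bare all.o entry.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical model of one input section description. */
struct section_rule {
    const char *output;   /* output section the description belongs to */
    const char *file_pat; /* input file pattern, e.g. "foo.o" or "*" */
    const char *sect_pat; /* input section pattern; "*" = every section */
};

/* Return the first rule, in script order, that matches section sect of
   file, or NULL if none does (ld would then treat it as an orphan section). */
const struct section_rule *find_rule(const struct section_rule *rules, size_t n,
                                     const char *file, const char *sect)
{
    for (size_t i = 0; i < n; i++) {
        if (fnmatch(rules[i].file_pat, file, 0) == 0 &&
            fnmatch(rules[i].sect_pat, sect, 0) == 0)
            return &rules[i];
    }
    return NULL;
}
```

Encoding Example 1 as six rules (two for each output section, in script order) reproduces the placement described above.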

Example 2:

 SECTIONS
{
  output :
  {
  file1(.text)
  . = . + 1000;
  file2(.text)
  . += 1000;
  file3(.text)
  } = 0x1234;
}

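The .text section of file1 is placed at the start of the output section, followed by a 1000-byte gap, then the .text section of file2, another 1000-byte gap, and then the .text section of file3. The gaps are filled with 0x1234.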
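A C sketch of Example 2 under simplifying assumptions: the three .text sections are plain byte arrays, "." is an offset from the start of the output section, and each gap is filled with the bytes 0x12 0x34 repeated (the ld manual treats a hex fill value as a big-endian byte pattern; restarting the pattern at the beginning of each gap is an assumption of this sketch). build_output() and fill() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill len bytes at p with a repeating pattern. */
static void fill(uint8_t *p, size_t len, const uint8_t *pat, size_t plen)
{
    for (size_t i = 0; i < len; i++)
        p[i] = pat[i % plen];
}

/* Build the "output" section of Example 2 into out; returns its size. */
size_t build_output(uint8_t *out,
                    const uint8_t *f1, size_t n1,
                    const uint8_t *f2, size_t n2,
                    const uint8_t *f3, size_t n3)
{
    static const uint8_t pat[] = {0x12, 0x34}; /* "= 0x1234" */
    size_t dot = 0;                            /* location counter */

    memcpy(out + dot, f1, n1);     dot += n1;   /* file1(.text) */
    fill(out + dot, 1000, pat, 2); dot += 1000; /* . = . + 1000; */
    memcpy(out + dot, f2, n2);     dot += n2;   /* file2(.text) */
    fill(out + dot, 1000, pat, 2); dot += 1000; /* . += 1000; */
    memcpy(out + dot, f3, n3);     dot += n3;   /* file3(.text) */
    return dot;
}
```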

References

http://www.math.utah.edu/docs/info/ld_toc.html (Using ld, The GNU linker; main reference)
https://sourceware.org/binutils/docs-2.21/ld/index.html#Top (ld)
https://blog.csdn.net/muaxi8/article/details/79627859 (ELF file format analysis)
https://www.cnblogs.com/feng9exe/p/6899351.html (ELF file format analysis)
http://www.sco.com/developers/gabi/latest/ch4.eheader.html (official ELF format documentation)
https://blog.csdn.net/t3swing/article/details/79671461 (using the BFD library: analysis of the nm source code)
https://www.gnu.org/software/binutils/ (GNU Binutils)
https://blog.csdn.net/zhaixuebuluo/article/details/86658045
https://blog.csdn.net/Egean/article/details/84923005
https://blog.csdn.net/itxiebo/article/details/50938753