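Reference: 《程序员的自我修养》, ch. 4.
1. Space and Address Allocation
Space allocation here refers only to the allocation of virtual address space.
Modern linkers almost all merge sections of the same type, and linkers that do so generally use Two-pass Linking. The whole process has two steps:
Step 1: space and address allocation;
Step 2: symbol resolution and relocation. This step is the core of linking, relocation in particular.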
>> ld a.o b.o -e main -o ab
>> objdump -h a.o

a.o:     file format elf32-i386

Sections:
Idx Name            Size      VMA       LMA       File off  Algn
  0 .text           00000034  00000000  00000000  00000034  2**2
                    CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data           00000000  00000000  00000000  00000068  2**2
                    CONTENTS, ALLOC, LOAD, DATA
  2 .bss            00000000  00000000  00000000  00000068  2**2
                    ALLOC
  3 .comment        00000024  00000000  00000000  00000068  2**0
                    CONTENTS, READONLY
  4 .note.GNU-stack 00000000  00000000  00000000  0000008c  2**0
                    CONTENTS, READONLY

>> objdump -h b.o

b.o:     file format elf32-i386

Sections:
Idx Name            Size      VMA       LMA       File off  Algn
  0 .text           0000003e  00000000  00000000  00000034  2**2
                    CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data           00000004  00000000  00000000  00000074  2**2
                    CONTENTS, ALLOC, LOAD, DATA
  2 .bss            00000000  00000000  00000000  00000078  2**2
                    ALLOC
  3 .comment        00000024  00000000  00000000  00000078  2**0
                    CONTENTS, READONLY
  4 .note.GNU-stack 00000000  00000000  00000000  0000009c  2**0
                    CONTENTS, READONLY

>> objdump -h ab

ab:     file format elf32-i386

Sections:
Idx Name            Size      VMA       LMA       File off  Algn
  0 .text           00000072  08048094  08048094  00000094  2**2
                    CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data           00000004  08049108  08049108  00000108  2**2
                    CONTENTS, ALLOC, LOAD, DATA
  2 .comment        00000048  00000000  00000000  0000010c  2**0
                    CONTENTS, READONLY
VMA: Virtual Memory Address
LMA: Load Memory Address
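After linking, the addresses used in the output file are already the virtual addresses the program will have in its process; for example, .text starts at 0x08048094. On Linux (32-bit x86, as in this example), executables are laid out starting at 0x08048000 by default.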
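The merge of the two .text sections can be checked with a little arithmetic. The following C sketch is only an illustration, not how ld is implemented: the helper names `align_up` and `place_section` are hypothetical, and the inputs are the .text sizes, alignment, and output VMA read from the objdump listings above.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Place one input section at the current cursor and advance the cursor
 * past it. Returns the virtual address assigned to the section.
 * (Hypothetical helper, for illustration only.) */
static uint32_t place_section(uint32_t *cursor, uint32_t size, uint32_t align)
{
    uint32_t addr = align_up(*cursor, align);
    *cursor = addr + size;
    return addr;
}
```

Starting the cursor at 0x08048094 and placing a.o's .text (0x34 bytes) followed by b.o's .text (0x3e bytes), both 4-byte aligned, yields the addresses 0x08048094 and 0x080480c8 and a merged size of 0x72, which agrees with the .text entry of `ab`.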
2. Symbol Resolution and Relocation
>> objdump -d a.o

a.o:     file format elf32-i386

Disassembly of section .text:

00000000 <main>:
   0:   8d 4c 24 04             lea    0x4(%esp),%ecx
   4:   83 e4 f0                and    $0xfffffff0,%esp
   7:   ff 71 fc                pushl  -0x4(%ecx)
   a:   55                      push   %ebp
   b:   89 e5                   mov    %esp,%ebp
   d:   51                      push   %ecx
   e:   83 ec 24                sub    $0x24,%esp
  11:   c7 45 f8 64 00 00 00    movl   $0x64,-0x8(%ebp)
  18:   c7 44 24 04 00 00 00    movl   $0x0,0x4(%esp)
  1f:   00
  20:   8d 45 f8                lea    -0x8(%ebp),%eax
  23:   89 04 24                mov    %eax,(%esp)
  26:   e8 fc ff ff ff          call   27 <main+0x27>
  2b:   83 c4 24                add    $0x24,%esp
  2e:   59                      pop    %ecx
  2f:   5d                      pop    %ebp
  30:   8d 61 fc                lea    -0x4(%ecx),%esp
  33:   c3                      ret
After relocation:
>> objdump -d ab

ab:     file format elf32-i386

Disassembly of section .text:

08048094 <main>:
 8048094:   8d 4c 24 04             lea    0x4(%esp),%ecx
 8048098:   83 e4 f0                and    $0xfffffff0,%esp
 804809b:   ff 71 fc                pushl  -0x4(%ecx)
 804809e:   55                      push   %ebp
 804809f:   89 e5                   mov    %esp,%ebp
 80480a1:   51                      push   %ecx
 80480a2:   83 ec 24                sub    $0x24,%esp
 80480a5:   c7 45 f8 64 00 00 00    movl   $0x64,-0x8(%ebp)
 80480ac:   c7 44 24 04 08 91 04    movl   $0x8049108,0x4(%esp)
 80480b3:   08
 80480b4:   8d 45 f8                lea    -0x8(%ebp),%eax
 80480b7:   89 04 24                mov    %eax,(%esp)
 80480ba:   e8 09 00 00 00          call   80480c8 <swap>
 80480bf:   83 c4 24                add    $0x24,%esp
 80480c2:   59                      pop    %ecx
 80480c3:   5d                      pop    %ebp
 80480c4:   8d 61 fc                lea    -0x4(%ecx),%esp
 80480c7:   c3                      ret
2.1 Relocation Table
Every ELF section that needs relocation has a corresponding relocation section (for example, .rel.text for .text).
>> objdump -r a.o

a.o:     file format elf32-i386

RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE              VALUE
0000001c R_386_32          s
00000027 R_386_PC32        swap
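Here 0x1c and 0x27 are the offsets within .text that need to be relocated: 0x1c is the 4-byte immediate of the movl at offset 0x18, and 0x27 is the 4-byte operand of the call at offset 0x26.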
/* Relocation table entry without addend (in section of type SHT_REL). */
typedef struct
{
  Elf32_Addr r_offset;   /* Address */
  Elf32_Word r_info;     /* Relocation type and symbol index */
} Elf32_Rel;
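The r_info field packs two values: the symbol table index in the high 24 bits and the relocation type in the low 8 bits. The sketch below mirrors the standard ELF32_R_SYM, ELF32_R_TYPE and ELF32_R_INFO macros from <elf.h> as plain functions. The struct name `Rel32` and the function names are hypothetical, chosen so the example compiles without <elf.h>; the type numbers 1 and 2 are the i386 psABI values for R_386_32 and R_386_PC32.

```c
#include <stdint.h>

/* Same layout as Elf32_Rel, redeclared so the sketch does not need <elf.h>. */
typedef struct {
    uint32_t r_offset;  /* where to patch, as an offset into the section */
    uint32_t r_info;    /* symbol index (high 24 bits) + type (low 8 bits) */
} Rel32;

/* i386 relocation type numbers. */
enum { R_386_32 = 1, R_386_PC32 = 2 };

/* Equivalent of ELF32_R_SYM: index into the symbol table. */
static uint32_t rel_sym(uint32_t info)
{
    return info >> 8;
}

/* Equivalent of ELF32_R_TYPE: relocation type. */
static uint32_t rel_type(uint32_t info)
{
    return info & 0xff;
}

/* Equivalent of ELF32_R_INFO: pack symbol index and type. */
static uint32_t rel_info(uint32_t sym, uint32_t type)
{
    return (sym << 8) | (type & 0xff);
}
```

For instance, an entry with r_offset 0x27, type R_386_PC32 and an assumed symbol index of 10 (the real index of swap in a.o's symbol table is not shown in these notes) would carry r_info 0x0a02.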
2.2 Symbol Resolution
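>> ld a.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000008048074
a.o: In function `main':
a.c:(.text+0x1c): undefined reference to `s'
a.c:(.text+0x27): undefined reference to `swap'

The linker looks up each undefined symbol in a global symbol table built from the symbol tables of all input object files, and performs the relocation once the symbol is found. When only a.o is given, s and swap are defined nowhere, hence the errors above.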
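The lookup can be pictured as a search by name in one merged table. The sketch below is a deliberately naive illustration, not ld's actual data structure (real linkers use hash tables): the `Symbol` struct and `resolve` function are hypothetical names, and the addresses in the usage note are read from the disassembly of `ab`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry of the (hypothetical) global symbol table. */
typedef struct {
    const char *name;
    uint32_t    addr;   /* final virtual address after space allocation */
} Symbol;

/* Linear search by name. Returns NULL when the symbol is not defined
 * by any input file, which is the "undefined reference" case. */
static const Symbol *resolve(const Symbol *table, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    }
    return NULL;
}
```

With a table holding main at 0x08048094, swap at 0x080480c8 and s at 0x08049108, `resolve(table, 3, "swap")` finds the entry for swap, while a name that no input file defines returns NULL.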
2.3 Instruction Fix-up
see: http://stackoverflow.com/questions/12412064/meaning-of-r-386-32-r-386-pc32-in-rel-text-section-of-elf
R_386_32 is a relocation that places the absolute 32-bit address of the symbol into the specified memory location. R_386_PC32 is a relocation that places the PC-relative 32-bit address of the symbol into the specified memory location. R_386_32 is useful for static data, as shown here, since the compiler just loads the relocated symbol address into some register and then treats it as a pointer. R_386_PC32 is useful for function references since it can be used as an immediate argument to call. See elf_machdep.c for an example of how the relocations are processed.
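In the i386 psABI the two fix-ups are computed as S + A for R_386_32 and S + A - P for R_386_PC32, where S is the symbol's address, A is the addend already stored at the patch site, and P is the address of the patch site itself. The following sketch applies both to the bytes from the example above. The function names are hypothetical, and the code is an illustration rather than a copy of any linker's implementation.

```c
#include <stdint.h>

/* Read a 32-bit little-endian value (i386 byte order) from p. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write v to p as a 32-bit little-endian value. */
static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* R_386_32: S + A, with the addend A taken from the patch site. */
static void apply_abs32(uint8_t *site, uint32_t S)
{
    write_le32(site, S + read_le32(site));
}

/* R_386_PC32: S + A - P, where P is the address of the patch site. */
static void apply_pc32(uint8_t *site, uint32_t S, uint32_t P)
{
    write_le32(site, S + read_le32(site) - P);
}
```

For the call at offset 0x27: S = 0x080480c8 (swap), A = 0xfffffffc (that is, -4) and P = 0x08048094 + 0x27 = 0x080480bb, so the result is 0x00000009, matching `e8 09 00 00 00` in `ab`. For the movl at offset 0x1c: S = 0x08049108 and A = 0, giving the bytes `08 91 04 08`.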
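3. COMMON Block
The linker has no notion of symbol types: variable types are invisible to it. It knows only a symbol's name and cannot tell whether the types agree. As a result, multiple definitions of the same weak symbol may disagree in type, and the linker has to deal with that.
Today's compilers and linkers support a mechanism called the COMMON block (Common Block). When several weak symbols with the same name differ in type, the largest size among them is used as the space the symbol occupies. Symbols marked "SHN_COMMON" in an object file are handled by this mechanism.
As seen in earlier chapters, the compiler marks uninitialized global variables as SHN_COMMON. Why doesn't the compiler treat an uninitialized global variable like an uninitialized local static variable and allocate space for it in the BSS section, instead of marking it as a COMMON variable?
Because before linking, the final size of a weak symbol is unknown: another object file may declare the same weak symbol with a larger size than this one does, so space cannot be reserved in the BSS section at compile time. Once the linker has read all input object files, the final size of every weak symbol is known, so it can allocate space in the BSS section of the output file.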
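The size rule amounts to taking a maximum over all inputs. A minimal sketch, assuming the linker has already collected the sizes that each object file declares for one COMMON symbol (the function name `common_final_size` is hypothetical):

```c
#include <stddef.h>

/* Final size of a COMMON symbol: the largest size declared for it
 * by any input object file. Returns 0 when there are no declarations. */
static size_t common_final_size(const size_t *sizes, size_t n)
{
    size_t max = 0;
    for (size_t i = 0; i < n; i++) {
        if (sizes[i] > max)
            max = sizes[i];
    }
    return max;
}
```

If one file declares `int g;` and another declares `double g;`, the declared sizes are 4 and 8 on typical x86 ABIs, and the symbol ends up with 8 bytes in the output file's BSS.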
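GCC's "-fno-common" option stops all uninitialized global variables from being handled as COMMON blocks; the "__attribute__((nocommon))" extension does the same for an individual variable. (Since GCC 10, -fno-common is the default.)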
>> cat common.c
int g __attribute__((nocommon));
>> cat common1.c
int g = 0;
int main() { }
>> gcc common.c common1.c
/tmp/ccMtuul2.o:(.bss+0x0): multiple definition of `g'
/tmp/ccsjyUP2.o:(.bss+0x0): first defined here
collect2: ld returned 1 exit status
4. C++-Related Issues
4.1 Duplicate Code Elimination
>> cat temp.C
#include <iostream>
using namespace std;

template <class T>
T fun(T t)
{
    return t * 2;
}

int main()
{
    int a = fun(2);
    double b = fun(2.0);
    cout << a << "," << b << endl;
}
>> readelf -S temp.o
There are 18 section headers, starting at offset 0x550:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000000000  00000040
       00000000000001f9  0000000000000000  AX       0     0     4
  [ 2] .rela.text        RELA             0000000000000000  00000ef0
       00000000000001f8  0000000000000018          16     1     8
  [ 3] .data             PROGBITS         0000000000000000  0000023c
       0000000000000000  0000000000000000  WA       0     0     4
  [ 4] .bss              NOBITS           0000000000000000  0000023c
       0000000000000001  0000000000000000  WA       0     0     4
  [ 5] .rodata           PROGBITS         0000000000000000  0000023c
       0000000000000002  0000000000000000   A       0     0     1
  [ 6] .gnu.linkonce.t._ PROGBITS         0000000000000000  0000023e
       0000000000000034  0000000000000000  AX       0     0     2
  [ 7] .gnu.linkonce.t._ PROGBITS         0000000000000000  00000272
       000000000000000e  0000000000000000  AX       0     0     2
  [ 8] .gnu.linkonce.t._ PROGBITS         0000000000000000  00000280
       0000000000000026  0000000000000000  AX       0     0     2
  [ 9] .ctors            PROGBITS         0000000000000000  000002a8
       0000000000000008  0000000000000000  WA       0     0     8
  [10] .rela.ctors       RELA             0000000000000000  000010e8
       0000000000000018  0000000000000018          16     9     8
  [11] .eh_frame         PROGBITS         0000000000000000  000002b0
       00000000000001a0  0000000000000000   A       0     0     8
  [12] .rela.eh_frame    RELA             0000000000000000  00001100
       00000000000000d8  0000000000000018          16    11     8
  [13] .note.GNU-stack   PROGBITS         0000000000000000  00000450
       0000000000000000  0000000000000000           0     0     1
  [14] .comment          PROGBITS         0000000000000000  00000450
       000000000000002a  0000000000000000           0     0     1
  [15] .shstrtab         STRTAB           0000000000000000  0000047a
       00000000000000d1  0000000000000000           0     0     1
  [16] .symtab           SYMTAB           0000000000000000  000009d0
       0000000000000348  0000000000000018          17    18     8
  [17] .strtab           STRTAB           0000000000000000  00000d18
       00000000000001d1  0000000000000000           0     0     1

With this GCC version, such code is placed in its own section named ".gnu.linkonce.name" (sections 6 to 8 above, whose names readelf truncates). When several object files contain sections with the same such name, the linker keeps only one copy.
4.2 Global Construction and Destruction
The .init and .fini sections
Reference: http://l4u-00.jinr.ru/usoft/WWW/www_debian.org/Documentation/elf/node3.html
.fini
This section holds executable instructions that contribute to the process termination code. That is, when a program exits normally, the system arranges to execute the code in this section.
.init
This section holds executable instructions that contribute to the process initialization code. That is, when a program starts to run the system arranges to execute the code in this section before the main program entry point (called main in C programs).
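A small way to observe code running before and after main is the GCC/Clang `constructor` and `destructor` function attributes, shown here in C. This is a sketch under assumptions: the function and variable names are hypothetical, and exactly how the calls are wired up (through .ctors/.dtors lists invoked from the .init/.fini code, or through .init_array/.fini_array) depends on the toolchain version and target.

```c
#include <stdio.h>

/* Set by the constructor below, before main() starts. */
static int initialized = 0;

/* GCC/Clang extension: called as part of process initialization,
 * before main() is entered. */
__attribute__((constructor))
static void before_main(void)
{
    initialized = 1;
}

/* GCC/Clang extension: called during normal process termination,
 * after main() returns or exit() is called. */
__attribute__((destructor))
static void after_main(void)
{
    puts("after main");
}
```

Inside main, `initialized` already reads 1, and the message from `after_main` is printed once main has returned. Global C++ objects with constructors and destructors rely on the same startup and termination machinery.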
4.3 ABI
application binary interface
5. Static Library Linking
>> pwd
/usr/lib
>> ar -t libc.a |wc
   1429    1429   16645
>> objdump -t libc.a |grep -w printf
reg-printf.o:     file format elf64-x86-64
printf-prs.o:     file format elf64-x86-64
printf.o:     file format elf64-x86-64
0000000000000000 g     F .text  000000000000009d printf
printf-parsemb.o:     file format elf64-x86-64
printf-parsewc.o:     file format elf64-x86-64
>> ar -x /usr/lib/libc.a
>> ld hello.o libc/printf.o -o a
ld: warning: cannot find entry symbol _start; defaulting to 00000000004000b0
libc/printf.o: In function `_IO_printf':
(.text+0x6b): undefined reference to `stdout'
libc/printf.o: In function `_IO_printf':
(.text+0x91): undefined reference to `vfprintf'
>> gcc -static --verbose -fno-builtin hello.c
Reading specs from /usr/lib/gcc/x86_64-linux-gnu/3.4.6/specs
Configured with: ../src/configure -v --enable-languages=c,c++,f77,pascal --prefix=/usr --libexecdir=/usr/lib --with-gxx-include-dir=/usr/include/c++/3.4 --enable-shared --with-system-zlib --enable-nls --without-included-gettext --program-suffix=-3.4 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug x86_64-linux-gnu
Thread model: posix
gcc version 3.4.6 (Ubuntu 3.4.6-6ubuntu5)
 /usr/lib/gcc/x86_64-linux-gnu/3.4.6/cc1 -quiet -v hello.c -quiet -dumpbase hello.c -mtune=k8 -auxbase hello -fno-builtin -version -o /tmp/ccKPJUxj.s
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory "/usr/include/x86_64-linux-gnu"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/lib/gcc/x86_64-linux-gnu/3.4.6/include
 /usr/include
End of search list.
GNU C version 3.4.6 (Ubuntu 3.4.6-6ubuntu5) (x86_64-linux-gnu)
 compiled by GNU C version 3.4.6 (Ubuntu 3.4.6-6ubuntu5).
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
 as --traditional-format -V -Qy --64 -o /tmp/ccMTOoIt.o /tmp/ccKPJUxj.s
GNU assembler version 2.18.0 (x86_64-linux-gnu) using BFD version (GNU Binutils for Ubuntu) 2.18.0.20080103
 /usr/lib/gcc/x86_64-linux-gnu/3.4.6/collect2 -m elf_x86_64 -static /usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib/crt1.o /usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib/crti.o /usr/lib/gcc/x86_64-linux-gnu/3.4.6/crtbeginT.o -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6 -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6 -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../.. -L/lib/../lib -L/usr/lib/../lib /tmp/ccMTOoIt.o --start-group -lgcc -lgcc_eh -lc --end-group /usr/lib/gcc/x86_64-linux-gnu/3.4.6/crtend.o /usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib/crtn.o
Linking is controlled by a linker script. When the user does not specify one, ld uses a default linker script, which can be viewed with the "ld -verbose" command.
/usr/lib/ldscripts/elf_i386.x => for normal executables
/usr/lib/ldscripts/elf_i386.xs => link shared library
The -T option specifies the linker script to use.
http://blog.csdn.net/joker0910/article/details/7678056
A classic article introducing linker scripts:
http://blogimg.chinaunix.net/blog/upfile2/090619175409.pdf