ELF file format and dynamic linking

20171204001

Introduction

ELF (Executable and Linkable Format) is the standard file format for object files, executables and shared libraries on Unix-like systems. This post walks through the ELF file format and briefly explains how dynamic linking works. First, I will present the structure of an object file and the basic data structures needed to parse an ELF file. Next, I will present the structure of an executable file and show how dynamic linking works on a Unix-like system.

We will use the following sample program to illustrate the details of the ELF format.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
	char *buf1, *buf2, *buf3;
	if (argc != 4) return;

	buf1 = malloc(256);
	strcpy(buf1, argv[1]);

	buf2 = malloc(strtoul(argv[2], NULL, 16));

	buf3 = malloc(256);
	strcpy(buf3, argv[3]);

	free(buf3);
	free(buf2);
	free(buf1);

	return 0;
}

Object File

We can use gcc's -c option to compile the sample into an object file, and the hex command to dump the contents of that object file in hexadecimal.

$ gcc -c test1.c
$ hex test1.o
0x00000000: 7f 45 4c 46 01 01 01 00 - 00 00 00 00 00 00 00 00 ?ELFAAA@@@@@@@@@
0x00000010: 01 00 03 00 01 00 00 00 - 00 00 00 00 00 00 00 00 A@C@A@@@@@@@@@@@
0x00000020: 60 01 00 00 00 00 00 00 - 34 00 00 00 00 00 28 00 `A@@@@@@4@@@@@(@
0x00000030: 0a 00 07 00 55 89 e5 83 - e4 f0 83 ec 20 83 7d 08 J@G@U...�.. .}H
0x00000040: 04 74 05 e9 a5 00 00 00 - c7 04 24 00 01 00 00 e8 DtE@@@.D$@A@@�
0x00000050: fc ff ff ff 89 44 24 1c - 8b 45 0c 83 c0 04 8b 00 .....D$\.EL..D.@
0x00000060: 89 44 24 04 8b 44 24 1c - 89 04 24 e8 fc ff ff ff .D$D.D$\.D$�...
0x00000070: 8b 45 0c 83 c0 08 8b 00 - c7 44 24 08 10 00 00 00 .EL..H.@.D$HP@@@
0x00000080: c7 44 24 04 00 00 00 00 - 89 04 24 e8 fc ff ff ff .D$D@@@@.D$�...
0x00000090: 89 04 24 e8 fc ff ff ff - 89 44 24 18 c7 04 24 00 .D$�....D$X.D$@
0x000000a0: 01 00 00 e8 fc ff ff ff - 89 44 24 14 8b 45 0c 83 A@@�....D$T.EL.
0x000000b0: c0 0c 8b 00 89 44 24 04 - 8b 44 24 14 89 04 24 e8 .L.@.D$D.D$T.D$�
0x000000c0: fc ff ff ff 8b 44 24 14 - 89 04 24 e8 fc ff ff ff .....D$T.D$�...
0x000000d0: 8b 44 24 18 89 04 24 e8 - fc ff ff ff 8b 44 24 1c .D$X.D$�....D$\
0x000000e0: 89 04 24 e8 fc ff ff ff - b8 00 00 00 00 c9 c3 00 .D$�....@@@@�@
0x000000f0: 00 47 43 43 3a 20 28 55 - 62 75 6e 74 75 20 34 2e @GCC: (Ubuntu 4.
0x00000100: 34 2e 33 2d 34 75 62 75 - 6e 74 75 35 2e 31 29 20 4.3-4ubuntu5.1)
0x00000110: 34 2e 34 2e 33 00 00 2e - 73 79 6d 74 61 62 00 2e 4.4.3@@.symtab@.
0x00000120: 73 74 72 74 61 62 00 2e - 73 68 73 74 72 74 61 62 strtab@.shstrtab
0x00000130: 00 2e 72 65 6c 2e 74 65 - 78 74 00 2e 64 61 74 61 @.rel.text@.data
0x00000140: 00 2e 62 73 73 00 2e 63 - 6f 6d 6d 65 6e 74 00 2e @.bss@.comment@.
0x00000150: 6e 6f 74 65 2e 47 4e 55 - 2d 73 74 61 63 6b 00 00 note.GNU-stack@@
0x00000160: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 @@@@@@@@@@@@@@@@
0x00000170: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 @@@@@@@@@@@@@@@@
0x00000180: 00 00 00 00 00 00 00 00 - 1f 00 00 00 01 00 00 00 @@@@@@@@_@@@A@@@
0x00000190: 06 00 00 00 00 00 00 00 - 34 00 00 00 bb 00 00 00 F@@@@@@@4@@@.@@@
0x000001a0: 00 00 00 00 00 00 00 00 - 04 00 00 00 00 00 00 00 @@@@@@@@D@@@@@@@
0x000001b0: 1b 00 00 00 09 00 00 00 - 00 00 00 00 00 00 00 00 [@@@I@@@@@@@@@@@
0x000001c0: dc 03 00 00 48 00 00 00 - 08 00 00 00 01 00 00 00 .C@@H@@@H@@@A@@@
0x000001d0: 04 00 00 00 08 00 00 00 - 25 00 00 00 01 00 00 00 D@@@H@@@%@@@A@@@
0x000001e0: 03 00 00 00 00 00 00 00 - f0 00 00 00 00 00 00 00 C@@@@@@@.@@@@@@@
0x000001f0: 00 00 00 00 00 00 00 00 - 04 00 00 00 00 00 00 00 @@@@@@@@D@@@@@@@
0x00000200: 2b 00 00 00 08 00 00 00 - 03 00 00 00 00 00 00 00 +@@@H@@@C@@@@@@@
0x00000210: f0 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 .@@@@@@@@@@@@@@@
0x00000220: 04 00 00 00 00 00 00 00 - 30 00 00 00 01 00 00 00 D@@@@@@@0@@@A@@@
0x00000230: 30 00 00 00 00 00 00 00 - f0 00 00 00 26 00 00 00 0@@@@@@@.@@@&@@@
0x00000240: 00 00 00 00 00 00 00 00 - 01 00 00 00 01 00 00 00 @@@@@@@@A@@@A@@@
0x00000250: 39 00 00 00 01 00 00 00 - 00 00 00 00 00 00 00 00 9@@@A@@@@@@@@@@@
0x00000260: 16 01 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 VA@@@@@@@@@@@@@@
0x00000270: 01 00 00 00 00 00 00 00 - 11 00 00 00 03 00 00 00 A@@@@@@@Q@@@C@@@
0x00000280: 00 00 00 00 00 00 00 00 - 16 01 00 00 49 00 00 00 @@@@@@@@VA@@I@@@
0x00000290: 00 00 00 00 00 00 00 00 - 01 00 00 00 00 00 00 00 @@@@@@@@A@@@@@@@
0x000002a0: 01 00 00 00 02 00 00 00 - 00 00 00 00 00 00 00 00 A@@@B@@@@@@@@@@@
0x000002b0: f0 02 00 00 c0 00 00 00 - 09 00 00 00 07 00 00 00 .B@@.@@@I@@@G@@@
0x000002c0: 04 00 00 00 10 00 00 00 - 09 00 00 00 03 00 00 00 D@@@P@@@I@@@C@@@
0x000002d0: 00 00 00 00 00 00 00 00 - b0 03 00 00 29 00 00 00 @@@@@@@@.C@@)@@@
0x000002e0: 00 00 00 00 00 00 00 00 - 01 00 00 00 00 00 00 00 @@@@@@@@A@@@@@@@
0x000002f0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 @@@@@@@@@@@@@@@@
0x00000300: 01 00 00 00 00 00 00 00 - 00 00 00 00 04 00 f1 ff A@@@@@@@@@@@D@..
0x00000310: 00 00 00 00 00 00 00 00 - 00 00 00 00 03 00 01 00 @@@@@@@@@@@@C@A@
0x00000320: 00 00 00 00 00 00 00 00 - 00 00 00 00 03 00 03 00 @@@@@@@@@@@@C@C@
0x00000330: 00 00 00 00 00 00 00 00 - 00 00 00 00 03 00 04 00 @@@@@@@@@@@@C@D@
0x00000340: 00 00 00 00 00 00 00 00 - 00 00 00 00 03 00 06 00 @@@@@@@@@@@@C@F@
0x00000350: 00 00 00 00 00 00 00 00 - 00 00 00 00 03 00 05 00 @@@@@@@@@@@@C@E@
0x00000360: 09 00 00 00 00 00 00 00 - bb 00 00 00 12 00 01 00 I@@@@@@@.@@@R@A@
0x00000370: 0e 00 00 00 00 00 00 00 - 00 00 00 00 10 00 00 00 N@@@@@@@@@@@P@@@
0x00000380: 15 00 00 00 00 00 00 00 - 00 00 00 00 10 00 00 00 U@@@@@@@@@@@P@@@
0x00000390: 1c 00 00 00 00 00 00 00 - 00 00 00 00 10 00 00 00 \@@@@@@@@@@@P@@@
0x000003a0: 24 00 00 00 00 00 00 00 - 00 00 00 00 10 00 00 00 $@@@@@@@@@@@P@@@
0x000003b0: 00 74 65 73 74 31 2e 63 - 00 6d 61 69 6e 00 6d 61 @test1.c@main@ma
0x000003c0: 6c 6c 6f 63 00 73 74 72 - 63 70 79 00 73 74 72 74 lloc@strcpy@strt
0x000003d0: 6f 75 6c 00 66 72 65 65 - 00 00 00 00 1c 00 00 00 oul@free@@@@\@@@
0x000003e0: 02 08 00 00 38 00 00 00 - 02 09 00 00 58 00 00 00 BH@@8@@@BI@@X@@@
0x000003f0: 02 0a 00 00 60 00 00 00 - 02 08 00 00 70 00 00 00 BJ@@`@@@BH@@p@@@
0x00000400: 02 08 00 00 8c 00 00 00 - 02 09 00 00 98 00 00 00 BH@@.@@@BI@@.@@@
0x00000410: 02 0b 00 00 a4 00 00 00 - 02 0b 00 00 b0 00 00 00 BK@@.@@@BK@@.@@@
0x00000420: 02 0b 00 00             -                         BK@@

To understand the data displayed above, we use the readelf command to display the structure of the object file and the objdump command to disassemble the sample code.

$ objdump -d test1.o
test1.o:     file format elf32-i386
Disassembly of section .text:

00000000 <main>:
   0:	55                   	push   %ebp
   1:	89 e5                	mov    %esp,%ebp
   3:	83 e4 f0             	and    $0xfffffff0,%esp
   6:	83 ec 20             	sub    $0x20,%esp
   9:	83 7d 08 04          	cmpl   $0x4,0x8(%ebp)
   d:	74 05                	je     14 <main+0x14>
   f:	e9 a5 00 00 00       	jmp    b9 <main+0xb9>
  14:	c7 04 24 00 01 00 00 	movl   $0x100,(%esp)
  1b:	e8 fc ff ff ff       	call   1c <main+0x1c>
  20:	89 44 24 1c          	mov    %eax,0x1c(%esp)
  24:	8b 45 0c             	mov    0xc(%ebp),%eax
  27:	83 c0 04             	add    $0x4,%eax
  2a:	8b 00                	mov    (%eax),%eax
  2c:	89 44 24 04          	mov    %eax,0x4(%esp)
  30:	8b 44 24 1c          	mov    0x1c(%esp),%eax
  34:	89 04 24             	mov    %eax,(%esp)
  37:	e8 fc ff ff ff       	call   38 <main+0x38>
  3c:	8b 45 0c             	mov    0xc(%ebp),%eax
  3f:	83 c0 08             	add    $0x8,%eax
  42:	8b 00                	mov    (%eax),%eax
  44:	c7 44 24 08 10 00 00 	movl   $0x10,0x8(%esp)
  4b:	00
  4c:	c7 44 24 04 00 00 00 	movl   $0x0,0x4(%esp)
  53:	00
  54:	89 04 24             	mov    %eax,(%esp)
  57:	e8 fc ff ff ff       	call   58 <main+0x58>
  5c:	89 04 24             	mov    %eax,(%esp)
  5f:	e8 fc ff ff ff       	call   60 <main+0x60>
  64:	89 44 24 18          	mov    %eax,0x18(%esp)
  68:	c7 04 24 00 01 00 00 	movl   $0x100,(%esp)
  6f:	e8 fc ff ff ff       	call   70 <main+0x70>
  74:	89 44 24 14          	mov    %eax,0x14(%esp)
  78:	8b 45 0c             	mov    0xc(%ebp),%eax
  7b:	83 c0 0c             	add    $0xc,%eax
  7e:	8b 00                	mov    (%eax),%eax
  80:	89 44 24 04          	mov    %eax,0x4(%esp)
  84:	8b 44 24 14          	mov    0x14(%esp),%eax
  88:	89 04 24             	mov    %eax,(%esp)
  8b:	e8 fc ff ff ff       	call   8c <main+0x8c>
  90:	8b 44 24 14          	mov    0x14(%esp),%eax
  94:	89 04 24             	mov    %eax,(%esp)
  97:	e8 fc ff ff ff       	call   98 <main+0x98>
  9c:	8b 44 24 18          	mov    0x18(%esp),%eax
  a0:	89 04 24             	mov    %eax,(%esp)
  a3:	e8 fc ff ff ff       	call   a4 <main+0xa4>
  a8:	8b 44 24 1c          	mov    0x1c(%esp),%eax
  ac:	89 04 24             	mov    %eax,(%esp)
  af:	e8 fc ff ff ff       	call   b0 <main+0xb0>
  b4:	b8 00 00 00 00       	mov    $0x0,%eax
  b9:	c9                   	leave
  ba:	c3                   	ret

Given the information above, let me show how the ELF format is parsed, based on the source code [6].

0x00-0x33: ELF file header

The ELF file header occupies the first 52 bytes of the object file. Its data structure is as follows:

typedef struct {
	unsigned char	e_ident[EI_NIDENT];	/* File identification. */
	Elf32_Half	e_type;		/* File type. */
	Elf32_Half	e_machine;	/* Machine architecture. */
	Elf32_Word	e_version;	/* ELF format version. */
	Elf32_Addr	e_entry;	/* Entry point. */
	Elf32_Off	e_phoff;	/* Program header file offset. */
	Elf32_Off	e_shoff;	/* Section header file offset. */
	Elf32_Word	e_flags;	/* Architecture-specific flags. */
	Elf32_Half	e_ehsize;	/* Size of ELF header in bytes. */
	Elf32_Half	e_phentsize;	/* Size of program header entry. */
	Elf32_Half	e_phnum;	/* Number of program header entries. */
	Elf32_Half	e_shentsize;	/* Size of section header entry. */
	Elf32_Half	e_shnum;	/* Number of section header entries. */
	Elf32_Half	e_shstrndx;	/* Section name strings section. */
} Elf32_Ehdr;

From the hex dump of the object file, the following information can be retrieved:

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          352 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         10
  Section header string table index: 7

The section header table starts at offset 0x160 (352). Before reaching it, I need to explain the data between 0x34 and 0x160.

0x34-0xef: Text Data

The data following the ELF header is the machine code of the function. This is easy to verify by comparing the hex dump of the object file with the output of the objdump command.

0xf0-0x115: Comment Section

The compiler usually fills this section with information about itself, here the GCC version string.

0x116-0x15f: Section Header String Table

This section is used to resolve the name of each entry in the section header table. How a section name is resolved is discussed next.

0x160-0x2ef: Section Header Table

The section header table contains the necessary information for each section in the ELF file. Each entry has the following data structure:

typedef struct {
	Elf32_Word	sh_name;	/* Section name (index into the
					   section header string table). */
	Elf32_Word	sh_type;	/* Section type. */
	Elf32_Word	sh_flags;	/* Section flags. */
	Elf32_Addr	sh_addr;	/* Address in memory image. */
	Elf32_Off	sh_offset;	/* Offset in file. */
	Elf32_Word	sh_size;	/* Size in bytes. */
	Elf32_Word	sh_link;	/* Index of a related section. */
	Elf32_Word	sh_info;	/* Depends on section type. */
	Elf32_Word	sh_addralign;	/* Alignment in bytes. */
	Elf32_Word	sh_entsize;	/* Size of each entry in section. */
} Elf32_Shdr;

Using readelf, we can view the information in the section header table as below.

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000034 0000bb 00  AX  0   0  4
  [ 2] .rel.text         REL             00000000 0003dc 000048 08      8   1  4
  [ 3] .data             PROGBITS        00000000 0000f0 000000 00  WA  0   0  4
  [ 4] .bss              NOBITS          00000000 0000f0 000000 00  WA  0   0  4
  [ 5] .comment          PROGBITS        00000000 0000f0 000026 01  MS  0   0  1
  [ 6] .note.GNU-stack   PROGBITS        00000000 000116 000000 00      0   0  1
  [ 7] .shstrtab         STRTAB          00000000 000116 000049 00      0   0  1
  [ 8] .symtab           SYMTAB          00000000 0002f0 0000c0 10      9   7  4
  [ 9] .strtab           STRTAB          00000000 0003b0 000029 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

The size of a single section header entry is 0x28 bytes. Let's use the following dumped bytes to show how a section entry is decoded.

0x000001b0: 1b 00 00 00 09 00 00 00 - 00 00 00 00 00 00 00 00 [@@@I@@@@@@@@@@@
0x000001c0: dc 03 00 00 48 00 00 00 - 08 00 00 00 01 00 00 00 .C@@H@@@H@@@A@@@
0x000001d0: 04 00 00 00 08 00 00 00

The offset of the section name in the Section Header String Table is 0x1b, so the name of the current section is the string at 0x116+0x1b, i.e. “.rel.text”.
The type of this section is 0x9. According to the macros defined in elf.h, this means the section holds relocation entries without addends.

/* Legal values for sh_type (section type).  */
#define SHT_NULL	  0		/* Section header table entry unused */
#define SHT_PROGBITS	  1		/* Program data */
#define SHT_SYMTAB	  2		/* Symbol table */
#define SHT_STRTAB	  3		/* String table */
#define SHT_RELA	  4		/* Relocation entries with addends */
#define SHT_HASH	  5		/* Symbol hash table */
#define SHT_DYNAMIC	  6		/* Dynamic linking information */
#define SHT_NOTE	  7		/* Notes */
#define SHT_NOBITS	  8		/* Program space with no data (bss) */
#define SHT_REL		  9		/* Relocation entries, no addends */
#define SHT_SHLIB	  10		/* Reserved */
#define SHT_DYNSYM	  11		/* Dynamic linker symbol table */
#define SHT_INIT_ARRAY	  14		/* Array of constructors */
#define SHT_FINI_ARRAY	  15		/* Array of destructors */
#define SHT_PREINIT_ARRAY 16		/* Array of pre-constructors */
#define SHT_GROUP	  17		/* Section group */
#define SHT_SYMTAB_SHNDX  18		/* Extended section indeces */
#define	SHT_NUM		  19		/* Number of defined types.  */

The remaining fields of the section header entry are decoded in the same way.

0x2f0-0x3af: Symbol Table and 0x3b0-0x3db: Symbol String Table

As the name implies, the symbol table stores the symbol information of the ELF file. The Symbol String Table stores the strings used to resolve the name of each entry in the symbol table.

Symbol table '.symtab' contains 12 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS test1.c
     2: 00000000     0 SECTION LOCAL  DEFAULT    1
     3: 00000000     0 SECTION LOCAL  DEFAULT    3
     4: 00000000     0 SECTION LOCAL  DEFAULT    4
     5: 00000000     0 SECTION LOCAL  DEFAULT    6
     6: 00000000     0 SECTION LOCAL  DEFAULT    5
     7: 00000000   187 FUNC    GLOBAL DEFAULT    1 main
     8: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND malloc
     9: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND strcpy
    10: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND strtoul
    11: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND free

A symbol table entry is defined as follows:

typedef struct {
	Elf32_Word	st_name;	/* String table index of name. */
	Elf32_Addr	st_value;	/* Symbol value. */
	Elf32_Word	st_size;	/* Size of associated object. */
	unsigned char	st_info;	/* Type and binding information. */
	unsigned char	st_other;	/* Reserved (not used). */
	Elf32_Half	st_shndx;	/* Section index of symbol. */
} Elf32_Sym;

#define ELF32_ST_BIND(val)		(((unsigned char) (val)) >> 4)
#define ELF32_ST_TYPE(val)		((val) & 0xf)
#define ELF32_ST_INFO(bind, type)	(((bind) << 4) + ((type) & 0xf))

0x3dc-0x423: Relocation Table

At this stage, the information contained in the relocation table is not important. Each entry records a location in .text that the linker must patch once the address of the referenced symbol is known; here these locations are the operands of the call instructions.

Relocation section '.rel.text' at offset 0x3dc contains 9 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0000001c  00000802 R_386_PC32        00000000   malloc
00000038  00000902 R_386_PC32        00000000   strcpy
00000058  00000a02 R_386_PC32        00000000   strtoul
00000060  00000802 R_386_PC32        00000000   malloc
00000070  00000802 R_386_PC32        00000000   malloc
0000008c  00000902 R_386_PC32        00000000   strcpy
00000098  00000b02 R_386_PC32        00000000   free
000000a4  00000b02 R_386_PC32        00000000   free
000000b0  00000b02 R_386_PC32        00000000   free

Executable File

An executable file contains much more information than an object file. Because of the length limit of this post, I will not show the whole hex dump. Instead, I will discuss parts of it section by section.

ELF Header

As before, we first display the ELF header of the executable file:

hex dump info:
0x00000000: 7f 45 4c 46 01 01 01 00 - 00 00 00 00 00 00 00 00 ?ELFAAA@@@@@@@@@
0x00000010: 02 00 03 00 01 00 00 00 - d0 83 04 08 34 00 00 00 B@C@A@@@..DH4@@@
0x00000020: 34 11 00 00 00 00 00 00 - 34 00 20 00 08 00 28 00 4Q@@@@@@4@ @H@(@
0x00000030: 1e 00 1b 00

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x80483d0
  Start of program headers:          52 (bytes into file)
  Start of section headers:          4404 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         8
  Size of section headers:           40 (bytes)
  Number of section headers:         30
  Section header string table index: 27

Section Header

The section header table can then be parsed as discussed above.

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        08048134 000134 000013 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            08048148 000148 000020 00   A  0   0  4
  [ 3] .note.gnu.build-i NOTE            08048168 000168 000024 00   A  0   0  4
  [ 4] .hash             HASH            0804818c 00018c 000034 04   A  6   0  4
  [ 5] .gnu.hash         GNU_HASH        080481c0 0001c0 000020 04   A  6   0  4
  [ 6] .dynsym           DYNSYM          080481e0 0001e0 000080 10   A  7   1  4
  [ 7] .dynstr           STRTAB          08048260 000260 000060 00   A  0   0  1
  [ 8] .gnu.version      VERSYM          080482c0 0002c0 000010 02   A  6   0  2
  [ 9] .gnu.version_r    VERNEED         080482d0 0002d0 000020 00   A  7   1  4
  [10] .rel.dyn          REL             080482f0 0002f0 000008 08   A  6   0  4
  [11] .rel.plt          REL             080482f8 0002f8 000030 08   A  6  13  4
  [12] .init             PROGBITS        08048328 000328 000030 00  AX  0   0  4
  [13] .plt              PROGBITS        08048358 000358 000070 04  AX  0   0  4
  [14] .text             PROGBITS        080483d0 0003d0 00020c 00  AX  0   0 16
  [15] .fini             PROGBITS        080485dc 0005dc 00001c 00  AX  0   0  4
  [16] .rodata           PROGBITS        080485f8 0005f8 000008 00   A  0   0  4
  [17] .eh_frame         PROGBITS        08048600 000600 000004 00   A  0   0  4
  [18] .ctors            PROGBITS        08049f0c 000f0c 000008 00  WA  0   0  4
  [19] .dtors            PROGBITS        08049f14 000f14 000008 00  WA  0   0  4
  [20] .jcr              PROGBITS        08049f1c 000f1c 000004 00  WA  0   0  4
  [21] .dynamic          DYNAMIC         08049f20 000f20 0000d0 08  WA  7   0  4
  [22] .got              PROGBITS        08049ff0 000ff0 000004 04  WA  0   0  4
  [23] .got.plt          PROGBITS        08049ff4 000ff4 000024 04  WA  0   0  4
  [24] .data             PROGBITS        0804a018 001018 000008 00  WA  0   0  4
  [25] .bss              NOBITS          0804a020 001020 000008 00  WA  0   0  4
  [26] .comment          PROGBITS        00000000 001020 000025 01  MS  0   0  1
  [27] .shstrtab         STRTAB          00000000 001045 0000ee 00      0   0  1
  [28] .symtab           SYMTAB          00000000 0015e4 000440 10     29  45  4
  [29] .strtab           STRTAB          00000000 001a24 000232 00      0   0  1

Program Segment

The program header table describes how the sections of the executable file are grouped into segments and mapped into memory. Each entry has the following structure:

typedef struct
{
  Elf32_Word	p_type;			/* Segment type */
  Elf32_Off	p_offset;		/* Segment file offset */
  Elf32_Addr	p_vaddr;		/* Segment virtual address */
  Elf32_Addr	p_paddr;		/* Segment physical address */
  Elf32_Word	p_filesz;		/* Segment size in file */
  Elf32_Word	p_memsz;		/* Segment size in memory */
  Elf32_Word	p_flags;		/* Segment flags */
  Elf32_Word	p_align;		/* Segment alignment */
} Elf32_Phdr;

Here, please pay attention to the memory flags:
R: Readable
W: Writable
E: Executable
With this mapping we can check the access permissions of any memory address.

0x00000034:             06 00 00 00 - 34 00 00 00 34 80 04 08 ^@[@F@@@4@@@4.DH
0x00000040: 34 80 04 08 00 01 00 00 - 00 01 00 00 05 00 00 00 4.DH@A@@@A@@E@@@
0x00000050: 04 00 00 00 03 00 00 00 - 34 01 00 00 34 81 04 08 D@@@C@@@4A@@4.DH
0x00000060: 34 81 04 08 13 00 00 00 - 13 00 00 00 04 00 00 00 4.DHS@@@S@@@D@@@
0x00000070: 01 00 00 00 01 00 00 00 - 00 00 00 00 00 80 04 08 A@@@A@@@@@@@@.DH
0x00000080: 00 80 04 08 04 06 00 00 - 04 06 00 00 05 00 00 00 @.DHDF@@DF@@E@@@
0x00000090: 00 10 00 00 01 00 00 00 - 0c 0f 00 00 0c 9f 04 08 @P@@A@@@LO@@L.DH
0x000000a0: 0c 9f 04 08 14 01 00 00 - 1c 01 00 00 06 00 00 00 L.DHTA@@\A@@F@@@
0x000000b0: 00 10 00 00 02 00 00 00 - 20 0f 00 00 20 9f 04 08 @P@@B@@@ O@@ .DH
0x000000c0: 20 9f 04 08 d0 00 00 00 - d0 00 00 00 06 00 00 00  .DH.@@@.@@@F@@@
0x000000d0: 04 00 00 00 04 00 00 00 - 48 01 00 00 48 81 04 08 D@@@D@@@HA@@H.DH
0x000000e0: 48 81 04 08 44 00 00 00 - 44 00 00 00 04 00 00 00 H.DHD@@@D@@@D@@@
0x000000f0: 04 00 00 00 51 e5 74 64 - 00 00 00 00 00 00 00 00 D@@@Q.td@@@@@@@@
0x00000100: 00 00 00 00 00 00 00 00 - 00 00 00 00 06 00 00 00 @@@@@@@@@@@@F@@@
0x00000110: 04 00 00 00 52 e5 74 64 - 0c 0f 00 00 0c 9f 04 08 D@@@R.tdLO@@L.DH
0x00000120: 0c 9f 04 08 f4 00 00 00 - f4 00 00 00 04 00 00 00 L.DH.@@@.@@@D@@@
0x00000130: 01 00 00 00                                       A@@@

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x00100 0x00100 R E 0x4
  INTERP         0x000134 0x08048134 0x08048134 0x00013 0x00013 R   0x1
  LOAD           0x000000 0x08048000 0x08048000 0x00604 0x00604 R E 0x1000
  LOAD           0x000f0c 0x08049f0c 0x08049f0c 0x00114 0x0011c RW  0x1000
  DYNAMIC        0x000f20 0x08049f20 0x08049f20 0x000d0 0x000d0 RW  0x4
  NOTE           0x000148 0x08048148 0x08048148 0x00044 0x00044 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4
  GNU_RELRO      0x000f0c 0x08049f0c 0x08049f0c 0x000f4 0x000f4 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .note.gnu.build-id .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag .note.gnu.build-id
   06
   07     .ctors .dtors .jcr .dynamic .got

Relocation Table and Dynamic Link

Here we finally reach the key part of this post. We will discuss .rel.plt, .plt and .got.plt, and how they are used in exploit development.
0x80482f8-0x8048327: .rel.plt section
The .rel.plt table has one entry per unresolved function. Each entry gives the address of the .got.plt slot (0x804a000-0x804a017) that will receive the function's address.

0x000002f0:                           00 a0 04 08 07 01 00 00 ..DHFA@@@.DHGA@@
0x00000300: 04 a0 04 08 07 02 00 00 - 08 a0 04 08 07 03 00 00 D.DHGB@@H.DHGC@@
0x00000310: 0c a0 04 08 07 04 00 00 - 10 a0 04 08 07 05 00 00 L.DHGD@@P.DHGE@@
0x00000320: 14 a0 04 08 07 06 00 00                           T.DHGF@@U..S..D.

Relocation section '.rel.plt' at offset 0x2f8 contains 6 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0804a000  00000107 R_386_JUMP_SLOT   00000000   __gmon_start__
0804a004  00000207 R_386_JUMP_SLOT   00000000   __libc_start_main
0804a008  00000307 R_386_JUMP_SLOT   00000000   free
0804a00c  00000407 R_386_JUMP_SLOT   00000000   strtoul
0804a010  00000507 R_386_JUMP_SLOT   00000000   strcpy
0804a014  00000607 R_386_JUMP_SLOT   00000000   malloc

0x8048358-0x80483c7: .plt section
.plt contains the code stubs that are used for dynamic linking. At the assembly level it looks as follows:

08048358 <__gmon_start__@plt-0x10>:
 8048358:	ff 35 f8 9f 04 08    	pushl  0x8049ff8
 804835e:	ff 25 fc 9f 04 08    	jmp    *0x8049ffc
 8048364:	00 00                	add    %al,(%eax)

08048368 <__gmon_start__@plt>:
 8048368:	ff 25 00 a0 04 08    	jmp    *0x804a000
 804836e:	68 00 00 00 00       	push   $0x0
 8048373:	e9 e0 ff ff ff       	jmp    8048358 <_init+0x30>

08048378 <__libc_start_main@plt>:
 8048378:	ff 25 04 a0 04 08    	jmp    *0x804a004
 804837e:	68 08 00 00 00       	push   $0x8
 8048383:	e9 d0 ff ff ff       	jmp    8048358 <_init+0x30>

08048388 <free@plt>:
 8048388:	ff 25 08 a0 04 08    	jmp    *0x804a008
 804838e:	68 10 00 00 00       	push   $0x10
 8048393:	e9 c0 ff ff ff       	jmp    8048358 <_init+0x30>

08048398 <strtoul@plt>:
 8048398:	ff 25 0c a0 04 08    	jmp    *0x804a00c
 804839e:	68 18 00 00 00       	push   $0x18
 80483a3:	e9 b0 ff ff ff       	jmp    8048358 <_init+0x30>

080483a8 <strcpy@plt>:
 80483a8:	ff 25 10 a0 04 08    	jmp    *0x804a010
 80483ae:	68 20 00 00 00       	push   $0x20
 80483b3:	e9 a0 ff ff ff       	jmp    8048358 <_init+0x30>

080483b8 <malloc@plt>:
 80483b8:	ff 25 14 a0 04 08    	jmp    *0x804a014
 80483be:	68 28 00 00 00       	push   $0x28
 80483c3:	e9 90 ff ff ff       	jmp    8048358 <_init+0x30>

0x8049ff4-0x804a017: .got.plt section
To understand how the values in .got.plt are used by the target program, let’s observe how they change during execution.

//At the beginning of program:
(gdb) x/8x 0x804a000
0x804a000:	0x0804836e	0x00144b10	0x0804838e	0x0804839e
0x804a010:	0x080483ae	0x080483be	0x00000000	0x00000000

//After the first malloc is called:
//the value at 0x804a014 is changed to the address of malloc in libc
(gdb) x/8x 0x804a000
0x804a000:	0x0804836e	0x00144b10	0x0804838e	0x0804839e
0x804a010:	0x080483ae	0x001a0ae0	0x00000000	0x00000000

//At the end of program:
//the values at 0x804a008, 0x804a00c and 0x804a010 have all changed as well.
(gdb) x/8x 0x804a000
0x804a000:	0x0804836e	0x00144b10	0x001a0a00	0x0015e310
0x804a010:	0x001a3e00	0x001a0ae0	0x00000000	0x00000000

Lazy Binding

The process above is called lazy binding: the address of a target function is resolved at runtime, on its first call, rather than at load time, which improves startup performance. To explain the full process of symbol resolution, we take malloc as an example below.

//when malloc() is called for the first time:
804849f:	e8 14 ff ff ff       	call   80483b8 <malloc@plt>

//the control flow diverts to 0x80483b8, which is located in .plt
80483b8:	ff 25 14 a0 04 08    	jmp    *0x804a014
80483be:	68 28 00 00 00       	push   $0x28
80483c3:	e9 90 ff ff ff       	jmp    8048358 <_init+0x30>

//at this point it jumps to the address stored at 0x804a014, which is located in .got.plt
0x804a000:	0x0804836e	0x00144b10	0x0804838e	0x0804839e
0x804a010:	0x080483ae	0x080483be	0x00000000	0x00000000

The data at 0x804a014 is 0x80483be, so the control flow jumps to 0x80483be.
0x28 is the byte offset of malloc's entry in .rel.plt (entry 5, at 8 bytes per entry).
Control then enters the routine that resolves the function symbol:

(gdb) x/2i 0x8048358
   0x8048358:	pushl  0x8049ff8
   0x804835e:	jmp    *0x8049ffc

(gdb) x/4wx 0x8049ffc
0x8049ffc:	0x00123270	0x0804836e	0x00144b10	0x001a0a00

(gdb) x/11i 0x123270
   0x123270 <_dl_runtime_resolve>:	push   %eax
   0x123271 <_dl_runtime_resolve+1>:	push   %ecx
   0x123272 <_dl_runtime_resolve+2>:	push   %edx
   0x123273 <_dl_runtime_resolve+3>:	mov    0x10(%esp),%edx
   0x123277 <_dl_runtime_resolve+7>:	mov    0xc(%esp),%eax
   0x12327b <_dl_runtime_resolve+11>:	call   0x11d5a0 <_dl_fixup>
   0x123280 <_dl_runtime_resolve+16>:	pop    %edx
   0x123281 <_dl_runtime_resolve+17>:	mov    (%esp),%ecx
   0x123284 <_dl_runtime_resolve+20>:	mov    %eax,(%esp)
   0x123287 <_dl_runtime_resolve+23>:	mov    0x4(%esp),%eax
   0x12328b <_dl_runtime_resolve+27>:	ret    $0xc

Finally, _dl_runtime_resolve is called; through _dl_fixup it resolves the symbol and updates the value stored at 0x804a014.

GOT Table Hijacking

Given all the information above, we can now discuss how lazy binding is used in exploitation.
Recall the program header table. The .got.plt section lies in a segment that is readable and writable. So if an attacker can corrupt an entry in .got.plt, the control flow is hijacked the next time the program calls the corresponding function.
To prevent an attacker from hijacking control flow by corrupting .got.plt, we can build the executable with the option “-Wl,-z,relro,-z,now”. This makes the dynamic linker resolve all functions at load time and marks the .got.plt section read-only.

Conclusion

It took a lot of time and effort to explain the details of the ELF format in this post. For a tutorial on an exploitation challenge, the last part alone would be enough, but I still wanted to give more detail on how an ELF file is parsed. Based on this information, we may discuss more exploitation techniques in the future 🙂

Reference

[1] http://www.iecc.com/linker/
[2] http://www.xfocus.net/articles/200201/337.html
[3] http://flint.cs.yale.edu/cs422/doc/ELF_Format.pdf
[4] https://polimicg.org/pulp/git/pulp-public/eld.git/tree/
[5] http://blog.fpmurphy.com/2008/06/position-independent-executables.html
[6] http://repo.or.cz/w/glibc.git/blob/HEAD:/include/link.h
