Linux For Embedded Systems
ForArabs
Ahmed ElArabawy
Cairo University
Computer Eng. Dept.
CMP445-Embedded Systems
Lecture 14:
Introduction to the Toolchain
Part 2: Binary Utilities
Binary Files
• What is a binary file?
• A binary file is the machine language instructions that should be executed on the target
• This is not an accurate or complete answer:
• Executable files are just one type of binary file. Not all binary files can execute on the target
• Machine language instructions are just one component of a binary file. Other components exist
• So what would be a more accurate answer?
Binary File Types
1. Relocatable Object Files (*.o)
• A relocatable object file is the output of compiling/assembling a single source file
• This file cannot execute, since it has some unresolved symbols and needs to be linked with other object files or libraries
• Even if the file does not use any unresolved symbols, it is still not ready to execute on the target
• Why is it called relocatable?
• The components of each object file are laid out starting at an address that is independent of other object files (for example, they can all start at address 0x00000000)
• The addresses of all symbols inside the object file are offsets from that start address
• This applies both to the machine language instructions and to the other components of the object file
• It is the job of the linker to relocate these components to other addresses when combining multiple object files (so they don't overlap in memory)
• The relocation process performed by the linker includes:
• Merging the code and data sections of the object files
• Moving the start address of the resulting sections to an address suitable for the target or the OS
• Patching every reference to a symbol (in code or data) with the symbol's final address
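The merge-and-relocate step can be illustrated with a toy model. The C sketch below is not how GNU ld works internally: it assumes a made-up struct (`toy_object`) that records only each object file's .text size and the offset of one symbol, and it ignores alignment and instruction patching. It shows how symbols that start as offsets from 0 in every object end up at distinct absolute addresses once the sections are placed back to back from a chosen base address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified view of one relocatable object file:
 * only the size of its .text section and the offset of one symbol inside
 * it. As in a real .o file, the offset is relative to the section start (0). */
struct toy_object {
    const char *name;       /* e.g. "main.o" (for readability only) */
    uint32_t text_size;     /* size of this object's .text, in bytes */
    uint32_t symbol_offset; /* symbol position relative to its .text start */
};

/* Place the .text sections of `count` objects back to back starting at
 * `base`, and store the final absolute address of each object's symbol in
 * out_addr[i]. Returns the size of the merged .text section.
 * Alignment and the patching of instructions are deliberately ignored. */
uint32_t toy_link_text(const struct toy_object *objs, size_t count,
                       uint32_t base, uint32_t *out_addr)
{
    assert(objs != NULL && out_addr != NULL);

    uint32_t next = base;  /* address where the next .text section goes */
    for (size_t i = 0; i < count; i++) {
        /* Relocation: final address = where the section landed + offset */
        out_addr[i] = next + objs[i].symbol_offset;
        /* Merging: the next object's code follows this one, no overlap */
        next += objs[i].text_size;
    }
    return next - base;
}
```

For example, with three hypothetical objects whose .text sizes are 0x40, 0x20 and 0x30 bytes and a base address of 0x8000, the second object's .text starts at 0x8040 and the third at 0x8060, so a symbol at offset 0x8 in the third object ends up at 0x8068.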
Embedded Systems: Lecture 14: Introduction to GNU Toolchain (Binary Utilities)
Binary File Types
2. Executable Binary Files
• This is the outcome of the linking process of multiple object
files as well as (static/dynamic) libraries
• This file has all of its symbols either resolved, or pointing to
some shared objects (to be resolved at load/run time)
• The code and data sections of this file are the result of merging the code/data sections of the object (*.o) and archive (*.a) files used to generate it
• This file is placed on the target storage and is ready to execute on the target
Binary File Types
3. Shared Object Files (*.so)
• This is the outcome of the linking process of multiple object
files to form a dynamic library file
• Since it is produced by the linker, the sections of its object files are merged and relocated (same as in an executable)
• One shared object file may rely on symbols in other shared object files
• This file is loaded on the target by the dynamic linker (ld-linux.so) whenever an executable needs it
• Shared object files are normally not run on their own (they typically have no entry function such as main()); their code is called by executables
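To see the dynamic linker's work in a running program, one option on Linux is to read /proc/self/maps, which lists the files mapped into the process. The sketch below assumes a Linux target with /proc mounted and a dynamically linked program; the function name is hypothetical. Each shared object usually appears several times, once per mapped segment.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Print and count the mappings of the current process that come from a
 * file whose path contains ".so" (e.g. libc.so.6 or the dynamic linker).
 * Linux-specific. Returns -1 if /proc/self/maps cannot be opened.
 * Very long lines would be read in pieces; acceptable for a sketch. */
int count_shared_object_mappings(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[4096];
    int count = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        if (strstr(line, ".so") != NULL) {
            fputs(line, stdout);  /* show which shared object was mapped */
            count++;
        }
    }
    fclose(maps);
    return count;
}
```

A statically linked program would report no such mappings, since no shared objects are loaded for it.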
Binary File Types
4. Core Dump Files
• A core dump file (also called core file) is a binary file that is
automatically generated by the Linux kernel when the
executable faces a fatal problem that causes it to exit abruptly
(crash)
• For example, it is generated:
• When the executable has a segmentation fault (illegal memory access)
• When the executable calls the abort() function
• Upon other faults, such as arithmetic exceptions (SIGFPE, e.g. integer division by zero on x86)
• It records the state of the program at the point of the crash (memory contents, registers, …), from which a debugger can reconstruct variables and the stack trace
• It is useful for analyzing crashes offline: the core file can be copied from the target to the host machine and debugged there, together with the executable that produced it
Controlling Core File Generation
• In some cases, core files are very useful for debugging faults that cannot be debugged on the target
• In other cases, core file generation is not desired, to avoid filling the storage media with core files
• The following commands control the generation of core files (the ulimit setting applies to the current shell and the programs started from it):
• To enable core file generation
$ ulimit -c unlimited
• To disable core file generation
$ ulimit -c 0
• To check the current setting
$ ulimit -c
• The name and location of core files are controlled by /proc/sys/kernel/core_pattern
• To set the location where core files are generated (make sure the program has write access to that directory; %e expands to the executable name and %p to the process ID):
$ echo '/home/user/cores/core_%e.%p' | sudo tee /proc/sys/kernel/core_pattern
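ulimit is a shell built-in; a program can apply the same per-process limit to itself through the POSIX getrlimit()/setrlimit() calls with RLIMIT_CORE. A minimal sketch, assuming a Linux/POSIX target; the helper names are made up. Note that RLIMIT_CORE is expressed in bytes, while bash's ulimit -c reports 1024-byte blocks, and that an unprivileged process can only raise its soft limit up to the hard limit.

```c
#include <assert.h>
#include <sys/resource.h>

/* Roughly `ulimit -c 0`: set the soft core-file size limit of this process
 * (inherited by its children) to zero. Returns 0 on success, -1 on error. */
int disable_core_dumps(void)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = 0;             /* soft limit: no core files */
    return setrlimit(RLIMIT_CORE, &lim);
}

/* Roughly `ulimit -c unlimited`: raise the soft limit as far as the hard
 * limit allows (RLIM_INFINITY if the hard limit is unlimited). */
int enable_core_dumps(void)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;  /* cannot exceed the hard limit */
    return setrlimit(RLIMIT_CORE, &lim);
}

/* Roughly `ulimit -c`: store the current soft limit (in bytes) in *out. */
int current_core_limit(rlim_t *out)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    *out = lim.rlim_cur;
    return 0;
}
```

The location of the generated core file is still decided by /proc/sys/kernel/core_pattern; these calls only control whether (and how large) a core file may be written.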
Binary File Components
• A binary file contains many blocks of information:
• The code section (also called the .text section):
• These are the machine language instructions that the assembler generates during the build process
• Each object file (*.o) contains its own .text section
• The linker merges the .text sections of the object files
• The .text section has references to symbols (such as the data it uses, or the functions it needs to jump to)
• A group of data sections:
• The .data section, which contains global/static data initialized in the code (to values other than zero)
• The .bss section, which contains global/static data that is not initialized or is initialized to zero
• The .rodata section, which contains read-only data (such as the string literals passed to printf)
• A symbol table section that contains a table of variables and
functions defined or used in the code
• A group of debug info sections for use by debuggers
• Other sections
More on Data Sections
• What is the difference between .bss and .data sections?
• When the program is loaded into memory, its global and static data are also loaded from storage into RAM before execution starts
• If the data was initialized in the program, the object file needs to store the initial value in the .data section
• If the data is not initialized, it is set to zero at load time
• Since that zeroing happens at load time, there is no need to carry the zero values in the object file (there is no point in storing a block that is all zeros)
• Accordingly, the .bss section occupies no space in the file and is used only as a placeholder: it has a start address and a length to reserve the space in memory at load time, but no contents in the object file (which is what gets stored in Flash)
• Note that data explicitly initialized to zero in the program is treated like uninitialized data (since all of it is set to zero by the loader)
• What about local data? Where is it located?
• Local (non-static) variables are created at run time on the stack
• The stack is not part of the object file stored in Flash; it is created directly in the target RAM when the program is loaded into memory
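The C translation unit below is annotated with the section each object is expected to land in, assuming GCC on an ELF target with default options. Placement can differ with other compilers or flags (for example, with -fcommon an uninitialized global first goes to a COMMON block that the linker later places in .bss). The variable and function names are illustrative only. The zero values seen at startup are guaranteed by the C language for static storage, which is exactly what the .bss mechanism provides.

```c
#include <assert.h>

/* Expected placement with GCC on a typical ELF target (may vary): */
int counter = 5;               /* initialized, non-zero -> .data        */
int total;                     /* not initialized       -> .bss         */
static int hits = 0;           /* initialized to zero   -> .bss         */
const char banner[] = "hello"; /* read-only data        -> .rodata      */

/* Returns 1*1 + 2*2 + ... + n*n and updates the static counters. */
int sum_squares(int n)
{
    int acc = 0;               /* local: created on the stack at run time,
                                  not stored in any section of the file */
    for (int i = 1; i <= n; i++)
        acc += i * i;
    hits++;                    /* static storage keeps its value between calls */
    total += acc;
    return acc;
}
```

Compiling it with gcc -c and running readelf -S or size on the object file is one way to check these annotations on a given toolchain.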
[Figure: Memory Map Example (For Simple Embedded Systems)]
Conclusion:
• Binary files are of different types and perform different roles
• Binary files are not just a bunch of machine language
instructions ready for execution
• There is a lot of information inside a binary file (code, data, debug info, tables, …)
• Because of all that, we need a clean, extensible, and flexible format that carries all of this information in one file and makes it easy to use
• Different OSs use different file formats to carry this
information:
• Unix introduced the COFF (Common Object File Format) file
format
• Windows uses the PE (Portable Executable) file format
• Linux uses the ELF (Executable and Linkable Format) file format
ELF File Format
What is an ELF File
• ELF stands for “Executable and Linkable Format”
• ELF is a file format used for:
• Relocatable object files
• Executable files
• Shared library files
• Core dump files
• ELF files are extensible and have many optional fields
• They are not specific to any processor architecture
• Used in Unix and Linux operating systems (and adopted by other OSs as well)
• Generated by GNU GCC and used throughout the GNU toolchain
ELF File Layout
• An ELF file contains the following,
• An ELF Header at the beginning of the file. This header describes
high-level attributes of the file and the target processor, such as:
• Type of file (object, executable, core, shared object)
• Target processor (ARM, x86, x86-64, …)
• Little-endian/big-endian format
• 32/64-bit format
• It also holds pointers (file offsets) to the other parts of the file
• A program header table that points to zero or more segments
• A section header table that describes zero or more sections
• A group of segments and sections pointed to by the two tables
What is a Segment
• A segment is part of the executable image for the program
• It is mainly relevant to executable files (and shared objects), not to relocatable object files
• It is needed at run time: the loader uses the segments to map the file into memory
• There are 3 major types of segments:
• Text segment
• Contains the binary code of the executable (the program instructions)
• Data segment
• Contains the program variables that are initialized in the program with non-zero values
• BSS segment
• Contains the program variables that are not initialized in the program (or are initialized to zero)
• In practice, the BSS part is usually the tail of the loadable data segment: it occupies memory but no space in the file
What is a Section
• Sections are used for linking and debugging purposes
• In relocatable object files, sections contain code and data, such as the .text, .data and .bss sections
• During the linking process, the linker builds the text, data, and BSS segments from these sections
• Other sections are not required for executing the program and are used for linking or debugging purposes
• Removing those sections will not affect running the executable, but may reduce debugging capabilities
• The main sections are:
• Symbol table section (.symtab): used for linking and for low-level debugging
• String table section (.strtab): holds the strings (mainly symbol names) referenced by the symbol table
• Section name string table section (.shstrtab): holds the names of the sections
• Debug info sections (multiple sections): present when the executable contains source-level debug info
• Tools such as the strip command can remove these sections from an ELF file
ELF File Layout
• The first 4 bytes are a magic number that identifies the file as an ELF file
• The magic number is 0x7F followed by the ASCII characters 'E', 'L', 'F', i.e. the bytes 7F 45 4C 46
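Checking those four bytes is the first thing an ELF reader does. A minimal C sketch of that check, with a hypothetical function name; the /proc/self/exe path used in the usage note is Linux-specific.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the file at `path` starts with the ELF magic number
 * 0x7F 'E' 'L' 'F', 0 if it does not, and -1 if it cannot be opened. */
int is_elf_file(const char *path)
{
    static const unsigned char elf_magic[4] = { 0x7F, 'E', 'L', 'F' };
    unsigned char head[4];

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(head, 1, sizeof head, f);
    fclose(f);

    /* A file shorter than 4 bytes cannot be an ELF file. */
    return n == sizeof head && memcmp(head, elf_magic, sizeof head) == 0;
}
```

On Linux, is_elf_file("/proc/self/exe") should return 1, since that path refers to the running program's own executable.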
ELF header fields:
Value   Word Size (5th byte of the file)
01      32-bit format
02      64-bit format
Value   Byte Order (6th byte of the file)
01      Little Endian
02      Big Endian
Value   File Type (2-byte field; shown as a 16-bit number, so in a little-endian file the two bytes appear in reverse order)
00 01   Relocatable Object
00 02   Executable
00 03   Shared Object
00 04   Core dump
Value   Target Processor (2-byte field; same byte-order note)
00 03   x86
00 14   PowerPC
00 28   ARM
00 3E   x86-64
Other header fields:
• Program entry address in memory
• Section header table location
• ELF header size
• Program header entry size
• Program header entry count
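The values in the table above can be decoded with a few lines of C. This is a minimal sketch, assuming only the standard ELF header layout (identification bytes 0 to 15, file type at offset 16, machine at offset 18, which is the same for 32-bit and 64-bit files); the struct and function names are made up, and on Linux the full official structures are available in <elf.h>. The key point it shows is that the 2-byte fields must be read in the byte order announced by the 6th byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A few fields of the ELF header, decoded from raw bytes. */
struct elf_summary {
    int is_64bit;       /* byte 4: 1 = 32-bit, 2 = 64-bit                    */
    int is_big_endian;  /* byte 5: 1 = little endian, 2 = big endian         */
    uint16_t type;      /* offset 16: 1 REL, 2 EXEC, 3 DYN (shared), 4 CORE  */
    uint16_t machine;   /* offset 18: 0x03 x86, 0x14 PowerPC, 0x28 ARM, ...  */
};

/* Read a 16-bit field in the byte order declared by the header itself. */
static uint16_t read_u16(const unsigned char *p, int big_endian)
{
    return big_endian ? (uint16_t)((p[0] << 8) | p[1])
                      : (uint16_t)((p[1] << 8) | p[0]);
}

/* Decode the first 20 bytes of an ELF header. Returns 0 on success, or -1
 * if the buffer is too short or the magic/class/byte-order bytes are invalid. */
int decode_elf_header(const unsigned char *buf, size_t len,
                      struct elf_summary *out)
{
    if (buf == NULL || out == NULL || len < 20)
        return -1;
    if (buf[0] != 0x7F || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;
    if ((buf[4] != 1 && buf[4] != 2) || (buf[5] != 1 && buf[5] != 2))
        return -1;

    out->is_64bit = (buf[4] == 2);
    out->is_big_endian = (buf[5] == 2);
    out->type = read_u16(buf + 16, out->is_big_endian);
    out->machine = read_u16(buf + 18, out->is_big_endian);
    return 0;
}
```

For example, a 64-bit little-endian x86-64 executable stores its file type as the bytes 02 00 and its machine as 3E 00, which this sketch decodes to 2 and 0x3E.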
GNU readelf
Read ELF Files
(readelf Command)
• This command is used to read the contents of an ELF file
• This includes:
• The main file header
• Section headers
• Sections
• Symbol table
• … etc.
• It can be used for:
• Resolving linking problems, such as "unresolved symbol" errors
• Debugging a crash
• Hacking an executable
• Reverse engineering a binary file
Read ELF File Main Header
$ readelf -h <ELF file>
Other Options for readelf
• To show the section headers
$ readelf -S <ELF file>
• To show the program headers (segments)
$ readelf -l <ELF file>
• To show the symbol table
$ readelf -s <ELF file>
• To show all the main information (headers, sections, segments, symbols, relocations, …)
$ readelf -a <ELF file>
GNU objdump
Reading the ELF File
(objdump Command)
$ objdump [options] <ELF file>
• This is another program for reading ELF files
• It has many uses, depending on the chosen options
• For example, it can be used for:
• Reading the ELF file header (readelf does a better job at that)
• Reading the section headers
• Disassembling the machine code into assembly
• Reading the symbol table(s)
• Reading the debug information
Showing the ELF File Header
$ objdump -f <ELF file>
• This shows the information in the main header of the ELF file
• It is similar to readelf -h (but shows less information)
Display Section Headers
$ objdump -h <ELF file>
Showing Assembly Code of an ELF
• objdump can show the assembly code for the program
• It disassembles every section of the ELF file that contains code
$ objdump -d my-binary
• To disassemble only a specific section, select it with the -j option:
$ objdump -j .text -d my-binary
• Keep in mind:
• You must use an objdump built for the same target architecture that the program was compiled for (e.g., the one shipped with your cross toolchain)
• If the processor can run in either little-endian or big-endian mode (such as ARM), you can force the byte order used for disassembly (this matters mainly for file formats that, unlike ELF, do not record their byte order):
$ objdump -EB -j .text -d my-binary (big endian)
$ objdump -EL -j .text -d my-binary (little endian)
GNU nm
List Symbols
(nm Command)
$ nm [options] <elf file>
• This command lists the symbols in the provided ELF file
• For each symbol, it presents,
• Virtual Address of the symbol
• Symbol type (Local, Global, Data, BSS, Undefined, … )
• Name of the symbol
• Size of the symbol (use the option -S)
The nm Command Usage
• To find the object files that use or define a certain symbol:
$ nm -A ./*.o | grep var_1
This command lists the symbol tables of all object files in the current directory, prefixing each line with the file name, and then keeps only the lines that mention the symbol we are looking for
• To list all undefined symbols in an ELF file:
$ nm -u my-obj-file
These undefined symbols must be resolved either at build time (static linking) or at load/run time (dynamic linking with a shared object)
• To list the dynamic symbols of an executable:
$ nm -D my-bin-file
GNU strings
List Strings in an ELF File
(strings Command)
$ strings [options] <ELF File>
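• This command prints the sequences of printable characters found in a file, which is mainly useful for non-text (binary) files
• By default, it reports sequences of 4 or more printable characters (the minimum can be changed with -n)
• Examples:
$ strings a.out (lists the strings found in the binary file)
$ strings -f /bin/* | grep "Copy"
This command searches all files in the /bin directory for strings containing "Copy" (such as copyright notices); the -f option prints the file name before each string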
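The core idea behind strings fits in a few lines of C. This is a simplified illustration, not the GNU implementation: it scans a memory buffer rather than a file, treats only isprint() bytes (in the default C locale) as printable, and the function name is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Print every run of at least `min_len` (>= 1) consecutive printable bytes
 * found in `data`, one per line, and return how many runs were printed. */
size_t print_strings(const unsigned char *data, size_t len, size_t min_len)
{
    size_t runs = 0;
    size_t start = 0;  /* index where the current printable run began */

    /* Loop one step past the end so a run touching the end is flushed. */
    for (size_t i = 0; i <= len; i++) {
        int printable = (i < len) && isprint(data[i]);
        if (printable)
            continue;
        if (i - start >= min_len) {
            printf("%.*s\n", (int)(i - start), (const char *)(data + start));
            runs++;
        }
        start = i + 1; /* the next run can only begin after this byte */
    }
    return runs;
}
```

With the default minimum of 4, a buffer containing "ab", a NUL byte, "hello", two non-printable bytes and "Copyright" yields two strings, "hello" and "Copyright"; the 2-character run is too short to be reported.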
GNU strip
Reduce ELF File Size
(strip Command)
$ strip [options] <object file>
• The strip command is used to reduce the size of the ELF file
• It works on both executables and object files
• It removes tables and sections that the binary can run without
• This is useful for:
• Reducing the Flash (storage) footprint, which is especially important for embedded systems
• Making the code harder to reverse engineer
Usage Examples
• To remove all symbols from an executable:
$ strip -s my-bin-file
• To remove only the debugging symbols:
$ strip --strip-debug my-bin-file
• To remove all symbols that are not needed for relocation processing:
$ strip --strip-unneeded my-bin-file
• To keep the original file and write the stripped copy to a new file:
$ strip -s -o stripped-file my-bin-file
GNU addr2line
addr2line Command
• To find the source file and line number for a specific address (the binary must contain debug info, e.g. built with gcc -g):
$ addr2line -e my-bin-file 0x400534
• This is useful when handling a crash: if we know the address of the instruction that crashed, we can find the source line that caused it
• To also print the name of the function containing that address:
$ addr2line -f -e my-bin-file 0x400534
GNU size
size Command
$ size [options] <ELF File>
• This command lists the sizes of the sections inside an ELF file (in its default format, as text, data and bss totals)
$ size my-bin-file
• On Linux-based target platforms:
• The whole ELF file is stored on the target
• Hence, the target Flash must accommodate the full size of the ELF file
• Accordingly, the size command is useful to find out the real size of each section inside the ELF file
• This helps us identify where to optimize:
• Is it the code (text section)?
• Is it the defined data (data and bss sections)?
• We may need to use the strip command to remove some of the optional sections to reduce the Flash size requirements
• On simpler platforms (bare-metal, or a simple RTOS):
• The Flash loader only loads the necessary parts onto the target
• The debug sections stay in the copy on the host machine for debugging purposes; the target only gets the code and the data sections
• Accordingly, using the strip command does not really help
• The size command becomes very important, since what matters is not the size of the ELF file but the size of some specific sections
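For such bare-metal targets, the numbers reported by size can be turned into a rough memory budget. A minimal C sketch, assuming the common startup scheme in which the initial values of .data are kept in Flash and copied to RAM at boot, while .bss only occupies (zeroed) RAM; the struct, the function names and the sample numbers are hypothetical, and stack and heap are not counted. It also shows that the dec column printed by size is simply text + data + bss.

```c
#include <assert.h>
#include <stdint.h>

/* The three numbers printed by `size` in its default (Berkeley) format. */
struct size_report {
    uint32_t text;  /* code plus read-only data                    */
    uint32_t data;  /* initialized read/write data                 */
    uint32_t bss;   /* zero-initialized data (no bytes in the file) */
};

/* Flash needed: the code plus the stored initial values of .data. */
uint32_t flash_bytes(const struct size_report *r)
{
    return r->text + r->data;
}

/* Static RAM needed: the .data copy plus the zeroed .bss area. */
uint32_t ram_bytes(const struct size_report *r)
{
    return r->data + r->bss;
}

/* The "dec" column of `size`: the sum of all three totals. */
uint32_t total_bytes(const struct size_report *r)
{
    return r->text + r->data + r->bss;
}
```

For example, a hypothetical report of text = 1000, data = 100 and bss = 400 bytes would need about 1100 bytes of Flash and 500 bytes of static RAM, while size would print 1500 in the dec column.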
http://Linux4EmbeddedSystems.com

More Related Content

PDF
Embedded Systems: Lecture 13: Introduction to GNU Toolchain (Build Tools)
PDF
Embedded Systems: Lecture 11: Introduction to Git & GitHub (Part 2)
PDF
Jagan Teki - U-boot from scratch
PPTX
System Booting Process overview
PDF
Introduction To Linux Kernel Modules
PDF
UEFI presentation
PPTX
Linux Initialization Process (1)
PDF
Embedded Systems: Lecture 7: Lab 1: Preparing the Raspberry Pi
Embedded Systems: Lecture 13: Introduction to GNU Toolchain (Build Tools)
Embedded Systems: Lecture 11: Introduction to Git & GitHub (Part 2)
Jagan Teki - U-boot from scratch
System Booting Process overview
Introduction To Linux Kernel Modules
UEFI presentation
Linux Initialization Process (1)
Embedded Systems: Lecture 7: Lab 1: Preparing the Raspberry Pi

What's hot (20)

PDF
Course 102: Lecture 9: Input Output Internals
PPT
U Boot or Universal Bootloader
PDF
Uboot startup sequence
PDF
U-Boot - An universal bootloader
PDF
Linux Internals - Part II
PDF
systemd
PDF
Android Things : Building Embedded Devices
PPT
U boot porting guide for SoC
PDF
PPTX
UEFI Spec Version 2.4 Facilitates Secure Update
PDF
Embedded Systems: Lecture 8: Lab 1: Building a Raspberry Pi Based WiFi AP
PPTX
Linux Boot Process
PPT
Kernel module programming
PPTX
Linux Initialization Process (2)
PDF
Embedded Systems: Lecture 1: Course Overview
PPTX
Linux System Programming - File I/O
PPTX
Linux device drivers
PPT
Linux Kernel Development
PDF
Course 102: Lecture 18: Process Life Cycle
Course 102: Lecture 9: Input Output Internals
U Boot or Universal Bootloader
Uboot startup sequence
U-Boot - An universal bootloader
Linux Internals - Part II
systemd
Android Things : Building Embedded Devices
U boot porting guide for SoC
UEFI Spec Version 2.4 Facilitates Secure Update
Embedded Systems: Lecture 8: Lab 1: Building a Raspberry Pi Based WiFi AP
Linux Boot Process
Kernel module programming
Linux Initialization Process (2)
Embedded Systems: Lecture 1: Course Overview
Linux System Programming - File I/O
Linux device drivers
Linux Kernel Development
Course 102: Lecture 18: Process Life Cycle
Ad

Viewers also liked (20)

PDF
Program Structure in GNU/Linux (ELF Format)
PDF
Embedded Systems: Lecture 12: Introduction to Git & GitHub (Part 3)
PDF
Embedded Systems: Lecture 4: Selecting the Proper RTOS
PDF
Embedded Systems: Lecture 7: Unwrapping the Raspberry Pi
PDF
C 102 lec_29_what_s_next
PPT
A hands-on introduction to the ELF Object file format
PDF
Embedded Systems: Lecture 6: Linux & GNU
PPTX
06 - ELF format, knowing your friend
PDF
Embedded Systems: Lecture 10: Introduction to Git & GitHub (Part 1)
PDF
Embedded Systems: Lecture 5: A Tour in RTOS Land
PDF
Embedded Systems: Lecture 2: Introduction to Embedded Systems
PDF
Course 102: Lecture 8: Composite Commands
PDF
Course 102: Lecture 5: File Handling Internals
PPT
嵌入式Linux課程-GNU Toolchain
PDF
ELF 101
PDF
HES2011 - James Oakley and Sergey bratus-Exploiting-the-Hard-Working-DWARF
PPT
Intro reverse engineering
PDF
Symbolic Debugging with DWARF
ODP
LD_PRELOAD Exploitation - DC9723
Program Structure in GNU/Linux (ELF Format)
Embedded Systems: Lecture 12: Introduction to Git & GitHub (Part 3)
Embedded Systems: Lecture 4: Selecting the Proper RTOS
Embedded Systems: Lecture 7: Unwrapping the Raspberry Pi
C 102 lec_29_what_s_next
A hands-on introduction to the ELF Object file format
Embedded Systems: Lecture 6: Linux & GNU
06 - ELF format, knowing your friend
Embedded Systems: Lecture 10: Introduction to Git & GitHub (Part 1)
Embedded Systems: Lecture 5: A Tour in RTOS Land
Embedded Systems: Lecture 2: Introduction to Embedded Systems
Course 102: Lecture 8: Composite Commands
Course 102: Lecture 5: File Handling Internals
嵌入式Linux課程-GNU Toolchain
ELF 101
HES2011 - James Oakley and Sergey bratus-Exploiting-the-Hard-Working-DWARF
Intro reverse engineering
Symbolic Debugging with DWARF
LD_PRELOAD Exploitation - DC9723
Ad

Similar to Embedded Systems: Lecture 14: Introduction to GNU Toolchain (Binary Utilities) (20)

PDF
Dynamic Linker
PDF
Compilation and Execution
PPTX
C++ shared libraries and loading
PPTX
Build process ppt.pptx
PPT
Purdue CS354 Operating Systems 2008
PDF
How to write shared libraries!
PPT
Ppt project process migration
PDF
Module-4 Program Design and Anyalysis.pdf
PPTX
Unit V.pptx
PDF
The walking 0xDEAD
PDF
sysprog2 Part2
PPT
PPTX
Revers engineering
PPTX
Linkers
PPT
bh-europe-01-clowes
PDF
The true story_of_hello_world
PPTX
ELF(executable and linkable format)
PPTX
Understanding how C program works
PDF
Compiler design notes phases of compiler
PPTX
Introduction to Linux Exploit Development
Dynamic Linker
Compilation and Execution
C++ shared libraries and loading
Build process ppt.pptx
Purdue CS354 Operating Systems 2008
How to write shared libraries!
Ppt project process migration
Module-4 Program Design and Anyalysis.pdf
Unit V.pptx
The walking 0xDEAD
sysprog2 Part2
Revers engineering
Linkers
bh-europe-01-clowes
The true story_of_hello_world
ELF(executable and linkable format)
Understanding how C program works
Compiler design notes phases of compiler
Introduction to Linux Exploit Development

More from Ahmed El-Arabawy (20)

PDF
Course 102: Lecture 28: Virtual FileSystems
PDF
Course 102: Lecture 27: FileSystems in Linux (Part 2)
PDF
Course 102: Lecture 26: FileSystems in Linux (Part 1)
PDF
Course 102: Lecture 25: Devices and Device Drivers
PDF
Course 102: Lecture 24: Archiving and Compression of Files
PDF
Course 102: Lecture 22: Package Management
PDF
Course 102: Lecture 20: Networking In Linux (Basic Concepts)
PDF
Course 102: Lecture 19: Using Signals
PDF
Course 102: Lecture 17: Process Monitoring
PDF
Course 102: Lecture 16: Process Management (Part 2)
PDF
Course 102: Lecture 14: Users and Permissions
PDF
Course 102: Lecture 13: Regular Expressions
PDF
Course 102: Lecture 12: Basic Text Handling
PDF
Course 102: Lecture 11: Environment Variables
PDF
Course 102: Lecture 10: Learning About the Shell
PDF
Course 102: Lecture 7: Simple Utilities
PDF
Course 102: Lecture 6: Seeking Help
PDF
Course 102: Lecture 4: Using Wild Cards
PDF
Course 102: Lecture 3: Basic Concepts And Commands
PDF
Course 102: Lecture 2: Unwrapping Linux
Course 102: Lecture 28: Virtual FileSystems
Course 102: Lecture 27: FileSystems in Linux (Part 2)
Course 102: Lecture 26: FileSystems in Linux (Part 1)
Course 102: Lecture 25: Devices and Device Drivers
Course 102: Lecture 24: Archiving and Compression of Files
Course 102: Lecture 22: Package Management
Course 102: Lecture 20: Networking In Linux (Basic Concepts)
Course 102: Lecture 19: Using Signals
Course 102: Lecture 17: Process Monitoring
Course 102: Lecture 16: Process Management (Part 2)
Course 102: Lecture 14: Users and Permissions
Course 102: Lecture 13: Regular Expressions
Course 102: Lecture 12: Basic Text Handling
Course 102: Lecture 11: Environment Variables
Course 102: Lecture 10: Learning About the Shell
Course 102: Lecture 7: Simple Utilities
Course 102: Lecture 6: Seeking Help
Course 102: Lecture 4: Using Wild Cards
Course 102: Lecture 3: Basic Concepts And Commands
Course 102: Lecture 2: Unwrapping Linux

Recently uploaded (20)

PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
Empathic Computing: Creating Shared Understanding
PDF
Modernizing your data center with Dell and AMD
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Encapsulation theory and applications.pdf
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
NewMind AI Weekly Chronicles - August'25 Week I
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
Network Security Unit 5.pdf for BCA BBA.
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PDF
NewMind AI Monthly Chronicles - July 2025
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PPTX
Cloud computing and distributed systems.
Encapsulation_ Review paper, used for researhc scholars
Empathic Computing: Creating Shared Understanding
Modernizing your data center with Dell and AMD
Spectral efficient network and resource selection model in 5G networks
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Encapsulation theory and applications.pdf
20250228 LYD VKU AI Blended-Learning.pptx
Reach Out and Touch Someone: Haptics and Empathic Computing
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
NewMind AI Weekly Chronicles - August'25 Week I
The AUB Centre for AI in Media Proposal.docx
Network Security Unit 5.pdf for BCA BBA.
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Diabetes mellitus diagnosis method based random forest with bat algorithm
“AI and Expert System Decision Support & Business Intelligence Systems”
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
NewMind AI Monthly Chronicles - July 2025
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Digital-Transformation-Roadmap-for-Companies.pptx
Cloud computing and distributed systems.

Embedded Systems: Lecture 14: Introduction to GNU Toolchain (Binary Utilities)

  • 1. Linux For Embedded Systems ForArabs Ahmed ElArabawy Cairo University Computer Eng. Dept. CMP445-Embedded Systems
  • 2. Lecture 14: Introduction to the Toolchain Part 2: Binary Utilities
  • 3. Binary Files • What is a Binary File? • A Binary File is the machine language instructions that should be executed on the target • This is not an accurate or complete answer…. • Executable files are just one type of binary files. Not all binary files can execute on the target • Machine language instructions are just one one component of the binary file. Other components exist • So what would be a more accurate answer ??
  • 5. Binary File Types 1. Relocatable Object Files (*.o) • A relocatable object file is the outcome of the compilation/assembly process of one source code file • This file can not execute since it has some unresolved symbols, and needs to be linked with other object files or libraries • Even if the file does not use any unresolved symbols, it is still not ready to execute on the target • Why is it called Relocatable ? • The different components of the object file is set to start at an address independent from other object files (for example they can start at the address 0x00000000) • Addresses of all symbols inside the object files are offsets from the start address • This applies to both machine language instructions and other components of the object file • It is the job of the linker to relocate these components into other addresses when combining multiple object files (so they don’t overlap in memory) • The relocation process performed by the linker includes, • Merging of code and data sections from the object files • Moving the start address of the resulting sections into an address suitable to the target or the OS
  • 7. Binary File Types 2. Executable Binary Files • This is the outcome of the linking process of multiple object files as well as (static/dynamic) libraries • This file has all of its symbols either resolved, or pointing to some shared objects (to be resolved at load/run time) • The code and data sections of this file is the outcome of merging code/data sections of the object (*.o) and archive (*.a) files used to generate it • This file should be placed on the target storage, and is ready to execute on target
  • 9. Binary File Types 3. Shared Object Files (*.so) • This is the outcome of the linking process of multiple object files to form a dynamic library file • Since it is the outcome of a linking process, then the sections of object files are merged and relocated (same as an executable) • One shared object file may rely on some symbols in another shared object files • This file is ready to load on the target by the Dynamic Linker (ld-linux.so) whenever it is needed by an executable file • Shared object files are not executable on their own (they don’t have an entry function), they are only called by executable files
  • 10. Binary File Types 4. Core Dump Files • A core dump file (also called core file) is a binary file that is automatically generated by the Linux kernel when the executable faces a fatal problem that causes it to exit abruptly (crash) • For example, it is generated, • When the executable has a segmentation fault (illegal memory access) • When the executable executes the abort() function • Upon other faults such as floating point faults (eg. Divide by Zero) • It contains records of the state of the program (memory, registers, variables, stack trace, …. ) at the point of the crash • It is useful for debugging purposes to analyze the system crashes in an offline way. Core file can be copied from the target to the host machine, and debugged on the host machine
  • 11. Controlling Core File Generation • In some cases, core files are very useful for debugging faults that can not be debugged on the target • In other cases, core file generation is not desired, to avoid filling the storage media with them • The following can be done to control the generation of core files • To enable core file generation $ ulimit -c unlimited • To disable core file generation $ ulimit -c 0 • To check on the current setting $ ulimit -c • The core file location is set at /proc/sys/kernel/core_pattern • To set the location of the core file generation (make sure the program has write access on the folder) $ echo ‘/home/user/cores/core_%e.%p’ | sudo tee /proc/sys/kernel/core_pattern
  • 13. Binary File Components • Binary file Contains a lot of blocks of information: • The code section (also called the .text section): • This is the machine language instructions that the assembler generates during the compilation process • Each object file (*.o) will contain its .text section • The linker merges the .text sections of object files • The .text section has references to symbols (such as data that it uses, or functions that it needs to jump to) • A group of Data Sections • The .data section that contains global/static data that are initialized in the code (to other values than zero) • The .bss section that contains global/static data that are not initialized or initialized to zero • The .rodata section that contains Read-only Data in the code (such as strings in printf statements) • A symbol table section that contains a table of variables and functions defined or used in the code • A group of debug info sections for use by debuggers • Other sections
  • 14. More on Data Sections • What is the difference between .bss and .data sections? • When the program is loaded in memory, the program global and static data is also loaded from the storage to the RAM to start execution • If the data was initialized in the program, then the object file needs to store the initial value in the .data section • If the data is not initialized, then it is initialized to zero at load time • Since the data is initialized at load time, then there is no need to carry the data value of zero in the object file (what is the value of having a part of the object file which is all filled with zeros ) • Accordingly, the .bss section is normally empty, and used only as a place holder. It has a start address and a length to reserve the space in memory at load time, but no contents in the object file (stored in Flash) • Note that the data initialized to zero in the program, is also considered with the uninitialized data (since all will be initialized to zero by the loader) • What about local data ? Where are they located ? • Local data are created at run time inside the stack • The stack is not part of the object file stored on the Flash, it is created directly in the target RAM at program load in Memory
Memory Map Example (For Simple Embedded Systems) [diagram]
Conclusion
• Binary files come in different types and play different roles
• Binary files are not just a bunch of machine language instructions ready for execution
• A binary file carries a lot of information (code, data, debug info, tables, …)
• Because of all this, we need a clean, extensible, and flexible way to carry all of this information in one file and make it easy to use
• Different OSs use different file formats to carry this information:
• Unix introduced the COFF (Common Object File Format) file format
• Windows uses the PE (Portable Executable) file format
• Linux uses the ELF (Executable and Linkable Format) file format
What is an ELF File?
• ELF stands for “Executable and Linkable Format”
• ELF is a file format used for:
• Relocatable object files
• Executable files
• Shared library files
• Core dump files
• ELF files are extensible and have many optional fields
• ELF is not specific to any processor architecture
• Used in Unix and Linux OSs (and may be adopted by other OSs as well)
• Generated by GNU GCC
• Used by the GNU toolchain
ELF File Layout
• An ELF file contains the following:
• An ELF header at the beginning of the file. This header describes high-level attributes of the file and the target processor, such as:
• Type of file (object, executable, core, shared object)
• Processor used (ARM, x86, x86-64, …)
• Little Endian/Big Endian format
• 32/64-bit format
• It has pointers to the other parts of the file
• A Program Header Table that describes zero or more segments
• A Section Header Table that describes zero or more sections
• A group of segments and sections pointed to by the two tables
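To make the "pointers to the other parts of the file" concrete, here is a sketch that reads the two table locations out of an ELF header. It assumes a 64-bit little-endian file (the 32-bit layout uses 4-byte addresses and offsets instead of 8-byte ones); `locate_tables` and `TableLocations` are illustrative names, while the field order follows the ELF specification (`e_phoff`, `e_shoff`, and so on).

```python
import struct
from typing import NamedTuple

# ELF64 header: 16-byte e_ident followed by the fixed fields (64 bytes total).
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")


class TableLocations(NamedTuple):
    phoff: int       # file offset of the Program Header Table
    phentsize: int   # size of one program header entry
    phnum: int       # number of entries (segments)
    shoff: int       # file offset of the Section Header Table
    shentsize: int   # size of one section header entry
    shnum: int       # number of entries (sections)


def locate_tables(header: bytes) -> TableLocations:
    """Read where the two header tables live, assuming a 64-bit little-endian ELF."""
    (ident, _type, _machine, _version, _entry, phoff, shoff, _flags,
     _ehsize, phentsize, phnum, shentsize, shnum, _shstrndx) = ELF64_HEADER.unpack_from(header)
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian files")
    return TableLocations(phoff, phentsize, phnum, shoff, shentsize, shnum)
```

Applied to the first 64 bytes of a real 64-bit little-endian binary, the offsets and counts should match what `readelf -h` reports for the program and section headers.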
What is a Segment?
• A segment is part of the executable image of the program
• It is more relevant to executable files (not object files)
• It is necessary for runtime execution of the file
• There are 3 major types of segments:
• Text Segment
• Contains the binary code of the executable (the instructions of the program)
• Data Segment
• Contains the program variables that are initialized in the program with non-zero values
• BSS Segment
• Contains the program variables that are not initialized in the program (or are initialized to zero)
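The slide presents text, data, and BSS as three segments; in an actual ELF executable, the BSS is usually the tail of the writable loadable segment, whose size in memory (`p_memsz`) exceeds its size in the file (`p_filesz`). The sketch below parses one 64-bit little-endian program header entry and reports that zero-filled tail. `parse_phdr` and `Segment` are hypothetical names; the field layout and the `PT_LOAD`/`PF_*` values follow the ELF specification.

```python
import struct
from typing import NamedTuple

# One ELF64 program header entry (56 bytes), little-endian.
ELF64_PHDR = struct.Struct("<IIQQQQQQ")
PT_LOAD = 1                 # loadable segment
PF_X, PF_W, PF_R = 1, 2, 4  # execute / write / read permission bits


class Segment(NamedTuple):
    kind: str
    flags: str     # e.g. "R-X" for code, "RW-" for data
    filesz: int    # bytes stored in the file
    memsz: int     # bytes occupied in memory

    @property
    def zero_filled(self) -> int:
        """Bytes that exist only in memory and are zeroed at load time (BSS)."""
        return self.memsz - self.filesz


def parse_phdr(entry: bytes) -> Segment:
    (p_type, p_flags, _offset, _vaddr, _paddr,
     p_filesz, p_memsz, _align) = ELF64_PHDR.unpack_from(entry)
    kind = "LOAD" if p_type == PT_LOAD else f"type {p_type:#x}"
    flags = "".join(letter if p_flags & bit else "-"
                    for letter, bit in (("R", PF_R), ("W", PF_W), ("X", PF_X)))
    return Segment(kind, flags, p_filesz, p_memsz)
```

A code segment typically parses as `LOAD` with flags `R-X` and no zero-filled bytes, while the data segment is `RW-` with `zero_filled` equal to the BSS size.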
What is a Section?
• Sections are used for linking and debugging purposes
• In relocatable object files, sections contain code and data, such as the .text, .data, and .bss sections
• During the linking process, the linker builds the text, data, and BSS segments from these sections
• Other sections are not required for executing the program and are used for debugging purposes
• Removing those sections does not affect running the executable, but may reduce debugging capabilities
• Main sections are:
• Symbol Table Section (.symtab): used for low-level debugging info and for linking purposes
• String Table Section (.strtab): carries the strings (such as symbol names) used by other sections in the file
• Section Name String Table Section (.shstrtab): carries the names of the sections in the section header table
• Debug Info Sections (multiple sections): available when the executable contains source-level debug info
• Some tools can strip these sections from an ELF file (such as the strip command)
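Each section header stores its name as an offset (`sh_name`) into the .shstrtab section rather than as text. A minimal sketch of that lookup, assuming the .shstrtab contents have already been read into memory; `section_name` and the sample table are hypothetical.

```python
def section_name(shstrtab: bytes, sh_name: int) -> str:
    """Return the NUL-terminated name stored at offset sh_name in .shstrtab."""
    end = shstrtab.index(b"\x00", sh_name)  # names are C strings
    return shstrtab[sh_name:end].decode("ascii")


# Hypothetical .shstrtab contents: offset 0 is the empty name used by section 0.
SHSTRTAB = b"\x00.text\x00.data\x00.bss\x00.shstrtab\x00"
print([section_name(SHSTRTAB, off) for off in (1, 7, 13, 18)])
# ['.text', '.data', '.bss', '.shstrtab']
```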
• The first 4 bytes represent a magic number that identifies the file as an ELF file
• The magic number is 7F ‘E’ ‘L’ ‘F’, which is 7F454C46
Value   Word Size
01      32-bit format
02      64-bit format
Value   Byte Order
01      Little Endian
02      Big Endian
Value   File Type
00 01   Relocatable Object
00 02   Executable
00 03   Shared Object
00 04   Core dump
Value   Target Processor
00 03   x86
00 14   PowerPC
00 28   ARM
00 3E   x86-64
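The values in these tables can be decoded directly from the first 20 bytes of the file. In this sketch, the function name `describe_elf` and the dictionary keys are our own. The file type (offset 16) and target processor (offset 18) are 2-byte fields stored in the file's own byte order, so in a hex dump of a little-endian executable the value 00 02 appears as 02 00.

```python
import struct

WORD_SIZES = {1: "32 bit", 2: "64 bit"}
BYTE_ORDERS = {1: "Little Endian", 2: "Big Endian"}
FILE_TYPES = {1: "Relocatable Object", 2: "Executable", 3: "Shared Object", 4: "Core dump"}
MACHINES = {0x03: "x86", 0x14: "PowerPC", 0x28: "ARM", 0x3E: "x86-64"}


def describe_elf(header: bytes) -> dict:
    """Decode the fields from the tables above out of the first 20 bytes."""
    if header[:4] != b"\x7fELF":
        raise ValueError("missing the 7F 'E' 'L' 'F' magic number")
    word_size, byte_order = header[4], header[5]
    if byte_order not in BYTE_ORDERS:
        raise ValueError(f"unknown byte order value {byte_order}")
    # File type and processor are stored in the file's own byte order.
    endian = "<" if byte_order == 1 else ">"
    file_type, machine = struct.unpack_from(endian + "HH", header, 16)
    return {
        "word_size": WORD_SIZES.get(word_size, "unknown"),
        "byte_order": BYTE_ORDERS[byte_order],
        "file_type": FILE_TYPES.get(file_type, "other"),
        "machine": MACHINES.get(machine, "other"),
    }
```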
Read ELF Files (readelf Command)
• This command is used to read the contents of an ELF file
• This includes:
• Main file header
• Section headers
• Sections
• Symbol table
• … etc.
• It can be used for:
• Resolving linking problems, such as an “Unresolved Symbol” error
• Debugging a crash
• Hacking an executable
• Reverse engineering a binary file
Read ELF File Main Header
$ readelf -h <ELF file>
Other Options for readelf
• To show the file sections
$ readelf -S <ELF file>
• To show the file segments
$ readelf -l <ELF file>
• To show the symbol table
$ readelf -s <ELF file>
• To show all the ELF file contents
$ readelf -a <ELF file>
Reading the ELF File (objdump Command)
$ objdump [options] <ELF file>
• This is another program for reading ELF files
• It has many uses, depending on the chosen options
• For example, it can be used for:
• Reading the ELF file header (readelf does a better job with that)
• Reading the ELF file section headers
• Reading the assembly code of the binary code
• Reading the symbol table(s)
• Reading the debug information
Showing the ELF File Header
$ objdump -f <ELF file>
• This usage shows the information in the main header of the ELF file
• This is equivalent to using readelf -h (but shows less information)
Showing Assembly Code of an ELF File
• objdump can show the assembly code for the program
• It disassembles all the ELF file sections that contain code
$ objdump -d my-binary
• If you are interested in a specific section, select it with -j:
$ objdump -j .text -d my-binary
• Keep in mind:
• You must use an objdump built for the same target platform the program was compiled for
• If a processor can run in both little-endian and big-endian modes (such as ARM), you need to specify which format to use:
$ objdump -EB -j .text -d my-binary (Big endian)
$ objdump -EL -j .text -d my-binary (Little endian)
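Why the byte order matters can be shown without objdump: the same four bytes give two different 32-bit instruction words depending on how they are read. A small sketch (`instruction_words` is a made-up helper); the bytes 1E FF 2F E1 are the ARM instruction `bx lr` (0xE12FFF1E) as stored in a little-endian image.

```python
import struct
from typing import List


def instruction_words(code: bytes, big_endian: bool) -> List[int]:
    """Split code into 32-bit words using the chosen byte order."""
    count = len(code) // 4
    fmt = (">" if big_endian else "<") + f"{count}I"
    return list(struct.unpack(fmt, code[:count * 4]))


code = bytes.fromhex("1eff2fe1")  # ARM "bx lr", stored little-endian
print(hex(instruction_words(code, big_endian=False)[0]))  # 0xe12fff1e
print(hex(instruction_words(code, big_endian=True)[0]))   # 0x1eff2fe1
```

Reading with the wrong byte order yields a different word, so the disassembly would be meaningless.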
List Symbols (nm Command)
$ nm [options] <ELF file>
• This command lists the symbols in the provided ELF file
• For each symbol, it presents:
• Virtual address of the symbol
• Symbol type (Local, Global, Data, BSS, Undefined, … )
• Name of the symbol
• Size of the symbol (use the option -S)
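`nm` reports the symbol type as a single letter. A simplified sketch of the common cases, covering only the four sections named in this lecture; `nm_letter` is a hypothetical helper, while the letter meanings (T, D, B, R, U, and lowercase for local symbols) are those documented for GNU nm.

```python
from typing import Optional

# Letters GNU nm uses for symbols in these sections (uppercase = global).
NM_LETTERS = {".text": "T", ".data": "D", ".bss": "B", ".rodata": "R"}


def nm_letter(section: Optional[str], is_global: bool) -> str:
    """Approximate the nm type letter for a symbol defined in `section`."""
    if section is None:
        return "U"                         # undefined: used here, defined elsewhere
    letter = NM_LETTERS.get(section, "?")  # "?" = type not modeled in this sketch
    return letter if is_global else letter.lower()
```

For instance, a `static int count;` would appear as `b` (local, .bss) and `main` as `T`.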
The nm Command Usage
• To find the object files that use or define a certain symbol:
$ nm -A ./*.o | grep var_1
This command lists the symbol tables of all object files in the current directory, prefixing each line with the file name, and then filters the output for the name of the symbol we are looking for
• To list all undefined symbols in an ELF file:
$ nm -u my-obj-file
These undefined symbols need to be resolved either through static linking at build time or at run time via dynamic linking with a shared object
• To list the dynamic symbols of an executable:
$ nm -D my-bin-file
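The "undefined symbols" idea can be illustrated by computing what remains unresolved after combining several object files. A sketch with made-up file and symbol names (the per-file sets stand in for what `nm` would report); it ignores details such as weak symbols, duplicate definitions, and library search order.

```python
from typing import Dict, Set, Tuple

# Hypothetical nm-style summary: file -> (defined symbols, undefined symbols)
ObjectSymbols = Dict[str, Tuple[Set[str], Set[str]]]


def unresolved_symbols(objects: ObjectSymbols) -> Dict[str, Set[str]]:
    """Return, per file, the undefined symbols that no other file defines."""
    defined: Set[str] = set()
    for defs, _ in objects.values():
        defined |= defs
    result = {}
    for name, (_, undefs) in objects.items():
        missing = undefs - defined
        if missing:
            result[name] = missing
    return result


objects = {
    "main.o": ({"main"}, {"var_1", "helper", "printf"}),
    "util.o": ({"helper", "var_1"}, set()),
}
print(unresolved_symbols(objects))  # {'main.o': {'printf'}}
```

Here `printf` is left for the C library to provide, either statically at build time or through a shared object at run time.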
List Strings in an ELF File (strings Command)
$ strings [options] <ELF file>
• This command lists the printable strings found in a non-text file
• It looks for any sequence of 4 or more printable characters within the file
• Examples:
$ strings a.out
(Lists the strings within the binary file)
$ strings -f /bin/* | grep "Copy"
This command searches for copyright notices in all binary files in the /bin directory (-f prints the file name before each string)
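The core of what `strings` does is a scan for runs of printable characters. A simplified Python model (`printable_strings` is a hypothetical name): it treats only ASCII 0x20 to 0x7E as printable and ignores the real tool's other options.

```python
import re
from typing import List


def printable_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return every run of at least min_len printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [run.decode("ascii") for run in pattern.findall(data)]


blob = b"\x00\x01hello\x00ab\x7fworld!\x02"
print(printable_strings(blob))  # ['hello', 'world!']
```

The two-character run `ab` is skipped because it is shorter than the 4-character minimum.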
Reduce ELF File Size (strip Command)
$ strip [options] <object file>
• The strip command is used to reduce the size of an ELF file
• This applies to both executable and object files
• It works by removing tables and sections that the binary can run without
• This is useful for:
• Reducing Flash storage requirements (especially useful for embedded systems)
• Making the code harder to reverse engineer
Usage Examples
• To strip all symbols from an executable
$ strip -s my-bin-file
• To remove debug symbols only
$ strip --strip-debug my-bin-file
• To remove all symbols that are not needed (for relocation processing)
$ strip --strip-unneeded my-bin-file
• To keep the original file and create a separate stripped file
$ strip -s -o stripped-file my-bin-file
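A rough model of the difference between `--strip-debug` and `-s`, using made-up section sizes. It assumes debug sections are the ones whose names start with .debug and that the symbol table lives in .symtab/.strtab; the real tool handles more cases, so treat `strip_model` (a hypothetical helper) as an illustration only.

```python
from typing import Dict

SYMBOL_SECTIONS = {".symtab", ".strtab"}


def strip_model(sections: Dict[str, int], strip_all: bool) -> Dict[str, int]:
    """Rough model: strip_all=False ~ --strip-debug, strip_all=True ~ -s."""
    kept = {}
    for name, size in sections.items():
        if name.startswith(".debug"):
            continue            # debug info is removed in both modes
        if strip_all and name in SYMBOL_SECTIONS:
            continue            # the symbol table is removed only by -s
        kept[name] = size
    return kept


# Made-up section sizes in bytes.
sections = {".text": 1000, ".data": 100, ".symtab": 300, ".strtab": 200,
            ".debug_info": 5000, ".debug_line": 800, ".shstrtab": 50}
for strip_all in (False, True):
    kept = strip_model(sections, strip_all)
    print(strip_all, sum(kept.values()))  # False 1650, then True 1150
```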
addr2line Command
• To find the source code location (file and line) for a specific address:
$ addr2line -e my-bin-file 0x400534
• This is useful when handling a crash: we know the address of the instruction that crashed and need to find the source code line that caused it
• This requires the binary to contain debug information (for example, compiled with -g)
• To also show the function name for that address:
$ addr2line -f -e my-bin-file 0x400534
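Conceptually, addr2line searches a table (built from the DWARF debug information) that maps address ranges to source locations. A sketch of that lookup with a hypothetical, hand-written table; `lookup` and `Row` are illustrative names, and a real tool reads the table from the debug sections.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical line-table row: (start address, function, source file, line)
Row = Tuple[int, str, str, int]


def lookup(table: List[Row], end_address: int, address: int) -> Optional[Row]:
    """Find the row whose address range contains `address`.

    Rows must be sorted by start address; each row covers addresses up to the
    next row's start, and the last row ends at end_address.
    """
    if address >= end_address:
        return None
    starts = [row[0] for row in table]
    i = bisect.bisect_right(starts, address) - 1
    return table[i] if i >= 0 else None


table = [
    (0x400520, "main", "main.c", 10),
    (0x400530, "main", "main.c", 11),
    (0x400540, "helper", "util.c", 5),
]
print(lookup(table, 0x400560, 0x400534)[1:])  # ('main', 'main.c', 11)
```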
size Command
$ size [options] <ELF file>
• This command lists the sizes of the different sections inside an ELF file (grouped by default into text, data, and bss totals; use -A for a per-section listing)
$ size my-bin-file
size Command
• On Linux-based target platforms:
• We load the whole ELF file onto the target
• Hence, the target's Flash needs to accommodate the full size of the ELF file
• Accordingly, the size command is useful for finding out the real size of each section inside the ELF file
• This helps us identify where to optimize:
• Is it the code (text section)?
• Is it the defined data (data and bss sections)?
• We may need to use the strip command to remove some of the optional sections to reduce the Flash size requirements
• On simpler platforms (bare-metal, or a simple RTOS):
• The Flash loader only loads the necessary parts onto the target
• The debug sections stay in the copy on the host machine for debugging purposes; the target only gets the binary code and the data sections
• Accordingly, using the strip command does not really help
• The size command becomes very important: we don't care about the size of the ELF file, but about the size of some specific sections
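The default (Berkeley) output of `size` groups sections into three totals. The sketch below approximates that grouping: allocated read-only sections count as text, allocated writable sections with contents as data, and allocated sections without contents as bss. It also adds two rule-of-thumb figures for a simple microcontroller target, assumed here: Flash needs roughly text + data (the initial values of .data are stored in Flash), and RAM needs roughly data + bss. The `Section` type and `berkeley_size` function are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Section(NamedTuple):
    name: str
    size: int
    alloc: bool     # occupies memory at run time (loaded)
    writable: bool  # modifiable at run time
    nobits: bool    # no contents stored in the file (like .bss)


def berkeley_size(sections: List[Section]) -> Dict[str, int]:
    """Approximate size's default text/data/bss grouping, plus rough MCU totals."""
    text = data = bss = 0
    for s in sections:
        if not s.alloc:
            continue            # debug info and symbol tables are not counted
        if s.nobits:
            bss += s.size
        elif s.writable:
            data += s.size
        else:
            text += s.size      # code plus read-only data such as .rodata
    return {
        "text": text, "data": data, "bss": bss, "dec": text + data + bss,
        "flash": text + data,   # rule of thumb: .data initial values live in Flash
        "ram": data + bss,      # rule of thumb: .data and .bss both occupy RAM
    }


secs = [
    Section(".text", 1200, True, False, False),
    Section(".rodata", 300, True, False, False),
    Section(".data", 64, True, True, False),
    Section(".bss", 512, True, True, True),
    Section(".debug_info", 9000, False, False, False),
    Section(".symtab", 400, False, False, False),
]
print(berkeley_size(secs))
# {'text': 1500, 'data': 64, 'bss': 512, 'dec': 2076, 'flash': 1564, 'ram': 576}
```

Note how the 9000 bytes of debug info and the symbol table do not count toward any total, which is why stripping them does not change what a bare-metal target actually needs.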