Chapter 2: Assembler
Overview
Assembler Features
Assembler Designing Options (1 pass and 2 pass)
Overview[1]
• Computers like ones and zeros (0001110010000110 ) and humans like
symbols (ADD R6,R2,R6 ; increment index reg.)
• An assembler is a program that turns symbols into machine instructions.
• ISA-specific: close correspondence between symbols and instruction set
• Mnemonics for opcodes
• Labels for memory locations
• Additional operations for allocating storage and initializing data
Overview[2]
• Basic Assembler functions
• Translate mnemonic operation codes to their machine language
equivalents
• Convert symbolic operands (e.g., variable names) to addresses.
• Choose the proper instruction format & addressing mode.
• Convert constants to numbers.
• Output to object files and listing files.
Overview[3]
• An assembler is a program that accepts an assembly language program as input
and produces its machine language equivalent along with information for the
loader.
• The features and design of an assembler depend on:
• The source language it translates
• The machine language it produces
Overview[4]
Source Program → Assembler → Object Code → Linker → Executable Code → Loader
Basics of Assembler Design[1]
An assembler is designed to perform the following:
• Scanning (tokenizing)
• Parsing (validating the instructions)
• Creating the symbol table
• Resolving the forward references
• Converting into the machine language
In other words, the assembler must:
• Convert mnemonic operation codes to their machine language
equivalents
• Convert symbolic operands to their equivalent machine addresses
• Decide the proper instruction format
• Convert the data constants to
internal machine representations
• Write the object program and the assembly listing
Basics of Assembler Design[2]
• Assembler Designing Steps
1. Machine architecture understanding
2. Identify the algorithms and the various data structures to be used
• Beyond the steps required for assembling, the assembler also has to handle
assembler directives. These do not generate object code but direct the assembler
to perform certain operations.
• Assembler directives (also called pseudo-instructions)
• Not translated into machine instructions
• Provides instructions to the assembler
Basics of Assembler Design[3]
• Basic assembler directives
• START: Specify name and starting address for the program
• END: Indicate the end of the source program, and (optionally) the first executable instruction in
the program
• BYTE: Generate character or hexadecimal constant, occupying as many bytes as needed to
represent the constant
• WORD: Generate one-word integer constant
• RESB: Reserve the indicated number of bytes for a data area
• RESW: Reserve the indicated number of words for a data area
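As a rough illustration of how these directives affect storage, the Python sketch below computes how many bytes each one occupies. It assumes the SIC convention of 3-byte words and the C'...' / X'...' operand notation for BYTE; the function name directive_length is made up for this example and is not part of any real assembler.

```python
WORD_SIZE = 3  # assumption: one SIC word is 3 bytes


def directive_length(directive: str, operand: str) -> int:
    """Return the number of bytes a directive occupies in the program."""
    if directive == "WORD":
        return WORD_SIZE                      # one-word integer constant
    if directive == "RESW":
        return WORD_SIZE * int(operand)       # reserve N words
    if directive == "RESB":
        return int(operand)                   # reserve N bytes
    if directive == "BYTE":
        kind, body = operand[0], operand[2:-1]  # strip C'...' or X'...'
        if kind == "C":
            return len(body)                  # one byte per character
        if kind == "X":
            return (len(body) + 1) // 2       # two hex digits per byte
        raise ValueError(f"unsupported BYTE operand: {operand}")
    if directive in ("START", "END"):
        return 0                              # no storage is generated
    raise ValueError(f"not a directive handled here: {directive}")
```

For example, directive_length("BYTE", "C'EOF'") counts three characters, while RESB simply returns its operand.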
Basics of Assembler Design[4]
LABEL   Instruction   Operand
TEST    START         1003
FIRST   LDA           FIVE
        STA           ALPHA
ALPHA   RESW          1        /* symbolic variable */
FIVE    WORD          5        /* symbolic constant, literal */
        END           FIRST
Pseudo-opcodes (assembler directives) used: START, RESW, WORD, END
Basics of Assembler Design[5]
Figure callouts: line numbers (for reference), address labels, mnemonic opcodes, operands, comments.
Basics of Assembler Design[6]
Figure callouts: index addressing; comment-line indicator.
Basics of Assembler Design[7]
Basics of Assembler Design[8]
• Example Program with Object Code
• The first column shows the line number of each instruction; the second column shows the
address allocated to it.
• The third column gives the label of the statement, followed by the instruction,
consisting of opcode and operand.
• The last column gives the equivalent object code.
Basics of Assembler Design[9]
• The most important tasks to concentrate on are generating the symbol table
and resolving forward references.
• Symbol Table:
• Created during pass 1
• All the labels of the instructions are symbols
• Each entry holds the symbol name and its address value.
• Forward reference:
• A reference to a symbol that is defined later in the program is called a forward reference.
• In pass 1, the symbol table has no address value for such a symbol at the point where it is first used.
Object Program[1]
The object code will later be loaded into memory for execution. The simple
object program we use contains three types of records:
• Object Program Format
• Header
Col. 1      H
Col. 2-7    Program name
Col. 8-13   Starting address of object program (hex)
Col. 14-19  Length of object program in bytes (hex)
• Text
Col. 1      T
Col. 2-7    Starting address for object code in this record (hex)
Col. 8-9    Length of object code in this record in bytes (hex)
Col. 10-69  Object code, represented in hexadecimal (2 columns per byte)
• End
Col. 1      E
Col. 2-7    Address of first executable instruction in object
program (hex)
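The column layout above can be sketched in Python as three small formatting helpers. This is only an illustration under the stated column widths; the function names are hypothetical, and the 30-byte limit is derived from columns 10-69 holding 60 hex digits.

```python
def header_record(name: str, start: int, length: int) -> str:
    """H record: col 1 'H', cols 2-7 name, cols 8-13 start, cols 14-19 length."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start: int, object_codes: list[str]) -> str:
    """T record: col 1 'T', cols 2-7 start, cols 8-9 byte count, cols 10-69 code."""
    body = "".join(object_codes)
    nbytes = len(body) // 2            # two hex digits per byte
    if nbytes > 30:                    # cols 10-69 = 60 hex digits = 30 bytes
        raise ValueError("text record holds at most 30 bytes")
    return f"T{start:06X}{nbytes:02X}{body}"


def end_record(first_executable: int) -> str:
    """E record: col 1 'E', cols 2-7 address of first executable instruction."""
    return f"E{first_executable:06X}"
```

Note that these helpers emit the fixed-column form with no separating spaces; slides often insert spaces between fields purely for readability.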
Object Program[2]
H COPY 001000 00107A
T 001000 1E 141033 482039 001036 281030 301015 482061 3C1003 00102A 0C1039 00102D
T 00101E 15 0C1036 482061 081033 4C0000 454F46 000003 000000
T 002039 1E 041030 001030 E0205D 30203F D8205D 281030 302057 549039 2C205E 38203F
T 002057 1C 101036 4C0000 F1 001000 041030 E02079 302064 509039 DC2079 2C1036
T 002073 07 382064 4C0000 05
E 001000
Design and Implementation Issues
• Some assembler features depend on the architecture of the machine.
• The SIC machine has only a limited set of instruction formats and hence limited
addressing modes.
• It has only single-operand instructions; the operand is always a memory reference.
• The improved SIC/XE machine provides more instruction formats and hence more
addressing modes.
• Changing the machine architecture changes the number of available instruction formats and
addressing modes.
• Therefore the design usually requires considering two things:
• Machine-dependent features and
• Machine-independent features.
Assembler Features
Machine-Dependent Assembler Features
• Instruction formats
• Addressing modes
• Program relocation
Machine Independent Assembler Features
• Literals
• Symbol-defining statements
• Expressions
• Program blocks
• Control sections and program linking
Machine-Dependent Features
(Instruction Format and Addressing Mode)
• The instruction formats depend on the memory organization and the size of the memory.
• In the SIC machine, memory is byte-addressable.
• How the assembler recognizes the addressing mode (e.g., SIC/XE):
• PC-relative or Base-relative addressing: op m
• Indirect addressing: op @m
• Immediate addressing: op #c
• Extended format: +op m
• Index addressing: op m,x
• Register-to-register instructions
• Larger memory -> multi-programming (program allocation)
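The prefix and suffix conventions listed above can be recognized with a few string checks. The Python sketch below is a simplified illustration: the function name and the returned dictionary layout are assumptions made for this example, register-to-register instructions are not handled, and the choice between PC-relative and base-relative addressing is left to a later step.

```python
def classify_operand(opcode: str, operand: str) -> dict:
    """Derive addressing information from SIC/XE source notation."""
    info = {
        "extended": opcode.startswith("+"),  # +op m  -> extended format
        "mode": "simple",                    # op m   -> PC- or base-relative, decided later
        "indexed": False,
    }
    if operand.startswith("#"):              # op #c  -> immediate
        info["mode"] = "immediate"
        operand = operand[1:]
    elif operand.startswith("@"):            # op @m  -> indirect
        info["mode"] = "indirect"
        operand = operand[1:]
    if operand.upper().endswith(",X"):       # op m,x -> indexed
        info["indexed"] = True
        operand = operand[:-2]
    info["operand"] = operand                # remaining symbol or constant
    return info
```

For instance, classify_operand("+LDT", "#4096") reports an extended-format instruction with an immediate operand of 4096.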
Machine-Dependent Features
(Instruction Format and Addressing Mode)
• Translation of instructions using the register-to-register addressing mode:
• During pass 1 the registers can be entered in the symbol table itself. The value of each register is its equivalent
numeric code.
• During pass 2, these values are assembled along with the object code of the mnemonic. If required, a separate
table can be created with the register names and their equivalent numeric values.
• Translation of register-memory instructions:
• The SIC/XE machine has four instruction formats and five addressing modes.
• Among the instruction formats, format 3 and format 4 instructions are register-memory instructions. One operand is always in a register and the other
operand is in memory.
• The addressing mode tells us how the operand is to be fetched from memory.
Machine-Dependent Features
(Program Relocation)
• The larger main memory of SIC/XE
• Several programs can be loaded and run at the same time.
• This kind of sharing of the machine between programs is called multiprogramming
• To take full advantage
• Load programs into memory wherever there is room
• Not specifying a fixed address at assembly time
• Called program relocation
Machine-Dependent Features
(Relocatable Program)
• Modification record
• Col 1 M
• Col 2-7: Starting location of the address field to be modified, relative to the beginning
of the program
• Col 8-9: length of the address field to be modified, in half-bytes
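A modification record in this layout can be produced with a one-line formatter. The Python sketch below is illustrative only; the function name is hypothetical, and the sample values (address 000007, length 5 half-bytes, as for a 20-bit address field) are chosen for the example rather than taken from the slides.

```python
def modification_record(address: int, half_bytes: int) -> str:
    """M record: col 1 'M', cols 2-7 field address (hex), cols 8-9 length in half-bytes (hex)."""
    return f"M{address:06X}{half_bytes:02X}"


# Illustrative values: a 5-half-byte address field starting at relative address 7.
example = modification_record(0x000007, 5)
```

The loader would use such a record to add the program's actual load address to the indicated field.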
Machine-Independent Features
• Features are not closely related to machine architecture.
• More related to issues about:
• Programmer convenience
• Software environment
• Common examples:
• Literals
• Symbol-defining statements
• Expressions
• Program blocks
• Control sections
• Assembler directives are widely used to support these features
Machine-Independent Features
(Literals)
• Literal is equivalent to:
• Define a constant explicitly and assign an address label for it
• Use the label as the instruction operand
• Why use literals:
• To avoid defining the constant somewhere and making up a label for it
• Instead, to write the value of a constant operand as a part of the instruction
• How to use literals:
• A literal is identified with the prefix =, followed by a specification of the literal value
Machine-Independent Features
(Literals vs. Immediate Operands)
• Same:
• Operand field contains constant values
• Difference:
• Immediate addressing: the assembler puts the constant value in the machine
instruction itself
• Literal: the assembler stores the constant value elsewhere and puts its address in
the machine instruction
Machine-Independent Features
(Literal Pool)
• All of the literal operands are gathered together into one or more literal
pools.
• A literal pool is placed:
• At the location where the LTORG directive is encountered
• To keep the literal operand close to the instruction that uses it
• At the end of the object program, generated immediately following the END statement
Machine-Independent Features
(Literals Examples)
Figure: the original program compared with the version using literals.
Machine-Independent Features
(Literal Implementation)
• Data structure: a literal table LITTAB
• Literal name
• Operand value and length
• Address
• LITTAB is often organized as a hash table, using the literal name or value as
the key
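A minimal sketch of such a table in Python, where the built-in dict already behaves as a hash table keyed by literal name. The class and method names are hypothetical, the =C'...' and =X'...' notation is assumed, and addresses are assigned only when a pool is placed (at LTORG or END).

```python
def parse_literal(name: str) -> bytes:
    """Convert a literal such as =C'EOF' or =X'05' to its byte value."""
    kind, body = name[1], name[3:-1]
    if kind == "C":
        return body.encode("ascii")
    if kind == "X":
        return bytes.fromhex(body)
    raise ValueError(f"unsupported literal: {name}")


class LiteralTable:
    def __init__(self):
        # literal name -> {"value": bytes, "length": int, "address": int or None}
        self.entries = {}

    def add(self, name: str) -> None:
        """Record a literal operand; duplicates share one entry."""
        if name not in self.entries:
            value = parse_literal(name)
            self.entries[name] = {"value": value, "length": len(value), "address": None}

    def place_pool(self, locctr: int) -> int:
        """Assign addresses to all unplaced literals; return the updated LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr
```

Keying on the literal name means that two uses of the same literal resolve to a single stored constant.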
Machine-Independent Features
(Symbols)[1]
• How to define symbols and their values
• Address label
• The label is the symbol name and the assigned address is its value
FIRST STL RETADR
• Assembler directive EQU
symbol EQU value
• This statement enters the symbol into SYMTAB and assigns to it the value specified
• The value can be a constant or an expression
• Assembler directive ORG
ORG value
Machine-Independent Features
(Symbols)[2]
• Use of EQU
• To improve the program readability, avoid using the magic numbers, make it easier to find and change
constant values
• +LDT #4096
• MAXLEN EQU 4096
+LDT #MAXLEN
• To define mnemonic names for registers
• A EQU 0
• X EQU 1
• BASE EQU R1
• COUNT EQU R2
Machine-Independent Features
(Symbols)[3]
• Use of ORG
• Indirect value assignment:
ORG value
• When ORG is encountered, the assembler resets its LOCCTR to the specified value
• ORG affects the values of all labels defined until the next ORG
• If the previous value of LOCCTR is automatically remembered, we can return to
the normal use of LOCCTR by simply writing
ORG
Machine-Independent Features
(Symbols)[4]
• Forward reference is not allowed for EQU and ORG.
• That is, all terms in the value field must have been defined previously in the
program.
• The reason is that all symbols must have been defined during Pass 1 in a two-
pass assembler.
(Slide figure: one allowed and one disallowed ordering of symbol definitions.)
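The restriction on forward references can be sketched as a pass 1 check in Python. This is a simplified illustration: the function name is hypothetical, the value field is limited to a decimal constant or a single symbol, and ALPHA and BETA in the usage note are made-up names.

```python
def define_equ(symtab: dict, symbol: str, operand: str) -> None:
    """Process 'symbol EQU operand' during pass 1."""
    if operand.isdigit():
        value = int(operand)                 # constant value
    elif operand in symtab:
        value = symtab[operand]              # symbol already defined earlier
    else:
        # The operand has not been defined yet, so its value is unknown in pass 1.
        raise ValueError(f"forward reference in EQU: {operand} is not yet defined")
    symtab[symbol] = value
```

With ALPHA already in the table, BETA EQU ALPHA succeeds; if the EQU statement appears before ALPHA is defined, the same call raises an error.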
Machine-Independent Features
(Symbols)[5]
• Usages of Symbols
• Establish symbolic names to improve readability in place of numeric values.
• E.g., +LDT #4096 can be changed to: MAXLEN EQU 4096 followed by +LDT #MAXLEN
• When the assembler encounters the EQU statement, it enters MAXLEN into SYMTAB
(with value 4096).
• During assembly of the LDT instruction, the assembler searches SYMTAB for the symbol
MAXLEN, using its value as the operand in the instruction.
• Define mnemonic names for registers
Machine-Independent Features
(Expressions)[1]
• A single term as an instruction operand can be replaced by an expression.
STAB RESB 1100
STAB RESB 11*100
STAB RESB (6+3+2)*MAXENTRIES
• The assembler has to evaluate the expression to produce a single operand address or value.
• Expressions consist of
• Operator
• +,-,*,/ (division is usually defined to produce an integer result)
• Individual terms
• Constants
• User-defined symbols
• Special terms, e.g., *, the current value of LOCCTR
Machine-Independent Features
(Expressions)[2]
• Expressions are classified as either absolute or relative
expressions, depending on the type of value they produce.
• Absolute expressions: An expression that uses only absolute terms is an absolute
expression. An absolute expression may also contain relative terms, provided they
occur in pairs with opposite signs within each pair. Example: MAXLEN EQU BUFEND-BUFFER
• Relative expressions: All the relative terms except one can be paired as described for
absolute expressions. The remaining unpaired relative term must have a positive sign. Example:
STAB EQU OPTAB + (BUFEND - BUFFER)
Machine-Independent Features
(Expressions)[3]
• Illegal Expressions
• These expressions represent neither an absolute value nor a location within the program,
so they are considered illegal.
• BUFEND + BUFFER
• 100 - BUFFER
• 3 * BUFFER
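One way to sketch the absolute/relative/illegal classification in Python is to sum the signed coefficients of the relative terms. This is a simplification made for illustration, not the slides' algorithm: each term is given as a hypothetical (coefficient, is_relative) pair, and a real assembler would additionally reject any relative term that takes part in multiplication or division.

```python
def classify_expression(terms):
    """Classify an expression given as (coefficient, is_relative) pairs.

    Relative terms that pair up with opposite signs cancel out, so only
    the net count of relative terms matters in this simplified model.
    """
    net_relative = sum(coef for coef, is_relative in terms if is_relative)
    if net_relative == 0:
        return "absolute"    # no relative terms, or all paired with opposite signs
    if net_relative == 1:
        return "relative"    # exactly one unpaired, positive relative term
    return "illegal"         # neither a constant nor a program location


# BUFEND - BUFFER: two relative terms with opposite signs
maxlen_kind = classify_expression([(1, True), (-1, True)])
```

Under this model 100 - BUFFER has a net count of -1 and 3 * BUFFER a net count of 3, so both fall into the illegal case.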
Machine-Independent Features
(Program blocks)
• Program blocks collect pieces of code or data of the same kind that are scattered through the source program
into a single block in the generated object program.
• Examples: a code block, an initialized data block, an uninitialized data block (like code and data
segments on a Pentium PC).
• Advantages:
• Because pieces of code are now closer to each other, format 4 can be replaced with format 3, saving
space and execution time.
• Code sharing and data protection are easier to support.
• With this feature, the programmer can put related code and data
near each other in the source program for better readability.
Machine-Independent Features
(Program blocks Example)
Default block.
Machine-Independent Features
(Control Sections)
• A control section is a part of the program that maintains its identity after assembly.
• Each such control section can be loaded and relocated independently of the others. (Main
advantage)
• Different control sections are often used for subroutines or other logical subdivisions of a
program.
• The programmer can assemble, load, and manipulate each of these control sections
separately.
Machine-Independent Features
(Control Sections External References)
• Symbols that are defined in one control section cannot be used directly by another control
section.
• They must be identified as external references for the loader to handle.
• Two assembler directives are used:
• EXTDEF (external definition)
• Identify those symbols that are defined in this control section and can be used in other control sections.
• Control section names are automatically considered as external symbols.
• EXTREF (external reference)
• Identify those symbols that are used in this control section but defined in other control sections.
Assembler Design Options[4]
• An assembler can be designed as:
• One-pass assembler
• Passes over the file once, collecting all the information in a single scan
• Two-pass assembler
• Makes two passes over the source file:
• In the first pass, it looks for label definitions and enters them in the symbol table
• In the second pass, after the symbol table is complete, it does the actual assembly by translating the
operations into machine code and so on.
• A strict one-pass scanner cannot assemble source code that contains forward references.
Assembler Design Options[4]
• Forward referencing and the two-pass assembler:
• An instruction contains a forward reference if it uses a symbol that is not
yet defined.
• If the program is processed (scanning, parsing, and object code
conversion) line by line, the address of such a symbol cannot be resolved
when the instruction is first encountered.
• Because of this problem, most assemblers are designed to process the
program in two passes.
One-Pass Assembler[1]
• The whole process of scanning, parsing, and object code conversion is done
in single pass.
• The only problem with this method is resolving forward reference.
• This is shown with an example below:
One-Pass Assembler[2]
• Main problem
• Forward references
• Solution
• Require that all areas be defined before they are referenced.
• It is possible, although inconvenient, to do so for data items.
• Forward jumps to instructions cannot be easily eliminated.
• So, a multi-pass assembler resolves the forward references and then converts
into the object code.
One-Pass Assembler[3]
• There are two types of one-pass assemblers:
• One that produces object code directly in memory for immediate execution
(Load-and-go assemblers).
• The other type produces the usual kind of object code for later execution.
• Forward Reference in One-pass Assembler
• Omits the operand address if the symbol has not yet been defined
• Enters this undefined symbol into SYMTAB and indicates that it is undefined
• Adds the address of the operand field to a list of forward references associated with the
SYMTAB entry
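The forward-reference handling described above can be sketched in Python as follows. The class name, the method names, and the use of a dictionary to stand in for operand fields in memory are all assumptions made for this illustration.

```python
class OnePassSymtab:
    def __init__(self):
        self.defined = {}   # symbol -> address
        self.pending = {}   # undefined symbol -> operand-field addresses awaiting a value

    def reference(self, symbol: str, field_addr: int, memory: dict) -> None:
        """Handle a symbol used as an operand at field_addr."""
        if symbol in self.defined:
            memory[field_addr] = self.defined[symbol]
        else:
            memory[field_addr] = None  # operand address omitted for now
            self.pending.setdefault(symbol, []).append(field_addr)

    def define(self, symbol: str, address: int, memory: dict) -> None:
        """Handle a label definition and patch every earlier forward reference."""
        if symbol in self.defined:
            raise ValueError(f"duplicate symbol: {symbol}")
        self.defined[symbol] = address
        for field_addr in self.pending.pop(symbol, []):
            memory[field_addr] = address
```

Any symbols still left in pending when the END statement is reached are undefined and would be reported as errors.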
One-Pass Assembler[4]
• Load-and-go assemblers generate their object code in memory for immediate
execution.
• No object program is written out, and no loader is needed.
• They are useful in systems with frequent program development and testing.
• Because programs are re-assembled nearly every time they are run, the efficiency of the
assembly process is an important consideration.
Two-Pass Assembler[1]
• A two-pass assembler performs two sequential scans over the source code:
1. Pass 1: symbols and literals are defined
2. Pass 2: object program is generated
• For a two pass assembler, forward references in symbol definition are not
allowed
• Prohibiting forward references in symbol definition is not a serious
inconvenience.
• Forward references tend to create difficulty for a person reading the program.
Two-Pass Assembler[2]
Pass 1
• Assign addresses to all statements in the
program
• Save the values (addresses) assigned to all labels
(including variable names) for use in
Pass 2 (to deal with forward references)
• Perform some processing of assembler
directives (e.g., BYTE and RESW, which can affect
address assignment)
• Define the symbols in the symbol table (generate the
symbol table)
Pass 2
• Assemble instructions (generate opcode and
look up addresses)
• Generate data values defined by BYTE,
WORD
• Perform processing of assembler directives not
done in Pass 1
• Write the object program and the assembly
listing
Two-Pass Assembler[3]
• Simple assembler main internal data structures:
• Operation Code Table (OPTAB): Contains symbolic instructions, their lengths and their
op-codes (or subroutine to use for translation)
• Symbol Table (SYMTAB): Contains labels and their values
• Location Counter(LOCCTR): Points to the next location where the code will be placed
Two-Pass Assembler[4]
• OPTAB looks up mnemonic opcodes & translates them to their machine language equivalents
• SYMTAB stores values (addresses) assigned to labels
Operation Code Table (OPTAB)[1]
• Used to look up mnemonic operation codes and translate them into machine
language equivalents
• Contains the mnemonic operation code and its machine language equivalent
• In more complex assemblers, contains information like instruction format and
length
Instruction   Opcode (hex)   Length (bytes)
ADD m         18             3
LDA m         00             3
Operation Code Table (OPTAB)[2]
• Content
• The mapping between mnemonic and machine code. Also include the instruction
format, available addressing modes, and length information
• Characteristic
• Static table - the content will never change
• Contents are not normally added/deleted (predefined)
• Implementation
• Array or hash table, easy for search
• Gives optimum performance for the particular set of keys being stored
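A static table of this kind maps naturally onto a Python dict, which is itself a hash table. The sketch below contains only the two entries shown on the slide; the function name lookup_opcode is hypothetical.

```python
# mnemonic -> (opcode, length in bytes); filled in once and never modified
OPTAB = {
    "ADD": (0x18, 3),
    "LDA": (0x00, 3),
}


def lookup_opcode(mnemonic: str) -> tuple:
    """Return (opcode, length) for a mnemonic, or raise if it is not in OPTAB."""
    try:
        return OPTAB[mnemonic]
    except KeyError:
        raise ValueError(f"unrecognized operation code: {mnemonic}") from None
```

Pass 1 would typically use only the length to advance LOCCTR, while pass 2 uses the opcode to build the instruction.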
Symbol Table (SYMTAB)[1]
• Used to store values (addresses) assigned to labels
• Includes the name and value for each label
• Flags to indicate error conditions, e.g. duplicate definition of
labels
• May contain other info like type or length about the data area or
instruction labeled
SYMTAB for the COPY program:
LABEL    Address (LOC value, hex)
COPY     1000
FIRST    1000
CLOOP    1003
ENDFIL   1015
EOF      1024
THREE    102D
ZERO     1030
RETADR   1033
LENGTH   1036
BUFFER   1039
RDREC    2039
WRREC    2061
SYMTAB for the TEST program:
LABEL    Address (LOC value, hex)
FIRST    1003
ALPHA    1009
FIVE     100C
Symbol Table (SYMTAB)[2]
• Content
• Flags to indicate error conditions, e.g. duplicate definition of labels
• Characteristic
• Dynamic table (i.e., symbols may be inserted, deleted, or searched in the table)
• Implementation
• Generally organized as a hash table, for efficient insertion and retrieval
• Because variable names may be very similar (e.g., LOOP1, LOOP2), the selected hash
function must perform well with such non-random keys
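A dynamic symbol table with an error flag for duplicate definitions could look like the Python sketch below. The class layout and method names are assumptions for illustration; the built-in dict supplies the hash table.

```python
class SymbolTable:
    def __init__(self):
        # label -> {"address": int, "duplicate": bool}
        self._table = {}

    def insert(self, label: str, address: int) -> bool:
        """Add a label; on a duplicate, keep the first address and set the error flag."""
        entry = self._table.get(label)
        if entry is not None:
            entry["duplicate"] = True
            return False
        self._table[label] = {"address": address, "duplicate": False}
        return True

    def lookup(self, label: str):
        """Return the address of a label, or None if it is not defined."""
        entry = self._table.get(label)
        return None if entry is None else entry["address"]

    def has_error(self, label: str) -> bool:
        """Report whether a label was defined more than once."""
        entry = self._table.get(label)
        return entry is not None and entry["duplicate"]
```

Returning None from lookup lets pass 2 distinguish an undefined symbol from one whose address is 0.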
Location Counter (LOCCTR)
• Used to help in the assignment of addresses
• Initialized to the beginning address specified in the START statement
• After each source statement is processed, the length of the assembled instruction
or data area to be generated is added
• Gives the address of a label
Pseudo Code for Pass 1 (SIC)[1]
• First, find the starting address of the program
• START – its operand will be the starting address
Pseudo Code for Pass 1 (SIC)[2]
• Whenever we find a label, save it in the symbol table
• Set the error flag if an unrecognized opcode is found or
if a symbol is defined more than once
Pseudo Code for Pass 1 (SIC)[3]
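The pass 1 steps can be put together in a short Python sketch. It is a simplified illustration rather than the slide's pseudocode: source lines are assumed to be already split into (label, opcode, operand) tuples, the START operand is read as hexadecimal, only a small hypothetical subset of mnemonics is recognized, and every SIC instruction is taken to be 3 bytes long.

```python
MNEMONICS = {"LDA", "STA", "ADD"}  # small illustrative subset of the SIC instruction set


def pass1(lines):
    """Assign addresses and build the symbol table.

    Returns (symtab, start_address, program_length, errors).
    """
    symtab, errors = {}, []
    start = locctr = 0
    for label, opcode, operand in lines:
        if opcode == "START":
            start = locctr = int(operand, 16)  # operand is the starting address
            continue
        if opcode == "END":
            break
        if label:
            if label in symtab:
                errors.append(f"duplicate symbol: {label}")
            else:
                symtab[label] = locctr         # label gets the current LOCCTR
        if opcode in MNEMONICS or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            body = operand[2:-1]               # strip C'...' or X'...'
            locctr += len(body) if operand[0] == "C" else (len(body) + 1) // 2
        else:
            errors.append(f"invalid operation code: {opcode}")
    return symtab, start, locctr - start, errors
```

The program length returned here is what the header record of the object program needs.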
Pseudo Code for Pass 2 (SIC)[1]
Pseudo Code for Pass 2 (SIC)[2]
Pseudo Code for Pass 2 (SIC)[3]
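A matching Python sketch of pass 2 for standard SIC instructions is given below. It assumes the 24-bit SIC layout (8-bit opcode, 1-bit index flag, 15-bit address), takes the symbol table from pass 1 as a plain dict, and uses a small hypothetical OPTAB subset; the opcode values for STA, STL, and STCH are the standard SIC ones, consistent with the object program shown earlier.

```python
OPTAB = {"LDA": 0x00, "STA": 0x0C, "STL": 0x14, "ADD": 0x18, "STCH": 0x54}


def pass2(lines, symtab):
    """Translate (label, opcode, operand) tuples into object code strings.

    Returns (object_codes, errors).
    """
    object_codes, errors = [], []
    for _label, opcode, operand in lines:
        if opcode in OPTAB:
            indexed = operand.endswith(",X")
            symbol = operand[:-2] if indexed else operand
            address = symtab.get(symbol)
            if address is None:
                errors.append(f"undefined symbol: {symbol}")
                address = 0                    # assemble with address 0 and flag the error
            if indexed:
                address |= 0x8000              # set the index bit
            object_codes.append(f"{OPTAB[opcode]:02X}{address:04X}")
        elif opcode == "WORD":
            object_codes.append(f"{int(operand) & 0xFFFFFF:06X}")
        elif opcode == "BYTE":
            body = operand[2:-1]               # strip C'...' or X'...'
            if operand[0] == "C":
                object_codes.append(body.encode("ascii").hex().upper())
            else:
                object_codes.append(body.upper())
        # START, END, RESW, and RESB generate no object code
    return object_codes, errors
```

The resulting strings would then be grouped into text records when the object program is written out.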
Chapter 2}
Chapter 3{

More Related Content

PPTX
Introduction to loaders
PPTX
MACRO PROCESSOR
PPTX
Ch 4 linker loader
PPTX
System programming
PPT
Assembler
PPTX
Single pass assembler
PPT
Assembler
PPT
Assemblers: Ch03
Introduction to loaders
MACRO PROCESSOR
Ch 4 linker loader
System programming
Assembler
Single pass assembler
Assembler
Assemblers: Ch03

What's hot (20)

PDF
loaders and linkers
PPTX
System software - macro expansion,nested macro calls
PPTX
System Programing Unit 1
PPTX
Interactive debugging system
PPTX
Unit 3 sp assembler
PPTX
Linker and Loader
PPTX
COMPILER DESIGN OPTIONS
PPT
How to execute a C program
PPTX
System Programming Unit II
PPTX
Lexical Analysis - Compiler Design
PDF
Unit 3
PPT
PL/SQL Introduction and Concepts
PPTX
DMA and DMA controller
PPTX
Oops in vb
PPT
Operating system services 9
PPTX
Instruction sets of 8086
PPT
Instruction format
PDF
System programming note
loaders and linkers
System software - macro expansion,nested macro calls
System Programing Unit 1
Interactive debugging system
Unit 3 sp assembler
Linker and Loader
COMPILER DESIGN OPTIONS
How to execute a C program
System Programming Unit II
Lexical Analysis - Compiler Design
Unit 3
PL/SQL Introduction and Concepts
DMA and DMA controller
Oops in vb
Operating system services 9
Instruction sets of 8086
Instruction format
System programming note
Ad

Similar to Assembler (20)

PPTX
PPT
Assembler
PPT
Assembler
PPT
assembler_full_slides.ppt
PPT
Unit 3 assembler and processor
PPTX
basic assembler functions in system software.pptx
PPT
Assembler
PPT
Mod 5.1 - Assembler-Summaryyyyyyyyyyyyyyy.ppt
PPTX
Programming the basic computer
PPTX
Ch 3 Assembler in System programming
PPTX
SPOS UNIT1 PPTS (1).pptx
PPT
Bca 2nd sem-u-3.1-basic computer programming and micro programmed control
PDF
System software 5th unit
PPT
B.sc cs-ii-u-3.1-basic computer programming and micro programmed control
PDF
Examinable Question and answer system programming
PPT
Assembly language programming(unit 4)
PPTX
Computer Organization
PPT
basic computer programming and micro programmed control
PPT
Mca i-u-3-basic computer programming and micro programmed control
PPTX
System software module 1 presentation file
Assembler
Assembler
assembler_full_slides.ppt
Unit 3 assembler and processor
basic assembler functions in system software.pptx
Assembler
Mod 5.1 - Assembler-Summaryyyyyyyyyyyyyyy.ppt
Programming the basic computer
Ch 3 Assembler in System programming
SPOS UNIT1 PPTS (1).pptx
Bca 2nd sem-u-3.1-basic computer programming and micro programmed control
System software 5th unit
B.sc cs-ii-u-3.1-basic computer programming and micro programmed control
Examinable Question and answer system programming
Assembly language programming(unit 4)
Computer Organization
basic computer programming and micro programmed control
Mca i-u-3-basic computer programming and micro programmed control
System software module 1 presentation file
Ad

Recently uploaded (20)

PDF
Classroom Observation Tools for Teachers
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PDF
Hazard Identification & Risk Assessment .pdf
PDF
LDMMIA Reiki Yoga Finals Review Spring Summer
PPTX
Lesson notes of climatology university.
PPTX
History, Philosophy and sociology of education (1).pptx
PDF
Trump Administration's workforce development strategy
PPTX
Radiologic_Anatomy_of_the_Brachial_plexus [final].pptx
PPTX
Chinmaya Tiranga Azadi Quiz (Class 7-8 )
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PDF
RMMM.pdf make it easy to upload and study
PDF
Complications of Minimal Access Surgery at WLH
PDF
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
PDF
GENETICS IN BIOLOGY IN SECONDARY LEVEL FORM 3
PPTX
UNIT III MENTAL HEALTH NURSING ASSESSMENT
PDF
Paper A Mock Exam 9_ Attempt review.pdf.
PDF
Empowerment Technology for Senior High School Guide
PPTX
Orientation - ARALprogram of Deped to the Parents.pptx
PPTX
Digestion and Absorption of Carbohydrates, Proteina and Fats
PPTX
202450812 BayCHI UCSC-SV 20250812 v17.pptx
Classroom Observation Tools for Teachers
Final Presentation General Medicine 03-08-2024.pptx
Hazard Identification & Risk Assessment .pdf
LDMMIA Reiki Yoga Finals Review Spring Summer
Lesson notes of climatology university.
History, Philosophy and sociology of education (1).pptx
Trump Administration's workforce development strategy
Radiologic_Anatomy_of_the_Brachial_plexus [final].pptx
Chinmaya Tiranga Azadi Quiz (Class 7-8 )
Final Presentation General Medicine 03-08-2024.pptx
RMMM.pdf make it easy to upload and study
Complications of Minimal Access Surgery at WLH
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
GENETICS IN BIOLOGY IN SECONDARY LEVEL FORM 3
UNIT III MENTAL HEALTH NURSING ASSESSMENT
Paper A Mock Exam 9_ Attempt review.pdf.
Empowerment Technology for Senior High School Guide
Orientation - ARALprogram of Deped to the Parents.pptx
Digestion and Absorption of Carbohydrates, Proteina and Fats
202450812 BayCHI UCSC-SV 20250812 v17.pptx

Assembler

  • 1. Chapter 2: Assembler Overview Assembler Features Assembler Designing Options (1 pass and 2 pass)
  • 2. Overview[1] • Computers like ones and zeros (0001110010000110 ) and humans like symbols (ADD R6,R2,R6 ; increment index reg.) • Assembler is a program that turns symbols into machine instructions. • ISA-specific: close correspondence between symbols and instruction set • Mnemonics for opcodes • Labels for memory locations • Additional operations for allocating storage and initializing data
  • 3. Overview[2] • Basic Assembler functions • Translate mnemonic operation codes to their machine language equivalents • Symbolic operands (e.g., variable names)  addresses. • Choose the proper instruction format & addressing mode. • Constants  Numbers. • Output to object files and listing files.
  • 4. Overview[3] • Assembler is a program that accepts an assembly language program as input and produces its machine language equivalent along with information for the loader • The feature and design of an assembler depend • Source language it translate • The machine language it produce
  • 5. Overview[4] Source Program Assembler Object Code Loader Executable Code Linker
  • 6. Basics of Assembler Design[1] The design of assembler can be to perform the following: • Scanning (tokenizing) • Parsing (validating the instructions) • Creating the symbol table • Resolving the forward references • Converting into the machine language The design of assembler in other words: • Convert mnemonic operation codes to their machine language equivalents • Convert symbolic operands to their equivalent machine addresses • Decide the proper instruction format Convert the data constants to internal machine representations • Write the object program and the assembly listing
  • 7. Basics of Assembler Design[2] • Assembler Designing Steps 1. Machine architecture understanding 2. Identify the algorithms and the various data structures to be used • According to the required steps for assembling, the assembler also has to handle assembler directives, these do not generate the object code but directs the assembler to perform certain operation. • Assembler directives (also called pseudo-instructions) • Not translated into machine instructions • Provides instructions to the assembler
  • 8. Basics of Assembler Design[3] • Basic assembler directives • START: Specify name and starting address for the program • END: Indicate the end of the source program, and (optionally) the first executable instruction in the program • BYTE: Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant • WORD: Generate one-word integer constant • RESB: Reserve the indicated number of bytes for a data area • RESW: Reserve the indicated number of words for a data area
  • 9. Basics of Assembler Design[4] TEST START 1003 FIRST LDA FIVE STA ALPHA ALPHA RESW 1 /*symbolic variable*/ FIVE WORD 5 /*symbolic constant, Literal */ END FIRST LABEL Instruction Operand Psuedo Opcode OR Assembler Directives START RESW WORD, END
  • 10. Basics of Assembler Design[5] Line numbers (for reference) Address labels Mnemonic opcode operands comments
  • 11. Basics of Assembler Design[6] Index addressing Indicate comment lines
  • 12. Basics of Assembler Design[7]
  • 13. Basics of Assembler Design[8] • Example Program with Object Code • The first column shows the line number for that instruction, second column shows the addresses allocated to each instruction. • The third column indicates the labels given to the statement, and is followed by the instruction consisting of opcode and operand. • The last column gives the equivalent object code.
  • 14. Basics of Assembler Design[9] • The most important things which need to be concentrated is the generation of Symbol table and resolving forward references. • Symbol Table: • This is created during pass 1 • All the labels of the instructions are symbols • Table has entry for symbol name, address value. • Forward reference: • Symbols that are defined in the later part of the program are called forward referencing. • There will not be any address value for such symbols in the symbol table in pass 1.
  • 15. Object Program[1] The object code later will be loaded into memory for execution. The simple object program we use contains three types of records: • Object Program Format • Header Col. 1 H Col. 2~7 Program name Col. 8~13 Starting address of object program (hex) Col. 14-19 Length of object program in bytes (hex) • Text Col.1 T Col.2~7 Starting address for object code in this record (hex) Col. 8~9 Length of object code in this record in bytes (hex) Col. 10~69 Object code, represented in hexa (2 col. per byte) • End Col.1 E Col.2~7 Address of first executable instruction in object program (hex)
  • 16. Object Program[2] • H COPY 001000 00107ª • T 001000 1E 141033 482039 001036 281030 301015 482061 3C1003 00102A 0C1039 00102D T 00101E 15 0C1036 482061 081044 4C0000 454F46 000003 000000 T 002039 1E 041030 001030 E0205D 30203F D8205D 281030 302057 549039 2C205E 38203F T 002057 1C 101036 4C0000 F1 001000 041030 E02079 302064 509039 DC2079 2C1036 T 002073 07 382064 4C0000 05 • E 001000
  • 17. Design and Implementation Issues • Some of the features in the program depend on the architecture of the machine. • If the program is for SIC machine, then we have only limited instruction formats and hence limited addressing modes. • We have only single operand instructions; the operand is always a memory reference. • Hence the improved version of SIC/XE machine provides more instruction formats and hence more addressing modes. • The moment we change the machine architecture the availability of number of instruction formats and the addressing modes changes. • Therefore the design usually requires considering two things: • Machine-dependent features and • Machine-independent features.
  • 18. Assembler Features Machine-Dependent Assembler Features • Instruction formats • Addressing modes • Program relocation Machine Independent Assembler Features • Literals • Symbol-defining statements • Expressions • Program blocks • Control sections and program linking
  • 19. Machine-Dependent Features (Instruction Format and Addressing Mode) • The instruction formats depend on the memory organization and the size of the memory. • In SIC machine the memory is byte addressable. • How Assembler Recognizes the Addressing Mode e.g. SIC/XE • PC-relative or Base-relative addressing: op m • Indirect addressing: op @m • Immediate addressing: op #c • Extended format: +op m • Index addressing: op m,x • Register-to-register instructions • Larger memory -> multi-programming (program allocation)
  • 20. Machine-Dependent Features (Instruction Format and Addressing Mode) • Translation of instructions involving the register-to-register addressing mode • During pass 1 the registers can be entered as part of the symbol table itself. The value of each register is its equivalent numeric code. • During pass 2, these values are assembled along with the mnemonic's object code. If required, a separate table can be created with the register names and their equivalent numeric values. • Translation involving register-memory instructions: • In the SIC/XE machine there are four instruction formats and five addressing modes. • Among the instruction formats, format 3 and format 4 instructions are register-memory instructions. One of the operands is always in a register and the other operand is in memory. • The addressing mode tells us how the operand is to be fetched from memory.
  • 21. Machine-Dependent Features (Program Relocation) • The larger main memory of SIC/XE • Several programs can be loaded and run at the same time. • This kind of sharing of the machine between programs is called multiprogramming • To take full advantage • Load programs into memory wherever there is room • Not specifying a fixed address at assembly time • Called program relocation
  • 22. Machine-Dependent Features (Relocatable Program) • Modification record • Col 1 M • Col 2-7: Starting location of the address field to be modified, relative to the beginning of the program • Col 8-9: length of the address field to be modified, in half-bytes
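As a rough illustration of the Modification record layout, the Python sketch below builds one record. The helper names are hypothetical. `format4_modification` assumes the usual SIC/XE case in which the 20-bit address field of a format 4 instruction starts one byte after the instruction's own address and is 5 half-bytes long.

```python
def modification_record(field_start: int, half_bytes: int) -> str:
    """'M', start of the address field relative to the program start (6 hex digits),
    and the field length in half-bytes (2 hex digits)."""
    return f"M{field_start:06X}{half_bytes:02X}"


def format4_modification(instruction_address: int) -> str:
    """Assumed helper: record for the 20-bit address field of a format 4 instruction."""
    # The opcode and flag bits occupy the first 12 bits, so the address field
    # begins in the second byte of the instruction and spans 5 half-bytes.
    return modification_record(instruction_address + 1, 5)


if __name__ == "__main__":
    # A format 4 instruction assembled at relative address 0006.
    print(format4_modification(0x0006))
```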
  • 23. Machine-Independent Features • Features are not closely related to machine architecture. • More related to issues about: • Programmer convenience • Software environment • Common examples: • Literals • Symbol-defining statements • Expressions • Program blocks • Control sections • Assembler directives are widely used to support these features
  • 24. Machine-Independent Features (Literals) • Literal is equivalent to: • Define a constant explicitly and assign an address label for it • Use the label as the instruction operand • Why use literals: • To avoid defining the constant somewhere and making up a label for it • Instead, to write the value of a constant operand as a part of the instruction • How to use literals: • A literal is identified with the prefix =, followed by a specification of the literal value
  • 25. Machine-Independent Features (Literals vs. Immediate Operands) • Same: • Operand field contains constant values • Difference: • Immediate addressing: the assembler puts the constant value into the machine instruction itself • Literal: the assembler stores the constant value elsewhere and puts the address of that location into the machine instruction
  • 26. Machine-Independent Features (Literal Pool) • All of the literal operands are gathered together into one or more literal pools. • A literal pool is placed: • At the location where the LTORG directive is encountered • This keeps the literal operand close to the instruction that uses it • At the end of the object program, generated immediately following the END statement
  • 28. Machine-Independent Features (Literal Implementation) • Data structure: a literal table LITTAB • Literal name • Operand value and length • Address • LITTAB is often organized as a hash table, using the literal name or value as the key
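The LITTAB described above might be sketched as follows. This is a simplified illustration under stated assumptions: the class and method names (`Literal`, `LitTab`, `note`, `place_pool`) are hypothetical, a Python dict stands in for the hash table, the table is keyed by the literal text, and only the `=C'...'` and `=X'...'` forms are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Literal:
    value: bytes                   # operand value; its length is len(value)
    address: Optional[int] = None  # assigned when a literal pool is placed


def parse_literal(operand: str) -> bytes:
    """Convert =C'EOF' or =X'05' into the bytes to be stored in the pool."""
    body = operand[1:]                 # drop the '=' prefix
    kind, text = body[0], body[2:-1]   # strip the quotes around the value
    if kind == "C":
        return text.encode("ascii")
    if kind == "X":
        return bytes.fromhex(text)
    raise ValueError(f"unsupported literal: {operand}")


class LitTab:
    def __init__(self):
        self.entries = {}  # literal text -> Literal

    def note(self, operand: str) -> None:
        """Pass 1: record a literal operand; duplicates share one entry."""
        if operand not in self.entries:
            self.entries[operand] = Literal(parse_literal(operand))

    def place_pool(self, locctr: int) -> int:
        """At LTORG or END: give each unplaced literal an address.
        Returns the location counter after the pool."""
        for literal in self.entries.values():
            if literal.address is None:
                literal.address = locctr
                locctr += len(literal.value)
        return locctr
```

Keying by the literal text means `=C'EOF'` and `=X'454F46'` get separate entries even though they hold the same bytes; keying by value instead would merge them.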
  • 29. Machine-Independent Features (Symbols)[1] • How to define symbols and their values • Address label • The label is the symbol name and the assigned address is its value FIRST STL RETADR • Assembler directive EQU symbol EQU value • This statement enters the symbol into SYMTAB and assigns to it the value specified • The value can be a constant or an expression • Assembler directive ORG ORG value
  • 30. Machine-Independent Features (Symbols)[2] • Use of EQU • To improve the program readability, avoid using the magic numbers, make it easier to find and change constant values • +LDT #4096 • MAXLEN EQU 4096 +LDT #MAXLEN • To define mnemonic names for registers • A EQU 0 • X EQU 1 • BASE EQU R1 • COUNT EQU R2
  • 31. Machine-Independent Features (Symbols)[3] • Use of ORG • Indirect value assignment: ORG value • When ORG is encountered, the assembler resets its LOCCTR to the specified value • ORG will affect the values of all labels defined until the next ORG • If the previous value of LOCCTR can be automatically remembered, we can return to the normal use of LOCCTR by simply writing ORG
  • 32. Machine-Independent Features (Symbols)[4] • Forward reference is not allowed for EQU and ORG. • That is, all terms in the value field must have been defined previously in the program. • The reason is that all symbols must have been defined during Pass 1 in a two-pass assembler.
  • 33. Machine-Independent Features (Symbols)[5] • Usages of Symbols • Establish symbolic names to improve readability in place of numeric values. • E.g., +LDT #4096 can be changed to : MAXLEN EQU 4096 +LDT #MAXLEN • When the assembler encounters the EQU statement, it enters MAXLEN into SYMTAB (with value 4096). • During assembly of the LDT instruction, the assembler searches SYMTAB for the symbol MAXLEN, using its value as the operand in the instruction. • Define mnemonic names for registers
  • 34. Machine-Independent Features (Expressions)[1] • A single term as an instruction operand can be replaced by an expression. STAB RESB 1100 STAB RESB 11*100 STAB RESB (6+3+2)*MAXENTRIES • The assembler has to evaluate the expression to produce a single operand address or value. • Expressions consist of • Operator • +,-,*,/ (division is usually defined to produce an integer result) • Individual terms • Constants • User-defined symbols • Special terms, e.g., *, the current value of LOCCTR
  • 35. Machine-Independent Features (Expressions)[2] • Expressions are classified as either absolute or relative expressions, depending on the type of value they produce. • Absolute Expressions: An expression that uses only absolute terms is an absolute expression. An absolute expression may also contain relative terms, provided the relative terms occur in pairs and the terms in each pair have opposite signs. Example: MAXLEN EQU BUFEND-BUFFER • Relative Expressions: All the relative terms except one can be paired as described under absolute expressions. The remaining unpaired relative term must have a positive sign. Example: STAB EQU OPTAB + (BUFEND - BUFFER)
  • 36. Machine-Independent Features (Expressions)[3] • Illegal Expressions • These expressions represent neither an absolute value nor a location within the program, so they are considered illegal. • BUFEND + BUFFER • 100 - BUFFER • 3 * BUFFER
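The pairing rule for absolute, relative, and illegal expressions can be checked by counting signed relative terms. The Python sketch below is a simplified illustration: the `classify` function and its input format are assumptions made for this example, and it covers only sums and differences of terms. A product such as 3 * BUFFER is modeled here as three positive relative terms.

```python
def classify(terms):
    """Classify an expression given as a list of (sign, kind) pairs.

    sign is +1 or -1; kind is "abs" for an absolute term (a constant or an
    absolute symbol) or "rel" for a relative term (an address in the program).
    """
    # Relative terms that pair up with opposite signs cancel out, so only
    # the net count of relative terms matters.
    net_relative = sum(sign for sign, kind in terms if kind == "rel")
    if net_relative == 0:
        return "absolute"   # all relative terms are paired
    if net_relative == 1:
        return "relative"   # exactly one unpaired, positive relative term
    return "illegal"


if __name__ == "__main__":
    print(classify([(+1, "rel"), (-1, "rel")]))               # BUFEND - BUFFER
    print(classify([(+1, "rel"), (+1, "rel"), (-1, "rel")]))  # OPTAB + (BUFEND - BUFFER)
    print(classify([(+1, "rel"), (+1, "rel")]))               # BUFEND + BUFFER
    print(classify([(+1, "abs"), (-1, "rel")]))               # 100 - BUFFER
```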
  • 37. Machine-Independent Features (Program blocks) • Collect many pieces of code/data that scatter in the source program but have the same kind into a single block in the generated object program. • For example, code block, initialized data block, un-initialized data block. (Like code, data segments on a Pentium PC). • Advantage: • Because pieces of code are closer to each other now, format 4 can be replaced with format 3, saving space and execution time. • Code sharing and data protection can better be done. • With this function, in the source program, the programmer can put related code and data near each other for better readability.
  • 39. Machine-Independent Features (Control Sections) • A control section is a part of the program that maintains its identity after assembly. • Each such control section can be loaded and relocated independently of the others. (Main advantage) • Different control sections are often used for subroutines or other logical subdivisions of a program. • The programmer can assemble, load, and manipulate each of these control sections separately.
  • 40. Machine-Independent Features (Control Sections External References) • Symbols that are defined in one control section cannot be used directly by another control section. • They must be identified as external references for the loader to handle. • Two assembler directives are used: • EXTDEF (external definition) • Identify those symbols that are defined in this control section and can be used in other control sections. • Control section names are automatically considered as external symbols. • EXTREF (external reference) • Identify those symbols that are used in this control section but defined in other control sections.
  • 41. Assembler Design Options[4] • The assembler design can be done: • One pass assembler • Passes over the file once, that is it collects all the information in one loop • Two pass assembler: • Two passes over the source file: • In first pass, looks for label definitions and introduces them in the symbol table • In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations into machine codes and so on. • A strict 1-pass scanner cannot assemble source code which contains forward references.
  • 42. Assembler Design Options[4] • Forward Referencing in a Two-Pass Assembler: • An instruction contains a forward reference if a symbol it uses is not yet defined. • If the program is processed line by line (scanning, parsing, and object code conversion), the address of such a symbol cannot be resolved when the instruction is reached. • Because of this problem, most assemblers are designed to process the program in two passes.
  • 43. One-Pass Assembler[1] • The whole process of scanning, parsing, and object code conversion is done in single pass. • The only problem with this method is resolving forward reference. • This is shown with an example below:
  • 44. One-Pass Assembler[2] • Main problem • Forward references • Solution • Require that all areas be defined before they are referenced. • It is possible, although inconvenient, to do so for data items. • Forward jump to instruction items cannot be easily eliminated. • So, a multi-pass assembler resolves the forward references and then converts into the object code.
  • 45. One-Pass Assembler[3] • There are two types of one-pass assemblers: • One that produces object code directly in memory for immediate execution (load-and-go assemblers). • The other type produces the usual kind of object code for later execution. • Forward Reference in One-pass Assembler • Omits the operand address if the symbol has not yet been defined • Enters this undefined symbol into SYMTAB and indicates that it is undefined • Adds the address of the operand field to a list of forward references associated with the SYMTAB entry
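The forward-reference handling of a one-pass assembler can be sketched as below. This is a minimal illustration, not the slides' own algorithm: the class `OnePassSymtab` and its methods are hypothetical, and `memory` is a plain dict that maps the address of an operand field to the value stored there.

```python
class OnePassSymtab:
    def __init__(self):
        self.defined = {}   # symbol -> address
        self.pending = {}   # undefined symbol -> operand-field addresses to patch

    def reference(self, symbol, operand_address, memory):
        """Called when an instruction uses `symbol` as its operand."""
        if symbol in self.defined:
            memory[operand_address] = self.defined[symbol]
        else:
            # Omit the operand address for now and remember where to patch it.
            memory[operand_address] = 0
            self.pending.setdefault(symbol, []).append(operand_address)

    def define(self, symbol, address, memory):
        """Called when the label `symbol` is finally encountered."""
        self.defined[symbol] = address
        # Resolve every forward reference recorded for this symbol.
        for operand_address in self.pending.pop(symbol, []):
            memory[operand_address] = address
```

Any symbol still left in `pending` when the END statement is reached was referenced but never defined, which an assembler would report as an error.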
  • 46. One-Pass Assembler[4] • A load-and-go assembler generates its object code in memory for immediate execution. • No object program is written out, and no loader is needed. • It is useful in a system with frequent program development and testing. • Programs are re-assembled nearly every time they are run, so the efficiency of the assembly process is an important consideration.
  • 47. Two-Pass Assembler[1] • A two-pass assembler performs two sequential scans over the source code: 1. Pass 1: symbols and literals are defined 2. Pass 2: object program is generated • For a two pass assembler, forward references in symbol definition are not allowed • Prohibiting forward references in symbol definition is not a serious inconvenience. • Forward references tend to create difficulty for a person reading the program.
  • 48. Two-Pass Assembler[2] Pass 1 • Assign addresses to all statements in the program • Save the values (addresses) assigned to all labels (including label and variable names) for use in Pass 2 (deal with forward references) • Perform some processing of assembler directives (e.g., BYTE, RESW these can affect address assignment) • Defines the symbols in the symbol table(generate the symbol table) Pass 2 • Assemble instructions (generate opcode and look up addresses) • Generate data values defined by BYTE, WORD • Perform processing of assembler directives not done in Pass 1 • Write the object program and the assembly listing
  • 49. Two-Pass Assembler[3] • Simple assembler main internal data structures: • Operation Code Table (OPTAB): Contains symbolic instructions, their lengths and their op-codes (or subroutine to use for translation) • Symbol Table (SYMTAB): Contains labels and their values • Location Counter(LOCCTR): Points to the next location where the code will be placed
  • 50. Two-Pass Assembler[4] • OPTAB looks up mnemonic opcodes & translates them to their machine language equivalents • SYMTAB stores values (addresses) assigned to labels
  • 51. Operation Code Table (OPTAB)[1] • Used to look up mnemonic operation codes and translate them into machine language equivalents • Contains the mnemonic operation code and its machine language equivalent • In more complex assemblers, contains information like instruction format and length • Example entries (Instruction | Op code | Length in bytes): ADD m | 18 | 3; LDA m | 00 | 3
  • 52. Operation Code Table (OPTAB)[2] • Content • The mapping between mnemonic and machine code. Also include the instruction format, available addressing modes, and length information • Characteristic • Static table - the content will never change • Contents are not normally added/deleted (predefined) • Implementation • Array or hash table, easy for search • Gives optimum performance for the particular set of keys being stored
  • 53. Symbol Table (SYMTAB)[1] • Used to store values (addresses) assigned to labels • Includes the name and value for each label • Flags to indicate error conditions, e.g. duplicate definition of labels • May contain other info like type or length about the data area or instruction labeled • Example (Label | Address): COPY 1000; FIRST 1000; CLOOP 1003; ENDFIL 1015; EOF 102A; THREE 102D; ZERO 1030; RETADR 1033; LENGTH 1036; BUFFER 1039; RDREC 2039; WRREC 2061 • Second example (LABEL | Address (LOC value)): FIRST 1003; ALPHA 1009; FIVE 1012
  • 54. Symbol Table (SYMTAB)[2] • Content • Flags to indicate error conditions, e.g. duplicate definition of labels • Characteristic • Dynamic table (i.e., symbols may be inserted, deleted, or searched in the table) • Implementation • Hash table can be used to speed up search • Organized generally as hash table, for efficiency of insertion & retrieval • Because variable names may be very similar (e.g., LOOP1, LOOP2), the selected hash function must perform well with such non-random keys
  • 55. Location Counter (LOCCTR) • Used to help in the assignment of addresses • Initialized to the beginning address specified in the START statement • After each source statement is processed, the length of the assembled instruction or data area to be generated is added • Gives the address of a label
  • 56. Pseudo Code for Pass 1 (SIC)[1] • First, find the starting address of the program • START: its operand gives the starting address
  • 57. Pseudo Code for Pass 1 (SIC)[2] • Whenever we find a label, save it in the symbol table • Set the error flag if an unrecognized opcode is found or if a symbol is defined more than once
  • 58. Pseudo Code for Pass 1 (SIC)[3]
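The Pass 1 steps for SIC can be sketched in Python as follows. This is a simplified illustration rather than the pseudo code from the slides: the source is assumed to be already split into `(label, opcode, operand)` tuples, `OPTAB` holds only three mnemonics, and the function names are hypothetical.

```python
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}  # small subset, for illustration only


def statement_length(opcode, operand, errors):
    """Number of bytes a statement adds to LOCCTR."""
    if opcode == "START":
        return 0
    if opcode in OPTAB or opcode == "WORD":
        return 3                      # every SIC instruction and WORD is 3 bytes
    if opcode == "RESW":
        return 3 * int(operand)
    if opcode == "RESB":
        return int(operand)
    if opcode == "BYTE":
        kind, text = operand[0], operand[2:-1]
        return len(text) if kind == "C" else len(text) // 2
    errors.append(f"invalid operation code: {opcode}")
    return 0


def pass1(lines):
    """Return (symtab, program_length, intermediate, errors)."""
    symtab, errors, intermediate = {}, [], []
    locctr = start = 0
    for label, opcode, operand in lines:
        if opcode == "START":
            locctr = start = int(operand, 16)   # operand is the starting address
        if opcode == "END":
            intermediate.append((locctr, label, opcode, operand))
            break
        if label:
            if label in symtab:
                errors.append(f"duplicate symbol: {label}")
            else:
                symtab[label] = locctr
        intermediate.append((locctr, label, opcode, operand))
        locctr += statement_length(opcode, operand, errors)
    return symtab, locctr - start, intermediate, errors
```

The `intermediate` list plays the role of the intermediate file: each statement is stored with its assigned address so that Pass 2 does not have to recompute it.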
  • 59. Pseudo Code for Pass 2 (SIC)[1]
  • 60. Pseudo Code for Pass 2 (SIC)[2]
  • 61. Pseudo Code for Pass 2 (SIC)[3]
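The core of Pass 2 for SIC, translating one statement into object code, might look like the sketch below. It assumes the standard SIC instruction layout (8-bit opcode, 1-bit index flag, 15-bit address). The function names are hypothetical and `OPTAB` is a small subset chosen for this example.

```python
OPTAB = {"LDA": 0x00, "STL": 0x14, "STCH": 0x54, "RSUB": 0x4C}  # subset, for illustration


def assemble_instruction(opcode, operand, symtab, errors):
    """Return the 6-hex-digit object code for one SIC instruction."""
    address, indexed = 0, False
    if operand:
        if operand.endswith(",X"):            # indexed addressing: op m,X
            indexed, operand = True, operand[:-2]
        if operand in symtab:
            address = symtab[operand]
        else:
            # Undefined symbol: keep address 0 and flag the error.
            errors.append(f"undefined symbol: {operand}")
    word = (OPTAB[opcode] << 16) | (0x8000 if indexed else 0) | (address & 0x7FFF)
    return f"{word:06X}"


def assemble_constant(directive, operand):
    """Return the object code for a BYTE or WORD directive."""
    if directive == "WORD":
        return f"{int(operand) & 0xFFFFFF:06X}"
    kind, text = operand[0], operand[2:-1]    # C'EOF' or X'F1'
    if kind == "C":
        return text.encode("ascii").hex().upper()
    return text.upper()
```

With RETADR at 1033 and BUFFER at 1039, `STL RETADR` assembles to 141033 and `STCH BUFFER,X` to 549039, which agrees with the object program listing on slide 16.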

Editor's Notes

  • #18: Anything to be fetched from memory requires more time.
  • #20: Benefits of SIC/XE Addressing Modes • Register-to-register instructions: shorter than register-to-memory instructions; no memory reference • Immediate addressing mode: no memory reference; the operand is already present as part of the instruction • Indirect addressing mode: avoids the need for another instruction • Relative addressing mode: shorter than the extended instruction; easy program relocation • A START directive that specifies a beginning program address of 0 indicates a relocatable program. • Register-to-register instructions: simply convert the mnemonic names to their numeric equivalents • OPTAB: for opcodes • SYMTAB: preloaded with register names and their values • Considering Instruction Formats • Relative addressing: op m • 1st choice: PC relative (arbitrarily chosen) • 2nd choice: base relative (if displacement is invalid in PC relative mode) • 3rd choice: error message (if displacement is invalid in both relative modes)
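The order of choices for relative addressing in note #20 can be sketched as below. The function name `choose_displacement` is hypothetical. The ranges assume the SIC/XE format 3 layout with a 12-bit displacement: signed for PC-relative (-2048 to 2047) and unsigned for base-relative (0 to 4095).

```python
def choose_displacement(target, pc, base=None):
    """Pick the addressing mode for a format 3 instruction.

    target: address of the operand; pc: address of the next instruction;
    base: contents of the base register if known to the assembler, else None.
    Returns (mode, 12-bit displacement field).
    """
    disp = target - pc
    if -2048 <= disp <= 2047:
        return "pc", disp & 0xFFF          # two's complement in 12 bits
    if base is not None:
        disp = target - base
        if 0 <= disp <= 4095:
            return "base", disp
    raise ValueError("displacement is invalid in both relative modes")


if __name__ == "__main__":
    # Forward reference close to the instruction: PC-relative works.
    print(choose_displacement(target=0x0030, pc=0x0003))
    # Backward jump: negative displacement, stored as two's complement.
    print(choose_displacement(target=0x0006, pc=0x001A))
    # Too far for PC-relative, so base-relative is used.
    print(choose_displacement(target=0x0036, pc=0x1051, base=0x0033))
```

When the function raises, a programmer would typically switch to the extended format (+op m), which carries a full 20-bit address.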