UNIT 3
EMBEDDED PROGRAMMING
Components of Embedded Programs
State machine
Stream-oriented programming and circular buffer
Queue (used for DSP)
State machine
State machine keeps internal state as a
variable, changes state based on inputs.
Uses:
control-dominated code;
reactive systems.
State machine example
[State diagram, seat-belt controller: states idle, seated, belted, buzzer.
Transitions (input/action):
idle: no seat/- (stay); seat/timer on -> seated
seated: no belt and no timer/- (stay); belt/- -> belted; no belt and timer/buzzer on -> buzzer; no seat/- -> idle
belted: no belt/timer on -> seated; no seat/- -> idle
buzzer: belt/buzzer off -> belted; no seat/buzzer off -> idle]
C implementation
#define IDLE 0
#define SEATED 1
#define BELTED 2
#define BUZZER 3
switch (state) {
case IDLE: if (seat) { state = SEATED; timer_on = TRUE; }
break;
case SEATED: if (belt) state = BELTED;
else if (timer) state = BUZZER;
break;
…
}
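The slide elides the remaining cases. A minimal sketch of how the BELTED and BUZZER cases might look is given below; the helper buzzer_off() and the exact transition conditions are assumptions based on the state diagram, not taken from the slide.
/* Sketch of the elided cases (hypothetical helper buzzer_off(); inputs as above): */
case BELTED: if (!seat) state = IDLE;                 /* occupant left the seat */
             else if (!belt) {                        /* belt released: restart timer */
                 state = SEATED; timer_on = TRUE;
             }
             break;
case BUZZER: if (belt) { buzzer_off(); state = BELTED; }      /* belt fastened */
             else if (!seat) { buzzer_off(); state = IDLE; }  /* occupant left */
             break;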
Stream oriented and circular
buffer programming
Commonly used for data streams:
new data constantly arrives;
each datum has a limited lifetime.
Use a circular buffer to hold the data
stream.
[Figure: a window over the data stream d1 ... d7 at time t and at time t+1.]
Circular buffer
[Figure: data stream x1, x2, ... arriving at times t1, t2, t3; the circular buffer first holds x1-x4, and later samples x5-x7 overwrite the oldest entries.]
Circular buffers
Indexes locate the currently used data and the current input position:
[Figure: at time t1 the buffer holds d1-d4 with use and input pointers; at time t1+1, input d5 overwrites d1 and the pointers advance.]
Circular buffer
implementation: FIR filter
int circ_buffer[N], circ_buffer_head = 0;
int c[N]; /* coefficients */
…
int f, ibuf, ic;
for (f = 0, ibuf = circ_buffer_head, ic = 0;
     ic < N;
     ibuf = (ibuf == N-1 ? 0 : ibuf + 1), ic++)
   f = f + c[ic] * circ_buffer[ibuf];
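A companion sketch of how a new sample could be placed into the same buffer (the head is advanced past the oldest entry; the function name is illustrative):
/* Overwrite the oldest sample and advance the head, wrapping at N. */
void circ_buffer_add(int new_sample)
{
   circ_buffer[circ_buffer_head] = new_sample;
   circ_buffer_head = (circ_buffer_head == N - 1) ? 0 : circ_buffer_head + 1;
}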
Queues
Queues are also used in signal processing and event processing.
They are used whenever data may arrive and depart at unpredictable times.
Elastic buffer: holds data that arrives irregularly.
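A minimal elastic-buffer sketch, assuming a fixed-size array with head and tail indices (all names and the size are illustrative):
#define Q_SIZE 32

int q[Q_SIZE];
int q_head = 0, q_tail = 0;     /* q_head: next free slot; q_tail: oldest element */

int enqueue(int val)            /* returns 0 on success, -1 if full */
{
   int next = (q_head + 1) % Q_SIZE;
   if (next == q_tail) return -1;    /* full: one slot is kept empty */
   q[q_head] = val;
   q_head = next;
   return 0;
}

int dequeue(int *val)           /* returns 0 on success, -1 if empty */
{
   if (q_head == q_tail) return -1;  /* empty */
   *val = q[q_tail];
   q_tail = (q_tail + 1) % Q_SIZE;
   return 0;
}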
Models of programs
Models of programs are more general than source code.
The fundamental model for programs is the control/data flow graph (CDFG).
The CDFG has constructs for both data operations and control operations.
Data flow graph
DFG: data flow graph.
Does not represent control.
Models a basic block: code with only one entry point and one exit point.
Describes the minimal ordering
requirements on operations.
Single assignment form
x = a + b;
y = c - d;
z = x * y;
y = b + d;
original basic block
x = a + b;
y = c - d;
z = x * y;
y1 = b + d;
single assignment form
Data flow graph
x = a + b;
y = c - d;
z = x * y;
y1 = b + d;
single assignment form
[DFG: inputs a, b, c, d; a + node computes x = a + b, a - node computes y = c - d, a * node computes z = x * y, and another + node computes y1 = b + d.]
DFGs and partial orders
Partial order:
a+b and c-d must complete before x*y; b+d is unordered with respect to the others.
Operations that are unordered can be done in any order.
[DFG as above.]
Control-data flow graph
CDFG: represents control and data.
Uses data flow graphs as components.
Two types of nodes:
Decision nodes
data flow nodes
Data flow node
Encapsulates a data flow graph:
Write operations in basic block form for
simplicity.
x = a + b;
y = c + d;
Control
[Figure: a decision node tests cond and has T and F exits; an equivalent multi-way form tests a value and branches to cases v1, v2, v3, v4.]
v4
CDFG example
if (cond1) bb1();
else bb2();
bb3();
switch (test1) {
case c1: bb4(); break;
case c2: bb5(); break;
case c3: bb6(); break;
}
[CDFG: a decision node tests cond1, with the T branch going to bb1() and the F branch to bb2(); both rejoin at bb3(). A second decision node tests test1 and branches to bb4(), bb5(), or bb6() for cases c1, c2, c3.]
for loop
for (i=0; i<N; i++)
loop_body();
equivalent while loop:
i=0;
while (i<N) {
   loop_body(); i++; }
[CDFG: i=0; a decision node tests i<N; the T branch executes loop_body() and returns to the test; the F branch exits the loop.]
Assembly Linking and
Loading
Last steps in compilation process:
[Flow: each HLL source file is compiled into assembly; the assembler turns assembly into object code; the linker combines object modules into an executable; the loader places the executable in memory for execution.]
Multiple-module programs
Programs may be composed from several
files.
Addresses become more specific during
processing:
relative addresses are measured relative to
the start of a module;
absolute addresses are measured relative to
the start of the CPU address space.
Assemblers
Major tasks:
generate binary for symbolic instructions;
translate labels into addresses;
handle pseudo-ops (data, etc.).
Generally one-to-one translation.
Assembly labels:
ORG 100
label1 ADR r4,c
Symbol table
ADD r0,r1,r2
xx ADD r3,r4,r5
CMP r0,r3
yy SUB r5,r6,r7
assembly code
symbol table: xx = 0x8, yy = 0x10
Symbol table generation
Use program location counter (PLC) to
determine address of each location.
Scan program, keeping count of PLC.
Addresses are generated at assembly
time, not execution time.
Symbol table example
ADD r0,r1,r2
xx ADD r3,r4,r5
CMP r0,r3
yy SUB r5,r6,r7
symbol table: xx = 0x8, yy = 0x10
(the PLC is advanced past each instruction as the code is scanned)
Two-pass assembly
Pass 1:
generate symbol table
Pass 2:
generate binary instructions
Relative address
generation
Some label values may not be known at
assembly time.
Labels within the module may be kept in
relative form.
Pseudo-operations
Pseudo-ops do not generate instructions:
ORG sets program location.
EQU generates symbol table entry without
advancing PLC.
Data statements define data blocks.
Linking
Combines several object modules into a
single executable module.
Jobs:
put modules in order;
resolve labels across modules.
Externals and entry points
Module 1:
xxx ADD r1,r2,r3
    B a          ; external reference to a
yyy %1           ; entry point yyy
Module 2:
a   ADR r4,yyy   ; entry point a; external reference to yyy
    ADD r3,r4,r5
Module ordering
Code modules must be placed in absolute
positions in the memory space.
Load map or linker flags control the order
of modules.
[Figure: module1, module2, module3 placed in order in the memory map.]
Dynamic linking
Some operating systems link modules
dynamically at run time:
shares one copy of library among all
executing programs;
allows programs to be updated with new
versions of libraries.
Object code design
Memory Map design
Interrupt vectors and other information for I/O devices must be placed at specific locations.
Memory management tables must be set up.
Global variables used for communication between processes must be put at locations accessible to all users of the data.
Re-entrancy
Relocatability
Compilation Techniques
Compilation process.
Basic Compilation methods.
Compiler optimizations.
Interpreters and just-in-time compilers.
Compilation
Compilation strategy (Wirth):
compilation = translation + optimization
Compiler determines quality of code:
use of CPU resources;
memory access scheduling;
code size.
Basic compilation phases
HLL -> parsing, symbol table -> machine-independent optimizations -> instruction-level optimizations -> assembly
Basic compilation method
Statement translation
Procedures
Data structures
Statement translation and
optimization
Source code is translated into
intermediate form such as CDFG.
CDFG is transformed/optimized.
CDFG is translated into instructions with
optimization decisions.
Instructions are further optimized.
Arithmetic expressions
a*b + 5*(c-d)
[DFG: one * node computes a*b, a - node computes c-d, another * node computes 5*(c-d), and a + node adds the two products.]
Control code generation
if (a+b > 0)
x = 5;
else
x = 7;
[CDFG: a decision node tests a+b>0; the T branch executes x=5 and the F branch executes x=7.]
Procedure linkage
Need code to:
call and return;
pass parameters and results.
Parameters and returns are passed on
stack.
Procedures with few parameters may use
registers.
Procedure stacks
proc1(int a) {
   proc2(5);
}
[Figure: the stack grows as proc1 calls proc2; SP = stack pointer, FP = frame pointer; the argument 5 is accessed relative to SP.]
Data structures
Different types of data structures use
different data layouts.
Some offsets into data structure can be
computed at compile time, others must be
computed at run time.
One-dimensional arrays
C array name points to 0th element:
a[0], a[1], a[2], ...
a[1] = *(a + 1)
Two-dimensional arrays
Row-major layout (as used by C): elements are stored row by row.
a[0][0], a[0][1], ..., a[1][0], a[1][1], ...
For an N x M array, element a[i][j] is at offset i*M + j from the start of the array.
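A small sketch of the address computation (function and parameter names are illustrative):
/* Element (i,j) of an N x M row-major array lives at offset i*M + j. */
int get_elem(const int *a, int M, int i, int j)
{
   return *(a + i*M + j);    /* same element as a[i][j] for int a[N][M] */
}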
Structures
Fields within structures are at static offsets from the start of the structure:
struct mystruct {
   int field1;     /* offset 0, 4 bytes */
   char field2;    /* offset 4 */
};
struct mystruct a, *aptr = &a;
field2 is accessed at *((char *)aptr + 4).
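A small sketch that makes the static offset explicit using offsetof; the offset of field2 is 4 only on targets with a 4-byte int and no extra padding, as assumed on the slide.
#include <stddef.h>   /* offsetof */

struct mystruct {
   int  field1;      /* offset 0 */
   char field2;      /* typically offset 4 with a 4-byte int */
};

char read_field2(struct mystruct *aptr)
{
   /* Equivalent to aptr->field2: add the compile-time field offset to the base. */
   return *((char *)aptr + offsetof(struct mystruct, field2));
}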
Compiler optimization
If we want to write programs in a high-level language, we need to understand how to optimize them without rewriting them in assembly language.
This requires writing source code that causes the compiler to do what we want.
Loop transformations
Goals:
reduce loop overhead;
increase opportunities for pipelining;
improve memory system performance.
Loop unrolling
Reduces loop overhead, enables some
other optimizations.
It helps expose parallelism
Original:
for (i = 0; i < 4; i++)
   a[i] = b[i] * c[i];
Unrolled by a factor of 2:
for (i = 0; i < 2; i++) {
   a[i*2] = b[i*2] * c[i*2];
   a[i*2+1] = b[i*2+1] * c[i*2+1];
}
Loop fusion and
distribution
Fusion combines two loops into one:
for (i = 0; i < N; i++) a[i] = b[i] * 5;
for (j = 0; j < N; j++) w[j] = c[j] * d[j];
becomes
for (i = 0; i < N; i++) {
   a[i] = b[i] * 5; w[i] = c[i] * d[i];
}
Distribution breaks one loop into two.
Changes optimizations within loop body.
Loop tiling
Breaks one loop into a nest of loops.
Changes order of accesses within array.
Changes cache behavior.
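A minimal tiling sketch (array size N, tile size TILE, and the element-wise addition are all illustrative); the two inner loops touch one TILE x TILE block at a time so the working set fits in the cache.
#define N    128
#define TILE  16    /* chosen so one block of each array fits in the cache */

void tiled_add(int a[N][N], int b[N][N], int c[N][N])
{
   int ii, jj, i, j;
   /* Outer loops step block by block; inner loops stay within one block. */
   for (ii = 0; ii < N; ii += TILE)
      for (jj = 0; jj < N; jj += TILE)
         for (i = ii; i < ii + TILE; i++)
            for (j = jj; j < jj + TILE; j++)
               c[i][j] = a[i][j] + b[i][j];
}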
Array padding
Add array elements to change the mapping into the cache:
before: a[0][0] a[0][1] a[0][2] | a[1][0] a[1][1] a[1][2]
after: each row is padded with an extra, unused element so consecutive rows map to different cache locations.
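In source code the padding can be expressed simply by over-declaring the row length (dimensions are illustrative):
#define N 2
#define M 3

int a_unpadded[N][M];      /* rows may map onto the same cache sets */
int a_padded[N][M + 1];    /* one unused element per row shifts each row's mapping */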
Dead code elimination
Dead code:
#define DEBUG 0
if (DEBUG) dbg(p1);
Can be eliminated by
analysis of control flow,
constant folding.
Dead code can also be generated by the compiler.
Dead code can be identified by reachability analysis.
Register allocation
Goals:
choose a register to hold each variable;
determine the lifespan of each variable in its register.
Basic case: within basic block.
Register lifetime graph
w = a + b;
x = c + w;
y = c + d;
[Lifetime graph over statement times t=1, t=2, t=3:
a, b live at t=1; c live at t=2..3; d live at t=3;
w live t=1..2; x live at t=2; y live at t=3.]
Instruction scheduling
Non-pipelined machines do not need
instruction scheduling: any order of
instructions that satisfies data
dependencies runs equally fast.
In pipelined machines, execution time of
one instruction depends on the nearby
instructions: opcode, operands.
Reservation table
A reservation table
relates
instructions/time to
CPU resources.
Time/instr   A   B
instr1       X
instr2       X   X
instr3           X
instr4           X
Software pipelining
Schedules instructions across loop
iterations.
Reduces instruction latency in iteration i
by inserting instructions from iteration
i+1.
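Real software pipelining is done on the instruction schedule, but the idea can be sketched at the C level: the loads for iteration i+1 are issued while iteration i's multiply-accumulate is still being computed (the dot-product example and names are illustrative; assumes n >= 1).
/* Software-pipelined view of: for (i = 0; i < n; i++) s += a[i]*b[i]; */
int dot(const int *a, const int *b, int n)
{
   int s = 0;
   int x = a[0], y = b[0];                 /* prologue: first iteration's loads */
   int i;
   for (i = 0; i < n - 1; i++) {
      int nx = a[i + 1], ny = b[i + 1];    /* next iteration's loads */
      s += x * y;                          /* this iteration's multiply-accumulate */
      x = nx; y = ny;
   }
   s += x * y;                             /* epilogue: last multiply-accumulate */
   return s;
}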
Instruction selection
May be several ways to implement an
operation or sequence of operations.
Represent operations as graphs, match
possible instruction sequences onto
graph.
[Expression templates: a lone * matches MUL, a lone + matches ADD, and a * feeding a + matches MADD (multiply-add); the templates are matched onto the expression graph.]
Using your compiler
Understand the various optimization levels (-O1, -O2, etc.).
Look at mixed compiler/assembler output.
Modifying compiler output requires care:
correctness;
loss of hand-tweaked code.
Interpreters and JIT
compilers
Interpreter: translates and executes
program statements on-the-fly.
JIT compiler: compiles small sections of
code into instructions during program
execution.
Eliminates some translation overhead.
Often requires more memory.
Program-level performance
analysis
Need to understand performance in detail:
Real-time behavior, not just typical behavior.
On complex platforms.
Program performance is not the same as CPU performance:
The pipeline and cache are only windows into the program.
We must analyze the entire program.
Execution time calculation
Varies with input data:
Different-length paths (input values select different execution paths through the program).
Cache effects (these have a major effect on performance).
Instruction-level performance variations:
Floating-point operations.
Fetch times.
Measure program
performance
From microprocessor manufacturers:
Simulate execution of the CPU (simulator); makes CPU state visible.
By using a timer:
Requires modifying the program to control the timer; used to measure the performance of executing sections of code.
By using a logic analyzer:
Used to measure the start and stop times of a code segment.
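A hedged sketch of timer-based measurement using the standard C clock() function; an embedded target would read a hardware timer instead, and code_under_test() is only a placeholder for the section being measured.
#include <stdio.h>
#include <time.h>

extern void code_under_test(void);   /* placeholder for the measured section */

void measure(void)
{
   clock_t start, stop;
   start = clock();                  /* read timer before the section */
   code_under_test();
   stop = clock();                   /* read timer after the section */
   printf("elapsed: %f s\n", (double)(stop - start) / CLOCKS_PER_SEC);
}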
Types of Performance
measures on Programs
Average-case execution time. (Typical
Time)
Typically used in application programming.
Worst-case execution time. (Longest
time)
A component in deadline satisfaction.
Best-case execution time.
This measure can be important in multi-rate real-time systems.
Elements of program
performance
Basic program execution time formula:
execution time = program path + instruction timing
The path is the sequence of instructions executed by the program.
We can trace the execution path of a program through its high-level language specification.
It is hard to get accurate estimates of total execution time from a high-level program alone.
Instruction timing
Instruction timing is based on the sequence of instructions traced out by the program path.
Not all instructions take the same amount of time.
Multi-cycle instructions.
Fetches.
Execution times of instructions are not independent.
Pipeline interlocks.
Cache effects.
Execution times may depend on operand values.
Floating-point operations.
Some multi-cycle integer operations.
Measurement-driven
performance analysis
The most direct way to determine the execution time of a program is to measure it.
Program trace: the process of combining the determination of the execution path with the timing of that path.
Measurement issues:
Need to know the desired input values.
May need to write software scaffolding to generate the input values (for example, in a simulator).
Software scaffolding may also need to examine outputs to generate feedback-driven inputs.
Measurement-driven performance analysis, cont'd.
Methods: directly on the hardware, or by using a simulator.
Profiling
Profiling is a simple method for analyzing software performance.
It does not measure execution time; it counts how often parts of the program are executed.
There are two major ways to profile a program: by instrumenting the code to count events, or by sampling the program counter periodically during execution.
Physical performance
measurement
In-circuit emulator allows tracing.
Affects execution timing.
Logic analyzer can measure behavior at pins.
Address bus can be analyzed to look for events.
Code can be modified to make events visible.
Particularly important for real-world input
streams.
Software performance
optimization
Software performance can be optimized
using several techniques such as loop and
cache optimizations.
Loop optimizations
Loops are good targets for optimization,
because programs with loops tend to
spend a lot of time executing those loops.
Basic loop optimizations:
code motion;
induction-variable elimination;
strength reduction (x*2 -> x<<1).
Code motion
Code motion can move
unnecessary code out of
the loop.
If a computation result does
not depend on operations
performed in the loop
body, then we can safely
move it out of the loop.
for (i=0; i<N*M; i++)
z[i] = a[i] + b[i];
After code motion, N*M is computed once, before the loop:
X = N*M;
for (i = 0; i < X; i++)
   z[i] = a[i] + b[i];
Induction variable
elimination
Induction variable: a variable whose value is derived from the loop iteration variable (the loop index).
Consider the loop:
for (i = 0; i < N; i++)
   for (j = 0; j < M; j++)
      z[i][j] = b[i][j];
Rather than recompute i*M + j for each array in each iteration, share one induction variable between the arrays and increment it at the end of the loop body (see the sketch below).
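A sketch of the transformed loop, assuming the arrays are addressed through a flat pointer (names are illustrative):
/* Before: z[i][j] = b[i][j]; recomputes i*M + j for each array, each iteration.
   After: one shared induction variable idx holds the flattened index. */
void copy2d(int *z, const int *b, int N, int M)
{
   int i, j, idx = 0;                /* idx tracks i*M + j */
   for (i = 0; i < N; i++)
      for (j = 0; j < M; j++) {
         z[idx] = b[idx];            /* one index serves both arrays */
         idx++;                      /* incremented at the end of the body */
      }
}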
Strength Reduction
Strength reduction helps us reduce the cost of a loop iteration. Consider the assignment
y = x * 2;
In integer arithmetic, we can use a left shift (y = x << 1) rather than a multiplication by 2. If the shift is faster than the multiply, we want to perform the substitution.
Cache Optimization
Loop nest: set of loops, one inside other.
Perfect loop nest: no conditionals in nest.
Because loops use large quantities of
data, cache conflicts are common.
Performance optimization
hints
Use registers efficiently.
Use page mode memory accesses.
Analyze cache behavior:
instruction conflicts can be handled by rewriting code or rescheduling;
conflicting scalar data can easily be moved;
conflicting array data can be moved or padded.
Program level
Energy/power optimization
Power consumption is a particularly
important design metric for battery
powered systems because the battery has
a very limited lifetime.
Energy: ability to do work.
Most important in battery-powered systems.
Power: energy per unit time.
Important even in wall-plug systems---power
becomes heat.
Measuring energy
consumption
Execute a small loop, measure current:
while (TRUE)
{
test_code();
}
[Figure: an ammeter measures the current I drawn by the CPU while the loop runs.]
Energy consumption of the
program
Energy consumption varies somewhat from instruction to instruction.
The sequence of instructions has some influence.
The opcode and the locations of the operands also matter.
Careful use of the memory system also reduces power consumption.
Caches are an important factor in energy consumption.
Cache behavior is
important
Energy consumption has a sweet spot as
cache size changes:
cache too small: program thrashes, burning
energy on external memory accesses;
cache too large: cache itself burns too much
power.
Optimizing for energy
Ways to optimize a program for low power:
High performance = low energy: making the program run faster also reduces energy consumption.
Not many instructions trade speed for energy.
Memory access patterns can be controlled by the programmer.
Optimizing for energy,
cont’d.
Use registers efficiently.
Identify and eliminate cache conflicts.
Make use of page-mode accesses in the memory system.
Moderate loop unrolling eliminates some loop-overhead instructions.
Eliminate pipeline stalls.
Eliminate recursive procedure calls.
Analysis and Optimization
of Program Size
The memory size of a program is determined by
the size of its data and instructions
Data provide an excellent opportunity for minimizing size because data usage is highly dependent on programming style.
Inefficient programs often keep several copies of data; identifying and eliminating duplicate copies can significantly improve memory utilization.
Analysis and Optimization
of Program Size
A very low level technique for minimizing
data is to reuse values.
Data buffers can be reused at several
different points in the program.
Goal:
reduce hardware cost of memory;
reduce power consumption of memory units.
Two opportunities:
data;
instructions.
Data size minimization
Reuse constants, variables, data buffers in
different parts of code.
Requires careful verification of correctness.
Generate data using instructions.
Reducing code size
Avoid function inlining.
Choose CPU with compact instructions.
Use specialized instructions where
possible.
Program validation and
testing
Complex systems need testing to ensure that they work as intended.
Bugs are particularly easy to introduce in embedded systems, because specialized hardware and real-time responsiveness make programming more challenging.
There are many available software-testing techniques that can help us generate a comprehensive set of tests to ensure that our system works properly.
Program validation and
testing
Break the testing problem into subproblems and analyze each subproblem.
Two Major type of testing strategies:
Black box doesn’t look at the source code.
Clear box (white box) does look at the source code.
Clear-box testing
The control/data flow graph is an important tool in developing clear-box tests for a program.
Provide the program with inputs that exercise the path being tested.
Execute the program to perform the test.
Examine the outputs to determine whether the test was successful.
Execution paths and
testing
Paths are important in functional testing
as well as performance analysis.
In general, there is an exponential number of paths through a program.
Show that some paths dominate others.
Basis paths
Approximate CDFG
with undirected
graph.
Undirected graphs
have basis paths:
All paths are linear
combinations of basis
paths.
Cyclomatic complexity
Cyclomatic complexity is a bound on the size of basis sets:
e = number of edges
n = number of nodes
p = number of graph components
M = e - n + 2p
For example, a single if-then-else has e = 4, n = 4, p = 1, so M = 4 - 4 + 2 = 2.
Types of Clear box testing
Strategy
Branch testing: a simple condition-testing strategy.
Branch testing requires that the true and false branches of each conditional be covered, and that every simple condition in the conditional expression be tested at least once.
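A small illustration (the condition and test values are hypothetical, not from the slides): branch testing needs at least one input that takes the true branch and one that takes the false branch, and each simple condition should be exercised.
/* Conditional under test (illustrative): */
int cond(int a, int b, int c)
{
   if ((a > 0) || (b >= c))
      return 1;     /* true branch */
   return 0;        /* false branch */
}

/* Branch-testing inputs (sketch):
   cond(1, 0, 5) -> 1   first simple condition true
   cond(0, 7, 5) -> 1   second simple condition true
   cond(0, 0, 5) -> 0   both simple conditions false: false branch */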
Types of Clear box testing
Strategy
Domain testing: a more sophisticated strategy for testing conditionals.
Data flow testing: makes use of def-use analysis (short for definition-use analysis).
It selects paths that have some relationship to the program's function.
Loop testing
Loops need specialized tests to be tested
efficiently.
Heuristic testing strategy:
Skip loop entirely.
One loop iteration.
Two loop iterations.
# iterations much below max.
n-1, n, n+1 iterations where n is max.
Black-box testing
Black-box tests alone have a low probability of finding all the bugs in a program, so they are combined with clear-box tests to provide a well-rounded test suite.
Complements clear-box testing.
May require a large number of tests.
Tests software in different ways.
Black-box test vectors
Random tests.
May weight distribution based on software
specification.
Regression tests.
Tests of previous versions, bugs, etc.
May be clear-box tests of previous versions.
How much testing is
enough?
Exhaustive testing is impractical.
One important measure of test quality is the number of bugs that escape into the field.
Good organizations can test software to give very low field bug-report rates.
Error injection measures test quality:
Add known bugs.
Run your tests.
Determine % injected bugs that are caught.