UNIT 3
EMBEDDED PROGRAMMING
Components of Embedded Programs
State machine
Stream-oriented programming and circular buffer
Queue (used for DSP)
State machine
State machine keeps internal state as a
variable, changes state based on inputs.
Uses:
control-dominated code;
reactive systems.
State machine example
[State diagram, seat-belt controller: states idle, seated, belted, buzzer.
Transitions (input/action):
idle: no seat/- (stay); seat/timer on -> seated
seated: no belt and no timer/- (stay); belt/- -> belted; no belt and timer/buzzer on -> buzzer; no seat/- -> idle
belted: no belt/timer on -> seated; no seat/- -> idle
buzzer: belt/buzzer off -> belted; no seat/buzzer off -> idle]
C implementation
#define IDLE 0
#define SEATED 1
#define BELTED 2
#define BUZZER 3
switch (state) {
case IDLE: if (seat) { state = SEATED; timer_on = TRUE; }
break;
case SEATED: if (belt) state = BELTED;
else if (timer) state = BUZZER;
break;
…
}
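The slide elides the remaining cases. A minimal sketch of how the BELTED and BUZZER cases might look is given below; the helper buzzer_off() and the exact transition conditions are assumptions based on the state diagram, not taken from the slide.
/* Sketch of the elided cases (hypothetical helper buzzer_off(); inputs as above): */
case BELTED: if (!seat) state = IDLE;                 /* occupant left the seat */
             else if (!belt) {                        /* belt released: restart timer */
                 state = SEATED; timer_on = TRUE;
             }
             break;
case BUZZER: if (belt) { buzzer_off(); state = BELTED; }      /* belt fastened */
             else if (!seat) { buzzer_off(); state = IDLE; }  /* occupant left */
             break;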
Stream oriented and circular
buffer programming
Commonly used for data streams:
new data constantly arrives;
each datum has a limited lifetime.
Use a circular buffer to hold the data
stream.
[Figure: a window over the data stream d1 ... d7 at time t and at time t+1.]
Circular buffer
[Figure: data stream x1, x2, ... arriving at times t1, t2, t3; the circular buffer first holds x1-x4, and later samples x5-x7 overwrite the oldest entries.]
Circular buffers
Indexes locate the currently used data and the current input position:
[Figure: at time t1 the buffer holds d1-d4 with use and input pointers; at time t1+1, input d5 overwrites d1 and the pointers advance.]
Circular buffer
implementation: FIR filter
int circ_buffer[N], circ_buffer_head = 0;
int c[N]; /* coefficients */
…
int f, ibuf, ic;
for (f = 0, ibuf = circ_buffer_head, ic = 0;
     ic < N;
     ibuf = (ibuf == N-1 ? 0 : ibuf + 1), ic++)
   f = f + c[ic] * circ_buffer[ibuf];
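A companion sketch of how a new sample could be placed into the same buffer (the head is advanced past the oldest entry; the function name is illustrative):
/* Overwrite the oldest sample and advance the head, wrapping at N. */
void circ_buffer_add(int new_sample)
{
   circ_buffer[circ_buffer_head] = new_sample;
   circ_buffer_head = (circ_buffer_head == N - 1) ? 0 : circ_buffer_head + 1;
}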
Queues
Queues are also used in signal processing and event processing.
They are used whenever data may arrive and depart at unpredictable times.
Elastic buffer: holds data that arrives irregularly.
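A minimal elastic-buffer sketch, assuming a fixed-size array with head and tail indices (all names and the size are illustrative):
#define Q_SIZE 32

int q[Q_SIZE];
int q_head = 0, q_tail = 0;     /* q_head: next free slot; q_tail: oldest element */

int enqueue(int val)            /* returns 0 on success, -1 if full */
{
   int next = (q_head + 1) % Q_SIZE;
   if (next == q_tail) return -1;    /* full: one slot is kept empty */
   q[q_head] = val;
   q_head = next;
   return 0;
}

int dequeue(int *val)           /* returns 0 on success, -1 if empty */
{
   if (q_head == q_tail) return -1;  /* empty */
   *val = q[q_tail];
   q_tail = (q_tail + 1) % Q_SIZE;
   return 0;
}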
Models of programs
Models of programs are more general than source code.
The fundamental model for programs is the control/data flow graph (CDFG).
The CDFG has constructs for both data operations and control operations.
Data flow graph
DFG: data flow graph.
Does not represent control.
Models a basic block: code with only one entry point and one exit point.
Describes the minimal ordering
requirements on operations.
Single assignment form
x = a + b;
y = c - d;
z = x * y;
y = b + d;
original basic block
x = a + b;
y = c - d;
z = x * y;
y1 = b + d;
single assignment form
Data flow graph
x = a + b;
y = c - d;
z = x * y;
y1 = b + d;
single assignment form
[DFG: inputs a, b, c, d; a + node computes x = a + b, a - node computes y = c - d, a * node computes z = x * y, and another + node computes y1 = b + d.]
DFGs and partial orders
Partial order:
a+b and c-d must complete before x*y; b+d is unordered with respect to the others.
Operations that are unordered can be done in any order.
[DFG as above.]
Control-data flow graph
CDFG: represents control and data.
Uses data flow graphs as components.
Two types of nodes:
Decision nodes
data flow nodes
Data flow node
Encapsulates a data flow graph:
Write operations in basic block form for
simplicity.
x = a + b;
y = c + d;
Control
[Figure: a decision node tests cond and has T and F exits; an equivalent multi-way form tests a value and branches to cases v1, v2, v3, v4.]
v4
CDFG example
if (cond1) bb1();
else bb2();
bb3();
switch (test1) {
case c1: bb4(); break;
case c2: bb5(); break;
case c3: bb6(); break;
}
[CDFG: a decision node tests cond1, with the T branch going to bb1() and the F branch to bb2(); both rejoin at bb3(). A second decision node tests test1 and branches to bb4(), bb5(), or bb6() for cases c1, c2, c3.]
for loop
for (i=0; i<N; i++)
loop_body();
equivalent while loop:
i=0;
while (i<N) {
   loop_body(); i++; }
[CDFG: i=0; a decision node tests i<N; the T branch executes loop_body() and returns to the test; the F branch exits the loop.]
Assembly Linking and
Loading
Last steps in compilation process:
[Flow: each HLL source file is compiled into assembly; the assembler turns assembly into object code; the linker combines object modules into an executable; the loader places the executable in memory for execution.]
Multiple-module programs
Programs may be composed from several
files.
Addresses become more specific during
processing:
relative addresses are measured relative to
the start of a module;
absolute addresses are measured relative to
the start of the CPU address space.
Assemblers
Major tasks:
generate binary for symbolic instructions;
translate labels into addresses;
handle pseudo-ops (data, etc.).
Generally one-to-one translation.
Assembly labels:
ORG 100
label1 ADR r4,c
Symbol table
ADD r0,r1,r2
xx ADD r3,r4,r5
CMP r0,r3
yy SUB r5,r6,r7
assembly code
symbol table: xx = 0x8, yy = 0x10
Symbol table generation
Use program location counter (PLC) to
determine address of each location.
Scan program, keeping count of PLC.
Addresses are generated at assembly
time, not execution time.
Symbol table example
ADD r0,r1,r2
xx ADD r3,r4,r5
CMP r0,r3
yy SUB r5,r6,r7
symbol table: xx = 0x8, yy = 0x10
(the PLC is advanced past each instruction as the code is scanned)
Two-pass assembly
Pass 1:
generate symbol table
Pass 2:
generate binary instructions
Relative address
generation
Some label values may not be known at
assembly time.
Labels within the module may be kept in
relative form.
Pseudo-operations
Pseudo-ops do not generate instructions:
ORG sets program location.
EQU generates symbol table entry without
advancing PLC.
Data statements define data blocks.
Linking
Combines several object modules into a
single executable module.
Jobs:
put modules in order;
resolve labels across modules.
Externals and entry points
Module 1:
xxx ADD r1,r2,r3
    B a          ; external reference to a
yyy %1           ; entry point yyy
Module 2:
a   ADR r4,yyy   ; entry point a; external reference to yyy
    ADD r3,r4,r5
Module ordering
Code modules must be placed in absolute
positions in the memory space.
Load map or linker flags control the order
of modules.
[Figure: module1, module2, module3 placed in order in the memory map.]
Dynamic linking
Some operating systems link modules
dynamically at run time:
shares one copy of library among all
executing programs;
allows programs to be updated with new
versions of libraries.
Object code design
Memory Map design
Interrupt vectors and other information for I/O devices must be placed at specific locations.
Memory management tables must be set up.
Global variables used for communication between processes must be put at locations accessible to all users of the data.
Re-entrancy
Relocatability
Compilation Techniques
Compilation process.
Basic Compilation methods.
Compiler optimizations.
Interpreters and just-in-time compilers.
Compilation
Compilation strategy (Wirth):
compilation = translation + optimization
Compiler determines quality of code:
use of CPU resources;
memory access scheduling;
code size.
Basic compilation phases
HLL -> parsing, symbol table -> machine-independent optimizations -> instruction-level optimizations -> assembly
Basic compilation method
Statement translation
Procedures
Data structures
Statement translation and
optimization
Source code is translated into
intermediate form such as CDFG.
CDFG is transformed/optimized.
CDFG is translated into instructions with
optimization decisions.
Instructions are further optimized.
Arithmetic expressions
a*b + 5*(c-d)
[DFG: one * node computes a*b, a - node computes c-d, another * node computes 5*(c-d), and a + node adds the two products.]
Control code generation
if (a+b > 0)
x = 5;
else
x = 7;
[CDFG: a decision node tests a+b>0; the T branch executes x=5 and the F branch executes x=7.]
Procedure linkage
Need code to:
call and return;
pass parameters and results.
Parameters and returns are passed on
stack.
Procedures with few parameters may use
registers.
Procedure stacks
proc1(int a) {
   proc2(5);
}
[Figure: the stack grows as proc1 calls proc2; SP = stack pointer, FP = frame pointer; the argument 5 is accessed relative to SP.]
Data structures
Different types of data structures use
different data layouts.
Some offsets into data structure can be
computed at compile time, others must be
computed at run time.
One-dimensional arrays
C array name points to 0th element:
a[0], a[1], a[2], ...
a[1] = *(a + 1)
Two-dimensional arrays
Row-major layout (as used by C): elements are stored row by row.
a[0][0], a[0][1], ..., a[1][0], a[1][1], ...
For an N x M array, element a[i][j] is at offset i*M + j from the start of the array.
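A small sketch of the address computation (function and parameter names are illustrative):
/* Element (i,j) of an N x M row-major array lives at offset i*M + j. */
int get_elem(const int *a, int M, int i, int j)
{
   return *(a + i*M + j);    /* same element as a[i][j] for int a[N][M] */
}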
Structures
Fields within structures are at static offsets from the start of the structure:
struct mystruct {
   int field1;     /* offset 0, 4 bytes */
   char field2;    /* offset 4 */
};
struct mystruct a, *aptr = &a;
field2 is accessed at *((char *)aptr + 4).
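A small sketch that makes the static offset explicit using offsetof; the offset of field2 is 4 only on targets with a 4-byte int and no extra padding, as assumed on the slide.
#include <stddef.h>   /* offsetof */

struct mystruct {
   int  field1;      /* offset 0 */
   char field2;      /* typically offset 4 with a 4-byte int */
};

char read_field2(struct mystruct *aptr)
{
   /* Equivalent to aptr->field2: add the compile-time field offset to the base. */
   return *((char *)aptr + offsetof(struct mystruct, field2));
}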
Compiler optimization
If we want to write programs in a high-level language, we need to understand how to optimize them without rewriting them in assembly language.
This requires writing source code that causes the compiler to do what we want.
Loop transformations
Goals:
reduce loop overhead;
increase opportunities for pipelining;
improve memory system performance.
Loop unrolling
Reduces loop overhead, enables some
other optimizations.
It helps expose parallelism
Original:
for (i = 0; i < 4; i++)
   a[i] = b[i] * c[i];
Unrolled by a factor of 2:
for (i = 0; i < 2; i++) {
   a[i*2] = b[i*2] * c[i*2];
   a[i*2+1] = b[i*2+1] * c[i*2+1];
}
Loop fusion and
distribution
Fusion combines two loops into one:
for (i = 0; i < N; i++) a[i] = b[i] * 5;
for (j = 0; j < N; j++) w[j] = c[j] * d[j];
becomes
for (i = 0; i < N; i++) {
   a[i] = b[i] * 5; w[i] = c[i] * d[i];
}
Distribution breaks one loop into two.
Changes optimizations within loop body.
Loop tiling
Breaks one loop into a nest of loops.
Changes order of accesses within array.
Changes cache behavior.
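A minimal tiling sketch (array size N, tile size TILE, and the element-wise addition are all illustrative); the two inner loops touch one TILE x TILE block at a time so the working set fits in the cache.
#define N    128
#define TILE  16    /* chosen so one block of each array fits in the cache */

void tiled_add(int a[N][N], int b[N][N], int c[N][N])
{
   int ii, jj, i, j;
   /* Outer loops step block by block; inner loops stay within one block. */
   for (ii = 0; ii < N; ii += TILE)
      for (jj = 0; jj < N; jj += TILE)
         for (i = ii; i < ii + TILE; i++)
            for (j = jj; j < jj + TILE; j++)
               c[i][j] = a[i][j] + b[i][j];
}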
Array padding
Add array elements to change the mapping into the cache:
before: a[0][0] a[0][1] a[0][2] | a[1][0] a[1][1] a[1][2]
after: each row is padded with an extra, unused element so consecutive rows map to different cache locations.
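In source code the padding can be expressed simply by over-declaring the row length (dimensions are illustrative):
#define N 2
#define M 3

int a_unpadded[N][M];      /* rows may map onto the same cache sets */
int a_padded[N][M + 1];    /* one unused element per row shifts each row's mapping */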
Dead code elimination
Dead code:
#define DEBUG 0
if (DEBUG) dbg(p1);
Can be eliminated by
analysis of control flow,
constant folding.
Dead code can also be generated by the compiler.
Dead code can be identified by reachability analysis.
Register allocation
Goals:
choose a register to hold each variable;
determine the lifespan of each variable in its register.
Basic case: within basic block.
Register lifetime graph
w = a + b;
x = c + w;
y = c + d;
[Lifetime graph over statement times t=1, t=2, t=3:
a, b live at t=1; c live at t=2..3; d live at t=3;
w live t=1..2; x live at t=2; y live at t=3.]
Instruction scheduling
Non-pipelined machines do not need
instruction scheduling: any order of
instructions that satisfies data
dependencies runs equally fast.
In pipelined machines, execution time of
one instruction depends on the nearby
instructions: opcode, operands.
Reservation table
A reservation table
relates
instructions/time to
CPU resources.
Time/instr   A   B
instr1       X
instr2       X   X
instr3           X
instr4           X
Software pipelining
Schedules instructions across loop
iterations.
Reduces instruction latency in iteration i
by inserting instructions from iteration
i+1.
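Real software pipelining is done on the instruction schedule, but the idea can be sketched at the C level: the loads for iteration i+1 are issued while iteration i's multiply-accumulate is still being computed (the dot-product example and names are illustrative; assumes n >= 1).
/* Software-pipelined view of: for (i = 0; i < n; i++) s += a[i]*b[i]; */
int dot(const int *a, const int *b, int n)
{
   int s = 0;
   int x = a[0], y = b[0];                 /* prologue: first iteration's loads */
   int i;
   for (i = 0; i < n - 1; i++) {
      int nx = a[i + 1], ny = b[i + 1];    /* next iteration's loads */
      s += x * y;                          /* this iteration's multiply-accumulate */
      x = nx; y = ny;
   }
   s += x * y;                             /* epilogue: last multiply-accumulate */
   return s;
}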
Instruction selection
May be several ways to implement an
operation or sequence of operations.
Represent operations as graphs, match
possible instruction sequences onto
graph.
[Expression templates: a lone * matches MUL, a lone + matches ADD, and a * feeding a + matches MADD (multiply-add); the templates are matched onto the expression graph.]
Using your compiler
Understand the various optimization levels (-O1, -O2, etc.).
Look at mixed compiler/assembler output.
Modifying compiler output requires care:
correctness;
loss of hand-tweaked code.
Interpreters and JIT
compilers
Interpreter: translates and executes
program statements on-the-fly.
JIT compiler: compiles small sections of
code into instructions during program
execution.
Eliminates some translation overhead.
Often requires more memory.
Program-level performance
analysis
Need to understand performance in detail:
Real-time behavior, not just typical behavior.
On complex platforms.
Program performance is not the same as CPU performance:
The pipeline and cache are only windows into the program.
We must analyze the entire program.
Execution time calculation
Varies with input data:
Different-length paths (input values select different execution paths through the program).
Cache effects (these have a major effect on performance).
Instruction-level performance variations:
Floating-point operations.
Fetch times.
Measure program
performance
From microprocessor manufacturers:
Simulate execution of the CPU (simulator); makes CPU state visible.
By using a timer:
Requires modifying the program to control the timer; used to measure the performance of executing sections of code.
By using a logic analyzer:
Used to measure the start and stop times of a code segment.
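A hedged sketch of timer-based measurement using the standard C clock() function; an embedded target would read a hardware timer instead, and code_under_test() is only a placeholder for the section being measured.
#include <stdio.h>
#include <time.h>

extern void code_under_test(void);   /* placeholder for the measured section */

void measure(void)
{
   clock_t start, stop;
   start = clock();                  /* read timer before the section */
   code_under_test();
   stop = clock();                   /* read timer after the section */
   printf("elapsed: %f s\n", (double)(stop - start) / CLOCKS_PER_SEC);
}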
Types of Performance
measures on Programs
Average-case execution time. (Typical
Time)
Typically used in application programming.
Worst-case execution time. (Longest
time)
A component in deadline satisfaction.
Best-case execution time.
This measure can be important in multi-rate real-time systems.
Elements of program
performance
Basic program execution time formula:
execution time = program path + instruction timing
The path is the sequence of instructions executed by the program.
We can trace the execution path of a program through its high-level language specification.
It is hard to get accurate estimates of total execution time from a high-level program alone.
Instruction timing
Instruction timing is based on the sequence of instructions traced out by the program path.
Not all instructions take the same amount of time.
Multi-cycle instructions.
Fetches.
Execution times of instructions are not independent.
Pipeline interlocks.
Cache effects.
Execution times may depend on operand values.
Floating-point operations.
Some multi-cycle integer operations.
Measurement-driven
performance analysis
The most direct way to determine the execution time of a program is to measure it.
Program trace: the process of combining the determination of the execution path with the timing of that path.
Measurement issues:
Need to know the desired input values.
May need to write software scaffolding to generate the input values (for example, in a simulator).
Software scaffolding may also need to examine outputs to generate feedback-driven inputs.
Measurement-driven performance analysis, cont'd.
Methods: directly on the hardware, or by using a simulator.
Profiling
Profiling is a simple method for analyzing software performance.
It does not measure execution time; it counts how often parts of the program are executed.
There are two major ways to profile a program: by instrumenting the code to count events, or by sampling the program counter periodically during execution.
Physical performance
measurement
In-circuit emulator allows tracing.
Affects execution timing.
Logic analyzer can measure behavior at pins.
Address bus can be analyzed to look for events.
Code can be modified to make events visible.
Particularly important for real-world input
streams.
Software performance
optimization
Software performance can be optimized
using several techniques such as loop and
cache optimizations.
Loop optimizations
Loops are good targets for optimization,
because programs with loops tend to
spend a lot of time executing those loops.
Basic loop optimizations:
code motion;
induction-variable elimination;
strength reduction (x*2 -> x<<1).
Code motion
Code motion can move
unnecessary code out of
the loop.
If a computation result does
not depend on operations
performed in the loop
body, then we can safely
move it out of the loop.
for (i=0; i<N*M; i++)
z[i] = a[i] + b[i];
After code motion, N*M is computed once, before the loop:
X = N*M;
for (i = 0; i < X; i++)
   z[i] = a[i] + b[i];
Induction variable
elimination
Induction variable: a variable whose value is derived from the loop iteration variable (the loop index).
Consider the loop:
for (i = 0; i < N; i++)
   for (j = 0; j < M; j++)
      z[i][j] = b[i][j];
Rather than recompute i*M + j for each array in each iteration, share one induction variable between the arrays and increment it at the end of the loop body (see the sketch below).
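A sketch of the transformed loop, assuming the arrays are addressed through a flat pointer (names are illustrative):
/* Before: z[i][j] = b[i][j]; recomputes i*M + j for each array, each iteration.
   After: one shared induction variable idx holds the flattened index. */
void copy2d(int *z, const int *b, int N, int M)
{
   int i, j, idx = 0;                /* idx tracks i*M + j */
   for (i = 0; i < N; i++)
      for (j = 0; j < M; j++) {
         z[idx] = b[idx];            /* one index serves both arrays */
         idx++;                      /* incremented at the end of the body */
      }
}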
Strength Reduction
Strength reduction helps us reduce the cost of a loop iteration. Consider the assignment
y = x * 2;
In integer arithmetic, we can use a left shift (y = x << 1) rather than a multiplication by 2. If the shift is faster than the multiply, we want to perform the substitution.
Cache Optimization
Loop nest: set of loops, one inside other.
Perfect loop nest: no conditionals in nest.
Because loops use large quantities of
data, cache conflicts are common.
Performance optimization
hints
Use registers efficiently.
Use page mode memory accesses.
Analyze cache behavior:
instruction conflicts can be handled by rewriting code or rescheduling;
conflicting scalar data can easily be moved;
conflicting array data can be moved or padded.
Program level
Energy/power optimization
Power consumption is a particularly
important design metric for battery
powered systems because the battery has
a very limited lifetime.
Energy: ability to do work.
Most important in battery-powered systems.
Power: energy per unit time.
Important even in wall-plug systems---power
becomes heat.
Measuring energy
consumption
Execute a small loop, measure current:
while (TRUE)
{
test_code();
}
[Figure: an ammeter measures the current I drawn by the CPU while the loop runs.]
Energy consumption of the
program
Energy consumption varies somewhat from instruction to instruction.
The sequence of instructions has some influence.
The opcode and the locations of the operands also matter.
Careful use of the memory system also reduces power consumption.
Caches are an important factor in energy consumption.
Cache behavior is
important
Energy consumption has a sweet spot as
cache size changes:
cache too small: program thrashes, burning
energy on external memory accesses;
cache too large: cache itself burns too much
power.
Optimizing for energy
Ways to optimize a program for low power:
High performance = low energy: making the program run faster also reduces energy consumption.
Not many instructions trade speed for energy.
Memory access patterns can be controlled by the programmer.
Optimizing for energy,
cont’d.
Use registers efficiently.
Identify and eliminate cache conflicts.
Make use of page-mode accesses in the memory system.
Moderate loop unrolling eliminates some loop-overhead instructions.
Eliminate pipeline stalls.
Eliminate recursive procedure calls.
Analysis and Optimization
of Program Size
The memory size of a program is determined by
the size of its data and instructions
Data provide an excellent opportunity for minimizing size because data usage is highly dependent on programming style.
Inefficient programs often keep several copies of data; identifying and eliminating duplicate copies can significantly improve memory utilization.
Analysis and Optimization
of Program Size
A very low level technique for minimizing
data is to reuse values.
Data buffers can be reused at several
different points in the program.
Goal:
reduce hardware cost of memory;
reduce power consumption of memory units.
Two opportunities:
data;
instructions.
Data size minimization
Reuse constants, variables, data buffers in
different parts of code.
Requires careful verification of correctness.
Generate data using instructions.
Reducing code size
Avoid function inlining.
Choose CPU with compact instructions.
Use specialized instructions where
possible.
Program validation and
testing
Complex systems need testing to ensure that they work as intended.
Bugs are particularly easy to introduce in embedded systems, because specialized hardware and real-time responsiveness make programming more challenging.
There are many available software-testing techniques that can help us generate a comprehensive set of tests to ensure that our system works properly.
Program validation and
testing
Break the testing problem into subproblems and analyze each subproblem.
Two Major type of testing strategies:
Black box doesn’t look at the source code.
Clear box (white box) does look at the source code.
Clear-box testing
The control/data flow graph is an important tool in developing clear-box tests for a program.
Provide the program with inputs that exercise the path being tested.
Execute the program to perform the test.
Examine the outputs to determine whether the test was successful.
Execution paths and
testing
Paths are important in functional testing
as well as performance analysis.
In general, there is an exponential number of paths through a program.
Show that some paths dominate others.
Basis paths
Approximate CDFG
with undirected
graph.
Undirected graphs
have basis paths:
All paths are linear
combinations of basis
paths.
Cyclomatic complexity
Cyclomatic complexity is a bound on the size of basis sets:
e = number of edges
n = number of nodes
p = number of graph components
M = e - n + 2p
For example, a single if-then-else has e = 4, n = 4, p = 1, so M = 4 - 4 + 2 = 2.
Types of Clear box testing
Strategy
Branch testing: a simple condition-testing strategy.
Branch testing requires that the true and false branches of each conditional be covered, and that every simple condition in the conditional expression be tested at least once.
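A small illustration (the condition and test values are hypothetical, not from the slides): branch testing needs at least one input that takes the true branch and one that takes the false branch, and each simple condition should be exercised.
/* Conditional under test (illustrative): */
int cond(int a, int b, int c)
{
   if ((a > 0) || (b >= c))
      return 1;     /* true branch */
   return 0;        /* false branch */
}

/* Branch-testing inputs (sketch):
   cond(1, 0, 5) -> 1   first simple condition true
   cond(0, 7, 5) -> 1   second simple condition true
   cond(0, 0, 5) -> 0   both simple conditions false: false branch */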
Types of Clear box testing
Strategy
Domain testing: a more sophisticated strategy for testing conditionals.
Data flow testing: makes use of def-use analysis (short for definition-use analysis).
It selects paths that have some relationship to the program's function.
Loop testing
Loops need specialized tests to be tested
efficiently.
Heuristic testing strategy:
Skip loop entirely.
One loop iteration.
Two loop iterations.
# iterations much below max.
n-1, n, n+1 iterations where n is max.
Black-box testing
Black-box tests alone have a low probability of finding all the bugs in a program, so they are combined with clear-box tests to provide a well-rounded test suite.
Complements clear-box testing.
May require a large number of tests.
Tests software in different ways.
Black-box test vectors
Random tests.
May weight distribution based on software
specification.
Regression tests.
Tests of previous versions, bugs, etc.
May be clear-box tests of previous versions.
How much testing is
enough?
Exhaustive testing is impractical.
One important measure of test quality is the number of bugs that escape into the field.
Good organizations can test software to give very low field bug-report rates.
Error injection measures test quality:
Add known bugs.
Run your tests.
Determine % injected bugs that are caught.