The Embedded Hardware Architecture
@ ISA MODELS
Embedded Hardware: Building Blocks and the Embedded Board
In This Chapter
• Introducing the importance of being able to read a schematic diagram
• Discussing the major components of an embedded board
• Introducing the factors that allow an embedded device to work
• Discussing the fundamental elements of electronic components
Learn to Read a Schematic
Blueprint Reading
ALPHABET OF LINES
•Universal language for designers, engineers, & production personnel.
•Uses lines, numbers, symbols and illustrations.
Different Blueprint Forms:
•Drawings for fabrication (Standardized symbols for mechanical, welding,
construction, electrical wiring and assembly).
•Sketches (Illustrate an idea, technical principle or function).
Lines are made in definite standard forms: (all have specific meaning)
• Thickness of a line (thick or thin)
• Solid
• Broken
• Dashed
Von Neumann Model
[Figure: block diagram of the von Neumann model. MEMORY (with MAR and MDR), PROCESSING UNIT (ALU, TEMP), and CONTROL UNIT (IR, PC); INPUT devices: keyboard, mouse, scanner, disk; OUTPUT devices: monitor, printer, LED, disk.]
Discussion of the internal processor design as related to the von Neumann model
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Memory
2^k × m array of stored bits
Address
• unique (k-bit) identifier of location
Contents
• m-bit value stored in location
Basic Operations:
LOAD
• read a value from a memory location
STORE
• write a value to a memory location
[Figure: memory drawn as a column of cells with 4-bit addresses 0000 through 1111; two example cells hold the 8-bit values 00101101 and 10100010.]
Interface to Memory
How does processing unit get data to/from memory?
MAR: Memory Address Register
MDR: Memory Data Register
To LOAD a location (A):
1. Write the address (A) into the MAR.
2. Send a “read” signal to the memory.
3. Read the data from MDR.
To STORE a value (X) to a location (A):
1. Write the data (X) to the MDR.
2. Write the address (A) into the MAR.
3. Send a “write” signal to the memory.
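The LOAD and STORE steps above can be sketched in a few lines of Python. This is only an illustrative model, not the LC-3 hardware: the class name `Memory`, the methods `read_signal` and `write_signal`, and the helper functions `load` and `store` are assumptions made for this example.

```python
class Memory:
    """Toy 2**k x m memory that is reached only through MAR and MDR.

    All names here are hypothetical; they only mirror the steps on the slide.
    """

    def __init__(self, k: int, m: int) -> None:
        self.m = m
        self.cells = [0] * (2 ** k)  # 2^k locations, each holding m bits
        self.mar = 0                 # Memory Address Register
        self.mdr = 0                 # Memory Data Register

    def read_signal(self) -> None:
        # "read": copy the addressed cell into the MDR
        self.mdr = self.cells[self.mar]

    def write_signal(self) -> None:
        # "write": copy the MDR into the addressed cell
        self.cells[self.mar] = self.mdr


def load(mem: Memory, address: int) -> int:
    mem.mar = address   # 1. write the address into the MAR
    mem.read_signal()   # 2. send a "read" signal to the memory
    return mem.mdr      # 3. read the data from the MDR


def store(mem: Memory, address: int, value: int) -> None:
    mem.mdr = value & ((1 << mem.m) - 1)  # 1. write the data to the MDR (kept to m bits)
    mem.mar = address                     # 2. write the address into the MAR
    mem.write_signal()                    # 3. send a "write" signal to the memory


mem = Memory(k=4, m=8)            # 16 locations of 8 bits
store(mem, 0b0100, 0b00101101)
print(load(mem, 0b0100))          # 45, i.e. 0b00101101
```

The point of the sketch is that the processing unit never touches a memory cell directly; every access goes through the two registers.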
Processing Unit
Functional Units
• ALU = Arithmetic and Logic Unit
• A processing unit could have many functional units, some of them special-purpose (multiply, square root, …)
• LC-3 performs ADD, AND, NOT
Registers
• Small, temporary storage
• Operands and results of functional units
• LC-3 has eight registers (R0, …, R7), each 16 bits wide
Word Size
• number of bits normally processed by ALU in one instruction
• also width of registers
• LC-3 is 16 bits
Input and Output
Devices for getting data into and out of computer memory
Each device has its own interface,
usually a set of registers like the
memory’s MAR and MDR
• LC-3 supports keyboard (input) and monitor (output)
• keyboard: data register (KBDR) and status register (KBSR)
• monitor: data register (DDR) and status register (DSR)
Some devices provide both input and output
• disk, network
A program that controls access to a device is usually called a driver.
Control Unit
Orchestrates execution of the program
Instruction Register (IR) contains the current instruction.
Program Counter (PC) contains the address
of the next instruction to be executed.
Control unit:
• reads an instruction from memory
– the instruction’s address is in the PC
• interprets the instruction, generating signals that tell the other components what to do
– an instruction may take many machine cycles to complete
Internal Processor Design as Related to the von Neumann Model
• Processors are the main functional units of an embedded board, and are primarily
responsible for processing instructions and data.
• An electronic device contains at least one master processor, acting as the central
controlling device, and can have additional slave processors that work with and are
controlled by the master processor.
• These slave processors may either extend the instruction set of the master
processor or act to manage memory, buses, and I/O (input/output) devices.
Powering the Hardware
• Once you’ve soldered in the components needed for the power supply, power
up the board and check that this is operational. Also check that you have power
on every pad on the board where you expect power to be, and check the
ground pads to make sure that there is no power where you expect no power to
be.
• Next, solder in the power-decoupling capacitors for the ICs. Add in the
processor’s oscillator and decoupling capacitors. If the oscillator is a module,
check its operation with an oscilloscope.
• If IC sockets are used, solder these next, then insert the components. If you’re
using processors that need to be externally reprogrammed, then sockets are a
Variations chart
Computer Architecture’s Changing Definition
• 1950s to 1960s:
Computer Architecture Course = Computer Arithmetic
• 1970s to mid 1980s:
Computer Architecture Course = Instruction Set Design, especially ISA appropriate for
compilers
• 1990s:
Computer Architecture Course = Design of CPU, memory system, I/O system,
Multiprocessors
Review
• Amdahl’s Law:
$$\text{Speedup}(E) = \frac{\text{Execution time without enhancement}}{\text{Execution time with enhancement}} = \frac{1}{(1 - F) + F/S}$$
• CPU Time & CPI:
CPU time = Instruction count × CPI × clock cycle time
CPU time = Instruction count × CPI / clock rate
Instruction Set Architecture (ISA)
[Figure: the instruction set drawn as the boundary between software and hardware.]
ISA Level
The ISA level is the interface between the compilers and the hardware.
Overview of the Pentium 4 ISA Level
[Figure: the Pentium 4’s primary registers.]
Instruction Set Architecture (ISA)
( Serves as an interface between software and hardware )
• Provides a mechanism by which the software tells the hardware what should be
done.
[Figure: translation layers from software down to hardware]
software
  High-level language code: C, C++, Java, Fortran
    ↓ compiler
  Assembly language code: architecture-specific statements
    ↓ assembler
  Machine language code: architecture-specific bit patterns
instruction set
hardware
Instruction Set Architecture
• Instruction set architecture is the structure of a computer that a machine language
programmer must understand to write a correct (timing independent) program for
that machine.
• The instruction set architecture is also the machine description that a hardware
designer must understand to design a correct implementation of the computer.
Interface Design
Properties of A good interface:
• Lasts through many implementations (portability, compatibility)
• Is used in many different ways (generality)
• Provides convenient functionality to higher levels
• Permits an efficient implementation at lower levels
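[Figure: a single interface that is used in several ways (use, use, use) and realized by successive implementations (imp 1, imp 2, imp 3) over time.]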
Evolution of Instruction Sets
• Single Accumulator (EDSAC 1950)
• Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
• Separation of Programming Model from Implementation
  – High-level Language Based (B5000 1963)
  – Concept of a Family (IBM 360 1964)
• General Purpose Register Machines
  – Complex Instruction Sets (Vax, Intel 432 1977-80)
  – Load/Store Architecture (CDC 6600, Cray 1 1963-76)
• RISC (Mips, Sparc, HP-PA, IBM RS6000, PowerPC . . . 1987)
• LIW/"EPIC"? (IA-64 . . . 1999)
Evolution of Instruction Sets
• Major advances in computer architecture are typically associated with landmark
instruction set designs
• Design decisions must take into account:
1. technology
2. machine organization
3. programming languages
4. compiler technology
5. operating systems
• And they in turn influence these
What Are the Components of an ISA?
• Sometimes known as The Programmer’s Model of the machine
• Storage cells
– General and special purpose registers in the CPU
– Many general purpose cells of same size in memory
– Storage associated with I/O devices
• The machine instruction set
– The instruction set is the entire repertoire of machine operations
– Makes use of storage cells, formats, and results of the fetch/execute cycle
– i.e., register transfers
• The instruction format
– Size and meaning of fields within the instruction
• The nature of the fetch-execute cycle
– Things that are done before the operation code is known
What Must an Instruction Specify? (I)
Data Flow
• Which operation to perform: add r0, r1, r3
– Op code: add, load, branch, etc.
• Where to find the operands: add r0, r1, r3
– In CPU registers, memory cells, I/O locations, or part of the instruction
• Place to store the result: add r0, r1, r3
– Again a CPU register or memory cell
What Must an Instruction Specify? (II)
• Location of the next instruction:
add r0, r1, r3
br endloop
– Almost always the memory cell pointed to by the program counter (PC)
(Sometimes there is no operand, or no result, or no next instruction.)
ISA, CSCE430/830
Types of Operations
• Arithmetic and Logic: AND, ADD
• Data Transfer: MOVE, LOAD, STORE
• Control: BRANCH, JUMP, CALL
• System: OS CALL, VM
• Floating Point: ADDF, MULF, DIVF
• Decimal: ADDD, CONVERT
• String: MOVE, COMPARE
• Graphics: (DE)COMPRESS
Instructions Can Be Divided into
3 Classes (I)
• Data movement instructions
– Move data from a memory location or register to another memory location or register without
changing its form
– Load—source is memory and destination is register
– Store—source is register and destination is memory
• Arithmetic and logic (ALU) instructions
– Change the form of one or more operands to produce a result stored in another location
– Add, Sub, Shift, etc.
• Branch instructions (control flow instructions)
– Alter the normal flow of control from executing the next instruction in sequence
– Br Loc, Brz Loc2: unconditional or conditional branches
Classifying ISAs
Accumulator (before 1960):
  1 address   add A            acc <- acc + mem[A]
Stack (1960s to 1970s):
  0 address   add              tos <- tos + next
Memory-Memory (1970s to 1980s):
  2 address   add A, B         mem[A] <- mem[A] + mem[B]
  3 address   add A, B, C      mem[A] <- mem[B] + mem[C]
Register-Memory (1970s to present):
  2 address   add R1, A        R1 <- R1 + mem[A]
              load R1, A       R1 <- mem[A]
Register-Register (Load/Store) (1960s to present):
  3 address   add R1, R2, R3   R1 <- R2 + R3
              load R1, R2      R1 <- mem[R2]
              store R1, R2     mem[R1] <- R2
Classifying ISAs
Code Sequence C = A + B for Four Instruction Sets

Stack     Accumulator   Register (register-memory)   Register (load-store)
Push A    Load A        Load R1, A                   Load R1, A
Push B    Add B         Add R1, B                    Load R2, B
Add       Store C       Store C, R1                  Add R3, R1, R2
Pop C                                                Store C, R3

Form of the add, for a generic memory operand C:
• Accumulator: acc = acc + mem[C] (operand from memory)
• Register-memory: R1 = R1 + mem[C] (operand from memory)
• Load-store: R3 = R1 + R2 (register operands)
Stack Architectures
• Instruction set:
add, sub, mult, div, . . .
push A, pop A
• Example: A*B - (A+C*B)

Instruction   Stack after the instruction (top first)
push A        A
push B        B, A
mul           A*B
push A        A, A*B
push C        C, A, A*B
push B        B, C, A, A*B
mul           B*C, A, A*B
add           A+B*C, A*B
sub           result
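The stack sequence for A*B - (A+C*B) can be checked with a small interpreter. This is a minimal sketch under stated assumptions: the function name `run_stack` is hypothetical, variables live in a Python dictionary standing in for memory, and `sub` is assumed to compute (next-to-top) minus (top), which is the reading that makes the slide's sequence produce A*B - (A+C*B).

```python
import operator

# Zero-address arithmetic: both operands are taken implicitly from the stack.
ARITHMETIC = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_stack(program, memory):
    """Run a list of stack-machine instructions and return the final stack."""
    stack = []
    for line in program:
        parts = line.split()
        op = parts[0]
        if op == "push":
            stack.append(memory[parts[1]])   # copy a memory value onto the stack
        elif op == "pop":
            memory[parts[1]] = stack.pop()   # move the top of stack to memory
        else:
            top = stack.pop()
            below = stack.pop()
            # Assumption: the result is (next-to-top) op (top).
            stack.append(ARITHMETIC[op](below, top))
    return stack


program = ["push A", "push B", "mul",
           "push A", "push C", "push B", "mul", "add",
           "sub"]
values = {"A": 2, "B": 3, "C": 4}
print(run_stack(program, values))   # [-8], since 2*3 - (2 + 4*3) = -8
```

Note that no instruction other than push and pop names an operand, which is where the good code density of stack architectures comes from.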
Stacks: Pros and Cons
• Pros
– Good code density (implicit operand addressing top of stack)
– Low hardware requirements
– Easy to write a simpler compiler for stack architectures
• Cons
– Stack becomes the bottleneck
– Little ability for parallelism or pipelining
– Data is not always at the top of the stack when needed, so additional instructions like TOP and SWAP are needed
– Difficult to write an optimizing compiler for stack architectures
Accumulator Architectures
• Instruction set:
add A, sub A, mult A, div A, . . .
load A, store A
• Example: A*B - (A+C*B)

Instruction   Accumulator after the instruction
load B        B
mul C         B*C
add A         A+B*C
store D       A+B*C
load A        A
mul B         A*B
sub D         result
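The accumulator sequence can be traced the same way. The sketch below assumes a single accumulator register and a dictionary standing in for memory; `run_accumulator` is a hypothetical name, and `sub D` is assumed to compute acc - mem[D] by analogy with `add A`.

```python
def run_accumulator(program, memory):
    """Run one-address instructions; every arithmetic operand comes from memory."""
    acc = 0
    for line in program:
        op, name = line.split()
        if op == "load":
            acc = memory[name]
        elif op == "store":
            memory[name] = acc
        elif op == "add":
            acc = acc + memory[name]
        elif op == "sub":
            acc = acc - memory[name]   # assumption: acc <- acc - mem[name]
        elif op == "mul":
            acc = acc * memory[name]
        else:
            raise ValueError(f"unknown instruction: {line}")
    return acc


program = ["load B", "mul C", "add A", "store D",
           "load A", "mul B", "sub D"]
values = {"A": 2, "B": 3, "C": 4}
print(run_accumulator(program, values))   # -8
print(values["D"])                        # 14, the spilled A + B*C
```

Six of the seven instructions touch memory, and the temporary D has to be spilled because there is only one register; that is the high memory traffic listed among the drawbacks.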
Accumulators: Pros and Cons
• Pros
– Very low hardware requirements
– Easy to design and understand
• Cons
– Accumulator becomes the bottleneck
– Little ability for parallelism or pipelining
– High memory traffic
Memory-Memory Architectures
• Instruction set:
(3 operands) add A, B, C sub A, B, C mul A, B, C
• Example: A*B - (A+C*B)
– 3 operands
mul D, A, B
mul E, C, B
add E, A, E
sub E, D, E
Memory-Memory:
Pros and Cons
• Pros
– Requires fewer instructions (especially if 3 operands)
– Easy to write compilers for (especially if 3 operands)
• Cons
– Very high memory traffic (especially if 3 operands)
– Variable number of clocks per instruction (especially if 2 operands)
– With two operands, more data movements are required
Register-Memory Architectures
• Instruction set:
add R1, A sub R1, A mul R1, B
load R1, A store R1, A
• Example: A*B - (A+C*B)
load R1, C
mul R1, B /* C*B */
add R1, A /* A + C*B */
store R1, D
load R2, A
mul R2, B /* A*B */
sub R2, D /* A*B - (A + C*B) */
Register-Memory:
Pros and Cons
• Pros
– Some data can be accessed without loading first
–Instruction format easy to encode
– Good code density
• Cons
– Operands are not equivalent (poor orthogonality)
–Variable number of clocks per instruction
– May limit number of registers
Load-Store Architectures
• Instruction set:
add R1, R2, R3 sub R1, R2, R3 mul R1, R2, R3
load R1, R4 store R1, R4
• Example: A*B - (A+C*B)
load R1, &A
load R2, &B
load R3, &C
load R4, R1
load R5, R2
load R6, R3
mul R7, R6, R5 /* C*B */
add R8, R7, R4 /* A + C*B */
mul R9, R4, R5 /* A*B */
sub R10, R9, R8 /* A*B - (A+C*B) */
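The load-store sequence above can be simulated as well. In this sketch, which is an illustration rather than any real ISA, `&A` is taken to mean "the address of A", `load Rd, Rs` is taken to mean Rd <- mem[Rs], and the address table `ADDRESS` and the function `run_load_store` are hypothetical.

```python
import operator

ADDRESS = {"A": 0, "B": 1, "C": 2}   # hypothetical addresses of the variables
ARITHMETIC = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_load_store(program, memory):
    """Run (op, destination, sources...) tuples; arithmetic uses registers only."""
    regs = {}
    for op, dst, *src in program:
        if op == "load":
            operand = src[0]
            if operand.startswith("&"):
                regs[dst] = ADDRESS[operand[1:]]   # load an address constant
            else:
                regs[dst] = memory[regs[operand]]  # Rd <- mem[Rs]
        else:
            regs[dst] = ARITHMETIC[op](regs[src[0]], regs[src[1]])
    return regs


program = [
    ("load", "R1", "&A"), ("load", "R2", "&B"), ("load", "R3", "&C"),
    ("load", "R4", "R1"), ("load", "R5", "R2"), ("load", "R6", "R3"),
    ("mul", "R7", "R6", "R5"),    # C*B
    ("add", "R8", "R7", "R4"),    # A + C*B
    ("mul", "R9", "R4", "R5"),    # A*B
    ("sub", "R10", "R9", "R8"),   # A*B - (A + C*B)
]
regs = run_load_store(program, memory=[2, 3, 4])   # A=2, B=3, C=4
print(regs["R10"])   # -8
```

The instruction count is the highest of the four styles, but once the operands are loaded, every arithmetic step works on registers only.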
Load-Store:
Pros and Cons
• Pros
– Simple, fixed length instruction encoding
– Instructions take similar number of cycles
– Relatively easy to pipeline
• Cons
–Higher instruction count
– Not all instructions need three operands
– Dependent on good compiler
FLPU = Floating-Point Operations Unit
PFCU = Prefetch Control Unit
AOU = Atomic Operations Unit
MMU = Memory Management Unit
MAR = Memory Address Register
MDR = Memory Data Register
BIU = Bus Interface Unit
ARS = Application Register Set
FRS = File Register Set
SRS = Single Register Set
Processor Performance
Performance: a measure of how fast something works.
Amdahl’s Law
Single Enhancement
F: Fraction enhanced, S: Speedup enhanced
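$$\text{Speedup} = \frac{1}{(1 - F) + \dfrac{F}{S}}$$
[Figure: execution time without E is split into an unaffected part (1 - F) and an affected part F; with E, the unaffected part is still 1 - F and the affected part shrinks to F/S.]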
Ex: Amdahl’s Law (I)
Floating point instructions improved to run 2X;
but only 10% of actual instructions are FP
What is the Speedup?
Ex: Amdahl’s Law (I)
F = 0.1, S = 2
$$\text{Speedup} = \frac{1}{(1 - 0.1) + \dfrac{0.1}{2}} = \frac{1}{0.95} = 1.053$$
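The single-enhancement calculation can be reproduced numerically. A minimal sketch; the function name `amdahl_speedup` is chosen here for illustration.

```python
def amdahl_speedup(fraction: float, speedup: float) -> float:
    """Overall speedup when `fraction` of execution time is sped up by `speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / speedup)


# Floating point runs 2x faster, but only 10% of the time is floating point.
print(round(amdahl_speedup(0.1, 2), 3))   # 1.053
```

Even an unbounded floating-point speedup would be capped at 1 / (1 - 0.1), about 1.11, because the other 90% of the time is untouched.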
Make Common Case Fast
Enhance the parts of the program that are used most often,
so ‘execution time affected by improvement’ is as large as
possible.
Amdahl’s Law (II)
Multiple Enhancements
F1,F2,F3: Fraction enhanced, S1,S2,S3: Speedup enhanced
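$$\text{Speedup} = \frac{1}{\left(1 - \sum_{i=1}^{n} F_i\right) + \sum_{i=1}^{n} \dfrac{F_i}{S_i}}$$
[Figure: execution time without E is split into an unaffected part 1 – (F1+F2+F3) and affected parts F1, F2, F3; with E, the unaffected part is unchanged and each affected part shrinks to Fi/Si.]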
Ex: Amdahl’s Law (II)
Three CPU performance enhancements are proposed, with the following
speedups and percentages of the execution time affected:
1) Percentage F1: 20%, Enhanced Speedup S1: 10
2) Percentage F2: 15%, Enhanced Speedup S2: 15
3) Percentage F3: 10%, Enhanced Speedup S3: 30
Assumption: Each enhancement affects a different portion of the code
and only one enhancement can be used at a time.
What is the Total Speedup?
$$\text{Speedup} = \frac{1}{\left(1 - \sum_{i=1}^{3} F_i\right) + \sum_{i=1}^{3} \dfrac{F_i}{S_i}} = \frac{1}{0.55 + 0.0333} = 1.71$$
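The multiple-enhancement form can be checked the same way. This sketch assumes, as the example does, that each enhancement affects a different portion of the code; `amdahl_speedup_multi` is a hypothetical name.

```python
def amdahl_speedup_multi(enhancements):
    """Overall speedup for non-overlapping (fraction, speedup) pairs."""
    affected = sum(fraction for fraction, _ in enhancements)
    enhanced_time = sum(fraction / speedup for fraction, speedup in enhancements)
    return 1.0 / ((1.0 - affected) + enhanced_time)


# (F1, S1) = (20%, 10), (F2, S2) = (15%, 15), (F3, S3) = (10%, 30)
example = [(0.20, 10), (0.15, 15), (0.10, 30)]
print(round(amdahl_speedup_multi(example), 2))   # 1.71
```

The unaffected 55% of the execution time dominates the denominator, so the total speedup stays far below any of the individual speedups of 10, 15, or 30.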
Execution Time
CPU Time
- doesn’t count I/O or time spent running other programs.
- system CPU time: time spent in the operating system
- user CPU time: time spent in the program
Our Focus
user CPU time: (CPU) Execution Time = IC * CPI * cycle time
Elapsed Time
- Counts everything (disk and memory access, I/O, etc) - A useful number, but often not good
for comparison purposes
Ex: CPU Execution time
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
A program is running on a RISC machine with the following characteristics:
- 40,000,000 instructions
- 6 cycles/instruction
- 1 GHz clock rate
What is the CPU execution time for this program?
CPU Exec. Time = IC * CPI * Clock cycle time
$= 40000000 \times 6 \times 1 \times 10^{-9}$
= ?? seconds
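One way to work the exercise is to evaluate the product directly. A minimal sketch, assuming a 1 GHz clock means a cycle time of 1 ns; the function name `cpu_time` is chosen for illustration.

```python
def cpu_time(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds: IC x CPI x cycle time, with cycle time = 1 / clock rate."""
    return instruction_count * cpi / clock_rate_hz


seconds = cpu_time(instruction_count=40_000_000, cpi=6, clock_rate_hz=1e9)
print(f"{seconds:.2f} s")   # 0.24 s
```

That is 240 million cycles at one nanosecond each.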
Ex: Performance
A program is running on a RISC machine with the following characteristics:
- 20,000,000 instructions
- 5 cycles/instruction
- 1 GHz Clock rate
Using the same program with a new compiler:
-5,000,000 instructions
-2 cycles/instruction
-1 GHz Clock rate
What is the speedup with the changes?
Speedup = old execution time / new execution time
= X / Y
= Z (times faster after change)
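The speedup from the new compiler can be worked out by computing both execution times and dividing. A sketch under the same assumption as the exercise (only instruction count and CPI change, the clock stays at 1 GHz); the names are illustrative.

```python
def execution_time(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """Execution time in seconds for a given instruction count, CPI, and clock rate."""
    return instruction_count * cpi / clock_rate_hz


old_time = execution_time(20_000_000, 5, 1e9)   # 0.1 s
new_time = execution_time(5_000_000, 2, 1e9)    # 0.01 s
speedup = old_time / new_time
print(f"old = {old_time:.2f} s, new = {new_time:.2f} s, speedup = {speedup:.1f}x")
# old = 0.10 s, new = 0.01 s, speedup = 10.0x
```

The factor of 10 is the product of a 4x reduction in instruction count and a 2.5x reduction in CPI.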
Ex: Instruction Classes & CPI
Compute the CPU clock cycles and average CPI
for the following program:
Inst. type      ICi   CPIi
ALU             20    4
Data transfer   20    5
Control         10    3
(Sol) CPU clock cycles = 20*4 + 20*5 + 10*3 = X
Average CPI = X/50 = Y
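The totals X and Y in the solution can be computed from the table. A minimal sketch; the dictionary layout and the function name `cycles_and_average_cpi` are assumptions made for this example.

```python
def cycles_and_average_cpi(classes):
    """Return (total clock cycles, average CPI) for {name: (count, cpi)} entries."""
    total_cycles = sum(count * cpi for count, cpi in classes.values())
    total_instructions = sum(count for count, _ in classes.values())
    return total_cycles, total_cycles / total_instructions


program_mix = {
    "ALU": (20, 4),
    "Data transfer": (20, 5),
    "Control": (10, 3),
}
cycles, average_cpi = cycles_and_average_cpi(program_mix)
print(cycles, average_cpi)   # 210 4.2
```

So 50 instructions take 210 clock cycles, an average of 4.2 cycles per instruction.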
Ex: CPI and Instruction FREQi
Compute the average (effective) CPI for the following instruction mix:
Inst. type      CPIi   FREQi
ALU             3      40% (0.4)
Data transfer   4      40% (0.4)
Control         2      20% (0.2)
(Sol) Average (Effective) CPI = 3*0.4 + 4*0.4 + 2*0.2 = XX
Ex: Peak CPI
Compute the Peak CPI for the following instruction mix:
Inst. type      CPIi   FREQi → peak FREQi
ALU             3      40% → 0%
Data transfer   4      40% → 0%
Control         2      20% → 100%
(Sol) Peak CPI = 3*0.0 + 4*0.0 + 2*1.0 = 2.0
Ex: Average CPI and Average MIPS
Compute the average (effective) CPI for the following instruction mix:
Inst. type      CPIi   FREQi
ALU             3      40% (0.4)
Data transfer   4      40% (0.4)
Control         2      20% (0.2)
(Sol) Average (Effective) CPI = 3*0.4 + 4*0.4 + 2*0.2 = 3.2
If the processor is Pentium II (320MHz), what is the MIPS rate?
$$\text{MIPS} = \frac{\text{ClockRate}}{\text{CPI} \times 10^{6}} = \frac{320 \times 10^{6}}{3.2 \times 10^{6}} = 100$$
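The effective CPI and the MIPS rate follow directly from the frequency table. A minimal sketch, with function names chosen for illustration; the 320 MHz clock is the figure used in the example.

```python
def effective_cpi(mix):
    """Frequency-weighted CPI for a list of (cpi, frequency) pairs."""
    return sum(cpi * frequency for cpi, frequency in mix)


def mips(clock_rate_hz: float, cpi: float) -> float:
    """Millions of instructions per second: clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


mix = [(3, 0.4), (4, 0.4), (2, 0.2)]   # ALU, data transfer, control
average = effective_cpi(mix)
print(round(average, 2))               # 3.2
print(round(mips(320e6, average), 1))  # 100.0
```

Passing the mix `[(3, 0.0), (4, 0.0), (2, 1.0)]` instead gives the peak CPI of 2.0, which corresponds to 160 MIPS at the same clock rate.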
Ex: Peak CPI and Peak MIPS
Compute the Peak CPI for the following instruction mix:
Inst. type      CPIi   FREQi → peak FREQi
ALU             3      40% → 0%
Data transfer   4      40% → 0%
Control         2      20% → 100%
(Sol) Peak CPI = 3*0.0 + 4*0.0 + 2*1.0 = 2.0
If the processor is Pentium II (320MHz), what is the peak MIPS rate?
$$\text{MIPS} = \frac{\text{ClockRate}}{\text{CPI} \times 10^{6}} = \frac{320 \times 10^{6}}{2 \times 10^{6}} = 160$$
Benchmarking
Multicore Benchmarking Rules
• Do not rely on a single answer
• Match your application requirements
– Small or large data sets
– Few or many threads
– Dependencies
– OS overhead
Two Processor System Utilizing Single Memory Controller
[Figure: Quad Core Processor 1 and Quad Core Processor 2 connect over the Intel Front Side Bus to the North Bridge, which provides the DDR2 interface.]
Processors 1 and 2 must always arbitrate for memory via their front side bus connection through the North Bridge.
Two Processor System Utilizing Dual Memory Controllers
[Figure: Quad Core Processor 1 and Quad Core Processor 2 are joined by a link, and each has its own DDR2 interface. Legend: direct access, shared access, doubly shared access.]
Two Processor System Utilizing Single Memory Controller
[Figure: Quad Core Processor 1 and Quad Core Processor 2 are joined by a link; the single DDR2 interface is attached to Processor 2.]
• Processor 1 must always access memory by traversing link to Processor 2
• Requires arbitration to access Processor 2’s memory
• Processor 2 always has prioritized access to this memory since it is directly attached.
• Affinity can help performance
Benchmarking Multicore – What’s Important?
• Measuring scalability
• Memory and I/O bandwidth
• Inter-core communications
• OS scheduling support
• Efficiency of synchronization
• System-level functionality
The Multicore benchmarking Roadmap
1. Communications
- MCAPI: ultra-lightweight
2. Resource Management
- Memory management
- Basic synchronization
- Resource registration
- Resource partitioning
3. Task Management
- Task scheduling
4. Debug
The Four MC Pillars
[Figure: the multicore system stack, showing virtualization (or OS); the four pillars of communication, resource management, task management, and debug; adopted standards; and the MCA Foundation APIs.]
Value Added Functions
• Languages
• Programming Models
• Design Environments
• Application Generators
• Benchmarks
Services
• Load Balancing
• System Mgt.
• Power Mgt.
• Reliability
• Quality of Service
More Related Content

PDF
Introduction to ARM LPC2148
PPT
Embedded system
PPTX
Ec8791 arm 9 processor
PPTX
PIC 16F877 micro controller by Gaurav raikar
PDF
8085 arithmetic instructions
PDF
Microcontroller basics
PDF
Device drivers and interrupt service mechanism
Introduction to ARM LPC2148
Embedded system
Ec8791 arm 9 processor
PIC 16F877 micro controller by Gaurav raikar
8085 arithmetic instructions
Microcontroller basics
Device drivers and interrupt service mechanism

What's hot (20)

PPTX
Embedded system.ppt
PPTX
ARM Processor
PPTX
Verilog Tutorial - Verilog HDL Tutorial with Examples
PPT
Unit 5 I/O organization
PPTX
Introduction to arm processor
PPTX
Introduction to Arduino.pptx
PPT
Embedded System-design technology
PPT
The ARM Architecture: ARM : ARM Architecture
PDF
ARM Architecture
PDF
Communication Protocols (UART, SPI,I2C)
PPTX
Computer architecture the pentium architecture
PPTX
Timing and control
PPT
PIC 16F877A by PARTHIBAN. S.
PPTX
DC Motor Direction Control Using 8051 C Program
PPTX
Interrupts in pic
PPTX
source code metrics and other maintenance tools and techniques
PDF
VTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERS
PPTX
Instruction Set Architecture
PPTX
INTRODUCTION TO MICROCONTROLLER
PPTX
Input output interface
Embedded system.ppt
ARM Processor
Verilog Tutorial - Verilog HDL Tutorial with Examples
Unit 5 I/O organization
Introduction to arm processor
Introduction to Arduino.pptx
Embedded System-design technology
The ARM Architecture: ARM : ARM Architecture
ARM Architecture
Communication Protocols (UART, SPI,I2C)
Computer architecture the pentium architecture
Timing and control
PIC 16F877A by PARTHIBAN. S.
DC Motor Direction Control Using 8051 C Program
Interrupts in pic
source code metrics and other maintenance tools and techniques
VTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERS
Instruction Set Architecture
INTRODUCTION TO MICROCONTROLLER
Input output interface
Ad

Similar to isa architecture (20)

PPT
Patt patelch04
PPTX
digital logic circuits, digital component
PPT
Unit-3 Von Neumann Architecture.ppt
PPT
Von neuman architecture
PPT
Chapter 4-The Von Neumann Model-PattPatel.ppt
PDF
APznzaboj9CF_9DQRT2HR-lWEYeLjr197Vw_ZUktUfDvP5Qqd8SL2ZSNwpIwVoC6MN9lqvglTXM11...
PPTX
B.sc cs-ii -u-1.2 digital logic circuits, digital component
PDF
chapter 1 of computers organization .pdf
PPTX
Presentation1.pptx
PPTX
UNIT 3 - General Purpose Processors
PPTX
Bca 2nd sem-u-1.2 digital logic circuits, digital component
PPTX
Computer Organisation & Architecture (chapter 1)
PPTX
instruction
PPTX
Unit 1 Presentation and notes with according to syllabus
PPT
comp. org Chapter 1
PPT
PattPatelCh04.ppt
PPTX
Computer organization unit 2detaila with analysis
PDF
computer organization and architecturebec306c
PDF
Computer organization basics
PPTX
computer function and interconnection.pptx
Patt patelch04
digital logic circuits, digital component
Unit-3 Von Neumann Architecture.ppt
Von neuman architecture
Chapter 4-The Von Neumann Model-PattPatel.ppt
APznzaboj9CF_9DQRT2HR-lWEYeLjr197Vw_ZUktUfDvP5Qqd8SL2ZSNwpIwVoC6MN9lqvglTXM11...
B.sc cs-ii -u-1.2 digital logic circuits, digital component
chapter 1 of computers organization .pdf
Presentation1.pptx
UNIT 3 - General Purpose Processors
Bca 2nd sem-u-1.2 digital logic circuits, digital component
Computer Organisation & Architecture (chapter 1)
instruction
Unit 1 Presentation and notes with according to syllabus
comp. org Chapter 1
PattPatelCh04.ppt
Computer organization unit 2detaila with analysis
computer organization and architecturebec306c
Computer organization basics
computer function and interconnection.pptx
Ad

More from AJAL A J (20)

PDF
KEAM KERALA ENTRANCE EXAM
PDF
Paleontology Career
PPT
CHEMISTRY basic concepts of chemistry
PPT
Ecology
PPT
Biogeochemical cycles
PDF
ac dc bridges
PDF
Hays bridge schering bridge wien bridge
PPT
App Naming Tip
PDF
flora and fauna of himachal pradesh and kerala
PDF
B.Sc Cardiovascular Technology(CVT)
PDF
11 business strategies to make profit
PDF
PCOS Polycystic Ovary Syndrome
PDF
Courses and Career Options after Class 12 in Humanities
PPT
MANAGEMENT Stories
PDF
NEET PREPRATION TIPS AND STRATEGY
PDF
REVOLUTIONS IN AGRICULTURE
PDF
NRI QUOTA IN NIT'S
PDF
Subjects to study if you want to work for a charity
PDF
IIT JEE A KERALA PERSPECTIVE
PDF
Clat 2020 exam COMPLETE DETAILS
KEAM KERALA ENTRANCE EXAM
Paleontology Career
CHEMISTRY basic concepts of chemistry
Ecology
Biogeochemical cycles
ac dc bridges
Hays bridge schering bridge wien bridge
App Naming Tip
flora and fauna of himachal pradesh and kerala
B.Sc Cardiovascular Technology(CVT)
11 business strategies to make profit
PCOS Polycystic Ovary Syndrome
Courses and Career Options after Class 12 in Humanities
MANAGEMENT Stories
NEET PREPRATION TIPS AND STRATEGY
REVOLUTIONS IN AGRICULTURE
NRI QUOTA IN NIT'S
Subjects to study if you want to work for a charity
IIT JEE A KERALA PERSPECTIVE
Clat 2020 exam COMPLETE DETAILS

Recently uploaded (20)

PPTX
CH1 Production IntroductoryConcepts.pptx
PPTX
Geodesy 1.pptx...............................................
PPT
Mechanical Engineering MATERIALS Selection
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PPTX
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
DOCX
573137875-Attendance-Management-System-original
PDF
Well-logging-methods_new................
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPTX
Lesson 3_Tessellation.pptx finite Mathematics
PDF
composite construction of structures.pdf
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PPTX
additive manufacturing of ss316l using mig welding
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PDF
Structs to JSON How Go Powers REST APIs.pdf
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
CH1 Production IntroductoryConcepts.pptx
Geodesy 1.pptx...............................................
Mechanical Engineering MATERIALS Selection
CYBER-CRIMES AND SECURITY A guide to understanding
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
573137875-Attendance-Management-System-original
Well-logging-methods_new................
Embodied AI: Ushering in the Next Era of Intelligent Systems
Lesson 3_Tessellation.pptx finite Mathematics
composite construction of structures.pdf
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
Foundation to blockchain - A guide to Blockchain Tech
additive manufacturing of ss316l using mig welding
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
Structs to JSON How Go Powers REST APIs.pdf
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx

isa architecture

  • 1. The Embedded Hardware Architecture @ ISA MODELS
  • 3. Embedded Hardware: Building Blocks and the Embedded Board In This Chapter • Introducing the importance of being able to read a schematic diagram • Discussing the major components of an embedded board • Introducing the factors that allow an embedded device to work • Discussing the fundamental elements of electronic components
  • 4. Learn to Read a Schematic Blueprint Reading
  • 6. ALPHABET OF LINES •Universal language for designers, engineers, & production personnel. •Uses lines, numbers, symbols and illustrations. Different Blueprint Forms: •Drawings for fabrication (Standardized symbols for mechanical, welding, construction, electrical wiring and assembly). •Sketches (Illustrate an idea, technical principle or function). Lines are made in definite standard forms: (all have specific meaning) • Thickness of a line (thick or thin) • Solid • Broken • Dashed
  • 8. 4-8 Von Neumann Model M E M O R Y C O N T R O L U N I T M A R M D R I R P R O C E S S I N G U N I T A L U T E M P P C O U T P U T M o n i t o r P r i n t e r L E D D i s k I N P U T K e y b o a r d M o u s e S c a n n e r D i s k
  • 9. Discussion of the internal processor design as related to the von Neumann model
  • 10. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 4-10 Memory 2k x m array of stored bits Address • unique (k-bit) identifier of location Contents • m-bit value stored in location Basic Operations: LOAD • read a value from a memory location STORE • write a value to a memory location • • • 0000 0001 0010 0011 0100 0101 0110 1101 1110 1111 00101101 10100010
  • 11. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 4-11 Interface to Memory How does processing unit get data to/from memory? MAR: Memory Address Register MDR: Memory Data Register To LOAD a location (A): 1. Write the address (A) into the MAR. 2. Send a “read” signal to the memory. 3. Read the data from MDR. To STORE a value (X) to a location (A): 1. Write the data (X) to the MDR. 2. Write the address (A) into the MAR. 3. Send a “write” signal to the memory. M E M O R Y M A R M D R
  • 12. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 4-12 Processing Unit Functional Units • ALU = Arithmetic and Logic Unit • could have many functional units. some of them special-purpose (multiply, square root, …) • LC-3 performs ADD, AND, NOT Registers • Small, temporary storage • Operands and results of functional units • LC-3 has eight registers (R0, …, R7), each 16 bits wide Word Size • number of bits normally processed by ALU in one instruction • also width of registers • LC-3 is 16 bits P R O C E S S I N G U N I T A L U T E M P
  • 13. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 4-13 Input and Output Devices for getting data into and out of computer memory Each device has its own interface, usually a set of registers like the memory’s MAR and MDR • LC-3 supports keyboard (input) and monitor (output) • keyboard: data register (KBDR) and status register (KBSR) • monitor: data register (DDR) and status register (DSR) Some devices provide both input and output • disk, network Program that controls access to a device is usually called a driver. I N P U T K e y b o a r d M o u s e S c a n n e r D i s k O U T P U T M o n i t o r P r i n t e r L E D D i s k
  • 14. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 4-14 Control Unit Orchestrates execution of the program Instruction Register (IR) contains the current instruction. Program Counter (PC) contains the address of the next instruction to be executed. Control unit: • reads an instruction from memory  the instruction’s address is in the PC • interprets the instruction, generating signals that tell the other components what to do  an instruction may take many machine cycles to complete C O N T R O L U N I T I RP C
  • 15. internal processor design as related to the von Neumann model • Processors are the main functional units of an embedded board, and are primarily responsible for processing instructions and data. • An electronic device contains at least one master processor, acting as the central controlling device, and can have additional slave processors that work with and are controlled by the master processor. • These slave processors may either extend the instruction set of the master processor or act to manage memory, buses, and I/O (input/output) devices.
  • 16. Powering the Hardware • Once you’ve soldered in the components needed for the power supply, power up the board and check that this is operational. Also check that you have power on every pad on the board where you expect power to be, and check the ground pads to make sure that there is no power where you expect no power to be. • Next, solder in the power-decoupling capacitors for the ICs. Add in the processor’s oscillator and decoupling capacitors. If the oscillator is a module, check its operation with an oscilloscope. • If IC sockets are used, solder these next, then insert the components. If you’re using processors that need to be externally reprogrammed, then sockets are a
  • 20. Computer Architecture’s Changing Definition • 1950s to 1960s: Computer Architecture Course = Computer Arithmetic • 1970s to mid 1980s: Computer Architecture Course = Instruction Set Design, especially ISA appropriate for compilers • 1990s: Computer Architecture Course = Design of CPU, memory system, I/O system, Multiprocessors
  • 21. Review • Amdahl's Law: $\text{Speedup}(E) = \dfrac{\text{Execution time without enhancement}}{\text{Execution time with enhancement}} = \dfrac{1}{(1 - F) + F/S}$ • CPU Time & CPI: $\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{clock cycle time}$, or equivalently $\text{CPU time} = \text{Instruction count} \times \text{CPI} / \text{clock rate}$
  • 22. Instruction Set Architecture (ISA) [Figure: the instruction set as the layer between software and hardware]
  • 23. ISA Level The ISA level is the interface between the compilers and the hardware.
  • 24. Overview of the Pentium 4 ISA Level The Pentium 4’s primary registers.
  • 25. Instruction Set Architecture (ISA) (Serves as an interface between software and hardware) • Provides a mechanism by which the software tells the hardware what should be done. • Software: high-level language code (C, C++, Java, Fortran) is translated by the compiler into assembly language code (architecture-specific statements), which the assembler translates into machine language code (architecture-specific bit patterns). • The instruction set sits between the software and the hardware.
  • 26. Instruction Set Architecture • Instruction set architecture is the structure of a computer that a machine language programmer must understand to write a correct (timing independent) program for that machine. • The instruction set architecture is also the machine description that a hardware designer must understand to design a correct implementation of the computer.
  • 27. Interface Design • Properties of a good interface: – Lasts through many implementations (portability, compatibility) – Is used in many different ways (generality) – Provides convenient functionality to higher levels – Permits an efficient implementation at lower levels [Figure: a single interface with multiple uses and successive implementations (imp 1, imp 2, imp 3) over time]
  • 28. Evolution of Instruction Sets • Single Accumulator (EDSAC 1950) • Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953) • Separation of Programming Model from Implementation – High-level Language Based (B5000 1963) – Concept of a Family (IBM 360 1964) • General Purpose Register Machines – Complex Instruction Sets (Vax, Intel 432 1977-80) – Load/Store Architecture (CDC 6600, Cray 1 1963-76) • RISC (Mips, Sparc, HP-PA, IBM RS6000, PowerPC . . . 1987) • LIW/"EPIC"? (IA-64 . . . 1999)
  • 29. Evolution of Instruction Sets • Major advances in computer architecture are typically associated with landmark instruction set designs • Design decisions must take into account: 1. technology 2. machine organization 3. programming languages 4. compiler technology 5. operating systems • And they in turn influence these
  • 30. What Are the Components of an ISA? • Sometimes known as The Programmer’s Model of the machine • Storage cells – General and special purpose registers in the CPU – Many general purpose cells of same size in memory – Storage associated with I/O devices • The machine instruction set – The instruction set is the entire repertoire of machine operations – Makes use of storage cells, formats, and results of the fetch/execute cycle – i.e., register transfers
  • 31. What Are the Components of an ISA? 1. The instruction format: size and meaning of fields within the instruction 2. The nature of the fetch-execute cycle – Things that are done before the operation code is known
  • 32. What Must an Instruction Specify? (I) Data Flow • Which operation to perform: add r0, r1, r3 – Op code: add, load, branch, etc. • Where to find the operands: add r0, r1, r3 – In CPU registers, memory cells, I/O locations, or part of the instruction • Place to store the result: add r0, r1, r3 – Again a CPU register or memory cell
  • 33. What Must an Instruction Specify? (II) • Location of next instruction: add r0, r1, r3; br endloop – Almost always the memory cell pointed to by the program counter (PC) • Sometimes there is no operand, or no result, or no next instruction.
  • 34. Types of Operations • Arithmetic and Logic: AND, ADD • Data Transfer: MOVE, LOAD, STORE • Control: BRANCH, JUMP, CALL • System: OS CALL, VM • Floating Point: ADDF, MULF, DIVF • Decimal: ADDD, CONVERT • String: MOVE, COMPARE • Graphics: (DE)COMPRESS [Slide footer: ISA, CSCE430/830]
  • 35. Instructions Can Be Divided into 3 Classes (I) • Data movement instructions – Move data from a memory location or register to another memory location or register without changing its form – Load: source is memory and destination is register – Store: source is register and destination is memory • Arithmetic and logic (ALU) instructions – Change the form of one or more operands to produce a result stored in another location – Add, Sub, Shift, etc. • Branch instructions (control flow instructions) – Alter the normal flow of control from executing the next instruction in sequence – Br Loc, Brz Loc2: unconditional or conditional branches
  • 36. Classifying ISAs • Accumulator (before 1960), 1 address: add A (acc <- acc + mem[A]) • Stack (1960s to 1970s), 0 address: add (tos <- tos + next) • Memory-Memory (1970s to 1980s), 2 address: add A, B (mem[A] <- mem[A] + mem[B]); 3 address: add A, B, C (mem[A] <- mem[B] + mem[C]) • Register-Memory (1970s to present), 2 address: add R1, A (R1 <- R1 + mem[A]); load R1, A (R1 <- mem[A]) • Register-Register (Load/Store) (1960s to present), 3 address: add R1, R2, R3 (R1 <- R2 + R3); load R1, R2 (R1 <- mem[R2]); store R1, R2 (mem[R1] <- R2)
  • 38. Code Sequence C = A + B for Four Instruction Sets • Stack: Push A; Push B; Add; Pop C • Accumulator: Load A; Add B; Store C (Add B: acc = acc + mem[B]) • Register (register-memory): Load R1, A; Add R1, B; Store C, R1 (Add R1, B: R1 = R1 + mem[B]) • Register (load-store): Load R1, A; Load R2, B; Add R3, R1, R2; Store C, R3 (Add R3, R1, R2: R3 = R1 + R2)
  • 39. Stack Architectures • Instruction set: add, sub, mult, div, . . . push A, pop A • Example: A*B - (A+C*B) – Code: push A; push B; mul; push A; push C; push B; mul; add; sub – Stack contents after each instruction (bottom to top): [A]; [A, B]; [A*B]; [A*B, A]; [A*B, A, C]; [A*B, A, C, B]; [A*B, A, B*C]; [A*B, A+B*C]; [result]
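A minimal sketch of a zero-address stack machine running the slide's code for A*B - (A+C*B). The textual instruction format, the `run_stack` helper and the sample values for A, B and C are assumptions for illustration only.

```python
# Minimal zero-address stack machine. Operands are implicit: ALU
# instructions pop the top two entries and push the result.

def run_stack(program, memory):
    stack = []
    for instr in program:
        op, *args = instr.split()
        if op == "push":
            stack.append(memory[args[0]])   # push mem[X] onto the stack
        elif op == "pop":
            memory[args[0]] = stack.pop()   # pop top of stack into mem[X]
        else:
            right = stack.pop()             # top of stack
            left = stack.pop()              # next entry down
            if op == "add":
                stack.append(left + right)
            elif op == "sub":
                stack.append(left - right)
            elif op == "mul":
                stack.append(left * right)
            else:
                raise ValueError(f"unknown instruction: {instr}")
    return stack

PROGRAM = ["push A", "push B", "mul",
           "push A", "push C", "push B", "mul",
           "add", "sub"]

# Sample values chosen for the example: A*B - (A + C*B) = 6 - 14
final_stack = run_stack(PROGRAM, {"A": 2, "B": 3, "C": 4})
```

None of the ALU instructions names an operand, which is where the good code density comes from; the cost is that every intermediate value has to pass through the top of the stack.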
  • 40. Stacks: Pros and Cons • Pros – Good code density (implicit operand addressing top of stack) – Low hardware requirements – Easy to write a simpler compiler for stack architectures • Cons – Stack becomes the bottleneck – Little ability for parallelism or pipelining – Data is not always at the top of the stack when needed, so additional instructions like TOP and SWAP are needed – Difficult to write an optimizing compiler for stack architectures
  • 41. Accumulator Architectures • Instruction set: add A, sub A, mult A, div A, . . . load A, store A • Example: A*B - (A+C*B) – Code: load B; mul C; add A; store D; load A; mul B; sub D – Accumulator contents after each instruction: B; B*C; A+B*C; A+B*C (also stored in D); A; A*B; result
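The accumulator sequence above can be traced with a short sketch that also counts memory accesses, which makes the "high memory traffic" drawback visible. The `run_accumulator` helper, the instruction strings and the sample values are assumptions for the example.

```python
# One-address accumulator machine: every instruction names a single
# memory operand; the accumulator is the implicit second operand and
# the implicit destination.

def run_accumulator(program, memory):
    acc = 0
    accesses = 0
    for instr in program:
        op, addr = instr.split()
        accesses += 1                 # each instruction touches memory once
        if op == "load":
            acc = memory[addr]
        elif op == "store":
            memory[addr] = acc
        elif op == "add":
            acc += memory[addr]
        elif op == "sub":
            acc -= memory[addr]
        elif op == "mul":
            acc *= memory[addr]
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return acc, accesses

PROGRAM = ["load B", "mul C", "add A", "store D",
           "load A", "mul B", "sub D"]

memory = {"A": 2, "B": 3, "C": 4, "D": 0}   # sample values for illustration
value, memory_accesses = run_accumulator(PROGRAM, memory)
```

Seven instructions produce seven data memory accesses, including a spill of the intermediate A+B*C to D because there is only one register to hold it.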
  • 42. Accumulators: Pros and Cons • Pros – Very low hardware requirements – Easy to design and understand • Cons – Accumulator becomes the bottleneck – Little ability for parallelism or pipelining – High memory traffic
  • 43. Memory-Memory Architectures • Instruction set (3 operands): add A, B, C; sub A, B, C; mul A, B, C • Example: A*B - (A+C*B) – 3 operands: mul D, A, B; mul E, C, B; add E, A, E; sub E, D, E
  • 44. Memory-Memory: Pros and Cons • Pros – Requires fewer instructions (especially if 3 operands) – Easy to write compilers for (especially if 3 operands) • Cons – Very high memory traffic (especially if 3 operands) – Variable number of clocks per instruction (especially if 2 operands) – With two operands, more data movements are required
  • 45. Register-Memory Architectures • Instruction set: add R1, A; sub R1, A; mul R1, B; load R1, A; store R1, A • Example: A*B - (A+C*B) – load R1, C; mul R1, B /* C*B */; add R1, A /* A + C*B */; store R1, D; load R2, A; mul R2, B /* A*B */; sub R2, D /* A*B - (A + C*B) */
  • 46. Register-Memory: Pros and Cons • Pros – Some data can be accessed without loading first – Instruction format easy to encode – Good code density • Cons – Operands are not equivalent (poor orthogonality) – Variable number of clocks per instruction – May limit number of registers
  • 47. Load-Store Architectures • Instruction set: add R1, R2, R3; sub R1, R2, R3; mul R1, R2, R3; load R1, R4; store R1, R4 • Example: A*B - (A+C*B) – load R1, &A; load R2, &B; load R3, &C; load R4, R1; load R5, R2; load R6, R3; mul R7, R6, R5 /* C*B */; add R8, R7, R4 /* A + C*B */; mul R9, R4, R5 /* A*B */; sub R10, R9, R8 /* A*B - (A+C*B) */
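A hedged sketch of how the load-store example executes. It models only the instructions the example uses; the `run_load_store` helper, the symbol table mapping variable names to addresses, and the sample memory contents are assumptions, not part of the slides.

```python
import operator

# Three-address ALU operations work on registers only.
ALU_OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def run_load_store(program, memory, symbols):
    regs = {}
    for instr in program:
        op, rest = instr.split(maxsplit=1)
        args = [a.strip() for a in rest.split(",")]
        if op == "load":
            dst, src = args
            if src.startswith("&"):
                regs[dst] = symbols[src[1:]]    # load the address of a variable
            else:
                regs[dst] = memory[regs[src]]   # R_dst <- mem[R_src]
        elif op in ALU_OPS:
            dst, a, b = args
            regs[dst] = ALU_OPS[op](regs[a], regs[b])
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return regs

PROGRAM = [
    "load R1, &A", "load R2, &B", "load R3, &C",
    "load R4, R1", "load R5, R2", "load R6, R3",
    "mul R7, R6, R5",    # C*B
    "add R8, R7, R4",    # A + C*B
    "mul R9, R4, R5",    # A*B
    "sub R10, R9, R8",   # A*B - (A + C*B)
]

memory = [2, 3, 4]                    # sample values: A=2, B=3, C=4
symbols = {"A": 0, "B": 1, "C": 2}    # hypothetical address of each variable
regs = run_load_store(PROGRAM, memory, symbols)
```

After the six loads, the four arithmetic instructions never touch memory, which is the trade this style makes: more instructions overall, but simple register-only ALU operations that are easier to pipeline.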
  • 48. Load-Store: Pros and Cons • Pros – Simple, fixed length instruction encoding – Instructions take similar number of cycles – Relatively easy to pipeline • Cons –Higher instruction count – Not all instructions need three operands – Dependent on good compiler
  • 49. • FLPU = Floating-Point Operations Unit • PFCU = Prefetch Control Unit • AOU = Atomic Operations Unit • MMU = Memory-Management Unit • MAR = Memory Address Register • MDR = Memory Data Register • BIU = Bus Interface Unit • ARS = Application Register Set • FRS = File Register Set • SRS = Single Register Set
  • 50. Processor Performance • Performance: a measure of how fast something works.
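  • 51. Amdahl's Law: Single Enhancement • F: fraction enhanced, S: speedup of the enhanced fraction • $\text{Speedup} = \dfrac{1}{(1 - F) + F/S}$ [Figure: execution time without E split into an unaffected part (1 - F) and an affected part F; with E, the unaffected part remains 1 - F and the affected part shrinks to F/S]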
  • 52. Ex: Amdahl’s Law (I) Floating point instructions improved to run 2X; but only 10% of actual instructions are FP What is the Speedup?
  • 53. Ex: Amdahl's Law (I) • Floating point instructions improved to run 2X; but only 10% of actual instructions are FP. What is the speedup? • F = 0.1, S = 2: $\text{Speedup} = \dfrac{1}{(1 - 0.1) + 0.1/2} = \dfrac{1}{0.95} = 1.053$ • Make the common case fast: enhance the parts of the program that are used most often, so the 'execution time affected by improvement' is as large as possible.
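The single-enhancement formula is short enough to check numerically. The function name `amdahl_speedup` is chosen for this sketch; the inputs are the slide's F = 0.1 and S = 2.

```python
def amdahl_speedup(fraction, speedup):
    """Overall speedup when `fraction` of execution time is sped up by `speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / speedup)

fp_example = amdahl_speedup(0.1, 2)       # the slide's case, about 1.053

# Even an arbitrarily fast FP unit is capped by the untouched 90%:
upper_bound = amdahl_speedup(0.1, 1e12)   # approaches 1 / 0.9, about 1.11
```

The second call illustrates why the slide says to make the common case fast: with F fixed at 10%, no value of S can push the overall speedup past roughly 1.11.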
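  • 54. Amdahl's Law (II): Multiple Enhancements • F1, F2, F3: fraction enhanced; S1, S2, S3: speedup enhanced • $\text{Speedup} = \dfrac{1}{\left(1 - \sum_{i=1}^{n} F_i\right) + \sum_{i=1}^{n} \dfrac{F_i}{S_i}}$ [Figure: execution time without E split into an unaffected part 1 - (F1+F2+F3) and affected parts F1, F2, F3; with E, the unaffected part remains and each affected part shrinks to Fi/Si]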
  • 55. Ex: Amdahl's Law (II) • Three CPU performance enhancements with the following speedups and percentages of the execution time: 1) Percentage F1: 20%, Enhanced Speedup S1: 10 2) Percentage F2: 15%, Enhanced Speedup S2: 15 3) Percentage F3: 10%, Enhanced Speedup S3: 30 • Assumption: each enhancement affects a different portion of the code and only one enhancement can be used at a time. • What is the total speedup? $\text{Speedup} = \dfrac{1}{\left(1 - \sum_{i=1}^{3} F_i\right) + \sum_{i=1}^{3} \dfrac{F_i}{S_i}} = \dfrac{1}{0.55 + 0.0333} = 1.71$
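The multiple-enhancement form generalizes to any number of disjoint fractions. A small sketch, with `amdahl_multi` as an illustrative name and the slide's three (F, S) pairs as input:

```python
def amdahl_multi(enhancements):
    """Speedup for disjoint enhancements given as (fraction, speedup) pairs."""
    total_fraction = sum(f for f, _ in enhancements)
    if total_fraction > 1.0 + 1e-12:
        raise ValueError("enhanced fractions cannot exceed 100% of execution time")
    unaffected = 1.0 - total_fraction
    affected = sum(f / s for f, s in enhancements)
    return 1.0 / (unaffected + affected)

# F1=20% at 10x, F2=15% at 15x, F3=10% at 30x
total_speedup = amdahl_multi([(0.20, 10), (0.15, 15), (0.10, 30)])
```

The unaffected 55% dominates the denominator (0.55 against roughly 0.033), so the result stays near 1.71 despite the large individual speedups.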
  • 56. Execution Time • CPU Time – doesn't count I/O or time spent running other programs – system CPU time: spent in the operating system – user CPU time: spent in the program • Our focus: user CPU time, (CPU) Execution Time = IC * CPI * cycle time • Elapsed Time – counts everything (disk and memory access, I/O, etc.) – a useful number, but often not good for comparison purposes
  • 57. Ex: CPU Execution time • $\text{CPU time} = \dfrac{\text{Seconds}}{\text{Program}} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Cycle}}$ • A program is running on a RISC machine with the following characteristics: – 40,000,000 instructions – 6 cycles/instruction – 1 GHz clock rate • What is the CPU execution time for this program? • CPU Exec. Time = IC * CPI * Clock cycle time = $40{,}000{,}000 \times 6 \times 1 \times 10^{-9}$ = ?? seconds
  • 58. Ex: Performance • A program is running on a RISC machine with the following characteristics: – 20,000,000 instructions – 5 cycles/instruction – 1 GHz clock rate • Using the same program with a new compiler: – 5,000,000 instructions – 2 cycles/instruction – 1 GHz clock rate • What is the speedup with the changes? • Speedup = old execution time / new execution time = X / Y = Z (times faster after change)
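Both execution-time exercises reduce to one formula, CPU time = IC x CPI / clock rate. A minimal sketch, with `cpu_time` as a name chosen for the example:

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU execution time in seconds: IC * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz

GHZ = 1e9

# 40,000,000 instructions at 6 cycles each on a 1 GHz clock
single_run = cpu_time(40_000_000, 6, 1 * GHZ)       # about 0.24 s

# Same program before and after the new compiler
old_time = cpu_time(20_000_000, 5, 1 * GHZ)         # about 0.1 s
new_time = cpu_time(5_000_000, 2, 1 * GHZ)          # about 0.01 s
compiler_speedup = old_time / new_time              # about 10x
```

The clock rate is identical in both compiler cases, so the whole gain comes from the product IC x CPI dropping from 100 million to 10 million cycles.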
  • 59. Ex: Instruction Classes & CPI • Compute the CPU clock cycles and average CPI for the following program: – ALU: ICi = 20, CPIi = 4 – Data transfer: ICi = 20, CPIi = 5 – Control: ICi = 10, CPIi = 3 • (Sol) CPU clock cycles = 20*4 + 20*5 + 10*3 = X; Average CPI = X/50 = Y
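The per-class calculation can be sketched as a sum over the instruction mix. The dictionary layout and the `cycles_and_cpi` name are assumptions for the example; the counts and CPIs are the slide's.

```python
def cycles_and_cpi(mix):
    """mix maps an instruction class to (instruction count, cycles per instruction)."""
    total_cycles = sum(count * cpi for count, cpi in mix.values())
    total_instructions = sum(count for count, _ in mix.values())
    return total_cycles, total_cycles / total_instructions

mix = {
    "ALU": (20, 4),
    "Data transfer": (20, 5),
    "Control": (10, 3),
}
total_cycles, average_cpi = cycles_and_cpi(mix)   # 210 cycles over 50 instructions
```

The average CPI is a weighted mean, so it sits between the cheapest class (3) and the most expensive (5), pulled toward the classes that execute most often.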
  • 60. Ex: CPI and Instruction FREQi • Compute the average (effective) CPI for the following: – ALU: CPIi = 3, FREQi = 40% (0.4) – Data transfer: CPIi = 4, FREQi = 40% (0.4) – Control: CPIi = 2, FREQi = 20% (0.2) • (Sol) Average (Effective) CPI = 3*0.4 + 4*0.4 + 2*0.2 = XX
  • 61. Ex: Peak CPI • Compute the Peak CPI for the following: – ALU: CPIi = 3, FREQi = 40% → 0% – Data transfer: CPIi = 4, FREQi = 40% → 0% – Control: CPIi = 2, FREQi = 20% → 100% • (Sol) Peak CPI = 3*0.0 + 4*0.0 + 2*1.0 = 2.0
  • 62. Ex: Average CPI and Average MIPS • Compute the average (effective) CPI for the following: – ALU: CPIi = 3, FREQi = 40% (0.4) – Data transfer: CPIi = 4, FREQi = 40% (0.4) – Control: CPIi = 2, FREQi = 20% (0.2) • (Sol) Average (Effective) CPI = 3*0.4 + 4*0.4 + 2*0.2 = 3.2 • If the processor is a Pentium II (320 MHz), what is the MIPS rate? $\text{MIPS} = \dfrac{\text{Clock rate}}{\text{CPI} \times 10^{6}} = \dfrac{320 \times 10^{6}}{3.2 \times 10^{6}} = 100$
  • 63. Ex: Peak CPI and Peak MIPS • Compute the Peak CPI for the following: – ALU: CPIi = 3, FREQi = 40% → 0% – Data transfer: CPIi = 4, FREQi = 40% → 0% – Control: CPIi = 2, FREQi = 20% → 100% • (Sol) Peak CPI = 3*0.0 + 4*0.0 + 2*1.0 = 2.0 • If the processor is a Pentium II (320 MHz), what is the peak MIPS rate? $\text{MIPS} = \dfrac{\text{Clock rate}}{\text{CPI} \times 10^{6}} = \dfrac{320 \times 10^{6}}{2 \times 10^{6}} = 160$
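The average and peak MIPS figures follow from the same two formulas. In this sketch the function names and the list-of-pairs layout are illustrative; the CPI values, frequencies and 320 MHz clock come from the slides.

```python
def effective_cpi(mix):
    """mix is a list of (cpi, frequency) pairs whose frequencies sum to 1."""
    return sum(cpi * freq for cpi, freq in mix)

def mips(clock_rate_hz, cpi):
    """Millions of instructions per second: clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

CLOCK_HZ = 320e6   # 320 MHz, as on the slides

average_mix = [(3, 0.4), (4, 0.4), (2, 0.2)]   # ALU, data transfer, control
peak_mix = [(3, 0.0), (4, 0.0), (2, 1.0)]      # only the cheapest class runs

average_mips = mips(CLOCK_HZ, effective_cpi(average_mix))   # about 100
peak_mips = mips(CLOCK_HZ, effective_cpi(peak_mix))         # 160
```

Peak MIPS assumes the program consists entirely of the lowest-CPI instruction class, so it is an upper bound that a real instruction mix will not reach.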
  • 65. Multicore Benchmarking Rules • Do not rely on a single answer • Match your application requirements – Small or large data sets – Few or many threads – Dependencies – OS overhead
  • 66. Two Processor System Utilizing Single Memory Controller • Processors 1 and 2 must always arbitrate for memory via their front side bus connection through the North Bridge. [Figure: Quad Core Processor 1 and Quad Core Processor 2 connected over the Intel front side bus to the North Bridge, which provides the DDR2 interface]
  • 67. Two Processor System Utilizing Dual Memory Controllers [Figure: Quad Core Processor 1 and Quad Core Processor 2 joined by a link, each with its own DDR2 interface; memory accesses are labelled direct, shared and doubly shared]
  • 68. Two Processor System Utilizing Single Memory Controller • Processor 1 must always access memory by traversing the link to Processor 2 • Requires arbitration to access Processor 2's memory • Processor 2 always has prioritized access to this memory since it is directly attached. • Affinity can help performance [Figure: Quad Core Processor 1 linked to Quad Core Processor 2, which has the DDR2 interface]
  • 69. Benchmarking Multicore – What’s Important? • Measuring scalability • Memory and I/O bandwidth • Inter-core communications • OS scheduling support • Efficiency of synchronization • System-level functionality
  • 70. The Multicore Benchmarking Roadmap 1. Communications – MCAPI: ultra-light weight 2. Resource Management – Memory management – Basic synchronization – Resource registration – Resource partitioning 3. Task Management – Task scheduling 4. Debug • The Four MC Pillars: Communication, Resource Management, Task Management, Debug [Figure: multicore system stack with virtualization (or OS), adopted standards, MCA Foundation APIs, value-added functions (languages, programming models, design environments, application generators, benchmarks) and services (load balancing, system management, power management, reliability, quality of service)]

Editor's Notes

  • #58: 0.24 seconds
  • #59: 0.1/0.01 = 10 (times faster after change)
  • #60: (Sol) CPU clock cycles = 20*4 + 20*5 + 10*3 = 210 Average CPI = 210/50 = 4.2
  • #61: (Sol) Average (Effective) CPI = 3*0.4 + 4*0.4 + 2*0.2 = 3.2
  • #62: (Sol) Peak CPI = 3*0.0 + 4*0.0 + 2*1.0 = 2.0