Unit - I
ARM Processor Fundamentals
Introduction
• ARM processors are a family of central processing units (CPUs) based on a reduced instruction set computer
(RISC) architecture. ARM stands for Advanced RISC Machine (originally Acorn RISC Machine).
History of ARM processors:
• x86 follows an older architectural approach (CISC, complex instruction set computer); the first x86 CPU design
was launched in 1978.
• As "microcomputers" (PCs) evolved, achieving high performance in a smaller design became a challenge.
• In the early 1980s, Acorn Computers was designing microcomputers and ran into performance limitations with the available chip designs.
• Around 1981, a University of California, Berkeley project studied resource usage in computer chips.
• Processing units have certain predefined operations, collectively called the instruction set.
• Most programs used only a small subset of the instruction set. By reducing the number of predefined
instructions (cutting out complex, hard-to-implement, and little-used ones), the remaining simple
instructions would run faster and take up much less power and space on the chip. This is the RISC approach.
Dr. Shachi P, Dept. of ECE, BMSCE 2
History of ARM processors
• x86 (Intel) designs take a modular approach based on a motherboard with swappable
components. The CPU and other components (such as graphics cards and GPUs, memory
controllers, storage, or processing cores) are optimized for specific functions and can be
easily swapped out or expanded.
• However, these systems tend to have more homogeneous architectures, which can allow
attackers to breach many systems quickly with "write once, run anywhere" exploits.
• In an ARM-based processor, the CPU cores and other hardware functions (like I/O bus controllers
such as peripheral component interconnect) are on the same physical platform, and all of
the different functions are integrated together through an internal bus. This is a system on chip (SoC).
• x86 chips are designed to optimize performance.
• ARM-based processors are designed to balance cost with smaller size, lower power
consumption, lower heat generation, speed, and potentially longer battery life.
ARM Partnership Model
Microprocessors vs. Microcontrollers
• Microprocessor: A silicon chip representing a Central Processing Unit (CPU), which is capable of performing arithmetic as well as logical operations according to a pre-defined set of instructions.
  Microcontroller: A highly integrated chip that contains a CPU, RAM, special and general purpose register arrays, on-chip ROM/Flash memory for program storage, timer and interrupt control units, and dedicated I/O ports.
• Microprocessor: A dependent unit. It requires the combination of other chips like timers, program and data memory chips, interrupt controllers, etc. for functioning.
  Microcontroller: A self-contained unit. It doesn't require an external interrupt controller, timer, UART, etc. for its functioning.
• Microprocessor: Most of the time general purpose in design and operation.
  Microcontroller: Mostly application oriented or domain specific.
• Microprocessor: Doesn't contain a built-in I/O port. The I/O port functionality needs to be implemented with the help of external programmable peripheral interface chips.
  Microcontroller: Most microcontrollers contain multiple built-in I/O ports, which can be operated as a single 8, 16, or 32 bit port or as individual port pins.
• Microprocessor: Targeted at the high-end market, where performance is important.
  Microcontroller: Targeted at the embedded market, where performance is not so critical.
• Microprocessor: Limited power saving options.
  Microcontroller: Includes a lot of power saving features.
RISC V/S CISC Processors/Controllers:
• RISC: Fewer instructions.
  CISC: More instructions.
• RISC: Instruction pipelining and increased execution speed.
  CISC: Generally no instruction pipelining feature.
• RISC: Operations are performed on registers only; the only memory operations are load and store.
  CISC: Operations are performed on registers or memory, depending on the instruction.
• RISC: A large number of registers are available.
  CISC: Limited number of general purpose registers.
• RISC: The programmer needs to write more code to execute a task, since the instructions are simpler.
  CISC: A programmer can achieve the desired functionality with a single instruction.
• RISC: Single, fixed length instructions.
  CISC: Variable length instructions.
• RISC: Less silicon usage and pin count.
  CISC: More silicon usage.
• RISC: With Harvard architecture.
  CISC: Harvard or Von-Neumann architecture.
CISC vs. RISC - Instruction set
• The terms CISC and RISC refer to design principles and techniques.
RISC: Reduced instruction set computers
• Simple instructions require a small number of basic steps to execute.
• For a processor that has only simple instructions, a large number of instructions may be
needed to perform a given programming task.
• This could lead to a large value of N and a small value of S, where N is the number of instructions
executed to complete the program and S is the average number of basic steps each
instruction execution requires.
• It is much easier to implement efficient pipelining in processors with simple instruction sets.
CISC: Complex instruction set computers
• Complex instructions involve a large number of steps.
• If individual instructions perform more complex operations, fewer instructions will be
needed, leading to a lower value of N and a larger value of S.
• Complex instructions combined with pipelining would achieve good performance.
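The trade-off between N and S can be made concrete with the basic performance equation T = (N × S) / R. The sketch below is only an illustration: the clock rate R and all the numbers are assumptions that do not come from the slides.

```python
def execution_time(n_instructions, steps_per_instruction, clock_hz):
    """Basic performance equation: T = (N * S) / R, in seconds."""
    return n_instructions * steps_per_instruction / clock_hz

# Hypothetical figures, chosen only to illustrate the trade-off.
# RISC-style: more instructions (large N), few steps each (small S).
risc_time = execution_time(n_instructions=1_500_000,
                           steps_per_instruction=1.2,
                           clock_hz=100e6)
# CISC-style: fewer instructions (small N), more steps each (large S).
cisc_time = execution_time(n_instructions=1_000_000,
                           steps_per_instruction=4.0,
                           clock_hz=100e6)

print(f"RISC-style: {risc_time * 1e3:.1f} ms, CISC-style: {cisc_time * 1e3:.1f} ms")
```

With other values of N and S the comparison can go either way, which is why both design styles rely on pipelining to reduce the effective value of S.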
Harvard V/s Von-Neumann
Processor/Controller Architecture
Typical examples: Harvard (many ARM cores, e.g. ARM9); Von-Neumann (x86, and also the ARM7TDMI core).
• Harvard: Microprocessors/controllers based on the Harvard architecture have a separate data bus and instruction bus. This allows data transfer and program fetching to occur simultaneously on both buses.
  Von-Neumann: Microprocessors/controllers based on the Von-Neumann architecture share a single bus for fetching both instructions and data. Program instructions and data are stored in a common main memory.
• Harvard: Separate buses for instruction and data fetching.
  Von-Neumann: Single shared bus for instruction and data fetching.
• Harvard: Easier to pipeline, so high performance can be achieved.
  Von-Neumann: Low performance compared to Harvard architecture.
• Harvard: Comparatively high cost.
  Von-Neumann: Cheaper.
• Harvard: No memory alignment problems.
  Von-Neumann: Allows self-modifying code.
Unit - I
• Basic Structure of computers- Von Neumann and Harvard Architecture, Basic
Processing Unit, Bus Structure, RISC and CISC Architecture, RISC and ARM
Design philosophy, ARM core Dataflow model, programming model,
processor states and operating modes, ARM pipeline.
Computer Types
• Since their introduction in the 1940s, digital computers have evolved into
many different types that vary widely in size, cost, computational power, and
intended use.
• Modern computers can be divided roughly into four general
categories:
• 1. Embedded computers are integrated into a larger device or system in
order to automatically monitor and control a physical process or
environment.
• They are used for a specific purpose rather than for general
processing tasks.
• Ex: industrial and home automation, appliances, telecommunication
products, and vehicles
Computer Types
• 2. Personal computers have achieved widespread use in homes, educational
institutions, and business and engineering office settings, primarily for
dedicated individual use.
• They support a variety of applications such as general computation,
document preparation, computer-aided design, audio visual entertainment,
interpersonal communication, and Internet browsing.
A number of classifications are used for personal computers.
• Desktop computers serve general needs and fit within a typical personal
workspace.
• Workstation computers offer higher computational capacity and more
powerful graphical display capabilities for engineering and scientific work.
• Portable and Notebook computers provide the basic features of a personal
computer in a smaller lightweight package. They can operate on batteries to
provide mobility
Computer Types
• 3. Servers and Enterprise systems are large computers that are meant to be
shared by a potentially large number of users who access them from some form
of personal computer over a public or private network.
• Such computers may host large databases and provide information
processing for a government agency or a commercial organization.
• 4. Supercomputers and Grid computers normally offer the highest
performance. They are the most expensive and physically the largest
category of computers.
• Supercomputers are used for the highly demanding computations
needed in weather forecasting, engineering design and simulation,
and scientific work.
Functional Units of a Computer
A computer consists of five functionally independent main parts:
1. Input unit
2. Memory unit
3. Arithmetic and logic unit
4. Output unit
5. Control unit
Functional Units of a Computer
• The input unit accepts coded information from human operators using
devices such as keyboards, or from other computers over digital
communication lines.
• The information received is stored in the computer’s memory, either for later
use or to be processed immediately by the arithmetic and logic unit.
• The processing steps are specified by a program that is also stored in the
memory.
• Finally, the results are sent back to the outside world through the output unit.
• All of these actions are coordinated by the control unit.
• An interconnection network provides the means for the functional units to
exchange information and coordinate their actions.
• The information handled by a computer is categorized as either instructions or data.
• Instructions, or machine instructions, are explicit commands that
1. Govern the transfer of information within a computer as well as between the computer and
its I/O devices.
2. Specify the arithmetic and logic operations to be performed
• A program is a list of instructions which performs a task. Programs are stored in
the memory.
• The processor fetches the program instructions from the memory, one after another,
and performs the desired operations.
• The computer is controlled by the stored program, except for possible external
interruption by an operator or by I/O devices connected to it.
• Data are numbers and characters that are used as operands by the instructions. Data
are also stored in the memory.
• The instructions and data handled by a computer must be encoded in a suitable
format.
Functional Units of a Computer
Memory Unit
• The function of the memory unit is to store programs and data. There are two
classes of storage, called primary and secondary.
Primary Memory
• Primary memory (main memory) is a fast memory that operates at electronic
speeds.
• Programs must be stored in this memory while they are being executed.
• It consists of a large number of semiconductor storage cells, each capable of storing one bit of information. These
cells are handled in groups of fixed size called words.
• The memory is organized so that one word can be stored or retrieved in one basic
operation.
• The number of bits in each word is called the word length (typically 16, 32, or 64 bits).
Functional Units of a Computer
• To provide easy access to any word in the memory, a distinct address is associated
with each word location.
• Addresses are consecutive numbers, starting from 0, that identify successive
locations.
• A memory in which any location can be accessed in a short and fixed amount of time
after specifying its address is called a random-access memory (RAM).
• The time required to access one word is called the memory access time.
• This time is independent of the location of the word being accessed. It typically
ranges from a few nanoseconds (ns) to about 100 ns for current RAM units.
Functional Units of a Computer
Cache Memory
• As an adjunct to the main memory, a smaller, faster RAM unit, called a cache,
is used to hold sections of a program that are currently being executed, along
with any associated data.
• The cache is tightly coupled with the processor and is usually contained
on the same integrated-circuit chip.
• The purpose of the cache is to facilitate high instruction execution rates.
• As execution proceeds, instructions are fetched into the processor chip, and a
copy of each is placed in the cache.
• Likewise, if the required data are located in the main memory, the data are fetched and
copies are also placed in the cache.
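The caching behaviour described above can be sketched as a toy model. This is an assumption-laden illustration, not how a hardware cache is built: the class name CachedMemory is hypothetical, and the model has no capacity limit or replacement policy.

```python
class CachedMemory:
    """Toy model: a small cache (dict) in front of a slower main memory (list)."""

    def __init__(self, main_memory):
        self.main_memory = main_memory
        self.cache = {}      # address -> copy of the word
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.cache:
            self.hits += 1   # word is already in the cache
        else:
            self.misses += 1 # fetch from main memory and keep a copy
            self.cache[address] = self.main_memory[address]
        return self.cache[address]

memory = CachedMemory(main_memory=[10, 20, 30, 40])
# A loop that reads the same two words three times.
values = [memory.read(addr) for _ in range(3) for addr in (0, 1)]
print(values, "hits:", memory.hits, "misses:", memory.misses)
```

Only the first access to each word goes to main memory; repeated accesses inside the loop are served from the cache, which is what makes high instruction execution rates possible.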
Functional Units of a Computer
Secondary Storage
• Although primary memory is essential, it tends to be expensive and does not
retain information when power is turned off.
• Thus additional, less expensive, permanent secondary storage is used when
large amounts of data and many programs have to be stored, particularly for
information that is accessed infrequently.
• Access times for secondary storage are longer than for primary memory.
• Examples: magnetic disks, optical disks (DVD and CD), and flash memory
devices.
• https://www.youtube.com/watch?v=7J7X7aZvMXQ (up to 3:17seconds)
Functional Units of a Computer
• When operands are brought into the processor, they are stored in high-
speed storage elements called registers.
• Each register can store one word of data.
• Access times to registers are even shorter than access times to the cache unit
on the processor chip.
Control Unit
• Control circuits are responsible for generating the timing signals that govern
the transfers and determine when a given action is to take place.
• Data transfers between the processor and the memory are also
managed by the control unit through timing signals.
Functional Units of a Computer
The Basic Operational Concepts of a Computer
• To perform a given task, an appropriate program consisting of a list of instructions is stored in the memory.
Individual instructions are brought from the memory into the processor, which executes the specified
operations. Data to be used as operands are also stored in the memory.
• Example: Add LOCA, R0
• This instruction adds the operand at memory location LOCA to the operand in register R0 and places the sum into
register R0. Executing this instruction requires several steps:
1. First, the instruction is fetched from the memory into the processor.
2. The operand at LOCA is fetched and added to the contents of R0.
3. Finally, the resulting sum is stored in register R0.
• The preceding Add instruction combines a memory access operation with an ALU operation. In
some other types of computers, these two types of operations are performed by separate instructions
for performance reasons:
• Load LOCA, R1
• Add R1, R0
• Transfers between the memory and the processor are started by sending the address of the memory location
to be accessed to the memory unit and issuing the appropriate control signals. The data are then transferred
to or from the memory.
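The two forms of the addition above can be compared in a small simulation. It is a sketch under assumed values: the address chosen for LOCA, the initial register contents, and the helper function names are all hypothetical.

```python
# Toy machine state; LOCA is a hypothetical memory address.
LOCA = 100
memory = {LOCA: 7}
registers = {"R0": 5, "R1": 0}

def add_mem_reg(address, reg):
    """Add LOCA, R0: fetch the memory operand and add it to the register."""
    registers[reg] = memory[address] + registers[reg]

def load(address, reg):
    """Load LOCA, R1: copy the memory operand into a register."""
    registers[reg] = memory[address]

def add_reg_reg(src, dst):
    """Add R1, R0: add two registers, result in the second one."""
    registers[dst] = registers[src] + registers[dst]

# Single instruction that combines memory access and ALU operation.
add_mem_reg(LOCA, "R0")
single_instruction_result = registers["R0"]

# Same computation as two separate instructions.
registers["R0"] = 5
load(LOCA, "R1")
add_reg_reg("R1", "R0")
two_instruction_result = registers["R0"]

print(single_instruction_result, two_instruction_result)
```

Both sequences leave the same sum in R0; the second form keeps memory access and arithmetic in separate instructions, which is the load/store style used by RISC processors.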
Connections between the processor and the memory
• Besides the IR and PC, there are n general-purpose registers, R0 through Rn-1.
• Memory Address Register (MAR): holds the address of the location to be accessed.
• Memory Data Register (MDR): contains the data to be written into or read out of the addressed location.
• Instruction Register (IR): holds the instruction that is currently being executed. Its output is available to the
control circuits, which generate the timing signals that control the various processing elements involved in
executing the instruction.
• Program Counter (PC): another specialized register that keeps track of the execution of a program. It contains
the memory address of the next instruction to be fetched and executed.
Operating steps for Program execution
1. Execution of the program (stored in memory) starts when the PC is set to point to the first instruction of the
program.
2. The contents of the PC are transferred to the MAR and a Read control signal is sent to the memory.
3. The addressed word is read out of the memory and loaded into the MDR. Next, the contents of the MDR
are transferred to the IR. At this point, the instruction is ready to be decoded and executed.
4. If the instruction involves an operation to be performed by the ALU, it is necessary to obtain the required
operands.
5. If an operand resides in memory (it could also be in a general purpose register in the processor), it has to be
fetched by sending its address to the MAR and initiating a Read cycle.
6. When the operand has been read from the memory into the MDR, it is transferred from the MDR to ALU.
7. After one or more operands are fetched in this way, the ALU can perform the desired operation.
8. If the result of the operation is to be stored in the memory, then the result is entered into the MDR.
9. The address of the location where the result is to be stored is sent to the MAR, and a Write cycle is initiated.
10. At some point during the execution of the current instruction, the contents of the PC are incremented so that the PC
points to the next instruction to be executed.
11. Thus, as soon as the execution of the current instruction is completed, a new instruction fetch may be started.
12. In addition to transferring data between the memory and the processor, the computer accepts data from input devices
and sends data to output devices. Thus, some machine instructions with the ability to handle I/O transfers are provided.
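The operating steps listed above can be traced with a toy stored-program machine. This is a minimal sketch: the tuple-based instruction format, the opcode names, the addresses, and the word-addressed PC (incremented by 1) are assumptions made for the illustration.

```python
# Toy stored-program machine. Instructions and data share one memory.
memory = {
    0: ("LOAD", "R1", 100),   # R1 <- [100]
    1: ("ADD", "R0", "R1"),   # R0 <- [R0] + [R1]
    2: ("STORE", "R0", 101),  # [101] <- [R0]
    3: ("HALT",),
    100: 7,
    101: 0,
}
regs = {"R0": 5, "R1": 0}
PC, MAR, MDR, IR = 0, None, None, None

def mem_read(address):
    """Read cycle: the address goes to the MAR, the word comes back in the MDR."""
    global MAR, MDR
    MAR = address
    MDR = memory[MAR]
    return MDR

def mem_write(address, value):
    """Write cycle: the data goes to the MDR, the address to the MAR."""
    global MAR, MDR
    MDR = value
    MAR = address
    memory[MAR] = MDR

while True:
    IR = mem_read(PC)   # steps 2-3: PC -> MAR, Read, MDR -> IR
    PC += 1             # step 10: PC now points to the next instruction
    op = IR[0]
    if op == "LOAD":
        regs[IR[1]] = mem_read(IR[2])              # steps 5-6: operand fetch
    elif op == "ADD":
        regs[IR[1]] = regs[IR[1]] + regs[IR[2]]    # step 7: ALU operation
    elif op == "STORE":
        mem_write(IR[2], regs[IR[1]])              # steps 8-9: result to memory
    elif op == "HALT":
        break

print(regs, memory[101], PC)
```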
• Normal execution of a program may be preempted (temporarily interrupted) if some device requires urgent servicing.
To request this, the device raises an interrupt signal.
• An interrupt is a request signal from an I/O device for service by the processor. The processor provides the requested
service by executing an appropriate interrupt service routine.
• The diversion may change the internal state of the processor, so its state must be saved in memory before
the interrupt is serviced. When the interrupt service routine is completed, the state of the processor is restored so that the
interrupted program may continue.
Bus Structures
BUS: A group of lines (wires) that serves as a connecting path for several devices of a
computer is called a bus.
The following are the different types of buses:
1. Address bus 2. Data bus 3. Control bus
• The data bus carries (transfers) data from one component (source) to another component
(destination) connected to it. It consists of 8, 16, 32 or more parallel signal lines.
The data bus lines are bidirectional, i.e., the CPU can read data on these lines from memory or
from a port, as well as send data out on these lines to a memory location or a port.
• The address bus is the set of lines that carries the address of the memory location
to or from which data is to be transferred. It is a unidirectional bus. The
address bus consists of 16, 20, 24 or more parallel signal lines. On these lines the CPU sends out
the address of the memory location.
• The control bus carries the control and timing information.
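The width of the address bus fixes how many locations the CPU can address: n address lines select 2^n distinct locations. The short sketch below works this out for the bus widths mentioned above (32 lines is added for comparison); the function name is hypothetical.

```python
def addressable_locations(address_lines):
    """An n-line address bus can select 2**n distinct locations."""
    return 2 ** address_lines

sizes = {n: addressable_locations(n) for n in (16, 20, 24, 32)}
for lines, count in sizes.items():
    print(f"{lines} address lines -> {count} locations")
```

For example, 16 lines address 65,536 locations (64 K), and 20 lines address 1,048,576 locations (1 M).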
Bus Structures
Following are the other types of buses.
• System Bus: usually a combination of the address bus, data
bus, and control bus.
• Internal Bus: the bus that operates only within the internal circuitry of the
CPU.
• External Bus: a bus that connects the computer to external devices.
• I/O Bus: the bus used by I/O devices to communicate with the CPU.
• Synchronous Bus: data transmission between the source and destination
units takes place in a given timeslot that is already known to these units.
Bus Structures
• Asynchronous Bus: In this case the data transmission is governed by
handshaking control signals rather than by a timeslot known in advance.
• Handshaking (either software codes or hardware signals) is used to halt
transmission of data from the sending computer until the receiving
computer has emptied its buffer.
• Handshaking is an I/O control method to synchronize I/O devices with the
microprocessor.
• As many I/O devices accept or release information at a much slower rate
than the microprocessor, this method makes the microprocessor
work with an I/O device at the I/O device's data transfer rate.
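The handshaking idea above can be sketched in software. This is a simplified illustration with assumed names: the Receiver class, its one-word buffer, and the ready() signal are hypothetical stand-ins for the hardware handshake lines.

```python
class Receiver:
    """Hypothetical slow device with a one-word buffer."""

    def __init__(self):
        self.buffer = None
        self.received = []

    def ready(self):
        # Handshake signal: True only when the buffer is empty.
        return self.buffer is None

    def accept(self, word):
        self.buffer = word

    def drain(self):
        # The device consumes its buffer at its own (slower) pace.
        if self.buffer is not None:
            self.received.append(self.buffer)
            self.buffer = None

def send_all(words, receiver):
    """Send each word only after the receiver signals that it is ready."""
    wait_cycles = 0
    for word in words:
        while not receiver.ready():
            wait_cycles += 1   # the sender is halted by the handshake
            receiver.drain()
        receiver.accept(word)
    receiver.drain()
    return wait_cycles

rx = Receiver()
waits = send_all([1, 2, 3], rx)
print(rx.received, "wait cycles:", waits)
```

No word is overwritten in the buffer, because the sender proceeds only at the rate at which the receiver empties it.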
The Bus interconnection Scheme
1. Bus is a connecting path for several devices of a computer
2. In addition to the lines that carry the data, the bus must have
lines for address and control purposes.
Single bus structure
• The simplest way to interconnect functional units is to use a single bus, as shown below.
• All units are connected to this bus. The bus can be used for only one transfer at a time. Bus
control lines are used to arbitrate multiple requests for use of the bus.
ADVANTAGE
• Low cost and flexibility for attaching peripheral devices
DISADVANTAGE
• Low performance, because only one transfer can take place at a time
• Scalability: As computer systems become more complex and require higher bandwidth for
data transfer, a single bus structure may struggle to scale efficiently.
• Contention: Contention for the bus can occur when multiple components attempt to access
it simultaneously, leading to delays and potential performance issues.
Traditional / Multiple bus Structure:
• Advantages: better performance, scalable, less contention
• Disadvantage: increased cost and complexity.
Traditional / Multiple bus Structure:
• There is a local bus that connects the processor to cache memory and that may
support one or more local devices.
• There is also a cache memory controller that connects this cache not only to this
local bus but also to the system bus. The main memory modules are attached to the
system bus.
• I/O transfers to and from the main memory across the system bus do not
interfere with the processor’s activity.
• An expansion bus interface buffers data transfers between the system bus and
the I/O controllers on the expansion bus.
• I/O devices that might be attached to the expansion bus include network cards
(LAN), SCSI (Small Computer System Interface) devices, modems, etc.
Basic Processing Unit
• A computing task consists of a series of operations specified by a sequence of machine-
language instructions that constitute a program.
• The processor fetches one instruction at a time and performs the operation specified.
Instructions are fetched from successive memory locations until a branch or a jump
instruction is encountered.
• The processor uses the program counter, PC, to keep track of the address of the next
instruction to be fetched and executed.
• After fetching an instruction, the contents of the PC are updated to point to the next
instruction in sequence. A branch instruction may cause a different value to be loaded
into the PC.
• When an instruction is fetched, it is placed in the instruction register, IR, from where it is
interpreted, or decoded, by the processor’s control circuitry. The IR holds the instruction
until its execution is completed.
• Consider a 32-bit RISC-style instruction set architecture.
Basic Processing Unit
• To execute an instruction, the processor has to perform the following steps:
1. Fetch the contents of the memory location pointed to by the PC. The
contents of this location are the instruction to be executed; hence they are
loaded into the IR.
• In register transfer notation, the required action is IR ← [[PC]]
2. Increment the PC to point to the next instruction. Assuming that the
memory is byte addressable, the PC is incremented by 4; that is, PC ← [PC] + 4
3. Carry out the operation specified by the instruction in the IR.
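The fetch and increment steps above can be sketched for a byte-addressable memory holding 32-bit instructions. The instruction word values are arbitrary placeholders, and the little-endian byte order is an assumption of this sketch.

```python
import struct

# Byte-addressable memory holding two 32-bit instruction words (little-endian).
# The word values are placeholders, not real instruction encodings.
memory = bytearray(struct.pack("<II", 0x11111111, 0x22222222))

PC = 0
fetched = []
for _ in range(2):
    (IR,) = struct.unpack_from("<I", memory, PC)  # step 1: IR <- [[PC]]
    PC = PC + 4                                   # step 2: PC <- [PC] + 4
    fetched.append(IR)                            # step 3 would execute IR here

print([hex(word) for word in fetched], "PC =", PC)
```

The PC advances by 4 because each 32-bit instruction occupies four consecutive byte addresses.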
Basic Processing Unit
• The operation specified by an instruction can be carried out by performing one
or more of the following actions:
• Read the contents of a given memory location and load them into a processor register.
• Read data from one or more processor registers.
• Perform an arithmetic or logic operation and place the result into a processor register.
• Store data from a processor register into a given memory location.
• The processor communicates with the memory through the processor-memory
interface, which transfers data from and to the memory during Read and Write
operations.
• The instruction address generator updates the contents of the PC after every
instruction is fetched. The register file is a memory unit whose storage locations are
organized to form the processor’s general-purpose registers.
Basic Processing Unit
• During execution, the contents of the registers named in an
instruction that performs an arithmetic or logic operation are
sent to the arithmetic and logic unit (ALU), which performs the
required computation.
• The results of the computation are stored in a register in the
register file.
• The clock period, which is the time between two successive rising edges, must be long enough to allow
the combinational circuit to produce the correct result.
RISC and ARM Design Philosophy
The RISC Design Philosophy
• Instructions – reduced number and simpler
• Pipeline
• Registers – large number of general purpose registers (store data or address)
• Load/Store architecture – any data in memory that is to be processed is first
moved to one or more registers and then processed.
ARM Design Philosophy
• Power efficiency
• High code density
• Memory footprint/ Die area
• Hardware Debug technology
The RISC Design Philosophy
Nomenclature
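• ARM7TDMI-S: ARM7 core; T = Thumb instruction set, D = debug support, M = enhanced multiplier, I = EmbeddedICE logic, S = synthesizable version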
ARM7TDMI Features
• 32 bit data bus/ ALU
• 32 bit instructions/ Address bus
• Aligned memory
• Von Neumann architecture
• 3-stage pipeline
• 37 registers- 32 bit each
• Load- store Model
• 7 operating modes
• 7 exceptions
• 7 addressing modes
• 3 data formats
ARM ISA Features
• ARM ISA differs from pure RISC
• Variable execution cycle for certain instructions
• In-line barrel shifter leading to more complex instructions.
• Thumb instruction set
• Conditional execution
• Enhanced instructions with DSP extension
Data Sizes and Instruction Sets
■ The ARM is a 32-bit architecture.
■ When used in relation to the ARM:
■ Byte means 8 bits
■ Halfword means 16 bits (two bytes)
■ Word means 32 bits (four bytes)
■ Most ARM cores implement two instruction sets
■ 32-bit ARM Instruction Set
■ 16-bit Thumb Instruction Set
■ Jazelle cores can also execute Java bytecode
ARM core Dataflow model
• MOVS r7, r5, LSL #2
• MLA{<cond>}{S} R0,R1,R2,R3
• LDR r0, [r1, #4]!
• STRH r0,[r1,#0x4]!
• LDRSB r0,[r1]
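What some of the instructions above do to the registers can be sketched in a small simulation. This is an illustration only: the register contents and the memory word are assumed values, memory is modelled as a hypothetical address-to-word mapping, and the flag update of MOVS is left out.

```python
MASK32 = 0xFFFFFFFF

def lsl(value, shift):
    """Barrel-shifter logical shift left on a 32-bit value."""
    return (value << shift) & MASK32

def sign_extend_byte(byte):
    """LDRSB-style sign extension of an 8-bit value to 32 bits."""
    return (byte | 0xFFFFFF00) if byte & 0x80 else byte

regs = {"r0": 0, "r1": 0x1000, "r5": 5, "r7": 0}
memory = {0x1004: 0xDEADBEEF}   # hypothetical memory contents

# MOVS r7, r5, LSL #2: r5 passes through the barrel shifter before the ALU.
regs["r7"] = lsl(regs["r5"], 2)

# LDR r0, [r1, #4]!: pre-indexed with write-back, so r1 is updated first.
regs["r1"] = (regs["r1"] + 4) & MASK32
regs["r0"] = memory[regs["r1"]]

# LDRSB loads a byte and sign-extends it to 32 bits.
extended = sign_extend_byte(0x80)

print(regs["r7"], hex(regs["r1"]), hex(regs["r0"]), hex(extended))
```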
Registers
What is a register?
• data holding places that are part of the computer processor
• high-speed memory storing units.
• memory locations that can be accessed by the CPU directly
Difference between memory and register
• Registers hold the operands, addresses, and instruction that the CPU is currently processing.
• Memory stores the data and instructions that the processor may require
during operation.
Registers (contd.)
• ARM has 37 registers (all are 32-bits long)
• 1 dedicated program counter
• 1 dedicated current program status register (CPSR)
• 5 dedicated saved program status registers (SPSR)
• 30 general purpose registers
• At any one time, only 18 of the 37 registers are active:
• 16 data registers (r0-r15), which hold either data or addresses
• 2 program status registers (cpsr and spsr)
• r13 : stack pointer
• r14: link register
• r15: program counter
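The total of 37 registers can be checked by counting the banked copies per mode. The breakdown below follows the usual ARM7TDMI register organisation; the variable names are only for this sketch.

```python
# Registers visible in User/System mode.
user_visible = 16                 # r0-r15 (r15 is the program counter)
# Extra banked copies that appear in other modes.
fiq_banked = 7                    # r8_fiq to r14_fiq
other_banked = 2 * 4              # r13 and r14 for svc, abt, irq, und
general_and_pc = user_visible + fiq_banked + other_banked

# Status registers.
status_registers = 1 + 5          # cpsr + one spsr each for fiq, irq, svc, abt, und

total_registers = general_and_pc + status_registers
print(general_and_pc, status_registers, total_registers)
```

The 31 registers in the first group correspond to the 30 general purpose registers plus the program counter listed above.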
Registers (contd.)
• Register r13 :
• used as the stack pointer (sp)
• stores the head of the stack in the current processor mode.
• Register r14
• the link register (lr)
• holds the return address, which the core stores here whenever it calls a subroutine.
• Register r15:
• is the program counter (pc)
• the address of the next instruction to be fetched by the processor
• These registers are distributed in several register banks, their usage depends on
the mode in which the ARM processor is operated
Banked Registers
• Banked registers are registers that are hidden from a program at different times; they are
identified by the shading in the diagram
• Available only when the processor is in a particular mode
• The mode can be selected by writing directly to the mode bits of the cpsr (the core must
be in a privileged mode)
• The mode can also be changed by hardware when the core responds to an exception
or interrupt
• A banked register maps one-to-one onto a user mode register
• If the processor mode is changed, a banked register from the new mode
replaces an existing register
The Saved Program Status Register (SPSR) stores the value the CPSR had when an
exception was taken, so that the CPSR can be restored after handling the exception.
• Exceptions and interrupts suspend the normal execution of sequential instructions and jump to a specific
location.
• The following exceptions and interrupts cause a mode change:
 Reset
 interrupt request
 fast interrupt request
 software interrupt
 data abort
 prefetch abort
 undefined instruction
• When the core enters interrupt request mode, a new register appears: the saved program status register (spsr), which
stores the previous mode's cpsr.
• The spsr can only be modified and read in a privileged mode. There is no spsr available in user mode.
• The cpsr is not copied into the spsr when a mode change is forced by a program writing directly to the cpsr.
• The saving of the cpsr occurs only when an exception or interrupt is raised.
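The saving and restoring of the cpsr around an interrupt can be sketched as follows. This is a simplified model: the state dictionary and function names are hypothetical, the return address is passed in directly, and the exact link register offset used by real hardware is ignored. The mode encodings, the I bit position, and the IRQ vector address are the standard ARM values.

```python
MODE_MASK = 0x1F
MODE_USER, MODE_IRQ = 0b10000, 0b10010
I_BIT = 1 << 7            # IRQ mask bit in the cpsr
IRQ_VECTOR = 0x18         # address of the IRQ exception vector

state = {"cpsr": MODE_USER, "pc": 0x8000, "lr_irq": 0, "spsr_irq": 0}

def take_irq(state, return_address):
    """Exception entry: save the cpsr, switch mode, mask IRQ, jump to the vector."""
    state["spsr_irq"] = state["cpsr"]
    state["lr_irq"] = return_address
    state["cpsr"] = (state["cpsr"] & ~MODE_MASK) | MODE_IRQ | I_BIT
    state["pc"] = IRQ_VECTOR

def return_from_irq(state):
    """Exception return: restore the cpsr from the spsr and resume the program."""
    state["cpsr"] = state["spsr_irq"]
    state["pc"] = state["lr_irq"]

take_irq(state, return_address=0x8004)
mode_in_handler = state["cpsr"] & MODE_MASK
irq_masked_in_handler = bool(state["cpsr"] & I_BIT)
return_from_irq(state)

print(bin(mode_in_handler), irq_masked_in_handler, hex(state["pc"]))
```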
Current Program Status Register
• Used to monitor and control internal operations. 32-bit register, resides in register file.
• The CPSR is divided into four fields, each 8 bits wide:
• Flags
• Status
• Extension
• Control
• The control field contains the processor mode, state, and interrupt mask bits.
• The flags field contains the condition flags.
• Some ARM processor cores have extra bits allocated.
• The J bit, in the flags field, is used in Jazelle-enabled processors.
Interrupt Masks
• Used to stop specific interrupt requests from interrupting the processor
• Two interrupt request levels in the ARM core
• Interrupt Request (IRQ)
• Fast Interrupt Request (FIQ)
• CPSR: 2 interrupt mask bits
• I: when set to 1, it masks requests made by IRQ
• F: when set to 1, it masks requests made by FIQ
Conditional Flags
There are four condition flags in the ARM7TDMI.
They are present in the CPSR; the flag bits are
⚫ N: Negative flag (result is negative)
⚫ Z: Zero flag
⚫ C: Carry flag
⚫ V: Overflow flag
Condition Flags
• Updated by comparisons and by the results of ALU operations
• Data-processing instructions update the flags only when they carry the suffix S; comparison instructions (CMP, CMN, TST, TEQ) always update them
• E.g. the SUBS instruction, when executed, sets Z=1 if the result is zero
• Q: used in cores with DSP extensions
• Indicates an overflow/saturation due to execution of an enhanced DSP instruction
• It is a sticky flag: once set by hardware, it stays set
• It can be cleared only by writing to the CPSR directly
• ARM instructions support conditional execution
• It is based on the values stored in the condition flags [see the table on the next slide]
Figure: CPSR with both Jazelle and DSP extensions set
Note 1:
• Bit = 1: capital letter
• Bit = 0: lowercase letter
Note 2:
• Condition flags: a capital letter indicates the flag is set
• Interrupts: a capital letter indicates the interrupt is disabled (masked)
Conditional Execution
• Controls whether or not the core will execute an instruction
• Before execution, the processor compares the instruction's condition attribute with the condition flags in the CPSR
• If they match, the instruction is executed
• If not, the instruction is ignored
• The condition attribute is appended to the instruction mnemonic [refer to table]
• If no condition attribute is present, the default is AL (always)
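The check the core performs before executing an instruction can be sketched as a lookup from condition suffix to a test on the N, Z, C, and V flags. The table below uses the standard ARM condition codes as assumed from the architecture documentation; CONDITIONS and condition_passes are hypothetical names.

```python
# Each entry maps a condition suffix to a test on the flags (n, z, c, v).
CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,             # equal
    "NE": lambda n, z, c, v: z == 0,             # not equal
    "CS": lambda n, z, c, v: c == 1,             # carry set (unsigned >=)
    "CC": lambda n, z, c, v: c == 0,             # carry clear (unsigned <)
    "MI": lambda n, z, c, v: n == 1,             # negative
    "PL": lambda n, z, c, v: n == 0,             # positive or zero
    "VS": lambda n, z, c, v: v == 1,             # overflow
    "VC": lambda n, z, c, v: v == 0,             # no overflow
    "HI": lambda n, z, c, v: c == 1 and z == 0,  # unsigned >
    "LS": lambda n, z, c, v: c == 0 or z == 1,   # unsigned <=
    "GE": lambda n, z, c, v: n == v,             # signed >=
    "LT": lambda n, z, c, v: n != v,             # signed <
    "GT": lambda n, z, c, v: z == 0 and n == v,  # signed >
    "LE": lambda n, z, c, v: z == 1 or n != v,   # signed <=
    "AL": lambda n, z, c, v: True,               # always
}


def condition_passes(suffix, n, z, c, v):
    """Return True if an instruction with this condition suffix would execute."""
    cond = suffix.upper() or "AL"   # no suffix means AL (always)
    return CONDITIONS[cond](n, z, c, v)


# After a SUBS whose result is zero, Z=1: an ADDEQ executes, an ADDNE is ignored.
print(condition_passes("EQ", 0, 1, 1, 0), condition_passes("NE", 0, 1, 1, 0))
```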
On power-up, the processor operates in supervisor mode by default, which is a privileged mode.
Processor Modes
The processor mode determines which registers are active and the access rights to
the cpsr register itself.
• Each processor mode is either privileged or nonprivileged
• A privileged mode allows full read-write access to the cpsr
• A nonprivileged mode allows only read access to the control field of the cpsr, but still allows read-write access to the condition flags
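The access rule above can be modelled as a write mask applied to the cpsr. This is a simplified sketch: write_cpsr is a hypothetical function, user mode is taken as the only nonprivileged mode, and only the four condition flag bits (31 to 28) are treated as writable from it.

```python
CONDITION_FLAGS = 0xF0000000   # N, Z, C, V in cpsr bits [31:28]
USER_MODE = 0b10000            # assumed encoding of the nonprivileged mode
WORD = 0xFFFFFFFF


def write_cpsr(cpsr, value):
    """Return the cpsr after an attempted write of `value` in the current mode."""
    privileged = (cpsr & 0x1F) != USER_MODE
    writable = WORD if privileged else CONDITION_FLAGS
    # Bits outside the writable mask keep their old value.
    return (cpsr & ~writable & WORD) | (value & writable)


# In user mode only the condition flags change; the mode and mask bits stay put.
print(hex(write_cpsr(0x00000010, 0xF00000D3)))
# In supervisor mode the whole register is written.
print(hex(write_cpsr(0x00000013, 0xF00000D0)))
```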
State and Instruction Sets
• The state determines which instruction set is executed
• It is selected by the T and J bits of the CPSR
• Three states:
• ARM: default state, selected when T=0 and J=0; 32-bit ARM instructions are executed
• Thumb: selected when T=1 and J=0; 16-bit Thumb instructions are executed
• Jazelle: selected when J=1 and T=0; 8-bit Jazelle instructions are executed; used to run Java bytecodes
• The state is changed by executing a branch instruction (for example, BX switches between ARM and Thumb state)
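A short sketch of how the state follows from the two CPSR bits. The bit positions (T in bit 5, J in bit 24) are assumed from the ARM architecture, processor_state is a hypothetical name, and the combination T=1, J=1 is simply reported as reserved here.

```python
T_BIT = 1 << 5    # Thumb bit, in the control field
J_BIT = 1 << 24   # Jazelle bit, in the flags field


def processor_state(cpsr):
    """Return the instruction-set state selected by the T and J bits."""
    t = bool(cpsr & T_BIT)
    j = bool(cpsr & J_BIT)
    if not t and not j:
        return "ARM"        # 32-bit ARM instructions
    if t and not j:
        return "Thumb"      # 16-bit Thumb instructions
    if j and not t:
        return "Jazelle"    # 8-bit Java bytecodes
    return "reserved"       # T=1 and J=1 is not covered by the three states


print(processor_state(0x00000013), processor_state(0x00000033),
      processor_state(0x01000013))
```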
Pipelining in ARM7TDMI
• ARM cores rely on pipelining; the RISC philosophy keeps each instruction simple and shifts complexity to the compiler, which makes instructions easy to pipeline.
• Each stage takes one cycle, so an instruction passes through n stages in n cycles.
• The ARM7 uses a 3-stage pipeline
• Pipelining speeds up execution:
• The next instruction is fetched while earlier instructions are being decoded and executed
• The pipeline stages are:
• FETCH: loads an instruction from memory into the instruction pipeline
• DECODE: identifies the instruction to be executed
• EXECUTE: processes the instruction and writes the result back to a register
Pipelining in ARM7TDMI
Three instructions are placed in the pipeline sequentially: ADD, SUB, and CMP.
• Cycle 1: the core fetches ADD from memory and puts it in the instruction pipeline
• Cycle 2: the core fetches SUB and decodes ADD
• Cycle 3: the core fetches CMP, decodes SUB, and executes ADD
• This procedure is called filling the pipeline
Once the pipeline is full, the core completes one instruction every cycle.
Latency is three cycles, but throughput is one instruction per cycle.
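The cycle-by-cycle filling of the pipeline can be reproduced with a small Python model. It is an idealised sketch that assumes every stage takes exactly one cycle and that there are no stalls or branches; pipeline_trace and STAGES are hypothetical names.

```python
STAGES = ("fetch", "decode", "execute")


def pipeline_trace(instructions):
    """Return one dict per cycle showing which instruction is in each stage."""
    n = len(instructions)
    trace = []
    for cycle in range(n + len(STAGES) - 1):
        row = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth          # instruction that has reached this stage
            row[stage] = instructions[index] if 0 <= index < n else None
        trace.append(row)
    return trace


for number, row in enumerate(pipeline_trace(["ADD", "SUB", "CMP"]), start=1):
    print("cycle", number, row)
```

In this model the third cycle shows CMP in fetch, SUB in decode, and ADD in execute, and n instructions finish in n + 2 cycles, which matches a three-cycle latency with a throughput of one instruction per cycle.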
EXTRAS
Barrel Shifter
A barrel shifter is a digital circuit that can shift a binary number by a specified number
of bits in one clock cycle.
• Barrel shifter can be implemented by a combination of multiplexers
• 2 types – arithmetic and logical shifter
A few examples of barrel shifter applications:
• In Digital Signal Processing, barrel shifters are used to perform fast multiplication
and division operations. For example, in a FIR filter implementation, a barrel shifter
can be used to shift the filter coefficients based on the filter order.
• In Cryptography, barrel shifters are used to perform bitwise operations, such as
encryption and decryption. For example, a barrel shifter can be used to perform a
circular shift on a binary value to improve the security of the encryption algorithm.
• In Microprocessor Architectures, barrel shifters are used to shift the contents of
registers, allowing for efficient data manipulation. For example, in the ARM
architecture, the barrel shifter is used to perform shift and rotate operations on the
contents of registers.
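The shift and rotate operations mentioned above can be sketched on 32-bit values in Python. The function names follow the usual ARM mnemonics (LSL, LSR, ASR, ROR), but the code is only a software model that assumes shift amounts between 0 and 31; it says nothing about how the hardware is built.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, amount):
    """Logical shift left: zeros enter from the right."""
    return (value << amount) & MASK32


def lsr(value, amount):
    """Logical shift right: zeros enter from the left."""
    return (value & MASK32) >> amount


def asr(value, amount):
    """Arithmetic shift right: the sign bit is copied in from the left."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32          # reinterpret as a negative two's-complement number
    return (value >> amount) & MASK32


def ror(value, amount):
    """Rotate right: bits leaving on the right re-enter on the left."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


print(hex(lsl(3, 4)), hex(asr(0x80000000, 4)), hex(ror(0x12345678, 8)))
```

A left shift by n multiplies an unsigned value by 2 to the power n (as long as no bits are lost), which is why shifters are used for fast multiplication and division by powers of two.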
Extras Load-store architecture
• A load-store architecture is a type of computer architecture where all data
processing operations (such as arithmetic, logical, and control operations) are
performed only on data that is loaded from memory into registers, and the results
are stored back into memory. In other words, the only operations that directly access
memory are load and store operations.
• A CISC machine is typically a register-memory architecture: operands may come
from registers or from memory. A RISC machine, by contrast, is a register-register
(load-store) architecture.
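A toy Python model of the load-store idea: only load and store touch memory, and the add works purely on registers. The addresses, register count, and function names are hypothetical and chosen only for illustration.

```python
memory = {0x100: 7, 0x104: 5, 0x108: 0}   # hypothetical addresses and contents
regs = [0] * 4                            # registers r0..r3


def load(rd, addr):
    """Memory -> register (the only way data enters the register file)."""
    regs[rd] = memory[addr]


def store(rs, addr):
    """Register -> memory (the only way results reach memory)."""
    memory[addr] = regs[rs]


def add(rd, rn, rm):
    """Data processing: operands and result are all registers."""
    regs[rd] = (regs[rn] + regs[rm]) & 0xFFFFFFFF


# Adding two memory operands takes four instructions on a load-store machine.
load(0, 0x100)
load(1, 0x104)
add(2, 0, 1)
store(2, 0x108)
print(memory[0x108])
```

A register-memory machine could express the same work with fewer instructions, because an arithmetic instruction there may name a memory operand directly.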


ARM PROCESSING BASICS PPT FOR 4TH SEM ENGINEERING

  • 1. Unit - I ARM Processor Fundamentals
  • 2. Introduction • ARM processors are a family of central processing units (CPUs) based on a reduced instruction set computer (RISC) architecture. ARM stands for Advanced RISC Machine. History of ARM processors: • x86 is an older architectural approach (CISC - complex instruction set computer) , first x86 CPU design launched in 1978. • “Microcomputers" (PCs) evolved  high performance and a smaller design became a challenge. • Early 1980s, Acorn Computers designed microcomputers  performance limitations with chip design. • Around 1981, University of California, Berkeley project  resource usage with computer chips. • Processing units have certain predefined operations collectively called instruction sets. • Most programs used only a small subset of the instruction set. Reducing the number of predefined instructions—cutting out complex and hard to implement (and little used) instructions —the remaining simple instructions would run faster and take up much less power and space on the chip.  RISC Dr. Shachi P, Dept. of ECE, BMSCE 2
  • 3. History of ARM processors • X86(Intel) design have a modular approach based on a motherboard with swappable components. The CPU and other components—such as graphics cards and GPUs, memory controllers, storage, or processing cores—are optimized for specific functions and can be easily swapped out or expanded. • However, these hardware components are typically more homogenized system architectures, which can allow hackers to quickly breach and attack systems with "write once, run anywhere" exploits. • In ARM-based processor, CPU cores and other hardware functions (like I/O bus controllers such as peripheral component interconnect) are on the same physical platform, and all of the different functions are integrated together through an internal bus.  SoC • x86 chips are designed to optimize performance; • ARM-based processors are designed to balance cost with smaller sizes, lower power consumption, lower heat generation, speed, and potentially longer battery life. Dr. Shachi P, Dept. of ECE, BMSCE 3
  • 4. ARM Partnership Model Dr. Shachi P, Dept. of ECE, BMSCE 4
  • 5. Microprocessors vs. Microcontrollers Microprocessors Microcontrollers A silicon chip representing a Central Processing Unit(CPU), which is capable of performing arithmetic as well as logical operations according to a pre-defined set of Instructions. A microcontroller is a highly integrated chip that contains a CPU, RAM, Special and General purpose Register Arrays, On Chip ROM/FLASH memory for program storage, Timer and Interrupt control units and dedicated I/O ports It is a dependent unit. It requires the combination of other chips like Timers, Program and data memory chips, Interrupt controllers etc. for functioning. It is a self contained unit and doesn’t require external Interrupt Controller, Timer, and UART etc. for its functioning. Most of the time general purpose in design and operation. Mostly application oriented or domain specific. Doesn’t contain a built in I/O port. The I/O Port functionality needs to be implemented with the help of external Programmable Peripheral Interface Chips Most of the processors contain multiple built-in I/O ports which can be operated as a single 8 or 16 or 32 bit Port or as individual port pins. Targeted for high end market where performance is important. Targeted for embedded market where performance is not so critical. Limited power saving options. Includes lot of power saving features Dr. Shachi P, Dept. of ECE, BMSCE 5
  • 6. RISC V/S CISC Processors/Controllers: RISC Processors/Controllers CISC Processors/Controllers Lesser no. of instructions. Greater no. of Instructions. Instruction Pipelining and increased execution speed. Generally no instruction pipelining feature. Operations are performed on registers only, the only memory operations are load and store Operations are performed on registers or memory depending on the instruction Large number of registers are available Limited no. of general purpose registers Programmer needs to write more code to execute a task since the instructions are simpler ones. A programmer can achieve the desired functionality with a single instruction. Single, Fixed length Instructions. Variable length Instructions. Less Silicon usage and pin count. More silicon usage. With Harvard Architecture Harvard or Von-Neumann Architecture Dr. Shachi P, Dept. of ECE, BMSCE 6
  • 7. CISC vs. RISC - Instruction set • The terms CISC and RISC refer to design principles and techniques. RISC: Reduced instruction set computers • Simple instructions require a small number of basic steps to execute. • For a processor that has only simple instructions, a large number of instructions may be needed to perform a given programming task. • This could lead to a large value of N and a small value for S. 'N' is the total number of steps required to complete program execution. 'S' is the average number of basic steps each instruction execution requires. • It is much easier to implement efficient pipelining in processors with simple instruction sets. CISC: Complex instruction set computers • Complex instructions involve a large number of steps. • If individual instructions perform more complex operations, fewer instructions will be needed, leading to a lower value of N and a larger value of S. • Complex instructions combined with pipelining would achieve good performance. Dr. Shachi P, Dept. of ECE, BMSCE 7
  • 8. Harvard V/s Von-Neumann Processor/Controller Architecture Harvard Architecture(ARM) Von-Neumann Architecture(x86) Microprocessors/controllers based on the Harvard architecture will have separate data bus and instruction bus. This allows the data transfer and program fetching to occur simultaneously on both buses. Microprocessors/controllers based on the Von- Neumann architecture shares a single bus for fetching both instructions and data. Program instructions & data are stored in a common main memory. Separate buses for Instruction and Data fetching. Single shared bus for Instruction and Data fetching. Easier to Pipeline, so high performance can be achieved. Low performance Compared to Harvard Architecture. Comparatively high cost. Cheaper. No memory alignment problems Allows self modifying codes Dr. Shachi P, Dept. of ECE, BMSCE 8
  • 9. Unit - I • Basic Structure of computers- Von Neumann and Harvard Architecture, Basic Processing Unit, Bus Structure, RISC and CISC Architecture, RISC and ARM Design philosophy, ARM core Dataflow model, programming model, processor states and operating modes, ARM pipeline. Dr. Shachi P, Dept. of ECE, BMSCE 9
  • 10. Computer Types • Since their introduction in the 1940s, digital computers have evolved into many different types that vary widely in size, cost, computational power, and intended use. • Modern computers can be divided roughly into four general categories: • 1. Embedded computers are integrated into a larger device or system in order to automatically monitor and control a physical process or environment. • They are used for a specific purpose rather than for general processing tasks. • Ex: industrial and home automation, appliances, telecommunication products, and vehicles Dr. Shachi P, Dept. of ECE, BMSCE 10
  • 11. Computer Types • 2. Personal computers have achieved widespread use in homes, educational institutions, and business and engineering office settings, primarily for dedicated individual use. • They support a variety of applications such as general computation, document preparation, computer-aided design, audio visual entertainment, interpersonal communication, and Internet browsing. A number of classifications are used for personal computers. • Desktop computers serve general needs and fit within a typical personal workspace. • Workstation computers offer higher computational capacity and more powerful graphical display capabilities for engineering and scientific work. • Portable and Notebook computers provide the basic features of a personal computer in a smaller lightweight package. They can operate on batteries to provide mobility Dr. Shachi P, Dept. of ECE, BMSCE 11
  • 12. Computer Types 3.3. Servers and Enterprise systems are large computers that are meant to be shared by a potentially large number of users who access them from some form of personal computer over a public or private network. • Such computers may host large databases and provide information processing for a government agency or a commercial organization. 4.Supercomputers and Grid computers normally offer the highest performance. They are the most expensive and physically the largest category of computers. • Supercomputers are used for the highly demanding computations needed in weather forecasting, engineering design and simulation, • and scientific work. Dr. Shachi P, Dept. of ECE, BMSCE 12
  • 13. Functional Units of a Computer Computer consists of five functionally independent main parts: 1. Input 2. Memory 3. Arithmetic and logic 4. Output 5. Control units Dr. Shachi P, Dept. of ECE, BMSCE 13
  • 14. Functional Units of a Computer • The input unit accepts coded information from human operators using devices such as keyboards, or from other computers over digital communication lines. • The information received is stored in the computer’s memory, either for later use or to be processed immediately by the arithmetic and logic unit. • The processing steps are specified by a program that is also stored in the memory. • Finally, the results are sent back to the outside world through the output unit. • All of these actions are coordinated by the control unit. • An interconnection network provides the means for the functional units to exchange information and coordinate their actions. Dr. Shachi P, Dept. of ECE, BMSCE 14
  • 15. • The information handled by a computer is categorize as either instructions or data. • Instructions, or machine instructions, are explicit commands that 1. Govern the transfer of information within a computer as well as between the computer and its I/O devices. 2. Specify the arithmetic and logic operations to be performed • A program is a list of instructions which performs a task. Programs are stored in the memory. • The processor fetches the program instructions from the memory, one after another, and performs the desired operations. • The computer is controlled by the stored program, except for possible external interruption by an operator or by I/O devices connected to it. • Data are numbers and characters that are used as operands by the instructions. Data are also stored in the memory. • The instructions and data handled by a computer must be encoded in a suitable format. Dr. Shachi P, Dept. of ECE, BMSCE 15 Functional Units of a Computer
  • 16. Memory Unit • The function of the memory unit is to store programs and data. There are two classes of storage, called primary and secondary. Primary Memory • Primary memory (main memory) is a fast memory that operates at electronic speeds. • Programs must be stored in this memory while they are being executed. • Semiconductor storage cells, each capable of storing one bit of information. These cells are handled in groups of fixed size called words. • The memory is organized  one word can be stored/retrieved in one basic operation. • Number of bits in each word word length (typically 16, 32, or 64 bits). Dr. Shachi P, Dept. of ECE, BMSCE 16 Functional Units of a Computer
  • 17. • To provide easy access to any word in the memory, a distinct address is associated with each word location. • Addresses are consecutive numbers, starting from 0, that identify successive locations. • A memory in which any location can be accessed in a short and fixed amount of time after specifying its address is called a random-access memory (RAM). • The time required to access one word is called the memory access time. • This time is independent of the location of the word being accessed. It typically ranges from a few nanoseconds (ns) to about 100 ns for current RAM units. Dr. Shachi P, Dept. of ECE, BMSCE 17 Functional Units of a Computer
  • 18. Cache Memory • As an adjunct to the main memory, a smaller, faster RAM unit, called a cache, is used to hold sections of a program that are currently being executed, along with any associated data. • The cache is tightly coupled with the processor and is usually contained on the same integrated-circuit chip. • The purpose of the cache is to facilitate high instruction execution rates. • As execution proceeds, instructions are fetched into the processor chip, and a copy of each is placed in the cache. • If the required data located in the main memory, the data are fetched and copies are also placed in the cache. Dr. Shachi P, Dept. of ECE, BMSCE 18 Functional Units of a Computer
  • 19. Secondary Storage • Although primary memory is essential, it tends to be expensive and does not retain information when power is turned off. • Thus additional, less expensive, permanent secondary storage is used when large amounts of data and many programs have to be stored, particularly for information that is accessed infrequently. • Access times for secondary storage are longer than for primary memory. • Examples: magnetic disks, optical disks (DVD and CD), and flash memory devices. • https://www.youtube.com/watch?v=7J7X7aZvMXQ (up to 3:17seconds) Dr. Shachi P, Dept. of ECE, BMSCE 19 Functional Units of a Computer
  • 20. • When operands are brought into the processor, they are stored in high- speed storage elements called registers. • Each register can store one word of data. • Access times to registers are even shorter than access times to the cache unit on the processor chip. Control Unit • Control circuits are responsible for generating the timing signals that govern the transfers and determine when a given action is to take place. • Data transfers between the processor and the memory are also managed by the control unit through timing signals. Dr. Shachi P, Dept. of ECE, BMSCE 20 Functional Units of a Computer
  • 21. The Basic Operational Concepts of a Computer • To perform a given task an appropriate program consisting of a list of instructions is stored in the memory. Individual instructions are brought from the memory into the processor, which executes the specified operations. Data to be stored are also stored in the memory. • Examples: - Add LOCA, R0 • This instruction adds the operand at memory location LOCA, to operand in register R0 & places the sum into register. This instruction requires the performance of several steps, 1. First the instruction is fetched from the memory into the processor. 2. The operand at LOCA is fetched and added to the contents of R0 3. Finally the resulting sum is stored in the register R0 • The preceding add instruction combines a memory access operation with an ALU Operations. In some other type of computers, these two types of operations are performed by separate instructions for performance reasons. • Load LOCA, R1 • Add R1, R0 • Transfers between the memory and the processor are started by sending the address of the memory location to be accessed to the memory unit and issuing the appropriate control signals. The data are then transferred to or from the memory. Dr. Shachi P, Dept. of ECE, BMSCE 21
  • 22. Dr. Shachi P, Dept. of ECE, BMSCE 22 Connections between the processor and the memory • Besides IR and PC, there are n-general purpose registers R0 through Rn-1. Memory Address Register (MAR): It holds the address of the location to be accessed. Memory Data Register (MDR): It contains the data to be written into or read out of the address location. • The instruction register (IR) • Holds the instructions that is currently being executed. Its output is available for the control circuits which generates the timing signals that control the various processing elements in one execution of instruction. • The program counter PC: This is another specialized register that keeps track of execution of a program. It contains the memory address of the next instruction to be fetched and executed.
  • 23. Operating steps for Program execution 1. Execution of the program (stored in memory) starts when the PC is set to point to the first instruction of the program. 2. The contents of the PC are transferred to the MAR and a Read control signal is sent to the memory. 3. The addressed word is read out of the memory and loaded into the MDR. Next, the contents of the MDR are transferred to the IR. At this point, the instruction is ready to be decoded and executed. 4. If the instruction involves an operation to be performed by the ALU, it is necessary to obtain the required operands. 5. If an operand resides in memory (it could also be in a general purpose register in the processor), it has to be fetched by sending its address to the MAR and initiating a Read cycle. 6. When the operand has been read from the memory into the MDR, it is transferred from the MDR to ALU. 7. After one or more operands are fetched in this way, the ALU can perform the desired operation. 8. If the result of the operation is to be stored in the memory, then the result is entered in to the MDR. Dr. Shachi P, Dept. of ECE, BMSCE 23
  • 24. 9.The address of the location where the result is to be stored is sent to the MAR, and a write cycle is initiated. 10.At some point during the execution of the current instruction, the contents of the PC are incremented so that the PC points to the next instruction to be executed. 11. Thus, as soon as the execution of the current instruction is completed, a new instruction fetch may be started. 12. In addition to transferring data between the memory and the processor, the computer accepts data from input devices and sends data to output devices. Thus, some machine instructions with the ability to handle I/O transfers are provided. • Normal execution of a program may be preempted (temporarily interrupted) if some devices require urgent servicing, to do this one device raises an Interrupt signal. • An interrupt is a request signal from an I/O device for service by the processor. The processor provides the requested service by executing an appropriate interrupt service routine. • The Diversion may change the internal state of the processor. Its state must be saved in the memory location before interruption. When the interrupt-routine service is completed the state of the processor is restored so that the interrupted program may continue. Dr. Shachi P, Dept. of ECE, BMSCE 24
  • 25. Bus Structures BUS: A group of lines(wires) that serves as a connecting path for several devices of a computer is called a bus. The following are different types of busses: 1. Address Bus 2. Data Bus 3. Control Bus • The Data bus Carries(transfer) data from one component (source) to other component (destination) connected to it. The data bus consists of 8, 16, 32 or more parallel signal lines. The data bus lines are bi-directional i.e., CPU can read data on these lines from memory or from a port, as well as send data out on these lines to a memory location. • The Address bus is the set of lines that carry(transfer) address information about to which memory address, the data is to be transferred to or from. It is an unidirectional bus. The address bus consists of 16, 20, 24 or more parallel signal lines. On these lines CPU sends out the address of the memory location. • The Control Bus carries the Control and timing information. Dr. Shachi P, Dept. of ECE, BMSCE 25
  • 26. Bus Structures Following are the other types of busses. • System Bus: A System Bus is usually a combination of address bus, data bus, and control bus. • Internal Bus: The bus that operates only with the internal circuitry of the CPU. • External Bus: Buses which connects computer to external devices • I/O Bus: The bus used by I/O devices to communicate with the CPU • Synchronous Bus: While using Synchronous bus, data transmission between source and destination units takes place in a given timeslot which is already known to these units. Dr. Shachi P, Dept. of ECE, BMSCE 26
  • 27. Bus Structures • Asynchronous Bus: In this case the data transmission is governed by a special concept. That is handshaking control signals. • Handshaking (either software codes or hardware signals) is used to halt transmission of data from the sending computer until the receiving computer has emptied the buffer. • Handshaking is a I/O control method to synchronize I/O devices with the microprocessor. • As many I/O devices accepts or release information at a much slower rate than the microprocessor, this method is used to control the microprocessor to work with a I/O device at the I/O devices data transfer rate. Dr. Shachi P, Dept. of ECE, BMSCE 27
  • 28. The Bus interconnection Scheme Dr. Shachi P, Dept. of ECE, BMSCE 28 1. Bus is a connecting path for several devices of a computer 2. In addition to the lines that carry the data, the bus must have lines for address and control purposes.
  • 29. Single bus structure • The simplest way to interconnect functional units is to use a single bus, as shown below. • All units are connected to this bus. The bus can be used for only one transfer at a time. Bus control lines are used to arbitrate multiple requests for use of the bus. ADVANTAGE • Low-cost and its flexibility for attaching peripheral devices DISADVANTAGE • Low-performance because at time only one transfer • Scalability: As computer systems become more complex and require higher bandwidth for data transfer, a single bus structure may struggle to scale efficiently. • Contention: Contention for the bus can occur when multiple components attempt to access it simultaneously, leading to delays and potential performance issues. Dr. Shachi P, Dept. of ECE, BMSCE 29
  • 30. Traditional / Multiple bus Structure: • Advantages: better performance, scalable, less contention • Disadvantage: increased cost and complexity. Dr. Shachi P, Dept. of ECE, BMSCE 30
  • 31. Traditional / Multiple bus Structure: • Traditional / Multiple bus Structure: • There is a local bus that connects the processor to cache memory and that may support one or more local devices. • There is also a cache memory controller that connects this cache not only to this local bus but also to the system bus. On the system, the bus is attached to the main memory modules. • I/O transfers to and from the main memory across the system bus do not interfere with the processor’s activity. • An expansion bus interface buffers data transfers between the system bus and the I/O controllers on the expansion bus. • I/O devices that might be attached to the expansion bus include: Network cards (LAN), SCSI (Small Computer System Interface), Modem, etc.. Dr. Shachi P, Dept. of ECE, BMSCE 31
  • 32. Basic Processing Unit • A computing task consists of a series of operations specified by a sequence of machine-language instructions that constitute a program. • The processor fetches one instruction at a time and performs the operation specified. Instructions are fetched from successive memory locations until a branch or a jump instruction is encountered. • The processor uses the program counter, PC, to keep track of the address of the next instruction to be fetched and executed. • After fetching an instruction, the contents of the PC are updated to point to the next instruction in sequence. A branch instruction may cause a different value to be loaded into the PC. • When an instruction is fetched, it is placed in the instruction register, IR, from where it is interpreted, or decoded, by the processor’s control circuitry. The IR holds the instruction until its execution is completed. • Consider a 32-bit RISC-style instruction set architecture.
  • 33. Basic Processing Unit • To execute an instruction, the processor has to perform the following steps: 1. Fetch the contents of the memory location pointed to by the PC. The contents of this location are the instruction to be executed; hence they are loaded into the IR. • In register transfer notation, the required action is IR ← [[PC]] 2. Increment the PC to point to the next instruction. Assuming that the memory is byte addressable and each instruction is 32 bits long, the PC is incremented by 4; that is, PC ← [PC] + 4 3. Carry out the operation specified by the instruction in the IR.
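The fetch and increment steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `memory` dictionary, the `fetch` helper, and the placeholder instruction words are hypothetical and are not part of the slides.

```python
# Hypothetical memory image: keys are byte addresses, values are 32-bit
# instruction words. The values are arbitrary placeholders, not real encodings.
memory = {0x00: 0x11111111, 0x04: 0x22222222, 0x08: 0x33333333}


def fetch(pc, memory):
    """Perform one fetch step and return (IR, updated PC)."""
    ir = memory[pc]  # IR <- [[PC]]: load the word the PC points to
    pc = pc + 4      # PC <- [PC] + 4: 32-bit instructions, byte-addressable memory
    return ir, pc


pc = 0x00
ir, pc = fetch(pc, memory)        # first instruction word is now in IR
ir_next, pc_next = fetch(pc, memory)  # second fetch continues from the new PC
```

A branch would simply overwrite the returned PC with a different address before the next fetch.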
  • 34. Basic Processing Unit • The operation specified by an instruction can be carried out by performing one or more of the following actions: • Read the contents of a given memory location and load them into a processor register. • Read data from one or more processor registers. • Perform an arithmetic or logic operation and place the result into a processor register. • Store data from a processor register into a given memory location.
  • 35. Basic Processing Unit • The processor communicates with the memory through the processor-memory interface, which transfers data from and to the memory during Read and Write operations. • The instruction address generator updates the contents of the PC after every instruction is fetched. • The register file is a memory unit whose storage locations are organized to form the processor’s general-purpose registers. • During execution, the contents of the registers named in an instruction that performs an arithmetic or logic operation are sent to the arithmetic and logic unit (ALU), which performs the required computation. • The results of the computation are stored in a register in the register file. • The clock period, which is the time between two successive rising edges, must be long enough to allow the combinational circuit to produce the correct result.
  • 36. RISC and ARM Design Philosophy The RISC Design Philosophy • Instructions – reduced in number and simpler • Pipeline • Registers – a large number of general-purpose registers (each can hold data or an address) • Load/Store architecture – any data in memory that is to be processed is first moved into registers and then processed. ARM Design Philosophy • Power efficiency • High code density • Memory footprint / die area • Hardware debug technology
  • 37. The RISC Design Philosophy Dr. Shachi P, Dept. of ECE, BMSCE 37
  • 38. Nomenclature • ARM7TDMI-S Dr. Shachi P, Dept. of ECE, BMSCE 38
  • 39. ARM7TDMI Features • 32-bit data bus / ALU • 32-bit instructions / address bus • Aligned memory • Von Neumann architecture • 3-stage pipeline • 37 registers, 32 bits each • Load-store model • 7 operating modes • 7 exceptions • 7 addressing modes • 3 data formats
  • 40. ARM ISA Features • ARM ISA differs from pure RISC • Variable execution cycle for certain instructions • In-line barrel shifter leading to more complex instructions. • Thumb instruction set • Conditional execution • Enhanced instructions with DSP extension Dr. Shachi P, Dept. of ECE, BMSCE 40
  • 41. Data Sizes and Instruction Sets ■ The ARM is a 32-bit architecture. ■ When used in relation to the ARM: ■ Byte means 8 bits ■ Halfword means 16 bits (two bytes) ■ Word means 32 bits (four bytes) ■ Most ARM cores implement two instruction sets ■ 32-bit ARM Instruction Set ■ 16-bit Thumb Instruction Set ■ Jazelle cores can also execute Java bytecode
  • 42. ARM core Dataflow model • MOVS r7, r5, LSL #2 • MLA{<cond>}{S} R0,R1,R2,R3 • LDR r0, [r1, #4]! • STRH r0,[r1,#0x4]! • LDRSB r0,[r1] Dr. Shachi P, Dept. of ECE, BMSCE 42
  • 43. Registers What is a register? • data holding places that are part of the computer processor • high-speed storage units • storage locations that can be accessed by the CPU directly Difference between memory and register • A register holds the data or instruction that the CPU is currently processing. • Memory stores the data and instructions that the processor may require during operation.
  • 44. Registers (contd.) • ARM has 37 registers (all are 32 bits long) • 1 dedicated program counter • 1 dedicated current program status register (CPSR) • 5 dedicated saved program status registers (SPSR) • 30 general purpose registers • Out of the 37, at most 18 are active at any one time • 16 data registers (r0-r15), which hold either data or an address • 2 processor status registers (cpsr and spsr) • r13: stack pointer • r14: link register • r15: program counter
  • 45. Registers (contd.) • Register r13: • used as the stack pointer (sp) • stores the head of the stack in the current processor mode. • Register r14: • the link register (lr) • where the core puts the return address whenever it calls a subroutine. • Register r15: • the program counter (pc) • holds the address of the next instruction to be fetched by the processor • These registers are distributed across several register banks; which ones are in use depends on the mode in which the ARM processor is operating
  • 46. Banked Registers • Banked registers are registers that are hidden from a program at different times; they are identified by the shading in the diagram • Available only when the processor is in a particular mode • The mode can be selected by writing directly to the mode bits of the cpsr (the core must be in a privileged mode) • The mode can also be changed by hardware when the core responds to an exception or interrupt • A banked register maps one-to-one onto a user mode register • If the processor mode is changed, a banked register from the new mode replaces the existing register • The Saved Program Status Register (SPSR) stores the current value of the CPSR when an exception is taken, so that the CPSR can be restored after the exception has been handled.
  • 47. • Exceptions and interrupts suspend the normal execution of sequential instructions and jump to a specific location. • The following exceptions and interrupts cause a mode change: • Reset • Interrupt request • Fast interrupt request • Software interrupt • Data abort • Prefetch abort • Undefined instruction • A new register appears in interrupt request mode: the saved program status register (spsr), which stores the previous mode’s cpsr • The spsr can only be modified and read in a privileged mode. There is no spsr available in user mode. • The cpsr is not copied into the spsr when a mode change is forced by a program writing directly to the cpsr. • The cpsr is saved only when an exception or interrupt is raised.
  • 48. Current Program Status Register • Used to monitor and control internal operations. It is a 32-bit register that resides in the register file. • The CPSR is divided into four fields, each 8 bits wide: • Flags • Status • Extension • Control • The control field contains the processor mode, state, and interrupt mask bits. • The flags field contains the condition flags. • Some ARM processor cores have extra bits allocated. • The J bit, in the flags field, is used in Jazelle-enabled processors.
  • 49. Interrupt Masks • Used to stop specific interrupt requests from interrupting the processor • Two interrupt request levels in the ARM core: • Interrupt Request (IRQ) • Fast Interrupt Request (FIQ) • The CPSR has two interrupt mask bits: • I: when set to 1, it masks IRQ requests • F: when set to 1, it masks FIQ requests Conditional Flags There are four condition flags in the ARM7TDMI. They are held in the CPSR: ⚫ N: Negative flag (result is negative) ⚫ Z: Zero flag ⚫ C: Carry flag ⚫ V: Overflow flag
  • 50. Condition Flags • Updated by comparisons and by the results of ALU operations • Data-processing instructions update the flags only when they carry the S suffix; comparison instructions such as CMP always update them • E.g., SUBS sets Z=1 if the result is zero • Q: used in cores with DSP extensions • Indicates an overflow or saturation caused by an enhanced DSP instruction • It is a sticky flag: it can be set only by hardware • It can be cleared by writing to the CPSR directly • ARM instructions support conditional execution • This is based on the values stored in the condition flags [Ref. table, next slide] Note 1: • When a bit is 1, its letter is shown in upper case • When a bit is 0, its letter is shown in lower case Figure: CPSR with both Jazelle and DSP extensions set Note 2: • Condition flags: a capital letter indicates that the flag is set • Interrupts: a capital letter indicates that the interrupt is disabled/masked
  • 51. Conditional Execution • Controls whether or not the core will execute an instruction • Before execution, the processor compares the condition attribute with the flags in the CPSR • If they match, the instruction is executed • If not, the instruction is ignored • The condition attribute is appended to the instruction mnemonic [REFER TABLE] • If no condition attribute is present, the default is AL (Always)
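The check described on this slide can be sketched as a lookup from condition mnemonic to a test on the N, Z, C and V flags. The table on the following slide did not survive in this transcript, so the mapping below uses the standard ARM condition codes as I understand them; the names `CONDITIONS` and `condition_passed` are hypothetical helpers, not anything defined in the slides.

```python
# Each entry maps a condition mnemonic to a test on the flags (n, z, c, v).
CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,             # equal
    "NE": lambda n, z, c, v: z == 0,             # not equal
    "CS": lambda n, z, c, v: c == 1,             # carry set / unsigned higher or same
    "CC": lambda n, z, c, v: c == 0,             # carry clear / unsigned lower
    "MI": lambda n, z, c, v: n == 1,             # negative
    "PL": lambda n, z, c, v: n == 0,             # positive or zero
    "VS": lambda n, z, c, v: v == 1,             # overflow
    "VC": lambda n, z, c, v: v == 0,             # no overflow
    "HI": lambda n, z, c, v: c == 1 and z == 0,  # unsigned higher
    "LS": lambda n, z, c, v: c == 0 or z == 1,   # unsigned lower or same
    "GE": lambda n, z, c, v: n == v,             # signed greater than or equal
    "LT": lambda n, z, c, v: n != v,             # signed less than
    "GT": lambda n, z, c, v: z == 0 and n == v,  # signed greater than
    "LE": lambda n, z, c, v: z == 1 or n != v,   # signed less than or equal
    "AL": lambda n, z, c, v: True,               # always (the default)
}


def condition_passed(cond, n, z, c, v):
    """Return True if an instruction with condition `cond` would execute."""
    return CONDITIONS[cond or "AL"](n, z, c, v)


# With Z=1 (for example after a SUBS whose result was zero), an EQ-conditioned
# instruction executes and an NE-conditioned one is ignored.
runs_eq = condition_passed("EQ", n=0, z=1, c=1, v=0)
runs_ne = condition_passed("NE", n=0, z=1, c=1, v=0)
```

Passing an empty mnemonic falls back to AL, mirroring the default stated on the slide.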
  • 53. On power-up, the processor operates by default in supervisor mode, which is a privileged mode.
  • 54. Processor Modes The processor mode determines which registers are active and the access rights to the cpsr register itself. • Each processor mode is either privileged or nonprivileged • A privileged mode allows full read-write access to the cpsr • A nonprivileged mode allows only read access to the control field in the cpsr, but still allows read-write access to the condition flags.
  • 57. State and Instruction Sets • The state defines which instruction set is being executed • It is selected using the control bits of the CPSR register • 3 states: • ARM: the default state, selected when T=J=0; ARM instructions are executed • Thumb: selected when T=1; 16-bit Thumb instructions are executed • Jazelle: selected when J=1; the 8-bit Jazelle instruction set is selected, which is used to execute Java bytecodes • The state can be changed by executing a branch instruction (for example, BX switches between ARM and Thumb state)
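As a rough illustration of how the state, interrupt masks and mode are read out of the CPSR, the sketch below decodes a 32-bit CPSR value. The bit positions (N, Z, C, V in bits 31 to 28, J in bit 24, I, F and T in bits 7, 6 and 5, mode in bits 4 to 0) and the mode encodings are taken from the ARM architecture as I understand it, since the CPSR figure is missing from this transcript; `decode_cpsr` and `MODES` are hypothetical names.

```python
# Mode field encodings (CPSR bits 4..0), assumed from the ARM architecture.
MODES = {
    0b10000: "usr", 0b10001: "fiq", 0b10010: "irq", 0b10011: "svc",
    0b10111: "abt", 0b11011: "und", 0b11111: "sys",
}


def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into flags, interrupt masks, state and mode."""
    t = (cpsr >> 5) & 1
    j = (cpsr >> 24) & 1
    if j:
        state = "Jazelle"
    elif t:
        state = "Thumb"
    else:
        state = "ARM"  # default state: T = J = 0
    return {
        "N": (cpsr >> 31) & 1,
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
        "irq_masked": bool((cpsr >> 7) & 1),  # I bit
        "fiq_masked": bool((cpsr >> 6) & 1),  # F bit
        "state": state,
        "mode": MODES.get(cpsr & 0x1F, "unknown"),
    }


# 0xD3: I=1, F=1, T=0, mode bits 10011 (supervisor mode, ARM state).
svc_example = decode_cpsr(0x000000D3)
# 0x30: T=1, mode bits 10000 (user mode, Thumb state).
thumb_example = decode_cpsr(0x00000030)
```

The first example corresponds to the power-up situation mentioned earlier: supervisor mode, ARM state, with both interrupt types masked.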
  • 59. Pipelining in ARM7TDMI • ARM cores rely on pipelining because the RISC approach keeps instructions simple and places the emphasis on compiler complexity. • Each stage takes 1 cycle, so an instruction passes through n stages in n cycles. • ARM7 uses a 3-stage pipeline • The pipeline speeds up execution • The next instruction is fetched while the earlier instructions are being decoded and executed • The pipeline stages are • FETCH: loads an instruction from memory into the instruction pipeline • DECODE: identifies the instruction to be executed • EXECUTE: processes the instruction and writes the result back to a register
  • 60. Pipelining in ARM7TDMI Three instructions are in the pipeline. Instructions are placed in the pipeline sequentially. • Cycle 1: the core fetches the ADD instruction from memory and puts it in the instruction pipeline • Cycle 2: the core fetches the SUB instruction and decodes the ADD instruction • Cycle 3: the core fetches the CMP instruction, decodes the SUB instruction and executes the ADD instruction • This procedure is called FILLING THE PIPELINE. The pipeline allows the core to execute an instruction every cycle. Latency is 3 cycles, but throughput is one instruction per cycle.
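The filling of the 3-stage pipeline can be reproduced with a small cycle-by-cycle trace. This is a simplified sketch that assumes every stage takes exactly one cycle and that nothing stalls or branches; `pipeline_trace` is a hypothetical helper written for this illustration.

```python
def pipeline_trace(instructions, cycles):
    """Return a list of (cycle, fetch, decode, execute) tuples."""

    def at(index):
        # Instruction occupying a stage, or None if the stage is empty.
        return instructions[index] if 0 <= index < len(instructions) else None

    trace = []
    for cycle in range(1, cycles + 1):
        fetch = at(cycle - 1)    # newest instruction enters FETCH
        decode = at(cycle - 2)   # fetched one cycle earlier
        execute = at(cycle - 3)  # fetched two cycles earlier
        trace.append((cycle, fetch, decode, execute))
    return trace


trace = pipeline_trace(["ADD", "SUB", "CMP"], cycles=5)
for cycle, fetch, decode, execute in trace:
    print(f"cycle {cycle}: fetch={fetch} decode={decode} execute={execute}")
```

ADD reaches EXECUTE in cycle 3, which is the 3-cycle latency, and from then on one instruction completes per cycle until the pipeline drains.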
  • 61. EXTRAS Dr. Shachi P, Dept. of ECE, BMSCE 61
  • 62. Barrel Shifter A barrel shifter is a digital circuit that can shift a binary number by a specified number of bits in one clock cycle. • Barrel shifter can be implemented by a combination of multiplexers • 2 types – arithmetic and logical shifter A few examples of barrel shifter applications: • In Digital Signal Processing, barrel shifters are used to perform fast multiplication and division operations. For example, in a FIR filter implementation, a barrel shifter can be used to shift the filter coefficients based on the filter order. • In Cryptography, barrel shifters are used to perform bitwise operations, such as encryption and decryption. For example, a barrel shifter can be used to perform a circular shift on a binary value to improve the security of the encryption algorithm. • In Microprocessor Architectures, barrel shifters are used to shift the contents of registers, allowing for efficient data manipulation. For example, in the ARM architecture, the barrel shifter is used to perform shift and rotate operations on the contents of registers. Dr. Shachi P, Dept. of ECE, BMSCE 62
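The shift and rotate operations that a 32-bit barrel shifter provides can be modelled in software as follows. This is a behavioural sketch only, assuming 32-bit unsigned inputs and shift amounts from 0 to 31; the function names follow the ARM shift mnemonics (LSL, LSR, ASR, ROR) but are otherwise hypothetical.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, amount):
    """Logical shift left: zeros enter from the right."""
    return (value << amount) & MASK32


def lsr(value, amount):
    """Logical shift right: zeros enter from the left."""
    return (value & MASK32) >> amount


def asr(value, amount):
    """Arithmetic shift right: the sign bit (bit 31) is replicated."""
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret as a negative two's complement number
    return (value >> amount) & MASK32


def ror(value, amount):
    """Rotate right: bits leaving on the right re-enter on the left."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


# A left shift by 2 multiplies by 4, as in an operand written "r5, LSL #2".
shifted = lsl(5, 2)
```

In hardware all of these complete in a single cycle through layers of multiplexers; the Python versions only reproduce the resulting bit patterns.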
  • 63. Extras Load-store architecture • A load-store architecture is a type of computer architecture where all data processing operations (such as arithmetic, logical, and control operations) are performed only on data that has been loaded from memory into registers, and the results are stored back into memory. In other words, the only operations that directly access memory are load and store operations. • A CISC machine is typically a register-memory architecture, in which operands may come from registers or from memory; a RISC machine, by contrast, is a register-register (or load-store) architecture.

Editor's Notes

  • #9: Self-modifying code was more common in earlier computing systems when memory and processing resources were more constrained.
  • #17: A memory address a is said to be n-byte aligned when a is a multiple of n (where n is a power of 2). In this context, a byte is the smallest unit of memory access, i.e. each memory address specifies a different byte. An n-byte aligned address would have a minimum of log2(n) least-significant zeros when expressed in binary.
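The alignment rule in this note reduces to a single bit-mask test. The sketch below is a minimal illustration; `is_aligned` is a hypothetical helper name.

```python
def is_aligned(address, n):
    """Return True if `address` is n-byte aligned (n must be a power of 2)."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of 2")
    # A multiple of n has its log2(n) least-significant bits equal to zero.
    return address & (n - 1) == 0


word_ok = is_aligned(0x1004, 4)   # multiple of 4: word aligned
word_bad = is_aligned(0x1006, 4)  # not a multiple of 4
half_ok = is_aligned(0x1006, 2)   # but it is halfword aligned
```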
  • #28: Synchronous - Transfer of large text files, Chatrooms, Video conferencing, Telephone conversations Asynchronous – Emails, Forums, Radios
  • #39: The debug extensions provide the mechanism by which normal operation of the processor can be suspended for debug. The 32x8 multiplier reduced the number of cycles required for a multiplication of two registers (32-bit * 32-bit) to a few cycles (data dependent). In-circuit emulation (ICE) is the use of a hardware device, or in-circuit emulator, to debug the software of an embedded system. ARM7TDMI (without the "-S" extension) was initially designed as a hard macro, meaning that the physical design at the transistor layout level was done by ARM, and licensees took this fixed physical block and placed it into their chip designs. A processor design distributed to licensees as an RTL description (such as ARM7TDMI-S) is, by contrast, described as "synthesizable". https://developer.arm.com/documentation/ka001209/latest/
  • #40: 3 data formats: word, half-word and byte. A memory access is said to be aligned when the data being accessed is n bytes long and the datum address is n-byte aligned. The ARM7TDMI core has a Von Neumann architecture, with a single 32-bit data bus carrying both instructions and data. Only load, store, and swap instructions can access data from memory.
  • #51: The V flag works the same as the C flag, but for signed operations. For example, 0x7fffffff is the largest positive two's complement integer that can be represented in 32 bits, so 0x7fffffff + 0x7fffffff triggers a signed overflow, but not an unsigned overflow (or carry): the result, 0xfffffffe, is correct if interpreted as an unsigned quantity, but represents a negative value (-2) if interpreted as a signed quantity.
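The example in this note can be checked with a small model of a 32-bit addition that also computes the N, Z, C and V flags. This is a sketch under the assumption that both operands are already 32-bit unsigned values; `add_with_flags` is a hypothetical helper.

```python
MASK32 = 0xFFFFFFFF


def add_with_flags(a, b):
    """Add two 32-bit values and return (result, n, z, c, v)."""
    full = a + b
    result = full & MASK32
    n = result >> 31             # sign bit of the result
    z = int(result == 0)         # result is zero
    c = int(full > MASK32)       # carry out of bit 31 (unsigned overflow)
    # Signed overflow: both operands have the same sign and the result's
    # sign differs from it.
    v = ((a ^ result) & (b ^ result)) >> 31
    return result, n, z, c, v


result, n, z, c, v = add_with_flags(0x7FFFFFFF, 0x7FFFFFFF)
print(hex(result), n, z, c, v)
```

For the operands in the note the result is 0xfffffffe with V set and C clear: a signed overflow without an unsigned carry.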