
EMBEDDED SYSTEMS

The teacher, who is indeed wise, does not
bid you to enter the house of his wisdom
but rather leads you to the threshold of
your mind.
—Khalil Gibran

EMBEDDED SYSTEMS
An Integrated Approach
LYLA B. DAS
Department of Electronics and Communication Engineering
National Institute of Technology Calicut
Kozhikode, Kerala

Copyright © 2013 Dorling Kindersley (India) Pvt. Ltd.
Licensees of Pearson Education in South Asia
No part of this eBook may be used or reproduced in any manner whatsoever without the publisher’s prior
written consent.
This eBook may or may not include all assets that were part of the print version. The publisher reserves the
right to remove any material present in this eBook at any time.
ISBN 9788131787663
eISBN 9789332511675
Head Office: A-8(A), Sector 62, Knowledge Boulevard, 7th Floor, NOIDA 201 309, India
Registered Office: 11 Local Shopping Centre, Panchsheel Park, New Delhi 110 017, India

This book is dedicated
to my children
and
to all my students

Contents
Preface
About the Author
Part I Design Aspects of Embedded Systems
0 Basics of Computer Architecture and the Binary Number System
0.1 Basics of Computer Architecture
0.2 Computer Languages
0.3 RISC and CISC Architectures
0.4 Number Systems
0.5 Number Format Conversions
0.6 Computer Arithmetic
0.7 Units of Memory Capacity
Key Points of this Chapter
Questions
Exercises
1 Introduction to Embedded Systems
1.1 Application Domain of Embedded Systems
1.2 Desirable Features and General Characteristics of Embedded Systems
1.3 Model of an Embedded System
1.4 Microprocessor vs Microcontroller
1.5 Example of a Simple Embedded System
1.6 Figures of Merit for an Embedded System
1.7 Classification of MCUs: 4/8/16/32 Bits
1.8 History of Embedded Systems
1.9 Current Trends
Questions
Exercises
2 Embedded Systems: The Hardware Point of View
2.1 Microcontroller Unit (MCU)
2.2 A Popular 8-bit MCU
2.3 Memory for Embedded Systems
2.4 Low Power Design
2.5 Pullup and Pulldown Resistors
Questions
Exercises
3 Sensors, ADCs and Actuators
3.1 Sensors
3.2 Analog to Digital Converters
3.3 Actuators
Questions
Exercises
4 Examples of Embedded Systems
4.1 Mobile Phone
4.2 Automotive Electronics
4.3 Radio Frequency Identification (RFID)
4.4 Wireless Sensor Networks (WISENET)
4.5 Robotics
4.6 Biomedical Applications
4.7 Brain Machine Interface
Questions
Exercises
5 Buses and Protocols
5.1 Defining Buses and Protocols
5.2 On-board Buses for Embedded Systems
5.3 External Buses
5.4 Automotive Buses
5.5 Wireless Communications Protocols
Questions
Exercises
6 Software Development Tools
6.1 Embedded Program Development
6.2 Downloading the Hex File to the Non-volatile Memory
6.3 Hardware Simulator
Questions
Exercises
Part II Software Design Aspects
7 Operating System Concepts
7.1 Embedded Operating Systems
7.2 Network Operating Systems (NOS)
7.3 Layers of an Operating System
7.4 History of Operating Systems
7.5 Functions Performed by an OS (Components of an OS)
7.6 Some Terms Associated with Operating Systems and Computer Usage
7.7 The Kernel
7.8 Tasks/Processes
7.9 Scheduling Algorithms
7.10 Threads
7.11 Interrupt Handling
7.12 Inter Process (Task) Communications (IPC)
7.13 Task Synchronization
7.14 Semaphores
7.15 Priority Inversion
7.16 Device Drivers
7.17 Codes/Pseudo Codes for OS Functions
Questions
Exercises
8 Real-time Operating Systems
8.1 Real-time Tasks
8.2 Real-time Systems
8.3 Types of Real-time Tasks
8.4 Real-time Operating Systems
8.5 Real-time Scheduling Algorithms
8.6 Rate Monotonic Algorithm
8.7 The Earliest Deadline First Algorithm
8.8 Qualities of a Good RTOS
Questions
Exercises
9 Programming in Embedded C
9.1 Embedded C
9.2 PIC Programming Using MPLAB
Questions
Exercises
Part III Popular Microcontrollers Used in Embedded Systems
10 ARM: The World’s Most Popular 32-bit Embedded Processor (Part I – Architecture and Assembly Language Programming)
10.1 History of the ARM Processor
10.2 ARM Architecture
10.3 Interrupt Vector Table
10.4 Programming the ARM Processor
10.5 ARM Assembly Language
10.6 ARM Instruction Set
10.7 Conditional Execution
10.8 Arithmetic Instructions
10.9 Logical Instructions
10.10 Compare Instructions
10.11 Multiplication
10.12 Division
10.13 Starting Assembly Language Programming
10.14 General Structure of an Assembly Language Line
10.15 Writing Assembly Programs
10.16 Branch Instructions
10.17 Loading Constants
10.18 Load and Store Instructions
10.19 Readonly and Read/Write Memory
10.20 Multiple Register Load and Store
Questions
Exercises
11 ARM: The World’s Most Popular 32-bit Embedded Processor (Part II – Peripheral Programming of ARM MCU Using C)
11.1 Block Diagram
11.2 Features of the LPC 214x Family
11.3 Peripherals
11.4 ARM 9
11.5 ARM Cortex-M3
Questions
Exercises
12 Cypress’s PSoC: A Different Kind of MCU
12.1 How to get a PSoC Development Kit
12.2 The PSoC Family
12.3 PSoC1
12.4 The Internal Architecture of PSoC
12.5 The Digital Sub System
12.6 GPIO Pins
12.7 Digital Applications Using PSoC
12.8 The Analog Section
12.9 System Resources
12.10 PSoC3 and PSoC5
Questions
Exercises
13 The 8051 Microcontroller: The Programmer’s Perspective
13.1 History and Family Details of 8051
13.2 8051: The Programmer’s Perspective
13.3 Assembly Language Programming
13.4 Internal RAM
13.5 The 8051 Stack
13.6 Processor Status Word (PSW)
13.7 Assembler Directives
13.8 Storing Data in Code Memory (ROM)
13.9 The Instruction Set of 8051
13.10 Port Programming
13.11 Subroutines (Procedures)
13.12 Delay Loops
Questions
Exercises
14 Programming the Peripherals of 8051
14.1 Pin Configuration of 8051
14.2 Programming the Internal Peripherals
14.3 Timers of 8051
14.4 Counter Programming
14.5 Interrupts of 8051
14.6 Serial Communication
Questions
Exercises
15 DSP Processors
15.1 The Application Scenario
15.2 General Features of Digital Signal Processors
15.3 SIMD Techniques
15.4 The SHARC Floating Point Processor
15.5 DSP Processors of Texas Instruments (TI)
15.6 OMAP (Open Multimedia Applications Platform)
Questions
Exercises
Part IV Design and Performance Aspects
16 Automated Design of Digital ICs
16.1 History of Integrated Circuit (IC) Design
16.2 Types of Digital ICs
16.3 ASIC Design
16.4 ASIC Design: The Complete Sequence
Questions
Exercises
17 Hardware Software Co-design and Embedded Product Development Lifecycle Management
17.1 Hardware Software Co-design
17.2 Modelling of Systems
17.3 Embedded Product Development Lifecycle Management
17.4 Lifecycle Models
Questions
Exercises
18 Embedded Design: A Systems Perspective
18.1 A Typical Example
18.2 Product Design
18.3 The Design Process
18.4 Testing
18.5 Bulk Manufacturing
Questions
Exercises
Part V Projects
19 Academic Projects
19.1 Project No: 1
Questions
Exercises
Appendix A
Appendix B
Appendix C
Appendix D
Bibliography
Index

Preface
Preamble
Writing a book on Embedded Systems is not easy; let me list a few reasons to substantiate this
statement. The first reason is that the field of embedded systems is very vast. The second is that there
is no clear understanding of what exactly a student of engineering should learn about embedded
systems. A great number of products which are classed as embedded systems are available, and the field
is very sophisticated, well developed and rapidly expanding. Anything from a printer to an iPhone is
an embedded system. To write a book on all this is quite difficult on account of not having a clear idea
of where to start and where to end.
To complicate matters further, there are different families of embedded processors. A student
cannot be expected to learn all of them, or even some of them. To make a decision on what to
include and what not to has been difficult. Besides that, ‘embedded processors’ is not the only topic
to learn. There is a large set of various kinds of sensors, actuators, buses, operating systems, design
methodologies, viewpoints, development models and what not.
But after a lot of contemplation, I finally converged on a few popular and upcoming processors,
the latest buses, new approaches, traditional as well as modern peripherals, real time operating systems and
the like. A lot of literature on all these is available in the form of technical documents, data sheets and
user manuals, right from the USB technical spec to PSoC’s data sheets.
Rummaging through all this highly sophisticated technical information, trying to make sense of
it all, and finally presenting it in a way that a student, albeit an eager and enthusiastic one, will be able
to enjoy reading and studying: this is the challenge involved in writing this book. I have tried my
best to address this challenge of making it a student-friendly presentation.
There are a number of books available under the title of ‘Embedded Systems’. Except for a few,
most of them have simply concentrated on the architecture and application details of one particular
processor. Others have concentrated on the software aspects alone. There are certain others that deal
with both, but since the field of embedded systems is one in which fast evolution is the rule rather than
the exception, some topics become outdated quite fast.
Approach
I have started from the hardware basics, proceeded to discuss some important processors and systems,
and then moved on to the software aspects. The book ends with a presentation on embedded design
from a system point of view. Along with the basics, I have also tried to focus on the latest and most
relevant topics in the field, from the latest processors and buses to the latest trends in embedded
computing.
Pre-requisite
A student of the CS, EC or EE branch who has done a first course in digital logic and a second course
in ‘microprocessors and microcontrollers’ is best placed to take up a course on Embedded Systems.

But it is possible also to study Embedded Systems as a second course—that is why
some very basic ideas of microprocessors and microcontrollers are included in the initial
chapters.
Organization of the Book
This book is organized as twenty chapters numbered from 0 to 19. It is divided into five
logical parts from Part I to Part V.
Part I
This part, which includes chapters 0 to 6, deals with the basics, the hardware aspects
including sensors, actuators, buses etc. and the tools commonly used in system
development.
Chapter 0 is a revision of computer arithmetic and computer architecture. One needs
to be very thorough in these two basic topics—then the path ahead becomes very
comfortable.
Chapter 1 introduces readers to what an embedded system is, and what its mandatory
parts are. Examples of practical and popularly used embedded systems are listed to make
the introduction clear. The classifications, history and current trends in the embedded
industry are also touched upon.
Chapter 2 is a very important chapter. Any student who needs to use or learn embedded
hardware should become conversant with, and confident about, all the topics covered in this
chapter. Not only are the important aspects of typical embedded processors covered here,
but related topics such as semiconductor memory (RAM and Flash), low power design, and the
concepts of pullup and pulldown resistors are also touched upon.
Chapter 3 is very important for practical design of systems. Most students are likely to
do hardware based projects as part of academic requirements, and this chapter, which gives
an in-depth discussion on sensors and actuators, will definitely find use then.
Chapter 4 is meant as light reading on some of the applications of embedded systems.
Mobile phones, robotics, RFIDs, automotive electronics, medical electronics etc. are
discussed as popular applications. A new idea called ‘brain machine interface’ is also
introduced in this chapter.
Chapter 5 covers a very important topic. It contains explanations of
some of the popular buses used in embedded systems. A student is not expected to study
all buses in detail, but a general idea of buses, and a study of some of the important ones,
is advised. On-board and off-board buses, wired and wireless buses, bus standards, bus
arbitration etc. are the important topics covered here.
Chapter 6 is a brief introduction to the development tools that are needed to take a project
to completion. The discussion is meant to guide students in the right direction when
they are confused about the techniques for writing programs, testing them and burning
them into hardware.

Part II
This is the second part of the book, and there are three chapters here. This part deals with
software design aspects. Chapter 7 is quite lengthy, but it must be studied
because it gives answers to many aspects of computers and embedded systems that are
seen and experienced in everyday life. This chapter covers operating system concepts in
detail and then offers codes/pseudo codes where OS concepts are tried out.
Chapter 8 is about ‘real time operating systems’. This should be considered as a continuation
of the previous chapter, but here the special requirements and scheduling policies
for a special class of embedded systems, i.e., real time systems, are taken up. Numerical
problems are worked in both these chapters to understand the scheduling mechanism
used in operating systems.
Chapter 9 is a short chapter, being at best a basic introduction to Embedded C. It
assumes that the reader has a basic knowledge of the constructs of ‘C’. How this high-level
language is used for processor programming is the focus of the discussion, which
is based on the 8051 architecture. Some codes for PIC are also included. Later chapters
on PSoC and ARM contain more coding using Embedded C, but basic ideas are introduced
here, in Chapter 9.
Part III
This part consists of Chapter 10 to Chapter 15. The architecture, programming and
applications of some of the most widely used and popular processors are covered in
reasonable depth.
Chapters 10 and 11 are devoted to the ARM processor, which is the most popular processor
used in 32-bit and high-end applications. Chapter 10 explains the core of ARM
and follows it up with assembly language programming. Chapter 11 expands ARM
architecture to make it a microcontroller. A specific ARM-based MCU is chosen and
its peripherals are studied. Programming of some peripherals using C is done. These two
chapters are likely to be sufficient to get a good grip on ARM architecture.
Chapter 12 is about a new processor. It is not new in the embedded design world, but
the academic world is just getting familiarized with it – the chapter discusses PSoC,
an MCU series, which makes life easier for a product designer. This is because of the
graphical IDE it has, and other special features that are covered (with programming
examples in Embedded C) in the chapter.
Chapters 13 and 14 are about one of the most widely used 8-bit microcontrollers, i.e., the
8051. This MCU is simple and the first that a student should study. These two chapters
discuss this MCU with assembly language programming. All the peripherals are covered,
and programming is explained with worked-out examples.
Chapter 15 contains a general coverage of DSP processors. Such processors are increasing
in relevance and the time is just right to learn the special features of such chips.
The special features of such processors are first explained, and then the features of some
popular DSP processors (BlackFin, SHARC, OMAP etc.) are elaborated.

Part IV
This part includes Chapters 16, 17 and 18.
Chapter 16 deals with ASIC design. It starts with the classification of digital ICs,
continues with programmable devices and then gives a step-by-step explanation of how
a digital IC is designed, tested and fabricated. The reader can get a good idea of what
terms such as front-end design, back-end design etc. mean, without going very deep into
the process of ASIC design.
Chapter 17 introduces two new terms. One is ‘Hardware Software Co-design’ and
the other is ‘Embedded Product Development Lifecycle’. Both these terms are explained
and elaborated upon, with relevant examples.
Chapter 18 is very special. After all the previous chapters, it looks upon an embedded
product as a system, and suggests the steps needed to apply embedded systems to
make useful products as demanded by users. The user/users might have their viewpoint
expressed before the design starts. New concepts like user research, ergonomics, anthropometry
etc. are introduced. Starting from the desires of users, the design steps reach the
final stage of product manufacture and testing.
Part V
This part has just one chapter. Chapter 19 has a concise discussion of three projects done
by students. The projects pertain to embedded hardware and software and use advanced
processors (ARM, OMAP and PIC). This chapter is meant to encourage students to
take up challenging and innovative ideas and build products based on these ideas.
Appendices
The book has appendices from A to K. Only Appendices A to D are in the text book.
The rest are available on the website of the book, www.pearsoned.co.in/lylabdas/embeddedsystems.
The contents of the appendices are as listed:
A - The instruction set of 8051
B - A step-by-step guide to using the Keil RVDK for 8051 and ARM
C - A step-by-step guide to using the PSoC Designer
D - Pin configuration and PINSEL register configuration of LPC 2148
E - A manual with experiments for PSoC1
F - A step by step guide to using PSoC Creator
G - A tutorial on Keil RVDK for 8051 and ARM
H - A program for interfacing a Graphical LCD to PSoC3
I - A program for interfacing an SD card to ARM7 (LPC 2148)
J - A program for using the I2HC interface of PSoC1
K - User manual of ARM LPC 2148
In addition, PowerPoint presentations and a solution manual for the chapters are available
for instructors.

Contact
Your suggestions and feedback are welcome. In spite of my best efforts, it is possible that
some errors may have crept in. Please point them out to me.
My contact id is lbd@nitc.ac.in
ACKNOWLEDGEMENTS
This is my second major book, and as I complete it, I would like to acknowledge all those
who have helped and encouraged me in this Herculean task. Truly, it has been a great
effort to write it, and there are a lot of people who have directly or indirectly helped me.
Let me start from the beginning.
The team at Pearson Education provided a lot of inputs, suggestions and support
and brought the project to fruition. I feel that Sojan Jose, my editor, and Ramesh M. R.
and Vijay Pritha, the production editors, have done a tremendous job.
The first batch of students I taught Embedded Systems was the B070EC batch
and the next year, the B080EC batch came in. Both batches expressed enthusiasm and
interest in the topics I taught, and this is the primary factor that gave me the courage to
embark on the venture of writing a book on the subject. I would like to thank each and
every one of them for this.
During the process of writing, a few students helped me directly in bringing the book
to this form. They gave suggestions, performed reviews, and three of them have contributed
by writing a few sections of the book. All of them are working in reputed companies
and I would like to list their names, along with expressing my heartfelt thanks to them.
Nithin Gopinath (Texas Instruments, Bangalore), Sabu Paul (Texas Instruments,
Bangalore) and Sai Krishna K. (Broadcom, Bangalore) are the three who have contributed
directly by writing a few sections in the book.
Those who have reviewed the chapters are: Nithin Gopinath (Texas
Instruments, Bangalore), Jayalal Vijayan (Synopsis, Bangalore), Sai Krishna K. (Broadcom,
Bangalore), Harikrishnan M. (McAfee, Bangalore), Srijit R. (Deloitte, Hyderabad) and
Sushmitha Dandeliya (Assistant Professor, Engineering College, Gwalior).
A few of my colleagues also have helped me in this endeavor, and I extend my
gratitude to them. Raghu C. V., my colleague in the department, has done the writing
of Chapter 18, and has also given me suggestions at various stages of this work. The
names of others with whom I have had discussions on some topics are Sameer S. M.,
Deepthi P. P., Sudheesh George, Bhuvan B. and Rajiv T. R., my department colleagues,
and Jayaraj B. and Anu Mary Chacko of the Computer Science Department. I am grateful
to Anand, senior mechanic at the Embedded Systems Lab, who assisted me in all
the hardware work associated with the book. I thank Beljit, Anju and Aswathi who have
drawn the diagrams in the book. I would also like to make a note of acknowledgement
to my son Sagar, for his suggestions on the theme of the front cover of the book.
Two engineers at Cypress Semiconductors, Narayana Swamy and Geethesh N. S.,
made a detailed review of the chapter on PSoC. Their inputs have enhanced the quality of
the chapter. I am obliged to them and also to Benoy Jose and Karthikeyan Mahalingam
of Cypress Semiconductors for co-ordinating this activity.

It is only because my department gave me free time without the hassles of regular
work that I have been able to complete the book on schedule. All my colleagues have
been helpful in this and I feel that words are not sufficient to express my feelings of
gratitude to all of them. I am deeply indebted to my institute for giving me the freedom
to grow and follow the path I chose.
Chapter 19 of the book contains the project work of a few teams of students. They
have worked systematically and enthusiastically to do projects of good standard, which
require a lot of background study. I congratulate them for the work they have done and
would like to mention their names here. They are: Nithin Gopinath, Jayalal Vijayan,
Ashwin Harikumar, Kurian Abraham, Ebin George, Sushmita Dandeliya, Fahim Bin
Basheer, Jinu J. Alias, Mohammed Favas C., Navas V. and Naveed Farhan K.
I am happy that my family has always been a source of solace for me.
Last, but not least, I thank all my students once again for the inspiration they
have always been, and continue to be.
LYLA B. DAS

About the Author
Lyla B. Das is Associate Professor, Department of Electronics Engineering, National Institute of
Technology Calicut (NITC), Kerala. She has a diverse mix of industrial, teaching and research experience
spanning about 30 years. As a young graduate specializing in Electronics and Communications
from the College of Engineering, Trivandrum, Lyla B. Das joined Keltron Controls as Deputy
Engineer in 1981. She joined NITC (then Regional Engineering College, Calicut), as lecturer in 1985
and proceeded to complete her master’s degree in digital communications from the same college. Over
the years, she was successively elevated as Assistant Professor and then Associate Professor, a position
which she currently holds.
Keen to actively seek and impart knowledge, Lyla B. Das currently teaches courses on microprocessors,
microcontrollers, digital system design using VHDL, and system design using embedded
processors at the undergraduate as well as postgraduate level. She has presented research
papers in conferences of national and international stature and has worked on numerous projects
based on microprocessors and microcontrollers, such as microprocessor-based voting machines and
microcontroller-based rail track switching system. An avid reader of contemporary research material,
she keeps herself abreast of the current trends in her chosen field and guides students in their M.Tech.
research theses. This book on Embedded Systems is her second book with Pearson Education, the first
one being The X86 Microprocessors, which was published in 2010 and received with wide acclaim.
Lyla B. Das has worked on various projects funded by the ministry of human resource development
(MHRD) in thrust areas of growth including the setting up of an embedded systems laboratory
in 2005–2008. She has delivered expert lectures on image compression using wavelets, advanced
microprocessors and microcontrollers, FPGA based systems and embedded systems at several engineering
colleges across Kerala. She has also participated in numerous tutorials and workshops conducted
by the Indian Institute of Technology (IIT) and the Indian Institute of Science (IISc). She was
a Fellow in the national conference on ‘VLSI Design and Embedded Systems’ held at IISc Bangalore
(2003) and IIT Mumbai (2004). She is a life member of the System Society of India and a member of
the Indian Society for Technical Education and the Computer Society of India.


PART I
DESIGN ASPECTS OF EMBEDDED SYSTEMS

0
Basics of Computer Architecture and the Binary Number System
In this chapter, you will learn
The general principles of computer architecture
The operation of the data, address and control buses of a computer
The distinction between RISC and CISC computing
The comparison between assembly and high level language programming
The binary, hexadecimal and BCD number systems
Number format conversions
Chapter-opening image: Firebird robotic platform (Courtesy: Nex Robotics, Mumbai).
0.1 | Basics of Computer Architecture
0.1.1 | The Block Diagram of a Computer
A computer, as its name indicates, is a machine used for computing. Computing, which
many years ago meant arithmetic calculation, now largely means ‘data processing’ on a
large scale. As such, it is more reasonable to designate the computer now as a ‘data
processing machine’. For performing its designated tasks, this machine requires many
components, which can broadly be divided into hardware and software. Hardware is,
obviously, the set of physical constituents of a computer. Software is the collection of programs
that direct the hardware to perform its tasks.
Let us first look at a computer in terms of its hardware. Figure 0.1 shows the architectural
description of a computer system. It shows the major parts of the computer and
also indicates how these parts are connected together to form the computing machine.
The major parts are the CPU, memory and input/output devices.
The heart of a computer is the ‘central processing unit’ (CPU). It is this unit which gives
‘life’ to a computer. The CPU is usually a ‘microprocessor’, which means that it is
a separate and self-contained chip. The CPU processes the data given to it, according
to the programs meant to operate on these data. A program consists of ‘instructions’.
These instructions are decoded by the CPU, which generates the control signals necessary
to activate the arithmetic and logic units of the CPU. As such, the CPU contains the
arithmetic logic unit and the control unit. All these activities are timed and synchronized
by a pulse train of fixed frequency. This is the clock signal, and it also has the job of
synchronizing the activity of the CPU with the activity on the bus.
0.1.2 | The System Bus
A bus is a collection of signal wires which connect the components of the computer
system. Figure 0.2 shows that the CPU is connected to the memory as well as
I/O through the system bus, but only one at a time: if the memory and I/O want to use
the bus at the same time, there is a conflict, as there is only one system bus. The system
bus comprises the address bus, the data bus and the control bus.
The Data Bus: The set of lines used to transfer data is called the data bus. It is a bidirectional
bus, as data has to be sent from the CPU to memory and I/O, and has to be
received as well by the CPU. The width of the data bus governs the data transfer rate and
usually matches the size of the internal registers of the CPU, and hence its processing
capability. In short, it is a reflection of the complexity of the processor. For example, the
8086 has a data bus width of 16 bits, while the 80486 has a 32-bit bus width. Thus the
80486 can process data of 32 bits at a time while the 8086 can only handle 16 bits.
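The effect of data bus width on transfer rate can be illustrated with a small calculation. The sketch below is only an illustration: the function name `bus_transfers` is hypothetical, and it assumes that every bus cycle moves one full, aligned bus width of data, ignoring clock speed and wait states.

```python
def bus_transfers(num_bytes: int, bus_width_bits: int) -> int:
    """Minimum number of bus cycles needed to move num_bytes.

    Assumes each cycle carries one full bus width of aligned data.
    """
    bytes_per_transfer = bus_width_bits // 8
    # Ceiling division: a partial final word still needs one whole cycle.
    return -(-num_bytes // bytes_per_transfer)


# Moving the same 1024-byte block over the two bus widths named in the text.
for name, width in (("8086", 16), ("80486", 32)):
    print(name, bus_transfers(1024, width))
```

Under these assumptions the 32-bit bus needs half as many cycles as the 16-bit bus for the same block of data.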
The Address Bus: The address bus width determines the maximum size of the physical
memory that the CPU can access. With an address bus width of 20 bits, the 8086
can address 2^20 different locations. It can use a memory size of 2^20 bytes or 1 MB. For
the Pentium, with an address bus width of 32 bits, the corresponding number is 2^32 bytes,
i.e., 4 GB. When a particular memory location is to be accessed, the corresponding
address is placed on the address bus by the CPU. I/O devices also have addresses. In
both cases, it is the CPU which supplies the address, and as such, the address bus is
unidirectional.
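The relation between address bus width and memory size is simply a power of two, and can be checked with a few lines of code. This is a minimal sketch; the helper names `addressable_bytes` and `human_readable` are hypothetical, and each address is assumed to select one byte, as in the 8086 and Pentium examples above.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte locations reachable with the given address width."""
    return 2 ** address_bits


def human_readable(num_bytes: int) -> str:
    """Express an exact power-of-two size in bytes, KB, MB, GB or TB (1 KB = 1024 bytes)."""
    units = ["bytes", "KB", "MB", "GB", "TB"]
    value, index = num_bytes, 0
    while value >= 1024 and value % 1024 == 0 and index < len(units) - 1:
        value //= 1024
        index += 1
    return f"{value} {units[index]}"


for bits in (16, 20, 32):
    print(bits, "address lines ->", human_readable(addressable_bytes(bits)))
```

Each additional address line doubles the addressable memory, which is why going from 20 to 32 lines raises the limit from 1 MB to 4 GB.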
Figure 0.1 | The block diagram of a computer (blocks: CPU, Memory, I/O)
The Control Bus: The control bus is a set of control signals which need to be activated
for activities like writing to or reading from memory or I/O, or for special activities of the CPU
like interrupts and DMA. Thus, we see signals like Memory Read, I/O Read, Memory
Write and Interrupt Acknowledge as part of the control bus. These control signals dictate
the actions taking place on the system bus that involve communications with devices like
memory or I/O. For example, the Memory Read signal will be asserted for reading from
memory. It is sent to memory from the processor. A signal such as ‘Interrupt’ is received
by the processor from an I/O device. Hence in the control bus, we have signals traveling
in either direction. Some control lines may be bidirectional too.
Now that we have discussed a computer system in general, let us go a bit deeper into
its individual constituents.
0.1.3 | The Processor
The processor, or the microprocessor as we might call it, is the component responsible for
controlling all the activity in the system. It performs the following three actions
continuously (see Figure 0.3).
i) Fetch an instruction from memory.
ii) Decode the instruction.
iii) Execute the instruction.
When we write a program, it is stored in memory. Our code has to be brought to the
processor for the required action to be performed. The first step, obviously, is to ‘fetch’
it from memory. The next step, decoding, involves interpreting the code to determine
what action is to be performed. After decoding, the required action is performed.
This is termed ‘instruction execution’. The sequence of these three actions is called the
‘execution cycle’. To do all this, the processor has ‘control circuitry’ to fetch and decode
instructions. The ALU part of the processor performs the required arithmetic/logic
operations. The sequence of fetch-decode-execute is repeated continuously and indefinitely
by the processor. An important implication of this cycle is that instruction execution is
‘sequential’ in nature: only after the first instruction has been dealt with is the second
one taken up. However, there are situations when the sequential nature of program
execution is disturbed. This happens when a ‘branch’ instruction appears in the sequence, and
a new sequence of instructions is taken up starting from a new location.
Figure 0.2 | The system bus and its components (Processor, Memory, I/O Interface and I/O Devices on the Address Bus, Data Bus and Control Bus)
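The fetch-decode-execute cycle and the effect of a branch can be sketched with a toy
interpreter. Everything here is hypothetical: the four mnemonics (LOAD, ADD, JMP, HALT),
the single accumulator and the list used as memory are invented for illustration and do
not correspond to any real processor. A HALT instruction is added only so that the
example terminates.

```python
def run(program):
    """Run a toy program; each instruction is a (mnemonic, operand) pair."""
    pc = 0        # program counter: address of the next instruction
    acc = 0       # accumulator register
    trace = []    # addresses in the order they were fetched
    while True:
        instruction = program[pc]       # fetch
        trace.append(pc)
        op, arg = instruction           # decode
        pc += 1                         # default: continue sequentially
        if op == "LOAD":                # execute
            acc = arg
        elif op == "ADD":
            acc += arg
        elif op == "JMP":               # branch: break the sequential order
            pc = arg
        elif op == "HALT":
            return acc, trace
        else:
            raise ValueError(f"unknown instruction {op!r}")


program = [
    ("LOAD", 5),     # address 0
    ("JMP", 3),      # address 1: skip the instruction at address 2
    ("ADD", 100),    # address 2: never executed
    ("ADD", 3),      # address 3
    ("HALT", None),  # address 4
]
result, fetched = run(program)
print(result, fetched)
```

The trace of fetched addresses shows the sequential order (0, 1) being interrupted by the
branch, after which execution resumes sequentially from the new location (3, 4).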
0.1.4 | System Clock
All the activities of the processor and buses are synchronized by a clock which, as
shown in Figure 0.4, is a square wave with a particular frequency. The reciprocal of the
clock frequency is the cycle time T, also called the clock period: T = 1/f, where f is the
clock frequency. An execution cycle may require many clock periods. This depends on
the architectural features of the processor, as well as the complexity of the instruction
to be executed. Since an execution cycle also involves fetching instructions and data
from memory, it also depends on how many clock cycles are needed to access memory.
The time for execution depends on the clock speed as well: for the same processor design,
a clock speed of 3 GHz implies faster processing than a clock of 1 GHz. However, the
technology used for the processor must be able to support the clock frequency used.
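As a quick numerical check of T = 1/f, the sketch below computes the clock period and the
time taken by an operation that needs a given number of clock cycles. The function names
and the choice of a 4-cycle operation are assumptions made for illustration.

```python
def clock_period_ns(frequency_hz: float) -> float:
    """Clock period T = 1/f, expressed in nanoseconds."""
    return 1e9 / frequency_hz


def execution_time_ns(clock_cycles: int, frequency_hz: float) -> float:
    """Time taken by an operation that needs the given number of clock cycles."""
    return clock_cycles * clock_period_ns(frequency_hz)


# Illustrative assumption: an operation that needs 4 clock cycles.
for frequency_hz in (1e9, 3e9):
    print(f"{frequency_hz / 1e9:.0f} GHz: T = {clock_period_ns(frequency_hz):.3f} ns, "
          f"4 cycles take {execution_time_ns(4, frequency_hz):.3f} ns")
```

For the same number of cycles, tripling the clock frequency cuts the execution time to a
third, which is the sense in which a faster clock implies faster processing.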
Figure 0.3 | The execution cycle (F: Fetch, D: Decode, E: Execute; each execution cycle is one F, D, E sequence)
Figure 0.4 | System clock (square wave between levels 0 and 1; one clock cycle marked along the time axis)
0.1.5 | Memory
The memory associated with a computer system includes the primary memory as well
as secondary memory. However, for the time being, we will think of memory as
constituting the primary or main memory only, which is usually RAM (Random Access
Memory). Memory is organized as bytes, and the capacity of a memory chip is stated in
terms of the number of bytes it can store. Thus, we can have chips of size 256 bytes, 1 KB,
1 MB and so on. If a computer has a total memory space of 20 MB, it can use RAM chips
of the available capacities to make up that much memory.
There are two basic operations associated with memory: read and write. Reading
causes the data stored in a memory location to be transferred to the CPU, without erasing
the content in memory. Writing causes new data to be placed in a memory location
(it overwrites the previous value). A certain amount of time is required for these
operations, and this is termed the ‘access time’.
Memory Read Cycle The steps involved in a typical read cycle are:
i) Place on the address bus the address of the memory location whose content is to be
read. This action is performed by the processor.
ii) Assert the memory read signal which is part of the control bus.
iii) Wait until the content of the addressed location appears on the data bus.
iv) Transfer the data on the data bus to the processor.
v) De-activate the memory read signal. The memory read operation is over and the
address on the address bus is not relevant anymore.
Memory Write Cycle As a continuation, let us also examine the steps in a typical write
cycle.
i) Place on the address bus, the address of the location to which data is to be written.
ii) On the data bus, place the data to be written.
iii) Assert the memory write signal which is part of the control bus.
iv) Wait until the data is stored in the addressed location.
v) De-activate the memory write signal. This ends the memory write operation.
At this stage, we should remember that these operations are synchronized with the
system clock. An 8086 processor takes at least four clock cycles for reading or writing. These
four cycles constitute the ‘memory read’ and ‘memory write’ cycles for the processor.
Other processors may require more or fewer clock cycles for the same operations.
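The read and write cycles listed above can be mimicked in software. The sketch below is a
simplified model under our own assumptions: the classes `SystemBus` and `Memory` and their
attribute names are invented, the 'wait' step is reduced to a single method call, and
clock timing is not modelled at all.

```python
class SystemBus:
    """Toy model of the system bus: an address, a data value and two control signals."""

    def __init__(self):
        self.address = None
        self.data = None
        self.mem_read = False
        self.mem_write = False


class Memory:
    def __init__(self, size):
        self.cells = [0] * size

    def respond(self, bus):
        """React to whichever control signal is currently asserted."""
        if bus.mem_read:
            bus.data = self.cells[bus.address]      # reading does not erase the cell
        elif bus.mem_write:
            self.cells[bus.address] = bus.data      # writing overwrites the old value


def read_cycle(bus, memory, address):
    bus.address = address      # i) place the address on the address bus
    bus.mem_read = True        # ii) assert the memory read signal
    memory.respond(bus)        # iii) wait until the content appears on the data bus
    value = bus.data           # iv) transfer the data to the processor
    bus.mem_read = False       # v) de-activate the memory read signal
    return value


def write_cycle(bus, memory, address, value):
    bus.address = address      # i) place the address on the address bus
    bus.data = value           # ii) place the data on the data bus
    bus.mem_write = True       # iii) assert the memory write signal
    memory.respond(bus)        # iv) wait until the data is stored
    bus.mem_write = False      # v) de-activate the memory write signal


bus, memory = SystemBus(), Memory(16)
write_cycle(bus, memory, 5, 0x3C)
print(hex(read_cycle(bus, memory, 5)))
```

Note that in both cycles the processor drives the address, while the data travels in
opposite directions, which matches the unidirectional address bus and bidirectional data
bus described earlier.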
Figure 0.5 | Memory and associated control signals (Address, Read, Write, Data)
0.1.6 | The I/O System
For a computer to communicate with the outside world, it needs what are
called peripherals. Some of these peripherals are purely input devices, like the keyboard
and mouse; some are purely output devices, like the printer and video monitor; and some,
like the modem, transfer data in both directions. In short, such I/O devices
are needed for us to use a computer. However, it is difficult for a processor to deal directly
with I/O devices because of their incompatibility with the processor: each peripheral
is different, and its operating conditions, voltages, speeds and standards are not
understandable to the processor. The processor does not have the necessary control signals to
deal with different peripherals. Hence, the normal practice is for each peripheral to have
a controller which acts as an interface between the peripheral and the processor. This
controller, which may be a special purpose chip, understands the characteristics of the
particular device and provides the control signals needed for the processor to communicate
with the peripheral. Thus, we have specialized controllers for most peripherals, like
the keyboard/display interfacing chip, the parallel port interfacing chip and the serial
communication chip. All these chips are programmable: they have registers for commands,
data and status. By suitably programming these chips, we can get the processor to
communicate correctly with any peripheral. Figure 0.6 shows the use of an I/O interface between
an I/O device and a processor. The processor is not shown in the figure, but the system
bus which comes from the processor is shown.
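To make the idea of command, data and status registers concrete, here is a minimal sketch
of a processor talking to a peripheral through an interface chip. The class
`ToyPrinterInterface`, the READY status bit and the command value 0x01 are all hypothetical
and are not taken from any real controller.

```python
class ToyPrinterInterface:
    """Hypothetical I/O interface with command, status and data registers."""

    READY = 0x01            # invented status bit: the device can accept a byte
    CMD_ENABLE = 0x01       # invented command: switch the device on

    def __init__(self):
        self.command = 0x00
        self.status = 0x00
        self.data = 0x00
        self.printed = []   # stands in for the physical device

    def write_command(self, value):
        self.command = value
        if value == self.CMD_ENABLE:
            self.status |= self.READY

    def write_data(self, value):
        if not (self.status & self.READY):
            raise RuntimeError("device not ready")
        self.data = value
        self.printed.append(chr(value))


def send_text(interface, text):
    """What the processor does: program the chip, then poll status before each byte."""
    interface.write_command(ToyPrinterInterface.CMD_ENABLE)
    for ch in text:
        while not (interface.status & ToyPrinterInterface.READY):
            pass                          # busy-wait until the interface is ready
        interface.write_data(ord(ch))


printer = ToyPrinterInterface()
send_text(printer, "Hi")
print(printer.printed)
```

The processor never deals with the device itself; it only reads and writes the three
registers, which is the role the text assigns to the interface.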
In the final analysis, we can think of the computer that we usually use as a
conglomeration of components which include memory and I/O devices of various types,
applications and specifications.
Figure 0.6 | The I/O system (an I/O interface with Data, Status and Command registers between the system bus and the I/O device)
0.2 | Computer Languages
0.2.1 | Machine Language, Assembly Language and High Level Language
The computer is just a dumb piece of equipment unless we are able to make it work for
us. For that, we must be able to ‘program’ it, so that it will perform the tasks we assign it.
Programming a computer entails the use of a language that the computer understands.
The language native to computers is ‘machine language’, which consists of binary ones
and zeros. The computer knows this language, and the series of ones and zeros fed to it
are ‘operation codes’ for it, which tell it what action is to be performed. Thus there is
one binary code for addition and another one for subtraction. These operation codes are
called ‘opcodes’, and this language is called ‘machine language’. Programming in machine
language means writing the opcodes of the tasks we want the computer to perform.
However, the problem with machine language, as is obvious, is that it is cumbersome
and error-prone. Human beings are not good at remembering or using binary codes.
Programming using machine language is not something that any one of us is likely to
enjoy. To make it easier for us to communicate with computers, there is a language at
a slightly higher level, called ‘assembly language’. This is more intelligible to
users than machine code. This language uses ‘mnemonics’ for specifying the operation the
computer is to perform. These mnemonics are a direct translation of the machine code to
a symbol. For example, the binary code for addition is replaced by the symbol ‘ADD’, and
the binary code for multiplication is given the symbolic name ‘MUL’. The exact mnemonic
used depends on the processor type, but it will be related to the operation to be done.
How does this help? A user does not have to remember binary codes or enter binary
code for programming. The user only needs to remember the symbolic codes and the
associated syntax. We say that assembly coding is at a higher level than machine coding.
However, does the computer understand the mnemonics? No, which means that there should
be an interface between assembly language and machine language. This interface
converts the symbolic codes fed in by the user into machine codes. The software which does
this is called an ‘assembler’.
Since machine language is native to a processor, each processor has its own
machine language and thus its own assembly language as well. Translating from
assembly language to machine language and vice versa is a one-to-one process: each
assembly instruction translates to a unique machine code for a particular processor.
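The one-to-one mapping between mnemonics and machine codes can be pictured as a lookup
table. The sketch below is a toy: the 4-bit opcodes in `TOY_OPCODES` are invented for
illustration and are not the encodings of any real processor, and real assemblers also
handle operands, labels and addressing modes.

```python
# Invented opcode table for a hypothetical processor.
TOY_OPCODES = {"ADD": "0001", "SUB": "0010", "MUL": "0011"}

# Because the mapping is one-to-one, it can be inverted for disassembly.
TOY_MNEMONICS = {code: mnemonic for mnemonic, code in TOY_OPCODES.items()}


def assemble(mnemonics):
    """Translate a list of mnemonics into their binary opcodes."""
    return [TOY_OPCODES[m.strip().upper()] for m in mnemonics]


def disassemble(codes):
    """Translate binary opcodes back into mnemonics."""
    return [TOY_MNEMONICS[c] for c in codes]


machine_code = assemble(["ADD", "MUL", "SUB"])
print(machine_code)
print(disassemble(machine_code))
```

A different processor would need a different table, which is why assembly language is
tied to the processor it was written for.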
However, we human beings always look for easier ways to get things done. So there
are ‘higher level languages’ which have a vocabulary and grammar similar to the
language spoken by us. Such languages are very easy to use because the communication
process is similar to English. We have heard of languages like C, FORTRAN, COBOL and
many other such ‘high level languages’. The features of such languages are that they are
i) easy to understand and write,
ii) not processor specific.
Thus if we write a program in C, we can run it on any processor, as long as a
‘compiler’ for the language is available for that processor. The compiler is the software
which translates the high level language statements to statements in a lower level language.
The lower level may be assembly or machine language. Finally, however, the processor
needs the machine code.
The program that we write in assembly or high level language is called the source
program or source code. A compiler or assembler converts this into object code, which
is ‘executable’ in the sense that the processor understands the code and performs the
tasks indicated.
0.2.2 | Comparison
Programming in machine language is too cumbersome and hence ruled out in the present
world. However, assembly language programming is frequently done, so let us now
make a comparison between assembly language programming and high level language
programming.
Assembly code is specific to a processor, which means that the assembly code of the
8086 does not make any sense to the 8085 (though both are made by Intel). Assembly programs
need the programmer to know the architecture of the processor intimately. The programmer
should know the registers and flags and the way each instruction handles data. So doing
assembly coding involves the study of the concerned processor. However, once this part is
done, the resulting code is very efficient and compact, and executes very fast. A speed
advantage of a hundred times or more is fairly common. Assembly language programming also
gives direct access to key machine features essential for implementing certain kinds of low
level routines, such as an operating system kernel or microkernel, device drivers, and
machine control.
High level language programming, on the other hand, is not processor-specific. It
is easy to learn and master. However, high level languages are abstract. Typically a single
high level instruction is translated into several (sometimes dozens or in rare cases even
hundreds) executable machine language instructions. The object code generated by a
compiler is usually not compact. However, the advantage of high level languages is that
since they are easy to learn, semi-skilled designers can be employed for development
activities, and so development and maintenance times are much shorter.
This book focuses on the x86 architecture and on assembly language programming.
The aim is to impart good assembly language skills and a thorough knowledge of the
x86 architecture.
0.3 | RISC and CISC Architectures
Two terms that are likely to be encountered frequently while reading about computer
architecture are RISC and CISC. RISC stands for Reduced Instruction Set Computer
and CISC means Complex Instruction Set Computer. Since a lot of controversy sur-
rounds these two terms, let us try to find out what it is all about.
In the early days of microprocessor development, the trend was to have complex
instructions implemented fully using hardware. For example, the multiply instruction is
a complex instruction which needs a dedicated hardware multiplier. Because hardware
is fast, execution is fast, but with lots of such complex instructions, the hardware budget
is naturally high. This is the philosophy and the main feature of CISC.
RISC, on the other hand, views this matter in a different way. On average, the
number of complex instructions a computer uses is relatively small. So, why not realize a
complex instruction using a set of simple instructions? This is possible, and the
advantage is that the hardware budget is much lower. The instruction set is also small. However,
software has to be written to realize complex instructions with simple instructions. This
amounts to trading software for hardware.
There exists a long history of controversy regarding which is better. The x86
architecture was based on the CISC philosophy right from the beginning. By the time RISC
principles became popular and software development for RISC became established, the
x86 CISC processors had already carved a niche for themselves in the processor market.
So, even though the supporters of RISC were able to establish their point, most
developers did not want to take the risk of switching over to an untested domain. However, most
of the newer processors used the RISC philosophy for their architectures; examples
are ARM, PowerPC, Sun’s SPARC processors and the like. Many of them found their
applications in the embedded processing field.
The main feature of RISC processors is that they have only simple instructions, each
executed in a single clock cycle. However, there is an irony in that many RISC processors
have as many complex instructions as CISC processors. Probably this can be justified by
explaining that such complex instructions have been implemented using microprogramming
rather than a direct hardware realization. Microprogramming is a method of implementing
the control unit of a computer by breaking down instructions into a sequence of small
programming steps.
While the RISC versus CISC controversy is still raging, the distinction between
what exactly is RISC and what is CISC has narrowed to the extent of the two being almost
indistinguishable except at the basic philosophical level. Intel, which held on to CISC for
many years, bowed to the RISC approach by designing its Pentium Pro with complex
instructions which were internally broken down into simple RISC-like instructions. So the
comment on it is that the Pentium Pro is a RISC processor that runs CISC instructions.
0.4 | Number Systems
Motivation In the study of microprocessors, we will have to use many different num-
ber systems, and conversions from one system to the other. Clarity of these ideas is very
important for correct computation and the right interpretation of results. This is the
motivation for a review on it, though most of you have had an introduction to it already.
We have become quite used to the number system which we call the decimal num-
ber system, which is a system with a base (radix) 10. We are so used to this system of
numbers that our visualization of quantity is always based on this. Our mental faculties
are tuned to perform all calculations in this number system. In contrast, computers are
not comfortable with this system—we know that they use the binary system of numbers
and all computations are done in the binary format. Thus we have a problem when we
use computers to perform computations for us. So let us start this discussion by first
understanding the intricacies of each of the commonly used number systems. We will
discuss the ones that we most often might have to use in the context of computers.
0.4.1 | The Decimal System
The base of this system is 10 (ten), and it naturally follows that there are ten defined
symbols in this system; the combinations of these ten symbols give us various values.
The ten symbols here are 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9, and they are called ‘digits’. The
position of a digit in a number is what gives its value.
For example, how does the number 346 get its value?
$346 = 3 \times 10^2 + 4 \times 10^1 + 6 \times 10^0 = 300 + 40 + 6$
This means that, associated with each position, there is a weight. Here the weight is a
power of 10. Thus
$56785 = 5 \times 10^4 + 6 \times 10^3 + 7 \times 10^2 + 8 \times 10^1 + 5 \times 10^0 = 50000 + 6000 + 700 + 80 + 5$
What about fractional numbers? The positions on the right side of the decimal point
also have weights, but the powers (of 10) are negative.
$6.785 = 6 \times 10^0 + 7 \times 10^{-1} + 8 \times 10^{-2} + 5 \times 10^{-3}$
$= 6 \times 1 + 7 \times 0.1 + 8 \times 0.01 + 5 \times 0.001$
$= 6 + 0.7 + 0.08 + 0.005$
These are things that we know very well. They have been reviewed here to show that the
same concept applies to other number systems as well.
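The idea of positional weights can be written once for any base. The sketch below is an
illustration only; the function name `positional_value` is our own, and exact fractions are
used so that negative powers of the base do not introduce rounding error.

```python
from fractions import Fraction


def positional_value(integer_digits, fraction_digits, base):
    """Sum digit x base^position over all positions, left and right of the point."""
    value = Fraction(0)
    # Positions to the left of the point carry weights base^0, base^1, ...
    for position, digit in enumerate(reversed(integer_digits)):
        value += digit * Fraction(base) ** position
    # Positions to the right of the point carry weights base^-1, base^-2, ...
    for position, digit in enumerate(fraction_digits, start=1):
        value += digit * Fraction(base) ** -position
    return value


print(positional_value([3, 4, 6], [], 10))                  # the number 346
print(float(positional_value([6], [7, 8, 5], 10)))          # the number 6.785
print(float(positional_value([1, 0, 1], [1], 2)))           # binary 101.1
```

Only the `base` argument changes between number systems; the rule for combining digits
and weights stays the same.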
0.4.2 | The Binary Number System
The base of this system is 2, and so it has two symbols, 0 and 1, each of them being called
a bit. So each position has a weight which is a power of 2. Take the number 110110. Let
us find its value. Since there are 6 bits, there are six positions with weights as shown for
the bits:
Power of 2:  $2^5$     $2^4$     $2^3$    $2^2$    $2^1$    $2^0$
Weight:      32        16        8        4        2        1
Number:      1         1         0        1        1        0
Value:       1 × 32  + 1 × 16  + 0 × 8  + 1 × 4  + 1 × 2  + 0 × 1
Adding the values in all the bit positions gives 32 + 16 + 0 + 4 + 2 + 0 = 54. This is the
equivalent value in the decimal system. We cannot help putting back everything into the
decimal system, because this is the number system with which we are most familiar and
comfortable.
Note Sometimes binary numbers are suffixed with B to indicate that they are binary
numbers, e.g., 110110B, 1010110B. Sometimes the notation $110110_2$ is also used.
Note Keep the calculator in the PC (Accessories of Windows) open in the scientific
mode. This will help to verify all the calculations we are going to do from now on.
Next, let us try to understand the concept of fractional binary numbers.
Example 0.1
Find the decimal value of the binary number 1001.011B.
Solution
Power of 2:  $2^3$  $2^2$  $2^1$  $2^0$  .  $2^{-1}$  $2^{-2}$  $2^{-3}$
Weight:      8      4      2      1      .  0.5       0.25      0.125
Number:      1      0      0      1      .  0         1         1
Value:       8   +  0   +  0   +  1      +  0      +  0.25   +  0.125  = 9.375
Example 0.1 shows 1001.011 in binary (often written as $1001.011_2$).
It also shows the power and the weight of each bit position. Thus 1001.011 is
equivalent in decimal to 9.375 (8 + 1 + 0.25 + 0.125).
Notice that this is the sum of $2^3 + 2^0 + 2^{-2} + 2^{-3}$, but $2^2$ and $2^1$ are not added, as
the bits in these positions are 0. The fractional part is composed of $2^{-2}$ and $2^{-3}$; the bit
under $2^{-1}$ is 0, so 0.5 is not added.
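The weighted sum used for binary numbers, including the fractional part, can be sketched
as follows. The function name `binary_to_decimal` is our own; Python's built-in
`int(text, 2)` covers the integer-only case, and the loops are written out here only to
show the weights explicitly.

```python
def binary_to_decimal(text: str) -> float:
    """Convert a binary string such as '1001.011' to its decimal value."""
    integer_part, _, fraction_part = text.partition(".")
    value = 0.0
    # Bits left of the point: weights 2^0, 2^1, 2^2, ... from right to left.
    for power, bit in enumerate(reversed(integer_part)):
        value += int(bit) * 2 ** power
    # Bits right of the point: weights 2^-1, 2^-2, ... from left to right.
    for power, bit in enumerate(fraction_part, start=1):
        value += int(bit) * 2 ** -power
    return value


print(binary_to_decimal("110110"))
print(binary_to_decimal("1001.011"))
```

Binary fractions such as 0.25 and 0.125 are exact in floating point, so the results here
carry no rounding error.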
0.4.3 | The Hexadecimal Number System
Next is the hexadecimal system of numbering, which has 16 symbols, namely 0, 1, 2, 3, 4,
5, 6, 7, 8, 9, A, B, C, D, E, F. The base of the system is 16 and each symbol is called a ‘hex
digit’. Each position in the hexadecimal number system has a weight which is a power of
16. Let us find the value of 240FCH. The letter ‘H’ is suffixed to the number if it is needed
to make clear that it is a hexadecimal number. A to F have the decimal values of 10 to 15.
Power of 16:  $16^4$       $16^3$      $16^2$     $16^1$     $16^0$
Weight:       65536        4096        256        16         1
Number:       2            4           0          F          C
Value:        2 × 65536  + 4 × 4096  + 0 × 256  + 15 × 16  + 12 × 1
i.e.,         131072     + 16384     + 0        + 240      + 12
              = 147708
So we have calculated the equivalent decimal value of the given hex number by using the
concept of positional weights.
Example 0.2
Find the decimal value of the hex number 25.1H.
Solution
Power of 16:  $16^1$    $16^0$   .  $16^{-1}$
Weight:       16        1        .  0.0625
Number:       2         5        .  1
Value:        2 × 16  + 5 × 1    +  1 × 0.0625
i.e.,         32      + 5        +  0.0625
              = 37.0625
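The two hexadecimal calculations above can be reproduced with a short routine. This is a
sketch with a function name of our own choosing; it accepts an optional trailing 'H' in
the style used in this chapter and relies on `int(digit, 16)` to map A to F onto 10 to 15.

```python
def hex_to_decimal(text: str):
    """Convert a hex string such as '240FCH' or '25.1H' to its decimal value."""
    text = text.upper().rstrip("H")            # drop the optional H suffix
    integer_part, _, fraction_part = text.partition(".")
    value = 0
    # Digits left of the point: weights 16^0, 16^1, ... from right to left.
    for power, digit in enumerate(reversed(integer_part)):
        value += int(digit, 16) * 16 ** power
    # Digits right of the point: weights 16^-1, 16^-2, ...
    for power, digit in enumerate(fraction_part, start=1):
        value += int(digit, 16) * 16 ** -power
    return value


print(hex_to_decimal("240FCH"))
print(hex_to_decimal("25.1H"))
```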
There is also an octal system whose base is 8. The equivalent calculations involved in
this are left as an exercise for the interested student. In all the above, we have done the
conversion to decimal form from other number systems. Now we will see how we will
convert a decimal number to other systems of numbering.
Note i) In most computers, the default number system for writing numbers is decimal.
When we mean decimal numbers, we simply write them as they are, like 35,
687 and 234 and so on. A number in hex form is suffixed with the letter H,
for example, 56H, 8FH, 0AH and so on.
ii) The numbers from 0 to 9 are the same in the decimal and the hexadecimal
system. So, in the forthcoming chapters, you will see that no ‘H’ is added when
writing numbers from 0 to 9, though there is nothing wrong in writing 7H,
8H, 01H and so on.
0.5 | Number Format Conversions
0.5.1 | Conversion from Decimal to Binary
The method is to divide the decimal number by 2, until the quotient is 0. See the tech-
nique illustrated below.
Example 0.3
Find the binary value of 13.
Solution
Divide 13 by 2 repeatedly and save the remainders
2)13 remainder = 1
2)6 remainder = 0
2)3 remainder = 1
2)1 remainder = 1
0
Now write the remainders from bottom to top, as one line from left to right. We get
1101 as the converted binary number.
Thus, we have been able to convert from decimal to binary by repeated division by 2, the
base of the binary number system. To verify, try converting this binary number back to
decimal. It should be 13. Or simply use the scientiﬁc calculator to verify the conversion.
(Make sure it is kept open on your PC’s desktop.)
Example 0.4
Convert the number 213 to binary form.
Solution
2)213 remainder = 1
2)106 remainder = 0
2)53 remainder = 1
2)26 remainder = 0
2)13 remainder = 1
2)6 remainder = 0
2)3 remainder = 1
2)1 remainder = 1
0
Now write the remainders from bottom to top in one line, from left to right. The number
is 11010101.
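The repeated-division method used in Examples 0.3 and 0.4 translates directly into a
loop. The function name `decimal_to_binary` is our own; Python's built-in `bin()` gives the
same digits and can be used as a cross-check.

```python
def decimal_to_binary(number: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if number == 0:
        return "0"
    remainders = []
    while number > 0:
        number, remainder = divmod(number, 2)   # quotient carries on, remainder is saved
        remainders.append(str(remainder))
    # The first remainder is the least significant bit, so read them bottom to top.
    return "".join(reversed(remainders))


print(decimal_to_binary(13))
print(decimal_to_binary(213))
```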
0.5.2 | Conversion from Decimal to Hexadecimal
Conversion from decimal to hexadecimal is accomplished by dividing by 16 and finding
the remainders. Remainders ranging from 10 to 15 are written using the hexadecimal
symbols A to F. See how 225 is converted to hexadecimal form.
16)225 remainder = 1
16)14 remainder = E
0
Result = E1
The method is simple: divide the decimal number repeatedly by 16 and keep the
remainders. Do this until the quotient is 0.
Example 0.5
Convert the decimal number 4152 to hexadecimal.
Solution
16)4152 remainder = 8
16)259 remainder = 3
16)16 remainder = 0
16)1 remainder = 1
0
Take the remainders from bottom to top and write them in a single line from left to right.
The number is 1038H.
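The same repeated-division idea works for base 16, with remainders 10 to 15 mapped to
A to F. The sketch below uses names of our own choosing and also returns the intermediate
steps, so that the division ladder of the worked examples can be inspected.

```python
HEX_DIGITS = "0123456789ABCDEF"


def decimal_to_hex(number: int):
    """Return (hex string, list of (dividend, remainder digit) steps)."""
    if number == 0:
        return "0", []
    steps = []
    while number > 0:
        quotient, remainder = divmod(number, 16)
        steps.append((number, HEX_DIGITS[remainder]))
        number = quotient
    # Remainders are read from the last division back to the first.
    digits = "".join(digit for _, digit in reversed(steps))
    return digits, steps


for value in (225, 4152):
    digits, steps = decimal_to_hex(value)
    print(value, "->", digits + "H", steps)
```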
0.5.3 | Converting from Binary to Hexadecimal
If we take any hex digit, note that its decimal value ranges from 0 to 15. For example, F is
15, A is 10 and so on. If a hex digit has to be converted to a binary number, the maximum
number of bits required is 4.
See Table 0.1. Any hex digit can be written as a group of four bits. Taking an example,
4C57FH can be written in binary by writing the equivalent binary of each of its digits:
Hex     4     C     5     7     F
Binary  0100  1100  0101  0111  1111
The binary value of 4C57FH is 01001100010101111111. Looking at both the
representations tells us the biggest problem with binary numbers: they are long and
cumbersome to handle. Putting them into hex form makes the representation short
and concise. We conclude that the binary representation is an expanded form of the
hexadecimal representation, where each hex digit is expanded to its 4-bit binary form.
If we have a long binary number, what we can do to convert it into hex form is to
divide it into groups of 4 bits (starting from the right, i.e., the LSB). Then write the hex
representation of each 4-bit binary group. Try this technique with the following binary
number: 11100101010100011101.
1110 0101 0101 0001 1101 B i.e., binary
E    5    5    1    D    H i.e., hex
Table 0.1 | Hex, Binary and Decimal Representations
Decimal Hex Binary Decimal Hex Binary
0 0 0000 8 8 1000
1 1 0001 9 9 1001
2 2 0010 10 A 1010
3 3 0011 11 B 1011
4 4 0100 12 C 1100
5 5 0101 13 D 1101
6 6 0110 14 E 1110
7 7 0111 15 F 1111
Example 0.6
Convert the following binary number to hex form.
10111110000111110001
Solution
It is the practice to write binary as groups of four with a space between the groups. This
increases the readability of the binary number.
1011 1110 0001 1111 0001
B    E    1    F    1
The equivalent hex number is BE1F1H.
Example 0.7
Convert the following hexadecimal number to binary form.
3AF24H
Solution
Take each hexadecimal digit and write its equivalent four-bit binary value.
3 A F 2 4
0011 1010 1111 0010 0100
From all this, we should realize that the hexadecimal notation is a contracted form of the
binary number representation. Computers do all their processing using binary numbers, but
it is easier for us to represent a binary number in hex form. So when we ask the computer
to add 34H and 5DH, it actually expands these into binary form and does the addition.
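Grouping bits into nibbles, and expanding hex digits back into nibbles, can be sketched as
below. The function names are our own; the binary input is padded with leading zeros so
that grouping effectively starts from the LSB, as the text requires.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits: str) -> str:
    """Group a binary string into nibbles from the right and name each nibble."""
    bits = bits.replace(" ", "")
    width = -(-len(bits) // 4) * 4          # round the length up to a multiple of 4
    padded = bits.zfill(width)              # leading zeros do not change the value
    nibbles = [padded[i:i + 4] for i in range(0, width, 4)]
    return "".join(HEX_DIGITS[int(nibble, 2)] for nibble in nibbles)


def hex_to_binary(digits: str) -> str:
    """Expand each hex digit into its 4-bit group, separated by spaces."""
    return " ".join(format(int(digit, 16), "04b") for digit in digits)


print(binary_to_hex("11100101010100011101"))
print(binary_to_hex("10111110000111110001"))
print(hex_to_binary("3AF24"))
```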
0.5.4 | BCD Numbers
BCD stands for ‘Binary Coded Decimal’ but there is more to it than being just a binary
representation of a decimal number. Let us look into the details.
Decimal numbers are represented by 10 symbols from 0 to 9, each of them being
called a digit. We know the binary code for each of these decimal digits. If we
represent each decimal digit as a byte, it is called ‘unpacked BCD’. Consider the
representation of 9: it is written as 00001001. Now if we want to write 98 in unpacked BCD,
it is written as two bytes:
9        8
00001001 00001000
Thus the binary code of each decimal digit is in one byte.
Packed BCD What, then, is ‘packed BCD’? When each digit is packed into 4 binary
bits, it is packed BCD. Thus 98 is
9    8
1001 1000
Each digit needs a nibble (four bits) to represent it. The packed BCD form of 675 is
0110 0111 0101. The important point to remember is that since there is no digit greater
than 9, no BCD nibble can have a code greater than ‘1001’. Computers do process BCD
numbers, but the user must be aware of the number representation that is being used.
Can we write BCD numbers in hex? Yes, because the hex representation is just a
concise representation of binary numbers. The decimal number 675, when written as
675H, represents the packed BCD in hex form. There is no need to be confused about
this, because the steps involved are:
i) write the binary equivalent of each decimal digit, as a nibble,
ii) write the hex equivalent of each nibble.
675 is
0110 0111 0101
6    7    5    H
Spend a few moments thinking, to make it clear. One important point to keep in mind
is that when we represent BCD in hex form, no digit will ever take the value of A to F,
since decimal digits are limited to 9. So there will never be a BCD number such as 8F5H
or 56DH or A34H.
Example 0.8
Find the binary, hex and packed BCD representation of the decimal numbers 126 and
245. Also write the packed BCD in the hex format.
Solution
Number  Binary     Hex  BCD             BCD in hex form
126     0111 1110  7EH  0001 0010 0110  126H
245     1111 0101  F5H  0010 0100 0101  245H
Example 0.9
Find the packed BCD value of the decimal number 2347659, and represent the BCD
in hex format.
Solution
To find the BCD, each digit is to be coded in 4-bit binary.
Hence 2347659 is
0010 0011 0100 0111 0110 0101 1001, i.e., 2347659H is the hex representation of the
BCD number.
It is very important to keep this in mind when we write programs using BCD arithmetic.
Whenever you have doubts, just refer back to this chapter.
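The difference between unpacked and packed BCD, and the rule that no nibble may exceed
1001, can be checked with the sketch below. The function names are our own and the code is
only an illustration of the encoding, not of BCD arithmetic.

```python
def unpacked_bcd(number: int) -> bytes:
    """One byte per decimal digit, most significant digit first."""
    return bytes(int(digit) for digit in str(number))


def packed_bcd(number: int) -> int:
    """One nibble per decimal digit, packed into a single integer."""
    value = 0
    for digit in str(number):
        value = (value << 4) | int(digit)   # shift in the next 4-bit digit
    return value


def is_valid_packed_bcd(value: int) -> bool:
    """A packed BCD value never contains a nibble greater than 9."""
    while value:
        if (value & 0xF) > 9:
            return False
        value >>= 4
    return True


print(unpacked_bcd(98).hex())               # two bytes: 09 and 08
print(format(packed_bcd(98), "08b"))        # one byte: 1001 1000
print(format(packed_bcd(675), "X") + "H")   # packed BCD written in hex
print(is_valid_packed_bcd(0x8F5))           # contains the nibble F
```

Writing the packed value in hex reproduces the decimal digits, which is why 675 in packed
BCD reads as 675H.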
0.5.5 | ASCII Code
This word pronounced as ‘ask-ee’ is the abbreviation of the words ‘American Standard
Code for Information Interchange’. This is the code used when entering data through
the keyboard and displaying text on the video display. It is very important to know what
it is and how this code is used.
ASCII is a seven-bit code, which is stored as a byte. It has representations for numbers,
lower case and upper case English letters, special characters (like #, ^ and .) and control
characters. For example, there are ASCII codes for ‘new line’, ‘carriage return’ and the
space bar. A number of control characters are related to printing. When we type a character
on the keyboard, it is the ASCII value of the key that is read in. For numeric input, the
computer must convert it from this form to binary form for processing. The list of ASCII
codes is shown in Table 0.2.
Note that the ASCII values of the digits 0 to 9 are 30H to 39H. The ASCII codes of the upper
case letters start from 41H and those of the lower case letters start from 61H. This table will
be needed as a quick reference for various calculations we will do in the programming chapters.
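The conversion from typed ASCII digits to a binary value relies on the fact that the codes
for 0 to 9 are 30H to 39H. The sketch below shows this with Python's `ord()`; the function
name `ascii_digits_to_int` is our own.

```python
def ascii_digits_to_int(text: str) -> int:
    """Convert a string of ASCII digit characters to the number it represents."""
    value = 0
    for ch in text:
        code = ord(ch)                      # ASCII code of the key that was typed
        if not 0x30 <= code <= 0x39:
            raise ValueError(f"{ch!r} is not an ASCII digit")
        value = value * 10 + (code - 0x30)  # subtracting 30H leaves the digit value
    return value


print(hex(ord("0")), hex(ord("9")))   # digits
print(hex(ord("A")), hex(ord("a")))   # first upper and lower case letters
print(ascii_digits_to_int("245"))
```

The gap of 20H between an upper case letter and its lower case form is also visible in
the table, for example 41H for A and 61H for a.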
0.5.6 | Representation of Negative Numbers
There are various ways of representing negative numbers, like signed magnitude, one’s
complement, two’s complement and so on, but we will straightaway discuss the
representation used by computers. Computers use the ‘two’s complement’ representation
for negative numbers. The method is to complement each bit of the number and add ‘1’
to the result. Let us see how it is done.
We will start with 4-bit numbers. Say we want to represent −6.
i) Write the 4-bit binary value of 6: 0110.
ii) Complement each bit: 1001.
iii) Add ‘1’ to this: 1010.
So −6 is ‘1010’, for computers.
Let us try this for all the numbers from 0 to 7. See Table 0.3, which shows the positive
and negative numbers that can be represented in four bits. A number of observations can
be made from Table 0.3.
i) The range of numbers that can be represented by 4 bits is −8 to +7. For an n-bit
number, this range works out to be $-2^{n-1}$ to $+(2^{n-1} - 1)$.
ii) In this notation, the most significant bit (MSB) is considered to be the sign bit. The
MSB for positive numbers is ‘0’ and for negative numbers is ‘1’.
iii) There is a unique representation for 0.
Since we will deal mostly with bytes and words (16-bit) let’s have a feel of 8-bit negative
number representation.
Example 0.10
Find the two’s complement number corresponding to −6 when 6 is represented in 8 bits
as 0000 0110.
Solution
The steps:
0000 0110 ;6 in 8 bits
1111 1001 ;complement each bit
1111 1010 ;add ‘1’ to it
   F    A ;in hex
Thus −6 is FAH in 8-bit form, while it is AH in 4-bit form (from Table 0.3).
Note: H is the notation for ‘hexadecimal’.
Table 0.2 | The ASCII Code—Symbols versus Hex Value
Symbol  Hex    Symbol  Hex    Symbol   Hex    Symbol  Hex
NUL     00     DLE     10     (Space)  20     0       30
SOH     01     DC1     11     !        21     1       31
STX     02     DC2     12     "        22     2       32
ETX     03     DC3     13     #        23     3       33
EOT     04     DC4     14     $        24     4       34
ENQ     05     NAK     15     %        25     5       35
ACK     06     SYN     16     &        26     6       36
BEL     07     ETB     17     '        27     7       37
BS      08     CAN     18     (        28     8       38
Tab     09     EM      19     )        29     9       39
LF      0A     SUB     1A     *        2A     :       3A
VT      0B     ESC     1B     +        2B     ;       3B
FF      0C     FS      1C     ,        2C     <       3C
CR      0D     GS      1D     -        2D     =       3D
SO      0E     RS      1E     .        2E     >       3E
SI      0F     US      1F     /        2F     ?       3F

Symbol  Hex    Symbol  Hex    Symbol   Hex    Symbol  Hex
@       40     P       50     `        60     p       70
A       41     Q       51     a        61     q       71
B       42     R       52     b        62     r       72
C       43     S       53     c        63     s       73
D       44     T       54     d        64     t       74
E       45     U       55     e        65     u       75
F       46     V       56     f        66     v       76
G       47     W       57     g        67     w       77
H       48     X       58     h        68     x       78
I       49     Y       59     i        69     y       79
J       4A     Z       5A     j        6A     z       7A
K       4B     [       5B     k        6B     {       7B
L       4C     \       5C     l        6C     |       7C
M       4D     ]       5D     m        6D     }       7D
N       4E     ^       5E     n        6E     ~       7E
O       4F     _       5F     o        6F     DEL     7F
One very important point to keep in mind is that when a 4-bit number is expanded into
8-bit form, its sign bit has to be extended into the added bits. The sign bit in the 4-bit
representation of −6 is ‘1’. When expanding the number to fill 8 bits, this 1 is replicated
4 more times to fill the whole byte. Thus −6, which is AH in 4-bit form, becomes FAH in
byte form, FFFAH in 16-bit format and FFFF FFFAH in 32-bit format. We need to do this
consciously for negative numbers. For positive numbers, we do it without much thinking.
So +6 is 0110, which expands to 0000 0110 (byte) or 06H, 0006H in 16-bit format and
0000 0006H in 32-bit format. Note that for positive numbers the sign bit is 0; effectively
we are doing sign extension here too. This concept of ‘sign extension’ is important and we
will deal with it in greater detail later.
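As a minimal sketch of sign extension, the Python function below replicates the sign bit of a narrow bit pattern into the added upper bits. The function and parameter names are illustrative assumptions.

```python
def sign_extend(pattern, from_bits, to_bits):
    """Widen a two's complement bit pattern by replicating its sign bit."""
    sign_bit = (pattern >> (from_bits - 1)) & 1
    if sign_bit:
        # Fill every new upper bit position with 1.
        upper_ones = ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
        return pattern | upper_ones
    return pattern                   # upper bits are already 0

for width in (8, 16, 32):
    print(format(sign_extend(0xA, 4, width), 'X'))   # FA, FFFA, FFFFFFFA
```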
Conversion from Two’s Complement Form
Given the two’s complement representation of a negative decimal number, how do we find
the decimal number which it represents? The answer is to take the two’s complement of it
again.
Take FA:
1111 1010 ...the number
0000 0101 ...invert each bit
0000 0110 ...add 1; this is its two’s complement
This is 6. Thus FA is the two’s complement representation of −6.
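The same check can be sketched in Python. The function name is hypothetical; it applies the two’s complement a second time when the MSB is 1 and attaches the minus sign.

```python
def from_twos_complement(pattern, bits):
    """Interpret an unsigned bit pattern as a signed two's complement value."""
    if pattern & (1 << (bits - 1)):                       # MSB set: negative number
        magnitude = (~pattern + 1) & ((1 << bits) - 1)    # two's complement again
        return -magnitude
    return pattern

print(from_twos_complement(0xFA, 8))      # -6
print(from_twos_complement(0xFFF2, 16))   # -14
print(from_twos_complement(0xF9, 8))      # -7
```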
Example 0.11
Find the decimal number whose two’s complement representation is given.
i) FFF2H
ii) F9H
Solution
i) FFF2H
Taking the two’s complement gives 000EH, which is 1110 in binary, i.e., 14. This means
that −14 is the number represented by FFF2H.
Table 0.3 | Negative and Positive Number Representation in 4-bit Binary
Negative Numbers  Binary  Hex    Positive Numbers  Binary  Hex
−8                1000    8
−7                1001    9      +7                0111    7
−6                1010    A      +6                0110    6
−5                1011    B      +5                0101    5
−4                1100    C      +4                0100    4
−3                1101    D      +3                0011    3
−2                1110    E      +2                0010    2
−1                1111    F      +1                0001    1
−0                0000    0      +0                0000    0
ii) F9H
Taking the two’s complement gives 0000 0111, i.e., 7, which means that −7 is the number
represented by F9H.
Question: Looking at the result of various arithmetic operations on binary numbers,
how do we know whether it is a positive or a negative number? What is your observation
regarding signed numbers?
Answer: We should know how many bits are used for the representation of a signed
number in the system. Then, if the MSB is ‘1’, it is a negative number; if the MSB is
‘0’, it is a positive number.
0.6 | Computer Arithmetic
0.6.1 | Addition of Unsigned Numbers
When we say that a number is unsigned, we mean that it has no sign bit: all the bits
allotted for the data are used for the magnitude alone. In effect, unsigned numbers are
non-negative. With 8 bits, numbers from 0 to 255 can be represented.
Binary addition is something that you have already learnt. Here we are reviewing it
to bring into focus some important points which have to be taken care of in the
study of microprocessor programming.
Binary addition is done by adding bits column-wise. We will consider byte-sized data.
Case 1
Binary          Decimal   Hexadecimal
  0101 1001 +    89 +      59H +
  0110 1001     105        69H
  1100 0010     194        C2H
Addition of the same numbers in the binary, decimal and hexadecimal formats is shown.
Since the sum does not exceed 255, there is no special problem in this case.
Case 2
Binary          Decimal   Hexadecimal
  0111 1000 +   120 +      78H +
  1001 1001     153        99H
1 0001 0001     273       111H
In this case, the sum needs more bits than the 8 allotted to the operands, and the
extra bit, beyond the 8 bits of the sum, is called a ‘carry’. Whenever a carry appears, it
indicates the insufficiency of the space allocated for the result. In microprocessors, there
is a flag that indicates this condition.
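A rough Python model of the two cases is given below. The function name is an assumption; the returned carry plays the role of the processor’s carry flag.

```python
def add_unsigned_8bit(a, b):
    """Add two bytes; return (8-bit sum, carry flag)."""
    total = a + b
    carry = 1 if total > 0xFF else 0     # the ninth bit of the result
    return total & 0xFF, carry

print(add_unsigned_8bit(0x59, 0x69))   # (194, 0)  -> C2H, no carry
print(add_unsigned_8bit(0x78, 0x99))   # (17, 1)   -> 11H with carry, i.e. 111H
```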
0.6.2 | Addition of Packed BCD Numbers
Now let us add packed BCD numbers.
Case 1
Consider the case of two packed BCD bytes that are to be added, say 45 and 22.
Packed BCD     Packed BCD in hex form   Decimal
0100 0101 +    45H +                    45 +
0010 0010      22H                      22
0110 0111      67H                      67
In this case, the upper nibble and lower nibble are within 0 to 9. So the addition proceeds
just like normal decimal addition.
Case 2
Consider the case of two packed BCD bytes that are to be added, say 45 and 27. In BCD
form, the correct answer should be 72. However, this is not obtained directly.
Packed BCD     Packed BCD in hex form   Decimal
0100 0101 +    45H +                    45 +
0010 0111      27H                      27
0110 1100      6CH                      72
When adding in binary form, the lower nibble of the sum is greater than 9. Since no
BCD digit can have a value greater than 9, a correction needs to be applied here. The
correction to get the sum back to BCD form is to add 6 (0110) to the lower nibble alone.
Correction
0110 1100 +
0000 0110
0111 0010
This gives the correct sum of 72.
Case 3
This is when the upper nibble of the sum is greater than 9. The correction is to add 6 to
the upper nibble alone.
Add BCD 76 and 62. In binary form, the additions are
  0111 0110 +     76H +
  0110 0010       62H
  1101 1000 +     D8H +    Now adding 6 to the upper nibble,
  0110 0000       60H
1 0011 1000      138H
Note that the sum exceeds 99, which is the maximum number that 8 bits can accommodate
as a packed BCD number. Thus a ‘carry’ is generated from the addition operation. If the
carry is included in the answer, the sum of 138 is correct, but more than 8 bits are
needed to hold it.
Case 4
When both the upper and lower nibbles of the sum are greater than 9, add 6 to both
nibbles. Add BCD 89 and 72.
  1000 1001 +     89H +
  0111 0010       72H
  1111 1011 +     FBH +    Now adding 6 to both nibbles,
  0110 0110       66H
1 0110 0001      161H
The right answer of 161 is obtained. However, the sum needs more than one byte of space.
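The four cases can be collected into one Python sketch. The function name is hypothetical. One condition goes beyond what the text states and is an added assumption: the lower nibble is also corrected when it produces a carry into the upper nibble (for example 19 + 19), which is how decimal-adjust logic in processors generally behaves.

```python
def add_packed_bcd(a, b):
    """Add two packed BCD bytes; return (packed BCD byte, carry flag)."""
    total = a + b                                  # plain binary addition
    # Lower nibble: correct if it exceeds 9 or if it carried into the upper nibble.
    if (total & 0x0F) > 9 or ((a & 0x0F) + (b & 0x0F)) > 0x0F:
        total += 0x06
    # Upper nibble: correct if it exceeds 9 or if the sum already left the byte.
    if (total & 0xF0) > 0x90 or total > 0xFF:
        total += 0x60
    return total & 0xFF, 1 if total > 0xFF else 0

for x, y in ((0x45, 0x22), (0x45, 0x27), (0x76, 0x62), (0x89, 0x72)):
    result, carry = add_packed_bcd(x, y)
    print(format(x, '02X'), '+', format(y, '02X'), '=', carry, format(result, '02X'))
```

Read together, the carry and the byte give 0 67, 0 72, 1 38 and 1 61 for Cases 1 to 4.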
Example 0.12
Perform the addition of the following numbers, after converting them to binary and
hexadecimal forms.
i) 39 and 99
ii) 117 and 156
Solution
    Decimal   Binary          Hexadecimal
i)   39 +       0010 0111 +    27H +
     99         0110 0011      63H
    138         1000 1010      8AH
ii) 117 +       0111 0101 +    75H +
    156         1001 1100      9CH
    273       1 0001 0001     111H
In the second addition, the sum has exceeded the size which can be accommodated in
8 bits. Hence a carry will be generated. In microprocessors, there is a flag which indicates
this condition.
0.6.3 | Addition of Negative Numbers
We know now that negative numbers are represented in two’s complement notation.
Let’s consider adding two negative numbers.
Example 0.13
Add −43 and −56
Solution
Convert the two numbers into their two’s complement form, as both are negative
numbers.
−43 +     1101 0101 +
−56       1100 1000
−99     1 1001 1101
We are adding two 8-bit numbers. If the sum exceeds 8 bits, an extra bit is generated
from the addition. Ignore this carry and look at the eight bits of the sum. (This is the rule
for two’s complement addition.)
It is 1001 1101. The MSB is found to be ‘1’, so we know that it is a negative number. To
find the decimal number whose two’s complement representation this is, take the two’s
complement of the sum. This comes to 0110 0011, i.e., 99. Thus the sum is −99, which
verifies the correctness of our addition procedure.
Example 0.14
Add +90 and −26.
Solution
One number is positive and the other is negative.
+90       0101 1010 +
−26       1110 0110
 64     1 0100 0000
Ignore the extra carry bit. The sum is 0100 0000. Since the MSB of the number is ‘0’,
we understand that the sum is positive. So convert it to decimal. The result is 64.
Example 0.15
Add −120 and +45
Solution
−120      1000 1000 +
 +45      0010 1101
 −75      1011 0101
Look at the sum: the MSB of the sum is ‘1’. Hence, it is a negative number. The two’s
complement of this is 0100 1011, i.e., 75. Thus, the result of the calculation is −75.
Note: In all the above calculations, we have used data of 8 bits. The results of the
calculations were in the range of −128 to +127. Thus, the answers are correct. If the sum
goes outside this range (for eight-bit data), the answers will be wrong, and havoc will be
created if one is not aware of that. Computers have ‘flags’ to let us know of this. This
will be discussed in later sections.
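Examples 0.13 to 0.15 can be reproduced with a few lines of Python. This is only a sketch with hypothetical function names: operands are reduced to their 8-bit patterns, added, and the ninth bit is dropped as the rule requires.

```python
def to_byte(value):
    """Two's complement bit pattern of a signed value in 8 bits."""
    return value & 0xFF

def to_signed(pattern):
    """Signed value of an 8-bit two's complement pattern."""
    return pattern - 0x100 if pattern & 0x80 else pattern

def add_signed_8bit(a, b):
    """Add two signed values the way an 8-bit adder does; the ninth bit is dropped."""
    return to_signed((to_byte(a) + to_byte(b)) & 0xFF)

print(add_signed_8bit(-43, -56))   # -99
print(add_signed_8bit(90, -26))    # 64
print(add_signed_8bit(-120, 45))   # -75
```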
0.6.4 | Subtraction
Unsigned Numbers
i) Binary numbers
ii) Hexadecimal numbers
iii) BCD numbers
The procedure here is similar to addition i.e., bit by bit, column by column subtraction.
Sometimes, borrows from the columns on the left are needed.
Example 0.16
Subtract 56 from 230. Do this subtraction after converting numbers to binary and hex.
Solution
Decimal   Binary          Hexadecimal
230 −     1110 0110 −     E6H −
 56       0011 1000       38H
174       1010 1110       AEH
In the above subtraction, we are subtracting a smaller number from a bigger number.
However, when subtracting column-wise, sometimes there is the issue of having to
subtract a bigger digit from a smaller digit. We know the idea of a ‘borrow’ from
the left-hand column. The amount that is borrowed and added to the current column
depends on the base of the number system. For the decimal system we borrow 10, for
binary 2 and for hex 16.
Check this hexadecimal subtraction:
E6H –
38H
AEH
Starting from the rightmost column, we see that we cannot subtract 8 from 6. So, borrowing
from E is needed. Borrowing from E leaves E to become D, and 6 becomes 6 + 16 = 22
(in decimal). Subtracting 8 from 22 gives 14, which is E in hex. That is how we get E in the
rightmost column of the result. Then, going over to the left, subtract 3 from D (13 in
decimal). This is 10 (in decimal) and A in hex. That is how we get A in the left column of
the result.
This idea has been explained here in detail, so that we can use a similar idea in BCD
subtraction.
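The borrow rule can be sketched for any base in Python. The function name and the digit-list representation are assumptions chosen for illustration; both lists must have the same length and the first number must not be smaller than the second.

```python
def subtract_columns(minuend, subtrahend, base):
    """Column-wise subtraction of digit lists (most significant digit first)."""
    result = []
    borrow = 0
    for m, s in zip(reversed(minuend), reversed(subtrahend)):
        m -= borrow                 # pay back what the column to the right borrowed
        if m < s:
            m += base               # borrow one unit of the base from the left
            borrow = 1
        else:
            borrow = 0
        result.append(m - s)
    return result[::-1]

print(subtract_columns([0xE, 0x6], [0x3, 0x8], 16))   # [10, 14], i.e. AEH
print(subtract_columns([2, 3, 0], [0, 5, 6], 10))     # [1, 7, 4]
```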
0.6.5 | Packed BCD Subtraction
Let us use the same numbers for BCD subtraction as we did in Example 0.16, i.e.,
subtract 56 from 230. The BCD representation is shown below. Each decimal digit is packed
into 4 bits.
Decimal   Packed BCD
230 −     0010 0011 0000 −
 56       0000 0101 0110
174       0001 0111 0100
The point to remember here is that each group of 4 bits represents a ‘decimal digit’,
the base of which is ten. Thus, when we try to subtract a bigger digit from a smaller
digit, we have to consider the ‘four bits together’ as a decimal digit. Let us review
the steps in the above subtraction.
First step
When we have to subtract 6 from 0 in the rightmost group of four bits, we need
to borrow. Borrow from the group on the left a decimal 10, and add it to the ‘0000’ on
the right. That makes it ‘1010’ (because of borrowing, the 0011 on the left is now ‘0010’).
Then subtract 0110 from this. The result is 0100 as seen (within the group, binary
subtraction is done).
0010 0010 1010 −
          0110
          0100
Second step
This is the second group. For subtracting 0101 from 0010, a decimal 10 is borrowed
from the leftmost group. Thus 0010 becomes ‘1100’, 12 in decimal. Subtracting ‘0101’ (5)
from it gives ‘0111’ (7) as shown.
0001 1100 1010 −
     0101 0110
     0111 0100
Third step
The leftmost group is now 0001. Subtract 0000 from it. Thus, the final answer is 174 in
packed BCD form.
0001 1100 1010 −
0000 0101 0110
0001 0111 0100
All this shows that BCD subtraction needs extra care, just as BCD addition does. In
computers, special instructions take care of this.
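The nibble-by-nibble procedure can be sketched in Python as below. The function name and the `digits` parameter are illustrative assumptions, and the sketch assumes the first number is not smaller than the second.

```python
def subtract_packed_bcd(minuend, subtrahend, digits):
    """Subtract packed BCD numbers digit by digit, borrowing 10 when needed."""
    result = 0
    borrow = 0
    for i in range(digits):                       # rightmost nibble first
        m = ((minuend >> (4 * i)) & 0xF) - borrow
        s = (subtrahend >> (4 * i)) & 0xF
        if m < s:
            m += 10                               # borrow a decimal 10, not 16
            borrow = 1
        else:
            borrow = 0
        result |= (m - s) << (4 * i)
    return result

print(format(subtract_packed_bcd(0x230, 0x056, 3), '03X'))   # 174
print(format(subtract_packed_bcd(0x53, 0x18, 2), '02X'))     # 35
```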
Example 0.17
Express the numbers 53 and 18 in packed BCD and subtract the latter from the former.
Solution
Decimal   Packed BCD
53 −      0101 0011 −
18        0001 1000
First step
Borrowing a decimal 10 from the left nibble into the right nibble gives
0100 1101 −
0001 1000
     0101
Second step
0100 1101 −
0001 1000
0011 0101
The result is 35, as it should be.
0.6.6 | Subtraction of Signed Numbers
Subtraction is the process of changing the sign of the second number and adding it to the
first: 65 − 34 is 65 + (−34).
So when we do subtraction, we actually add the two’s complement form (i.e., the
negative) of the second number to the first number. This is what computers actually
do when they perform subtraction. In the discussion of subtraction in Section
0.6.4, this was not explicitly mentioned, because the idea then was to present certain
other intricacies related to subtraction. Now let us discuss subtraction for 8-bit signed
numbers. Keep in mind that the range of signed numbers usable with 8 bits is −128
to +127.
Example 0.18
Perform subtraction of the following signed numbers:
i) +26 from +68
ii) +26 from −68
Solution
i) +26 from +68
This is a computation of the form 68 + (−26). For this, the two’s complement
form of 26 should be added to 68.
Decimal     Binary
 68 is        0100 0100 +
−26 is        1110 0110
 42         1 0010 1010
Ignore the extra bit generated. Since the MSB is ‘0’, the result is positive. The result is
0010 1010, i.e., 42.
ii) +26 from −68, i.e., −68 − (+26) = −68 + (−26)
−68 is (in two’s complement form)     1011 1100 +
−26 is                                1110 0110
−94                                 1 1010 0010
Ignore the extra bit generated. Since the MSB of the 8-bit result 1010 0010 is ‘1’, the
difference of the two numbers is negative. Take the two’s complement of this: 0101 1110,
i.e., 94. So the result of the computation is −94.
Example 0.19
Find the result of the following subtraction:
i) −56 from +23
ii) −56 from −23
Solution
i) −56 from +23
The computation to be done is +23 − (−56), i.e., 23 + 56. This turns out to be the
addition of the two positive numbers 23 and 56.
23 +      0001 0111 +
56        0011 1000
79        0100 1111   i.e., 79
ii) −56 from −23
The computation to be done is −23 − (−56), i.e., −23 + 56.
−23 +     1110 1001 +
 56       0011 1000
 33     1 0010 0001
Ignore the extra bit generated. The MSB of the 8-bit result 0010 0001 is ‘0’. So the
number is positive and is 33 in decimal, as it should be.
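Examples 0.18 and 0.19 both follow one recipe: negate the second number by two’s complement, add, and ignore the extra bit. A minimal Python sketch of that recipe follows; the function name is a hypothetical one, not from the text.

```python
def subtract_signed_8bit(a, b):
    """Compute a - b by adding the two's complement of b, as an 8-bit adder would."""
    a_bits = a & 0xFF
    neg_b_bits = (~b + 1) & 0xFF            # two's complement of b
    total = (a_bits + neg_b_bits) & 0xFF    # ignore the extra bit
    return total - 0x100 if total & 0x80 else total

print(subtract_signed_8bit(68, 26))     # 42
print(subtract_signed_8bit(-68, 26))    # -94
print(subtract_signed_8bit(23, -56))    # 79
print(subtract_signed_8bit(-23, -56))   # 33
```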
Overflow into the Sign Bit
Whenever we use 8-bit signed numbers in addition or subtraction, the result is found to
be correct in sign and magnitude if it is within the range of −128 to +127. However,
suppose this is violated? What happens then? A typical case is when two negative numbers
are added. Try adding −100 and −55. Both the operands are within the allowed range.
See the addition.
−100 +      1001 1100 +
 −55        1100 1001
−155      1 0110 0101
Ignore the extra carry bit and look at the 8-bit result. The MSB of the result is ‘0’,
indicating that it is a positive number. However, we know that the answer is negative.
What caused the error? Because the sum was too large in magnitude (it lies beyond −128)
to fit into the 8 bits allotted to it, there was an ‘overflow into the sign bit’, causing
the sign bit to be changed. (A similar issue occurs when we add two positive numbers and
the sum is greater than +127.) In computers there is a flag which tells us when there is
an overflow into the sign bit causing it to be inverted. These matters will be discussed
in detail when we do programming.
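One common way to detect this condition is sketched below in Python: overflow has occurred when both operands have the same sign bit but the result’s sign bit differs. The function name is hypothetical, and the returned boolean only stands in for the processor’s overflow flag.

```python
def add_with_overflow(a, b):
    """Add two signed 8-bit values; return (8-bit signed result, overflow flag)."""
    a_bits, b_bits = a & 0xFF, b & 0xFF
    total = (a_bits + b_bits) & 0xFF
    # Overflow: both operands share a sign, but the result's sign differs.
    overflow = ((a_bits ^ total) & (b_bits ^ total) & 0x80) != 0
    result = total - 0x100 if total & 0x80 else total
    return result, overflow

print(add_with_overflow(-100, -55))   # (101, True)
print(add_with_overflow(100, 55))     # (-101, True)
print(add_with_overflow(-43, -56))    # (-99, False)
```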
0.6.7 | Addition of Numbers of Different Lengths
We have discussed computer arithmetic in detail, because it is very important to be clear
about it, so as to be able to understand how the microprocessor responds to different data
types and arithmetic operations. Now let us try to understand how data of different
widths is dealt with.
Data can have different sizes depending on the processor. The 8086 handles data of
8 bits and 16 bits, while the Pentium can handle 8, 16 and 32 bits internally. Sometimes it
may be required to add/subtract data of different widths. In these cases, the important
thing to do is to equalize the size of the data involved. Processors do not allow addition/
subtraction of data of different widths. So a byte will have to be converted to a 16-bit
word if it has to be added to a 16-bit number. The way it is done depends on whether
the data is signed or unsigned. For unsigned data, the byte is extended with zeros in
the upper byte to make a 16-bit word. For signed data, the byte should be ‘sign
extended’ to make it a 16-bit word. Refer to Section 0.5.6 once again to convince yourself
of the necessity for this.
Example 0.20
Add the unsigned numbers 35H and 7890H.
Solution
In this, 35H is appended with zeros to make it 0035H.
0035H +
7890H
78C5H
Example 0.21
Add the following signed numbers:
i) 45H and A87CH
ii) A8H and 1045H
iii) F5H and B45CH
Solution
i) In this, 45H should be made into a 16-bit number. Check the MSB of this byte. It is
‘0’, meaning that it is a positive number. The sign bit, when extended to 16 bits, makes
the number 0045H. Then the addition is
0045H +
A87CH
A8C1H
ii) In this, the byte is A8H, which has an MSB of ‘1’. Thus, sign extension makes it
FFA8H. Now the addition is
  FFA8H +
  1045H
1 0FEDH
The extra bit generated is ignored, as we have done in Section 0.6.3 on the addition of
negative numbers.
To be sure that this is correct, verification can be done as below.
A8H is −88
1045H is +4165
Adding the two gives us 4077, whose hex representation is 0FEDH.
iii) Add F5H and B45CH
In this, F5H is sign extended to FFF5H.
Adding
  FFF5H +
  B45CH … note that this is a negative number
1 B451H
Ignoring the extra carry bit, the sum is B451H, a negative number. To verify, find the
decimal equivalents of the numbers, which are −11 and −19364; when added, these
give −19375.
Note: You may also verify that, without extending the sign bit, a wrong result is
obtained.
All the calculations we have done can be verified easily using the scientific calculator
available on the PC. So, try to become adept in the use of that calculator.
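Examples 0.20 and 0.21 can also be checked with a small Python sketch. The two function names are assumptions for illustration; the `signed` argument chooses between appending zeros and sign extension before the 16-bit addition.

```python
def widen_byte(byte, signed):
    """Convert a byte to a 16-bit word: zero-extend if unsigned, sign-extend if signed."""
    if signed and byte & 0x80:
        return 0xFF00 | byte        # replicate the sign bit into the upper byte
    return byte                     # upper byte is all zeros

def add_byte_to_word(byte, word, signed):
    """Add a byte to a 16-bit word after equalizing the widths; extra bit is dropped."""
    return (widen_byte(byte, signed) + word) & 0xFFFF

print(format(add_byte_to_word(0x35, 0x7890, signed=False), '04X'))   # 78C5
print(format(add_byte_to_word(0xA8, 0x1045, signed=True), '04X'))    # 0FED
print(format(add_byte_to_word(0xF5, 0xB45C, signed=True), '04X'))    # B451
print(format(add_byte_to_word(0xA8, 0x1045, signed=False), '04X'))   # 10ED
```

The last line treats A8H as unsigned; for signed data that result (10EDH) is the wrong answer the note above warns about.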
0.7 | Units of Memory Capacity
A memory device is one in which data is stored. How much data a memory device can
store depends on its capacity. The capacity of memory is specified as multiples of bytes
since memory is byte organized, which means that one byte is stored in one location in
memory. So, if there are 100 locations in a memory device, 100 bytes are stored. We all
have heard of memory capacity being mentioned in terms such as bytes, kilobytes and
megabytes. Now let us quantify these terms. You will be hearing these terms throughout
the use of this book.
A byte is 8 bits. A word is not really defined. It depends on the processor used. For
the 8086, a word size is 16 bits. A 32-bit processor may claim to have a word size of
32 bits. Memory capacity is always specified in bytes.
$2^{8}$ = 256 bytes
$2^{10}$ = 1024 bytes = 1 kilobyte (1 KB)
$2^{6} \times 2^{10} = 2^{16}$ = 64 KB = 65,536 bytes
$2^{10} \times 2^{10} = 2^{20}$ = one megabyte (1 MB) = 1024 × 1024 = 1,048,576 bytes
$2^{10} \times 2^{20} = 2^{30}$ = one gigabyte (1 GB) = 1024 × 1024 × 1024 = 1,073,741,824 bytes
$2^{10} \times 2^{30} = 2^{40}$ = one terabyte (1 TB) = 1024 × 1024 × 1024 × 1024
= 1,099,511,627,776 bytes
There are also higher units, which are not so common in usage as yet, but things will
change soon, no doubt about it. Some of these units are:
Petabyte (PB) = $2^{50}$ bytes
Exabyte (EB) = $2^{60}$ bytes
Zettabyte (ZB) = $2^{70}$ bytes
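Since every unit listed here is a power of two, conversions to bytes are easy to tabulate. The Python sketch below uses a hypothetical `UNITS` table and `to_bytes` helper that follow the definitions given above.

```python
# Exponent of 2 for each unit, as defined in this section.
UNITS = {"KB": 10, "MB": 20, "GB": 30, "TB": 40, "PB": 50, "EB": 60, "ZB": 70}

def to_bytes(count, unit):
    """Number of bytes in `count` units, using the power-of-two definitions."""
    return count * 2 ** UNITS[unit]

print(to_bytes(64, "KB"))   # 65536
print(to_bytes(1, "GB"))    # 1073741824
```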
KEY POINTS OF THIS CHAPTER
A computer system consists of a CPU, memory and I/O which communicate with one
another through the system bus.
The system bus comprises the data bus, address bus and the control bus.
A processor’s activities are restricted to fetching, decoding and executing instructions.
For reading and writing from/to memory, a number of clock cycles of time are required.
The time expended for this is called the memory access time.
When comparing assembly language programming with high-level language program-
ming, we conclude that the former is faster in execution and more efficient and compact,
but is more difficult to learn and master.
RISC and CISC are two different philosophies in computer design, and even though a lot
of controversy still rages around which is better, the two seem to have merged, more
or less.
Computers do all the computations in binary, but for entering data through the keyboard
and for displaying it on the monitor, ASCII codes are used.
Negative numbers are represented in two’s complement form by all computers.
QUESTIONS
1. Name the three most important components of a computer system.
2. Have you heard of the term ‘bus contention’? What does it mean in the context of a
computer system?
3. If the data bus width of a processor is 64 bits, what would you say about its complexity
and capability?
4. If the address bus of a processor is 64 bits, what is its address space?
5. What could be a ‘multi processing’ system?
6. What is the first step in the execution cycle of a processor?
7. How does the system clock frequency influence the speed of processing?
8. If a system uses a 1.5-GHz clock, what is its clock period?
9. What is meant by the term ‘system bus’?
10. Why should a computer have an I/O controller?
11. What are the difficulties involved in learning and using assembly language programming?
12. Name one distinguishing feature each of RISC and CISC computers.
13. How are the hexadecimal and binary number systems related?
14. When two signed positive numbers are added and the sum exceeds 127, what is the
problem that arises?
15. What is the range of signed numbers that can be represented in 12 bits?
EXERCISES
1. Write the decimal equivalent of the following numbers:
a) 31.3H
b) 1100.101B
c) A32.3H
d) 100101B
2. Convert the following numbers to binary form:
a) 34
b) 200
c) 90
3. Convert to hexadecimal format.
a) 3454
b) 4523
c) 789
4. Write the binary values for:
a) 34ADH
b) 78FH
c) 407BH
5. Write the hexadecimal values of:
a) 11000101010110001B
b) 10011111100001010B
6. Find the packed BCD representation of the following decimal numbers:
a) 45
b) 4678
c) 802345
7. Represent the packed BCD of the following numbers in hex:
a) 235
b) 9123
8. What is the ASCII of each of the following?
a) 7
b) 8
c) 0
d) A
e) Z
f) y
g) d
h) *
i)
9. Find the two’s complement representation of the following numbers in 8 bits:
a) −45
b) −90
c) −12
d) −34
10. Represent the following negative numbers using 16 bits:
a) −267
b) −4
c) −5676
d) −675
11. Perform binary addition for the following numbers:
a) 34 and 56
b) −52 and −70
c) −47 and +120
12. Convert to packed BCD and add,
a) 46 and 23
b) 55 and 67
c) 34 and 49
d) 99 and 44
13. Subtract after converting to binary form,
a) −20 from −75
b) +49 from +97
c) E5H from A4H
14. Add the following signed numbers:
a) F3H and 3245H
b) AH and F45H
c) B2H and 123EH
15. How many bytes constitute
a) 5MB
b) 4KB
c) 32MB
d) 32KB
e) 8GB
1 Introduction to Embedded Systems
Chapter-opening image: Development board of TI’s low power MSP 430 microcontroller.
What is meant by the term ‘embedded systems’
The application domain of embedded systems
The model of an embedded system
The difference between an MCU and an MPU
The working of a simple embedded system
The figures of merit for an embedded system
Classification of MCUs on the basis of data bus widths
The history and current trends of the embedded systems industry
Introduction
The term ‘embedded systems’ has become very common, but it is quite difficult to ‘define’
because of the large variety of devices included in this class. So, let us make an attempt
to understand it, rather than to ‘define’ it.
An embedded system is an electronic system which is designed to perform one
or a limited set of functions, using hardware and software. With this in mind, let us
examine the vast domain of embedded systems.
Having hardware and software makes an embedded system a computer, but this
computer performs only a limited set of functions. Thus, we exclude the PC from the
embedded system world, and call it a general purpose computer. An embedded system is
therefore a ‘special purpose’ computing unit, meaning that it has a processor and
associated software. The software associated with the application is ‘burned’ into the
ROM of the processor; therefore, it is better to designate it as ‘firmware’.
Take the case of an automobile, for example, a car. It has a number of ‘electronic
control units (ECUs)’ as part of what is called ‘automobile electronics’. Each of these
units has a processor, which controls one or another of the various parts of the car such
as the engine, brakes, lights, doors and so on. Thus, embedded systems are ubiquitous
within an automobile, and add intelligence to the operation of the vehicle.
1.1 | Application Domain of Embedded Systems
The application domain of embedded systems pervades every aspect of modern life. It
will be easier to understand their features once we take a tour of the world of embedded
systems. The following is a list:
i) Consumer electronics: Cameras, music players, TVs, DVD players, microwave
ovens, washing machines, refrigerators and remote controls.
ii) Household appliances/home security systems: Air conditioners, intruder and
fire alarm systems.
iii) Automobile controls: Anti-lock braking system, engine and transmission control,
door and wiper control, etc.
iv) Handheld devices: Mobile phones, PDAs, MP3 players, digicams, etc.
v) Medical equipment: Scanners, ECG and EEG units, testing and monitoring
equipment.
vi) Banking: ATMs, currency counters, etc.
vii) Computer peripherals: Printers, scanners, webcams, etc.
viii) Networking: Routers, switches, hubs, etc.
ix) Factories: Control, automation, instrumentation and alarm systems.
x) Aviation: Airplane controls, guidance and instrumentation systems.
xi) Military: Control and monitoring of military equipment.
xii) Robotics: Used in factories, household and hobby-related activities.
xiii) Toys.
Figure 1.1 depicts some embedded products. It is only a sample of the products in
the galaxy of embedded systems.
This list is incomplete, and on perusing it, you are likely to feel that anything and
everything that involves modern day electronic control is an ‘embedded system’. This is
not far from the truth. In fact, the only electronic equipment that we simply and easily
exclude from the list is the home PC.
Why do we exclude the PC from this list?
We will first list out the general features of an embedded system before attempting to
answer this question.
1.2 | Desirable Features and General Characteristics
of Embedded Systems
i) It should have one or a small set of functions which it is expected to perform
efficiently.
ii) It should be designed for low-power dissipation, because many systems are battery
powered.
iii) It has limited memory and limited number of peripherals.
iv) Applications are not meant to be alterable by the user.
v) Many of them are not accessible directly, that is, they may be part of the control unit
of a larger system, so no interference in operation is possible.
vi) They need to be highly reliable.
vii) Many of them need to operate with time constraints.
Figure 1.1 | Some application fields of embedded systems
Now let’s try to understand why a PC is not considered to be an embedded system.
i) The PC has a large application set, from word processing and computation to
communications, printing, scanning and many more.
ii) Low power consideration is a good idea, but that is not the guiding principle in
its design.
iii) Memory is available in various forms: RAM, ROM and secondary memory
devices like the hard disk, CDROMs and the like. More memory can be added if
the user desires.
iv) Since the PC is used for various applications, more applications can be added as and
when needed.
v) The PC can be accessed by input devices like the keyboard, mouse, modem, etc.
vi) Like any other system, the PC also needs to be reliable, but since it is unlikely to be
part of a very critical system, it can afford to fail once in a while (not a very good
idea, though, because PCs are sometimes used in critical monitoring applications).
vii) The applications on the PC need to be fast for better performance, but usually there
is no time criticality involved.
Now that we have eliminated the general purpose PC from the list of embedded
systems, the next question is whether newer handheld devices such as advanced mobile
phones, PDAs, etc. can be included in the list of ‘embedded systems’. One view is that
these devices are gradually being used for ‘general purposes’, just like a PC. The
other side of the argument is that the design of such handheld devices is similar to that
of embedded systems, where processor power, memory and size are limited and timing is
critical, even though the applications may resemble those of a PC. Hence, such devices
can also be thought of as embedded systems.
1.3 | Model of an Embedded System
In its simplest and most general form, an embedded system consists of a processor,
sensors, actuators and memory. The idea is that any application should provide a
solution to a real-world problem, for which some data has to be read in. For this,
sensors are needed. This data is processed by the processor and the result is given
to actuators, which perform appropriate actions. See Figure 1.2, which is a very simple
model of an embedded system.
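The sensor, processor, actuator and memory roles of this model can be mimicked in a few lines of Python. This is only an illustrative sketch: the temperature readings, the 25 °C setpoint and the fan commands are all invented for the example and do not come from the text.

```python
def read_sensor(samples):
    """Hypothetical sensor: yields pre-recorded temperature readings in degrees C."""
    yield from samples

def decide(temperature, setpoint=25.0):
    """Processor step: turn the (hypothetical) fan on above the setpoint."""
    return "FAN_ON" if temperature > setpoint else "FAN_OFF"

def run(samples):
    """Sense -> process -> actuate, once per reading; returns the actuator commands."""
    log = []                                  # 'memory': keeps the command history
    for temperature in read_sensor(samples):
        log.append(decide(temperature))       # the command would drive an actuator
    return log

print(run([22.5, 26.0, 24.9]))   # ['FAN_OFF', 'FAN_ON', 'FAN_OFF']
```

In a real embedded system this loop would run continuously on the processor, with the sensor and actuator functions replaced by hardware accesses.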
1.4 | Microprocessor vs Microcontroller
We have already talked about a processor as being the brain of an embedded system. This
simply means that there should be a computational engine at the core of the system, to
make it ‘intelligent’. There are two types of ‘processor units’ commonly mentioned in
the literature.
1.4.1 | Microprocessor Unit (MPU)
A processor like the 8086, or its advanced successor, the Pentium, has very high
computational capability, but it does not have the pins or the internal architecture to interface
directly with the external world. For such ‘microprocessors’, external chips act as peripheral
controllers. For example, to connect an LCD display to an MPU, a parallel port IC has to
be connected externally; to have a serial transmission facility, an external serial controller
is necessary; for timing and counting, external timers are needed. Memory can also
be connected externally. Figure 1.3 shows an MPU chip connected to peripherals and
memory which are physically external to the chip. Such MPUs are used as the core of
general purpose computation systems, where the emphasis is on ‘computational power’
rather than interfacing capability. A PC uses an MPU and a number of external chips,
together called a ‘chipset’, which act as controllers for the various peripherals.
Figure 1.2 | General model of an embedded system (blocks: sensors, processor, memory, actuators)
Figure 1.3 | An MPU with peripherals and memory external to the chip (the MPU chip connects over the system bus to RAM, ROM, parallel I/O, serial I/O, counter/timer and other peripherals)
Figure 1.4 | An MCU with peripherals and memory inside the chip (processor core, internal memory, A-to-D and D-to-A conversion, parallel I/O ports, serial I/O ports and counter/timer, all within the MCU)
1.4.2 | Microcontroller Unit (MCU)
See Figure 1.4. Here the processing unit has along with it, in the same chip, timers,
parallel ports, serial ports, RAM, ROM, etc., so no external controller chips are needed.
Memory is also available inside the chip: the program code is burned into the internal
ROM, and the application runs with the help of the internal RAM. Thus, this is more or less
a self-contained single chip computer. Popular microcontroller families include the 8051, PIC,
AVR and ARM-based devices. When such an MCU has so many peripherals inside that a
large system can be designed using the on-chip peripheral controllers alone, the chip is called a System
on Chip (SoC). We often hear terms like ARM SoC, Cypress’s PSoC (Programmable
System on Chip), etc., which are very popular in the embedded system market. Figure 1.5
is a photograph of a few MCU chips.
Figure 1.5 | Some popular MCUs
1.5 | Example of a Simple Embedded System
Let’s examine the embedded system in detail. We will think of a very simple system,
as in Figure 1.6. An MCU is considered the brain of this system, which functions as a
temperature monitor/controller. A sensor reads the temperature, and an ADC inside
the MCU converts it to digital form. This data is compared to a reference temperature,
and if it is above the allowed value, an alarm is activated. Also an output from the MCU
is used to start a cooling fan to reduce the temperature. There is a digital display of the
temperature as well.
Thus, the output actuation consists of the following:
i) Display of the temperature value
ii) Alarm
iii) Motor which controls a cooling fan
The input is just a temperature sensor.
This is a very simple system. The program continuously measures the temperature
at the sensor, with a delay of T between readings. This ‘delay’ is obtained from the
timer inside the MCU. There is an ADC inside the MCU to convert the analog value
of the temperature into a digital number. The output display is refreshed at the
same rate at which the input temperature is read. The program is written, tested
and burned into the ROM (usually flash ROM) of the MCU. The program runs
continuously. The circuitry is put on a PCB, packed inside an enclosure, and it becomes a
product. The user simply places it in the area where the temperature is to be measured.
If the program is to be changed, the designer has to intervene; the user cannot do
anything to the finished product.
This product can be given a user interface by adding a keyboard at the input side. For
example, if the ‘reference temperature’ is allowed to be changed by the user, this keyboard
input, optionally protected by a password, can be used. The keyboard is interfaced through an
interrupt, that is, when a key is pressed, an ISR (interrupt service routine) (Section 2.2.9) is
activated, which checks the password and allows the user to change the ‘reference’
temperature setting.
Figure 1.6 | A simple temperature monitor (inputs to the MCU: sensor and keyboard; outputs: display, motor and alarm)
This kind of approach to ‘programming’, where a program runs continually and repeatedly
in memory, is termed ‘the superloop approach’. This is how a simple embedded
system is designed to meet its requirements. Any aperiodic input is accepted through the
mechanism of interrupts.
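A minimal sketch of the superloop for the temperature monitor of Figure 1.6 is given below in C. It is an illustration only, not code for any particular MCU: the structure `monitor_t`, the functions `keyboard_isr` and `superloop_iteration`, and the use of plain variables in place of ADC, port and timer registers are all assumptions made so that the logic can be followed and run on a PC. The password check described earlier is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware state, held in plain variables so the sketch can
   run on a PC. On a real MCU these would be ADC, port and timer registers. */
typedef struct {
    uint16_t adc_reading;      /* latest digitized temperature (ADC counts) */
    uint16_t reference;        /* reference temperature (ADC counts)        */
    uint16_t displayed;        /* value last sent to the display            */
    bool     alarm_on;         /* alarm output                              */
    bool     fan_on;           /* cooling fan motor output                  */
    volatile bool key_pending; /* set by the keyboard ISR                   */
    uint16_t key_value;        /* new reference entered by the user         */
} monitor_t;

/* Stand-in for the keyboard interrupt service routine: it only records
   the aperiodic event; the main loop acts on it in its next pass. */
void keyboard_isr(monitor_t *m, uint16_t new_reference)
{
    assert(m != NULL);
    m->key_value = new_reference;
    m->key_pending = true;
}

/* One pass of the superloop: accept any pending key input, compare the
   reading with the reference, drive the outputs, refresh the display. */
void superloop_iteration(monitor_t *m)
{
    assert(m != NULL);
    if (m->key_pending) {
        m->reference = m->key_value;
        m->key_pending = false;
    }
    bool too_hot = m->adc_reading > m->reference;
    m->alarm_on = too_hot;
    m->fan_on = too_hot;
    m->displayed = m->adc_reading;
}
```

In firmware, `superloop_iteration` would be called inside an endless `for (;;)` loop, with the on-chip timer providing the delay T between passes.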
Next let’s take the case of a more complex system, for example, a mobile phone,
which has a number of functions to perform: handling voice calls, messaging, the
Internet, video and music players, reminders and a lot more. Some of the applications
may be time critical, others may not be. Such a complex system needs a ‘manager’ and
usually has an operating system. Most of us are aware that mobile phones have some
version of an operating system. Symbian, Android, etc. are some popular mobile phone
operating systems.
Other complex systems like PDAs, telecom networks, wireless sensor networks,
etc., also have operating systems. An operating system is needed only when the
complexity of the system, arising from multiple tasks of different types and from critical
response times, dictates the need for it.
1.6 | Figures of Merit for an Embedded System
Embedded system design usually aims to achieve the following objectives:
i) Low-power dissipation
ii) Small physical size
iii) Small code size
iv) High speed of response
Low-power Dissipation: Many embedded devices are battery powered, and hence
low-power dissipation is an important figure of merit. Even in cases where the embedded
system is part of a larger system (as in a washing machine), it is important to keep
power dissipation low to avoid excessive heating. Thus, embedded designs should be
low-power designs, and the first step towards this is choosing an MCU with low-power
features. What has made ARM-based MCUs (used in many mobile phones, iPads,
etc.) very popular is their ‘low power’ feature. Taking care of the power requirement of the
MCU is not the only thing. Peripherals like displays, motors, relays, etc. should also be
chosen with the same consideration.
Small Physical Size: Many embedded systems are handheld devices, and others are
allotted only small spaces within larger systems, such as the electronic unit which controls
a printer, scanner, etc. The smaller the unit, the better. As such, the trend is to choose
an MCU with most of the peripheral controllers inside the chip itself, so that the PCB is
very small, with very few extra chips. There is also a trend in chip design to focus on
‘small dies’.
Small Code Size: The system code, after testing and debugging, is to be embedded as
firmware, and it is best if it fits inside the (flash) ROM of the MCU. Thus, the code size
is to be minimized, as on-chip ROM is an expensive and scarce resource. If the code size
is large, external memory will have to be added, which will defeat the very purpose of
using an MCU. If an operating system is being used for the system, its ‘footprint’ should
be small.
High Speed: As a general rule, we would like systems to respond fast. For an embedded
MCU, fast response implies a high clock frequency, but the higher the clock frequency,
the higher the power dissipation. So the trick is not to choose an unnecessarily
high clock frequency unless the application needs it. For simple applications, where PIC
and 8051 MCUs are used, frequencies in the range of 12 to 20 MHz are
common. This is not a high frequency compared to the clock frequencies of general purpose
processors (2 to 3 GHz). But higher end systems are likely to use MCUs (like ARM-based ones)
with much higher clock frequencies for faster response.
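The link between clock frequency and power can be made concrete with the standard first-order expression for dynamic power in CMOS logic. This expression is not part of the discussion above and is added here only as background; the symbols (activity factor, switched capacitance, supply voltage and clock frequency) are the conventional ones.

```latex
P_{\text{dyn}} \approx \alpha \, C_L \, V_{DD}^{2} \, f_{\text{clk}}
```

Here $\alpha$ is the fraction of the circuit that switches in a clock cycle, $C_L$ the switched capacitance, $V_{DD}$ the supply voltage and $f_{\text{clk}}$ the clock frequency. At a fixed supply voltage, dynamic power grows roughly in proportion to the clock frequency, which is why an unnecessarily fast clock wastes power.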
Real Time Response: There is another aspect to response time, and that is defined by a
‘deadline’. If an operation is stipulated to be completed within a deadline, the
system must be able to produce the result within this time frame; if not, it becomes
either a useless system or a system with a low performance quotient.
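The idea of a deadline can be sketched in a few lines of C. The function name `deadline_met` and the assumption of a free-running 32-bit tick counter (for example, one incremented by a timer interrupt) are illustrative choices, not something prescribed above.

```c
#include <assert.h>   /* for the checks mentioned in the usage note */
#include <stdbool.h>
#include <stdint.h>

/* Returns true if work that started at tick `start` and finished at tick
   `end` took no more than `deadline_ticks`. Unsigned subtraction keeps the
   result correct even if the free-running counter wrapped around once
   between the two readings. */
bool deadline_met(uint32_t start, uint32_t end, uint32_t deadline_ticks)
{
    uint32_t elapsed = end - start;   /* modulo 2^32 arithmetic */
    return elapsed <= deadline_ticks;
}
```

For example, `deadline_met(0xFFFFFFF0u, 0x00000010u, 100u)` holds, because only 32 ticks elapsed even though the counter wrapped around in between.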
1.7 | Classification of MCUs: 4/8/16/32 Bits
We can classify embedded systems on the basis of their complexity. Complex operations
usually imply large amounts of data. For example, image and video operations need large
word lengths and higher clock rates. This in turn needs MCUs with wide data buses. MCUs
of different data and address bus widths are available. Here we classify MCUs on the
basis of their data word lengths.
i) 4-bit MCUs: Some applications involve very little computation, and in that
case 4 bits of data could be sufficient. Simple toys and applications which use just
switch inputs and directly perform actuation don’t need to handle large volumes of
data.
ii) 8-bit MCUs: The highest volumes of MCUs used are the ones with 8-bit data
buses. For moderately complex operations, this is sufficient. The most popular of these
is the 8051 family, which was developed by Intel, but which is now manufactured
by various other companies as well. Microchip’s PIC is another popular family with
many different series of varying capabilities. Newer versions of PIC with more
and newer peripherals are being developed, which makes the PIC series very
attractive. Incidentally, the PIC series includes 8-bit, 16-bit and 32-bit MCUs.
iii) 16-bit MCUs: There are a few 16-bit MCUs like Intel’s 8096 and 80196, some
versions of PIC, etc. The MSP430 (manufactured by Texas Instruments) is a newer 16-bit
series which has very low power dissipation and can compete effectively in the new
embedded market, which is very particular about power dissipation.
iv) 32-bit MCUs: ‘ARM’ is the most popular 32-bit MCU architecture in use today; it is used in
complex applications requiring low power, high speed and good computing
capability. The 32-bit MCUs are the ones used in image and video applications and thus
find use in the latest mobile phones, iPods, PDAs, etc.
In the following section, we will discuss some other devices which are also included
in the category of embedded systems.
1.7.1 | ASIC: Application Specific Integrated Circuit
This is an IC in which complex functional blocks are integrated to implement a complete
application. The IC is designed from scratch, after defining its application. It may even
be tailor-made for a particular customer by the design company.
A video codec (coder-decoder) is an example of an ASIC. What we get here is a
hardware implementation of a complex algorithm, and this hardware implementation
will be very efficient and fast (provided the design is sound). ASICs are generally
expensive to make, especially because of ‘strict’ specifications and a limited application
market.
What is an ASIP?
ASIP stands for ‘Application Specific Instruction Set Processor’. It is a processor whose
instruction set is tailor-made for a specific application, such as graphics. Thus,
it is a sort of tradeoff between the programmability of a CPU and
the performance of an ASIC.
1.7.2 | FPGA (Field Programmable Gate Array)
These devices are also included in the domain of embedded systems. As the name
indicates, an FPGA is programmable hardware, that is, hardware which can be
reconfigured even while it is part of a circuit. It is an advanced form of complex
programmable logic device, with a very high device density.
In an FPGA, a number of logic cells are interconnected; the logic cells as well as the
interconnects are programmable using hardware description languages and synthesis tools.
This makes hardware design cheap and flexible, but the end result is not as efficient or
as fast as an ASIC. There are a number of well-established companies supplying FPGAs:
Xilinx, Altera, Actel, etc.
1.7.3 | DSP Processors
The processors discussed so far belong to the set of ‘general purpose processors’. They have
instruction sets which cater to general arithmetic and logical computations. But for
many applications like signal processing, where floating point operations and complex
arithmetic operations are involved, their performance is unlikely to be sufficiently fast
and efficient. Hence the need for processors with instruction sets
designed for signal processing and complex math operations. They are called DSP
processors.
Where real-time processing of speech, image, video, etc. is involved, they perform
superbly. There are many companies manufacturing such DSP processors: Texas
Instruments is the leader in the design of DSP processors, and Analog Devices, Nvidia,
Lucent, Freescale, etc. are some of the others. For many applications, the current trend is
to have a general purpose core and a DSP core on the same chip, so that tasks can be
partitioned. See Figure 1.7, which shows a very popular setup for advanced operations: an
ARM core and a DSP core handling different types of computations, along with a
number of peripheral controllers, all on the same chip.
1.8 | History of Embedded Systems
To trace the history of embedded systems, one does not need to look very far back.
When Intel and other companies started the design and manufacture of
microprocessors, it was realized that programmability could ease computations and also that
interfacing with input and output devices was possible. All this led to the development
of computers, or what we now call the personal computer. The immense potential of
using computers for control and actuation was soon recognized, and many
possibilities were tried out.
With this realization, the personal computing sector grew and, along with that, the
idea of embedding intelligence into computer chips took wing. It was realized
that along with a computational engine, other functions could be incorporated into a
single chip. Memory and other functional blocks, when added to a microprocessor, gave it
the name microcontroller or embedded processor. It was actually the development of the
concept of such processing units, with peripherals and memory on a single chip,
that gave the necessary impetus for the growth of embedded systems.
Two engineers at TI (Texas Instruments), Gary Boone and Michael Cochran, are
credited with creating the first microcontroller, the TMS 1000, which became a commercial
product in 1974. It had ROM, RAM and clock circuitry on the chip along with the
processing unit.
In 1977, Intel entered this field with the 8048, a microcontroller which had
RAM and ROM and which became widely used in PC keyboards. In 1980, Intel
introduced the 8051 MCU and called it the MCS-51 architecture. Over the years, it became
very popular, especially because Intel allowed others also to manufacture and sell it. In
1982, Intel introduced the 80186 and called it an embedded processor. It had the same
computing engine as the 8086, but had a number of peripherals inside it, like timers,
DMA controllers, clock generators and so on. This chip saw very little use in PCs; almost all its
applications were in embedded products. Other popular microcontroller series are PIC
by Microchip and ATmega (AVR) by Atmel. Besides these, there are a number of smaller players
in the market.
The view that microcontrollers alone paved the way for the development of the
embedded industry may not be entirely true, however. If we look at any
embedded system today, we know that it is not ‘electronics’ alone that has made miraculous
strides. Along with electronics, sensors, actuators, displays, mechanical parts and software
have also made giant strides to fuel the growth of the embedded industry.
Figure 1.7 | Typical embedded dual core setup (an ARM 9 core and a DSP core, with peripherals on the same chip)
What are the challenges in this field now?
New and innovative products are continually appearing in the market. The three Ps of
innovation frequently highlighted are ‘Price, Performance and Power’. This
means that performance needs to be increased while keeping power dissipation and price
as low as possible. This translates to using low-power processors, sensors and
actuators, all of which must still deliver high performance. Performance implies
high computational capability at the highest possible speed. Performance and
power dissipation directly conflict with each other, and keeping prices low is an additional
issue. But in spite of all these challenges, the embedded industry is moving ahead in leaps
and bounds.
1.9 | Current Trends
To address the challenges just mentioned, various trends are being adopted.
i) Multi-core processors: It has become very clear that trying to improve processor
performance by increasing clock frequencies is fraught with difficulties, because the
direct result of a higher clock frequency is higher power dissipation. Thus, the option
of using more than one processor core (at lower clock frequencies) is being tried
out, and current smart phones and gaming consoles use multi-core processors.
Note that if there are two cores, one may be a DSP core while the
other is a general purpose core. The design of multi-core systems requires new design
environments, which are being developed at a rapid rate.
ii) Embedded and real-time operating systems: With the emergence of complex
applications, many new embedded and real-time operating systems have become
popular. Linux has emerged as a popular embedded OS, and others like Android
and newer versions of Symbian have come up for mobile applications and handheld
devices.
iii) Newer areas of deployment of embedded devices: Embedded devices have
applications in the entertainment, healthcare and automotive segments. Besides
that, there are applications in the communication and military fields as well.
Research and development in these fields is going ahead.
Conclusion
In the forthcoming chapters, we will try to get some insight into this exciting field of
embedded systems and learn how we can make its study fruitful and interesting.
Embedded systems have made their presence felt in every area of modern life.
There are some desirable features for an electronic system to be included in the list of
embedded systems.
A general purpose PC is not an embedded system.
Any embedded system has a sensor and an actuator.
MCUs are MPUs with peripherals and memory designed to be inside the chip.
The embedded systems industry is relatively new, but marching forward at a rapid rate.
Making use of multi-core processors has become a trend in the embedded industry.
Questions
1. Explain what an embedded system is, with a few examples.
2. How is software embedded into an ES?
3. Name four fields of applications for an embedded system.
4. List three characteristics that an embedded system should possess.
5. Can an electronic tablet be listed as an embedded system? Substantiate your answer.
6. What is the difference between an MCU and an MPU?
7. Why is power dissipation a very important factor in embedded design?
8. Why are DSP processors used in embedded design?
9. Name two new areas of deployment for embedded systems.
10. Name two commercial products based on the ARM processor.
Exercises
1. Draw a block diagram of an embedded system which can be used for measuring short
distances.
2. Name a few embedded products in the field of bio-medical engineering.
2 | Embedded Systems: The Hardware Point of View
Chapter-opening image: PIC and 8051 MCUs.
This chapter covers:
The hardware aspects of an embedded system
The ideas of power-on-reset and brown-out-reset
The importance of the clock and general purpose I/O
The working principle of timers and counters
The necessity of watchdog timers, real-time clock and DMA
The concept of interrupts
Semiconductor memories like SRAM, DRAM, Flash and EEPROM
The necessity and techniques of low power design
Important concepts like pullups, pulldowns and Hi-Z
Introduction
In Chapter 1, the model of an embedded system was defined and discussed. In this
chapter, we will probe a bit deeper. By now, it must be clear that embedded systems
include both hardware and software. This chapter delves into the hardware
aspects and discusses the hardware building blocks of typical systems.
Anyone who wants to design a system starts with a suitable MCU which must
meet their requirements. The MCU should address all the needs of the application for
which the design is to be done. There is no point in choosing an MCU which is much
more advanced than what the application needs. We should not use a 32-bit MCU for
an application in which very little data is involved. It is also important to ensure
that, as far as possible, all the controllers for the peripherals are available within the chip.
It should also be confirmed that there are enough I/O pins for connecting
all the peripherals. The clock frequency should be high enough, but not excessively
high, because power dissipation tends to increase as clock frequency increases. In short, a
number of factors are to be kept in mind when hardware design is envisaged.
This chapter discusses the MCU as a hardware unit and examines the important
components in an MCU. Though the discussion is very general, the 8051 is used
(in certain sections) as an example to highlight some general features of MCU hardware.
This microcontroller has been chosen because it is one of the simplest and most popular
of the available ones; there is also a chance that some of you have already done a course
on the 8051 MCU. Some features of PIC are also referred to in some parts of this chapter.
Those who have already had some exposure to microcontrollers will be able to
connect well with these discussions if some example MCUs are referred to. For others, this
chapter is meant to be the first step towards understanding embedded systems hardware.
It is important to keep in mind that when designing a system, the data sheet of the
chosen MCU should be consulted and a thorough understanding of the pin diagram, register
structure, modes of operation, etc. should be gained.
In this chapter, a detailed discussion of semiconductor memory has also been
attempted. Different types of memory, and the relevance of each type, are examined;
this is necessary because various types of memory are used in any system, and
even inside an MCU. Finally, low power design is also elaborated upon.
2.1 | Microcontroller Unit (MCU)
In Chapter 1, it was made clear that a processing unit is needed for an embedded system.
An ‘embedded processor’, which is another name for a ‘microcontroller unit’, is used at
the heart of the system design.
There is a processor part inside the MCU block; this is the CPU or the computing
engine of the MCU. Figure 2.1 shows this. The other blocks inside it are memory and
peripheral controllers (‘peripheral’ is the term usually used, though we really mean
‘peripheral controller’). The number and kind of peripheral controllers is not standard:
different MCUs have different sets of peripherals, but some peripherals, like timers
and the serial communication interface, are more or less a standard feature.
2.1.1 | The Processor
The processor is the unit which acts as the ‘brain’: it is the central processing unit which
performs the required computations and controls the peripherals. Since memory and peripheral
controllers are also inside the chip, the data bus as well as the address bus are internal.
Figure 2.1 | The internal structure of an MCU (CPU, RAM, ROM, peripherals, and timing and control)
The ‘performance’ of the processing unit of an MCU is dependent on many factors,
like instruction set architecture (ISA), data bus width and addressing space. Its capacity
to perform complex computations at a fast rate is decided by the ISA. The ISA can be
defined as that aspect of computer architecture which is related to programming,
including the native data types, instructions, registers, addressing modes, memory architecture
and interrupt handling. In short, it specifies the way in which an assembly language
programmer or a compiler designer views the processor.
2.1.2 | The Harvard Architecture
There are architectural variations in processors; the two classical divisions are the von
Neumann and the Harvard architectures. In the former, code and data share the
same address space and hence are accessed using the same bus. Many MPUs use
this architecture, for example, the x86 series.
In the Harvard architecture, code and data are considered distinct and hence are
accessed by separate buses, and also have separate storage spaces. Figure 2.2 illustrates
this aspect. Microcontrollers achieve this physical separation by having instructions
stored in flash ROM and data in RAM.
Microcontrollers are single chip computers, and the memories (RAM and ROM) are
inside the chip. As such, the buses for accessing data and instructions are also inside the chip.
If the processor is an 8-bit one (like the 8051), it means that the internal data bus is 8 bits
wide and computations operate on 8-bit data, which means that for
arithmetic operations like addition, subtraction, multiplication, etc., the operands can be
only 8 bits in size. If two 16-bit
numbers are to be added, a program has to be written to add the two lower bytes, save
the carry, and then add the two upper bytes along with the carry. A processor with
16-bit data capability can do this with a single instruction, and in that respect is
superior to the 8-bit one. A 32-bit processor will do still better when large numbers
are involved.
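The byte-by-byte addition described above can be sketched in C as follows. The function name `add16_bytewise` is illustrative; on a real 8-bit MCU the same steps would normally be a pair of add and add-with-carry instructions generated by the compiler or written in assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds two 16-bit numbers using only 8-bit operands, the way an 8-bit
   CPU has to: low bytes first, then high bytes plus the saved carry.
   The final carry out of the high byte is returned through carry_out. */
uint16_t add16_bytewise(uint16_t a, uint16_t b, uint8_t *carry_out)
{
    assert(carry_out != NULL);

    uint8_t a_lo = (uint8_t)(a & 0xFFu), a_hi = (uint8_t)(a >> 8);
    uint8_t b_lo = (uint8_t)(b & 0xFFu), b_hi = (uint8_t)(b >> 8);

    uint16_t lo_sum = (uint16_t)(a_lo + b_lo);        /* up to 9 bits      */
    uint8_t  carry  = (uint8_t)(lo_sum >> 8);         /* carry from low    */
    uint16_t hi_sum = (uint16_t)(a_hi + b_hi + carry);

    *carry_out = (uint8_t)(hi_sum >> 8);              /* carry out of high */
    return (uint16_t)(((hi_sum & 0xFFu) << 8) | (lo_sum & 0xFFu));
}
```

A 16-bit or 32-bit processor performs the same addition in one instruction, which is the advantage the paragraph above refers to.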
Figure 2.2 | The Harvard architecture (control unit, ALU and I/O, with separate instruction memory and data memory)
Now about the address bus: If the internal address bus is 16 bits wide, as is the case in the
8051, a maximum of $2^{16}$ bytes, that is, 64K of internal memory, can be addressed by the
CPU. The memory, in this context, is ROM, where the code will finally reside. Besides
this, there is RAM available inside the chip. This is relatively small and is divided into
general purpose registers, R/W memory area and special function registers (SFRs). The
latter are the registers associated with the peripherals.
Corresponding to every peripheral, a number of registers are needed. As an example,
think of a timer. A timer unit allows the generation of a square wave which can be taken
out from one of the pins. Inside the MCU, the timer unit has registers for setting
the period, mode, etc. The registers associated with the timer operation are called the
‘special function registers’ (SFRs) of the timer; similarly, there are SFRs for all other
peripherals.
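To make the role of timer SFRs concrete, here is a small simulation in C. The register layout (`mode`, `reload`, `count`), the `TIMER_RUN` bit and the function `timer_clock_tick` are hypothetical and chosen only for illustration; they do not correspond to the register set of the 8051 or any other specific MCU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TIMER_RUN 0x01u   /* hypothetical mode bit: 1 = timer running */

/* Hypothetical SFRs of a simple 8-bit timer, simulated in a struct. */
typedef struct {
    uint8_t mode;    /* mode register (only TIMER_RUN is used here)     */
    uint8_t reload;  /* value loaded into count after each overflow     */
    uint8_t count;   /* up-counter, incremented once per clock tick     */
    bool    pin;     /* output pin, toggled on every overflow           */
} timer_sfr_t;

/* Models what the timer hardware does on each clock tick. */
void timer_clock_tick(timer_sfr_t *t)
{
    assert(t != NULL);
    if (!(t->mode & TIMER_RUN)) {
        return;                      /* timer stopped: nothing happens */
    }
    if (t->count == 0xFFu) {         /* overflow on this tick          */
        t->count = t->reload;
        t->pin = !t->pin;            /* one edge of the square wave    */
    } else {
        t->count++;
    }
}
```

With this model, the pin toggles every (256 - reload) ticks, so the value written to the `reload` register sets the period of the square wave and the `mode` register starts or stops it.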
The choice of MCU is dictated by the application requirement. Many applications
don’t require a lot of computation, so using a 32-bit MCU where the amount of data and
the complexity of processing are trivial is a waste. 8-bit MCUs have the
biggest share of the embedded market because many low end embedded systems have
to deal more with interfacing than with complex computations.
It is only when large amounts of data have to be processed at high rates and the
computations are complex that we go for 32-bit processors with powerful ISAs. ARM (refer to
Chapters 10 and 11) is one such processor. ARM is a computationally capable
processor, but it performs integer computations. If floating point arithmetic operations and
complex signal processing are involved, the MCU may be augmented by a DSP core.
Many of the newer embedded applications warrant the use of such a system, in which
there is more than one processing unit. Refer to Figure 2.3, which shows an MCU with
two cores, an ARM core and a DSP core, along with the peripherals.
2.2 | A Popular 8-bit MCU
We will continue our discussion on the various aspects of embedded hardware using a
sample MCU: the 8051 family, the world’s most popular 8-bit microcontroller. It is
an 8-bit MCU in the sense that its ports and most of its registers are 8 bits wide, and data transfer
can occur at a maximum rate of 8 bits at a time. Its address bus is 16 bits wide, which
means that its internal memory can be $2^{16}$ bytes, i.e. 64K. This is the maximum size of
internal ROM it can have. Figure 2.4 shows the functional pin configuration of the 8051
chip. It has four ports named P0, P1, P2 and P3. All these pins can be used as input or
output, and many of the pin lines have dual functions. In Figure 2.4, the Port 3 pins are
shown to have two functions each: they act either as I/O or as signal lines for
the functions shown in the diagram. The ‘dual’ functions of the other ports are not
shown here.
Figure 2.3 | An MCU with two CPU cores (a DSP core and an ARM 9 core, along with peripherals)
2.2.1 | General Purpose I/O (GPIO)
All MCUs have a set of pins to which external peripherals can be connected. These pins are
organized as ports, which can be used as a group or pin by pin, and all of which are programmable
as inputs or outputs. Each port pin may have more than one function, and the user can
decide the functionality. Refer to the pin configuration of the 8051 in Figure 2.4. It has four
8-bit ports, P0 to P3. If used as a group, all eight bits of a port can be addressed to work in
unison. In that case, we use the names P0, P1, etc., in programs. If used individually,
P1.0 and P1.1 indicate the 0th and 1st bits of Port 1. Similar notations are used for
the pins of the other ports. Each port and each pin can be used to connect peripherals.
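The difference between addressing a port as a group and addressing a single pin can be sketched in portable C. Here a port latch is modelled as one byte, and the helper names `pin_set`, `pin_clear` and `pin_read` are assumptions made for this illustration. C compilers for the 8051 usually offer their own extensions for bit-addressable ports; this sketch does not rely on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A port latch modelled as one byte; bit n corresponds to pin Px.n. */

/* Drive a single pin high without disturbing the other seven. */
void pin_set(uint8_t *port, unsigned bit)
{
    assert(port != 0 && bit < 8u);
    *port |= (uint8_t)(1u << bit);
}

/* Drive a single pin low without disturbing the other seven. */
void pin_clear(uint8_t *port, unsigned bit)
{
    assert(port != 0 && bit < 8u);
    *port &= (uint8_t)~(1u << bit);
}

/* Read the state of a single pin. */
bool pin_read(const uint8_t *port, unsigned bit)
{
    assert(port != 0 && bit < 8u);
    return ((*port >> bit) & 1u) != 0u;
}
```

Writing a whole byte to the variable corresponds to using the port as a group (P1 in the notation above), while `pin_set(&p1, 0)` corresponds to operating on P1.0 alone.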
Figure 2.4 | The functional pin diagram of the 8051
Pin assignments shown in the figure: VCC (40), VSS (20), XTAL1 (19), XTAL2 (18), RST (9), PSEN (29), ALE (30), EA (31); P0.0 to P0.7 (pins 39 down to 32); P1.0 to P1.7 (pins 1 to 8); P2.0 to P2.7 (pins 21 to 28); P3.0 to P3.7 (pins 10 to 17), with the alternate functions RXD, TXD, INT0, INT1, T0, T1, WR and RD respectively. A crystal with two capacitors is connected across XTAL1 and XTAL2.
Figure 2.5 shows various peripheral devices connected to some of the ports of an
8051 chip. Port 0 is used as the 8-bit data port for an LCD display, P1.0 and P1.1 are
used for connecting switches. P2.0 is connected to a relay, and P2.4 to P2.7 to a
stepper motor driving circuit. This figure and the discussion are meant only to show that
GPIO pins can be used for any application as decided by the user. In this case, only P1.0
and P1.1 have been programmed to be inputs; all the others are output pins.
2.2.2 | Clock
All processor activities are synchronized by a clock. Usually, there is an oscillator inside
the chip, which produces a clock signal, the frequency of which is decided by the resonant
frequency of a crystal connected outside. Note the crystal connected between pins 18 and
19 of the 8051 (Figure 2.4) with stabilizing capacitors of values ranging from 20 pF to 40 pF.
The performance of an MCU, in terms of ‘speed of computation’, is directly related
to the clock frequency. Each MCU has a specification for the maximum crystal frequency
it can use. All timing calculations of timers, counters, serial port
baud rate, etc., are based upon the frequency of this clock.
2.2.3 | Power on Reset
Every processor has a power on reset circuit. ‘Power on reset’ means that when the power
is switched on, the processor should be reset. Otherwise, the system may operate initially
in an unpredictable fashion because flip-flops (inside the MCU) are not designed to
power on in any particular state. Reset generally implies that internal registers
(constituted by flip-flops) are cleared, and that the processor gets ready to execute.
Because the program counter is cleared by reset, for almost all MCUs the location
from which the processor takes its first instruction is the first address itself.
Each MCU needs a specific pulse for reset to occur. It is specific in the sense that
the reset pulse is either high or low and also needs to have a minimum period. This reset
pulse should appear just once, when the power is switched on, and then it should disappear.
But there should be a provision for manual reset also, if necessary. There is a reset
pin for the MCU chip, but the ‘power on reset’ circuit is to be provided externally.
Figure 2.5 | I/O devices connected to the GPIO pins of 8051
[Diagram: an LCD display on Port 0, switches S1 and S2 on P1.1 and P1.2, a relay on P2.0,
and a motor driver on P2.4 to P2.7.]
Figure 2.6 is a typical POR circuit. Let us analyse the circuit. Figures 2.7a, b and c
show the voltage waveforms at the capacitor and at $\overline{\mathrm{RES}}$ and RES. When power is first
switched on, C starts charging as shown in Figure 2.7a. The Schmitt trigger converts the
exponential charging waveform into a rectangular pulse; $V_H$ is the voltage at which the
Schmitt trigger switches (changes state). The values of R and C decide the width T of the
pulse. This pulse, obtained at the point $\overline{\mathrm{RES}}$ of Figure 2.6, can be used by any
MCU which needs an active low reset. By using an inverter, a high pulse can be obtained
at RES, which can be used for an MCU which needs an active high reset.
Note that once the capacitor is fully charged, the voltage at the RES point is zero
and that at the $\overline{\mathrm{RES}}$ point is V. Thus, after the reset pulse is over, the circuit behaves
normally as is needed in its operational mode.
Figure 2.6 | A typical power on reset circuit
[Circuit: R and C across the +5 V supply, diode D, push button P across C, and a Schmitt
trigger producing the reset outputs $\overline{\mathrm{RES}}$ and RES.]
Figure 2.7 | (a) The voltage waveform across the capacitor, (b) The pulse at $\overline{\mathrm{RES}}$ and
(c) The pulse at RES
[Waveforms: (a) the capacitor voltage rising exponentially towards V and crossing $V_H$ at
time T; (b) and (c) rectangular pulses of width T.]
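The dependence of the pulse width T on R and C can be made explicit. The short derivation below is a sketch under simplifying assumptions that are not stated in the text: the capacitor starts fully discharged, it charges through R towards the supply voltage V, and the Schmitt trigger switches exactly when the capacitor voltage reaches its threshold.

```latex
\begin{align}
V_C(t) &= V\left(1 - e^{-t/RC}\right) \\
V_C(T) = V_H \;&\Rightarrow\; e^{-T/RC} = 1 - \frac{V_H}{V} \\
T &= RC \,\ln\frac{V}{V - V_H}
\end{align}
```

For instance, if the threshold happens to be half the supply voltage, the pulse width is $RC\ln 2 \approx 0.69\,RC$.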
Now see the other components in Figure 2.6. P is a push button connected across
the capacitor. This is for manual reset, if needed. Pressing the push button discharges
the capacitor, and when the push button is released, the capacitor gets recharged and
generates the reset pulse once again.
What is the need for the diode D in Figure 2.6?
This diode is called a flywheel diode. Normally the diode is reverse biased by the +5 V
at its cathode. Now, if the capacitor is made to discharge suddenly by the push button
switch, the sudden change in polarity of the capacitor voltage can make the point A
have a voltage above +5 V. This makes the diode conduct and protects the MCU from
being damaged by the voltage surge.
2.2.3.1 | Power on Reset for 8051
Figure 2.6 is a general circuit which shows the principle behind power on reset circuits.
We can use any circuit which achieves the same result. The usually suggested POR circuit
for the 8051 is very simple and consists just of a resistor and a capacitor. As the
capacitor gets charged from the power supply, the voltage at the reset pin, that is, across
the resistor, falls exponentially as shown in Figure 2.8a. This functions as the high reset
pulse. It is quite likely that there is a Schmitt trigger inside the IC to convert this
exponential pulse into a pulse with steep sides (see Figure 2.8b). You can add a push button
for reset and a flywheel diode, if necessary.
Figure 2.8 | (a) Waveform at pin 9 of 8051 and (b) Reset pulse obtained internally
[Waveforms: (a) the voltage $V_R$ starting at 5 V and decaying exponentially, crossing $V_L$ at
time T; (b) a rectangular pulse of height V and width T.]
The typical values used for an 8051 with a crystal frequency of 12 MHz are shown
in Figure 2.9. This MCU needs an active high pulse for reset, which is to be long enough
for its oscillator to start, plus two machine cycles (this is as per the specifications given by
the manufacturer). One machine cycle is 12 clock cycles. You can verify whether the RC value
used in the circuit of Figure 2.9 satisfies this condition.
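The verification suggested above can be sketched as a short calculation. The resistor and capacitor values are those of Figure 2.9; the threshold `V_L` at which the reset pin stops being read as high and the oscillator start-up time `T_OSC_START` are assumed values chosen only for illustration, and the real figures must be taken from the manufacturer's data sheet.

```python
import math

R = 8.2e3            # ohms, from Figure 2.9
C = 10e-6            # farads, from Figure 2.9
V_SUPPLY = 5.0       # volts
F_CRYSTAL = 12e6     # hertz

V_L = 2.5            # ASSUMED input threshold of the reset pin, in volts
T_OSC_START = 10e-3  # ASSUMED oscillator start-up time, in seconds

# One machine cycle is 12 clock cycles.
machine_cycle = 12 / F_CRYSTAL

# Minimum reset duration: oscillator start-up plus two machine cycles.
required = T_OSC_START + 2 * machine_cycle

# The voltage across R decays as V_SUPPLY * exp(-t / RC); it stays above
# V_L for RC * ln(V_SUPPLY / V_L) seconds.
pulse_width = R * C * math.log(V_SUPPLY / V_L)

ok = pulse_width > required
print(f"pulse width = {pulse_width * 1e3:.1f} ms, required = {required * 1e3:.3f} ms")
```

With these assumed numbers the pulse lasts several tens of milliseconds, while two machine cycles take only 2 μs, so the margin depends almost entirely on the oscillator start-up time.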
2.2.4 | Brown Out Reset
Many embedded systems work on battery power. It may happen that the battery gets
drained and then the MCU does not work correctly. This may give wrong results leading
to wrong decisions and actions. To protect the system from such a situation,
it is to be ensured that if the power supply voltage goes below a specified level, the
MCU is reset. This is called ‘brown out reset’. The PIC family of MCUs has an
inbuilt facility for ‘brown out reset’. The 8051 family does not have this. In such cases, an
external circuit is needed to ensure brown out reset. In certain other cases, it may also be
that the brown out voltage level needs to be different from that ensured by the inbuilt
circuitry of an MCU. Figure 2.10 shows a typical brown out reset circuit.
The circuit uses an analog comparator. The reference voltage is 2.25 V, which is
defined by the breakdown voltage of the zener diode D. The voltage at A (the inverting
terminal of the op-amp) is half the power supply voltage V. If V = 5 V, A has a voltage
$V_A$ = 2.5 V. The circuit is designed in such a way that if the voltage V goes below 4.5 V,
$V_A$ will become less than 2.25 V. Then the comparator changes state from low to high. This
can be used as the reset signal (RES) for an MCU which requires an active high reset.
For an MCU which requires the opposite polarity for reset ($\overline{\mathrm{RES}}$), transistor T1,
acting as an inverter, can be added; its output goes low when the voltage at the output
of the comparator is high.
Figure 2.9 | A commonly used POR circuit for 8051
[Circuit: a 10 μF capacitor and an 8.2 kΩ resistor connected to the +5 V supply and to the
RES pin (pin 9) of the 8051.]
Figure 2.10 | A simple circuit for brown out reset
[Circuit: a comparator with a resistive divider from +V feeding point A, a zener diode D
with series resistor R setting the reference at point B, and transistor T1 inverting the
comparator output to provide both reset polarities.]
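The decision made by the comparator can be written as a few lines of arithmetic. This is only a sketch under idealized assumptions: the divider halves the supply exactly, the reference stays at 2.25 V however low the supply falls, and the comparator has no hysteresis. The function name `brown_out_reset` is hypothetical.

```python
V_REF = 2.25  # volts, the zener reference quoted in the text


def brown_out_reset(v_supply):
    """Return True when the comparator output (RES) would be high."""
    v_a = v_supply / 2   # point A sits at half the supply voltage
    return v_a < V_REF   # output goes high once A drops below the reference


normal = brown_out_reset(5.0)    # healthy supply
at_limit = brown_out_reset(4.5)  # exactly at the design limit
drained = brown_out_reset(4.4)   # battery has sagged below 4.5 V
```

Only the last case asserts the reset: a 4.4 V supply puts point A at 2.2 V, which is below the 2.25 V reference.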
2.2.5 | Timers and Counters
All MCUs have timers among their peripherals. A timer is dedicated hardware for
‘timing’ events, generating waveforms and so on. Once a timer is started, the CPU does
not have to intervene, as the timer hardware takes care of the counting. A number of
registers may be associated with timers, but basically there is a timer count register
and a mode register to decide between different options.
In its most basic form, the method of working of a timer is this: The processor clock
acts as the reference clock, and in many cases this clock frequency is divided by a
constant to get a lower frequency clock. A number is loaded into the timer count register, and
the timer is then ‘started’. The count keeps increasing (or decreasing) until it reaches the
maximum (or minimum) value. At this point, the count register resets to 0, and this is
indicated by a flag bit or an interrupt. The time elapsed between the starting of counting
and the resetting of the count register can be used as a ‘measure’ of a delay.
In many MCUs, there is a prescaler that reduces the clock frequency used as the
reference, so that a longer delay is obtained. The amount of prescaling required can be
specified in a prescale register.
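The relation between the loaded number, the divided clock and the resulting delay can be sketched as follows. The function `timer_delay` and its parameters are hypothetical names; the example assumes an up-counting timer of a given width that rolls over to 0 after its maximum value, and the clock, divider and preload values are chosen only for illustration.

```python
def timer_delay(f_clock_hz, divider, preload, bits=16):
    """Seconds from start until an up-counting timer rolls over to 0."""
    counts = (1 << bits) - preload   # ticks remaining before roll-over
    tick = divider / f_clock_hz      # period of the divided reference clock
    return counts * tick


# Example values: a 12 MHz clock divided by 12 gives a tick of 1 microsecond.
# Preloading a 16-bit timer with 0xFC18 leaves 1000 ticks before roll-over.
delay = timer_delay(12e6, 12, preload=0xFC18)
print(f"delay = {delay * 1e3:.3f} ms")
```

A larger divider (or prescaler) lengthens each tick, so the same preload value gives a proportionally longer delay.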
What is a counter?
In ‘timing’ applications, the reference clock is taken from the processor clock. But there
is another duty for timer units: to act as a counter. A counter counts external
events; as such it does not use the processor clock. For example, if the frequency of an
external square wave is to be measured, the square wave is given as an input to a specific
MCU pin, and then the timer is started. The number in the count register gets
incremented for every edge (leading or trailing) of the incoming square wave, and doing this
for, say, one second gives us the number of edges counted during that time. This is the
frequency of the incoming square wave. When the timing unit is used in this manner, it
is called an event counter.
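The frequency measurement described above can be imitated in software. In the sketch below the external pin is replaced by a list of sampled logic levels, which is purely an assumption made for illustration; a real event counter increments in hardware on each edge, without sampling by the CPU. The function names are hypothetical.

```python
def count_rising_edges(samples):
    """Count 0-to-1 transitions in a sequence of sampled pin levels."""
    edges = 0
    for prev, cur in zip(samples, samples[1:]):
        if prev == 0 and cur == 1:
            edges += 1
    return edges


def measure_frequency(samples, gate_time_s):
    """Edges counted during the gate time, divided by the gate time."""
    return count_rising_edges(samples) / gate_time_s


# Five full cycles of a square wave observed during a one-second gate.
wave = [0, 1] * 5
freq = measure_frequency(wave, 1.0)
```

Counting only one kind of edge (here the leading edge) gives one count per cycle, so the count over one second is the frequency in hertz.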
For more sophisticated MCUs, there are advanced ‘compare and capture’ units
which perform counting in a very systematic manner. It is necessary to know the details
of individual MCUs for understanding it completely. The attempt here is only to give a
‘feel’ of the functions of the timer/counter units which are found in every MCU.
2.2.6 | Watchdog Timer
A watchdog timer is an additional timer that does a ‘monitoring’ job and resets the
system, if necessary. The scenario is this. Most embedded systems are expected to be
‘self-reliant’. There is very little possibility of intervention by a human operator in case
the associated software goes awry by getting stuck in an infinite loop. Some embedded
systems are placed in inaccessible sites like factory environments, space probes, etc.
When the software is detected to be malfunctioning, the best way is to reset and start
again. Such anomalies can occur due to various reasons like deadlocks (in a multitasking
environment), a noise voltage on some pin which may cause wrong triggering and so on.
The point is that, if such a situation arises, there should be a mechanism which
detects it automatically and resets the system.
The watchdog timer is like any other timer. It can be loaded with a ‘count’ which
decrements down to zero. When it reaches zero, it resets the processor. For a system
which is doing its job correctly, the watchdog timer will never count down to zero.
Before that, the ‘correctly operating’ software will restart it periodically and reload
its original count, so as to prevent it from counting down to zero. What is the number
loaded as the ‘count’ in a WDT? This is decided by considering how much time is to be
allowed for the system to recover (on its own) before it is forcibly reset.
Figure 2.11 shows the way a watchdog timer operates. This is the case when the
WDT is external to the processor. The 8051 family of MCUs does not have an internal WDT,
but most other MCU families like PIC, AVR, ARM, etc., have it as an internal timer.
When the processor gets reset by the WDT, it is called a ‘soft reset’ or a warm boot.
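The interplay between a watchdog timer and the software that restarts it can be imitated with a small model. The class `WatchdogTimer` and its method names are hypothetical, and the reload value and loop lengths are arbitrary; the sketch only shows that periodic restarts prevent a reset, while software that stops restarting the timer is reset.

```python
class WatchdogTimer:
    """Hypothetical model of a down-counting watchdog timer."""

    def __init__(self, reload_value):
        self.reload_value = reload_value
        self.count = reload_value
        self.resets = 0               # number of processor resets issued

    def restart(self):
        """Called by correctly operating software to reload the count."""
        self.count = self.reload_value

    def tick(self):
        """One clock tick: count down and reset the processor at zero."""
        self.count -= 1
        if self.count == 0:
            self.resets += 1
            self.count = self.reload_value


wdt = WatchdogTimer(reload_value=5)

# Healthy software: restarts the watchdog every 3 ticks, before it expires.
for t in range(1, 13):
    wdt.tick()
    if t % 3 == 0:
        wdt.restart()
healthy_resets = wdt.resets

# Hung software: stuck in a loop and never restarts the watchdog.
for _ in range(5):
    wdt.tick()
hung_resets = wdt.resets
```

The reload value plays the role of the ‘count’ discussed above: it sets how long the software may remain silent before it is forcibly reset.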
2.2.7 | Real-time Clock (RTC)
Many systems need to have a clock which refers to the ‘real’ time. For example, our PCs
give us a display of the system clock which refers to real world time. This clock gives a
reading in terms of seconds, minutes and hours. In the PC, there is a dedicated IC which
keeps count of ‘real’ time in such a way that even if the PC is off, the IC keeps working
and the time reference is not lost. This implies that this IC has an inbuilt battery which
keeps the clock ticking always.
Many embedded systems also need a reference to real time. Operating systems
loaded into memory need a timing tick at regular intervals. There are a lot of time
referenced operations for which embedded systems are used. There are specialized ICs for
keeping time (like the Dallas DS series) which can be programmed to count time in
terms of seconds, minutes and hours. Of course, it is possible to count time by using
just a counter IC with a clock. But a dedicated RTC IC will make the task simpler and
the code compact. For an embedded system which needs such an RTC, such dedicated
chips may be used.
Besides that, there are some MCUs which have RTC peripherals on the chip itself.
Timing then occurs with respect to an oscillator which is derived from the system clock
frequency. For example, many ARM MCUs have an on-chip real-time clock (RTC),
which is a set of counters for measuring time when system power is on, and optionally
when it is off (by supplying a battery backup). To use this RTC, the only action needed
is to write appropriate words into the associated registers.
Figure 2.11 | A watchdog timer setup
[Block diagram: a clock drives the watchdog timer; the processor sends a Restart signal to
the watchdog timer, and the watchdog timer sends a Reset signal to the processor.]
2.2.8 | Stack
Associated with any system, there should be a stack. A stack is an area in RAM which
is used to store some important data temporarily, and from a software point of view it is
just a data structure. The operation of a stack is different from ordinary memory
operations. Only two operations are normally defined for stacks: they are PUSH and POP
(the exact instructions used may vary from processor to processor). They can be either
LIFO (Last In First Out) or FIFO (First In First Out) stacks. In the case of the former,
the data that was last pushed in is the one that can be taken out first. The data structure
of the latter is just the reverse. Most stacks are LIFO stacks.
It is obvious from the above that, in a stack, data cannot be written to or read from
random locations. The ‘top of the stack’ is the reference location, with respect to which
all accesses are done.
2.2.8.1 | Ascending/Descending Stacks
There is also another classification: the stack is either a descending or ascending type.
An ascending stack grows upwards. It starts from a low memory address and, as items
are pushed onto it, progresses to higher memory addresses. A descending stack grows
downwards. It starts from a high memory address, and as items are pushed onto it,
progresses to lower memory addresses.
Descending Stack
For any system, a stack must be defined before it can be used. Defining it amounts to
just giving an address to the stack top; this address must be in the ‘stack pointer’ (SP),
which is a processor register.
Let us have a look at the operation of a descending stack (see Figure 2.12). The stack
is in the RAM of the MCU. Let us consider the case of a RAM with 8-bit addresses. We
first load the number 0x42 (the notation 0x42 is for hex and is the same as 42H) in
SP. Thus, the beginning of the stack is fixed. Now let us push two 8-bit numbers xx and
yy (which may currently be in some registers) onto the stack. This entails decrementing
SP, writing xx into 0x41, and repeating the same for the next data byte, so that yy is
written into 0x40 and the new value of SP is 0x40. This is how the PUSH instruction
functions (see Figure 2.12a).
Figure 2.12a | PUSH operation in a descending stack
[Diagram: SP = 0x42 before the PUSH; after it, xx and yy are on the stack and SP = 0x40.]
What about taking data out from the stack? The data at the top is read first, and then
SP is incremented. Refer to Figure 2.12b. If one byte alone is to be popped out, yy is
read and loaded into a register. The content of SP is now 0x41.
Ascending Stack
For an ascending stack, the operation is just the reverse. This is the kind of stack available
in the 8051 MCU. If the stack top is defined as 0x42 (SP = 0x42), pushing two bytes
changes its content to 0x44; popping out one byte decrements it to 0x43. Figure 2.13
shows the operations for such a stack.
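Both kinds of stack can be modelled with one small class. This is a minimal sketch, assuming that a PUSH moves SP first and then stores the byte, and that a POP reads the byte at SP and then moves SP back; these assumptions reproduce the SP values quoted in the text. The class name `Stack` and the byte values 0xAA and 0xBB (standing in for xx and yy) are hypothetical.

```python
class Stack:
    """Hypothetical model of a stack kept in a small RAM."""

    def __init__(self, sp, descending, size=256):
        self.ram = [0] * size
        self.sp = sp
        self.step = -1 if descending else 1   # direction in which the stack grows

    def push(self, byte):
        self.sp += self.step        # move the stack pointer first
        self.ram[self.sp] = byte    # then store the byte at the new top

    def pop(self):
        byte = self.ram[self.sp]    # read the byte at the top first
        self.sp -= self.step        # then move the stack pointer back
        return byte


# Descending stack: SP goes 0x42 -> 0x40 on two pushes, -> 0x41 on one pop.
down = Stack(sp=0x42, descending=True)
down.push(0xAA)
down.push(0xBB)
sp_after_push = down.sp
popped = down.pop()

# Ascending stack: SP goes 0x42 -> 0x44 on two pushes, -> 0x43 on one pop.
up = Stack(sp=0x42, descending=False)
up.push(0xAA)
up.push(0xBB)
up.pop()
```

In both cases the byte pushed last is the one popped first, which is the LIFO behaviour described earlier.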
Figure 2.12b | POP operation in a descending stack
[Diagram: SP = 0x40 before the POP; after yy is popped, SP = 0x41.]
Figure 2.13a | PUSH operation in an ascending stack
[Diagram: SP = 0x42 before the PUSH; after xx and yy are pushed, SP = 0x44.]
Figure 2.13b | POP operation in an ascending stack
[Diagram: SP = 0x44 before the POP; after one byte is popped, SP = 0x43.]
The above discussion is only to give you a feel for stacks in general. More details
of stack operation, the exact instructions used, etc., should be learned with respect to a
specific processor.
Where are stacks used?
A stack is used to store data ‘temporarily’, so it finds use in storing addresses and
processor status (content of registers including the flag register) when function calls and
interrupts occur. A programmer can find various other uses of stacks when writing
intelligent and interesting programs.
2.2.9 | Interrupts
This is a very commonly used term in embedded processing, and thus it is very important
to understand it thoroughly. Let us start with the whole philosophy of this term with
respect to processor activity.
A processor in its operating state is normally performing some activity, that is, exe-
cuting a program. Now, if another more important activity is to be done, it is impera-
tive that the current activity be stopped and that the new one be taken up. For this, the
mechanism of ‘interrupts’ is needed.
When the processor is interrupted, it completes the instruction it is currently exe-
cuting and takes up the new task which it has been asked to do. This is actually a tem-
porary matter, because once the interrupting task is done, the processor must return and
continue from where it had been interrupted.
So there are a number of matters to be settled before ‘interrupting’ gets through.
i) The processor must save its current status which means that the content of registers
it is currently using must be saved. It must also save the address of the next instruc-
tion in the sequence, so that it can return to the right point at which it had left off.
ii) It should then ‘branch’ to the program that the interrupting device wants to get
executed.
iii) After executing this, it should ‘return’ to the earlier program and resume execution
from where it had left off.
Let us see how all this is done.
i) When a processor is interrupted, it saves the value of the working registers on the
stack (Section 2.2.8), and also the current content of the program counter (which
will point to the next instruction in the sequence).
ii) The program counter will then be loaded with the address of the first instruction of
the interrupt program. The interrupt program is designated as an ‘Interrupt Service
Routine’ (ISR) or ‘Interrupt handler’.
iii) The first instruction of the ISR is available at the ‘interrupt vector’. The word ‘vector’
in this context means ‘address’. For a particular interrupting device, there is a specific
location at which its ISR is stored (in memory). This is its interrupt vector.
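The sequence listed above (save the return address, branch to the vector, return) can be traced with a toy model. The class `Cpu` is hypothetical and deliberately minimal: it saves only the program counter, whereas a real processor or its ISR would also save the working registers. The addresses used are example values.

```python
class Cpu:
    """Hypothetical processor model that tracks only PC and a stack."""

    def __init__(self, pc=0):
        self.pc = pc
        self.stack = []

    def interrupt(self, vector):
        self.stack.append(self.pc)   # save the address of the next instruction
        self.pc = vector             # branch to the interrupt vector

    def return_from_interrupt(self):
        self.pc = self.stack.pop()   # resume the interrupted program


cpu = Cpu(pc=0x0123)           # main program is about to execute 0x0123
cpu.interrupt(vector=0x0003)   # an interrupt arrives
pc_in_isr = cpu.pc             # the ISR now runs from the vector
cpu.return_from_interrupt()    # last instruction of the ISR
```

After the return, the program counter holds the saved address again, so execution continues exactly where it left off.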
What are the ways by which a processor can be interrupted?
i) It can be done by hardware, where a signal level on an ‘interrupt pin’ can be activated.
For example, if a keyboard wants to cause an interrupt, it can activate an interrupt
pin of the processor. See Figure 2.4 for the 8051, in which pin 12 is an interrupt
pin. For most MCUs, interrupts are vectored, which means that there is a definite
and pre-defined address at which the ISR is expected to be. For the interrupt on
INT0 the ‘vector’ is 0x0003 (see Table 2.1), and so control branches to that location,
and the ISR therein will be executed to completion. The last instruction in any ISR
is a return instruction, and that will take control back to the ‘interrupted program’.
ii) In MCUs, there are a number of inbuilt peripherals, and most of these
peripherals have interrupts associated with them. For example, consider the case of the
timer, which is a standard peripheral in almost all MCUs. The timer has a register
which keeps counting by increasing/decreasing the number in the register. When
it reaches its limit, called the ‘terminal count’, a flag is set in another register. Along
with this, an interrupt can also be generated. This means that control branches to its
‘vector’ and an ISR is executed from there. For the 8051, the interrupt vector of its timer 0
is 0x000B (Table 2.1).
iii) There is also the facility to write instructions which are called software interrupts.
This is more applicable to general purpose CPUs. The x86 series can be interrupted
by the instructions INT 0, INT 1, …, INT 255.
Why are interrupts very important in embedded processing?
Interrupts are extremely important in embedded systems. In fact, interrupting is ‘the’ way
by which the processor interacts with the real world.
Think of a system in which a number of input peripherals are present. Each of these
peripherals can be made to act only on the basis of interrupts. In this case, the processor can
be simply kept in a waiting loop, or doing some activity. Any peripheral that needs service
can interrupt and get its ISR executed, after which the processor goes back to its
waiting loop until the next peripheral wants service and interrupts it, and so on.
Note that many issues can come up here, like two (or more) peripherals interrupting
at the same time. How do you think this will be handled? Obviously one of them should
be defined as a more important peripheral and given priority. A well-designed system
will take care of all such possibilities and eventualities. Most systems have an ‘interrupt
controller’ called a PIC (Programmable Interrupt Controller), which is dedicated hardware
taking care of multiple interrupts by assigning and resolving priorities, fixing up
interrupt vectors and the like. Many embedded processors have such interrupt
controllers inside them, while general purpose systems have external chips.
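Priority resolution between simultaneous requests can be illustrated in a few lines. The sketch assumes a fixed priority order, which is only one of several schemes an interrupt controller may offer; the function `next_interrupt` and the peripheral names are hypothetical.

```python
def next_interrupt(pending, priority):
    """Return the pending source with the highest priority, or None."""
    for source in priority:          # priority list is ordered, highest first
        if source in pending:
            return source
    return None


# Hypothetical priority order, highest first.
PRIORITY = ["keyboard", "timer", "serial"]

# The timer and the serial port request service at the same time.
first = next_interrupt({"serial", "timer"}, PRIORITY)

# Nothing is pending: the processor stays in its waiting loop.
idle = next_interrupt(set(), PRIORITY)
```

The request that loses the arbitration stays pending and is serviced once the higher priority ISR has finished.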
Can reset be considered as an interrupt?
All processors have a reset pin. On being reset, control branches to a particular address
(usually 0 for most MCUs) and this address is called the ‘reset vector’. So, considering
reset as a hardware interrupt seems quite reasonable.
2.2.9.1 | The Interrupt Structure of 8051
Now, let us take a look at the interrupt structure of the 8051. This will help to bring
an additional level of clarity to our discussion on interrupts.
The 8051 has 5 interrupts excluding reset. Two are external hardware interrupts, INT0 and
INT1, which use pins 12 and 13, respectively. Note that they are active low
interrupts. An external device can be connected to these pins and cause the processor to be
interrupted by making the pin low for a minimum time as specified in the manual of the
chip. The interrupt vectors of these hardware interrupts are listed in Table 2.1.
There are three other interrupts for this MCU. Two are timer interrupts. For each of
the timers, when its terminal count is reached, the timer flag is set and a corresponding
interrupt is also activated. Whatever action needs to be taken when the timer
count is reached can be included in the corresponding ISR.
The other interrupt is the serial communication interrupt. As a matter of fact, this
interrupt pertains to both serial transmission and reception. Whenever a byte is
received or transmitted, this interrupt is activated and control branches to its interrupt
vector (Table 2.1).
For all the five interrupts, it is up to the user to decide whether the interrupt
mechanism is to be used or not. This means that there are ways and means to disable them
if they are not to act. This and many other options can be exercised by using the bits of
interrupt related registers.
The above discussion is about the interrupt structure of the 8051, but most MCUs have
a similar structure. Of course, where there are more peripherals, you can expect to find
more interrupts.
Now look at Table 2.1 once again. Note the interrupt vectors. You will find there is
only an 8-byte space for the ISR of INT0. Will that space be sufficient to store a program
which needs to take care of an external peripheral? Quite unlikely. So usually, in the
vectored location, a jump instruction is written and then control gets re-directed to some
other address in memory. This is so for all the interrupt vectors.
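The space available at each vector follows directly from the addresses in Table 2.1. The short calculation below simply subtracts consecutive vector addresses; the dictionary name `VECTORS` is a hypothetical choice for this illustration.

```python
# Interrupt vectors of the 8051, as listed in Table 2.1.
VECTORS = {
    "Reset": 0x0000,
    "INT0": 0x0003,
    "Timer 0": 0x000B,
    "INT1": 0x0013,
    "Timer 1": 0x001B,
    "Serial R/T": 0x0023,
}

# Bytes available at each vector before the next vector begins.
addresses = sorted(VECTORS.values())
gaps = [high - low for low, high in zip(addresses, addresses[1:])]

for name, gap in zip(VECTORS, gaps):
    print(f"{name}: {gap} bytes before the next vector")
```

The reset vector has only 3 bytes before the INT0 vector, and each of the following interrupt vectors has 8 bytes, which is why a jump to a larger routine elsewhere in memory is normally placed there.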
2.2.10 | DMA
DMA stands for ‘direct memory access’. So far, we have seen the processor communicat-
ing with either memory or I/O, that is, any communication always involves the proces-
sor. Is it possible for data in memory to be sent directly to I/O or vice versa without
involving the processor? For example, we might need to print a large chunk of data
which is in memory. This data can be sent to the output device (printer) directly from
memory. In this case, there is no necessity to involve the processor in the data transfer
and the data transfer is faster. This is called ‘direct memory access’.
However, since the processor is in the system, it must be isolated from this process.
This means that, when DMA is to take place, the connection of the processor to memory
and I/O is blocked, that is, the buses of the processor are tri-stated (put in the high
impedance state), leaving the path open for data to be transferred directly between I/O
and memory. Figure 2.14 shows the typical scenario. Data transfer occurs between the
memory and I/O, in either direction. The processor cannot do any bus-related operation,
as its buses are in the float state.
Table 2.1 | Interrupt Vectors of 8051
Sl. No.   Interrupt     Vector
1         Reset         0x0000
2         INT0          0x0003
3         Timer 0       0x000B
4         INT1          0x0013
5         Timer 1       0x001B
6         Serial R/T    0x0023
A system which uses DMA should have a DMA controller which co-ordinates
this activity. When a DMA operation is to be done, the DMA controller places a DMA
request on the ‘Bus Request’ line of the processor. This is essentially a request for
permission to take control of the system bus. Since DMA is a high priority service, the
processor stops whatever it is doing and acknowledges the request by activating the ‘Bus Grant’
signal, and then DMA is initiated.
Except for low end applications, most embedded systems require DMA. The whole
process is managed and controlled by a DMA controller, usually available inside the
embedded processor. A number of registers are also needed to set up and carry out the DMA
operation. The code for it will set up source and destination pointers denoting the
starting addresses of where the data is coming from and where it is to be moved to. A counter
must be programmed to track the number of bytes in the transfer and give an indication
when the transfer is complete.
DMA is a standard feature in embedded and general purpose computing systems,
and the respective peripheral buses have pins for initiating and controlling the operation
of direct memory access. The transfer of data can be between I/O and memory, and also
from memory to memory. Having this kind of data transfer facility gives a significant
speed advantage, compared to what would have been obtained by writing code to get the
CPU to do it.
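The roles of the source pointer, the destination pointer and the byte counter can be seen in a small software model of a memory-to-memory transfer. This is only an illustration: a real DMA controller performs these steps in hardware after its registers have been programmed, and the function `dma_transfer` is a hypothetical name.

```python
def dma_transfer(memory, src, dst, count):
    """Copy `count` bytes within `memory`; return True when the counter hits 0."""
    while count > 0:
        memory[dst] = memory[src]   # move one byte from source to destination
        src += 1                    # advance the source pointer
        dst += 1                    # advance the destination pointer
        count -= 1                  # one byte fewer left to transfer
    return count == 0               # 'transfer complete' indication


mem = bytearray(32)
mem[0:4] = b"DATA"                  # block to be moved, starting at address 0
done = dma_transfer(mem, src=0, dst=16, count=4)
```

The three arguments correspond to the registers mentioned above: two pointers that mark where the data comes from and where it goes, and a counter that signals completion.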
2.2.11 | Communication Ports
In this context, communication means ‘serial communication’. All MCUs have the
UART (Section 5.3.3) as a standard peripheral. There are registers, pins and interrupts
associated with this peripheral. See Figure 2.4 in which the 8051 has pins 10 and 11 for
serial reception and transmission. The pins are TXD for transmission and RXD for
reception. Serial communication, with a baud rate decided by the register settings and
the processor clock frequency, is possible with this communication setup.
Figure 2.14 | Processor buses in float state during DMA operation
[Block diagram: the processor receives Bus Request and issues Bus Grant; its connection to
the system bus floats while memory and I/O exchange data over the system bus.]
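How register settings and clock frequency together fix the baud rate can be shown with one commonly quoted 8051 case. The formula below is not given in this chapter; it is the usual relation for serial mode 1 with Timer 1 in 8-bit auto-reload mode on a classic 8051, and it should be checked against the data sheet of the actual device. The function name `baud_rate_8051` is hypothetical.

```python
def baud_rate_8051(f_osc_hz, th1, smod=0):
    """Baud rate for serial mode 1 with Timer 1 in 8-bit auto-reload mode.

    Assumes a classic 8051: the timer ticks once per machine cycle
    (12 clocks), overflows every (256 - th1) ticks, and the overflow
    rate is divided by 32 (or by 16 when the SMOD bit is set).
    """
    return (2 ** smod) * f_osc_hz / (32 * 12 * (256 - th1))


# A widely used combination: 11.0592 MHz crystal with TH1 = 0xFD.
baud = baud_rate_8051(11.0592e6, th1=0xFD)
print(f"baud rate = {baud:.0f}")
```

Changing either the reload value in the timer register or the crystal frequency changes the resulting baud rate.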
More sophisticated MCUs have other communication ports as well. You may see
ports and peripheral controllers for protocols like SPI, I2C, Bluetooth, USB, etc., on
some of the advanced MCUs like ARM, PIC, etc. The principles of these protocols are
dealt with in Chapter 5.
2.3 | Memory for Embedded Systems
In general purpose computing systems, memory is outside the processor chip. Consider
a PC with a Pentium processor. It has primary memory in the form of RAM and ROM,
which are on the motherboard of the PC, but outside the processor. But for an
embedded processor (MCU), we know that RAM and ROM are inside the processor chip. The
amount of such memory available varies from chip to chip, and a designer is expected to
choose a processor which has a sufficient amount of memory available on-chip, as required
for the application. But in case the memory requirements are much larger than what can
be provided on-chip, adding external memory is possible. Now, before we go into such
issues, a brief review of different kinds of semiconductor memory devices is in order.
2.3.1 | Semiconductor Memory
First let us discuss some general aspects of semiconductor memories. Data is stored in
memory, and it is usually defined to be ‘byte oriented’ (in most cases). This means that
one address corresponds to one byte of memory. Thus, when one address is accessed, one
byte is either read from or written into it. This implies that to read or write a word of
four bytes, four consecutive locations with four addresses have to be accessed.
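Accessing a four-byte word in byte-oriented memory can be sketched as below. The byte order chosen here (least significant byte at the lowest address) is an assumption made for the example; the text does not specify one, and processors differ on this point. The function name `read_word` is hypothetical.

```python
def read_word(memory, address):
    """Assemble a 32-bit word from four consecutive byte addresses."""
    word = 0
    for offset in range(4):
        # Assumed little-endian order: lowest address holds the lowest byte.
        word |= memory[address + offset] << (8 * offset)
    return word


# Four consecutive byte locations, starting at address 0.
mem = bytes([0x78, 0x56, 0x34, 0x12])
value = read_word(mem, 0)
print(hex(value))
```

Four separate byte accesses are needed to build the one word, which is the point made in the paragraph above.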
There are also memory devices in which memory is organized as 16 bits. In such a
case, each 16-bit data has one address.
Reading and writing take a certain amount of time which is termed ‘memory access
time’. For reading, this is the time interval from the instant the address is placed at the
address pins to the time the data is available at the data pins. A similar definition applies
to write access time as well. The access time of any memory device depends on the
technology involved. Another term frequently encountered is ‘memory cycle time’. This
is the time interval between two consecutive memory accesses. These terms will be used
frequently throughout this chapter.
2.3.2 | Random Access Memory (RAM)
The word RAM stands for ‘Random Access Memory’ in which any location can be
accessed randomly (in contrast to serial access like in the case of magnetic tapes) with the
same latency (delay). It is a volatile memory, which means that when power is removed, the
stored data is lost. There are different types of RAM depending on the technology used.
The fastest and hence the most expensive RAM is SRAM, which stands for ‘Static RAM’.
2.3.2.1 | Static RAM (SRAM)
Each cell holds either a ‘0’ or a ‘1’, and this content is static, because the content is stored
as a voltage, which does not change with time. Memory is realized using MOSFETs and
the most commonly used type of memory cell (for storing one bit) needs six transistors and is called the 6T cell. Figure 2.15 shows this cell. It has two cross-coupled NMOS transistors (N1 and N2) acting as a bistable multivibrator and two PMOS transistors (P1 and P2) acting as their loads. The output of N1 or N2 can be considered as the cell output. (In many cases, a differential output is taken, which is better for noise immunity.)
The access transistors N3 and N4, the word line WL and the bit lines BL and BL-bar (the complement of BL) are used to read from and write to the cell. Normally (when no reading or writing is required), the word line is low, turning the access transistors (N3 and N4) off. To write into the cell, the logic data is placed on the bit line BL and the complementary data on the inverse bit line, BL-bar. Then the access transistors are turned on by setting the word line high. As the driver of the bit lines is strong, it will force this information onto the cross-coupled transistors. As soon as the information is thus stored in the bistable multivibrator, the access transistors can be turned off and the information in the cell is preserved.
For reading, the word line is turned on to activate the access transistors while the information is sensed at the bit lines.
This (Figure 2.15) is the cell for storing a single bit; a byte needs 8 such cells, that is, 8 × 6 = 48 transistors for just a single byte. This is one of the reasons why SRAM is very expensive.
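The behaviour of the cell described above can be sketched in a few lines of Python. This is a purely logical model, not a circuit simulation: the class name `SramCell` and its methods are hypothetical, and the strong bit-line driver is reduced to a simple assignment.

```python
class SramCell:
    """Behavioural model of one 6T SRAM cell (not a circuit simulation)."""

    TRANSISTORS = 6  # N1, N2 (latch), P1, P2 (loads), N3, N4 (access)

    def __init__(self) -> None:
        self.q = 0  # state held by the cross-coupled pair

    def write(self, word_line: int, bl: int) -> None:
        # The bit lines carry the data and its complement; the latch
        # follows only while the word line turns the access transistors on.
        if word_line:
            self.q = bl & 1

    def read(self, word_line: int):
        # Returns (BL, BL-bar) when selected, or None when isolated.
        if not word_line:
            return None
        return (self.q, self.q ^ 1)


byte_cells = [SramCell() for _ in range(8)]
for bit, cell in enumerate(byte_cells):
    cell.write(word_line=1, bl=(0xA5 >> bit) & 1)

value = sum(cell.read(word_line=1)[0] << bit for bit, cell in enumerate(byte_cells))
transistors = len(byte_cells) * SramCell.TRANSISTORS
print(hex(value), transistors)
```

The transistor count printed for the byte matches the 8 × 6 = 48 figure worked out in the text.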
2.3.2.2 | An SRAM Chip
Figure 2.16 shows a typical SRAM chip which has N address lines, 8 data lines and control signals for reading and writing. Only when the CE (Chip Enable) line is activated will the chip become usable (selected or enabled). WR is the write control signal and OE (‘output enable’) is the read control signal: it enables the data lines for reading.
Now, observe the read and write timings for SRAM (Figures 2.17 and 2.18), for which the steps are listed. Note that the timing is ‘asynchronous’ because there is no reference clock.
Figure 2.15 | An SRAM cell storing one bit (labels: VDD, GND, PMOS loads P1 and P2, NMOS transistors N1 to N4, bit lines BL and BL-bar, word line WL)
2.3.2.3 | Memory Read Cycle
The steps in a read cycle of SRAM are as follows:
i) Place the address of the byte to be read on the address bus.
ii) Ensure that the chip is activated by making CE low.
iii) Activate the OE pin, which serves as the read (RD) control. This ensures that data is read.
iv) The required data then appears on the data bus.
Figure 2.16 | An SRAM chip with control signals (address lines A0 to AN–1, data lines D0 to D7, control pins CE, WR and OE)
Figure 2.17 | Asynchronous read timing for SRAM (signals: Address, CE, OE, Data Out; intervals: tAA, tRC)
Figure 2.18 | Asynchronous write timing for SRAM (signals: Address, Data, CS, WR; intervals: Data Setup, Data Hold)
In the timing diagram, two timing figures are shown. One is tAA, the read access time. This is the time measured from the instant the address is placed on the address bus to the point in time when the required data is available on the data pins.
The other timing figure is tRC, the read cycle time, which is the minimum time between two read cycles. These two timing figures can be equal for SRAM because, as soon as data is available at the data pins, it can be read by the processor, and a new read cycle may be initiated. (This is pointed out here to contrast it with DRAM timing, where we will see another element of delay.)
2.3.2.4 | Memory Write Cycle
A write cycle has a similar timing. The steps in writing are as follows:
i) Place the address of the byte to be written on the address bus.
ii) Ensure that the chip is activated by making CE low.
iii) Place the data to be written on the data bus.
iv) Activate the WR line. Only then is the data considered to be valid.
v) The data then gets written into the addressed location.
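The read and write steps can be tried out on a small behavioural model. The sketch below is only an illustration: the class `SramChip` and its `cycle` method are hypothetical names, timing is ignored, and CE, OE and WR are assumed to be active-low pins (0 means asserted), following the ‘making CE low’ wording of the steps.

```python
class SramChip:
    """Behavioural sketch of a byte-wide asynchronous SRAM.

    CE, OE and WR are modelled as active-low pins (0 = asserted).
    """

    def __init__(self, address_lines: int) -> None:
        self.cells = bytearray(1 << address_lines)  # 2**N bytes

    def cycle(self, address: int, ce: int, oe: int = 1, wr: int = 1, data: int = 0):
        if ce != 0:
            return None  # chip not selected: data pins are not driven
        if wr == 0:
            self.cells[address] = data & 0xFF  # write cycle
            return None
        if oe == 0:
            return self.cells[address]  # read cycle
        return None  # selected, but neither reading nor writing


ram = SramChip(address_lines=10)           # 1 KB device
ram.cycle(0x3F, ce=0, wr=0, data=0x5A)     # write 0x5A to address 0x3F
print(hex(ram.cycle(0x3F, ce=0, oe=0)))    # read it back
print(ram.cycle(0x3F, ce=1, oe=0))         # chip disabled: nothing is read
```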
What are the merits and de-merits of SRAM?
To achieve low power consumption, CMOS is typically used for SRAM technology. SRAM uses less power than DRAM (which we will discuss in the following sections), but at high frequencies it can also consume significant amounts of power. Because each cell typically needs six transistors, SRAMs are quite expensive. They are as fast as typical CPUs because they use the same technology as the CPU. Any MCU will have a certain amount of SRAM inside it, referred to as ‘on chip RAM’. For many MCUs, this SRAM includes the internal registers, which are arranged as ‘banks’.
2.3.2.5 | Synchronous SRAM (SSRAM)
Ordinary SRAMs are asynchronous in the sense that there is no clock signal for timing the read and write operations. Synchronous SRAM (called SSRAM) has been produced for high speed applications. In such SRAMs, the processor clock provides the timing for read and write. SSRAMs are used, for example, as the ‘cache’ in PowerPC and Pentium-based workstations.
2.3.3 | Dynamic RAM (DRAM)
Another very popular and widely used RAM is ‘dynamic RAM’. It is designated as dynamic because its content does not remain unchanged or static as in SRAM, and hence frequent ‘refreshing’ is necessary.
To understand this point, let us see what is contained in a typical DRAM cell (Figure 2.19). A DRAM memory cell consists of a single field effect transistor (FET) and a capacitor. It is the amount of charge stored in the capacitor that decides whether the cell stores a ‘1’ or a ‘0’. One of the problems with this arrangement is that the capacitor does not hold charge indefinitely: the charge in a capacitor ‘leaks off’ and needs to be replenished. This action of replenishing the charge that gets lost is done by ‘refreshing’
the cell at regular intervals. The data is sensed and written back, which ensures that any leakage is overcome and the data is reinstated. The two lines, the Word Line and the Bit Line, are connected as shown, so that the required bit within the memory can be selected to be read or written.
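To get a feel for why refreshing is needed, the leakage can be approximated as an exponential decay of the capacitor voltage. This is a deliberately simplified model, and every number below (initial voltage, sense threshold, time constant) is an assumed value chosen only for illustration, not a figure from any datasheet.

```python
import math


def cell_voltage(v0: float, t_ms: float, tau_ms: float) -> float:
    """Voltage left on the storage capacitor after t_ms of leakage."""
    return v0 * math.exp(-t_ms / tau_ms)


def max_refresh_interval(v0: float, v_threshold: float, tau_ms: float) -> float:
    """Longest wait before a stored '1' decays to the sense threshold."""
    return tau_ms * math.log(v0 / v_threshold)


V0 = 1.8          # assumed voltage of a freshly written '1'
V_TH = 0.9        # assumed sense threshold
TAU_MS = 200.0    # assumed leakage time constant

limit = max_refresh_interval(V0, V_TH, TAU_MS)
print(f"refresh needed within {limit:.1f} ms")
print(f"voltage after 64 ms: {cell_voltage(V0, 64.0, TAU_MS):.3f} V")
```

As long as the cell is refreshed before the computed limit, the sensed value is still a valid ‘1’ and can be written back at full strength.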
In a DRAM chip, there are multitudes of such cells which form words consisting of bits. Refreshing of the cells is done a whole row at a time, in a particular sequence. Memory addresses are decoded and converted into rows and columns of the matrix in which the memory elements are arranged.
2.3.3.1 | Read Cycle of DRAM
Let us see the steps involved in a typical memory read of a DRAM chip. Recollect that a processor, when addressing memory, sends the complete address on its address pins. Between the processor and a DRAM chip, there is a memory controller whose function is to split the address into two parts, the row address and the column address. A DRAM chip has only half as many address pins as there are bits in the address supplied by the processor, because the address lines of the DRAM chip are multiplexed in time for the row and column addresses. The memory controller should also generate the signals necessary for reading and writing to the DRAM.
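The address split performed by the memory controller can be sketched as follows. The function name `split_address` is hypothetical, and the choice of upper bits for the row and lower bits for the column is an assumption; a real controller may map the bits differently.

```python
def split_address(address: int, row_bits: int, col_bits: int):
    """Split a flat address into the (row, column) pair sent to the DRAM."""
    if address >> (row_bits + col_bits):
        raise ValueError("address does not fit in row_bits + col_bits")
    row = address >> col_bits                  # upper bits select the row
    column = address & ((1 << col_bits) - 1)   # lower bits select the column
    return row, column


# A 20-bit processor address needs only 10 multiplexed DRAM address pins.
row, column = split_address(0xABCDE, row_bits=10, col_bits=10)
print(hex(row), hex(column))
```

The two halves are sent one after the other on the same pins, which is why the pin count of the chip is halved.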
Figure 2.20 displays the memory controller of a DRAM. Because the row and column address information is placed on the same address lines (multiplexed in time), the pin count of the DRAM chip is reduced. DRAM chips are large rectangular arrays of memory cells with support logic that is used for reading and writing data in the arrays, and refresh circuitry to maintain the integrity of stored data. Memory arrays are arranged in rows and columns of memory cells called word lines and bit lines, respectively. Each memory cell has a unique location or address defined by the intersection of a row and a column.
Figure 2.19 | A DRAM cell (labels: Word Line, Bit Line, Gnd)
Figure 2.20 | Memory control for DRAM (blocks: Processor, DRAM Controller, DRAM; signals: Address, Row/Column Address, Data, Clock, RAS, CAS, CS, R/W)
Let us see the steps in a typical read cycle of DRAM. Refer to the internal diagram of a DRAM chip (Figure 2.21) and the timing diagram in Figure 2.22.
i) The row address is placed on the address lines and given sufficient time to stabilize and be latched.
ii) The row address strobe (RAS) signal is then activated.
iii) The row address decoder selects the proper row.
iv) Next, the column address is placed on the same address lines and allowed to stabilize and be latched.
v) The column address strobe (CAS) signal is then activated.
vi) The CAS pin also serves as the output enable, so once the CAS signal has stabilized, the sense amplifiers place the data from the selected row and column on the data bus.
vii) With this, the data in the selected address is available at the output buffers of the chip, and it is transferred to the data bus.
viii) Before the read cycle can be considered complete, CAS and RAS must return to their previous state.
Figure 2.21 | Internal diagram of a DRAM chip (blocks: Row Address Latch, Row Address Decoder, Column Address Latch, Column Address Decoder, Memory Array, Sense and Refresh Amplifiers; signals: Address Bus, Data Bus, RAS, CAS)
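The read steps listed above can be mimicked with a small behavioural model. This is a sketch under simplifying assumptions: the class `DramChip` and its methods are hypothetical, each cell holds one bit, and timing, refresh and charge leakage are not modelled.

```python
class DramChip:
    """Behavioural sketch of an asynchronous DRAM with multiplexed address pins."""

    def __init__(self, row_bits: int, col_bits: int) -> None:
        self.array = [[0] * (1 << col_bits) for _ in range(1 << row_bits)]
        self.row_latch = None
        self.row_buffer = None  # contents held by the sense amplifiers

    def ras(self, row_address: int) -> None:
        # Steps i to iii: latch the row address and sense the whole row.
        self.row_latch = row_address
        self.row_buffer = list(self.array[row_address])

    def cas(self, column_address: int) -> int:
        # Steps iv to vii: latch the column address and drive the data out.
        if self.row_buffer is None:
            raise RuntimeError("CAS asserted before RAS")
        return self.row_buffer[column_address]

    def precharge(self) -> None:
        # Step viii: RAS and CAS return to their inactive state.
        if self.row_latch is not None:
            self.array[self.row_latch] = self.row_buffer  # sensed data restored
        self.row_latch = None
        self.row_buffer = None


dram = DramChip(row_bits=4, col_bits=4)
dram.array[3][9] = 1            # preload one bit for the demonstration

dram.ras(3)                     # row address first
bit = dram.cas(9)               # then the column address on the same pins
dram.precharge()
print(bit)
```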
This is a conventional asynchronous read, because the timing signals are not tied to the main system clock. The access time (tRAC) is the time from the instant the RAS signal is activated to the time the data is available on the data bus. The read cycle time (tRC) is also shown in the diagram. Observe that another time, tRP, is included within this read cycle time. The total read cycle time is the sum of the ‘RAS active time’ and the ‘RAS pre-charge time’. The first corresponds to the time during which the RAS signal is active (low).
What is the ‘RAS pre-charge time’ (tRP)?
It is the additional time needed before a new read (or write) cycle can be started. This is because there is a parasitic capacitance associated with each cell, and it must be pre-charged high before any operation is commenced. The access time is also referred to as latency. This applies to write cycles also.
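A short worked example shows how the pre-charge time stretches the cycle beyond the access time. The timing values below are assumed round numbers chosen for the arithmetic, not figures from a particular device.

```python
def read_cycle_time_ns(t_ras_ns: int, t_rp_ns: int) -> int:
    """tRC = RAS active time + RAS pre-charge time."""
    return t_ras_ns + t_rp_ns


T_RAC_NS = 60   # assumed access time from RAS
T_RAS_NS = 60   # assumed RAS active time
T_RP_NS = 40    # assumed RAS pre-charge time

t_rc = read_cycle_time_ns(T_RAS_NS, T_RP_NS)
reads_per_second = 1_000_000_000 // t_rc
print(t_rc, reads_per_second)
```

With these numbers the data is available 60 ns after RAS, yet a new read cannot begin until 100 ns have elapsed, which is the contrast with SRAM noted earlier.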
One important merit of DRAM is that its packing density is very high compared to SRAM. Note that for DRAM, one-bit storage requires only one transistor (and a capacitor), while for SRAM typically six (and at least four) transistors are required.
Refreshing
What about the refresh rate? It varies, but typically manufacturers specify that each row should be refreshed every 64 ms. This time interval falls in line with the JEDEC (Joint Electron Device Engineering Council) standards for dynamic RAM refresh periods. How is refreshing done? There are many methods for refresh, and one commonly used method is called ROR (RAS Only Refresh). In practice, it is done by activating each row using RAS.
Figure 2.22 | Read timing diagram for DRAM (signals: RAS, CAS, Address with Row and Column phases, Data)
tRAC = Access Time from RAS
tCAC = Access Time from CAS
tRC = Read Cycle Time
tRP = RAS Precharge Time
The DRAM controller takes care of scheduling the refreshes and making sure that they do not interfere with regular reads and writes. So, to keep the data in a DRAM chip from leaking away, the DRAM controller periodically sweeps through all of the rows by cycling repeatedly and placing a series of row addresses on the address bus. This is the ROR (RAS Only Refresh) method mentioned above. To reduce the number of refresh cycles, one method of design is to split the address such that there are fewer rows and more columns. The DRAM array is then a rectangular array rather than a square one.
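The effect of the row count on refresh can be worked out numerically. The sketch below assumes that refreshes are spread evenly over the 64 ms period and that each row refresh occupies one cycle of an assumed 100 ns; the two array geometries are likewise assumptions made for the comparison.

```python
REFRESH_PERIOD_NS = 64_000_000  # 64 ms, the refresh period quoted in the text


def row_refresh_interval_ns(rows: int) -> int:
    """Average spacing between row refreshes when they are spread evenly."""
    return REFRESH_PERIOD_NS // rows


def refresh_overhead(rows: int, t_rc_ns: int) -> float:
    """Fraction of time the DRAM is busy refreshing (one cycle per row)."""
    return rows * t_rc_ns / REFRESH_PERIOD_NS


# The same 16 Mbit capacity organized two ways (assumed geometries).
for rows, cols in ((4096, 4096), (2048, 8192)):
    interval = row_refresh_interval_ns(rows)
    overhead = refresh_overhead(rows, t_rc_ns=100)
    print(rows, cols, interval, f"{overhead:.4%}")
```

Halving the number of rows halves the number of refresh cycles per period, which is the motivation for the rectangular array.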
2.3.3.2 | Synchronous DRAM (SDRAM)
You might have noted the word ‘asynchronous’ when we talked about the read timing cycle of DRAM. This meant that the access timing is not related to the system clock at all. Around 1996, a new type of DRAM started making headway in the memory arena; this technological innovation is called Synchronous DRAM (SDRAM). For this type of DRAM, accesses are synchronized with the system clock, and SDRAM is currently ‘the’ RAM that is used as the primary (main) memory in general purpose computer systems.
Embedded processors do not have SDRAM inside them. But many embedded boards do have SDRAM as external memory.
Now, let us see what SDRAM has to offer in terms of improvements over asynchronous DRAM. Technologically, they are similar, but SDRAM has incorporated certain new features and modes of operation.
Synchronous Operation: All operations are synchronized to the rising edge of the system clock, and thus control is made easier.
Mode Register: There is a command register for this RAM into which command words are written to specify various operating modes and also generate various control signals. This was an entirely new concept for memory chips, and it allows a level of control based on the level of performance needed from the RAM. In effect, it makes the RAM performance ‘programmable’.
2.3.3.3 | Synchronous vs Asynchronous DRAMs
Let us conclude by saying that in terms of the basic technology and principle of operation of a basic DRAM memory cell, both types are the same, but SDRAM scores higher because of the way it is used. Since asynchronous DRAM does not share any sort of common clock signal with the CPU, the controller chips have to manipulate the DRAM’s control pins based on all sorts of timing considerations. SDRAM, however, shares the bus clock with the CPU. Commands (that is, certain predefined combinations of signals) can be placed on its control pins on clock edges.
A significant difference between conventional DRAM and SDRAM is the way in which memory access is executed. In a standard DRAM, the toggling of the external control inputs has a direct effect on the internal memory array. In an SDRAM, the input signals are latched into a control logic block which functions as the input to a state machine. Therefore, the state machine actually controls the memory access. Basic operations of the SDRAM, such as read, write and refresh, are initiated by loading control commands into the device.
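The idea of a command being a ‘predefined combination of signals’ can be illustrated with a lookup table. The encoding below follows the command truth table commonly published in single-data-rate SDRAM datasheets; it is not taken from this text, so treat it as an assumption and check it against the datasheet of the actual device. All four pins are active low and are sampled on a rising clock edge.

```python
# (CS#, RAS#, CAS#, WE#) sampled on a rising clock edge; 0 = low, 1 = high.
COMMANDS = {
    (0, 1, 1, 1): "NOP",
    (0, 0, 1, 1): "ACTIVE",              # open (activate) a row
    (0, 1, 0, 1): "READ",
    (0, 1, 0, 0): "WRITE",
    (0, 1, 1, 0): "BURST TERMINATE",
    (0, 0, 1, 0): "PRECHARGE",
    (0, 0, 0, 1): "AUTO REFRESH",
    (0, 0, 0, 0): "LOAD MODE REGISTER",
}


def decode_command(cs: int, ras: int, cas: int, we: int) -> str:
    """Return the command the control logic would latch on this clock edge."""
    if cs == 1:
        return "COMMAND INHIBIT"  # chip not selected: inputs are ignored
    return COMMANDS.get((cs, ras, cas, we), "UNDEFINED")


for pins in ((0, 0, 1, 1), (0, 1, 0, 1), (0, 0, 1, 0), (1, 0, 0, 0)):
    print(pins, decode_command(*pins))
```

The decoded command is what drives the internal state machine, rather than the pin transitions acting on the array directly.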
2.3.3.4 | DDR
This stands for ‘Double Data Rate’ SDRAM, and the difference it has from regular SDRAM is that it transfers data at both the rising and falling edges of the clock, instead of just at the rising edge. This doubles the peak data rate, and hence the designation DDR. It achieves higher throughput by using differential pairs of signal wires to allow faster signalling without noise and interference problems. DDR SDRAM first came to market in 2000, but it did not really catch on until 2001 with the advent of mainstream motherboards and chipsets supporting it. DDR found initial support in the graphics card market and since then has become the mainstream PC memory standard. As such, DDR SDRAM is supported by all the major processor and memory manufacturers.
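The doubling of the data rate is easy to verify with a short calculation. The clock frequency and the 64-bit bus width used below are assumed example values; only the factor of two between the two cases comes from the text.

```python
def peak_bandwidth_mb_s(clock_mhz: int, bus_width_bits: int, transfers_per_clock: int) -> int:
    """Peak transfer rate in megabytes per second."""
    return clock_mhz * transfers_per_clock * bus_width_bits // 8


# Same 100 MHz clock and 64-bit bus; only the transfers per clock differ.
sdr = peak_bandwidth_mb_s(100, 64, transfers_per_clock=1)
ddr = peak_bandwidth_mb_s(100, 64, transfers_per_clock=2)
print(sdr, ddr)
```

These are peak figures only; the row and column latencies of the DRAM array still apply to every new access.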
DDR2 and DDR3: These are faster versions of DDR SDRAM and use special techniques for speed-up, considering that the basic latencies of a DRAM cell can never be done away with completely. DDR2 is still double data rate, just as with DDR, but the modified signalling method enables higher speeds to be achieved with more immunity to noise and crosstalk between the signals. The additional signals required for differential pairs add to the pin count of DDR2 and DDR3.
2.3.4 | ROM (Read Only Memory)
This is a very commonly used term, and we know it stands for ‘Read Only Memory’. The user has the option to ‘burn in’ its contents, which are not lost when power is switched off. Current terminology is ‘firmware’ for the ROM and its contents together.
ROM is a type of ‘programmable’ memory. It has internal fuses which, when blown, create a bit pattern which is permanent and hence can be read whenever needed. However, if it is an OTP (one time programmable) ROM, its contents can never be changed again. This statement implies that there are other kinds of ROM into which data can be re-written. One such kind is EPROM. EPROMs are ‘Erasable and Programmable’: their contents can be erased by exposing them to ultraviolet radiation. Such ROMs have a window through which UV light reaches the chip.
2.3.4.1 | EEPROM
This is ‘Electrically Erasable’ PROM, and erasure can be done while the chip is on the circuit board. The predominant feature of EEPROM is that the programmer can change the data stored in the memory one byte at a time, which gives more control over how the data is entered. However, this method takes a very long time, especially when erasing all the data in it.
The advantage of EEPROM is that it is non-volatile, but also erasable and reprogrammable. Because of this, EEPROMs are used for storing small quantities of data in embedded systems, where this data may have to be referenced (read) frequently by the application software, but may not normally have to be changed. There are dedicated EEPROM chips available which may be connected to MCUs on an embedded board.
EEPROMs are of two types: parallel and serial. The data lines of parallel EEPROMs can be directly connected to the processor’s data lines for data transfer. In the case of serial EEPROMs, there are only one or two data lines. For transferring data between a processor and the serial chip, serial protocols like SPI and I2C (refer to Section 5.2) can be used.
There are PIC MCUs with EEPROM inside the chip. This EEPROM is not very large, but it has a special place in the system, in that it is used to store data like certain constants, tables, identification numbers, etc. Frequent reading of this data may take place in the course of program execution.
2.3.4.2 | Flash Memory
Flash memory is the most popular of the non-volatile types of memory currently available. In technology, it is similar to EEPROM, but there is a difference between the two: EEPROM is erased and written one byte at a time, while for flash, these operations are ‘block oriented’.
The Flash Memory Cell
Flash memory stores data in an array of memory cells. The memory cells are made from floating-gate MOSFETs (known as FGMOS). These FGMOS transistors can store electrical charge (representing a ‘1’ or a ‘0’) for years, without the need for a power supply. This is the secret of the robustness of flash storage.
Principle of Data Storage in Flash Memory: A flash cell stores data by removing or putting electrons on its floating gate (Figure 2.23). The charge on the floating gate affects the threshold voltage of the memory element. When electrons are present on the floating gate, no current flows through the transistor. This denotes a ‘0’. When electrons are removed from the floating gate, the transistor starts conducting, indicating a ‘1’. Electrons are moved by applying voltages between the control gate and the source or drain. There are some tunnelling phenomena associated with all this, but a more detailed explanation is beyond the scope of this chapter (and book).
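The storage principle and the block-oriented nature of flash can be combined in a small behavioural sketch. The class `FlashBlock` is hypothetical. It assumes, as is typical for flash devices, that an erased cell reads as ‘1’, that programming can only change bits from ‘1’ to ‘0’, and that going back to ‘1’ requires erasing the whole block.

```python
class FlashBlock:
    """Behavioural sketch of one erase block of flash memory."""

    def __init__(self, size: int) -> None:
        self.data = bytearray([0xFF] * size)  # erased state: every bit is '1'

    def program(self, offset: int, value: int) -> None:
        # Programming puts electrons on floating gates: bits go from 1 to 0 only.
        self.data[offset] &= value & 0xFF

    def erase(self) -> None:
        # Erasing removes the electrons from every cell in the block at once.
        for i in range(len(self.data)):
            self.data[i] = 0xFF


block = FlashBlock(size=4)
block.program(0, 0x3C)
block.program(0, 0xF0)   # cannot turn 0 bits back into 1 without an erase
print(hex(block.data[0]))
block.erase()
print(hex(block.data[0]))
```

The second `program` call shows why updating data in flash normally involves erasing a block first, in contrast to the byte-wise updates possible in EEPROM.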
There are two types of flash memory: the NOR flash developed by Intel and NAND
flash, which originated from Toshiba technology. The names, NOR-flash and NAND-
flash, came from the structure used for the interconnections between memory cells as
shown in Figure 2.24.
Figure 2.23 | A flash memory cell (labels: Control Gate Vg, Floating Gate, SiO2, source Vs and drain Vd on N+ regions, P-substrate, electron flow during programming)
In NOR-flash, cells are connected in parallel to the bit lines, facilitating the reading, writing and erasing of each cell individually. This parallel connection is similar to the parallel connection of transistors in a CMOS NOR gate, and hence the name ‘NOR flash’. In contrast, cells in a NAND flash are connected in series. This difference is the reason for differences in the way of usage and the application domain of the two types of cells.
NOR flash is relatively faster than NAND, and random access is possible, such that data can be read or written in quantities as small as a byte. NAND flash, on the other hand, uses sequential access and can be read or written only in blocks.
Because of this, the former is used in applications where data needs to be randomly accessed: it is used to store the BIOS in PCs and operating systems in PDAs and mobile phones. The latter, that is, NAND flash, finds use in applications where data is sequentially stored and retrieved, that is, for data storage in digital cameras, mobile phones, MP3 players, USB memory sticks and so on.
Figure 2.24 | NAND and NOR Flash (panels: NAND-Flash Structure with Bit Line, Bit Line Select, word lines WL0 to WL7 and Ground Select; NOR-Flash Structure with Bit Line and word lines WL0 to WL5)
Have you heard about SD cards?
SD stands for ‘Secure Digital’, and such cards are flash memory cards with security features incorporated. The SD format includes several important technological features. These include the addition of cryptographic security protection for copyrighted data/music. There is an SD Card Association, which sets the specifications for SD cards. Such cards are available in various capacities and also sizes; that is why we have cards designated as mini SD, micro SD, MMC (multi media card), mobile MMC, etc.
Appendix I elaborates the interfacing of an SD card to an ARM MCU.
Comparing Volatile and Non-volatile Memory
Now that we have taken a brief tour of both volatile and non-volatile memory, it should be quite evident that each of these memory types has a different application domain. Table 2.2 summarizes the differences between these types.
Which of these memory types are available in MCU chips?
All MCUs have a small amount of SRAM and a relatively large amount of flash. SRAM is used for storing small amounts of intermediate data during computations. The flash is the area where code is burned. A part of it may be used for data storage as well.
Besides this, some MCUs have a small amount of EEPROM used to hold data which is unlikely to change once the system is designed and launched.
As an example, have a look at the on chip memory structure of the PIC18F series of MCUs. This series has a maximum SRAM size of 4 KB, EEPROM of 1024 bytes and flash ROM of 128 KB. Individual members of this series have different amounts of these types of memory. The 8051 family, on the other hand, has a maximum SRAM size of 256 bytes, flash ROM of 64 KB and no EEPROM.
Memory Shadowing—What does it mean?
In PCs, the BIOS is in flash, which is a relatively slow memory device. Usually, device drivers which are part of the BIOS are frequently used, so having to access ROM each time creates quite a delay. To circumvent this, the BIOS is copied to RAM and ROM accesses are disabled. Thus access becomes very fast. The RAM used for this is called shadow RAM and the technique is called memory shadowing.
Table 2.2 | Comparison of Different Memory Types
Memory Type | Features | Usage
SRAM | Volatile, high speed | Cache; on chip RAM in MCUs
DRAM | Volatile, speed lower than SRAM | Primary memory in PCs; external memory in embedded boards
EEPROM | Non-volatile, very low speed | Storage of small amounts of data which remains more or less constant
FLASH ROM | Non-volatile, lower speed than RAMs | i) Storage of large amounts of data which may have to be changed frequently; ii) Storage of the program code of embedded systems
There can be a similar problem for embedded systems which are expected to perform high speed applications. All code and most data are in the flash, whose response time is slow. To speed up the system, memory shadowing will be necessary here too.
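A rough calculation shows when shadowing pays off. The access times, code size and access count below are assumed values chosen for illustration; the model simply adds a one-time copy cost to the cost of the accesses that follow.

```python
def total_time_ns(accesses: int, access_ns: int, copy_bytes: int = 0,
                  copy_ns_per_byte: int = 0) -> int:
    """Time for a run of accesses plus an optional one-time copy."""
    return copy_bytes * copy_ns_per_byte + accesses * access_ns


FLASH_NS = 100            # assumed flash read time per access
RAM_NS = 10               # assumed RAM access time
CODE_BYTES = 64 * 1024    # assumed size of the code to be shadowed
ACCESSES = 10_000_000     # assumed number of fetches while running

from_flash = total_time_ns(ACCESSES, FLASH_NS)
# Shadowing: pay once to read flash and write RAM, then run from RAM.
shadowed = total_time_ns(ACCESSES, RAM_NS, CODE_BYTES, FLASH_NS + RAM_NS)
print(from_flash, shadowed)
```

The copy is paid for only once at start-up, so the benefit grows with the number of accesses made afterwards; the price is the RAM set aside as shadow RAM.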
2.3.5 | Caches
Cache is a word you might have heard in various contexts. Most of you might already have studied the operation of caches with respect to ‘computer architecture’. The purpose of this section is not to discuss it in detail, but to stress the importance of cache in embedded systems architecture.
We know that general purpose computing systems have a cache in the system. The
memory hierarchy in such a system is as shown in Figure 2.25.
Whenever the processor needs an instruction or data, the ideal condition is to find it in the cache (this is called a hit). It is also possible that at certain times, the required information is not found in the cache. This is called a miss, and the information then has to be taken from main memory, which is a relatively slow access. The cache mapping policy is designed to maximize hits. One important point to keep in mind is that a cache is entirely managed by hardware: as the cache size increases or the mapping policy becomes more complex, the amount of hardware for managing it increases. Thus, the amount of cache required for a system and the amount it can afford (in terms of the increased hardware budget) have to be carefully balanced.
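Hits and misses can be observed on a toy model. The sketch below assumes a direct-mapped cache, which is only one of several possible mapping policies and is chosen here because it is the simplest; the class name and the cache dimensions are hypothetical, and only the tags are tracked, not the data.

```python
class DirectMappedCache:
    """Tiny direct-mapped cache model that only tracks hits and misses."""

    def __init__(self, lines: int, line_bytes: int) -> None:
        self.lines = lines
        self.line_bytes = line_bytes
        self.tags = [None] * lines  # tag stored in each cache line
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        block = address // self.line_bytes
        index = block % self.lines      # which cache line to look in
        tag = block // self.lines       # identifies the memory block
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag          # miss: fetch from main memory
        self.misses += 1
        return False


cache = DirectMappedCache(lines=4, line_bytes=16)
for address in (0, 4, 8, 64, 0, 4):
    cache.access(address)
print(cache.hits, cache.misses)
```

In this run, address 64 maps to the same line as address 0 and evicts it, so the later access to address 0 misses again. A more elaborate mapping policy could avoid that conflict at the cost of extra hardware.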
Cache is the memory closest to the processor and also the fastest. But remember that the content of cache is only a ‘copy of a portion’ of the main memory. In a general purpose computing system, cache memory uses SRAM, while main memory uses DRAM technology. There are also different levels of caches (L1, L2 and L3), with L1 being closest to and L3 being farthest from the CPU.
2.3.5.1 | Embedded Systems and Cache
Do embedded systems have caches? The answer is that only the mid and high range systems possess it or need it. Thus, many high end processors, specifically DSP processors, have it in their cores.
Consider a system as shown in Figure 2.26. This system could be an embedded processor board or a node in a networked system. Other blocks present in the system are not shown in the figure. There is a processor in the system, and the processor has an on chip L1 cache which is divided into an instruction cache and a data cache. L2 cache is outside the processor. DSP processors are one class of processors which invariably use cache internally.
Figure 2.25 | Memory hierarchy in a system with cache (blocks: CPU, Cache, Bus, Main Memory, Secondary Memory)
Figure 2.26 | A hardware node of an embedded system (blocks: Hardware Node, Processor Core, L1 I-Cache, L1 D-Cache, L2 Cache)
Figure 2.27 shows the internals of a DSP processor with L1 caches and their respective controllers.
2.3.5.2 | Caches and Real-time Systems
Caches, however, are not a preferred item for real-time systems (Section 8.2). The reason is the probabilistic nature of cache operation. Everything is fine for a hit, but in case of a miss, access becomes very slow. Since the data in the cache is continually changed, there is no way of guaranteeing that a particular item will be there when needed. Real-time systems are ‘time critical’ and are to be designed to be deterministic. Processors used for ‘real-time’ applications usually have a ‘tightly coupled memory’ which gives deterministic performance, in contrast to the probabilistic performance of caches.
Figure 2.27 | A DSP processor with cache (blocks: DSP Computation Engine, L1 D Cache with D Cache Controller, L1 I Cache with I Cache Controller)
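The difference between probabilistic and deterministic timing can be put into numbers. The access times and access count below are assumed values; the point of the sketch is only that a cached task has a wide gap between its best and worst case, whereas the tightly coupled memory gives a single fixed figure.

```python
def access_time_bounds_ns(accesses: int, hit_ns: int, miss_ns: int):
    """Best and worst case time for a task that runs through a cache."""
    return accesses * hit_ns, accesses * miss_ns


def tcm_time_ns(accesses: int, tcm_ns: int) -> int:
    """With tightly coupled memory every access takes the same time."""
    return accesses * tcm_ns


best, worst = access_time_bounds_ns(1000, hit_ns=2, miss_ns=60)
fixed = tcm_time_ns(1000, tcm_ns=2)
print(best, worst, fixed)
```

A real-time designer has to budget for the worst case, so the cached task must be treated as if it took the larger figure even though it will usually finish much sooner.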
2.3.5.3 | Tightly Coupled Memory
This is just a fast memory directly connected to the processor core (inside the chip). It is meant to provide low-latency memory that the processor can use without the unpredictability associated with caches. It is an area of memory in which important and critical items like interrupt handlers, real-time task code, etc., are stored. It can also hold important data which is needed frequently. Many ARM processors which are designed specifically for real-time applications have TCM and associated registers. Such systems might have caches as well, but the caches may be disabled if they are not to be used.
2.4 | Low Power Design
This is the age of handheld devices and the consequent requirement for miniaturization. More and more applications are being built into devices which are small and can be carried around. This requires more powerful processors, and also peripherals which are very small. We need tiny keys, touch screens with good resolution and small packaging. But beyond all this, there is one important factor which is a key issue in any design, and that is ‘power’. All these appliances are battery operated, and hence the need to preserve battery power is a matter of high priority.
So the current focus of research is on low power design. Every device used in the system must be able to work with the lowest possible power supply voltage and the lowest current. This applies to the processor and to the peripherals. Besides this, techniques must be looked at so that when a device is not doing anything, it is in the sleep or dormant state and wakes up only when alerted by an interrupt. Here, let us have a quick look at what low power design entails.
The first point to take into account while designing a system aimed at dissipating minimum power is to choose an MCU with a low power processor core. The 32-bit ARM core is claimed to be one such processor. TI’s MSP430 is a 16-bit low power processor. There are many other processors in the market with ‘low power’ claims tagged to them.
What are the steps to be taken into consideration when there is the need to design systems which are power limited?
To get a grip on this, let us think of the factors which tend to increase power dissipation
and then attempt to reduce the effect of these factors.
i) High frequency: The higher the clock frequency, the higher the power. In fact, doubling the clock frequency translates to doubling the (dynamic) power. But when high performance is needed, high frequencies cannot be avoided. The point, then, is to use only the minimum clock rate that is actually needed.
In the same design, some applications do not need the same high clock rates as certain others. The trick is to use a processor whose clock frequency can be dynamically changed. Many modern processors have internal PLLs (phase locked loops) that can change the clock frequency on the fly.
ii) High power supply voltage: A high supply voltage causes high currents and
therefore more power dissipation. The trend now is towards low supply voltages,
around 3 V. This reduces noise margins, but ‘differential signalling’
(Section 5.3.1.5) has helped to improve noise resilience.
iii) Complex hardware: Simpler hardware translates directly to lower power; there
are fewer transistors and so less power is dissipated. RISC cores are
comparatively low power cores, and choosing such a core (if it also satisfies the
performance criteria) is the first major step in this direction.
iv) Higher bus widths: More bus lines mean more capacitances being charged and
discharged at every change of logic state (which does not translate to useful
power), so it is best to reduce the widths of the external (off chip) buses.
v) I/O devices: Certain I/O devices such as optical isolators, electromechanical
relays and back lit displays do not favour low power; it is best to avoid them.
In I/O selection, leakage and quiescent current ratings should be verified and
the devices with the lowest ratings chosen.
vi) Circuit design: In the design of the processor, it is likely that the designers
have used ideas to switch off parts of the circuitry which are not being actively
used. The same ideas should be used in the circuits of the system design. Gated
clocks help here: when the clock is removed from a part of the circuit, that part
does not switch and so consumes very little power.
vii) Using low power modes of the processor: Most embedded systems don’t need to
operate continuously. Many of them wake up when a stimulus (in the form of an
interrupt) is obtained. Thus, it is important to use the sleep and power down
modes of the processor effectively.
viii) Battery type: Choosing the right type of battery is important, and the choice
depends on the application. In some applications recharging is possible; in
others (like a monitoring device placed in an inaccessible location) it is not.
These two cases need different types of batteries.
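The influence of clock rate and supply voltage on power can be estimated with the usual first-order relation for CMOS switching power, P = a × C × V² × f, where a is the fraction of the capacitance switched per cycle. The following Python sketch is only an illustration: the function name, the 1 nF switched capacitance and the two operating points are assumed values, not taken from any data sheet.

```python
def dynamic_power_watts(c_farads, v_volts, f_hertz, activity=1.0):
    """First-order CMOS switching power: P = activity * C * V^2 * f."""
    return activity * c_farads * v_volts ** 2 * f_hertz


# Assumed example values (hypothetical, for illustration only).
C_SWITCHED = 1e-9  # 1 nF of total switched capacitance

p_fast = dynamic_power_watts(C_SWITCHED, 5.0, 48e6)  # 5 V supply, 48 MHz clock
p_slow = dynamic_power_watts(C_SWITCHED, 3.0, 12e6)  # 3 V supply, 12 MHz clock

print(f"5 V, 48 MHz: {p_fast * 1e3:.1f} mW")
print(f"3 V, 12 MHz: {p_slow * 1e3:.1f} mW")
```

With these assumed numbers, lowering both the supply voltage and the clock reduces the estimate by roughly a factor of eleven, because the voltage enters as a square and the frequency linearly.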
2.5 | Pullup and Pulldown Resistors
These are words which are commonly heard and popularly used, but not many people
have a clear idea of what they really mean. Here is an attempt to bring in a bit of clarity
to these terms.
2.5.1 | Floating State of an Input
Consider a gate, say an AND gate, connected as in Figure 2.28. One of the inputs is
connected to the supply, and the other is left unconnected. Many people have the
notion that leaving a pin open is equivalent to a ‘0’ level, so they expect a ‘0’ at
the output for this connection. On finding that the output is a ‘1’, the natural
tendency is to suspect that the gate is faulty. The fault, however, is not in the
gate; it arises because one input is left ‘floating’. A floating input truly floats:
it cannot hold any fixed voltage level, although it is usually found to sit at a
voltage corresponding to a high level. It is also susceptible to noise pulses, so
its level may keep changing.
The solution to this predicament is to ground the other input if it is to be
considered a ‘0’. All inputs meant to be at ‘1’ are to be connected to +V and all
inputs meant to be at ‘0’ to ground. In this case, no input is ‘floating’, and the
correct logic is obtained at the output. But sometimes we need to connect these
terminals through a resistance, if we want to limit the current through the
circuit. This is where pulldown and pullup resistances come into the picture.
2.5.2 | Pulldown and Pullup Resistances
See the inverter in Figure 2.29. Its input can be connected either to a ‘1’ or to a
‘0’, depending on which switch is closed. This is what we usually do when using
digital gates, but in some cases there may be reasons to limit the current that
flows to the input. Figure 2.30a shows a circuit in which the input is connected to
+V through a resistor R1, which is called a pullup resistor. This resistor’s
function is to limit the amount of current that flows through the circuit.
The term ‘pullup resistor’ implies that the input has been pulled ‘up’ to high
through the resistor. It is clear that when the input is to be high, the switch S2
is to be kept open. The value of the pullup resistance can be calculated from the
maximum current allowed. If the supply is 5 V and 5 mA is the maximum current
allowed, the pullup should be at least 5 V/5 mA = 1K.
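The pullup calculation is just Ohm’s law, and can be sketched in a few lines of Python. The function name is hypothetical, and the 5 V supply with 5 mA and 0.5 mA current limits are example figures only.

```python
def min_pullup_ohms(v_supply, i_max_amps):
    """Smallest pullup resistance that keeps the current within i_max (R = V / I)."""
    if i_max_amps <= 0:
        raise ValueError("maximum current must be positive")
    return v_supply / i_max_amps


# Example figures: a 5 V supply with two different current limits.
for i_max in (5e-3, 0.5e-3):
    r = min_pullup_ohms(5.0, i_max)
    print(f"I_max = {i_max * 1e3:.1f} mA -> R >= {r / 1e3:.0f} K")
```

A limit of 5 mA corresponds to a minimum of 1K, while a 10K pullup corresponds to a limit of 0.5 mA.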
See Figure 2.30b; here R2 is called a ‘pulldown’ resistor as it is connected to
ground. When S1 is open, the input is pulled to ground through R2, and when S1 is
closed, the voltage V is fully dropped across R2, which corresponds to a ‘1’ input,
but the current is limited by R2.
Figure 2.28 | Floating state of an input
Figure 2.29 | Inverter with two switches
Figure 2.30 | (a) Inverter with a pullup resistor and (b) Inverter with a pulldown resistor
Figure 2.31 shows an AND gate with a pullup resistor R1 and a pulldown resistor R2.
In this setup (ignore the dotted line), the logic levels at A and B are 1 and 0,
respectively. If B is also to be made ‘1’, it is enough to connect B to the supply
voltage (note the dotted line). Then the full V is available at B, but the current
is limited by the resistance R2.
In digital circuits, most of you will not normally have connected pullup and
pulldown resistors at the inputs, because the available TTL gates have built-in
protection at the inputs. There can, however, be cases where power dissipation is a
consideration. Also, the GPIOs of some MCUs insist on such resistors; use them
only then.
2.5.3 | Open Collector/Open Drain Gates
In MCUs, GPIO pins are used to connect input and output devices. These pins have
internal pullups (‘active’ pullups rather than resistive), so external resistors
are not needed in most cases. Pullup resistors are, however, necessary for gates
with an ‘open’ output, and that is what we will discuss next.
Logic gates are based on TTL or MOS technology. For most logic gates, there
is an output stage (remember the totem pole output for TTL?), but there are certain
gates with outputs left open. They are called ‘open collector’ or ‘open drain’ depending
on whether the technology is TTL or MOS. See Figure 2.32a, where the collector of
the output transistor (of the gate) is left open, that is, unconnected to any power source.
To use such a gate to get a ‘1’ when the transistor is OFF, it must be connected to +V
through a resistor, and that resistor is called a pullup resistor (see Figure 2.32b). The
same discussion applies to open drain outputs, also.
Figure 2.31 | AND gate with pullup and pulldown resistors
Figure 2.32 | (a) An open collector gate output and (b) An open collector gate with a pullup resistor
What is the need for such open collector/drain circuits?
A wired AND or dot AND logic can be obtained very easily by connecting the outputs
of a number of ‘open collector’ gates together and to the power supply through just
one pullup resistor. See Figure 2.33, which shows a number of transistors sharing a
common pullup resistance. If any of the switches is ON, that is, if any transistor
is conducting, the output is ‘0’; the output is ‘1’ only when all the transistors
are OFF. Thus we get an AND logic with this simple wiring. Many protocols use this
sort of logic. Section 5.2 provides a note on how the I2C protocol uses wired AND
logic. Pullup resistors are widely used in such output configurations.
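The behaviour of the shared line can be modelled in a few lines. This is a minimal logical sketch in Python (the function name is hypothetical); it ignores voltages and currents and only captures the rule that any conducting transistor pulls the line low.

```python
def wired_and(gate_outputs):
    """Level on a shared open-collector line with one pullup resistor.

    Each entry is the logic value a gate wants to output: 1 means its output
    transistor is OFF (line released), 0 means it is ON and pulls the line low.
    """
    transistors_on = [value == 0 for value in gate_outputs]
    return 0 if any(transistors_on) else 1


print(wired_and([1, 1, 1]))  # all transistors OFF: the pullup holds the line high
print(wired_and([1, 0, 1]))  # one transistor ON: the line is pulled low
```

The result is the logical AND of the individual outputs, obtained without any additional gate.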
Port 0 of the 8051 has an open drain configuration. This port needs pullup
resistances, as the drains of the output transistors are left open. See Figure 2.34,
which shows pullup resistors connected externally to P0.0 to P0.7. Separate pullup
resistors are used, as the port pins may have to be used as single bit port lines.
The other ports, P1 to P3, are not of the open drain configuration. Each of these
port pins has an output stage which consists of active devices and has an effective
internal pullup. (The data sheet can be consulted to find the amount of current
such an output pin sources or sinks.)
Figure 2.33 | Wired AND logic
Figure 2.34 | Pins of port 0 of 8051 with pullup resistors
2.5.4 | Weak and Strong Pullup
When the current drawn through the pullup is small, that is, when the resistance is
high, it is called a weak pullup; otherwise it is a strong pullup. A weak pullup has
a high R, and since there is a capacitance associated with any signal lead, it gives
a high RC time constant. Thus, where switching speed is important, as in the case of
buses like the I2C bus, a weak pullup is not advisable because it causes slow
switching between levels.
Thus, values of pullup resistances should be calculated on the basis of speed
requirements, the current specification of the gates and also the number of outputs
to be connected together.
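The speed penalty of a weak pullup can be estimated from the RC charging law: a line released at 0 V reaches a threshold Vth after t = −RC × ln(1 − Vth/V). The Python sketch below uses assumed values throughout (100 pF of line capacitance, a 5 V supply, a 3.5 V input threshold, and 100K and 2.2K as the weak and strong pullups); none of them come from a particular data sheet.

```python
import math


def rise_time_seconds(r_ohms, c_farads, v_supply, v_threshold):
    """Time for an RC-charged line to rise from 0 V to v_threshold."""
    if not 0 < v_threshold < v_supply:
        raise ValueError("threshold must lie between 0 and the supply voltage")
    return -r_ohms * c_farads * math.log(1.0 - v_threshold / v_supply)


# Assumed example values (hypothetical).
C_LINE = 100e-12   # 100 pF of line capacitance
V_SUPPLY = 5.0     # volts
V_HIGH = 3.5       # input threshold for a '1', volts

t_weak = rise_time_seconds(100e3, C_LINE, V_SUPPLY, V_HIGH)
t_strong = rise_time_seconds(2.2e3, C_LINE, V_SUPPLY, V_HIGH)

print(f"weak pullup (100K):   {t_weak * 1e6:.2f} us")
print(f"strong pullup (2.2K): {t_strong * 1e6:.2f} us")
```

The rise time scales directly with R, so under these assumptions the weak pullup is about 45 times slower than the strong one, at the cost of a proportionally higher current whenever the line is held low.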
2.5.5 | High Impedance State, Hi-Z
In digital systems, the two logical states are 1 (high) and 0 (low). Depending on
the type of logic family used, there are voltage levels defined for these states.
Besides these, one more logic state is used in most bus-oriented designs. This third
state is the ‘high impedance’ state, also called the Hi-Z or floating state. If a
particular device is connected to a line which is in the high impedance state, that
device is as good as ‘not connected’. Thus, a Hi-Z state is an open circuited state.
See Figure 2.35a for a tri state buffer. The signal E is the enable signal. If it is
put in the enabling state (which can be defined as high or low, as per the circuit
design), the signal at A is transferred to Y. Otherwise, the buffer acts like an
open circuit (Figure 2.35b).
There are many chips which have multiple tri state buffers in them. The chip 74LS244
is one of them. A control pin is used to activate/disable the buffers. See
Figure 2.36, which shows a functional view of this chip. It has two sets of four bit
inputs and corresponding outputs. For each set, there is an enable pin OE. When the
pin OE is low, the output pins are active and take on the logical levels of the
input pins. When the OE pin is high, the output pins are in the high impedance
(blocked or floating) state. Any device that is connected to these output pins is
‘disconnected’, in the sense that the interconnection lines are actually ‘open
circuit’.
Figure 2.35a | A tri-state buffer
Figure 2.35b | An open circuit
Figure 2.36 | Functional pin diagram of the 74LS244 tri-state buffer IC (inputs 1A0–1A3 and 2A0–2A3, outputs 1Y0–1Y3 and 2Y0–2Y3, enables 1OE and 2OE)
The Hi-Z state is very important in bus-oriented systems, where a number of
devices are connected to the same set of lines. See Figure 2.37, in which three
input devices are attached to the data bus of an MPU. Only one of the devices should
be really ‘connected’ to the bus. Each device has an enable pin. Only if the enable
pin of a device is activated will that device’s output lines be activated, and only
then will that device be connected to the data bus.
It is to be understood that at any time, only one of the devices should be
‘connected’ to the data bus, and this is achieved by selectively enabling each
device. When the enable pin of a device is inactive, the output lines of the device
are in the high impedance (Hi-Z) state, and hence no connection exists between the
device output and the data bus of the MPU. This setup ensures that only one device
can send data to the MPU at a time.
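The rule that only one device may drive the bus at a time can be made concrete with a small model. The Python sketch below is an assumption-laden illustration: Hi-Z is represented by None, the function names and device data values are hypothetical, and enabling two devices at once is reported as an error rather than modelled electrically.

```python
HI_Z = None  # marker for a high-impedance (disconnected) output


def tristate_output(data, enabled):
    """Output of a tri-state buffer: the data when enabled, Hi-Z otherwise."""
    return data if enabled else HI_Z


def resolve_bus(outputs):
    """Value seen on a shared bus given every device's output."""
    driven = [value for value in outputs if value is not HI_Z]
    if not driven:
        return HI_Z  # nobody is driving: the bus floats
    if len(driven) > 1:
        raise RuntimeError("bus contention: more than one device enabled")
    return driven[0]


# Three devices with hypothetical data bytes; only Device-2 is enabled.
device_data = [0x3C, 0xA5, 0x0F]
enables = [False, True, False]

bus = resolve_bus([tristate_output(d, e) for d, e in zip(device_data, enables)])
print(hex(bus))
```

The MPU therefore reads only the byte of the selected device; the other two outputs behave as if they were not wired to the bus at all.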
Figure 2.37 | Example of a bus-oriented system using the Hi-Z logic
Conclusion
With this, we come to the end of our discussion on the general hardware aspects of
embedded systems. Each of the topics discussed here will need more intensive study
if and when a project is taken up and its hardware is to be developed completely.
For that, the data sheets of the processor, memory and I/O devices will have to be
studied. It is very important to understand the concepts in this chapter well,
because that will go a long way in developing your confidence to understand any
MCU with ease.
The most important component in an MCU is its processor.
8051 is a popular 8-bit MCU.
The stack of an MCU is defined in RAM and is a very useful data structure.
Interrupts are a way by which peripherals can get the attention of the processor.
Timers and counters are found in most MCUs as a standard peripheral.
DMA is a type of operation that is done in mid- and high-range processors.
There are serial communication ports in all MCUs.
MCUs have semiconductor memory on the chip in the form of flash, RAM and EEPROM.
Many embedded systems prefer tightly coupled memory to caches.
Designing embedded systems for very low power is the trend now.
Gates with open collector outputs need pullup resistors.
Bus-oriented systems have a high impedance state besides ‘0’ and ‘1’ logic.
QUESTIONS
1. What are the types of registers available in an MCU?
2. What is meant by the word GPIO? What is it used for?
3. What is the use of a real-time clock for an OS?
4. Give two uses of the processor stack.
5. Distinguish between software and hardware interrupts.
6. Explain how a typical timer operates.
7. Why is serial communication needed? Give some contexts in which it is used.
8. Distinguish between SRAM and DRAM.
9. Why is SRAM the preferred memory technology for caches?
10. How is tightly coupled memory different from cache?
11. List a few applications for NAND flash.
12. Specify a context in which a pullup resistor is used.
13. Explain why bus-oriented systems need the Hi-Z logic level.
14. Explain the concept of wired AND logic.
EXERCISES
1. Draw an MCU with the following peripherals connected to its GPIO pins:
a) A single digit LED
b) Four toggle switches
c) A relay
2. List the numbers and manufacturers of:
a) Two SRAMs
b) Two SDRAMs
c) One SSRAM
3. Choose a specific PIC IC and list out how much of RAM, flash and EEPROM it contains.
4. What are the trends in embedded design for low power dissipation?
3 | Sensors, ADCs and Actuators
Chapter-opening image: Two ADC chips.
Introduction
In our discussion on general purpose I/O (GPIO) in Chapter 2, it was mentioned
that depending on the application, the GPIO pins can be used as output or input
pins, and various peripheral devices can be connected to these pins. In this chapter,
we discuss a few standard and widely used I/O devices, that is, sensors which are
input devices and actuators which are output devices. Sensors are required for getting
data from the real world and actuators are needed to get the embedded system to ‘act’
based on this data.
We also devote a small amount of space to ‘Analog to Digital Converters’ or ADCs,
as they are commonly called. All sensors have analog outputs, but for use with
MCUs these analog voltages have to be converted to digital numbers. ADCs perform
this function, which is a very critical part of the whole system. Without an ADC
of good resolution and sensitivity, the whole point of having good sensors is lost.
Taking this into consideration, the important aspects of A to D conversion have
been looked into.
The working principle of a number of sensors
The use of temperature and light sensors in practical circuits
How an LED is used in sensing circuits
Circuits which use photodiodes and phototransistors
The working principle of proximity and range sensors
How an optical encoder works
The data and control interface of an ADC
The use of parallel and serial ADCs
The use of seven segment LEDs and OLEDs
The ways of using character and graphical LCDs
Usage of stepper motors, DC motors, current drivers and optocouplers
The working principle of a relay and some practical circuits
3.1 | Sensors
Remember the block diagram (Figure 1.2) that we started with. Any embedded system
needs sensors—depending on the application, it may be just one sensor or as in most
cases, many sensors. A sensor converts a physical quantity into a corresponding voltage.
Take an application like a home security system—there should be sensors for sensing
temperature, light, motion, humidity, etc. The data obtained from these sensors decide
the course of action for the actuators of the system. In this section, we take a quick look
at some of the commonly used sensors.
3.1.1 | Temperature Sensors
3.1.1.1 | Thermistor
A thermistor is a thermally sensitive resistor, which means that its resistance is
‘affected’ by the temperature variations around it. Thermistors are made with
semiconductor materials and there are two kinds: those with NTC (negative
temperature coefficient) and those with PTC (positive temperature coefficient). For
the former, the resistance of the thermistor decreases with increase in temperature,
and for the latter, it is just the reverse.
Figure 3.1 is a very simple circuit which uses an NTC thermistor. At normal
temperatures, the transistor is OFF, because the high resistance of the thermistor
prevents it from getting sufficient base current. When the temperature increases, the
resistance of the thermistor decreases, and at a certain value of thermistor
resistance the base current needed to turn on the transistor is obtained. This
switches the transistor ON. The collector voltage goes low, and the LED lights up.
Since the collector is connected to an input pin of the MCU, port pin P1.1 switches
from high to low at a particular ‘triggering temperature’. The exact value of this
temperature can be varied by varying the value of R, which is why it is shown as a
variable resistor. Figure 3.2 is the photograph of a thermistor.
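The fall in resistance of an NTC thermistor with temperature is commonly approximated by the Beta model, R(T) = R0 × exp(B × (1/T − 1/T0)), with temperatures in kelvin. The Python sketch below uses assumed part parameters (10K at 25 °C and B = 3950 K) that are typical of small NTC beads but are not taken from the circuit in Figure 3.1; the function names are hypothetical.

```python
import math


def ntc_resistance(temp_c, r0=10_000.0, t0_c=25.0, beta=3950.0):
    """Beta-model resistance (ohms) of an NTC thermistor at temp_c (deg C)."""
    t = temp_c + 273.15
    t0 = t0_c + 273.15
    return r0 * math.exp(beta * (1.0 / t - 1.0 / t0))


def ntc_temperature(r_ohms, r0=10_000.0, t0_c=25.0, beta=3950.0):
    """Inverse of the Beta model: temperature (deg C) for a given resistance."""
    t0 = t0_c + 273.15
    inv_t = 1.0 / t0 + math.log(r_ohms / r0) / beta
    return 1.0 / inv_t - 273.15


for temp in (25, 40, 60, 80):
    print(f"{temp:3d} C -> {ntc_resistance(temp):8.0f} ohms")
```

In a switching circuit of this kind, the triggering temperature is the temperature at which the thermistor resistance has dropped far enough for the transistor to receive its turn-on base current; the inverse function gives that temperature once the corresponding resistance is known.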
Figure 3.1 | A simple circuit using a thermistor
3.1.1.2 | Thermocouple
A thermocouple is also a sensor for measuring temperature. It consists of two
dissimilar metals joined together at one end, and it produces a voltage proportional
to the temperature difference between the two ends of the pair of conductors. One
junction is kept at a constant temperature and is called the reference (cold)
junction, while the other is the measuring (hot) junction. When the two junctions
are at different temperatures, a voltage proportional to the temperature difference
is developed.
Thermocouples are used in many high temperature applications like furnaces,
turbines and engine temperature measurements in industries and automobiles.
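Since the output depends on the temperature difference, the hot junction temperature is obtained by adding the reference junction temperature to the difference implied by the measured voltage. The Python sketch below assumes a purely linear thermocouple with a sensitivity of about 41 µV/°C, which is a rough figure for a type K device; real thermocouples are nonlinear and are normally linearised with standard tables, so this is an approximation for illustration only.

```python
def hot_junction_temp_c(v_measured, t_reference_c, seebeck_v_per_c=41e-6):
    """Linear estimate: T_hot = T_ref + V / S (S is an assumed sensitivity)."""
    return t_reference_c + v_measured / seebeck_v_per_c


# Hypothetical reading: 4.1 mV with the reference junction at 25 deg C.
print(f"{hot_junction_temp_c(4.1e-3, 25.0):.1f} C")
```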
3.1.2 | Light Sensors
Sensing light, or rather sensing the interruption of a light beam that is being
monitored continuously, is an important feature in many systems. Let’s review some
of the popular light sensors.
3.1.2.1 | Light Dependent Resistor (LDR)
LDRs are very popular devices made from cadmium sulphide, whose resistance
changes from several thousand ohms in the dark to only a few hundred ohms in
bright light. When light falls on the device, electron hole pairs are created and
conductivity increases. This is a very simple light sensor, but it has the
disadvantage of a ‘sluggish’ response, that is, it takes quite some time to respond
to a change in illumination. Figure 3.3 is a photograph of an LDR. Figure 3.4 is a
simple circuit which uses an LDR as a sensor. The circuit acts as a light activated
switch. When there is no ambient lighting, the relay (Section 3.3.4) contact is open
(it is a ‘Normally Open’, i.e., NO contact). When the light level increases beyond a
certain level, this contact closes.
This simple circuit uses the LDR to sense the presence or absence of light. In
the absence of light, the LDR has a resistance in the megaohm range and so the
transistor does not get sufficient bias to be ON. But when there is ambient lighting,
the resistance of the LDR falls and the transistor gets the bias current to conduct.
The transistor switches ON and the relay gets energized, so its contact closes. This
‘closing’ can be used to activate some action, as needed. The amount of illumination
required to cause the ‘switching’ may be adjusted by using the variable resistance R1.
Figure 3.2 | Photograph of a thermistor
3.1.2.2 | Light Emitting Diodes (LED)
LEDs are light generating devices, and as such do not act directly as sensors.
Figure 3.5 shows the photograph of an LED. But they can be made to emit light which
is detected by a photo detecting device, and this ‘detection’ can be used as a sensor
value. Let’s think of a typical application which uses this technique for sensing.
In what is called a line following robot, a robot is made to move continuously on a
white line (drawn on a black surface). An LED is fixed under the robotic vehicle. The
light from this LED strikes the white line on the ground and is reflected back. There
is photo detecting circuitry to receive this, and the corresponding activation
circuitry ensures that the vehicle moves continuously on the white line.
Figure 3.3 | Photograph of an LDR
Figure 3.4 | A circuit which uses an LDR for sensing light
But when the path becomes a curve, the vehicle, which is moving in a ‘straight
path’, will deviate from the path defined by the white line. This will be sensed
by the photodetecting circuitry, which no longer receives the reflected light because
the black background absorbs it all. This can be used to create the necessary logic
to bring the vehicle back onto the white line, and due to this feedback mechanism,
it is made to navigate along the curve.
This example was just to make clear the idea of how an LED can be part of a ‘sensor’
mechanism.
Infra Red LED
For many sensor circuits, infra red LEDs are preferred as the light source. This is
because they can be used in the same manner by night and by day, as visible light
does not affect their operation. Common IR LEDs have wavelengths of 850 nm or in the
940 to 980 nm range. The light generated is sensed by an IR receiver, which is
usually a photodiode or phototransistor.
3.1.2.3 | Photojunction Devices
Photodiodes
This is a diode similar to regular semiconductor diodes, except that its outer
casing is either transparent or has a clear lens to focus the light onto the PN
junction; that is, it is packaged with a window to allow light to reach the sensitive
part of the device. It is designed to operate in reverse bias, so that only the
reverse current flows. When light energy strikes the junction, it is this current
that increases.
Photodiodes are very fast light sensors that can switch between ‘ON’ and ‘OFF’ in
nanoseconds. They are commonly used in very many applications, from robotics to
cameras, TV remote controls, scanners, fax machines, copiers, etc.
Phototransistors
A phototransistor is basically a photodiode with gain. It has its collector-base
PN junction reverse biased and exposed to the light source. Any normal transistor
can be easily converted into a phototransistor light sensor by connecting a
photodiode between the collector and base. Now, let’s think of an application where
light sensing is used.
Figure 3.5 | A photograph of an LED
Intrusion Detection
In this circuit (Figure 3.6), an infra red LED driven by an astable oscillator
(using the IC NE555, for example) generates a pulse train of some particular
frequency. This LED is in Circuit 1, which is the light transmitter circuit.
This pulse train is continuously detected by an infra red detector in Circuit 2. When
an intruder blocks the path of light (between Circuits 1 and 2), the momentary absence
of light is sensed by the IR sensor in Circuit 2, and this can be used as an
indication that an intruder has blocked the path of light. Any action can be
initiated by this; for example, a relay can be activated to trigger an alarm on
sensing the intruder.
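On the MCU side, detecting the intruder amounts to noticing that the expected pulses have stopped arriving. The Python sketch below shows one way this could be done, assuming the detector output is sampled at a fixed rate and reduced to 1 (pulse seen) or 0 (no pulse); the function name, the sample lists and the gap limit are all hypothetical.

```python
def beam_blocked(samples, max_gap):
    """Return the sample index at which the beam is judged blocked, or None.

    samples: sequence of 0/1 detector readings taken at a fixed rate, where
             1 means an IR pulse was seen in that sample.
    max_gap: number of consecutive 0 samples tolerated between pulses.
    """
    gap = 0
    for index, seen in enumerate(samples):
        gap = 0 if seen else gap + 1
        if gap > max_gap:
            return index
    return None


normal = [1, 0, 1, 0, 1, 0, 1, 0]        # uninterrupted pulse train
intruded = [1, 0, 1, 0, 0, 0, 0, 1, 0]   # pulses missing for a while

print(beam_blocked(normal, max_gap=2))
print(beam_blocked(intruded, max_gap=2))
```

Choosing max_gap slightly larger than the normal spacing between pulses keeps the ordinary low phase of the pulse train from being mistaken for an intrusion.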
There is a standard family of ICs which acts as an infra red receiver: the TSOP
series. For using this, the signal transmitted by the IR LED should have a standard
format (refer to its data sheet). The TSOP IC package contains a PIN photodiode, a
bandpass filter and a demodulator which converts the signal to a format that a
microcontroller can use, i.e. a high or low pulse. Because of the preamplifier and
bandpass filter inside, the received signal is robust and free of noise. Such
receivers are very popular in simple intruder detector systems, and also in remote
control systems, proximity detectors, etc. Figure 3.7 is a photograph of the TSOP
(SM0038) IC.
Figure 3.6 | An intrusion detection setup
Figure 3.7 | Photograph of a TSOP IC
3.1.3 | Proximity/Range Sensors
Detection of an object and determination of its range are very important, especially
in the field of robotics. Visible or infra red light can be used successfully for
this. To detect whether there is some object in the proximity, the simple method used
is to send a beam of light (usually IR) which the object will reflect back to the
receiver. The reflected light is detected by a photo detector, and the information
can be used to confirm that there is an object within the path of the emitted light
(within a certain range). This is what we call a proximity sensor. To make it a range
sensor, there should be a method to find its range, as well.
Recently SHARP (the company) has produced a series (GP2DXX) of IR range
finder ICs which are very powerful and easy to use. Their merits are that they are
quite accurate, easy to use, affordable and small, have good range measurement
capability from inches to metres, and consume little power.
The GP2DXX series has both proximity detectors and range sensors. The GP2D12,
GP2D120, GP2Y0A02 (‘0A02’), GP2Y0A21 (‘0A21’), and GP2Y0A700 (‘0A700’)
sensors offer true ranging information in the form of an analog output. The GP2D15
and GP2Y0D02 (‘0D’), by contrast, offer a single digital value based on whether an
object is present or not. None of the detectors requires an external clock or signal.
3.1.3.1 | Range Sensing Technique
The Sharp IR Range Finder (a photograph of which is shown in Figure 3.8) works by the
process of triangulation, a technique in which a region is divided into a series of
triangular elements based on a line of known length, so that accurate measurements of
distances and directions may be made by applying trigonometry.
A pulse of IR light is emitted and is reflected back if it strikes an object in
its path. The reflected beam returns at an angle that depends on the distance of
the reflecting object. Triangulation works by detecting this angle; once the angle is
known, the distance can be determined (Figure 3.9). This type of IR range finder has
a special precision lens in the receiver that directs the reflected light onto an
enclosed linear CCD array at a position that depends on the triangulation angle. The
CCD array then determines the angle and causes the range finder to give an analog
value which corresponds to the distance of the object. In addition, the ‘Sharp IR
Range Finder’ circuitry applies a modulated frequency to the emitted IR beam. This
ranging method is almost (though not 100%) immune to interference from ambient
light, and is indifferent to the colour of the detected object.
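The geometry behind triangulation can be illustrated with the textbook similar-triangles model: if the emitter and the receiving lens are separated by a baseline b, and the lens of focal length f images the reflected spot at an offset x from its axis on the detector array, then the distance is D = b × f / x. The Python sketch below uses this idealised model with made-up dimensions; the internal optics of the Sharp modules are not described in the text, so the numbers and the function name are assumptions.

```python
def triangulated_distance(baseline_m, focal_length_m, spot_offset_m):
    """Idealised triangulation: D = b * f / x (all lengths in metres)."""
    if spot_offset_m <= 0:
        raise ValueError("spot offset must be positive")
    return baseline_m * focal_length_m / spot_offset_m


# Hypothetical optics: 20 mm baseline, 10 mm focal length.
for offset_mm in (2.0, 1.0, 0.5):
    d = triangulated_distance(0.020, 0.010, offset_mm / 1000.0)
    print(f"spot offset {offset_mm:.1f} mm -> distance {d * 100:.0f} cm")
```

A nearer object returns the beam at a steeper angle and so produces a larger offset on the array; distant objects crowd into small offsets, which is one reason such sensors have a limited maximum range.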
Note: These sensors can measure ranges between 20 and 150 cm (approximately), which
means that there is not only a ‘maximum’ but also a minimum for the range
measurement. For example, observe the voltage output (Figure 3.10) from the GP2D120
sensor, which has its range specified to be between 4 and 30 cm.
The characteristic shows that below 4 cm, the output falls and may be wrongly
interpreted as a large range. In robotics, the best method would be to use more than
one range finder and then to cross fire them.
Figure 3.8 | Photograph of a SHARP sensor
3.1.4 | Encoders
There is a sensor for finding the speed and direction (and thus the position and
distance travelled) of a moving vehicle. This is an ‘encoder’, which can be fitted to
the shaft of the wheel of the vehicle. It uses optical principles and is called an
‘optical encoder’. Such encoders work on the principle of counting the number of
transitions across a black and white pattern.
Figure 3.9 | The technique for sensing range
Figure 3.10 | Graph of voltage vs range for a range sensor (x-axis: distance to reflective object, 0 to 40 cm; y-axis: analog output voltage, 0 to 3.2 V)
Look at the pattern in Figure 3.11. If we calibrate the black and white patterns in
terms of 1 and 0 (0 for black, and 1 for white), we get a square pulse.
The pattern in Figure 3.11 has 12 black and 12 white blocks. Assume that this
pattern is embedded on a disk which is fitted to the shaft of the wheel and that there
is a sensor from which pulses can be obtained corresponding to the pattern. When the
wheel rotates once, a train of 12 pulses is obtained. Thus, if the pulses are counted,
the number of rotations that have occurred can be found. If the wheel diameter is
known, the circumference of the wheel can be calculated, which is the distance
covered in one wheel rotation. In effect, this means that once the angle turned per
received pulse is known, counting the received pulses is equivalent to measuring the
distance travelled, and from this the velocity can be calculated.
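The arithmetic described above can be sketched as follows. The 12 pulses per revolution come from the pattern in the text; the 10 cm wheel diameter, the pulse counts and the function names are assumed example values.

```python
import math

PULSES_PER_REV = 12  # from the 12 black / 12 white pattern described in the text


def distance_travelled_m(pulse_count, wheel_diameter_m, pulses_per_rev=PULSES_PER_REV):
    """Distance covered = revolutions * wheel circumference."""
    revolutions = pulse_count / pulses_per_rev
    return revolutions * math.pi * wheel_diameter_m


def speed_m_per_s(pulse_count, interval_s, wheel_diameter_m, pulses_per_rev=PULSES_PER_REV):
    """Average speed over the interval in which pulse_count pulses arrived."""
    return distance_travelled_m(pulse_count, wheel_diameter_m, pulses_per_rev) / interval_s


# Assumed example: a 10 cm wheel producing 36 pulses in 2 seconds.
print(f"distance: {distance_travelled_m(36, 0.10):.3f} m")
print(f"speed:    {speed_m_per_s(36, 2.0, 0.10):.3f} m/s")
```

The resolution of the measurement is one pulse, that is, one twelfth of the wheel circumference here; a disk with more divisions gives finer resolution.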
For an optical encoder, three components are necessary:
i) A disk with a pattern as shown in Figure 3.11. Now see Figure 3.12, which shows
such a disk.
ii) An LED which generates light which either passes through the holes in the disk
or is blocked by the disk.
iii) An optical sensor which receives these light pulses and converts them to
electrical signals.
Figure 3.12 shows a pattern disk with a few holes, which can be attached to the
shaft of a moving wheel. As the wheel rotates, the light passes through the holes and
this is sensed by the receiver on the PCB shown. The pulse train obtained can be used
to calculate the distance travelled (from which velocity can be obtained).
ICs are available which contain an IR LED and a photodetector in one package. See
the optical interrupter switch in Figure 3.13 (also called the break beam switch),
with a U-shaped slot using which it can be fitted to the rotating wheels.
One such IC family is the H21A1/H21A2/H21A3 series, which consists of a gallium
arsenide infrared emitting diode coupled with a silicon phototransistor in a plastic
housing. The gap in the housing provides a means of interrupting the signal with an
opaque material, switching the output from an ‘ON’ to an ‘OFF’ state.
Figure 3.11 | Pattern used in an optical encoder
Figure 3.14 shows the arrangement of the pattern disk and the switch connected to
a wheel of a robot.
Figure 3.12 | Optical encoder transmitter and receiver
Figure 3.13 | Photograph of an optical interrupter switch
Figure 3.14 | Optical encoder fixed to the wheel of a robot (Reproduced with permission from Nex Robotics, Mumbai)
The type of pattern disk that we have just discussed is non-directional because
it cannot reveal the direction of movement. For directionality, a quadrature phase
pattern with two staggered patterns is necessary, so that the system can tell which
way the wheel is turning. Figure 3.15 shows such a pattern.
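With two staggered patterns, the two sensor outputs (call them A and B) step through a four-state Gray-code cycle, and the order in which the states appear reveals the direction. The Python sketch below shows a common way of decoding this; which order counts as ‘forward’ is an assumption here, since it depends on how the sensors are mounted, and the function name is hypothetical.

```python
# Gray-code order of (A, B) states for the direction assumed to be 'forward'.
_SEQUENCE = [(0, 0), (0, 1), (1, 1), (1, 0)]


def quadrature_count(samples):
    """Net step count from a list of (A, B) samples.

    Adds 1 for each step in the assumed forward order and subtracts 1 for each
    step in the reverse order. A repeated sample means no movement; a jump of
    two states (both channels changing at once) is ignored as invalid.
    """
    if not samples:
        return 0
    count = 0
    previous = _SEQUENCE.index(tuple(samples[0]))
    for sample in samples[1:]:
        current = _SEQUENCE.index(tuple(sample))
        step = (current - previous) % 4
        if step == 1:
            count += 1
        elif step == 3:
            count -= 1
        previous = current
    return count


forward = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
reverse = list(reversed(forward))
print(quadrature_count(forward), quadrature_count(reverse))
```

A single-channel encoder would produce the same pulse train for both directions; it is the phase relationship between the two channels that carries the sign.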
3.1.5 | Humidity Sensors
Humidity is the quantity of water vapour present in air. It can be expressed as
‘absolute’ or ‘relative’. Absolute humidity expresses the water vapour content of the
air as the mass of water vapour contained in a given volume of air. Relative humidity
is a ratio that compares the amount of water vapour in the air with the amount that
would be present at saturation. Thus, humidity sensors measure the amount of water
vapour present in air, but what are the applications for this?
In home automation systems, humidity is monitored so as to bring it to a comfortable
level. Other applications are in the semiconductor industry, where moisture levels
need to be continuously monitored during wafer processing. In the household, humidity
sensing is used for intelligent control of the living environment, microwave cooking,
laundry, etc. In automobiles, this sensor information forms the basis for window
de-fogging control.
There are different types of humidity sensors, but in general, sensing of humidity
involves a change of impedance (capacitance, for example). In a capacitive sensor,
the sensor element is a film capacitor built on a substrate (glass, ceramic, etc.).
The dielectric is a polymer which absorbs or releases water in proportion to the
relative humidity of the environment, and thus changes the capacitance of the
capacitor, which is measured by an on-board electronic circuit. Commonly available
relative humidity sensors include the HS12P and HS15P series.
3.1.6 | Other Sensors
In this section, only a few types of sensors have been covered. Besides these, there
are sensors for many other physical quantities—there are gas sensors, smoke sensors,
Figure 3.15 | A directional pattern
piezo-electric sensors (for sensing stress, strain etc), touch sensors and so on. As per the
requirements of the applications, standard sensors for these can be easily sought out.
3.2 | Analog to Digital Converters
Most sensors give an analog output, typically a voltage that varies with the physical quantity
sensed. To convert it to a digital number which an MCU can use, an Analog to Digital Converter
(ADC) is used. Many MCUs have ADCs inside the chip (PIC, ARM, AVR, etc.), while
there are MCUs like the 8051 where an external ADC might be needed. In this section, we
review some important aspects of ADCs.
3.2.1 | ADC Interfacing *
ADCs [Analog-to-Digital Converters] convert analog voltages into digital codes which
can be processed by embedded systems. ADCs are required for all systems that need to
interface with real-world (analog) signals.
ADCs usually have two separate interfaces that are accessible to an embedded system.
i) Control interface
ii) Data interface
3.2.1.1 | Control Interface
ADCs vary widely in complexity, performance and speed. They have various modes and
states which need to be managed to make them operate in the manner we need them
to. For example, an ordinary SAR (Successive Approximation Register) ADC might
continuously convert the input signal, generate codes and put them out on the data bus.
However, the ADC might have different modes like 10-bit/12-bit/14-bit (resolution),
various offset correction modes, power modes, latency modes (which determine how
many clock cycles it takes for a given input to appear at the output), input clock modes
(single ended/differential), etc. There might be a huge number of modes depending on
the complexity of the ADC. Some might have elaborate schemes for tweaking various
aspects of their operation to improve or tailor performance to our needs.
Register Control
All these modes and states are managed with the help of registers and register controlled
fuses inside the ADC.
Hence, we need some way of writing into and reading from these registers. For this,
we depend on the control interface which is essentially a register interface.
Various industry standards exist for such interfaces. A few among them are
(Section 5.1.3)
i) SPI
ii) I2C
iii) UART
* The section 3.2.1 is written by Sabu Paul, Analog Design Engineer, Texas Instruments,
Bangalore.
Pin Control
For very simple ADCs, there might not be enough states or modes to justify
the usage of a register interface. They might simply have a few pins which control the
state of the ADC. Typically, the number of such pins will be less than or equal to 4, which
allows us to select between up to 16 different states. Some such devices might have a simple
state machine inside which allows the ADC to respond to the sequence of inputs applied
at the input pins. This is frequently used in ADCs meant for compact biomedical applications.
The ADC might be combined with an AFE (Analog Front End) and/or transceiver.
The entire module can be controlled through the pin interface mentioned earlier.
3.2.1.2 | Data Interface
ADCs vary widely in their speed of conversion, from a few samples per second to several
GSPS (Giga Samples per Second). In the case of very slow ADCs, separate data and control
interfaces are not usually used. In such devices, the register interface is used both to read
out the data and to write and read state information. For example, most microcontrollers
contain a built-in ADC which has a register interface through which control signals
like start conversion/stop conversion can be sent and the data can be read out.
In the case of high speed ADCs, the transmission speeds of conventional register
interfaces and their signalling overheads together make them unsuitable for data. Also,
in most of the cases where high speed ADCs are used, control signals will have to be sent
without interrupting the flow of data.
In such cases, a separate dedicated data interface is used.
Two types of interfaces used are:
i) Parallel
ii) Serial
Parallel Interface
In a parallel data interface, each data word is transmitted over several physical lines with
each line carrying one data bit. There is also a clock line which is used to latch the data
when it is ready. These systems are losing popularity and are used mainly in applications
where the resolution and/or speed is low. In such cases, parallel interfaces make sense
since the data can be transmitted without a lot of digital manipulation by the ADC
and read directly by the controller. This is unlike serial systems which require a SERDES
(Serializer-Deserializer) for data transfer. A clearer picture of its pros and cons vis-à-vis
the serial system will emerge during the discussion of serial systems.
Parallel interfaces can be classified based on the clock edge used for latching data.
i) Positive edge: The data is latched on the positive edge of the clock.
ii) Negative edge: The data is latched on the negative edge.
iii) Dual edge: The data is latched on both the positive and negative edges of the clock. This
is also known as a DDR [Double Data Rate] scheme. This can reduce the number of
data lines by half (at the cost of some extra hardware) or double the data rate.
Based on the number of lines used to transmit a single bit, we have
i) Single ended: A single line is used to transmit one bit and the voltage is referred to
the common ground.
ii) Differential: A matched pair is used to transmit 1 bit. One line carries bit +V and the
other bit -V. This improves noise immunity and can allow for reduced signal swing.
A differential system requires twice the number of lines as a single-ended system.
Based on the voltage levels, these interfaces are classified into
i) CMOS (1.8 -3.3V): Modern ADCs have digital modules made using MOS tech-
nology. So, this voltage level is commonly used by a lot of ADCs for signalling.
ii) TTL (5V).
iii) LVDS (700mV peak-to-peak): Low Voltage Differential Signalling is used in dif-
ferential systems. Both bit+ and bit- lines have a swing of 350mV peak-to-peak
each. The noise immunity afforded by the matched pair of lines is what allows the
usage of this reduced swing without compromising on BER (Bit Error Rate).
The LVDS scheme requires 2X (twice) the number of lines. To work around this problem,
it is normally used in conjunction with the DDR scheme mentioned earlier. It is
then called a DDR LVDS scheme. This allows us to keep exactly the same pin count while
at the same time getting the reduced power consumption and EMI (Electro Magnetic
Interference) levels of an LVDS scheme. The advantages of the LVDS scheme will
become more apparent after the discussion of its use in serial systems. ICs are available
which are capable of translating between voltage levels and signal modes (Differential/
Single-ended).
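The pin-count argument for DDR LVDS can be checked with a few lines of Python. This is a rough sketch under simple assumptions: the function name `data_pin_count` is hypothetical, only data pins are counted (clock and control pins are ignored), and a DDR line is taken to carry exactly two bits per clock cycle.

```python
def data_pin_count(bits_per_word, differential=False, ddr=False):
    """Number of data pins needed by a parallel ADC interface (clock excluded)."""
    # DDR sends two bits per line per clock cycle, so half the lines are enough.
    lines = (bits_per_word + 1) // 2 if ddr else bits_per_word
    # Differential signalling needs a matched pair of pins for every line.
    return lines * 2 if differential else lines
```

For a 14-bit word this gives 14 pins for a single-ended interface, 28 pins for plain LVDS and 14 pins again for DDR LVDS.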
Serial Interface
A lot of the modern day high performance devices use the serial mode of data transfer.
Here, instead of each data line carrying 1 bit of the data word, all the bits are sent
serially over one single line. There are many advantages to this approach when compared
to a parallel system.
Advantages
i) Fewer physical connections.
ii) Smaller die size made possible by reduced pin count.
iii) The smaller size is a big advantage for some of the higher resolution (14 bits)/
multi-channel ADCs.
iv) Serial interfaces can operate differentially as the increase in the number of wires is
minimal. This drastically cuts down noise pick-up and allows for lower signalling
voltages.
v) The EMI generated by a balanced differential pair is much lower than that generated by a
large number of single ended parallel data lines. This is because the opposing currents
in the pair cancel out each other's magnetic fields.
vi) The lower signal swing made possible by the common mode noise rejection in a
differential system reduces power consumption.
vii) Serial interfaces allow for features like clock recovery. This makes it unnecessary
to have a separate clock line. The clock is recovered from the data. This is possible
because serial interfaces switch at a much higher speed than parallel buses.
viii) Quite a few high-quality serial interface standards exist, which allows for easy
interoperability.
Disadvantages
i) Requires a deserializer to convert the serial data to data words.
ii) Features like clock recovery require the data to be modified, both to make sure that there
is a sufficient number of transitions for the PLL (Phase Locked Loop) to lock on to
and to ensure that there are an equal number of 1s and 0s in a specified number of
bits, so that the input common mode of the receiver does not change.
iii) Parallel data has to be converted to serial data and a high speed clock generated
inside the device.
iv) The speed of transmission is higher by a factor equal to the resolution of the ADC
when compared to a parallel scheme. So, reliability of data transfer becomes very
sensitive to clock jitter and board parasitics.
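The speed penalty in point iv) is easy to put into numbers. The sketch below is illustrative only: `serial_bit_rate` is a hypothetical helper, and the optional `coded_bits_per_data_bit` factor is an assumption added here to model the extra bits that a line code inserts for transitions and 1/0 balance (for example, 1.25 for an 8b/10b code, which sends 10 line bits for every 8 data bits).

```python
def serial_bit_rate(sample_rate_sps, resolution_bits, coded_bits_per_data_bit=1.0):
    """Line rate in bits per second when every sample is sent over one serial line."""
    return sample_rate_sps * resolution_bits * coded_bits_per_data_bit
```

A 14-bit ADC sampling at 100 MSPS therefore needs a 1.4 Gbit/s line, against 100 MHz per line in the parallel case, and 1.75 Gbit/s if an 8b/10b-style code is assumed.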
Serial interfaces are also classified into the following:
• Single-ended: There is only one data line and the signal voltage is referred to the common
ground. The zero crossings are susceptible to noise, but the number of lines is smaller.
Single ended interfaces come with different signalling levels. A couple of them are
– CMOS (1.8-3.3V)
– TTL (5V)
• Differential: Data is transmitted over a matched differential pair. This reduces noise
pick-up and EMI generation. Signal swing can be lower because of the improved noise immunity.
This can save power. These systems require special interfacing ICs to convert the differential
signal into a single ended one for use in the embedded system. Differential
systems are further classified based on voltage swing.
– Rail-Rail Swing (e.g. USB)
– LVDS (Low Voltage Differential Signalling 700mVpp)
Nowadays, high speed multi-channel ADCs are available which convert several
input signals simultaneously into corresponding digital codes. They give out output
codes serially over several output data channels. Sometimes, hybrid systems are used to
get around speed limitations. These systems are mainly of the following two types:
• Multiplexed output: Data from several input channels is multiplexed onto one out-
put channel to save on pin count.
• Interleaved output: Data from one channel can be split up into 2 or more streams
which are then sent serially over multiple output data channels. This is done to sup-
port higher sampling speeds.
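The difference between the two hybrid schemes can be illustrated with plain Python lists standing in for sample streams. This is a sketch only: the function names `multiplex` and `interleave` are hypothetical, and a simple round-robin order is assumed in both cases, which a real device need not follow.

```python
def multiplex(channels):
    """Merge several input channels onto one output stream, one sample from each in turn."""
    return [sample for group in zip(*channels) for sample in group]


def interleave(stream, lanes):
    """Split one channel's samples across several output lanes, round-robin."""
    return [stream[lane::lanes] for lane in range(lanes)]
```

Multiplexing trades speed for pins (one output carries everything), while interleaving trades pins for speed (each lane runs at a fraction of the sample rate).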
Conventional serial standards like USB and Firewire (Section 5.3) are not used in
data converters. This is because ADCs are a special category of devices sending out one
single type of data. Hence, the complications, overheads and limitations of these serial
standards make them unsuitable for use in ADC interfacing. The JEDEC (Joint Electron
Device Engineering Council) JESD204 is the upcoming standard for serial data interfaces
in high speed data converters. This is a differential serial scheme supporting very high data
transfer speeds over multiple synchronized lanes and from multiple ADCs.
The complexity of the interfacing depends on the speed and resolution of the ADCs
used. It varies from simple single-ended CMOS/TTL parallel interfaces to JEDEC
SERDES interfaces. The cost also increases correspondingly.
3.2.1.3 | Interfacing an ADC to 8051
ADCs may be ‘parallel’ or ‘serial’, as we have just discussed. First, we consider a parallel ADC.
3.2.1.4 | An ADC with a Parallel Data Interface
Our interest is to interface an ADC to an 8051 MCU using Port 2 as the data lines,
and some pins of Port 1 for the control signals needed by the ADC. When an analog
voltage is given as an input to an ADC, it gets converted to a digital number which is
transferred to the 8051. The digital value can be stored in the RAM of the system and
may be displayed or used in further computations. See the block diagram of such a setup
in Figure 3.16.
ADC 0808/0809
Here, we choose the ADC0808/ADC0809, which is an 8-bit parallel ADC and is microprocessor
compatible. Its functional pin diagram is shown in Figure 3.17. It is designated
as an ‘8-Bit μP Compatible A/D Converter with 8-Channel Multiplexer’. It uses the
successive approximation technique for analog to digital conversion.
Figure 3.16 | General block diagram of the connection between an ADC and an MCU (8051)
(blocks: analog inputs, ADC, 8051 input port, control signals)
Figure 3.17 | Functional pin diagram of ADC 0808/0809
(pins: IN0-IN7, A, B, C, ALE, SC, EOC, OE, D0 (LSB) to D7, Clock, Vref(+), Vref(–), VCC, GND)
Its key specifications are given as:
1. Resolution: 8 bits
2. Total unadjusted error: ±½ LSB and ±1 LSB
3. Single supply: 5 VDC
4. Low power: 15 mW
5. Conversion time: 100 μs
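To relate the 8-bit resolution to actual voltages, the following Python sketch models an ideal converter. It is an approximation under stated assumptions, not the data sheet transfer function: the names `lsb_volts`, `code_to_volts` and `volts_to_code` are hypothetical, the reference pins are assumed to be tied to 5 V and 0 V, and the error figures listed above are ignored.

```python
def lsb_volts(vref_plus=5.0, vref_minus=0.0, bits=8):
    """Voltage step represented by one output code, for an ideal converter."""
    return (vref_plus - vref_minus) / (2 ** bits)


def code_to_volts(code, vref_plus=5.0, vref_minus=0.0, bits=8):
    """Approximate input voltage corresponding to an output code."""
    return vref_minus + code * lsb_volts(vref_plus, vref_minus, bits)


def volts_to_code(volts, vref_plus=5.0, vref_minus=0.0, bits=8):
    """Ideal output code for an input voltage, clamped to the valid code range."""
    code = int((volts - vref_minus) / lsb_volts(vref_plus, vref_minus, bits))
    return max(0, min(code, 2 ** bits - 1))
```

With these assumed references, one LSB is 5/256 V (about 19.5 mV), so an error of ±1 LSB corresponds to roughly ±20 mV at the input.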
It is an 8-input multiplexed ADC, which means that it has 8 input analog signal
lines, though only one of them can be operational at a time. The active input is selected by
three address inputs A, B, C. Table 3.1 shows the address bit configuration for selecting
specific input channels. For example, if IN0 is to act as the input, the address lines C, B and
A all have to be low; for IN1, CBA has to be 001 and so on.
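The channel selection logic is simply the channel number written in binary, with C as the most significant bit. A small Python sketch makes this explicit; `channel_address` is a hypothetical helper name used only for illustration.

```python
def channel_address(channel):
    """Return the levels (C, B, A) that select analog input IN<channel>."""
    if not 0 <= channel <= 7:
        raise ValueError("the ADC0808/0809 has only inputs IN0 to IN7")
    c = (channel >> 2) & 1  # most significant address bit
    b = (channel >> 1) & 1
    a = channel & 1         # least significant address bit
    return c, b, a
```

For example, `channel_address(6)` gives `(1, 1, 0)`, which is the CBA pattern for IN6.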
The first requirement in using the ADC is to select an input channel by giving the
appropriate logic on the address pins. To latch this on to the chip, a signal called ALE
(Address Latch Enable) is to be supplied on the ALE pin. ALE is to be a low to high
transition (refer to Figure 3.18). After the address is latched and the analog input is
Table 3.1 | Channel Selection Logic
Selected analog channel C B A
IN0 0 0 0
IN1 0 0 1
IN2 0 1 0
IN3 0 1 1
IN4 1 0 0
IN5 1 0 1
IN6 1 1 0
IN7 1 1 1
Figure 3.18 | Timing diagram for the ADC 0808/0809
(signals: Clock, WR (SC), RD (OE), ALE, ADDR, EOC, D0-D7; events marked: latch address, start conversion)
available at the selected input line, the ADC must be signalled to start conversion. This
signal (SC) is a low to high pulse of minimum specified duration (as mentioned in the data
sheet). The ADC requires a clock and the speed of conversion depends on the clock rate.
The maximum clock frequency is specified in the data sheet. The clock of the MCU can
be divided to get the right frequency for the ADC.
The ADC takes a finite time to complete the conversion. Its EOC (End of Conversion)
pin goes low shortly after the conversion starts and returns high when the conversion is
complete. This should be brought to the notice of the MCU. The EOC signal can be used to
interrupt the MCU and allow the converted data to be transferred to the 8051, or this signal
can be polled continuously.
To receive the digital data, the output lines of the ADC are to be activated. This is
done by making the OE (Output Enable) line high. Once the output lines are activated,
the converted digital data can be transferred to the 8051. The above is the sequence of
actions necessary to use the ADC chip 0809 to perform analog to digital conversion and
then to transfer the digital data to the 8051.
Let us now use the pins of 8051 for the purposes specified above. Make the con-
nections between the 8051 and the ADC as shown in Figure 3.19. The salient points
regarding the connection are as follows:
i) Port 2 is used in the input mode to get the converted digital data from the ADC
to 8051.
ii) The port pins P1.7, P1.6 and P1.5 are used in the output mode as the address
selection pins A, B, C of the ADC.
iii) The port pin P1.0 is used as ALE. It is to be an output pin.
iv) The port pin P1.1 is used to give the start conversion (SC) pulse to the ADC. Hence,
it is to be an output pin.
v) The port pin P1.2 is used in the input mode, to receive the End of Conversion
(EOC) signal from the ADC.
vi) Port pin P1.3 is used as OE for the ADC. It is defined as an output pin.
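The order of operations on these pins can be rehearsed on a host PC before writing 8051 code. The Python sketch below is a toy model, not firmware and not a timing-accurate simulation: `MockADC0808` and `read_channel` are hypothetical names, the 'conversion' completes instantly, and EOC is modelled as low during conversion and high when the result is ready, which is the behaviour assumed here from the ADC0808/0809 data sheet.

```python
class MockADC0808:
    """Host-side stand-in for the ADC chip; it does not model real timing."""

    def __init__(self, input_codes):
        self.input_codes = input_codes  # code each of IN0..IN7 would convert to
        self.address = 0                # levels on C, B, A as a 3-bit number
        self.latched_channel = 0
        self.result = 0
        self.eoc = 1                    # high: no conversion in progress
        self.oe = 0                     # outputs disabled

    def pulse_ale(self):
        """Low to high transition on ALE latches the address."""
        self.latched_channel = self.address

    def pulse_sc(self):
        """Start conversion; a real chip keeps EOC low for the conversion time."""
        self.eoc = 0
        self.result = self.input_codes[self.latched_channel]
        self.eoc = 1

    def read_data(self):
        """Data lines D0..D7 are valid only while OE is high."""
        return self.result if self.oe else None


def read_channel(adc, channel):
    """Follow the pin sequence described above and return the converted byte."""
    adc.address = channel & 0x07  # address pins (P1.7, P1.6, P1.5 in the text)
    adc.pulse_ale()               # ALE (P1.0)
    adc.pulse_sc()                # start conversion (P1.1)
    while not adc.eoc:            # poll EOC (P1.2) until the conversion is done
        pass
    adc.oe = 1                    # enable outputs (P1.3)
    value = adc.read_data()       # read the data lines (Port 2)
    adc.oe = 0
    return value
```

On real hardware each step becomes a write to, or a read from, the corresponding port pin, and the pulses must meet the minimum widths given in the data sheet.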
Figure 3.19 | Connections between the ADC and the 8051 MCU
(connections shown: P1.7, P1.6, P1.5 to A, B, C; P1.0 to ALE; P1.1 to SC; P1.2 to EOC;
P1.3 to OE; P2 to D0-D7; analog inputs on IN0-IN7)
3.2.1.5 | Serial ADC
An example of a serial ADC is the MAX 1169, which uses the I2C protocol
(Section 5.2.1) for transfer. The specifications of this chip are given as
i) High-Speed I2C-Compatible Serial Interface
ii) 400kHz Fast Mode
iii) 1.7MHz High-Speed Mode
iv) +4.75V to +5.25V Single Supply
v) +2.7V to +5.5V Adjustable Logic Level
vi) Internal +4.096V Reference
vii) External Reference: 1V to VAVDD
viii) Internal 4MHz Conversion Clock
ix) 58.6ksps Sampling Rate
A typical application circuit for the IC is shown in Figure 3.20. Note that the MCU
to be used needs an I2C port (SDA and SCL are the I2C pins).
3.3 | Actuators
In the embedded application scenario, actuation means many things. It implies that
something is made to happen, and this ‘happening’ may be in the form of a motion, a
display, an alarm (sound or light), transmission to a distant unit, etc. When the actuation is a
‘motion’, motors have to be used for rotational or linear motion. Let’s start the discussion
with display devices, related techniques and simple circuits where necessary.
Figure 3.20 | Application circuit for the serial ADC MAX 1169
(Courtesy: Maxim Semiconductors)
(labels: analog source on AIN; REF, REFADJ, AVDD, DVDD, AGND, AGNDS, DGND, ADD0-ADD3;
SDA and SCL lines to the MCU with pull-up resistors Rp; 5.0V and 3.0V supplies)
3.3.1 | Displays
For most systems, some sort of display is necessary. Displays like LEDs, LCDs, etc. are
very common. There is a wide range and variety in the displays used in embedded systems.
Let’s examine the features of some of the most popular displays.
3.3.1.1 | Light Emitting Diodes (LED)
A light emitting diode, or LED as it is designated, works just like an ordinary semiconductor
diode. It is usually made of Gallium Arsenide and is available in different colours.
When it is forward biased, it conducts and also emits light, which is why it can be used as a
‘display unit’.
A single LED is used as an indication of the state of a switch, the activation of
any signal, reception of some data/signal, etc. It can also be made to act as an alarm by
switching it on and off continuously at a certain rate. LEDs are easy to use and give a
very bright and pleasing display which can be viewed equally well from any viewing
angle (unlike LCDs). The only drawback with LED displays is the high amount of
current they need, unlike LCDs which need very low power. Figure 3.21 shows how
a single LED can be connected to a positive power supply. The value of the
current limiting resistor depends on the current rating of the LED.
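The resistor value follows from Ohm's law applied to the voltage left over after the LED's forward drop. The Python sketch below shows the arithmetic; the function name `led_series_resistor_ohms` is hypothetical, and the forward voltage (2.0 V) and current (10 mA) used in the example are assumed typical figures, not values from the text. The actual numbers must be taken from the LED's data sheet.

```python
def led_series_resistor_ohms(supply_v, forward_v, current_ma):
    """Series resistance that sets the LED current: R = (Vsupply - Vf) / I."""
    if current_ma <= 0 or forward_v >= supply_v:
        raise ValueError("need a positive current and a supply above the forward voltage")
    return (supply_v - forward_v) / (current_ma / 1000.0)
```

With a 5 V supply, an assumed 2.0 V forward drop and 10 mA, the result is about 300 Ω; in practice the nearest standard value would be chosen.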
In case we need a number of LEDs for displays, we still use only one power source
for all of them. In that case, they are connected together in either the ‘common anode’
or ‘common cathode’ LED configuration. In Figure 3.22a, the anodes of the three
LEDs are connected together, and it is a ‘common anode’ connection. If we want to
light up only the first and third LEDs, we apply a ‘0’ (i.e., ground) only at K1 and K3.
Figure 3.21 | A single LED circuit (label: +5)
Figure 3.22a | Common anode connection (labels: +5; resistors R1-R3; LEDs D1-D3; cathodes K1-K3)
In Figure 3.22b, which is ‘common cathode’, A1 and A3 alone should be given a ‘1’ for
the same result.
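The two wiring styles differ only in the polarity of the logic level that lights an LED. A short Python sketch of this rule follows; `led_pin_levels` is a hypothetical helper, and LEDs are numbered from 1 to match D1 to D3 in the figures.

```python
def led_pin_levels(leds_to_light, total_leds, common_anode):
    """Logic level to put on each LED's free terminal (K pins or A pins)."""
    levels = []
    for led in range(1, total_leds + 1):
        lit = led in leds_to_light
        if common_anode:
            levels.append(0 if lit else 1)  # pull the cathode low to light the LED
        else:
            levels.append(1 if lit else 0)  # drive the anode high to light the LED
    return levels
```

Lighting the first and third of three LEDs gives `[0, 1, 0]` on K1 to K3 for common anode, and `[1, 0, 1]` on A1 to A3 for common cathode.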
3.3.1.2 | Seven Segment LED
However, more important applications of LEDs are as alphanumeric displays. For that,
seven segment LEDs are used, in which seven LEDs are arranged as the segments of a
display in a particular shape (see Figure 3.23a). When segments are selectively
lit up, the digits and a limited set of letters can be displayed. By lighting up all the
segments, we get ‘8’ displayed. We can have one more segment in this display and it is
for the decimal point. In Figure 3.23a, you can see that there are eight segments, including
the segment for the decimal point. In spite of this, such displays are still designated
as ‘seven’ segment displays. Such LED modules also can be used in either the common
anode or common cathode configuration.
To light up a seven segment display LED of the common cathode type, ensure that
the cathode is grounded, and give a ‘1’ to the segments which are to be lit up. The
opposite logic is to be used for the common anode type.
Figure 3.23b shows the segments of a seven segment LED, arranged as a data byte
D0 to D7.
Figure 3.23a | The segments of a seven segment LED (segments a, b, c, d, e, f, g and dp)
Data bit: D7 D6 D5 D4 D3 D2 D1 D0
Segment:  dp g  f  e  d  c  b  a
Figure 3.23b | The data byte for the seven segment LED
Figure 3.22b | Common cathode connection (labels: LEDs D1-D3; anodes A1-A3)
Example 3.1
a) Assuming a common cathode type of display, find the seven segment codes to be
used for displaying
i) 8, ii) A and iii) b.
b) How will the codes change if it is a common anode display?
Solution
a) Since it is a common cathode type, the common cathode of the LEDs of a digit is
to be grounded. Then, supply the segment information to the anodes.
i) For displaying 8, the segments to be lit up are a, b, c, d, e, f, g. Since we are using a
common cathode type of display, the data to light up a segment is ‘1’. Hence, the bits
from D0 to D6 are ‘1’. Thus, the seven segment code for the display of 8 is 0111 1111, i.e., 7FH.
ii) Similarly, for ‘A’ it is 0111 0111, i.e., 77H.
iii) For ‘b’ it is 0111 1100, i.e., 7CH.
b) For common anode displays, each bit is complemented: the code for ‘8’ is 1000 0000 (80H),
for ‘A’ is 1000 1000 (88H) and for ‘b’ is 1000 0011 (83H).
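The codes in this example can be generated mechanically from the bit assignment of Figure 3.23b (segment a on D0 up to dp on D7). The Python sketch below does this; `SEGMENT_BIT` and `segment_code` are hypothetical names, and the common anode code is taken to be the bitwise complement of the common cathode code.

```python
# Bit position of each segment in the data byte (a = D0 ... g = D6, dp = D7).
SEGMENT_BIT = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4, "f": 5, "g": 6, "dp": 7}


def segment_code(lit_segments, common_anode=False):
    """Build the seven segment code for the given lit segments."""
    code = 0
    for name in lit_segments:
        code |= 1 << SEGMENT_BIT[name]
    # A common anode display lights a segment with '0', so invert all 8 bits.
    return code ^ 0xFF if common_anode else code
```

For instance, `segment_code("abcefg")` returns 0x77, the common cathode code for ‘A’, and `segment_code("abcefg", common_anode=True)` returns 0x88.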
3.3.1.3 | Static Seven Segment Displays
Now, suppose we want to use such a module to display a single character. Let us think of
a scenario wherein the number to be displayed is sent from an 8051 port to the display
module. We just send the code (called the seven segment code) corresponding to the segments
of the LED. This code gives the information as to which of the segments are to be lit
up for the display of a specific character. In Example 3.1, if 77H is output through a
port, the character ‘A’ is displayed on the seven segment LED connected to the port lines.
Assume we use only a one digit display module. If it is a common cathode type,
we ground the common cathode and then we send the seven segment code directly to
activate the required segments. This causes some of the segments to be ON and some to
be OFF. Either way, as long as the display is ON, the module draws its required current
from the power source. This may be in the range of 5 to 30 mA for a single segment to
be lit up. The display is ON all the time, and hence it is called a ‘static’ display.
Assume now that we need an eight digit display. If we use the same kind of static
display, the current drawn is multiplied by 8, and this becomes quite a large amount.
With 7 segments per digit, 25 mA per segment and 8 digits, the current is 7 × 25 × 8 mA.
This gives a value of 1.4 A, which is much too large for an electronic
circuit. For this reason, static displays are not preferred for multiple digit displays.
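The worst-case figure can be reproduced with a one-line calculation. In the Python sketch below, `display_current_ma` is a hypothetical helper and 25 mA per segment is the same illustrative value used in the text; the second call shows the current when only one digit is ON at any instant, which is the idea behind the dynamic display discussed next.

```python
def display_current_ma(segments_lit, ma_per_segment, digits_on_at_once):
    """Total segment current in mA for the digits that are ON at the same time."""
    return segments_lit * ma_per_segment * digits_on_at_once
```

`display_current_ma(7, 25, 8)` gives 1400 mA for the static eight digit display, while `display_current_ma(7, 25, 1)` gives 175 mA with a single digit lit.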
3.3.1.4 | Dynamic Displays
When there is an array of digit display units, say 4 seven segment LEDs arranged in
digit form in a row, a continuous display can be obtained by lighting up just one digit at
a time. The next instant this digit is switched off and the next one is lit up. This is
done continuously and cyclically from digits 1 to 4 and repeated at a rapid rate. Because
of the property of persistence of vision of the eyes, an illusion of continuous display is
obtained. This is also called a multiplexed display.
The important points to note here are:
i) The common anode/cathode of a digit is to be activated for a digit to be active.
ii) At a time, only the segments of one digit are ‘ON’.
iii) After a specified delay, this digit is switched off and the segments of the next digit
are ON. The information displayed on it is, in general, different from that of the previous digit.
iv) Thus, for display multiplexing, consecutive digits should be switched on in a cyclic
fashion, and for each digit, the segment information should be supplied.
Now, let us use this concept in a system in which an 8051 handles a dynamic display
(Figure 3.24). This is a 4-digit display of the common cathode type. The ports of the 8051
are used in such a way that Port 1 supplies the digit information and Port 2 supplies the
segment information. The digit information on P1.0 to P1.3 selects which digit
is being activated at a particular time. For segment information, the seven segment code
of each digit should be sent as a byte through Port 2.
Figure 3.25 shows the complete set up. Four pins of port 1 are used for ‘digit
driving’. These pins are connected to the bases of the four transistors Q1 to Q4. At
Figure 3.24 | A dynamic display for an 8051-based system
(blocks: 8051; Port 1 pins P1.0-P1.3 to the digit driving logic; Port 2 to the segment driving logic)
Figure 3.25 | A four digit dynamic display using the 8051.
(blocks: 8051; Port 2 to the segment driver; Port 1 pins P1.0-P1.3 to the digit driver
transistors Q1-Q4 through their bases B1-B4)
a time, only one particular transistor is to be ON. These are PNP transistors and are
turned ON if a ‘0’ is applied to the bases. This ‘0’ reaches the emitter of the transistor,
which is connected to the common cathode of the segment LEDs of a particular
digit. At a time, Port 1 gives a ‘0’ only on one of its port lines. To understand
this clearly, observe Figure 3.25. The most significant digit (or the left most digit of
the display) is activated by a ‘0’ on pin P1.0. When this pin is cleared, P1.1 to P1.3
should be set. At the same time, the segment information for displaying the left
most digit should be placed on Port 2. If Port 2 gives the data 77H, the first digit
displays ‘A’.
This technique is to be repeated for all digits continuously. The steps are:
1. Select the first digit to be displayed and send a suitable logic through P1.0 to P1.3 to
activate a digit.
2. Send the segment code through Port 2.
3. Call a delay of say, 3msecs.
4. Repeat this sequence for all four digits.
5. Then start again from the first step.
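The steps above can be sketched in Python as the list of values that would be written to the two ports during one pass over the display. This is a host-side illustration, not 8051 code: `scan_frame` is a hypothetical name, position 0 is assumed to be the digit selected by P1.0, and the digit select lines are active low as described in the text. The delay between steps is left out.

```python
def scan_frame(segment_codes):
    """Return (digit_select, segment_code) pairs for one pass over all digits."""
    digits = len(segment_codes)
    all_off = (1 << digits) - 1  # every select line high: no digit active
    frame = []
    for position, code in enumerate(segment_codes):
        # Active low: clear only the bit of the digit being lit.
        digit_select = all_off & ~(1 << position)
        frame.append((digit_select, code))
    return frame
```

For the codes 77H, 7CH, 7FH, 7FH the select values come out as 1110, 1101, 1011 and 0111 in binary, so exactly one digit is active at each step; repeating the frame in a loop with a short delay after each pair gives the multiplexed display.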
With 4 digits and a 3 msecs delay, we get back to the first digit every 12 msecs.
This corresponds to a refresh rate of around 83 times per second, which is sufficient
to fool the eye into believing that all the digits are ON at the same time (the persistence
of human vision is about 1/16th of a second, i.e., 62.5 msec).
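The refresh rate calculation generalises to any number of digits and any per-digit delay. A minimal Python sketch, with the hypothetical helper name `refresh_rate_hz`, is given below; it ignores the time taken by the port writes themselves.

```python
def refresh_rate_hz(digits, delay_ms_per_digit):
    """How many times per second each digit is revisited."""
    return 1000.0 / (digits * delay_ms_per_digit)
```

For 4 digits and 3 ms per digit this gives about 83.3 Hz, comfortably above the 16 Hz implied by a 62.5 ms persistence of vision. With 8 digits and the same delay the rate drops to about 41.7 Hz.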
3.3.1.5 | Organic LED (OLED)
This is a relatively new type of display which has gained acceptance in mobile phones,
PDAs, digital media players, cameras and similar portable applications. OLED-based
TVs are also making their foray into the consumer market.
In the physical structure of an OLED, a layer of organic material (which
is emissive and electroluminescent) is sandwiched between two conductors (an anode and
a cathode), which in turn are sandwiched between a glass top plate (seal) and a glass
bottom plate (substrate). See Figure 3.26. When electric current is applied to the two
conductors, a bright, electro-luminescent light is produced directly from the organic
material. There are two types of OLEDs: small molecule OLED and polymer OLED.
The small molecule type is considered to have a longer lifespan. The OLED primary
colour matrix is arranged in red, green and blue pixels, which are mounted directly to a
printed circuit board. It expresses pure colours when an electric current stimulates the
relevant pixels.
Thin organic layers serve these displays as a source of light, which offers significant
advantages in relation to conventional technologies. The nature of its technology lends
itself to extremely thin and lightweight designs, which makes its application domain
very wide. To list out a few plus points of OLEDs:
i) Unlimited viewing angle
ii) Low power consumption
iii) Fast ‘response time’
iv) Brighter and more brilliant picture
When comparing it with standard display technologies, the notable points are that:
i) Unlike LCDs, they require no backlighting, because they are emissive devices.
ii) The fabrication process is easy, and devices are thinner and lighter than those
fabricated by cathode ray tube (CRT) display technology.
3.3.1.6 | Liquid Crystal Displays (LCDs)
Liquid crystal displays called LCDs are very popular, with their qualities of low power
dissipation and ease of use. The only problem normally encountered is the problem of
the viewing angle. The display is not equally clear at all viewing angles. They are avail-
able as character LCDs for displaying ASCII characters, and as graphical LCDs which
contain display elements as dots or pixels which can be selectively illuminated, so as to
display any pattern.
Character LCD
Character LCD modules of many different specifications (mostly differing in the num-
ber of lines, number of characters per line and so on) are available. An LCD module has
registers, writing into which the display can be easily programmed and controlled. Here,
we will discuss a 16 × 2 character LCD which looks as shown in Figure 3.27.
Pins of the LCD: The LCD that we have selected has 16 pins, as shown in Table 3.2.
We see that DB0 to DB7 correspond to the data pins. The others are the pins for control
signals and the power supply. VEE is a pin used to adjust the contrast of the display. It is
usually connected to a potentiometer, so that contrast can be adjusted. Besides that, there
Figure 3.26 | Schematic of a bilayer OLED: 1. Cathode 2. Emissive Layer
3. Emission of radiation 4. Conductive Layer 5. Anode
Figure 3.27 | A 16 × 2 character LCD module
are the VCC and ground pins. There are two pins for backlight adjustment, if necessary.
Backlighting means extra lighting behind the LCD panel (usually LEDs) so that the
display is visible in the dark also.
• RS (Register Select): This pin selects between the command register and the data register.
RS = 0 corresponds to the command register and RS = 1 to the data register. The data
to be displayed is sent to the data register.
• R/W (Read/Write): This pin allows the user to write to or read from the display.
When there is no need to read from the display, this pin is grounded. If both reading
and writing are required, this pin is made programmable.
• E (Enable): The enable pin has to be given a high-to-low pulse, with the high level
maintained for at least 450 ns (this may be different for other LCD modules).
• DB0 to DB7: These are the data pins of the LCD. Data to be written is sent
through these pins, and data to be read is received from the LCD through these
pins. The data to be written for display is sent as ASCII characters. For writing into
the command register, there are predefined codes for the LCD. The codes for the
specific LCD considered here are given in Table 3.3.
Table 3.2 | Pins of the 16 × 2 LCD Module
No   Symbol   Function
1    Vss      Power supply ground (0V)
2    Vcc      Power supply (5V)
3    VEE      Power supply for adjusting contrast
4    RS       Register select signal
5    R/W      Read write select signal
6    E        Enable signal
7    DB0      Data bus line
8    DB1      Data bus line
9    DB2      Data bus line
10   DB3      Data bus line
11   DB4      Data bus line
12   DB5      Data bus line
13   DB6      Data bus line
14   DB7      Data bus line
15   A        Positive voltage for back light
16   K        0V for back light
• Busy Flag: The LCD needs a minimum time to latch one item of data and get it
displayed. Characters can be sent only one at a time, so the simplest way to send a
new character is to introduce a small delay between two consecutive writes. Another
method is to check what is called the ‘Busy’ flag of the LCD. The ‘Busy’ flag is DB7,
and it can be read when R/W = 1 and RS = 0. If DB7 is found high, the LCD is busy
with its internal operations and will not accept any new information. Keep checking
this flag until it is low. Then the next data can be written.
Note If the busy flag is to be read, the R/W pin has to be made programmable.
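The busy-flag check described above can be sketched as a small host-side Python model. This is only an illustration under stated assumptions: `wait_until_ready`, `read_status` and `max_polls` are hypothetical names, and `read_status` stands in for whatever routine reads the data lines with RS = 0 and R/W = 1 on real hardware.

```python
BUSY_MASK = 0x80  # the busy flag is DB7


def wait_until_ready(read_status, max_polls=1000):
    """Poll the status byte until DB7 is low.

    Returns the number of reads that found the LCD busy.
    read_status is a hypothetical callable that returns the byte read
    from DB0..DB7 with RS = 0 and R/W = 1.
    """
    for busy_reads in range(max_polls):
        if not (read_status() & BUSY_MASK):
            return busy_reads
    raise TimeoutError("LCD stayed busy")


# Stand-in for the hardware: busy for two reads, then ready.
_fake_status = iter([0x80, 0x85, 0x05])
busy_reads = wait_until_ready(lambda: next(_fake_status))
```

The poll limit is a design choice of this sketch: without it, a disconnected or faulty LCD would hang the program forever.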
Now, let us do some display activities using a 16 × 2 LCD. Data and commands are
sent from the ports of the 8051. See Figure 3.28. Port 1 is used for the data lines, and pins
P2.1 and P2.2 are used for RS and E.
The connections are done as follows. Refer to Figure 3.28.
i) VSS and R/W are connected to ground.
ii) VCC is connected to the 5 V supply.
iii) VEE is connected through a 10 K pot to the supply for contrast adjustment.
iv) RS is connected to P2.1 and E is connected to P2.2.
v) Pins 7 to 14 (DB0 to DB7) of the LCD module are connected to Port 1 of the 8051.
vi) Pins 15 and 16 of the LCD are used for backlight adjustment (not shown in
Figure 3.28).
Backlight There is a lamp here instead of reflected light. If backlighting is provided by
LEDs as in the case of many 16 × 2 LCDs, connect pin 16 to ground, and pin 15 to Vcc
through a 100 Ω resistor.
Getting a Character Displayed on the LCD To display a character on the LCD, the
ASCII value of the character should be sent to the data register. Before sending
the data, the appropriate control signals must be activated by placing the required logic
levels on the port pins. Also, the LCD is first initialized, then cleared, and then the
cursor is positioned. This is done by sending command words to the LCD command
register (refer to Table 3.3).
Algorithm
i) Send a command word to Port 1. The command words, sent one at a time, are 38H
(initializing the LCD), 0EH (turning the display and cursor ON), 01H (clearing the
screen), 06H (shifting the cursor right) and 81H (moving the cursor to line 1,
position 1, where position 0 is address 80H).
Figure 3.28 | Connecting the LCD module to the pins of 8051
ii) Make RS = 0 (by clearing P2.1) to select the command register.
iii) Make R/W = 0 to write to the LCD (if this line is grounded as in Figure 3.28, this step
can be skipped).
iv) Send a high-to-low pulse at the E pin to complete the writing. For this, make P2.2 high
for a short while and then clear it.
v) Repeat steps i) to iv) for each command word. With this, the writing of commands is
over. Now the required data must be written.
vi) Place the ASCII code of the character on Port 1 and make RS = 1 to select the data
register.
vii) Repeat steps iii) and iv).
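The algorithm above can be traced with a minimal host-side Python model. It is not 8051 code: `LcdModel` and `show_text` are hypothetical names, the pin roles (RS on P2.1, E on P2.2, data on Port 1) follow the text, and the delay or busy-flag check needed on real hardware is only marked by a comment.

```python
# Command words from step i) of the algorithm.
INIT_COMMANDS = [0x38, 0x0E, 0x01, 0x06, 0x81]


class LcdModel:
    """Records what the LCD would latch on each high-to-low pulse of E."""

    def __init__(self):
        self.port1 = 0x00   # data lines DB0..DB7 (Port 1)
        self.rs = 0         # P2.1: 0 = command register, 1 = data register
        self.e = 0          # P2.2: enable line
        self.latched = []   # (register, byte) pairs seen by the LCD

    def pulse_enable(self):
        self.e = 1          # hold E high (at least 450 ns on real hardware)
        self.e = 0          # falling edge: the LCD latches Port 1
        register = "data" if self.rs else "command"
        self.latched.append((register, self.port1))

    def write(self, byte, is_data):
        self.port1 = byte & 0xFF
        self.rs = 1 if is_data else 0
        self.pulse_enable()
        # Real hardware: delay here, or poll the busy flag, before the next write.


def show_text(lcd, text):
    """Send the initialization commands, then one ASCII code per character."""
    for command in INIT_COMMANDS:
        lcd.write(command, is_data=False)
    for ch in text:
        lcd.write(ord(ch), is_data=True)


lcd = LcdModel()
show_text(lcd, "Hi")
```

After the call, `lcd.latched` holds the five command words followed by the ASCII codes 0x48 and 0x69, in the order the LCD would receive them.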
In this setup (Figure 3.28), one whole 8-bit port is used up for the LCD data. To save
on pins, it is possible to drive the LCD with just 4 data pins of a port. The data and
command words are sent as in the previous case, but the 8 bits are sent
as two nibbles, so only four data lines of an MCU port are needed. See Figure 3.29,
which shows that only 7 port pins are needed in total to connect an LCD module to
an MCU.
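Splitting a byte into two nibbles can be sketched as follows. The function names are hypothetical, and sending the high nibble first is an assumption (it is the usual order for common character LCD controllers); the data sheet of the module should be checked.

```python
def split_into_nibbles(byte):
    """Return (high_nibble, low_nibble) of an 8-bit value."""
    return (byte >> 4) & 0x0F, byte & 0x0F


def nibble_writes(byte):
    """Return the two 4-bit values in the order they would be sent.

    Assumption: the high nibble goes first. Each value would be placed
    on DB4..DB7 and latched with its own pulse on E.
    """
    high, low = split_into_nibbles(byte)
    return [high, low]
```

For example, the command word 38H is sent as 3 and then 8, and the character 'A' (41H) as 4 and then 1.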
Graphical LCD (GLCD)
Character LCDs have their limitations in that they can display characters only. Graphical
LCDs are currently used to display customized characters and images. Graphical LCDs
Table 3.3 | Command Codes of the LCD
Code (Hex)   Command
01           Clear display screen
02           Return home
04           Shift cursor left (decrement cursor)
05           Shift display right
06           Shift cursor right (increment cursor)
07           Shift display left
08           Display off, cursor off
0A           Display off, cursor on
0C           Display on, cursor off
0E           Display on, cursor on (not blinking)
0F           Display on, cursor blinking
10           Shift cursor position to left
14           Shift cursor position to right
18           Shift the entire display to the left
1C           Shift the entire display to the right
80*          Force cursor to the beginning of the first line
C0*          Force cursor to the beginning of the second line
38           2 lines and 5 × 7 matrix
* For a 16 × 2 line display, the addresses of the cursor positions are 80 to 8F for the first line, and C0 to CF for
the second line
find use in many applications; they are used as display units in video games, mobile
phones, etc. The customization is possible because the LCD has dots or pixels which may
be selectively lit to generate the display we need. Thus, their size is specified as
M × N dots or pixels.
Such LCD displays come in a variety of sizes, ranging from 32 × 80 to 240 × 320
dots/pixels. Larger displays offer more display area, cost more and take longer to refresh
the entire screen with new data.
Graphical LCD modules come with inbuilt controllers which allow us to interface
the display with an MCU, for selectively lighting up the required pixels. Figure 3.30
shows a 128 × 64 dot graphical LCD manufactured by ‘Vishay’, with two
built-in controller ICs (KS0108 or equivalent). Table 3.4 gives the pin
functions of this LCD display module.
Two controllers are needed because the display is split logically into a left half and a
right half, with IC1 (chip select 1) controlling the left half
of the display and IC2 (chip select 2) controlling the right half. Each controller must be
addressed independently. Each half consists of 8 horizontal pages, each of which is 8 bits
(1 byte) high. The page addresses, 0 to 7, specify one of the 8 pages. This is illustrated in
Figure 3.31.
Figure 3.29 | A 4-bit LCD interface (8051 pins P2.0 to P2.6 connect to RS, R/W, EN and DB4 to DB7 of the LCD)
Figure 3.30 | A 128 × 64 dots (pixels) graphical LCD
Table 3.4 | Pin functions of the LCD module of Figure 3.30
Pin Number   Symbol   Function
1            Vss      GND
2            Vdd      Power Supply (+5V)
3            Vo       Contrast Adjustment
4            D/I      Data/Instruction
5            R/W      Data Read/Write
6            E        Enable Signal (H to L)
7            DB0      Data Bus Line
8            DB1      Data Bus Line
9            DB2      Data Bus Line
10           DB3      Data Bus Line
11           DB4      Data Bus Line
12           DB5      Data Bus Line
13           DB6      Data Bus Line
14           DB7      Data Bus Line
15           CS1      Chip Selector for IC1
16           CS2      Chip Selector for IC2
17           RST      Reset
18           Vee      Negative Voltage Output
19           A        Power Supply for LED (4.2V)
20           K        Power Supply for LED (0V)
Figure 3.31 | Division into two halves, and the vertical pages of the GLCD
(Controller 1 is selected with CS1 = 1, CS2 = 0 and Controller 2 with CS1 = 0, CS2 = 1.
Each half is 64 bits wide and 64 bits high, organized as Pages 0 to 7; each page is
64 bytes (columns) × 8 bits, D0 to D7.)
The Controller IC The KS0108B is an LCD driver with 64 output channels for dot
matrix liquid crystal graphic display systems. This device consists of the display RAM,
a 64-bit data latch, 64-bit drivers and decoder logic. Its internal display RAM stores
the display data transferred from an 8-bit MCU, and the device generates the dot matrix
LCD driving signals corresponding to the stored data.
Commands of the Controller The following are the KS0108 commands. See
Figure 3.32.
• Y address (0 to 63): The Y address counter designates the address of the internal RAM.
An address is set by an instruction and is increased by 1 automatically by each read or
write of display data. Y address 0 is the leftmost byte, and Y address 63 is the
rightmost byte of a page.
• X address (0 to 7): This is the page address and has no count function.
• Display line (0 to 63): The display start line register specifies the line in RAM which
corresponds to the top line of the LCD panel when the contents of the display data
RAM are displayed. It is used for scrolling the screen.
Instruction            RS   R/W   DB7   DB6   DB5   DB4   DB3   DB2   DB1   DB0
Display On/Off         L    L     L     L     H     H     H     H     H     L/H
Set Address            L    L     L     H     Y address (0 to 63)
Set Page (X Address)   L    L     H     L     H     H     H     Page (0 to 7)
Display Start Line     L    L     H     H     Display start line (0 to 63)
Status Read            L    H     BUSY  L     ON/OFF RESET L    L     L     L
Write Display Data     H    L     Write data
Read Display Data      H    H     Read data
Function of each instruction:
Display On/Off: Controls the display on or off (DB0 = H: on, L: off). Internal status and display RAM data are not affected.
Set Address: Sets the Y address in the Y address counter.
Set Page (X Address): Sets the X address at the X address register.
Display Start Line: Indicates the display data RAM displayed at the top of the screen.
Status Read: Reads the status. Busy L: Ready, H: In operation. On/Off L: Display on, H: Display off. Reset L: Normal, H: Reset.
Write Display Data: Writes data (DB0:7) into the display data RAM. After the write, the Y address is increased by 1 automatically.
Read Display Data: Reads data (DB0:7) from the display data RAM to the data bus.
Figure 3.32 | Commands of the controller
With this information, and an in-depth reading of the data sheet of the controller,
it should now be possible for you to interface a graphical LCD to an MCU like the 8051,
PIC, etc.
The algorithm for use is similar to that of the character LCD, i.e., send the
initialization signals as commands, and then send the data.
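As an illustration of "send the initialization signals as commands", the command bytes of the controller can be built as below. This is a sketch under assumptions: the base codes (3EH/3FH, 40H, B8H, C0H) are those of the commonly published KS0108 instruction table and must be verified against the data sheet of the actual module, and the function names and the order of the initialization commands are hypothetical.

```python
# Base codes as commonly published for the KS0108 (verify with the data sheet).
DISPLAY_OFF = 0x3E
DISPLAY_ON = 0x3F
SET_Y_ADDRESS = 0x40   # OR with the column, 0..63
SET_PAGE = 0xB8        # OR with the page (X address), 0..7
SET_START_LINE = 0xC0  # OR with the display start line, 0..63


def _checked(value, upper, name):
    if not 0 <= value <= upper:
        raise ValueError(f"{name} must be in 0..{upper}")
    return value


def set_y_address(column):
    return SET_Y_ADDRESS | _checked(column, 63, "column")


def set_page(page):
    return SET_PAGE | _checked(page, 7, "page")


def set_start_line(line):
    return SET_START_LINE | _checked(line, 63, "line")


def init_commands():
    """One plausible start-up order: display on, then reset the three addresses."""
    return [DISPLAY_ON, set_start_line(0), set_page(0), set_y_address(0)]
```

Each byte would be written with D/I low (instruction) to the controller selected by CS1 or CS2, in the same way as a command word for the character LCD.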
How to light up pixels selectively?
Figure 3.33a shows one column of the left half of the display. There are 8 pixels
in one column of a page, and there are eight such pages, one below the other. The pixels
of the complete column are therefore numbered P0 to P63.
Figure 3.33b shows the eight pixels (named P0 to P7) corresponding to Page 0 of
Figure 3.33a. Assume we want to light up P0, P1 and P7 of this column. The data for this
is sent as a byte. Written in the order P0 to P7, it is 1100 0001, i.e., 0xC1. (This takes
P0 as the most significant bit; if the module maps P0 to DB0, the bit order is reversed and
the byte becomes 0x83, so the bit order should be checked in the data sheet.) Now for the
next page (pixels P8 to P15), if all the pixels except P8 and P15 are to be lit, the data
is 0111 1110, i.e., 0x7E. This is continued for all the eight pages (one column of the display).
Thus, we write data (in bytes) for all the pixels of one column, and then go on to the
next column. This is done first for the left half of the display, and then for the right half.
The display will then appear to start from the left and move to the right. We see that
for any pattern, we have to generate a bit map and load it into the display RAM of the
LCD controller.
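Generating the bit map for one column can be sketched in Python as follows. The names `pack_column` and `locate_pixel` are hypothetical. The flag `p0_is_msb` is an assumption made explicit: with `True` the bytes match the values worked out in the text (P0 as the most significant bit), while `False` maps P0 to DB0; which one applies depends on the module.

```python
def pack_column(pixels, p0_is_msb=True):
    """Pack 64 on/off pixels (P0..P63) into 8 page bytes, page 0 first."""
    if len(pixels) != 64:
        raise ValueError("a column has 64 pixels")
    page_bytes = []
    for page in range(8):
        value = 0
        for i in range(8):
            if pixels[page * 8 + i]:
                bit = 7 - i if p0_is_msb else i
                value |= 1 << bit
        page_bytes.append(value)
    return page_bytes


def locate_pixel(x, y):
    """Map a pixel of the 128 x 64 display to (controller, page, column, row in page)."""
    if not (0 <= x < 128 and 0 <= y < 64):
        raise ValueError("pixel outside the 128 x 64 display")
    controller = 1 if x < 64 else 2   # left half: IC1, right half: IC2
    return controller, y // 8, x % 64, y % 8


# The example of the text: P0, P1, P7 lit; then P9 to P14 lit.
column = [0] * 64
for p in (0, 1, 7):
    column[p] = 1
for p in range(9, 15):
    column[p] = 1
first_two = pack_column(column)[:2]
```

Here `first_two` holds the two bytes of the worked example, and `locate_pixel(70, 10)` places that pixel in the right half (controller 2), page 1, column 6.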
Connecting a Graphical LCD to an 8051 MCU
Figure 3.34 shows the connections between a graphical LCD and the 8051. Note that the
connections are similar to those of Figure 3.28, but here there are two extra connections, for CS1 and CS2.
Figure 3.33a | Pixels in one column of the GLCD (P0 to P63)
Figure 3.33b | Pixels of Page 0 (P0 to P7)
D/I is the pin which selects between data and commands (similar to pin RS in
Figure 3.28). Pins 19 and 20 of the LCD module are for backlighting, and are not shown
in Figure 3.34.
Note Appendix H contains the program for interfacing this graphic LCD to PSoC3.
3.3.2 | Motors
Motors are used for rotational motion, which can be converted to linear motion when
the application calls for it. In embedded systems, the rating (voltage, current, torque, etc.)
of the motor to be used depends on the application. For example, we might use a motor
in a home security system to get a door opened or closed. This might require a heavy duty
motor, depending on the weight of the door. Another very common application is hobby
robotics, where vehicle movement, arm movement, etc. are required. The rating of the
motor to be used depends on the size of the robotic vehicle and its activity. The motors
used by hobbyists for robotics are usually small and rated at 6 to 12 V supply.
MCUs are used in motor circuits only for ‘controlling’ the motor. Motors may be
made to rotate clockwise/anti-clockwise at different rpms, or it may be that a movement by
a small angle alone is needed. In all these cases, the MCU can be programmed to generate
the driving logic for motor movement. But motors cannot be driven directly by an MCU
because the current output from an MCU is relatively small. So there should be
arrangements to get higher driving currents, and additional circuitry is usually necessary. We will
examine all such aspects for two types of motors: stepper motors and DC motors.
3.3.2.1 | Stepper Motors
Introduction to Stepper Motors A stepper motor is an electromechanical device
which converts electrical pulses into discrete mechanical movements. When electrical
pulses are applied to it, the shaft of the motor rotates in steps and this type of movement
gives the motor its name.
Principle of Operation Stepper motors operate differently from normal DC motors.
A DC motor rotates continuously when voltage is applied to its terminals. Stepper
motors have multiple toothed electromagnets arranged around a central gear-shaped
Figure 3.34 | Connections of a GLCD to 8051 (Port 1 and pins P2.0 to P2.4 of the 8051 connect to the data lines and to E, R/W, D/I, CS1 and CS2 of the GLCD)
piece of iron (see Figure 3.35). The electromagnets are energized by an external control
circuit which sends pulses to the motor.
To turn the motor shaft, one of the electromagnets is given power first, which makes
the gear’s teeth magnetically attracted to the electromagnet’s teeth. When one tooth of
the gear is thus aligned to the energized electromagnet (Electromagnet ‘B’ and tooth
‘6’ are aligned in Figure 3.35), the others are slightly offset from the corresponding
electromagnets. When the next electromagnet is turned on and the first is turned off, the gear
rotates slightly to align with the next one, and from there the process is repeated. Each
of those slight rotations is called a step, with an integral number of steps making a full
rotation. In this way, the motor can be turned by a precise angle.
The rotation of the motor is related to the sequence of the input pulses:
i) The order in which a particular sequence is applied, decides the direction of rotation
(clockwise or anti-clockwise).
ii) The speed of stepping depends on the frequency of the pulses applied, i.e., higher
the frequency, faster the stepping motion.
We can use stepper motors for movement which needs to be finely controlled. The
fine control is obtained because these motors move in steps, and the steps can be quite
small in size. For example, one step can be 2 degrees and, for one complete (360 degree)
rotation, 180 steps are obviously needed. For one such step, the motor needs to get a
‘pulse’ from a control circuit. In order to obtain a 90 degree rotation for such a motor,
we must write a program to supply only 45 pulses to it.
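The arithmetic above can be captured in two small helper functions. The names are hypothetical, and the pulse-rate formula simply follows from the number of steps per revolution; it is a sketch, not a motor-specific calculation.

```python
def pulses_for_angle(angle_deg, step_angle_deg):
    """Number of pulses needed to rotate by angle_deg with the given step angle."""
    steps = angle_deg / step_angle_deg
    if abs(steps - round(steps)) > 1e-9:
        raise ValueError("angle is not a whole number of steps")
    return round(steps)


def pulse_rate_hz(rpm, step_angle_deg):
    """Pulse frequency needed for a given speed in revolutions per minute."""
    steps_per_rev = 360.0 / step_angle_deg
    return steps_per_rev * rpm / 60.0
```

With a 2 degree step, `pulses_for_angle(360, 2)` gives 180 and `pulses_for_angle(90, 2)` gives 45, as in the text; one revolution per second (60 rpm) then needs 180 pulses per second.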
This type of stepping motion can be used to advantage when it is necessary to control
aspects such as rotation angle, speed, position and synchronism. As such, stepper motors
are used in applications such as printers, plotters, high end office equipment, hard disk drives,
medical equipment, fax machines, and automotive and industrial applications where
precise and controlled rotation is required. Robotics is another area where they are used for
precise and controlled motion.
Figure 3.35 | Figure showing the operating principle of a stepper motor (electromagnets A to H, rotor teeth 1 to 6, angle marked 15°)
Driving a Stepper Motor
Full Step Drive (two phases on) This is the usual method for full step driving of the
motor. Both phases are always ON. The motor will thus have the full rated torque. This is
achieved by repeatedly applying the sequence of ones and zeros shown in Table 3.5.
(Note the presence of two ‘1’s in each row of the table, corresponding to
‘two phases’ being ON at any time.) Reversing the order in which the sequence is applied
gives anti-clockwise rotation.
In short, for clockwise rotation, the sequence to be applied repeatedly is 09H, 0CH,
06H, 03H. For anti-clockwise rotation, it is 03H, 06H, 0CH, 09H. If you read about stepper
motors from any other source, you might see another ‘sequence’
being suggested. Neither that nor this is wrong. The ‘sequence numbers’ just depend on
the way we have named the phase windings. There are four wires available at the output
of any stepper motor winding, and they are named A, Ā, B and B̄ corresponding to
Figure 3.36.
Wave Drive In this drive method, only a single phase is activated at a time. It has the
same number of steps as the full step drive, but the motor will have significantly less
than its rated torque. The sequence is 8, 4, 2, 1 for clockwise and 1, 2, 4, 8 for anti-clockwise
rotation. Table 3.6 gives the driving sequence.
Half Stepping When half stepping, the drive alternates between two phases ON and a
single phase ON. This increases the angular resolution, but the motor also has less torque
at the half step positions (where only a single phase is on). The advantage of half stepping
is that the drive electronics need not change to support it. The step angle is half that of
the previous two cases, so the stepping resolution is doubled. For anti-clockwise
rotation, the order of the sequence should be reversed. Table 3.7 gives the driving
sequence.
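The three drive sequences described above (Tables 3.5 to 3.7) are closely related, which a short Python sketch makes visible. The function names are hypothetical; the codes themselves are the ones given in the text, with the leftmost table column as the most significant bit.

```python
FULL_STEP = [0x09, 0x0C, 0x06, 0x03]   # two phases ON (Table 3.5)
WAVE = [0x08, 0x04, 0x02, 0x01]        # one phase ON (Table 3.6)


def half_step_sequence():
    """Interleave the full step and wave codes to get the half step sequence."""
    sequence = []
    for full_code, wave_code in zip(FULL_STEP, WAVE):
        sequence += [full_code, wave_code]
    return sequence


def phases_on(code):
    """Number of windings energized by a 4-bit code."""
    return bin(code).count("1")


def anti_clockwise(sequence):
    """The same codes applied in the reverse order."""
    return list(reversed(sequence))
```

`half_step_sequence()` returns 9, 8, C, 4, 6, 2, 3, 1 (hex), which is exactly the order of the rows of the half stepping table.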
Figure 3.36 | Stepper motor windings (leads COM, A, Ā, B and B̄)
Table 3.5 | Driving Sequence for a Stepper Motor: Full Step Drive (Clockwise)
Step No.   A   B   Ā   B̄
1          1   0   0   1
2          1   1   0   0
3          0   1   1   0
4          0   0   1   1
Using an 8051 to Interface a Stepper Motor
We can run a motor using a sequence generated by an MCU, say the 8051. The motor,
however, cannot be driven directly from its port pins, because the motor requires
much more current than the MCU can supply. (The exact current requirement
depends on the specifications of the particular motor being used.) As such, current
drivers are needed between the 8051 port lines and the leads of the motor. Transistors
with high current capability (e.g. Darlington pair or power transistors) can be used.
Besides this, there are special motor driving ICs available. One such IC is the ULN
2003 driving IC, whose pin diagram is shown in Figure 3.37. This IC contains an array
of seven Darlington pair transistors. Figure 3.38 shows the 8051 MCU generating a
sequence for energizing a stepper motor, with the IC ULN 2003 being used to raise
the current level. Four pins of Port 1 have been used for sending the driving sequence
to the motor.
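How the sequence reaches the four port pins can be modelled on a host computer as below. This is a sketch, not 8051 code: `step_port_values` is a hypothetical name, the motor is assumed to sit on P1.0 to P1.3 (the low nibble of Port 1, as in Figure 3.38), and which bit drives which coil is an assumption.

```python
import itertools

FULL_STEP = [0x09, 0x0C, 0x06, 0x03]   # clockwise order, from the text


def step_port_values(port_value, n_steps, clockwise=True):
    """Return the successive Port 1 values for n_steps steps.

    Only the low nibble (P1.0 to P1.3) changes; the high nibble is preserved
    so that other devices on P1.4 to P1.7 are not disturbed.
    """
    sequence = FULL_STEP if clockwise else FULL_STEP[::-1]
    values = []
    for code in itertools.islice(itertools.cycle(sequence), n_steps):
        port_value = (port_value & 0xF0) | code
        values.append(port_value)
        # Real hardware: write port_value to P1, then delay to set the step rate.
    return values
```

The delay between two writes sets the pulse frequency and therefore the speed of stepping.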
Other Issues Regarding Stepper Motors
An important issue to take care of when using stepper motors is that a back emf may
be produced when the coils are de-energized. This can damage
the circuits producing the sequence, and hence diodes are connected to suppress these
spikes. Such diodes are called by various names: flywheel, flyback, freewheeling or
snubber diodes (Section 3.3.4.2). When discrete Darlington pair transistors are used for
Table 3.6 | Driving Sequence for a Stepper Motor: Wave Drive (Clockwise)
Step No.   A   B   Ā   B̄
1          1   0   0   0
2          0   1   0   0
3          0   0   1   0
4          0   0   0   1
Table 3.7 | Driving Sequence for a Stepper Motor: Half Stepping (Clockwise)
Step No.   A   B   Ā   B̄
1          1   0   0   1
2          1   0   0   0
3          1   1   0   0
4          0   1   0   0
5          0   1   1   0
6          0   0   1   0
7          0   0   1   1
8          0   0   0   1
producing the high current required to drive the motors, diodes are connected to suppress
this back emf. Such diodes are also built into the motor driving IC ULN 2003.
Note the connection at pin 9, which is the common point of the freewheeling diodes of
all the seven Darlington transistors inside the chip.
3.3.2.2 | DC Motors
This is a type of motor which operates on direct current and is very commonly used in
embedded systems when continuous movement is needed. The movement may be made
‘controlled’, in the sense that the speed and direction can be changed as per the
requirements of the application. Robotics is an area where DC motors are widely used, but this
is not the only application. Almost any type of movement can be achieved with DC
motors.
The DC motor has two basic parts: the rotating part, called the armature, and
the stationary part, called the stator, which includes coils of wire called the field coils.
Figure 3.37 | Functional pin diagram of the driver, IC ULN2003 (pins 1 to 7: IN 1 to IN 7; pin 8: GND; pin 9: common free wheeling diodes; pins 16 down to 10: OUT 1 to OUT 7)
Figure 3.38 | Connections between 8051 and the stepper motor through a current driver (P1.0 to P1.3 drive IN 1 to IN 4 of the ULN 2003; OUT 1 to OUT 4 go to the stepper motor coils)
The armature is made of coils of wire wrapped around the core, and the core has an
extended shaft that rotates on bearings. The ends of each coil of wire on the armature are
terminated at one end of the armature. The termination points are called the commutator,
and this is where the brushes make electrical contact to bring electrical current from
the stationary part to the rotating part of the machine. Figure 3.39 is a photograph of a
light weight DC motor.
Characteristics of DC Motors
DC motors are non-polarized; this means that their power supply voltage can be reversed.
The characteristics of a DC motor that we use in applications are as follows:
Speed Varies with Applied Voltage This feature is important for running a motor at
different speeds. This can be done by increasing or decreasing the power supply voltage.
But when we use electronic control, PWM (pulse width modulation) is the method for
varying motor speed.
The method is to apply a pulse train to the power terminals of the motor. The average
voltage obtained at the terminals is then proportional to the duty cycle of the pulse
train, and the speed of rotation (rpm) of the motor is in turn approximately proportional
to this average voltage. Thus, as the duty cycle is increased, the motor rpm increases and
vice versa. When the supply is applied continuously (100% duty cycle), the motor runs at
100% of its power rating (at no load). As the duty cycle reduces,
the speed and the power reduce. Figure 3.40 shows pulse trains of various duty cycles.
When it is necessary to do speed control of DC motors for embedded applications,
an MCU can be made to generate the PWM waveform based on some criterion, or
depending on sensor output values. Many MCUs have a PWM unit as an integrated
peripheral; the user just needs to use a few registers to specify the pulse repetition time
(T) and the duty cycle. The 8051 does not have a PWM unit, but such a waveform can
be generated easily by a simple program.
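The relation between duty cycle, average voltage and the ON/OFF times of a software-generated waveform can be sketched as follows. The function names are hypothetical, the duty cycle is expressed as a fraction between 0 and 1, and `software_pwm` only models the pin as a list of samples; on an 8051 the same idea would use timed delays or a timer.

```python
def average_voltage(v_supply, duty_cycle):
    """Average terminal voltage of an ideal pulse train."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty cycle must be between 0 and 1")
    return v_supply * duty_cycle


def pwm_times(period_us, duty_cycle):
    """Return (on_time, off_time) in microseconds for one period T."""
    on_time = round(period_us * duty_cycle)
    return on_time, period_us - on_time


def software_pwm(period_ticks, on_ticks, n_periods):
    """Model a bit-banged PWM pin as a list of 0/1 samples, one per tick."""
    samples = []
    for _ in range(n_periods):
        samples += [1] * on_ticks + [0] * (period_ticks - on_ticks)
    return samples


wave = software_pwm(period_ticks=4, on_ticks=1, n_periods=2)
```

A 12 V supply at 25% duty cycle gives an average of 3 V; in the sampled model, `sum(wave) / len(wave)` recovers the same 0.25 duty cycle.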
Torque Varies with Current The torque of a motor is the rotary force produced on its
output shaft. Torque increases with increased current, which means that it increases with
increase in power supply voltage.
Reversal of Polarity of the Supply Voltage Causes Reversal of Direction of Rotation
This aspect is very important in many applications, especially in robotics, when the motor
Figure 3.39 | Photograph of a small DC motor
needs to reverse its direction of rotation. For example, a robotic vehicle will have to change
from forward motion to reverse motion when an obstacle comes in its path. To do this
dynamically, some sort of controlling switch is necessary, and this is available in the form
of the H bridge.
3.3.2.3 | H Bridge
The H bridge is so named because it has four switching elements at the limbs of the H,
and the motor forms the cross bar. Figure 3.41 shows the ‘idea’ of the H bridge. There are
two switches at the top (left and right) and two more switches at the bottom. They are
named S1, S2, S3 and S4.
When the motor is not expected to rotate, all the switches are kept open.
When switches S1 and S4 alone are closed, the motor rotates in the clockwise direction;
with switches S2 and S3 alone closed, the rotation is anti-clockwise. When
the top two switches and/or the bottom two switches are closed, the motor gets short
circuited, and such a situation should not be allowed.
The valid states of the switches are shown in Table 3.8, assuming that activation by a
‘1’ corresponds to a switch closure.
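The switch states just described can be checked with a small lookup in Python. The function name and the returned strings are hypothetical; every combination that the text does not list as valid is simply reported as not allowed.

```python
def h_bridge_state(s1, s2, s3, s4):
    """Classify a switch combination of the H bridge (1 = closed, 0 = open)."""
    closed = (s1, s2, s3, s4)
    if closed == (0, 0, 0, 0):
        return "stopped"            # all switches open
    if closed == (1, 0, 0, 1):
        return "clockwise"          # S1 and S4 alone closed
    if closed == (0, 1, 1, 0):
        return "anti-clockwise"     # S2 and S3 alone closed
    return "not allowed"


def is_safe(s1, s2, s3, s4):
    """True only for the combinations the text treats as valid."""
    return h_bridge_state(s1, s2, s3, s4) != "not allowed"
```

A control program can call such a check before changing the switch outputs, so that an invalid combination never reaches the bridge.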
Figure 3.40 | PWM waveforms at various duty cycles (25%, 50%, 75% and 100%; amplitude V, period T)
Figure 3.41 | The principle of operation of an H bridge (switches S1 to S4, supply V, motor M)
What is the mechanism to realize an H bridge?
It can be done using any device that has switching properties, such as relays, transistors,
MOSFETs, etc. But if you are trying to run a DC motor from an MCU output, the best
bet would be a motor driving IC with an H bridge. The IC L293D is a dual H bridge IC
which also provides sufficient current to drive a small motor.
The L293D IC, whose pin configuration is shown in Figure 3.42, is a dual H bridge
motor driver. With one such IC, two DC motors can be driven, each in both the
clockwise and counter-clockwise directions.
For applications that don’t need reversal of direction, the four output pins can be
used for driving four separate motors. This IC is rated for an output current of 600 mA
and a peak output current of 1.2 A per channel. Moreover, for protection of the circuit
against back EMF, snubber/flywheel diodes (Section 3.3.4.2) are included within the IC.
A simple schematic for interfacing a DC motor using the L293D is shown in Figure 3.43.
Refer to Table 3.9 for the status of A and B.
Table 3.8 | Switch status for direction of motor rotation
S1   S2   S3   S4   Motor Rotation
1    0    0    1    Clockwise
0    1    1    0    Anti-clockwise
Figure 3.42 | Pins of the L293D motor driver
(Pins 1 to 8: Chip Enable 1, Input 1, Output 1, GND, GND, Output 2, Input 2, VC.
Pins 9 to 16: Chip Enable 2, Input 3, Output 3, GND, GND, Output 4, Input 4, VSS.)
Table 3.9 | Action Performed for the Four Combinations of A and B
A   B   Action Performed
0   0   Motor is in the stop/brake condition
0   1   Motor rotates anti-clockwise
1   0   Motor rotates clockwise
1   1   Motor is in the stop/brake condition
Three pins of the chip are needed as inputs from the MCU. The enable pin has to
be set, and the pins A and B are controlled by the port lines P1.0 and P1.1, which
generate the necessary logic to get the motor to rotate as required.
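The logic of Table 3.9 can be written as a lookup, as in the following Python sketch. The names are hypothetical. Two points are assumptions rather than statements of the text: that A is driven by P1.0 and B by P1.1 (in that order), and that with the enable pin low the outputs are simply switched off.

```python
# (A, B) -> action, as in Table 3.9.
ACTIONS = {
    (0, 0): "stop/brake",
    (0, 1): "anti-clockwise",
    (1, 0): "clockwise",
    (1, 1): "stop/brake",
}


def l293d_action(enable, a, b):
    """Action of the motor for given enable, A and B levels."""
    if not enable:
        return "disabled"   # assumption: outputs are off when enable is low
    return ACTIONS[(a, b)]


def p1_bits(direction):
    """Value for P1.1 and P1.0 (B, A), assuming A = P1.0 and B = P1.1."""
    for (a, b), action in ACTIONS.items():
        if action == direction:
            return (b << 1) | a
    raise ValueError(direction)
```

For example, `p1_bits("clockwise")` gives 1 (P1.0 high, P1.1 low) and `p1_bits("anti-clockwise")` gives 2.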
Embedded applications may use either stepper or DC motors. But when it comes to
speed, weight, size and cost, DC motors are usually preferred over stepper motors.
3.3.3 | Optocouplers/Opto Isolators
An optocoupler is another device that helps protect the control circuit from the back
EMF produced in a motor circuit. An optocoupler or optical isolator is a device that uses
an optical transmission path to transfer a signal between elements of a circuit, while keeping them
electrically isolated. An optocoupler is essentially a combination of two distinct devices:
an optical transmitter, typically a gallium arsenide LED (light-emitting diode), and an
optical receiver such as a phototransistor or light-triggered diac. The two are separated
by a transparent [insulator] barrier which blocks any electrical current flow between the
two, but does allow the passage of light.
The basic idea is shown in Figure 3.44. Usually the electrical connections to the
LED section are brought out to the pins on one side of the package and those for the
Figure 3.43 | Connections between an 8051, an H bridge and a DC motor (P1.0 and P1.1 of the 8051 drive inputs A and B of the L293D; the motor is connected between Out 1 and Out 2; supplies of 5V and 12V)
Figure 3.44 | Operating principle of an optocoupler (LED terminals A and K; phototransistor terminals B, E and C)
phototransistor or diac to the other side, to physically separate them as much as possible.
Figure 3.45 shows the K817 optocoupler used in a DC motor interfacing circuit,
along with a driving IC ULN 2003.
3.3.4 | Relays
Switches which can be turned ON and OFF without manual control constitute a ‘relay’.
Relays can be used to connect and disconnect points in a circuit by using
electrical control logic. Relays allow one circuit to switch a second circuit which can be
completely separate from the first. For example, a low voltage circuit can use a relay to
switch a high voltage (say, 230V AC mains) circuit. There is no electrical connection
inside the relay between the two circuits; the link is magnetic and mechanical. Such relays
are ‘electromechanical’. Figure 3.46 is a photograph of such a relay. Newer relays are of the
semiconductor type.
Figure 3.45 | A DC motor connected to 8051 through a current driver and an optocoupler
Figure 3.46 | Photograph of a relay
3.3.4.1 | Electromechanical Relays
This is the earliest type of relay, and is still in use today. We will discuss the types with
specifications that enable them to be used in embedded systems, where voltages and currents
are low. By using these low voltages and currents, relays can make and break connections
in higher voltage circuits.
Electromechanical relays convert a magnetic flux created by an electrical signal into
a mechanical force. This mechanical force causes switches to ‘close’ or ‘open’.
See Figure 3.47. There is a coil wound around a permeable iron core. One end of
the iron core is fixed (and called the yoke), while the other end is free and spring
loaded (or gravity operated in certain cases), and is called the armature. The armature
is hinged to the yoke and mechanically linked to one or more sets of moving contacts.
When the coil is de-energized, there is an air gap in the magnetic circuit. In this
condition, one of the two sets of contacts in the relay is closed, and the other set is open.
In Figure 3.47, two sets of electrical contacts are shown. The contacts which are
open when the coil is de-energized are called ‘Normally open (NO)’ contacts; similarly,
there are ‘Normally closed (NC)’ contacts as well. This is shown in Figure 3.48.
Figure 3.47 | The principle of operation of a relay
[Figure labels: energizing coil, coil supply voltage, yoke, air gap, magnetic flux, armature, pivot, movable contact, fixed contacts, electrical connections (normally open, common, normally closed)]
Figure 3.48 | NC and NO contacts
[Figure labels: contact tips, normally open, normally closed]

In a relay, there can be many NC and many NO contacts. When a contact closes,
it behaves like a short circuit (zero resistance) and when open, like an open circuit
(infinite resistance). This is the ideal condition, which may not hold exactly in
practice.
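The behaviour of the NO and NC contacts can be summarized in a small software model. The following is only an illustrative sketch of an ideal relay with one NO and one NC contact; the class `Relay` and the function `contact_resistance` are hypothetical names introduced here, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Relay:
    """Ideal relay with one normally open (NO) and one normally closed (NC) contact."""
    energized: bool = False

    def no_closed(self) -> bool:
        # The NO contact is closed only while the coil is energized.
        return self.energized

    def nc_closed(self) -> bool:
        # The NC contact is closed only while the coil is de-energized.
        return not self.energized


def contact_resistance(closed: bool) -> float:
    # Ideal contact: zero resistance when closed, infinite resistance when open.
    return 0.0 if closed else float("inf")


relay = Relay()
states = []
for coil_on in (False, True):
    relay.energized = coil_on
    states.append((coil_on, relay.no_closed(), relay.nc_closed()))
```

Each entry of `states` holds the coil state followed by the NO and NC contact states, so the two contacts are always in opposite conditions.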
3.3.4.2 | Contact Types
The energization and de-energization of a relay can open or close one or more switch
contacts. Each ‘contact’ may be referred to as a ‘pole’. Many of these contacts or poles can
be connected or ‘thrown’ together and this gives rise to the description of the contact
types as being SPST (single pole single throw), SPDT (single pole double throw),
DPST (double pole single throw) and DPDT (double pole double throw). See Figure 3.49.
For many applications, we connect a relay to the output side of a BJT or FET
circuit, as in Figure 3.50. Here a diode is seen to be connected across the relay. This is the
‘flywheel diode’ which saves the BJT or FET from getting damaged when there
is a back emf from the coil. This back emf is produced when the current in the coil is
turned off. The magnetic flux collapses within the coil and results in a back emf which may
be very high in comparison with the switching voltages used for the active device in the
circuit. The diode is connected with such a polarity that the back emf makes it conduct,
letting the energy stored in the coil dissipate and thus preventing damage to the BJT/FET.
The diode is called the flywheel/freewheeling/snubber diode. Motors are another type of
inductive load which require such a flywheel diode.
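To get a feel for why the back emf can be large, the standard inductor relations (stored energy of one half L I squared, and voltage of L times the rate of change of current) can be evaluated for sample numbers. This is a rough, idealised sketch: the coil inductance, coil current and turn-off time below are assumed values chosen for illustration, not data for any particular relay, and the current is assumed to fall linearly to zero.

```python
def coil_energy_joules(inductance_h: float, current_a: float) -> float:
    # Energy stored in the magnetic field of the coil: E = 0.5 * L * I^2
    return 0.5 * inductance_h * current_a ** 2


def back_emf_volts(inductance_h: float, current_a: float, turn_off_time_s: float) -> float:
    # Magnitude of L * di/dt, assuming the current falls linearly to zero.
    return inductance_h * current_a / turn_off_time_s


# Assumed illustrative values (hypothetical, not from a data sheet).
L_COIL = 0.1      # coil inductance in henry
I_COIL = 0.05     # coil current in ampere
T_OFF = 10e-6     # time taken by the transistor to switch off, in seconds

energy = coil_energy_joules(L_COIL, I_COIL)
spike = back_emf_volts(L_COIL, I_COIL, T_OFF)
print(f"stored energy = {energy * 1e6:.0f} uJ, unclamped back emf = {spike:.0f} V")
```

With these assumed numbers the unclamped estimate is in the hundreds of volts, far above the 12 V supply. With the flywheel diode in place, the voltage across the coil is instead limited to roughly one diode drop.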
Figure 3.49 | Contact types
[Figure labels: SPST (NO), SPST (NC), DPST (NO), SPDT, DPDT]
Figure 3.50 | A simple relay circuit
[Figure labels: 8051 pin P1.0, 470 Ω, BC547, relay, 12 V]

Relays in embedded systems are connected to output pins of microcontrollers. In
many cases, they are used to control high power circuits, that is, high voltage/current, by
using the lower voltage levels in the MCU. See Figure 3.51 which shows a relay used
to control the switching of a bulb rated for 230 V. In this case, the contacts of the relay
should be able to withstand the high voltage across them and the current passing through them.
3.3.4.3 | Solid State Relays
Electromechanical relays have problems such as large physical size and mechanical parts
which tend to wear out with time. Solid state relays are the solution to these problems.
They perform the same switching function, but use semiconductor switching elements
like thyristors, triacs, diodes and transistors instead of moving contacts. Most of them have
optoisolators incorporated internally, so as to isolate the input and output. Thus, they are
smaller, noiseless, bounce free and more reliable.
Conclusion
With this, we come to the end of this chapter. This chapter has covered many commonly
used sensors and actuators. Each of them is explained at a level that should give a
student sufficient confidence to attempt practical projects.
Once a few particular sensors and actuators are understood well, choosing the right
one for a particular application becomes easy.
Sensors and actuators are necessary in all embedded systems.
Sensors convert physical quantities to analog voltages.
Some popular temperature sensors are thermistors and thermocouples.
Figure 3.51 | A relay circuit switching a high voltage circuit
[Figure labels: 8051 pin P1.1, R, 12 V, relay contacts switching 230 V AC]

Commonly used light sensors are LDRs and photo junction devices.
Sharp (the Company) has developed a good set of proximity and range sensors.
Optical encoders are fixed to the wheels of moving vehicles to measure velocity.
Humidity sensors are used in homes, factories and automobiles.
A to D converters have data and control interfaces.
The data interface of ADCs may be parallel or serial.
Displays are a necessity in many embedded systems.
LEDs may be used singly or as seven segment ones.
Seven segment LEDs may operate in the static or dynamic modes.
OLEDs are a new kind of display device.
LCDs are very popular and are available as Character LCD or Graphical LCD modules.
Motors which are used in embedded systems are stepper motors or DC motors.
Relays are either electromechanical or solid state types.
QUESTIONS
1. What is the role of sensors in an embedded system? Name the sensors used in two
popular embedded systems.
2. How does an LDR work?
3. Why are infrared LEDs preferred to ordinary LEDs in sensing circuits?
4. How can an ordinary transistor be converted to a photo transistor?
5. How does a proximity sensor work?
6. Explain the principle by which range is calculated by the Sharp range sensor.
7. What is the operating mechanism of an optical encoder? Where is such an encoder used?
8. List two applications of humidity sensors.
9. Distinguish between the terms ‘control interface’ and ‘data interface’ for an ADC.
10. Why are serial ADCs becoming more popular than the parallel ones?
11. How can a static seven segment LED system be made ‘dynamic’?
12. In what ways is a Graphical LCD superior to a Character LCD?
13. What is the role of drivers in the Graphical LCD module?
14. What role does an optocoupler have in a motor driving circuit?
15. Why are current drivers needed in motor circuits driven by MCUs?
16. What is the need for an H bridge in a DC motor circuit?
17. How does an H bridge work?
18. What do you understand by the terms NO and NC contacts of a relay?
19. Discuss two simple applications of relays.
20. What are the merits of a semiconductor relay over an electromagnetic relay?

EXERCISES
1. Refer to the data sheet of the TSOP (SM0038) light sensor, and find out how it is used.
Draw typical circuits in which it is used.
2. Review the use of Sharp range sensors in hobby robotics.
3. List the names of five serial ADCs and discuss the data interface of each of them.
4. Are OLEDs used in consumer electronic products? Find out the details.
5. Design a robotic platform with wheels driven by
i) stepper motors and
ii) DC motors.
Compare the merits, demerits and differences in the use of these two kinds of
motors for this application.

Introduction
Embedded systems form an integral part of our modern day life. As mentioned in
Chapter 1, the applications are very vast, ranging from small toys to huge machines. This
chapter focuses on a few important applications of embedded systems.
4.1 | Mobile Phone
It is an undeniable fact that the most depended upon and the most sought after gadget of
modern times is the mobile phone. Initially arriving as a wireless mode of data (mainly
voice) communication, and then eventually transforming into a handheld device which
now encompasses the uses of a camera, music player, note book, GPS, map, etc. (the list
grows longer each day), the mobile phone (or cell phone) has come a long way. There was
a time when mobile phones only had monochromatic LCD displays and provided only
basic functions like making calls and texting (SMSes). Today, mobile phones are much
more advanced, with bright and colourful displays and touch screens, and are capable of
many video and audio processing applications. This metamorphosis has been mainly due
to the evolution of modern day embedded processors which have powerful processing
capabilities at very high clock speeds. The advent of fine sensors, very good displays
and A to D converters with high resolutions has also played a role in gearing up the mobile
revolution. Needless to say, the modern day cell phone has changed lives and lifestyles.
Figure 4.1 shows a number of cell phones of different grades and brands.
4 | Examples of Embedded Systems
• The working of a few embedded products
• The working principle and components of a mobile phone
• The role of embedded systems in automobiles
• How an RFID system works
• Bio-medical applications of embedded systems
• What a wireless sensor network is
• The principle and working of the brain machine interface
Chapter-opening image: An ARM9 development board.
Nithin Gopinath, Analog Design Engineer, Texas Instruments, Bangalore.

4.1.1 | Block Diagram
Let’s look at the basic block diagram of a mobile phone. Figure 4.2 shows the functional
block diagram of a mobile phone.
The fundamental blocks of a mobile phone are the following:
i) Central processing block consisting of the Microcontroller Unit (MCU), Digital
Signal Processing Unit (DSP) and memory. This forms the ‘embedded processor’ of
a mobile phone.
ii) RF transceiver consisting of RF modem, transmitter, synthesizer and receiver. It
uses a low noise amplifier for boosting signals and an RF antenna for transmitting/
receiving.
iii) Power management block takes care of all power-related issues (analog and digital
power) of the device.
iv) Analog baseband block is responsible for dealing with the analog signals coming
from the microphone and going to the speaker.
Figure 4.1 | A photograph of different cell phones (Courtesy: www.geek.com)
Figure 4.2 | The functional block diagram of a mobile phone
[Figure labels: MCU, DSP, memory, analog baseband, power management, RF transceiver, peripheral interface]
v) Display consists of LCDs, LEDs and other such hardware.
vi) Peripheral interface consists of USB port, audio jack, etc. which facilitate the
connection of peripherals to the phone.
We are mainly concerned with the ‘embedded processing’ part of the phone. Hence,
let us focus on the first block.
The central processing block consists of ARM core based general purpose
processors (single or dual core) and some co-processors which take charge of signal processing.
The same SoC handles the MCU and DSP applications. The processors used in today’s
smart phones are powerful enough to run operating systems like Linux, Android, etc.
The modern day phone runs almost as many applications as a PC, if not more.
Some of the popular processors used in mobile phones are Texas Instruments’
OMAP (refer to Section 15.6), Samsung’s Exynos, Qualcomm’s Snapdragon, Apple Inc.’s
Ax series and Nvidia’s Tegra platform. Intel is also keen to enter the mobile/tablet market
with its Medfield platform. Figure 4.3 shows photographs of some of these processors.
Figure 4.3 | Photographs of some of the popular SoCs used in mobile phones

4.1.2 | The Cellular Concept
In a cellular network, the area over which the network has coverage is divided into
sub-units called ‘cells’. These sub-units are modelled as having a particular shape: hexagon,
square, circle, etc. Usually, hexagonal cells are used (see Figure 4.4). Each cell has a
centrally located base station which is responsible for handling calls. A group of base
stations is managed by a base station controller (BSC), and the BSCs are in turn connected
to a mobile switching centre (MSC).
4.1.3 | Multiple Access
In many forms of communication (wireless or wired), the same channel is shared by many
transmitters for transmitting their messages. In such cases, access to the channel has
to be given to the different transmitters in an organized manner. Some of the types of
shared medium accesses are as listed below.
i) FDMA (Frequency Division Multiple Access): Here, each user is allocated a single
distinct central frequency with a certain bandwidth around it (for all the time).
ii) TDMA (Time Division Multiple Access): In this case, each user is allocated a
single time slot during which the user can use all the available bandwidth.
Today’s cellular networks use a combination of FDMA and TDMA. Usually, there
are multiple antennae in a base station. Each of the antennae will have transceiver units.
Each of the transceiver units is assigned a frequency. Each of the frequency channels
is divided into time slots. When a person makes a call using a cell phone, the call is
handled by the nearest base station. The call is assigned a time slot on one of the
frequencies on one of the antennae (if all are directional antennae, then the call will be assigned
to that antenna which covers the user’s location). The call eventually gets connected to
the PSTN (Public Switched Telephone Network) or to another base station depending
on whether the call is made to a landline telephone or another mobile phone.
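The combined FDMA/TDMA allocation described above can be sketched as a pool of (frequency, time slot) pairs from which each new call takes one. This is a simplified illustration only: the class `BaseStation`, the frequency names and the slot count are hypothetical, and antenna selection is ignored.

```python
from typing import Dict, List, Optional, Tuple

Channel = Tuple[str, int]  # (frequency name, time slot index)


class BaseStation:
    """Toy base station that hands out one (frequency, slot) pair per call."""

    def __init__(self, frequencies: List[str], slots_per_frequency: int) -> None:
        # FDMA splits the band into frequencies; TDMA splits each one into slots.
        self.free: List[Channel] = [
            (f, s) for f in frequencies for s in range(slots_per_frequency)
        ]
        self.calls: Dict[str, Channel] = {}

    def start_call(self, caller: str) -> Optional[Channel]:
        if not self.free:
            return None  # every slot on every frequency is busy: call is blocked
        channel = self.free.pop(0)
        self.calls[caller] = channel
        return channel

    def end_call(self, caller: str) -> None:
        # The released channel becomes available to later callers.
        self.free.append(self.calls.pop(caller))


station = BaseStation(["F1", "F2"], slots_per_frequency=2)
assigned = [station.start_call(name) for name in ("a", "b", "c", "d", "e")]
station.end_call("a")
retry = station.start_call("e")
```

Here the fifth caller is blocked because two frequencies with two slots each can carry only four simultaneous calls; once caller "a" hangs up, the freed slot is reused.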
4.1.4 | Frequency Re-use
As the power of the transmitted signal varies inversely with the square of the distance
from the signal source, the signal transmitted by a base station dies out after a certain
distance. This allows the same frequency to be used by two base stations which are separated
by at least a certain minimum distance. This is called frequency re-use. This may not be
possible for two adjacent base stations as there might be locations where signals from
both base stations arrive with power greater than the minimum power required. This type
of interference is called co-channel interference. This is prevented by utilizing frequency
re-use only after a certain minimum distance. Hence, with only a few frequencies, a
service provider can provide coverage to a large area using frequency re-use. Figure 4.5
illustrates this concept.
Figure 4.4 | A hexagonal cell cluster
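The inverse-square argument for frequency re-use can be checked numerically. The sketch below is a simplification under stated assumptions: power is modelled simply as transmit power divided by distance squared (all constants dropped), and the transmit power, cell radius and minimum usable power are hypothetical numbers in arbitrary units.

```python
def received_power(tx_power: float, distance_m: float) -> float:
    # Inverse-square law as stated in the text, with constant factors dropped.
    return tx_power / distance_m ** 2


def can_reuse(tx_power: float, separation_m: float,
              cell_radius_m: float, min_power: float) -> bool:
    # Worst case: a user at the edge of one cell, on the side facing the
    # other base station that uses the same frequency.
    interfering = received_power(tx_power, separation_m - cell_radius_m)
    return interfering < min_power


TX_POWER = 10.0      # assumed transmit power (arbitrary units)
CELL_RADIUS = 1000   # assumed cell radius in metres
MIN_POWER = 1e-6     # assumed minimum usable received power (arbitrary units)

adjacent_ok = can_reuse(TX_POWER, 2000, CELL_RADIUS, MIN_POWER)
distant_ok = can_reuse(TX_POWER, 6000, CELL_RADIUS, MIN_POWER)
```

With these assumed values, a base station 2000 m away still delivers usable power at the cell edge, so sharing the frequency would cause co-channel interference, whereas a station 6000 m away does not.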
4.1.5 | Handoff (Handover)
Consider a person who is talking on the phone while travelling. If he moves out
of the cell (area of coverage of the base station handling the call), one would expect the
call to be dropped by the base station. The cellular network architecture uses a technique
called handover (handoff) to take care of this situation and to ensure that the call is not
dropped. The base station near the previous position of the person will communicate
with the base station near the new position of the person, and will effectively hand over
the call to the new base station.
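A handover decision can be sketched as choosing the base station with the strongest signal at the phone. The text does not say how the decision is made, so the rule below is an assumption: the signal levels are hypothetical, and the 3 dB hysteresis margin (used so that the call does not bounce between two stations of nearly equal strength) is an added illustrative detail.

```python
from typing import Dict


def choose_station(current: str, signal_dbm: Dict[str, float],
                   hysteresis_db: float = 3.0) -> str:
    """Return the base station that should serve the call after this measurement."""
    best = max(signal_dbm, key=signal_dbm.get)
    # Hand over only if another station is clearly stronger than the current one.
    if best != current and signal_dbm[best] >= signal_dbm[current] + hysteresis_db:
        return best
    return current


# Hypothetical measurements as the user travels from cell A towards cell B.
path = [
    {"A": -60.0, "B": -90.0},
    {"A": -75.0, "B": -74.0},
    {"A": -85.0, "B": -70.0},
]

serving = "A"
history = []
for measurement in path:
    serving = choose_station(serving, measurement)
    history.append(serving)
```

In the second measurement station B is only marginally stronger, so the call stays with A; by the third measurement B is clearly stronger and the call is handed over.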
4.1.6 | Spread Spectrum Techniques
In spread spectrum techniques, signals which are limited to a certain bandwidth are
spread across a wider bandwidth. This is advantageous for two main reasons:
i) Narrowband interference: Consider the case when there is some interference
around a certain frequency. If the frequencies allotted to all base stations are fixed,
then the base station allotted this particular frequency will always have the same
high value of interference. However, if the frequencies are switched from time to
time, this narrowband interference gets averaged over all base stations and hence,
each base station receives only a small amount of interference. This is known as
the spread spectrum technique. This also helps to take care of jamming noise.
ii) Eavesdropping: The frequency switching takes place according to a pseudo-
random sequence known only to the sender and the receiver. Hence, ‘third person
eavesdropping’ on the communication is very difficult without knowing the pseudo-
random sequence. This helps in making the communication secure.
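The idea of switching frequencies according to a shared pseudo-random sequence can be sketched with a seeded random number generator: two parties who know the seed derive the same hop pattern. This is only an illustration; the channel names and seed values are hypothetical, and Python's `random` module is not cryptographically secure, so a real system would use a different generator.

```python
import random
from typing import List

CHANNELS = ["F1", "F2", "F3", "F4"]  # assumed set of available frequencies


def hop_sequence(shared_seed: int, n_hops: int) -> List[str]:
    # The same seed always yields the same sequence of channels.
    rng = random.Random(shared_seed)
    return [rng.choice(CHANNELS) for _ in range(n_hops)]


sender_hops = hop_sequence(shared_seed=2012, n_hops=8)
receiver_hops = hop_sequence(shared_seed=2012, n_hops=8)

# A third party guessing a different seed derives its own, unrelated sequence.
eavesdropper_hops = hop_sequence(shared_seed=999, n_hops=8)
```

Because the sender and the receiver share the seed, they change frequency in step with each other.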
Figure 4.5 | Illustration of the concept of frequency re-use
[Figure labels: cells marked with frequencies F1, F2, F3 and F4]

Some of the different types of spread spectrum are as follows:
i) Frequency hopping spread spectrum: Frequency hopping is a technique in which
the frequency assigned to a base station is changed frequently among a set of
frequencies. The switching of frequencies is done according to a pseudo-random
sequence known only to the sender and the receiver.
ii) Direct sequence spread spectrum: In this technique, the signal is multiplied with
a pseudo-random sequence (chip sequence) which is known only to the sender and
receiver. At the receiver side, the correlation of the received signal with the pseudo-
random sequence is determined to demodulate the signal. This process is known
as de-spreading and it relies heavily on the orthogonality of the pseudo-random
sequences. This is the basis for CDMA (Code Division Multiple Access).
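Spreading and de-spreading by correlation can be demonstrated with two users who transmit at the same time. This is a minimal sketch under simplifying assumptions: the two 8-chip codes below are fixed orthogonal (Walsh-type) sequences chosen for illustration rather than true pseudo-random sequences, the channel is noiseless, and the function names are hypothetical.

```python
from typing import List

# Two orthogonal chip sequences: their element-wise product sums to zero.
CODE_A = [1, 1, 1, 1, -1, -1, -1, -1]
CODE_B = [1, -1, 1, -1, 1, -1, 1, -1]


def spread(bits: List[int], code: List[int]) -> List[int]:
    # Map bit 1 to +1 and bit 0 to -1, then multiply by the whole chip sequence.
    chips: List[int] = []
    for bit in bits:
        symbol = 1 if bit else -1
        chips.extend(symbol * c for c in code)
    return chips


def despread(chips: List[int], code: List[int]) -> List[int]:
    # Correlate each block of received chips with the code; the sign gives the bit.
    n = len(code)
    bits: List[int] = []
    for i in range(0, len(chips), n):
        correlation = sum(x * c for x, c in zip(chips[i:i + n], code))
        bits.append(1 if correlation > 0 else 0)
    return bits


bits_a = [1, 0, 1]
bits_b = [0, 0, 1]

# Both users transmit together, so the channel carries the sum of their chips.
channel = [a + b for a, b in zip(spread(bits_a, CODE_A), spread(bits_b, CODE_B))]

recovered_a = despread(channel, CODE_A)
recovered_b = despread(channel, CODE_B)
```

Because the codes are orthogonal, correlating the combined signal with one user's code cancels the other user's contribution, which is why each receiver recovers only its own bits.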
4.1.7 | Set Up and Maintenance
The setup and maintenance of mobile networks is an extensive and complicated process.
Usually, mobile service providers outsource the engineering aspects of their mobile
networks to telecommunication companies like Ericsson, Nokia-Siemens, Huawei, etc.
The latter companies are responsible for setting up BSCs, MSCs, user databases, etc.
Figure 4.6 shows the photograph of a mobile tower.
Figure 4.6 | Photograph of a mobile tower
4.1.8 | Conclusion
The modern mobile phone is no longer ‘just a phone’. Many smartphones have the
functionality of what may be thought of as ‘handheld computers’, with multimedia,
computing and data processing capabilities. A multitude of applications run on them,
and many important activities, banking for instance, now rely on mobile phones and
mobile communications being safe and secure. The future will definitely bring many more
‘solutions’ to be handled by mobile phones.
4.2 | Automotive Electronics
Electronic systems in automobiles have been available for quite some time. The earliest
electronic equipment to be used in automobiles were AM radios and 2-way radios.
With the invention of the integrated circuit (IC), there have been major developments
in the field of automotive electronics. Electronic systems today are involved in almost
every aspect of the car. They are used for safety purposes, better driving comforts, fuel
usage efficiency, infotainment purposes, and so on. The industry is constantly driven by
consumer demand and hence, a great deal of advancement is still happening in this field.
The electronic systems inside an automobile are controlled by electronic control
units (ECU). There are about 50 to 100 ECUs in a modern car. These ECUs form the
‘brain centres’ of the electronic system of the automobile. The ECU mainly consists of a
microprocessor and necessary software stored on memory (EEPROM or flash). As any
faulty reading or delayed reading can potentially harm the passengers of the automobile,
the sensor inputs are processed in real time. As a result of this, the OS assigns hard
deadlines to the applications. Communication between devices is done using the
controller area network (CAN) bus (Section 5.4.1). Let us discuss some of the important
electronically controlled units in a modern automobile.
4.2.1 | Electronic Fuel Injection (EFI)
Electronic fuel injection was one of the major developments that took place in the latter
part of the twentieth century. EFI is a mechanism for regulating the amount of fuel
supplied to the engine for combustion. Prior to the development of electronic mechanisms,
this was done by a carburetor and a floating mechanism. The floating mechanism would
regulate the amount of fuel supplied to the engine. The carburetor would evaporate the
fuel so that it mixes with the air for combustion. The EFI mechanism, on the other hand,
measures the fuel amount so that the exact amount is given to the engine. It then forcibly
pumps the fuel through a tiny nozzle under high pressure to atomize it. Hence, it gives
only the proper amount of fuel needed for the engine.
EFI has great advantages over the carburetor mechanism:
• EFI prevents the flooding of the engine by not allowing too much fuel into the engine.
• EFI is more efficient and emission-friendly.
• With EFI, similar hardware can be used for diesel and petrol engines, which is not
the case for carburetors.
4.2.2 | Anti-lock Braking System (ABS)
Anti-lock braking system (ABS) is one of the most popular systems in automotive
electronics. ABS is a mechanism to prevent skidding due to locking up of the wheels.
Consider a situation in which a vehicle is moving at a high speed and is suddenly
confronted by an obstacle in its path. In a moment of panic, the driver applies full
brakes and turns the steering wheel with the intention of turning the vehicle away from
the obstacle. As the driver has applied full brakes, the wheels are locked (i.e. they are not
turning) and hence, they start skidding on the road. As a result of this, the vehicle does
not change direction (even though the driver has turned the steering wheel) but skids in
the direction of the obstacle. If the wheels had not locked up, the vehicle would have
changed direction and this collision could have been prevented. This is the basic concept
behind ABS.
An ABS consists of an ECU, wheel sensors and hydraulic brakes. Figure 4.7 shows
the important parts of ABS. The wheel sensors inform the ECU about the speed of the
wheels. The speed of the wheels relative to each other is important and hence their
differentials are analyzed. Whenever a wheel is moving significantly slower or faster than
the other wheels (this is an indicator of wheel lock up), the ABS applies hydraulic brakes
appropriately. If one wheel is moving faster than the other wheels, the ABS increases the
brakes applied on this wheel and if one wheel is moving slower than the other wheels,
the ABS decreases the brakes applied on this wheel. After a few accelerations and
decelerations, all the wheels will have similar speeds. The latest ABS mechanisms make
use of brake pulsing in which the wheels are subject to a sequence of quick alternate
accelerations and decelerations.
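The wheel-speed comparison described above can be sketched as one step of a control loop. This is an illustrative simplification, not an actual ABS algorithm: the use of the median wheel speed as the reference, the 15 percent tolerance, the 10 percent pressure step and the function name `abs_adjust` are all assumptions made for this example.

```python
from statistics import median
from typing import List


def abs_adjust(wheel_speeds: List[float], pressures: List[int],
               tolerance: float = 0.15, step: int = 10) -> List[int]:
    """Return new brake pressures (in percent) after one control step."""
    reference = median(wheel_speeds)  # assumed stand-in for 'the other wheels'
    adjusted = []
    for speed, pressure in zip(wheel_speeds, pressures):
        if speed < reference * (1 - tolerance):
            # Wheel is much slower than the rest (locking up): release the brake.
            pressure = max(0, pressure - step)
        elif speed > reference * (1 + tolerance):
            # Wheel is much faster than the rest: brake it harder.
            pressure = min(100, pressure + step)
        adjusted.append(pressure)
    return adjusted


# Hypothetical readings: the fourth wheel is close to locking.
locking = abs_adjust([20.0, 20.0, 20.0, 5.0], [80, 80, 80, 80])

# Hypothetical readings: the fourth wheel is spinning faster than the others.
spinning = abs_adjust([10.0, 10.0, 10.0, 25.0], [50, 50, 50, 50])
```

Calling the function repeatedly on fresh sensor readings gives the alternating release and re-application of the brakes that the text refers to as brake pulsing.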
The main advantage of ABS is that it prevents wheel lock-up and hence, gives the
driver steering control, even after application of the full brake. This reduces the risk of
accidents. ABS also has the added advantage of a shorter braking distance as compared to
vehicles without ABS. Braking distance is the distance a vehicle travels after application
of brakes before coming to a stop. This also depends on road conditions. On snow-
covered roads, vehicles without ABS have a shorter braking distance than the ones with
ABS. However, ABS still gives the driver better control over the car on such roads.
Figure 4.7 | Components of an ABS
[Figure labels: brake disc, wheel sensors, gear pulser, control module, modulator unit]

4.2.3 | Electronic Stability Control
Electronic stability control (ESC) is very similar to ABS. It also provides the driver better
control of the vehicle. It has wheel sensors and braking mechanisms similar to ABS. It
also has a steering wheel orientation sensor and a gyroscopic sensor. The gyroscopic
sensor detects the directional changes of the vehicle. The ECU of the ESC checks if
the vehicle is moving in the direction the driver intends to move, that is, whether the
gyroscopic sensor senses the vehicle in the direction of the steering wheel orientation or
not. If they agree with each other, the ESC does not intervene. If they do not match, it
indicates that the driver does not have control over the vehicle at the moment. Then the
ESC applies brakes appropriately on the wheels so that the vehicle moves in the
direction intended by the driver. ESC can work on any surface and is mostly seen in high-end
vehicles, trucks, etc.
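The comparison between the intended and the actual direction can be sketched as a comparison of turning (yaw) rates. This is a highly simplified illustration with several assumptions: positive yaw rate means turning left, the tolerance is an arbitrary value, and braking the wheels on one side is taken as the way to turn the vehicle towards that side. Real ESC systems choose individual wheels in a more elaborate way.

```python
from typing import Optional


def esc_brake_side(intended_yaw_rate: float, measured_yaw_rate: float,
                   tolerance: float = 2.0) -> Optional[str]:
    """Return which side to brake ('left' or 'right'), or None if no action is needed.

    intended_yaw_rate: turning rate implied by the steering wheel (deg/s, left positive)
    measured_yaw_rate: turning rate reported by the gyroscopic sensor (deg/s)
    """
    error = intended_yaw_rate - measured_yaw_rate
    if abs(error) <= tolerance:
        return None  # sensor readings agree: the ESC does not intervene
    # Braking one side of the vehicle turns it towards that side.
    return "left" if error > 0 else "right"


agree = esc_brake_side(10.0, 9.0)        # vehicle follows the steering input
understeer = esc_brake_side(10.0, 4.0)   # turning left less than intended
oversteer = esc_brake_side(10.0, 18.0)   # turning left more than intended
```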
4.2.4 | Adaptive Cruise Control
Cruise control is a mechanism in which the vehicle speed is maintained at a constant
value without the driver having to constantly keep his foot on the accelerator pedal.
Whenever brakes are manually applied, the cruise control mechanism gives control of
the throttle to the driver. This mechanism can be very useful for drivers while driving
on highways with low traffic levels. Ordinary cruise control is not very useful when
there is a significant amount of traffic. Modern cruise control mechanisms take into
account other vehicles in front of them. These mechanisms are called adaptive cruise
control mechanisms.
Adaptive cruise control units consist of an ECU and a RADAR headway sensor
(usually placed somewhere in the front of the car, behind the grill). When there is no vehicle
in front, adaptive cruise control behaves like normal cruise control. When there is a
vehicle in front, adaptive cruise control comes into play. With the help of the RADAR
sensor, the ECU measures the speed and distance of the vehicle ahead and accordingly
accelerates or decelerates. This will ensure that the vehicle is at a safe distance from the
vehicle in front.
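The decision made by the adaptive cruise control ECU can be sketched as follows. This is a minimal illustration under assumptions: the safe distance is taken as the distance covered in two seconds at the current speed (a common rule of thumb, not stated in the text), speeds are in metres per second, and the function name `acc_action` is hypothetical.

```python
from typing import Optional, Tuple


def acc_action(set_speed: float, own_speed: float,
               lead: Optional[Tuple[float, float]] = None,
               time_gap_s: float = 2.0) -> str:
    """Return 'accelerate', 'decelerate' or 'hold'.

    lead is None when the RADAR sees no vehicle ahead, otherwise a tuple
    (distance to the vehicle ahead in m, speed of the vehicle ahead in m/s).
    """
    target = set_speed  # no vehicle ahead: behave like ordinary cruise control
    if lead is not None:
        distance, lead_speed = lead
        if distance < own_speed * time_gap_s:
            return "decelerate"  # closer than the assumed safe distance
        target = min(set_speed, lead_speed)  # never go faster than the vehicle ahead
    if own_speed < target:
        return "accelerate"
    if own_speed > target:
        return "decelerate"
    return "hold"


free_road = acc_action(set_speed=30.0, own_speed=25.0)
too_close = acc_action(set_speed=30.0, own_speed=30.0, lead=(40.0, 25.0))
following = acc_action(set_speed=30.0, own_speed=25.0, lead=(100.0, 25.0))
```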
Another type of cruise control called ‘hill descent control’ helps the driver in
descending hilly roads in difficult terrain. The control mechanism applies brakes
appropriately to drive downhill without the driver having to apply brakes.
4.2.5 | Airbag Deployment
Airbags are one of the earliest safety mechanisms used in vehicles. Airbags are to be
deployed whenever there is an automobile collision. They prevent the passengers in the
vehicle from hitting the inside of the car (window, dashboard, etc.). The airbag
mechanism makes use of speed sensors (accelerometers, gyroscopic sensors, wheel speed
sensors, etc.) and impact sensors. Whenever there is a sudden decrease in the speed of
the vehicle in a very small amount of time, that is, a large deceleration, it
indicates a collision. The impact sensor may also report a collision. In either case, the ECU
actuates the ignition of a gas generator propellant. The gas produced, usually nitrogen,
inflates a nylon fabric bag. The airbags are thus deployed, preventing injury. Figure 4.8
illustrates this.
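The two deployment conditions, a very large deceleration or a report from the impact sensor, can be sketched as a simple check over successive speed samples. The threshold of 200 m/s² (roughly 20 g), the sampling period and the sample values are assumptions made for illustration only.

```python
from typing import List


def should_deploy(speeds_mps: List[float], sample_period_s: float,
                  impact_detected: bool, decel_threshold: float = 200.0) -> bool:
    """Decide whether to fire the airbag from recent speed samples (m/s)."""
    if impact_detected:
        return True  # the impact sensor alone is enough
    for previous, current in zip(speeds_mps, speeds_mps[1:]):
        deceleration = (previous - current) / sample_period_s  # in m/s^2
        if deceleration >= decel_threshold:
            return True  # speed dropped too quickly to be ordinary braking
    return False


# Hypothetical samples taken every 10 ms.
hard_braking = should_deploy([20.0, 19.9, 19.8], 0.01, impact_detected=False)
collision = should_deploy([20.0, 17.0, 14.0], 0.01, impact_detected=False)
side_impact = should_deploy([20.0, 20.0, 20.0], 0.01, impact_detected=True)
```

Hard braking at about 10 m/s² stays well below the assumed threshold, so only the collision-like samples and the impact report trigger deployment.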

4.2.6 | Automotive Navigation Systems
Automotive navigation systems are electronic systems which provide a whole lot of
useful information to the driver. They can provide real-time information about routes,
traffic congestion, etc. These systems consist of an ECU, gyroscopic sensor and GPS.
A touch-screen user interface is also provided. GPS (Global Positioning System) makes
use of satellites in space to determine the position (in terms of latitude and longitude)
of the vehicle. The gyroscopic sensor is used to detect the direction in which the vehicle
has turned. These systems also provide details about driving restrictions (one-ways, etc.),
signboards, warning boards, nearby fuel stations, car wash shops, workshops, restaurants,
etc. Figure 4.9 is a photograph of the details seen on the screen of a navigation system
developed by Sony Corporation.
Figure 4.8 | Airbag deployment
Figure 4.9 | The screen of a navigation system developed by Sony

The current position of the vehicle is known to the ECU using the GPS. The
software has knowledge of all routes. When the user enters the destination details into
the system using the user interface, the ECU finds the shortest route from the current
point to the destination. The driver can choose any route and not necessarily the shortest
one. Once the driver has selected the route, the system shows the driver the path to be taken
and also informs the driver when and where a turn is to be taken. If the driver takes a
wrong turn, the system recalculates and finds the shortest route from that point to the
destination.
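Finding the shortest route, and recalculating it after a wrong turn, can be illustrated with Dijkstra's algorithm on a small road map. The text does not say which algorithm a navigation system uses, so this choice is an assumption; the road map, place names and distances are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

RoadMap = Dict[str, Dict[str, int]]  # place -> {neighbouring place: distance in km}


def shortest_route(roads: RoadMap, start: str,
                   goal: str) -> Optional[Tuple[int, List[str]]]:
    """Dijkstra's algorithm: return (total distance, list of places) or None."""
    queue: List[Tuple[int, str, List[str]]] = [(0, start, [start])]
    visited = set()
    while queue:
        distance, place, path = heapq.heappop(queue)
        if place == goal:
            return distance, path
        if place in visited:
            continue
        visited.add(place)
        for neighbour, length in roads.get(place, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (distance + length, neighbour, path + [neighbour]))
    return None  # the destination cannot be reached


# Hypothetical one-way road network.
ROADS: RoadMap = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}

planned = shortest_route(ROADS, "A", "D")

# If the driver ends up at B instead of C, the route is recomputed from B.
recalculated = shortest_route(ROADS, "B", "D")
```

The planned route goes through C and B because that path (8 km) is shorter than going directly through B (9 km) or from C straight to D (10 km).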
The modern versions of the navigation systems also show traffic details. These details
are usually communicated in real time to the navigation system using Bluetooth or similar
wireless communication protocols.
4.2.7 | Conclusion
With this, we end our discussion of automotive electronics. Note that we have included
only a few items of control in this. But keep in mind that automotive electronics has
come to occupy a major share of the embedded systems market. Consider the case of
cars alone. All cars manufactured these days come with automatic fuel injection and
related electronic control in the engine. Besides safety and navigation features, there is
the ‘infotainment’ system also in a car. High end cars have electronic controls for almost
everything, right from door locking to engine starting key. The number of processors
used in an S class Mercedes Benz is likely to be more than 100. These processor circuits
are interconnected by buses like CAN, LIN, MOST and FlexRay. In short, the
automobile industry is one of the biggest users of embedded systems.
4.3 | Radio Frequency Identification (RFID)
Radio frequency identification is a method of identifying, tracking or verifying objects/
persons with the help of electronic tags/labels attached to them. These electronic tags
are capable of receiving and transmitting radio frequency signals and are called RFID tags or
RFID labels. The ID information from these tags is obtained using RFID readers.
Figure 4.10 | Working of an RFID system
[Figure labels: computer system, reader, antenna, transponder]
Refer to Figure 4.10 to see how an RFID system works. The system consists of a
reader, tag and antennae. Each RFID tag has a transmitter and receiver, together called a
transponder because it ‘transmits and responds’. The RFID reader transmits a signal to
interrogate the tag (or tags). The tag receives the signal and responds by transmitting its
identity information. The system then takes some action based on this information. The action
may be simply to display a number/name on a handheld device, or it may be that the
information is passed on to a system for authentication/counting of persons, items, etc.
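The interrogate-and-respond exchange can be modelled in a few lines. This is only a conceptual sketch with no radio involved: the classes `Tag` and `Reader`, the `"INTERROGATE"` message and the tag IDs are all hypothetical names introduced for illustration.

```python
from typing import Dict, List, Optional, Tuple


class Tag:
    """Transponder: stays silent until it receives the interrogation signal."""

    def __init__(self, tag_id: str) -> None:
        self.tag_id = tag_id

    def respond(self, signal: str) -> Optional[str]:
        return self.tag_id if signal == "INTERROGATE" else None


class Reader:
    """Interrogates every tag in range and looks up what each ID stands for."""

    def __init__(self, known_ids: Dict[str, str]) -> None:
        self.known_ids = known_ids  # tag ID -> item or person name

    def read(self, tags_in_range: List[Tag]) -> List[Tuple[Optional[str], str]]:
        results = []
        for tag in tags_in_range:
            tag_id = tag.respond("INTERROGATE")
            results.append((tag_id, self.known_ids.get(tag_id, "unknown")))
        return results


reader = Reader({"0001": "crate of books", "0002": "staff card"})
seen = reader.read([Tag("0001"), Tag("0002"), Tag("9999")])
```

The last tag answers with an ID that the system does not recognize, which is the kind of case an authentication system would reject.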
Most systems utilize three general bands:
i) Low frequency from 125 kHz to 134 kHz
ii) High frequency at 13.56 MHz
iii) Ultra high frequency at 860 MHz to 930 MHz
There may be some difference in frequency use due to the regulations of a
particular country. The frequency band used also influences the practical size of the antenna
and the transmission power that can be used.
4.3.1 | RFID Architecture
4.3.1.1 | Tag/Label
RFID tags belong to a class of radio devices known as ‘transponders’. A transponder is
a combination of a transmitter and a receiver, which is designed to receive a specific radio
signal and automatically transmit a reply. In its simplest implementation, the
transponder listens for a radio signal, and sends a signal of its own as a reply. More complicated
systems may send a letter or a digit back to the source, or even send multiple strings of
letters and digits. Finally, advanced systems may do a calculation or verification process
and include encryption to prevent eavesdroppers from obtaining the information being
transmitted. Transponders used in RFID are commonly known as tags, chips or labels. As
a general rule, an RFID tag consists of the following items:
i) Encoding/Decoding circuitry
ii) Memory
iii) Antenna
iv) Power supply
v) Communication control
A tag can take almost any physical form, like cards, keys, rods, etc., as desired, to
perform the required function. The design may be influenced by the type of antenna,
which in turn may be dependent on the frequency used for the system. Some typical tag
types are card type, key fobs, etc. See Figure 4.11 which shows tags of various shapes.
Figure 4.11 | RFID tags of different physical shapes and sizes

RFID tags have emerged as a major complement to the conventionally used barcode
system. RFID provides many advantages over the barcode system such as:
• RFID readers can read multiple tags at a time, whereas a barcode scanner can read only one
item at a time.
• RFID tags can be read even if the tagged object is inside a box or a cover, or in other
situations when there is no line of sight. A barcode must be scanned in
the line of sight.
As electronic tags need to be working only when they are read, many RFID tags do
not depend on a battery for power. Thus, there are two types of tags: active and passive.
The active tags need a battery for power, while the passive tags don’t; they make use
of the power of the signal transmitted by the RFID reader. This prevents unnecessary
power wastage during idle hours.
4.3.1.2 | Reader
The next component needed in the system is the RFID reader, that is, the interrogator.
The reader is also a transceiver, that is, transmitter plus receiver. It is named ‘reader’
by virtue of its function of querying the tag and reading data from it. Handheld systems
have the reader and its antenna together as one unit, while larger systems usually separate
antennas from the reader.
A reader usually contains a system interface such as an RS-232 serial port or
Ethernet jack, cryptographic encoding or decoding circuitry, a power supply or a battery
and a communications control circuit. This depends on the requirement of the RFID
system.
The reader retrieves the information from the RFID tag. The reader may be self-contained
and may store information internally, but it may also be part of a localized
system such as an authentication system or a large local area network (LAN) or a wide
area network (WAN). Readers that send data to a LAN do so by using a data interface
such as Ethernet or serial RS-232.
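The query-and-reply exchange between a reader and its tags can be sketched in a few lines of Python. This is only an illustrative model: the class names, the `WHO_IS_THERE` query string and the tag identities are hypothetical, and the sketch ignores radio effects such as collisions between tags that answer at the same time.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    """A transponder: it stays silent until it hears a query."""
    tag_id: str
    in_range: bool = True

    def respond(self, query):
        # Reply with the stored identity only to a recognised query.
        if self.in_range and query == "WHO_IS_THERE":
            return self.tag_id
        return None


class Reader:
    """An interrogator that queries tags and stores the replies internally."""

    def __init__(self):
        self.seen = []

    def inventory(self, tags):
        for tag in tags:
            reply = tag.respond("WHO_IS_THERE")
            if reply is not None and reply not in self.seen:
                self.seen.append(reply)
        return list(self.seen)


tags = [Tag("CARD-001"), Tag("FOB-002"), Tag("CARD-003", in_range=False)]
reader = Reader()
print(reader.inventory(tags))  # identities of the tags that are in range
```

A real reader would pass the collected identities on to an authentication or counting system over its data interface.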
4.3.1.3 | Applications
RFID tags have a huge variety of applications. Some of them are as follows:
i) Supply-chain product tracking
ii) Season parking tickets
iii) Toll booths
iv) Transportation services
v) Public transit (metro, railway, etc.)
vi) Hospitals and museums
vii) Person authentication
4.4 | Wireless Sensor Networks (WISENET)
As the name suggests, a wireless sensor network (WISENET) is a wireless network
consisting of sensors meant for monitoring environmental conditions like temperature,
pressure, humidity, level of gases, etc. They can also be used for auditory or visual
monitoring. This information is then transmitted to the main database. These networks
are of great use in monitoring environmental conditions in places which are not easily
accessible and in regions of difficult terrain.
A WISENET node consists of a wireless transceiver, antenna, microcontroller and
battery. It is connected to a sensor or a collection of sensors. Refer to Figure 4.12 which
shows a number of sensor nodes within a sensor field. The nodes communicate with each
other and the information finally reaches the sink node (final node). The sink node is
connected to the wide area network (WAN), through which the information reaches the
end user. WISENET uses
wireless communication protocols like ZigBee and IEEE 802.15.4 (Section 5.5.2). The
algorithms used in WISENET software are designed to minimize power consumption.
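How a reading hops from node to node until it reaches the sink can be illustrated with a small graph search. The sketch below assumes a hypothetical five-node topology and picks the path with the fewest hops using breadth-first search; real sensor networks use routing protocols that also weigh link quality and remaining battery energy.

```python
from collections import deque


def route_to_sink(links, source, sink):
    """Return the fewest-hops path from source to sink, or None if unreachable."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            path = []
            while node is not None:  # walk back along the recorded hops
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in links.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical sensor field: each node lists the nodes within radio range.
links = {
    "n1": ["n2", "n3"],
    "n2": ["n1", "n4"],
    "n3": ["n1", "n4"],
    "n4": ["n2", "n3", "sink"],
    "sink": ["n4"],
}
print(route_to_sink(links, "n1", "sink"))
```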
The first operating system software specifically designed for wireless sensor
networks is TinyOS. As the software is mainly responsible only for processing
the sensor readings and transmitting the data packets, it need not be functioning all
the time. Hence, TinyOS makes use of ‘event-driven’ programming. There are event
handlers associated with each task. Whenever an event occurs, the OS invokes the
associated event handler to handle the task.
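The event-driven idea can be sketched as a tiny dispatcher. This is plain Python written for illustration, not the TinyOS API (TinyOS itself is programmed in nesC); the event names and handlers are hypothetical.

```python
class EventLoop:
    """Minimal event dispatcher: handlers run only when an event is pending."""

    def __init__(self):
        self.handlers = {}
        self.pending = []

    def on(self, event_name, handler):
        self.handlers[event_name] = handler

    def post(self, event_name, payload=None):
        self.pending.append((event_name, payload))

    def run(self):
        """Dispatch all queued events, then return (a real node would sleep here)."""
        results = []
        while self.pending:
            name, payload = self.pending.pop(0)
            handler = self.handlers.get(name)
            if handler is not None:
                results.append(handler(payload))
        return results


loop = EventLoop()
loop.on("sensor_ready", lambda reading: f"packet(temp={reading})")
loop.on("radio_done", lambda _: "radio off")
loop.post("sensor_ready", 27.5)
loop.post("radio_done")
print(loop.run())
```

Because nothing runs between events, the processor can stay in a low-power state for most of the time, which is the motivation given above.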
4.4.1 | Applications
i) Vehicle tracking
ii) Energy monitoring
4.5 | Robotics
Robotics is the field of design and development of devices which can perform tasks on
their own or with guidance. The devices which perform these actions are called robots.
Robots are a subset of embedded systems. A robot is a mechanical system which has a
sense of purpose. A typical robot
i) can sense its environment
ii) can manipulate things in its environment
iii) has intelligence embedded in it
iv) has motion or translation in one or more axes
Figure 4.12 | A wireless sensor network (sensor nodes in a sensor field, the sink, the WAN and the end users)

The working of a robot has three main phases:
i) Perception: Obtaining information about the environment/stimulus. This action is
done by the sensor, for example, vision, sound, touch, etc.
ii) Processing: The processor processes the input data from the sensor and generates the
necessary control signals to be sent to the actuators, which execute the necessary action.
iii) Action: Implementation of commands given by the processor. For example, motor,
video, sound, etc.
4.5.1 | Sensors
Some of the sensors used by robots are as follows:
i) Vision: Robotic vision is mainly computer vision captured using a CMOS or CCD
[Charge-coupled Device] array. The pixel information of the image is then processed
by the processor.
ii) Sound: Robots which respond to audio signals have a microphone to gather input
data which is then processed by the processor. Highly efficient algorithms have been
developed for speech recognition applications.
iii) Tactile sensors: Robots which respond to touch make use of tactile sensors for
input data. These sensors have certain impedance measuring devices which help in
detecting tactile information.
4.5.2 | Actuators
Some of the actuators used by robots are as follows:
i) Motors: Motors can be used to do mechanical functions like turning wheels,
pumping water, moving in any direction, etc.
ii) Video and audio: Robots can also provide visual and auditory data using LCD
displays and speakers, respectively.
4.5.3 | Embedded Intelligence
All robots have some degree of intelligence built into them. The intelligence to be embedded
in a robot depends on its intended application as well as on the degree of sophistication
and precision with which it is to perform.
To put it all together, let us say that for a robot, for example a mobile robot, to
perform its intended applications, it needs
i) sensors to sense its environment and to move accordingly
ii) actuators for performing the movement, and the assigned task
iii) a control algorithm for moving and performing the intended task
iv) communication systems (optional, depending on the application)
4.5.4 | Types of Robots
i) Stationary robots: They are stationary as a whole, but have moving parts like a
robotic arm, for example. Such arms may be used for picking up objects and/or
performing similar activities.
ii) Mobile robots have the capability to move around in their environment and are not
fixed to one physical location.
iii) Humanoid robots are those which have an appearance similar to a human body and
can perform actions similar to human beings (e.g., dance and sing).
See Figure 4.13 for the image of two dancing robots developed by Sony Corporation.
Figure 4.14 shows the pictures of two robotic research platforms developed by Nex
Robotics, Mumbai (http://www.nex-robotics.com/).
Figure 4.13 | Dancing robots developed by Sony Corporation
Figure 4.14 | Two robotic research platforms developed by Nex Robotics, Mumbai (a) Robotic platform with
camera and (b) Craboide Robot (Reproduced with permission from M/s Nex Robotics)
4.5.5 | Open Loop and Closed Loop Systems
In open loop systems, the ‘action’ phase is executed based on the ‘perception’ phase but
the ‘perception’ phase is not influenced by the ‘action’ phase. For example, consider a
robot which translates from English to French. It has a microphone as the sensor and a
speaker as the actuator. The robot takes in data in English via the microphone, translates
it using its processor and gives the output in French via the speaker.
When the ‘perception’ phase is also made dependent on the previous ‘action’ phase(s),
the loop becomes closed and hence there is a feedback mechanism in the system. For
example, consider a mobile robot which follows a slow moving red-coloured ball. It has a
video camera as sensor and a motor as actuator to drive its wheels. The robot takes in visual
data via the camera, finds the location of the red ball using the algorithm burned into the
processor and turns the wheels in such a way that the robot keeps following the ball.
The simplest algorithm would be to ensure that the ball is always at the centre of
the image captured by the camera. So, whenever the ball moves away from the centre,
the robot adjusts its position such that the ball is at the centre of the image. Hence, the
current image seen by the robot (current ‘perception’ phase) is dependent on the direction
in which the robot moved in the previous instant (previous ‘action’ phase). This is an
example of a closed loop system with negative feedback.
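The centring algorithm described above can be sketched as a proportional controller. The sketch makes several assumptions that are not in the text: the image is 320 pixels wide, the gain is 0.5, and turning the robot by a given command shifts the ball in the next image by the same number of pixels.

```python
def steering_command(ball_x, image_width, gain=0.5):
    """Proportional control: the command grows with the centring error (in pixels)."""
    error = ball_x - image_width / 2
    return gain * error


def simulate(ball_x, image_width=320, steps=10):
    """Each action changes what the camera sees next, which closes the loop."""
    history = [ball_x]
    for _ in range(steps):
        turn = steering_command(ball_x, image_width)
        # Assumed plant model: turning towards the ball moves it back
        # towards the image centre by the same amount (negative feedback).
        ball_x = ball_x - turn
        history.append(ball_x)
    return history


trace = simulate(ball_x=300.0)
print([round(x, 1) for x in trace[:4]])  # [300.0, 230.0, 195.0, 177.5]
```

With each pass through the loop the error is halved, so the ball settles at the image centre (pixel 160 here). An open loop robot would issue its command once and never check the result.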
4.5.6 | Designing an Autonomous Robotic System
An autonomous robot is a robot which can perform desired tasks in unstructured environments
without continuous human guidance. Different robots can be autonomous
in different ways. An autonomous robot is an assembly of mechanical and electronic
elements with artificial intelligence embedded in it. The mechanical structure of a robot
must be controlled to perform tasks. The control of a robot involves various aspects such
as path planning, pattern recognition, obstacle avoidance, etc.
The strategy for design consists of the following steps:
i) Identify various cost-effective applications
ii) Study various algorithms which can perform this job efficiently
iii) Test the chosen algorithms using modelling techniques
iv) Design the hardware and software
v) Choose the right sensors and actuators
vi) Integrate the whole system and test it
Figure 4.15 shows the picture of an autonomous robot capable of moving in rough
and hostile terrain. Section 19.1 contains a full description of the design of a
vision-controlled autonomous robot.
Figure 4.15 | Photograph of an autonomous robot (Reproduced with permission from
M/s Nex Robotics)

4.6 | Biomedical Applications
Embedded systems are being used in a variety of medical applications. Their scope is
being expanded by the improvements in sensor technology and by the miniaturization
of Analog Front Ends (AFEs) and Analog to Digital Converters (ADCs). A few of the
biomedical applications are listed below:
i) X-Ray: X-ray machines used to be analog devices. The X-rays generated from a
source are projected onto a film sensitive to the radiation, after passing through
the body part to be examined. Earlier, the film had to be developed to obtain
the image. This has now given way to digital X-ray machines in which Flat Panel
Detector (FPD) sensors replace the film. The analog outputs of the sensors are
amplified and fed to ADCs, which convert them into digital codes and pass them on to
the digital modules which process the information and generate an image from it.
ii) MRI (Magnetic Resonance Imaging): Here, nuclei immersed in a strong magnetic field
are disturbed by an RF pulse at their resonant frequency, and the signals they emit
in response are sensed by coils. This information is processed to obtain the
image.
iii) CT (Computed Tomography): This uses X-rays to generate 3D images of the body
by rotating the source-detector pair around the body and taking multiple images at
various angles.
iv) Pulse oximetry: Red and Infra Red (IR) light is passed through the user’s finger
or earlobe and the transmitted light is sensed and processed to obtain information
about oxygen content in the blood and pulse rate.
v) Blood glucose measurement: A drop of blood is placed on a special strip and the
strip is loaded into a device. The device applies various electrical inputs to the strip
and then measures some quantity like total charge passed through the strip or its
resistance to measure the glucose level in the blood.
vi) Pedometer: Miniature devices that can be embedded in special purpose shoes or
carried/worn by the user can count the number of footsteps. This can be used by
athletes to monitor their performance, on the go. Such devices might use acceler-
ometers or pressure sensors to obtain the input signal, which is then filtered and
processed to give the required information.
vii) Wearable medical devices: Compact lightweight systems that can monitor
important health parameters and transmit them for observation are being
developed.
viii) Emergency alert systems: Systems are available that can monitor the vital signs
of a person and alert others in case of a problem. Such systems can also be
manually triggered by the user. This is particularly useful for the elderly and the
infirm, who are prone to dangers like falling or sudden and debilitating conditions
like cardiac arrest.
ix) Wheel chairs: Wheel chairs with great flexibility of movement and control, and
many additional features, are now being manufactured. The complete control of the chair is
done by high end processors with signal processing capabilities, running very
sophisticated algorithms for locomotion. See Figure 4.16 which is a photograph
showing a modern wheel chair.
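As an illustration of the ‘filter and process’ step mentioned for the pedometer in item vi), the following sketch smooths accelerometer samples with a moving average and counts upward crossings of a threshold. The sample values, the 1.2 g threshold and the window length are assumptions chosen for the example; commercial pedometers use more elaborate detection.

```python
def moving_average(samples, window=3):
    """Smooth the signal by averaging each sample with its recent predecessors."""
    smoothed = []
    for i in range(len(samples)):
        start = max(0, i - window + 1)
        chunk = samples[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def count_steps(accel_g, threshold=1.2, window=3):
    """Count upward crossings of the threshold in the smoothed signal."""
    steps = 0
    above = False
    for value in moving_average(accel_g, window):
        if value > threshold and not above:
            steps += 1          # rising edge: one footstep
            above = True
        elif value < threshold:
            above = False       # re-arm the detector for the next step
    return steps


# Hypothetical acceleration magnitudes in g: two bumps, i.e. two footsteps.
samples = [1.0, 1.0, 1.5, 1.8, 1.5, 1.0, 0.9, 1.0, 1.6, 1.9, 1.4, 1.0, 1.0]
print(count_steps(samples))  # 2
```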

4.7 | Brain Machine Interface
Imagine a technology which would enable you to control objects without physically
interacting with them, by just a thought. This popular concept
in science fiction is soon to become a reality. In fact, many such devices have already
translated human thought into prosthetic arm movements, computer cursor movements,
etc. This is realized using what is called a Brain-Machine Interface (also called a
Brain-Computer Interface).
4.7.1 | Block Diagram
A Brain-Machine Interface (BMI) is a communication channel between the human
brain and an external device. It serves as an interpreter or a translator which translates
human thought into corresponding action. Figure 4.17 shows the block diagram of such
a system.
Figure 4.16 | A modern wheel chair
Figure 4.17 | Block diagram of a brain-machine interface (blocks, in order: Read Signals from Brain; Amplifier and Filter; Analyzing and Decoding the Signal; Action at End Device)

4.7.2 | Stages of a BMI
A BMI has the following stages:
4.7.2.1 | Signal Reading Stage
This is in the form of electrodes placed at different places on the scalp of the human
head.
There are mainly two methods adopted for reading signals from the brain: non-invasive
and invasive.
Non-invasive
Electroencephalography (EEG) EEG offers a non-invasive technique for reading
brain activity, that is, reading signals without placing electrodes or any such devices inside
the human body. Hence, no surgery is required to connect a BMI to a person using
EEG. However, only the signals which are obtained at the scalp can be read using EEG.
These signals (originating inside the brain) are heavily attenuated by the skull by the time
they reach the scalp. This makes the signal obtained at the scalp weak and highly
distorted. Hence, we need very high gain amplifiers to boost the signal and also very
accurate filters to remove noise. The difficulty with such a method is that no non-invasive
technique currently exists that approaches the spatial resolution needed to extract the
finest neural details. Figure 4.18 shows a number of electrodes placed on a scalp.
Invasive
Electrocorticography (ECoG) The alternative, ECoG, provides a better option
in terms of spatial resolution. Spatial resolution is a measure of the ability of the reading
stage to differentiate between signals from two very close parts of the brain. This is of
vital importance, considering the fact that different signals (sense, movement, etc.) come
from different points on the brain surface which are often close to each other. As signals
are read from the cortex directly, the attenuation caused by the skull is avoided. However,
ECoG is an invasive approach, that is, the electrodes have to be surgically implanted
inside the person’s head. See Figure 4.19, which shows the signals inside the brain which
are to be extracted; the most precise method is to place electrodes inside the brain.
Figure 4.18 | Non-invasive reading of signal (EEG)

4.7.2.2 | Amplifier and Filter Stage
The signals obtained from the scalp or cortex are weak, and need to be amplified. In
EEG, usually differential amplifiers are used. These amplifiers have gain in the range of
60 dB to 100 dB, that is, a voltage gain of 1,000 V/V to 100,000 V/V. This would
facilitate the strengthening of a signal of μV range to a signal of mV range.
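The decibel figures quoted above follow from the standard relation between voltage gain and decibels, gain (dB) = 20 log10(Vout/Vin). The short check below uses a hypothetical 10 μV input purely as an example value.

```python
import math


def db_to_voltage_gain(gain_db):
    """Convert a gain in decibels to a voltage ratio (V/V)."""
    return 10 ** (gain_db / 20)


def voltage_gain_to_db(gain):
    """Convert a voltage ratio (V/V) to decibels."""
    return 20 * math.log10(gain)


for gain_db in (60, 100):
    gain = db_to_voltage_gain(gain_db)
    out_mv = 10e-6 * gain * 1e3  # assumed 10 uV input, output expressed in mV
    print(f"{gain_db} dB -> {gain:,.0f} V/V; a 10 uV input becomes about {out_mv:g} mV")
```

So 60 dB lifts a signal of a few microvolts into the millivolt range, while 100 dB brings it close to a volt.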
The EEG/ECoG signal obtained is not only a weak one, but is also corrupted with
lots of noise. In order to decode the brain wave, it is very important that the signal be
passed through a filter first and then only analysed. Analysing an EEG signal directly
(even after amplification) is like listening to a bad telephone—there is a lot of static.
Hence, after amplification, the brain waves are passed through a filter whose cut-off
frequency is usually in the 30–70 Hz range, depending on the type of waves expected.
It is very important that the filters used here are very accurate as there are all sorts of
noise in the signal. Sometimes, a notch filter is also used with the notch at the supply
frequency, that is, the frequency at which the supply voltage operates (50 Hz in India
and many other countries). This is done to remove any noise creeping into the system via
the power supply.
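The effect of a notch at the supply frequency can be demonstrated with a standard second-order (biquad) digital notch filter. This is a generic textbook design and not necessarily the filter used in any particular BMI; the 500 Hz sampling rate and the quality factor of 10 are assumptions.

```python
import math


def notch_coefficients(f0, fs, q=10.0):
    """Biquad notch centred at f0 (Hz) for sampling rate fs (Hz)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = [1 / a0, -2 * math.cos(w0) / a0, 1 / a0]
    a = [1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0]
    return b, a


def apply_biquad(samples, b, a):
    """Direct-form I difference equation."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


fs = 500.0
b, a = notch_coefficients(50.0, fs)
t = [n / fs for n in range(1000)]                          # two seconds of samples
mains = [math.sin(2 * math.pi * 50 * ti) for ti in t]      # supply-frequency noise
alpha_wave = [math.sin(2 * math.pi * 10 * ti) for ti in t] # 10 Hz brain wave

# Compare the second half of each output, after the start-up transient has died out.
print("50 Hz residue:", rms(apply_biquad(mains, b, a)[500:]))
print("10 Hz kept   :", rms(apply_biquad(alpha_wave, b, a)[500:]))
```

The 50 Hz component is almost entirely removed, while the 10 Hz component passes with nearly its full RMS value of about 0.707.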
4.7.2.3 | Analysing and Decoding the Signals
Once the brain waves have been obtained, they are provided to the most important part of
the brain-machine interface, that is, the decoding centre. It has the important job of
analysing the signals, finding out what they are meant for and generating the necessary
control signals to realize the corresponding movement/action in the end device.
There are a variety of parameters with the help of which the brain waves are analysed.
Some of them are as follows.
Figure 4.19 | Signals inside the brain which are to be ‘invasively’ read
Location of Electrode
One of the major parameters which help in decoding brain waves is the location on
the brain from where the waves are read. Different parts of the brain have different
functions. Signals received from the visual cortex of the brain are associated with vision,
whereas signals received from the motor cortex of the brain are associated with voluntary
limb movements. Spatial resolution becomes an important concept in this context.
Frequency of the Signal
The frequency of the brain waves tells us the class to which they belong (theta, alpha,
beta, gamma, etc.). This in turn gives us some information about the kind of activity taking
place in the brain. For example, alpha waves (frequency range 8–12 Hz) are obtained
while closing or opening the eyes. When a person is in an alert state, beta waves
(12–30 Hz) are obtained. Similarly, gamma waves (30–100+ Hz) are obtained during
memory match.
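A decoder could map a measured dominant frequency to its class with a simple lookup. The alpha, beta and gamma limits below are the ones quoted above; the 4–8 Hz range for theta is an assumption, since the text does not give it.

```python
# (name, lower limit in Hz, upper limit in Hz); None means no upper limit.
BANDS = [
    ("theta", 4.0, 8.0),    # assumed range, not stated in the text
    ("alpha", 8.0, 12.0),
    ("beta", 12.0, 30.0),
    ("gamma", 30.0, None),  # "30-100+ Hz" in the text
]


def classify_band(freq_hz):
    """Return the name of the brain-wave class containing freq_hz."""
    for name, low, high in BANDS:
        if freq_hz >= low and (high is None or freq_hz < high):
            return name
    return "unclassified"


for f in (6, 10, 20, 40):
    print(f, "Hz ->", classify_band(f))
```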
Strength of the Signal
The strength of the signal obtained is also an important parameter in analysis of brain
waves. The signal power is different for different types of brain activity. For example,
while opening the eye, low signal power alpha waves are obtained, whereas while closing
it, high power signals are obtained. This is one of the easiest methods which can be used
for decoding brain waves.
Consider a robot which goes only forward and backward. Closing and opening of
the eyes can serve as ‘indicator thoughts’ in order to tell the BMI to generate necessary
control signals to make the robot move backward or forward.
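This simple scheme amounts to comparing the power of the alpha-band signal with a threshold. In the sketch below, the sample values and the threshold are made up, and the assignment of ‘eyes closed’ to backward motion is an arbitrary assumption; the text does not say which thought maps to which direction.

```python
def signal_power(samples):
    """Average power of the signal: the mean of the squared samples."""
    return sum(s * s for s in samples) / len(samples)


def decode_command(alpha_samples, threshold):
    """High alpha power (eyes closed) -> 'backward'; low power (eyes open) -> 'forward'."""
    if signal_power(alpha_samples) > threshold:
        return "backward"
    return "forward"


eyes_open = [0.1, -0.2, 0.15, -0.1]    # hypothetical low-power alpha samples
eyes_closed = [1.0, -1.2, 0.9, -1.1]   # hypothetical high-power alpha samples
print(decode_command(eyes_open, threshold=0.5))
print(decode_command(eyes_closed, threshold=0.5))
```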
This example is a very simple decoding algorithm. Actual algorithms, however, are
very complex. This complexity is increased manifold by the fact that brain waves are
usually subjective, that is, they vary from person to person. To develop a BMI which can
be readily used on any person would be impossible. The BMI has to fully ‘understand’
the person before actually being effective in aiding that person. For this to happen, the
person and the BMI have to adapt to each other.
On one hand, the BMI has to understand the individual brain patterns characterizing
the mental tasks executed by the subjects. On the other hand, the subject has to
modulate his brain waves voluntarily through feedback to generate distinct brain wave
patterns. This requires an adaptation time wherein the subject is made to visualize (think
about) the necessary action being executed and his/her brain waves are recorded. This is
done a number of times to get distinct brain wave patterns which are then fed into the
BMI as indicators for the necessary action.
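One very simple way to picture this training phase is template matching: the recordings for each action are averaged into a template, and a new recording is assigned to the nearest template. This nearest-template method is our illustrative choice, not the algorithm of any specific BMI, and the two-number feature vectors (imagined here as alpha power and beta power) are invented data.

```python
def mean_vector(trials):
    """Average several feature vectors of equal length into one template."""
    n = len(trials)
    return [sum(trial[i] for trial in trials) / n for i in range(len(trials[0]))]


def train(recordings):
    """recordings: {action: [feature_vector, ...]} -> {action: template}."""
    return {action: mean_vector(trials) for action, trials in recordings.items()}


def classify(templates, features):
    """Return the action whose template is closest to the new feature vector."""
    def distance_sq(action):
        return sum((f - t) ** 2 for f, t in zip(features, templates[action]))
    return min(templates, key=distance_sq)


# Hypothetical calibration data: several repetitions per imagined action.
recordings = {
    "move_left": [[0.9, 0.2], [1.1, 0.3], [1.0, 0.1]],
    "move_right": [[0.2, 0.8], [0.3, 1.0], [0.1, 0.9]],
}
templates = train(recordings)
print(classify(templates, [0.95, 0.25]))
```

Repeating the recording many times makes each template more representative of that particular subject, which mirrors the adaptation time described above.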
4.7.3 | End Device
The end device may be a robot, a prosthetic limb, a computer cursor, etc. In case of a
computer cursor, the patient mentally visualizes the cursor moving to the target. The BMI
understands this signal and then generates the necessary signals to make the computer
cursor move to the target. For the brain-machine interface to understand the signal,
hours of practice are required. The BMI has to be trained to decode the signals correctly.
The end device may also be a prosthetic limb. In this case, the patient visualizes
the movement of the limbs which is interpreted by the software and the corresponding
movement is generated in the prosthetic limb. See Figure 4.20 which shows both these
cases.

Sometimes BMIs are used to provide alternate neural pathways for persons whose
natural neural pathways have been damaged or are dysfunctional. For instance, consider
a case when the muscles near the elbow part of the arm are damaged. The signals from
the brain cannot reach the hand part of the arm due to the blockage caused by the dam-
aged muscles. A BMI can easily solve the problem by providing an alternate pathway
between the brain and the human hand. Here the end device is not a prosthetic object
or a robot, but a human organ.
4.7.4 | Important Milestones
In 2002, a man named Jens Naumann had his vision restored (imperfectly) using a BMI.
Jens Naumann had lost his eyesight during adulthood. The device was in the form of a
camera attached to one of the glasses of his spectacles. The images captured by the camera
were processed and sent to the visual cortex. Scientists targeted the 177 brain cells in
the cortex which interpret the image falling on the retina (screen) of the eye.
The implant, however, offered only black-and-white vision and at a low frame rate.
In spite of this, he was able to drive slowly in the parking lot of the research centre where
he was given the BMI.
BMI technology for motor neuro-prosthetics created a whole new benchmark with
the BrainGate BMI implanted into Matt Nagle’s brain in 2005. Matt was paralysed
from the neck down after his spinal cord had been severed as a result of a stabbing. It was done as
part of the first nine-month human trial of Cyberkinetics Neurotechnology’s BrainGate
chip-implant.
Figure 4.20 | Working of a BMI with computer cursor and prosthetic limb as end device (labels in the figure: Neuronal Activity; Brain–Machine Interface; Computer Control; Virtual Arm or Prosthetic Control Signal; Touch, Proximity and Position Sensor Data; Optical Tracking Data; Sensory Feedback)

A 96-electrode BMI was implanted into his cortex. It took him some months
to adapt to the BMI. His signals were read and were then used for moving a
computer cursor. He could utilize the full functionality of the computer cursor including
left-click, right-click, etc. He could then check e-mail, turn the TV and lights ON/OFF, etc. He
also became the first person to move a prosthetic hand (opening and closing of the hand)
with his thoughts.
These examples indicate that BMI is quite promising, though many hurdles
are left to be crossed before it can be made really useful.
Conclusion
With this, we come to the end of our discussion on certain specific and popular embedded
systems. This chapter gives an insight into the various aspects involved in the design
of an embedded system. By viewing the example of a mobile phone itself, we realize that
an embedded system design is the culmination of the most complex technologies and
knowledge of the fields of signal processing, analog to digital conversion technology,
sensor and display design, processor design, communications principles, etc. Thus an
embedded system does not stand alone as just a piece of electronics circuitry; it encompasses
the finest and best in many different fields of knowledge.
This chapter introduces a few embedded systems in common use.
One of the most popular devices we use today is the mobile phone.
A mobile phone of modern times uses ARM processors with DSP co-processors.
It uses the cellular concept and spread spectrum modulation technique.
Modern automobiles have a lot of ECUs pertaining to different controls.
ABS, EFI, ESC, ACC, etc. are systems used very commonly in many vehicles.
RFID systems have a lot of advantages over the conventional barcode system.
Wireless sensor networks are used for applications like environment monitoring.
The field of robotics is one in which embedded systems are widely used.
Robots are mechanical structures controlled by electronics.
There are different types of robots like stationary, mobile, autonomous, etc.
For autonomous robots, complex path following algorithms are needed.
The biomedical field is one where embedded systems are widely used.
Brain-machine interface is a new application which shows promise.
QUESTIONS
1. List out the fundamental blocks of a mobile phone and explain the function of each block.
2. Explain why the central processing block of a mobile phone needs processors of very high
signal processing capability.
3. Why are mobile cells found to be hexagonal? Find out the reason.
4. What is the necessity for ‘hand off’?
5. List at least five ECUs usually found in automobiles.
6. How does the ABS help in improving the safety of a vehicle?
7. Justify the name ‘RFID’ in terms of the functionality of the system.
8. List the most commonly used RFID frequencies.
9. What are the components likely to be present in a wireless sensor node?
10. Distinguish between open and closed loop robotic systems.
11. Name a few biomedical devices which use embedded systems.
12. Why is the wheel chair an apt example of the use of embedded systems?
13. Why do you think that BMI holds great promise for the future?
14. Why is the non-invasive method of reading brain signals not very effective?
15. What is the difficulty in the invasive method of brain signal reading?
EXERCISES
1. Name five service providers for cell phones in India.
2. Name five manufacturers of mobile phone receivers in the world.
3. What equipment is likely to be present in a mobile phone tower?
4. In a car, find out which ‘buses’ are used for the following units.
i) Infotainment
ii) Engine Control
iii) Body Electronics
5. How does the system of ‘automatic gear’ work in a typical ‘gearless’ car?
6. Explain how the GPS system in a vehicle (if present) works.
7. Find a few examples of the applications of wireless sensor networks.
8. Find the names of five manufacturers of biomedical appliances.
9. Find more examples of the practical results of using BMI.
10. Write an essay with information about the following topics.
i) Robotic soccer
ii) Humanoid robots and their future
iii) Industrial robots
iv) Robots for military applications
5 | Buses and Protocols
Chapter-opening image: A Zigbee module.
The definitions for buses and protocols
The different types of buses in a computer system
The reasons why serial buses are preferred currently
The different methods of bus arbitration
The protocols of two on-board buses, the I2C and SPI bus
The working of the external serial buses USB and FireWire
The serial buses like RS 232, RS 422 and RS 485
The concepts surrounding the Ethernet protocol
The most important automotive bus, i.e., the CAN bus
The wireless protocols for WLAN, Zigbee and Bluetooth
Introduction
In this chapter, we will discuss buses and protocols, two ‘related’ aspects that are very
important for any system, whether it is a general purpose computer system or a special
purpose embedded system. Let us start with some simple definitions.
5.1 | Defining Buses and Protocols
Bus In the simplest form, a bus is a collection of wires which carry electrical signals.
The electrical signals may be defined in terms of voltage levels or current values. We are
dealing with digital systems, and the signals involved in these carry information quantified
as bits: either 0 or 1. In short, we say that a signal wire carries a ‘1’ or a ‘0’, and that a bus is
a collection of such wires (we shall exclude the wires which carry power supply voltages).
Protocol A protocol is a set of rules/specifications which govern the transmission and
reception of information over a bus. These specifications are defined electrically,
mechanically and also in terms of ‘speed’, that is, the rate at which data is transferred.
We will, in this section, carry out an in-depth review of these related terms and study
different types of buses and associated protocols.
Since buses carry information, let us first have an idea of what kind of information
can be carried by a bus. As a basic subdivision, the information carried can be classified
as address, data and control. Remember that any computer system consists of a processor,
memory and I/O. The system bus present in such a setup is subdivided into the following:
i) The address bus, which carries the memory or I/O addresses which the processor
wants to access in order to read or write data. It is a unidirectional bus.
ii) The data bus, which carries data coming from or going to the processor. It is a bidirectional bus.
iii) The control bus, also called the command bus, which transports control and synchronization
signals. It is a bidirectional bus, that is, signals which travel in either or both
directions are present in this group.
Figure 5.1 is a typical system showing the CPU connected to memory and I/O. In this
figure, the system bus is shown connected to both I/O and memory.
5.1.1 | Processor-memory Bus
Over the years, the requirements of memory and I/O have become more and more distinct
and separate, and you may hear terms like ‘processor-memory bus’ and ‘peripheral
bus’ with respect to PCs or standard embedded systems. In any system, the fastest unit is
the CPU, that is, the processor. The next fastest unit is the main memory or cache (if present).
The bus connecting the processor and memory (including cache, if present) is now called
the processor-memory bus. It is also designated by terms such as internal bus, main bus,
system bus, etc.
There is a continuous transport of information between a processor and its memory.
For reading from memory, a processor (say, 8085) with an 8-bit data bus and a 16-bit
address bus sends 16 bits of address on 16 separate address lines, and receives 8 bits of
data on 8 physical wires. Control signals like memory read, chip enable, output enable,
etc. are also activated. Thus, a number of physical wires are active at the same time, and
contribute to the transfer of information. This is parallel communication between the
processor and memory. Internal buses are almost always parallel buses.
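As a rough illustration of the numbers quoted above, the Python sketch below computes how many locations an address bus of a given width can select, and how many wires are active together during one parallel read. The function names and the count of three control lines (memory read, chip enable, output enable) are assumptions made only for this example.

```python
def addressable_locations(address_bits: int) -> int:
    """Number of distinct addresses an address bus of this width can select."""
    return 2 ** address_bits


def wires_in_parallel_read(address_bits: int, data_bits: int, control_lines: int) -> int:
    """Wires that are active together during one parallel memory read."""
    return address_bits + data_bits + control_lines


# 8085-style example from the text: 16 address lines and 8 data lines.
locations = addressable_locations(16)

# Assumption for illustration: three control lines are involved in the read.
wires = wires_in_parallel_read(16, 8, 3)

print(locations, "locations,", wires, "wires active in parallel")
```

Even this small 8-bit example keeps more than two dozen wires busy for a single read, which is the cost that the later discussion of serial buses sets out to reduce.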
Figure 5.1 | A processor connected to memory and I/O (blocks: CPU, Memory, I/O)
5.1.2 | Peripheral Buses
Processors also need to communicate with peripherals, that is, external input and output
devices, and this data pathway is called the I/O bus or peripheral bus. Since peripherals
tend to be slower in response (compared to semiconductor memory), peripheral buses
are slower than internal buses. Peripherals come in many varieties, with differing
electrical and mechanical characteristics and applications, and are made using different
materials and technologies. For example, the requirements of a printer are quite different
from those of a video monitor. This is the reason for having various kinds of peripheral
buses, with different data rates, electrical specifications, mechanical dimensions and
sets of control signals. It also necessitates an extra ‘controller’ or interfacing chip
between the processor and the peripheral. In Figure 5.2, we see that an I/O controller
is usually needed to interface between the processor and the I/O devices. The figure shows
a USB controller, a graphics controller, a network controller, etc. connected to their
respective devices.
Figure 5.2 | The processor-memory bus and the peripheral bus (blocks: processor, cache, main memory, USB controller with devices 1 and 2, graphics controller with graphics device, network controller with network)
5.1.3 | Embedded Processors/Microcontrollers
At this point of our discussion, we need to make a clear distinction between a general
purpose processor (like the Pentium) and an embedded processor or microcontroller (like
the 8051, PIC, ARM, etc.). The former is dedicated to computation; for communicating
with the external world, its ‘I/O interface’, that is, its I/O controllers, are external to the
processor chip. The latter, on the other hand, is designed to act as a single-chip computer, and
has memory and I/O controllers internal to the chip. Figure 5.3 shows a typical embedded
processor’s internal block diagram with a number of peripheral controllers inside
the chip. For a simple embedded system, it is highly probable that the RAM and ROM
available internally are sufficient, and that the I/O needed can be directly connected to
the port pins. In this case, the processor-memory bus is internal to the chip, and there is
no need for an external peripheral bus.
Now, as embedded systems themselves have become big and complex, the
resources available inside the processor chip may not be sufficient. Extra memory and
more peripheral controllers become necessary. We then need an embedded board,
which needs at least one peripheral bus to connect to the extra peripherals and
memory. The major components that make up the embedded board, that is,
the embedded processor, I/O components and memory, are interconnected via buses on
the board. Such a bus is called an on-board bus. On an embedded board, an
on-board bus is the bus which connects the MCU (microcontroller unit)
to units like an EEPROM, SD card, LCD display, etc. Typical examples of on-board
buses are the SPI and I2C buses. Keep in mind that in this case, all the peripherals are
on the board itself.
On more complex boards, multiple buses may be found on the same board. When
different buses connect components that need to inter-communicate, bridges on the
board act as the interface between the various buses and carry information from one bus
to the other.
There are also off-board or expansion or external buses which connect the board
with another board or with standard external peripherals. For example, the USB is an
off-board bus. For such buses, there are connectors on the board. See Figure 5.4 which
shows connectors for Ethernet, USB, RS-232, etc.
With this brief introduction, let us attempt to understand the different types of ‘peripheral’
buses used in embedded systems. A large number of buses are available in the
industry. We will study some of the standard and popular ones.
Figure 5.3 | Internal block diagram of an embedded processor (blocks: processor core, internal memory, A-to-D conversion, D-to-A conversion, parallel I/O ports, serial I/O ports, counter/timer, and a connection to external memory)

5.1.4 | Parallel vs Serial Buses
When data, address and control signals are transported in parallel, we have parallel buses.
All the early buses were entirely parallel. To make this point clear, let’s look at the buses
of a general purpose computer (something that most of us are very familiar with). Until
some time back, many of the buses of the PC were parallel. For example, all
PCs had a parallel port to which a printer could be connected. Inside the
PC too, the earlier buses for connecting expansion boards were all parallel, that is,
ISA and EISA (Industry Standard Architecture and Extended Industry Standard Architecture,
both now obsolete), PCI (Peripheral Component Interconnect, for expansion
boards), ATA (AT Attachment, for the hard disk) and so on. Other important buses
like the Multibus, VL bus, MCA bus, etc. were also parallel.
However, of late, things have changed, with all the above buses being replaced by
serial buses. Thus, PCI has been replaced by the serial PCIe (PCI Express) and ATA by serial
ATA (SATA, as it is called), while printers and many other peripherals have adopted USB as
their standard bus. This is so for the PC.
Now for the embedded world: the on-board buses associated with embedded
systems are all serial, for example, I2C and SPI, and so are the off-board buses like PCIe,
USB and so on. Keep in mind that associated with each of these buses, there is a standard
protocol. All the buses that we will discuss are serial buses.
What is the reason for the shift from parallel to serial buses?
i) The first reason is that parallel buses require a larger number of physical wires. So they
take up more space on the PCBs, and the ICs involved need a lot of pins.
ii) Over the years, the widths of the data and address buses have increased, and the issue
of ‘skew’ has become more prominent. Skew occurs when data travelling along
different physical wires reaches the destination at different time instants, which tends to
distort the received data. This problem becomes more acute as bus
widths increase.
iii) When there are many physical wires, the possibility of crosstalk between them is
also high.
All these problems are avoided by serial communication, which, in turn, has
brought with it a new and different set of issues to resolve.
Figure 5.4 | An embedded board with connectors to many external buses (labels: OMAP processor, two USB ports, Ethernet connector, RS 232 connector, general expansion headers)
5.1.5 | Serial Communications
Simplex, Half-Duplex and Full-Duplex Communication
A Simplex Connection is a connection in which the data flows in only one direction,
from the transmitter to the receiver. This type of connection is useful if data does not
need to flow in both directions (for example, from the computer to the printer or from
the mouse to the computer).
A Half-duplex Connection (sometimes called an alternating or semi-duplex connection)
is a connection in which data flows in one direction or the other, but not both at
the same time. At any given time, transmission takes place in only one direction, and thus
the full bandwidth of the line can be used for that transmission (as when talking on a radio).
A Full-duplex Connection is a connection in which data flows in both directions
simultaneously. Each end of the line can thus transmit and receive at the same time, which
means that the bandwidth is divided by two for each direction of data transmission if the same
transmission medium is used for both directions. Talking over a telephone line is a typical
case of full-duplex transmission. The serial port on the PC is a full-duplex device,
meaning that it can send and receive data at the same time. In order to be able to do this,
it uses separate lines for transmitting and receiving data.
Transmission Rate For parallel data transfer, the data rate is specified in bytes/second,
but for serial communications, it is always specified in bits per second (bps), because
the data is sent one bit at a time. When you see the terms Mbps and Kbps, read them
as mega and kilo ‘bits per second’.
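To make the unit concrete, the following Python sketch estimates how long a block of bytes takes on a serial link quoted in bits per second. It is only an idealized calculation: the function name is made up for this example, Mbps is taken to mean 10^6 bits per second, and framing or protocol overhead is ignored.

```python
def serial_transfer_time(num_bytes: int, bits_per_second: float) -> float:
    """Seconds needed to send num_bytes, one bit at a time, ignoring any overhead."""
    total_bits = num_bytes * 8
    return total_bits / bits_per_second


# Example: 1500 bytes over a 12 Mbps link (assumed to be 12,000,000 bits per second).
seconds = serial_transfer_time(1500, 12_000_000)
print(f"{seconds * 1000:.3f} ms")
```

The factor of 8 is the point to remember: a rate given in bits per second must be divided by eight before it can be compared with a parallel bus rate given in bytes per second.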
5.1.5.1 | Some Features of Buses
Synchronous and Asynchronous Buses A synchronous bus is timed by a clock signal.
The bus can run very fast and the interface logic will be small. Every device on the bus
must run at the same clock rate and the bus must necessarily be short to avoid clock-
skew problems.
An asynchronous bus is not clocked. It follows a handshake protocol. It can accom-
modate a wide variety of devices, and the bus can be lengthened without worrying about
clock-skew or synchronization problems.

5.1.6 | Bus Arbitration
When dealing with buses, a term that will frequently be encountered is ‘bus arbitration’.
Before discussing each of the buses, it will be apt to understand what this term means.
The activities that occur on a bus are called bus transactions. A transaction amounts to
putting up an address and sending/receiving data. The device that initiates a transaction is
called the ‘master’ and the device that follows the orders of the master is called the ‘slave’. In a
system, the main processor is the master; but for peripheral buses, the processor need not
be called upon to initiate transactions. Instead, the ‘controllers’ for the specific buses take
charge of this activity. For example, the USB controller handles the USB transactions;
similarly, we have controllers for I2C, SPI, etc.
The simplest system is one in which there is one master and one slave. A more complicated
system is one with one master and many slaves. Usually only one slave can
respond to the master’s initiation, and which one is decided by the address of the slave:
only the ‘addressed’ slave need respond.
A complex system is one with many masters and many slaves. It can happen that more
than one master wants to ‘drive’ the bus. This condition is called ‘bus contention’,
and it is apparent that only one of the masters can be granted the bus. The solution
to this problem is ‘bus arbitration’. When there are multiple masters, each master will
have its own hardware arbiter to ‘stake its claim’ for the bus. There will also be a scheme
for deciding which device ultimately gets the bus.
5.1.6.1 | Bus Arbitration Schemes
For any system, the arbitration scheme designed to be used must balance two factors.
i) Priority: The highest priority module must be serviced first.
ii) Fairness: Even the lowest priority module should be able to get service.
In all schemes, there is a central arbiter which will act to control and restrict bus access.
This will be part of the I/O controller hardware and software. We will discuss three
simple schemes for bus arbitration.
5.1.6.2 | Daisy Chaining
This scheme uses three lines: bus request, bus grant and bus busy, as shown in Figure 5.5.
Each of these lines is shared by all the potential bus masters, which are ‘daisy chained’,
that is, connected in a cascaded fashion. The priority of the modules is fixed by their
physical connection, left to right, meaning that the modules closer to the central bus
arbiter have the higher priorities.
One or more modules may place a bus request on the common line, and this is
received by the central arbiter. It sends out a bus grant signal, provided the ‘bus busy’ line
is inactive. This bus grant signal propagates from left to right and is accepted by the first
module which had asked for the bus. That module activates the bus busy signal and uses the
bus. Thus, the bus grant signal does not propagate beyond it.
This scheme is very simple, but is obviously not a fair scheme. The priority of the
modules is fixed by the physical connection, which cannot be changed once wired up,
and a low priority module may be locked out permanently. The delay in propagating the
bus grant signal from left to right also limits the number of modules that can be
accommodated in the system.
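The grant propagation described above can be sketched in a few lines of Python. This is a behavioural model only, not a description of any real arbiter hardware; the function name and the list representation (index 0 is the module closest to the central arbiter) are assumptions made for the example.

```python
def daisy_chain_grant(requests, bus_busy=False):
    """Return the position of the module that captures the bus grant, or None.

    requests[i] is True when module i is requesting the bus; module 0 sits
    closest to the central arbiter and so has the highest priority.
    """
    # The central arbiter issues a grant only if the bus is free and requested.
    if bus_busy or not any(requests):
        return None
    # The grant ripples down the chain and stops at the first requester.
    for position, wants_bus in enumerate(requests):
        if wants_bus:
            return position
    return None


# Modules 1 and 2 both request; module 1 is nearer the arbiter and wins.
print(daisy_chain_grant([False, True, True]))
```

Calling the function repeatedly with module 0 always requesting shows the unfairness noted in the text: the grant never reaches the modules further down the chain.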
Figure 5.5 | Daisy chaining scheme of bus arbitration (labels: central arbiter; bus arbiters of Master 1, Master 2, ..., Master N; bus grant, bus request and bus busy lines)
5.1.6.3 | Parallel Arbitration Scheme
This is also called the independent requests scheme. Here, a master module may output
its request (REQ) only if the system bus is not busy. The centralized arbiter issues a bus
grant signal to the module originating the highest priority request. This module
can then acquire control of the system bus, and it activates the common ‘bus busy’
line. This is depicted in Figure 5.6. The priority may be preset (in which case it can happen
that low priority master modules cannot access the system bus at all, or only after a long
delay), or it may follow a round-robin scheme (the highest priority at a particular time
instant is assigned to the right side neighbour of the master module currently using the
system bus). In the latter case, all master modules have an equal chance of accessing the
system bus over a long time period. In the case of very large systems, a distributed version of
the arbiter is usually used (the centralized version shown is failure critical). The demerit
of this scheme is that each module needs one signal line for bus request and one for bus
grant, that is, two dedicated lines just for bus access.
Figure 5.6 | Parallel arbitration scheme (labels: central arbiter; three bus arbiters; Req-1 to Req-N and Grant-1 to Grant-N lines; common bus busy line)
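The difference between preset priority and round-robin priority in the independent requests scheme can be seen in a small Python model. Both functions below are illustrative sketches with made-up names; masters are numbered from 0, and the ‘right side neighbour’ of master i is taken to be master i + 1, wrapping around at the end.

```python
def fixed_priority_grant(requests):
    """Preset priority: the lowest-numbered requesting master always wins."""
    for master, wants_bus in enumerate(requests):
        if wants_bus:
            return master
    return None


def round_robin_grant(requests, last_granted):
    """Round robin: search from the right-hand neighbour of the last bus owner."""
    count = len(requests)
    for offset in range(1, count + 1):
        candidate = (last_granted + offset) % count
        if requests[candidate]:
            return candidate
    return None


# All three masters request continuously.
requests = [True, True, True]
owner = 0
history = []
for _ in range(4):
    owner = round_robin_grant(requests, owner)
    history.append(owner)
print(history)
```

With every master requesting all the time, the fixed scheme would grant master 0 on every cycle, while the round-robin version visits each master in turn.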
5.1.6.4 | Distributed Arbitration by Self-Selection
This is a type of polling scheme. See Figure 5.7. Each module has an arbitration number
with an associated priority. When multiple modules ask for the bus, the one whose
arbitration number carries the highest priority (not necessarily the numerically highest
number) gets the bus. Along with its request, a module makes its arbitration number known
to the others. Each requesting device compares the numbers it sees with its own, and the
one with the highest-priority number wins. This scheme should allow dynamic reallocation of
arbitration numbers to make it a truly fair scheme.
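A minimal Python sketch of self-selection is given below. It makes two simplifying assumptions that are not in the text: the numerically largest arbitration number is treated as the highest priority, and every requester is assumed to see all the numbers posted on the bus. The module names are invented for the example.

```python
def self_select(posted_numbers):
    """Return the name of the module that selects itself as bus owner, or None.

    posted_numbers maps each requesting module's name to the arbitration
    number it has placed on the shared arbitration lines.
    """
    if not posted_numbers:
        return None
    highest_on_bus = max(posted_numbers.values())
    # Each module compares its own number with the highest one it sees;
    # only the module whose number matches keeps its claim to the bus.
    for name, number in posted_numbers.items():
        if number == highest_on_bus:
            return name
    return None


print(self_select({"dma": 5, "cpu": 9, "nic": 3}))
```

Because the decision is made by each module from the posted numbers, changing the numbers at run time changes the winner, which is the dynamic reallocation the text asks for.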
5.2 | On-board Buses for Embedded Systems
5.2.1 | The I2C Protocol
I2C or I²C stands for ‘Inter-Integrated Circuit’ and is a simple protocol that uses
just two wires. It was developed by Philips in the early 1980s for its TV applications, which
required the connection of a CPU to many ICs. Today, this bus is very widely used in
the embedded field. It is a synchronous, half-duplex, serial protocol and is also byte
oriented, which means that data is sent one byte at a time, with each byte sent one bit at a
time in a serial fashion. After each byte, an acknowledgement is to be sent by the receiver
IC to the sender IC.
Figure 5.7 | Distributed arbitration by self-selection (labels: central arbiter; three bus arbiters; bus busy, bus request and arbitration number lines)

In its simplest form, the bus can have one master and many slaves (Figure 5.8). The
master, usually a microcontroller unit (MCU), can transmit as well as receive, and so can the
slaves, depending on whether they are input or output devices. For example, a slave which
is a ROM can only be read from, an LCD controller can only be written to, while an
external RAM chip can be both read from and written to.
The two signal wires are bidirectional and carry the signals SCL, the serial clock, and
SDA, the serial data. Each device has its own unique address, usually fixed by hardware.
Figure 5.8 shows the two active wires connected to pull-up resistors (a wired-AND
connection), indicating that they are open collector or open drain connections, depending
on whether the technology is BJT or MOS.
Figure 5.8 | I2C devices with one master and N slaves (the master MCU and slaves 1 to N share the SCL and SDA lines of the I2C bus; each line is pulled up to VCC through a resistor R)
5.2.1.1 | The I2C Protocol
Now let’s have a look at the talk session between a master and its slaves. Refer to
Figure 5.9.
i) First, the master issues a START signal. This signal causes all the slaves to come
to attention and listen. The start condition corresponds to the master
pulling the SDA line low while the clock (SCL) is high.
ii) The first byte sent by the master carries the address. This 7-bit address is sent serially
on the SDA line (MSB first). Note that the bits on the SDA line are synchronized
by the clock signal on the SCL line, which means that the data on the SDA line
is read during the time that the clock on the SCL line is high (data is valid at the
L to H transition of the clock).
iii) Just after the address, the master sends the R/W bit indicating the direction of data
transfer (see Figure 5.9a). Note that all activities are synchronized by the clock.
iv) Only one of the slaves will have the broadcast address, and on realizing that
its address matches this address, that slave responds by sending an
‘acknowledge’ signal back to the master.
v) Now a byte can be received from the slave if the R/W bit is set to READ, or
written to the slave otherwise (see Figure 5.9b).
vi) Once this data transfer is over, the device (master or slave) that has received the
byte sends an acknowledge signal. Acknowledgement is when the receiver drives
SDA low.
vii) If more bytes are to be transferred, steps v and vi are repeated.
viii) After this, the master pulls SCL high, and then the SDA line also. This amounts
to a STOP condition, after which the bus is idle and available for another
transfer.
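The steps above can be sketched as a small Python model that builds the first byte (7-bit address followed by the R/W bit) and lists the events of a simple master write. It models the sequence only, not the electrical behaviour of SDA and SCL. The function names and the slave address 0x50 are assumptions chosen for illustration; the convention that R/W = 1 means read and 0 means write follows the I2C specification.

```python
def i2c_address_byte(address7: int, read: bool) -> int:
    """Pack a 7-bit slave address and the R/W bit into the first byte on the bus."""
    if not 0 <= address7 <= 0x7F:
        raise ValueError("a 7-bit address must lie between 0 and 127")
    return (address7 << 1) | (1 if read else 0)


def bits_msb_first(byte: int):
    """Bits of a byte in the order they appear on SDA (MSB first)."""
    return [(byte >> shift) & 1 for shift in range(7, -1, -1)]


def write_transaction(address7: int, payload):
    """Symbolic list of bus events for a master writing payload bytes to a slave."""
    events = ["START", ("BYTE", i2c_address_byte(address7, read=False)), "ACK"]
    for data_byte in payload:
        events.append(("BYTE", data_byte))   # step v: byte written to the slave
        events.append("ACK")                 # step vi: receiver pulls SDA low
    events.append("STOP")
    return events


# Hypothetical slave at address 0x50, receiving the single byte 0x12.
print(hex(i2c_address_byte(0x50, read=False)))
print(bits_msb_first(i2c_address_byte(0x50, read=False)))
print(write_transaction(0x50, [0x12]))
```

Note that the address and the R/W bit together make up exactly one byte, which is why the slave's acknowledge arrives on the ninth clock pulse.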
The I2C bus was originally developed as a multi-master bus. This means that more than one
device initiating transfers can be active in the system. In such a case, each master will have
to arbitrate for the bus. I2C controllers have the additional hardware and protocol for this.
The I2C bus is specified for the following three speed grades:
i) Slow (up to 100 Kbps)
ii) Fast (400 Kbps)
iii) High-speed (3.4 Mbps)
Figure 5.9a | The START condition and broadcast of the address by the master (SDA and SCL waveforms: START, address bits 6 (MSB) to 0 (LSB) from the master, the R/W bit, then ACK)
Figure 5.9b | Data transfer between the master and slave (SDA and SCL waveforms: data byte bits D7 (MSB) to D0 (LSB), ACK, then STOP)
5.2.1.2 | Extended Addressing
Due to the popularity of the I2C bus, the 7-bit address space soon got exhausted. This
became a problem for those designing new I2C compatible ICs, and so the
I2C standard has been updated to implement a 10-bit addressing mode. A chip that
conforms to the new standard receives two address bytes. The first consists of the
reserved extended-addressing pattern, the 2 upper bits of the device address
and the Read/Write bit. The second byte contains the 8 lower bits of the address. Any
new design should implement this new addressing scheme.
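The split of a 10-bit address into two bytes can be illustrated with the Python sketch below. The reserved pattern used for the first byte, 11110 in the five most significant bits, is taken from the I2C specification and does not appear in the text above; the function name and the example address are assumptions. The sketch only forms the two bytes and does not model the full bus sequence of a 10-bit transfer.

```python
EXTENDED_ADDRESS_PREFIX = 0b11110000  # reserved pattern 11110 in the top five bits


def i2c_10bit_address_bytes(address10: int, read: bool):
    """Return the two address bytes for a 10-bit I2C slave address."""
    if not 0 <= address10 <= 0x3FF:
        raise ValueError("a 10-bit address must lie between 0 and 1023")
    upper_two_bits = (address10 >> 8) & 0b11
    first = EXTENDED_ADDRESS_PREFIX | (upper_two_bits << 1) | (1 if read else 0)
    second = address10 & 0xFF          # the 8 lower bits of the address
    return first, second


# Hypothetical device at 10-bit address 0x2A5, addressed for a write.
first, second = i2c_10bit_address_bytes(0x2A5, read=False)
print(hex(first), hex(second))
```

Because the first byte always starts with the reserved pattern, older 7-bit devices on the same bus never mistake it for their own address.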
The I2C protocol is very simple, but it is not as fast as the competing SPI
scheme (Section 5.2.2). I2C devices include EEPROMs, thermal sensors, real-time
clocks and similar peripherals which need only one data line and a clock line,
and can be programmed by an MCU. Many standard MCUs (PIC, AVR, PSoC,
ARM, etc.) have I2C ‘hardware’ within, with the SDA and SCL lines brought out as pins,
so that the protocol can be implemented with ease. Since the hardware engine for this is
already available, the user need only be concerned with writing the software that
uses it.
5.2.2 | The SPI Bus
This is a bus developed by Motorola.
SPI stands for ‘Serial Peripheral Interface’ and, as the name suggests, it is a serial
data transfer protocol, which is synchronous and full duplex (data can be sent in both
directions simultaneously), between a microcontroller unit (MCU) and a peripheral. As
a system, it is a single-master, multi-slave system, in which only one of the slaves is to be
enabled at a time. It is a master-slave protocol, in the sense that the master is the unit that
generates the clock signal and initiates data transfer. When the master does this, data
transfer occurs in both directions simultaneously.
Figure 5.10a shows the signals of the SPI bus in a single slave configuration. There
are four wires for the bus: SCLK, the clock generated by the master; MOSI, which car-
ries data from the master to the slave; MISO which carries data from the slave to the
master; and SS, the slave select signal. The last pin SS (Slave Select) of the master is to
be connected to the chip enable pin of the slave to be selected, and is usually an active
low signal.
MOSI stands for Master Out, Slave In
MISO stands for Master In, Slave Out
Now see Figure 5.10b, which shows a multi-slave SPI configuration. The master
is usually a microcontroller which has an SPI controller with the specified pins. Here, the
MCU’s SPI controller unit has three SS pins, but only one slave is selected at a time.
Slaves that are currently not selected should have their MISO outputs tristated (and
should ignore MOSI and SCLK), and thus be isolated from the system.
5.2.2.1 | The SPI Protocol
The transfer of data using an SPI interface can be thought of as a large shift register
shared between the master and slave devices. Data is clocked IN at the same time as it is
clocked OUT of the devices (the clock is shared by the two devices). In addition, there
should be a transmit buffer register at the transmitter side, and a receive buffer register
at the receiver side.
Figure 5.10a | SPI signals with the master and one slave (the SCLK, MOSI, MISO and SS lines run between the master and the slave)
Figure 5.10b | SPI connections for one master and three slaves (each of slaves 1 to 3 has SCLK, MOSI, MISO and SS pins; the master has SCLK, MOSI, MISO and the select outputs SS1, SS2 and SS3)
The SPI protocol behaves like a ring buffer, so that whenever the master sends a byte
to the slave, the slave sends a byte back to the master. Essentially, two actions take place
in an SPI clock cycle, which are as follows:
i) The master sends a bit on the MOSI line, which the slave reads from the same line.
ii) The slave sends a bit on the MISO line, and the master reads it from that same line.

Figure 5.11 shows the data transfer between a master and a slave. There are shift
registers in the master and slave which are serially connected using the MOSI and
MISO pins. In this interconnection, a bit is shifted from master to slave and slave to
master simultaneously (full duplex). Not all transmissions require all these operations to
be meaningful but this is the way the protocol works. If, say, the individual shift registers
are 8 bits long, it is apparent that after 8 clock cycles, the data in the master and slave
gets exchanged. The length of the shift registers is decided by the manufacturer of the
SPI controller.
Once a set of data has been transmitted, the buffer at the transmitter side should
get fresh data to be sent. Similarly, the received data should be copied and saved at the
receiver side. This process can continue until the required block of data is transferred.
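The exchange through the two shift registers can be simulated in Python as shown below. This is a sketch under stated assumptions: both registers are 8 bits long, bits are shifted out MSB first (real SPI controllers often make the bit order and clock phase configurable), and the function name is invented for the example.

```python
def spi_exchange(master_byte: int, slave_byte: int, bits: int = 8):
    """Simulate one SPI transfer; returns (master_register, slave_register)."""
    mask = (1 << bits) - 1
    master, slave = master_byte & mask, slave_byte & mask
    for _ in range(bits):
        # On each clock cycle both sides put their most significant bit on the wire.
        mosi = (master >> (bits - 1)) & 1
        miso = (slave >> (bits - 1)) & 1
        # Each register shifts left and takes in the bit sent by the other side.
        master = ((master << 1) & mask) | miso
        slave = ((slave << 1) & mask) | mosi
    return master, slave


# After 8 clock cycles the contents of the two registers have been exchanged.
print([hex(value) for value in spi_exchange(0xA5, 0x3C)])
```

The simulation also shows why a master that only wants to read must still send something: the slave's byte arrives only as the master's own (possibly dummy) byte is clocked out.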
5.2.2.2 | Points to Note
i) SPI can operate at a higher speed compared to the I2C protocol, but it has a serious
problem in that it has no acknowledge signal. So the master has no way of confirming
the receipt of the data sent.
ii) The protocol works best for a single slave system.
iii) Clock frequencies up to 70 MHz are possible.
iv) Slave devices such as serial EEPROM, flash memory, LCD drivers, memory cards,
serial ADCs, etc. are devices that frequently use the SPI protocol.
v) Popular MCUs like ARM, PIC, AVR, etc. have SPI controllers as a standard
feature.
Figure 5.12 shows the connection between a PIC MCU and an SD card. Appendix I
gives the details of SPI used with the ARM processor.
Figure 5.11 | SPI transactions using shift registers (the shift registers in the master and the slave are connected through MOSI and MISO and share the clock on SCLK; the SS pins of the two are also connected)

5.3 | External Buses
5.3.1 | The USB
The acronym USB stands for ‘Universal Serial Bus’, and it has become a very important
interface for users. We have experienced the ease of plugging in different types of devices
to the PC through the USB port. Many USB ports are available on the cabinet of the
PC, and also on laptops, and they are ‘hot pluggable’ and ‘hot swappable’. These features
make USB very useful for users. Now the trend is to shift a lot of the peripherals to the
USB port. Thus we have printers, mice, keyboards, scanners, cameras, external hard disks
and so on, all of which are interfaced to the PC through the USB port.
Let us start with its history. The USB 1.0 specification was introduced in 1996, with
a data transfer rate of 12 Mbps. USB was created
by a core group of companies that consisted of Compaq, Digital, IBM, Intel, Northern
Telecom and Microsoft. One of the co-inventors of USB was Ajay Bhatt, who was later
given credit by Intel. The USB 2.0 specification was released in April 2000 and was
standardized at the end of 2001, with a data rate of 480 Mbps.
The USB 3.0 specification was released on 12 November 2008 by the USB 3.0
Promoter Group, and as of now in 2012, a number of USB 3.0 certified products have
been released. They include host controllers, adapter cards, motherboards and hard
drives. Its maximum transfer rate is up to 10 times that of USB 2.0. It has been
dubbed ‘SuperSpeed USB’. The USB 3.0 port will be backward compatible with the
current USB ones.
Figure 5.12 | SPI connections between a PIC MCU and an SD card (the microcontroller, as master, has SCK, MOSI, MISO and SS; the nine SD card pins are labelled DAT2/NC, DAT3/CS, CMD/DI, VSS1, VDD, CLK/SCK, VSS2, DAT0/DO and DAT1/IRQ; a 10K resistor to 3.3 V is also shown)
USB 2.0, which is in current use, defines three data speeds:
i) Low speed: 1.5 Mbps
ii) Full speed: 12 Mbps
iii) High speed: 480 Mbps
The low speed specification is meant for low speed devices like the computer mouse. The
full speed is for most other devices, and the high speed is meant to compete with the
FireWire specification. FireWire is a high speed port developed by Apple and is used in
equipment like professional digital cameras.
5.3.1.1 | Host and Devices
USB is an ‘asynchronous’ bus which defines two components: a ‘host’, which is the master,
and many ‘devices’, which are slaves. The bus is host controlled and there can be only
one host per bus.
Consider the case of a PC. It has a host controller in it, and thus is the host. Any
peripheral plugged into the USB connector of a PC is a ‘device’. Thus, memory sticks,
cameras, external hard disks, etc., that we connect to the PC are devices. Only the
host can act as the master: the host initiates transfers, and communication takes place
between the host and a device. Normally, no communication is possible between ‘devices’.
(There is, however, a newer mode named ‘On-The-Go’, in which a device can act
as a host in a limited way.)
A USB system consists of a host controller and multiple devices connected in a
tree-like fashion using special hub devices. A hub is a device that contains one or more
connectors or internal connections to USB devices along with the hardware to enable
communicating with each device. See Figure 5.13 which shows a six-tier USB system of
devices, connected through hubs at different levels.
Hubs may be cascaded, up to six levels. Up to 127 devices (including hubs) may
be connected to a single host controller. To the user, this means that up to 127 devices
can be connected to any one USB bus at any given time. The limit of 127 arises
because the address field in a packet is 7 bits long. The length of any cable used is limited
to 5 metres.
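The figure of 127 follows from simple arithmetic, as the Python sketch below shows. A 7-bit field has 128 values; the sketch assumes, in line with the USB specification, that address 0 is kept as the default address for a device that has just been attached and not yet been given its own address. The function names and the simple topology check are illustrative only.

```python
ADDRESS_BITS = 7      # width of the address field in a packet
MAX_HUB_LEVELS = 6    # cascading limit quoted in the text


def max_usb_devices(address_bits: int = ADDRESS_BITS) -> int:
    """Addresses available to devices: all values except the default address 0."""
    return 2 ** address_bits - 1


def topology_ok(device_count: int, deepest_hub_level: int) -> bool:
    """Check a planned USB tree against the device-count and hub-depth limits."""
    return device_count <= max_usb_devices() and deepest_hub_level <= MAX_HUB_LEVELS


print(max_usb_devices())
print(topology_ok(device_count=40, deepest_hub_level=3))
```

Remember that hubs count towards the total, so a deep tree uses up addresses faster than the number of end devices alone would suggest.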
5.3.1.2 | USB Cables
USB requires a shielded cable containing four wires, see Figure 5.14. The wires D+ and
D-, which carry the differential signals, form a twisted pair. Besides this, there is the
ground wire and also VBUS which carries a 5V supply used by a device for power.
5.3.1.3 | USB Connectors
The host and device have different types of connectors. Figure 5.15 shows the corresponding
receptacles: the A receptacle on a host, and the B receptacle on a device or hub.
Figure 5.13 | A six-tier USB connection (the host with its root hub at the top; hubs and devices branch out below it in a tree)
Figure 5.14 | Internal details of a USB cable (four wires: VBus, D+, D– and GND)
The mini-B plug and receptacle have also been defined as an alternative to the standard B
connector on handheld and portable devices.
The A receptacle is what we see on a PC or laptop. The A plug on a flash memory,
mouse, etc. plugs directly into the A receptacle. The B/mini B receptacle is seen on a
device like a digital camera, printer, etc. Figure 5.16 shows the connectors corresponding
to the A, B and mini B receptacles. A cable with an A plug at one end and a B plug at the
other end is used to connect a host to devices like printers, digicams, etc.; see
Figure 5.17. Table 5.1 shows the pin numbers, the pin names and the corresponding
colour codes.
Figure 5.15a | The ‘A’ receptacle (pins 1 to 4)
Figure 5.15b | The ‘B’ receptacle (pins 1 to 4)
Figure 5.16 | The three types of USB connectors corresponding to A, B and mini B receptacles
Figure 5.17 | USB cable with A and B plugs
Table 5.1 | USB Connector Pin Assignments
Pin No.   Name      Color
1         +5.0V     Red
2         Data–     White
3         Data+     Green
4         Ground    Black

5.3.1.4 | USB–Power Issues
The USB connector provides a single 5 V wire from which connected USB devices
may be powered. The bus is specified to deliver 100 mA, and up to 500 mA
if the host permits the device to be configured as a high-powered device. This is often
enough to power several devices, although this budget must be shared among all devices
downstream of an unpowered hub. When USB devices (including hubs) are first connected,
they are interrogated by the host controller, which asks each of them for its maximum
power requirement. Devices that need more than 500 mA must use an external
power source. Thus, we see printers and certain external hard disks with an external power
supply, while USB-based mice and keyboards do not need any extra power.
When there is no communication between the host and a device for at least 3 ms,
the device goes to the ‘suspend’ state, in which it draws less than 0.5 mA, until it is
brought back to operation by a ‘resume’ signal or by a reset condition.
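The power negotiation described above can be pictured as a simple budget check. The following is only a sketch under stated assumptions, not the behaviour of any real host controller: the function name grant_power, the device list and the idea of a single shared 500 mA budget are hypothetical, and only the 500 mA limit comes from the text.

```python
MAX_BUS_MA = 500   # upper limit for a bus-powered device (figure from the text)


def grant_power(requests_ma, budget_ma=MAX_BUS_MA):
    """Decide, in plug-in order, how each device can be powered.

    requests_ma: maximum current (mA) each device reports when interrogated.
    budget_ma:   current available to share (hypothetical single upstream port).
    Returns one of 'bus', 'external' or 'refused' per device.
    """
    decisions = []
    remaining = budget_ma
    for need in requests_ma:
        if need > MAX_BUS_MA:
            decisions.append("external")   # must bring its own power supply
        elif need <= remaining:
            decisions.append("bus")        # fits in what is left of the budget
            remaining -= need
        else:
            decisions.append("refused")    # budget already used up
    return decisions


# Hypothetical devices: a mouse, a flash drive, a hard disk and a camera
print(grant_power([100, 200, 900, 300]))
```

In this sketch the third device asks for more than the bus can ever supply, so it is marked as needing an external supply, while the last one is refused only because the earlier devices have already consumed most of the shared budget.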
5.3.1.5 | USB Signals
The USB protocol uses differential signals to improve noise resilience. In differential
signalling, the effective signal is the difference between the signals on the two signal
wires, so common mode signals (usually noise) get cancelled out. USB signals
are bi-phase, and use the NRZI (non-return to zero inverted) data encoding
technique. In this technique, the signal level is inverted for each logic 0 bit, and
left unchanged for each logic 1 bit. A ‘0’ bit is ‘stuffed’ after every six consecutive
ones in the data stream. This forces a transition, so that the receiver stays synchronized
when long strings of ‘1’ appear. See
Figure 5.18.
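The two rules above (stuff a 0 after six consecutive 1s, then toggle the line on every 0) can be sketched in a few lines. This is a minimal illustration: the function names and the starting line level of 1 are assumptions made for the example, not details taken from the text.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 6:
            out.append(0)   # stuffed bit guarantees a transition on the line
            run = 0
    return out


def nrzi_encode(bits, start_level=1):
    """Toggle the line level for every 0 bit; hold it for every 1 bit."""
    level, levels = start_level, []
    for b in bits:
        if b == 0:
            level ^= 1
        levels.append(level)
    return levels


data = [1, 1, 1, 1, 1, 1, 1, 0]
stuffed = bit_stuff(data)
print(stuffed)
print(nrzi_encode(stuffed))
```

Stuffing is applied before NRZI encoding, so a long run of 1s, which would otherwise leave the line level unchanged, always produces a level change after at most six bit times.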
5.3.1.6 | The USB Protocol
USB is a protocol which is entirely host initiated. Also, there is only a single logical path
at any one time, and that path is between the host and ‘one’ device. All USB peripherals
(devices) are slaves that obey a defined protocol.
At any time, either one ‘device’ or the ‘host’ transmits. When a host transmits,
and there are many devices in the system, only the device ‘addressed’ by the host can
respond. This means that each device monitors the device address sent by the host. If
the address broadcast doesn’t match the device’s address, the device simply ignores the
communication.
Figure 5.18 | Raw data and NRZI encoded data (waveforms: data, NRZI signal)
When a USB device is first connected to a USB host, the device ‘enumeration’
process is started. This is a sort of ‘interrogation’ process in which the host asks the device
to give complete information about itself. The enumeration starts by the host sending a
reset signal to the USB device. If the device is supported by the host, the device drivers
needed for communicating with the device are loaded from the operating system of the
host computer and the device is set to a configured state. If, say, the host determines
that the device is a ‘standard’ printer, the drivers for this should be available in the host
computer. These are loaded into the USB system, and then communication between the
host and peripheral can continue, and data transfer can take place.
5.3.1.7 | Data Transfer
Transfer of data is designated as ‘transactions’ and is done using packets, which are formats
built up in a pre-determined way. A packet starts with a sync pattern, the data bytes
of the packet follow, ending with an ‘end of packet’ signal. Remember that this is a serial
protocol in which data is sent one bit at a time. Here, the LSB is sent first.
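As a small illustration of ‘LSB first’, the sketch below lists the bits of one byte in the order they would go out on the wire, and rebuilds the byte at the other end. The function names are hypothetical and the sync pattern and end-of-packet signalling are left out.

```python
def bits_lsb_first(byte):
    """Return the eight bits of a byte in transmission order, LSB first."""
    return [(byte >> i) & 1 for i in range(8)]


def byte_from_bits(bits):
    """Rebuild the byte from bits received LSB first."""
    return sum(bit << i for i, bit in enumerate(bits))


sent = bits_lsb_first(0xB4)     # 0xB4 is 1011 0100 in binary
print(sent)
print(hex(byte_from_bits(sent)))
```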
There are four ways in which data transfer can be done, and the chosen way depends
on the type of data. Let’s find out what this is all about.
i) Control transfer: A control transfer is a bidirectional data transfer, and is generally
used for initial configuration of a device by the host. Control transfers enable the
host to read information about a device, set a device’s address, and select configurations
and other settings. All other transfer types are unidirectional.
ii) Bulk transfer: Bulk transfers are intended for situations where the rate of transfer
isn’t critical, such as sending a file to a printer or receiving data from a scanner. Bulk
transfers are designed to transfer large amounts of data with error-free delivery
and no guarantee of bandwidth, which means that such applications can afford to
wait if the bus is busy handling other more important requests. Typical applications
include scanners and printers.
iii) Interrupt transfer: Interrupt transfers are meant for devices that must receive the
host’s attention periodically. Low speed devices like the mouse and keyboard use
this type of transfer. No ‘interrupt’ is involved here; only the type of transfer is like
that in an interrupt-based system. Here, it is apparent that data should be transferred
without any delay.
iv) Isochronous transfer: Isochronous, in simple terms, means ‘at regular intervals’. This
is a sort of asynchronous data transmission, done at regular intervals, that is, a transmission
service allowing the sending and receiving of data in equal time increments.
Isochronous transfers have guaranteed delivery time but no error correcting facility.
This is the only type of transfer that doesn’t support automatic retransmitting of
data received with errors, so occasional errors must be acceptable. Only full- and
high-speed devices can do isochronous transfers. Audio and video streaming, that is,
real-time applications, use this method.
5.3.1.8 | Plug and Play
USB supports plug and play with dynamically loadable and unloadable drivers. This
means that the user simply needs to plug the device on the bus. The host will detect this
addition, interrogate the newly inserted device and load the appropriate driver, provided
a driver is installed for the plugged-in device. The end user need not worry about terms
such as IRQs and port addresses or rebooting the computer. Once the use is over, the
user can simply unplug the cable; the host will detect its absence and automatically
unload the driver.
The terms ‘hot pluggable’ and ‘hot swappable’ mean that devices can be plugged in
and/or swapped without switching off the power supply. For many of the buses available
earlier (inside PCs and other systems), like ISA, EISA, etc., power had to be switched
off before anything could be plugged in.
But then, why do we use the ‘Safely Remove Hardware’ utility before
removing a USB device from a PC?
This is a USB device manager. When a device is plugged into a USB port on a PC, the PC
takes charge of writing to and reading from the device. When writing, some of the data
is cached, or say buffered. What this means is that the data to be written might be kept
temporarily in the RAM of the PC. If the device is pulled out of the port just as soon
as you think that ‘writing’ is over and done with, there is a possibility that the data that
is there on the device is not actually the final and correct data. You are then likely to get a
corrupted file. But Windows automatically disables caching on USB devices, unless it
has been specifically enabled. So this device manager is there simply as an extra level of
security for USB ‘storage devices’. The device manager causes the files to close properly,
preserving pointers, and gives time for writing to complete fully.
5.3.1.9 | USB And Embedded Systems
All the time that we were discussing the USB port, we used the concept of a ‘host’ com-
puter, which most of us would have visualized as the PC. But what about embedded
boards? Most embedded boards have USB ports—as host ports and/or slave ports. See
Figure 5.19 which shows many parts of a typical embedded board—a USB port (slave) is
also seen.
Figure 5.19 | An embedded board showing a USB slave connector (labelled parts: USB port, ARM 7 processor)
In system design, a PC which has the required IDE (Integrated
Development Environment) is usually used. After cross compilation, the hex file is downloaded into
the flash of the embedded board through the USB slave port. In early applications, the
serial port was used for this. Now that the serial port is more or less obsolete, the USB
interface has become ubiquitous. Embedded boards can also have host controllers, using
which embedded applications can use the board as the host (like a PC).
5.3.1.10 | Developing USB-based Applications
If you need to develop a USB-based application, a USB controller will be a necessity.
The complexities and speed of the USB protocol are such that it is not practical to
expect a general purpose micro-controller to be able to implement it using its instruc-
tion set. Dedicated hardware is required to deal with the time-critical portions of the
specification, and in recent times, many MCU manufacturers have started integrating a
USB controller in their products. There are some PIC and ARM chips which have this
hardware built into them. Check the product range of the manufacturers of MCUs and
choose the MCU which you think will solve your problem in the best way.
5.3.2 | The Firewire Port
Firewire, also known as IEEE 1394, is a high-performance serial bus originally developed
by Apple in 1989. The baseline specification handles throughput rates of 100 Mbits/s,
200 Mbits/s and 400 Mbits/s. The IEEE 1394b specification increases this transfer rate
to 3.2 Gb/s.
Firewire is hot-pluggable which means that devices can be connected and discon-
nected while the system is powered up. In addition, devices operating at different com-
munication rates can exist on the same communications chain. It is a serial protocol
which is much more complex than USB, though the connectors of both ‘look’ similar
(see Figure 5.20).
But the difference between the USB and firewire protocols is in the speed. When
large data blocks, such as video, are to be transferred from one device to another,
firewire is the optimum solution. There are a number of upcoming applications which are
the driving forces for the firewire standard. Following is a list of prospective applications.
5.3.2.1 | Prospective Applications
i) Digital television (DTV)
ii) Multimedia CDROM (MMCD)
iii) Entertainment and video appliances
iv) Digital home networking
v) Printers for video and computer data
vi) Digital cameras and video conferencing cameras
vii) Industrial measurements
Figure 5.20 | A firewire plug-in connector
The range of possible applications is large. Most of the current PCs don’t have a firewire
port, but there are boards with firewire connectors which can be plugged in, if
there is the need for high speed data transfer. One of the applications in which a firewire
port is likely to be seen now is the professional digital camera.
5.3.2.2 | Implementation
Special ICs are needed to implement the firewire protocol. Like ethernet,
firewire is a layered transport system. The IEEE-1394 standard defines three layers:
physical, data link and transaction. The physical layer provides the signals required by the
firewire bus. The data link layer takes raw data from the physical layer and formats it into
packets. The transaction layer takes the packets from the data link layer and presents them
to the application. The remainder of the transaction functions is performed in software.
5.3.2.3 | Topology
The 1394 protocol is a peer-to-peer network with a point-to-point signalling environment.
A specific host is not required which means that data from a camera could be
directly sent to a scanner or a printer. Similarly, a digital camera could easily stream data
to a digital VCR and a DVD-RAM.
As seen in Figure 5.21, each piece of equipment on the bus is a node with several ports.
Each of the nodes can act as a repeater, receiving and re-transmitting the packets
received there.
Configuration of the bus occurs automatically whenever a new device is plugged
in. During system initialization, each node in a 1394 bus carries out a process of bus
initialization and self-identification.
Comparing USB and Firewire: USB and 1394 are complementary buses, in the sense
that they differ in their application focus:
i) USB is the preferred connection for most PC peripherals and is used for relatively
low-speed data transfer.
ii) Firewire caters to audio/visual consumer electronic devices such as digital camcord-
ers, digital VCRs, DVD players and digital televisions which are high bandwidth
applications.
Figure 5.21 | Nodes connected through firewire (nodes shown: VCR, video camera, PC, digital TV)
5.3.3 | The Standard Serial Port
For a long time, the standard serial port was available at the back of PCs and also in all
embedded boards. This port is on its way to obsolescence, but it will be a good idea to have
a look at this RS-232 compliant port. It is often called a ‘legacy’ port, but it is still used by
hardware designed to connect to the serial port, especially for computers used as servers
by companies. Laptops and Macs stopped being sold with serial ports several years before
desktops did. However, if one needs a serial port, it is possible to buy one and install it.
But that is about the serial port of a general purpose computer. What about its
relevance to embedded systems? Usually embedded software is developed on a host
computer and then downloaded to the MCU flash through the serial port of the computer.
In recent systems, the connection between the PC and the target board has been replaced
by the USB cable, and for more advanced systems, there is the JTAG cable. But
many embedded boards still have ‘serial ports’. In some cases, the PC may
not have a serial port. In that case, a conversion cable (serial to USB and vice versa)
can be used for communication between the two.
It is possible to have two-way communication between a PC and an embedded
board using the serial cable and ‘hyperterminal’ software (or similar software) available on
the PC. Refer to Section 9.1.2.
Because of all these factors, let’s have a ‘light’ discussion on RS-232, serial connec-
tors and connections.
5.3.3.1 | RS-232 Standards
This is one of the early standards for serial communications. We have talked earlier about
sending data as bits, as either ‘1’ or ‘0’. As per TTL standards, these levels correspond to
5 V and 0 V. However, during transmission even over short distances (without a modem),
say from one room to another, TTL level signals get corrupted easily by noise.
This problem does not normally occur, because we don’t send or receive at
TTL levels; instead we use a standard called RS-232C, which defines a different set of
voltages/currents. RS-232 stands for Recommended Standard Number 232, and C denotes
the revision of the standard. By this standard, before being transmitted, the TTL voltage
levels are changed to a level between −3 and −25 V for a ‘1’, and between +3 and +25 V for a ‘0’
(voltages between −3 V and +3 V are undefined). This means that before sending, the
bits should be changed to these levels and reconverted to TTL levels on receiving. There
are some standard chips available for doing this.
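The voltage bands above translate directly into a small decision rule. The sketch below is only an illustration of those bands: the function names and the 12 V driver swing are assumptions for the example, and a real level converter does this in hardware.

```python
def rs232_to_logic(volts):
    """Map a line voltage to a logic value; None marks the undefined band."""
    if -25 <= volts <= -3:
        return 1          # negative voltage represents logic '1'
    if 3 <= volts <= 25:
        return 0          # positive voltage represents logic '0'
    return None           # between -3 V and +3 V (or out of range): undefined


def logic_to_rs232(bit, swing=12.0):
    """Pick a line voltage for a bit; the 12 V swing is a hypothetical choice."""
    return -swing if bit == 1 else swing


for v in (-12.0, 9.0, 1.5):
    print(v, rs232_to_logic(v))
```

Note the inversion: a logic 1, which is the higher voltage at TTL levels, becomes the negative voltage on the RS-232 line.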
5.3.3.2 | RS-232 Level Converters
Almost all digital devices that we use require either TTL or CMOS logic levels. An IC
for converting to and from these levels to RS-232 levels is the MAX232 (there are other
ICs also for the same purpose). Figures 5.22a and b show the pinout and the connections
for this chip.
5.3.3.3 | RS-232 Connectors
The pinout of the MAX232 IC shows that we can get an RS-232 transmit signal and
an RS-232 receive signal from it. These pins of the IC are terminated on the standard
serial port. RS-232 defines a standard of 25 pins, leading to a 25 pin connector. But most
serial ports need only a subset of the RS-232 standard. Most of these pins are not needed
for normal PC communications, and indeed, most PCs are equipped with male D type
connectors having only 9 pins. Figure 5.23 shows the DB-9 connector and Table 5.2
describes the pin functions of the DB-9 connector. But again, even though the standard
connectors have 9 pins, serial communication is possible with just 3 pins: TxD, RxD
and ground. See Figure 5.24 which shows two serial ports (on two devices, say a PC and
an embedded board) connected through a three wire cable.
Figure 5.22a | The pin out of MAX232 (16-pin package; pins: C1+, C1–, C2+, C2–, Vs+, Vs–, T1In, T2In, T1Out, T2Out, R1In, R2In, R1Out, R2Out, Vcc, GND)
Figure 5.22b | Connections for using the chip (four 1 µF capacitors; inputs from CMOS or TTL drive the two RS-232 outputs, and the two RS-232 inputs drive the outputs to CMOS or TTL)
Figure 5.23 | The DB-9 connector (pins 1 to 5 in one row, pins 6 to 9 in the other)
Table 5.2 | Pin Functions of the DB-9 Connector
Pin No.   Description
1         Data carrier detect (DCD)
2         Received data (RxD)
3         Transmitted data (TxD)
4         Data terminal ready (DTR)
5         Signal ground (GND)
6         Data set ready (DSR)
7         Request to send (RTS)
8         Clear to send (CTS)
9         Ring indicator (RI)
Figure 5.24 | Serial communication with three pins (TxD, RxD and GND of Serial Port-1 wired to Serial Port-2)
Any MCU (PIC, 8051, AVR, etc.) will have two serial communication pins, RxD
and TxD, for receiving and transmitting. MCUs have an inbuilt UART (Universal
Asynchronous Receiver Transmitter), the function of which is to convert the parallel
data available in the MCU registers to serial form to put on TxD, and also to convert
the serial data received on RxD back into parallel form. The rate at which serial data
transfer occurs is determined by the ‘baud rate’ setting in the UART registers. To take
the serial data out of the board, through the serial port, they have to be converted to
RS-232 level signals, and this is done by the level converter MAX232 IC or other ICs
with similar functions.
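The parallel-to-serial conversion done by a UART can be sketched in software. The example assumes the common ‘8N1’ framing (one start bit, eight data bits sent LSB first, no parity, one stop bit), which the text does not specify; the function names are hypothetical and a real UART does all of this in hardware.

```python
def uart_frame(byte):
    """Serialise one byte as an 8N1 frame: start bit 0, data LSB first, stop bit 1."""
    data = [(byte >> i) & 1 for i in range(8)]
    return [0] + data + [1]


def uart_unframe(frame):
    """Convert a received 10-bit frame back into a byte (parallel form)."""
    if len(frame) != 10 or frame[0] != 0 or frame[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(frame[1:9]))


def frame_time_us(baud, bits_per_frame=10):
    """Time taken to send one frame, in microseconds, at the given baud rate."""
    return bits_per_frame * 1_000_000 / baud


frame = uart_frame(0x41)            # ASCII 'A'
print(frame)
print(hex(uart_unframe(frame)))
print(frame_time_us(9600))
```

With this framing, every byte costs ten bit times on the line, so the useful data rate is 80 per cent of the baud rate.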
Figure 5.25 shows the part of an embedded board which contains a MAX232 IC,
an MCU, the connections of the MAX232 IC to the serial data pins of the MCU, and
to the DB-9 male connector of the serial port.
5.3.4 | RS 422/RS 485
We have discussed the RS 232 serial interface in great detail, as this is a very popular
interface, and long distance communication is achieved by having modems at the
transmission and reception ends. Now we talk of other serial protocols which are used in
instrumentation systems and where modems are not used.
The prominent difference between these (RS 422 and RS 485) and RS 232 is that
the signals here are ‘balanced, differential’ rather than single ended. There are two floating
signal wires, V+ and V–, but no common ground. At any time, the difference in the voltages
between the two wires is sensed. Thus, any noise which is common to both the wires
is cancelled, making this ‘low noise differential signalling’. In addition, the two
wires are twisted for further noise reduction.
The noise currents induced by an external source are reversed in every twist. Instead of
reinforcing each other as in a straight wire, the reversed noise currents reduce each other’s
influence, that is, the currents get cancelled. See Figure 5.26 which illustrates this point.
Differential signals and twisting allow RS 485 to communicate over much longer
distances than achievable with RS 232. With RS 485, communication
distances of 1200 m are possible.
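The cancellation of common noise can be checked numerically. In the sketch below, each bit is driven as +1 V / −1 V on the two wires and the same noise value is added to both; the voltages, the polarity convention and the function names are assumptions chosen for illustration, not values from the RS 422 or RS 485 standards.

```python
def drive(bits, noise):
    """Drive each bit as +1 V / -1 V on the pair, adding the same noise to both wires."""
    v_plus = [(1.0 if b else -1.0) + n for b, n in zip(bits, noise)]
    v_minus = [(-1.0 if b else 1.0) + n for b, n in zip(bits, noise)]
    return v_plus, v_minus


def receive_differential(v_plus, v_minus):
    """Sense only the difference between the two wires."""
    return [1 if p - m > 0 else 0 for p, m in zip(v_plus, v_minus)]


def receive_single_ended(v_plus):
    """Sense one wire against ground, as a single-ended receiver would."""
    return [1 if p > 0 else 0 for p in v_plus]


bits = [1, 0, 1, 1, 0]
noise = [3.0, -4.0, 5.0, -1.5, 2.5]      # common-mode noise, larger than the signal
v_plus, v_minus = drive(bits, noise)
print(receive_differential(v_plus, v_minus))
print(receive_single_ended(v_plus))
```

Because the noise term appears on both wires, it drops out of the subtraction, while the single-ended reading of the same wire is corrupted whenever the noise exceeds the signal.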
5.3.4.1 | RS 485 and RS 422—The Difference
While many features of RS 485 and RS 422 are the same, there is one prominent differ-
ence. The former is a multipoint protocol, while the latter is multidrop. What do these
terms mean?
Both of them allow the direct connection of intelligent devices, without the need
of modems. But they are half duplex protocols. The RS 422 line driver can serve up to
ten receivers in parallel. Thus, one central control unit can send commands in parallel
to ten slave devices. But these slave devices cannot send information back over a shared
interface line. Thus, the network topology possible is only ‘multi-drop’.
Figure 5.25 | The MAX232, an MCU and a serial port on an embedded board (the Rx and Tx pins of the MCU connect through the MAX232, with its capacitors, to the DB9 connector)
For a multi-point network where all nodes are considered equal and every node has
send and receive capabilities over the same line, an RS 485 interface can be used. Up to
32 parallel send and receive units can exist on one communication channel. Figure 5.27
shows the line driver and receiver in one node of RS 485.
Table 5.3 compares the important features of the three protocols.
5.3.5 | Ethernet
In the current world, networking between computers is very important in view of
the fact that a very high percentage of personal and business communication is done
electronically.
Figure 5.26 | Illustrating the effect of twisting wire pairs (a straight cable and a twisted pair cable in a magnetic field, showing the induced noise currents)
Figure 5.27 | Send and receive drivers in an RS485 node (S: send driver, R: receiver)
As such, it is important to have some knowledge of the principles involved in the
networking of computers and associated technology.
5.3.5.1 | LAN, WAN and Internet
Local area networks (LAN) are computer networks located within a small geographic
area, for instance, a single building, a college campus or a business organization. The
network can be as small as just three computers or as large as one that links hundreds of
them. When such LANs located in different geographical areas are linked, it constitutes
a Wide Area Network (WAN). The linking of LANs in a WAN is accomplished by leased
lines, dial up phones, satellite links, etc. The Internet, as we know, is a system of linked
networks which has made the ‘world wide web’ a world in itself.
5.3.5.2 | What is Ethernet?
Ethernet was originally developed by Intel, Digital (now Compaq) and Xerox, and is
now an open network standard. It refers to the LAN products covered by the IEEE
802.3 standard. It is a ‘wired’ standard. Three data rates are currently defined for operation
over optical fibre and twisted-pair cables:
i) 10 Mbps: 10Base-T Ethernet
ii) 100 Mbps: Fast Ethernet
iii) 1000 Mbps: Gigabit Ethernet
A newer version named 10-Gigabit Ethernet was published in 2002 by the IEEE as the 802.3ae
supplement to the IEEE 802.3 base standard. This has slight differences from the older
standard. We will discuss the type covered by the base standard.
Ethernet has been around for quite some time now, and there have been attempts
to replace it by newer technologies. But the market for Ethernet is so strong that more
than 85 per cent of wired LAN connected PCs use it still, because of its positive features,
which are listed below:
i) It is easy to understand, implement, manage and maintain
ii) It allows low-cost network implementations
iii) It is a widely accepted industry standard, ensuring compatibility
iv) It is structured to allow compatibility with network operating systems (NOS)
v) It is very reliable
vi) It provides extensive topological flexibility for network installation
Table 5.3 | Comparison of RS 232, RS 422 and RS 485
                       RS 232           RS 422         RS 485
Signal                 Single ended     Differential   Differential
No. of drivers (max)   1                1              32
No. of receivers       1                10             32
Operation              Full duplex      Half duplex    Half duplex
Network topology       Point to point   Multidrop      Multipoint
Max distance           15 m             1200 m         1200 m
5.3.5.3 | The OSI Model
As Figure 5.28 shows, the Ethernet protocol is associated with only the data link and
physical layers of the OSI model. The physical layer transforms data into bits that are
sent across the physical media. The data link layer determines access to the network
media in terms of frames. Its sub-layer, Media Access Control (MAC), is responsible for
physical addressing. TCP/IP (Transmission Control Protocol/Internet Protocol) running on
Ethernet takes care of all the seven layers.
5.3.5.4 | Network Topology
The computers in a LAN network may be connected in the bus or star topology.
Bus
In a bus topology, all devices on the network connect to one trunk cable. This
makes it easy to install and configure, and is inexpensive. Ethernet in a bus topology
requires no special equipment to amplify or regenerate the signal. The problem with this
is that, if the trunk cable fails, all devices are affected. Figure 5.29 shows such a network.
Star
In a star topology, a separate cable connects each device with a central device called
a hub. Unlike the bus topology, if a cable fails, it affects only the one device connected
to the failed cable. Star networks are easily expanded and are easier to troubleshoot. See
Figure 5.30.
Figure 5.28 | Ethernet’s association to the OSI model (layers from top to bottom: Application, Presentation, Session, Transport, Network, Data Link, Physical; Ethernet covers the Data Link and Physical layers)
Figure 5.29 | Computers connected in a bus topology
5.3.5.5 | CSMA/CD
CSMA/CD stands for Carrier Sense Multiple Access with Collision Detection (refer to
Section 5.4.1.1). When two or more devices attempt to use the data channel simultaneously
(i.e. a collision occurs), this protocol comes into action. Carrier sense means that
a device listens to the channel and transmits only when it finds the channel idle. Collision
detection means that a sending device can ‘detect’ simultaneous transmission attempts of other
senders. When a collision occurs, the colliding transmissions become unreadable.
Each device then transmits a jam signal to alert all devices that a collision has
occurred. All devices then go
into a ‘back off’ mode and wait a random amount of time before attempting to retransmit.
This ‘random’ waiting time prevents simultaneous retransmissions.
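The ‘random amount of time’ can be made concrete with a sketch. The text does not say how the wait is chosen; the example below uses the truncated binary exponential back-off defined in IEEE 802.3, in which the range of possible waits doubles with each successive collision, stops growing after the tenth, and the frame is abandoned once 16 attempts have failed. The function name and the use of Python’s random module are choices made for this illustration.

```python
import random

SLOT_TIME_US = 51.2    # one slot is 512 bit times, i.e. 51.2 microseconds at 10 Mbps


def backoff_slots(collisions, rng=random):
    """Pick a random wait, in slot times, after the given number of collisions."""
    if collisions >= 16:
        raise RuntimeError("excessive collisions, frame abandoned")
    exponent = min(collisions, 10)          # the range stops growing after 10 collisions
    return rng.randrange(2 ** exponent)     # uniform in 0 .. 2**exponent - 1


for n in (1, 2, 3):
    slots = backoff_slots(n)
    print(f"collision {n}: wait {slots} slots = {slots * SLOT_TIME_US:.1f} us")
```

Because each station draws its own random number, two stations that collided once are unlikely to pick the same wait, and the widening range makes repeated collisions progressively less likely on a busy network.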
Multiple access means that all devices have equal access to the network, that is,
there is no priority assigned to any of the devices. Data packets can be sent at any time
by any device. All devices receive the transmission and compare the packet’s destination
address. If the destination address matches the device’s address, the device accepts the
data. If the address does not match, the device simply ignores the transmission.
5.3.5.6 | Ethernet Connector
The network is connected to the PC using an RJ-45 connector. Inside the Ethernet cable
there are eight wires, of which two are used for transmission and two for reception. The
rest are unused. See Figure 5.31.
5.4 | Automotive Buses
If you look at a modern car, the amount of electronics used in its functioning is huge.
Right from fuel injection to window glass and wipers, the control is done electronically.
‘Automotive electronics’ has become a special field in which embedded processors,
standard buses and the controllers for the innumerable peripherals in an automobile
have to work in unison. On-board electronics has contributed significantly to improved
performance, travelling in comfort, ease of manufacture and testing, and also to ‘cost
effectiveness’.
Figure 5.30 | Computers in a star connected LAN (each computer cabled to a central hub)
Looking at a car, for instance, we can divide the requirements of electronic controls
on the basis of priority (safety critical) and speed of response. Any electronic control
involving the engine, brakes, etc. should have fast response and should be given priority,
while door, wiper control, etc. can act much slower. Besides these, there is infotainment
electronics, that is, the audio and video systems, radio, satellite navigation, etc. Some of
the typical electronic modules in modern vehicles are the engine control unit (ECU), the
transmission control unit (TCU), the anti-lock braking system (ABS), body control
modules (BCM) and the infotainment unit.
The following are the ideas that should be apparent by now:
i) There are a number of embedded processors in any automobile, depending on the
sophistication of the car—the higher the price of the car, the higher will be this
number.
ii) All these processors have to communicate, and this means there are buses on which
requests, grants and signals travel, and this interconnection will tend to become a
‘network’.
iii) Depending on the requirements of a particular vehicle, it should be possible to plug
in additional modules into the systems. This means that if ‘air bag control’ is not
available, but is needed later, it should be possible to add the electronic module for
this into the ‘network’.
iv) Depending on the speed requirements of the data transfer, different kinds of buses
are used in automobiles, catering to the different applications.
There are a number of vehicle buses available and popular now, some of which are as
follows:
i) Controller area network (CAN), an inexpensive serial bus for interconnecting automotive
components.
ii) Local interconnect network (LIN), a very low cost, low speed in-vehicle sub-network
for body electronics.
iii) Media-oriented systems transport (MOST), a high-speed multimedia interface.
iv) FlexRay, a general purpose high-speed protocol with safety-critical features.
Figure 5.31 | An Ethernet connector (pin 1: Transmit +; pin 2: Transmit –; pin 3: Receive +; pin 6: Receive –; pins 4, 5, 7 and 8: reserved)
5.4.1 | Controller Area Network (CAN)
CAN is a protocol developed around the year 1984 to reduce the wiring inside vehicles,
by Bosch, a company which has pioneered many developments in the automotive
embedded market. For some years, it was in the testing and development stage, until
finally it was installed on a Mercedes in the early 1990s. After that, the bus became very
popular and is now the de facto standard for automotive buses. There are different standard
versions of CAN, as follows:
i) Low Speed CAN - 125 kbps - 11-bit identifier
ii) Standard CAN 2.0 A - 1 Mbps - 11-bit identifier
iii) Extended CAN 2.0 B - 1 Mbps - 29-bit identifier
In this section, we will take a look into the features of the CAN protocol, which is
used in many units in a vehicle. Its interesting feature is that it is an ‘interconnection
network’. In vehicles, it may be used to interconnect between the engine control unit
and the transmission control unit. A lower speed CAN is sufficient to connect the
modules of door locks, seat control, climate control, etc. (this part is called the ‘body
electronics’ of the vehicle). In one vehicle itself, there can be different CAN buses
of different data rates. There can be other kinds of buses also in a vehicle. To con-
nect between buses of different speeds and of different standards, ‘bridges’ are used.
Figure 5.32 shows three different buses with different bit rates, catering to different
sets of modules.
We start by saying that the CAN bus connects different nodes. What is a node?
Figure 5.33 shows a CAN node, in which there is an MCU, a CAN controller and
transceiver, connected to a CAN bus through line drivers. To the MCU I/O pins, sensors
and actuators are connected. Because of its popularity, many microcontroller manufacturers
have now added CAN controller units in their products, that is, inside the MCUs,
making it a reliable, efficient and low cost option. Many PIC, AVR and ARM MCUs
have CAN controllers as an integrated unit in them. CAN is now used in machine and
factory automation products as well. Figure 5.34 shows a CAN bus and a number of
CAN nodes connected to it.
5.4.1.1 | The CAN Protocol
How does CAN work?
Case 1: One Node Sends a Message CAN is a message-based protocol. One node
‘broadcasts’ a message, which means that every other node can use it. Unlike I2C,
none of the nodes have addresses. So which node receives the broadcast message? That
depends on the content of the message. The message has a field with an ‘identifier’,
which also indicates its priority. Each receiving node performs an acceptance test on
the identifier to check whether the message is relevant to it, and accepts the
message if it is. Otherwise the message is ignored. This selection process at each
station (node) is called ‘acceptance filtering’.
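Acceptance filtering can be pictured with a short sketch. The Python code below is only an illustration under our own assumptions: it uses a filter/mask pair per node (a common arrangement in CAN controllers, not something the text above specifies), and the names accepts, Node and broadcast are hypothetical.

```python
# Sketch of acceptance filtering. Assumption: each node holds (filter, mask)
# pairs; real CAN controllers do this in hardware registers.

def accepts(identifier, filter_id, mask):
    """True if identifier equals filter_id on every bit that is set in mask."""
    return (identifier & mask) == (filter_id & mask)


class Node:
    def __init__(self, name, filters):
        self.name = name
        self.filters = filters      # list of (filter_id, mask) pairs
        self.inbox = []             # messages this node decided to keep

    def on_broadcast(self, identifier, data):
        # Every node sees every frame; only relevant ones are accepted.
        if any(accepts(identifier, f, m) for f, m in self.filters):
            self.inbox.append((identifier, data))


def broadcast(nodes, identifier, data):
    """Deliver one message to all nodes; there are no node addresses."""
    for node in nodes:
        node.on_broadcast(identifier, data)


dashboard = Node("dashboard", [(0x100, 0x7F0)])   # accepts 0x100 to 0x10F
door_lock = Node("door_lock", [(0x321, 0x7FF)])   # accepts exactly 0x321

broadcast([dashboard, door_lock], 0x105, b"\x2a")
broadcast([dashboard, door_lock], 0x321, b"\x01")
print(dashboard.inbox, door_lock.inbox)
```

Note that the sender never names a receiver: the same broadcast reaches both nodes, and each keeps only what its own filters select.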

Figure 5.32 | Automotive electronic modules networked using different buses
(Three buses, Bus-1, Bus-2 and Bus-3, linked by a bridge, serve the body electronics, infotainment, and engine and transmission control modules.)
Figure 5.33 | A CAN node
(Blocks: I/O, MCU, CAN controller, CAN transceiver, CAN bus lines.)

Case 2: Many Nodes Send Messages When many nodes send messages simultaneously,
only one node can be allowed to complete a valid ‘broadcast’. The other transmitters
should retreat and try again later. Arbitration is the mechanism that handles such
bus access conflicts. It works as follows: the bits of the message identifiers
(11-bit or 29-bit) are either dominant or recessive; 0 is the dominant bit and 1 the
recessive bit. The bus logic is wired AND, so the presence of a 0 from any node
causes the bus level to be 0.
Let’s use an example to understand this. Consider three nodes transmitting at the
same time. The identifier is sent most significant bit first, and each transmitter
compares every bit it sends with the level it reads back from the bus. Suppose the
first 8 bits of the 11-bit identifiers of all three messages are 0. In the
arbitration phase, these 8 bits do not cause any decision to be made. The 9th bit of
the identifier is 1 for node 2 alone. Node 2 sends a recessive bit but reads a
dominant bit on the bus, so it loses arbitration at this point and stops
transmitting. Then comes the 10th bit, which is 1 for the message of node 3. Thus,
node 3 also loses out, and node 1 is left with its message on the bus.
Figure 5.35 illustrates this method for the example message identifiers. The
technique is a form of CSMA/CD, which stands for carrier sense multiple access with
collision detection. A node senses the bus and transmits only when it is idle; when
multiple nodes access the CAN bus at once, the collision is detected bit by bit in
the identifiers, and arbitration is done on that basis. Unlike the CSMA/CD of
Ethernet, the arbitration is non-destructive: the winning message is not corrupted
and continues without delay.
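Since a dominant 0 overrides a recessive 1, the message with the numerically lowest
identifier wins arbitration. Identifiers should therefore be assigned so that more
important messages get lower identifier values, and hence higher priorities.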
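The three-node example above can be replayed in a few lines of Python. This is a minimal sketch, assuming unique identifiers and ignoring everything in the frame except the identifier; the function name arbitrate and its return format are our own.

```python
# Sketch of bitwise arbitration on a wired-AND bus (0 = dominant, 1 = recessive).
# Assumption: all identifiers are different, so exactly one node survives.

def arbitrate(identifiers, width=11):
    """Return (winner_index, lost_at), where lost_at maps each losing node
    index to the bit position (1 = first bit sent) at which it backed off."""
    contenders = set(range(len(identifiers)))
    lost_at = {}
    for position in range(1, width + 1):          # most significant bit first
        shift = width - position
        sent = {n: (identifiers[n] >> shift) & 1 for n in contenders}
        bus = min(sent.values())                  # any 0 pulls the bus to 0
        for n, bit in sent.items():
            if bit != bus:                        # sent recessive, read dominant
                lost_at[n] = position
        contenders = {n for n in contenders if n not in lost_at}
    return min(contenders), lost_at


# Node 1, node 2 and node 3 of the example are indices 0, 1 and 2 here.
ids = [0b00000000001, 0b00000000100, 0b00000000010]
winner, lost_at = arbitrate(ids)
print("winner: node", winner + 1, "losers:", lost_at)
```

The winner is always the node with the smallest identifier value, which is why low identifier values are reserved for high-priority messages.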
5.4.1.2 | Features of CAN
Figure 5.36 shows CAN devices connected to a CAN bus, where two signal wires, CAN_L
and CAN_H (low and high), are seen. This is called differential signalling: the
effective signal is the difference between the voltages on the two wires. When common
mode signals, usually noise, appear on both wires, they cancel in the subtraction,
and this makes the CAN bus resistant to noise. CAN, like most other modern serial
protocols (USB, PCIe, etc.), uses differential signalling to reduce the effects of
noise; CAN bits are sent with NRZ (non return to zero) coding. A CAN bus is
terminated to minimize signal reflections on the bus. ISO 11898 specifies a nominal
characteristic impedance of 120 ohms for the bus line, which is why a 120 ohm
resistor is placed at each end of the bus.
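The noise immunity of differential signalling follows from simple arithmetic, as the sketch below tries to show. The voltage levels and the 0.9 V decision threshold are nominal high-speed CAN values taken here as assumptions, and the function names drive and receive are hypothetical.

```python
# Sketch: common-mode noise cancels when the receiver looks at CAN_H - CAN_L.
# Assumed nominal levels: dominant = (3.5 V, 1.5 V), recessive = (2.5 V, 2.5 V).

DOMINANT = (3.5, 1.5)
RECESSIVE = (2.5, 2.5)


def drive(bit, noise=0.0):
    """Return (CAN_H, CAN_L) for a bit, with the same noise added to both wires."""
    can_h, can_l = DOMINANT if bit == 0 else RECESSIVE
    return can_h + noise, can_l + noise


def receive(can_h, can_l, threshold=0.9):
    """Decode a bit from the voltage difference; a large difference is dominant (0)."""
    return 0 if (can_h - can_l) > threshold else 1


for noise in (-1.0, 0.0, 1.7):
    decoded = [receive(*drive(bit, noise)) for bit in (0, 1)]
    print(f"noise {noise:+.1f} V -> decoded bits {decoded}")
```

Because the same disturbance is added to both wires, the difference seen by the receiver stays near 2 V for a dominant bit and near 0 V for a recessive bit, whatever the noise value.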
Figure 5.34 | Nodes connected through a CAN bus
(Four CAN devices, numbered 1 to 4, share one bus.)

5.4.1.3 | Message Frames
CAN distinguishes four message formats: data, remote, error and overload frames.
Figure 5.37 shows the data frame. A data frame begins with the start-of-frame (SOF)
bit. It is followed by an eleven-bit identifier and the remote transmission request
(RTR) bit. The RTR bit is dominant (0) in a data frame and recessive (1) in a remote
frame, with which a node asks another node to ‘transmit’ the data for that identifier
instead of just listening. The identifier and the RTR bit form the arbitration field.
The control field consists of six bits, four of which form the data length code that
indicates how many bytes of data follow in the data field. The data field can be zero
to eight bytes. The data field is followed by the cyclic redundancy check (CRC) field,
which enables the receiver to check if the received bit sequence was corrupted. The
two-bit acknowledgment (ACK) field is used by the transmitter to receive an
acknowledgment of a valid frame from any receiver. The message frame is terminated by
a seven-bit end-of-frame (EOF). For CAN 2.0 B, the identifier is 29 bits long, instead
of the eleven bits of CAN 2.0 A.
Figure 5.35 | The ‘collision detection’ and arbitration technique
(Node 1, node 2 and node 3 each drive TxD from the start of frame through the arbitration field; node 2 loses arbitration at the 9th bit and node 3 at the 10th bit, after which the CAN bus carries the message of node 1.)
Figure 5.36 | CAN signals
(CAN devices connect to the CAN_H and CAN_L lines and to GND; a 120 Ω resistor terminates each end of the bus.)
Figure 5.37 | A CAN ‘data’ frame
SOF | Identifier | RTR | Control | Data | CRC | ACK | EOF
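To make the data frame layout concrete, the sketch below assembles the bit sequence of a standard (11-bit identifier) data frame and computes its CRC. It is a simplified illustration, not a controller implementation: bit stuffing is left out, and the helper names to_bits, crc15 and build_data_frame are our own. The generator polynomial 0x4599 is the 15-bit CRC polynomial of the CAN specification.

```python
# Sketch of a CAN 2.0 A data frame as a list of bits (0 = dominant, 1 = recessive).
# Assumption: no bit stuffing and no interframe space are modelled.

CAN_CRC15_POLY = 0x4599


def to_bits(value, width):
    """Most significant bit first."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def crc15(bits):
    """15-bit CRC, computed bit by bit with a shift register that starts at 0."""
    crc = 0
    for bit in bits:
        feedback = bit ^ ((crc >> 14) & 1)
        crc = (crc << 1) & 0x7FFF
        if feedback:
            crc ^= CAN_CRC15_POLY
    return crc


def build_data_frame(identifier, data):
    if not 0 <= identifier < 2 ** 11:
        raise ValueError("standard identifiers are 11 bits")
    if len(data) > 8:
        raise ValueError("the data field holds at most 8 bytes")
    bits = [0]                              # SOF
    bits += to_bits(identifier, 11)         # identifier
    bits += [0]                             # RTR = 0 marks a data frame
    bits += [0, 0] + to_bits(len(data), 4)  # control: IDE, r0, data length code
    for byte in data:
        bits += to_bits(byte, 8)            # data field
    bits += to_bits(crc15(bits), 15) + [1]  # CRC sequence and CRC delimiter
    bits += [1, 1]                          # ACK slot and ACK delimiter
    bits += [1] * 7                         # EOF
    return bits


frame = build_data_frame(0x123, b"\xAB\xCD")
print(len(frame), "bits:", "".join(map(str, frame)))
```

A receiver can run crc15 over everything from the SOF up to and including the received CRC sequence; a result of zero indicates that no error was detected.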
You will need to know more about the different types of frames if you are
implementing a CAN-based application. Even then, you will be using a CAN controller
IC (or a CAN controller in an MCU) which takes care of many implementation issues.
CAN has become a very popular and widely used protocol because of its simplicity,
low cost, and sophisticated error detection and handling. Its maximum bus length is
typically 40 metres at 1 Mbit/s; longer buses are possible only at lower data rates.
5.4.1.4 | CAN and The OSI Model
Many network protocols are described using the seven-layer open systems
interconnection (OSI) model. The CAN protocol defines the lowest two layers of the
OSI model, i.e., the data link layer and the physical layer. Several standardized
CAN-based higher-layer (application level) protocols exist, such as CANopen,
DeviceNet and SAE J1939. The choice among them depends on the application.
5.5 | Wireless Communications Protocols
So far we have discussed serial protocols, all of which are wired. Now let’s see some
wireless protocols, which have become very relevant today, as wireless data transfer
is the need of the hour: data and control signals must be transferred to systems to
which there is no wired connection. Standards have been defined for wireless
communications, and each protocol that we discuss now conforms to one of them.
First we discuss wireless LAN (WLAN), which is defined by the IEEE 802.11 standard.
Next we discuss wireless personal area networks, which are confined to small
distances (in comparison to WLAN) and have been standardized as IEEE 802.15.
5.5.1 | WLAN (IEEE 802.11)
This is a standard defined for Wireless LANs (WLAN). The IEEE (Institute of
Electrical and Electronics Engineers) released the 802.11 specification in June 1997.
The initial specification, known as 802.11, used the 2.4 GHz frequency band and
supported data rates of 1 and 2 Mbps. Later, many changes and additions came, and are
listed as follows:
i) 802.11b: The first widely used wireless networking technology, known as 802.11b
(more commonly called Wi-Fi), was released in 1999. It is still in use, though on
the road to obsolescence.
ii) 802.11g: In 2003, a follow-on version called 802.11g appeared, offering greater
performance (i.e. speed and range); it became a very widely deployed wireless
networking technology.
iii) 802.11n: A further improved standard called 802.11n was finalized in 2009, after
products based on draft versions of the standard had already become available.
The 802.11 protocol covers the MAC (Media Access Control) layer and the physical layer.
The original standard defines a single MAC which interacts with three types of physical layers:
i) Frequency Hopping Spread Spectrum in the 2.4 GHz band
ii) Direct Sequence Spread Spectrum in the 2.4 GHz band
iii) Infrared
An 802.11 LAN is based on a cellular architecture where the system is subdivided into
cells. The specification defines two types of operational modes: ad hoc (peer-to-peer)
mode and infrastructure mode.
5.5.1.1 | Infrastructure Mode
Let us discuss the topology of a large wireless LAN, while referring to Figure 5.38.
In the figure, a basic service set (BSS) is shown as a basic cell which has two
kinds of components: the stations and the access point.
Station This is the unit which communicates over the wireless medium. The most
likely candidate to be a station is a PC with a wireless network interface card
(NIC). It is the presence of the NIC that qualifies a PC to be a station. iPhones,
PDAs, tablets, etc. are other devices which act as stations.
Access Point/Base Station Each cell is controlled by a base station, also called an
access point (AP). Figure 5.38 shows three BSSs, each with two to three stations in
it. Note that a BSS is not defined by a geographic area, but rather by connectivity.
The AP is shown to form part of a wired system (through Ethernet). For stations in
one BSS to connect to stations in another BSS, the communication is forwarded through
their respective base stations (APs). This forms an ESS.
Figure 5.38 | A WLAN in the infrastructure mode
(Three BSSs, each with an AP and two or three stations (STA) on wireless links; the APs are joined by a wired distribution system to form an extended service set (ESS).)
5.5.1.2 | Extended Service Set (ESS)
It is a set of BSSs, where the APs communicate among themselves to forward traffic
from one BSS to another and to facilitate the movement of mobile stations.
Distribution System The backbone that interconnects a number of BSSs is called the
distribution system (DS). In the figure, the DS is shown to be a wired (Ethernet)
network but it can be a wireless system as well.
Roaming The 802.11 specification includes roaming capabilities that allow a computer
to roam among multiple access points on different channels. Thus, roaming stations
with weak signals can associate themselves with other access points with stronger
signals. A wireless station’s NIC may also decide to associate itself with another
access point within range because the load on its current access point is too high
for optimal performance.
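A roaming decision of the kind described above could look like the sketch below. The 802.11 text quoted here does not prescribe how a station chooses its access point, so the rule used (switch only when another AP is stronger by a margin) and the names choose_access_point and hysteresis_db are purely illustrative assumptions.

```python
# Sketch of a roaming decision. Assumptions: signal strength is given in dBm
# per access point, and a 5 dB margin prevents flip-flopping between two APs.

def choose_access_point(scan, current=None, hysteresis_db=5):
    """scan maps AP name -> received signal strength in dBm (less negative is stronger)."""
    best = max(scan, key=scan.get)
    if current not in scan:
        return best                      # not associated, or current AP out of range
    if scan[best] - scan[current] >= hysteresis_db:
        return best                      # another AP is clearly stronger: roam
    return current                       # stay with the current AP


print(choose_access_point({"AP1": -80, "AP2": -60}, current="AP1"))
print(choose_access_point({"AP1": -62, "AP2": -60}, current="AP1"))
```

A load-based rule, as mentioned in the text, could be added in the same way by passing the load reported by each AP alongside its signal strength.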
5.5.1.3 | CSMA/CA
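In a cell, multiple stations may try to transmit at the same time; the CSMA/CA
(carrier sense multiple access with collision avoidance) protocol is used to keep
their transmissions from colliding. This is slightly different from the CSMA/CD
mechanism used in Ethernet, where a collision is ‘detected’ and then ‘back off’ is
done. In WLAN, collision is ‘avoided’ instead, because a radio cannot reliably detect
a collision while it is transmitting. A CSMA protocol works as follows: a station
desiring to transmit senses the medium. If the medium is busy (i.e. some other
station is transmitting), the station defers its transmission to a later time. In
CSMA/CA the station additionally waits for a random back-off period once the medium
becomes free, so that stations which were all waiting are unlikely to start
transmitting at the same instant.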
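The effect of random back-off can be tried out with a toy simulation. The sketch below is a heavy simplification under our own assumptions: time is divided into slots, each station has exactly one frame to send, and the contention window sizes (8 to 256 slots) and the name simulate_csma_ca are illustrative rather than taken from the standard.

```python
# Toy simulation of contention with random back-off (not a model of 802.11 timing).
import random


def simulate_csma_ca(stations, cw_min=8, cw_max=256, seed=1):
    """Return (order in which stations got the medium, number of collisions)."""
    rng = random.Random(seed)                     # seeded so the run is repeatable
    window = {s: cw_min for s in stations}
    backoff = {s: rng.randrange(window[s]) for s in stations}
    pending = set(stations)
    order, collisions = [], 0
    while pending:
        wait = min(backoff[s] for s in pending)   # idle slots until someone hits 0
        for s in pending:
            backoff[s] -= wait                    # everyone counts down together
        ready = sorted(s for s in pending if backoff[s] == 0)
        if len(ready) == 1:                       # exactly one station transmits
            order.append(ready[0])
            pending.remove(ready[0])
        else:                                     # same slot chosen: collision
            collisions += 1
            for s in ready:
                window[s] = min(2 * window[s], cw_max)
                backoff[s] = rng.randrange(window[s])
    return order, collisions


order, collisions = simulate_csma_ca(["A", "B", "C", "D"])
print("transmission order:", order, "collisions:", collisions)
```

Stations that did not win keep their remaining count while the medium is busy, so a station that has waited long tends to get its turn soon; only stations that picked the same slot collide and draw again from a doubled window.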
5.5.1.4 | Ad hoc Mode
In the ad hoc mode, also known as independent basic service set (IBSS) or
peer-to-peer mode, all the computers equipped with an NIC can communicate with each
other via the wireless link without an access point. The ad hoc mode is convenient
for quickly setting up a wireless network in a meeting room, hotel conference centre,
or anywhere else where sufficient wired infrastructure does not exist.
5.5.1.5 | Performance
Wirelessly networked computers function best when located relatively close together
and in the ‘line of sight’ of each other. The level of performance of a WLAN depends
on a number of important environmental and product-specific factors. 802.11b and g
officially work over a distance of up to 320 feet indoors or 1300 feet outdoors, but
these figures are difficult to achieve in practice. Access points automatically
negotiate the appropriate signalling rate based upon conditions such as:
i) Distance between WLAN devices (AP and NICs)
ii) Transmission power levels
iii) Building and home materials
iv) Radio frequency interference
v) Signal propagation
vi) Antenna type and location
5.5.2 | IEEE 802.15 for WPAN
These protocols belong to a class of applications called ‘Wireless Personal Area
Networks’ (WPAN). A WPAN is defined as a network meant to span a small area such as
a private home, an office or an individual workspace. It is used to communicate
over a relatively short distance. Ad hoc networking is one of the key concepts in
WPANs. This allows devices to be part of the network temporarily; they can join and
leave at will. WPAN standards protect data through encryption, which makes the
networks resistant to unauthorized intrusions. Two important and popular protocols
are Zigbee and Bluetooth, which have similarities in their approach, but vary in
their application domain.
The features of WPAN networks can be listed as follows:
i) Short-range
ii) Low Power
iii) Low Cost
iv) Small networks
Here we will take a look at two types of WPANs: Zigbee and Bluetooth.
5.5.2.1 | Zigbee
This is a wireless network of active modules, founded on a packet-based protocol
known as IEEE 802.15.4. Zigbee-compliant products operate in unlicensed bands
worldwide, including 2.4 GHz (global), 902 to 928 MHz (America) and 868 MHz
(Europe). The transmission range is from 10 to 100 m, depending on the power output
and environmental characteristics.
Communication Technology Zigbee uses direct sequence spread spectrum in the
2.4 GHz band, with offset quadrature phase-shift keying modulation. The 868 and
900 MHz bands also use direct sequence spread spectrum, but with binary phase-shift
keying modulation.
Application Arena The name Zigbee was suggested by the alliance which standard-
ized and developed this protocol; the name was coined to compare the data movement
in the network, to the zigzag movement that honey bees use to share information, such
as the location, distance and direction of a newly discovered food source to fellow colony
members.
The application arena of Zigbee is such that the focus is on low data rate and
low power dissipation, to be useful for ‘monitoring and control’. The applications
lined up for Zigbee are industrial control, embedded sensing, medical data
collection, remote metering, smoke and intruder warning, building automation and
domotics (the word domotics means home robotics, as ‘domus’ is the Latin word for
home). The field of domotics covers the whole range of smart home technology,
including the highly sophisticated sensors and controls that automate temperature,
lighting, security systems, toys and so on. A special area of interest for Zigbee is
‘wireless sensor networks’ (Section 4.4).
To formalize and standardize the protocol, an alliance of interested groups was
formed naming itself the ‘Zigbee Alliance’. The initial eight promoter companies were
Chipcon, Ember, Freescale, Honeywell, Mitsubishi, Motorola, Philips and Samsung.
Now, things are looking up for Zigbee with a growing number of companies (over 175)
expressing their commitment to providing Zigbee compliant products and solutions.
Understanding Zigbee Zigbee involves the interconnection of nodes. Each node has a
processing unit and peripherals, some of which may be sensors and others actuators.
Besides this, nodes need to transmit and receive, so each node also has a
transceiver and an antenna.
Zigbee Nodes Zigbee is based on a master–slave configuration. Zigbee defines two
different kinds of devices which act as the nodes in the network.
i) Zigbee Full Function Device (FFD): This device can act as a coordinator or router.
The coordinators and routers have to listen to the network continuously.
ii) Zigbee Reduced Function Device (RFD): Such devices can only act as slaves. An
RFD can find a network and transfer data from its application, if necessary. RFDs
are designed for energy saving and are typically battery powered.
A network consists of FFDs and RFDs connected in a star, mesh, cluster tree or
peer-to-peer topology. Figures 5.39 and 5.40 show these topologies. The components
of the network are defined as follows:
Zigbee Co-Ordinator Only one coordinator is required for a network—this is the
node which initiates the formation of the network—new nodes can be added only at
the behest of the coordinator. But once the network is formed, the coordinator becomes
just a ‘router’. It is a full function device. In a peer-to-peer network, there is no specific
co-ordinator as all nodes are peers and have equal functionality, and all nodes use full
function devices.
Zigbee Router As the name indicates, it participates in multi-hop routing of
messages from one node to another. It should be a full function device. The number
of routers depends on the application requirements. The idea is that messages from
any point of the network must be able to reach any other point of the network, so a
series of routers is needed.
Zigbee End Device It is a reduced function device, in that it cannot route messages.
It can talk with a Zigbee coordinator (ZC) or a Zigbee router (ZR) but nothing more.
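The three roles can be captured in a small data structure. The sketch below is an illustration only: the class ZigbeeNetwork, its join and route methods, and the breadth-first search used for path finding are our own assumptions and do not reproduce the routing algorithm of the Zigbee specification.

```python
# Sketch of Zigbee roles: one coordinator (ZC), routers (ZR) and end devices (ZED).
# Rule modelled: only ZC and ZR nodes accept children and forward messages.
from collections import deque


class ZigbeeNetwork:
    def __init__(self, coordinator):
        self.role = {coordinator: "ZC"}           # the coordinator forms the network
        self.links = {coordinator: set()}

    def join(self, node, role, parent):
        if role not in ("ZR", "ZED"):
            raise ValueError("a network has only one coordinator")
        if parent not in self.role:
            raise ValueError("unknown parent")
        if self.role[parent] == "ZED":
            raise ValueError("an end device cannot accept children")
        self.role[node] = role
        self.links[node] = {parent}
        self.links[parent].add(node)

    def route(self, src, dst):
        """Shortest path from src to dst whose intermediate hops are ZC or ZR."""
        queue, seen = deque([[src]]), {src}
        while queue:
            path = queue.popleft()
            here = path[-1]
            if here == dst:
                return path
            if here != src and self.role[here] == "ZED":
                continue                          # end devices do not forward
            for nxt in sorted(self.links[here]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None


net = ZigbeeNetwork("C")
net.join("R1", "ZR", parent="C")
net.join("R2", "ZR", parent="C")
net.join("E1", "ZED", parent="R1")
net.join("E2", "ZED", parent="R2")
print(net.route("E1", "E2"))
```

A message between two end devices therefore always passes through at least one full function device, which is why routers and the coordinator must keep listening while end devices can sleep.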
Figure 5.41 shows a typical Zigbee node. There is an RF transceiver module with
an antenna and also a processor module connected to sensors/actuators. There are also
components which take care of the Zigbee protocol.
The module which contains the MCU and the Zigbee network hardware is given
the name ‘MCU module’.

Figure 5.39 | Typical Zigbee networks using star, mesh and cluster tree topologies
(Legend: Zigbee coordinator, Zigbee router (ZR), Zigbee end device (ZED).)
Figure 5.40 | Zigbee network in the peer-to-peer configuration
(Labels: cluster tree, point to point, full function device.)

5.5.2.2 | The OSI Model and Zigbee
The Zigbee standard is loosely based on the OSI 7-layer model. The IEEE 802.15.4
wireless standard defines the lower two layers (physical and MAC), the standards of
the Zigbee Alliance define the three intermediate layers (network, security and
API), and the application layer is determined by the customer’s needs.
Figure 5.41 | Block diagram of a Zigbee node
(Blocks: RF data modem, RF receiver, frequency generator, RF transmitter, transmitter base band, receiver base band, control, application, Zigbee network, 15.4 MAC, sensor/actuator driver, MCU module, power management.)
Figure 5.42 | The OSI layers, Zigbee and IEEE 802.15.4
Layer         Defined by
Application   Customer
API           Zigbee Alliance
Security      Zigbee Alliance
Network       Zigbee Alliance
MAC           IEEE 802.15.4
Physical      IEEE 802.15.4

Projects Using Zigbee From an academic point of view, projects can be done in which
remote transmission is necessary. For this, nodes with a microcontroller and
peripherals can be developed. Since Zigbee is a high level protocol, it cannot
easily be implemented using just the instructions available in the processor;
dedicated hardware and firmware plus the RF section are needed. The solution is to
buy a Zigbee module from any of the standard sources. Zigbee modules to fit your
application are available at relatively low cost from many sources.
5.5.2.3 | Bluetooth
All of us are very familiar with Bluetooth, because most of us use it very
frequently, as it is available in our laptops and mobile phones. Here we will attempt
to have a glimpse of Bluetooth as a ‘technology’ which has been standardized as the
wireless standard IEEE 802.15.1. Bluetooth is a wireless PAN similar to Zigbee but
different in the area of application. This technology also offers wireless access to
LANs, the PSTN (Public Switched Telephone Network, which refers to the wired land
line telephone network), the mobile phone network and the Internet, for a variety of
home appliances and portable handheld interfaces. It supports a range of 10 m, which
can be increased up to 100 m with the use of an amplifier.
The motivation for this technology was the necessity to avoid wires to connect
peripherals. There was (and still is) a technology named IrDA (based on infra red sig-
nals) for wireless access, but this needs the transmitter and receiver to be within ‘line of
sight’ of each other. Bluetooth does not have this limitation.
Bluetooth is a standard that allows electronic equipment to be interconnected without
wires and cables. It was developed in 1998 by the members of the Bluetooth Special
Interest Group (SIG), which consisted of the companies Ericsson, Intel, Toshiba,
Nokia and IBM. Today, more than 1000 companies have joined the SIG to work for an
open standard for the Bluetooth concept.
Communication Technology It operates in the unlicensed spectrum of 2.4 GHz. This
spectrum is shared by other types of equipment (e.g. microwave ovens). In order to
avoid interference, the Bluetooth specification employs frequency hopping spread
spectrum (FHSS) techniques. Data is transmitted at a maximum rate of 1 Mb/s.
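The idea behind frequency hopping can be sketched as follows. Basic Bluetooth uses 79 channels spaced 1 MHz apart starting at 2402 MHz; everything else in the code is an assumption made for illustration. In particular, a seeded pseudo-random generator stands in for the real hop selection, which is derived from the master’s address and clock, and the name hop_sequence is hypothetical.

```python
# Sketch of frequency hopping: devices that share a seed follow the same
# channel sequence; a different seed gives a different sequence.
import random

CHANNELS_MHZ = [2402 + k for k in range(79)]     # 2402 MHz to 2480 MHz


def hop_sequence(shared_seed, hops):
    """Return the list of carrier frequencies (MHz) for the next `hops` slots."""
    rng = random.Random(shared_seed)             # stand-in for the real hop kernel
    return [rng.choice(CHANNELS_MHZ) for _ in range(hops)]


master = hop_sequence(0xA5, 10)
slave = hop_sequence(0xA5, 10)                   # same seed: stays in step
neighbour = hop_sequence(0x3C, 10)               # another network nearby

print("master   :", master)
print("slave    :", slave)
print("neighbour:", neighbour)
clashes = sum(a == b for a, b in zip(master, neighbour))
print("slots in which the two networks share a channel:", clashes)
```

Since an interferer on one frequency, or a neighbouring network, overlaps with only a small fraction of the slots, a lost packet can simply be sent again on the next hop.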
The Bluetooth Protocol Any Bluetooth device can be a master or a slave, depending
on its application.
A master is the only one that may initiate the creation of a link. However, once a
link is established, the slave has the permission to become the master if it needs to take
charge of the network. Slaves are not allowed to talk to each other directly. All commu-
nication occurs between a slave and the master.
Any two Bluetooth devices that come within range of each other can set up an
ad hoc connection, which is called a piconet. A piconet can have a maximum of
eight active units, one master and up to seven slaves, because a three-bit address
is used for the active members. There is always a master unit in a piconet and the
rest of the units act as slaves. The unit that establishes the piconet becomes the
master unit. The master unit can change later but there can never be more than one
master. Several piconets can exist in the same area. This is called a scatternet.
Within one scatternet, all units share the same frequency range, but each piconet
uses a different hop sequence and so transmits on different hop channels. See
Figure 5.43.
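The piconet rules described above (one master, a limited number of active slaves, and no direct slave-to-slave traffic) can be expressed as a small class. This is a sketch with hypothetical names (Piconet, join, send, switch_master); it assumes that address 0 of the three-bit address is kept for broadcast, leaving addresses 1 to 7 for slaves.

```python
# Sketch of piconet membership and message paths (not the Bluetooth baseband).

class Piconet:
    def __init__(self, master):
        self.master = master        # the unit that establishes the piconet
        self.slaves = {}            # 3-bit address (1..7) -> device name

    def join(self, device):
        free = [a for a in range(1, 8) if a not in self.slaves]
        if not free:
            raise RuntimeError("piconet full: seven active slaves already")
        self.slaves[free[0]] = device
        return free[0]

    def send(self, src, dst):
        """Return the hops a message takes; slaves talk only to the master."""
        members = set(self.slaves.values()) | {self.master}
        if src not in members or dst not in members:
            raise ValueError("both devices must belong to this piconet")
        if self.master in (src, dst):
            return [src, dst]
        return [src, self.master, dst]

    def switch_master(self, new_master):
        """A slave takes over; the old master becomes a slave at its address."""
        address = next(a for a, d in self.slaves.items() if d == new_master)
        self.slaves[address] = self.master
        self.master = new_master


piconet = Piconet("M1")
addresses = [piconet.join(f"S{i}") for i in range(1, 8)]
print(addresses, piconet.send("S1", "S2"))
```

An eighth device would have to wait, or form a second piconet with its own master, which is how the scatternet of Figure 5.43 arises.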
Conclusion
With this, we come to the end of our discussion of buses and protocols. Only a few
important buses and protocols have been covered here, as a sample of the large set
available for various and varied purposes.
Buses carry signals inside a system, according to a set of well-defined rules called ‘protocols’.
Serial buses are gradually replacing parallel buses.
When multiple masters need the same bus at the same time, techniques for bus arbitration must be sought.
Two important on-board buses are the I2C and the SPI bus.
The USB is a serial bus which has become very popular.
Firewire is another serial bus similar to USB, but faster and more complex.
The standard serial port has been around for a long time, but is rapidly being replaced by the USB port.
RS 232, RS 422 and RS 485 are different serial communication standards with minor differences between them.
The Ethernet is a wired LAN protocol which has stood the test of time and fierce competition.
The CAN bus is the most popular automotive bus in use.
WLAN is defined by the IEEE 802.11 standard.
Two popular WPAN networks are Zigbee and Bluetooth.
Figure 5.43 | A Bluetooth network
(Piconet A with master M1 and piconet B with master M2, with slaves S1 to S7 distributed between them, together form a scatternet.)

QUESTIONS
1. Is the SPI a synchronous or asynchronous bus?
2. Why is UART considered an asynchronous protocol?
3. Why is cache needed for a computer system?
4. What is the purpose of having a bridge on an embedded board?
5. Make a clear distinction between on-board and off-board buses, with examples.
6. Name a few standard parallel buses you have heard of, and list their maximum transfer
speeds.
7. What is the main difference between the PCI bus and PCIe bus?
8. What is meant by ‘differential signalling’ that is used for serial transmission and what are
the advantages and disadvantages of such a signalling?
9. Compare the I2C and SPI buses in terms of their performance.
10. What is done in the enumeration phase of the USB protocol?
11. For what kind of data is isochronous data transfer used?
12. What is the arbitration technique used in CAN?
13. Why is the LIN protocol not very popular?
14. What is the use of the RTR bit in the CAN message format?
15. Why is the firewire considered to be a superior protocol?
16. In what way is RS 485 superior to RS 422?
17. What is the most significant advantage that RS 422/485 has over RS 232?
18. In what aspects are the Zigbee and bluetooth protocols different?
EXERCISE
1. Find the names of the PIC MCUs which have in them, I2C and SPI controllers. Draw a circuit
diagram connecting an LCD display and an SPI controller in a PIC.
2. Find the names of ARM and PIC MCUs which have CAN controllers inside them.
3. Find applications of CAN which are not related to automotive embedded systems.
4. Draw the setup of a Zigbee network for a home security system. Explain the sensors that
should be used, and the algorithm you will suggest for sensing and sending messages.
5. What do you know of the GSM network? Can you use a GSM module for home security
applications? How?
6. What are the applications in which you have used bluetooth? Find out at least three more
applications possible with bluetooth, but which you have not used.
7. Name at least 10 devices in which you have seen USB ports. Which of these ports are hosts
and which of them are devices?
8. Where is the Ethernet protocol used? Does it have any similarity to CAN?
9. What do you know of Firewire? Where are Firewire ports seen?
10. Have you seen SATA ports? Where?
11. What types of communication techniques are used in WLAN? Explain.
12. To what applications does IEEE 802.11n cater?

6 | Software Development Tools
Chapter-opening image: Development board of TI’s low power MSP 430 microcontroller.
The steps in developing embedded systems software
Cross compilation and cross assembly
The set of actions involved in ‘building’ a program
The methods of downloading the executable file into a non-volatile memory
Why a software simulator is very useful to a software developer
Functions of an emulator
The usefulness of a hardware simulator
Introduction
An embedded system, as we have seen, includes both hardware and software. To develop
the hardware and software in unison, a number of tools become necessary. Another
related term is ‘firmware’. When software is embedded in hardware, it becomes
‘firmware’. The hardware involved here is a non-volatile memory which is part of an
MCU or system board. Currently flash ROM is the device of choice for firmware.
In this chapter, we make a tour of the important tools used by an embedded system
developer, which take the developer step by step through writing and testing
software, and finally embedding it as ‘firmware’. We discuss software development
alone in this chapter because hardware aspects have been covered in Chapter 2.
6.1 | Embedded Program Development
Let us first try to understand the program development steps. The assumption is that
the hardware design is being done in parallel, and we are out to develop, test and
port the software onto the embedded systems board. Only after the software has been
thoroughly tested can it be burned into the flash (ROM) of the target processor.
Figure 6.1 shows a typical target board. The processor is on the board, which also
contains a few more chips and some connectors. An RJ-45 socket, a USB socket, a
serial port and other connectors are seen on this board, besides the MCU, a few
other chips and peripheral devices like an LCD, speakers, etc.
6.1.1 | The Initial Steps
Remember that the program to be run on the selected MCU cannot simply be burned
into its flash without testing it and making sure that it works as per the required
specifications. This part, developing a working and tested program, is called
‘software development’. Let us assume that a high level language such as C is used
for the programming.
The first part of development is carried out on a general purpose PC, called the
‘host’ system, which usually runs on an x86 processor. On this host system, we need
an IDE (Integrated Development Environment) to ‘build’ the program for our embedded
processor. An IDE is a tool chain which does cross compilation, linking and
simulation for the program. Finally it generates an executable hex file. (A ‘hex’
file is a transport file format for executable files or any other binary
information.) This executable file is downloaded on to the non-volatile memory,
i.e., flash, of the MCU on the board, using a serial link as shown in Figure 6.2.
Once this is done, the program is permanently available on the target board (as
firmware), and the board then becomes a ‘stand alone’ system dedicated to the
application.
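To see what such a hex file contains, the sketch below converts a few bytes into records of the Intel HEX format, one widely used hex file format. That the tool chain emits Intel HEX is an assumption on our part (other formats, such as Motorola S-records, also exist), and the function names hex_record and to_intel_hex are hypothetical. Only 16-bit addresses are handled.

```python
# Sketch of Intel HEX generation. Each record is
#   ':' + byte count + 16-bit address + record type + data + checksum
# written as hexadecimal text; the checksum makes all the bytes sum to 0 mod 256.

DATA, END_OF_FILE = 0x00, 0x01


def hex_record(address, record_type, data=b""):
    if not 0 <= address <= 0xFFFF:
        raise ValueError("this sketch handles 16-bit addresses only")
    body = bytes([len(data), address >> 8, address & 0xFF, record_type]) + bytes(data)
    checksum = (-sum(body)) & 0xFF               # two's complement of the sum
    return ":" + (body + bytes([checksum])).hex().upper()


def to_intel_hex(image, start=0, row=16):
    """Split a binary image into data records of `row` bytes plus an end record."""
    lines = [hex_record(start + offset, DATA, image[offset:offset + row])
             for offset in range(0, len(image), row)]
    lines.append(hex_record(0, END_OF_FILE))
    return lines


for line in to_intel_hex(bytes(range(20)), start=0x0100):
    print(line)
```

The downloader on the host reads these text records, verifies each checksum, and writes the data bytes to the flash addresses given in the records.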
These steps seem easy as long as we have all the ‘development tools’ for the MCU
we use, that is, the IDE. With this brief introduction, let us attempt to understand
the complete program development sequence and the role played by each ‘part’ of the
tool chain. This chapter is meant to throw light on these aspects.
Figure 6.1 | An embedded board

6.1.2 | The Integrated Development Environment
An integrated development environment (IDE) is a programming environment that has
been packaged as an application program, consisting of a code editor, a cross compiler,
a debugger, and a graphical user interface (GUI) builder. (An IDE is specific for a
particular family of MCUs. One family of MCUs can have many different IDEs offered
by many vendors.)
Table 6.1 lists some of the popular IDEs available. Some of them are open source
ones, while others are proprietary. Some IDEs are very elementary, while others are very
advanced and cater to professional requirements.
The Keil IDE has been suggested (in this book) for use by students because of
its free availability (the student version) and ease of use. It caters to assembly
and Embedded C for a wide range of MCUs including different versions of the 8051 and
ARM. Appendix B gives the step by step method of using this IDE. It will be a good
idea to gain familiarity with a typical IDE before going further. That will go a long
way in understanding the components of an IDE and the role of each component.
An IDE has at least the following:
i) Code Editor
ii) Builder
iii) Simulator
iv) GUI (Graphical User Interface)
Figure 6.2 | A HOST computer and an embedded board

6.1.3 | Code Editor
This editor allows code to be written, changed and saved as files in folders called
‘projects’. During the course of the use of the IDE, the project folder will be found
to contain all the different types of files associated with that project.
6.1.4 | GUI
Most Windows-based IDEs have a graphical user interface which makes them easy to use.
The GUI makes it possible to view all the activities carried out with the IDE. This
is especially attractive in the debug mode, when the contents of registers, memory
and peripherals have to be viewed.
6.1.5 | Compiler
A compiler is a program that translates one computer language program (called source
code) into another (object code). Usually the name ‘compiler’ is used for programs
that translate source code from a high-level programming language to a lower level
language (assembly language or machine code). For a machine to ‘run’ the object code,
the final conversion should be into the machine’s own machine code. In that case, the
conversion from a high-level language to assembly code can be considered to be an
intermediate step.
A C program running on an x86 processor (i.e., on a PC) will be converted to the
assembly language of the processor being used here, i.e., the x86. For embedded systems,
however, it is ‘cross compilation’ that is done.
What is a cross compiler?
Assume that we have chosen ‘ARM’ as the processor of our target board. We write
programs in Embedded C for the ARM processor. Then the compilation is done to
convert the C program to the assembly/machine language of ARM rather than that of
the x86 processor used by the host system. That is, the processor of the PC ‘compiles’
the program into the assembly code of ‘another’ processor. That is why it is called ‘cross’
compilation.

Table 6.1 | A List of Popular IDEs
IDE | MCUs for Which They are Used | Supplier
Keil RVDK | ARM, 8051 | Keil Inc.
Eclipse | ARM | Open source
PSoC Creator | PSoC 3 and 5 | Cypress Semiconductors
PSoC Designer | PSoC 1 | Cypress Semiconductors
IAR | MSP 430, ARM, 8051 | IAR Systems
Code Composer Studio | DSP processors, MSP 430 | Texas Instruments
MPLAB | PIC | Microchip Technology
AVR Studio | AVR MCU | Atmel Corporation
CodeWarrior | ARM | Freescale Semiconductors
Visual DSP++ | Blackfin DSP processor | Analog Devices

For any processor, different compilers are available. Compilation is not a one-to-one
process. Different compilers may translate the same HLL line into different sequences of
assembly instructions (and thus, finally, into different machine code). Essentially, this
means that some compilers are more efficient than others because they ‘compile’ the
same source to a smaller volume of assembly/machine code. Syntax errors, if any, are
reported during compilation.
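The idea of cross compilation can be seen with a very small C function. The sketch below is only an illustration: the function name target_arch is hypothetical, and the macros it tests (__arm__, __aarch64__, __x86_64__, __i386__) are the ones predefined by GCC and Clang. Other compilers, such as those from Keil or IAR, define their own macros, so the list is an assumption and not a general rule. The point is that the answer depends on the processor for which the file is compiled, not on the processor on which the compiler runs.

```c
/* Returns a label for the architecture this file was compiled FOR.
   The macros tested are predefined by GCC and Clang (an assumption of
   this sketch); other tool chains use different macro names. */
const char *target_arch(void)
{
#if defined(__arm__) || defined(__aarch64__)
    return "ARM";
#elif defined(__x86_64__) || defined(__i386__)
    return "x86";
#else
    return "unknown";
#endif
}
```

Built on an x86 PC with the native compiler, the function would be expected to report x86. Built on the same PC with an ARM cross compiler (arm-none-eabi-gcc is one commonly used example), the generated code would report ARM when run on the target, although the compiler itself ran on the x86 host.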
6.1.6 | Assembler
This is the software which converts an assembly program into machine code on a one-to-one
basis. The term ‘one to one’ means that for a specific instruction (a mnemonic with its
operands), there is one and only one machine code.
6.1.6.1 | Cross Assembler
If the program for ARM in the editor of the IDE is an assembly program, the ‘cross
assembler’ in the IDE converts it to the machine language of ARM. Thus, the assembler
software which runs on one processor (the x86 of the PC) produces machine code for
another processor, i.e., ARM. This is cross-assembly.
An assembler generates an ‘object file’ which contains the following information:
i) Machine code
ii) Re-locatable addresses of the code bytes
Typically, an assembler makes two ‘passes’ or readings over the assembly code. In
the first pass, it reads each line and records labels (symbols corresponding to addresses)
in a symbol table. In the second pass, it gets the actual addresses (displacements with
respect to a reference) of each code line and fills in the missing portions of the symbol
table. It is in this pass that the mnemonics are converted to machine code. Addresses
are said to be relative because the actual physical addresses of locations in memory,
to which these lines will be copied, will be a displacement with respect to a base
address.
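The two passes can be sketched in C for a toy instruction set. Everything in this sketch is an assumption made for illustration: the two instructions (NOP as one byte 0x00, JMP as 0x01 followed by a one-byte displacement from the start of the program) do not belong to any real processor, and the names line, symbol, pass1 and assemble are hypothetical. The sketch follows one common arrangement in which the first pass fills the symbol table completely and the second pass only looks labels up while emitting the code.

```c
#include <string.h>

#define MAX_SYMS 8

/* One source line of a hypothetical toy ISA (not a real instruction set):
   NOP -> 1 byte (0x00); JMP label -> 2 bytes (0x01, displacement). */
struct line { const char *label; const char *mnemonic; const char *operand; };

struct symbol { const char *name; int addr; };

static int size_of(const struct line *l)
{
    return strcmp(l->mnemonic, "JMP") == 0 ? 2 : 1;
}

/* Pass 1: walk the lines, advance the location counter, record each label. */
static int pass1(const struct line *src, int n, struct symbol *tab)
{
    int loc = 0, nsyms = 0;
    for (int i = 0; i < n; i++) {
        if (src[i].label && nsyms < MAX_SYMS) {
            tab[nsyms].name = src[i].label;
            tab[nsyms].addr = loc;      /* displacement from program start */
            nsyms++;
        }
        loc += size_of(&src[i]);
    }
    return nsyms;
}

static int lookup(const struct symbol *tab, int nsyms, const char *name)
{
    for (int i = 0; i < nsyms; i++)
        if (strcmp(tab[i].name, name) == 0)
            return tab[i].addr;
    return -1;
}

/* Pass 2: emit machine code, resolving label operands via the symbol table.
   Returns the number of bytes written, or -1 on an undefined label or if
   the output buffer is too small. */
int assemble(const struct line *src, int n, unsigned char *out, int cap)
{
    struct symbol tab[MAX_SYMS];
    int nsyms = pass1(src, n, tab);
    int len = 0;
    for (int i = 0; i < n; i++) {
        if (len + size_of(&src[i]) > cap)
            return -1;
        if (strcmp(src[i].mnemonic, "JMP") == 0) {
            int target = lookup(tab, nsyms, src[i].operand);
            if (target < 0)
                return -1;
            out[len++] = 0x01;
            out[len++] = (unsigned char)target;
        } else {
            out[len++] = 0x00;
        }
    }
    return len;
}
```

A program whose first line jumps forward to a label defined on a later line shows why one reading is not enough: when the JMP is first met, the address of its label is not yet known.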
6.1.7 | Builder
Once the code has been written and saved as a file in a project, the next step is ‘building
the project’. A builder includes the compiler and the assembler that we have just talked
about. There is also a linker. The project may contain a number of source files, which
may be in C or assembly, and all of them are ‘built’ together, as we will soon see.
Now let us have a look at the general steps in converting a C program file, that is,
the source file, to an executable file. Figure 6.3 outlines the steps involved. The steps show
the C source file being ‘compiled’ to an assembly file, which is then converted to machine
code using an assembler.
The process of assembly also generates other files, such as cross-reference and map files.
Describing each one of them will tend to clutter this discussion with unnecessary
details, and so is avoided. But any inquisitive reader can take a look at all these files in a
typical IDE and easily understand the need for these extra files.

After the step of assembly, we have the machine language file. At this point, ‘linking’
is done with library routines (if needed), using a program called the ‘linker’. At the
end of all this, we get a single executable file which is to be ‘loaded’ into memory for
execution by the processor.
6.1.8 | Disassembly
Disassembling is the reverse process of assembling. In the ‘disassembly window’, the
addresses of the code lines, the machine code and the mnemonics can be seen. See such
a typical window for the Keil μVision IDE (in the debug mode) in Figure 6.4.
Note the highlighted line, to understand each part of the line.
C:0x0000 is the address of this code line in the code memory
MOV R3, #0x14 is the code mnemonic and 7B 14 is the corresponding machine
code in hex.
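What a disassembler does for this one line can be sketched in C. The sketch handles only the 8051 instruction form MOV Rn,#data, whose opcode is 0x78 plus the register number, followed by one immediate byte; the function name disasm_mov_rn_imm is hypothetical, and a real disassembler would of course cover the full instruction set.

```c
#include <stdio.h>

/* Decodes the 8051 "MOV Rn,#data" instruction: opcode 0x78 + n, then one
   immediate byte. Writes the mnemonic into buf and returns the instruction
   length in bytes, or 0 if the opcode is not handled by this sketch. */
int disasm_mov_rn_imm(const unsigned char *code, char *buf, size_t buflen)
{
    unsigned char op = code[0];
    if ((op & 0xF8) != 0x78)            /* opcodes 0x78..0x7F only */
        return 0;
    snprintf(buf, buflen, "MOV R%d, #0x%02X", op & 0x07, (unsigned)code[1]);
    return 2;
}
```

Given the two bytes 7B 14 from the highlighted line, the low three bits of 0x7B select R3 and the second byte is the immediate value, which gives back the mnemonic MOV R3, #0x14.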
6.1.9 | Linker
This piece of software does the linking of many modules. There may be a number of
source modules. Note Figure 6.5 where three source files are compiled and assembled.
Figure 6.3 | Conversion steps from source file to an executable file
[Flow chart: C Program → Compiler → Assembly Language Program → Assembler → Object File (machine language module); this object file and the Object File of the library routine (machine language) → Linker → Executable File (machine language program) → Loader → Memory]

There may also be a number of library functions being used. For example, if we
started with a C program, we might be using the standard I/O or math library, which
needs to be ‘connected’ to the object files. By linking, that is, combining all these, we get
the executable file, which is usually designated as the ‘exe file’. It is this file which will be
‘loaded’ to the processor memory to get executed.
For embedded processors, there is a reset vector and it is from that address in flash
that the code gets executed (for 8051 it is the first address, i.e., 0x0). At the end of the
linking process, the arrangement for loading the code into this specific address is also
done. Now the hex file is ready to be downloaded into the non-volatile memory in the
target board.
In an IDE (take a look at the Keil IDE, for example), the processes of ‘compile,
assemble and link’ are together called ‘build’. The output from the build process is the
hex file which can be loaded into the flash of the processor or be simulated using the
simulator of the IDE. Figure 6.3 shows the executable file being loaded into memory,
using a ‘loader’.
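To give a feel of what a hex file contains, the sketch below formats one data record, assuming the Intel HEX format (which many 8051 tool chains emit; the text above does not name the format, so this is an assumption). A data record has the layout ':' LL AAAA 00 DD... CC, where LL is the byte count, AAAA the load address, 00 the record type for data, and CC a checksum equal to the two's complement of the low byte of the sum of all the preceding record bytes. The function name ihex_data_record is hypothetical.

```c
#include <stdio.h>

/* Formats one Intel HEX data record (type 00) for `len` bytes at `addr`.
   Returns the number of characters written (excluding the terminator),
   or -1 if the record is too long or the buffer too small. */
int ihex_data_record(unsigned addr, const unsigned char *data,
                     unsigned len, char *out, size_t cap)
{
    if (len > 255 || cap < 11 + 2 * (size_t)len + 1)
        return -1;
    /* Sum of count, address high byte, address low byte; type 00 adds 0. */
    unsigned sum = len + ((addr >> 8) & 0xFF) + (addr & 0xFF);
    int pos = sprintf(out, ":%02X%04X00", len, addr & 0xFFFF);
    for (unsigned i = 0; i < len; i++) {
        pos += sprintf(out + pos, "%02X", (unsigned)data[i]);
        sum += data[i];
    }
    /* Checksum: two's complement of the low byte of the sum. */
    pos += sprintf(out + pos, "%02X", (0x100 - (sum & 0xFF)) & 0xFF);
    return pos;
}
```

For the two code bytes 7B 14 placed at address 0x0000, the record comes out as :020000007B146F. The bytes 02, 00, 00, 00, 7B and 14 add up to 0x91, and 0x100 minus 0x91 gives the checksum 0x6F.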
Figure 6.4 | A typical disassembly window
Figure 6.5 | Steps in building and converting to an executable file
[Block diagram: three Source Files, each passed through a Compiler and Assembler to give an Object File; the three Object Files and the Library are combined by the Linker into one Executable File]

6.1.10 | Simulator
This is also called a software debugger. It is software which allows debugging of code
by identifying and correcting logical errors. Setting breakpoints and single-step
execution are possible here. The most important aspect of a software debugger in an embedded
system IDE is that the architecture of the selected processor is ‘simulated’. What this
means is that, if it is the 8051 MCU that has been selected, there is the facility to view
the registers, RAM and ROM locations of this MCU after each step of execution. The
available peripherals such as timers, GPIO, serial ports, etc. are also viewable. There is
the facility for simulating the external input ports and logic values. This means that if a
‘1’ is to be applied as input to a GPIO pin, it can be done and the result of
this (in memory or a register) can be observed.
With the help of a simulator, the developer can become very sure about
the logical working of the code. But the simulator does not give a measure of time,
though observation of generated waveforms is possible using the logic analyser in the
simulator.
You can get a feel of all this using the Keil Simulator, by going to the ‘debug’ mode.
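The essence of single stepping and breakpoints can be shown with a toy simulator in C. This is a minimal sketch and not how any particular IDE implements its simulator: the simulated state holds only the registers R0 to R7 and the program counter, only two 8051 opcodes are understood (0x00 for NOP and 0x78 + n for MOV Rn,#data), and the names sim, sim_step and sim_run_to are hypothetical.

```c
#include <stddef.h>

/* State of a toy simulated MCU: only R0..R7 and the program counter. */
struct sim {
    const unsigned char *code;  /* simulated code memory */
    size_t code_len;
    unsigned pc;
    unsigned char r[8];
};

/* Executes one instruction (a single step). Returns 1 on success, 0 at the
   end of the code or on an opcode this sketch does not know. */
int sim_step(struct sim *s)
{
    if (s->pc >= s->code_len)
        return 0;
    unsigned char op = s->code[s->pc];
    if (op == 0x00) {                               /* NOP */
        s->pc += 1;
        return 1;
    }
    if ((op & 0xF8) == 0x78 && s->pc + 1 < s->code_len) {
        s->r[op & 0x07] = s->code[s->pc + 1];       /* MOV Rn,#data */
        s->pc += 2;
        return 1;
    }
    return 0;
}

/* Runs until the program counter reaches `breakpoint` or stepping fails.
   Returns 1 if the breakpoint was hit. */
int sim_run_to(struct sim *s, unsigned breakpoint)
{
    while (s->pc != breakpoint)
        if (!sim_step(s))
            return 0;
    return 1;
}
```

After every call to sim_step, the fields of the structure can be inspected, which corresponds to viewing the registers after each step in the debug window. A breakpoint is simply an address at which sim_run_to stops stepping.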
6.2 | Downloading the Hex File to the Non-volatile Memory
In Figure 6.3, a loader is shown to be loading the executable file to memory. In a general
purpose system, the ‘loader’ is software which finds space in RAM to load the exe file.
In the case of embedded systems, the final step in program development is burning
the hex file into the non-volatile memory of the target board. Usually this is the flash
memory of the MCU, but it is also possible that the target board contains flash memory
external to the MCU.
Once the executable hex file is ready, it can be downloaded into the flash of the
MCU. This can be done in various ways:
i) OTP in factory environments
ii) Out of circuit programming
iii) In system programming
In the case of a factory environment where many chips have to be burned with the
same code (for a standard product, for example), there is no need for re-programmability.
All the chips are of the type referred to as ‘OTP’ or ‘One Time Programmable’.
6.2.1 | Out of Circuit Programming
The MCU can be taken and plugged into a universal programmer. Refer to Figure 6.6
for the photograph of a universal programmer (SuperPro) with a chip plugged into it.
Such a programmer is one which caters to different families of MCUs. The programmer
is connected to the host PC and the hex code is burned onto the MCU. There are a number
of such programmers available in the market, having connectors for ICs of different
pin counts and packaging.
The software associated with this is loaded into the host PC and the executable hex
file is downloaded into the MCU’s flash using a serial connection.

6.2.2 | In System Programming (ISP)
This is also called in-circuit programming. The idea here is that the MCU to be programmed
remains on the target board itself, that is, it is in the system itself. This is very
convenient and is in contrast to the out of circuit programming that we have just discussed.
Many boards are supplied with the necessary ISP hardware and software (loaded
in the host computer). For example, all MCU kits for educational purposes have it. Thus,
students write, test and finalize their program on the host computer, and get the hex
file ready. They then download it into the flash of the MCU on the kit. Erasing and
re-loading can be done many times (limited only by the capability of the flash, which
may be as high as 50,000 times).
What are the items included in ISP?
i) There is a software utility running on the host computer for controlling this programming
interface.
ii) There is an adapter for connecting between the target board and any of the standard
PC ports like the standard serial port, USB, etc. Some time back, the standard serial
port was the port of choice for downloading the exe file to the flash, but USB is
more popular now. For more advanced processors like ARM, the JTAG interface is
more common.
iii) There is a serial protocol in operation for the transfer from the host to the target.
This may be the SPI protocol (Section 5.2.2) or the JTAG protocol. This is indicated in
Figure 6.7 as the ‘programming interface’.
There is a large set of variants of ISP available, supplied by different board and chip
manufacturers.
Figure 6.6 | An MCU plugged into the socket of a universal programmer

6.2.3 | Porting an OS to an Embedded Board
When the embedded board needs an embedded OS to be burned onto it, the complexity
is more than that of just downloading an application program. A number of open source
embedded operating systems are available on the web. We must do a thorough search and
study to select the right OS for the board that we have, because some settings in the OS
are hardware specific. These web sites offer a step-by-step procedure for porting the
OS to the board. Section 19.1.13 gives the procedure for porting a Linux kernel to a
BeagleBoard (a board with a dual core processor, i.e., an ARM core and a DSP core).
Figure 6.7 | The ISP interface
[Block diagram: on the Host, Source File → Cross Development → Hex File → Programming Software → Adapter Hardware Port; the port connects through the Programming Adapter and the Programming Interface to the Flash on the Target Board]
6.2.4 | Emulator
The term ‘emulate’ means to ‘imitate with effort’. Thus, what an emulator does in this
context is to ‘imitate hardware’. The hardware of interest is usually (but not necessarily)
the MCU which is to be used in the embedded board under development. Thus, we
use emulators for hardware debugging. In the emulator box, there is hardware which
works exactly like the processor we use. Our code is run on this emulator hardware
and debugged there, rather than on the actual hardware. An emulator can emulate
the MCU it replaces in real time. The idea is similar to what we do with a simulator. The
difference is that we read the contents of registers, memory, etc. from the emulator, which
is hardware that, for us, is equivalent to the target processor being used. Once the
debugging is done (with single stepping, setting breakpoints, etc.), and the developer is
confident about the performance of the code, it can be burned onto the target processor.
6.2.5 | ICE (In-Circuit Emulator)
An ICE is an emulator included in a development setup for a target board. A generic
ICE box contains the emulator as well as the ISP for downloading the code to the target
board. Many emulators have more advanced features like performance analysis, coverage
analysis, a trace buffer and advanced trigger and breakpoint possibilities.
See Figure 6.8, which shows the ICE for the PSoC board. The ICE included in the
PSoC development scenario is described as follows.
Figure 6.8 | A PSoC 3 board with an ICE

‘The In-Circuit Emulator (ICE) consists of a base unit, USB 2.0 cable, and power
supply. The base unit is connected to the host PC via the USB port. The ICE is driven
by the Debugger subsystem of PSoC Designer IDE. This software interface allows the
user to run, halt and single step the processor. It also allows the user to set complex event
points. Event points can start and stop the trace memory on the ICE, as well as break
the program execution’.
6.3 | Hardware Simulator
In the academic environment, where students want to develop hardware for their projects
and innovative products, a hardware simulator is likely to be extremely useful. But
what exactly is this? If it is possible to test a circuit and confirm that it works as per
expectations before hard wiring and soldering it, a lot of effort can be saved. There
are some such simulators in common use, in which models of the hardware of important
MCUs are available. The required program can be loaded into such a model and tested
for the ‘correctness’ of the hardware connections and the associated software.
One such tool commonly used in the academic environment is the Proteus VSM,
a design suite which offers the idea of ‘virtual modeling’ of MCU-based circuitry
with important external peripherals like LEDs, LCDs, etc. With such a tool, testing of
the circuit can be started as soon as the schematic is ready, and once the software is
also ready, the complete setup can be tested and verified. Once this is confirmed to be
working correctly, PCB design can be started.
Figure 6.9 | A screenshot of a hardware connection simulated using Proteus VSM

Proteus VSM ‘combines mixed mode SPICE circuit simulation, animated
components and microprocessor models to facilitate co-simulation of complete
microcontroller based designs’. MCUs of families like the 8051, PIC, ARM, HC11,
etc. can be simulated and tested in terms of hardware and the programs burned into
them. Figure 6.9 is a screenshot of the schematic of PIC 18F458 with an LCD display
interfaced to it. The external connections between the MCU and the LCD unit have
been simulated, and the display program has been loaded into it. Two character strings
in two lines are shown on the display in the figure.
Conclusion
This chapter has dealt with some tools which are generally used in the development of
embedded systems. Only a few simple tools have been covered because the target audi-
ence is expected to be university students. Do keep in mind that the embedded system
industry is a multibillion dollar industry and many multinational companies develop
very advanced products. For the development of very advanced systems, many sophisticated
and elite tools are available which let the developer have a very clear insight into
the hardware and software he is in the process of developing. Such advanced tools are
beyond the scope of this chapter as those tools are tuned to the requirements of profes-
sional developers working in the industry.
Embedded systems development needs a set of software tools.
An IDE is a software tool chain used for software development, and is specific to a
particular MCU family.
An IDE consists of an editor and a builder.
The ‘building’ process includes the steps of compiling and linking.
It is ‘cross compiling’ that is normally done in the case of embedded systems.
A software simulator helps in checking the ‘correctness’ of the developed code.
Downloading the executable file into the non-volatile memory can be done outside the
circuit using a universal programmer.
In circuit programming is the best solution for firmware embedding.
An ‘emulator’ is a kind of hardware debugger.
The Proteus VSM is a very popular hardware simulator.
QUESTIONS
1. Distinguish between hardware, software and firmware.
2. What is the role of a ‘host system’ in embedded system development?
3. What is an IDE?
4. Why should an IDE be ‘specific’ for a particular MCU family?

5. Define the process of ‘assembly’.
6. What is the end product of the process of compilation?
7. Distinguish between compilation and cross compilation.
8. What is meant by the term ‘disassembly’?
9. Where does ‘linking’ come in the steps of software development?
10. What are the components of ISP?
11. Where is OTP used and why?
12. What is the use of an ICE?
EXERCISES
1. Find the names of three popular IDEs not listed in Table 6.1.
2. Assume that you have done a project involving the development of an embedded board.
Write down the names of the software tools you have used and your experiences while
doing the development.

Part II Software Design Aspects

7 Operating System Concepts
In this chapter, we will study the concepts of operating systems in general, and Chapter 8
will follow it up by attending to the special requirements of what are called ‘real-time’
operating systems. The approach here will be to discuss concepts with numerical examples
(wherever necessary), but codes/pseudo codes are presented separately, at the end of the
chapter, and not within the conceptual discussions. Concepts regarding operating systems
are very interesting, in that each one of them is related to some kind of real-world scenario
and finding analogies from real life to understand underlying principles is easy.
Why a computer system needs an operating system
The functions performed by an operating system
What is meant by a kernel
The concepts related to tasks and multitasking
Calculations related to different scheduling algorithms
The necessity and mechanisms of ‘Inter Process Communications’
The issues related to task synchronization
The difficulties arising from racing, readers–writers problem and deadlock
What is meant by the ‘bounded buffer’ problem
The concept and use of semaphores and mutexes
Priority inversion and its solutions
Methods of designing device drivers
Codes and pseudo codes for OS functions
Chapter-opening image: An 8051 based board.
Section 7.17 is written by Sai Krishna K., Design Engineer, Broadcom, Bangalore.
Introduction
We have all heard of operating systems; we talk about their features and sometimes
compare them. For example, we know that Windows 7 is available in the laptops and
PCs we buy now; some time back it was Vista, and we recollect that the earlier Windows
XP was a stable and reasonably good OS. Many of us still don’t want to change to anything
new.

We have recently been hearing about the Android OS in the context of mobile
phones and e-tablets. But there were other OSes in mobile phones earlier. For instance,
there was Symbian in Nokia phones from quite a long time back, but it is only now that
the public has become more aware of the word ‘OS’, and the features offered by different
types of OSes. Apple has its own Mac OS for PCs and iOS for its iPhones and iPads.
We will make a brief tour of the history of operating systems soon, to put the matter in
the right perspective.
What exactly is an operating system and what does it do? In the simplest terms, we
can consider it to be the manager of a system. For a PC, the OS manages the computer
system which consists of the processor, memory and I/O devices like mouse, modem,
keyboard, etc., and also the unlimited set of peripherals which are frequently plugged
in, and unplugged from it. Another very important function that the OS has, is to hide
the details of the hardware of the system from the user and his application program.
Application programs can be written without much concern about the underlying hardware.
This also means that the user does not have to know anything about the computer,
except the application program that he is interested in.
This important aspect is what makes the computer, that is, the modern PC, a very
convenient device—a child may be interested only in playing games on it, a salesman at
the billing counter of a mall needs to bother only about entering prices of articles and
printing bills, the design engineer at an engineering design centre is concerned about
only the Computer Aided Design (CAD) program he currently uses. To each of these
users, the OS hides the details of the intricate and complicated matters of the machine
on which they work. In short, the user needs to know only about how to run his
application and nothing more.
This feature is called ‘abstraction’—it means that at ‘this’ level, unnecessary details
are hidden. The level can be different for different users. For example, the supervisory
personnel at the computer centre of an institution will have to know more about setting
and cancelling passwords of users, allotting each user a certain amount of memory and
so on. Computer technicians have to go to a lower level and know greater details of the
computer hardware, the software technician will have to know about the memory map
of the system and so on. It is this feature of abstraction that makes computer usage a
pleasure for all kinds of users.
An operating system is special software (many processors also provide a level of hardware
support for its design) and is called the system software. In a computer system,
the application software runs on top of system software using its support. That is why we
frequently have to check whether an application software is compatible with a specific
OS or not. There are many programs which run on Microsoft Windows but are not compatible
with the Mac OS (of Apple PCs) and vice versa. Linux OS users have a set of
applications which run on Linux only. Such is the state of compatibility between system
software and application software.
Currently there is also a transition occurring in terms of bit length. The earliest
PCs had a data bus width of just 8 bits. Over the years that increased to 16 and 32 bits,
and now 64 bits is standard and common. The OS code also has changed to using a 64-bit
word length. Thus, for instance, if we are offered a choice of 32-bit or 64-bit Windows 7, what
do we choose? To ‘move on’ faster, it is best to use the 64-bit version, but there are a number
of application programs which don’t run on the 64-bit OS. Many cross compilers are
not supported by 64-bit OSes. We may choose a 32-bit OS if we need to do embedded
system development using low end IDEs, or may also consider having two OSes in the
same machine, or have one OS and another one emulated for specific applications alone.
These are choices that we can make for our own PC.
7.1 | Embedded Operating Systems
Till now, the concept of OS was talked about with reference to a PC. What about handheld
devices like mobile phones, PDAs, tablets, etc.? Why do they need an operating system?
The answer is that each of them is almost as complex as a PC. A mobile phone has
multiple tasks running concurrently; it has a number of I/O devices and also a network (the
wireless mobile network) to respond to. Besides that, we now run a number of applications
on our phones, right from bank software to entertainment software and games.
There are mobile phones with an embedded version of the Windows 7 OS. Currently,
Android is getting to be one of the most sought after OSes for phones and tablets.
The Android and similar kinds of OS belong to a class named ‘embedded OS’ in
contrast to the general purpose OS that computers use. The characteristics and special
features needed for embedded and real-time OSes are covered in Chapter 8. In this
chapter, we will cover the important aspects of operating systems in general, and in
the course of doing that, may point out some specific properties that embedded OSes
should place more emphasis on.
7.2 | Network Operating Systems (NOS)
Each of us might have a PC (laptop or desktop), but we usually connect it to the
Internet which is a ‘network’ of computers. Besides that, we find that institutions and
organizations have computers connected as a ‘local area network’ and users have access
to the resources locally available. Many OSes (UNIX, Mac OS, etc.) have networking
features in them, but the ones that are designated as NOSes are those which are designed
for networked systems alone. Examples are Novell Netware, Windows NT, Microsoft
Server, etc.
7.3 | Layers of an Operating System
Figure 7.1 shows the functional layers of a computer system, highlighting the ‘level’ at
which the OS resides. The application is what users work with, while ‘software system programmers’
have the responsibility of the design of the OS and software utilities. Examples of
application programs are games, web browsers, word processors, etc. The user ‘runs’ this
software for his needs; he doesn’t design the ‘software’ package, however.
The systems programmer is the person who designs utilities like compilers, linkers,
software packages, etc. for specific applications. He is a software developer who, with
more expertise, can design the kernel of the OS as well.
The hardware design is done by a computer architect and the OS is meant to shield
the user from having to know anything about the hardware and its complexities.

7.4 | History of Operating Systems
The earliest computers were quite small and simple and did not need an operating sys-
tem, as the user directly talked (or rather, interacted) with hardware. But as computers
became more complex and the range of users increased, having an interface for dealing
with users on one side, and the hardware, on the other side, became necessary. If we look
into the history of computers, we are likely to find numerous variants of OSes developed
by various individuals and companies. Here we will talk only about the important ones
which really made their presence felt. Let us make a list of such OSes catering to general
purpose computer systems.
i) MS-DOS: In 1980–81, work began on the DOS (Disk Operating System) which
later came to be associated with Microsoft and came to be named MS-DOS. It
is a command line based OS, and over the years has kept up by adding new features,
and thus newer and newer versions of DOS kept coming. It has been part of all
Windows based OSes, except in the latest 64-bit Windows.
ii) UNIX: UNIX was developed at Bell Labs around 1970 by Ken Thompson and
Dennis Ritchie, and was later rewritten in the C programming language. It became
very popular and was used in multiprogramming environments.
iii) WINDOWS: In 1990, Windows 3.0, a GUI-based OS, was launched by Microsoft.
GUI stands for ‘Graphical User Interface’. Though Apple’s PCs
already had such a GUI-based OS, it was the IBM PCs which made Windows
popular. Over the years, newer versions of Windows have kept coming up, each one
claiming to have more and improved features with respect to the previous version.
All in all, Windows is the most widely used OS in the world.
iv) LINUX: This is a very popular OS now, being used for small as well as very large
applications. It started its development in 1991, the propounder being Linus
Torvalds. Linux has similarities to Unix, but is not a version of Unix. Torvalds began
work on the project on a part-time basis while he was an undergraduate student. He
now works full-time for the Open Source Development Lab (OSDL) in Portland,
Oregon (IBM is a member of OSDL, and thus helps support his work).
From that modest beginning 15 years ago, Linux has grown to be a complete
modern operating system with a world-wide following. It is widely used on servers,
PCs and embedded systems. Linux has also become the standard platform for
developing and deploying open source software. There are embedded versions of
Linux as well as real-time versions.
Figure 7.1 | Functional layers in a computer system
[Layer diagram, top to bottom: Application, Utilities, Operating System, Hardware; associated roles: User, Systems Programmer, Computer Architect]
7.5 | Functions Performed by an OS (Components of an OS)
Now let’s make a survey of the functions that a general purpose operating system takes
charge of. Figure 7.2 is a diagram which shows the important functional blocks of an
operating system. The central function of an OS is ‘resource management’.
What are the resources associated with a computer system?
There is the CPU, memory and I/O which are ‘physical resources’. Logical resources are
files, and defined variables like semaphores, mutexes, etc. (they are discussed in forthcoming
sections).
7.5.1 | Processor Management
The most important resource in a computer is, of course, the processor. We will assume a
single processor system in our discussions here. It is the processor which does computation
and data processing, that is, it is the workhorse of the system, and all the jobs (tasks)
to be done are allocated to the processor.
Figure 7.2 | Functional block diagram of an operating system
(blocks: Scheduler, Memory Management, Inter-task Communication and Synchronisation, File System Management, Device Drivers)
What, then, is the need to ‘manage’ the processor?
The answer is that, almost always, a number of tasks are present in the system and all
of them need the processor. But only one of them can get the service of the processor
at a time. Thus, managing the processor translates to managing the tasks in the system
efficiently, such that the processor is kept busy as long as there are tasks left to be done.
In addition to this, the different tasks in the system may need to communicate with
each other, and this needs synchronization between them.
7.5.2 | Memory Management
The processor is the most important resource, no doubt, but it cannot work in isolation.
Memory is the next most important resource and there is constant interaction, that is,
communication, between memory and the processor. There are different types of memory
in a system. Semiconductor memory is the primary or main memory. There is also the
secondary memory, which is the hard disk (for a PC).
The functions of an OS with respect to memory management can be listed as
follows:
i) All programs are ‘run’ with code and data in the RAM. Thus, for a program, RAM
space has to be allocated when the program runs and de-allocated when the program
terminates. The OS has to keep track of the RAM space while this is being done.
ii) Secondary memory is, by and large, a storage space. The OS is expected to keep track
of this storage space and allocate it aptly as required.
iii) The most important aspect of memory management is the virtual memory concept.
iv) The primary and secondary memories together constitute a system of
‘virtual memory’, which is a game played by the OS to make the user (i.e. the
application) believe that he has unlimited memory space. This needs a little more
explanation.
What is virtual memory?
Some of you might have already learned this concept in ‘computer architecture’.
Virtual memory is a concept by which the application (in effect, the user) is made
to believe that it has much more memory than is physically available. In effect, there is
this idea of an ‘enlarged address space’, which is called a ‘virtual address space’. This space
maps into a physical address space, ultimately.
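The mapping from a virtual address to a physical address can be illustrated with a small Python sketch. It assumes paging, a scheme this section does not go into, with a hypothetical page size of 4 KB and a hand-made page table; the names `PAGE_SIZE`, `page_table`, `PageFault` and `translate` are illustrative only and do not belong to any real OS.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (illustrative value)

# Hypothetical page table: virtual page number -> physical frame number.
# None (or a missing entry) means the page is not in main memory right now.
page_table = {0: 5, 1: None, 2: 9}


class PageFault(Exception):
    """Signals that the page must first be brought in from secondary memory."""


def translate(virtual_address):
    """Map a virtual address to a physical address using the page table."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table.get(page)
    if frame is None:
        raise PageFault(f"virtual page {page} is not in main memory")
    return frame * PAGE_SIZE + offset


# Virtual page 2, offset 100, lands in physical frame 9 at the same offset.
physical = translate(2 * PAGE_SIZE + 100)
```

An access to virtual page 1 raises `PageFault` in this sketch, which stands for the case where the OS has to fetch the page from the hard disk before the access can proceed.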
Semiconductor memory, which is reasonably fast (RAM/ROM), is what we usually
designate as ‘primary memory’, ‘main memory’ or ‘physical memory’. The hard disk and
other memory types are designated as ‘secondary memory’. The processor directly com-
municates with the main memory only, but data movement between the secondary and
main memory is possible.
Whenever an application is to be run, the processor expects to find the associated
data and code in the main memory. If they are not found there, they must be brought
there from the hard disk. This transfer takes time, because the disk is much slower
than the RAM. This is the reason why we find our computer ‘slow’ if the RAM size is
low. However, even if the RAM is of a large capacity, our application files are not
necessarily present there at a given moment. Managing the system such that important
data and code are found in the main memory most of the time is part of the ‘cleverness’
of the operating system. Figure 7.3 shows the memory hierarchy in a computer system.
In most computers today, there is an additional level in the memory hierarchy, the
‘cache’, which is a fast SRAM memory that contains a copy of some parts of the RAM
content.
7.5.2.1 | Memory Management in an Embedded System
In an embedded system, the kinds of memory available are RAM and ROM.
Technologically, there are different kinds of RAM and ROM: RAM may be SRAM
or SDRAM; ROM may be flash memory and/or EEPROM. The memory management
involved in embedded systems is less complex than in general purpose PCs, but there are
some aspects to be managed when the system becomes complex, and dynamic memory
allocation techniques may become necessary. But application programs for simple embedded
systems (like those for 8-bit microcontrollers) almost always use static memory allocation.
In such cases, there is no need for special techniques of memory management.
7.5.3 | IO Management
IO is the next important resource and is the one which allows the user to communicate
with the computer. The number of I/O devices in common usage is increasing day by day
as newer types and brands become available. Managing all this is quite cumbersome and
requires skill. Figure 7.4 shows some I/O devices that are commonly used. It is obvious
that there is no similarity between any of them, that is, the keyboard, mouse and video
monitor. All work on different principles and protocols, and the voltage and current
levels are different.
How does the OS get to manage all of them? The method is to have a device driver
for each device. This is a set of routines written for the specific device. These routines take
care of the hardware setup and protocols associated with that device. The OS includes
device drivers for most of the commonly used devices like printers, keyboard, mouse,
video card, sound card, etc. The device drivers may be different for different brands of
these I/O devices. If a new device is added to the system, its device driver is also loaded,
and when (if) the device is removed, the driver is unloaded. We will see more about this
in Section 7.16.
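The idea of one driver per device, loaded and unloaded as devices come and go, can be sketched as follows. This is a toy model in Python and not the driver interface of any real OS; the classes `Driver`, `KeyboardDriver`, `MouseDriver` and `DriverTable` and their methods are hypothetical names chosen for the illustration.

```python
class Driver:
    """Common interface assumed for every device driver in this sketch."""

    def setup(self):
        raise NotImplementedError

    def read(self):
        raise NotImplementedError


class KeyboardDriver(Driver):
    def setup(self):
        return "keyboard: scan-code interface initialised"

    def read(self):
        return "key press"


class MouseDriver(Driver):
    def setup(self):
        return "mouse: motion reporting enabled"

    def read(self):
        return "pointer movement"


class DriverTable:
    """Keeps track of which driver serves which device."""

    def __init__(self):
        self._drivers = {}

    def load(self, device_name, driver):
        driver.setup()                      # device-specific hardware setup
        self._drivers[device_name] = driver

    def unload(self, device_name):
        self._drivers.pop(device_name, None)

    def read(self, device_name):
        # The caller uses one call for all devices; the driver hides the details.
        return self._drivers[device_name].read()

    def loaded(self):
        return sorted(self._drivers)


table = DriverTable()
table.load("keyboard", KeyboardDriver())
table.load("mouse", MouseDriver())
table.unload("mouse")                       # the mouse is removed from the system
```

The point of the design is that the rest of the system calls `table.read(...)` in the same way for every device, while the device-specific work stays inside each driver.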
Figure 7.3 | Memory hierarchy in a computer system
(levels from top to bottom: Cache, Primary Memory, Secondary Memory; access time is less towards the top, capacity is more towards the bottom)
7.5.4 | File Management
All of us are used to storing data in what are called files. Files are continuously created,
deleted, modified, saved, opened, closed and stored in directories or folders. When a file
in a particular folder has to be opened, it has to be located, and to get this done efficiently,
file storage should be well structured. The file management system has to control access
to data in files; certain files are to be password protected, some are ‘read only’ and certain
files are to be accessible only to specific programs. The file manager must
attach a set of attributes to each file to control the access to it.
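How a set of attributes can control access to a file is sketched below in Python. The three attributes are the ones named in the paragraph above (read only, password protected, restricted to specific programs); the class `FileAttributes`, the function `may_access` and the sample values are hypothetical and do not reflect the attribute scheme of any particular file system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileAttributes:
    read_only: bool = False
    password: Optional[str] = None            # None means no password is needed
    allowed_programs: frozenset = frozenset()  # empty set means any program


def may_access(attrs, program, mode, password=None):
    """Return True if `program` may open the file in `mode` ('r' or 'w')."""
    if mode == "w" and attrs.read_only:
        return False
    if attrs.password is not None and password != attrs.password:
        return False
    if attrs.allowed_programs and program not in attrs.allowed_programs:
        return False
    return True


# Two sample files (made-up names and values).
report = FileAttributes(read_only=True)
payroll = FileAttributes(password="s3cret",
                         allowed_programs=frozenset({"payroll_app"}))
```

For example, `may_access(report, "editor", "w")` is refused because the file is read only, while `may_access(payroll, "payroll_app", "r", password="s3cret")` is allowed.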
See Figure 7.5, which shows a file structure for a typical system. There is a root directory
and directories within it, called subdirectories, for different applications like
games, CAD, documents, etc. The word ‘directory’ is the term used in MS-DOS; Windows
calls them ‘folders’ and ‘subfolders’.
7.5.5 | Multiprogramming
This term refers to the fact that multiple users use the same CPU. One case is that
there is only one CPU, but multiple terminals on which multiple users work at the same
time. In this case, each user is made to believe that it is his program that is running all
the time, though actually all programs are running on a time shared basis.
Another case is when a user of the PC opens multiple windows and executes many different
programs simultaneously. Here again, time sharing occurs but the user is not aware of
this. The actions taken by the OS to create a time sharing environment should be transparent
to the user. There is a task scheduler and dispatcher to take care of such matters.
7.5.6 | Protection and Security
The terms ‘protection’ and ‘security’ refer to two different aspects that the OS has to take
care of.
Figure 7.4 | A keyboard, video monitor and mouse
Computers may be standalone or networked. In either case, multiple users and
multiple programs use them, and there is the additional aspect that data and code from
the same pool are shared across users. When this is being done, it is important to define
and also maintain the rights of each of these users with respect to the shared information,
and thus prevent unauthorized access. This aspect is called ‘protection’ and the OS
has to implement this feature. This relates to protection between users. Another aspect of
protection is that application level programs are not allowed to access system software,
device drivers, etc. This aspect is for protecting the OS software from being tampered
with by users.
‘Security’ is something else. It is the capability of the system to keep itself safe
from external attacks by viruses and malicious software, and from intrusion by external
agencies (who are not in the user set). A lot of security features may be incorporated
in the OS itself, but for extra security, additional software like anti-virus and
anti-spyware programs may be needed. It is obvious that computers dealing with defence
information need more security incorporated into their operating systems than those used
at home.
Figure 7.5 | A typical file system
(directory tree with the nodes Root, Docs, Apps, DOS, CAD and Games)
7.5.7 | Network Management
Not all computers are connected to a LAN or WAN. But in the current world, every
computer should be able to access the Internet and thus be part of the World Wide Web.
Thus, ‘network management’ is part of the functionality of any operating system.
7.6 | Some Terms Associated with Operating Systems
and Computer Usage
7.6.1 | Low Level Software Utility
This is a term very frequently used. What is its meaning and relevance?
A technical dictionary refers to ‘software utility’ as ‘a program that performs a specific
task related to the management of computer functions, resources, or files, such as password
protection, memory management, virus protection, and file compression’. We have
all used such software. Utilities do not come as part of the OS, but they can enhance
computer usage by providing important services to users. Programs like
disk cleanup, screen savers, disk partitioners, etc. are considered to be utilities which a
user can use, provided he has a reasonably good knowledge of computers. Utilities are
different from application programs like document, spreadsheet and database programs.
Such programs are also developed by system developers. Refer to Figure 7.1.
7.6.2 | Boot Loader
A bootloader is a piece of software that is responsible for loading the OS code into main
memory. Once the Power On Self Test (POST) and other hardware initialization steps are done
by the BIOS (Basic Input Output System), the system is ready to run software. For the
OS itself to be run, it must be copied from secondary memory to primary memory. The
bootloader is responsible for this. Once the OS kernel is loaded, the processor can start
executing it. The kernel code will then continue to load and initialize other parts of the
OS. A simple bootloader may consist of only a few assembly instructions that copy OS
code from disk to RAM. More complex bootloaders can allow the user to select a particular
OS (on a system with multiple OSes) or pass options to the kernel.
Examples are GRUB (Grand Unified Bootloader) and LILO (Linux Loader).
Bootloaders are highly dependent on the hardware platform.
For embedded processors, a bootloader does the same thing, but here it loads application
code (perhaps statically linked to an RTOS) into the processor’s execution space
(usually flash). Bootloaders usually allow the code to be transferred via one of several
interfaces like the serial port or JTAG port.
7.6.3 | User Interface
Users of a computer like to have an easy and ‘user friendly’ interface to it, implying that
using a computer should be a pleasurable experience.
What is meant by the term ‘user interface’?
It is the way the computer ‘appears’ to a user. It is through this interface that requests are
sent to the operating system by the user for access to input devices and output devices.
Thus, the results of mouse clicks and key presses, which come from input devices, are seen on
the video monitor, which is an output device. This is the case when the user interface is
a GUI, that is, a Graphical User Interface, as we have in Windows type OSes. There is
another user interface, the command line interface (CLI), which is the interface
for MS-DOS and is sometimes used for Linux and Unix. In a CLI, we type commands for
the operations we need to perform. This may seem to be cumbersome, but has its plus points
once the commands are mastered.
7.6.4 | Application Programming Interface (API)
An API is a set of functions, routines or protocols to simplify the process of building
software applications. When a developer needs to write code, he doesn’t have to start
from first principles or the very basics to build an application. He can put together a
set of routines from the API and get his application working. The API is the func-
tional interface over which a programmer writing an application program in a high
level language can get the service (data or functions) of the operating system or another
application.
For example, in Chapter 12 on PSoC, for each user module, APIs are provided to
help the programmer. For using the PWM module, for example (Section 12.7.3), the
function PWM8_1_Start() is used. The internal details of this function need not be
known to the programmer. Thus, an API provides a level of abstraction for the programmer
and acts as an interface.
7.6.5 | POSIX
The acronym POSIX stands for ‘Portable Operating System Interface’. This is a standard
specified by IEEE to define the API and the shell and utilities interfaces, to maintain portability
of application programs across operating systems. It was started with the aim
of developing an interface for UNIX compatibility, but the standard has now developed
beyond this simple aim. By designing their programs to conform to POSIX, developers
have some assurance that their software can be easily ported to POSIX-compliant
operating systems. Complying with this standard is very important for embedded applications,
because of the large variety of applications and the many variants of operating systems
available in the field.
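To give a feel for what a POSIX-style API looks like to a programmer, the sketch below uses Python’s standard `os` module, whose functions `os.write`, `os.lseek`, `os.read`, `os.close` and `os.unlink` are thin wrappers named after the corresponding POSIX calls. It is only an illustration and not an example from this book; the function name `posix_round_trip` is made up, and `tempfile.mkstemp` is used simply to obtain an open file descriptor.

```python
import os
import tempfile


def posix_round_trip(message: bytes) -> bytes:
    """Write bytes to a temporary file and read them back via file descriptors."""
    fd, path = tempfile.mkstemp()            # temporary file, opened for read/write
    try:
        os.write(fd, message)                # counterpart of POSIX write()
        os.lseek(fd, 0, os.SEEK_SET)         # counterpart of lseek(): back to the start
        return os.read(fd, len(message))     # counterpart of read()
    finally:
        os.close(fd)                         # counterpart of close()
        os.unlink(path)                      # counterpart of unlink(): delete the file


echo = posix_round_trip(b"hello, POSIX")
```

The same sequence of calls is written identically whichever operating system is underneath, which is the kind of portability the standard aims at.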
7.7 | The Kernel
Figure 7.6 shows the ‘onion skin’ diagram of the operating system. The centre portion is
the kernel. The kernel is the innermost part or core of an operating system, and it is the part that
provides the basic services for all other parts of the system. The kernel is sometimes
referred to as the supervisor, core or internals of the operating system. Outside the kernel
is the less important part of the system software (Level-1), and the software utilities
(Level-2). Outside all this are the application programs (Level-3).
The kernel of an OS performs the following services (most of which we discuss in the
following section), though the last two items need not mandatorily be a part of the kernel.
• Interrupt handling
• Task creation and scheduling
• Inter process communication
• Support of I/O devices
• Memory allocation and deallocation
• File system management
• Network management
The kernel is the first part of the operating system to load into memory (RAM)
during system startup. It remains there for the entire duration of the session because
its services are required continuously. Less important system programs and utilities lie
outside the kernel.
Because of its critical nature, the kernel has a privileged status as compared to normal
user applications. It has a protected memory space and full access to hardware. This
status and its memory space are collectively referred to as kernel-space. User applications,
on the other hand, execute in user-space. These user applications do not have the
privileges of kernel code, but can use kernel services through system calls. This relationship
between an application program and the kernel through the system call interface is what
prompts us to say that applications run on the OS. Figure 7.7 shows the user and kernel
spaces. The system call interface enables applications to use kernel services. I/O is
managed by device drivers on the other side.
Figure 7.7 shows that the kernel interacts with the hardware on the one side, and
the applications on the other.
With such a definition for the kernel, a number of questions need to be answered.
How does the kernel interact with the hardware?
Kernels usually implement some level of hardware abstraction for the underlying hardware.
Thus for I/O, device drivers access the hardware, and for memory and the processor
there is another set of kernel functions.
Figure 7.6 | ‘Onion skin’ diagram of the operating system
(rings from the outside in: Level-3, Level-2, Level-1, Kernel)
How specific is a kernel?
The kernel of an OS is very specific, that is, the kernels of Windows 98 and Windows XP
are different and no migration is possible from one to the other.
Are kernels specific to the hardware they are loaded into?
Kernels are designed with a level of abstraction that hides the details of the hardware
below them and so permits them to be loaded into different kinds of hardware. But there are
specific kernel options that have to be configured when the OS is installed. This
‘configuration’ is related to the hardware onto which it is being installed. Changing major
hardware components, such as the motherboard, processor or memory, often requires a
kernel update. Such an aspect of the operating system may not be noticeable for a user
who uses PCs with standard OSes, but when trying to load OSes into embedded boards,
it is important to know the type of board being booted with the OS, and the configuration
settings for it.
7.7.1 | Types of Kernels
There are two schools of thought with respect to kernel design.
7.7.1.1 | Monolithic kernel
The word ‘monolithic’ means ‘consisting of one piece’. A monolithic kernel implies that
the kernel is one single entity and is loaded as a whole in the kernel space. Such a system
has the problem that a failure in any part of the kernel can cause the whole system to
crash. The kernels of Solaris, Unix and Linux are monolithic kernels.
Figure 7.7 | Kernel and user spaces and I/O management for an OS
(user-space: Application 1 to Application 5; kernel-space: System Call Interface, Kernel Subsystems, Device Drivers; Hardware below)
7.7.1.2 | Microkernels
In this, some processes of the kernel (i.e. the less important parts) are separated out and
designated as ‘servers’. The relatively more important of these servers run in kernel space,
while the others run in user space. Because of this division, there is the need for interprocess
communication (IPC, Section 7.12) between these servers (i.e. processes), which
can bring down the efficiency of the kernel. But the advantage is that the modularity
prevents a ‘total crash’ if problems occur in some of the servers. Minix and BeOS are
examples of microkernels.
Now that we have had a general introduction to the idea of operating systems as a
whole, it is time to dissect it and learn the exact mechanism by which each of its functionalities
is realized.
7.8 | Tasks/Processes
The activity of a computer is to ‘run/execute’ programs. A program in execution is formally
defined as a process. Terms like ‘task’ and ‘job’ are also used to denote a process. In this
chapter, we use the words tasks and processes interchangeably. In a system, if there are
multiple programs that need execution at the same time, we call it a multiprogramming
environment and the system is a multitasking system.
A task is defined in terms of the program counter which points to its code memory,
the working registers (general purpose and status), stack and stack pointer. Another task
will use a different area of code memory and a different stack, but the same working registers
(which are limited in number for any processor). We assume a single processor system,
i.e., a system in which there is only one CPU to perform all the computations needed.
A process/task can be in any one of the following five states and there will be con-
tinual transition between the states, because a process is a dynamic item. The five states
are named as new, ready, running, blocked and exit. Let’s see what each state means.
i) New: A process has been created but it hasn’t yet got admission into the ‘ready’
queue of executable processes.
ii) Ready: Here, the process has all the resources (like I/O) needed to execute, except
the availability of the CPU for which it has to wait.
iii) Running: This is the state of the process that is currently being executed using the CPU.
iv) Blocked: In this state, the process cannot execute until a specified event such as an
I/O completion occurs.
v) Exit: The process has terminated because its execution is complete, or because an
error has occurred.
Figure 7.8 shows the state transition diagram, which may also be called the pro-
cess life cycle, as it shows the possible states a process may traverse during its life time.
A process when created is in the ‘new’ state. If it then gets all the necessary resources to
run, it enters the ‘ready’ state. There may be many other processes in the ready state, which
make up a ‘ready queue’. Processes in this queue wait for their chance to get the CPU. But
only one of them gets the CPU at a particular time. Which one of them (among the
waiting tasks) gets the chance to execute, depends on the scheduling algorithm. The
arrow marked ‘schedule’ shows a specific task getting its chance to use the CPU.
Figure 7.8 shows the transitions occurring between the ready, running and blocked
states. Let’s see the possibilities for the transitions to occur.
i) A process can exit from the running state when it finishes its stipulated use of the
CPU.
ii) A process can be pushed back into the ready queue. This is called ‘pre-emption’, and
the reason for it is that some other process is to be given the use
of the CPU (depending on the scheduling algorithm).
iii) A process can be pushed into the ‘blocked’ state because it needs some additional
resources, or an event from I/O, like a keyboard input. The arrow from the
running to the blocked state shows this case.
iv) While in the blocked state, if the awaited I/O event occurs, the process cannot go back to the
‘running’ state directly; rather, it waits in the ready queue until it gets its next chance
to use the CPU.
v) Finally (many transitions can occur again), it goes to the exit state on successful
completion or abnormal termination (due to some unforeseen error).
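The five states and the transitions listed above can be captured in a small table-driven state machine. The Python sketch below is only a model of the life cycle, not kernel code; the class `Process`, the table `TRANSITIONS` and the label ‘admit’ for the move from new to ready are names assumed for the illustration.

```python
# Allowed transitions, read off the five states and the transition rules above.
TRANSITIONS = {
    ("new", "ready"): "admit",
    ("ready", "running"): "schedule",
    ("running", "ready"): "pre-empt",
    ("running", "blocked"): "I/O or event wait",
    ("blocked", "ready"): "I/O or event occurs",
    ("running", "exit"): "finish",
}


class Process:
    def __init__(self, pid):
        self.pid = pid
        self.state = "new"
        self.history = ["new"]

    def move_to(self, new_state):
        """Change state only if the transition is one of the allowed ones."""
        if (self.state, new_state) not in TRANSITIONS:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)


# One possible life: the process blocks once on I/O before finishing.
p = Process(pid=1)
for state in ["ready", "running", "blocked", "ready", "running", "exit"]:
    p.move_to(state)
```

Note that there is no entry for blocked to running in the table, so a blocked process has to pass through the ready queue, exactly as item iv) states.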
The OS is responsible for providing the support needed for creating a process and
taking it to its termination. If, for instance, a user task is taken up, say, a user opens a
program window and wants to execute it, the OS prepares the task structure (called a
task/process control block) for it, verifies the availability of resources and puts it in the
ready queue.
Figure 7.8 | The state transition diagram of a process
(states: New, Ready, Running, Blocked, Exit; transitions: Schedule, Pre-empt, I/O Event Wait, I/O Event Occurs, Finish)
7.8.1 | Task (Process) Control Block
This is a table that contains all the information about the task, and is prepared by the OS
and placed in memory. It contains the following:
i) The unique ID of the task
ii) The state of the task
iii) Scheduling information with respect to it
iv) The registers it uses and their contents
v) The I/O resources it has been allocated
vi) Pointers to the memory it uses
vii) Priority of the task
viii) Other information pertaining to the task
Thus for each task, there is a Task/Process control block (TCB/PCB). If there are
n tasks, there are n TCBs in memory. Figure 7.9 shows the TCBs for n tasks.
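As a data structure, a TCB is simply a record with one field per item in the list above. The Python sketch below shows one possible layout; the field names, types and default values are assumptions made for illustration and do not come from any particular OS.

```python
from dataclasses import dataclass, field


@dataclass
class TaskControlBlock:
    task_id: int                                           # unique ID of the task
    state: str = "new"                                     # state of the task
    priority: int = 0                                      # priority of the task
    scheduling_info: dict = field(default_factory=dict)    # e.g. time used so far
    program_counter: int = 0                               # saved PC
    stack_pointer: int = 0                                 # saved SP
    registers: dict = field(default_factory=dict)          # copy of the GPRs
    io_resources: list = field(default_factory=list)       # allocated I/O resources
    memory_pointers: dict = field(default_factory=dict)    # code, data, stack areas
    other: dict = field(default_factory=dict)              # any other information


# n tasks give n TCBs in memory; here n = 4, keyed by task ID.
tcb_table = {tid: TaskControlBlock(task_id=tid) for tid in range(1, 5)}
tcb_table[2].state = "ready"
```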
7.8.2 | Multitasking
We have assumed the case of ‘a single CPU and multiple tasks’ awaiting service from
this CPU. Thus, it is clear that one task should not monopolize the use of the CPU, that
is, the CPU must be shared between tasks. This is ‘time multiplexing’. This is meant to
create a feel of parallelism, with each task getting executed, but not at the same time. This
is a sort of pseudo parallelism and is achieved by multitasking. It is imperative then that
‘task switching’ be done. Each task is allowed to use the CPU for some time, after which
it relinquishes its claim and allows the next task to execute. In effect, chunks of each task
get executed each time.
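The time multiplexing described above can be simulated in a few lines of Python. The sketch assumes that tasks take turns in a fixed circular order with an equal time slice, which is only one possible policy; the function name `time_multiplex` and the task names and durations are made up.

```python
from collections import deque


def time_multiplex(work, time_slice):
    """Share one CPU among tasks.

    work: dict mapping task name -> units of CPU time the task needs.
    Returns the timeline as a list of (task, units_run) chunks.
    """
    queue = deque(work.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(time_slice, remaining)       # run for one slice, or less if done
        timeline.append((name, run))
        if remaining > run:                    # not finished: go to the back
            queue.append((name, remaining - run))
    return timeline


timeline = time_multiplex({"A": 5, "B": 2, "C": 4}, time_slice=2)
# timeline is [("A", 2), ("B", 2), ("C", 2), ("A", 2), ("C", 2), ("A", 1)]
```

Each entry of the timeline is one ‘chunk’ of a task. No task monopolizes the CPU, and all three make progress, though never at the same instant.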
Recollect that there is a TCB associated with each task, which contains all the information
of that task. This information is called the ‘context’ of the task. For a particular
task, there are register contents, memory pointers, resource pointers, etc. This is what
is meant by the context of the task at the time that it ‘switches’. Task switching entails
context switching, which tells us that the context of the new task will be different from
that of the abandoned task. So, it is obvious that all the information of the old task is to
be saved, and the term used for this is ‘context saving’. Because context saving is done,
there is no difficulty in resuming this task from the point at which it had to stop, when
it gets the service of the CPU once again, later.
Figure 7.9 | Task control blocks for n tasks
(Task Control Blocks 1, 2, 3, 4, ..., n; each holds Task ID, Task State, Task Context (PC, SP, copy of GPRs) and Other Information)
When the CPU takes up a new task, the context changes to that of the new task.
When an old task that was abandoned is taken up again, context retrieval occurs. Thus in
a multitasking system, context switching, context saving and context retrieval are activities
that occur continuously. Note that these activities are time-consuming.
All of them cause memory writes (for context saving) and memory reads (for context
retrieval) as the TCBs are in memory.
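Context saving and context retrieval can be pictured with the following Python model. It is a sketch under simplifying assumptions: the ‘CPU’ has only a program counter, a stack pointer and four general purpose registers, and each TCB is a plain dictionary. The names `CPU`, `save_context`, `restore_context` and `context_switch` are hypothetical.

```python
class CPU:
    """A toy CPU: program counter, stack pointer and 4 assumed GPRs."""

    def __init__(self):
        self.pc = 0
        self.sp = 0
        self.regs = [0] * 4


def save_context(cpu, tcb):
    # Context saving: memory writes into the TCB of the outgoing task.
    tcb["pc"], tcb["sp"], tcb["regs"] = cpu.pc, cpu.sp, list(cpu.regs)


def restore_context(cpu, tcb):
    # Context retrieval: memory reads from the TCB of the incoming task.
    cpu.pc, cpu.sp, cpu.regs = tcb["pc"], tcb["sp"], list(tcb["regs"])


def context_switch(cpu, old_tcb, new_tcb):
    save_context(cpu, old_tcb)
    restore_context(cpu, new_tcb)


cpu = CPU()
task_a = {"pc": 0x100, "sp": 0x8000, "regs": [1, 2, 3, 4]}
task_b = {"pc": 0x200, "sp": 0x9000, "regs": [9, 9, 9, 9]}

restore_context(cpu, task_a)         # task A starts running
cpu.pc += 8                          # A executes a little ...
cpu.regs[0] = 42                     # ... and changes a register
context_switch(cpu, task_a, task_b)  # A is abandoned, B takes over
context_switch(cpu, task_b, task_a)  # later, A is taken up again
```

After the second switch, the CPU holds exactly the values task A had when it was abandoned (program counter 0x108 and register 0 equal to 42), which is why A can resume from the point at which it stopped.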
The obvious side effect of multitasking is the additional time overhead in switching
from one task to another. The operating system is software, but if the processor for
which it is designed provides hardware support for multitasking, the time overhead
can be reduced. Most advanced processors provide this sort of support. The x86
processors from the 80286 onwards have special registers and other support structures
for multitasking. Embedded processors like ARM have modes for fast switching on
interrupts. It is by using the mechanism of interrupts that task switching occurs.
Recollect that what an interrupt does is to abandon the current task and take
up a new task (Section 2.2.9).
How does multitasking affect the user?
There are single-tasking operating systems like DOS, and multitasking systems like Windows.
In the latter, a number of tasks can run concurrently. The word ‘concurrently’ in this context
means that all these tasks are placed for execution, but only one task is actually being
executed by the CPU at any instant. The CPU is being ‘time multiplexed’ and with a high rate of this
multiplexing, it appears that all the tasks are being run at the same time; this is the
concurrent processing we talk about. The user is unaware that the CPU is being switched
from one task to another. The user gets the feeling that all the tasks are executing in parallel.
But as the number of tasks increases, one obvious effect is that of a ‘system slowdown’.
7.8.3 | Task (Process) Scheduling
Now that we have discussed multitasking in principle, the next step is to get an idea of
how it is accomplished in operating systems. What is the criterion on which task switching
is done? A number of questions arise. Does each task get a stipulated amount of
time for executing before it is forced to relinquish its use of the CPU? Or are there some
tasks which are more favoured, such that they are allowed to use more CPU time? What
is the order in which tasks are assigned to use the CPU? Are there tasks which are forced
to wait because other more important tasks need to be executed earlier?
The answers to all these questions give us a clue as to what the scheduling policy of
the operating system is. Note that Figure 7.2 shows the scheduler as a specific and very
important unit in an operating system. What a scheduler does (in general) is to share a
‘resource’ between the requestors. The scheduler does not want to give the resource to just
one of the requesting parties alone, because all tasks are equally important to it.
Consider the common kitchen of a multiroomed apartment in which many families
reside. Only one family can use the kitchen at a time, and if they all agree to use it on a
time shared basis, the apartment manager can schedule it in a particular sequence based
on the needs and time patterns of the different families.
In a computer system, when a number of tasks arise, the scheduler has the duty of
allocating the services of the CPU to each one of them, based on a set of scheduling
conditions. The conditions depend on the types of applications for which the operating
system is used. The task scheduling conditions for a ‘real time’ embedded OS are much
more stringent (Refer to Chapter 8) than for a non real-time OS.
Now we will go right into the heart of the matter by discussing the important task
scheduling algorithms applicable for general purpose and non real-time embedded oper-
ating systems. The special and stringent task scheduling methods for real-time embed-
ded systems are relegated to the next chapter.
7.8.4 | CPU and IO Bound Tasks
In a computer system, there are multiple tasks, one CPU and many I/O devices. A task
usually needs to use the CPU as well as I/O devices. Some tasks are computationally
intensive, in which case they use the CPU for longer periods; they are termed CPU
bound tasks. Other tasks, which need more I/O time, are called I/O bound tasks. For
example, a task which needs to do a lot of printing is an I/O bound task, because it uses
the printer far more than it uses the CPU. But on the average, a typical task consists of a
CPU burst followed by an I/O burst as in Figure 7.10, where a burst means the period of
activity during which a task uses the CPU or I/O, as the case may be.
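A task can be labelled CPU bound or I/O bound by adding up its bursts. The Python sketch below does this for two made-up tasks; the function `classify`, the burst lists and the rule used (compare total CPU time with total I/O time, with a tie counted as I/O bound) are assumptions for illustration.

```python
def classify(bursts):
    """bursts: list of (kind, duration) pairs, where kind is 'cpu' or 'io'."""
    cpu_time = sum(duration for kind, duration in bursts if kind == "cpu")
    io_time = sum(duration for kind, duration in bursts if kind == "io")
    return "CPU bound" if cpu_time > io_time else "I/O bound"


# Alternating CPU and I/O bursts, in arbitrary time units (made-up values).
print_job = [("cpu", 1), ("io", 20), ("cpu", 1), ("io", 15)]
matrix_job = [("cpu", 30), ("io", 2), ("cpu", 25)]

kind_of_print_job = classify(print_job)     # "I/O bound"
kind_of_matrix_job = classify(matrix_job)   # "CPU bound"
```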
7.8.5 | Selection of a Scheduling Algorithm
Scheduling of tasks means that a decision is made regarding the order/sequence in
which tasks are to be allocated the use of the CPU. We assume a system in which there
are multiple tasks each of which is vying to get the service of the CPU. What are the
objectives aimed to be achieved when a scheduling algorithm is chosen? Let’s define
some terms which will help us understand the qualities of a good scheduling policy, in a
quantitative manner.
i) CPU utilization: There is only one processor and it should be utilized to the maxi-
mum; as far as possible, the CPU should never be allowed to idle. CPU utilization
factor is defined as the percentage of time the CPU is working. When the CPU
works all the time, the utilization is 100%.
ii) Response time: Consider a task which is waiting for the CPU from which it needs
the response corresponding to the execution of the task. If this task gets its response
soon, the scheduling policy adopted for the system is assumed to be good; in
principle, the response time should be minimum.
Figure 7.10 | Alternate CPU and I/O bursts
[Timeline showing CPU bursts alternating with I/O bursts]
iii) Turnaround time (TAT): The time interval from the instant the task is presented
to the system, to the instant that it exits after completion is defined as the
‘turnaround time’. This interval is likely to be quite eventful. Initially the task may have
to wait to get the CPU. Even after getting the service of the CPU, it may go to the
blocked state or back to the ready queue. Obviously then, the task may not
get executed in ‘one go’. It may get executed in parts until it reaches completion.
Thus the TAT is the sum total of the ‘execution’ periods and the ‘wait’ periods. For
a system, the average turnaround time indicates how well it can accommodate and
execute multiple tasks efficiently.
iv) Throughput rate: This corresponds to the number of tasks processed in unit time.
As a rule, it rises as the average turnaround time falls.
In short, the ideal scheduling strategy should give a high throughput rate, minimum
response and turnaround times and should maximize CPU utilization. It should be a
‘fair’ policy, in that no task should be unnecessarily denied/delayed the use of resources,
while keeping in mind that more important tasks should get priority. We will examine
various strategies and compare them.
7.8.6 | CPU Scheduler and Resource Manager
We are talking about a system with just one CPU and multiple tasks. Because each task
wants the CPU, there is a CPU scheduler which allocates the use of the CPU to one of the
tasks. After the task finishes with the CPU, it might need the service of an I/O device,
such as a printer. At that time, there are other tasks also needing the same resource. Thus
to manage the use of resources (typically, I/O), there is a resource manager too. Thus, you
may now visualize multiple tasks with the following status:
i) Waiting in the ‘ready’ queue for the service of the CPU
ii) Waiting in the device queue for the service of a specific I/O device
These queues are dynamic, because as tasks are ready to execute, they enter the
ready queue of the CPU scheduler and from there to the running state. This happens
continuously. Similar is the case for the device queue—tasks enter and leave the queue
continuously.
All information about a task including the pointer to its starting point is noted and
kept in the TCB. Thus, when we say that tasks are in the queue, it means that the TCBs
of tasks are placed in a queue in memory.
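The movement of TCBs between the two queues can be sketched in a few lines of Python. This is only an illustration: the TCB fields, the queue names and the task names are assumptions made for the sketch, and a real kernel keeps far more information in each TCB.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class TCB:
    """Hypothetical task control block holding only what this sketch needs."""
    task_id: int
    entry_point: str      # stands in for the pointer to the task's starting point
    state: str = "ready"


ready_queue = deque()     # TCBs waiting for the CPU
device_queue = deque()    # TCBs waiting for one particular I/O device

for i in (1, 2, 3):
    ready_queue.append(TCB(i, f"task{i}_main"))

# The CPU scheduler dispatches the TCB at the head of the ready queue.
running = ready_queue.popleft()
running.state = "running"

# The running task requests I/O, so its TCB joins the device queue.
running.state = "blocked"
device_queue.append(running)

# When the I/O completes, the TCB goes to the tail of the ready queue.
finished_io = device_queue.popleft()
finished_io.state = "ready"
ready_queue.append(finished_io)

order = [tcb.task_id for tcb in ready_queue]
```

After these steps the ready queue holds tasks 2, 3 and 1, in that order: only the TCBs move, never the task code itself.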
7.9 | Scheduling Algorithms
Now we will have a look at some popular scheduling algorithms. Note Figure 7.11. As
far as scheduling is concerned, only three states are to be thought of and they are the
ready, running and blocked states. Any task will be in one of these states, during the time
when scheduling is ‘meaningful’ for the task.
7.9.1 | Pre-emption
There is a word ‘pre-emption’ used frequently with respect to scheduling. The meaning
of the word ‘pre-empt’ in this context is ‘take the place after removing’. Thus, any task can
‘pre-empt’ the currently running task. Obviously in this context, the currently running
task is forced to relinquish the use of the CPU and hand it over to some other task. The
pre-empted task then waits in the ready queue (Refer Figure 7.11). With this possibility,
scheduling policies may be non-preemptive or preemptive. We first consider
non-preemptive scheduling methods.
7.9.2 | Assumptions
The assumptions that we make while calculating the results for a particular scheduling
method are not strictly realistic. Besides this, the calculation of ‘times’
ignores the ‘scheduling latency’. Note the following:
i) The execution times of each task are to be known ahead of scheduling. This is nei-
ther realistic nor possible. Only some estimates of execution times may be made by
the scheduler.
ii) Each time a new task comes into the ready queue, scheduling must be re-done.
iii) The scheduler needs some time to converge to a decision, and this adds a time
overhead.
7.9.3 | Non-preemptive Methods of Scheduling
7.9.3.1 | Co-operative Scheduling
This is the simplest method of scheduling. Each task is allowed to execute to its finish,
and only then is the next one taken up. While one task executes, the others are willing to wait
and this gives the name ‘co-operative’ to the scheduling algorithm. This is also a case of
first come first serve scheduling (FCFS) or a first in first out scheme (FIFO). This is a
very simple scheme, but the obvious disadvantage is that one task can monopolize the
CPU if its service period is high.
Figure 7.11 | The three important states of a task during scheduling
[State diagram: New Process enters Ready; Ready to Running (Schedule); Running to Ready (Pre-empt); Running to Blocked (I/O Request); Blocked to Ready (I/O Completed); Running exits on Execution Over]
Example 7.1
Consider a multitasking system which uses the FCFS scheduling algorithm. There are
five tasks in the ready queue at a particular time, ordered from T1 to T5. Table 7.1 gives
the tasks and the corresponding service times TS for each. Find the average turnaround
time (TAT) and wait time. Comment on the CPU utilization.
Table 7.1
Task No    TS (Time Units)
T1         300
T2         125
T3         400
T4         150
T5         100
Solution
The tasks are serviced in the order in which they are placed in the ready queue. The Gantt
chart (service time for each task) is shown in Figure 7.12.
Let us calculate the time for which each task waits. TAT for a task is its waiting time
plus service time.
TAT = TW + TS
The first task T1 has a waiting time of 0. All other tasks have to wait until the ones
ahead in the ready queue are serviced.
T1: TW = 0, TS = 300, TAT = 300
T2: TW = 300, TS = 125, TAT = TW + TS = 300 + 125 = 425
T3: TW = 425, TS = 400, TAT = 825
T4: TW = 825, TS = 150, TAT = 975
T5: TW = 975, TS = 100, TAT = 1075
Total TAT = 3600
Average TAT = 3600/5 = 720 time units
What is the total wait time?
TTOTAL-W = 0 + 300 + 425 + 825 + 975 = 2525 time units
Average wait time = 2525/5 = 505 time units
Figure 7.12 | Gantt chart of scheduling for Example 7.1
[T1 (0-300), T2 (300-425), T3 (425-825), T4 (825-975), T5 (975-1075)]
This example clarifies that this is a very simple algorithm. The problem that may
occur is that if one process has a long execution time, it can monopolize the CPU.
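The arithmetic of Example 7.1 can be checked with a short Python sketch. The function name fcfs is our own choice; the sketch assumes, as the example does, that all tasks are ready at t = 0 and that scheduling latency is ignored.

```python
def fcfs(service_times):
    """Return (waits, tats) for tasks served in list order, all ready at t = 0."""
    waits, tats, clock = [], [], 0
    for ts in service_times:
        waits.append(clock)   # TW: time spent in the ready queue
        clock += ts           # the task runs to completion
        tats.append(clock)    # TAT = TW + TS
    return waits, tats


service = [300, 125, 400, 150, 100]   # T1..T5 from Table 7.1
waits, tats = fcfs(service)

avg_wait = sum(waits) / len(waits)
avg_tat = sum(tats) / len(tats)
# Fraction of the elapsed time during which the CPU was doing task work.
utilization = sum(service) / tats[-1]
```

Under these idealized assumptions the CPU never idles between t = 0 and t = 1075, so the computed utilization is 1.0 (100%). In a real system, switching overhead and I/O waits would bring it below that figure.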
7.9.4 | Shortest Job Next (SJN)
This is also called the shortest job first (SJF) algorithm. Here the method is to queue the
tasks such that the one with the shortest service time gets to execute first. But the problem
is that, usually, the execution times of tasks are not known in advance. The method
then is to estimate the service times by using the recent history of each of the tasks. The
tasks are queued in the order of increasing service times.
Example 7.2a
Let’s use the data in the previous example itself. There are five tasks in the ready queue
as we see in Table 7.2.
Table 7.2
Task No    TS (Time Units)
T1         300
T2         125
T3         400
T4         150
T5         100
Now to find the sequence in which the jobs are to be executed, just order them
in the sequence of increasing service times. See the Gantt chart of Figure 7.13 which
shows this. T5, which is the shortest job, executes first; and T3, with the longest
execution time, comes last.
T5: TW = 0, TS = 100, TAT = 100
T2: TW = 100, TS = 125, TAT = TW + TS = 100 + 125 = 225
T4: TW = 225, TS = 150, TAT = 375
T1: TW = 375, TS = 300, TAT = 675
T3: TW = 675, TS = 400, TAT = 1075
Figure 7.13 | Gantt chart of scheduling for Example 7.2a
[T5 (0-100), T2 (100-225), T4 (225-375), T1 (375-675), T3 (675-1075)]
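Total TAT = 100 + 225 + 375 + 675 + 1075 = 2450 time units
Average TAT = 2450/5 = 490 time units
TTOTAL-W = 0 + 100 + 225 + 375 + 675 = 1375 time units
Average wait time = 1375/5 = 275 time units
In this, the wait times are less than those in Example 7.1, and therefore the average
TAT reduces.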
Keep in mind that the service time is the same for any task, irrespective of the type
of scheduling.
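A short Python sketch of the SJN ordering follows, under the same assumption that all five tasks are ready at t = 0. The function name sjn and the dictionary layout are illustrative choices only.

```python
def sjn(tasks):
    """tasks: dict of name -> service time, all ready at t = 0.

    Returns a dict of name -> (wait, tat) in the order of execution.
    """
    clock, result = 0, {}
    # Shortest service time first; sorted() is stable, so ties keep queue order.
    for name, ts in sorted(tasks.items(), key=lambda item: item[1]):
        result[name] = (clock, clock + ts)
        clock += ts
    return result


tasks = {"T1": 300, "T2": 125, "T3": 400, "T4": 150, "T5": 100}
schedule = sjn(tasks)

execution_order = list(schedule)
total_wait = sum(wait for wait, _ in schedule.values())
total_tat = sum(tat for _, tat in schedule.values())
```

Summing the per-task values gives a total wait of 1375 and a total TAT of 100 + 225 + 375 + 675 + 1075 = 2450 time units, both lower than the FCFS figures for the same task set.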
Example 7.2b
In Example 7.2a, we assumed that all the tasks are in the ready queue at time t = 0 and
that no other tasks come in during the service periods of these tasks.
Now, let’s change that assumption a bit. Assume that two new tasks enter the ready
queue at time t = 350. At this time, T4 is executing. Meanwhile, the ready queue is
modified by adding T7 and T6, ordered so that they execute before T1 and T3.
The new tasks are T6 with a TS of 250 and T7 with a TS of 200.
The scheduling is now re-ordered to be as shown in the Gantt chart of Figure 7.14.
The waiting time of T1 increases from 375 to 825.
The waiting time of T3 increases from 675 to 1125.
Figure 7.14 | Gantt chart of scheduling for Example 7.2b
[T5 (0-100), T2 (100-225), T4 (225-375), T7 (375-575), T6 (575-825), T1 (825-1125), T3 (1125-1525)]
Thus, the effect of the new tasks is that the waiting times of T1 and T3 increase. If
again tasks of shorter service periods come in to the ready queue, the execution of tasks
of long periods gets delayed. In the extreme case, there is the possibility of ‘starvation’
occurring for such tasks, by being deprived of the service of the CPU altogether.
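To see how late arrivals push the long tasks back, here is a minimal simulation of non-preemptive SJN with arrival times. It is a sketch under the assumptions of Example 7.2b (exact service times known, no scheduling latency); the tuple layout and the function name are our own.

```python
def sjn_with_arrivals(tasks):
    """tasks: list of (name, arrival, service). Non-preemptive shortest job next.

    Returns a list of (name, start, finish) in the order of execution.
    """
    pending = sorted(tasks, key=lambda t: t[1])      # earliest arrival first
    clock, timeline = 0, []
    while pending:
        ready = [t for t in pending if t[1] <= clock]
        if not ready:                  # nothing has arrived yet: CPU idles
            clock = pending[0][1]
            continue
        job = min(ready, key=lambda t: t[2])         # shortest service time
        pending.remove(job)
        name, _, service = job
        timeline.append((name, clock, clock + service))
        clock += service               # runs to completion, no pre-emption
    return timeline


tasks = [("T1", 0, 300), ("T2", 0, 125), ("T3", 0, 400), ("T4", 0, 150),
         ("T5", 0, 100), ("T6", 350, 250), ("T7", 350, 200)]
timeline = sjn_with_arrivals(tasks)
start = {name: begin for name, begin, _ in timeline}
```

Here start["T1"] is 825 and start["T3"] is 1125. Feeding in a steady stream of short tasks would keep pushing T3 further back, which is the starvation risk described above.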
7.9.4.1 | Priority-based Scheduling
In such a system, tasks have priorities which are represented in terms of numbers.
The convention is to have 0 to 255 as the numbers representing priorities, with 0
representing the highest priority, and higher numbers indicating lower priorities. Many
systems use this scheme, and real-time operating systems (Chapter 8) use priority-
based methods only.
Example 7.3a
See the task list in the ready queue in Table 7.3 with priorities as indicated.
Table 7.3
Task No    TS (Time Units)    Priority
T1         300                5
T2         125                3
T3         400                2
T4         150                12
T5         100                15
The scheduling will be in the order as shown in Figure 7.15.
T3: TW = 0, TS = 400, TAT = 400
T2: TW = 400, TS = 125, TAT = TW + TS = 400 + 125 = 525
T1: TW = 525, TS = 300, TAT = 825
T4: TW = 825, TS = 150, TAT = 975
T5: TW = 975, TS = 100, TAT = 1075
Total TAT = 400 + 525 + 825 + 975 + 1075 = 3800
Average TAT = 3800/5 = 760 time units
TTOTAL-W = 0 + 400 + 525 + 825 + 975 = 2725 time units
The problem with such a scheme is that tasks of low priority tend to suffer. This
condition is called ‘starvation’. Note that the ready queue is dynamic, and as new tasks
keep on coming, it may happen that some low priority tasks are kept in waiting. Every
time that new tasks enter the ready queue, scheduling is re-done. If some higher
priority tasks appear, the low priority tasks which have been waiting are forced to wait
still more.
Figure 7.15 | Gantt chart of scheduling for Example 7.3a
[T3 (0-400), T2 (400-525), T1 (525-825), T4 (825-975), T5 (975-1075)]
Example 7.3b
Consider a situation when at t = 700, the ready queue of Table 7.3 gets appended with
three new tasks T6, T7 and T8, whose priorities are 0, 4 and 6, respectively. See Table 7.4.
What is the change in the scheduling pattern?
Table 7.4
Task No    TE (Time Units)    Priority
T6         120                0
T7         80                 4
T8         350                6
Solution
At t = 700, task T1 is being serviced. It will continue to completion. Meanwhile, the
scheduling order will be as shown in the Gantt chart of Figure 7.16.
Figure 7.16 | Gantt chart of scheduling for Example 7.3b
[T3 (0-400), T2 (400-525), T1 (525-825), T6 (825-945), T7 (945-1025), T8 (1025-1375), T4 (1375-1525), T5 (1525-1625)]
After T1, tasks T6, T7 and T8 are to be serviced. Only after these three are serviced
will T4 and T5 get the chance to use the CPU.
The waiting time of T4 changes from 825 to 825 + the service times of T6, T7 and T8.
That is, 825 + 120 + 80 + 350 = 1375
The waiting time of T5 also changes, from 975 to 975 + the service times of T6, T7 and T8.
That is, 975 + 120 + 80 + 350 = 1525
These waiting times can become bigger if more higher priority processes keep on
coming in.
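The schedule of Example 7.3b can be reproduced with a small non-preemptive priority simulation. This is a sketch under the example's assumptions; the function name and the tuple layout (name, arrival, service, priority) are our own, with a lower number meaning a higher priority.

```python
def priority_schedule(tasks):
    """Non-preemptive priority scheduling.

    tasks: list of (name, arrival, service, priority); 0 is the highest priority.
    Returns a dict of name -> (start, finish).
    """
    pending = sorted(tasks, key=lambda t: t[1])      # earliest arrival first
    clock, timeline = 0, {}
    while pending:
        ready = [t for t in pending if t[1] <= clock]
        if not ready:                  # CPU idles until the next arrival
            clock = pending[0][1]
            continue
        job = min(ready, key=lambda t: t[3])         # highest priority wins
        pending.remove(job)
        timeline[job[0]] = (clock, clock + job[2])
        clock += job[2]                # the chosen task runs to completion
    return timeline


tasks = [("T1", 0, 300, 5), ("T2", 0, 125, 3), ("T3", 0, 400, 2),
         ("T4", 0, 150, 12), ("T5", 0, 100, 15),
         ("T6", 700, 120, 0), ("T7", 700, 80, 4), ("T8", 700, 350, 6)]
timeline = priority_schedule(tasks)
```

Note that T6 and T7 outrank T1 but still wait until t = 825, because T1 is not pre-empted in this scheme. T4 and T5 start at 1375 and 1525, respectively.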
Comment: The solution to the problem of starvation occurring for low priority tasks is
to change the priorities dynamically, and one technique is called ‘aging’. In this method,
the priority of a task increases as its waiting time increases, so at some time it finally gets
the use of the CPU.
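One possible form of aging is sketched below. The text does not prescribe a formula, so the rule used here (the priority number drops by 1 for every age_step time units spent waiting) and all the names are assumptions made purely for illustration.

```python
def pick_with_aging(ready, now, age_step=100):
    """Choose the next task to run, taking waiting time into account.

    ready: list of dicts with 'name', 'priority' (0 is highest) and 'enqueued'
    (the time at which the task joined the ready queue).
    """
    def effective_priority(task):
        boost = (now - task["enqueued"]) // age_step   # grows with waiting time
        return max(0, task["priority"] - boost)        # never better than 0
    return min(ready, key=effective_priority)


old_task = {"name": "T5", "priority": 15, "enqueued": 0}

# At t = 500, T5 has aged to priority 10 and still loses to a priority-4 arrival.
early_choice = pick_with_aging(
    [old_task, {"name": "T7", "priority": 4, "enqueued": 500}], now=500)

# At t = 1500, T5 has aged all the way to priority 0 and wins.
late_choice = pick_with_aging(
    [old_task, {"name": "T9", "priority": 4, "enqueued": 1500}], now=1500)
```

The smaller the age_step, the faster a waiting task climbs; choosing it is a trade-off between protecting low priority tasks and respecting the original priorities.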
7.9.5 | Pre-emptive Scheduling Strategies
In this set of methods, a task that is running (i.e. using the CPU) can be pushed out from
the ‘running’ state to the ‘ready’ state so that another task from the ready queue is taken
up. We say that the task has been pre-empted (removed and replaced). Let’s examine some
scheduling methods based on this pre-emptive approach.
7.9.5.1 | Round Robin Scheduling
Here, time slices are defined. Suppose there are n tasks in a system; each of them is
allowed to execute for a period equal to the time slice. After this, the next task gets its
turn, but can use the CPU only for a time equal to the defined time slice. This seems to
be fair in the sense that all tasks get their chance at least once, in one round. This is also
called a time sharing or time slice system.
Example 7.4
Draw the scheduling diagram corresponding to RR scheduling for the task set in
Table 7.5. The time slice is defined to be 50 time units.
Table 7.5
Task No    TS (Time Units)
T1         150
T2         100
T3         200
T4         50
Solution
Time quantum = 50
Note that this then acts like FCFS, in effect (Refer Figure 7.17).
T1: TW = 0 + 150 + 100 = 250, TS = 150, TAT = 400
T2: TW = 50 + 150 = 200, TS = 100, TAT = TW + TS = 200 + 100 = 300
T3: TW = 100 + 150 + 50 = 300, TS = 200, TAT = 500
T4: TW = 150, TS = 50, TAT = 200
Total TAT = 400 + 300 + 500 + 200 = 1400
TTOTAL-W = 250 + 200 + 300 + 150 = 900
In this, task T4 needs only one time slice, while T3 uses four time slices. Figure 7.17
shows the time periods taken by each task.
Are there problems in such a fair scheme?
The obvious problem is the amount of time to be spent in task switching, which is once
every time slice. A lot of time is spent in context saving, switching and retrieval. If the time
slices are small, the ‘time overhead’ is serious enough to be considered a system inefficiency.
But if the time slice chosen is too large, the algorithm tends to become similar to FCFS.
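The round robin schedule of Example 7.4 can be reproduced with the following sketch. It assumes all tasks are ready at t = 0 and charges nothing for task switching, which is exactly the overhead discussed above; the function and variable names are our own.

```python
from collections import deque


def round_robin(tasks, quantum):
    """tasks: dict of name -> service time, all ready at t = 0.

    Returns (timeline, finish): timeline is a list of (name, start, end) slices
    and finish maps each task to its completion time (its TAT here).
    """
    queue = deque(tasks.items())
    clock, timeline, finish = 0, [], {}
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        timeline.append((name, clock, clock + run))
        clock += run
        if remaining > run:
            queue.append((name, remaining - run))   # back to the tail of the queue
        else:
            finish[name] = clock
    return timeline, finish


tasks = {"T1": 150, "T2": 100, "T3": 200, "T4": 50}
timeline, finish = round_robin(tasks, quantum=50)
waits = {name: finish[name] - ts for name, ts in tasks.items()}   # TW = TAT - TS

# With a very large quantum every task finishes in its first slice: plain FCFS.
fcfs_like, _ = round_robin(tasks, quantum=1000)
```

With a quantum of 50 the slices run in the order T1, T2, T3, T4, T1, T2, T3, T1, T3, T3, and the last one ends at t = 500. With a quantum of 1000 the same function simply runs T1, T2, T3 and T4 to completion in queue order.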
Example 7.5
Let’s use the data of Example 7.1 for RR scheduling with a time slice of 100.
Table 7.6
Task No    TS (Time Units)
T1         300
T2         125
T3         400
T4         150
T5         100
Perform round robin scheduling with a time quantum of 100.
Figure 7.17 | Scheduling using the round robin technique
[Example 7.4, time quantum = 50: T1 (0-50), T2 (50-100), T3 (100-150), T4 (150-200), T1 (200-250), T2 (250-300), T3 (300-350), T1 (350-400), T3 (400-450), T3 (450-500)]
Solution
[Timing diagram for Example 7.5, time quantum = 100: T1 (0-100), T2 (100-200), T3 (200-300), T4 (300-400), T5 (400-500), T1 (500-600), T2 (600-625, CPU unutilised 625-700), T3 (700-800), T4 (800-850, CPU unutilised 850-900), T1 (900-1000), T3 (1000-1100), T3 (1100-1200)]
In round robin scheduling for this example, the OS gets a time tick every 100 time
units to mark the task switching times. If the service time of a task is less than the
time quantum, for the rest of the time, the CPU remains free and unutilized. In this, one
time slice is 100 time units.
T1 gets its chance to execute at time points of 0, 500 and 900 time units.
T2 has a total TS of 125 time units. It executes at time points of 100 and 600. In the
second time slice allotted to T2, only 25 time units are used by T2. For the remaining 75
time units, the CPU is unutilized.
T3 has a TS of 400 time units. It executes at time points of 200, 700, 1000 and 1100.
T4 has a TS of 150 time units. It executes at time points of 300 and 800. In the
second time quantum, it uses only 50 time units. For the remaining 50 time units, the
CPU is unutilized.
T5 has a TS of 100 time units. It uses the CPU for only one time quantum, at t = 400
time units.
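The tick-driven variant used in this example, where switching happens only on the 100-unit tick and an early finish leaves the CPU idle, can be sketched as follows. The function name and return format are assumptions for the sketch.

```python
from collections import deque


def round_robin_fixed_tick(tasks, quantum):
    """Round robin in which tasks are switched only on the timer tick.

    tasks: dict of name -> service time, all ready at t = 0.
    Returns (slots, idle, end): slots is a list of (name, start, used),
    idle is the total unutilized CPU time and end is the finishing time.
    """
    queue = deque(tasks.items())
    clock, slots, idle = 0, [], 0
    while queue:
        name, remaining = queue.popleft()
        used = min(quantum, remaining)
        slots.append((name, clock, used))
        idle += quantum - used        # CPU stays free until the next tick
        clock += quantum
        if remaining > used:
            queue.append((name, remaining - used))
    return slots, idle, clock


tasks = {"T1": 300, "T2": 125, "T3": 400, "T4": 150, "T5": 100}
slots, idle, end = round_robin_fixed_tick(tasks, quantum=100)

starts = {}
for name, start, _ in slots:
    starts.setdefault(name, []).append(start)

utilization = sum(tasks.values()) / end
```

The schedule ends at t = 1200 with 125 time units unutilized (75 after T2 and 50 after T4), so the CPU utilization comes to 1075/1200, a little under 90%.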
7.9.5.2 | Pre-emptive Priority
In this method, the task that is running at any time must be the one with the highest
priority among the ready tasks. If, at any time, a task with a higher priority than the
one that is currently running enters the ready queue, the current task is pre-empted and
the new one is taken up. Example 7.6 shows how this is handled.
Example 7.6
Table 7.7 shows the tasks in the ready queue at time t = 0.
Table 7.7
Task No    TE (Time Units)    Priority
T1         300                5
T2         125                3
T3         400                2
T4         150                12
T5         100                15
At time t = 600, a new task T6 of priority 1 and execution period 75 comes into the
ready queue. What happens?
The task that was executing at that time is pre-empted. The final Gantt chart is as
shown in Figure 7.18.
At time t = 600, task T1 was executing when task T6 entered the ready queue. Since
T6 has a higher priority than T1, T1 is pre-empted (at t = 600) and T6 goes to the
running state. It is serviced for 75 time units and is completed. After this, the ready
queue is looked at once again by the scheduler. The highest priority task is T1, which
needs 225 time units more. Thus, T1’s execution is resumed, after which T4 and T5 are
taken up.
The TAT and wait times can be calculated as in the previous examples.
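The pre-emption in Example 7.6 can be traced with the sketch below, which re-runs the scheduler whenever a task finishes or a new task arrives. The names and the tuple layout (name, arrival, service, priority) are assumptions; a lower number means a higher priority, as in the text.

```python
def preemptive_priority(tasks):
    """Pre-emptive priority scheduling.

    tasks: list of (name, arrival, service, priority); 0 is the highest priority.
    Returns a list of (name, start, end) run segments.
    """
    remaining = {name: service for name, _, service, _ in tasks}
    arrivals = sorted({arrival for _, arrival, _, _ in tasks})
    clock, segments = 0, []
    while any(remaining.values()):
        ready = [t for t in tasks if t[1] <= clock and remaining[t[0]] > 0]
        future = [a for a in arrivals if a > clock]
        if not ready:                      # CPU idles until the next arrival
            clock = future[0]
            continue
        name = min(ready, key=lambda t: t[3])[0]
        # Run until the task completes or the next arrival forces a re-schedule.
        end = clock + remaining[name]
        if future:
            end = min(end, future[0])
        remaining[name] -= end - clock
        if segments and segments[-1][0] == name and segments[-1][2] == clock:
            segments[-1] = (name, segments[-1][1], end)   # same task carries on
        else:
            segments.append((name, clock, end))
        clock = end
    return segments


tasks = [("T1", 0, 300, 5), ("T2", 0, 125, 3), ("T3", 0, 400, 2),
         ("T4", 0, 150, 12), ("T5", 0, 100, 15), ("T6", 600, 75, 1)]
segments = preemptive_priority(tasks)
finish = {name: end for name, _, end in segments}   # last segment of each task
```

T1 appears twice in the result, once for 525-600 and once for 675-900, on either side of T6. The TAT of T1 is therefore 900, and the last task, T5, finishes at t = 1150.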
7.9.5.3 | Pre-emptive SJN/Shortest Remaining Time (SRT)
This is just like the previous example. Normally the scheduling is in the order of increas-
ing service times. Thus, tasks with the shortest service times are executed earlier. In the
midst of a schedule, if a new task enters the ready queue, its service time is compared
with the remaining service time of the currently executing task. If the new task has a
shorter service time than the remaining time of the current task, it (the current task) is
pre-empted and the new one serviced until it completes.
Example 7.7
Let’s use the data in Example 7.2 (Table 7.2 which is redrawn here)
Task No    TS (Time Units)
T1         300
T2         125
T3         400
T4         150
T5         100
At time t = 150, a new task T6 with a service time of 50 enters the ready queue. At this
time instant, it was T2 that was executing. It needs 75 time units more for completion.
This is more than the service time of the new task T6. So T2 is pre-empted and T6 is
serviced. After this, T2 is resumed once again. The resultant Gantt chart is as shown in
Figure 7.19.
Figure 7.19 | Gantt chart for the SRT/SJN algorithm
[T5 (0-100), T2 (100-150), T6 (150-200), T2 (200-275), T4 (275-425), T1 (425-725), T3 (725-1125)]
Figure 7.18 | Scheduling using pre-emptive priority technique
[T3 (0-400), T2 (400-525), T1 (525-600), T6 (600-675), T1 (675-900), T4 (900-1050), T5 (1050-1150)]
The TAT and wait times can be calculated as in the previous examples.
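As a starting point for those calculations, here is a minimal sketch of the SRT rule applied to Example 7.7. It assumes exact service times are known and ignores scheduling latency; the function name and tuple layout are our own.

```python
def srt(tasks):
    """Shortest remaining time (pre-emptive SJN).

    tasks: list of (name, arrival, service).
    Returns a list of (name, start, end) run segments. A task that is not
    pre-empted by an arrival may show up as two back-to-back segments.
    """
    remaining = {name: service for name, _, service in tasks}
    arrivals = sorted({arrival for _, arrival, _ in tasks})
    clock, segments = 0, []
    while any(remaining.values()):
        ready = [t for t in tasks if t[1] <= clock and remaining[t[0]] > 0]
        future = [a for a in arrivals if a > clock]
        if not ready:                      # CPU idles until the next arrival
            clock = future[0]
            continue
        name = min(ready, key=lambda t: remaining[t[0]])[0]
        # Run until completion or until a new arrival triggers a comparison.
        end = clock + remaining[name]
        if future:
            end = min(end, future[0])
        remaining[name] -= end - clock
        segments.append((name, clock, end))
        clock = end
    return segments


tasks = [("T1", 0, 300), ("T2", 0, 125), ("T3", 0, 400),
         ("T4", 0, 150), ("T5", 0, 100), ("T6", 150, 50)]
segments = srt(tasks)
finish = {name: end for name, _, end in segments}
tat = {name: finish[name] - arrival for name, arrival, _ in tasks}
```

For instance, T6 arrives at 150 and finishes at 200, so its TAT is 50, while T2 finishes at 275 instead of 225 because of the pre-emption.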
It is obvious that scheduling is not a trivial job. It is a dynamic system with new
entrants coming into the ready queue continuously. The scheduler also needs to make
decisions based on a set of criteria and calculations. Anything extra that a scheduler has
to do eats up valuable time. The delay involved in scheduling is referred to as scheduling
latency. It includes interrupt latency (for task switching), deciding the task to be chosen
and then the time involved in dispatching the chosen task to the CPU.
All the scheduling policies that we have discussed pertain more to general purpose
operating systems. The specific policies that are used for real-time systems are discussed
in Chapter 8.
7.10 | Threads
The word ‘thread’ is very commonly used in the discussion of different OS concepts. Now
that tasks/processes have been discussed at great length, it will be easy to understand the
idea of ‘threads’.
What exactly is a thread? Where does it come into the picture?
How is it useful?
A thread is sometimes called a ‘lightweight process’. We now know what a process
(i.e. task) is, what it entails, what it contains, what it comprises, etc. We started by
calling a process/task a program in execution. In a program which goes into execution
(and becomes a process/task), a number of subsidiary activities may be needed. For
a process which deals with disk access, there can be one thread for reading and another
thread for writing. We know that the CPU can do only one activity at a time. So then,
it may read the disk for some time, and later may write. Thus, thread switching has to
occur. The obvious question is ‘Isn’t this similar to task switching?’ The answer is yes, but
not entirely so. That is why we call a thread a lightweight process.
Each thread has its own program counter, general purpose registers and stack (see
Figure 7.20). Each thread needs a separate stack because different threads do different
activities and therefore handle a different set of procedures. But each thread belongs to a
process. A process can have multiple threads and all these threads share the same ‘address
space’; this aspect is what makes a thread ‘lightweight’. When thread switching takes place,
there is no need to change the memory space, but the PC and stack do change.
Figure 7.20 | Global and local resources for three threads
[Specific to a task: common address space with global data. Specific to each thread: registers and stack, shown for Thread 1, Thread 2 and Thread 3]
Having multiple threads is like having many specialists at work on the same object—
only one of the specialists can work at one time, and each specialist does only one kind
of work on the object. Thus, a thread is one line of activity. Multiple threads act serially,
one after the other, doing separate activities. We can visualize a thread for reading a file,
and another one for writing. Each of these threads executes the low level instructions
needed for doing its assigned job.
The simplest process (task) is one which has just one thread. In general, most
processes have multiple threads. Within a process, different threads might need to be
scheduled and then most things that we discussed with respect to task scheduling apply
for thread scheduling also. But the overhead involved in thread switching is less than in
process (task) switching because all threads share the same memory space. Only registers
(PC, SP and general purpose registers) need to be saved and restored during thread
switching.
Why threads?
The activities meant to be done by a process are subdivided into separate streams of
activity, each done quite efficiently by a specific thread. This makes the process the sum
total of a number of threads.
Threads share the same data space and thus it is easy for threads to communicate
with each other. Thus, we can see that threads assist each other, rather than compete
with each other. But these advantages do not come free: the biggest drawback is that
there is no protection between threads. But this may not be a big issue when we refer
to the previous statement that ‘threads are meant to assist each other and pertain to the
same process’.
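The split between shared data and per-thread data can be seen in a few lines of Python using the standard threading module. This is only an illustration: the worker function, the labels and the use of a lock are our own choices, the lock being one simple way of making up for the missing protection between threads.

```python
import threading

shared_log = []              # global data in the process's common address space
lock = threading.Lock()      # guards the shared list against concurrent updates


def worker(label, count):
    local_total = 0          # local variable: lives on this thread's own stack
    for i in range(count):
        local_total += i
    with lock:               # only one thread at a time touches the shared data
        shared_log.append((label, local_total))


threads = [threading.Thread(target=worker, args=(f"thread-{n}", 10 * n))
           for n in (1, 2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()                 # wait for all three threads to finish

results = dict(shared_log)
```

Each thread computes with its own local_total, yet all three deposit their results in the same shared_log without any message passing, because they live in one address space.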
There are kernel threads which form part of OS tasks, and user threads of application
programs. Many times, user threads are supported by the threads of the kernel. We
then say that the user thread runs over kernel threads.
7.11 | Interrupt Handling
The mechanism of interrupt handling has been clearly described in Section 2.2.9. There,
the processor-level activities for interrupt processing are described. But in a system with
an OS, when the OS gets to know about an interrupt, it must co-ordinate all the actions
to be done: save the context of the current program, get the interrupt handler, resume
the previous activity once the Interrupt Service Routine (ISR) is over and done with, and
also handle issues like multiple interrupts occurring simultaneously and so on. In short,
even though processors have an inherent interrupt handling mechanism, the OS needs to
keep an eye over the whole scene because there are multiple sources for interrupts.
Most I/O devices use an interrupt mechanism to get access to the processor. For
example, only if you type a key does the keyboard controller need to act. When it acts, it
generates an interrupt to let the system know it. The corresponding interrupt handler is
initiated to act, and after its duty it exits.
In the OS, interrupts are used either with hardware support (provided by the
processor) or by pure software. Either way, an interrupt handler or ISR does the
processing of the task initiated by the interrupt. In multitasking systems, task switching
is usually accomplished by interrupts. For example, in the round robin method, every
t seconds, an interrupt is generated and then the current task is pre-empted and the next
task is taken up.
Interrupts are associated with a delay which is called ‘interrupt latency’. Recollect
the sequence of actions that follow an interrupt, which are as follows:
i) Saving the current context
ii) Determining the identity of the interrupt
iii) Switching to the new context
iv) Starting the execution of the Interrupt Handler
Interrupt latency is the sum total of the delays caused by each of these actions. The
amount of ‘interrupt latency’ incurred depends on the speed of the hardware, as well as
the efficiency of the software.
7.11.1 | Interrupts and Task Switching
Task switching is accomplished by the mechanism of interrupts. The first component of
the latency involved is the interrupt processing time. Next, depending on the task
scheduling method used, a certain amount of delay is incurred in deciding which task is to be
switched to. This delay is called the dispatch latency. Thus, the total switching latency =
interrupt latency + dispatch latency.
7.12 | Inter Process (Task) Communications (IPC)
We have been talking about tasks/processes and also about threads. Two important
points to keep in mind with regard to them are as follows:
i) Processes are independent of each other. This means that one process does not share
anything with another. They reside in separate memory spaces and are protected
from being accessed by other processes.
ii) Threads of the same process share a common address space, and so there is the
element of ‘sharing’ between threads. They are not protected with respect to one
another, and so do not need any additional mechanism for communicating with one
another.
Where then is communication needed?
Processes in the same machine, and processes in different machines, might need to
communicate: it might be to share data, or to send queries and receive responses. Let us first
consider inter process communication as a general case and then discuss the additional
aspects for ‘remote communication’.
When sharing and communication occur, there is bound to be some conflict
between the activities of the processes that are involved. For example, reading and writing
are conflicting actions when done in the same shared space. There are many other cases
too, where inconsistencies and blockages occur. The resolving of such problems is grouped
under ‘process/task synchronization’. In practice, synchronization techniques should be
included along with ‘inter process/task communication techniques’. However, for the
purpose of clarity in our discussion, we will deal with these two aspects separately. First
we talk about task communication and then add task synchronization techniques to it.
7.12.1 | Task/Process Communication Methods
Two of the task communication methods are
i) Shared memory
ii) Message passing
7.12.1.1 | Shared Memory
This is a very simple concept. It simply means that an area of memory is allowed to be
accessed by more than one task. See Figure 7.21, where N processes have access to a
common area of memory. Let’s assume that the processes involved are all on the same
machine. Normally the address space of processes is separate and protected. When memory
is shared, it assumes that this ‘protection’ is overridden and many co-operating tasks
are allowed to read and write this area of memory. Shared memory communication is very
efficient and fast, as the only actions needed are the setting up of this space (by the OS
kernel) and allowing reading and writing to it, which are relatively fast operations. In
practice, this sharing is a very dangerous condition which can lead to data inconsistencies and
cause a race condition. These issues must be taken care of by synchronization techniques.
7.12.1.2 | Message Passing
This is just like our real-world courier service. The sender sends a message to a receiver
through a courier. The courier, in this case, is the IPC mechanism of the OS. The message
is sent from the address space of the sender process to the address space of the receiver
process. This is a case of point-to-point message passing (see Figure 7.22). Messages may
also be broadcast, that is, sent from one point to many receiving points (see Figure 7.23).
For message passing, the send and receive formats are as follows:
Send (destination, message)
Receive (source, message)
This means that the sender needs to know where the message is to be sent to.
Similarly, the receiver should know the source of the message it receives. There are
various mechanisms for message passing, with only minor differences between them. Two of
them are message pipes and mailboxes.
Figure 7.21 | N processes sharing a common area of memory
[Process-1, Process-2, ..., Process-N all access one shared memory area]
Message Pipes
This is a very simple concept which was first implemented in Unix, and is available in
Windows and is part of POSIX compliance.
A pipe is a kernel data structure acting as a FIFO (First In First Out) buffer into
which writing from one end and reading from the other end is possible (see Figure 7.24).
As such it is a unidirectional transfer mechanism. If, between two tasks, bidirectional
data transfer is needed, two pipes are needed, obviously. Creating a pipe is just a matter
of defining the write point and the read point. The exact mechanism is implementation
specific, meaning that Unix may do it in one way and Windows in another way.
Pipes may be anonymous or named. The former belongs to a process or its child
process, and does not exist outside the process. Because of its association with a specific
process, it does not need a name and that is why the terminology ‘anonymous’.
A named pipe is a system concept. When a process creates a pipe, it is available as a
name (similar to a filename) in the system space. A number of pipes, with names (similar
to filenames), are available and any process is free to use any of them.
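The FIFO, one-way nature of an anonymous pipe can be tried out with Python's os.pipe(), which returns a read descriptor and a write descriptor. For brevity this sketch writes and reads within a single process; in normal use the two ends would belong to a parent and its child process. The variable names are our own.

```python
import os

read_end, write_end = os.pipe()     # anonymous pipe: (read fd, write fd)

os.write(write_end, b"first ")      # data goes in at the write end ...
os.write(write_end, b"second")
os.close(write_end)                 # closing the write end marks end-of-data

chunks = []
while True:
    chunk = os.read(read_end, 1024) # ... and comes out of the read end in order
    if not chunk:                   # empty bytes: the writer has closed its end
        break
    chunks.append(chunk)
os.close(read_end)

received = b"".join(chunks)
```

The bytes come out in exactly the order in which they went in. Since data flows only from the write end to the read end, bidirectional transfer would need a second pipe.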
Figure 7.22 | Passing messages from a sender to a receiver process
[Process-1 (sender) passes data to Process-2 (receiver)]
Figure 7.23 | Broadcasting of messages from one process
[P1 multicasts to P2, P3, ..., PN]
Figure 7.24 | Two ends of a message pipe
[Write() at one end of the pipe, Read() at the other end]
Mailboxes
This is an indirect way of sending messages. It is just like a mailbox available in post
offices. The sender deposits the message in the box. The receiver should have the
mechanism to receive it. If this is a common mailbox, many receivers in the same locality can
receive, but a receiver process can take out only messages addressed to it. In the OS
scenario, it is like having a mailbox maintained by the kernel. Thus the mailbox is
common to the whole system, and the operating system manages the mailbox which is in
the kernel space.
Another option is for each process to have its own mailbox in the address space of
the process. This method is less popular. In a real-world analogy, it is like individuals
having their own mailboxes placed in their own premises. We know that this is not done
unless the receiver is a business establishment with the possibility of lots of mail being
sent there.
Transfer Protocol
Since senders and receivers are involved in a communication setup, they have to agree on
a protocol which can be synchronous (blocking) or asynchronous (non-blocking). Let us
see what these terms mean to us here.
i) Blocking sender, blocking receiver: In this, when a sender sends a message, it
expects the receiver to receive it (and maybe acknowledge it). Only then will it send
the next message. Thus, the sender is blocked from further sending until the current
message has been received. The receiver is also blocked: it has to ‘receive’ this
message, and only then will it be sent any more messages. This is a tightly
synchronous system. See Figure 7.25, which shows process 1, the sender, and process 2,
the receiver. In a synchronous system, the sender sends a message and the receiver
receives it (i.e. the execution of the receive process occurs). But only after getting a
response, that is, an acknowledge signal from the receiver, will the sender send again.
At other times, both of them are blocked.
Figure 7.25 | Transfer protocol for synchronous communication

ii) Non-blocking sender, blocking receiver: Here the receiver process waits for a
specific message; only after receiving it will it accept other messages. But the sender
is allowed to send as long as the buffer is not full.
Producer Consumer Paradigm
The sender-receiver communication is a producer-consumer paradigm in which the
consumer can consume only if the producer produces, and the producer need produce
only as long as the consumer consumes it. When this condition does not hold, commu-
nication cannot take place.
Let’s look at it from the side of the producer, that is, the sender. When a sender
sends a message, it can be placed into a buffer for the receiver to pick up when it
wants it. The sender can continue to send until the buffer is full, after which the sender
will have to wait until the receiver receives some messages and the buffer is cleared for
more messages to be sent.
Looking at it from the side of the receiver, the receiver can receive only if the sender
has sent something into the buffer. As the sender keeps on sending data, the receiver can
keep on receiving it (the received data is removed from the buffer). If all the messages
are received and the sender does not send any more messages, the buffer is empty and no
more reception can be done.
This whole scenario is known as the ‘bounded buffer problem’. There should be
a mechanism for synchronizing the activities of the producer and consumer, and this is a
classical OS problem with many kinds of solutions suggested, none of which is perfect
or ideal.
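The two limiting conditions of the bounded buffer (sender must wait when it is full, receiver has nothing to take when it is empty) can be sketched as a small ring buffer in C. The names `bounded_buffer`, `bb_send` and `bb_receive` and the capacity of 4 are assumptions made for illustration. The sketch deliberately contains no synchronization; it only shows the full and empty checks that a synchronized solution has to protect.

```c
#include <stdbool.h>
#include <stddef.h>

#define BUF_SLOTS 4                 /* capacity chosen only for illustration */

typedef struct {
    int    slot[BUF_SLOTS];
    size_t head;                    /* next slot to read               */
    size_t tail;                    /* next slot to write              */
    size_t count;                   /* messages currently in the buffer */
} bounded_buffer;

/* Producer side: returns false when the buffer is full (sender must wait). */
bool bb_send(bounded_buffer *b, int msg)
{
    if (b->count == BUF_SLOTS)
        return false;
    b->slot[b->tail] = msg;
    b->tail = (b->tail + 1) % BUF_SLOTS;
    b->count++;
    return true;
}

/* Consumer side: returns false when the buffer is empty (nothing to receive). */
bool bb_receive(bounded_buffer *b, int *msg)
{
    if (b->count == 0)
        return false;
    *msg = b->slot[b->head];        /* the received message leaves the buffer */
    b->head = (b->head + 1) % BUF_SLOTS;
    b->count--;
    return true;
}
```

If producer and consumer run as separate tasks, `count`, `head` and `tail` become shared variables, which is exactly why the synchronizing mechanism mentioned above is needed.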
Remote Procedure Call
In the cases of IPC discussed so far, the implied assumption is that the communicating
processes are in the same computer. Now consider two physically separate computer
systems that need to communicate. They will need a physical as well as a logical medium to
communicate: the physical medium is either a wired or a wireless network, and the
protocols that govern the communication are what is meant by a logical medium.
The term ‘remote procedure call’ (RPC) extends the concept of local procedure calling, so
that the called procedure can exist in a different address space from that of the
calling procedure. The two processes may be on the same system, but it is more sensible
to consider different systems with a network connecting them. RPC is a standardized
protocol which relieves the users of distributed applications of the necessity
of knowing the finer and intricate details of the networking mechanism. In effect, RPC
serves as a mediator for client/server communications.
Note the definitions of some of the terms involved in RPC.
i) Client: A process, such as a program or task that requests a service provided by
another program. The client process uses the requested service without having to
deal with too many working details about the other program or the service.
ii) Server: A process, such as a program or task, that responds to requests from a client.
iii) Endpoint: The name, port or group of ports on a host system that is monitored by
a server program for incoming client requests. The endpoint is a network-specific
address of a server process for remote procedure calls.

RPC and Its Working
An RPC is similar to a procedure (function) call. When an RPC is made, the calling
arguments are passed to the procedure in the remote machine and the caller waits for a
response to be returned. Figure 7.26 shows the flow of activity between two networked
systems.
The client makes a procedure call that sends a request to the server and waits. The
calling thread is blocked from processing until either a reply is received or the call
times out. When the request arrives, the server calls a dispatch routine that performs the
requested service and sends the reply to the client. After the RPC call is completed, the
client program continues. RPC specifically supports network applications. It presumes the
existence of a low-level transport protocol such as TCP or UDP (transmission control
protocol and user datagram protocol, both of which are commonly used on the Internet) for
carrying the message data between communicating programs. RPC spans the transport layer
and the application layer in the open systems interconnection (OSI) model of network
communication.
RPC is usually adopted for calls between procedures in different machines. But the
mechanism involved may be used in the same machine as well. Microsoft claims that much
of its Windows architecture is composed of services that communicate with each other to
accomplish a task and that these services use RPC to communicate with each other.
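The call, dispatch and reply sequence can be sketched in C without any real networking. Everything here is hypothetical and for illustration only: the message layouts, the procedure numbers and the function names are assumptions, and `transport_call` is a plain function call standing in for the TCP or UDP exchange.

```c
#include <stdint.h>

enum { PROC_ADD = 1, PROC_MUL = 2 };          /* hypothetical procedure numbers */
enum { RPC_OK = 0, RPC_NO_SUCH_PROC = 1 };

typedef struct { uint32_t proc; int32_t a, b; } rpc_request;
typedef struct { uint32_t status; int32_t result; } rpc_reply;

/* Server side: the dispatch routine selects and performs the requested service. */
rpc_reply server_dispatch(const rpc_request *req)
{
    rpc_reply rep = { RPC_OK, 0 };
    switch (req->proc) {
    case PROC_ADD: rep.result = req->a + req->b; break;
    case PROC_MUL: rep.result = req->a * req->b; break;
    default:       rep.status = RPC_NO_SUCH_PROC; break;
    }
    return rep;
}

/* Stand-in for the network. A real system would send the request over the
 * transport protocol and block until the reply arrives or a timeout expires. */
static rpc_reply transport_call(const rpc_request *req)
{
    return server_dispatch(req);
}

/* Client stub: to the caller it looks like an ordinary local function. */
int remote_add(int32_t a, int32_t b, int32_t *out)
{
    rpc_request req = { PROC_ADD, a, b };     /* pack the calling arguments */
    rpc_reply rep = transport_call(&req);     /* caller waits for the reply */
    if (rep.status != RPC_OK)
        return -1;
    *out = rep.result;
    return 0;
}
```

The caller of `remote_add` never sees the request and reply structures, which is the sense in which RPC hides the networking details from the application.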
7.13 | Task Synchronization
We next consider a very important aspect of operating systems. There are three aspects to it:
first, how different tasks with conflicting actions can cause havoc; second, how to
avoid such situations; and third, if such a situation occurs, how to get out of it.
Figure 7.26 | Sequence of actions in RPC

7.13.1 | Parallelism aka Concurrency
From our previous discussion on scheduling, it must be clear that what we try to achieve
on a machine with just one CPU and multiple tasks is pseudo parallelism or concurrency,
the latter being a better word for it. As far as users are concerned, a number of concurrent
tasks are running, using the single available CPU. When such an illusion of parallelism
is presented to the user, it must also be kept in mind that there is ample scope for clashes
and confusion because of the conflicting interests of the concerned tasks. We will now
look into such things which happen as a natural result of the concurrency we are out to
achieve.
7.13.2 | The Race Condition
If we think of two or more tasks, we naturally assume them to be independent and
residing in separate address spaces. But there will be cases when they use global or
shared resources: global variables, or code or data which is defined to be global and
thus allowed to be used by all tasks which need it.
Let’s consider two tasks associated with railway ticket reservation. Task T1 looks
after ‘reservation’ while the other, T2, deals with cancellation. After doing their parts,
the respective tasks update a shared variable named A, which indicates the availability of
seats. It is obvious that T1 decrements A, and T2 increments it. Now, what would happen
in a scheduling system when T2 is interrupted just after it does the ticket cancellation,
and is midway in the computation to increment A?
T1 starts the reservation process. When it tries to do it, the availability that it sees is
not the right number. This is because T1 interrupted T2 before it (T2) could complete the
update of (write into) the shared variable A. Thus, the data read by T1 is wrong. When it
does the reservation, it does it using a wrong value of the shared variable A (the
availability of seats). In a peak season, it is very difficult to get tickets, as we all
have experienced. In an extreme case, the reservation program cannot do a reservation,
because it finds the availability to be 0. Because of the cancellation, tickets are
actually available now, but the reservation task T1 does not know it.
Now the question is, when the interrupt came, why was T2 not able to complete
its updating of A? The update is just a simple addition, isn’t it? The answer is that, in
a high level language it might be just one statement, but we know that one HLL line
corresponds to many low level assembly instructions. So incrementing A can get
interrupted midway. Later, after the interrupting task T1 completes the reservation part, it
decrements the same variable A.
This situation creates an ambiguity, and the value of A at any time depends on how the
sequence of incrementing and decrementing by the two different tasks occurs. This is an
example of what is called a race condition. Additional problems arise when more
than two tasks are in the scene. The most common symptom of a race condition is
unpredictable values of variables that are shared between multiple tasks/threads. This is
because of the unpredictability of the order in which the tasks execute. At one time Task1
wins, at other times Task2 wins. At certain other times, execution works correctly. Also,
if each task is executed separately without any interference, the variable value is always
correct.
The ticket reservation issue is an example of how one task reads a wrong value of the
shared variable and makes a wrong decision based on the value of the shared variable.
If it were a banking problem, and the shared variable is the ‘balance’ in the account of a
customer, it will cause much more serious problems (in the practical world) than in the
case of railway ticket reservation.
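The interleaving described for the reservation example can be replayed deterministically in C. This is a single-threaded sketch, not real multitasking: the local variables `reg_t1` and `reg_t2` are hypothetical stand-ins for the CPU register each task would use, and the function name is an assumption.

```c
/* Shared variable: number of available seats. */
static int A;

/* Replays one cancellation (T2) and one reservation (T1) in the unlucky
 * order described in the text, and returns the final value of A. */
int lost_update_demo(int initial)
{
    A = initial;

    int reg_t2 = A;        /* T2: load A into a register                  */
    reg_t2 = reg_t2 + 1;   /* T2: increment the register copy             */
                           /* T2 is pre-empted here, before it stores     */

    int reg_t1 = A;        /* T1: reads A, which T2 has not yet updated   */
    reg_t1 = reg_t1 - 1;   /* T1: decrement                               */
    A = reg_t1;            /* T1: store the result                        */

                           /* T2 resumes                                  */
    A = reg_t2;            /* T2: store overwrites the update made by T1  */
    return A;
}
```

One cancellation followed by one reservation should leave A unchanged, but with this ordering `lost_update_demo(10)` returns 11: the decrement made by T1 is lost.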
7.13.2.1 | A Printer as a Shared Resource
Let’s try to understand the race problem with another example. Try to visualize this
scene. A print command is given by one task, and printing starts. In between, another
task interrupts and gives another print command. The first printing is stopped
midway, and the printer prints the data given by the second task. You can see what the
printed output will be: it will contain the printed output corresponding to both tasks.
Figure 7.27 illustrates this. The two pages indicate the two tasks which need separate
printed outputs, but because of ‘racing’, it may turn out that the printed page contains
the printed matter of both the tasks.
How could this have been prevented? For that, let’s first try to understand some
terms related to this problem.
7.13.3 | Critical Section
That part of the program where the shared memory/variable/device is accessed is called
the ‘critical section’. Understand that it is the ‘code’ that uses the shared memory/variable
that is the critical section (and not the area of memory). The critical section can be used
by various tasks, as we have just seen. To avoid race conditions and flawed results, one
must identify the code in critical sections in each task/thread. And then, when one task
enters the critical section, it must be ensured that no other task interrupts it and enters it.
7.13.3.1 | A Railway Track with a Critical Section
See Figure 7.28, in which a common part of a railway track is the critical section. The
part of the track which is common to the two railway lines is the ‘critical section’ with
respect to railway traffic. It is obvious that only one train can use the common track at
a time. The train in the other track has to wait. The same is the case with programming
tasks which arrive at the critical section of code. Only one task is allowed to enter the
critical section. Others have to wait.
Figure 7.27 | Printer and two tasks

7.13.3.2 | Solutions to the Problem
There can be hardware solutions to the race condition.
Atomicity
The code for the program in the critical section may be written in a high level language,
but after compilation it is converted to assembly instructions. If the increment and
decrement operations are ‘atomic’, that is, uninterruptible or unbroken, then the race
condition does not occur. Assembly instructions are uninterruptible and therefore ‘atomic’, but
one high level language line corresponds to a number of assembly instructions.
Let us look at an increment operation for a global variable A stored in memory. The
code is
A++;    /* increment A */
When compiled, it is converted to assembly operations (processor dependent), such
as the following:
i) Moving it to a register
ii) Incrementing the content of the register
iii) Copying it back to memory
If, by any chance, this task gets interrupted just after the move operation, another
task uses the ‘wrong’ (unincremented) value of this global variable. This whole sequence
is not atomic, and that is the real problem.
But if the three assembly operations ‘together’ are atomic, the second task will be
able to interrupt the first one only after the global variable has been incremented. This
solution is possible if the processor hardware provides ‘atomic’ instructions for certain
operations (not necessarily the increment and decrement instructions alone). Such a
solution is possible, but processors catering to such requirements are not always
available.
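Where the compiler and processor do provide such support, C11 exposes it through `<stdatomic.h>`. The sketch below assumes a C11 toolchain; the `seats_*` function names are hypothetical. Each update is a single indivisible read-modify-write, so no other task can observe the variable between the load and the store.

```c
#include <stdatomic.h>

static atomic_int seats;            /* shared variable A from the text */

void seats_init(int n)
{
    atomic_store(&seats, n);
}

/* Cancellation: atomic increment. Returns the new number of seats. */
int seats_cancel(void)
{
    return atomic_fetch_add(&seats, 1) + 1;   /* fetch_add returns the old value */
}

/* Reservation: atomic decrement. Returns the new number of seats. */
int seats_reserve(void)
{
    return atomic_fetch_sub(&seats, 1) - 1;   /* fetch_sub returns the old value */
}

int seats_available(void)
{
    return atomic_load(&seats);
}
```

On a processor without suitable instructions the compiler may fall back to a lock internally, so this is a convenience of the language rather than a guarantee about the hardware.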
Disabling Interrupts
The second hardware method is to disable interrupts when a task/thread enters a critical
section. That disallows task/thread switching until the critical section part is done with.
It is okay to do this, but interrupt disabling may affect other resources like I/O devices,
and also affect the scheduling algorithms. Hence, this is not a feasible solution.
Figure 7.28 | A critical section of a railway track

7.13.3.3 | A Practical Solution: Locks
In practice, the actual solution is to lock out the critical section from contenders, once an
operation therein is taken up by a certain task/thread. The important point is that when
one task is executing programs involving shared modifiable data in its critical section,
no other task is to be allowed to enter its critical section. This means that execution of
critical sections by tasks should be mutually exclusive.
A locking mechanism called ‘mutex’ (mutual exclusion lock) is used to synchronize
access to a resource, in this case the ‘critical section’. Only one task (can be a thread also)
can acquire the mutex. This means that there is the idea of ownership associated with
mutex, and only the owner task can release the lock (mutex).
What does the above statement mean?
When a task needs to access a shared resource, it (the task) acquires the lock, and thus
‘owns’ the resource at the current time. When the task completes its use of the critical
section, it releases the lock. Any other task can then take the lock, acquire access to the
resource (keeping others away) and use it.
A simple pseudocode for a task to access a critical section is as shown:
acquireLock();
/* process critical section */
releaseLock();
This shows that a task/thread acquires the lock prior to executing the critical section.
This lock can be acquired by only one task.
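With POSIX threads, the pseudocode above maps onto `pthread_mutex_lock` and `pthread_mutex_unlock`. The following is a sketch under that assumption; the task functions, the constant `UPDATES` and the helper `run_tasks` are hypothetical names. A reservation task and a cancellation task each update the shared variable A the same number of times.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

#define UPDATES 100000                          /* updates per task (illustrative) */

static int A;                                   /* shared: available seats */
static pthread_mutex_t lock_A = PTHREAD_MUTEX_INITIALIZER;

static void *reserve_task(void *arg)            /* T1: decrements A */
{
    (void)arg;
    for (int i = 0; i < UPDATES; i++) {
        pthread_mutex_lock(&lock_A);            /* acquireLock()    */
        A--;                                    /* critical section */
        pthread_mutex_unlock(&lock_A);          /* releaseLock()    */
    }
    return NULL;
}

static void *cancel_task(void *arg)             /* T2: increments A */
{
    (void)arg;
    for (int i = 0; i < UPDATES; i++) {
        pthread_mutex_lock(&lock_A);
        A++;
        pthread_mutex_unlock(&lock_A);
    }
    return NULL;
}

/* Runs both tasks concurrently and returns the final value of A.
 * For simplicity, -1 is used to report a thread creation failure. */
int run_tasks(int initial)
{
    pthread_t t1, t2;
    A = initial;
    if (pthread_create(&t1, NULL, reserve_task, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, cancel_task, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return A;
}
```

Because every update happens inside the lock, the increments and decrements cancel exactly and `run_tasks(50)` returns 50. Without the lock calls, the result could differ from run to run.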
7.13.4 | The Readers-Writers Problem
This is a classical problem in OS, first formulated by Courtois, Heymans and Parnas in 1971.
Now that we know what is meant by a ‘critical section’, we should be able to under-
stand this problem. If a number of readers and writers need access to a shared data area,
how is it to be dealt with? We have already made an assumption that when a critical
section is being used by one task, it is locked for other tasks. There are different types
of tasks and they may be affected by being locked out. One classification of tasks is
into readers and writers and the possible solutions too encompass the differences in the
‘effects’ on these two types of tasks.
7.13.4.1 | Readers
Consider that a reader is reading from the shared area, and before it exits the section,
another task is scheduled which will also need to ‘read’ from the same area. Reading is a
tame process. It does not change the content, and so more readers may be allowed, while
the first reader’s reading is not complete yet. So the reading of a shared section can be
interrupted by another reader. This is one aspect.
7.13.4.2 | Writers
What if a writer now comes in? One way of sorting it out, is to let all readers finish
reading—then only should a writer be allowed in. Once a writer enters its critical
section, (wherein it writes into the shared area), further readers and writers should be
blocked until this writer has completed. The point is that there is no danger in having
several processes read data concurrently. But, writing or modifying data must be done
under mutual exclusion to ensure consistency of data.
The above sequence seems quite logical, but such solutions are prone to
starvation. Giving preference to readers allows them to indefinitely lock out writers, and
giving preference to writers allows them to indefinitely lock out readers. Because of this,
many solutions have been proposed and used according to the approach decided in the
specific OS design. Semaphores (Section 7.14) are used to do signalling between tasks, and
the sequence and priority of readers and writers can be decided according to a pre-decided
OS policy.
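POSIX threads offer a ready-made read-write lock that follows the rule above: many readers together, writers alone. The sketch below assumes a POSIX system; the function names are hypothetical. Whether waiting readers or waiting writers are favoured is left to the implementation, which is the policy question raised above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

static pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER;
static int shared_data;

int read_shared(void)
{
    pthread_rwlock_rdlock(&rw);     /* several readers may hold this at once */
    int v = shared_data;
    pthread_rwlock_unlock(&rw);
    return v;
}

void write_shared(int v)
{
    pthread_rwlock_wrlock(&rw);     /* exclusive: no readers, no other writers */
    shared_data = v;
    pthread_rwlock_unlock(&rw);
}

/* Returns 1 if, while one reader is inside, a second reader is admitted
 * but a writer is refused with EBUSY. */
int readers_share_writer_waits(void)
{
    pthread_rwlock_rdlock(&rw);                         /* first reader enters */
    int second_reader = pthread_rwlock_tryrdlock(&rw);  /* 0 means admitted    */
    int writer = pthread_rwlock_trywrlock(&rw);         /* EBUSY means refused */
    if (second_reader == 0)
        pthread_rwlock_unlock(&rw);                     /* second reader exits */
    pthread_rwlock_unlock(&rw);                         /* first reader exits  */
    return second_reader == 0 && writer == EBUSY;
}
```

The try variants are used in the last function only so that the behaviour can be observed from a single thread without blocking.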
7.13.5 | Deadlocks
This is another term commonly used in OS-related terminology.
What is a deadlock? What are the conditions necessary to consider it as
a deadlock? What are the methods available to handle it?
We have all experienced deadlocks. It is a state in which we feel that nothing gets going.
Everything is blocked. A typical case is a traffic junction (intersection) (see Figure 7.29).
Assume that there are four roads meeting at the junction. Vehicles from all the four
roads are waiting to go across the junction, as a result of which no movement is possible
and it is a state of deadlock.
How does a deadlock occur in an operating system? Say, a task (A) needs to execute.
It gets the CPU, but it also needs a resource (like a printer, display, etc.). But this resource
is now in the possession of another task (B). What then? Task A cannot execute because
of not getting the resource. And Task B cannot execute because of not having the CPU.
Thus, neither Task A nor B gets to execute. This is obviously a deadlock, with neither A
nor B being able to proceed at all.
We can generalize this case by saying that the first task needs a resource that the
second task holds, and the second task needs the resource that the first one currently keeps.
Thus, neither task gets executed. The resources may be either physical (printers, memory,
CPU, etc.) or logical (files, semaphores, etc.).
Figure 7.29 | A deadlocked traffic intersection

Both tasks need both the resources, then only can any one of them execute. But
neither task is willing to let go of the resource that it currently holds. Each one waits for
the other task to relinquish the resource that it holds. This does not happen and that is
the issue with deadlocks.
Definition: A set of tasks is deadlocked if each task in the set is waiting for an event
that only another task in the set can cause. There are four conditions attributed to the
occurrence of deadlocks, designated as ‘necessary conditions’.
i) Mutual exclusion: The resource that is being contended for is assigned to a specific
task and no other task can expect to be allowed to share it. Thus, the resource has
been kept locked by the task in question, which will not let it go.
ii) Hold and wait: This refers to the state in which certain tasks are holding certain
resources, but need more resources to accomplish their execution. Such a task is in a
‘hold and wait’ state.
iii) Non pre-emption: The resources already allocated to some tasks cannot be forcibly
taken away from them (cannot be pre-empted) by the OS. Only when the task
execution is complete will the task release them.
iv) Circular wait: Try visualizing a set of tasks, each of them waiting for a resource held
by the next task in the set. This forms a circular queue of waiting tasks.
The conditions listed above are called the ‘Coffman conditions’, and deadlock occurs on
the combined occurrence of these conditions.
What can be done about deadlocks?
There are various ways of dealing with them. They are as follows:
i) Ignore deadlocks: This is a popular strategy employed by many operating systems
and is called the ‘Ostrich Approach’ (guess why). But it may not be a good solution
for systems which are safety critical.
ii) Avoidance: This is done by taking extra care in resource scheduling. It is just like
having a traffic signal at traffic intersections, to avoid traffic jams.
iii) Prevention: Prevention, they say, is better than cure. Prevention is slightly different
from ‘avoidance’. This is done by negating one of the conditions that cause deadlock.
iv) Detect and recover: Yet another strategy allows deadlocks to occur and then gets
the system to recover from it. At a traffic junction, when a deadlock happens, a
policeman may force the vehicles on one side of the junction to back up, and allow
vehicles from another direction to cross the junction. This implies interference by
the OS for the recovery procedure.
Let’s see how the latter two strategies can be applied to a computer system.
7.13.5.1 | Detect and Recover
In this, the idea is to frequently verify whether a deadlock has occurred. Once it is
detected, deallocation of resources or pre-emption can be resorted to. Take the example
of a bridge and traffic across it, as shown in Figure 7.30. Because the bridge is very
narrow, traffic is allowed only in one direction. A deadlock occurs when two cars from
opposite directions find themselves face to face with each other on the bridge.

In this state, only one solution is possible: one of the vehicles should back up and
clear the bridge. If there is a long line of traffic behind this car, a number of vehicles may
have to back up. In the OS context, this is analogous to the case of detecting a deadlock
and then, certain tasks which have been holding resources are forced to give them up, or
it may be that some task which is using the CPU is pre-empted forcibly.
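The detection step can be sketched as a search for a cycle in a ‘wait-for’ relation, which corresponds to the circular wait condition. The representation below is an assumption made for illustration: each blocked task waits for exactly one other task, recorded in the array `waits_for`, and all entries are valid task indices or `NO_WAIT`.

```c
#include <stdbool.h>

#define NO_WAIT (-1)

/* waits_for[i] is the task holding the resource that task i is blocked on,
 * or NO_WAIT if task i is not blocked. Returns true if some set of tasks
 * is waiting in a circle. */
bool deadlock_detected(const int waits_for[], int ntasks)
{
    for (int start = 0; start < ntasks; start++) {
        int t = waits_for[start];
        /* Follow the chain of waits; a cycle through 'start' closes
         * within ntasks hops. */
        for (int hops = 0; hops < ntasks && t != NO_WAIT; hops++) {
            if (t == start)
                return true;        /* came back to where we began */
            t = waits_for[t];
        }
    }
    return false;
}
```

Once a cycle is found, recovery means choosing one task on the cycle and forcing it to give up its resource, like the car that has to back off the bridge.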
7.13.5.2 | Deadlock Prevention
This strategy looks potentially very simple, but the problem is that a certain amount of
‘effort’ is needed to implement it. This ‘effort’ tends to introduce inefficiency into the
system. Why so? There are four conditions stipulated for deadlock occurrence. If one
of these conditions is negated, deadlock cannot occur. But this will lead to inefficient
utilization of resources because it prevents resources from being allocated in a normal
sequence. It can force pre-emption of a resource held by a task. To prevent deadlock due
to ‘circular wait’, some sequential order must be devised, which may not be optimal.
In operating systems, there is a ‘resource manager’ which has all the information
regarding resources and their allocation. The resource manager, with the help of resource
graphs and other tools, devises a solution.
The Dining Philosophers Problem
This is a classical problem formulated by Dijkstra. The dining philosophers problem is
intended to illustrate the complexities of managing shared resources in a multitasking
environment.
Here there are five brainy Chinese philosophers involved in discussions, who are also
at a dining table wanting to eat. A bowl of rice is placed before each of them,
but each philosopher needs two chopsticks to eat. Chopsticks are a scarce resource and
there is just one chopstick placed between two philosophers (see Figure 7.31). The problem
arises when all the five philosophers want to eat at the same time. Each one picks up the
chopstick at his left, and then looks to the right for the second chopstick. It is not available,
naturally. Thus comes the deadlock, with none of the five philosophers being able to eat, and
the ensuing starvation too.
In the OS context, each philosopher represents a task/thread and each chopstick is
a shared resource.
Can this problem have a complete and perfect solution? The answer is ‘No’. There are
various suggestions for it, some of which are as follows:
i) Have only four such philosophers
ii) Some philosophers are courteous enough to wait
iii) No philosopher eats indefinitely
Figure 7.30

This is just an example of a problem, and solutions may be chosen at the discretion
of the OS designer.
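One further option, not among the suggestions listed above, applies the ‘sequential order’ idea from deadlock prevention: number the chopsticks and make every philosopher pick up the lower-numbered one first, so a circular wait cannot form. The sketch below assumes POSIX threads; `N_PHIL`, `MEALS` and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

#define N_PHIL 5
#define MEALS  1000                       /* meals per philosopher (illustrative) */

static pthread_mutex_t chopstick[N_PHIL];
static int meals_eaten[N_PHIL];

static void *philosopher(void *arg)
{
    int id = *(int *)arg;
    int left = id, right = (id + 1) % N_PHIL;
    /* Fixed global order: always take the lower-numbered chopstick first. */
    int first  = left < right ? left : right;
    int second = left < right ? right : left;

    for (int i = 0; i < MEALS; i++) {
        pthread_mutex_lock(&chopstick[first]);
        pthread_mutex_lock(&chopstick[second]);
        meals_eaten[id]++;                        /* eat */
        pthread_mutex_unlock(&chopstick[second]);
        pthread_mutex_unlock(&chopstick[first]);
    }
    return NULL;
}

/* Seats all philosophers and returns the total number of meals eaten,
 * or -1 if a thread could not be created. */
int dine(void)
{
    pthread_t th[N_PHIL];
    int id[N_PHIL];
    int started = 0, total = 0;

    for (int i = 0; i < N_PHIL; i++) {
        pthread_mutex_init(&chopstick[i], NULL);
        meals_eaten[i] = 0;
    }
    for (int i = 0; i < N_PHIL; i++) {
        id[i] = i;
        if (pthread_create(&th[i], NULL, philosopher, &id[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(th[i], NULL);
    for (int i = 0; i < N_PHIL; i++) {
        total += meals_eaten[i];
        pthread_mutex_destroy(&chopstick[i]);
    }
    return started == N_PHIL ? total : -1;
}
```

The ordering removes deadlock, but it does not by itself guarantee that every philosopher gets a fair share of turns, which is in keeping with the remark that no solution is complete and perfect.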
7.14 | Semaphores
The word ‘semaphore’ means ‘signal’. In operating systems, a variable can be used as a
semaphore which does signalling. But what kind of signalling do we mean? First let us
try to understand a binary semaphore.
7.14.1 | Binary Semaphores
Assume two trains approaching a common area of a track. One train gets to use this track,
and as it does so, it signals to the other train to wait. In this way, the first train acquires the
associated binary semaphore and gives it a value of ‘0’, that is, the semaphore is in the
signalled state. The second train sees this signal (semaphore) and does not attempt to use the
track. It simply waits. Meanwhile, the first train uses the critical section of the track, and
after that, it releases the semaphore, which now has the value ‘1’. The second train finds
that the track is free and uses it, after setting the semaphore to ‘0’ once again. This
prevents other trains from using the same track until the semaphore is released again.
So we see how the signalling occurs. In the case of an OS, if some task wants to
run in a critical section, that task can acquire the corresponding semaphore, and thereby
signal to other tasks that the section is in use. When this task exits the critical section,
the semaphore is released and other waiting tasks can try to use the critical section.
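The train example can be sketched with a POSIX semaphore initialized to 1. This assumes a system that supports unnamed POSIX semaphores (Linux, for example); the `track_*` names are hypothetical. `sem_trywait` is used so that the refusal can be observed without blocking; a train that is prepared to wait would call `sem_wait` instead and sleep until the track is released.

```c
#define _POSIX_C_SOURCE 200809L
#include <semaphore.h>
#include <stdbool.h>

static sem_t track;                 /* 1 = track free, 0 = track in use */

/* Returns 0 on success, as sem_init does. */
int track_init(void)
{
    return sem_init(&track, 0, 1);  /* binary semaphore: initial value 1 */
}

/* A train tries to enter the common track.
 * Returns false if the track is in use and the train must wait. */
bool track_try_enter(void)
{
    return sem_trywait(&track) == 0;    /* on success the value goes 1 -> 0 */
}

/* The train leaves the common track. */
void track_leave(void)
{
    sem_post(&track);                   /* value goes 0 -> 1 */
}
```

The first train to call `track_try_enter` succeeds, a second call fails until `track_leave` has been called, and then the next train can enter.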
7.14.2 | Counting Semaphores
The idea of using semaphores was formulated by Dijkstra. The binary semaphore is only
a special case of a counting semaphore.
Figure 7.31 | Five philosophers at a dining table

To illustrate the idea of a counting semaphore, consider a car park with space for
50 cars. Thus, the semaphore is initialized to 50, meaning that 50 cars can come into the
car park. The semaphore value is decremented or incremented as cars enter and leave the
parking lot. Every time a car enters the car park, the value of the semaphore is
decremented by 1. When a car leaves the parking area, the semaphore is incremented by 1.
When, say, the semaphore value is 20, it implies that there is space for 20 more cars.
When the parking lot is full, the semaphore value is 0, and a new car arriving
cannot park inside: it has to wait until a car leaves and the semaphore value becomes
non-zero again.
Similar is the case of an operating system. This can be used in the bounded buffer
problem. If the bounded buffer (memory with limited size) is divided into a number of
blocks, each block can be accessed by a task (or thread) and the number of tasks accessing
the buffer is limited to the number of blocks. This number can be used as the count N of
the counting semaphore used by tasks to access the different blocks of the buffer. When
N = 0, it implies that no more buffer blocks are available for tasks to access.
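The car park can be sketched with a POSIX counting semaphore whose initial value is the number of free spaces. As before, this assumes a system with unnamed POSIX semaphores, and the function names are hypothetical. The same pattern applies to the buffer blocks: the count N is the number of blocks still available.

```c
#define _POSIX_C_SOURCE 200809L
#include <semaphore.h>
#include <stdbool.h>

static sem_t spaces;                /* value = number of free parking spaces */

/* Opens the car park with the given capacity. Returns 0 on success. */
int park_open(unsigned capacity)
{
    return sem_init(&spaces, 0, capacity);
}

/* A car tries to enter. Returns false when the car park is full. */
bool car_enter(void)
{
    return sem_trywait(&spaces) == 0;   /* decrements the count on success */
}

/* A car leaves, freeing one space. */
void car_leave(void)
{
    sem_post(&spaces);                  /* increments the count */
}

/* Current number of free spaces. */
int spaces_free(void)
{
    int v = 0;
    sem_getvalue(&spaces, &v);
    return v;
}
```

With a capacity of 50, after 30 cars have entered `spaces_free()` reports 20; after 20 more it reports 0 and the next `car_enter()` is refused until some car leaves.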
7.14.3 | Binary Semaphore vs Mutex
Are they the same? Both of them take only the values of 1 and 0. Some operating
systems use them in the same way, but conceptually they are different. A mutex is a lock
on a resource. A task (or thread) takes this lock (mutex is then set to ‘1’), and uses it for
itself, to lock out other possible users from accessing the resource. At a particular time,
the task owns the lock.
After using the resource, the task relinquishes ownership of the lock by setting the
mutex to ‘0’.
A binary semaphore, on the other hand, is used for signalling. When a task takes the
semaphore of a resource (of I/O devices, critical section of code, etc.), it signals to other
tasks that they should not try to access the resource. Once the task completes its action,
it signals to the waiting tasks that the shared resource is free for use.
The binary semaphore acts like a gatekeeper to a room which has only one bed for
sleeping. Once the occupant leaves the room, the gatekeeper indicates (signals) to others
that the room is free. In the OS context, tasks waiting to get access to the resource don’t
have to keep checking for its free status; the semaphore ‘signals’ to them the status. Thus,
the waiting tasks can go to sleep and get awakened by the semaphore. This is a ‘sleep and
wake up’ scenario.
In the case of a mutex, a prospective occupant gets the key of the room from the
‘reception desk’, uses the room and then returns the key to the reception when leaving.
The issue here is that, when others want to use the same room, they have to continually
check at the reception counter as to whether the key has been returned or not. In the
OS context, this puts other waiting tasks in a test and wait loop, leading to inefficiency.
7.15 | Priority Inversion
This is something that is likely to occur in an OS, because of the many conflicting aspects
that arise frequently. Let’s try to visualize a situation that causes this.
A system has multiple tasks which are scheduled according to their priorities. Some
tasks also need access to resources that are shared. Consider three tasks of low, medium
and high priorities with names T1, T2 and T3, respectively. Let’s say that the low priority
task T1 is running and is using a shared resource A by acquiring its semaphore. Task T3
comes in, and because it is a high priority one, can pre-empt T1. But incidentally, T3 also
needs the shared resource held by T1. There is no way to get T1 to release the semaphore,
and T3 can run only after acquiring the semaphore. So T3 waits and we say it is ‘pending
on the resource’.
One direct effect that will occur in a real-time system (Section 8.2) is that T3 might
miss its deadline due to this waiting. But that is not ‘priority inversion’. The inversion
occurs when a medium priority task T2 comes in. T2 does not need the resource, and so it
can easily pre-empt task T1 (because its priority is higher than that of T1). This is
priority inversion, because the medium priority task T2 gets its chance to execute while the
higher priority task is kept waiting. This is a very serious problem which totally upsets
the balance of the system. Let us see what solutions are available.
7.15.1 | Solutions
7.15.1.1 | Priority Inheritance
Here, a lower-priority task inherits the priority of any higher-priority task which is ‘pending
on a resource’ they share. As soon as the high priority task goes into its ‘pending’ state,
the OS should raise the priority of the low priority task to that of the pending task.
How does this help?
If a task of medium priority comes in during the pending period, it will not be able to
pre-empt the currently executing low priority task. Then, once the semaphore is released,
the pending high priority task takes it, and starts executing. At the same time, the low
priority task is put back to its original priority. In this setup, no task of intermediate
priority can come and upset the priority balance.
Figure 7.32 | An illustration of priority inversion (priority plotted against time: T1 locks
resource ‘A’; T3 also wants to access ‘A’, but it is currently locked by T1; since T3 cannot
lock ‘A’, it is blocked; T2 does not want ‘A’; T1 resumes since no other task is ready and
finally releases ‘A’; the interval up to this point is the latency for T3)
7.15.1.2 | Priority Ceiling
This is another solution. In this, each shared resource is given a priority called ‘priority
ceiling’ which is higher than the highest priority of all tasks that can access the resource.
When a task uses this resource, the priority of the task is boosted (if it is a low priority
task) to the ceiling value, thereby ensuring that a task owning a shared resource won’t
be preempted by any other task attempting to access the same resource. When the task
releases the resource, the task is returned to its original priority level.
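The two remedies can be sketched in C as follows. This is only an illustration and not the mechanism of any particular RTOS: the types task_t and resource_t and all three function names are hypothetical, a larger number is assumed to mean a higher priority, and the scheduler itself is not modelled.

```c
#include <stddef.h>

/* Hypothetical types for illustration; larger number = higher priority. */
typedef struct {
    int base_priority;    /* priority assigned at design time */
    int active_priority;  /* priority the scheduler currently uses */
} task_t;

typedef struct {
    task_t *owner;        /* task holding the resource, NULL if free */
    int ceiling;          /* used only by the priority ceiling scheme */
} resource_t;

/* Priority inheritance: called when 'waiter' pends on a resource that is
   already held. The owner is boosted to the waiter's priority if higher. */
void inherit_on_pend(resource_t *r, const task_t *waiter)
{
    if (r->owner != NULL &&
        waiter->active_priority > r->owner->active_priority)
        r->owner->active_priority = waiter->active_priority;
}

/* Priority ceiling: the task is boosted as soon as it takes the resource. */
void ceiling_acquire(resource_t *r, task_t *t)
{
    r->owner = t;
    if (r->ceiling > t->active_priority)
        t->active_priority = r->ceiling;
}

/* Both schemes: on release the owner returns to its original priority. */
void release(resource_t *r)
{
    if (r->owner != NULL) {
        r->owner->active_priority = r->owner->base_priority;
        r->owner = NULL;
    }
}
```

With T1, T2 and T3 at priorities 1, 2 and 3, calling inherit_on_pend when T3 pends on the resource lifts T1 to priority 3, so T2 can no longer pre-empt it; release then drops T1 back to 1.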
7.16 | Device Drivers
The term ‘device driver’ has been used before (Section 7.5.3). Besides this, you are likely to have
heard the term if you have attempted to attach new I/O devices to your PC. The point
is that device drivers are associated with external devices that you want to add to your
processing hardware, which may be a PC or an embedded board.
Think of a USB flash disk that we frequently use with our PC. This device is normally
unknown to the PC’s processor hardware and software. The processor does not
know anything about its signals or its internal configuration. This implies that ‘someone’
should act as an intermediary so that the processor can interact with the flash disk and
allow communication and data transmission between the two seemingly incompatible
entities. Well, a device driver performs the role of this intermediary. It gets the processor
and the flash disk to understand each other and then initiates communication between
the two.
How is this done? Let’s examine this at a conceptual level. The USB is a well-defined
protocol with well-defined modes of data transfer. The flash disk has four pins
through which signals (including power) flow. The processor must be made aware
of the intricate details of the flash disk’s signal flow and the USB protocol. It is too
much for an ordinary user to have to bother about such things, so the driver program
does this.
The same principle applies to any I/O device that is used, right from a printer and
scanner to a digital camera. In present times, when many device drivers are inbuilt in the
OS, these devices have become ‘plug and play’ devices, that is, the user has only the job
of plugging the device in. The OS detects the plugged in device and loads into RAM
the appropriate device driver from the stored programs in the hard disk. When the
device is unplugged, the driver can be unloaded (removed from main memory) until it
is needed again. Thus, we see that device drivers for all possible (and probable) devices
are available in a modern general purpose OS. For an embedded OS, such a large collection
of device drivers is neither available nor needed, because embedded systems are
application specific.
Now let us go a bit deeper and try to answer various questions.
What is a device driver?
It is a software interface to a hardware device that handles requests from the kernel
regarding the use of the particular I/O device. There is a well-defined interface for the
kernel to make these requests. Because of this, adding new devices is easy. In short, there
is a device driver for each and every hardware device that is to be used by the system.
Device drivers can be classified as follows.
i) Block device drivers: This is a driver that is well suited for ‘block devices’ which
transfer data in blocks, for example, disk drives. Such I/O devices have block-sized
buffers as part of the buffer cache in memory.
ii) Character device drivers: As the name signifies, these are used for devices which move
data as ‘characters’, and which therefore do not need a buffer cache. A line printer is one
such device, where data is sent as characters. But such drivers are not limited to
applications that handle data as characters; some devices that handle data in large
chunks are also within this class. The defining characteristic is that data is
transferred directly, without buffering. Most drivers belong to this class.
iii) Network device drivers: This type has to set up and prepare a network for data
transmission and reception, and handle all the intricacies and protocols involved.
But what is this ‘buffer cache’?
Data frequently needs to be read from the disk, and this reading is an extremely slow
process. Sometimes the same data will be read again and again within a short time frame,
and the need to access the disk each time makes the system very slow. To circumvent this
problem, the information is read from the disk only once and then kept in memory
until it is no longer needed. This is called disk buffering, and the
memory used for the purpose is called the ‘buffer cache’. A buffer cache is used in the
case of ‘writing’ as well.
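The read side of a buffer cache can be sketched as below. This is a toy model under stated assumptions: the ‘disk’ is just an array in memory, the cache is direct mapped with two slots, block_no is assumed to be a valid block number, and the names cached_read, disk_reads and cache_slot_t are invented for the illustration.

```c
#include <string.h>

#define BLOCK_SIZE  16   /* bytes per block (kept small for illustration) */
#define DISK_BLOCKS 8    /* size of the simulated disk */
#define CACHE_SLOTS 2    /* number of blocks the cache can hold */

static unsigned char disk[DISK_BLOCKS][BLOCK_SIZE]; /* stands in for the slow device */
static unsigned long disk_reads;                    /* counts the slow accesses */

typedef struct {
    int valid;                       /* 0 until the slot holds a block */
    int block_no;                    /* which disk block is held */
    unsigned char data[BLOCK_SIZE];  /* copy of the block */
} cache_slot_t;

static cache_slot_t cache[CACHE_SLOTS];

/* Return a pointer to the cached copy of a block, reading the disk
   only when the block is not already in memory (a cache miss). */
const unsigned char *cached_read(int block_no)
{
    cache_slot_t *slot = &cache[block_no % CACHE_SLOTS];

    if (!slot->valid || slot->block_no != block_no) {
        memcpy(slot->data, disk[block_no], BLOCK_SIZE); /* the slow path */
        slot->block_no = block_no;
        slot->valid = 1;
        disk_reads++;
    }
    return slot->data;
}
```

Reading block 3 twice in succession touches the simulated disk only once; the second read is served from memory. A real buffer cache would also handle writes and use a better replacement policy.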
How does a hardware device communicate with a processor?
There is the ‘polling’ method by which a processor keeps checking whether a hardware device
needs attention, but since this is very inefficient, mostly it is the interrupt based data transfer
that is done. Whenever an I/O device ‘acts’, an interrupt is generated and an ISR or
interrupt handler is awakened (Section 2.2.9 and Section 7.11). This ISR does the rest
of the job.
7.16.1 | Reading a Key Pressed Using a Keyboard Driver
Consider a keyboard and its associated device driver. Let us examine how, when
a key is pressed, it is identified and made known to the user program (like a document
editor, for instance).
Let this be the scenario. The user (of a PC) opens a document file and wants to type
data into it. When a key is pressed, the ASCII value should reach the user program.
For this, the identity of the key should be found out.
We know that there is a powerful processor in a PC.
But to get it involved in the key detection logic is a big waste of its time. Usually
all keyboards have a dedicated keyboard controller (with a μC in it; this controller
is usually a part of the motherboard chipset). Thus, when a specific key, say M, as in
Figure 7.33 is pressed, its ASCII value is obtained in the data register of the keyboard
controller. The fact that a key has been pressed and that the key code is available is to be
made known to the user program. This ‘knowledge’ is passed through the levels as shown
in Figure 7.33. The keyboard controller generates an interrupt which activates an ISR
which is a part of the device driver code. This ISR places the key code in the buffer at the
device driver level.
But how does the user program view it? The figure summarizes the flow of control
between a user program, the kernel, the device driver and the hardware. The dotted lines
show the flow of control signals, while the solid lines indicate the flow of data. The user
program sends a read request, using a read function (maybe a scanf). This passes through
the kernel and is translated as a ‘system call’ to the device driver. Since the character
(ASCII value of the key pressed) is available in the buffer of the device driver, it is sent
to the user program, which displays it on the video monitor (there is a device driver
associated with the display device, as well).
What role has the device driver played here? Only the device driver software has any
knowledge about the keyboard controller, its hardware signals, the format of the signals
(control and data) it sends and so on. No other part of the OS need be bothered about
trying to know or understand the keyboard controller.
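The ISR-to-buffer-to-read path described above can be sketched as a small ring buffer. This is a simulation, not real keyboard driver code: kbd_data_register is an ordinary variable standing in for the controller's data register, the names keyboard_isr and keyboard_read are hypothetical, and blocking of the caller when no key is available is left out.

```c
#include <stdint.h>

#define KEYBUF_SIZE 8  /* the buffer holds at most KEYBUF_SIZE - 1 key codes */

/* Stand-in for the keyboard controller's data register. On real hardware
   this would be a memory-mapped or port-mapped register. */
static volatile uint8_t kbd_data_register;

static volatile uint8_t keybuf[KEYBUF_SIZE];
static volatile uint8_t head; /* next slot written by the ISR */
static volatile uint8_t tail; /* next slot read by the driver */

/* Runs when the keyboard controller raises its interrupt. */
void keyboard_isr(void)
{
    uint8_t code = kbd_data_register;
    uint8_t next = (uint8_t)((head + 1) % KEYBUF_SIZE);

    if (next != tail) {       /* drop the key if the buffer is full */
        keybuf[head] = code;
        head = next;
    }
}

/* Driver routine reached through the kernel's read call.
   Returns 1 and stores a key code in *out, or 0 if no key is waiting. */
int keyboard_read(uint8_t *out)
{
    if (tail == head)
        return 0;
    *out = keybuf[tail];
    tail = (uint8_t)((tail + 1) % KEYBUF_SIZE);
    return 1;
}
```

Only these two routines touch the (simulated) controller register and the buffer; everything above them sees just the result of keyboard_read.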
Figure 7.33 | The sequence of actions in reading a key pressed (labels in the figure: Keyboard,
Keyboard Controller, Data, INT, Bus, ISR, Buffer, Device Driver, Read Interface, Read Interface
Register, Kernel, Read Call, User Program; the key code ‘M’ is shown at each level)
7.16.2 | Purpose of a Device Driver
Figure 7.34 shows the different layers through which a user process has to go, to use a
particular hardware. The steps involved are as follows:
i) The user process makes a request for I/O access.
ii) The request is channelled through the OS, and is sent to the device driver as a system
call.
iii) The device driver understands what is needed and issues the appropriate request
to the hardware.
iv) The interrupt handler handles the resulting interrupt request, gets the required service from
the I/O device and wakes up the device driver, which then communicates the matter to the user
process.
Let us conclude by saying that the purpose of a device driver is to handle kernel
requests with regard to a specific I/O device, through a well-defined kernel interface. On
one side of this interface is the device-specific driver code, and on the other side is
a set of well-defined system calls. Because of this, adding new devices to a system is an
easy task.
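One common way to realize such an interface in C is a table of function pointers that each driver fills in. The sketch below is an assumption made for illustration and not the interface of any specific kernel: driver_ops_t, driver_register, sys_read and the dummy driver are all invented names.

```c
#include <stddef.h>

/* Hypothetical kernel-side interface: every driver fills in this table. */
typedef struct {
    const char *name;
    int (*open)(void);
    int (*read)(unsigned char *buf, size_t len);        /* bytes read or -1 */
    int (*write)(const unsigned char *buf, size_t len); /* bytes written or -1 */
} driver_ops_t;

#define MAX_DRIVERS 4
static const driver_ops_t *driver_table[MAX_DRIVERS];

/* Adding a new device is just registering its table.
   Returns a device id, or -1 if the table is full. */
int driver_register(const driver_ops_t *ops)
{
    for (int i = 0; i < MAX_DRIVERS; i++) {
        if (driver_table[i] == NULL) {
            driver_table[i] = ops;
            return i;
        }
    }
    return -1;
}

/* Device-independent side: the kernel only forwards the call. */
int sys_read(int dev, unsigned char *buf, size_t len)
{
    if (dev < 0 || dev >= MAX_DRIVERS || driver_table[dev] == NULL ||
        driver_table[dev]->read == NULL)
        return -1;
    return driver_table[dev]->read(buf, len);
}

/* Device-specific side: a dummy driver that always delivers 'M'. */
static int dummy_read(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = 'M';
    return (int)len;
}

const driver_ops_t dummy_driver = { "dummy", NULL, dummy_read, NULL };
```

After driver_register(&dummy_driver) returns an id, sys_read(id, buf, 3) fills buf with three ‘M’ characters without the kernel-side code knowing anything about the device.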
7.16.3 | Device Driver Design
The point is that only the device driver needs to know anything about the hardware, or rather
the device driver should know everything about the device it is made for. Every other
layer in Figure 7.34 is device independent.
Think of designing a parallel port driver. Such a parallel port will have a ‘port
controller’. This controller has control registers, status registers, mode controls, etc. The
device driver should write appropriate bits in all these registers, for the chosen mode of
operation. For a simple parallel port of an 8086 system with the 8255 as
the controller IC, arranging for data transfer in this way is reasonably simple.
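For the 8255, choosing the mode of operation comes down to writing one mode-set control word to its control register. The helper below builds that byte from the standard 8255 bit layout; the function name ppi8255_control_word is made up for this sketch, and the actual write to the control register is left out because the port address depends on the board.

```c
#include <stdint.h>

/* Build the 8255 mode-set control word.
   Bit 7 = 1 selects "mode set"; bits 6-5 = group A mode (0, 1 or 2);
   bit 4 = port A, bit 3 = port C upper, bit 2 = group B mode (0 or 1),
   bit 1 = port B, bit 0 = port C lower.
   For the four direction bits, 1 means input and 0 means output. */
uint8_t ppi8255_control_word(unsigned mode_a, int a_in, int c_upper_in,
                             unsigned mode_b, int b_in, int c_lower_in)
{
    unsigned cw = 0x80u;

    cw |= (mode_a & 0x3u) << 5;
    if (a_in)       cw |= 0x10u;
    if (c_upper_in) cw |= 0x08u;
    cw |= (mode_b & 0x1u) << 2;
    if (b_in)       cw |= 0x02u;
    if (c_lower_in) cw |= 0x01u;
    return (uint8_t)cw;
}
```

For example, mode 0 with all ports as outputs gives 0x80, and mode 0 with only port A as input gives 0x90.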
As devices become more complex, it is likely that there are numerous registers to
configure and many modes to choose from. This makes device driver design complex.
When a new embedded board is designed, drivers for its peripheral hardware and
buses have to be written from scratch, knowing the full details of the processor and other
chips on the board.
Nowadays Linux has become the OS of choice for embedded systems, and writing
device drivers based on Linux is very commonly done. A lot of standard functions are
available for assisting the process of driver development.
Figure 7.34 | Layers associated with a device driver (layers shown: User; Device Independent
I/O Software (Kernel Service); Device Dependent Software (Device Driver); Interrupt Handler;
Hardware; with the I/O Request and I/O Service paths marked)
7.17 | Codes/Pseudo Codes for OS Functions
A deeper understanding of operating systems concepts requires ‘hands on’ experience
of coding. A few sample codes/pseudo codes are presented here.
Note: All codes and pseudo codes discussed here for an ‘RTOS’ are applicable to a GPOS
as well.
7.17.1 | Multitasking
A task is a piece of code which, from its own standpoint, appears to have full control of the
CPU. A task can be thought of as the basic functional unit of an OS user application;
the application is broken into modules whose functionality is described by tasks.
A multitasking system will have several such tasks that are run according to the scheduling
rule used by the system.
Almost all RTOSes and their application programs are written in C. An RTOS task
is thus essentially a C function that returns nothing and usually takes no arguments.
However, tasks may not be called in the same sense as a C function, and they never
return. The common prototype of an RTOS task is as follows:
void some_task (void);
Note that some RTOSes allow parameters to be passed to the task just like how
parameters are passed to normal C functions, but here, for simplicity, we will assume that
a task does not take any parameters.
A task describes a real-time activity, hence it must run as long as the system is active.
Thus, task bodies are almost always infinite loops, within which the functionality of the
task is coded. Thus, a task will look as follows:
void some_task (void)
{
    /* this part runs only once; usually something is
       initialized here */
    for(;;)
    {
        /*
         * code for the task’s functionality comes here
         */
    }
}
The kernel is made aware that some_task is an RTOS task using an API such as
task_create or task_register. The kernel passes control to the task when appropriate.
As stated earlier, a multitasking system will have several such tasks. One might
wonder how they can execute, since each task is essentially an infinite loop. Once started,
a task never stops executing! This is where the kernel does its magic: the kernel is able
to pause execution of a task, and can resume it at a later time.
A task need not run always; usually it will need to wait for an event, or may simply
need to wait for a certain amount of time before it has to run again. Almost all RTOSes
provide an API called sleep or delay, which causes the task to wait for a certain amount
of time (specified as an argument to the API). Thus, when a task calls such an API,
the kernel is notified that the task wants to sleep, and thus control of the CPU can be
transferred to another task. If all tasks in the system are asleep, control is given to a
special task called the System Idle Task or simply Idle Task. The Idle Task does nothing;
the CPU can even be put in a low power mode till a scheduling interrupt arrives. The
Idle Task can also be used to collect information about the RTOS performance (such
as the CPU load). Usually, the Idle Task is not accessible to the application developer.
How is multitasking implemented?
To implement preemptive multitasking, the kernel must be able to pause the execution
of a task, and resume the execution of another task. While this might seem difficult, it
really isn’t so.
Essentially, a task is just a piece of code; it will just be a sequence of CPU instructions
that are to be fetched from memory. Consider the assembly equivalent of two tasks:
task1:
    instruction-1
    instruction-2
    instruction-3
    jmp task1
task2:
    instruction-4
    instruction-5
    instruction-6
    jmp task2
As stated earlier, both tasks are infinite loops. Assume that the CPU has started executing
task1, and is currently processing instruction-2, and a need to run task2 arises. What
needs to be done so that the CPU can remember it was about to execute instruction-3 before
jumping into task2, and hence must return to instruction-3 when task2 is done?
The answer is that the Program Counter (PC) needs to be saved.
If the PC is saved while the CPU is executing instructions of task1, control can be
transferred to task2. When task1 needs to run again, this PC can be re-loaded and hence,
the CPU continues from where it left off. The same is done for task2 as well. Thus for
a system with N tasks, N different storage spaces are needed for the PC, one for each
task. This is similar to a normal subroutine call (say using the CALL instruction), except
that the PC is saved on both sides, and not just in the caller side of the code. The kernel
code will be responsible for saving and restoring the PC.
If this switching is done fast enough, the system will appear to do multiple things
simultaneously, and we say the system is multitasking.
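The idea of one saved PC per task can be tried out with a toy model. Nothing here is real kernel code: the ‘CPU’ is a pair of variables, instructions are just numbers (1 to 3 for task1, 4 to 6 for task2), and cpu_step and switch_to are names invented for this sketch.

```c
#define NUM_TASKS   2
#define TASK_LENGTH 3   /* instructions per task before it jumps back */

static int saved_pc[NUM_TASKS];  /* one storage space for the PC per task */
static int current_task;         /* 0 for task1, 1 for task2 */
static int pc;                   /* the toy CPU's program counter */

/* Execute one instruction of the current task and return its number:
   task 0 runs instructions 1..3, task 1 runs instructions 4..6. */
int cpu_step(void)
{
    int executed = current_task * TASK_LENGTH + pc + 1;

    pc = (pc + 1) % TASK_LENGTH;   /* models the 'jmp' back to the label */
    return executed;
}

/* Kernel: save the PC of the outgoing task, reload that of the incoming one. */
void switch_to(int next_task)
{
    saved_pc[current_task] = pc;
    current_task = next_task;
    pc = saved_pc[next_task];
}
```

If task1 is interrupted after instruction-2, switching to task2 and later back to task1 makes the next cpu_step return 3, exactly where task1 left off.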
Is it enough to save only the PC? The answer is usually no. If the instructions in
each task do not share anything in common, then saving the PC alone is sufficient.
However, almost all programs will require at least the CPU registers to be shared. If
these registers are not saved, their values can get overwritten every time a different task
gets control of the CPU. Thus, when the original task gets a chance to execute, the values
in the CPU registers will not be the same as when it was previously executing, leading
to inexplicable behavior. Thus, along with the PC, the shared registers must be saved as
well. This is again similar to a sub-routine call, where registers that will be modified by
the sub-routine are saved by the caller. Again, in this case, this saving is two-way.
If the code is written in a high-level language, it is difficult to predict what registers
will be used where. Hence, generally, all the CPU registers are saved when a task is to
be paused, and restored when the task resumes. In addition to CPU registers, the Stack
Pointer is also commonly saved (this is a requirement for almost all C based systems),
and in some rare cases, some extra information must be saved as well (such as some
internal compiler symbols). Thus for N tasks, N such copies need to be maintained.
All this information (PC, Registers, Stack Pointer) is called a task’s Context, and the
activity of saving and restoring the context data is called Context Switching.
Where is the context stored? An obvious way is to use a region of memory that is
reserved for this purpose. A more common approach is to use the stack itself.
For a multitasking system, the stack is split into smaller pieces, and each piece is
given to a task. Thus, each task has its own stack area. This means that the context
information can be stored on the stack itself, without it needing to be copied elsewhere. The
advantage is that the save is accomplished by simply pushing the data onto the stack.
The stack pointer alone is saved elsewhere, in a known memory location. To restore the
context, the stack pointer is retrieved, and then, the data can simply be popped off the
stack. This method is employed for most processors (AVR, ARM, etc.).
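A simulation of this stack-based save and restore is sketched below. It models a made-up CPU with four registers and a stack that grows towards higher addresses; cpu_t, context_save and context_restore are hypothetical names, and on a real processor these steps would be written in assembly.

```c
#include <stdint.h>

#define NUM_REGS   4
#define STACK_SIZE 16   /* words of stack given to each task */

typedef struct {          /* a toy CPU */
    uint16_t pc;          /* program counter */
    uint16_t *sp;         /* next free stack slot (stack grows upward here) */
    uint16_t r[NUM_REGS]; /* general purpose registers */
} cpu_t;

/* Push PC and registers onto the running task's own stack; only the
   resulting stack pointer is stored elsewhere (in *saved_sp). */
void context_save(cpu_t *cpu, uint16_t **saved_sp)
{
    *cpu->sp++ = cpu->pc;
    for (int i = 0; i < NUM_REGS; i++)
        *cpu->sp++ = cpu->r[i];
    *saved_sp = cpu->sp;
}

/* Retrieve the stack pointer, then pop everything in reverse order. */
void context_restore(cpu_t *cpu, uint16_t *saved_sp)
{
    cpu->sp = saved_sp;
    for (int i = NUM_REGS - 1; i >= 0; i--)
        cpu->r[i] = *--cpu->sp;
    cpu->pc = *--cpu->sp;
}
```

Each task needs its own array of STACK_SIZE words and one saved stack pointer; after a save on one stack and a restore from another, the toy CPU holds exactly the PC and register values the resumed task had when it was paused.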
For 8051 based systems, things are different. The stack exists in the IRAM portion
of memory (which can be up to 256 bytes). This segment is too small to be partitioned
to give multiple stacks, hence the context data needs to be stored in external RAM
(XRAM). Since the 8051’s PUSH and POP instructions work only on the stack in IRAM, the
entire context needs to be pushed onto the stack and then copied to XRAM. To restore the
context, the context first needs to be copied onto the stack from XRAM, and then popped.
7.17.1.1 | The Kernel
The kernel is the core of an RTOS. Again, the kernel is just a piece of code, but it
does something very important. It is responsible for context switching, and hence,
enabling multitasking. The basic work done by the kernel is shown in the following
pseudo-code:
procedure Kernel is
begin
    Save_Context;
    Schedule;
    Restore_Context;
end procedure;
The first thing the kernel code does is to save the context. This involves pushing
the context data onto the stack. Once this is done, the kernel code is free to use the CPU
registers to execute its own instructions. The kernel then decides which task has to run
next, an activity called Scheduling. Once a new task is scheduled, the kernel must start
to relinquish control. It starts by restoring the context (popping context data from the stack),
and finally returning. This is somewhat similar to a function call, except that the kernel
may not return to the caller.
Since the kernel code needs to access the CPU registers for context save/restore,
that part has to be unavoidably written in assembly, since most high-level languages do
not provide any direct means to access CPU registers or even the stack. The rest of the
kernel code (scheduling) can be written in a high-level language.
The kernel code runs at special points in time called scheduling points. A scheduling
point is simply something that causes the kernel to run; it can be due to an API call
(such as sleep), or a periodic interrupt.
Most systems employ a special timer that generates a periodic interrupt. This timer
is called the heart-beat timer or the tick timer. Every time this timer generates an interrupt,
the kernel code runs. This ensures that no task can take control of the CPU for
longer than permitted.
The kernel can also run when an RTOS API is called. For example, if a task calls sleep,
the kernel code can run so that the context of this task can be saved (the task is paused),
and another task can be given control.
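One job the tick interrupt typically does is to count down the sleep time of sleeping tasks. A minimal sketch, assuming sleep times are kept in ticks in a simple array (the names sleep_ticks, task_sleep_ticks and tick_handler are invented here, and the context switch itself is not shown):

```c
#include <stdint.h>

#define NUM_TASKS 3

static uint16_t sleep_ticks[NUM_TASKS]; /* 0 means the task is ready to run */

/* Called on behalf of a task that wants to sleep for a number of ticks. */
void task_sleep_ticks(int task, uint16_t ticks)
{
    sleep_ticks[task] = ticks;
}

/* Body of the tick-timer interrupt: count down every sleeping task.
   Returns the number of tasks that became ready on this tick, so the
   kernel can tell whether rescheduling is worthwhile. */
int tick_handler(void)
{
    int woken = 0;

    for (int i = 0; i < NUM_TASKS; i++) {
        if (sleep_ticks[i] > 0 && --sleep_ticks[i] == 0)
            woken++;
    }
    return woken;
}
```

A task that asks for 2 ticks of sleep becomes ready on the second call to tick_handler; converting milliseconds to ticks depends on the tick period chosen for the system.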
7.17.1.2 | Task Control Blocks
We have seen how basic multitasking is implemented. What else is required to complete
the implementation? Looking at tasks as high-level entities, we can come up with some
attributes they can have, such as the entry point of the task, state (running, sleeping,
waiting), priority (for priority based systems), sleeping period (to implement sleep) and,
of course, the copy of the stack pointer (or context data).
To make writing kernel code easier, a structure or record is defined with the above
fields as members. This structure is called a task control block (TCB). A TCB is the
personification of a task—it is a way to look at a task in an abstract way (and not just as a
piece of code). After all, the kernel needs to process only these attributes; it need not be
concerned with the actual code of the task itself. A typical TCB looks like this:
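#include <stdint.h>

struct tcb_t
{
    uint8_t *stack_pointer;  /* saved stack pointer of the task */
    uint8_t task_state;      /* running, sleeping or waiting */
    uint8_t task_priority;   /* used by priority based schedulers */
    uint16_t sleep_ticks;    /* remaining sleep time, in timer ticks */
};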
(uint8_t represents an 8-bit unsigned integer, uint16_t represents an unsigned
16-bit integer; the data types used for the structure members depend on the architecture of
the processor. The above example is for a typical 8-bit processor.)
One TCB object is created for each task; it represents that task. These objects are
usually part of an array, called the TCB List. The scheduler works with this TCB list to
decide which task should run next. For a round-robin system, the next task is simply the
next entry in the TCB List. If the current entry is the last one, the first entry is taken
as next. For priority-based systems, scheduling involves finding out which entry in the
TCB list has the highest priority among non-sleeping tasks.
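Both scheduling rules can be written as a short search over the TCB list. The sketch below uses a cut-down TCB with only the two fields the scheduler needs; it assumes that a larger number means a higher priority, and the round-robin version additionally skips entries that are asleep, which goes slightly beyond the simplest ‘next entry’ rule. The function names are hypothetical.

```c
#include <stdint.h>

enum { TASK_READY, TASK_SLEEPING };

typedef struct {
    uint8_t task_state;      /* TASK_READY or TASK_SLEEPING */
    uint8_t task_priority;   /* assumption: larger value = higher priority */
} tcb_t;

/* Round-robin: next ready entry after 'current', wrapping at the end of
   the list. Returns -1 if no task is ready (the idle task would run). */
int schedule_round_robin(const tcb_t *list, int count, int current)
{
    for (int step = 1; step <= count; step++) {
        int i = (current + step) % count;
        if (list[i].task_state == TASK_READY)
            return i;
    }
    return -1;
}

/* Priority based: highest-priority entry among the non-sleeping tasks. */
int schedule_priority(const tcb_t *list, int count)
{
    int best = -1;

    for (int i = 0; i < count; i++) {
        if (list[i].task_state != TASK_READY)
            continue;
        if (best < 0 || list[i].task_priority > list[best].task_priority)
            best = i;
    }
    return best;
}
```

Both functions return an index into the TCB list; the kernel would then restore the context referred to by that entry.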
A Complete Example
Consider a simple RTOS that supports only round-robin scheduling. The RTOS has
the following API:
• os_init: initializes the RTOS
• task_register: registers a task with the kernel
• task_sleep: causes the task to sleep for a specified number of milliseconds
• os_run: starts up multitasking
Since the RTOS supports only round-robin scheduling, all tasks have equal priority.
If two or more tasks are ready at the same time, they are run one after the other.
task_register takes three parameters:
• entry point of the task (address of the task function)
• start of the task's stack
• size of the task's stack
Each task's stack space is allocated statically; the stack is simply an array declared
in the program.
The code for a simple application program: blinking two LEDs connected to two
I/O ports is shown below:
(uint8_t is an unsigned 8-bit integer type)
#include <stdint.h>

/*
 * os_init, task_register, task_sleep, os_run, IOPORT1 and IOPORT2
 * are assumed to be declared in the RTOS and device header files
 */

/* declare the two tasks */
void task1 (void);
void task2 (void);

/*
 * declare the stacks for each task
 *
 * each task has 40 bytes of stack space
 * starting address of the stack is the address of the
 * first element of the array
 */
uint8_t task1_stack[40];
uint8_t task2_stack[40];

int main (void)
{
    /* initialize the RTOS */
    os_init();
    /* register the two tasks with the kernel */
    task_register(task1, task1_stack, 40);
    task_register(task2, task2_stack, 40);
    /* start multitasking */
    os_run();
    return 0; /* will never reach here! */
}

/*
 * define task1
 */
void task1 (void)
{
    for(;;)
    {
        IOPORT1 ^= 1; /* toggle IOPORT1 */
        task_sleep(500);
    }
}

/*
 * define task2
 */
void task2 (void)
{
    for(;;)
    {
        IOPORT2 ^= 1; /* toggle IOPORT2 */
        task_sleep(1000);
    }
}
The application contains two tasks: task1 and task2. Each task toggles an I/O port
and then sleeps for an amount of time (task1 sleeps for 500 ms, task2 for 1000 ms).
Applications written for general purpose operating systems are distinct from the
OS; they are not part of the OS itself. In this case, the operating system binary and
application binary are different.
In contrast, there is no distinction between the RTOS application and the RTOS
itself. Both are linked together to give a single binary image, which is then loaded onto
the memory of the embedded processor. Hence, the RTOS exists as a library of code that
is linked with the application code (similar to how a math library is linked when, say, the
sin function is used). If a new task has to be added, the entire application code has to be
re-compiled and re-linked.
Why use an RTOS?
The real advantage of using an RTOS can be understood from a simple example:
Consider a system that has to do two tasks:
• task1: needs to execute only when an interrupt arrives
• task2: executes irrespective of whether the interrupt arrived or not
Without an RTOS, the code might be written as follows:
volatile uint8_t flag = 0; /* flag to indicate arrival of
                              interrupt */
void interrupt_handler (void)
{
    flag = 1;
}
int main (void)
{
    for(;;)
    {
        /* if interrupt has arrived, execute task1 */
        if (flag)
        {
            flag = 0; /* reset the flag first, so that an interrupt
                         arriving while task1 runs is not lost */
            task1();
        }
        /* task2 runs irrespective of whether the
           interrupt arrived or not */
        task2();
    }
}
Looking at the code, we expect that task1 runs as soon as the interrupt arrives.
However, this is true only if the interrupt arrived before flag is checked in the main loop.
In such a case, the condition passes, and task1 runs as expected.
But what happens when the interrupt comes just after the condition was checked?
Since at the time of checking, flag was zero, task1 was not executed. Instead, task2
has already started to execute when the interrupt arrives. Although flag is now set (by
the interrupt handler), task1 has to wait until task2 finishes, so that the main loop sees
that flag was set and that task1 can run.
An ad hoc solution could be put in place, like checking for flag inside task2, but it
does not really solve the problem. If the interrupts are completely asynchronous to the
system (meaning it is hard to predict when they will arrive), no amount of checking will
help. Things can get worse if another task needs to be added; where should the function
call to that task be put?
If task1 has to run in real-time, that is, run as soon as the interrupt arrives (regardless
of when it arrives), this code is of no use. Assume that the interrupt signals that a
button was pressed, and task1 has to display a menu on a screen. With the above code,
a user will find that the menu appears on the screen after varying amounts of delay:
sometimes it appears immediately, and at other times, it appears after a delay. Such
behaviour can be unacceptable in many cases.
In short, this type of code is timing sensitive; adding or removing pieces of code
will affect the time at which task1 actually executes.
Now let’s see how the same code is written using a multitasking RTOS. Some new
RTOS APIs are introduced here:
• signal: signals that a condition is fulfilled
• wait: causes a task to wait until a condition is fulfilled
• os_interrupt_exit: causes the kernel to run
signal and wait are called Semaphore primitives. They are used for what
their name suggests: signalling between tasks, or between an interrupt handler and a task.
A task waiting on a semaphore will be blocked (paused) until another task signals it.
This can be used for synchronization between two tasks. Note that an interrupt handler
cannot be made to wait; it is always asynchronous to the rest of the code. However, an
interrupt handler can signal a task. A semaphore object is used as the parameter to signal and
wait. This object contains some information about the signal state, what tasks are waiting
on it, etc.
os_interrupt_exit contains some code to properly call the kernel, which reschedules
the tasks. This can cause a signalled task to resume execution.
How are these primitives used to solve the above problem?
Since task1 has to run when the interrupt arrives, the interrupt handler has to signal, and
task1 has to wait for that signal. When the signal does arrive, the kernel sees that task1
should run, and passes control to task1. If task2 is currently executing, it is paused until
task1 completes. When task1 finishes executing, it goes back to waiting for the next
signal, and task2 is resumed.
In this case, task1 always runs as soon as the interrupt arrives; task2 can no longer
affect the time at which task1 executes. Hence, we can say task1 executes in real-time.
The best part is that more tasks can be added or removed easily; the kernel takes care of
scheduling, so that task1 always executes in real-time.
For simplicity, only the relevant task code is shown below; the main function,
initialization of tasks, etc. are omitted.
void interrupt_handler(void)
{
    /* send the signal, s is a semaphore object */
    signal(s);
    /* run the kernel code */
    os_interrupt_exit();
}
void task1 (void)
{
    for(;;)
    {
        /* wait for signal */
        wait(s);
        /*
         * code for processing comes here
         */
    }
}
void task2 (void)
{
    for(;;)
    {
        /* some lengthy computation is done here */
    }
}
Note that simply using an RTOS does not magically turn a non-real-time system
into a real-time system. The facilities provided by the RTOS must be used properly to
ensure real-time behaviour.
7.17.2 | Mutex, Semaphore and Mailbox
Almost all RTOSes provide mutexes, semaphores and mailboxes. Typically, each has
an associated data-type, and a set of API to operate on an object of that data-type. For
example, the mutex implementation may have a data-type called mutex_t and a set of
API such as mutex_lock and mutex_unlock. The exact names of the data-types and API
are dependent on the RTOS, and hence, some general names are used here.
Assume that a particular RTOS provides the following:
i) Semaphore
• data-type: semaphore_t
• API: semaphore_create, semaphore_wait, semaphore_signal
ii) Mutex
• data-type: mutex_t
• API: mutex_create, mutex_lock, mutex_unlock
iii) Mailbox
• data-type: mailbox_t
• API: mailbox_create, mailbox_post, mailbox_wait
Each of these APIs operates on an object (variable) of the associated data-type. For
example, semaphore_signal will require an object of type semaphore_t to be passed to it.
The following pseudo-code examples assume a C like language, where arguments
are passed by value. Since all these APIs will need to modify the internal state of the
object passed to them, the address of the object is passed to the API, instead of the object
itself.
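Why the address has to be passed can be seen from a bare-bones model of a counting semaphore. The API names follow the list above, but the struct layout is an assumption, semaphore_try_wait is a non-blocking stand-in for semaphore_wait (a real RTOS would put the caller to sleep instead of returning 0), and broken_signal exists only to show what goes wrong with pass by value.

```c
/* Assumed internal layout; a real RTOS would also track waiting tasks. */
typedef struct {
    int count;   /* number of signals not yet consumed */
} semaphore_t;

void semaphore_create(semaphore_t *s, int initial)
{
    s->count = initial;
}

void semaphore_signal(semaphore_t *s)
{
    s->count++;
}

/* Non-blocking stand-in for semaphore_wait: returns 1 if the count was
   taken, 0 where a real RTOS would block the calling task. */
int semaphore_try_wait(semaphore_t *s)
{
    if (s->count == 0)
        return 0;
    s->count--;
    return 1;
}

/* Deliberately wrong: 's' is a copy, so the caller's object never changes. */
void broken_signal(semaphore_t s)
{
    s.count++;
}
```

Calling broken_signal(sem) leaves sem.count untouched, whereas semaphore_signal(&sem) increments it; this is the reason every call in such an API takes the address of the object.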
Semaphore Example
/* synchronizing producer-consumer tasks using
   semaphores */
/* number of slots in the queue */
#define N 42
/* semaphore objects to indicate state of queue */
semaphore_t sem_fill, sem_empty;
void producer_task (void)
{
    int item = 0; /* placeholder; a real producer would generate the data */
    for(;;)
    {
        /* wait till queue has at least one empty space */
        semaphore_wait(&sem_empty);
        /* insert an item into the queue */
        insert_to_queue(item);
        /* signal that the queue has been filled with
           one item */
        semaphore_signal(&sem_fill);
    }
}
void consumer_task (void)
{
    int item;
    for(;;)
    {
        /* wait for queue to be filled with at least one
           item */
        semaphore_wait(&sem_fill);
        /* remove item from queue and process it */
        item = remove_from_queue();
        /* signal that an item has been removed */
        semaphore_signal(&sem_empty);
    }
}
int main (void)
{
    /* create the semaphores with initial values */
    semaphore_create(&sem_fill, 0);
    semaphore_create(&sem_empty, N);
    /* code to register the tasks with the kernel and
       startup comes here */
    /* should never reach here! */
    return 0;
}
Here, two semaphores are used: sem_fill and sem_empty.
• sem_fill is initialized with value 0, and indicates the number of items currently in the
queue.
• sem_empty is initialized with value N, and indicates the number of free slots left in
the queue.
The producer task produces items (data) that are put in a common queue. Before an item can be placed in the queue, the task must first ensure that the queue has at least one empty slot. This is done by waiting on the sem_empty semaphore: if this semaphore’s count is zero, the queue is full (i.e. no free slots), and hence the producer task is put to sleep by the RTOS. Once a slot is available, the producer task is automatically awakened by the RTOS. If free slots do exist, execution continues normally. In either case, the item is placed in the queue only when it has a free slot. The task must then indicate that an item has been placed in the queue. This is done by signalling the sem_fill semaphore; this operation simply increments the count of this semaphore.
The operation of the consumer task is similar to that of the producer task, except that the consumer removes items from the queue, and hence the condition is for the queue to be non-empty. This is checked by waiting on the sem_fill semaphore; signalling the sem_empty semaphore then indicates that one item has been removed (one more slot has been made free in the queue).
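The bookkeeping of the two counts can be checked with a small single-threaded model. This is only a sketch and not RTOS code: the names queue_model_t, model_init, model_produce and model_consume are hypothetical, the capacity is reduced to 4 for illustration, and a call that would put the task to sleep in a real RTOS simply returns 0 here.

```c
#include <assert.h>

#define N 4 /* queue capacity, kept small for illustration */

/* Hypothetical model of the two semaphore counts. */
typedef struct {
    int fill;  /* models sem_fill: items currently in the queue */
    int empty; /* models sem_empty: free slots left in the queue */
} queue_model_t;

static void model_init(queue_model_t *q)
{
    q->fill = 0;   /* same initial value as sem_fill */
    q->empty = N;  /* same initial value as sem_empty */
}

/* Returns 1 if the producer proceeds, 0 if it would be put to sleep. */
static int model_produce(queue_model_t *q)
{
    if (q->empty == 0)
        return 0;  /* waiting on sem_empty would block: queue is full */
    q->empty--;    /* effect of the wait on sem_empty */
    q->fill++;     /* effect of the signal on sem_fill */
    assert(q->fill + q->empty == N);
    return 1;
}

/* Returns 1 if the consumer proceeds, 0 if it would be put to sleep. */
static int model_consume(queue_model_t *q)
{
    if (q->fill == 0)
        return 0;  /* waiting on sem_fill would block: queue is empty */
    q->fill--;     /* effect of the wait on sem_fill */
    q->empty++;    /* effect of the signal on sem_empty */
    assert(q->fill + q->empty == N);
    return 1;
}
```

In this model the sum of the two counts always equals the queue capacity, which is the property the producer and consumer rely on.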
Mutex Example
/* guarding a shared serial communication line using mutexes */

/* mutex object representing the serial line */
mutex_t mx_serial;

void task1 (void)
{
    for(;;)
    {
        /* lock the serial port, or wait for lock */
        mutex_lock(&mx_serial);
        /* send the message */
        uart_puts("inside task1");
        /* unlock the serial port */
        mutex_unlock(&mx_serial);
    }
}

void task2 (void)
{
    for(;;)
    {
        /* lock the serial port, or wait for lock */
        mutex_lock(&mx_serial);
        /* send the message */
        uart_puts("inside task2");
        /* unlock the serial port */
        mutex_unlock(&mx_serial);
    }
}

int main (void)
{
    /* create the mutex object */
    mutex_create(&mx_serial);
    /* code to register the tasks with the kernel and startup comes here */
    return 0;
}
Notice that the mutex API calls are symmetric: both tasks should try to lock and
release the mutex object representing the serial port. The serial port itself is not aware
of any mutex guarding it; the entire work is done by the mutex object mx_serial and the
APIs mutex_lock and mutex_unlock. Using mutexes this way ensures that the serial port
is accessed only by one task at a time, thus enforcing mutual exclusion of the serial port
among the tasks.
Mailbox Example
In this example, task1 performs some lengthy calculations and produces a result. This result is further used by task2 to perform some other processing. task1 can send this result using a mailbox, and task2 simply needs to wait on this mailbox. Note that this is similar to using a semaphore, where one task signals and the other simply waits.
The mailbox object has a member called contents, which is usually a generic pointer (void pointer). Hence, to retrieve the data, this pointer has to be cast to the appropriate data-type pointer, and then dereferenced.
/* message passing using mailboxes */

/* declare the mailbox object */
mailbox_t mb;

void task1 (void)
{
    /* result of the calculation (type assumed to be int);
       it must remain valid until task2 has read it */
    static int result;

    for(;;)
    {
        /* perform a lengthy calculation here */
        /* send the address of the result using the mailbox */
        mailbox_post(&mb, &result);
    }
}

void task2 (void)
{
    int data;

    for(;;)
    {
        /* wait for the result from task1 */
        mailbox_wait(&mb);
        /* read and process the results */
        data = *(int *)mb.contents;
        /* continue processing */
    }
}

int main (void)
{
    /* create the mailbox object */
    mailbox_create(&mb);
    /* code to register the tasks with the kernel and startup comes here */
    return 0;
}
A Complete Example
Consider a music player that is driven by an RTOS. Two tasks that might exist are
play_task and glcd_gui_task. play_task simply plays a song in the background, interacting
only with the storage medium (say SD card) and the audio codec. glcd_gui_task runs in
the foreground, always alert in responding to user actions.
Assume that the user selects a new song from the on-screen menu. glcd_gui_task will know of this, and has to send this information (i.e. the new file name) to play_task. This information is passed using a mailbox.
Both tasks send debugging messages through a serial line, and this serial line is
protected using a mutex.
If some I/O error occurs during playback, play_task signals an error handling task to
report the error to the user.
/* declare objects */
semaphore_t sem_error;
mutex_t mx_serial;
mailbox_t mb;
char file_name[13];

void glcd_gui_task (void)
{
    for(;;)
    {
        Display_GUI();
        if(Menu_Button_Pressed)
        {
            Display_Menu();
            Get_Selection(file_name);
            /* send some debug information over serial port */
            mutex_lock(&mx_serial);
            uart_printf("new file: %s", file_name);
            mutex_unlock(&mx_serial);
            /* send the new file name as a mail to play_task */
            mailbox_post(&mb, file_name);
        }
    }
}

void play_task (void)
{
    /* block buffer and result code (types assumed here) */
    unsigned char buffer[512];
    int res;

    for(;;)
    {
        /* wait for mail from glcd_gui_task */
        mailbox_wait(&mb);
        /* mail contents is the file name */
        Open_File(mb.contents);
        /* send some debug information over serial port */
        mutex_lock(&mx_serial);
        uart_printf("got file: %s", file_name);
        mutex_unlock(&mx_serial);
        for(;;)
        {
            /* read block from SD card */
            res = Read_File_Block(buffer, 512);
            /* in case of I/O error, signal the error task */
            if(res != RES_OK)
            {
                semaphore_signal(&sem_error);
                break;
            }
            /* forward the block to the audio codec */
            Forward_Block(buffer);
        }
    }
}

void error_task (void)
{
    for(;;)
    {
        /* wait for I/O error signal */
        semaphore_wait(&sem_error);
        /* send a beep to the speakers to indicate error */
        Beep_Speaker();
    }
}
Some points to remember when using an RTOS
We have seen that an RTOS is very helpful when dealing with timing-related code. However, an RTOS does have some disadvantages:
i) Requires more memory: since each task requires its own stack, stack space can get eaten up quickly as the number of tasks increases. Since the binary image also contains the RTOS code, more ROM is needed.
ii) Overhead: the kernel is also code that needs to execute, and this adds some overhead to application code execution. Typically, 2–5% of execution time is used up by the kernel itself.
For a preemptive RTOS, the stack space for each task is a major factor that affects the memory required. The stack is used to:
i) Store the context information
ii) Hold C automatic variables
iii) Hold copies of registers when an interrupt handler runs
The size of the context information mainly depends on the number of processor registers: it can be quite high for RISC processors (in the order of 25 registers or so). Interrupt handlers are also required to push the registers that they use in their own code, so the stack must accommodate that as well. Things can get complicated with nested interrupts.
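The three uses of the stack listed above suggest a rough way to budget stack space per task. The sketch below is only an illustration under stated assumptions: the type stack_budget_t and the functions task_stack_bytes and total_stack_bytes are hypothetical names, and the simple sum (context, plus automatic variables, plus one interrupt frame per nesting level) is a first approximation, not a formula given by any particular RTOS.

```c
#include <stddef.h>

/* Hypothetical inputs for a rough per-task stack budget, all in bytes. */
typedef struct {
    size_t context_bytes;   /* registers saved on a context switch */
    size_t automatic_bytes; /* worst-case C automatic variables and call frames */
    size_t isr_frame_bytes; /* registers pushed by one interrupt handler */
    unsigned max_nesting;   /* deepest interrupt nesting expected */
} stack_budget_t;

/* Estimate for one task: context + automatics + one ISR frame per nesting level. */
static size_t task_stack_bytes(const stack_budget_t *b)
{
    return b->context_bytes + b->automatic_bytes
         + (size_t)b->max_nesting * b->isr_frame_bytes;
}

/* Estimate for the whole system: every task has its own stack. */
static size_t total_stack_bytes(const stack_budget_t *tasks, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += task_stack_bytes(&tasks[i]);
    return total;
}
```

For example, 25 registers of 4 bytes each give a 100-byte context; with 64 bytes of automatic variables and two nested interrupt frames of 32 bytes each, the estimate for one task is 228 bytes. A real design would add a safety margin on top of such an estimate.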
Conclusion
With this, we come to the end of the chapter on operating systems. Please note that
most of the important concepts have been covered here.
In this chapter, the concepts presented apply to a general purpose operating system. Most of these concepts are used in embedded operating systems also. But additional constraints on time are to be added when an embedded OS is a real-time OS. These aspects are discussed in the next chapter. A list of popular operating systems is given by Table 7.8.
Table 7.8 | List of Some Popular Operating Systems
GPOS              EMBEDDED OS       RTOS
UNIX              SYMBIAN           uC/OS
LINUX VARIANTS    EMBEDDED LINUX    VxWorks
WINDOWS           ANDROID           FreeRTOS
MAC OS X          BADA (SAMSUNG)    RT LINUX
BSD VARIANTS      iOS
                  PALM OS
                  WINDOWS CE
                  TINY OS
An operating system is the super manager of a computer system.
The kernel is the core of the operating system.
A task/process is defined as a program in execution.
A notion of concurrency is achieved by using scheduling algorithms and multitasking.
Scheduling may be pre-emptive or non-preemptive.
Threads are light weight processes.
Interrupts have an important role in operating system activities.
It is necessary for processes to communicate with each other.
IPC is performed using pipes, mailboxes and shared memory.
Synchronizing various OS activities is a matter of priority.
Racing, readers’ writers’ problem, producer consumer issue and deadlocks are matters to
be resolved when designing operating systems.
Semaphores are variables used for signalling.
A binary semaphore and a mutex are not the same.
Priority inversion can occur in systems where mutexes are used and pre-emption is allowed.
Device drivers are an important part of any operating system and are used for interacting
with I/O devices.
Q U E S T I O N S
1. List three reasons why an operating system is desirable for a computer system.
2. List three functions that an operating system performs.
3. With the help of a suitable example, differentiate between protection and security.
4. Name three low level software utilities that you have used.
5. Why do I/O devices need the support of an OS?
6. How is concurrency achieved in a system which has multiple tasks to perform?
7. List two criteria for selecting a scheduling algorithm.
8. Why is pre-emption of tasks sometimes done by an OS?
9. How are threads different from tasks?
10. Name and explain two IPC mechanisms.
11. Distinguish between blocking and non-blocking send and receive in IPC.
12. Think of a practical situation which relates to the producer consumer paradigm.
13. Think of a situation in a computer system, in which ‘racing’ can affect the correctness of
results.
14. How do deadlocks occur? How can they be avoided?
15. Is priority inversion a serious problem? Why?
E X E R C I S E S
1. Three tasks T1, T2 and T3, with service times as shown in Table 7.9, enter the ready queue in the order T1, T2 and T3, respectively. Task T4, with a service time of 50 time units, enters the queue after 50 time units.
Table 7.9
Task No    TS (Time Units)
T1         50
T2         10
T3         70
Calculate the average TAT and waiting time, for the following algorithms
i) Co-operative
ii) Shortest Job Next
2. Three tasks T1, T2 and T3, with service times as shown in Table 7.10, enter the ready queue in the order T1, T2 and T3. Task T4, with a service time of 15 time units, enters the queue after 40 time units.
Table 7.10
Task No    TS (Time Units)
T1         60
T2         50
T3         10
Calculate the average TAT and waiting time, for the following algorithms
i) Co-operative
ii) Shortest Job Next
iii) Pre-emptive SJN (SRT)
3. Three tasks T1, T2 and T3, with service times and priorities as shown in Table 7.11, enter the ready queue in the order T1, T2 and T3, respectively. Task T4, with a service time of 15 time units and priority 1, enters the queue after 40 time units.
Table 7.11
Task No    TS    Priority
T1         60    2
T2         50    3
T3         10    5
Calculate the average TAT and waiting time for the following algorithms
i) Non-preemptive priority-based scheduling
ii) Pre-emptive priority-based scheduling
4. Three processes with service times as shown in Table 7.12 enter the ready queue in the order T3, T1, T2. Calculate their TATs for the FIFO (co-operative) scheduling algorithm.
Table 7.12
Task No    TS (Time Units)
T1         40
T2         55
T3         25
i) If they choose to use the round-robin scheme with a slice time of 10 time units, how
will the scheduling change?
8 | Real-time Operating Systems
• How to define a real-time task
• How to differentiate between soft, hard and firm real-time tasks
• The definitions of periodic, aperiodic and sporadic tasks
• The concept of real-time operating systems and their necessity
• The concepts of real-time scheduling algorithms
• The rate monotonic algorithm
• The earliest deadline first algorithm
• The qualities of a good real-time operating system (RTOS)
Chapter-opening image: An EPROM IC.
Introduction
The previous chapter focused on general purpose operating systems which are used in PCs. In this chapter, we will discuss real-time operating systems, or RTOS, as they are called, which have a rather restrictive domain. But most of the aspects of general purpose OSes are applicable here as well. In this chapter, only the ‘differences’ will be highlighted and discussed.
Before we go into the intricacies of real-time operating systems, let us have a clear definition of what exactly we are talking about. First of all, what do we mean by the term ‘real-time’? The answer is that it is ‘time’ measured by a physical clock. Things have to happen with relation to the actual time by which the universe is timed.
8.1 | Real-time Tasks
What is a real-time task?
It is a task in which the performance is judged on the basis of time; this means that the result of a computation is ‘correct’ only if it is produced within the specified time constraint. Failure to meet the specified time constraint is designated either as ‘system failure’ or as a reduced value for the ‘quality of service’. This ‘temporal’ factor is the distinguishing feature of a real-time task.
Listed as follows are some examples of systems with real-time tasks:
• Process control in industrial plants
• Robotics
• Air traffic control
• Telecommunications
• Weapon guidance system
• Medical diagnostic and life-support systems
• Automobile engine control systems
• Anti-lock braking systems
• Real-time databases
In all these applications, time is an important parameter. Think of air traffic control, where aircraft are guided accurately for landing and navigation, and any delay in getting the right response can result in a disaster in terms of human life. This applies to all the other applications in the above list. A real-time database is one which gets updated continually; for example, the flight data of an aircraft has to be continuously changed with new values of speed, direction, location, altitude and such data. Decisions on the navigation of the craft are made based on this data, and so this data is safety critical. Take another case: a queue of people needing service at a billing counter is not a real-time system, but if a time constraint is added that billing is to be completed within a certain amount of time, then the service for each customer becomes ‘real time’.
What is real-time speech/video processing?
If a speech or moving picture sample of 1 second is processed in 1 second or less, the processing is real-time processing. If the processing takes more than 1 second, it is no longer real-time processing.
Are real-time systems and embedded systems the same?
No. Embedded systems are systems designed for a specific set of applications (refer to Section 1.2). When such a system requires ‘time constrained’ operation, it becomes a real-time embedded system. Not all embedded systems are real-time systems: a printer is not a real-time system, though it belongs to the class of embedded systems, but a robot which counts objects passing on a conveyor belt is a real-time system.
8.1.1 | Terms and Definitions
Now, let’s define some terms which we will be using henceforth.
i) Release time (or ready time): This is the time instant at which a task is ready or
eligible for execution.
ii) Scheduling time: This is the instant of time at which a task gets its chance to
execute.
iii) Completion time: This is the time instant at which a task completes execution.
iv) Deadline: This is the instant of time by which execution of the task should be
completed.
v) Run time: This is the time taken to complete the task without interruption, after the task is released.
vi) Tardiness: This is the amount of time by which a task misses its deadline. It is equal to the difference between the completion time instant and the deadline.
vii) Laxity: This is the time remaining until the deadline minus the remaining computation time. The laxity of a task is the maximum amount of time it can wait, and still meet its deadline.
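The last two definitions translate directly into arithmetic on time instants. The following is a minimal sketch under stated assumptions: the type rt_task_t and the functions tardiness and laxity are hypothetical names, times are plain integers in arbitrary time units, and tardiness is taken as zero when the deadline is met.

```c
/* Hypothetical description of one real-time task, in arbitrary time units. */
typedef struct {
    long release;  /* release (ready) time R */
    long deadline; /* deadline D */
    long run_time; /* uninterrupted execution time needed */
} rt_task_t;

/* Tardiness: completion time minus deadline, or zero if the deadline is met. */
static long tardiness(const rt_task_t *t, long completion)
{
    long late = completion - t->deadline;
    return late > 0 ? late : 0;
}

/* Laxity at time 'now', after 'executed' units of the task have already run:
   time left until the deadline minus the remaining computation time. */
static long laxity(const rt_task_t *t, long now, long executed)
{
    long remaining = t->run_time - executed;
    return (t->deadline - now) - remaining;
}
```

For a task with deadline 20 and run time 8 that has executed for 3 units by time 5, the laxity is (20 - 5) - (8 - 3) = 10: the task can wait up to 10 more time units and still finish by its deadline. A negative laxity means the deadline can no longer be met.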
8.1.2 | Scheme of a Time Constrained Task Execution
Figure 8.1 shows the various time components of a real-time task. Each computation
has a release/ready time R, at which the task becomes available for scheduling. At some
point after the ready time, the task gets scheduled at the scheduling time S, and then
the execution starts. Execution completes at time C, designated as the completion time.
A deadline D is typically associated with the task completion, and the aim is to complete the task before the deadline. Note that the task need not be scheduled just as soon as it is ready. It needs to wait in the ready queue, until it gets its chance to execute.
8.1.3 | Types of Real-time Tasks
We can classify real-time tasks as hard, soft and firm. Suppose there are n tasks in a system, denoted T1, T2, …, Tn. Let the completion time instant and deadline for task Ti be Ci and Di, respectively. We can differentiate between the three types, based on the following criteria.
8.1.3.1 | Hard Real-time Task
Task Ti is a hard real-time task if it is mandatory that Ci ≤ Di. This implies that failure to meet the deadline is a fatal fault. If the task misses the stipulated deadline, it is a failure, which means that the value of completing the task after the deadline is zero; that is, the task is useless now, and the outcome may even be catastrophic. For a hard real-time task, the value of the task instantly reduces to zero at the deadline, as shown in Figure 8.2. The value of the task is plotted on the Y axis, and completion time is plotted on the X axis, where release time R and deadline D are also marked. The task has the constant value V if it is completed any time within the deadline D. After that, its value is zero.
An automobile braking system needs to have a ‘hard’ deadline: the calculation and decision for braking must be made before the expiry of the stipulated deadline, otherwise a collision occurs. Such an application is designated as ‘safety critical’, and all such tasks are hard real-time tasks.
Figure 8.1 | Time components of a real-time task (schematic of a time constrained computation: ready time R, scheduling time S, completion time C and deadline D marked on the time axis T)
8.1.3.2 | Soft Real-time Task
Task Ti is a soft real-time task if there is a penalty P(Ti) associated with it when the condition Ci ≤ Di is not satisfied. The penalty increases as (Ci – Di) increases. The penalty is the inverse of the value of the task. If the completion time extends beyond the deadline, the value of the task starts falling until it is zero. Figure 8.3 illustrates this. Until the deadline is reached, the completion of the task has its full value V; after the deadline, the value goes on reducing with time, till a point (in time) is reached when the result of execution of the task becomes useless, because it has come too late. The penalty and effects associated with missing the deadline are poorly defined and vary, depending on the application.
Imagine a game in which the scores of the players have to be displayed continuously. If the scores are a bit late, no catastrophe occurs, but decisions have to be based on the scores; if scores are output beyond the deadline, we say that the performance of the scoring system is bad. Beyond a limit, the system becomes useless for gaming. Stock market updates are of a similar category, with ‘soft’ deadlines.
Figure 8.2 | A hard real-time task (hard deadline value function: value V against completion time, with release time R and deadline D marked)
Figure 8.3 | The value of a soft real-time task (value V against completion time, with release time R and deadline D marked)
8.1.3.3 | Firm Real-time Task
Task Ti is a firm real-time task if its value reduces to zero when the stipulated deadline is not met. In practice, this means that the output of the task is discarded if the deadline is not met. This is slightly different from a ‘hard’ real-time task, in the sense that here, missing the deadline is not catastrophic; the output of the delayed execution is simply dropped. For such a system to be useful, deadlines may be missed once in a while, but not too frequently.
What could be considered as an example of a ‘firm real-time task’? Think of a
handheld device which receives/delivers video frames. The process of decoding a video
frame may occasionally get delayed by reasons like unexpected interrupts and the like.
When such video frames are delivered late, the playback does not look good. Skipping
such frames and making sure that not too many such frames are delayed will be better,
because the effect might be less noticeable to the user than playing back the delayed
frames.
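The three kinds of task can be contrasted by writing their value as a function of completion time. The sketch below is an illustration under stated assumptions: the names rt_kind_t and task_value are hypothetical, and the soft task is assumed to lose its value linearly over a ‘grace’ interval after the deadline, which is only one possible shape for the falling curve. Hard and firm tasks share the same value function here; they differ in the consequence of a miss (catastrophic versus a discarded output), which a value function alone does not capture.

```c
typedef enum { RT_HARD, RT_SOFT, RT_FIRM } rt_kind_t;

/*
 * Value of a task that completes at time c, with deadline d and full value v.
 * 'grace' is the (assumed) interval after the deadline over which a soft
 * task's value falls linearly to zero; it is ignored for hard and firm tasks.
 */
static double task_value(rt_kind_t kind, double c, double d, double v, double grace)
{
    if (c <= d)
        return v; /* completed within the deadline: full value */

    if (kind == RT_SOFT && grace > 0.0) {
        double remaining = v * (1.0 - (c - d) / grace);
        return remaining > 0.0 ? remaining : 0.0;
    }

    return 0.0; /* hard or firm: no value after the deadline */
}
```

With a deadline of 10, a full value of 1.0 and a grace interval of 5, a soft task completing at 12.5 still has half its value, while a hard or firm task completing at any time after 10 has none.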
8.2 | Real-time Systems
In a system, there will be a number of tasks which will be taken up one by one for processing; do all these jobs need to be ‘time constrained’ to make it a real-time system? The answer is no. A real-time system is composed of many tasks, but there should be at least one task which can be categorized as one of the above three types (hard, soft or firm). Such a system can be designated as a real-time system.
Now let’s see other types of classifications for real-time tasks.
8.3 | Types of Real-time Tasks
8.3.1 | Periodic Tasks
Periodic tasks are real-time tasks which are activated (released) regularly at fixed rates
(periods). Periodic tasks must execute once per period. A very large set of real-time tasks
are periodic. Consider a number of sensors in a chemical plant. These sensor inputs are
sampled at regular intervals which are fixed ‘a priori’. However, the periods of tasks can
be dynamic as well. Figure 8.4 shows a task with period P.
Figure 8.4 | A periodic task (release instants R and scheduling instants S of successive instances at t1, t2, t3 and t4 on the time axis T, with period P)
8.3.2 | Aperiodic Tasks
An aperiodic task is a stream of jobs arriving at irregular intervals. The definition of aperiodic tasks indicates that the inter-arrival period between two such tasks can be zero, which means that two aperiodic tasks can arrive at the same time. Aperiodic tasks are also implied to have ‘soft’ deadlines, and the aim of scheduling them is to provide fast ‘average’ response times. Signals generated by events from various devices in a system can constitute aperiodic tasks. An operator’s command from a surveillance system (like radar) is an example. Along with processing the surveillance signals, this command also has to be taken care of. That is how two signals may arrive at the same time, and it is obvious that hard deadlines are not possible. See Figure 8.5, which shows a task whose instances arrive without any particular timing.
8.3.3 | Sporadic Tasks
A sporadic task is an aperiodic task with a hard deadline and a minimum inter-arrival time (between two such tasks). Without a minimum inter-arrival time restriction, it is impossible to guarantee that a deadline of a sporadic task would always be met. Examples of such tasks are emergency conditions like fire, over speed of critical machinery in a plant, etc.
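The minimum inter-arrival restriction can be pictured as a simple check made each time an event arrives. This is a sketch under stated assumptions, not a scheme prescribed here: the names sporadic_t and sporadic_accept are hypothetical, and what happens to an event that arrives too early (dropped, deferred or logged) is left to the application.

```c
/* Hypothetical bookkeeping for one sporadic task, in arbitrary time units. */
typedef struct {
    long min_interarrival; /* minimum time between two releases */
    long last_release;     /* time of the most recent accepted release */
    int  has_released;     /* 0 until the first release is accepted */
} sporadic_t;

/* Returns 1 if an event arriving at time 'now' respects the minimum
   inter-arrival time and is accepted as a new release, 0 otherwise. */
static int sporadic_accept(sporadic_t *s, long now)
{
    if (s->has_released && now - s->last_release < s->min_interarrival)
        return 0; /* too soon after the previous release */

    s->last_release = now;
    s->has_released = 1;
    return 1;
}
```

Because accepted releases are at least min_interarrival apart, the worst-case processor demand of the task is bounded, which is what makes a deadline guarantee possible.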
8.3.4 | Preemptible/Non-preemptible Tasks
In some real-time scheduling algorithms, a task can be preempted if another task of higher priority becomes ready. In contrast, the execution of a non-preemptible task should be without interruption, once started.
In Figure 8.6, task T1 was executing, and when the higher priority task T2 comes in, the first one is preempted, that is, it is stopped in between, and the higher priority task T2 is allowed to execute.
Now see Figure 8.7, in which T1 is a task which cannot be preempted (because of the nature of the application). So, although the higher priority task T2 comes in, T1 continues till completion.
Figure 8.5 | An aperiodic task (release instants R and scheduling instants S of successive instances on the time axis T, at irregular intervals)
Figure 8.6 | A preemptible task (tasks T1 and T2 against time, with release instant R and scheduling instant S marked)
8.4 | Real-time Operating Systems
Now that you have got an idea of the different types of real-time tasks, it is time to be
introduced to what is meant by a ‘real-time operating system (RTOS)’.
All general purpose computers have operating systems, and the features of such systems have been dealt with in Chapter 7. We note that any system which has a powerful processor, a number of peripherals, different types of memory and multiple users (amounting to multiple tasks) needs a manager. The OS functions as the manager in a PC.
Do embedded systems need an operating system?
Simple embedded applications like printers, scanners, sensor-based home security systems, MP3 players and myriad such applications need only dedicated hardware and firmware. Such embedded systems wait for sensor inputs and use interrupts to tell the CPU that a device needs attention. This is called the superloop-based approach. In this, the whole code for the system is written as one loop which executes continuously. When external events (like key presses, alarms, etc.) occur, interrupts are generated to alert the processor, and then the system responds appropriately. There can be a number of inputs and corresponding actuators, too. The code for the working of all these peripherals resides in the flash memory of the system.
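The superloop-based approach can be sketched in a few lines. The sketch below makes several assumptions: key_isr, superloop_iteration and the counters are hypothetical names, the interrupt handler is an ordinary function that the hardware (or a test) would call, and the sensor polling is reduced to a counter so that the example is self-contained.

```c
#include <stdbool.h>

/* Flag set by the interrupt handler and cleared by the loop. */
static volatile bool key_event = false;

/* Counters standing in for real work, so the behaviour can be observed. */
static unsigned sensor_polls = 0;
static unsigned keys_handled = 0;

/* On real hardware this would be installed as the key-press interrupt handler. */
static void key_isr(void)
{
    key_event = true; /* only record the event; the loop does the work */
}

/* One pass of the superloop body. */
static void superloop_iteration(void)
{
    sensor_polls++; /* stands in for reading sensor inputs */

    if (key_event) {
        key_event = false;
        keys_handled++; /* stands in for responding to the key press */
    }
}
```

In firmware, the main function would perform its initialization and then call superloop_iteration() inside for(;;), so that the loop executes continuously. Keeping the interrupt handler short and doing the real work in the loop is a common choice, but it means the response time to an event depends on how long one pass of the loop takes.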
But there are some classes of embedded systems which are complex and need a
manager. If you visualize a mobile phone as an embedded system, you know that it is
a ‘computer’ with less computing capabilities (than a PC), but more communicating
capabilities—thus, it is a special function computer. It has a number of peripherals like
key pad, display, camera, speakers, mic and so on; it also has memory in the form of
RAM and flash. It needs to support different types of communication protocols. All
this points to the necessity of having a manager for such a system, and that’s why there
are ‘embedded operating systems’. An OS is a necessity in complex embedded systems.
Embedded systems are not limited to handheld devices (though those will be the first to come to our mind). In manufacturing/chemical plants, in communication networks, etc., distributed embedded systems with multiple processors are used. Such systems need operating systems with extra and special features.
In short, operating systems for embedded systems are of greater variety than general purpose OSes. But all embedded operating systems need not be real-time operating systems. Only where time constraint is a factor need we bother about a ‘real-time’ OS. Thus, we may port a Linux kernel onto an embedded processor, but for a real-time application, it will be a real-time Linux kernel that we port.
Figure 8.7 | A non-preemptible task (tasks T1 and T2 against time, with release instants R and scheduling instants S marked)
Most of the aspects of operating systems that we saw in Chapter 7 are relevant and applicable here, but this chapter gives an insight into real-time systems by introducing the aspect of ‘time constraint’. The key difference between a GPOS and an RTOS is in the task scheduling scenario, and that is what is focused on in this chapter. Many scheduling strategies are touched upon and a few are given an in-depth analysis with numerical examples.
What does an RTOS do?
Just like any other OS, an RTOS provides an abstraction layer between the embedded hardware and the application software. See Figure 8.8. This simply means that the users do not have to concern themselves with the hardware; the RTOS manages the interaction between the applications and the hardware. The RTOS ensures that the multiple tasks that come in are managed and done ‘on time’. RTOSes are a necessity in complex embedded real-time systems.
An RTOS has a kernel, which forms the core of the OS; besides that, there are other components in it. Figure 8.9 shows the basic services that must be provided by any RTOS kernel. As we see, managing multiple tasks, that is, task management, is the core issue. This unit takes charge of task scheduling with the timing information given by the timer. In this chapter, we will focus mainly on task scheduling. The other units of the kernel have the same functionality as discussed for general purpose operating systems in Chapter 7.
Figure 8.8 | Hardware–software hierarchy in a complex embedded system (layers: application software, RTOS, hardware)
Figure 8.9 | Kernel services of an RTOS (task management; intertask communication and synchronization; timers; dynamic memory allocation; device I/O supervisor)
8.5 | Real-time Scheduling Algorithms
In the previous chapter, a number of scheduling algorithms for general purpose operating systems were discussed. Some of them are relevant for real-time systems as well, but here we will concentrate on algorithms which are specific to real-time systems.
Let us classify the real-time scheduling methods which are in use. The flow chart in Figure 8.10 gives a classification.
Many real-time systems are multiprocessor systems: networked embedded systems have processors at different locations. But here we discuss only uniprocessor algorithms.
8.5.1 | Off-line Scheduling (Pre-run-time Scheduling)
Off-line schedulers generate scheduling information prior to system execution. The scheduling is based on the knowledge of the release times, deadlines and execution times of all the tasks in the system. This is a deterministic system model. The exact timing characteristics of the tasks are known a priori. With this information, a scheduling algorithm can produce a precise schedule which optimizes one of several different measures, and optimal algorithms can be used, which can guarantee very good system performance. Fixed factory jobs, where nothing changes under normal conditions, can use this approach. The obvious disadvantage is the inflexibility. If the operational mode or any other parameter changes, the schedule will have to be re-done.
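One common way to use a schedule computed before run time is to store it as a table and let a simple dispatcher look up which task owns the current instant. The sketch below is an illustration under stated assumptions: the table contents, the 20-unit repeating cycle and the names slot_t, schedule and task_at are all hypothetical, and task identifier 0 is used to mean ‘idle’.

```c
#include <stddef.h>

/* One entry of a schedule computed before run time (arbitrary time units). */
typedef struct {
    long start;   /* start of the slot within the cycle */
    long length;  /* length of the slot */
    int  task_id; /* task that owns the slot */
} slot_t;

/* Hypothetical schedule that repeats every MAJOR_CYCLE time units. */
#define MAJOR_CYCLE 20
static const slot_t schedule[] = {
    {  0, 5, 1 },
    {  5, 3, 2 },
    { 10, 5, 1 },
    { 15, 2, 3 },
};
#define N_SLOTS (sizeof schedule / sizeof schedule[0])

/* Returns the task that should run at time t (t >= 0), or 0 when idle. */
static int task_at(long t)
{
    long phase = t % MAJOR_CYCLE;

    for (size_t i = 0; i < N_SLOTS; i++) {
        if (phase >= schedule[i].start &&
            phase < schedule[i].start + schedule[i].length)
            return schedule[i].task_id;
    }
    return 0; /* no slot covers this instant */
}
```

The run-time cost is only a table lookup, which reflects the deterministic nature of the approach. The inflexibility is equally visible: any change in the task set means the table has to be recomputed.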
8.5.2 | On-line Scheduling
If the parameters of the tasks and the number and types of tasks that will come are not known a priori, the scheduling policy becomes an ‘on-line’ policy. This is the only method that can be adopted by a system whose task list is not known in advance. Such a scheduler must accommodate dynamic changes in the user demands and availability of resources. But it may not be able to make the best use of all the resources, because of the unpredictable nature of the incoming tasks.
Figure 8.10 | Classification of real-time scheduling methods (real-time scheduling divides into off-line and on-line; on-line divides into static priority, which may be preemptive or non-preemptive, and dynamic priority, which may be planning based or best effort)
8.5.2.1 | Static Priority
Assigning static priorities and scheduling tasks such that those with higher priority get the chance to execute first is one of the most popular scheduling methods. The method can be preemptive or non-preemptive.
Non-preemptive Scheduling: Here the ready task with the highest priority is run until it is completed.
Preemptive Priority-based Execution: The ready task with the highest priority is chosen for execution as in the previous case. But the difference is that execution of any task can be preempted if a task of higher priority becomes ready. Thus, at all times the processor is either idle or executing the ready task with the highest priority.
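The selection rule common to both variants can be sketched in a few lines of C. This is only an illustration: the structure `task_t`, the function name, and the convention that a smaller number means a higher priority are assumptions made for the example.

```c
#include <stddef.h>

typedef struct {
    int priority;   /* assumed convention: smaller number = higher priority */
    int ready;      /* non-zero when the task has work pending */
} task_t;

/* Return the index of the ready task with the highest priority,
   or -1 when no task is ready (the processor idles). */
int pick_highest_ready(const task_t *tasks, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!tasks[i].ready)
            continue;
        if (best < 0 || tasks[i].priority < tasks[best].priority)
            best = (int)i;
    }
    return best;
}
```

A non-preemptive scheduler would call such a function only when the running task finishes; a preemptive one would call it again whenever any task becomes ready.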
8.5.2.2 | Dynamic Priority
This allows priority to be changed at run time, and so scheduling requires more computation. Here also pre-emption may or may not be used. Flexibility is quite high in systems which adopt this method. There are two subsets of this method: planning-based and best effort.
Planning-based techniques guarantee deadlines for all the accepted tasks. The best effort algorithm, as the name indicates, does its best to maximize performance. It will guarantee the deadline of all the hard real-time tasks, while optimizing the performance of soft tasks. There are many algorithms which are used by different RTOSes, but we will concentrate on the most popular of them.
Now that we have had a broad discussion of task scheduling for real-time systems, let us look at some numerical examples. In all our numerical examples, we assume hard real-time systems, in which a set of tasks is said to be schedulable only if every task meets its deadline.
First we consider a simple case of static priority.
Example 8.1
Consider three periodic tasks with priorities, periods and execution time as given in
Table 8.1. Draw Gantt charts corresponding to how these tasks will be scheduled,
assuming that all the jobs have the same release time (i.e. they arrive in the ready queue
at the same time).
Table 8.1
Tasks   Priority   Period   CPU Burst
T1      1          7        2
T2      2          17       4
T3      3          24       8
Schedule the tasks to meet their deadlines
i) Without pre-emption
ii) With pre-emption
Solution
The term ‘CPU burst’ means the time for which the particular task needs the service of the CPU. Each task has a priority, with T1 having the highest and T3 the lowest priority. Since each task is periodic, one burst of the task must be completed before its next period starts.
Figure 8.11 shows the chart with the three tasks shown on separate time axes, representing the scheduling without pre-emption. The ready times of each burst of the tasks are marked with arrows at the respective time instants.
The three tasks are ready at time t = 0. Task T1, then T2 and finally T3 execute, one after the other, because T1 has the highest priority and T3 the lowest. Each task’s burst is completely executed, and only then is the next task taken up. By the time T3 has completed, all the tasks have completed one round of execution. But a few ‘bad’ things have happened within this time.
i) At t = 7, the second burst of T1 arrived, but it was not processed because at that time the CPU was busy with task T3. So T1 has to wait until t = 14 for its second burst to be processed.
ii) At t = 14, the second and third bursts of T1 are both ready, but the CPU must first execute the second burst of T1. So the third burst of T1 will have to wait. In effect, the second burst of T1, released at t = 7, misses its deadline of t = 14.
iii) Because the period of T1 is relatively short, it is the task that suffers the most, even though it is the task with the highest priority.
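The missed deadline described above can be reproduced with a small unit-step simulation. The following is a minimal sketch, assuming integer times, bursts of at least one time unit, all tasks released at t = 0, a deadline equal to the next release, and tasks listed in priority order; the names `task_t` and `first_miss_nonpreemptive` are made up for the illustration.

```c
#define MAX_TASKS 8

typedef struct { int period, burst; } task_t;  /* index 0 = highest priority */

/* Non-preemptive fixed-priority scheduling in steps of one time unit.
   Returns the time of the first deadline miss in (0, horizon], where a
   miss means a job is still unfinished when its task is released again,
   or -1 if no miss occurs. */
int first_miss_nonpreemptive(const task_t *tk, int n, int horizon)
{
    int pending[MAX_TASKS] = {0};  /* released jobs not yet started */
    int running = -1;              /* task currently holding the CPU */
    int left = 0;                  /* time units the running job still needs */

    if (n > MAX_TASKS)
        return -1;
    for (int t = 0; t <= horizon; t++) {
        for (int i = 0; i < n; i++) {
            if (t % tk[i].period == 0) {
                if (t > 0 && (pending[i] > 0 || (running == i && left > 0)))
                    return t;      /* previous job of task i missed */
                pending[i]++;
            }
        }
        if (left == 0) {           /* CPU free: start the best waiting job */
            running = -1;
            for (int i = 0; i < n; i++) {
                if (pending[i] > 0) {
                    running = i;
                    left = tk[i].burst;
                    pending[i]--;
                    break;
                }
            }
        }
        if (running >= 0)
            left--;                /* the job runs for one time unit */
    }
    return -1;
}
```

Called with the task set of Table 8.1, that is `{7, 2}, {17, 4}, {24, 8}`, and a horizon of 24, the function reports the first miss at t = 14, matching point ii) above.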
This problem can be solved by using pre-emptive tactics. See Table 8.1. The second arrival of T3 is only at t = 24, and therefore its first burst need not be completed in a hurry. Whenever a burst of T1 arrives, T3 can be preempted and T1 can be processed. This is quite logical as T1 is a higher priority task.
Figure 8.11 | Scheduling the tasks in Table 8.1 using static priority without pre-emption
(Gantt chart, t = 0 to 24, with T1, T2 and T3 on separate time axes; the chart carries the annotation ‘Two Bursts of T1’.)
See Figures 8.12a and b.
i) T1 is taken up at t = 0. It completes at t = 2.
ii) T2 is taken up at t = 2. It completes at t = 6.
iii) T3 is taken up at t = 6. It is preempted at t = 7, because the higher priority T1 arrives again at t = 7.
iv) At t = 9, when the second burst of T1 completes, T3 is again taken up for execution, and is able to execute up to t = 14.
v) At t = 14, the third burst of T1 is ready. So T3 is preempted once again.
vi) T3 gets its next chance to execute at t = 16, after T1 completes its third burst.
vii) At t = 17, the second burst of T2 is ready and once again, T3 is preempted.
viii) T3 gets its chance again only at t = 23, after T2 completes its second burst and T1 its fourth burst.
ix) T3 completes, with its last instance from t = 23 to 24, and is able to meet its deadline.
Note that neither T1 nor T2 had to be pre-empted, but T3 had to be, three times. T3 is taken up only when neither T1 nor T2 needs to be processed. This is fine as T3 is the lowest priority task. T3 is completed, but in parts. The deadline of T3 (i.e. 24) is not violated, as the last part of T3 completes exactly at t = 24.
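The nine steps above can be checked mechanically. The sketch below is an illustrative unit-step simulation of preemptive static-priority scheduling; it assumes integer times, simultaneous release at t = 0, deadlines equal to the next release, and tasks listed in priority order. The names `task_t` and `simulate_fp_preemptive` are hypothetical.

```c
#define MAX_TASKS 8

typedef struct { int period, burst; } task_t;  /* index 0 = highest priority */

/* Fill out[0..horizon-1] with '1' + index of the task running in each
   time unit, or '-' when the CPU is idle; out needs horizon + 1 chars.
   Returns the number of deadline misses seen at releases before 'horizon'
   (a job still unfinished when its task is released again). */
int simulate_fp_preemptive(const task_t *tk, int n, int horizon, char *out)
{
    int left[MAX_TASKS] = {0};   /* remaining work per task */
    int misses = 0;

    if (n > MAX_TASKS)
        return -1;
    for (int t = 0; t < horizon; t++) {
        for (int i = 0; i < n; i++) {
            if (t % tk[i].period == 0) {
                if (left[i] > 0)
                    misses++;            /* previous job overran */
                left[i] += tk[i].burst;  /* new job joins any leftover work */
            }
        }
        int run = -1;
        for (int i = 0; i < n; i++) {    /* highest-priority task with work */
            if (left[i] > 0) { run = i; break; }
        }
        if (run >= 0)
            left[run]--;
        out[t] = (run >= 0) ? (char)('1' + run) : '-';
    }
    out[horizon] = '\0';
    return misses;
}
```

For the task set of Table 8.1 and a horizon of 24, the timeline string comes out as `112222311333331132222113` with no misses: T3 runs in the units starting at t = 6, 9 to 13, 16 and 23, exactly as in steps iii) to ix).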
Figure 8.12a shows the time axes for the three tasks separately. The same information can be marked on a single time axis, as in Figure 8.12b. The first figure is easier for visualizing the schedule, while the second one is easier when you have to draw scheduling diagrams. One point you will note is that the CPU is not idle at any time up to t = 24.
Figure 8.12a | Scheduling the tasks in Table 8.1 using static priority with pre-emption (Figure drawn with tasks on separate time axes)
Figure 8.12b | Scheduling the tasks in Table 8.1 using static priority with pre-emption (Figure drawn with tasks on the same time axis)
What is implicitly assumed in this example is that the deadline of a periodic task is the time when its next ‘burst’ arrives. Effectively, this means that the period of the task is also its deadline. This is the approach used in one of the most important real-time task scheduling algorithms, namely ‘Rate Monotonic Scheduling’, which we will discuss next.
8.6 | Rate Monotonic Algorithm
Rate Monotonic Theory
The notion of rate monotonic scheduling was first introduced by Liu and Layland in 1973. The term ‘rate monotonic (RM)’ derives from the method of assigning priorities to a set of processes: priorities are assigned as a monotonic function of the rate of a (periodic) process. The term ‘monotonic’ describes a set of values that is either always increasing or always decreasing.
Here, priorities are assigned according to the period of a process: as the period increases, the priority decreases. This means that the process with the shortest period gets the highest priority.
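The assignment rule amounts to sorting the tasks by period. A minimal sketch in C follows; the structure layout and the function names are assumptions for illustration, and tasks with equal periods may end up in either order because `qsort` is not guaranteed to be stable.

```c
#include <stdlib.h>

typedef struct {
    int id;        /* task number, e.g. 1 for T1 */
    int period;
    int priority;  /* filled in by assign_rm_priorities: 1 = highest */
} task_t;

/* qsort comparator: shorter period first. */
static int by_period(const void *a, const void *b)
{
    const task_t *x = a;
    const task_t *y = b;
    return (x->period > y->period) - (x->period < y->period);
}

/* Rate monotonic assignment: sort by increasing period, then number the
   priorities 1..n in that order. */
void assign_rm_priorities(task_t *tasks, size_t n)
{
    qsort(tasks, n, sizeof tasks[0], by_period);
    for (size_t i = 0; i < n; i++)
        tasks[i].priority = (int)i + 1;
}
```

For example, three tasks with periods 15, 12 and 20 would receive priorities 2, 1 and 3 respectively.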
With this rule for assigning priorities, rate monotonic scheduling theory provides the following simple inequality (8.1), which is a sufficient condition for schedulability using the RM algorithm.
$$\sum_{i=1}^{n} \frac{C_i}{P_i} \le n\left(2^{1/n} - 1\right) \qquad (8.1)$$
The LHS of the inequality is the total CPU utilization for n tasks, where Ci is the execution time (CPU burst) and Pi is the period of task Ti. If this condition is satisfied, the RM algorithm will be able to schedule the tasks within their respective deadlines. This is a sufficient condition, but not a necessary condition. This implies that sets of tasks which satisfy this inequality are definitely schedulable, but there can be sets of tasks which do not satisfy this condition and are still schedulable.
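Inequality (8.1) is easy to evaluate in code. The sketch below is one possible way to do it; the function names are invented for the example, and the n-th root of 2 is computed with Newton's method simply to keep the example free of any math-library dependency.

```c
/* n-th root of 2 by Newton's method on f(x) = x^n - 2. */
static double root2(int n)
{
    double x = 1.0 + 1.0 / n;        /* starting guess at or above the root */
    for (int k = 0; k < 60; k++) {
        double p = 1.0;              /* p becomes x^(n-1) */
        for (int j = 0; j < n - 1; j++)
            p *= x;
        x -= (p * x - 2.0) / (n * p);
    }
    return x;
}

/* RHS of inequality (8.1): n(2^(1/n) - 1). */
double rm_bound(int n)
{
    return n * (root2(n) - 1.0);
}

/* LHS of inequality (8.1): sum of Ci/Pi. */
double utilization(const int *burst, const int *period, int n)
{
    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += (double)burst[i] / period[i];
    return u;
}

/* Returns 1 when the sufficient test passes. A return of 0 only means the
   test is inconclusive, not that the set is unschedulable. */
int rm_sufficient(const int *burst, const int *period, int n)
{
    return utilization(burst, period, n) <= rm_bound(n);
}
```

For the tasks of Table 8.1 (bursts 2, 4, 8 and periods 7, 17, 24) the test returns 0, even though the pre-emptive schedule worked out in Example 8.1 meets every deadline; this is the practical meaning of "sufficient but not necessary".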
In Table 8.2, the value of the RHS of the inequality (8.1) is calculated for various values of n, the number of tasks in the set. As n tends to infinity, the RHS approaches ln 2 (about 0.693).
Table 8.2 | Rate Monotonic Schedulable Bound (RHS of Inequality 8.1)
Task Set Size (n)   Schedulable Bound
1                   1
2                   0.828
3                   0.780
4                   0.757
5                   0.743
6                   0.735
...                 ...
infinity            ln 2
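The limiting value in the last row can be seen by expanding the exponential in a series. The following short derivation is added for illustration and uses only the RHS of inequality (8.1):

```latex
\[
n\left(2^{1/n}-1\right)
  = n\left(e^{\ln 2/n}-1\right)
  = n\left(\frac{\ln 2}{n}+\frac{(\ln 2)^2}{2n^2}+\cdots\right)
  = \ln 2+\frac{(\ln 2)^2}{2n}+\cdots
  \;\longrightarrow\; \ln 2 \approx 0.693 \quad (n\to\infty)
\]
```

The correction term is positive and shrinks as n grows, which is why the bound in Table 8.2 decreases steadily towards ln 2 without ever falling below it.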
Let’s find the CPU utilization for the set of tasks in Example 8.1.
The LHS is C1/P1 + C2/P2 + C3/P3 = 2/7 + 4/17 + 8/24 = 0.854
The RHS is 3(2^(1/3) − 1) = 3 × 0.26 = 0.78 (see Table 8.2 for n = 3)
We see that the sufficient condition for the RM algorithm is not satisfied. But this task set might still be schedulable by the RM technique. Let us find out.
Looking closely, we find that the second part of Example 8.1 is an implementation of the RM algorithm itself, because the priorities decrease as the periods increase. T1 has the shortest period and therefore the highest priority. T3 has the longest period and the lowest priority. We see that for this set of tasks, the RM algorithm was able to schedule the tasks using pre-emption, the condition being that, at any time, the ready task of the highest priority is given the CPU. Thus, Figure 8.12 corresponds to the RM algorithm for the task set in Table 8.1. The RM algorithm is a static priority scheme with pre-emption.
Example 8.2
For the task set in Table 8.3,find the CPU utilization,and verify whether it is schedulable
using the RM algorithm.
Table 8.3
Tasks   Period   CPU Burst
T1      12       5
T2      7        3
Solution
CPU utilization = 5/12 + 3/7 = 0.845
For two tasks, the RHS of the inequality (8.1) is 0.828 (Table 8.2).
Since LHS > RHS, the sufficient condition for schedulability is not satisfied. But this task set is still schedulable using the RM algorithm, as shown in Figures 8.13a and b. Task T2 has the higher priority (shorter period) and so its burst is completed first. Next T1 is taken up at t = 3. But at t = 7, the second burst of T2 arrives. So T1 is pre-empted, and T2 is taken up. Once T2 completes at t = 10, T1 is again taken up for execution, and it completes at t = 11, before its deadline of t = 12.
Figure 8.13a | Scheduling the tasks in Table 8.3 using the rate monotonic algorithm
(Gantt chart with T1 and T2 on separate time axes.)
Thus, we find that both tasks are able to finish before their deadlines. But the processor is idle for certain periods (in the figure, the idle period is from t = 11 to t = 12).
Figure 8.13b | Scheduling the tasks in Table 8.3 using the rate monotonic algorithm
(Gantt chart with T1 and T2 on the same time axis.)
Example 8.3
For the task set given in Table 8.4, find the CPU utilization, and find out whether it is schedulable using the RM algorithm.
Table 8.4
Tasks   Period   CPU Burst
T1      15       4
T2      12       2
T3      20       5
Solution
CPU utilization = 4/15 + 2/12 + 5/20 = 0.683.
The schedulability bound for three tasks is 0.780 (Table 8.2).
Since LHS < RHS (of inequality 8.1), that is, the ‘sufficient’ condition is satisfied, this task set is definitely schedulable using the RM algorithm. See Figures 8.14a and b. Note that T2 is the highest priority process and so it is executed first. The figures show that the CPU is idle for quite long periods, an ideal situation for the RM algorithm. There was also no need to pre-empt any task before t = 24. The low CPU utilization made these aspects possible.
Figure 8.14a | Scheduling the tasks in Table 8.4 using the rate monotonic algorithm
(Gantt chart with T1, T2 and T3 on separate time axes.)
Example 8.4
See Table 8.5, which shows three tasks with different release times. This means that not all the tasks are ready at time t = 0. The next burst of each task is to be calculated with respect to its first release time. Try scheduling this with the RM algorithm.
Table 8.5
Tasks   Period   CPU Burst   Release Time
T1      3        1           0
T2      10       3           1
T3      15       4           3
Solution
Figure 8.15 shows that the scheduling is such that tasks T2 and T3 have to be preempted when the burst of the high priority task T1 arrives, every three time units.
Figure 8.14b | Scheduling the tasks in Table 8.4 using the rate monotonic algorithm
(Gantt chart with T1, T2 and T3 on the same time axis.)
Figure 8.15 | Scheduling the tasks in Table 8.5 using the rate monotonic algorithm
(Gantt chart with T1, T2 and T3 on separate time axes.)
Note that the release times of the tasks T2 and T3 are marked at t = 1 and 3, respectively. As the figure shows, this task set is schedulable.
Note: There is also an exact (necessary and sufficient) condition for verifying whether a task set is schedulable by the RM algorithm. Since it involves a lot of calculation, it has not been included here. For now, we will be satisfied with the sufficient condition. One thing that might strike us is that as the number of tasks increases, the CPU utilization has to reduce (Table 8.2) to ensure that the set is ‘schedulable’ by this condition. This is inefficient, because maximizing CPU utilization is one of the aims of any scheduling scheme, whereas the RM algorithm relies on low CPU utilization to guarantee schedulability.
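The text does not say which exact test it has in mind. One widely used candidate is response-time analysis, sketched below under the assumptions used throughout this chapter: integer times, all tasks released together, and a deadline equal to the period. The worst-case response time R of task i is found by iterating R = Ci + sum over higher-priority tasks j of ceil(R/Pj) × Cj until it stops changing. All names in the code are illustrative.

```c
typedef struct { int period, burst; } task_t;  /* index 0 = shortest period */

/* Worst-case response time of task i under rate monotonic priorities,
   or -1 if it would exceed the task's period (its deadline). */
int response_time(const task_t *tk, int i)
{
    int r = tk[i].burst;
    for (;;) {
        int next = tk[i].burst;
        for (int j = 0; j < i; j++) {
            /* ceil(r / Pj) releases of task j can interfere within r */
            next += ((r + tk[j].period - 1) / tk[j].period) * tk[j].burst;
        }
        if (next > tk[i].period)
            return -1;      /* deadline cannot be met */
        if (next == r)
            return r;       /* fixed point reached */
        r = next;
    }
}

/* Returns 1 if every task meets its deadline, 0 otherwise. */
int rm_schedulable(const task_t *tk, int n)
{
    for (int i = 0; i < n; i++) {
        if (response_time(tk, i) < 0)
            return 0;
    }
    return 1;
}
```

Applied to Table 8.3 (ordered by period as `{7, 3}, {12, 5}`), the response time of T1 comes out as 11, the completion time found in Example 8.2. For Table 8.1, T3 gets a response time of exactly 24, which agrees with the Gantt chart of Figure 8.12.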
8.7 | The Earliest Deadline First Algorithm
This belongs to the class of dynamic priority allocation methods. Here the ‘priority’ of tasks changes at run time. At any instant, the highest priority task is the one that has the closest deadline. Tasks that cannot be scheduled by the RM algorithm (because of high CPU utilization) may be schedulable by this method. Look at the task set in Table 8.6.
Example 8.5
Table 8.6
Tasks   Period   CPU Burst
T1      4        1
T2      6        2
T3      9        4
Solution
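The CPU utilization is 1/4 + 2/6 + 4/9 = 37/36 ≈ 1.03. Since this is marginally above 1, the schedules that follow are examined only over the interval drawn in the figures (t = 0 to 24).
We find that this set cannot be scheduled by the RM scheme. See Figure 8.16, which shows that task T3 is unable to complete within its deadline of t = 9.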
Figure 8.16 | Scheduling the tasks in Table 8.6 using the rate monotonic algorithm
(Gantt chart with T1, T2 and T3 on separate time axes; a ‘Missed Deadline’ is marked.)
But we will see that, over the interval shown, it can easily be scheduled using the EDF technique (Figure 8.17). The idea is that at any time, the task which has the nearest deadline is to be scheduled, rather than the task whose new burst has just been released. See the steps of scheduling with respect to Figure 8.17.
i) At t = 0, task T1 is taken up for execution and then T2 is taken up at t = 1.
ii) T2 completes at t = 3, and then T3 starts.
iii) At t = 4, the second burst of T1 appears. Since T1 has the earlier deadline (at t = 8, against t = 9 for T3), it is taken up first, and then T3 resumes at t = 5.
iv) Meanwhile, the second burst of T2 arrives at t = 6, but it will not be serviced now because the deadline of T3 is nearer (at t = 9); the second burst of T2 need be completed only by t = 12. So T3 continues till completion at t = 8.
v) At t = 8, there is a tie because the third burst of T1 is ready, as well as the second burst of T2, and both have the same deadline at t = 12. You can see that either of them can be taken up, and both of them are able to meet their deadlines.
vi) At the end of the interval shown, we see that none of the tasks has missed a deadline.
Thus, over this interval the EDF algorithm meets the deadlines that the static priority RM technique missed. The necessary condition for schedulability with the EDF algorithm is that the CPU utilization must not exceed 1; for periodic tasks whose deadlines equal their periods, this condition is also sufficient. There is no need to assume that tasks are periodic: aperiodic and sporadic tasks can also be scheduled with this.
EDF is an optimal algorithm, in the sense that if a task set can be scheduled by any algorithm on a single processor, EDF will also be able to schedule it.
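The EDF rule can be tried out with a unit-step simulation. The sketch below is illustrative only: it assumes integer times, periodic tasks all released at t = 0, deadlines equal to the next release, and ties broken in favour of the lower task index. The names `task_t` and `edf_first_miss` are made up for the example.

```c
#define MAX_TASKS 8

typedef struct { int period, burst; } task_t;

/* Earliest-deadline-first scheduling in steps of one time unit.
   Returns the time of the first deadline miss in (0, horizon], meaning a
   job is still unfinished when its task is released again, or -1 if no
   miss occurs within the horizon. */
int edf_first_miss(const task_t *tk, int n, int horizon)
{
    int left[MAX_TASKS] = {0};       /* remaining work of the current job */
    int deadline[MAX_TASKS] = {0};   /* absolute deadline of that job */

    if (n > MAX_TASKS)
        return -1;
    for (int t = 0; t <= horizon; t++) {
        for (int i = 0; i < n; i++) {
            if (t % tk[i].period == 0) {
                if (left[i] > 0)
                    return t;        /* previous job missed its deadline */
                left[i] = tk[i].burst;
                deadline[i] = t + tk[i].period;
            }
        }
        int run = -1;                /* ready job with the nearest deadline */
        for (int i = 0; i < n; i++) {
            if (left[i] > 0 && (run < 0 || deadline[i] < deadline[run]))
                run = i;
        }
        if (run >= 0)
            left[run]--;
    }
    return -1;
}
```

For the task set of Table 8.6, the function reports no miss up to t = 24, the span of Figure 8.17. Since 1/4 + 2/6 + 4/9 = 37/36 is slightly more than 1, extending the horizon to 36 (the least common multiple of the periods) does eventually produce a miss, at t = 36. For a set with periods 12 and 7 and bursts 7 and 5 (utilization about 1.3), the first miss is reported at t = 14.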
Figure 8.17 | Scheduling the tasks in Table 8.6 using the EDF technique
(Gantt chart with T1, T2 and T3 on separate time axes, t = 0 to 24.)
8.7.1 | Disadvantages of EDF
It is a dynamic priority algorithm, and therefore requires priorities to be determined at run time. Thus, it is not as controllable as static priority algorithms, and more overhead is required to implement it. Dynamic priority schemes are not usually used in systems which require absolute predictability. Such schemes have slightly greater scheduling overheads than fixed priority schemes. This is because the range of dynamic priorities is usually greater than the range of static priorities, and dynamic priorities must also be recalculated at each decision point, whereas static priorities never change and never have to be recalculated.
Example 8.6
Verify if the task set in Table 8.7 is schedulable using the EDF algorithm.
Table 8.7
Tasks   Period   CPU Burst
T1      12       7
T2      7        5
Solution
The CPU utilization of this set is 7/12 + 5/7 = 1.298, that is, greater than 1. So the necessary condition for EDF is violated. The set is thus not schedulable, as Figure 8.18 shows: T2 cannot meet the deadline of its second burst.
Figure 8.18 | Failure of EDF scheduling
(Gantt chart with T1 and T2 on separate time axes; a ‘Missed Deadline’ is marked.)
8.8 | Qualities of a Good RTOS
Since real-time applications are varied, RTOSes are also of different kinds and varieties. A set of qualities can be listed for an OS to ‘qualify’ as a good RTOS.
i) Performance: This factor implies that it must be capable of meeting the requirements of the application (for which it is used). It must be fast and cater to a high throughput. In short, it should aim to improve and support the performance of the real-time system.
ii) Reliability: This means that it should not fail, or that the ‘mean time between failures’ should be very high.
iii) Compactness: Since the software and the OS will finally be ported to the memory (usually flash) of the real-time system, it is important that the size of the OS is as small as possible.
iv) Scalability: Many operating systems are large and cater to many types of applications. It is necessary that for a particular application, the unnecessary services are removed, that is, the OS must be scaled down. For example, if the application in hand does not require network services, it must be possible to remove the network part of the OS. Thus, there should be a kind of modularity in the design of the OS, to allow it to be scaled down or up, as necessary.
Conclusion
This chapter reflects only a very small segment of the body of knowledge regarding real-time operating systems. The idea has been to present the vast ocean of knowledge in a small framework, and thus get you to understand the issues to be resolved when ‘time’ becomes a critical factor. Although only two algorithms have been discussed in some depth, they are the important ones used in designing real-time kernels. Many other policies and techniques rely on these basic algorithms. This chapter is intended only to give an introduction, so as to motivate the reader to learn more about real-time systems.
QUESTIONS
1. Give two examples of systems which need ‘real-time’ capabilities.
2. Distinguish between laxity and tardiness.
3. Distinguish between ‘release time’ and ‘scheduling time’ of a task.
4. Distinguish (with examples) between hard and firm real-time tasks.
5. Distinguish (with examples) between aperiodic and sporadic tasks.
6. What is the importance of a ‘timer’ in a real-time kernel?
7. How do you define ‘schedulability’ of a set of tasks?
8. Under what condition would you say that static priority-based preemptive scheduling is the same as ‘rate monotonic’ scheduling?
9. What is meant by ‘scalability’ for an RTOS?
10. Name a few RTOSes.
EXERCISES
1. For the task set given in Table 8.8, what is the CPU utilization? Is it schedulable using (i) the RM algorithm (ii) the EDF method? Show the Gantt charts if it is schedulable.
Table 8.8
Tasks   Period   CPU Burst
T1      20       6
T2      60       20
2. For the task set given in Table 8.9, what is the CPU utilization? Is it schedulable using (i) the RM algorithm (ii) the EDF method? Show the Gantt charts if it is schedulable.
Table 8.9
Tasks   Period   CPU Burst
T1      8        1
T2      10       2
T3      15       3
T4      24       4
T5      12       3
3. What is the CPU utilization for the following task set (Table 8.10)? Can it be scheduled using the EDF algorithm?
Table 8.10
Tasks   Period   CPU Burst
T1      10       5
T2      12       2
T3      15       3
T4      24       6
4. Try scheduling the task set in Table 8.9 using non-preemptive priority-based scheduling.
5. Find the names of four popular RTOSes.
6. Name the popular RTOSes used in mobile phones.
7. What is Tiny OS? Where is it used?
9
Programming in Embedded C
Chapter-opening image: A GPS module setup.
What is meant by the term ‘Embedded C’
Why it is specific to a particular processor
How to understand the header file of a particular processor
How to write delay programs for 8051
The method of using the timers of 8051 in the status check and interrupt modes
How to use logical and shift operators in C
A few programs for PIC18F458
Introduction
This chapter is meant to provide exposure to high level language programming for embedded processors. In Chapters 13 and 14, programs written in assembly language for 8051 and ARM are presented. We know that assembly programs are efficient, but writing them is time-consuming because one needs to know the instruction set of the specific processor very intimately. Programming in a high level language is much easier. Here we use C as the high level language, because it is the most popular language in the embedded systems industry. This chapter is based on the assumption that you have at least a minimum level of knowledge of C constructs, looping structures and functions. It is from that point that we begin. All the 8051 programs in this chapter have been tested in the Keil RVDK.
9.1 | Embedded C
We use the kind of C called Embedded C. The name itself implies that an embedded processor is involved. Though the constructs that we use are those of the C language, writing the program lines needs a certain amount of knowledge about the processor being used. The point is that we need to know at least the registers, and the settings needed for them, to get a particular application to run. There are different compilers for the same processor. In this book, we use Keil C, which is freely downloadable and thus accessible to all students. Appendix B gives the step by step method of using the Keil Real View Development Kit (RVDK) for 8051 as well as for ARM.
In this chapter, we start with Embedded C for 8051 and end with a few examples using a PIC18F458 MCU. The architecture and interfacing details of 8051 are covered in Chapters 13 and 14. It is necessary to learn those chapters before you attempt to understand Embedded C for 8051.
9.1.1 | The Header File
To write a program, a specific processor has to be chosen. The registers of the chosen processor are already defined in a header file, and we need to make this file visible to our program. Let us, for instance, choose the 89C51 of Atmel. The 89C51 is a chip which has flash ROM, and hence is more likely to be used in projects and actual hardware design.
The header file is available in the directory C:\Keil\C51\INC\Atmel and is a plain text file (it can be opened in Notepad) named AT89X51.H. The ‘X’ in the name implies that any processor in the 89C51 series may be used.
Open this file. The addresses (with names) of 8-bit registers and the single bits of bit-addressable registers are given in this file, and you will see the listing given in Table 9.1.
Table 9.1 | Contents of the Header File for 89x51
/*--------------------------------------------------------
----------------------------------------------------------
AT89X51.H
Header file for the low voltage Flash Atmel AT89C51 and
AT89LV51.
Copyright (c) 1988-2002 Keil Elektronik GmbH and Keil
Software, Inc.
All rights reserved.
----------------------------------------------------------
--------------------------------------------------------*/
#ifndef __AT89X51_H__
#define __AT89X51_H__
/*------------------------------------------------
Byte Registers
------------------------------------------------*/
sfr P0 = 0x80;
sfr SP = 0x81;
sfr DPL = 0x82;
sfr DPH = 0x83;
sfr PCON = 0x87;
sfr TCON = 0x88;
sfr TMOD = 0x89;
sfr TL0 = 0x8A;
sfr TL1 = 0x8B;
sfr TH0 = 0x8C;
sfr TH1 = 0x8D;
sfr P1 = 0x90;
sfr SCON = 0x98;
sfr SBUF = 0x99;
sfr P2 = 0xA0;
sfr IE = 0xA8;
sfr P3 = 0xB0;
sfr IP = 0xB8;
sfr PSW = 0xD0;
sfr ACC = 0xE0;
sfr B = 0xF0;
/*------------------------------------------------
P0 Bit Registers
------------------------------------------------*/
sbit P0_0 = 0x80;
sbit P0_1 = 0x81;
sbit P0_2 = 0x82;
sbit P0_3 = 0x83;
sbit P0_4 = 0x84;
sbit P0_5 = 0x85;
sbit P0_6 = 0x86;
sbit P0_7 = 0x87;
/*------------------------------------------------
PCON Bit Values
------------------------------------------------*/
#define IDL_ 0x01
#define STOP_ 0x02
#define PD_ 0x02 /* Alternate definition */
#define GF0_ 0x04
#define GF1_ 0x08
#define SMOD_ 0x80
/*------------------------------------------------
TCON Bit Registers
------------------------------------------------*/
sbit IT0 = 0x88;
sbit IE0 = 0x89;
sbit IT1 = 0x8A;
sbit IE1 = 0x8B;
sbit TR0 = 0x8C;
sbit TF0 = 0x8D;
sbit TR1 = 0x8E;
sbit TF1 = 0x8F;
/*------------------------------------------------
TMOD Bit Values
------------------------------------------------*/
#define T0_M0_ 0x01
#define T0_M1_ 0x02
#define T0_CT_ 0x04
#define T0_GATE_ 0x08
#define T1_M0_ 0x10
#define T1_M1_ 0x20
#define T1_CT_ 0x40
#define T1_GATE_ 0x80
#define T1_MASK_ 0xF0
#define T0_MASK_ 0x0F
/*------------------------------------------------
P1 Bit Registers
------------------------------------------------*/
sbit P1_0 = 0x90;
sbit P1_1 = 0x91;
sbit P1_2 = 0x92;
sbit P1_3 = 0x93;
sbit P1_4 = 0x94;
sbit P1_5 = 0x95;
sbit P1_6 = 0x96;
sbit P1_7 = 0x97;
/*------------------------------------------------
SCON Bit Registers
------------------------------------------------*/
sbit RI = 0x98;
sbit TI = 0x99;
sbit RB8 = 0x9A;
sbit TB8 = 0x9B;
sbit REN = 0x9C;
sbit SM2 = 0x9D;
sbit SM1 = 0x9E;
sbit SM0 = 0x9F;
/*------------------------------------------------
P2 Bit Registers
------------------------------------------------*/
sbit P2_0 = 0xA0;
sbit P2_1 = 0xA1;
sbit P2_2 = 0xA2;
sbit P2_3 = 0xA3;
sbit P2_4 = 0xA4;
sbit P2_5 = 0xA5;
sbit P2_6 = 0xA6;
sbit P2_7 = 0xA7;
/*------------------------------------------------
IE Bit Registers
------------------------------------------------*/
sbit EX0 = 0xA8; /* 1 = Enable External interrupt 0 */
sbit ET0 = 0xA9; /* 1 = Enable Timer 0 interrupt */
sbit EX1 = 0xAA; /* 1 = Enable External interrupt 1 */
sbit ET1 = 0xAB; /* 1 = Enable Timer 1 interrupt */
sbit ES = 0xAC; /* 1 = Enable Serial port interrupt */
sbit ET2 = 0xAD; /* 1 = Enable Timer 2 interrupt */
sbit EA = 0xAF; /* 0 = Disable all interrupts */
/*------------------------------------------------
P3 Bit Registers (Mnemonics Ports)
------------------------------------------------*/
sbit P3_0 = 0xB0;
sbit P3_1 = 0xB1;
sbit P3_2 = 0xB2;
sbit P3_3 = 0xB3;
sbit P3_4 = 0xB4;
sbit P3_5 = 0xB5;
sbit P3_6 = 0xB6;
sbit P3_7 = 0xB7;
sbit RXD = 0xB0; /* Serial data input */
sbit TXD = 0xB1; /* Serial data output */
sbit INT0 = 0xB2; /* External interrupt 0 */
sbit INT1 = 0xB3; /* External interrupt 1 */
sbit T0 = 0xB4; /* Timer 0 external input */
sbit T1 = 0xB5; /* Timer 1 external input */
sbit WR = 0xB6; /* External data memory write strobe */
sbit RD = 0xB7; /* External data memory read strobe */
/*------------------------------------------------
IP Bit Registers
------------------------------------------------*/
sbit PX0 = 0xB8;
sbit PT0 = 0xB9;
sbit PX1 = 0xBA;
sbit PT1 = 0xBB;
sbit PS = 0xBC;
sbit PT2 = 0xBD;
/*------------------------------------------------
PSW Bit Registers
------------------------------------------------*/
sbit P = 0xD0;
sbit F1 = 0xD1;
sbit OV = 0xD2;
sbit RS0 = 0xD3;
sbit RS1 = 0xD4;
sbit F0 = 0xD5;
sbit AC = 0xD6;
sbit CY = 0xD7;
/*------------------------------------------------
Interrupt Vectors:
Interrupt Address = (Number * 8) + 3
------------------------------------------------*/
#define IE0_VECTOR 0 /* 0x03 External Interrupt 0 */
#define TF0_VECTOR 1 /* 0x0B Timer 0 */
#define IE1_VECTOR 2 /* 0x13 External Interrupt 1 */
#define TF1_VECTOR 3 /* 0x1B Timer 1 */
#define SIO_VECTOR 4 /* 0x23 Serial port */
#endif
What we see is that the names of registers, ports and interrupt vectors have been defined here, and therefore ‘including’ the header file in our program makes the registers and other things visible to the compiler. With this structure, we can start writing simple programs for 8051 in C.
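Two numeric patterns in the listing are worth checking by hand: the interrupt vector rule quoted in its comment (address = number × 8 + 3), and the fact that each `sbit` address equals the byte address of its register plus the bit position. The host-side sketch below is plain standard C, not Keil code, and the function names are made up for the illustration.

```c
/* Interrupt vector address from the interrupt number, following the rule
   stated in the header comment: address = (number * 8) + 3. */
unsigned interrupt_vector_address(unsigned number)
{
    return number * 8u + 3u;
}

/* Bit address of bit 'bit' (0 to 7) of a bit-addressable SFR located at
   byte address 'sfr_addr', as seen in the sbit lines of the listing. */
unsigned sfr_bit_address(unsigned sfr_addr, unsigned bit)
{
    return sfr_addr + (bit & 7u);
}
```

For instance, `TF0_VECTOR` is 1, giving 0x0B for Timer 0, and bit 4 of TCON (byte address 0x88) is 0x8C, which is the address the header assigns to `TR0`.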
So let’s write our first program in which we want to output specific values on port
pins. Comments for important lines are given in the program itself.
Example 9.1
#include <AT89X51.H>
void main(void)
{
    for(;;)                 //Repeat forever
    {
        P1 = 'H';           //copy the ASCII code of 'H' to port P1
        P2 = 0x55;          //copy the value 0x55 to P2
        P3 = 0xF0;          //copy the value 0xF0 to P3
    }
}
The simple program in Example 9.1 gives values to the port pins, and this can be verified
in the Keil simulator. Port 1 gets the binary value corresponding to the ASCII of H,
and the other ports have values as required by the program. The program is put in an
infinite loop so that, when it is burned into the flash of the MCU, control does not go
beyond the last line of the program (otherwise unwanted data in addresses beyond this
may be mistakenly taken as instructions).
Example 9.2
This program toggles P0 continuously between the values 0 and 0xFF. This toggling can be observed in the simulator, but if the program is burned on the 8051, it will be difficult to see the toggling because there is no time delay between the loading of the two different values in Port 0. The next program, Example 9.3, adds a delay loop.
#include <AT89X51.H>
void main(void)
{
    for(;;)                 //Repeat forever
    {
        P0 = 0x00;          //Copy 0 to P0
        P0 = 0xFF;          //Copy 0xFF to P0
    }
}
Example 9.3
Write a program in which P2 is given two different values. The values should be passed
to P2 with a delay.
Solution
#include <AT89X51.H>
void main(void)
{
    unsigned int x;
    for(;;)
    {
        P2 = 0xF0;                  //Set the upper four bits alone
        for(x = 0; x < 4000; x++);  //Delay loop
        P2 = 0x0F;                  //Set the lower four bits alone
        for(x = 0; x < 4000; x++);  //Delay loop
    }
}
Example 9.3 needs some explanation.
i) Here, a delay has been created using an integer x, whose value changes from 0 to 4000. The actual duration of the delay obtained with this is not clear at this point. To know that, the program has to be burned on actual hardware, and the calculations need to be done based on the clock frequency of the processor used.
ii) The delay loop has been created using a C ‘for’ loop. In actual execution, these lines
get converted to assembly instructions, and from this the delay can be calculated
(Section 13.12).
Different C compilers convert C program lines to different sets of assembly
instructions. In the debug mode of the Keil simulator, it is possible to get the
assembly code of this C program and from that, the delay can be calculated after
knowing the clock frequency. We don’t attempt to do such calculations here, as
they are covered in Section 13.12.1.
iii) The variable x has been defined as unsigned int. In this context, let us list out the
commonly used data types for 8051 C (Table 9.2).
Example 9.4
Write a program to toggle P2.2 with a delay between the states.
Solution
#include <AT89X51.H>
sbit LED = P2^2;
void wait(unsigned int);
void main(void)
{
    while(1)
    {
        LED = 0; //Pin P2^2 = 0
        wait(5000);
        LED = 1; //Pin P2^2 = 1
        wait(5000);
    }
}
void wait(unsigned int time) //The function 'wait' is defined
{
    unsigned int i, j;
    for(i = 0;i < time;i++)
        for(j = 0;j < time;j++);
}
Table 9.2 | Data Formats
Data Type        Size      Range
unsigned char    8 bits    0 to 255
signed char      8 bits    −128 to +127
unsigned int     16 bits   0 to 0xFFFF, i.e., 0 to 65,535
signed int       16 bits   −32,768 to +32,767
bit              1 bit     Used for bits of RAM
sbit             1 bit     Used for SFR bits
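The ranges in Table 9.2 follow directly from the number of bits. As a small sketch in ordinary PC-side C (the function names are hypothetical and not part of any 8051 header), the limits can be computed as follows:

```c
#include <assert.h>

/* Largest value of an unsigned type that is 'bits' wide: 2^bits - 1. */
unsigned long max_unsigned(unsigned bits)
{
    assert(bits >= 1 && bits <= 16);
    return (1ul << bits) - 1ul;
}

/* Smallest value of a two's complement signed type: -(2^(bits-1)). */
long min_signed(unsigned bits)
{
    assert(bits >= 2 && bits <= 16);
    return -(1L << (bits - 1));
}

/* Largest value of a two's complement signed type: 2^(bits-1) - 1. */
long max_signed(unsigned bits)
{
    assert(bits >= 2 && bits <= 16);
    return (1L << (bits - 1)) - 1L;
}
```

For 16 bits, max_unsigned gives 65,535 (0xFFFF), so a delay counter declared as unsigned int on the 8051 cannot count beyond that value.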
Example 9.4 shows a few new aspects.
i) The delay has been defined in a function named ‘wait’. Whenever a delay is required,
the function is called with a parameter. Here wait(5000) is the function call: the
function name is wait and the parameter is 5000.
ii) The port pin P2.2, a single bit, has been named ‘LED’. The ‘sbit’ keyword stands for
a single bit of an SFR. Note that P2.2 is written as P2^2.
iii) This program generates a square wave at P2.2. If an LED is connected to this pin,
it will go ON and OFF at the rate set by the parameter.
9.1.1.1 | Using a Character Array
The next program uses the unsigned char data type. In this, the contents of the character
array ARRAY are sent to Port 0 one after the other repeatedly. Note that the array is
defined as an ASCII string.
Example 9.5
#include <AT89X51.H>
void main(void)
{
    unsigned char ARRAY[] = {"HELLO"}; //String to be displayed
    unsigned char x;
    for(;;)
    {
        for(x = 0;x <= 5;x++) //index 5 is the terminating '\0'
            P0 = ARRAY[x]; //Displaying each character at P0
    }
}
When the array is defined as a character string, the corresponding hex values (the
ASCII codes) are stored in the RAM of the processor. This can be verified by viewing
the ‘RAM memory’ in the simulator.
In this program, only one ASCII character is in Port 0 at a time. Since there is no delay
between the successive character writes to P0, it will be difficult to observe them
on an actual processor. But in the simulator, in the ‘single step mode’, it is possible to
observe the characters on P0 one after the other.
Example 9.6
#include <AT89X51.H>
sbit SW = P2^3;
void main(void)
{
    bit B;           //B is defined as a bit in bit-addressable RAM
    unsigned char Y; //Y is defined as a character stored in RAM
    P1 = 0xFF;       //make P1 an input port
    SW = 1;          //make SW an input pin
    {
        Y = P1;        //copy the contents of P1 to Y
        if(Y == 0x00)  //verify if Y = 0
            P3 = 0x0F; //if yes, move a value to P3
        else           //if Y is not = 0
            P3 = 0xF0; //move another value to P3
    }
    {
        B = SW;        //the value on SW, i.e., P2.3, is copied to B
        if(B == 1)     //verify if B = 1
            P2 = 0x09; //if yes, move a specific value to P2
        else           //if B is not = 1,
            P2 = 0x01; //move another value to P2
    }
    while(1);
}
In this program, one 8-bit port (P1) and one single-bit pin (P2.3) have been defined as
inputs. The contents of the port and the pin are read in and moved to RAM.
According to their values, specific numbers are moved to P3 and P2, as the comments
in the program show.
Note The char Y and bit B are found in RAM locations.
Example 9.7
Now we will see a program which writes a message to an output port. This program
outputs the ASCII value of the message MESG, written as an ASCII string, one
character at a time at port P0. Note that we use #define to state that the port P0 has
been named as LED. A delay is realized with a delay function named wait.
#include <AT89X51.H>
#define LED P0 //name port P0 as LED
void wait(unsigned int);
void main(void)
{
    unsigned char MESG[] = "MY NAME IS LULLU";
    unsigned char x;
    while(1) //for all time
    {
        for(x = 0;x <= 16;x++) //index 16 is the terminating '\0'
        {
            LED = MESG[x]; //display all characters
            wait(9000);
        }
    }
}
void wait(unsigned int time) //delay routine named wait
{
    unsigned int i, j;
    for(i = 0;i < time;i++)
        for(j = 0;j < time;j++);
}
Note The MESG is in RAM and from there it is moved to Port 0.
You can check in the simulator, in the RAM, the values
4D 59 20
4E 41 4D 45 20
49 53 20
4C 55 4C 4C 55
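The hex values listed above can be reproduced without the simulator. The following is a minimal sketch in ordinary C for a PC, assuming an ASCII character set; the function name hex_dump and its parameters are our own and are not part of the Keil tools.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format each byte of 'msg' as two uppercase hex digits separated by
   spaces. Returns the number of message bytes formatted (the
   terminating '\0' of 'msg' is not counted). */
size_t hex_dump(const char *msg, char *out, size_t out_size)
{
    size_t i, n = strlen(msg);
    assert(out_size >= 3 * n + 1);      /* "XX " per byte plus '\0' */
    out[0] = '\0';
    for (i = 0; i < n; i++)
        sprintf(out + 3 * i, "%02X ", (unsigned)(unsigned char)msg[i]);
    if (n > 0)
        out[3 * n - 1] = '\0';          /* drop the trailing space */
    return n;
}
```

For the message of Example 9.7 the function reports 16 bytes, beginning with 4D 59 20, which agrees with the values seen in RAM.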
So far, we considered programs in which delays are generated using software routines.
Now let’s use some of the peripherals of 8051. The timer is an important peripheral
which can be programmed to operate in the status check mode as well as in the interrupt
mode. The details of how the timers operate are given in Chapter 14. The next program
is the C version of Example 14.3 (done using assembly language).
Example 9.8
#include <AT89X51.H>
void timerdelay(void);
sbit OUTP = P2^4;
void main(void)
{
while(1)
{
OUTP = ~OUTP; //complement pin P2.4
timerdelay(); //call the delay function
}
}
void timerdelay(void)
{
TMOD = 0x10; //mode 1 timer 1
TL1 = 0x42; //load low byte of count
TH1 = 0x80; //load high byte of count
TR1 = 1; //start timer
while(TF1 == 0); //wait as long as TF1 = 0
TR1 = 0; //when TF1 = 1, stop timer
TF1 = 0; //clear timer flag
}
The main program complements a pin (P2.4) and calls a delay function named
‘timerdelay’. In this function, the SFRs of timer 1 are used. The mode word and the count
values are loaded. The timer flag is continuously tested and when it becomes ‘1’, the timer
is stopped by clearing the timer run bit (TR1). Then the timer flag (TF1) is cleared to
make it ready for the next timing operation. Control then returns to the main program,
which complements the pin P2.4 and calls the delay function repeatedly.
This program generates a symmetric square wave. For details of the frequency of the
waveform, refer to Example 14.3.
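As a rough check of the delay in Example 9.8, the count can be worked out on a PC. The sketch below is ordinary C, not 8051 code. It assumes the classic 8051 timing of 12 oscillator periods per machine cycle, and the crystal frequency is a parameter because this section does not state it; the function names are hypothetical. The few instructions spent in calling the function and testing the flag are ignored.

```c
#include <assert.h>

/* Machine cycles counted by a 16-bit (mode 1) timer before it
   overflows, starting from the loaded value TH:TL. */
unsigned long mode1_ticks(unsigned char th, unsigned char tl)
{
    unsigned long start = ((unsigned long)th << 8) | tl;
    return 65536ul - start;
}

/* Delay in microseconds for a given crystal frequency in Hz,
   assuming 12 oscillator periods per machine cycle. */
double mode1_delay_us(unsigned char th, unsigned char tl, double crystal_hz)
{
    assert(crystal_hz > 0.0);
    return (double)mode1_ticks(th, tl) * 12.0 * 1e6 / crystal_hz;
}
```

With TH1 = 0x80 and TL1 = 0x42 the timer counts 32,702 machine cycles. If a 12 MHz crystal is assumed, this is about 32.7 ms per half period of the square wave.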
Example 9.9
This program creates an unsymmetrical square wave at P1.4. The timing values are
taken from Example 14.6. Since a timer in mode 1 is being used, timer count values are
to be repeatedly re-loaded for each portion of the square wave. The two different counts
are loaded in the two functions, timerdelay_ON and timerdelay_OFF.
#include <AT89X51.H>
void timerdelay_ON(void);
void timerdelay_OFF(void);
sbit OUTP = P1^4;
void main(void)
{
    TMOD = 0x10; //mode 1 timer 1, as in Example 9.8
    while(1)
    {
        OUTP = 1;
        timerdelay_ON();
        OUTP = 0;
        timerdelay_OFF();
    }
}
void timerdelay_ON(void) //function for ON time
{
    TH1 = 0xF0;      //load high byte of count
    TR1 = 1;         //start the timer
    while(TF1 == 0); //wait as long as TF1 = 0
    TR1 = 0;         //when TF1 = 1, stop the timer
    TF1 = 0;         //clear the timer flag
}
void timerdelay_OFF(void) //function for OFF time
{
    TH1 = 0xCD;      //load high byte of count
    TR1 = 1;         //start the timer
    while(TF1 == 0); //wait as long as TF1 = 0
    TR1 = 0;         //when TF1 = 1, stop the timer
    TF1 = 0;         //clear the timer flag
}
Figure 9.1 | The waveform of Example 9.9 displayed in the logic analyser
Figure 9.1 shows the unsymmetrical waveform displayed on the logic analyser of the
Keil simulator.
The next example uses timer 0 in mode 2.
For mode 2, it is not necessary to re-load the count after each timer flag setting. The
only action needed is to check if the flag is set, and then clear it when it is set. The count
is automatically re-loaded and timing resumes.
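The auto-reload behaviour can be pictured with a small software model. This is only a sketch in ordinary C for a PC: it models an 8-bit counter that is reloaded on overflow, with the simplifying assumption that the counter starts from the reload value. The function name count_overflows is hypothetical.

```c
/* Simulate an 8-bit auto-reload counter (mode 2 style) for 'cycles'
   machine cycles and return how many times it overflowed, i.e., how
   many times the timer flag would have been set. */
unsigned long count_overflows(unsigned char reload, unsigned long cycles)
{
    unsigned char tl = reload;   /* running count (models TL0) */
    unsigned long overflows = 0, c;
    for (c = 0; c < cycles; c++) {
        if (tl == 0xFF) {        /* the next increment rolls over */
            tl = reload;         /* hardware reloads from TH0 */
            overflows++;
        } else {
            tl++;
        }
    }
    return overflows;
}
```

With a reload value of 0 the flag is set once every 256 machine cycles, which is the largest delay a single overflow in mode 2 can give. With a reload value of 0xCD it is set once every 51 cycles.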
Example 9.10
#include <AT89X51.H>
void timerdelay(void);
sbit OUTP = P1^3;
void main(void)
{
    TMOD = 0x02; //mode 2 timer 0
    TH0 = 0;     //load count = 0, for the largest delay
    TR0 = 1;     //start timer
    while(1)
    {
        OUTP = ~OUTP; //complement pin P1.3
        timerdelay(); //call time delay function
    }
}
void timerdelay(void)
{
    while(TF0 == 0); //wait as long as TF0 = 0
    TF0 = 0;         //clear TF0
}
9.1.1.2 | Using Timers with Interrupts
For using interrupts, look in the header file in Table 9.1. The part referring to interrupts
is copied here.
/*------------------------------------------------
Interrupt Vectors
Interrupt Address = (Number * 8) + 3
------------------------------------------------*/
#define IE0_VECTOR 0 /* 0x03 External Interrupt 0 */
#define TF0_VECTOR 1 /* 0x0B Timer 0 */
#define IE1_VECTOR 2 /* 0x13 External Interrupt 1 */
#define TF1_VECTOR 3 /* 0x1B Timer 1 */
#define SIO_VECTOR 4 /* 0x23 Serial port */
The interrupt numbers to be used in the interrupt functions are seen here. For example,
for using timer 0, the vector is listed as 1. See how it is used in Example 9.11.
The event to occur when the interrupt of Timer 0 is activated is written in the
interrupt function, declared as timer0(void) interrupt 1. (Any name can be given to the
function, but the keyword interrupt followed by the number 1 must be included. Here we
have given the name timer0, but that may be replaced by any name of our choice.)
In Example 9.11, the only event that occurs is the complementing of pin P2.4.
In the main program, the IE register is written so as to set the global interrupt enable bit
and, specifically, the Timer 0 interrupt enable bit. Refer to Chapter 14 for the exact details
of using timers in the interrupt mode.
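The relation between the interrupt number and the vector address given in the header file, (Number * 8) + 3, can be checked with a few lines of ordinary C on a PC. The enum repeats the numbers from the header-file excerpt; the function name vector_address is our own.

```c
/* Interrupt numbers as listed in the header-file excerpt. */
enum {
    IE0_VECTOR = 0,   /* External Interrupt 0 */
    TF0_VECTOR = 1,   /* Timer 0 */
    IE1_VECTOR = 2,   /* External Interrupt 1 */
    TF1_VECTOR = 3,   /* Timer 1 */
    SIO_VECTOR = 4    /* Serial port */
};

/* Address of the interrupt vector in code memory. */
unsigned int vector_address(unsigned int number)
{
    return number * 8u + 3u;
}
```

For Timer 0 (number 1) this gives 0x0B, and for the serial port (number 4) it gives 0x23, matching the comments in the header file.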
Example 9.11
#include <AT89X51.H>
sbit OUTP = P2^4;
void timer0(void) interrupt 1
{
    OUTP = ~OUTP;
}
void main(void)
{
    TH0 = 0;   //load count = 0, for the biggest delay
    IE = 0x82; //enable global interrupts and the Timer 0 interrupt
    TR0 = 1;   //start timer 0
    while(1);
}
Example 9.12
This is the C program corresponding to Example 14.19 in assembly. Both timers 0 and 1
are used and two square waves are simultaneously obtained at two output pins P2.0 and
P2.1.
#include <AT89X51.H>
sbit OUTP = P2^1;
sbit OUTPP = P2^0;
void timer0(void) interrupt 1 //Timer 0 interrupt function
{
    OUTP = ~OUTP;
}
void timer1(void) interrupt 3 //Timer 1 interrupt function
{
    OUTPP = ~OUTPP;
}
void main(void)
{
    TMOD = 0x22; //mode 2 timer 0 and 1
    TH0 = 0x0A;  //load count for timer 0
    TH1 = 0xCD;  //load count for timer 1
    IE = 0x8A;   //activate interrupts for both timers
    TR0 = 1;     //start timer 0
    TR1 = 1;     //start timer 1
    while(1);    //for all time
}
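The values written to the IE register in Examples 9.11 and 9.12 (0x82 and 0x8A) are easier to read when built from the individual enable bits. The following sketch is ordinary C for a PC; the macro names follow the usual 8051 bit names, but the macros and the function ie_value are defined here only for illustration and are not taken from the Keil header.

```c
/* Bit positions of the 8051 IE (interrupt enable) register. */
#define IE_EA  0x80u   /* global interrupt enable */
#define IE_ES  0x10u   /* serial port interrupt */
#define IE_ET1 0x08u   /* Timer 1 interrupt */
#define IE_EX1 0x04u   /* External Interrupt 1 */
#define IE_ET0 0x02u   /* Timer 0 interrupt */
#define IE_EX0 0x01u   /* External Interrupt 0 */

/* IE value that enables the given sources together with EA. */
unsigned char ie_value(unsigned char sources)
{
    return (unsigned char)(IE_EA | sources);
}
```

Here ie_value(IE_ET0) gives 0x82 and ie_value(IE_ET0 | IE_ET1) gives 0x8A, the two values used above.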
9.1.1.3 | Logical and Shift Operations in C
When C is used for MCUs, it may be necessary to use logic operators and shift operators.
The notations for these are listed in Tables 9.3a and 9.3b, and Example 9.13 shows a
program which includes these operations.
Table 9.3a | Logical Operators
Data    OR     AND    EX-OR   INVERT
X, Y    X|Y    X&Y    X^Y     ~X, ~Y
Table 9.3b | Shift Operators
A data item may be shifted left or right a fixed number of times.
The notation for left shift is << and for right shift, it is >>.
They are used as shown:
i) 0x34 >> 3 shifts the data right three times.
ii) 0x7E << 4 shifts the data four times to the left.
Example 9.13
#include <AT89X51.H>
unsigned char W, X, Y, Z;
void main(void)
{
    X = 0x45;
    Y = 0x78;
    Z = 0x67;
    W = 0xAB;
    P1 = X & Y;  //AND operation
    P2 = X | Y;  //OR operation
    P3 = Z << 3; //Left shift
    P0 = W >> 2; //Right shift
    while(1);    //stay here
}
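The port values produced by Example 9.13 (AND, OR, left shift by 3 and right shift by 2 on the four data bytes) can be worked out on a PC. This is a sketch in ordinary C; the structure port_values and the function name are hypothetical, and uint8_t stands in for the 8-bit port registers.

```c
#include <stdint.h>

/* Values that would appear on the four ports. */
typedef struct {
    uint8_t p0, p1, p2, p3;
} port_values;

port_values example_9_13(void)
{
    const uint8_t X = 0x45, Y = 0x78, Z = 0x67, W = 0xAB;
    port_values v;
    v.p1 = (uint8_t)(X & Y);   /* AND */
    v.p2 = (uint8_t)(X | Y);   /* OR */
    v.p3 = (uint8_t)(Z << 3);  /* left shift; bits moved past bit 7 are lost */
    v.p0 = (uint8_t)(W >> 2);  /* right shift; zeros enter from the left */
    return v;
}
```

The results are P1 = 0x40, P2 = 0x7D, P3 = 0x38 and P0 = 0x2A. Note that 0x67 shifted left three times is 0x338, but only the lower eight bits (0x38) fit in the port.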
9.1.2 | Serial Communication
Frequently, it is required to communicate serially between the PC and an embedded
8051 board. The ‘HyperTerminal’ program in the accessories of Windows allows the
viewing of the data transmitted. Example 9.14 is a program to receive a character in RX
and transmit the same character in TX, that is, to echo the input on the monitor via
‘HyperTerminal’. To obtain a standard baud rate that matches the PC, the crystal
frequency of the 8051 should be 11.0592 MHz.
The physical connection between the PC’s RS-232 serial port and the 8051, through
a MAX232 IC, is shown in Figure 9.2. A more detailed diagram is available in Figure 5.25.
Example 9.14
//With 11.0592 MHz clock
#include <AT89X51.H>
unsigned char Send;
void Receive_Transmitt_Interrupt_Handler(void) interrupt 4
{
    if(RI == 1)      //a character has been received
    {
        RI = 0;
        Send = SBUF; //read the received character
        SBUF = Send; //transmit the same character
    }
    else
        TI = 0;      //transmission is over, clear the flag
}
void Initialise_UART(void)
{
    TMOD = 0x20; //Timer 1 in Mode 2
    TH1 = 0xFD;  //Setting baud rate 9600 bps, with 11.0592 MHz
    SCON = 0x50; //Serial mode 1 with 8-bit data, stop bit and start bit.
                 //Baud rate is controlled by Timer 1
    IE = 0x90;   //Enables the serial port interrupt and the global interrupt bit
    TR1 = 1;     //Starting the Timer 1
}
int main(void)
{
    Initialise_UART();
    while(1);
}
Figure 9.2 | Connections of a PC's serial port to the UART in a μC through a MAX 232 IC
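The reload value 0xFD used for TH1 can be verified with the standard 8051 baud rate relation for serial mode 1 with Timer 1 in mode 2. The sketch below is ordinary C for a PC and assumes that the SMOD bit is at its reset value of 0; the function names are our own.

```c
#include <assert.h>

/* Baud rate for serial mode 1 when Timer 1 runs in mode 2 (SMOD = 0):
   baud = crystal / (12 * 32 * (256 - TH1)). */
unsigned long baud_rate(unsigned long crystal_hz, unsigned char th1)
{
    unsigned long divisor = 12ul * 32ul * (256ul - th1);
    return crystal_hz / divisor;
}

/* TH1 reload value for a wanted baud rate (SMOD = 0). */
unsigned char th1_for_baud(unsigned long crystal_hz, unsigned long baud)
{
    unsigned long steps;
    assert(baud > 0);
    steps = crystal_hz / (12ul * 32ul * baud);
    assert(steps >= 1 && steps <= 256);
    return (unsigned char)(256ul - steps);
}
```

With an 11.0592 MHz crystal, baud_rate(11059200, 0xFD) gives exactly 9600. With a 12 MHz crystal the same reload value gives about 10,416, which does not match a standard PC baud rate. This is why the 11.0592 MHz crystal is preferred for serial communication.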
9.2 | PIC Programming Using MPLAB
Next, three programs written for the PIC 18F458 are included. Due to space limitations,
it is not possible to include the details of the architecture of PIC in this book. However,
these programs are presented here to show the structure of embedded C programs for
another processor, besides 8051. For PIC, the most popular IDE is MPLAB, and these
programs have been tested in the C compiler of MPLAB.
Example 9.15
In this, the 8-bit Timer 2 is programmed to operate in the interrupt mode to generate a
square wave on all pins of port B.
#include <p18f458.h>
void main(void)
{
    unsigned int chkint;
    WDTCON = 0x00; //reset watchdog timer
    TRISB = 0x00;  //configure PORTB as output
    TMR2 = 0x00;   //Timer 2 module register is cleared
    INTCON = 0x80; //Global Interrupt Enable bit set
    PORTB = 0x00;  //initialize PORTB as 00
    T2CON = 0x04;  //Enable Timer 2 by setting the TMR2ON bit
    PIE1 = 0x02;   //set Peripheral Interrupt Enable register
    chkint = 0;
    while(1)
    {
        chkint = 0;
        PIR1 = 0x00;       //interrupt flag bits cleared in PIR register
        while(chkint == 0) //check for timer interrupt
        {
            chkint = PIR1 & 0x02;
        }
        PORTB = 0xFF;      //when interrupt occurs PORTB pins go high
        chkint = 0;
        PIR1 = 0x00;       //interrupt flag bits are cleared
        while(chkint == 0)
        {
            chkint = PIR1 & 0x02;
        }
        PORTB = 0x00;      //when interrupt occurs PORTB pins go low
    }
}
Example 9.16
This is a program for getting Timer 1 to act as a ring counter at Port B.
#include <p18f458.h>
void main(void)
{
    unsigned int chkint, n, p;
    WDTCON = 0x00; //reset watchdog timer
    TRISB = 0x00;  //make port B an output port
    TMR1L = 0x8A;  //configured for some delay
    TMR1H = 0xD0;
    INTCON = 0x80; //Global Interrupt Enable bit is set
    PORTB = 0x00;  //initialize PORTB as 00
    T1CON = 0x01;  //Enable Timer 1 by setting the TMR1ON bit
    PIE1 = 0x01;   //set Peripheral Interrupt Enable register
    chkint = 0;
    while(1)
    {
        WDTCON = 0x00;
        TMR1L = 0x8A; //configured value for timer
        TMR1H = 0xD0;
        chkint = 0;
        PIR1 = 0x00;  //interrupt flag bits are cleared
        for(n = 0;n < 8;n++)
        {
            TMR1L = 0x8A;
            TMR1H = 0xD0;
            chkint = 0;
            PIR1 = 0x00;
            if(n == 0)
                p = 1;
            else
                p = p*2; //multiply by 2 for configuring as ring counter
            while(chkint == 0)
            {
                chkint = PIR1 & 0x01; //check for timer interrupt
            }
            PORTB = p;
        }
    }
}
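The sequence of values that Example 9.16 writes to PORTB can be listed with a short PC-side sketch. It is ordinary C that repeats only the arithmetic of the for loop, without the timer; the function name ring_pattern is hypothetical.

```c
/* Fill 'out' with the eight values written to PORTB during one pass
   of the for loop: p starts at 1 and is doubled on every step. */
void ring_pattern(unsigned char out[8])
{
    unsigned int n, p = 0;
    for (n = 0; n < 8; n++) {
        if (n == 0)
            p = 1;
        else
            p = p * 2;
        out[n] = (unsigned char)p;
    }
}
```

The values are 0x01, 0x02, 0x04 and so on up to 0x80, so a single ‘1’ moves from bit 0 to bit 7 of Port B, and the pattern repeats when the while loop starts the next pass.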
Example 9.17
This program is for the 10-bit ADC, but uses it as an 8-bit one by reading only the lower
8 bits of the result. The ‘end of conversion’ is tested, and when the conversion is found
complete, the digital data is output at Port B.
#include <p18f458.h>
void main(void)
{
    unsigned int chkadc = 1;
    WDTCON = 0x00;
    TRISA = 0x01;  //PORTA as input
    ADCON0 = 0x01; //A/D converter module is powered up
    ADCON1 = 0x8E; //Six (6) MSBs of ADRESH are read as '0' and
                   //A/D Port Control bits are configured
    TRISB = 0x00;  //PORTB taken as output
    while(1)
    {
        ADCON0 |= 0x04;    //start ADC
        chkadc = 1;
        while(chkadc != 0) //wait while the conversion is in progress
        {
            chkadc = (ADCON0 & 0x04); //check whether conversion is over
        }
        PORTB = ADRESL;    //digital output at PORTB
    }
}
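Since the result is right-justified (the six MSBs of ADRESH read as ‘0’), the full 10-bit value is split over two registers. The sketch below, in ordinary C for a PC, shows how the two bytes combine and what is lost when only the lower byte is used, as in Example 9.17. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Combine a right-justified result: ADRESH holds bits 9..8 and
   ADRESL holds bits 7..0. */
uint16_t adc_result_10bit(uint8_t adresh, uint8_t adresl)
{
    assert((adresh & 0xFCu) == 0);   /* upper six bits must be 0 */
    return (uint16_t)(((uint16_t)adresh << 8) | adresl);
}

/* The part of the result that Example 9.17 sends to PORTB. */
uint8_t adc_low_byte(uint16_t result)
{
    return (uint8_t)(result & 0xFFu);
}
```

A full-scale reading (ADRESH = 0x03, ADRESL = 0xFF) is 1023. A reading of 300 (0x12C) appears on Port B as 0x2C, so with this method any input that produces a result above 255 is not represented correctly.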
Conclusion
This chapter has only dealt with simple C programs for 8051 and PIC. With more
knowledge in C, many more constructs of the C programming language (including
pointers) can be used in Embedded C. Programs using embedded C for PSoC are
included in Chapter 12, and for ARM in Chapter 11.
Programming an MCU in a high level language like C is much easier than using assembly
language, though the latter can be more efficient.
One needs to know about the architecture of the MCU to write a program for it.
The header file which defines the names and addresses of the registers is to be ‘included’
in the C program.
It is possible to address 8-bit ports, single-bit port pins and also memory.
Since the final program will be burned on ROM, it is necessary to write programs with an
infinite loop so that unused locations are not accessed.
To write programs for 8051, the details of its ports and peripherals are to be known.
The ‘for’ and ‘while’ loops of C can be used to create delays.
Timers are programmable using flag checking as well as using interrupts.
Serial communication between a PC and an 8051 is possible and the data transferred can
be observed using the HyperTerminal facility of Windows.
MPLAB is the most popular IDE for PIC MCUs.
QUESTIONS
1. Write a program to toggle all the bits of port B of 8051 alternately with a small delay.
2. Write a program to generate a square wave at pin P0.5. Observe the waveform in the logic
analyser of the Keil simulator. Use software delay loops.
3. Study the timers of 8051 as given in Chapter 14, and write C programs to get the
following.
a) A symmetric square wave of 10 kHz at pin P2.3 using timer 1 in mode 1.
b) A symmetric square wave of 15 kHz at pin P2.3 using timer 0 in mode 1.
c) An unsymmetric square wave of period 10 ms and duty cycle of 20 percent at pin
P0.6, using timer 1 in mode 2.
4. Generate all the waveforms of question 3 using timers in the interrupt mode.
5. What do the following directives do?
a) sbit
b) bit
6. Write a program to do the following.
a) Shift left thrice an 8-bit data, and move the result to port 0.
b) Logically AND the 8-bit contents of two memory locations and move the result to
port 1.
c) Logically Ex-OR the 8-bit contents of two memory locations and move the result to
port 0.
EXERCISES
1. Study the operation of the timers of any PIC MCU of the 18F series and write a program to
generate square waves of different frequencies using the flag check mode as well as the
interrupt mode.
2. Repeat the above for any PIC chip of the 16F and 21F series.
PART-III
POPULAR MICROCONTROLLERS
USED IN EMBEDDED SYSTEMS
Chapter 10
ARM: The World’s Most Popular 32-bit Embedded Processor
Part I – Architecture and Assembly Language Programming
Chapter-opening image: An ARM7 LPC2140 board.
In this chapter you will learn:
The history of the ARM processor
The features and architecture of ARM
The instruction set of ARM
Assembly language programming for ARM
The addressing modes of ARM
How to use subroutines without a stack
How to generate 32-bit constants using the rotation scheme
The concept of literal pools
How to access R/W and Read Only memory
The use of different types of stacks
Introduction
This chapter gives an introduction to ARM, the very popular 32-bit processor, with
a short account of its history, followed by details of where it stands in the embedded
processor market now. ARM stands for ‘Advanced RISC Machine’. The name explicitly
states its characteristic of being a RISC processor. The first ARM processor was actually
named the ‘Acorn RISC Machine’, as it was developed by Acorn Computers
Ltd., Cambridge, England, in 1985.
10.1 | History of the ARM Processor
In 1985, Acorn Computers Ltd. was in search of a new processor to introduce in the
desktop market. While the technocrats were contemplating various design options, they
came across a few papers published by students at the University of California, Berkeley
(USA), outlining a very simple processor design based on RISC principles. The computer
architects of Acorn Computers found the design very attractive and decided to build
a new processor using some of these principles. This led to the development of ARM1,
which had fewer than 25,000 transistors and operated at 6 MHz.
This was followed by ARM2 (in 1987) with 30,000 transistors. Compared to an Intel
or Motorola processor of that time, which had about 70,000 transistors, this was a beauty
in terms of a smaller die size and lower power dissipation. This was thus the first ARM
processor to be produced in bulk. It had a 32-bit data bus, a 26-bit address space
and sixteen 32-bit registers, and was clocked at 8 to 12 MHz. It dissipated much less
power, and performed much better than Intel’s 80286, which came up around the same
time (but focused on the desktop market).
ARM3, ARM4 and ARM5 were also designed, but never produced, because around
this time, in 1990, Acorn Computers teamed up with Apple Computers and VLSI
Technology group to form a company named Advanced RISC Machines Ltd. This company
continued with ARM6, ARM7, etc. The latter was the processor which became
very popular and led to ARM being used in exotic products such as mobile phones,
PDAs, iPods, computer hard disks, etc. After this, ARM made rapid strides in the 32-bit
embedded market, accounting for a very high percentage of applications in the high-end
embedded systems market.
As of 2011, ARM processors account for approximately 90 per cent of all embedded
32-bit RISC processors. ARM processors are used extensively in consumer electronics,
including PDAs, mobile phones, digital media and music players, handheld game consoles,
calculators and computer peripherals such as hard drives and routers.
The subsequent and more advanced processors of the ARM family (ARM9, ARM10,
ARM11, Cortex) have been built on the success of the ARM7 processor, which is still
the most popular and widely used member of the ARM family.
Over the years, many advanced features have been added to the ARM processor, but
the core has remained more or less the same.
10.1.1 | The ARM Core
What is meant by the ‘core’? The core is the ‘processing unit’ or the ‘computing engine’
which has all the computing power, and this aspect is decided by the architecture, which
represents the basic design of the processor.
One special feature of ARM as a company is that it designs the core and
licenses this IP (Intellectual Property) to others. This simply means that the company
does not ‘fabricate’ the chip, but sells only the design. This design is taken by the licensee,
who may or may not add more features (usually peripherals) to the design. Sometimes
the buyer can also modify the basic design to a minor extent. The buyer company
fabricates the design and sells it or uses it in its own products.
There are various ways in which ARM sells its IP. It could be in the form of a soft
IP. In this case, the design is sold as RTL (VHDL/Verilog code), and this allows the
buyer to modify the design to a certain extent. If the design is sold as a hard IP, it means
the buyer gets only the layout or the net list (connection of nets or electronic wires).
Thus, the buyer can add only peripherals to the ‘black box’ design he has purchased.
We can thus understand that ARM the company does not ‘fabricate’ ARM chips.
(In contrast, Intel fabricates its processors and sells them as chips.) It is because of this
that we have ARM chips and boards from various companies: Samsung, Philips, Atmel,
Texas Instruments, ST Microelectronics and so on. The list is very long.
10.1.2 | The ARM Microcontroller
ARM has been designated as a ‘microprocessor’ and indeed it is a processor which has
very high computing capabilities. It has a rich set of features for handling complex
computations.
However, for using it as an embedded processor, it needs many more capabilities
and these come in the form of on-chip peripherals. To the ARM core, peripherals are
added and thus it becomes a ‘microcontroller’ or an MCU (microcontroller unit), rather
than an MPU (microprocessor unit). Figure 10.1 shows the ARM MCU. The number
and kind of peripherals added depend on the requirements of the buyer of the IP. It is
because of this that we have a varying number of peripherals for ARM processors supplied
by different companies. It is evident that to support more peripherals, the
core has to be more powerful. That is why we generally find more peripherals around an
ARM9 core than around an ARM7 core. But as a rule, users have to spell out
their requirements for the peripherals of an MCU.
When a chip has the core and the necessary peripherals to perform as a system, it is
called a System on Chip (SoC). The term ‘ARM SoC’ is very commonly used;
understandably, such a chip has some version of the ARM core and a large set of peripherals.
10.1.3 | RISC vs CISC
The differences between these two schools of thought in computer architecture have
been discussed in Section 0.3.
But to put the idea in a proper perspective in the context of ARM, some specific
features of RISC are listed herein. These apply to most of the instructions of ARM, but
not necessarily to all.
i) Instructions are of the same size, that is, 32 bits
ii) Instructions are executed in one cycle
iii) Only the load and store instructions access memory
Figure 10.1 | ARM SoC: core with peripherals
[Figure labels: Developed by ARM; Chip developed by licensees and chip manufacturers; Internal Bus; ARM Core; GPIO; Memory; Peripherals; Clock]
Due to these simple guidelines in the design of the ISA (Instruction Set Architecture),
the outstanding features of this RISC processor are as follows:
i) The number of transistors needed is much less than that of a CISC processor of
comparable computational power.
ii) The die size is less because of the reduced hardware involved.
iii) Due to these aspects (and a few others, which will soon be elaborated), power dis-
sipation is very low.
10.1.4 | Advanced Features
Once the basic ARM core was designed, later members of the family kept on having
more and more features added. Over the years, some of these became ‘standard’ and
some are still optional. To specify what features are available with a particular ARM core,
naming conventions were adopted, though these have had to be changed over the years.
Let us take a look at some of these features. If reading this section seems cumbersome, you
can skip it now, but make sure you read it later.
i) Thumb: A new 16-bit instruction set called ‘Thumb’ was made available. The logic
of having this less powerful instruction set is that not all applications need the
full power of the 32-bit ARM instructions. For such cases, the 16-bit Thumb set (which
is a compressed form of the ARM instruction set) will be enough and the advantage
obtained is that of high ‘code density’.
But what is code density?
The higher the amount of code that can be contained in a unit area of memory, the
higher is the code density. Thus, when available memory is limited, it may be
sufficient to use Thumb instructions, if the application is light. There is also a
facility for mixing ARM and Thumb instructions; this is called ‘ARM Thumb
interworking’.
ii) MMU and MPU: These are two aspects related to memory. One is the ‘memory
management unit’ and the other is the ‘memory protection unit’. Such units are
mandatorily available in all advanced desktop processors (like Pentium), but for
embedded systems, the necessity of such units is dictated by the product for which
the processor is to be used. Thus, we have some ARM processors with both MPU
and MMU, and others with one or neither of them.
iii) Cache: The first ARM processor with a cache was ARM3. It had an on-chip cache
of 4 KB. ARM 7 had a cache of 8 KB which was improved in ways other than just
the size. Current ARM processors have cache as a standard component.
iv) Debug interface: There is an on-chip unit for testing called the JTAG interface.
JTAG stands for ‘Joint Test Action Group’ and defines a set of standards for testing
the functionality of hardware. For any chip/system there is a set of scan cells located
at the boundaries and there are specific signals designed to enable ‘testing’ of the
device. Such a unit is called the JTAG debug interface, and some ARM chips have
this facility.
v) EmbeddedICE macrocell: The current hardware trend is to design a system as
‘macrocells’, which are hardware units. The ARM core could be considered as a macrocell,
while other units (peripheral units as well) may also be added as ‘macrocells’. Some
processors have an EmbeddedICE (In-Circuit Emulator) macrocell to enable testing.
This unit is built around breakpoint and watchpoint registers and control and
status registers. Together these can halt the ARM core to read its status and
thus do active debugging.
vi) Fast multiplier: Even though ARM is a RISC processor, there are many features
in it which do not conform exactly to the RISC philosophy. Having dedicated
hardware for complex operations is one such deviation. Multiplication is a complex
operation, and for fast multiplication, there may be a fast multiplier unit.
vii) Enhanced instructions: Most advanced embedded systems require DSP operations,
and for that a DSP unit with complex arithmetic operations may be made
available on the chip.
viii) Jazelle DBX (Direct Bytecode eXecution): This allows some ARM processors to
execute Java bytecode in hardware as a third execution state along with the existing
ARM and Thumb modes. This is useful to increase the execution speed of Java ME
games and applications. ARM claims that such Java applications run in hardware
(rather than software) so that ‘more speed’ is achieved.
ix) Vector floating point unit: This implies hardware support for floating point
computation.
x) Synthesizable: If an ARM processor is synthesizable, it means that its RTL code
is available with the licensee, using which extensions and modifications are possible
to the basic core.
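The idea of code density mentioned under Thumb in the list above can be put into numbers. The sketch below is ordinary C and uses only the instruction widths stated in the text (32 bits for ARM, 16 bits for Thumb). The function name code_bytes and the instruction counts in the usage note are hypothetical figures chosen only for illustration.

```c
#include <assert.h>

/* Memory occupied by a program of 'instructions' instructions when
   each instruction is 'bits_per_instruction' wide. */
unsigned long code_bytes(unsigned long instructions,
                         unsigned bits_per_instruction)
{
    assert(bits_per_instruction == 16 || bits_per_instruction == 32);
    return instructions * (bits_per_instruction / 8u);
}
```

A routine of 1000 ARM instructions occupies 4000 bytes. If the same routine were to need, say, 1300 Thumb instructions, it would occupy 2600 bytes, so the Thumb version is still smaller even though it uses more instructions.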
See Table 10.1, which summarizes the early naming conventions of ARM processors.
(The { } notation means ‘optional’.)
From Table 10.1, let’s try to decipher what ARM7TDMI indicates. It is based on
the ARM7 core, and has the Thumb instruction set (T), JTAG debugger (D),
fast multiplier (M) and the EmbeddedICE macrocell (I). If the designation is
ARM7TDMI-S, it means it is synthesizable (the design is available as VHDL/Verilog
code). Figure 10.2 shows two ARM cores which use this naming convention.
Table 10.1 | Early Naming Conventions for ARM
ARM {x}{y}{z}{T}{D}{M}{I}{E}{J}{F}{H}{S}
x Family (7, 8, 9, 10, 11, …)
y Memory management/protection unit
z Cache
T Thumb 16-bit decoder
D JTAG debug
M Fast Multiplier
I EmbeddedICE macrocell
E DSP Enhanced instructions (assumes TDMI)
J Jazelle
F Vector Floating-point Unit
S Synthesizable Version
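The deciphering of a name such as ARM7TDMI using Table 10.1 can also be done by a small program. The following is a sketch in ordinary C; the table FEATURES and the functions has_feature and feature_meaning are our own illustrations, not an ARM tool. The sketch looks only at the letters that actually appear in the name, so it does not model the rule that E assumes TDMI, and it ignores the optional y and z digits.

```c
#include <stddef.h>
#include <string.h>

/* Feature letters from Table 10.1 and their meanings. */
static const struct { char letter; const char *meaning; } FEATURES[] = {
    { 'T', "Thumb 16-bit decoder" },
    { 'D', "JTAG debug" },
    { 'M', "Fast Multiplier" },
    { 'I', "EmbeddedICE macrocell" },
    { 'E', "DSP Enhanced instructions" },
    { 'J', "Jazelle" },
    { 'F', "Vector Floating-point Unit" },
    { 'S', "Synthesizable Version" }
};

/* Return 1 if the name (for example "ARM7TDMI-S") carries the given
   feature letter. The leading "ARM" and the digits that follow it are
   skipped, so the M of "ARM" is not mistaken for the multiplier. */
int has_feature(const char *name, char letter)
{
    const char *p = name;
    if (strncmp(p, "ARM", 3) == 0)
        p += 3;
    while (*p >= '0' && *p <= '9')
        p++;
    return letter != '\0' && strchr(p, letter) != NULL;
}

/* Meaning of a feature letter, or NULL if it is not in the table. */
const char *feature_meaning(char letter)
{
    size_t i;
    for (i = 0; i < sizeof FEATURES / sizeof FEATURES[0]; i++)
        if (FEATURES[i].letter == letter)
            return FEATURES[i].meaning;
    return NULL;
}
```

For "ARM7TDMI", has_feature reports T, D, M and I as present and S as absent, while for "ARM7TDMI-S" the letter S is also found.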
Subsequently, it was decided to do away with these complex naming schemes, as the
features corresponding to TDMI were expected to be mandatorily available in all ARM
processors. But some numbers were added to imply the presence of memory interfaces,
cache, tightly coupled memory and so on. For example, ARM processors with cache and MMU are
now given the suffix 26 or 36, whereas processors with MPUs are suffixed with 46. Over
the years, this type of naming convention has also changed. Refer to Table 10.2 for some
more variants of ARM.
10.1.5 | Architecture Versions
Over the years, the architectural features have also been enhanced. Thus, later versions
of the architecture are more powerful. Versions v4 and v4T are the early versions; later
versions are v5, v5E, v6 and v7. Table 10.3 lists various architecture variants of ARM.
10.1.6 | ARM CORTEX
ARM has come a long way from ARM2, which was the first one to be commercially
produced. ARM7 was a resounding success which made ARM the dominant player in
the 32-bit embedded processor market. ARM7 was followed by ARM9, ARM10 and
ARM11, each of which offered more computing power than its predecessor. The latest in the
sequence is the CORTEX series, which is based on version v7 of the architecture. To make this
series cater to well-defined application sets, the following three profiles have been defined:
i) The A profile: This profile which has the ARMv7-A architecture is meant for
high end applications. It is meant to handle complex applications with high-end
embedded operating systems, and typical applications requiring such a profile are
mobile phones and video systems.
Figure 10.2 | Two ARM cores
ARM920T: ARM-9 core (ARM V4T, Thumb), MMU, dual 16K caches, Embedded ICE, ETM9 interface, ASB interface
ARM7TDMI: ARM-7 core (ARM V4T, Thumb), Embedded ICE-RT, ETM7 interface

Table 10.2 | Variants of the ARM Processor
Processor Name     Architecture Version   Memory Management Features          Other Features
ARM7TDMI           ARMv4T                 –                                   –
ARM7TDMI-S         ARMv4T                 –                                   –
ARM7EJ-S           ARMv5E                 –                                   DSP, Jazelle
ARM920T            ARMv4T                 MMU                                 –
ARM922T            ARMv4T                 MMU                                 –
ARM926EJ-S         ARMv5E                 MMU                                 DSP, Jazelle
ARM946E-S          ARMv5E                 MPU                                 DSP
ARM966E-S          ARMv5E                 –                                   DSP
ARM968E-S          ARMv5E                 –                                   DMA, DSP
ARM996HS           ARMv5E                 MPU (optional)                      DSP
ARM1020E           ARMv5E                 MMU                                 DSP
ARM1022E           ARMv5E                 MMU                                 DSP
ARM1026EJ-S        ARMv5E                 MMU or MPU                          DSP, Jazelle
ARM1136J(F)-S      ARMv6                  MMU                                 DSP, Jazelle
ARM1176JZ(F)-S     ARMv6                  MMU+TrustZone                       DSP, Jazelle
ARM11MPCore        ARMv6                  MMU+Multiprocessor Cache Support    DSP, Jazelle
ARM1156T2(F)-S     ARMv6                  MPU                                 DSP
Cortex-M0          ARMv6-M                –                                   NVIC
Cortex-M1          ARMv6-M                FPGA TCM interface                  NVIC
Cortex-M3          ARMv7-M                MPU (optional)                      NVIC
(Courtesy: The Definitive Guide to ARM Cortex-M3 by Joseph Yiu, Newnes Publications)
Table 10.3 | Features of the Architecture Variants of ARM
Architecture Version   Features
v4                     ARM instructions only
v4T                    THUMB instructions also added
v5                     More advanced ARM and THUMB instructions
v5E                    Advanced ARM instructions and enhanced DSP instructions
v6                     Advanced ARM and THUMB; SIMD and memory support instructions added
v7                     THUMB-2 technology, in which both 16-bit and 32-bit instructions are
                       supported, and there is no need to switch between the ARM and THUMB
                       instruction sets

ii) The R profile: This profile which has the ARMv7-R architecture has been designed
for high-end applications which require real-time capabilities. Typical applications
are automatic braking systems and other safety critical applications.
iii) The M profile: This profile, which has the ARMv7-M architecture, has been designed
for deeply embedded microcontroller type systems. This is to be used in industrial
control applications where a large number of peripherals may have to be handled
and controlled.
10.1.7 | The Features of ARM Which Make It ‘Special’
Now that we have done a survey of the range of ARM processors, let’s discuss the
features which have made ARM a very popular processor in the high-end embedded
market.
i) Data bus width: The processor has a 32-bit data bus, which means that it can
read and write 32 bits in one cycle. For high end applications, a wide data
bus corresponds to a high data bandwidth and is very important. When ARM first
made its entry into the field, there were very few embedded processors which had
such a wide bus.
ii) Computational capability: The instruction set of ARM has been cleverly designed
to facilitate very good computational capability. Many unique and new methods of
fast computation that do not need extensive hardware are used. The design of
the processor used the RISC approach, but over the years, this philosophy has been
diluted to enable the addition of specialized hardware for computationally intensive
tasks. In essence, ARM is a RISC processor which has a few CISC features as well.
iii) Low power: In the embedded field, power saving is very important, because a large
number of devices operate on battery power. Designing lower power processor cores
is thus a matter of high priority. How is it that a processor is designed to have low
power capability? Embedded processors operate at low clock frequencies compared to
desktop processors. While 3.3 GHz is commonly used in the desktop processor field,
ARM operates at relatively low frequencies, from 60 MHz to at most 1 GHz.
The other techniques in low-power design are explained in Section 2.4.
iv) Pipelining: Pipelining is a fundamental idea in computer architecture for increasing
the speed of operation. The idea is to get many activities done at the same time,
by dividing the whole of instruction processing into sub-stages. The basic task
that any processor does is ‘fetch, decode and execute’. In the simplest form of
pipelining (3 stage), all three stages are active all the time. While the first stage is
fetching an instruction, the next stage, that is, the decode stage, is busy
decoding the previously fetched instruction, and the execute stage is executing
the instruction which had been decoded before that. Thus at any time, there
are three instructions simultaneously present in the pipeline, at different levels of
processing.
Suppose an instruction takes a total time T to pass through all three stages,
so that each stage takes T/3. Once the pipeline is full, one instruction completes
in every sub-cycle of period T/3, which amounts to 3 instructions in the period T,
against one instruction in the same time without pipelining. It means that, ideally,
the processing speed is multiplied by 3.

Figure 10.3a | A three stage pipeline
Fetch → Decode → Execute
Figure 10.3b | The three stage pipeline with 3 instructions in operation
Cycle     1       2        3        4        5
INSTR 1   Fetch   Decode   Execute
INSTR 2           Fetch    Decode   Execute
INSTR 3                    Fetch    Decode   Execute
Figure 10.3a shows a three stage pipeline, while Figure 10.3b shows three
instructions in the pipeline. Any instruction needs three sub-cycles to come out of the
pipeline, but once the pipeline is full, one instruction emerges in every sub-cycle,
which translates to a throughput of three instructions per period T.
ARM7 has a 3-stage pipeline, while ARM9 has a 5-stage pipeline with more
finely divided stages (Figure 10.4), which are ‘fetch, decode, execute, buffer data
and write back’. As a general rule, more advanced processors have more pipeline
stages; for example, ARM10 has 6 stages.
Pipelining is a great idea, but it has the drawback that when a branch instruction
is taken, the instructions following it are no longer to be executed in
the normal sequence. So the instructions in the earlier stage/stages have to be
discarded, or we say that the pipeline is to be flushed. This creates a loss of speed,
and the penalty is higher for pipelines with more stages.
v) Multiple register instructions: Since ARM is a RISC processor, its data processing
instructions operate only on data held in registers; this simply means that they
do not use addressing modes in which one operand is in
memory. But there are instructions which access memory and load data into multiple
registers; also, the contents of multiple registers can be stored in memory with a
single instruction.
vi) DSP enhancements: The processor has RISC as its basic policy, but the more
advanced members of the family have DSP (Digital Signal Processing) instructions
as an enhanced feature. This is where ARM departs from its RISC philosophy, but
the departure is necessary for surviving in the embedded market. These DSP enhancements are
signified by an ‘E’ in the name, as in the ARMv5TE and ARMv5TEJ architectures.
Figure 10.4 | A five-stage pipeline
Fetch Decode Execute Buffer Write
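The throughput argument for pipelining, and the cost of flushing, can be checked with a small model. This is a simplified sketch in Python under stated assumptions: every stage takes one sub-cycle, there are no stalls, and each taken branch is assumed to throw away all the instructions already fetched behind it (one less than the number of stages). Real ARM cores differ in the exact penalty. The function names are hypothetical.

```python
def pipeline_cycles(num_instructions, num_stages, taken_branches=0):
    """Sub-cycles needed to run a program through an idealized pipeline.

    The first instruction needs num_stages sub-cycles to emerge; each later
    instruction completes one sub-cycle after its predecessor. Every taken
    branch is assumed to cost (num_stages - 1) extra sub-cycles for the flush.
    """
    if num_instructions == 0:
        return 0
    fill = num_stages - 1
    return fill + num_instructions + taken_branches * fill

def speedup(num_instructions, num_stages, taken_branches=0):
    """Ratio of unpipelined time to pipelined time, both in sub-cycles."""
    unpipelined = num_instructions * num_stages
    return unpipelined / pipeline_cycles(num_instructions, num_stages, taken_branches)
```

With three instructions and three stages, the model needs 5 sub-cycles, which agrees with the five cycles drawn in Figure 10.3b. For a long program without branches, `speedup` approaches the number of stages; with taken branches, the deeper pipeline pays more for each flush.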

10.2 | ARM Architecture
With this background, let us get started on the more intricate details of the processor.
10.2.1 | Instruction Set Architecture
It is likely that you have heard the term ‘Instruction Set Architecture’ (ISA) mentioned
in some context or the other. The term implies the user’s, i.e. the programmer’s, view of
the processor, which comprises the instruction set, addressing modes, registers, etc. ISA
is the assembly programmer’s or compiler designer’s view of the processor. We will base
most of our discussions on ARM7, which was the first widely successful and is still the most popular of the
ARM processors. Advanced versions may have more enhancements, but the basic architecture
is more or less the same.
10.2.2 | Operating Modes
ARM has seven operating modes, which are listed here. It is not important to understand
the exact functions of each mode right now. But keep in mind that the user mode is
the simplest mode, with the least privileges, and is the mode under which most
application programs run. The system mode is a highly privileged mode. This mode is
used by operating systems to manipulate and control the activities of the processor. The
other modes are entered on the occurrence of exceptions; in other words, they are interrupt
modes. The operating modes of ARM are:
i) User: Unprivileged mode under which most tasks run
ii) FIQ (Fast Interrupt Request): Entered on a high priority (fast) interrupt request
iii) IRQ (Interrupt Request): Entered on a low priority interrupt request
iv) Supervisor: Entered on reset and when a software interrupt instruction (SWI) is
executed
v) Abort: Used to handle memory access violations
vi) Undef: Used to handle undefined instructions
vii) System: Privileged mode using the same registers as user mode
10.2.3 | Register Set
ARM has 37 registers, each of which is 32 bits long. They are listed as follows:
i) 1 dedicated program counter (PC)
ii) 1 dedicated current program status register (CPSR)
iii) 5 dedicated saved program status registers (SPSR)
iv) 30 general purpose registers
Now, let’s go into the details of the listed registers.
10.2.3.1 | General Purpose Registers
There are 30 of them, but they are distributed among different modes.
To understand this feature, see the case of one particular mode, say the user mode.
In this mode, the registers act as shown in Table 10.4.

Figure 10.5 shows the whole set of registers available for the processor. Look at the
set of registers titled ‘user and system’. Let’s discuss the specific functions of each of
them.
R0–R12 are general purpose registers, or what may be designated as ‘scratch pad’
registers. These are the registers into which data and addresses are loaded. They are also
‘the’ registers used in computations.
R13 is the pointer to the stack, that is, the stack pointer (SP).
R15 acts as the program counter (PC), which, as in any other processor, is the
register which sequences instructions as they are fetched from memory.
Table 10.4 | Registers in the User Mode
Register Numbers Designations
R0–R12 General purpose registers
R13 Stack pointer (SP)
R14 Link register (LR)
R15 Program counter (PC)
Figure 10.5 | Register set of ARM
Mode                      Registers
User and System           R0–R12, R13 (SP), R14 (LR), R15 (PC), CPSR (no SPSR)
Fast Interrupt Request    R8_FIQ to R14_FIQ, SPSR_FIQ
Interrupt Request         R13_IRQ, R14_IRQ, SPSR_IRQ
Supervisor                R13_SVC, R14_SVC, SPSR_SVC
Undefined                 R13_Undef, R14_Undef, SPSR_Undef
Abort                     R13_ABT, R14_ABT, SPSR_ABT

R14 is the link register (LR), a special register. It is used whenever there is a procedure
call or an interrupt, that is, branching to a location. When branching becomes necessary,
the value of PC is saved in the link register, and PC takes on the new branch address.
When returning to the original sequence, the PC value can be retrieved from the link
register. This is a very convenient option, because the necessity to push the PC value to
the stack is avoided. The stack is a memory area, and saving to and retrieving from the stack is
time consuming. Having such a register, that is, the LR, to store return addresses helps
to reduce the delay associated with procedure calls and interrupts.
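The role of the link register can be seen in a toy model. The Python sketch below is only an illustration under simplifying assumptions: the registers are held in a dictionary, `PC` stands for the address of the instruction being executed (the pipeline offset of a real ARM is ignored), and the function names `call` and `ret` are made up for this example.

```python
def call(regs, target, instruction_size=4):
    """Branch to target, leaving the return address in the link register."""
    regs["LR"] = regs["PC"] + instruction_size  # address of the next instruction
    regs["PC"] = target

def ret(regs):
    """Return to the caller by copying the link register back into PC."""
    regs["PC"] = regs["LR"]
```

Note that no memory access is involved in either step. The price is that a second call made inside the procedure would overwrite LR, so a procedure that calls another one has to save LR (for instance, on the stack) first.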
10.2.4 | Mode Switching
We know that there are seven modes for the processor, which implies that it can be
switched to different modes, as decided by the requirement. When the processor switches,
say, from the user to another mode, some of the user mode registers are replaced by
another set of registers. Take the FIQ mode, for example. In this mode, R8 to R14 are
replaced by another set of registers, and the names of these registers are suffixed by FIQ,
like R14_FIQ, R12_FIQ and so on.
Why is it that FIQ uses another set of registers?
Note that this mode is entered on a ‘fast interrupt’ which means it requires fast action.
One action during interrupts would be to save the contents of the currently used registers.
This ‘saving’ takes some time. To ensure fast operation, when the processor switches
to the FIQ mode, new registers are used. No time is spent on saving the contents of
registers R8 to R14 of the user mode. Once the FIQ mode is entered, those registers are
just swapped out and replaced by a set of new registers. Note, however, that not all
registers are swapped out.
Now look at Figure 10.5 once again to note the IRQ mode. Here only R13 and R14
are replaced by new registers. In the IRQ mode, the response is not expected to be as
fast as in the FIQ mode. Thus, there is sufficient time for the interrupt handler to save the contents of most of
the registers that it needs to use. This also applies to the modes
‘undef, supervisor and abort’. In these modes too, only two registers are swapped out and
replaced with new ones.
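The register swapping described above, and the total of 37 registers, can be verified with a short sketch. This is a Python model written for illustration only: the `BANKED` table is read off Figure 10.5, the mode keys reuse the suffixes seen in that figure, and the function names are hypothetical.

```python
# Registers which each exception mode replaces with private copies (Figure 10.5).
# User and System share one set, so they have no entry here.
BANKED = {
    "FIQ": range(8, 15),     # R8 to R14
    "IRQ": range(13, 15),    # R13 and R14
    "SVC": range(13, 15),
    "Undef": range(13, 15),
    "ABT": range(13, 15),
}

def physical_register(mode, n):
    """Name of the physical register that 'Rn' refers to in the given mode."""
    if not 0 <= n <= 15:
        raise ValueError("register number must be 0 to 15")
    if n in BANKED.get(mode, ()):
        return f"R{n}_{mode}"
    return f"R{n}"

def count_registers():
    """Count the distinct physical registers, with CPSR and the five SPSRs."""
    modes = ["User", "System", *BANKED]
    names = {physical_register(mode, n) for mode in modes for n in range(16)}
    names.add("CPSR")
    names.update(f"SPSR_{mode}" for mode in BANKED)
    return len(names)
```

In this model, R10 in the FIQ mode maps to `R10_FIQ`, while R10 in the IRQ mode is the same `R10` that the user mode sees. The count works out to 16 + 7 + (4 × 2) = 31 registers, and with CPSR and the five SPSRs, 37 in all.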
CPSR
The CPSR (Current Program Status Register) is a very important register, and there is
only one such register for the processor. Figure 10.6 and Table 10.5 give its details.
The CPSR contains the information about the current state of the processor. It has
bits which specify the mode, control bits to enable/disable interrupts, and a bit which specifies
whether the Thumb or ARM state is currently in use.
Bits    31  30  29  28  27  26–25      24  23–8       7  6  5  4–0
Field   N   Z   C   V   Q   Undefined  J   Undefined  I  F  T  mode
Figure 10.6 | Current Program Status Register (CPSR) bit configuration

Bits 0 to 4 specify the current mode of operation. Since there are only 7 modes of
operation, only seven of the 32 possible mode numbers are valid.
The J bit indicates whether the processor is in the Jazelle state. The T bit specifies
whether the processor is currently in the ARM or the Thumb state.
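To see how the fields of the CPSR are picked out of a 32-bit value, here is a small decoding sketch in Python. It follows the bit positions of Table 10.5. The five-bit mode numbers are not given in this chapter; the ones used in `MODES` are the standard ARM encodings and should be read as an addition to the text. The function name `decode_cpsr` is made up for illustration.

```python
# Standard ARM encodings of the mode field (bits 4 to 0); not listed in the text.
MODES = {
    0b10000: "User", 0b10001: "FIQ", 0b10010: "IRQ", 0b10011: "Supervisor",
    0b10111: "Abort", 0b11011: "Undef", 0b11111: "System",
}

def decode_cpsr(value):
    """Break a 32-bit CPSR value into the fields of Table 10.5."""
    def bit(n):
        return (value >> n) & 1
    return {
        "mode": MODES.get(value & 0x1F, "invalid"),
        "state": "Thumb" if bit(5) else "ARM",
        "fiq_disabled": bool(bit(6)),
        "irq_disabled": bool(bit(7)),
        "J": bit(24), "Q": bit(27),
        "V": bit(28), "C": bit(29), "Z": bit(30), "N": bit(31),
    }
```

For instance, the value 0x600000D3 decodes as supervisor mode, ARM state, both interrupts disabled, with the Z and C flags set.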
The mode and interrupt-disable bits of this register can be changed only in a
privileged mode. The register also contains the condition flag bits. Most of you are likely to know the
relevance of the conditional flag bits. But for those who might be new to the concept of
flags, here is a concise description.
10.2.5 | Conditional Flags
N: Negative Flag This flag indicates the status of the MSB of the result of an operation.
If we are dealing with signed numbers, N = 1 means that the sign bit = 1, which is
a negative result.
C: Carry Flag This bit is set if there is an overflow from the MSB of the data being
manipulated; this can happen in additions, shifts, rotates etc. It is also set when a
subtraction needs no borrow. If R1 – R2 is computed and C = 1, it indicates that R1 is
greater than or equal to R2 (both taken as unsigned numbers). To be precise, let’s say that ‘A carry occurs if the result of an add,
subtract or compare is greater than or equal to $2^{32}$, or as the result of an inline barrel shifter
operation in a move or logical instruction’.
Z: Zero Flag If the result of an arithmetic or logical operation is zero, then Z = 1.
V: Overflow Flag This is the overflow flag, which is relevant only for signed operations.
It indicates that the sign bit has possibly been corrupted because the result has gone out
of range.
When signed numbers are used, only 31 bits are available for the magnitude of the
numbers. With 32 bits, overflow occurs if the result of an add, subtract or compare is
greater than or equal to $2^{31}$, or less than $-2^{31}$, that is, outside the range available
for signed numbers.
Table 10.5 | CPSR Bits
Bit Nos.            Notation     Interpretation
0 to 4              Mode         Specifies the current mode of operation
5                   T            Specifies whether in ARM (T = 0) or Thumb (T = 1) state
6                   F            Disables FIQ (F = 1)
7                   I            Disables IRQ (I = 1)
8 to 23, 25 to 26   –            Undefined
24                  J            In Jazelle state (J = 1)
27                  Q            Sticky overflow flag
28 to 31            V, C, Z, N   Conditional flags

To cite an example, say two positive numbers are added, and the magnitude of the
sum needs more than 31 bits. There will be an overflow into the sign bit, which will
change the MSB to ‘1’ and get wrongly interpreted as a negative number. This overflow
into the sign bit (MSB) with no overflow out of the MSB causes the overflow (V) bit
to be set.
Q: Sticky Overflow Flag This flag also indicates overflow, but it is ‘sticky’ in the sense
that it remains set until explicitly cleared.
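The behaviour of the N, Z, C and V flags described above can be reproduced with a few lines of code. The following is a sketch in Python, not a description of the ARM hardware: operands are assumed to be integers in the range 0 to $2^{32} - 1$ (the raw register contents), and the function names `add_flags` and `sub_flags` are hypothetical.

```python
MASK = 0xFFFFFFFF  # keeps results to 32 bits

def _nz(result):
    """N is the MSB of the 32-bit result; Z is 1 when the result is zero."""
    return {"N": result >> 31, "Z": int(result == 0)}

def add_flags(a, b):
    """Flags after the 32-bit addition a + b."""
    total = a + b
    result = total & MASK
    carry = int(total > MASK)                        # carry out of the MSB
    overflow = ((a ^ result) & (b ^ result)) >> 31   # operands agree in sign, result differs
    return {**_nz(result), "C": carry, "V": overflow}

def sub_flags(a, b):
    """Flags after the 32-bit subtraction a - b (as in a compare)."""
    result = (a - b) & MASK
    carry = int(a >= b)                              # C = 1 means no borrow was needed
    overflow = ((a ^ b) & (a ^ result)) >> 31        # operands differ in sign, result leaves a's sign
    return {**_nz(result), "C": carry, "V": overflow}
```

Adding 1 to 0x7FFFFFFF, the largest positive signed number, sets N and V but not C, which is exactly the ‘overflow into the sign bit’ case. Adding 1 to 0xFFFFFFFF sets Z and C but not V.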
Saved Program Status Registers (SPSR) There are five ‘Saved Program Status
Registers’, that is, one for each of the ‘exception’ modes of operation. When an exception,
that is, an interrupt occurs, the current CPSR value is saved into the corresponding SPSR
(so that it can be restored on returning to the previous mode). The system
and user modes do not have SPSRs because they are not entered through the mechanism
of interrupts.
10.3 | Interrupt Vector Table
We have seen that ARM has a number of exception modes. Exceptions are a class of
interrupts which are internally generated due to the occurrence of some specific conditions.
For example, when an undefined instruction is detected, the processor can’t process
it. The solution for such an undesired situation is to make the processor switch to another
mode and generate an interrupt. This interrupt takes control to an interrupt service routine
(ISR, i.e. interrupt handler) residing in a specific location in memory. This specific
location is termed the ‘Interrupt Vector’ corresponding to this exception.
Besides ‘exceptions’, the processor can be interrupted by instructions and this is
called a software interrupt (SWI). There are hardware interrupts as well, which are activated
by FIQ or IRQ.
The aforesaid discussion is just to clarify the fact that associated with each exception,
hardware interrupt and software interrupt, there is a fixed interrupt vector which leads to the ISR
or the interrupt handler.
See Table 10.6, which shows the pre-defined interrupt vectors.
Table 10.6 | List of Interrupt Vectors
Exception Shorthand Vector Address
Reset RESET 0x00000000
Undefined instruction UNDEF 0x00000004
Software interrupt SWI 0x00000008
Prefetch abort PABT 0x0000000c
Data abort DABT 0x00000010
Reserved – 0x00000014
Interrupt request IRQ 0x00000018
Fast interrupt request FIQ 0x0000001c

Note that the first entry in the table is ‘Reset’. All processors have an address, termed the
‘reset vector’, which is the location to which control branches when the processor is first powered
on, or when it is reset in the midst of processor activity. For ARM, this is 0x0000 0000.
Since this location is always fixed, RESET is usually included in the class of vectored
interrupts.
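The addresses in Table 10.6 follow a simple pattern: the vectors are laid out one after another, 4 bytes apart, starting from the reset vector. A minimal Python sketch of this is given below. The names `VECTOR_ORDER`, `vector_address` and `vector_table` are made up for illustration, and the `base` parameter is only there to make the pattern explicit.

```python
# Exception shorthands in the order of Table 10.6 (None marks the reserved slot).
VECTOR_ORDER = ["RESET", "UNDEF", "SWI", "PABT", "DABT", None, "IRQ", "FIQ"]

def vector_address(shorthand, base=0x00000000):
    """Address of the vector for an exception: one 4-byte slot per table entry."""
    return base + 4 * VECTOR_ORDER.index(shorthand)

def vector_table(base=0x00000000):
    """Map every defined exception shorthand to its vector address."""
    return {name: vector_address(name, base) for name in VECTOR_ORDER if name is not None}
```

Since each slot is only 4 bytes wide, it has room for just one instruction, which is normally a branch to the actual handler.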
10.4 | Programming the ARM Processor
Now that we have had a look at the concepts regarding the instruction set architecture
(ISA) of ARM, we are in a position to understand it better by programming. Writing,
running and testing programs is the key to understanding any processor. By doing pro-
gramming, we become capable of understanding almost everything about how registers,
memory and flags act on data. In short, we get a total feel about the processing activity
done inside the processor.
To get to this, we need a programming environment, that is, an Integrated
Development Environment (IDE). There are many IDEs available for ARM, some of
which are free of cost (and freely downloadable) and some of which are proprietary and
thus have to be paid for. However, for students, an evaluation version is available as a
free download from the website www.keil.com. Here, we will use the
Keil IDE, also called the RVDK (Real View Development Kit), which is very popular
and easy to use. This version can be used for testing programs and for simulation also.
We will do all our learning using this IDE. The step-by-step procedure for using it
is detailed in Appendix A. In this part of the chapter, we will assume that you have this
IDE and also that you have already browsed through Appendix A.
10.4.1 | Programming—Assembly vs C
Programming can be done in assembly as well as in high level languages. In the embedded
design world, high level languages are used in product design, and C is a very popular
language. As such, we will also do C programming (in the next chapter). But before that,
let’s have a stint in assembly programming. Our approach will be that, to understand
the ARM core, that is, to use its registers, do memory access and so on, we will do assembly
programming. This ensures that we get a good grip on the ARM core architecture. In
this context, it will turn out that we focus on the computational capabilities of the core.
And when we start using ARM as a microcontroller, i.e. the core with a number of
peripherals, we use C programming. This will allow us to use the processor in various
practical applications involving peripherals and interaction with the external world. This
part will be discussed in Chapter 11.
10.5 | ARM Assembly Language
As mentioned earlier, the ARM instruction set has been cleverly designed to get more
than one operation to be done in a single instruction. Let’s list out some features of the
ARM instruction set.

i) ARM is a RISC processor, in which every instruction has a maximum size of 32 bits.
Instructions are expected to be executed in one cycle. This is true for most instructions,
but not for all. Therefore it is better to say that ARM is a RISC processor with
a few CISC type instructions as well.
ii) Another feature of RISC, and therefore of ARM, is that it is a load–store architecture.
This means that all computations are register based, that is, the operands are to
be brought to registers from memory, using a load instruction. After computation,
the result is to be stored in memory. For the user, this means that there are no data
processing instructions in which one of the operands is in memory. All operands are
to be available in registers before computation can be done.
iii) A third feature of ARM is that its ALU has a barrel shifter (Figure 10.7) associated
with one of its operands. A barrel shifter is a unit that can shift or rotate an operand
by more than one bit position, to the right or to the left, in a single operation. As we will soon see, the
barrel shifter adds some clever processing techniques to data processing and allows
shifting and an arithmetic operation to be combined in the same instruction.
iv) ‘Conditions’ can be appended to instructions: this implies that we can choose to ‘do
or not do’ a particular operation based on the status of a condition flag. For most other
processors, only branching operations depend on flag status. Here we will see that
data movement and data processing instructions can be made ‘conditional’.
10.5.1 | Data Types
ARM can operate on 32-bit data, which is termed a word, 16-bit data called a half word
and also on byte operands. The processing tools offer the option of storing data as ‘little
endian’ or ‘big endian’. To clarify this concept, follow the forthcoming discussion, and
observe Figure 10.8.
Figure 10.7 | Data processing unit (Operand 1 goes directly to the ALU; Operand 2 passes through the barrel shifter before reaching the ALU; the ALU produces the Result)

A 32-bit data word stored in memory needs 4 bytes of space, which means 4 consecutive
addresses are required, as one address can store only one byte. When the lowest byte
of the 32-bit word is stored in the lowest of these four addresses, it is called the ‘little
endian’ format. When the highest byte is stored in the lowest address instead, it is the
‘big endian’ format. See Figure 10.8. The 32-bit data
word is 0x0E4790A3. The storage addresses are from 0x00001200 onwards.
In the processor industry, both formats are used. Intel prefers the little endian format,
while Motorola uses the big endian format. ARM allows both formats (the choice can be fixed
by software, in the initialization stage). In this book, we assume the little endian format.
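The two byte orders can be tried out on the data word used above. This is a small Python sketch which uses the standard `struct` module to lay out the four bytes in increasing address order; the function names `store_word` and `load_word` are made up for illustration.

```python
import struct

def store_word(value, little_endian=True):
    """Return the 4 bytes of a 32-bit word in increasing address order."""
    return struct.pack("<I" if little_endian else ">I", value)

def load_word(data, little_endian=True):
    """Rebuild the 32-bit word from 4 bytes taken in increasing address order."""
    return int.from_bytes(data, "little" if little_endian else "big")
```

For the word 0x0E4790A3, `store_word` gives the bytes A3, 90, 47, 0E in the little endian format and 0E, 47, 90, A3 in the big endian format, as in Figure 10.8. Reading little endian bytes back with the wrong setting gives 0xA390470E, which shows why both sides must agree on the format.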
10.5.2 | Data Alignment
Storing (and also loading) 4 bytes in memory can be done in one cycle, because the
processor has a 32-bit data bus. When 32-bit data is stored in memory, four addresses
are needed, but we need to specify only one address in our instruction. Here an
aspect called ‘alignment’ comes in. For 32-bit data, ‘alignment’ implies that the last two bits of this
address are zero. For example, the address 0x00001200 is an aligned address. When this
address is used to store 32-bit data, this address and the next three addresses are automatically
accessed. This is because of the way memory is organized, as four banks (see
Figure 10.9).
Address Data
0x00001200 A3
0x00001201 90
0x00001202 47
0x00001203 0E
Figure 10.8a | The little endian format
Address Data
0x00001200 0E
0x00001201 47
0x00001202 90
0x00001203 A3
Figure 10.8b | The big endian format
Figure 10.9 | Memory banks (four byte-wide banks, Bank 0 to Bank 3, on data lines D31–D0; each 32-bit row holds four consecutive byte addresses, for example 0x1200–0x1203 and 0x1204–0x1207)

If the address of a 32-bit number is given as 0x1200, the accessed addresses are
0x1200, 0x1201, 0x1202 and 0x1203. The 4 bytes in these addresses are considered to be
in the same row, that is, aligned. In this case, one byte from each bank is accessed
and only one memory cycle is needed to access an aligned word.
For unaligned data, one more cycle is necessary. Think of the address 0x1201. The
locations to be accessed will be 0x1201, 0x1202, 0x1203 and 0x1204. Note that the first
three bytes will be in the same row, while the last will be in the next row, and
so one more cycle of access will be required.
We summarize the conditions for ‘aligned data’ as follows:
• For word (32-bit) data, the specified address should have its least significant two bits
as 0.
• For half word (16-bit) accesses, the specified address should have the LSB equal to 0.
Most of the tools for ARM ensure that data is stored in aligned locations, so as to
avoid unnecessary extra cycles of operation.
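The alignment conditions, and the extra cycle for unaligned data, can be expressed in a few lines. The Python sketch below simply follows the memory-bank picture used in this section (rows of four bytes, one cycle per row touched); it is a model of the explanation, not of how any particular ARM core handles unaligned addresses. The function names are hypothetical.

```python
def is_aligned(address, size):
    """True if an access of `size` bytes (1, 2 or 4) starts on a multiple of size."""
    return address % size == 0

def memory_cycles(address, size, bus_bytes=4):
    """Number of bus-wide rows touched by the access, assuming one cycle per row."""
    first_row = address // bus_bytes
    last_row = (address + size - 1) // bus_bytes
    return last_row - first_row + 1
```

A word at 0x1200 is aligned and needs one cycle in this model, while a word at 0x1201 spills into the next row and needs two.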
10.5.3 | Assembly Language Rules
An assembly language line has four fields, namely, label, opcode, operand and comment.
A label is positioned at the left of a line and is the symbol for the memory address which
stores that line of information. There are certain rules regarding labels that are allowed
under the type of assembler being used. The manual of the specific assembler should be
consulted to get this clear. The second field is the opcode or instruction field. The third is
the operand field, and the last is the comment field which starts with a semicolon. The
use of comments is advised for making programs more readable.
A typical assembly language statement is
BOSE ADD R1, R2, R3 ; add R2 and R3 and copy the sum to R1.
The label is BOSE, the opcode is ADD, the operands are R1, R2 and R3 and the
text after the semicolon is the comment. While writing programs, make sure you don’t
write instructions at the extreme left of the page; that part is the ‘label’ field in this
book. We will use the assembler which is part of the RVDK supplied by Keil. The steps
in using it have been clearly described in Appendix A. More details are available in the
‘Real view assembly guide’.
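The four fields of an assembly line can be separated mechanically, which is a good way to check one’s understanding of the rules above. The following Python sketch is a simplification and not the Keil assembler’s actual parser: it treats any text starting in the first column as a label, splits operands at commas, and does not handle operands that themselves contain commas inside brackets. The function name `parse_line` is made up.

```python
def parse_line(line):
    """Split one assembly line into (label, opcode, operands, comment)."""
    code, _, comment = line.partition(";")
    # Text that starts in the very first column is taken to be a label.
    has_label = bool(code) and not code[0].isspace()
    tokens = code.split(None, 2 if has_label else 1)
    label = tokens.pop(0) if has_label and tokens else ""
    opcode = tokens[0] if tokens else ""
    operands = [op.strip() for op in tokens[1].split(",")] if len(tokens) > 1 else []
    return label, opcode, operands, comment.strip()
```

Applied to the statement shown above, it returns the label BOSE, the opcode ADD, the operands R1, R2 and R3, and the comment text. For a line that begins with a space, the label field comes back empty.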
10.6 | ARM Instruction Set
We will now discuss the ARM instruction set, and gradually move on to writing
programs.
The instruction set can be broadly classified as follows:
i) Data processing instructions
ii) Load store instructions—single register, multiple register
iii) Branch instructions
iv) Status register access instructions
The last set moves the contents of the CPSR or an SPSR to or from a general purpose
register and is used only in privileged modes. We will discuss the first three sets in detail.

10.6.1 | Data Processing Instructions
ARM is a RISC processor, one of the features of which is that it processes, i.e., performs
computations, on data which are in registers only. There are instructions which move
data from one register to another. Such instructions have only two operands, that is, the
source and the destination. Instructions which perform arithmetic/logical computations
have three operands—two source operands and one destination operand.
10.6.1.1 | MOV and MVN
The ‘MOV’ instruction is a ‘register to register’ data movement instruction with the
format MOV destination, source where both the source and destination have to be registers.
The mnemonic ‘MVN’ stands for ‘move negated’, which implies moving the complemented
value of the source to the destination.
Registers R0 to R12 can be used for data movement as they are general purpose
registers. The registers R13, R14 and R15, which are the stack pointer, link register and
the program counter respectively, can also be used with the MOV instructions, but this must be
done carefully and only for specific purposes.
Examples
        MOV R11, R2     ; copy the contents of R2 to R11
        MOV R12, R10    ; copy the contents of R10 to R12
        MVN R0, R9      ; move the complemented value of R9 to R0
                        ; if R9 = 0xFFF00000, R0 = 0x000FFFFF
Note Here we have discussed only the case of the MOV instruction used for moving
data between registers. The MOV instruction is also used for copying immediate data
into registers. That will be discussed in Section 10.17.
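The effect of MOV and MVN on 32-bit registers can be mimicked with a short model. This is a Python sketch for illustration only: the registers are kept in a dictionary, the mask stands in for the 32-bit register width, and the function names `mov` and `mvn` are simply chosen to match the mnemonics.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide

def mov(regs, rd, rm):
    """MOV rd, rm: copy the contents of rm into rd."""
    regs[rd] = regs[rm] & MASK

def mvn(regs, rd, rm):
    """MVN rd, rm: copy the bitwise complement of rm into rd."""
    regs[rd] = ~regs[rm] & MASK
```

With R9 holding 0xFFF00000, `mvn(regs, "R0", "R9")` leaves 0x000FFFFF in R0, as in the example above, and R9 itself is unchanged.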
10.6.1.2 | The Barrel Shifter
Now, refer to Figure 10.7. We see that there is a barrel shifter associated with data
processing. The figure shows two register operands, one of which can optionally be
acted upon by a barrel shifter, before being admitted to the ALU. The barrel shifter
can do shifting and rotation. Let us first have a general discussion on shifts and
rotations.
10.6.2 | Shift and Rotate
Two types of shifts are possible: logical and arithmetic.
10.6.2.1 | Logical Shift Left (LSL)
Logical Shift Left of a (say) 32-bit number causes it to shift left (a specified number of
times), and the vacant bits on the right are filled with zeros. See Figure 10.10. The last bit
shifted out from the left is copied to the carry flag. Keep in mind that a left shift by one
bit position corresponds to multiplication by 2. An LSL of 5 implies multiplication by 32.
Figure 10.10 | Logical shift left (CF ← Register ← 0)
10.6.2.2 | Logical Shift Right (LSR)
Logical Shift Right does a similar thing in the opposite direction. The vacant bit positions on the left are filled with zeros, and the last bit shifted out is retained in the carry flag. This is shown in Figure 10.11. Shifting right by one bit position divides an unsigned number by 2. Two right shifts cause a division by 4.
10.6.2.3 | Arithmetic Shift Right (ASR)
Arithmetic Shift Right is different in the sense that the vacant bit positions on the left are filled with the MSB of the original number. See Figure 10.12. This type of shift has the function of doing ‘sign extension’ of data, because for positive numbers the MSB is 0, and for negative numbers, the MSB is 1. There is no instruction for arithmetic shift left, because it would produce the same result as a logical shift left.
10.6.2.4 | Rotate Right (ROR)
In this, the data is moved right, and the bits shifted out from the right are inserted back
through the left. See Figure 10.13. The last bit rotated out is available in the carry flag.
There is no ‘rotate left’ instruction, because left rotation by n times can be achieved by
rotating to the right (32 – n) times. For example, rotating 4 times to the left is achieved
by rotating 32 – 4 = 28 times to the right.
10.6.2.5 | Rotate Right Extended (RRX)
This corresponds to rotating right through the carry bit, meaning that the bit that drops
off from the right side is moved to C and the carry bit enters through the left of the data.
This should be obvious from Figure 10.14.
Figure 10.11 | Logical shift right (0 → Register → CF)
Figure 10.12 | Arithmetic shift right (MSB replicated → Register → CF)
Figure 10.13 | Rotate right (Register → CF, with the bits leaving the right re-entering at the left)
Figure 10.14 | Rotate right extended (CF → Register → CF)
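The five operations described above can be sketched in Python as a way of checking results by hand. This is only a model under stated assumptions: registers are taken to be 32 bits wide, the shift amount is assumed to lie between 1 and 31, and the function names are chosen here for illustration. Each function returns the shifted value together with the bit that would go to the carry flag.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide


def lsl(value, n):
    """Logical shift left by n (1..31): returns (result, carry_out)."""
    carry = (value >> (32 - n)) & 1      # last bit shifted out at the left
    return (value << n) & MASK, carry


def lsr(value, n):
    """Logical shift right by n (1..31): zeros enter from the left."""
    carry = (value >> (n - 1)) & 1       # last bit shifted out at the right
    return value >> n, carry


def asr(value, n):
    """Arithmetic shift right by n (1..31): the MSB is replicated."""
    carry = (value >> (n - 1)) & 1
    sign_fill = (MASK << (32 - n)) & MASK if value & 0x80000000 else 0
    return (value >> n) | sign_fill, carry


def ror(value, n):
    """Rotate right by n (1..31): bits leaving the right re-enter at the left."""
    result = ((value >> n) | (value << (32 - n))) & MASK
    return result, (result >> 31) & 1    # carry = last bit rotated out


def rrx(value, carry_in):
    """Rotate right by one bit position through the carry flag."""
    return (carry_in << 31) | (value >> 1), value & 1


print(hex(lsl(3, 5)[0]))             # 3 multiplied by 32
print(hex(asr(0x80000000, 4)[0]))    # sign bit replicated four times
print(hex(ror(0x12345678, 28)[0]))   # same as rotating left by 4
```

The last call illustrates why a separate ‘rotate left’ operation is not needed: rotating right by 28 gives the same bit pattern as rotating left by 4.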
10.6.3 | Format of Shift and Rotate Instructions
The number of bit positions by which shifts and rotations are to be done may be specified
by a constant or may be indicated in another register.
Examples
LSL R2, #4 ;shift left logically, the content of R2 by 4 bit positions
ASR R5, #8 ;shift right arithmetically, the content of R5 by 8 bit positions
ROR R1, R2 ;rotate the content of R1, by the number specified in R2
Example 10.1
The contents of some of the registers are given as:
R1 = 0xEF00DE12, R2 = 0x0456123F, R5 = 4, R6 = 28.
Find the result (in the destination register) when each of the following instructions is executed on these original values.
i) LSL R1, #8
ii) ASR R1, R5
iii) ROR R2, R6
iv) LSR R2, #5
Solution
i) Shifting R1 left 8 times causes 8 zeros to fill the 8 positions on the right. R1 now contains 0x00DE1200.
ii) R5 contains 4. Arithmetically right shifting R1 4 times causes the MSB (1, for the given number) to be replicated 4 times on the left, thus causing a sign extension of the shifted number. R1 now contains 0xFEF00DE1.
iii) R6 contains 28. Rotating R2 28 times to the right is equivalent to rotating it 32 – 28 = 4 times to the left. After rotation, R2 contains 0x456123F0.
iv) Here, R2 is logically shifted right 5 times, and so 5 zeros enter through the left. R2 now has the value 0x0022B091.
10.6.4 | Combining the Operations of Move and Shift
Recollect the barrel shifter, which is an integral part of the data processing unit of the processor. This allows shifting and data processing to be done in the same instruction cycle. We will first see how moving and shifting can be combined in one instruction.
MOV R1, R2, LSL #2
MOV R1, R2, LSR R3
In both the above instructions, R1 is the destination register. In the first instruction, the source operand, that is, the content of R2, is logically shifted left by two bit positions and then moved to the destination register R1. In the second, the content of R2 is logically shifted right by the amount specified in register R3. After the shifting is done, the result is moved to R1. In both cases, R2 itself is left unchanged.
Example 10.2
Find the content of the destination registers after the execution of each of the given
instructions, given that the content of R5 = 0x72340200 and R2 = 4.
i) MOV R3, R5, LSL #3
ii) MOV R6, R5, ASR R2
Solution
The method here is the same as in Example 10.1, except that the source and destination registers are different, so the source register is left unchanged after execution of the instructions.
i) MOV R3, R5, LSL #3
The content of R5 is shifted left 3 times, and moved to R3. The three most significant bits (011) are shifted out.
R3 now contains 0x91A01000
ii) MOV R6, R5, ASR R2
R2 = 4, and so R5 is arithmetically shifted right 4 times. Since the MSB of the number in R5 is 0, when right shifting, this bit is replicated 4 times at the left of the number. After execution, R6 contains 0x07234020
10.7 | Conditional Execution
ARM has another interesting feature which can be designated as ‘conditional execution’. This means that an instruction is executed only if a specified condition is true. The important thing here is that this is not limited to branch instructions: any data processing instruction can be used in this way.
In general, all arithmetic and logic instructions are expected to affect the condition flags. But for ARM, we must suffix the instruction by S for this to happen. Otherwise the flags are unaffected. It is the S suffix on a data processing instruction that causes the flags in the CPSR to be updated.
In Example 10.2, in the instruction MOV R3, R5, LSL #3, there is a left shift operation involved. The result has its MSB set and the last bit shifted out is 1, so this should cause the carry flag and the N flag to be set. But since the MOV instruction is not appended with the suffix ‘S’, the flags remain unaffected, that is, they keep their previous values. The MOV instruction can be made to update the flags by writing it as MOVS R3, R5, LSL #3. After this is executed, we find the N and C flags to be set. This flag setting can be used to make an instruction following it ‘conditional’. We will soon see more aspects of this.
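The effect of the S suffix can be sketched with a small Python model. It assumes 32-bit registers and a shift amount between 1 and 31; the function name `mov_lsl` and the dictionary used for the flags are illustrative only, and the V flag is left out since a move does not change it.

```python
MASK = 0xFFFFFFFF


def mov_lsl(source, shift, flags, set_flags=False):
    """Model of MOV{S} Rd, Rm, LSL #shift for shift in 1..31.

    Returns (result, flags). The flags dictionary (keys N, Z, C) is handed
    back unchanged unless set_flags is True, which stands for the S suffix.
    """
    result = (source << shift) & MASK
    if set_flags:
        flags = {
            "N": result >> 31,                   # copy of the MSB
            "Z": int(result == 0),
            "C": (source >> (32 - shift)) & 1,   # last bit shifted out
        }
    return result, flags


old_flags = {"N": 0, "Z": 1, "C": 0}             # some arbitrary earlier state
r3, flags_after_mov = mov_lsl(0x72340200, 3, old_flags)
r3s, flags_after_movs = mov_lsl(0x72340200, 3, old_flags, set_flags=True)
print(hex(r3), flags_after_mov)      # flags keep their earlier values
print(hex(r3s), flags_after_movs)    # N and C are now 1
```

Both calls produce the same register value; only the second one changes the flags, which is what allows a following instruction to be made conditional on them.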
Figure 10.15 shows the format of a typical ARM instruction. In the instruction code, four bits are allotted for the condition under which the instruction is to be executed. If no condition is indicated, these bits assume the ‘always’ condition.
Table 10.7 lists the conditions, condition codes and the flag statuses for these conditions. We will discuss the use of condition codes for instructions.
Note that the conditions used for signed numbers and unsigned numbers are different. For unsigned numbers, we use the terms ‘higher’ or ‘lower’, while for signed numbers, the conditions are specified as ‘greater than’ or ‘less than’. The flag settings are also different. The logic of this is very simple, that is, we know that 6 is higher than 3, but -3 is greater than -6. Moreover, the same bit pattern compares differently in the two interpretations: 0xFFFFFFFD is higher than 6 when treated as an unsigned number, but it represents -3, which is less than 6, when treated as a signed number. Thus, it is clear that unsigned and signed numbers have to be dealt with differently.
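The difference between the unsigned and the signed conditions can be tried out with a short Python sketch. It assumes that the flags come from a subtraction (as with SUBS or CMP) on 32-bit values and uses the flag tests of Table 10.7; only six of the conditions are modelled, and the names `subs_flags` and `true_conditions` are made up for this illustration.

```python
MASK = 0xFFFFFFFF


def subs_flags(a, b):
    """Flags produced by a subtraction a - b on 32-bit values."""
    result = (a - b) & MASK
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                       # on ARM, C = 1 means 'no borrow'
    # Overflow: the operands differ in sign and the result's sign differs from a.
    v = ((a ^ b) & (a ^ result)) >> 31
    return {"N": n, "Z": z, "C": c, "V": v}


CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,        # unsigned higher
    "LO": lambda f: f["C"] == 0,                        # unsigned lower
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],   # signed greater than
    "LT": lambda f: f["N"] != f["V"],                   # signed less than
}


def true_conditions(a, b):
    """Names of the modelled conditions that hold after comparing a with b."""
    flags = subs_flags(a & MASK, b & MASK)
    return sorted(name for name, test in CONDITIONS.items() if test(flags))


print(true_conditions(6, 3))     # 'higher' and 'greater than' both hold
print(true_conditions(-3, 6))    # 0xFFFFFFFD against 6: HI holds, but so does LT
```

For the second pair, the unsigned condition HI and the signed condition LT are true at the same time, which is why the programmer has to pick the condition that matches the intended interpretation of the data.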
10.8 | Arithmetic Instructions
Now let’s get a feel of the arithmetic instructions of ARM and the special ways in which
they can be used.
10.8.1 | Addition and Subtraction
Addition and subtraction are three-operand instructions. The destination is always a register. The source operands may both be registers, or one of them may be immediate data. There are some issues in using immediate data greater than 8 bits (refer to Section 10.17). See Table 10.8, which gives examples of how the different addition and subtraction instructions work. Any of the general purpose registers may be used as operands, though in the table, only R3, R4 and R5 have been mentioned.
Table 10.7 | List of Conditions, Codes and Corresponding Flag Status
Cond  Mnemonic  Meaning                                 Condition Flag State
0000  EQ        Equal                                   Z = 1
0001  NE        Not equal                               Z = 0
0010  CS/HS     Carry set/unsigned higher or same (≥)   C = 1
0011  CC/LO     Carry clear/unsigned lower (<)          C = 0
0100  MI        Minus/negative                          N = 1
0101  PL        Plus/positive or zero                   N = 0
0110  VS        Overflow                                V = 1
0111  VC        No overflow                             V = 0
1000  HI        Unsigned higher (>)                     C = 1 and Z = 0
1001  LS        Unsigned lower or same (≤)              C = 0 or Z = 1
1010  GE        Signed ≥                                N = V
1011  LT        Signed <                                N ≠ V
1100  GT        Signed >                                Z = 0 and N = V
1101  LE        Signed ≤                                Z = 1 or N ≠ V
1110  AL        Always
1111  (NV)      Unpredictable
Figure 10.15 | Format of a typical instruction
Bits   31–28  27–20   19–16  15–12  11–4        3–0
Field  COND   OPCODE  Rn     Rd     Other Info  Rm
Remember the concept of suffixing data processing instructions. This can be used ingeniously for making operations conditional. For example, the ADD instruction (just like any other data processing instruction) does not affect the condition flags unless it is suffixed by S. Following such an ADDS instruction, we can have instructions with conditions appended to them. The set of possible conditions is listed in Table 10.7. For any instruction, the upper 4 bits are used to specify the condition (Figure 10.15).
Consider these program lines
SUBS R1, R2, R3 ;the suffix ‘S’ has been used
MOVEQ R2, R1 ;the EQ notation tests the Z = 1 condition
Here the move instruction is executed only if the result of the subtraction is zero, which sets the zero flag. The condition EQ tests whether the zero flag is set (refer to Table 10.7). Let’s use this concept in a simple example.
Example 10.3
It is required to compare two numbers which are in registers R1 and R2. The bigger
number is to be placed in R10. If the two numbers are equal, then the number is to be
moved to R9.
Solution
Here we use the subtraction operation to do the comparison.
SUBS R3, R1, R2 ;R3 = R1 – R2, flags are updated
MOVEQ R9, R1 ;if R1 and R2 are equal (Z = 1), move R1 to R9
MOVHI R10, R1 ;if R1 > R2 (C = 1 and Z = 0), R1 is moved to R10
MOVLO R10, R2 ;if R1 < R2 (C = 0), R2 is moved to R10
The salient points of this program are as follows (the numbers are treated as unsigned):
i) First the operation R1 – R2 is performed and the result is placed in R3.
ii) Since the SUB instruction has been appended with S, the flags will be set accordingly.
iii) If the two numbers are equal, the zero flag gets set and the instruction MOVEQ will get executed. Otherwise it becomes a NOP (no operation) instruction. Here one of the numbers (R1) is to be moved to R9 (as both numbers are equal).
iv) The next line checks whether the carry flag has been set and the zero flag is clear. If R1 > R2, the carry flag is set (C = 1), the zero flag is clear (Z = 0), and the MOVHI (move if higher) instruction gets executed. Otherwise this also becomes a NOP. The move instruction gets the bigger number into R10.
v) If the carry flag is not set, it means that R2 is bigger. The MOVLO (move if lower) instruction then moves R2 to R10.
Note The last line needs a condition as well. Since no branching is used, every line of the sequence is reached in every case, and an unconditional MOV R10, R2 would overwrite R10 even when R1 is the bigger number. None of the MOV instructions carries the S suffix, so the flags set by SUBS remain valid for all three of them.
Table 10.8 | List of Arithmetic Instructions
Instruction      Operation                    Calculation
ADD R3, R4, R5   Add                          R3 = R4 + R5
ADC R3, R4, R5   Add with carry               R3 = R4 + R5 + C
SUB R3, R4, R5   Subtract                     R3 = R4 – R5
SBC R3, R4, R5   Subtract with carry          R3 = R4 – R5 – NOT(C)
RSB R3, R4, R5   Reverse subtract             R3 = R5 – R4
RSC R3, R4, R5   Reverse subtract with carry  R3 = R5 – R4 – NOT(C)
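The behaviour of such a branch-free sequence can be mimicked in Python. The sketch below assumes unsigned 32-bit inputs and assumes that the final move is made conditional on LO (unsigned lower), so that it cannot overwrite R10 when R1 is higher; the function name and the use of `None` for ‘register not written’ are illustrative choices.

```python
def bigger_or_equal(r1, r2):
    """Mimic SUBS / MOVEQ / MOVHI / MOVLO for unsigned 32-bit r1 and r2.

    Returns (r9, r10); None stands for 'register not written'.
    """
    z = int(r1 == r2)          # Z flag after SUBS R3, R1, R2
    c = int(r1 >= r2)          # C flag: 1 when the subtraction needs no borrow
    r9 = r10 = None
    # Every instruction is reached; its condition decides whether it acts.
    if z == 1:                 # MOVEQ R9, R1
        r9 = r1
    if c == 1 and z == 0:      # MOVHI R10, R1
        r10 = r1
    if c == 0:                 # MOVLO R10, R2
        r10 = r2
    return r9, r10


for pair in [(7, 3), (3, 7), (5, 5)]:
    print(pair, bigger_or_equal(*pair))
```

The three `if` statements are deliberately independent rather than chained with `else`, because on the processor each conditional instruction is examined in turn regardless of what the previous one did.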
Example 10.3 might seem much too simple to need such a lot of explanation, but the
intention is to make this idea (of conditional execution) very clear, so as to enable you to
tackle more difficult problems with ease.
One question that may come to your mind is: why such conditional execution?
The answer is that with this, the use of branch instructions can be avoided in many instances. This is a very great saving, as branching causes stalling of the pipeline (Section 10.2). It allows very dense code, without many branches. A conditional instruction that is not executed still costs some time, but the penalty is less than the overhead due to a branch.
Example 10.4
Find the result of the following instructions. What do these instructions accomplish?
i) ADD R1, R2, R2, LSL #3
ii) RSB R3, R3, R3, LSL #3
iii) RSB R3, R2, R2, LSL #4
iv) SUB R0, R0, R0, LSL #2
v) RSB R2, R1, #0
Solution
i) ADD R1, R2, R2, LSL #3
One source operand is R2, LSL #3. Left shifting 3 times accomplishes multiplication by 2^3 = 8.
The result of the whole operation is R1 = R2 + 8R2 = 9R2
ii) RSB R3, R3, R3, LSL #3
R3 = 8R3 – R3 = 7R3
iii) RSB R3, R2, R2, LSL #4
R3 = 16R2 – R2 = 15R2
iv) SUB R0, R0, R0, LSL #2
R0 = R0 – 4R0 = -3R0
v) RSB R2, R1, #0
We get R2 = 0 – R1 = -R1, i.e., we get the negative value of R1.
Thus, these instructions accomplish multiplication by small constants, and negation, without using a multiply instruction.
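The arithmetic in this example can be checked with a few lines of Python. The helper names below are hypothetical; they model ADD, SUB and RSB on 32-bit registers with the second operand already shifted, and `signed` interprets a result as a two's complement number.

```python
MASK = 0xFFFFFFFF


def lsl(value, n):
    """Barrel shifter: logical shift left by n bit positions."""
    return (value << n) & MASK


def add(rn, op2):
    """ADD Rd, Rn, Op2"""
    return (rn + op2) & MASK


def sub(rn, op2):
    """SUB Rd, Rn, Op2"""
    return (rn - op2) & MASK


def rsb(rn, op2):
    """RSB Rd, Rn, Op2 (the operands are subtracted in reverse order)"""
    return (op2 - rn) & MASK


def signed(value):
    """Interpret a 32-bit pattern as a two's complement number."""
    return value - (1 << 32) if value & 0x80000000 else value


x = 11
print(add(x, lsl(x, 3)))            # i)   x + 8x  = 9x
print(rsb(x, lsl(x, 3)))            # ii)  8x - x  = 7x
print(rsb(x, lsl(x, 4)))            # iii) 16x - x = 15x
print(signed(sub(x, lsl(x, 2))))    # iv)  x - 4x  = -3x
print(signed(rsb(x, 0)))            # v)   0 - x   = -x
```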
10.9 | Logical Instructions
Now, we will see the logical instructions of the processor. They also need to be suffixed
with ‘S’ to have the flags updated. See Table 10.9.
Example 10.5
Given the contents of the registers as R3 = 0x0FF00FF0, R4 = 0x0FF00FF0 and R0 = 0, find the values in R1 and R5 after each of the instructions shown.
i) EORS R1, R3, R4
ii) ANDS R5, R3, R0
Solution
The content of the destination register and the affected flag are shown alongside the executed instruction
i) EORS R1, R3, R4 ;R1 = 0x00000000, Z = 1
ii) ANDS R5, R3, R0 ;R5 = 0x00000000, Z = 1
Note One of the source operands may be 8-bit immediate data as well. Refer to
Section 10.17 for details of how to handle data bigger than 8 bits.
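The logical instructions of Table 10.9 can be modelled in a few lines of Python. This is a sketch assuming 32-bit registers and the S suffix on every instruction; the names `LOGICAL_OPS` and `logical_s` are made up here, and only the N and Z flags are reported.

```python
MASK = 0xFFFFFFFF

LOGICAL_OPS = {
    "AND": lambda a, b: a & b,
    "ORR": lambda a, b: a | b,
    "EOR": lambda a, b: a ^ b,
    "BIC": lambda a, b: a & ~b,      # clear the bits of a that are set in b
}


def logical_s(op, rn, rm):
    """Model of <op>S Rd, Rn, Rm: returns (result, N, Z)."""
    result = LOGICAL_OPS[op](rn, rm) & MASK
    return result, result >> 31, int(result == 0)


r3 = r4 = 0x0FF00FF0
r0 = 0
print(logical_s("EOR", r3, r4))                   # equal operands cancel out
print(logical_s("AND", r3, r0))                   # ANDing with zero clears everything
print(hex(logical_s("BIC", r3, 0x00000FF0)[0]))   # low byte group cleared
```

The first two calls reproduce the results of Example 10.5; the third shows the bit-clear operation, which the example does not cover.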
10.10 | Compare Instructions
A compare instruction compares two operands and causes the condition flags to be affected, but neither the destination nor the source changes. For CMP, comparison is done by a subtraction operation, and the flags are set/reset according to the result of this. ARM has four types of compare instructions, as shown in Table 10.10. For the comparison of unsigned numbers, however, only two flags really matter, and they are the zero flag and the carry flag. Refer to Table 10.11 to get an idea of the flag settings after a compare instruction.
Note Since the compare instructions explicitly affect the flags, the suffix S is not required for them.
Comparison is a very important operation, and we will use it very frequently. A number of programs using this instruction will be discussed subsequently.
Table 10.9 | List of Logical Instructions
Instruction      Operation                     Logical Result
AND R3, R4, R5   Logical AND of 32-bit values  R3 = R4 AND R5
ORR R3, R4, R5   Logical OR of 32-bit values   R3 = R4 OR R5
EOR R3, R4, R5   Logical XOR of 32-bit values  R3 = R4 XOR R5
BIC R3, R4, R5   Logical bit clear             R3 = R4 AND (NOT R5)
Table 10.10 | List of ‘Compare’ Instructions
Instruction   Operation         Effect
CMP R3, R4    Compare           R3 – R4, but only flags affected
CMN R3, R4    Compare negated   R3 + R4, but only flags affected
TST R3, R4    Test              R3 AND R4, but only flags affected
TEQ R3, R4    Test equivalence  R3 XOR R4, but only flags affected
What is the use of the TST instruction?
TST is an instruction similar to compare, but it does ANDing and then sets the condition flags. If the result of ANDing is zero, the Z flag is set. It can be used to check whether a particular bit of a data word is set or not. For that, ‘test’ the number with another one (a mask) in which the required bit position has a ‘1’. For example, let’s say we need to know if the LSB of the content of R1 is set or not. Use the instruction TST R1, #01 and verify the status of the Z flag. If Z = 1, it implies that the LSB of R1 is not set, because the AND operation for that bit has produced a 0, not a 1.
What is the use of the TEQ instruction?
TEQ does exclusive ORing, which tests for equality. If both the operands are equal, the Z flag is set. It verifies if the value in a register is equal to a specified number. The instruction TEQ R1, #45 verifies whether the content of R1 is 45.
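A minimal Python sketch of the two tests follows. The function names are illustrative, and each one returns only the value that the Z flag would take.

```python
def tst_z(rn, mask):
    """Z flag after TST Rn, mask: 1 when none of the masked bits is set."""
    return int((rn & mask) == 0)


def teq_z(rn, value):
    """Z flag after TEQ Rn, value: 1 when the two operands are identical."""
    return int((rn ^ value) == 0)


print(tst_z(0x00000006, 0x01))   # LSB is clear, so Z = 1
print(tst_z(0x00000007, 0x01))   # LSB is set, so Z = 0
print(teq_z(45, 45))             # equal, so Z = 1
print(teq_z(44, 45))             # different, so Z = 0
```

Note the opposite sense of the two results: with TST, Z = 1 means that the tested bit is not set, while with TEQ, Z = 1 means that the operands match.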
10.11 | Multiplication
Multiplication is a complex operation which needs specialized hardware and takes more than one cycle to execute. ARM has a number of multiplication instructions, which use this hardware. Let’s examine how these instructions are used.
10.11.1 | Multiply
The format of the multiply instruction is
MUL Rd, Rm, Rs
where Rd is the destination register, and Rm and Rs are source registers. A number of points, listed as i) to iv) below, are to be kept in mind when this instruction is used. Table 10.12 lists different types of multiplication instructions.
Table 10.11 | Flag Settings After a Compare Instruction (CMP R3, R4 with unsigned operands)
If        C  Z
R3 > R4   1  0
R3 < R4   0  0
R3 = R4   1  1
Table 10.12 | List of Multiply Instructions
Instruction            Operation                         Calculation
SMLAL R0, R1, R2, R3   Signed multiply and accumulate    [R0, R1] = [R0, R1] + R2 * R3
SMULL R0, R1, R2, R3   Signed multiply                   [R0, R1] = R2 * R3
UMLAL R0, R1, R2, R3   Unsigned multiply and accumulate  [R0, R1] = [R0, R1] + R2 * R3
UMULL R0, R1, R2, R3   Unsigned multiply                 [R0, R1] = R2 * R3
i) The source and destination registers are 32 bits in length. If the product is longer than 32 bits, only the lower 32 bits are preserved in the destination register.
ii) Immediate data cannot be used as a source operand.
iii) If the multiplicand and multiplier are signed numbers, it is up to the programmer to
identify a logic to interpret the sign of the product.
iv) The instruction can be made conditional.
Example
MUL R1, R2, R3 ;R1 = R2 × R3
MULS R1, R2, R3 ;R1 = R2 × R3 and flags are also set
MULSEQ R3, R2, R1 ;R3 = R2 × R1 is done only if Z = 1
;(because of the EQ suffix)
;because of the S suffix, flags are updated
MULEQ R4, R3, R5 ;if Z = 1, R4 = R3 × R5
10.11.2 | Multiply and Accumulate
The format of this instruction is
MLA Rd, Rm, Rs, Rn ;Rd = (Rm * Rs) + Rn
This instruction does multiplication and accumulation (addition) as seen above. All the
conditions specified for the MUL instruction are applicable here, as well.
Example
MLA R0, R1, R2, R3 ;R0 = R1 × R2 + R3
10.11.3 | Long Multiply/Long Multiply and Accumulate
In this, 32-bit data are multiplied to get a 64-bit result; the lower 32 bits are saved in one specified register and the upper 32 bits in another. For signed data, the sign bit is also preserved in the upper register.
The format is
Instruction RdLo, RdHi, Rm, Rs
Table 10.12 lists the available long ‘multiply’ and ‘multiply and accumulate’ instructions, with examples. Note that in all these cases, two registers function as the destination.
Since multiplication is a complex operation, it takes many cycles for execution. So, where the multiplier is a small constant, it is often best to realize multiplication using shifting and adding, rather than using any of the multiply instructions.
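The difference between the 32-bit and the long (64-bit) forms can be seen in a small Python model. It assumes 32-bit registers; the function names mirror the mnemonics but are otherwise illustrative, and each long form returns its result as a (low word, high word) pair, matching the RdLo, RdHi order of the instruction format.

```python
MASK = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's complement number."""
    return value - (1 << 32) if value & 0x80000000 else value


def mul(rm, rs):
    """MUL: only the lower 32 bits of the product are kept."""
    return (rm * rs) & MASK


def umull(rm, rs):
    """UMULL RdLo, RdHi, Rm, Rs: unsigned 64-bit product as (lo, hi)."""
    product = rm * rs
    return product & MASK, (product >> 32) & MASK


def smull(rm, rs):
    """SMULL RdLo, RdHi, Rm, Rs: signed 64-bit product as (lo, hi)."""
    product = (to_signed(rm) * to_signed(rs)) & 0xFFFFFFFFFFFFFFFF
    return product & MASK, product >> 32


a, b = 0xFFFFFFFE, 3            # a is 4294967294 if unsigned, -2 if signed
print(hex(mul(a, b)))
print([hex(word) for word in umull(a, b)])
print([hex(word) for word in smull(a, b)])
```

The low word is the same in all three cases; only the high word depends on whether the operands are treated as signed or unsigned.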
10.12 | Division
Division is another complex operation requiring specialized hardware and extra clock cycles. As a policy, the basic ARM architecture does not have a ‘divide’ instruction. Division can be realized using repeated subtraction. Compilers are given the responsibility of accomplishing division using the simple instructions of the processor.
10.13 | Starting Assembly Language Programming
If you have any previous experience of assembly language programming, you will know that there are two kinds of items used therein: instructions and directives. The former are executable statements which are ‘executed’ by the processor. The latter, that is, directives, are non-executable statements relating to the assembler. They are used to give the assembler the information necessary to perform the assembly process smoothly. For some processors, directives are also called pseudo instructions. For ARM, however, pseudo instructions are special statements which the assembler translates into one or more actual instructions. Thus, they too result in executable code. For ARM, then, an assembly language line will contain an instruction, a directive or a pseudo instruction.
Writing and testing a program for ARM is done on a computer, usually a PC, which is called the host computer. The host computer should have the program development tools for ARM. Since the program written in ARM assembly language is assembled on a PC which has a different processor (usually some version of Pentium), the process is called ‘cross assembly’. After the program is tested, it is converted to a hex file and burned into the memory of our target processor (i.e. ARM).
The output of an assembly or compilation process has at least two areas:
i) A code area. This is usually a read-only area.
ii) A data area. This is usually a read-write area.
The default attribute for code is read-only and for data it is read-write.
Let’s understand some fundamental directives first.
10.13.1 | The AREA Directive
The first thing we do when we start assembly language programming is to define an area. There is a directive named ‘AREA’ for this. This directive names the area and sets its attributes. The attributes are placed after the name, separated by commas.
Example
AREA SORT, CODE, READONLY
AREA TABLE, DATA
The first area defined above is given the name SORT; it contains code, and is read-only. The attribute READONLY is optional. The second AREA directive has the name TABLE; it contains data and, though this is not mentioned, it will correspond to the read-write area (as it is a data area).
10.13.2 | The ENTRY Directive
The ENTRY directive marks the first instruction to be executed within an application.
Because an application cannot have more than one entry point, the ENTRY directive
can appear in only one of the source modules.
10.13.3 | The END Directive
This directive tells the assembler to stop reading. Anything written after the END
directive will be ignored by the assembler. So every assembly language source module
must finish with an END directive, on a line by itself.
10.14 | General Structure of an Assembly Language Line
The general form of source lines in assembly language is:
{label} {instruction|directive|pseudo-instruction} {;comment}
Some points to keep in mind are listed as follows:
i) Instructions, pseudo-instructions and directives must be preceded by white space, such as a space or a tab, even if there is no label. This means that they should not be written in the label space (extreme left of the line).
ii) Instruction mnemonics and register names can be written in uppercase or lowercase, but not mixed.
iii) Labels are symbols that represent addresses. The address given by a label is calculated during assembly. The assembler calculates the address of a label relative to the origin of the area where the label is defined. Assigning labels eases the programmer’s burden, as there is no need to deal with numerical addresses. The location counter in the assembler keeps on incrementing as instructions and data are assembled, and a label takes the value of the location counter at the point where it is defined.
Typical assembly language lines are
NOO MOV R1, R2, LSL #2 ;copy the content of R2 left shifted, to R1
NUMS DCW 2354, 5678 ;define two data half words
In the above two lines, NOO and NUMS are the labels, MOV (with operands) is an
instruction and DCW is a directive, which is explained in the following section.
10.14.1 | Directives for Defining Data
Before we go deep into programming, we need to understand a few directives of the
assembler which define and describe different kinds of data. Data which is used in a
program can be bytes or words or half words. We define data and assign labels to their
corresponding addresses. Defining data implies allocating space for data. The space we
allocate corresponds to memory addresses, which are identified by labels. Data, when
stored in memory is defined accordingly, using directives.
DCB defines data byte, DCW defines 16 bits or a half word and DCD defines a
word (32 bits).
Examples
NUMS DCB 9, 82, 71
NUMB DCW 0x6787, 0x4564
NUMBR DCD 0x00000123, 0x67890900
In the above, the first line shows data which are bytes. The first byte, 9, has the address NUMS, 82 has the address NUMS + 1, and 71 has the address NUMS + 2.
The second line has address NUMB for the first half word, and NUMB + 2 for the second half word. In the last case, the addresses of the words are NUMBR and NUMBR + 4.
Keep in mind that a byte needs only one address, a half word needs two, and a word requires four memory addresses.
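The address arithmetic above can be written out as a short Python sketch. It only computes the offset of each item from its own label; the function name `item_offsets` is made up for the illustration, and any alignment padding that an assembler may insert between differently sized items is outside its scope.

```python
SIZES = {"DCB": 1, "DCW": 2, "DCD": 4}   # bytes occupied by one item


def item_offsets(directive, count):
    """Offsets (from the label) of count items defined with the directive."""
    size = SIZES[directive]
    return [index * size for index in range(count)]


definitions = [("NUMS", "DCB", 3), ("NUMB", "DCW", 2), ("NUMBR", "DCD", 2)]
for label, directive, count in definitions:
    addresses = [f"{label} + {offset}" for offset in item_offsets(directive, count)]
    print(directive, addresses)
```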
10.14.2 | The EQU Directive
This is a frequently used directive, and is used to equate a numeric constant to a label.
The constant may be data or address. Examples are as follows:
FACTR EQU 35
BASE_ADDR EQU 0x40000000
10.14.3 | Constants Allowed
The constants that can be used are numbers (decimal, hexadecimal or in any other base), characters, strings and Booleans.
• Decimal: for example, 346, 6748, etc.
• Hexadecimal: for example, 0x12345678, 0xFCE45, etc.
• n_xxx, where n is a base between 2 and 9 and xxx is a number in that base
• Characters: they are to be enclosed within single quotes, ‘e’, ‘R’, etc.
• Strings: they are characters enclosed within double quotes, “mine”, “non”, etc.
• Boolean: TRUE or FALSE
10.14.4 | The RN Directive
The names of the general purpose registers have been introduced as R0, R1, R2, etc. When we use them for loading operands, it is possible to get confused as to which data has been loaded into which register. To ease this problem, there is a way of giving variable names to registers. Suppose we need to use R0 for loading the value of X, and R1 for loading Y; we use the directive RN as follows:
X RN 0
Y RN 1
This method can be used for any of the registers.
DAT1 RN 8
DET RN 10
10.15 | Writing Assembly Programs
Now that we have got used to writing some instructions, let’s get down to writing a complete program. This will let us get a feel of the programming process, after which we can learn more important instructions and write bigger and better programs.
Example 10.6
Write a program to find the value of 3X + 4Y + 9Z, where X = 2, Y = 3 and Z = 4.
Solution
        AREA    SUMM, CODE, READONLY
X       RN      1                   ;register R1 is named X
Y       RN      2                   ;register R2 is named Y
Z       RN      3                   ;register R3 is named Z
        ENTRY
        MOV     X, #2               ;load X = 2 into register R1
        MOV     Y, #3               ;load Y = 3 into register R2
        MOV     Z, #4               ;load Z = 4 into register R3
        ADD     R1, R1, R1, LSL #1  ;R1 = 3X
        MOV     R2, R2, LSL #2      ;R2 = 4Y
        ADD     R3, R3, R3, LSL #3  ;R3 = 9Z
        ADD     R1, R1, R2          ;R1 = R1 + R2, i.e. 3X + 4Y
        ADD     R1, R1, R3          ;R1 = R1 + R3, i.e. 3X + 4Y + 9Z
STOP    B       STOP                ;continue branching at STOP
        END                         ;end of the assembly file
Since this is the first complete program we are writing, it is important to make some observations regarding it.
i) SUMM is the name of the code AREA defined. The attribute READONLY is optional, as by default a code area is read-only.
ii) An assembly language line has the label field at the left, and the opcode field to its right. For the Keil assembler, you may find that writing the word ‘AREA’ in the label field will generate an error message. But a line using the RN directive must start in the label field itself, with the new register name. No instructions should be in the label field.
iii) The ENTRY directive should be followed by an instruction or pseudo instruction.
iv) The program involves multiplication and addition. Since the multiply instruction is a ‘complex’ one involving the use of special hardware (more power dissipation and more clock cycles), it is not used. Instead, multiplication is achieved by the use of shift and add instructions.
v) The last instruction is an unconditional branch instruction (mnemonic ‘B’) and it continually branches to the same label STOP. This is done so that control does not go to any instruction beyond this location. Any code has to finally be burned into ROM. Many embedded programs have this kind of self branching as their last line, since we don’t want the next memory locations in code memory to be accessed.
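The shift-and-add steps of Example 10.6 can be followed line by line in Python. This is only a cross-check of the arithmetic; the function name `weighted_sum` is hypothetical, and the comments name the ARM instruction that each line stands for.

```python
def weighted_sum(x, y, z):
    """Follow Example 10.6 step by step using only shifts and adds."""
    r1, r2, r3 = x, y, z
    r1 = r1 + (r1 << 1)      # ADD R1, R1, R1, LSL #1  gives 3X
    r2 = r2 << 2             # MOV R2, R2, LSL #2      gives 4Y
    r3 = r3 + (r3 << 3)      # ADD R3, R3, R3, LSL #3  gives 9Z
    r1 = r1 + r2             # ADD R1, R1, R2
    r1 = r1 + r3             # ADD R1, R1, R3
    return r1 & 0xFFFFFFFF   # the register holds 32 bits


print(weighted_sum(2, 3, 4))
```

With X = 2, Y = 3 and Z = 4 the individual terms are 6, 12 and 36.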
10.16 | Branch Instructions
For any processor, branching is a very important operation. The power to change the sequence of execution is obtained by branching, which may be conditional or unconditional. Most processors have ‘jump’ and ‘call’ instructions for changing the sequence of execution. ARM does all this by different forms of a ‘branch’ instruction. It has the mnemonic ‘B’ for branch. The four different forms of the branch instruction are given in Table 10.13.
Let’s see the usage of each of them.
Branching implies transferring control to a new memory location which is expressed
as a ‘label’. Hence the format of any branch instruction is B label. Branching is made
conditional by appending the mnemonic B with the necessary condition.
Examples
B NEW ;transfers control unconditionally to location NEW
STOP B STOP ;continually branches to its own label STOP
BNE NOO ;branch to NOO if Z flag is not set
BHI LUX ;branch if higher, i.e., if C = 1 and Z = 0
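Although the target of a branch is written as a label, what the instruction stores is a signed 24-bit word offset relative to the PC. The Python sketch below shows one way of computing and decoding that field. It assumes ARM state, in which the PC reads as the address of the branch instruction plus 8 because of the pipeline; the function names and the example address 0x8000 are chosen only for illustration.

```python
def encode_offset(branch_address, target_address):
    """Return the 24-bit offset field for a branch to target_address.

    Assumes ARM state, where the PC reads as the branch address plus 8.
    """
    byte_offset = target_address - (branch_address + 8)
    if byte_offset % 4:
        raise ValueError("target is not word aligned")
    word_offset = byte_offset >> 2
    if not -(1 << 23) <= word_offset < (1 << 23):
        raise ValueError("target is out of branch range")
    return word_offset & 0xFFFFFF


def decode_target(branch_address, field):
    """Sign extend the 24-bit field, shift it left by 2 and add it to the PC."""
    if field & 0x800000:
        field -= 1 << 24
    return branch_address + 8 + (field << 2)


print(hex(encode_offset(0x8000, 0x8000)))     # a self branch such as STOP B STOP
print(hex(decode_target(0x8000, 0xFFFFFE)))   # back to the branch itself
```

A self branch therefore carries an offset of -2 words, since the PC is already two instructions ahead of the branch when the addition takes place.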
The format of a branch instruction is as shown in Figure 10.16. Target addresses are ‘relative’. What this means is that when a branch instruction is taken up, the PC (program counter) value and the offset specified by the instruction are algebraically added.
The offset is specified as a 24-bit signed number; this number is shifted left (logically) twice, so that the two LSBs are zero. This makes all target addresses ‘aligned’ (refer to Section 10.5.2). The left shifting also multiplies the number by 4. This makes the offset 26 bits long, that is, the maximum range is +/- 2^25 bytes (one bit is for the sign, remember). This number is added to the PC value. In short, what the instruction does with the 24-bit immediate number is that it shifts it left by two bits, sign extends it to 32 bits, and adds it to the PC. Thus, the maximum range for branching is only +/- 32 MB (2^25 = 2^20 × 2^5; 2^20 = 1 MB, 2^5 = 32). For branch addresses beyond this range, the PC can be directly loaded with the target address.
Figure 10.16 | Format of a branch instruction
Bits   31–28  27  26  25  24  23–0
Field  COND   1   0   1   L   Signed_immed_24
Table 10.13 | List of Branch Instructions
Mnemonic  Instruction
B         Branch
BL        Branch and link
BX        Branch and exchange
BLX       Branch and exchange with link
Now see this simple program which calculates the factorial of 10.
Example 10.7
        AREA    FACTO, CODE     ;define the code area
        ENTRY                   ;entry point
        MOV     R1, #10         ;R1 = 10
        MOV     R2, #1          ;R2 = 1
REPT    MUL     R2, R1, R2      ;R2 = R1 × R2
        SUBS    R1, R1, #1      ;R1 = R1 – 1
        BNE     REPT            ;branch to REPT if Z = 0
STOP    B       STOP            ;last line
        END
This is a very simple program which finds the factorial of 10. It can be used to find the factorial of any other number (except 0), provided the factorial fits in 32 bits. The technique is to multiply the running product by the number, then by ‘number – 1’, and so on, repeatedly. The number, held in R1, thus also acts as a counter: it is decremented by 1 (which is done by subtraction), and when it reaches 0, the Z flag is set. The multiplication is then stopped. The factorial is available in the register R2. The branch instruction used is a conditional one, that is, BNE, which tests the zero flag. The instruction before it, that is, SUB, has been appended with the ‘S’ suffix to ensure the setting of flags.
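The loop can be traced in Python to see both the result and the 32-bit limit mentioned above. The sketch mirrors the multiply, decrement and conditional branch described in the text for a starting number of at least 1; the function name `factorial_loop` is illustrative.

```python
def factorial_loop(n):
    """Mirror the multiply / decrement / branch loop for n >= 1."""
    r1, r2 = n, 1
    while True:
        r2 = (r1 * r2) & 0xFFFFFFFF   # MUL keeps only the lower 32 bits
        r1 = (r1 - 1) & 0xFFFFFFFF    # SUBS R1, R1, #1
        z = int(r1 == 0)              # Z flag, updated because of the S suffix
        if z == 1:                    # BNE falls through once Z = 1
            return r2


print(factorial_loop(10))
print(factorial_loop(12))   # the largest factorial that fits in 32 bits
print(factorial_loop(13))   # too big: only the lower 32 bits survive
```

The third call shows what ‘does not fit’ means in practice: the register silently keeps the lower 32 bits of the true product, so the value returned is not 13!.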
Now let’s see another example which uses conditional branching. This program performs division by repeated subtraction.
Example 10.8
        AREA    DIV, CODE
        ENTRY
        MOV     R1, #500        ;move the dividend to R1
        MOV     R2, #16         ;move the divisor to R2
        MOV     R3, #0          ;R3 = 0 (quotient)
        MOV     R4, R1          ;copy the dividend to R4
REPT    SUBS    R4, R4, R2      ;subtract and set flags
        ADDPL   R3, R3, #1      ;add 1 to R3 if N = 0, i.e. R4 is +ve or zero
        BPL     REPT            ;repeat the loop if R4 is +ve or zero
        ADDMI   R4, R4, R2      ;if R4 is –ve, add R2 back to R4
STOP    B       STOP
        END
This program performs division by repeated subtraction. Here 500 is to be divided by 16. The method is to subtract 16 from 500 repeatedly until the result becomes negative. The branch instruction BPL REPT means ‘branch to label REPT if plus (PL)’, i.e., if N = 0.
Besides conditional branching, there are two conditional ADD instructions as well (ADDPL and ADDMI); the condition used is the status of the sign flag N.
The steps of the program are as follows:
i) Subtract 16 from 500, and check if the result is +ve or −ve. This can be verified by
checking the N flag which corresponds to the MSB of the resultant number. The
condition flags are updated by the subtraction operation (using the suffix S).
ii) If the number (in R4) is +ve, it means that subtraction can be repeated unhindered.
Each time this is verified, the quotient register (R3) is incremented by 1.
iii) When the result of the subtraction becomes −ve (the condition ‘MI’ for minus), add the
divisor back to this negative number (in R4).
iv) In this problem, after 16 has been subtracted 31 (0x1F) times from 500, the value in R4
is still +ve. One more subtraction sets the N flag and makes the number in R4
negative.
v) Adding the divisor to this −ve number gives the remainder, which is 4
in this case.
vi) Thus, we get 31 (0x1F) as the quotient (in R3) and 4 as the remainder (in R4).
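The steps above can be traced with a short Python model. It is a sketch under the assumption that the operands are small enough for the N flag to be modelled as ‘value below zero’; the function name is illustrative.

```python
def divide_by_subtraction(dividend, divisor):
    """Model of Example 10.8. Needs dividend >= 0 and divisor > 0."""
    r3 = 0                  # MOV R3, #0   (quotient)
    r4 = dividend           # MOV R4, R1   (working copy of the dividend)
    while True:
        r4 -= divisor       # SUBS R4, R4, R2
        if r4 >= 0:         # N = 0: ADDPL and BPL are both executed
            r3 += 1
        else:               # N = 1: the loop is left
            break
    r4 += divisor           # ADDMI R4, R4, R2 restores the remainder
    return r3, r4

print(divide_by_subtraction(500, 16))
```

Note that the quotient is incremented only after a subtraction that leaves a non-negative result, which is why the final, ‘one too many’ subtraction is not counted.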
10.16.1 | Subroutines/Procedures
In Table 10.13, there is another form of the branch instruction, BL, which stands for
‘Branch and Link’. Recollect that calling a procedure (also called a subroutine, function, etc.)
means that a new program sequence is taken up, but control returns to the original point after
that. Most processors (including ARM) use stacks to store return addresses, and return
instructions to handle procedure calls. ARM has an additional feature to handle procedures
in a simpler manner. Recollect the register named the ‘Link Register’ (LR). When a BL instruction
is encountered, the PC is loaded with the address of the target, and the return address (the
address of the instruction following the BL) is copied to LR. At the end of the procedure, the
LR value can be copied back to the PC.
Now let’s write a program which calls a procedure.
Example 10.9
Write a program to calculate 3X² + 5Y², where X = 8 and Y = 5
Solution
AREA PROCED,CODE
ENTRY
MOV R2,#8 ;R2 = X = 8
BL SQUARE ;call the SQUARE procedure
ADD R1,R3,R3,LSL #1 ;R1 = R3 + 2 x R3 = 3X²
MOV R2,#5 ;R2 = Y = 5
BL SQUARE ;call the SQUARE procedure
ADD R0,R3,R3,LSL #2 ;R0 = R3 + 4 x R3 = 5Y²
ADD R4,R1,R0 ;R4 = R1 + R0, i.e. 3X² + 5Y²
STOP B STOP ;last line in the execution
SQUARE MUL R3,R2,R2 ;the SQUARE procedure: R3 = R2 x R2
MOV PC,LR ;return: copy LR back to PC
END
The salient points of this program are as follows:
i) A procedure named SQUARE has been used. This procedure uses the multiply
instruction to find the square of any number. The number to be squared is passed
to the procedure using the register R2. The square of the number is returned to the
main program in R3.
ii) There are two numbers, X and Y, whose squares are to be found. Calling the
procedure amounts to just writing the instruction BL SQUARE. This instruction
causes a branch to the procedure named SQUARE. It also copies the return address
(the address of the instruction after the BL) to the link register (LR).
iii) The procedure has only two instructions: one to perform squaring, and the other to
copy the LR content back to PC. The second instruction causes a return to the main
program.
iv) We need two more multiplications in addition to the squaring operation, namely
3 x X² and 5 x Y². These are achieved by shifting and adding: 3X² is X² plus X² shifted
left by 1, and 5Y² is Y² plus Y² shifted left by 2. The MUL instruction is used as
little as possible because it takes more time, and causes higher power dissipation.
v) The last step is adding 3X² and 5Y², which are now in R1 and R0. The sum is
available in R4.
vi) Note that the last program line to be executed is STOP B STOP, even though it is
not the last line in the assembly file.
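The shift-and-add arithmetic of this program can be verified with a few lines of Python. This is a sketch only; the function names are illustrative, and the Python function call merely stands in for the BL/MOV PC,LR pair.

```python
MASK32 = 0xFFFFFFFF

def square(r2):
    """Stands in for the SQUARE procedure: MUL R3, R2, R2."""
    return (r2 * r2) & MASK32

def three_x2_plus_five_y2(x, y):
    r3 = square(x)                       # BL SQUARE with R2 = X
    r1 = (r3 + (r3 << 1)) & MASK32       # ADD R1, R3, R3, LSL #1  -> 3 * X^2
    r3 = square(y)                       # BL SQUARE with R2 = Y
    r0 = (r3 + (r3 << 2)) & MASK32       # ADD R0, R3, R3, LSL #2  -> 5 * Y^2
    return (r1 + r0) & MASK32            # ADD R4, R1, R0

print(three_x2_plus_five_y2(8, 5))
```

For X = 8 and Y = 5 the model gives 3 x 64 + 5 x 25 = 317, the value expected in R4.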
In Table 10.13 there are two more forms of the branch instruction. BX stands for
Branch and Exchange. BLX is for Branch with Link and Exchange. The Exchange feature
is applicable when both ARM and Thumb instructions are being used, and the processor
has to switch from one instruction set to the other.
10.17 | Loading Constants
How is it different for ARM?
An important addressing mode for any processor is the ‘immediate’ mode. In this, a
constant which is specified in the instruction itself is copied into a register, or is used
as one operand in an arithmetic or logic operation.
Examples
MOV R1, #0x7867
ADD R1, R2, #567
This seems very obvious and direct. For CISC machines, this is fine, because the
immediate data can occupy another byte or word after the opcode. But ARM has the limitation
that its instruction size should not exceed 32 bits, which means that the constant should fit
in the word length of 32 bits along with the opcode, condition code, register codes and other
information that the instruction should carry. It is thus apparent that we can’t have a 32-bit
constant embedded in the instruction.
So then what is the maximum size of the constant
that can be used in the immediate mode?
We would like to be able to use immediate constants as large as 32 bits. How is this
done? ARM uses an ingenious technique, the idea being the use of rotation of a small
number to generate a large number. We have already seen that there is a barrel shifter
in the ALU. Any data processing instruction has a format as shown in Figure 10.17a.
The data processing instruction format has 12 bits available for operand 2.
Figure 10.17b shows the instruction format which has been modified for using the
immediate mode.
When the immediate mode is needed, the 12-bit field is split into an 8-bit immediate
constant and a 4-bit rotate field, and the 8-bit constant is subjected to a ROR (rotate right)
operation. The rotate field has only 4 bits, so it can hold the values 0 to 15. To cover the
full 32-bit width, the value in the rotate field is multiplied by 2 before it is used as the
rotation count, which gives rotations of 0, 2, 4, …, 30 bit positions. Hence, it becomes
a case of ‘8 bits rotated by an even number of bit positions’. The 8-bit number is
zero-extended to 32 bits before the rotation, so the result is a 32-bit number.
Let’s try to understand how this is done.
10.17.1 | Generating a 32-bit Constant Using Rotation
Consider the steps in rotating the number 0xF0 to the right by 2, after expanding it to
fill 32 bits.
Case 1
The original 8-bit number is 1111 0000
Expanding it to fill 32 bits makes it
00000000 00000000 00000000 11110000
Rotating it right by 2 makes it
00000000 00000000 00000000 00111100
i.e. 0x3C
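The rotate-right operation of Case 1 can be written out in Python as a check. This is a minimal sketch; ror32 is an illustrative helper name, not an ARM mnemonic.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by `count` bit positions."""
    value &= 0xFFFFFFFF
    count %= 32
    # Bits shifted out on the right re-enter on the left.
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

print(hex(ror32(0xF0, 2)))
```

The same helper shows the wrap-around that makes large constants possible: rotating 0x22 right by 4 moves its low nibble to the top of the word.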
Case 2
We can also use the MVN (move NOT) instruction for generating new numbers. If 0 is loaded
into a register, and MVN is then used to move its bitwise complement into another or the
same register, we get 0xFFFFFFFF.
You can try this code to verify.
MOV R1, #0x0
MVN R1, R1
Bits    31-28   27-26   25   24-21    20   19-16   15-12   11-0
Field   Cond    0 0     I    Opcode   S    Rn      Rd      Shifter_operand
Figure 10.17a | Format of a typical data processing instruction
Bits    11-8   7-0
Field   ROT    IMMED-8
IMMED-8 is rotated right (ROR) by 2 x ROT bit positions to give the 32-bit constant.
Figure 10.17b | Modification of the ‘shifter operand’
Let’s summarize the points regarding the generation of constants using the ARM
rotation scheme.
i) A class of constants can be generated by this scheme (Table 10.14 shows the range of
constants that can be generated by the rotation scheme), but not all constants can be.
Those that cannot will have to be placed in memory and loaded from there,
by using the concept of ‘literal pools’. We will come to that soon.
ii) To generate the constant needed, the programmer need not specify the 8-bit
immediate number and the number of rotations to be done. It is enough to write the
instruction in the immediate mode with the full constant; the assembler works out the
encoding. When the constant 0x20000002 is needed, for example, the assembler encodes the
8-bit number 0x22 with a right rotation of 4. The effect is the same as that of the pair
MOV R1, #0x22
MOV R1, R1, ROR #4
but the required 32-bit constant is created by a single instruction.
iii) The processor does not have an instruction for rotation to the left. But rotating n
times to the left is achieved by rotating (32 − n) times to the right.
Example 10.9
Find the 32-bit constant generated by each of the following rotations
i) Rotate 0x40, to the right 30 times
ii) Rotate 0x56, to the left 12 times
iii) Rotate 0x6D, to the right 16 times
iv) Rotate 0x05, to the right 6 times
v) Rotate 0x3E, to the right 2 times
Solution
i) The 8-bit number is 01000000.
00000000 00000000 00000000 01000000; the 8-bit number in 32-bit format
00000000 00000000 00000001 00000000; the number after rotation
Thus, the constant obtained is 0x100
ii) The 8-bit number is 01010110
00000000 00000000 00000000 01010110; the 8-bit number in 32-bit format.
Rotating 12 times to the left is equivalent to rotating 20 times to the right
00000000 00000101 01100000 00000000; the number after rotation
The constant generated is 0x56000
Table 10.14 | The Range of Constants That Can Be Generated By the Rotation Scheme
Decimal Values                 Equivalent Hexadecimal   Step Between Values   Rotate
0 – 255                        0 – 0xff                 1                     No rotate
256, 260, 264, …, 1020         0x100 – 0x3fc            4                     Right by 30 bits
1024, 1040, 1056, …, 4080      0x400 – 0xff0            16                    Right by 28 bits
4096, 4160, 4224, …, 16320     0x1000 – 0x3fc0          64                    Right by 26 bits
The answers for iii, iv and v are in Table 10.15.
From Example 10.9, we note that many 32-bit numbers can be generated by the
ARM rotation scheme, but there are constants which cannot be obtained by this method.
For example, the number 0x11111111 cannot be generated by rotation.
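Whether a given constant can be produced by the rotation scheme can be tested by trying all sixteen even rotations. The following Python sketch does this; the names ror32 and encode_immediate are illustrative, and the search order (smallest rotate field first) is an assumption, not necessarily the choice a real assembler makes.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by `count` bit positions."""
    value &= 0xFFFFFFFF
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def encode_immediate(constant):
    """Return (imm8, rot) such that constant == ror32(imm8, 2 * rot), or None."""
    constant &= 0xFFFFFFFF
    for rot in range(16):                       # the rotate field is 4 bits wide
        # Rotating right by 32 - 2*rot is a left rotation by 2*rot, which undoes the ROR.
        imm8 = ror32(constant, (32 - 2 * rot) % 32)
        if imm8 <= 0xFF:                        # fits in the 8-bit immediate field
            return imm8, rot
    return None

for c in (0x20000002, 0x100, 0x11111111, 0xFFFFFFFF):
    print(hex(c), encode_immediate(c))
```

The constant 0xFFFFFFFF is reported as not encodable by rotation alone, which is where the MVN instruction of Case 2 comes in.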
How then are such constants obtained for use in the
immediate mode of addressing?
10.17.2 | Literal Pools
In computer science, specifically in compiler and assembler design, a literal pool is
a lookup table used to hold literals during assembly and execution. But first, what
exactly is a literal? In programming, a literal is a value written exactly as it is meant
to be interpreted. A literal can be a number, a character or a string. For example, in
the expression x = 145, x is a variable, and 145 is a literal. Thus, for ARM, literals are
constants. When it is required to load a constant into a register, the assembler can help
by creating a space in memory and then placing this constant in the space. From this
memory space, the processor can fetch it using a load instruction. Such requests to the
assembler are not made with processor instructions; they are made with what are called
pseudo instructions. For making a literal pool entry and taking a constant from the pool,
there is a specific pseudo instruction
LDR Rd, = const
This pseudo instruction can construct any 32-bit numeric constant. Suppose we
need the constant 0x33333333. It is likely that we would write the instruction
MOV R1, #0x33333333. For this, the assembler gives an error message saying that such a
constant cannot be generated. To avoid such a situation, we write
LDR R1, = 0x33333333
This is a pseudo instruction (don’t confuse it with the LDR instruction, which we will soon
come to; there is a difference in format between the two). It causes the assembler
to check the following possibilities.
i) Can the constant be constructed with MOV or MVN instruction combined with
rotation? If this is possible, the assembler generates the appropriate instruction, that
is, an 8-bit number is rotated appropriately to get the constant in question.
ii) If the constant cannot be constructed this way, the assembler places the value in a
literal pool and generates an LDR (load register) instruction with a program-relative
address that reads the constant from this literal pool.
Table 10.15
       8-Bit Number   ROR   Constant
iii    0x6D           16    0x6D0000
iv     0x05           6     0x14000000
v      0x3E           2     0x8000000F
Example 10.10a
AREA PROG1,CODE,READONLY
ENTRY
LDR R1, = 0x12400000
LDR R2, = 0x00555555
ADD R3,R1,R2
STOP B STOP
END
In Example 10.10a, two constants are needed. If you assemble this program and check
the disassembly file, you will find that the two constants are generated in different
ways. The assembler realizes that the first constant can be obtained by the rotation
scheme (0x12400000 is 0x49 rotated right by 10), but the second one cannot. So a
literal pool is created just after the last instruction, and the constant 0x00555555 is
placed therein. Then a ‘load register’ instruction is generated to load the constant into
the register R2.
How Is the literal pool accessed?
The literal we need is accessed from the literal pool using the PC relative mode. In this
mode, only 12 bits are available for the offset, which can be added to or subtracted from
the PC. Thus, the literal in the pool has to be within +/− 4KB of the current PC value.
Where should the literal pool be placed?
Normally the literal pool is placed just after the END directive, which means just
after the end of the program area. This is okay for normal size programs. But some-
times, programs are very large and if the literal pool is placed after the end of the
program, it may be out of range (of +/− 4KB) of the LDR instruction. Such a situ-
ation implies that there should be the flexibility of placing literal pools anywhere in
memory. This is done by the LTORG directive which allows us to define the origin
of a literal pool.
When a pseudo instruction LDR Rd, = const is encountered, the assembler checks
if the constant is available and addressable in the nearest literal pool. If it is so, it takes
it from the pool. Otherwise, it attempts to place the constant in the next literal pool.
If the next literal pool is out of range, the assembler generates an error message. In this
case, the LTORG directive is to be used to place an additional literal pool in the code.
Place the LTORG directive after the failed LDR pseudo instruction, and within 4KB
memory space.
Literal pools are to be placed in locations where the processor does not attempt to
execute them as instructions. It is best to place them after unconditional branch instruc-
tions, or after the return instruction at the end of a subroutine. Let us see an example of
a case where the LTORG directive becomes necessary.
Example 10.10b
ENTRY
LDR R1, = 0x12400000
LDR R2, = 0x00555555
ADD R3,R1,R2
SPACE 4400
STOP B STOP
END
This is a modified version of Example 10.10a. Recollect that we need a literal pool
for the constant 0x00555555. Here, the directive SPACE 4400 has been inserted.
This directive reserves an empty area of 4400 bytes. Because of this, the total space
occupied by the program becomes large (greater than 4400 bytes). The literal pool is
usually placed after the program area. In this case, that puts the literal pool beyond
the range (4KB) of the instruction LDR R2, = 0x00555555. Hence, on assembling, the
following message is seen.
error: A1284E: Literal pool too distant, use LTORG to assemble it within 4KB
To avoid such a situation, we can place the literal pool closer to the instruction which
needs the constant. See the modified version of the program.
Example 10.10c
ENTRY
LDR R1, = 0x12400000
LDR R2, = 0x00555555
ADD R3,R1,R2
LTORG
SPACE 4400
STOP B STOP
END
Now the program assembles without error, because a literal pool has been created before the
‘free space’ of 4400 bytes. Because of the LTORG directive, the assembler places the
required constant in this pool. Thus, we see that the directive
LTORG can be used to place literal pools wherever we want. This will become useful as
programs become larger.
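The distance check behind the A1284E message can be reproduced with simple arithmetic. The Python sketch below assumes that the code area starts at offset 0, that every instruction occupies 4 bytes, and that in ARM state the PC reads as the address of the current instruction plus 8; the variable and function names are illustrative.

```python
WORD = 4            # every ARM instruction occupies 4 bytes
PC_AHEAD = 8        # in ARM state the PC reads as the instruction address + 8
MAX_OFFSET = 4095   # largest value a 12-bit offset field can hold

def literal_in_range(ldr_address, literal_address):
    """True if an LDR at ldr_address can reach the literal with a 12-bit PC-relative offset."""
    return abs(literal_address - (ldr_address + PC_AHEAD)) <= MAX_OFFSET

ldr_r2 = 1 * WORD                        # LDR R2 is the second instruction in both examples

# Example 10.10b: LDR, LDR, ADD, SPACE 4400, B STOP, then the pool at the end.
pool_10b = 3 * WORD + 4400 + 1 * WORD    # offset 4416

# Example 10.10c: LDR, LDR, ADD, then LTORG places the pool right here.
pool_10c = 3 * WORD                      # offset 12

print(literal_in_range(ldr_r2, pool_10b), literal_in_range(ldr_r2, pool_10c))
```

In the first layout the literal lies 4404 bytes beyond the PC value, just outside the reach of the 12-bit offset; in the second it lies immediately after the ADD instruction.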
10.18 | Load and Store Instructions
ARM is a RISC architecture, and one of the features of RISC is that of being a ‘load
store’ architecture. Loading is the process of getting data from memory into a register,
and storing is just the reverse process. In ARM, data is brought into registers using a
load instruction, and only then can it be used for data processing. After computation,
the result can be ‘stored’ in memory. The memory in question is ‘RAM’ which is the
read/write memory. RAM is volatile and is used for temporary storage of data in the
course of computations. The only instructions which access RAM are ‘load’ and ‘store’.
All registers can be accessed using these instructions, but programmers are advised to
exercise caution when accessing critical registers like the PC, SP, etc.
The syntax for load or store is
LDR/STR{cond} Rd, addressing mode
Rd is the source register for store and the destination register for load.
The addressing mode gives us the necessary information to get the ‘effective address’,
which is the actual memory address to be accessed. The addressing mode is indirect
because the memory address is not specified directly in the instruction; rather,
a base register is mandatorily used. For the simplest case, examples of load and
store instructions are as follows:
LDR R1, [R2] ;copy into R1 the content of memory specified in R2
STR R1, [R2] ;store the content of R1 into the memory address specified in R2
This implies that the load/store instruction must be preceded by an instruction
which copies the address into R2. We will soon get to know how this is done. There
are various ways of specifying the effective address. The barrel shifter can be part of the
address specifying mechanism.
Example 10.11
How is the effective memory address calculated in the following load and store instructions?
i) LDR R3, [R2, LSL #2]
ii) STR R9, [R1, R2, ROR #2]
iii) LDR R4, [R3, R2]
iv) STR R5, [R4, R3, ASL #4]
Solution
i) LDR R3, [R2, LSL #2]
In this the effective address is the content of R2 left shifted by 2, i.e. multiplied by 4
ii) STR R9, [R1, R2, ROR #2]
Here, the effective address is specified by R1, R2 and a right rotation. To calculate
it, the content of R2 is rotated right by 2 bit positions, and then added to the content of R1.
iii) LDR R4, [R3, R2]
The effective address here is the sum of the contents of R3 and R2.
iv) STR R5, [R4, R3, ASL #4]
The effective address is the sum of the content of R4 and the arithmetically left
shifted (by 4) content of R3.
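The address arithmetic for a base register plus a shifted register offset can be sketched in Python. The register contents used here are hypothetical values chosen only for illustration, and the helper names are not ARM mnemonics.

```python
MASK32 = 0xFFFFFFFF

def lsl(value, n):
    """Logical shift left by n bits (ASL is the same operation)."""
    return (value << n) & MASK32

def ror(value, n):
    """Rotate right by n bits within 32 bits."""
    n %= 32
    return ((value >> n) | (value << (32 - n))) & MASK32

def effective_address(base, offset=0, shift=None, amount=0):
    """Base register plus an optionally shifted register offset."""
    if shift is not None:
        offset = shift(offset, amount)     # the barrel shifter acts on the offset only
    return (base + offset) & MASK32

# Hypothetical register contents:
print(hex(effective_address(0x1000, 0x8, ror, 2)))   # [R1, R2, ROR #2]
print(hex(effective_address(0x2000, 0x10)))          # [R3, R2]
print(hex(effective_address(0x3000, 0x2, lsl, 4)))   # [R4, R3, ASL #4]
```

In each case the base register is used as it is; only the offset register passes through the shifter.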
10.18.1 | Bytes, Half Words and Words
Now, let’s see another aspect of load and store instructions. ARM has instructions
to transfer specifically a word (32 bits), half word (16 bits) or a byte (8 bits) between
memory and registers. There are also instructions which differentiate between signed and
unsigned data.
The mnemonic itself indicates the kind of data to be moved. See
Table 10.16. From the table, we understand that we can load and store parts of a 32-bit
word by using B for byte and H for half word, along with the load and store instructions.
If a memory location contains a 32-bit word, we can move its least significant byte
(assuming little endian format) into a register by using LDRB, or the lower half of the
word by using LDRH. Let’s clarify this by an example.
Table 10.16 | List of Load and Store Instructions
LDR     Load Word               STR     Store Word
LDRH    Load Half Word          STRH    Store Half Word
LDRSH   Load Signed Half Word
LDRB    Load Byte               STRB    Store Byte
LDRSB   Load Signed Byte
Example 10.12
Two memory areas are being referenced and two registers are used as pointers:
R1 = 0x00000100
R2 = 0x40001200
Figures 10.18a and b show the data addresses and corresponding data.
Address       Byte Stored
0x00000100    56
0x00000101    23
0x00000102    0D
0x00000103    AE
Figure 10.18a | Address and data
Address       Byte Stored
0x40001200    00
0x40001201    00
0x40001202    00
0x40001203    00
Figure 10.18b | Address and data
Show the content of the destination (register or memory) after the execution of each of
the following instructions:
i) LDR R3, [R1]
ii) LDRB R3, [R1]
iii) LDRH R3, [R1]
iv) STRB R3, [R2] given that R3 = 0xAE0D2356
For this case, show half word and word storage as well.
Solution
i) LDR R3,[R1]
In this, the complete 32-bit data in the address pointed to by R1 is copied to R3.
So R3 = 0xAE0D2356
ii) LDRB R3,[R1]
In this, only the least significant byte of the word is copied to R3. Since it is loaded as
an unsigned byte, the remaining bytes of R3 contain 0. So R3 = 0x00000056
iii) LDRH R3,[R1]
In this, the half word (lower two bytes) at the address is copied to R3.
R3 = 0x00002356
iv) STRB R3,[R2] given that R3 = 0xAE0D2356
In this, the least significant byte of the data in R3 is copied to the address
pointed to by R2. See Tables 10.17a, b and c for byte, half word and word
storage.
Table 10.17a, b and c
a) STRB R3, [R2]
Address       Byte Stored
0x40001200    56
0x40001201    00
0x40001202    00
0x40001203    00
b) STRH R3, [R2]
Address       Byte Stored
0x40001200    56
0x40001201    23
0x40001202    00
0x40001203    00
c) STR R3, [R2]
Address       Byte Stored
0x40001200    56
0x40001201    23
0x40001202    0D
0x40001203    AE
10.18.2 | Loading Signed Numbers
Signed numbers are those whose MSB is the sign bit. For positive numbers, the sign bit
is ‘0’, whereas negative numbers are in the two’s complement form and have their MSB
as ‘1’. When a 32-bit number is available in memory, parts of it can be loaded into registers
as signed bytes and signed half words. In these cases, the MSB of the byte or the half
word is checked, and sign extension is done while loading it into the register.
Consider the case of a word 0xCDEF8204 in memory. Let R7 be used as a pointer
to that memory location. Then, observe the result of execution of the following
instructions, as given in the comments column.
LDR R1, [R7] ;R1 = 0xCDEF8204 ... Case 1
LDRSH R2, [R7] ;R2 = 0xFFFF8204 ... Case 2
LDRSB R3, [R7] ;R3 = 0x00000004 ... Case 3
For case 1, all 32 bits are copied to R1. For case 2, only the lower 16 bits are copied.
The MSB of the 16-bit half word is ‘1’, and this is extended to 32 bits while copying to
R2. That’s how the upper 16 bits of R2 become FFFF.
For case 3, the lowest byte alone is copied. Its MSB is 0. As such, the rest of R3 is
filled with zeros, i.e., the sign bit ‘0’ is extended to fill the upper 24 bits. You can also
observe in Table 10.16 that there are no store instructions for signed bytes or signed half
words. This is because storing simply means placing numbers in memory. These numbers
may be signed data, unsigned data or code. It is only when a number is brought into a
register for processing that it becomes necessary for it to be
interpreted as signed or unsigned.
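The byte, half word and word transfers, including sign extension, can be modelled in Python with a dictionary standing in for byte-addressed, little endian memory. This is a sketch: the function names load and store are illustrative, and the address 0x200 used for the word 0xCDEF8204 is hypothetical.

```python
def load(memory, address, size, signed=False):
    """Model of LDR (size 4), LDRH/LDRSH (size 2) and LDRB/LDRSB (size 1)."""
    raw = bytes(memory[address + i] for i in range(size))
    value = int.from_bytes(raw, "little", signed=signed)
    return value & 0xFFFFFFFF      # a negative value appears sign-extended to 32 bits

def store(memory, address, value, size):
    """Model of STR (size 4), STRH (size 2) and STRB (size 1)."""
    low_bytes = (value & 0xFFFFFFFF).to_bytes(4, "little")[:size]
    for i, byte in enumerate(low_bytes):
        memory[address + i] = byte

fig_a = {0x100: 0x56, 0x101: 0x23, 0x102: 0x0D, 0x103: 0xAE}        # Figure 10.18a
signed_word = {0x200: 0x04, 0x201: 0x82, 0x202: 0xEF, 0x203: 0xCD}  # 0xCDEF8204

print(hex(load(fig_a, 0x100, 2)))                      # LDRH
print(hex(load(signed_word, 0x200, 2, signed=True)))   # LDRSH
print(hex(load(signed_word, 0x200, 1, signed=True)))   # LDRSB
```

There is deliberately no ‘signed’ option on store: whatever the interpretation of the value, the same low-order bytes are written to memory.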
10.18.3 | Indexed Addressing Modes
In this mode, the effective address calculation can be done before a load/store is executed
or afterwards. Let’s see what it is all about.
10.18.3.1 | Pre-indexed Addressing Mode
Observe the instruction LDR R0, [R7, #4]. Here R7 is the base register and the effective
address is R7 + 4. The data at this effective address is copied to R0.
Next, see the instruction STR R1, [R5, R6, LSL #2]. The effective address = R5 +
(R6 shifted left by 2 bit positions).
In the above two instructions, there is a notable feature, however. After the load/
store is done, the content of the base register remains unchanged, that is, the effective
address is not copied to the base register. But if we want the base register to contain the
effective address, we just suffix the addressing mode with the character ‘!’ and then
‘write back’ occurs.
Consider the instruction LDR R2, [R6, #-8]!. In this, after the loading operation is
done, R6 has the effective address (R6 − 8) written back into it.
Example 10.13
Calculate the effective addresses and explain what each instruction does.
i) STRB R2, [R6, R7, #0x24]!
ii) LDRSH R4, [R10, R11, ASR #4]
Solution
i) STRB R2, [R6, R7, #0x24]!
The effective address is the sum of the contents of R6, R7 and the number 0x24.
The least significant byte of R2 is stored at the effective address. After that, the
effective address is copied to R6.
ii) LDRSH R4, [R10, R11, ASR #4]
Here, the effective address is the sum of the content of R10 and the content of R11
arithmetically shifted right by 4 positions. The half word at this address is loaded into
R4 with sign extension. The content of the base register remains unchanged.
10.18.3.2 | Post-indexed Addressing Mode
In this mode, the base register alone supplies the address for the transfer, and the offset
is applied to the base register after the transfer has been done.
Take the case of the instruction LDR R0, [R4], #4
Here the data pointed to by the content of R4 is first copied to R0. After that, the
content of R4 is changed to R4 + 4. There is no need of the ‘!’ operation because that is
exactly what post-indexing does.
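The difference between the three forms lies only in which address is used and whether the base register is updated. The Python sketch below models this with a dictionary of registers and a dictionary of word-addressed memory; the function names, addresses and data values are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def ldr_pre(regs, memory, rd, rn, offset, writeback=False):
    """LDR Rd, [Rn, #offset]  and, with writeback, LDR Rd, [Rn, #offset]!"""
    address = (regs[rn] + offset) & MASK32   # offset applied before the transfer
    regs[rd] = memory[address]
    if writeback:
        regs[rn] = address                   # '!' copies the effective address to Rn

def ldr_post(regs, memory, rd, rn, offset):
    """LDR Rd, [Rn], #offset"""
    address = regs[rn]                       # base register alone gives the address
    regs[rd] = memory[address]
    regs[rn] = (address + offset) & MASK32   # base is always updated afterwards

memory = {0x100: 11, 0x104: 22, 0x108: 33}   # hypothetical words in memory
regs = {"R0": 0, "R4": 0x100, "R7": 0x100}

ldr_pre(regs, memory, "R0", "R7", 4)         # R0 = 22, R7 stays 0x100
ldr_post(regs, memory, "R0", "R4", 4)        # R0 = 11, R4 becomes 0x104
print(regs)
```

Post-indexing is the natural choice for stepping through a table from its first element, while pre-indexing with write back suits a loop in which the first element has already been consumed.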
Example 10.14
Let’s add 9 numbers which are in memory. The numbers are 16 bits long, that is, half
words, and occupy two bytes each. The pre-indexed mode of addressing with write back is
used to step through the half words, whose addresses have a spacing of 2 between them. The
instruction LDRH R2, [R7, #2]! does the indexing of the 16-bit numbers.
Solution
AREA DADD, CODE, READONLY
ENTRY
LDR R7, = TABLE ;copy the address of TABLE to R7
STRT MOV R0,#8 ;R0 = 8, the count of numbers still to be added
LDRH R1,[R7] ;load the 1st number from memory to R1
REPT LDRH R2, [R7,#2]! ;pre-indexed with writeback
ADD R1,R1,R2 ;R1 = R1+R2
SUBS R0,R0,#1 ;R0 = R0-1
BNE REPT ;repeat the addition until R0 = 0
STOP B STOP ;last line
TABLE DCW 3456,7859,1234,9876,3452,3214,7864,0987,2032
END
Let’s examine the salient features of this program.
i) The numbers to be added are stored in code memory, just after the last line of the
program.
ii) There are 9 numbers to be added. The first number is loaded into register R1 and
the rest are loaded one by one into R2.
iii) R0 is used as a counter for the numbers. In a general case, if there are N numbers to
be added, R0 = N-1. Here N = 9.
iv) The loading of the numbers to R2 is done using a loop. Since the numbers are half
words, their addresses are to be incremented by 2. This is done very efficiently by the
pre-indexed addressing with write back scheme. After one half word is accessed,
the effective address is written back to the base register R7 in readiness for accessing
the next half word.
v) The address corresponding to TABLE is a 32-bit constant. It is calculated by the
assembler, and loaded into R7 using the techniques mentioned in Section 10.17.
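The loop can be traced in Python using the half words listed after the label TABLE. This is a sketch only: the byte string stands in for code memory, offsets are counted from the start of TABLE, and the function names are illustrative.

```python
import struct

TABLE = [3456, 7859, 1234, 9876, 3452, 3214, 7864, 987, 2032]
memory = struct.pack("<%dH" % len(TABLE), *TABLE)   # DCW: 16-bit little endian values

def ldrh(address):
    """Load an unsigned half word from `memory` (model of LDRH)."""
    return struct.unpack_from("<H", memory, address)[0]

def sum_halfwords(count):
    """Add `count` half words starting at TABLE; count must be at least 2."""
    r7 = 0                          # offset of TABLE within `memory`
    r0 = count - 1                  # MOV R0, #N-1
    r1 = ldrh(r7)                   # LDRH R1, [R7]
    while True:
        r7 += 2                     # write back: R7 = R7 + 2 ...
        r2 = ldrh(r7)               # ... as done by LDRH R2, [R7, #2]!
        r1 = (r1 + r2) & 0xFFFFFFFF # ADD R1, R1, R2
        r0 -= 1                     # SUBS R0, R0, #1
        if r0 == 0:                 # BNE REPT not taken when Z = 1
            break
    return r1

print(sum_halfwords(len(TABLE)))
```

Although each number is only 16 bits long, the sum is accumulated in a 32-bit register, so it is free to exceed 0xFFFF.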
10.19 | Readonly and Read/Write Memory
The two memory areas defined by the assembler are ‘Readonly’ for code, and ‘Read/write’
for data. Usually these correspond to ROM and RAM in a physical system. RAM is used
for intermediate results, for temporary storage, etc., as it is volatile memory. We can
store data permanently in the readonly memory, and copy it to RAM for processing. In the
readonly memory, data is placed using directives like DCD, DCW, etc. From there, it is
copied to readwrite memory using load and store instructions.
Example 10.15
AREA FIRST, CODE, READONLY
ENTRY
LDR R7, = NUMS ;load the address of NUMS in R7
LDR R8, = NUMS1 ;load the address of NUMS1 in R8
LDR R9, = NUMS2 ;load the address of NUMS2 in R9
LDR R1,[R7] ;load the word to R1
STR R1,[R9] ;store the word in R1 in NUMS2
STR R1,[R8] ;store the word in R1 in NUMS1
STOP B STOP
NUMS DCD 653451134
AREA SECOND, DATA, READWRITE
NUMS2 SPACE 60
NUMS1 DCD 0
END
In Example 10.15, three memory locations have been labelled: one (NUMS) in readonly, and
two (NUMS1 and NUMS2) in readwrite memory. What is accomplished is just the transfer of a
word from readonly memory to readwrite memory. In readwrite memory, one part (NUMS2) is a
space of 60 bytes. The next (NUMS1) is a word space which is initialized to 0. After the
execution of the program, the number 653451134 has been copied to the first word of NUMS2
and to NUMS1.
Example 10.16
AREA STRIN1, CODE, READONLY
ENTRY
STRT LDR R1, = SOURCE ;pointer to source string
LDR R0, = DESTIN ;pointer to destination string
BL COPY ;call procedure for copying
STOP B STOP ;last line of execution
COPY LDRB R2, [R1],#1 ;Load byte and update address.
STRB R2, [R0],#1 ;Store byte and update address.
CMP R2, #0 ;Check for 0
BNE COPY ;repeat until the string is over
MOV PC,LR ;return to calling program
SOURCE DCB "I am sam",0
AREA STRIN2,DATA,READWRITE
DESTIN SPACE 9 ;room for the 8 characters and the terminating 0
END
Example 10.16 uses many of the programming aspects that we have been discussing so
far. Let’s have a look at the important features of this program.
i) There is an ASCII string written in readonly memory using the DCB directive.
Such a string is enclosed in double quotes and each character is a byte.
ii) One readonly and one read/write memory area have been defined.
iii) After the ASCII string, a 0 is used as a terminating character. The arrival of this
0 in R2 is used to check whether the required transfer of the string is done.
iv) The instructions for loading and storing are suffixed by ‘B’, which indicates that only
a byte is to be transferred.
v) Post-indexed mode of addressing is used for load and store. The addresses need to
be incremented only by 1, as only a byte is transferred.
vi) The instructions for loading and storing are in a procedure named COPY. The
procedure is ‘called’ by the BL instruction, which does the branching and also copies the
return address to the link register. The last line of the procedure copies the LR back
to PC. This constitutes the ‘return’ to the main program.
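The COPY procedure can be traced in Python with a bytearray standing in for memory. This is a sketch; the function name and the offsets chosen for the source and destination strings are hypothetical.

```python
def copy_string(memory, source, destination):
    """Model of the COPY procedure: copy bytes up to and including the 0 terminator."""
    r1, r0 = source, destination
    while True:
        r2 = memory[r1]      # LDRB R2, [R1], #1 : load the byte ...
        r1 += 1              #                     ... then post-increment R1
        memory[r0] = r2      # STRB R2, [R0], #1 : store the byte ...
        r0 += 1              #                     ... then post-increment R0
        if r2 == 0:          # CMP R2, #0 / BNE COPY
            break
    return r1, r0            # final pointer values

memory = bytearray(64)
memory[0:9] = b"I am sam\x00"       # SOURCE at offset 0
copy_string(memory, 0, 32)          # DESTIN at offset 32
print(bytes(memory[32:41]))
```

The terminating 0 is copied as well, because the store is done before the comparison.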
10.20 | Multiple Register Load and Store
We have seen in Section 10.18 the LDR and STR instructions, which transfer data
(in the form of bytes, half words or words) between a register and memory. Now, let’s
see an extended form of loading and storing, wherein multiple
registers are involved. Only data in the form of words (32 bits) can be handled by
these instructions. The mnemonics for multiple load and store are LDM and STM.
10.20.1 | The LDM Instruction
Let’s talk about LDM first. It has the syntax
LDM{cond}address-mode Rn{!}, reg-list{^}
Rn is the base register for the load operation. The address stored in this register is the
starting address for the load operation. There can be a number of modes for specifying
the address. reg-list is a comma-delimited list of symbolic register names and register
ranges enclosed in braces. There must be at least one register in the list. Register ranges
are specified with a dash. For example, {R0-R5, R9} is a list. The ! option is for ‘write
back’ and the ^ option is relevant for interrupts. We will not discuss the second option
here. Write back is not to be specified if the base register Rn is in reg-list.
Multiple register load means that multiple memory locations are to be accessed, and
loaded into multiple registers. There is a ‘base register’ acting as a pointer for the first
memory location to be accessed. This address is then incremented or decremented to
point to the next memory addresses. There are four options for handling this. The
address can be incremented or decremented by 4 (one word occupies four byte addresses) for
each register in the operation, and the increment or decrement can occur before or after
the transfer. The suffixes for these options are as follows:
IA – increment after
IB – increment before
DA – decrement after
DB – decrement before
Consider the instruction LDMDA R0, {R4-R9}
The base register here is R0. Let us assume it holds the number 0x45000000. Six words
are loaded, from the six word addresses ending at the address in R0, that is, from
0x44FFFFEC up to 0x45000000. The highest register, R9, gets the word at the base
address 0x45000000; R8 gets the word at R0 − 4, R7 the word at R0 − 8, and so on,
until the lowest register, R4, gets the word at the lowest address, R0 − 20
(0x44FFFFEC).
It is obvious that a single instruction replaces six LDR instructions. Is there any
advantage in this? As far as the data transfers are concerned, it is ‘No’. All the six load
operations have to be done. But note that only one ‘instruction fetch’ cycle is needed for
the six load operations together. So there is definitely some saving in terms of time.
What Is the difference in operation of the following instruction?
LDMIA R10,{ R9, R1 – R5}
Here the base address is in R10, and after each data transfer, it is incremented by 4. In
the destination register list, R9 is specified first, but the processor has a particular way
of handling the list.The lowest register will always be loaded from the lowest address in
memory, and the highest register from the highest address. Here R1 gets the data in the
address pointed by R10.
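The four addressing options and the register-ordering rule can be modelled in a few lines of C. The following is only a sketch for illustration: the function name ldm_addresses and the representation of the register list as a 16-bit mask are assumptions made here and are not part of any ARM tool. For each register in the list, it computes the memory address that the transfer uses, and it returns the value that write back would leave in the base register.

```c
#include <assert.h>
#include <stdint.h>

/* Addressing options for LDM/STM. */
typedef enum { MODE_IA, MODE_IB, MODE_DA, MODE_DB } ldm_mode;

/*
 * Hypothetical helper: for a base address, a 16-bit register-list mask
 * (bit n set means Rn is in the list) and an addressing mode, fill
 * addr[n] with the address of the word transferred to or from Rn.
 * Returns the value written back to the base register when '!' is used.
 */
uint32_t ldm_addresses(uint32_t base, uint16_t reglist, ldm_mode mode,
                       uint32_t addr[16])
{
    uint32_t count = 0;
    for (int n = 0; n < 16; n++)
        if (reglist & (1u << n))
            count++;
    assert(count > 0);              /* the list must not be empty */

    /* Lowest address touched by the transfer. */
    uint32_t lowest;
    switch (mode) {
    case MODE_IA: lowest = base;                 break;
    case MODE_IB: lowest = base + 4;             break;
    case MODE_DA: lowest = base - 4 * count + 4; break;
    default:      lowest = base - 4 * count;     break;  /* MODE_DB */
    }

    /* The lowest-numbered register always uses the lowest address. */
    uint32_t a = lowest;
    for (int n = 0; n < 16; n++) {
        if (reglist & (1u << n)) {
            addr[n] = a;
            a += 4;
        }
    }

    /* Increment modes move the base up, decrement modes move it down. */
    if (mode == MODE_IA || mode == MODE_IB)
        return base + 4 * count;
    return base - 4 * count;
}
```

For LDMIA R10, {R9, R1-R5}, the mask has bits 1 to 5 and bit 9 set. The model then assigns the address in R10 to R1 and the address R10+20 to R9, whatever the order in which the registers were written in the list.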
[Figure: LDM copies a block of consecutive memory words (M, M+1, M+2, ..., M+15) into the register set (R0 to R15); STM copies registers to memory in the opposite direction.]
Figure 10.19 | The LDM and STM instructions
10.20.2 | The STM instruction
This has the same format as the LDM instruction. Consider the instruction
STMIA R1, {R2-R4}
This is equivalent to the instructions
STR R2, [R1]
STR R3, [R1, #4]
STR R4, [R1, #8]
After the sequence of three stores is over, the content of the base register is unchanged.
If you need it to be updated to the address that follows the last word stored, the write back
operator ‘!’ is to be used. So write the instruction as STMIA R1!, {R2-R4}
Now let’s use the LDM and STM instructions to simplify Example 10.16, which
transfers bytes from one portion of memory (read-only) to another portion (read/
write). The multiple load/store instructions can, however, be used only for words (32 bits). So
Example 10.16 has been modified, as Example 10.17, to move six words.
Example 10.17
        AREA    STRIN1, CODE, READONLY
        ENTRY
        LDR     R1, =SOURCE             ;pointer to source
        LDR     R0, =DESTIN             ;pointer to destination
        LDMIA   R1, {R2-R7}             ;load six words into R2-R7
        STMIA   R0, {R2-R7}             ;store six words in destination
STOP    B       STOP
SOURCE  DCD     0x675889,0x1234568,0x9876543,0x2345678,0x8907653,0x3456789 ;six source words
        AREA    STRIN2, DATA, READWRITE ;define the R/W memory area
DESTIN  DCD     0,0,0,0,0,0             ;six destination words
        END
Here six words from the source memory have been copied to six registers using just
one instruction. In the next instruction, these six words are stored in the destination
memory.
Note how simple the program is.
Example 10.17 illustrates the idea that block data transfers can be simplified using
the multiple register instructions. But their real importance is for stack implementation.
Stacks are a necessity for any processor; stacks are needed for storing data temporarily
and also for storing return addresses and register values during procedure calls. We will
see this now. For those who are not very familiar with the concept of stacks, here is a
brief review.
10.20.3 | Stack
A stack is an area in memory, the accessing of which is done in a special way. Most stacks
are Last-In First-Out (LIFO) type stacks. This means that the last data that was stored
is the first one that can be taken out. It is sequential access that is done, and not random
access. Two operations are defined for a stack: PUSH, in which data is written
into the stack, and POP, in which data is read out and loaded into registers. The stack has
a pointer to its top, which is called the stack pointer (SP). For ARM, this is register R13.
This means that the address of the top of the stack is to be available in SP.
10.20.3.1 | Types of Stacks
Ascending/Descending and Empty/Full
An ascending stack grows upwards. It starts from a low memory address and, as items
are pushed onto it, progresses to higher memory addresses. A descending stack grows
downwards. It starts from a high memory address, and as items are pushed onto it, it
progresses to lower memory addresses.
In an empty stack, the stack pointer points to the next free (empty) location on the
stack, i.e., to the place where the next item to be pushed will be stored. In a full stack,
the stack pointer points to the topmost item in the stack, that is, the location of the last
item pushed onto the stack. In practice, most stacks are of the ‘full descending’ type.
Let’s consider a full descending stack, in which SP is first decremented and then data
is pushed in. The reverse occurs for the POP operation. Stacks allow data to be pushed
or popped only as words (32 bits for ARM). Consider that SP = 0x50002000, and the
contents of R1 and R2 are pushed in. At the end of the operation we find that SP = SP - 8
= 0x50001FF8. ARM does not have a mnemonic for PUSH; instead it uses the STM
instruction. To simplify the use of the STM/LDM instructions corresponding to PUSH
and POP for different types of stacks, Table 10.18 can be referred to.
For the kind of stack that we are talking about now, what is the instruction we can use
for pushing the contents of registers R1 to R3?
The answer is STMDB SP!, {R1-R3}. We need SP to be used as the base register. For
pushing in, SP is first decremented, and then storing is done. So we use the suffix ‘DB’
along with SP. The operator ‘!’ is used so that the decremented value is written back to SP.
See this simple program (Example 10.18) in which SP is initialized to 0x40000200.
Some values are loaded into registers R1 to R3. Using the STMDB instruction, the contents
of the three registers, that is, three words, are pushed onto the stack, and will be available in
memory. At the end of the program, SP will be found to have the value 0x400001F4.
Table 10.18 | Types of Stacks and Corresponding Instructions to be Used
Stack Type          Push          Pop
Full Descending     STMFD (DB)    LDMFD (IA)
Full Ascending      STMFA (IB)    LDMFA (DA)
Empty Descending    STMED (DA)    LDMED (IB)
Empty Ascending     STMEA (IA)    LDMEA (DB)
Example 10.18
        AREA    STCK, CODE, READONLY
        ENTRY
        LDR     SP, =0x40000200     ;initialize the stack pointer
        MOV     R1, #1
        MOV     R2, #2
        MOV     R3, #3
        STMDB   SP!, {R1-R3}        ;push R1-R3 onto the stack
STOP    B       STOP
        END
Now, in this program, if the STM instruction is changed to STMIA, the stack becomes
an ascending stack, and the value of SP will be 0x4000020C after program execution.
Thus, it is obvious that a stack is a data structure which can be defined by software.
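The behaviour of a full descending stack can also be checked with a small model in C. This is a sketch under stated assumptions: the array named memory stands in for 1 KB of read/write memory starting at 0x40000000, and the function names are hypothetical. The push function follows the ‘decrement before’ rule of STMDB SP!, and the pop function follows the ‘increment after’ rule of LDMIA SP!.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_BASE  0x40000000u   /* start of the simulated R/W memory */
#define MEM_WORDS 256u          /* 1 KB of simulated memory */

static uint32_t memory[MEM_WORDS];

/* Translate a simulated address into an index of the memory array. */
static uint32_t word_index(uint32_t address)
{
    assert(address >= MEM_BASE && address < MEM_BASE + 4 * MEM_WORDS);
    assert((address & 3u) == 0);            /* words are 4-byte aligned */
    return (address - MEM_BASE) / 4;
}

/* Model of STMDB SP!, {...}: push n words on a full descending stack. */
uint32_t push_full_descending(uint32_t sp, const uint32_t *regs, int n)
{
    /* The highest-numbered register goes to the highest address,
       so the values are stored from last to first. */
    for (int i = n - 1; i >= 0; i--) {
        sp -= 4;                            /* decrement before ... */
        memory[word_index(sp)] = regs[i];   /* ... storing */
    }
    return sp;                              /* value written back by '!' */
}

/* Model of LDMIA SP!, {...}: pop n words from a full descending stack. */
uint32_t pop_full_descending(uint32_t sp, uint32_t *regs, int n)
{
    for (int i = 0; i < n; i++) {
        regs[i] = memory[word_index(sp)];   /* load ... */
        sp += 4;                            /* ... then increment */
    }
    return sp;
}
```

Pushing three words with SP = 0x40000200 returns 0x400001F4 in this model, and popping the same three words brings SP back to 0x40000200.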
10.20.4 | Stacks and Subroutines/Procedures
For most processors, procedures use a stack to store the return address. A procedure is
invoked by a ‘CALL’ instruction. This causes the return address (the current value of
PC) to be pushed onto the stack. The procedure ends with a ‘RETURN’ instruction. This causes the
PC value to be popped back.
For ARM, so far (Section 10.16.1) we have used procedures without the necessity of
a stack. That is because the Link Register (LR) keeps the return address when a procedure
is called. But think of the case of nested procedures. There is only one link register
for a mode, and a call to a new procedure will overwrite the link register, which holds the
return address of the previous procedure, so the way back to the original caller is lost.
In such cases, a stack is a necessity. Each time a procedure is called, the return address
is saved in the LR, as is the usual case. A procedure that is going to call a nested procedure
pushes the content of the link register onto the stack on entry, and pops it back from the stack
when exiting. Figure 10.20 shows the sequence of actions needed to take care of a
nested procedure.
Now, let’s try to understand the sequence of actions indicated by Figure 10.20.
In the main program, we define a stack by giving a value to the stack pointer (SP).
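Figure 10.20 | Sequence of actions needed for a nested procedure
[Figure: the main program loads SP with an LDR SP, = instruction and calls PROC1 with BL PROC1. PROC1 begins with STMDB SP!, {REGS, LR}, calls PROC2 with BL PROC2, and ends with LDMIA SP!, {REGS, LR} followed by MOV PC, LR. PROC2 ends with MOV PC, LR.]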
The main program has a procedure named PROC1, which is called by the
instruction BL PROC1. This instruction causes the return address (the address of the
instruction that follows BL) to be copied to LR. In PROC1, since we anticipate a nested
procedure, we push LR and the working registers onto the stack using the instruction
STMDB SP!, {REGS, LR}. Thus the content of LR is safely stored in the stack.
In the procedure PROC1, another procedure is called by the instruction BL
PROC2. This instruction copies the new return address, which lies in PROC1, to LR. In PROC2, there
is the instruction MOV PC, LR at the end. This copies the return address from LR to PC,
and thus execution goes back to PROC1.
At the end of PROC1, there is the stack-based instruction LDMIA SP!, {REGS,
LR}. This retrieves the contents of LR, which are given back to PC by the instruction MOV
PC, LR. Thus, execution goes back to the main program.
Any part of memory can be defined as a stack, by simply defining the content of the
stack pointer register. Let’s write a procedure using a stack.
Example 10.19
        AREA    NESTED, CODE, READONLY
        ENTRY
        LDR     R7, =0x40000000
        LDR     SP, =0x40000210     ;define SP
        MOV     R1, #1
        MOV     R2, #2
        MOV     R3, #3
        BL      PROC1               ;call PROC1
        LDR     R6, [R7]            ;load R6
STOP    B       STOP
PROC1   STMDB   SP!, {R1-R3,LR}     ;save registers and LR on stack
        MOV     R1, #0x34
        MOV     R2, #0x45
        MOV     R3, #0xDC
        BL      PROC2               ;call PROC2
        STR     R5, [R7]            ;store R5
        LDMIA   SP!, {R1-R3,LR}     ;retrieve registers from stack
        MOV     PC, LR              ;copy LR to PC
PROC2   ADD     R4, R2, R1          ;the nested procedure PROC2
        ADD     R5, R4, R3
        MOV     PC, LR              ;go back to PROC1
        END
Example 10.19 shows the instance of a nested procedure and the use of the stack. The
example is in tune with the sequence outlined by Figure 10.20. Nothing very important
is achieved by the program, but it shows how any nested procedure can be written.
PROC1 changes the contents of registers R1, R2 and R3, but since they have already
been saved on the stack by the STMDB instruction, their contents can be retrieved
while returning to the main program.
PROC2 adds the new contents of R1, R2 and R3, and returns. In PROC1, the sum
in R5 is stored in the memory location pointed by R7. Later, in the main program, this
content is loaded to R6.
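The flow of Example 10.19 can be traced with a simplified model in C. This is a sketch for illustration only: the machine structure, the function names and the two return-address tokens are hypothetical, and the stack is modelled as an array of words rather than as real memory. The model shows why LR has to be saved: the call to PROC2 overwrites LR, and PROC1 can return to the main program only because the earlier value was pushed.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the machine state used by Example 10.19. */
typedef struct {
    uint32_t r[16];        /* r[13] = SP, r[14] = LR, r[15] = PC */
    uint32_t stack[64];    /* simulated stack memory, one entry per word */
    int      top;          /* index of the last pushed word (full stack) */
    uint32_t cell;         /* the memory word pointed to by R7 */
} machine;

enum { LR = 14 };

/* Return addresses are represented by arbitrary, hypothetical tokens. */
enum { RETURN_TO_MAIN = 0x100, RETURN_TO_PROC1 = 0x200 };

static void push(machine *m, uint32_t value)
{
    assert(m->top > 0);
    m->stack[--m->top] = value;     /* descending: index moves down */
}

static uint32_t pop(machine *m)
{
    assert(m->top < 64);
    return m->stack[m->top++];
}

static void proc2(machine *m)
{
    m->r[4] = m->r[2] + m->r[1];    /* ADD R4,R2,R1 */
    m->r[5] = m->r[4] + m->r[3];    /* ADD R5,R4,R3 */
    /* MOV PC,LR: return to the caller, LR is left untouched */
}

static void proc1(machine *m)
{
    /* STMDB SP!,{R1-R3,LR}: LR (highest register) is stored first */
    push(m, m->r[LR]);
    push(m, m->r[3]);
    push(m, m->r[2]);
    push(m, m->r[1]);

    m->r[1] = 0x34;
    m->r[2] = 0x45;
    m->r[3] = 0xDC;

    m->r[LR] = RETURN_TO_PROC1;     /* BL PROC2 overwrites LR */
    proc2(m);
    m->cell = m->r[5];              /* STR R5,[R7] */

    /* LDMIA SP!,{R1-R3,LR}: restore in ascending register order */
    m->r[1]  = pop(m);
    m->r[2]  = pop(m);
    m->r[3]  = pop(m);
    m->r[LR] = pop(m);
}

/* Model of the main program; returns the value finally loaded into R6. */
uint32_t run_example(machine *m)
{
    m->top = 64;                    /* empty stack: SP at the top */
    m->r[1] = 1;
    m->r[2] = 2;
    m->r[3] = 3;
    m->r[LR] = RETURN_TO_MAIN;      /* BL PROC1 sets LR */
    proc1(m);
    m->r[6] = m->cell;              /* LDR R6,[R7] */
    return m->r[6];
}
```

In this model, R6 ends up with 0x34 + 0x45 + 0xDC = 0x155, while R1, R2 and R3 are back at 1, 2 and 3 and LR again holds the token for the main program.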
Example 10.20
Write a program which arranges numbers (stored in read-only memory) in ascending
order, and places them in the R/W memory.
        AREA    NUM, DATA, READONLY
ARRAY   DCD     2,7,4,5,11,17,3,15,8,6,9,19,10,23,20
        AREA    COD, CODE
        ENTRY
        LDR     R0, =ARRAY          ;load address of ARRAY to R0
        LDMIA   R0, {R1-R10}        ;load 10 numbers to R1 to R10
        MOV     SP, #0x40000000     ;location of R/W memory
        STMIA   SP, {R1-R10}        ;store the 10 numbers
        ADD     SP, #40
        ADD     R0, #40
        LDMIA   R0, {R1-R5}         ;load next set of 5 numbers
        STMIA   SP, {R1-R5}         ;store them
        MOV     SP, #0x40000000     ;address of read-write memory
        MOV     R1, #0              ;initialize counter R1 to zero
        MOV     R3, SP
LOOP1   MOV     R2, #0              ;outer loop, counter from one end
        MOV     R4, SP
LOOP2   CMP     R2, #14
        BEQ     OUTER               ;branch to OUTER
        ADD     R2, #1              ;increment the counter
        LDR     R0, [R4]            ;stored as 4 bytes
                                    ;hence a jump of 4
        LDR     R5, [R4,#4]
        ADD     R4, #4
        CMP     R0, R5              ;comparing nearby values
        BLT     LOOP2
        MOV     R6, R0
        MOV     R0, R5              ;swapping and storing them
        MOV     R5, R6
        STR     R5, [R4]
        SUB     R4, #4
        STR     R0, [R4]
        ADD     R4, #4
        B       LOOP2
OUTER
        ADD     R1, #1
        CMP     R1, #15
        BNE     LOOP1
STOP    B       STOP
        END
This program uses the concept of ‘bubble sorting’. First the 15 numbers stored in the
variable ARRAY are copied to read/write memory, in two steps (ten numbers and then five).
After that, two nested loops are used. The outer loop (LOOP1), with its counter in R1, makes
15 passes over the array. The inner loop (LOOP2), with its counter in R2 running from 0 to 14,
traverses the array from one end to the other, comparing neighbouring values and swapping
them if they are not in ascending order. With each pass of the outer loop, the largest value not
yet in place moves to its final position towards the end of the array, and so the array ends up
in ascending order.
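For comparison, the same sorting method can be written in C. This is a minimal sketch, with bubble_sort as a hypothetical function name; it keeps the loop structure of Example 10.20, with a fixed number of passes and a full traversal in each pass. One small difference is that it swaps only when the first value is strictly greater, whereas the assembly version also swaps two equal values, which does not change the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sort n signed words into ascending order by repeated adjacent swaps. */
void bubble_sort(int32_t *a, size_t n)
{
    for (size_t pass = 0; pass < n; pass++) {        /* outer loop (R1) */
        for (size_t i = 0; i + 1 < n; i++) {         /* inner loop (R2) */
            if (a[i] > a[i + 1]) {                   /* out of order? */
                int32_t tmp = a[i];                  /* swap neighbours */
                a[i] = a[i + 1];
                a[i + 1] = tmp;
            }
        }
    }
}
```

Calling bubble_sort on the 15 values of ARRAY leaves 2 in the first element and 23 in the last.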
Conclusion
With this, we come to the end of our discussion on the architecture and assembly lan-
guage programming for ARM. There are a few more instructions, pseudo instructions
and directives that haven’t been dealt with, but that can be learned by referring to a book
fully dedicated to this processor.
ARM is the most popular of the 32-bit processors in the market.
ARM, the company does not fabricate chips—instead it sells the design as ‘Intellectual
Property’.
The ARM family consists of members ARM7, 9, 10, 11 and Cortex versions.
Lower power dissipation and good computational capability are the chief attributes of
the ARM processor.
The processor has a large set of registers, and operates in seven modes.
It can be programmed in assembly and one of the IDEs available is Keil RVDK.
The barrel shifter in the ALU has a lot of relevance, as it simplifies computations.
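ARM can execute instructions conditionally, by suffixing a condition code (such as EQ or GT) to the mnemonic; the suffix S makes a data processing instruction update the condition flags.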
There is a link register (LR) for simplifying procedure calls.
It has a special mechanism for handling immediate data which is bigger than 8 bits.
It has multiple register load and store instructions.
Stacks are needed when procedures are nested.
QUESTIONS
1. List out the important features that make ARM ideal for embedded applications.
2. Name two aspects in the design of ARM which have made it a processor with ‘low-power
dissipation’.
3. What is the use of a cache for any processor?
4. What does the acronym ISA mean to you?
5. What are the advantages and disadvantages of ‘pipelining’?
6. What is the penalty incurred in the case of ‘unaligned’ data?
7. How is ‘rotation to the left’ achieved in ARM?
8. How is the instruction LDR different from the pseudo instruction LDR?
9. Why is it that compare instructions don’t need the suffix ‘S’?
10. How is the write back operator used in the ‘pre-indexed’ mode of addressing?
EXERCISES
1. Write instructions for the following, without using any CISC type instruction.
a) move into R7, a byte multiplied by 8
b) move into R6, a word multiplied by 17
c) move into R5, a number divided by 8
2. What do the following instructions mean and what is accomplished?
a) ANDEQ R1, R2, R4
b) ADDHI R2, R4, R2
c) MOVAL R7, R5
d) SUBME R1, R2, R7
e) CMP R1, R2
f) TEST R1, R3
g) MOVGT R2, R5
h) ADDLT R5, R6, R7
3. Write assembly language programs for the following.
a) Find the factorial of any number (the factorial should fit in a 32-bit register)
b) Do division using repeated subtraction
c) Find the sum of the first 100 natural numbers. Save the result in memory
d) Find the sum of 10 numbers stored in read-only memory; the result should be in read/
write memory
e) Store 15 numbers in memory and arrange them in
– descending order
– ascending order
f) Write a program with a procedure call
– without using stack
– using stack
11
ARM: The World’s Most Popular 32-bit Embedded Processor
Part II – Peripheral Programming of ARM MCU Using C
Chapter-opening image: An ARM7 LPC2146 board.
The internal architecture of LPC 2148, a typical and popular ARM7 MCU
The buses in this MCU
The list of peripherals inside the chip
The memory map of the peripherals
The programming of the GPIO
The programming of the timer unit
The programming of the PWM unit
How to use the serial communication unit
The internal structure of ARM9 and Cortex-M3
Introduction
In the previous chapter, we made a thorough study of the core of ARM. The study was
not exhaustive; there are many more aspects, features and advancements introduced with
each new version of the architecture. The trick is to learn more as and when you need
to use the chip for a specific application. What we did in the last chapter was assembly
language programming, so that the computational capabilities of ARM, the processor,
are clear.
In this chapter, we take a different approach: we examine ARM as a microcontroller,
aka SoC (System on Chip). The application domain of this processor is in the
embedded field. A number of peripherals are added inside the chip so as to make it a
‘microcontroller’, and as the peripherals increase in number and complexity, it is sufficient
to make a complete system. The number and kind of peripherals needed depends on the
application, but there are some peripherals which are more or less a standard feature in
most microcontrollers; examples are timers/counters, serial ports, general purpose I/O,
etc. More advanced MCUs have I2C, SPI, RTC, PWM units and so on as internal
peripherals. MCUs with more advanced cores have peripherals such as LCD controllers,
CAN controllers, USB controllers, etc.
In this chapter, we will take a look at the internal block diagram and internal buses
of some ARM MCUs, study a few selected peripherals and write a few programs for
these peripherals.
All the programs presented in the chapter have been tested and verified on the LPC
2148 MCU using the Keil RVDK.
ARM MCUs are manufactured by different firms and so there are a variety of
MCUs and peripheral boards in the market. We choose a few popular ones for our
study.
We start with an MCU based on the ARM7 core. NXP (a company founded by Philips)
has manufactured and popularized the LPC 21xx series, which is a set of
MCUs with sufficient peripherals for a moderately complex application. Let’s begin
with the LPC 2148 MCU, which is one member of the LPC 214x series, the other
members being LPC 2141/42/44/46. Data sheets and user manuals for the series are
available, which give all the fine details of each and every peripheral. Some important
details of the chip are given in Appendix D. The data sheet is available in the site www.
pearson/lyladas/.
11.1 | Block Diagram
Figure 11.1a is the photograph of an MCB 2140 board, with the LPC 2148 and
other interfaces marked on it. Figure 11.1b shows the internal block diagram of LPC
2148. Let’s take a look at this block diagram, and discuss some aspects of this very
complex chip.
[Figure: photograph of the MCB2140 board (Ver 5.2), with the following parts marked: SD adapter, Pot 1, LPC214x, JTAG, RS-232 (COM1, COM2), speaker, LEDs, Reset, INT1 and USB.]
Figure 11.1a | Photograph of the LPC 2148 MCU on an MCB 2140 development board
11.2 | Features of the LPC 214x Family
The different functional blocks in this SoC family (which includes the 2141/42/44/46/48)
are shown in Figure 11.1b, and let us attempt to understand some of them. The details
of the ARM7 core were thoroughly covered in the previous chapter and so they are not
repeated here.
The main features provided by this family are listed below. It is not necessary to go
through the list comprehensively right now. Once the most important features are studied
in detail, this list may be used as a back reference.
i) The core ARM7TDMI-S in a tiny LQFP64 package
ii) 8 KB to 40 KB of on-chip static RAM
iii) 32 KB to 512 KB of on-chip flash memory
iv) 128-bit wide interface/accelerator enables high-speed 60 MHz operation
v) USB 2.0 Full-speed compliant device controller with 2 KB of endpoint RAM.
In addition, the LPC2146/48 provides 8 KB of on-chip RAM accessible to USB by DMA
vi) 10-bit ADCs provide a total of 6/14 analog inputs
vii) Single 10-bit DAC provides variable analog output (LPC2142/44/46/48 only)
viii) Two 32-bit timers/external event counters (with four capture and four compare
channels each), PWM unit (six outputs) and watchdog
ix) Low power real-time clock (RTC) with independent power and 32 kHz clock input
x) Multiple serial interfaces including two UARTs (16C550), two fast I2C-bus
(400 Kbit/s), SPI and SSP with buffering and variable data length capabilities
xi) Vectored interrupt controller (VIC) with configurable priorities and vector addresses
xii) Up to 45 of 5 V tolerant fast general purpose I/O pins in a tiny LQFP64 package
xiii) Up to 21 external interrupt pins available
xiv) 60 MHz maximum CPU clock available from programmable on-chip PLL
xv) On-chip integrated oscillator operates with an external crystal from 1 MHz to 25 MHz
xvi) Power saving modes include idle and power-down
xvii) Processor wake-up from power-down mode via external interrupt or BOD
xviii) Single power supply chip with POR and BOD circuits
[Figure: block diagram of the LPC2141/42/44/46/48. Blocks shown: the ARM7TDMI-S core with its test/debug interface (TDI, TDO, TMS, TRST, TCK) and emulation trace module; the system functions block with PLL0 (system clock) and PLL1 (USB clock), XTAL1, XTAL2 and RST; the ARM7 local bus with the internal SRAM controller (8/16/32 KB SRAM), the internal flash controller (32/64/128/256/512 KB flash) and the fast general purpose I/O (P0[31:28], P0[25:0], P1[31:16]); the AHB bridge and the AMBA AHB (Advanced High-performance Bus) with the vectored interrupt controller, the AHB decoder and the AHB to VPB bridge with the VPB divider; and, on the VPB (VLSI Peripheral Bus), the external interrupts (EINT3 to EINT0), Timer 0/Timer 1 with capture/compare (CAP0, CAP1, MAT0, MAT1), A/D converters 0 and 1, the I2C-bus serial interfaces 0 and 1 (SDA0, SDA1, SCL0, SCL1), the SPI and SSP serial interfaces (SSEL, MISO, MOSI, SCK), and the USB 2.0 full-speed device controller with DMA (D+, D–, VBus, Connect, UP_LED) with its 8 KB RAM shared with USB DMA.]
Figure 11.1b | Internal block diagram of LPC 2148
We now take a more detailed look at the important features of the chip. It may be
necessary to refer to Figure 11.1 while discussing these features.
11.2.1 | Memory
The memory available includes up to 40 KB of static RAM and 512 KB of flash. In the case of the
LPC2146/48 only, an 8 KB SRAM block intended to be utilized mainly by the USB can
also be used as general purpose RAM for data storage and for code storage and execution.
11.2.2 | Memory Map
The total memory space is 4 GB (corresponding to an internal address bus of 32 bits, i.e.,
$2^{32}$ bytes = 4 GB). It is a ‘memory mapped I/O’ system in which peripherals and memory share
the same memory space.
11.2.3 | System Functions
The system functions include a crystal oscillator and a PLL (Phase Locked Loop). When the
PLL is used, the oscillator frequency must be in the range of 10 MHz to 25 MHz; the PLL
multiplies it up to give a system frequency of up to 60 MHz. The system frequency can also
be changed dynamically (using the PLL). When the system is idling,
the frequency can be scaled down to reduce power dissipation.
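The relation between the oscillator frequency and the system frequency can be illustrated with a short calculation in C. This is a hedged sketch, not taken from this chapter: the function name pll_config is hypothetical, and the details it relies on (a multiplier M from 1 to 32, a divider P of 1, 2, 4 or 8, an internal oscillator frequency Fcco = CCLK × 2 × P that must lie between 156 MHz and 320 MHz, and a configuration value holding M - 1 in bits 4:0 and the divider code in bits 6:5) are assumptions that should be checked against the user manual of the chip.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: find PLL settings that turn the oscillator
 * frequency fosc into the processor clock cclk (both in Hz).
 * Returns the assumed PLLCFG value, or -1 if no valid setting exists.
 * Assumed limits: fosc 10-25 MHz, cclk up to 60 MHz, multiplier M
 * from 1 to 32, divider P in {1, 2, 4, 8}, and a current-controlled
 * oscillator frequency Fcco = cclk * 2 * P between 156 and 320 MHz.
 */
int pll_config(uint32_t fosc, uint32_t cclk)
{
    if (fosc < 10000000u || fosc > 25000000u)
        return -1;
    if (cclk == 0 || cclk > 60000000u || cclk % fosc != 0)
        return -1;

    uint32_t m = cclk / fosc;               /* multiplier M */
    if (m < 1 || m > 32)
        return -1;

    for (uint32_t psel = 0; psel < 4; psel++) {
        uint32_t p = 1u << psel;            /* P = 1, 2, 4 or 8 */
        uint32_t fcco = cclk * 2u * p;
        if (fcco >= 156000000u && fcco <= 320000000u)
            return (int)((psel << 5) | (m - 1));   /* PSEL | MSEL */
    }
    return -1;
}
```

Under these assumptions, a 12 MHz crystal and a 60 MHz system frequency give M = 5 and P = 2, that is, a configuration value of 0x24.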
11.2.3.1 | Reset
There are two ways of resetting: a hard reset using the active-low reset pin and a soft
reset on account of the watchdog timer. In either case, the reset vector is the address
0x0. Reset also starts a timer designated as the ‘wake up timer’. This timer ensures that
a minimum delay is allowed for the system to stabilize, and only then is code execution
allowed to start.
11.2.3.2 | Power Control
One of the main features of ARM processors is their low power dissipation. Besides a
basic low power design in the technological aspects, low power modes are also available:
they are the idle mode and the power-down mode.
Idle Mode
In the idle mode, instruction execution is suspended until either a reset or an interrupt occurs.
But peripheral functions can continue operation and may generate interrupts to
cause the processor to resume execution. The idle mode eliminates power used by the
processor, memory systems, related controllers and internal buses.
Power-down Mode
In the power-down mode, the oscillator is shut down and the chip receives no internal
clocks. This mode can be terminated and normal operation resumed by either a reset or
certain specific interrupts that are able to function without clocks. Since all dynamic
operation of the chip is suspended, this mode reduces chip power consumption to almost zero.
11.2.4 | Internal Buses
11.2.4.1 | AMBA
‘Advanced Microcontroller Bus Architecture’ or AMBA is a standard defined by ARM
in 1996 for on-chip buses in its SoC designs. In Figure 11.2, a number of buses can be
seen which form part of AMBA. The figure shows the bus structure re-drawn to emphasize
the functionality of the constituent buses. Three buses with
different protocols and speeds have been defined, catering to the different kinds of
components present inside the chip.
The fastest bus is the system or local bus, which connects the processor core
with memory, as memory accesses have to be very fast. In the LPC 21xx series, serious
thought has been given to the idea of speeding up special peripherals, and as such, a
GPIO (General Purpose I/O) block is also connected to the local bus. This permits
peripherals connected to this fast GPIO block to use the high speed of the local bus.
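[Figure: the ARM7 core is connected by the ARM local bus to the SRAM, the flash and the fast GPIO block. An AHB bridge links the core to the AMBA AHB, to which the VIC is attached. An AHB to VPB bridge, with the VPB divider, links the AHB to the VPB peripherals.]
Figure 11.2 | The internal bus structure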
Next to the core, an AHB bridge is seen. It connects the local bus to AMBA’s high-speed
‘Advanced High-performance Bus’ (AHB), to which the ‘Vectored Interrupt Controller’ is
attached. The third bus is the VPB, which stands for VLSI Peripheral Bus; there is a bridge which
communicates between the low speed VPB and the higher speed AHB. The
VPB is the one that connects to all the remaining peripherals of the LPC 214x SoC.
11.2.4.2 | The VPB Bus and Divider
Figure 11.2 shows that there is a bridge that interfaces between the VPB and the AHB.
There is also a VPB divider. This is a register whose settings are used to derive the clock
for the VPB peripherals from the processor clock: the peripheral clock can be the same as
the processor clock, one half of it or one fourth of it, so that peripherals which need
to operate at a frequency lower than the processor can be accommodated. The
processor clock is designated as CCLK, while the peripheral clock is called PCLK. On
reset, PCLK is ¼ of CCLK.
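The effect of the divider setting can be expressed as a small C function. This is a sketch under an assumption that is not stated in this section: the two low bits of the divider register are taken to be encoded as 00 for CCLK/4, 01 for CCLK and 10 for CCLK/2, with 11 reserved. The function name pclk_hz is hypothetical, and the encoding should be confirmed in the user manual.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Peripheral clock (PCLK) for a given processor clock (CCLK) and the
 * two low bits of the VPB divider register.
 * Assumed encoding: 00 -> CCLK/4 (reset value), 01 -> CCLK,
 * 10 -> CCLK/2, 11 -> reserved (reported here as 0).
 */
uint32_t pclk_hz(uint32_t cclk_hz, uint32_t vpbdiv)
{
    switch (vpbdiv & 3u) {
    case 0:  return cclk_hz / 4;
    case 1:  return cclk_hz;
    case 2:  return cclk_hz / 2;
    default: return 0;              /* reserved setting */
    }
}
```

With CCLK at 60 MHz, the reset setting gives a PCLK of 15 MHz, which is the value to use when baud rates or timer periods are calculated without changing the divider.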
11.2.5 | Memory Accelerator Module
Instructions are executed by fetching them from memory. In a typical MCU, program
code, that is, the instructions, is stored in flash memory. Flash memory is rather slow,
which means that program execution gets slowed down, and so the very purpose of having
a high speed processor gets defeated. The easiest solution is to have a shadow RAM
as in PCs, where the content of flash is copied to RAM (on startup) so that this (fast)
RAM is accessed rather than the slow flash. Such a solution can be thought of for our
MCU as well, but here we are dealing with ‘on-chip’ flash and RAM, which are limited
in size, so copying program code to on-chip RAM is not a feasible solution. Another
possibility is to have a fast cache on the chip, but that will need additional hardware and
increase the complexity of the chip.
The solution adopted has come in the form of a module named the ‘memory
accelerator module’. In simple terms, the memory accelerator module (MAM) attempts
to have the next ARM instruction that will be needed in its latches in time to prevent
CPU fetch stalls (see Figure 11.3).
[Figure: the flash memory bank (128 bits wide) feeds a set of buffers; a bus interface connects the buffers and the memory address to the ARM local bus.]
Figure 11.3 | Simplified view of the memory accelerator module
In this setup, the flash memory is arranged as a bank 128 bits wide, such that each
access to flash reads 128 bits, that is, four 32-bit ARM instructions. This requires the flash
to be organized as four memory modules, each 32 bits wide, which are read together. In
practice, the speed of memory does not get multiplied
by 4, but it improves over the case of a flash memory that is accessed 32 bits
at a time. All the extra hardware to get this done is in the memory controller,
i.e., the MAM.
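The benefit of the 128-bit organization can be seen by counting flash accesses for a run of sequential instructions. The following C function is an idealized sketch: it assumes straight-line code with no branches and ignores the buffering details of the real module, and the function name flash_line_reads is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of 128-bit flash line reads needed to fetch n sequential
 * 32-bit instructions, starting at byte address start.
 * Each line holds 16 bytes, that is, four ARM instructions.
 */
uint32_t flash_line_reads(uint32_t start, uint32_t n)
{
    if (n == 0)
        return 0;
    uint32_t first_line = start / 16;                   /* line of first fetch */
    uint32_t last_line  = (start + 4 * (n - 1)) / 16;   /* line of last fetch */
    return last_line - first_line + 1;
}
```

Eight sequential instructions starting at address 0 need only two flash reads in this model, against eight reads for a flash that delivers 32 bits at a time.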
11.3 | Peripherals
Section 11.2 contains a list of the peripherals available in this chip. Each of the peripherals
has addresses, and the peripherals use the memory mapped I/O scheme of addressing;
this means that both memory and I/O share the same address space. The total address
space is from 0x0 to 0xFFFFFFFF, i.e., a 4 GB space.
The memory map of the system is as shown in Figure 11.4.
What are the notable features in this memory map?
i) The 512KB of flash (non-volatile) memory has an address starting from 0x0.
ii) The 40 KB of static RAM has the addresses starting at 0x4000 0000.
iii) There is a RAM allotted for ‘USB with DMA’ applications.
iv) The peripherals attached to the AHB have addresses from 0xF000 0000.
v) The peripherals attached to the lower speed VPB bus have the addresses from
0xE000 0000.
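The address ranges listed above can be turned into a small lookup function in C. This is a sketch for the LPC2148 only; the function name lpc2148_region and the region names it returns are hypothetical, the end addresses follow from the sizes given (512 KB of flash, 32 KB of static RAM and 8 KB of USB RAM starting at 0x7FD0 0000), and everything not covered by the list is simply reported as "other".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Classify a 32-bit address according to the memory map of the LPC2148. */
const char *lpc2148_region(uint32_t address)
{
    if (address <= 0x0007FFFFu)
        return "flash";                     /* 512 KB from 0x0 */
    if (address >= 0x40000000u && address <= 0x40007FFFu)
        return "sram";                      /* 32 KB static RAM */
    if (address >= 0x7FD00000u && address <= 0x7FD01FFFu)
        return "usb-ram";                   /* 8 KB RAM for USB with DMA */
    if (address >= 0xF0000000u)
        return "ahb-peripheral";            /* AHB peripherals */
    if (address >= 0xE0000000u)
        return "vpb-peripheral";            /* VPB peripherals */
    return "other";                         /* not covered by the list */
}
```

Such a function is useful mainly as a reading aid: given the address of a register from the user manual, it tells at a glance which bus the register sits on.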
VPB Peripherals
Next, we will learn how to use the peripherals of the chip. Each
peripheral has a number of special function registers (SFRs) associated with it, and each
SFR has a specific address. To use a specific peripheral in the way we want, the associated
registers should be written with the appropriate bits.
11.3.1 | GPIO (General Purpose I/O)
If you look at the pin configuration, which is available in the manual of the chip, it will
be seen that most pins have more than one function. There are three to four designations
for each pin, and which of these is valid at a time depends on how the pin has been
‘programmed’ using the pin connect block.
There are two general purpose 32-bit ports, P0 and P1, with restrictions and features
as explained below. These are the pins to which external peripherals can be connected.
11.3.1.1 | Port 0
Port 0 is a 32-bit I/O port with individual direction controls for each bit. 28 pins of the
Port 0 can be used as general purpose bi-directional digital I/Os, while P0.31 provides
digital output functions only. The operation of Port 0 pins depends upon the pin func-
tion selected via the pin connect block. Pins P0.24, P0.26 and P0.27 are ‘reserved’ and
not available for use.
Figure 11.4 | Memory map of the SoC
Address range                 Contents
0x0000 0000 to 0x0007 FFFF    On-chip non-volatile (flash) memory: 32 kB (LPC2141, up to 0x0000 7FFF), 64 kB (LPC2142, up to 0x0000 FFFF), 128 kB (LPC2144, up to 0x0001 FFFF), 256 kB (LPC2146, up to 0x0003 FFFF), 512 kB (LPC2148, up to 0x0007 FFFF)
0x0008 0000 to 0x3FFF FFFF    Reserved address space
0x4000 0000 to 0x4000 7FFF    On-chip static RAM: 8 kB (LPC2141, up to 0x4000 1FFF), 16 kB (LPC2142/2144, up to 0x4000 3FFF), 32 kB (LPC2146/2148, up to 0x4000 7FFF)
0x4000 8000 to 0x7FCF FFFF    Reserved address space
0x7FD0 0000 to 0x7FD0 1FFF    8 kB on-chip USB DMA RAM (LPC2146/2148)
0x7FD0 2000 to 0x7FFF CFFF    Reserved address space
0x7FFF D000 to 0x7FFF FFFF    Boot block (12 kB remapped from on-chip flash memory)
0x8000 0000 to 0xDFFF FFFF    Reserved address space
0xE000 0000 to 0xEFFF FFFF    APB (VPB) peripherals
0xF000 0000 to 0xFFFF FFFF    AHB peripherals
11.3.1.2 | Port 1
Port 1 is a 32-bit bi-directional I/O port with individual direction controls for each bit.
The operation of Port 1 pins depends upon the pin function selected via the correspond-
ing pin connect block. Pins 0 through 15 are not available. Out of the remaining pins
from 16 to 31, the pins 16 to 25 are ‘reserved’. In effect, only very few pins of Port 1 are
available and they can be used for GPIO only, because other pin functions are used for
JTAG.
11.3.1.3 | Pin Connect Block
The purpose of this block is to configure the pins for the desired functions. It acts like a
multiplexer.
Let’s look at it this way. Each pin of the chip has a maximum of four functions. To
select one specific function for a pin, a multiplexer with two select inputs is necessary. The
function of the select inputs is provided by bits of the PINSEL registers.
Only three PINSEL registers are available, and only three are needed, because only Port 0
and half of Port 1 are available as peripheral pins. See Appendix D for details of the
PINSEL registers. As a sample, the pin selection for pin No. 29, P0.5, is shown in
Table 11.1, and the selection of the pin function using the pinselect logic is shown in Figure 11.5.
Table 11.1 | Pinselect Logic for P0.5
Bits of PINSEL Register    Port Pin No    Value of PINSEL Bits    Function Selected      Reset Value
11:10                      P0.5           00                      GPIO                   0
                                          01                      MISO0 (SPI0)
                                          10                      Match 0.1 (Timer 0)
                                          11                      AD0.7
[Figure: a pin select multiplexer, controlled by bits 11:10 of the PINSEL0 register, connects pin P0.5 to one of GPIO, SPI0, Match 0.1 or AD0.7.]
Figure 11.5 | Pinselect mux of pin P0.5

What Table 11.1 displays is that bits 11 and 10 of the PINSEL0 register should be
00 if pin No 29 (P0.5) is to be a GPIO pin, 01 if it is to be used for SPI0 (MISO0), 10 if it
is to be a match pin for Timer 0, and 11 if it is to be used as AD0.7.
In general, pin selection is as shown in Table 11.2.
The description of a pin shows that it can have four possible functions. The logic on the
select lines decides the functionality of the port pin. For any pin, the bit configuration ‘00’
of the corresponding pinselect register bits programs the pin to act as a general purpose
I/O pin. By default, on reset, all port pins act as GPIO pins.
Example 11.1
Use the PINSEL0 register for activating the PWM1, PWM2 and PWM3 outputs, which
are at pins P0.0, P0.7 and P0.1, respectively.
Solution
Refer to Appendix D, which gives the complete listing of the pin functions of the chip.
The PWM outputs are listed as the second alternate function of these port pins. The
corresponding bits of the PINSEL0 register should therefore be given the value 10 for these
port pins to act as PWM outputs. See Table 11.3.
Table 11.3
Output Function   Output Pin   Bits of PINSEL0 Register
PWM1              P0.0         1:0
PWM2              P0.7         15:14
PWM3              P0.1         3:2
Thus, the PINSEL0 register has the value 0000 0000 0000 0000 1000 0000 0000 1010 =
0x0000800A.
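The value worked out in Example 11.1 can be checked on a PC with a small sketch. It assumes only what the tables state: each port pin P0.n owns the two PINSEL0 bits 2n+1:2n, and the code 10 selects the second alternate function. The helper names pinsel_field and pinsel0_for_pwm123 are illustrative and are not part of LPC214x.h.

```c
#include <assert.h>
#include <stdint.h>

/* Function codes held in each 2-bit PINSEL field. */
enum { PIN_GPIO = 0u, PIN_ALT1 = 1u, PIN_ALT2 = 2u, PIN_ALT3 = 3u };

/* Return 'reg' with the 2-bit field of port pin 'pin' (0..15) set to 'func'.
   Illustrative helper, not a library function. */
static uint32_t pinsel_field(uint32_t reg, unsigned pin, unsigned func)
{
    assert(pin < 16u && func < 4u);
    unsigned shift = 2u * pin;                  /* P0.n uses bits 2n+1:2n */
    reg &= ~((uint32_t)3u << shift);            /* clear the old selection */
    return reg | ((uint32_t)func << shift);     /* write the new selection */
}

/* PINSEL0 value for Example 11.1, starting from the reset value 0. */
static uint32_t pinsel0_for_pwm123(void)
{
    uint32_t v = 0u;
    v = pinsel_field(v, 0u, PIN_ALT2);          /* P0.0 -> PWM1 */
    v = pinsel_field(v, 7u, PIN_ALT2);          /* P0.7 -> PWM2 */
    v = pinsel_field(v, 1u, PIN_ALT2);          /* P0.1 -> PWM3 */
    return v;
}
```

On the target, the result would normally be ORed into PINSEL0 (as Example 11.11 does) so that the selections of the other pins are left untouched.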
Table 11.2 | Generalization of the Function of the Pinselect Register Bits
Value of PINSEL0 and PINSEL1 Bits   Function
00                                  Primary function, typically GPIO
01                                  First alternate function
10                                  Second alternate function
11                                  Reserved
11.3.1.4 | Using GPIO Pins
These pins can be used for applications for which specific ‘controllers/drivers’ are not
available inside the chip, such as driving an LCD display, relays, motor controls, ON/OFF
functions and so on. Four registers are available for this. They are shown in Figure 11.6
and listed as follows:
i) IODIR (IO Direction register): The bit setting of this register decides whether a
pin is to be an input (0) or an output (1).
ii) IOSET (IO Set register): This register is used to set the output pins of the chip. To
make a pin ‘1’, the corresponding bit in the register is to be ‘1’. Writing zeros
has no effect.
iii) IOCLR (IO Clear register): This register is used to make an output pin have a ‘0’
value, i.e., to clear it. The corresponding bit in this register has to be a ‘1’. Writing
zeros has no effect.
iv) IOPIN (IO Pin register): From this register, the value of the corresponding pin can
be read, irrespective of whether the pin is an input or an output pin.
Example 11.2
Let us attempt to make the lower 16 GPIO pins output pins.
Then IODIR = 0x0000 FFFF.
To send zeros to all these pins, IOCLR = 0x0000 FFFF.
Now, check the pin values in the register IOPIN, which will have the lower 16 bits
as 0.
Now, if these pins are to be set, IOSET = 0x0000 FFFF.
Check the pin values in the register IOPIN, which will have the lower 16 bits as 1.
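The set/clear behaviour described for IOSET and IOCLR can be tried out on a PC with a small software model. This is only a sketch: the structure gpio_model and the functions io_set, io_clr and example_11_2 are hypothetical names, and the model assumes that pins configured as inputs are simply left unchanged by writes.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of one GPIO port (hypothetical, for illustration only). */
typedef struct {
    uint32_t dir;   /* IODIR: 1 = output, 0 = input */
    uint32_t pin;   /* IOPIN: current pin levels */
} gpio_model;

/* Writing 1s to IOSET drives the matching output pins high; 0s change nothing. */
static void io_set(gpio_model *g, uint32_t mask)
{
    assert(g != 0);
    g->pin |= (mask & g->dir);
}

/* Writing 1s to IOCLR drives the matching output pins low; 0s change nothing. */
static void io_clr(gpio_model *g, uint32_t mask)
{
    assert(g != 0);
    g->pin &= ~(mask & g->dir);
}

/* Replays Example 11.2 and reports the lower 16 bits of IOPIN after each step. */
static void example_11_2(uint32_t *after_clear, uint32_t *after_set)
{
    gpio_model g = { 0x0000FFFFu, 0u };   /* IODIR = 0x0000 FFFF */
    io_clr(&g, 0x0000FFFFu);
    *after_clear = g.pin & 0xFFFFu;
    io_set(&g, 0x0000FFFFu);
    *after_set = g.pin & 0xFFFFu;
}
```

Because a zero written to either register changes nothing, one pin can be updated with a single write, without first reading the port.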
Figure 11.6 | Registers pertaining to one GPIO pin (the SFRs IOSET, IOPIN, IOCLR and IODIR
each hold one bit, 0 to 31, for every GPIO pin)
Example 11.3
Generate an asymmetrical square wave at the lowest four pins of Port 0.
#include <LPC214x.h>
int main(void)
{
    unsigned int x;
    for(;;)
    {
        IODIR0 = 0xFFFFFFFF;         //Make all pins outputs
        for(x = 0; x < 4; x++)       //Delay for the high part (4 passes)
            IOSET0 = 0x0000000F;     //Set the lowest four bits P0.0 to P0.3
        for(x = 0; x < 12; x++)      //Delay for the low part (12 passes)
            IOCLR0 = 0x0000000F;     //Clear the lowest four bits P0.0 to P0.3
    }
}
Example 11.3 is a simple example to show the function of the four GPIO registers
corresponding to Port 0. The use of Embedded C has been discussed in Chapter 9 and is
not repeated here. This program sets and clears the lowest four bits of Port 0 at a certain
asymmetric rate (note that the delays are different).
Figure 11.7 shows the output at port pins P0.0 to P0.3, viewed in the logic analyser
which is available in the ‘simulator’ of Keil RVDK. LEDs may be connected to the port
pins, and they will then go ON and OFF at the rate determined by the delay. In the
program, the OFF time is three times the ON time.
The contents of the registers in Figure 11.6 can be observed in the ‘peripherals’ part
of the simulator.
Example 11.4
#include <LPC214x.h>
void wait(void)
{
    volatile int d;                  //volatile keeps the empty loop from being optimised away
    for (d = 0; d < 1000000; d++);
}
int main(void)
{
    IODIR1 = 0x00010000;             //Make P1.16 an output
    while(1)
    {
        IOSET1 = 1 << 16;            //P1.16 = 1
        wait();
        wait();
        IOCLR1 = 1 << 16;            //P1.16 = 0
        wait();
    }
}
Example 11.4 is another program which generates a square wave at a GPIO pin. Here
only one pin is chosen, and it is P1.16. That pin is made ‘1’ by writing the value 1 shifted
left 16 times (1 << 16) to the IOSET1 register. Next, the pin is cleared by writing the same
value to the IOCLR1 register. The delay is created in a function named wait, and this
function is then called. To make the ON time twice the OFF time, the wait function is
called twice when the pin is ‘1’.
Note For both the above programs we haven’t used the ‘PINSELECT BLOCK’ for
choosing the pin function. This is not necessary because, on reset, all port pins behave as
GPIO pins.
11.3.2 | The Timer Unit
Next let’s study the timers/counters of the LPC2148. A timer and a counter are
functionally equivalent, except that a timer uses the PCLK for its timing, while a counter
uses an external source. This means that a counter is used to count external events via the
capture inputs. A counter is also called a ‘capture timer’.
Here, we discuss the timer function alone. There are two such units, Timer 0 and
Timer 1. There are a number of registers associated with timer operations. Let’s discuss
the functionality of each of them for Timer 0, which we denote as T0. When we use
Timer 1, we use similar registers, but with the notation T1.
The first step in timer operation is to load a number into what is called a ‘match
register’. Then a timer count register is started. This register keeps incrementing for each
PCLK cycle, or at a lower pre-scaled rate. When the content of this timer count register
becomes equal to the value in the match register, i.e., a match occurs, the delay that
has elapsed since the starting time can be used for our ‘timing’. Figure 11.8 illustrates the
idea of timing done using the timer unit.
Now, let us understand the special function registers associated with Timer 0. Keep
in mind that a similar set of registers exists for Timer 1 as well.
11.3.2.1 | Important SFRs of Timer 0
Timer Count Register–T0TC
This is a 32-bit register, which gives it a range of counting from 0 to 0xFFFF FFFF and
then wraps back to the value 0x0000 0000. This register is incremented on every tick of
the clock (i.e. PCLK), if the prescale counter is made 0 (the use of the prescaler will be
explained subsequently).
Figure 11.7 | Waveforms at the lowest four pins of Port0 for Example 11.3 (logic analyser
traces of Port0.0 to Port0.3, each switching between 0x0 and 0x1)

Timer Control Register–T0TCR
This is an 8-bit register (Figure 11.9) in which only the lowest two bits need be used.
Bit 0–E: This bit is the Enable bit. When this bit is ‘1’, the counter is enabled and
starts. Then, the count in T0TC is incremented for every cycle of PCLK (if prescaling
is not used).
Bit 1–R: This bit is the Reset bit. When ‘1’, the counter is reset on the next positive
edge of PCLK.
Match Registers (MR0 to MR3)
There are four 32-bit match registers available: MR0 to MR3. For the operation of one
timer, one of the match registers may be sufficient and is used by loading a number into it.
During timer operation, the timer count register starts incrementing, and at some
time, its count ‘matches’ the number in the match register. When this match occurs,
some action can be programmed to be done by ‘configuring’ the bits of the ‘match control
register’.
Match Control Register–T0MCR
This is a 16-bit register (Figure 11.10) used to specify the event to occur when the match
occurs.
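Figure 11.8 | Simplified block diagram illustrating the operation of the timer unit (PCLK
passes through the prescaler to the timer count register, which is started by the timer
control register; the count is compared with the match register, and on equality the match
control register selects the Stop or Reset action)
Figure 11.9 | The important bits of T0TCR (Bit 1 = R, Bit 0 = E)
Figure 11.10 | Important bits of T0MCR (Bit 2 = S, Bit 1 = R, Bit 0 = I)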

The lowest three bits are for controlling the operations related to match register 0 (MR0).
The next three groups of three bits are for MR1, MR2 and MR3, in that order. Remember
that there are four match registers.
See the bits of the register related to MR0:
Bit 0–I: When ‘1’, an interrupt is generated when a match occurs.
When ‘0’, the interrupt is disabled.
Bit 1–R: When ‘1’, the timer count register is reset when a match occurs.
When ‘0’, this feature is disabled.
Bit 2–S: When ‘1’, the timer count (T0TC) and the prescale counter are
stopped when a match occurs, and the Enable bit of T0TCR is made ‘0’.
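The bit layout just described can be turned into a small helper that places the I, R and S flags of a chosen match channel in its 3-bit group of T0MCR. This is a sketch for checking values on a PC; the names MCR_I, MCR_R, MCR_S and mcr_bits are illustrative and do not come from the device header.

```c
#include <assert.h>
#include <stdint.h>

/* Per-channel action flags in T0MCR (bit names as in Figure 11.10). */
enum { MCR_I = 1u, MCR_R = 2u, MCR_S = 4u };

/* Place the I/R/S flags of match channel 'ch' (0..3) in its 3-bit group.
   Illustrative helper, not a library function. */
static uint32_t mcr_bits(unsigned ch, unsigned flags)
{
    assert(ch < 4u && flags < 8u);
    return (uint32_t)flags << (3u * ch);
}
```

With this helper, mcr_bits(0, MCR_S) gives the value 4 (stop on an MR0 match) and mcr_bits(0, MCR_I | MCR_R) gives the value 3 (interrupt and reset on an MR0 match).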
Let’s make a simple timer for which the steps are as listed in the following section.
11.3.2.2 | Timer Operation
i) Load a number in a match register.
ii) Start the timer by setting the ‘E’ bit in T0TCR.
iii) The timer count register (T0TC) starts incrementing for every tick of the peripheral
clock PCLK (no prescaling is done).
iv) When the content of T0TC equals the value in the match register, a match is said
to have occurred.
v) One of several actions can be made to occur when this happens.
vi) The possible actions are to reset the timer count register, stop the timer, or generate
an interrupt. This ‘setting’ is done in the T0MCR register.
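The steps above can be followed tick by tick in a small software model of the timer. This is a simplified sketch and not a cycle-accurate description of the chip: the structure timer_model and its functions are hypothetical, only match register 0 is modelled, and the model reaches a match value N after N counts, whereas the estimates in this chapter use N + 1 (a difference of one PCLK period).

```c
#include <assert.h>
#include <stdint.h>

/* Simplified host-side model of Timer 0 with one match register. */
typedef struct {
    uint32_t tc, pc, pr, mr0, mcr;  /* count, prescale count, prescale, match, match control */
    int enabled;                    /* E bit of T0TCR */
    int irq;                        /* set when a match interrupt would be raised */
} timer_model;

/* Advance the model by one PCLK tick. */
static void timer_tick(timer_model *t)
{
    assert(t != 0);
    if (!t->enabled) return;
    if (t->pc < t->pr) { t->pc++; return; }   /* prescale counter still running */
    t->pc = 0u;
    t->tc++;
    if (t->tc == t->mr0) {
        if (t->mcr & 1u) t->irq = 1;          /* I: interrupt on match */
        if (t->mcr & 2u) t->tc = 0u;          /* R: reset on match */
        if (t->mcr & 4u) t->enabled = 0;      /* S: stop on match */
    }
}

/* Count the PCLK ticks from start until the timer stops on a match (MCR = 4).
   mr0 must be non-zero in this model. */
static uint32_t ticks_until_stop(uint32_t mr0, uint32_t pr)
{
    assert(mr0 != 0u);
    timer_model t = { 0u, 0u, pr, mr0, 4u, 1, 0 };
    uint32_t n = 0u;
    while (t.enabled) { timer_tick(&t); n++; }
    return n;
}
```

Calling ticks_until_stop with different prescale values shows the delay growing in proportion to (prescale value + 1).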
Now let’s design a very simple timer for generating a symmetric square wave at
P1.16, using Timer 0.
Example 11.5
#include <LPC214x.h>
void wait(void);
int main(void)
{
    T0MR0 = 0x000000FF;              //Load a number in the match register
    T0MCR = 4;                       //Stop when match occurs
    while(1)
    {
        IODIR1 = 0x00010000;         //Make P1.16 an output pin
        IOSET1 = 1 << 16;            //P1.16 = 1
        wait();                      //Call the wait function
        IOCLR1 = 1 << 16;            //P1.16 = 0
        wait();
    }
}
void wait(void)                      //The wait function
{
    T0TCR = 1;                       //Start the timer
    while(!(T0TC == T0MR0));         //Until T0TC = MR0
    T0TCR = 2;                       //Reset the counter
    T0TC = 0;                        //Make the timer count reg = 0
}
Explanation of Example 11.5
i) In the program, Timer 0 is used. The timer match control register (T0MCR) is
written with the value 04, which indicates that the timer count register is to stop when
a match is obtained. The match register (of Timer 0) is loaded with the number 0xFF.
These two steps set the initial conditions.
ii) The pin P1.16 is chosen as the output pin. It is set to ‘1’ initially and then the wait
function (which creates a delay) is called. After one call of the wait function, P1.16
is made ‘0’ and the wait function is called again. This causes a symmetric square wave
to be obtained at this pin. Since this is a GPIO pin, any other GPIO pin may also be
used here to get the same functionality.
iii) The wait function creates the needed delay. The timer control register (T0TCR) is
loaded with the value ‘1’, so as to enable (start) the timer counter so that the timer
register T0TC increments for every clock tick of PCLK. When T0TC increments
to the value 0xFF (stored in T0MR0), a ‘match’ occurs. At this point of time, the
counting is stopped, and the timer counter register is cleared.
iv) This whole sequence of events is repeated to get a continuous square wave.
11.3.2.3 | Calculation of Timer Output Frequency
Now let’s calculate the frequency of the square wave generated. The above program has
been tested on a board in which the peripheral clock (PCLK) is obtained by dividing the
60 MHz processor clock by 4 (using VPB register settings). Thus PCLK is 15 MHz,
and it has a period of T = 0.067 μsecs.
The function ‘wait’ creates a delay which is equal to 256 periods of PCLK (as
T0MR0 = 0xFF), i.e., 256 x 0.067 μsecs, which is about 17 μsecs. This delay is half the
period of the square wave generated at P1.16. The period of the signal is thus about
34 μsecs and the frequency is about 29.4 kHz.
Next, change the match register value to 0xFFFF. A similar calculation gives a
frequency of 114 Hz. Figure 11.11(a) and (b) show the square waves generated for match
values of 0xFF with T = 34 μsecs and 0xFFFF with T = 8.7 msecs.
Figure 11.11(a) | Square wave from the timer with T0MR0 = 0xFF (T = 34 μsecs)

11.3.2.4 | Using the Pre-scaler
To get a lower frequency output, there is the facility of using the prescale counter. There
are two registers associated with ‘prescaling’: the prescale counter and the prescale
register.
The prescale counter increments for every PCLK, and when it counts up to the
value in the prescale register (T0PR), it allows the timer counter (T0TC) to increment
its value by 1. This causes T0TC to increment on every PCLK when T0PR = 0, every
two PCLKs when T0PR = 1, every three PCLKs when T0PR = 2, every four PCLKs when
T0PR = 3 and so on. In effect, loading a number into T0PR causes the output
frequency to get divided.
The mechanism is like this.
i) Say the prescale register contains the number 2. When the timer is started, the
prescale counter counts 0, 1, 2 on successive PCLKs, and only when it reaches the value
in T0PR is the timer count in T0TC incremented by 1, after which the prescale counter
starts again from 0. Thus, the timer count is incremented once in every 3 clock
periods of PCLK.
ii) This continues as long as the timer is enabled.
iii) For the case in Example 11.5, not using a prescale value amounts to loading the
number 0 into the prescale register.
See Example 11.6, which shows the part of Example 11.5 which has been modified
by the additional instruction T0PR = 1. We now get a timer frequency of 57 Hz for
T0MR0 = 0xFFFF. Without prescaling, the frequency would have been 114 Hz.
Example 11.6
#include <LPC214x.h>
void wait(void);
int main(void)
{
    T0MR0 = 0x0000FFFF;              //Load a number in the match register
    T0MCR = 4;                       //Stop when match occurs
    T0PR = 1;                        //Prescale value: T0TC increments every 2 PCLKs
    while(1)
    ………………………….
    ………………………….
}
Figure 11.11(b) | Square wave from the timer with T0MR0 = 0xFFFF (T = 8.7 msecs)

Tables 11.4(a) and 11.4(b) show the frequencies generated by Example 11.6
for match values of 0xFFFF and 0xFF, respectively, for different prescaling factors.
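The entries of Tables 11.4(a) and 11.4(b) can be reproduced with a one-line formula. The sketch below follows the estimate used in this chapter: the half period lasts (MR0 + 1) x (PR + 1) periods of PCLK, and the time taken by the program instructions is ignored. The function name square_wave_hz is illustrative. With PCLK taken as exactly 15 MHz, the results come out slightly below the rounded table entries (for example, about 29.3 kHz instead of 29.4 kHz).

```c
#include <assert.h>
#include <stdint.h>

/* Square-wave frequency in Hz, assuming a half period of
   (MR0 + 1) * (PR + 1) PCLK periods and no software overhead. */
static double square_wave_hz(double pclk_hz, uint32_t mr0, uint32_t pr)
{
    assert(pclk_hz > 0.0);
    double half_period_ticks = ((double)mr0 + 1.0) * ((double)pr + 1.0);
    return pclk_hz / (2.0 * half_period_ticks);
}
```

For example, square_wave_hz(15e6, 0xFFFF, 1) evaluates to about 57 Hz, which is the second row of Table 11.4(a).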
11.3.3 | Timer 0 in the Interrupt Mode
Next, we write a program for Timer 0 to operate it in the interrupt mode. Example 11.7
is such a program. To understand how the interrupt mechanism is incorporated here, a
brief introduction to the interrupt structure of the chip is necessary.
The discussion starts on a general note, and converges to the use of Timer 0 in the
interrupt mode. This will help us to use any other peripheral in the interrupt mode by
using the associated registers of the peripheral and its related interrupt registers. You
can, for instance, try to write programs for the PWM unit and the UARTs in the interrupt
mode. Example 11.7 is a program for generating a square wave at pin P1.16. The
calculation for the frequency at this pin is the same as presented in Section 11.3.2.3.
To help in understanding interrupts clearly, instructions of this
program are referred to at every step of the forthcoming discussion.
Table 11.4a | T0MR0 = 0xFFFF
T0PR   Division Factor   Frequency at P1.16
0      None              114 Hz
1      2                 57 Hz
2      3                 38 Hz
4      5                 22.8 Hz
Table 11.4b | T0MR0 = 0xFF
T0PR   Division Factor   Frequency at P1.16
0      None              29.4 kHz
1      2                 14.7 kHz
2      3                 9.8 kHz
3      4                 7.35 kHz
Example 11.7
#include <LPC214x.h>
unsigned int x = 0;
__irq void Timer0_ISR(void)
{
    x ^= 1;
    if(x)
        IOSET1 = 1 << 16;            //P1.16 = 1
    else
        IOCLR1 = 1 << 16;            //P1.16 = 0
    T0IR = 0x01;                     //Clear match 0 interrupt
    VICVectAddr = 0x00000000;        //End of interrupt
}
int main(void)
{
    IODIR1 = 0x00FF0000;             //P1.16 to P1.23 defined as outputs
    IOCLR1 = 0x00FF0000;             //P1.16 to P1.23 = 0
    T0TCR = 0x00000000;              //Disable timer counter 0
    T0PR = 0x00000002;               //Prescaler value
    T0MCR = 0x00000003;              //Enable interrupt and reset on match
    T0MR0 = 0xFF;                    //MR0 value
    VICVectAddr4 = (unsigned)Timer0_ISR; //Set the timer ISR vector address
    VICVectCntl4 = 0x00000024;       //Set channel
    VICIntEnable = 0x00000010;       //Enable the TIMER-0 interrupt
    T0TCR = 0x00000001;              //Enable timer counter 0
    for(;;);
}
11.3.3.1 | Vectored Interrupt Controller (VIC)
See Figure 11.1 in which the block diagram of LPC 2148 has a VIC as a peripheral. It
is the VIC that manages all the interrupts of the ARM core (IRQs and FIQs) as well as
interrupt requests from the peripherals.
Features of VIC
• 32 interrupt request inputs
• 16 vectored IRQ interrupts
• 16 priority levels dynamically assigned to interrupt requests
• Software interrupt generation
The vectored interrupt controller (VIC) takes 32 interrupt request inputs and pro-
grammably assigns them into 3 categories: FIQ, vectored IRQ and non-vectored IRQ.
The programmable assignment scheme means that priorities of interrupts from various
peripherals can be dynamically assigned and adjusted.
Fast interrupt requests (FIQ) have the highest priority. If more than one request
is assigned to FIQ, the VIC ORs the requests to produce the FIQ signal to the ARM
processor.
Vectored IRQs have the middle priority, but only 16 of the 32 requests can be
assigned to this category. Any of the 32 requests can be assigned to any of the 16 vec-
tored IRQ slots, among which slot 0 has the highest priority and slot 15 has the lowest.
Non-vectored IRQs have the lowest priority.

The VIC ORs the requests from all the vectored and non-vectored IRQs to produce
the IRQ signal to the ARM processor. Figure 11.12 illustrates this.
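The classification into FIQ, vectored IRQ and non-vectored IRQ can be pictured with a small software model. This is only a sketch of the priority rule described above, not of the hardware itself: the structure vic_model, its field names and the functions vic_fiq and vic_irq_slot are hypothetical, and the slot encoding (bit 5 = slot enabled, bits 4:0 = source number) is the one given later for the vector control registers.

```c
#include <assert.h>
#include <stdint.h>

#define VIC_SLOTS 16

/* Host-side model of the VIC settings (hypothetical names). */
typedef struct {
    uint32_t int_enable;            /* one enable bit per interrupt source (0..31) */
    uint32_t int_select;            /* 1 = source classified as FIQ, 0 = IRQ */
    uint8_t  vect_cntl[VIC_SLOTS];  /* bit 5 = slot enabled, bits 4:0 = source number */
} vic_model;

enum { VIC_NONE = -1, VIC_NONVECTORED = 16 };

/* Nonzero if any pending, enabled source is classified as FIQ (requests are ORed). */
static int vic_fiq(const vic_model *v, uint32_t pending)
{
    assert(v != 0);
    return (pending & v->int_enable & v->int_select) != 0u;
}

/* Winning vectored slot (0 = highest priority), VIC_NONVECTORED if only
   non-vectored IRQs are pending, or VIC_NONE if no IRQ is pending. */
static int vic_irq_slot(const vic_model *v, uint32_t pending)
{
    assert(v != 0);
    uint32_t irq = pending & v->int_enable & ~v->int_select;
    if (irq == 0u) return VIC_NONE;
    for (int s = 0; s < VIC_SLOTS; s++) {
        unsigned src = v->vect_cntl[s] & 0x1Fu;
        if ((v->vect_cntl[s] & 0x20u) && (irq & ((uint32_t)1u << src)))
            return s;
    }
    return VIC_NONVECTORED;
}
```

In the model, when two sources are pending together, the one assigned to the lower-numbered slot is reported, which is the priority rule stated for the vectored IRQs.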
11.3.3.2 | Interrupt Sources for the VIC
Each peripheral device has one interrupt line connected to the VIC, but may have
several internal interrupt flags. Individual interrupt flags may also represent more than one
interrupt source (see Table 11.5, which shows a part of the list of interrupt sources for
the VIC). For example, Timer 0 is assigned the number ‘4’ and has eight interrupt flags,
i.e., interrupts are generated for the four match registers and the four capture
registers, but the VIC associates Timer 0 with just one interrupt line.
Next, we will discuss briefly some of the important registers associated with the VIC.
11.3.3.3 | Interrupt Enable Register (VICIntEnable)
This is a read/write accessible register. This register controls which of the
32 interrupt requests and software interrupts are allowed to contribute to the generation
of an interrupt.
Figure 11.12 | The VIC’s connection to ARM (the interrupt requests IRQ1, IRQ2 up to IRQN
enter the VIC, which sends a single IRQ signal to the ARM core)
Table 11.5 | Connection of Interrupt Sources to the VIC
Block      Flag(s)                                    No
WDT        Watchdog Interrupt (WDINT)                 0
–          Reserved for Software Interrupts only      1
ARM Core   Embedded ICE, DbgCommRx                    2
ARM Core   Embedded ICE, DbgCommTx                    3
TIMER 0    Match 0 – 3 (MR0, MR1, MR2, MR3)           4
           Capture 0 – 3 (CR0, CR1, CR2, CR3)
TIMER 1    Match 0 – 3 (MR0, MR1, MR2, MR3)           5
           Capture 0 – 3 (CR0, CR1, CR2, CR3)
UART0      Rx Line Status (RLS)                       6
           Transmit Holding Register Empty (THRE)
           Rx Data Available (RDA)
           Character Time-out Indicator (CTI)

For example, see Table 11.6, which is part of this register’s bit definitions for the
interrupting peripherals. It is seen that the bit position for Timer 0 is ‘4’, and hence bit 4
of this register is to be set if Timer 0 is to be operated in the interrupt mode.
Example 11.7 uses the instruction
VICIntEnable = 0x00000010; //Enable the TIMER-0 interrupt by setting bit 4
11.3.3.4 | Vector Control Register (VICVectCntl0-15)
Only the lower 6 bits of this register are used (Table 11.7).
Thus, Example 11.7, where Timer 0 is allotted the number 4, sets the enable bit (bit 5)
together with the interrupt number 4 in bits 4:0:
VICVectCntl4 = 0x00000024; //Set channel
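The two constants used for Timer 0 (0x10 for the interrupt enable register and 0x24 for the vector control register) follow directly from its interrupt number. A minimal sketch, using the illustrative helper names vic_vect_cntl and vic_int_enable_bit, shows how they are formed:

```c
#include <assert.h>
#include <stdint.h>

/* Value for a vector control register: bit 5 enables the slot,
   bits 4:0 hold the interrupt source number (Table 11.7). */
static uint32_t vic_vect_cntl(unsigned source)
{
    assert(source < 32u);
    return 0x20u | source;
}

/* Bit mask for the interrupt enable register for the same source number. */
static uint32_t vic_int_enable_bit(unsigned source)
{
    assert(source < 32u);
    return (uint32_t)1u << source;
}
```

For Timer 1, whose number is 5 in Table 11.5, the same helpers give 0x25 and 0x20.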
11.3.3.5 | Vector Address Registers (VICVectAddr0-15)
These are read/write accessible registers. These registers hold the addresses of the
interrupt service routines (ISRs) for the vectored IRQ slots.
In Example 11.7,
VICVectAddr4 = (unsigned)Timer0_ISR; //Set the timer ISR vector address
This instruction indicates that the ISR is at the address ‘Timer0_ISR’, which is the
starting location (address) of the ISR.
11.3.3.6 | Using Timer 0 in the Interrupt Mode
Besides adding the instructions corresponding to the VIC, the timer program for
operating in the interrupt mode has some minor differences from its operation in the status
check mode.
First of all, T0MCR is to be programmed to generate an interrupt on match. Hence
bit 0 of this register is to be set (Section 11.3.2.1). So in Example 11.7,
T0MCR = 3; //Generate interrupt and reset T0TC
Table 11.6 | Bit Definitions in the VIC Interrupt Enable Register for a Few Interrupt Sources
Bit      7       6       5         4         3            2            1     0
Symbol   UART1   UART0   TIMER 1   TIMER 0   ARM Core 1   ARM Core 0   –     WDT
Access   R/W     R/W     R/W       R/W       R/W          R/W          R/W   R/W
Table 11.7 | The Lower 6 Bits of the Vector Control Register
Bits   Function                               As Used in Example 11.7
4:0    The number of the selected interrupt   00100
5      Enable the chosen IRQ slot             1

11.3.3.7 | Timer 0 Interrupt Register (T0IR)
This register has a bit for each of the match channels MR0 to MR3. When a timer
operates in the interrupt mode and a match occurs, an interrupt is generated, and the
corresponding flag bit in T0IR is set. To ‘clear’ it, a ‘1’ must be written into the same bit
of this register. Only then will the interrupt flag be ‘reset’.
Table 11.8 shows that the bit corresponding to match register 0 (MR0) in T0IR is bit 0.
In Example 11.7, the instruction used is
T0IR = 0x01; //Clear match 0 interrupt
Steps of the Program (Example 11.7)
i) The operation of the timer is quite straightforward. When a match occurs, the
Timer 0 interrupt is activated.
ii) Program control goes to the ISR Timer0_ISR, in which pin P1.16 is complemented,
and the timer interrupt flag is reset.
iii) To signal the end of the interrupt, a dummy write to the VICVectAddr register is
also done.
iv) Then, control goes back to the main program.
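The ‘write 1 to clear’ behaviour of T0IR and the toggling done in the ISR can be tried out on a PC with a small model. This is a sketch only: the functions w1c_write and isr_step are hypothetical names, and the interrupt flags are held in an ordinary variable instead of a hardware register.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a write-1-to-clear flag register such as T0IR:
   each 1 written clears that flag, each 0 leaves its flag unchanged. */
static uint32_t w1c_write(uint32_t flags, uint32_t written)
{
    return flags & ~written;
}

/* Model of the ISR body in Example 11.7: toggle the software state x,
   clear the MR0 flag (bit 0), and return the new state. */
static unsigned isr_step(unsigned x, uint32_t *t0ir)
{
    assert(t0ir != 0);
    x ^= 1u;
    *t0ir = w1c_write(*t0ir, 0x01u);
    return x;
}
```

Note that writing 0x01 clears only the MR0 flag; a flag pending for another match channel is left set.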
11.3.4 | The Pulse Width Modulation Unit
Pulse width modulation is basically a scheme in which it is possible to control the period
and duty cycle of a square wave. The duty cycle is defined as the ratio of the ON time of
the pulse to the period, expressed as a percentage.
See Figure 11.13, which shows pulse trains at 25, 50 and 75% duty cycles.
Table 11.8 | Bits of T0IR for the Interrupts Generated When ‘Match’ Occurs
Bit   Symbol          Description
0     MR0 Interrupt   Interrupt flag for match channel 0.
Figure 11.13 | Pulse trains at different duty cycles (three pulse trains of period T,
swinging between 0 and V, with duty cycles of 25%, 50% and 75%)

When MCUs have dedicated PWM units, they have registers which can be
programmed for the required frequency of the pulse train as well as the duty cycle. For
this ARM7 MCU, the PWM unit works similarly to a timer unit. Also, if its PWM mode
is not enabled, it works only as a timer.
There is a free running timer register (PWMTC), whose count is compared with the
seven 32-bit match registers (MR0 to MR6). The match register values are continuously
compared to the count in the timer register, which increments (with pre-scaler) when
started. One match register (MR0) is dedicated to the action of deciding the frequency
of the pulse train, by resetting the count upon match. The other match registers can be
used for fixing the duty cycle. Thus there are six PWM output pins from which PWM
signals can be simultaneously generated.
11.3.4.1 | Single Edge Controlled PWM
Here only the falling edge of a pulse train is controlled. Two match registers can be used
to provide a single edge controlled PWM output. One match register (PWMMR0)
controls the PWM cycle rate, by resetting the count upon match. The other match
register controls the PWM edge position, and thus it controls the duty cycle.
Multiple PWMs can be obtained by using more than one match register. For
example, refer to Figure 11.14. For this, PWMMR0 is used for specifying T. Any of MR1 to
MR6 (for example, MR3 or MR4) can be used for specifying P. Note that for all pulse trains
generated at different PWM output pins, the pulse repetition rate is the same, as it is
decided by the number in the match register MR0, which is common to all the pulse trains.
Rules for single edge controlled PWM outputs:
i) All single edge controlled PWM outputs go high at the beginning of a PWM cycle.
ii) Each PWM output will go low when its match value (in MR1 to MR6) is reached.
If no match occurs (i.e. the match value is greater than the PWM rate), the PWM
output remains continuously high.
iii) When a match occurs, actions can be triggered automatically. The possible actions
are to generate an interrupt, reset the PWM timer counter, or stop the timer. Actions
are controlled by the settings in the PWMMCR register.
Note In double edge controlled PWM, both the rising and falling edges of the PWM
waveform are controlled by the match registers. We limit the discussion here to just the
single edge controlled PWM.
11.3.4.2 | Calculating the Frequency
MR0 is used to decide the frequency of the pulse train. As calculated for the timer
(Section 11.3.2.3), PCLK (15 MHz) has a period of 0.067 μsecs. If PWMMR0 is loaded
with a number N, it counts from 0 to N. A match occurs after 0.067 μsecs x (N+1), and
this is the period T of the pulse train generated. Refer to Table 11.9 for a sample set of
calculations.
Figure 11.14 | A pulse train showing the period T and ON time P
11.3.4.3 | Calculating the Duty Cycle
The duty cycle is the ratio of the ON period (P) to the total period T. As per Figure 11.14,
it is P/T, expressed as a percentage.
There are six match registers for deciding the pulse ON time. We will consider the
simplest case of single edge controlled PWM. Let’s use the match register PWMMR3.
In this case, when the timer count value matches the value in PWMMR3, the PWM
output pin goes low. This will end the high period of the pulse. See Table 11.10 for
sample calculations based on the value of MR3, for T = 6.365 μsecs.
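The period and duty cycle calculations can be written as two short functions. The sketch below follows the convention used in the worked examples of this chapter, where a match value N stands for N + 1 periods of PCLK, both for the period (MR0) and for the ON time (MR1 to MR6). The function names pwm_period_us and pwm_duty_percent are illustrative. With PCLK taken as exactly 15 MHz, the periods come out slightly shorter than those obtained with the rounded value 0.067 μsecs.

```c
#include <assert.h>
#include <stdint.h>

/* PWM period in microseconds: (MR0 + 1) periods of PCLK (PCLK given in MHz). */
static double pwm_period_us(double pclk_mhz, uint32_t mr0)
{
    assert(pclk_mhz > 0.0);
    return ((double)mr0 + 1.0) / pclk_mhz;
}

/* Duty cycle in percent, taking the ON time as (MRn + 1) periods of PCLK. */
static double pwm_duty_percent(uint32_t mr0, uint32_t mrn)
{
    return 100.0 * ((double)mrn + 1.0) / ((double)mr0 + 1.0);
}
```

For PWMMR0 = 0x5E and PWMMR3 = 0x2E, pwm_duty_percent gives about 49.5%, close to the second row of Table 11.10.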
Example 11.8
Calculate the values to be loaded in PWMMR0 and PWMMR3 to get a pulse
train of period 5 ms and a duty cycle of 25%.
Solution
5000 μsecs = (N+1) x 0.067 μsecs
(N+1) = 74,626 = 0x12382
The number to be loaded in PWMMR0 is 1 less than this, i.e., N = 74,625 = 0x12381.
25% duty cycle corresponds to an ON time of 1.25 msecs
(N1+1) x 0.067 μsecs = 1.25 msecs
Calculating, N1 = 18,655
The number to be loaded in PWMMR3 = 18,655 = 0x48DF.
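The same calculation can be done with integer arithmetic directly from the PCLK frequency. This is a sketch with an illustrative function name, match_for_us. Because it uses PCLK = 15 MHz exactly instead of the rounded period of 0.067 μsecs, it gives 74,999 (0x124F7) for PWMMR0 and 18,749 (0x493D) for PWMMR3, slightly different from the values worked out above.

```c
#include <assert.h>
#include <stdint.h>

/* Match value for a duration given in microseconds: ticks - 1,
   where ticks = duration * PCLK, truncated to a whole number. */
static uint32_t match_for_us(uint32_t pclk_hz, uint32_t duration_us)
{
    uint64_t ticks = (uint64_t)pclk_hz * duration_us / 1000000u;
    assert(ticks >= 1u && ticks <= 0x100000000ull);
    return (uint32_t)(ticks - 1u);
}
```

Working from the clock frequency avoids the rounding error that builds up when a rounded period is multiplied by a large count.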
11.3.4.4 | The PWM Output Pins
Corresponding to the six match registers, there are six PWM output pins, and they are
called the PWM channels. The pins and the PWMPCR register bits for enabling each of
them are given in Tables 11.11 and 11.13.
Table 11.9 | Calculation of the Period of the Pulse for Different Values of PWMMR0
N in PWMMR0   T (μsecs)                    Frequency (f) = 1/T
0xFF          256 x 0.067 = 17.152         58.3 kHz
0x5E          95 x 0.067 = 6.365           157 kHz
0xFFFF        65,536 x 0.067 = 4390.8      227 Hz
Table 11.10 | Calculating the Duty Cycle for Different Values of PWMMR3
N in PWMMR0   Value in PWMMR3   Ton (μsecs)   Duty cycle = P/T
0x5E          0xF               1.004         15.5 %
              0x2E              3.15          49.4 %

11.3.4.5 | Control Registers of the PWM Unit
There are a number of registers for this application, and the usage of each of them can be
looked up in the manual of the chip. Here, we discuss only a few.
PWMTCR This is the PWM Timer Control Register. This is an 8-bit register. Only the
lower 4 bits of this register need to be used (Figure 11.15).
Bit 0–CE: COUNTER ENABLE. When ‘1’, the PWM timer counter and prescale
counter are enabled.
Bit 1–CR: COUNTER RESET. When ‘1’, the PWM timer counter and prescale
counter are reset on the next positive going edge of PCLK.
They remain reset until this bit is made ‘0’.
Bit 2–R: Reserved.
Bit 3–PE: PWM ENABLE. When ‘1’, the PWM mode is enabled. Otherwise the
PWM unit acts as just a timer.
In Example 11.10, PWMTCR = 0x9.
PWMPCR This is the PWM Control Register. This is a 16-bit register and is used to
enable and select the type of each PWM channel.
This register enables or disables the six PWM outputs, and also chooses between
double and single edge control. Bits 0, 1, 7, 8 and 15 are unused. Table 11.12 shows
the state of the bits of PWMPCR for choosing between single and double edge control.
Table 11.11 | List of the PWM Output Channels and Corresponding Port Pins
PWM No   Channel No   Output Pin No
PWM1     1            P0.0
PWM2     2            P0.7
PWM3     3            P0.1
PWM4     4            P0.8
PWM5     5            P0.21
PWM6     6            P0.9
Figure 11.15 | Important bits of PWMTCR (Bit 3 = PE, Bit 2 = R, Bit 1 = CR, Bit 0 = CE)
Table 11.12 | Choosing Between Single and Double Edge Control Using Bits of PWMPCR
Bit No of PWMPCR   When Bit = 0                   When Bit = 1
PWMPCR.2           Single edge control for PWM2   Double edge control for PWM2

Refer to Table 11.13, which shows the bits of PWMPCR to be ‘set’ to enable the
output pins on which the PWM waveform is to be obtained.
Example 11.9
What should be the value to be entered in the PWMPCR register for the following
situations?
i) Single edge control for PWM3
ii) Double edge control for PWM3
iii) Single edge control for PWM1, 2 and 3
Solution
Refer to Tables 11.12 and 11.13.
i) The PWM3 output is to be enabled, so only bit 11 is to be set, and only single edge
control is to be used. So PWMPCR = 0x00000800.
ii) For double edge control, PWMPCR.3 should be set, and to enable the PWM3 output
at P0.1, PWMPCR.11 should also be set. Thus PWMPCR = 0x00000808.
iii) Single edge control needs the corresponding bits to be ‘0’. For enabling the outputs
of PWM 1, 2 and 3, bits 9, 10 and 11 should be set. PWMPCR = 0x00000E00.
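The three answers of Example 11.9 can be generated by a small helper that combines the output enable bits (PWMPCR.9 to PWMPCR.14) with the edge select bits. This is a sketch: the function pwmpcr_value is an illustrative name, and it assumes, in line with Table 11.12 and the solution above, that the edge select bit of channel n is PWMPCR.n for channels 2 to 6.

```c
#include <assert.h>
#include <stdint.h>

/* Build a PWMPCR value. 'enable' and 'double_edge' are bit sets in which
   bit n refers to channel PWMn. Illustrative helper, not a library function. */
static uint32_t pwmpcr_value(unsigned enable, unsigned double_edge)
{
    assert((enable & ~0x7Eu) == 0u);        /* only channels 1..6 exist */
    assert((double_edge & ~0x7Cu) == 0u);   /* bits 0 and 1 are unused: channels 2..6 only */
    /* Output enable bits sit 8 places above the channel number (PWMPCR.9 = PWM1). */
    return ((uint32_t)enable << 8) | double_edge;
}
```

For case (iii), the three channels are passed together as the bit set 0x0E (bits 1, 2 and 3).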
PWMLER This is the PWM Latch Enable Register. The PWM latch enable register is
an 8-bit register used to control the update of the PWM match registers when they are
used for PWM generation.
When software writes to the location of a PWM match register while the timer is in
PWM mode, the value is held in a shadow register. When a PWM Match 0 event occurs
(normally also resetting the timer in PWM mode), the contents of the shadow registers will
be transferred to the actual match registers if the corresponding bit in the latch enable
register has been set. At that point, the new values will take effect and determine the
course of the next PWM cycle. Once the transfer of new values has taken place, all bits of
the LER are automatically cleared. Until the corresponding bit in the PWMLER is set
and a PWM Match 0 event occurs, any new value written to the PWM match registers
has no effect on PWM operation.
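The shadow register mechanism can be followed with a small software model. This is a sketch of the behaviour described above, not of the hardware: the structure pwm_latch_model and the functions pwm_write_match and pwm_match0_event are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define PWM_MATCHES 7

/* Host-side model of the PWM match registers and their shadow copies. */
typedef struct {
    uint32_t active[PWM_MATCHES];   /* values the comparators currently use */
    uint32_t shadow[PWM_MATCHES];   /* values most recently written by software */
    uint32_t ler;                   /* latch enable bits, bit n for match register n */
} pwm_latch_model;

/* Software write to match register n in PWM mode: only the shadow changes. */
static void pwm_write_match(pwm_latch_model *p, unsigned n, uint32_t value)
{
    assert(p != 0 && n < PWM_MATCHES);
    p->shadow[n] = value;
}

/* PWM Match 0 event: copy every shadow whose LER bit is set, then clear the LER. */
static void pwm_match0_event(pwm_latch_model *p)
{
    assert(p != 0);
    for (unsigned n = 0; n < PWM_MATCHES; n++)
        if (p->ler & (1u << n))
            p->active[n] = p->shadow[n];
    p->ler = 0u;
}
```

In the model, a new value written for MR3 has no effect at a Match 0 event until bit 3 of the latch enable bits (value 0x8) has been set.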
Table 11.13 | The Bits of PWMPCR to be Set for Enabling the Output Pins
Enabled by Setting the Register Bit   PWM Output
PWMPCR.9                              PWM1
PWMPCR.10                             PWM2
PWMPCR.11                             PWM3
PWMPCR.12                             PWM4
PWMPCR.13                             PWM5
PWMPCR.14                             PWM6

Figure 11.16 shows the seven bits of the PWMLER register, each bit enabling the
latching of one match register, Match register 0 (MR0) to Match register 6 (MR6), when set.
Example 11.10 uses PWMLER = 0x8, corresponding to enabling the latch of
MR3 only.
Example 11.10
#include <LPC214x.h>
void PWM_InIt(void)
{
    PINSEL0 |= 0x00000008;           //Enable P0.1 as PWM3 output
    PWMPR = 0x00000000;              //No prescaling
    PWMPCR = 0x00000800;             //PWM3 single edge control, output enabled
    PWMMR0 = 0xFFF;                  //Fix the pulse repetition rate
    PWMTCR = 0x00000009;             //Enable PWM mode and PWM timer counting
}
int main()
{
    PWM_InIt();
    while(1)
    {
        PWMMR3 = 0x2FE;              //Match value for pulse ON time
        PWMLER = 0x8;                //Latch the new MR3 value
    }
}
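Calculating the Duty Cycle
One count = 1/PCLK = 1/15 MHz = 0.067 μsecs (approx)
PWMMR3 = 0x2FE corresponds to 767 counts (counting from 0)
PWMMR0 = 0xFFF corresponds to 4096 counts (counting from 0)
P = 767 x 0.067 μsecs
T = 4096 x 0.067 μsecs
Duty cycle = P/T = 767/4096 = 18.73 %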
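The same arithmetic can be written as a short host-side sketch. It assumes, as the calculation above does, that a match value N stands for N + 1 timer counts and that PCLK is 15 MHz; the function names are illustrative only, and the exact edge timing should be checked against the user manual.

```c
#include <assert.h>
#include <stdint.h>

/* Duty cycle in percent from the ON-time match value and the MR0 value.
   Assumption: a match value N corresponds to N + 1 timer counts. */
static double pwm_duty_percent(uint32_t on_match, uint32_t period_match)
{
    assert(on_match <= period_match);
    return 100.0 * ((double)on_match + 1.0) / ((double)period_match + 1.0);
}

/* Convert a number of timer counts to microseconds for a given PCLK. */
static double pwm_counts_to_us(uint32_t counts, double pclk_hz)
{
    assert(pclk_hz > 0.0);
    return (double)counts * 1e6 / pclk_hz;
}
```

Calling `pwm_duty_percent(0x2FE, 0xFFF)` reproduces the ratio 767/4096 used in the text.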
Figure 11.16 | Bits of the PWMLER
Reserved M6 M5 M4 M3 M2 M1 M0

Next, let’s consider the generation of more than one PWM pulse. Example 11.11
is a program which gets PWM outputs from channels 1, 2 and 3. Note that all PWM
outputs will occur at the same repetition rate.
Example 11.11
#include <LPC214x.h>
void PWM_InIt(void)
{
    PINSEL0 |= 0x0000800A;  //Enable PWM 1, 2 and 3 outputs
    PWMPR = 0x00000001;     //Prescale value = 1
    PWMPCR = 0x00000E00;    //Pins of PWM 1, 2 and 3 enabled
    PWMMR0 = 0xFF;          //Set PWM frequency
    PWMTCR = 0x00000009;    //Enable PWM mode and PWM timer counting
}
int main()
{
    PWM_InIt();
    while(1)
    {
        PWMMR1 = 0x30;      //Pulse on time at PWM1 channel
        //PWMMR2 = ...;     //Pulse on time at PWM2 channel
        //PWMMR3 = ...;     //Pulse on time at PWM3 channel
        PWMLER = 0xE;       //Latch register value
    }
}
Figure 11.18 shows the output pins on which the PWM signals are obtained and
Figure 11.19 shows the PWM output waveforms. Since the prescale value is 1, the timer
is clocked at PCLK/2, and the basic time T (calculated with a count of 0xFF) is multiplied
by 2 to get T = 34.3 μsecs. The pulse ON times are also multiplied by 2 to get the values
shown in Table 11.14.
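The effect of the prescaler on the period can be sketched as follows. The sketch assumes that the timer advances once every PR + 1 cycles of PCLK and that a match value of MR0 gives MR0 + 1 counts per period; the function name is illustrative. With the exact tick of 1/15 MHz the result is slightly below the 34.3 μsecs quoted in the text, which uses the rounded tick of 0.067 μsecs.

```c
#include <assert.h>
#include <stdint.h>

/* PWM period in microseconds.
   Assumptions: the timer counts once every (pr + 1) PCLK cycles and
   the period spans (mr0 + 1) counts. */
static double pwm_period_us(uint32_t pr, uint32_t mr0, double pclk_hz)
{
    assert(pclk_hz > 0.0);
    return ((double)pr + 1.0) * ((double)mr0 + 1.0) * 1e6 / pclk_hz;
}
```

Here `pwm_period_us(1, 0xFF, 15e6)` is exactly twice `pwm_period_us(0, 0xFF, 15e6)`, which is the doubling described above.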
[Waveform: a pulse train with ON time P and period T]
Figure 11.17 | The output waveform obtained from Example 11.10

11.3.5 | The UART
This chip has two UARTs, namely, UART0 and UART1. To understand their operation,
first observe the simplified block diagram of a UART of the chip (Figure 11.20). For any
of the registers referred to here, you must add the prefix U0 or U1 depending on which
unit (UART0 or UART1) is being used. The three important units are the transmitter,
the receiver and the baud rate generator blocks.
Table 11.14 | Values of the Duty Cycles obtained from Example 11.11
Channel   Output Pin   P (ON Time) μsecs     Duty Cycle
PWM1      P0.0         3.38 x 2 = 6.76       19.7%
PWM2      P0.7         5.32 x 2 = 10.64      31.02%
PWM3      P0.1         2.21 x 2 = 4.42       12.8%
[Diagram: the LPC2148 with PWM1 on P0.0, PWM2 on P0.7 and PWM3 on P0.1]
Figure 11.18 | Output pins for the PWM channels
[Waveforms: PWM 1, PWM 2 and PWM 3 with ON times P1, P2 and P3 and a common period T]
Figure 11.19 | Output waveforms from three PWM channels

11.3.5.1 | The Transmitter
Figure 11.20 shows two registers in this block. They are the Transmitter Holding Register
(THR) and the Transmitter Shift Register (TSR). When a data byte arrives in the THR
(from the CPU through the bus), it is ‘framed’ (by adding start and stop bits), transferred
to the TSR and sent out through the TxD pin one bit at a time, by clocking the
TSR at the baud rate decided by the transmitter clock TCLK.
11.3.5.2 | The Receiver
There are two registers in the receiver block. They are the Receiver Shift Register (RSR)
and the Receiver Buffer Register (RBR). The data received serially through the RxD line
(at the baud rate decided by RCLK) is moved bit by bit into the RSR, and then transferred
to the RBR after de-framing. From the RBR, it is copied to the CPU registers through
the bus.
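Framing and de-framing can be sketched in software for the 8-N-1 format used later in this section. The sketch assumes the usual asynchronous frame (one start bit at 0, eight data bits sent least significant bit first, one stop bit at 1); in the chip this is done by hardware, and the function names below are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 10-bit 8-N-1 frame. Bit 0 of the result is the first bit on
   the line: bit 0 = start (0), bits 1..8 = data (LSB first), bit 9 = stop (1). */
static uint16_t uart_frame_8n1(uint8_t data)
{
    return (uint16_t)((1u << 9) | ((unsigned)data << 1));
}

/* Recover the data byte from a frame.
   Returns 0 on success, -1 if the start or stop bit is wrong. */
static int uart_deframe_8n1(uint16_t frame, uint8_t *data)
{
    assert(data != 0);
    int start_ok = (frame & 1u) == 0;
    int stop_ok  = ((frame >> 9) & 1u) == 1;
    if (!start_ok || !stop_ok) {
        return -1;                      /* framing error */
    }
    *data = (uint8_t)((frame >> 1) & 0xFFu);
    return 0;
}
```

The transmitter side corresponds to `uart_frame_8n1()` followed by shifting the frame out one bit per TCLK period; the receiver side corresponds to collecting ten bits and calling `uart_deframe_8n1()`.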
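[Block diagram: the APB interface connects the bus to the transmitter (THR, TSR, TxD output), the receiver (RBR, RSR, RxD input) and the baud rate generator (BRG), which uses DLL and DLM to derive TCLK and RCLK from PCLK]
Figure 11.20 | Simplified block diagram of a UART on the chip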

11.3.5.3 | The BAUD Rate Generator (BRG)
This takes PCLK as input, and generates the baud rates for the transmitter and receiver,
by using the numbers in the registers DLL and DLM which function as dividers.
11.3.5.4 | Registers of UART0
Let us use UART0 for transferring a character string from the LPC 2148 board to a
PC, using the ‘hyperterminal’, at a baud rate of 9600. Example 11.12 transmits the string
‘sushmita LPC2148’. The string is moved one character at a time to the THR. Between
the loading of one character and the next one, a delay is given.
It may be necessary to understand the registers used in the program to gain a full
understanding of it. As such, refer to the working of each of the registers (Section 11.3.5.4)
while going through the program.
Example 11.12
#include "LPC214x.h"
void init(void);
void delay_ms(int);
int main()
{
    int i;
    unsigned char c[] = "sushmita LPC2148 \n";
    //Send data by writing to U0THR
    init();
    for(;;)
    {
        for(i = 0; i <= 17; i++)    //18 characters, indices 0 to 17
        {
            U0THR = c[i];
            while(!(U0LSR & 0x20)); //Wait until bit 5 of U0LSR (THR empty) is set
            delay_ms(250);
        }
    }
}
void init()
{
    PINSEL0 = 0x05;     //P0.0 as TxD, P0.1 as RxD
    U0FCR = 0x07;       //Enable and clear FIFOs
    U0LCR = 0x83;       //8-N-1, enable divisors
    U0DLL = 0x62;       //9600 baud
    U0DLM = 0x00;
    U0LCR = 0x03;       //8-N-1, disable divisors
}
void delay_ms(int x)    //Crude software delay; the loop count is approximate
{
    int a, b;
    for(a = 0; a < x; a++)
    {
        for(b = 0; b < 3000; b++);
    }
}
Pinselect Register (PINSEL0)
This register has been discussed earlier. In this context, only the pin selection for the
TxD and RxD pins of UART0 is referred to.
Table 11.15 shows that P0.0 and P0.1 are the relevant pins. The PINSEL0 register
selects pin P0.0 as TxD and pin P0.1 as RxD when PINSEL0 = 0x5 is written.
UART0 Transmit Holding Register (U0THR)
This is an 8-bit, ‘write only’ register that forms the topmost byte of the transmit buffer.
New characters to be transmitted are loaded into this register. In Example 11.12,
the characters to be transmitted are loaded into this register one byte at a time, with
a delay.
UART0 Divisor Latch Registers (U0DLL and U0DLM)
The UART0 Divisor Latch is part of the UART0 Fractional Baud Rate Generator and
holds the value used to divide the clock supplied by the fractional prescaler in order to
produce the baud rate clock, which must be 16x the desired baud rate. The U0DLL and
U0DLM registers together form a 16-bit divisor where U0DLL contains the lower
8 bits of the divisor and U0DLM contains the higher 8 bits of the divisor.
Example 11.12 uses U0DLL = 0x62 and U0DLM = 0x00 to get a baud rate of 9600.
Table 11.15 | Relevant Bits of the PINSEL0 Register
Bits   Port Pin   Value   Function
1:0    P0.0       00      GPIO
                  01      TxD (UART0)
                  10      PWM1
                  11      Reserved
3:2    P0.1       00      GPIO
                  01      RxD (UART0)
                  10      PWM3
                  11      EINT0

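The calculation for these values is as follows:
$$\text{UART0}_{\text{baudrate}} = \frac{\text{PCLK}}{16 \times (256 \times \text{U0DLM} + \text{U0DLL})} \times \frac{\text{MulVal}}{\text{MulVal} + \text{DivAddVal}}$$
In our case, PCLK = 15 MHz.
With U0DLM = 0 and U0DLL = 0x62 (98 in decimal notation), and with the fractional
divider left at its reset values (MulVal = 1, DivAddVal = 0), the above formula gives
a baud rate of 15,000,000/(16 x 98) = 9566.3 (approx), i.e., close to 9600.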
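The calculation can be checked with a short host-side sketch. It follows the baud rate formula given in the LPC2148 user manual, with MulVal and DivAddVal passed in explicitly; the function names are illustrative, and the divisor helper assumes the fractional divider is not used (MulVal = 1, DivAddVal = 0).

```c
#include <assert.h>

/* Baud rate produced by a given divisor latch and fractional divider setting. */
static double uart0_baud(double pclk_hz, unsigned dlm, unsigned dll,
                         unsigned mulval, unsigned divaddval)
{
    unsigned divisor = 256u * dlm + dll;
    assert(divisor > 0 && mulval > 0);
    return pclk_hz * mulval / (16.0 * divisor * (mulval + divaddval));
}

/* Nearest integer divisor for a wanted baud rate.
   Assumption: fractional divider unused (MulVal = 1, DivAddVal = 0). */
static unsigned uart0_divisor(double pclk_hz, double baud)
{
    assert(pclk_hz > 0.0 && baud > 0.0);
    return (unsigned)(pclk_hz / (16.0 * baud) + 0.5);
}
```

For PCLK = 15 MHz and 9600 baud, `uart0_divisor()` returns 98 (0x62), the value loaded into U0DLL in Example 11.12; the lower byte goes to U0DLL and the upper byte to U0DLM.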
UART0 FIFO Control Registers (U0FCR)
This is an 8-bit register (Figure 11.21) in which the important bits are as explained.
Bit 0: E: This bit must be set for enabling the Tx and Rx FIFOs.
Bit 1: Rx FIFO Reset: This must be set to clear all bytes in the UART0 Rx FIFO and
reset the pointer logic. This bit is self-clearing.
Bit 2: Tx FIFO Reset: This must be set to clear all bytes in the UART0 Tx FIFO and
reset the pointer logic. This bit is self-clearing.
Bits 7 and 6: Rx trigger level: These two bits determine how many receiver FIFO
characters must be written before an interrupt is activated.
We have chosen 00, which corresponds to a trigger level of one character.
UART0 Line Control Register (U0LCR)
The U0LCR is an 8-bit register which determines the format of the data character that
is to be transmitted or received.
The bits actively used here are
i) Bits 1:0: These two bits have been chosen to be ‘11’ to indicate 8-bit character length
ii) Bit 2: This is made ‘0’ to select one stop bit
iii) Bit 7: This is the Divisor Latch Access Bit (DLAB) and is set to enable access to
the divisor latches
iv) Other bits of this register pertain to parity and break control. These have been
disabled by making these bits ‘0’.
Thus we have U0LCR = 1000 0011 = 0x83. Once the divisor has been written, the DLAB
is cleared again by writing U0LCR = 0x03.
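The values written to U0FCR and U0LCR in Example 11.12 can be assembled from named bit fields, as in the sketch below. The macro and function names are illustrative and are not taken from LPC214x.h; only the bit positions described in the text are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* U0FCR bits described in the text */
#define FCR_FIFO_ENABLE  (1u << 0)
#define FCR_RX_RESET     (1u << 1)
#define FCR_TX_RESET     (1u << 2)

/* U0LCR bits described in the text */
#define LCR_WORD_8BIT    (3u << 0)   /* bits 1:0 = 11           */
#define LCR_DLAB         (1u << 7)   /* divisor latch access bit */

/* FIFO control value: FIFOs enabled and cleared, Rx trigger level in bits 7:6. */
static uint8_t fcr_value(unsigned rx_trigger)
{
    assert(rx_trigger <= 3);
    return (uint8_t)(FCR_FIFO_ENABLE | FCR_RX_RESET | FCR_TX_RESET |
                     (rx_trigger << 6));
}

/* Line control value for 8 data bits, no parity, 1 stop bit. */
static uint8_t lcr_8n1(int dlab)
{
    return (uint8_t)(LCR_WORD_8BIT | (dlab ? LCR_DLAB : 0u));
}
```

With these helpers, `fcr_value(0)` gives the 0x07 used for U0FCR, `lcr_8n1(1)` gives 0x83 and `lcr_8n1(0)` gives 0x03.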
UART0 Line Status Register (U0LSR)
The U0LSR is a read-only register that provides status information regarding the
UART0 TX and RX blocks.
In Example 11.12, only bit 5 of this register is used. Bit 5 shows a ‘1’ when
the transmitter holding register (U0THR) is empty. Only if the register is empty can the
next byte be sent for transmission. To confirm this, U0LSR is ANDed with 0x20, and
the next character is sent only after this status bit is confirmed to be high.
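The status check can be sketched as a small routine that runs on a PC. The structure below is a hypothetical stand-in for the UART0 registers, and the poll limit is an addition for illustration, so that the loop cannot hang. Unlike Example 11.12, this variant checks the status bit before writing the character, which is a common alternative ordering.

```c
#include <assert.h>
#include <stdint.h>

#define LSR_THRE (1u << 5)     /* bit 5 of the line status register */

/* Hypothetical stand-in for the UART registers, for host-side testing */
typedef struct {
    volatile uint32_t lsr;     /* models U0LSR */
    volatile uint32_t thr;     /* models U0THR */
} uart_regs_t;

/* Wait until the holding register is empty, then load one character.
   Returns 0 on success, -1 if the bit is not seen within max_polls checks. */
static int uart_send_byte(uart_regs_t *u, uint8_t c, unsigned max_polls)
{
    assert(u != 0);
    while (!(u->lsr & LSR_THRE)) {
        if (max_polls-- == 0) {
            return -1;         /* timed out */
        }
    }
    u->thr = c;
    return 0;
}
```

On the target, the same test reduces to the single line `while(!(U0LSR & 0x20));` used in the example.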
Figure 11.21 | Bits of U0FCR
T (Bits 7 and 6) | Reserved (Bits 5 to 3) | Tx R (Bit 2) | Rx R (Bit 1) | E (Bit 0)

With this, we conclude our discussion of serial communication. For more details of
the registers used and the interrupt mode of transmission, do refer to the user manual
of LPC2148.
11.3.6 | The SSP Unit
This unit performs serial communication using the SPI protocol (refer to Section 5.2.2).
Appendix I contains a program which interfaces an SD card to the LPC 2148 using the SSP
unit. It may be necessary to refer to the manual of the chip, and gain an understanding
of the registers of the SSP unit, to get a clear understanding of the interfacing program.
Section 19.2 discusses a complete product developed using the LPC 2148 MCU.
With this, our discussion of a typical ARM7 MCU ends. Note that the one that we
have used is only one among the numerous versions of ARM7 available in the market.
But a basic understanding of this chip, its peripherals and its programming will help in
understanding any other ARM7 MCU chip.
Next we will take a look at typical ARM9 and Cortex MCUs. A detailed study of
these is beyond the scope of this book. We will look at them from a block
diagram point of view, just to observe their complexity and power.
11.4 | ARM 9
The ARM9 core is a more advanced member of the ARM family (compared to ARM7).
It has a 5-stage pipeline and operates at approximately double the frequency range
of ARM7. Many ARM9 cores have DSP instructions and thus are ‘Enhanced’ ARM9E
processors. Because the core is so powerful, it is used for more complex operations, and
an MCU which is based on ARM9 typically has more peripherals than an ARM7-based
MCU.
Here we will take a look at a particular ARM9 board developed by NXP; it
contains an MCU of the LPC 29xx series. The user manual describes the features of this
MCU in this way:
‘The LPC 29xx combine a 125 MHz ARM968E-S CPU core, Full Speed USB
2.0 host and device (LPC 2927/29 only), CAN and LIN, 56 KB SRAM, up to 768 KB
flash memory, external memory interface, three 10-bit ADCs, and multiple serial and
parallel interfaces in a single chip’.
It is obvious that this is a very powerful chip with many more peripherals than the
ARM7 MCU that we have just studied. Figure 11.22 shows the internal block diagram
of the chip, in which you can observe its advanced features and peripheral structure.
11.5 | ARM Cortex-M3
To complete our discussion, let us look at a Cortex-based MCU as well. The LPC 17xx
series is an MCU series with the ARM Cortex-M3 as its core. The following paragraphs,
quoted from the user manual, illustrate its main features, which are also evident from
the block diagram of Figure 11.23.

[Block diagram (LPC2917/19/01): ARM968E-S core with 16 kB ITCM, 16 kB DTCM and a test/debug (JTAG) interface; vectored interrupt controller; AHB multilayer matrix with masters and slaves, AHB to APB bridges and AHB to DTL bridges; embedded flash (512/768 kB), embedded SRAM (32 kB and 16 kB), 8 kB SRAM, 16 kB EEPROM; external static memory controller; DMA controller and registers; clock generation, reset generation and power management units; peripherals including timers 0/1/2/3, MTMR timers 0/1, PWM0/1/2/3, 3.3 V ADC1/2, 5 V ADC0, CAN0/1 with global acceptance filter, LIN0/1, RS485 UART0/1, SPI0/1/2, I2C0/1, WDT, quadrature encoder, system control, event router, chip feature ID and general purpose I/O ports 0/1/2/3]
Figure 11.22 | Internal block diagram of the LPC 29xx ARM9 MCU
‘The LPC 17xx is an ARM Cortex-M3 based microcontroller for embedded
applications requiring a high level of integration and low-power dissipation. The ARM
Cortex-M3 is a next generation core that offers system enhancements such as modernized
debug features and a higher level of support block integration.

Figure 11.23 | Internal block diagram of the 17xx series of ARM Cortex-M3 MCU
[Block diagram: ARM Cortex-M3 core with I-Code, D-Code and system buses and a test/debug (JTAG) interface with trace port; multilayer AHB matrix; 512 kB flash with flash accelerator, 64 kB SRAM and 8 kB ROM; DMA controller; Ethernet 10/100 MAC with PHY interface; USB device, host and OTG interface; high speed GPIO; clock generation, power control, brownout detect and other system functions. APB slave group 0: SSP1, UARTs 0 and 1, CAN 1 and 2, I2C 0 and 1, SPI0, capture/compare timers 0 and 1, watchdog timer, PWM1, 12-bit ADC, pin connect block, GPIO interrupt control, and the RTC power domain (real time clock, 32 kHz oscillator, 20 bytes of backup registers). APB slave group 1: SSP0, UARTs 2 and 3, I2S, I2C2, capture/compare timers 2 and 3, repetitive interrupt timer, external interrupts, DAC, system control, motor control PWM and quadrature encoder. Note: shaded peripheral blocks support general purpose DMA.]

High speed versions (LPC 1769 and LPC 1759) operate at up to a 120 MHz CPU
frequency. Other versions operate at up to a 100 MHz CPU frequency. The ARM
Cortex-M3 CPU incorporates a 3-stage pipeline and uses a Harvard architecture with
separate local instruction and data buses as well as a third bus for peripherals. It also
includes an internal prefetch unit that supports speculative branches.
The peripheral complement of the LPC 17xx includes up to 512 kB of flash memory,
up to 64 kB of data memory, Ethernet MAC, a USB interface that can be configured
as either Host, Device, or OTG, 8 channel general purpose DMA controller, 4 UARTs,
2 CAN channels, 2 SSP controllers, SPI interface, 3 I2C interfaces, 2-input plus 2-output
I2S interface, 8 channel 12-bit ADC, 10-bit DAC, motor control PWM, Quadrature
Encoder interface, 4 general purpose timers, 6-output general purpose PWM, ultra-low
power RTC with separate battery supply, and up to 70 general purpose I/O pins.’
Conclusion
With this, we conclude our discussion of ARM peripheral interfacing. The programs in
the chapter have been tested and confirmed to be working as per the specifications for
which they have been designed (i.e. frequency, pulse width, etc.). Only a few peripherals of
ARM7 have been discussed, but the methodology used for those blocks is expected to
help in understanding the rest of them. The important point is that the registers of the unit
to be used are to be understood with a high degree of clarity. For ARM 9 and Cortex
MCUs, only the degree of complexity has been shown; programming them can be done
on similar lines, as has been done for ARM 7.
The LPC 2148 MCU belongs to the series of ARM7 MCUs of NXP, and is very popular.
It operates with a system clock of 60 MHz, and has a large set of peripherals.
It has three internal buses operating at different frequencies, conforming to AMBA specifications.
There is a ‘memory accelerator module’ to allow fast access to program lines.
It has two ports, the pins of which act as GPIO pins.
Each pin is multi-functional, and can be configured for a specific function by using the bits of a ‘Pinselect Register’.
There are two timers which can be used as free running interval timers or as capture timers.
The system clock is named CCLK, and is divided to get a lower frequency peripheral clock called PCLK.
The timers and PWMs used in the programs in the chapter have values of output frequencies based on a PCLK of 15 MHz.
The PWM unit has 6 output pins from which 6 PWM pulse trains can be obtained simultaneously.

There are two serial communication units named UART0 and UART1.
Using the SSP unit, an SD card can be interfaced to the chip.
ARM9 and Cortex MCUs are much more complex and powerful, and have a larger number of peripherals.
QUESTIONS
1. Name five peripherals in the LPC 2148 MCU.
2. What is the difference between PCLK and CCLK?
3. What is the necessity for having the MAM module? How does it function?
4. Distinguish between the power-down and idle modes of this MCU.
5. What is meant by the term ‘AMBA’?
6. Differentiate between the different internal buses in terms of speed and function.
7. Look at the memory map and find out the extent of memory locations for static RAM and
flash ROM.
8. For a GPIO pin to be made to act as an ON/OFF switch, which are the registers to be used?
Give an example to illustrate the use of these registers.
9. How does the prescaler in a timer unit function?
10. Distinguish between single and double edge PWM.
EXERCISES
Write programs to obtain the following waveforms:
1. Generate a symmetrical square wave at four pins of Port 1, using software delay.
2. Generate an asymmetric square wave at four pins of Port 0 using software delay.
3. Using Timer 1, obtain a symmetric square wave of frequency 10 kHz at one pin of Port 1,
and another square wave of frequency 90 kHz at one pin of Port 0 using Timer 0. Both
waveforms should be simultaneously present.
4. Using Timer 0, generate an asymmetric square waveform at four pins of Port 1. The square
wave should have an ON time of 0.1 msec and an OFF time of 0.35 msec.
5. Generate PWMs at the six output pins of the PWM unit, with duty cycles of 10, 20, 30, 40,
50 and 70%.

Introduction
The term SoC was mentioned in Chapter 1. So we know that an MCU with a large
number of peripherals is called an SoC, for ‘System on chip’. Each of the peripherals on
an SoC is usually programmable, and so the term PSoC can have a general meaning.
But in this chapter, we discuss a very specific product line of Cypress Semiconductor
designated as PSoC. We discuss the special features of Cypress’s PSoC, which has
become very popular in the embedded systems world and has found many applications
for itself. PSoC is a family of embedded processors with a simple 8-bit M8C core in
PSoC1, a more sophisticated 8-bit 8051 core in PSoC3, and an advanced 32-bit ARM
core in PSoC5.
In this chapter, we will concentrate more on the PSoC 1 architecture and usage.
The aim is to introduce the reader to this series of MCUs which are versatile, easy to
understand and use, and have many features that other MCUs do not possess. The best
way to learn is to get a PSoC development kit and do a project based on one of the chips
belonging to this family. This chapter introduces you to PSoC and analyses why PSoC is
a good point to ‘take off’ into the embedded design world.
The history and application range of PSoC devices
The distinct and special features of PSoC
The differences between PSoC1, PSoC3 and PSoC5
The internal architecture of PSoC1
The GUI of PSoC Designer
The digital blocks of PSoC1
The working principle of Switched Capacitor circuits
The finer details of the analog blocks
How to do the interconnections on the GUI for digital and analog blocks
The programming of PSoC1
The enhancements available for PSoC3 and PSoC5
12 | Cypress’s PSoC: A Different Kind of MCU
Chapter-opening image: A PSoC 5 development board.

12.1 | How to get a PSoC Development Kit
Look up the website http://www.cypress.com/psoc for information on how to get
a PSoC kit for academic applications. Data sheets and links to other references and
forums can also be obtained here.
Development kits are available from the following distributors: Digi-Key, Avnet,
Arrow and Future. The online store at http://www.onfulfillment.com/cypressstore/
contains development kits, C compilers and all accessories for the PSoC family.
12.1.1 | Development Kit
Figure 12.1 shows the details of the CY3210-PSoCEVAL1 evaluation kit used for the
PSoC1 family of mixed signal controllers.
This PSoC1 evaluation kit features an evaluation board and a MiniProg1 programming
unit. The evaluation board includes an LCD module, potentiometer, LEDs, and
plenty of breadboarding space. The MiniProg1 programming unit is also included with
the kit and will program PSoC devices directly on the evaluation board, or on other
boards via a 5-pin ISSP header. This programming unit is small and compact, and
connects to a PC via a USB 2.0 cable.
Figure 12.1 | CY3210-PSoCEVAL1 evaluation kit
[Annotated photograph of the board. Labeled parts: power LED, RS-232 interface, RS-232 transceiver, UART Tx pin, UART Rx pin, prototyping area, character LCD interface, 9-V battery terminals, DC supply jack, voltage regulator, jumper (JP3) to select the power option (3.3 V/5 V), LCD contrast control, reset switch, chip socket for the PSoC device, ISSP programming header, GPIO expansion port, LEDs, potentiometer (analog input) and push button switch. JP1 connects P16 to the UART Rx pin; JP2 connects P27 to the UART Tx pin.]
The kit contains the following:
i) Evaluation Board with LCD Module
ii) MiniProg1 Programming Unit
iii) PSoC Designer Software CD
iv) 28-pin CY8C29466-24PXI PDIP PSoC Device Sample
v) 28-pin CY8C27443-24PXI PDIP PSoC Device Sample
vi) USB 2.0 Cable
vii) Getting Started Guide
12.1.2 | History and Applications of PSoC
The first PSoC chips became commercially available in 2002. Now, in 2012, PSoC has
changed the profile of Cypress, the semiconductor company. Among the thousands
of PSoC customers worldwide are market leaders such as HP, Cisco, Motorola, IBM,
Honeywell, Samsung, LG, Lenovo, Haier, Acer, HTC, Fujitsu, Hitachi, Nintendo,
Sharp, Suzuki, Philips, BMW, and Gaggia.
PSoC is used in devices as simple as Sonicare toothbrushes and Adidas sneakers,
and as complex as the TV set-top box. One single PSoC, using CapSense, controls
the touch-sensitive click wheel on the Apple iPod. An expanded list of its
applications includes high-definition televisions, digital cameras, remote-control
hobbyist helicopters, computer mice, printers and other PC peripherals, health and fitness
equipment, automobile sound systems, satellite radios, engine control units, medical
equipment, lighting, motorized baby strollers, oscilloscopes and MP3 players.
Commercial scale shipments began in 2002, and Cypress shipped its 100-millionth
unit in 2006. The shipments reached 250 million units in 2007, 500 million units in 2009
and 750 million in the fourth quarter of 2010. On 12 July 2011, Cypress announced
that it had shipped its one billionth PSoC unit. Obviously, PSoC is continuing its ride
comfortably.
12.1.3 | What is Different About PSoC
The PSoC is an MCU with a computing engine (we call it the core) and a number of
peripherals. One very distinct and special feature it possesses is that of having analog
peripherals co-existing with digital ones on the same chip.
The other major distinct features are:
i) Programmable analog and digital blocks
ii) Configurable peripherals
iii) Flexible GPIOs: most of the pins can connect to any of the peripherals
iv) Configurable GPIOs: any pin can be input or output
v) GUI-based IDE for easy visualization
vi) APIs for each peripheral and for system resources, so less time is spent on the
register manual
Starting at this point, let us now list the advantages of using PSoC, in comparison
with other popular MCUs of comparable computing power.

i) For any real-world application, input signals are usually analog, and so being able
to process (amplify, compare, filter, etc) them, before passing them on for digital
actuation is a great boon, especially as it is all done in the same chip. This feature is
pointed out as the first major highlight of PSoC. Being able to integrate analog and
digital blocks on the same chip reduces the size, weight and power requirement of
the final product.
ii) The second advantage is that of having an easy to use graphical IDE (Integrated
Development Environment) in which the analog and digital building blocks can be
selected and interconnected by the simple act of ‘drag and drop’. This makes project
design very easy. PSoC Designer is the IDE for PSoC1 and PSoC Creator is the
IDE for PSoC3 and PSoC5.
iii) PSoC can be programmed in assembly and C and a number of functions are
available in its set of APIs (for Application Programming Interfaces, refer to
Section 7.6.4), which makes programming easy and ‘modular’. Thus for any user
module, the user has the ease of ‘just looking’ for the right function for his require-
ment. In this chapter, we use this mode of programming wherein the user manual
of a particular module is referred to get the apt function for any application using a
specific programmable block.
iv) The chip can use its internal oscillator and chain of multipliers and dividers, for
getting any frequency as required by any of the programmable blocks. Besides that,
there is the option of using an external source of frequency.
With this brief introduction, let us get started on knowing more about PSoC. As we get
to understand it more, we will get accustomed to some more of its special features which
will make it seem to be extremely flexible in comparison to other standard MCUs. These
features are:
i) Peripherals on the chip are not fixed. A PSoC chip is blank (as far as peripherals are
concerned) initially. The system designer can define what peripherals are needed for
his application, and ‘configure’ the chip accordingly. To use any of the programmable
blocks to work as the peripheral he needs, the ‘drag and drop’ technique of the IDE
is used, and the ease of using it makes the design process a very comfortable and
interesting activity.
ii) For most other MCUs, the pins corresponding to the input and output of a specific
peripheral are fixed as per the chip pinout. In PSoC, this does not hold true. Any
GPIO pin can be used as the input or output pin of any of the peripherals of the
‘user module’ group (there are some restrictions in the case of analog I/O and certain
pins). This simplifies the routing of signals outside the IC.
iii) PSoC has a large collection of peripherals. The user can choose from this large set
the ones that his applications need (though limited by the number of programmable
blocks and GPIO pins). The point is that for different applications, a different
set of peripherals may be chosen. This is not so for any other standard MCU,
where a limited set of ‘certain’ peripherals is available for the chip, and a ‘range of
choice’ is not available. This point will be clearer as we get to use the programmable
blocks.
iv) There is an on-board power supply and the SMP unit may be used to provide power
(Section 7.6.4).

v) For PSoC1, there is the ‘dynamic reconfiguration’ option. This means that peripherals
may be changed at run time. The point is that, if the current design is using all the
programmable blocks in the first configuration, some of them may be unloaded at
run time and replaced by others in the second configuration.
12.2 | The PSoC Family
The family comprises the following IC series
i) CY8C2xxxx named PSoC1 with the M8C core
ii) CY8C3xxxx named PSoC3 with the 8051 core
iii) CY8C5xxxx named PSoC5 with the ARM Cortex M3 core
There are a number of member chips in the family, and they differ in aspects like number
of pins, number of digital/analog blocks, flash, RAM, IC packaging, etc.
PSoC 1
This is the first version of PSoC, that is, the original design which has been available
since 2001. Its core is an 8-bit M8C (CISC) core. It uses a basic clock frequency of
24 MHz (maximum), and 4 MIPS is the performance claimed by the manufacturers.
There are a number of chips available which use the PSoC1 architecture. The IDE
provided for PSoC1 is the ‘PSoC Designer’.
PSoC3
PSoC3 devices are based on a new, high-performance and enhanced 8-bit 8051 proces-
sor. It has a performance of 33 MIPS at 66 MHz.
PSoC5
PSoC 5 devices include a powerful 32-bit ARM Cortex M3 processor. 100 Dhrystone
MIPS is the performance claimed for this. (Dhrystone is a bench mark for integer
computations.)
12.2.1 | Comparing PSoC3 and 5
They have different cores, but the peripheral structure is almost the same. Both of them
use the PSoC Creator as their IDE. They are completely different from PSoC1 and no
compatibility exists, which means that there is no way one can migrate from PSoC1 to
PSoC3 or PSoC5.
12.2.2 | Focus of the Chapter
In this chapter we start with PSoC1, discuss its core and its peripherals (also called user
modules), and write programs for some of its peripherals. The step-by-step instructions for
using the PSoC Designer are given in Appendix C. Besides that, a set of design examples for
PSoC1, and also a step-by-step approach for using PSoC Creator, is put up on the book
website www.pearsoned.co.in/lylabdas/embeddedsystems.

This chapter is meant to facilitate a good understanding of the PSoC architecture
(including programming). We will not do assembly language programming, nor discuss
the cores in detail, rather our concentration will be on using the inbuilt analog and digi-
tal building blocks and programming them using C. With a good knowledge of PSoC1
and its IDE, it will be easy to use any other version of PSoC, because even if the archi-
tectures and IDE are different, the approach is the same.
PSoC3 and PSoC5 will also be discussed—but in less detail.
12.3 | PSoC1
This version is available in the CY8C 21xxx, 22xxx, 24xxx, 27xxx, 28xxx and 29xxx series.
The forthcoming discussion here is more or less general and applies to all these part
numbers. In this book, we use the 29xxx series to cite examples of user modules,
configuration and programming.
12.3.1 | The CY8C29xxx Series
These controllers are available in different packages and pin counts, that is, there are
controllers with 8, 16, 20, 24, 28, 32, 44, 48 and 100 pins. As the pin count increases, the
number of GPIO ports available in the chip increases. The maximum number of
ports possible is 8 (for the 100-pin package). For example, the CY8C29xxx series is
available in 5 packages and 5 pin counts. Figure 12.2 gives an approximate idea of this. See
Table 12.1 to understand the packaging abbreviations; Figures 12.2a to 12.2e show
the CY8C29xxx series of PSoC ICs with different pin counts and packaging.
12.3.2 | Pin Designations
Some pin designations need elaboration:
i) AI: Analog Input
ii) AIO: Analog I/O
iii) All GPIO pins (i.e. port pins) can be used for digital I/O
iv) Some pins like clock, XTAL, VDD, VSS, etc. have fixed designations. In
Figure 12.2a, it can be noticed that the hardware I2C block (Section 5.2.1) can use
only specific pins (10 and 11). But the user modules realized using programmable
digital blocks can use any GPIO pin. We will soon get to know how this is done.
Table 12.1 | IC Packaging Abbreviations
Abbreviation Packaging
PDIP Plastic Dual Inline Package
SSOP Shrink Small Outline Package
SOIC Small Outline Integrated Circuit
TQFP Thin Quad Flat Pack
QFN Quad Flat No Leads

Figure 12.2a | 28 pins with 3 GPIO ports and packages PDIP, SSOP and SOIC
[Pinout diagram: CY8C29466 28-pin PSoC device]
Figure 12.2b | 44 pins, 5 GPIO ports and TQFP packaging
[Pinout diagram]

Figure 12.2c and Figure 12.2d | 48 pins, 6 GPIO ports and packages QFN and SSOP
[Pinout diagrams: QFN (top view) and SSOP]
Figure 12.2e | 100 pins with 8 GPIO ports and TQFP package
[Pinout diagram]

Figure 12.3 | Internal block diagram of PSoC
[Block diagram: Ports 0 to 7 and the analog drivers connect through the system bus and the global digital and analog interconnects to four groups of blocks. PSoC core: CPU core (M8C), Flash 32K, SRAM 2K, SROM, interrupt controller, sleep and watchdog, multiple clock sources (IMO, ILO, PLL and ECO). Digital system: digital block array. Analog system: analog block array, analog reference, analog input muxing. System resources: digital clocks, two multiply accumulators, decimator, I2C, POR and LVD, system resets, internal voltage reference, switch mode pump.]
12.4 | The Internal Architecture of PSoC
Figure 12.3 shows the internal block diagram of a typical PSoC1 device, with all its
components which are as follows:
i) The PSoC core
ii) The digital system
iii) The analog system
iv) The system resources
Let’s discuss these in detail.

12.4.1 | The PSoC Core
Figure 12.4 shows the PSoC core, which in turn is made up of different blocks.
12.4.2 | The CPU Core (M8C)
The CPU core is an 8-bit ‘Harvard’ architecture (Section 2.1.2), meaning that the program space and the data memory space, and their associated buses, are separate. The M8C has five internal registers, which are as follows:
i) The Accumulator (A)
ii) Index (X)
iii) Program Counter (PC)
iv) Stack Pointer (SP)
v) Flags (F)
All these are 8-bit registers except the PC, which is 16 bits long.
12.4.3 | Memory
The address space of PSoC has three distinct divisions: the ROM space, the RAM space and the I/O registers. Just as in the case of any other MCU, the ROM or Flash is the one that stores program code, and is pointed to by the 16-bit program counter.
Figure 12.4 | The Core of PSoC
[Block diagram: the PSoC core contains the CPU core (M8C), SRAM, Flash nonvolatile memory, supervisory ROM (SROM), interrupt controller, sleep and watchdog, and multiple clock sources: 24 MHz internal main oscillator (IMO), 32 kHz crystal oscillator (ECO), internal low speed oscillator (ILO) and phase locked loop (PLL). The system bus links the core to Ports 0 to 7 and the analog drivers.]
Figure 12.5 shows the CPU, registers and memory. The points to note while studying Figure 12.5 are as follows:
i) The maximum amount of flash possible is 32 KB, but different chips (or ‘parts’, as they are referred to) can have different flash sizes such as 2, 4, 8, 16 or 32 KB. The CY8C29466, for instance, has 32 KB of flash. The flash is subdivided into program ROM and supervisory ROM. This means that a part of flash memory is used for storing boot-up code (supplied by the manufacturer), calibration parameters and the like. This part is called the Supervisory ROM or SROM. Normally, the application programmer does not need to access this part of ROM.
ii) The RAM is of SRAM technology and serves as data memory. It is volatile and is used for storing intermediate results during computation. The minimum amount of RAM in any PSoC is 256 bytes. When a particular version has more than 256 bytes, it is organized as pages of 256 bytes each. The CY8C29466 has 2 KB of RAM, i.e., it is organized as 8 pages.
iii) The figure shows a set of registers outside the CPU. These registers are like the SFRs (Special Function Registers) of the 8051 and other MCUs; they are used for managing and configuring the programmable blocks of PSoC. This set consists of two banks of 256 bytes each, and bank switching is taken care of by the XIO bit (bit 4) in the flag register.
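The paging and bank-select arithmetic described in the points above can be sketched in C. This is a host-side model and not device code: the names `FLAG_XIO`, `ram_page`, `ram_offset` and `select_bank` are hypothetical, and the flag register is modelled as an ordinary value. Only the 256-byte page size, the 2 KB RAM of the CY8C29466 and the position of XIO (bit 4) are taken from the text.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_PAGE_SIZE 256u        /* bytes per SRAM page (from the text) */
#define RAM_SIZE      2048u       /* CY8C29466: 2 KB of SRAM */
#define FLAG_XIO      (1u << 4)   /* XIO is bit 4 of the flag register F */

/* Page number that holds a given linear RAM address. */
static unsigned ram_page(unsigned addr)
{
    assert(addr < RAM_SIZE);
    return addr / RAM_PAGE_SIZE;
}

/* Offset of the address inside its page (0..255). */
static unsigned ram_offset(unsigned addr)
{
    assert(addr < RAM_SIZE);
    return addr % RAM_PAGE_SIZE;
}

/* Return the flag value that selects register bank 0 or bank 1. */
static uint8_t select_bank(uint8_t flags, unsigned bank)
{
    assert(bank <= 1u);
    return bank ? (uint8_t)(flags | FLAG_XIO)
                : (uint8_t)(flags & ~FLAG_XIO);
}
```

In this model, address 0x1FF falls in page 1 at offset 0xFF, and 2048 / 256 gives the 8 pages mentioned for the CY8C29466.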
12.4.4 | Clock Sources
Figure 12.6 shows multiple clock sources for the chip. There is a basic clock frequency of 24 MHz, a frequency multiplier and a number of frequency dividers.
Figure 12.5 | The CPU core and memory block diagram
[Block diagram: the M8C with its internal registers A, X, PC, SP and F; Flash ROM divided into program ROM and SROM; SRAM organized as 256-byte pages (Page 0, Page 1 and so on); I/O registers organized as two 256-byte banks (Bank 0 and Bank 1).]
Figure 12.6 | Clock source with multipliers and dividers
[Block diagram: the internal main oscillator (IMO, 24 MHz) or an external signal (1-24 MHz) supplies SYSCLK (24 MHz); a ×2 multiplier gives SYSCLK × 2 (48 MHz); dividers ÷N, ÷N1, ÷N2 and ÷N3 produce CPU_CLK, VC1, VC2 and VC3.]
Figure 12.7 | Low frequency source for sleep timer
[Block diagram: an external 32 kHz oscillator or the internal low speed oscillator supplies CLK 32 (32 kHz), which is divided down to 1 to 512 Hz for the sleep timer.]
Figure 12.6 shows that an internal oscillator of 24 MHz, or an external oscillator with a maximum frequency of 24 MHz, can be used as the system clock and also as a reference, to generate a 48 MHz frequency (by multiplication) or various other frequencies with a chain of dividers. The required frequencies can be obtained by choosing the values of the dividers N1 and N2, which can vary from 1 to 16, and N3, which can vary from 1 to 256. The frequencies VC1, VC2 and VC3 can be used as clocks to various peripherals.
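The divider arithmetic can be checked with a few lines of C. The sketch below is only a host-side calculation with hypothetical function names. It assumes that VC1 is SYSCLK divided by N1 and that VC2 is VC1 divided by N2 (the ‘chain’ of dividers), and it leaves the source of VC3 as a parameter, since the text does not fix it.

```c
#include <assert.h>

#define SYSCLK_HZ 24000000UL   /* 24 MHz system clock (from the text) */

/* VC1 = SYSCLK / N1, with N1 in 1..16. */
static unsigned long vc1_hz(unsigned n1)
{
    assert(n1 >= 1 && n1 <= 16);
    return SYSCLK_HZ / n1;
}

/* VC2 = VC1 / N2, with N2 in 1..16 (assumed to be chained after VC1). */
static unsigned long vc2_hz(unsigned n1, unsigned n2)
{
    assert(n2 >= 1 && n2 <= 16);
    return vc1_hz(n1) / n2;
}

/* VC3 = source / N3, with N3 in 1..256; the source clock is passed in. */
static unsigned long vc3_hz(unsigned long source_hz, unsigned n3)
{
    assert(n3 >= 1 && n3 <= 256);
    return source_hz / n3;
}
```

With N1 = 8 and N2 = 10, for instance, VC1 works out to 3 MHz and VC2 to 300 kHz; dividing VC2 further by N3 = 100 gives 3 kHz.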
Figure 12.7 shows a lower frequency clock, which can be derived from the available internal low speed oscillator or from an external source. All these operations are selectable using the bits of an I/O register named OSSCR. The settings are available in the graphic editor as well.
Why do we need external oscillators?
The internal oscillator frequency can drift by around ±2.5% (this is device dependent and can be as much as ±4%). When very high precision frequency generation is required, the reference frequency can be taken from an external source which uses a precision crystal.
Where is the frequency of 1 to 512 Hz to be used?
This is for the sleep timer, a timer which generates periodic interrupts at a chosen rate between 1 and 512 Hz (specifically 1, 8, 64 or 512 Hz).
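To get a feel for these rates, the interval between two sleep timer interrupts can be worked out for each of them. The following is a small host-side sketch; the function names are hypothetical, and only the four rates are taken from the text.

```c
#include <assert.h>

/* The four selectable sleep timer rates named in the text, in Hz. */
static const unsigned sleep_rates_hz[4] = { 1, 8, 64, 512 };

/* Returns 1 if hz is one of the selectable rates, 0 otherwise. */
static int is_sleep_rate(unsigned hz)
{
    unsigned i;
    for (i = 0; i < 4; i++)
        if (sleep_rates_hz[i] == hz)
            return 1;
    return 0;
}

/* Interval between two sleep timer interrupts, in microseconds
 * (integer division, so the 512 Hz result is truncated). */
static unsigned long sleep_period_us(unsigned hz)
{
    assert(is_sleep_rate(hz));
    return 1000000UL / hz;
}
```

At 8 Hz the MCU is woken every 125 ms, while at 512 Hz it is woken roughly every 1.95 ms.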

What is the sleep state?
Most embedded systems don’t work continuously. Think of a mobile phone. It becomes active when a call, a message or some other request occurs. The idea of putting an MCU to sleep is to save power when it is not doing any active computation. The only activity then is the running of the sleep timer from a low frequency clock source. Once in the sleep state, the MCU is woken up periodically by interrupts from the sleep timer (which is just like any ordinary timer) at one of the rates in the range of 1 to 512 Hz (as mentioned above). Besides this, the MCU can also be woken up by interrupts generated at the analog or digital inputs.
Watchdog This refers to the watchdog timer, which resets the MCU if it gets caught in an endless loop (due to some unexpected error). Refer to Section 2.2.6 for more details on this.
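The watchdog idea can be illustrated with a small software model. This is not the PSoC watchdog hardware or its API; the type `watchdog_t` and the functions `wdt_init`, `wdt_clear` and `wdt_tick` are hypothetical names used only to show the principle that a healthy program must keep clearing the timer before it runs out.

```c
#include <assert.h>

typedef struct {
    unsigned timeout_ticks;  /* ticks allowed between two clears */
    unsigned remaining;      /* ticks left before a reset is forced */
    unsigned resets;         /* number of resets the model has triggered */
} watchdog_t;

static void wdt_init(watchdog_t *w, unsigned timeout_ticks)
{
    assert(timeout_ticks > 0);
    w->timeout_ticks = timeout_ticks;
    w->remaining = timeout_ticks;
    w->resets = 0;
}

/* Called by healthy application code to restart the countdown. */
static void wdt_clear(watchdog_t *w)
{
    w->remaining = w->timeout_ticks;
}

/* Called once per timer tick; returns 1 if this tick forced a reset. */
static int wdt_tick(watchdog_t *w)
{
    if (--w->remaining > 0)
        return 0;
    w->resets++;
    w->remaining = w->timeout_ticks;  /* the MCU restarts after the reset */
    return 1;
}
```

If the main loop stops calling `wdt_clear` (because it is stuck), the countdown reaches zero and the model reports a reset, which is the behaviour the watchdog is meant to provide.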
The other blocks in Figure 12.4 are the interrupt controller, which manages the interrupt system of the MCU (Section 2.2.9), and the PLL. The interrupt controller takes charge of 17 interrupt vectors, the associated ISRs, their priorities, etc. The PLL is the ‘Phase Locked Loop’, which is used in generating different frequencies.
12.4.5 | The PSoC Designer
PSoC Designer is the Integrated Design Environment (IDE) used to customize PSoC to meet the specific requirements of an application. Because of the graphic environment, it accelerates the process of system design. Applications can be designed using the library of pre-characterized analog and digital peripherals in a drag-and-drop design environment. Then the code for the design can be written using the API libraries of each user module. Finally, the design can be debugged and tested with the integrated debug environment, including in-circuit emulation and standard software debug features. To sum up, the components of the IDE include the following:
i) Application editor GUI for device and user module configuration and dynamic reconfiguration
ii) Extensive user module catalog
iii) Integrated source code editor (C and Assembly)
iv) C compiler with no size restrictions or time limits
v) Built-in debugger
vi) In-circuit emulation (ICE)
vii) Support for the entire family of PSoC 1 devices
Figure 12.8 shows the PSoC Designer’s graphic editor page for configuring the digital and analog blocks. This particular figure is applicable only to the 29xxx series. For any other series, the number of GPIO pins is not the same and so the ‘picture’ will be different.
Note The step-by-step guide to using this IDE is available in Appendix C.
The graphical editor allows the designer to customize the connections ‘inside’ the chip.

Figure 12.8 | Graphical editor of the PSoC designer
[Screenshot: the GPIO port pins, the global input buses (GIE, GIO), the digital blocks DBB00, DBB01, DCB02 and DCB03 with their row input (RI), row output (RO) and broadcast (BC) lines, and the analog blocks (ACB, ASC and ASD) with their comparators and output buffers.]

12.4.6 | General Purpose I/O (GPIO)
There are 8 ports defined for PSoC1. Not all of them are available on every part (i.e. chip) in the family. Since each port requires 8 pins, only the 100-pin version can offer all the 8 ports. Others have port counts varying from 3 to 8. Figures 12.2a to 12.2e show the number of ports for members of the CY8C29xxx series. Each port pin has elaborate circuitry behind it, which takes care of its logic level as well as the load it can drive. The drive can be selected according to the type of device being driven or the incoming signal. This will be explained in greater detail soon (Section 12.6.1).
Note in Figure 12.4 that the analog driver is part of the core, though the analog blocks are not. The analog drivers are associated with Port 0, because it is the pins of this port that are used in the analog mode. Table 12.2 gives the resources available in the PSoC1 chip CY8C29x66. Note that the resources vary from part to part.
12.5 | The Digital Sub System
Now see Figure 12.9, which shows the block diagram of the digital system. It is composed of 16 digital blocks. Each block shown is an 8-bit resource, which can be used by itself to obtain an 8-bit user module, or combined with other blocks to form 16, 24 or 32-bit modules. Each digital block is initially blank. It is up to the designer to decide which user module to fit into any block. Let’s try to understand this in greater detail.
The digital user modules available for CY8C29x66 are
i) PWMs (8 to 32-bit)
ii) PWMs with dead band (8 to 32-bit)
iii) Counters (8 to 32-bit)
iv) Timers (8 to 32-bit)
v) UART 8-bit with selectable parity (up to 2)
vi) SPI slave and master (up to 2)
vii) I2C slave and multi-master (one more available as a system resource)
viii) CRC (Cyclic Redundancy Check) generator (8 to 32-bit)
ix) IrDA (Infra Red communication) (up to 2)
x) PRS (Pseudo Random Sequence) generators (8 to 32-bit)
As Figure 12.9 shows, there are ‘Digital Building Blocks’ (DBB) and ‘Digital Communication Blocks’ (DCB) arranged in four rows of four blocks.
In each row, the first two are DBBs and the next two are DCBs. There is one point to keep in mind in this context. General digital components like counters, PWM units, and PRS and CRC generators can be configured out of any free block. On the other hand, communication components like Rx, Tx, UART and SPI can be placed on the DCB blocks only. Besides these, I2C has a dedicated hardware block.
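The placement rule just stated, along with the way 8-bit blocks combine into wider modules, can be captured in a short C sketch. It is a host-side model with hypothetical names (`block_type`, `can_place`, `blocks_needed`); PSoC Designer enforces these rules itself, so nothing like this is needed in a real project.

```c
#include <assert.h>

#define ROWS 4   /* four rows of digital blocks */
#define COLS 4   /* four blocks per row */

typedef enum { BLOCK_DBB, BLOCK_DCB } block_type_t;

/* In each row, columns 0-1 are DBBs and columns 2-3 are DCBs. */
static block_type_t block_type(unsigned col)
{
    assert(col < COLS);
    return (col < 2) ? BLOCK_DBB : BLOCK_DCB;
}

/* Communication modules (UART, SPI, ...) need a DCB;
 * other digital modules fit on any block. */
static int can_place(int is_comm_module, unsigned row, unsigned col)
{
    assert(row < ROWS);
    return !is_comm_module || block_type(col) == BLOCK_DCB;
}

/* Number of 8-bit blocks used by a module of the given width. */
static unsigned blocks_needed(unsigned width_bits)
{
    assert(width_bits >= 8 && width_bits <= 32 && width_bits % 8 == 0);
    return width_bits / 8;
}
```

So a UART cannot go on DBB00 (row 0, column 0) but can go on DCB02 (row 0, column 2), and a 16-bit counter uses up two blocks.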
Table 12.2 | Details of Resources of the Chip Series CY8C29x66
Resource Value
PSoC Part Number CY8C29x66
Digital I/O up to 64
Digital Rows 4
Digital Blocks 16
Analog Inputs up to 12
Analog Outputs 4
Analog Columns 4
Analog Blocks 12
SRAM Size 2K
Flash Size 32K

Figure 12.9 | Block diagram of the digital subsection
[Block diagram: the digital PSoC block array of four rows (Row 0 to Row 3), each with a row input configuration and a row output configuration; the global digital interconnect (GIE[7:0], GIO[7:0], GOE[7:0], GOO[7:0]) links the rows to Ports 0 to 7; the array also connects to the system bus and to the analog system, and receives the digital clocks from the core.]

Figure 12.10 shows the graphic editor page of PSoC Designer, catering to one row of four digital blocks, the associated interconnections and a few port pins (all of them cannot be shown on this page). This figure is to be referred to when we discuss various aspects of the interconnection logic. We do all interconnections using this editor.
What are the important points to note in this figure?
This figure shows just one row of four digital blocks, a set of four input rows above it, and a set of four output rows below it. There is a BC (Broadcast) line, the GI (Global Input) lines on the left, the GO (Global Output) lines on the right and some of the port pins.
12.5.1 | Clock Input
All digital blocks need a clock as input. Figure 12.10 shows a black triangular symbol inside each of the blocks, which marks the clock input. One of the clock sources discussed in Section 12.4.4, i.e., VC1, VC2, VC3, SYSCLKx2, CPU_32, etc., can be selected as the clock. There are other possibilities also. The output of one stage (block) can be used as the clock of the next stage. Figure 12.11 shows a line called BC (Broadcast), which allows any signal to be sent to various other points through the common broadcast line. The broadcast line may also be used to connect the row lines of one group (of four digital blocks) to those of another group, say from RI0[2] to RI1[3].
The step-by-step instructions for using the PSoC Designer are given in Appendix C. In this chapter, the attempt is to explain the hardware and software aspects so as to facilitate a general understanding of the PSoC and its features. For that, let’s use the graphic editor to define different kinds of applications.
Figure 12.10 | The graphic editor page of PSoC designer for one row of digital blocks
[Screenshot: port pins Port_0_0 to Port_1_4 beside the GIO and GIE buses on the input side, the row input lines RI0[0] to RI0[3], the broadcast line BC0, the row output lines RO0[0] to RO0[3], and the GOO and GOE buses with their port pins on the output side.]
Figure 12.11 | Two digital blocks with input and output and a BC connection
[Diagram: two digital blocks, each with In, CLK and Out terminals, placed between the input rows RI0[0] to RI0[3] and the output rows RO0[0] to RO0[3], together with the BC line.]
Each of the digital blocks can be programmed using assembly language. This may be cumbersome, and so the easier approach is to use the graphic editor for making the connections and to write programs in C. The PSoC API library provides, for each of the user modules, a set of functions that each perform a specific task. We will use a few of them to make the concepts clear. Each user module has a number of API functions, and the programmer is expected to read the manual of the module to find the right function for the requirement. Figure 12.12 contains the same information as Figure 12.10, but with a few more details added. It will be easier to use Figure 12.12 when discussing the input and output interconnections.
12.5.2 | Interconnection Structure—Input
Now refer to Figure 12.12 for the input side interconnects. There is the global input bus GI with 16 bus lines, designated GIE and GIO. On the left of it, the port pins are seen. The interesting feature of PSoC is that the IDE provides a graphical method of choosing any GPIO pin to act as a digital input or output pin. We can choose the pin and also the interconnection ‘between blocks’. In essence, the routing path of signals ‘inside the PSoC chip’ can be configured graphically.
At the input side, the global bus lines are named GIE and GIO, meaning ‘Global Input Even’ and ‘Global Input Odd’. The GIO bus lines connect only to the odd numbered ports, i.e., to P1, P3, P5 and P7. Similarly, the GIE bus lines connect only to the pins of the even numbered ports P0, P2, P4 and P6. You can see that there are only 16 GI bus lines, while there are up to 64 port lines for the chips in the series.

How is this managed?
The GI bus lines are for routing the signals from the input pins to the digital blocks. The connections between the port lines and the GIE bus are as shown in Figure 12.13. What this indicates is that GIE0 is the bus line to be used for the 0th pin of any even port, GIE1 for the 1st pin of any even port and so on. Similar connectivity applies to the other GIE bus lines. (A similar configuration of connections is applicable to the GIO lines as well, but for the odd numbered ports.)
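The rule for picking the global input line is simple enough to write as a function. The C sketch below is a host-side model of that rule only; the type and function names are hypothetical and are not part of any PSoC library.

```c
#include <assert.h>

typedef enum { BUS_GIE, BUS_GIO } gi_bus_t;

typedef struct {
    gi_bus_t bus;   /* even (GIE) or odd (GIO) global input bus */
    unsigned line;  /* bus line number, 0..7 */
} gi_line_t;

/* Global input line that serves pin `pin` of port `port`:
 * even ports use GIE, odd ports use GIO, and the line number
 * is the same as the pin number. */
static gi_line_t global_input_for(unsigned port, unsigned pin)
{
    gi_line_t gi;
    assert(port <= 7 && pin <= 7);
    gi.bus  = (port % 2 == 0) ? BUS_GIE : BUS_GIO;
    gi.line = pin;
    return gi;
}
```

For instance, P0_0 maps to GIE0 and P1_3 maps to GIO3. Since pin n of every even port shares GIEn, the 64 port lines fold onto the 16 GI lines.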
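Figure 12.12 | An expanded view of Figure 12.10
[Diagram: pins of the odd ports (P1, P3, P5, P7) and the even ports (P0, P2, P4, P6) feed the global input bus (GIO, GIE); input muxes drive the input rows above the digital blocks; the output rows below the blocks pass through output muxes to the global output bus (GOO, GOE) and its pins; a BC line is also shown.]
Figure 12.13 | GIE lines and the port pins that connect to each bus line
[Diagram: GIE0 connects to P0_0, P2_0, P4_0 and P6_0; GIE1 to P0_1, P2_1, P4_1 and P6_1; and so on up to GIE7, which connects to P0_7, P2_7, P4_7 and P6_7.]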

Thus a signal from an input pin can be connected to a global bus line, and from there it has to be routed to the input point of a digital block.
How is that done?
There are multiplexers for that. A set of multiplexers carries the signal to its destination. Figure 12.14 shows one of these multiplexers, whose output lines are denoted RI (standing for ‘Row Input’). There are four such row lines corresponding to the four digital blocks in a row. For the 0th row of digital blocks, these are RI0(0), RI0(1), RI0(2) and RI0(3). This figure shows the GI bus lines at the input of the first mux (multiplexer), in the top row line.
If we need to interpret this diagram on the basis of port pins, the inputs of the multiplexer of row line RI0(0) carry the signals from P0_0, P0_4, P1_0 and P1_4; see Figure 12.15. Similar thinking can be applied to the case of the other three multiplexers.
See Figure 12.16 for the complete connections for the four row lines above the digital blocks DBB00, DBB01, DCB02 and DCB03.
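The pattern behind these multiplexers is that row input line k takes global lines k and k + 4 from each of the GIE and GIO buses. A host-side C sketch of this pattern follows; the function names are hypothetical, and the pattern is the one read off Figures 12.14 to 12.16.

```c
#include <assert.h>

/* Row input line reached by global input line n (GIE[n] or GIO[n]). */
static unsigned row_input_for(unsigned gi_line)
{
    assert(gi_line <= 7);
    return gi_line % 4;
}

/* Fill out[2] with the two global line numbers that feed row input
 * line `ri`; each number applies to both the GIE and the GIO bus. */
static void mux_sources(unsigned ri, unsigned out[2])
{
    assert(ri <= 3);
    out[0] = ri;
    out[1] = ri + 4;
}
```

Thus a signal on GIE0 (say from P0_0) can only arrive at RI0(0), and a signal on GIO3 (say from P1_3) can only arrive at RI0(3).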
Figure 12.14 | Input bus lines to the multiplexer of row 0
[Diagram: a multiplexer with a Select input takes GIE0, GIE4, GIO0 and GIO4 and drives Row Input Line 0, RI0[0].]
Figure 12.15 | Row 0 multiplexer and the port pins at its input
[Diagram: the same multiplexer drawn with the port pins P0_0, P0_4, P1_0 and P1_4 at its inputs, driving RI0[0].]

Example 12.1
Draw the routing (on the PSoC Designer) for connecting input pin P0_0 to the capture input of the timer which is to be realized using DBB00.
Solution
The steps are as follows:
i) First ‘place’ the 8-bit Timer 1 (from the list of user modules) on DBB00.
ii) Next, the connection between the global input lines and the port pin P0_0 should be made. For that, note that since P0 is an even port, the global line to be used belongs to the GIE group. It is the 0th pin, and so it is the GIE0 line that should be used. Establish this connection.
iii) The capture input of the timer is connected to RI0(0), since we know (Figure 12.15) that P0_0 is one of the inputs of the multiplexer whose output is RI0(0).
iv) Figure 12.17 shows this interconnection from P0_0 to GIE0 to RI0(0) to the capture input of the 8-bit Timer 1.
v) Note that the timer has been given a clock from VC1.
Figure 12.16 | Connections of the GIE and GIO bus lines to the input muxes
[Diagram: four multiplexers, each with a Select input. RI0[0] takes GIE0, GIE4, GIO0 and GIO4; RI0[1] takes GIE1, GIE5, GIO1 and GIO5; RI0[2] takes GIE2, GIE6, GIO2 and GIO6; RI0[3] takes GIE3, GIE7, GIO3 and GIO7.]

Figure 12.17 | Routing diagram from pin P0_0 to the capture pin of an 8 bit timer
[Screenshot: Port_0_0 connected through GIE0 and RI0[0] to the Capture input of Timer8_1, placed on DBB00; the block also shows its Compare Out and Terminal Count Out terminals.]
Example 12.2
Explain the steps in giving an external source for the ‘Enable’ pin of the 8-bit PWM which has been placed at DCB02. The port pin P1_3 is used for the enable logic.
Note In this use case, the PWM output will be enabled when the external signal goes high.
Figure 12.18 | Routing the signal from P1_3 to an input of DCB02
[Screenshot: Port_1_3 connected through the GIO bus and RI0[3] to the Enable input of PWM8_2, placed on DCB02; DBB00 and DBB01 hold the LSB and MSB of a 16-bit counter (CNTR16).]

Solution
The steps are as follows:
i) Note that P1_3 is a pin of an odd port, and hence a bus line from GIO must be used, i.e., GIO3 is connected to Port 1_3.
ii) Next, the connection from the Enable pin to RI0(3) is made, since (Figure 12.16) GIO3 is an input to the mux with output line RI0(3).
iii) Figure 12.19 shows the mux input that must be selected so that GIO3 is routed to the output of the mux.
iv) VC2 is used as the clock to the PWM unit.
12.5.3 | Interconnection Structure—Output
Just as at the input side, there are global bus lines at the output as well. They are denoted GOE (Global Output Even) and GOO (Global Output Odd), and the connection pattern (to the port pins) is similar to that at the input side.
There are multiplexers on the output side as well, but they are slightly different from those at the input, in that additional logic functions are possible here. The output (with or without extra logic) is routed to the output pins through buffers. Note the RO0(0) to RO0(3) row lines in Figure 12.10. They are the lines to be taken to the port pins through the multiplexers shown on the right hand side of each row line. These multiplexers have the option of applying a logic function to the signal before it is routed to the port pins. See Figure 12.20. This shows that logic functions such as AND, NAND, etc. can be realized between one row line and its neighbouring line. The white triangles after the multiplexers are the output buffers, through which the signal is routed to the GO buses and thence to the output pins.
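A logic function between a row line and its neighbour can be thought of as a two-input lookup table. The C sketch below models that idea on the host. The 4-bit truth table encoding and the names `lut2_t`, `lut2_eval` and `LUT_*` are assumptions made for illustration; they are not the encoding used in the PSoC configuration registers.

```c
#include <assert.h>

/* A 2-input logic function stored as a 4-bit truth table.
 * Bit (a*2 + b) of the table holds the output for inputs a and b,
 * where a is the row line and b is its neighbouring row line. */
typedef unsigned char lut2_t;

#define LUT_PASS_A 0xCu  /* out = a (no extra logic)  */
#define LUT_AND    0x8u  /* out = a AND b             */
#define LUT_NAND   0x7u  /* out = NOT (a AND b)       */
#define LUT_OR     0xEu  /* out = a OR b              */
#define LUT_XOR    0x6u  /* out = a XOR b             */

/* Evaluate the selected logic function for one pair of input levels. */
static int lut2_eval(lut2_t table, int a, int b)
{
    assert((a == 0 || a == 1) && (b == 0 || b == 1));
    return (table >> (a * 2 + b)) & 1;
}
```

Selecting `LUT_PASS_A` corresponds to sending the row line to the pin unchanged, while `LUT_AND` passes a high level only when the neighbouring line is high as well.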
Figure 12.19 | Routing GIO3 to RI0(3)

Example 12.3
Show the routing to use the 16-bit PWM unit to get a PWM output at port pin Port 1_2.
Solution
Figure 12.21 shows the routing. A 16-bit PWM unit uses up two of the digital blocks, DCB02 and DCB03, but the connection ‘between’ the two blocks is taken care of internally by the PSoC. The following are the connections made.
Figure 12.20 | Output side multiplexers and buffers
Figure 12.21 | Routing of the PWM output signal to Port 1_2
[Screenshot: PWMDB16_1 placed on DCB02 (PWM16_LSB) and DCB03 (PWM16_MSB), with its PWM output routed through the global output bus to Port_1_2.]

i) The clock source chosen is VC2.
ii) Active high enable is used.
iii) The PWM output is connected to RO0(3).
iv) At the output mux, no additional logic is chosen. The output of the PWM unit is sent through buffers to GOO2 and from there to P1_2.
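Once the routing is done, the PWM period and pulse width values have to be chosen. The sketch below shows the arithmetic on the host. It assumes that a period value P gives P + 1 input clocks per PWM cycle, which should be confirmed against the data sheet of the PWM user module, and the function names are hypothetical rather than PSoC API calls.

```c
#include <assert.h>

/* Period value for a target output frequency.
 * Assumption: a period value P yields P + 1 input clocks per PWM cycle. */
static unsigned long pwm_period_value(unsigned long clk_hz,
                                      unsigned long out_hz)
{
    assert(out_hz > 0 && clk_hz >= out_hz);
    return clk_hz / out_hz - 1;
}

/* Pulse width value giving roughly duty_percent of the PWM cycle. */
static unsigned long pwm_pulse_width(unsigned long period_value,
                                     unsigned duty_percent)
{
    assert(duty_percent <= 100);
    return (period_value + 1) * duty_percent / 100;
}
```

With a 300 kHz clock on VC2 and a target of 50 Hz, the period value comes to 5999, which fits in the 16 bits of the PWM unit; a 25% duty cycle then needs a pulse width value of 1500.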
12.6 | GPIO Pins
The GPIO port pins have two basic functions:
i) One is to allow the core to send information out of the PSoC device. The pin acts as an output in this case.
ii) The second is to obtain information from outside the PSoC device. This puts the pin in the input state.
These operations are accomplished by using the bits of the port data register named PRTxDR.
The pin as an output Writes from the core to the PRTxDR register store the data state, one bit per GPIO. Then the pin drivers drive the pin in response to this data bit, with a drive strength determined by the drive mode setting.
The pin as an input For the case of an input, the PRTxDR register is again used: reading it brings the logic value at the pin into the core.
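The bit operations on a port data register can be sketched as follows. This is a host-side model in which the variable `port_dr` stands in for a register such as PRT0DR; on the device the register is memory mapped and a read returns the level actually present at the pin, which this simple model does not distinguish from the value last written. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a port data register such as PRT0DR; here it is
 * a plain variable so that the code runs on a PC. */
static uint8_t port_dr;

/* Drive one pin of the port high or low by changing only its bit. */
static void pin_write(unsigned pin, int level)
{
    assert(pin <= 7);
    if (level)
        port_dr |= (uint8_t)(1u << pin);
    else
        port_dr &= (uint8_t)~(1u << pin);
}

/* Read back the logic level of one pin of the port. */
static int pin_read(unsigned pin)
{
    assert(pin <= 7);
    return (port_dr >> pin) & 1;
}
```

Setting pin 3 changes only bit 3 of the register; the other seven pins of the port keep their states.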
12.6.1 | Drive Modes
When using the PSoC Designer, it is not necessary to know about the drive mode bits. It is only necessary to select the drive mode. For doing this ‘selection’, it is important to have an idea of what these drive modes mean and how the circuitry changes for each mode.
Before reading ahead, it is suggested that you refer to Section 2.5 to get a clear idea of what is meant by terms like pullup, pulldown, open drain, high Z, etc. Also, on coming back here, if you find that this part is too cumbersome, skip it for the time being, and come back to it when you actually need to select drive modes for an application. For more information on PSoC drive modes, refer to http://www.cypress.com/?rID=39497cache=0
Figure 12.22a shows the complete circuit associated with a GPIO pin. It is not necessary to try to understand this elaborate circuit; only the CMOS inverter stage at the output of the circuit need be referred to for our purpose. The drive mode is controllable by the three bits DM0, DM1 and DM2 (PRTxDM0, PRTxDM1 and PRTxDM2) associated with the pin. Refer to Table 12.3, which shows the various drive options.
Figure 12.22b shows how the output transistor stage gets modified by the DM bit settings. The diagram numbers refer to the entries in Table 12.3.
To understand the drive modes, let us name the output transistors D1 (the upper one, the PMOS) and D2 (the lower one, the NMOS). D1 and D2 can be turned ON by setting the respective bits in the PRTxDR register. Note that only one MOS will be ON at any moment of time.

Table 12.3 | Drive Mode Bits and Drive Modes with Reference to Figure 12.22b
DM2 DM1 DM0 Drive Mode Diagram Number Data = 0 Data = 1
0 0 0 Resistive Pulldown 0 Resistive Strong
0 0 1 Strong Drive 1 Strong Strong
0 1 0 High Impedance 2 Hi-Z Hi-Z
0 1 1 Resistive Pullup 3 Strong Resistive
1 0 0 Open Drain, Drives High 4 Hi-Z Strong (Slow)
1 0 1 Slow Strong Drive 5 Strong (Slow) Strong (Slow)
1 1 0 High Impedance Analog 6 Hi-Z Hi-Z
1 1 1 Open Drain, Drives Low 7 Strong (Slow) Hi-Z
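Table 12.3 lends itself to a lookup table indexed by the three drive mode bits. The C sketch below transcribes the table for use on the host; the type `drive_mode_t` and the function `drive_mode` are hypothetical names, and on the device the mode is simply selected in PSoC Designer.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const char *mode;   /* drive mode name */
    const char *data0;  /* pin drive when the data bit is 0 */
    const char *data1;  /* pin drive when the data bit is 1 */
} drive_mode_t;

/* Indexed by (DM2 << 2) | (DM1 << 1) | DM0, transcribed from Table 12.3. */
static const drive_mode_t drive_modes[8] = {
    { "Resistive Pulldown",      "Resistive",     "Strong"        },
    { "Strong Drive",            "Strong",        "Strong"        },
    { "High Impedance",          "Hi-Z",          "Hi-Z"          },
    { "Resistive Pullup",        "Strong",        "Resistive"     },
    { "Open Drain, Drives High", "Hi-Z",          "Strong (Slow)" },
    { "Slow Strong Drive",       "Strong (Slow)", "Strong (Slow)" },
    { "High Impedance Analog",   "Hi-Z",          "Hi-Z"          },
    { "Open Drain, Drives Low",  "Strong (Slow)", "Hi-Z"          },
};

/* Look up the table entry for one setting of the three DM bits. */
static const drive_mode_t *drive_mode(unsigned dm2, unsigned dm1, unsigned dm0)
{
    assert(dm2 <= 1 && dm1 <= 1 && dm0 <= 1);
    return &drive_modes[(dm2 << 2) | (dm1 << 1) | dm0];
}
```

The index of each entry is also its diagram number in Figure 12.22b, so DM2 DM1 DM0 = 0 1 1 selects entry 3, the resistive pullup.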
Figure 12.22a | The driver circuitry for a port pin
[Circuit diagram: the output path runs from the data bus write to PRTxDR, or from the global output bus or the I2C output, through the drive logic (controlled by DM0, DM1 and DM2) and the slew control to the output transistors and the pin, with 5.6K resistors; the input path runs from the pin through the input buffer to the global input bus, the I2C input, the interrupt logic and the PRTxDR read.]

Figure 12.22b | Equivalent circuit for the output stage for the drive options of Table 12.3 (panels numbered 0 to 7)
Figure 12.23 | Equivalent circuit at the pin for a resistive pullup (internal 5.6 KΩ resistor to VCC)
i) A ‘strong’ drive implies that D1 or D2 is ON, so that the effective load is the ON
resistance of one of these, which, being low, implies a ‘strong’ current.
ii) A resistive pullup implies that there is a resistance (5.6K) appearing in series with
D1 (which is ON), and so a resistive pullup is obtained. Figure 12.23 shows this.
Note that D2 will drive a strong low when it is ON.
iii) A resistive pulldown means that D2 is ON, and there is also a resistance (5.6K) that
appears in series with it, connected to ground. Figure 12.24 illustrates this
condition. Note that in this case, if D1 is turned ON, it will drive a strong high.
Figure 12.24 | Equivalent circuit at the pin for a resistive pulldown (internal 5.6 KΩ resistor)

iv) Hi-Z means that both D1 and D2 are OFF, so that a very high impedance is
seen at the port pin.
vi) A slow strong drive is obtained when a resistance appears along with an ON transistor
at the input side of the inverter. This increases the rise and fall times of switching,
because of the increase in the time constant RC. It is mainly used to reduce the
electromagnetic radiation caused by high frequency/high slew rate signals,
and also to reduce the power consumption. Refer to Figure 12.25. The slew control in
Figure 12.22a is affected by the DM bit settings when ‘slow strong’ is selected.
Note: For fast switching, i.e., for a strong drive, the rise and fall times are in the range
of 3 to 18 ns, while for the ‘slow’ strong drive they increase to the range of 10 to 25 ns.
12.6.2 | Using the Drive Options
i) Hi-Z is the basic (default) input mode
ii) Strong drive is the basic (default) output mode
These modes may be changed if needed. For example, when push buttons or switches are
connected at the input, resistive pullup or pulldown modes may be required to prevent
a floating state. User modules that directly drive GPIO, however, will have the ‘right’
drive options automatically configured.
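As an illustration of how the three DM bits of Table 12.3 select a mode for one pin, the following is a minimal sketch that works on software copies of the drive mode registers. The structure PortDriveRegs, the enum names and the function set_drive_mode() are hypothetical names introduced here for illustration; on the chip the bits live in PRTxDM0, PRTxDM1 and PRTxDM2, and they are normally set through the IDE.

```c
#include <stdint.h>

/* Software copies of the three drive mode registers of one port.
   Hypothetical stand-ins for PRTxDM0, PRTxDM1 and PRTxDM2. */
typedef struct {
    uint8_t dm0;
    uint8_t dm1;
    uint8_t dm2;
} PortDriveRegs;

/* Mode numbers are the DM2 DM1 DM0 bit patterns of Table 12.3. */
enum DriveMode {
    DM_RES_PULLDOWN = 0,
    DM_STRONG       = 1,
    DM_HIGHZ        = 2,
    DM_RES_PULLUP   = 3,
    DM_OD_HIGH      = 4,
    DM_SLOW_STRONG  = 5,
    DM_HIGHZ_ANALOG = 6,
    DM_OD_LOW       = 7
};

/* Set or clear bit 'pin' of a register image according to 'bit'. */
static uint8_t put_bit(uint8_t reg, unsigned pin, unsigned bit)
{
    uint8_t mask = (uint8_t)(1u << pin);
    return bit ? (uint8_t)(reg | mask) : (uint8_t)(reg & (uint8_t)~mask);
}

/* Write the three mode bits of one pin, leaving the other pins untouched. */
void set_drive_mode(PortDriveRegs *regs, unsigned pin, enum DriveMode mode)
{
    regs->dm0 = put_bit(regs->dm0, pin, (unsigned)mode & 1u);
    regs->dm1 = put_bit(regs->dm1, pin, ((unsigned)mode >> 1) & 1u);
    regs->dm2 = put_bit(regs->dm2, pin, ((unsigned)mode >> 2) & 1u);
}
```

For a push button on pin 2, calling set_drive_mode(&regs, 2, DM_RES_PULLUP) sets bit 2 of the DM0 and DM1 images and clears it in the DM2 image, which is the 0 1 1 row of the table.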
12.7 | Digital Applications Using PSoC
Now, we will have a look at two simple applications using PSoC. Understand that there
are a number of user modules like counters, timers, PWM units, etc. which can be
mapped on to the digital blocks, that is, the chip contains dedicated hardware for these
peripherals and a number of these peripherals can run simultaneously. For example, a
PWM unit, a counter and a pseudo random sequence generator can run simultaneously
and produce outputs at different pins with different frequencies.
Besides such dedicated hardware, there are digital functions which are realized using
the core and software, and need not be mapped onto the digital blocks. Single LED,
seven segment LED, LCD, etc. are some of them. You can refer to the user modules in
the PSoC Designer for the complete list. As mentioned earlier, user module datasheets
are available for each of them, which need to be read to understand the hardware
connections and the usable API functions. We will start with the simplest module, that is,
an LED.
Figure 12.25 | Equivalent circuit at the pin for a slow strong drive
12.7.1 | API Function Naming Conventions
An Application Programming Interface (API) function is named as
InstanceName_FunctionName()   //instance name of the user module, an underscore,
                              //then the function name
Examples are
LED_1_Start()       ; LED_1 is the name of the module and Start() is the function, with
                    ; no parameters
LCD_Position(1, 2)  ; LCD is the name of the module and Position() is the function,
                    ; with 1, 2 being the parameters used here
PWM8_1_Start()      ; PWM8_1 is the instance name of the module and Start() is the
                    ; function
Note: The names LED_1, LCD, PWM8_1, etc. are the names used by the IDE, but the
user can change them (there is an option available for it). For instance, you can
change LED_1 to LED1. The API functions of that module will then start with the
name LED1:
LED1_Switch(1)      ; LED1 is the name of the module and Switch() is the function,
                    ; with the parameter ‘1’ being used here
12.7.2 | The LED User Module
There is an inbuilt LED module, with a few simple functions to control it. We can choose
between an active high or active low connection.
The LED User Module is just a set of simple functions to control an LED or any
simple device that needs an ON–OFF control. For example, the module can be used
to control a relay. (Of course, you need to use a relay driver externally, since the PSoC
source/sink current is limited.)
Example 12.4
Implement a flashing LED using the inbuilt LED module.
Solution
For this, let’s select P0_2 as the port pin, and ‘active low’ (i.e., the LED will be ON when
the data bit is set to zero) in the LED_1 user module parameter settings. Also choose
‘strong’ as the drive option for P0_2. The interconnection is shown in Figure 12.26.
Note that the only connection needed is to define the use of P0_2 for the LED;
none of the global buses are used.
The program to be run for switching the LED ON and OFF at a specific rate is
shown below. This is the main.c file used.

Program
#include <m8c.h>        //part specific constants and macros
#include "PSoCAPI.h"    //PSoC API definitions for all User Modules

void delay(int sec);

void main()
{
    LED_1_Start();
    LED_1_Switch(1);    //Turn on LED
    while(1)
    {
        delay(3);
        LED_1_Invert(); //Flash LED
    }
}

//Crude software delay: the nested loop limits set the flashing rate
void delay(int sec)
{
    int i, j, secd;
    for (secd = 0; secd <= sec; secd++)
    {
        for (i = 0; i <= 2; i++)
        {
            for (j = 0; j <= 20480; j++)
            {
            }
        }
    }
}
Figure 12.26 | An LED connected to P0_2 (active low)
Figure 12.27 | LED_1 user module at port pin P0_2
Note that the functions used are LED_1_Start(), LED_1_Switch(1) and LED_1_Invert().
The delay is specified in the function named ‘delay’, which is written by the user.
The program is burned into the flash of the chip, and the LED switches ON and OFF
continuously once Vdd is turned ON.
12.7.2.1 | A Delay Function
The following is another routine for creating a delay. With the CPU clock (CPU_Clk)
chosen as 3 MHz (SysClk/8), t = 250 creates a delay of about 1 ms.
Program
void delay_ms(unsigned char ms)
{
volatile unsigned char t;
while (ms--)
{
t = 250;
while (t--);
}
}
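The relation between the CPU clock and the inner loop count can be made explicit. The sketch below is only an estimate: the figure of 12 CPU cycles per inner loop iteration is an assumption inferred from the numbers quoted above (3 MHz and t = 250 for 1 ms), not a measured value, and the function name loops_per_ms() is hypothetical.

```c
/* Estimate the inner loop count needed for a 1 ms delay.
   cpu_hz          : CPU clock in Hz
   cycles_per_iter : assumed CPU cycles taken by one 'while (t--);' pass
                     (12 is inferred from the 3 MHz / t = 250 example) */
unsigned long loops_per_ms(unsigned long cpu_hz, unsigned long cycles_per_iter)
{
    unsigned long cycles_per_ms = cpu_hz / 1000UL;
    return cycles_per_ms / cycles_per_iter;
}
```

With the same assumption, a 24 MHz CPU clock would need a count of 2000, which no longer fits in the unsigned char counter used in delay_ms(), so the counter type would have to be widened.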
12.7.3 | The PWM (Pulse Width Modulation) Modules
The idea of PWM, its use and timing diagrams have been discussed in Section 3.3.2.2.
Here we attempt to realize PWM using PSoC.
It is possible to have 8-bit or 16-bit PWM; these use one digital block (for 8-bit) or two
digital blocks (for 16-bit). The other options are as follows:
i) Source clock rates up to 48 MHz
ii) Automatic reloads of period for each pulse cycle
iii) Programmable pulse width
iv) Input enables/disables continuous counter operation
v) Interrupt option on rising edge of the output or terminal count
The 8- and 16-bit PWM user modules are pulse width modulators with programma-
ble period and pulse width. The clock and enable signals can be selected from several
sources. The output signal can be routed to a pin or to one of the global output buses,
for internal use by other user modules. An interrupt can be programmed to trigger on
the rising edge of the output or when the counter reaches the terminal count condition.

Example 12.5
Generate a square wave using the PWM module.
Solution
Here, we choose an 8-bit PWM unit (PWM8_1), use port pin P0_0 as the output pin
(the drive mode ‘strong’ is configured automatically), use VC3 as the clock, and ‘1’ as
Enable. Figure 12.28 shows the input and output connections of the PWM unit.
Figure 12.28 | Input and output connections of the PWM unit
Figure 12.29 | The PWM unit with the output routed to P0_0, shown on the graphic editor
Program
#include <m8c.h>        //part specific constants and macros
#include "PSoCAPI.h"    //PSoC API definitions for all User Modules

void main()
{
    PWM8_1_Start();     //API for square wave generation
}
The details of the function used here, i.e., PWM8_1_Start(), can be obtained from the
manual of the module.
Note: Once a specific peripheral, like the PWM unit, is set running
as in Example 12.5, the CPU has no further work to do. The hardware constituting the
PWM unit runs continuously to generate the square wave.
PWM Settings
Period = 99
Pulse width = 50
Clock_Synch = Sync to Sysclk
Compare = Less Than
With these settings, we get a duty cycle of 50% at a frequency of VC3/100 (since the
module counts down from 99 to zero, one period takes 100 clock cycles).
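The arithmetic behind these settings can be checked with a few lines of C. This is a sketch under the assumptions stated in the text: the counter counts down from Period to zero, so one cycle lasts Period + 1 clocks, and with the ‘Less Than’ compare the output is high while the count is below the pulse width. The function names are hypothetical.

```c
/* Output frequency of the PWM for a given input clock and Period value.
   The counter runs from 'period' down to 0, i.e. period + 1 clock cycles. */
double pwm_frequency_hz(double clock_hz, unsigned period)
{
    return clock_hz / (period + 1.0);
}

/* Duty cycle (0.0 to 1.0) for the 'Less Than' compare type:
   the output is high for counts 0 .. pulse_width - 1. */
double pwm_duty_less_than(unsigned period, unsigned pulse_width)
{
    return pulse_width / (period + 1.0);
}
```

For Period = 99 and Pulse width = 50, pwm_duty_less_than(99, 50) gives 0.5, and a VC3 of 100 kHz (a value chosen here only as an example) would give a 1 kHz square wave.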
12.7.4 | Implement an LCD Display Design
The features of the LCD user module are
i) Uses the industry standard Hitachi HD44780 LCD display driver chip protocol
ii) Requires only seven I/O pins—four for data and three for control
There are functions provided with this module to display strings and numbers, as well as
to display horizontal and vertical bar graphs. The LCD module is connected to one port
(pre-fixed to be Port 2 on the development kit); see Figure 12.30.
Refer to Section 3.3.1.6 for a detailed discussion on LCD interfacing.
In the set up here, data is sent as two nibbles to save port lines. The connections between
the PSoC port pins and the LCD pins are as shown in Table 12.5.
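The idea of sending a byte as two nibbles can be sketched as follows. The LCD user module does this internally; the helper names below are hypothetical and are shown only to make the splitting explicit. In the 4-bit mode of the HD44780, the high nibble is transferred first.

```c
#include <stdint.h>

/* Upper four bits of a byte, moved down to bits 3..0. */
uint8_t high_nibble(uint8_t byte)
{
    return (uint8_t)(byte >> 4);
}

/* Lower four bits of a byte. */
uint8_t low_nibble(uint8_t byte)
{
    return (uint8_t)(byte & 0x0Fu);
}
```

For the character 'W' (ASCII 0x57), the four data lines would first carry 0x5 and then 0x7.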
Figure 12.30 | Connection between the LCD and a PSoC Port 2 pins

Example 12.6
Implement an LCD display using the PSoC development kit.
Solution
The steps in using the LCD module are to select it from the list of user modules, and
choose Port 2. The placement is as shown in Figure 12.31. Note that the LCD user
module does not occupy any digital or analog blocks and also does not use any of the
global buses.
Figure 12.31 | LCD_1 module placed at Port 2 pins
Table 12.5 | Connections Between Port 2 and the LCD Pins
PSoC Pin   LCD Pin   Description
Port 2_0   DB4       Data Bit 0
Port 2_4   E         LCD Enable
Port 2_5   RS        Register Select
Port 2_6   RW        Read/Write
Program
#include <m8c.h>
#include "PSoCAPI.h"

void main()
{
    char str[] = "Lab, MTech";          //String stored in RAM
    LCD_Start();                        //Initialize
    LCD_Position(0,4);                  //Position: row 0, column 4
    LCD_PrCString("Welcome to PSoC");   //Print the string stored in ROM
                                        //(saves RAM space)
    LCD_Position(1,2);
    LCD_PrString(str);                  //Print the string stored in RAM
}
The output of the display shows
‘Welcome to PSoC
Lab, MTech’
12.8 | The Analog Section
Now, we will have a look at the analog section of PSoC. Figure 12.32 shows the block
diagram of the analog system.
The CY8C29x66 has four analog columns, so its analog blocks are arranged as four
columns and three rows. There are two types of analog building blocks.
Figure 12.32 | Block diagram of the analog section
*Note that the CY8C21x34/23 has limited 2 column functionality.

i) The CT (Continuous Time) Blocks: The blocks in the top row are the CT blocks
and are used for realizing user modules like inverting amplifiers, programmable gain
amplifiers (PGA), instrumentation amplifiers, comparators, etc.
ii) The SC (Switched Capacitor) Blocks: The lower two rows are the switched capacitor
blocks. User modules like ADCs, DACs and filters can be placed only in the
SC blocks.
We should know what an SC block means before we proceed further. The next section
gives a brief insight into the idea of switched capacitor circuits.
12.8.1 | Switched Capacitor Circuits
In integrated circuits, resistors have to be fabricated along with transistors. It is quite
difficult to fabricate them on an IC with a high degree of accuracy, and they also take
up a lot of space. In recent times, a new principle has been adopted to replace them. With
it, the values can be made to depend on ratios of capacitor values (which can be set
accurately), rather than on absolute values (which vary between manufacturing runs). This is
the driving force behind the idea of having resistors realized by switched capacitors. Let’s
try to understand the working principle of switched capacitor circuits.
Consider the circuit in Figure 12.33, with a capacitor C1 connected through two switches S1
and S2 to two different voltage sources v1 and v2.
Figure 12.33 | Switched capacitor
If S2 closes with S1 open, and then S1 closes with S2 open, a charge Δq is
transferred from v2 to v1, with

$$\Delta q = C_1 (v_2 - v_1) \tag{12.1}$$

If this switching process is repeated N times in a time Δt, the amount of charge
transferred per unit time is given by

$$\frac{\Delta q}{\Delta t} = C_1 (v_2 - v_1) \frac{N}{\Delta t} \tag{12.2}$$

Recognizing that the left hand side represents charge per unit time, or current, and that
the number of cycles per unit time is the switching frequency (or clock frequency, f_CLK),
we can rewrite the equation as

$$i = C_1 (v_2 - v_1) f_{CLK} \tag{12.3}$$

Rearranging, we get

$$R = \frac{v_2 - v_1}{i} = \frac{1}{C_1 f_{CLK}} \tag{12.4}$$

which can be interpreted to mean that the switched capacitor is equivalent to a resistor.
The value of this resistor decreases with increasing switching frequency or increasing
capacitance, as either will increase the amount of charge transferred from v2 to v1 in a
given time.
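Equation (12.4) is easy to evaluate numerically. The following sketch simply computes the equivalent resistance from a capacitance and a clock frequency; the function name and the example values are chosen here for illustration and are not taken from the PSoC datasheet.

```c
/* Equivalent resistance of a switched capacitor, R = 1 / (C1 * fCLK).
   c_farads : capacitance C1 in farads
   f_clk_hz : switching (clock) frequency in hertz */
double sc_equivalent_resistance(double c_farads, double f_clk_hz)
{
    return 1.0 / (c_farads * f_clk_hz);
}
```

A 1 pF capacitor switched at 1 MHz behaves like a resistor of about 1 MΩ; doubling either the capacitance or the clock frequency halves that value, as the text states.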
With this idea, there are switched capacitor integrators, filters, amplifiers, etc.
In integrated circuits, it is better to use this, instead of real resistors whose values
are not accurate. SC blocks are accurate, require no external components and maintain
a predictable response over all specified operating conditions. For switched capacitor
amplifiers, for instance, the operation takes place in two phases: sampling and amplifica-
tion. So the circuit needs a clock, along with the analog input. See Figure 12.34.
The advantages of switched capacitor circuits are:
i) Compatibility with CMOS technology
ii) Good accuracy of time constants
iii) Good voltage linearity
iv) Good temperature characteristics
v) 0.1% to 1% accuracy depending on the size of components
12.8.2 | SC and CT Blocks in PSoC
Thus we see that in the SC blocks, resistors are realized using the switching of capacitors.
In the CT blocks, the amplifiers and comparators are realized using normal opamp
and resistor networks. Figures 12.35 and 12.36 show this. The inverting
amplifier, for instance, has the circuit realization shown in Figure 12.35, with resistors
and opamps. The block diagram of the ADC (an SC block) shows switched capacitors
instead of resistors (Figure 12.36).
Figure 12.34 | A switched capacitor amplifier circuit (inputs Vin and clock CK, output Vout)
Figure 12.35 | Internal diagram of the inverting amplifier (GAIN = –Rb/Ra)

12.8.3 | Interconnects of the Analog Section
Now let’s discuss the interconnects of the analog section. Refer to Figure 12.37b, which is
the re-drawn (for clarity) version of the analog section as viewed on the graphic editor
of PSoC Designer (Figure 12.37a). The multiplexers have been named and numbered
to simplify the explanation of the whole set up (which looks quite complicated, but
is not really so).
The points to note and understand using these figures are
i) The top row contains the CT blocks, which are usually referred to as ACBs (Analog
Continuous time Blocks). Only amplifiers and comparators can use these blocks.
ii) The next two rows contain the switched capacitor blocks of type C or type D. They
are designated as ASC (type C) or ASD (type D). Having two types (C and D)
implies that there are slight differences in the SC circuitry. Incidentally, type D caters
to more complex circuits.
iii) There are some restrictions regarding the pins that can be used as analog input and
output. Note these points, which can be readily verified from the IDE.
Figure 12.36 | The internal diagram of the ADC
Figure 12.37a | The analog section as seen in the editor of the PSoC Designer

Figure 12.37b | The re-drawn version of Figure 12.37a (for ease of explanation)
– All pins of Port 0 can be used as analog inputs (if they are not used as outputs)
– The lower four pins of Port 2 can also be used as analog inputs, but only for the
leftmost SC blocks (rows 2 and 3 of column 1)
– Only P0_2, P0_3, P0_4 and P0_5 can be used as analog output pins
Such restrictions are inconvenient, no doubt, but we get used to them very soon.
iv) The inputs for the CT modules (top row) can be taken from the analog input port
lines of Port 0. SC blocks, however, cannot take their inputs directly from Port 0.
This matters for SC blocks which are not in column 1, or when Port 2 is not
available (for instance, if it has already been used for some digital application).
How is analog input taken from an external source, then?
The solution is to take it from the outputs of the CT modules, which can act as
inputs to SC blocks. For example, to use an ADC (an SC block), which is to be
placed in the second or third row, the method is to place a PGA (with a gain
of 1) in one of the CT blocks. The input of the ADC is then taken from
the output of the PGA. This is made clear in Example 12.8.
v) The analog output can be taken out of the chip through the analog bus, which is
then routed to an output port pin. There is only one bus line (with a buffer) for each
column. This means that only one of the blocks in a column can drive an analog
output at a time. Thus there are four analog outputs, which can be routed to four
output port pins (P0_2, P0_3, P0_4 and P0_5).
vi) There is also a comparator bus line acting as a digital output from each column.
These are shown in the figure as CB1 to CB4, one for each column. A comparator
bus can be routed to any digital block. For instance, signals from two analog blocks can
be compared, and the result of the comparison, a digital signal, can be passed on
to some digital block; say, we can use it as an enable signal for a counter, timer or
similar module (in the digital section).
vii) The multiplexers at the top (M11 and M12) have inputs from digital blocks. Muxes
M5 and M6 are for choosing between the outputs of the muxes M1 to M4.
viii) The input analog multiplexers (M1 to M4) are used to connect to the input port
pins of Port 0, and there are separate muxes for even and odd numbered pins.
ix) Analog blocks require clocks, and there is the option of selecting different clocks
using the clock multiplexers (M7 to M10). See Figure 12.38. Here ACB00 has
four choices of clock, of which two are outputs taken from the digital blocks.
Others can be VC1, VC2, VC3, etc. In the figure, a signal from a digital block has
been chosen as the clock.
Reading through the rules and specifications of the analog block might seem a bit
confusing at first, but they will be found to be very easy once you start using them
practically with the IDE.
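The pin restrictions listed under point iii) above can be summarized as two small predicates. This is only a sketch of the rules as stated in the text; the function names are hypothetical, and the IDE remains the authority on what can actually be routed for a given part.

```c
#include <stdbool.h>

/* True if the pin can serve as an analog input according to the rules above:
   any pin of Port 0, or the lower four pins of Port 2
   (the latter only for the leftmost SC blocks). */
bool can_be_analog_input(unsigned port, unsigned pin)
{
    if (port == 0u) {
        return pin <= 7u;
    }
    if (port == 2u) {
        return pin <= 3u;
    }
    return false;
}

/* True if the pin can serve as an analog output: only P0_2 to P0_5. */
bool can_be_analog_output(unsigned port, unsigned pin)
{
    return port == 0u && pin >= 2u && pin <= 5u;
}
```

For example, P0_1 passes the input check but not the output check, which is why the examples in this chapter use P0_1 as an input and P0_3 as an output.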
12.8.4 | Internal Voltage References
Many analog resources need reference voltages. For example, the comparator needs a
reference voltage with which the input voltage is compared, to decide
what the output should be. Figure 12.39 shows the choices available (in the IDE) for this.
Figure 12.38 | The clock multiplexer for ACB00

As we go through the steps of design using analog blocks, we will see many
requirements of a ‘reference voltage’. Such references are needed for comparators, ADCs, etc., and
the user is allowed to choose from a list of options available. See Figure 12.39. The
hardware for the reference voltage is available as part of the system resources (Section 12.9).
There is another important point to ponder regarding analog voltages: analog
voltages can be positive or negative. For inverting amplifiers, for example, a positive
input is inverted to be negative and vice versa. But PSoC works only with voltages from
0 to +5 V. So, then, how is ‘negative’ defined here? Figure 12.40 answers this question.
In the 0 to 5 V range, VDD is the maximum and VSS (0 V) is the minimum value.
Here the central voltage is considered as the ‘analog ground’ and denoted as AGND. All
voltages below it are considered as negative, and all voltages above it as positive.
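The AGND convention can be expressed as a one-line conversion. The sketch below assumes that the ‘central voltage’ is exactly half of VDD (2.5 V for a 5 V supply); the function name is hypothetical.

```c
/* Signed analog value of a pin voltage, measured with respect to AGND.
   AGND is assumed to sit at VDD / 2 (assumption stated in the lead-in). */
double volts_relative_to_agnd(double v_pin, double vdd)
{
    double agnd = vdd / 2.0;
    return v_pin - agnd;
}
```

With VDD = 5 V, a pin at 1 V is treated as -1.5 V and a pin at 4 V as +1.5 V, so an inverting amplifier with unity gain would map one onto the other.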
Now let’s use analog blocks for some simple applications.
Figure 12.39 | Reference voltages available
Figure 12.40 | Defining positive and negative analog voltages

12.8.5 | Programmable Gain Amplifier (PGA)
A PGA is one of the modules in the user module set of ‘Amplifiers’. It implements
an opamp based non-inverting amplifier with 33 user-programmable gain settings
(the maximum gain is 48). The gain of a PGA can be changed from within
the program; that is how its gain becomes programmable.
Example 12.7
Implement a PGA for a gain of 2.
Solution
The steps are
i) Select PGA_1 from the user module list.
ii) Select the parameters of the user module. Here the input mux (M1) and the output
bus are chosen as shown in Figure 12.41. The gain is chosen to be 2.
iii) In the parameter list (Figure 12.42), the reference voltage is taken as VSS (the true
ground). With this, we mean that we need only positive outputs.
Figure 12.41 | Interconnecting the PGA to P0_1 (input pin)

The interconnection chosen is in Figure 12.41. Port 0_1 is chosen as the input ana-
log pin and Port 0_3 is the output pin. The other input required for the PGA is a clock
which has been chosen as VC1. The program uses two functions from the API set of
the PGA.
Program
#include <m8c.h>        //part specific constants and macros
#include "PSoCAPI.h"    //PSoC API definitions for all User Modules

void main()
{
    PGA_1_Start(PGA_1_MEDPOWER);
    PGA_1_SetGain(PGA_1_G2_00);
}
The working of the PGA can be tested by burning the code and running the program on
the development board. A variable input voltage can be applied at P0_1 and the output
amplitude checked at P0_3. A gain of 2 will be obtained. The maximum output will be
limited to 5 V (the supply voltage). For AC signals, the reference must be AGND, so the
input has to be biased about AGND before it is fed to the PSoC pin. See the ‘note’ in
Section 12.8.6.
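The behaviour to expect on the bench can be sketched as an idealized model: the output is the input multiplied by the gain, but it cannot go beyond the supply rails. The model below assumes the reference is VSS, as in this example, and ignores the fact that a real opamp output stops slightly short of the rails; the function name is hypothetical.

```c
/* Idealized PGA output for a reference of VSS (0 V).
   The result is limited to the range 0 .. vdd. */
double pga_output_volts(double v_in, double gain, double vdd)
{
    double v_out = gain * v_in;
    if (v_out > vdd) {
        v_out = vdd;    /* limited by the supply voltage */
    }
    if (v_out < 0.0) {
        v_out = 0.0;    /* cannot go below VSS */
    }
    return v_out;
}
```

With a gain of 2 and a 5 V supply, an input of 1.5 V gives 3 V at the output, while any input above 2.5 V gives an output stuck at the 5 V limit.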
12.8.6 | Implementation of an ADC
There are many types of ADCs with different resolutions in the list of user modules.
One of them may be selected.
The salient points regarding the implementation of an ADC using the PSoC are
as given.
i) As mentioned in Section 12.8.3, the analog input is not given directly
to the ADC. It is applied through a PGA (with gain = 1) from an analog input pin
of Port 0, and the output of the PGA is connected to the ADC input. (Actually, the
ADC, which is an SC block, can take analog input from Port 2. But on the kit, Port 2
is used for connecting the LCD. In this program, both the LCD and the ADC are to
be used, so the connection to the ADC is through the PGA.)
Figure 12.42 | Parameter selection for the PGA

ii) The ADC has a digital output, the number of bits of which is dependent on the
resolution of the ADC. There is no mechanism by which the digital values can be
obtained at a set of output port pins directly. But the converted digital value is
available in software, and API functions may be used to obtain it. One simple method is
to use the LCD API functions and display the ASCII value on an LCD which is
connected to Port 2 (on the development kit).
Example 12.8
Convert an analog voltage to digital form and display it on the LCD of the kit.
Solution
The interconnection diagram is shown in Figure 12.43. Observe that, P0_1 has been
chosen as the analog input pin.
Note:
i) Only positive analog voltages can be handled by the PSoC.
ii) For AC inputs which have positive and negative swings, the negative portion must
be clamped and made positive (before giving it as an input to the ADC).
iii) Similarly, analog AC outputs will have a positive bias. To remove this positive
offset, pass the output through a capacitor.
In the figure, we find that P0_1 is the input pin. The analog input is applied to a PGA
with a gain of 1. The output of this is fed to the selected ADC in the next row. The ADC
has no output in hardware. The ADC’s API functions are used to get the conversion done.
The digital output, in ASCII form, is displayed on the LCD.
Figure 12.43 | Realization of an ADC

Program
#include <m8c.h>        //part specific constants and macros
#include "PSoCAPI.h"    //PSoC API definitions for all User Modules

void main()
{
    int iData;
    M8C_EnableGInt;                                 //Enable global interrupts
    PGA_1_SetGain(PGA_1_G1_00);
    PGA_1_Start(PGA_1_MEDPOWER);
    ADCINC14_1_Start(ADCINC14_1_HIGHPOWER);         //Turn on analog section
    ADCINC14_1_GetSamples(0);                       //Start ADC to read
    LCD_1_Start();                                  //Initialize LCD
    for(;;)
    {
        while(ADCINC14_1_fIsDataAvailable() == 0);  //Wait for data to be ready
        iData = ADCINC14_1_iGetData();              //Get data
        ADCINC14_1_ClearFlag();                     //Clear data ready flag
        LCD_1_Position(0,5);                        //Place cursor at row 0, col 5
        LCD_1_PrHexInt(iData);                      //Print ADC data on the LCD
    }
}
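What ‘digital output in ASCII form’ means for a 16-bit value printed in hexadecimal can be seen from the sketch below, which converts a two-byte value into four hexadecimal characters. It is an illustration written for this explanation, with a hypothetical function name; it is not the source code of LCD_1_PrHexInt().

```c
#include <stdint.h>

/* Convert a 16-bit value into four ASCII hexadecimal digits,
   most significant digit first, followed by a terminating '\0'. */
void int_to_hex4(uint16_t value, char out[5])
{
    static const char digits[] = "0123456789ABCDEF";
    int i;
    for (i = 0; i < 4; i++) {
        unsigned shift = (unsigned)(12 - 4 * i);   /* 12, 8, 4, 0 */
        out[i] = digits[(value >> shift) & 0xFu];
    }
    out[4] = '\0';
}
```

A conversion result of 0x1A2F would therefore appear on the display as the four characters 1, A, 2 and F.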
12.9 | System Resources
Now, let’s make a brief study of what are called system resources. Refer to Figure 12.44.
Some of these have already been discussed earlier.
Figure 12.44 | The block diagram of system resources (digital clocks, two multiply accumulators, decimator, I2C, switch mode pump, internal voltage reference, system resets: POR and LVD)

i) Digital Clocks: This refers to the system of oscillators used as clocks
for the digital as well as the analog blocks. In Section 12.4.4, these sources were
explained as part of the requirement of the core. Keep in mind that the system core
requires a specific clock. Besides this, the user modules need clock signals, and a
chain of multipliers and dividers gives us the option to choose the value we need.
ii) Multiply Accumulate (MAC): For DSP applications, ‘multiply and accumulate’
is commonly needed. PSoC is not a DSP processor, but it provides DSP support
through this hardware MAC unit. ‘Multiply and accumulate’ is a functionality
commonly used for convolution and correlation operations.
Accumulation of products is a feature that is implemented on top of simple
multiplication. When using the MAC to accumulate the products of successive
multiplications, two 8-bit signed values are used for input. The product of the
multiplication is accumulated as a 32-bit signed value. The user has the choice to either
cause a multiply accumulate function to take place or a multiply only function.
Refer to Figure 12.45.
iii) POR, LVD and System Resets: POR is ‘power on reset’ and LVD is ‘low voltage
detect’, the same as the brown out reset which is explained in Section 2.2.3. For LVD,
the threshold for reset can be selected by the user.
iv) Power on Reset (POR): PSoC has an active low POR. The circuitry for POR is
inside the chip. When the supply voltage rises to a specified level, the chip enters the
operational state.
Reset can occur from other triggers as well. One is the watchdog timer
(refer to Section 2.2.6). Reset can also be caused by an internally occurring error; for
instance, if during the boot sequence it is found that the contents of flash are not
valid, the chip is reset.
v) Internal Voltage References: The reference voltages are generated by the circuit
of Figure 12.46. The buffer circuit provides gain to the bandgap voltage, to produce a 1.30 V
reference. The figure shows an opamp in the noninverting configuration
with a gain of $(1 + R_F/R)$. At the selected point of the potentiometer, any voltage
above the bandgap voltage can be obtained and used as the reference voltage $V_{REF}$.
vi) Decimator: This is also a hardware unit, mostly used for filters, i.e., in DSP applications.
Figure 12.45 | The MAC unit (inputs MULx_X or MACx_X and MULx_Y or MACx_Y, 16-bit multiplier output, 32-bit accumulator)

Figure 12.46 | The circuit for generating reference voltages for analog blocks
Figure 12.47 | The switched mode pump for a 20 pin PSoC chip
vii) I2C: The I2C is a simple two wire protocol (refer to Section 5.2.1) used for serial
communication between ICs. It is available as one of the user modules in the digital
communication section. We can place this user module on a digital communication
block, select the port pins and then realize I2C communication using the APIs
supplied by the user module. Besides that, there is dedicated hardware for
I2C, called I2CHW, available as one of the system resources. Using this hardware
makes I2C realization very simple and effective. Appendix J contains a detailed
description of how this resource is used to facilitate communication between PSoC
and an EEPROM.
viii) Switched Mode Pump: This is used when the PSoC operates on battery supply.
A single battery voltage of 1.5V can be boosted to a level sufficient to power the
PSoC. (In practice, it is found that the current may not be sufficient to supply
other chips in the circuit, but a PSoC chip and some passive sensors can run on
this boosted power supply.)
Figure 12.47 shows the circuitry associated with the switched mode pump. A battery and
an inductor are connected in series at the SMP pin of the chip. A bypass capacitor of at
least 0.1 μF must be connected between VDD and VSS. The inductor is charged when

the internal SMP switch is on. When this switch is turned off, flyback action occurs
and the inductor energy is released into the bypass capacitor. This is repeated
periodically (at 1.3 MHz), charging the capacitor, and thus the required power supply
voltage is obtained at the VDD point. Note that the direction of the diode is such
that the capacitor is charged to a positive voltage.
Note: Once the pump is running, the battery voltage may drop as low as 1V. However, if
the circuit is switched off, reliable start-up is not guaranteed unless the battery
voltage is at least 1.1V.
12.10 | PSoC3 and PSoC5
These two versions are the advanced versions of PSoC, to be used for more advanced
applications. As mentioned in Section 12.2, they are different from PSoC1 in the core
architecture as well as in the IDE to be used. Appendix F gives step-by-step
instructions on how to get started with PSoC Creator, which is the IDE for PSoC3 and 5.
(This Appendix is available on the website of the book.)
12.10.1 | PSoC3
PSoC3 has an enhanced 8051 core. This means that its instruction set is the same as the
standard 8051, but it has a few enhancements which are as follows:
i) Pipelined RISC architecture that executes up to ten times faster than the industry
standard 8051
ii) Most instructions executed in one or two cycles
iii) 256 bytes of internal data RAM
iv) Dual DPTR extension to the standard 8051 architecture
v) 24-bit external data space that enables access to on-chip memory and registers, and
to off-chip memory
vi) New interrupt interface that enables direct interrupt vectoring
vii) New special function registers (SFRs) that enable fast access to PSoC3 I/O ports
12.10.2 | PSoC5
This has the ARM Cortex-M3 architecture with the following features:
i) Enhanced ARM v7 architecture, with
• Thumb-2 instruction set
• 16- and 32-bit instructions (no mode switching)
• 32-bit ALU; hardware multiply and divide
ii) Single cycle 3-stage pipeline; Harvard architecture
(Refer to Chapter 10 for more information on the ARM architecture.)
12.10.3 | Peripherals
Figure 12.48 shows the peripheral structure of PSoC3 and 5. It shows the digital
building blocks (designated as ‘universal digital blocks’), the analog blocks, the core with the
clocks, power management module, debug and trace and other modules which are not

present in PSoC1. There are also a number of peripherals, such as DMA, DAC, 20-bit ADC
and USB, which are not available in PSoC1. These two versions of PSoC can thus
provide a great amount of functionality for high end applications.
Conclusion
With this, we come to the end of our discussion on PSoC, in which PSoC1 has been
explained in detail, while the higher versions have only been introduced. The discussion
on PSoC1 is neither exhaustive nor complete. It is only meant to give an introduction
to PSoC and an encouragement to those who want to try something new and innovative.
There are a large number of peripherals which have just been mentioned. It is up to
the user to find the right peripherals for the application and to study the user manual
for them before getting into the programming. The point to note is that the whole process
of designing an embedded system becomes very simple if PSoC is used. For high end
designs, PSoC3 and PSoC5 can be used in a similar way.
PSoC is a very popular product of Cypress Semiconductor, which has a very impressive line of applications
PSoC1 and PSoC3 are 8-bit cores, while PSoC5 is a 32-bit core
This chapter deals with PSoC1 devices of the CY8C29xxx series
Figure 12.48 | UDBs of PSoC3 and PSoC5
[Block diagram labels: Cortex M3/8051 core, clocking system, power management with power
boost from 0.5V, DMA, EEPROM, SRAM, flash, interrupt controller, JTAG/SWO debug and trace,
CAN 2.0, FS USB 2.0 and I2C; programmable routing and interconnect with GPIO and SIO;
analog subsystem with analog blocks, CMP, CP-AMP, DAC, SAR ADC and Del-Sig ADC blocks;
digital subsystem with UDBs, timer/counter/PWM blocks and a digital filter block]

The internal structure of PSoC1 has the core, peripheral registers and memory
Besides these, configurable digital and analog blocks are available
Designing systems is very easy because of the use of a GUI based IDE
Interconnections are made between the digital blocks and analog blocks, and the design is then burned onto the PSoC device
Programming is done in C, using pre-designed APIs
Analog blocks use switched capacitors instead of resistors
PSoC3 and PSoC5 are meant for high end applications
QUESTIONS
1. List three features of PSoC that make it stand out in the crowd of MCUs available in
the market.
2. How are PSoC3 and PSoC5 different from PSoC1?
3. What is the function of SROM?
4. Explain why a number of frequencies are provided by PSoC. How are these different
frequencies generated?
5. What is the use of the sleep timer?
6. In the GUI of the IDE, digital blocks are named as DBB and DCB. What is the difference
between the two?
7. List six peripheral functions realizable by the digital blocks.
8. In the GUI, what is the function of the GI lines? How are they connected to the port pins?
9. How do the input and output multiplexers allow connections between port pins and
digital blocks?
10. Which are the sources that can be used as clocks for the digital blocks?
11. With respect to the output circuit of a port pin, explain how resistive pullup and pulldown
are achieved.
12. What is meant by a ‘strong’ drive?
13. Explain the naming convention used for PSoC APIs.
14. Which of the analog blocks use switched capacitors in place of resistors?
15. Suggest two cases where analog devices need ‘reference voltages’.
16. In Example 12.8, what is the technique by which the analog voltage, after being
converted to digital form, is displayed on the LCD?
17. Where are the following two resources likely to find applications?
(i) Decimator
(ii) MAC unit
18. What is special about the I2C block found in the list of system resources?
19. How does the switched mode pump work?
20. List a few peripherals available in PSoC 3 and 5 which are not present in PSoC1.

EXERCISES
1. Using PSoC Designer, show the interconnect structure for realizing the following.
i) One 8-bit timer
ii) One 16-bit PWM
iii) One SPI unit
The input and output pins should also be selected.
2. Show the interconnect structure for one LED, and for a set of four LEDs.
3. Show the interconnect structure and write a program for realizing an amplifier with a gain
of 12.
4. Implement a D to A converter. Show the interconnects and write the required program.
5. Generate square waves using
i) An 8-bit timer
ii) A 16-bit timer
6. Generate PWMs of three different frequencies and duty cycles.
7. Realize a comparator using analog blocks.

13 | The 8051 Microcontroller: The Programmer’s Perspective
Chapter-opening image: An 8051 chip.
The architecture of 8051 from a programmer’s perspective
The addressing modes of 8051
The complete instruction set of 8051
How to write assembly language programs using the Keil assembler
Introduction
In this chapter, we will start our study of the very popular microcontroller 8051. The
programming model is discussed here, while the internal peripherals and their programming
will be discussed in Chapter 14. Some aspects of this MCU have been used in Chapter 2
while discussing the general aspects of embedded systems. Thus, a bit of overlap may be
felt, but the perspective is different. Assembly language programming with worked out
examples is extensively discussed in this chapter and Chapter 14. Programming using
Embedded C for 8051 is included in Chapter 9.
13.1 | History and Family Details of 8051
Intel produced the 8051 microcontroller in the year 1981, calling it Intel MCS-51; the
technology used for it was NMOS. CMOS technology has since taken over, and with
this, relatively low-power versions of 8051 are also available. Intel stopped
manufacturing this chip in 2007 but allowed other manufacturers to make it. Thus, we
have this chip in different versions and packaging, and from different manufacturers.
Chips manufactured by Atmel, Philips and Maxim (originally Dallas Semiconductor) are
available with varying numbers of peripherals inside. Let’s review this extremely popular
family of 8-bit microcontrollers. The important features of 8051 when it was first
designed by Intel are as follows:
i) 8-bit data bus
ii) 16-bit address bus
iii) 4 register banks
iv) 32 general purpose registers each of 8 bits
v) A 16-bit program counter (PC) and data pointer (DPTR)
vi) 4 KB on chip program memory (ROM)
vii) 128 bytes on chip data memory (RAM)
viii) Two 16-bit timers
ix) Four 8-bit ports
x) 3 internal and 2 external interrupts
xi) One serial communication port (UART)
xii) Bit as well as byte addressable RAM area of 16 bytes
xiii) 12 clock cycles to constitute one machine cycle
Because of different manufacturers, many versions of the 8051 with different speeds
and differing amounts of on-chip ROM are now found in the market. What is important
is that although there are different flavours of the 8051, they are all compatible
with the original 8051 as far as the instruction set is concerned. Figure 13.1 shows the
internal components of an 8051 chip.
Figure 13.1 | Building blocks of a generic 8051
[Block diagram labels: CPU, OSC, interrupt control (external interrupts), on-chip ROM for
program code, on-chip RAM, Timer 0 and Timer 1 (counter inputs), bus control, 4 I/O ports
(P0, P1, P2, P3), serial port (TXD, RXD), internal bus]

13.1.1 | Other Members of the Family
Two members of the family which stand out (because of being different) are the 8052
and 8031. The latter, i.e., 8031, does not have internal ROM and so is frequently denoted
as a ‘ROM-less’ 8051; it needs external ROM to hold the program designed for it. The 8052
differs in having more RAM (128 bytes more) and an extra timer.
When you try to buy an 8051 chip, it is likely that you will not find such a part
number available. This is because 8051 has been given different numbers based on the
type of ROM inside; if the ROM is UV PROM, the version is denoted 8751. If it is
flash memory that is present, it is called 8951. The letter ‘C’ is added to the chip name
when it is based on CMOS technology rather than the NMOS used by Intel in the beginning.
Thus, 89C51 is the chip that we might get to buy. Atmel is a popular manufacturer
of 8051. Table 13.1 displays a partial list of its 8051 versions.
Consider the name AT89C51-12PC, where ‘AT’ stands for Atmel, ‘C’ for CMOS,
‘12’ indicates 12 MHz, ‘P’ is for plastic DIP package, and the final ‘C’ for commercial.
Note that 2051 and 1051 are scaled down versions of 8051, with fewer I/O pins, timers,
etc. There are versions with the full 64K ROM available, and the clock frequency also
varies from 4 to 40MHz.
Another major producer of the 8051 family is NXP, founded by Philips Corporation.
This manufacturer has a very large collection of 8051 microcontrollers. Their products
include features such as A-to-D converters, D-to-A converters, extended I/O, and both
OTP (One Time Programmable ROM) and flash ROM. As an exercise, you can try to
find out the extra and extended peripherals available in the version P89C51RD2
manufactured by NXP.
13.1.2 | Learning the Features of 8051
Once the 8051 is understood well, learning any other microcontroller will be very easy.
Our approach will be to understand the 8051 first, from a programmer’s point of view
and then to enlarge this view by learning its hardware and interfacing features. This
chapter is concerned with the programming aspects only.
13.2 | 8051: The Programmer’s Perspective
Let us look at the 8051, as a programmer needs to know it. Figure 13.2 shows the internal
blocks which a programmer needs to know in order to do coding. We will examine it block by block.
Table 13.1 | A List of Some 8051 Versions from Atmel
Part Number   ROM   RAM   I/O Pins   Timers   Interrupts   VCC   Packaging (pins)
AT89C51       4K    128   32         2        6            5V    40
AT89LV51      4K    128   32         2        6            3V    40
AT89C1051     1K    64    15         1        3            3V    20
AT89C2051     2K    128   15         2        6            3V    20
AT89C52       8K    256   32         3        8            5V    40
AT89LV52      8K    256   32         3        8            3V    40

In this figure, the bus connecting the blocks is not shown, but keep in mind that
the data bus is 8 bits wide, while the address bus is 16 bits wide. These buses are inside
the chip.
13.2.1 | Eight-bit Registers of 8051
The 8051 is an 8-bit microcontroller. This means that it can handle a maximum data
width of 8 bits only. This also implies that all its data registers are 8 bits long.
The Accumulator Register ‘A’ The most important data register is the A register which
acts as the ‘accumulator’. The A register must hold one of the operands for arithmetic
instructions such as addition and subtraction. The other operand may be in memory (RAM)
or in any other register.
Register B The register B is not a frequently used register, because it serves as an
operand only for specific operations, namely multiplication and division. For example,
for the multiplication of two numbers, one operand should be in A, and the other should
be in B. The same is the case for division. Otherwise, B can be used simply to store data.
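As an illustration of how A and B are used together, the following sketch multiplies and then divides two numbers. The operand values are assumptions chosen for the example. MUL AB leaves the low byte of the 16-bit product in A and the high byte in B, while DIV AB leaves the quotient in A and the remainder in B.

```asm
        ORG 0
        MOV A, #25H     ;first operand (37 decimal) in A
        MOV B, #12H     ;second operand (18 decimal) in B
        MUL AB          ;product is 029AH: A = 9AH (low byte), B = 02H (high byte)
        MOV A, #95      ;dividend (decimal 95) in A
        MOV B, #10      ;divisor (decimal 10) in B
        DIV AB          ;A = 9 (quotient), B = 5 (remainder)
        END
```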
Register Banks There is a set of eight registers named R0 to R7, which act as general
purpose data registers. Actually, there are four sets of such registers, each set being called
a register bank. But, at a particular time, only one bank is operational. In conclusion, we
can say that the general purpose registers available for data manipulation at any time are
A, B and the current bank of eight registers. See Figure 13.3 which shows this set.
Processor Status Word (PSW) This is an 8-bit register which contains the flag bits, and
also has the bits that permit ‘bank switching’. We will see it in detail in the forthcoming
sections.
Stack Pointer (SP) This is an 8-bit register which stores the address of the top of the
stack.
Figure 13.2 | The 8051 architecture from a programmer’s point of view
[Block diagram labels: Reg A (8), Reg B (8), PSW (8), SP (8), PC (16), DPTR (16); RAM
comprising user RAM, register banks and special function registers; ROM; ports]

13.2.2 | Internal Memory
Internal RAM The internal data memory space of the 8051 is 256 bytes, but the upper half
of it is reserved for the ‘special function registers’ (SFRs), that is, the registers which
are used to handle the activities of the peripherals of the device. We will discuss the
SFRs in Chapter 14. The remaining 128 bytes is what is referred to as internal RAM, and
it is divided into parts. The first 32 bytes act as register banks 0 to 3; each bank
contains eight data registers named R0 to R7. These registers are used for data
manipulations and data movement. At a time, only one of these banks is operational. It is
possible to switch from the current bank to another bank by using two bits of the PSW.
By default, bank 0 is the current bank. The remaining area of RAM is used simply as user
RAM, and its locations are accessed using their addresses. We will see the details as we
go into programming.
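The bank switching described above can be sketched as follows. The sketch assumes the standard PSW layout, in which the two bank select bits RS0 and RS1 are PSW.3 and PSW.4 (the PSW itself is covered in detail later); the data values are arbitrary. Register Rn of bank b sits at RAM address 8 × b + n, so the same name R0 refers to a different RAM location after each switch.

```asm
        ORG 0
        MOV R0, #11H    ;bank 0 is current on reset: 11H goes to RAM address 00H
        SETB PSW.3      ;RS0 = 1 while RS1 = 0: bank 1 becomes the current bank
        MOV R0, #22H    ;R0 now means RAM address 08H
        SETB PSW.4      ;RS1 = 1 and RS0 = 1: bank 3 becomes the current bank
        MOV R0, #33H    ;R0 now means RAM address 18H
        CLR PSW.3       ;RS0 = 0 while RS1 = 1: bank 2 is current for a moment
        CLR PSW.4       ;RS1 = 0 and RS0 = 0: back to bank 0
        END
```

After this runs, RAM addresses 00H, 08H and 18H hold 11H, 22H and 33H respectively.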
Internal ROM All versions of 8051 contain some amount of ROM (except the
8031). ROM is used to store the final code that an application needs. Once a design
is tested and finalized, the code is burned into ROM. Flash ROM is the type that is
used nowadays. A part of ROM can store data also. There are instructions that allow
a program to read this data from ROM. The amount of ROM available
varies between chips, but the maximum possible is 64K (because the address bus is
16 bits).
13.2.3 | Sixteen-bit Registers
Program Counter (PC) This is an address register which ‘sequences’ instructions.
This means that at any time, it points to the address of the next instruction to be
fetched. Instructions are stored in ROM—hence the PC is a pointer to ROM—the
maximum size of ROM is 64K, and the size of the PC is 16 bits. See Figure 13.4. On
reset, the content of PC is 0, meaning that the first instruction is always taken from
the address 0000.
Figure 13.3 | General purpose registers
[Registers shown: A, B, R0, R1, R2, R3, R4, R5, R6, R7]

Data Pointer (DPTR) This is also an address register, and so it is 16 bits in size. However,
it can be used as two 8-bit registers, DPH and DPL, with H and L standing for high
and low respectively, that is, the most significant and least significant bytes of the
DPTR. See Figure 13.5.
The DPTR is used as a pointer for accessing internal ROM, and also for external
memory (if added to the chip).
13.2.4 | Ports
There are four 8-bit ports for 8051, and they are named port 0 to port 3. There are
corresponding port pins, and these are used to connect to the peripherals. The ports are
designated as P0, P1, P2 and P3 and can be used with such designations in programs. They
can also be designated by their addresses in the SFR space.
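A brief sketch of the two ways of naming a port follows. It relies on the standard SFR addresses of the ports (P0 at 80H, P1 at 90H, P2 at A0H and P3 at B0H), which are not listed in this section; the data value is arbitrary.

```asm
        MOV P1, #55H    ;write 55H to port 1, using its name
        MOV 90H, #55H   ;the same operation, using the SFR address of P1
        MOV A, P2       ;read port 2 into A, using its name
        MOV A, 0A0H     ;the same read, using the SFR address of P2
```

Using the names is normally preferable, since the assembler translates them to the same addresses and the program remains readable.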
13.3 | Assembly Language Programming
We have had just a glance at the internal structure of 8051, but with this, we can start
programming. This will help to get a better view of the chip and its use. Also, as we do
assembly language programming, we will get a fuller view of the internal architecture.
The general format of an assembly language instruction line is
LABEL: INSTRUCTION ;COMMENTS
In this, the label and comments are optional. The instruction consists of the opcode
and the operands. The comment field needs a semicolon to indicate its presence.
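For instance, a line with all three fields, and lines with the optional fields left out, might look as follows (the label name START is only illustrative):

```asm
START:  MOV A, #05H     ;label, instruction and comment all present
        MOV R0, A       ;no label, but a comment
        INC A
```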
For programming the 8051, different assemblers are available. In this book, all
programs are tested using the Keil assembler, the details of which can be looked up in
Appendix B. The evaluation version of this is freely downloadable by looking up ‘Keil
microvision 4’ or trying the link https://www.keil.com/demo/eval/c51.htm.
13.3.1 | Modes of Addressing
As in the case of any processor, the 8051 also has various modes of addressing. Here we
use the ‘move’ instruction to illustrate the different modes. The general format of this
instruction is
MOV destination, source
The source and destination can be registers or memory. There is also the possibility
of the source being a data item, that is, an immediate number.
Figure 13.4 | The program counter
[Program Counter (16)]
Figure 13.5 | The data pointer register
[DPH (8), DPL (8)]

13.3.1.1 | Register Addressing
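In this mode, both the source and destination are registers. The registers that can be used
are A, B and the registers from R0 to R7. Examples of this mode are as follows:
MOV A, R1 ;copy the content of register R1 to register A
MOV R3, A ;copy the content of A to R3
MOV R7, B ;copy the content of B to R7
Note that the 8051 has no instruction that copies one of the registers R0 to R7 directly
to another; a line such as MOV R3, R1 is rejected by the assembler. Such a copy has to go
through A, or use the RAM address of the source register (see Section 13.3.1.3).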
13.3.1.2 | Immediate Addressing
In this mode, the source is a data item and is indicated by preceding the data by a ‘#’
symbol, indicating ‘immediate’. Data can be copied to a register or a memory location,
using this mode.
MOV A, #23H ;copy 23H to A
MOV 34H, #2FH ;copy 2FH to the RAM address 34H
MOV DPTR, #0F34CH ;copy F34CH to DPTR
The 16-bit DPTR can also be loaded with immediate data by treating it as two 8-bit
registers, DPH and DPL:
MOV DPH, #0F3H ;copy F3H to the upper part of DPTR, i.e., DPH
MOV DPL, #4CH ;copy 4CH to the lower part of DPTR, i.e., DPL
Note
i) Any number is treated as a decimal number by default. Hexadecimal numbers are to be
written suffixed with an ‘H’ or prefixed with 0x. For example, 91 is treated as a decimal
number. If it is to be considered as a hexadecimal number, it is to be written as 91H or
0x91 in programs.
ii) Hexadecimal numbers starting with the symbols ‘A to F’ must be preceded by a 0.
Otherwise, the assembler will indicate a syntax error. For example, FEH is to be written
as 0FEH in any program.
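The effect of these two rules can be seen in the short sketch below (the values are arbitrary):

```asm
        MOV A, #91      ;decimal 91, so A = 5BH
        MOV A, #91H     ;hexadecimal 91, so A = 91H
        MOV A, #0FEH    ;leading 0 needed because the number starts with F
;       MOV A, #FEH     ;rejected by the assembler (no leading 0)
```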
13.3.1.3 | Direct Addressing
In this mode, one of the operands is in a memory location. For the case of the
MOV instruction, the content of a memory location is moved to a register or vice versa.
The memory that we have is either RAM or ROM. Now, we will look into the case of
addressing RAM locations. The case of ROM will be examined in Section 13.3.1.5.
Before we write any instructions, let’s examine the RAM structure of 8051. The
internal RAM area is 128 bytes with addresses 00 to 7FH. See Figure 13.6 which shows
the RAM structure.
The first 32 bytes of RAM constitute the four register banks. For example, examine
bank 0, which is the default register bank. There are eight registers named R0 to R7. Since
they are part of the internal RAM, they have addresses 00 to 07 also. See Figure 13.7.
Thus, the instruction MOV A, R2 can be written as MOV A, 02 as well. In both cases,
the same content is being addressed.

Figure 13.6 | The subdivisions of internal RAM
Addresses     Use
30 to 7F      General purpose RAM
20 to 2F      Bit addressable RAM
18 to 1F      Bank 3
10 to 17      Bank 2
08 to 0F      Bank 1
00 to 07      Bank 0
Similarly MOV A, R0 is the same as MOV A, 00
MOV 06, A is the same as MOV R6, A
MOV B, 04 is the same as MOV B, R4
MOV 00, 03 copies R3 to R0 (there is no MOV R0, R3 instruction)
If bank 0 is the current bank, the other banks are just RAM locations. See the
following instructions:
MOV 0FH, A ;copy the content of A to RAM location 0FH
;(this is R7 in bank 1)
MOV R1, 09 ;copy the content of RAM location 09 (R1 in bank 1) to R1
MOV 70H, R4 ;copy the content of R4 to RAM location 70H
;(beyond the banks)
MOV 17H, 14H ;copy the content of RAM location 14H (R4 in bank 2)
;to 17H (R7 in bank 2)

Figure 13.7 | Register banks and their RAM addresses
Working register   Bank 0   Bank 1   Bank 2   Bank 3
R7                 07       0F       17       1F
R6                 06       0E       16       1E
R5                 05       0D       15       1D
R4                 04       0C       14       1C
R3                 03       0B       13       1B
R2                 02       0A       12       1A
R1                 01       09       11       19
R0                 00       08       10       18

Example 13.1
Run the following program and explain what happens after the execution of each line
of the program.
ORG 0
MOV 0FH,#45H
MOV A,0FH
MOV R1,A
MOV 13H,#0FEH
MOV 12H,13H
END
Solution
i) The first line ORG 0 shows the ‘origin’ of the program. It indicates that this program
is to start from 0, that is, the address from which the first instruction will be
taken up for execution.
ii) The last line END is also mandatory for any assembler. It tells the assembler to stop
reading beyond this line.
iii) For each of the program lines, the result of execution is shown in the comments
field.
MOV 0FH,#45H ;copy data 45H to the RAM address 0FH
MOV A,0FH ;copy the content of 0FH to A
;now register A contains 45H in it
MOV R1,A ;copy the content of A to R1
;now the register R1 contains 45H
MOV 13H,#0FEH ;move data FEH to RAM address 13H
MOV 12H,13H ;copy the content of 13H to 12H
;after program execution, both the RAM
;addresses 12H and 13H contain FEH
13.3.1.4 | Register Indirect Addressing
There is an indirect method of addressing data which is in RAM, by using certain
registers as pointers to the address. However, only registers R0 and R1 are allowed to be
used as pointers.
Consider two data items residing in RAM locations 25H and 30H. R0 and R1 can
be made to act as pointers to these data, by loading the address values in them. See the
code snippet given as follows. The content of these RAM locations can be copied to
registers A and B using the notation ‘@’ along with R0 and R1.
MOV R0, #25H ;load 25H into R0
MOV R1, #30H ;load 30H into R1
MOV A, @R0 ;copy to A, the contents of RAM pointed by R0
MOV B, @R1 ;copy to B, the contents of RAM pointed by R1

Data can also be copied to RAM locations ‘indirectly’, by using the pointer registers
R0 and R1. But registers R0 to R7 cannot be used as destination registers when the
source is addressed indirectly.
MOV R7, @R0 ;this gives a syntax error
However, using the RAM address 07 for R7 does not give any error
MOV 07, @R0 ;this copies to 07 (the address of R7) the contents of
;RAM pointed by R0
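Indirect addressing is most useful when the same operation has to be repeated over a block of RAM, since the pointer can be advanced in a loop. The following is a minimal sketch that clears 16 bytes starting at 30H. The block address, the count and the label NEXT are assumptions made for the example, and the INC and DJNZ instructions are used here only for illustration.

```asm
        ORG 0
        MOV R0, #30H    ;R0 points to the first byte of the block
        MOV R2, #16     ;R2 counts the bytes still to be cleared
NEXT:   MOV @R0, #00H   ;clear the byte pointed to by R0
        INC R0          ;advance the pointer to the next RAM address
        DJNZ R2, NEXT   ;decrement R2 and repeat until it reaches zero
        END
```

With direct addressing, the same task would need one MOV instruction for each of the 16 addresses.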
Example 13.2
Examine the following program, and find out which registers and which RAM locations
contain the data 7EH and EFH, after program execution. What else will be noticed?
ORG 0
MOV 25H,#7EH
MOV 30H,#0EFH
MOV B,#88H
MOV R0,#25H
MOV R1,#30H
MOV A,@R0
MOV 07,@R0
MOV 05H,@R0
MOV 26H,@R1
MOV @R0,B
END
Solution
In this program, R0 acts as a pointer to the data 7EH, which is in RAM address 25H.
After program execution, registers A, R5 and R7 contain the data 7EH. The program
does not name R5 and R7 as destinations for this data; instead, their RAM addresses
05 and 07 have been used. In the Keil debugger, this data will be seen in the register list
as well as in the RAM addresses (they mean the same).
The data EFH (in address 30H) is pointed to by R1. The program copies this data
to RAM address 26H, which is beyond the addresses of the register banks (see
Figure 13.7). Register B is loaded with 88H and does not receive EFH.
The last line of the program treats R0 as a pointer to a destination. The content of
B is moved to this destination. Since R0 contains 25H, 88H will be found in the RAM
address 25H, replacing the 7EH that was stored there.
13.3.1.5 | Indexed Addressing
This is a type of addressing applicable to ROM alone. It was mentioned earlier that
ROM stores program code, but it can also be used to hold some data. Usually
tables and some constants are stored there, to be used by programs. If we want to
retrieve some of these data items, we cannot access ROM using any of the methods so
far discussed.

The maximum size of ROM is 64K, and so addresses of ROM locations can be
16 bits long. In indexed addressing mode, the DPTR is used as a base address, and the
accumulator is used as an offset. The effective address, formed by adding the value of
the base address to the offset, is the ROM address to be accessed. The MOVC
instruction is tailor-made for ROM access.
MOVC A, @A+DPTR is the instruction which loads the content of the ROM location with
effective address A+DPTR into the A register.
Consider that the ROM location 0500H is to be accessed. The instructions needed
for getting data from the address are as follows.
MOV A, #0
MOV DPTR, #0500H
MOVC A, @A + DPTR
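The offset in A becomes useful when a table stored in ROM has to be looked up. The sketch below reads the fourth entry of a small table of squares. The table, its label SQUARES and its address 0300H are hypothetical, and DB is the assembler directive that places data bytes in ROM.

```asm
        ORG 0
        MOV DPTR, #SQUARES  ;base address of the table
        MOV A, #3           ;offset of the wanted entry (entries count from 0)
        MOVC A, @A+DPTR     ;A = content of ROM address 0303H, i.e., 09H
HERE:   SJMP HERE           ;stay here so that execution does not run on
        ORG 0300H
SQUARES: DB 0, 1, 4, 9, 16, 25
        END
```

Changing only the value loaded into A selects a different entry, while the DPTR keeps pointing to the start of the table.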
13.4 | Internal RAM
Before we go into details of active programming, it will be worthwhile to take a second
look at the different subdivisions of internal RAM.
Figure 13.8 shows the memory map of the 256 bytes of internal RAM that the
8051 possesses. The upper 128 bytes from addresses 80H to FFH are addresses of the
special function registers, which cater to the operation of the peripherals. Incidentally,
the addresses of the A, B and SP (Stack Pointer) registers are also in this space.
See Table 13.2. In programs, these addresses may be used instead of the names (but that
is not normally necessary). The details of the SFRs are discussed in Chapter 14.
Figure 13.8 | Address map of internal RAM
Addresses      Use
80H to FFH     SFR
00 to 7FH      Internal RAM (128 bytes), made up of:
               General purpose RAM (80 bytes)
               Bit addressable RAM (16 bytes)
               Register banks (32 bytes)
Table 13.2 | Addresses of Important Registers
Register Name Address
Accumulator (A) E0H
B F0H
Stack Pointer (SP) 81H

Now look at Figure 13.8 once again. The 16 RAM addresses from 20H to 2FH
are stated to be bit addressable. Figure 13.9 shows the addresses of the ‘bit’ locations.
Note that each ‘byte’ of this RAM area has an address. Within that byte, there
are eight ‘bit’ addresses too. For example, the bits of the byte location with address 20H
have eight different bit addresses, from 00 to 07.
How do we differentiate between bit addresses and byte addresses?
For example, 03 is a bit address, but there is also a byte address 03.
The trick is that the two use different instructions. The instructions that are applicable
to bit addresses include SETB, CLR, JB and JNB. The following instructions refer to the
‘bit address’ 03.
SETB 03
CLR 03
JB 03, NOPE
For byte addresses, instructions like the following apply.
MOV 03, #45H
MOV A, 03
MOV 03, R4
It is obvious that a byte is referred to here.
Figure 13.9 | Addresses of the bit addressable locations of RAM
Byte Address   Bit Addresses
2F             7F 7E 7D 7C 7B 7A 79 78
2E             77 76 75 74 73 72 71 70
2D             6F 6E 6D 6C 6B 6A 69 68
2C             67 66 65 64 63 62 61 60
2B             5F 5E 5D 5C 5B 5A 59 58
2A             57 56 55 54 53 52 51 50
29             4F 4E 4D 4C 4B 4A 49 48
28             47 46 45 44 43 42 41 40
27             3F 3E 3D 3C 3B 3A 39 38
26             37 36 35 34 33 32 31 30
25             2F 2E 2D 2C 2B 2A 29 28
24             27 26 25 24 23 22 21 20
23             1F 1E 1D 1C 1B 1A 19 18
22             17 16 15 14 13 12 11 10
21             0F 0E 0D 0C 0B 0A 09 08
20             07 06 05 04 03 02 01 00

What can be done with one bit?
It can be ‘set’ or cleared; there are instructions available which will set or clear each
individual bit of these 16-byte locations, that is, there are 16 × 8 = 128 bit addressable
locations in the internal RAM.
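The relation between a bit address and the byte that contains it can be checked with a short sketch. Going by Figure 13.9, bit address b belongs to the byte at 20H + (b divided by 8, ignoring the remainder), and its position within that byte is the remainder. The bit address 0BH used here is just an example: it is bit 3 of byte 21H.

```asm
        ORG 0
        MOV 21H, #00H   ;clear byte 21H, which holds bit addresses 08H to 0FH
        SETB 0BH        ;set bit address 0BH, i.e., bit 3 of byte 21H
        MOV A, 21H      ;read the whole byte: A = 08H
        CLR 0BH         ;clear the same bit again
        MOV A, 21H      ;A = 00H
        END
```

The MOV instructions treat 21H as a byte address, while SETB and CLR treat 0BH as a bit address; the instruction decides which kind of address is meant.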
Byte Addressable RAM The rest of the RAM from 30H to 7FH is addressable only as
bytes. These locations can be read from or written into, using the MOV instruction in
the direct or indirect addressing mode as was seen in Sections 13.3.1.3 and 13.3.1.4.
13.5 | The 8051 Stack
All processors need a stack. A stack is a portion of memory which acts as a temporary
storage location for data which will be taken back later. The stack is a special type of
data structure, in that it can be accessed (read from or written to) only at the ‘top of the
stack’. The instructions used for this kind of access are PUSH (for writing to) and POP
(for reading from). The location of the stack is user defined, but on start up, the stack
pointer register contains the number 07.
The 8051 stack is an ascending stack. This means that as data is pushed in, it grows
upwards to increasing addresses. If the stack pointer contains the number 07, the stack
area is defined from 08 onwards. The first data item pushed in will be stored in 08, the
next in 09 and so on.
In practice, it is best to change the value of the stack pointer from 07 to a higher
location. This is because register bank 1 starts at 08, and using the stack there prevents
us from being able to use bank 1. For example, the instruction MOV SP, #42H defines
the stack from 43H onwards (upwards), by loading the number 42H into
the stack pointer.
13.5.1 | The Push Operation
Consider the case shown in Figure 13.10, where the SP value is 42H. Assume that
R3 contains the value xx and R5 contains the value yy.
Let’s do the exercise of pushing in the values of R3 and R5 onto the stack. The
natural way of pushing in would be to write the instructions PUSH R3 followed by
PUSH R5. But 8051 does not allow us to use register names in the PUSH instruction.
The address of these registers needs to be used. If register bank 0 is the one referred to,
the instructions to be used are
PUSH 03 ;push R3
PUSH 05 ;push R5
The stack pointer is incremented by each PUSH instruction, and at the end it
contains 44H: xx is stored at 43H and yy at 44H. Figure 13.10 shows the stack.
13.5.2 | The Pop Operation
Now, to pop the top of the stack into, say, register R7, the instruction to be used is
POP 07 ;pop R7

The SP value is decremented by one, to become 43H, and the data which was on top of
the stack, that is, yy, will now be in R7. See Figure 13.11.
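The push and pop sequence just described can be traced with a small Python model. It is a sketch under simple assumptions, not an emulator: the class name Stack8051 is invented for the illustration, register bank 0 is assumed, and AAH and BBH are placeholder values standing in for xx and yy.

```python
class Stack8051:
    """Toy model of the 8051 stack in internal RAM (an illustration only)."""

    def __init__(self, sp=0x07):
        self.ram = bytearray(128)   # internal RAM 00H-7FH
        self.sp = sp                # SP contains 07 on start up

    def push(self, direct_addr):
        self.sp = (self.sp + 1) & 0xFF              # SP is incremented first ...
        self.ram[self.sp] = self.ram[direct_addr]   # ... then the byte is stored

    def pop(self, direct_addr):
        self.ram[direct_addr] = self.ram[self.sp]   # the stack top is read ...
        self.sp = (self.sp - 1) & 0xFF              # ... then SP is decremented

cpu = Stack8051(sp=0x42)                     # MOV SP, #42H
cpu.ram[0x03], cpu.ram[0x05] = 0xAA, 0xBB    # R3 = xx, R5 = yy (placeholders)
cpu.push(0x03)                               # PUSH 03
cpu.push(0x05)                               # PUSH 05
sp_after_push = cpu.sp
cpu.pop(0x07)                                # POP 07
print(hex(sp_after_push), hex(cpu.sp), hex(cpu.ram[0x07]))
```

The trace shows SP moving from 42H to 44H and back to 43H, with the last value pushed (yy) being the first one popped.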
Example 13.3
Assume that register bank 2 is now the working bank. The content of registers R5 and R6
is to be pushed onto the stack, and these values are then to be popped out to A and B,
respectively. Write instructions for this.
Solution
PUSH 15H ;push the content of R5 of bank 2
PUSH 16H ;push the content of R6 of bank 2
POP 0F0H ;pop the content to B
POP 0E0H ;pop the content to A
Now let’s examine the salient points of this simple program:
i) The addresses of the registers have been used.
ii) The content of R5 is first pushed in, and then that of R6. After this, the stack top
contains the data that was in R6.
iii) The requirement is to copy the contents (through the stack) of R5 to A and R6
to B. Since the stack top contains the content of R6, the first POP operation is to
pop to B.
Figure 13.10 | PUSH operation for the 8051 stack (SP value 42H before the two pushes, 44H after)
Figure 13.11 | POP operation in an 8051 stack (SP value 44H before the pop, 43H after)

iv) Keep in mind that for a stack, what is pushed in last is what can be popped out first.
v) The second pop operation copies the stack top contents to A.
Note There is no need to use the stack to copy the contents of one register to another.
The program here uses this method only to illustrate the working of a stack.
13.6 | Processor Status Word (PSW)
This is a register with address D0H, which holds the conditional flags and also contains
the bits that allow the switching of register banks; see Figure 13.12.
This register (like many other SFRs) is bit addressable, meaning that each bit can
be set or cleared individually. To do that, the notation for each bit is as given in the
‘Bit Notation’ column of Table 13.3.
Now let’s discuss the bits of the PSW register.
The Carry Flag (CY) Bit D7, notated as PSW.7, is the carry flag. The carry flag (CY) is
set if there is a carry out from the most significant bit during a calculation. For example,
when the sum of an 8-bit addition does not fit in 8 bits, there is a carry out from
the MSB (D7), which causes this flag to be set. This flag is also set when there is a ‘borrow’
during subtraction.
The Auxiliary Carry Flag (AC) Bit D6, notated as PSW.6, is the AC flag. This flag functions
like the carry flag, except that the carry is from bit D3 into D4. It thus
indicates a carry out from the lower 4 bits. This flag is needed by the Decimal
Adjust (DA) instruction, which is important in BCD number calculations. No other
instruction uses this flag implicitly, and there is no dedicated conditional branch for it
(although, like any bit of the PSW, it can be tested with JB or JNB).
CY AC F0 RS1 RS0 OV -- P
Figure 13.12 | The PSW register
Table 13.3 | Bits of the PSW and their Functions
Symbol     Function                  Bit Notation   Bit Position
CY or C    Carry flag                PSW.7          D7
AC         Auxiliary carry flag      PSW.6          D6
F0         Flag 0 (user defined)     PSW.5          D5
RS1        Register bank select 1    PSW.4          D4
RS0        Register bank select 0    PSW.3          D3
OV         Overflow flag             PSW.2          D2
--         Reserved                  PSW.1          D1
P          Parity flag               PSW.0          D0

The Overflow Flag (OV) Bit D2, notated as PSW.2, is the overflow flag. This flag is set
under one of the following conditions:
i) There is a carry into the MSB (D7) from the bit of lower significance (D6), but no
carry out from the MSB.
ii) There is a carry out from the MSB, but no carry into the MSB.
This flag indicates that the result of a signed number operation is too large, causing
the higher order bit to overflow into the sign bit, thus changing the sign bit.
The Parity Flag (P) Bit D0, notated as PSW.0, is the parity flag. It always reflects the
content of the A register: the flag is set when A contains an odd number of ‘1’ bits, so
that the eight bits of A taken together with P always hold an even number of 1s (even
parity). For example, after a particular arithmetic or logic operation, if A contains the
number 11100111 (six 1s), the parity flag is cleared; if A contains 11100110 (five 1s),
the parity flag is set.
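The way the four flags follow from an 8-bit addition can be checked with a short Python model. This is a sketch of the rules stated above, not of the hardware; the function name add8 and the dictionary it returns are choices made for this illustration.

```python
def add8(a, src, carry_in=0):
    """Model an 8-bit add into A: return the result and the CY, AC, OV, P flags."""
    total = a + src + carry_in
    result = total & 0xFF
    cy = 1 if total > 0xFF else 0                                   # carry out of D7
    ac = 1 if (a & 0x0F) + (src & 0x0F) + carry_in > 0x0F else 0    # carry out of D3
    carry_into_d7 = 1 if (a & 0x7F) + (src & 0x7F) + carry_in > 0x7F else 0
    ov = cy ^ carry_into_d7            # set when exactly one of the two carries occurs
    p = bin(result).count("1") & 1     # 1 when the result in A has an odd number of 1s
    return result, {"CY": cy, "AC": ac, "OV": ov, "P": p}

print(add8(0x7F, 0x01))   # +127 + 1: signed overflow, no carry out
print(add8(0xFF, 0x01))   # carry out of D7, result 00H
```

Adding 01H to 7FH gives 80H with OV = 1 and CY = 0 (a carry into D7 only), while adding 01H to FFH gives 00H with CY = 1 and OV = 0 (carries both into and out of D7).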
Have you noticed that there is no zero flag (Z) for 8051?
Well, there isn’t a zero flag that a user has to take note of, but instead there are
instructions which check whether the result of an arithmetic operation has caused the A
register to be zero, and then take appropriate action. We will see them in the context of
our discussions on the instruction set and programming.
Bank Switching The bits RS1 and RS0 (PSW.4 and PSW.3) are used for bank switching.
There are four banks of registers designated as Bank 0, Bank 1, Bank 2 and Bank 3.
On start up, bank 0 is the default ‘current bank’. To switch to another
bank, refer to Table 13.4. When bank 0 is being used, RS1 and RS0 are 00. To use
bank 1, say, the instruction just needs to set PSW.3, that is, RS0. Similarly,
other banks can be selected by setting/clearing these two bits.
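Since each bank occupies eight consecutive bytes at the bottom of the internal RAM, the direct address of any register follows from the two select bits. The Python sketch below illustrates the arithmetic; the function name register_address is invented for the illustration.

```python
def register_address(rs1, rs0, n):
    """Direct RAM address of register Rn in the bank selected by RS1 and RS0."""
    if rs1 not in (0, 1) or rs0 not in (0, 1) or not 0 <= n <= 7:
        raise ValueError("RS1 and RS0 are single bits; n is 0 to 7")
    bank = (rs1 << 1) | rs0      # bank number 0 to 3, as in Table 13.4
    return bank * 8 + n          # each bank is 8 bytes long, bank 0 starts at 00H

print(hex(register_address(0, 0, 3)))   # R3 of bank 0
print(hex(register_address(1, 0, 5)))   # R5 of bank 2
```

This is the calculation behind the addresses 15H and 16H used for R5 and R6 of bank 2 in Example 13.3.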
Unused Bits of the PSW One bit, PSW.5, is available to the user to define as needed,
or to leave unused. The bit PSW.1 is reserved for future use.
13.7 | Assembler Directives
Very soon, we will learn the instruction set of 8051, and start the programming process
in earnest. So it is important to be aware of some of the important directives used
by a typical 8051 assembler. You should already know that directives are different from
instructions, in that they are non-executable statements. They just help the assembler by
providing certain important information.
Table 13.4 | Bit Values for Bank Switching
RS1 RS0 Register Bank
0 0 0
0 1 1
1 0 2
1 1 3

ORG ORG is a directive which means ‘origin’. In the context of assembly language
programming, it defines the starting address for any item (data or code) in the program
memory (ROM). We have already used the statement ORG 0 in Examples 13.1
and 13.2.
EQU This directive allows us to equate names to constants. The assembler just replaces
the names by the values mentioned.
Examples
COST EQU 34 ;equate the name COST to 34
PRICE EQU 56H ;equate the name PRICE to 56H
DB This directive stands for ‘data byte’ and places an 8-bit constant at the current
memory (ROM) location. If labels are used for these memory locations, a colon should
follow each label.
Examples
NUMBER: DB 67H ;store the hex no. 67H at the location NUMBER
FACT: DB 90 ;store the decimal no. 90 at the location FACT
STRNG: DB "MIST" ;store the ASCII string as bytes, the starting
;location's name being STRNG. Note that four bytes,
;corresponding to the four characters, get stored
LIST: DB 34, 09, 0EH, 0FEH
;store four bytes at addresses starting from
;the location named LIST
BIT This directive assigns a name to a bit address, that is, it labels a bit which can
then be set, reset or tested by name. Wherever the name is encountered in a program,
the assembler replaces it by the bit address specified.
Examples
FLIP BIT 0 ;the name FLIP refers to bit address 00H
TOG BIT 1 ;the name TOG refers to bit address 01H
LIFT BIT P0.1 ;the name LIFT refers to the port pin P0.1
REE BIT PSW.5 ;the name REE refers to the register bit PSW.5
END This indicates the end of the source file; the assembler does not read beyond it.
13.8 | Storing Data in Code Memory (ROM)
We know that what we store in ROM is program code, but data can also be stored in it and
read from it when needed. In fact, the ROM does not need to know what is stored; whatever
is stored is binary numbers, either way. But when we store data, it is needed for use by the
program, and therefore there should be a mechanism to read it and bring it to registers.

Example 13.4
See the following program, which illustrates the use of some of the directives discussed
above. Data is to be stored in ROM from addresses 0500H and 0800H onwards. Note that
there are no instructions in this program, but only directives. Give a brief explanation of
what is achieved by each of these lines.
ORG 0500H
MY: DB 89
YOU: DB 78, 56, 90
ORG 0800H
ALL: DB "5678"
SENTNC: DB "MY NAME"
END
Solution
This is only a set of directives. No instructions are involved, so there is nothing to
‘execute’. The data is simply burned into ROM from address 0500H and from address 0800H.
In the simulator, the data is found in code memory as soon as the program is assembled.
We find data as in Tables 13.5 and 13.6.
We find another set of data from address 0800H onwards.
Table 13.5 | Data in ROM from 0500H
Label     Address in HEX   Content in HEX   Explanation
MY        0500             59               Hex value of decimal 89
YOU       0501             4E               Hex value of decimal 78
          0502             38               Hex value of decimal 56
          0503             5A               Hex value of decimal 90
Table 13.6 | Data in ROM from 0800H
Label     Address in HEX   Content in HEX   Explanation
ALL       0800             35               ASCII value of 5
          0801             36               ASCII value of 6
          0802             37               ASCII value of 7
          0803             38               ASCII value of 8
SENTNC    0804             4D               ASCII value of M
          0805             59               ASCII value of Y
          0806             20               ASCII value of space
          0807             4E               ASCII value of N
          0808             41               ASCII value of A
          0809             4D               ASCII value of M
          080A             45               ASCII value of E
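What the assembler does with ORG and DB can be imitated in a few lines of Python. This is only a sketch of the idea of a ‘location counter’, not a description of how any particular assembler is written; the function lay_out and the tuple format (label, directive, operand) are assumptions made for the illustration.

```python
def lay_out(lines):
    """Place DB data at the addresses set by ORG; return (rom, label addresses)."""
    rom, labels, location = {}, {}, 0
    for label, directive, operand in lines:
        if directive == "ORG":
            location = operand                  # move the location counter
        elif directive == "DB":
            if label:
                labels[label] = location        # a label names the current location
            for item in operand:
                # a string contributes one byte per character, a number one byte
                data = item.encode("ascii") if isinstance(item, str) else bytes([item])
                for byte in data:
                    rom[location] = byte
                    location += 1
    return rom, labels

program = [
    ("", "ORG", 0x0500),
    ("MY", "DB", [89]),
    ("YOU", "DB", [78, 56, 90]),
    ("", "ORG", 0x0800),
    ("ALL", "DB", ["5678"]),
    ("SENTNC", "DB", ["MY NAME"]),
]
rom, labels = lay_out(program)
for address in sorted(rom):
    print(f"{address:04X}  {rom[address]:02X}")
```

Run on the directives of Example 13.4, the model produces the address and content columns of Tables 13.5 and 13.6.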

13.9 | The Instruction Set of 8051
Now, let’s start the assembly language programming of 8051, by first understanding the
instruction set. The instructions can be divided into functional groups as follows:
i) Data transfer instructions
ii) Bit manipulation instructions
iii) Branch instructions
iv) Port manipulation instructions
v) Arithmetic instructions
vi) Logical instructions
vii) Call and return instructions
13.9.1 | Data Transfer Instructions
In any processor, moving data is of primary concern. There is a destination and a source
for the movement. Data is moved between registers, memory and ports. Table 13.7 lists
the data transfer instructions of 8051.
Now let’s have a more detailed discussion on how and when each of these instructions
is to be used.
13.9.1.1 | MOV—Move
Usage: MOV dest, src
For 8051, the data is 8 bits in size, so it is 8-bit data that is moved (the one exception
among the examples is MOV DPTR, which loads a 16-bit value). Let’s
see a few examples and the actions performed by them:
MOV A, B ;copy the content of B to A
MOV A, 56H ;copy the content of RAM location 56H to A
Table 13.7 | List of the Data Transfer Instructions
Sl. No.   Instruction Format   Function Performed                                     Flags Affected
1         MOV dest, src        Copy the content of source to destination              None
2         MOVX dest, src       Used to move from/to external data memory only         None
3         MOVC dest, src       Used to move from program memory (ROM) only            None
4         PUSH src             Copies one byte from source to stack top               None
5         POP dest             Copies one byte from stack top to destination          None
6         XCH A, byte          Exchanges data between A and the specified byte        None
7         SWAP A               Exchanges the upper and lower nibbles of A             None

MOV @R0, B ;copy the content of B to the RAM address pointed to by R0
MOV P0, A ;copy the content of A to Port 0
MOV R1, #45 ;copy 45 (decimal) to R1
MOV P1, R1 ;copy the content of R1 to Port 1
MOV DPTR, #4567H ;DPTR is a 16-bit register, so it is loaded with a
;16-bit value (usually an address)
13.9.1.2 | MOVX—Move To/From External RAM
Sometimes the 8051 needs extra RAM, as the internal data RAM is insufficient. If extra
data RAM is connected externally, its content is accessed using the MOVX instruction.
The DPTR is used for this instruction by loading the RAM address (to be accessed) into
it. Then MOVX is used as follows:
MOVX @DPTR, A ;copy data from A to the address specified in DPTR
MOVX A, @DPTR ;copy data from address specified in DPTR to A
13.9.1.3 | MOVC
The use of this instruction for accessing ROM has already been considered in
Section 12.4.1.5. The ROM can be external or internal. In this book, we will concern
ourselves only with on-chip ROM.
13.9.1.4 | PUSH and POP
The use of these instructions has already been discussed in Section 13.5.
13.9.1.5 | XCH—Exchange
There are two instructions which perform the act of exchanging (swapping). One
exchanges 8 bits, and the other exchanges 4 bits.
The format for the 8-bit exchange is XCH A, byte. The byte can be a register or
a memory location.
Examples
XCH A, R1 ;exchange the contents of A and R1
XCH A, 34H ;exchange the contents of A and RAM address 34H
13.9.1.6 | Nibble Swapping
The format of this is XCHD A, @Ri.
This instruction exchanges the lower nibble of A and the lower nibble of the byte
pointed to by Ri, leaving the upper nibbles of both unchanged.
MOV A, #45H ;move 45H to A
MOV 56H, #30H ;move 30H to RAM address 56H
MOV R0, #56H ;move 56H to R0
XCHD A, @R0 ;exchange the lower nibble of A with that of the byte pointed to by R0
After execution, A will contain 40H and the RAM address 56H will contain 35H.
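The nibble arithmetic is easy to verify with masks. The Python sketch below models XCHD, and also SWAP A from Table 13.7, on plain integers; the function names are chosen for this illustration only.

```python
def xchd(a, mem_byte):
    """XCHD A, @Ri: exchange the low nibbles; the high nibbles stay in place."""
    new_a = (a & 0xF0) | (mem_byte & 0x0F)
    new_mem = (mem_byte & 0xF0) | (a & 0x0F)
    return new_a, new_mem

def swap(a):
    """SWAP A: exchange the upper and lower nibbles of A."""
    return ((a << 4) | (a >> 4)) & 0xFF

a, mem = xchd(0x45, 0x30)     # the values used in the example above
print(hex(a), hex(mem))
print(hex(swap(0x45)))
```

With A = 45H and the RAM byte = 30H, the model gives 40H and 35H, in agreement with the result stated above; SWAP applied to 45H gives 54H.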

13.9.2 | Bit Manipulation Instructions
All microcontrollers need to address data at the bit level because they may have to deal
with one-bit interfaces like single switches, LEDs, relays, etc. All these devices require
the setting or clearing of single bits.
Which are the bits that can be addressed individually?
i) The carry flag
ii) The bits in the bit addressable area of RAM
iii) The bits in registers that are bit addressable, e.g., PSW.1, P0.3, P2.4, ACC.0, etc.
Let’s see, one by one, the instructions which allow single bits to be addressed.
Refer to Table 13.8 and the following examples.
Examples
SETB ACC.0 ;set bit 0 of the accumulator, i.e., the A register
SETB PSW.4 ;PSW.4 = 1, i.e., RS1 = 1
SETB P2.4 ;P2.4 = 1
CLR C ;C = 0
CLR 22H ;clear the bit at bit address 22H
CPL ACC.2 ;complement D2 of the A register
CPL P0.0 ;complement P0.0
ORL C, 26H ;move to C the logical OR of C and the bit at bit address 26H
MOV C, P3.2 ;move to C the bit value of P3.2
ANL C, 29H ;move to C the logical AND of C and the bit at bit
;address 29H
Table 13.8 | List of Bit Manipulation Instructions
Sl. No.   Instruction    Function to be Performed
1         SETB bit       Set the indicated bit
2         CLR bit        Clear the indicated bit
3         CPL bit        Complement the indicated bit
4         MOV C, bit     Move the indicated bit to C (carry flag)
5         MOV bit, C     Move C (carry flag) to the indicated bit
6         ANL C, bit     Move to C (carry flag) the logical AND of C and the indicated bit
7         ANL C, /bit    Move to C (carry flag) the logical AND of C and the complement of the indicated bit
8         ORL C, bit     Move to C (carry flag) the logical OR of C and the indicated bit
9         ORL C, /bit    Move to C (carry flag) the logical OR of C and the complement of the indicated bit

Example 13.5
When the MCU is powered on, the default register bank is bank 0. Write a program to
switch to bank 3, and then to switch to bank 1.
Solution
On startup, it is register bank 0 which is operational, i.e., RS0 = RS1 = 0 will be the
status of the bank select bits.
The bits of the PSW needed for selecting a register bank are shown as follows:
          RS1 (PSW.4)   RS0 (PSW.3)
Bank 0    0             0
Bank 1    0             1
Bank 2    1             0
Bank 3    1             1
The code lines are as follows, for switching to bank 3.
SETB PSW.4 ;make RS1 = 1
SETB PSW.3 ;make RS0 = 1
Now to switch to bank 1, it is only necessary to make RS1 = 0. The instruction for
this is
CLR PSW.4 ;make RS1 = 0
13.9.3 | Branch Instructions
Branching is a very important aspect of programming, and making a branch
‘conditional’ is what gives decision-making capability to any computer. In 8051, there
are unconditional, as well as conditional branch instructions. Let’s have a look at these
instructions. We will start with the unconditional type of the ‘jump’ instruction.
13.9.3.1 | Unconditional Jump Instructions
SJMP Target
SJMP stands for ‘short jump’; it is also a ‘relative’ jump. What these terms mean is that
the destination is expressed as a ‘relative’ number, and that the ‘relative’ number is short,
that is, only 8 bits. The number is a signed number, and denotes the distance (in bytes)
between the current PC value (which already points to the instruction following the
SJMP) and the target (destination) address. If this number is
negative, it is a backward jump, the maximum range of which is −128. If a forward
jump is what is needed, the number is positive, with a maximum range of +127.
In practice, the programmer does not need to know this relative number. It is enough to
write the label corresponding to the destination, and the assembler will calculate the
‘displacement’ and insert it into the machine code. To get to the target, the displacement
in the code is added to the value of the PC (when the SJMP instruction is executed),
and the sum becomes the new value of the PC. Thus, control is taken to the target address and
the change of flow of the program occurs.

See the following code snippet. The target has the label THERE. When the SJMP
instruction is being executed, the PC will be pointing to the next instruction in the
sequence. The assembler has already calculated the offset to the label THERE and placed
it in the code; the processor adds it to the PC and then continues execution from the
address THERE.
ORG 0
-----------
-----------
SJMP THERE
-----------
-----------
-----------
THERE: ADD A, R1
-----------
-----------
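The displacement calculation that the assembler performs for SJMP can be sketched in Python as follows. The function name encode_sjmp and the example addresses are invented for the illustration; the sketch relies on SJMP being a two-byte instruction (opcode 80H followed by the displacement), so the PC has already advanced by 2 when the displacement is added.

```python
SJMP_OPCODE = 0x80   # opcode byte of SJMP

def encode_sjmp(instruction_addr, target_addr):
    """Return the two machine-code bytes of an SJMP located at instruction_addr."""
    pc_after_fetch = instruction_addr + 2          # PC points past the 2-byte SJMP
    displacement = target_addr - pc_after_fetch
    if not -128 <= displacement <= 127:
        raise ValueError("target is out of range for a short jump")
    return bytes([SJMP_OPCODE, displacement & 0xFF])   # two's complement byte

print(encode_sjmp(0x0004, 0x0010).hex())   # forward jump
print(encode_sjmp(0x0020, 0x0020).hex())   # a jump to itself
```

A forward jump from 0004H to 0010H needs a displacement of 0AH, while an SJMP that jumps to its own address needs −2, which is stored as FEH.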
LJMP Target
This is a long jump instruction, and it is not ‘relative’. It is a three-byte instruction; the
first byte is the opcode, and the next two bytes constitute the absolute address of the
target. When this instruction is executed, the current PC value is simply replaced by the
16-bit number in the instruction, which can have any value from 0 to FFFFH. Since
code is written in program memory (ROM), the 16-bit number can only be as big as the
actual ROM present in the chip (not all 8051 chips have 64K of ROM).
AJMP Target
This is an ‘absolute’ jump within a 2K block of program memory. Only 11 bits of the
destination address are carried in the two-byte instruction; they replace the lower 11
bits of the PC, so the target must lie in the same 2K block as the instruction that
follows the AJMP.
Unconditional jump instructions will be used later in programs which generate
square waves (Section 13.12.1). Besides that, many programs end with
the following line:
LABEL: SJMP LABEL or
SJMP $ which means the same
This instruction loops to itself continuously. This is done because, when programs
are burned in ROM, execution should not proceed beyond the last instruction in the
program. In the ROM, the addresses beyond the last line are likely to hold arbitrary
values (some of which may be codes burned earlier), and if those get executed, it
will cause havoc. To prevent this from happening, the last code line can be written using
an SJMP instruction like
HERE: SJMP HERE ;loops to HERE and stays in HERE
13.9.3.2 | Conditional Branch Instructions
These are the instructions which make programming really useful. Computers are used
for repetitive and conditional tasks and conditional branching is the method for it.
Table 13.9 gives the list of conditional branch instructions.
Note All conditional jumps are short jumps.

JC Target Jump on carry to target.
JNC Target Jump on no carry to target.
These instructions test the carry flag, and jump to the target depending on the
condition of the carry flag, as specified.
JZ Target Jump on zero (A = 0) to target.
JNZ Target Jump on not zero (A != 0) to target.
Recollect that there is no zero flag for 8051. These two instructions test the A
register to see whether it contains zero or not, and jump to the target accordingly.
JB Bit, Target Jump to target if the specified bit is set.
JNB Bit, Target Jump to target if the specified bit is not set.
JBC Bit, Target Jump to target if the specified bit is set, then clear the bit.
The bits that can be used here are bits of registers, ports or RAM.
Examples
JB P1.0, THERE ;a port bit is tested and if found set, jumps
JNB ACC.4, WOW ;a register bit is tested and if found cleared, jumps
JB 05, NOPE ;a RAM bit is tested and if found set, jumps
DJNZ Byte, Target This instruction decrements the specified byte and jumps to the
target if the byte has not become zero (on decrementing). The byte can be one of the
registers of the register bank, or a RAM address.
Examples
DJNZ R2, HEER ;decrement R2, and jump to HEER if R2 != 0
DJNZ 54H, MEER ;decrement content of RAM 54H, and jump to MEER if != 0
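The behaviour of DJNZ as a loop counter can be modelled in Python as below. This is a sketch of the rule ‘decrement, then jump if not zero’ applied to an 8-bit value; the function names djnz and loop_passes are invented for the illustration.

```python
def djnz(value):
    """Decrement an 8-bit value; return (new value, whether the jump is taken)."""
    value = (value - 1) & 0xFF        # 00H wraps around to FFH
    return value, value != 0          # the jump is taken while the byte is NOT zero

def loop_passes(initial_count):
    """Count how many times a loop body closed by DJNZ is executed."""
    passes, counter, jump = 0, initial_count, True
    while jump:
        passes += 1                   # the loop body runs once per pass
        counter, jump = djnz(counter)
    return passes

print(loop_passes(20))
print(loop_passes(0))
```

A counter loaded with 20 gives 20 passes through the loop. A counter loaded with 0 does not skip the loop: the first decrement wraps it to FFH, so the body runs 256 times.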
Now, let’s use these conditional jump instructions in programs.
Table 13.9 | List of Conditional Jump Instructions
Sl. No.   Mnemonic            Function to be Performed                             Flags Affected
1         JC target           Jump if CY = 1                                       None
2         JNC target          Jump if CY = 0                                       None
3         JZ target           Jump if the register A = 0                           None
4         JNZ target          Jump if the register A is not zero                   None
5         JB bit, target      Jump if the bit = 1                                  None
6         JNB bit, target     Jump if the bit = 0                                  None
7         JBC bit, target     Jump if the bit = 1. Then clear the bit              None
8         DJNZ byte, target   Decrement the byte, and jump if the byte is not zero None

Example 13.6
Write a program to fill 20 spaces in RAM with the ASCII value of ‘*’.
Solution
ORG 0
MOV A,#'*' ;move the ASCII of * to A
MOV R0,#40H ;R0 is the pointer to address 40H
MOV R3,#20 ;R3 = 20, is the counter
THERE: MOV @R0,A ;A = '*' is copied to the pointed address
INC R0 ;increment the pointer
DJNZ R3,THERE ;repeat 20 times, until R3 = 0
HERE: SJMP HERE ;stay in HERE
END
This is a very direct program. The ASCII value of * (2AH) is loaded in A, and moved to 20
RAM locations, by using R0 as the pointer to the starting address 40H. The pointer is
incremented 20 times and at the end of the program, we find that the 20 RAM locations
from 40H to 53H have this content, i.e., ‘*’. The counter R3 is decremented by the DJNZ
instruction, which causes the loop to be exited once the counter is zero.
Example 13.7
Store the ASCII values of the first 10 capital letters of the alphabet in ROM locations.
Bring these 10 values to RAM locations starting from 40H.
Solution
ORG 0
MOV R3,#10 ;R3 = 10
MOV R1,#40H ;R1 points to RAM address 40H
MOV DPTR,#0500H ;DPTR points to ROM address 0500H
THERE: MOV A,#0 ;A = 0
MOVC A,@A+DPTR ;copy content of ROM to A
MOV @R1,A ;move the value in A to RAM
INC R1 ;increment the RAM pointer
INC DPTR ;increment the ROM pointer
DJNZ R3,THERE ;repeat actions until R3 = 0
HERE: SJMP HERE ;stay in HERE
ORG 0500H
DB "ABCDEFGHIJ"
END
In this example, both ROM and RAM are accessed. The ROM address (starting from
0500H) is pointed to by DPTR, while the RAM address (from 40H) is pointed to by R1. The
directive DB at 0500H stores the required characters in ROM. The final SJMP keeps
execution from running on beyond the last instruction once the copying is done.

The content of ROM is brought to A and copied to RAM repeatedly until the
counter R3 is 0.
Note that the 10 letters of the alphabet are stored as an ASCII string (in double
quotes). In some (not Keil) assemblers, it is not possible to store strings like this;
instead they should be stored as single characters separated by commas, i.e., ‘A’, ‘B’, ‘C’, …
and so on.
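The data movement of Example 13.7 can be followed step by step in a Python model. It is a sketch only: the ROM and RAM are modelled as byte arrays, and the variables r3, r1, dptr and a merely stand in for the registers of the same names.

```python
rom = bytearray(0x1000)
rom[0x0500:0x050A] = b"ABCDEFGHIJ"     # ORG 0500H / DB "ABCDEFGHIJ"
ram = bytearray(128)                   # internal RAM 00H-7FH

r3, r1, dptr = 10, 0x40, 0x0500        # counter, RAM pointer, ROM pointer
while True:
    a = 0                              # MOV A,#0
    a = rom[a + dptr]                  # MOVC A,@A+DPTR
    ram[r1] = a                        # MOV @R1,A
    r1 += 1                            # INC R1
    dptr += 1                          # INC DPTR
    r3 = (r3 - 1) & 0xFF               # DJNZ R3,THERE
    if r3 == 0:
        break

print(ram[0x40:0x4A].decode("ascii"))
```

At the end of the loop, RAM locations 40H to 49H hold the ten letters, R1 has advanced to 4AH and DPTR to 050AH.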
13.9.4 | Arithmetic Instructions
The complete list of arithmetic operations of the 8051 is given in Table 13.10.
Note For 8051, it is mandatory that the A register is one of the operands for addition,
subtraction, multiplication and division.
13.9.4.1 | Addition Instructions
ADD A, src
This instruction adds the source and A, and puts the sum in A. The CY, OV and AC
flags are affected.
Table 13.10 | List of Arithmetic Instructions
Sl. No.   Instruction Format          Function to be Performed                                                                                  Flags Affected
1         ADD A, src                  Add the source to A; result in A                                                                          OV, AC, CY
2         ADDC A, src                 Add the source and C (carry flag) to A; result in A                                                       OV, AC, CY
3         INC dest                    Add 1 to the destination                                                                                  None
4         SUBB A, src                 Subtract the source and C from A; result in A                                                             OV, AC, CY
5         DEC dest                    Subtract 1 from the destination                                                                           None
6         MUL AB                      Multiply (unsigned) the content of A and B registers; result in A (lower byte) and B (upper byte)         OV, CY
7         DIV AB                      Divide (unsigned) the content of A by the content of B; result in A (quotient) and B (remainder)          OV, CY
8         DA A                        Decimal adjust after addition of BCD numbers                                                              CY
9         CLR A                       A = 0                                                                                                     None
10        CJNE dest, source, target   Compare the source and destination, and jump to target if they are not equal                             CY

ADD A, 32H ;add the content of RAM address 32H to A, sum in A
ADD A, @R1 ;add the content of the RAM address pointed to by R1 to A, sum in A
ADD A, R2 ;add the content of R2 to A, sum in A
ADD A, #67 ;add 67 (decimal) to A, sum in A
ADDC A, src
This is the ‘add with carry’ instruction. The source, the carry flag and A are added and
the sum is put in A.
ADDC A, #76H ;add 76H and CY to A, sum in A
ADDC A, 56H ;add the content of 56H and CY to A, sum in A
ADDC A, R4 ;add the content of R4 and CY to A, sum in A
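A common use of ADDC, given here only as an illustration, is adding numbers wider than 8 bits: the low bytes are added first, and the carry they produce is included when the high bytes are added. The Python sketch below models this on integers; the function names add_with_carry and add16 are invented for the illustration.

```python
def add_with_carry(a, src, cy):
    """ADDC A, src: return the 8-bit sum and the new carry flag."""
    total = a + src + cy
    return total & 0xFF, 1 if total > 0xFF else 0

def add16(x, y):
    """Add two 16-bit numbers one byte at a time, low byte first."""
    low, cy = add_with_carry(x & 0xFF, y & 0xFF, 0)    # like ADD: no carry in
    high, cy = add_with_carry(x >> 8, y >> 8, cy)      # like ADDC: carry from the low bytes
    return (high << 8) | low, cy

print(add16(0x12FF, 0x0001))
print(add16(0xFFFF, 0x0001))
```

Adding 0001H to 12FFH gives 1300H with no final carry, because the carry out of the low byte has been passed on to the high byte; adding 0001H to FFFFH gives 0000H with a final carry of 1.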
INC dest
This instruction adds 1 to the destination, which can be any register or RAM location.
No flags are affected.
INC R5 ;add 1 to the number in R5
INC @R0 ;add 1 to the number in the address pointed by R0
INC A ;add 1 to the number in A
INC 43H ;add 1 to the content in address 43H
Example 13.8
Add the first 20 natural numbers and store the sum in a RAM location.
Solution
ORG 0
MOV R3,#20 ;move 20 to R3
CLR A ;A = 0
MOV R2,#1 ;R2 = 1
THERE: ADD A,R2 ;A = A+R2
INC R2 ;increment R2
DJNZ R3,THERE ;decrement R3, jump to THERE if R3 != 0
MOV 40H,A ;since R3 = 0, copy A to 40H
HERE: SJMP HERE ;stay in HERE
END
This is quite a simple program. In this program, the numbers 1, 2, 3, …, 20 are
consecutively added and the sum is stored in RAM location 40H.
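The result of Example 13.8 can be checked with a Python model of the loop. It is only a sketch; the variables a, r2 and r3 stand in for the registers of the same names, and carry_seen is an extra variable introduced here to record whether any addition overflowed 8 bits.

```python
a, r2, r3 = 0, 1, 20            # CLR A; MOV R2,#1; MOV R3,#20
carry_seen = False
while True:
    total = a + r2              # ADD A,R2
    carry_seen = carry_seen or total > 0xFF
    a = total & 0xFF
    r2 += 1                     # INC R2
    r3 -= 1                     # DJNZ R3,THERE
    if r3 == 0:
        break

print(a, hex(a), carry_seen)
```

The sum of the first 20 natural numbers is 20 × 21 / 2 = 210 (D2H), which fits in 8 bits, so no carry is produced and a single RAM byte is enough to hold the answer.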
