Programmable Digital Signal Processors
Architecture, Programming, and Applications
edited by
Yu Hen Hu
University of Wisconsin–Madison
Madison, Wisconsin
Marcel Dekker, Inc. New York • Basel
Copyright © 2001 by Marcel Dekker, Inc. All Rights Reserved.
ISBN: 0-8247-0647-1
This book is printed on acid-free paper.
Headquarters
Marcel Dekker, Inc.
270 Madison Avenue, New York, NY 10016
tel: 212-696-9000; fax: 212-685-4540
Eastern Hemisphere Distribution
Marcel Dekker AG
Hutgasse 4, Postfach 812, CH-4001 Basel, Switzerland
tel: 41-61-261-8482; fax: 41-61-261-8896
World Wide Web
http://www.dekker.com
The publisher offers discounts on this book when ordered in bulk quantities. For more
information, write to Special Sales/Professional Marketing at the headquarters address
above.
Copyright © 2002 by Marcel Dekker, Inc. All Rights Reserved.
Neither this book nor any part may be reproduced or transmitted in any form or by any
means, electronic or mechanical, including photocopying, microfilming, and recording,
or by any information storage and retrieval system, without permission in writing from
the publisher.
Current printing (last digit):
10 9 8 7 6 5 4 3 2 1
PRINTED IN THE UNITED STATES OF AMERICA
Series Introduction
Over the past 50 years, digital signal processing has evolved as a
major engineering discipline. The fields of signal processing have
grown from the origin of fast Fourier transform and digital filter
design to statistical spectral analysis and array processing, image,
audio, and multimedia processing, and shaped developments in high-
performance VLSI signal processor design. Indeed, there are few
fields that enjoy so many applications—signal processing is
everywhere in our lives.
When one uses a cellular phone, the voice is compressed, coded,
and modulated using signal processing techniques. As a cruise missile
winds along hillsides searching for the target, the signal processor is
busy processing the images taken along the way. When we are
watching a movie in HDTV, millions of audio and video data are
being sent to our homes and received with unbelievable fidelity.
When scientists compare DNA samples, fast pattern recognition
techniques are being used. On and on, one can see the impact of
signal processing in almost every engineering and scientific
discipline.
Because of the immense importance of signal processing and the
fast-growing demands of business and industry, this series on signal
processing serves to report up-to-date developments and advances in
the field. The topics of interest include but are not limited to the
following:
· Signal theory and analysis
· Statistical signal processing
· Speech and audio processing
· Image and video processing
· Multimedia signal processing and technology
· Signal processing for communications
· Signal processing architectures and VLSI design
Signal Processing and Communications
Editorial Board
Maurice G. Bellanger, Conservatoire National
des Arts et Métiers (CNAM), Paris
Ezio Biglieri, Politecnico di Torino, Italy
Sadaoki Furui, Tokyo Institute of Technology
Yih-Fang Huang, University of Notre Dame
Nikhil Jayant, Georgia Tech University
Aggelos K. Katsaggelos, Northwestern University
Mos Kaveh, University of Minnesota
P. K. Raja Rajasekaran, Texas Instruments
John Aasted Sorenson, IT University of Copenhagen
1. Digital Signal Processing for Multimedia Systems, edited by Keshab
K. Parhi and Takao Nishitani
2. Multimedia Systems, Standards, and Networks, edited by Atul Puri
and Tsuhan Chen
3. Embedded Multiprocessors: Scheduling and Synchronization, Sun-
dararajan Sriram and Shuvra S. Bhattacharyya
4. Signal Processing for Intelligent Sensor Systems, David C. Swanson
5. Compressed Video over Networks, edited by Ming-Ting Sun and Amy
R. Reibman
6. Modulated Coding for Intersymbol Interference Channels, Xiang-Gen
Xia
7. Digital Speech Processing, Synthesis, and Recognition: Second Edi-
tion, Revised and Expanded, Sadaoki Furui
8. Modern Digital Halftoning, Daniel L. Lau and Gonzalo R. Arce
9. Blind Equalization and Identification, Zhi Ding and Ye (Geoffrey) Li
10. Video Coding for Wireless Communication Systems, King N. Ngan,
Chi W. Yap, and Keng T. Tan
11. Adaptive Digital Filters: Second Edition, Revised and Expanded,
Maurice G. Bellanger
12. Design of Digital Video Coding Systems, Jie Chen, Ut-Va Koc, and
K. J. Ray Liu
13. Programmable Digital Signal Processors: Architecture, Program-
ming, and Applications, edited by Yu Hen Hu
14. Pattern Recognition and Image Preprocessing: Second Edition, Re-
vised and Expanded, Sing-Tze Bow
15. Signal Processing for Magnetic Resonance Imaging and Spectros-
copy, edited by Hong Yan
16. Satellite Communication Engineering, Michael O. Kolawole
Additional Volumes in Preparation
Preface
Since their inception in the late 1970s, programmable digital signal processors
(PDSPs) have gradually expanded into applications such as multimedia signal
processing, communications, and industrial control. PDSPs have always played
a dual role: on the one hand, they are programmable microprocessors; on the
other hand, they are designed specifically for digital signal processing (DSP)
applications. Hence they often contain special instructions and special architec-
ture supports so as to execute computation-intensive DSP algorithms more effi-
ciently. This book addresses various programming issues of PDSPs and features
the contributions of some of the leading experts in the field.
In Chapter 1, Kittitornkun and Hu offer an overview of the various aspects
of PDSPs. Chapter 2, by Managuli and Kim, gives a comprehensive discussion
of programming methods for very-long-instruction-word (VLIW) PDSP architec-
tures; in particular, they focus on mapping DSP algorithms to best match the
underlying VLIW architectures. In Chapter 3, Lee and Fiskiran describe native
signal processing (a technique to enhance the performance of multimedia signal
processing by general-purpose microprocessors) and compare various formats for
multimedia extension (MMX) instruction. Chapter 4, by Tessier and Burleson,
presents a survey of academic research and commercial development in recon-
figurable computing for DSP systems over the past 15 years.
The next three chapters focus on issues in software development. In Chapter
5, Wu and Wolf examine the pros and cons of various options for implementing
video signal processing applications. Chapter 6, by Yu and Hu, details a method-
ology for optimal compiler linear code generation. In Chapter 7, Chen et al. offer
practical advice on proper design of multimedia algorithms using MMX instruc-
tion sets.
Chapter 8, by Bhattacharyya, addresses the relationship between hardware
synthesis and software design, focusing particularly on automated mapping of
high-level specifications of DSP applications onto programmable DSPs. In Chap-
ter 9, Catthoor et al. discuss critical, yet often overlooked, issues of storage sys-
tem architecture and memory management.
I would like to express my appreciation to the authors of each chapter for
their dedication to this project and for their outstanding scholarly work. Thanks
also go to chapter reviewers James C. Abel, Jack Jean, Konstantinos Konstan-
tinides, Grant Martin, Miodrag Potkonjak, and Frederic Rousseau. Throughout
this project, B. J. Clark, acquisitions editor, and Ray K. J. Liu, series editor, have
provided strong encouragement and assistance. I thank them for their support
and trust. I would also like to express my gratitude to Michael Deters, production
editor, for his cooperation and patience.
Yu Hen Hu
Contents
Series Introduction
Preface
Contributors
1. Programmable Digital Signal Processors: A Survey
Surin Kittitornkun and Yu Hen Hu
2. VLIW Processor Architectures and Algorithm Mappings
for DSP Applications
Ravi A. Managuli and Yongmin Kim
3. Multimedia Instructions in Microprocessors for Native
Signal Processing
Ruby B. Lee and A. Murat Fiskiran
4. Reconfigurable Computing and Digital Signal Processing:
Past, Present, and Future
Russell Tessier and Wayne Burleson
5. Parallel Architectures for Programmable Video Signal
Processing
Zhao Wu and Wayne Wolf
6. OASIS: An Optimized Code Generation Approach for
Complex Instruction Set PDSPs
Jim K. H. Yu and Yu Hen Hu
7. Digital Signal Processing on MMX Technology
Yen-Kuang Chen, Nicholas Yu, and Birju Shah
8. Hardware/Software Cosynthesis of DSP Systems
Shuvra S. Bhattacharyya
9. Data Transfer and Storage Architecture Issues and
Exploration in Multimedia Processors
Francky Catthoor, Koen Danckaert, Chidamber Kulkarni,
and Thierry Omnès
Contributors
Shuvra S. Bhattacharyya Department of Electrical and Computer Engineer-
ing and Institute for Advanced Computer Studies, University of Maryland at
College Park, College Park, Maryland
Wayne Burleson Department of Electrical and Computer Engineering, Uni-
versity of Massachusetts, Amherst, Massachusetts
Francky Catthoor Design Technology for Integrated Information, IMEC,
Leuven, Belgium
Yen-Kuang Chen Microprocessor Research Laboratories, Intel Corporation,
Santa Clara, California
Koen Danckaert Design Technology for Integrated Information, IMEC,
Leuven, Belgium
A. Murat Fiskiran Department of Electrical Engineering, Princeton Univer-
sity, Princeton, New Jersey
Yu Hen Hu Department of Electrical and Computer Engineering, University
of Wisconsin–Madison, Madison, Wisconsin
Yongmin Kim Department of Bioengineering, University of Washington, Se-
attle, Washington
Surin Kittitornkun Department of Electrical and Computer Engineering, Uni-
versity of Wisconsin–Madison, Madison, Wisconsin
Chidamber Kulkarni Design Technology for Integrated Information, IMEC,
Leuven, Belgium
Ruby B. Lee Department of Electrical Engineering, Princeton University,
Princeton, New Jersey
Ravi A. Managuli Department of Bioengineering, University of Washington,
Seattle, Washington
Thierry Omnès Design Technology for Integrated Information, IMEC, Leu-
ven, Belgium
Birju Shah Microprocessor Research Laboratories, Intel Corporation, Santa
Clara, California
Russell Tessier Department of Electrical and Computer Engineering, Univer-
sity of Massachusetts, Amherst, Massachusetts
Wayne Wolf Department of Electrical Engineering, Princeton University,
Princeton, New Jersey
Zhao Wu Department of Electrical Engineering, Princeton University, Prince-
ton, New Jersey
Jim K. H. Yu* Department of Electrical and Computer Engineering, Univer-
sity of Wisconsin–Madison, Madison, Wisconsin
Nicholas Yu Microprocessor Research Laboratories, Intel Corporation, Santa
Clara, California
* Current affiliation: Tivoli Systems, Austin, Texas
1
Programmable Digital Signal
Processors: A Survey
Surin Kittitornkun and Yu Hen Hu
University of Wisconsin–Madison, Madison, Wisconsin
1 INTRODUCTION
Programmable digital signal processors (PDSPs) are general-purpose micropro-
cessors designed specifically for digital signal processing (DSP) applications.
They contain special instructions and special architecture supports so as to exe-
cute computation-intensive DSP algorithms more efficiently.
Programmable digital signal processors are designed mainly for embedded
DSP applications. As such, the user may never realize the existence of a PDSP
in an information appliance. Important applications of PDSPs include modems,
hard drive controllers, cellular phone data pumps, set-top boxes, and so forth.
PDSPs fall between the general-purpose microprocessor and the custom-designed,
dedicated chip set. The former has the advantage of ease of programming and
development. However, it often suffers from disappointing performance for DSP
applications due to overheads incurred in both the architecture and the
instruction set. Dedicated chip sets, on the other hand, lack the flexibility
of programming, and the time-to-market delay due to chip development may be
longer than the time needed to code a program for a programmable device.
1.1 A Brief Historical Scan of PDSP Development
1.1.1 The 1980s to the 1990s
A number of PDSPs appeared in the commercial market in the early 1980s.
Around 1980, Intel introduced the Intel2920, featuring on-chip A/D (analog-to-
digital) and D/A (digital-to-analog) converters. Nonetheless, it had no hardware
multiplier, and it was difficult to load program parameters into the chip due to
the lack of a digital interface. At almost the same time, NEC introduced the NEC
MPD7720. It is equipped with a hardware multiplier and is among the first to
adopt the Harvard architecture with physically separate on-chip data memory and
program memory. Texas Instruments introduced the TMS320C10 in 1982. Simi-
lar to the MPD7720, the ’C10 adopts the Harvard architecture and has a hard-
ware multiplier. Furthermore, the ’C10 is the first PDSP that can execute in-
structions from off-chip program memory without performance penalty due to
off-chip memory input/output (I/O). This feature brought PDSPs closer to the
microprocessor/microcontroller programming model. In addition, the emphasis
on development tools and libraries by Texas Instruments led to widespread appli-
cations of PDSP. The architectural features of several representative examples
of these early PDSP chips are summarized in Table 1.
In these early PDSPs, DSP-specific instructions such as MAC (multiply-
and-accumulate), DELAY (delay elements), REPEAT (loop control), and other
flow control instructions are devised and included in the instruction set so as
to improve both programmability and performance. Moreover, special address
generator units with bit-reversal addressing mode support have been incorporated
to enable efficient execution of the fast Fourier transform (FFT) algorithm. Due
to limitation of chip area and transistor count, the on-chip data and program
memories are quite small in these chips. If the program cannot fit into the
on-chip memory, a significant performance penalty will be incurred.
Later, floating-point PDSPs, such as Texas Instruments’ TMS320C30 and
Motorola’s DSP96001, appeared in the market. With fixed-point arithmetic as in
early PDSPs, the dynamic range of the intermediate results must be carefully
monitored to prevent overflow. Some reports estimated that as much as one-third
of the instruction cycles in executing PDSP programs are wasted on checking
the overflow condition of intermediate results. A key advantage of a floating-
point arithmetic unit is its extensive dynamic range. Later on, some PDSPs also
included on-chip DMA (direct memory access) controllers, as well as a dedicated
DMA bus that allowed concurrent data I/O at the DMA unit, and signal pro-
cessing computation in the CPU.
1.1.2 The 1990s to 2000
In this decade, the following trends in PDSPs emerged.
Consolidation of PDSP Market. Unlike the 1980s, in which numerous
PDSP architectures were developed, the 1990s are noted for a consolidation
of the PDSP market. Only very few PDSPs are now available in the market.
Notably, Texas Instruments’ TMS320Cxx series captured about 70% of the PDSP
market toward the end of this decade. Within this family, the traditional
TMS320C10/20 series has evolved into TMS320C50 and has become one of the
most popular PDSPs. Within this TMS family, TMS320C30 was introduced in
1988 and its floating-point arithmetic unit has attracted a number of scientific
applications. Other members in this family that were introduced in the 1990s
include TMS320C40, a multiprocessing PDSP, and TMS320C80, another
multiprocessing PDSP designed for multimedia (video) applications. TMS320C54xx
and TMS320C6xx are the recent ones in this family. Another low-cost PDSP
that has emerged as a popular choice is Analog Devices’ SHARC processor.
These modern PDSP architectures will be surveyed in later sections of this
chapter.
Table 1 Summary of Characteristics of Early PDSPs
Model         Manufacturer    Year  On-chip    On-chip    On-chip      Multiplier
                                    data RAM   data ROM   program RAM
A100          Inmos                 —          —          —            4, 8, 12, 16
ADSP2100      Analog Devices  1986  —          —          —            16 × 16 → 32
DSP16         AT&T                  512 × 16   2K × 16    —            16 × 16 → 32
DSP32         AT&T            1984  1K × 32    512 × 32   —            32 × 32 → 40
DSP32C        AT&T            1988  1K × 32    2K × 32    —            32 × 32 → 40
DSP56000      Motorola        1986  512 × 24   512 × 24   512 × 24     24 × 24 → 56
DSP96001      Motorola        1988  1K × 32    1K × 32    544 × 32     32 × 32 → 96
DSSP-VLSI     NTT             1986  512 × 18   —          4K × 18      (18-bit) 12E6
Intel2920     Intel           1980  40 × 25    —          192 × 24     —
LM32900       National              —          —          —            16 × 16 → 32
MPD7720       NEC             1981  128 × 16   512 × 13   512 × 23     16 × 16 → 31
MSM 6992      OKI             1986  256 × 32   —          1K × 32      (22-bit) 16E6
MSP32         Mitsubishi            256 × 16   —          1K × 16      32 × 16 → 32
MB8764        Fujitsu               256 × 16   —          1K × 24
NEC77230      NEC             1986  1K × 32    1K × 32    2K × 32      24E8 → 47E8
TS68930       Thomson               256 × 16   512 × 16   1K × 32      16 × 16 → 32
TMS32010      TI              1982  144 × 16   —          1.5K × 16    16 × 16 → 32
TMS320C25     TI              1986  288 × 16   —          4K × 16      16 × 16 → 32
TMS320C30     TI              1988  2K × 32    —          4K × 32      32 × 32 → 32E8
ZR34161 VSP   Zoran                 128 × 32   1K × 16    —            16-Bit vector engine
DSP Core Architecture. As the feature size of the digital integrated circuit
continues to shrink, more and more transistors can be packed into a single chip.
As such, it is possible to incorporate peripheral (glue) logics and supporting logic
components into the same chip in addition to the PDSP. This leads to the notion
of the system on (a) chip (SoC). In designing a SoC system, an existing PDSP
core is incorporated into the overall system design. This design may be repre-
sented as a VHDL (very-high-speed integrated circuit hardware description
language)/Verilog core, or in a netlist format. A PDSP that is used in this fashion
is known as a processor core or a DSP core.
In the 1990s, many existing popular PDSP designs had been converted into
DSP cores so that the designers could design new applications using familiar
instruction sets or even existing programs. On the other hand, several new PDSP
architectures are being developed and licensed as DSP cores. Examples of these
DSP cores, including Carmel, R.E.A.L., StarCore, and V850, will be reviewed
in Section 4.
Multimedia PDSPs. With the development of international multimedia
standards such as JPEG image compression (Pennebaker and Mitchell, 1993),
MPEG video coding (Mitchell et al., 1997), and MP3 audio, there is an expanding
market for low-cost, dedicated multimedia processors. Due to the complexity of
these standards, it is difficult to develop a multimedia processor architecture with-
out any programmability. Thus, a family of multimedia enhanced PDSPs—such
as MPACT, TriMedia, TMS320C8x, and DDMP (Terada et al., 1999)—have
been developed. A key feature of these multimedia PDSPs is that they are
equipped with various special multimedia-related function units, for instance,
the YUV to RGB (color coordinates) converter, the VLC (variable-length code)
entropy encoder/decoder, and the motion estimation unit. In addition, they facili-
tate direct multimedia signal I/O, bypassing the bottleneck of a slow system bus.
Native Signal Processing with Multimedia Extension Instructions. In native
signal processing (NSP), the signal processing tasks are executed in the
general-purpose microprocessor rather than in a separate coprocessing PDSP. As
the speed of these microprocessors increases, a number of signal processing
operations can be performed without additional hardware or dedicated chip sets.
In the mid-1990s, Intel introduced the MMX (MultiMedia eXtension) instruction
set to the Pentium series microprocessor. Because modern microprocessors have a
long internal word length of 32, 64, or even extended 128 bits, several 8-bit or
16-bit multimedia data samples can be packed into a single internal word to
facilitate the so-called subword parallelism. By processing several data samples
in parallel in a single instruction, better performance can be achieved,
especially when processing multimedia streams.
1.1.3 Hardware Programmable Digital Signal Processors:
FPGA
An FPGA (field programmable gate array) is a software-configurable hardware
device that contains (1) a substantial amount of uncommitted combinational
logic; (2) preimplemented flip-flops; and (3) programmable interconnections
among the combinational logic, flip-flops, and the chip I/O pins. The downloaded
configuration bit stream programs all the functions of the combinational logic,
flip-flops, and the interconnections. Although not the most efficient, an FPGA
can be used to accelerate DSP applications in several different ways (Knapp,
1995):
1. An FPGA can be used to implement a complete application-specific
integrated circuit (ASIC) DSP system. A shortcoming of this approach
is that current FPGA technology does not yield the most efficient hard-
ware implementation. However, FPGA implementation has several key
advantages: (1) time-to-market is short, (2) upgrade to new architecture
is relatively easy, and (3) low-volume production is cost effective.
2. An FPGA can act as a coprocessor to a PDSP to accelerate certain
specific DSP functions that cannot be efficiently implemented using
conventional architecture.
3. Furthermore, an FPGA can be used as a rapid prototyping system to
validate the design of an ASIC and to facilitate efficient, hardware-in-
the-loop debugging.
1.2 Common Characteristics of DSP Computation
1.2.1 Real-Time Computation
Programmable digital signal processors are often used to implement real-time
applications. For example, in a cellular phone, the speed of speech coding must
match that of normal conversation. A typical real-time signal processing applica-
tion has three special characteristics:
1. The computation cannot be initiated until the input signal samples are
received. Hence, the result cannot be precomputed and stored for
later use.
2. Results must be obtained before a prespecified deadline. If the deadline
is violated, the quality of service will be dramatically degraded, which
may even render the application useless.
3. The program execution often continues for an indefinite duration of
time. Hence, the total number of mathematical operations needed to be
performed per unit time, known as throughput, becomes an important
performance indicator.
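As a rough illustration of the deadline and throughput requirements listed above, consider a hypothetical application that samples its input at f_s = 8 kHz, needs K = 64 multiply-accumulate (MAC) operations per output sample, and must deliver each output within one sampling period. These numbers are assumptions chosen for the arithmetic and are not taken from a particular system:

\[
T_{\text{deadline}} = \frac{1}{f_s} = \frac{1}{8000\ \text{Hz}} = 125\ \mu\text{s},
\qquad
\text{throughput} \ge K f_s = 64 \times 8000 = 5.12 \times 10^{5}\ \text{MAC/s}
\]

Under these assumptions, the processor must sustain this rate for as long as the application runs.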
1.2.2 Data Flow Dominant Computation
Digital signal processor applications involve stream media data types. Thus, in-
stead of supporting complex control flow (e.g., context switch, multithread pro-
cessing), a PDSP should be designed to streamline data flow manipulation. For
example, special hardware must be designed to facilitate efficient input and output
of data from PDSP to off-chip memory, to reduce overhead involved in accessing
arrays of data in various fashions, and to reduce overhead involved in the execu-
tion of multilevel nested DO loops.
1.2.3 Specialized Arithmetic Computation
Digital signal processor applications often require special types of arithmetic
operations to make computations more efficient. For example, a convolution
operation
\[ y(n) = \sum_{k=0}^{K-1} x(k)\,h(n-k) \]
can be realized using a recursion
\[ y(n) = 0; \quad y(n) = y(n) + x(k) * h(n-k), \quad k = 0, 1, 2, \ldots, K-1 \]
For each k, a multiplication and an addition (accumulation) are to be performed.
This leads to the implementation of MAC instruction in many modern PDSPs:
\[ R4 \leftarrow R1 + R2 * R3 \]
Modern PDSPs often contain hardware support for so-called saturation
arithmetic. In saturation arithmetic, if the result of a computation exceeds the
dynamic range, it is clamped to either the maximum or the minimum value; for
example, with 5-bit words, 9 + 9 = 15 ($01001_2 + 01001_2 = 01111_2$) in 2’s
complement arithmetic. Therefore, for applications in which saturation
arithmetic is applicable, there is no need to check for overflow during
execution. These special instructions are also implemented in hardware. For
example, to implement a saturation addition function using 2’s complement
arithmetic without intrinsic function support, we have the following C code
segment:
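#include <limits.h>

/* Saturating addition; assumes a 32-bit 2's complement int */
int sadd(int a, int b) {
    /* add as unsigned so that wraparound is well defined in C */
    unsigned int ua = (unsigned int)a, ub = (unsigned int)b;
    unsigned int result = ua + ub;
    /* overflow is possible only if a and b have the same sign ... */
    if (((ua ^ ub) & 0x80000000u) == 0) {
        /* ... and it occurred if the sign of the result differs */
        if ((result ^ ua) & 0x80000000u) {
            return (a < 0) ? INT_MIN : INT_MAX;
        }
    }
    return (int)result;
}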
However, with a special _sadd intrinsic function support in TMS320C6x (Texas
Instruments, 1998c) the same code segment reduces to the single line:
result = _sadd(a,b);
1.2.4 Execution Control
Many DSP algorithms can be formulated as nested, indefinite DO loops. In order
to reduce the overhead incurred in executing multilevel nested loops, a number
of special hardware supports are included in PDSPs to streamline the control
flow of execution.
1. Zero-overhead hardware loop: A number of PDSPs contain a special
REPEAT instruction to support efficient execution of multiple loop
nests using dedicated counters to keep track of loop indices.
2. Explicit instruction-level parallelism (ILP): Due to the deterministic
data flow of many DSP algorithms, ILP can be exploited at compile
time by an optimizing compiler. This led several modern PDSPs to
adopt the very long instruction word (VLIW) architecture to efficiently
utilize the available ILP.
1.2.5 Low-Power Operation and Embedded System Design
1. The majority of applications of PDSPs are embedded systems, such as
a disk drive controller, modem, and cellular phone. Thus, many PDSPs
are highly integrated and often contain multiple data I/O function units,
timers, and other function units in a single chip packaging.
2. Power consumption is a key concern in the implementation of embed-
ded systems. Thus, PDSPs are often designed to compromise between
conflicting requirements of high-speed data computation and low-
power consumption (Borkar, 1999). The specialization of certain key
functions allows efficient execution of the desired operations using a
high degree of parallelism while holding down the power source volt-
age and overall clock frequency to conserve energy.
1.3 Common Features of PDSPs
1.3.1 Harvard Architecture
A key feature of PDSPs is the adoption of a Harvard memory architecture that
contains separate program and data memory so as to allow simultaneous instruc-
tion fetch and data access. This is different from the conventional Von Neumann
architecture, where program and data are stored in the same memory space.
1.3.2 Dedicated Address Generator
The address generator allows rapid access of data with complex data arrangement
without interfering with the pipelined execution of main ALUs (arithmetic and
logic units). This is useful for situations such as two-dimensional (2D) digital
filtering and motion estimation. Some address generators may include a bit-
reversal address calculation to support the efficient implementation of FFT, and
circular buffer addressing for the implementation of infinite impulse response
(IIR) digital filters.
1.3.3 High Bandwidth Memory and I/O Controller
To meet the intensive input and output demands of most signal processing
applications, several PDSPs have multiple built-in DMA channels and dedicated
DMA buses to handle data I/O without interfering with CPU operations. To
maximize data I/O efficiency, some modern PDSPs even include a dedicated video
and audio codec (coder/decoder) as well as a high-speed serial/parallel
communication port.
1.3.4 Data Parallelism
A number of important DSP applications exhibit a high degree of data parallelism
that can be exploited to accelerate the computation. As a result, several
parallel processing schemes, such as SIMD (single instruction, multiple data)
and MIMD (multiple instruction, multiple data) architectures, have been
incorporated in PDSPs. For example, many multimedia-enhanced instruction sets in
general-purpose microprocessors (e.g., MMX) employ subword parallelism to speed
up execution, which is basically an SIMD approach. A number of PDSPs also
facilitate MIMD implementation by providing multiple interprocessor
communication links.
2 APPLICATIONS OF PDSP
In this section, both real-world and prototyping applications of PDSPs are
surveyed. These applications are divided into three categories: communication
systems, multimedia, and control/data acquisition.
2.1 Communications Systems
Programmable digital signal processors have been applied to implement various
communication systems. Examples include caller ID [using TMS320C2xx
(Texas Instruments Europe, 1997)], cordless handset, and many others. For voice
communication, an acoustic-echo canceller based on the normalized least
mean square (NLMS) algorithm for a hands-free wireless system is reported by
Texas Instruments (1997). Implemented with a TMS320C54, this system performs
both active-channel and double-talk detection. A 40-MHz TMS320C50
fixed-point processor is used to implement a low-bit-rate (1.4 Kbps), real-time
vocoder (voice coder) (Yao et al., 1998). The realization also includes both the
decoder and the synthesizer. A telephone voice dialer (Pawate and Robinson,
1996) is implemented with a 16-bit fixed-point TMS320C5x PDSP. It is a
speaker-independent speech recognition system based on the hidden Markov
model algorithm.
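The NLMS update at the heart of such an echo canceller is short enough to sketch. The code below is a floating-point illustration under assumed conditions (a hypothetical two-tap echo path, a noise-free microphone signal, and a step size of 0.5); it is not the cited TMS320C54 implementation, which also handles double-talk detection.

```python
import random


def nlms_step(weights, reference, desired, step_size=0.5, epsilon=1e-8):
    """One NLMS update; returns (new_weights, error)."""
    estimate = sum(w * x for w, x in zip(weights, reference))
    error = desired - estimate
    # Normalizing by the reference energy makes the step size scale-free.
    energy = sum(x * x for x in reference) + epsilon
    gain = step_size * error / energy
    return [w + gain * x for w, x in zip(weights, reference)], error


def run_echo_canceller(far_end, microphone, num_taps, step_size=0.5):
    """Adapt a num_taps FIR estimate of the echo path sample by sample."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent far-end sample first
    errors = []
    for x, d in zip(far_end, microphone):
        history = [x] + history[:-1]
        weights, e = nlms_step(weights, history, d, step_size)
        errors.append(e)
    return weights, errors


# Synthetic test signal: the "room" is an assumed two-tap echo path.
rng = random.Random(0)
echo_path = [0.5, -0.3]
far_end = [rng.uniform(-1.0, 1.0) for _ in range(500)]
microphone = [
    sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(far_end))
]
weights, errors = run_echo_canceller(far_end, microphone, num_taps=2)
```

After adaptation the weights approximate the echo path, and the error signal (the echo-free output) decays toward zero. Each update costs two inner products over the tap vector, which maps directly onto the MAC unit of a PDSP.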
Modern PDSPs are also suitable for error correction in digital communica-
tion. A special Viterbi shift left (VSL) instruction is implemented on both the
Motorola DSP56300 and the DSP56600 PDSPs (Taipale, 1998) to accelerate the
Viterbi decoding. Another implementation of the ITU V.32bis Viterbi decoding
algorithm using a TMS320C62xx is reported by Yiu (1998). Yet another
example is the implementation of the U.S. digital cellular error-correction coding
algorithm, including both the tasks of source coding/decoding and ciphering/
deciphering on a TMS320C541 evaluation module (Chishtie, 1994).
Digital baseband signal processing is another important application of
PDSPs. A TMS320C25 DSP-based GMSK (Gaussian minimum shift keying)
modem for Mobitex packet radio data communication is reported in (Resweber,
1996). In this implementation, transmitted data in packet form is level-shifted
and Gaussian-filtered digitally within the modem algorithm so that it is ready for
transmitter baseband interface, either via a D/A converter or by direct digital
modulation. Received data at either baseband or the intermediate frequency (IF)
band from the radio receiver is digitized and processed. Packet synchronization
is also handled by the modem, assuring that the next layer sees only valid Mobitex
packets.
System prototyping can be accomplished using PDSP due to its low cost
and ease of programming. A prototype of reverse channel transmitter/receiver
for asymmetric digital subscriber line (ADSL) algorithm (Gottlieb, 1994) is im-
plemented using a floating-point DSP TMS320C40 chip clocked at 40 MHz. The
program consisted of three parts: synchronization, training, and decision-directed
detection.
Navigation using the Global Positioning System (GPS) has been widely
accepted for commercial applications such as electronic direction finding. A
software-based GPS receiver architecture using the TMS320C30 processor is
described in (Kibe et al., 1996). The ’C30 is in charge of signal processing
tasks such as correlation, FFT, digital filtering, decimation, demodulation, and
Viterbi decoding in the tracking loop. Further investigation on the benefits of
using a PDSP in a GPS receiver, with special emphasis on fast acquisition
techniques, is reported in (Daffara and Vinson, 1998). The GPS L1 band signal is
down-converted to IF. After A/D conversion, the signal is processed by
dedicated hardware in conjunction with algorithms (software) on a PDSP.
Functions that are fixed and require high-speed processing should be implemented
in dedicated hardware. In contrast, more sophisticated functions that are
less time-sensitive can be implemented using PDSPs.
For defense system applications, a linear array of TMS320C30 processors as the
front end and a Transputer processor array as the back end were developed for
programmable radar signal processing to support the PDDR (Point Defense
Demonstration Radar) (Alter et al., 1991). The input signal is sampled at 10 MHz
into 16-bit, complex-valued samples. The PDSP front end performs pulse
compression, moving target indication (MTI), and constant false alarm rate (CFAR)
detection.
2.2 Multimedia
2.2.1 Audio Signal Processing
The audible signals cover the frequency range from 20 to 20,000 Hz. PDSP appli-
cations to audio signal processing can be divided into three categories according
to the qualities and audible range of the signal (Ledger and Tomarakos, 1998):
professional audio products, consumer audio products, and computer audio multi-
media systems. The DSP algorithms used in particular products are summarized
in Table 2.
MP3 (MPEG-1 Layer 3 audio coding) has achieved the status of the most
popular audio coding algorithm in recent years. The PDSP implementation of
an MP3 decoder can be found in Robinson et al. (1998). On the other hand, most
synthesized sounds such as those used in computer gaming are still represented
in the MIDI format (Yim et al., 1998). It can be seen that PDSPs are good candidates
for the implementation of these audio signal processing algorithms.

Table 2 DSP Algorithms for Audio Applications

Application                                   DSP algorithms used
Professional audio products
  Digital audio effects processors (reverb,   Delay-line modulation/interpolation,
    chorus, flanging, vibrato, pitch            digital filtering (comb, FIR, etc.)
    shifting, dynamic range compression,
    etc.)
  Digital mixing consoles                     Filtering, digital amplitude panning,
                                                level detection, volume control
  Digital audio tape (DAT)                    Compression techniques: MPEG
  Electronic music keyboards                  Wavetable/FM synthesis, sample
                                                playback, physical modeling
  Graphic and parametric equalizers           Digital FIR/IIR filters
  Multichannel digital audio recorders        ADPCM, AC-3
  Room equalization                           Filtering
  Speaker equalization                        Filtering
Consumer audio products
  CD-I                                        ADPCM, AC-3, MPEG
  CD players and recorders                    PCM
  Digital amplifiers/speakers                 Digital filtering
  Digital audio broadcasting equipment        AC-3, MPEG, and so forth
  Digital graphic equalizers                  Digital filtering
  Digital versatile disk (DVD) players        AC-3, MPEG, and so forth
  Home theater systems (surround-sound        AC-3, Dolby ProLogic, THX DTS,
    receivers/tuners)                           MPEG, hall/auditorium effects
  Karaoke                                     MPEG, audio effects algorithms
  Satellite (DBS) broadcasting                AC-3, MPEG
  Satellite receiver systems                  AC-3
Computer audio multimedia systems
  Sound card                                  ADPCM, AC-3, MP3, MIDI, etc.
  Special-purpose headsets                    3D positioning (HRTFs)
2.2.2 Image/Video Processing
Existing image and video compression standards such as JPEG and MPEG are
based on the DCT (discrete cosine transform) algorithm. The upcoming JPEG
2000 image coding standard will also include coding algorithms that are based
on the discrete wavelet transform (DWT). These standards are often implemented
in modern digital cameras and digital camcorders, in which PDSPs will play an
important role. An example of using the TMS320C549 to implement a digital
camera is reported in Illgner et al. (1999), where the PDSP can be upgraded later
to incorporate the upcoming JPEG 2000 standard.
Low-bit-rate video coding standards include the ITU H.263+/H.263M and
MPEG-4 simple profile. In Budagavi et al. (1999), the potential applications of
TMS320C54x family chips to implement low-power, low-bit-rate video coding
algorithms are discussed. On the other hand, decoding of MPEG-2
broadcast-grade video sequences using either the TMS320C80 (Bonomini et al., 1996) chip
or the TMS320C6201 (Cheung et al., 1999) chip has been reported.
Medical imaging has become another fast-growing application area of
PDSPs. Reported in Chou et al. (1997) is the use of TMS320C3x as a controller
and on-line data processor for processing magnetic resonance imaging (MRI). It
can perform real-time dynamic imaging such as cardiac imaging, angiography
(examination of the blood vessels using x-rays following the injection of a radi-
opaque substance), and abdominal imaging. Recently, an implementation of real-
time data acquisition, processing, and display of ungated cardiac movies at 20
frames/sec using PDSPs was reported in Morgan et al. (1999).
2.2.3 Printing
A modern printer contains embedded processors to process various formats
of page description languages (PDLs) such as PostScript. In Ganesh and Thakur
(1999), a PDSP is used to interpret the PDL code, to create a list of elements to
be displayed, and to estimate the time needed to render the image. Rendering is
the process of creating the source pixel map. In this process, a common source
map has 600 × 600 pixels per square inch, with four colors for each pixel and
eight bits for each color. Compression is necessary to store the output map when
rendering and screening cannot be completed within the real-time requirement.
This phase involves JPEG compression as well as matrix transformations and
interpolations. Depending on the characteristics of the screened image and the storage
memory available, the compression may be either lossless or lossy.
Decompression of the bit-mapped image occurs in real time as the compressed image
is fed to the print engine. The screening process converts the source pixel map
into the appropriate output format. Because the process must be repeated for all
pixels, the number of calculations is enormous for a high-resolution color image,
especially in real time.
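The storage figures behind this argument are easy to work out. The short calculation below uses the source map parameters quoted above; the page size (U.S. letter, 8.5 in. by 11 in.) is an assumption added for illustration.

```python
DOTS_PER_INCH = 600
COLORS_PER_PIXEL = 4
BITS_PER_COLOR = 8

bytes_per_pixel = COLORS_PER_PIXEL * BITS_PER_COLOR // 8
bytes_per_square_inch = DOTS_PER_INCH * DOTS_PER_INCH * bytes_per_pixel

# Assumed page size for illustration: U.S. letter, 8.5 in. by 11 in.
page_area_sq_in = 8.5 * 11
bytes_per_page = bytes_per_square_inch * page_area_sq_in

print(f"{bytes_per_square_inch / 1e6:.2f} MB per square inch")
print(f"{bytes_per_page / 1e6:.1f} MB per uncompressed page")
```

At 1.44 MB per square inch, a full uncompressed page under these assumptions occupies roughly 135 MB, which is why compression of the output map becomes necessary.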
2.2.4 SAR Image Processing
Synthetic aperture radar (SAR) signal processing poses a significant
challenge due to its very large computation and data storage requirements. A sensor
transmits pulses and receives the echoes in a direction approximately perpendicu-
lar to the direction of travel. The problem becomes 2D space-variant convolution
using the range-Doppler algorithm, in which all the signals and coefficients are
complex numbers with a precision of at least 16 bits. A heterogeneous architec-
ture—vector/scalar architecture—is proposed and analyzed (Meisl, 1996). The
vector processor (using Sharp LH9124 for FFTs) and the scalar processing unit
(using eight SHARC 21060s connected in a mesh network) are chosen based on
performance, scalability, flexibility, development cost, and repeat cost evaluation
criteria. The design is capable of processing SAR data at about one-tenth of
the real-time rate.
2.2.5 Biometric Information Processing
Handwritten signature verification, one of the biometric authentication tech-
niques, is inexpensive, reliable, and nonintrusive to the person being authorized.
A DSP kernel for on-line verification using the TMS32010 with a 200-Hz sam-
pling rate was developed (Dullink et al., 1995). The authentication kernel com-
prises a personalized table and some general-purpose procedures. This verifica-
tion method can be part of a variety of entrance monitoring and security systems.
2.3 Control and Data Acquisition
As expected, PDSPs have found numerous uses in modern control and data
acquisition applications as well. Several control applications are implemented
using Motorola DSP56000 PDSPs, which function both as powerful
microcontrollers and as fast digital signal processors. The 56-bit accumulator (hence, the code
name 56xxx) provides 8-bit extension registers in conjunction with saturation
arithmetic to allow 256 consecutive additions without the need to
check for overflow conditions or limit cycles. The output noise power due to
round-off noise of the 24-bit DSP56000/DSP56001 is 65,536 times less than that
for 16-bit PDSPs and microcontrollers. Design examples include a PID
(proportional-integral-derivative) controller (Stokes and Sohie, 1990) and an adaptive
controller (Renard, 1992).
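Both figures quoted above follow from simple bit counting. The sketch below reproduces them with a simplified integer model, assuming plain two's-complement products without the fractional left shift used in the processor's fractional arithmetic mode; the variable names are illustrative.

```python
WORD_BITS = 24
PRODUCT_BITS = 2 * WORD_BITS                      # a 24 x 24 product needs 48 bits
ACCUMULATOR_BITS = 56

extension_bits = ACCUMULATOR_BITS - PRODUCT_BITS  # 8 guard bits
safe_accumulations = 2 ** extension_bits          # 256 additions

# Worst case in the integer model: 256 products of maximum magnitude.
largest_product = (2 ** (WORD_BITS - 1)) ** 2
worst_case_sum = safe_accumulations * largest_product
accumulator_max = 2 ** (ACCUMULATOR_BITS - 1) - 1

# Round-off noise power scales with the square of the quantization step,
# so 8 extra bits of word length reduce it by (2**8)**2.
extra_bits = 24 - 16
noise_power_ratio = (2 ** extra_bits) ** 2
```

The eight extension bits therefore give a headroom of 2^8 = 256 accumulations, and the 8-bit difference in word length gives the noise power ratio of 2^16 = 65,536.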
Another example of DSP system development is the Computer-Assisted
Dynamic Data Monitoring and Analysis System (CADDMAS) project developed
for the U.S. Air Force and NASA (Sztipanovits et al., 1998). It is applied to
turbine engine stress testing and analysis. The project uses TMS320C40
chips as distributed-memory parallel PDSPs. An application-specific topology
interconnects 30 different systems, with processor counts varying from 4 to 128
processors. More than 300 sensors are used to measure signals with a sampling
rate in excess of 100 kHz. Based on measured signals, the system performs spec-
tral analysis, autocorrelation and cross-correlation, tricoherence, and so forth.
2.4 DSP Applications of Hardware Programmable PDSP
There are a variety of FPGA implementation examples of specific DSP functions,
such as the FIR (finite impulse response) digital filter, the DFT/FFT (discrete Fourier
transform/fast Fourier transform) processor (Dick, 1996), image/video
processing (Schoner et al., 1995), the wireless CDMA (Code Division Multiple Access)
rake receiver (Shankiti and Lesser, 2000), and Viterbi decoding (Goslin, 1995).
2.4.1 16-Tap FIR Digital Filter
A distributed arithmetic (DA) implementation of a 16-tap finite impulse response
digital filter has been reported in Goslin (1995). The DA implementation of the
multiplier uses look-up tables (LUTs). Because the product of two n-bit integers
involves $2^{2n}$ possible operand combinations, the size of the LUT increases exponentially with
respect to the word length. For practical implementation, compromises must be
made to trade additional computation time for smaller LUTs.
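The trade of computation time for table size can be seen in a bit-serial sketch of distributed arithmetic. The code below is an illustrative software model, not the design reported by Goslin: it assumes fixed integer coefficients, two's-complement input samples, and hypothetical function names. The table holds every possible sum of the coefficients, and the input samples are consumed one bit position per step.

```python
def build_lut(coefficients):
    """Table of every possible sum of coefficients, indexed by a K-bit address."""
    taps = len(coefficients)
    return [
        sum(c for k, c in enumerate(coefficients) if (address >> k) & 1)
        for address in range(1 << taps)
    ]


def da_dot_product(lut, samples, sample_bits):
    """Inner product of signed samples with the coefficients behind lut."""
    mask = (1 << sample_bits) - 1
    total = 0
    for bit in range(sample_bits):
        # Gather bit number `bit` of every sample into one table address.
        address = 0
        for k, sample in enumerate(samples):
            address |= (((sample & mask) >> bit) & 1) << k
        partial = lut[address] << bit
        # The most significant bit of a two's-complement word has negative weight.
        total += -partial if bit == sample_bits - 1 else partial
    return total


coefficients = [3, -1, 4, 2]          # assumed 4-tap example
samples = [5, -7, 100, -128]          # 8-bit two's-complement inputs
lut = build_lut(coefficients)
result = da_dot_product(lut, samples, sample_bits=8)
direct = sum(c * x for c, x in zip(coefficients, samples))
```

Here a 4-tap inner product needs a 16-entry table and eight table look-ups, one per input bit. A single table for 16 taps would need 2^16 entries, whereas four 4-tap tables whose outputs are added need only 4 × 16 = 64, which is the kind of compromise referred to above.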
2.4.2 CORDIC-Based Radar Processor
The improvement of FPGA-based CORDIC arithmetic implementation is studied
further in (Andraka, 1998). The iteration process of CORDIC can be unrolled
so that each processing element always performs the same iteration. Unrolling
the processor results in two significant simplifications. First, shift amounts be-
come fixed and can be implemented in the wiring. Second, constant values for
the angle accumulator are distributed to each adder in the angle accumulator chain
and can be hardwired instead of requiring storage space. The entire processor is
reduced to an array of interconnected adder–subtractors, which is strictly
combinatorial. The delay through the resulting circuit can be substantial, but it
can be shortened using pipelining without additional hardware cost. A 14-bit,
5-iteration pipelined CORDIC processor that fits in half of a Xilinx XC4013E-2
runs at 52 MHz. This design is used for high-throughput polar-to-Cartesian
coordinate transformations in a radar target generator.
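The effect of unrolling can be mimicked in software by giving each stage its own fixed shift amount and angle constant. The sketch below is an illustrative fixed-point model of a 5-iteration polar-to-Cartesian CORDIC, assuming 12 fractional bits and pre-scaling of the input magnitude by the CORDIC gain; these choices and all names are assumptions, not details of Andraka's design.

```python
import math

ITERATIONS = 5
FRACTION_BITS = 12
SCALE = 1 << FRACTION_BITS

# Constants that an unrolled design would hardwire into each stage.
ANGLES = [round(math.atan(2.0 ** -i) * SCALE) for i in range(ITERATIONS)]
GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(ITERATIONS))


def cordic_stage(i, x, y, z):
    """Stage i: the shift amount i and the angle ANGLES[i] are fixed."""
    if z >= 0:
        return x - (y >> i), y + (x >> i), z - ANGLES[i]
    return x + (y >> i), y - (x >> i), z + ANGLES[i]


def polar_to_cartesian(magnitude, angle_rad):
    """Rotate (magnitude, 0) by angle_rad; valid for |angle_rad| below about 1.68."""
    x = round(magnitude / GAIN * SCALE)   # pre-scale to cancel the CORDIC gain
    y = 0
    z = round(angle_rad * SCALE)
    for i in range(ITERATIONS):           # each pass models one hardware stage
        x, y, z = cordic_stage(i, x, y, z)
    return x / SCALE, y / SCALE


x, y = polar_to_cartesian(1.0, 0.5)
```

Each stage contains only additions, subtractions, and fixed shifts. With five iterations the residual angle error is bounded by arctan(2^-4), about 0.06 rad, so the result is a coarse approximation of (cos 0.5, sin 0.5).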
2.4.3 DFT/FFT
An FPGA-based systolic DFT array processor architecture is reported in Dick
(1996). Each processing element (PE) contains a CORDIC arithmetic unit, which
consists of a series of shift-and-add stages and so avoids the need for an area-consuming
multiplier. The timing analyzer xdelay determines the maximum clock frequency
to be 15.3 MHz when implemented on a Xilinx XC4010 PG191-4 FPGA chip.
2.4.4 Image/Video Signal Processor
In Schoner et al. (1995), the implementation of an FPGA-augmented
low-complexity video signal processor was reported. This combination of ASIC and
FPGA is flexible enough to implement four common algorithms in real time.
Specifically, for 256 × 256 × 8-bit pictures, this device is able to achieve the
frame rates presented in Table 3.

Table 3 Achievable Frame Rate of Four Different Image Processing Operations

Algorithms                                           Frames/sec    Latency (msec)
7 × 7 mask 2D filter                                 13.3          75.2
8 × 8 block DCT                                      55.0          18.2
4 × 4 block vector quantization at 0.5 bit/pixel     7.4           139.0
One-level wavelet transform                          35.7          28.0
2.4.5 CDMA Rake Receiver
A CDMA rake receiver for a real-time underwater data communication system
has been implemented using four Xilinx XC4010 FPGA chips (Shankiti and
Lesser, 2000) with one multiplier on each chip. The final design of each multiplier
occupies close to 1000 CLBs (configurable logic blocks) and is running at a clock
frequency of 1 MHz.
2.4.6 Viterbi Decoder
Viterbi decoding is used to achieve maximum likelihood decoding of a binary
stream of symbols. Because it involves bit-stream operations, it cannot be effi-
ciently implemented using the word-parallel architecture of general-purpose
microprocessors or PDSPs. It has been reported (Goslin, 1995) that a Xilinx
XC4013E-based FPGA implementation of a Viterbi decoder achieves 2.5 times
processing speed (135 nsec versus 360 nsec) compared to a dual-PDSP imple-
mentation of the same algorithm.
3 PERFORMANCE MEASURES
The comparison of the performance between PDSP and general-purpose micro-
processors, between various PDSPs, and between PDSPs and dedicated hard-
ware chip sets is a very difficult task. A number of factors contribute to this
difficulty:
1. A set of objective performance metrics is difficult to define for PDSPs.
It is well known that with modern superscalar instruction architecture,
the usual metrics such as MIPS (million instructions per second) and
FLOPS (floating-point operations per second) are no longer valid met-
rics to gauge the performance of these microprocessors. Some PDSPs
also adopt such architecture. Hence, a set of appropriate metrics is
difficult to define.
2. PDSPs have fragmented architecture. Unlike general-purpose
microprocessors that have converged largely to a similar data format (32-bit
or 64-bit architecture), PDSPs have a much more fragmented
architecture in terms of internal or external data format and fixed-point
versus floating-point operations. The external memory interface varies
from platform to platform. This is because most PDSPs
are designed for embedded applications and, hence, cross-platform
compatibility is not of major concern for different manufacturers of
PDSPs. Furthermore, PDSPs often have specialized hardware to
accelerate a special type of operation. Such specialized hardware makes the
comparison even more difficult.
3. PDSP applications are often hand programmed with respect to a par-
ticular platform. The performance of cross-platform compilers is still
far from realistic. Hence, it is not meaningful to run the same high-
level language benchmark program on different PDSP platforms.
Some physical parameters of PDSPs are summarized in Table 4.
Table 4 Physical Performance Parameters

Parameters                                  Units
Maximum clock frequency                     MHz
Power consumption                           Absolute power, watts (W); power (W)/MIPS
Execution throughput, peak and sustained    MIPS, MOPS (million operations/sec), MACS
                                              (no. of MAC/sec), MFLOPS
Operation latency                           Instruction cycles
Memory access                               Clock cycles
Bandwidth                                   MB/sec (megabytes per second)
Latency                                     Clock cycle
Input/output                                No. of ports

Table 5 Examples of DSP Benchmarks

Level                   Algorithm/application names
Kernel
  General               FFT/IFFT, FIR, IIR, matrix/vector multiply, Viterbi
                          decoding, LMS (least mean square) algorithm
  Multimedia/graphic    DCT/IDCT, VLC (variable-length code) decoding, SAD
Application
  General               Radar (Bhargava et al., 1998)
  Multimedia/graphic    MediaBench (Lee et al., 1997); G.722, JPEG, Image
                          (Bhargava et al., 1998)

Usually, the peak MIPS, MOPS, MFLOPS, MAC/sec, and MB/sec figures for a
particular architecture are simply the number of instructions, operations,
floating-point operations, multiply-accumulate operations, and bytes of memory access,
respectively, that can be executed in parallel, multiplied by the maximum clock
frequency. In real applications, these peak rates are reached only momentarily,
at certain clock cycles, and are therefore somewhat misleading. From a user's
perspective, the ultimate performance measure is the "execution time" (wall
clock time) of an individual benchmark.
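The peak figures described above amount to a single multiplication, as the following sketch shows. The TMS320C6203 parameters (eight instructions issued per cycle, two multipliers, 300 MHz) are taken from Table 6; the 12,000-cycle kernel is a hypothetical figure used only to contrast a peak rate with an execution time.

```python
def peak_rate_millions(units_per_cycle, clock_mhz):
    """Peak rate in millions per second: parallel units times clock in MHz."""
    return units_per_cycle * clock_mhz


def execution_time_us(cycle_count, clock_mhz):
    """Execution time in microseconds for a measured cycle count."""
    return cycle_count / clock_mhz


# TMS320C6203 figures from Table 6.
c6203_peak_mips = peak_rate_millions(8, 300)
c6203_peak_mmacs = peak_rate_millions(2, 300)

# Hypothetical benchmark kernel that takes 12,000 cycles on the same chip.
kernel_time_us = execution_time_us(12_000, 300)
```

The peak values say nothing about how many cycles a given kernel actually needs, which depends on how well the code keeps all functional units busy; only a measured cycle count yields the execution time.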
Recently, efforts have been made to establish benchmark suites for PDSPs.
The proposed benchmark suites (Bhargava et al., 1998; Lee et al., 1997) can be
categorized into kernel and application levels, and each level can be further
classified into general DSP and multimedia/graphic. Because each kernel accounts
for only a certain percentage of the run time of an application and each application
may contain more than a single DSP kernel, conducting benchmark tests at both
levels gives more accurate results than the raw numbers for a few DSP kernels alone.
A number of DSP benchmarks are summarized in Table 5.
4 MODERN PDSP ARCHITECTURES
In this section, several modern PDSP architectures will be surveyed. Based on
different implementation methods, modern PDSPs can be characterized as PDSP
chip, PDSP core, multimedia PDSPs, and NSP instruction set. The following
aspects of these implementation approaches are summarized in terms of three
general sets of characteristics:
1. Program (instruction) execution
2. Datapath
3. Physical implementation
Program execution of the PDSP is characterized by processing core (how the
PDSP achieves parallelism), instruction width (bits), maximum number of in-
structions issued, and address space of program memory (bits). Its datapath is
concerned with the number and bit width of datapath, pipelining depth, native
data type (either fixed point or floating point), number of ALUs, shifters, multipli-
ers, and bit manipulation units as well as their corresponding data precision/
accuracy and data/address registers. Finally, physical characteristics to be com-
pared include maximum clock frequency, typical operating voltage, feature size
and implementation technology, and power consumption.
4.1 PDSP Chips
Some of the recent single-chip PDSPs are summarized in Table 6.
4.1.1 DSP16xxx
The Lucent DSP16xxx (Bier, 1997) achieves ILP from parallel operations en-
coded in a complex instruction. These complex instructions are executed at a
maximum rate of one instruction per clock cycle. Embedding up to 31 instructions
following the Do instruction can eliminate overheads due to small loops. These
embedded instructions can be repeated a specified number of times without addi-
tional overhead. Moreover, a high instruction/data I/O bandwidth can be
achieved from a 60-kword (120-kbyte) dual-ported on-chip RAM, a dedicated
data bus, and a multiplexed data/instruction bus.
4.1.2 TMS320C54xx
Characterized as a low-power PDSP, each TMS320C54xx (Texas Instruments,
1999a) chip is composed of two independent processor cores. Each core has a
40-bit ALU, including a 40-bit barrel shifter, two 40-bit accumulators, and a 17-
bit ⫻ 17-bit parallel multiplier coupled with a 40-bit adder to facilitate single-
cycle MAC operation. The C54 series is optimized for low-power communication
applications. Therefore, it is equipped with a compare, select, and store unit
(CSSU) for the add/compare selection of the Viterbi operator. Loop/branch
overhead is eliminated using instructions such as repeat, block-repeat, and
conditional store. Interprocessor communication is carried out via two
internal eight-element first-in-first-out (FIFO) registers.
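The add/compare/select operation that the CSSU accelerates can be written in a few lines. The following is a generic sketch of one trellis-state update, assuming that smaller path metrics are better (for example, accumulated Hamming distance); the function name and the numerical values are illustrative and do not describe the CSSU's actual instruction interface.

```python
def add_compare_select(metric_a, branch_a, metric_b, branch_b):
    """Return (surviving_metric, decision_bit) for one trellis state.

    decision_bit is 0 when path a survives and 1 when path b survives.
    """
    candidate_a = metric_a + branch_a      # add
    candidate_b = metric_b + branch_b
    if candidate_a <= candidate_b:         # compare
        return candidate_a, 0              # select
    return candidate_b, 1


first = add_compare_select(3, 1, 2, 3)
second = add_compare_select(5, 2, 1, 1)
```

A Viterbi decoder repeats this step for every state at every received symbol and stores the decision bits for the later traceback, which is why a dedicated unit for it pays off.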
4.1.3 TMS320C62x/C67x
TMS320C62x/C67x (Texas Instruments, 1999b, 1999c; Seshan, 1998) is a series
of fixed-point/floating-point, VLIW-based PDSPs for high-performance
applications. During each clock cycle, a compact instruction is fetched and decoded
(decompressed) to yield a packet of eight 32-bit instructions that resemble those
of conventional VLIW architecture. The compiler performs software pipelining,
loop unrolling, and "if" conversion to predicated execution. Furthermore,
programmers working in a high-level language such as C can access a number of
special-purpose DSP instructions called intrinsic functions. This feature helps ease the
programming task and improves the code performance.

Table 6 Summary of Recent Single-Chip PDSPs

Family name:                DSP16xxx           SHARC              TMS320C54xx        TMS320C62xx        TMS320C67xx        TriCore
Model no.:                  DSP16210           ADSP21160          TMS320VC5421       TMS320C6203        TMS320C6701        TC10GP
Company:                    Lucent             Analog Devices     Texas Instruments  Texas Instruments  Texas Instruments  Infineon
Processing core:            VLIW               Multiproc./SIMD    Multiprocessor     VLIW               VLIW               Superscalar
Instruction
  Width (bits)              16 & 32            32                 16 & 32            256                256                16 & 32
  Maximum issued            1                  4                  2                  8                  8                  2
  Address space (bits)      20                 32                 —                  32                 32                 32
Datapath
  No. of datapaths          2                  2                  2                  2                  2                  3
  Width of datapath (bits)  16                 32                 16                 32                 32                 32
  Pipeline depth            3                  3                  —                  11                 17                 4
  Data type                 Fixed-point        Floating-point     Fixed-point        Fixed-point        Floating-point     Fixed-point
Functional units
  ALUs                      2 (40b)            2                  2 (40b)            4                  4                  1
  Shifters                  2                  2                  2 (40b)            2                  0                  —
  Multipliers               2 (16b × 16b)      2                  2 (17b × 17b)      2                  2                  2 (16b × 16b)
  Address generator         2                  2                  2 × 2              ALUs               ALUs               1
  Bit manipulation unit     1 (40b)            Shifter            2 (40b)            Shifter            Shifter            1
Program control
  Hardware loop             Y                  Y                  Y                  N                  N                  Y
  Nesting levels            2                  —                  2                  2                  2                  3
On-chip storage
  Data registers            8                  2 × 16             2 × 2              2 × 16             2 × 16             16
    Width (bits)            40                 40                 40                 32                 32                 32
  Address registers         21                 2 × 8              2 × 8              —                  —                  16
    Width (bits)            20                 32                 —                  —                  —                  32
Performance
  Maximum clock (MHz)       150                100                —                  300                167                66
  Operating voltage (V)     3.0                —                  1.8                1.5                1.8                2.5
  Technology                CMOS               —                  CMOS               CMOS (15C05)       CMOS (18C05)       CMOS
  Feature size (µm)         —                  —                  —                  0.15               0.18               0.35
  Power consumption         294 mW at 100 MHz  —                  162 mW at 100 MHz  —                  —                  —
4.1.4 ADSP 21160 SHARC
Analog Devices' ADSP21160 SHARC (Super Harvard Architecture) (Analog
Device, 1999) contains two PEs, both using a 40-bit extended precision
floating-point format. Every functional unit in each PE is connected in parallel and
performs single-cycle operations. Even though its name derives from the Harvard
architecture, its program memory can store both instructions and data.
Furthermore, SHARC doubles its data memory bandwidth to allow the simultaneous
fetch of both operands.
4.1.5 TriCore
TriCore TC10GP (TriCore, 1999) is a dual-issue superscalar load/store
architecture targeted at control-oriented/DSP-oriented applications. Even though its
instructions are a mix of 16 and 32 bits wide for high code density, its datapath is 32 bits
wide to accommodate high-precision fixed-point and single-precision
floating-point numbers.
4.2 PDSP Cores
In Table 7, we compare the features of four DSP cores reported in the literature:
Carmel, R.E.A.L., StarCore, and V850.
4.2.1 Carmel
One of the distinguishing features of Carmel (Carmel, 1999; Eyre and Bier, 1998)
is its configurable long instruction words (CLIW) that are user-defined VLIW-
like instructions. Each CLIW instruction combines multiple predefined instruc-
tions into a 144-bit-long superinstruction:
CLIW name (ma1, ma2, ma3, ma4) { // CLIW reference line
    MAC1 || ALU1 || MAC2 || ALU2 || MOV1 || MOV2 // CLIW def
}
Programmers can specify up to four execution units plus two data moves
according to the position of each individual instruction within the long CLIW
instruction. Furthermore, up to four memory operands can be specified using ma1 through
ma4. The assembler stores the 48-bit reference line in program memory and the 96-bit
definition in a separate CLIW memory (1024 × 96 bits). In addition to CLIW,
specialized hardware is provided to support Viterbi decoding. Almost all
instructions can use predicated execution through two conditional-execution registers.

Table 7 Summary of PDSP Cores

Family name:                Carmel              R.E.A.L.            StarCore            V850
Model no.:                  DSP 10XX            —                   SC140               NA853C
Company:                    Infineon            Philips             Lucent & Motorola   NEC
Processing core:            VLIW                VLIW                VLIW                RISC
Instruction
  Width (bits)              24 & 48             16 & 32             16                  16 & 32
  Maximum issued            1/2                 2                   6                   —
  Address space (bits)      23                  —                   32                  26
Datapath
  No. of datapaths          2                   2                   —                   —
  Width of datapath (bits)  16                  16                  16                  16
  Data type                 Fixed-point         Fixed-point         Fixed-point         Fixed-point
Functional units
  ALUs                      2 (40b)             4 (16b)             4                   1 (32b)
  Shifters                  1 (40b)             1 (40b)             ALU (40b)           1 (32b)
  Multipliers               2 (17b × 17b)       2 (16b × 16b)       ALU (16b × 16b)     1 (32b × 32b)
  Address generator         1                   2                   2                   —
  Bit manipulation unit     Shifter             —                   ALU                 Shifter
On-chip storage
  Data registers            16 + 6              8                   16                  32
    Width (bits)            16/40               16                  40                  32
  Address registers         10                  16                  24                  —
    Width (bits)            16                  —                   32                  —
Performance
  Maximum clock (MHz)       120                 85                  300                 33
  Operating voltage (V)     2.5                 2.5                 1.5                 3.3
  Technology                CMOS                —                   CMOS                Titanium silicide
  Feature size (µm)         0.25                0.25                0.13                0.35
  Power consumption         200 mW at 120 MHz   —                   180 mW at 300 MHz   —
4.2.2 R.E.A.L.
Similarly, the R.E.A.L. PDSP (Kievits et al., 1998) core allows users to specify
a VLIW-like set of application-specific instructions (ASIs) to exploit full parallel-
ism of the datapath. Up to 256 ASIs can be stored in a look-up table. A special
class of 16-bit instructions with an 8-bit index field activates these instructions.
Each ASI is 96 bits wide and has the following predicated form:
Cond (3) || XACU (11) || YACU (10) || MPY1 (3) || MPY0 (3) || ALUs (62) || DSU (2) || BNU (2)
ASI [if(asi_cc)]alu3_op,alu2_op,alu1_op,alu0_op
[mult1_op][,mult0_op][,dr_op][,xacu_op][,yacu_op];
ASI [if(asi_cc)]alu32_op, alu10_op [,mult1_op][,mult0_op][,dr_op]
[,xacu_op][,yacu_op];
ASI [if(asi_cc)]lfsr [,mult1_op][,mult0_op][,xacu_op][,yacu_op];
Each ASI starts with a 3-bit condition code followed by an 11-bit X ALU
opcode, 10-bit Y ALU, 3-bit multiplier 1 and 0’s opcodes, 62-bit operands,
and so on. In addition to the user-defined VLIW instruction, R.E.A.L. allows
application-specific execution units (AXUs) to be defined by the customer, which
can be placed anywhere in the datapath or address calculation units. Its applica-
tion is a GSM baseband signal processor.
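The field widths of an ASI can be checked, and the look-up mechanism illustrated, with a small packing routine. The field names and widths below are taken from the format line above; the bit ordering (first field in the most significant bits), the table contents, and the function names are assumptions made for illustration.

```python
# Field names and widths from the ASI format; the ordering is an assumption.
ASI_FIELDS = [
    ("cond", 3), ("xacu", 11), ("yacu", 10), ("mpy1", 3),
    ("mpy0", 3), ("alus", 62), ("dsu", 2), ("bnu", 2),
]
ASI_WIDTH = sum(width for _, width in ASI_FIELDS)


def pack_asi(values):
    """Pack a dict of field values into one ASI word."""
    word = 0
    for name, width in ASI_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_asi(word):
    """Split an ASI word back into its fields."""
    values = {}
    for name, width in reversed(ASI_FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


# A 256-entry table selected by the 8-bit index of a 16-bit instruction.
asi_table = [0] * 256
example = {"cond": 5, "xacu": 0x7FF, "yacu": 1, "mpy1": 2,
           "mpy0": 3, "alus": (1 << 62) - 1, "dsu": 1, "bnu": 2}
asi_table[0x2A] = pack_asi(example)
decoded = unpack_asi(asi_table[0x2A])
```

The widths sum to 96 bits, which matches the stated ASI width, and the 8-bit index field addresses exactly the 256 table entries.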
4.2.3 StarCore
StarCore (StarCore, 1999; Wolf and Bier, 1998) is a joint development between
Lucent and Motorola for software-configurable wireless handset terminals
(radios) of third-generation wireless systems. It is expected to operate at a low
voltage down to 0.9 V. A fetch set (8-word instruction set) is fetched from memory.
A program sequencing (PSEQ) unit identifies the portion of this set that can be executed
in parallel and dispatches it to the appropriate execution units. This feature is called
a variable-length execution set (VLES). StarCore achieves maximum parallelism
by allowing multiple address generation units and data ALUs to execute multiple
operations in a single cycle. StarCore is targeted at speech coding, synthesis, and
voice recognition.
4.2.4 V850
The NEC NA853E (NEC, 1997) is a five-stage pipeline RISC (reduced instruc-
tion set computer) core suitable for real-time control applications. Not only is
the instruction set a mixture of 16 and 32 bits wide, but it also includes intrinsic
instructions for high-level language support to increase the efficiency of the ob-
ject code generated by the compiler and to reduce the program size.
4.3 Multimedia PDSPs
Multimedia PDSPs are designed specifically for audio/video applications as well
as 2D/3D graphics. Some of their common characteristics are as follows:
1. Multimedia input/output (I/O): This may include ports and codec
(coder/decoder) for video, audio, as well as super-VGA graphics signals.
2. Multimedia-specific functional units such as a YUV to RGB converter
for video display, variable-length decoder for digital video decoding,
descrambler in TriMedia (Phillips, 1999), and motion estimation unit
for digital video coding/compression in Mpact2 (Kala, 1998; Purcell,
1998).
3. High-speed host computer/memory interfaces such as PCI bus and
RAMBUS DRAM interfaces.
4. Real-time kernel and operating system for MPACT and TriMedia, re-
spectively.
5. Support of floating-point and 2D/3D graphics.
Examples of multimedia PDSPs include MPACT, TriMedia, TMS320C8x, and
DDMP (Data-Driven Multimedia Processor) (Terada et al., 1999). Their architec-
tural features are summarized in Table 8.
4.4 Native Signal Processing
Native signal processing (NSP) is the use of extended instruction sets in a general-
purpose microprocessor to process signal processing algorithms. These are spe-
cial-purpose instructions that often operate in a different manner than the regular
instructions. Specifically, multimedia data formats usually are rather short (8 or
16 bits) compared to the 32-, 64-, and 128-bit native register length of modern
general-purpose microprocessors. Therefore, up to eight samples may be packed
into a single word and processed simultaneously to enhance parallelism at the
subword level. Most NSP instructions operate on both integer (fixed-point) and
floating-point numbers except the Visual Instruction Set (VIS) (Sun, 1997), which
supports only fixed-point numbers. In general, NSP instructions can be classified
as follows (Lee, 1996):
• Vector arithmetic and logical operations whose results may be vector
or scalar
• Conditional execution using masking operations
Table 8 Summary of Multimedia PDSPs
Family name:                  DDMP               MPACT         TMS320C8x           TriMedia
Model no.:                    —                  MPACT2/6000   TMS320C82           TM1300
Company:                      Sharp              Chromatic     Texas Instruments   Philips
Processing core:              Dataflow           VLIW          Multiprocessor      VLIW
Instruction
  Width (bits)                72                 81            32                  16 to 224
  Maximum issued              8                  2             3                   5
  Address space (bits)        —                  —             32                  32
Datapath
  Width of datapath (bits)    12                 72            32                  32
  Floating-point precision    NA(a)              Both          Single              Both
Functional units
  ALUs                        2                  2             3                   7
  Shifters                    —                  1             3                   2
  Multipliers                 2 ALUs             1             3                   2
  Address generators          4                  —             3                   2
On-chip storage
  Data registers              4 Accumulators     512           48                  128
  Width (bits)                24                 72            32                  32
Performance
  Maximum clock (MHz)         120                125           60                  166
  Operating voltage (V)       2.5                —             —                   2.5
  Technology                  CMOS (4 metal)     —             CMOS                CMOS
  Feature size (µm)           0.25               0.35          —                   0.25
  Power consumption           1.2 W at 120 MHz   —             —                   3.5 W at 166 MHz
(a) NA = not available.
• Memory/cache access control such as cache prefetch to particular level
of cache, nontemporal store, and so forth, as well as masked load/store
• Data alignment and subword rearrangement (i.e., permute, shuffle, etc.)
Most NSP instruction set architectures exhibit the following features:
1. Native signal processing instructions may share the existing functional
units of regular instructions. As such, some overhead is involved when
switching between NSP instructions and regular instructions. However,
some NSP instruction sets have separate, exclusive execution units and
a separate register file.
2. Saturation and/or modulo arithmetic instructions are often imple-
mented in hardware to reduce the overhead of dynamic range checking
during execution, as illustrated in Section 1.2.3.
3. To exploit subword parallelism, manual or human optimization of
NSP-based programs is often necessary for demanding applications
such as image/video processing and 2D/3D graphics.
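The subword packing and the saturation versus modulo arithmetic described above
can be illustrated with a small sketch. It is not tied to any particular NSP
instruction set; the function names (pack, unpack, packed_add_modulo,
packed_add_saturate) and the choice of eight unsigned 8-bit lanes in a 64-bit
word are assumptions made purely for illustration.

```python
LANES = 8                      # assumed: eight subwords per 64-bit word
BITS = 8                       # assumed: 8-bit unsigned samples
MASK = (1 << BITS) - 1

def pack(samples):
    """Pack eight 8-bit samples into one 64-bit word (lane 0 in the low byte)."""
    word = 0
    for i, s in enumerate(samples):
        word |= (s & MASK) << (i * BITS)
    return word

def unpack(word):
    """Split a 64-bit word back into its eight 8-bit lanes."""
    return [(word >> (i * BITS)) & MASK for i in range(LANES)]

def packed_add_modulo(a, b):
    """Add all eight lanes at once; each lane wraps around modulo 256."""
    high = 0x8080808080808080  # top bit of every lane
    low = 0x7F7F7F7F7F7F7F7F   # lower seven bits of every lane
    # Add the low 7 bits (cannot carry into the next lane), then restore
    # the top bit of each lane with XOR so no carry crosses a lane boundary.
    return ((a & low) + (b & low)) ^ ((a ^ b) & high)

def packed_add_saturate(a, b):
    """Add lane by lane; results above 255 are clamped to 255."""
    return pack([min(x + y, MASK) for x, y in zip(unpack(a), unpack(b))])

a = pack([250, 1, 2, 3, 4, 5, 6, 255])
b = pack([10, 1, 1, 1, 1, 1, 1, 1])
wrapped = unpack(packed_add_modulo(a, b))
clamped = unpack(packed_add_saturate(a, b))
```

With modulo arithmetic the bright pixel 250 + 10 wraps to a dark value, whereas
saturation clamps it at the maximum, which is why saturation in hardware removes
the need for explicit range checks.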
Common and distinguishing features of the available NSP instruction sets are
summarized, in alphabetical order, in Table 9.
4.4.1 AltiVec
Motorola’s AltiVec (Motorola, 1998; Tyler et al., 1999) features a 128-bit vector
execution unit operating concurrently with the existing integer and floating-point
units. In total, there are 162 new instructions, which can be divided into four
major classes:
Intraelement arithmetic operations: Addition, subtraction, multiply–add,
average, minimum, maximum, and conversion between 32-bit integer and
floating point
Intraelement nonarithmetic operations: Compare, select, logical, shift, and
rotate
Interelement arithmetic operations: Sum of elements within a single vector
register to a separate register
Interelement nonarithmetic operations: Wide field shift, pack, unpack,
merge/interleave, and permute
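The distinction between intraelement and interelement operations can be sketched
as follows. This is a plain Python model, not the AltiVec programming interface:
the function names, the four 32-bit lanes, and the element-level permute are all
illustrative assumptions.

```python
LANE_MASK = 0xFFFFFFFF  # assumed: four 32-bit lanes in a 128-bit vector

def intra_add(a, b):
    """Intraelement arithmetic: lane i of the result depends only on lane i."""
    return [(x + y) & LANE_MASK for x, y in zip(a, b)]

def inter_sum(a):
    """Interelement arithmetic: all lanes of one vector reduce to one value."""
    return sum(a) & LANE_MASK

def permute(a, b, selectors):
    """Interelement nonarithmetic: pick lanes from the concatenation a ++ b."""
    pool = list(a) + list(b)
    return [pool[s] for s in selectors]

added = intra_add([1, 2, 3, 4], [10, 20, 30, 40])
total = inter_sum([1, 2, 3, 4])
merged = permute([1, 2, 3, 4], [5, 6, 7, 8], [0, 4, 1, 5])
```

The last call interleaves the low halves of the two inputs, which is one way a
merge/interleave can be expressed as a special case of permute.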
AltiVec reflects a significant effort to exploit the maximum
amount of parallelism. The result is a 32-entry, 128-bit-wide register file that
is separate from the existing integer and floating-point register files. This
differs from other NSP architectures, which often share the NSP register file
with an existing one. The purpose is to exploit additional parallelism through
the superscalar dispatch of operations to multiple execution units, or through
multithreaded
Table 9 Summary of Native Signal Processing Instruction Sets
Name:                         AltiVec          MAX-2            MDMX             MMX/3D Now       MMX/SIMD         VIS
Company:                      Motorola         HP               MIPS             AMD              Intel            Sun
Instruction set:              Power PC         PA RISC 2.0      MIPS-V           IA32             IA32             SPARC V.9
Processor:                    MPC7400          PA RISC          R10000           K6-2             Pentium III      UltraSparc
Fixed point (integer)
  8-Bit                       16               NA(a)            8                8                8                8
  16-Bit                      8                4                4                4                4                4
  32-Bit                      4                NA               NA               2                2                2
Floating point
  Single precision            4                2                2                2                4                NA
Fixed-point register file
  Size                        32 × 128b        32 × 64b         32 × 64b         8 × 64b          8 × 64b          32 × 64b
  Shared with                 Dedicated        Integer reg.     FP reg.          Dedicated        FP reg.          FP reg.
Fixed-point accumulator
  Size                        NA               NA               192              NA               NA               NA
Arithmetic
  Unsigned saturation         Y                Y                Y                Y                Y                Y
  Modulo                      Y                Y                Y                Y                Y                Y
Interelement arithmetic
  Multiply-Acc                4                NA               4                2                2                NA
  Fixed-point MAC             32 += (16 × 16)  —                48 += (16 × 16)  32 += (16 × 16)  32 += (16 × 16)  —
    precision(b)
  Compare                     Y                N                Y                Y                Y                Y
  Min/max                     Y                N                Y                N                Y                Y
  Floating-point              4 single         2 single         2 single         2 single         4 single         N
    Multiply-Acc
  Floating-point min/max      Y                N                N                Y                Y                N
Intraelement arithmetic
  Sum                         Y                N                N                Y                Y                N
  Floating-point Sum          Y                N                N                Y                N                N
Type conversion
  Pack                        Y                Y                Y                Y                Y                Y
  Unpack                      Y                Y                Y                Y                Y                Y
  Permute                     Y                Y                Y                —                N                —
  Merge                       Y                Y                Y                Y                Y                Y
Special instructions          VREFP            CACHE HINT       SELECT           FEMMS            EMMS             EDGE
                              VRSQRTFP         DEPOSIT                           PFRCP            DIVPS            ARRAY
                              SPLAT            EXTRACT                           PFRSQRT          PREFETCH         PDIST
                              VSEL             SHR PAIR                          PREFETCH         SFENCE           BLOCK TRANSFER
(a) NA: not available.
(b) Precision (bits): acc (bits) = acc (bits) + a (bits) × b (bits).
execution unit pipelines. Each instruction can specify up to three source oper-
ands and a single destination operand. Each operand refers to a vector register.
Target applications of AltiVec include multimedia applications as well as high-
bandwidth data communication, base station processing, IP telephony gateway,
multichannel modem, network infrastructure such as an Internet router, and a
virtual private network server.
4.4.2 MAX 2.0
Multimedia Acceleration eXtension (MAX) 2.0 (Lee, 1996) is an extension of
the HP Precision Architecture RISC ISA on the PA8000 microprocessor, designed
to add minimal die area. Neither 8-bit nor 32-bit subwords are supported:
8-bit subwords offer insufficient precision, and 32-bit subwords offer
insufficient parallelism compared to 32-bit single-precision floating point.
Although pixels may be input and output as 8 bits, using a 16-bit subword in
intermediate calculations is preferred. The additional hardware to support
MAX 2.0 is minimal because the integer pipeline already has two integer ALUs
and shift merge units (SMUs), whereas the floating-point pipeline has two
FMACs and two FDIV/FSQRT units. MAX special instructions are field
manipulation instructions, as follows:
Cache hint: For spatial locality
Extract: Selects any field in the source register and places it right-aligned
in the target
Deposit: Selects a right-aligned field from the source and places it anywhere
in the target
Shift pair: Concatenates and shifts the 64-bit or rightmost 32-bit contents of
two registers into one result
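The field manipulation operations listed above can be sketched with ordinary
shifts and masks. The function names, the argument order, and the convention
that bit positions count from the least significant bit are assumptions made
for this illustration; they do not reproduce the PA-RISC instruction encoding.

```python
WIDTH = 64
WORD_MASK = (1 << WIDTH) - 1

def extract(src, pos, length):
    """Take `length` bits of src starting at bit `pos`, right-aligned."""
    return (src >> pos) & ((1 << length) - 1)

def deposit(target, field, pos, length):
    """Place the right-aligned `field` into target at bit `pos`."""
    mask = ((1 << length) - 1) << pos
    return (target & ~mask & WORD_MASK) | ((field << pos) & mask)

def shift_pair(high, low, amount):
    """Concatenate two 64-bit registers, shift right, keep the low 64 bits."""
    combined = ((high & WORD_MASK) << WIDTH) | (low & WORD_MASK)
    return (combined >> amount) & WORD_MASK

field = extract(0xABCD, 4, 8)
patched = deposit(0xFFFF, 0x00, 4, 8)
shifted = shift_pair(0x1, 0x0, 4)
```

Shifting a register pair in this way is a common building block for realigning
subword data that straddles a register boundary.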
4.4.3 MDMX
Based on MIPS’ experience of designing Geometry Engine, Reality Engine,
Maximum Impact, Infinite Reality, Nintendo64, O2, and Magic Carpet, the goal of
MIPS Digital Media Extension (MDMX) (MIPS Technology, 1997) is to improve
performance while maintaining IEEE-compliant DCT accuracy. As a result, MDMX
adds four- and eight-element SIMD capabilities for integer arithmetic through
the definition of these two data types:
Octal byte: Eight unsigned 8-bit integers with eight unsigned 24-bit
accumulators
Quad half: Four unsigned 16-bit integers with four unsigned 48-bit
accumulators
Note that both octal byte and quad half data types share a 192-bit accumulator,
which permits accumulation of 2^N products of N × N bits, where N is either 8
or 16 bits for octal byte and quad half, respectively. MDMX’s thirty-two
64-bit-wide registers and the 8-bit condition code coincide with the existing
floating-point register file, similar to the ‘‘paired-single’’-precision
floating-point data type. Data are moved between the shared floating-point
register file and memory with floating-point load/store double-word
instructions, and between floating-point and integer registers. In addition,
MDMX has a unique feature in its vector arithmetic: it can use a specific
element of a subword vector, or a constant immediate value, as an operand.
However, the reduction instruction (sum across) and sum of absolute
differences (SAD) are deliberately omitted. In particular, SAD, or L1 norm, can
be performed as an L2 norm without loss of precision using the
192-bit accumulator.
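The accumulator headroom stated above can be checked with a few lines of
arithmetic. The helper names below are hypothetical; the only inputs taken from
the text are the 192-bit accumulator and the two data types (eight 8-bit lanes
and four 16-bit lanes).

```python
ACC_TOTAL_BITS = 192  # shared accumulator width given in the text

def headroom(n_bits, lanes):
    """Return (bits per accumulator lane, guard bits, safe product count)."""
    acc_lane_bits = ACC_TOTAL_BITS // lanes
    product_bits = 2 * n_bits              # an N x N product needs 2N bits
    guard_bits = acc_lane_bits - product_bits
    return acc_lane_bits, guard_bits, 1 << guard_bits

def worst_case_fits(n_bits, lanes):
    """Check that 2^N maximal unsigned products cannot overflow a lane."""
    acc_lane_bits, _, count = headroom(n_bits, lanes)
    max_product = ((1 << n_bits) - 1) ** 2
    return count * max_product < (1 << acc_lane_bits)

octal_byte = headroom(8, 8)    # 24-bit lanes, 8 guard bits
quad_half = headroom(16, 4)    # 48-bit lanes, 16 guard bits
```

In both cases the guard bits equal N, so each lane can absorb 2^N full-range
products before overflow becomes possible.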
4.4.4 MMX 3DNow!
AMD’s 3DNow! (AMD, 1999; Oberman et al., 1999) is a multimedia extension
similar to Intel’s MMX, first implemented in the AMD K6-2 processor.
Floating-point instructions are added to the integer-based MMX instruction set
by introducing a new data type, single-precision floating point, to support 2D
and 3D graphics. As with MMX, applications must determine whether the
processor supports the extension. In addition, 3DNow! is implemented with a
separate flat register file, in contrast to the stack-based floating-point/MMX
register file. Because no physical transfer of data between the floating-point
and multimedia unit register files is required, FEMMS (faster entry/exit of
the MMX or floating-point state) is included to replace the MMX EMMS
instruction and to enhance performance. Either the register X or the register
Y execution pipeline can execute floating-point instructions, for a maximum
issue and execution rate of two operations per cycle (AMD, 1999). There are no
instruction-decode or operation-issue pairing restrictions. All operations
have an execution latency of two cycles and are fully pipelined. As long as
two operations do not fall into the same category, both operations will start
execution without delay. The two categories of the additional 21 instructions
are as follows:
1. PFADD, PFSUB, PFSUBR, PFACC, PFCMPx, PFMIN, PFMAX,
PI2FD, PFRCP, and PFRSQRT
2. PFMUL, PFRCPIT1, PFRSQIT1, and PFRCPIT2
Normally, all instructions should be properly scheduled so as to avoid delay due
to execution resource contention or structural hazard by taking dependencies and
execution latencies into account.
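The effect of the two categories on instruction ordering can be sketched with a
deliberately simplified toy model. It assumes in-order issue of at most two
adjacent operations per cycle and ignores data dependencies and the two-cycle
latency; the function names are hypothetical, and only the category membership
is taken from the lists above.

```python
CATEGORY_1 = {"PFADD", "PFSUB", "PFSUBR", "PFACC", "PFCMPx", "PFMIN",
              "PFMAX", "PI2FD", "PFRCP", "PFRSQRT"}
CATEGORY_2 = {"PFMUL", "PFRCPIT1", "PFRSQIT1", "PFRCPIT2"}

def category(op):
    """Return 1 or 2 according to the lists in the text."""
    if op in CATEGORY_1:
        return 1
    if op in CATEGORY_2:
        return 2
    raise ValueError("unknown operation: " + op)

def issue_cycles(ops):
    """Toy model: pair two adjacent ops only if their categories differ."""
    cycles = 0
    i = 0
    while i < len(ops):
        if i + 1 < len(ops) and category(ops[i]) != category(ops[i + 1]):
            i += 2          # both operations start in the same cycle
        else:
            i += 1          # same category: the second one must wait
        cycles += 1
    return cycles

clustered = issue_cycles(["PFADD", "PFMUL", "PFSUB", "PFMAX"])
interleaved = issue_cycles(["PFMUL", "PFADD", "PFMUL", "PFADD"])
```

Under these assumptions, interleaving operations from the two categories lets
every pair issue together, whereas back-to-back operations of one category
contend for the same execution resource.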
Special instructions in 3DNow! include the following:
FEMMS: Similar to MMX’s EMMS but faster because 3DNow! does not share MMX
registers with those of floating point.
PFRCP: Scalar floating-point reciprocal approximation
PFRSQRT: Scalar floating-point reciprocal square root approximation
PREFETCH: Loads 32 or more bytes, either nontemporal or temporal, into the
specified cache level
4.4.5 MMX/SIMD
MMX (multimedia extension) is Intel’s first native signal processing extension
instruction set (Intel, 1999). Subsequently, additional instructions were
added as the Streaming SIMD Extensions (SSE) (Intel, 1999) in Pentium III
class processors. SIMD supports 4-way parallelism of 32-bit, single-precision
floating point for 2D and 3D graphics or 32-bit integer for audio processing.
These new data types are held in a new, separate set of eight 128-bit SIMD
registers. Unlike MMX execution, traditional floating-point instructions can
be mixed with SSE without the need to execute special instructions, such as
EMMS. In addition, SIMD features an explicit SAD instruction and introduces a
new operating-system-visible state. Its special instructions include the
following:
EMMS (empty MMX state): Must be used to empty the floating-point tag word at
the end of an MMX routine before calling other routines executing
floating-point instructions
DIVPS: Divides four pairs of packed, single-precision, floating-point operands
PREFETCH: Loads 32 or more bytes, either nontemporal or temporal, into the
specified cache level
SFENCE (store fence): Ensures ordering between routines that produce weakly
ordered results and routines that consume these data, just like multiprocessor
weak consistency; nontemporal stores are implicitly weakly ordered, with no
write-allocate and with write combine/collapse, so that cache pollution is
minimized
4.4.6 VIS
Sun’s VIS (Visual Instruction Set) (Sun, 1997; Tremblay et al., 1996) is the only
NSP reviewed here that does not support parallelism on floating-point data types.
However, the subword data share the floating-point register file with
floating-point numbers, as indicated in Table 9. Some special instructions in VIS
are Array, Edge, Pdist, and Block transfer:
Array: Facilitates 3D texture mapping and volume rendering by computing a
memory address for data lookup based on fixed-point x, y, and z; data are laid
out in a block fashion so that points which are near one another have their
data stored in nearby memory locations
Edge: Computes a mask used for partial storage at an arbitrarily aligned start
or stop address, typically at boundary pixels
Pdist: Computes the sum of the absolute values of the differences of eight
pixel pairs
Block transfer: Transfers 64 bytes of data between memory and registers
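The behavior described for Pdist and Edge can be modeled in a few lines. Both
functions below are simplified stand-ins rather than the VIS encodings: the
names, the optional running accumulator, the 8-byte block size, and the
representation of the mask as a list of booleans are assumptions chosen for
illustration.

```python
def pdist(pixels_a, pixels_b, acc=0):
    """Sum of absolute differences of eight pixel pairs, added to acc."""
    if len(pixels_a) != 8 or len(pixels_b) != 8:
        raise ValueError("expected eight pixels per operand")
    return acc + sum(abs(a - b) for a, b in zip(pixels_a, pixels_b))

def edge_mask(start, stop, block=8):
    """Flag the bytes of the aligned block holding `start` that lie in
    the inclusive range [start, stop]; only flagged bytes are stored."""
    base = start - (start % block)
    return [start <= base + i <= stop for i in range(block)]

sad = pdist([10, 20, 30, 40, 50, 60, 70, 80],
            [12, 18, 30, 45, 40, 60, 71, 79])
left_edge = edge_mask(3, 100)    # row starts 3 bytes into a block
right_edge = edge_mask(16, 18)   # row ends 3 bytes into a block
```

A motion estimation loop would call a routine like pdist once per group of
eight pixels, accumulating the block distance across rows.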
5 SOFTWARE PROGRAMMING TOOLS FOR PDSPs
5.1 Software Development Tools for Programming PDSPs
Since their introduction more than a decade ago, PDSPs have been incorporated
in many high-performance embedded systems such as modems and graphic accel-
eration cards. A unique requirement of these applications is that they all demand
high-quality (machine) code generation to achieve the highest performance while
minimizing the size of the program to conserve premium on-chip memory space.
Often, the difference of one or two extra instructions implies that either a real-
time processing constraint may be violated, leaving the code generated use-
less, or an additional memory module may be needed, causing significant cost
overrun.
High-level languages (HLLs) are attractive to PDSP programmers because
they hide hardware-dependent details and simplify the task of programming.
Unlike assembly code, HLL programs are readable and maintainable and are more
likely to be portable to other processors. In the case of an object-oriented HLL,
such as C++, those programs are also more reliable and reusable. All these
features help reduce development time and cost.
Figure 1 depicts an example of typical software development for PDSPs:
the TMS320C6x software development flowchart. There are three possible source
programs: C source files, macro source files, and linear assembler source files.
The latter two are both at the assembly program level. The assembly optimizer
assigns registers and uses loop optimization to turn linear assembly into
highly parallel assembly that takes advantage of software pipelining. The
assembler translates assembly language source files into machine language
object files. The machine language is based on the common object file format
(COFF). Finally, the linker combines object files into a single executable
object module. As it creates the executable module, it performs relocation and
resolves external references. The linker also accepts relocatable COFF object
files and object libraries as input.
Figure 1 TMS320C6x software development flow. (From Texas Instruments, 1998a.)
To improve the quality of the code generated, C compilers are typically equipped
with extensive optimization options. Many of these compiler optimization
strategies are based on the GNU C Compiler (GCC) (see Table 10).
The debugger can usually serve as both simulator and profiler, like the C source
debugger (Texas Instruments, 1998b). The C source debugger provides an advanced
graphical user interface (GUI) to develop, test, and refine ’C6x C programs and
assembly language programs. In addition, the ’C6x debugger accepts executable
COFF files as input. It features the following capabilities, which are common
in other PDSP development environments:
• Multilevel debugging; user can debug both C and assembly language
code.
• Dynamic profiling provides a method for collecting execution statistics
and immediate feedback to identify performance bottlenecks within the
code.
• Fully configurable graphical user interface.
• Comprehensive data displays.
5.2 On-Chip Emulation
The presence of the Joint Test Action Group (JTAG) test access module and
enhanced on-chip emulation (EOnCE) module interface allows the user to insert
the PDSP into a target system while retaining debug control. The EOnCE module,
as shown in Figure 2, is used in PDSP devices to debug application software in
real time. It is a separate on-chip block that allows nonintrusive interaction with
the core. The user can examine the contents of registers, memory, or on-chip
peripherals through the JTAG pins. Special circuits and dedicated pins on the
core are defined, to avoid sacrificing user-accessible on-chip resources.
As applications grow in terms of both size and complexity, the EOnCE
provides the user with many features, including the following:
• Breakpoints on data bus values
• Detection of events, which can cause a number of different activities
configured by the user
• Nondestructive access to the core and its peripherals
Table 10 Compiler Optimization Options in DSP16000 Series
Option  Optimization performed                      Targeted application
-O0     Default operation, no optimization          C level debug to verify functional correctness
-O1     Optimize for space                          Optimize space for control code
-O2     Optimize for space and speed                Optimize space and speed for control code
-O      Equivalent to -O2                           Equivalent to -O2
-O3     -O2 plus loop cache support, some loop      Optimize speed for control and loop code
        unrolling
-O4     Aggressive optimization with software       Optimize speed and space for control and loop code
        pipeline
-Os     Optimize for space                          Optimize space for control and loop code
Source: Lucent, 1999.
Figure 2 Typical debugging system using EOnCE. (From StarCore, 1999.)
• Various means of profiling
• Program tracing buffer
The EOnCE module provides system-level debugging for real-time systems, with
the ability to keep a running log and trace of the execution of tasks and interrupts
and to debug the operation of real-time operating systems (RTOS).
5.3 Optimizing Compiler and Code Generation for PDSP
The PDSP architecture evolves from an ad hoc heterogeneous resource toward
a homogeneous resource like the general-purpose RISC microprocessor. One of
the reasons is to make compiler optimization techniques less difficult. Classical
PDSP architecture is characterized by the following:
• A small number of nonuniform register sets in which certain registers
and memory blocks are specialized for specific usage
• Highly irregular datapaths to improve code performance and reduce
code size
• Very specialized functional units
• Restricted connectivity and limited addressing to multipartitioned
memory
Several techniques have been proposed based on a simplified architecture of
TMS320C2X/5X [e.g., instruction selection and instruction scheduling (Yu and
Hu, 1994), and register allocation and instruction scheduling].
The success of RISC and its derivatives, such as the superscalar architecture
and the VLIW architecture, has exerted a significant impact on the evolution of
modern PDSP architecture. However, code density is a central concern in
developing embedded DSP systems. This concern leads to the development of new
strategies such as code compaction (Carmel, 1999) and user-defined long instruc-
tion word (Kievits et al., 1998). With a smaller code space in mind, code compac-
tion using integer programming (Leupers and Marwedel, 1995) was proposed for
applications to PDSPs that offer instruction-level parallelism such as VLIW.
Later, an integer programming problem formulation for simultaneous instruc-
tion selection, compaction, and register allocation was investigated by Geboyts
(Geboyts, 1997). It can be seen that earlier optimization techniques were focused
on the optimization of basic code blocks. Thus, they can be considered a local
optimization approach. Recently, the focus has shifted to global optimization
issues such as loop unrolling and software pipelining. In Stotzer and Leiss (1999),
results of implementing software pipelining using a modulo scheduling algorithm
on the ’C6x VLIW DSP have been reported.
Artificial intelligence (AI) techniques such as planning were employed to
optimize instruction selection and scheduling (Yu and Hu, 1994). With AI, con-
current instruction selection and scheduling yield code comparable to that of
handwritten assembly codes by DSP experts. The instruction scheduler is a heu-
ristic list-based scheduler. Both instruction scheduling and selection involve node
coverage by pattern matching and node evaluation by heuristic search using
means-end analysis and hierarchical planning. The efficiency is measured in
terms of size and execution time of generated assembly code whose size is up
to 3.8 times smaller than that of a commercial compiler.
Simultaneous instruction scheduling and register allocation for minimum
cost based on a branch-and-bound algorithm are reported. The framework can
be generalized to accumulator-based machines to optimize accumulator spilling,
such as the TMS320C40. Their uses are likely intended to obtain more compact
code.
Recently, in Leupers and Marwedel (1995) and Geboyts (1997), integer lin-
ear programming is shown to be effective in compiler optimization. The task of
local code compaction in VLIW architecture is solved under a set of linear con-
straints such as a sequence of register transfers and maximum time budget. Because
some DSP algorithms show more data flow and less control flow behavior (Leupers
and Marwedel, 1995), code compaction exploits parallel register transfers to be
scheduled into a single control step, resulting in a lower cycle count to satisfy
the timing constraint. Of course, it is important to consider resource conflicts and
dependencies, as well as the possible side effects of encoding restrictions and
operations. A later effort integrated instruction selection, compaction, and
register allocation (Geboyts, 1997). The targeted PDSP is defined using arc
mappings, which are later converted into logical propositions to make the
approach retargetable. These propositions are then translated into mathematical
constraints to form the optimization model using integer linear programming.
Code is generated and optimized for minimum
code size and maximum performance in estimated energy dissipation.
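The idea of local code compaction, packing independent register transfers into
the same control step, can be sketched with a greedy list scheduler. This is
not the integer-programming formulation discussed above; it is a simpler
heuristic used here only to make the notion of control steps concrete. The
function name, the operation names, and the single `width` resource limit are
hypothetical.

```python
def compact(ops, deps, width):
    """Greedily place each operation in the earliest control step that
    follows all of its predecessors and still has a free issue slot.

    ops   -- operation names in program order (predecessors come first)
    deps  -- dict mapping an operation to the set of its predecessors
    width -- maximum number of operations per control step
    """
    step_of = {}
    steps = []
    for op in ops:
        earliest = 0
        for pred in deps.get(op, ()):
            earliest = max(earliest, step_of[pred] + 1)
        s = earliest
        while s < len(steps) and len(steps[s]) >= width:
            s += 1                      # step is full, try the next one
        while len(steps) <= s:
            steps.append([])
        steps[s].append(op)
        step_of[op] = s
    return steps

program = ["ld_a", "ld_b", "mul", "ld_c", "add", "st"]
dependencies = {"mul": {"ld_a", "ld_b"}, "add": {"mul", "ld_c"}, "st": {"add"}}
schedule = compact(program, dependencies, width=2)
```

Six sequential register transfers fit into four control steps here; an exact
method such as integer linear programming would search for the provably
shortest schedule under the same kind of constraints.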
In modern VLIW PDSPs, the architecture features homogeneous functional
and storage resources, enabling global optimization by the compiler. Software
pipelining is known as one of the most popular code scheduling techniques. It
exploits the available instruction-level parallelism in various loop iterations. A
software-pipelined loop consists of three components:
• A prolog: set up the loop initialization
• A kernel: execute pipelined loop body in steady state
• An epilog: drain the execution of the loop kernel
Modulo scheduling takes an innermost loop body and constructs a new schedule.
The new schedule is equivalent to overlapping loop iterations. The algorithm
utilizes a data precedence graph (DPG) and reservation table to construct a per-
missible schedule of the loop body under the available resource constraints. DPG
is a directed graph (possibly cyclic) with nodes and edges representing operations
and data flow dependencies of the original inner loop body. The resource
requirements for an operation are modeled using a reservation table. Stotzer and
Leiss (1999) report the results of software pipelining on a set of 40 loop
kernels based on the ’C6x architecture. The architectural features that affect
the performance gain of software pipelining are the moderately sized register
file, constraints on code size, and multiple assignment code.
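The prolog, kernel, and epilog structure can be made concrete with a small
simulation. The sketch below does not implement modulo scheduling itself; it
only shows the shape of the resulting loop for an assumed three-stage body
(load, multiply, store) that starts one new iteration per cycle. All names are
illustrative.

```python
def software_pipeline(x, coeff):
    """Compute y[i] = x[i] * coeff with overlapped loop iterations."""
    n = len(x)
    stages = 3                          # load, multiply, store
    y = [None] * n
    loaded = {}                         # iteration -> value loaded
    product = {}                        # iteration -> value multiplied
    phases = []
    for cycle in range(n + stages - 1):
        # Run the later stages first so that each stage consumes a value
        # produced in an earlier cycle, as a hardware pipeline would.
        it = cycle - 2
        if 0 <= it < n:
            y[it] = product.pop(it)              # stage 3: store
        it = cycle - 1
        if 0 <= it < n:
            product[it] = loaded.pop(it) * coeff  # stage 2: multiply
        it = cycle
        if 0 <= it < n:
            loaded[it] = x[it]                   # stage 1: load
        if cycle < stages - 1:
            phases.append("prolog")     # pipeline is filling
        elif cycle < n:
            phases.append("kernel")     # all three stages are busy
        else:
            phases.append("epilog")     # pipeline is draining
    return y, phases

result, phases = software_pipeline([1, 2, 3, 4, 5], 2)
```

Five iterations of a three-stage body complete in seven cycles instead of
fifteen, at the cost of the extra prolog and epilog code that the text lists
among the constraints on code size.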
6 DSP SYSTEM DESIGN METHODOLOGIES
Designing modern DSP systems requires more than just programming PDSPs or
processing cores. Instead, overall system performance must be the primary
criterion. DSP system design methodologies are developed at different
levels of abstraction. At the system level, the design scope includes task and
data partitioning and software synthesis/simulation. At the architectural level,
the focus is on architecture and compiler development. At the chip implementa-
tion level, hardware description languages such as VHDL and hardware/software
codesign methodologies are quite important.
6.1 Application Development with Existing
Hardware/Processing Core
A software engineering approach is incorporated to assist in the development of
an application using a DSP array processor at Raytheon System Corporation
(Kelly and Oshana, 1998). Three performance measurements are used to gauge
the quality of the design:
• Processor throughput rate
• Memory utilization
• I/O bandwidth utilization
A sensitivity analysis of these performance metrics is performed to examine
trade-offs of various design approaches. Factors that affect the processor
throughput rate include the quality of the DSP algorithm formulation, the
operation cost in processor cycles, the ratio of sustained to peak throughput,
and the expected speedup when the design is upgraded to the next generation of
PDSP. Regarding the memory utilization, it has been observed that the size of
the data samples and the dynamic nature of memory usage patterns are the two
most important factors. The I/O bandwidth utilization, on the other hand, depends
on the algorithm as well as the hardware design. Several design tools used to
develop the entire system, and the factors associated with each that may degrade
performance during the design process, are listed in Table 11.
Rate monotonic analysis (RMA) (Liu and Layland, 1973) is necessary to
validate the schedulability of software architecture. In general, the following les-
sons have been learned through this design experience:
• Prototype early in the development cycle.
• Ignore processor marketing information (actual throughput is highly
dependent on the application profile).
• Carefully analyze the most frequently executed function: task switching.
• Take inherent interface overheads such as interrupt handling, data pack-
ing, and data unpacking into account in estimating the throughput.
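The schedulability check behind rate monotonic analysis can be sketched with
the utilization bound from Liu and Layland (1973): n periodic tasks are
schedulable under rate monotonic priorities if their total utilization does not
exceed n(2^(1/n) - 1). The test is sufficient but not necessary. The function
names and the task values below are illustrative assumptions, not data from the
project described above.

```python
def rma_bound(n):
    """Liu and Layland utilization bound for n periodic tasks."""
    return n * (2 ** (1.0 / n) - 1)

def utilization(tasks):
    """tasks is a list of (execution_time, period) pairs in the same unit."""
    return sum(c / t for c, t in tasks)

def passes_rma_test(tasks):
    """True if the sufficient utilization test guarantees schedulability."""
    return utilization(tasks) <= rma_bound(len(tasks))

light_load = [(1, 4), (1, 5), (2, 10)]   # utilization 0.65
heavy_load = [(2, 4), (2, 5), (2, 10)]   # utilization 1.10
light_ok = passes_rma_test(light_load)
heavy_ok = passes_rma_test(heavy_load)
```

Execution times used in such a check should include overheads such as task
switching and interrupt handling, in line with the lessons listed above.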
Table 11 DSP System Development Tools and Factors That May
Degrade Performance
Tools                                   Factors that may degrade performance
Code generation                         Compiler efficiency
                                        Quality of generated assembly code
                                        Size of load image
Instruction level processor simulator   Cycle counts for elementary operation
Cycle-accurate device level VHDL        External memory access time
model                                   Instruction caching effects
                                        Resource contention between processor and
                                        DMA channels
Another example of DSP system development is the Computer Assisted
Dynamic Data Monitoring and Analysis System (CADDMAS) developed for the
U.S. Air Force and NASA (Sztipanovitz et al., 1998). Its details have been
described in Section 2.3. An adaptive approach was necessary to allow the structure
of the system to adapt to changing external requirements and sensor availabilities.
This leads to the application of a reconfigurable controller for process control
called structurally adaptive signal processing. Unlike parametric adaptation,
where the topology of the graph is fixed and coefficients can change over time,
a structurally adaptive signal processing system can change its computational
structure on the fly. Therefore, its control functionality can be maintained even
in the face of sensor failures; the performance will be gracefully degraded but
correct control action is still present.
6.2 Application-Driven Design: Fine-Tuning
the Processing Core
Two reasons contribute to the poor performance of commercial HLL compilers for
PDSPs: first, compilers are developed after a target architecture has
been established and, second, DSP compilers are unable to exploit DSP-specific
architectural features (Lee, 1994). The following application-driven design
methodologies are adopted:
• A DSP architecture and its compiler are developed in parallel.
• Dynamic statistics are used to assess the impact of trade-offs on performance.
• An iterative analysis is undertaken to fine-tune the architecture and
compiler.
The PDSP architecture is based on VLIW. As a result, an optimizing C
compiler is necessary to exploit static instruction-level parallelism as well as
DSP-specific hardware features. Those hardware features are modulo addressing,
low overhead looping, and dual data-memory banks. Meanwhile, an instruction
set simulator is developed to gather statistics on the run-time behavior of DSP
programs. A suite of DSP benchmarks, comprising kernels and applications, is
chosen to evaluate the system. The performance success of the compiler is due
to the flexibility of the model VLIW architecture. The statistics indicate the
areas of improvement to be fed back for fine-tuning the architecture. However,
the drawback of this architecture is its high instruction-memory bandwidth
requirement, which can be too expensive and impractical to implement.
As another means of DSP architecture development, machine description
language (MDL) has been proposed to achieve rapid prototyping at architectural
level. Recently, LISA (Peesl et al., 1999) was developed for the generation of
bit and cycle accurate models of a PDSP. It includes instruction set architecture
that enables automatic generation of simulators and assemblers. LISA is com-
posed of resource and operation declarations. Resource declaration represents the
storage objects of the hardware architecture (e.g., registers, memories, pipelines).
Operation declaration collects the description of different properties of the
system (i.e., the instruction set model, the behavioral model, the timing model,
and necessary declarations). LISA supports cycle-accurate processor models,
including constructs to specify pipelines and their mechanisms. It targets SIMD,
VLIW, and superscalar architectures. LISA also provides direct support for
compiled simulation techniques and a strong orientation toward the C programming
language. As a real-world example, the Texas Instruments TMS320C6201 DSP was
modeled on a cycle-by-cycle basis by a single designer within
2 months.
6.3 Reconfigurable Computing: Hardware/Software
Codesign for a Given Application
In the system design process, the decision to implement in either custom
hardware or software running on PDSP(s) has traditionally been made on a
subtask-by-subtask basis. On one hand, custom hardware or an ASIC can be
customized to a particular subtask, resulting in a relatively fast and efficient
implementation. An ASIC is physically programmed by patterning devices
(transistors) and metal interconnection prior to the fabrication process. Higher
throughput and lower latency can be achieved with more space dedicated to
particular functional units. On the other hand, a PDSP is programmed later by
software, resulting in a flexible but relatively slow and inefficient
realization. Temporal or sequential operations can be accomplished by a set of
instructions that program the processor after its fabrication.
Between these two extremes, reconfigurable computing (RC) architecture
can be programmed to perform any specific function by a set of configuration
bits. In other words, RC combines temporal programmability with spatial com-
putation in hardware after it is fabricated at low overhead. The hardware/
software boundaries can be altered by the RC paradigm (DeHon and Wawrzynek,
1999).
Reconfigurable computing is motivated by the 90/10 rule of thumb, whereby
90% of run time is spent on 10% of the program. Hardware/software partitioning
follows from this rule: the higher the percentage of run time spent in a
specialized computation, the greater the improvement in cost/performance if it
is implemented in hardware; and the more the specialized computation dominates
the application, the more closely the specialized processor should be coupled
with the host processor. This rule of thumb has been successfully applied to the
floating-point processing unit as well
as RC.
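The payoff suggested by the 90/10 rule can be estimated with Amdahl's law, which
is introduced here as a supporting illustration and is not part of the original
discussion. If a fraction f of the run time is moved to hardware that executes
it s times faster, the overall speedup is 1 / ((1 - f) + f / s). The function
name and the numeric values are assumptions.

```python
def overall_speedup(fraction, kernel_speedup):
    """Amdahl's law: speedup of the whole program when `fraction` of its
    run time is accelerated by a factor of `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)

hot_kernel = overall_speedup(0.9, 10)      # 90% of run time, 10x faster
cold_code = overall_speedup(0.1, 1000)     # 10% of run time, 1000x faster
```

Accelerating the dominant 90% by a modest factor of 10 yields a speedup above
5, whereas accelerating the remaining 10% by a factor of 1000 yields barely
more than 1.1, which is the intuition behind partitioning by run-time share.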
In the heterogeneous system approach, RC is combined with the general-
purpose processing capability of the traditional microprocessor. The interface
between these two can be either closely or loosely coupled, depending on its
applications. How frequently RC’s functionality should be reconfigured dynami-
Copyright © 2002 by Marcel Dekker, Inc. All Rights Reserved.
"Good-night, Papa Clyde; Doctor Heath says you are the most
splendid fellow in the world--but I know you are the dearest father
in the world; good-night, I 've had a lovely party."
She ran upstairs, but, in a moment, her father heard her
tripping down again. Her head parted the portières. "I just came
back to tell you, that this kind of a talk we 've had is just as good as
the Mount Hunger bedtime-talks. I shan't be homesick any more."
And away she ran.
Now John Curtis Clyde was a pew-owner--as had been his father
and grandfather before him--in one of the Fifth Avenue churches,
and duly made his appearance in that pew every Sunday morning.
He entered, too, into the service with hearty voice, and made his
responses without, the while, giving undue thought to the world. But
when he had said "Our Father" with his little daughter by his side, he
had supposed his duty performed to the extent of his needs--of
another's, his child's, he gave no thought.
To-night, however, as he sat in the easy-chair where Hazel had
left him, it began to dawn upon him slowly that his little daughter,
during her fourteen years, might have had other needs, for which he
had not provided, nor, perhaps, with all his riches was capable of
providing.
The clock chimed twelve,--one,--two--; John Clyde, with a sigh,
rose and went up to bed--a wiser and a better man.
XXII
ROSE
What a summer that was! Mr. Clyde sent Hazel up to the Blossoms
for July and again for September, when he, the Colonel and Mrs.
Fenlick, the Pearsells and the Masons, Aunt Carrie and Uncle Jo took
possession of the entire inn at Barton's River, and for a month
coached and rode throughout the "North Country," all in the cool
September weather. Jack Sherrill joined them for the last three
weeks, and, this time, Maude Seaton was not of the party.
"I just headed her off every time she made a dead set at any
one of us for an invitation," said Mrs. Fenlick one day in confidence
to her intimate, Mrs. Pearsell, as they sat on the vine-covered
veranda of the inn, "but she proved a regular octopus. She got the
Colonel in her toils one morning at the Casino, and I pretended to be
faint--yes, I did--just to get his attention for a sufficient time to
make a fuss, and get him alone in the carriage; then, of course, I
settled it. Oh, dear! men are so guileless in spots!"--Mrs. Fenlick
gave a weary sigh--"What I have n't been through with that girl!
Anyway, she's been out two winters, now, and she has n't caught
Jack Sherrill yet. I don't think there is much chance after the first
season for a girl to make a really fine match, do you?" Then they fell
to discussing the pros, and cons, of the question with evergreen
interest.
Jack Sherrill, for one, had no thought of Miss Seaton. He had
sent the valentine-flowers, and the sentiment from Barry Cornwall's
love-song, with a strange kind of "kill or cure" feeling.
He had communed with himself, at twilight of one February day,
as he lay at full length on the cushioned window-seat of his room
from which he looked down upon the darkening, snow-covered
campus and the anatomy of the elms showing black against it. His
pipe had gone out, but he derived some satisfaction in pulling away
at it mechanically, while he thought out the situation for himself.
"What's the use of a man's hanging fire when he knows?" he
thought. "Now, I love her--love her." (Jack's hand stole into the
breast of his jacket and crushed a bit of paper there; he smiled.) "Of
course she does n't know, and won't know for a while, but it shan't
be through any neglect of mine that she does n't; and when she
knows--there 's the rub!--will she care for me, Jack Sherrill? I 've
never done anything in my life to make a girl like that care for me.
"But there's one thing I 'd stake my life on--she would n't marry
a man for his money. A man 's got to be loved for himself--not for
what he can give a woman, or do for her, but just for himself, if it's
going to be the real thing, and last. And what am I that a girl like
that should love me--" Jack was growing very humble. He pulled
himself together: "Anyhow, I'll send the flowers and the sentiment, I
mean it; I don't care what she thinks!" Jack's courage rose as he
began to feel something like defiance of Fate.
Just then his chum came in.
"There's no use, Sherrill," he said, flinging himself down upon
the cushioned seat Jack had just vacated; "we can't have the
theatricals unless you take the girl's part. It won't put you out any--
smooth face and no scrub. You 've been it once, and it will be a dead
failure if you aren't in it now."
"I don't see how I can," replied Jack, shortly, for this intrusion
on his mood irritated him. "I told you, all of you, at the Club last
year, that I would n't play after I was a Junior."
"Well, what if you did?" rejoined his chum, a little crossly. "You
're not so uncompromisingly steadfast in other things that you can't
afford to change your mind in such a trifle as this."
"Come, don't be touchy," said Jack, good-humoredly. "Hit right
out from the shoulder, old man, and tell me what you mean."
Dawns smiled, clasped his hands under his head, and raised his
merry blue eyes to Jack, who was lighting up.
"They say over at the Club that you have thrown Maude Seaton
over, but Grayson took up the Seaton cudgels and made the
statement that she had thrown you over, and you won't take the
girl's part in the play because she is coming on for it."
Jack hesitated. He hated to play at any comedy of love when his
heart was throbbing with the genuine article. But, after all, it might
be the best way to silence the Club's tongues as well as some others
in Boston and New York.
"I 'll help you out this once, Dawns, but I tell you plainly I won't
have anything more to do with the Club theatricals while I 'm in
college," he replied, ignoring both of Dawns' statements, which
omissions his chum noticed, and made his own thoughts: "Just like
Sherrill. You can't get any hold of him to know what he really feels
and thinks."
Jack played his part accordingly, repeating the success of the
year before, and scoring new triumphs. He was glad when it was
over, and he could go back to his room "dead tired," as he said to
himself, but with the conviction that he had settled matters to his
own satisfaction if not to that of one other.
The room was in such disorder! Evidently, Dawns had been
having a little spree before Jack's late return, and the smoke had left
the air heavy.
Jack dropped his paraphernalia in the middle of the floor--
peeling himself as he stood yawning and thanking his lucky star that
he was not born a woman to be handicapped by such things!--
décolleté white satin waist, long-trained satin gown, necklace--Jack
gave the string a twitch, for it had knotted, and the Roman pearls
rolled into unreachable places all over the floor. Off flew one white
satin slipper--number ten, broad at the toes!--with a fine "drop kick"
hitting the ceiling and landing on the book-shelves; the other
followed suit. White fan with chain, white elbow gloves, corsage
bouquet--all dropped in a promiscuous heap. A general stampede
loosened silk under-skirt and dainty muslin petticoat, lace-trimmed.
A wrench,--corset-cover and corsets were torn from their moorings.
Jack groaned--or something worse--at the flummery, and, leaving
everything as it had dropped, rushed off into his bedroom, only to
find that he had forgotten to take off the blonde wig and wash off
the rouge.
At last, however, he was asleep, and slept the sleep of the
justified.
He slept both soundly and late, but when he awoke the next
morning his first thought was of the flowers for Mount Hunger and
the appropriate sentiment. Accordingly, having reckoned the arrival
of train, departure of stage, etc., to a minute, he selected the
flowers, wrote the sentiment, not without forebodings of the usual
kind, and despatched both to Mount Hunger with high hopes,
notwithstanding prescient feelings. Then, metaphorically, he sat
down to await an answer. He waited just two months, and during
that time had turned emotionally black and blue more than once at
the thought of his temerity in sending such a message.
Hazel had written him at once from North Carolina to tell him of
March's illness, and on the same day she sent a penitent note to
Rose, confessing her shame at her attempt at deception, and
explaining that it was because she loved her cousin so dearly she
could not bear to see his gift slighted.
When March was out of danger, Rose had written to Hazel a
frank, loving letter, blaming herself for her want of self-control, and
begging Hazel's forgiveness for her harsh words:
"It's all my old pride, Hazel dear," she wrote, "that I have to fight
very often. It was most kind of Mr. Sherrill to remember me when he
has so many, many other friends whom he has known longer, and I
shall write and tell him so. Now that my heart is lighter on account
of dear March, I can write more easily.
"We miss you so! when are you coming back to us? Chi looks
perfectly disconsolate, and we all feel a great deal more than we
care to say.
"I wish you were here to have the fun of the French evenings,
three times a week. You speak it so beautifully, Mr. Ford says, and I
thank you so much for all the help you gave me in teaching me. Mr.
Ford speaks it very well, too, so Miss Alton says. We all meet at our
house once a week on March's account, and then one evening in the
week, Miss Alton and I (she 's lovely) go over to the Fords' for
music. He has sent for some lovely songs for me--old English ones,
and we're going to have a little celebration for March's birthday in
May. How I wish you were to be here!
"March is lying on the settle, dreaming over that exquisite
photograph of Cologne Cathedral you sent him; I've just asked him if
he had any messages for you, and he smiled--oh, it's so good to see
his dear smile again! You can't think how tall he's grown since his
illness, and he's so thin--and said, 'I sent one to her this morning
myself; she can't have two a day.' But you know March's ways.
"Now I must stop; Mr. Ford is coming over on horseback and I
am riding Bob now. I wear an old riding-habit of Martie's--it fits fine!
I have more to tell you, but will finish after I get back from the ride--
there comes Mr. Ford--"
This letter Hazel duly forwarded to her cousin. "He 'll know by what
she says in it that she really was pleased, for all she acted so queer,"
she said to herself as she enclosed it in one to Jack, in which she
took special pains to inform him that he had never told her whether
he had given those verses Rose sang to Miss Seaton.
"I told Rose I was sure they were for Miss Seaton, and Rose said she
did n't mind copying them herself for you if you wished them. Do tell
me if you gave them to her. I told Rose your valentine to her last
year was a rose-heart. I hope you don't mind my telling, for, you
know, Jack, all our family think you are engaged to her--"
Jack dropped Hazel's letter at this point and gave a decided groan.
"What luck!" he muttered. "It's all up with the whole thing now.
No girl of any spirit would stand all that--and Hazel meddling so!
thinking she is doing her level best to explain matters;--What an ass
I was to send that flower-valentine to Maude--and she thinks I gave
her those verses! and there 's this Ford skulking round and having it
all his own way; he 's just the kind a girl would care for--those
musical cranks are no end sentimental. Hang it all!"
Jack thrust his hands deep into his pockets, took several
decided turns up and down the room, squared his shoulders, pursed
his lips, cut his two classroom lectures, ordered up Little Shaver and
rode out to the polo grounds, where, finding himself alone, he put
the little fellow through his best paces, ignoring the fact that snow
and ice wore on the pony's nerves--and had a game out to himself.
When just two months had passed, he received a note from
Rose, his first, and it was accorded the reception due to first notes in
particular. After this, Jack developed certain wiles of diplomacy that he
had thus far, in his various experiences, held in abeyance. He wrote
sympathetic notes to Mrs. Blossom; commissioned Chi to find him
another polo pony--Morgan, if possible--among the Green Hills; sent
March a set of illustrated books on architecture, and complained to
Doctor Heath of a pain that racked his chest; at which the Doctor's
eyes twinkled. He said he would examine him later, but he was
convinced it was heart trouble, the symptoms were apt to mislead
and confuse. He added gravely: "Too much hard polo riding, Jack;
get away into the country--mountains if you can, and you 'll
recuperate fast enough. I 'll make an examination in the fall."
Jack obeyed to the letter, and what a month of September that
was!
There were glorious rides with Rose along the beautiful river
valley and over the mountain roads. There were delightful evenings
at the Fords', and silent, beatific walks with Rose homewards
beneath the harvest moon. There were morning rambles with Rose
up over the pastures and deep into the woodlands for late ferns and
hooded gentians. There were adorable hours of doing nothing but
adore, while Rose was busy about her work, setting the table for tea
(Jack paid his board at the inn, but he lived at the Blossoms'), or
laying the cloth for dinner, or on Saturday morning even making rolls
for the tea to which the whole party at the inn were invited.
Chi was in his glory. Little Shaver came trotting regularly every
day up through the woods'-road, and whinnied "Good-morning" first
to Fleet, then to Chi. There were general coaching-parties to
Woodstock and Brandon, in which Mrs. Blossom was guest, and a
grand tea at the Fords' for all the guests, with a musicale for a
finish, and an informal dance in the Blossoms' barn to which all the
Lost Nation were invited.
They accepted, one and all. Captain Spillkins was in his element,
so he said. He and Mrs. Fenlick danced a two-step in a manner to
win the commendation of the entire assembly. Miss Elvira and Miss
Melissa went through the square dance escorted by Jack and Uncle
Jo. There were round dances and contra dances. Uncle Israel
contributed an "1812" jig, and Mr. Clyde passed round the hat for his
sole benefit. There were waltzes for those who could waltz, and
polkas for those who could polka, and schottische and minuet.
"There never was such a dance since before the Deluge!" declared
Mrs. Fenlick, when Captain Spillkins escorted her to a seat on a sap-
bucket; and then they all went at it again in a grand finale, the
Virginia Reel--Chi and Hazel, Mr. Clyde and Aunt Tryphosa for head
and foot couple; Maria-Ann with Jack; Alan Ford with Mrs. Fenlick;
the Colonel with Mrs. Blossom whom he admired greatly; March and
Miss Alton--such a double row of them!
Poor Reub sat in one of the empty stalls and watched the fun
with slow, half-understanding smile, and Ruth Ford reclined in a
rocking-chair in the corner, and with merry laughter and sparkling
wit soothed the dull ache in her heart at the knowledge that she
was henceforth to be a "Shut-out" from all that life had at first given
her.
The next day after the dance there was a grand dinner given at
the inn by the Newport party to all the Lost Nation; and, later on,
private entertainments for Mr. and Mrs. Blossom and the Fords. At
last, when the first maple leaves crimsoned and the frost silvered the
mullein leaves in the pasture, Hazel, her father, Jack, and their
friends bade good-bye to the Mountain and all its joys of
acquaintance, and in some cases, friendship, and turned their faces,
not without reluctance on the part of some of them, city-wards.
"Oh, mother! has n't it been too beautiful for anything?"
exclaimed Rose, turning to her mother, as the last of the riding-party
waved his cap in farewell to those on the porch. It was Jack.
"We have had a happy summer, Rose;--I think they have, too,"
her mother added, shading her eyes from the setting sun. "You 'll be
very lonely here at home, dear, after all this gayety."
"Lonely! Why, Martie Blossom, how can you think of such a
thing!" said Rose, still scanning the lower road for a last glimpse of
the riders. "See, see, they are all waving their handkerchiefs!"
The whole Blossom family laid hold of what they could--napkins,
towels, a table-cloth, and Chi seized his shirt, which he had hung on
the line to dry, and waved frantically until the party was no longer to
be seen.
"Lonesome! the idea," said Rose, turning to her mother. "Think
of all the studying March and I have to do, and the French evenings,
and the Fords, and Thanksgiving coming, and then Christmas, and
then--
"Then," said Mrs. Blossom, interrupting her, "my Rose takes a
little plunge into that whirlpool of gay life and fashion in New York."
"Yes," said Rose, with a happy smile that spoke volumes to her
mother, "I do look forward to it, Martie dear; but the whirlpool shan't
suck me under; I shall come home just your old-fashioned Rose-
pose."
"I hope so, dear," said her mother, a little wistfully, and called
the children in to supper.
Indeed, they found little opportunity to miss their friends in the
ensuing months; for there came kindly letters, and friendly letters,
and something very nearly resembling love-letters. The mail brought
papers, books, and magazines. The express brought to Barton's
River many a box of lovely flowers. At Christmas came more than
one remembrance for them all, including Aunt Tryphosa and Maria-
Ann, and four special invitations for Rose to visit in New York directly
after the holidays. One was from Mr. Clyde--with an urgent request
from Hazel to say "yes" by telegram and "relieve her misery," so she
put it--; one from Mrs. Heath; one from Aunt Carrie, and a gushingly
cordial one from Mrs. Fenlick! Each claimed her for a month. But
Mrs. Blossom shook her head.
"No, no, dear, you would wear your welcome out. I shall need
you at home by the last of February. I think you can accept only Mr.
Clyde's and Mrs. Heath's. You can accept social courtesies from the
other two of course."
"But, mother," Rose's face was the image of despair, "what shall
I wear? Just hear what Hazel has planned--'lunches, dinners,
theatre, concerts'--why! I can never go to all those things."
"I 've thought of that, too, Rose; but the little colt shan't go
bare this time--it will take some courage, dear, to wear the same
things over and over again, not to mention the puzzle of planning for
it all."
"I 'm not 'Molly Stark' for nothing," laughed Rose, and the two
women began to plan for what Chi called "Rose's campaign." The
pretty white serge was lengthened and made over to appear more
grown up, as Cherry put it; the dark blue wash silk--Hazel's gift that
had never been made up--was fashioned into a "swell affair"--so
March pronounced it; the old-fashioned blue lawn was cut over into
a dainty full waist, and then Mrs. Blossom added her surprise--a
delicate blue taffeta skirt to match the waist. Rose went into
raptures over it, and sought the best bedroom regularly three times
a day to feast her girl's eyes on the silken loveliness as it lay in state
on the best bed. A new dark blue serge was to do duty for a street
suit, with a plain felt hat. For best, there was a turban made of dark
blue velvet to match the wash silk.
"And four pairs of gloves! Martie Blossom, you are an angel, to
give me these that Hazel gave you a year ago last Christmas. Have
you been keeping them for me all this time?"
Mrs. Blossom smiled assent, and was rewarded by a squeeze
that interfered decidedly with her breathing apparatus.
The night before she left, Rose "costumed" for the benefit of the
entire family, who were assembled in the long-room, together with
Aunt Tryphosa and Maria-Ann, to see Rose in her finery.
"I 'll make it a climax," said Rose, laughing half-shamefacedly,
as she slipped upstairs to change her street suit, which had brought
forth admiring "Ohs" and "Ahs" from the children, and favorable
criticism from their elders.
Down she came in her white serge; there were nods and smiles
of approval.
Her reappearance in the wash silk and velvet turban was the
signal, on March's part, for a burst of applause, and cries of
admiration from Budd and Cherry.
"Grand transformation scene!" cried March, as Rose tripped
down in the blue taffeta, looking like a very rose herself.
"Beats all!" murmured Chi, who had become nearly speechless
with admiration, "what clothes 'll do for a good-lookin' woman; but
for a ravin', tearin' beauty like our Rose--George Washin'ton! She 'll
open those high-flyers' eyes."
"Cinderella--fifth act!" shouted March as, after a prolonged wait,
he heard Rose on the stairs.
But was it Rose?
The beautiful India mull of her mother's had been transformed
into a ball-dress. She had drawn on her long white gloves and
tucked into the simple, ribbon belt three of Jack's Christmas roses.
Maria-Ann gasped, and that broke the, to Rose, somewhat
embarrassing silence.
Marshalled by March, the whole family formed a procession, and
Rose was reviewed:--back breadths, front breadths, flounces, waist,
gloves; all were thoroughly inspected.
Chi touched the lower flounce of the half-train gingerly with one
work-roughened forefinger, then, straightening himself suddenly,
sighed heavily.
"What's the matter, Chi?" Rose laughed at the dubious
expression on his face.
"You ain't Rose Blossom nor Molly Stark any longer. You 're just
a regular Empress of Rooshy, 'n' you don't look like that girl I took
along to sell berries down to Barton's last summer, 'n' I wish you--"
he hesitated.
"What, Chi?" said Rose.
"I wish you was back again, old sunbonnet, old calico gown,
patched shoes 'n' all--"
"Oh, Chi, no, you don't," said Rose, laughing merrily; "you
forget, I shall probably see Miss Seaton down there in New York,
and you wouldn't want me to appear a second time before her in
that old rig."
"You 're right, Rose-pose," replied Chi, his expression
brightening visibly. He drew close to her and whispered audibly:
"Just sail right in, Molly Stark, 'n' cut that sassy girl out right 'n'
left. She never could hold a candle to you."
"Sh-sh, Chi!" said Mrs. Blossom, meaningly, but with a twinkle in
her eye.
"I mean just what I say, Mis' Blossom. Folks can't come up here
on this Mountain to sass us to our faces, 'n' she did;--I've stayed
riled ever since, 'n' I hope she'll get sassed back in a way that 'll
make her hair stand just a little more on end than it did, when she
gave that mean, snickerin' giggle--"
"Chi, Chi," Mrs. Blossom interrupted him in an appeasing tone.
"You need n't Chi me, Mis' Blossom. These children are just as
near to me as if they was my own, 'n' when they 're sassed, I 'm
sassed too; 'n' my great-grandfather fought over at Ticonderogy, 'n' I
ain't bound to take any more sass than he took--"
By this time the whole family were in fits of laughter over Chi's
persistent use of so much "sass," and, at last, Chi himself joined in
the laugh at his excessive heat:--
"Over nothin' but a wind-bag, after all," he concluded.
On the following morning, Mr. Blossom, Chi, March and Budd
drove down to Barton's to see Rose off. The old apple-green pung
had been fitted with two broad boards for seats, and covered with
buffalo robes and horse blankets. There was just room in the tail for
Rose's old-fashioned trunk and a small strapped box, which held two
dozen of new-laid eggs, six small, round cheeses, and a wreath of
ground hemlock and bitter-sweet--a neighborly gift from Aunt
Tryphosa and Maria-Ann to Hazel and Mr. Clyde.
As the train moved away from the station, Chi watched it with
brimming eyes.
"She'll never come back the same Rose-pose, livin' among all
those high-flyers--never," he muttered to himself; but aloud he
remarked, with forced cheerfulness, turning to Mr. Blossom while he
dashed the blinding drops from his eyes with the back of his hand:
"Looks mighty like a thaw, Ben; kind of wets down, don't it?"
"Yes, Chi," said Mr. Blossom, busy with conquering his own
heartache, "we 'd better be getting on home;" and the masculine
contingent of the Blossom household climbed into the pung and took
their way homeward in silence.
But what a reception that was for the transplanted Rose!
Mr. Clyde met her at the Grand Central Station, and Rose felt
how welcome she was just by the hand-clasp, and his first words:
"We have you at last, Rose; I would n't let Hazel come because
I thought the train might be late, and there's a cold rain falling.
Martin, take this box--"
"Oh, no; I must carry that myself," laughed Rose, looking up at
the liveried footman with something like awe. "I promised Aunt
Tryphosa and Maria-Ann I would n't let any one take them till they
were safe in the house; thank you," she bowed courteously to
Martin, who confided to the coachman so soon as they were on the
box: "Hi 'ave n't seen nothink so 'ansome since Hi 've bean in the
States."
As the brougham whirled into the Avenue, and the electric lights
shone full into the carriage, Rose could see the luxuriously
upholstered interior, and a sudden thought of the old apple-green
pung and the buffalo robes dimmed her eyes. But it was only for a
moment; Mr. Clyde was telling her of Hazel's impatience, and how
the coachman had had special orders from her to hurry up so soon
as he should be on the Avenue, and he had hardly finished before
the coachman drew rein, slackening his rapid pace as he turned a
corner, Martin was opening the door, and Hazel's voice was calling
from a wide house entrance flooded with soft light:
"Oh, Rose, my Rose! Is it really you, at last?"
"And this, I am sure, is Wilkins," said Rose, when finally Hazel
set her arms free. "We 've heard so much of you, that I feel as if I
had known you a long time." Rose held out her hand with such
sincere cordiality that Wilkins' speech was suddenly reduced to
pantomime, and he could only extend his other hand rather
helplessly towards the box that Rose still carried. But Rose refused
to yield it up.
"Here, Hazel, I promised Maria-Ann and Aunt Tryphosa I would
n't give it into any hands but yours. Oh! be careful--they 're eggs!"
"Eggs!" repeated Hazel, laughing. "Here, Wilkins, unstrap it for
me, quick--Oh, papa, look!" She held out the box to Mr. Clyde, and,
somehow, John Curtis Clyde for a moment thought with Chi, that
there was going to be a "thaw." Each egg was rolled in white cotton
batting and wrapped in pink tissue paper. The six little cheeses were
enclosed in tin-foil, and cheeses and eggs were embedded in the
Christmas wreath. On a piece of pasteboard was written in unsteady
characters:
To Mr. John Curtis Clyde of New York City, with the season's
compliments.
MOUNT HUNGER, VERMONT, January 6th, 1898.
"And you 've had such lovely flowers come for you, five boxes of
them, Rose, and piles of invitations. I 'm sure you 're engaged up to
Ash Wednesday."
"Come, Chatterbox," said her father, smiling at her volubility,
"Rose has just time to dress for dinner; you know Aunt Carrie and
Uncle Jo are coming to-night."
"Oh, I forgot all about them; you 'll have to hurry, Rose. Wilkins,
bring up the flowers. Come on," Hazel ran up the broad flight of
stairs, carpeted with velvety crimson, to the first landing, from
which, through a lofty arch in the hall, Rose caught a glimpse of
softly lighted rooms, the walls enriched with engravings and
etchings, with here and there a landscape or marine in watercolors.
Rose drew a long breath. This, then, was what Chi meant when he
said "Hazel was rich as Croesus."
"But, Hazel, my trunk has n't come," said Rose, as she followed
her hostess into the spacious bedroom, which was separated from
Hazel's only by a dressing-room.
"It 'll be here in a few minutes; papa has a special man, who
always delivers them almost as soon as we get here."
Sure enough, the trunk came in time; and Rose, as she
unpacked, finding evidences of the loving mother-care in every fold,
cried within her heart, looking about at the exquisite appointments
of her room and dressing-room:
"Martie, Martie, what would all this be without you!--Oh, I know
now, what dear old Chi meant when he said Hazel was poor where
we are rich--only a housekeeper to see to all Hazel's things--"
"Rose, what flowers are you going to wear?" called Hazel from
her room.
"I have n't had time to look," Rose called back, surveying her
white serge with great satisfaction in the pier-glass.
"Do look, then, and see who they 're from."
"Oh, Hazel, do come and see. How kind everybody has been!
Here are cards from Mrs. Heath and Doctor Heath, and your Aunt
Carrie, and Mr. Sherrill, and Mrs. Fenlick, and even that Mr. Grayson
who was up at our house to tea a year ago!"
"They are lovely. Whose are you going to wear?"
"I 'll make up a bunch of one or two from each, that will show
my appreciation of all their favors."
Hazel looked slightly crestfallen. "I hoped you 'd wear Jack's--
they 're the loveliest with white--" she lifted the white lilacs--"and
they 're so rare just now. I heard Aunt Carrie say that one of the
girls had put off her wedding for six weeks, just because she
couldn't have white lilacs for it."
"They 'll last with care three days surely, and I can wear them
to-morrow evening," replied Rose, bending to inhale their delicate
fragrance.
"So you can, for papa is going to give a dinner for you to-
morrow night, and afterwards, he has promised to take you to a
dance at Mrs. Pearsell's. I can't go, you know, for I 'm not grown up;
but you can tell me all about it. We 're going to have lots of fun this
week, for school does not begin for several days. Come."
Together they went down to the drawing-room, and Wilkins
announced that dinner was served.
After it was over he sought Minna-Lu in her own domains, and
gave vent to his long pent emotions.
"Minna-Lu," he whispered, mysteriously, "dere 's an out an' out
angel ben hubberin' 'bout de table--"
"Fo' de Lawd!" Minna-Lu turned upon him fiercely, for she was
superstitious to the very marrow. "Wa' fo' yo' come hyar, skeerin' de
bref out a mah bones wif yo' sp'r'ts! Yo' go long home wha' yo'
b'long."
But Wilkins was not to be repulsed in this manner. "Nebber see
sech ha'r, an' jes' lillum-white--"
"Oh, go 'long! Lillum-white ha'r," interrupted Minna-Lu, with
scathing sarcasm. "Huccome yo' know de angels hab lillum-white
ha'r?"
"Huccome I know?--'Case I see de shine, jes' lake yo' see in de
dror'n-room."
"De shine ob lillum-white ha'r in de dror'n-room! 'Pears lake yo'
head struck ile--"
"Yo' hol' yo' tongue, Minna-Lu," retorted Wilkins, irritated at the
continued evidence of disbelief on the part of his coadjutor. "Jes' yo'
hide back ob de dumb-waitah to-morrah ebenin' when de dessert
comes on, an' see fo' yo'se'f!" He departed in high dudgeon, and
Minna-Lu gurgled long and low to herself, but, in her turn, was
interrupted by the sound of tripping steps on the basement flight.
Minna-Lu hastily put her fat hands up to her turban to see if it
were on straight, and smoothed her apron, muttering:
"Clar to goodness, ef it ain't jes' mah luck to hab little Missus
come into dis yere hen-roost?" she rapidly surveyed her immaculate
kitchen with anxious eye.
"Minna-Lu, this is my friend, Miss Rose; the one who did up
those lovely preserves, and here are some new-laid eggs and some
cheeses that Miss Maria-Ann Simmons--you know I told you all about
her and the hens--has sent papa."
Minna-Lu gazed at Rose in open admiration. The faithful colored
retainer had her thorny side and her blossom one.
Rose put out her hand, and Minna-Lu took it in both hers. "I 'se
mighty glad yo' come, Miss Rose, dere ain't no strawberry-blossom
nor no rose-blossom can hol' a can'le to yo' own honey se'f. Dese
yere cheeses is prime." She examined one with the nose of a
connoisseur. "Jes' fill de bill wif de salad-chips to-morrah." She
stemmed her fists on her hips, and her mellow, contented gurgle
caused Rose and Hazel to laugh, too.
"What is it, Minna-Lu?" said Hazel, reading the signs of the
times.
"Dat Wilkins done tol' me to git back ob de dumb-waitah, to-
morrah ebenin' to see Missy Rose, but I 'se gwine to ask rale straight
to jes' see her 'fo' de comp'ny come."
"Of course you may. Come up to my room about seven, and we
'll be ready."
"Fo' sho'," said Minna-Lu, with beaming face.
"Good-night," said Rose, beaming, too, for she found the black
faces and ways irresistibly amusing.
"De Lawd bress yo' lily face, Missy Rose."
When the two girls were alone, at last, in Hazel's room, there
was no thought of bed for an hour. There were numberless
questions on Hazel's part concerning all the dear Mount Hunger
people, and speechless astonishment on Rose's at the number of
invitations that were waiting for her. They chatted all the time they
were undressing, calling back and forth to each other as one thing
or another suggested itself. Finally, Hazel made her appearance in
Rose's room. She went up to her, put her arms about her neck, and,
looking up with eyes full of loving trust, said:
"Rose-pose, won't you come into my room and say 'Our Father'
with me as Mother Blossom used to do on Mount Hunger? You can't
think how I miss it."
"Why, Hazel darling, of course I will--then I shan't feel homesick
missing that precious Martie."
She followed Hazel into her room, and after she was in bed,
Rose knelt by her side, and together they said, "Our Father." Then
Rose bent over to receive Hazel's loving kiss and her whispered, "Oh,
Rose, I 'm so happy to have you here," and whispered back, "And I
'm so happy to be with you, Hazel--good-night."
"Good-night."
Rose went back to her room. At last she was alone. She drew
one of the easy-chairs up before the wood-fire that was dying down,
put her bare feet on the warm fender, and, for a while, dreamed
waking dreams. It was all so strange. The cathedral clock on the
mantel chimed twelve. They were all asleep in the farmhouse on the
Mountain--it was time for her to be. She rose, tiptoed softly into the
dressing-room, took from the bowl the spray of white lilacs she had
worn with the other flowers that evening, shook off the water, and
drew the stem through a buttonhole in the yoke of her simple night-
dress. She tiptoed back again into her room, looked up at the dainty,
canopied bed, then laid herself down within it, and, almost
immediately, fell asleep--with her hand resting on the white
fragrance that lay upon her heart.
XXIII
BEHOLD HOW GREAT A MATTER A LITTLE FIRE KINDLETH
It was so delightful! The weeks were passing all too quickly, and the
letters to Mount Hunger waxed eloquent in praise of everybody's
kindness.
Jack had come on to lead a cotillion with Rose at Aunt Carrie's.
It was a weighty affair--the selecting of the flowers for her. White
violets they must be, and white violets were about as rare as white
raspberries. Jack gave the florist his own address.
"I 'll see them, myself, before I send them up; for I won't trust
anyone's eyes but my own," he said to himself as he hurried home
to dress for dinner with a friend. "I wish I had n't promised Grayson
to meet him at the Club before seven. I 'm afraid they won't come in
time." He looked at his watch. "I 'm going to make them a test--and
see what she 'll do. She 's so friendly and frank and all that, I can't
find out even whether she 's beginning to care."
Jack's absorption in the theme was such that he put his latch-
key in wrong-side up, and, in consequence, wrestled with the lock till
he had worked himself into a fever of impatience; finally he touched
the button before he discovered the trouble.
"Any packages come for me, Jason?" he inquired of the butler,
whose dignified manner of locomotion had been rudely shaken by
Jack's unceasing pressure on the electric-bell.
"Yes, Mr. John. Just taken a box up to the rooms."
Jack looked relieved, and sprang upstairs two steps at a time.
He opened the box. There they were in all their exquisite freshness.
"Like her," he thought, touching his lips to them; then, suddenly
straightening himself, he felt the blood surge into his face.
"I like Dord's way of putting up his flowers, no tags, nor fol-de-
rols. Jason," he said, as he ran down stairs again, "I shall be back in
an hour; tell Thomas to have everything laid out--I 'm in a hurry.
And have a messenger-boy here when I come back, and don't forget
to order the carriage for quarter of eight, sharp."
"Yes, Mr. John."
"Messenger-boy come?" he inquired as Jason opened the door
on his return.
"Yes, sir, waiting in the hall."
Jack raced up stairs. There was the precious box on his
dressing-table. He hastily took a visiting card, and, writing on it the
sentiment that was uppermost in his heart, slipped it into the
envelope, gave it, together with the box, to the waiting boy, and
bade him hand it to the man, Wilkins, with the request that it be
sent up at once to the lady to whom it was addressed. Then he
made ready for dinner.
An hour later, Rose was dressing for the dance, and Hazel was
watching her, chatting volubly all the while.
"That's the loveliest dress, Rose, I heard Aunt Carrie say, you
couldn't buy such, nowadays."
"It was Martie's wedding-dress. An uncle of her mother's, who
was a sea-captain, brought it from India. But if I wear it many more
times, it will be known throughout the length of New York. This is
my sixth time."
"I should n't care if it were the hundredth; it's just lovely.
Besides, Jack has n't seen it, you know."
Rose laughed. "Oh, yes, he has--on Martie; that night of the tea
on the porch."
"Oh, well, that's different. What flowers are you going to wear?"
"I thought I wouldn't wear any, just for a change." Rose's face
was veiled by the shining hair, which she was brushing, preparatory
to coiling it high on her head; otherwise, Hazel would have seen the
clear flush that warmed even the roots of the soft waves at the nape
of her neck. Just then there was a knock. The maid opened the door,
and Wilkins' voice was distinctly audible:--
"Jes' come fo' Miss Rose; dey wuz to come up right smart, so de
boy say."
"Oh, more flowers. Who from?" cried Hazel, eagerly, while
Wilkins strained his ears to catch the reply.
"From Mr. Sherrill," said Rose, opening the little envelope.
What she read on the card caused the blood to mount higher
and higher, till temples and forehead flushed pink, then as suddenly
to recede.
"May I open them, Rose, and won't you wear some if they 're
from Jack?"
"Yes," said Rose, simply. The two girls leaned over the box as
Hazel took off the wrapper--then the cover--then the inner tissue
papers--then--
[Illustration: "The two girls leaned over the box as Hazel took off the wrapper"]
Suddenly a shriek of laughter, followed by another, penetrated
to Wilkins, who was lingering on the stairs; he came softly back
again. Peal after peal of wild merriment issued from Rose's room.
Within, Rose in her petticoat and bodice had flung herself on the bed
in an ecstasy of mirth, and Hazel was rolling over on the rug as was
the wont of Budd and Cherry in the old days on Mount Hunger. The
maid looked from one to the other, and, no longer able to keep from
joining in the merriment, although she did not know the cause, left
the room, only to find Wilkins with perturbed face just outside the
door.
"'Pears lake dere wor sumfin' queah 'bout dat yere box--" he
began; but the maid only shook with laughter and laid her finger on
her lips, motioning him into the back hall.
"Did you ever?" cried Hazel, when she recovered her breath.
"No, I never," said Rose, wiping away the tears, for she had
laughed till she cried. "Let's take another look."
They bent over the box, and took out its contents; then went off
again into fits of seemingly inextinguishable laughter; for, neatly
folded beneath the tissue paper, lay four sets of Jack's new light-
weight, white silk pajamas, which he had purchased that afternoon,
in order to take back to Cambridge with him. On the card, which
Rose still held in her hand, was written, "Wear these for my sake."
"What will you say to him, Rose?" said Hazel, sitting up on the
rug with her hands clasped about her knees.
"I don't know," said Rose, proceeding to dress. "I can't wear
them, that's certain." And again the absurdity of the situation
presented itself to her. "And I can't apologize for not wearing them.
Neither can I take it for granted that he was going to send me
flowers, and explain that he sent me these instead."
"How awfully careless," said Hazel, interrupting her; "he must
have had something on his mind not to take the pains to look,
even."
Rose flushed. "It will be best to let the matter drop, and say
nothing about it," she replied in a cool, toploftical tone that amazed,
as well as mystified, her little hostess.
"Why, Rose, I think Jack ought to know about it. I 'll tell him, if
you don't want to."
"Thank you, Hazel, but I don't need your good offices in this
matter."
Hazel rose from the rug, and going over to Rose, laid both
hands on her shoulders and looked straight up into her eyes.
"Now, Rose Blossom, please don't speak to me in that way. You
're so queer! First you 're nice about Jack, and then you 're horrid;
and when you 're that way, you are n't nice to me a bit--and I don't
like it, and I don't blame Jack for not liking it either," she added
emphatically. "I remember papa said a year ago that Jack was 'all
heart' for a good many girls, old and young--but I can tell you what,
he won't have any for you, if you whiff round so."
Hazel in her earnestness gave Rose a little shake. Rose smiled,
and, bending her head, kissed her, saying, "F. and F. and you know,
Hazel."
"Oh, I know all about 'forgiving and forgetting,' but I don't like it
just the same. He's my cousin and the dearest fellow in the world,
and I don't like to have him treated so."
"How about his treating me?" said Rose, pointing to the
innocent box of underwear, "forgetting even to look; or not caring
enough, to see if I had the right package?"
"Oh, that's different--perhaps the florist made a mistake."
"The florist!" Rose laughed merrily. "I never knew that
gentlemen's underwear and roses grew on the same bush.--There 's
Wilkins, and I 'm not ready."
"De coachman say it's a pow'ful col' night, an' Miss Rose bettah
take some mo' wraps."
"Thank you, Wilkins," Hazel flew into the dressing-room for a
long fur cloak of her mother's which she had used to wear to the
dancing-classes. She wrapped it about Rose, who stooped suddenly
and kissed her again, whispering, "Hazel, you 've all spoiled me,
that's what's the matter,--but I 'll be good to Jack, for your sake as
well as for my own."
"Now you 're what Doctor Heath calls papa, the most splendid
fellow in the world. There now--I won't crush your gown--" A kiss--
"Good-night. You look like an angel!"
Mr. Clyde thought so, too, as he watched her coming
downstairs. She slipped off the cloak as she stood beneath the soft,
but brilliant hall lights. "Do I look all right?" she asked earnestly, for
she had fallen into the habit, before going anywhere with him or
Hazel, of asking for their criticism.
"I should say so--but where are the flowers? I miss them."
"I thought I wouldn't wear any to-night, just for a change."
"A woman's whim, Rose. But I can't say that you need them--
Now, what's to pay?" he said to himself, as he helped her into the
carriage. "I saw Jack at Dord's this afternoon, and, evidently,
something was in the wind. I hope it has n't been taken out of his
sails."
"Sumfin' mighty queah 'bout dat yere box," murmured Wilkins
to himself, as he closed the door, "but Miss Rose doan' need no
flow's. Nebber see sech h--Fo' de good Lawd! Wha' fo' yo' hyar? Yo'
Minna-Lu,--skeerin' mah day-lights out o' mah, shoolin' 'roun' b'hin'
dat por' chair,--jes' lake bug'lahs."
Minna-Lu gurgled. "Yo' jes' straight, Wilkins; nebber see sech
ha'r. Huccome I 'se hyar? Jes' to see dat lillum-white angel--"
"Yo' go 'long, wha' yo' b'long," growled Wilkins, not yet having
recovered from his fright. And Minna-Lu went, with the radiant vision
still before her round, black eyes.
Jack felt a queer tightening about his lower jaw, and one heart-
throb, apparently in his throat, as he entered Aunt Carrie's
reception-room. Then, as with one glance he swept Rose from the
crown of her head to the hem of her dress, a hot, rushing wave of
indignant feeling mastered him--he knew he had staked his all (so a
man at twenty-two is apt to think) and lost. He braced himself,
mentally and physically. He was n't going to show the white-feather--
not he.
But Rose--Rose was mystifying, captivating, cordial, merry, and
altogether charming. She knocked out all Jack's calculations as to
life, love, women, girls in general, and one girl in particular, at one
fell swoop. He was brought, necessarily, into unstable equilibrium, so
far as his feelings were concerned--his head he was obliged to keep
level on account of the various figures. Several other heads were
variously askew, and would have been turned, likewise, for good and
all, had the wearer of her mother's India-mull wedding-dress been
possessed of a fortune.
Rose developed social powers that evening that furnished food
for conversation for Aunt Carrie and Mr. Clyde, who watched her
with pride and pleasure. She was evidently enjoying herself
thoroughly, and her enjoyment proved contagious.
"After all," said Jack as, between figures, he found opportunity
for a whispered word or two; "this is n't half so fine a dance as the
one in the barn, last September."
"Why, that's just what I was thinking, myself, that very minute!"
"You were?"
"Yes."
The brown eyes and the blue ones met with such evidence of a
perfect understanding, that Jack failed to see Maude Seaton, who
had approached him for the purpose of taking him out in the four-in-
hand.
"Oh, I beg your pardon," said Jack, starting to his feet, "it's the
'four-in-hand.'"
"Yes, and I think you 'll have to be put into the traces again,"
she said, with a meaning smile.
"Not I," retorted Jack, merrily, "I kicked over them nearly a year
ago."
"So I heard," replied Miss Seaton, sweetly; and Jack wondered
what she meant.
When Jack found himself again beside Rose, he decided that,
flowers or no flowers, he would ask for an explanation. But his first
attempt was met with such a bewilderingly merry smile, and such
confident assurance that explanations were not in order, that it
proved a successful failure.
When, at last, in the early morning hours he was seated before
the open fire in his bedroom, pulling away reflectively at his pipe, he
had time to think it over. He came to the conclusion that it was trivial
in him to have staked his all on her wearing those flowers, for she
Programmable Digital Signal Processors Vol 13 Architecture Programming And Applications Yu Hen Hu

  • 5. Programmable Digital Signal Processors: Architecture, Programming, and Applications, edited by Yu Hen Hu, University of Wisconsin–Madison, Madison, Wisconsin. Marcel Dekker, Inc., New York • Basel. Copyright © 2001 by Marcel Dekker, Inc. All Rights Reserved.
  • 6. ISBN: 0-8247-0647-1. This book is printed on acid-free paper.
    Headquarters: Marcel Dekker, Inc., 270 Madison Avenue, New York, NY 10016; tel: 212-696-9000; fax: 212-685-4540
    Eastern Hemisphere Distribution: Marcel Dekker AG, Hutgasse 4, Postfach 812, CH-4001 Basel, Switzerland; tel: 41-61-261-8482; fax: 41-61-261-8896
    World Wide Web: http://www.dekker.com
    The publisher offers discounts on this book when ordered in bulk quantities. For more information, write to Special Sales/Professional Marketing at the headquarters address above.
    Copyright © 2002 by Marcel Dekker, Inc. All Rights Reserved.
    Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without permission in writing from the publisher.
    Current printing (last digit): 10 9 8 7 6 5 4 3 2 1
    PRINTED IN THE UNITED STATES OF AMERICA
  • 7. Series Introduction
    Over the past 50 years, digital signal processing has evolved as a major engineering discipline. The fields of signal processing have grown from the origin of fast Fourier transform and digital filter design to statistical spectral analysis and array processing, image, audio, and multimedia processing, and shaped developments in high-performance VLSI signal processor design. Indeed, there are few fields that enjoy so many applications; signal processing is everywhere in our lives.
    When one uses a cellular phone, the voice is compressed, coded, and modulated using signal processing techniques. As a cruise missile winds along hillsides searching for the target, the signal processor is busy processing the images taken along the way. When we are watching a movie in HDTV, millions of audio and video data are being sent to our homes and received with unbelievable fidelity. When scientists compare DNA samples, fast pattern recognition techniques are being used. On and on, one can see the impact of signal processing in almost every engineering and scientific discipline.
    Because of the immense importance of signal processing and the fast-growing demands of business and industry, this series on signal processing serves to report up-to-date developments and advances in the field. The topics of interest include but are not limited to the following:
    · Signal theory and analysis
    · Statistical signal processing
    · Speech and audio processing
    · Image and video processing
    · Multimedia signal processing and technology
    · Signal processing for communications
    · Signal processing architectures and VLSI design
  • 8. Signal Processing and Communications
    Editorial Board
    Maurice G. Bellanger, Conservatoire National des Arts et Métiers (CNAM), Paris
    Ezio Biglieri, Politecnico di Torino, Italy
    Sadaoki Furui, Tokyo Institute of Technology
    Yih-Fang Huang, University of Notre Dame
    Nikhil Jayant, Georgia Tech University
    Aggelos K. Katsaggelos, Northwestern University
    Mos Kaveh, University of Minnesota
    P. K. Raja Rajasekaran, Texas Instruments
    John Aasted Sorenson, IT University of Copenhagen
    1. Digital Signal Processing for Multimedia Systems, edited by Keshab K. Parhi and Takao Nishitani
    2. Multimedia Systems, Standards, and Networks, edited by Atul Puri and Tsuhan Chen
    3. Embedded Multiprocessors: Scheduling and Synchronization, Sundararajan Sriram and Shuvra S. Bhattacharyya
    4. Signal Processing for Intelligent Sensor Systems, David C. Swanson
    5. Compressed Video over Networks, edited by Ming-Ting Sun and Amy R. Reibman
    6. Modulated Coding for Intersymbol Interference Channels, Xiang-Gen Xia
    7. Digital Speech Processing, Synthesis, and Recognition: Second Edition, Revised and Expanded, Sadaoki Furui
    8. Modern Digital Halftoning, Daniel L. Lau and Gonzalo R. Arce
    9. Blind Equalization and Identification, Zhi Ding and Ye (Geoffrey) Li
    10. Video Coding for Wireless Communication Systems, King N. Ngan, Chi W. Yap, and Keng T. Tan
    11. Adaptive Digital Filters: Second Edition, Revised and Expanded, Maurice G. Bellanger
    12. Design of Digital Video Coding Systems, Jie Chen, Ut-Va Koc, and K. J. Ray Liu
  • 9. 13. Programmable Digital Signal Processors: Architecture, Programming, and Applications, edited by Yu Hen Hu
    14. Pattern Recognition and Image Preprocessing: Second Edition, Revised and Expanded, Sing-Tze Bow
    15. Signal Processing for Magnetic Resonance Imaging and Spectroscopy, edited by Hong Yan
    16. Satellite Communication Engineering, Michael O. Kolawole
    Additional Volumes in Preparation
  • 10. Preface
    Since their inception in the late 1970s, programmable digital signal processors (PDSPs) have gradually expanded into applications such as multimedia signal processing, communications, and industrial control. PDSPs have always played a dual role: on the one hand, they are programmable microprocessors; on the other hand, they are designed specifically for digital signal processing (DSP) applications. Hence they often contain special instructions and special architecture supports so as to execute computation-intensive DSP algorithms more efficiently. This book addresses various programming issues of PDSPs and features the contributions of some of the leading experts in the field.
    In Chapter 1, Kittitornkun and Hu offer an overview of the various aspects of PDSPs. Chapter 2, by Managuli and Kim, gives a comprehensive discussion of programming methods for very-long-instruction-word (VLIW) PDSP architectures; in particular, they focus on mapping DSP algorithms to best match the underlying VLIW architectures. In Chapter 3, Lee and Fiskiran describe native signal processing (a technique to enhance the performance of multimedia signal processing by general-purpose microprocessors) and compare various formats for multimedia extension (MMX) instruction. Chapter 4, by Tessier and Burleson, presents a survey of academic research and commercial development in reconfigurable computing for DSP systems over the past 15 years.
    The next three chapters focus on issues in software development. In Chapter 5, Wu and Wolf examine the pros and cons of various options for implementing video signal processing applications. Chapter 6, by Yu and Hu, details a methodology for optimal compiler linear code generation. In Chapter 7, Chen et al. offer
  • 11. practical advice on proper design of multimedia algorithms using MMX instruction sets.
    Chapter 8, by Bhattacharyya, addresses the relationship between hardware synthesis and software design, focusing particularly on automated mapping of high-level specifications of DSP applications onto programmable DSPs. In Chapter 9, Catthoor et al. discuss critical, yet often overlooked, issues of storage system architecture and memory management.
    I would like to express my appreciation to the authors of each chapter for their dedication to this project and for their outstanding scholarly work. Thanks also go to chapter reviewers James C. Abel, Jack Jean, Konstantinos Konstantinides, Grant Martin, Miodrag Potkonjak, and Frederic Rousseau. Throughout this project, B. J. Clark, acquisitions editor, and Ray K. J. Liu, series editor, have provided strong encouragement and assistance. I thank them for their support and trust. I would also like to express my gratitude to Michael Deters, production editor, for his cooperation and patience.
    Yu Hen Hu
Contents

Series Introduction
Preface
Contributors

1. Programmable Digital Signal Processors: A Survey
   Surin Kittitornkun and Yu Hen Hu
2. VLIW Processor Architectures and Algorithm Mappings for DSP Applications
   Ravi A. Managuli and Yongmin Kim
3. Multimedia Instructions in Microprocessors for Native Signal Processing
   Ruby B. Lee and A. Murat Fiskiran
4. Reconfigurable Computing and Digital Signal Processing: Past, Present, and Future
   Russell Tessier and Wayne Burleson
5. Parallel Architectures for Programmable Video Signal Processing
   Zhao Wu and Wayne Wolf
6. OASIS: An Optimized Code Generation Approach for Complex Instruction Set PDSPs
   Jim K. H. Yu and Yu Hen Hu
7. Digital Signal Processing on MMX Technology
   Yen-Kuang Chen, Nicholas Yu, and Birju Shah
8. Hardware/Software Cosynthesis of DSP Systems
   Shuvra S. Bhattacharyya
9. Data Transfer and Storage Architecture Issues and Exploration in Multimedia Processors
   Francky Catthoor, Koen Danckaert, Chidamber Kulkarni, and Thierry Omnès
Contributors

Shuvra S. Bhattacharyya, Department of Electrical and Computer Engineering and Institute for Advanced Computer Studies, University of Maryland at College Park, College Park, Maryland
Wayne Burleson, Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, Massachusetts
Francky Catthoor, Design Technology for Integrated Information, IMEC, Leuven, Belgium
Yen-Kuang Chen, Microprocessor Research Laboratories, Intel Corporation, Santa Clara, California
Koen Danckaert, Design Technology for Integrated Information, IMEC, Leuven, Belgium
A. Murat Fiskiran, Department of Electrical Engineering, Princeton University, Princeton, New Jersey
Yu Hen Hu, Department of Electrical and Computer Engineering, University of Wisconsin–Madison, Madison, Wisconsin
Yongmin Kim, Department of Bioengineering, University of Washington, Seattle, Washington
Surin Kittitornkun, Department of Electrical and Computer Engineering, University of Wisconsin–Madison, Madison, Wisconsin
Chidamber Kulkarni, Design Technology for Integrated Information, IMEC, Leuven, Belgium
Ruby B. Lee, Department of Electrical Engineering, Princeton University, Princeton, New Jersey
Ravi A. Managuli, Department of Bioengineering, University of Washington, Seattle, Washington
Thierry Omnès, Design Technology for Integrated Information, IMEC, Leuven, Belgium
Birju Shah, Microprocessor Research Laboratories, Intel Corporation, Santa Clara, California
Russell Tessier, Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, Massachusetts
Wayne Wolf, Department of Electrical Engineering, Princeton University, Princeton, New Jersey
Zhao Wu, Department of Electrical Engineering, Princeton University, Princeton, New Jersey
Jim K. H. Yu*, Department of Electrical and Computer Engineering, University of Wisconsin–Madison, Madison, Wisconsin
Nicholas Yu, Microprocessor Research Laboratories, Intel Corporation, Santa Clara, California

* Current affiliation: Tivoli Systems, Austin, Texas
1
Programmable Digital Signal Processors: A Survey

Surin Kittitornkun and Yu Hen Hu
University of Wisconsin–Madison, Madison, Wisconsin

1 INTRODUCTION

Programmable digital signal processors (PDSPs) are general-purpose microprocessors designed specifically for digital signal processing (DSP) applications. They contain special instructions and special architectural support so as to execute computation-intensive DSP algorithms more efficiently.

Programmable digital signal processors are designed mainly for embedded DSP applications. As such, the user may never realize that an information appliance contains a PDSP. Important applications of PDSPs include modems, hard drive controllers, cellular phone data pumps, set-top boxes, and so forth.

PDSPs fall between the general-purpose microprocessor and the custom-designed, dedicated chip set. The former has the advantage of ease of programming and development. However, it often suffers from disappointing performance for DSP applications due to overheads incurred in both the architecture and the instruction set. Dedicated chip sets, on the other hand, lack the flexibility of programming. In addition, developing a dedicated chip may delay time to market longer than coding a program for a programmable device.

1.1 A Brief Historical Scan of PDSP Development

1.1.1 The 1980s to the 1990s

A number of PDSPs appeared in the commercial market in the early 1980s. Around 1980, Intel introduced the Intel2920, featuring on-chip A/D (analog-to-digital) and D/A (digital-to-analog) converters. Nonetheless, it had no hardware multiplier, and it was difficult to load program parameters into the chip because it lacked a digital interface. At almost the same time, NEC introduced the NEC MPD7720. It is equipped with a hardware multiplier and is among the first to adopt the Harvard architecture with physically separate on-chip data memory and program memory. Texas Instruments introduced the TMS320C10 in 1982. Similar to the MPD7720, the 'C10 adopts the Harvard architecture and has a hardware multiplier. Furthermore, the 'C10 is the first PDSP that can execute instructions from off-chip program memory without performance penalty due to off-chip memory input/output (I/O). This feature brought PDSPs closer to the microprocessor/microcontroller programming model. In addition, the emphasis on development tools and libraries by Texas Instruments led to widespread applications of PDSPs. The architectural features of several representative examples of these early PDSP chips are summarized in Table 1.

In these early PDSPs, DSP-specific instructions such as MAC (multiply-and-accumulate), DELAY (delay elements), REPEAT (loop control), and other flow control instructions were devised and included in the instruction set so as to improve both programmability and performance. Moreover, special address generator units with bit-reversal addressing mode support were incorporated to enable efficient execution of the fast Fourier transform (FFT) algorithm. Because of limits on chip area and transistor count, the on-chip data and program memories are quite small in these chips. If the program cannot fit into the on-chip memory, a significant performance penalty is incurred.

Later, floating-point PDSPs, such as Texas Instruments' TMS320C30 and Motorola's DSP96001, appeared in the market.
With fixed-point arithmetic as in early PDSPs, the dynamic range of the intermediate results must be carefully monitored to prevent overflow. Some reports estimated that as much as one-third of the instruction cycles in executing PDSP programs are wasted on checking the overflow condition of intermediate results. A key advantage of a floating-point arithmetic unit is its extensive dynamic range. Later on, some PDSPs also included on-chip DMA (direct memory access) controllers, as well as a dedicated DMA bus that allowed data I/O at the DMA unit to proceed concurrently with signal processing computation in the CPU.

1.1.2 The 1990s to 2000

In this decade, the following trends in PDSPs emerged.

Consolidation of PDSP Market. Unlike the 1980s, in which numerous PDSP architectures were developed, the 1990s are noted for a consolidation of the PDSP market. Only very few PDSPs are now available in the market. Notably, Texas Instruments' TMS320Cxx series captured about 70% of the PDSP
Table 1  Summary of Characteristics of Early PDSPs

                                      On-chip    On-chip    On-chip
Model         Manufacturer    Year    data RAM   data ROM   program RAM  Multiplier
A100          Inmos                   -          -          -            4, 8, 12, 16
ADSP2100      Analog Devices  1986    -          -          -            16 × 16 → 32
DSP16         AT&T                    512 × 16   2K × 16    -            16 × 16 → 32
DSP32         AT&T            1984    1K × 32    512 × 32   -            32 × 32 → 40
DSP32C        AT&T            1988    1K × 32    2K × 32    -            32 × 32 → 40
DSP56000      Motorola        1986    512 × 24   512 × 24   512 × 24     24 × 24 → 56
DSP96001      Motorola        1988    1K × 32    1K × 32    544 × 32     32 × 32 → 96
DSSP-VLSI     NTT             1986    512 × 18   -          4K × 18      (18-bit) 12E6
Intel2920     Intel           1980    40 × 25    -          192 × 24     -
LM32900       National                -          -          -            16 × 16 → 32
MPD7720       NEC             1981    128 × 16   512 × 13   512 × 23     16 × 16 → 31
MSM 6992      OKI             1986    256 × 32   -          1K × 32      (22-bit) 16E6
MSP32         Mitsubishi              256 × 16   -          1K × 16      32 × 16 → 32
MB8764        Fujitsu                 256 × 16   -          1K × 24
NEC77230      NEC             1986    1K × 32    1K × 32    2K × 32      24E8 → 47E8
TS68930       Thomson                 256 × 16   512 × 16   1K × 32      16 × 16 → 32
TMS32010      TI              1982    144 × 16   -          1.5K × 16    16 × 16 → 32
TMS320C25     TI              1986    288 × 16   -          4K × 16      16 × 16 → 32
TMS320C30     TI              1988    2K × 32    -          4K × 32      32 × 32 → 32E8
ZR34161 VSP   Zoran                   128 × 32   1K × 16    -            16-bit vector engine
market toward the end of this decade. Within this family, the traditional TMS320C10/20 series has evolved into the TMS320C50 and has become one of the most popular PDSPs. Within this TMS family, the TMS320C30 was introduced in 1988, and its floating-point arithmetic unit has attracted a number of scientific applications. Other members of this family that were introduced in the 1990s include the TMS320C40, a multiprocessing PDSP, and the TMS320C80, another multiprocessing PDSP designed for multimedia (video) applications. The TMS320C54xx and TMS320C6xx are recent members of this family. Another low-cost PDSP that has emerged as a popular choice is Analog Devices' SHARC processor. These modern PDSP architectures will be surveyed in later sections of this chapter.

DSP Core Architecture. As the feature size of the digital integrated circuit continues to shrink, more and more transistors can be packed into a single chip. As such, it is possible to incorporate peripheral (glue) logic and supporting logic components into the same chip in addition to the PDSP. This leads to the notion of the system on (a) chip (SoC). In designing an SoC, an existing PDSP core is incorporated into the overall system design. This design may be represented as a VHDL (very-high-speed integrated circuit hardware description language)/Verilog core, or in a netlist format. A PDSP that is used in this fashion is known as a processor core or a DSP core. In the 1990s, many existing popular PDSP designs were converted into DSP cores so that designers could build new applications using familiar instruction sets or even existing programs. On the other hand, several new PDSP architectures are being developed and licensed as DSP cores. Examples of these DSP cores, including Carmel, R.E.A.L., StarCore, and V850, will be reviewed in Section 4.

Multimedia PDSPs.
With the development of international multimedia standards such as JPEG image compression (Pennebaker and Mitchell, 1993), MPEG video coding (Mitchell et al., 1997), and MP3 audio, there is an expanding market for low-cost, dedicated multimedia processors. Due to the complexity of these standards, it is difficult to develop a multimedia processor architecture without any programmability. Thus, a family of multimedia-enhanced PDSPs, such as MPACT, TriMedia, TMS320C8x, and DDMP (Terada et al., 1999), has been developed. A key feature of these multimedia PDSPs is that they are equipped with various special multimedia-related function units, for instance, the YUV to RGB (color coordinates) converter, the VLC (variable-length code) entropy encoder/decoder, and the motion estimation unit. In addition, they facilitate direct multimedia signal I/O, bypassing the bottleneck of a slow system bus.

Native Signal Processing with Multimedia Extension Instructions. In native signal processing (NSP), the signal processing tasks are executed in the
general-purpose microprocessor, rather than in a separate coprocessing PDSP. As microprocessor speeds increase, a number of signal processing operations can be performed without additional hardware or dedicated chip sets. In the mid-1990s, Intel introduced the MMX (MultiMedia eXtension) instruction set to the Pentium series microprocessor. Because modern microprocessors have a long internal word length of 32, 64, or even extended 128 bits, several 8-bit or 16-bit multimedia data samples can be packed into a single internal word to facilitate the so-called subword parallelism. By processing several data samples in parallel in a single instruction, better performance can be achieved, especially when processing multimedia streams.

1.1.3 Hardware Programmable Digital Signal Processors: FPGA

An FPGA (field programmable gate array) is a software-configurable hardware device that contains (1) a substantial amount of uncommitted combinational logic; (2) preimplemented flip-flops; and (3) programmable interconnections among the combinational logic, flip-flops, and the chip I/O pins. The downloaded configuration bit stream programs all the functions of the combinational logic, flip-flops, and the interconnections. Although not the most efficient, an FPGA can be used to accelerate DSP applications in several different ways (Knapp, 1995):

1. An FPGA can be used to implement a complete application-specific integrated circuit (ASIC) DSP system. A shortcoming of this approach is that current FPGA technology does not yield the most efficient hardware implementation. However, FPGA implementation has several key advantages: (1) time-to-market is short, (2) upgrade to new architecture is relatively easy, and (3) low-volume production is cost effective.
2. An FPGA can act as a coprocessor to a PDSP to accelerate certain specific DSP functions that cannot be efficiently implemented using conventional architecture.
3. Furthermore, an FPGA can be used as a rapid prototyping system to validate the design of an ASIC and to facilitate efficient, hardware-in-the-loop debugging.

1.2 Common Characteristics of DSP Computation

1.2.1 Real-Time Computation

Programmable digital signal processors are often used to implement real-time applications. For example, in a cellular phone, the speed of speech coding must
match that of normal conversation. A typical real-time signal processing application has three special characteristics:

1. The computation cannot be initiated until the input signal samples are received. Hence, the result cannot be precomputed and stored for later use.
2. Results must be obtained before a prespecified deadline. If the deadline is missed, the quality of service degrades dramatically and the application may even become useless.
3. The program execution often continues for an indefinite duration of time. Hence, the total number of mathematical operations that must be performed per unit time, known as throughput, becomes an important performance indicator.

1.2.2 Data Flow Dominant Computation

Digital signal processor applications involve stream media data types. Thus, instead of supporting complex control flow (e.g., context switch, multithread processing), a PDSP should be designed to streamline data flow manipulation. For example, special hardware must be designed to facilitate efficient input and output of data from PDSP to off-chip memory, to reduce overhead involved in accessing arrays of data in various fashions, and to reduce overhead involved in the execution of multilevel nested DO loops.

1.2.3 Specialized Arithmetic Computation

Digital signal processor applications often require special types of arithmetic operations to make computations more efficient. For example, a convolution operation

$$y(n) = \sum_{k=0}^{K-1} x(k)\, h(n-k)$$

can be realized using a recursion

$$y(n) = 0; \qquad y(n) = y(n) + x(k) * h(n-k), \quad k = 0, 1, 2, \ldots, K-1$$

For each k, a multiplication and an addition (accumulation) are to be performed. This leads to the implementation of the MAC instruction in many modern PDSPs:

$$R4 \leftarrow R1 + R2 * R3$$

Modern PDSPs often contain hardware support for the so-called saturation arithmetic.
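Before turning to saturation arithmetic, the convolution recursion and the MAC operation above can be sketched in plain C. This is only an illustration under stated assumptions: the names `mac` and `conv_sample` are hypothetical, the samples are assumed to be 16-bit integers, a 64-bit integer stands in for the wide accumulator register, and h(m) is taken to be zero outside its stored range (a boundary convention the text does not specify).

```c
#include <stddef.h>
#include <stdint.h>

/* One multiply-and-accumulate step, mirroring R4 <- R1 + R2 * R3.
 * The 64-bit accumulator is an assumption of this sketch. */
static inline int64_t mac(int64_t acc, int16_t a, int16_t b)
{
    return acc + (int64_t)a * (int64_t)b;
}

/* Computes y(n) = sum over k = 0..K-1 of x(k) * h(n - k).
 * h(m) is treated as zero for m outside 0..h_len-1 (assumed convention). */
int64_t conv_sample(const int16_t *x, size_t K,
                    const int16_t *h, size_t h_len, size_t n)
{
    int64_t y = 0;                          /* y(n) = 0 */
    for (size_t k = 0; k < K; ++k) {
        if (k > n || n - k >= h_len)        /* h(n - k) lies outside its support */
            continue;
        y = mac(y, x[k], h[n - k]);         /* y(n) = y(n) + x(k) * h(n - k) */
    }
    return y;
}
```

Each pass through the loop body is one MAC; a PDSP with a single-cycle MAC instruction and a hardware loop would execute this inner loop without the branch and index overhead a general-purpose processor pays.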
In saturation arithmetic, if the result of a computation exceeds the dynamic range, it is clamped to either the maximum or the minimum value; for example, 9 + 9 = 15 (01001₂ + 01001₂ = 01111₂) in 5-bit 2's complement arithmetic. Therefore,
for applications in which saturation arithmetic is applicable, there will be no need to check for overflow during the execution. These special instructions are also implemented in hardware. For example, to implement a saturation addition function using 2's complement arithmetic without intrinsic function support, we have the following C code segment:

    int sadd(int a, int b)
    {
        int result;
        result = a + b;
        if (((a ^ b) & 0x80000000) == 0)
        {
            if ((result ^ a) & 0x80000000)
            {
                result = (a < 0) ? 0x80000000 : 0x7fffffff;
            }
        }
        return (result);
    }

However, with the special _sadd intrinsic function supported by the TMS320C6x (Texas Instruments, 1998c), the same code segment reduces to the single line:

    result = _sadd(a,b);

1.2.4 Execution Control

Many DSP algorithms can be formulated as nested, indefinite DO loops. In order to reduce the overhead incurred in executing multilevel nested loops, a number of special hardware supports are included in PDSPs to streamline the control flow of execution.

1. Zero-overhead hardware loop: A number of PDSPs contain a special REPEAT instruction to support efficient execution of multiple loop nests using dedicated counters to keep track of loop indices.
2. Explicit instruction-level parallelism (ILP): Due to the deterministic data flow of many DSP algorithms, ILP can be exploited at compile time by an optimizing compiler. This led several modern PDSPs to adopt the very long instruction word (VLIW) architecture to efficiently utilize the available ILP.

1.2.5 Low-Power Operation and Embedded System Design

1. The majority of applications of PDSPs are embedded systems, such as a disk drive controller, modem, and cellular phone. Thus, many PDSPs are highly integrated and often contain multiple data I/O function units, timers, and other function units in a single chip package.
2. Power consumption is a key concern in the implementation of embedded systems. Thus, PDSPs are often designed to compromise between the conflicting requirements of high-speed data computation and low power consumption (Borkar, 1999). The specialization of certain key functions allows efficient execution of the desired operations using a high degree of parallelism while holding down the power supply voltage and overall clock frequency to conserve energy.

1.3 Common Features of PDSPs

1.3.1 Harvard Architecture

A key feature of PDSPs is the adoption of a Harvard memory architecture that contains separate program and data memory so as to allow simultaneous instruction fetch and data access. This is different from the conventional Von Neumann architecture, where program and data are stored in the same memory space.

1.3.2 Dedicated Address Generator

The address generator allows rapid access of data with complex data arrangement without interfering with the pipelined execution of the main ALUs (arithmetic and logic units). This is useful for situations such as two-dimensional (2D) digital filtering and motion estimation. Some address generators may include a bit-reversal address calculation to support the efficient implementation of the FFT, and circular buffer addressing for the implementation of infinite impulse response (IIR) digital filters.

1.3.3 High Bandwidth Memory and I/O Controller

To meet the intensive input and output demands of most signal processing applications, several PDSPs have built-in multichannel DMA controllers and dedicated DMA buses to handle data I/O without interfering with CPU operations. To maximize data I/O efficiency, some modern PDSPs even include a dedicated video and audio codec (coder/decoder) as well as a high-speed serial/parallel communication port.

1.3.4 Data Parallelism

A number of important DSP applications exhibit a high degree of data parallelism that can be exploited to accelerate the computation.
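One way to exploit this kind of data parallelism is the subword parallelism introduced in Section 1.1.2, where several narrow samples share one machine word. The sketch below imitates the idea in portable C rather than with real multimedia instructions. It assumes four unsigned 8-bit samples packed into a 32-bit word and wraparound (not saturating) addition; the function names `add4x8` and `add4x8_scalar` are hypothetical and do not correspond to any MMX instruction.

```c
#include <stdint.h>

/* Adds four packed 8-bit lanes at once, modulo 256 in each lane.
 * The top bit of every lane is masked off before the add so that a carry
 * can never cross into the neighbouring lane; it is restored with XOR. */
uint32_t add4x8(uint32_t a, uint32_t b)
{
    const uint32_t LOW7 = 0x7f7f7f7fu;          /* low 7 bits of each lane */
    const uint32_t TOP  = 0x80808080u;          /* top bit of each lane    */
    uint32_t low = (a & LOW7) + (b & LOW7);     /* carries stay inside lanes */
    return low ^ ((a ^ b) & TOP);               /* add the top bits without carry-out */
}

/* Reference version that processes one lane per loop iteration. */
uint32_t add4x8_scalar(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int lane = 0; lane < 4; ++lane) {
        uint32_t sa = (a >> (8 * lane)) & 0xffu;
        uint32_t sb = (b >> (8 * lane)) & 0xffu;
        r |= ((sa + sb) & 0xffu) << (8 * lane);
    }
    return r;
}
```

The packed version does in a handful of word-wide operations what the scalar version does in four loop iterations, which is the gain a hardware packed-add instruction delivers in a single instruction.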
As a result, several parallel processing schemes, such as SIMD (single instruction, multiple data) and MIMD (multiple instruction, multiple data) architectures, have been incorporated in the PDSP. For example, many multimedia-enhanced instruction sets in general-purpose microprocessors (e.g., MMX) employ subword parallelism to speed up the execution. It is basically an SIMD approach. A number of PDSPs also facilitate MIMD implementation by providing multiple interprocessor communication links.

2 APPLICATIONS OF PDSP

In this section, both real-world and prototyping applications of PDSPs are surveyed. These applications are divided into three categories: communication systems, multimedia, and control/data acquisition.

2.1 Communications Systems

Programmable digital signal processors have been applied to implement various communication systems. Examples include caller ID [using TMS320C2xx (Texas Instruments Europe, 1997)], cordless handsets, and many others. For voice communication, an acoustic-echo canceller based on the normalized least mean square (NLMS) algorithm for a hands-free wireless system is reported in Texas Instruments (1997). Implemented with a TMS320C54, this system performs both active-channel and double-talk detection. A 40-MHz TMS320C50 fixed-point processor is used to implement a low-bit-rate (1.4 Kbps), real-time vocoder (voice coder) (Yao et al., 1998). The realization also includes both the decoder and the synthesizer. A telephone voice dialer (Pawate and Robinson, 1996) is implemented with a 16-bit fixed-point TMS320C5x PDSP. It is a speaker-independent speech recognition system based on the hidden Markov model algorithm.

Modern PDSPs are also suitable for error correction in digital communication. A special Viterbi shift left (VSL) instruction is implemented on both the Motorola DSP56300 and the DSP56600 PDSPs (Taipale, 1998) to accelerate Viterbi decoding. Another implementation of the ITU V.32bis Viterbi decoding algorithm using a TMS320C62xx is reported by Yiu (1998). Yet another example is the implementation of the U.S. digital cellular error-correction coding algorithm, including both the tasks of source coding/decoding and ciphering/deciphering, on a TMS320C541 evaluation module (Chishtie, 1994).
Digital baseband signal processing is another important application of PDSPs. A TMS320C25 DSP-based GMSK (Gaussian minimum shift keying) modem for Mobitex packet radio data communication is reported in Resweber (1996). In this implementation, transmitted data in packet form is level-shifted and Gaussian-filtered digitally within the modem algorithm so that it is ready for the transmitter baseband interface, either via a D/A converter or by direct digital modulation. Received data at either baseband or the intermediate frequency (IF) band from the radio receiver is digitized and processed. Packet synchronization
is also handled by the modem, assuring that the next layer sees only valid Mobitex packets.

System prototyping can be accomplished using a PDSP due to its low cost and ease of programming. A prototype of a reverse channel transmitter/receiver for the asymmetric digital subscriber line (ADSL) algorithm (Gottlieb, 1994) is implemented using a floating-point DSP TMS320C40 chip clocked at 40 MHz. The program consists of three parts: synchronization, training, and decision-directed detection.

Navigation using the Global Positioning System (GPS) has been widely accepted for commercial applications such as electronic direction finding. A software-based GPS receiver architecture using the TMS320C30 processor is described in Kibe et al. (1996). The 'C30 is in charge of signal processing tasks such as correlation, FFT, digital filtering, decimation, demodulation, and Viterbi decoding in the tracking loop. Further investigation of the benefits of using a PDSP in a GPS receiver, with special emphasis on fast acquisition techniques, is reported in Daffara and Vinson (1998). The GPS L1 band signal is down-converted to IF. After A/D conversion, the signal is processed by dedicated hardware in conjunction with algorithms (software) on a PDSP. Functions that are fixed and require high-speed processing should be implemented in dedicated hardware. In contrast, more sophisticated functions that are less time-sensitive can be implemented using PDSPs.

For defense applications, a linear array of TMS320C30s as the front end and a Transputer processor array as the back end were developed for programmable radar signal processing to support the PDDR (Point Defense Demonstration Radar) (Alter et al., 1991). The input signal is sampled at 10 MHz into 16-bit, complex-valued samples. The PDSP front end performs pulse compression, moving target indication (MTI), and constant false alarm (CFA) rate detection.
2.2 Multimedia

2.2.1 Audio Signal Processing

Audible signals cover the frequency range from 20 to 20,000 Hz. PDSP applications to audio signal processing can be divided into three categories according to the qualities and audible range of the signal (Ledger and Tomarakos, 1998): professional audio products, consumer audio products, and computer audio multimedia systems. The DSP algorithms used in particular products are summarized in Table 2.

MP3 (MPEG-I Layer 3 audio coding) has achieved the status of the most popular audio coding algorithm in recent years. The PDSP implementation of an MP3 decoder can be found in Robinson et al. (1998). On the other hand, most
synthesized sounds such as those used in computer gaming are still represented in MIDI (Yim et al., 1998). It can be seen that PDSPs are good candidates for the implementation of these audio signal processing algorithms.

Table 2  DSP Algorithms for Audio Applications

Application                                      DSP algorithms used

Professional audio products
  Digital audio effects processors (reverb,      Delay-line modulation/interpolation,
    chorus, flanging, vibrato, pitch shifting,     digital filtering (comb, FIR, etc.)
    dynamic range compression, etc.)
  Digital mixing consoles                        Filtering, digital amplitude panning,
                                                   level detection, volume control
  Digital audio tape (DAT)                       Compression techniques: MPEG
  Electronic music keyboards                     Wavetable/FM synthesis, sample playback,
                                                   physical modeling
  Graphic and parametric equalizers              Digital FIR/IIR filters
  Multichannel digital audio recorders           ADPCM, AC-3
  Room equalization                              Filtering
  Speaker equalization                           Filtering

Consumer audio products
  CD-I                                           ADPCM, AC-3, MPEG
  CD players and recorders                       PCM
  Digital amplifiers/speakers                    Digital filtering
  Digital audio broadcasting equipment           AC-3, MPEG, and so forth
  Digital graphic equalizers                     Digital filtering
  Digital versatile disk (DVD) players           AC-3, MPEG, and so forth
  Home theater systems (surround-sound           AC-3, Dolby ProLogic, THX, DTS, MPEG,
    receivers/tuners)                              hall/auditorium effects
  Karaoke                                        MPEG, audio effects algorithms
  Satellite (DBS) broadcasting                   AC-3, MPEG
  Satellite receiver systems                     AC-3

Computer audio multimedia systems
  Sound card                                     ADPCM, AC-3, MP3, MIDI, etc.
  Special-purpose headsets                       3D positioning (HRTFs)

2.2.2 Image/Video Processing

Existing image and video compression standards such as JPEG and MPEG are based on the DCT (discrete cosine transform) algorithm. The upcoming JPEG 2000 image coding standard will also include coding algorithms that are based on the discrete wavelet transform (DWT). These standards are often implemented
in modern digital cameras and digital camcorders, in which PDSPs will play an important role. An example of using the TMS320C549 to implement a digital camera is reported in Illgner et al. (1999), where the PDSP can be upgraded later to incorporate the upcoming JPEG 2000 standard. Low-bit-rate video coding standards include the ITU H.263+/H.263M and MPEG4 simple profile. In Budagavi et al. (1999), the potential applications of TMS320C54x family chips to implement low-power, low-bit-rate video coding algorithms are discussed. On the other hand, decoding of MPEG-II broadcasting grade video sequences using either the TMS320C80 chip (Bonomini et al., 1996) or the TMS320C6201 chip (Cheung et al., 1999) has been reported.

Medical imaging has become another fast-growing application area of PDSPs. Chou et al. (1997) report the use of the TMS320C3x as a controller and on-line data processor for magnetic resonance imaging (MRI). It can perform real-time dynamic imaging such as cardiac imaging, angiography (examination of the blood vessels using x-rays following the injection of a radiopaque substance), and abdominal imaging. Recently, an implementation of real-time data acquisition, processing, and display of ungated cardiac movies at 20 frames/sec using PDSPs was reported in Morgan et al. (1999).

2.2.3 Printing

Current printers contain embedded processors to process various formats of page description languages (PDLs) such as PostScript. In Ganesh and Thakur (1999), a PDSP is used to interpret the PDL code, to create a list of elements to be displayed, and to estimate the time needed to render the image. Rendering is the process of creating the source pixel map. In this process, a common source map is 600 × 600 pixels per square inch, with four colors for each pixel, and eight bits for each color. Compression is necessary to store the output map when rendering and screening cannot be completed within the real-time requirement.
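As a rough check on why compression is needed, the storage for an uncompressed source map follows directly from the figures just quoted. The page size of 8.5 in × 11 in used in the second line is an assumption added here for illustration; it is not stated in the text.

```latex
\begin{align*}
\text{bytes per square inch}
  &= (600 \times 600)\ \text{pixels} \times 4\ \text{colors} \times 1\ \text{byte}
   = 1{,}440{,}000\ \text{bytes} \\
\text{bytes per page (assumed } 8.5 \times 11 \text{ in} = 93.5\ \text{in}^2\text{)}
  &= 93.5 \times 1{,}440{,}000
   = 134{,}640{,}000\ \text{bytes} \approx 135\ \text{MB}
\end{align*}
```

Under that assumed page size, a single uncompressed page map would occupy well over 100 MB, which is consistent with the need to compress the map whenever it has to be stored.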
This phase involves JPEG compression and matrix transformations and interpolations. Depending on the characteristics of the screened image and the storage memory available, the compression may be either lossless or lossy. Decompression of the bit-mapped image occurs in real time as the compressed image is fed to the print engine. The screening process converts the source pixel map into the appropriate output format. Because the process must be repeated for all pixels, the number of calculations is enormous for a high-resolution color image, especially in real time.

2.2.4 SAR Image Processing

Synthetic aperture radar (SAR) signal processing poses a significant challenge due to its very large computation and data storage requirements. A sensor
transmits pulses and receives the echoes in a direction approximately perpendicular to the direction of travel. The problem becomes a 2D space-variant convolution using the range-Doppler algorithm, in which all the signals and coefficients are complex numbers with a precision of at least 16 bits. A heterogeneous (vector/scalar) architecture is proposed and analyzed (Meisl, 1996). The vector processor (using the Sharp LH9124 for FFTs) and the scalar processing unit (using eight SHARC 21060s connected in a mesh network) are chosen based on the evaluation criteria of performance, scalability, flexibility, development cost, and repeat cost. The design is capable of processing SAR data at about one-tenth of the real-time rate.

2.2.5 Biometric Information Processing

Handwritten signature verification, one of the biometric authentication techniques, is inexpensive, reliable, and nonintrusive to the person being authorized. A DSP kernel for on-line verification using the TMS32010 with a 200-Hz sampling rate was developed (Dullink et al., 1995). The authentication kernel comprises a personalized table and some general-purpose procedures. This verification method can be part of a variety of entrance monitoring and security systems.

2.3 Control and Data Acquisition

As expected, PDSPs have found numerous applications in modern control and data acquisition as well. Several control applications are implemented using Motorola DSP56000 PDSPs, which function both as powerful microcontrollers and as fast digital signal processors. The DSP56000's 56-bit accumulator (hence the code name 56xxx) provides 8-bit extension registers in conjunction with saturation arithmetic to allow 256 successive additions without the need to check for overflow conditions or limit cycles. The output noise power due to round-off noise of the 24-bit DSP56000/DSP56001 is 65,536 times less than that for 16-bit PDSPs and microcontrollers.
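Both figures follow from the extra word length. The short derivation below is a sketch under two assumptions that the text does not spell out: that the 24 × 24-bit multiplier produces a 48-bit product, so the remaining 8 accumulator bits act as guard bits, and that round-off noise follows the usual uniform model with variance Δ²/12 for a quantization step Δ.

```latex
\begin{align*}
56 &= \underbrace{48}_{\text{product bits}} + \underbrace{8}_{\text{extension bits}},
   \qquad 2^{8} = 256 \ \text{worst-case additions before overflow} \\[4pt]
\sigma_B^{2} &= \frac{\Delta_B^{2}}{12}, \quad \Delta_B = 2^{-(B-1)}
   \;\Longrightarrow\;
   \frac{\sigma_{16}^{2}}{\sigma_{24}^{2}}
   = \left( \frac{2^{-15}}{2^{-23}} \right)^{2}
   = 2^{16} = 65{,}536
\end{align*}
```

In other words, each additional bit of word length lowers the round-off noise power by a factor of four, so the eight extra bits of a 24-bit word account for the quoted factor of 65,536.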
Design examples using the DSP56000 include a PID (proportional-integral-derivative) controller (Stokes and Sohie, 1990) and an adaptive controller (Renard, 1992).

Another example of DSP system development is the Computer-Assisted Dynamic Data Monitoring and Analysis System (CADDMAS) project developed for the U.S. Air Force and NASA (Sztipanovits et al., 1998). It is applied to turbine engine stress testing and analysis. The project uses TMS320C40 PDSPs in a distributed-memory parallel configuration. An application-specific topology interconnects 30 different systems, with processor counts varying from 4 to 128 processors. More than 300 sensors are used to measure signals with a sampling rate in excess of 100 kHz. Based on the measured signals, the system performs spectral analysis, autocorrelation and cross-correlation, tricoherence, and so forth.
2.4 DSP Applications of Hardware Programmable PDSP
There are a variety of FPGA implementation examples of specific DSP functions, such as the FIR (finite impulse response) digital filter, the DFT/FFT (discrete Fourier transform/fast Fourier transform) processor (Dick, 1996), image/video processing (Schoner et al., 1995), the wireless CDMA (code division multiple access) rake receiver (Shankiti and Lesser, 2000), and Viterbi decoding (Goslin, 1995).

2.4.1 16-Tap FIR Digital Filter
A distributed arithmetic (DA) implementation of a 16-tap finite impulse response digital filter has been reported in Goslin (1995). The DA implementation of the multiplier uses look-up tables (LUTs). Because the product of two n-bit integers can take $2^{2n}$ different values, the size of the LUT increases exponentially with the word length. For practical implementation, compromises must be made to trade additional computation time for a smaller number of LUTs.

2.4.2 CORDIC-Based Radar Processor
The improvement of FPGA-based CORDIC arithmetic implementation is studied further in Andraka (1998). The iteration process of CORDIC can be unrolled so that each processing element always performs the same iteration. Unrolling the processor results in two significant simplifications. First, shift amounts become fixed and can be implemented in the wiring. Second, constant values for the angle accumulator are distributed to each adder in the angle accumulator chain and can be hardwired instead of requiring storage space. The entire processor is reduced to an array of interconnected adder-subtractors, which is strictly combinatorial. The delay through the resulting circuit can be substantial, but it can be shortened using pipelining without additional hardware cost. A 14-bit, 5-iteration pipelined CORDIC processor that fits in half of a Xilinx XC4013E-2 runs at 52 MHz.
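The unrolling idea described above can be illustrated in software. The following is a minimal sketch of rotation-mode CORDIC, assuming a 14-bit fractional fixed-point format and five stages; each entry of `STAGE_TABLE` plays the role of one unrolled processing element with its fixed shift amount and hardwired angle constant. All names are illustrative, and the sketch is not the reported FPGA design.

```python
import math

FRAC_BITS = 14                 # fractional bits (assumption of this sketch)
ONE = 1 << FRAC_BITS           # fixed-point representation of 1.0
STAGES = 5                     # number of unrolled iterations

# One entry per unrolled stage: (fixed shift amount, hardwired angle constant).
STAGE_TABLE = [(i, round(math.atan(2.0 ** -i) * ONE)) for i in range(STAGES)]

# Accumulated CORDIC gain of the chosen number of stages.
GAIN = 1.0
for i in range(STAGES):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))


def polar_to_cartesian(radius, angle_rad):
    """Rotate (radius, 0) by angle_rad using only shifts and add/subtracts.

    Valid for angles within roughly +/-96 degrees, the sum of the
    five stage angles."""
    x = round(radius / GAIN * ONE)   # pre-scale to cancel the CORDIC gain
    y = 0
    z = round(angle_rad * ONE)       # residual angle still to be rotated
    for shift, angle_const in STAGE_TABLE:
        if z >= 0:                   # rotate counterclockwise
            x, y, z = x - (y >> shift), y + (x >> shift), z - angle_const
        else:                        # rotate clockwise
            x, y, z = x + (y >> shift), y - (x >> shift), z + angle_const
    return x / ONE, y / ONE
```

In hardware, the loop body would be replicated once per stage, so the shifts become wiring and the angle constants become fixed adder inputs. With five stages the residual rotation error of this sketch is bounded by atan(2^-4), roughly 3.6 degrees; five is also the iteration count of the reported 14-bit pipelined processor.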
This design is used for high-throughput polar-to-Cartesian coordinate transformations in a radar target generator.

2.4.3 DFT/FFT
An FPGA-based systolic DFT array processor architecture is reported in Dick (1996). Each processing element (PE) contains a CORDIC arithmetic unit, which consists of a series of shifts and adds to avoid the need for an area-consuming multiplier. The timing analyzer xdelay determines the maximum clock frequency to be 15.3 MHz when implemented on a Xilinx XC4010 PG191-4 FPGA chip.

2.4.4 Image/Video Signal Processor
In Schoner et al. (1995), the implementation of an FPGA-augmented low-complexity video signal processor was reported. This combination of ASIC and FPGA is flexible enough to implement four common algorithms in real time. Specifically, for 256 × 256 × 8-bit pictures, this device is able to achieve the frame rates presented in Table 3.

Table 3 Achievable Frame Rate of Four Different Image Processing Operations
| Algorithms | Frames/sec | Latency (msec) |
|---|---|---|
| 7 × 7 mask 2D filter | 13.3 | 75.2 |
| 8 × 8 block DCT | 55.0 | 18.2 |
| 4 × 4 block vector quantization at 0.5 bit/pixel | 7.4 | 139.0 |
| One-level wavelet transform | 35.7 | 28.0 |

2.4.5 CDMA Rake Receiver
A CDMA rake receiver for a real-time underwater data communication system has been implemented using four Xilinx XC4010 FPGA chips (Shankiti and Lesser, 2000) with one multiplier on each chip. The final design of each multiplier occupies close to 1000 CLBs (configurable logic blocks) and runs at a clock frequency of 1 MHz.

2.4.6 Viterbi Decoder
Viterbi decoding is used to achieve maximum likelihood decoding of a binary stream of symbols. Because it involves bit-stream operations, it cannot be efficiently implemented using the word-parallel architecture of general-purpose microprocessors or PDSPs. It has been reported (Goslin, 1995) that a Xilinx XC4013E-based FPGA implementation of a Viterbi decoder achieves 2.5 times the processing speed (135 nsec versus 360 nsec) of a dual-PDSP implementation of the same algorithm.

3 PERFORMANCE MEASURES
The comparison of performance between PDSPs and general-purpose microprocessors, between various PDSPs, and between PDSPs and dedicated hardware chip sets is a very difficult task. A number of factors contribute to this difficulty:
1. A set of objective performance metrics is difficult to define for PDSPs. It is well known that with modern superscalar instruction architecture,
the usual metrics such as MIPS (million instructions per second) and FLOPS (floating-point operations per second) are no longer valid metrics to gauge the performance of these microprocessors. Some PDSPs also adopt such architecture. Hence, a set of appropriate metrics is difficult to define.
2. PDSPs have fragmented architecture. Unlike general-purpose microprocessors that have converged largely to a similar data format (32-bit or 64-bit architecture), PDSPs have a much more fragmented architecture in terms of internal or external data format and fixed-point versus floating-point operations. The external memory interface varies from platform to platform. This is due to the fact that most PDSPs are designed for embedded applications and, hence, cross-platform compatibility is not of major concern for different manufacturers of PDSPs. Furthermore, PDSPs often have specialized hardware to accelerate a special type of operation. Such specialized hardware makes the comparison even more difficult.
3. PDSP applications are often hand programmed with respect to a particular platform. The performance of cross-platform compilers is still far from realistic. Hence, it is not meaningful to run the same high-level language benchmark program on different PDSP platforms.

Some physical parameters of PDSPs are summarized in Table 4.

Table 4 Physical Performance Parameters
| Parameters | Units |
|---|---|
| Maximum clock frequency | MHz |
| Power consumption | Absolute power, watts (W); power (W)/MIPS |
| Execution throughput, peak and sustained | MIPS, MOPS (million operations/sec), MACS (no. of MAC/sec), MFLOPS |
| Operation latency | Instruction cycles |
| Memory access | Clock cycles |
| Bandwidth | MB/sec (megabytes per second) |
| Latency | Clock cycle |
| Input/output | No. of ports |

Usually peak MIPS, MOPS, MFLOPS, MAC/sec, and MB/sec for a particular architecture are just the number of instructions, operations, floating-point operations, multiply-accumulate operations, and bytes of memory access executed in parallel, respectively, multiplied by the maximum clock frequency. They can be achieved only instantaneously in real applications, at certain clock cycles, and are somewhat misleading. From a user's perspective, the ultimate performance measure is the "execution time" (wall clock time) of an individual benchmark.

Recently, efforts have been made to establish benchmark suites for PDSPs. The proposed benchmark suites (Bhargava et al., 1998; Lee et al., 1997) can be categorized into kernel and application levels. They can also be classified into general DSP and multimedia/graphic. Because each kernel contributes a certain percentage of the run time of each application, and each application may contain more than a single DSP kernel, conducting benchmark tests at both levels gives more accurate results than just the raw numbers of some DSP kernels. A number of DSP benchmarks are summarized in Table 5.

Table 5 Examples of DSP Benchmarks
| Level | | Algorithm/application names |
|---|---|---|
| Kernel | General | FFT/IFFT, FIR, IIR, matrix/vector multiply, Viterbi decoding, LMS (least mean square) algorithm |
| Kernel | Multimedia/graphic | DCT/IDCT, VLC (variable-length code) decoding, SAD |
| Application | General | Radar (Bhargava et al., 1998) |
| Application | Multimedia/graphic | MediaBench (Lee et al., 1997), G.722, JPEG, Image (Bhargava et al., 1998) |

4 MODERN PDSP ARCHITECTURES
In this section, several modern PDSP architectures will be surveyed. Based on different implementation methods, modern PDSPs can be characterized as PDSP chips, PDSP cores, multimedia PDSPs, and NSP instruction sets. These implementation approaches are summarized in terms of three general sets of characteristics:
1. Program (instruction) execution
2. Datapath
3. Physical implementation
Program execution of the PDSP is characterized by processing core (how the PDSP achieves parallelism), instruction width (bits), maximum number of instructions issued, and address space of program memory (bits).
Its datapath is concerned with the number and bit width of datapaths, pipelining depth, native data type (either fixed point or floating point), number of ALUs, shifters, multipliers, and bit manipulation units as well as their corresponding data precision/accuracy and data/address registers. Finally, physical characteristics to be compared include maximum clock frequency, typical operating voltage, feature size and implementation technology, and power consumption.

4.1 PDSP Chips
Some of the recent single-chip PDSPs are summarized in Table 6.

4.1.1 DSP16xxx
The Lucent DSP16xxx (Bier, 1997) achieves ILP from parallel operations encoded in a complex instruction. These complex instructions are executed at a maximum rate of one instruction per clock cycle. Embedding up to 31 instructions following the Do instruction can eliminate overheads due to small loops. These embedded instructions can be repeated a specified number of times without additional overhead. Moreover, a high instruction/data I/O bandwidth can be achieved from a 60-kword (120-kbyte) dual-ported on-chip RAM, a dedicated data bus, and a multiplexed data/instruction bus.

4.1.2 TMS320C54xx
Characterized as a low-power PDSP, each TMS320C54xx (Texas Instruments, 1999a) chip is composed of two independent processor cores. Each core has a 40-bit ALU, including a 40-bit barrel shifter, two 40-bit accumulators, and a 17-bit × 17-bit parallel multiplier coupled with a 40-bit adder to facilitate single-cycle MAC operation. The C54 series is optimized for low-power communication applications. Therefore, it is equipped with a compare, select, and store unit (CSSU) for the add/compare selection of the Viterbi operator. Loop/branch overhead is eliminated using instructions such as repeat, block-repeat, and conditional store. Interprocessor communication is carried out via two internal eight-element first-in-first-out (FIFO) registers.

4.1.3 TMS320C62x/C67x
TMS320C62x/C67x (Texas Instruments, 1999b, 1999c; Seshan, 1998) is a series of fixed-point/floating-point, VLIW-based PDSPs for high-performance applications.
During each clock cycle, a compact instruction is fetched and decoded (decompressed) to yield a packet of eight 32-bit instructions that resemble those of conventional VLIW architecture. The compiler performs software pipelining, loop unrolling, and "If" conversion to predicated execution. Furthermore, programmers using a high-level language such as C can access a number of special-purpose DSP instructions called intrinsic functions. This feature helps ease the programming task and improves the code performance.

Table 6 Summary of Recent Single-Chip PDSPs
| Family name | DSP 16xxx | SHARC | TMS32054xx | TMS32062xx | TMS32067xx | TriCore |
|---|---|---|---|---|---|---|
| Model no. | DSP16210 | ADSP21160 | TMS320VC5421 | TMS320C6203 | TMS320C6701 | TC10GP |
| Company | Lucent | Analog Device | Texas | Texas | Texas | Infineon |
| Processing core | VLIW | Multiproc./SIMD | Multiprocessor | VLIW | VLIW | Superscalar |
| Instruction: Width (bits) | 16 & 32 | 32 | 16 & 32 | 256 | 256 | 16 & 32 |
| Instruction: Maximum issued | 1 | 4 | 2 | 8 | 8 | 2 |
| Instruction: Address space (bits) | 20 | 32 | - | 32 | 32 | 32 |
| Datapath: No. of datapaths | 2 | 2 | 2 | 2 | 2 | 3 |
| Datapath: Width of datapath (bits) | 16 | 32 | 16 | 32 | 32 | 32 |
| Datapath: Pipeline depth | 3 | 3 | - | 11 | 17 | 4 |
| Datapath: Data type | Fixed-point | Floating-point | Fixed-point | Fixed-point | Floating-point | Fixed-point |
| Functional units: ALUs | 2 (40b) | 2 | 2 (40b) | 4 | 4 | 1 |
| Functional units: Shifters | 2 | 2 | 2 (40b) | 2 | 0 | - |
| Functional units: Multipliers | 2 (16b × 16b) | 2 | 2 (17b × 17b) | 2 | 2 | 2 (16b × 16b) |
| Functional units: Address generator | 2 | 2 | 2 × 2 | ALUs | ALUs | 1 |
| Functional units: Bit manipulation unit | 1 (40b) | Shifter | 2 (40b) | Shifter | Shifter | 1 |
| Program control: Hardware loop | Y | Y | Y | N | N | Y |
| Program control: Nesting levels | 2 | - | 2 | 2 | 2 | 3 |
| On-chip storage: Data registers | 8 | 2 × 16 | 2 × 2 | 2 × 16 | 2 × 16 | 16 |
| On-chip storage: Width (bits) | 40 | 40 | 40 | 32 | 32 | 32 |
| On-chip storage: Address registers | 21 | 2 × 8 | 2 × 8 | - | - | 16 |
| On-chip storage: Width (bits) | 20 | 32 | - | - | - | 32 |
| Performance: Maximum clock (MHz) | 150 | 100 | - | 300 | 167 | 66 |
| Performance: Operating voltage (V) | 3.0 | - | 1.8 | 1.5 | 1.8 | 2.5 |
| Performance: Technology | CMOS | - | CMOS | CMOS (15C05) | CMOS (18C05) | CMOS |
| Performance: Feature size (µm) | - | - | - | 0.15 | 0.18 | 0.35 |
| Performance: Power consumption | 294 mW at 100 MHz | - | 162 mW at 100 MHz | - | - | - |

4.1.4 ADSP 21160 SHARC
Analog Device's ADSP21160 SHARC (Super Harvard Architecture) (Analog Device, 1999) contains two PEs, both using a 40-bit extended precision floating-point format. Every functional unit in each PE is connected in parallel and performs single-cycle operations. Even though its name is derived from the Harvard architecture, its program memory can store both instructions and data. Furthermore, SHARC doubles its data memory bandwidth to allow the simultaneous fetch of both operands.

4.1.5 TriCore
TriCore TC10GP (TriCore, 1999) is a dual-issue superscalar load/store architecture targeted at control-oriented/DSP-oriented applications. Even though its instructions are a mixture of 16 and 32 bits wide to improve code density, its datapath is 32 bits wide to accommodate high-precision fixed-point and single-precision floating-point numbers.

4.2 PDSP CORES
In Table 7, we compare the features of four DSP cores reported in the literature: Carmel, R.E.A.L., StarCore, and V850.

Table 7 Summary of PDSP Cores
| Family name | Carmel | R.E.A.L. | StarCore | V850 |
|---|---|---|---|---|
| Model no. | DSP 10XX | - | SC140 | NA853C |
| Company | Infineon | Philips | Lucent & Motorola | NEC |
| Processing core | VLIW | VLIW | VLIW | RISC |
| Instruction: Width (bits) | 24 & 48 | 16 & 32 | 16 | 16 & 32 |
| Instruction: Maximum issued | 1/2 | 2 | 6 | - |
| Instruction: Address space (bits) | 23 | - | 32 | 26 |
| Datapath: No. of datapaths | 2 | 2 | - | - |
| Datapath: Width of datapath (bits) | 16 | 16 | 16 | 16 |
| Datapath: Data type | Fixed-point | Fixed-point | Fixed-point | Fixed-point |
| Functional units: ALUs | 2 (40b) | 4 (16b) | 4 | 1 (32b) |
| Functional units: Shifters | 1 (40b) | 1 (40b) | ALU (40b) | 1 (32b) |
| Functional units: Multipliers | 2 (17b × 17b) | 2 (16b × 16b) | ALU (16b × 16b) | 1 (32b × 32b) |
| Functional units: Address generator | 1 | 2 | 2 | - |
| Functional units: Bit manipulation unit | Shifter | - | ALU | Shifter |
| On-chip storage: Data registers | 16 + 6 | 8 | 16 | 32 |
| On-chip storage: Width (bits) | 16/40 | 16 | 40 | 32 |
| On-chip storage: Address registers | 10 | 16 | 24 | - |
| On-chip storage: Width (bits) | 16 | - | 32 | - |
| Performance: Maximum clock (MHz) | 120 | 85 | 300 | 33 |
| Performance: Operating voltage (V) | 2.5 | 2.5 | 1.5 | 3.3 |
| Performance: Technology | CMOS | - | CMOS | Titanium silicide |
| Performance: Feature size (µm) | 0.25 | 0.25 | 0.13 | 0.35 |
| Performance: Power consumption | 200 mW at 120 MHz | - | 180 mW at 300 MHz | - |

4.2.1 Carmel
One of the distinguishing features of Carmel (Carmel, 1999; Eyre and Bier, 1998) is its configurable long instruction words (CLIW), which are user-defined VLIW-like instructions. Each CLIW instruction combines multiple predefined instructions into a 144-bit-long superinstruction:

    CLIW name (ma1, ma2, ma3, ma4) {                      // CLIW reference line
        MAC1 || ALU1 || MAC2 || ALU2 || MOV1 || MOV2      // CLIW def
    }

Programmers can indicate up to four execution units plus two data moves according to the position of each individual instruction within the long CLIW instruction. Up to four memory operands can be specified using ma1 through ma4. The assembler stores the 48-bit reference line in program memory and the 96-bit definition in a separate CLIW memory (1024 × 96 bits). In addition to CLIW, specialized hardware is provided to support Viterbi decoding. Almost all instructions can use predicated execution by means of two conditional-execution registers.

4.2.2 R.E.A.L.
Similarly, the R.E.A.L. PDSP (Kievits et al., 1998) core allows users to specify a VLIW-like set of application-specific instructions (ASIs) to exploit the full parallelism of the datapath. Up to 256 ASIs can be stored in a look-up table. A special class of 16-bit instructions with an 8-bit index field activates these instructions. Each ASI is 96 bits wide and has the following predicated form:

    Cond (3) || XACU (11) || YACU (10) || MPY1 (3) || MPY0 (3) || ALUs (62) || DSU (2) || BNU (2)

    ASI [if(asi_cc)]alu3_op,alu2_op,alu1_op,alu0_op [mult1_op][,mult0_op][,dr_op][,xacu_op][,yacu_op];
    ASI [if(asi_cc)]alu32_op, alu10_op [,mult1_op][,mult0_op][,dr_op] [,xacu_op][,yacu_op];
    ASI [if(asi_cc)]lfsr [,mult1_op][,mult0_op][,xacu_op][,yacu_op];

Each ASI starts with a 3-bit condition code followed by an 11-bit X ALU opcode, a 10-bit Y ALU opcode, 3-bit opcodes for multipliers 1 and 0, 62-bit operands, and so on. In addition to the user-defined VLIW instruction, R.E.A.L. allows application-specific execution units (AXUs) to be defined by the customer, which can be placed anywhere in the datapath or address calculation units. Its application is a GSM baseband signal processor.

4.2.3 StarCore
StarCore (StarCore, 1999; Wolf and Bier, 1998) is a joint development between Lucent and Motorola for software-configurable wireless handset terminals (radios) of third-generation wireless systems. It is expected to operate at a low voltage down to 0.9 V. A fetch set (8-word instruction set) is fetched from memory. A program sequencing (PSEQ) unit detects the portion of this set to be executed in parallel and dispatches it to the appropriate execution units. This feature is called a variable-length execution set (VLES).
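The variable-length execution set idea can be pictured as splitting a fixed-size fetch set into groups of differing length. The sketch below is purely illustrative: it assumes a hypothetical per-word flag marking the last instruction of each execution set, which stands in for the real StarCore encoding (not described here). The function name, the flag, and the mnemonics are all assumptions of this example.

```python
FETCH_SET_WORDS = 8   # words per fetch set, as stated in the text
MAX_ISSUE = 6         # issue limit assumed here (cf. "Maximum issued" in Table 7)


def split_into_execution_sets(fetch_set, max_issue=MAX_ISSUE):
    """Group the words of one fetch set into execution sets.

    fetch_set is a list of (mnemonic, ends_set) pairs. The boolean
    ends_set is a HYPOTHETICAL marker meaning "last instruction of its
    execution set"; the actual hardware encoding may differ.
    """
    if len(fetch_set) != FETCH_SET_WORDS:
        raise ValueError("a fetch set holds exactly 8 words")
    execution_sets, current = [], []
    for mnemonic, ends_set in fetch_set:
        current.append(mnemonic)
        # Close the group at the marker or when the issue limit is reached.
        if ends_set or len(current) == max_issue:
            execution_sets.append(current)
            current = []
    if current:   # an unterminated group would continue in the next fetch set
        execution_sets.append(current)
    return execution_sets


# Made-up mnemonics: three execution sets of lengths 3, 1, and 4.
example_fetch_set = [
    ("mac", False), ("mac", False), ("move", True),
    ("add", True),
    ("mac", False), ("mac", False), ("mac", False), ("move", True),
]
example_sets = split_into_execution_sets(example_fetch_set)
```

Each inner list corresponds to the instructions dispatched together in one cycle, so code density does not suffer when fewer than the maximum number of operations can run in parallel.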
StarCore achieves maximum parallelism by allowing multiple address generation units and data ALUs to execute multiple operations in a single cycle. StarCore is targeted at speech coding, synthesis, and voice recognition.

4.2.4 V850
The NEC NA853E (NEC, 1997) is a five-stage pipeline RISC (reduced instruction set computer) core suitable for real-time control applications. Not only is the instruction set a mixture of 16-bit-wide and 32-bit-wide instructions, but it also includes intrinsic instructions for high-level language support to increase the efficiency of the object code generated by the compiler and to reduce the program size.

4.3 Multimedia PDSPs
Multimedia PDSPs are designed specifically for audio/video applications as well as 2D/3D graphics. Some of their common characteristics are as follows:
1. Multimedia input/output (I/O): This may include ports and codecs (coder/decoder) for video, audio, as well as super-VGA graphics signals.
2. Multimedia-specific functional units such as a YUV to RGB converter for video display, variable-length decoder for digital video decoding, descrambler in TriMedia (Phillips, 1999), and motion estimation unit for digital video coding/compression in Mpact2 (Kala, 1998; Purcell, 1998).
3. High-speed host computer/memory interfaces such as PCI bus and RAMBUS DRAM interfaces.
4. Real-time kernel and operating system for MPACT and TriMedia, respectively.
5. Support of floating-point and 2D/3D graphics.
Examples of multimedia PDSPs include MPACT, TriMedia, TMS320C8x, and DDMP (Data-Driven Multimedia Processor) (Terada et al., 1999). Their architectural features are summarized in Table 8.

4.4 Native Signal Processing
Native signal processing (NSP) is the use of extended instruction sets in a general-purpose microprocessor to execute signal processing algorithms. These are special-purpose instructions that often operate in a different manner than the regular instructions. Specifically, multimedia data formats usually are rather short (8 or 16 bits) compared to the 32-, 64-, and 128-bit native register length of modern general-purpose microprocessors. Therefore, up to eight samples may be packed into a single word and processed simultaneously to enhance parallelism at the subword level. Most NSP instructions operate on both integer (fixed-point) and floating-point numbers except the Visual Instruction Set (VIS) (Sun, 1997), which supports only fixed-point numbers.
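Subword parallelism can be emulated in software to show what a packed instruction does in one step. The following is a minimal sketch, assuming eight unsigned 8-bit samples packed into one 64-bit word and modulo (wraparound) addition; the helper names are illustrative, and real NSP hardware performs the lane-wise addition directly rather than with the masking trick used here.

```python
LANES = 8                          # eight 8-bit samples per 64-bit word
LOW7 = 0x7F7F7F7F7F7F7F7F          # low seven bits of every lane
TOP1 = 0x8080808080808080          # top bit of every lane


def pack_u8(samples):
    """Pack eight unsigned 8-bit samples into one 64-bit word (lane 0 lowest)."""
    if len(samples) != LANES:
        raise ValueError("expected exactly eight samples")
    word = 0
    for lane, sample in enumerate(samples):
        word |= (sample & 0xFF) << (8 * lane)
    return word


def unpack_u8(word):
    """Split a 64-bit word back into its eight 8-bit lanes."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(LANES)]


def packed_add_u8(a, b):
    """Lane-wise modulo-256 addition of two packed words.

    The low seven bits of each lane are added first, so no carry can
    cross a lane boundary; the top bit of each lane is then fixed with XOR.
    """
    low = (a & LOW7) + (b & LOW7)
    return low ^ ((a ^ b) & TOP1)


first = [1, 2, 3, 250, 255, 128, 127, 0]
second = [1, 2, 3, 10, 1, 128, 1, 0]
packed_sum = unpack_u8(packed_add_u8(pack_u8(first), pack_u8(second)))
```

A single word-wide operation thus stands in for eight independent 8-bit additions, which is the source of the speedup on short multimedia data types.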
In general, NSP instructions can be classified as follows (Lee, 1996):
• Vector arithmetic and logical operations whose results may be vector or scalar
• Conditional execution using masking operations
Table 8 Summary of Multimedia PDSPs
| Family name | DDMP | MPACT | TMS320C8x | TriMedia |
|---|---|---|---|---|
| Model no. | - | MPACT2/6000 | TMS320C82 | TM1300 |
| Company | Sharp | Chromatic | Texas | Philips |
| Processing core | Dataflow | VLIW | Multiprocessor | VLIW |
| Instruction: Width (bits) | 72 | 81 | 32 | 16 to 224 |
| Instruction: Maximum issued | 8 | 2 | 3 | 5 |
| Instruction: Address space (bits) | - | - | 32 | 32 |
| Datapath: Width of datapath (bits) | 12 | 72 | 32 | 32 |
| Datapath: Floating-point precision | NA (a) | Both | Single | Both |
| Functional units: ALUs | 2 | 2 | 3 | 7 |
| Functional units: Shifters | - | 1 | 3 | 2 |
| Functional units: Multipliers | 2 ALUs | 1 | 3 | 2 |
| Functional units: Address generators | 4 | - | 3 | 2 |
| On-chip storage: Data registers | 4 Accumulators | 512 | 48 | 128 |
| On-chip storage: Width (bits) | 24 | 72 | 32 | 32 |
| Performance: Maximum clock (MHz) | 120 | 125 | 60 | 166 |
| Performance: Operating voltage (V) | 2.5 | - | - | 2.5 |
| Performance: Technology | CMOS (4 metal) | - | CMOS | CMOS |
| Performance: Feature size (µm) | 0.25 | 0.35 | - | 0.25 |
| Performance: Power consumption | 1.2 W at 120 MHz | - | - | 3.5 W at 166 MHz |
(a) NA = not available.
• Memory/cache access control such as cache prefetch to a particular level of cache, nontemporal store, and so forth, as well as masked load/store
• Data alignment and subword rearrangement (e.g., permute, shuffle)

Most NSP instruction set architectures exhibit the following features:
1. Native signal processing instructions may share the existing functional units of regular instructions. As such, some overhead is involved when switching between NSP instructions and regular instructions. However, some NSP instruction sets have separate, exclusive execution units as well as a separate register file.
2. Saturation and/or modulo arithmetic instructions are often implemented in hardware to reduce the overhead of dynamic range checking during execution, as illustrated in Section 1.2.3.
3. To exploit subword parallelism, manual optimization of NSP-based programs is often necessary for demanding applications such as image/video processing and 2D/3D graphics.
Common and distinguishing features of available NSPs are summarized alphabetically in Table 9.

4.4.1 AltiVec
Motorola's AltiVec (Motorola, 1998; Tyler et al., 1999) features a 128-bit vector execution unit operating concurrently with the existing integer and floating-point units. There are a total of 162 new instructions that can be divided into four major classes:
Intraelement arithmetic operations: Addition, subtraction, multiply-add, average, minimum, maximum, conversion between 32-bit integer and floating point
Intraelement nonarithmetic operations: Compare, select, logical, shift, and rotate
Interelement arithmetic operations: Sum of elements within a single vector register to a separate register
Interelement nonarithmetic operations: Wide field shift, pack, unpack, merge/interleave, and permute
AltiVec shows a significant amount of effort to exploit the maximum amount of parallelism.
This results in a 32-entry, 128-bit-wide register file separate from the existing integer and floating-point register files. This is different from other NSP architectures, which often share the NSP register file with an existing one. The purpose is to exploit additional parallelism through the superscalar dispatch of operations to multiple execution units or through multithreaded execution unit pipelines. Each instruction can specify up to three source operands and a single destination operand. Each operand refers to a vector register. Target applications of AltiVec include multimedia applications as well as high-bandwidth data communication, base station processing, IP telephony gateway, multichannel modem, network infrastructure such as an Internet router, and a virtual private network server.

Table 9 Summary of Native Signal Processing Instruction Sets
| Name | AltiVec | MAX-2 | MDMX | MMX/3D Now | MMX/SIMD | VIS |
|---|---|---|---|---|---|---|
| Company | Motorola | HP | MIPS | AMD | Intel | Sun |
| Instruction set | Power PC | PA RISC 2.0 | MIPS-V | IA32 | IA32 | SPARC V.9 |
| Processor | MPC7400 | PA RISC | R10000 | K6-2 | Pentium III | UltraSparc |
| Fixed point (integer): 8-Bit | 16 | NA (a) | 8 | 8 | 8 | 8 |
| Fixed point (integer): 16-Bit | 8 | 4 | 4 | 4 | 4 | 4 |
| Fixed point (integer): 32-Bit | 4 | NA | NA | 2 | 2 | 2 |
| Floating point: Single precision | 4 | 2 | 2 | 2 | 4 | NA |
| Fixed-point register file: Size | 32 × 128b | 32 × 64b | 32 × 64b | 8 × 64b | 8 × 64b | 32 × 64b |
| Fixed-point register file: Shared with | Dedicated | Integer reg. | FP reg. | Dedicated | FP reg. | FP reg. |
| Fixed-point accumulator: Size | NA | NA | 192 | NA | NA | NA |
| Arithmetic: Unsigned saturation | Y | Y | Y | Y | Y | Y |
| Arithmetic: Modulo | Y | Y | Y | Y | Y | Y |
| Interelement arithmetic: Multiply-Acc (a) | 4 | NA | 4 | 2 | 2 | NA |
| Interelement arithmetic: Fixed-point MAC precision (b) | 32 += (16 × 16) | - | 48 += (16 × 16) | 32 += (16 × 16) | 32 += (16 × 16) | - |
| Interelement arithmetic: Compare | Y | N | Y | Y | Y | Y |
| Interelement arithmetic: Min/max | Y | N | Y | N | Y | Y |
| Interelement arithmetic: Floating-point Multiply-Acc | 4 single | 2 single | 2 single | 2 single | 4 single | N |
| Interelement arithmetic: Floating-point min/max | Y | N | N | Y | Y | N |
| Intraelement arithmetic: Sum | Y | N | N | Y | Y | N |
| Intraelement arithmetic: Floating-point Sum | Y | N | N | Y | N | N |
| Type conversion: Pack | Y | Y | Y | Y | Y | Y |
| Type conversion: Unpack | Y | Y | Y | Y | Y | Y |
| Type conversion: Permute | Y | Y | Y | - | N | - |
| Type conversion: Merge | Y | Y | Y | Y | Y | Y |
| Special instructions | VREFP, VRSQRTFP, SPLAT, VSEL | CACHE HINT, DEPOSIT, EXTRACT, SHR PAIR | SELECT | FEMMS, PFRCP, PFRSQRT, PREFETCH | EMMS, DIVPS, PREFETCH, SFENCE | EDGE, ARRAY, PDIST, BLOCK TRANSFER |
(a) NA: not available.
(b) Precision (bits): acc (bits) = acc (bits) + a (bits) × b (bits).

4.4.2 MAX 2.0
Multimedia Acceleration eXtension (MAX) 2.0 (Lee, 1996) is an extension of the HP Precision Architecture RISC ISA on a PA8000 microprocessor, designed with minimal increase in die area. Neither 8-bit nor 32-bit subwords are supported, due to insufficient precision and to insufficient parallelism compared to 32-bit single-precision floating point, respectively. Although pixels may be input and output as 8 bits, using a 16-bit subword in intermediate calculations is preferred. The additional hardware to support MAX 2.0 is minimal because the integer pipeline already has two integer ALUs and shift merge units (SMUs), whereas the floating-point pipeline has two FMACs and two FDIV, FSQRT units. MAX special instructions are field manipulation instructions, as follows:
Cache hint: For spatial locality
Extract: Selects any field in the source register and places it right-aligned in the target
Deposit: Selects a right-aligned field from the source and places it anywhere in the target
Shift pair: Concatenates and shifts the 64-bit or rightmost 32-bit contents of two registers into one result

4.4.3 MDMX
Based on MIPS' experience of designing Geometry Engine, Reality Engine, Maximum Impact, Infinite Reality, Nintendo64, O2, and Magic Carpet, the goal of the MIPS Digital Media Extension (MDMX) (MIPS Technology, 1997) is to improve performance with IEEE-compliant DCT accuracy.
As a result, MDMX adds four- and eight-element SIMD capabilities for integer arithmetic through the definition of these two data types:
Octal byte: Eight unsigned 8-bit integers with eight unsigned 24-bit accumulators
Quad half: Four unsigned 16-bit integers with four unsigned 48-bit accumulators
Note that both octal byte and quad half data types share a 192-bit accumulator, which permits accumulation of $2^N$ $N \times N$ multiplies, where N is either 8 or 16 bits according to octal byte and quad half, respectively. MDMX's thirty-two 64-bit-wide registers and the 8-bit condition code coincide with the existing floating-point register file, similar to the "paired-single" precision floating-point data type. Data are moved between the shared floating-point register file and memory with a floating-point load/store double word and between floating-point and integer registers. In addition, MDMX has a unique feature with the vector arithmetic: It is able to operate on a specific element of a subword as an operand or as a constant immediate value. However, the reduction instruction (sum across) and sum of absolute difference (SAD) are judiciously omitted. In particular, SAD or L1 norm can be performed as an L2 norm without loss of precision using the 192-bit accumulator.

4.4.4 MMX 3DNow!
AMD 3DNow! (AMD, 1999; Oberman et al., 1999) is a multimedia extension similar to Intel's MMX, first implemented in the AMD K6-2 processor. Floating-point instructions are added to the integer-based MMX instruction set by introducing a new data type, single-precision floating point, to support 2D and 3D graphics. As with MMX, applications must determine whether the processor supports MMX or not. In addition, 3DNow! is implemented with a separate flat register file in contrast to the stack-based floating-point/MMX register file. Because no physical transfer of data between floating-point and multimedia unit register files is required, FEMMS (faster entry/exit of the MMX or floating-point state) is included to replace the MMX EMMS instruction and to enhance the performance. Either the register X or Y execution pipeline can execute floating-point instructions for a maximum issue and execution rate of two operations per cycle (AMD, 1999). There are no instruction-decode or operation-issue pairing restrictions. All operations have an execution latency of two cycles and are fully pipelined.
As long as two operations do not fall into the same category, both operations will start execution without delay. The 2 categories of the additional 21 instructions are as follows:
1. PFADD, PFSUB, PFSUBR, PFACC, PFCMPx, PFMIN, PFMAX, PI2FD, PFRCP, and PFRSQRT
2. PFMUL, PFRCPIT1, PFRSQIT1, and PFRCPIT2
Normally, all instructions should be properly scheduled so as to avoid delay due to execution resource contention or structural hazard by taking dependencies and execution latencies into account.
FEMMS: Similar to MMX's EMMS but faster because 3DNow! does not share MMX registers with those of floating point.
PFRCP: Scalar floating-point reciprocal approximation
PFRSQRT: Scalar floating-point reciprocal square root approximation
PREFETCH: Loads 32 or greater number of bytes, either nontemporal or temporal, in the specified cache level

4.4.5 MMX/SIMD
MMX (multimedia extension) is Intel's first native signal processing extension instruction set (Intel, 1999). Subsequently, additional instructions were added as the Streaming SIMD Extensions (SSE) (Intel, 1999) in Pentium III class processors. SIMD supports 4-way parallelism of 32-bit, single-precision floating point for 2D and 3D graphics or 32-bit integer for audio processing. These new data types are held in a new separate set of eight 128-bit SIMD registers. Unlike MMX execution, traditional floating-point instructions can be mixed with SSE without the need to execute special instructions, such as EMMS. In addition, SIMD features an explicit SAD instruction and introduces a new operating-system visible state.
EMMS (empty MMX state): Must be used to empty the floating-point tag word at the end of an MMX routine before calling other routines executing floating-point instructions
DIVPS: Divides four pairs of packed, single-precision, floating-point operands
PREFETCH: Loads 32 or greater number of bytes, either nontemporal or temporal, in the specified cache level
SFENCE (store fence): Ensures ordering between routines that produce weakly ordered results and routines that consume these data, just like multiprocessor weak consistency; nontemporal stores are implicitly weakly ordered, with no write-allocate and with write combine/collapse so that cache pollution is minimized

4.4.6 VIS
Sun's VIS (Visual Instruction Set) (Sun, 1997; Tremblay et al., 1996) is the only NSP reviewed here that does not support parallelism of the floating-point data type. However, the subword data share the floating-point register file with floating-point numbers, as indicated in Table 9.
Some special instructions in VIS are Array, Edge, Pdist, and Block transfer:

Array           Facilitates 3D texture mapping and volume rendering by computing a memory address for data lookup based on fixed-point x, y, and z; data are laid out in a block fashion so that points that are near one another have their data stored in nearby memory locations
Edge            Computes a mask used for partial storage at an arbitrarily aligned start or stop address, typically at boundary pixels
Pdist           Computes the sum of the absolute values of the differences of eight pixel pairs
Block transfer  Transfers 64 bytes of data between memory and registers

5 SOFTWARE PROGRAMMING TOOLS FOR PDSPs

5.1 Software Development Tools for Programming PDSPs

Since their introduction more than a decade ago, PDSPs have been incorporated in many high-performance embedded systems such as modems and graphics acceleration cards. A unique requirement of these applications is that they all demand high-quality (machine) code generation to achieve the highest performance while minimizing the size of the program to conserve premium on-chip memory space. Often, a difference of one or two extra instructions means either that a real-time processing constraint is violated, rendering the generated code useless, or that an additional memory module is needed, causing a significant cost overrun.

High-level languages (HLLs) are attractive to PDSP programmers because they hide hardware-dependent details and simplify the task of programming. Unlike assembly code, HLL programs are readable and maintainable and are more likely to be portable to other processors. In the case of an object-oriented HLL, such as C++, those programs are also more reliable and reusable. All these features contribute to reducing development time and cost.

Figure 1 depicts an example of a typical software development flow for PDSPs: the TMS320C6x software development flowchart. There are three possible source programs: C source files, macro source files, and linear assembler source files. The latter two are both at the assembly program level.
The assembly optimizer assigns registers and uses loop optimization to turn the linear assembly into a highly parallel assembly that takes advantage of software pipelining. The assembler translates assembly language source files into machine language object files. The machine language is based on the common object file format (COFF). Finally, the linker combines object files into a single executable object module. As it creates the executable module, it performs relocation and resolves external references. The linker also accepts relocatable COFF object files and object libraries as input.
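The relocation and external-reference resolution performed by the linker can be pictured with a toy two-pass model. This is only an illustrative sketch in Python: the `ObjectFile` structure, the `link` function, the file names, and the symbol names are all hypothetical, and the model is far simpler than real COFF files or the TI linker.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Toy stand-in for a relocatable object file (not real COFF)."""
    name: str
    code: list       # one integer word per instruction or data slot
    defined: dict    # symbol name -> offset inside this object
    relocs: list = field(default_factory=list)  # (offset, symbol) slots to patch


def link(objects, base=0):
    """Combine objects into one image, relocating and resolving symbols."""
    # Pass 1: place objects back to back and record final symbol addresses.
    symtab = {}
    addr = base
    for obj in objects:
        for sym, off in obj.defined.items():
            if sym in symtab:
                raise ValueError(f"duplicate symbol: {sym}")
            symtab[sym] = addr + off
        addr += len(obj.code)

    # Pass 2: copy each object's words and patch every relocation slot.
    image = []
    for obj in objects:
        words = list(obj.code)
        for off, sym in obj.relocs:
            if sym not in symtab:
                raise ValueError(f"unresolved external reference: {sym}")
            words[off] = symtab[sym]
        image.extend(words)
    return image, symtab


# Hypothetical example: main.obj refers to _fir, which fir.obj defines.
main_obj = ObjectFile("main.obj", code=[10, 0, 11],
                      defined={"_main": 0}, relocs=[(1, "_fir")])
fir_obj = ObjectFile("fir.obj", code=[20, 21], defined={"_fir": 0})
image, symtab = link([main_obj, fir_obj], base=0x100)
print(image, symtab)
```

The first pass corresponds to relocation (each object receives a final base address), and the second pass corresponds to resolving external references (the placeholder word in `main.obj` is overwritten with the final address of `_fir`).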
Figure 1 TMS320C6x software development flow. (From Texas Instruments, 1998a.)

To improve the quality of the code generated, C compilers are always equipped with extensive optimization options. Many of these compiler optimization strategies are based on GNU C Compiler (GCC) (see Table 10).

The debugger can usually serve as both simulator and profiler, as does the C source debugger (Texas Instruments, 1998b). The C source debugger is an advanced graphical user interface (GUI) for developing, testing, and refining 'C6x C programs and assembly language programs. In addition, the 'C6x debugger accepts executable COFF files as input. It features the following capabilities, which are common in other PDSP development environments:
• Multilevel debugging; the user can debug both C and assembly language code.
• Dynamic profiling, which provides a method for collecting execution statistics and immediate feedback to identify performance bottlenecks within the code.
• Fully configurable graphical user interface.
• Comprehensive data displays.

5.2 On-Chip Emulation

The presence of the Joint Test Action Group (JTAG) test access module and the enhanced on-chip emulation (EOnCE) module interface allows the user to insert the PDSP into a target system while retaining debug control. The EOnCE module, as shown in Figure 2, is used in PDSP devices to debug application software in real time. It is a separate on-chip block that allows nonintrusive interaction with the core. The user can examine the contents of registers, memory, or on-chip peripherals through the JTAG pins. Special circuits and dedicated pins on the core are defined to avoid sacrificing user-accessible on-chip resources.

As applications grow in both size and complexity, the EOnCE provides the user with many features, including the following:

• Breakpoints on data bus values
• Detection of events, which can cause a number of different activities configured by the user
• Nondestructive access to the core and its peripherals
• Various means of profiling
• Program tracing buffer

Table 10 Compiler Optimization Options in DSP16000 Series

Option  Optimization performed                              Targeted application
-O0     Default operation, no optimization                  C level debug to verify functional correctness
-O1     Optimize for space                                  Optimize space for control code
-O2     Optimize for space and speed                        Optimize space and speed for control code
-O      Equivalent to -O2                                   Equivalent to -O2
-O3     -O2 plus loop cache support, some loop unrolling    Optimize speed for control and loop code
-O4     Aggressive optimization with software pipeline      Optimize speed and space for control and loop code
-Os     Optimize for space                                  Optimize space for control and loop code

Source: Lucent, 1999.

Figure 2 Typical debugging system using EOnCE. (From StarCore, 1999.)

The EOnCE module provides system-level debugging for real-time systems, with the ability to keep a running log and trace of the execution of tasks and interrupts and to debug the operation of real-time operating systems (RTOS).

5.3 Optimizing Compiler and Code Generation for PDSP

PDSP architecture has been evolving from ad hoc heterogeneous resources toward homogeneous resources like those of the general-purpose RISC microprocessor. One of the reasons is to make compiler optimization techniques less difficult. Classical PDSP architecture is characterized by the following:

• A small number of nonuniform register sets in which certain registers and memory blocks are specialized for specific usage
• Highly irregular datapaths to improve code performance and reduce code size
• Very specialized functional units
• Restricted connectivity and limited addressing to multipartitioned memory
Several techniques have been proposed based on a simplified architecture of the TMS320C2X/5X [e.g., instruction selection and instruction scheduling (Yu and Hu, 1994), and register allocation and instruction scheduling].

The success of RISC and its derivatives, such as the superscalar architecture and the VLIW architecture, has exerted a significant impact on the evolution of modern PDSP architecture. However, code density is a central concern in developing embedded DSP systems. This concern leads to the development of new strategies such as code compaction (Carmel, 1999) and user-defined long instruction word (Kievits et al., 1998). With a smaller code space in mind, code compaction using integer programming (Leupers and Marwedel, 1995) was proposed for applications to PDSPs that offer instruction-level parallelism, such as VLIW. Later, an integer programming problem formulation for simultaneous instruction selection, compaction, and register allocation was investigated by Geboyts (1997).

It can be seen that earlier optimization techniques were focused on the optimization of basic code blocks. Thus, they can be considered a local optimization approach. Recently, the focus has shifted to global optimization issues such as loop unrolling and software pipelining. In Stotzer and Leiss (1999), results of implementing software pipelining with a modulo scheduling algorithm on the 'C6x VLIW DSP have been reported.

Artificial intelligence (AI) techniques such as planning were employed to optimize instruction selection and scheduling (Yu and Hu, 1994). With AI, concurrent instruction selection and scheduling yield code comparable to handwritten assembly code by DSP experts. The instruction scheduler is a heuristic list-based scheduler. Both instruction scheduling and selection involve node coverage by pattern matching and node evaluation by heuristic search using means-end analysis and hierarchical planning.
The efficiency is measured in terms of the size and execution time of the generated assembly code, whose size is up to 3.8 times smaller than that of a commercial compiler. Simultaneous instruction scheduling and register allocation for minimum cost based on a branch-and-bound algorithm are reported. The framework can be generalized to accumulator-based machines, such as the TMS320C40, to optimize accumulator spilling. These techniques are likely intended to obtain more compact code.

Recently, in Leupers and Marwedel (1995) and Geboyts (1997), integer linear programming is shown to be effective in compiler optimization. The task of local code compaction in VLIW architecture is solved under a set of linear constraints such as a sequence of register transfers and a maximum time budget. Because some DSP algorithms show more data flow and less control flow behavior (Leupers and Marwedel, 1995), code compaction schedules parallel register transfers into a single control step, resulting in a lower cycle count to satisfy the timing constraint. Of course, it is important to consider resource conflicts and dependencies, as well as the possible side effects of encoding restrictions and operations. Later effort has integrated instruction selection, compaction, and register allocation together (Geboyts, 1997). The targeted PDSP is described using arc mappings, which are then converted into logical propositions to make the approach retargetable. These propositions are then translated into mathematical constraints to form the optimization model using integer linear programming. Code is generated and optimized for minimum code size, maximum performance, and low estimated energy dissipation.

In modern VLIW PDSPs, the architecture features homogeneous functional and storage resources, enabling global optimization by the compiler. Software pipelining is known as one of the most popular code scheduling techniques. It exploits the available instruction-level parallelism across loop iterations. A software-pipelined loop consists of three components:

• A prolog, which sets up the loop initialization
• A kernel, which executes the pipelined loop body in steady state
• An epilog, which drains the execution of the loop kernel

Modulo scheduling takes an innermost loop body and constructs a new schedule. The new schedule is equivalent to overlapping loop iterations. The algorithm utilizes a data precedence graph (DPG) and a reservation table to construct a permissible schedule of the loop body under the available resource constraints. The DPG is a directed graph (possibly cyclic) whose nodes and edges represent the operations and data flow dependencies of the original inner loop body. The resource requirements for an operation are modeled using a reservation table. Stotzer and Leiss (1999) report the result of software pipelining on a set of 40 loop kernels based on the 'C6x architecture. However, the architectural features that affect the performance gain of software pipelining are the moderately sized register file, constraints on code size, and multiple-assignment code.

6 DSP SYSTEM DESIGN METHODOLOGIES

Designing modern DSP systems requires more than just programming PDSP or processing cores.
Instead, the performance of the overall system must be the foremost criterion. DSP system design methodologies have been developed at different levels of abstraction. At the system level, the design scope includes task and data partitioning and software synthesis/simulation. At the architectural level, the focus is on architecture and compiler development. At the chip implementation level, hardware description languages such as VHDL and hardware/software codesign methodologies are quite important.

6.1 Application Development with Existing Hardware/Processing Core

A software engineering approach is incorporated to assist in the development of an application using a DSP array processor at Raytheon System Corporation (Kelly and Oshana, 1998). Three performance measurements are used to gauge the quality of the design:

• Processor throughput rate
• Memory utilization
• I/O bandwidth utilization

A sensitivity analysis of these performance metrics is performed to examine the trade-offs of various design approaches. Factors that affect the processor throughput rate include the quality of the DSP algorithm formulation, the operation cost in processor cycles, the ratio of sustained to peak throughput (efficiency), and the expected speedup when the design is upgraded to the next generation of PDSP. Regarding memory utilization, it has been observed that the size of the data samples and the dynamic nature of memory usage patterns are the two most important factors. The I/O bandwidth utilization, on the other hand, depends on the algorithm as well as the hardware design. The design tools used to develop the entire system, and the factors associated with each tool that may degrade performance during the design process, are listed in Table 11.

Rate monotonic analysis (RMA) (Liu and Layland, 1973) is necessary to validate the schedulability of the software architecture. In general, the following lessons have been learned through this design experience:

• Prototype early in the development cycle.
• Ignore processor marketing information (actual throughput is highly dependent on the application profile).
• Carefully analyze the most frequently executed function: task switching.
• Take inherent interface overheads such as interrupt handling, data packing, and data unpacking into account in estimating the throughput.
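The schedulability check cited above can be illustrated with the classic utilization bound of Liu and Layland (1973). The sketch below is not taken from the Raytheon project: the task set and the function names `rm_utilization_bound` and `rm_check` are hypothetical, and it assumes independent periodic tasks whose deadlines equal their periods. The bound is sufficient but not necessary, so a task set that fails it while its total utilization stays at or below 1 is reported as inconclusive rather than unschedulable.

```python
def rm_utilization_bound(n):
    """Liu-Layland least upper bound n * (2**(1/n) - 1) for n periodic tasks."""
    if n < 1:
        raise ValueError("need at least one task")
    return n * (2.0 ** (1.0 / n) - 1.0)


def rm_check(tasks):
    """Classify a task set given as (execution_time, period) pairs.

    Returns "schedulable" if total utilization is within the bound,
    "overloaded" if it exceeds 1.0, and "inconclusive" otherwise.
    """
    utilization = sum(c / t for c, t in tasks)
    if utilization <= rm_utilization_bound(len(tasks)):
        return "schedulable"
    if utilization > 1.0:
        return "overloaded"
    return "inconclusive"


# Hypothetical task set: (worst-case execution time, period) in milliseconds.
tasks = [(1.0, 4.0), (1.0, 5.0), (2.0, 10.0)]
print(rm_check(tasks))
```

For the inconclusive case, an exact method such as response-time analysis would be needed; it is omitted here to keep the sketch short.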
Table 11 DSP System Development Tools and Factors That May Degrade Performance

Tools                                     Factors that may degrade performance
Code generation                           Compiler efficiency
                                          Quality of generated assembly code
                                          Size of load image
Instruction level processor simulator     Cycle counts for elementary operation
Cycle-accurate device level VHDL model    External memory access time
                                          Instruction caching effects
                                          Resource contention between processor and DMA channels

Another example of DSP system development is the Computer Assisted Dynamic Data Monitoring and Analysis System (CADDMAS) developed for the U.S. Air Force and NASA (Sztipanovitz et al., 1998). Its details have been described in Section 2.3. An adaptive approach was necessary to allow the structure of the system to adapt to changing external requirements and sensor availabilities. This led to the application of a reconfigurable controller for process control, an approach called structurally adaptive signal processing. Unlike parametric adaptation, where the topology of the graph is fixed and coefficients can change over time, a structurally adaptive signal processing system can change its computational structure on the fly. Therefore, its control functionality can be maintained even in the face of sensor failures; performance degrades gracefully, but correct control action is preserved.

6.2 Application-Driven Design: Fine-Tuning the Processing Core

Two reasons contribute to the poor performance of commercial HLL compilers for PDSPs: first, compilers are developed after a target architecture has been established, and second, DSP compilers are unable to exploit DSP-specific architectural features (Lee, 1994). The following application-driven design methodologies are adopted:

• A DSP architecture and its compiler are developed in parallel.
• Dynamic statistics are used to assess the impact of trade-offs on performance.
• An iterative analysis is undertaken to fine-tune the architecture and compiler.

The PDSP architecture is based on VLIW. As a result, an optimizing C compiler is necessary to exploit static instruction-level parallelism as well as DSP-specific hardware features. Those hardware features are modulo addressing, low-overhead looping, and dual data-memory banks. Meanwhile, an instruction set simulator is developed to gather statistics on the run-time behavior of DSP programs. A suite of DSP benchmarks, comprising kernels and applications, is chosen to evaluate the system. The performance success of the compiler is due to the flexibility of the model VLIW architecture.
The statistics indicate the areas of improvement to be fed back for fine-tuning the architecture. However, a drawback of the VLIW architecture is its high instruction-memory bandwidth requirement, which can be too expensive and impractical to implement.

As another means of DSP architecture development, machine description language (MDL) has been proposed to achieve rapid prototyping at the architectural level. Recently, LISA (Peesl et al., 1999) was developed for the generation of bit- and cycle-accurate models of a PDSP. Its models include the instruction set architecture, which enables automatic generation of simulators and assemblers. LISA is composed of resource and operation declarations. Resource declarations represent the storage objects of the hardware architecture (e.g., registers, memories, pipelines).
Operation declarations collect the descriptions of different properties of the system (i.e., the instruction set model, the behavioral model, the timing model, and necessary declarations). LISA supports cycle-accurate processor models, including constructs to specify pipelines and their mechanisms. It targets SIMD, VLIW, and superscalar architectures. LISA also offers direct support for compiled simulation techniques and is strongly oriented toward the C programming language. As a real-world example, the Texas Instruments TMS320C6201 DSP was modeled on a cycle-by-cycle basis by a single designer within 2 months.

6.3 Reconfigurable Computing: Hardware/Software Codesign for a Given Application

In the system design process, the decision has traditionally been made on a subtask-by-subtask basis whether to implement each subtask in custom hardware or in software running on PDSP(s). On one hand, custom hardware or an ASIC can be customized to a particular subtask, resulting in a relatively fast and efficient implementation. An ASIC is physically programmed by patterning devices (transistors) and metal interconnection prior to the fabrication process. Higher throughput and lower latency can be achieved with more space dedicated to particular functional units. On the other hand, a PDSP is programmed later by software, resulting in a flexible but relatively slow and inefficient realization. Temporal or sequential operations can be accomplished by a set of instructions that program a processor after its fabrication.

Between these two extremes, a reconfigurable computing (RC) architecture can be programmed to perform any specific function by a set of configuration bits. In other words, RC combines the postfabrication (temporal) programmability of a processor with the spatial computation of hardware, at low overhead. The hardware/software boundaries can be altered by the RC paradigm (DeHon and Wawrzynek, 1999).
Reconfigurable computing is also motivated by the 90/10 rule of thumb, where 90% of run time is spent on 10% of the program. Hardware/software partitioning is guided by the share of run time taken by a specialized computation: the larger that share, the greater the improvement in cost/performance if the computation is implemented in hardware; and the more the specialized computation dominates the application, the more closely the specialized processor should be coupled with the host processor. This rule of thumb has been successfully applied to the floating-point processing unit as well as RC.

In the heterogeneous system approach, RC is combined with the general-purpose processing capability of the traditional microprocessor. The interface between these two can be either closely or loosely coupled, depending on the application. How frequently RC's functionality should be reconfigured dynami-
  • 66. "Lonesome! the idea," said Rose, turning to her mother. "Think of all the studying March and I have to do, and the French evenings, and the Fords, and Thanksgiving coming, and then Christmas, and then-- "Then," said Mrs. Blossom, interrupting her, "my Rose takes a little plunge into that whirlpool of gay life and fashion in New York." "Yes," said Rose, with a happy smile that spoke volumes to her mother, "I do look forward to it, Martie dear; but the whirlpool shan't suck me under; I shall come home just your old-fashioned Rose- pose." "I hope so, dear," said her mother, a little wistfully, and called the children in to supper. Indeed, they found little opportunity to miss their friends in the ensuing months; for there came kindly letters, and friendly letters, and something very nearly resembling love-letters. The mail brought papers, books, and magazines. The express brought to Barton's River many a box of lovely flowers. At Christmas came more than one remembrance for them all, including Aunt Tryphosa and Maria- Ann, and four special invitations for Rose to visit in New York directly after the holidays. One was from Mr. Clyde--with an urgent request from Hazel to say "yes" by telegram and "relieve her misery," so she put it--; one from Mrs. Heath; one from Aunt Carrie, and a gushingly cordial one from Mrs. Fenlick! Each claimed her for a month. But Mrs. Blossom shook her head. "No, no, dear, you would wear your welcome out. I shall need you at home by the last of February. I think you can accept only Mr. Clyde's and Mrs. Heath's. You can accept social courtesies from the other four of course."
  • 67. "But, mother," Rose's face was the image of despair, "what shall I wear? Just hear what Hazel has planned--'lunches, dinners, theatre, concerts'--why! I can never go to all those things." "I 've thought of that, too, Rose; but the little colt shan't go bare this time--it will take some courage, dear, to wear the same things over and over again, not to mention the puzzle of planning for it all." "I 'm not 'Molly Stark' for nothing," laughed Rose, and the two women began to plan for what Chi called "Rose's campaign." The pretty white serge was lengthened and made over to appear more grown up, as Cherry put it; the dark blue wash silk--Hazel's gift that had never been made up--was fashioned into a "swell affair"--so March pronounced it; the old-fashioned blue lawn was cut over into a dainty full waist, and then Mrs. Blossom added her surprise--a delicate blue taffeta skirt to match the waist. Rose went into raptures over it, and sought the best bedroom regularly three times a day to feast her girl's eyes on the silken loveliness as it lay in state on the best bed. A new dark blue serge was to do duty for a street suit, with a plain felt hat. For best, there was a turban made of dark blue velvet to match the wash silk. "And four pairs of gloves! Martie Blossom, you are an angel, to give me these that Hazel gave you a year ago last Christmas. Have you been keeping them for me all this time?" Mrs. Blossom smiled assent, and was rewarded by a squeeze that interfered decidedly with her breathing apparatus. The night before she left, Rose "costumed" for the benefit of the entire family, who were assembled in the long-room, together with Aunt Tryphosa and Maria-Ann, to see Rose in her finery.
  • 68. "I 'll make it a climax," said Rose, laughing half-shamefacedly, as she slipped upstairs to change her street suit, which had brought forth admiring "Ohs" and "Ahs" from the children, and favorable criticism from their elders. Down she came in her white serge; there were nods and smiles of approval. Her reappearance in the wash silk and velvet turban was the signal, on March's part, for a burst of applause, and cries of admiration from Budd and Cherry. "Grand transformation scene!" cried March, as Rose tripped down in the blue taffeta, looking like a very rose herself. "Beats all!" murmured Chi, who had become nearly speechless with admiration, "what clothes 'll do for a good-lookin' woman; but for a ravin', tearin' beauty like our Rose--George Washin'ton! She 'll open those high-flyers' eyes." "Cinderella--fifth act!" shouted March as, after a prolonged wait, he heard Rose on the stairs. But was it Rose? The beautiful India mull of her mother's had been transformed into a ball-dress. She had drawn on her long white gloves and tucked into the simple, ribbon belt three of Jack's Christmas roses. Maria-Ann gasped, and that broke the, to Rose, somewhat embarrassing silence. Marshalled by March, the whole family formed a procession, and Rose was reviewed:--back breadths, front breadths, flounces, waist, gloves; all were thoroughly inspected. Chi touched the lower flounce of the half-train gingerly with one work-roughened forefinger, then, straightening himself suddenly,
  • 69. sighed heavily. "What's the matter, Chi?" Rose laughed at the dubious expression on his face. "You ain't Rose Blossom nor Molly Stark any longer. You 're just a regular Empress of Rooshy, 'n' you don't look like that girl I took along to sell berries down to Barton's last summer, 'n' I wish you--" he hesitated. "What, Chi?" said Rose. "I wish you was back again, old sunbonnet, old calico gown, patched shoes 'n' all--" "Oh, Chi, no, you don't," said Rose, laughing merrily; "you forget, I shall probably see Miss Seaton down there in New York, and you wouldn't want me to appear a second time before her in that old rig." "You 're right, Rose-pose," replied Chi, his expression brightening visibly. He drew close to her and whispered audibly: "Just sail right in, Molly Stark, 'n' cut that sassy girl out right 'n' left. She never could hold a candle to you." "Sh-sh, Chi!" said Mrs. Blossom, meaningly, but with a twinkle in her eye. "I mean just what I say, Mis' Blossom. Folks can't come up here on this Mountain to sass us to our faces, 'n' she did;--I've stayed riled ever since, 'n' I hope she'll get sassed back in a way that 'll make her hair stand just a little more on end than it did, when she gave that mean, snickerin' giggle--" "Chi, Chi," Mrs. Blossom interrupted him in an appeasing tone. "You need n't Chi me, Mis' Blossom. These children are just as near to me as if they was my own, 'n' when they 're sassed, I 'm
  • 70. sassed too; 'n' my great-grandfather fought over at Ticonderogy, 'n' I ain't bound to take any more sass than he took--" By this time the whole family were in fits of laughter over Chi's persistent use of so much "sass," and, at last, Chi himself joined in the laugh at his excessive heat:-- "Over nothin' but a wind-bag, after all," he concluded. On the following morning, Mr. Blossom, Chi, March and Budd drove down to Barton's to see Rose off. The old apple-green pung had been fitted with two broad boards for seats, and covered with buffalo robes and horse blankets. There was just room in the tail for Rose's old-fashioned trunk and a small strapped box, which held two dozen of new-laid eggs, six small, round cheeses, and a wreath of ground hemlock and bitter-sweet--a neighborly gift from Aunt Tryphosa and Maria-Ann to Hazel and Mr. Clyde. As the train moved away from the station, Chi watched it with brimming eyes. "She'll never come back the same Rose-pose, livin' among all those high-flyers--never," he muttered to himself; but aloud he remarked, with forced cheerfulness, turning to Mr. Blossom while he dashed the blinding drops from his eyes with the back of his hand: "Looks mighty like a thaw, Ben; kind of wets down, don't it?" "Yes, Chi," said Mr. Blossom, busy with conquering his own heartache, "we 'd better be getting on home;" and the masculine contingent of the Blossom household climbed into the pung and took their way homeward in silence. But what a reception that was for the transplanted Rose! Mr. Clyde met her at the Grand Central Station, and Rose felt how welcome she was just by the hand-clasp, and his first words:
  • 71. "We have you at last, Rose; I would n't let Hazel come because I thought the train might be late, and there's a cold rain falling. Martin, take this box--" "Oh, no; I must carry that myself," laughed Rose, looking up at the liveried footman with something like awe. "I promised Aunt Tryphosa and Maria-Ann I would n't let any one take them till they were safe in the house; thank you," she bowed courteously to Martin, who confided to the coachman so soon as they were on the box: "Hi 'ave n't seen nothink so 'ansome since Hi 've bean in the States." As the brougham whirled into the Avenue, and the electric lights shone full into the carriage, Rose could see the luxuriously upholstered interior, and a sudden thought of the old apple-green pung and the buffalo robes dimmed her eyes. But it was only for a moment; Mr. Clyde was telling her of Hazel's impatience, and how the coachman had had special orders from her to hurry up so soon as he should be on the Avenue, and he had hardly finished before the coachman drew rein, slackening his rapid pace as he turned a corner, Martin was opening the door, and Hazel's voice was calling from a wide house entrance flooded with soft light: "Oh, Rose, my Rose! Is it really you, at last?" "And this, I am sure, is Wilkins," said Rose, when finally Hazel set her arms free. "We 've heard so much of you, that I feel as if I had known you a long time." Rose held out her hand with such sincere cordiality that Wilkins' speech was suddenly reduced to pantomime, and he could only extend his other hand rather helplessly towards the box that Rose still carried. But Rose refused to yield it up.
  • 72. "Here, Hazel, I promised Maria-Ann and Aunt Tryphosa I would n't give it into any hands but yours. Oh! be careful--they 're eggs!" "Eggs!" repeated Hazel, laughing. "Here, Wilkins, unstrap it for me, quick--Oh, papa, look!" She held out the box to Mr. Clyde, and, somehow, John Curtis Clyde for a moment thought with Chi, that there was going to be a "thaw." Each egg was rolled in white cotton batting and wrapped in pink tissue paper. The six little cheeses were enclosed in tin-foil, and cheeses and eggs were embedded in the Christmas wreath. On a piece of pasteboard was written in unsteady characters: To Mr. John Curtis Clyde of New York City, with the season's compliments. MOUNT HUNGER, VERMONT, January 6th, 1898. "And you 've had such lovely flowers come for you, five boxes of them, Rose, and piles of invitations. I 'm sure you 're engaged up to Ash Wednesday." "Come, Chatterbox," said her father, smiling at her volubility, "Rose has just time to dress for dinner; you know Aunt Carrie and Uncle Jo are coming to-night." "Oh, I forgot all about them; you 'll have to hurry, Rose. Wilkins, bring up the flowers. Come on," Hazel ran up the broad flight of stairs, carpeted with velvety crimson, to the first landing, from which, through a lofty arch in the hall, Rose caught a glimpse of softly lighted rooms, the walls enriched with engravings and etchings, with here and there a landscape or marine in watercolors.
  • 73. Rose drew a long breath. This, then, was what Chi meant when he said "Hazel was rich as Croesus." "But, Hazel, my trunk has n't come," said Rose, as she followed her hostess into the spacious bedroom, which was separated from Hazel's only by a dressing-room. "It 'll be here in a few minutes; papa has a special man, who always delivers them almost as soon as we get here." Sure enough, the trunk came in time; and Rose, as she unpacked, finding evidences of the loving mother-care in every fold, cried within her heart, looking about at the exquisite appointments of her room and dressing-room: "Martie, Martie, what would all this be without you!--Oh, I know now, what dear old Chi meant when he said Hazel was poor where we are rich--only a housekeeper to see to all Hazel's things--" "Rose, what flowers are you going to wear?" called Hazel from her room. "I have n't had time to look," Rose called back, surveying her white serge with great satisfaction in the pier-glass. "Do look, then, and see who they 're from." "Oh, Hazel, do come and see. How kind everybody has been! Here are cards from Mrs. Heath and Doctor Heath, and your Aunt Carrie, and Mr. Sherrill, and Mrs. Fenlick, and even that Mr. Grayson who was up at our house to tea a year ago!" "They are lovely. Whose are you going to wear?" "I 'll make up a bunch of one or two from each, that will show my appreciation of all their favors." Hazel looked slightly crestfallen. "I hoped you 'd wear Jack's-- they 're the loveliest with white--" she lifted the white lilacs--"and
  • 74. they 're so rare just now. I heard Aunt Carrie say that one of the girls had put off her wedding for six weeks, just because she couldn't have white lilacs for it." "They 'll last with care three days surely, and I can wear them to-morrow evening," replied Rose, bending to inhale their delicate fragrance. "So you can, for papa is going to give a dinner for you to- morrow night, and afterwards, he has promised to take you to a dance at Mrs. Pearsell's. I can't go, you know, for I 'm not grown up; but you can tell me all about it. We 're going to have lots of fun this week, for school does not begin for several days. Come." Together they went down to the drawing-room, and Wilkins announced that dinner was served. After it was over he sought Minna-Lu in her own domains, and gave vent to his long pent emotions. "Minna-Lu," he whispered, mysteriously, "dere 's an out an' out angel ben hubberin' 'bout de table--" "Fo' de Lawd!" Minna-Lu turned upon him fiercely, for she was superstitious to the very marrow. "Wa' fo' yo' come hyar, skeerin' de bref out a mah bones wif yo' sp'r'ts! Yo' go long home wha' yo' b'long." But Wilkins was not to be repulsed in this manner. "Nebber see sech ha'r, an' jes' lillum-white--" "Oh, go 'long! Lillum-white ha'r," interrupted Minna-Lu, with scathing sarcasm. "Huccome yo' know de angels hab lillum-white ha'r?" "Huccome I know?--'Case I see de shine, jes' lake yo' see in de dror'n-room."
  • 75. "De shine ob lillum-white ha'r in de dror'n-room! 'Pears lake yo' head struck ile--" "Yo' hol' yo' tongue, Minna-Lu," retorted Wilkins, irritated at the continued evidence of disbelief on the part of his coadjutor. "Jes' yo' hide back ob de dumb-waitah to-morrah ebenin' when de dessert comes on, an' see fo' yo'se'f!" He departed in high dudgeon, and Minna-Lu gurgled long and low to herself, but, in her turn, was interrupted by the sound of tripping steps on the basement flight. Minna-Lu hastily put her fat hands up to her turban to see if it were on straight, and smoothed her apron, muttering: "Clar to goodness, ef it ain't jes' mah luck to hab little Missus come into dis yere hen-roost?" she rapidly surveyed her immaculate kitchen with anxious eye. "Minna-Lu, this is my friend, Miss Rose; the one who did up those lovely preserves, and here are some new-laid eggs and some cheeses that Miss Maria-Ann Simmons--you know I told you all about her and the hens--has sent papa." Minna-Lu gazed at Rose in open admiration. The faithful colored retainer had her thorny side and her blossom one. Rose put out her hand, and Minna-Lu took it in both hers. "I 'se mighty glad yo' come, Miss Rose, dere ain't no strawberry-blossom nor no rose-blossom can hol' a can'le to yo' own honey se'f. Dese yere cheeses is prime." She examined one with the nose of a connoisseur. "Jes' fill de bill wif de salad-chips to-morrah." She stemmed her fists on her hips, and her mellow, contented gurgle caused Rose and Hazel to laugh, too. "What is it, Minna-Lu?" said Hazel, reading the signs of the times.
  • 76. "Dat Wilkins done tol' me to git back ob de dumb-waitah, to-morrah ebenin' to see Missy Rose, but I 'se gwine to ask rale straight to jes' see her 'fo' de comp'ny come." "Of course you may. Come up to my room about seven, and we 'll be ready." "Fo' sho'," said Minna-Lu, with beaming face. "Good-night," said Rose, beaming, too, for she found the black faces and ways irresistibly amusing. "De Lawd bress yo' lily face, Missy Rose." When the two girls were alone, at last, in Hazel's room, there was no thought of bed for an hour. There were numberless questions on Hazel's part concerning all the dear Mount Hunger people, and speechless astonishment on Rose's at the number of invitations that were waiting for her. They chatted all the time they were undressing, calling back and forth to each other as one thing or another suggested itself. Finally, Hazel made her appearance in Rose's room. She went up to her, put her arms about her neck, and, looking up with eyes full of loving trust, said: "Rose-pose, won't you come into my room and say 'Our Father' with me as Mother Blossom used to do on Mount Hunger? You can't think how I miss it." "Why, Hazel darling, of course I will--then I shan't feel homesick missing that precious Martie." She followed Hazel into her room, and after she was in bed, Rose knelt by her side, and together they said, "Our Father." Then Rose bent over to receive Hazel's loving kiss and her whispered, "Oh, Rose, I 'm so happy to have you here," and whispered back, "And I 'm so happy to be with you, Hazel--good-night."
  • 77. "Good-night." Rose went back to her room. At last she was alone. She drew one of the easy-chairs up before the wood-fire that was dying down, put her bare feet on the warm fender, and, for a while, dreamed waking dreams. It was all so strange. The cathedral clock on the mantel chimed twelve. They were all asleep in the farmhouse on the Mountain--it was time for her to be. She rose, tiptoed softly into the dressing-room, took from the bowl the spray of white lilacs she had worn with the other flowers that evening, shook off the water, and drew the stem through a buttonhole in the yoke of her simple night- dress. She tiptoed back again into her room, looked up at the dainty, canopied bed, then laid herself down within it, and, almost immediately, fell asleep--with her hand resting on the white fragrance that lay upon her heart. XXIII BEHOLD HOW GREAT A MATTER A LITTLE FIRE KINDLETH It was so delightful! The weeks were passing all too quickly, and the letters to Mount Hunger waxed eloquent in praise of everybody's kindness. Jack had come on to lead a cotillion with Rose at Aunt Carrie's. It was a weighty affair--the selecting of the flowers for her. White violets they must be, and white violets were about as rare as white raspberries. Jack gave the florist his own address.
  • 78. "I 'll see them, myself, before I send them up; for I won't trust anyone's eyes but my own," he said to himself as he hurried home to dress for dinner with a friend. "I wish I had n't promised Grayson to meet him at the Club before seven. I 'm afraid they won't come in time." He looked at his watch. "I 'm going to make them a test--and see what she 'll do. She 's so friendly and frank and all that, I can't find out even whether she 's beginning to care." Jack's absorption in the theme was such that he put his latch- key in wrong-side up, and, in consequence, wrestled with the lock till he had worked himself into a fever of impatience; finally he touched the button before he discovered the trouble. "Any packages come for me, Jason?" he inquired of the butler, whose dignified manner of locomotion had been rudely shaken by Jack's unceasing pressure on the electric-bell. "Yes, Mr. John. Just taken a box up to the rooms." Jack looked relieved, and sprang upstairs two steps at a time. He opened the box. There they were in all their exquisite freshness. "Like her," he thought, touching his lips to them; then, suddenly straightening himself, he felt the blood surge into his face. "I like Dord's way of putting up his flowers, no tags, nor fol-de- rols. Jason," he said, as he ran down stairs again, "I shall be back in an hour; tell Thomas to have everything laid out--I 'm in a hurry. And have a messenger-boy here when I come back, and don't forget to order the carriage for quarter of eight, sharp." "Yes, Mr. John." "Messenger-boy come?" he inquired as Jason opened the door on his return. "Yes, sir, waiting in the hall."
  • 79. Jack raced up stairs. There was the precious box on his dressing-table. He hastily took a visiting card, and, writing on it the sentiment that was uppermost in his heart, slipped it into the envelope, gave it, together with the box, to the waiting boy, and bade him hand it to the man, Wilkins, with the request that it be sent up at once to the lady to whom it was addressed. Then he made ready for dinner. An hour later, Rose was dressing for the dance, and Hazel was watching her, chatting volubly all the while. "That's the loveliest dress, Rose, I heard Aunt Carrie say, you couldn't buy such, nowadays." "It was Martie's wedding-dress. An uncle of her mother's, who was a sea-captain, brought it from India. But if I wear it many more times, it will be known throughout the length of New York. This is my sixth time." "I should n't care if it were the hundredth; it's just lovely. Besides, Jack has n't seen it, you know." Rose laughed. "Oh, yes, he has--on Martie; that night of the tea on the porch." "Oh, well, that's different. What flowers are you going to wear?" "I thought I wouldn't wear any, just for a change." Rose's face was veiled by the shining hair, which she was brushing, preparatory to coiling it high on her head; otherwise, Hazel would have seen the clear flush that warmed even the roots of the soft waves at the nape of her neck. Just then there was a knock. The maid opened the door, and Wilkins' voice was distinctly audible:-- "Jes' come fo' Miss Rose; dey wuz to come up right smart, so de boy say."
  • 80. "Oh, more flowers. Who from?" cried Hazel, eagerly, while Wilkins strained his ears to catch the reply. "From Mr. Sherrill," said Rose, opening the little envelope. What she read on the card caused the blood to mount higher and higher, till temples and forehead flushed pink, then as suddenly to recede. "May I open them, Rose, and won't you wear some if they 're from Jack?" "Yes," said Rose, simply. The two girls leaned over the box as Hazel took off the wrapper--then the cover--then the inner tissue papers--then--
  • 81. [Illustration: "The two girls leaned over the box as Hazel took off the wrapper"] Suddenly a shriek of laughter, followed by another, penetrated to Wilkins, who was lingering on the stairs; he came softly back again. Peal after peal of wild merriment issued from Rose's room. Within, Rose in her petticoat and bodice had flung herself on the bed in an ecstasy of mirth, and Hazel was rolling over on the rug as was
  • 82. the wont of Budd and Cherry in the old days on Mount Hunger. The maid looked from one to the other, and, no longer able to keep from joining in the merriment, although she did not know the cause, left the room, only to find Wilkins with perturbed face just outside the door. "'Pears lake dere wor sumfin' queah 'bout dat yere box--" he began; but the maid only shook with laughter and laid her finger on her lips, motioning him into the back hall. "Did you ever?" cried Hazel, when she recovered her breath. "No, I never," said Rose, wiping away the tears, for she had laughed till she cried. "Let's take another look." They bent over the box, and took out its contents; then went off again into fits of seemingly inextinguishable laughter; for, neatly folded beneath the tissue paper, lay four sets of Jack's new light-weight, white silk pajamas, which he had purchased that afternoon, in order to take back to Cambridge with him. On the card, which Rose still held in her hand, was written, "Wear these for my sake." "What will you say to him, Rose?" said Hazel, sitting up on the rug with her hands clasped about her knees. "I don't know," said Rose, proceeding to dress. "I can't wear them, that's certain." And again the absurdity of the situation presented itself to her. "And I can't apologize for not wearing them. Neither can I take it for granted that he was going to send me flowers, and explain that he sent me these instead." "How awfully careless," said Hazel, interrupting her; "he must have had something on his mind not to take the pains to look, even."
  • 83. Rose flushed. "It will be best to let the matter drop, and say nothing about it," she replied in a cool, toploftical tone that amazed, as well as mystified, her little hostess. "Why, Rose, I think Jack ought to know about it. I 'll tell him, if you don't want to." "Thank you, Hazel, but I don't need your good offices in this matter." Hazel rose from the rug, and going over to Rose, laid both hands on her shoulders and looked straight up into her eyes. "Now, Rose Blossom, please don't speak to me in that way. You 're so queer! First you 're nice about Jack, and then you 're horrid; and when you 're that way, you are n't nice to me a bit--and I don't like it, and I don't blame Jack for not liking it either," she added emphatically. "I remember papa said a year ago that Jack was 'all heart' for a good many girls, old and young--but I can tell you what, he won't have any for you, if you whiff round so." Hazel in her earnestness gave Rose a little shake. Rose smiled, and, bending her head, kissed her, saying, "F. and F. and you know, Hazel." "Oh, I know all about 'forgiving and forgetting,' but I don't like it just the same. He's my cousin and the dearest fellow in the world, and I don't like to have him treated so." "How about his treating me?" said Rose, pointing to the innocent box of underwear, "forgetting even to look; or not caring enough, to see if I had the right package?" "Oh, that's different--perhaps the florist made a mistake." "The florist!" Rose laughed merrily. "I never knew that gentlemen's underwear and roses grew on the same bush.--There 's
  • 84. Wilkins, and I 'm not ready." "De coachman say it's a pow'ful col' night, an' Miss Rose bettah take some mo' wraps." "Thank you, Wilkins," Hazel flew into the dressing-room for a long fur cloak of her mother's which she had used to wear to the dancing-classes. She wrapped it about Rose, who stooped suddenly and kissed her again, whispering, "Hazel, you 've all spoiled me, that's what's the matter,--but I 'll be good to Jack, for your sake as well as for my own." "Now you 're what Doctor Heath calls papa, the most splendid fellow in the world. There now--I won't crush your gown--" A kiss-- "Good-night. You look like an angel!" Mr. Clyde thought so, too, as he watched her coming downstairs. She slipped off the cloak as she stood beneath the soft, but brilliant hall lights. "Do I look all right?" she asked earnestly, for she had fallen into the habit, before going anywhere with him or Hazel, of asking for their criticism. "I should say so--but where are the flowers? I miss them." "I thought I wouldn't wear any to-night, just for a change." "A woman's whim, Rose. But I can't say that you need them-- Now, what's to pay?" he said to himself, as he helped her into the carriage. "I saw Jack at Dord's this afternoon, and, evidently, something was in the wind. I hope it has n't been taken out of his sails." "Sumfin' mighty queah 'bout dat yere box," murmured Wilkins to himself, as he closed the door, "but Miss Rose doan' need no flow's. Nebber see sech h--Fo' de good Lawd! Wha' fo' yo' hyar? Yo'
  • 85. Minna-Lu,--skeerin' mah day-lights out o' mah, shoolin' 'roun' b'hin' dat por' chair,--jes' lake bug'lahs." Minna-Lu gurgled. "Yo' jes' straight, Wilkins; nebber see sech ha'r. Huccome I 'se hyar? Jes' to see dat lillum-white angel--" "Yo' go 'long, wha' yo' b'long," growled Wilkins, not yet having recovered from his fright. And Minna-Lu went, with the radiant vision still before her round, black eyes. Jack felt a queer tightening about his lower jaw, and one heart-throb, apparently in his throat, as he entered Aunt Carrie's reception-room. Then, as with one glance he swept Rose from the crown of her head to the hem of her dress, a hot, rushing wave of indignant feeling mastered him--he knew he had staked his all (so a man at twenty-two is apt to think) and lost. He braced himself, mentally and physically. He was n't going to show the white-feather--not he. But Rose--Rose was mystifying, captivating, cordial, merry, and altogether charming. She knocked out all Jack's calculations as to life, love, women, girls in general, and one girl in particular, at one fell swoop. He was brought, necessarily, into unstable equilibrium, so far as his feelings were concerned--his head he was obliged to keep level on account of the various figures. Several other heads were variously askew, and would have been turned, likewise, for good and all, had the wearer of her mother's India-mull wedding-dress been possessed of a fortune. Rose developed social powers that evening that furnished food for conversation for Aunt Carrie and Mr. Clyde, who watched her with pride and pleasure. She was evidently enjoying herself thoroughly, and her enjoyment proved contagious.
  • 86. "After all," said Jack as, between figures, he found opportunity for a whispered word or two; "this isn't half so fine a dance as the one in the barn, last September." "Why, that's just what I was thinking, myself, that very minute!" "You were?" "Yes." The brown eyes and the blue ones met with such evidence of a perfect understanding, that Jack failed to see Maude Seaton, who had approached him for the purpose of taking him out in the four-in-hand. "Oh, I beg your pardon," said Jack, starting to his feet, "it's the 'four-in-hand.'" "Yes, and I think you'll have to be put into the traces again," she said, with a meaning smile. "Not I," retorted Jack, merrily, "I kicked over them nearly a year ago." "So I heard," replied Miss Seaton, sweetly; and Jack wondered what she meant. When Jack found himself again beside Rose, he decided that, flowers or no flowers, he would ask for an explanation. But his first attempt was met with such a bewilderingly merry smile, and such confident assurance that explanations were not in order, that it proved a successful failure. When, at last, in the early morning hours he was seated before the open fire in his bedroom, pulling away reflectively at his pipe, he had time to think it over. He came to the conclusion that it was trivial in him to have staked his all on her wearing those flowers, for she