VLSI Design for Video Coding 2010th Edition Youn

Youn-Long Steve Lin • Chao-Yang Kao
Huang-Chih Kuo • Jian-Wen Chen
VLSI Design for
Video Coding
H.264/AVC Encoding from Standard
Specification to Chip

Prof. Youn-Long Steve Lin
National Tsing Hua University
Dept. Computer Science
101 Kuang Fu Road
HsinChu 300
Section 2
Taiwan R.O.C.
Chao-Yang Kao
101 Kuang Fu Road
HsinChu 300
Section 2
Taiwan R.O.C.
Huang-Chih Kuo
101 Kuang Fu Road
HsinChu 300
Section 2
Taiwan R.O.C.
Jian-Wen Chen
101 Kuang Fu Road
HsinChu 300
Section 2
Taiwan R.O.C.
ISBN 978-1-4419-0958-9 e-ISBN 978-1-4419-0959-6
DOI 10.1007/978-1-4419-0959-6
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2009943294
© Springer Science+Business Media, LLC 2010
All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York,
NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in
connection with any form of information storage and retrieval, electronic adaptation, computer software,
or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are
not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject
to proprietary rights.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Preface
A video signal is represented as a sequence of frames of pixels. There exists a vast
amount of redundant information that can be eliminated with video compression
technology so that its transmission and storage become more efficient. To facilitate
interoperability between compression at the video-producing source and
decompression at the consumption end, several generations of video coding standards
have been defined and adopted.
After MPEG-1 for VCD and MPEG-2 for DVD applications, H.264/AVC is
the latest and most advanced video coding standard defined by the international
standards organizations. Its high compression ratio comes at the expense of more
computationally intensive coding algorithms. For low-end applications, software
solutions are adequate. For high-end applications, dedicated hardware solutions are
needed.
This book describes an academic project of developing an application-specific
VLSI architecture for H.264/AVC video encoding. Each subfunction is analyzed
before a suitable parallel-processing architecture is designed. Integration of sub-
functional modules as well as the integration into a bus-based SOC platform is
presented. The whole encoder has been prototyped using an FPGA.
Intended readers are researchers, educators, and developers in video coding systems,
hardware accelerators for image/video processing, and high-level synthesis
of VLSI. In particular, those who are interested in state-of-the-art parallel
architectures for intra prediction, integer motion estimation, fractional motion
estimation, discrete cosine transform, context-adaptive binary arithmetic coding,
and the deblocking filter will find design ideas in this book.
HsinChu, Taiwan, ROC Youn-Long Lin
Chao-Yang Kao
Huang-Chih Kuo
Jian-Wen Chen

Acknowledgments
Cheng-Long Wu, Cheng-Ru Chang, Chun-Hsin Lee, Chun-Lin Chiu, Hao-Ting
Huang, Huan-Chun Tseng, Huan-Kai Peng, Hui-Ting Yang, Jhong-Wei Gu,
Kai-Hsiang Chang, Li-Cian Wu, Ping Chao, Po-Sheng Liu, Sheng-Tsung Hsu,
Sheng-Yu Shih, Shin-Chih Lee, Tzu-Jen Lo, Wei-Cheng Huang, Yu-Chien Kao,
Yuan-Chun Lin, and Yung-Hung Chan of the Theda.Design Group, National Tsing
Hua University contributed to the development of the H.264 Video Encoder System
described in this book.
The authors appreciate financial support from Taiwan’s National Science Council
under Contracts no. 95-2220-E-007-024, 96-2220-E-007-013, and 97-2220-E-007-003
and from the Ministry of Economic Affairs under Contracts no. 94-EC-17-A-01-S1-038,
95-EC-17-A-01-S1-038, and 96-EC-17-A-01-S1-038. Financial support from
Taiwan Semiconductor Manufacturing Company Limited (TSMC) and the Industrial
Technology Research Institute (ITRI) is also greatly appreciated.
Global Unichip Corp. provided us with its UMVP multimedia SOC platform and
consultation during the FPGA prototyping stage of the development. The authors are
grateful to Chi Mei Optoelectronics for a 52-in. Quad Full HD display panel. Joint
research with the Microprocessor Research Center (MPRC) of Peking University
has been an important milestone of this project.

Contents
1 Introduction to Video Coding and H.264/AVC
1.1 Introduction
1.1.1 Basic Coding Unit
1.1.2 Video Encoding Flow
1.1.3 Color Space Conversion
1.1.4 Prediction of a Macroblock
1.1.5 Intraframe Prediction
1.1.6 Interframe Prediction
1.1.7 Motion Vector
1.1.8 Prediction Error
1.1.9 Space-Domain to Frequency-Domain Transformation of Residual Error
1.1.10 Coefficient Quantization
1.1.11 Reconstruction
1.1.12 Motion Compensation
1.1.13 Deblocking Filtering
1.2 Book Organization
2 Intra Prediction
2.1 Introduction
2.1.1 Algorithm
2.1.2 Design Consideration
2.2 Related Works
2.2.1 Prediction Time Reduction Approaches
2.2.2 Hardware Area Reduction Approaches
2.3 A VLSI Design for Intra Prediction
2.3.1 Subtasks Scheduling
2.3.2 Architecture
2.3.3 Evaluation
2.4 Summary
3 Integer Motion Estimation
3.1 Introduction
3.1.1 Algorithms
3.1.2 Design Considerations
3.2 Related Works
3.2.1 Architecture
3.2.2 Data-Reuse Schemes
3.3 A VLSI Design for Integer Motion Estimation
3.3.1 Proposed Data-Reuse Scheme
3.3.2 Architecture
3.3.3 Data Flow
3.3.4 Evaluation
3.4 Summary
4 Fractional Motion Estimation
4.1 Introduction
4.1.1 Algorithms
4.2 Related Works
4.3 A VLSI Design for Fractional Motion Estimation
4.3.1 Proposed Architecture
4.3.2 Proposed Resource Sharing Method for SATD Generator
4.3.3 Evaluation
4.4 Summary
5 Motion Compensation
5.1 Introduction
5.1.1 Algorithms
5.2 Related Works
5.2.1 Memory Traffic Reduction
5.2.2 Interpolation Engine
5.3 A VLSI Design for Motion Compensation
5.3.1 Motion Vector Generator
5.3.2 Interpolator
5.3.3 Evaluation
5.4 Summary
6 Transform Coding
6.1 Introduction
6.1.1 Algorithms
6.1.2 Design Consideration
6.2 Related Works
6.2.1 Multitransform Engine Approaches
6.2.2 Trans/Quan or InvQuan/InvTrans Integration Approaches
6.3 A VLSI Design for Transform Coding
6.3.1 Subtasks Scheduling
6.3.2 Architecture
6.3.3 Evaluation
6.4 Summary
7 Deblocking Filter
7.1 Introduction
7.1.1 Deblocking Filter Algorithm
7.1.2 Subtasks Processing Order
7.1.3 Design Considerations
7.2 Related Works
7.3 A VLSI Design for Deblocking Filter
7.3.1 Subtasks Scheduling
7.3.2 Architecture
7.3.3 Evaluation
7.4 Summary
8 CABAC Encoder
8.1 Introduction
8.1.1 CABAC Encoder Algorithm
8.1.2 Subtasks Processing Order
8.1.3 Design Consideration
8.2 Related Works
8.3 A VLSI Design for CABAC Encoder
8.3.2 Architecture
8.3.3 Evaluation
8.4 Summary
9 System Integration
9.1 Introduction
9.1.1 Algorithm
9.1.2 Design Consideration
9.2 Related Works
9.3 A VLSI Design for H.264/AVC Encoder
9.3.2 Architecture
9.3.3 Evaluation
9.4 Summary
References
Index

Chapter 1
Introduction to Video Coding and H.264/AVC
Abstract A video signal is represented as a sequence of frames of pixels. There
exists a vast amount of redundant information that can be eliminated with video
compression technology so that transmission and storage become more efficient.
To facilitate interoperability between compression at the video-producing source
and decompression at the consumption end, several generations of video coding
standards have been defined and adopted. For low-end applications, software
solutions are adequate. For high-end applications, dedicated hardware solutions are
needed. This chapter gives an overview of the principles behind video coding in
general and the advanced features of the H.264/AVC standard in particular. It serves as
an introduction to the remaining chapters, each of which covers an important coding
tool of an H.264/AVC encoder and its VLSI architectural design.
1.1 Introduction
A video encoder takes a video sequence as its input, performs compression, and
then produces as its output a bit-stream that can be decoded back to a video
sequence by a standard-compliant video decoder.
A video signal is a sequence of frames. It has a frame rate defined as the number
of frames per second (fps). For typical consumer applications, 30 fps is adequate.
However, it could be as high as 60 or 72 for very high-end applications or as low as
10 or 15 for video conferencing over a low-bandwidth communication link.
A frame consists of a two-dimensional array of color pixels. Its size is called the
frame resolution. A standard definition (SD) frame has 720 × 480 pixels
whereas a full high definition (FullHD) one has 1,920 × 1,088. There is a large
number of frame size variations developed for various applications such as computer
monitors.
A color pixel is composed of three elementary components: R, G, and B. Each
component is digitized to an 8-bit value for consumer applications or a 12-bit one for
high-end applications.
Y.-L.S. Lin et al., VLSI Design for Video Coding: H.264/AVC Encoding from Standard
Specification to Chip, DOI 10.1007/978-1-4419-0959-6 1,
The data rate for a raw video signal is huge. For example, a 30-fps FullHD one
will have a data rate of 30 × 1,920 × 1,088 × 3 × 8 ≈ 1.5 Gbps, which is impractical
for today’s communication or storage infrastructure.
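The arithmetic behind this figure can be checked with a short sketch. The function name and parameter names are ours, chosen for illustration only.

```python
def raw_bit_rate(fps, width, height, components=3, bits_per_component=8):
    """Return the uncompressed video data rate in bits per second."""
    return fps * width * height * components * bits_per_component


# 30-fps FullHD (1,920 x 1,088), three 8-bit components per pixel
rate = raw_bit_rate(30, 1920, 1088)
print(f"{rate} bit/s = {rate / 1e9:.2f} Gbps")  # 1504051200 bit/s = 1.50 Gbps
```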
Fortunately, by taking advantage of the characteristics of the human visual system
and the redundancy in the video signal, we can compress the data by two orders of
magnitude without sacrificing the quality of the decompressed video.
1.1.1 Basic Coding Unit
In order for a video encoding or decoding system to handle video of different frame
rates and simplify the implementation, a basic size of 16 × 16 has been popularly
adopted. Every mainstream coding standard, from MPEG-1 and MPEG-2 to H.264,
has chosen a macroblock of 16 × 16 pixels as its basic unit of processing. Hence,
for video of different resolutions, we just have to process a different number of
macroblocks. For every 720 × 480 SD frame, we process 45 × 30 macroblocks while for
every FullHD frame, we process 120 × 68 macroblocks.
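The macroblock counts follow directly from dividing each frame dimension by 16. The sketch below uses a hypothetical helper name and assumes that a dimension that is not a multiple of 16 is rounded up to the next whole macroblock, which is consistent with the 1,088-line FullHD frame height used in this chapter (1,080 is not divisible by 16).

```python
MB_SIZE = 16  # macroblock width and height in pixels


def macroblock_grid(width, height, mb_size=MB_SIZE):
    """Return (columns, rows) of macroblocks, rounding partial blocks up."""
    cols = (width + mb_size - 1) // mb_size
    rows = (height + mb_size - 1) // mb_size
    return cols, rows


print(macroblock_grid(720, 480))    # (45, 30)
print(macroblock_grid(1920, 1088))  # (120, 68)
print(macroblock_grid(1920, 1080))  # (120, 68) after rounding up
```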
1.1.2 Video Encoding Flow
Algorithm 1.1 depicts a typical flow of video encoding. frame(t) is the current frame
to be encoded. frame'(t-1) is the reconstructed frame used for referencing, also called
the reference frame. frame'(t) is the reconstructed current frame. We encode frame(t)
one macroblock (MB) at a time, starting from the leftmost MB of the topmost row.
We call the MB being encoded Curr_MB. It can be encoded in one of
three modes: I for intra prediction, P for unidirectional interprediction, and B for
bidirectional interprediction. The MB resulting from prediction is called Pred_MB,
and the difference between Curr_MB and Pred_MB is called Res_MB, for residuals.
Res_MB goes through space-to-frequency transformation and then quantization
to become Res_Coef, the residual coefficients. Entropy coding then compresses
Res_Coef to produce the final bit-stream. In order to prepare the reconstructed current
frame for future reference, we perform inverse quantization and inverse transform
on Res_Coef to get the reconstructed residuals, called Reconst_res. Adding together
Reconst_res and Pred_MB, we obtain Reconst_MB for insertion into frame'(t).
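To make the data flow of Algorithm 1.1 concrete, here is a toy sketch of the loop. It is not the encoder described in this book: it assumes a P-like mode with zero-motion prediction standing in for motion estimation, an identity transform, a uniform quantizer, and a plain list standing in for entropy coding. All function names are hypothetical.

```python
MB = 16  # macroblock size


def get_mb(frame, i, j):
    """Extract macroblock (row i, column j) from a 2-D list of samples."""
    return [row[j * MB:(j + 1) * MB] for row in frame[i * MB:(i + 1) * MB]]


def put_mb(frame, i, j, mb):
    """Insert a macroblock into a frame in place."""
    for r in range(MB):
        frame[i * MB + r][j * MB:(j + 1) * MB] = mb[r]


def quant(v, qstep):
    """Uniform quantizer with rounding to nearest (sign handled explicitly)."""
    sign = -1 if v < 0 else 1
    return sign * ((abs(v) + qstep // 2) // qstep)


def encode_frame(frame, ref_frame, qstep=8):
    rows, cols = len(frame) // MB, len(frame[0]) // MB
    recon = [[0] * len(frame[0]) for _ in frame]  # reconstructed frame'(t)
    stream = []                                   # stand-in for the bit-stream
    for i in range(rows):
        for j in range(cols):
            curr = get_mb(frame, i, j)
            pred = get_mb(ref_frame, i, j)        # zero-motion stand-in for ME
            res = [[c - p for c, p in zip(cr, pr)]
                   for cr, pr in zip(curr, pred)]
            coef = [[quant(v, qstep) for v in row] for row in res]
            stream.append(coef)                   # stand-in for entropy coding
            rec_res = [[v * qstep for v in row] for row in coef]
            rec = [[p + r for p, r in zip(pr, rr)]
                   for pr, rr in zip(pred, rec_res)]
            put_mb(recon, i, j, rec)
    return stream, recon


cur = [[100] * 16 for _ in range(16)]
ref = [[90] * 16 for _ in range(16)]
stream, recon = encode_frame(cur, ref)
print(stream[0][0][0], recon[0][0])  # 1 98
```

The reconstructed sample (98) differs from the original (100) because the residual of 10 is quantized to one step of 8. This is why the encoder must predict from reconstructed data rather than from the original frame.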
Algorithm 1.1: Encode a frame.
encode_a_frame (frame(t), mode)
  for I = 1, N do //** N: #rows of MBs per frame
    for J = 1, M do //** M: #columns of MBs per frame
      Curr_MB = MB(frame(t), I, J);
      case (mode)
        I: Pred_MB = Intra_Pred(frame'(t), I, J);
        P: Pred_MB = ME(frame'(t-1), I, J);
        B: Pred_MB = ME(frame'(t-1), frame'(t+1), I, J);
      endcase
      Res_MB = Curr_MB - Pred_MB;
      Res_Coef = Quant(Transform(Res_MB));
      Output(Entropy_code(Res_Coef));
      Reconst_res = InverseTransform(InverseQuant(Res_Coef));
      Reconst_MB = Reconst_res + Pred_MB;
      Insert(Reconst_MB, frame'(t));
    endfor
  endfor
end encode_a_frame;
1.1.3 Color Space Conversion
Naturally, each pixel is composed of 8-bit R, G, and B components. Applying the
following conversion, it can be represented as one luminance (luma) component
Y and two chrominance (chroma) components Cr and Cb. Since the human
visual system is more sensitive to the luminance component than to the chrominance
ones, we can subsample Cr and Cb to reduce the amount of data without sacrificing
video quality. Usually one-out-of-two or one-out-of-four subsampling is applied. The
former is called the 4:2:2 format and the latter the 4:2:0 format. In this book, we
assume that the 4:2:0 format is chosen. Of course, the inverse conversion gives us the
R, G, B components from a set of Y, Cr, Cb components.
Y = 0.299R + 0.587G + 0.114B,
Cb = 0.564(B - Y),
Cr = 0.713(R - Y).    (1.1)
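A direct transcription of Eq. (1.1), followed by one possible way to obtain a 4:2:0 chroma plane, is sketched below. Averaging each 2 × 2 group of chroma samples is our assumption; the text only states that one out of four samples is kept. The function names are illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel to (Y, Cb, Cr) following Eq. (1.1)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y)
    cr = 0.713 * (r - y)
    return y, cb, cr


def subsample_420(plane):
    """Reduce a chroma plane by 2 in each direction by averaging 2x2 groups.

    Assumes the plane has even width and height.
    """
    return [[(plane[r][c] + plane[r][c + 1]
              + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
             for c in range(0, len(plane[0]), 2)]
            for r in range(0, len(plane), 2)]


y, cb, cr = rgb_to_ycbcr(128, 128, 128)  # a gray pixel carries no chroma
print(round(y, 3), round(cb, 3), round(cr, 3))
print(subsample_420([[1, 3], [5, 7]]))   # [[4.0]]
```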
1.1.4 Prediction of a Macroblock
A macroblock M has 16 × 16 = 256 pixels. It takes 256 × 3 = 768 bytes to represent
it in RGB format and 256 × (1 + 1/4 + 1/4) = 384 bytes in 4:2:0 format. If we can
find during decoding a macroblock M' which is similar to M, then we only have to
get from the encoding end the difference between M and M'. If M and M' are very
similar, the difference becomes very small and so does the amount of data needed to
be transmitted/stored. Another way to interpret similarity is redundancy. There exist
two types of redundancy: spatial and temporal. Spatial redundancy results from
similarity between a pixel (region) and its surrounding pixels (regions) in a frame.
Temporal redundancy results from slow change of video contents from one frame to the
next. Redundant information can be identified and removed with prediction tools.

1.1.5 Intraframe Prediction
In an image region with smooth change, a macroblock is likely to be similar to its
neighboring macroblocks in color or texture. For example, if all its neighbors are
red, we can predict that a macroblock is also red. Generally, we can define sev-
eral prediction functions; each takes pixel values from neighboring macroblocks
as its input and produces a predicted macroblock as its output. To carry out in-
traframe prediction, every function is evaluated and the one resulting in the smallest
error is chosen. Only the function type and the error need to be encoded and
stored/transmitted. This tool is also called intra prediction and a prediction func-
tion is also called a prediction mode.
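The selection procedure described above can be sketched as follows. The three prediction functions (vertical, horizontal, and DC) and the use of the sum of absolute differences (SAD) as the error measure are illustrative assumptions, not the full mode set or cost function of any particular encoder. The names are ours.

```python
def predict_vertical(top, left):
    """Copy the row of pixels above the block downwards."""
    return [list(top) for _ in left]


def predict_horizontal(top, left):
    """Copy the column of pixels left of the block rightwards."""
    return [[l] * len(top) for l in left]


def predict_dc(top, left):
    """Fill the block with the rounded mean of all neighboring pixels."""
    n = len(top) + len(left)
    dc = (sum(top) + sum(left) + n // 2) // n
    return [[dc] * len(top) for _ in left]


MODES = {"vertical": predict_vertical,
         "horizontal": predict_horizontal,
         "dc": predict_dc}


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_intra_mode(block, top, left):
    """Evaluate every mode and return (mode name, residual block)."""
    name = min(MODES, key=lambda m: sad(block, MODES[m](top, left)))
    pred = MODES[name](top, left)
    residual = [[x - p for x, p in zip(rb, rp)] for rb, rp in zip(block, pred)]
    return name, residual


block = [[10, 20, 30, 40] for _ in range(4)]  # vertical stripes
mode, res = best_intra_mode(block, top=[10, 20, 30, 40], left=[10, 10, 10, 10])
print(mode)  # vertical
```

Only the chosen mode name and the residual (all zeros in this example) would need to be encoded.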
1.1.6 Interframe Prediction
Interframe prediction, also called interprediction, identifies temporal redundancy
between neighboring frames. We call the frame currently being processed the cur-
rent frame and the neighboring one the reference frame. We try to find from
the reference frame a reference macroblock that is very similar to the current
macroblock of the current frame. The process is called motion estimation. A mo-
tion estimator compares the current macroblock with candidate macroblocks within
a search window in the reference frame. After finding the best-matched candi-
date macroblock, only the displacement and the error need to be encoded and
stored/transmitted. The displacement from the location of the current macroblock
to that of the best candidate block is called motion vector (MV). In other words,
motion estimation determines the MV that results in the smallest interprediction
error. A bigger search window will give better prediction at the expense of longer
estimation time.
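A minimal full-search motion estimator illustrates the idea. It assumes SAD as the matching error and an exhaustive scan of a square search window. The function names and the small block size used in the example are ours.

```python
def sad_at(cur, ref, cy, cx, ry, rx, n):
    """SAD between the n x n block of cur at (cy, cx) and of ref at (ry, rx)."""
    return sum(abs(cur[cy + r][cx + c] - ref[ry + r][rx + c])
               for r in range(n) for c in range(n))


def full_search(cur, ref, cy, cx, n=16, search_range=4):
    """Return ((dx, dy), cost) of the best match within the search window."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = cy + dy, cx + dx
            if ry < 0 or rx < 0 or ry + n > h or rx + n > w:
                continue  # candidate falls outside the reference frame
            cost = sad_at(cur, ref, cy, cx, ry, rx, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Reference frame with distinct sample values; the current frame is the
# reference content shifted by 2 pixels horizontally and 1 pixel vertically.
ref = [[y * 12 + x for x in range(12)] for y in range(12)]
cur = [[ref[min(y + 1, 11)][min(x + 2, 11)] for x in range(12)]
       for y in range(12)]
print(full_search(cur, ref, 4, 4, n=4, search_range=3))  # ((2, 1), 0)
```

The number of candidates grows with the square of `search_range`, which is the cost of a bigger search window mentioned above.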
1.1.7 Motion Vector
A MV obtained from motion estimation is adequate for retrieving a block from the
reference frame. Yet, we do not have to encode/transmit the whole of it because there
exists similarity (or redundancy) among MVs of neighboring blocks. Instead, we can
have a motion vector prediction (MVP) as a function of neighboring blocks’ MVs
and just process the difference, called motion vector difference (MVD), between the
MV and its MVP. In most cases, the MVD is much smaller than its associated MV.
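As an illustration, the sketch below assumes the predictor is the component-wise median of three neighboring MVs (left, top, and top-right). This section does not specify the function, so treat the choice and the names as assumptions.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(left, top, top_right):
    """Component-wise median of three neighboring motion vectors (x, y)."""
    return (median3(left[0], top[0], top_right[0]),
            median3(left[1], top[1], top_right[1]))


def mv_difference(mv, mvp):
    """MVD = MV - MVP, the only part that needs to be encoded."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


mvp = predict_mv((4, -2), (5, -1), (3, -2))
print(mvp)                          # (4, -2)
print(mv_difference((5, -2), mvp))  # (1, 0)
```

The decoder can form the same MVP from already decoded neighbors and recover the MV as MVP + MVD.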
1.1.8 Prediction Error
We call the difference between the current macroblock and the predicted one the
prediction error. It is also called the residual error or just the residual.

1.1.9 Space-Domain to Frequency-Domain Transformation
of Residual Error
Residual error is in the space domain and can be represented in the frequency
domain by applying the discrete cosine transform (DCT). DCT can be viewed
as representing an image block with a weighted sum of elementary patterns. The
weights are termed coefficients. For computational feasibility, a macroblock of
residual errors is usually divided into smaller 4 × 4 or 8 × 8 blocks before applying
DCT to them one by one.
1.1.10 Coefficient Quantization
Coefficients generated by DCT carry image components of various frequencies.
Since the human visual system is more sensitive to low frequency components and
less sensitive to high frequency ones, we can treat them with different resolution
by means of quantization. Quantization effectively discards certain least significant
bits (LSBs) of a coefficient. By giving smaller quantization steps to low frequency
components and larger quantization steps to high frequency ones, we can reduce the
amount of data without sacrificing the visual quality.
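A small sketch of frequency-dependent quantization follows. The step matrix, which grows linearly with the frequency index u + v, is a made-up example rather than a table from any standard. The function names are ours.

```python
def quantize(coef, step):
    """Divide by the step size and round to nearest, keeping the sign."""
    sign = -1 if coef < 0 else 1
    return sign * ((abs(coef) + step // 2) // step)


def dequantize(level, step):
    """Rescale a quantized level; the rounding loss is not recoverable."""
    return level * step


def step_matrix(n, base, slope):
    """Hypothetical step sizes: small at low frequency, larger at high."""
    return [[base + slope * (u + v) for v in range(n)] for u in range(n)]


def quantize_block(coefs, steps):
    return [[quantize(c, s) for c, s in zip(rc, rs)]
            for rc, rs in zip(coefs, steps)]


steps = step_matrix(2, base=4, slope=4)                # [[4, 8], [8, 12]]
print(quantize_block([[100, 7], [-9, 5]], steps))      # [[25, 1], [-1, 0]]
print(dequantize(quantize(100, 8), 8))                 # 104, not 100
```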
1.1.11 Reconstruction
Both the encoding and decoding ends have to reconstruct the video frame. In the encoding
end, the reconstructed frame instead of the original one should be used as the
reference because no original frame is available in the decoding end. To reconstruct, we
perform inverse quantization and inverse DCT to obtain the reconstructed residual. Note
that the reconstructed residual is not identical to the original residual as quantization
is irreversible. Therefore, distortion is introduced here. We then add the prediction data
to the reconstructed residual to obtain the reconstructed image. For an intrapredicted
macroblock, we apply the prediction function to its neighboring reconstructed
macroblocks while for an interpredicted one we perform motion compensation. Both
methods give a reconstructed version of the current macroblock.
1.1.12 Motion Compensation
Given an MV, the motion compensator retrieves from the reference frame a
reconstructed macroblock pointed to by the integer part of the MV. If the MV has a
fractional part, it performs interpolation over the retrieved image to obtain the final
reconstructed image. Usually, interpolation is done twice, once for half-pixel
accuracy and once for quarter-pixel accuracy.
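The two interpolation passes can be sketched in one dimension. This example assumes the six-tap filter with taps (1, -5, 20, 20, -5, 1)/32 commonly used for H.264 luma half-pixel samples and a two-tap rounded average for quarter-pixel samples. Boundary handling is omitted and the function names are ours.

```python
def clip255(v):
    """Clamp a value to the 8-bit sample range."""
    return max(0, min(255, v))


def half_pel(row, x):
    """Half-pixel sample between row[x] and row[x + 1].

    Requires two samples to the left of x and three to the right.
    """
    taps = (1, -5, 20, 20, -5, 1)
    acc = sum(t * row[x - 2 + k] for k, t in enumerate(taps))
    return clip255((acc + 16) >> 5)  # divide by 32 with rounding


def quarter_pel(a, b):
    """Quarter-pixel sample as the rounded average of two neighbors."""
    return (a + b + 1) >> 1


row = [0, 10, 20, 30, 40, 50, 60, 70]
h = half_pel(row, 3)            # between the samples 30 and 40
print(h)                        # 35
print(quarter_pel(row[3], h))   # 33
```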

1.1.13 Deblocking Filtering
After every macroblock of a frame is reconstructed one by one, we obtain a
reconstructed frame. Since the encoding/decoding process is done macroblock-wise,
there exist blocking artifacts at the boundaries between adjacent macroblocks or
subblocks. A deblocking filter is used to eliminate these artificial edges.
1.2 Book Organization
This book describes a VLSI implementation of a hardware H.264/AVC encoder as
depicted in Fig. 1.1.
[Figure 1.1: block diagram. Block labels: Encoder Core; IME, FME, MC, IntraPred,
IntraMD, TransCoding, Recons, DF, CABAC, PE, and MainCtrl Engines; Multiplexer;
Inter Info, Unfilter, and ReconsMB Memories; DF, MB, SR, and BIT MAUs; MAU Arbiter;
Command Receiver; AMBA Slave; AMBA Master; AMBA; AMBA Interface]
Fig. 1.1 Top-level block diagram of the proposed design

In Chap. 2, we present intra prediction. Intra prediction is the first process of
H.264/AVC intra encoding. It predicts a macroblock by referring to its neighboring
macroblocks to eliminate spatial redundancy. There are 17 prediction modes for a
macroblock: nine modes for each of the 16 luma 4 × 4 blocks, four modes for a luma
16 × 16 block, and four modes for each of the two chroma 8 × 8 blocks. Because
there exists great similarity among the equations for generating prediction pixels across
prediction modes, effective hardware resource sharing is the main design
consideration. Moreover, there exists a long data-dependency loop among luma 4 × 4 blocks
during encoding. Increasing parallelism and skipping some modes are two of the
popular methods to design a high-performance architecture for high-end
applications. However, increasing throughput requires more hardware area and skipping
some modes degrades video quality. We will present a novel VLSI implementation
for intra prediction in this chapter.
In Chap. 3, we present integer motion estimation. Interframe prediction in
H.264/AVC is carried out in three phases: integer motion estimation (IME), frac-
tional motion estimation (FME), and motion compensation (MC). We will discuss
these functions in Chaps. 3, 4, and 5, respectively. Because motion estimation
in H.264/AVC supports variable block sizes and multiple reference frames, high
computational complexity and huge data traffic become main difficulties in VLSI
implementation. Moreover, high-resolution video applications, such as HDTV,
make these problems more critical. Therefore, current VLSI designs usually adopt
parallel architecture to increase the total throughput and solve high computational
complexity. On the other hand, many data-reuse schemes try to increase data-reuse
ratio and, hence, reduce required data traffic. We will introduce several key points
of VLSI implementation for IME.
In Chap. 4, we present fractional motion estimation. Motion estimation in
H.264/AVC supports quarter-pixel precision and is usually carried out in two
phases: IME and FME. We have talked about IME in Chap. 3. After IME finds an
integer motion vector (IMV) for each of the 41 subblocks, FME performs motion
search around the refinement center pointed to by IMV and further refines 41 IMVs
into fractional MVs (FMVs) of quarter-pixel precision. FME interpolates half-pixels
using a six-tap filter and then quarter-pixels using a two-tap one. Nine positions are
searched in both half refinement (one integer-pixel search center pointed to by IMV
and eight half-pixel positions) and then quarter refinement (one half-pixel position
and eight quarter-pixel positions). The position with minimum residual error is
chosen as the best match. FME can significantly improve the video quality (+0.3
to +0.5 dB) and reduce bit-rate (20–37%) according to our experimental results.
However, our profiling report shows that FME consumes more than 40% of the total
encoding time. Therefore, an efficient hardware accelerator for fractional motion
estimation is indispensable.
In Chap. 5, we present motion compensation. Following integer and fractional
motion estimation, motion compensation (MC) is the third stage in H.264/AVC
interframe prediction (P or B frame). After the motion estimator finds MVs and
related information for each current macroblock, the motion compensator generates
compensated macroblocks (MBs) from reference frames. Due to quarter-pixel
precision and variable-block-size motion estimation supported in H.264, motion
compensation also needs to generate half- or quarter-pixels for MB compensation.
Therefore, motion compensation also has high computational complexity and dom-
inates the data traffic on DRAM. Current VLSI designs for MC usually focus on
reducing memory traffic or increasing interpolator throughput. In this chapter, we
will introduce several key points of VLSI implementation for motion compensation.
In Chap. 6, we present transform coding. In H.264/AVC, both transform and
quantization units consist of forward and inverse parts. Residuals are transformed
into frequency domain coefficients in the forward transform unit and quantized in
the forward quantization unit to reduce insignificant data for bit-rate saving. To gen-
erate reconstructed pixels for the intra prediction unit and reference frames for the
motion estimation unit, quantized coefficients are rescaled in the inverse quanti-
zation unit and transformed back to residuals in the inverse transform unit. There
are three kinds of transforms used in H.264/AVC: 4×4 integer discrete cosine
transform, 2×2 Hadamard transform, and 4×4 Hadamard transform. Designing
an area-efficient architecture is the main challenge. We will present a VLSI
implementation of transform coding in this chapter.
In Chap. 7, we present the deblocking filter. The deblocking filter (DF) adopted
in H.264/AVC reduces the blocking artifacts generated by block-based motion-compensated
interprediction, intra prediction, and integer discrete cosine transform.
The filter for eliminating blocking artifacts is embedded within the coding loop.
Therefore, it is also called an in-loop filter. Empirically, it achieves up to 9% bit-rate
saving at the expense of intensive computation. Even with today’s fastest CPU, it
is hard to perform software-based real-time encoding of high-resolution sequences
such as QFHD (3,840×2,160). Consequently, accelerating the deblocking filter by
VLSI implementation is indeed required. Through optimizing processing cycle, ex-
ternal memory access, and working frequency, we show a design that can support
QFHD at 60-fps application by running at 195 MHz.
In Chap. 8, we present context-based adaptive binary arithmetic coding. Context-
based adaptive binary arithmetic coding (CABAC) adopted in H.264/AVC main
profile is the state-of-the-art in terms of bit-rate efficiency. In comparison with
context-based adaptive variable length coding (CAVLC) used in baseline profile, it
can save up to 7% of the bit-rate. However, CABAC occupies 9.6% of total encoding
time and its throughput is limited by bit-level data dependency. Moreover, for
ultra-high resolutions such as QFHD (3,840×2,160), a pure software CABAC encoder
can hardly meet the real-time requirement. Therefore, it is necessary
to accelerate the CABAC encoder by VLSI implementation. In this chapter, a
novel architecture of CABAC encoder will be described. Its performance is capable
of real-time encoding QFHD video in the worst case of main profile Level 5.1.
In Chap. 9, we present system integration. Hardware cost and encoding perfor-
mance are the two main challenges in designing a high-performance H.264/AVC en-
coder. We have proposed several high-performance architectures for the functional

units in an H.264/AVC encoder. In addition, external memory management is
another design issue. Our encoder needs an external memory bandwidth of up to 3.3 GBps
to encode 1080pHD video in real time. We propose several AMBA-compliant
memory access units (MAUs) to access the external memory efficiently.
We will present our H.264/AVC encoder in this chapter.

Chapter 2
Intra Prediction
Abstract Intra prediction is the first process of H.264/AVC intra encoding. It
predicts a macroblock by referring to its neighboring macroblocks to eliminate
spatial redundancy. There are 17 prediction modes for a macroblock: nine modes for
each of the 16 luma 4×4 blocks, four modes for a luma 16×16 block, and four
modes for each of the two chroma 8×8 blocks. Because the equations that generate
prediction pixels are very similar across prediction modes, effective
hardware resource sharing is the main design consideration. Moreover, there is
a long data-dependency loop among luma 4×4 blocks during encoding. Increasing
parallelism and skipping some modes are two popular methods for designing
a high-performance architecture for high-end applications. However, increasing
throughput requires more hardware area, and skipping modes degrades
video quality. We will present a novel VLSI implementation for intra prediction in
this chapter.
2.1 Introduction
H.264/AVC intra encoding achieves higher compression ratio and quality compared
with the latest still image coding standard JPEG2000 [1]. The intra prediction unit,
which is the first process of H.264/AVC intra encoding, employs 17 kinds of
prediction modes and supports several different block sizes. For baseline, main, and
extended profiles, it supports 4×4 and 16×16 block sizes. For high profile, it
additionally supports an 8×8 block size.
In this chapter, we focus on the intra prediction for baseline, main, and extended
profiles. The intra prediction unit refers to reconstructed neighboring pixels to
generate prediction pixels. Therefore, its superior performance comes at the expense of
very high computational complexity.
We describe the detailed algorithm of intra prediction in Sect. 2.1.1 and address
some design considerations in Sect. 2.1.2.

2.1.1 Algorithm
All intra prediction pixels are calculated based on the reconstructed pixels of
previously encoded neighboring blocks. Figure 2.1 lists all intra prediction modes
with different block sizes. For the luma component, a 16×16 macroblock can be
partitioned into sixteen 4×4 blocks or just one 16×16 block. The chroma component
simply contains one 8×8 Cb block and one 8×8 Cr block. There are nine
prediction modes for each of the 16 luma 4×4 blocks and four prediction modes
for a luma 16×16 block and for each of the two chroma 8×8 blocks.
Figure 2.2 illustrates the reference pixels of a luma macroblock. A luma 16×16
block is predicted by referring to its upper, upper-left, and left neighboring luma
16×16 blocks. For a luma 4×4 block, we utilize its upper, upper-left, left, and
upper-right neighboring 4×4 blocks. There are 33 and 13 reference pixels for a
luma 16×16 block and a luma 4×4 block, respectively. A chroma 8×8
block is predicted like a luma 16×16 block, by using its upper, upper-left, and left
neighboring chroma blocks. There are 17 reference pixels for a chroma block.
Figure 2.3 shows all the computation equations of luma 4×4 modes. Upper-case
letters from “A” to “M” denote the 13 reference pixels and lower-case letters from
“a” to “p” denote the 16 prediction pixels.
Component  Block size  Prediction modes        Abbreviation
Y          16 4x4      0: vertical             L4_VER
                       1: horizontal           L4_HOR
                       2: DC                   L4_DC
                       3: diagonal down-left   L4_DDL
                       4: diagonal down-right  L4_DDR
                       5: vertical-right       L4_VR
                       6: horizontal-down      L4_HD
                       7: vertical-left        L4_VL
                       8: horizontal-up        L4_HU
Y          1 16x16     0: vertical             L16_VER
                       1: horizontal           L16_HOR
                       2: DC                   L16_DC
                       3: plane                L16_PLANE
Cb         1 8x8       0: DC                   CB8_DC
                       1: horizontal           CB8_HOR
                       2: vertical             CB8_VER
                       3: plane                CB8_PLANE
Cr         1 8x8       0: DC                   CR8_DC
                       1: horizontal           CR8_HOR
                       2: vertical             CR8_VER
                       3: plane                CR8_PLANE
Fig. 2.1 Intra prediction modes

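[Figure: the current macroblock (MB_Current) with its neighbors MB_UpperLeft, MB_Upper, MB_UpperRight, and MB_Left; the current 4×4 block (blk_C) with its neighbors blk_UL, blk_U, blk_UR, and blk_L]
Fig. 2.2 Reference pixels of a luma macroblock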
There are four modes for a luma 16×16 block and two chroma 8×8 blocks: horizontal,
vertical, DC, and plane, as shown in Fig. 2.4. All except the plane mode are
similar to the corresponding luma 4×4 modes. The plane mode, defined for smoothly
varying image content, is the most complicated: every prediction pixel has a unique value.
Figure 2.5 shows the equations of the luma 16×16 plane mode. Each prediction pixel value
depends on its coordinate in the block and on the parameters a, b, and c, which are calculated
from pixels of neighboring blocks.
After generating prediction pixels for each mode, the intra mode decision unit
will compute the cost of each mode, based on distortion and bit-rate, and choose the
one with the lowest cost.
We describe the encoding process of the intra prediction unit using Algorithm
2.1 and show the corresponding flow chart in Fig. 2.6.
The primary inputs of the intra prediction unit are Xcoord and Ycoord, which
indicate the location of the current macroblock in a frame. For example, in a
CIF (352×288) frame, (Xcoord, Ycoord) = (0, 0) denotes the first macroblock and
(Xcoord, Ycoord) = (21, 17) denotes the last. We use upper-case letters A through M
to represent the 13 reference pixels for a luma 4×4 block, as shown in Fig. 2.3. HSL
and VSL represent the sets of left and upper reference pixels for a luma 16×16 block.
QL represents upper-left reference pixels for a luma macroblock. HSCb, VSCb, QCb,
HSCr, VSCr, and QCr are the counterparts for the two chroma 8×8 blocks.
Figure 2.7 shows the order in which to process the subtasks of an intra predic-
tion unit. Table 2.1 shows the cycle budget for an H.264/AVC encoder to encode a

[Figure: pixel layout and prediction equations of the nine luma 4×4 modes; ">>" denotes a right shift]
Reference pixels: A-D upper pixels, E-H upper-right pixels, I-L left pixels, M upper-left pixel
Prediction pixels: a-p
  M  A B C D  E F G H
  I  a b c d
  J  e f g h
  K  i j k l
  L  m n o p
SH = SUM(I-L), SV = SUM(A-D)
Vertical (VER)
  a=e=i=m=A
  b=f=j=n=B
  c=g=k=o=C
  d=h=l=p=D
Horizontal (HOR)
  a=b=c=d=I
  e=f=g=h=J
  i=j=k=l=K
  m=n=o=p=L
DC (DC)
  if (A-D, I-L available)  a-p=(SH+SV+4)>>3
  else if (I-L available)  a-p=(SH+2)>>2
  else if (A-D available)  a-p=(SV+2)>>2
  else                     a-p=128
Diagonal Down-Left (DDL)
  a=(A+2B+C+2)>>2
  b=e=(B+2C+D+2)>>2
  c=f=i=(C+2D+E+2)>>2
  d=g=j=m=(D+2E+F+2)>>2
  h=k=n=(E+2F+G+2)>>2
  l=o=(F+2G+H+2)>>2
  p=(G+3H+2)>>2
Diagonal Down-Right (DDR)
  m=(L+2K+J+2)>>2
  i=n=(K+2J+I+2)>>2
  e=j=o=(J+2I+M+2)>>2
  a=f=k=p=(I+2M+A+2)>>2
  b=g=l=(M+2A+B+2)>>2
  c=h=(A+2B+C+2)>>2
  d=(B+2C+D+2)>>2
Vertical-Right (VR)
  a=j=(M+A+1)>>1
  b=k=(A+B+1)>>1
  c=l=(B+C+1)>>1
  d=(C+D+1)>>1
  m=(K+2J+I+2)>>2
  i=(J+2I+M+2)>>2
  e=n=(I+2M+A+2)>>2
  f=o=(M+2A+B+2)>>2
  g=p=(A+2B+C+2)>>2
  h=(B+2C+D+2)>>2
Horizontal-Down (HD)
  m=(L+K+1)>>1
  i=o=(K+J+1)>>1
  e=k=(J+I+1)>>1
  a=g=(I+M+1)>>1
  n=(L+2K+J+2)>>2
  j=p=(K+2J+I+2)>>2
  f=l=(J+2I+M+2)>>2
  b=h=(I+2M+A+2)>>2
  c=(M+2A+B+2)>>2
  d=(A+2B+C+2)>>2
Vertical-Left (VL)
  a=(A+B+1)>>1
  b=i=(B+C+1)>>1
  c=j=(C+D+1)>>1
  d=k=(D+E+1)>>1
  l=(E+F+1)>>1
  e=(A+2B+C+2)>>2
  f=m=(B+2C+D+2)>>2
  g=n=(C+2D+E+2)>>2
  h=o=(D+2E+F+2)>>2
  p=(E+2F+G+2)>>2
Horizontal-Up (HU)
  a=(J+I+1)>>1
  c=e=(K+J+1)>>1
  g=i=(L+K+1)>>1
  b=(K+2J+I+2)>>2
  d=f=(L+2K+J+2)>>2
  h=j=(3L+K+2)>>2
  k=l=m=n=o=p=L
Fig. 2.3 Equations of luma 4×4 modes prediction pixels
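As an illustration of how the equations of Fig. 2.3 map to code, the following minimal Python sketch implements four of the nine luma 4×4 modes (VER, HOR, DC, and DDL). The function names and the list-of-rows block layout are assumptions made for this example and are not part of the design described in this chapter. The arguments `up`, `up_right`, and `left` hold the reference pixels A-D, E-H, and I-L, respectively.

```python
def pred_vertical(up):
    """VER: every row repeats the upper reference pixels A-D."""
    return [list(up) for _ in range(4)]


def pred_horizontal(left):
    """HOR: every row is filled with its left reference pixel (I, J, K, or L)."""
    return [[left[y]] * 4 for y in range(4)]


def pred_dc(up=None, left=None):
    """DC: mean of the available reference pixels, 128 if none is available."""
    if up is not None and left is not None:
        dc = (sum(up) + sum(left) + 4) >> 3
    elif left is not None:
        dc = (sum(left) + 2) >> 2
    elif up is not None:
        dc = (sum(up) + 2) >> 2
    else:
        dc = 128
    return [[dc] * 4 for _ in range(4)]


def pred_ddl(up, up_right):
    """DDL: three-tap filter along the down-left diagonal of A-H."""
    t = list(up) + list(up_right)  # A..H
    block = []
    for y in range(4):
        row = []
        for x in range(4):
            i = x + y
            # For pixel p (i == 6) the third tap reuses H, giving (G+3H+2)>>2.
            row.append((t[i] + 2 * t[i + 1] + t[min(i + 2, 7)] + 2) >> 2)
        block.append(row)
    return block


if __name__ == "__main__":
    for row in pred_ddl([10, 20, 30, 40], [50, 60, 70, 80]):
        print(row)
```

Pixels on the same anti-diagonal of the DDL block receive the same value, which is the kind of sharing among prediction pixels that a hardware design can exploit.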

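[Figure: panels (a) and (b) each show the vertical, horizontal, DC, and plane modes; the DC mode uses the mean of the neighboring pixels (32 pixels for luma 16×16), and the plane mode computes Pred[y,x] for every pixel]
Fig. 2.4 (a) Luma 16×16 and (b) chroma 8×8 prediction modes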
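\[
\begin{aligned}
H &= \sum_{x'=0}^{7} (x'+1)\,\bigl(\mathrm{ReconsPixel}[8+x',-1] - \mathrm{ReconsPixel}[6-x',-1]\bigr)\\
V &= \sum_{y'=0}^{7} (y'+1)\,\bigl(\mathrm{ReconsPixel}[-1,8+y'] - \mathrm{ReconsPixel}[-1,6-y']\bigr)\\
a &= \bigl(\mathrm{ReconsPixel}[-1,15] + \mathrm{ReconsPixel}[15,-1]\bigr) \ll 4\\
b &= (5H + 32) \gg 6\\
c &= (5V + 32) \gg 6\\
\mathrm{PlanePred}[y,x] &= \mathrm{Clip1}\bigl((a + 16 + b\,(x-7) + c\,(y-7)) \gg 5\bigr), \quad x, y = 0,\dots,15
\end{aligned}
\]
Example terms: for x' = 0, the term of H is 1·(ReconsPixel[8,−1] − ReconsPixel[6,−1]); for y' = 2, the term of V is 3·(ReconsPixel[−1,10] − ReconsPixel[−1,4]). The upper-left reference pixel is ReconsPixel[−1,−1].
Fig. 2.5 Illustration of plane modes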
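The plane-mode equations of Fig. 2.5 can be sketched in Python as follows. This is only an illustrative software model under stated assumptions: `top[x]` stands for ReconsPixel[x,−1], `left[y]` for ReconsPixel[−1,y], and `corner` for ReconsPixel[−1,−1]; the function names are hypothetical. Python's `>>` on negative integers is an arithmetic shift, which is what the equations require.

```python
def clip1(v):
    """Clip a value to the 8-bit pixel range."""
    return max(0, min(255, v))


def plane_params(top, left, corner):
    """Return the plane parameters (a, b, c) of a luma 16x16 block."""
    def t(i):  # upper reference row; index -1 is the upper-left pixel
        return corner if i < 0 else top[i]

    def l(i):  # left reference column; index -1 is the upper-left pixel
        return corner if i < 0 else left[i]

    h = sum((k + 1) * (t(8 + k) - t(6 - k)) for k in range(8))
    v = sum((k + 1) * (l(8 + k) - l(6 - k)) for k in range(8))
    a = (left[15] + top[15]) << 4
    b = (5 * h + 32) >> 6
    c = (5 * v + 32) >> 6
    return a, b, c


def plane_pred_16x16(top, left, corner):
    """Return the 16x16 plane prediction as a list of rows, pred[y][x]."""
    a, b, c = plane_params(top, left, corner)
    return [[clip1((a + 16 + b * (x - 7) + c * (y - 7)) >> 5)
             for x in range(16)]
            for y in range(16)]


if __name__ == "__main__":
    # A horizontal ramp in the upper row and a flat left column.
    top = [100 + 2 * x for x in range(16)]
    pred = plane_pred_16x16(top, [98] * 16, 98)
    print(pred[0])
```

With a flat neighborhood, H and V are zero and every prediction pixel equals the neighboring value; with a linear ramp in the upper row, the prediction reproduces the ramp along each row.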

Algorithm 2.1: Intra prediction.
Intra_Prediction(Xcoord, Ycoord, A–M, HSL, VSL, QL, HSCb, VSCb, QCb, HSCr, VSCr, QCr)
for each of the 16 luma 4x4 blocks do
    PredPixels4x4DC = Gen_4x4DC(A,B,C,D,I,J,K,L);
    if up_block_available then
        PredPixels4x4VER = Gen_4x4VER(A,B,C,D);
        PredPixels4x4DDL = Gen_4x4DDL(A,B,C,D,E,F,G,H);
        PredPixels4x4VL = Gen_4x4VL(A,B,C,D,E,F,G);
    endif
    if left_block_available then
        PredPixels4x4HOR = Gen_4x4HOR(I,J,K,L);
        PredPixels4x4HU = Gen_4x4HU(I,J,K,L);
    endif
    if left_block_available and up_block_available and up_left_block_available then
        PredPixels4x4DDR = Gen_4x4DDR(A,B,C,D,I,J,K,L,M);
        PredPixels4x4VR = Gen_4x4VR(A,B,C,D,I,J,K,M);
        PredPixels4x4HD = Gen_4x4HD(A,B,C,I,J,K,L,M);
    endif
endfor
PredPixels16x16DC = Gen_16x16DC(HSL,VSL);
if up_luma_mb_available then
    PredPixels16x16VER = Gen_16x16VER(VSL);
endif
if left_luma_mb_available then
    PredPixels16x16HOR = Gen_16x16HOR(HSL);
endif
if left_luma_mb_available and up_luma_mb_available then
    PredPixels16x16PLANE = Gen_16x16PLANE(VSL,HSL,QL);
endif
if up_chroma_mb_available then
    PredPixels8x8VER_Cb = Gen_8x8VER(VSCb);
    PredPixels8x8VER_Cr = Gen_8x8VER(VSCr);
endif
if left_chroma_mb_available then
    PredPixels8x8HOR_Cb = Gen_8x8HOR(HSCb);
    PredPixels8x8HOR_Cr = Gen_8x8HOR(HSCr);
endif
if left_chroma_mb_available and up_chroma_mb_available then
    PredPixels8x8PLANE_Cb = Gen_8x8PLANE(VSCb,HSCb,QCb);
    PredPixels8x8PLANE_Cr = Gen_8x8PLANE(VSCr,HSCr,QCr);
endif
1080pHD (1,920×1,088) video at 30 fps at different working frequencies. If the intra
prediction unit generates four prediction pixels per cycle, it will take 960 cycles
to predict a macroblock.
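The cycle figures quoted above can be checked with a few lines of arithmetic. The Python sketch below is an illustration only; the function names are hypothetical. It assumes that the cycle budget is the clock frequency divided by the number of macroblocks per second, rounded down, which reproduces the entries of Table 2.1, and that the 960-cycle figure counts the prediction pixels of all 17 modes at four pixels per cycle.

```python
MB_SIZE = 16  # macroblock edge length in pixels


def macroblocks_per_second(width, height, fps):
    """Number of 16x16 macroblocks that must be encoded each second."""
    return (width // MB_SIZE) * (height // MB_SIZE) * fps


def cycle_budget(freq_mhz, width=1920, height=1088, fps=30):
    """Whole clock cycles available per macroblock at the given frequency."""
    return (freq_mhz * 1_000_000) // macroblocks_per_second(width, height, fps)


def prediction_pixels_per_macroblock():
    """Prediction pixels generated when every mode of every block is evaluated."""
    luma4 = 16 * 9 * 16   # 16 blocks x 9 modes x 16 pixels
    luma16 = 4 * 256      # 4 modes x 256 pixels
    chroma = 2 * 4 * 64   # 2 blocks x 4 modes x 64 pixels
    return luma4, luma16, chroma


if __name__ == "__main__":
    for f in (300, 250, 200, 166, 150, 125, 100):
        print(f, "MHz:", cycle_budget(f), "cycles per macroblock")
    luma4, luma16, chroma = prediction_pixels_per_macroblock()
    print("luma 4x4 share:", luma4 // 4, "of", (luma4 + luma16 + chroma) // 4)
```

Under these assumptions the luma 4×4 modes alone account for 576 of the 960 cycles, and a 960-cycle unit fits the 1080pHD budget only at the 250 and 300 MHz entries of Table 2.1.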
2.1.2 Design Consideration
Because all reference pixels are reconstructed pixels, the intra prediction unit can only
start to predict a luma 4×4 block, a luma 16×16 block, or a chroma 8×8 block
[Flow chart: for each luma 4×4 block, predict DC; predict VER, DDL, and VL if the upper block is available; predict HOR and HU if the left block is available; predict DDR, VR, and HD if all neighboring blocks are available. After all 16 4×4 blocks are done, the same availability checks at the macroblock level select the DC, VER, HOR, and plane modes for luma 16×16 and chroma, until luma 16×16, Cb 8×8, and Cr 8×8 are done]
Fig. 2.6 Flow chart of the intra prediction unit
after its neighboring blocks are reconstructed, as shown in Fig. 2.2. This data dependency
exists at both the macroblock level, for a luma 16×16 block and two chroma
8×8 blocks, and the 4×4 block level, for the 16 luma 4×4 blocks. The data dependency
among the 16 luma 4×4 blocks usually dominates the system performance of an
H.264/AVC intra encoder since it takes 576/960 = 60% of the total processing
time, as shown in Fig. 2.7.
In Fig. 2.8, the arrows denote the data dependency among 16 luma 4×4 blocks,
and the numbers inside the 4×4 blocks show the processing order defined in the

[Timing chart in cycles: for each luma 4×4 block, the unit predicts DC; then VER, DDL, and VL; then HOR and HU; then DDR, VR, and HD. The first block ends at cycle 36 and all 16 blocks end at cycle 576. The luma 16×16 DC, VER, HOR, and PLANE modes end at cycle 832, and the chroma DC, VER, HOR, and PLANE modes end at cycle 960]
Fig. 2.7 Order of processing subtasks
Table 2.1 Cycle budget at different working frequencies for 1080pHD video
Frequency (MHz)   Cycles
300               1,225
250               1,021
200               816
166               678
150               612
125               510
100               408
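[Figure: the 16 luma 4×4 blocks of a macroblock with dependency arrows; the numbers give the processing order]
   1   2   5   6
   3   4   7   8
   9  10  13  14
  11  12  15  16
Fig. 2.8 Data dependency and processing order of 16 luma 4×4 blocks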

H.264/AVC standard. For example, the arrows point to block 13, which means we
have to predict it by referring to the reconstructed pixels of blocks 7, 8, and 10.
2.2 Related Works
Several VLSI architectures exist for H.264/AVC intra prediction. Some of them ad-
dress how to shorten prediction time to support high-resolution video applications.
Others aim to provide a hardware-efficient solution to minimize the system cost.
2.2.1 Prediction Time Reduction Approaches
The intra prediction unit and the intra mode decision unit together account for about
80% of the computation time in H.264/AVC intra encoding, according to our
profiling using the H.264/AVC reference software Joint Model (JM) 11.0. An all-mode
mode decision approach evaluates the costs of nine luma 4×4 modes, four luma
16×16 modes, and four chroma 8×8 modes, whereas a partial-mode approach
evaluates fewer modes by skipping modes that have a lower probability of being the
best one. Adopting partial-mode mode decision shortens the prediction time
but degrades video quality. Several previous designs [7,54,78] propose
partial-mode mode decision algorithms. For example, Cheng and Chang [7] propose
a three-step encoding algorithm that adaptively chooses the next prediction
mode.
Instead of reducing the computation load, one can reduce prediction time by
increasing pixel-level parallelism (PLP). Increasing the PLP directly decreases the
total prediction time. For example, if the intra prediction unit predicts 8 pixels
per cycle, the lower bound of prediction time will be 480 cycles. Several previous
works [21, 29, 32] predict 4 pixels per cycle, and another
work [46] predicts 8 pixels per cycle by employing two prediction
engines. The data dependency among luma 4×4 blocks introduces bubble cycles.
Huang et al. [21] propose inserting luma 16×16 prediction into these bubble
cycles to eliminate them.
2.2.2 Hardware Area Reduction Approaches
Instead of using a dedicated hardware [61] for each prediction mode, several previ-
ous designs [21,29,46] employ reconfigurable architectures.
Some previous works [29, 46] save hardware area by removing the plane mode
from the prediction mode capability (PMC). PMC is defined as the prediction modes
that the intra prediction unit supports. PMC affects the prediction mode number and

the candidate mode set for the mode decision unit. The size of the candidate mode
set also affects compression ratio. A smaller set results in reduced coding efficiency.
Several previous designs schedule the processing order of prediction modes according
to their computation load to save hardware costs. Huang et al. [21] schedule
the DC and plane modes last, so that their data can be prepared while the unit outputs
pixels for the vertical and horizontal modes.
2.3 A VLSI Design for Intra Prediction
We propose a VLSI architecture for intra prediction in this section. We first describe
how we schedule all subtasks in Sect. 2.3.1, and we then propose our hardware ar-
chitecture in Sect. 2.3.2. In Sect. 2.3.3, we evaluate its performance.
2.3.1 Subtasks Scheduling
We categorize all intra prediction modes into reconstruction-loop (RL) modes and
nonreconstruction-loop (Non-RL) modes, as shown in Table 2.2. The former includes
all luma 4×4 modes, whereas the latter includes the luma 16×16 and chroma
8×8 modes. Our profiling shows that RL modes occupy 59%, and the three plane
modes occupy approximately 40%, of the overall computations.
Since the data dependency among 4×4 blocks dominates the processing time,
prediction of RL modes is the performance bottleneck. Our design spends 5 cycles
to generate the prediction pixels of the nine intra 4×4 modes for a 4×4 block by
increasing the PLP to 16 (pixels/mode) × 2 (modes/cycle) = 32 pixels/cycle. Ideally,
it takes 5 (cycles/block) × 16 (blocks/macroblock) = 80 cycles to predict 16 luma
4×4 blocks.
Table 2.2 Mode categories
           Prediction modes
           RL modes   Non-RL modes
Category   Luma 4×4   Luma 16×16   Cb 8×8      Cr 8×8
Bypass     L4_HOR     L16_HOR      CB8_HOR     CR8_HOR
           L4_VER     L16_VER      CB8_VER     CR8_VER
DC         L4_DC      L16_DC       CB8_DC      CR8_DC
Plane      -          L16_PLANE    CB8_PLANE   CR8_PLANE
Skew       L4_DDL     -            -           -
           L4_DDR
           L4_VR
           L4_HD
           L4_VL
           L4_HU

We also increase the PLP for predicting Non-RL modes. Our design generates 16
pixels of Non-RL modes per cycle. Therefore, it takes 16 (cycles/macroblock) × 4
(modes) + 4 (cycles/chroma 8×8 block) × 2 (chroma types) × 4 (modes) = 96 cycles
to predict a luma 16×16 block and two chroma 8×8 blocks.
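The ideal cycle counts for the RL and Non-RL modes follow directly from the parallelism figures. The short Python sketch below restates that arithmetic; the function names are hypothetical, and the 5 cycles per block are assumed here to come from rounding up nine modes at two modes per cycle.

```python
import math

LUMA4_MODES = 9  # prediction modes per luma 4x4 block


def rl_cycles_per_block(modes_per_cycle=2):
    """Cycles to output all luma 4x4 modes of one block (assumed: ceil(9 / 2))."""
    return math.ceil(LUMA4_MODES / modes_per_cycle)


def ideal_rl_cycles(blocks=16, modes_per_cycle=2):
    """Ideal cycles for all luma 4x4 blocks when they are processed one by one."""
    return rl_cycles_per_block(modes_per_cycle) * blocks


def non_rl_cycles(pixels_per_cycle=16, modes=4):
    """Cycles for one luma 16x16 block and two chroma 8x8 blocks."""
    luma = (16 * 16 // pixels_per_cycle) * modes
    chroma = (8 * 8 // pixels_per_cycle) * 2 * modes
    return luma + chroma


if __name__ == "__main__":
    print(rl_cycles_per_block(), ideal_rl_cycles(), non_rl_cycles())
```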
To alleviate the performance bottleneck caused by the long data-dependency loop
among luma 4×4 blocks, we modify the processing order of 4×4 blocks to process
two luma 4×4 blocks at a time. In Fig. 2.9, the arrows show the data dependency
among 16 luma 4×4 blocks, and the numbers inside the 4×4 blocks show the
modified processing order. To predict two chroma 8×8 blocks, we use a 4×4 block
as a unit to generate prediction pixels. We predict one Cb and one Cr 4×4 block
at the same time to shorten the processing time. Moreover, to utilize the bubble cycles
between two luma 4×4 blocks, we generate prediction pixels of luma 16×16
modes after generating prediction pixels of luma 4×4 modes for a luma 4×4 block.
Figure 2.10 shows the timing diagram of our proposed design. As mentioned before,
there is a data dependency among luma 4×4 blocks. After the intra prediction
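[Figure: the 16 luma 4×4 blocks of a macroblock with dependency arrows; the numbers give the proposed processing iteration of each block]
   1'   2'   3'   4'
   3'   4'   5'   6'
   5'   6'   7'   8'
   7'   8'   9'  10'
Fig. 2.9 Proposed processing order of 16 luma 4×4 blocks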
[Timing diagram in cycles: after the reference data are fetched, RL engine 1 and RL engine 2 each predict the luma 4×4 modes for a 4×4 block, while the Non-RL engine predicts the luma 16×16 modes and then the chroma 8×8 modes; the cycle marks shown are 16, 32, 129, and 163]
Fig. 2.10 Timing diagram of the proposed design

unit finishes one luma 4×4 encoding iteration (encoding one or two luma 4×4
blocks), it needs to wait for the transform, quantization, inverse quantization, inverse
transform, and reconstruction units before starting the next iteration.
Our design requires 1 cycle to read reference pixels. By adopting the
proposed processing order, two RL engines take 5 (cycles/iteration) × 10
(iterations/macroblock) = 50 cycles to predict 16 luma 4×4 blocks. We schedule the
Non-RL engine to predict luma 16×16 modes between two luma 4×4 encoding
iterations. The Non-RL engine generates prediction pixels of luma 16×16 modes
for one 4×4 block in iterations 1, 2, 9, and 10 and for two 4×4 blocks in other
iterations, as two RL engines do. The Non-RL engine also computes parameters for
the chroma plane and DC modes at iterations 2, 3, 4, 5, and 6 and outputs prediction
pixels at iterations 9 and 10. In total, it takes 163 cycles to predict a macroblock.
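The proposed processing order can be modeled as a wavefront over the 4×4 block grid. The Python sketch below assumes that the block in row r and column c is processed in iteration 2r + c + 1, which is one reading of the numbering in Fig. 2.9 rather than a formula given in the text; the function names are hypothetical. It then checks that every block's left, upper, upper-left, and upper-right neighbors inside the macroblock are finished in an earlier iteration.

```python
def wavefront_schedule(n=4):
    """Map each iteration number to the (row, col) blocks processed in it."""
    sched = {}
    for r in range(n):
        for c in range(n):
            # Assumed rule: iteration = 2 * row + col + 1 (1-based).
            sched.setdefault(2 * r + c + 1, []).append((r, c))
    return sched


def dependencies_ok(sched):
    """True if all in-macroblock reference neighbors are scheduled earlier."""
    iteration = {blk: k for k, blks in sched.items() for blk in blks}
    for (r, c), k in iteration.items():
        for dr, dc in ((0, -1), (-1, 0), (-1, -1), (-1, 1)):
            neighbor = (r + dr, c + dc)
            if neighbor in iteration and iteration[neighbor] >= k:
                return False
    return True


if __name__ == "__main__":
    sched = wavefront_schedule()
    for k in sorted(sched):
        print(k, sched[k])
    print("cycles at 5 cycles per iteration:", 5 * len(sched))
```

Under this rule the schedule has 10 iterations, at most two blocks per iteration, and a single block in iterations 1, 2, 9, and 10, which matches the behavior described above.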
To schedule all prediction modes efficiently, we divide them into four categories:
bypass, DC, skew, and plane, according to their characteristics as shown
in Table 2.2. The bypass category consists of vertical and horizontal modes, which
require no computation. The DC category has four DC modes, for luma 4×4, luma
16×16, Cb 8×8, and Cr 8×8, respectively. The skew category has six modes:
DDL, DDR, VR, HD, VL, and HU, all for luma 4×4 prediction. Their computation
equations are mainly three-tap and two-tap filters. The plane category has three
modes: luma 16×16 plane, Cb 8×8 plane, and Cr 8×8 plane. They are the most
complicated.
To predict RL modes, we first separate those in the skew category into three
groups (1) HD and HU modes, (2) DDR and VR modes, and (3) DDL and VL
modes. We then schedule the horizontal mode and vertical mode into group (4) and
the DC mode into group (5). Our design takes 5 cycles to output them in order of
group number. To predict Non-RL modes, our design performs horizontal, vertical,
DC, and plane prediction in order for a luma 16×16 block and two chroma 8×8
blocks.
Although increasing PLP will increase the hardware area, we propose an op-
timized processing element (PE) scheduling scheme to achieve better resource
sharing. A PE is a basic hardware unit used to calculate a prediction pixel. With the
optimized scheduling scheme, we aim at generating prediction pixels of RL modes
with an optimal number of PEs. Our design predicts 32 pixels of RL modes per cy-
cle. If we give every prediction pixel a dedicated PE, 32 PEs will be needed for RL
modes. However, for all modes in the bypass category, we need nothing but multi-
plexers. For all DC modes, we use a 4-pixel adder tree to remove the computation
load from the PEs.
RL modes in the skew category are the most complicated modes. Still, there are
multiple prediction pixels sharing the same computation equation. For illustration,
we draw Table 2.3 by reorganizing a table proposed by Huang et al. [21]. Table 2.3
shows all computation equations of RL modes in the skew category. Each row rep-
resents a computation equation and the number of prediction pixels that utilize the
equation in each mode. The computation equations labeled as “T3,” “T2,” and “By-
pass” denote three-tap filter, two-tap filter, and bypass operations, respectively. The
numbers in the columns of reference pixels represent the multiplied numbers of cor-
responding reference pixels. The numbers in the columns of skew modes denote the

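Table 2.3 Operation table of RL modes in the skew category ("-" marks an empty cell)
          Reference pixels                          Skew modes
Equation  L  K  J  I  M  A  B  C  D  E  F  G  H    DDL  DDR  VR  HD  VL  HU  Sum
T3eq0     3  1  -  -  -  -  -  -  -  -  -  -  -    -    -    -   -   -   2   2
T3eq1     1  2  1  -  -  -  -  -  -  -  -  -  -    -    1    -   1   -   2   4
T3eq2     -  1  2  1  -  -  -  -  -  -  -  -  -    -    2    1   2   -   1   6
T3eq3     -  -  1  2  1  -  -  -  -  -  -  -  -    -    3    1   2   -   -   6
T3eq4     -  -  -  1  2  1  -  -  -  -  -  -  -    -    4    2   2   -   -   8
T3eq5     -  -  -  -  1  2  1  -  -  -  -  -  -    -    3    2   1   -   -   6
T3eq6     -  -  -  -  -  1  2  1  -  -  -  -  -    1    2    2   1   1   -   7
T3eq7     -  -  -  -  -  -  1  2  1  -  -  -  -    2    1    1   -   2   -   6
T3eq8     -  -  -  -  -  -  -  1  2  1  -  -  -    3    -    -   -   2   -   5
T3eq9     -  -  -  -  -  -  -  -  1  2  1  -  -    4    -    -   -   2   -   6
T3eq10    -  -  -  -  -  -  -  -  -  1  2  1  -    3    -    -   -   1   -   4
T3eq11    -  -  -  -  -  -  -  -  -  -  1  2  1    2    -    -   -   -   -   2
T3eq12    -  -  -  -  -  -  -  -  -  -  -  1  3    1    -    -   -   -   -   1
T2eq0     1  1  -  -  -  -  -  -  -  -  -  -  -    -    -    -   1   -   2   3
T2eq1     -  1  1  -  -  -  -  -  -  -  -  -  -    -    -    -   2   -   2   4
T2eq2     -  -  1  1  -  -  -  -  -  -  -  -  -    -    -    -   2   -   1   3
T2eq3     -  -  -  1  1  -  -  -  -  -  -  -  -    -    -    -   2   -   -   2
T2eq4     -  -  -  -  1  1  -  -  -  -  -  -  -    -    -    2   -   -   -   2
T2eq5     -  -  -  -  -  1  1  -  -  -  -  -  -    -    -    2   -   1   -   3
T2eq6     -  -  -  -  -  -  1  1  -  -  -  -  -    -    -    2   -   2   -   4
T2eq7     -  -  -  -  -  -  -  1  1  -  -  -  -    -    -    1   -   2   -   3
T2eq8     -  -  -  -  -  -  -  -  1  1  -  -  -    -    -    -   -   2   -   2
T2eq9     -  -  -  -  -  -  -  -  -  1  1  -  -    -    -    -   -   1   -   1
T2eq10    -  -  -  -  -  -  -  -  -  -  1  1  -    -    -    -   -   -   -   0
T2eq11    -  -  -  -  -  -  -  -  -  -  -  1  1    -    -    -   -   -   -   0
Bypass    1  -  -  -  -  -  -  -  -  -  -  -  -    -    -    -   -   -   6   6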
number of prediction pixels sharing the corresponding computation equation. For
example, four prediction pixels in DDL mode share the same computation equation,
T3eq9.
We can reuse the PEs which perform the same computation equation during the
same cycle. Moreover, our design is able to predict RL modes with an optimal
number of PEs by simultaneously performing two of the most similar modes. For
example, DDL and VL modes have the greatest similarity according to Table 2.3.
If our design predicts DDL and VL modes in the same cycle, only 7 three-tap filter
PEs and 5 two-tap filter PEs are needed. However, DDR mode is similar to both VR
mode and HD mode. If we predict DDR mode with HD mode and VR mode with
HU mode in the same cycle, the optimal set of PEs will be 8 three-tap filter PEs and
7 two-tap filter PEs. Therefore, our proposed design predicts DDR mode with VR
mode and HD mode with HU mode in the same cycle. By adopting this schedule,
our design uses only 7 three-tap filter PEs and 5 two-tap filter PEs.
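The PE counts quoted above can be reproduced by treating each skew mode as the set of equations it uses and taking the union for the two modes scheduled in the same cycle. In the Python sketch below, the sets are transcribed from the equations of Fig. 2.3 and indexed as in Table 2.3; the dictionary and function names are hypothetical, and the model ignores any wiring or multiplexer cost.

```python
# Three-tap equations (T3eq index) used by each skew mode.
T3_USE = {
    "DDL": {6, 7, 8, 9, 10, 11, 12},
    "DDR": {1, 2, 3, 4, 5, 6, 7},
    "VR": {2, 3, 4, 5, 6, 7},
    "HD": {1, 2, 3, 4, 5, 6},
    "VL": {6, 7, 8, 9, 10},
    "HU": {0, 1, 2},
}

# Two-tap equations (T2eq index) used by each skew mode.
T2_USE = {
    "DDL": set(),
    "DDR": set(),
    "VR": {4, 5, 6, 7},
    "HD": {0, 1, 2, 3},
    "VL": {5, 6, 7, 8, 9},
    "HU": {0, 1, 2},
}


def pes_needed(pairing):
    """Return (three-tap PEs, two-tap PEs) needed for a list of mode pairs.

    Each pair is predicted in one cycle, so a pair needs one PE per distinct
    equation; the hardware must cover the most demanding pair.
    """
    t3 = max(len(T3_USE[m1] | T3_USE[m2]) for m1, m2 in pairing)
    t2 = max(len(T2_USE[m1] | T2_USE[m2]) for m1, m2 in pairing)
    return t3, t2


if __name__ == "__main__":
    proposed = [("DDL", "VL"), ("DDR", "VR"), ("HD", "HU")]
    alternative = [("DDL", "VL"), ("DDR", "HD"), ("VR", "HU")]
    print("proposed:", pes_needed(proposed))
    print("alternative:", pes_needed(alternative))
```

The proposed pairing needs 7 three-tap and 5 two-tap PEs, whereas pairing DDR with HD and VR with HU needs 8 and 7.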

2.3.2 Architecture
Figure 2.11 shows the top-level block diagram of the proposed design. Its primary
inputs are reconstructed neighboring pixels, and its primary outputs are prediction
pixels. It contains two RL engines and one Non-RL engine. We use two RL engines
to predict two luma 4×4 blocks in parallel, and we design our Non-RL engine to
generate 32 prediction pixels for two 4×4 blocks at a time.
[Block diagram: the main controller takes Xcoord and Ycoord and drives the reference pixels memory (enable, address). Block-level reference pixels feed two RL engines, each containing a block-level HV pixel generator, a block-level DC pixel generator, and a block-level skew pixel generator followed by a multiplexer that outputs 16 prediction pixels. MB-level reference pixels feed the Non-RL engine, which contains an MB-level HV pixel generator, an MB-level DC pixel generator, a plane parameter generator, a plane seed pixel generator, and a plane pixel calculator followed by multiplexers that output two sets of 16 prediction pixels]
Fig. 2.11 Top-level block diagram of the proposed design

2.3.2.1 RL Engine
The RL engine consists of a block-level skew pixel generator, a block-level DC pixel
generator, and a block-level HV pixel generator. Both HV and DC pixel generators
are easy to implement. The HV pixel generator generates prediction pixels for ver-
tical and horizontal modes by bypassing the reference pixels, whereas the DC pixel
generator uses a 4-pixel adder tree to compute the DC value.
The skew pixel generator predicts pixels for DDL, DDR, HU, HD, VL, and VR
modes. Its architecture is shown in Fig. 2.12. It consists of seven PE3s and five PE2s,
where PE3 and PE2 are for three-tap filter and two-tap filter operations, respectively.
It first takes 13 reference pixels and distributes them to appropriate PEs. Next, each
PE selects input pixels to produce prediction values. Finally, 32 multiplexers select
the prediction values according to mode specification.
We design two customized architectures for PE3 and PE2, respectively, as depicted
in Fig. 2.13. In each PE3, three multiplexers first select current input pixels.
Next, two adders and one shifter sum up the current input pixels. Then, the rounding,
shifting, and clipping units postprocess the summation value. The
architecture of PE2 is similar to that of PE3 except that it sums up two reference
pixels before rounding.
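A behavioral view of the two PE types can be sketched in a few lines of Python. This is only an assumed software model of the filter arithmetic, not the circuit of Fig. 2.13, and the function names are hypothetical. The last lines show one possible way to obtain the (3L+K+2)>>2 equation of Table 2.3 from a three-tap PE, namely by selecting L on two of its inputs.

```python
def clip1(v):
    """Clip a value to the 8-bit pixel range."""
    return max(0, min(255, v))


def pe3(x, y, z):
    """Three-tap filter: (x + 2*y + z + 2) >> 2, then clip."""
    return clip1((x + 2 * y + z + 2) >> 2)


def pe2(x, y):
    """Two-tap filter: (x + y + 1) >> 1, then clip."""
    return clip1((x + y + 1) >> 1)


if __name__ == "__main__":
    K, L = 8, 4
    # Feeding L to the middle and right inputs yields K + 2L + L = 3L + K.
    print(pe3(K, L, L), (3 * L + K + 2) >> 2)
```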
2.3.2.2 Non-RL Engine
The Non-RL engine consists of three units: an MB-level HV pixel generator, an
MB-level DC pixel generator, and an MB-level plane pixel generator. The HV pixel
[Figure: the 13 block-level reference pixels feed seven three-tap PEs (PE3_1 to PE3_7) and five two-tap PEs (PE2_1 to PE2_5); two output multiplexers select the prediction pixels for the HU, VR, and VL modes and for the HD, DDR, and DDL modes]
Fig. 2.12 Architecture of the block-level skew pixel generator
[Figure: each PE3 and PE2 selects its reference-pixel inputs through multiplexers, sums them, and applies round, shift, and clip stages]
Fig. 2.13 PE3 and PE2 architecture
generator is similar to that of the RL engine. The DC pixel generator consists of two
4-pixel adder trees for summing up the reference pixels.
The plane pixel generator generates prediction pixels of the three plane modes.
It carries the largest amount of computation load. Although the value of each pre-
diction pixel depends on its coordinate in the block, there are systematic methods to
save the computations.
First, we divide a macroblock into sixteen 4×4 blocks, using a 4×4 block as a
unit to generate prediction pixels. We find that each prediction pixel is related to its
neighboring prediction pixels. Prediction values are increased by the parameter “b”
from left to right and by the parameter “c” from top to bottom in a 4×4 block. Both
parameters b and c are calculated according to the equations shown in Fig. 2.5. Second,
we can calculate 16 prediction pixels by using plane parameters and the seed
pixel for every 4×4 block. The seed pixel is a precalculated pixel on the top-left
corner of a 4×4 block. There are 16 seed pixels in the luma component and 8 seed
pixels in the chroma components. Figure 2.14 illustrates the seed pixel generation. It

[Figure: seed pixel of each 4×4 block; seed pixels increase by 4b from a block to its right neighbor and by 4c from a block to the block below; ">>" denotes a right shift]
SeedPixel index (luma):
   0   1   4   5
   2   3   6   7
   8   9  12  13
  10  11  14  15
SeedPixel index (chroma):
  16  17  20  21
  18  19  22  23
SeedPixel_0 = a+16+b*(–7)+c*(–7)
SeedPixel_1 = SeedPixel_0 + 4*b
SeedPixel_2 = SeedPixel_0 + 4*c
SeedPixel_3 = SeedPixel_0 + 4*(b+c)
SeedPixel_16 = a+16+b*(–3)+c*(–3)
SeedPixel_20 = a+16+b*(–3)+c*(–3)
PlanePred[y, x] = Clip1{(a+16+b*(x–7)+c*(y–7)) >> 5}
PlanePred[0, 0] = Clip1{(a+16+b*(0–7)+c*(0–7)) >> 5} = Clip1{(SeedPixel_0) >> 5}
Fig. 2.14 Seed pixel generation
is not necessary to use 24 × 18 bits of registers to buffer all 24 seed pixels. Instead,
we use only 6 major seed pixels to calculate the remaining 18 seed pixels.
The plane pixel generator consists of three parts: a plane parameter generator, a
plane seed pixel generator, and a plane pixel calculator. The plane parameter generator
as depicted in Fig. 2.15 produces parameters: a + 16, b, c, 3b, and 3c. It

[Figure: multiplexers select among luma, Cb, and Cr pixels; customized multipliers and an increment unit (+16) produce the parameters]
Fig. 2.15 Architecture of the plane parameter generator
then outputs these parameters to the plane seed pixel generator and the plane pixel
calculator.
The plane seed pixel generator as depicted in Fig. 2.16 takes the parameters to
generate the seed pixel for every 4×4 block. It first generates 6 major seed pixels,
0, 4, 8, 12, 16, and 20, and then computes the remaining 18 seed pixels by using the 6
major seed pixels and parameters.
Figure 2.17 illustrates how the plane pixel calculator generates the plane pixels.
It contains 16 customized PEs to generate 16 prediction pixels. At first, it takes the
seed pixel of the current 4×4 block and the parameters as inputs. It then computes
the compensation value for each prediction pixel according to its coordinate in a
4×4 block. Figure 2.18 shows the 16 compensation values. Finally, 16 prediction
pixels are produced at a time by summing up the compensation values and the seed
pixel.

[Figure: the parameter set and SeedPixel_0 pass through multiplexers, a customized multiplier, and registers to produce the major seed pixels SeedPixel_4, SeedPixel_8, SeedPixel_12, SeedPixel_16, and SeedPixel_20, and then SeedPixel_[0-23]]
Fig. 2.16 Architecture of the plane seed pixel generator
[Figure: the seed pixel and the parameters enter a customized PE array, which outputs the plane prediction pixels]
Fig. 2.17 Plane pixels calculation

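[Figure: compensation value added to the seed pixel at each position of a 4×4 block; b increases from left to right and c from top to bottom]
  0    b      2b      3b
  c    b+c    2b+c    3b+c
  2c   b+2c   2b+2c   3b+2c
  3c   b+3c   2b+3c   3b+3c
Fig. 2.18 Compensation values in a 4×4 block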
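The seed-pixel method can be checked against the direct plane equation with a short Python model. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the parameters a, b, and c are taken as given, and each seed is computed directly from the block position rather than incrementally from the 6 major seed pixels as the hardware does. The offset is 7 for a luma 16×16 block and 3 for a chroma 8×8 block, in line with the SeedPixel_0 and SeedPixel_16 equations of Fig. 2.14.

```python
def clip1(v):
    """Clip a value to the 8-bit pixel range."""
    return max(0, min(255, v))


def plane_direct(a, b, c, size=16):
    """Evaluate the plane equation independently for every pixel."""
    off = size // 2 - 1  # 7 for luma 16x16, 3 for chroma 8x8
    return [[clip1((a + 16 + b * (x - off) + c * (y - off)) >> 5)
             for x in range(size)]
            for y in range(size)]


def plane_seeded(a, b, c, size=16):
    """Evaluate the plane per 4x4 block from a seed plus compensation values."""
    off = size // 2 - 1
    pred = [[0] * size for _ in range(size)]
    for by in range(0, size, 4):
        for bx in range(0, size, 4):
            # Seed: unshifted value at the top-left corner of this 4x4 block.
            seed = a + 16 + b * (bx - off) + c * (by - off)
            for j in range(4):
                for i in range(4):
                    # Compensation value i*b + j*c, as laid out in Fig. 2.18.
                    pred[by + j][bx + i] = clip1((seed + i * b + j * c) >> 5)
    return pred


if __name__ == "__main__":
    print(plane_seeded(3648, 64, 0) == plane_direct(3648, 64, 0))
```

Because the compensation values only involve 0, b, 2b, 3b and 0, c, 2c, 3c, the 16 pixels of a block can be formed with additions once b, c, 3b, and 3c are available.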
2.3.3 Evaluation
Synthesized with a TSMC 0.13-μm CMOS cell library, the proposed design has a total
gate count of about 32K gates when running at 161 MHz. It takes 163 cycles to
predict a macroblock. Running at 161 MHz, the proposed design can encode
4K × 2K (4,096 × 2,048) video at 30 fps in real time.
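The real-time claim can be checked with a few lines of arithmetic. The sketch
below assumes 16 × 16 macroblocks and that the 163-cycle figure is the only
per-macroblock cost, ignoring any frame-level overhead; the function name is
illustrative.

```python
def required_clock_hz(width, height, fps, cycles_per_mb, mb_size=16):
    """Minimum clock rate needed to process every macroblock of every frame."""
    mbs_per_frame = (width // mb_size) * (height // mb_size)
    return mbs_per_frame * fps * cycles_per_mb

needed = required_clock_hz(4096, 2048, 30, 163)
print(f"{needed / 1e6:.1f} MHz needed, 161 MHz available")
```

Under these assumptions the requirement comes to roughly 160.2 MHz, just below the
161 MHz operating frequency.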
2.4 Summary
The intra prediction unit, which is the first process of H.264/AVC intra encoding,
refers to reconstructed neighboring pixels to generate prediction pixels. Its
superior performance comes at the expense of very high computational complexity.
The main challenges in designing a hardware architecture for high-resolution
applications are to share hardware resources well and to shorten the prediction
time so as to speed up the data-dependency loop among luma 4 × 4 blocks. Many
previous works have proposed either a mode-skipping scheme or a configurable
architecture to design the hardware.
We increase the pixel-level parallelism and propose a new processing order to
shorten the prediction time. By adopting the proposed optimized PE scheduling
scheme, we also achieve better resource sharing among processing elements. Our
design can process one macroblock within 163 cycles while consuming only 32K gates.
It is able to support high-resolution applications up to 4K × 2K.

Chapter 3
Integer Motion Estimation
Abstract Interframe prediction in H.264/AVC is carried out in three phases: integer
motion estimation (IME), fractional motion estimation, and motion compensation.
We discuss these functions in this chapter and in Chaps. 4 and 5, respectively.
Because motion estimation in H.264/AVC supports variable block sizes and multiple
reference frames, high computational complexity and huge data traffic become the
main difficulties in VLSI implementation. Moreover, high-resolution video
applications, such as HDTV, make these problems more critical. Therefore, current
VLSI designs usually adopt parallel architectures to increase the total throughput
and cope with the high computational complexity. In addition, many data-reuse
schemes try to increase the data-reuse ratio and, hence, reduce the required data
traffic. In this chapter, we introduce several key points of VLSI implementation
for IME.
3.1 Introduction
High-quality video sequences usually have a high frame rate of 30 or 60 frames
per second (fps). Therefore, two consecutive frames in a video sequence are quite
similar. The goal of motion estimation is to exploit this characteristic to reduce
temporal redundancy. In Fig. 3.1, for example, when encoding frame t + 1, the
encoder only needs to encode the difference between frame t + 1 and frame t (i.e.,
the airplane) instead of the whole frame t + 1.
In current video coding standards, block-based motion estimation (BME) is
widely used to estimate the movement of a rectangular block from the current frame.
BME fits well with rectangular video frames. It is also suitable for block-based
image transforms (e.g., DCT). However, it has several disadvantages. For example,
real objects are rarely rectangular, and many types of object motion are hard to
estimate using BME. Moreover, BME causes blocking artifacts after decoding. Despite
these disadvantages, BME is employed by most existing video coding standards. BME
compares each N × N-pixel current block in the current frame with several or all
N × N-pixel reference blocks (called candidate blocks) in the search window
of the reference frame to determine the best match, as shown in Fig. 3.2. The
reference frame may be the previous frame, the next frame, or both. A popular
matching criterion is to measure the residual calculated by subtracting reference
blocks from the current block, so that the reference block that minimizes the
residual is chosen as the best match.
Fig. 3.1 Two consecutive video frames (frame t and frame t + 1)
Fig. 3.2 Block-based motion estimation
In previous video coding standards, the block size of motion estimation is fixed.
Fixed-block-size motion estimation (FBSME) expends the same effort to estimate
the motion of moving objects and static objects. This inflexibility causes low coding
performance. Moreover, two objects moving in different directions in one block can
also lead to low estimation accuracy. Therefore, H.264/AVC adopts
variable-block-size motion estimation (VBSME). VBSME adaptively uses a smaller
block size for estimating moving objects and a larger block size for static ones to
increase the coding efficiency. In H.264/AVC, each frame of a video is partitioned
into macroblocks (MBs) of 16 × 16 pixels. Each macroblock can be split into smaller
blocks in four ways: one 16 × 16 block, two 16 × 8 blocks, two 8 × 16 blocks, or
four 8 × 8 blocks. If the 8 × 8 mode is chosen, each of the four 8 × 8 blocks may
be split further into one 8 × 8 block, two 8 × 4 blocks, two 4 × 8 blocks, or four
4 × 4 blocks.
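The partition hierarchy can be made concrete with a short sketch. The code below
lists every block position for each of the seven block shapes inside one
macroblock; the helper name and the (x, y, width, height) tuple layout are
illustrative choices rather than anything prescribed by the standard.

```python
MB = 16  # macroblock size in pixels
SHAPES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]  # (width, height)

def blocks_of_shape(width, height, mb=MB):
    """All non-overlapping (x, y, width, height) blocks of one shape tiling a macroblock."""
    return [(x, y, width, height)
            for y in range(0, mb, height)
            for x in range(0, mb, width)]

# Number of blocks per shape, and the total over all seven shapes.
counts = {shape: len(blocks_of_shape(*shape)) for shape in SHAPES}
total = sum(counts.values())
```

Summing the counts gives 1 + 2 + 2 + 4 + 8 + 8 + 16 = 41 blocks, so a search that
covers every shape has to track 41 motion vectors per macroblock.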
VBSME in H.264/AVC is carried out in two phases: integer motion estimation
(IME) and fractional motion estimation (FME). In the first step, IME computes the
motion vector predictor (MVP) of the current MB. The MVP is the median of the MVs
of the left, the top, and the top-right (or top-left) MBs of the current MB. The
position pointed to by the MVP is set as the center point of the search window. In
the next step, IME performs a motion search within the search window and finds the
integer motion vector (IMV) for blocks of all sizes (4 × 4 to 16 × 16). Finally,
IME outputs these 41 IMVs (one for the 16 × 16 block, two for the 16 × 8 blocks,
two for the 8 × 16 blocks, four for the 8 × 8 blocks, eight for the 8 × 4 blocks,
eight for the 4 × 8 blocks, and sixteen for the 4 × 4 blocks) to FME, which will be
discussed in Chap. 4. The flow chart of IME in H.264/AVC is shown in Fig. 3.3.
Fig. 3.3 Flow chart of IME
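A minimal sketch of the MVP computation follows. It assumes the median is taken
separately on the horizontal and vertical components, as is usual for H.264/AVC
median prediction, and that all three neighboring MVs are available; the function
names are hypothetical.

```python
def median3(a, b, c):
    """Median of three integers."""
    return sorted((a, b, c))[1]

def motion_vector_predictor(mv_left, mv_top, mv_top_right):
    """Component-wise median of three neighboring motion vectors given as (x, y)."""
    return (median3(mv_left[0], mv_top[0], mv_top_right[0]),
            median3(mv_left[1], mv_top[1], mv_top_right[1]))

mvp = motion_vector_predictor((4, -2), (6, 0), (-1, 3))
```

Note that a component-wise median, here (4, 0), need not equal any one of the
three input vectors.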
We introduce some block-based IME algorithms in Sect. 3.1.1 and discuss several
key points of VLSI implementation for IME in Sect. 3.1.2.
3.1.1 Algorithms
There are many kinds of algorithms for block-based IME. The most accurate
strategy is the Full Search (FS) algorithm. By exhaustively comparing all reference
blocks in the search window, FS gives the most accurate motion vector, the one that
yields the minimum sum of absolute differences (SAD) or sum of squared differences
(SSD). The computation of SAD and SSD is shown in (3.1) and (3.2), respectively,
where N denotes the block size, CB the current block, RB the reference block, (i, j)
the motion vector (MV), and SR the search range. Algorithm 3.1 shows the steps
of the FS algorithm, where W denotes the frame width, H the frame height, SRH
the horizontal search range, and SRV the vertical search range.
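\[
\begin{aligned}
\mathrm{SAD}(i,j) &= \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} \bigl| CB(m,n) - RB(m+i,\, n+j) \bigr|, \\
\mathrm{SAD}_{\min} &= \min \bigl( \mathrm{SAD}(i,j) \bigr), \quad -SR \le i, j < SR.
\end{aligned}
\tag{3.1}
\]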

Algorithm 3.1: Full search integer motion estimation.
for w = 0 to W/N − 1 do
  for h = 0 to H/N − 1 do
    MV(w,h) = (0,0);
    SAD(w,h) = INFINITY;
    for i = −SRH to SRH − 1 do
      for j = −SRV to SRV − 1 do
        SAD(i,j) = 0;
        for x = 0 to N − 1 do
          for y = 0 to N − 1 do
            SAD(i,j) += |CB(x,y) − RB(i + x, j + y)|;
          endfor
        endfor
        if SAD(i,j) < SAD(w,h) then
          MV(w,h) = (i,j);
          SAD(w,h) = SAD(i,j);
        endif
      endfor
    endfor
    return with MV(w,h);
  endfor
endfor
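\[
\begin{aligned}
\mathrm{SSD}(i,j) &= \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} \bigl( CB(m,n) - RB(m+i,\, n+j) \bigr)^{2}, \\
\mathrm{SSD}_{\min} &= \min \bigl( \mathrm{SSD}(i,j) \bigr), \quad -SR \le i, j < SR.
\end{aligned}
\tag{3.2}
\]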
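For readers who prefer something executable, the following Python sketch performs
a full search for a single N × N block using the SAD criterion of (3.1). It is a
plain software model, not the hardware algorithm: frames are lists of rows indexed
as frame[y][x], candidates that fall outside the reference frame are skipped (a
boundary policy assumed here, not taken from the text), and ties keep the first
candidate found. All function names are illustrative.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n current block at (bx, by) and the reference block displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))

def full_search(cur, ref, bx, by, n, srh, srv):
    """Return (best_mv, best_sad) over -srh <= dx < srh and -srv <= dy < srv."""
    height, width = len(ref), len(ref[0])
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-srv, srv):
        for dx in range(-srh, srh):
            # Skip candidates that are not fully inside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= width and
                    0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_sad:
                best_mv, best_sad = (dx, dy), cost
    return best_mv, best_sad
```

Every one of the (2·srh)(2·srv) candidates costs n² absolute differences, which is
the regular but heavy workload that the heuristics discussed next try to reduce.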
To exhaustively compare all the reference blocks in the search window, the
FS algorithm incurs a huge computational cost. Therefore, many computationally
efficient heuristics have been proposed. They can be divided into two types. The first
type reduces computational complexity by reducing the number of search points,
e.g., the Three Step Search (TSS) algorithm [31] and the Diamond Search (DS)
algorithm [68,84]. Figure 3.4 depicts the search steps of the TSS algorithm, where
the number in a circle depicts the order of the search step and a gray circle marks
the final choice of each step. The TSS algorithm first searches points around the
center (x, y) with step size s. The positions of the nine search points are
(x − s, y − s), (x, y − s), (x + s, y − s), (x − s, y), (x, y), (x + s, y),
(x − s, y + s), (x, y + s), and (x + s, y + s).
The point with minimum distortion (SAD or SSD) becomes the center of the next
step. The initial step size in Fig. 3.4 is three, and the step size is decremented after
each step. This allows the algorithm to finish in three steps, hence its name. The
DS algorithm also has nine initial search points, but these search points form a
diamond instead of a square. Figure 3.5 shows the position of the diamond, where
the next search step may be along the diamond’s vertex or face. Accordingly, there
are five or three new search points to be evaluated at each subsequent step. DS
stops searching when the search point with minimum distortion is at the
center of a diamond. Both the TSS and DS algorithms are widely used, and there are
many algorithms extended from the TSS or DS algorithm, such as the Four Step
Search algorithm. There are also some algorithms that try to reduce the number of
search points by an early termination strategy. These algorithms start from the
top-left or center point of the search window and search reference blocks one by
one. The search process is terminated when the distortion of a reference block is
less than a predefined threshold. The remaining reference blocks are not tested
even if one of them may have lower distortion. These algorithms can control the
computational complexity and the resulting video quality by changing the value of
the threshold. The higher the threshold, the lower the number of search points, and
hence the lower the complexity.
Fig. 3.4 Three step search algorithm
Fig. 3.5 Diamond search algorithm
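The square-pattern search described for TSS can be sketched as follows. This is a
simplified model under stated assumptions: the distortion measure is passed in as
a hypothetical callable cost((x, y)), the step size starts at three and is reduced
by one after each step as in the description of Fig. 3.4, and points are not
checked against the search-window boundary.

```python
def three_step_search(cost, center=(0, 0), initial_step=3):
    """Evaluate 9 points around the center, recenter on the best, then shrink the step by one."""
    cx, cy = center
    evaluated = 0
    step = initial_step
    while step >= 1:
        candidates = [(cx + dx * step, cy + dy * step)
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
        evaluated += len(candidates)
        cx, cy = min(candidates, key=cost)  # ties keep the first candidate in scan order
        step -= 1
    return (cx, cy), evaluated
```

In practice cost would be the SAD or SSD of the reference block at the given
displacement. The sketch re-evaluates the center point at every step; an
implementation would normally cache that value, so the count it returns is an
upper bound on the number of distinct search points.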

Fig. 3.6 Down sampling
The second type of algorithm reduces computational complexity by reducing
the computation at each search point. Koga et al. [31] proposed a down-sampling
method in which a 16 × 16 block can be 1/2 down-sampled to an 8 × 16 or a 16 × 8
block, or 1/4 down-sampled to an 8 × 8 block, as shown in Fig. 3.6. The
complexity of the distortion computation for each reference block is thereby
reduced to one-half or one-fourth.
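The saving from down-sampling can be illustrated with a small sketch. It assumes
that down-sampling means plain decimation, that is, keeping every second column
and/or row of the block; the text does not say which pixels are kept, and the
function name is illustrative.

```python
def subsampled_sad(cur_block, ref_block, step_x=1, step_y=1):
    """SAD over every step_x-th column and step_y-th row of two equal-sized blocks.
    Returns (sad, number_of_pixel_differences)."""
    total, count = 0, 0
    for y in range(0, len(cur_block), step_y):
        for x in range(0, len(cur_block[0]), step_x):
            total += abs(cur_block[y][x] - ref_block[y][x])
            count += 1
    return total, count
```

For a 16 × 16 block, step_x = 2 alone uses 128 of the 256 pixel differences, and
step_x = step_y = 2 uses 64, matching the one-half and one-fourth figures.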
These two types of complexity-reduction schemes can be used together to further
reduce the computational complexity. However, these algorithms reduce search time
at the expense of video-quality loss and bit-rate increase. These low-complexity
algorithms are usually used in low-resolution or mobile applications.
3.1.2 Design Considerations
Owing to its computational regularity and excellent video quality, full search
motion estimation is commonly employed in VLSI implementations. Therefore, we
concentrate on VLSI implementation of the Full Search algorithm in the rest of this
chapter. However, this exhaustive search strategy also leads to high computational
complexity and huge amounts of data traffic. A highly parallel architecture is
therefore essential to perform the large amount of computation. Moreover, there are
many overlapping pixels between consecutive search windows as well as between
reference blocks during block matching. An efficient data-reuse scheme is essential
to reduce redundant data access and thus total data traffic.
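To give a feel for the numbers involved, the sketch below counts the work of a
full search for one block and estimates how much two neighboring search windows
overlap. It assumes displacements in the range −SR to SR − 1 in each direction,
and that the search windows of horizontally adjacent blocks are centered on their
co-located positions, so they are offset by exactly one block width. The example
values N = 16 and SR = 16 are hypothetical.

```python
def full_search_workload(n, srh, srv):
    """Number of candidates and of absolute-difference operations for one n x n block."""
    candidates = (2 * srh) * (2 * srv)
    return candidates, candidates * n * n

def search_window_size(n, srh, srv):
    """Width and height in pixels of the reference area covering all candidates."""
    return 2 * srh + n - 1, 2 * srv + n - 1

def horizontal_overlap_ratio(n, srh):
    """Fraction of search-window columns shared with the window of the next block to the right."""
    width = 2 * srh + n - 1
    return (width - n) / width

print(full_search_workload(16, 16, 16))
print(search_window_size(16, 16, 16))
print(round(horizontal_overlap_ratio(16, 16), 3))
```

With these example values, 31 of the 47 window columns are shared between adjacent
blocks, which is the kind of redundancy a data-reuse scheme aims to exploit.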

3.2 Related Works
We describe several representative IME architectures in Sect. 3.2.1 and introduce
data-reuse schemes in Sect. 3.2.2.
3.2.1 Architecture
There are many hardware designs for FBSME. Yang et al. [80] proposed the first
1D array architecture, shown in Fig. 3.7. Each processing element (PE) is
responsible for calculating the SAD of one reference block, and the number of PEs
is equal to the number of reference blocks in the horizontal direction within the
search range. Current block pixels are propagated through shift registers, while
reference block pixels are broadcast to all PEs. This design accepts sequential
inputs but performs parallel processing with 100% hardware utilization. Moreover,
it reduces memory traffic by broadcasting reference block pixels.
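The broadcast idea can be mimicked in software with a small functional model. The
sketch below is not cycle-accurate and does not reproduce the scheduling that
gives the real design its 100% utilization; it only shows that each reference
pixel is read once and shared by every PE that needs it. The function name, the
data layout, and the way the shift-register delay is modeled are assumptions made
for this illustration.

```python
def one_d_array_sads(cur_block, ref_strip):
    """Functional model of a 1D PE array for one row of candidates.

    cur_block: n x n current block.
    ref_strip: n rows of reference pixels, each n + num_pes - 1 pixels wide.
    PE k accumulates the SAD of the candidate whose left edge is column k.
    """
    n = len(cur_block)
    num_pes = len(ref_strip[0]) - n + 1
    acc = [0] * num_pes
    for y in range(n):
        for col, ref_pixel in enumerate(ref_strip[y]):  # one read, broadcast to all PEs
            for k in range(num_pes):
                x = col - k  # current-pixel column seen by PE k (models the shift delay)
                if 0 <= x < n:
                    acc[k] += abs(cur_block[y][x] - ref_pixel)
    return acc
```

Each reference pixel is visited exactly once in the two outer loops, whereas
computing the same SADs candidate by candidate would read most of those pixels
several times.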
Using a similar concept, Yeo and Hu [81] proposed a 2D array architecture,
shown in Fig. 3.8. Current block pixels are still propagated through shift
registers, but reference block pixels are broadcast in both the horizontal and
vertical directions. A set of N × N PEs is responsible for an N × N region of the
search window, where N is the block size (N = 4 in Fig. 3.8). Consequently, there
are (2SRH/N) × (2SRV/N) sets of PEs in total for the whole search window. With
two-directional data broadcasting, this design further increases the data-reuse
ratio.
Komarek and Pirsch [34] proposed a 2D array architecture, shown in Fig. 3.9.
Each current pixel is stored in a PE, and hence the number of PEs is equal to the
current block size (block size N = 4 in Fig. 3.9). Instead of broadcasting, reference
Fig. 3.7 Yang’s 1D array architecture

Another Random Scribd Document
with Unrelated Content

“As I feel very queer my will I now make;
Write it down, Joseph Finch, and make no mistake.
I wish to leave all things fair and right, do you see,
And my relatives satisfy. Now, listen to me.
The first in my will is Lydia my wife,
Who to me proved a comfort three years of my life;
The second my poor aged mother I say,
With whom I have quarrelled on many a day,
For which I’ve been sorry, and also am still;
I wish to give her a place in my will.
The third that I mention is my dear little child;
When I think of her, Joseph, I feel almost wild.
Uncle Sam Bigsby, I must think of him too,
Peradventure he will say that I scarcely can do.
And poor Uncle Gregory, I must leave him a part,
If it is nothing else but the back of the cart.
And for you, my executor, I will do what I can,
For acting towards me like an honest young man.
“Now, to my wife I bequeath greater part of my store;
First thing is the bedstead before the front door;
The next is the chair standing by the fireside,
The fender and irons she cleaned with much pride.
I also bequeath to Lydia my wife
A box in the cupboard, a sword, a gun, and knife,
And the harmless old pistol without any lock,
Which no man can fire off, for ’tis minus a cock.
The cups and the saucers I leave her also,
And a book called ‘The History of Poor Little Mo,’
With the kettle, the boiler, and old frying-pan,
A shovel, a mud-scoop, a pail, and a pan.
And remember, I firmly declare my protest
That my poor aged mother shall have my oak chest
And the broken whip under it. Do you hear what I say?
Write all these things down without any delay.
And my dear little child, I must think of her too.
Friend Joseph, I am dying, what shall I do?
I give her my banyan, my cap, and my hose,
My big monkey-jacket, my shirt, and my shoes;
And to Uncle Sam Bigsby, I bequeath my high boots,
The pickaxe and mattock with which I stubbed roots.
And poor Uncle Gregory, with the whole of my heart,
I give for a bedstead the back of the cart.
And to you, my executor, last in my will,
I bequeath a few trifles to pay off your bill.
I give you my shot-belt, my dog, and my nets,
And the rest of my goods sell to pay off my debts.
“Joseph Finch, Executor.
“Dated February 4th, 1839.”
From Missouri
Under the spell of the Muse, Joseph Johnson Cassiday, a well-known
farmer of Jasper County, Missouri, prepared his will in rhyme; for several
years this document answered the purposes of the testator; just prior to his
death, however, in March, 1910, more serious thoughts seem to have come
over him, and Mr. Cassiday executed a different will, the last being done in
the usual prose form. The will in rhyme is given below:
“I, Joseph Johnson Cassiday,
Being sound of mind and memory,
Do hereby publish my intent,
This my will and testament,
That all my just debts first be paid,
Expense for burial and funeral made,
And all expenses made of late,
Out of my personal and real estate.
I do bequeath, devise and give,
As long as she, my wife, shall live,
Lot six in the original town of Lever,
To her assigns and heirs forever.
To my adopted daughter Marie,
I do devise and give in fee,
The southeast quarter of section seven
Township nine and range eleven.
To my two sons Josephus and Reach,
I do devise one dollar each.
The residue of my estate,
I do bequeath to Mary Kate,
And I hereby appoint her for,
My last will, executor.
This eighteenth day of May was done,
In the year of our Lord, Nineteen One.”

CHAPTER IV
CURIOUS WILLS
“Most men are within a finger’s breadth of being mad; for if a man walk with his middle finger
pointing out, folk will think him mad, but not so if it be his forefinger.”
“Where be your Gibes now? Your Gambols? Your songs? Your flashes of merriment, that were wont
to set the table on a roar?”
1
Husbands, Wives, and Children
“Men should be careful lest they cause women to weep, for God counts their tears.”
An editorial on “Testamentary Habits and Peculiar Wills,” appeared in
the Western Reserve Law Journal some time ago. Its excellence merits a
reproduction in part:
“The laws of human nature underlie all systems of jurisprudence.
Positive law is evolved out of long periods of human phenomena. The
general systems of law are the composite products of innumerable
generations of men. These accepted codes are supposed to embody the
survivals of an immemorial struggle between right and wrong, and the
highest sentiments of justice, and the clearest perfection of reason of all
ages. But it is a remarkable fact that one-half of all the property in the
world, in the succession of generations, is transmitted and controlled by the
supreme purpose and disposition of individual men and women. The tenure
of property is not always held, nor is it transmitted, according to legislative
enactments or judicial law. Under the testamentary privilege secured by law
the unenlightened mind often becomes the legislature which frames and
promulgates the rule of descent which fixes the destiny of millions of
property. The perfect freedom and untrammelled modes of expression,
secured in the will-making privilege, results in the manifestation of the
most normal and spontaneous spirit of the individual.
“For genuine and authentic repositories of human idiosyncrasies and
whimsical peculiarities, as well as lofty sentiments and noble thoughts on

high themes, there is nothing comparable with the last will and testament.
There are several reasons for the existence of this fact.
“1st. The will is usually the product of grave thought and deliberation. It
is the matured disposition of the individual testator, framed and published in
the exercise of one of the highest and best appreciated rights granted by
society to the individual. The will is also the outgrowth of the individual’s
sense of duty involved in sacred domestic and family obligations and
relationships.
“2d. The right to make the will confers the privilege coveted by both
men and women to speak into the universal ear ‘the last word.’ The sum of
man’s moral sense, and his exact ethical tone, is not infrequently
concentrated in his last will.
“3d. In the ages of the world, when the agitation of religious beliefs was
most prevalent, men were prone to give a summary of their opinions upon
religion in their wills. The rites and ceremonies of sepulchre are often
prescribed; the belief in immortality is often expressed in these sacred
documents. The vanities and foibles, the whims and caprices, the
eccentricities and prejudices, all leave their exact mould and expression in
this important instrument. The cynic adopts this means of giving a parting
blow to the unfriendly and unsympathizing world. It is said that the mould
and fashion of the human form was so preserved in ancient Egypt by the
embalmer’s art that the peculiar physiognomy of the Pharaohs is discovered
after three thousand years of burial. This art of preservation has been lost.
But in the numerous receptacles for recorded wills in Europe and America
are found the mummified intellectual and spiritual remains of past
generations as clearly and positively embalmed as are the bodies of the
Pharaohs.
“It is interesting to note the influence of long-established customs upon
the social habits of people. The present habitat of the will-making people is
continental Europe. This fact is susceptible of easy explanation. The
jurisprudence of the continent is founded on Roman law. Sir Henry Sumner
Maine has well said: ‘To the Romans belong preëminently the credit of
inventing the will, the institution which, next to the contract, has exercised
the greatest influence in transforming human society.... To the Roman no
evil seems to have been a heavier visitation than the forfeiture of

testamentary privilege; no curse seems to have been bitterer than that
imprecated upon an enemy ‘that he might die without a will.’ ”
* * * * * * *
“The odd freaks, vagaries and vanities of men thus find permanent
lodgment in testamentary remains. While these features of the will at first
appear to defy classification, yet by careful examination, extending over
long periods, the manifestation of unvarying habits of mind, and the
existence of constant and controlling instincts and motives, are readily
discovered.
“These natures of ours, when freely dealing with the subject of property,
and exhibiting solemn sentiments upon duty and destiny, unconsciously
yield to fundamental laws of uniform operation; and these testamentary
memorials may be made to furnish much curious instruction upon
psychological and sociological subjects.”
Duty of Husbands to make Wills
The following article from the pen of Harriette M. Johnston-Wood, of
the New York bar, appeared in Harper’s Weekly in the issue of September
24, 1910; there is much in it which should appeal to the sense of justice and
manhood of the husbands, brothers and sons of our country. The barbaric
treatment of women with reference to property rights should no longer find
a place in the laws of a country which boasts of its enlightenment and
freedom as does the United States. It is gratifying to record that a more
liberal policy is fast being adopted by the law-making bodies of our States.
Our author says:
“It has been our custom for a number of years to pass our summer
vacation on the banks of Lake Seneca, where one of us was born. Here our
paternal grandparents came when the country was yet a wilderness, and
here they lived and died. Their wedding journey from Rensselaerwick was
made in a covered wagon, in which they brought their worldly possessions,
some chairs, a table, a bed, a stove, some dishes and cooking utensils. A
half-dozen sheep and a cow brought up the rear of this caravan. Here they
cleared the ground and built a house. Grandmother dyed and carded and
spun into yarn and wove into cloth the wool from the sheep, from which she
knitted the socks and mittens and made the clothing. From the flax which
grew wild thereabouts she made the household linen. No small tasks were

these when eventually nine children came to demand care and protection.
Once a year a perambulating shoemaker came through the country, and then
this small army was shod, with boots and shoes in reserve sufficient to last
until his return. By and by a frame house was built, a luxury in those days;
property was accumulated.
“To whom did it belong?
“In justice and equity it belonged to both parents. Each had borne the
burden; each should share in the reward. But the law said no. The wife’s
services belong to the husband, and their joint earnings belong to him, only
the husband must support the wife. The wife owned nothing. Truly a
munificent compensation for fifty years of service such as this!
“Did grandfather support grandmother? Were grandmother’s services
less valuable than grandfather’s? By what righteous authority did
everything belong to grandfather?—he being allowed to give or will away
everything, except the use of one-third of the real estate, which
grandmother might have after his death, but for her lifetime only. It was
barely possible that grandmother might have liked to give or will something
to her children on her own account. When she had earned it, by years of toil
as hard as his, why should she not have been allowed to gratify this
altogether worthy ambition?
“Forty years ago a boy and a girl married. He had nothing. She had
saved five hundred dollars teaching school. They bought a farm, paying her
five hundred dollars down, and taking a mortgage for the balance. Title was
taken in the husband’s name. They worked together for forty years. He died,
leaving no will. There were no children. Under the law of the State the
property went to his brothers and sisters, all old, all well-to-do. The
personalty amounted to very little. The wife’s dower, the use of one-third
during her life, amounts to less than $200 a year, and this is her sole support
in her old age.
“In that section of the country women can get one dollar a day for at
least half the year working in fruit, tying grape-vines, putting handles on
baskets, picking berries, cherries, and currants, and packing grapes, peaches
and plums. Household service is always at a premium, as no one there will
go out to do that kind of work. They are the descendants of the old settlers
and are proud. The married women work in the fruit in the daytime, and
perform their household duties at night. This means baking and cooking and

stewing, and washing and ironing and mending for the hired men as well as
the family. Incidentally they raise children. No one person could be hired to
do this work. They do it for love, but we believe there is no insurmountable
obstacle in the way of getting both love and justice; we believe that love
and injustice are irreconcilable,—and if we must choose between them, my
advice is to exact justice and take a chance on love.
“To wife’s services, 40 years at $3 per week (worth $5),
allowing for clothing, which she makes herself and which
seldom equals and rarely exceeds $30 a year, about        $30,000
To $500 and interest, 40 years, about                       6,000
Total                                                     $36,000
“Would the whole estate have been more than this wife was entitled to?
“A bride was presented by her uncle with $2000, with which the thrifty
bridegroom bought sheep. It proved a profitable investment, and in time
they were well-to-do. At the expiration of fifty years of matrimony and
mutual toil (which included the rearing of six children) the husband died.
By his last will and testament he gave to his beloved wife two thousand
dollars in cash, or her dower interest in his real estate. The wife took the
cash. Her original two thousand dollars for fifty years then amounted to
about $60,000.
“This shows that a wife may be considered to be a good investment.
* * * * * * *
“A clerk in a delicatessen store in a large city married a German
governess. They started a similar store of their own and lived in the rear.
The wife did the housework and the cooking and baking for the store, and
between times waited on customers. They were frugal and prospered. After
twenty years the husband died. The wife naturally thought she was entitled
to the property, at least a portion of it. But the husband had made a will
prior to his marriage, whereby he devised his property to his brothers and
sisters.”
* * * * * * *
“The staple argument of the opponents of equal laws for men and
women is that wives are privileged in that they can do with their own as
they like, while the husbands cannot. But is the property the husband’s any
more than the wife’s when they accumulate it jointly? Up to the

marriageable age girls earn nothing; after marriage their services belong to
their husbands. Where is the opportunity to accumulate property which
shall be their very own in the eyes of the law, with which they may do as
they like? What provision can they make for possible incapacity and certain
old age if they live?”
Will of a Chinaman
There was filed in the Surrogate’s Office of Queens County, New York,
on October 1, 1910, what the newspapers refer to as the queerest instrument
ever recorded in New York City. The testator was John Ling, a Chinaman,
of Woodbridge, New Jersey.
The original will was probated in Middlesex County, New Jersey, but as
Ling was the owner of considerable real estate in Queens County, before
settlement could be made an exemplified copy of the will had to be filed
there.
It appears that John Ling, Jr., a son of the deceased, had taken an Irish
bride, much against the will of his father. The Chinaman was enraged, and
talked long and earnestly with his son upon the subject. But to no avail. The
young man refused to leave his Irish bride. When the old man died, he left
the following will:
“First, I leave and bequeath to John Ling, my son, the sum of $1. With
the said sum of $1, or 100 cents, I wish that he would purchase a rope
strong and long enough to support his Irish wife; the said sum of $1 to be
paid six months after my decease by my wife, her heirs or executors.
“Secondly, I leave and bequeath to my wife, Mary Ling, all property,
whether in America or England, that I may be possessed of, during her
natural life; and at her death said property is to be equally divided between
Samson and Mary Ling, son and daughter of John and Mary Ling; and
should neither Samson nor Mary survive to come in possession of the said
property now belonging to John and Mary Ling, the property is then to
descend unto John Ling, the son of Joseph Ling, my nephew, now residing
in Europe, with the exception of the $1 to be paid to my son, John Ling.”
Two Hundred Dollars for a Husband
According to the New York Sun, an attractive young German woman of
Washington, D.C., walked into a newspaper office in that city on October

11, 1910, and requested the insertion of the following advertisement:
“ ‘Young woman, fairly wealthy, from foreign country, desires to meet at
once some poor young man. Object, matrimony.’
“She gave her name as Eugenie Adams, but admitted that this was an
assumed name. She said she was willing to give her prospective husband a
bonus of $200. She explained that her uncle, who lived in Germany, had
named her as the beneficiary in his will, provided she married in a week.
“ ‘You see it is this way,’ she explained with a German accent, ‘my old
uncle is very eccentric. He lives in the Fatherland, where all my people are.
He has named me the beneficiary of his will if I am married by a week from
to-day. I am very poor. I want the money. I plan to get married in order to
obtain it. I will pay any young man $200 to marry me.
“ ‘But I will be no trouble to him,’ she continued. ‘I will get a divorce
from him at once and never see him again. I do not want to remain married.
I only want to return to Germany at once with my marriage papers. Could a
man make $200 in an easier way?’
“She declined to give the amount of the legacy she expected to obtain
through her marriage.”
The Result
The St. Louis Times in a recent editorial comments on the “Two-
hundred-dollar Husband,” as follows:
“We have been much interested in a story which has been telegraphed
from Washington, and which relates the circumstances under which a
presentable fraulein bought a husband, in order that she might inherit an
estate—which was willed her on the condition that she marry within a given
time.
“She appears to have wanted the estate badly, though the idea of having
a husband did not appeal to her at all. Perhaps there was a ruddy faced
Heine at home with whom she had danced in the old days, and who still
held her heart in thrall. Be that as it may—as Laura Jean Libbey would say
—she married her emergency husband in Washington only because she had
to, in order to get the estate.
“She did not wish ever to see her husband again, and when a sailor
appeared in response to her advertisement, she rather liked the looks of him

—for the occasion at hand—but decided, wisely, that he would not do,
because ‘he travelled around the world, and she might see him again.’ She
finally decided in favor of one Harry Oliver Brown, who wore a flowing
sandy mustache, and a celluloid collar, and carried a walking-stick. We
should have thought the flowing sandy mustache would have been enough,
though we have no objection to the celluloid collar and the walking-stick, if
they be thought to possess a corroborative value.
“And so the two were married, and Mrs. Brown gave her hired husband
$200 and bade him good-by and left, without even saying she would hurry
back, and boarded a ship for the Fatherland, where the estate was—and,
presumably, is.
“We have related this quaint fable because it seems to possess a valuable
idea for those who contemplate matrimony, not because they consider
themselves fitted for it in any way, but because they feel they ‘have to get
married’—so much the slave to public opinion are many estimable young
people.
“If the thing has to be done, we commend the method of Mrs. Harry
Oliver Brown. A sandy mustache, a celluloid collar, and a walking-stick can
always be had for a song—and there is not a very heavy percentage of
sailors.”
Knew her Disposition
It is recorded of an old English farmer, that, in giving instructions for his
will, he directed a legacy of one hundred pounds be given to his widow.
Being informed that some distinction was usually made in case the widow
married again, he doubled the sum; and when told that this was quite
contrary to custom, he said, with heartfelt sympathy for his possible
successor, “Aye, but him as gets her’ll deserve it.”
Clothes on a Hickory Limb
The will of Charles C. Dickinson, former president of the Carnegie Trust
Company, who died a few months ago, contains a bequest of $4000 for the
education of his son Charles, at Cornell, with the strange stipulation that the
son shall forfeit this allowance if he goes “to or upon Cayuga Lake.”
The lake is used by the Cornell crews and by students for canoeing and
sailing.

To a nephew he leaves $2000 for educational purposes, with the same
restrictions regarding Cayuga Lake.
Sarcastic Will
A British sailor requested his executors to pay to his wife one shilling,
wherewith to buy hazelnuts, as she had always preferred cracking nuts to
mending his stockings.
A Contrite Husband
J. Withipol of Walthamstow, Essex County, England, left his landed
estates to his wife, “trusting, yea, I may say, as I think, assuring myself, that
she will marry no man, for fear to meet with so evil a husband as I have
been to her.”
Aunt Lunky’s Will
The author has sought with little success for wills which would portray
the character of the negro race, although the aid of Mr. Booker T.
Washington was enlisted in this behalf. One, however, is offered:
Aunt Lunky was a negro servant and resided in Jacksonville, Illinois. For
several generations, she had lived with the same family and had been a
party to all household duties and functions during that period: she made her
will, and her savings, some two hundred and fifty dollars, she left to “little
Billie.” “Little Billie” was the great-grandson of her employer, and the pet
of the household: in order that there might be no mistake in identifying the
legatee, a picture of the baby boy was securely attached to the testament.
Will of the Duchesse de Praslin
By her will made in 1784, this testatrix, strangely enough, disinherited
her own children, being falsely persuaded that her husband had substituted
for them others whom he had had by an actress. She made her legatees the
grandchildren of the Prince de Soubise, whom she did not even know. Her
will was contested, and set aside. It contained another singular bequest—
that by which she left to her husband a model of the Cheval de Bronze (the
equestrian statue of Henri IV. on the Pont Neuf).
Must ever Pray
Not long ago an Italian nobleman left all his money, which amounted to
about $50,000, to his wife, “to be disposed of according to her own ideas,”
provided she entered a religious order and spent the rest of her life praying
for the repose of his soul. If she refused the conditions, the money went to
the order direct, and she got nothing.
The poor woman is now fighting the will in court, and there is said to be
some prospect that the estate will be divided and one-half, or at least a life
interest in the income, given to her. This, however, can be done only by
compromise.
The reason for this strange condition is said to have been revenge. The
wife had a lover, and the husband did not discover the fact until his
last sickness, when she neglected previous precautions and he learned of
her flirtations. The husband was also afraid that she would marry her lover,
and is said to have told his lawyer that he would fix things so that the
scoundrel could not have the benefit of his money, even if he did enjoy the
affections of his wife.
A Cold World
Ellen H. Cooper, West Somerset Street, Philadelphia, died recently.
Pathos and worldly wisdom are mingled in her will. She wrote the
instrument with her own hand. It follows in part:
“All the money and furniture I have has been saved through my earnings
and hard work, therefore, I wish my two sons, John W. Cooper and Bernard
M. Cooper, to follow to the letter my wishes.
“My one real anxiety has been their future after my death. They cannot
now realize what a lonely life theirs will be without home or parents, for I
know, except one has money, there is no one to care what becomes of one.
Therefore I have saved for one purpose, that if either, or both, live to be old
and unable to work you may find a home and pay so much to be kept the
rest of their lives. There will be enough left to clothe you. All I am
possessed of I want put out at interest. I do not want one cent of it spent
otherwise, excepting what it takes to pay my funeral expenses. Remember,
dear boys, this is a cold world and I would long since have been glad to lay
down my burden had it not been for my love for you.”
Beautiful Sentiments to Wives
As an expression of controlling impulses and ideas, the will has ever
been associated with the home and family life. Some of the purest and
sweetest sentiments of the human heart are often contained in these legal
muniments. They are often the permanent repositories of the loftiest
feelings of conjugal and domestic affection. More than fifty per cent of the
wills made bequeath the bulk of the estate, absolutely or for life, to the
surviving spouse.
A beautiful expression of this holy sentiment of affection is found in the
will of John Starkey, probated in 1861. This testator says: “The remainder
of my wealth is vested in the affection of my dear wife, with whom I leave
it, in the good hope of resuming it more pure, bright and precious, where
neither moth nor rust doth corrupt, and where there are no railways or
monetary panics or fluctuations of exchange, but steadfast, though
progressive and unspeakable riches of glory and immortality.”
The following is another example of solicitude for a devoted wife.
Sharon Turner, the eminent author of the “History of the Anglo-Saxons,”
dying in his eightieth year, in 1847, left this testimonial to his wife, who
had died before him: “It is my comfort to have remembered that I have
passed with her nearly forty-nine years of unabated affection and connubial
happiness, and yet she is still living, as I earnestly hope and believe, under
her Saviour’s care, in a superior state of being.” He was anxious that her
portrait, which he directed should be painted and bequeathed, should
correctly represent her. He then adds: “None of the portraits of my beloved
wife give any adequate representation of her beautiful face, nor of the sweet
and intellectual and attractive appearance of her living features and general
countenance and character.”
Kindness to Widows
Testators in the present day frequently and ungallantly leave property to
their widows only so long as they shall remain unmarried. In looking
through some of the wills of the time of Henry VII., we do not find such a
condition attached. There are many instances to be found, however, of the
husband’s affectionate care for the future comfort of his wife. To quote two
or three: First, from the will of William Parker: “Also I make Master John
Aggecombe, Alderman of Oxford, my overseer, to se my last will
performed; and I geve to hym for his labour my best crymsyn gowne so that
he be frendly to Alice my wife.” In the will of Robert Offe, of Boston,
Lincolnshire, after appointing Master Thomas Robynson and Master John
Robynson overseers, he goes on to say: “And I beseche you, maisters both,
that ye be good frends unto my wyf, and that ye will help her.” William
Holybrande, gentleman citizen and “tailler” of London, bequeaths to each
of his executors, William Bodley and William Grove, for their labor, £5
sterling, and “to be goode and kynde to my wyfe.” He appoints as overseer,
“Robert Joyns, my cousin, one of the gentleman ushers of the chambre of
our Sovaigne Lorde the Kynge,” and bequeaths to him £5 sterling “for his
labour, and that he may help my wyfe in all her troubill, if any shall happen
to her here after.” He also gives and bequeaths “to Roger Delle, my servant,
so that he be lovyng and gentill to my wyfe, and give a trewe accompte for
such besynese as he hath reconyng of, £5 sterlinge.” These three wills were
all proved in 1505.
Would not be Good
In 1772, a gentleman of Surrey, England, died, and his will being opened
was found to contain this peculiar clause, “Whereas, it was my misfortune
to be made very uneasy by ——, my wife, for many years from our
marriage, by her turbulent behavior, for she was not content to despise my
admonitions, but she contrived every method to make me unhappy; she was
so perverse in her nature that she would not be reclaimed, but seemed only
to be born to be a plague to me; the strength of Samson, the knowledge of
Homer, the prudence of Augustus, the cunning of Pyrrhus, the patience of
Job, the subtlety of Hannibal and the watchfulness of Hermogenes could
not have been sufficient to subdue her; for no skill or force in the world
would make her good; and as we have lived separate and apart from each
other for eight years, and, she having perverted her son to leave and totally
abandon me, therefore, I give her a shilling.”
Must remain at Home
The last will and testament of Lawrence Engler was admitted to probate
September 19, 1910, at Columbus, Ohio. It disposes of an estate valued at
$10,000. He was killed in a recent wreck on the Hocking Valley Railroad
near Toledo.

He provides in his will that his widow and their children be given the
proceeds resulting from the rent of his property and that they all must
remain at home. When they leave, they forfeit all rights to the income.
So long as they live together they are to share the income, but when one
leaves he loses his interest.
This arrangement is to remain during the life of all, but no provision is
made for the disposal of the remainder.
The will is peculiar in another way. The testator, after its execution, took
the liberty of striking out some of the provisions without having the
amendments witnessed. He failed to make a codicil, but does say that he did
the scratching himself.
Danger in Mutual Wills
The wills of Mrs. Mary Louise Woeltge and Professor Albert Woeltge
were filed in the Probate Court at Stamford, Connecticut, on September 20,
1910, and they reveal a somewhat unusual situation. Professor Woeltge was
the first to pass away at Walpole, New Hampshire, on September 12th. His
wife died there a day later. Both left wills executed April 11, 1895.
Professor Woeltge left all his estate to his wife and appointed her sole
executrix. Mrs. Woeltge by her will left all her property to her husband.
Professor Woeltge inserted a clause by way of explanation to his
nephew, Albert A. Woeltge, and his niece, Lillie Woeltge, both of New
York, of this disposition of the estate. It was, in effect, that the money by
which he acquired the property disposed of in the will came most, if not all
of it, from his wife or her mother.
Professor Woeltge left two letters, one addressed to his wife and the
other to his niece and nephew. The letter to his wife carried a direct
expression of desire that on her death all the money he left her go to the
children of his brother William, “that they might know that I loved them
best after you.” The question arises as to who will get the property.
The Worst of Women
Henry, Earl of Stafford, who followed the fortunes of his royal master
James II., and attended him in his exile to France, married there the
daughter of the Duc de Grammont, at the end of the seventeenth century.
The marriage was a most unhappy one, and, after fourteen years’ endurance
of the disgraceful conduct of his wife, he wrote as follows in his will:
“To the worst of women, Claude Charlotte de Grammont, unfortunately
my wife, guilty as she is of all crimes, I leave five-and-forty brass
halfpence, which will buy a pullet for her supper. A better gift than her
father can make her; for I have known when, having not the money, neither
had he the credit for such a purchase; he being the worst of men, and his
wife the worst of women, in all debaucheries. Had I known their characters
I had never married their daughter, and made myself unhappy.”
Took the Son’s Part
Sir Robert Bevill, Knight, who held an official position at court under
James I., was the representative of an old Hunts family, and held by entail
the estates of Chesterton in that county. Dying in 1635, his will, which it
appears was made within a very short time of his death, was proved, and in
it occur the following clauses relative to his wife and his daughter’s
husband, with whom he died at enmity. These vindictive behests, be it
observed, are preceded by a very devout and godly preface, bequeathing his
soul “into the hands of its Maker, stedfastly believing in, and by the merits
of, our Lord and Saviour Jesus Christ, to obteyne free pardon and
forgiveness of al my sinnes, and at the last day to have and receive a
glorious resurrection.”
Immediately follows: “I give and bequeath to my son-in-law, Sir John
Hewell, Baronet, tenn shillings and noe more, in respect he stroke and
causelessly fought with mee.
“Item: I give unto my wyfe tenn shillings in respect she took her sonnes
part against me, and did anymate and comfort him afterwards. These will
not be forgotten.” Furthermore, the testator, in resentment against his said
wife—“inasmuch as she hath not only deserted mee, but hath taken into her
own possession all her own goods, and hath disposed of them at her own
pleasure”—declares his determination “to make no ampler provision for
her.”
He concludes this vindictive will by leaving all his large estates to his
second son.
This will is not exactly of the class alluded to by Steele in one of his
plays, where he makes one of the characters, a widow, remark, “There is no
will of an husband so cheerfully obeyed as his last.”
Accused of every Crime
John Parker, a bookseller, living in Old Bond Street, served his wife in
the following manner, leaving her no more than fifty pounds, and in the
following words:
“To one Elizabeth Parker, whom through fondness I made my wife,
without regard to family, fame, or fortune, and who in return has not spared
most unjustly to accuse me of every crime regarding human nature, except
highway robbery, I bequeath the sum of fifty pounds.”
Between the Lines
A rich man, making his will, left legacies to all his servants except his
steward, to whom he gave nothing, on the plea that, “having been in my
service in that capacity twenty years I have too high an opinion of his
shrewdness to suppose he has not sufficiently enriched himself.”
Menial Service Required
A year or two ago, a Russian gentleman, living at Odessa, bequeathed
four million roubles to his four nieces, but they were to receive the money
only after having worked for a year as washerwomen, chambermaids or
farm servants. These conditions were carried out, and while occupying such
humble positions, it is gratifying to learn that they received over eight
hundred and sixty offers of marriage.
No Mustaches
The will of Mr. Henry Budd, which came into force in 1862, declared
against the wearing of mustaches by his sons, in the following terms: “In
case my son Edward shall wear mustaches, then the devise hereinbefore
contained in favour of him, his appointees, heirs, and assigns of my said
estate called Pepper Park, shall be void; and I devise the same estate to my
son William, his appointees, heirs, and assigns. And in case my said son
William shall wear mustaches, then the devise hereinbefore contained in
favour of him, his appointees, heirs, and assigns of my said estate called
Twickenham Park, shall be void; and I devise the said estate to my said son
Edward, his appointees, heirs, and assigns.”

Will of William Pym
The will of William Pym, of Woolavington, Somerset, gent., is worth
citing for its originality. It bears date January 10, 1608.
After various charitable bequests, the last of which specifies the sum of
twelvepence to the church at Wells, he proceeds:
“I give to Agnes, which I did a long time take for my wyfe—till shee
denyd me to be her husband, all though wee were marryd with my friends’
consent, her father, mother, and uncle at it; and now she swareth she will
neither love mee nor evyr bee perswaded to, by preechers, nor by any other,
which hath happened within these few yeres. And Toby Andrewes, the
beginner, which I did see with mine own eyes when hee did more than was
fitting, and this by means of others their abettors. I have lived a miserable
life this six or seven yeres, and now I leve the revenge to God—and tenn
pounds to buy her a gret horse, for I could not this manny yeres plese her
with one gret enough.”
Two years after writing this bitter record of his wrongs, William Pym,
gent., gave up the ghost, and his last wishes were faithfully carried out by
his two executors.
Contrary to Roosevelt’s Idea
The malevolence of some men is manifested in their deaths, as well as in
their lives. A certain wealthy man left this provision in his will: “Should my
daughter marry and be afflicted with children, the trustees are to pay out of
said legacy, Ten Thousand Dollars on the birth of the first child, to the ——
Hospital; Twenty Thousand Dollars, on the second; Thirty Thousand
Dollars, on the third; and an additional Ten Thousand Dollars on the birth of
each fresh child, till the One Hundred and Fifty Thousand Dollars is
exhausted. Should any portion of this sum be left at the end of twenty years,
the balance is to be paid to her to use as she thinks fit.” This item would, no
doubt, interest our late President, Theodore Roosevelt.
Wife’s Desertion Rewarded
A certain Glasgow doctor died some ten years ago, and left his whole
estate to his sisters. In his will appeared this unusual clause: “To my wife,
as a recompense for deserting me and leaving me in peace, I expect the said
sister, Elizabeth, to make her a gift of ten shillings sterling, to buy her a
pocket handkerchief to weep after my decease.”
Would not wear the Cap
A husband left his wife sixty thousand dollars, to be increased to one
hundred and twenty thousand dollars, provided she wore a widow’s cap
after his death. She accepted the larger amount, wore the cap for six
months, and then put it off. A lawsuit followed, but the judge gave the
widow a judgment and stated that the word “always” should have been
inserted. Shortly after the rendition of the judgment, the widow entered into
the state of matrimony.
Strange Requirement as to Marriage
In 1805, Mr. Edward Hurst left a very large fortune to his only son on
condition that the latter should seek out and marry a young lady, whom the
father, according to his own statement, had, by acts for which he prayed
forgiveness, reduced to the extremity of poverty; or failing her, her nearest
unmarried female heir. The latter, by the irony of fate, turned out to be a
spinster of fifty-five, who, professing herself willing to carry out her share
of the imposed duty, was duly united to the young man, who had just
reached his majority.
A Happy Wife
Many wills have reference to the domestic felicity, or otherwise,
experienced by those who executed them. As an example of the former, we
may give the following passage from the testament of Lady Palmerston, an
ancestress of the celebrated Premier. Referring to her husband, she says,
“As I have long given you my heart and tenderest affections and fondest
wishes have always been yours, so is everything else that I possess; and all
that I can call mine being already yours, I have nothing to give but my
heartiest thanks for the care and kindness you have at all times shown me,
either in sickness or in health, for which God Almighty will, I hope, reward
you in a better world.” Then, for “form’s sake,” follow several specific
bequests.
Must walk Barefooted
A wife who domineers over her husband sometimes discovers that she
has made a serious mistake. Ten years ago the London (England)
newspapers reported that a publican (housekeeper) took a curious revenge
on a nagging wife, whose sharp tongue had given him many bad days while
he lived. When his will was read, she learned that in order to receive any
property she must walk barefooted to the market-place each time the
anniversary of his death came around. Holding a candle in her hand, she
was there to read a paper confessing her unseemly behavior to her husband
while he lived, and stating that had her tongue been shorter, her husband’s
days would probably have been longer. By refusing to comply with these
terms she had to be satisfied with “twenty pounds a year to keep her off the
parish.”
Anticipating the Past
It was Mrs. Malaprop in Sheridan’s delightful comedy, “The Rivals,”
who declined to “anticipate the past.”
Mr. John B. Luther, whose will is given below, certainly had the past in
mind when the instrument was drawn; it seems clear that he desired to
“anticipate the past” in so far as a provision for forgotten widows and
children was concerned. The testator formerly lived in Fall River,
Massachusetts, but his will was probated in San Francisco; he left an estate
valued at more than $100,000.
“I do hereby declare that I am not married and that I have no children. I
have noticed, however, the facility with which sworn testimony can be
procured and produced in support of the claims of alleged widows and
adopted children, and the frequent recurrence of such claims in recent years.
I therefore make express provision in this my last will as follows: I give and
bequeath to such person as shall be found, proved, and established to be my
surviving wife or widow, whether the marriage be found to have taken place
before or after the execution of this will, the sum of $5, and to each and
every person who shall be found, proved, and established to be my child by
birth, adoption, acknowledgment, or otherwise, and whether before or after
the execution of this will, the sum of $5, and I declare that I do intentionally
omit to make for any of the persons in this paragraph referred to any other
or further provision.”

2
ANIMALS
“Kind hearts are more than coronets,
And simple faith than Norman blood.”
Lower Animals have Souls
The Peoples Pulpit, a publication issued by the “Brooklyn Tabernacle,”
in a recent issue under the title, “What is a Soul?” says:
“Thus we see why it is that the Scriptures speak of ‘souls’ in connection
with the lower animals. They, as well as man, are sentient beings or
creatures of intelligence, only of lower orders. They, as well as man, can
see, hear, feel, taste and smell; and each can reason up to the standard of his
own organism, though none can reason as abstrusely nor on as high a plane
as man. This difference is not because man has a different kind of life from
that possessed by the lower animals; for all have similar vital forces, from
the same fountain or source of life, the same Creator; all sustain life in the
same manner, by the digestion of similar foods, producing blood, and
muscles, and bones, etc., each according to his kind or nature; and each
propagates his species similarly, bestowing the life, originally from God,
upon his posterity. They differ in shape and in mental capacity.
“Nor can it be said that while man is a soul (or intelligent being) beasts
are without this soul-quality or intelligence, thought, feeling. On the
contrary, both man and beast have soul-quality or intelligent, conscious
being. Not only is this the statement of Scripture, but it is readily
discernible as a fact, as soon as the real meaning of the word ‘soul’ is
comprehended, as shown in the foregoing. To illustrate: Suppose the
creation of a perfect dog; and suppose that creation had been particularly
described, as was Adam’s, what difference of detail could be imagined? The
body of a dog created would not be a dog until the breath of life would be
caused to energize that body; then it would be a living creature with
sensibilities and powers all its own—a living soul of the lower order, called
dog, as Adam, when he received life, became a living creature with
sensibilities and powers all his own—a living soul of the highest order of
flesh beings, called man.”

A Heaven for Beasts
Bishop Butler and Theodore Parker offered the suggestion that there is a
future for beasts, and a poem has been dedicated “To my Pony in Heaven,”
by Mr. Sewell of Exeter College.
Goldfish and Flowers
A certain lady left seventy pounds a year for the maintenance of three
goldfish, which were to be identified as follows: “one is bigger than the
other two, and these latter are to be easily recognized, as one is fat and the
other lean.” She also made provision for flowers to be placed upon the
graves of the goldfish.
Bequest to a Fish
We have heard of lucky dogs often enough—instances of lucky fish are
more rare, yet we can tell of two carps who have been testamentarily
benefited. One is, or rather was, too well known to the tourist who has seen
Fontainebleau, to need more than a passing mention, as he only paid the
debt of nature a few years ago, having occupied the royal pond, it is said,
more than a century, probably in order to bear out the proverb which gives
long lives to annuitants; the other was the mute but valued friend of the
Count of Mirandola, who had been in his intimacy since 1805, dwelling in
an elegant antique piscina, shaded by tropical plants, in an oriel of his salon
at Lucca, where he was still living as late as 1835, and may be there still.
The count, dying in 1825, left him a handsome annuity, with special
directions for his treatment.
Bequest to a Parrot
A rich and eccentric widow, whose will was proved in London some
years ago, left at her death a parrot, whom, “having been her faithful
companion for 24 years,” she left in charge of an appointee, with an annuity
of one hundred guineas, the existence and identity of the bird to be proved
twice a year, and all payments to be withheld from the moment the
feathered pensioner ceased to be produced.
Polly wants a Contest
In July last, at Washington, D.C., a will contest was commenced, which
involves the life or death of a parrot.
It appears Mrs. Ottilie Stock left a will, by the terms of which her parrot
was doomed to Oslerization by the process of chloroform. Her daughter,
Elizabeth Stock, questioned the validity of the will. It seems that Elizabeth
was left one dollar in money, two kitchen chairs, two pails and one broom;
hence, the will contest.
Mrs. Stock, the testatrix, was the mother of one of the men who went to
his death on the ill-fated battleship Maine, in the harbor of Havana.
What behavior induced the death sentence on Polly, is not known.
Will of Mrs. Elizabeth Hunter
This lady, a resident of London, having for many years enjoyed the
society of a pet parrot, and being anxious as to the fate of her favorite after
her death, bequeathed an annuity of £200, to be paid quarterly, so long as
the parrot should live and its identity be satisfactorily proved. This annuity
of £50 quarterly was left in the first instance to Mrs. Mary Dyer, of Park
Street, Westminster, with a proviso that should that trustee die before the
parrot, the sum should continue to be paid to some “respectable female who
should not be a servant.” One would think the testatrix must have had in her
mind the story of Gay’s cat—“Nor cruel Tom nor Susan heard!” Moreover,
it was to dwell in a cage that was to cost not less than £20, and which was
to be “high, long, large and roomy”; the bird also was “not to be taken out
of England.” This will was probated in 1813.
A Caged Annuitant
An elderly spinster, by name Caroline Hunter, wishing to provide for a
favorite parrot, bequeathed the bird with a legacy of one thousand pounds to
a widow, a friend of hers, giving her power to transfer both the pet and the
money to any third person, provided it were to one of the female sex, who
would undertake not to leave England. There was a special bequest of
twenty guineas to provide a very high and handsome cage, into which the
parrot was to be removed, and the executors were charged, in the event of
the charge and bequest being refused by the widow, to see that the parrot
was committed to the care of some trustworthy, respectable person. The will
concludes: “I will and desire that whoever attempts to dispute this my last
will and testament, or by any means tries to frustrate these my intentions,
shall forfeit whatever I have left him, her, or them. And if any one to whom
I have left legacies attempt to bring any bill or charge against me, it is my
will and desire they shall forfeit whatever legacy I may have left them. I
owe nothing to any one—many owe me gratitude and money, but none have
paid me either.”
Horses to be Shot
Frederick Christian Winslow was born in 1752; he was Councillor of
State, professor of surgery, and knight of the order of Danebrog. His works
on surgery have been translated into almost all the languages of Europe. He
was grand-nephew of the celebrated anatomist, James Benignus Winslow.
He died at Copenhagen, June 24, 1811.
His will disposes of property amounting to 37,000 crowns, but contains
only one clause which can be considered singular, viz.: that which orders
that his carriage-horses should be shot, lest after his death they come to be
ill-treated by any person who might buy them.
Will in Favor of a Horse
Among the archives of Toulouse exists the registry of a singular will,
made by a countryman of the immediate environs in 1781. This peasant,
who was the owner of a considerable sum of money, besides his house and
the land surrounding it, had no children, but had attached himself to a horse
he always rode, though it does not seem to have been particularly comely in
appearance. His affection for this animal was very constant; for, finding
himself seriously ill, and having decided on making his will, he disposed of
all his property in favor of the four-footed favorite in these terms: “I declare
that I appoint my russet cob my universal heir, and I desire that he may
belong to my nephew George.”
As may be supposed, the will was contested; but, strange to say, it was
ultimately confirmed. An experienced jurisconsult, by name Claude Serres,
professor of “droit civil” at Montpellier, has cited the case, and gives the
reason for the decision arrived at, viz.: “That the will being pronounced
valid, the succession of the testator was adjudicated to the nephew whom he
had designated as proprietor of the horse, because it was ruled that the
simplicity of the rustic should secure to him the execution of his last will,
and that, having named his nephew as legatee of the horse, he intended he
should have it endowed with the bequests he had bestowed upon it.”
Horses as Legatees
A curious will contest was instituted in January, 1911, in the Hungarian
courts. This contest turns upon the legality of the will of an eccentric
nobleman, Emile von Bizony, brother of a well-known deputy, who left all
his real and personal property, amounting to about $200,000, to be used in
behalf of his twelve draught horses.
As executor of his will, he named the Society for the Protection of
Animals at Budapest, stipulating that the interest on his estate should be
devoted to the care of his twelve draught horses, and that upon the death of
one of them another aged horse was to be taken in and cared for, so that the
number of twelve might always be maintained.
Herr von Bizony was sixty-five years of age, a confirmed misogynist,
and at odds with all his relatives, who were naturally amazed at the contents
of the will. His brother, the Deputy, Herr Alusins von Bizony, disputed the
will. Negotiations were made with the above-mentioned society, and
$20,000 was offered it, but refused, the society bringing an action against
the Bizony family for the retention of the property.
Two Thousand Dollars for a Horse
An Irishman, James Gilwee, died in 1907 in Carondelet, a subdivision of
the city of Saint Louis. By his will, filed in the Probate Court of the city of
Saint Louis, he left two thousand dollars in trust, the revenue from which
was to be used in the support and comfort of a favorite horse, “Tony.” The
children of the deceased carefully respected the wishes of their father, and
the horse was shipped to Bloomington, Illinois, where corn is plentiful and
meadow grass is blue. There the horse received every attention until his death,
which occurred quite recently. The two thousand dollars was thereupon
divided among the heirs.
Domestic Pets
Mrs. Elizabeth Balls, late of Park Lodge, Streatham, England, whose
will was proved on the 5th of November, 1875, bequeathed to the Cancer
Hospital, £2,000 Consols; to the Institution for the Deaf and Dumb, Old
