Video Coding with H.264/AVC:
Tools, Performance, and Complexity
Jörn Ostermann, Jan Bormans, Peter List,
Detlev Marpe, Matthias Narroschke,
Fernando Pereira, Thomas Stockhammer, and Thomas Wedi
Abstract
H.264/AVC, the result of the collaboration between the ISO/IEC Moving Picture Experts Group and the ITU-T Video Coding Experts Group, is the latest standard for video coding. The goals of this standardization effort were enhanced compression efficiency and a network-friendly video representation for interactive applications (video telephony) and non-interactive applications (broadcast, streaming, storage, video on demand). H.264/AVC provides gains in compression efficiency of up to 50% over a wide range of bit rates and video resolutions compared to previous standards. The decoder complexity is about four times that of MPEG-2 and two times that of MPEG-4 Visual Simple Profile. This paper provides an overview of the new tools, features, and complexity of H.264/AVC.
Index Terms—H.263, H.264, JVT, MPEG-1, MPEG-2, MPEG-4, standards, video coding, motion compensation, transform coding, streaming
1. Introduction
The new video coding standard Recommendation
H.264 of ITU-T also known as International Stan-
dard 14496-10 or MPEG-4 part 10 Advanced Video
Coding (AVC) of ISO/IEC [1] is the latest standard in a
sequence of the video coding standards H.261 (1990) [2],
MPEG-1 Video (1993) [3], MPEG-2 Video (1994) [4], H.263
(1995, 1997) [5], MPEG-4 Visual or part 2 (1998) [6]. These
previous standards reflect the technological progress in
video compression and the adaptation of video coding to
different applications and networks. Applications range
from video telephony (H.261) to consumer video on CD
(MPEG-1) and broadcast of standard definition or high
definition TV (MPEG-2). Networks used for video commu-
nications include switched networks such as PSTN
(H.263, MPEG-4) or ISDN (H.261) and packet networks like
ATM (MPEG-2, MPEG-4), the Internet (H.263, MPEG-4) or
mobile networks (H.263, MPEG-4). The importance of new
network access technologies like cable modem, xDSL,
and UMTS created demand for the new video coding stan-
dard H.264/AVC, providing enhanced video compression
performance in view of interactive applications like video
telephony requiring a low latency system and non-inter-
active applications like storage, broadcast, and streaming
of standard definition TV where the focus is on high cod-
ing efficiency. Special consideration had to be given to the
performance when using error prone networks like mobile
channels (bit errors) for UMTS and GSM or the Internet
(packet loss) over cable modems or xDSL. Comparing the
H.264/AVC video coding tools like multiple reference
frames, 1/4 pel motion compensation, deblocking filter or
integer transform to the tools of previous video coding
standards, H.264/AVC brought in
the most algorithmic discontinu-
ities in the evolution of standard-
ized video coding. At the same time,
H.264/AVC achieved a leap in cod-
ing performance that was not fore-
seen just five years ago. This
progress was made possible by
the video experts in ITU-T and
MPEG who established the Joint
Video Team (JVT) in December
2001 to develop this H.264/AVC
video coding standard.
H.264/AVC was finalized in
March 2003 and approved
by the ITU-T in May 2003.
The corresponding stan-
dardization documents
are downloadable from
ftp://ftp.imtc-files.org/jvt-experts and the reference software is available at http://bs.hhi.de/~suehring/tml/download.
Modern video communi-
cation uses digital video
that is captured from a
camera or synthesized
using appropriate tools like
animation software. In an
optional pre-processing
Figure 2. H.264/AVC in a transport environment: The network abstraction layer interface enables a seamless integration with stream- and packet-oriented transport layers such as H.320, H.324/M, MPEG-2 Systems, RTP/IP, TCP/IP, and file formats (from [7]).
Figure 1. Scope of video coding standardization: Only the syntax and semantics of the bitstream and its decoding are defined. The source video may be pre-processed before encoding, and the decoded video may be post-processed (including error recovery) before display.
Jörn Ostermann is with the Institut für Theoretische Nachrichtentechnik und Informationsverarbeitung, University of Hannover, Hannover, Ger-
many. Jan Bormans is with IMEC, Leuven, Belgium. Peter List is with Deutsche Telekom, T-Systems, Darmstadt, Germany. Detlev Marpe is with
the Fraunhofer-Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany. Matthias Narroschke is with the Institut für Theo-
retische Nachrichtentechnik und Informationsverarbeitung, University of Hannover, Appelstr. 9a, 30167 Hannover, Germany, narrosch@tnt.uni-
hannover.de. Fernando Pereira is with Instituto Superior Técnico - Instituto de Telecomunicações, Lisboa, Portugal. Thomas Stockhammer is
with the Institute for Communications Engineering, Munich University of Technology, Germany. Thomas Wedi is with the Institut für Theo-
retische Nachrichtentechnik und Informationsverarbeitung, University of Hannover, Hannover, Germany.
step (Figure 1), the
sender might choose to
preprocess the video
using format conversion
or enhancement tech-
niques. Then the en-
coder encodes the video
and represents the
video as a bit stream.
After transmission of the
bit stream over a com-
munications network,
the decoder decodes the
video which gets dis-
played after an optional post-processing step which might
include format conversion, filtering to suppress coding
artifacts, error concealment, or video enhancement.
The standard defines the syntax and semantics of the
bit stream as well as the processing that the decoder
needs to perform when decoding the bit stream into
video. Therefore, manufacturers of video decoders can
only compete in areas like cost and hardware require-
ments. Optional post-processing of the decoded video is
another area where different manufacturers will provide
competing tools to create a decoded video stream opti-
mized for the targeted application. The standard does not
define how encoding or other video pre-processing is per-
formed, thus enabling manufacturers to compete with their
encoders in areas like cost, coding efficiency, error
resilience and error recovery, or hardware requirements.
At the same time, the standardization of the bit stream
and the decoder preserves the fundamental requirement
for any communications standard—interoperability.
For efficient transmission in different environments, not only is coding efficiency relevant, but also the seam-
less and easy integration of the coded video into all cur-
rent and future protocol and network architectures. This
includes the public Internet with best effort delivery, as
well as wireless networks expected to be a major applica-
tion for the new video coding standard. The adaptation of
the coded video representation or bitstream to different
transport networks was typically defined in the systems
specification in previous MPEG standards or separate
standards like H.320 or H.324. However, only the close
integration of network adaptation and video coding can
bring the best possible performance of a video communi-
cation system. Therefore H.264/AVC consists of two con-
ceptual layers (Figure 2). The video coding layer (VCL)
defines the efficient representation of the video, and the
network abstraction layer (NAL) converts the VCL repre-
Figure 4. Generalized block diagram of a hybrid video decoder with motion compensation.
Figure 3. Generalized block diagram of a hybrid video encoder with motion compensation: The adaptive deblocking filter and intra-frame prediction are two new tools of H.264.
sentation into a format suitable for specific transport lay-
ers or storage media. For circuit-switched transport like
H.320, H.324M or MPEG-2, the NAL delivers the coded
video as an ordered stream of bytes containing start
codes such that these transport layers and the decoder
can robustly and simply identify the structure of the bit
stream. For packet switched networks like RTP/IP or
TCP/IP, the NAL delivers the coded video in packets with-
out these start codes.
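To illustrate the byte-stream format, the following sketch splits an Annex B byte stream at the 0x000001 start-code prefixes and reads the NAL unit type from the low five bits of the first payload byte. This is a minimal illustration only; a complete parser would also undo emulation-prevention bytes and handle malformed streams.

```python
def split_annexb_nal_units(stream: bytes):
    """Split an H.264 Annex B byte stream into NAL units by scanning for
    the 0x000001 start-code prefix (sketch; emulation-prevention bytes
    and malformed input are not handled)."""
    starts = []
    i = 0
    while i < len(stream) - 2:
        if stream[i] == 0 and stream[i + 1] == 0 and stream[i + 2] == 1:
            starts.append(i + 3)            # payload begins after the prefix
            i += 3
        else:
            i += 1
    units = []
    for k, s in enumerate(starts):
        end = starts[k + 1] - 3 if k + 1 < len(starts) else len(stream)
        payload = stream[s:end].rstrip(b"\x00")  # drop the extra zero of a 4-byte start code
        nal_unit_type = payload[0] & 0x1F        # low 5 bits of the NAL unit header
        units.append((nal_unit_type, payload))
    return units
```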
This paper gives an overview of the operation, performance, and hardware requirements of H.264/AVC. In Section
2, the concept of standardized video coding schemes is
introduced. In Section 3, we describe the major tools of
H.264/AVC that achieve this progress in video coding per-
formance. Video coder optimization is not part of the
standard. However, the successful use of the encoder
requires knowledge on encoder control that is presented
in Section 4. H.264/AVC may be used for different applica-
tions with very different constraints like computational
resources, error resilience and video resolution. Section 5
describes the profiles and levels of H.264/AVC that allow
for the adaptation of the decoder complexity to different
applications. In Section 6, we give comparisons between
H.264/AVC and previous video coding standards in terms
of coding efficiency as well as hardware complexity.
H.264/AVC uses many international patents, and Section 7
paraphrases the current licensing model for the commer-
cial use of H.264/AVC.
2. Concept of Standardized Video Coding Schemes
Standardized video coding techniques like H.263,
H.264/AVC, MPEG-1, 2, 4 are based on hybrid video cod-
ing. Figure 3 shows the generalized block diagram of such
a hybrid video encoder.
The input image is divided into macroblocks. Each
macroblock consists of the three components Y, Cr and
Cb. Y is the luminance component which represents the
brightness information. Cr and Cb represent the color
information. Because the human visual system is less sensitive to chrominance than to luminance, both chrominance signals are subsampled by a factor of 2 in the horizontal and vertical directions. Therefore, a mac-
roblock consists of one block of 16 by 16 picture elements
for the luminance component and of two blocks of 8 by 8
picture elements for the color components.
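As a small illustration of this structure, the following sketch extracts the three blocks of one macroblock from a 4:2:0 frame stored as separate numpy planes; the function and variable names are ours, not from the standard.

```python
import numpy as np

def get_macroblock(y_plane, cb_plane, cr_plane, mb_x, mb_y):
    """Extract one macroblock from a 4:2:0 frame given as separate numpy
    planes: a 16x16 luminance block and two 8x8 chrominance blocks."""
    y  = y_plane[16 * mb_y:16 * mb_y + 16, 16 * mb_x:16 * mb_x + 16]
    cb = cb_plane[8 * mb_y:8 * mb_y + 8, 8 * mb_x:8 * mb_x + 8]
    cr = cr_plane[8 * mb_y:8 * mb_y + 8, 8 * mb_x:8 * mb_x + 8]
    return y, cb, cr
```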
These macroblocks are coded in Intra or Inter mode.
In Inter mode, a macroblock is predicted using motion
compensation. For motion compensated prediction a dis-
placement vector is estimated and transmitted for each
block (motion data) that refers to the corresponding
position of its image signal in an already transmitted ref-
erence image stored in memory. In Intra mode, former
standards set the prediction signal to zero such that the
image can be coded without reference to previously sent
information. This is important to provide for error
resilience and for entry points into the bit streams
enabling random access. The prediction error, which is
the difference between the original and the predicted
block, is transformed, quantized and entropy coded. In
order to reconstruct the same image on the decoder side,
the quantized coefficients are inverse transformed and
added to the prediction signal. The result is the recon-
structed macroblock that is also available at the decoder
side. This macroblock is stored in a memory. Mac-
roblocks are typically stored in raster scan order.
With respect to this simple block diagram (Figure 3),
H.264/AVC introduces the following changes:
1. In order to reduce the block-artifacts an adaptive
deblocking filter is used in the prediction loop. The
deblocked macroblock is stored in the memory and
can be used to predict future macroblocks.
2. Whereas the memory contains one video frame in
previous standards, H.264/AVC allows storing multi-
ple video frames in the memory.
3. In H.264/AVC a prediction scheme is used also in Intra
mode that uses the image signal of already transmit-
ted macroblocks of the same image in order to pre-
dict the block to code.
4. The Discrete Cosine Transform (DCT) used in former
standards is replaced by an integer transform.
Figure 4 shows the generalized block diagram of the
corresponding decoder. The entropy decoder decodes
the quantized coefficients and the motion data, which is
used for the motion compensated prediction. As in the
encoder, a prediction signal is obtained by intra-frame or
motion compensated prediction, which is added to the
inverse transformed coefficients. After deblocking filter-
ing, the macroblock is completely decoded and stored in
the memory for further predictions.
In H.264/AVC, the macroblocks are processed in so-called slices, where a slice is usually a group of macroblocks processed in raster scan order (see Figure 5). In
special cases, which will be discussed in Section 3.6, the
Figure 5. Partitioning of an image into several slices.
processing can differ from the raster scan order. Five different slice types are supported: I-, P-, B-, SI-, and SP-slices. In an I-slice, all macroblocks are encoded in
Intra mode. In a P-slice, all macroblocks are predicted
using a motion compensated prediction with one refer-
ence frame and in a B-slice with two reference frames. SI-
and SP-slices are specific slices that are used for an effi-
cient switching between two different bitstreams. They
are both discussed in Section 3.6.
For the coding of interlaced video, H.264/AVC sup-
ports two different coding modes. The first one is called
frame mode. In the frame mode, the two fields of one
frame are coded together as if they were one single pro-
gressive frame. The second mode is called field mode. In
this mode, the two fields of a frame are encoded sepa-
rately. These two different coding modes can be selected
for each image or even for each macroblock. If they are
selected for each image, the coding is referred to as pic-
ture adaptive field/frame coding (P-AFF). Whereas MPEG-2 allows for selecting frame/field coding on a macroblock level, H.264 allows for selecting this mode on a vertical macroblock-pair level. This coding is referred to as
macroblock-adaptive frame/field coding (MB-AFF). The
choice of the frame mode is efficient for regions that are
not moving. In non-moving regions there are strong sta-
tistical dependencies between adjacent lines even though
these lines belong to different fields. These dependencies
can be exploited in the frame mode. In the case of moving
regions the statistical dependencies between adjacent
lines are much smaller. It is more efficient to apply the
field mode and code the two fields separately.
3. The H.264/AVC Coding Scheme
In this Section, we describe the tools that make H.264
such a successful video coding scheme. We discuss Intra
coding, motion compensated prediction, transform cod-
ing, entropy coding, the adaptive deblocking filter as well
as error robustness and network friendliness.
3.1 Intra Prediction
Intra prediction means that the samples of a macroblock
are predicted by using only information of already trans-
mitted macroblocks of the same image. In H.264/AVC,
two different types of intra prediction are possible for
the prediction of the luminance component Y.
The first type is called INTRA_4×4 and the second one
INTRA_16×16. Using the INTRA_4×4 type, the mac-
roblock, which is of the size 16 by 16 picture elements
(16×16), is divided into sixteen 4×4 subblocks and a pre-
diction for each 4×4 subblock of the luminance signal is
applied individually. For the prediction purpose, nine dif-
ferent prediction modes are supported. One mode is the DC-prediction mode, in which all samples of the current 4×4 subblock are predicted by the mean of the neighboring samples to the left and to the top of the current block that have already been reconstructed at the encoder and at the decoder side (see Figure 6, Mode 2). In addition
to DC-prediction mode, eight prediction modes each for a
specific prediction direction are supported. All possible
directions are shown in Figure 7. Mode 0 (vertical predic-
tion) and Mode 1 (horizontal prediction) are shown
explicitly in Figure 6. For example, if the vertical predic-
tion mode is applied all samples below sample A (see Fig-
ure 6) are predicted by sample A, all samples below
sample B are predicted by sample B and so on.
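A minimal sketch of these three modes follows, using the sample naming of Figure 6 (A–D above the block, I–L to its left). The integer rounding in the DC mode follows the standard's averaging; the directional modes 3–8 are omitted here.

```python
import numpy as np

def intra4x4_predict(mode, top, left):
    """Intra prediction of a 4x4 subblock from already reconstructed
    neighboring samples (sketch of modes 0-2 only). 'top' holds the four
    samples A-D above the block, 'left' the four samples I-L to its left,
    following Figure 6."""
    top, left = np.asarray(top)[:4], np.asarray(left)[:4]
    if mode == 0:                              # Mode 0: vertical
        return np.tile(top, (4, 1))
    if mode == 1:                              # Mode 1: horizontal
        return np.tile(left[:, None], (1, 4))
    if mode == 2:                              # Mode 2: DC
        dc = (int(top.sum()) + int(left.sum()) + 4) >> 3
        return np.full((4, 4), dc)
    raise NotImplementedError("directional modes 3-8 omitted in this sketch")
```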
Using the type INTRA_16×16, only one prediction
mode is applied for the whole macroblock. Four different
prediction modes are supported for the type
INTRA_16×16: Vertical prediction, horizontal prediction,
DC-prediction and plane-prediction. Plane-prediction uses a linear function of the neighboring sam-
ples to the left and to the top in order to predict the
current samples. This mode works very well in areas of a
gently changing luminance. The mode of operation of
these modes is the same as the one of the 4×4 prediction
modes. The only difference is that they are applied for the
whole macroblock instead of for a 4×4 subblock. The effi-
Figure 6. Three out of nine possible intra prediction modes for the intra prediction type INTRA_4×4: Mode 0 (vertical), Mode 1 (horizontal), and Mode 2 (DC, predicting all samples from Mean(A, B, C, D, I, J, K, L)). The samples M, A–H, and I–L are neighboring samples that are already reconstructed at the encoder and at the decoder side.
ciency of these modes is high if the signal is very smooth
within the macroblock.
The intra prediction for the chrominance signals Cb
and Cr of a macroblock is similar to the INTRA_16×16
type for the luminance signal because the chrominance
signals are very smooth in most cases. It is always performed on 8×8 blocks using vertical prediction, horizon-
tal prediction, DC-prediction or plane-prediction. All intra
prediction modes are explained in detail in [1].
3.2 Motion Compensated Prediction
In the case of motion compensated prediction, macroblocks
are predicted from the image signal of already transmit-
ted reference images. For this purpose, each macroblock
can be divided into smaller partitions. Partitions with
luminance block sizes of 16×16, 16×8, 8×16, and 8×8
samples are supported. In case of an 8×8 sub-macroblock
in a P-slice, one additional syntax element specifies if the
corresponding 8×8 sub-macroblock is further divided
into partitions with block sizes of 8×4, 4×8 or 4×4 [8].
The partitions of a macroblock and a sub-macroblock are
shown in Figure 8.
In former standards such as MPEG-4 or H.263, only blocks of the sizes 16×16 and 8×8 are supported. A displacement vector is estimated and transmitted for each block; it refers to the corresponding position of its image signal in an already transmitted reference image. In former MPEG
standards this reference image is the most recent pre-
ceding image. In H.264/AVC it is possible to refer to sev-
eral preceding images. For this purpose, an additional
picture reference parameter has to be transmitted togeth-
er with the motion vector. This technique is denoted as
motion-compensated prediction with multiple reference
frames [9]. Figure 9 illustrates the concept that is also
extended to B-slices.
The accuracy of displacement vectors is a quarter of a
picture element (quarter-pel or 1/4-pel). Such displace-
ment vectors with fractional-pel resolution may refer to
positions in the reference image, which are spatially
located between the sampled positions of its image sig-
nal. In order to estimate and compensate fractional-pel
displacements, the image signal of the reference image
has to be generated on sub-pel positions by interpolation.
In H.264/AVC the luminance signal at half-pel positions is
generated by applying a one-dimensional 6-tap FIR filter,
which was designed to reduce aliasing components that
deteriorate the interpolation and the motion compensat-
ed prediction [8]. By averaging the luminance signal at
integer- and half-pel positions the image signal at quarter-
pel positions is generated. The chrominance signal at all
fractional-pel positions is obtained by averaging.
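The following sketch illustrates this interpolation for the luminance signal: the standard's 6-tap half-pel kernel (1, −5, 20, 20, −5, 1)/32 applied along one dimension, and quarter-pel samples obtained by rounded averaging. Border handling is simplified by edge padding, and the normative order of operations is not reproduced.

```python
import numpy as np

def halfpel_interpolate_row(row):
    """Half-pel interpolation of a 1-D luminance row with the 6-tap kernel
    (1, -5, 20, 20, -5, 1)/32; borders are handled by simple edge padding."""
    padded = np.pad(np.asarray(row, dtype=np.int32), (2, 3), mode="edge")
    half = np.empty(len(row), dtype=np.int32)
    for i in range(len(row)):
        w = padded[i:i + 6]
        half[i] = (w[0] - 5 * w[1] + 20 * w[2] + 20 * w[3] - 5 * w[4] + w[5] + 16) >> 5
    return np.clip(half, 0, 255)      # half[i] lies between row[i] and row[i+1]

def quarterpel(a, b):
    """Quarter-pel samples: rounded average of the two nearest
    integer- and/or half-pel samples."""
    return (a + b + 1) >> 1
```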
In comparison to prior video-coding standards, the
classical concept of B-pictures is extended to a general-
ized B-slice concept in H.264/AVC. In the classical concept,
B-pictures are pictures that are encoded using both past
and future pictures as references. The prediction is
obtained by a linear combination of forward and back-
ward prediction signals. In former standards, this linear
combination is just an averaging of the two prediction sig-
nals whereas H.264/AVC allows arbitrary weights. In this
generalized concept, the linear combination of prediction
signals is also made regardless of the temporal direction.
Figure 7. Possible prediction directions for INTRA_4×4 mode (modes 0, 1, and 3–8).
Figure 8. Partitioning of a macroblock (16×16, 16×8, 8×16, 8×8) and a sub-macroblock (8×8, 8×4, 4×8, 4×4) for motion compensated prediction.
For example, a linear combination of two forward-predic-
tion signals may be used (see Figure 9). Furthermore,
using H.264/AVC it is possible to use images containing B-
slices as reference images for further predictions which
was not possible in any former standard. Details on this
generalized B-slice concept, which is also known as multi-
hypothesis motion-compensated prediction can be found
in [10], [11], [12].
3.3 Transform Coding
As in former standards, transform coding is applied
in order to code the prediction error signal. The task of
the transform is to reduce the spatial redundancy of the
prediction error signal. For the purpose of transform
coding, all former standards such as MPEG-1 and MPEG-
2 applied a two dimensional Discrete Cosine Transform
(DCT) [13] of the size 8×8. Instead of the DCT, different
integer transforms are applied in H.264/ AVC. The size of
these transforms is mainly 4×4, in special cases 2×2.
This smaller block size of 4×4 instead of 8×8 enables the
encoder to better adapt the prediction error coding to
the boundaries of moving objects, to match the
transform block size with the smallest block
size of the motion compensation, and to gener-
ally better adapt the transform to the local pre-
diction error signal.
Three different types of transforms are used.
The first type is applied to all samples of all pre-
diction error blocks of the luminance component
Y and also for all blocks of both chrominance
components Cb and Cr regardless of whether
motion compensated prediction or intra predic-
tion was used. The size of this transform is 4×4.
Its transform matrix H1 is shown in Figure 10.
If the macroblock is predicted using the type
INTRA_16×16, the second transform, a
Hadamard transform with matrix H2 (see Figure
10), is applied in addition to the first one. It
transforms all 16 DC coefficients of the already
transformed blocks of the luminance signal. The
size of this transform is also 4×4.
The third transform is also a Hadamard
transform but of size 2×2. It is used for the
transform of the 4 DC coefficients of each
chrominance component. Its matrix H3 is shown
in Figure 10.
The transmission order of all coefficients is shown in
Figure 11. If the macroblock is predicted using the intra
prediction type INTRA_16×16 the block with the label
“−1” is transmitted first. This block contains the DC coef-
ficients of all blocks of the luminance component. After-
wards all blocks labeled “0”–“25” are transmitted whereas
blocks “0”–“15” comprise all AC coefficients of the blocks
of the luminance component. Finally, blocks “16” and “17”
comprise the DC coefficients and blocks “18”–“25” the AC
coefficients of the chrominance components.
Compared to a DCT, all applied integer transforms have
only integer numbers ranging from −2 to 2 in the trans-
form matrix (see Figure 10). This allows computing the
transform and the inverse transform in 16-bit arithmetic
using only low-complexity shift, add, and subtract opera-
tions. In the case of a Hadamard transform, only add and
subtract operations are necessary. Furthermore, due to
the exclusive use of integer operations mismatches of the
inverse transform are completely avoided which was not
the case in former standards and caused problems.
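As an illustration, the forward 4×4 core transform Y = H1 · X · H1^T can be computed exactly in integer arithmetic; the scaling that the standard folds into quantization is omitted in this sketch.

```python
import numpy as np

# the 4x4 integer core transform matrix H1 of Figure 10
H1 = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]])

def core_transform(block):
    """Forward 4x4 core transform Y = H1 * X * H1^T in pure integer
    arithmetic; the normative scaling is folded into quantization and
    therefore omitted here."""
    x = np.asarray(block, dtype=np.int32)
    return H1 @ x @ H1.T
```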
All coefficients are quantized by a scalar quantizer. The quantization step size is selected by a so-called quantization parameter QP, which can take one of 52 values. The step size doubles with each increment of QP by 6. An increment of QP by 1 results in a reduction of the required data rate of approximately 12.5%. The transform is explained in detail in [15].
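The resulting exponential relation between QP and step size can be sketched as follows; the base step size of 0.625 for QP = 0 is the value commonly quoted in descriptions of the standard and should be treated as illustrative.

```python
def qstep(qp, qstep0=0.625):
    """Quantization step size as a function of QP (0..51): the step size
    doubles with every increment of QP by 6. The base value 0.625 for
    QP = 0 is commonly quoted but should be treated as illustrative."""
    assert 0 <= qp <= 51
    return qstep0 * 2 ** (qp / 6.0)

# qstep(6) == 2 * qstep(0), qstep(12) == 4 * qstep(0), ...
```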
3.4 Entropy Coding Schemes
H.264/AVC specifies two alternative methods of entropy
coding: a low-complexity technique based on the usage of
context-adaptively switched sets of variable length
codes, so-called CAVLC, and the computationally more
demanding algorithm of context-based adaptive binary
arithmetic coding (CABAC). Both methods represent
major improvements in terms of coding efficiency com-
Figure 9. Motion-compensated prediction with multiple reference images. In addition to the motion vector, also an image reference parameter dt is transmitted.
Figure 10. Matrices H1, H2 and H3 of the three different transforms applied in H.264/AVC:

$$H_1 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}, \quad H_2 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}, \quad H_3 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$
pared to the techniques of statistical coding traditionally
used in prior video coding standards. In those earlier
methods, specifically tailored but fixed variable length
codes (VLCs) were used for each syntax element or sets
of syntax elements whose representative probability dis-
tributions were assumed to be closely matching. In any
case, it was implicitly assumed that the underlying statis-
tics are stationary, which however in practice is seldom
the case. Especially residual data in a motion-compensat-
ed predictive coder shows a highly non-stationary statis-
tical behavior, depending on the video content, the
coding conditions and the accuracy of the prediction
model. By incorporating context modeling in their
entropy coding framework, both methods of H.264/AVC
offer a high degree of adaptation to the underlying source, albeit at different complexity-compression trade-offs.
CAVLC is the baseline entropy
coding method of H.264/AVC. Its
basic coding tool consists of a sin-
gle VLC of structured Exp-Golomb
codes, which by means of individu-
ally customized mappings is
applied to all syntax elements
except those related to quantized
transform coefficients. For the lat-
ter, a more sophisticated coding
scheme is applied. As shown in the
example of Figure 12, a given block
of transform coefficients is first
mapped on a 1-D array according
to a predefined scanning pattern.
Typically, after quantization a block contains only a few
significant, i.e., nonzero coefficients, where, in addition, a
predominant occurrence of coefficient levels with magni-
tude equal to 1, so-called trailing 1’s (T1), is observed at
the end of the scan. Therefore, as a preamble, first the
number of nonzero coefficients and the number of T1s
are transmitted using a combined codeword, where one out of four VLC tables is selected based on the number of significant levels of neighboring blocks. Then, in the sec-
ond step, sign and level value of significant coefficients
are encoded by scanning the list of coefficients in reverse
order. By doing so, the VLC for coding each individual
level value is adapted on the basis of the previously
encoded level by choosing among six VLC tables. Finally,
the zero quantized coefficients are signaled by transmit-
ting the total number of zeros before the last nonzero
level for each block, and additionally, for each significant
level the corresponding
run, i.e., the number of
consecutive preceding
zeros. By monitoring the
maximum possible num-
ber of zeros at each cod-
ing stage, a suitable VLC
is chosen for the coding
of each run value. A total
number of 32 different
VLCs are used in CAVLC
entropy coding mode,
where, however, the
structure of some of
these VLCs enables sim-
ple on-line calculation of
any code word without
recourse to the storage of
code tables. For typical
coding conditions and
test material, bit rate
Figure 12. Precoding a block of quantized transform coefficients. A 4×4 block of quantized transform coefficients is mapped to a 1-D array by a predefined scanning pattern and precoded into syntax elements. For CAVLC: number of significant coefficients: 5; trailing 1's (T1): 3; signs of T1: −1, 1, 1; levels: 2, 1; total zeros: 2; run before: 0, 1, 1, (0). For CABAC: coded block flag: 1; significant coefficient flags: 1, 1, 0, 1, 0, 1, 1; last coefficient flags: 0, 0, 0, 0, 1; magnitudes of levels: 1, 1, 1, 2, 1; level signs: −1, 1, 1, 1, 1.
Figure 11. Transmission order of all coefficients of a macroblock [14]: block −1 carries the DC coefficients of the luminance blocks (INTRA_16×16 mode only), blocks 0–15 the luminance AC coefficients, blocks 16 and 17 the DC coefficients of the chrominance components Cb and Cr, and blocks 18–25 the chrominance AC coefficients.
reductions of 2–7% are obtained by CAVLC relative to a
conventional run-length scheme based on a single Exp-
Golomb code.
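To make the precoding step concrete, the following sketch derives the CAVLC preamble values of Figure 12 from a scanned coefficient array; the array used in the example below is one that reproduces the values of the figure. The actual VLC table selection and bit-level coding are omitted.

```python
def cavlc_precode(scan):
    """Derive the CAVLC preamble values of Figure 12 from a scanned
    coefficient array (a sketch; VLC table selection and the actual
    bit-level coding are omitted)."""
    last = max(i for i, v in enumerate(scan) if v != 0)
    nz = [(i, v) for i, v in enumerate(scan[:last + 1]) if v != 0]
    total_coeffs = len(nz)
    # trailing 1's: up to three magnitude-1 levels at the end of the scan
    t1 = 0
    for _, v in reversed(nz):
        if abs(v) == 1 and t1 < 3:
            t1 += 1
        else:
            break
    total_zeros = (last + 1) - total_coeffs
    # zeros immediately preceding each coefficient, in reverse scan order
    idx = [i for i, _ in nz]
    runs = [idx[k] - (idx[k - 1] if k > 0 else -1) - 1
            for k in range(len(idx) - 1, -1, -1)]
    return total_coeffs, t1, total_zeros, runs

# a scanned array consistent with the example of Figure 12:
print(cavlc_precode([1, 2, 0, 1, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0]))
# -> (5, 3, 2, [0, 1, 1, 0, 0])
```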
For significantly improved coding efficiency, CABAC as
the alternative entropy coding mode of H.264/AVC is the
method of choice (Figure 13). As shown in Figure 13, the
CABAC design is based on the key elements: binarization,
context modeling, and binary arithmetic coding. Bina-
rization enables efficient binary arithmetic coding via a
unique mapping of non-binary syntax elements to a
sequence of bits, a so-called bin string. Each element of
this bin string can either be processed in the regular cod-
ing mode or the bypass mode. The latter is chosen for
selected bins such as for the sign information or lower
significant bins, in order to speed up the whole encoding
(and decoding) process by means of a simplified coding
engine bypass. The regular coding mode provides the
actual coding benefit, where a bin may be context mod-
eled and subsequently arithmetic encoded. As a design
decision, in general only the most probable bin of a syn-
tax element is supplied with a context model using previ-
ously encoded bins. Moreover, all regular encoded bins
are adapted by estimating their actual probability distri-
bution. The probability estimation and the actual binary
arithmetic coding is conducted using a multiplication-free
method that enables efficient implementations in hard-
ware and software. Note that for coding of transform coef-
ficients, CABAC is applied to specifically designed syntax
elements, as shown in the example of Figure 12. Typically,
CABAC provides bit rate reductions of 5–15% compared
to CAVLC. More details on CABAC can be found in [16].
3.5 Adaptive Deblocking Filter
The block-based structure of the H.264/AVC architecture
containing 4×4 transforms and block-based motion com-
pensation, can be the source of severe blocking artifacts.
Filtering the block edges has been shown to be a power-
ful tool to reduce the visibility of these artifacts. Deblock-
ing can in principle be carried out as post-filtering,
influencing only the pictures to be displayed. Higher visu-
al quality can be achieved though, when the filtering
process is carried out in the coding loop, because then all
involved past reference frames used for motion compen-
sation will be the filtered versions of the reconstructed
frames. Another reason to make deblocking a mandatory in-loop tool in H.264/AVC is to force every decoder to deliver approximately the quality intended by the producer, rather than leaving this basic picture enhancement tool to the optional good will of the decoder manufacturer.
The filter described in the H.264/AVC standard is highly
adaptive. Several parameters and thresholds and also the
local characteristics of the picture itself control the
strength of the filtering process. All involved thresholds are
quantizer dependent, because blocking artifacts will always
become more severe when quantization gets coarse.
H.264/MPEG-4 AVC deblocking is adaptive on three
levels:
■ On slice level, the global filtering strength can be
adjusted to the individual characteristics of the
video sequence.
■ On block edge level, the filtering strength is made
dependent on inter/intra prediction decision,
motion differences, and the presence of coded
residuals in the two participating blocks. From
these variables a filtering-strength parameter is
calculated, which can take values from 0 to 4 caus-
ing modes from no filtering to very strong filtering
of the involved block edge.
■ On sample level, it is crucially important to be
able to distinguish between true edges in the
image and those created by the quantization of
the transform-coefficients. True edges should be
left unfiltered as much as possible. In order to
separate the two cases, the sample values across
every edge are analyzed. For an explanation
denote the sample values inside two neighboring
4×4 blocks as p3, p2, p1, p0 | q0, q1, q2, q3 with the
Figure 13. CABAC block diagram: a non-binary valued syntax element is first mapped by the binarizer to a bin string; each bin is then either coded in the regular mode, where the context modeler supplies a context model and the regular coding engine performs adaptive binary arithmetic coding, or routed through the bypass coding engine; the coded bits of both engines form the bitstream.
actual boundary between p0 and q0 as shown in
Figure 14. Filtering of the two pixels p0 and q0 only
takes place, if their absolute difference falls below
a certain threshold α. At the same time, absolute
pixel differences on each side of the edge
(|p1 − p0| and |q1 − q0|) have to fall below
another threshold β, which is considerably small-
er than α. To enable filtering of p1(q1), additional-
ly the absolute difference between p0 and p2 (q0
and q2) has to be smaller than β. The dependency
of α and β on the quantizer, links the strength of
filtering to the general quality of the reconstruct-
ed picture prior to filtering. For small quantizer
values the thresholds both become zero, and fil-
tering is effectively turned off altogether.
All filters can be calculated without multiplications
or divisions to minimize the processor load involved in
filtering. Only additions and shifts are needed. If filtering
is turned on for p0, the impulse response of the involved
filter would in principle be (0, 1, 4, | 4, −1, 0) / 8. For p1
it would be (4, 0, 2, | 2, 0, 0) / 8. The term in principle
means that the maximum changes allowed for p0 and p1
(q0 and q1) are clipped to relatively small quantizer
dependent values, reducing the low pass characteristic
of the filter in a nonlinear manner.
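The sample-level decision logic can be summarized in a short sketch; p and q hold the four samples on each side of the edge, and α and β are the quantizer-dependent thresholds described above. The filtering itself and the clipping are omitted.

```python
def deblock_decision(p, q, alpha, beta):
    """Sample-level filtering decision for one line of samples across a
    block edge; p = (p3, p2, p1, p0), q = (q0, q1, q2, q3), alpha and
    beta are the quantizer-dependent thresholds."""
    p3, p2, p1, p0 = p
    q0, q1, q2, q3 = q
    # p0/q0 are filtered only for small steps across the edge ...
    edge_filtered = (abs(p0 - q0) < alpha and
                     abs(p1 - p0) < beta and
                     abs(q1 - q0) < beta)
    if not edge_filtered:
        return False, False, False       # likely a true image edge
    # ... and p1/q1 additionally require low activity on their side
    filter_p1 = abs(p2 - p0) < beta
    filter_q1 = abs(q2 - q0) < beta
    return True, filter_p1, filter_q1
```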
Intra coding in H.264/AVC tends to use INTRA_16×16 prediction modes when coding nearly uniform image areas. This causes small-amplitude blocking artifacts at the macroblock boundaries, which are perceived as abrupt steps in these cases. To compensate the resulting tiling artifacts, very strong low pass filtering is applied on boundaries between two macroblocks with smooth image content. This special filter also involves pixels p3 and q3.
In general deblocking results in bit rate savings of
around 6–9% at medium qualities. More remarkable are
the improvements in subjective picture quality. A more
concise description of the H.264/AVC deblocking scheme
can be found in [17].
3.6 Error Robustness and Network Friendliness
For efficient transmission in different environments, the
seamless and easy integration of the coded video into all
current and future protocol and network architectures is
important. Therefore, both the VCL and the NAL are part
of the H.264/AVC standard (Figure 2). The VCL specifies
an efficient representation for the coded video signal. The
NAL defines the interface between the video codec itself
and the outside world. It operates on NAL units which
give support to the packet-based approach of most exist-
ing networks. In addition to the NAL concept, the VCL
itself includes several features providing network friendli-
ness and error robustness being essential especially for
real-time services such as streaming, multicasting, and
conferencing applications due to online transmission and
decoding. The H.264/AVC Hypothetical Reference Decoder
(HRD) [18] places constraints on encoded NAL unit
streams in order to enable cost-effective decoder imple-
mentations by introducing a multiple-leaky-bucket model.
Lossy and variable bit rate (VBR) channels such as the
Internet or wireless links require channel-adaptive
streaming or multi-casting technologies. Among others
[19], channel-adaptive packet dependency control [20]
and packet scheduling [21] allow reacting to these chan-
nels when transmitting pre-encoded video streams.
These techniques are supported in H.264/AVC by various
means, namely frame dropping of non-reference frames
resulting in well-known temporal scalability, the multiple
reference frame concept in combination with generalized
B-pictures allowing a huge flexibility on frame dependen-
cies to be exploited for temporal scalability and rate
shaping of encoded video, and the possibility of switch-
ing between different bit streams which are encoded at
different bit rates. This technique is called version
switching. It can be applied at Instantaneous Decoder
Refresh (IDR) frames, or, even more efficiently by the
usage of switching pictures which allow identical recon-
struction of frames even when different reference frames
are being used. Thereby, switching-predictive (SP) pictures
efficiently exploit motion-compensated prediction where-
as switching-intra (SI) pictures can exactly reconstruct SP
pictures. The switching between two bit streams using SI
and SP pictures is illustrated in Figure 15 and Figure 16.
Switching pictures can also be applied for error resilience
purposes as well as other features, for details see [22].
Whereas for relaxed-delay applications such as down-
load-and-play, streaming, and broadcast/multicast, resid-
ual errors can usually be avoided by applying powerful
forward error correction and retransmission protocols,
the low delay requirements for conversational applica-
Figure 14. One-dimensional visualization of a block edge in a typical situation where the filter would be turned on: samples p3, p2, p1, p0 | q0, q1, q2, q3 across the block edge, with thresholds α and β controlling the filtering decision.
tions impose additional challenges as transmission errors
due to congestions and link-layer imperfectness can gen-
erally not be avoided. Therefore, these video applications
require error resilience features. The H.264/AVC standardi-
zation process acknowledged this by adopting a set of
common test conditions for IP based trans-
mission [23]. Anchor video sequences,
appropriate bit rates and evaluation criteria
are specified. In the following, we briefly present different error resilience features included in the standard; for more details we refer to [24] and [7]. The presentation is accompa-
nied by Figure 18 showing results for a repre-
sentative selection of the common Internet
test conditions: for the QCIF sequence Foreman, 10 seconds are encoded at a frame rate of 7.5 fps applying only tem-
porally backward referencing motion com-
pensation. The resulting total bit rate
including a 40 byte IP/UDP/RTP header
matches exactly 64 kbit/s. As performance
measure the average luminance peak signal
to noise ratio (PSNR) is chosen and sufficient
statistics are obtained by transmitting at
least 10000 data packets for each experiment
as well as applying a simple packet loss sim-
ulator and Internet error patterns1 as speci-
fied in [23].
Although common understanding usually
assumes that increased compression efficien-
cy decreases error resilience, the opposite is
the case if applied appropriately. As higher
compression allows using additional bit rate
for forward error correction, the loss proba-
bility of highly compressed data can be
reduced assuming a constant overall bit rate.
All other error resilience tools discussed in
the following generally increase the data rate
at the same quality, and, therefore, their
application should always be considered
very carefully in order not to adversely affect
compression efficiency, especially if lower
layer error protection is applicable. This can
be seen for packet error rate 0 in Figure 18.
Slice structured coding reduces packet loss
probability and the visual degradation from
packet losses, especially in combination with
advanced decoder error concealment methods [25]. A
slice is a sequence of macroblocks within one slice group
and provides spatially distinct resynchronization points
within the video data for a single frame. No intra-frame
prediction takes place across slice boundaries. However,
Figure 15. Switching between bit streams by using SI-frames (from [22]).
Figure 16. Switching between bit streams by using SP-frames (from [22]).
1The Internet error pattern has been captured from
real-world measurements and results in packet loss
rates of approximately 3%, 5%, 10%, and 20%. These
error probabilities label the packet error rate in Figure
18. Note that the 5% error file is burstier than the oth-
ers resulting in somewhat unexpected results.
the loss of intra-frame prediction and the increased over-
head associated with decreasing slice sizes adversely
affect coding performance. Especially for wireless trans-
mission a careful selection of the packet size is necessary
[7].
As a more advanced feature, Flexible Macroblock
Ordering (FMO) allows the specification of macroblock
allocation maps defining the mapping of macroblocks to
slice groups, where a slice group itself may contain sev-
eral slices. An example is shown in Figure 17.
Therefore, macro-
blocks might be trans-
mitted out of raster scan
order in a flexible and
efficient way. Specific
macroblock allocation
maps enable the efficient
application of features
such as slice interleav-
ing, dispersed mac-
roblock allocation using
checkerboard-like pat-
terns, one or several
foreground slice groups
and one left-over background slice group, or
sub-pictures within a
picture to support, e.g.,
isolated regions [26]. Fig-
ure 18 shows increased
performance for FMO
with checkerboard pat-
tern for increasing error
rate when compared to using no error resilience features.
Arbitrary slice ordering (ASO) removes the constraint that the address of the first macroblock within a slice must be monotonically increasing within the NAL unit stream for a picture. This permits, for example, reducing decoding delay in case of out-of-order delivery of NAL units.
Data Partitioning
allows up to three parti-
tions for the transmis-
sion of coded information. Rather than just providing two
partitions, one for the header and the motion information,
and one for the coded transform coefficients, H.264/AVC
can generate three partitions by separating the second
partition in intra and inter information. This allows assign-
ing higher priority to, in general, more important intra
information. Thus, it can reduce visual artifacts resulting
from packet losses, especially if prioritization or unequal
error protection is provided by the network.
If, despite all these techniques, packet losses and spa-
Figure 17. Division of an image into several slice groups using Flexible Macroblock Ordering (FMO).
Figure 18. Average Y-PSNR over packet error rate (burstiness at 5% error rate is higher than for other error rates) for Foreman, QCIF, 7.5 fps, 64 kbit/s and different error resilience tools in H.264/AVC: no error resilience with one packet per frame, additional 20% random intra (RI) update, channel adaptive intra (CAI) update, each feature combined with FMO checkerboard pattern with 2 packets per frame (i.e., macroblocks with odd addresses in slice group 1, with even addresses in slice group 2), and feedback system with a 2-frame delayed (about 250 ms) decoder channel information at the encoder.
tio-temporal error propagation are not avoidable, quick
recovery can only be achieved when image regions are
encoded in Intra mode, i.e., without reference to a previ-
ously coded frame. H.264/AVC allows Intra coding of single macroblocks for regions that cannot be predicted efficiently. This feature can also be used to limit error propagation
by transmitting a number of intra coded macroblocks antic-
ipating transmission errors. The selection of Intra coded
MBs can be done either randomly, in certain update pat-
terns, or preferably in channel-adaptive rate-distortion
optimized way [7], [27]. Figure 18 reveals that the intro-
duction of intra coded macroblocks significantly improves
the performance for increased error rates and can be com-
bined with any aforementioned error resilience features.
Thereby, channel-adaptive intra updates provide better results than purely random intra updates over the entire range of error rates.
A redundant coded slice is a coded slice that is a part
of a redundant picture which itself is a coded representa-
tion of a picture that is not used in the decoding process
if the corresponding primary coded picture is correctly
decoded. Examples of applications and coding tech-
niques utilizing the redundant coded picture feature
include the video redundancy coding [28] and protection
of “key pictures” in multicast streaming [29].
In bi-directional conversational applications it is com-
mon that the encoder has the knowledge of experienced
NAL unit losses at the decoder, usually with a small delay.
This small information can be conveyed from the decoder
to the encoder. Although retransmissions are not feasible
in a low-delay environment, this information is still useful
at the encoder to limit error propagation [30]. The flexi-
bility provided by the multiple reference frame concept in
H.264/AVC allows incorporating so called NEWPRED
approaches [31] in a straight-forward manner which
address the problem of error propagation. For most suc-
cessful applications, a selection of reference frames and
intra updates can be integrated in a rate-distortion opti-
mized encoder control as discussed in Section 4 taking
into account not only video statistics, but also all avail-
able channel information [7]. Excellent results are shown
in Figure 18 applying five reference frames and feedback
delay of two frames, especially for moderate to higher
error rates. To improve the performance also for low
error rates, a combination of channel adaptive intra up-
dates and feedback might be considered according to
[27] at the expense of increased encoding complexity.
4. Rate Constrained Encoder Control
Because the standard defines only the bitstream syntax and the possible coding tools, the coding efficiency depends on the coding strategy of the encoder, which is not part of the standard (see Figure 1).
Figure 19 shows the principal rate-distortion working points for different encoder strategies. If just the minimization of the distortion is considered for the decision of the coding tools, the achieved distortion is small but the required rate is very high. Vice versa, if just the rate is considered, the achieved rate is small but the distortion is high. Usually, neither of these working points is desired. Desired is a working point at which both the distortion and the rate are minimized together. This can be
achieved by using Lagrangian optimization techniques,
which are described for example in [32].
For the encoding of video sequences using the
H.264/AVC standard, Lagrangian optimization techniques
for the choice of the macroblock mode and the estima-
tion of the displacement vector are proposed in [10], [33]
and [34].
The macroblock mode of each macroblock $S_k$ can be
Figure 19. Principal rate-distortion working points for different encoder strategies: minimizing only the distortion, minimizing only the rate, and minimizing the distortion under a rate constraint.
efficiently chosen out of all possible modes $I_k$ by minimizing the functional

$$D_{\mathrm{REC}}(S_k, I_k \mid QP) + \lambda_{\mathrm{Mode}} \cdot R_{\mathrm{REC}}(S_k, I_k \mid QP) \rightarrow \min.$$
Hereby the distortion $D_{\mathrm{REC}}$ is measured by the sum of squared differences (SSD) between the original signal $s$ and the corresponding reconstructed signal $s'$ of the same macroblock. The SSD can be calculated by

$$SSD = \sum_{(x,y)} \left| s[x, y, t] - s'[x, y, t] \right|^2.$$

The rate $R_{\mathrm{REC}}$ is the rate that is required to encode the block with the entropy coder. $QP$ is the quantization parameter used to adjust the quantization step size. It ranges from 0 to 51.
The motion vectors can be efficiently estimated by minimizing the functional

$$D_{\mathrm{DFD}}(S_i, \vec{d}\,) + \lambda_{\mathrm{Motion}} \cdot R_{\mathrm{Motion}}(S_i, \vec{d}\,) \rightarrow \min$$

with

$$D_{\mathrm{DFD}}(S_i, \vec{d}\,) = \sum_{(x,y)} \left| s[x, y, t] - s'[x - d_x, y - d_y, t - d_t] \right|^2.$$
Hereby $R_{\mathrm{Motion}}$ is the rate required to transmit the motion information $\vec{d}$, which consists of both displacement vector components $d_x$ and $d_y$ and the corresponding reference frame number $d_t$. The following Lagrangian parameters lead to good results as shown in [10]:

$$\lambda_{\mathrm{Mode}} = \lambda_{\mathrm{Motion}} = 0.85 \cdot 2^{(QP-12)/3}.$$
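Putting these pieces together, a rate-constrained mode decision reduces to evaluating the Lagrangian cost D + λ·R for every candidate mode and picking the minimum, as in the following sketch:

```python
import numpy as np

def lagrangian_mode_cost(orig, recon, rate_bits, qp):
    """Rate-constrained cost D + lambda * R of one candidate macroblock
    mode, with lambda = 0.85 * 2**((QP - 12) / 3) as proposed in [10]."""
    lam = 0.85 * 2.0 ** ((qp - 12) / 3.0)
    diff = np.asarray(orig, dtype=np.int64) - np.asarray(recon, dtype=np.int64)
    ssd = float(np.sum(diff * diff))     # distortion: sum of squared differences
    return ssd + lam * rate_bits

# the encoder evaluates this cost for every candidate mode and
# transmits the mode with the minimum cost
```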
As already discussed, the tools for increased error
resilience, in particular those to limit error propagation,
do not significantly differ from those used for compres-
sion efficiency. Features like multi-frame prediction or
macroblock intra coding are not exclusively error
resilience tools. This means that bad decisions at the
encoder can lead to poor results in coding efficiency or
error resiliency or both. The selection of the coding mode
for compression efficiency can be modified taking into
account the influence of the random lossy channel. In this
case, the encoding distortion is replaced by the expected
decoder distortion. For the computation of the expected
distortion we refer to, e.g. [27] or [35]. This method has
been applied to generate channel-adaptive results in sub-
section 3.6 assuming a random-lossy channel with known
error probability at the encoder.
5. Profiles and Levels of H.264/AVC
H.264/AVC has been developed to address a large range
of applications, bit rates, resolutions, qualities, and
services; in other words, H.264/AVC intends to be as
generically applicable as possible. However, different
applications typically have different requirements both
in terms of functionalities, e.g., error resilience, com-
pression efficiency and delay, as well as complexity (in
this case, mainly decoding complexity since encoding is
not standardized).
In order to maximize the interoperability while limiting
the complexity, targeting the largest deployment of the
standard, the H.264/AVC specification defines profiles and
levels. A profile is defined as a subset of the entire bit
stream syntax or in other terms as a subset of the coding
tools. In order to achieve a subset of the complete syntax,
flags, parameters, and other syntax elements are includ-
ed in the bit stream that signal the presence or absence
of syntactic elements that occur later in the bit stream. All
decoders compliant to a certain profile must support all
the tools in the corresponding profile.
However, within the boundaries imposed by the syn-
tax of a given profile, there is still a
large variation in terms of the capabil-
ities required of the decoders depend-
ing on the values taken by some
syntax elements in the bit stream such
as the size of the decoded pictures.
For many applications, it is currently
neither practical nor economic to
implement a decoder able to deal with
all hypothetical uses of the syntax
within a particular profile. To address
this problem, a second profiling
dimension was created for each pro-
file: the levels. A level is a specified set
of constraints imposed on values of
the syntax elements in the bit stream.
Figure 20. H.264/AVC profiles and corresponding tools: all three profiles share I and P slices, different block sizes, intra prediction, CAVLC, the in-loop deblocking filter, multiple reference frames, and 1/4-pel motion compensation. Baseline adds FMO, ASO, and redundant pictures; Main adds B slices, CABAC, weighted prediction, field coding, and MB-AFF; Extended is a superset of Baseline and additionally supports B slices, weighted prediction, field coding, MB-AFF, data partitioning, and SI/SP slices.
These constraints may be simple limits on values or
alternatively they may take the form of constraints on
arithmetic combinations of values (e.g. picture width mul-
tiplied by picture height multiplied by number of pictures
decoded per second) [1]. In H.264/AVC, the same level
definitions are used for all profiles defined. However, if a
certain terminal supports more than one profile, there is
no obligation that the same level is supported for the var-
ious profiles. A profile and level combination specifies the so-called conformance points, i.e., points of interoperability for applications with similar functional requirements.
Summing up, profiles and levels together specify
restrictions on the bit streams and thus minimum
bounds on the decoding capabilities, making it possible to
implement decoders with different limited complexity,
targeting different application domains. Encoders are
not required to make use of any specific set of tools;
they only have to produce bit streams which are com-
pliant to the relevant profile and level combination.
To address the large range of applications considered by
H.264/AVC, three profiles have been defined (see Figure 20):
■ Baseline Profile—Typically considered the sim-
plest profile, includes all the H.264/AVC tools with
the exception of the following tools: B-slices,
weighted prediction, field (interlaced) coding, pic-
ture/macroblock adaptive switching between
frame and field coding (MB-AFF), CABAC, SP/SI
slices and slice data partitioning. This profile typi-
cally targets applications with low complexity and
low delay requirements.
■ Main Profile—Shares a core set of tools with the Baseline profile (see Figure 20); however, compared to Baseline, Main excludes the FMO, ASO, and redundant pictures features while including B-slices, weighted prediction, field (interlaced) coding, picture/macroblock adaptive switching between frame and field coding (MB-AFF), and CABAC. This profile typically allows the best quality at the cost of higher complexity (essentially due to the B-slices and CABAC) and delay.
■ Extended Profile—This profile is a superset of the
Baseline profile supporting all tools in the specifica-
tion with the exception of CABAC. The SP/SI slices
and slice data partitioning tools are only included in
this profile.
From Figure 20, it is clear that there is a set of tools supported by all profiles, but the only hierarchical relation among the profiles is that Extended is a superset of Baseline. This means, for example, that only certain Baseline-compliant streams may be decoded by a decoder compliant with the Main profile.
Although it is difficult to establish a strong relation
between profiles and applications (and clearly nothing
is normative in this regard), it is possible to say that
conversational services will typically use the Baseline
profile, entertainment services the Main profile, and
streaming services the Baseline or Extended profiles for
wireless or wired environments, respectively. However, different approaches may be adopted, and these assignments are likely to change over time as additional complexity becomes more acceptable.
In H.264/AVC, 15 levels are specified for each profile.
Each level specifies upper bounds for the bit stream or
lower bounds for the decoder capabilities, e.g., in terms
of picture size (from QCIF to above 4k×2k), decoder pro-
cessing rate (from 1485 to 983040 macroblocks per sec-
ond), size of the memory for multi-picture buffers, video
bit rate (from 64 kbit/s to 240 Mbit/s), and motion vec-
tor range (from [−64, +63.75] to [−512, +511.75]). For
more detailed information on the H.264/AVC profiles and
levels, refer to Annex A of [1].
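As an illustration of how these limits work in practice, the sketch below (in C) checks a stream's parameters against three of the per-level limits of Annex A of [1] for a few selected levels; the endpoints match the ranges quoted above, but the struct and function names are our own, and many further limits (e.g., buffer sizes, motion vector range) are omitted.

#include <stddef.h>
#include <stdio.h>

/* Three of the per-level limits from Annex A of [1] for selected levels. */
typedef struct {
    const char *name;
    long max_mbps;   /* MaxMBPS: macroblocks per second */
    long max_fs;     /* MaxFS: frame size in macroblocks */
    long max_br;     /* MaxBR: video bit rate in kbit/s */
} LevelLimits;

static const LevelLimits levels[] = {
    { "1",   1485,   99,    64     },
    { "3",   40500,  1620,  10000  },
    { "4",   245760, 8192,  20000  },
    { "5.1", 983040, 36864, 240000 },
};

/* Return the lowest listed level whose limits the stream satisfies. */
static const char *minimum_level(long mbps, long fs, long br_kbps)
{
    for (size_t i = 0; i < sizeof(levels) / sizeof(levels[0]); i++)
        if (mbps <= levels[i].max_mbps && fs <= levels[i].max_fs &&
            br_kbps <= levels[i].max_br)
            return levels[i].name;
    return NULL; /* exceeds every level listed here */
}

int main(void)
{
    /* CIF (22 x 18 = 396 macroblocks) at 30 Hz and 2 Mbit/s */
    const char *lvl = minimum_level(396L * 30, 396, 2000);
    printf("minimum level: %s\n", lvl ? lvl : "none");
    return 0;
}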
6. Comparison to Previous Standards
In this section, a comparison of H.264/AVC to other
video coding standards is given with respect to the cod-
ing efficiency (Subsection 6.1) and hardware complexity
(Subsection 6.2).
6.1 Coding Efficiency
In [10], a detailed comparison of the coding efficiency of
different video coding standards is given for video
streaming, video conferencing, and entertainment-quality
applications. All encoders are rate-distortion optimized
using rate constrained encoder control [10], [33], [34].
For video streaming and video conferencing applications,
we use test video sequences in the Common Intermediate
Format (CIF, 352 × 288 picture elements, progressive) and
in the Quarter Common Intermediate Format (QCIF,
176×144 picture elements, progressive). For entertain-
ment-quality applications, sequences in ITU-R 601
(720×576 picture elements, interlaced) and High Defini-
tion Television (HDTV, 1280 × 720 picture elements, pro-
gressive) are used. The coding efficiency is measured by
average bit rate savings for a constant peak signal to
noise ratio (PSNR). To this end, the bit rates required by several test sequences at different qualities are taken into account.
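The PSNR measure itself is straightforward; as a reference for the curves and tables that follow, the minimal sketch below (in C) computes the luminance PSNR of a decoded frame against the original for 8-bit video. Function and parameter names are our own.

#include <math.h>
#include <stdio.h>

/* Luminance PSNR in dB between an original and a decoded frame of
 * 8-bit video; 255 is the peak sample value. */
double luma_psnr(const unsigned char *orig, const unsigned char *dec,
                 int width, int height)
{
    double mse = 0.0;
    for (long i = 0; i < (long)width * height; i++) {
        double d = (double)orig[i] - (double)dec[i];
        mse += d * d;
    }
    mse /= (double)width * height;
    if (mse == 0.0)
        return INFINITY; /* identical frames */
    return 10.0 * log10(255.0 * 255.0 / mse);
}

int main(void)
{
    unsigned char o[4] = { 100, 100, 100, 100 };
    unsigned char d[4] = { 98, 102, 100, 101 };
    printf("PSNR: %.2f dB\n", luma_psnr(o, d, 2, 2));
    return 0;
}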
For video streaming applications, H.264/AVC MP
(Main Profile), MPEG-4 Visual ASP (Advanced Simple Pro-
file), H.263 HLP (High Latency Profile), and MPEG-2 Video
ML@MP (Main Level at Main Profile) are considered. Fig-
ure 21 shows the PSNR of the luminance component ver-
sus the average bit rate for the single test sequence
Tempete encoded at 15 Hz and Table 1 presents the aver-
age bit rate savings for a variety of test sequences and bit
rates. Table 1 shows that H.264/AVC outperforms all other encoders considered. For example,
H.264/AVC MP allows an average bit rate saving of
about 63% compared to MPEG-2 Video and about
37% compared to MPEG-4 Visual ASP.
For video conferencing applications, H.264/AVC
BP (Baseline Profile), MPEG-4 Visual SP (Simple
Profile), H.263 Baseline, and H.263 CHC (Conversa-
tional High Compression) are considered. Figure
22 shows the luminance PSNR versus average bit
rate for the single test sequence Paris encoded at
15 Hz and Table 2 presents the average bit rate sav-
ings for a variety of test sequences and bit rates.
As for video streaming applications, H.264/AVC
outperforms all other considered encoders.
H.264/AVC BP allows an average bit rate saving of
about 40% compared to H.263 Baseline and about
27% compared to H.263 CHC.
For entertainment-quality applications, the bit rate saving of H.264/AVC compared to MPEG-2 Video ML@MP and HL@MP is 45% on average [10]. Part of this gain in coding efficiency is due to the fact that H.264/AVC removes much of the film grain noise resulting from the motion picture production process.
However, since the perception of this noisy grain texture is
often considered to be desirable, the difference in per-
ceived quality between H.264/AVC coded video and MPEG-
2 coded video may often be less distinct than indicated by
the PSNR-based comparisons, especially in high-quality,
high-resolution applications such as High-Definition DVD or
Digital Cinema.
In certain applications, like professional motion picture production, random access to each individual picture may be required. Motion-JPEG2000
[37] as an extension of the new still image
coding standard JPEG2000 provides this
feature along with some useful scalability
properties. When restricted to IDR frames,
H.264/AVC is also capable of serving the
needs for such a random access capability.
Figure 23 shows PSNR for the luminance
component versus average bit rate for the
ITU-R 601 test sequence Canoe encoded in
intra mode only, i.e., each field of the whole
sequence is coded in intra mode only. Inter-
estingly, the measured rate-distortion per-
formance of H.264/AVC MP is better than
that of the state-of-the-art in still image
compression as exemplified by JPEG2000,
at least in this particular test case. Other test cases were studied in [38] as well, leading to the general observation that, for signals up to 1280 × 720 pel HDTV, the pure intra coding performance of H.264/AVC MP is comparable to or better than that of Motion-JPEG2000.
6.2 Hardware Complexity
Assessing the complexity of a new video coding standard
is not a straightforward task: its implementation complexi-
ty heavily depends on the characteristics of the platform
(e.g., DSP processor, FPGA, ASIC) on which it is mapped. In
this section, the data transfer characteristics are chosen as
generic, platform independent, metrics to express imple-
mentation complexity. This approach is motivated by the
data dominance of multimedia applications [39]–[44].
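To make the metric concrete, the toy sketch below (in C) routes every read and write of frame memory through counting helpers; automated profiling tools such as [45] instrument a full codec in essentially this fashion, only systematically. The helpers and the trivial copy "prediction" are our own illustration.

#include <stdio.h>

static unsigned long n_reads, n_writes;

/* All frame-memory traffic goes through these counting accessors. */
static unsigned char frame_read(const unsigned char *frame, int idx)
{
    n_reads++;
    return frame[idx];
}

static void frame_write(unsigned char *frame, int idx, unsigned char v)
{
    n_writes++;
    frame[idx] = v;
}

int main(void)
{
    unsigned char ref[16], cur[16];
    for (int i = 0; i < 16; i++)      /* fill a reference block */
        frame_write(ref, i, (unsigned char)i);
    for (int i = 0; i < 16; i++)      /* trivial "prediction": copy it */
        frame_write(cur, i, frame_read(ref, i));
    printf("reads: %lu, writes: %lu\n", n_reads, n_writes);
    return 0;
}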
Both the size and complexity of the specification and the intricate interdependencies between different H.264/AVC functionalities make a complexity assessment based on the paper specification alone infeasible.
Figure 21. Luminance PSNR versus average bit rate for different coding standards, measured for the test sequence Tempete for video streaming applications (from [36]). [Figure: Y-PSNR (dB), 24-38 dB, versus bit rate, 0-1792 kbit/s, for H.264/AVC MP, MPEG-4 ASP, H.263 HLP, and MPEG-2.]
Table 1. Average bit rate savings for video streaming applications (from [10]).

                  Average Bit Rate Savings Relative To:
Coder             MPEG-4 ASP   H.263 HLP   MPEG-2
H.264/AVC MP      37.44%       47.58%      63.57%
MPEG-4 ASP        —            16.65%      42.95%
H.263 HLP         —            —           30.61%

Table 2. Average bit rate savings for video conferencing applications (from [10]).

                  Average Bit Rate Savings Relative To:
Coder             H.263 CHC    MPEG-4 SP   H.263 Base
H.264/AVC BP      27.69%       29.37%      40.59%
H.263 CHC         —            2.04%       17.63%
MPEG-4 SP         —            —           15.69%
Hence, the presented complexity analysis has been performed on the executable C code produced by the JVT instead. As this specification is the result of a collaborative effort, the code unavoidably has varying properties with respect to optimisation and platform dependence. Still, it is our experience that using automated profiling tools that yield detailed data transfer characteristics (such as [45]) on similar specifications (e.g., MPEG-4) produces meaningful relative complexity figures (this is also the conclusion of [46]). The H.264/AVC JM2.1 code is used for
the reported complexity assessment experiments. Newer
versions of the executable H.264/AVC specification have
become available that also include updated tool defini-
tions achieving a reduced complexity.
The test sequences used in the complexity assess-
ment are: Mother & Daughter 30
Hz QCIF, Foreman 25 Hz QCIF and
CIF, and Mobile & Calendar 15 Hz
CIF (with bit rates ranging from 40 kbit/s for the simple sequences to 2 Mbit/s for the complex ones).
A fixed quantization parameter
setting has been assumed.
The next two subsections highlight the main contributions to the H.264/AVC complexity. Subsequently, some general considerations are presented.
6.2.1 Complexity Analysis of
Some Major H.264/AVC Encoding
Tools
■ Variable Block Sizes: using variable block sizes affects the access frequency in a linear way: more than 2.5% complexity increase² for each additional mode. A typical bit rate reduction between 4 and 20% is achieved (for the same quality) using this tool; however, the complexity increases linearly with the number of modes used, while the corresponding compression gain saturates.
■ Hadamard transform: the use of Hadamard cod-
ing results in an increase of the access frequen-
cy of roughly 20%, while not significantly
impacting the quality vs. bit rate for the test
sequences considered.
■ RD-Lagrangian optimisation: this tool comes with a data transfer increase on the order of 120% and improves PSNR (by up to 0.35 dB) and bit rate (up to 9% bit savings); a minimal sketch of the underlying cost function follows this list. The performance vs. cost trade-off when using RD techniques for motion estimation and coding mode decisions inherently depends on the other tools used. For instance, when applied to a basic configuration with 1 reference frame and only 16×16 block size, the resulting complexity increase is less than 40%.
■ B-frames: the influence of B frames on the access
frequency varies from −16 to +12% depending on
the test case and decreases the bit rate up to 10%.
■ CABAC: CABAC entails an access frequency increase of 25 to 30% compared to methods using a single reversible VLC table for all syntax elements. Using CABAC reduces the bit rate by up to 16%.
■ Displacement vector resolution: The encoder may
choose to search for motion vectors only at 1/2
pel positions instead of 1/4 pel positions. This
results in a decrease of access frequency and pro-
cessing time of about 10%. However, use of 1/4 pel
motion vectors increases coding efficiency up to
30% except for very low bit rates.
■ Search Range: increasing both the number of reference frames and the search size leads to higher access frequency, up to approximately 60 times (see also Table 3), while it has a minimal impact on PSNR and bit rate performance.
Figure 22. Luminance PSNR versus average bit rate for different coding standards, measured for the test sequence Paris for video conferencing applications (from [36]). [Figure: Y-PSNR (dB), 24-39 dB, versus bit rate, 0-768 kbit/s, for H.264/AVC BP, H.263 CHC, MPEG-4 SP, and H.263 Baseline.]

Table 3. Impact of the number of reference frames and search range on the number of encoder accesses (relative to the simplest case considered for each sequence).

                 Foreman 25 Hz QCIF     Foreman 25 Hz CIF      Mobile & Calendar 15 Hz CIF
Search Range     8     16     32        8     16     32        8     16     32
5 ref. frames    16.9  24.6   55.7      17.5  25.3   56.1      16.6  23.1   48.8
1 ref. frame     1     2.54   8.87      1     2.53   8.90      1     2.49   8.49

²Complexity increases and compression improvements are relative to a comparable, meaningful configuration without the tool under consideration, see also [47].

■ Multiple Reference Frames: adopting multiple reference frames increases the access frequency according to a linear model: 25% complexity increase for each added frame. A negligible gain (less than 2%) in bit rate is observed for low and medium bit rates, but more significant savings can be achieved for high bit rate sequences (up to 14%).
■ Deblocking filter: The mandatory use of the
deblocking filter has no measurable impact on the
encoder complexity. However, the filter provides a
significant increase in subjective picture quality.
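As announced in the list above, the sketch below (in C) illustrates the rate-constrained (Lagrangian) decision at the heart of the RD optimisation figures: among candidate coding modes, the encoder picks the one minimizing the cost J = D + λR [10], [33], [34]. The distortion, rate, and λ values here are placeholders, not normative.

#include <stdio.h>

typedef struct {
    const char *name;
    double distortion; /* e.g., SSD of the reconstructed macroblock */
    double rate;       /* bits needed to code the macroblock */
} ModeCandidate;

/* Index of the candidate minimizing J = D + lambda * R. */
static int best_mode(const ModeCandidate *m, int n, double lambda)
{
    int best = 0;
    double best_j = m[0].distortion + lambda * m[0].rate;
    for (int i = 1; i < n; i++) {
        double j = m[i].distortion + lambda * m[i].rate;
        if (j < best_j) { best_j = j; best = i; }
    }
    return best;
}

int main(void)
{
    ModeCandidate modes[] = {
        { "SKIP",        9000.0,   8.0 },
        { "INTER_16x16", 4200.0,  96.0 },
        { "INTRA_4x4",   3900.0, 240.0 },
    };
    double lambda = 25.0; /* in practice derived from the quantization parameter */
    printf("chosen mode: %s\n", modes[best_mode(modes, 3, lambda)].name);
    return 0;
}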
For the encoder, the main bottleneck is the combina-
tion of multiple reference frames and large search sizes.
Speed measurements on a Pentium IV platform at 1.7 GHz
with Windows 2000 are consistent with the above conclu-
sions (this platform is also used for the speed measure-
ments for the decoder).
6.2.2 Complexity Analysis of Some Major H.264/AVC
Decoding Tools
■ CABAC: the access frequency increase due to CABAC is up to 12% compared to methods using a single reversible VLC table for all syntax elements. The higher the bit rate, the higher the increase.
■ RD-Lagrangian optimization: the use of Lagrangian cost functions at the encoder causes an average complexity increase of 5% at the decoder for middle and low rates, while higher rate video is not affected (i.e., in this case, encoding choices result in a complexity increase at the decoder side).
■ B-frames: the influence of B-frames on the data transfer complexity varies depending on the test case from 11 to 29%. The use of B-frames has an important effect on the decoding time: introducing a first B-frame requires an extra 50% cost for very low bit rate video and 20 to 35% for medium and high bit rate video. The extra time required by a second B-frame is much lower (a few percent).
■ Hadamard transform: the influence on the decoder of
using the Hadamard transform at the encoder is neg-
ligible in terms of memory accesses, while it increas-
es the decoding time up to 5%.
■ Deblocking filter: The use of the mandatory
deblocking filter increases the decoder access
frequency by 6%.
■ Displacement vector resolution: if the encoder sends only vectors pointing to 1/2-pel positions, the access frequency and decoding time decrease by about 15% (the interpolation filter involved is sketched after this list).
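For reference, the sketch below (in C) shows the kind of 6-tap FIR half-pel luminance interpolation these figures refer to, using the coefficients (1, -5, 20, 20, -5, 1) of the standard's filter (cf. [8]); boundary handling is omitted and the function names are our own.

#include <stdio.h>

static int clip255(int x) { return x < 0 ? 0 : (x > 255 ? 255 : x); }

/* Horizontal half-pel sample between p[0] and p[1], given the six
 * surrounding integer-pel samples p[-2] .. p[3]; the filter sum is
 * normalized by 32 with rounding. */
static int halfpel_h(const unsigned char *p)
{
    int sum = p[-2] - 5 * p[-1] + 20 * p[0] + 20 * p[1] - 5 * p[2] + p[3];
    return clip255((sum + 16) >> 5);
}

int main(void)
{
    unsigned char row[8] = { 10, 10, 10, 90, 90, 90, 90, 90 };
    /* half-pel position between row[3] and row[4] */
    printf("half-pel sample: %d\n", halfpel_h(&row[3]));
    return 0;
}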
6.2.3 Other Considerations
In relative terms, the encoder complexity increases by more than one order of magnitude between MPEG-4 Part 2 (Simple Profile) and H.264/AVC (Main Profile), and the decoder complexity by a factor of 2. The H.264/AVC encoder/decoder complexity ratio is on the order of 10 for basic configurations and can grow up to two orders of magnitude for complex ones, see also [47].
Figure 23. Luminance PSNR versus average bit rate for H.264/AVC and Motion-JPEG2000, measured for the ITU-R 601 test sequence Canoe for pure intra coding. [Figure: Y-PSNR (dB), 27-41 dB, versus bit rate, 0-12 Mbit/s, for H.264/AVC MP (intra only) and Motion-JPEG2000.]

Our experiments have shown that, when combining the new coding features, the relevant implementation
complexity accumulates while the global compression
efficiency saturates. An appropriate use of the H.264/AVC tools leads to roughly the same compression performance as if all the tools were used simultaneously, but with a considerable reduction in implementation complexity (a factor of 6.5 for the encoder and up to 1.5 for the decoder). These efficient use modes are
reflected in the choice of the tools and parameter set-
tings of the H.264/AVC profiles (see Section 5). More
information on the complexity analyses performed in the course of H.264/AVC standardisation can be found in [48]–[50].
7. Licensing of H.264/AVC Technology
Companies and universities introducing technology into
international standards usually protect their intellectual
property with patents. When participants in the stan-
dards definition process proposed patented technology to be included in the standard, they promised to license the use of their technology on fair, reasonable, and non-discriminatory terms, the so-called RAND conditions.
tial patents describe technology that has to be
implemented in a standards-compliant decoder. The use
of patented technology requires a user of this technology
to license it from the respective owner. Given that many patents are used in any modern video coding standard, several companies combined their patents into pools so that licensing H.264/AVC technology is easy for the
user. At this point, there are two patent pools: One is
organized by MPEG LA and the other by Via Licensing.
Since the patents covered by the two patent pools are not
precisely the same, users of H.264/AVC technology need
in principle to have a license from both patent pools.
Unfortunately, these pools do not guarantee that they
cover the entire technology of H.264 as participation of a
patent owner in a patent pool is voluntary.
MPEG LA LLC is the organization which gathered the
owners of essential patents like Columbia University, Elec-
tronics and Telecommunications Research Institute of
Korea (ETRI), France Télécom, Fujitsu, LG Electronics, Mat-
sushita, Mitsubishi, Microsoft, Motorola, Nokia, Philips,
Robert Bosch GmbH, Samsung, Sharp, Sony, Toshiba, and
Victor Company of Japan (JVC) into a patent pool. VIA
Licensing Corporation, a subsidiary of Dolby Laboratories,
licenses essential H.264/AVC technology from companies
like Apple Computer, Dolby Laboratories, FastVDO, Fraun-
hofer-Gesellschaft eV, IBM, LSI Logic, Microsoft, Motorola,
Polycom, and RealNetworks. Both patent pools may be
licensed for the commercial use of an H.264/AVC decoder.
Unfortunately, the terms of the license differ.
MPEG LA terms: After the end of a grace period in December 2004, an end product manufacturer of encoders or decoders has to pay a unit fee of $0.20 per unit beyond the first 100,000 units, which are free each year. In addition to this fee for the actual soft- or hardware, certain companies are charged a participation fee starting
January 2006. Providers of Pay-per-View, download or
Video-on-Demand services pay the lower of 2% of the
sales price or $0.02 for each title. This applies to all
transmission media like cable, satellite, Internet, mobile
and over the air. Subscription services with more than
100,000 but less than 1,000,000 AVC video subscribers
pay a minimum of $0.075 and a maximum of $0.25 per
subscriber per year. Operators of over-the-air free
broadcast services are charged $10,000 per year per
transmitter. Free Internet broadcast is exempt from any
fees until December 2010.
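As a toy illustration of the unit-fee rule just quoted (and only of that rule), the sketch below (in C) computes the yearly MPEG LA unit royalty; the figures are those given above and are, of course, subject to change.

#include <stdio.h>

/* $0.20 per encoder/decoder unit beyond the first 100,000 units per year,
 * per the MPEG LA terms quoted in the text. */
static double mpegla_unit_royalty(long units_per_year)
{
    const long free_units = 100000;
    const double fee_usd = 0.20;
    return units_per_year > free_units
               ? (units_per_year - free_units) * fee_usd
               : 0.0;
}

int main(void)
{
    printf("royalty for 250,000 units: $%.2f\n", mpegla_unit_royalty(250000));
    return 0;
}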
VIA Licensing terms: After the end of a grace period in December 2004, an end product manufacturer of encoders or decoders has to pay a unit fee of $0.25 per unit. A participation or replication fee is not required if
the content is provided for free to the users. A fee of
$0.005 for titles shorter than 30 minutes up to $0.025 for
titles longer than 90 minutes has to be paid for titles that
are permanently sold. For titles that are sold on a tem-
porary basis, the ‘replication fee’ is $0.0025. This patent
pool does not require the payment of any fees as long as
a company distributes less than 50,000 devices and
derives less than $500,000 revenue from its activities
related to devices and content distribution. It appears that interactive communication services like video telephony require only a unit fee but no participation fee. While previous standards like MPEG-2 Video also
required a license fee to be paid for every encoder and
decoder, the participation fees established for the use of
H.264/AVC require extra efforts from potential commer-
cial users of H.264/AVC.
Disclaimer: No reliance may be placed on this section
on licensing of H.264/AVC technology without written
confirmation of its contents from an authorized repre-
sentative.
8. Summary
This new international video coding standard has been jointly developed and approved by the MPEG group of ISO/IEC and the VCEG group of ITU-T. Compared to previ-
ous video coding standards, H.264/AVC provides an
improved coding efficiency and a significant improve-
ment in flexibility for effective use over a wide range of
networks. While H.264/AVC still uses the concept of
block-based motion compensation, it provides some sig-
nificant changes:
■ Enhanced motion compensation capability using
high precision and multiple reference frames
■ Use of an integer DCT-like transform instead of the DCT (a sketch of the 4×4 core transform follows this list)
■ Enhanced adaptive entropy coding including arith-
metic coding
■ Adaptive in-loop deblocking filter
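As referenced in the list above, the sketch below (in C) shows the 4x4 integer core transform Y = C X C^T that takes the place of the floating-point DCT (cf. [15]); it is exact in integer arithmetic, and the scaling that makes it a DCT approximation is folded into quantization and therefore omitted here.

#include <stdio.h>

/* 4x4 integer core transform matrix of H.264/AVC (cf. [15]). */
static const int C[4][4] = {
    { 1,  1,  1,  1 },
    { 2,  1, -1, -2 },
    { 1, -1, -1,  1 },
    { 1, -2,  2, -1 },
};

/* y = C * x * C^T, computed exactly with small-integer arithmetic
 * (written as plain multiplications here; implementations use adds
 * and shifts). */
static void forward_core_transform(const int x[4][4], int y[4][4])
{
    int t[4][4];
    for (int i = 0; i < 4; i++)      /* t = C * x */
        for (int j = 0; j < 4; j++) {
            t[i][j] = 0;
            for (int k = 0; k < 4; k++)
                t[i][j] += C[i][k] * x[k][j];
        }
    for (int i = 0; i < 4; i++)      /* y = t * C^T */
        for (int j = 0; j < 4; j++) {
            y[i][j] = 0;
            for (int k = 0; k < 4; k++)
                y[i][j] += t[i][k] * C[j][k];
        }
}

int main(void)
{
    int x[4][4] = { {1,2,3,4}, {5,6,7,8}, {9,10,11,12}, {13,14,15,16} };
    int y[4][4];
    forward_core_transform(x, y);
    printf("DC coefficient: %d\n", y[0][0]); /* sum of all 16 residuals */
    return 0;
}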
The coding tools of H.264/AVC when used in an optimized
mode allow for bit savings of about 50% compared to pre-
vious video coding standards like MPEG-4 and MPEG-2 for
a wide range of bit rates and resolutions. However, these
savings come at the price of an increased complexity. The
decoder is about 2 times as complex as an MPEG-4 Visual
decoder for the Simple profile, and the encoder is about
10 times as complex as a corresponding MPEG-4 Visual
encoder for the Simple profile. The H.264/AVC main pro-
file decoder suitable for entertainment applications is
about four times more complex than MPEG-2. The
encoder complexity depends largely on the algorithms
for motion estimation as well as for the rate-constrained
encoder control. Given the performance increase of VLSI circuits since the introduction of MPEG-2, H.264/AVC today is, relative to the available processing power, less complex than MPEG-2 was in 1994. At this point, companies may already license technology for implementing an H.264/AVC decoder from two licensing authorities, simplifying the process of building products on H.264/AVC technology.
9. Acknowledgments
The authors would like to thank the experts of ISO/IEC
MPEG, ITU-T VCEG, and ITU-T/ISO/IEC Joint Video Team
for their contributions in developing the standard.
10. References
[1] ISO/IEC 14496–10:2003, “Coding of Audiovisual Objects—Part 10:
Advanced Video Coding,” 2003, also ITU-T Recommendation H.264
“Advanced video coding for generic audiovisual services.”
[2] ITU-T Recommendation H.261, “Video codec for Audiovisual Services
at p × 64 kbit/s,” March 1993.
[3] ISO/IEC 11172: “Information technology—coding of moving pictures
and associated audio for digital storage media at up to about 1.5 Mbit/s,”
Geneva, 1993.
[4] ISO/IEC 13818–2: “Generic coding of moving pictures and associated
audio information—Part 2: Video,” 1994, also ITU-T Recommendation H.262.
[5] ITU-T Recommendation H.263, “Video Coding for Low bit rate Commu-
nication,” version 1, Nov. 1995; version 2, Jan. 1998; version 3, Nov. 2000.
[6] ISO/IEC 14496–2: “Information technology—coding of audiovisual
objects—part 2: visual,” Geneva, 2000.
[7] T. Stockhammer, M.M. Hannuksela, and T. Wiegand, “H.264/AVC in
wireless environments,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 657–673, July 2003.
[8] T. Wedi and H.G. Musmann, “Motion- and aliasing-compensated pre-
diction for hybrid video coding,” IEEE Transactions on Circuits and Sys-
tems for Video Technology, vol. 13, pp. 577–587, July 2003.
[9] T. Wiegand, X. Zhang, and B. Girod, “Long-term memory motion-com-
pensated prediction for video coding,” IEEE Trans. Circuits Syst. Video
Technol., vol. 9, pp. 70–84, Feb. 1999.
[10] T. Wiegand, H. Schwarz, A. Joch, and F. Kossentini, “Rate-con-
strained coder control and comparison of video coding standards,” IEEE
Transactions on Circuits and Systems for Video Technology, vol. 13, pp.
688–703, July 2003.
[11] B. Girod, “Efficiency analysis of multihypothesis motion-compen-
sated prediction for video coding,” IEEE Trans. Image Processing, vol. 9,
Feb. 1999.
[12] M. Flierl, T. Wiegand, and B. Girod, “Rate-constrained multi-hypothesis
motion-compensated prediction for video coding,” in Proc. IEEE Int. Conf.
Image Processing, Vancouver, BC, Canada, Sept. 2000, vol. 3, pp. 150–153.
[13] N. Ahmed, T. Natarajan, and R. Rao, “Discrete cosine transform,”
IEEE Transactions on Computers, vol. C-23, pp. 90–93, Jan. 1974.
[14] I.E.G. Richardson, “H.264/MPEG-4 Part 10 White Paper.” Available: http://www.vcodex.fsnet.co.uk/resources.html
[15] H. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, “Low-Com-
plexity transform and quantization in H.264/AVC,” IEEE Transactions on
Circuits and Systems for Video Technology, vol. 13, pp. 598–603, July 2003.
[16] D. Marpe, H. Schwarz, and T. Wiegand, “Context-based adaptive
binary arithmetic coding in the H.264/AVC video compression standard,”
IEEE Transactions on Circuits and Systems for Video Technology, vol. 13,
pp. 620–636, July 2003.
[17] P. List, A. Joch, J. Lainema, and G. Bjontegaard, “Adaptive deblocking filter,” IEEE Transactions on Circuits and Systems for Video Technology,
vol. 13, pp. 614–619, July 2003.
[18] J. Ribas-Corbera, P.A. Chou, and S. Regunathan, “A generalized
hypothetical reference decoder for H.264/AVC,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 674–687, July 2003.
[19] B. Girod, M. Kalman, Y.J. Liang, and R. Zhang, “Advances in video
channel-adaptive streaming,” in Proc. ICIP 2002, Rochester, NY, Sept. 2002.
[20] Y.J. Liang and B. Girod, “Rate-distortion optimized low--latency
video streaming using channel-adaptive bitstream assembly,” in Proc.
ICME 2002, Lausanne, Switzerland, Aug. 2002.
[21] S.H. Kang and A. Zakhor, “Packet scheduling algorithm for wireless
video streaming,” in Proc. International Packet Video Workshop 2002,
Pittsburgh, PA, April 2002.
[22] M. Karczewicz and R. Kurçeren, “The SP and SI frames design for
H.264/AVC,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 637–644, July 2003.
[23] S. Wenger. (September 2001). Common Conditions for wire-line, low delay IP/UDP/RTP packet loss resilient testing. VCEG-N79r1. Available: http://standard.pictel.com/ftp/video-site/0109_San/VCEG-N79r1.doc
[24] S. Wenger, “H.264/AVC over IP,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 645–656, July 2003.
[25] Y.-K. Wang, M.M. Hannuksela, V. Varsa, A. Hourunranta, and M.
Gabbouj, “The error concealment feature in the H.26L test model,” in
Proc. ICIP, vol. 2, pp. 729–732, Sept. 2002.
[26] Y.-K. Wang, M.M. Hannuksela, and M. Gabbouj, “Error-robust
inter/intra mode selection using isolated regions,” in Proc. Int. Packet
Video Workshop 2003, Apr. 2003.
[27] R. Zhang, S.L. Regunathan, and K. Rose, “Video coding with optimal
inter/intra-mode switching for packet loss resilience,” IEEE JSAC, vol. 18,
no. 6, pp. 966–976, July 2000.
[28] S. Wenger, “Video redundancy coding in H.263+,” 1997 International
Workshop on Audio-Visual Services over Packet Networks, Sept. 1997.
[29] Y.-K. Wang, M.M. Hannuksela, and M. Gabbouj, “Error resilient video
coding using unequally protected key pictures,” in Proc. International
Workshop VLBV03, Sept. 2003.
[30] B. Girod and N. Färber, “Feedback-based error control for mobile video transmission,” Proceedings of the IEEE, vol. 87, no. 10, pp. 1707–1723, Oct. 1999.
[31] S. Fukunaga, T. Nakai, and H. Inoue, “Error resilient video coding by
dynamic replacing of reference pictures,” in Proc. IEEE Globecom, vol. 3,
Nov. 1996.
[32] A. Ortega and K. Ramchandran, “Rate-distortion methods for image
and video compression,” IEEE Signal Processing Magazine, vol. 15, no. 6,
pp. 23–50, Nov. 1998.
[33] G.J. Sullivan and T. Wiegand, “Rate-distortion optimization for video
compression,” IEEE Signal Processing Magazine, vol. 15, pp. 74–90, Nov. 1998.
[34] T. Wiegand and B. Girod, “Lagrangian multiplier selection in hybrid
video coder control,” in Proc. of ICIP 2001, Thessaloniki, Greece, Oct. 2001.
[35] T. Stockhammer, D. Kontopodis, and T. Wiegand, “Rate-distortion
optimization for H.26L video coding in packet loss environment,” in Proc.
Packet Video Workshop 2002, Pittsburgh, PA, April 2002.
[36] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. Sullivan, “Per-
formance comparison of video coding standards using Lagrangian coder
control,” Proc. of the IEEE ICIP 2002, part II, pp. 501–504, Sept. 2002.
[37] ISO/IEC 15444-3, “Motion-JPEG2000” (JPEG2000 Part 3), Geneva, 2002.
[38] D. Marpe, V. George, H.L. Cycon, and K.U. Barthel, “Performance
evaluation of Motion-JPEG2000 in comparison with H.264/AVC operated
in intra coding mode,” in Proc. SPIE Conf. on Wavelet Applications in
Industrial Processing, Photonics East, Rhode Island, USA, Oct. 2003.
[39] F. Catthoor et al., Custom Memory Management Methodology. Kluwer Academic Publishers, 1998.
[40] J. Bormans et al., “Integrating system-level low power methodolo-
gies into a real-life design flow,” in Proc. IEEE PATMOS ‘99, Kos, Greece,
Oct. 1999, pp. 19–28.
[41] A. Chimienti, L. Fanucci, R. Locatelli, and S. Saponara, “VLSI architec-
ture for a low-power video codec system,” Microelectronics Journal,
vol. 33, no. 5, pp. 417–427, 2002.
[42] T. Meng et al., “Portable video-on-demand in wireless communica-
tion,” Proc. of the IEEE, vol. 83, no. 4, pp. 659–680, 1995.
[43] J. Jung, E. Lesellier, Y. Le Maguet, C. Miro, and J. Gobert, “Philips deblock-
ing solution (PDS), a low complexity deblocking for JVT,” Joint Video Team
of ISO/IEC MPEG and ITU-T VCEG, JVT-B037, Geneva, CH, Feb. 2002.
[44] L. Nachtergaele et al., “System-Level power optimization of video
codecs on embedded cores: A systematic approach,” Journal of VLSI Sig-
nal Processing, Kluwer Academic Publisher, vol. 18, no. 2, pp. 89–111, 1998.
[45] http://www.imec.be/atomium
[46] S. Bauer, et al., “The MPEG-4 multimedia coding standard: Algo-
rithms, architectures and applications,” Journal of VLSI Signal Processing.
Boston: Kluwer, vol. 23, no. 1, pp. 7–26, Oct. 1999.
[47] S. Saponara, C. Blanch, K. Denolf, and J. Bormans, “The JVT advanced
video coding standard: Complexity and performance analysis on a tool-by-
tool basis,” Packet Video Workshop (PV’03), Nantes, France, April 2003.
[48] V. Lappalainen et al., “Optimization of emerging H.26L video
encoder,” in Proc. IEEE SIPS’01, Sept. 2001, pp. 406–415.
[49] V. Lappalainen, A. Hallapuro, and T. Hamalainen, “Performance
analysis of low bit rate H.26L video encoder,” in Proc. IEEE ICASSP'01,
May 2001, pp. 1129–1132.
[50] M. Horowitz, A. Joch, F. Kossentini, and A. Hallapuro, “H.264/AVC
baseline profile decoder complexity analysis,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 715–727, 2003.
Jörn Ostermann studied Electrical Engi-
neering and Communications Engineering
at the University of Hannover and Imperi-
al College London, respectively. He
received Dipl.-Ing. and Dr.-Ing. from the
University of Hannover in 1988 and 1994,
respectively. From 1988 to 1994, he
worked as a Research Assistant at the Institut für Theo-
retische Nachrichtentechnik conducting research in low
bit-rate and object-based analysis-synthesis video coding.
In 1994 and 1995 he worked in the Visual Communications
Research Department at AT&T Bell Labs on video coding.
He was a member of Image Processing and Technology
Research within AT&T Labs—Research from 1996 to 2003.
Since 2003 he has been Full Professor and Head of the Institut für Theoretische Nachrichtentechnik und Informationsverarbeitung at the Universität Hannover, Germany.
From 1993 to 1994, he chaired the European COST 211
sim group coordinating research in low bitrate video cod-
ing. Within MPEG-4, he organized the evaluation of video
tools to start defining the standard. He chaired the Adhoc
Group on Coding of Arbitrarily-shaped Objects in MPEG-4
Video. Jörn was a scholar of the German National Foun-
dation. In 1998, he received the AT&T Standards Recogni-
tion Award and the ISO award. He is a member of IEEE, the
IEEE Technical Committee on Multimedia Signal Process-
ing, past chair of the IEEE CAS Visual Signal Processing
and Communications (VSPC) Technical Committee and a
Distinguished Lecturer of the IEEE CAS Society. He pub-
lished more than 50 research papers and book chapters.
He is coauthor of a graduate level text book on video com-
munications. He holds 10 patents. His current research
interests are video coding and streaming, 3D modeling,
face animation, and computer-human interfaces.
Matthias Narroschke was born in
Hanover in 1974. He received his Dipl.-Ing.
degree in electrical engineering from the
University of Hanover in 2001 (with high-
est honors). Since then he has been work-
ing toward the PhD degree at the Institut
für Theoretische Nachrichtentechnik und
Informationsverarbeitung of the University of Hanover. His
research interests include video coding, 3D image process-
ing and video processing, and internet streaming. In 2003,
he became a Senior Engineer. He received the Robert-
Bosch-Prize for the best Dipl.-Ing. degree in electrical engi-
neering in 2001. He is an active delegate to the Motion
Picture Experts Group (MPEG).
Thomas Wedi received his Dipl.-Ing.
degree in 1999 from the University of Han-
nover, Hannover, Germany, where he is
currently working toward the Ph.D.
degree with research focused on motion-
and aliasing-compensated prediction for
hybrid video coding.
He has been with the Institut für Theoretische
Nachrichtentechnik und Informationsverarbeitung, Uni-
versity of Hannover, as Research Scientist and Teaching
Assistant. In 2001 he also became a Senior Engineer. His
further research interests include video coding and trans-
mission, 3D image and video processing, and audio-visual
communications. He is an active contributor to the ITU-T
Video Coding Experts Group (VCEG) and the ISO/IEC/
ITU-T Joint Video Team (JVT), where the H.264/AVC video
coding standard is developed. In both standardization
groups he chaired an Ad-Hoc group on interpolation fil-
tering. In cooperation with Robert Bosch GmbH, he holds
several international patents in the area of video com-
pression.
Thomas Stockhammer received his
Diplom-Ingenieur degree in electrical engi-
neering from the Munich University of
Technology (TUM), Germany, in 1996.
Since then he has been working toward
the Dr.-Ing. degree at the Munich Universi-
ty of Technology, Germany, in the area of
multimedia and video transmission over mobile and pack-
et-lossy channels. In 1996, he visited Rensselaer Polytechnic Institute (RPI), Troy, NY, to perform his diploma thesis
in the area of combined source-channel coding for video
and coding theory. There he started the research in video
transmission as well as source and channel coding.
In 2000, he was a Visiting Researcher in the Information Coding Laboratory at the University of California, San Diego (UCSD). Since then he has published numerous con-
ference and journal papers and holds several patents. He
regularly participates and contributes to different stan-
dardization activities, e.g. ITU-T H.324, H.264, ISO/IEC
MPEG, JVT, IETF, and 3GPP. He acts as a member of sev-
eral technical program committees, as a reviewer for dif-
ferent journals, and as an evaluator for the European
Commission. His research interests include joint source
and channel coding, video transmission, multimedia net-
works, system design, rate-distortion optimization, infor-
mation theory, as well as mobile communications.
Jan Bormans, Ph.D., was a researcher at the Information Retrieval and Interpretation Sciences laboratory of the Vrije Universiteit Brussel (VUB), Belgium, in 1992 and 1993. In 1994, he joined the VLSI Systems and Design Methodologies (VSDM) division of the IMEC research center in Leuven, Belgium. Since 1996, he has been heading IMEC's Multimedia Image Compression Systems
group. This group focuses on the efficient design and
implementation of embedded systems for advanced mul-
timedia applications. Jan Bormans is the Belgian head of
delegation for ISO/IEC's MPEG and SC29 standardization
committees. He is also MPEG-21 requirements editor and
chairman of the MPEG liaison group.
Fernando Pereira was born in Vermelha,
Portugal, in October 1962. He graduated in Electrical and Computer Engineering from Instituto Superior Técnico (IST), Universidade Técnica de Lisboa, Portugal, in 1985. He received the M.Sc. and Ph.D. degrees in Electrical and Computer Engineering from IST, in 1988 and 1991, respectively.
He is currently Professor at the Electrical and Computer Engineering Department of IST. He is responsible
for the participation of IST in many national and inter-
national research projects. He is a member of the Edi-
torial Board and Area Editor on Image/Video
Compression of the Signal Processing: Image Communi-
cation Journal and an Associate Editor of IEEE Transac-
tions on Circuits and Systems for Video Technology,
IEEE Transactions on Image Processing, and IEEE Trans-
actions on Multimedia. He is a member of the Scientific
and Program Committees of many international conferences and workshops. He has contributed more than
130 papers to journals and international conferences.
He won the 1990 Portuguese IBM Award and an ISO
Award for Outstanding Technical Contribution for his
participation in the development of the MPEG-4 Visual
standard, in October 1998.
He has been participating in the work of ISO/MPEG for
many years, notably as the head of the Portuguese dele-
gation, chairman of the MPEG Requirements group, and
chairing many Ad Hoc Groups related to the MPEG-4 and
MPEG-7 standards.
His current areas of interest are video analysis, pro-
cessing, coding and description, and multimedia interac-
tive services.
Peter List graduated in Applied Physics in 1985 and received the Ph.D. in 1989 from the University of Frankfurt/Main, Germany.
Currently he is a project manager at T-Systems Nova, the R&D company of Deutsche Telekom. Since 1990 he has
been with Deutsche Telekom, and has actively followed
international standardization of video compression tech-
nologies in MPEG, ITU and several European Projects for
about 14 years.
Detlev Marpe received the Diploma
degree in mathematics with highest hon-
ors from the Technical University Berlin,
Germany. He is currently a Project Man-
ager in the Image Processing Department
of the Fraunhofer-Institute for Telecom-
munications, Heinrich-Hertz-Institute
(HHI), Berlin, Germany, where he is responsible for
research projects in the area of video coding, image pro-
cessing, and video streaming. Since 1997, he has been an
active contributor to the ITU-T VCEG, ISO/IEC JPEG and
ISO/IEC MPEG standardization activities for still image and
video coding. During 2001–2003, he chaired the CABAC
Ad-Hoc Group within the H.264/MPEG-4 AVC standardiza-
tion effort of the ITU-T/ISO/IEC Joint Video Team. He has
authored or co-authored more than 30 journal and confer-
ence papers in the fields of image and video coding, image
processing and information theory, and he has written
more than 40 technical contributions to various interna-
tional standardization projects. He also holds several
international patents. He is a member of IEEE and ITG
(German Society of Information Technology). As a co-
founder of daViKo GmbH, a Berlin-based start-up compa-
ny involved in development of server-less multipoint
videoconferencing products for Intranet or Internet col-
laboration, he received the Prime Prize of the 2001 Multi-
media Start-up Competition founded by the German
Federal Ministry of Economics and Technology.
28 IEEE CIRCUITS AND SYSTEMS MAGAZINE FIRST QUARTER 2004

More Related Content

PPT
H.264 video standard
PPTX
An Overview of High Efficiency Video Codec HEVC (H.265)
PDF
H.264 nal and RTP
PPT
H.263 Video Codec
PPTX
Video coding standards ppt
PPTX
A short history of video coding
PPTX
Subjective quality evaluation of the upcoming HEVC video compression standard
PDF
Presentazione Broadcast H.265 & H.264 Sematron Italia - Maggio 2016
H.264 video standard
An Overview of High Efficiency Video Codec HEVC (H.265)
H.264 nal and RTP
H.263 Video Codec
Video coding standards ppt
A short history of video coding
Subjective quality evaluation of the upcoming HEVC video compression standard
Presentazione Broadcast H.265 & H.264 Sematron Italia - Maggio 2016

What's hot (16)

PDF
The H.265/MPEG-HEVC Standard
PDF
Deblocking_Filter_v2
PPTX
H.265ImprovedCE_over_H.264-HarmonicMay2014Final
PPT
Introduction to HEVC
PDF
HEVC VIDEO CODEC By Vinayagam Mariappan
PPT
HEVC Definitions and high-level syntax
PDF
Video decoding: SDI interface implementation &H.264/AVC bitstreamdecoder hard...
PPT
Video Coding Standard
PDF
Feature hevc
PDF
h.264 video compression standard.
PPTX
H.264 vs HEVC
PPTX
High Efficiency Video Codec
PDF
HEVC overview main
DOCX
Algorithm and architecture design of the h.265 hevc intra encoder
PPT
H263.ppt
PDF
Applied technology
The H.265/MPEG-HEVC Standard
Deblocking_Filter_v2
H.265ImprovedCE_over_H.264-HarmonicMay2014Final
Introduction to HEVC
HEVC VIDEO CODEC By Vinayagam Mariappan
HEVC Definitions and high-level syntax
Video decoding: SDI interface implementation &H.264/AVC bitstreamdecoder hard...
Video Coding Standard
Feature hevc
h.264 video compression standard.
H.264 vs HEVC
High Efficiency Video Codec
HEVC overview main
Algorithm and architecture design of the h.265 hevc intra encoder
H263.ppt
Applied technology
Ad

Viewers also liked (16)

PDF
Mobile Power Strategy
PDF
2 Molecular Biology
PDF
Demand7 - Customer Acquisition Engine
PPTX
فصل دوم کتاب هچ
PDF
Gestione dati - Traccia di analisi caso studio
PDF
December 2013 nscs
PDF
Chris Beppler Consultor Imobiliário
PDF
Trey's toondoo
PDF
Papercraft - Special detachment 88 black
PPTX
Автовокзал Псков
PPTX
How do you travel on the plane
PPTX
I love bike
PPTX
John huddle adif media presentation nov2014
PDF
Sam kandel bedbug_presentation
PPT
Slide caritas 20170224 def
Mobile Power Strategy
2 Molecular Biology
Demand7 - Customer Acquisition Engine
فصل دوم کتاب هچ
Gestione dati - Traccia di analisi caso studio
December 2013 nscs
Chris Beppler Consultor Imobiliário
Trey's toondoo
Papercraft - Special detachment 88 black
Автовокзал Псков
How do you travel on the plane
I love bike
John huddle adif media presentation nov2014
Sam kandel bedbug_presentation
Slide caritas 20170224 def
Ad

Similar to H264 final (20)

PDF
10.1.1.184.6612
PDF
Spatial Scalable Video Compression Using H.264
PDF
E010132529
PDF
[IJET-V1I2P1] Authors :Imran Ullah Khan ,Mohd. Javed Khan ,S.Hasan Saeed ,Nup...
PDF
Motion Vector Recovery for Real-time H.264 Video Streams
PDF
The H.264/AVC Advanced Video Coding Standard: Overview and ...
PDF
A REAL-TIME H.264/AVC ENCODER&DECODER WITH VERTICAL MODE FOR INTRA FRAME AND ...
PPT
09a video compstream_intro_trd_23-nov-2005v0_2
PPT
/conferences/spr2004/presentations/eubanks/eubanks_mpeg4.ppt
PDF
PDF
H264 video compression explained
PDF
H.264 video compression standard.
DOC
IBM VideoCharger and Digital Library MediaBase.doc
PDF
Paper id 2120148
PDF
Overview of the H.264/AVC video coding standard - Circuits ...
PDF
Next generation video compression
PDF
Next generation video compression
PDF
video compression2
PDF
video compression2
PDF
video compression2
10.1.1.184.6612
Spatial Scalable Video Compression Using H.264
E010132529
[IJET-V1I2P1] Authors :Imran Ullah Khan ,Mohd. Javed Khan ,S.Hasan Saeed ,Nup...
Motion Vector Recovery for Real-time H.264 Video Streams
The H.264/AVC Advanced Video Coding Standard: Overview and ...
A REAL-TIME H.264/AVC ENCODER&DECODER WITH VERTICAL MODE FOR INTRA FRAME AND ...
09a video compstream_intro_trd_23-nov-2005v0_2
/conferences/spr2004/presentations/eubanks/eubanks_mpeg4.ppt
H264 video compression explained
H.264 video compression standard.
IBM VideoCharger and Digital Library MediaBase.doc
Paper id 2120148
Overview of the H.264/AVC video coding standard - Circuits ...
Next generation video compression
Next generation video compression
video compression2
video compression2
video compression2

H264 final

  • 1. 7FIRST QUARTER 2004 1540-7977/04/$20.00©2004 IEEE IEEE CIRCUITS AND SYSTEMS MAGAZINE Videocodingwith H.264/AVC: Tools, Performance, and Complexity ©EYEWIRE;DIGITALSTOCK;COMSTOCK,INC.1998 Jörn Ostermann, Jan Bormans, Peter List, Detlev Marpe, Matthias Narroschke, Fernando Pereira, Thomas Stockhammer, and Thomas Wedi H.264/AVC, the result of the collaboration between the ISO/IEC Moving Picture Experts Group and the ITU-T Video Coding Experts Group, is the latest standard for video coding. The goals of this standardization effort were enhanced compression effi- ciency, network friendly video representation for interactive (video telephony) and non-interactive applications (broadcast, streaming, storage, video on demand). H.264/AVC provides gains in compression efficiency of up to 50% over a wide range of bit rates and video resolutions compared to previous stan- dards. Compared to previous standards, the decoder complexity is about four times that of MPEG-2 and two times that of MPEG-4 Visual Simple Profile. This paper provides an overview of the new tools, features and complexity of H.264/AVC. Index Terms—H.263, H.264, JVT, MPEG-1, MPEG-2, MPEG-4, standards, video coding, motion compensation, transform coding, streaming Abstract Feature
  • 2. 1. Introduction T he new video coding standard Recommendation H.264 of ITU-T also known as International Stan- dard 14496-10 or MPEG-4 part 10 Advanced Video Coding (AVC) of ISO/IEC [1] is the latest standard in a sequence of the video coding standards H.261 (1990) [2], MPEG-1 Video (1993) [3], MPEG-2 Video (1994) [4], H.263 (1995, 1997) [5], MPEG-4 Visual or part 2 (1998) [6]. These previous standards reflect the technological progress in video compression and the adaptation of video coding to different applications and networks. Applications range from video telephony (H.261) to consumer video on CD (MPEG-1) and broadcast of standard definition or high definition TV (MPEG-2). Networks used for video commu- nications include switched networks such as PSTN (H.263, MPEG-4) or ISDN (H.261) and packet networks like ATM (MPEG-2, MPEG-4), the Internet (H.263, MPEG-4) or mobile networks (H.263, MPEG-4). The importance of new network access technologies like cable modem, xDSL, and UMTS created demand for the new video coding stan- dard H.264/AVC, providing enhanced video compression performance in view of interactive applications like video telephony requiring a low latency system and non-inter- active applications like storage, broadcast, and streaming of standard definition TV where the focus is on high cod- ing efficiency. Special consideration had to be given to the performance when using error prone networks like mobile channels (bit errors) for UMTS and GSM or the Internet (packet loss) over cable modems, or xDSL. Comparing the H.264/AVC video coding tools like multiple reference frames, 1/4 pel motion compensation, deblocking filter or integer transform to the tools of previous video coding standards, H.264/AVC brought in the most algorithmic discontinu- ities in the evolution of standard- ized video coding. At the same time, H.264/AVC achieved a leap in cod- ing performance that was not fore- seen just five years ago. This progress was made possible by the video experts in ITU-T and MPEG who established the Joint Video Team (JVT) in December 2001 to develop this H.264/AVC video coding standard. H.264/AVC was finalized in March 2003 and approved by the ITU-T in May 2003. The corresponding stan- dardization documents are downloadable from ftp://ftp.imtc-files.org/jvt- experts and the reference software is available at h t t p : / / b s . h h i . d e / ~suehring/tml/download. Modern video communi- cation uses digital video that is captured from a camera or synthesized using appropriate tools like animation software. In an optional pre-processing 8 IEEE CIRCUITS AND SYSTEMS MAGAZINE FIRST QUARTER 2004 H.264 to H.264 to H.324/M H.264 to RTP/IP H.264 to H.320 H.264 to File Format TCP/IP H.264 to MPEG-2 Systems H.264/AVC Conceptual Layers Video Coding Layer Encoder Video Coding Layer Encoder VCL-NAL Interface Network Abstraction Layer Encoder Network Abstraction Layer Encoder NAL Encoder Interface NAL Decoder Interface Transport Layer Wired Networks Wireless Networks Figure 2. H.264/AVC in a transport environment: The network abstraction layer interface enables a seamless integration with stream and packet-oriented transport layers (from [7]) . Source (Video) Receiver (Video) Video Video Pre-Processing Post-Processing & Error Recovery Encoding Decoding Scope of Standard Bitstream Bitstream Channel/ Storage Figure 1. Scope of video coding standardization: Only the syntax and semantics of the bitstream and its decoding are defined. 
Jörn Ostermann is with the Institut für Theoretische Nachrichtentechnik und Informationsverarbeitung, University of Hannover, Hannover, Ger- many. Jan Bormans is with IMEC, Leuven, Belgium. Peter List is with Deutsche Telecom, T-Systems, Darmstadt, Germany. Detlev Marpe is with the Fraunhofer-Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany. Matthias Narroschke is with the Institut für Theo- retische Nachrichtentechnik und Informationsverarbeitung, University of Hannover, Appelstr. 9a, 30167 Hannover, Germany, narrosch@tnt.uni- hannover.de. Fernando Peirera is with Instituto Superior Técnico - Instituto de Telecomunicações, Lisboa, Portugal. Thomas Stockhammer is with the Institute for Communications Engineering, Munich University of Technology, Germany. Thomas Wedi is with the Institut für Theo- retische Nachrichtentechnik und Informationsverarbeitung, University of Hannover, Hannover, Germany.
  • 3. step (Figure 1), the sender might choose to preprocess the video using format conversion or enhancement tech- niques. Then the en- coder encodes the video and represents the video as a bit stream. After transmission of the bit stream over a com- munications network, the decoder decodes the video which gets dis- played after an optional post-processing step which might include format conversion, filtering to suppress coding artifacts, error concealment, or video enhancement. The standard defines the syntax and semantics of the bit stream as well as the processing that the decoder needs to perform when decoding the bit stream into video. Therefore, manufactures of video decoders can only compete in areas like cost and hardware require- ments. Optional post-processing of the decoded video is another area where different manufactures will provide competing tools to create a decoded video stream opti- mized for the targeted application. The standard does not define how encoding or other video pre-processing is per- formed thus enabling manufactures to compete with their encoders in areas like cost, coding efficiency, error resilience and error recovery, or hardware requirements. At the same time, the standardization of the bit stream and the decoder preserves the fundamental requirement for any communications standard—interoperability. For efficient transmission in different environments not only coding efficiency is relevant, but also the seam- less and easy integration of the coded video into all cur- rent and future protocol and network architectures. This includes the public Internet with best effort delivery, as well as wireless networks expected to be a major applica- tion for the new video coding standard. The adaptation of the coded video representation or bitstream to different transport networks was typically defined in the systems specification in previous MPEG standards or separate standards like H.320 or H.324. However, only the close integration of network adaptation and video coding can bring the best possible performance of a video communi- cation system. Therefore H.264/AVC consists of two con- ceptual layers (Figure 2). The video coding layer (VCL) defines the efficient representation of the video, and the network adaptation layer (NAL) converts the VCL repre- 9FIRST QUARTER 2004 IEEE CIRCUITS AND SYSTEMS MAGAZINE Decoded Macroblock Intra/Inter Intra-Frame Prediction Motion Comp. Prediction Inverse Transform Deblocking Filter Memory Motion Data Entropy Decoding + Quantized Coefficients Figure 4. Generalized block diagram of a hybrid video decoder with motion compensation. Macroblock of Input Image Signal + Prediction Error Signal Transform Quant. Intra/Inter Intra-Frame Prediction Motion Comp. Prediction Motion Estimation Inverse Transform Deblocking Filter Memory Motion Data Quantized Coefficients Entropy Coding + Figure 3. Generalized block diagram of a hybrid video encoder with motion compensation: The adaptive deblocking filter and intra-frame prediction are two new tools of H.264.
  • 4. sentation into a format suitable for specific transport lay- ers or storage media. For circuit-switched transport like H.320, H.324M or MPEG-2, the NAL delivers the coded video as an ordered stream of bytes containing start codes such that these transport layers and the decoder can robustly and simply identify the structure of the bit stream. For packet switched networks like RTP/IP or TCP/IP, the NAL delivers the coded video in packets with- out these start codes. This paper gives an overview of the working, perform- ance and hardware requirements of H.264/AVC. In Section 2, the concept of standardized video coding schemes is introduced. In Section 3, we describe the major tools of H.264/AVC that achieve this progress in video coding per- formance. Video coder optimization is not part of the standard. However, the successful use of the encoder requires knowledge on encoder control that is presented in Section 4. H.264/AVC may be used for different applica- tions with very different constraints like computational resources, error resilience and video resolution. Section 5 describes the profiles and levels of H.264/AVC that allow for the adaptation of the decoder complexity to different applications. In Section 6, we give comparisons between H.264/AVC and previous video coding standards in terms of coding efficiency as well as hardware complexity. H.264/AVC uses many international patents, and Section 7 paraphrases the current licensing model for the commer- cial use of H.264/AVC. 2. Concept of Standardized Video Coding Schemes Standardized video coding techniques like H.263, H.264/AVC, MPEG-1, 2, 4 are based on hybrid video cod- ing. Figure 3 shows the generalized block diagram of such a hybrid video encoder. The input image is divided into macroblocks. Each macroblock consists of the three components Y, Cr and Cb. Y is the luminance component which represents the brightness information. Cr and Cb represent the color information. Due to the fact that the human eye system is less sensitive to the chrominance than to the luminance the chrominance signals are both subsampled by a factor of 2 in horizontal and vertical direction. Therefore, a mac- roblock consists of one block of 16 by 16 picture elements for the luminance component and of two blocks of 8 by 8 picture elements for the color components. These macroblocks are coded in Intra or Inter mode. In Inter mode, a macroblock is predicted using motion compensation. For motion compensated prediction a dis- placement vector is estimated and transmitted for each block (motion data) that refers to the corresponding position of its image signal in an already transmitted ref- erence image stored in memory. In Intra mode, former standards set the prediction signal to zero such that the image can be coded without reference to previously sent information. This is important to provide for error resilience and for entry points into the bit streams enabling random access. The prediction error, which is the difference between the original and the predicted block, is transformed, quantized and entropy coded. In order to reconstruct the same image on the decoder side, the quantized coefficients are inverse transformed and added to the prediction signal. The result is the recon- structed macroblock that is also available at the decoder side. This macroblock is stored in a memory. Mac- roblocks are typically stored in raster scan order. With respect to this simple block diagram (Figure 3), H.264/AVC introduces the following changes: 1. 
In order to reduce the block-artifacts an adaptive deblocking filter is used in the prediction loop. The deblocked macroblock is stored in the memory and can be used to predict future macroblocks. 2. Whereas the memory contains one video frame in previous standards, H.264/AVC allows storing multi- ple video frames in the memory. 3. In H.264/AVC a prediction scheme is used also in Intra mode that uses the image signal of already transmit- ted macroblocks of the same image in order to pre- dict the block to code. 4. The Discrete Cosine Transform (DCT) used in former standards is replaced by an integer transform. Figure 4 shows the generalized block diagram of the corresponding decoder. The entropy decoder decodes the quantized coefficients and the motion data, which is used for the motion compensated prediction. As in the encoder, a prediction signal is obtained by intra-frame or motion compensated prediction, which is added to the inverse transformed coefficients. After deblocking filter- ing, the macroblock is completely decoded and stored in the memory for further predictions. In H.264/AVC, the macroblocks are processed in so called slices whereas a slice is usually a group of mac- roblocks processed in raster scan order (see Figure 5). In special cases, which will be discussed in Section 3.6, the 10 IEEE CIRCUITS AND SYSTEMS MAGAZINE FIRST QUARTER 2004 Slice 0 Slice 1 Slice 2 Figure 5. Partitioning of an image into several slices.
  • 5. processing can differ from the raster scan order. Five dif- ferent slice-types are supported which are I-, P-, B-, SI,- and SP-slices. In an I-slice, all macroblocks are encoded in Intra mode. In a P-slice, all macroblocks are predicted using a motion compensated prediction with one refer- ence frame and in a B-slice with two reference frames. SI- and SP-slices are specific slices that are used for an effi- cient switching between two different bitstreams. They are both discussed in Section 3.6. For the coding of interlaced video, H.264/AVC sup- ports two different coding modes. The first one is called frame mode. In the frame mode, the two fields of one frame are coded together as if they were one single pro- gressive frame. The second mode is called field mode. In this mode, the two fields of a frame are encoded sepa- rately. These two different coding modes can be selected for each image or even for each macroblock. If they are selected for each image, the coding is referred to as pic- ture adaptive field/frame coding (P-AFF). Whereas MPEG-2 allows for selecting the frame/field coding on a mac- roblock level H.264 allow for selecting this mode on a ver- tical macroblock pair level. This coding is referred to as macroblock-adaptive frame/field coding (MB-AFF). The choice of the frame mode is efficient for regions that are not moving. In non-moving regions there are strong sta- tistical dependencies between adjacent lines even though these lines belong to different fields. These dependencies can be exploited in the frame mode. In the case of moving regions the statistical dependencies between adjacent lines are much smaller. It is more efficient to apply the field mode and code the two fields separately. 3. The H.264/AVC Coding Scheme In this Section, we describe the tools that make H.264 such a successful video coding scheme. We discuss Intra coding, motion compensated prediction, transform cod- ing, entropy coding, the adaptive deblocking filter as well as error robustness and network friendliness. 3.1 Intra Prediction Intra prediction means that the samples of a macroblock are predicted by using only information of already trans- mitted macroblocks of the same image. In H.264/AVC, two different types of intra prediction are possible for the prediction of the luminance component Y. The first type is called INTRA_4×4 and the second one INTRA_16×16. Using the INTRA_4×4 type, the mac- roblock, which is of the size 16 by 16 picture elements (16×16), is divided into sixteen 4×4 subblocks and a pre- diction for each 4×4 subblock of the luminance signal is applied individually. For the prediction purpose, nine dif- ferent prediction modes are supported. One mode is DC- prediction mode, whereas all samples of the current 4×4 subblock are predicted by the mean of all samples neigh- boring to the left and to the top of the current block and which have been already reconstructed at the encoder and at the decoder side (see Figure 6, Mode 2). In addition to DC-prediction mode, eight prediction modes each for a specific prediction direction are supported. All possible directions are shown in Figure 7. Mode 0 (vertical predic- tion) and Mode 1 (horizontal prediction) are shown explicitly in Figure 6. For example, if the vertical predic- tion mode is applied all samples below sample A (see Fig- ure 6) are predicted by sample A, all samples below sample B are predicted by sample B and so on. Using the type INTRA_16×16, only one prediction mode is applied for the whole macroblock. 
Four different prediction modes are supported for the type INTRA_16×16: vertical prediction, horizontal prediction, DC-prediction, and plane-prediction. Plane-prediction uses a linear function of the neighboring samples to the left and to the top in order to predict the current samples; this mode works very well in areas of gently changing luminance. These modes operate in the same way as the corresponding 4×4 prediction modes; the only difference is that they are applied to the whole macroblock instead of to a 4×4 subblock. The efficiency of these modes is high if the signal is very smooth within the macroblock.

[Figure 6. Three out of nine possible intra prediction modes for the intra prediction type INTRA_4×4: Mode 0 (vertical), Mode 1 (horizontal), and Mode 2 (DC). The neighboring samples A–M are already reconstructed at the encoder and the decoder side; in Mode 2, the block is predicted by Mean(A, B, C, D, I, J, K, L).]
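To make the three modes of Figure 6 concrete, the following minimal Python sketch predicts one 4×4 luminance subblock from its reconstructed neighbors. It is an illustration, not the normative decoding process: the six remaining directional modes and the availability rules for border blocks are omitted, and the sample values are invented for the example.

    import numpy as np

    def intra_4x4_predict(mode, top, left):
        """Sketch of three of the nine INTRA_4x4 prediction modes.
        top  -- samples A..D above the block (already reconstructed)
        left -- samples I..L left of the block (already reconstructed)"""
        if mode == 0:                    # Mode 0, vertical: copy A..D downwards
            return np.tile(top, (4, 1))
        if mode == 1:                    # Mode 1, horizontal: copy I..L rightwards
            return np.tile(left.reshape(4, 1), (1, 4))
        if mode == 2:                    # Mode 2, DC: rounded mean of A..D and I..L
            dc = (top.sum() + left.sum() + 4) // 8
            return np.full((4, 4), dc)
        raise NotImplementedError("directional modes 3-8 omitted in this sketch")

    top = np.array([100, 102, 104, 106])    # neighbors A, B, C, D
    left = np.array([98, 99, 101, 103])     # neighbors I, J, K, L
    print(intra_4x4_predict(2, top, left))  # 4x4 block filled with the DC mean

The encoder then codes only the difference between the actual subblock and such a prediction, which is what makes the directional modes effective along oriented structures.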
The intra prediction for the chrominance signals Cb and Cr of a macroblock is similar to the INTRA_16×16 type for the luminance signal, because the chrominance signals are very smooth in most cases. It is always performed on 8×8 blocks, using vertical prediction, horizontal prediction, DC-prediction, or plane-prediction. All intra prediction modes are explained in detail in [1].

3.2 Motion Compensated Prediction

In the case of motion compensated prediction, macroblocks are predicted from the image signal of already transmitted reference images. For this purpose, each macroblock can be divided into smaller partitions. Partitions with luminance block sizes of 16×16, 16×8, 8×16, and 8×8 samples are supported. In the case of an 8×8 sub-macroblock in a P-slice, one additional syntax element specifies whether the corresponding 8×8 sub-macroblock is further divided into partitions with block sizes of 8×4, 4×8, or 4×4 [8]. The partitions of a macroblock and of a sub-macroblock are shown in Figure 8. In former standards such as MPEG-4 or H.263, only blocks of size 16×16 and 8×8 are supported.

A displacement vector, which refers to the corresponding position of the block's image signal in an already transmitted reference image, is estimated and transmitted for each block. In former MPEG standards, this reference image is the most recent preceding image. In H.264/AVC, it is possible to refer to several preceding images. For this purpose, an additional picture reference parameter has to be transmitted together with the motion vector. This technique is denoted as motion-compensated prediction with multiple reference frames [9]. Figure 9 illustrates the concept, which is also extended to B-slices.

The accuracy of displacement vectors is a quarter of a picture element (quarter-pel or 1/4-pel). Displacement vectors with fractional-pel resolution may refer to positions in the reference image that are spatially located between the sampled positions of its image signal. In order to estimate and compensate fractional-pel displacements, the image signal of the reference image has to be generated at sub-pel positions by interpolation. In H.264/AVC, the luminance signal at half-pel positions is generated by applying a one-dimensional 6-tap FIR filter, which was designed to reduce the aliasing components that deteriorate the interpolation and the motion compensated prediction [8]. The image signal at quarter-pel positions is then generated by averaging the luminance signal at integer- and half-pel positions. The chrominance signal at all fractional-pel positions is obtained by averaging.

In comparison to prior video coding standards, the classical concept of B-pictures is extended to a generalized B-slice concept in H.264/AVC. In the classical concept, B-pictures are pictures that are encoded using both past and future pictures as references; the prediction is obtained by a linear combination of forward and backward prediction signals. In former standards, this linear combination is just an averaging of the two prediction signals, whereas H.264/AVC allows arbitrary weights. In this generalized concept, the linear combination of prediction signals is also made regardless of the temporal direction. For example, a linear combination of two forward-prediction signals may be used (see Figure 9). Furthermore, H.264/AVC makes it possible to use images containing B-slices as reference images for further predictions, which was not possible in any former standard. Details on this generalized B-slice concept, which is also known as multi-hypothesis motion-compensated prediction, can be found in [10], [11], [12].

[Figure 7. The eight possible prediction directions for the INTRA_4×4 prediction type.]
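The half-pel interpolation described above can be sketched in one dimension. The tap values of the H.264/AVC half-pel filter, (1, −5, 20, 20, −5, 1) with rounding and a shift by 5, are not quoted in the text above and are added here from the standard; the edge padding and the example row are simplifications for illustration only.

    import numpy as np

    def halfpel_row(row):
        """Half-pel samples of a 1-D luminance row using the 6-tap FIR
        filter (1, -5, 20, 20, -5, 1)/32; simplified edge padding."""
        padded = np.pad(row.astype(np.int32), 2, mode="edge")
        taps = np.array([1, -5, 20, 20, -5, 1])
        half = np.convolve(padded, taps, mode="valid")   # one value per gap
        return np.clip((half + 16) >> 5, 0, 255)         # round, /32, clip

    def quarterpel(a, b):
        """Quarter-pel samples: rounded average of two neighboring
        integer- and/or half-pel samples."""
        return (a + b + 1) >> 1

    row = np.array([16, 32, 64, 128, 64])
    h = halfpel_row(row)
    print(h)                        # half-pel values between the samples
    print(quarterpel(row[:-1], h))  # quarter-pel values next to them

Because this filter is applied inside the prediction loop, suppressing aliasing at sub-pel positions directly improves the motion compensated prediction signal.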
[Figure 8. Partitioning of a macroblock (16×16, 16×8, 8×16, 8×8) and of a sub-macroblock (8×8, 8×4, 4×8, 4×4) for motion compensated prediction.]

[Figure 9. Motion-compensated prediction with multiple reference images. In addition to the motion vector, an image reference parameter dt is transmitted.]
3.3 Transform Coding

Similar to former standards, transform coding is applied in order to code the prediction error signal. The task of the transform is to reduce the spatial redundancy of the prediction error signal. For this purpose, all former standards such as MPEG-1 and MPEG-2 applied a two-dimensional Discrete Cosine Transform (DCT) [13] of size 8×8. Instead of the DCT, different integer transforms are applied in H.264/AVC. The size of these transforms is mainly 4×4, in special cases 2×2. The smaller block size of 4×4 instead of 8×8 enables the encoder to better adapt the prediction error coding to the boundaries of moving objects, to match the transform block size to the smallest block size of the motion compensation, and to generally better adapt the transform to the local prediction error signal.

Three different types of transforms are used. The first type is applied to all samples of all prediction error blocks of the luminance component Y and also to all blocks of both chrominance components Cb and Cr, regardless of whether motion compensated prediction or intra prediction was used. The size of this transform is 4×4; its transform matrix H1 is shown in Figure 10. If the macroblock is predicted using the type INTRA_16×16, a second transform, a Hadamard transform with matrix H2 (see Figure 10), is applied in addition to the first one. It transforms all 16 DC coefficients of the already transformed blocks of the luminance signal; the size of this transform is also 4×4. The third transform is also a Hadamard transform, but of size 2×2. It is used for the transform of the 4 DC coefficients of each chrominance component; its matrix H3 is shown in Figure 10.

The transmission order of all coefficients is shown in Figure 11. If the macroblock is predicted using the intra prediction type INTRA_16×16, the block with the label "−1" is transmitted first; this block contains the DC coefficients of all blocks of the luminance component. Afterwards, all blocks labeled "0"–"25" are transmitted, with blocks "0"–"15" comprising all AC coefficients of the blocks of the luminance component. Finally, blocks "16" and "17" comprise the DC coefficients and blocks "18"–"25" the AC coefficients of the chrominance components.

Compared to a DCT, all applied integer transforms have only integer numbers ranging from −2 to 2 in the transform matrix (see Figure 10). This allows computing the transform and the inverse transform in 16-bit arithmetic using only low-complexity shift, add, and subtract operations; in the case of a Hadamard transform, only add and subtract operations are necessary. Furthermore, the exclusive use of integer operations completely avoids mismatches of the inverse transform, which could not be ruled out in former standards and caused problems.

[Figure 10. Matrices H1, H2, and H3 of the three different transforms applied in H.264/AVC:

         | 1  1  1  1 |         | 1  1  1  1 |
    H1 = | 2  1 -1 -2 |    H2 = | 1  1 -1 -1 |    H3 = | 1  1 |
         | 1 -1 -1  1 |         | 1 -1 -1  1 |         | 1 -1 |
         | 1 -2  2 -1 |         | 1 -1  1 -1 |
]
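A minimal sketch of the core 4×4 transform with the matrix H1 of Figure 10 follows. Note that in the actual standard the scaling that makes this transform orthonormal is folded into the quantization stage, which is omitted here, and the example block is invented.

    import numpy as np

    # Core 4x4 integer transform matrix H1 (see Figure 10).
    H1 = np.array([[1,  1,  1,  1],
                   [2,  1, -1, -2],
                   [1, -1, -1,  1],
                   [1, -2,  2, -1]])

    def forward_transform(block):
        """2-D integer transform of a 4x4 prediction error block:
        Y = H1 . X . H1^T, exact in integer arithmetic."""
        return H1 @ block @ H1.T

    x = np.array([[ 5, 11,  8, 10],
                  [ 9,  8,  4, 12],
                  [ 1, 10, 11,  4],
                  [19,  6, 15,  7]])
    print(forward_transform(x))    # integer coefficients, no rounding error

Since every intermediate value is an integer, encoder and decoder compute bit-identical results, which is exactly the mismatch-freedom mentioned above.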
All coefficients are quantized by a scalar quantizer. The quantization step size is chosen by a so-called quantization parameter QP, which can take one of 52 values. The step size doubles with each increment of QP by 6; accordingly, each increment of QP by 1 changes the required data rate by approximately 12.5%. The transform and quantization are explained in detail in [15].
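The two statements above can be combined into one compact relation; this is a worked restatement, not additional normative detail:

    $$ Q_{\text{step}}(QP) \;\propto\; 2^{QP/6}, \qquad \frac{Q_{\text{step}}(QP+1)}{Q_{\text{step}}(QP)} \;=\; 2^{1/6} \;\approx\; 1.12 . $$

Each increment of QP thus coarsens the quantizer by about 12%, which is what produces the change in data rate of roughly 12.5% per QP step.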
3.4 Entropy Coding Schemes

H.264/AVC specifies two alternative methods of entropy coding: a low-complexity technique based on context-adaptively switched sets of variable length codes, so-called CAVLC, and the computationally more demanding algorithm of context-based adaptive binary arithmetic coding (CABAC). Both methods represent major improvements in coding efficiency compared to the techniques of statistical coding traditionally used in prior video coding standards. In those earlier methods, specifically tailored but fixed variable length codes (VLCs) were used for each syntax element, or for sets of syntax elements whose representative probability distributions were assumed to closely match. In any case, it was implicitly assumed that the underlying statistics are stationary, which in practice is seldom the case. Especially residual data in a motion-compensated predictive coder shows a highly non-stationary statistical behavior, depending on the video content, the coding conditions, and the accuracy of the prediction model. By incorporating context modeling in their entropy coding frameworks, both methods of H.264/AVC offer a high degree of adaptation to the underlying source, albeit at different complexity-compression trade-offs.

CAVLC is the baseline entropy coding method of H.264/AVC. Its basic coding tool consists of a single VLC of structured Exp-Golomb codes, which, by means of individually customized mappings, is applied to all syntax elements except those related to quantized transform coefficients. For the latter, a more sophisticated coding scheme is applied. As shown in the example of Figure 12, a given block of transform coefficients is first mapped onto a 1-D array according to a predefined scanning pattern. Typically, after quantization a block contains only a few significant, i.e., nonzero, coefficients, where, in addition, a predominant occurrence of coefficient levels with magnitude equal to 1, so-called trailing 1's (T1), is observed at the end of the scan. Therefore, as a preamble, first the number of nonzero coefficients and the number of T1s are transmitted using a combined codeword, where one out of four VLC tables is chosen based on the number of significant levels of neighboring blocks. Then, in a second step, the sign and level value of each significant coefficient are encoded by scanning the list of coefficients in reverse order. By doing so, the VLC for coding each individual level value is adapted on the basis of the previously encoded level by choosing among six VLC tables. Finally, the zero quantized coefficients are signaled by transmitting the total number of zeros before the last nonzero level of each block and, additionally, for each significant level the corresponding run, i.e., the number of consecutive preceding zeros. By monitoring the maximum possible number of zeros at each coding stage, a suitable VLC is chosen for the coding of each run value. A total of 32 different VLCs are used in CAVLC entropy coding mode, where, however, the structure of some of these VLCs enables simple on-line calculation of any code word without recourse to the storage of code tables.

[Figure 12. Precoding a block of quantized transform coefficients. The 4×4 block is scanned into a 1-D array. CAVLC then codes a preamble (number of significant coefficients: 5, trailing 1's (T1): 3), the T1 signs (−1, 1, 1), the remaining levels (2, 1) in reverse scan order, the total number of zeros (2), and the run before each coefficient (0, 1, 1, (0)). CABAC instead codes a coded block flag (1), significant-coefficient flags (1,1,0,1,0,1,1), last-coefficient flags (0,0,0,0,1), level magnitudes (1,1,1,2,1), and level signs (−1,1,1,1,1).]
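The precoding step of Figure 12 can be reproduced with a few lines of Python. This sketch derives only the syntax elements; the subsequent choice among the VLC tables and the bit-level coding are omitted.

    def cavlc_precode(scan):
        """CAVLC syntax elements of a block given in scan order (sketch)."""
        sig = [(i, c) for i, c in enumerate(scan) if c != 0]
        total_coeffs = len(sig)

        # Trailing 1's: up to three magnitude-1 coefficients ending the scan.
        t1 = 0
        while t1 < min(3, total_coeffs) and abs(sig[-1 - t1][1]) == 1:
            t1 += 1
        t1_signs = [c for _, c in sig[total_coeffs - t1:]][::-1]

        # Remaining significant levels, coded in reverse scan order.
        levels = [c for _, c in sig[:total_coeffs - t1]][::-1]

        # Zeros before the last significant coefficient, and the run of zeros
        # immediately preceding each coefficient (reverse order; the final
        # runs are inferred once all zeros are accounted for).
        total_zeros = sum(1 for c in scan[:sig[-1][0]] if c == 0)
        runs, prev = [], None
        for i, _ in reversed(sig):
            if prev is not None:
                runs.append(prev - i - 1)
            prev = i
        return total_coeffs, t1, t1_signs, levels, total_zeros, runs

    # The example block of Figure 12, already in scan order:
    print(cavlc_precode([1, 2, 0, 1, 0, 1, -1, 0, 0]))
    # -> (5, 3, [-1, 1, 1], [2, 1], 2, [0, 1, 1, 0])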
[Figure 11. Transmission order of all coefficients of a macroblock [14]: the luminance DC block "−1" (present only for the INTRA_16×16 mode) is followed by the luminance blocks "0"–"15", the chrominance DC blocks "16" and "17", and the chrominance AC blocks "18"–"25".]
For typical coding conditions and test material, bit rate reductions of 2–7% are obtained by CAVLC relative to a conventional run-length scheme based on a single Exp-Golomb code.

For significantly improved coding efficiency, CABAC, the alternative entropy coding mode of H.264/AVC, is the method of choice. As shown in Figure 13, the CABAC design is based on three key elements: binarization, context modeling, and binary arithmetic coding. Binarization enables efficient binary arithmetic coding via a unique mapping of non-binary syntax elements to a sequence of bits, a so-called bin string. Each element of this bin string can be processed either in the regular coding mode or in the bypass mode. The latter is chosen for selected bins, such as the sign information or less significant bins, in order to speed up the whole encoding (and decoding) process by means of a simplified coding engine bypass. The regular coding mode provides the actual coding benefit, where a bin may be context modeled and subsequently arithmetic encoded. As a design decision, in general only the most probable bin of a syntax element is supplied with a context model using previously encoded bins. Moreover, all regularly encoded bins are adapted by estimating their actual probability distribution. The probability estimation and the actual binary arithmetic coding are conducted using a multiplication-free method that enables efficient implementations in hardware and software. Note that for the coding of transform coefficients, CABAC is applied to specifically designed syntax elements, as shown in the example of Figure 12. Typically, CABAC provides bit rate reductions of 5–15% compared to CAVLC. More details on CABAC can be found in [16].

[Figure 13. CABAC block diagram: a binarizer maps non-binary syntax elements to bin strings; each bin is routed either to the regular coding engine, where a context modeler supplies and updates its probability model, or to the bypass coding engine; both engines feed the coded bits into the bitstream.]
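The following toy illustrates the two CABAC ideas that matter most here, binarization and adaptive probability estimation. It is deliberately not the standard's coding engine: the real CABAC uses table-driven binarizations and a multiplication-free 64-state probability update, whereas this sketch uses plain unary binarization and explicit counters.

    def unary_binarize(value):
        """Unary binarization: n -> n '1' bins followed by one '0' bin."""
        return [1] * value + [0]

    class AdaptiveBinModel:
        """Toy per-context probability estimate (Laplace-smoothed counts)."""
        def __init__(self):
            self.counts = [1, 1]              # observed 0-bins and 1-bins

        def p_one(self):
            return self.counts[1] / sum(self.counts)

        def update(self, bin_value):          # adapt after every coded bin
            self.counts[bin_value] += 1

    model = AdaptiveBinModel()
    for value in [3, 2, 3, 3]:                # a stream of syntax element values
        for b in unary_binarize(value):
            # A real coder would now arithmetic-code b using model.p_one();
            # here we only track the adaptation of the estimate.
            model.update(b)
    print(round(model.p_one(), 2))            # estimate has moved towards the source

The point of the context models is exactly this adaptation: non-stationary bin statistics are tracked per context instead of being fixed in advance, as they were with the VLC tables of earlier standards.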
3.5 Adaptive Deblocking Filter

The block-based structure of the H.264/AVC architecture, containing 4×4 transforms and block-based motion compensation, can be the source of severe blocking artifacts. Filtering the block edges has been shown to be a powerful tool to reduce the visibility of these artifacts. Deblocking can in principle be carried out as post-filtering, influencing only the pictures to be displayed. Higher visual quality can be achieved, though, when the filtering process is carried out in the coding loop, because then all involved past reference frames used for motion compensation are the filtered versions of the reconstructed frames. Another reason to make deblocking a mandatory in-loop tool in H.264/AVC is to force decoders to deliver approximately the quality intended by the producer, rather than leaving this basic picture enhancement tool to the optional good will of the decoder manufacturer.

The filter described in the H.264/AVC standard is highly adaptive. Several parameters and thresholds, as well as the local characteristics of the picture itself, control the strength of the filtering process. All involved thresholds are quantizer dependent, because blocking artifacts become more severe when quantization gets coarse. H.264/AVC deblocking is adaptive on three levels:
■ On slice level, the global filtering strength can be adjusted to the individual characteristics of the video sequence.
■ On block edge level, the filtering strength is made dependent on the inter/intra prediction decision, motion differences, and the presence of coded residuals in the two participating blocks. From these variables, a filtering-strength parameter is calculated, which can take values from 0 to 4, selecting modes from no filtering to very strong filtering of the involved block edge.
■ On sample level, it is crucially important to be able to distinguish between true edges in the image and edges created by the quantization of the transform coefficients. True edges should be left unfiltered as much as possible. In order to separate the two cases, the sample values across every edge are analyzed. For an explanation, denote the sample values inside two neighboring 4×4 blocks as p3, p2, p1, p0 | q0, q1, q2, q3, with the actual boundary between p0 and q0, as shown in Figure 14.
Filtering of the two pixels p0 and q0 only takes place if their absolute difference falls below a certain threshold α. At the same time, the absolute pixel differences on each side of the edge (|p1 − p0| and |q1 − q0|) have to fall below another threshold β, which is considerably smaller than α. To enable filtering of p1 (q1), additionally the absolute difference between p0 and p2 (q0 and q2) has to be smaller than β. The dependency of α and β on the quantizer links the strength of filtering to the general quality of the reconstructed picture prior to filtering. For small quantizer values, both thresholds become zero and filtering is effectively turned off altogether.

All filters can be calculated without multiplications or divisions, to minimize the processor load involved in filtering; only additions and shifts are needed. If filtering is turned on for p0, the impulse response of the involved filter would in principle be (0, 1, 4 | 4, −1, 0)/8; for p1 it would be (4, 0, 2 | 2, 0, 0)/8. "In principle" means that the maximum changes allowed for p0 and p1 (q0 and q1) are clipped to relatively small, quantizer dependent values, reducing the low pass characteristic of the filter in a nonlinear manner.

Intra coding in H.264/AVC tends to use INTRA_16×16 prediction modes when coding nearly uniform image areas. This causes small-amplitude blocking artifacts at the macroblock boundaries, which are perceived as abrupt steps in these cases. To compensate the resulting tiling artifacts, very strong low pass filtering is applied on boundaries between two macroblocks with smooth image content. This special filter also involves the pixels p3 and q3. In general, deblocking results in bit rate savings of around 6–9% at medium qualities. More remarkable are the improvements in subjective picture quality. A more concise description of the H.264/AVC deblocking scheme can be found in [17].

[Figure 14. One-dimensional visualization of a block edge (samples p3, p2, p1, p0 | q0, q1, q2, q3) in a typical situation where the filter would be turned on.]
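The sample-level decisions can be stated directly in code. Below is a sketch of the threshold tests described above, with hand-picked α and β values; in the standard, both thresholds are derived from the quantizer.

    def filter_decisions(p2, p1, p0, q0, q1, q2, alpha, beta):
        """Deblocking decisions for one line of samples across an edge."""
        edge_active = (abs(p0 - q0) < alpha and
                       abs(p1 - p0) < beta and
                       abs(q1 - q0) < beta)
        filter_p1 = edge_active and abs(p2 - p0) < beta   # extend to p1?
        filter_q1 = edge_active and abs(q2 - q0) < beta   # extend to q1?
        return edge_active, filter_p1, filter_q1

    # A small step of 4 across the edge looks like a quantization artifact:
    print(filter_decisions(60, 60, 60, 64, 64, 64, alpha=10, beta=3))
    # A large step of 60 is treated as a true image edge and left alone:
    print(filter_decisions(60, 60, 60, 120, 120, 120, alpha=10, beta=3))

Because α and β shrink to zero for fine quantization, exactly the pictures that need no deblocking are also the ones where the filter switches itself off.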
3.6 Error Robustness and Network Friendliness

For efficient transmission in different environments, the seamless and easy integration of the coded video into all current and future protocol and network architectures is important. Therefore, both the VCL and the NAL are part of the H.264/AVC standard (Figure 2). The VCL specifies an efficient representation of the coded video signal. The NAL defines the interface between the video codec itself and the outside world. It operates on NAL units, which support the packet-based approach of most existing networks. In addition to the NAL concept, the VCL itself includes several features providing network friendliness and error robustness, which are essential especially for real-time services such as streaming, multicasting, and conferencing applications due to online transmission and decoding. The H.264/AVC Hypothetical Reference Decoder (HRD) [18] places constraints on encoded NAL unit streams in order to enable cost-effective decoder implementations by introducing a multiple-leaky-bucket model.

Lossy and variable bit rate (VBR) channels such as the Internet or wireless links require channel-adaptive streaming or multicasting technologies. Among others [19], channel-adaptive packet dependency control [20] and packet scheduling [21] allow reacting to these channels when transmitting pre-encoded video streams. These techniques are supported in H.264/AVC by various means: frame dropping of non-reference frames, resulting in well-known temporal scalability; the multiple reference frame concept in combination with generalized B-pictures, allowing huge flexibility in frame dependencies to be exploited for temporal scalability and rate shaping of encoded video; and the possibility of switching between different bit streams encoded at different bit rates. The latter technique is called version switching. It can be applied at Instantaneous Decoder Refresh (IDR) frames or, even more efficiently, by the use of switching pictures, which allow identical reconstruction of frames even when different reference frames are being used. Thereby, switching-predictive (SP) pictures efficiently exploit motion-compensated prediction, whereas switching-intra (SI) pictures can exactly reconstruct SP pictures. The switching between two bit streams using SI and SP pictures is illustrated in Figure 15 and Figure 16. Switching pictures can also be applied for error resilience purposes as well as other features; for details see [22].
Whereas for relaxed-delay applications such as download-and-play, streaming, and broadcast/multicast, residual errors can usually be avoided by applying powerful forward error correction and retransmission protocols, the low delay requirements of conversational applications impose additional challenges, as transmission errors due to congestion and link-layer imperfections generally cannot be avoided. Therefore, these video applications require error resilience features. The H.264/AVC standardization process acknowledged this by adopting a set of common test conditions for IP-based transmission [23]; anchor video sequences, appropriate bit rates, and evaluation criteria are specified. In the following, we briefly present the different error resilience features included in the standard; for more details we refer to [24] and [7]. The presentation is accompanied by Figure 18, showing results for a representative selection of the common Internet test conditions: 10 seconds of the QCIF sequence Foreman are encoded at a frame rate of 7.5 fps, applying only temporally backward referencing motion compensation. The resulting total bit rate, including a 40 byte IP/UDP/RTP header, matches exactly 64 kbit/s. As performance measure, the average luminance peak signal to noise ratio (PSNR) is chosen, and sufficient statistics are obtained by transmitting at least 10000 data packets for each experiment as well as by applying a simple packet loss simulator and Internet error patterns¹ as specified in [23].

Although common understanding usually assumes that increased compression efficiency decreases error resilience, the opposite is the case if applied appropriately. As higher compression allows using additional bit rate for forward error correction, the loss probability of highly compressed data can be reduced, assuming a constant overall bit rate. All other error resilience tools discussed in the following generally increase the data rate at the same quality; therefore, their application should always be considered very carefully in order not to adversely affect compression efficiency, especially if lower layer error protection is applicable. This can be seen for packet error rate 0 in Figure 18.

Slice structured coding reduces the packet loss probability and the visual degradation from packet losses, especially in combination with advanced decoder error concealment methods [25]. A slice is a sequence of macroblocks within one slice group and provides spatially distinct resynchronization points within the video data of a single frame. No intra-frame prediction takes place across slice boundaries.

[Figure 15. Switching between bit streams by using SI-frames (from [22]).]
[Figure 16. Switching between bit streams by using SP-frames (from [22]).]

¹The Internet error pattern has been captured from real-world measurements and results in packet loss rates of approximately 3%, 5%, 10%, and 20%. These error probabilities label the packet error rate in Figure 18. Note that the 5% error file is burstier than the others, resulting in somewhat unexpected results.
However, the loss of intra-frame prediction and the increased overhead associated with decreasing slice sizes adversely affect coding performance. Especially for wireless transmission, a careful selection of the packet size is necessary [7].

As a more advanced feature, Flexible Macroblock Ordering (FMO) allows the specification of macroblock allocation maps defining the mapping of macroblocks to slice groups, where a slice group itself may contain several slices. An example is shown in Figure 17. Macroblocks can thereby be transmitted out of raster scan order in a flexible and efficient way. Specific macroblock allocation maps enable the efficient application of features such as slice interleaving, dispersed macroblock allocation using checkerboard-like patterns, one or several foreground slice groups and one left-over background slice group, or sub-pictures within a picture to support, e.g., isolated regions [26] (see the map-construction sketch below). Figure 18 shows increased performance for FMO with a checkerboard pattern at increasing error rates when compared to abandoning all error resilience features.

Arbitrary slice ordering (ASO) relaxes the constraint that the address of the first macroblock within a slice must increase monotonically within the NAL unit stream for a picture; that is, the slices of a picture may be decoded in any order. This permits, for example, reducing the decoding delay in case of out-of-order delivery of NAL units.

Data Partitioning allows up to three partitions for the transmission of coded information. Rather than just providing two partitions, one for the header and motion information and one for the coded transform coefficients, H.264/AVC can generate three partitions by separating the second partition into intra and inter information. This allows assigning higher priority to the generally more important intra information. Thus, it can reduce visual artifacts resulting from packet losses, especially if prioritization or unequal error protection is provided by the network.

[Figure 17. Division of an image into several slice groups using Flexible Macroblock Ordering (FMO).]

[Figure 18. Average Y-PSNR over packet error rate (burstiness at the 5% error rate is higher than at the other error rates) for Foreman, QCIF, 7.5 fps, 64 kbit/s and different error resilience tools in H.264/AVC: no error resilience with one packet per frame, additional 20% random intra (RI) update, channel adaptive intra (CAI) update, each feature combined with an FMO checkerboard pattern with 2 packets per frame (i.e., macroblocks with odd addresses in slice group 1, with even addresses in slice group 2), and a feedback system with 2-frame delayed (about 250 ms) decoder channel information at the encoder.]
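A checkerboard macroblock allocation map of the kind used for the FMO curves in Figure 18 is easy to construct. This sketch assumes two slice groups and QCIF dimensions (11×9 macroblocks); the checkerboard is only one of several map types that H.264/AVC predefines.

    def checkerboard_map(width_mbs, height_mbs):
        """Dispersed (checkerboard) allocation: neighboring macroblocks
        alternate between slice group 0 and slice group 1."""
        return [[(x + y) % 2 for x in range(width_mbs)]
                for y in range(height_mbs)]

    # QCIF luma is 176x144 samples, i.e. 11x9 macroblocks of 16x16 each.
    for row in checkerboard_map(11, 9):
        print("".join(str(group) for group in row))

If the packet carrying one slice group of a frame is lost, every lost macroblock still has up to four correctly received neighbors, which is what makes decoder error concealment so much more effective with this pattern.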
If, despite all these techniques, packet losses and spatio-temporal error propagation cannot be avoided, quick recovery can only be achieved when image regions are encoded in Intra mode, i.e., without reference to a previously coded frame. H.264/AVC allows the encoding of single intra macroblocks for regions that cannot be predicted efficiently. This feature can also be used to limit error propagation by transmitting a number of intra coded macroblocks in anticipation of transmission errors. The selection of intra coded macroblocks can be done randomly, in certain update patterns, or, preferably, in a channel-adaptive rate-distortion optimized way [7], [27]. Figure 18 reveals that the introduction of intra coded macroblocks significantly improves the performance at increased error rates and can be combined with any of the aforementioned error resilience features. Thereby, channel-adaptive intra updates can provide better results than purely random intra updates over the entire range of error rates.

A redundant coded slice is a coded slice that is part of a redundant picture, which itself is a coded representation of a picture that is not used in the decoding process if the corresponding primary coded picture is correctly decoded. Examples of applications and coding techniques utilizing the redundant coded picture feature include video redundancy coding [28] and the protection of "key pictures" in multicast streaming [29].

In bi-directional conversational applications, it is common for the encoder to have knowledge of the NAL unit losses experienced at the decoder, usually with a small delay; this information can be conveyed from the decoder to the encoder. Although retransmissions are not feasible in a low-delay environment, this information is still useful at the encoder to limit error propagation [30]. The flexibility provided by the multiple reference frame concept in H.264/AVC allows incorporating so-called NEWPRED approaches [31], which address the problem of error propagation, in a straightforward manner. For most successful operation, the selection of reference frames and intra updates can be integrated in a rate-distortion optimized encoder control, as discussed in Section 4, taking into account not only video statistics but also all available channel information [7]. Excellent results are shown in Figure 18 for five reference frames and a feedback delay of two frames, especially for moderate to higher error rates. To improve the performance also at low error rates, a combination of channel adaptive intra updates and feedback might be considered according to [27], at the expense of increased encoding complexity.

4. Rate Constrained Encoder Control

Since the standard defines only the bitstream syntax and the possible coding tools, the coding efficiency depends on the coding strategy of the encoder, which is not part of the standard (see Figure 1). Figure 19 shows the principal rate-distortion working points for different encoder strategies. If only the minimization of the distortion is considered in the choice of the coding tools, the achieved distortion is small but the required rate is very high. Vice versa, if only the rate is considered, the achieved rate is small but the distortion is high. Usually, neither of these working points is desired. Desired is a working point at which distortion and rate are minimized jointly. This can be achieved by using Lagrangian optimization techniques, which are described, for example, in [32].

[Figure 19. Principal rate-distortion working points for different encoder strategies: minimizing only the rate, minimizing only the distortion, and minimizing the distortion under a rate constraint.]
For the encoding of video sequences using the H.264/AVC standard, Lagrangian optimization techniques for the choice of the macroblock mode and the estimation of the displacement vector are proposed in [10], [33], and [34].
The macroblock mode of each macroblock $S_k$ can be efficiently chosen out of the set of all possible modes $I_k$ by minimizing the functional

$$ D_{\mathrm{REC}}(S_k, I_k \mid QP) + \lambda_{\mathrm{Mode}} \cdot R_{\mathrm{REC}}(S_k, I_k \mid QP) \rightarrow \min . $$

Hereby, the distortion $D_{\mathrm{REC}}$ is measured by the sum of squared differences (SSD) between the original signal $s$ and the corresponding reconstructed signal $s'$ of the same macroblock:

$$ \mathrm{SSD} = \sum_{(x,y)} \bigl| s[x,y,t] - s'[x,y,t] \bigr|^2 . $$

The rate $R_{\mathrm{REC}}$ is the rate required to encode the block with the entropy coder. $QP$ is the quantization parameter used to adjust the quantization step size; it ranges from 0 to 51. The motion vectors can be efficiently estimated by minimizing the functional

$$ D_{\mathrm{DFD}}(S_i, \vec{d}) + \lambda_{\mathrm{Motion}} \cdot R_{\mathrm{Motion}}(S_i, \vec{d}) \rightarrow \min $$

with

$$ D_{\mathrm{DFD}}(S_i, \vec{d}) = \sum_{(x,y)} \bigl| s[x,y,t] - s'[x - d_x,\, y - d_y,\, t - d_t] \bigr|^2 . $$

Hereby, $R_{\mathrm{Motion}}$ is the rate required to transmit the motion information $\vec{d}$, which consists of the two displacement vector components $d_x$ and $d_y$ and the corresponding reference frame number $d_t$. The following Lagrangian parameters lead to good results, as shown in [10]:

$$ \lambda_{\mathrm{Mode}} = \lambda_{\mathrm{Motion}} = 0.85 \cdot 2^{(QP-12)/3} . $$

As already discussed, the tools for increased error resilience, in particular those to limit error propagation, do not differ significantly from those used for compression efficiency. Features like multi-frame prediction or macroblock intra coding are not exclusively error resilience tools. This means that bad decisions at the encoder can lead to poor results in coding efficiency, error resilience, or both. The selection of the coding mode for compression efficiency can be modified taking into account the influence of the random lossy channel: in this case, the encoding distortion is replaced by the expected decoder distortion. For the computation of the expected distortion, we refer to, e.g., [27] or [35]. This method has been applied to generate the channel-adaptive results in Subsection 3.6, assuming a random-lossy channel with known error probability at the encoder.
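A mode decision following the functional above fits in a few lines. The candidate distortion/rate numbers below are hypothetical; a real encoder obtains them by actually coding the macroblock in every candidate mode.

    def lagrangian_lambda(qp):
        """Lagrange multiplier recommended in [10]."""
        return 0.85 * 2 ** ((qp - 12) / 3)

    def best_mode(candidates, qp):
        """Choose the macroblock mode minimizing D + lambda * R.
        candidates -- iterable of (mode name, SSD distortion, rate in bits)."""
        lam = lagrangian_lambda(qp)
        return min(candidates, key=lambda m: m[1] + lam * m[2])

    modes = [("SKIP",        9500,   1),     # hypothetical measurements
             ("INTER_16x16", 4200,  96),
             ("INTER_8x8",   3600, 240),
             ("INTRA_4x4",   3900, 310)]
    print(best_mode(modes, qp=28))           # -> ('INTER_16x16', 4200, 96)

As QP grows, λ grows exponentially and the minimization shifts towards cheap modes such as SKIP, which is the intended rate-constraining behavior.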
5. Profiles and Levels of H.264/AVC

H.264/AVC has been developed to address a large range of applications, bit rates, resolutions, qualities, and services; in other words, H.264/AVC intends to be as generically applicable as possible. However, different applications typically have different requirements, both in terms of functionalities, e.g., error resilience, compression efficiency, and delay, and in terms of complexity (in this case, mainly decoding complexity, since encoding is not standardized). In order to maximize interoperability while limiting complexity, targeting the largest deployment of the standard, the H.264/AVC specification defines profiles and levels.

A profile is defined as a subset of the entire bit stream syntax or, in other terms, as a subset of the coding tools. In order to achieve a subset of the complete syntax, flags, parameters, and other syntax elements are included in the bit stream that signal the presence or absence of syntactic elements occurring later in the bit stream. All decoders compliant with a certain profile must support all the tools of that profile. However, within the boundaries imposed by the syntax of a given profile, there is still a large variation in the capabilities required of decoders, depending on the values taken by some syntax elements in the bit stream, such as the size of the decoded pictures. For many applications, it is currently neither practical nor economical to implement a decoder able to deal with all hypothetical uses of the syntax within a particular profile. To address this problem, a second profiling dimension was created for each profile: the levels. A level is a specified set of constraints imposed on the values of the syntax elements in the bit stream.
These constraints may be simple limits on values, or they may take the form of constraints on arithmetic combinations of values (e.g., picture width multiplied by picture height multiplied by the number of pictures decoded per second) [1]. In H.264/AVC, the same level definitions are used for all profiles. However, if a certain terminal supports more than one profile, there is no obligation that the same level be supported for the various profiles. A profile and level combination specifies the so-called conformance points, that is, points of interoperability for applications with similar functional requirements. Summing up, profiles and levels together specify restrictions on the bit streams, and thus minimum bounds on the decoding capabilities, making it possible to implement decoders with different, limited complexity, targeting different application domains. Encoders are not required to make use of any specific set of tools; they only have to produce bit streams that are compliant with the relevant profile and level combination.

To address the large range of applications considered by H.264/AVC, three profiles have been defined (see Figure 20):
■ Baseline Profile—Typically considered the simplest profile, it includes all H.264/AVC tools with the exception of the following: B-slices, weighted prediction, field (interlaced) coding, picture/macroblock adaptive switching between frame and field coding (MB-AFF), CABAC, SP/SI slices, and slice data partitioning. This profile typically targets applications with low complexity and low delay requirements.
■ Main Profile—Supports, together with the Baseline profile, a core set of tools (see Figure 20); however, compared to Baseline, Main excludes the FMO, ASO, and redundant pictures features, while including B-slices, weighted prediction, field (interlaced) coding, picture/macroblock adaptive switching between frame and field coding (MB-AFF), and CABAC. This profile typically allows the best quality, at the cost of higher complexity (essentially due to B-slices and CABAC) and delay.
■ Extended Profile—This profile is a superset of the Baseline profile, supporting all tools of the specification with the exception of CABAC. The SP/SI slices and slice data partitioning tools are included only in this profile.

[Figure 20. H.264/AVC profiles and corresponding tools: a common core (I and P slices, different block sizes, intra prediction, CAVLC, in-loop deblocking filter, multiple reference frames, 1/4-pel motion compensation) shared by all profiles; FMO, ASO, and redundant pictures in Baseline; B-slices, CABAC, weighted prediction, field coding, and MB-AFF in Main; SI/SP slices and data partitioning in Extended.]

From Figure 20, it is clear that there is a set of tools supported by all profiles, but the hierarchical capabilities of this set of profiles reduce to Extended being a superset of Baseline. This means, for example, that only certain Baseline compliant streams may be decoded by a decoder compliant with the Main profile. Although it is difficult to establish a strong relation between profiles and applications (and clearly nothing is normative in this regard), it is possible to say that conversational services will typically use the Baseline profile, entertainment services the Main profile, and streaming services the Baseline or Extended profiles for wireless or wired environments, respectively. However, a different approach may be adopted and will surely change over time as additional complexity becomes more acceptable.

In H.264/AVC, 15 levels are specified for each profile. Each level specifies upper bounds for the bit stream, or lower bounds for the decoder capabilities, e.g., in terms of picture size (from QCIF to above 4k×2k), decoder processing rate (from 1485 to 983040 macroblocks per second), size of the memory for multi-picture buffers, video bit rate (from 64 kbit/s to 240 Mbit/s), and motion vector range (from [−64, +63.75] to [−512, +511.75]). For more detailed information on the H.264/AVC profiles and levels, refer to Annex A of [1].
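As a sketch of what a level constraint means in practice, the following check uses only the two endpoint values quoted above; the level keys and the dictionary layout are assumptions for illustration, and the full constraint set lives in Annex A of [1].

    # Hypothetical excerpt of the level limit table (endpoints from the text).
    LEVEL_LIMITS = {
        "lowest":  {"mb_per_sec": 1485,   "max_kbps": 64},
        "highest": {"mb_per_sec": 983040, "max_kbps": 240000},
    }

    def conforms(level, mb_per_sec, kbps):
        """Check a stream's macroblock rate and bit rate against a level.
        (Real conformance also bounds picture size, buffer sizes, and the
        motion vector range.)"""
        lim = LEVEL_LIMITS[level]
        return mb_per_sec <= lim["mb_per_sec"] and kbps <= lim["max_kbps"]

    # QCIF at 15 fps: 99 macroblocks per frame -> 1485 macroblocks/s.
    print(conforms("lowest", mb_per_sec=99 * 15, kbps=64))   # True
    print(conforms("lowest", mb_per_sec=99 * 30, kbps=64))   # False: too fast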
6. Comparison to Previous Standards

In this section, a comparison of H.264/AVC to other video coding standards is given with respect to coding efficiency (Subsection 6.1) and hardware complexity (Subsection 6.2).

6.1 Coding Efficiency

In [10], a detailed comparison of the coding efficiency of different video coding standards is given for video streaming, video conferencing, and entertainment-quality applications. All encoders are rate-distortion optimized using rate constrained encoder control [10], [33], [34]. For video streaming and video conferencing applications, we use test video sequences in the Common Intermediate Format (CIF, 352×288 picture elements, progressive) and in the Quarter Common Intermediate Format (QCIF, 176×144 picture elements, progressive). For entertainment-quality applications, sequences in ITU-R 601 (720×576 picture elements, interlaced) and High Definition Television (HDTV, 1280×720 picture elements, progressive) formats are used. The coding efficiency is measured by the average bit rate savings at constant peak signal to noise ratio (PSNR); the required bit rates of several test sequences at different qualities are taken into account.

For video streaming applications, H.264/AVC MP (Main Profile), MPEG-4 Visual ASP (Advanced Simple Profile), H.263 HLP (High Latency Profile), and MPEG-2 Video ML@MP (Main Level at Main Profile) are considered. Figure 21 shows the PSNR of the luminance component versus the average bit rate for the single test sequence Tempete encoded at 15 Hz, and Table 1 presents the average bit rate savings for a variety of test sequences and bit rates.
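For reference, the PSNR of the luminance component used throughout these comparisons follows the usual definition for 8-bit video:

    $$ \mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}} \ \text{dB}, \qquad \mathrm{MSE} = \frac{1}{N} \sum_{(x,y)} \bigl( s[x,y] - s'[x,y] \bigr)^2 , $$

where $s$ is the original and $s'$ the decoded luminance signal. A constant-PSNR comparison therefore reports how much rate each codec needs to reach the same mean squared reconstruction error.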
Table 1 shows that H.264/AVC outperforms all other considered encoders. For example, H.264/AVC MP allows an average bit rate saving of about 63% compared to MPEG-2 Video and of about 37% compared to MPEG-4 Visual ASP.

For video conferencing applications, H.264/AVC BP (Baseline Profile), MPEG-4 Visual SP (Simple Profile), H.263 Baseline, and H.263 CHC (Conversational High Compression) are considered. Figure 22 shows the luminance PSNR versus average bit rate for the single test sequence Paris encoded at 15 Hz, and Table 2 presents the average bit rate savings for a variety of test sequences and bit rates. As for video streaming applications, H.264/AVC outperforms all other considered encoders: H.264/AVC BP allows an average bit rate saving of about 40% compared to H.263 Baseline and of about 27% compared to H.263 CHC.

For entertainment-quality applications, the average bit rate saving of H.264/AVC compared to MPEG-2 Video ML@MP and HL@MP is 45% on average [10]. A part of this gain in coding efficiency is due to the fact that H.264/AVC achieves a large degree of removal of film grain noise resulting from the motion picture production process. However, since the perception of this noisy grain texture is often considered desirable, the difference in perceived quality between H.264/AVC coded video and MPEG-2 coded video may often be less distinct than indicated by the PSNR-based comparisons, especially in high-quality, high-resolution applications such as High-Definition DVD or Digital Cinema.

In certain applications, like professional motion picture production, random access to each individual picture may be required. Motion-JPEG2000 [37], an extension of the new still image coding standard JPEG2000, provides this feature along with some useful scalability properties. When restricted to IDR frames, H.264/AVC is also capable of serving the needs of such a random access capability. Figure 23 shows the PSNR of the luminance component versus the average bit rate for the ITU-R 601 test sequence Canoe encoded in intra mode only, i.e., each field of the whole sequence is coded in intra mode. Interestingly, the measured rate-distortion performance of H.264/AVC MP is better than that of the state of the art in still image compression as exemplified by JPEG2000, at least in this particular test case. Other test cases were studied in [38] as well, leading to the general observation that, for signals up to 1280×720 pel HDTV, the pure intra coding performance of H.264/AVC MP is comparable to or better than that of Motion-JPEG2000.

6.2 Hardware Complexity

Assessing the complexity of a new video coding standard is not a straightforward task: its implementation complexity heavily depends on the characteristics of the platform (e.g., DSP processor, FPGA, ASIC) onto which it is mapped. In this section, the data transfer characteristics are chosen as generic, platform independent metrics to express implementation complexity. This approach is motivated by the data dominance of multimedia applications [39]–[44].

Both the size and complexity of the specification and the intricate interdependencies between different H.264/AVC functionalities make a complexity assessment using only the paper specification unfeasible.
[Figure 21. Luminance PSNR versus average bit rate for different coding standards, measured for the test sequence Tempete (CIF, 15 Hz) for video streaming applications (from [36]).]

Table 1. Average bit rate savings for video streaming applications (from [10]).

                     Average Bit Rate Savings Relative To:
    Coder            MPEG-4 ASP    H.263 HLP    MPEG-2
    H.264/AVC MP     37.44%        47.58%       63.57%
    MPEG-4 ASP       —             16.65%       42.95%
    H.263 HLP        —             —            30.61%

Table 2. Average bit rate savings for video conferencing applications (from [10]).

                     Average Bit Rate Savings Relative To:
    Coder            H.263 CHC     MPEG-4 SP    H.263 Baseline
    H.264/AVC BP     27.69%        29.37%       40.59%
    H.263 CHC        —             2.04%        17.63%
    MPEG-4 SP        —             —            15.69%
Hence, the presented complexity analysis has been performed on the executable C code produced by the JVT instead. As this specification is the result of a collaborative effort, the code unavoidably has varying properties with respect to optimisation and platform dependence. Still, it is our experience that when using automated profiling tools yielding detailed data transfer characteristics (such as [45]) on similar specifications (e.g., MPEG-4), meaningful relative complexity figures are obtained (this is also the conclusion of [46]). The H.264/AVC JM2.1 code is used for the reported complexity assessment experiments. Newer versions of the executable H.264/AVC specification have become available that also include updated tool definitions achieving a reduced complexity.

The test sequences used in the complexity assessment are Mother & Daughter 30 Hz QCIF, Foreman 25 Hz QCIF and CIF, and Mobile & Calendar 15 Hz CIF (with bit rates ranging from 40 kbit/s for the simple sequences to 2 Mbit/s for the complex ones). A fixed quantization parameter setting has been assumed. The next two subsections highlight the main contributions to the H.264/AVC complexity; subsequently, some general considerations are presented.

6.2.1 Complexity Analysis of Some Major H.264/AVC Encoding Tools
■ Variable block sizes: using variable block sizes affects the access frequency in a linear way, with more than 2.5% complexity increase² for each additional mode. A typical bit rate reduction between 4 and 20% is achieved (for the same quality) using this tool; however, the complexity increases linearly with the number of modes used, while the corresponding compression gain saturates.
■ Hadamard transform: the use of Hadamard coding results in an increase of the access frequency of roughly 20%, while not significantly impacting the quality vs. bit rate behavior for the test sequences considered.
■ RD-Lagrangian optimisation: this tool comes with a data transfer increase in the order of 120% and improves PSNR (up to 0.35 dB) and bit rate (up to 9% bit savings). The performance vs. cost trade-off when using RD techniques for motion estimation and coding mode decisions inherently depends on the other tools used. For instance, when applied to a basic configuration with 1 reference frame and only the 16×16 block size, the resulting complexity increase is less than 40%.
■ B-frames: the influence of B-frames on the access frequency varies from −16 to +12% depending on the test case, and decreases the bit rate by up to 10%.
■ CABAC: CABAC entails an access frequency increase of 25 to 30% compared to methods using a single reversible VLC table for all syntax elements. Using CABAC reduces the bit rate by up to 16%.
■ Displacement vector resolution: the encoder may choose to search for motion vectors only at 1/2-pel positions instead of 1/4-pel positions. This results in a decrease of access frequency and processing time of about 10%. However, the use of 1/4-pel motion vectors increases coding efficiency by up to 30%, except at very low bit rates.
■ Search range: increasing both the number of reference frames and the search size leads to a higher access frequency, up to approximately 60 times (see also Table 3), while it has a minimal impact on PSNR and bit rate performance.
[Figure 22. Luminance PSNR versus average bit rate for different coding standards, measured for the test sequence Paris (CIF, 15 Hz) for video conferencing applications (from [36]).]

²Complexity increases and compression improvements are relative to a comparable, meaningful configuration without the tool under consideration; see also [47].
Table 3. Impact of the number of reference frames and search range on the number of encoder accesses (relative to the simplest case considered for each sequence).

                     Foreman 25 Hz QCIF     Foreman 25 Hz CIF      Mobile & Calendar 15 Hz CIF
    Search range     8     16    32         8     16    32         8     16    32
    5 ref. frames    16.9  24.6  55.7       17.5  25.3  56.1       16.6  23.1  48.8
    1 ref. frame     1     2.54  8.87       1     2.53  8.90       1     2.49  8.49

■ Multiple reference frames: adopting multiple reference frames increases the access frequency according to a linear model: 25% complexity increase for each added frame. A negligible gain (less than 2%) in bit rate is observed for low and medium bit rates, but more significant savings can be achieved for high bit rate sequences (up to 14%).
■ Deblocking filter: the mandatory use of the deblocking filter has no measurable impact on the encoder complexity. However, the filter provides a significant increase in subjective picture quality.

For the encoder, the main bottleneck is the combination of multiple reference frames and large search sizes. Speed measurements on a Pentium IV platform at 1.7 GHz with Windows 2000 are consistent with the above conclusions (this platform is also used for the speed measurements for the decoder).

6.2.2 Complexity Analysis of Some Major H.264/AVC Decoding Tools
■ CABAC: the access frequency increase due to CABAC is up to 12% compared to methods using a single reversible VLC table for all syntax elements. The higher the bit rate, the higher the increase.
■ RD-Lagrangian optimization: the use of Lagrangian cost functions at the encoder causes an average complexity increase of 5% at the decoder for middle and low rates, while higher rate video is not affected (i.e., in this case, encoding choices result in a complexity increase at the decoder side).
■ B-frames: the influence of B-frames on the data transfer complexity varies from 11 to 29%, depending on the test case. The use of B-frames has an important effect on the decoding time: introducing a first B-frame requires an extra 50% cost for very low bit rate video and 20 to 35% for medium and high bit rate video. The extra time required by a second B-frame is much lower (a few percent).
■ Hadamard transform: the influence on the decoder of using the Hadamard transform at the encoder is negligible in terms of memory accesses, while it increases the decoding time by up to 5%.
■ Deblocking filter: the use of the mandatory deblocking filter increases the decoder access frequency by 6%.
■ Displacement vector resolution: in case the encoder sends only vectors pointing to 1/2-pel positions, the access frequency and decoding time decrease by about 15%.

6.2.3 Other Considerations
In relative terms, the encoder complexity increases by more than one order of magnitude between MPEG-4 Part 2 (Simple Profile) and H.264/AVC (Main Profile), and by a factor of 2 for the decoder. The H.264/AVC encoder/decoder complexity ratio is in the order of 10 for basic configurations and can grow up to two orders of magnitude for complex ones; see also [47]. Our experiments have shown that, when combining the new coding features, the relevant implementation complexity accumulates while the global compression efficiency saturates.

[Figure 23. Luminance PSNR versus average bit rate for H.264/AVC MP and Motion-JPEG2000, measured for the ITU-R 601 test sequence Canoe for pure intra coding.]
An appropriate use of the H.264/AVC tools leads to roughly the same compression performance as if all the tools were used simultaneously, but with a considerable reduction in implementation complexity (a factor of 6.5 for the encoder and up to 1.5 for the decoder). These efficient use modes are reflected in the choice of tools and parameter settings of the H.264/AVC profiles (see Section 5). More information on the complexity analyses performed in the course of H.264/AVC standardisation can be found in [48]–[50].

7. Licensing of H.264/AVC Technology

Companies and universities introducing technology into international standards usually protect their intellectual property with patents. When participants in the standards definition process proposed patented technology for inclusion in the standard, they promised to license the use of their technology on fair, reasonable, and non-discriminatory terms, the so-called RAND conditions. Essential patents describe technology that has to be implemented in a standards-compliant decoder. The use of patented technology requires a user of this technology to license it from the respective owner. Given that many patents are used in any modern video coding standard, several companies pooled their patents so that licensing H.264/AVC technology is easy for the user. At this point, there are two patent pools: one is organized by MPEG LA and the other by Via Licensing. Since the patents covered by the two pools are not precisely the same, users of H.264/AVC technology need in principle to have a license from both patent pools. Unfortunately, these pools do not guarantee that they cover the entire technology of H.264, as the participation of a patent owner in a patent pool is voluntary.

MPEG LA LLC is the organization which gathered the owners of essential patents, such as Columbia University, Electronics and Telecommunications Research Institute of Korea (ETRI), France Télécom, Fujitsu, LG Electronics, Matsushita, Mitsubishi, Microsoft, Motorola, Nokia, Philips, Robert Bosch GmbH, Samsung, Sharp, Sony, Toshiba, and Victor Company of Japan (JVC), into a patent pool. VIA Licensing Corporation, a subsidiary of Dolby Laboratories, licenses essential H.264/AVC technology from companies like Apple Computer, Dolby Laboratories, FastVDO, Fraunhofer-Gesellschaft eV, IBM, LSI Logic, Microsoft, Motorola, Polycom, and RealNetworks. Both patent pools may be licensed for the commercial use of an H.264/AVC decoder. Unfortunately, the terms of the two licenses differ.

MPEG LA terms: After the end of a grace period in December 2004, an end product manufacturer of encoders or decoders has to pay a unit fee of $0.20 per unit beyond the first 100,000 units, which are free each year. In addition to this fee for the actual soft- or hardware, certain companies are charged a participation fee starting January 2006. Providers of pay-per-view, download, or video-on-demand services pay the lower of 2% of the sales price or $0.02 for each title; this applies to all transmission media, such as cable, satellite, Internet, mobile, and over the air. Subscription services with more than 100,000 but less than 1,000,000 AVC video subscribers pay a minimum of $0.075 and a maximum of $0.25 per subscriber per year. Operators of over-the-air free broadcast services are charged $10,000 per year per transmitter.
VIA Licensing terms: After the end of a grace period in December 2004, an end product manufacturer of encoders or decoders has to pay a unit fee of $0.25 per unit. A participation or replication fee is not required if the content is provided to the users for free. For titles that are permanently sold, a fee between $0.005 (titles shorter than 30 minutes) and $0.025 (titles longer than 90 minutes) has to be paid. For titles that are sold on a temporary basis, the 'replication fee' is $0.0025. This patent pool does not require the payment of any fees as long as a company distributes fewer than 50,000 devices and derives less than $500,000 in revenue from its activities related to devices and content distribution. It appears that interactive communication services like video telephony require only a unit fee but no participation fee.
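To make the two device-fee schedules concrete, here is a deliberately simplified sketch of the per-unit royalties quoted above. It is purely illustrative and in no way licensing advice: it ignores the grace periods, participation and title fees, and any terms not quoted in this section, and the function names and example volumes are our own:

```python
def mpeg_la_unit_fee(units_per_year: int) -> float:
    """Illustrative MPEG LA device royalty: $0.20 per unit after the
    first 100,000 units, which are free each year (per the terms above)."""
    return 0.20 * max(0, units_per_year - 100_000)

def via_unit_fee(units_per_year: int, revenue: float) -> float:
    """Illustrative VIA device royalty: $0.25 per unit, waived for a
    company distributing fewer than 50,000 devices and deriving less
    than $500,000 in related revenue (per the terms above)."""
    if units_per_year < 50_000 and revenue < 500_000:
        return 0.0
    return 0.25 * units_per_year

units, revenue = 250_000, 2_000_000.0          # hypothetical manufacturer
print(f"MPEG LA: ${mpeg_la_unit_fee(units):,.2f}")    # $30,000.00
print(f"VIA:     ${via_unit_fee(units, revenue):,.2f}")  # $62,500.00
```

Under these assumptions, a manufacturer shipping 250,000 units a year would owe $30,000 in unit fees under the MPEG LA terms (150,000 payable units) and $62,500 under the VIA terms, before any participation fees apply.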
While previous standards like MPEG-2 Video also required a license fee to be paid for every encoder and decoder, the participation fees established for the use of H.264/AVC require extra effort from potential commercial users of H.264/AVC.

Disclaimer: No reliance may be placed on this section on licensing of H.264/AVC technology without written confirmation of its contents from an authorized representative.

8. Summary
This new international video coding standard has been jointly developed and approved by the MPEG group of ISO/IEC and the VCEG group of ITU-T. Compared to previous video coding standards, H.264/AVC provides improved coding efficiency and a significant improvement in flexibility for effective use over a wide range of networks. While H.264/AVC still uses the concept of block-based motion compensation, it provides some significant changes:
■ Enhanced motion compensation capability using high precision and multiple reference frames
■ Use of an integer DCT-like transform instead of the DCT
■ Enhanced adaptive entropy coding including arithmetic coding
■ Adaptive in-loop deblocking filter
The coding tools of H.264/AVC, when used in an optimized mode, allow for bit savings of about 50% compared to previous video coding standards like MPEG-4 and MPEG-2 for a wide range of bit rates and resolutions. However, these savings come at the price of an increased complexity. The decoder is about two times as complex as an MPEG-4 Visual decoder for the Simple Profile, and the encoder is about 10 times as complex as a corresponding MPEG-4 Visual encoder for the Simple Profile. The H.264/AVC Main Profile decoder suitable for entertainment applications is about four times more complex than MPEG-2. The encoder complexity depends largely on the algorithms for motion estimation as well as for the rate-constrained encoder control. Given the performance increase of VLSI circuits since the introduction of MPEG-2, an H.264/AVC codec today is less demanding to realize than an MPEG-2 codec was in 1994. At this point, commercial companies may already license some technology for implementing an H.264/AVC decoder from two licensing authorities, simplifying the process of building products on H.264/AVC technology.

9. Acknowledgments
The authors would like to thank the experts of ISO/IEC MPEG, ITU-T VCEG, and the ITU-T/ISO/IEC Joint Video Team for their contributions in developing the standard.

10. References
[1] ISO/IEC 14496-10:2003, "Coding of Audiovisual Objects—Part 10: Advanced Video Coding," 2003; also ITU-T Recommendation H.264, "Advanced video coding for generic audiovisual services."
[2] ITU-T Recommendation H.261, "Video codec for audiovisual services at p × 64 kbit/s," March 1993.
[3] ISO/IEC 11172: "Information technology—Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s," Geneva, 1993.
[4] ISO/IEC 13818-2: "Generic coding of moving pictures and associated audio information—Part 2: Video," 1994; also ITU-T Recommendation H.262.
[5] ITU-T Recommendation H.263, "Video coding for low bit rate communication," version 1, Nov. 1995; version 2, Jan. 1998; version 3, Nov. 2000.
[6] ISO/IEC 14496-2: "Information technology—Coding of audiovisual objects—Part 2: Visual," Geneva, 2000.
[7] T. Stockhammer, M.M. Hannuksela, and T. Wiegand, "H.264/AVC in wireless environments," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 657–673, July 2003.
[8] T. Wedi and H.G. Musmann, "Motion- and aliasing-compensated prediction for hybrid video coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 577–587, July 2003.
[9] T. Wiegand, X. Zhang, and B. Girod, "Long-term memory motion-compensated prediction for video coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, pp. 70–84, Feb. 1999.
[10] T. Wiegand, H. Schwarz, A. Joch, and F. Kossentini, "Rate-constrained coder control and comparison of video coding standards," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 688–703, July 2003.
[11] B. Girod, "Efficiency analysis of multihypothesis motion-compensated prediction for video coding," IEEE Transactions on Image Processing, vol. 9, Feb. 1999.
[12] M. Flierl, T. Wiegand, and B. Girod, "Rate-constrained multi-hypothesis motion-compensated prediction for video coding," in Proc. IEEE Int. Conf. Image Processing, Vancouver, BC, Canada, Sept. 2000, vol. 3, pp. 150–153.
[13] N. Ahmed, T. Natarajan, and R. Rao, "Discrete cosine transform," IEEE Transactions on Computers, vol. C-23, pp. 90–93, Jan. 1974.
[14] I.E.G. Richardson, "H.264/MPEG-4 Part 10 White Paper." Available: http://www.vcodex.fsnet.co.uk/resources.html
[15] H. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, "Low-complexity transform and quantization in H.264/AVC," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 598–603, July 2003.
[16] D. Marpe, H. Schwarz, and T. Wiegand, "Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 620–636, July 2003.
[17] P. List, A. Joch, J. Lainema, and G. Bjontegaard, "Adaptive deblocking filter," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 614–619, July 2003.
[18] J. Ribas-Corbera, P.A. Chou, and S. Regunathan, "A generalized hypothetical reference decoder for H.264/AVC," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 674–687, July 2003.
[19] B. Girod, M. Kalman, Y.J. Liang, and R. Zhang, "Advances in channel-adaptive video streaming," in Proc. ICIP 2002, Rochester, NY, Sept. 2002.
[20] Y.J. Liang and B. Girod, "Rate-distortion optimized low-latency video streaming using channel-adaptive bitstream assembly," in Proc. ICME 2002, Lausanne, Switzerland, Aug. 2002.
[21] S.H. Kang and A. Zakhor, "Packet scheduling algorithm for wireless video streaming," in Proc. International Packet Video Workshop 2002, Pittsburgh, PA, April 2002.
[22] M. Karczewicz and R. Kurçeren, "The SP and SI frames design for H.264/AVC," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 637–644, July 2003.
[23] S. Wenger, "Common conditions for wire-line, low delay IP/UDP/RTP packet loss resilient testing," VCEG-N79r1, Sept. 2001. Available: http://standard.pictel.com/ftp/video-site/0109_San/VCEG-N79r1.doc
[24] S. Wenger, "H.264/AVC over IP," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 645–656, July 2003.
[25] Y.-K. Wang, M.M. Hannuksela, V. Varsa, A. Hourunranta, and M. Gabbouj, "The error concealment feature in the H.26L test model," in Proc. ICIP, vol. 2, pp. 729–732, Sept. 2002.
[26] Y.-K. Wang, M.M. Hannuksela, and M. Gabbouj, "Error-robust inter/intra mode selection using isolated regions," in Proc. Int. Packet Video Workshop 2003, Apr. 2003.
[27] R. Zhang, S.L. Regunathan, and K. Rose, "Video coding with optimal inter/intra-mode switching for packet loss resilience," IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, pp. 966–976, July 2000.
[28] S. Wenger, "Video redundancy coding in H.263+," in Proc. 1997 International Workshop on Audio-Visual Services over Packet Networks, Sept. 1997.
[29] Y.-K. Wang, M.M. Hannuksela, and M. Gabbouj, "Error resilient video coding using unequally protected key pictures," in Proc. International Workshop VLBV03, Sept. 2003.
[30] B. Girod and N. Färber, "Feedback-based error control for mobile video transmission," Proceedings of the IEEE, vol. 87, no. 10, pp. 1707–1723, Oct. 1999.
[31] S. Fukunaga, T. Nakai, and H. Inoue, "Error resilient video coding by dynamic replacing of reference pictures," in Proc. IEEE Globecom, vol. 3, Nov. 1996.
[32] A. Ortega and K. Ramchandran, "Rate-distortion methods for image and video compression," IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 23–50, Nov. 1998.
[33] G.J. Sullivan and T. Wiegand, "Rate-distortion optimization for video compression," IEEE Signal Processing Magazine, vol. 15, pp. 74–90, Nov. 1998.
[34] T. Wiegand and B. Girod, "Lagrangian multiplier selection in hybrid video coder control," in Proc. ICIP 2001, Thessaloniki, Greece, Oct. 2001.
[35] T. Stockhammer, D. Kontopodis, and T. Wiegand, "Rate-distortion optimization for H.26L video coding in packet loss environment," in Proc. Packet Video Workshop 2002, Pittsburgh, PA, April 2002.
[36] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. Sullivan, "Performance comparison of video coding standards using Lagrangian coder control," in Proc. IEEE ICIP 2002, part II, pp. 501–504, Sept. 2002.
[37] ISO/IEC 15444-3, "Motion-JPEG2000" (JPEG2000 Part 3), Geneva, 2002.
[38] D. Marpe, V. George, H.L. Cycon, and K.U. Barthel, "Performance evaluation of Motion-JPEG2000 in comparison with H.264/AVC operated in intra coding mode," in Proc. SPIE Conf. on Wavelet Applications in Industrial Processing, Photonics East, Rhode Island, USA, Oct. 2003.
[39] F. Catthoor et al., Custom Memory Management Methodology. Kluwer Academic Publishers, 1998.
[40] J. Bormans et al., "Integrating system-level low power methodologies into a real-life design flow," in Proc. IEEE PATMOS '99, Kos, Greece, Oct. 1999, pp. 19–28.
[41] A. Chimienti, L. Fanucci, R. Locatelli, and S. Saponara, "VLSI architecture for a low-power video codec system," Microelectronics Journal, vol. 33, no. 5, pp. 417–427, 2002.
[42] T. Meng et al., "Portable video-on-demand in wireless communication," Proceedings of the IEEE, vol. 83, no. 4, pp. 659–680, 1995.
[43] J. Jung, E. Lesellier, Y. Le Maguet, C. Miro, and J. Gobert, "Philips deblocking solution (PDS), a low complexity deblocking for JVT," Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, JVT-B037, Geneva, CH, Feb. 2002.
[44] L. Nachtergaele et al., "System-level power optimization of video codecs on embedded cores: A systematic approach," Journal of VLSI Signal Processing, vol. 18, no. 2, pp. 89–111, 1998.
[45] http://www.imec.be/atomium
[46] S. Bauer et al., "The MPEG-4 multimedia coding standard: Algorithms, architectures and applications," Journal of VLSI Signal Processing, vol. 23, no. 1, pp. 7–26, Oct. 1999.
[47] S. Saponara, C. Blanch, K. Denolf, and J. Bormans, "The JVT advanced video coding standard: Complexity and performance analysis on a tool-by-tool basis," in Proc. Packet Video Workshop (PV'03), Nantes, France, April 2003.
[48] V. Lappalainen et al., "Optimization of emerging H.26L video encoder," in Proc. IEEE SIPS'01, Sept. 2001, pp. 406–415.
[49] V. Lappalainen, A. Hallapuro, and T. Hämäläinen, "Performance analysis of low bit rate H.26L video encoder," in Proc. IEEE ICASSP'01, May 2001, pp. 1129–1132.
[50] M. Horowitz, A. Joch, F. Kossentini, and A. Hallapuro, "H.264/AVC baseline profile decoder complexity analysis," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 715–727, 2003.

Jörn Ostermann studied Electrical Engineering and Communications Engineering at the University of Hannover and Imperial College London, respectively. He received the Dipl.-Ing. and Dr.-Ing. degrees from the University of Hannover in 1988 and 1994, respectively. From 1988 until 1994, he worked as a Research Assistant at the Institut für Theoretische Nachrichtentechnik, conducting research in low bit-rate and object-based analysis-synthesis video coding. In 1994 and 1995, he worked on video coding in the Visual Communications Research Department at AT&T Bell Labs. He was a member of Image Processing and Technology Research within AT&T Labs—Research from 1996 to 2003. Since 2003, he has been Full Professor and Head of the Institut für Theoretische Nachrichtentechnik und Informationsverarbeitung at the Universität Hannover, Germany.
From 1993 to 1994, he chaired the European COST 211 sim group, coordinating research in low bit-rate video coding. Within MPEG-4, he organized the evaluation of video tools to start defining the standard. He chaired the Ad hoc Group on Coding of Arbitrarily-shaped Objects in MPEG-4 Video.
Jörn was a scholar of the German National Foundation. In 1998, he received the AT&T Standards Recognition Award and the ISO award. He is a member of IEEE, the IEEE Technical Committee on Multimedia Signal Processing, past chair of the IEEE CAS Visual Signal Processing and Communications (VSPC) Technical Committee, and a Distinguished Lecturer of the IEEE CAS Society. He has published more than 50 research papers and book chapters. He is coauthor of a graduate-level textbook on video communications. He holds 10 patents.
His current research interests are video coding and streaming, 3D modeling, face animation, and computer-human interfaces.

Matthias Narroschke was born in Hannover in 1974. He received his Dipl.-Ing. degree in electrical engineering from the University of Hannover in 2001 (with highest honors). Since then, he has been working toward the Ph.D. degree at the Institut für Theoretische Nachrichtentechnik und Informationsverarbeitung of the University of Hannover. His research interests include video coding, 3D image processing and video processing, and Internet streaming. In 2003, he became a Senior Engineer. He received the Robert-Bosch-Prize for the best Dipl.-Ing. degree in electrical engineering in 2001. He is an active delegate to the Moving Picture Experts Group (MPEG).

Thomas Wedi received his Dipl.-Ing. degree in 1999 from the University of Hannover, Hannover, Germany, where he is currently working toward the Ph.D. degree with research focused on motion- and aliasing-compensated prediction for hybrid video coding. He has been with the Institut für Theoretische Nachrichtentechnik und Informationsverarbeitung, University of Hannover, as a Research Scientist and Teaching Assistant. In 2001, he became a Senior Engineer. His further research interests include video coding and transmission, 3D image and video processing, and audio-visual communications. He is an active contributor to the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC/ITU-T Joint Video Team (JVT), where the H.264/AVC video coding standard was developed. In both standardization groups, he chaired an Ad hoc group on interpolation filtering. In cooperation with Robert Bosch GmbH, he holds several international patents in the area of video compression.

Thomas Stockhammer received his Diplom-Ingenieur degree in electrical engineering from the Munich University of Technology (TUM), Germany, in 1996. Since then, he has been working toward the Dr.-Ing. degree at the Munich University of Technology, Germany, in the area of multimedia and video transmission over mobile and packet-lossy channels. In 1996, he visited Rensselaer Polytechnic Institute (RPI), Troy, NY, to write his diploma thesis in the area of combined source-channel coding for video
and coding theory. There he started his research in video transmission as well as source and channel coding. In 2000, he was a Visiting Researcher in the Information Coding Laboratory at the University of California, San Diego (UCSD). Since then, he has published numerous conference and journal papers and holds several patents. He regularly participates in and contributes to different standardization activities, e.g., ITU-T H.324, H.264, ISO/IEC MPEG, JVT, IETF, and 3GPP. He acts as a member of several technical program committees, as a reviewer for different journals, and as an evaluator for the European Commission. His research interests include joint source and channel coding, video transmission, multimedia networks, system design, rate-distortion optimization, and information theory, as well as mobile communications.

Jan Bormans, Ph.D., was a researcher at the Information Retrieval and Interpretation Sciences laboratory of the Vrije Universiteit Brussel (VUB), Belgium, in 1992 and 1993. In 1994, he joined the VLSI Systems and Design Methodologies (VSDM) division of the IMEC research center in Leuven, Belgium. Since 1996, he has headed IMEC's Multimedia Image Compression Systems group. This group focuses on the efficient design and implementation of embedded systems for advanced multimedia applications. Jan Bormans is the Belgian head of delegation for ISO/IEC's MPEG and SC29 standardization committees. He is also the MPEG-21 requirements editor and chairman of the MPEG liaison group.

Fernando Pereira was born in Vermelha, Portugal, in October 1962. He graduated in Electrical and Computers Engineering from Instituto Superior Técnico (IST), Universidade Técnica de Lisboa, Portugal, in 1985. He received the M.Sc. and Ph.D. degrees in Electrical and Computers Engineering from IST in 1988 and 1991, respectively.
He is currently Professor at the Electrical and Computers Engineering Department of IST. He is responsible for the participation of IST in many national and international research projects. He is a member of the Editorial Board and Area Editor on Image/Video Compression of the Signal Processing: Image Communication journal and an Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Image Processing, and IEEE Transactions on Multimedia. He is a member of the Scientific and Program Committees of dozens of international conferences and workshops. He has contributed more than 130 papers to journals and international conferences. He won the 1990 Portuguese IBM Award and, in October 1998, an ISO Award for Outstanding Technical Contribution for his participation in the development of the MPEG-4 Visual standard.
He has been participating in the work of ISO/MPEG for many years, notably as the head of the Portuguese delegation, chairman of the MPEG Requirements group, and chair of many Ad hoc Groups related to the MPEG-4 and MPEG-7 standards.
His current areas of interest are video analysis, processing, coding and description, and multimedia interactive services.

Peter List graduated in Applied Physics in 1985 and received the Ph.D. in 1989 from the University of Frankfurt/Main, Germany. Currently, he is a project manager at T-Systems Nova, the R&D company of Deutsche Telekom. Since 1990, he has been with Deutsche Telekom and has actively followed international standardization of video compression technologies in MPEG, ITU, and several European projects for about 14 years.
Detlev Marpe received the Diploma degree in mathematics with highest honors from the Technical University Berlin, Germany. He is currently a Project Manager in the Image Processing Department of the Fraunhofer-Institute for Telecommunications, Heinrich-Hertz-Institute (HHI), Berlin, Germany, where he is responsible for research projects in the area of video coding, image processing, and video streaming. Since 1997, he has been an active contributor to the ITU-T VCEG, ISO/IEC JPEG, and ISO/IEC MPEG standardization activities for still image and video coding. During 2001–2003, he chaired the CABAC Ad hoc Group within the H.264/MPEG-4 AVC standardization effort of the ITU-T/ISO/IEC Joint Video Team. He has authored or co-authored more than 30 journal and conference papers in the fields of image and video coding, image processing, and information theory, and he has written more than 40 technical contributions to various international standardization projects. He also holds several international patents. He is a member of IEEE and ITG (German Society of Information Technology). As a co-founder of daViKo GmbH, a Berlin-based start-up company involved in the development of server-less multipoint videoconferencing products for Intranet or Internet collaboration, he received the Prime Prize of the 2001 Multimedia Start-up Competition founded by the German Federal Ministry of Economics and Technology.