Video compression design, analysis, consulting and research




White Paper:
4x4 Transform and Quantization in H.264/AVC




© Iain Richardson / VCodex Limited

Version 1.1a Revised April 2009
Vcodex White Paper: 4x4 Transform and Quantization in H.264/AVC
www.vcodex.com


H.264 Transform and Quantization


1   Overview

In an H.264/AVC codec, macroblock data are transformed and quantized prior to
coding and rescaled and inverse transformed prior to reconstruction and display
(Figure 1). Several transforms are specified in the H.264 standard: a 4x4 “core”
transform, 4x4 and 2x2 Hadamard transforms and an 8x8 transform (High profiles
only).




Figure 1 Transform and quantization in an H.264 codec




This paper describes a derivation of the forward and inverse transform and
quantization processes applied to 4x4 blocks of luma and chroma samples in an H.264
codec. The transform is a scaled approximation to a 4x4 Discrete Cosine Transform
that can be computed using simple integer arithmetic. A normalisation step is
incorporated into forward and inverse quantization operations.

2   The H.264 transform and quantization process

The inverse transform and re-scaling processes, shown in Figure 2, are defined in the
H.264/AVC standard. Input data (quantized transform coefficients) are re-scaled (a
combination of inverse quantization and normalisation, see later). The re-scaled values
are transformed using a “core” inverse transform. In certain cases (the luma DC
coefficients of Intra 16x16 macroblocks and the chroma DC coefficients), an inverse
Hadamard transform is applied to the DC coefficients prior to re-scaling. These processes (or their
equivalents) must be implemented in every H.264-compliant decoder. The
corresponding forward transform and quantization processes are not standardized but
suitable processes can be derived from the inverse transform / rescaling processes
(Figure 3).




© Iain Richardson/Vcodex Ltd 2009                                             Page 2 of 13




    Figure 2 Re-scaling and inverse transform




    Figure 3 Forward transform and quantization




    3   Developing the forward transform and quantization process

    The basic 4x4 transform used in H.264 is a scaled approximate Discrete Cosine
    Transform (DCT). The transform and quantization processes are structured such that
    computational complexity is minimized. This is achieved by reorganising the processes
    into a core part and a scaling part.

Consider a block of pixel data that is processed by a two-dimensional Discrete Cosine
Transform (DCT) followed by quantization (dividing by a quantization step size, Qstep,
then rounding the result) (Figure 4a).

    Rearrange the DCT process into a core transform (Cf) and a scaling matrix (Sf) (Figure
    4b).

Scale the quantization process by a constant ($2^{15}$) and compensate by dividing and
rounding the final result (Figure 4c).

    Combine Sf and the quantization process into Mf (Figure 4d), where:

$$M_f \approx \frac{S_f \cdot 2^{15}}{Q_{step}} \qquad \text{(Equation 1)}$$

(The reason for the use of ≈ will be explained later.)
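As a quick numerical check of Equation 1, take position (0,0) at QP 0, using values derived later in this paper ($S_f(0,0) = 1/4$ from Section 5 and $Q_{step} = 0.625$ from Section 7):

$$M_f(0,0) \approx \frac{\frac{1}{4} \cdot 2^{15}}{0.625} = \frac{8192}{0.625} = 13107.2$$

This rounds to 13107, the value of m(0,0) listed for QP 0 in Table 2.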








    Figure 4 Development of the forward transform and quantization process




    4      Developing the rescaling and inverse transform process

Consider a re-scaling (or “inverse quantization”) operation followed by a
two-dimensional inverse DCT (IDCT) (Figure 5a).

    Rearrange the IDCT process into a core transform (Ci) and a scaling matrix (Si) (Figure
    5b).

Scale the re-scaling process by a constant ($2^{6}$) and compensate by dividing and
rounding the final result (Figure 5c)¹.

Combine the re-scaling process and Si into Vi (Figure 5d), where:

$$V_i = S_i \cdot 2^{6} \cdot Q_{step} \qquad \text{(Equation 2)}$$
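A numerical check of Equation 2 at position (0,0) for QP 0, using $S_i(0,0) = 1/4$ (derived in Section 6) and $Q_{step} = 0.625$ (Section 7):

$$V_i(0,0) = \frac{1}{4} \cdot 2^{6} \cdot 0.625 = 10$$

This matches the entry v(0,0) = 10 in Table 1.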

¹ This rounding operation need not be to the nearest integer.






    Figure 5 Development of the rescaling and inverse transform process




    5   Developing Cf and Sf (4x4 blocks)

    Consider a 4x4 two-dimensional DCT of a block X:

$$Y = A \cdot X \cdot A^{T} \qquad \text{(Equation 3)}$$

    Where ⋅ indicates matrix multiplication and:

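$$A = \begin{bmatrix} a & a & a & a \\ b & c & -c & -b \\ a & -a & -a & a \\ c & -b & b & -c \end{bmatrix}, \qquad a = \frac{1}{2}, \quad b = \sqrt{\frac{1}{2}} \cos\frac{\pi}{8} = 0.6532\ldots, \quad c = \sqrt{\frac{1}{2}} \cos\frac{3\pi}{8} = 0.2706\ldots$$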

The rows of A are orthogonal and have unit norms (i.e. the rows are orthonormal).
Calculation of Equation 3 on a practical processor requires approximation of the
irrational numbers b and c. A fixed-point approximation is equivalent to scaling each
row of A and rounding to the nearest integer. Choosing a particular approximation
(multiply by 2.5 and round) gives Cf:







$$C_f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

This approximation is chosen to minimise the complexity of implementing the
transform (multiplication by Cf requires only additions and binary shifts) whilst
maintaining good compression performance.
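The derivation of Cf and the claim that it needs only additions and shifts can be checked with a short Python sketch. It is an illustration only, not code from the standard or from any reference encoder; the function names `core_forward_1d` and `matvec` are hypothetical.

```python
import math

# Entries of the orthonormal 4x4 DCT matrix A (Equation 3).
a = 0.5
b = math.sqrt(0.5) * math.cos(math.pi / 8)      # 0.6532...
c = math.sqrt(0.5) * math.cos(3 * math.pi / 8)  # 0.2706...

A = [[a, a, a, a],
     [b, c, -c, -b],
     [a, -a, -a, a],
     [c, -b, b, -c]]

# "Multiply by 2.5 and round" applied to every entry of A gives Cf.
Cf = [[round(2.5 * x) for x in row] for row in A]


def core_forward_1d(x0, x1, x2, x3):
    """Return Cf times the column vector (x0, x1, x2, x3) using only
    additions, subtractions and left shifts (<< 1 doubles a value)."""
    s03, d03 = x0 + x3, x0 - x3
    s12, d12 = x1 + x2, x1 - x2
    return (s03 + s12,          # row ( 1,  1,  1,  1)
            (d03 << 1) + d12,   # row ( 2,  1, -1, -2)
            s03 - s12,          # row ( 1, -1, -1,  1)
            d03 - (d12 << 1))   # row ( 1, -2,  2, -1)


def matvec(M, v):
    """Reference matrix-vector product, for comparison only."""
    return tuple(sum(m * x for m, x in zip(row, v)) for row in M)


print(Cf)  # [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
print(core_forward_1d(5, -3, 7, 2), matvec(Cf, (5, -3, 7, 2)))  # (11, -4, 3, 23) (11, -4, 3, 23)
```

Applying `core_forward_1d` to each column of a 4x4 block and then to each row of the result yields Cf·X·CfT.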

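The rows of Cf have different norms. To restore the orthonormal property of the
original matrix A, multiply all the values $c_{rj}$ in row r by $1 / \sqrt{\sum_j c_{rj}^2}$:

$$A_1 = C_f \bullet R_f \quad \text{where} \quad R_f = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} \\ \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} & \frac{1}{\sqrt{10}} \end{bmatrix}$$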


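• denotes element-by-element multiplication (Hadamard-Schur product²). Note that
the new matrix A1 is orthonormal: the rows of Cf are mutually orthogonal, and scaling
each row to unit norm preserves that orthogonality.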
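A minimal Python check (illustrative only; the helper `gram` is hypothetical) that scaling each row of Cf by $1/\sqrt{\sum_j c_{rj}^2}$ produces the row factors of Rf and an orthonormal A1:

```python
import math

Cf = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

# Row factors 1/sqrt(sum_j c_rj^2): 1/2 for rows 0 and 2, 1/sqrt(10) for rows 1 and 3.
row_scale = [1 / math.sqrt(sum(x * x for x in row)) for row in Cf]

# Rf repeats each row's factor across that row; A1 = Cf • Rf (element by element).
Rf = [[s] * 4 for s in row_scale]
A1 = [[cv * rv for cv, rv in zip(c_row, r_row)] for c_row, r_row in zip(Cf, Rf)]


def gram(M):
    """Return M times its transpose; for a square M this is the identity
    exactly when the rows of M are orthonormal."""
    return [[sum(x * y for x, y in zip(ri, rj)) for rj in M] for ri in M]


G = gram(A1)
is_orthonormal = all(abs(G[i][j] - (1.0 if i == j else 0.0)) < 1e-12
                     for i in range(4) for j in range(4))
print(row_scale, is_orthonormal)
```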

    The two-dimensional transform (Equation 3) becomes:

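$$Y = A_1 \cdot X \cdot A_1^{T} = [C_f \bullet R_f] \cdot X \cdot [C_f^{T} \bullet R_f^{T}]$$

Rearranging:

$$Y = [C_f \cdot X \cdot C_f^{T}] \bullet [R_f \bullet R_f^{T}] = [C_f \cdot X \cdot C_f^{T}] \bullet S_f$$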

    Where

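$$S_f = R_f \bullet R_f^{T} = \begin{bmatrix} \frac{1}{4} & \frac{1}{2\sqrt{10}} & \frac{1}{4} & \frac{1}{2\sqrt{10}} \\ \frac{1}{2\sqrt{10}} & \frac{1}{10} & \frac{1}{2\sqrt{10}} & \frac{1}{10} \\ \frac{1}{4} & \frac{1}{2\sqrt{10}} & \frac{1}{4} & \frac{1}{2\sqrt{10}} \\ \frac{1}{2\sqrt{10}} & \frac{1}{10} & \frac{1}{2\sqrt{10}} & \frac{1}{10} \end{bmatrix}$$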
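The rearrangement into an integer core transform followed by element-by-element scaling can be confirmed numerically. In this sketch the 4x4 input block `X` is an arbitrary example, and the helper functions are hypothetical:

```python
import math

Cf = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]
r = [0.5, 1 / math.sqrt(10), 0.5, 1 / math.sqrt(10)]        # row factors of Rf
A1 = [[Cf[i][j] * r[i] for j in range(4)] for i in range(4)]  # A1 = Cf • Rf
Sf = [[r[i] * r[j] for j in range(4)] for i in range(4)]      # Sf = Rf • Rf^T


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transpose(P):
    return [list(col) for col in zip(*P)]


X = [[58, 64, 51, 58],
     [52, 64, 56, 66],
     [62, 63, 61, 64],
     [59, 51, 63, 69]]

direct = matmul(matmul(A1, X), transpose(A1))   # A1·X·A1^T
core = matmul(matmul(Cf, X), transpose(Cf))     # Cf·X·Cf^T, integers only
factored = [[core[i][j] * Sf[i][j] for j in range(4)] for i in range(4)]

max_diff = max(abs(direct[i][j] - factored[i][j]) for i in range(4) for j in range(4))
print(core[0][0], max_diff)
```

The core step touches only integers; all of the irrational factors have moved into Sf, which the paper later folds into quantization.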


² P = Q•R means that each element $p_{ij} = q_{ij} \cdot r_{ij}$.



    6   Developing Ci and Si (4x4 blocks)

    Consider a 4x4 two-dimensional IDCT of a block Y:

$$Z = A^{T} \cdot Y \cdot A \qquad \text{(Equation 4)}$$

    Where

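$$A = \begin{bmatrix} a & a & a & a \\ b & c & -c & -b \\ a & -a & -a & a \\ c & -b & b & -c \end{bmatrix}, \qquad a = \frac{1}{2}, \quad b = \sqrt{\frac{1}{2}} \cos\frac{\pi}{8} = 0.6532\ldots, \quad c = \sqrt{\frac{1}{2}} \cos\frac{3\pi}{8} = 0.2706\ldots$$

as before.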

Choose a particular approximation by scaling each row of A and rounding to the
nearest 0.5, giving Ci:

$$C_i = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & \frac{1}{2} & -\frac{1}{2} & -1 \\ 1 & -1 & -1 & 1 \\ \frac{1}{2} & -1 & 1 & -\frac{1}{2} \end{bmatrix}$$

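The rows of Ci are orthogonal but have non-unit norms. To restore orthonormality,
multiply all the values $c_{rj}$ in row r by $1 / \sqrt{\sum_j c_{rj}^2}$:

$$A_2 = C_i \bullet R_i \quad \text{where} \quad R_i = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ \sqrt{\frac{2}{5}} & \sqrt{\frac{2}{5}} & \sqrt{\frac{2}{5}} & \sqrt{\frac{2}{5}} \\ \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ \sqrt{\frac{2}{5}} & \sqrt{\frac{2}{5}} & \sqrt{\frac{2}{5}} & \sqrt{\frac{2}{5}} \end{bmatrix}$$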


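The two-dimensional inverse transform (Equation 4) becomes:

$$Z = A_2^{T} \cdot Y \cdot A_2 = [C_i^{T} \bullet R_i^{T}] \cdot Y \cdot [C_i \bullet R_i]$$

Rearranging:

$$Z = [C_i^{T}] \cdot [Y \bullet R_i^{T} \bullet R_i] \cdot [C_i] = [C_i^{T}] \cdot [Y \bullet S_i] \cdot [C_i]$$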





    Where

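$$S_i = R_i^{T} \bullet R_i = \begin{bmatrix} \frac{1}{4} & \frac{1}{\sqrt{10}} & \frac{1}{4} & \frac{1}{\sqrt{10}} \\ \frac{1}{\sqrt{10}} & \frac{2}{5} & \frac{1}{\sqrt{10}} & \frac{2}{5} \\ \frac{1}{4} & \frac{1}{\sqrt{10}} & \frac{1}{4} & \frac{1}{\sqrt{10}} \\ \frac{1}{\sqrt{10}} & \frac{2}{5} & \frac{1}{\sqrt{10}} & \frac{2}{5} \end{bmatrix}$$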
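Rows 1 and 3 of Ci are exactly half of the corresponding rows of Cf, so after normalisation A2 is the same matrix as A1. Without quantization, the forward process of Section 5 followed by the inverse process above should therefore reproduce the input block, up to floating-point rounding. A minimal Python sketch of that round trip; the sample block and helper names are illustrative assumptions:

```python
import math

Cf = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]
Ci = [[1, 1, 1, 1],
      [1, 0.5, -0.5, -1],
      [1, -1, -1, 1],
      [0.5, -1, 1, -0.5]]

rf = [0.5, 1 / math.sqrt(10), 0.5, 1 / math.sqrt(10)]   # row factors of Rf
ri = [0.5, math.sqrt(2 / 5), 0.5, math.sqrt(2 / 5)]     # row factors of Ri
Sf = [[rf[i] * rf[j] for j in range(4)] for i in range(4)]
Si = [[ri[i] * ri[j] for j in range(4)] for i in range(4)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transpose(P):
    return [list(col) for col in zip(*P)]


def hadamard(P, Q):
    """Element-by-element product P • Q."""
    return [[P[i][j] * Q[i][j] for j in range(4)] for i in range(4)]


X = [[58, 64, 51, 58],
     [52, 64, 56, 66],
     [62, 63, 61, 64],
     [59, 51, 63, 69]]

Y = hadamard(matmul(matmul(Cf, X), transpose(Cf)), Sf)   # forward: [Cf·X·Cf^T] • Sf
Z = matmul(matmul(transpose(Ci), hadamard(Y, Si)), Ci)   # inverse: Ci^T·[Y • Si]·Ci

max_err = max(abs(Z[i][j] - X[i][j]) for i in range(4) for j in range(4))
print(max_err)
```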

The core inverse transform Ci and the rescaling matrix Vi are defined in the H.264
standard. Hence we now develop Vi and will then derive Mf.

    7    Developing Vi

From Equation 2, $V_i = S_i \cdot Q_{step} \cdot 2^{6}$.

H.264 supports a range of quantization step sizes Qstep. The precise step sizes are not
defined in the standard; rather, the scaling matrix Vi is specified. The Qstep values
corresponding to the entries in Vi are shown in the following table.

    QP              Qstep
    0               0.625
    1               0.702..
    2               0.787..
    3               0.884..
    4               0.992..
    5               1.114..
    6               1.250
    …               …
    12              2.5
    …               …
    18              5.0
    …               …
    48              160
    …               …
    51              224

The ratio between successive Qstep values is chosen to be $\sqrt[6]{2} = 1.12246\ldots$, so that Qstep
doubles in size when QP increases by 6. Any value of Qstep can be derived from the first
6 values in the table (QP 0 to QP 5) as follows:

$$Q_{step}(QP) = Q_{step}(QP \,\%\, 6) \cdot 2^{\lfloor QP/6 \rfloor}$$
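The step-size rule can be sketched in a few lines of Python. The function name `qstep` is hypothetical, and the sketch assumes the six base values are exactly $0.625 \cdot 2^{k/6}$ for k = 0 to 5, which reproduces the rounded values in the table for QP 0 to 48. With this exact ratio, QP 51 gives about 226.3; the table's entry of 224 equals $0.875 \cdot 2^{8}$, i.e. it corresponds to a base value of 0.875 for QP 3 rather than 0.884..

```python
def qstep(qp):
    """Nominal quantizer step size for 0 <= qp <= 51.

    Assumption: Qstep(QP % 6) = 0.625 * 2**((QP % 6) / 6) for the six base values.
    """
    base = 0.625 * 2 ** ((qp % 6) / 6)   # Qstep(QP % 6)
    return base * 2 ** (qp // 6)          # doubles for every increase of 6 in QP


for qp in (0, 1, 5, 6, 12, 18, 48):
    print(qp, round(qstep(qp), 3))
```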


    The values in the matrix Vi depend on Qstep (hence QP) and on the scaling factor
    matrix Si . These are shown for QP 0 to 5 in the following Table.






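QP    $Q_{step} \cdot 2^{6}$    $V_i = \mathrm{round}(S_i \cdot Q_{step} \cdot 2^{6})$
0     40          $\begin{bmatrix} 10 & 13 & 10 & 13 \\ 13 & 16 & 13 & 16 \\ 10 & 13 & 10 & 13 \\ 13 & 16 & 13 & 16 \end{bmatrix}$
1     44.898      $\begin{bmatrix} 11 & 14 & 11 & 14 \\ 14 & 18 & 14 & 18 \\ 11 & 14 & 11 & 14 \\ 14 & 18 & 14 & 18 \end{bmatrix}$
2     50.397      $\begin{bmatrix} 13 & 16 & 13 & 16 \\ 16 & 20 & 16 & 20 \\ 13 & 16 & 13 & 16 \\ 16 & 20 & 16 & 20 \end{bmatrix}$
3     56.569      $\begin{bmatrix} 14 & 18 & 14 & 18 \\ 18 & 23 & 18 & 23 \\ 14 & 18 & 14 & 18 \\ 18 & 23 & 18 & 23 \end{bmatrix}$
4     63.496      $\begin{bmatrix} 16 & 20 & 16 & 20 \\ 20 & 25 & 20 & 25 \\ 16 & 20 & 16 & 20 \\ 20 & 25 & 20 & 25 \end{bmatrix}$
5     71.272      $\begin{bmatrix} 18 & 23 & 18 & 23 \\ 23 & 29 & 23 & 29 \\ 18 & 23 & 18 & 23 \\ 23 & 29 & 23 & 29 \end{bmatrix}$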



For higher values of QP, the values in Vi double each time QP increases by 6
(i.e. Vi(QP=6) = 2Vi(QP=0), etc.).




Note that there are only three unique values in each matrix Vi. These three values are
defined as a table of values v in the H.264 standard, for QP=0 to QP=5:







    Table 1 Matrix v defined in H.264 standard
    QP    v (r, 0):                   v (r, 1):                 v (r, 2):
          Vi positions (0,0),         Vi positions (1,1),       Remaining
          (0,2), (2,0), (2,2)         (1,3), (3,1), (3,3)       Vi positions
    0     10                          16                        13
    1     11                          18                        14
    2     13                          20                        16
    3     14                          23                        18
    4     16                          25                        20
    5     18                          29                        23
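The entries of Table 1 follow from Vi = round(Si · Qstep · 2^6) using the three distinct values of Si (1/4, 2/5 and $1/\sqrt{10}$). A short Python sketch, assuming the exact step sizes $Q_{step} = 0.625 \cdot 2^{QP/6}$ for QP 0 to 5 and rounding halves upwards (names are illustrative):

```python
import math

# Distinct entries of Si, ordered to match the columns n = 0, 1, 2 of table v.
SI_VALUES = (1 / 4, 2 / 5, 1 / math.sqrt(10))

# Qstep for QP 0..5, assuming the exact ratio 2**(1/6) between successive values.
QSTEP_BASE = [0.625 * 2 ** (qp / 6) for qp in range(6)]

# v(QP, n) = round(Si * Qstep * 2^6), with halves rounded upwards.
v = [[math.floor(s * q * 64 + 0.5) for s in SI_VALUES] for q in QSTEP_BASE]

for qp, row in enumerate(v):
    print(qp, row)
```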


    Hence for QP values from 0 to 5, Vi is obtained as:

$$V_i = \begin{bmatrix} v(QP,0) & v(QP,2) & v(QP,0) & v(QP,2) \\ v(QP,2) & v(QP,1) & v(QP,2) & v(QP,1) \\ v(QP,0) & v(QP,2) & v(QP,0) & v(QP,2) \\ v(QP,2) & v(QP,1) & v(QP,2) & v(QP,1) \end{bmatrix}$$


Denote this as:

$$V_i = v(QP, n)$$

Where v(r, n) is row r, column n of v.

For larger values of QP (QP>5), index the row of array v by QP%6 and then multiply
by $2^{\lfloor QP/6 \rfloor}$. In general:

$$V_i = v(QP \,\%\, 6, n) \cdot 2^{\lfloor QP/6 \rfloor}$$


    The complete inverse transform and scaling process (for 4x4 blocks in macroblocks
    excluding 16x16-Intra mode) becomes:

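$$Z = \mathrm{round}\left( [C_i^{T}] \cdot \left[ Y \bullet v(QP \,\%\, 6, n) \cdot 2^{\lfloor QP/6 \rfloor} \right] \cdot [C_i] \cdot \frac{1}{2^{6}} \right)$$

(Note: rounded division by $2^{6}$ can be carried out by adding an offset and right-shifting
by 6 bit positions.)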
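The complete inverse process can be evaluated exactly in integer arithmetic by doubling Ci. The Python sketch below illustrates the formula above under that assumption; it is not the bit-exact decoding procedure of the standard, which may differ in its intermediate rounding (compare footnote 1). All names are hypothetical.

```python
# Table v (Table 1): one row per QP % 6, columns n = 0, 1, 2.
V = [[10, 16, 13], [11, 18, 14], [13, 20, 16],
     [14, 23, 18], [16, 25, 20], [18, 29, 23]]

# 2 * Ci: doubling Ci keeps the core inverse transform in integer arithmetic.
CI2 = [[2, 2, 2, 2],
       [2, 1, -1, -2],
       [2, -2, -2, 2],
       [1, -2, 2, -1]]


def v_column(i, j):
    """Column n of table v that applies at block position (i, j)."""
    if i % 2 == 0 and j % 2 == 0:
        return 0                    # (0,0), (0,2), (2,0), (2,2)
    if i % 2 == 1 and j % 2 == 1:
        return 1                    # (1,1), (1,3), (3,1), (3,3)
    return 2                        # remaining positions


def rescale_matrix(qp):
    """Vi = v(QP % 6, n) * 2^floor(QP/6), as a 4x4 list of integers."""
    return [[V[qp % 6][v_column(i, j)] << (qp // 6) for j in range(4)]
            for i in range(4)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(P):
    return [list(col) for col in zip(*P)]


def inverse_4x4(levels, qp):
    """Evaluate Z = round(Ci^T [Y • Vi] Ci / 2^6) exactly in integers.

    Since Ci = CI2 / 2, Ci^T W Ci = CI2^T W CI2 / 4, so the whole expression
    becomes a division by 2^8, done as (x + 128) >> 8 (halves round upwards).
    """
    vi = rescale_matrix(qp)
    w = [[levels[i][j] * vi[i][j] for j in range(4)] for i in range(4)]
    t = matmul(matmul(transpose(CI2), w), CI2)
    return [[(x + 128) >> 8 for x in row] for row in t]


# Usage: a single DC level of 8 at QP 12 (Qstep = 2.5) reconstructs a flat block.
dc_only = [[8, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(inverse_4x4(dc_only, 12))  # [[5, 5, 5, 5], [5, 5, 5, 5], [5, 5, 5, 5], [5, 5, 5, 5]]
```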

    8    Deriving Mf

    Combining Equation 1 and Equation 2:







$$M_f \approx \frac{S_i \bullet S_f \cdot 2^{21}}{V_i}$$
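Written out step by step, the combination substitutes Qstep from Equation 2 into Equation 1 (products and quotients of matrices are taken element by element):

$$Q_{step} = \frac{V_i}{S_i \cdot 2^{6}} \quad\Longrightarrow\quad M_f \approx \frac{S_f \cdot 2^{15}}{Q_{step}} = \frac{S_f \bullet S_i \cdot 2^{15} \cdot 2^{6}}{V_i} = \frac{S_i \bullet S_f \cdot 2^{21}}{V_i}$$

For example, at position (0,0) and QP 0 this gives $\frac{1}{16} \cdot 2^{21} / 10 = 13107.2$.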

Si and Sf are known, and Vi is defined as described in the previous section. Define Mf as:

$$M_f = \mathrm{round}\left( \frac{S_i \bullet S_f \cdot 2^{21}}{V_i} \right)$$

where the division is carried out element by element.


$$S_i \bullet S_f \cdot 2^{21} = \begin{bmatrix} 131072 & 104857.6 & 131072 & 104857.6 \\ 104857.6 & 83886.1 & 104857.6 & 83886.1 \\ 131072 & 104857.6 & 131072 & 104857.6 \\ 104857.6 & 83886.1 & 104857.6 & 83886.1 \end{bmatrix}$$

The entries in matrix Mf may be calculated as follows (Table 2):
        Table 2 Tables v and m
        QP v (r, 0):       v (r, 1):             v (r, 2):       m (r, 0):       m (r, 1):       m (r,2):
            Vi positions   Vi positions          Remaining       Mf positions    Mf positions    Remaining
            (0,0), (0,2),  (1,1), (1,3),         Vi positions    (0,0), (0,2),   (1,1), (1,3),   Mf positions
            (2,0), (2,2)   (3,1), (3,3)                          (2,0), (2,2)    (3,1), (3,3)
        0      10                  16            13                 13107            5243           8066
        1      11                  18            14                 11916            4660           7490
        2      13                  20            16                 10082            4194           6554
        3      14                  23            18                  9362            3647           5825
        4      16                  25            20                  8192            3355           5243
        5      18                  29            23                  7282            2893           4559
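The m columns of Table 2 can be regenerated from the three distinct entries of Si • Sf · 2^21 and the v columns. A minimal sketch (names are illustrative; halves are rounded upwards, although no entry here falls on a half):

```python
import math

# Distinct entries of Si • Sf for columns n = 0, 1, 2:
# (1/4)(1/4) = 1/16,  (2/5)(1/10) = 1/25,  (1/sqrt(10))(1/(2 sqrt(10))) = 1/20.
SISF = (1 / 16, 1 / 25, 1 / 20)

# Table v (Table 1), one row per QP 0..5.
V = [[10, 16, 13], [11, 18, 14], [13, 20, 16],
     [14, 23, 18], [16, 25, 20], [18, 29, 23]]

# m(QP, n) = round(Si • Sf * 2^21 / v(QP, n)).
m = [[math.floor(SISF[n] * 2 ** 21 / V[qp][n] + 0.5) for n in range(3)]
     for qp in range(6)]

for qp, row in enumerate(m):
    print(qp, row)
```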

Hence for QP values from 0 to 5, Mf can be obtained from m, the last three columns
of Table 2:

$$M_f = \begin{bmatrix} m(QP,0) & m(QP,2) & m(QP,0) & m(QP,2) \\ m(QP,2) & m(QP,1) & m(QP,2) & m(QP,1) \\ m(QP,0) & m(QP,2) & m(QP,0) & m(QP,2) \\ m(QP,2) & m(QP,1) & m(QP,2) & m(QP,1) \end{bmatrix}$$

Denote this as:

$$M_f = m(QP, n)$$

Where m(r, n) is row r, column n of m.

For larger values of QP (QP>5), index the row of array m by QP%6 and then divide by
$2^{\lfloor QP/6 \rfloor}$. In general:

$$M_f = \frac{m(QP \,\%\, 6, n)}{2^{\lfloor QP/6 \rfloor}}$$







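The complete forward transform, scaling and quantization process (for 4x4 blocks and
for modes excluding 16x16-Intra) becomes:

$$Y = \mathrm{round}\left( \left( [C_f] \cdot [X] \cdot [C_f^{T}] \bullet \frac{m(QP \,\%\, 6, n)}{2^{\lfloor QP/6 \rfloor}} \right) \cdot \frac{1}{2^{15}} \right)$$

(Note: rounded division by $2^{15}$ may be carried out by adding an offset and right-
shifting by 15 bit positions. Because $1/2^{\lfloor QP/6 \rfloor}$ is also a power of two, the two
divisions can be merged into a single right shift by $15 + \lfloor QP/6 \rfloor$ bit positions.)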
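A minimal Python sketch of the complete forward process, applying the merged right shift by 15 + floor(QP/6). Names are hypothetical. Rounding is to the nearest integer with halves away from zero, which is an assumption: practical encoders often use a smaller rounding offset (a dead zone), whereas this sketch follows the nearest-integer rounding written in the formula.

```python
# Table m (Table 2): one row per QP % 6, columns n = 0, 1, 2.
M = [[13107, 5243, 8066], [11916, 4660, 7490], [10082, 4194, 6554],
     [9362, 3647, 5825], [8192, 3355, 5243], [7282, 2893, 4559]]

CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def m_column(i, j):
    """Column n of table m that applies at block position (i, j)."""
    if i % 2 == 0 and j % 2 == 0:
        return 0                    # (0,0), (0,2), (2,0), (2,2)
    if i % 2 == 1 and j % 2 == 1:
        return 1                    # (1,1), (1,3), (3,1), (3,3)
    return 2                        # remaining positions


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(P):
    return [list(col) for col in zip(*P)]


def forward_4x4(x, qp):
    """Quantized coefficients for a 4x4 block x of integer samples:
    round((Cf·X·Cf^T • m(QP % 6, n)) / 2^(15 + floor(QP/6)))."""
    w = matmul(matmul(CF, x), transpose(CF))   # integer core transform
    qbits = 15 + qp // 6
    offset = 1 << (qbits - 1)                  # half of 2^qbits
    out = []
    for i in range(4):
        row = []
        for j in range(4):
            level = (abs(w[i][j]) * M[qp % 6][m_column(i, j)] + offset) >> qbits
            row.append(level if w[i][j] >= 0 else -level)
        out.append(row)
    return out


flat = [[10] * 4 for _ in range(4)]
print(forward_4x4(flat, 0)[0][0], forward_4x4(flat, 12)[0][0])  # 64 16
```

As a sanity check, the orthonormal DCT of a flat block of 10s has a DC coefficient of 40, and 40 / 0.625 = 64 (QP 0) while 40 / 2.5 = 16 (QP 12).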




9   Further reading

ITU-T Recommendation H.264, Advanced Video Coding for Generic Audio-Visual
Services, November 2007.

H. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, Low-complexity transform and
quantization in H.264/AVC, IEEE Transactions on Circuits and Systems for Video
Technology, vol. 13, pp. 598–603, July 2003.

T. Wiegand, G.J. Sullivan, G. Bjontegaard, A. Luthra, Overview of the H.264/AVC video
coding standard, IEEE Transactions on Circuits and Systems for Video Technology, vol.
13, no. 7, pp. 560–576, July 2003.

I. Richardson, The H.264 Advanced Video Compression Standard, to be published in
late 2009.

See http://www.vcodex.com/links.html for links to further resources on H.264 and
video compression.


Acknowledgement

I would like to thank Gary Sullivan for suggesting a treatment of the H.264 transform
and quantization processes along these lines and for his helpful comments on earlier
drafts of this document.

About the author

As a researcher, consultant and author working in the field of video compression
(video coding), my books on video codec design and the MPEG-4 and H.264 standards
are widely read by engineers, academics and managers. I advise companies on video
coding standards, design and intellectual property and lead the Centre for Video
Communications Research at The Robert Gordon University in Aberdeen, UK and the
Fully Configurable Video Coding research initiative.






Using the material in this document

This document is copyright – you may not reproduce the material without permission.
Please contact me to ask for permission. Please cite the document as follows:

Iain Richardson, 4x4 Transform and Quantization in H.264/AVC, VCodex Ltd White
Paper, April 2009, http://www.vcodex.com/




© Iain Richardson/Vcodex Ltd 2009                                       Page 13 of 13

More Related Content

PDF
Chapter4 - The Continuous-Time Fourier Transform
PDF
Unit iii
PPTX
Isomorphic graph
PPT
Fft ppt
PPT
Digital signal processing part2
PPTX
Defuzzification
PPTX
Butterworth filter
PDF
Metaheuristic Algorithms: A Critical Analysis
Chapter4 - The Continuous-Time Fourier Transform
Unit iii
Isomorphic graph
Fft ppt
Digital signal processing part2
Defuzzification
Butterworth filter
Metaheuristic Algorithms: A Critical Analysis

What's hot (20)

PPTX
Properties of dft
PDF
Bellman ford algorithm
DOCX
Steps for design of butterworth and chebyshev filter
PDF
Differential evolution optimization technique
PPTX
PUSH DOWN AUTOMATA VS TURING MACHINE
PDF
DSP_2018_FOEHU - Lec 06 - FIR Filter Design
PPTX
Fourier transform convergence
PPTX
Vending Machine Controller using VHDL
PPTX
Neural Networks and Deep Learning Basics
PDF
Fourier Series
PPTX
Fourier Transform
PPTX
Properties of fourier transform
PPTX
Associative memory network
PPTX
Discrete Fourier Transform
PPTX
Overlap Add, Overlap Save(digital signal processing)
PDF
DSP_2018_FOEHU - Lec 08 - The Discrete Fourier Transform
PDF
Poset in Relations(Discrete Mathematics)
PPTX
Genetic Algorithm
PPTX
FIR and IIR system
Properties of dft
Bellman ford algorithm
Steps for design of butterworth and chebyshev filter
Differential evolution optimization technique
PUSH DOWN AUTOMATA VS TURING MACHINE
DSP_2018_FOEHU - Lec 06 - FIR Filter Design
Fourier transform convergence
Vending Machine Controller using VHDL
Neural Networks and Deep Learning Basics
Fourier Series
Fourier Transform
Properties of fourier transform
Associative memory network
Discrete Fourier Transform
Overlap Add, Overlap Save(digital signal processing)
DSP_2018_FOEHU - Lec 08 - The Discrete Fourier Transform
Poset in Relations(Discrete Mathematics)
Genetic Algorithm
FIR and IIR system
Ad

Similar to The H.264 Integer Transform (20)

PDF
H0545156
PDF
2.[9 17]comparative analysis between dct & dwt techniques of image compression
PDF
2.[9 17]comparative analysis between dct & dwt techniques of image compression
PDF
Tall-and-skinny QR factorizations in MapReduce architectures
PDF
Adaptive Signal and Image Processing
PPT
Image compression- JPEG Compression & its Modes
PPT
M4L1.ppt
PDF
Direct tall-and-skinny QR factorizations in MapReduce architectures
PPT
jpg image processing nagham salim_as.ppt
PDF
Fast dct algorithm using winograd’s method
PDF
Fx3111501156
PPTX
Video Compression Basics by sahil jain
PDF
Ml01
PDF
Linear algebra review
DOCX
Adequacy of solutions
PPT
A novel steganographic method for jpeg images
PPT
M4L12.ppt
PDF
On Fixed Point error analysis of FFT algorithm
PDF
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...
PDF
quantization
H0545156
2.[9 17]comparative analysis between dct & dwt techniques of image compression
2.[9 17]comparative analysis between dct & dwt techniques of image compression
Tall-and-skinny QR factorizations in MapReduce architectures
Adaptive Signal and Image Processing
Image compression- JPEG Compression & its Modes
M4L1.ppt
Direct tall-and-skinny QR factorizations in MapReduce architectures
jpg image processing nagham salim_as.ppt
Fast dct algorithm using winograd’s method
Fx3111501156
Video Compression Basics by sahil jain
Ml01
Linear algebra review
Adequacy of solutions
A novel steganographic method for jpeg images
M4L12.ppt
On Fixed Point error analysis of FFT algorithm
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...
quantization
Ad

More from Iain Richardson (7)

PDF
Compressed Video Quality
PPTX
A short history of video coding
PPT
Iain Richardson: An Introduction to Video Compression
PPT
Configurable Video Coding
PPT
Getting the most out of H.264
PPT
Book Launch: The H.264 Advanced Video Compression Standard
PPT
Introduction to H.264 Advanced Video Compression
Compressed Video Quality
A short history of video coding
Iain Richardson: An Introduction to Video Compression
Configurable Video Coding
Getting the most out of H.264
Book Launch: The H.264 Advanced Video Compression Standard
Introduction to H.264 Advanced Video Compression

Recently uploaded (20)

PPTX
MYSQL Presentation for SQL database connectivity
PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PPTX
sap open course for s4hana steps from ECC to s4
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Network Security Unit 5.pdf for BCA BBA.
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Empathic Computing: Creating Shared Understanding
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Spectral efficient network and resource selection model in 5G networks
MYSQL Presentation for SQL database connectivity
Programs and apps: productivity, graphics, security and other tools
Agricultural_Statistics_at_a_Glance_2022_0.pdf
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Advanced methodologies resolving dimensionality complications for autism neur...
sap open course for s4hana steps from ECC to s4
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
NewMind AI Weekly Chronicles - August'25 Week I
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Understanding_Digital_Forensics_Presentation.pptx
Encapsulation_ Review paper, used for researhc scholars
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Network Security Unit 5.pdf for BCA BBA.
20250228 LYD VKU AI Blended-Learning.pptx
Empathic Computing: Creating Shared Understanding
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
Diabetes mellitus diagnosis method based random forest with bat algorithm
Per capita expenditure prediction using model stacking based on satellite ima...
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Spectral efficient network and resource selection model in 5G networks

The H.264 Integer Transform

  • 1. Video compression design, analysis, consulting and research White Paper: 4x4 Transform and Quantization in H.264/AVC © Iain Richardson / VCodex Limited Version 1.1a Revised April 2009
  • 2. Vcodex White Paper: 4x4 Transform and Quantization in H.264/AVC www.vcodex.com H.264 Transform and Quantization 1 Overview In an H.264/AVC codec, macroblock data are transformed and quantized prior to coding and rescaled and inverse transformed prior to reconstruction and display (Figure 1). Several transforms are specified in the H.264 standard: a 4x4 “core” transform, 4x4 and 2x2 Hadamard transforms and an 8x8 transform (High profiles only). Figure 1 Transform and quantization in an H.264 codec This paper describes a derivation of the forward and inverse transform and quantization processes applied to 4x4 blocks of luma and chroma samples in an H.264 codec. The transform is a scaled approximation to a 4x4 Discrete Cosine Transform that can be computed using simple integer arithmetic. A normalisation step is incorporated into forward and inverse quantization operations. 2 The H.264 transform and quantization process The inverse transform and re-scaling processes, shown in Figure 2, are defined in the H.264/AVC standard. Input data (quantized transform coefficients) are re-scaled (a combination of inverse quantization and normalisation, see later). The re-scaled values are transformed using a “core” inverse transform. In certain cases, an inverse transform is applied to the DC coefficients prior to re-scaling. These processes (or their equivalents) must be implemented in every H.264-compliant decoder. The corresponding forward transform and quantization processes are not standardized but suitable processes can be derived from the inverse transform / rescaling processes (Figure 3). © Iain Richardson/Vcodex Ltd 2009 Page 2 of 13
  • 3. Vcodex White Paper: 4x4 Transform and Quantization in H.264/AVC www.vcodex.com Figure 2 Re-scaling and inverse transform Figure 3 Forward transform and quantization 3 Developing the forward transform and quantization process The basic 4x4 transform used in H.264 is a scaled approximate Discrete Cosine Transform (DCT). The transform and quantization processes are structured such that computational complexity is minimized. This is achieved by reorganising the processes into a core part and a scaling part. Consider a block of pixel data that is processed by a two-dimensional Discrete Cosine Transform (DCT) followed by quantization (dividing by a quantization step size, Qstep , then rounding the result) (Figure 4a). Rearrange the DCT process into a core transform (Cf) and a scaling matrix (Sf) (Figure 4b). Scale the quantization process by a constant (215) and compensate by dividing and rounding the final result (Figure 4c). Combine Sf and the quantization process into Mf (Figure 4d), where: Sf # 215 Mf " Qstep Equation 1 ! (The reason for the use of ≈ will be explained later). © Iain Richardson/Vcodex Ltd 2009 Page 3 of 13
Figure 4 Development of the forward transform and quantization process

4   Developing the rescaling and inverse transform process

Consider a re-scaling (or “inverse quantization”) operation followed by a two-dimensional inverse DCT (IDCT) (Figure 5a). Rearrange the IDCT into a core transform (Ci) and a scaling matrix (Si) (Figure 5b). Scale the re-scaling process by a constant ($2^{6}$), and compensate by dividing the final result by $2^{6}$ and rounding (Figure 5c)¹. Combine the re-scaling process and Si into Vi (Figure 5d), where:

$$V_i = S_i \cdot 2^{6} \cdot Q_{step} \qquad \text{(Equation 2)}$$

¹ This rounding operation need not be to the nearest integer.
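As a numeric illustration of Equation 2, substitute values that are established later in this paper. Qstep = 0.625 at QP 0 comes from Section 7. The entries 1/4, 1/√10 and 2/5 of Si come from Section 6. This is a worked example only:

$$\begin{aligned}
V_i(0,0) &= \tfrac{1}{4} \cdot 2^{6} \cdot 0.625 = 10 \\
V_i(0,1) &= \tfrac{1}{\sqrt{10}} \cdot 2^{6} \cdot 0.625 = 12.649\ldots \approx 13 \\
V_i(1,1) &= \tfrac{2}{5} \cdot 2^{6} \cdot 0.625 = 16
\end{aligned}$$

These agree with the QP 0 row of the table of values v given in Section 7 (10, 16, 13). The (0,1) entry must be rounded, so the integer Vi only approximates $S_i \cdot 2^{6} \cdot Q_{step}$.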
Figure 5 Development of the rescaling and inverse transform process

5   Developing Cf and Sf (4x4 blocks)

Consider a 4x4 two-dimensional DCT of a block X:

$$Y = A \cdot X \cdot A^{T} \qquad \text{(Equation 3)}$$

where ⋅ indicates matrix multiplication and:

$$A = \begin{bmatrix} a & a & a & a \\ b & c & -c & -b \\ a & -a & -a & a \\ c & -b & b & -c \end{bmatrix}, \qquad a = \frac{1}{2}, \quad b = \sqrt{\tfrac{1}{2}}\cos\frac{\pi}{8} = 0.6532\ldots, \quad c = \sqrt{\tfrac{1}{2}}\cos\frac{3\pi}{8} = 0.2706\ldots$$

The rows of A are orthogonal and have unit norms (i.e. the rows are orthonormal). Calculating Equation 3 on a practical processor requires approximating the irrational numbers b and c. A fixed-point approximation is equivalent to scaling each row of A and rounding to the nearest integer. Choosing a particular approximation (multiply by 2.5 and round) gives Cf:
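$$C_f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

This approximation is chosen to minimise the complexity of implementing the transform, while maintaining good compression performance. Multiplication by Cf requires only additions and binary shifts.

The rows of Cf have different norms. To restore the orthonormal property of the original matrix A, multiply all the values $c_{rj}$ in row r by $1 / \sqrt{\sum_j c_{rj}^{2}}$:

$$A_1 = C_f \bullet R_f \quad \text{where} \quad R_f = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{\sqrt{10}} & \tfrac{1}{\sqrt{10}} & \tfrac{1}{\sqrt{10}} & \tfrac{1}{\sqrt{10}} \\ \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{\sqrt{10}} & \tfrac{1}{\sqrt{10}} & \tfrac{1}{\sqrt{10}} & \tfrac{1}{\sqrt{10}} \end{bmatrix}$$

• denotes element-by-element multiplication (Hadamard-Schur product²). Note that the new matrix A1 is orthonormal. The two-dimensional transform (Equation 3) becomes:

$$Y = A_1 \cdot X \cdot A_1^{T} = [C_f \bullet R_f] \cdot X \cdot [C_f^{T} \bullet R_f^{T}]$$

Rearranging:

$$Y = [C_f \cdot X \cdot C_f^{T}] \bullet [R_f \bullet R_f^{T}] = [C_f \cdot X \cdot C_f^{T}] \bullet S_f$$

where

$$S_f = R_f \bullet R_f^{T} = \begin{bmatrix} \tfrac{1}{4} & \tfrac{1}{2\sqrt{10}} & \tfrac{1}{4} & \tfrac{1}{2\sqrt{10}} \\ \tfrac{1}{2\sqrt{10}} & \tfrac{1}{10} & \tfrac{1}{2\sqrt{10}} & \tfrac{1}{10} \\ \tfrac{1}{4} & \tfrac{1}{2\sqrt{10}} & \tfrac{1}{4} & \tfrac{1}{2\sqrt{10}} \\ \tfrac{1}{2\sqrt{10}} & \tfrac{1}{10} & \tfrac{1}{2\sqrt{10}} & \tfrac{1}{10} \end{bmatrix}$$

² P = Q•R means that each element $p_{ij} = q_{ij} \cdot r_{ij}$.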
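The derivation in Section 5 can be checked numerically. The plain-Python sketch below uses no external libraries, and the helper names matmul, transpose, hadamard, dct_scaled and core_then_scale are hypothetical. It does the following:

- builds A from Equation 3;
- obtains Cf by multiplying by 2.5 and rounding;
- forms Rf and Sf from the row norms of Cf;
- compares A1⋅X⋅A1T with [Cf⋅X⋅CfT] • Sf for an arbitrary example block X (an assumption, not data from the paper).

```python
import math


def matmul(p, q):
    """Ordinary matrix product of two lists of lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]


def transpose(p):
    return [list(row) for row in zip(*p)]


def hadamard(p, q):
    """Element-by-element product, written '•' in the text."""
    return [[pv * qv for pv, qv in zip(pr, qr)] for pr, qr in zip(p, q)]


# Orthonormal 4x4 DCT matrix A (Equation 3)
a = 0.5
b = math.sqrt(0.5) * math.cos(math.pi / 8)
c = math.sqrt(0.5) * math.cos(3 * math.pi / 8)
A = [[a, a, a, a], [b, c, -c, -b], [a, -a, -a, a], [c, -b, b, -c]]

# Core transform: multiply each entry by 2.5 and round to the nearest integer
Cf = [[round(2.5 * x) for x in row] for row in A]

# Rf: every entry in row r is 1 / (Euclidean norm of row r of Cf)
Rf = [[1 / math.sqrt(sum(x * x for x in row))] * 4 for row in Cf]
A1 = hadamard(Cf, Rf)               # orthonormal scaled approximation of A
Sf = hadamard(Rf, transpose(Rf))    # element (i, j) = r_i * r_j


def dct_scaled(X):
    """Y = A1 . X . A1^T, computed directly with the normalised matrix."""
    return matmul(matmul(A1, X), transpose(A1))


def core_then_scale(X):
    """Y = [Cf . X . Cf^T] • Sf: integer core transform, then element-wise scaling."""
    return hadamard(matmul(matmul(Cf, X), transpose(Cf)), Sf)


if __name__ == "__main__":
    X = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
    print("Cf =", Cf)
    diff = max(abs(u - w) for ru, rw in zip(dct_scaled(X), core_then_scale(X))
               for u, w in zip(ru, rw))
    print("largest difference between the two forms:", diff)
```

Only the core product Cf⋅X⋅CfT needs to be computed in integer arithmetic. The irrational factors are confined to Sf, which the paper then folds into the quantizer.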
6   Developing Ci and Si (4x4 blocks)

Consider a 4x4 two-dimensional IDCT of a block Y:

$$Z = A^{T} \cdot Y \cdot A \qquad \text{(Equation 4)}$$

where A, a, b and c are as defined for Equation 3.

Choose a particular approximation by scaling each row of A and rounding to the nearest 0.5, giving Ci:

$$C_i = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & \tfrac{1}{2} & -\tfrac{1}{2} & -1 \\ 1 & -1 & -1 & 1 \\ \tfrac{1}{2} & -1 & 1 & -\tfrac{1}{2} \end{bmatrix}$$

The rows of Ci are orthogonal but have non-unit norms. To restore orthonormality, multiply all the values $c_{rj}$ in row r by $1 / \sqrt{\sum_j c_{rj}^{2}}$:

$$A_2 = C_i \bullet R_i \quad \text{where} \quad R_i = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} \\ \sqrt{\tfrac{2}{5}} & \sqrt{\tfrac{2}{5}} & \sqrt{\tfrac{2}{5}} & \sqrt{\tfrac{2}{5}} \\ \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} \\ \sqrt{\tfrac{2}{5}} & \sqrt{\tfrac{2}{5}} & \sqrt{\tfrac{2}{5}} & \sqrt{\tfrac{2}{5}} \end{bmatrix}$$

The two-dimensional inverse transform (Equation 4) becomes:

$$Z = A_2^{T} \cdot Y \cdot A_2 = [C_i^{T} \bullet R_i^{T}] \cdot Y \cdot [C_i \bullet R_i]$$

Rearranging:

$$Z = [C_i^{T}] \cdot [Y \bullet R_i^{T} \bullet R_i] \cdot [C_i] = [C_i^{T}] \cdot [Y \bullet S_i] \cdot [C_i]$$
where

$$S_i = R_i^{T} \bullet R_i = \begin{bmatrix} \tfrac{1}{4} & \tfrac{1}{\sqrt{10}} & \tfrac{1}{4} & \tfrac{1}{\sqrt{10}} \\ \tfrac{1}{\sqrt{10}} & \tfrac{2}{5} & \tfrac{1}{\sqrt{10}} & \tfrac{2}{5} \\ \tfrac{1}{4} & \tfrac{1}{\sqrt{10}} & \tfrac{1}{4} & \tfrac{1}{\sqrt{10}} \\ \tfrac{1}{\sqrt{10}} & \tfrac{2}{5} & \tfrac{1}{\sqrt{10}} & \tfrac{2}{5} \end{bmatrix}$$

The core inverse transform Ci and the rescaling matrix Vi are defined in the H.264 standard. We therefore develop Vi first and then derive Mf from it.

7   Developing Vi

From Equation 2:

$$V_i = S_i \cdot Q_{step} \cdot 2^{6}$$

H.264 supports a range of quantization step sizes Qstep. The standard does not define the precise step sizes; it specifies the scaling matrix Vi instead. The following table shows the Qstep values corresponding to the entries in Vi.

QP      Qstep
0       0.625
1       ≈0.702
2       ≈0.787
3       ≈0.884
4       ≈0.992
5       ≈1.114
6       1.250
...     ...
12      2.5
...     ...
18      5.0
...     ...
48      160
...     ...
51      224

The ratio between successive Qstep values is chosen to be $\sqrt[6]{2} = 1.12246\ldots$, so that Qstep doubles in size when QP increases by 6. Any value of Qstep can be derived from the first six values in the table (QP 0 to QP 5) as follows:

$$Q_{step}(QP) = Q_{step}(QP \,\%\, 6) \cdot 2^{\lfloor QP/6 \rfloor}$$

The values in the matrix Vi depend on Qstep (and hence on QP) and on the scaling factor matrix Si. These are shown for QP 0 to 5 in the following table.
QP   Qstep · 2^6   Vi = round(Si · Qstep · 2^6)
                   rows 0 and 2     rows 1 and 3
0    40            10 13 10 13      13 16 13 16
1    44.898        11 14 11 14      14 18 14 18
2    50.397        13 16 13 16      16 20 16 20
3    56.569        14 18 14 18      18 23 18 23
4    63.496        16 20 16 20      20 25 20 25
5    71.272        18 23 18 23      23 29 23 29

For higher values of QP, the corresponding values in Vi double each time QP increases by 6 (i.e. Vi(QP=6) = 2Vi(QP=0), etc.). Note that there are only three unique values in each matrix Vi. The H.264 standard defines these three values, for QP=0 to QP=5, as a table of values v:
Table 1 Matrix v defined in H.264 standard

QP   v(r,0)   v(r,1)   v(r,2)
0    10       16       13
1    11       18       14
2    13       20       16
3    14       23       18
4    16       25       20
5    18       29       23

Each column of v applies to a fixed set of Vi positions:

- v(r,0) applies to Vi positions (0,0), (0,2), (2,0) and (2,2).
- v(r,1) applies to Vi positions (1,1), (1,3), (3,1) and (3,3).
- v(r,2) applies to the remaining Vi positions.

Hence for QP values from 0 to 5, Vi is obtained as:

$$V_i = \begin{bmatrix} v(QP,0) & v(QP,2) & v(QP,0) & v(QP,2) \\ v(QP,2) & v(QP,1) & v(QP,2) & v(QP,1) \\ v(QP,0) & v(QP,2) & v(QP,0) & v(QP,2) \\ v(QP,2) & v(QP,1) & v(QP,2) & v(QP,1) \end{bmatrix}$$

Denote this as:

$$V_i = v(QP, n)$$

where v(r,n) is row r, column n of v. For larger values of QP (QP > 5), index the row of array v by QP%6 and then multiply by $2^{\lfloor QP/6 \rfloor}$. In general:

$$V_i = v(QP \,\%\, 6, n) \cdot 2^{\lfloor QP/6 \rfloor}$$

The complete inverse transform and scaling process (for 4x4 blocks in macroblocks excluding 16x16-Intra mode) becomes:

$$Z = \operatorname{round}\left( [C_i^{T}] \cdot \left[ Y \bullet v(QP \,\%\, 6, n) \cdot 2^{\lfloor QP/6 \rfloor} \right] \cdot [C_i] \cdot \frac{1}{2^{6}} \right)$$

(Note: rounded division by $2^{6}$ can be carried out by adding an offset and right-shifting by 6 bit positions.)

8   Deriving Mf

Combining Equation 1 and Equation 2:
$$M_f \approx \frac{S_i \cdot S_f \cdot 2^{21}}{V_i}$$

This follows by substituting $Q_{step} = V_i / (S_i \cdot 2^{6})$ from Equation 2 into Equation 1. All products and quotients here are element-by-element. Si and Sf are known, and Vi is defined as described in the previous section. Define Mf as:

$$M_f = \operatorname{round}\left( \frac{S_i \cdot S_f \cdot 2^{21}}{V_i} \right)$$

where

$$S_i \cdot S_f \cdot 2^{21} = \begin{bmatrix} 131072 & 104857.6 & 131072 & 104857.6 \\ 104857.6 & 83886.1 & 104857.6 & 83886.1 \\ 131072 & 104857.6 & 131072 & 104857.6 \\ 104857.6 & 83886.1 & 104857.6 & 83886.1 \end{bmatrix}$$

The entries in matrix Mf may be calculated as follows (Table 2):

Table 2 Tables v and m

QP   v(r,0)   v(r,1)   v(r,2)   m(r,0)   m(r,1)   m(r,2)
0    10       16       13       13107    5243     8066
1    11       18       14       11916    4660     7490
2    13       20       16       10082    4194     6554
3    14       23       18       9362     3647     5825
4    16       25       20       8192     3355     5243
5    18       29       23       7282     2893     4559

In both v and m, each column applies to a fixed set of matrix positions (in Vi and Mf respectively):

- Column 0 applies to positions (0,0), (0,2), (2,0) and (2,2).
- Column 1 applies to positions (1,1), (1,3), (3,1) and (3,3).
- Column 2 applies to the remaining positions.

Hence for QP values from 0 to 5, Mf can be obtained from m, the last three columns of Table 2:

$$M_f = \begin{bmatrix} m(QP,0) & m(QP,2) & m(QP,0) & m(QP,2) \\ m(QP,2) & m(QP,1) & m(QP,2) & m(QP,1) \\ m(QP,0) & m(QP,2) & m(QP,0) & m(QP,2) \\ m(QP,2) & m(QP,1) & m(QP,2) & m(QP,1) \end{bmatrix}$$

Denote this as:

$$M_f = m(QP, n)$$

where m(r,n) is row r, column n of m. For larger values of QP (QP > 5), index the row of array m by QP%6 and then divide by $2^{\lfloor QP/6 \rfloor}$. In general:

$$M_f = m(QP \,\%\, 6, n) \, / \, 2^{\lfloor QP/6 \rfloor}$$
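Sections 7 and 8 can be cross-checked numerically. The Python sketch below rebuilds tables v and m from Qstep, Si and Sf. It then applies the complete forward process (Cf, m, right shift by 15 + floor(QP/6)) and the complete inverse process (v, Ci, right shift) to a 4x4 block. The following choices are assumptions, not taken from the paper:

- Function names are hypothetical.
- Rounding is "half up", whereas the paper only says "round".
- The forward quantizer works on magnitudes and restores the sign.
- The 16x16-Intra and chroma DC transforms are not modelled.
- The inverse core transform uses 2·Ci so that its half-valued entries stay integer. The division by 2^6 therefore becomes a division by 2^8.

It is a sketch of this paper's formulas, not a conformant H.264 implementation.

```python
import math

CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
CI2 = [[2, 2, 2, 2], [2, 1, -1, -2], [2, -2, -2, 2], [1, -2, 2, -1]]  # 2 * Ci

SI = [1 / 4, 2 / 5, 1 / math.sqrt(10)]          # Si entries for columns n = 0, 1, 2
SF = [1 / 4, 1 / 10, 1 / (2 * math.sqrt(10))]   # Sf entries for columns n = 0, 1, 2


def qstep(qp):
    """Qstep(QP) = Qstep(QP % 6) * 2^floor(QP/6), with Qstep(0) = 0.625 and ratio 2^(1/6)."""
    return 0.625 * 2 ** ((qp % 6) / 6) * 2 ** (qp // 6)


def round_half_up(x):
    return math.floor(x + 0.5)


# Table 1: v(r, n) = round(Si * Qstep * 2^6); Table 2: m(r, n) = round(Si * Sf * 2^21 / v)
V_TABLE = [[round_half_up(s * qstep(k) * 2 ** 6) for s in SI] for k in range(6)]
M_TABLE = [[round_half_up(si * sf * 2 ** 21 / v) for si, sf, v in zip(SI, SF, V_TABLE[k])]
           for k in range(6)]


def column(i, j):
    """Column n of v and m that applies to matrix position (i, j)."""
    if i % 2 == 0 and j % 2 == 0:
        return 0
    if i % 2 == 1 and j % 2 == 1:
        return 1
    return 2


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transpose(p):
    return [list(r) for r in zip(*p)]


def forward_quantize(x, qp):
    """Y = round([Cf X Cf^T] • m(QP%6, n) / 2^(15 + floor(QP/6))), integers only."""
    w = matmul(matmul(CF, x), transpose(CF))
    shift = 15 + qp // 6
    y = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            mag = (abs(w[i][j]) * M_TABLE[qp % 6][column(i, j)] + (1 << (shift - 1))) >> shift
            y[i][j] = mag if w[i][j] >= 0 else -mag
    return y


def rescale_inverse(y, qp):
    """Z = round(Ci^T [Y • v(QP%6, n) * 2^floor(QP/6)] Ci / 2^6).
    With CI2 = 2*Ci, Ci^T W Ci = CI2^T W CI2 / 4, so the divisor becomes 2^8."""
    scale = 1 << (qp // 6)
    w = [[y[i][j] * V_TABLE[qp % 6][column(i, j)] * scale for j in range(4)]
         for i in range(4)]
    t = matmul(matmul(transpose(CI2), w), CI2)
    return [[(t[i][j] + 128) >> 8 for j in range(4)] for i in range(4)]  # >> floors


if __name__ == "__main__":
    block = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
    for qp in (0, 16, 32):
        levels = forward_quantize(block, qp)
        print(qp, levels, rescale_inverse(levels, qp))
```

Take a flat block of value 10 as an example. At QP 0 it is reconstructed exactly. At QP 28 (Qstep ≈ 15.9) the single DC level of 3 reconstructs to 12, which illustrates the quantization error that grows with QP.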
The complete forward transform, scaling and quantization process (for 4x4 blocks and for modes excluding 16x16-Intra) becomes:

$$Y = \operatorname{round}\left( [C_f \cdot X \cdot C_f^{T}] \bullet \frac{m(QP \,\%\, 6, n)}{2^{\lfloor QP/6 \rfloor}} \cdot \frac{1}{2^{15}} \right)$$

(Note: rounded division by $2^{15}$ may be carried out by adding an offset and right-shifting by 15 bit positions. Combined with the division by $2^{\lfloor QP/6 \rfloor}$, this becomes a single right shift by $15 + \lfloor QP/6 \rfloor$ bit positions.)

9   Further reading

ITU-T Recommendation H.264, Advanced Video Coding for Generic Audio-Visual Services, November 2007.

H. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-complexity transform and quantization in H.264/AVC", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 598-603, July 2003.

T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC video coding standard", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, 2003.

I. Richardson, The H.264 Advanced Video Compression Standard, to be published in late 2009.

See http://www.vcodex.com/links.html for links to further resources on H.264 and video compression.

Acknowledgement

I would like to thank Gary Sullivan for suggesting a treatment of the H.264 transform and quantization processes along these lines and for his helpful comments on earlier drafts of this document.

About the author

As a researcher, consultant and author working in the field of video compression (video coding), I have written books on video codec design and the MPEG-4 and H.264 standards that are widely read by engineers, academics and managers. I advise companies on video coding standards, design and intellectual property. I also lead the Centre for Video Communications Research at The Robert Gordon University in Aberdeen, UK, and the Fully Configurable Video Coding research initiative.
Using the material in this document

This document is copyright: you may not reproduce the material without permission. Please contact me to ask for permission.

Please cite the document as follows:

Iain Richardson, 4x4 Transform and Quantization in H.264/AVC, VCodex Ltd White Paper, April 2009, http://www.vcodex.com/