Intra-coding using non-linear prediction, KLT and Texture Synthesis: AV1 encoders open the door to seemingly unconstrained video coding complexity

Institut für Informationsverarbeitung
Jörn Ostermann, Thorsten Laude, Yiqun Liu,
Bastian Wandt, Jan Voges, Holger Meuel

Decoder Runtimes
Thorsten Laude
laude@tnt.uni-hannover.de
[Figure: decoder runtimes of JEM and AV1 relative to HM (HM = 1) for the all-intra and random-access configurations, shown for Classes A1, A2, B, C, D, E, F and overall. Vertical axis: complexity increase, 0 to 14.]

Encoder Runtimes
[Figure: encoder runtimes of JEM and AV1 relative to HM (HM = 1) for the all-intra and random-access configurations, shown for Classes A1, A2, B, C, D, E, F and overall. Vertical axis: complexity increase, 0 to 60.]
e.g. 10 frames/day
Total CPU time: ≈ 1 decade

Runtime-memory Complexity

Trade-off Coding Efficiency vs. Complexity

Contour-based Multidirectional
Intra Coding for HEVC
Thorsten Laude and Jörn Ostermann

Motivation
Prediction process
• 33 angular modes, DC, planar
• Extrapolation base: right column of left block, bottom row of top block
Limitations of HEVC intra prediction
• Only one direction for angular modes
• Only one adjacent sample column/row as extrapolation base
[Figure: current block and already coded area. Top image: Lainema et al., Intra Coding of the HEVC Standard, TCSVT, 2012]


Contour-based Multidirectional Intra Coding (CoMIC)

Contour-based Multidirectional Intra Coding
Reconstructed samples
• Available at encoder and decoder
Contour extraction
• Detection
• Parameterization
Contour extrapolation
• Sample value continuation
• Various extrapolation methods

Contour detection → Contour parameterization → Contour extrapolation

Canny edge detection
Signal-adaptive thresholds following Otsu [1, 2]
[1] Otsu, A Threshold Selection Method from Gray-Level Histograms, SMC, 1979
[2] Fang et al., The Study on an Application of Otsu Method in Canny Operator, ISIP, 2009
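The signal-adaptive threshold selection can be illustrated with a minimal sketch of Otsu's method. Using the Otsu threshold as the high Canny threshold and a fixed fraction of it as the low threshold is an assumption made here for illustration; the slides do not state how the two thresholds are derived. The names `otsu_threshold`, `canny_thresholds` and `low_ratio` are hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold that maximizes the between-class variance (Otsu, 1979)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of samples with value <= t
    sum0 = 0  # sum of those sample values
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def canny_thresholds(gradient_magnitudes, low_ratio=0.5):
    """Assumption: high threshold = Otsu threshold, low = low_ratio * high."""
    high = otsu_threshold(gradient_magnitudes)
    return low_ratio * high, high
```

The input is expected to be a flat list of integer gradient magnitudes in the range 0 to 255; the two returned values would then be passed to a Canny implementation as hysteresis thresholds.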

Polynomial parameterization
Linear regression problem → least squares
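A polynomial fit is linear in its coefficients, which is why it reduces to a least-squares problem. The following sketch solves the normal equations directly; modelling the contour as y = f(x) and the helper names `fit_polynomial` and `evaluate` are assumptions for illustration, not the parameterization used in CoMIC.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns c with y ~ c[0] + c[1]*x + c[2]*x^2 + ..."""
    n = degree + 1
    # Normal equations (A^T A) c = A^T y for the Vandermonde matrix A.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    m = [row + [b] for row, b in zip(ata, aty)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (m[i][n] - s) / m[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the fitted polynomial at x, e.g. to extrapolate the contour."""
    return sum(c * x ** i for i, c in enumerate(coeffs))
```

Fitting on contour pixels from the reconstructed area and calling `evaluate` at positions inside the current block would continue the contour into the block to be predicted.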

Contour width: determined by comparing the sample value of the central contour pixel with the sample values of its neighboring pixels
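One way to read this width estimate is as a region-growing step along a line across the contour. The sketch below assumes a 1-D sample profile perpendicular to the contour and a fixed similarity tolerance; both the profile representation and the `tolerance` parameter are hypothetical, since the slides do not give the comparison criterion.

```python
def contour_width(profile, center, tolerance=10):
    """Count consecutive samples around `center` whose value stays within
    `tolerance` of the central sample value (hypothetical criterion)."""
    ref = profile[center]
    left = center
    while left > 0 and abs(profile[left - 1] - ref) <= tolerance:
        left -= 1
    right = center
    while right < len(profile) - 1 and abs(profile[right + 1] - ref) <= tolerance:
        right += 1
    return right - left + 1
```

For the profile `[200, 198, 60, 55, 58, 201, 199]` with the central pixel at index 3, the three dark samples form the contour and the bright samples on both sides end the search.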

Varying prediction certainty → diminishing towards the mean sample value $s_m$ of the reconstructed area
$$s_e = \frac{s_m \, d + s_a \, (d_{\max} - d)}{d_{\max}}, \qquad d = \sqrt{(x_a - x_e)^2 + (y_a - y_e)^2}$$
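The blending rule can be checked numerically with a small sketch. It assumes that s_a is the sample value at the contour anchor (x_a, y_a), that s_e is the extrapolated value at (x_e, y_e), and that d is the Euclidean distance between the two positions; these readings of the symbols, the clamping of d to d_max, and the function name are assumptions for illustration.

```python
import math


def extrapolated_sample(s_a, s_m, anchor, position, d_max):
    """Blend the anchor value s_a towards the mean value s_m with growing distance."""
    # Assumption: d is the Euclidean distance between anchor and extrapolated sample.
    d = math.hypot(anchor[0] - position[0], anchor[1] - position[1])
    # Assumption: beyond d_max the prediction equals the mean value.
    d = min(d, d_max)
    return (s_m * d + s_a * (d_max - d)) / d_max
```

At the anchor itself (d = 0) the result equals s_a, and at d = d_max it equals s_m, which matches the idea of a prediction certainty that diminishes with distance from the reconstructed samples.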

Background prediction: continuation of sample values
• horizontal and vertical fill
• mean fill (s_m) for shielded pixels

Comparison with the state of the art of Liu et al. [1]
CoMIC (ours):
• Contour extrapolation solely based on reconstructed samples → no signalling
• Sample value continuation
Liu et al.:
• Signalling of side information for the contour shape
• PDE-based inpainting
• Signalling of representative sample values for the inpainting
[1] Liu et al., Image Compression with Edge-based Inpainting, TCSVT, 2007

Stand-alone codec: comparison with the state of the art of Liu et al. [1] (anchor: JPEG)
[Figure: bitrate savings over JPEG per test image for Liu et al. and CoMIC (ours). Vertical axis: bitrate savings, 0% to 50%; higher is better.]
[1] Liu et al., Image Compression with Edge-based Inpainting, TCSVT, 2007

Additional coding mode in HEVC (HM-16.3)
[Figure: weighted average BD-rate (axis from -2.0% to 0.0%) for Bike 14, BVI Ball Under Water, BVI Bubbles Clear, BVI Sparkler, Basketball Drive, BQTerrace, Kimono and the mean, shown for the all-intra, low-delay and random-access configurations and their mean.]

Conclusion
CoMIC
• Separation of structural and texture parts
• Contour extrapolation
• All information available at decoder → no signalling except for mode usage
Results
• Coding gain: up to 1.9% over HEVC, up to 36.5% over JPEG
• Outperforms related work
Parameterization and extrapolation of structural information result in improved intra prediction

Scene-based KLT for Intra
Coding in HEVC
Yiqun Liu and Jörn Ostermann

General Idea
Yiqun Liu
Yiqun.Liu@tnt.uni-hannover.de
Transform Coding
• Input: Prediction errors
• Output: Data for quantization
Desired:
• Content representable by few coefficients in zig-zag order
[Figure: original luminance, prediction error, and logarithm of energy after DCT for a 16×16 TU]

Outline
General Idea
HM / JEM
Karhunen Loeve Transform
Conclusion

HM / JEM
DCT / DST
Benefit:
• Fixed coefficients
• Sensitivity of eyes
Drawbacks:
• DCT / DST not data-based
• Computational complexity
Transforms in HM and JEM:
        | HM                    | JEM
General | DCT-II                | DCT-II
Special | 4×4 DST-VII for intra | Adaptive multiple core transform (AMT): DST-VII, DCT-VIII, DST-I, DCT-V
        |                       | Mode-dependent non-separable secondary transform (MDNSST): 33 matrices for directional modes, 2 matrices for non-directional modes

HM / JEM
Signal dependent transform (SDT)
Procedure:
• Construct ref. patch with prediction
• Search for similar patches
• Data generated by subtraction
• Calculate the "ideal" transform
• Apply KLT on the prediction error
Benefit:
• No signaling at decoder
• Data-dependent transform
Drawback & question mark:
• Decoding time rises
• Data choice for transform


Desired Transform
Energy compaction
• Data dependent
⇒ Karhunen Loeve Transform (KLT)
Efficiency
• No re-generation at decoder
⇒ One off-line-trained transform for each case
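Off-line training of a KLT amounts to an eigen-decomposition of the covariance matrix of training residuals. The sketch below is a generic, dependency-free illustration using cyclic Jacobi rotations; it treats each flattened residual block as one training vector and is not the training procedure of the presented codec. All function names are hypothetical.

```python
import math


def covariance(samples):
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(dim)]
    return [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / n
             for j in range(dim)] for i in range(dim)]


def jacobi_eigen(a, sweeps=50):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-18:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the element a[p][q].
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # columns p, q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def train_klt(samples):
    """Rows of the result are eigenvectors, ordered by decreasing eigenvalue."""
    eigvals, v = jacobi_eigen(covariance(samples))
    order = sorted(range(len(eigvals)), key=lambda i: -eigvals[i])
    return [[v[k][i] for k in range(len(eigvals))] for i in order]


def apply_transform(basis, x):
    return [sum(b * xi for b, xi in zip(row, x)) for row in basis]
```

Because the basis is trained once and stored, the decoder only needs a matrix multiplication with the stored basis, in line with the "no re-generation at decoder" requirement. For an 8×8 TU each training vector would have 64 entries.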

Desired Transform: Indicator Prediction Mode (PM)
[Figure: average absolute error of 8×8 TUs, BQMall: (a) PM 26, (b) PM 18]
Direction-based KLT for intra
⇒ One transform matrix for each direction mode

Desired Transform: QP Dependency
[Figure: average absolute error of 8×8 TUs (PM 10) from PartyScene at QP 20 and QP 37]
QP-based KLT
⇒ Each QP uses its own KLT

Desired Transform: TU size
TU size
• Coverage
• Complexity
[Figure: distribution of TUs over TU size in Class B sequences]
TU-based KLT
⇒ Aiming at 8×8 & 16×16 TUs

Desired Transform: Scene
[Figure: average absolute error of 8×8 TUs: (a) Basketball, PM 26; (b) BQMall, PM 26]
Scene-based KLT
⇒ Each sequence uses its own KLT

Structure
[Figure: block diagram of the hybrid encoder with KLT]

Simulation
Test sequences:
• JCT-VC
  • 1920×1080: BasketballDrive, Kimono, Cactus, ParkScene, BQTerrace
  • 832×480: BasketballDrill, BQMall, PartyScene, RaceHorses
• BVI Texture [1]
  • 1920×1080: PondDragonflies, Sparkler, Bookcase, SmokeClear, Bricks
Test condition:
• Common Test Conditions [2]
• QP: 22, 27, 32, 37
• All-Intra (AI)
Training data:
• Class B & Class C
• 100 frames
• TU size 8×8, 16×16
Evaluation:
• BD-rate [3]
[1] M. A. Papadopoulos, F. Zhang, D. Agrafiotis and D. Bull, A Video Texture Database for Perceptual Compression and Quality Assessment, ICIP 2015
[2] F. Bossen, Common Test Conditions and Software Reference Configurations
[3] G. Bjøntegaard, Improvements of the BD-PSNR Model, VCEG-AI11

Simulation Result
[Figure: scene-based KLT, BD-rate gain in % (axis 0 to 25) vs. HM-16.15 for Kimono, Cactus, BQTerrace, BallUnderWater, BQMall, BasketballDrill, Plasma, BricksBushes, BricksLeaves]
Average gain: 5.49%

BasketballDrill, -25.00%
RaceHorses, -2.16%
BQMall, -0.37%
PartyScene, -1.21%

Performance in directions
BasketballDrill, -25.00%
[Figure: number of 8×8 TUs per frame (axis 0 to 350) over intra prediction modes 0 to 34, HM vs. KLT, BasketballDrill at QP 22]
Distribution of TUs
⇒ Most TUs in diagonal directions


BQMall, -0.37%
[Figure: number of 8×8 TUs per frame (axis 0 to 350) over intra prediction modes 0 to 34, HM vs. KLT, BQMall at QP 22]
Distribution of TUs
⇒ Most TUs in horizontal and vertical directions
Most gain comes from diagonal directions
⇒ Only diagonal prediction modes (2-5, 15-21, 30-34)
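Restricting the KLT to the diagonal prediction modes can be expressed as a simple lookup with a fallback. The sketch below uses the mode ranges and TU sizes named on the slides; the table layout keyed by (QP, TU size, prediction mode), the per-scene table, and the function name are hypothetical.

```python
# Diagonal intra prediction modes as listed on the slide: 2-5, 15-21, 30-34.
DIAGONAL_MODES = set(range(2, 6)) | set(range(15, 22)) | set(range(30, 35))

# TU sizes targeted by the trained transforms: 8x8 and 16x16.
KLT_TU_SIZES = (8, 16)


def select_transform(qp, tu_size, pred_mode, klt_table):
    """Return the trained KLT for this case, or None to fall back to DCT/DST.

    klt_table is a hypothetical per-scene dict keyed by (qp, tu_size, pred_mode).
    """
    if pred_mode in DIAGONAL_MODES and tu_size in KLT_TU_SIZES:
        return klt_table.get((qp, tu_size, pred_mode))
    return None
```

Returning `None` for every other case keeps the default HM transform path untouched, so only TUs with a trained matrix change behaviour.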

Simulation Result
[Figure: BD-rate gain in % (axis 0 to 25) of the scene-based KLT for Kimono, Cactus, BQTerrace, BallUnderWater, BQMall, BasketballDrill, Plasma, BricksBushes, BricksLeaves]
Average gain: scene-based 5.49%, generic ~3%

Simulation Result Diagonal
[Figure: gain in % (axis 0 to 25) for Kimono, Cactus, BQTerrace, BasketballDrill, BQMall, BallUnderWater, Plasma, BricksBushes, BricksLeaves; scene-based vs. scene-based diag.]
Average gain: 5.49% (scene-based) vs. 4.14% (scene-based diag.)

Distribution of TUs on frame
[Figure: BasketballDrill, 1st frame, QP 32, HM-16.15]

Distribution of TUs on frame
[Figure: BasketballDrill, 1st frame, QP 32, scene-based KLT]


Conclusion
Scene-based KLT
• Based on QP, TU size, PM and scenes
• Average gain 5.49%, maximum at 25.00%
• Diagonal directions bring about 70% of all the gain

Texture Synthesis
Bastian Wandt, Thorsten Laude, Bodo Rosenhahn, Jörn Ostermann

Texture Synthesis
Dipl.-Ing. Bastian Wandt
wandt@tnt.uni-hannover.de
[Figure: panels labeled Goal 1, Penalty 1, Penalty 3, Penalty 4]

Summary
• AV1 has an unprecedented level of encoder complexity
• Scene-based KLT: 5% gain
• Non-linear intra prediction: 0.5% gain
• Texture synthesis for severely bandlimited channels
Jörn Ostermann
ostermann@tnt.uni-hannover.de
