Efficient Content-Adaptive Feature-based Shot Detection for
HTTP Adaptive Streaming
Vignesh V Menon, Hadi Amirpour, Mohammad Ghanbari, Christian Timmerer
Christian Doppler Laboratory ATHENA, Institute of Information Technology (ITEC), University of Klagenfurt, Austria
19-22 September 2021
Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 1
Outline
1 Introduction
2 Shot detection
3 Proposed Algorithm
4 Evaluation
5 Conclusions and Future Directions
Introduction
Background of HTTP Adaptive Streaming (HAS) [1]
Source: https://bitmovin.com/adaptive-streaming/
Why Adaptive Streaming?
Adapt to a wide range of devices
Adapt to a broad range of Internet speeds
What does HAS do?
Each source video is split into segments
Each segment is encoded at multiple bitrates, resolutions, and codecs
Segments are delivered to the client based on device capability, network speed, etc.
[1] A. Bentaleb et al. “A Survey on Bitrate Adaptation Schemes for Streaming Media Over HTTP”. In: IEEE Communications Surveys & Tutorials 21.1 (2019), pp. 562–585. doi: 10.1109/COMST.2018.2862938.
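As a rough illustration of the delivery step, the sketch below picks one rendition from a bitrate ladder given a measured throughput and a device limit. The function name `pick_rendition`, the ladder values, and the simple "highest bitrate that fits" rule are all assumptions made for illustration; real HAS players use more elaborate adaptation logic.

```python
def pick_rendition(ladder, throughput_kbps, max_height):
    """Return the (bitrate_kbps, height) pair a client might request.

    ladder: list of (bitrate_kbps, height) tuples, one per representation.
    Hypothetical rule: highest bitrate that fits both the network and the device.
    """
    feasible = [r for r in ladder
                if r[0] <= throughput_kbps and r[1] <= max_height]
    if not feasible:
        # Nothing fits: fall back to the lowest-bitrate representation.
        return min(ladder, key=lambda r: r[0])
    return max(feasible, key=lambda r: r[0])


# Illustrative ladder, not taken from the slides.
ladder = [(500, 360), (1500, 720), (4000, 1080)]
choice = pick_rendition(ladder, throughput_kbps=2000, max_height=1080)
```

Because every segment exists at every ladder point, the client can repeat this choice per segment as conditions change.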
Figure: Multi-shot encoding framework for VoD HAS applications [2]. Diagram labels: Input Video, Shot Detection, Shot Encodings, Video Quality Measure, Convex Hull Determination, Encoding Set Generation, Multi-shot Encoding, Encoded Shots, Bitrate Quality Pairs, Bitrate Resolution Pairs, Target Encoding Set.
[2] Venkata Phani Kumar M, Christian Timmerer, and Hermann Hellwagner. “MiPSO: Multi-Period Per-Scene Optimization For HTTP Adaptive Streaming”. In: 2020 IEEE International Conference on Multimedia and Expo (ICME). 2020, pp. 1–6. doi: 10.1109/ICME46284.2020.9102775.
Shot detection
The boundaries between video shots are commonly known as shot transitions or shot-cuts.
The act of segmenting a video sequence into shots is called shot detection.
Objective:
Detect the first picture of each shot and encode it as an Instantaneous Decoder Refresh (IDR) frame.
Encode the subsequent frames of the new shot based on the first one via motion compensation and prediction. [3]
[3] J.-R. Ding and Jar-Ferr Yang. “Adaptive group-of-pictures and scene change detection methods based on existing H.264 advanced video coding information”. In: IET Image Processing 2 (May 2008), pp. 85–94. doi: 10.1049/iet-ipr:20070014.
Shot transitions take two forms:
hard shot-cuts
gradual shot transitions
Gradual transitions are much harder to detect because the change in visual information is difficult to quantify.
Note
1 The ratio of IDR frames to non-IDR frames is heavily skewed, i.e., the two classes are unevenly distributed.
2 Missed shot-cuts and wrongly placed IDR frames reduce compression efficiency, i.e., the cost of an error is large.
Proposed Algorithm
Phase 1: Feature Extraction
Compute texture energy per Coding Tree Unit (CTU)
A DCT-based energy function is used to determine the block-wise feature of each frame, defined as:
$$H_k = \sum_{i=1}^{w} \sum_{j=1}^{h} e^{\left|\left(\frac{ij}{wh}\right)^2 - 1\right|} \, \left|DCT(i-1, j-1)\right| \qquad (1)$$
where $w$ and $h$ are the width and height of the block, and $DCT(i, j)$ is the $(i, j)$th DCT component when $i + j > 2$, and 0 otherwise.
The energy values of the CTUs in a frame are averaged to determine the energy per frame. [4]
[4] Michael King, Zinovi Tauber, and Ze-Nian Li. “A New Energy Function for Segmentation and Compression”. In: July 2007, pp. 1647–1650. doi: 10.1109/ICME.2007.4284983.
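The energy function in Eq. (1) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes an orthonormal 2-D DCT-II (the slides do not state the DCT normalization), uses a naive pure-Python transform that is only practical for small blocks, and the names `dct2` and `texture_energy` are hypothetical.

```python
import math


def dct2(block):
    """Naive orthonormal 2-D DCT-II of a block given as a list of rows."""
    h, w = len(block), len(block[0])

    def alpha(u, n):
        return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * w for _ in range(h)]
    for v in range(h):          # vertical frequency index
        for u in range(w):      # horizontal frequency index
            s = 0.0
            for y in range(h):
                for x in range(w):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * w))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * h)))
            out[v][u] = alpha(u, w) * alpha(v, h) * s
    return out


def texture_energy(block):
    """Block energy per Eq. (1): weighted sum of |DCT| with the DC term excluded."""
    h, w = len(block), len(block[0])
    coeffs = dct2(block)
    energy = 0.0
    for i in range(1, w + 1):
        for j in range(1, h + 1):
            if i + j <= 2:      # (i, j) = (1, 1) is the DC term, defined as 0
                continue
            weight = math.exp(abs((i * j / (w * h)) ** 2 - 1))
            energy += weight * abs(coeffs[j - 1][i - 1])
    return energy


flat = [[128] * 8 for _ in range(8)]                       # no texture
checker = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
flat_energy = texture_energy(flat)        # close to zero
checker_energy = texture_energy(checker)  # clearly positive
```

Since the DC coefficient is excluded, a uniform block contributes essentially no energy, while a textured block such as the checkerboard does.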
Figure: Hk of Tears of Steel sequence. Black circles denote the regions of shot transitions.
$h_k$: Mean Squared Error (MSE) of the CTU-level energy values of frame $k$ with respect to those of the previous frame $k-1$, normalized by $H_k$:
$$h_k = \frac{\sum_{i=1}^{M} \left(H_k(i) - H_{k-1}(i)\right)^2}{M H_k} \qquad (2)$$
where $M$ denotes the number of CTUs in frame $k$.
$\epsilon_k$: gradient of $h$ per frame, given by:
$$\epsilon_k = \frac{h_{k-1} - h_k}{h_{k-1}} \qquad (3)$$
Note
If $h_k = 0$, the $k$th frame is a duplicate of the $(k-1)$th frame.
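Eqs. (2) and (3) translate directly into code. The sketch below takes per-CTU energies as plain lists; the function names and the zero-denominator guards are assumptions added for illustration, since the slides do not say how those cases are handled.

```python
def frame_feature(cur, prev):
    """h_k per Eq. (2).

    cur, prev: per-CTU energies H_k(i) and H_{k-1}(i) for frames k and k-1.
    The denominator uses the frame-averaged energy of frame k.
    """
    if len(cur) != len(prev):
        raise ValueError("both frames must have the same number of CTUs")
    m = len(cur)
    mean_energy = sum(cur) / m
    if mean_energy == 0:
        return 0.0  # assumption: treat an all-zero-energy frame as 'no change'
    sq_diff = sum((c - p) ** 2 for c, p in zip(cur, prev))
    return sq_diff / (m * mean_energy)


def gradient(h_prev, h_cur):
    """epsilon_k per Eq. (3)."""
    if h_prev == 0:
        return 0.0  # assumption: previous frame was a duplicate, no gradient
    return (h_prev - h_cur) / h_prev


h_dup = frame_feature([1.0, 2.0], [1.0, 2.0])   # identical CTU energies
h_chg = frame_feature([4.0, 6.0], [2.0, 6.0])   # 4 / (2 * 5)
eps = gradient(2.0, 1.0)
```

The first call illustrates the note above: identical CTU energies in consecutive frames give $h_k = 0$.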
Phase 2: Successive Elimination Algorithm
Step 1: while parsing all video frames do
    if $\epsilon_k > T_1$ then
        k ← IDR-frame, a new shot.
    else if $\epsilon_k \leq T_2$ then
        k ← P-frame or B-frame, not a new shot.
$T_1$, $T_2$: maximum and minimum thresholds for $\epsilon_k$
Note
The frames are classified into three categories in this step:
1 a new shot
2 not a new shot
3 not decided
In the next steps of the algorithm, only frames of category (3) are considered.
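Step 1 amounts to a three-way threshold test. A minimal sketch follows; the function name and the string labels are hypothetical, and the default thresholds are the experimentally determined values reported in the Test Methodology slide.

```python
def classify_frame(epsilon, t1=50.0, t2=10.0):
    """Step 1: classify a frame from its gradient value epsilon_k."""
    if epsilon > t1:
        return "new_shot"        # encode as IDR-frame
    if epsilon <= t2:
        return "not_new_shot"    # encode as P-frame or B-frame
    return "undecided"           # T1 >= epsilon > T2: passed on to Steps 2 and 3


labels = [classify_frame(e) for e in (60.0, 21.68, 5.0)]
```

Only frames labelled `"undecided"` form the set examined by the later steps.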
$f$: video frame rate (fps)
$Q$: set of frames where $T_1 \geq \epsilon > T_2$
$q_0$: current frame number in the set $Q$
$q_{-1}$: previous frame number in the set $Q$
$q_1$: next frame number in the set $Q$
Step 2: while parsing $Q$ do
    if $q_0 - q_{-1} > f$ and $q_1 - q_0 > f$ then
        $q_0$ ← IDR-frame, a new shot.
        Eliminate $q_0$ from $Q$.
Step 3: while parsing $Q$ do
    if $q_0 - q_{-1} > f$ and $q_1 - q_0 \leq f$ then
        compare $q_0$ with each frame $q$ in the subset of $Q$ where $q_1 - q_0 \leq f$
        Frame $q$ with the highest $\epsilon$ value ← IDR-frame, a new shot.
Working Example
Table: Step 1.
Frame  $H_k$  $\epsilon_k$
33     52162  21.68
54     52119  13.51
65     52625  19.21
86     52038  10.12
97     52499  17.34
161    47790  11.53
833    48644  11.49
1409   40367  14.51
1665   35321  19.93
1686   40463  10.72
1889   38475  12.16
2205   37218  10.08
2536   35793  10.49

Table: Step 2.
Frame  $H_k$  $\epsilon_k$  $q_0 - q_{-1}$  $q_1 - q_0$
33     52162  21.68         33              21
54     52119  13.51         21              11
65     52625  19.21         11              21
86     52038  10.12         21              11
97     52499  17.34         11              64
161    47790  11.53         64              672
833    48644  11.49         672             576
1409   40367  14.51         576             256
1665   35321  19.93         256             21
1686   40463  10.72         21              203
1889   38475  12.16         203             316
2205   37218  10.08         316             331
2536   35793  10.49         331             -

Table: Step 3.
Frame  $H_k$  $\epsilon_k$  $q_0 - q_{-1}$  $q_1 - q_0$
33     52162  21.68         33              21
54     52119  13.51         21              11
65     52625  19.21         11              21
86     52038  10.12         21              11
97     52499  17.34         11              64
1665   35321  19.93         256             21
1686   40463  10.72         21              203
2536   35793  10.49         331             -
This example uses the FunOnTheRiver (24 fps) test sequence. The frames selected for encoding as IDR-frames in each step are:
Step 1: none
Step 2: 161, 833, 1409, 1889, 2205
Step 3: 33, 1665, 2536
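Steps 2 and 3 can be replayed on the working example's numbers. The sketch below is one possible reading of the slides, not the authors' code. It makes three assumptions that are inferred from the example tables rather than stated: gaps are computed once on the original set Q, the gap before the first candidate is measured from frame 0, and a candidate with no successor is skipped in Step 2 and treated as a one-frame cluster in Step 3. The function name is hypothetical.

```python
def successive_elimination(candidates, fps):
    """candidates: (frame_number, epsilon) pairs of the undecided set Q, sorted by frame."""
    frames = [f for f, _ in candidates]
    eps = dict(candidates)
    n = len(frames)
    # Gaps to the previous / next candidate, computed once on the original Q.
    prev_gap = [frames[i] - (frames[i - 1] if i > 0 else 0) for i in range(n)]
    next_gap = [frames[i + 1] - frames[i] if i + 1 < n else None for i in range(n)]

    # Step 2: isolated candidates, more than fps frames away on both sides.
    step2 = [frames[i] for i in range(n)
             if prev_gap[i] > fps and next_gap[i] is not None and next_gap[i] > fps]
    taken = set(step2)

    # Step 3: for each cluster of nearby candidates, keep the highest epsilon.
    step3 = []
    i = 0
    while i < n:
        if frames[i] in taken or prev_gap[i] <= fps:
            i += 1
            continue
        j = i
        while next_gap[j] is not None and next_gap[j] <= fps:
            j += 1                      # extend the cluster while gaps stay <= fps
        cluster = frames[i:j + 1]
        step3.append(max(cluster, key=lambda f: eps[f]))
        i = j + 1
    return step2, step3


# (frame, epsilon) values copied from the Step 1 table above.
candidates = [(33, 21.68), (54, 13.51), (65, 19.21), (86, 10.12), (97, 17.34),
              (161, 11.53), (833, 11.49), (1409, 14.51), (1665, 19.93),
              (1686, 10.72), (1889, 12.16), (2205, 10.08), (2536, 10.49)]
step2, step3 = successive_elimination(candidates, fps=24)
```

Under these assumptions the two lists match the Step 2 and Step 3 results listed above for FunOnTheRiver.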
Evaluation
Test Methodology
Test videos: JVET test sequences [5] and professionally produced UHD HDR cinematic content [6] with typical multi-scene content
System: Dual-processor server with Intel Xeon Gold 5218R (80 cores, 2.10 GHz)
Benchmark algorithm: default shot detection algorithm in x265
$T_1 = 50$ and $T_2 = 10$ for the proposed algorithm, determined experimentally
Metrics: accuracy, precision, recall [7], and F-measure [8]
[5] Jill Boyce et al. JVET-J1010: JVET common test conditions and software reference configurations. July 2018.
[6] M. H. Pinson. “The Consumer Digital Video Library [Best of the Web]”. In: IEEE Signal Processing Magazine 30.4 (2013), pp. 172–174. doi: 10.1109/MSP.2013.2258265.
[7] Markus Junker, Rainer Hoch, and Andreas Dengel. “On the Evaluation of Document Analysis Components by Recall, Precision, and Accuracy”. In: (Apr. 2000). doi: 10.1109/ICDAR.1999.791887.
[8] Sasaki Yutaka. “The truth of the F-measure”. In: https://www.toyota-ti.ac.jp/Lab/Denshi/COIN/people/yutaka.sasaki/F-measure-YS-26Oct07.pdf. 2007.
Experimental Results
Table: Shot detection results
Video           Actual     Benchmark algorithm                       Proposed algorithm
                shot-cuts  Accuracy  Precision  Recall   F-measure   Accuracy  Precision  Recall   F-measure
BigBuckBunny    10         99.88%    100.00%    80.00%   88.89%      100.00%   100.00%    100.00%  100.00%
Dinner          4          99.89%    100.00%    75.00%   85.71%      99.89%    100.00%    75.00%   85.71%
FoodMarket4     2          99.72%    -          0%       -           99.86%    100.00%    50.00%   66.67%
sintel trailer  14         99.86%    100.00%    85.71%   92.31%      99.93%    100.00%    92.86%   96.30%
snow mnt        3          99.47%    -          0%       -           99.65%    100.00%    33.33%   50.00%
Tears of Steel  13         99.93%    100.00%    92.31%   96.00%      100.00%   100.00%    100.00%  100.00%
Busy City       11         99.64%    50.00%     18.18%   26.67%      99.87%    100.00%    63.64%   77.78%
FunOnTheRiver   12         99.60%    0%         0%       -           99.80%    85.71%     50.00%   63.16%
Remarks
1 Actual shot-cuts: the ground truth, i.e., the number of real shot transitions in the test videos, determined manually.
2 The recall of the proposed algorithm is 25% higher than that of the benchmark algorithm.
3 The F-measure of the proposed algorithm is 20% higher than that of the benchmark algorithm.
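For reference, the four metrics in the table follow from the usual confusion-matrix counts. The sketch below uses the standard definitions; the function name is hypothetical, and the true-positive and false-positive counts in the example are inferred from the table's percentages (e.g., 80% recall on 10 shot-cuts implies 8 hits), not reported in the slides. The true-negative count is a placeholder, because the slides do not give frame counts.

```python
def detection_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f_measure); undefined ratios are None."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else None   # no detections at all
    recall = tp / (tp + fn) if tp + fn else None      # no actual shot-cuts
    if precision is None or recall is None or precision + recall == 0:
        f_measure = None                              # shown as '-' in the table
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure


# BigBuckBunny, benchmark: 8 of 10 shot-cuts found, no false detections.
# tn=1000 is a placeholder and only affects accuracy.
_, p, r, f = detection_metrics(tp=8, fp=0, fn=2, tn=1000)
bbb = (round(100 * p, 2), round(100 * r, 2), round(100 * f, 2))
```

The `None` cases correspond to the dashes in the table, where a metric is undefined because the benchmark produced no correct detections.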
Table: Detection rate statistics of the algorithms
Algorithm TPR FPR
Benchmark 53.62% 0.03%
Proposed 78.26% 0.01%
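The TPR column can be cross-checked against the per-video results. The sketch below assumes that TPR is pooled over all test videos, i.e., total detected shot-cuts divided by total actual shot-cuts; the slides do not state this explicitly, and the per-video hit counts are recovered by rounding recall times the number of actual shot-cuts.

```python
# (video, actual shot-cuts, benchmark recall %, proposed recall %),
# copied from the shot detection results table.
rows = [
    ("BigBuckBunny", 10, 80.00, 100.00),
    ("Dinner", 4, 75.00, 75.00),
    ("FoodMarket4", 2, 0.0, 50.00),
    ("sintel trailer", 14, 85.71, 92.86),
    ("snow mnt", 3, 0.0, 33.33),
    ("Tears of Steel", 13, 92.31, 100.00),
    ("Busy City", 11, 18.18, 63.64),
    ("FunOnTheRiver", 12, 0.0, 50.00),
]


def pooled_tpr(rows, column):
    """Pooled true positive rate in percent; column 2 = benchmark, 3 = proposed."""
    total = sum(r[1] for r in rows)
    hits = sum(round(r[1] * r[column] / 100) for r in rows)
    return 100 * hits / total


benchmark_tpr = round(pooled_tpr(rows, 2), 2)
proposed_tpr = round(pooled_tpr(rows, 3), 2)
```

Under this pooling assumption the two values agree with the TPR figures in the table above (37 and 54 detected shot-cuts out of 69).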
Runtime per frame: 0.1% of the total time taken for encoding each frame.
The algorithm needs to be run only once for a video. The decisions made can be used for
all remaining representations in HAS applications.
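One way to reuse the decisions across representations is to write the detected frame numbers to a file once and feed it to every encode. The sketch below formats them as `frame type` lines in the style of an x265 qpfile; that the encoder accepts exactly this format (frame number followed by `I` for a keyframe, with the QP field omitted) is an assumption to verify against the x265 documentation, and the function name is hypothetical.

```python
def qpfile_lines(idr_frames):
    """Format detected shot-start frames as 'frame_number I' lines.

    Assumption: the target encoder reads forced frame types in this form;
    check the exact syntax in the encoder's documentation before use.
    """
    return [f"{frame} I" for frame in sorted(set(idr_frames))]


# IDR-frames from the working example (Steps 2 and 3 combined).
lines = qpfile_lines([161, 833, 1409, 1889, 2205, 33, 1665, 2536])
text = "\n".join(lines) + "\n"   # content to write once and share across encodes
```

Because shot boundaries depend on the source content rather than on the target bitrate or resolution, the same list can serve all representations of the video.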
Conclusions and Future Directions
Conclusions
Proposed a shot detection algorithm as a feature-based pre-processing step for x265-based
HEVC encoding in VoD HAS applications.
Identified a DCT-based energy function as a feature to determine shot cuts.
Proposed a successive elimination algorithm to remove the false detections during gradual
shot transitions.
The proposed algorithm gives better-balanced shot detections compared to the benchmark
algorithm.
Future Directions
We can extend the work in this paper to compute the relative complexity of the shots to
that of the entire video sequence using the feature metric and predict the ideal bitrate per
resolution for each shot.
As an extension of this work, more encoding parameter decisions, such as optimal block partitioning and quantization offsets, can be predicted.
This work can be extended to support more recent codecs, e.g., VVC.
Q & A
Thank you for your attention!
Vignesh V Menon (vignesh.menon@aau.at)
Hadi Amirpour (hadi.amirpourazarian@aau.at)
Mohammad Ghanbari (ghan@essex.ac.uk)
Christian Timmerer (Christian.Timmerer@aau.at)
IEEE ICIP'22:Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming

  • 1. Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming Vignesh V Menon, Hadi Amirpour, Mohammad Ghanbari, Christian Timmerer Christian Doppler Laboratory ATHENA, Institute of Information Technology (ITEC), University of Klagenfurt, Austria 19-22 September 2021 Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 1
  • 2. Outline 1 Introduction 2 Shot detection 3 Proposed Algorithm 4 Evaluation 5 Conclusions and Future Directions Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 2
  • 3. Introduction Introduction Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 3
  • 4. Introduction Introduction Background of HTTP Adaptive Streaming (HAS)1 Source: https://bitmovin.com/adaptive-streaming/ Why Adaptive Streaming? Adapt for a wide range of devices Adapt for a broad set of Internet speeds What HAS does? Each source video is split into segments Encoded at multiple bitrates, resolutions, and codecs Delivered to the client based on the device capability, network speed etc. 1 A. Bentaleb et al. “A Survey on Bitrate Adaptation Schemes for Streaming Media Over HTTP”. In: IEEE Communications Surveys Tutorials 21.1 (2019), pp. 562–585. doi: 10.1109/COMST.2018.2862938. Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 4
  • 5. Introduction Introduction Multi-shot encoding framework for VoD HAS applications2 Input Video Shot Detection Shot Encodings Video Quality Measure Convex Hull Determination Encoding Set Generation Multi-shot Encoding Encoded Shots Bitrate Quality Pairs Bitrate Resolution Pairs Target Encoding Set 2 Venkata Phani Kumar M, Christian Timmerer, and Hermann Hellwagner. “MiPSO: Multi-Period Per-Scene Optimization For HTTP Adaptive Streaming”. In: 2020 IEEE International Conference on Multimedia and Expo (ICME). 2020, pp. 1–6. doi: 10.1109/ICME46284.2020.9102775. Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 5
  • 6. Shot detection Shot detection Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 6
  • 7. Shot detection Shot Detection The boundaries between video shots are commonly known as shot transitions or shot-cuts. The act of segmenting a video sequence into shots is called shot detection. Objective: Detect the first picture of each shot and encode it as an Instantaneous Decoder Refresh (IDR) frame. Encode the subsequent frames of the new shot based on the first one via motion compen- sation and prediction.3 3 J.-R Ding and Jar-Ferr Yang. “Adaptive group-of-pictures and scene change detection methods based on existing H.264 advanced video coding information”. In: Image Processing, IET 2 (May 2008), pp. 85 –94. doi: 10.1049/iet-ipr:20070014. Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 7
  • 8. Shot detection Shot Detection Shot transitions can be present in two ways: hard shot-cuts gradual shot transitions The detection of gradual changes is much more difficult owing to the fact it is difficult to determine the change in the visual information in a quantitative format. Note 1 Ratio of IDR frames to non-IDR frames is skewed, i.e, uneven distribution. 2 Missed shot-cut detections and wrong IDR placements cause low compression efficiency, i.e., cost of error is large. Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 8
  • 9. Proposed Algorithm Proposed Algorithm Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 9
  • 10. Proposed Algorithm Phase 1: Feature Extraction Proposed Algorithm Phase 1: Feature Extraction Compute texture energy per Coding Tree Unit (CTU) A DCT-based energy function is used to determine the block-wise feature of each frame defined as: Hk = w X i=1 h X j=1 e|( ij wh )2−1| |DCT(i − 1, j − 1)| (1) where w and h are the width and height of the block, and DCT(i, j) is the (i, j)th DCT component when i + j > 2, and 0 otherwise. The energy values of CTUs in a frame is averaged to determine the energy per frame.4 4 Michael King, Zinovi Tauber, and Ze-Nian Li. “A New Energy Function for Segmentation and Compression”. In: July 2007, pp. 1647–1650. doi: 10.1109/ICME.2007.4284983. Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 10
  • 11. Proposed Algorithm Phase 1: Feature Extraction Proposed Algorithm Phase 1: Feature Extraction Figure: Hk of Tears of Steel sequence. Black circles denote the regions of shot transitions. Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 11
  • 12. Proposed Algorithm Phase 1: Feature Extraction Proposed Algorithm Phase 1: Feature Extraction hk: Mean Squared Error (MSE) of the CTU level energy values of frame k to that of the previous frame k − 1, normalized to Hk. hk = PM i=1(Hk(i) − Hk−1(i))2 MHk (2) where M denotes the number of CTUs in frame k. : gradient of h per frame, given by: k = hk−1 − hk hk−1 (3) Note If hk = 0, kth frame is a duplicate of (k − 1)th frame. Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 12
  • 13. Proposed Algorithm Phase 2: Successive Elimination Algorithm Proposed Algorithm Phase 2: Successive Elimination Algorithm Step 1: while Parsing all video frames do if k T1 then k ← IDR-frame, a new shot. else if k ≤ T2 then k ← P-frame or B-frame, not a new shot. T1 , T2 : maximum and minimum threshold for k Note The frames are classified into three categories in this step: 1 a new shot 2 not a new shot 3 not decided In the next steps of the algorithm, only frames of category (3) are considered. Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 13
  • 14. Proposed Algorithm Phase 2: Successive Elimination Algorithm Proposed Algorithm Phase 2: Successive Elimination Algorithm f : video fps Q : set of frames where T1 ≥ T2 q0: current frame number in the set Q q−1: previous frame number in the set Q q1: next frame number in the set Q Step 2: while Parsing Q do if q0 − q−1 f and q1 − q0 f then q0 ← IDR-frame, a new shot. Eliminate q0 from Q. Step 3: while Parsing Q do if q0 − q−1 f and q1 − q0 ≤ f then compare q0 with q when q is from the subset of Q where q1 − q0 ≤ f Frame q with the highest value ← IDR-frame, a new shot. Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 14
  • 15. Proposed Algorithm Phase 2: Successive Elimination Algorithm Proposed Algorithm Working Example Table: Step 1. Frame Hk 33 52162 21.68 54 52119 13.51 65 52625 19.21 86 52038 10.12 97 52499 17.34 161 47790 11.53 833 48644 11.49 1409 40367 14.51 1665 35321 19.93 1686 40463 10.72 1889 38475 12.16 2205 37218 10.08 2536 35793 10.49 Table: Step 2. Frame Hk q0 − q−1 q1 − q0 33 52162 21.68 33 21 54 52119 13.51 21 11 65 52625 19.21 11 21 86 52038 10.12 21 11 97 52499 17.34 11 64 161 47790 11.53 64 672 833 48644 11.49 672 576 1409 40367 14.51 576 256 1665 35321 19.93 256 21 1686 40463 10.72 21 203 1889 38475 12.16 203 316 2205 37218 10.08 316 331 2536 35793 10.49 331 - Table: Step 3. Frame Hk q0 − q−1 q1 − q0 33 52162 21.68 33 21 54 52119 13.51 21 11 65 52625 19.21 11 21 86 52038 10.12 21 11 97 52499 17.34 11 64 1665 35321 19.93 256 21 1686 40463 10.72 21 203 2536 35793 10.49 331 - This example uses FunOnTheRiver (24 fps) test sequence. Detected frames to be encoded as IDR-frames in each step are: Step 1: - Step 2: 161, 833, 1409, 1889, 2205 Step 3: 33, 1665, 2536 Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 15
  • 16. Evaluation Evaluation Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 16
  • 17. Evaluation Evaluation Test Methodology Test videos: JVET test sequences5 and professionally produced UHD HDR cinematic con- tent6 having typical multi-scene content System: Dual-processor server with Intel Xeon Gold 5218R (80 cores, 2.10 GHz) Benchmark algorithm: default shot detection algorithm in x265 T1 = 50 and T2 = 10 for the proposed algorithm; determined experimentally Metrics: accuracy, precision, recall,7 and F-measure8 5 Jill Boyce et al. JVET-J1010: JVET common test conditions and software reference configurations. July 2018. 6 M. H. Pinson. “The Consumer Digital Video Library [Best of the Web]”. In: IEEE Signal Processing Magazine 30.4 (2013), pp. 172–174. doi: 10.1109/MSP.2013.2258265. 7 Markus Junker, Rainer Hoch, and Andreas Dengel. “On the Evaluation of Document Analysis Components by Recall, Precision, and Accuracy”. In: (Apr. 2000). doi: 10.1109/ICDAR.1999.791887. 8 Sasaki Yutaka. “The truth of the F-measure”. In: https://www.toyota-ti.ac.jp/Lab/Denshi/COIN/people/yutaka.sasaki/F-measure-YS-26Oct07.pdf. 2007. Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 17
  • 18. Evaluation Evaluation Experimental Results Table: Shot detection results Video Actual Benchmark algorithm Proposed algorithm shot-cuts Accuracy Precision Recall F-measure Accuracy Precision Recall F-measure BigBuckBunny 10 99.88% 100.00% 80.00% 88.89% 100.00% 100.00% 100.00% 100.00% Dinner 4 99.89% 100.00% 75.00% 85.71% 99.89% 100.00% 75.00% 85.71% FoodMarket4 2 99.72% - 0% - 99.86% 100.00% 50.00% 66.67% sintel trailer 14 99.86% 100.00% 85.71% 92.31% 99.93% 100.00% 92.86% 96.30% snow mnt 3 99.47% - 0% - 99.65% 100.00% 33.33% 50.00% Tears of Steel 13 99.93% 100.00% 92.31% 96.00 % 100.00% 100.00% 100.00% 100.00% Busy City 11 99.64% 50.00% 18.18% 26.67% 99.87% 100.00% 63.64% 77.78% FunOnTheRiver 12 99.60% 0% 0% - 99.80% 85.71% 50.00% 63.16% Remarks 1 Actual shot-cuts: the ground truth, i.e., the number of real shot transitions in the considered test videos determined manually. 2 Recall rate of the proposed algorithm is 25% better than the benchmark algorithm. 3 F-measure of the proposed algorithm is 20% higher compared to the benchmark algorithm. Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 18
  • 19. Evaluation Evaluation Experimental Results Table: Detection rate statistics of the algorithms Algorithm TPR FPR Benchmark 53.62% 0.03% Proposed 78.26% 0.01% Runtime per frame: 0.1% of the total time taken for encoding each frame. The algorithm needs to be run only once for a video. The decisions made can be used for all remaining representations in HAS applications. Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 19
  • 20. Conclusions and Future Directions Conclusions and Future Directions Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 20
  • 21. Conclusions and Future Directions Conclusions Proposed a shot detection algorithm as a feature-based pre-processing step for x265-based HEVC encoding in VoD HAS applications. Identified a DCT-based energy function as a feature to determine shot cuts. Proposed a successive elimination algorithm to remove the false detections during gradual shot transitions. The proposed algorithm gives better-balanced shot detections compared to the benchmark algorithm. Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 21
  • 22. Conclusions and Future Directions Future Directions We can extend the work in this paper to compute the relative complexity of the shots to that of the entire video sequence using the feature metric and predict the ideal bitrate per resolution for each shot. As an extension of this work, more encoding parameter decisions like optimal block parti- tioning, quantization offsets can be predicted. This work can be extended to support more recent codecs e.g., VVC. Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 22
  • 23. Conclusions and Future Directions Q A Thank you for your attention! Vignesh V Menon (vignesh.menon@aau.at) Hadi Amirpour (hadi.amirpourazarian@aau.at) Mohammad Ghanbari (ghan@essex.ac.uk) Christian Timmerer (Christian.Timmerer@aau.at) Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 23