Fixation Prediction for
360˚ Video Streaming in
Head-Mounted Virtual Reality
Ching-Ling Fan1, Jean Lee1, Wen-Chih Lo1, Chun-Ying Huang2, Kuan-Ta Chen3, and Cheng-Hsin Hsu1
1Department of Computer Science, National Tsing Hua University
2Department of Computer Science, National Chiao Tung University
3Institute of Information Science, Academia Sinica
360˚ Videos
1
360˚ Videos Streaming to HMD
• 360˚ videos contain a wider view than conventional videos
 much more information
 extremely high resolutions and large file sizes
⇒ Insufficient bandwidth & degraded user experience
2
360˚ Videos Streaming to HMD
• Solution: stream only the current Field-of-View (FoV) of the viewer
 The HMD viewer only gets to see a small part of the whole 360˚ video
• Question: which FoV should we stream to meet the viewer’s needs in the next moment (a few seconds)?
⇒ Fixation Prediction
3
Fixation Prediction
• Videos are split into tiles (sub-videos)
 Encoded using H.264 and streamed using MPEG Dynamic Adaptive Streaming over HTTP (DASH); see the tiling sketch below
• Goal: predict which tiles are most likely to be viewed by the viewers
➔ which tiles should be included in the next segment
4
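To make the tiling step concrete, below is a minimal sketch of cutting an equirectangular frame into fixed-size square tiles; it assumes the 192x192 tile size used in the simulation setup later in the talk, while the frame dimensions and the `split_into_tiles` helper are illustrative. Each resulting tile would then be encoded as an independent H.264 sub-video and offered as a DASH representation.

```python
import numpy as np

TILE = 192  # tile width/height in pixels, per the simulation setup

def split_into_tiles(frame, tile=TILE):
    """Map (row, col) tile indices to pixel blocks of an equirectangular frame."""
    h, w = frame.shape[:2]
    return {(r, c): frame[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
            for r in range(h // tile) for c in range(w // tile)}

# Example: a 3840x1920 equirectangular frame yields a 20x10 grid of tiles.
frame = np.zeros((1920, 3840, 3), dtype=np.uint8)
print(len(split_into_tiles(frame)))  # 200
```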
Proposed Approach
• Neural network trained with viewing features
 content-related: saliency maps and motion maps
 sensor-related: viewer’s yaw, roll, and pitch
5
[Figure: HMD with the roll and yaw rotation axes illustrated.]
System Overview
• Image saliency network: predicts the saliency of images [1]
• Motion feature detector: analyzes the Lucas-Kanade optical flow of consecutive frames (sketched below)
• Orientation extractor: extracts the orientation data from raw HMD sensor data
6
[Figure: system overview pipeline, including the image saliency network.]
[1] M. Cornia, L. Baraldi, G. Serra, and R. Cucchiara. 2016. A Deep Multi-Level Network for
Saliency Prediction. In Proc. of International Conference on Pattern Recognition (ICPR’16).
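The slide names Lucas-Kanade optical flow but no implementation, so the following is only a sketch using OpenCV's pyramidal Lucas-Kanade tracker; the feature-detection parameters are illustrative assumptions. A per-tile motion map could then be built from the displacements of the tracked features falling in each tile.

```python
import cv2

def motion_vectors(prev_frame, next_frame):
    """Track corner features between consecutive frames with Lucas-Kanade flow."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    # Pick up to 500 corner features worth tracking in the previous frame.
    p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                 qualityLevel=0.01, minDistance=8)
    # Pyramidal Lucas-Kanade: estimate where each feature moved to.
    p1, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, p0, None)
    ok = status.reshape(-1) == 1
    return p0.reshape(-1, 2)[ok], p1.reshape(-1, 2)[ok]  # start/end positions
```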
System Overview
• Feature buffer: stores the features in a sliding window
• Fixation prediction network: predicts the video fixations
• Tile rate selector: performs rate allocation among video tiles (a sketch of one plausible policy follows below)
9
[Figure: system overview pipeline, highlighting the feature buffer, fixation prediction network, and tile rate selector.]
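The slides do not spell out the tile rate selector's policy, so the sketch below shows just one plausible allocation under an assumed fixed bandwidth budget: split the budget across the predicted tiles in proportion to their predicted viewing probabilities. The function name and interface are illustrative.

```python
def allocate_rates(view_probs, total_kbps):
    """Split a bandwidth budget across the predicted tiles.

    view_probs: dict mapping tile id -> predicted viewing probability.
    Tiles that are more likely to be viewed receive a larger share.
    """
    total_p = sum(view_probs.values()) or 1.0
    return {t: total_kbps * p / total_p for t, p in view_probs.items()}
```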
Fixation Prediction Network
• Recurrent Neural Network (a minimal Keras sketch follows the figure below)
• Goal: predict the viewing probability of each tile in the next few seconds
 Orientation-based network
 Tile-based network
12
[Figure: an RNN unrolled across time steps t-2 to t+1; each step takes an input and the previous state, and produces an output and an updated state.]
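As a concrete starting point, here is a minimal Keras sketch of such a network, assuming a single LSTM layer with one sigmoid output per tile (one grid point from the training slide later). The window length m, feature dimension, and tile count are illustrative, and the actual networks output probabilities for n future frames rather than a single horizon.

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

m, feature_dim, n_tiles = 30, 256, 200  # illustrative sizes

model = Sequential([
    # Consume the features of the past m frames (orientation, motion, and
    # saliency for the orientation-based variant; viewed tiles, motion, and
    # saliency for the tile-based one).
    LSTM(512, input_shape=(m, feature_dim)),
    Dropout(0.5),                          # the "dropout = True" grid option
    Dense(n_tiles, activation='sigmoid'),  # per-tile viewing probability
])
model.compile(optimizer='adam', loss='binary_crossentropy')
```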
Fixation Prediction Network:
Orientation-Based Network
13
[Figure: orientation-based network. An LSTM consumes the features (orientation, motion, saliency) of the past m frames F_{f-m}..F_f and outputs the predicted viewing probabilities P_{f+1}..P_{f+n} for the next n frames.]
Fixation Prediction Network:
Tile-Based Network
14
[Figure: tile-based network. An LSTM consumes the features (viewed tiles, motion, saliency) of the past m frames F_{f-m}..F_f; for the n future steps, the previously predicted tiles are fed back in place of viewed tiles, and the network outputs the predicted viewing probabilities P_{f+1}..P_{f+n}.]
Ground Truth
• The tiles viewed by the viewers at each frame in the equirectangular mapping model
 Calculate the FoV on the sphere from the orientation
 Sample the points within the FoV and map them from the sphere to the equirectangular model (sketched after the figure below)
15
[Figure: the FoV on the sphere, parameterized by the angles α, β, and θ, mapped onto the equirectangular frame as a 0/1 tile mask; tiles overlapping the sampled points are marked 1 (viewed).]
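A hedged sketch of this procedure: sample directions inside the FoV around the viewer's orientation, map each onto the equirectangular frame, and mark the tiles the points land in. The geometry is deliberately simplified (yaw/pitch offsets only, no roll, and no exact spherical correction near the poles), and the FoV width, tile grid, and sample count are illustrative assumptions.

```python
import numpy as np

def viewed_tile_mask(yaw, pitch, fov_deg=100.0, rows=10, cols=20, samples=40):
    """Return a rows x cols 0/1 mask of tiles covered by the FoV (simplified)."""
    mask = np.zeros((rows, cols), dtype=np.uint8)
    half = np.radians(fov_deg) / 2.0
    for d_lon in np.linspace(-half, half, samples):   # sample points in the FoV
        for d_lat in np.linspace(-half, half, samples):
            lon = yaw + d_lon
            lat = np.clip(pitch + d_lat, -np.pi / 2, np.pi / 2)
            u = (lon / (2 * np.pi) + 0.5) % 1.0       # equirectangular mapping
            v = 0.5 - lat / np.pi
            mask[min(int(v * rows), rows - 1), int(u * cols) % cols] = 1
    return mask  # 1 = viewed tile (ground truth), 0 = not viewed
```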
Testbed
• HMD: Oculus Rift DK2
• Sensor Logger: OpenTrack[1]
• Frame Capturer: GamingAnywhere[2]
• 25 viewers and 10 360˚ videos
 12 viewers for training and the rest for testing
16
[1] https://github.com/opentrack/opentrack
[2] http://gaminganywhere.org
Network Training
• Implemented the proposed neural network architecture using Keras
• Within the training set, 80% of the traces are used for training and the rest for cross-validation
• Training parameters (grid loop sketched below)
 Number of neurons in {256, 512, 1024}
 Number of layers in {1, 2}
 Dropout in {True, False}
17
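The parameter grid above is small enough to sweep exhaustively. Below is a sketch of such a loop, where `build_model` stands in for a constructor along the lines of the earlier Keras sketch, and `x_train`/`y_train` are the prepared feature windows and tile labels; all of these names are assumptions.

```python
from itertools import product

best = None
for neurons, layers, dropout in product([256, 512, 1024], [1, 2], [True, False]):
    model = build_model(neurons, layers, dropout)  # assumed helper
    hist = model.fit(x_train, y_train, epochs=50,
                     validation_split=0.2)         # 80/20 split, as above
    val_loss = min(hist.history['val_loss'])
    if best is None or val_loss < best[0]:
        best = (val_loss, neurons, layers, dropout)
```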
Training Results
• Orientation-based network
  Neurons  Layers  Dropout | Rank. Loss  Accuracy  F-score (Training) | Rank. Loss  Accuracy  F-score (Testing)
  256      1       T       | 0.10        88.20%    0.67               | 0.15        85.72%    0.60
  512      1       T       | 0.09        89.25%    0.70               | 0.14        86.35%    0.62
  1024     1       T       | 0.09        89.28%    0.71               | 0.14        86.06%    0.62
• Tile-based network
  Neurons  Layers  Dropout | Rank. Loss  Accuracy  F-score (Training) | Rank. Loss  Accuracy  F-score (Testing)
  256      2       F       | 0.14        86.58%    0.57               | 0.20        83.94%    0.52
  512      2       F       | 0.13        86.91%    0.58               | 0.19        84.11%    0.52
  1024     2       F       | 0.12        87.29%    0.60               | 0.19        84.22%    0.53
18
Performance Metrics
• Missing ratio
 the fraction of missed tiles over all viewed tiles
• Bandwidth
 the bandwidth consumed to stream the predicted tiles
• Initial buffering time
 the minimum buffering time needed for smooth playout
• Video quality
 the video quality seen by the viewers
• Running time
 the time consumed to predict the viewed tiles
19
Simulation Setup
• Each viewer randomly selects a 360˚ video to watch (traces from the testing set)
• Each simulation lasts for 1 min and is repeated 8 times
 Bandwidth of 150 Mbps (for 13 viewers), latency of 2 secs, 4-sec segments, and a tile size of 192x192
• Baselines
 Current (Cur), Dead Reckoning (DR), and Saliency (Sal)
• Tune each solution to ensure a <10% average missing ratio
 𝜌: the threshold for rounding the predicted probability to a Boolean decision (Our); see the sketch below
 𝛿: the number of times to iteratively add new tiles at the edge of the predicted tiles (Cur and DR)
 𝜆: the percentile of the saliency value that decides whether a tile is transmitted (Sal)
20
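To make the role of 𝜌 concrete, here is a small sketch that rounds predicted probabilities into Boolean streaming decisions and computes the missing ratio defined on the metrics slide; the array names and shapes are illustrative.

```python
import numpy as np

def select_tiles(pred_probs, rho):
    """Stream a tile iff its predicted viewing probability is at least rho."""
    return pred_probs >= rho

def missing_ratio(streamed, viewed):
    """Fraction of actually viewed tiles that were not streamed."""
    viewed = viewed.astype(bool)
    missed = np.count_nonzero(viewed & ~streamed)
    return missed / max(np.count_nonzero(viewed), 1)

# Lowering rho streams more tiles, trading bandwidth for a lower missing ratio.
```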
Our Fixation Prediction Network Outperforms Other Solutions
• Results in comparable video quality (assessed with OpenVQ [1])
• Requires shorter initial buffering time: up to 43% reduction
21
[Figure: video quality and initial buffering time comparison across solutions; annotated value of 2.38 s.]
[1] K. Skarseth, H. Bjørlo, P. Halvorsen, M. Riegler, and C. Griwodz. 2016. OpenVQ: A Video Quality Assessment Toolkit. In Proc. of ACM International Conference on Multimedia (MM’16), OSSC paper. 1197–1200.
Overhead of Our Fixation Prediction Network
• Consumes less bandwidth: about 22-36% reduction in bandwidth consumption
• Runs in real-time: < 50 ms per prediction
22
[Figure: bandwidth consumption comparison across solutions; annotated value of 4 Mbps.]
Conclusion
• Fixation prediction for 360˚ video streaming to HMDs using neural networks
 leverages both sensor- and content-related features
• Dataset collection and trace-driven simulations
• Our fixation prediction network outperforms other solutions
• The prediction is performed in real-time (< 50 ms)
23
Future Work
• Larger-scale datasets and more extensive simulations
• Eye-tracking HMDs
• The negative impact of 360˚ video projection
• Bitrate allocation algorithms and foveated rendering
24
Our Dataset
• 50 subjects, each of whom watched 10 360˚ videos using an HMD
 sensor data: raw sensor data, viewer orientation, and viewed tiles
 content data: detected saliency maps and motion maps of each video
• W. Lo, C. Fan, J. Lee, C. Huang, K. Chen, and C. Hsu. 2017. 360˚ Video Viewing Dataset in Head-Mounted Virtual Reality. In Proc. of ACM International Conference on Multimedia Systems (MMSys’17), Dataset Track.
25
Q&A
ch.ling.fan@gmail.com
Editor's Notes
• #4: Above 15 Mbps
• #16: Directly predict n future frames; the viewed tiles in the future frames are not ground truth but the predicted tiles from the previous prediction.
• #17: The tiles that overlap with these points are considered the viewed tiles, which form the ground truth.