Shuffle and Learn: Unsupervised Learning
using Temporal Order Verification
Slides by Xunyu Lin
ReadCV, UPC
20th February, 2017
Ishan Misra, C. Lawrence Zitnick, Martial Hebert
[arxiv] (26 July 2016) [code] [demo]
Index
1. Introduction
2. Unsupervised Representations Learning
3. Video Representations Learning
4. Temporal Order Verification
5. In Practice
6. Evaluations
7. Conclusions
Introduction
What is Unsupervised Learning?
● Unsupervised Learning is the machine learning task of inferring a function
to describe hidden structure from unlabeled data.
● A central question in Unsupervised Learning is how to cluster the data.
Introduction
Why Unsupervised Learning?
“Most of human and animal learning is unsupervised learning. If
intelligence was a cake, unsupervised learning would be the cake,
supervised learning would be the icing on the cake, and
reinforcement learning would be the cherry on the cake. We know
how to make the icing and the cherry, but we don’t know how to make
the cake.”
—— Yann LeCun
Introduction
Why Unsupervised Learning?
● It reflects how intelligent beings naturally perceive
the world.
● It can save a great deal of effort in building a
human-like intelligent agent, compared with a
fully supervised approach.
● It may be the next breakthrough on the way to true AI!
Unsupervised Representations Learning
Popular Unsupervised Representations Learning frameworks
Auto-Encoder
Unsupervised Representations Learning
Popular Unsupervised Representations Learning frameworks
Variational Auto-Encoder (VAE) [Tutorial]
Unsupervised Representations Learning
Popular Unsupervised Representations Learning frameworks
GAN
Video Representations Learning
● Humans perceive the world by observing the continuous changes of
daily life, which can be regarded as video.
● Unsupervised video representation learning therefore plays an
essential role in building a human-like intelligent agent.
Video Representations Learning
Related Works
Video Prediction with LSTMs
Video Representations Learning
Related Works
Spatiotemporally Coherent Reconstruction
Temporal Order Verification
The internal temporal order of videos
Temporal Order Verification
Use temporal order as the supervisory signal for learning
Shuffled sequences → Binary classification: in order / not in order
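The idea on this slide can be sketched as a labelling function: given a tuple of frames, the target is 1 if the frames are in temporal order and 0 otherwise, so the label comes for free from the video itself. This is a minimal sketch, not the authors' code. Frame indices stand in for actual frames, the names `is_in_order` and `make_example` are hypothetical, and the generic shuffle used here is an assumption that differs from the specific tuple sampling described in the In Practice section.

```python
import random


def is_in_order(indices):
    """Return True if the frame indices are strictly increasing in time."""
    return all(a < b for a, b in zip(indices, indices[1:]))


def make_example(indices, shuffle, rng):
    """Return (frame_tuple, label) with label 1 = in order, 0 = not in order.

    Hypothetical helper: a real pipeline would return image tensors, not indices.
    """
    frames = list(indices)
    if shuffle:
        if len(set(frames)) < 2:
            raise ValueError("need at least two distinct frames to shuffle")
        # Reshuffle until the order actually changes, so the label is unambiguous.
        while is_in_order(frames):
            rng.shuffle(frames)
    return tuple(frames), int(is_in_order(frames))


rng = random.Random(0)
print(make_example([3, 7, 11], shuffle=False, rng=rng))  # label is 1
print(make_example([3, 7, 11], shuffle=True, rng=rng))   # label is 0
```

A network trained on such pairs only ever sees the binary label, which is why no human annotation is needed.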
In Practice
How to sample the tuple of frames?
1. The number of frames for each tuple
- 2 frames: can be ambiguous (is the cup being picked up or put down?)
- 3 frames: works well in practice, but still cannot resolve cyclical motions
- ...
In Practice
How to sample the tuple of frames?
Original video: a b c d e
Positive: (b, c, d)
Negative: (b, a, d), (b, e, d)
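The tuple construction on this slide can be written down directly. The sketch below assumes five frame indices a < b < c < d < e and returns the positive tuple (b, c, d) together with the two negatives obtained by replacing the middle frame with a or e. The function name `tuples_from_window` is hypothetical, and indices stand in for frames.

```python
def tuples_from_window(a, b, c, d, e):
    """Build one positive and two negative tuples from frames a < b < c < d < e.

    Returns a list of (frame_tuple, label) pairs, label 1 = in order.
    """
    if not (a < b < c < d < e):
        raise ValueError("expected a < b < c < d < e")
    positive = (b, c, d)                 # middle frame lies between b and d
    negatives = [(b, a, d), (b, e, d)]   # middle frame taken from outside [b, d]
    return [(positive, 1)] + [(neg, 0) for neg in negatives]


for frames, label in tuples_from_window(10, 20, 30, 40, 50):
    print(frames, label)
```

Note that the negatives keep the same first and last frames as the positive, so the classifier has to look at the middle frame to decide.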
In Practice
How to sample the tuple of frames?
2. Ambiguity in frames with small motion
- When there is little motion, the temporal order of the frames cannot be distinguished.
- Sample only from windows with high motion (smart sampling).
- Use coarse frame-level optical flow as a proxy for the amount of motion
between frames.
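One way to bias sampling towards high-motion windows is sketched below. It assumes a per-frame motion score has already been computed (for example the mean optical-flow magnitude between consecutive frames) and draws window start positions with probability proportional to the total motion in the window. The names `window_motion` and `sample_window_starts`, and the toy scores, are assumptions for illustration only.

```python
import random


def window_motion(motion, start, length):
    """Total motion score inside the window [start, start + length)."""
    return sum(motion[start:start + length])


def sample_window_starts(motion, length, k, rng):
    """Draw k window starts with probability proportional to window motion."""
    starts = list(range(len(motion) - length + 1))
    weights = [window_motion(motion, s, length) for s in starts]
    if not starts or sum(weights) <= 0:
        raise ValueError("no window with measurable motion")
    # Windows with zero motion get zero weight and are never selected.
    return rng.choices(starts, weights=weights, k=k)


# Toy per-frame motion scores (assumed to be precomputed from optical flow).
motion = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 0.0, 0.0]
print(sample_window_starts(motion, length=3, k=5, rng=random.Random(0)))
```

Static windows are never drawn here because their weight is zero. A softer variant could add a small constant to every weight.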
In Practice
How to sample the tuple of frames?
3. The distance of frames in positive tuples (difficulty of the task)
- Too close: the motion is small and ambiguous, or the task becomes too easy.
- Too far: the sampled frames are only weakly related, which makes the
learning task too difficult.
In Practice
How to sample the tuple of frames?
3. The distance of frames in positive tuples (difficulty of the task)
- Two metrics control the difficulty of the positive and negative samples.
Positive: (b, c, d)
Negative: (b, a, d), (b, e, d)
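The transcript does not spell out the two metrics, so the sketch below makes an assumption: the first is the span of the positive tuple, |d - b|, and the second is the smallest gap between a substituted negative frame and the positive span, min(|b - a|, |e - d|). The function names and the threshold values (60 and 15 frames) are illustrative, not taken from the slides.

```python
def positive_span(b, d):
    """Temporal distance covered by the positive tuple (b, c, d)."""
    return abs(d - b)


def negative_gap(a, b, d, e):
    """Smallest distance between an outside frame (a or e) and the span [b, d]."""
    return min(abs(b - a), abs(e - d))


def window_is_acceptable(a, b, c, d, e, max_span, min_gap):
    """Accept a window if positives are not too far apart and negatives not too close."""
    if not (a < b < c < d < e):
        return False
    return positive_span(b, d) <= max_span and negative_gap(a, b, d, e) >= min_gap


# Illustrative thresholds (assumed values).
print(window_is_acceptable(0, 20, 40, 60, 80, max_span=60, min_gap=15))    # True
print(window_is_acceptable(0, 20, 40, 100, 120, max_span=60, min_gap=15))  # False
```

Raising `max_span` makes positives harder, while lowering `min_gap` makes negatives harder because the substituted frame looks more like a plausible middle frame.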
In Practice
How to sample the tuple of frames?
4. Ratio of negative and positive samples
Evaluation
Action Recognition on UCF-101 & HMDB-51
- Comparison to random initialization & transfer learning
- ImageNet pre-training followed by fine-tuning on UCF-101 gives an accuracy of 67.1%.
- ImageNet pre-training followed by fine-tuning on HMDB-51 gives an accuracy of 28.5%.
- Gain over random initialization: +11.6% (UCF-101), +4.8% (HMDB-51)
* UCF-101 is about twice as large as HMDB-51
Evaluation
Action Recognition on UCF-101 & HMDB-51
- Comparison to other unsupervised frameworks
- Two Close: classifies whether two frames are temporally close.
- Two Order: temporal order verification with only 2 frames.
- DrLim: enforces temporal coherency with an L2 loss.
- TempCoh: enforces temporal coherency with an L1 loss.
- Obj. Patch: roughly imitates the human instinct to fixate on objects. [Paper link]
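The difference between the two temporal-coherency baselines can be illustrated with a simplified contrastive loss in which only the distance function changes. This is a sketch under assumptions, not either paper's exact formulation (published versions differ in details such as squared terms). Features of temporally close frames are pulled together and features of distant frames are pushed at least `margin` apart. All names are hypothetical.

```python
import math


def l1_distance(u, v):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(u, v))


def l2_distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def coherence_loss(u, v, close, dist, margin=1.0):
    """Simplified temporal-coherency loss for one pair of frame features.

    close=True : frames are temporally adjacent, penalise their distance.
    close=False: frames are far apart, penalise them only if closer than margin.
    """
    d = dist(u, v)
    return d if close else max(0.0, margin - d)


f1, f2 = [0.0, 0.0], [3.0, 4.0]
print(coherence_loss(f1, f2, close=True, dist=l2_distance))   # 5.0
print(coherence_loss(f1, f2, close=True, dist=l1_distance))   # 7.0
print(coherence_loss(f1, f2, close=False, dist=l2_distance))  # 0.0
```

Unlike order verification, these objectives only say that nearby frames should look alike. They carry no information about which frame came first.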
Evaluation
Nearest Neighbor retrieval
Evaluation
Visualizing pool5 Unit Responses
Evaluation
Pose Estimation on FLIC & MPII
Conclusions
● Temporal order verification exploits a network's ability to capture the
sequential logic of videos.
● Future work should capture longer-range temporal logic.
For now the method only uses single frames sampled within a window of
roughly 60 frames. Architectures such as RNNs could be used to extend the
temporal range.
● Its only drawbacks are the weak constraint and the tedious sampling
techniques.
● A more general constraint with a simpler procedure? → My research line




