Learning visual representations
  for unfamiliar environments

      Kate Saenko, Brian Kulis,
           Trevor Darrell


       UC Berkeley EECS & ICSI
The challenge of large-scale visual interaction




   The last decade has shown the superiority of models
   learned from data over hand-engineered structures!
Large-scale learning
• “Unsupervised”: Learn models from “found data”;
  often exploit multiple modalities (text+image)

                             … The Tote is the perfect example of
                             two handbag design principles that ...
                             The lines of this tote are incredibly
                             sleek, but ... The semi buckles that
                             form the handle attachments are ...
E.g., finding visual senses

  Artifact sense: “telephone”          DICTIONARY

                                1: (n)
                                telephone, phone, telephone
                                set (electronic
                                equipment that converts
                                sound into electrical
                                signals that can be
                                transmitted over distances
                                and then converts received
                                signals back into sounds)

                                2: (n)
                                telephone, telephony
                                (transmitting speech at a
                                distance)




                                   [Saenko and Darrell ’09]
Large-scale Learning
• “Unsupervised”: Learn models from “found data”;
  often exploit multiple modalities (text+image)





• Supervised: Crowdsource labels (e.g., ImageNet)
Yet…
• Even the best collection of images from the web and
  strong machine learning methods can often yield poor
  classifiers on in-situ data!



                                                  ?
• Supervised learning assumption: training distribution
  == test distribution
• Unsupervised learning assumption: joint distribution is
  stationary between the online world and the real world

                 Almost never true!
“What You Saw Is Not What You Get”


                           SVM:20%
                           NBNN:19%
SVM:54%
NBNN:61%




  The models fail due to domain shift
Examples of visual domain shifts




  digital SLR      webcam         Close-up   Far-away




 amazon.com                         FLICKR    CCTV
                Consumer images
Examples of domain shift:
change in camera, feature type, dimension
  Camera: digital SLR vs. webcam
  Feature type: SURF vs. SIFT
  Dimension: VQ to 300 vs. VQ to 1000 (different dimensions)
Solutions?

• Do nothing (poor performance)
• Collect all types of data (impossible)
• Find out what changed (impractical)
• Learn what changed
Prior Work on Domain Adaptation

• Pre-process the data [Daumé ’07]: replicate
  features to create general, source-specific, and
  target-specific versions; re-train learner on new features


• SVM-based methods [Yang’07], [Jiang’08],
  [Duan’09], [Duan’10] : adapt SVM parameters


• Kernel mean matching [Gretton’09] : re-weight
  training data to match test data distribution
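
As a rough illustration of the feature-replication idea in the first bullet, the sketch below builds augmented vectors with a shared copy plus a source-only or target-only copy. The function name `augment` and the toy vectors are assumptions made for illustration; they are not from the talk.

```python
def augment(x, domain):
    """Return [shared, source-specific, target-specific] copies of x.

    `domain` is "source" or "target"; the block belonging to the
    other domain is filled with zeros.
    """
    zeros = [0.0] * len(x)
    if domain == "source":
        return list(x) + list(x) + zeros
    if domain == "target":
        return list(x) + zeros + list(x)
    raise ValueError("domain must be 'source' or 'target'")


# Toy 2-D features (hypothetical values).
source_vec = augment([1.0, 2.0], "source")
target_vec = augment([3.0, 4.0], "target")
```

Note that this construction needs both domains to share one feature space, which is one reason it cannot handle new feature types in the target domain.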
Our paradigm: Transform-based
Domain Adaptation
                                        Example: “green” and “blue” domains
Previous methods’ drawbacks
• cannot transfer learned shift
  to new categories
• cannot handle new features
We can do both by learning
 domain transformations*


 * Saenko, Kulis, Fritz, and Darrell.
 Adapting visual category models to
 new domains. ECCV, 2010
Limitations of symmetric transforms
                            Symmetric assumption fails!

Saenko et al. (ECCV 2010) used
  metric learning:
• symmetric transforms
• same features
How do we learn more
 general shifts?
Latest approach*: asymmetric transforms
                                         Asymmetric transform (rotation)

• Metric learning model no
  longer applicable
• We propose to learn
  asymmetric transforms
  – Map from target to source
  – Handle different dimensions


 *Kulis, Saenko, and Darrell, What You
 Saw is Not What You Get: Domain
 Adaptation Using Asymmetric Kernel
 Transforms, CVPR 2011
Model Details






• Learn a linear transformation to map points
 from one domain to another
  – Call this transformation W
  – Matrices of source and target:
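
The matrix definitions on this slide were lost in extraction. One notation consistent with the bullets, stated here as an assumption rather than as the slide's own symbols, stacks the n_A source points and n_B target points as columns:

```latex
X = [x_1, \dots, x_{n_A}] \in \mathbb{R}^{d_A \times n_A}, \qquad
Y = [y_1, \dots, y_{n_B}] \in \mathbb{R}^{d_B \times n_B}, \qquad
W \in \mathbb{R}^{d_A \times d_B}
```

With these shapes, W y maps a target point into the source space, and d_A need not equal d_B.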
Loss Functions


Choose a point x from the source and a point y from the
target, and consider their inner product under the
transformation W.


It should be “large” for similar objects and “small” for
dissimilar objects.
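
A minimal sketch of this cross-domain inner product, assuming it takes the form x^T W y with W having one row per source dimension and one column per target dimension. The function name and all numeric values are illustrative assumptions.

```python
def similarity(x, W, y):
    """Compute x^T W y for a source point x and a target point y.

    W has len(x) rows and len(y) columns, so the two domains may
    have different dimensionalities.
    """
    # Map the target point into the source space: Wy has len(x) entries.
    Wy = [sum(w_ij * y_j for w_ij, y_j in zip(row, y)) for row in W]
    # Ordinary inner product in the source space.
    return sum(x_i * v_i for x_i, v_i in zip(x, Wy))


x = [1.0, 0.0]                 # source point, 2-D (toy values)
y = [0.0, 2.0, 1.0]            # target point, 3-D (toy values)
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0]]          # 2 x 3 transform (toy values)
score = similarity(x, W, y)
```

Because W is rectangular here, the same function covers the case where source and target use different feature types.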
Loss Functions

• The input to the problem includes a collection of m
 loss functions


• General assumption: the loss functions depend
 on the data only through the inner product matrix
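
One possible instance of such a collection, sketched under assumptions: squared hinge penalties on each cross-domain similarity, with a lower bound for same-class pairs and an upper bound for different-class pairs. The thresholds `lower` and `upper`, the function names, and the toy matrix are hypothetical; the slide does not specify the losses.

```python
def pair_loss(sim, same_class, lower=1.0, upper=0.0):
    """Squared hinge loss on one cross-domain similarity value.

    Same-class pairs are penalized when sim falls below `lower`;
    different-class pairs are penalized when sim rises above `upper`.
    """
    if same_class:
        return max(0.0, lower - sim) ** 2
    return max(0.0, sim - upper) ** 2


def total_loss(S, labels_src, labels_tgt):
    """Sum pair losses over every entry of the similarity matrix S,
    where S[i][j] is the similarity of source i and target j."""
    return sum(
        pair_loss(S[i][j], labels_src[i] == labels_tgt[j])
        for i in range(len(labels_src))
        for j in range(len(labels_tgt))
    )


# Toy similarity matrix and labels (hypothetical values).
S = [[2.0, 0.5],
     [-1.0, 0.25]]
loss = total_loss(S, ["mug", "phone"], ["mug", "phone"])
```

The loss only ever reads entries of S, which matches the assumption that the data enter through the inner product matrix alone.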
Regularized Objective Function

• Minimize a linear combination of the sum of the loss
 functions and a regularizer:


• We use the squared Frobenius norm as the
 regularizer
  – Not restricted to this choice
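
The formula on this slide did not survive extraction. A form consistent with the bullets, written in assumed notation (X and Y hold the source and target points as columns, c_i are the m loss functions, and λ is a trade-off weight), would be:

```latex
\min_{W} \; \tfrac{1}{2}\,\lVert W \rVert_F^2 \;+\; \lambda \sum_{i=1}^{m} c_i\!\left(X^{\top} W Y\right)
```

The factor 1/2 is a convenience for differentiation and is part of the assumption, not something the slide text states.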
The Model Has Drawbacks

• A linear transformation may be insufficient
• Cost of optimization grows as the product of
 the dimensionalities of the source and target
 data


• What to do?
Kernelization

• Main idea: run in kernel space
  – Use a non-linear kernel function (e.g., RBF kernel)
    to learn non-linear transformations in input space
  – Resulting optimization is independent of input
    dimensionality
  – Additional assumption necessary: regularizer is a
    spectral function
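
To make the dimensionality point concrete, here is a small sketch of an RBF kernel matrix. Its size is set by the number of points, not by the length of each feature vector. The `gamma` value and the toy points are illustrative assumptions.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel exp(-gamma * ||a - b||^2) between two points."""
    sq_dist = sum((a_i - b_i) ** 2 for a_i, b_i in zip(a, b))
    return math.exp(-gamma * sq_dist)


def kernel_matrix(points, gamma=1.0):
    """n x n kernel matrix; its size depends on the number of points,
    not on their dimensionality."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


# Three 4-D points (toy values) give a 3 x 3 kernel matrix.
source_points = [[0.0, 0.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0, 0.0],
                 [0.0, 2.0, 0.0, 0.0]]
K = kernel_matrix(source_points, gamma=0.5)
```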
Kernelization
• Kernel matrices for source and target
• Original transformation learning problem
• New kernel problem
• Relationship between original and new problems
  at optimality
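
The equations for this slide were images and are missing here. As a sketch only, for the squared Frobenius regularizer and assuming invertible kernel matrices, substituting the form of W below shows how the problem can be rewritten using kernels alone. The notation is assumed and mirrors the kernelized form in the cited CVPR 2011 paper, so it should be checked against that paper:

```latex
\begin{aligned}
W &= X K_X^{-1/2} L \, K_Y^{-1/2} Y^{\top}, \qquad K_X = X^{\top} X,\quad K_Y = Y^{\top} Y \\
X^{\top} W Y &= K_X K_X^{-1/2} L \, K_Y^{-1/2} K_Y = K_X^{1/2} L \, K_Y^{1/2} \\
\lVert W \rVert_F^2 &= \operatorname{tr}\!\left(W^{\top} W\right) = \operatorname{tr}\!\left(L^{\top} L\right) = \lVert L \rVert_F^2 \\
\min_{L} \;& \tfrac{1}{2}\lVert L \rVert_F^2 + \lambda \sum_{i=1}^{m} c_i\!\left(K_X^{1/2} L \, K_Y^{1/2}\right)
\end{aligned}
```

The unknown L has one row per source point and one column per target point, which is why the resulting optimization no longer depends on the input dimensionality.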
Summary of approach



  1. Multi-domain data (input space)
  2. Generate constraints, learn W (input space)
  3. Map test point via W
  4. Apply to new categories
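
Steps 3 and 4 can be sketched as a nearest-neighbor lookup: a target test point is scored against labeled source points under W and takes the label of the best match. This is one simple choice made for illustration, assuming the similarity x^T W y; the function names, toy W, and labels are hypothetical.

```python
def similarity(x, W, y):
    """x^T W y for source point x and target point y."""
    return sum(
        x_i * sum(w_ij * y_j for w_ij, y_j in zip(row, y))
        for x_i, row in zip(x, W)
    )


def classify(y, W, source_points, source_labels):
    """Label a target test point with the label of the most similar
    labeled source point under the transform W."""
    scores = [similarity(x, W, y) for x in source_points]
    best = max(range(len(scores)), key=scores.__getitem__)
    return source_labels[best]


# Toy data: W is assumed to have been learned already.
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0]]
source_points = [[1.0, 0.0], [0.0, 1.0]]
source_labels = ["mug", "phone"]
predicted = classify([0.1, 0.9, 0.3], W, source_points, source_labels)
```

Since W is learned from pair constraints rather than per-class models, the same lookup can be applied to categories that only have labeled examples in the source domain.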
Multi-domain dataset
Experimental Setup

• Used a standard bag-of-words model
• Also used different features in the target domain
   – SURF vs SIFT
   – Different visual word dictionaries


• Baseline for comparing such data: KCCA
Novel-class experiments

  Plot legend: Our Method (linear), Our Method




• Test method’s ability to transfer domain shift to unseen
  classes
• Train transform on half of the classes, test on the other half
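
The protocol in the second bullet can be sketched as a split over class names, so that the transform is learned from one half and evaluated on classes it never saw. The helper names, class names, and samples below are hypothetical.

```python
def split_classes(classes):
    """Split class names into two disjoint halves: one for learning
    the transform, one held out for testing."""
    ordered = sorted(classes)
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


def samples_for(classes, samples):
    """Keep only (features, label) samples whose label is in `classes`."""
    keep = set(classes)
    return [s for s in samples if s[1] in keep]


# Toy class list and samples (hypothetical values).
all_classes = ["bike", "keyboard", "mug", "phone"]
train_classes, test_classes = split_classes(all_classes)
samples = [([0.1], "mug"), ([0.2], "bike"), ([0.3], "phone")]
train_samples = samples_for(train_classes, samples)
test_samples = samples_for(test_classes, samples)
```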
Extreme shift example


 Query from target   Nearest neighbors in source using KCCA+KNN




                     Nearest neighbors in source using transformation
Conclusion
• We should not rely on hand-engineered features any
  more than we rely on hand-engineered models!


• Learn feature transformation across domains

• Developed a domain adaptation method based on
  regularized non-linear transforms
  – Asymmetric transform achieves best results on more
    extreme shifts
  – See Saenko et al., ECCV 2010 and Kulis et al., CVPR 2011;
    journal version forthcoming



Editor's Notes

  • #14: Introduce transformations; show why symmetric transformation isn’t enough
  • #16: End of Kate’s part
  • #23: Why is this nontrivial?
  • #28: Todo: make into bar plot