Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags
Tina Walber, Ansgar Scherp, Steffen Staab
University of Koblenz-Landau, Koblenz, Germany

Multimedia Modeling Conference
Klagenfurt, Austria
January 4-6, 2012
Motivation: Image Tagging
[Example image annotated with tags: tree, girl, car, store, people, sidewalk]

 Find specific objects in images
 Analyzing the user's gaze path only
Research Questions


1. Which fixation measure best finds the correct image region for a given tag?

2. Can we differentiate two regions in the same image?


3 Steps Conducted by Users




[Screenshots of the three experiment steps]
 Look at red blinking dot
 View the image and the provided tag
 Decide whether tag can be seen (“y” or “n”)
Dataset
 LabelMe community images
   Manually drawn polygons
   Regions annotated with tags
 182.657 images (August 2010)



 High-quality segmentation and annotation
 Used as ground truth

Experiment Images and Tags
 Randomly selected 51 images
 Contain at least two tagged regions

 Created two tag sets for the 51 images
 Each image is assigned two tags (one per set)

 Tags are either “true” or “false”
   “true” → object described by the tag can be seen
   “false” → object cannot be seen in the image
 False tags keep subjects concentrated during the experiment
Subjects & Experiment System
 20 subjects
   16 male, 4 female (age: 23-40, Ø=29.6)
   Undergraduates (6), PhD students (12), office clerks (2)


 Experiment system
    Simple web page in Internet Explorer
    Standard notebook, resolution 1680×1050
    Tobii X60 eye-tracker (60 Hz, 0.5° accuracy)

Conducting the Experiment
 Each user looked at 51 tag-image-pairs
 First tag-image pair excluded from the analysis

 94.3% correct answers
 Equal for true/false tags
 ~3s until decision (average)

 85% of users strongly agreed or agreed that
  they felt comfortable during the experiment
   The eye tracker had little influence on comfort
Pre-processing of Eye-tracking Data
 Obtained 547 gaze paths from 20 users where
   Users gave correct answers
   Image has a “true” tag assigned
 Fixation extraction
   Tobii Studio's velocity & distance thresholds
   Fixation: focus on a particular point on the screen

 Requirement: at least one fixation inside or near the correct region
 476 (87%) gaze paths fulfill this requirement

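As an illustration of the fixation-extraction step, here is a minimal velocity-threshold filter in Python. Tobii Studio's actual filter and its thresholds are not spelled out on the slide, so the parameter values and the (x, y) sample layout below are assumptions.

```python
# Minimal sketch of velocity-threshold fixation extraction.
# Assumptions: gaze samples arrive at 60 Hz as (x, y) screen pixels;
# the velocity and minimum-duration thresholds are illustrative only.

def extract_fixations(samples, max_velocity_px=35.0, min_samples=5):
    """Group consecutive low-velocity samples into fixations and
    return a list of (centroid_x, centroid_y, n_samples)."""
    fixations, current = [], []

    def close_group():
        if len(current) >= min_samples:
            cx = sum(x for x, _ in current) / len(current)
            cy = sum(y for _, y in current) / len(current)
            fixations.append((cx, cy, len(current)))
        current.clear()

    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        # Pixel distance between successive samples approximates velocity.
        if ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 < max_velocity_px:
            current.append((x1, y1))
        else:
            close_group()
    close_group()
    return fixations
```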
Analysis of Gaze Fixations (1)
 Applied 13 fixation measures to the 476 paths
  (2 new, 7 standard Tobii, 4 from the literature)

 Fixation measure: a function computed on users' gaze paths
 Calculated for each image region, over all users viewing the same tag-image pair




Considered Fixation Measures
Nr  Name                     Favorite region r                               Origin
1   firstFixation            No. of fixations before 1st on r                Tobii
2   secondFixation           No. of fixations before 2nd on r                [13]
3   fixationsAfter           No. of fixations after last on r                [4]
4   fixationsBeforeDecision  fixationsAfter, but before decision             New
5   fixationsAfterDecision   fixationsBeforeDecision and after               New
6   fixationDuration         Total duration of all fixations on r            Tobii
7   firstFixationDuration    Duration of first fixation on r                 Tobii
8   lastFixationDuration     Duration of last fixation on r                  [11]
9   fixationCount            Number of fixations on r                        Tobii
10  maxVisitDuration         Max time from first fixation until outside r    Tobii
11  meanVisitDuration        Mean time from first fixation until outside r   Tobii
12  visitCount               No. of fixations until outside r                Tobii
13  saccLength               Saccade length before fixation on r             [6]
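To make the table concrete, here is a sketch of three of the measures in Python. Fixations are assumed to be time-ordered (x, y, duration_ms) tuples and `inside` any point-in-region predicate; these names and shapes are assumptions, not the paper's code.

```python
# Sketch of measures 9 (fixationCount), 6 (fixationDuration), and
# 11 (meanVisitDuration) from the table. inside(x, y) tests whether a
# point lies in region r; fixations are time-ordered (x, y, duration_ms).

def fixation_count(fixations, inside):
    return sum(1 for x, y, _ in fixations if inside(x, y))

def fixation_duration(fixations, inside):
    return sum(d for x, y, d in fixations if inside(x, y))

def mean_visit_duration(fixations, inside):
    # A visit lasts from the first fixation inside r until the gaze
    # leaves r; average the accumulated duration over all visits.
    visits, current = [], 0.0
    for x, y, d in fixations:
        if inside(x, y):
            current += d
        elif current:
            visits.append(current)
            current = 0.0
    if current:
        visits.append(current)
    return sum(visits) / len(visits) if visits else 0.0
```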
Analysis of Gaze Fixations (2)




 For every image region (b) the fixation
  measure is calculated over all gaze paths (c)
 Results are summed up per region
 Regions ordered according to fixation measure
 If favorite region (d) and tag (a) match, result is
  true positive (tp), otherwise false positive (fp)
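A sketch of this evaluation loop, under the assumption that regions, the ground-truth tag, and a measure function are given as below. Note that for rank-style measures such as firstFixation the smallest value wins, so the ordering direction must match the measure.

```python
# Sketch of slide 12's procedure: sum one fixation measure over all gaze
# paths per region, pick the favorite region, and score tp/fp against the
# ground-truth tag. All data shapes here are assumptions for illustration.

def evaluate_pair(regions, true_tag, gaze_paths, measure, lower_is_better=False):
    scores = {tag: sum(measure(path, region) for path in gaze_paths)
              for tag, region in regions.items()}
    pick = min if lower_is_better else max
    favorite = pick(scores, key=scores.get)
    return favorite == true_tag  # True counts as tp, False as fp
```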
Precision per Fixation Measure

[Bar chart: precision P per fixation measure, computed over the sum of tp and fp assignments; labels call out meanVisitDuration, fixationsBeforeDecision, lastFixationDuration, and fixationDuration]
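For reference, the precision P on the chart's y-axis is simply the fraction of tag-image pairs whose favorite region matched the tag:

```latex
P = \frac{tp}{tp + fp}
```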
Adding Boundaries and Weights
 Take eye-tracker inaccuracies into account
 Extension of region boundaries by 13 pixels




 Larger regions are more likely to be fixated
 Up-weight regions covering < 5% of the image size
 meanVisitDuration increases to P = 0.67
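A sketch of both corrections, using shapely for the polygon geometry. The 13-pixel margin is from the slide; the weighting scheme for small regions is not specified there, so the boost factor below is a made-up placeholder.

```python
# Sketch: dilate region polygons by 13 px to absorb eye-tracker
# inaccuracy, and up-weight small regions. The boost factor is an
# assumption; the slide only states that regions < 5% of the image
# size receive extra weight.

from shapely.geometry import Point, Polygon

def extend_region(coords, margin_px=13.0):
    return Polygon(coords).buffer(margin_px)

def weighted_score(score, region_area, image_area, boost=1.5):
    return score * boost if region_area / image_area < 0.05 else score

# A fixation now counts for a region if it lands in the extended polygon:
region = extend_region([(10, 10), (60, 10), (60, 40), (10, 40)])
print(region.contains(Point(70, 25)))  # True: within the 13 px margin
```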
Examples: Tag-Region-Assignments

[Example images with gaze-based tag-region assignments]
Comparison with Baselines




 Naïve baseline: the largest region r is the favorite
 Random baseline: randomly select the favorite r

 Gaze / Gaze* (with extended boundaries and weights) significantly better (χ², α < 0.001)

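The significance test can be reproduced as a χ² test on the tp/fp counts of two methods; the counts in this sketch are placeholders, not the paper's numbers.

```python
# Sketch of the chi-square comparison between gaze-based assignments and
# a baseline. The tp/fp counts below are invented for illustration.

from scipy.stats import chi2_contingency

observed = [[320, 156],   # gaze-based: [tp, fp] (placeholder)
            [150, 326]]   # baseline:   [tp, fp] (placeholder)

chi2, p, dof, _ = chi2_contingency(observed)
print(f"chi2 = {chi2:.1f}, p = {p:.2g}")  # significant if p < alpha
```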
Effect of Gaze Path Aggregation
[Plot: precision P vs. number of aggregated gaze paths]

 Precision P for Gaze* when aggregating gaze paths from multiple users

 Even a single user is significantly better than the baselines (χ²: naïve at α < 0.001, random at α < 0.002)
Research Questions


1. Which fixation measure best finds the correct image region for a given tag?
    meanVisitDuration with a precision of 67%

2. Can we differentiate two regions in the same image?


Differentiate Two Objects
 Use the second tag set to identify two different objects in the same image
 16 images (of our 51) have two “true” tags
 For 6 of these images, both regions were correctly identified
   Accuracy of 38% (6 of 16)

 Average precision for a single object is 67%
  Expected correct assignment of both tags: 44% (sanity check below)


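As a sanity check on the 44% figure: if the two per-object decisions were independent (an assumption, since both come from the same gaze path) and the single-object precision of 67% is read as roughly 2/3, the expected joint accuracy is

```latex
P_{\text{both}} = P_{\text{single}}^{2} \approx \left(\tfrac{2}{3}\right)^{2} = \tfrac{4}{9} \approx 0.44
```

The observed 38% falls somewhat below this estimate.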
Correctly Differentiated Objects

[Example images where both tagged regions were correctly identified]
Research Questions


1. Which fixation measure best finds the correct image region for a given tag?
    meanVisitDuration with a precision of 67%

2. Can we differentiate two regions in the same image?
    Accuracy of 38%
Acknowledgement: This research was partially supported by the EU projects Petamedia (FP7-216444) and SocialSensor (FP7-287975).
Influence of Red Dot




 First 5 fixations, over all subjects and all images
Experiment Data Cleaning
 Manually replaced images with
a) Tags that are incomprehensible, require expert knowledge, or are nonsense
b) Tags referring to multiple regions of which not all are drawn in the image (e.g., bicycle)
c) Occluded objects (e.g., a bicycle behind a car)
d) “False” tags that actually refer to a visible part of the image and thus were “true” tags


