Color naming 65,274,705,768 pixels

Nathan Moroney and Giordano Beretta

HP Labs




Electronic Imaging 2013: Color Imaging XVIII
Outline
 Motivation
       More (pixel) data
 Finding and processing 65 billion pixels
       Hint: Wikipedia & a dual-core OpenMP color namer
 What did you learn?
       The most frequent non-achromatic color term is…
 What’s next?
       Other than a trillion pixels




Motivation
 Previous work in crowd-sourcing color training data
  and experimental efforts
 Related work in the area of big (image) data
       A. Torralba, R. Fergus, W. T. Freeman, "80 million tiny images: a
        large dataset for non-parametric object and scene recognition",
        IEEE Transactions on Pattern Analysis and Machine Intelligence,
        vol.30(11), pp. 1958-1970, 2008.
       Ben Shneiderman, "Extreme Visualization: Squeezing a Billion
        Records into a Million Pixels", SIGMOD Conference, pp. 3-12,
        (2008).
       Steven Seitz, “A Trillion Photos”, EI’13 Keynote (2013).



Motivation




[Figure: plot with horizontal axis "Log Number of Images", ticks 0 to 6]




Source Data
 ImageCLEF 2010 snapshot
       Adrian Popescu, Theodora Tsikrika and Jana Kludas, "Overview
        of the Wikipedia retrieval task at ImageCLEF 2010", In the
        Working Notes for the CLEF 2010 Workshop, 20-23 September,
        Padova, Italy, 2010.
       250,000 images plus associated Wikipedia data
       20 gigabytes
       65,000,000,000 pixels uncompressed
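
The figures on this slide can be cross-checked with a little arithmetic. The sketch below uses the exact pixel count from the talk title together with the approximate image count and archive size listed above. The 3 bytes per pixel (8-bit RGB) and the decimal reading of "gigabytes" are assumptions, not numbers from the talk.

```python
# Back-of-envelope check of the source data figures.
TOTAL_PIXELS = 65_274_705_768      # exact count from the talk title
IMAGE_COUNT = 250_000              # approximate count from the slide
COMPRESSED_BYTES = 20 * 10**9      # "20 gigabytes", assumed to be decimal GB
BYTES_PER_PIXEL = 3                # assumption: 8-bit R, G and B per pixel


def dataset_stats(total_pixels=TOTAL_PIXELS, image_count=IMAGE_COUNT,
                  compressed_bytes=COMPRESSED_BYTES,
                  bytes_per_pixel=BYTES_PER_PIXEL):
    """Return mean image size, decoded size and an implied compression ratio."""
    uncompressed = total_pixels * bytes_per_pixel
    return {
        "mean_pixels_per_image": total_pixels / image_count,
        "uncompressed_bytes": uncompressed,
        "compression_ratio": uncompressed / compressed_bytes,
    }


stats = dataset_stats()
print(f"{stats['mean_pixels_per_image']:,.0f} pixels per image on average")
print(f"{stats['uncompressed_bytes'] / 10**9:.1f} GB decoded")
print(f"{stats['compression_ratio']:.1f}:1 implied compression ratio")
```

Under these assumptions the average image is roughly a quarter of a megapixel, and the decoded data is just under 200 GB, about ten times the size of the compressed archive.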




Source Data: At 200 PPI




Processing
 Basic script on a single dual-core machine (OpenMP
  threaded) to process all image files
 Simple stuff like getting image dimensions can be
  done over lunch
 Uncompressing all the JPEG files to memory can
  take hours
 Goal was a color naming algorithm that could be run
  in less than a day
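
The shape of such a batch script can be sketched as follows. This is not the authors' script: a Python thread pool stands in for the OpenMP threading mentioned above, and `process_all`, `file_size` and the `*.jpg` pattern are hypothetical names chosen for illustration. The per-file task here only reads the file size, as a cheap placeholder for work such as reading image dimensions.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def process_all(root, per_file, pattern="*.jpg", workers=2):
    """Apply per_file to every matching file under root with a small thread pool."""
    paths = sorted(Path(root).rglob(pattern))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input paths.
        results = list(pool.map(per_file, paths))
    return dict(zip(paths, results))


def file_size(path):
    """Cheap per-file task standing in for 'get the image dimensions'."""
    return path.stat().st_size
```

A call such as `process_all("images", file_size)` returns a dictionary mapping each path to its result. Two workers match the dual-core machine described on the slide; heavier per-file work such as JPEG decoding would need a decoder that releases the interpreter lock, or a process pool, to benefit from the second core.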




Processing
 Some testing done using HP Cloud Services and
  compute clusters
 But the majority of the focus was on a single computing device
       Antony Rowstron, Dushyanth Narayanan, Austin Donnelly, Greg
        O'Shea, and Andrew Douglas. "Nobody ever got fired for using
        Hadoop on a cluster", In HotCDP 2012 - 1st International
        Workshop on Hot Topics in Cloud Data Processing, (2012).




Processing
 Won’t describe the specifics of the color naming
    algorithm (throw produce if you have it) but generally
       Input single RGB pixel and output is a single color term
       Size of vocabulary or number of color terms is a parameter
       Relative range of chroma values treated as achromatic
        is also a parameter
 Also currently testing a completely revised model
 Finally, in the Future directions section note that the
    best option for formal publication is to make use of
    currently available open source machine learning
    toolboxes.
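
Since the talk does not describe the naming algorithm, the following is only a toy stand-in with the same interface: one RGB pixel in, one color term out, with the vocabulary size and the achromatic chroma range as parameters. The nearest-centroid rule, the chroma proxy (max minus min of the channels), the term lists and all centroid values are assumptions made for illustration.

```python
# Hypothetical vocabulary: gray levels for achromatic terms, RGB centroids otherwise.
ACHROMATIC = [("black", 0), ("gray", 128), ("white", 255)]
CHROMATIC = [
    ("red", (255, 0, 0)), ("green", (0, 128, 0)), ("blue", (0, 0, 255)),
    ("yellow", (255, 255, 0)), ("orange", (255, 165, 0)),
    ("purple", (128, 0, 128)), ("pink", (255, 192, 203)),
    ("brown", (139, 69, 19)),
]


def name_pixel(rgb, vocab_size=8, achromatic_range=0.1):
    """Map one 8-bit (r, g, b) pixel to a single color term.

    vocab_size: how many chromatic terms (from the front of CHROMATIC) are allowed.
    achromatic_range: relative chroma (0..1) at or below which a pixel is achromatic.
    """
    chroma = (max(rgb) - min(rgb)) / 255.0      # crude chroma proxy, an assumption
    if chroma <= achromatic_range:
        level = sum(rgb) / 3.0
        return min(ACHROMATIC, key=lambda term: abs(term[1] - level))[0]
    candidates = CHROMATIC[:max(1, vocab_size)]
    # Nearest centroid by squared Euclidean distance in RGB.
    return min(
        candidates,
        key=lambda term: sum((a - c) ** 2 for a, c in zip(rgb, term[1])),
    )[0]


print(name_pixel((250, 250, 250)))               # white
print(name_pixel((255, 160, 0)))                 # orange
print(name_pixel((255, 160, 0), vocab_size=3))   # red
```

The last two calls show the effect of the vocabulary parameter: the same pixel is named "orange" with eight chromatic terms available and falls back to "red" when only three are allowed.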

Results: Aspect Ratios
 Wide range of
  image types
 Most basic test
  of processing
  scripts




Results: Median
 Additional test and
  visualization of
  basic color
  properties of images
 Data set is large
  enough that it was
  worthwhile to write a
  custom HTML5 2D
  canvas renderer
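
One plausible reading of the per-image median, which the slide does not spell out, is a per-channel median over each image's pixels. A minimal sketch under that assumption (`median_rgb` is a hypothetical helper name):

```python
from statistics import median


def median_rgb(pixels):
    """Per-channel median of a non-empty sequence of (r, g, b) tuples."""
    reds, greens, blues = zip(*pixels)
    return (median(reds), median(greens), median(blues))


print(median_rgb([(0, 0, 0), (10, 20, 30), (255, 255, 255)]))  # (10, 20, 30)
```

Each image then contributes a single color, which is what makes a one-marker-per-image plot of the whole collection possible.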



Results: Median
 So much data that,
  as noted by
  Shneiderman, a
  density plot, which
  "uses a spatial
  substrate organizing
  principle, but shows
  concentrations of
  markers", is maybe a
  better idea
 Data, alpha=0.05
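
Why a low alpha turns a scatter of markers into a density plot can be worked out directly. The sketch below assumes that "alpha=0.05" is the opacity of each marker and that markers are blended with ordinary source-over compositing, which is the default for an HTML5 2D canvas. With that rule, n overlapping markers reach an opacity of 1 - (1 - alpha)^n.

```python
import math


def stacked_opacity(n_markers, alpha=0.05):
    """Opacity after drawing n_markers overlapping markers, each with the given alpha."""
    return 1.0 - (1.0 - alpha) ** n_markers


def markers_for_opacity(target, alpha=0.05):
    """Smallest number of overlapping markers whose stacked opacity reaches target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - alpha))


print(round(stacked_opacity(1), 2))     # 0.05
print(markers_for_opacity(0.5))         # 14
```

A single marker is nearly invisible, and it takes 14 coincident markers to reach half opacity, so darkness on the plot tracks how many images share a region rather than whether any image falls there.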

Results: Max
 Max of R+G+B for
  the images
 Final test of basic
  scripting code
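
As a sketch of this statistic, assuming it is the largest channel sum found among an image's pixels (`max_rgb_sum` is a hypothetical helper name):

```python
def max_rgb_sum(pixels):
    """Largest R+G+B sum over a non-empty sequence of (r, g, b) tuples."""
    return max(r + g + b for r, g, b in pixels)


print(max_rgb_sum([(0, 0, 0), (10, 20, 30), (255, 255, 255)]))  # 765
```

For 8-bit data the value is bounded by 765, which is reached by any image containing a pure white pixel.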




Results
 Color terms
  across all images
 Majority of pixels
  are achromatic
 Top chromatic
  colors are
  arguably natural
  tones
 Higher chroma
  terms relatively
  infrequent
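
The aggregate statistics on this slide amount to counting terms over all named pixels. A minimal sketch, assuming the namer's output is available as a sequence of term strings; the achromatic term set and the helper name `term_shares` are hypothetical:

```python
from collections import Counter

ACHROMATIC_TERMS = {"black", "gray", "white"}   # hypothetical achromatic vocabulary


def term_shares(named_pixels, achromatic_terms=ACHROMATIC_TERMS):
    """Return (achromatic share, chromatic terms ranked by frequency).

    named_pixels must be a non-empty sequence of color terms, one per pixel.
    """
    counts = Counter(named_pixels)
    total = sum(counts.values())
    achromatic = sum(n for term, n in counts.items() if term in achromatic_terms)
    ranked = [(term, n) for term, n in counts.most_common()
              if term not in achromatic_terms]
    return achromatic / total, ranked


share, ranked = term_shares(["white", "white", "gray", "brown", "brown", "red"])
print(share)    # 0.5
print(ranked)   # [('brown', 2), ('red', 1)]
```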
Results
 Color terms per image
 Peak at 5 are all achromatic terms or images
 Gradual then rapid usage of chromatic terms
[Figure: "Color Terms for 200,000+ images". Horizontal axis: Number of Color Terms (0 to 35), Maximum Vocabulary of 30. Vertical axis: Number of Images (0 to 60,000).]
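
A histogram of this kind can be built by counting the distinct terms used in each image and then tallying images by that count. A minimal sketch, assuming each image is represented by the sequence of terms assigned to its pixels (`terms_per_image_histogram` is a hypothetical helper name):

```python
from collections import Counter


def terms_per_image_histogram(images):
    """Map 'number of distinct color terms' to 'number of images using that many'."""
    return Counter(len(set(terms)) for terms in images)


images = [
    ["white", "gray"],
    ["white", "white"],
    ["red", "blue", "white"],
    ["gray", "black"],
]
print(sorted(terms_per_image_histogram(images).items()))  # [(1, 1), (2, 2), (3, 1)]
```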




Results
 Sudden drop off at 30 is a model failure
 Term added to vocabulary based on previous limited optimization
[Figure: "Color Terms for 200,000+ images". Horizontal axis: Number of Color Terms (0 to 35), Maximum Vocabulary of 30. Vertical axis: Number of Images (0 to 60,000).]




Current Work
 Repeated entire process adjusting the model
  parameters
 Processing to fill SQL databases
 Query the database to validate all of the steps and
  explore specific




Current Work
 SELECT * FROM cntable ORDER BY skyblue DESC LIMIT 40
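
The query can be tried against a small in-memory table. Only the table name `cntable` and the column `skyblue` come from the slide; the rest of the schema (an image name plus one pixel-count column per color term), the sample rows and the helper `top_images` are assumptions, and SQLite is used here simply because it ships with Python.

```python
import sqlite3


def top_images(conn, term="skyblue", limit=40):
    """Return the rows of cntable with the highest values in the given term column."""
    # Column names cannot be bound as SQL parameters, so check the name first.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(cntable)")]
    if term not in columns:
        raise ValueError(f"unknown color term column: {term}")
    query = f"SELECT * FROM cntable ORDER BY {term} DESC LIMIT ?"
    return conn.execute(query, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per image, one count column per color term.
conn.execute("CREATE TABLE cntable (image TEXT, skyblue INTEGER, gray INTEGER)")
conn.executemany(
    "INSERT INTO cntable VALUES (?, ?, ?)",
    [("a.jpg", 10, 500), ("b.jpg", 900, 20), ("c.jpg", 300, 40)],
)
print(top_images(conn, limit=2))   # [('b.jpg', 900, 20), ('c.jpg', 300, 40)]
```

Ordering by a single term column in this way pulls up the images most dominated by that term, which makes it a quick visual check on whether the namer's output matches the image content.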




Future Directions
 Image collections as “pixel
    corpora” for algorithm
    design, testing and optimization.
       Similar to the role that written and spoken
        corpora fill for NLP and corpus linguistics
       Useful to formalize for citation and
        repeatability
 Additional analysis features
 Testing with more public domain
    machine learning algorithms for
    repeatability

Summary
 Algorithm optimization, like machine color
  naming, with 200,000 images is different than with
  200.
 Based on Wikipedia, the majority of visual content or
  pixels is achromatic
 Based on Wikipedia, higher-chroma named pixels are
  less frequent
 Based on Wikipedia, there is a gradual then sudden
  transition in color term usage




