Project Progress
What we’ve been doing (1)
 • Hacking the Hadoop API.
 • Writing different kinds of programs to
   understand it (not CV programs).
 • AdaBoost
 • SIFT, SURF
 • Reading, reading
Segmentation
(Pipeline diagram, applied per ROI = region of interest)
 • Segmentation with overlap
 • Get SIFT/SURF descriptors for partial segments.
 • Reduce the number of descriptors by grouping them.
 • Region of interest (positive & negative)
 • Count the frequency of occurrence of visual words.
 • AdaBoost
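The "group descriptors, then count visual words" steps can be sketched as follows. This is a minimal illustration, assuming the grouping step has already produced a vocabulary of visual words (cluster centers, e.g. from k-means) and that descriptors are plain lists of floats; the function names are hypothetical. The resulting histogram is the kind of per-ROI feature vector a classifier such as AdaBoost would take as input.

```python
import math
from collections import Counter

# Sketch only: assumes the "grouping" step produced a vocabulary of
# visual words (cluster centers); function names are hypothetical.

def nearest_word(descriptor, vocabulary):
    """Return the index of the closest visual word (squared Euclidean distance)."""
    best_idx, best_dist = -1, math.inf
    for idx, word in enumerate(vocabulary):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

def visual_word_histogram(descriptors, vocabulary):
    """Count how often each visual word occurs among one ROI's descriptors."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    return [counts[i] for i in range(len(vocabulary))]

vocab = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descs = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]]
print(visual_word_histogram(descs, vocab))  # [1, 2, 1]
```

Real SIFT/SURF descriptors are 128- or 64-dimensional; the toy 2-D values only keep the example readable.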
Methodology

• For simplicity, assume the same image is
  stored on all slave nodes.
• Run the algorithm on ROIs.
• Hopefully this will make the “Reduce” step
  easier.
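If every slave node already holds the same image, a map task only needs ROI coordinates rather than pixel data. A minimal sketch of generating overlapping ROIs for the "segmentation with overlap" step; the window size, stride, and the idea of one box per map task are assumptions for illustration, and the function names are hypothetical.

```python
# Sketch only: window size and stride are hypothetical parameters, and each
# (x, y, w, h) box is assumed to become the input of one map task.

def window_starts(extent, win, stride):
    """Start offsets along one axis; the last window is clamped to the border."""
    if extent <= win:
        return [0]
    starts = list(range(0, extent - win + 1, stride))
    if starts[-1] != extent - win:
        starts.append(extent - win)
    return starts

def overlapping_rois(width, height, win, stride):
    """Boxes (x, y, w, h) covering the image.

    With stride < win, neighbouring boxes overlap by at least win - stride pixels.
    """
    if win <= 0 or stride <= 0:
        raise ValueError("win and stride must be positive")
    w, h = min(win, width), min(win, height)
    return [(x, y, w, h)
            for y in window_starts(height, win, stride)
            for x in window_starts(width, win, stride)]

print(overlapping_rois(10, 6, win=4, stride=3))
# 6 boxes: x in {0, 3, 6}, y in {0, 2}
```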
Map-Reduce???
• It’s just a programming framework.
• You can also implement it yourself by reading
  the paper[1]. :)
• Hadoop is one implementation (Apache +
  Yahoo).
• Google’s implementation has not been made
  public.
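To illustrate that Map-Reduce is a programming model rather than a particular system, here is a toy single-process version with word count as the classic example. It is not Hadoop's API; `map_reduce`, `wc_mapper`, and `wc_reducer` are hypothetical names.

```python
from collections import defaultdict

# Toy, single-process illustration of the Map-Reduce programming model.
# This is NOT Hadoop's API; map_reduce, wc_mapper and wc_reducer are hypothetical names.

def map_reduce(records, mapper, reducer):
    groups = defaultdict(list)
    for record in records:                  # map phase
        for key, value in mapper(record):
            groups[key].append(value)       # shuffle: group values by key
    return {key: reducer(key, values)       # reduce phase: one call per key
            for key, values in groups.items()}

def wc_mapper(line):
    for word in line.split():
        yield word.lower(), 1

def wc_reducer(word, counts):
    return sum(counts)

print(map_reduce(["the cat", "The dog"], wc_mapper, wc_reducer))
# {'the': 2, 'cat': 1, 'dog': 1}
```

A real framework distributes the map calls across nodes and performs the shuffle over the network, but the contract (mapper emits key/value pairs, reducer sees all values for one key) is the same.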
Map-Reduce for Machine Learning on Multi-core
Introduction

• Algorithms that fit the Statistical Query Model
  can be written in a certain “summation
  form”.
• Divide the data set into as many pieces as
  there are cores.
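A minimal sketch of the summation form: any statistic that is a sum over data points can be computed per chunk and the partial results simply added. Mean and variance (via count, sum, and sum of squares) are used here as the simplest case. Threads stand in for cores; this is an assumption for illustration, since in CPython the GIL means a process pool would be needed for real multi-core speedups. Function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch only: threads stand in for cores; function names are hypothetical.

def split(data, pieces):
    """Divide the data set into `pieces` nearly equal contiguous chunks."""
    n = len(data)
    return [data[i * n // pieces:(i + 1) * n // pieces] for i in range(pieces)]

def partial_stats(chunk):
    """Map step: sufficient statistics (count, sum, sum of squares) of one chunk."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)

def mean_and_variance(data, cores=2):
    chunks = split(data, cores)
    with ThreadPoolExecutor(max_workers=cores) as pool:
        partials = list(pool.map(partial_stats, chunks))
    # Reduce step: the statistics are plain sums, so they add across chunks.
    n = sum(p[0] for p in partials)
    s = sum(p[1] for p in partials)
    ss = sum(p[2] for p in partials)
    mean = s / n
    return mean, ss / n - mean * mean   # population variance

print(mean_and_variance([1, 2, 3, 4], cores=2))  # (2.5, 1.25)
```

The result does not depend on how many pieces the data is split into, which is exactly what makes this form easy to parallelize.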
Algorithms (1)
• Locally Weighted Linear Regression
• Naive Bayes
• Gaussian Discriminant Analysis
• k-means
• Logistic Regression
• Neural Network
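Taking k-means from the list as one instance, a single iteration fits the summation form: each mapper assigns its points to the nearest centroid and emits per-cluster sums and counts, and the reducer adds them up and divides. A minimal pure-Python sketch; the function names and the "empty cluster keeps its old centroid" rule are assumptions.

```python
# Sketch only: one k-means iteration in summation form; names are hypothetical.

def kmeans_map(chunk, centroids):
    """Map step: per-cluster partial sums and counts for one data chunk."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in chunk:
        j = min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts

def kmeans_reduce(partials, centroids):
    """Reduce step: add partial sums across chunks, then divide by counts."""
    k, dim = len(centroids), len(centroids[0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in partials:
        for j in range(k):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    # Assumption: an empty cluster keeps its previous centroid.
    return [[s / total_counts[j] for s in total_sums[j]] if total_counts[j]
            else list(centroids[j]) for j in range(k)]

def kmeans_iteration(chunks, centroids):
    return kmeans_reduce([kmeans_map(c, centroids) for c in chunks], centroids)

chunks = [[[0, 0], [0, 2]], [[10, 10], [10, 12]]]
print(kmeans_iteration(chunks, [[1, 1], [9, 9]]))  # [[0.0, 1.0], [10.0, 11.0]]
```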
Algorithms (2)

• Principal Components Analysis
• Independent Components Analysis
• Expectation Maximization
• Support Vector Machine
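For PCA, the data-dependent part is the covariance matrix, which is a sum over data points; the eigen-decomposition of the small d×d result can then be done on a single node. A minimal sketch computing the population covariance as (1/m) Σ x xᵀ − μμᵀ from per-chunk partial sums; it assumes every chunk is non-empty, and the function names are hypothetical.

```python
# Sketch only: population covariance from per-chunk partial sums.
# Assumes every chunk is non-empty; function names are hypothetical.

def pca_map(chunk):
    """Map step: count, sum of x, and sum of outer products x x^T for one chunk."""
    dim = len(chunk[0])
    sx = [0.0] * dim
    sxx = [[0.0] * dim for _ in range(dim)]
    for x in chunk:
        for i in range(dim):
            sx[i] += x[i]
            for j in range(dim):
                sxx[i][j] += x[i] * x[j]
    return len(chunk), sx, sxx

def covariance_reduce(partials):
    """Reduce step: Cov = (1/m) * sum(x x^T) - mu mu^T."""
    m = sum(p[0] for p in partials)
    dim = len(partials[0][1])
    mu = [sum(p[1][i] for p in partials) / m for i in range(dim)]
    return [[sum(p[2][i][j] for p in partials) / m - mu[i] * mu[j]
             for j in range(dim)] for i in range(dim)]

parts = [pca_map([[0, 0], [2, 0]]), pca_map([[0, 2], [2, 2]])]
print(covariance_reduce(parts))  # [[1.0, 0.0], [0.0, 1.0]]
```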
Example (LWLR)


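Divide the computation among different mappers to compute partial sums of

  $A = \sum_i w_i x_i x_i^{T}$  and  $b = \sum_i w_i x_i y_i$.

Two reducers sum up the partial values for A and b, and the solution is finally computed as $\theta = A^{-1} b$.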
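The LWLR decomposition can be sketched in pure Python. Here A is the weighted sum of x_i x_iᵀ and b the weighted sum of x_i y_i, and the fit solves A θ = b. Assumptions not taken from the slides: the weights w_i come from a Gaussian kernel with bandwidth `tau` around the query point, each x_i carries a constant 1 as a bias feature, and all function names are hypothetical. One set of mappers emits partial sums of A, another of b, the two reducers add them, and θ is obtained by solving the linear system rather than forming A⁻¹.

```python
import math

# Sketch only: Gaussian-kernel weights and the bias feature are assumptions;
# function names are hypothetical.

def lwlr_weights(xs, query, tau):
    """w_i = exp(-||x_i - query||^2 / (2 tau^2))."""
    return [math.exp(-sum((a - b) ** 2 for a, b in zip(x, query)) / (2 * tau ** 2))
            for x in xs]

def map_A(xs, ws):
    """Mapper for A: partial sum of w_i x_i x_i^T over one chunk."""
    d = len(xs[0])
    return [[sum(w * x[r] * x[c] for x, w in zip(xs, ws)) for c in range(d)]
            for r in range(d)]

def map_b(xs, ys, ws):
    """Mapper for b: partial sum of w_i x_i y_i over one chunk."""
    d = len(xs[0])
    return [sum(w * x[r] * y for x, y, w in zip(xs, ys, ws)) for r in range(d)]

def solve(A, b):
    """Solve A theta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    theta = [0.0] * n
    for r in range(n - 1, -1, -1):
        theta[r] = (M[r][n] - sum(M[r][c] * theta[c] for c in range(r + 1, n))) / M[r][r]
    return theta

def lwlr_predict(chunks, query, tau):
    A_parts, b_parts = [], []
    for xs, ys in chunks:                     # each chunk = one mapper's share
        ws = lwlr_weights(xs, query, tau)
        A_parts.append(map_A(xs, ws))
        b_parts.append(map_b(xs, ys, ws))
    d = len(query)
    A = [[sum(P[r][c] for P in A_parts) for c in range(d)] for r in range(d)]  # reducer 1
    b = [sum(p[r] for p in b_parts) for r in range(d)]                          # reducer 2
    theta = solve(A, b)
    return sum(t * q for t, q in zip(theta, query))

# Features are [1, x]; the data lie exactly on y = 2 + 3x.
data = [([[1, 0], [1, 1]], [2, 5]), ([[1, 2], [1, 3]], [8, 11])]
print(round(lwlr_predict(data, [1.0, 1.5], tau=1.0), 6))  # 6.5
```

Because A is only d×d, the final solve is cheap on one node; all the work proportional to the data size happens in the mappers.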
Experimental Results
• Used the UCI Machine Learning Repository.
• On only 2 cores: 1.9x faster.
• 54x speedup on 64 cores.
• The speedup is achieved only by “throwing
  cores” at the problem.
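A quick worked check of the two reported speedups, using the standard definition of parallel efficiency E as speedup S divided by core count P:

```latex
E = \frac{S}{P}, \qquad
E_{2} = \frac{1.9}{2} = 0.95, \qquad
E_{64} = \frac{54}{64} \approx 0.84
```

So efficiency drops from roughly 95% on 2 cores to roughly 84% on 64 cores.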
