Intro to Mahout
Ofer Vugman
 May 2012
Agenda and such…


   What is ML (Machine Learning)
   ML Common Use Cases
   Mahout Overview
   Algorithms in Mahout
   Mahout Commercial Use
   Mahout Summary
What is ML



 “Machine Learning is programming computers to optimize a performance
 criterion using example data or past experience”

 Introduction to Machine Learning, by E. Alpaydin
ML Common Use Cases


 Recommendation
ML Common Use Cases


 Classification
ML Common Use Cases


 Clustering
ML Common Libraries
Mahout Overview – What ?


A mahout is a person who keeps and drives
  an elephant
Mahout Overview – What ?


 A scalable machine learning library
Mahout Overview – What ?


 Began life in 2008 as a subproject of
  Apache’s Lucene project
 In 2010 Mahout became a top-level
  Apache project in its own right
 Implemented in Java
 Built upon Apache’s Hadoop (Look! An
  elephant!)
Mahout Overview – Why ?


 Many open source ML libraries:
   Lack community
   Lack documentation and examples
   Lack scalability
   Lack the Apache license
   Are research oriented
   Are not well tested
   Are not built over existing production quality
    libraries
Mahout Overview – Why ?


 Scalability
   Scalable to reasonably large datasets (core
    algorithms implemented in Map/Reduce,
    runnable on Hadoop)
   Scalable to support your business case
    (Apache License)
   Scalable community
Mahout Overview – Why ?


 Built over existing production quality
  libraries
Mahout Overview – Use Cases


 Mahout currently supports four main
  use cases:
  1. Recommendation
  2. Clustering
  3. Classification
  4. Frequent Itemset Mining
Mahout Overview - Technical


 System Requirements
     Linux (or Cygwin on Windows)
     Java 1.6.x or greater
     Maven 2.0.11 or greater to build the source
      code
     Hadoop 0.2 or greater*


* Not all algorithms are implemented to work on Hadoop clusters
Algorithms in Mahout


 We’ll focus on one example:
   Collaborative Filtering (Recommenders)



 Yet there are many (many!) more; you
  can find them all at
  https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
Algorithms Examples –
Recommendation

 Help users find items they might like
  based on historical preferences




 Based on example by Sebastian Schelter in “Distributed Itembased
  Collaborative Filtering with Apache Mahout”
Algorithms Examples –
Recommendation




               The Matrix   Alien   Inception
      Alice        5          1         4
      Bob          ?          2         5
      Peter        4          3         2
Algorithms Examples –
Recommendation

 Algorithm
   Neighborhood-based approach
   Works by finding similarly rated items in the
    user-item matrix, using a similarity measure
    (e.g. cosine, Pearson correlation, Tanimoto
    coefficient)
   Estimates a user's preference towards an
    item by looking at their preferences
    towards similar items
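
Two of the similarity measures named above can be sketched in a few lines of plain Python. This is an illustration only, not Mahout's implementation: the function names, the sample ratings, and the choice to return 0.0 when a correlation is undefined are all assumptions made for this sketch.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long lists of co-ratings."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant ratings; 0.0 is a convention chosen here
    return cov / sqrt(var_x * var_y)

def tanimoto(users_a, users_b):
    """Tanimoto coefficient of the sets of users who rated each item."""
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0

# Hypothetical ratings given by the same four users to two items.
item_a = [5, 3, 4, 1]
item_b = [4, 2, 5, 1]
print(pearson(item_a, item_b))

# Hypothetical sets of users who rated each item (rating values are ignored).
print(tanimoto({"u1", "u2", "u3"}, {"u2", "u3", "u4"}))
```

Pearson looks at how the rating values move together, while Tanimoto only looks at who rated what, which makes it a natural fit for data without explicit rating values.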
Algorithms Examples –
Recommendation

 Prediction: Estimate Bob's preference
  towards “The Matrix”
  1. Look at all items that
        a) are similar to “The Matrix”
        b) have been rated by Bob
           => “Alien”, “Inception”
  2. Estimate the unknown preference with a
     weighted sum
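
The weighted sum can be sketched as follows. The similarity values and Bob's ratings are the ones used in this deck; the function name and the normalization (dividing by the sum of absolute similarities) are assumptions, chosen because they reproduce the 1.5 that the deck shows for Bob.

```python
def predict(similarities, ratings):
    """Similarity-weighted average of a user's ratings for similar items.

    similarities: item -> similarity to the target item
    ratings:      item -> the user's rating for that item
    """
    rated = [item for item in similarities if item in ratings]
    norm = sum(abs(similarities[item]) for item in rated)
    if norm == 0:
        return None  # no usable neighbours, so no estimate
    weighted = sum(similarities[item] * ratings[item] for item in rated)
    return weighted / norm

# Similarities to "The Matrix" and Bob's known ratings, taken from the deck.
sim_to_matrix = {"Alien": -0.47, "Inception": 0.47}
bob = {"Alien": 2, "Inception": 5}
print(round(predict(sim_to_matrix, bob), 2))  # 1.5
```

Because the two similarities have opposite signs, the negative weight on “Alien” pulls the estimate below both of Bob's actual ratings.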
Algorithms Examples –
Recommendation

 MapReduce phase 1
   Map – Make user the key
    (Alice, Matrix, 5)        ->  Alice (Matrix, 5)
    (Alice, Alien, 1)         ->  Alice (Alien, 1)
    (Alice, Inception, 4)     ->  Alice (Inception, 4)
    (Bob, Alien, 2)           ->  Bob (Alien, 2)
    (Bob, Inception, 5)       ->  Bob (Inception, 5)
    (Peter, Matrix, 4)        ->  Peter (Matrix, 4)
    (Peter, Alien, 3)         ->  Peter (Alien, 3)
    (Peter, Inception, 2)     ->  Peter (Inception, 2)
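
As a rough illustration, this map step is a one-line re-keying of each record. The sketch below is plain Python rather than Hadoop code, and the function name `map_user_key` is made up for this example.

```python
def map_user_key(record):
    """Map step: turn (user, item, rating) into (user, (item, rating))."""
    user, item, rating = record
    return user, (item, rating)

# The eight ratings from the deck.
ratings = [
    ("Alice", "Matrix", 5), ("Alice", "Alien", 1), ("Alice", "Inception", 4),
    ("Bob", "Alien", 2), ("Bob", "Inception", 5),
    ("Peter", "Matrix", 4), ("Peter", "Alien", 3), ("Peter", "Inception", 2),
]

mapped = [map_user_key(record) for record in ratings]
for user, item_rating in mapped:
    print(user, item_rating)
```

Keying by user is what lets the framework route all of one user's ratings to the same reducer.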
Algorithms Examples –
Recommendation

 MapReduce phase 1
   Reduce – Create inverted index
 Alice (Matrix, 5), Alice (Alien, 1), Alice (Inception, 4)  ->  Alice (Matrix, 5) (Alien, 1) (Inception, 4)
 Bob (Alien, 2), Bob (Inception, 5)                         ->  Bob (Alien, 2) (Inception, 5)
 Peter (Matrix, 4), Peter (Alien, 3), Peter (Inception, 2)  ->  Peter (Matrix, 4) (Alien, 3) (Inception, 2)
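
A minimal sketch of this reduce step, assuming the map output arrives as (user, (item, rating)) pairs. In Hadoop the grouping by key is done by the framework during the shuffle; here a dictionary stands in for it, and `reduce_by_user` is a hypothetical name.

```python
from collections import defaultdict

def reduce_by_user(pairs):
    """Reduce step: collect every (item, rating) emitted for the same user."""
    grouped = defaultdict(list)
    for user, item_rating in pairs:
        grouped[user].append(item_rating)
    return dict(grouped)

# Output of the phase 1 map step for the deck's data.
mapped = [
    ("Alice", ("Matrix", 5)), ("Alice", ("Alien", 1)), ("Alice", ("Inception", 4)),
    ("Bob", ("Alien", 2)), ("Bob", ("Inception", 5)),
    ("Peter", ("Matrix", 4)), ("Peter", ("Alien", 3)), ("Peter", ("Inception", 2)),
]

by_user = reduce_by_user(mapped)
for user, item_ratings in by_user.items():
    print(user, item_ratings)
```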
Algorithms Examples –
Recommendation

 MapReduce phase 2
    Map – Isolate all co-occurred ratings (all
      cases where a user rated both items)
Alice (Matrix, 5) (Alien, 1) (Inception, 4)   ->  Matrix, Alien (5,1)
                                                  Alien, Inception (1,4)
                                                  Matrix, Inception (5,4)
Bob (Alien, 2) (Inception, 5)                 ->  Alien, Inception (2,5)
Peter (Matrix, 4) (Alien, 3) (Inception, 2)   ->  Matrix, Alien (4,3)
                                                  Alien, Inception (3,2)
                                                  Matrix, Inception (4,2)
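
One way to sketch this map step is to enumerate every pair of items in a single user's rating list. The function name is hypothetical, and the sketch assumes every user's list names the items in the same order (Matrix, Alien, Inception), so that the same item pair always produces the same key.

```python
from itertools import combinations

def map_cooccurrences(user_ratings):
    """Yield ((item_a, item_b), (rating_a, rating_b)) for each pair of items one user rated."""
    for (item_a, rating_a), (item_b, rating_b) in combinations(user_ratings, 2):
        yield (item_a, item_b), (rating_a, rating_b)

# Per-user rating lists from the deck, all in the same item order.
by_user = {
    "Alice": [("Matrix", 5), ("Alien", 1), ("Inception", 4)],
    "Bob": [("Alien", 2), ("Inception", 5)],
    "Peter": [("Matrix", 4), ("Alien", 3), ("Inception", 2)],
}

cooccurrences = [pair for ratings in by_user.values() for pair in map_cooccurrences(ratings)]
for item_pair, rating_pair in cooccurrences:
    print(item_pair, rating_pair)
```

A user with n ratings emits n(n-1)/2 pairs, so users with very long rating histories dominate the cost of this phase.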
Algorithms Examples –
Recommendation

 MapReduce phase 2
   Reduce – Compute similarities

  Matrix, Alien (5,1) (4,3)              ->  Matrix, Alien (-0.47)
  Matrix, Inception (4,2) (5,4)          ->  Matrix, Inception (0.47)
  Alien, Inception (1,4) (2,5) (3,2)     ->  Alien, Inception (-0.63)
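
The shape of this reduce step can be sketched as: group the co-ratings by item pair, then apply a similarity function to each group. The sketch below plugs in plain cosine similarity over the co-ratings, which is an assumption. The deck does not say which measure or normalization produced its values, and plain cosine does not reproduce them (it yields positive numbers for all three pairs). The function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def cosine(pairs):
    """Cosine similarity of two items, given their co-ratings as (a, b) pairs."""
    dot = sum(a * b for a, b in pairs)
    norm_a = sqrt(sum(a * a for a, _ in pairs))
    norm_b = sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def reduce_similarities(cooccurrences, similarity=cosine):
    """Reduce step: gather co-ratings per item pair, then score each pair."""
    grouped = defaultdict(list)
    for item_pair, rating_pair in cooccurrences:
        grouped[item_pair].append(rating_pair)
    return {item_pair: similarity(pairs) for item_pair, pairs in grouped.items()}

# Co-ratings emitted by the phase 2 map step for the deck's data.
cooccurrences = [
    (("Matrix", "Alien"), (5, 1)), (("Matrix", "Alien"), (4, 3)),
    (("Alien", "Inception"), (1, 4)), (("Alien", "Inception"), (2, 5)),
    (("Alien", "Inception"), (3, 2)),
    (("Matrix", "Inception"), (4, 2)), (("Matrix", "Inception"), (5, 4)),
]

similarities = reduce_similarities(cooccurrences)
for item_pair, value in similarities.items():
    print(item_pair, round(value, 2))
```

Passing the measure in as a parameter keeps the grouping logic independent of the choice between cosine, Pearson correlation, or another coefficient.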
Algorithms Examples –
Recommendation




               The Matrix   Alien   Inception
      Alice        5          1         4
      Bob          1.5        2         5
      Peter        4          3         2
Mahout Commercial Use


 Commercial use
Mahout Resources

 Mahout website - http://mahout.apache.org/
 Introducing Apache Mahout –
  http://www.ibm.com/developerworks/java/library/j-mahout/
 “Mahout in Action” by Sean Owen and Robin
  Anil
Mahout Summary


 ML is all over the web today
 Mahout is about scalable machine
  learning
 Mahout has functionality for many of
  today’s common machine learning tasks
 MapReduce magic in action
Mahout Summary




     Thank you and good night

Editor's Notes

  • #14: The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers (2008). Apache Lucene(TM) is a high-performance, full-featured text search engine library (2005).