© 2014 MapR Technologies
A typical encounter with a potential Mahout user
Which leads us to the Mahout 1.0 vision
Example: Cooccurrence Analysis
How often do items co-occur?
// load distributed matrix
val A = drmFromHDFS(...)
// compute co-occurrences
val C = A.t %*% A
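To make the meaning of `A.t %*% A` concrete, here is a minimal plain-Scala sketch on a tiny in-memory matrix. It assumes A is a binary user-by-item interaction matrix (rows are users, columns are items); the slide does not spell this out. The object and method names are hypothetical, and nothing here uses the Mahout API.

```scala
object CooccurrenceSketch {
  // Assumption: a(u)(i) == 1 means user u interacted with item i, else 0.
  // Entry (i, j) of the result counts the users who interacted with both items,
  // which is exactly entry (i, j) of A^T A.
  def cooccurrence(a: Array[Array[Int]]): Array[Array[Int]] = {
    val numItems = if (a.isEmpty) 0 else a.head.length
    Array.tabulate(numItems, numItems) { (i, j) =>
      a.map(row => row(i) * row(j)).sum
    }
  }

  def main(args: Array[String]): Unit = {
    val a = Array(
      Array(1, 1, 0), // user 0: items 0 and 1
      Array(1, 0, 1), // user 1: items 0 and 2
      Array(1, 1, 1)  // user 2: all three items
    )
    cooccurrence(a).foreach(row => println(row.mkString(" ")))
  }
}
```

The diagonal of the result holds the number of interactions per item, and the matrix is symmetric.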
Under the covers:
• Optimizer rewrites the matrix multiplication and transpose operations to a TransposeSelf operator
• Optimizer chooses from two physical operators for TransposeSelf
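The slide does not name the two physical operators, so the following is only an illustration of why fusing the transpose and the multiplication can pay off. One plausible strategy is to accumulate A^T A as a sum of row outer products, which needs a single pass over the rows of A and never materializes A^T. The object and method names are hypothetical; this is plain Scala, not Mahout's implementation.

```scala
object TransposeSelfSketch {
  // Computes A^T A as the sum over rows r of the outer product r^T r.
  // The rows are consumed once, so they could be streamed from a partition.
  def ata(rows: Iterator[Array[Double]], numCols: Int): Array[Array[Double]] = {
    val acc = Array.ofDim[Double](numCols, numCols)
    for (row <- rows; i <- 0 until numCols if row(i) != 0.0; j <- 0 until numCols)
      acc(i)(j) += row(i) * row(j) // skip zero entries of the row to exploit sparsity
    acc
  }
}
```

In a distributed setting, each partition could compute such a partial sum and the partial results could then be added together, provided a numCols by numCols accumulator fits in memory. A wide matrix would need a different strategy, which may be one reason an optimizer would keep more than one physical operator.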
Which items co-occur anomalously?
// compute & broadcast number of interactions per item
val numInteractions = drmBroadcast(A.colSums)
// create indicator matrix
val I = C.mapBlock() {
  case (keys, block) =>
    // allocate sparse block of indicator matrix
    val indicatorBlock = sparse(block.nrow, block.ncol)
    // compute indicators with log-likelihood ratio test
    for (row <- block)
      indicatorBlock(row.index, ::) = computeLLR(row, numInteractions)
    keys -> indicatorBlock
}
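The slide leaves `computeLLR` undefined. As a hedged sketch of the test it refers to, the log-likelihood ratio for a single item pair can be computed from a 2x2 contingency table using the entropy formulation below. The mapping from co-occurrence counts to the table (in `llrForPair`) assumes a binary interaction matrix and a known total number of users; the object, method, and parameter names are hypothetical.

```scala
object LlrSketch {
  private def xLogX(x: Long): Double =
    if (x == 0L) 0.0 else x * math.log(x.toDouble)

  // Unnormalized Shannon entropy of a set of counts: N ln N - sum(k ln k).
  private def entropy(counts: Long*): Double =
    xLogX(counts.sum) - counts.map(xLogX).sum

  // k11: both events, k12: first only, k21: second only, k22: neither.
  def llr(k11: Long, k12: Long, k21: Long, k22: Long): Double = {
    val rowEntropy = entropy(k11 + k12, k21 + k22)
    val colEntropy = entropy(k11 + k21, k12 + k22)
    val matEntropy = entropy(k11, k12, k21, k22)
    // Clamp tiny negative values caused by floating-point round-off.
    math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy))
  }

  // Assumed mapping: cij = co-occurrences of items i and j (an entry of C),
  // ni and nj = interactions per item (numInteractions), total = number of users.
  def llrForPair(cij: Long, ni: Long, nj: Long, total: Long): Double =
    llr(cij, ni - cij, nj - cij, total - ni - nj + cij)
}
```

A score near zero means the two items co-occur about as often as independence would predict; large scores flag anomalous co-occurrence. A full `computeLLR` would apply something like `llrForPair` to every entry of a row and would typically keep only the highest-scoring entries as indicators.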
Runtime
• prototype on Apache Spark
  – fast and expressive cluster computing system
  – general computation graphs, in-memory primitives, rich API, interactive shell
• future: add Stratosphere
  – project proposed to Apache Incubator recently
  – similar to Apache Spark, adds data flow optimization and efficient out-of-core execution
How Does This Apply?
How Can I Start?
Q&A
@ted_dunning @mapr maprtech
tdunning@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies

Possible Visions for Mahout 1.0

Editor's Notes

  • #2: I just have 5 minutes for this talk. Given the short time I thought I’d share with you some of the more interesting things you can do with Hadoop in 5 minutes or less…
  • #17, #18: In 1 minute you can perform 4.73 million concurrent authentications in the largest biometric database in the world. In India, there is no social security card. It’s difficult for the average citizen to set up a bank account, access benefit programs, and enjoy economic mobility. It’s difficult for the government as well, with over $1B of government aid classified as leakage, the result of fraud and corruption. The Aadhaar program is poised to change all that by leveraging the unique IDs that all people are born with. The program aims to get fingerprints and retina scans for all 1.2 billion citizens. The scale of this project required an integrated in-Hadoop database that was capable of 200 millisecond response times while supporting millions of concurrent look-ups.