Apache Mahout
By,
Rahul Reghunath
• A scalable machine learning library built on
Hadoop and written in Java
• Covers the areas of collaborative filtering,
clustering, and classification. Many of the
implementations use the Apache Hadoop
platform.
• It "drives" Hadoop, giving Hadoop the ability
to analyze data (data mining).
• “Machine learning is programming computers
to optimize a performance criterion using
example data and past experience”
Mahout Points
• Uses the power of Apache Hadoop to solve
complex problems
• By breaking them up into multiple parallel
tasks
• Stable release: 0.9 (1 February 2014)
• 9 Oct 2011: Mahout in Action released
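The idea of breaking a problem into parallel tasks can be sketched as a small map/reduce word count. This is only an illustration in plain Python, not Mahout or Hadoop code: the function names (`map_chunk`, `split_into_chunks`, `word_count`) are made up for this example, and threads stand in for the separate machines a Hadoop cluster would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count words in one chunk, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks roughly equal pieces."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def word_count(lines, n_workers=4):
    """Run the map step on each chunk in a worker pool, then merge the results."""
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Reduce step: add the partial counts from every chunk together.
    return reduce(lambda a, b: a + b, partials, Counter())


if __name__ == "__main__":
    sample = ["mahout runs on hadoop", "hadoop runs jobs", "mahout"]
    print(word_count(sample, n_workers=2))
```

Each chunk is counted without looking at any other chunk, which is what makes the work easy to spread out; only the final merge needs all the partial results.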
Why Mahout?
• Many Open Source ML libraries either:
– Lack Community
– Lack Documentation and Examples
– Lack Scalability
– Or are research-oriented
Hadoop
• The ideas behind Hadoop were developed at Google back in their earlier days,
so they could usefully index all the rich textual and
structural information they were collecting, and then
present meaningful and actionable results to users.
• There was nothing on the market that would let them
do that, so they built their own platform. Google’s
innovations were incorporated into Nutch, an open
source project, and Hadoop was later spun-off from
that.
• Yahoo has played a key role in developing Hadoop for
enterprise applications.
Hadoop architecture
• Hadoop is designed to run on a large number of machines that
don’t share any memory or disks. That means you can buy a whole
bunch of commodity servers, slap them in a rack, and run the
Hadoop software on each one.
• When you want to load all of your organization’s data into Hadoop,
what the software does is bust that data into pieces that it then
spreads across your different servers. There’s no one place where
you go to talk to all of your data; Hadoop keeps track of where the
data resides.
• And because multiple copies are stored, data on a server
that goes offline or dies can be automatically re-replicated from a
known good copy.
• Hadoop derives from Google's MapReduce and Google File System
papers.
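The splitting and replication described above can be sketched as a toy simulation. This is not how HDFS is implemented: the round-robin placement, the function names (`place_blocks`, `handle_failure`), and the server names are assumptions made for illustration only.

```python
def place_blocks(data, block_size, servers, replication=3):
    """Split data into blocks and assign each block to `replication` distinct servers."""
    if replication > len(servers):
        raise ValueError("need at least as many servers as replicas")
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    placement = {}
    for block_id in range(len(blocks)):
        # Round-robin placement (an assumption; real clusters use smarter policies).
        placement[block_id] = [
            servers[(block_id + r) % len(servers)] for r in range(replication)
        ]
    return blocks, placement


def handle_failure(placement, failed, servers, replication=3):
    """Drop the failed server and copy each affected block to another live server."""
    alive = [s for s in servers if s != failed]
    new_placement = {}
    for block_id, holders in placement.items():
        survivors = [s for s in holders if s != failed]
        if not survivors:
            raise RuntimeError(f"block {block_id} lost: no surviving copy")
        candidates = [s for s in alive if s not in survivors]
        # Restore the replica count from a known good copy, if servers remain.
        while len(survivors) < replication and candidates:
            survivors.append(candidates.pop(0))
        new_placement[block_id] = survivors
    return new_placement


if __name__ == "__main__":
    servers = ["s1", "s2", "s3", "s4"]
    blocks, placement = place_blocks("abcdefghij", 4, servers, replication=2)
    print(blocks, placement)
    print(handle_failure(placement, "s2", servers, replication=2))
```

The `placement` dictionary plays the role of the bookkeeping that tracks where the data resides; the data itself never lives in one place.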
Current State of Hadoop
• Facebook processes more than 500 TB of
data daily. The site manages millions of
photos and processes billions of likes each
day. That's a whole lot of sharing.
• Facebook uses Hive to query data stored in
Hadoop.
• Yahoo has a similar tool: Pig
How to solve common business
Problems
• Recommendation:
User info + community info = Recommendation
• Classification: e.g. filtering spam mail
• Clustering: grouping similar data together
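The "user info + community info" idea behind recommendation can be sketched as a tiny user-based collaborative filter. This is plain Python, not the Mahout API; the function names, the cosine similarity measure, and the sample ratings are all assumptions chosen for illustration.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two rating dicts (missing items count as 0)."""
    dot = sum(a[item] * b[item] for item in set(a) & set(b))
    norm_a = sqrt(sum(value ** 2 for value in a.values()))
    norm_b = sqrt(sum(value ** 2 for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings, user, top_n=2):
    """Rank items the user has not rated by similarity-weighted community ratings."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    # Highest predicted rating first; ties broken by item name.
    return sorted(predicted, key=lambda item: (-predicted[item], item))[:top_n]


if __name__ == "__main__":
    ratings = {
        "alice": {"A": 5, "B": 4},
        "bob": {"A": 5, "B": 4, "C": 5, "D": 1},
        "carol": {"A": 1, "B": 1, "C": 1, "D": 5},
    }
    print(recommend(ratings, "alice"))
```

Here the user's own ratings decide whose opinions count most, and the community's ratings supply the candidate items. Bob's tastes are closest to Alice's, so the item Bob rated highly is ranked first for her.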
Applications
• eBay
• Netflix: movies
• Pandora: radio stations
• eHarmony: matching people
Reference
• http://pig.apache.org
• http://mahout.apache.org
• Quora
• http://www.itproportal.com
• Wikipedia
• YouTube
• Colleagues and friends
The End
• 218 days are left in this year. Try to make it an
awesome year for the world.
Thanks
