    Million Monkeys
    Jesse Anderson | Curriculum Developer and Instructor
    November 2012




1
About Me
    • Cloudera - Educational Services Team
    • Twitter - @jessetanderson
    • Blog and more info: http://www.jesse-anderson.com
    • Screencasts on Pragmatic Programmers: Buy It Now on
      http://www.jesse-anderson.com
    • President – Northern Nevada Software Developers Group




2
About Cloudera
    • Cloudera is “The commercial Hadoop company”
    • Founded by leading experts on Hadoop from
      Facebook, Google, Oracle and Yahoo
    • Provides consulting and training services for Hadoop users
    • Staff includes committers to virtually all Hadoop projects




3
Introduction

    • Infinite Monkey Theorem
    • Hadoop
    • Million Monkeys Algorithm
    • Business Case




4
Infinite Monkey Theorem




5
Exponential Growth (aka Big Data)


     Odds of finding a group of characters is 1 in 26 raised to the
     power of the number of contiguous characters.

     Contiguous Characters    Combinations
      8                           208,827,064,576
      9                         5,429,503,678,976
     10                       141,167,095,653,376
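The combination counts on this slide follow directly from the stated rule: 26 raised to the number of contiguous characters. A minimal sketch that reproduces them, assuming a 26-letter lowercase alphabet with no spaces or punctuation (the names `ALPHABET_SIZE` and `combinations` are illustrative, not from the slides):

```python
ALPHABET_SIZE = 26  # assumption: lowercase letters only, no spaces or punctuation


def combinations(length):
    """Number of distinct character groups of the given length."""
    return ALPHABET_SIZE ** length


if __name__ == "__main__":
    # Print one row per group length shown on the slide.
    for length in (8, 9, 10):
        print(f"{length:>2}  {combinations(length):>22,}")
```

Each extra character multiplies the search space by 26, which is why the workload grows so quickly even though the input text itself is small.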




6
Hadoop

    •   Apache Project
    •   Reliable, Scalable, Distributed Computing
    •   Software Framework
    •   MapReduce
    •   Distributed File System (HDFS)
    •   Other projects

7
Map
    Create or process the input data




8
Reduce
    Process data from Map into something usable
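The Map and Reduce slides can be illustrated with a small sketch. This is not the presenter's implementation and it does not use Hadoop; it is plain Python that mimics the two phases in a single process. The target text, the group length, and every function name here are hypothetical choices made for illustration.

```python
import random
import string
from collections import defaultdict

# Hypothetical stand-in for the text the monkeys are trying to reproduce.
TARGET_TEXT = "tobeornottobethatisthequestion"
# Short groups so that a toy run finds matches quickly.
GROUP_LENGTH = 3


def map_phase(seed, attempts):
    """Map: create the input data.

    One virtual monkey types random lowercase groups and emits
    (group, 1) for every group that occurs in the target text.
    """
    rng = random.Random(seed)
    for _ in range(attempts):
        group = "".join(
            rng.choice(string.ascii_lowercase) for _ in range(GROUP_LENGTH)
        )
        if group in TARGET_TEXT:
            yield group, 1


def reduce_phase(pairs):
    """Reduce: turn mapper output into something usable.

    Here that means counting how often each matching group was typed.
    """
    counts = defaultdict(int)
    for group, count in pairs:
        counts[group] += count
    return dict(counts)


def run(monkeys=4, attempts=20000):
    """Run every mapper in turn, then reduce the combined output."""
    pairs = []
    for seed in range(monkeys):
        pairs.extend(map_phase(seed, attempts))
    return reduce_phase(pairs)


if __name__ == "__main__":
    print(run())
```

In a real cluster the mappers would run in parallel on different nodes and the framework would group the emitted keys before the reduce step; the loop in `run` only stands in for that.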




9
Data Flow




10
Million Monkeys Algorithm




11
Business Case




12
Hadoop Scalability
     [Chart: Percent of Linear Scalability (y-axis, 0 to 100 percent) by number of Nodes (x-axis, 1 to 20), with one series each for RDBMS and Hadoop. RDBMS = Relational Database]

13
Business Value of Scalability

     • Scaling does not require massive re-engineering and complete
       rewrites of code
     • Adding more computers to cluster gets a predictable increase in
       computational power and storage

        SAVE                         SAVE




14
Going Viral (and taking over the world)


     • Covered internationally in BBC, Wall Street Journal, Wired and Slashdot
     • 26,000 unique visits from 119 countries in one day




15
Next Steps
     •   Books
          •   Hadoop: The Definitive Guide - Tom White
          •   Hadoop Operations - Eric Sammer
     •   Cloudera Training
          •   Developer, Admin, Hive and Pig, HBase, Essentials
     •   CDH
          •   Cloudera's Apache Distribution Including Hadoop
          •   Open Source
          •   VM Image

16
Conclusion

     • MapReduce breaks up problem efficiently
     • No code changes to scale
     • Incredible scalability
     • Enables previously impossible tasks




17
18


Million Monkeys User Group


Editor's Notes

  • #6: Interesting statistical question. Thought about since Aristotle. Randomness + Resources + Time = Anything Possible. No real monkeys – need virtual monkeys.
  • #7: Shakespeare lazy. Heavily influenced English Literature. Big Data isn’t always a huge file. It can be high computation.
  • #14: This is not a map of MT and ID. 1 to 20 node testing. Keep efficiency up. RDBMS efficiency in gutter.
  • #15: Engineers not spending time coding to scale. Busy adding new features. No code changes for scaling. Took 1.5 months on one computer and 3.5 days on 20 nodes. Spending on new computers gives a consistent, linear increase. Compare spending on RDBMS and Hadoop.
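
The run times in note #15 can be turned into a rough speedup figure. The sketch below is an estimate, not a number from the slides: it assumes a month is 30 days and that the single-computer and 20-node runs did comparable work. The constant and function names are illustrative.

```python
DAYS_PER_MONTH = 30  # assumption: the note only says "1.5 months"

SINGLE_NODE_DAYS = 1.5 * DAYS_PER_MONTH  # run time on one computer
CLUSTER_DAYS = 3.5                       # run time on the cluster
CLUSTER_NODES = 20


def speedup(single_days, cluster_days):
    """How many times faster the cluster run was."""
    return single_days / cluster_days


def percent_of_linear(single_days, cluster_days, nodes):
    """Speedup as a percentage of the ideal, where N nodes run N times faster."""
    return 100 * speedup(single_days, cluster_days) / nodes


if __name__ == "__main__":
    s = speedup(SINGLE_NODE_DAYS, CLUSTER_DAYS)
    p = percent_of_linear(SINGLE_NODE_DAYS, CLUSTER_DAYS, CLUSTER_NODES)
    print(f"speedup: {s:.1f}x, percent of linear: {p:.0f}%")
```

Under these assumptions the 20-node run comes out roughly 13 times faster than the single computer. The exact percentage shifts with how "1.5 months" is converted to days, so treat it as a ballpark only.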