Deploying Hadoop-Based Bigdata Environments
 “[Tall] Tales From The Frontier”

Roman Shaposhnik
rvs@apache.org, Cloudera Inc.
$ whoami

   An open source software developer
       Linux kernel, C/C++ compilers, FFmpeg, Plan9
   A Hadoop and all-around UNIX guy
   root@cloudera
       Member of the “Kitchen” team
   Apache Software Foundation Incubator PMC
       [Bigtop], Hadoop Development Tools, Celix, Helix
   VP of Apache Bigtop
                                                    2
ZooKeeper (coordination)

       HUE (web based UI)


Pig (DQL) Hive (SQL) Impala (SQL)

 HBase      YARN/MR1         Oozie

         HDFS (filesystem)


                                     3
It is a jungle out there
   Zookeeper         Sqoop       JDK/JRE
   Hadoop            Oozie       Kerberos
         HDFS        Whirr       Ganglia
         YARN        Mahout      Nagios
         MR1         Flume       JSVC
         HTTPFS      Giraph      Tomcat
   HBase             Hama        Utils
   Pig               Hue         Postgress
   Hive              Solr        HTTPD
   Impala            Crunch
                                                5
And the answer is:

         Puppet[forge]


                     6
One way of using Apache software

  $ wget http://apache.org/httpd.tar.gz
  $ tar xzvf httpd.tar.gz
  $ cd httpd
  $ ./configure ; make
  $ make install
  ERROR: can't write to /usr/local/bin
  $ sudo make install
                                          7
A different way

  $ sudo apt-get install httpd
  Would you like to also upgrade your conf?




                                              8
Is there apt-get install hadoop ?

   Hadoop is still under very active development
   Hadoop is Java based
   Hadoop is a distributed application
   Hadoop is way more than HDFS + MR




                                              9
Project-by-project approach

   “Passively” maintained code
       Packaging, OS-level (init.d)
   Developer-centric view
       Edit-compile-debug cycle vs. deployment
       Lack of integration testing
   Differences in distributions/packaging:
       Where is this valid: /usr/libexec ?
   Combinatoric explosion of dependencies
                                                  10
Dependencies Inferno:

   Hive 0.8.1
   HBase (0.92, 0.90)
   Hadoop (1.0, 0.22, 0.23)

A million dollar question:
$ tar xzvf hive-0.8.1.tar.gz
$ ls hive-0.8.1/lib
hbase-0.89.jar log4j-1.2.15.jar log4j-1.2.16.jar
                                             12
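The listing above is easy to check mechanically. A minimal sketch, assuming jars are named <artifact>-<version>.jar with a version that starts with a digit; the function name dup_jars is made up for this illustration and is not part of Hive or Bigtop:

```shell
# List artifact names that appear with more than one version in a directory.
# Assumption: files are named <artifact>-<version>.jar, version starts with a digit.
dup_jars() {
  for f in "$1"/*.jar; do
    [ -e "$f" ] || continue                     # skip the literal glob when nothing matches
    basename "$f" .jar | sed 's/-[0-9].*$//'    # strip "-<version>" to get the artifact name
  done | sort | uniq -d                         # keep only names seen more than once
}

# Usage:
#   dup_jars hive-0.8.1/lib
```

For the three jars shown on the slide this prints log4j. It does not catch the other problem on the slide, a bundled hbase-0.89.jar that matches neither HBase version Hive is expected to run against.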
Remember what Debian did to Linux?


 GNU Software             Linux kernel




                                         13
Bigtop is trying to do it with Hadoop

Hadoop Ecosystem              Hadoop
(Pig, Hive, Mahout)        (HDFS + MR)




CDH4 beta 1
                                            14
What's there in Bigtop

       Build/Packaging infrastructure
           RPM, DEB, (tarballs, homebrew/MacPorts)
           VirtualBox, VMware and KVM VMs
           Fedora, OpenSUSE, Mageia, CentOS, Ubuntu
       Puppet deployment infrastructure
       Integration test infrastructure (iTest)
       Bigtop Jenkins:
           http://bigtop01.cloudera.org:8080
                                                      15
And the answer is:

      Puppet[Bigtop]


                     16
System software deployment

   Packages vs. Puppet code
       package/file/service
   What is packaging?
       dependency tracking
       build encapsulation
       java packaging
       file layout
       user creation
       service registration
                                                     17
Does it really work?

   Java packaging
       maven/ivy integration
   file layout
       side-by-side installations of the same package
   user creation
       LDAP/AD provisioning
   service registration
       start on install vs. start on reboot
                                                     18
Petascale distributed systems

       Scale
           Yahoo! ~5000 nodes
       Deployment orchestration
           Kerberos::Host_keytab <| title == "hdfs" |> ->
              Service["hadoop-hdfs-datanode"]
       Highly coordinated distributed system
           It ain't HTTPD/loadbalancer
           Rolling upgrades/asynchronous rollbacks
                                                             19
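The Puppet collector on this slide states an ordering constraint: the "hdfs" keytab must be in place before the datanode service starts. A rough shell sketch of the same constraint on a single node; the helper name, the keytab path and the service command in the usage comment are all assumptions for illustration:

```shell
# Hypothetical helper: run a command only if the given keytab file is readable.
start_after_keytab() {
  keytab=$1
  shift
  if [ ! -r "$keytab" ]; then
    echo "keytab $keytab is not ready, refusing to start" >&2
    return 1
  fi
  "$@"    # the keytab is there, so run the start command as given
}

# Usage (path and service name are assumptions):
#   start_after_keytab /etc/hdfs.keytab service hadoop-hdfs-datanode start
```

This covers one node only. It says nothing about ordering across nodes, which is where the slide locates the hard part.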
Back to tarballs and shell?

       What's better for Puppet: fpm or rpm?
       What is the role of Puppet?
           coordinating the entire system: lack of DSL
           converging an isolated node: will it ever work?
           a building block for an agent-based system
       One agent to rule them all?
           there's no spoon^H^H^H^H^H^ agent: Whirr
           MCollective
           Cloudera Manager, Ambari
                                                          20
Evolution, not perfection!
   Minimalistic, highly consistent packages
       /usr/lib/hadoop, /etc/hadoop/conf (alternative)
       fail gracefully: .... || : )
       Java packaging is not solved [yet]: symlinks
   Minimalistic Puppet code
       package/file/service
       masterless (most of the time)
       integration with Whirr
   BoxGrinder
                                                         21
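The "fail gracefully" bullet refers to the `|| :` idiom common in package scriptlets: a best-effort step is followed by the no-op `:` so that its failure does not abort the scriptlet. A minimal sketch; the function name and the pid-file path are placeholders, not taken from any Bigtop package:

```shell
# Sketch of a package scriptlet body; the pid-file path is a made-up placeholder.
run_scriptlet() {
  (
    set -e                                                # abort on any unguarded failure
    rm /nonexistent/bigtop-example.pid 2>/dev/null || :   # cleanup may fail; ignore it
    echo "scriptlet finished"
  )
}
```

Without the `|| :`, the failed rm would end the subshell under `set -e` and the final line would never run.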
The road ahead
   New kind of configuration management
       /etc/hadoop vs Zookeeper
   New kinds of system packaging
       Parcels (tarballs + metadata)
       HPS (Hadoop Packaging System)
   Orchestration: to puppet or not to puppet?
       Cloudera Manager
       Apache Ambari (incubating)
       Reactor 8: http://reactor8.com
                                                  22
Java Packaging
   Fate of Java
       OpenJDK
   OSGi
       Hadoop's view: MAPREDUCE-1700
        https://issues.apache.org/jira/browse/MAPREDUCE-1700
   Project Jigsaw
       Language tie-ins? Really?
   Linux vendors getting their act together
                                                               23
Integration testing
   Clean room provisioning
       Those ain't unit tests – they trash the system
   Cluster topology and cluster state discovery
       How can puppet help us?
   Cluster state manipulation
       Test-driven orchestration
       Chaos Monkey
   How to be successful in OS co-opetition
       Make everything pluggable (and subvert ;-))
                                                        24
Anatomy of iTest

   Versioned, JVM-based test/data artifacts
   Dependency between test artifacts
   Matching stack of integration tests
   Implementation
       Maven artifacts, pom files
       JUnit test-execution entry point
       Groovy for scripting

                                               25
Who's the target audience

       End users
           YOU!
       ASF Projects/Bigdata developers
           from Avro to Zookeeper
       Bigdata solutions vendors
           Cloudera, EMC, Hortonworks, Karmasphere
       DevOps
           eBay, Yahoo, Facebook, LinkedIn
                                                      26
Who's on-board?
   Cloudera
       CDH4 is 100% based on Bigtop (hadoop v2)
       Available @cloudera.com
   Canonical
       Ubuntu Server: Hadoop and Bigdata blueprint
        https://blueprints.launchpad.net/ubuntu/+spec/servercloud-p-hdp-hadoop

   TrendMicro
   Hortonworks (partially)
   EMC, eBay (early stages of prototyping)
                                                                27
What's happening?
   A special release: Bigtop 0.3.0-incubating
       Hadoop 1.0.1
   Last stable release: Bigtop 0.5.0
       Hadoop 2.0.2-alpha
   Next stable release: Bigtop 0.6.0
       End of Mar 2013 release
       Hadoop 2.0.3-beta
       Major focus on developers
                                                 28
What does Bigtop need from you?

       More of you!
           Meetup: “Silicon Valley Hands-on Programming”
            http://www.meetup.com/HandsOnProgrammingEvents/
       More infrastructure for build/test
           EC2, Supercell, EMC magic cluster, CloudStack
       More integration tests
           Convince your bosses to commit to Bigtop
       Validate upstream release using Bigtop
                                                              29
Contact

    Bigtop home @Apache:
        http://incubator.apache.org/bigtop/
    Hangout places:
        {dev,user}@bigtop.apache.org
        #bigtop on Freenode
    Roman Shaposhnik
        rvs@apache.org, rvs@cloudera.com



                                 30
