Making sense of Apache Bigtop, ODPi and
why it all matters to Apache Apex
Roman Shaposhnik, rvs@apache.org,
@rhatr
Director of Open Source Strategy,
Pivotal Inc.
A slide deck built via the “Apache Way”
• Bigtop community contributors
• Roman Shaposhnik
• Konstantin Boudnik
• Nate D'Amico
• Evans Ye & Darren Chen (Trend Micro)
What is Apache Bigtop?
• Apache Bigtop is to Hadoop what Debian is to Linux
• A 100% open, community-driven distribution of a big data
management platform based on Apache Hadoop
• A place where all communities around big data come
together
• The thing everybody (Pivotal, Cloudera, Hortonworks,
WANDisco, IBM, Amazon, TrendMicro) is building off of
• A cutting edge, quickly evolving distribution and a set
of tools
[Diagram] Debian: GNU Software on the Linux kernel
[Diagram] Bigtop: Hadoop Ecosystem (Pig, Hive, Spark) on Hadoop (HDFS + YARN + MR)
ODPi is a nonprofit organization committed to simplification &
standardization of the big data ecosystem with a common reference
specification called ODPi Core.
As a shared industry effort, ODPi is focused on promoting and advancing the state of Apache Hadoop®
and Big Data Technologies for the Enterprise.
[Timeline] February 2015 · September 2015 · December 2015
What has ODPi done so far (1.0.1)?
• Runtime specification
• https://github.com/odpi/specs/blob/master/ODPi-Runtime.md
• Validation testsuite
• http://repo.odpi.org/ODPi/1.0/acceptance-tests/
• Reference implementation binaries
• http://repo.odpi.org/ODPi/1.0/{centos6, ubuntu-14.04}
What are we working on?
• Operations specification
• https://github.com/odpi/specs/blob/master/ODPi-Operations.md
• ISV “ODPi compatible” policy
• Expanding ODPi core beyond Apache Hadoop & Ambari
• Hive
• ????
• How can you help?
• Share use cases
• Test against reference implementation
• Contribute to upstream ASF projects
What’s in Bigtop?
• A set of binary packages
• just like CDH/PHD/HDP/ODPi/etc.
• Integration code
• Packaging code
• Deployment code
• Orchestration code
• Validation code
• Continuous Integration infrastructure
Integration/packaging
• Linux packages
• RPM, DEB
• RHEL/CentOS(Fedora), SLES(OpenSUSE), Debian, Ubuntu
• VM images for VirtualBox, VMware, etc.
• Challenge: Linux packaging is node-centric
• “smart” tarballs
• Docker or BOSH images
Integration testing based on iTest
• Clean-room provisioning
• these ain’t your gramp’s unit tests
• Versioned test artifacts
• JVM-based test artifacts
• Matching stacks of components and integration tests
• Plug’n’play architecture: Gradle/Groovy, JARs/artifacts
Puppet 3.x deployment
• Master-less puppet
• $ puppet apply bigtop-deploy/puppet/manifests/site.pp # on each node
• Cluster topology is kept in Hiera
bigtop::hadoop_head_node: "hadoopmaster.example.com"
hadoop::hadoop_storage_dirs:
  - "/mnt"
hadoop_cluster_node::cluster_components:
  - yarn
  - zookeeper
bigtop::bigtop_repo_uri: "http://bigtop-
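A minimal sketch of how the topology keys above could be generated from a script rather than written by hand. The helper name `render_hiera` is hypothetical and not part of Bigtop; only the key names and sample values come from the slide. The repo URI is left unset because the slide truncates it.

```python
def render_hiera(head_node, storage_dirs, components, repo_uri=None):
    """Render the Hiera keys from the slide as YAML text (hypothetical helper)."""
    lines = [f'bigtop::hadoop_head_node: "{head_node}"']
    lines.append("hadoop::hadoop_storage_dirs:")
    lines.extend(f'  - "{d}"' for d in storage_dirs)
    lines.append("hadoop_cluster_node::cluster_components:")
    lines.extend(f"  - {c}" for c in components)
    # The slide cuts the repo URI short, so it is only emitted when supplied.
    if repo_uri is not None:
        lines.append(f'bigtop::bigtop_repo_uri: "{repo_uri}"')
    return "\n".join(lines) + "\n"


site_yaml = render_hiera(
    head_node="hadoopmaster.example.com",
    storage_dirs=["/mnt"],
    components=["yarn", "zookeeper"],
)
print(site_yaml)
```

Since the deployment is master-less, the same rendered file would presumably be copied to every node before running `puppet apply` there.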
One click Bigtop provisioning
Who is this for?
• For Hadoop app developers, cluster admins, users
• Run a Hadoop cluster to test your code on
• Try & test configurations before applying to Production
• Play around with Bigtop Big Data Stack
• For contributors
• Easy to test your packaging, deployment, testing code
• For vendors
• CI out of the box -> patching upstream code made easier
Works great, but…
• Need to add vagrant public key into docker images
• Too many issues with auto-created boot2docker
hosting VM
• A bug in the docker provider has stayed open for almost 2 years
• 'Waiting for machine to boot' hangs indefinitely
• Cannot share the same code across different providers anyway
• Not all docker options are supported in the Vagrantfile
• Does not support Docker Swarm
Docker Compose
Implementation
• Create docker containers:
• docker-compose scale bigtop=3
• Volumes:
• Bigtop Puppet configurations
• Bigtop Puppet code
• /etc/hosts
• Compatible with Docker Machine and Swarm
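One plausible reason for mounting `/etc/hosts` as a volume is that every container then resolves its peers from a single shared file. A rough sketch of building such a file follows; the container names, IP addresses, and domain are made up for illustration, and `render_hosts` is a hypothetical helper, not Bigtop code.

```python
def render_hosts(containers, domain="example.com"):
    """Build /etc/hosts content from (name, ip) pairs (hypothetical helper)."""
    lines = ["127.0.0.1 localhost"]
    for name, ip in containers:
        # One line per container: IP, fully qualified name, short name.
        lines.append(f"{ip} {name}.{domain} {name}")
    return "\n".join(lines) + "\n"


# Three containers, matching the "scale bigtop=3" example; values are invented.
nodes = [
    ("bigtop1", "172.17.0.2"),
    ("bigtop2", "172.17.0.3"),
    ("bigtop3", "172.17.0.4"),
]
hosts = render_hosts(nodes)
print(hosts)
```

In a real setup the addresses would have to be read from the running containers instead of being hard-coded.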
Docker Machine and Swarm
Juju orchestration
$ juju bootstrap
$ juju deploy hadoop-processing
https://jujucharms.com/hadoop-processing/
Juju orchestration
$ juju add-unit slave -n 2
Juju orchestration
$ juju action do namenode/0 smoke-test
$ juju action do resourcemanager/0 smoke-test
$ watch -n 0.5 juju action status
Early Mission Accomplished
Foundation for commercial Hadoop distros/services
Leveraged by app providers…
Blueprints for data engineering
• BigPetStore
• Data Generator
• Examples using tools in Hadoop ecosystem to process
data
• Build system and tests for integrating tools and multiple
JVM languages
• Started by Dr. Jay Vyas, principal software engineer at
Red Hat, Inc.
Datamodel
Transaction Purchase Model
Lambda/Stream Architectures
HDFS + Zookeeper +
New focus and target end users
Data engineers vs distro builders
Enhance Operations/Deployment
Reference implementations & tutorials
Data data data…
Smarter/Realistic test data
-bigpetstore
-bigtop-bazaar
-weather data gen
Tutorial/Learning Data sets
-githubarchive.org
-more tbd…
Thank You, Q&A

