© 2016 DataTorrent
Chinmay Kolhatkar
Committer, Apache Apex
Engineer, DataTorrent
June 21, 2016
Apache Apex-Bigtop

Agenda
• About Apache Apex
• Apex Platform Overview
• Apex - Native Hadoop Integration
• Apex Malhar Library
• Apex as a Bigtop component
• Installing Bigtop Apex
• Apex Docker sandbox
• Apex Docker sandbox Demo

About Apache Apex
• Platform and runtime engine that enables development of scalable and fault-tolerant distributed applications
• Hadoop native (Hadoop >= 2.2)
    - No separate service to manage stream processing
    - Streaming engine built into the Application Master and containers
• Process streaming or batch big data
• High throughput and low latency
• Library of commonly needed business logic
• Write any custom business logic in your application

Apex Platform Overview

Apex - Native Hadoop Integration
• YARN is the resource manager
• HDFS is used for storing any persistent state

Apex Malhar Library
RDBMS
• Vertica
• MySQL
• Oracle
• JDBC
NoSQL
• Cassandra, HBase
• Aerospike, Accumulo
• Couchbase / CouchDB
• Redis, MongoDB
• Geode
Messaging
• Kafka
• Solace
• Flume, ActiveMQ
• Kinesis, NiFi
File Systems
• HDFS / Hive
• NFS
• S3
Parsers
• XML
• JSON
• CSV
• Avro
• Parquet
Transformations
• Filters
• Rules
• Expression
• Dedup
• Enrich
Analytics
• Dimensional Aggregations (with state management for historical data + query)
Protocols
• HTTP
• FTP
• WebSocket
• MQTT
• SMTP
Other
• Elasticsearch
• Script (JavaScript, Python, R)
• Solr
• Twitter

Apex as a Bigtop component
• Uses the Bigtop framework for ease of deployment
    - Deployment using Puppet recipes and Vagrant
    - Can spawn multi-node clusters on Docker, VMs and OpenStack
• Generates deployable binaries for the Apex engine
    - RPM: CentOS 5 & 6, Fedora 20, openSUSE 42.1
    - DEB: Ubuntu 14.04 & 16.04, Debian 8
• Allows validating installations
    - Package test
    - Smoke test

Installing Bigtop Apex - Bigtop 1.1.0 (Current)
• Add the Bigtop repository
    http://www.apache.org/dist/bigtop/bigtop-1.1.0/repos/
• Install bigtop-hadoop
    - For Debian: apt-get install hadoop*
    - For RPM: yum install hadoop*
• Download bigtop-apex from Bigtop CI
    https://ci.bigtop.apache.org/job/Bigtop-trunk-packages/
• Install Apex
    - For Debian: dpkg -i apex_3.4.0-1_all.deb
    - For RPM: rpm -i apex-3.4.0-1.el6.noarch.rpm
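The Bigtop 1.1.0 steps can be laid out as a dry-run script that only prints the commands for one package family, so they can be reviewed before anything is run as root. This is a minimal sketch, not an official installer. The function name `bigtop_apex_steps`, the per-distro directory names (`trusty`, `centos6`) and the `bigtop.list` / `bigtop.repo` file names are assumptions that do not appear on the slide and should be checked against the actual contents of the repos/ URL. The Apex package is assumed to have been downloaded from Bigtop CI into the current directory.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the Bigtop 1.1.0 + Apex install steps.
# ASSUMPTIONS (not on the slide): the per-distro directory names and the
# bigtop.list / bigtop.repo file names under the repos/ URL.
BIGTOP_REPOS="http://www.apache.org/dist/bigtop/bigtop-1.1.0/repos"

# bigtop_apex_steps FAMILY DISTRO_DIR
#   FAMILY     "deb" or "rpm"
#   DISTRO_DIR directory under repos/ for the target distro (hypothetical)
bigtop_apex_steps() {
  family=$1
  distro_dir=$2
  case "$family" in
    deb)
      echo "wget -O /etc/apt/sources.list.d/bigtop.list $BIGTOP_REPOS/$distro_dir/bigtop.list"
      echo "apt-get update"
      echo "apt-get install 'hadoop*'"
      echo "dpkg -i apex_3.4.0-1_all.deb"
      ;;
    rpm)
      echo "wget -O /etc/yum.repos.d/bigtop.repo $BIGTOP_REPOS/$distro_dir/bigtop.repo"
      echo "yum install 'hadoop*'"
      echo "rpm -i apex-3.4.0-1.el6.noarch.rpm"
      ;;
    *)
      echo "unknown package family: $family" >&2
      return 1
      ;;
  esac
}

# Example: list the steps for a Debian-family host.
bigtop_apex_steps deb trusty
```

The `hadoop*` pattern is quoted in the printed commands so that the shell passes it to the package manager unchanged instead of expanding it against local file names.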

Installing Bigtop Apex - Bigtop 1.2.0 (Next Release)
• Add the Bigtop repository (future URL)
    http://www.apache.org/dist/bigtop/bigtop-1.2.0/repos/
• Install Apex
    - For Debian: apt-get install apex
    - For RPM: yum install apex
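Once Apex is available from the repository, the only per-distro difference is the package manager. The sketch below picks whichever of `apt-get` or `yum` is on the PATH and prints the matching install command. The helper names `detect_pkg_manager` and `apex_install_cmd` are hypothetical, and the script assumes the Bigtop 1.2.0 repository has already been added to the host.

```shell
#!/bin/sh
# Sketch: choose the Apex install command from the available package manager.
# Helper names are illustrative; nothing here is part of Bigtop itself.

# Print "apt-get" or "yum", whichever is found first on PATH.
detect_pkg_manager() {
  if command -v apt-get >/dev/null 2>&1; then
    echo apt-get
  elif command -v yum >/dev/null 2>&1; then
    echo yum
  else
    return 1
  fi
}

# apex_install_cmd MANAGER: print the install command from the slide.
apex_install_cmd() {
  case "$1" in
    apt-get|yum)
      echo "$1 install apex"
      ;;
    *)
      echo "unsupported package manager: $1" >&2
      return 1
      ;;
  esac
}

# Print (rather than run) the command for this host.
if mgr=$(detect_pkg_manager); then
  apex_install_cmd "$mgr"
else
  echo "neither apt-get nor yum found on PATH" >&2
fi
```

Printing the command instead of executing it keeps the sketch safe to run on any machine. Piping the output to `sh` as root would perform the actual install.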

Apex Docker sandbox
• A quick-start Apex Docker image: https://hub.docker.com/r/chinmayk/apex/
• Preconfigured and running components
    - HDFS (namenode, secondarynamenode, datanode)
    - YARN (resourcemanager, nodemanager, timelineserver)
• Preconfigured and installed component
    - Apex
• Get started:
    - Step 1: docker pull chinmayk/apex
    - Step 2: docker run -it chinmayk/apex:ubuntu-14.04
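After the container starts, one way to confirm that the listed HDFS and YARN components are up is to compare the output of `jps` against the expected daemon names. The sketch below does that comparison on `jps`-style input. It assumes the daemons show up under their usual Hadoop 2.x main-class names (the timeline server as `ApplicationHistoryServer`) and that `jps` is available inside the image. Neither is stated on the slide, and the function name `missing_daemons` is hypothetical.

```shell
#!/bin/sh
# Sketch: report which expected Hadoop daemons are absent from `jps` output.
# ASSUMPTION: daemon names below are the usual Hadoop 2.x main classes.
EXPECTED="NameNode SecondaryNameNode DataNode ResourceManager NodeManager ApplicationHistoryServer"

# missing_daemons: reads jps-style lines ("<pid> <MainClass>") on stdin and
# prints each expected daemon that does not appear, one per line.
missing_daemons() {
  seen=$(awk '{print $2}')
  for d in $EXPECTED; do
    # -x matches the whole line, so NameNode does not match SecondaryNameNode.
    echo "$seen" | grep -qx "$d" || echo "$d"
  done
}

# Inside the sandbox container one might run:  jps | missing_daemons
# Here a canned sample stands in for real jps output.
printf '101 NameNode\n102 DataNode\n103 ResourceManager\n' | missing_daemons
```

An empty result means every expected daemon was found. Any names printed are the ones to investigate.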

Apex Docker sandbox (contd.)

Resources
• Apache Apex website - http://apex.apache.org/
• Subscribe - http://apex.apache.org/community.html
• Download - http://apex.apache.org/downloads.html
• Twitter - @ApacheApex; Follow - https://twitter.com/apacheapex
• Facebook - https://www.facebook.com/ApacheApex/
• Meetup - http://www.meetup.com/topics/apache-apex
• SlideShare - http://www.slideshare.net/ApacheApex/presentations
• More Examples - https://github.com/DataTorrent/examples
• Startup Program - Free Enterprise License for Startups, Educational Institutions, Non-Profits - https://www.datatorrent.com/startups/
• Cloud Trial - https://www.datatorrent.com/download/cloud-trial/

We Are Hiring
• jobs@datatorrent.com
• Back-End Engineers
• Front-End Engineers
• QA Automation Engineers
• Solutions Engineers
