The DAP – Where YARN, HBase, Kafka and Spark
Go to Production
Hadoop Summit - June 30th, 2016
cask.co
Cask, CDAP, Cask Hydrator and Cask Tracker are trademarks or registered trademarks of Cask Data. Apache Spark, Spark, the Spark logo, Apache Hadoop, Hadoop and the Hadoop logo are trademarks or registered trademarks of the Apache Software Foundation. All other trademarks and registered trademarks are the property of their respective owners.
About Me
The Many Faces of Hadoop
● Developer: Advanced Programming; Focuses on App Logic
● Data Scientist: Basic Programming; Focuses on Data
● IT Pro / Ops: Configuration & Monitoring; Focuses on Operations
● LOB Manager: Analysis & Decision Making; Focuses on Insights
Big Data Challenges
Building a Big Data App
Deploying and Operating a Big Data App
Today’s Integration Solutions Are Siloed
Data Integration, App Integration, Cloud Integration, Governance
Introducing the DAP
Enter Cask
Key Customers and Partners
Named a Gartner Cool Vendor 2016
Founded in 2011 by early Hadoop engineers from Facebook and Yahoo!
Introducing the Cask Data App Platform
CDAP Overview
Open Source, Integrated Framework for Building and Running
Data Applications on Hadoop and Spark
● Provides a platform with framework-level correctness
● Dataset abstractions & self-service data
● One framework: Prototype to Production
● Unified approach across all paradigms
  ○ Metrics & Log collection
  ○ Lineage, Audit, Access Control
CDAP Consolidates Big Data App Lifecycle
CDAP Extensions
CDAP Architecture
● Application Container Architecture
● Reusable Programming Abstractions
● Global User and Machine Metadata
CDAP Application Structure
CDAP Deployment Architecture
Hadoop in the Enterprise – Simplified with CDAP
Common Use Cases
Summary
Thank You!
