Webhooks
Near-real time event processing with guaranteed delivery of HTTP callbacks
HBaseCon 2015
Alan Steckley
Principal Software Engineer, Salesforce

Poorna Chandra
Software Engineer, Cask

Safe Harbor
Safe harbor statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter. These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.

Overview
● Salesforce Marketing Cloud
● Webhooks use case
● Implementation in CDAP
● Q&A

What is the Salesforce Marketing Cloud?
● Connects businesses to their customers through email, social media, and SMS
● 1+ billion personalized messages per day
● Hundreds of thousands of business units
● Billions of subscribers
● Hosts petabytes of customer data in our data centers
● Handles a wide range of communications
○ Marketing campaigns
○ Purchase confirmations
○ Financial notifications
○ Password resets

What is Webhooks?
● Webhooks is a near-real-time event delivery platform with guaranteed delivery
○ Subscribers generate events by engaging with messages
○ Events are delivered to customers over HTTP within seconds
○ Customers react to events in near real time

Example use case
1. A purchase receipt email fails to be delivered
2. A mail bounce event is pushed to a service hosted by the retailer
3. Retailer’s customer service is immediately aware of the failure

General problem statement
1. Process a stream of near-real-time events based on customer-defined actions.
2. Guarantee delivery of processed events emitted to third-party systems.

Primary concerns
● High data integrity
● Commerce, health, and finance messaging subject to government regulation
● Horizontal scalability
● Short time to market
● Accessible developer experience
● Existing Hadoop/YARN/HBase expertise and infrastructure
● Open Source

Implementation concern - Joins
● Some events need pieces of information from other event streams
● Example: An email click needs the email send event for contextual information
● Wait until other events arrive to assemble the final event
● Join across streams
● Configurable TTL to wait to join (optional)

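The join-with-TTL idea on this slide can be sketched as a small in-memory joiner. This is only an illustration under stated assumptions: `TtlJoiner`, the string-valued events, the explicit `nowMillis` argument, and the limit of one waiting response per key are all hypothetical and not from the talk, which keeps send events in an HBase-backed store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory joiner; a response that arrives before its send waits up to a TTL. */
class TtlJoiner {
    /** A response event that is still waiting for its send event. */
    private static final class Waiting {
        final String response;
        final long arrivedAtMillis;

        Waiting(String response, long arrivedAtMillis) {
            this.response = response;
            this.arrivedAtMillis = arrivedAtMillis;
        }
    }

    private final Map<String, String> sends = new HashMap<>();
    private final Map<String, Waiting> waitingResponses = new HashMap<>();
    private final long ttlMillis;

    TtlJoiner(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** A response joins immediately when its send is already known; otherwise it waits. */
    Optional<String> onResponse(String joinKey, String response, long nowMillis) {
        String send = sends.get(joinKey);
        if (send != null) {
            return Optional.of(send + "|" + response);
        }
        // Simplification: only the most recent waiting response per key is kept.
        waitingResponses.put(joinKey, new Waiting(response, nowMillis));
        return Optional.empty();
    }

    /** A send is stored; a waiting response is joined only if it has not outlived the TTL. */
    Optional<String> onSend(String joinKey, String send, long nowMillis) {
        sends.put(joinKey, send);
        Waiting waiting = waitingResponses.remove(joinKey);
        if (waiting != null && nowMillis - waiting.arrivedAtMillis <= ttlMillis) {
            return Optional.of(send + "|" + waiting.response);
        }
        return Optional.empty();
    }
}
```

Passing the clock in as a parameter keeps the sketch deterministic; a real implementation would also need to evict expired entries and persist its state.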
Implementation concern - Delivery guarantees
● Configurable per customer endpoint
● Retry
● Throttle
● TTL to deliver (optional)
● Reporting metrics, SLA compliance

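A per-endpoint retry policy with an optional delivery TTL might look roughly like the following. Everything here is an assumption made for illustration: `EndpointPolicy`, `RetryingDelivery`, the doubling backoff, and the three outcomes are hypothetical names and choices, not the talk's design. Throttling and durable storage of undelivered events are left out.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

/** Hypothetical per-endpoint settings: attempt limit, first backoff, and delivery TTL. */
class EndpointPolicy {
    final int maxAttempts;
    final long initialBackoffMillis;
    final long ttlMillis;

    EndpointPolicy(int maxAttempts, long initialBackoffMillis, long ttlMillis) {
        this.maxAttempts = maxAttempts;
        this.initialBackoffMillis = initialBackoffMillis;
        this.ttlMillis = ttlMillis;
    }
}

class RetryingDelivery {
    enum Outcome { DELIVERED, EXPIRED, GAVE_UP }

    /**
     * Calls attempt until it succeeds, the attempt limit is reached, or the next
     * backoff would exceed the TTL. Only time spent waiting counts toward the TTL;
     * the duration of the attempts themselves is ignored in this sketch.
     */
    static Outcome deliver(BooleanSupplier attempt, EndpointPolicy policy, LongConsumer sleeper) {
        long waitedMillis = 0;
        long backoffMillis = policy.initialBackoffMillis;
        for (int i = 1; i <= policy.maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return Outcome.DELIVERED;
            }
            if (i == policy.maxAttempts) {
                break; // no attempts left, so do not wait again
            }
            if (waitedMillis + backoffMillis > policy.ttlMillis) {
                return Outcome.EXPIRED;
            }
            sleeper.accept(backoffMillis);
            waitedMillis += backoffMillis;
            backoffMillis *= 2; // exponential backoff between attempts
        }
        return Outcome.GAVE_UP;
    }
}
```

The `sleeper` parameter stands in for `Thread.sleep` so the loop can be exercised without real delays, for example `RetryingDelivery.deliver(() -> tryPost(), policy, ms -> {})`, where `tryPost` is whatever performs the HTTP call.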
High level architecture
[Figure: pipeline diagram with the labels Kafka Source, Ingest, Join, Route, Store, HTTP POST, and External System]

Business logic
    import java.util.Map;

    public class EventRouter {
        // Delivery route for each client, keyed by client ID to match the lookup below
        private Map<ClientId, Route> routesMap;

        public void process(Event e) {
            Route route = routesMap.get(e.clientId());
            if (route != null) {
                httpPost(e, route);
            }
        }
    }

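The slide leaves `httpPost(e, route)` undefined. One way such a call could be written with the JDK's built-in HTTP client (Java 11 or later) is sketched below. The `WebhookPoster` class, the JSON body, the 5-second timeout, and the rule that any 2xx status counts as delivered are assumptions for illustration, not details from the talk.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper showing what an httpPost(event, route) call might do. */
class WebhookPoster {
    private final HttpClient client = HttpClient.newHttpClient();

    /** Builds a POST request carrying the event as a JSON body (payload format is assumed). */
    static HttpRequest buildPost(String endpointUrl, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(endpointUrl))
                .timeout(Duration.ofSeconds(5))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    /** Treats any 2xx status as a successful delivery. */
    static boolean isDelivered(int statusCode) {
        return statusCode >= 200 && statusCode < 300;
    }

    /** Returns false on a non-2xx status or an I/O failure so the caller can retry later. */
    boolean post(String endpointUrl, String jsonBody) {
        try {
            HttpResponse<Void> response =
                    client.send(buildPost(endpointUrl, jsonBody), HttpResponse.BodyHandlers.discarding());
            return isDelivered(response.statusCode());
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

Returning a boolean instead of throwing keeps the retry decision with the caller, which fits a design where retry behavior is configured per endpoint.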
Business logic
    public class EventJoiner {
        // Send events seen so far, keyed by the key shared with their responses
        private Map<JoinKey, SendEvent> sends;

        public void process(ResponseEvent e) {
            SendEvent send = sends.get(e.getKey());
            if (send != null) {
                Event joined = join(send, e);
                routeEvent(joined);
            }
        }
    }

How to scale?
● Scaling the data store is easy: use HBase
● Scaling the application involves
○ Transactions
○ Application stack
○ Lifecycle management
○ Data movement
○ Coordination

Cask Data Application Platform (CDAP)
● An open source framework to build and deploy data applications on Apache™ Hadoop®
● Provides abstractions to represent data access and processing pipelines
● Framework level guarantees for exactly-once semantics
● Transaction support on HBase
● Supports real time and batch processing
● Built on YARN and HBase

Webhooks in CDAP
Business logic
    public class EventJoiner {
        private Map<JoinKey, SendEvent> sends;

        public void process(ResponseEvent e) {
            SendEvent send = sends.get(e.getKey());
            if (send != null) {
                Event joined = join(send, e);
                routeEvent(joined);
            }
        }
    }

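Business logic in CDAP - Flowlet
    import co.cask.cdap.api.annotation.ProcessInput;
    import co.cask.cdap.api.annotation.UseDataSet;
    import co.cask.cdap.api.flow.flowlet.AbstractFlowlet;
    import co.cask.cdap.api.flow.flowlet.OutputEmitter;

    public class EventJoiner extends AbstractFlowlet {
        // Dataset of previously seen send events, injected by CDAP
        @UseDataSet("sends")
        private SendEventDataset sends;

        // Emits joined events to the queue read by the next Flowlet
        private OutputEmitter<Event> outQueue;

        @ProcessInput
        public void join(ResponseEvent e) {
            SendEvent send = sends.get(e.getKey());
            if (send != null) {
                Event joined = join(send, e);
                outQueue.emit(joined);
            }
        }
    }
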
Access data with Datasets
In the EventJoiner Flowlet above, the @UseDataSet("sends") annotation gives the Flowlet access to the "sends" Dataset through the SendEventDataset field, and the process method reads from it with sends.get(e.getKey()).

Chain Flowlets with Queues
In the same EventJoiner Flowlet, the OutputEmitter<Event> field outQueue links the Flowlet to the next one: each call to outQueue.emit(joined) places the joined event on the queue that the downstream Flowlet consumes.

Tigon Flow
● Real time streaming processor
● Composed of Flowlets
● Exactly-once semantics
[Figure: an Event Joiner Flowlet and an Event Router Flowlet connected by HBase Queues, with "Start Tx" and "End Tx" markers around each Flowlet]

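The "Start Tx" and "End Tx" markers suggest that dequeuing an event, processing it, and emitting results happen as one unit. A minimal single-threaded sketch of that idea follows. `TransactionalStep` and its in-memory deques are hypothetical stand-ins. In the system described, the queues live in HBase and the atomicity comes from the platform's transactions, not from code like this.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.BiConsumer;

/** Hypothetical processing step: the dequeue and the emits take effect together or not at all. */
class TransactionalStep<I, O> {
    private final Deque<I> input;
    private final Deque<O> output;
    private final BiConsumer<I, List<O>> handler;

    TransactionalStep(Deque<I> input, Deque<O> output, BiConsumer<I, List<O>> handler) {
        this.input = input;
        this.output = output;
        this.handler = handler;
    }

    /** Processes the head of the input queue; returns true only if the step committed. */
    boolean processOne() {
        I event = input.peekFirst(); // "start tx": read the event without removing it
        if (event == null) {
            return false;
        }
        List<O> emitted = new ArrayList<>(); // emits stay invisible until commit
        try {
            handler.accept(event, emitted);
        } catch (RuntimeException e) {
            return false; // rollback: the event stays queued and buffered emits are dropped
        }
        input.pollFirst(); // "end tx": commit the dequeue and the emits together
        output.addAll(emitted);
        return true;
    }
}
```

Because a failed attempt leaves no output behind and the event is still at the head of the queue, a retry produces each result once downstream, which is the effect the slide calls exactly-once semantics.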
Scaling Flowlets
[Figure: multiple Event Joiner Flowlets and Event Router Flowlets running in YARN Containers and connected by an HBase Queue, with the labels FIFO, Round Robin, and Hash Partitioning]

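The Round Robin and Hash Partitioning labels on this slide name two common ways to spread queue entries across Flowlet instances. The sketch below shows the arithmetic behind each. `QueuePartitioner` and its method names are hypothetical and are not CDAP's API. FIFO is left out of the sketch.

```java
/** Hypothetical helper that picks which consumer instance receives a queue entry. */
class QueuePartitioner {
    /** Hash partitioning: entries with the same key always go to the same instance. */
    static int hashPartition(String key, int instanceCount) {
        // floorMod keeps the result non-negative even when hashCode() is negative
        return Math.floorMod(key.hashCode(), instanceCount);
    }

    /** Round robin: entries are dealt out to instances in turn by sequence number. */
    static int roundRobin(long sequenceNumber, int instanceCount) {
        return (int) Math.floorMod(sequenceNumber, (long) instanceCount);
    }
}
```

Hash partitioning is the natural fit when all events for one key, such as one join key, must reach the same instance. Round robin spreads load evenly when no such affinity is needed.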
Summary
● CDAP makes development easier by handling the overhead of scalability
○ Transactions
○ Application stack
○ Lifecycle management
○ Data movement
○ Coordination

Datasets and Tephra
Data abstraction using Dataset
● Store and retrieve data
● Reusable data access patterns
● Abstraction of underlying data storage
○ HBase
○ LevelDB
○ In-memory
● Can be shared between Flows (real-time) and MapReduce (batch)
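The point about abstracting the underlying storage can be illustrated with a plain Java interface. The names `SendEventStore`, `InMemorySendEventStore`, and `StoreBackedJoiner` are hypothetical, and this is not CDAP's Dataset API. It only shows how business logic can stay the same while the storage behind it changes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical storage abstraction: callers do not know which backend holds the data. */
interface SendEventStore {
    void put(String joinKey, String sendEvent);

    /** Returns the stored send event, or null when the key is absent. */
    String get(String joinKey);
}

/** In-memory backend, suitable for tests; an HBase-backed class could implement the same interface. */
class InMemorySendEventStore implements SendEventStore {
    private final Map<String, String> data = new HashMap<>();

    @Override
    public void put(String joinKey, String sendEvent) {
        data.put(joinKey, sendEvent);
    }

    @Override
    public String get(String joinKey) {
        return data.get(joinKey);
    }
}

/** Business logic written only against the interface. */
class StoreBackedJoiner {
    private final SendEventStore sends;

    StoreBackedJoiner(SendEventStore sends) {
        this.sends = sends;
    }

    /** Returns the joined event, or null when the send has not been stored. */
    String join(String joinKey, String response) {
        String send = sends.get(joinKey);
        return send == null ? null : send + "|" + response;
    }
}
```

Swapping the backend then means constructing `StoreBackedJoiner` with a different `SendEventStore` implementation, with no change to the join logic.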
Transaction support with Tephra
● Transactions make exactly-once semantics possible
● Transactions span multiple rows and HBase regions
● Optimistic concurrency control (Omid style)
● Open source (Apache 2.0 License)
● http://tephra.io

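Optimistic concurrency control lets transactions run without locks and checks for conflicts only at commit time. The sketch below shows that general technique with a write-set check. `OptimisticTxManager` is a hypothetical, single-threaded illustration and is not Tephra's API or its actual algorithm.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical transaction manager that detects write-write conflicts at commit time. */
class OptimisticTxManager {
    /** A transaction records when it started and which keys it wrote. */
    static final class Tx {
        final long startTimestamp;
        final Set<String> writeSet = new HashSet<>();

        Tx(long startTimestamp) {
            this.startTimestamp = startTimestamp;
        }
    }

    private long clock = 0; // logical timestamp shared by starts and commits
    private final Map<String, Long> lastCommit = new HashMap<>(); // key -> last commit timestamp

    Tx begin() {
        return new Tx(++clock);
    }

    /** Commits unless another transaction committed a write to the same key after this one started. */
    boolean commit(Tx tx) {
        for (String key : tx.writeSet) {
            Long committedAt = lastCommit.get(key);
            if (committedAt != null && committedAt > tx.startTimestamp) {
                return false; // conflict: the caller aborts and may retry
            }
        }
        long commitTimestamp = ++clock;
        for (String key : tx.writeSet) {
            lastCommit.put(key, commitTimestamp);
        }
        return true;
    }
}
```

When two transactions that started concurrently both write the same row, the first to commit wins and the second is rejected, so no locks are held while the work is in progress.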
Use and contribute
● Used today in enterprise cloud applications
● CDAP is open source (Apache 2.0 License)
http://cdap.io/

Q&A
Alan Steckley
asteckley@salesforce.com
http://salesforce.com

Poorna Chandra
poorna@cask.co
http://cdap.io
