In-Memory Streaming, Storage & Analytics
Apache Apex + Apache Geode

Thomas Weise, Ashish Tadose
•  In-memory Stream Processing
•  Partitioning and Scaling out
•  Windowing Support (temporal)
•  Stateful Fault-tolerance, Operability
•  Processing Guarantees
•  Compute Locality
•  Dynamic updates

Apex Features …
Apex Platform Overview
Application Programming Model
§  Stream is a sequence of data tuples
§  Operator takes one or more input streams, performs computations & emits one or more output streams
–  Each Operator is YOUR custom business logic in Java, or a built-in operator from our open source library
–  Operator has many instances that run in parallel and each instance is single-threaded
§  Directed Acyclic Graph (DAG) is made up of operators and streams
–  Iterative processing supported
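The programming model above can be sketched in plain Java. This is only an illustration of operators connected by streams, not the Apex API: SketchOperator, ParseOperator, DoubleOperator and DagSketch are hypothetical names, and everything runs in one thread instead of in parallel operator instances.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for an operator with one input port and one output port.
abstract class SketchOperator<I, O> {
    private Consumer<O> downstream = t -> {};

    // A "stream" is modeled as a direct hand-off to the next consumer.
    void connect(Consumer<O> next) { downstream = next; }

    protected void emit(O tuple) { downstream.accept(tuple); }

    // Custom business logic goes here.
    abstract void process(I tuple);
}

// Parses a text tuple into an integer tuple.
class ParseOperator extends SketchOperator<String, Integer> {
    @Override void process(String tuple) { emit(Integer.parseInt(tuple.trim())); }
}

// Doubles each integer tuple.
class DoubleOperator extends SketchOperator<Integer, Integer> {
    @Override void process(Integer tuple) { emit(tuple * 2); }
}

class DagSketch {
    // Wires parse -> double -> sink and pushes the input through the graph.
    static List<Integer> run(List<String> input) {
        ParseOperator parse = new ParseOperator();
        DoubleOperator twice = new DoubleOperator();
        List<Integer> sink = new ArrayList<>();
        parse.connect(twice::process);
        twice.connect(sink::add);
        for (String tuple : input) {
            parse.process(tuple);
        }
        return sink;
    }
}
```

Calling DagSketch.run with the tuples "1", " 2 " and "3" pushes each one through both operators in order and collects the doubled values in the sink.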
[Figure: Directed Acyclic Graph (DAG) of operators connected by streams of tuples, ending in an output stream]
Apache Apex-Malhar
Apex Native Hadoop Integration
YARN is the resource manager

HDFS used for storing any persistent state
•  Operator state is checkpointed to a persistent store
–  Automatically performed by engine, no additional work needed by operator
–  In case of failure operators are restarted from checkpoint state
–  Frequency configurable per operator
–  Asynchronous and distributed by default
–  Default store is HDFS
•  Automatic detection and recovery of failed operators
–  Heartbeat mechanism
•  Buffering mechanism to ensure replay of data from recovered point so that there is no loss of data
•  Application master state checkpointed

Apex Fault Tolerance
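A minimal sketch of the checkpoint-and-replay idea, assuming a single operator whose state is a running sum. The names SumOperator and CheckpointSketch are hypothetical, the "persistent store" and the upstream buffer are plain fields, and none of this is the Apex engine's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stateful operator: its whole state is a running sum.
class SumOperator {
    long sum;
    void process(long tuple) { sum += tuple; }
}

class CheckpointSketch {
    // Stand-ins for the persistent checkpoint store and the upstream buffer.
    private long checkpointedSum = 0;
    private int checkpointedOffset = 0;
    private final List<Long> buffer = new ArrayList<>();

    private SumOperator operator = new SumOperator();
    private int processed = 0;

    // Upstream keeps every tuple in the buffer so it can be replayed later.
    void send(long tuple) {
        buffer.add(tuple);
        operator.process(tuple);
        processed++;
    }

    // Saves the operator state together with the stream position it reflects.
    void checkpoint() {
        checkpointedSum = operator.sum;
        checkpointedOffset = processed;
    }

    // Simulates a failure: the operator instance is lost, a new one is restored
    // from the checkpoint, and buffered tuples after the checkpoint are replayed.
    void failAndRecover() {
        operator = new SumOperator();
        operator.sum = checkpointedSum;
        for (int i = checkpointedOffset; i < buffer.size(); i++) {
            operator.process(buffer.get(i));
        }
        processed = buffer.size();
    }

    long currentSum() { return operator.sum; }
}
```

Because the replay starts exactly at the checkpointed position, the recovered sum matches what it was before the failure, which is the "no loss of data" property the slide describes.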
At-least-once
•  On recovery data will be replayed from a previous checkpoint
–  No messages lost
–  Default, suitable for most applications
•  Can be used to ensure data is written once to store
–  Transactions with meta information, Rewinding output, Feedback from external entity, Idempotent operations
At-most-once
•  On recovery the latest data is made available to operator
–  Useful where data loss is acceptable and latest data is sufficient
Exactly-once
–  At-least-once + idempotency + transactional mechanisms (operator logic) to achieve end-to-end exactly once behavior

Apex Processing Semantics
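To illustrate why at-least-once delivery combined with idempotent operations can give an exactly-once result, here is a small sketch. IdempotentSinkSketch is a hypothetical name and the external store is modeled as an in-memory map. A keyed put is unaffected by replayed tuples, while a plain counter is not.

```java
import java.util.HashMap;
import java.util.Map;

class IdempotentSinkSketch {
    // Stand-in for an external key-value store.
    final Map<String, Integer> store = new HashMap<>();

    // Non-idempotent side effect, kept only for contrast.
    int writeCount = 0;

    void write(String key, int value) {
        store.put(key, value); // idempotent: replaying the same tuple leaves the same state
        writeCount++;          // not idempotent: grows every time a tuple is replayed
    }
}
```

If the tuple ("b", 2) is delivered twice after a recovery, the map still holds one entry for "b", but the counter has advanced twice. Output logic that is not naturally idempotent needs the transactional mechanisms listed above.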
•  Data flow in-memory, no disk
•  Incremental recovery – buffer server
•  In-memory data for querying data

IMC Benefits for Apex
Streaming meets In Memory Data Grid
Apex + Geode Integration
Completed

•  Operator check-pointing in Geode.
•  Output operator to store tuples in Geode region.

Proposed

•  Geode output operator with Transactional support.
•  Ingest data from Geode to Apex DAG.
•  Distributed Cache Operator.
•  Scan operator for parallel query execution & result retrieval.
Operator Checkpointing in Geode
Apex Operator check-pointing in an IMDG (Geode store)

•  Checkpointing is an essential mechanism to ensure Fault Tolerance
•  Apex checkpoints operator state to HDFS
•  Slower HDFS checkpointing hurts application performance
•  Checkpointing in Geode ensures that application performance is not impacted
•  Geode has better latency for write operations than HDFS.
Implementation: GeodeStorageAgent
https://issues.apache.org/jira/browse/APEXCORE-283
Data Streams to Geode Store
Apex Output Operator to write to Geode store

•  Apex Output operator – Egress data from Apex DAG to external store
•  Store incoming tuples in binary / POJO format in Geode region
•  Geode Efficient Query integration – OQL
•  Geode region supports data replication, overflow to disk, persistence & many eviction strategies
Implementation: GeodeStore, GeodePOJOPutOperator, AbstractGeodePutOperator
https://malhar.atlassian.net/projects/MLHR/issues/MLHR-1942
Geode Transactional writes
Apex Output Operator to write to Geode store with Transactions

•  Apex DAG uses TransactionableStore to guarantee that records are written exactly once, e.g. JdbcTransactionalStore
•  Geode provides Transaction support for efficient and safe coordinated operations
•  A Geode store using transactions guarantees that records are written exactly once
•  Put operator backed by a GeodeTransactional store can help to achieve Exactly once semantics
Implementation: GeodeWindowStore as TransactionableStore
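One common way a window-based transactional store can avoid duplicate writes on replay is to commit each window's batch together with its window id, and to skip any window whose id is already committed. The sketch below shows that idea only. WindowedStoreSketch is a hypothetical class, the transaction is modeled with a synchronized method, and it is not the actual GeodeWindowStore or TransactionableStore code.

```java
import java.util.ArrayList;
import java.util.List;

class WindowedStoreSketch {
    private final List<String> committed = new ArrayList<>();
    private long committedWindowId = -1;

    // Applies the batch and records the window id as one atomic step
    // (a stand-in for a store transaction). Returns false for a replayed window.
    synchronized boolean commitWindow(long windowId, List<String> batch) {
        if (windowId <= committedWindowId) {
            return false; // already written before the failure, skip the replay
        }
        committed.addAll(batch);
        committedWindowId = windowId;
        return true;
    }

    synchronized List<String> contents() { return new ArrayList<>(committed); }

    synchronized long committedWindowId() { return committedWindowId; }
}
```

After a recovery the operator may see window 2 again. Because the store already recorded window 2 as committed, the second attempt is ignored and each record ends up in the store once.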
Streaming Geode data in Apex
Apex Input Operator to read from Geode store
•  Apex Input operators – Ingest data from external sources into Apex DAG
•  Geode provides versatile and reliable event distribution to provide Real Time updates to data
•  Use case – Apex operator to stream async events from Geode in DAG
•  Call back events reduce polling cycles over network
Implementation: GeodeRegionStreamOperator
   receives newly added tuples and emits them into the DAG
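The callback-instead-of-polling pattern can be sketched as a thread-safe queue between an event listener and the operator thread. RegionStreamSketch, onEntryCreated and emitTuples are hypothetical names chosen for this illustration. A real operator would register a listener with Geode and emit on an Apex output port.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

class RegionStreamSketch {
    // Events arrive on a listener thread and are drained by the operator thread.
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();

    // Called by the (hypothetical) region event callback when an entry is added.
    void onEntryCreated(String value) {
        pending.add(value);
    }

    // Called by the operator thread: drains at most max buffered events, in arrival order.
    List<String> emitTuples(int max) {
        List<String> out = new ArrayList<>();
        String next;
        while (out.size() < max && (next = pending.poll()) != null) {
            out.add(next);
        }
        return out;
    }
}
```

The operator never asks the store whether anything changed. It only drains what the callbacks have already delivered, so no network round trip is spent on empty polls.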
Geode Cache Operator
Apex Geode Cache Operator
•  Geode provides efficient Events & Notifications
•  Register interest – update local copies
•  Continuous Query
•  Receive notification when Query condition met on server
•  E.g. SELECT * FROM /tradeOrder t WHERE t.price > 100.00
•  Use Geode events notification framework to maintain & invalidate cache.
Implementation: GeodeCacheOperator
   maintains consistent cache based on subscribed keyset/query
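A rough sketch of how a local cache could be kept consistent from continuous-query style notifications, using the price > 100.00 condition from the example query. CqCacheSketch and its method names are hypothetical, and the events are delivered by direct method calls instead of by Geode.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

class CqCacheSketch {
    private final Predicate<Double> condition;
    private final Map<String, Double> cache = new HashMap<>();

    CqCacheSketch(Predicate<Double> condition) {
        this.condition = condition;
    }

    // Create or update event: keep the entry only while it satisfies the query condition.
    void onPut(String key, double price) {
        if (condition.test(price)) {
            cache.put(key, price);
        } else {
            cache.remove(key); // entry no longer matches, invalidate the local copy
        }
    }

    // Destroy event: drop the local copy.
    void onDestroy(String key) {
        cache.remove(key);
    }

    Map<String, Double> snapshot() {
        return new HashMap<>(cache);
    }
}
```

With the condition price > 100.00, an order that is updated to a price at or below 100.00 drops out of the local cache, so the cache always mirrors the current result set of the query.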
Geode Scan Operator
Apex Geode Scan Operator
•  Function Execution provides Parallel Query Execution
•  MapReduce like execution - concurrent execution on members & results are collected from members & sent to caller.
•  Use case: Streaming application depending on large scan result from external store
Implementation: GeodeQueryOperator
   execute data dependent queries on distributed region
   emit results in DAG
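The MapReduce-like execution described above can be approximated in plain Java with a thread pool: each "member" filters its own partition concurrently, and the caller collects the partial results. ParallelScanSketch is a hypothetical name, and the partitions are in-memory lists, not Geode region buckets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

class ParallelScanSketch {
    // Runs the filter on every partition concurrently, then gathers the
    // partial results in partition order and returns them to the caller.
    static List<Integer> scan(List<List<Integer>> partitions, Predicate<Integer> filter) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<List<Integer>>> futures = new ArrayList<>();
            for (List<Integer> partition : partitions) {
                futures.add(pool.submit(() -> {
                    List<Integer> hits = new ArrayList<>();
                    for (Integer value : partition) {
                        if (filter.test(value)) {
                            hits.add(value);
                        }
                    }
                    return hits;
                }));
            }
            List<Integer> result = new ArrayList<>();
            for (Future<List<Integer>> future : futures) {
                result.addAll(future.get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("scan failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Only the matching values travel back to the caller, which is the main appeal of running the query where the data lives when the scan result is large.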
Join the Apache Geode Community!
•  Check out: http://geode.incubator.apache.org
•  Subscribe: user-subscribe@geode.incubator.apache.org
•  Download: http://geode.incubator.apache.org/releases/
Questions ???
Thank You …

#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics