BENDING	SPARK	
TOWARDS	
ENTERPRISE NEEDS
ABOUT ME
BORIS TROFIMOV @ SIGMA SOFTWARE
Leading DWH at AOL (Vidible division)
Major expertise: Big Data and Enterprise
Cofounder of Odessa JUG
Passionate follower of Scala
Associate professor at ONPU
CLASSIC APPROACH
USER INTERFACE
SERVICE LAYER
DATA STORE
CLASSIC LAYERS
PRESENTATION LAYER
APPLICATION SERVICES
DOMAIN LAYER
DAL
INFRASTRUCTURE
CQRS
USER	INTERFACE
SERVICE	LAYER
DATA	STORE
COMMAND/WRITE MODEL
READ	MODEL
CQRS NAKED
USER INTERFACE
SERVICE LAYER
WRITE MODEL | READ MODEL
WRITE STORE
COMMAND BUS
COMMAND HANDLER
REPOSITORY
Domain Model
Domain Model
EVENT BUS
EVENT HANDLER
Commands
READ MODEL STORE
QUERY FACADE
Query DTO
MULTIPLE READ MODELS
USER INTERFACE
SERVICE LAYER
WRITE MODEL | READ MODEL
WRITE STORE
COMMAND BUS
COMMAND HANDLER
REPOSITORY
Domain Model
Domain Model
EVENT BUS
EVENT HANDLER
Commands
EVENT HANDLER
READ MODEL STORE 1
READ MODEL STORE 2
QUERY FACADE
QUERY FACADE
BULLETIN BOARD APP
APPLICATION	FEATURES
• ADD	Bulletin	with	specific	author	name	and	message
• VIEW	list	of	published	bulletins
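The two features map naturally onto one command and one query. Below is a minimal in-memory sketch of that split, assuming hypothetical names (`AddBulletin`, `BulletinAdded`, `BulletinDto`, `WriteModel`, `ReadModel`) that are not taken from the demo repository; the command and event buses are reduced to plain function calls.

```scala
import scala.collection.mutable

// Hypothetical names throughout; none are taken from the demo repository.
final case class AddBulletin(author: String, message: String)             // command
final case class BulletinAdded(id: Long, author: String, message: String) // event
final case class BulletinDto(author: String, message: String)             // query DTO

// Read model: a denormalized list kept up to date by an event handler.
final class ReadModel {
  private val bulletins = mutable.ArrayBuffer.empty[BulletinDto]

  // Event handler: projects the event into the query-side representation.
  def on(event: BulletinAdded): Unit =
    bulletins += BulletinDto(event.author, event.message)

  // Query facade: VIEW list of published bulletins.
  def list: List[BulletinDto] = bulletins.toList
}

// Write model: validates the command, assigns an id, and publishes an event.
final class WriteModel(publish: BulletinAdded => Unit) {
  private var nextId = 0L

  // Command handler: ADD bulletin with an author name and a message.
  def handle(cmd: AddBulletin): Either[String, Long] =
    if (cmd.author.trim.isEmpty || cmd.message.trim.isEmpty)
      Left("author and message are required")
    else {
      nextId += 1
      publish(BulletinAdded(nextId, cmd.author, cmd.message))
      Right(nextId)
    }
}
```

In this sketch the read side only changes in response to events, so it could be moved to a separate store or process without touching the command handler.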
SPARK IS COMING
USER INTERFACE
SERVICE LAYER
WRITE MODEL | READ MODEL
Commands
KAFKA
SPARK
COMMAND HANDLER
REPOSITORY
Domain Model
Domain Model
QUERY FACADE
Query DTO
MONGO READ/WRITE STORE (SHARED STORE)
TECHNOLOGY STACK
FAÇADE
• Java	8
• Spring	Boot	&	MVC
• Mongo
COMMAND	PROCESSOR
• Kafka	0.9.0.1
• Scala	2.11	
• Spark	2.0.0	Streaming
DEPLOYMENT VIEW
DOCKER
KAFKA, ZOOKEEPER
MONGO
DEV	MACHINE
LOCAL	SPARK
FACADE
DEMO
https://github.com/btrofimov/spark-enterprise-example/tree/master/nonblocking-bulletinboard
WAIT
WELCOME TO ASYNCHRONOUS HELL
AT	LEAST	FOUR	SOLUTIONS
• C'mon, it's OK
• Poll the service until the entity is added
• Add a push channel and deliver events up to the UI
• Make REST methods block until the command is finished
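The last option can be sketched as a registry of pending commands: the REST layer registers a command id, publishes the command, and waits on a future that is completed when a "command finished" notification comes back. This is only an illustration under stated assumptions; `PendingCommands` and its methods are hypothetical names, and how the notification travels back (Kafka topic, push channel, polling the store) is left open.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration.FiniteDuration
import scala.util.Try

// Hypothetical helper: tracks commands that were sent but not yet confirmed.
final class PendingCommands[R] {
  private val pending = new ConcurrentHashMap[String, Promise[R]]()

  // Called by the REST layer before the command is published.
  def register(commandId: String): Future[R] = {
    val promise = Promise[R]()
    pending.put(commandId, promise)
    promise.future
  }

  // Called by whatever listens for "command finished" notifications.
  // Returns false when nobody is waiting for this id (e.g. after a timeout).
  def complete(commandId: String, result: R): Boolean = {
    val promise = pending.remove(commandId)
    promise != null && promise.trySuccess(result)
  }

  // Blocks the caller until the result arrives or the timeout expires.
  def await(commandId: String, future: Future[R], timeout: FiniteDuration): Try[R] = {
    val outcome = Try(Await.result(future, timeout))
    pending.remove(commandId) // drop the entry if the wait timed out
    outcome
  }
}
```

A timeout is essential here: without it, a lost command would block the REST thread indefinitely.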
MAKE	IT	SYNCHRONOUS
DEMO	2
https://github.com/btrofimov/spark-enterprise-example/tree/master/blocking-bulletinboard
RETROSPECTIVE
EFFICIENT SCALE OUT
READ MODEL instances
WRITE MODEL instances
SEPARATED SCALE OUT DEPENDING ON MODEL NEEDS
dataStream.foreachRDD { rdd =>
  // rdd is a pair RDD; keep only the values
  val values = rdd.values
  values
    .map { cmdMessage =>
      …
    }
    .foreach { cmdMessage =>
      …
    }
}
DESERIALIZATION ERRORS
• The job might fail during launch if the pipeline has been changed
• Risk: affects checkpoints
• It is hard to keep code tolerant of these errors and testable at the same time.
• Spark with the Kafka 0.10 connector allows saving Kafka offsets, so a job can be restored from them instead of from checkpoints.
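The offset-based alternative can be illustrated without Spark at all. The sketch below assumes hypothetical stand-in types (`TopicPart`, `ProcessedRange`, `OffsetStore`); in the Kafka 0.10 connector their rough counterparts are `OffsetRange` and `HasOffsetRanges`, and a real store would persist to a database or to Kafka itself rather than to memory.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for the connector's topic-partition and offset-range types.
final case class TopicPart(topic: String, partition: Int)
final case class ProcessedRange(part: TopicPart, fromOffset: Long, untilOffset: Long)

// Hypothetical in-memory offset store.
final class OffsetStore {
  private val committed = mutable.Map.empty[TopicPart, Long]

  // Call after a batch has been fully processed.
  def commit(ranges: Seq[ProcessedRange]): Unit =
    ranges.foreach { r =>
      val previous = committed.getOrElse(r.part, 0L)
      // Never move an offset backwards, even if batches are committed out of order.
      committed(r.part) = math.max(previous, r.untilOffset)
    }

  // Call on launch: the offsets to resume from, independent of any checkpoint.
  def startingOffsets: Map[TopicPart, Long] = committed.toMap
}
```

Because only plain offsets are stored, nothing in this state depends on the serialized form of the pipeline, which is what makes a restart after a code change possible.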
THROUGHPUT VS LATENCY
Spark-based services have good throughput; however, internal overheads put a lower bound on the latency they can achieve.
IDENTIFIER ASSIGNMENT RESPONSIBILITY
• The client generates a unique identifier and passes it to the service
• The service generates a unique identifier and passes it back to the client
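A small sketch of the first option, assuming hypothetical names (`CreateBulletin`, `IdempotentStore`): because the client picks the identifier before sending, it already knows the id while the command is still in flight, and a retried command can be recognized as a duplicate.

```scala
import java.util.UUID
import scala.collection.mutable

// Hypothetical command carrying a client-generated identifier.
final case class CreateBulletin(id: UUID, author: String, message: String)

// Hypothetical store: a repeated id is treated as a retry, not as a new entity.
final class IdempotentStore {
  private val byId = mutable.LinkedHashMap.empty[UUID, CreateBulletin]

  // Returns true if the command created a new entry, false if it was a duplicate.
  def save(cmd: CreateBulletin): Boolean =
    if (byId.contains(cmd.id)) false
    else { byId.put(cmd.id, cmd); true }

  def size: Int = byId.size
}
```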
EVENTUAL VS STRONG CONSISTENCY
• CQRS apps are usually eventually consistent
• Choose wisely between CQRS and the traditional approach
• Consider optimistic or pessimistic locking to achieve strong consistency
• In some cases, reasonable predefined timeouts or in-progress lists might help to resolve race conditions (depends on command round trips)
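Optimistic locking, one of the options above, can be sketched as a version check on every update. The names `Versioned` and `OptimisticStore` are hypothetical, and an in-memory map stands in for the real store.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical versioned record; `version` increases by one on every successful update.
final case class Versioned(message: String, version: Long)

final class OptimisticStore {
  private val data = new ConcurrentHashMap[String, Versioned]()

  // Returns true if the id was free and the record was created with version 1.
  def insert(id: String, message: String): Boolean =
    data.putIfAbsent(id, Versioned(message, 1L)) == null

  def get(id: String): Option[Versioned] = Option(data.get(id))

  // Succeeds only if the caller saw the latest version;
  // otherwise the caller must re-read and retry.
  def update(id: String, expectedVersion: Long, message: String): Boolean =
    get(id) match {
      case Some(current) if current.version == expectedVersion =>
        // Atomic compare-and-set: fails if another writer replaced `current` meanwhile.
        data.replace(id, current, Versioned(message, expectedVersion + 1))
      case _ => false
    }
}
```

A writer holding a stale version gets `false` back instead of silently overwriting a newer value.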
FAULT	TOLERANCE
• Spark automatically reruns tasks to mitigate network or other outages
• Easy	and	safe	application	restart	based	on	checkpoints
SUMMARY
PROS
• Easy to scale
• Best fit for high-throughput apps that do not need low latency
• Separated concerns
• Good fit for distributed teams with decomposed domains
CONS
• Easy to shoot oneself in the foot
• Might be over-engineering for projects with a small domain
• Think twice before using it for pure CRUD apps
THANK	YOU
SPARK IS COMING
POST /bulletins | GET /bulletins
USER INTERFACE
WRITE MODEL | READ MODEL
Commands
KAFKA
SPARK
COMMAND HANDLER
REPOSITORY
Domain Model
Domain Model
QUERY FACADE
Query DTO
MONGO READ/WRITE STORE
