DESIGNING A REACTIVE REAL-TIME DATA PLATFORM
ARCHITECTURE AND INFRASTRUCTURE
http://linkedin.com/in/alexvsilva
@thealexsilva
Who am I?
- Data Platform Architect at Pluralsight
- Rackspace
- WDW
TECHNOLOGY LEARNING PLATFORM
What should I learn?
Where should I start?
Who can help me?
What did I learn?
• Online technology learning platform
• Subscription model
• Data-driven
PLURALSIGHT
IN THE BEGINNING…
Development TIME became the bottleneck
Designing a reactive real-time data platform: Architecture and Infrastructure Challenges
HOW DO WE FIX THAT?
REQUIREMENTS
DISCOVERABLE
OPEN
EXTENSIBLE
Flexible
Contract
ADAPTABLE
ABSTRACTION
Reactive principles
RESPONSIVE
ELASTIC
RESILIENT
MESSAGE DRIVEN
RESPONSIVE
ELASTIC
asynchronous
share nothing
location transparency
divide and conquer
RESILIENT
MORE THAN “JUST” FAULT TOLERANCE
Software systems are complex systems.
“Complex systems run in degraded mode.”
“Complex systems run as broken systems.”
Richard Cook
COMPLEX OR COMPLICATED?
MESSAGE DRIVEN
asynchronous
failures as messages
location transparency
isolation
Messages and events
SAVE THIS!
SOMEBODY LOGGED IN!
Events ARE…
Facts
Topic
Past
Messages ARE…
Addressable
Specific
Data platform at Pluralsight
REAL-TIME DATA REPLICATION PLATFORM
HYDRA
INGEST
Ingestion + Replication
PORTAL
Schema Manager
“The Log”
HYDRA
STREAMS
Streaming + Replication
AKKA
Distributed
Fault-Tolerant
Asynchronous
Highly Concurrent
Akka challenges
Remoting
Type Safety
Debugging
Release Cycles
WHAT’S AN ACTOR?
behavior
state
MAILBOX
CHILD ACTORS
SUPERVISOR STRATEGY
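The first three parts listed above (behavior, state, mailbox) can be illustrated with a minimal sketch. This is plain Python, not Akka code; `CounterActor`, `tell`, and `drain` are hypothetical names chosen for the example, and child actors and supervision are left out.

```python
import queue


class CounterActor:
    """Toy actor: private state, a mailbox, and a behavior applied one message at a time."""

    def __init__(self):
        self.mailbox = queue.Queue()  # messages wait here until processed
        self.count = 0                # state, only changed by the actor itself

    def tell(self, message):
        # Fire-and-forget send: enqueue the message and return immediately.
        self.mailbox.put(message)

    def receive(self, message):
        # Behavior: how the actor reacts to a single message.
        if message == "increment":
            self.count += 1
        elif message == "reset":
            self.count = 0

    def drain(self):
        # Stand-in for a dispatcher: process queued messages in arrival order.
        while not self.mailbox.empty():
            self.receive(self.mailbox.get())


actor = CounterActor()
for msg in ["increment", "increment", "reset", "increment"]:
    actor.tell(msg)
actor.drain()
print(actor.count)
```

Senders never touch `count` directly; they can only put messages in the mailbox, which is what makes the state safe without locks.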
ACTOR Refs are reactive!
ACTOR Refs Vs. ACTOR PATHS
akka.tcp://Hydra@localhost:9001/user/service/ingestor_registry
Protocol
Address
ActorSystem
Path
Actor paths enable location transparency
akka.tcp://Hydra/user/service/ingestor_registry
Protocol
ActorSystem
Path
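To make the anatomy of the two paths above concrete, here is a small sketch that splits an actor path string into the labelled parts. It is plain string handling in Python, not an Akka API; `parse_actor_path` is a hypothetical helper.

```python
def parse_actor_path(path):
    """Split an actor path into protocol, actor system, address, and path."""
    protocol, rest = path.split("://", 1)
    authority, _, actor_path = rest.partition("/")
    # The address part ("host:port") is only present for remote paths.
    system, _, address = authority.partition("@")
    return {
        "protocol": protocol,
        "actor_system": system,
        "address": address or None,
        "path": "/" + actor_path,
    }


remote = parse_actor_path("akka.tcp://Hydra@localhost:9001/user/service/ingestor_registry")
local = parse_actor_path("akka.tcp://Hydra/user/service/ingestor_registry")
print(remote)
print(local)
```

Both results share the same `path`; only the address differs, which is the point of location transparency.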
Deploying remote actors
akka {
  actor {
    provider = remote
    deployment {
      /web_analytics_ingestor {
        remote = "akka.tcp://Hydra-1@127.0.0.1:2553"
      }
    }
  }
}
ACTORS are ELASTIC
akka {
  actor {
    deployment {
      /services-manager/handler_registry/segment_handler {
        router = round-robin-pool
        optimal-size-exploring-resizer {
          enabled = on
          action-interval = 5s
          downsize-after-underutilized-for = 2h
        }
      }
      /services-manager/kafka_producer {
        router = round-robin-pool
        resizer {
          lower-bound = 5
          upper-bound = 50
          messages-per-resize = 500
        }
      }
    }
  }
}
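As a rough illustration of what the `resizer` settings in the config above control, the sketch below models a round-robin pool that reconsiders its size every `messages_per_resize` messages and stays within the two bounds. It is plain Python, not Akka's resizer; the growth rule (add one routee while every routee has a backlog) is an assumption made for the example, and Akka's real pressure calculation is more involved.

```python
class ResizablePool:
    """Toy round-robin pool whose size stays within [lower_bound, upper_bound]."""

    def __init__(self, lower_bound, upper_bound, messages_per_resize):
        self.lower_bound = lower_bound
        self.upper_bound = upper_bound
        self.messages_per_resize = messages_per_resize
        self.mailboxes = [[] for _ in range(lower_bound)]  # one list per routee
        self.routed = 0

    def route(self, message):
        # Round-robin: each message goes to the next routee in turn.
        self.mailboxes[self.routed % len(self.mailboxes)].append(message)
        self.routed += 1
        # Only reconsider the pool size every `messages_per_resize` messages.
        if self.routed % self.messages_per_resize == 0:
            self._resize()

    def _resize(self):
        # Assumed rule: grow by one routee while every routee has a backlog.
        all_busy = all(len(box) > 0 for box in self.mailboxes)
        if all_busy and len(self.mailboxes) < self.upper_bound:
            self.mailboxes.append([])


pool = ResizablePool(lower_bound=5, upper_bound=50, messages_per_resize=500)
for i in range(500):
    pool.route(i)
print(len(pool.mailboxes))
```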
Sending messages on akka
VS
Hydra ingest
Data capture at scale
mitigate message loss
DATA format is secondary
automated replication
schema driven
ENFORCE METADATA AT INGESTION TIME
DATA PIPELINES
DATA QUALITY
DATA REPLICATION
DATA DISCOVERY
Metadata is a first class citizen
Why Avro?
Schema evolution
Smaller data footprint
JSON friendly
Strong community support
Existing tools
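Schema evolution is the least self-explanatory item in the list above. The sketch below is plain Python, not the Avro library: it mimics one Avro schema-resolution rule, where a reader schema fills a field missing from older data with its declared default and ignores fields it does not know. The Person fields and the `email` field are hypothetical.

```python
def resolve(record, reader_schema):
    """Project a decoded record onto the reader's schema."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            # Field added after the record was written: use the default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field '{name}'")
    return resolved


# Hypothetical second version of a Person schema that adds "email".
person_v2 = {
    "type": "record",
    "name": "Person",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

old_record = {"name": "John", "age": 30}  # written before "email" existed
upgraded = resolve(old_record, person_v2)
print(upgraded)
```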
INGESTION PROTOCOL OVERVIEW
BRINGING REACTIVE PRINCIPLES TO THE MIX
HYDRA REQUEST
PAYLOAD (anything, really)
{
  "name": "John",
  "age": 30,
  "cars": ["Ford", "BMW"]
}
+
METADATA
kafka-topic = PersonTopic
validation = Strict
avro-schema = Person.avsc
= HYDRA REQUEST
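The payload-plus-metadata pairing can be sketched as a small value type. This is an illustration in plain Python and not Hydra's actual request class; the `HydraRequest` fields shown here are assumptions, while the metadata keys and payload are the ones from the slide.

```python
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HydraRequest:
    payload: str                                  # opaque to the platform: "anything, really"
    metadata: dict = field(default_factory=dict)  # drives routing and validation


request = HydraRequest(
    payload=json.dumps({"name": "John", "age": 30, "cars": ["Ford", "BMW"]}),
    metadata={
        "kafka-topic": "PersonTopic",
        "validation": "Strict",
        "avro-schema": "Person.avsc",
    },
)
print(request.metadata["kafka-topic"])
```

Keeping the payload as an uninterpreted string reflects the idea that the data format is secondary; everything the platform needs to act on lives in the metadata.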
HYDRA REPLICATION PROTOCOL
HYDRA REQUEST
INGESTORS
Publish
Akka Actors (remote)
Transport
Transports
Akka Actors (remote)
Kafka
Postgres
Elasticsearch
Inspect metadata
and decide
Publish
INGESTORS
Join
STOP
Validate
Ingest
Valid
Invalid
Ignore
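One way to read the phases named in the diagram (Publish, Join or Ignore, Validate, Ingest) is as a short conversation with each ingestor. The sketch below is a synchronous, plain-Python approximation, not Hydra's implementation: `KafkaIngestor`, `run_protocol`, and the "payload must parse as JSON" rule are assumptions for the example, and a list stands in for the Kafka transport.

```python
import json


class KafkaIngestor:
    """Hypothetical ingestor that only handles requests naming a Kafka topic."""

    def __init__(self):
        self.ingested = []

    def on_publish(self, metadata):
        # Phase 1: inspect metadata and decide whether to take part.
        return "Join" if "kafka-topic" in metadata else "Ignore"

    def on_validate(self, payload):
        # Phase 2: here "valid" simply means the payload parses as JSON.
        try:
            json.loads(payload)
            return "Valid"
        except ValueError:
            return "Invalid"

    def on_ingest(self, payload, metadata):
        # Phase 3: hand the record to a transport; a list stands in for Kafka.
        self.ingested.append((metadata["kafka-topic"], payload))


def run_protocol(ingestor, payload, metadata):
    if ingestor.on_publish(metadata) != "Join":
        return "Ignored"
    if ingestor.on_validate(payload) != "Valid":
        return "Invalid"
    ingestor.on_ingest(payload, metadata)
    return "Ingested"


ingestor = KafkaIngestor()
outcomes = [
    run_protocol(ingestor, '{"name": "John"}', {"kafka-topic": "PersonTopic"}),
    run_protocol(ingestor, '{"name": "John"}', {}),
    run_protocol(ingestor, "not json", {"kafka-topic": "PersonTopic"}),
]
print(outcomes)
```

Each phase is a small step that can fail or be retried on its own, without redoing the steps before it.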
WHY DIFFERENT PHASES?
Divide and conquer
isolation
Small asynchronous tasks
recovery
HYDRA Message delivery guarantees
ALSO METADATA DRIVEN
AT-LEAST-ONCE SEMANTICS
AKKA PERSISTENT ACTOR
hydra-delivery-strategy
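At-least-once semantics can be summarised as "resend until acknowledged", with duplicates as the accepted cost. The sketch below shows that idea in plain Python. It is not Akka's persistent-actor machinery; `FlakyTransport` and `deliver_at_least_once` are hypothetical names, and a real implementation would also persist unconfirmed messages so they survive a restart.

```python
class FlakyTransport:
    """Receives every message but loses the first few acknowledgements."""

    def __init__(self, acks_to_drop):
        self.acks_to_drop = acks_to_drop
        self.received = []

    def send(self, message):
        self.received.append(message)  # the message itself always arrives
        if self.acks_to_drop > 0:
            self.acks_to_drop -= 1
            return False               # ack lost on the way back
        return True


def deliver_at_least_once(message, send, max_attempts=5):
    """Resend until acknowledged; returns the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if send(message):
            return attempt
    raise RuntimeError(f"no acknowledgement after {max_attempts} attempts")


transport = FlakyTransport(acks_to_drop=2)
attempts = deliver_at_least_once("login-event", transport.send)
print(attempts, transport.received)
```

The receiver ends up with three copies of the same message, so downstream consumers need to be idempotent or deduplicate.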
Kafka
A messaging system based on distributed log semantics
Scalable
Fault tolerant
Stateful
Strong ordering
High concurrency
Topic: User
BROKER (User, 0)
BROKER (User, 0)
BROKER (User, 0)
READS/WRITES FROM/TO leader only
REPLICATION PROTOCOL
Replication is about RESILIENCY
BROKER BROKER BROKER BROKER
Looks like A GLOBALLY ORDERED QUEUE
BROKER
APPLICATION
APPLICATION
CONSUMER
APPLICATION
THE LOG is a linear structure
Old New
Messages are added here
Consumers have a position
Only sequential access: read to offset and scan
Old New
Consumer 1
Consumer 2
MESSAGES CAN BE REPLAYED
FOR AS LONG AS THEY EXIST IN THE LOG
Old New
Consumer 1
Consumer 2
A DISTRIBUTED REPLICATION PROTOCOL
Rewind and Replay
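The log properties described above (append at the new end, one position per consumer, sequential reads, rewind and replay) fit in a few lines. This is an in-memory sketch in plain Python, not the Kafka client API; `Log` and `Consumer` are hypothetical classes, and retention is ignored.

```python
class Log:
    """Append-only sequence of records addressed by offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # New messages are always added at the "new" end.
        self._records.append(record)
        return len(self._records) - 1  # offset of the appended record

    def read_from(self, offset):
        # Sequential access only: go to an offset and scan forward.
        return self._records[offset:]


class Consumer:
    """Tracks its own position, independently of other consumers."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self):
        records = self.log.read_from(self.offset)
        self.offset += len(records)
        return records

    def seek(self, offset):
        # Rewinding only moves the position; the log itself is untouched.
        self.offset = offset


log = Log()
for event in ["a", "b", "c"]:
    log.append(event)

consumer_1 = Consumer(log)
consumer_2 = Consumer(log, offset=2)
first_read = consumer_1.poll()
late_read = consumer_2.poll()
consumer_1.seek(0)
replayed = consumer_1.poll()
print(first_read, late_read, replayed)
```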
Hydra STREAMS
STREAMING FEDERATION LAYER
STREAM PROCESSING
Continuously updating datasets
Max(viewed_time) from clip_views
where location='CA'
over 1 day window
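The query above can be approximated in plain Python to show what a continuously updating result looks like. The sketch assumes tumbling calendar-day windows (the slide does not say whether the window tumbles or slides) and assumes each event carries `timestamp`, `location`, and `viewed_time` fields; the sample data is made up.

```python
from datetime import datetime


def max_viewed_time_by_day(clip_views, location):
    """Max viewed_time per calendar day (tumbling 1-day windows) for one location."""
    windows = {}
    for view in clip_views:             # events are folded in one at a time
        if view["location"] != location:
            continue                    # where location = ...
        day = view["timestamp"].date()  # window key: the event's calendar day
        if day not in windows or view["viewed_time"] > windows[day]:
            windows[day] = view["viewed_time"]
    return windows


clip_views = [
    {"timestamp": datetime(2018, 1, 1, 9, 0), "location": "CA", "viewed_time": 120},
    {"timestamp": datetime(2018, 1, 1, 17, 30), "location": "CA", "viewed_time": 300},
    {"timestamp": datetime(2018, 1, 1, 18, 0), "location": "UT", "viewed_time": 900},
    {"timestamp": datetime(2018, 1, 2, 8, 15), "location": "CA", "viewed_time": 45},
]
result = max_viewed_time_by_day(clip_views, "CA")
print(result)
```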
Similar features as a database
JOIN
AGGREGATE
FILTER
VIEW
Streaming platforms
Why Spark?
Support for many different data formats
Structured streaming
Failover and lifecycle management
Medium latency
Unified API
EVENT STREAM / LOG
MATERIALIZED VIEWS / CACHE
HADOOP
ETL
SERVICE
TRANSF
Writes to
Replicates to
• Reproducible
• Stays in sync
Why Kafka Streams?
Application that can run anywhere
Medium data volumes
Kafka specific
Low latency
Basic tasks
IT WORKS FOR MICROSERVICES TOO
HYDRA
Sends
BROKER
stores
(at a minimum)
INGESTION
Customer
HYDRA
STREAM DISPATCH
{ }
/dsls
submits
POSTs
Invoices Returns
joins/normalizes
streams
Hydra SPARK
What is it?
Abstraction layer on top of Spark Datasets
Models data flows
Sources and operations
Based on a custom DSL
API-driven
The “DSL” abstraction
Example
WE ARE ON GITHUB!
github.com/pluralsight/hydra
github.com/pluralsight/hydra-spark
Thank You!
