Building and Evolving a Dependency-Graph Based Microservice Architecture
Lars Francke – Partner and Co-Founder @ OpenCore
Kafka Summit 2019 – 30 September 2019
© 2019 OpenCore GmbH & Co. KG 2
About Me – Lars Francke
• Partner & Co-Founder at OpenCore
• We do Hadoop/Big Data/insert Buzzword consulting
• Based in Germany but doing business world-wide if you need us
• https://www.opencore.com
• ASF/Big Data/Hadoop since 2008
• Apache Committer & Member: HBase, Hive, ORC, Training (PMC)
• Contact
• lars.francke@opencore.com
• @lars_francke
The problem
The problem
• No one here knows the dependencies between all our Microservices anymore!
• We drew a picture but it hasn't been updated in months and is now doing more harm than good
• We're afraid of stopping this service because we don't know who depends on it
• How do these topics differ and where can I find the latest customer registrations? "customer_regs", "customer_regs1", "customer_regs_new", "new_customers", "customers_lars_test"
• We need to migrate from On-Prem Kafka to Confluent Cloud but have no idea where to begin and what we need.
The problem
• Didn't the London team already build a service to check zip codes?
• Why has this dashboard stopped showing data?
• Does anyone mind if I add a field to the "Customer" object?
• Oh no, Governance wants to know where in Kafka we store PII data
Microservice architectures
Choreography
Choreography
Also known as Event-Driven
Choreography
• Services coordinate amongst themselves
• No central service
• "Smart endpoints and dumb pipes" – Martin Fowler & James Lewis
• Kafka often used for the "dumb pipes" part (no offense!)
• Lots of flexibility
• Just add a new service, no need to coordinate with others
• Use whatever language you want, whatever data format you want etc.
• Often brittle
• Loose coupling means you might depend on a service without knowing it
• Those dependencies might change and break
• People might depend on your service without you knowing it!
Choreography
• Hard to keep track of everything & get an overview
• Harder to verify at build-time
• Only the equivalent of a unit test is easy; integration testing is harder if other components are unknown or under the control of a different team
• Different teams can work independently
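The "depend on a service without knowing it" failure mode can be sketched in a few lines. This is a minimal in-memory illustration, not Kafka: the `Bus` class, the topic name and the field names `zip` and `zip_code` are all hypothetical.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class Bus:
    """A deliberately dumb pipe: it forwards events and knows nothing about them."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(event)


bus = Bus()
seen_zip_codes: list[str] = []

# Team B quietly starts consuming Team A's events; nobody tells Team A.
bus.subscribe("customer_regs", lambda event: seen_zip_codes.append(event["zip"]))

bus.publish("customer_regs", {"name": "Ada", "zip": "20095"})

# Team A renames a field, unaware that anyone depends on it.
broken_field = None
try:
    bus.publish("customer_regs", {"name": "Bob", "zip_code": "80331"})
except KeyError as missing:
    broken_field = missing.args[0]
```

Nothing in the pipe itself records who reads what, which is why the breakage only shows up at runtime.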
Choreography
When I said "no central service", what I meant was that we obviously still have central services like:
• Schema Registry
• Log collection
• Monitoring
• Etc.
Orchestration
Source: https://commons.wikimedia.org/wiki/File:Peter_Oundjian_-_Conductor_of_Toronto_Symphony_Orchestra_2014.jpg
Orchestration
• One central "coordinator" that tells everyone what to do
• Like a conductor in an orchestra
• The Enterprise Service Bus (ESB) is an example
• Routing, Transformations, Business rules etc.
• It's easy to get an overview of the whole system
• The central service can even provide a nice UI, showing dependency graphs
• Monitoring is easier
Orchestration
• Less flexible
• Adding a new service requires coordination and potentially changing/restarting existing things
• Less brittle
• Central service can validate the architecture
• The architecture/graph can often be verified at "build"-time
• Works well with CI/CD
• * as Code (Infrastructure, Configuration, …)
• May require coordination between teams
• Less self-service
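Verifying the graph at "build"-time could look roughly like the sketch below. It assumes the dependencies are available as a plain mapping from service name to the services it needs; the `validate` function and the service names are hypothetical and not taken from any specific orchestrator.

```python
from graphlib import CycleError, TopologicalSorter


def validate(dependencies: dict[str, set[str]]) -> list[str]:
    """Return a valid start order, or raise ValueError if the graph is broken."""
    known = set(dependencies)
    for service, needs in dependencies.items():
        missing = needs - known
        if missing:
            raise ValueError(
                f"{service} depends on unknown service(s): {sorted(missing)}"
            )
    try:
        # Dependencies come first in the returned order.
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc


start_order = validate(
    {"customers": set(), "billing": {"customers"}, "email": {"billing"}}
)
```

Run as a CI step, a check like this fails the build before a missing or circular dependency reaches production.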
Orchestography
Natural question to ask:
Which is better?
Orchestography
Both have their uses!
Microservices
• Microservices are often used to split up a single monolithic app into
multiple independent services
• There are still independent "business applications" even though some may
share data or even services
• Ideally a single team is responsible for a product
• Orchestration is easier within one product (or team) while Choreography is
appealing across product/team borders
Orchestography
• Orchestration lends itself more to "workflow" oriented tasks which are split
across multiple processes and/or need to be distributed
• Strict or at least strong dependencies between those tasks
• Can be seen as "one" thing, that could – in theory – also be implemented as one
monolithic process
• Choreography lends itself more to loosely coupled or decoupled services
• These might also have dependencies but often not as strict
Orchestography
[Diagram: Application 1, Application 2, Application 3 and Application 4 (each Orchestrated) around Kafka (or similar, for Choreography); marked "?"]
Example
Naming things is hard
Cattle vs. Pets
Show of hands
Who here has (had) servers with names like:
Sources: https://www.flickr.com/photos/gageskidmore/7584137078, https://commons.wikimedia.org/wiki/File:Jean-Luc_Picard_2.jpg, https://www.flickr.com/photos/44214515@N06/21547144233
Pets
Names like that are a good sign that these servers might be your Pets
They often have a combination of these features:
• Manually built and managed
• Indispensable
• Can never be down
Cattle
The industry has moved on (or is in the process of doing so) to treating Servers (and services) as Cattle instead
• No identity (random names or based on some pattern)
• Disposable
• Infrastructure as Code
Cattle
• The Cloud was a big "enabler" for this movement
• Servers have more or less random names
• A specific instance doesn't matter; it will be rebuilt when needed
• e.g. Spot Instances
• Kubernetes & Co. play a role as well
Cattle
If we agree that this is a good thing…
…why do you have a topic called
customerCreated
Technology to the rescue
Lars, tell us what to do!

We are not the first to struggle with this
Surprising, I know
Good Compan(y|ies)
Source: https://commons.wikimedia.org/wiki/File:ING_Group_N.V._Logo.svg
Netflix Conductor
• Netflix OSS Project: https://github.com/Netflix/conductor
• "Conductor is a Workflow Orchestration engine that runs in the cloud."
• Conductor runs backend Servers providing UI & REST API
• You define your Workflows in a JSON DSL and POST them to the API
• You develop your Workers in whichever language you want (convenience libraries available for Java & Python) and they get new work from the REST API
Netflix Conductor
• Workflows consist of Tasks
• Conductor itself can store some Payload or it can/must be stored externally
• It does not support using Kafka (or similar) to decouple Tasks
Netflix Conductor – Tasks
[
  {
    "name": "verify_if_idents_are_added",
    "retryCount": 3,
    "retryLogic": "FIXED",
    "retryDelaySeconds": 10,
    "timeoutSeconds": 300,
    "timeoutPolicy": "TIME_OUT_WF",
    "responseTimeoutSeconds": 180
  },
  {
    "name": "add_idents",
    "retryCount": 3,
    "retryLogic": "FIXED",
    "retryDelaySeconds": 10,
    "timeoutSeconds": 300,
    "timeoutPolicy": "TIME_OUT_WF",
    "responseTimeoutSeconds": 180
  }
]
Netflix Conductor – Workflow
{
  "name": "add_netflix_identation",
  "description": "Adds Netflix Identation to video files.",
  "version": 2,
  "schemaVersion": 2,
  "tasks": [
    {
      "name": "verify_if_idents_are_added",
      "taskReferenceName": "ident_verification",
      "inputParameters": {
        "contentId": "${workflow.input.contentId}"
      },
      "type": "SIMPLE"
    },
    {
      "name": "decide_task",
      "taskReferenceName": "is_idents_added",
      "inputParameters": {
        "case_value_param": "${ident_verification.output.is_idents_added}"
      },
      "type": "DECISION",
      "caseValueParam": "case_value_param",
      "decisionCases": {
        "false": [
          {
            "name": "add_idents",
            "taskReferenceName": "add_idents_by_type",
            "inputParameters": {
              "identType": "${workflow.input.identType}",
              "contentId": "${workflow.input.contentId}"
            },
            "type": "SIMPLE"
          }
        ]
      }
    }
  ]
}
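To make the DECISION task easier to read, here is a rough sketch of how such a task could be evaluated: the `${...}` reference is resolved against earlier task outputs and the result selects a branch from `decisionCases`. This is an illustration only and not Conductor's actual implementation; `resolve` and `pick_branch` are hypothetical helpers, and the task definition is trimmed down from the workflow above.

```python
import re

_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def resolve(value: str, context: dict) -> object:
    """Look up a whole-string '${a.b.c}' reference in a nested context dict."""
    match = _REFERENCE.fullmatch(value)
    if not match:
        return value
    current: object = context
    for key in match.group(1).split("."):
        current = current[key]
    return current


def pick_branch(decision_task: dict, context: dict) -> list:
    """Return the list of tasks for the matching case, or [] if none matches."""
    params = {
        name: resolve(raw, context)
        for name, raw in decision_task["inputParameters"].items()
    }
    case = str(params[decision_task["caseValueParam"]]).lower()
    return decision_task["decisionCases"].get(case, [])


decide_task = {
    "inputParameters": {
        "case_value_param": "${ident_verification.output.is_idents_added}"
    },
    "caseValueParam": "case_value_param",
    "decisionCases": {"false": [{"name": "add_idents", "type": "SIMPLE"}]},
}

not_added = {"ident_verification": {"output": {"is_idents_added": False}}}
already_added = {"ident_verification": {"output": {"is_idents_added": True}}}

branch_when_missing = pick_branch(decide_task, not_added)
branch_when_present = pick_branch(decide_task, already_added)
```

In this sketch `add_idents` only runs when the verification task reports that the idents are not there yet.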
Uber Cadence
• Uber Project: https://github.com/uber/cadence
• "Cadence is a distributed, scalable, durable, and highly available
orchestration engine to execute asynchronous long-running business logic
in a scalable and resilient way."
• From the same people that led the Amazon Simple Workflow service
• Has Clients for Java & Go
• Others are possible; communication is via Thrift
Uber Cadence
• Cadence handles Task state & Queues for us
• Your Workflow is implemented in code
• Workflows can run & wait for a long time
• Example: Subscription Renewal workflow that runs forever and charges
your customer every 30 days
• Also no direct Kafka integration
Uber Cadence – Example
@Override
public void execute(String customerId) {
    activities.sendWelcomeEmail(customerId);
    try {
        boolean trialPeriod = true;
        while (true) {
            // Wait 30 days before each charge
            Workflow.sleep(Duration.ofDays(30));
            activities.chargeMonthlyFee(customerId);
            if (trialPeriod) {
                activities.sendEndOfTrialEmail(customerId);
                trialPeriod = false;
            } else {
                activities.sendMonthlyChargeEmail(customerId);
            }
        }
    } catch (CancellationException e) {
        // Cancelling the workflow ends the loop and triggers the clean-up steps
        activities.processSubscriptionCancellation(customerId);
        activities.sendSorryToSeeYouGoEmail(customerId);
    }
}
Expedia Stream Registry
• Expedia project (originally HomeAway): https://github.com/ExpediaGroup/stream-registry
• A metadata service for streams
• Who owns the stream?
• Who are the producers and consumers of the stream?
• Management of stream replication across clusters and regions
• Management of stream storage for permanent access
• Management of stream triggers for legacy stream sources
Expedia Stream Registry
• It manages Clusters as well as "Streams" of data
• Including schemas, owners and other metadata
• Unfortunately the docs are pretty thin
• Moved from HomeAway to Expedia while undergoing a refactor
Others
There are others:
• ING Baker
• ING Project: https://github.com/ing-bank/baker
• "Orchestrate microservice-based process flows"
• Java based library
• You specify a Recipe which includes all your functions (interactions), the data they need (ingredients) and the data they produce (events)
• Zeebe
• From the Camunda folks
• BPMN 2
• "A Workflow Engine for Microservices Orchestration"
• Dagster
• …
Where does this leave us?
The Current State
• Most existing tools require you to explicitly model your dependency graph
• This makes sense for "strict" workflows
• But not for many other use-cases (e.g. analytics, logging, persistence etc.)
• This is comparable to having SQL without a Query Optimizer or Spark
without Catalyst
• Some tools require you to implement their API or use their library
The Current State
• Unfortunately, the perfect solution doesn't yet exist
• The Orchestrators that do exist are all very nice and work
• For Choreography, though, things are a bit bleak
• Stream Registry is moving in the right direction
• Schema Registry is necessary as well but not sufficient
Does this seem familiar?
Wishlist
• We need better support for Event-Driven (Choreography) style
architectures
• We need better Governance for data in Kafka
• This problem is not exclusive to Kafka
• Kafka topics shouldn't be managed manually
• We need better self-service tools to find data sources
Wishlist
• We'd like a tool that
• allows us to register logical streams of data,
• Used to distinguish flows with the same schema
• Metadata (e.g. owners)
• e.g. "New customers stream"
• allows us to register Connections,
• e.g. Kafka Clusters, Kinesis credentials etc.
• allows us to register (Micro-)Services
• Including their Inputs and Outputs
• These are the "Data" in- and outputs, not any topic itself
• Both reference existing Schemas
• Optional: Dependencies
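The three kinds of registrations on this wishlist could be modelled roughly as below. This is only a sketch of a data model for a tool that does not exist yet; every class, field and example value (`Registry`, `team-crm`, `on-prem`, ...) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    name: str    # logical name, e.g. "New customers stream"
    schema: str  # reference to an existing schema
    owner: str


@dataclass(frozen=True)
class Connection:
    name: str
    kind: str    # e.g. "kafka" or "kinesis"


@dataclass(frozen=True)
class Service:
    name: str
    inputs: tuple[str, ...] = ()   # logical stream names, never topic names
    outputs: tuple[str, ...] = ()


class Registry:
    def __init__(self) -> None:
        self.streams: dict[str, Stream] = {}
        self.connections: dict[str, Connection] = {}
        self.services: dict[str, Service] = {}

    def register_stream(self, stream: Stream) -> None:
        self.streams[stream.name] = stream

    def register_connection(self, connection: Connection) -> None:
        self.connections[connection.name] = connection

    def register_service(self, service: Service) -> None:
        # A service may only reference streams that have been registered.
        unknown = sorted(set(service.inputs + service.outputs) - set(self.streams))
        if unknown:
            raise ValueError(
                f"{service.name} references unregistered streams: {unknown}"
            )
        self.services[service.name] = service

    def consumers_of(self, stream_name: str) -> list[str]:
        return sorted(
            s.name for s in self.services.values() if stream_name in s.inputs
        )


registry = Registry()
registry.register_connection(Connection("on-prem", "kafka"))
registry.register_stream(Stream("New customers stream", "Customer", "team-crm"))
registry.register_service(Service("signup", outputs=("New customers stream",)))
registry.register_service(Service("billing", inputs=("New customers stream",)))
```

Because services declare data rather than topics, a question like "who depends on this stream?" becomes a simple lookup such as `registry.consumers_of("New customers stream")`.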
Wishlist
• This tool could use this information to
• automatically build an optimal DAG,
• and execute all necessary steps to enable this DAG:
• Create Kafka Topics
• Create necessary ACLs
• Optionally: Update MirrorMaker configuration or other steps
• The Services themselves can then get all the information they need from the REST API
• Cluster configuration
• Schema information
• Topic names for in- and output
• Optional: Pre- & Postconditions
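As a rough sketch of this step: given only each service's declared inputs and outputs, the edges of the graph, the topics to create and the ACLs to grant can all be derived. The code below is an assumption-laden illustration (`build_plan`, the `READ`/`WRITE` labels and the service names are hypothetical), it makes no attempt at finding an "optimal" DAG, and it only returns a plan rather than talking to a real cluster.

```python
import secrets
import string


def random_topic_name(taken: set[str], length: int = 6) -> str:
    """Topics are cattle: generate an opaque, unused name such as 'xqdrnc'."""
    while True:
        name = "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))
        if name not in taken:
            taken.add(name)
            return name


def build_plan(services: dict[str, dict[str, list[str]]]) -> dict:
    """services maps a service name to its 'inputs'/'outputs' (logical streams)."""
    producers: dict[str, list[str]] = {}
    consumers: dict[str, list[str]] = {}
    for name, spec in sorted(services.items()):
        for stream in spec.get("outputs", []):
            producers.setdefault(stream, []).append(name)
        for stream in spec.get("inputs", []):
            consumers.setdefault(stream, []).append(name)

    taken: set[str] = set()
    streams = sorted(set(producers) | set(consumers))
    topics = {stream: random_topic_name(taken) for stream in streams}

    # One edge per (producer, consumer) pair that shares a logical stream.
    edges = [
        (producer, consumer, stream)
        for stream in streams
        for producer in producers.get(stream, [])
        for consumer in consumers.get(stream, [])
    ]
    acls = [
        (service, "WRITE", topics[stream])
        for stream in streams
        for service in producers.get(stream, [])
    ] + [
        (service, "READ", topics[stream])
        for stream in streams
        for service in consumers.get(stream, [])
    ]
    return {"topics": topics, "edges": edges, "acls": acls}


plan = build_plan(
    {
        "service-a": {"outputs": ["new-customers"]},
        "service-b": {"inputs": ["new-customers"]},
    }
)
```

A service would then ask the tool for `plan["topics"]["new-customers"]` instead of hard-coding a topic name.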
Wishlist
• For those who use Apache Spark:
• In Spark you define all your actions and transformations, at the end it builds an
optimal DAG out of this information and executes it
• This tool would do the exact same thing but across process boundaries
• The Services themselves can be written in any language as long as they can make REST calls
• Convenience clients would be great but optional
• As this tool controls the data flow (no data flows through the tool itself
though) it can create "intermediate" topics to enable more use-cases:
• Quality checks
• Automatic anonymization
• Automatic collection of samples
Example
[Diagram: Service A → Topic "xqdrnc" → Service B]
Example
[Diagram: Service A → Topic "xqdrnc" → Service B, plus an Infra Service running the check below]
if (booking.travel_agency == "Thomas Cook") {
alert()
}
Example
[Diagram: Service A → Topic "xqdrnc" → Service B]
if (booking.travel_agency == "Thomas Cook") {
fail()
}
Example
if (booking.travel_agency == "Thomas Cook") {
fail()
}
[Diagram: Service A, Topic "xqdrnc", Infra Service, Topic "blgrgb", Service B]
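One way to read this example is that the tool rewires the flow by placing a checking service and an "intermediate" topic between producer and consumer. The sketch below shows that rewiring on a plain edge list under that assumption; `insert_check` and `quality_check` are hypothetical names, and the check mirrors the pseudocode on the slide.

```python
Edge = tuple[str, str, str]  # (producer, topic, consumer)


def insert_check(
    edges: list[Edge], producer: str, consumer: str, checker: str, new_topic: str
) -> list[Edge]:
    """Route producer -> consumer traffic through `checker` via `new_topic`."""
    rewired: list[Edge] = []
    for src, topic, dst in edges:
        if (src, dst) == (producer, consumer):
            rewired.append((src, topic, checker))
            rewired.append((checker, new_topic, dst))
        else:
            rewired.append((src, topic, dst))
    return rewired


def quality_check(booking: dict) -> dict:
    """Fail on records matching the slide's condition, pass everything else on."""
    if booking.get("travel_agency") == "Thomas Cook":
        raise ValueError("booking failed the quality check")
    return booking


before = [("Service A", "xqdrnc", "Service B")]
after = insert_check(before, "Service A", "Service B", "Infra Service", "blgrgb")
```

Service B never hard-coded "xqdrnc", so pointing it at the new topic is just a change in what the tool hands out.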
Wishlist
• This tool could also (optionally) automatically run or re-run the services
using e.g. Kubernetes
• This would allow for total control
• Services need to be made aware of changes in the topology
• We could automatically transform between data formats
• e.g. a service accepting Protobuf but the data only exists in Avro
Wishlist
• A side effect would be an automatically up-to-date Governance/Data
Catalog
• This would allow for better self-service operations: You don't have to find "topics" in
Kafka with your data, you just have to declare which data you're interested in and
the system will always tell you where this data lives
• Orchestrators like Conductor etc. would still be important, encapsulated in an "Application"
• Which itself could consist of multiple services
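The self-service idea can be sketched as a lookup against the catalog that falls out of the registrations. Everything here is hypothetical (the `CatalogEntry` fields, the cluster name, the owners and the PII flag); it only illustrates asking for data by its logical name and answering the Governance question from the same source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    stream: str        # logical name that users ask for
    cluster: str
    topic: str         # opaque, tool-generated topic name
    owner: str
    contains_pii: bool


CATALOG = [
    CatalogEntry("New customers stream", "on-prem", "xqdrnc", "team-crm", True),
    CatalogEntry("Bookings stream", "on-prem", "blgrgb", "team-travel", False),
]


def locate(stream: str) -> CatalogEntry:
    """Tell a user where the data they declared interest in currently lives."""
    for entry in CATALOG:
        if entry.stream == stream:
            return entry
    raise LookupError(f"no such stream registered: {stream}")


def pii_topics() -> list[str]:
    """Answer 'where in Kafka do we store PII data?' from the catalog."""
    return sorted(entry.topic for entry in CATALOG if entry.contains_pii)


where = locate("New customers stream")
```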
Orchestography
[Diagram: Service 1, Service 2, Service 3 and Service 4 (each Orchestrated) around Kafka (or similar, for Choreography); marked "?"]
Questions
What are your questions?
lars.francke@opencore.com
@lars_francke

More Related Content

PDF
Kafka summit apac session
PDF
Building a fence around your Hadoop cluster
PPTX
IoT Data Streaming - Why MQTT and Kafka are a match made in heaven | Dominik ...
PDF
Cisco's MultiCloud Strategy
PPTX
Introduction to ibm cloud paks concept license and minimum config public
PPT
Multi-Cloud Roadmap: Architecting Hybrid Environments for Maximum Results
PDF
The Developer's Journey through IBM Cloud Pak for Applications
PPTX
Breaking the Monolith
Kafka summit apac session
Building a fence around your Hadoop cluster
IoT Data Streaming - Why MQTT and Kafka are a match made in heaven | Dominik ...
Cisco's MultiCloud Strategy
Introduction to ibm cloud paks concept license and minimum config public
Multi-Cloud Roadmap: Architecting Hybrid Environments for Maximum Results
The Developer's Journey through IBM Cloud Pak for Applications
Breaking the Monolith

What's hot (20)

PDF
Application Modernization Using Event Streaming Architecture (David Wadden, V...
PDF
Handling GDPR with Apache Kafka: How to Comply Without Freaking Out? (David J...
PDF
Kubernetes Apache Kafka
PDF
cross cloud inter-operability with iPaaS and serverless for Telco cloud SDN/NFV
PDF
Building Serverless Apps with Kafka (Dale Lane, IBM) Kafka Summit London 2019
PDF
Cloud Foundry, the Open Platform As A Service
PDF
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CI
PPTX
EDA Governance Model: a multicloud approach based on GitOps | Alejandro Alija...
PDF
IBM Cloud pak for data brochure
PDF
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
PDF
How we eased out security journey with OAuth (Goodbye Kerberos!) | Paul Makka...
PDF
BayInfotech (BIT) ACI Portfolio
PPTX
Cloud Foundry: Infrastructure Options
PPTX
CF SUMMIT: Partnerships, Business and Cloud Foundry
PPTX
Kafka Summit 2021 - Why MQTT and Kafka are a match made in heaven
PDF
IBM Cloud Paks - IBM Cloud
PDF
Low-latency real-time data processing at giga-scale with Kafka | John DesJard...
PPT
Birmingham meetup
PDF
Automating the Enterprise with CloudForms & Ansible
PDF
Cncf checkov and bridgecrew
Application Modernization Using Event Streaming Architecture (David Wadden, V...
Handling GDPR with Apache Kafka: How to Comply Without Freaking Out? (David J...
Kubernetes Apache Kafka
cross cloud inter-operability with iPaaS and serverless for Telco cloud SDN/NFV
Building Serverless Apps with Kafka (Dale Lane, IBM) Kafka Summit London 2019
Cloud Foundry, the Open Platform As A Service
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CI
EDA Governance Model: a multicloud approach based on GitOps | Alejandro Alija...
IBM Cloud pak for data brochure
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
How we eased out security journey with OAuth (Goodbye Kerberos!) | Paul Makka...
BayInfotech (BIT) ACI Portfolio
Cloud Foundry: Infrastructure Options
CF SUMMIT: Partnerships, Business and Cloud Foundry
Kafka Summit 2021 - Why MQTT and Kafka are a match made in heaven
IBM Cloud Paks - IBM Cloud
Low-latency real-time data processing at giga-scale with Kafka | John DesJard...
Birmingham meetup
Automating the Enterprise with CloudForms & Ansible
Cncf checkov and bridgecrew
Ad

Similar to Kafka Summit 2019 Microservice Orchestration (20)

PDF
GitOps, Jenkins X &Future of CI/CD
PDF
DevOps Patterns to Enable Success in Microservices
PDF
DevOps Patterns to Enable Success in Microservices
PDF
The new stack isn’t a stack: Fragmentation and terraforming 
the service layer
PDF
DevOps in the Enterprise: My Experience at Accenture
PPTX
Why is Infrastructure-as-Code essential in the Cloud Age?
PDF
Implementing Cloud-Based DevOps for Distributed Agile Projects
PDF
Bluemix overview - Rencontres Ecole Centrale et Supelec avec IBM France Lab -...
PDF
Innovation at Meraki
PDF
Ten Minutes Bluemix Pitch from Dev to Dev
PDF
Leveraging Multiple Cloud Orchestration
PDF
DevOps on AWS: A Practical Introduction
PDF
Ibm cloud open architecture
PPTX
SaaStock 2019 - eltjo hofstee
PPTX
server to cloud: converting a legacy platform to an open source paas
PDF
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
PPTX
5 strategies for enterprise cloud infrastructure success
PPTX
#dbhouseparty - Should I be building Microservices?
PPTX
Cloud Foundry Technical Overview at IBM Interconnect 2016
PDF
Intalio create and cloudfoudry - short
GitOps, Jenkins X &Future of CI/CD
DevOps Patterns to Enable Success in Microservices
DevOps Patterns to Enable Success in Microservices
The new stack isn’t a stack: Fragmentation and terraforming 
the service layer
DevOps in the Enterprise: My Experience at Accenture
Why is Infrastructure-as-Code essential in the Cloud Age?
Implementing Cloud-Based DevOps for Distributed Agile Projects
Bluemix overview - Rencontres Ecole Centrale et Supelec avec IBM France Lab -...
Innovation at Meraki
Ten Minutes Bluemix Pitch from Dev to Dev
Leveraging Multiple Cloud Orchestration
DevOps on AWS: A Practical Introduction
Ibm cloud open architecture
SaaStock 2019 - eltjo hofstee
server to cloud: converting a legacy platform to an open source paas
FaaS or not to FaaS. Visible and invisible benefits of the Serverless paradig...
5 strategies for enterprise cloud infrastructure success
#dbhouseparty - Should I be building Microservices?
Cloud Foundry Technical Overview at IBM Interconnect 2016
Intalio create and cloudfoudry - short
Ad

Recently uploaded (20)

PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPTX
sap open course for s4hana steps from ECC to s4
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Spectral efficient network and resource selection model in 5G networks
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
Review of recent advances in non-invasive hemoglobin estimation
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PPTX
Big Data Technologies - Introduction.pptx
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
Empathic Computing: Creating Shared Understanding
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PDF
Electronic commerce courselecture one. Pdf
Mobile App Security Testing_ A Comprehensive Guide.pdf
sap open course for s4hana steps from ECC to s4
Reach Out and Touch Someone: Haptics and Empathic Computing
Spectral efficient network and resource selection model in 5G networks
“AI and Expert System Decision Support & Business Intelligence Systems”
Review of recent advances in non-invasive hemoglobin estimation
Chapter 3 Spatial Domain Image Processing.pdf
NewMind AI Weekly Chronicles - August'25 Week I
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Advanced methodologies resolving dimensionality complications for autism neur...
The Rise and Fall of 3GPP – Time for a Sabbatical?
Big Data Technologies - Introduction.pptx
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Building Integrated photovoltaic BIPV_UPV.pdf
Understanding_Digital_Forensics_Presentation.pptx
Empathic Computing: Creating Shared Understanding
MIND Revenue Release Quarter 2 2025 Press Release
Electronic commerce courselecture one. Pdf

Kafka Summit 2019 Microservice Orchestration

  • 1. Building and Evolving a Dependency-Graph Based Microservice Architecture Lars Francke – Partner and Co-Founder @ OpenCore Kafka Summit 2019 – 30 September 2019
  • 2. © 2019 OpenCore GmbH & Co. KG 2 About Me – Lars Francke • Partner & Co-Founder at OpenCore • We do Hadoop/Big Data/insert Buzzword consulting • Based in Germany but doing business world-wide if you need us  • https://www.opencore.com • ASF/Big Data/Hadoop since 2008 • Apache Committer & Member: HBase, Hive, ORC, Training (PMC) • Contact • lars.francke@opencore.com • @lars_francke
  • 3. The problem © 2019 OpenCore GmbH & Co. KG 3
  • 4. © 2019 OpenCore GmbH & Co. KG 4 The problem No one here knows the dependencies between all our Microservices anymore! We drew a picture but it hasn't been updated in months and is now doing more harm than good We're afraid of stopping this service because we don't know who depends on it How do these topics differ and where can I find the latest customer registrations? "customer_regs", "customer_regs1", "customer_regs_new", "new_customers", "customers_lars_test" We need to migrate from On-Prem Kafka to Confluent Cloud but have no idea where to begin and what we need.
  • 5. © 2019 OpenCore GmbH & Co. KG 5 The problem Didn't the London team already build a service to check zip codes? Why has this dashboard stopped showing data? Does anyone mind if I add a field to the "Customer" object? Oh no, Governance wants to know where in Kafka we store PII data 
  • 6. Microservice architectures © 2019 OpenCore GmbH & Co. KG 6
  • 8. © 2019 OpenCore GmbH & Co. KG 8 Choreography Also known as Event-Driven
  • 9. © 2019 OpenCore GmbH & Co. KG 9 Choreography • Services coordinate amongst themselves • No central service • "Smart endpoints and dumb pipes" – Martin Fowler & James Lewis • Kafka often used for the "dumb pipes" part (no offense!) • Lots of flexibility • Just add a new service, no need to coordinate with others • Use whatever language you want, whatever data format you want etc. • Often brittle • Loose coupling means you might depend on a service without knowing it • Those dependencies might change and break • People might depend on your service without you knowing it!
  • 10. © 2019 OpenCore GmbH & Co. KG 10 Choreography • Hard to keep track of everything & get an overview • Harder to verify at build-time • One can only do the equivalent of a unit test easily, integration testing is harder if other components are unknown or under control by a different team • Different teams can work independently
  • 11. © 2019 OpenCore GmbH & Co. KG 11 Choreography When I said no central service What I meant was that we obviously still do have central services like: • Schema Registry • Log collection • Monitoring • Etc.
  • 13. © 2019 OpenCore GmbH & Co. KG 13 Orchestration • One central "coordinator" that tells everyone what to do • Like a conductor in an orchestra • The Enterprise Service Bus (ESB) is an example • Routing, Transformations, Business rules etc. • It's easy to get an overview over the whole system • The central service can even provide a nice UI, showing dependency graphs • Monitoring is easier
  • 14. © 2019 OpenCore GmbH & Co. KG 14 Orchestration • Less flexible • Adding a new service requires coordination and potentially changing/restarting existing things • Less brittle • Central service can validate the architecture • The architecture/graph can often be verified at "build"-time • Works well with CI/CD • * as Code (Infrastructure, Configuration, …) • May require coordination between teams • Less self-service
  • 15. © 2019 OpenCore GmbH & Co. KG 15 Orchestography Natural question to ask: Which is better?
  • 16. © 2019 OpenCore GmbH & Co. KG 16 Orchestography
  • 17. © 2019 OpenCore GmbH & Co. KG 17 Orchestography Both have their uses!
  • 18. © 2019 OpenCore GmbH & Co. KG 18 Microservices • Microservices are often used to split up a single monolithic app into multiple independent services • There are still independent "business applications" even though some may share data or even services • Ideally a single team responsible for a product • Orchestration is easier within one product (or team) while Choreography is appealing across product/team borders
  • 19. © 2019 OpenCore GmbH & Co. KG 19 Orchestography • Orchestration lends itself more to "workflow" oriented tasks which are split across multiple processes and/or need to be distributed • Strict or at least strong dependencies between those tasks • Can be seen as "one" thing, that could – in theory – also be implemented as one monolithic process • Choreography lends itself more to loosely coupled or decoupled services • These might also have dependencies but often not as strict
  • 20. © 2019 OpenCore GmbH & Co. KG 20 Orchestography Application 1 (Orchestrated) Kafka (or similar, for Choreography) Application 3 (Orchestrated) Application 4 (Orchestrated) Application 2 (Orchestrated) ?
  • 21. © 2019 OpenCore GmbH & Co. KG 21 Example
  • 22. Naming things is hard © 2019 OpenCore GmbH & Co. KG 22
  • 23. © 2019 OpenCore GmbH & Co. KG 23 Cattle vs. Pets Show of hands Who here has (had) servers with names like: Sources: https://www.flickr.com/photos/gageskidmore/7584137078, https://commons.wikimedia.org/wiki/File:Jean-Luc_Picard_2.jpg, https://www.flickr.com/photos/44214515@N06/21547144233
  • 24. © 2019 OpenCore GmbH & Co. KG 24 Pets Names like that are a good sign that these servers might be your Pets They often have a combination of these features: • Manually built and managed • Indispensable • Can never be down
  • 25. © 2019 OpenCore GmbH & Co. KG 25 Cattle The industry has moved on (or is in the process) to treating Servers (and services) as Cattle instead • No identity (random names or based on some pattern) • Disposable • Infrastructure as Code
  • 26. © 2019 OpenCore GmbH & Co. KG 26 Cattle • The Cloud was a big "enabler" for this movement • Servers have more or less random names • Each specific instance doesn't matter, will be rebuilt when needed • e.g. Spot Instances • Kubernetes & Co. playing a role as well
  • 27. © 2019 OpenCore GmbH & Co. KG 27 Cattle If we agree that this is a good thing… …why do you have a topic called customerCreated
  • 28. Technology to the rescue Lars, tell us what to do! © 2019 OpenCore GmbH & Co. KG 28
  • 29. © 2019 OpenCore GmbH & Co. KG 29  We are not the first to struggle with this Surprising, I know
  • 30. © 2019 OpenCore GmbH & Co. KG 30 Good Compan(y|ies) Source: https://commons.wikimedia.org/wiki/File:ING_Group_N.V._Logo.svg
  • 31. © 2019 OpenCore GmbH & Co. KG 31 Netflix Conductor • Netflix OSS Project: https://github.com/Netflix/conductor • "Conductor is a Workflow Orchestration engine that runs in the cloud." • Conductor runs backend Servers providing UI & REST API • You define your Workflows in a JSON DSL, POST it to the API • You develop your Workers in whichever language you want (convenience libraries available for Java & Python) and they get new work from the REST API
  • 32. © 2019 OpenCore GmbH & Co. KG 32 Netflix Conductor • Workflows consist of Tasks • Conductor itself can store some Payload or it can/must be stored externally • It does not support using Kafka (or similar) to decouple Tasks
  • 33. © 2019 OpenCore GmbH & Co. KG 33 Netflix Conductor – Tasks [ { "name": "verify_if_idents_are_added", "retryCount": 3, "retryLogic": "FIXED", "retryDelaySeconds": 10, "timeoutSeconds": 300, "timeoutPolicy": "TIME_OUT_WF", "responseTimeoutSeconds": 180 }, { "name": "add_idents", "retryCount": 3, "retryLogic": "FIXED", "retryDelaySeconds": 10, "timeoutSeconds": 300, "timeoutPolicy": "TIME_OUT_WF", "responseTimeoutSeconds": 180 } ]
  • 34. © 2019 OpenCore GmbH & Co. KG 34 Netflix Conductor – Workflow Pt. 1 { "name": "add_netflix_identation", "description": "Adds Netflix Identation to video files.", "version": 2, "schemaVersion": 2, "tasks": [ { "name": "verify_if_idents_are_added", "taskReferenceName": "ident_verification", "inputParameters": { "contentId": "${workflow.input.contentId}" }, "type": "SIMPLE" }, { "name": "decide_task", "taskReferenceName": "is_idents_added", "inputParameters": { "case_value_param": "${ident_verification.output.is_idents_added}" },
  • 35. © 2019 OpenCore GmbH & Co. KG 35 Netflix Conductor – Workflow Pt. 2 "type": "DECISION", "caseValueParam": "case_value_param", "decisionCases": { "false": [ { "name": "add_idents", "taskReferenceName": "add_idents_by_type", "inputParameters": { "identType": "${workflow.input.identType}", "contentId": "${workflow.input.contentId}" }, "type": "SIMPLE" } ] } } ] }
  • 36. Uber Cadence
      • Uber project: https://github.com/uber/cadence
      • "Cadence is a distributed, scalable, durable, and highly available orchestration engine to execute asynchronous long-running business logic in a scalable and resilient way."
      • From the same people that led the Amazon Simple Workflow service
      • Has clients for Java & Go
      • Other clients are possible; communication is via Thrift
  • 37. Uber Cadence
      • Cadence handles Task state & Queues for us
      • Your Workflow is implemented in code
      • Workflows can run & wait for a long time
          • Example: Subscription Renewal workflow that runs forever and charges your customer every 30 days
      • Also no direct Kafka integration
  • 38. Uber Cadence – Example
      @Override
      public void execute(String customerId) {
        activities.sendWelcomeEmail(customerId);
        try {
          boolean trialPeriod = true;
          while (true) {
            // Wait 30 days before each charge
            Workflow.sleep(Duration.ofDays(30));
            activities.chargeMonthlyFee(customerId);
            if (trialPeriod) {
              // The first charge marks the end of the trial
              activities.sendEndOfTrialEmail(customerId);
              trialPeriod = false;
            } else {
              activities.sendMonthlyChargeEmail(customerId);
            }
          }
        } catch (CancellationException e) {
          // Cancelling the workflow is how the subscription ends
          activities.processSubscriptionCancellation(customerId);
          activities.sendSorryToSeeYouGoEmail(customerId);
        }
      }
  • 39. Expedia Stream Registry
      • Expedia project (originally HomeAway): https://github.com/ExpediaGroup/stream-registry
      • A metadata service for streams
      • Who owns the stream?
      • Who are the producers and consumers of the stream?
      • Management of stream replication across clusters and regions
      • Management of stream storage for permanent access
      • Management of stream triggers for legacy stream sources
  • 40. Expedia Stream Registry
      • It manages Clusters as well as "Streams" of data
          • Including schemas, owners and other metadata
      • Unfortunately the docs are pretty thin
      • Moved from HomeAway to Expedia while undergoing a refactor
  • 41. Others
      There are others:
      • ING Baker
          • ING project: https://github.com/ing-bank/baker
          • "Orchestrate microservice-based process flows"
          • Java-based library
          • You specify a Recipe which includes all your functions (interactions), the data they need (ingredients) and the data they produce (events)
      • Zeebe
          • From the Camunda folks
          • BPMN 2
          • "A Workflow Engine for Microservices Orchestration"
      • Dagster
      • …
  • 42. Where does this leave us?
  • 43. The Current State
      • Most existing tools require you to explicitly model your dependency graph
          • This makes sense for "strict" workflows
          • But not for many other use-cases (e.g. analytics, logging, persistence etc.)
          • This is comparable to having SQL without a Query Optimizer or Spark without Catalyst
      • Some tools require you to implement their API or use their library
  • 44. The Current State
      • Unfortunately, the perfect solution doesn't exist yet
      • The Orchestrators that do exist are all very nice and work
      • For Choreography, though, things are a bit bleak
      • Stream Registry moves in the right direction
      • Schema Registry is necessary as well but not sufficient
  • 45. Does this seem familiar?
  • 46. Wishlist
      • We need better support for Event-Driven (Choreography) style architectures
      • We need better Governance for data in Kafka
      • This problem is not exclusive to Kafka
      • Kafka topics shouldn't be managed manually
      • We need better self-service tools to find data sources
  • 47. Wishlist
      • We'd like a tool that
          • allows us to register logical streams of data,
              • Used to distinguish flows with the same schema
              • Metadata (e.g. owners)
              • e.g. "New customers stream"
          • allows us to register Connections,
              • e.g. Kafka Clusters, Kinesis credentials etc.
          • allows us to register (Micro-)Services
              • Including their Inputs and Outputs
                  • These are the "Data" in- and outputs, not any topic itself
                  • Both reference existing Schemas
              • Optional: Dependencies
  • 48. Wishlist
      • This tool could use this information to
          • automatically build an optimal DAG,
          • and execute all necessary steps to enable this DAG:
              • Create Kafka Topics
              • Create necessary ACLs
              • Optionally: Update MirrorMaker configuration or other steps
      • The Services themselves can then get all the information they need from the REST API
          • Cluster configuration
          • Schema information
          • Topic names for in- and output
          • Optional: Pre- & Postconditions
  • 49. Wishlist
      • For those who use Apache Spark:
          • In Spark you define all your actions and transformations; at the end it builds an optimal DAG out of this information and executes it
          • This tool would do the exact same thing but across process boundaries
      • The Services themselves can be written in any language as long as they can make REST calls
          • Convenience clients would be great but optional
      • As this tool controls the data flow (no data flows through the tool itself though) it can create "intermediate" topics to enable more use-cases:
          • Quality checks
          • Automatic anonymization
          • Automatic collection of samples
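
The core idea of the Wishlist slides (services declare logical inputs and outputs, and the tool derives the graph and the topic names) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not an existing tool: the class `StreamDag`, the `Service` type, the stream names and the `topic-N` naming scheme are all hypothetical. The sketch also does not check for cycles and does not create topics or ACLs, which a real tool would have to do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types belong to an existing tool.
final class StreamDag {

    // A registered service declares the logical streams it reads and writes,
    // never a concrete topic name.
    static final class Service {
        final String name;
        final List<String> inputs;
        final List<String> outputs;

        Service(String name, List<String> inputs, List<String> outputs) {
            this.name = name;
            this.inputs = inputs;
            this.outputs = outputs;
        }
    }

    // Assign one generated topic name per logical stream that some service produces.
    static Map<String, String> assignTopics(List<Service> services) {
        Map<String, String> topics = new LinkedHashMap<>();
        for (Service service : services) {
            for (String stream : service.outputs) {
                if (!topics.containsKey(stream)) {
                    topics.put(stream, "topic-" + topics.size());
                }
            }
        }
        return topics;
    }

    // Derive "producer -> consumer via topic" edges by matching outputs to inputs.
    static List<String> edges(List<Service> services) {
        Map<String, String> topics = assignTopics(services);
        List<String> result = new ArrayList<>();
        for (Service producer : services) {
            for (String stream : producer.outputs) {
                for (Service consumer : services) {
                    if (consumer.inputs.contains(stream)) {
                        result.add(producer.name + " -> " + consumer.name
                                + " via " + topics.get(stream));
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> none = Collections.emptyList();
        List<Service> services = Arrays.asList(
                new Service("registration", none, Arrays.asList("new-customers")),
                new Service("welcome-mail", Arrays.asList("new-customers"), none),
                new Service("analytics", Arrays.asList("new-customers"), none));
        for (String edge : edges(services)) {
            System.out.println(edge);
        }
    }
}
```

Because no service ever names a topic itself, the tool stays free to rename, split or relocate topics later; the services would simply be handed the current names over the REST API.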
  • 50. Example
  • 51. Example
      [Diagram: Service A, Topic "xqdrnc", Service B]
  • 52. Example
      [Diagram: Service A, Topic "xqdrnc", Service B, Infra Service]
      if (booking.travel_agency == "Thomas Cook") { alert() }
  • 53. Example
      [Diagram: Service A, Topic "xqdrnc", Service B]
      if (booking.travel_agency == "Thomas Cook") { fail() }
  • 54. Example
      [Diagram: Service A, Topic "xqdrnc", Infra Service, Topic "blgrgb", Service B]
      if (booking.travel_agency == "Thomas Cook") { fail() }
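
The Example slides appear to show the tool placing an infrastructure service (for instance a quality check) between Service A and Service B by introducing a second, intermediate topic. A rough sketch of that rewiring step is given here. Everything in it is an assumption made for illustration: the class `TopicSplice`, the `Hop` type and the `splice` method are hypothetical, and which of the two topics ends up on which side of the Infra Service is a guess, since the slide text does not say.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a data flow modelled as a list of producer/topic/consumer hops.
final class TopicSplice {

    static final class Hop {
        final String producer;
        final String topic;
        final String consumer;

        Hop(String producer, String topic, String consumer) {
            this.producer = producer;
            this.topic = topic;
            this.consumer = consumer;
        }

        @Override
        public String toString() {
            return producer + " -> [" + topic + "] -> " + consumer;
        }
    }

    // Replace the hop producer -> consumer with two hops routed through infraService.
    // The producer keeps its topic; the consumer is pointed at newTopic instead.
    static List<Hop> splice(List<Hop> hops, String producer, String consumer,
                            String infraService, String newTopic) {
        List<Hop> result = new ArrayList<>();
        for (Hop hop : hops) {
            if (hop.producer.equals(producer) && hop.consumer.equals(consumer)) {
                result.add(new Hop(producer, hop.topic, infraService));
                result.add(new Hop(infraService, newTopic, consumer));
            } else {
                result.add(hop);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Hop> before = Arrays.asList(new Hop("Service A", "xqdrnc", "Service B"));
        for (Hop hop : splice(before, "Service A", "Service B", "Infra Service", "blgrgb")) {
            System.out.println(hop);
        }
    }
}
```

This only works because the services look up their topic names from the tool rather than hard-coding them: Service B can be told to read from the new topic without a code change.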
  • 55. Wishlist
      • This tool could also (optionally) automatically run or re-run the services using e.g. Kubernetes
          • This would allow for total control
          • Services need to be made aware of changes in the topology
      • We could automatically transform between data formats
          • e.g. for a service that accepts Protobuf when the data only exists in Avro
  • 56. Wishlist
      • A side effect would be an automatically up-to-date Governance/Data Catalog
      • This would allow for better self-service operations: You don't have to find "topics" in Kafka with your data, you just have to declare which data you're interested in and the system will always tell you where this data lives
      • Orchestrators like Conductor etc. would still be important, encapsulated in an "Application"
          • Which itself could consist of multiple services
  • 57. Orchestography
      [Diagram: Service 1, Service 2, Service 3 and Service 4 (each Orchestrated) around Kafka (or similar, for Choreography), with a "?"]
  • 58. Questions: What are your questions? lars.francke@opencore.com, @lars_francke

Editor's Notes

  • #2: Note: No Kafka in the title. We promised Kafka Streams but that's not going to happen, sorry.
  • #8: https://pxhere.com/en/photo/940450
  • #13: https://commons.wikimedia.org/wiki/File:Peter_Oundjian_-_Conductor_of_Toronto_Symphony_Orchestra_2014.jpg
  • #20: Before microservices these workflow style things would probably have been one monolithic service
  • #22: A customer of ours has distributed teams. They write their services in various languages (JavaScript, Java, Python, Go). What can't be seen is that every arrow means a new topic. Downstream services don't care which topic they read from; they only care about the "best" booking there is. Some don't care and use the earliest one.
  • #24: https://www.flickr.com/photos/gageskidmore/7584137078 https://commons.wikimedia.org/wiki/File:Jean-Luc_Picard_2.jpg https://www.flickr.com/photos/44214515@N06/21547144233 Other examples: Rancor, Merlin, ...
  • #26: http://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/
  • #31: https://en.wikipedia.org/wiki/File:Netflix_2015_logo.svg https://brand.uber.com/guide#logo-overview https://commons.wikimedia.org/wiki/File:ING_Group_N.V._Logo.svg https://newsroom.expedia.com/image-gallery
  • #32: https://netflix.github.io/conductor/
  • #44: Baker is the exception
  • #51: A customer of ours has distributed teams. They write their services in various languages (JavaScript, Java, Python, Go). What can't be seen is that every arrow means a new topic.