Kafka Summit 2019 Microservice Orchestration

Building and Evolving a Dependency-Graph Based Microservice Architecture
Lars Francke – Partner and Co-Founder @ OpenCore
Kafka Summit 2019 – 30 September 2019

© 2019 OpenCore GmbH & Co. KG
About Me – Lars Francke
• Partner & Co-Founder at OpenCore
• We do Hadoop/Big Data/insert Buzzword consulting
• Based in Germany but doing business world-wide if you need us 
• https://www.opencore.com
• ASF/Big Data/Hadoop since 2008
• Apache Committer & Member: HBase, Hive, ORC, Training (PMC)
• Contact
• lars.francke@opencore.com
• @lars_francke

The problem

The problem
• No one here knows the dependencies between all our Microservices anymore!
• We drew a picture but it hasn't been updated in months and is now doing more harm than good
• We're afraid of stopping this service because we don't know who depends on it
• How do these topics differ and where can I find the latest customer registrations? "customer_regs", "customer_regs1", "customer_regs_new", "new_customers", "customers_lars_test"
• We need to migrate from On-Prem Kafka to Confluent Cloud but have no idea where to begin and what we need.

The problem
• Didn't the London team already build a service to check zip codes?
• Why has this dashboard stopped showing data?
• Does anyone mind if I add a field to the "Customer" object?
• Oh no, Governance wants to know where in Kafka we store PII data

Microservice architectures

Choreography
Also known as
Event-Driven

Choreography
• Services coordinate amongst themselves
• No central service
• "Smart endpoints and dumb pipes" – Martin Fowler & James Lewis
• Kafka often used for the "dumb pipes" part (no offense!)
• Lots of flexibility
• Just add a new service, no need to coordinate with others
• Use whatever language you want, whatever data format you want etc.
• Often brittle
• Loose coupling means you might depend on a service without knowing it
• Those dependencies might change and break
• People might depend on your service without you knowing it!
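The hidden-dependency problem can be made concrete with a small sketch. It assumes we somehow had a record of which service writes to and reads from which topic; the `produces` and `consumes` mappings and all service and topic names below are hypothetical. With such an inventory, finding everyone who depends on a service is a simple lookup:

```python
from collections import defaultdict

# Hypothetical inventory: which service writes to / reads from which topic.
produces = {
    "registration-service": ["customer_regs"],
    "zip-check-service": ["zip_results"],
}
consumes = {
    "zip-check-service": ["customer_regs"],
    "dashboard": ["customer_regs", "zip_results"],
}


def dependents_of(service: str) -> set[str]:
    """Return all services that read a topic written by `service`."""
    readers = defaultdict(set)
    for consumer, topics in consumes.items():
        for topic in topics:
            readers[topic].add(consumer)
    result = set()
    for topic in produces.get(service, []):
        result |= readers[topic]
    result.discard(service)  # a service does not depend on itself
    return result


print(sorted(dependents_of("registration-service")))
```

In a choreographed system nobody maintains this inventory by default, which is exactly why the dependencies go unnoticed.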

Choreography
• Hard to keep track of everything & get an overview
• Harder to verify at build-time
• Only the equivalent of a unit test is easy; integration testing is harder if
other components are unknown or controlled by a different team
• Different teams can work independently

Choreography
When I said "no central service", what I meant was that we obviously
still do have central services like:
• Schema Registry
• Log collection
• Monitoring
• Etc.

Orchestration
Source: https://commons.wikimedia.org/wiki/File:Peter_Oundjian_-_Conductor_of_Toronto_Symphony_Orchestra_2014.jpg

Orchestration
• One central "coordinator" that tells everyone what to do
• Like a conductor in an orchestra
• The Enterprise Service Bus (ESB) is an example
• Routing, Transformations, Business rules etc.
• It's easy to get an overview of the whole system
• The central service can even provide a nice UI, showing dependency graphs
• Monitoring is easier

Orchestration
• Less flexible
• Adding a new service requires coordination and potentially changing/restarting
existing things
• Less brittle
• Central service can validate the architecture
• The architecture/graph can often be verified at "build"-time
• Works well with CI/CD
• * as Code (Infrastructure, Configuration, …)
• May require coordination between teams
• Less self-service
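As a rough illustration of verifying the graph at "build"-time, the sketch below checks a declared architecture before anything is deployed. The service names and the `architecture` mapping are made up for this example; it uses `graphlib` from the Python standard library (3.9+) and is not tied to any particular orchestrator:

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical declaration: service name -> services it depends on.
architecture = {
    "ingest": [],
    "enrich": ["ingest"],
    "report": ["enrich", "ingest"],
}


def validate(graph: dict[str, list[str]]) -> list[str]:
    """Return a valid start order or raise ValueError describing the problem."""
    for service, deps in graph.items():
        for dep in deps:
            if dep not in graph:
                raise ValueError(f"{service} depends on unknown service {dep}")
    try:
        # static_order() yields dependencies before their dependents.
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # The second element of args holds the detected cycle.
        raise ValueError(f"dependency cycle: {err.args[1]}") from err


print(validate(architecture))
```

A check like this could run in a CI/CD pipeline and fail the build when someone declares a dependency on a service that does not exist or introduces a cycle.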

Orchestography
Natural question to ask:
Which is better?

Orchestography

Orchestography
Both have their uses!

Microservices
• Microservices are often used to split up a single monolithic app into
multiple independent services
• There are still independent "business applications" even though some may
share data or even services
• Ideally a single team is responsible for a product
• Orchestration is easier within one product (or team) while Choreography is
appealing across product/team borders

Orchestography
• Orchestration lends itself more to "workflow" oriented tasks which are split
across multiple processes and/or need to be distributed
• Strict or at least strong dependencies between those tasks
• Can be seen as "one" thing, that could – in theory – also be implemented as one
monolithic process
• Choreography lends itself more to loosely coupled or decoupled services
• These might also have dependencies but often not as strict

Orchestography
[Diagram: Application 1, Application 2, Application 3 and Application 4 (each Orchestrated) grouped around Kafka (or similar, for Choreography), with a "?"]

Example

Naming things is hard

Cattle vs. Pets
Show of hands
Who here has (had) servers with names like:
Sources: https://www.flickr.com/photos/gageskidmore/7584137078, https://commons.wikimedia.org/wiki/File:Jean-Luc_Picard_2.jpg,
https://www.flickr.com/photos/44214515@N06/21547144233

Pets
Names like that are a good sign that these servers might be your Pets
They often have a combination of these features:
• Manually built and managed
• Indispensable
• Can never be down

Cattle
The industry has moved on (or is in the process of moving on) to treating Servers (and
services) as Cattle instead
• No identity (random names or based on some pattern)
• Disposable
• Infrastructure as Code

Cattle
• The Cloud was a big "enabler" for this movement
• Servers have more or less random names
• Each specific instance doesn't matter, will be rebuilt when needed
• e.g. Spot Instances
• Kubernetes & Co. playing a role as well

Cattle
If we agree that this is a good thing…
…why do you have a topic called
customerCreated

Technology to the rescue
Lars, tell us what to do!


We are not the first to struggle with this
Surprising, I know

Good Compan(y|ies)
Source: https://commons.wikimedia.org/wiki/File:ING_Group_N.V._Logo.svg

Netflix Conductor
• Netflix OSS Project: https://github.com/Netflix/conductor
• "Conductor is a Workflow Orchestration engine that runs in the cloud."
• Conductor runs backend Servers providing UI & REST API
• You define your Workflows in a JSON DSL, POST it to the API
• You develop your Workers in whichever language you want (convenience
libraries available for Java & Python) and they get new work from the REST
API

Netflix Conductor
• Workflows consist of Tasks
• Conductor itself can store some Payload or it can/must be stored externally
• It does not support using Kafka (or similar) to decouple Tasks

Netflix Conductor – Tasks
[
  {
    "name": "verify_if_idents_are_added",
    "retryCount": 3,
    "retryLogic": "FIXED",
    "retryDelaySeconds": 10,
    "timeoutSeconds": 300,
    "timeoutPolicy": "TIME_OUT_WF",
    "responseTimeoutSeconds": 180
  },
  {
    "name": "add_idents",
    "retryCount": 3,
    "retryLogic": "FIXED",
    "retryDelaySeconds": 10,
    "timeoutSeconds": 300,
    "timeoutPolicy": "TIME_OUT_WF",
    "responseTimeoutSeconds": 180
  }
]

Netflix Conductor – Workflow Pt. 1
{
  "name": "add_netflix_identation",
  "description": "Adds Netflix Identation to video files.",
  "version": 2,
  "schemaVersion": 2,
  "tasks": [
    {
      "name": "verify_if_idents_are_added",
      "taskReferenceName": "ident_verification",
      "inputParameters": {
        "contentId": "${workflow.input.contentId}"
      },
      "type": "SIMPLE"
    },
    {
      "name": "decide_task",
      "taskReferenceName": "is_idents_added",
      "inputParameters": {
        "case_value_param": "${ident_verification.output.is_idents_added}"
      },

Netflix Conductor – Workflow Pt. 2
      "type": "DECISION",
      "caseValueParam": "case_value_param",
      "decisionCases": {
        "false": [
          {
            "name": "add_idents",
            "taskReferenceName": "add_idents_by_type",
            "inputParameters": {
              "identType": "${workflow.input.identType}",
              "contentId": "${workflow.input.contentId}"
            },
            "type": "SIMPLE"
          }
        ]
      }
    }
  ]
}
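To make "POST it to the API" a bit more tangible, here is a minimal sketch that only builds the HTTP requests and does not send them. The base URL is an assumption, and the two endpoint paths (`/metadata/taskdefs` and `/metadata/workflow`) reflect my understanding of the Conductor metadata API; please verify them against the documentation of the version you run. The payloads are shortened versions of the definitions shown on the slides.

```python
import json
import urllib.request

# Assumption: a Conductor server reachable at this base URL.
BASE_URL = "http://localhost:8080/api"


def build_post(path: str, payload) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the Conductor API."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Shortened payloads; see the full definitions on the slides.
task_defs = [{"name": "add_idents", "retryCount": 3, "timeoutSeconds": 300}]
workflow = {"name": "add_netflix_identation", "version": 2, "tasks": []}

# Assumed endpoint paths, to be checked against the Conductor docs.
requests_to_send = [
    build_post("/metadata/taskdefs", task_defs),
    build_post("/metadata/workflow", workflow),
]

for req in requests_to_send:
    # urllib.request.urlopen(req) would actually send the request.
    print(req.get_method(), req.full_url)
```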

Uber Cadence
• Uber Project: https://github.com/uber/cadence
• "Cadence is a distributed, scalable, durable, and highly available
orchestration engine to execute asynchronous long-running business logic
in a scalable and resilient way."
• From the same people that led the Amazon Simple Workflow service
• Has Clients for Java & Go
• Others are possible; clients communicate via Thrift

Uber Cadence
• Cadence handles Task state & Queues for us
• Your Workflow is implemented in code
• Workflows can run & wait for a long time
• Example: Subscription Renewal workflow that runs forever and charges
your customer every 30 days
• Also no direct Kafka integration

Uber Cadence – Example
@Override
public void execute(String customerId) {
    activities.sendWelcomeEmail(customerId);
    try {
        boolean trialPeriod = true;
        // Runs until the workflow is cancelled.
        while (true) {
            // The workflow waits 30 days between charges.
            Workflow.sleep(Duration.ofDays(30));
            activities.chargeMonthlyFee(customerId);
            if (trialPeriod) {
                activities.sendEndOfTrialEmail(customerId);
                trialPeriod = false;
            } else {
                activities.sendMonthlyChargeEmail(customerId);
            }
        }
    } catch (CancellationException e) {
        // Cancellation ends the subscription.
        activities.processSubscriptionCancellation(customerId);
        activities.sendSorryToSeeYouGoEmail(customerId);
    }
}

Expedia Stream Registry
• Expedia project (originally HomeAway):
https://github.com/ExpediaGroup/stream-registry
• A metadata service for streams
• Who owns the stream?
• Who are the producers and consumers of the stream?
• Management of stream replication across clusters and regions
• Management of stream storage for permanent access
• Management of stream triggers for legacy stream sources

Expedia Stream Registry
• It manages Clusters as well as "Streams" of data
• Including schemas, owners and other metadata
• Unfortunately the docs are pretty thin
• Moved from HomeAway to Expedia while undergoing a refactor

Others
There are others:
• ING Baker
• ING Project: https://github.com/ing-bank/baker
• "Orchestrate microservice-based process flows"
• Java based library
• You specify a Recipe which includes all your functions (interactions), the data they
need (ingredients) and the data they produce (events)
• Zeebe
• From the Camunda folks
• BPMN 2
• "A Workflow Engine for Microservices Orchestration"
• Dagster
• …

Where does this leave us?

The Current State
• Most existing tools require you to explicitly model your dependency graph
• This makes sense for "strict" workflows
• But not for many other use-cases (e.g. analytics, logging, persistence etc.)
• This is comparable to having SQL without a Query Optimizer or Spark
without Catalyst
• Some tools require you to implement their API or use their library

The Current State
• Unfortunately, the perfect solution doesn't yet exist
• The Orchestrators that do exist are all very nice and work
• For Choreography, though, things are a bit bleak
• Stream Registry moves in the right direction
• Schema Registry is necessary as well but not sufficient

Does this seem familiar?

Wishlist
• We need better support for Event-Driven (Choreography) style
architectures
• We need better Governance for data in Kafka
• This problem is not exclusive to Kafka
• Kafka topics shouldn't be managed manually
• We need better self-service tools to find data sources

Wishlist
• We'd like a tool that
• allows us to register logical streams of data,
• Used to distinguish flows with the same schema
• Metadata (e.g. owners)
• e.g. "New customers stream"
• allows us to register Connections,
• e.g. Kafka Clusters, Kinesis credentials etc.
• allows us to register (Micro-)Services
• Including their Inputs and Outputs
• These are the "Data" in- and outputs, not any topic itself
• Both reference existing Schemas
• Optional: Dependencies
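A minimal sketch of what such a registry could hold, purely as an illustration of the wishlist above. No such tool exists in this form; every class, field and example value here is hypothetical:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Stream:
    """A logical stream of data, independent of any physical topic."""
    name: str    # e.g. "New customers stream"
    schema: str  # reference to an existing schema
    owner: str   # metadata


@dataclass(frozen=True)
class Connection:
    """Somewhere data can physically live, e.g. a Kafka cluster."""
    name: str
    kind: str    # e.g. "kafka" or "kinesis"


@dataclass(frozen=True)
class Service:
    """A (micro-)service described by the data it reads and writes."""
    name: str
    inputs: tuple[str, ...] = ()   # names of logical streams, not topics
    outputs: tuple[str, ...] = ()


@dataclass
class Registry:
    streams: dict[str, Stream] = field(default_factory=dict)
    connections: dict[str, Connection] = field(default_factory=dict)
    services: dict[str, Service] = field(default_factory=dict)

    def register_stream(self, stream: Stream) -> None:
        self.streams[stream.name] = stream

    def register_connection(self, connection: Connection) -> None:
        self.connections[connection.name] = connection

    def register_service(self, service: Service) -> None:
        # Inputs and outputs must reference already registered streams.
        for name in service.inputs + service.outputs:
            if name not in self.streams:
                raise KeyError(f"unknown stream: {name}")
        self.services[service.name] = service


registry = Registry()
registry.register_connection(Connection("on-prem", "kafka"))
registry.register_stream(Stream("New customers stream", "Customer", "team-crm"))
registry.register_service(Service("zip-check", inputs=("New customers stream",)))
print(sorted(registry.services))
```

Note that a service never mentions a topic: it only names the logical data it needs, which is what would let the tool decide where that data physically lives.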

Wishlist
• This tool could use this information to
• automatically build an optimal DAG,
• and execute all necessary steps to enable this DAG:
• Create Kafka Topics
• Create necessary ACLs
• Optionally: Update MirrorMaker configuration or other steps
• The Services themselves can then get all the information they need from the REST API
• Cluster configuration
• Schema information
• Topic names for in- and output
• Optional: Pre- & Postconditions
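The steps above could be sketched roughly as follows. This is an illustration under simple assumptions, not a real tool: the service declarations are invented, every stream has exactly one producer, and the sketch merely derives a DAG (it makes no attempt at being "optimal"). It only computes what would have to be created; nothing talks to Kafka.

```python
import secrets
import string
from graphlib import TopologicalSorter

# Hypothetical declarations: service -> (input streams, output streams).
services = {
    "registration": ([], ["new-customers"]),
    "zip-check": (["new-customers"], ["checked-customers"]),
    "dashboard": (["checked-customers"], []),
}


def random_topic_name(length: int = 6) -> str:
    """Cattle, not pets: topics get opaque names instead of meaningful ones."""
    return "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))


def plan(declared):
    """Derive start order, topic names and ACLs from the declarations."""
    producers = {s: name for name, (_, outs) in declared.items() for s in outs}
    # A service depends on the producer of each of its input streams.
    graph = {name: {producers[s] for s in ins} for name, (ins, _) in declared.items()}
    order = list(TopologicalSorter(graph).static_order())
    topics = {stream: random_topic_name() for stream in producers}
    acls = []
    for name, (ins, outs) in declared.items():
        acls += [(name, "READ", topics[s]) for s in ins]
        acls += [(name, "WRITE", topics[s]) for s in outs]
    return order, topics, acls


order, topics, acls = plan(services)
print(order)
for acl in acls:
    print(acl)
```

A service would then ask the REST API for "the topic behind new-customers" rather than hard-coding a name.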

Wishlist
• For those who use Apache Spark:
• In Spark you define all your actions and transformations; at the end it builds an
optimal DAG out of this information and executes it
• This tool would do the exact same thing but across process boundaries
• The Services themselves can be written in any language as long as they can make
REST calls
• Convenience clients would be great but optional
• As this tool controls the data flow (no data flows through the tool itself
though) it can create "intermediate" topics to enable more use-cases:
• Quality checks
• Automatic anonymization
• Automatic collection of samples
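Because services only ask the tool for their topic names, an intermediate step can be slotted in without changing any service. A minimal sketch of that rewiring, with a made-up `route` structure and a hypothetical "anonymizer" step:

```python
def insert_step(route: dict, step: str, new_topic: str) -> dict:
    """Return a new route in which `step` sits in front of the consumers."""
    return {
        "producer_topic": route["producer_topic"],
        # The new step reads what consumers used to read and writes new_topic.
        "steps": route["steps"] + [(step, route["consumer_topic"], new_topic)],
        "consumer_topic": new_topic,
    }


# Initially producer and consumers share one topic.
route = {"producer_topic": "xqdrnc", "steps": [], "consumer_topic": "xqdrnc"}

# Slot an anonymization step in between; consumers are told to read "blgrgb".
route = insert_step(route, "anonymizer", "blgrgb")
print(route)
```

The producer keeps writing to the same topic, and consumers simply receive a different topic name the next time they ask.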

Example

Example
[Diagram: Service A and Service B, connected by Topic "xqdrnc"]

Example
[Diagram: Infra Service]
if (booking.travel_agency == "Thomas Cook") {
    alert()
}

Example
[Diagram: Infra Service with a check that calls fail(), and Topic "blgrgb"]

Wishlist
• This tool could also (optionally) automatically run or re-run the services
using e.g. Kubernetes
• This'd allow for total control
• Services need to be made aware of changes in the topology
• We could automatically transform between data formats
• e.g. a service accepting Protobuf but the data only exists in Avro

Wishlist
• A side effect would be an automatically up-to-date Governance/Data
Catalog
• This would allow for better self-service operations: You don't have to find "topics" in
Kafka with your data, you just have to declare which data you're interested in and
the system will always tell you where this data lives
• Orchestrators like Conductor etc. would still be important encapsulated in a
"Application"
• Which itself could consist of multiple services

Orchestography
[Diagram: Service 1, Service 2, Service 3 and Service 4 (each Orchestrated) grouped around Kafka (or similar, for Choreography), with a "?"]

Questions
What are your questions?
lars.francke@opencore.com
@lars_francke
