Streaming DynamoDB Changelog to Elasticsearch
Ying Xu - Core Streaming Team
Dan Fan - Core Datastores Team
Agenda
● DynamoDB -> Elasticsearch use case
● DynamoDB changelog to Elasticsearch -- deep dive
● Elasticsearch cluster management at Lyft
● Summary
A story of a ride pass package
What happens in the backend when extending a ride pass package?
[Diagram labels: UI, HTTP PUT, Update, HTTP GET, DynamoDB streams, changelog data ingestion system]
Streaming DynamoDB changelog to Elasticsearch
Requirements on DynamoDB changelog data ingestion
• Real-time data
‒ Data available for downstream consumption within seconds
• High durability
‒ Data loss implies inconsistency at the sink
• Strong ordering
‒ Out-of-order delivery implies inconsistency at the sink
• Heterogeneity
‒ Distinct data sources/sinks
• Highly available, fault-tolerant
‒ Self-recover from occasional failures
• Real-time transformation
‒ Ex: JSON->protobuf
Data ordering and consistency
[Diagram: three versions of one record: v1 (EXP: 09/18), v2 (EXP: 10/18), v3 (EXP: 11/18)]
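To make the ordering requirement concrete, here is a minimal sketch in which a plain dict stands in for the sink and every change simply overwrites the stored document. The key `pass-1` and the tuple layout are illustrative assumptions, not taken from the slides; the versions and EXP dates mirror the diagram.

```python
def apply_changes(changes):
    """Apply last-write-wins updates to a dict that stands in for the sink."""
    sink = {}
    for key, version, exp in changes:
        sink[key] = {"version": version, "exp": exp}
    return sink


# Same three changes, delivered in order and with the last two swapped.
in_order = [("pass-1", 1, "09/18"), ("pass-1", 2, "10/18"), ("pass-1", 3, "11/18")]
reordered = [("pass-1", 1, "09/18"), ("pass-1", 3, "11/18"), ("pass-1", 2, "10/18")]

print(apply_changes(in_order)["pass-1"])   # ends on v3
print(apply_changes(reordered)["pass-1"])  # ends on stale v2
```

With in-order delivery the sink ends on v3; with the swap it ends on the stale v2, which is the inconsistency the requirement is meant to rule out.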
Overview of E2E data pipeline
[Diagram: DynamoDB (tableA) -> DynamoDB streams -> Flink job (dynamostreams -> kafka): DynamoDB streams Flink connector -> map (JSON -> PROTO) -> Flink Kafka connector -> Kafka cluster (cdc2es-tableA) -> Flink job (kafka -> Elasticsearch): Flink Kafka connector -> map -> Flink Elasticsearch connector -> Elasticsearch cluster. Some components are marked NEW DEVELOPMENT.]
Kafka -- distributed event log
• Apache Kafka: state-of-the-art pubsub technology
‒ High durability and strong ordering guarantee
‒ Excellent fanout
‣ No hard limitation on number of consumer groups
‒ High throughput and low latency
‒ Multi-language client support
‣ Python/Go (librdkafka)
‣ Java (native or flink kafka connector)
‒ Mature technology with wide adoption
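Kafka guarantees ordering only within a partition, so a per-record ordering guarantee usually relies on routing all changes for one record to the same partition. The slides do not say how the topic is keyed; the sketch below assumes records are keyed by primary key and uses CRC32 purely as a stand-in for a real partitioner.

```python
import zlib
from collections import defaultdict


def partition_for(primary_key, num_partitions):
    """Map a key to a partition deterministically (CRC32 is only a stand-in hash)."""
    return zlib.crc32(primary_key.encode("utf-8")) % num_partitions


def publish(events, num_partitions):
    """Append each (key, version) event to the partition chosen by its key."""
    partitions = defaultdict(list)
    for primary_key, version in events:
        partition = partition_for(primary_key, num_partitions)
        partitions[partition].append((primary_key, version))
    return partitions


# Interleaved changes for two hypothetical records.
events = [("pass-1", 1), ("pass-2", 1), ("pass-1", 2), ("pass-2", 2), ("pass-1", 3)]
partitions = publish(events, num_partitions=4)

for key in ("pass-1", "pass-2"):
    home = partitions[partition_for(key, 4)]
    print(key, [version for k, version in home if k == key])
```

Whatever partition a key hashes to, its versions appear there in the order they were published, so a single consumer of that partition sees them in order.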
Running Kafka on AWS
• Confluent cloud
‒ Managed Kafka clusters running on AWS
‣ VPC peered with Lyft AWS account
‣ High availability, multi-AZ config
‣ SASL authentication
‣ Encryption on the wire & at rest
‒ Monitoring and control
‣ Cloud portal with monitoring/control panel
‣ Broker-side metrics**
‒ General SLOs
‣ Uptime: 99.95%
‣ Message success rate: 99.99%
‣ Write latency: p95 < 50ms
‒ Changelog data retention: 4 days
** on support roadmap
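As a rough reading of the uptime SLO, the short calculation below converts an uptime percentage into an allowed downtime budget. The 30-day window is an assumption; the slides do not state the measurement period.

```python
def allowed_downtime_minutes(uptime_slo, days=30):
    """Downtime budget in minutes for a given uptime fraction over a window of days."""
    return (1.0 - uptime_slo) * days * 24 * 60


budget = allowed_downtime_minutes(0.9995)
print(round(budget, 1))  # roughly 21.6 minutes per 30 days
```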
Flink based data pipeline
• Apache Flink: modern stream compute framework
‒ Per-event based stream processing
‒ Flexible API unifying stream and batch processing
‒ State persisted through checkpoints/savepoints (fault tolerance)
‒ Multi-language (Python/Go) support through the Apache Beam framework
• Flink at Lyft
‒ Tooling for job lifecycle management
‒ Flink jobs running as services
‒ Distributed execution -- standalone flink cluster
Graph from Flink website: https://ci.apache.org/projects/flink/flink-docs-release-1.6/concepts/runtime.html
E2E data pipeline -- recap
Upstream Flink job: DynamoDB changelogs -> Kafka
Downstream Flink job: Kafka -> Elasticsearch
Flink DynamoDB streams connector
• DynamoDB streams
‒ Changelog scales automatically along with DynamoDB
‒ Exactly-once delivery of change operations
‣ Event type
‣ Primary key
‣ Old and/or new images of the record
‒ Special Kinesis stream
• Cannot directly use existing Flink-Kinesis connector
[Figure: Two ways to interact with DynamoDB streams]
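The fields listed above (event type, primary key, old and/or new images) correspond to the `eventName`, `Keys`, `OldImage` and `NewImage` fields of a DynamoDB Streams record. The sketch below pulls them out of a record shaped like the ones the service emits; the attribute names `pass_id` and `exp` and all values are made up for illustration.

```python
from typing import Any, Dict


def flatten_stream_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the change fields from a DynamoDB Streams record."""
    body = record["dynamodb"]
    return {
        "event_type": record["eventName"],   # INSERT, MODIFY or REMOVE
        "primary_key": body["Keys"],
        "old_image": body.get("OldImage"),   # absent for INSERT
        "new_image": body.get("NewImage"),   # absent for REMOVE
    }


# Hypothetical MODIFY record; attribute names and values are illustrative only.
sample_record = {
    "eventID": "1",
    "eventName": "MODIFY",
    "dynamodb": {
        "Keys": {"pass_id": {"S": "pass-1"}},
        "OldImage": {"pass_id": {"S": "pass-1"}, "exp": {"S": "10/18"}},
        "NewImage": {"pass_id": {"S": "pass-1"}, "exp": {"S": "11/18"}},
        "SequenceNumber": "111",
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
}

change = flatten_stream_record(sample_record)
print(change["event_type"], change["new_image"]["exp"]["S"])
```

Whether both images are present depends on the stream view type configured on the table, which is why the sketch reads them with `get`.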
Flink DynamoDB streams connector
• NEW Flink source connector developed at Lyft
‒ Integrates the official DynamoDB Streams Kinesis Adapter
‒ Adapts the necessary Kinesis client methods
‣ describeStream
‣ getRecords
‣ getShardIterator
‒ Reuses the Flink-Kinesis connector's state management and checkpoint mechanism
• Other adjustments
‒ Handle DynamoDB streams shardID format
‒ Handle AWS DynamoDB local container compatibility
[Diagram labels: DynamoDB, DynamoDB streams, Dynamostreams low-level API, Dynamo Kinesis adapter, Kinesis protocol handler, Flink DynamoDB streams connector]
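The adapter idea can be illustrated with a toy example: a wrapper exposes the three Kinesis-style calls named above and delegates to a DynamoDB-streams-style client, so code written against the Kinesis shape can be reused. This is not Lyft's connector or the AWS adapter; both classes below, their method signatures, and the iterator format are hypothetical stand-ins.

```python
import json


class FakeDynamoStreamsClient:
    """Hypothetical in-memory stand-in for a low-level DynamoDB Streams client."""

    def __init__(self, records_by_shard):
        self._records = records_by_shard

    def describe_stream(self, stream_arn):
        shards = [{"ShardId": shard_id} for shard_id in sorted(self._records)]
        return {"StreamArn": stream_arn, "Shards": shards}

    def get_shard_iterator(self, stream_arn, shard_id):
        return f"{shard_id}:0"  # toy iterator: "<shard id>:<position>"

    def get_records(self, shard_iterator, limit=100):
        shard_id, position = shard_iterator.rsplit(":", 1)
        start = int(position)
        batch = self._records[shard_id][start:start + limit]
        return {"Records": batch, "NextShardIterator": f"{shard_id}:{start + len(batch)}"}


class KinesisStyleAdapter:
    """Expose Kinesis-shaped calls on top of the streams client (illustrative only)."""

    def __init__(self, streams_client):
        self._client = streams_client

    def describe_stream(self, stream_name):
        description = self._client.describe_stream(stream_name)
        return {"StreamName": stream_name, "Shards": description["Shards"]}

    def get_shard_iterator(self, stream_name, shard_id):
        return self._client.get_shard_iterator(stream_name, shard_id)

    def get_records(self, shard_iterator, limit=100):
        result = self._client.get_records(shard_iterator, limit)
        # Wrap each change record as an opaque byte payload, as a Kinesis consumer expects.
        wrapped = [
            {"SequenceNumber": r["dynamodb"]["SequenceNumber"],
             "Data": json.dumps(r).encode("utf-8")}
            for r in result["Records"]
        ]
        return {"Records": wrapped, "NextShardIterator": result["NextShardIterator"]}


client = FakeDynamoStreamsClient({"shardId-0001": [
    {"eventName": "INSERT", "dynamodb": {"SequenceNumber": "100"}},
    {"eventName": "MODIFY", "dynamodb": {"SequenceNumber": "101"}},
]})
adapter = KinesisStyleAdapter(client)
shard_id = adapter.describe_stream("tableA-stream")["Shards"][0]["ShardId"]
iterator = adapter.get_shard_iterator("tableA-stream", shard_id)
batch = adapter.get_records(iterator)
print([record["SequenceNumber"] for record in batch["Records"]])
```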
Sync Changelog From Kafka to Elasticsearch
● Overview of Kafka -> Elasticsearch Flink job
● How to handle 429 Too Many Requests
● How to address access control issues
● How to achieve seamless pipeline migration & ES upgrade
Overview of Kafka -> Elasticsearch Flink Job
● Elasticsearch @ Lyft
○ All clusters are AWS managed Elasticsearch clusters
○ Orchestrated by Salt
○ One ES cluster per service for full isolation
● Why not use the open source Flink connector?
○ Lyft uses AWS managed Elasticsearch clusters
○ The open source Flink connector is based on the ES transport client
○ The transport client is not allowed by AWS managed clusters
[Diagram: Kafka cluster (Kafka topic) -> Flink job: Flink Kafka consumer -> map (Kafka records -> Elasticsearch actions) -> Jest HTTP client -> bulk request -> Elasticsearch]
Bulk request: X documents or every Y seconds, whichever comes first
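The "X documents or every Y seconds, whichever comes first" rule can be sketched as a small buffer that flushes on either condition. This is a standalone illustration, not the Flink sink itself: the class name, the `flush_fn` callback, and the injectable clock are assumptions, and a real job would call `maybe_flush` from a periodic timer.

```python
import time


class BulkBuffer:
    """Collect documents and flush after max_docs items or max_seconds, whichever is first."""

    def __init__(self, flush_fn, max_docs, max_seconds, clock=time.monotonic):
        self._flush_fn = flush_fn
        self._max_docs = max_docs
        self._max_seconds = max_seconds
        self._clock = clock
        self._docs = []
        self._first_added = None  # arrival time of the oldest buffered document

    def add(self, doc):
        if not self._docs:
            self._first_added = self._clock()
        self._docs.append(doc)
        self.maybe_flush()

    def maybe_flush(self):
        if not self._docs:
            return
        too_many = len(self._docs) >= self._max_docs
        too_old = self._clock() - self._first_added >= self._max_seconds
        if too_many or too_old:
            self._flush_fn(list(self._docs))  # hand over one bulk request
            self._docs.clear()


# Demo with a fake clock so the time-based flush is deterministic.
now = [0.0]
flushed = []
buffer = BulkBuffer(flushed.append, max_docs=3, max_seconds=5.0, clock=lambda: now[0])

for i in range(3):
    buffer.add({"id": i})   # third add triggers the size-based flush
buffer.add({"id": 3})       # stays buffered
now[0] = 6.0
buffer.maybe_flush()        # time-based flush
print([len(batch) for batch in flushed])
```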
Things may not always go right ...
Pic © Copyright 2018. From @mikeleeorg.
Recall the requirements ...
● Zero data loss
● Strong ordering
● Duplication is fine
Delay is bad, but still better than being wrong
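One way to see why duplication is tolerable: if each change overwrites the document stored under its ID and order is preserved, replaying a prefix of the log a second time leaves the sink in the same final state. The sketch below assumes such overwrite-by-ID writes; the slides do not state which Elasticsearch action is used, and the document IDs and fields are illustrative.

```python
def sync(events, sink):
    """Write each event as a full overwrite of the document stored under its ID."""
    for doc_id, fields in events:
        sink[doc_id] = fields


log = [("x", {"value": 5}), ("x", {"value": 6}), ("y", {"value": 1})]

# Clean run: every event applied exactly once.
once = {}
sync(log, once)

# Failure run: two events are written, the job fails before checkpointing,
# then the whole log is replayed from the last checkpoint (offset 0 here).
replayed = {}
sync(log[:2], replayed)
sync(log, replayed)

print(once == replayed)
```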
How to handle 429 Too Many Requests?
● Retry with exponential backoff
● No checkpointing until success
● Replay from the last checkpoint when an exception is thrown
[Diagram: updates "x = 5" and "x = 6" are loaded from the Kafka topic by the Flink job and synced to Elasticsearch. On a 200 response the job checkpoints; on a 429 Too Many Requests response it does not checkpoint.]
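The retry policy above can be sketched as follows. This is a simplified stand-in rather than the actual sink code: `send_bulk` is a hypothetical callable returning an HTTP status, the delays and retry count are arbitrary, and the "checkpoint" is just an offset that advances only after a 200.

```python
import time


def send_with_backoff(send_bulk, batch, max_retries=5, base_delay=0.1, sleep=time.sleep):
    """Retry on 429 with exponentially growing delays; raise once retries are exhausted."""
    for attempt in range(max_retries + 1):
        status = send_bulk(batch)
        if status == 200:
            return attempt  # number of retries that were needed
        if status != 429 or attempt == max_retries:
            # Raising makes the job fail and later replay from the last checkpoint.
            raise RuntimeError(f"bulk request failed with status {status}")
        sleep(base_delay * (2 ** attempt))


# Demo: the fake cluster answers 429 twice, then 200.
responses = iter([429, 429, 200])
delays = []
batch = ["update x = 5", "update x = 6"]
checkpointed_offset = 0

retries = send_with_backoff(lambda b: next(responses), batch, sleep=delays.append)
checkpointed_offset += len(batch)  # only reached after a successful sync

print(retries, delays, checkpointed_offset)
```

If the retries are exhausted, the exception propagates before the offset is advanced, so the same batch is read again after restart.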
VPC & security group for access control
[Diagram columns: VPC, Security Group, Elasticsearch Config]
● Virtual private cloud
● Fully configurable
● Define inbound and outbound policies
● Under Lyft VPC
● Dedicated security group
Access Control with AWS Managed ES Cluster

Create a security group:

    Ensure AWS sg exists:
      ….
      ….
      - rules:
        - ip_protocol: tcp
          from_port: 443
          to_port: 443
          source_group_name:
            - coupon-service-iad
            - {{other services}}
      - vpc_id:

Add the security group for ES (referenced by its security group ID):

    Elasticsearch Cluster Config
      …
      …
      - VPCOptions:
        - SubnetIds:
        - SecurityGroupIds:
          - security_group_id
Access control - VPC & security group
● Benefits
○ More secure
○ Faster development and debugging
○ Centralize the access permissions in one place
● Further restrictions via IAM policy
○ For example: read-only access
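As an illustration of the read-only example, the sketch below builds an access policy document that allows only HTTP GET and HEAD against a domain. The slides do not show Lyft's actual policy; the account ID, role name, and domain name are placeholders.

```python
import json


def read_only_es_policy(domain_arn: str, principal_arn: str) -> dict:
    """Build a policy document allowing only GET/HEAD requests on one ES domain."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["es:ESHttpGet", "es:ESHttpHead"],
                "Resource": f"{domain_arn}/*",
            }
        ],
    }


# Placeholder ARNs, for illustration only.
policy = read_only_es_policy(
    "arn:aws:es:us-east-1:123456789012:domain/example-domain",
    "arn:aws:iam::123456789012:role/example-reader",
)
print(json.dumps(policy, indent=2))
```

Clients that send search bodies via POST would need a different set of actions, so treat this as a starting point rather than a complete policy.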
Elasticsearch upgrade/migration
● Why not upgrade in place?
○ Backward incompatibility
○ Encryption at rest
● Challenges:
○ No service downtime
○ Backfilling ES could be time consuming
[Diagram: web service, migration service, Dynamo, Dynamo changelog, Kafka cluster (buffers the changelog), two Flink jobs, two Elasticsearch clusters; one path is labeled "Old pipeline"]
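The slides do not spell out the migration procedure. One plausible reading of the diagram is that the buffered changelog lets a second Flink job, with its own consumer offsets, fill a new cluster while the old pipeline keeps serving traffic. The toy model below illustrates only that idea; the `ChangeLog` class and group names are hypothetical, and a real replay is bounded by the topic's retention (4 days per the earlier slide), so older data would need a separate backfill.

```python
class ChangeLog:
    """Toy append-only log with independent read offsets per consumer group."""

    def __init__(self):
        self._events = []
        self._offsets = {}

    def append(self, event):
        self._events.append(event)

    def poll(self, group, max_events=100):
        start = self._offsets.get(group, 0)
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch


def drain(log, group, cluster):
    """Apply every event the group has not yet seen to its target cluster."""
    while True:
        batch = log.poll(group)
        if not batch:
            return
        for doc_id, fields in batch:
            cluster[doc_id] = fields


log = ChangeLog()
old_cluster, new_cluster = {}, {}

log.append(("pass-1", {"exp": "09/18"}))
log.append(("pass-1", {"exp": "10/18"}))
drain(log, "old-pipeline", old_cluster)   # old pipeline is already live

log.append(("pass-1", {"exp": "11/18"}))  # writes continue during migration
drain(log, "new-pipeline", new_cluster)   # new job replays from the start
drain(log, "old-pipeline", old_cluster)

print(old_cluster == new_cluster)         # clusters agree, so traffic can be switched
```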
Conclusions and lessons learned
• Kafka as event storage is essential for
‒ Data ingestion with high durability, low latency & strong ordering guarantee
• Flink connector is essential for
‒ Zero data loss
‒ Easy recovery from failure and ES degradations
• AWS managed Elasticsearch Cluster
‒ Trade a little flexibility for simplicity, scalability & better security
‒ Backward incompatibility is a pain
‒ Seamless migration is an important factor to consider for pipeline design
Thank You!
Q&A
We are hiring!
lyft.com/careers

Kafka elastic search meetup 09242018
