Applying ML on your Data in
Motion with AWS and Confluent
Joseph Morais
AWS Evangelist/Cloud Partner Solutions Architect, Confluent
theejosephmorais
Kanchan Waikar
Senior Partner Solutions Architect at AWS
kanchanwaikar
Joseph Morais
AWS Evangelist/Cloud Partner Solutions Architect, Confluent
theejosephmorais
Data in Motion
And why you might care
Event Streaming is the
Central Nervous System
for today’s enterprises.
Apache Kafka®
is the technology.
‘Event’ is what happens in your business
Transportation
TPMS sensor in Carol’s car detected low tire-pressure at 5:11am.
Kafka
Banking
Alice sent $250 to Bob on Friday at 7:34pm.
Kafka
Retail
Sabine’s order of a Fujifilm camera was shipped at 9:10am.
Kafka
The Rise of Event Streaming
80%
Fortune 100 Companies
Using Apache Kafka
(majority are Confluent customers)
Event Streaming allows us to set Data in Motion:
Continuously processing evolving streams of data in real-time
Rich front-end
customer
experiences
Real-time
Events
Real-time
Event Streams and Analysis
A Sale A shipment
A Trade
A Customer
Experience
Real-time
backend
operations
At Confluent,
streaming is in our DNA.
We help the world’s
largest organizations
make it part of theirs.
Modernize
your apps
DATA INTEGRATION
Connected car
Fraud detection
Customer 360
Personalized
promotions
Apps driven by
real-time data
Quality
assurance
SIEM/SOC
Inventory
management
Proactive patient
care
Sentiment
analysis
Capital
management
Amazon
Kinesis
Amazon
S3
Set Your Company’s
Data in Motion
Make your apps & services
more valuable with
real-time insights from your
entire business, enabled by
event streaming & analytics
Database changes
Orders
IoT events
Payments
...
...
Amazon
Redshift
AWS
Lambda
Lock-in avoidance
and community
development
Enterprises want to avoid
the high costs of
yesteryear’s lock-in as
they modernize their data
architectures.
Industry Trends & How We’re Empowering Customers
Real-time events
and analysis live
everywhere
Customers live in an
ever-expanding/changing
hybrid and multi-cloud
world requiring
deployment freedom.
Digital decisioning
and microservices
converge
Operationalizing decision
making requires real-time
automation and rapid
evolution of business
logic.
Elastic & automatic
scaling of resources
Predicting the storage,
networking, and compute
resources needed for
streaming is difficult as
data volumes fluctuate.
What or who is Kafka?
The append-only log
Both Kafkas like to write
Time
Writers
Kafka
cluster
Readers
Guarantees of a Database
• Strict ordering
• Persistence
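The two guarantees above follow from the append-only log itself. As a minimal sketch (a toy in-memory model, not Kafka's storage engine; the class name `AppendOnlyLog` and its methods are hypothetical), every write lands at the end of the log and receives the next offset, so readers always see records in write order:

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model of an append-only log; an illustration only, not Kafka's implementation.
class AppendOnlyLog {
    private final List<String> records = new ArrayList<>();

    // Appends a record at the end of the log and returns the offset it was assigned.
    long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    // Returns a copy of every record from the given offset to the end, in write order.
    List<String> readFrom(long offset) {
        if (offset < 0 || offset > records.size()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        return new ArrayList<>(records.subList((int) offset, records.size()));
    }

    // The offset the next appended record will receive.
    long endOffset() {
        return records.size();
    }
}
```

There is deliberately no update or delete method: existing records are never rewritten, which is what makes the ordering strict.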
Rewind & Replay
Reset to any point in the shared
narrative
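Rewind and replay works because the read position belongs to the reader, not to the log. The sketch below is a toy model under that assumption (the class `ReplayingReader` and its `poll`/`seek` methods are hypothetical names chosen for illustration, loosely echoing consumer terminology):

```java
import java.util.ArrayList;
import java.util.List;

// Toy reader over an immutable list of events; the position is reader state, not log state.
class ReplayingReader {
    private final List<String> log;
    private int position = 0;

    ReplayingReader(List<String> log) {
        this.log = List.copyOf(log);
    }

    // Returns all events not yet read and advances the position to the end of the log.
    List<String> poll() {
        List<String> batch = new ArrayList<>(log.subList(position, log.size()));
        position = log.size();
        return batch;
    }

    // Moves the read position to any offset in the log without modifying the log.
    void seek(int offset) {
        if (offset < 0 || offset > log.size()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        position = offset;
    }
}
```

Calling `seek(0)` and then `poll()` replays the whole history; two readers over the same log can sit at different positions without affecting each other.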
Distributed by design
• Replication
• Fault Tolerance
• Partitioning
• Elastic Scaling
Producing to Kafka
Time
Kafka Topics
my-topic
my-topic-partition-0
my-topic-partition-1
my-topic-partition-2
broker-1
broker-2
broker-3
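A topic is split into partitions, and a keyed record is routed to one of them by hashing its key. The following is a simplified sketch: Kafka's default Java partitioner hashes the serialized key bytes with murmur2, whereas this stand-in uses `String.hashCode()`; the class and method names are hypothetical.

```java
// Simplified key-to-partition mapping; String.hashCode() stands in for Kafka's murmur2 hash.
class KeyPartitioner {
    // Maps a key to a partition index in the range [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

Because the mapping is deterministic, all records with the same key land in the same partition, which is why ordering is preserved per key rather than across the whole topic.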
Partition Leadership and Replication
Broker 1
Topic1
partition1
Broker 2 Broker 3 Broker 4
Topic1
partition1
Topic1
partition1
Leader Follower
Topic1
partition2
Topic1
partition2
Topic1
partition2
Topic1
partition3
Topic1
partition4
Topic1
partition3
Topic1
partition3
Topic1
partition4
Topic1
partition4
Partition Leadership and Replication - node failure
Broker 1
Topic1
partition1
Broker 2 Broker 3 Broker 4
Topic1
partition1
Topic1
partition1
Leader Follower
Topic1
partition2
Topic1
partition2
Topic1
partition2
Topic1
partition3
Topic1
partition4
Topic1
partition3
Topic1
partition3
Topic1
partition4
Topic1
partition4
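The two slides above show each partition replicated across brokers, with one replica acting as leader and leadership moving when a broker fails. A heavily simplified sketch of that failover follows. It assumes leadership goes to the first surviving replica in assignment order; real Kafka elects leaders through its controller from the in-sync replica set, so treat the class and method names here as hypothetical.

```java
import java.util.List;
import java.util.Set;

// Simplified leader selection: the first replica that is still alive becomes leader.
class LeaderElectionSketch {
    // Returns the broker id of the leader, or -1 if no replica of the partition is alive.
    static int leaderFor(List<Integer> replicas, Set<Integer> aliveBrokers) {
        for (int broker : replicas) {
            if (aliveBrokers.contains(broker)) {
                return broker;
            }
        }
        return -1;
    }
}
```

For a partition with replicas on brokers 1, 2 and 3, `leaderFor(List.of(1, 2, 3), Set.of(1, 2, 3, 4))` returns 1; once broker 1 is removed from the alive set, the same call returns 2, so clients keep reading and writing.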
Kafka Connect and Kafka Streams
Sink
Source
KAFKA
STREAMS
KAFKA
CONNECT
KAFKA
CONNECT
Your App
Kafka Connect: Reliable and scalable integration
of Kafka with other systems
• Centralized management and configuration
• Support for hundreds of technologies including
RDBMS, Elasticsearch, HDFS, S3
• Supports CDC ingest of events from RDBMS
• Preserves data schema
• Fault tolerant and automatically load balanced
• Extensible API
• Single Message Transforms
• Part of Apache Kafka
{
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo",
"table.whitelist": "sales,orders,customers"
}
https://docs.confluent.io/current/connect/
Kafka Streams: Write standard Java apps and
microservices to process your data in real-time
• No separate processing cluster required
• Develop on Mac, Linux, Windows
• Deploy to containers, VMs, bare metal, cloud
• Powered by Kafka: elastic, scalable, distributed,
battle-tested
• Perfect for small, medium, large use cases
• Fully integrated with Kafka security
• Exactly-once processing semantics
• Part of Apache Kafka
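KStream<User, PageViewEvent> pageViews = builder.stream("pageviews-topic");
KTable<Windowed<User>, Long> viewsPerUserSession = pageViews
    .groupByKey()
    // A session closes after 5 minutes of inactivity (Kafka Streams 3.0+ API; needs java.time.Duration)
    .windowedBy(SessionWindows.ofInactivityGapWithNoGrace(Duration.ofMinutes(5)))
    // Materialize the per-session counts in a state store named "session-views"
    .count(Materialized.as("session-views"));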
https://docs.confluent.io/current/streams/
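To make the session-window count in the snippet above concrete, here is a plain-JDK sketch of the underlying idea for a single user. It is not the Kafka Streams implementation: it assumes timestamps arrive in ascending order (Kafka Streams also merges sessions for out-of-order records), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Counts events per session for one key; a new session starts when the idle gap is exceeded.
class SessionCounter {
    // timestampsMs must be sorted ascending; gapMs is the inactivity gap in milliseconds.
    static List<Integer> countPerSession(long[] timestampsMs, long gapMs) {
        List<Integer> counts = new ArrayList<>();
        int current = 0;
        for (int i = 0; i < timestampsMs.length; i++) {
            // Close the running session when the pause since the previous event exceeds the gap.
            if (i > 0 && timestampsMs[i] - timestampsMs[i - 1] > gapMs) {
                counts.add(current);
                current = 0;
            }
            current++;
        }
        if (current > 0) {
            counts.add(current);
        }
        return counts;
    }
}
```

With page views at minutes 0, 1 and 2, then again at minutes 10 and 10.5, and a 5-minute gap, the user has two sessions containing three and two views.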
Stream processing with Kafka
Example: Using Kafka’s Streams API for writing
elastic, scalable, fault-tolerant Java and Scala
applications
Main Logic
ksqlDB to the rescue
Confluent makes Kafka Easier
But how though?
https://www.confluent.io/hub/
Large Ecosystem for Event Streaming
Easily connect to 130+ data systems
Data Diode
Amazon Redshift
AWS Lambda
Amazon S3
Amazon Kinesis
Amazon RDS Amazon ElastiCache
Confluent: Everywhere
Confluent Platform
The Enterprise Distribution of
Apache Kafka
Confluent Cloud
Apache Kafka Re-engineered
for the Cloud
Self-Managed Software
Fully-Managed Service
VM
Deploy on any platform, on-prem or cloud
Available on the leading public clouds
ksqlDB at a Glance
What is it?
ksqlDB is an event streaming
database for working with
streams and tables of data.
All the key features of a
modern streaming solution.
Aggregations Joins
Windowing
Event-Time
Dual Query
Support
Exactly-Once
Semantics
Out-of-Order
Handling
User-Defined
Functions
Compute Storage
CREATE TABLE activePromotions AS
  SELECT rideId,
         qualifyPromotion(distanceToDst) AS promotion
  FROM locations
  GROUP BY rideId
  EMIT CHANGES;
How does it work?
It separates compute from storage, and scales
elastically in a fault-tolerant manner.
It remains highly available during disruption,
even when a quorum of its servers
fails.
ksqlDB Kafka
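One way to read the `activePromotions` query on this slide is as a table that is continuously updated from a stream: each location event overwrites the row for its `rideId`. The sketch below models only that idea in plain Java. It is not how ksqlDB executes queries, and the body of `qualifyPromotion` (including the 1.0 threshold and the returned labels) is invented for illustration, since the slide does not define the function.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a table derived from a stream: the latest promotion per rideId.
class ActivePromotionsSketch {
    private final Map<String, String> table = new LinkedHashMap<>();

    // Hypothetical stand-in for the slide's qualifyPromotion function; the rule is made up.
    static String qualifyPromotion(double distanceToDst) {
        return distanceToDst < 1.0 ? "NEARLY_THERE" : "NONE";
    }

    // Applies one event from the locations stream, overwriting the row for this ride.
    void onLocation(String rideId, double distanceToDst) {
        table.put(rideId, qualifyPromotion(distanceToDst));
    }

    // Returns an immutable copy of the current table contents.
    Map<String, String> snapshot() {
        return Map.copyOf(table);
    }
}
```

Reading the snapshot corresponds roughly to asking for the current state of the table, while observing each `onLocation` call corresponds to following the stream of changes.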
Built on the Best Technology,
Available as a Fully-Managed Service
Kafka is the backbone of ksqlDB
ksqlDB is built on top of Kafka’s
battle-tested streaming foundation. Its
design re-uses Kafka to achieve elasticity,
fault-tolerance, and scalability for stream
processing & analytics.
Use a fully-managed service
With Confluent Cloud ksqlDB, you
need not worry about any of the
details of running it. You can forget
about:
• Clusters
• Brokers
• Scaling
• Upgrading
• Monitoring
Pay only for what you use.
ksqlDB server Kafka
topic
topic
changelog topic
Push & Pull
Queries
Kafka Streams
Engine
Local State
(transient)
topic
Compute Storage
Federated streaming, hybrid
and multi-cloud.
Data syndication and replication
across and between clouds and
on-premises, with self-service APIs,
data governance, and visual tooling.
Reliable & real-time data streams
between all customer sites, so you
can run always-on streaming
analytics on the data of the entire
enterprise, despite regional or cloud
provider outages.
Everywhere:
Cluster Linking Global Central Nervous System
Confluent: Complete
Dynamic Performance & Elasticity
Self-Balancing Clusters | Tiered Storage
Flexible DevOps Automation
Operator | Ansible
GUI-driven Mgmt & Monitoring
Control Center | Proactive Support
Efficient Operations
at Scale
Freedom of Choice
Committer-driven Expertise
Event Streaming Database
ksqlDB
Rich Pre-built Ecosystem
Connectors | Hub | Schema Registry
Multi-language Development
Non-Java Clients | REST Proxy
Admin REST APIs
Global Resilience
Multi-Region Clusters | Replicator
Cluster Linking
Data Compatibility
Schema Registry | Schema Validation
Enterprise-grade Security
RBAC | Secrets | Audit Logs
ARCHITECT
OPERATOR
DEVELOPER
Open Source | Community licensed
Unrestricted
Developer Productivity
Production-stage
Prerequisites
Fully Managed Cloud Service
Self-managed Software
Training Partners
Enterprise
Support
Professional
Services
Apache Kafka
Architecture Example
Cloud-native
Services
...
Customer services & apps
on-premises (EU)
Customer services & apps
on-premises (US)
Customer services & apps
self-managed in AWS
Connectors
Cluster
Linking
Cloud-native
Services
...
Cloud
Connectors
Real-time analytics
triggered a push alert
“Fraudulent transaction!”
to the B2C customer
Atlas
Connectors
Schema
Registry
Infinite
Storage
Streaming Analytics &
Processing (ksqlDB)
Connectors
Confluent Cloud
Large Partner Network
Consulting Partners, Cloud Partners, OEM Partners, Tech Partners
and more
https://www.confluent.io/partners/
What makes Confluent unique?
Everywhere
Global availability on AWS,
or on-prem
Bridge on-prem to cloud
with cluster linking
Extend streaming apps
across clouds
Cloud-Native
Available as a fully managed
service that is serverless,
infinitely scalable, elastic, secure,
and globally interconnected. Our
self-managed software inherits all
the work born in the cloud.
Complete
ksqlDB, Connect, Schema
Registry, and more
Capable of end-to-end
applications
Kafka from the people
who made it
Ingest & Process
Capture event streams with a consistent data structure using
Schema Registry, develop real-time ETL pipelines with a lightweight
SQL syntax using ksqlDB & unify real-time streams with batch
processing using 100+ Confluent Connectors
Derive insights from data in real-time
Mobile
Web
IoT
Data store
AWS & On-prem
Amazon
S3
S3 Sink
ANALYZE
Amazon
Redshift
AWS Lake
Formation
Amazon
Athena
Redshift Sink
TRANSFORM
Amazon
EMR
AWS Data
Pipeline
AWS
Glue
Source
connectors
Store & Analyze
Stream data with Confluent pre-built Connectors into your
AWS data lake or data warehouse to execute queries on vast
amounts of streaming data for real-time and batch analytics
VISUALIZE
Amazon
Elasticsearch
Schema
Registry
ksqlDB
Events
Real-time analytics
Serverless integration
Connect existing apps & data stores in a repeatable way without
having to manage Apache Kafka: Schema Registry to maintain
app compatibility, ksqlDB to develop real-time apps with SQL syntax,
and Connect for effortless integrations with Lambda & data stores
AWS serverless platform
Stop provisioning, maintaining or administering servers for
backend components such as compute, databases and
storage so that you can focus on increasing agility and
innovation for your developer teams
Increase developer agility & speed of innovation
Apps
Microservices
ksqlDB
Schema
Registry
COMPUTE
AWS
Lambda
Data stores
REST Proxy
& Clients
Source
Connectors
Lambda
Sink
DATA STORES
Amazon
DynamoDB
Amazon
Aurora
STORAGE
Amazon
S3
S3 Sink
ANALYTICS
Amazon
Athena
Amazon
Redshift
Serverless app integration
Accelerate modernization from on-prem to AWS
Redshift Sink
Lambda Sink
AWS Direct
Connect
Replicator
LEGACY EDW
MAINFRAME
LEGACY DB
JDBC / CDC
connectors
Connect
Leverage 100+ Confluent pre-built connectors to
continuously bring valuable data from existing
services on-prem including enterprise data
warehouse, databases and mainframes
Modernize
Increase agility in getting applications to market
and reduce TCO by freeing up resources to
focus on value-generating activities rather than
managing servers
On-prem AWS Cloud
Bridge
Hybrid cloud streaming
with consistent,
event-driven architecture
for modern apps
On-prem to AWS modernization
Amazon Athena
AWS Glue
SageMaker
Lake Formation
Amazon
DynamoDB
Amazon
Aurora
S3 Sink
Data Streams
Apps
ksqlDB
Challenge
Maximize customer satisfaction and revenue growth by creating a
hyper-personalized online retail experience, turning each customer
visit into a one-on-one marketing opportunity.
Solution
Use Confluent to combine historical customer data with real-time
digital signals from customers, generating hyper-personalized
content – for example, targeted special offers – which is inserted in
real time back into the customer’s session.
● Schema Registry, KSQL, Lambda Connector
Results
● Real-time hyper-personalization of the customer experience
● Increased customer conversions
● Accelerated innovation
● Confluent Cloud frees up developers’ time
“Our hyper-personalized approach is delivering measurable results.
In our A/B testing, we’ve seen a significant increase in customer
conversion rates. That’s proof that our decision to adopt a real-time
event streaming approach was the right one. I expect even bigger
benefits as we continue to grow our capabilities.” — Jon Vines,
Software Development Team Lead at AO.com
Kanchan Waikar
Senior Partner Solutions Architect at AWS
kanchanwaikar

Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Confluent and Kanchan Waikar, AWS

  • 1. Applying ML on your Data in Motion with AWS and Confluent
  • 2. Joseph Morais AWS Evangelist/Cloud Partner Solutions Architect, Confluent theejosephmorais Kanchan Waikar Senior Partner Solutions Architect at AWS kanchanwaikar
  • 3. Joseph Morais AWS Evangelist/Cloud Partner Solutions Architect, Confluent theejosephmorais
  • 4. Data in Motion And why you might care
  • 5. Event Streaming is the Central Nervous System for today’s enterprises. Apache Kafka® is the technology.
  • 6. ‘Event’ is what happens in your business Transportation TPMS sensor in Carol’s car detected low tire-pressure at 5:11am. Kafka Banking Alice sent $250 to Bob on Friday at 7:34pm. Kafka Retail Sabine’s order of a Fujifilm camera was shipped at 9:10am. Kafka
  • 7. The Rise of Event Streaming 80% Fortune 100 Companies Using Apache Kafka (majority are Confluent customers)
  • 8. Event Streaming allow us to set Data in Motion: Continuously processing evolving streams of data in real-time Rich front-end customer experiences Real-time Events Real-time Event Streams and Analysis A Sale A shipment A Trade A Customer Experience Real-time backend operations
  • 9. At Confluent, streaming is in our DNA. We help the world’s largest organizations make it part of theirs.
  • 10. Modernize your apps DATA INTEGRATION Connected car Fraud detection Customer 360 Personalized promotions Apps driven by real-time data Quality assurance SIEM/SOC Inventory management Proactive patient care Sentiment analysis Capital management Amazon Kinesis Amazon S3 Set Your Company’s Data in Motion Make your apps & services more valuable with real-time insights from your entire business, enabled by event streaming & analytics Database changes Orders IoT events Payments ... ... Amazon Redshift AWS Lambda
  • 11. Lock-in avoidance and community development Enterprises want to avoid the high costs of yesteryear’s lock-in as they modernize their data architectures. Industry Trends & How We’re Empowering Customers Real-time events and analysis live everywhere Customers live in an ever-expanding/changing hybrid and multi-cloud world requiring deployment freedom. 11 Digital decisioning and microservices converge Operationalizing decision making requires real-time automation and rapid evolution of business logic. Elastic & automatic scaling of resources Predicting the storage, networking, and compute resources needed for streaming is difficult as data volumes fluctuate. 11
  • 12. What or who is Kafka? The append-only log
  • 15. Both Kafkas like to write Time
  • 17. Guarantees of a Database • Strict ordering • Persistence
  • 18. Rewind & Replay Rewind & Replay Reset to any point in the shared narrative
  • 19. Distributed by design • Replication • Fault Tolerance • Partitioning • Elastic Scaling
  • 22. Partition Leadership and Replication Broker 1 Topic1 partition1 Broker 2 Broker 3 Broker 4 Topic1 partition1 Topic1 partition1 Leader Follower Topic1 partition2 Topic1 partition2 Topic1 partition2 Topic1 partition3 Topic1 partition4 Topic1 partition3 Topic1 partition3 Topic1 partition4 Topic1 partition4
  • 23. Partition Leadership and Replication - node failure Broker 1 Topic1 partition1 Broker 2 Broker 3 Broker 4 Topic1 partition1 Topic1 partition1 Leader Follower Topic1 partition2 Topic1 partition2 Topic1 partition2 Topic1 partition3 Topic1 partition4 Topic1 partition3 Topic1 partition3 Topic1 partition4 Topic1 partition4
  • 24. Kafka Connect and Kafka Streams Sink Source KAFKA STREAMS KAFKA CONNECT KAFKA CONNECT Your App
  • 25. Kafka Connect: Reliable and scalable integration of Kafka with other systems • Centralized management and configuration • Support for hundreds of technologies including RDBMS, Elasticsearch, HDFS, S3 • Supports CDC ingest of events from RDBMS • Preserves data schema • Fault tolerant and automatically load balanced • Extensible API • Single Message Transforms • Part of Apache Kafka { "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector", "connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo", "table.whitelist": "sales,orders,customers" } https://docs.confluent.io/current/connect/
  • 26. Kafka Streams: Write standard Java apps and microservices to process your data in real-time • No separate processing cluster required • Develop on Mac, Linux, Windows • Deploy to containers, VMs, bare metal, cloud • Powered by Kafka: elastic, scalable, distributed, battle-tested • Perfect for small, medium, large use cases • Fully integrated with Kafka security • Exactly-once processing semantics • Part of Apache Kafka KStream<User, PageViewEvent> pageViews = builder.stream("pageviews-topic"); KTable<Windowed<User>, Long> viewsPerUserSession = pageViews .groupByKey() .count(SessionWindows.with(TimeUnit.MINUTES.toMillis(5)), "session-views"); https://docs.confluent.io/current/streams/
  • 27. Stream processing with Kafka Example: Using Kafka’s Streams API for writing elastic, scalable, fault-tolerant Java and Scala applications Main Logi c
  • 28. ksqlDB to the rescue
  • 29. Confluent makes Kafka Easier But how though?
  • 30. https://www.confluent.io/hub/ Large Ecosystem for Event Streaming Easily connect to 130+ data systems Data Diode Amazon Redshift AWS Lambda Amazon S3 Amazon Kinesis Amazon RDS Amazon ElastiCache
  • 31. Confluent: Everywhere Confluent Platform The Enterprise Distribution of Apache Kafka Confluent Cloud Apache Kafka Re-engineered for the Cloud Self-Managed Software Fully-Managed Service VM Deploy on any platform, on-prem or cloud Available on the leading public clouds
  • 32. ksqlDB at a Glance What is it? ksqlDB is an event streaming database for working with streams and tables of data. All the key features of a modern streaming solution. Aggregations Joins Windowing Event-Time Dual Query Support Exactly-Once Semantics Out-of-Order Handling User-Defined Functions Compute Storage CREATE TABLE activePromotions AS SELECT rideId, qualifyPromotion(distanceToDst) AS promotion FROM locations GROUP BY rideId EMIT CHANGES How does it work? It separates compute from storage, and scales elastically in a fault-tolerant manner. It remains highly available during disruption, even in the face of failure to a quorum of its servers. ksqlDB Kafka
  • 33. Built on the Best Technology, Available as a Fully-Managed Service Kafka is the backbone of ksqlDB ksqlDB is built on top of Kafka’s battle-tested streaming foundation. Its design re-uses Kafka to achieve elasticity, fault-tolerance, and scalability for stream processing & analytics.. Use a fully-managed service With Confluent Cloud ksqlDB, you need not worry about any of the details of running it. You can forget about: • Clusters • Brokers • Scaling • Upgrading • Monitoring Pay only for what you use. ksqlDB server Kafka topic topic changelog topic Push & Pull Queries Kafka Streams Engine Local State (transient) topic Compute Storage
  • 34. Federated streaming, hybrid and multi-cloud. Data syndication and replication across and between clouds and on-premises, with self-service APIs, data governance, and visual tooling. Reliable & real-time data streams between all customer sites, so you can run always-on streaming analytics on the data of the entire enterprise, despite regional or cloud provider outages. Everywhere: Cluster Linking Global Central Nervous System
  • 35. Confluent: Complete Dynamic Performance & Elasticity Self-Balancing Clusters | Tiered Storage Flexible DevOps Automation Operator | Ansible GUI-driven Mgmt & Monitoring Control Center | Proactive Support Efficient Operations at Scale Freedom of Choice Committer-driven Expertise Event Streaming Database ksqlDB Rich Pre-built Ecosystem Connectors | Hub | Schema Registry Multi-language Development Non-Java Clients | REST Proxy Admin REST APIs Global Resilience Multi-Region Clusters | Replicator Cluster Linking Data Compatibility Schema Registry | Schema Validation Enterprise-grade Security RBAC | Secrets | Audit Logs ARCHITECT OPERATOR DEVELOPER Open Source | Community licensed Unrestricted Developer Productivity Production-stage Prerequisites Fully Managed Cloud Service Self-managed Software Training Partners Enterprise Support Professional Services Apache Kafka
  • 36. Architecture Example Cloud-native Services ... Customer services & apps on-premises (EU) Customer services & apps on-premises (US) Customer services & apps self-managed in AWS Connectors Cluster Linking Cloud-native Services ... Cloud Connectors Real-time analytics triggered a push alert “Fraudulent transaction!” to the B2C customer Atlas Connectors Schema Registry Infinite Storage Streaming Analytics & Processing (ksqlDB) Connectors Confluent Cloud
  • 37. Large Partner Network Consulting Partners, Cloud Partners, OEM Partners, Tech Partners and more https://www.confluent.io/partners/
  • 38. What makes Confluent unique? Everywhere Global availability on AWS, or on-prem Bridge on-prem to cloud with cluster linking Extend streaming apps across clouds Cloud-Native Available as a fully managed service that is serverless, infinitely scalable, elastic, secure, and globally interconnected. Our self-managed service inherits all the work born in the cloud. Complete ksqlDB, Connect, Schema Registry, and more Capable of end-to-end applications Kafka from the people who made it
  • 39. Ingest & Process Capture event streams with a consistent data structure using Schema Registry, develop real-time ETL pipelines with a lightweight SQL syntax using ksqlDB & unify real-time streams with batch processing using 100+ Confluent Connectors Derive insights from data in real-time Mobile Web IoT Data store AWS & On-prem Amazon S3 S3 Sink ANALYZE Amazon Redshift AWS Lake Formation Amazon Athena Redshift Sink TRANSFORM Amazon EMR AWS Data Pipeline AWS Glue Source connectors Store & Analyze Stream data with Confluent pre-built Connectors into your AWS data lake or data warehouse to execute queries on vast amounts of streaming data for real-time and batch analytics VISUALIZE Amazon Elasticsearch Schema Registry ksqlDB Events Real-time analytics
  • 40. Serverless integration Connect existing apps & data stores in a repeatable way without having to manage Apache Kafka, Schema Registry to maintain app compatibility, ksqlDB to develop real-time apps with SQL syntax, and Connect for effortless integrations with Lambda & data stores AWS serverless platform Stop provisioning, maintaining or administering servers for backend components such as compute, databases and storage so that you can focus on increasing agility and innovation for your developer teams Increase developer agility & speed of innovation Apps Microservices ksqlDB Schema Registry COMPUTE AWS Lambda Data stores REST Proxy & Clients Source Connectors Lambda Sink DATA STORES Amazon DynamoDB Amazon Aurora STORAGE Amazon S3 S3 Sink ANALYTICS Amazon Athena Amazon Redshift Serverless app integration
  • 41. Accelerate modernization from on-prem to AWS Redshift Sink Lambda Sink AWS Direct Connect Replicator LEGACY EDW MAINFRAME LEGACY DB JDBC / CDC connectors Connect Leverage 100+ Confluent pre-built connectors to continuously bring valuable data from existing services on-prem including enterprise data warehouse, databases and mainframes Modernize Increase agility in getting applications to market and reduce TCO by freeing up resources to focus on value-generating activities and not on managing servers On-prem AWS Cloud Bridge Hybrid cloud streaming with consistent, event-driven architecture for modern apps On-prem to AWS modernization Amazon Athena AWS Glue SageMaker Lake Formation Amazon DynamoDB Amazon Aurora S3 Sink Data Streams Apps ksqlDB
  • 42. Challenge Maximize customer satisfaction and revenue growth by creating a hyper-personalized online retail experience, turning each customer visit into a one-on-one marketing opportunity. Solution Use Confluent to combine historical customer data with real-time digital signals from customers, generating hyper-personalized content – for example, targeted special offers – which is inserted in real time back into the customer’s session. ● Schema Registry, KSQL, Lambda Connector Results ● Real-time hyper-personalization of the customer experience ● Increased customer conversions ● Accelerated innovation ● Confluent Cloud frees up developers’ time “Our hyper-personalized approach is delivering measurable results. In our A/B testing, we’ve seen a significant increase in customer conversion rates. That’s proof that our decision to adopt a real-time event streaming approach was the right one. I expect even bigger benefits as we continue to grow our capabilities.” — Jon Vines, Software Development Team Lead at AO.com
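
The solution on this slide, combining historical customer data with real-time signals, follows the general shape of a stream-table join. The Python sketch below illustrates that pattern only. It is not AO.com's implementation: the field names, the sample profiles, and the offer rule are all assumptions invented for the example.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical historical customer data (the "table" side of the join).
PROFILES: Dict[str, dict] = {
    "c1": {"segment": "photography", "past_orders": 7},
    "c2": {"segment": "kitchen", "past_orders": 0},
}


def personalize(clicks: Iterable[dict], profiles: Dict[str, dict]) -> Iterator[dict]:
    """Join each click event with its customer profile and yield targeted offers."""
    for click in clicks:
        profile = profiles.get(click["customerId"])
        if profile is None:
            continue  # unknown customer: no history to personalize with
        # Assumed rule: returning customers browsing their usual category get an offer.
        if profile["segment"] == click["category"] and profile["past_orders"] > 0:
            yield {
                "customerId": click["customerId"],
                "offer": f"10% off {click['category']}",
            }


# Real-time digital signals (the "stream" side), made up for this sketch.
clicks = [
    {"customerId": "c1", "category": "photography"},
    {"customerId": "c2", "category": "kitchen"},
    {"customerId": "c3", "category": "audio"},
]
offers = list(personalize(clicks, PROFILES))
```

In a streaming deployment the profile lookup would be served from continuously updated state rather than a static dictionary, so offers reflect the latest known customer history.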
  • 43. Kanchan Waikar Senior Partner Solutions Architect at AWS kanchanwaikar