Enhancing AI-Driven User Engagement with Real-Time Data Streaming via Flink
Zbigniew Królikowski
Senior ML Engineer @VirtusLab
The opportunity
● Our client, a worldwide retailer, runs an e-commerce platform. Content displayed to customers is personalised based on their individual purchase history.
● The existing approach relied on batch data processing and batch model training. This introduced a significant delay between customer actions on the website and the tailored content recommendations, leading to missed sales opportunities.
● The client's platform already allows customer actions to be tracked and processed continuously with minimal latency, but until now this capability had not been leveraged.
● Our Machine Learning engineering team took on the challenge of building an online data processing and model training solution that improves customer engagement and thereby generates more sales opportunities for the client.
Business metrics we were interested in
● From a Machine Learning perspective, our main goal was to improve Click-Through Rate (CTR), a metric computed from how often individual users who see a particular piece of content go on to engage with it.
● Engagement with content leads to sales opportunities, impacting both Revenue per Customer (RPC) and Revenue per Basket (RPB).
● Although RPC and RPB are directly linked to profit and were actively monitored through analytics during A/B testing, CTR is more practical as a running metric because it can be calculated on the fly.
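To illustrate why CTR works as a running metric, the sketch below keeps impression and click counters per content item and lets CTR be read at any moment, with no batch job in between. It is plain Python rather than Flink code, and the event shape (`content_id`, `type` set to `"impression"` or `"click"`) and the name `RunningCtr` are hypothetical assumptions, not the client's schema.

```python
from collections import defaultdict


class RunningCtr:
    """Per-content counters, so CTR = clicks / impressions is always up to date."""

    def __init__(self) -> None:
        self.impressions: dict[str, int] = defaultdict(int)
        self.clicks: dict[str, int] = defaultdict(int)

    def update(self, event: dict) -> None:
        # Hypothetical event shape: {"content_id": str, "type": "impression" | "click"}
        if event["type"] == "impression":
            self.impressions[event["content_id"]] += 1
        elif event["type"] == "click":
            self.clicks[event["content_id"]] += 1

    def ctr(self, content_id: str) -> float:
        shown = self.impressions.get(content_id, 0)
        return self.clicks.get(content_id, 0) / shown if shown else 0.0


events = [
    {"content_id": "banner-1", "type": "impression"},
    {"content_id": "banner-1", "type": "impression"},
    {"content_id": "banner-1", "type": "click"},
    {"content_id": "banner-1", "type": "impression"},
    {"content_id": "banner-1", "type": "impression"},
]
tracker = RunningCtr()
for event in events:
    tracker.update(event)
print(tracker.ctr("banner-1"))  # 0.25
```

Because each event only increments a counter, the metric costs O(1) per event, which is what makes it practical to track continuously during an A/B test.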
The data
● Events describing user actions are fed through Kafka. Our solution is backwards-compatible across different event versions.
● Events contain information about user identity, displayed content, the user's basket and the user's interactions within the page. Several of them have to be taken into account together to form a complete customer story.
● All event types have to pass through specialised filtering logic and be correctly joined and aggregated to build the model training features.
● Events are broadcast from all locations in the mobile app and on the website, while selected subsets of this data have to be efficiently routed to dedicated machine learning models.
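A minimal sketch of the normalise, filter and route steps described above, assuming two hypothetical event versions (v1 with flat fields, v2 with a nested `context` block) and a hypothetical routing table keyed by platform and page location. The real schemas are not described in the deck. In a Flink job this kind of logic would typically sit in map/filter operators or a process function using side outputs; here it is plain Python.

```python
from typing import Optional

# Hypothetical routing table: (platform, page location) -> dedicated model.
ROUTES = {
    ("web", "homepage"): "web-homepage-model",
    ("app", "homepage"): "app-homepage-model",
}
RELEVANT_TYPES = {"impression", "click"}  # hypothetical filter criterion


def normalise(raw: dict) -> dict:
    """Map each supported (hypothetical) event version onto one internal shape."""
    version = raw.get("version", 1)
    if version == 1:
        # v1: flat fields, no location field (assumed to be the homepage).
        return {"user_id": raw["user"], "content_id": raw["content"],
                "platform": raw["platform"], "location": "homepage",
                "type": raw["type"]}
    if version == 2:
        # v2: platform and location moved into a nested "context" block.
        ctx = raw["context"]
        return {"user_id": raw["user_id"], "content_id": raw["content_id"],
                "platform": ctx["platform"], "location": ctx["location"],
                "type": raw["type"]}
    raise ValueError(f"unsupported event version: {version}")


def route(raw: dict) -> Optional[tuple[str, dict]]:
    """Normalise, filter and pick the target model; None means 'not routed'."""
    event = normalise(raw)
    if event["type"] not in RELEVANT_TYPES:
        return None
    model = ROUTES.get((event["platform"], event["location"]))
    return (model, event) if model is not None else None


old = {"user": "u1", "content": "c9", "platform": "web", "type": "click"}
new = {"version": 2, "user_id": "u2", "content_id": "c3", "type": "impression",
       "context": {"platform": "app", "location": "homepage"}}
print(route(old)[0])  # web-homepage-model
print(route(new)[0])  # app-homepage-model
```

Normalising first means the filtering and routing code only ever sees one shape, so adding a new event version touches a single function.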
The data: volume
● The client offers a mobile application as well as a web platform. Our solution supports both, as well as potential extensions to on-site use cases.
● Training a single model on one platform requires processing the events of every individual customer exposed to that piece of content. On average, this is close to 2,000 events per second.
● On each platform, there are multiple locations that could be enhanced by moving to the online solution enabled by Flink.
● To provide complete coverage, the solution needs to scale up to tens of thousands of events per second while keeping costs manageable during the development ramp-up.
Other requirements
● Uptime SLAs must be met, and the system needs to recover from transient failures automatically.
● Data is never lost and the whole process is always recoverable.
● User identity remains hidden at all times.
● No bias is introduced into the models through the data, ensuring fair treatment; training is based solely on relevant characteristics.
● Personalisation is delivered only to logged-in users who have expressly agreed to participate.
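The deck does not say how identity is hidden or how consent is enforced. One common approach, shown here purely as an assumption, is to drop events from users without recorded consent and replace the raw user id with a keyed hash (HMAC-SHA256). The same user then always maps to the same token, so joins and per-user state still work. The key, the `consented_users` set and the field names are hypothetical.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical key; in practice it would come from a secret store, never source code.
PSEUDONYM_KEY = b"replace-with-secret-from-vault"


def pseudonymise(user_id: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Stable keyed hash: same user -> same token, not reversible without the key."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare(event: dict, consented_users: set[str]) -> Optional[dict]:
    """Drop anonymous or non-consenting users, then swap the raw id for a token."""
    user_id = event.get("user_id")
    if user_id is None or user_id not in consented_users:
        return None  # not logged in, or no explicit opt-in
    cleaned = {k: v for k, v in event.items() if k != "user_id"}
    cleaned["user_token"] = pseudonymise(user_id)
    return cleaned
```

Filtering on consent before pseudonymising keeps non-participating users out of the pipeline entirely, rather than merely masking them.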
Apache Flink
Apache Flink is stream processing software designed from the ground up with these tenets in mind:
● Correctness guarantees - reproducible and consistent results
● Layered APIs - suitable for all team skill sets
● Operational focus - production-ready from the start
● Scalability - from minimal use cases to core business
● Performance - a no-compromise approach to low latency, high throughput and minimal operational costs
Flink offers excellent integration with a number of queue systems, including Kafka, as well as with SQL and NoSQL databases.
Apache Flink is open-source software available under the Apache License 2.0.
Stateful processing
● In contrast to stateless processing, stateful processing enables operations that span more than one event. This makes it possible to represent more sophisticated business processes in code.
● Apache Flink supports multiple state back-ends to fit different scalability, throughput and latency requirements: in-memory, or an embedded key-value store (RocksDB).
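To make "operations that span more than one event" concrete, here is a plain-Python sketch of keyed state. Each user key has its own small state record, and every emitted feature row depends on all earlier events for that user. In Flink this role is played by `keyBy` followed by a `KeyedProcessFunction` holding `ValueState`, with the state back-end deciding where the per-key state lives. The class name and event fields are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


class KeyedFeatureState:
    """Per-key state, loosely analogous to Flink keyed ValueState (plain Python)."""

    def __init__(self) -> None:
        # One state entry per user key; Flink would partition these across tasks.
        self._state: dict[str, dict[str, int]] = defaultdict(
            lambda: {"impressions": 0, "clicks": 0}
        )

    def process(self, event: dict) -> dict:
        state = self._state[event["user"]]  # state is scoped to this event's key
        if event["type"] == "impression":
            state["impressions"] += 1
        elif event["type"] == "click":
            state["clicks"] += 1
        # Emit a snapshot (copy) that reflects every earlier event of this user.
        return {"user": event["user"], **state}


def run(events: Iterable[dict]) -> Iterator[dict]:
    op = KeyedFeatureState()
    for event in events:
        yield op.process(event)


rows = list(run([
    {"user": "u1", "type": "impression"},
    {"user": "u2", "type": "impression"},
    {"user": "u1", "type": "click"},
]))
print(rows[-1])  # {'user': 'u1', 'impressions': 1, 'clicks': 1}
```

A stateless operator could only look at the click in isolation; the keyed state is what lets the third event "know" that u1 had already seen an impression.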
Batch processing and stream processing
● Apache Flink supports both batch and stream processing and guarantees full reproducibility of results across both approaches. Switching between them requires close to zero code changes and works with each of Flink's APIs. Both bounded and unbounded streams are supported.
● This makes Flink capable of covering the same use cases as batch processing tools like Apache Spark, which can reduce the number of different tools a project needs.
Image source: https://flink.apache.org/what-is-flink/flink-architecture/
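As a language-level analogy only (not Flink's mechanism, where the mode is selected through the `execution.runtime-mode` setting), the sketch below applies one piece of logic unchanged to a bounded input (a finite historical extract) and to an unbounded one (an endless generator standing in for a live topic). When both see the same events, the results match. The event fields are hypothetical.

```python
import itertools
from typing import Iterable, Iterator


def click_counts(events: Iterable[dict]) -> Iterator[tuple[str, int]]:
    """Same logic for bounded or unbounded input.

    Emits an updated (content_id, clicks_so_far) pair for every click event.
    """
    counts: dict[str, int] = {}
    for event in events:
        if event["type"] != "click":
            continue
        cid = event["content_id"]
        counts[cid] = counts.get(cid, 0) + 1
        yield cid, counts[cid]


# Bounded input: e.g. a historical extract, processed like a batch job.
history = [
    {"content_id": "c1", "type": "click"},
    {"content_id": "c1", "type": "impression"},
    {"content_id": "c2", "type": "click"},
    {"content_id": "c1", "type": "click"},
]
batch_result = list(click_counts(history))


# Unbounded input: an endless generator standing in for a live Kafka topic.
def live_events() -> Iterator[dict]:
    for i in itertools.count():
        yield {"content_id": f"c{i % 2 + 1}", "type": "click"}


stream_head = list(itertools.islice(click_counts(live_events()), 3))
print(stream_head == batch_result)  # True
```

Because the logic is written against "a sequence of events" rather than "a finished dataset", it never needs to know whether the input ends.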
Event-time processing
A big part of what makes Flink a good fit for the client is its very powerful event-time processing.
This feature mitigates the risks associated with a scalable, dynamic environment and gives us full reproducibility. Because logical time is decoupled from wall-clock time, the application's results stay consistent regardless of when or how quickly events are processed, so the focus can be directed towards keeping latency as low as possible.
The development process also benefits greatly, as the logic can be rolled back to any point in time, giving a replicable development environment.
The same applies to root-cause analysis: production issues can be resolved swiftly and reliably, reducing ongoing maintenance costs.
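The sketch below shows why decoupling logical time from wall-clock time gives reproducible results. Clicks are assigned to one-minute tumbling windows by their own timestamps, and a watermark that trails the largest timestamp seen by a fixed bound decides when a window is final. This is a plain-Python simplification in the spirit of Flink's `WatermarkStrategy.forBoundedOutOfOrderness` combined with tumbling event-time windows, not Flink's implementation. The window size and 5-second bound are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def tumbling_click_counts(
    click_timestamps_ms: Iterable[int],
    window_ms: int = 60_000,
    max_out_of_orderness_ms: int = 5_000,
) -> dict[int, int]:
    """Count clicks per event-time tumbling window, keyed by window start.

    Timestamps arrive in *arrival* order, which may differ from event order.
    The watermark trails the largest timestamp seen by `max_out_of_orderness_ms`;
    a window [start, start + window_ms) is final once the watermark reaches its end.
    """
    open_windows: dict[int, int] = defaultdict(int)
    results: dict[int, int] = {}
    watermark = float("-inf")
    for ts in click_timestamps_ms:
        start = ts - ts % window_ms  # assignment uses event time, not arrival time
        if start + window_ms <= watermark:
            continue  # later than the allowed bound: dropped in this sketch
        open_windows[start] += 1
        watermark = max(watermark, ts - max_out_of_orderness_ms)
        # Emit every window the watermark has passed (oldest first).
        for done in sorted(w for w in open_windows if w + window_ms <= watermark):
            results[done] = open_windows.pop(done)
    # End of a bounded input acts like a final watermark at +infinity.
    for done in sorted(open_windows):
        results[done] = open_windows.pop(done)
    return results


in_order = [1_000, 30_000, 59_000, 61_000, 62_000, 125_000]
shuffled = [1_000, 59_000, 30_000, 62_000, 61_000, 125_000]  # within the 5 s bound
print(tumbling_click_counts(in_order))  # {0: 3, 60000: 2, 120000: 1}
print(tumbling_click_counts(in_order) == tumbling_click_counts(shuffled))  # True
```

Replaying the same events, in any arrival order within the bound and at any speed, yields the same windows. That property is what makes rolling back and re-running for development or root-cause analysis reproducible.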
Available languages and APIs
Flink offers two application programming interfaces (APIs) with a large overlap in features:
● Table API (SQL) - suitable for BI specialists, analysts and data scientists.
● DataStream API (available for Python, Java and Scala) - suitable for data engineers and ML engineers.
This lets Flink fit seamlessly into the team's existing skills, without the time investment needed for re-skilling.
Image source: https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/concepts/overview/
System architecture
Deployment
Flink can easily be integrated with cloud platforms as well as with on-premises Kubernetes environments through the Flink Kubernetes Operator, which handles many aspects of the deployment.
Image source: https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/img/overview.svg
Resilience
● Flink keeps results consistent through checkpointing, which runs either on a short schedule (seconds to minutes) or continuously. If issues occur, the Flink application can reload from the latest checkpoint and seamlessly catch up with minimal disruption.
● When deployed on Kubernetes, Flink supports a high-availability (HA) deployment mode in which the application recovers from a failure of the JobManager, the component that supervises scheduling and resource management.
● Alternatively, industry-standard ZooKeeper can be used as the HA service, which makes HA possible on non-Kubernetes platforms.
Image source: https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/stateful-stream-processing/
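A single-operator simplification of the checkpointing idea. Every few records, the operator state (counts) and the source position (offset in a replayable log, standing in for a Kafka topic) are snapshotted together. After a simulated crash, a new job restores both from the last snapshot and re-reads the log from that offset. The result matches a failure-free run, with nothing lost and nothing double-counted. Real Flink checkpoints are distributed snapshots coordinated across many operators; `CountingJob` and the checkpoint interval here are hypothetical.

```python
from typing import Optional


class CountingJob:
    """Counts events per key over a replayable log, with periodic checkpoints."""

    def __init__(self, checkpoint: Optional[dict] = None) -> None:
        # Restore operator state and source position from the *same* snapshot.
        snap = checkpoint or {"offset": 0, "counts": {}}
        self.offset: int = snap["offset"]
        self.counts: dict[str, int] = dict(snap["counts"])
        self.last_checkpoint = {"offset": self.offset, "counts": dict(self.counts)}

    def run(self, log: list[str], checkpoint_every: int,
            fail_at: Optional[int] = None) -> dict[str, int]:
        while self.offset < len(log):
            if fail_at is not None and self.offset == fail_at:
                raise RuntimeError(f"simulated crash at offset {self.offset}")
            key = log[self.offset]
            self.counts[key] = self.counts.get(key, 0) + 1
            self.offset += 1
            if self.offset % checkpoint_every == 0:
                # State and offset are captured together: a consistent cut.
                self.last_checkpoint = {"offset": self.offset,
                                        "counts": dict(self.counts)}
        return self.counts


log = ["c1", "c2", "c1", "c3", "c1", "c2", "c2"]
expected = CountingJob().run(log, checkpoint_every=2)

crashed = CountingJob()
try:
    crashed.run(log, checkpoint_every=2, fail_at=5)
except RuntimeError:
    pass  # the update for offset 4 (after the last checkpoint) dies with the job
recovered = CountingJob(crashed.last_checkpoint).run(log, checkpoint_every=2)
print(recovered == expected)  # True
```

The key design point is that the offset is stored inside the checkpoint rather than committed separately. Otherwise a restore could skip events, or replay events whose effects were already counted.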
Auto-scaling
The deployment adapts to changing load patterns by increasing or reducing its resource allocation based on the load. This lowers operating costs while safeguarding against a deterioration of service levels at times of increased customer traffic.
Apache Flink, through the autoscaler of the Flink Kubernetes Operator, automatically re-adjusts the resources allocated to individual processing steps based on their observed metrics, such as processing rate and backlog. This approach reduces administrative and development costs while ensuring operational efficiency.
Image source: https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.7/docs/custom-resource/autoscaler
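The arithmetic behind per-step scaling can be sketched as follows. Given the rate a step must sustain and how many records one parallel subtask can process when fully busy, choose the smallest parallelism that keeps each subtask at roughly a target utilisation. This is a deliberately simplified formula in the spirit of the operator's autoscaler, not its exact algorithm. The 1,500 events/s per-subtask capacity, the 0.7 utilisation and the parallelism bounds are hypothetical; the 2,000 events/s load comes from the volume figures above.

```python
import math


def required_parallelism(
    target_rate: float,
    true_rate_per_subtask: float,
    target_utilisation: float = 0.7,
    min_parallelism: int = 1,
    max_parallelism: int = 64,
) -> int:
    """Smallest parallelism keeping each subtask near `target_utilisation`.

    true_rate_per_subtask: records/s one subtask handles when 100% busy
    (hypothetical input here; an autoscaler would measure it from metrics).
    """
    if true_rate_per_subtask <= 0 or not 0 < target_utilisation <= 1:
        raise ValueError("rate must be positive and utilisation in (0, 1]")
    needed = math.ceil(target_rate / (true_rate_per_subtask * target_utilisation))
    # Clamp to the configured bounds, so scale-down never goes to zero.
    return max(min_parallelism, min(max_parallelism, needed))


print(required_parallelism(2_000, 1_500))   # 2
print(required_parallelism(30_000, 1_500))  # 29
```

Keeping utilisation below 100% leaves headroom for traffic spikes and for catching up after a restart, which is the trade-off between cost and service level described above.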
Monitoring
Flink offers excellent built-in monitoring capabilities through its web UI:
● Provides real-time statistics on input and output data sizes and record counts for every processing step.
● Collects all application logs in a single location for seamless browsing and auditing.
● Displays checkpoint information, giving a clear understanding of the applications' stability.
● Offers the same capabilities during development, ensuring quality and clarity of operation from the early stages.
Additionally, we integrated the Flink instance with external cloud monitoring tools, giving us uniform monitoring across all cloud deployments.
System architecture - adjusted
The results
● Our solution delivered a 35% uplift in CTR over the duration of the A/B trial.
● Performance has been maintained over time, which indicates that the approach has prevented concept/model drift.
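The deck does not state whether the 35% uplift is relative or absolute. Assuming it is a relative lift, it is computed as below. The click and impression counts are purely hypothetical, chosen only to show the arithmetic; they are not the trial's data.

```python
def relative_lift(ctr_control: float, ctr_treatment: float) -> float:
    """Relative change of treatment CTR over control CTR (0.35 means +35%)."""
    if ctr_control <= 0:
        raise ValueError("control CTR must be positive")
    return (ctr_treatment - ctr_control) / ctr_control


# Hypothetical figures, NOT the trial's actual data:
control = 400 / 20_000    # 2.0% CTR in the control group
treatment = 540 / 20_000  # 2.7% CTR in the treatment group
print(f"{relative_lift(control, treatment):.0%}")  # 35%
```

In this example the same outcome is only a 0.7 percentage-point difference in absolute terms, which is why it matters which of the two is being reported.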
Thank you
