© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
SPONSORED BY SCYLLADB
Inside Tripadvisor’s real-time personalization with ScyllaDB and AWS
DAT204-S
Dean Poulin, Data Engineering Team Lead
Felipe Cardeneti Mendes, Technical Director
Felipe Cardeneti Mendes
Technical Director, ScyllaDB
Felipe Cardeneti Mendes is an IT specialist with years of experience in
distributed systems and open source technologies. He has co-authored
three Linux books and is a frequent speaker at public events and
conferences, where he promotes open source technologies. At ScyllaDB, he
works as a Technical Director.
Speaker info
What is important for data-intensive applications?
• High throughput
• Low latency
• Predictable cost
ScyllaDB: The database for game changers
• For data-intensive applications that require high throughput and predictable low latencies
• Close-to-the-metal design takes full advantage of modern infrastructure
▪ >5x higher throughput
▪ >20x lower latency
▪ >75% TCO savings
• Compatible with Apache Cassandra and Amazon DynamoDB
• DBaaS/cloud, enterprise, and open source solutions
“ScyllaDB stands apart...It’s the rare product that exceeds my expectations.”
– Martin Heller, InfoWorld Contributing Editor and Reviewer
“For 99.9% of applications, Scylla delivers all the power a customer will ever need, on workloads that other NoSQL databases can’t touch – and at a fraction of the cost of an in-memory solution.”
– Dor Laor, CEO of ScyllaDB, in article by Forbes Contributor Adrian Bridgwater
AWS relationship
• ScyllaDB is an AWS ISV Accelerate Partner
• ScyllaDB has two AWS Marketplace listings
▪ ScyllaDB Enterprise
▪ ScyllaDB Cloud (DBaaS)
• AWS Marketplace TCV grew over 200% in FY23; expected to grow 3x this year
• Foundation of the relationship is technical excellence on EC2 instances; led to ScyllaDB being mentioned in the I4i launch blog and demonstrating superior AWS Graviton performance
• ScyllaDB is an AWS Graviton Ready Partner
• ScyllaDB was a sponsor at AWS Summits in London, India, and Tel Aviv and is a sponsor at AWS re:Invent
• ScyllaDB sponsored the AWS Marketplace Conclave and participated in the AWS Startup Roadshow in India
Game changers leveraging ScyllaDB
• Powering India’s top social media platform
• Video recommendation management
• Real-time fraud detection
• Seamless experiences across content & devices
• Network security threat detection
• Power ~50M X1 DVRs with billions of reqs/day
• Precision healthcare via Edison AI
• Inventory hub for retail operations
• Property listings and updates
• Cryptocurrency exchange app
• Geography-based recommendations
• Predictable performance for on-sale surges
• Online gaming ad targeting
• Media streaming for 45M+ subscribers
• IT service management
• Real-time ML-driven recommendations
• Connecting people around the globe
• GPS-based exercise tracking
• Global operations – Avon, The Body Shop & more
• Real-time analytics
• Unified ML feature store across the business
• Distribution of game assets in Unreal Engine
• Uber scale, mission-critical chat & messaging app
• 2,000,000 SKU e-commerce management
Finding the perfect fit for high-throughput/low-latency workloads
ScyllaDB is an excellent choice for business-critical workloads that need high throughput and predictable, low latency.
To understand if it’s right for you, first consider where your workload falls with respect to high throughput and/or low latency.
Tripadvisor’s workload
Dean Poulin
Data Engineering Team Lead,
AI Services & Products, Tripadvisor
With over two decades of experience, Dean has led engineering efforts
across a variety of domains, from frontend and backend development to
technical management. Currently, as the Data Engineering Team Lead
at Tripadvisor, he guides a talented team in building and scaling
real-time personalization systems, leveraging microservices and
cutting-edge architectures.
Dean’s background includes key roles at startups like MDconnectME,
where he developed HIPAA-compliant platforms on AWS, and Orbius,
where he helped build the core technology. He’s passionate about
creating scalable solutions and building strong tech teams. Dean’s
extensive experience across startups, product management, and
scalable architectures uniquely positions him to solve complex technical
challenges. Dean holds a Bachelor of Science in Computer Science from
the University of Maine.
Speaker info
Tripadvisor at a glance
• Founded: 2000
• Stock ticker: TRIP
• Revenue: $1.8B
• Employees: 2,800+
• Monthly users: 400M+
• Reviews: 1B+
The challenge at a glance
Delivering real-time, hyper-relevant recommendations to users while navigating scale and latency constraints
• P99 latency: 1 ms
• Daily users: 50M+
• Monthly users: 400M
• Daily requests: 2B+
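To put the daily request figure in per-second terms, here is a small back-of-the-envelope sketch. It only converts the slide's "2B+ daily requests" lower bound into an average rate. Real traffic is not uniform, so peak load would be higher than this average, and the function name is illustrative rather than anything from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(requests_per_day: float) -> float:
    """Average requests per second, assuming traffic were spread evenly."""
    return requests_per_day / SECONDS_PER_DAY


# "2B+ daily requests" from the slide, treated as a lower bound.
daily_requests = 2_000_000_000
avg = average_rps(daily_requests)
print(f"average load: at least {avg:,.0f} requests/sec")
```

At this lower bound the average works out to roughly 23,000 requests per second, every one of which has to fit inside the latency budget above.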
Powered by ML personalization
Model serving architecture
• Runs in Amazon EKS for high scalability
• 100+ ML models serving live traffic
• Each model is independently scalable in Kubernetes
Our custom feature store
• Serves up to 5 million features/sec
• Static data is served directly from Redis
• 500,000 user features/sec retrieved from ScyllaDB
• User features require real-time user-based calculations
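The split described above, static features read as-is and user features calculated at request time, can be sketched roughly as follows. This is a minimal illustration, not Tripadvisor's implementation: the two dictionaries are in-memory stand-ins for Redis and ScyllaDB, and every key, name, and function here is hypothetical.

```python
from typing import Callable, Dict, List, Optional

# In-memory stand-ins (assumptions): a real service would call Redis and
# ScyllaDB clients here instead of reading from dictionaries.
STATIC_STORE: Dict[str, str] = {"restaurant:42:awards": "travelers_choice"}
USER_FACTS: Dict[str, List[int]] = {"user:7": [3, 5, 4]}  # e.g. recent ratings


def static_feature(key: str) -> Optional[str]:
    """Static features are precomputed, so a lookup returns them directly."""
    return STATIC_STORE.get(key)


def user_feature(user_id: str, compute: Callable[[List[int]], float]) -> float:
    """User features are derived from the user's stored facts per request."""
    facts = USER_FACTS.get(user_id, [])
    return compute(facts)


print(static_feature("restaurant:42:awards"))
print(user_feature("user:7", len))  # number of recent ratings for this user
```

The design point is that the static path is a plain key lookup, while the user path pays for a fetch plus a calculation on each request.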
What’s an ML feature?
• An ML feature is an individual measurable property or characteristic of the data being used to make predictions
• Features are the input variables or attributes that the ML model uses to learn patterns and relationships from the data in order to make predictions or classifications
Static features
• Restaurant awards
• Dining restrictions
User features
• Hotel bookings last year
• Reviews submitted last 30 days
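As a concrete illustration of a user feature, the sketch below computes "reviews submitted last 30 days" from a list of review timestamps. The function name, the sample dates, and the inclusive window boundaries are assumptions made for the example. The slides do not show how the feature is actually derived.

```python
from datetime import datetime, timedelta
from typing import Iterable


def reviews_last_30_days(review_times: Iterable[datetime], now: datetime) -> int:
    """Count reviews whose timestamp falls within the 30 days before `now`."""
    cutoff = now - timedelta(days=30)
    return sum(1 for t in review_times if cutoff <= t <= now)


# Hypothetical sample data for one user.
now = datetime(2024, 12, 1)
times = [datetime(2024, 11, 25), datetime(2024, 11, 2), datetime(2024, 9, 15)]
print(reviews_last_30_days(times, now))  # two reviews fall inside the window
```

Because the window moves with `now`, the value changes over time even when no new reviews arrive, which is one reason a feature like this has to be calculated per user at request time.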
The technologies powering Visitor Platform
• ScyllaDB for the online database
• Spring Boot microservices on Amazon ECS Fargate
• Spark for data retention
• Spark on Kubernetes for loading offline data into ScyllaDB
• Amazon Kinesis for streaming events
• Spark for point-in-time queries for training ML models
• Data warehouse for ETL processing
The Visitor Platform data flow
Visitor Platform by the numbers
>2.1B unique visitors
500+ user audiences
Up to 425K OPS on ScyllaDB
9 TB stored in ScyllaDB (online)
125 TB in data warehouse (offline)
Why two databases?
Online real-time data
• Focused on now/today
• Identity management lookup
• Short-term retention with TTL
• For rapid retrieval of specific data
• Focus on real-time live site requirements
Offline data warehouse
• Huge date range
• For ML model training
• Long-term storage
• Less latency-sensitive
• Multipurpose
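The "short-term retention with TTL" point on the online side can be illustrated with a tiny in-memory store. This is a sketch of TTL semantics only, not of ScyllaDB itself: in CQL, which ScyllaDB supports, a write can carry an expiry through `USING TTL <seconds>`, but the class below, its method names, and the explicit `now` argument are all hypothetical and chosen to keep the example deterministic.

```python
from typing import Any, Dict, Optional, Tuple


class TTLStore:
    """Toy key-value store where every entry expires after its TTL."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[Any, float]] = {}

    def put(self, key: str, value: Any, ttl_seconds: float, now: float) -> None:
        # Store the value together with its absolute expiry time.
        self._data[key] = (value, now + ttl_seconds)

    def get(self, key: str, now: float) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            # Expired entries are dropped lazily on read.
            del self._data[key]
            return None
        return value


store = TTLStore()
store.put("visitor:v-123:last_search", "paris hotels", ttl_seconds=60, now=0)
print(store.get("visitor:v-123:last_search", now=59))  # still present
print(store.get("visitor:v-123:last_search", now=60))  # expired
```

The offline warehouse has no equivalent expiry in this picture, which matches its role as long-term storage for model training.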
Visitor Platform microservices
Visitor Core
• Performs identity management
• Maps users across devices
• Records data online and offline
• Drives data retention
Visitor Metric
• Enables robust querying of facts and metrics for a given visitor
• Uses custom domain-specific language known as Visitor Query Language (VQL)
• Example VQL: [example not captured in this extraction]
Visitor Publisher
• Primary entry point to save visitor facts and metrics into Visitor Platform
• Validates input data with custom validation rules
• Asynchronously calls Visitor Saver and Audience Manager
Visitor Saver
• Saves visitor facts and metrics online and offline
• Sends facts and metrics to the offline data warehouse through an Amazon Kinesis stream
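The Visitor Publisher flow described above (validate, then call Visitor Saver and Audience Manager asynchronously) could look roughly like this. It is a sketch under stated assumptions: the real services are Spring Boot microservices, so Python's `asyncio` is used here only to show the shape of the fan-out. The validation rules, the fact fields, and the in-memory lists standing in for downstream services are all hypothetical.

```python
import asyncio
from typing import Any, Dict, List

# Stand-ins for the downstream services' side effects (assumptions).
saved: List[Dict[str, Any]] = []
audiences: List[str] = []


async def visitor_saver(fact: Dict[str, Any]) -> None:
    await asyncio.sleep(0)  # placeholder for a network call
    saved.append(fact)


async def audience_manager(fact: Dict[str, Any]) -> None:
    await asyncio.sleep(0)  # placeholder for a network call
    audiences.append(fact["visitor_id"])


def validate(fact: Dict[str, Any]) -> None:
    """Hypothetical validation rules: require a visitor id and a fact name."""
    if not fact.get("visitor_id"):
        raise ValueError("visitor_id is required")
    if "name" not in fact:
        raise ValueError("fact name is required")


async def publish(fact: Dict[str, Any]) -> None:
    validate(fact)  # reject bad input before any downstream call
    # Fan out to both services concurrently.
    await asyncio.gather(visitor_saver(fact), audience_manager(fact))


asyncio.run(publish({"visitor_id": "v-123", "name": "hotel_view", "value": 1}))
print(len(saved), audiences)
```

Validating before the fan-out means a malformed fact never reaches either downstream service. Whether the real publisher waits for both calls to finish, as `gather` does here, is not stated in the slides.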
Roundtrip microservice latency
• Visitor Platform microservices operate with extremely low latency requirements
• 1B+ requests per day
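Since the deck states its latency target as a P99, a short sketch of how a P99 is read off a set of measurements may help. It uses the nearest-rank percentile method, which is one common definition among several. The sample latencies are invented for illustration and are not Tripadvisor measurements.

```python
import math
from typing import List


def percentile(samples: List[float], p: int) -> float:
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: mostly fast, with two slow outliers.
latencies_ms = [0.4] * 98 + [0.9, 5.0]
print("P50:", percentile(latencies_ms, 50))
print("P99:", percentile(latencies_ms, 99))
```

With these samples the single 5.0 ms outlier sits above the 99th percentile, so it does not move the P99. At a billion requests per day, though, even the slowest 1% is still millions of requests.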
ScyllaDB latency
Partitioning data into ScyllaDB
Why ScyllaDB?
A monstrously fast live-serving database with the lowest possible latencies
• Better performance than Cassandra
• Eliminates the operational burden of Cassandra
• Ease of migration – with zero downtime
• AWS BYOA (bring your own account) option
ScyllaDB on AWS: Cloud BYOA
Thank you!
Please complete the session survey in the mobile app
Dean Poulin, Data Engineering Team Lead
Felipe Cardeneti Mendes, Technical Director



