Scaling UP
Challenges Encountered Scaling Up
Recommendation Services @Gravity R&D
Bottyán Németh
Who we are and what we do
Gravity R&D is a recommender system vendor.
We have provided recommendation as a service to
customers all around the globe since 2009.
How do we imagine growth?
How does it actually happen?
# of requests
Vatera.hu, the largest online marketplace in Hungary:
served by one “server”
Alexa TOP100 video chat website
(~40M recommendation requests / day):
• Served by 5 application servers and 1 DB
• Too many events to store in MySQL → using Cassandra (v0.6)
• Training time for IALS too long → speedup by IALS1
• Max. 5 sec latency in “product” availability
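To put the request volume in perspective, the daily figure can be converted to an average per-second rate. The sketch below is plain arithmetic on the two numbers from the slide (~40M requests per day, 5 application servers). It assumes the load is spread evenly over the day and across servers, which real traffic is not, so actual peaks would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(requests_per_day: float) -> float:
    """Average requests per second, assuming a perfectly flat load."""
    return requests_per_day / SECONDS_PER_DAY


daily_requests = 40_000_000  # "~40M recommendation requests / day" (from the slide)
app_servers = 5              # "5 application servers" (from the slide)

total_rps = average_rps(daily_requests)
per_server_rps = total_rps / app_servers  # assumes an even split across servers

print(f"average load: {total_rps:.0f} requests/s")
print(f"per application server: {per_server_rps:.0f} requests/s")
```

Under these assumptions the average works out to roughly 463 requests per second overall, or about 93 per application server.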
Using new/beta technologies
Cassandra (v0.6)
Nginx (v0.5) (22% of top 1M sites)
Kafka (v0.8)
MySQL automatic failover
Reaching the limits
Even if a technology is widely used, once you reach its
limits, optimization is very costly and time-consuming.
Java GC: the service collapsed because of increased minor GC
times due to a JVM bug (26 January 2013).
Maintaining MySQL with lots of data (optimize table,
slave replication lag, faster storage devices)
Complexity increases
There is always a business request or an algorithmic
development which requires more resources.
Optimizations
Infrastructure
Currently 200+ hosts and 3500+ services monitored
[Chart: number of servers per year, 2008 to 2016]
# of items
How to store item model / metadata in memory to serve
requests fast?
Auto-increment IDs for the items?
2^31 is not enough.
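One way to read this slide: a hash map keyed by the customer's item ID is flexible but carries per-entry overhead, while a flat array indexed by a dense auto-increment ID is compact and fast to look up. The sketch below illustrates that idea only; it is not Gravity's implementation, and `ItemIndex`, `fits_in_int32`, and the sample item IDs are hypothetical names. It also shows where a signed 32-bit ID runs out, at 2^31 - 1.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit ID can hold


class ItemIndex:
    """Hypothetical store: maps external item IDs to dense internal indices."""

    def __init__(self):
        self._index_of = {}   # external ID (str) -> dense index (int)
        self._metadata = []   # dense index -> item metadata (dict)

    def add(self, external_id, metadata):
        """Assign the next auto-increment index, or update an existing item."""
        idx = self._index_of.get(external_id)
        if idx is None:
            idx = len(self._metadata)
            if idx > INT32_MAX:
                raise OverflowError("dense index no longer fits in 32 bits")
            self._index_of[external_id] = idx
            self._metadata.append(metadata)
        else:
            self._metadata[idx] = metadata
        return idx

    def get(self, idx):
        """Array-style lookup by dense index."""
        return self._metadata[idx]


def fits_in_int32(value):
    """True if `value` can be packed as a signed 32-bit integer."""
    try:
        struct.pack("<i", value)
        return True
    except struct.error:
        return False


index = ItemIndex()
first = index.add("sku-123", {"title": "example item"})
second = index.add("sku-456", {"title": "another item"})
print(first, second, index.get(second)["title"])
print(fits_in_int32(INT32_MAX), fits_in_int32(2**31))
```

The hash map is consulted only once per item to translate the external ID; after that, every lookup goes through the dense index.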
Preconceptions
More data means better results.
If the CTR of a new algorithm is low, then the old
algorithm is better.
Daily retrain is enough.
Training frequency
CTR decreased in the morning
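A dip like this is easier to spot when click-through rate (CTR) is tracked per hour rather than per day. The following is a minimal sketch on made-up events; the `(hour, clicked)` tuple format and the function name `hourly_ctr` are assumptions for illustration and do not come from the talk.

```python
from collections import defaultdict


def hourly_ctr(events):
    """Return {hour: clicks / impressions} for an iterable of (hour, clicked) pairs.

    Each event is one recommendation impression; `clicked` is True if the
    user clicked on it.
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for hour, clicked in events:
        impressions[hour] += 1
        if clicked:
            clicks[hour] += 1
    return {hour: clicks[hour] / impressions[hour] for hour in sorted(impressions)}


# Made-up sample data: hour of day and whether the impression was clicked.
sample_events = [
    (8, False), (8, False), (8, False), (8, True),     # morning: 1 click out of 4
    (20, True), (20, False), (20, True), (20, False),  # evening: 2 clicks out of 4
]

for hour, ctr in hourly_ctr(sample_events).items():
    print(f"{hour:02d}:00  CTR = {ctr:.2%}")
```

Comparing the hourly values before and after a retrain is one simple way to check whether a daily training schedule is frequent enough.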
100+ Algorithms
[Chart: number of times an algorithm is used]
Now
• Performance: Gravity’s performance-oriented
architecture enables real-time response to the
ever-changing environment and user behavior
• Algorithms: more than 100 different
recommendation algorithms enable true
personalization and help reach the highest
KPIs in different domains
• Infrastructure: fast response times all
around the globe and data security thanks
to the private cloud infrastructure located
in 4 different data centers
• Flexibility: the advanced business rule
engine with an intuitive user interface makes it
possible to satisfy various business requirements
Performance: 140M requests served daily
Algorithms: 30 man-years invested
Infrastructure: 4 data centers globally
Flexibility: 100s of logics configurable
Cross the river when you come to it
Thank you!
