© 2014 XO GROUP INC. ALL RIGHTS RESERVED.
XO Group Inc.
Membership and Community Team
Alexander Copquin - Senior Software Engineer
Vladimir Carballo - Senior Software Engineer
Favorites API Re-platforming
…a case study
Favorites API Re-platforming
• Architectures: SQL/.NET vs. Ruby/MongoDB
• Reasons for migration
• Schema design
• RoR model design and implementation
• Migration strategies and systems
• Lessons learned
Our Favorites Feature
Favorites API
Features
• Add / Edit / Delete Object
• Manage Boards
• Get counts & stats
• RESTful API
• Rails
• JavaScript
• iOS
• Android
Stats
• 100,000,000+ “favorited” objects
• 760,000+ boards
• Avg. 55,000 new objects per day
Legacy Architecture
Legacy Benchmarks
• Database: 55 GB and growing…
• Avg. 45 rpm at peak times
• Avg. 80 ms response for POST
• Avg. 460 ms response for GET
Maxed
• DB reaching max. capacity for its setup
• Scalability problems
• Hard to modify schema
• Bad response times
• Very complex caching layer
• Out of line with the company’s strategy
New Architecture
Scalable
• Easy to scale
• Flexible schema
• Fast response
• No cache layer
• Fast iteration / deploy
• TDD first and foremost
• At-a-glance monitoring of all layers
Implementation
What we persisted in the legacy schema
• UserId (primary key)
• UniqueId
• Url (unique per user)
• ImageUrl
• Name
• Description
• ObjectId (unique per application adding favorites)
• Category
• Timestamps
• Other
Favorites DB Legacy Schema
Sample queries
-- First 10 favorites for one user
SELECT TOP 10 UserFavoriteId, Name, Description, Url, ImageUrl
FROM userFavorites
WHERE userId = '5174181997807393'
Sample queries
-- First 5 favorites for one user, joined to the groups they belong to
SELECT TOP 5 grp.groupId, grp.Name AS GroupName,
       fav.userFavoriteId, fav.name,
       fav.Description, fav.Url, fav.ImageUrl
FROM userFavoritesGroups grp
INNER JOIN userFavoritesGroupsItems grpItm
  ON grp.GroupId = grpItm.GroupId
INNER JOIN userFavorites fav
  ON grpItm.userFavoriteId = fav.userFavoriteId
WHERE grp.userId = '5174181997807393'
Towards a new Schema and Persistence Layer
• Start with a clean slate
• Break with the past
• Persist only relevant minimum data points
• Think and rethink relationships
• High Performance
• Flexible
• Prototype different scenarios
First attempt
UserFavorites
Cons
• Document contains embedded documents that need to be accessed on their own
• Documents would grow without bound
• Most queries would be slow
• Indexes would be very expensive
• Tries too hard to imitate legacy
Second attempt
Board document with one recent favorite
Board document with more recent favorites
Favorite document located on different boards
Pros
• Document structure matches the data required by the view
• A Board document includes the 4 most recent favorites
• A Favorite document includes the list of boards it was added to
• Faster queries
• More control over the size of each document
• Better implementation of UX intent
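The two-collection design above can be sketched with plain Ruby hashes standing in for the Mongo documents. This is only an illustration, not the team's Mongoid models: the helper `add_favorite_to_board` and the field names `recent_favorites` and `image_url` are assumptions; only the limit of 4 recent favorites and the list of boards on each Favorite come from the slide.

```ruby
require 'securerandom'

RECENT_LIMIT = 4 # from the slide: a Board embeds its 4 most recent favorites

# Hypothetical helper: links a favorite to a board and updates both documents.
def add_favorite_to_board(board, favorite)
  # The Favorite keeps the ids of every board it was added to (no duplicates).
  favorite[:boards] |= [board[:_id]]

  # The Board keeps only a small, bounded preview, newest first.
  summary = { _id: favorite[:_id], name: favorite[:name], image_url: favorite[:image_url] }
  board[:recent_favorites].reject! { |f| f[:_id] == favorite[:_id] }
  board[:recent_favorites].unshift(summary)
  board[:recent_favorites] = board[:recent_favorites].first(RECENT_LIMIT)
  board
end

board = { _id: SecureRandom.uuid, name: 'Simple Reception Decor', recent_favorites: [] }
favorites = (1..6).map do |i|
  { _id: SecureRandom.uuid, name: "Favorite #{i}",
    image_url: "https://example.com/#{i}.jpg", boards: [] }
end
favorites.each { |fav| add_favorite_to_board(board, fav) }
```

Because the embedded preview is capped, a Board document stays small no matter how many favorites point at it, which is the "control over the size of each document" listed above.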
Sample queries
db.favorites.find({'member': 'e1606ed5-4ac8-48b4-aee6-bc4203937903'}).limit(1)
db.favorites.find({'boards': '7557acf8-b7b1-4eab-a64d-57449034cfc6'}).limit(1)
db.favorites.find({'application': 'marketplace'}).limit(1)
db.boards.find({'member': 'e1606ed5-4ac8-48b4-aee6-bc4203937903'}).limit(1)
db.boards.find({'member': 'e1606ed5-4ac8-48b4-aee6-bc4203937903', 'default_board': true})
db.boards.find({'name': 'Simple Reception Decor'}).limit(1)
Some implementation details
● Rails web application framework
● We speak RoR and JS
● MongoDB as the data repository (we love NoSQL)
● Two collections, one for Boards and one for Favorites
● No joins, no foreign keys
● Referential integrity is handled differently (in the application layer)
● Mongoid gem (pros & cons)
Favorites re-platform
Board class
Favorite class
Scaling reads with replica sets
Scaling reads with sharding
Migration
Clients switchover
[Diagram: four clients, the Legacy API and the New API; labeled ONE WAY]
Migration Timeline
[Timeline figure. Labels: Development, new API, Bulk Migr., Continuous Migr., Implement Monitors, Turn on Continuous Migration, Data Catch-up, Plug Clients]
Bulk Migration
ETL
Bulk Migration
SQL Tables → Mongo Collections
UserFavorites → Favorites
UserFavoritesGroups → Boards
UserFavoritesGroupsItems
Favorites Job
Pentaho Steps
Auto-increment Id vs. UUID
UserFavoriteId → Favorites UUID
GroupId → Groups UUID
Continuous Migration
Get UUID from the Get-Go
• Add a column to the legacy DB (100M+ records!!) holding the new Mongo UUID
• The migrations then take care of inserting it into the new documents
[Diagram: SQL has all new ids (xxxxx-xxxx-xxxx) → Migration Systems → Mongo, where the ids are inserted]
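As a rough sketch of that idea, the function below turns one legacy row (already carrying its pre-generated `uuid` column) into a Mongo-style document that reuses that value as `_id`. The function name, the row shape, the `image_url`/`description` field names and the lowercasing step are assumptions for illustration; `member` and `boards` are the field names seen in the sample Mongo queries.

```ruby
# Hypothetical mapping from a legacy SQL row to a Favorite document.
# row          - Hash keyed by legacy column name, including the added 'uuid' column
# member_uuid  - the member's id in the new system (assumed to be resolved elsewhere)
# board_uuids  - UUIDs of the boards (legacy groups) this favorite belongs to
def favorite_document_from_row(row, member_uuid:, board_uuids:)
  {
    _id: row.fetch('uuid').downcase, # UUID generated in SQL, reused as the Mongo _id
    member: member_uuid,
    boards: board_uuids,
    name: row['Name'],
    description: row['Description'],
    url: row['Url'],
    image_url: row['ImageUrl']
  }
end

row = {
  'UserFavoriteId' => 42, # auto-increment id, not carried over
  'uuid' => '0F8FAD5B-D9CB-469F-A165-70867728950E',
  'Name' => 'Centerpiece', 'Description' => nil,
  'Url' => 'https://example.com/centerpiece',
  'ImageUrl' => 'https://example.com/centerpiece.jpg'
}
doc = favorite_document_from_row(
  row,
  member_uuid: 'e1606ed5-4ac8-48b4-aee6-bc4203937903',
  board_uuids: ['7557acf8-b7b1-4eab-a64d-57449034cfc6']
)
```

Since both systems agree on the id before any data moves, later sync messages from the legacy side can address the same document without a lookup table from auto-increment ids to UUIDs.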
Add UUID Columns in SQL
• Alter table add UUID uniqueidentifier (100 M recs!!!)
• New Favs temp table with UUID:
SELECT *, uuid = NEWID()
INTO NewUserFavorites
FROM UserFavorites
• Add indexes
• Rename & drop
Facts
• SQL needed some sanitization
• SQL prep scripts: approx. 4 hours
• Pentaho ETL on a local workstation: 8 hours
• Restore into the production Mongo cluster: 4 hours
We’ve got data!!
Continuous Migration Architecture
[Diagram: Clients, Legacy API, New API, SQS Queue, Messenger; labeled ONE WAY SYNC…]
Continuous Migration
Favorites Legacy Messenger
• Ruby
• Consumes an SQS queue fed by Legacy, which generates 1 message per operation
• Issues one API call to the new app per operation
• Runs as a background worker
[Diagram: Legacy API → SQS → Legacy Messenger → New API → Mongo]
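The "one message, one API call" step could look roughly like the sketch below. Nothing here is taken from the actual Messenger: the message format, the operation names and the routes in `ROUTES` are hypothetical, and the function only builds a description of the request instead of sending it.

```ruby
require 'json'

# Hypothetical routing table: legacy operation name => HTTP verb and path on the new API.
ROUTES = {
  'favorite_created' => [:post,   '/favorites'],
  'favorite_updated' => [:put,    '/favorites/:id'],
  'favorite_deleted' => [:delete, '/favorites/:id']
}.freeze

# Translates the JSON body of one queue message into one request description.
def request_for(message_body)
  msg = JSON.parse(message_body)
  verb, path = ROUTES.fetch(msg.fetch('operation')) # unknown operations raise KeyError
  { verb: verb, path: path.sub(':id', msg['id'].to_s), payload: msg['data'] }
end

request = request_for('{"operation":"favorite_updated","id":"abc-123","data":{"name":"New name"}}')
```

A real worker would wrap this in a loop that receives messages from SQS, performs the HTTP call, and deletes the message only after the new API accepts it.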
SQS Queue is not a FIFO Friend
Sent by Legacy: 1 2 3 4 5 6 7
Consumed by Messenger: 5 3 1 2 4 6 7
Challenges
• Queue is not FIFO
• Referenced objects may not exist yet
• Queue bloats fast
• Sync can fall behind real time
• Data is different between the two systems
Solutions
• Verify that the entity exists (API call); otherwise, throw the message back in the queue
• Set message expiration
• Sanitize data
• Run multiple workers to achieve near real-time syncing
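The first two solutions can be illustrated with a small in-memory simulation. It is a sketch under loud assumptions: an Array stands in for SQS, a Hash stands in for the new API's data, and expiration is modeled as a maximum number of delivery attempts (`MAX_ATTEMPTS`, a made-up value) rather than a time-based expiry.

```ruby
MAX_ATTEMPTS = 3 # hypothetical stand-in for message expiration

# Applies one message to the store; returns :retry when its target does not exist yet.
def apply(store, msg)
  case msg[:op]
  when :create
    store[msg[:id]] = msg[:data]
    :done
  when :update
    return :retry unless store.key?(msg[:id]) # out-of-order delivery: create not seen yet
    store[msg[:id]] = store[msg[:id]].merge(msg[:data])
    :done
  end
end

# Drains the queue, re-queueing messages whose entity is missing and dropping expired ones.
def drain(queue, store)
  dropped = []
  until queue.empty?
    msg = queue.shift
    next unless apply(store, msg) == :retry

    msg[:attempts] = msg.fetch(:attempts, 0) + 1
    if msg[:attempts] >= MAX_ATTEMPTS
      dropped << msg  # expired: stop retrying
    else
      queue.push(msg) # throw it back in the queue
    end
  end
  dropped
end

store = {}
queue = [
  { op: :update, id: 'a', data: { name: 'Renamed' } }, # arrives before its create
  { op: :create, id: 'a', data: { name: 'Original', url: 'https://example.com/a' } },
  { op: :update, id: 'ghost', data: { name: 'Never created' } }
]
dropped = drain(queue, store)
```

The early update for `a` is retried and lands after the create, while the update for `ghost` is dropped once it exhausts its attempts, which keeps the queue from bloating with messages that can never be applied.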
Takeaways
• Favor a simple document structure
• Try different schema paradigms
• Bypass native ObjectId generation in favor of UUIDs
• Break with the past
• Queues can be deceiving
• Gems can simplify the application-layer implementation
• Manage referential integrity in the application layer
• No cache required
New Benchmarks
• Avg. 85 rpm at peak times
• Avg. 58 ms response for POST
• Avg. 18 ms response for GET
New vs. Legacy
Good
• Overall performance increase
• 18 ms vs. 460 ms for GET
• 58 ms vs. 80 ms for POST
• Easy schema changes
• Scalable
• Simpler architecture
• No cache layer
• Fast code iteration, testing and deployment
• In line with the company’s technology strategy
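To put the response-time comparison in relative terms, the short calculation below derives speedup factors from the averages quoted on the benchmark slides. The constant and variable names are illustrative only.

```ruby
# Average response times in milliseconds, as quoted on the benchmark slides.
LEGACY_MS = { get: 460.0, post: 80.0 }.freeze
NEW_MS    = { get: 18.0,  post: 58.0 }.freeze

# Speedup factor per verb: legacy time divided by new time, rounded to one decimal.
speedup = LEGACY_MS.map { |verb, ms| [verb, (ms / NEW_MS[verb]).round(1)] }.to_h

speedup.each { |verb, factor| puts "#{verb.to_s.upcase}: #{factor}x faster" }
# prints "GET: 25.6x faster" and "POST: 1.4x faster"
```

The gain is concentrated in reads, which fits a schema shaped around what the view needs to fetch.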
Acknowledgments
• Dmitri Nesterenko
• Jason Sherman
• Nelly Santoso
• Phillip Chiu
• Sean Lipkin
• George Taveras
• Alison Fay
• Diana Taykhman
• Rajendra Prashad
• Josh Keys
• Lewis DiFelice
contact, questions, inquiries?
memcomtech@xogrp.com

Migration from SQL to MongoDB - A Case Study at TheKnot.com
