Stop Worrying & Love the SQL! (the Quepid story)
OpenSource Connections
Me/Us
Doug Turnbull
@softwaredoug
Likes: Solr, Elasticsearch, Cassandra,
Postgres
OpenSource Connections
@o19s
Search, Discovery and Analytics
Let us introduce you to freelancing!
Most Importantly we do...
Make my search results more relevant!
“Search Relevancy”
What database works best for problem X?
“(No)SQL Architect/Trusted Advisor”
How products actually get built
Rena: Doug, John, can you come by this afternoon? One of our Solr-based products needs some urgent relevancy work. It’s Friday, and it needs to get done today!
Us: Sure!
The Client (Rena!): smart cookie!
A few hours later
Us: We’ve made a bit of progress!
Rena: But every time we fix something, we break an existing search!
Us: Yeah! We’re stuck in a game of whack-a-mole.
Images: “frustration-1081” by jseliger2; “whack a mole” by jencu
Whack-a-Mole
What search relevancy work actually looks like
I HAVE AN IDEA
● Middle of the afternoon, I stop doing search work and start throwing together some Python:
    from flask import Flask
    app = Flask(__name__)
Everyone: Doug, stop that, you have important search work to do!
Me: We’re not making any progress!
WE NEED A WAY TO REGRESSION TEST OUR RELEVANCY AS WE TUNE!
Everyone: You’re nuts!
What did I make?
A tool focused on gathering stakeholder (i.e. Rena’s) feedback on search results, coupled with a workbench for tuning against that feedback
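The idea of regression testing relevancy against stakeholder ratings can be sketched in a few lines of Python. This is only an illustration and not Quepid's actual scoring: the function names (`score_query`, `regressions`), the "average rating of the top results" metric, and the sample data are all assumptions made for the example.

```python
# Hypothetical sketch: score search results against stakeholder ratings.
# Ratings follow the 1-10 scale from the slides (stored as ints here for
# arithmetic); documents nobody has rated count as 0.

def score_query(results, ratings, top_n=5):
    """Average rating of the top_n returned doc ids (an assumed metric)."""
    top = results[:top_n]
    if not top:
        return 0.0
    return sum(ratings.get(doc_id, 0) for doc_id in top) / len(top)


def regressions(before, after, ratings_by_query, top_n=5):
    """Return the queries whose score dropped after a tuning change."""
    dropped = []
    for query, ratings in ratings_by_query.items():
        old = score_query(before.get(query, []), ratings, top_n)
        new = score_query(after.get(query, []), ratings, top_n)
        if new < old:
            dropped.append(query)
    return dropped


ratings_by_query = {
    "hdmi cables": {"doc1234": 10, "doc532": 5},
    "ethernet cables": {"doc77": 9},
}
before = {"hdmi cables": ["doc1234", "doc532"], "ethernet cables": ["doc77"]}
# After tuning: hdmi results are unchanged, but ethernet lost its good result.
after = {"hdmi cables": ["doc1234", "doc532"], "ethernet cables": ["doc999"]}

broken = regressions(before, after, ratings_by_query)
print(broken)
```

Running a check like this after every tuning change is what turns whack-a-mole into something visible: a fix for one query that breaks another shows up in the list immediately.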
Today we have customers...
… forget that, tell me about your failures!
Our war story
My mistakes:
● Building a product
● Selling a product
● As a user experience engineer
● As an Angular developer
● At choosing databases
Quepid 0.0.0.0.0.0.1
Track multiple user searches.
For this query (“hdmi cables”), Rena rates this document as a good/bad search result.
Need to store:
    <search> -> <id for search result> -> <rating 1-10>
    “hdmi cables” -> “doc1234” -> “10”
*Actual UI may have been much uglier
Data structure selection under duress
● What’s simple, easy, and will persist our
data?
● What plays well with python?
● What can I get working now in Rena’s office?
Redis
● In memory “Data Structure Server”
○ hashes, lists, simple key-> value storage
● Persistent -- write to disk every X minutes
Redis
Easy to install and go!
    $ pip install redis

    from redis import Redis
    redis = Redis()
    redis.set("foo", "bar")
    redis.get("foo")  # gets 'bar' (b'bar' on Python 3 unless decode_responses=True)

Specific to our problem:
    from redis import Redis
    redis = Redis()
    ratings = {"doc1234": "10",
               "doc532": "5"}
    searchQuery = "hdmi cables"
    # redis-py has no hsetall(); hset() with mapping= (redis-py 3.5+) stores the whole dict
    redis.hset(searchQuery, mapping=ratings)

Stores a hash table at “hdmi cables” with:
    “doc1234” -> “10”
    “doc532” -> “5”
Success!
● My insanity paid off that afternoon
● Now we’re left with a pile of hacked together
(terrible) code -- now what?
Adding some features
● Would like to add multiple “cases”
(different search projects that solve different problems)
● Would like to add user accounts
● Still a one-off for Silverchair
Cases
Tuning a cable shopping site... … vs state laws
Cases in Redis?
Recall our existing implementation:
    from redis import Redis
    redis = Redis()
    ratings = {"doc1234": "10",
               "doc532": "5"}
    searchQuery = "hdmi cables"
    # hset() needs mapping= to store a whole dict
    redis.hset(searchQuery, mapping=ratings)

“Data model”: out of the box, Redis can deal with 2 levels deep:
    {
      "hdmi cables": {
        "doc1234": "10",
        "doc532": "5"
      },
      "ethernet cables": {...}
    }

Can’t add an extra layer (a Redis hash is only one layer deep):
    {
      "cable site": {
        "hdmi cables": {...},
        "ethernet cables": {...}
      },
      "laws site": {...}
    }
Time to give up Redis?
“All problems in computer science can be solved by another level of indirection” -- David Wheeler
Crazy idea: add a dynamic prefix to query keys to indicate the case, i.e.:
    {
      "case_cablestore_hdmi cables": {
        "doc1234": "10",
        "doc532": "5"
      },
      "case_cablestore_ethernet cables": {
        … },
      "case_statelaws_car tax": {
        … }
    }
The first two keys are queries for the “Cable Store” case; the last is a query for the “State Laws” case.
To fetch all queries for one case:
    redis.keys("case_cablestore*")
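The prefix trick can be tried without a Redis server. The sketch below is an assumption-laden stand-in: a plain dict plays the role of Redis, `fnmatchcase` imitates the glob matching of `redis.keys(...)`, and the helper names (`query_key`, `save_ratings`, `keys_for_case`) are invented for the example.

```python
from fnmatch import fnmatchcase

# A plain dict stands in for Redis so the sketch runs without a server.
store = {}


def query_key(case_id, search_query):
    """Build the prefixed key for one query in one case."""
    return "case_%s_%s" % (case_id, search_query)


def save_ratings(case_id, search_query, ratings):
    """Store the ratings hash under the case-prefixed key."""
    store[query_key(case_id, search_query)] = dict(ratings)


def keys_for_case(case_id):
    """Imitate redis.keys("case_<id>*") with glob matching over the dict."""
    pattern = "case_%s*" % case_id
    return sorted(k for k in store if fnmatchcase(k, pattern))


save_ratings("cablestore", "hdmi cables", {"doc1234": "10", "doc532": "5"})
save_ratings("cablestore", "ethernet cables", {"doc77": "9"})
save_ratings("statelaws", "car tax", {"doc9": "7"})

cable_keys = keys_for_case("cablestore")
print(cable_keys)
```

Two caveats follow from the design. A case id that is a prefix of another (say `cable` and `cablestore`) would match both under this pattern, so ids need a terminator or a fixed format. And on a real server, `KEYS` walks the whole keyspace, which is why the Redis documentation points to `SCAN` for production use.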
Store other info about cases?
New problem: we need to store some information about cases: case name, etc.
Where would it go here?
    {
      "case_cablestore_hdmi cables": {
        "doc1234": "10",
        "doc532": "5"
      },
      "case_cablestore_ethernet cables": {
        … },
      "case_statelaws_car tax": {
        … }
    }
Give each case its own record, and mark query keys with “_query_”:
    {
      "case_cablestore": {
        "name": "cablestore",
        "created": "20140101"
      },
      "case_cablestore_query_hdmi cables": {
        "doc1234": "10",
        "doc532": "5"
      },
      "case_cablestore_query_ethernet cables": {
        … },
      "case_statelaws_query_car tax": {
        … }
    }
Oh but let’s add users
Extrapolating on past patterns:
    {
      "user_doug": {
        "name": "Doug",
        "created_date": "20140101"
      },
      "user_doug_case_cablestore": {
        "name": "cablestore",
        "created_date": "20140101"
      },
      "user_doug_case_cablestore_query_hdmi cables": {
        "doc1234": "10",
        "doc532": "5"
      },
      "user_doug_case_cablestore_query_ethernet cables": {
        … },
      "user_tom_case_statelaws_query_car tax": {
        … }
    }
You right now! (image: Rage Wallpaper from Flickr user Thoth God of Knowledge)
Step Back
We ask ourselves: Is this tool a product? Is it useful outside of this customer?
What level of software engineering helps us move forward?
● Migrate to an RDBMS?
● “NoSQL” options?
● Clean up use of Redis somehow?
SubRedis
Operationalizes hierarchy inside of Redis
https://github.com/softwaredoug/subredis
    from redis import Redis
    from subredis import SubRedis
    redis = Redis()
    caseId = 1
    # Create a Redis sandbox for this case
    sr = SubRedis("case_%s" % caseId, redis)
    ratings = {"doc1234": "10",
               "doc532": "5"}
    searchQuery = "hdmi cables"
    # Interact with this case's queries through the sandbox specific to that case.
    # hsetall is shown as on the slide; with plain redis-py the equivalent is hset(name, mapping=...)
    sr.hsetall(searchQuery, ratings)
Behind the scenes, subredis prepends the case_1 prefix to everything it stores or queries.
SubRedis == composable
SubRedis takes any Redis-like thing and works safely in that sandbox:
    # Sandbox Redis for everything about this user
    userSr = SubRedis("user_%s" % userId, redis)
    # Now working in a sandbox within a sandbox
    caseSr = SubRedis("case_%s" % caseId, userSr)
    ratings = {"doc1234": "10",
               "doc532": "5"}
    searchQuery = "hdmi cables"
    caseSr.hsetall(searchQuery, ratings)
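How a sandbox inside a sandbox ends up as a flat key can be shown with a toy wrapper. This is not the subredis implementation: `FakeRedis` and `PrefixedStore` are hypothetical classes written for this sketch, supporting only two hash operations, with a dict in place of a server.

```python
class FakeRedis:
    """Tiny in-memory stand-in for a Redis client (hash set/get only)."""

    def __init__(self):
        self.data = {}

    def hset(self, name, mapping):
        self.data.setdefault(name, {}).update(mapping)

    def hgetall(self, name):
        return dict(self.data.get(name, {}))


class PrefixedStore:
    """Hypothetical sandbox: prepends a prefix, then delegates to its parent."""

    def __init__(self, prefix, parent):
        self.prefix = prefix
        self.parent = parent  # a FakeRedis or another PrefixedStore

    def _key(self, name):
        return "%s_%s" % (self.prefix, name)

    def hset(self, name, mapping):
        self.parent.hset(self._key(name), mapping)

    def hgetall(self, name):
        return self.parent.hgetall(self._key(name))


backend = FakeRedis()
user_store = PrefixedStore("user_1", backend)
case_store = PrefixedStore("case_1", user_store)  # sandbox within a sandbox
case_store.hset("hdmi cables", {"doc1234": "10", "doc532": "5"})

stored_keys = list(backend.data)
print(stored_keys)
```

Because each wrapper exposes the same methods as the thing it wraps, wrappers nest to any depth, and code holding `case_store` never needs to know which user or case it is scoped to.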
Does something reasonable under the hood
    {
      "user_1_name": "Doug",
      "user_1_created_date": "20140101",
      "user_1_case_1_name": "cablestore",
      "user_1_case_1_hdmi cables": {
        "doc1234": "10",
        "doc532": "5"
      },
      "user_2_name": "Rena",
      ...
    }
(Diagram: the case_1 subredis nests inside the user_1 subredis, which nests inside all of Redis.)
We reflect again
● Ok we tried this out as a product. Launched.
● Paid off *some* tech debt, but wtf are we
doing
● Works well enough, we’ve got a bunch of
new features, forge ahead
We reflect again
● We have real customers
● Our backend is evolving away from simple
key-value storage
○ user accounts? users that share cases? stored
search snapshots? etc etc
Attack of the relational
Given our current set of tools, how would we solve the problem “case X can be shared between multiple users”?
Could duplicate the data?
    {
      "user_1_name": "Doug",
      "user_1_created_date": "20140101",
      "user_1_case_1_name": "cablestore",
      "user_1_case_1_hdmi cables": {
        "doc1234": "10",
        "doc532": "5"
      },
      "user_2_name": "Rena",
      "user_2_case_1_name": "cablestore",
      "user_2_case_1_hdmi cables": {
        "doc1234": "10",
        "doc532": "5"
      }
    }
This stinks!
● Updates require visiting many (every?) user, looking for this case
● Bloated database
Attack of the relational
Break out cases to a top-level record? Each user stores a list of owned cases:
    {
      "user_1_name": "Doug",
      "user_1_created_date": "20140101",
      "user_1_cases": [1, ...],
      "case_1_name": "cablestore",
      "case_1_hdmi cables": {
        "doc1234": "10",
        "doc532": "5"
      },
      "user_2_name": "Rena",
      "user_2_cases": [1, ...],
      ...
    }
(Diagram: User 1 and User 2 both point to Case 1.)
SubRedisRelational?
    {
      "user_1_name": "Doug",
      "user_1_created_date": "20140101",
      "user_1_cases": [1, ...],
      "case_1_name": "cablestore",
      "case_1_hdmi cables": {
        "doc1234": "10",
        "doc532": "5"
      },
      "user_2_name": "Rena",
      "user_2_cases": [1, ...],
      ...
    }
We’ve actually just normalized our data.
Why was this good?
● We want to update case 1 in isolation, without anomalies
● We don’t want to visit every user to update case 1!
● We want to avoid duplication
We just made our “NoSQL” database a bit relational
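The payoff of normalizing shows up when a shared case is renamed. The sketch below compares the two layouts using plain dicts in place of Redis; the key names follow the slides, while the two `rename_case_*` helpers are hypothetical and exist only for this comparison.

```python
# Denormalized: each user carries a private copy of the shared case.
duplicated = {
    "user_1_case_1_name": "cablestore",
    "user_2_case_1_name": "cablestore",
}

# Normalized: the case is stored once; users only hold its id.
normalized = {
    "user_1_cases": [1],
    "user_2_cases": [1],
    "case_1_name": "cablestore",
}


def rename_case_duplicated(store, case_id, new_name):
    """Must visit every user's copy; returns how many keys were written."""
    suffix = "_case_%s_name" % case_id
    touched = [k for k in store if k.endswith(suffix)]
    for key in touched:
        store[key] = new_name
    return len(touched)


def rename_case_normalized(store, case_id, new_name):
    """A single write, no matter how many users share the case."""
    store["case_%s_name" % case_id] = new_name
    return 1


writes_dup = rename_case_duplicated(duplicated, 1, "cable shop")
writes_norm = rename_case_normalized(normalized, 1, "cable shop")
print(writes_dup, writes_norm)
```

With duplication the number of writes grows with the number of users sharing the case, and any copy missed along the way is an update anomaly; the normalized layout has exactly one place where the name lives.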
Other Problems
● Simple CRUD tasks like “delete a case”
need to be coded up
● We’re managing our own record ids
● Is any of this atomic? Does it occur in isolation?
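What "needs to be coded up" means for a task like deleting a case can be sketched over the flat key layout. The dict again stands in for Redis, and `delete_case` is a hypothetical helper, not code from Quepid.

```python
store = {
    "user_1_cases": [1, 2],
    "user_2_cases": [1],
    "case_1_name": "cablestore",
    "case_1_hdmi cables": {"doc1234": "10", "doc532": "5"},
    "case_2_name": "statelaws",
}


def delete_case(store, case_id):
    """Hand-rolled cascade: remove the case's keys, then every reference."""
    prefix = "case_%s_" % case_id
    # Step 1: drop every record that belongs to the case.
    for key in [k for k in store if k.startswith(prefix)]:
        del store[key]
    # Step 2: scrub the id from each user's list of cases.
    # A crash between step 1 and step 2 would leave dangling ids behind.
    for key, value in list(store.items()):
        if key.startswith("user_") and key.endswith("_cases"):
            store[key] = [c for c in value if c != case_id]


delete_case(store, 1)
print(sorted(store))
```

Nothing ties the two steps together, which is the atomicity question from the list above: the application has to guarantee on its own that both steps happen or neither does.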
What’s our next DB?
● These problems are hard, we need a new
DB
● We also need better tooling!
Irony
● This is the exact situation we warn clients
about in our (No)SQL Architect Roles.
○ Relational == General Purpose
○ Many-many, many-one, one-many, etc
○ Relational == consistent tooling
○ NoSQL == solve specific problems well
So we went relational!
● Took advantage of great tooling: MySQL, SQLAlchemy (ORM), Alembic (migrations)
● Modeled our data relationships exactly like
we needed them to be modeled
Map db to Python classes
Can model my domain in coder-friendly classes:
    from sqlalchemy import Column, ForeignKey, Integer, String
    from sqlalchemy.orm import declarative_base, relationship

    Base = declarative_base()

    class SearchQuery(Base):
        __tablename__ = 'query'
        id = Column(Integer, primary_key=True)
        search_string = Column(String(255))  # MySQL needs a VARCHAR length
        ratings = relationship("QueryRating")

    class QueryRating(Base):
        __tablename__ = 'rating'
        id = Column(Integer, primary_key=True)
        # relationship() needs a foreign key linking each rating to its query
        query_id = Column(Integer, ForeignKey('query.id'))
        doc_id = Column(String(255))
        rating = Column(Integer)
Easy CRUD
    # Create!
    q = SearchQuery(search_string="hdmi cable")
    db.session.add(q)
    db.session.commit()

    # Delete! (drops the first rating from this query)
    del q.ratings[0]
    db.session.add(q)
    db.session.commit()

    # Update!
    # filter_by() takes keyword arguments; filter() does not
    q = SearchQuery.query.filter_by(id=1).one()
    q.search_string = "foo"
    db.session.add(q)
    db.session.commit()
Migrations are good
How do you upgrade your database to add/move/reorganize data?
● With Redis, this was always done manually/scripted
● Migrations with an RDBMS are a very robust/well-understood way to handle this
SQLAlchemy has “alembic” to help:
    alembic revision --autogenerate -m "name for tries"
    alembic upgrade head
    alembic downgrade 0ab51c25c
Modeling Users ←→ Cases
    # needs: from sqlalchemy import Table, Column, Integer, ForeignKey
    # the table name must be a string
    association_table = Table('case2users', Base.metadata,
        Column('case_id', Integer, ForeignKey('case.id')),
        Column('user_id', Integer, ForeignKey('user.id'))
    )

    class User(Base):
        __tablename__ = 'user'
        id = Column(Integer, primary_key=True)
        cases = relationship("Case",
                             secondary=association_table)

    class Case(Base):
        __tablename__ = 'case'
        id = Column(Integer, primary_key=True)
Can model many-to-many relationships
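The same many-to-many shape can be tried without installing anything. The sketch below swaps in Python's built-in `sqlite3` and raw SQL in place of MySQL and SQLAlchemy, so it shows the underlying tables rather than the ORM mapping. The plural table names `users` and `cases` (chosen to avoid quoting the SQL keyword `case`), the `name` columns, and the `users_sharing` helper are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cases (id INTEGER PRIMARY KEY, name TEXT);
    -- Association table: one row per (case, user) pairing.
    CREATE TABLE case2users (
        case_id INTEGER REFERENCES cases(id),
        user_id INTEGER REFERENCES users(id)
    );
    INSERT INTO users VALUES (1, 'Doug');
    INSERT INTO users VALUES (2, 'Rena');
    INSERT INTO cases VALUES (1, 'cablestore');
    INSERT INTO case2users VALUES (1, 1);
    INSERT INTO case2users VALUES (1, 2);
""")


def users_sharing(case_name):
    """Names of all users linked to the case with this name."""
    rows = conn.execute(
        """
        SELECT users.name
        FROM users
        JOIN case2users ON case2users.user_id = users.id
        JOIN cases ON cases.id = case2users.case_id
        WHERE cases.name = ?
        ORDER BY users.name
        """,
        (case_name,),
    ).fetchall()
    return [name for (name,) in rows]


# The shared case is renamed in exactly one row.
conn.execute("UPDATE cases SET name = 'cable shop' WHERE id = 1")
shared_by = users_sharing("cable shop")
print(shared_by)
```

The case lives in one row, both users reach it through the association table, and the rename is a single `UPDATE` that every user sees.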
Ultimate Query Flexibility
Print all cases:
    for user in User.query.all():
        for case in user.cases:
            print(case.caseName)
Cases from paying members:
    for user in User.query.filter(User.isPaying == True):
        for case in user.cases:
            print(case.caseName)
Lots of things easier
● backups
● robust hosting services (RDS)
● industrial-strength ACID with flexible querying
● 3rd-party tooling (e.g. VividCortex for MySQL)
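The "A" in ACID is what the hand-rolled Redis code lacked. As a small sketch of atomicity, the example below deletes a case inside a transaction and simulates a crash halfway through. It uses Python's built-in `sqlite3` as a stand-in for MySQL; the table layout, the `fail_midway` flag, and the helper names are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE case2users (case_id INTEGER, user_id INTEGER)")
conn.execute("INSERT INTO cases VALUES (1, 'cablestore')")
conn.execute("INSERT INTO case2users VALUES (1, 1)")
conn.commit()


def delete_case(conn, case_id, fail_midway=False):
    """Delete a case and its user links as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("DELETE FROM case2users WHERE case_id = ?", (case_id,))
            if fail_midway:
                raise RuntimeError("simulated crash between the two deletes")
            conn.execute("DELETE FROM cases WHERE id = ?", (case_id,))
    except RuntimeError:
        return False
    return True


def counts(conn):
    """Return (number of link rows, number of case rows)."""
    links = conn.execute("SELECT COUNT(*) FROM case2users").fetchone()[0]
    cases = conn.execute("SELECT COUNT(*) FROM cases").fetchone()[0]
    return links, cases


failed_result = delete_case(conn, 1, fail_midway=True)
after_failure = counts(conn)   # the rollback restored the link row
ok_result = delete_case(conn, 1)
after_success = counts(conn)
print(after_failure, after_success)
```

The interrupted delete leaves the database exactly as it was, so there is never a moment where the links are gone but the case remains.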
When NoSQL?
● Solve specific problems well
○ Optimize for specific query patterns
○ Full-Text Search (Elasticsearch, Solr)
○ Caching, shared data structure (Redis)
● Optimize for specific scaling problems
○ Provide a denormalized “view” of your data for
specific task
Final Thoughts
Sometimes an RDBMS has a higher initial hurdle: setup, figuring out migrations, data modeling, etc.
Why isn’t the easy path the wise path?
In conclusion

Stop Worrying & Love the SQL - A Case Study
