Ubiquitous Solr - A Database’s not-so-evil Twin
Ayon Sinha
Data Foundation @WalmartLabs
October 13-16, 2016 • Austin, TX
2
Search Engine… Lucene… Solr
•  Internet and Intranet Search
•  Relevance
•  Search Suggestions
•  Faceting
•  Recommendations
•  Time series
•  Log search
•  Geo-spatial search
•  Analytics
•  Graph search
•  Document Store
3
Overview
•  How to scale any data infrastructure with Apache Solr
•  Build a high-performance, highly available data platform for
internal and external users alike
•  Walmart’s commitment to open source
4
About me
•  Team lead on the Data Foundation team at the largest retailer and the
largest private employer in the world
•  Prior to Walmart, worked at startups building recommendation and
analytics systems
•  Before that, spent 6 years building search applications, recommendation
systems and Hadoop-based analytics systems at eBay, the largest online
auction company
•  Manuscript reviewer for Manning Publications for 4 years; helped shape
the contents of “Hadoop in Practice” and “Big Data”
5
About Walmart
•  11,000+ Stores in 27 countries
•  11 eCommerce sites
•  250M customers weekly in stores and online
•  Millions of database transactions per day
•  Sales, Holidays and massive volume shifts
6
It starts up so simple
An idea implemented on the LAMP stack
7
Turns out to be a great idea!
Users seem to like the new product
8
Users REALLY like this…
Higher volume, increased use cases. Quick-fix scaling
alternatives add some headroom … and complexity
9
We need more Business Intelligence
Business is looking good, but the source-of-truth data store,
not so much …
10
Scale up (in a hurry) with hardware
Least risk. Diminishing returns. What next?
11
Design to scale out
•  Offload queries to Search Engines
•  Offload recurring reads to Cache
•  Offload analytics to OLAP datastores
•  Shard the database
… and of course do something to hide the complexity. It is
worth it.
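Two of the offloading steps above, caching recurring reads and sharding the database, can be sketched in a few lines. This is only an illustration under stated assumptions: plain dicts stand in for the database shards and the cache, and the names `shard_for` and `CachedShardedStore` are hypothetical, not taken from the talk.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (Python's hash() is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class CachedShardedStore:
    """Toy store: dicts stand in for database shards and for the cache (assumption)."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]
        self.cache = {}
        self.db_reads = 0  # counts reads that actually reached a shard

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value
        # Invalidate the cached copy so later reads do not return a stale value.
        self.cache.pop(key, None)

    def get(self, key: str):
        if key in self.cache:
            return self.cache[key]  # recurring read served without touching a shard
        self.db_reads += 1
        value = self.shards[shard_for(key, len(self.shards))].get(key)
        if value is not None:
            self.cache[key] = value
        return value
```

Repeated `get` calls for the same key reach a shard only once until the key is written again, which is the headroom the cache buys.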
12
The Inspiration
Integration tools for Lucene-based search engines are
abundant
13
The “not-so-evil” Twin to protect your Source of Truth DB
•  What if a copy of your source-of-truth data were available … just about
anywhere you want it?
•  How could you use a search engine to protect and augment your
database?
–  Redirect queries
•  Helps scale by reducing demand for
–  database indexing
–  database connections
–  scarce database resources like memory, storage
•  Not-so-evil Twin
–  Adding multiple near real-time search indexes adds complexity … and it
comes at a cost; but done right, the benefits far outweigh the costs
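The "redirect queries" idea can be sketched as a store that keeps writes and keyed lookups on the source-of-truth database while text queries go to a search copy. Everything here is an assumption for illustration: an in-memory SQLite table plays the database, a tiny inverted index (`SearchTwin`, a hypothetical name) stands in for Solr, and the twin is updated synchronously, whereas a real deployment would more likely update it asynchronously.

```python
import sqlite3
from collections import defaultdict


class SearchTwin:
    """In-memory inverted index standing in for a search engine such as Solr (assumption)."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> ids of documents containing it
        self.docs = {}

    def index(self, doc_id, text):
        # Drop old postings first so a re-indexed document no longer matches stale tokens.
        for token in self.docs.get(doc_id, "").lower().split():
            self.postings[token].discard(doc_id)
        self.docs[doc_id] = text
        for token in text.lower().split():
            self.postings[token].add(doc_id)

    def search(self, query):
        tokens = query.lower().split()
        if not tokens:
            return []
        # A document matches only if it contains every query token.
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return sorted(hits)


class ProductStore:
    """Writes and keyed reads hit the database; text queries are redirected to the twin."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT)")
        self.twin = SearchTwin()

    def save(self, product_id, title):
        with self.db:  # commits on success
            self.db.execute(
                "INSERT OR REPLACE INTO products (id, title) VALUES (?, ?)",
                (product_id, title),
            )
        self.twin.index(product_id, title)  # synchronous here only for simplicity

    def get(self, product_id):
        row = self.db.execute(
            "SELECT title FROM products WHERE id = ?", (product_id,)
        ).fetchone()
        return row[0] if row else None

    def search(self, query):
        return self.twin.search(query)  # never touches the database
```

Because `search` never opens a database cursor, the database needs no text index for this query and spends no connection on it, which matches the demand reduction listed above.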
14
Our Approach
•  Abstract the complexity of managing
–  source-of-truth database
–  cache coherence
–  search queries
–  message bus
•  Abstract connection pool management
•  Provide a scalable way to query across shards with full control of the Solr
schema
•  Analyze big data without affecting real-time systems, while
isolating individual data domains
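One way to picture this abstraction is a single facade that owns the database, the cache, the search copy, and the message bus, so callers never coordinate them directly. The sketch below is hypothetical throughout: `DataFacade` and its methods are illustrative names, dicts stand in for the database, cache, and Solr collection, and a `deque` stands in for the message bus. It is not the design presented in the talk.

```python
from collections import deque


class DataFacade:
    """Single entry point hiding the database, cache, search copy, and message bus."""

    def __init__(self):
        self.database = {}     # stand-in for the source-of-truth database
        self.cache = {}        # stand-in for a read cache
        self.search_copy = {}  # stand-in for a Solr collection
        self.bus = deque()     # stand-in for a message bus

    def write(self, key, record):
        self.database[key] = record
        self.bus.append(("upsert", key))  # derived copies catch up later

    def delete(self, key):
        self.database.pop(key, None)
        self.bus.append(("delete", key))

    def drain_bus(self):
        """Apply pending change events to the derived copies; return how many were processed."""
        processed = 0
        while self.bus:
            op, key = self.bus.popleft()
            self.cache.pop(key, None)  # keep the cache coherent with the database
            if op == "upsert" and key in self.database:
                self.search_copy[key] = self.database[key]
            else:
                self.search_copy.pop(key, None)
            processed += 1
        return processed

    def read(self, key):
        # Cached reads may lag a write until drain_bus() has run.
        if key not in self.cache and key in self.database:
            self.cache[key] = self.database[key]
        return self.cache.get(key)

    def find(self, predicate):
        """Answer a query from the search copy only, leaving the database untouched."""
        return sorted(k for k, rec in self.search_copy.items() if predicate(rec))
```

The gap between `write` and `drain_bus` is where the "near real-time" behaviour shows up: the database is current immediately, while the cache and search copy become current once the change event is consumed.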
15
From a situation like…
16
DB, Solr and Hadoop
17
Sharded DB with Solr
18
The Eco-system
Separation of concerns
19
The Result
Scatter-gather vs Powered by Apache Solr
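For readers unfamiliar with the scatter-gather side of this comparison, here is a minimal sketch of the pattern: the same query is sent to every database shard in parallel, and the sorted partial results are merged by the caller. Lists of dicts stand in for shards, and `query_shard` and `scatter_gather` are hypothetical names. With a consolidated search copy, the same question would instead be a single query against one index.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def query_shard(shard, min_price):
    """Run the filter on one shard and return its matches sorted by price."""
    matches = (row for row in shard if row["price"] >= min_price)
    return sorted(matches, key=lambda row: row["price"])


def scatter_gather(shards, min_price, limit):
    """Scatter the query to all shards in parallel, then gather and merge the results."""
    if not shards:
        return []
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: query_shard(shard, min_price), shards))
    # Each partial list is already sorted, so a k-way merge preserves global order.
    merged = heapq.merge(*partials, key=lambda row: row["price"])
    return [row["sku"] for _, row in zip(range(limit), merged)]
```

The caller pays for the slowest shard plus the merge on every request, which is the cost a single query against the search copy avoids.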
20
Lessons learned
A Search engine like Apache Solr is…
•  not limited to search-based business applications.
•  a first-class citizen in your persistence technology stack; it
complements the source-of-truth (SoT) database.
•  easy to adopt and has all of us as a community for support.
21
The Future
•  Symbiotic existence of Solr/Lucene with RDBMS, NoSQL and Big
Data systems
•  Walmart is committed to being part of the community building it
22
Questions? Reach us at:
•  You can reach me, Ayon Sinha, at:
–  asinha@walmartlabs.com
–  https://www.linkedin.com/in/ayonsinha
•  Jason Sardina, our Lead Persistence Architect
–  jsardina@walmartlabs.com
•  @WalmartLabs is always hiring the best
