[Open Source]
Search Evolution
Otis Gospodnetić @otisg
Today
The Early Days
Even Earlier Days
Foci
Timeline: 1974, 1995, now()
SEARCH
Otis Who?
SEARCH
Then & Now
1990s: WebGlimpse, Swish, Harvest, Ht://Dig, freeWAIS
2014: elasticsearch
Still New?
elasticsearch
Timeline: 2000, 2004, 2010
Dominance
[Open Source]
Search Evolution
Big Cake
Big Data
Beyond Text
Memory Footprint
Distributed Model
Language Support
Indexing Speed, NRT
Relevance Algorithms
Language Support: Stemming
Language Support: Lemmatization
Language Support: Morphology
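Stemming, the first of the three approaches above, reduces inflected words to a shared stem by rule, so a query for one form matches the others. Lemmatization instead maps each word to its dictionary form using vocabulary and morphology, which rules alone cannot do for irregular forms such as "went" (lemma "go"). A toy suffix-stripping stemmer in Python illustrates the rule-based idea; the suffix list and the `toy_stem` name are hypothetical and far cruder than the Porter or Snowball stemmers that Lucene ships.

```python
# Toy suffix-stripping stemmer, for illustration only (not Porter or Snowball).
# SUFFIXES is a hypothetical rule list; it is tried in order, longest first.
SUFFIXES = ("ational", "ization", "ness", "ing", "ies", "ed", "es", "s")


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


if __name__ == "__main__":
    for word in ["searching", "searches", "searched", "search", "went"]:
        print(word, "->", toy_stem(word))
```

The unchanged "went" shows the limit of suffix rules, which is where lemmatization and morphological analysis come in.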
Language Support
Lucene 2004: ~ 20 languages
Lucene 2014: ~ 40 languages
most are stemmers
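Relevance Models: VSM (Vector Space Model)
TF-IDF
For term i in document j:
$$w_{i,j} = \mathrm{tf}_{i,j} \times \log\left(\frac{N}{\mathrm{df}_i}\right)$$
$\mathrm{tf}_{i,j}$ = number of occurrences of term i in document j
$\mathrm{df}_i$ = number of documents containing term i
$N$ = total number of documents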
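To make the TF-IDF weight concrete, here is a minimal Python sketch that computes w(i,j) = tf(i,j) × log(N / df(i)) for a small corpus. It assumes documents are already tokenized into lists of terms and uses the natural logarithm (the slide does not fix the base); the `tf_idf` name and sample documents are hypothetical, and production engines typically dampen both factors rather than using raw counts.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return, per document j, a map term i -> w[i,j] = tf[i,j] * log(N / df[i])."""
    n = len(docs)  # N: total number of documents
    # df[i]: number of documents containing term i (each document counted once)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # tf[i,j]: occurrences of term i in document j
        weights.append({term: count * math.log(n / df[term]) for term, count in tf.items()})
    return weights


docs = [
    ["open", "source", "search"],
    ["search", "search", "engine"],
    ["open", "data"],
]
for j, w in enumerate(tf_idf(docs)):
    print(j, {term: round(weight, 3) for term, weight in w.items()})
```

A term that occurs in every document gets weight log(N/N) = 0, which is the point of the IDF factor: ubiquitous terms say little about relevance.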
Relevance Models: Pluggable
Lucene until 2011: 1 relevance model
Lucene 2014: 6 relevance models
got more?
Distributed Architecture
1 Master - N Slaves
good for scaling queries
not good for scaling data
Sharded index with replication
good for scaling queries
good for scaling data
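The difference between the two models comes down to where documents live. In a sharded index each document is routed to exactly one shard, so adding shards spreads the data; each shard also has replicas that can answer queries, so adding replicas spreads the query load. The Python sketch below shows both ideas under simple assumptions: `zlib.crc32` stands in for whatever routing hash a real engine uses, and the round-robin placement in the hypothetical `assign_copies` helper is only one of many possible policies.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Route a document to one shard by hashing its id (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def assign_copies(num_shards: int, replicas: int, nodes: list[str]) -> dict[int, list[str]]:
    """Place each shard's primary and `replicas` copies on distinct nodes, round-robin."""
    if replicas + 1 > len(nodes):
        raise ValueError("need at least replicas + 1 nodes so copies land on different nodes")
    return {
        shard: [nodes[(shard + k) % len(nodes)] for k in range(replicas + 1)]
        for shard in range(num_shards)
    }


layout = assign_copies(num_shards=3, replicas=1, nodes=["node-a", "node-b", "node-c"])
print(layout)  # {0: ['node-a', 'node-b'], 1: ['node-b', 'node-c'], 2: ['node-c', 'node-a']}
for doc_id in ["id123", "id234"]:
    shard = shard_for(doc_id, 3)
    print(doc_id, "-> shard", shard, "on", layout[shard])
```

By contrast, in the master/slave model every slave holds a full copy of the index, so it adds query capacity but never reduces how much data a single node must hold.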
Indexing Speed & NRT Search
Memory Footprint
Beyond Text
Geospatial Search
Classifier
Recommendation Engine
Key Value Store
NoSQL DB
Analytical DB
Geospatial Search
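Geospatial search, at its simplest, means filtering and sorting documents by distance from a point. A minimal Python sketch using the haversine great-circle formula follows; the `within` helper and the sample points are hypothetical, the 6371 km radius is the usual mean-Earth approximation, and a real engine would consult a spatial index rather than scan every point as this does.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def within(points: dict[str, tuple[float, float]], lat: float, lon: float,
           radius_km: float) -> list[tuple[str, float]]:
    """Geo-distance filter plus sort: ids within radius_km of (lat, lon), nearest first."""
    hits = [(pid, haversine_km(lat, lon, plat, plon)) for pid, (plat, plon) in points.items()]
    return sorted((h for h in hits if h[1] <= radius_km), key=lambda h: h[1])


points = {"a": (0.0, 0.0), "b": (0.0, 1.0), "c": (0.0, 5.0)}
print(within(points, 0.0, 0.0, radius_km=200))
```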
Classifier
Recommender
Content Similarity
Collaborative Filtering
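Content similarity can be illustrated with cosine similarity between term-frequency vectors: items that share many terms in similar proportions score close to 1. This Python sketch works under that assumption and is not how any particular engine computes "more like this"; `cosine`, `most_similar`, and the catalog are hypothetical.

```python
import math
from collections import Counter


def cosine(a: list[str], b: list[str]) -> float:
    """Cosine similarity of two token lists' term-frequency vectors, in [0, 1]."""
    va, vb = Counter(a), Counter(b)
    dot = sum(count * vb[term] for term, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def most_similar(target: list[str], catalog: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Rank catalog items by content similarity to `target`, most similar first."""
    scores = [(item_id, cosine(target, tokens)) for item_id, tokens in catalog.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


catalog = {
    "p1": ["apple", "phone"],
    "p2": ["sony", "tv"],
    "p3": ["apple", "laptop"],
}
print(most_similar(["apple", "phone", "case"], catalog))
```

The same function can serve item-to-item collaborative filtering if each item's token list is replaced by the ids of users who interacted with it, so items liked by overlapping audiences score as similar.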
Key Value Store
id123 ⇒ manu:Apple desc:foo bar price:$111
id234 ⇒ manu:Sony desc:baz bam price:$222
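The two records above can be read two ways: as values looked up by key, or as documents searchable by field. A short Python sketch of both, assuming lowercasing and whitespace tokenization (the `build_index` helper is hypothetical): the record dictionary is the key-value store, and the inverted index built from it is what makes the same data searchable.

```python
from collections import defaultdict

# The two records from the slide, stored by id: this alone is a key-value store.
docs = {
    "id123": {"manu": "Apple", "desc": "foo bar", "price": "$111"},
    "id234": {"manu": "Sony", "desc": "baz bam", "price": "$222"},
}


def build_index(store: dict[str, dict[str, str]]) -> dict[tuple[str, str], set[str]]:
    """Inverted index: (field, lowercased whitespace token) -> ids of matching records."""
    index: dict[tuple[str, str], set[str]] = defaultdict(set)
    for doc_id, fields in store.items():
        for field, value in fields.items():
            for token in value.lower().split():
                index[(field, token)].add(doc_id)
    return index


index = build_index(docs)
print(docs["id123"])                      # key-value access: whole record by id
print(index.get(("desc", "baz"), set()))  # search access: ids by field content
```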
NoSQL DB
Distributed
Replicated
Horizontally Scalable
Fast Retrieval
Searchable?
Slicing & Dicing
Analytical Queries
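Slicing and dicing typically means filtering a result set and then counting or aggregating along one field, the kind of output a terms facet in Solr or a terms aggregation in Elasticsearch returns. The Python sketch below does this in memory over hypothetical product records; `facet` and `average_by` are illustrative names, not engine APIs.

```python
from collections import Counter, defaultdict

# Hypothetical product records for illustration.
products = [
    {"manu": "Apple", "category": "phone", "price": 111},
    {"manu": "Apple", "category": "laptop", "price": 999},
    {"manu": "Sony", "category": "tv", "price": 222},
    {"manu": "Sony", "category": "phone", "price": 333},
    {"manu": "Sony", "category": "tv", "price": 444},
]


def facet(rows: list[dict], field: str, **filters) -> Counter:
    """Count values of `field` among rows matching every equality filter (a terms facet)."""
    matching = [r for r in rows if all(r.get(k) == v for k, v in filters.items())]
    return Counter(r[field] for r in matching)


def average_by(rows: list[dict], group: str, value: str) -> dict[str, float]:
    """Average of `value` per distinct `group` value (a simple metric aggregation)."""
    totals: dict[str, float] = defaultdict(float)
    counts: Counter = Counter()
    for r in rows:
        totals[r[group]] += r[value]
        counts[r[group]] += 1
    return {key: totals[key] / counts[key] for key in counts}


print(facet(products, "manu"))                   # slice: products per manufacturer
print(facet(products, "category", manu="Sony"))  # dice: categories within one manufacturer
print(average_by(products, "manu", "price"))
```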
Gobble Gobble
If software is eating the world,
then [open source] search is gobbling it.
And has been for years.
FIN. Questions
otis@sematext.com