Text Data Analysis Panel: South Big Data Hub
Trey Grainger
SVP of Engineering, Lucidworks
Trey Grainger
SVP of Engineering
• Previously Director of Engineering @ CareerBuilder
• MBA, Management of Technology – Georgia Tech
• BA, Computer Science, Business, & Philosophy – Furman University
• Information Retrieval & Web Search - Stanford University
Other fun projects:
• Co-author of Solr in Action, plus numerous research papers
• Frequent conference speaker
• Founder of Celiaccess.com, the gluten-free search engine
• Lucene/Solr contributor
About Me
what do you do?
South Big Data Hub: Text Data Analysis Panel
Search-Driven Everything
• Customer Service
• Customer Insights
• Fraud Surveillance
• Research Portal
• Online Retail
• Digital Content
Lucidworks enables Search-Driven Everything
Data Acquisition, Indexing & Streaming, Smart Access API, Recommendations & Alerts, Analytics & Insights, Extreme Relevancy
Customer Service, Research Portal, Digital Content, Customer Insights, Fraud Surveillance, Online Retail
• Access all your data in a number of ways from one place.
• Secure storage and processing from Solr and Spark.
• Acquire data from any source with pre-built connectors and adapters.
• Machine learning and advanced analytics turn all of your apps into intelligent data-driven applications.
how do you do it?
Solr is the popular, blazing-fast,
open source enterprise search
platform built on Apache Lucene™.
Key Solr Features:
● Multilingual Keyword search
● Relevancy Ranking of results
● Faceting & Analytics (nested / relational)
● Highlighting
● Spelling Correction
● Autocomplete/Type-ahead Prediction
● Sorting, Grouping, Deduplication
● Distributed, Fault-tolerant, Scalable
● Geospatial search
● Complex Function queries
● Recommendations (More Like This)
● Graph Queries and Traversals
● SQL Query Support
● Streaming Aggregations
● Batch and Streaming processing
● Highly Configurable / Plugins
● Learning to Rank
● Building machine-learning models
● … many more
*source: Solr in Action, chapter 2
The standard for enterprise search.
90% of Fortune 500 uses Solr.
Reference Architecture (Lucidworks Fusion)
Building an Intent Engine
[Diagram labels: Bay Area Search; Search Box; Type-ahead Prediction; Intent Engine; Semantic Query Parsing; Spelling Correction; Entity / Entity Type Resolution; Query Augmentation; Knowledge Graph; Contextual Disambiguation; Relevancy Engine (“re-expressing intent”); Query Re-writing; Machine-learned Ranking; User Feedback (Clarifying Intent); Search Results]
Additional References:
what’s next?
• Basic Keyword Search (inverted index, TF-IDF, BM25, query formulation, etc.)
• Taxonomies / Entity Extraction (entity recognition, ontologies, synonyms, etc.)
• Query Intent (query classification, semantic query parsing, concept expansion, rules, clustering, classification)
• Relevancy Tuning (signals, A/B testing/genetic algorithms, Learning to Rank, Neural Networks)
• Self-learning
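To make the Basic Keyword Search stage above concrete, here is a minimal sketch of an inverted index scored with BM25. It is an illustration in plain Python, not Lucene/Solr code: the function names, the whitespace tokenizer, and the sample documents are assumptions, and `k1=1.2`, `b=0.75` are simply commonly used defaults.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, lengths


def bm25_search(query, index, lengths, k1=1.2, b=0.75):
    """Score documents against the query terms with the BM25 formula."""
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        # IDF variant with the +1 inside the log, which stays non-negative.
        idf = math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for doc_id, tf in postings.items():
            # Length normalization: long documents need more matches to score well.
            norm = k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up documents for illustration only.
docs = {
    "d1": "java developer with hibernate experience",
    "d2": "software engineer java scala",
    "d3": "registered nurse night shift",
}
index, lengths = build_index(docs)
results = bm25_search("java hibernate", index, lengths)
```

Here `results` ranks `d1` ahead of `d2` because `d1` matches both query terms, and `d3` is absent because it matches neither. The later stages in the list (entity extraction, query intent, relevancy tuning) build on top of this kind of baseline ranking.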
The Three C’s
Content: Keywords and other features in your documents
Collaboration: How others have chosen to interact with your system
Context: Available information about your users and their intent
Reflected Intelligence
“Leveraging previous data and interactions to improve how new data and interactions should be interpreted”
Feedback Loops
User searches → User sees results → User takes an action → Users’ actions inform system improvements
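The feedback loop above can be sketched in a few lines: logged user actions are aggregated into per-query document boosts, and those boosts re-order later results for the same query. Everything here is hypothetical, including the action weights, the signal tuple layout, and the document ids; a production system would also need to account for time decay and position bias.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each action type signals relevance.
ACTION_WEIGHTS = {"click": 1.0, "add_to_cart": 3.0, "purchase": 5.0}


def aggregate_signals(signals):
    """Sum weighted user actions into per-query document boosts."""
    boosts = defaultdict(lambda: defaultdict(float))
    for query, doc_id, action in signals:
        normalized = query.lower().strip()
        boosts[normalized][doc_id] += ACTION_WEIGHTS.get(action, 0.0)
    return boosts


def rerank(query, results, boosts):
    """Re-order results so documents with stronger past signals rise."""
    doc_boosts = boosts.get(query.lower().strip(), {})
    # sorted() is stable, so documents with equal boosts keep their original order.
    return sorted(results, key=lambda doc_id: doc_boosts.get(doc_id, 0.0), reverse=True)


# Made-up signal log: (query, document id, action).
signals = [
    ("ipad", "doc_case", "click"),
    ("ipad", "doc_tablet", "click"),
    ("ipad", "doc_tablet", "purchase"),
    ("iPad ", "doc_tablet", "click"),
]
boosts = aggregate_signals(signals)
reranked = rerank("ipad", ["doc_cable", "doc_case", "doc_tablet"], boosts)
```

After aggregation, `doc_tablet` carries the strongest signal for the query and moves to the top, while `doc_cable`, which nobody interacted with, drops to the bottom.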
Examples of Reflected Intelligence
● Recommendation Algorithms
● Building user profiles from past searches, clicks, and other actions
● Identifying correlations between keywords/phrases
● Building out automatically-generated ontologies from content and queries
● Determining relevancy judgements (precision, recall, nDCG, etc.) from click logs
● Learning to Rank - using relevancy judgements and machine learning to train a relevance model
● Discovering misspellings, synonyms, acronyms, and related keywords
● Disambiguation of keyword phrases with multiple meanings
● Learning what’s important in your content
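One of the examples above, deriving relevancy judgements from click logs and scoring a ranking with nDCG, might look like the following sketch. It assumes a hypothetical log of `(query, doc_id, clicked)` impressions and uses raw click-through rate as the graded judgement, which is a simplification because it ignores position bias.

```python
import math
from collections import Counter


def judgments_from_clicks(click_log, min_impressions=2):
    """Derive a graded relevance (click-through rate) per (query, doc) pair."""
    impressions, clicks = Counter(), Counter()
    for query, doc_id, clicked in click_log:
        impressions[(query, doc_id)] += 1
        clicks[(query, doc_id)] += int(clicked)
    # Skip pairs with too few impressions to trust the rate.
    return {
        pair: clicks[pair] / count
        for pair, count in impressions.items()
        if count >= min_impressions
    }


def dcg(gains):
    """Discounted cumulative gain: later ranks contribute less."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))


def ndcg(query, ranking, judgments, k=10):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    gains = [judgments.get((query, doc_id), 0.0) for doc_id in ranking[:k]]
    ideal = sorted((g for (q, _), g in judgments.items() if q == query), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Made-up impression log: (query, document id, was it clicked?).
click_log = [
    ("solr", "a", True), ("solr", "a", True),
    ("solr", "b", False), ("solr", "b", True),
    ("solr", "c", False), ("solr", "c", False),
]
judgments = judgments_from_clicks(click_log)
best = ndcg("solr", ["a", "b", "c"], judgments)
worst = ndcg("solr", ["c", "b", "a"], judgments)
```

The same judgements could serve as training labels for a Learning to Rank model, which is the next item in the list.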
Key Technologies
• Keyword Search
- Lucene/Solr
• Taxonomies / Entity Extraction
- Solr Text Tagger
- Word2Vec / Dice Conceptual Search
- SolrRDF
• Query Intent
- Probabilistic Query Parser (SOLR-9418)
- Semantic Knowledge Graph (SOLR-9480)
• Relevancy Tuning
- Solr Learning to Rank Plugin (SOLR-8542)
• General Needs: a solid log processing framework
(Apache Spark, Lucidworks Fusion, or Solr Daemon Expression)
Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith. “The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain”. DSAA 2016.
Knowledge Graph
Semantic Knowledge Graph Traversal
[Figure: a traversal starting from the materialized node “software engineer*”. Column labels: Starting Node; Skill Nodes (has_related_skill); Skill Nodes (has_related_skill); Job Title Nodes (has_related_job_title). Skill nodes shown: Java, C#, .NET, Hibernate, Scala, VB.NET. Job title nodes shown: .NET Developer, Java Developer, Software Engineer, Data Scientist. Edges carry weights between 0.34 and 0.93.]
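The traversal in the figure can be approximated with a toy example: a node is materialized as the set of documents matching a field value, and an edge is scored by how over-represented a candidate value is within that set compared with the whole index. This is a simplified sketch under stated assumptions. The documents are made up, the helper names are hypothetical, and the plain z-score used here is an illustrative choice that is not normalized to the 0 to 1 range shown in the figure; it should not be read as the exact scoring of the cited paper or of SOLR-9480.

```python
import math

# Made-up job postings; in a search engine these would be indexed documents.
DOCS = [
    {"title": "software engineer", "skills": {"java", "hibernate", "scala"}},
    {"title": "software engineer", "skills": {"java", "c#", ".net"}},
    {"title": "java developer", "skills": {"java", "hibernate"}},
    {"title": ".net developer", "skills": {"c#", ".net", "vb.net"}},
    {"title": "data scientist", "skills": {"python", "scala"}},
    {"title": "registered nurse", "skills": {"cpr"}},
]


def values_of(doc, field):
    """Return a document's field as a set of values."""
    value = doc[field]
    return value if isinstance(value, set) else {value}


def postings(docs, field, value):
    """Materialize a node: ids of all documents holding this field value."""
    return {i for i, doc in enumerate(docs) if value in values_of(doc, field)}


def relatedness(foreground, candidate, total_docs):
    """z-score: is the candidate over-represented in the foreground set?"""
    n = len(foreground)
    p = len(candidate) / total_docs  # background probability of the candidate
    if n == 0 or p <= 0.0 or p >= 1.0:
        return 0.0
    observed = len(foreground & candidate)
    return (observed - n * p) / math.sqrt(n * p * (1 - p))


def traverse(docs, start_field, start_value, target_field):
    """Score every target-field value that co-occurs with the starting node."""
    foreground = postings(docs, start_field, start_value)
    candidates = set()
    for doc_id in foreground:
        candidates |= values_of(docs[doc_id], target_field)
    scored = [
        (value, relatedness(foreground, postings(docs, target_field, value), len(docs)))
        for value in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


related_skills = traverse(DOCS, "title", "software engineer", "skills")
```

In this toy corpus `java` scores highest for the starting node because it appears in every “software engineer” posting but in only half of all postings. Calling `traverse` again from a skill toward the `title` field would give the second hop of the figure.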
Knowledge Graph
[Diagram labels: Traditional Keyword Search; Recommendations; Semantic Search; User Intent; Personalized Search; Augmented Search; Domain-aware Matching]
Contact Info
Trey Grainger
trey.grainger@lucidworks.com
@treygrainger
http://solrinaction.com
Other presentations:
http://www.treygrainger.com
