Name Matching
with Elasticsearch
June 25, 2015
Graham Morehead
gmorehead@basistech.com
April 15, 2013, 2:49 PM
Simple fuzzy Name Matching in Elasticsearch - Graham Morehead
Real life example
David K. Murgatroyd
VP of Engineering
Boarding Pass
Best Practice using Elasticsearch?
● NameMapper
(http://stackoverflow.com/questions/20632042/elasticsearch-searching-for-human-names)
"mappings": { ...
  "type": "multi_field",
  "fields": {
    "pty_surename": { "type": "string", "analyzer": "simple" },
    "metaphone": { "type": "string", "analyzer": "metaphone" },
    "porter": { "type": "string", "analyzer": "porter" } …
● rescore_query
“Jesus Alfonso Lopez Diaz”
vs.
“LobezDias, Chuy”
RNI
Plug-in Implementation
● Indexing
User Doc: { name: "Robert Smith", dob: "1987/02/13" }
Index: { name: "Robert Smith", name.Key1: …, name.Key2: …, name.Key3: …, dob: "1987/02/13" }
● Main Query (selects a subset of the index)
User Query: match : { name: "Bob Smitty" }
bool: name.Key1:... name.Key2:... name.Key3:...
● Rescore Query
Rescore: name_score : { field : "name", name : "Bob Smitty" }
name:"Robert Smith" dob:2/13/1987 score : .79
Demo
Elastic + RNI
How could you use such a Field?
● The plugin contains a custom mapper that does
all the work behind the scenes
PUT /ofac/ofac/_mapping
{
  "ofac" : {
    "properties" : {
      "name" : { "type" : "rni_name" },
      "aka" : { "type" : "rni_name" }
    }
  }
}
What happens at index time?
● NameMapper indexes keys for different
phenomena in separate (sub) fields
@Override
public void parse(ParseContext context) throws IOException {
    // nameString holds the raw name value being indexed (its extraction is not shown here)
    Name name = NameBuilder.data(nameString).build();
    // Generate keys for the name
    Collection<FieldSpec> fields = helper.deriveFieldsForName(name);
    // Parse each key with the appropriate Mapper
    for (FieldSpec field : fields) {
        Mapper mapper = keyMappers.get(field.getField().fieldName());
        context = context.createExternalValueContext(field.getStringValue());
        mapper.parse(context);
    }
}
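A minimal sketch of the key idea, not the plugin's code: the hypothetical `deriveKeys` below stands in for `helper.deriveFieldsForName` and writes one key per token per phenomenon into its own sub-field. The sub-field names (`name.exact`, `name.phonetic`, `name.initial`) and the choice of Soundex as the phonetic key are assumptions made for illustration; the slides do not say which keys RNI generates.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: a stand-in for the plugin's key generation, not RNI's algorithm.
final class NameKeySketch {

    // American Soundex code for one token, e.g. "Robert" -> "R163".
    // Characters outside A-Z are dropped, so this only suits Latin-script names.
    static String soundex(String token) {
        String s = token.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char prev = code(s.charAt(0));
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (c == 'H' || c == 'W') {
                continue; // H and W do not separate letters with the same code
            }
            char k = code(c);
            if (k != '0' && k != prev) {
                out.append(k);
            }
            prev = k; // vowels reset prev to '0', so a repeated code is kept
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    private static char code(char c) {
        switch (c) {
            case 'B': case 'F': case 'P': case 'V': return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z': return '2';
            case 'D': case 'T': return '3';
            case 'L': return '4';
            case 'M': case 'N': return '5';
            case 'R': return '6';
            default: return '0'; // vowels, Y, H, W
        }
    }

    // Hypothetical counterpart of helper.deriveFieldsForName: sub-field name -> keys.
    static Map<String, Set<String>> deriveKeys(String field, String name) {
        Map<String, Set<String>> fields = new LinkedHashMap<>();
        for (String token : name.trim().split("[\\s,]+")) {
            String phonetic = soundex(token);
            if (phonetic.isEmpty()) {
                continue; // skip tokens without any A-Z letter
            }
            String lower = token.toLowerCase(Locale.ROOT);
            fields.computeIfAbsent(field + ".exact", k -> new TreeSet<>()).add(lower);
            fields.computeIfAbsent(field + ".phonetic", k -> new TreeSet<>()).add(phonetic);
            fields.computeIfAbsent(field + ".initial", k -> new TreeSet<>()).add(lower.substring(0, 1));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(deriveKeys("name", "Robert Smith"));
    }
}
```

With these toy keys, "Lopez" and "Lobez" share a phonetic key, as do "Diaz" and "Dias". The nickname "Chuy" for "Jesus" does not, which is the kind of phenomenon that would need a separate key.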
What happens at query time?
● Step #1: NameMapper generates analogous
keys for a custom Lucene query that finds
good candidates for re-scoring
@Override
public Query termQuery(Object value, @Nullable QueryParseContext context) {
    // Parse name string
    Name name = NameBuilder.data(value.toString()).build();
    QuerySpec spec = helper.buildQuerySpec(new NameIndexQuery(name));
    // Build Lucene query
    Query query = spec.accept(new ESQueryVisitor(names.indexName() + "."));
    return query;
}
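To make the candidate query concrete, here is a hedged sketch that renders keys as an Elasticsearch `bool` query with one `term` clause per key in `should`. The class and method names are hypothetical, the sub-field names follow the `name.Key1` placeholders from the diagram, and the key values are invented; the real plugin builds a Lucene `Query` directly rather than JSON.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: renders keys as query DSL text; the plugin itself emits Lucene queries.
final class CandidateQuerySketch {

    // Minimal JSON string quoting (backslash and double quote only).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // One term clause per (sub-field, key) pair; a document matching any clause is a candidate.
    static String boolShould(Map<String, List<String>> keysByField) {
        List<String> clauses = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : keysByField.entrySet()) {
            for (String key : e.getValue()) {
                clauses.add("{\"term\":{" + quote(e.getKey()) + ":" + quote(key) + "}}");
            }
        }
        return "{\"bool\":{\"should\":[" + String.join(",", clauses)
                + "],\"minimum_should_match\":1}}";
    }

    public static void main(String[] args) {
        Map<String, List<String>> keys = new LinkedHashMap<>();
        keys.put("name.Key1", Arrays.asList("bob", "smitty")); // invented key values
        keys.put("name.Key2", Arrays.asList("B100", "S530"));  // invented key values
        System.out.println(boolShould(keys));
    }
}
```

Because every clause is optional and only one has to match, the query favors recall: it only needs to surface plausible candidates cheaply, leaving precision to the rescoring step.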
What else happens at query time?
● Step #2: A rescore query scores the names in the
best candidate documents and reorders them accordingly
○ Tuned for high precision name matching
○ Computationally expensive
"rescore" : {
  "query" : {
    "rescore_query" : {
      "function_score" : {
        "name_score" : {
          "field" : "name",
          "query_name" : "LobEzDiaS, Chuy"
        }
        ...
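The effect of rescoring can be imitated in memory. The sketch below is an assumption-laden illustration, not Elasticsearch code: only the top `windowSize` hits of the first-pass query are passed to the expensive scorer, the two scores are combined as a weighted sum (mirroring Elasticsearch's `window_size`, `query_weight` and `rescore_query_weight` rescore parameters), and hits outside the window keep their original order. The `Hit` class and `rescore` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Illustrative only: an in-memory imitation of a rescore pass over the top hits.
final class RescoreSketch {

    static final class Hit {
        final String name;
        final double score;

        Hit(String name, double score) {
            this.name = name;
            this.score = score;
        }
    }

    // hits must already be sorted by first-pass score, best first.
    static List<Hit> rescore(List<Hit> hits, int windowSize, double queryWeight,
                             double rescoreWeight, ToDoubleFunction<String> expensive) {
        int n = Math.min(windowSize, hits.size());
        List<Hit> top = new ArrayList<>();
        for (Hit h : hits.subList(0, n)) {
            // The expensive scorer runs only inside the window.
            double combined = queryWeight * h.score
                    + rescoreWeight * expensive.applyAsDouble(h.name);
            top.add(new Hit(h.name, combined));
        }
        top.sort((a, b) -> Double.compare(b.score, a.score));
        List<Hit> out = new ArrayList<>(top);
        out.addAll(hits.subList(n, hits.size())); // untouched tail
        return out;
    }

    public static void main(String[] args) {
        List<Hit> firstPass = Arrays.asList(
                new Hit("Rob Smythe", 2.0),
                new Hit("Robert Smith", 1.5),
                new Hit("Bobby Jones", 1.0));
        // Invented similarity values standing in for the expensive name scorer.
        ToDoubleFunction<String> nameScore =
                n -> n.equals("Robert Smith") ? 0.79 : 0.40;
        for (Hit h : rescore(firstPass, 2, 1.0, 10.0, nameScore)) {
            System.out.println(h.name + " " + h.score);
        }
    }
}
```

Keeping the window small is what makes a computationally expensive scorer affordable: its cost scales with the window size rather than with the number of documents the main query matched.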
What does that function do?
● The 'name_score' function matches the
query name against the indexed name in
every candidate document and returns the
similarity score
@Override
public double score(int docId, float subQueryScore) {
    // Create a scorer for the query name
    CachedScorer cs = createCachedScorer(queryName);
    // Retrieve name data from doc values
    nameByteData.setDocument(docId);
    // i is the index of the name value within this document
    Name indexName = bytesToName(nameByteData.valueAt(i).bytes);
    // Score the query against the indexed name in this document
    return cs.score(indexName);
}
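As a rough illustration of what a pairwise name scorer returns, the sketch below swaps in a deliberately simple technique: each query token is matched to its most similar token in the indexed name by normalized Levenshtein distance, and the per-token similarities are averaged. This is not RNI's scoring model, which the slides do not describe; all class and method names here are hypothetical.

```java
import java.util.Locale;

// Illustrative only: a toy similarity in [0, 1], not the plugin's scorer.
final class NameScoreSketch {

    // Classic edit distance (insert, delete, substitute), two-row dynamic programming.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(sub, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // 1.0 for identical tokens, 0.0 when every character differs.
    static double tokenSimilarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    static String[] tokens(String name) {
        return name.toLowerCase(Locale.ROOT).trim().split("[\\s,]+");
    }

    // Average, over query tokens, of the best similarity to any indexed token.
    static double score(String queryName, String indexedName) {
        String[] query = tokens(queryName);
        String[] indexed = tokens(indexedName);
        double total = 0.0;
        for (String q : query) {
            double best = 0.0;
            for (String d : indexed) {
                best = Math.max(best, tokenSimilarity(q, d));
            }
            total += best;
        }
        return total / query.length;
    }

    public static void main(String[] args) {
        System.out.println(score("Bob Smitty", "Robert Smith"));
        System.out.println(score("Bob Smitty", "Alice Jones"));
    }
}
```

Matching tokens independently of their position lets "Smith, Robert" and "Robert Smith" score identically, but a purely character-based measure still has no way to link a nickname such as "Chuy" to "Jesus".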
