Fuzzy Entity Matching 
Ken Krugler | President, Scale Unlimited
whoami 
•Ken Krugler, Scale Unlimited - Nevada City, CA 
•Consulting on big data (workflows, search, etc.)
•Training for Hadoop, Cascading, Solr & Cassandra
The Problem
Should I Trust You? 
•When opening a bank account... 
•...what is the applicant's risk? 
•Key is matching person... 
•...to other account info
Matching people 
•I have some information you've provided 
•I need to match against ALL bank data 
•But banks won't exchange their customer info 
•So what can we do?
Early Warning Services 
•Owned by the top 5 US banks 
•Gets data from 800+ financial institutions 
•So they have details on most US bank accounts
Fuzzy Matching
What's a fuzzy match? 
•Match everything that's equivalent 
≅
•Match nothing that's different 
≇
Why is it hard? 
•Lots of gray areas in fuzzy matching 
≟ 
•Can't use exact key join 
•So no easy lookup using C* row key 
•Often computationally intensive
Matching People 
•I've got information on lots of people 
•I'm being asked about a specific person 
•How to quickly find all good matches? 
•Not doing batch matching
What's a Good Match? 
•Comparing field values between records 
•Are these two people the same? 
Name:    Bob Bogus | Robert Bogus
Address: 220 3rd Ave | 220 3rd Avenue
City:    Seattle | Seattle
State:   WA | WA
ZIP:     98104-2608 | 98104
What about now? 
•Normalization becomes critical 
•How to focus on the important features? 
Name:    Bob Bogus | Robert H. Bogus
Address: Apt 102, 220 3rd Ave | 3220 3rd Avenue South
City:    Seattle | Seattle
State:   Washington | WA
ZIP:     98104
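A minimal sketch of the kind of normalization these records call for, assuming small hypothetical lookup tables (`STREET_SUFFIXES`, `STATES`) and helper names that are not from the talk. A real system would need much larger tables and more rules.

```python
import re

# Hypothetical lookup tables for illustration only.
STREET_SUFFIXES = {"ave": "avenue", "st": "street", "rd": "road", "blvd": "boulevard"}
STATES = {"washington": "WA", "california": "CA"}


def normalize_address(address):
    """Lowercase, replace punctuation with spaces, expand common street suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(STREET_SUFFIXES.get(tok, tok) for tok in tokens)


def normalize_state(state):
    """Map full state names to two-letter codes; pass existing codes through."""
    cleaned = state.strip()
    return STATES.get(cleaned.lower(), cleaned.upper())


def normalize_zip(zip_code):
    """Keep only the five-digit ZIP, dropping any ZIP+4 extension."""
    match = re.match(r"\s*(\d{5})", zip_code)
    return match.group(1) if match else ""
```

With these helpers, "220 3rd Ave" and "220 3rd Avenue" normalize to the same string, as do "Washington" and "WA". Normalization does not resolve real differences such as "220" versus "3220"; that is left to the similarity calculation.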
How do you calc similarity? 
•Calculate degree of similarity for each field (0 -> 1.0) 
•Give each field a weight (these sum to 1.0) 
•Score is sum(fieldN sim * fieldN weight) 
•So score is 0 (nothing in common) to 1.0 (exact dup)
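The weighted-sum score can be sketched as follows. The field weights are hypothetical, and `difflib.SequenceMatcher` is only a stand-in for whatever per-field comparison a real matcher would use.

```python
from difflib import SequenceMatcher

# Hypothetical field weights; they must sum to 1.0.
WEIGHTS = {"name": 0.4, "address": 0.3, "city": 0.1, "state": 0.1, "zip": 0.1}


def field_similarity(a, b):
    """Return a similarity in [0, 1]; a stand-in for a per-field comparator."""
    if not a and not b:
        return 0.0  # two missing values are not treated as evidence of a match
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a, rec_b, weights=WEIGHTS):
    """Weighted sum of per-field similarities, from 0.0 to 1.0."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1.0"
    return sum(
        weight * field_similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, weight in weights.items()
    )
```

Because the weights sum to 1.0 and each field similarity is at most 1.0, two identical, fully populated records score 1.0 and records with nothing in common score 0.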
Does that scale? 
•For a given person being matched... 
•You need to compare to every other person 
•Works for a few thousand people 
•Doesn't scale for 100s of millions of people
Search to the Rescue
Search is (fast) similarity 
•Find N most similar docs to this doc (my query) 
•Each doc has multi-dimensional feature vector 
•Each feature (dimension) is a unique word 
•Feature weight is TF * IDF
Cosine Similarity 
•Each document has a term vector 
•E.g. three unique words x, y, z 
•Weight is TF*IDF of each word 
•Calc cosine of angle between 2 vectors 
•That is the similarity score
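A minimal sketch of TF*IDF weighting and cosine similarity over sparse term vectors. The smoothed IDF formula here is an assumption chosen for simplicity; Lucene's actual scoring differs in its details.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build one sparse TF*IDF vector (term -> weight) per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append({
            # Smoothed IDF (an assumption) keeps weights positive for common terms.
            term: count * (1.0 + math.log(n_docs / doc_freq[term]))
            for term, count in term_freq.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Documents that share no terms score 0, and a document compared with itself scores 1. Rare terms get a higher IDF, so sharing them raises the score more than sharing common ones.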
Cosine sim ≢ match sim 
•Doesn't have same level of sophistication 
•So throw a bigger net to find candidates 
•e.g. get top N*X, assuming at most X matches 
•Then do match similarity calc on this (small) set
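The "bigger net" idea can be sketched as a small function that over-fetches from search and then rescores. The `search` and `match_score` callables, the parameter names, and the 0.8 threshold are all hypothetical; they stand in for the Solr query and the match similarity calculation.

```python
from typing import Callable, Iterable


def find_matches(
    query_record: dict,
    search: Callable[[dict, int], Iterable[dict]],
    match_score: Callable[[dict, dict], float],
    max_matches: int = 5,    # X: assumed upper bound on true matches
    oversample: int = 10,    # N: how much wider to cast the search net
    threshold: float = 0.8,  # hypothetical cutoff for a "good" match
) -> list:
    """Fetch N*X search candidates, then rescore them with the match similarity."""
    candidates = search(query_record, oversample * max_matches)
    scored = [(match_score(query_record, cand), cand) for cand in candidates]
    good = [pair for pair in scored if pair[0] >= threshold]
    good.sort(key=lambda pair: pair[0], reverse=True)
    return good[:max_matches]
```

The expensive match calculation only runs on the N*X candidates returned by search, not on the full data set.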
So two-step process 
[Diagram: a query is run against the Solr index, then the returned records get match scores]
Query: name="Bob Bogus"^3 and ssn="222447777"^10 and dob="19600723"^5
Solr Index
Name          SSN        DOB
Bob Bogus     222447777  19610603
Robert Bogus  193618919  19600723
Bob Smith     479385821  19600723
Sam Stealthy  222447777  19930523
Name          SSN        DOB
Bob Bogus     222447777  19600723
...           ...        ...
Match: 0.90, 0.50, 0.10, 0.85, ...
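A sketch of building a boosted query string like the one on this slide. The slide writes `=` and `and`; this sketch instead emits the standard Lucene/Solr `field:"value"^boost` syntax. Joining clauses with `OR` by default is an assumption, made so that records matching only some fields are still returned as candidates. The function name is hypothetical.

```python
def build_boosted_query(fields, boosts, operator="OR"):
    """Build a Lucene/Solr query of boosted phrase clauses, one per field."""
    clauses = []
    for field, value in fields.items():
        # Escape backslashes and double quotes inside the quoted phrase.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        clause = f'{field}:"{escaped}"'
        if field in boosts:
            clause += f"^{boosts[field]}"
        clauses.append(clause)
    return f" {operator} ".join(clauses)


query = build_boosted_query(
    {"name": "Bob Bogus", "ssn": "222447777", "dob": "19600723"},
    {"name": 3, "ssn": 10, "dob": 5},
)
```

Pass `operator="AND"` to require every clause to match, which narrows the candidate set but risks missing fuzzy matches.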
How do you pick N? 
•Can be small, if match sim ≈ search sim 
•If N is too big, it's inefficient 
•If N is too small, you miss matches 
•Tune search to mimic match sim 
•Right tradeoff depends on use case
What is Solr? 
•Enterprise search system, built on top of Lucene
•Open source project at Apache Software Foundation 
•Scales to billions of documents 
•Highly configurable & customizable 
•Integrated with Cassandra in DSE
Solr Schema 
•Defines set of fields in a document 
•Direct one-to-one mapping with Cassandra columns 
•Fields can be defined with synonyms, etc., etc. 
<fields> 
<field name="key" type="string" indexed="true" stored="true" /> 
<field name="name" type="text" indexed="true" stored="true" /> 
</fields>
DSE Search with Solr
What is DSE with Solr? 
•DSE-specific enhancement to Cassandra 
•Keeps a Solr index in sync with a C* table 
•Indexes distributed to all nodes
[Diagram: three nodes, each running C* & Solr and holding a C* Table with its S* Index]
Handy replication & failover 
•Implementation leverages C* replication 
•So you get load balancing, reliability, scalability 
•You can replicate from a regular C* DC to Solr DC 
[Diagram: a Solr DC of three C* & Solr nodes next to a C* DC of three C* nodes]
Who builds the index? 
•In background
•Much slower than C* updates
•Uses existing secondary index hook
[Diagram: Secondary Index Hook -> Distribute to indexing queues (Logical Rows) -> Indexing Queue -> Read C* storage row -> Create one Solr doc per entry -> Apply FieldInputTransformer -> Update Solr; settings shown: back_pressure_threshold_per_core, max_solr_concurrency_per_core]
How fast is it? 
•Writing 170M records ≈ 2.5 hours 
•8-node DSE 4.0 cluster, eight 1TB SSDs on each
•This is indexing during writes 
•About 15% of index available when writes finish 
•Complete index takes another 12 hours
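A quick back-of-the-envelope calculation of the rates implied by these figures, assuming they are cluster-wide totals and that "15% of index" means 15% of the records. The variable names are hypothetical.

```python
RECORDS = 170_000_000
WRITE_HOURS = 2.5
NODES = 8
INDEXED_AT_WRITE_END = 0.15   # fraction of the index ready when writes finish
REMAINING_INDEX_HOURS = 12


def per_second(count, hours):
    """Convert a count processed over some hours into a per-second rate."""
    return count / (hours * 3600)


write_rate = per_second(RECORDS, WRITE_HOURS)
write_rate_per_node = write_rate / NODES
index_rate_during_writes = per_second(RECORDS * INDEXED_AT_WRITE_END, WRITE_HOURS)
index_rate_after_writes = per_second(
    RECORDS * (1 - INDEXED_AT_WRITE_END), REMAINING_INDEX_HOURS
)
total_hours = WRITE_HOURS + REMAINING_INDEX_HOURS
```

Under those assumptions the cluster writes roughly 18,900 records per second (about 2,360 per node), while indexing runs at only a few thousand documents per second, and the full index is ready about 14.5 hours after ingest starts.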
System Overview
ETL Hadoop Workflow 
•Extract, transform, load 
•Built using Cascading API 
•Parse data, simple normalization 
•Other transformations happen in Solr
Cassandra ingress 
•Reduce tasks in Hadoop talk to C* cluster 
•Using DataStax Java driver for Cassandra 
•Bottleneck is Solr indexing 
•Inserts get throttled when this falls behind 
•But total time less than with deferred indexing
Architectural Diagram 
[Diagram: Hadoop Cluster, three C* + Solr nodes, Entity Matcher API]
Ingest performance 
•For max performance, write without reads 
•But how to avoid creating duplicate entries? 
•Set the row key to the hash of searchable fields 
•Accept "near duplicates" in search results 
•Possible to push some Solr load into workflow
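A minimal sketch of deriving the row key from a hash of the searchable fields, so that re-ingesting the same record overwrites the existing row instead of creating a duplicate, with no read before the write. The field list, the light normalization, and the choice of SHA-1 are assumptions for illustration.

```python
import hashlib

# Hypothetical choice of searchable fields; the order must stay fixed.
KEY_FIELDS = ("name", "ssn", "dob")


def row_key(record, fields=KEY_FIELDS):
    """Derive a deterministic row key from lightly normalized searchable fields."""
    parts = [str(record.get(field, "")).strip().lower() for field in fields]
    # Join with a separator unlikely to appear in values, to avoid ambiguous joins.
    payload = "\x1f".join(parts).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()
```

Records that differ in any searchable field get different keys, which is why "near duplicates" still show up in search results and have to be tolerated there.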
Summary
Key points to remember 
•This is for ad hoc requests, not batch deduplication 
•Use search to reduce candidate set, then match 
•Pain is in normalization, matching logic 
•DSE + Solr simplifies architecture & adds goodness
More questions? 
•Feel free to contact me 
•http://www.scaleunlimited.com/contact/
•Get training on DSE with Solr 
•http://www.datastax.com/what-we-offer/products-services/training

Cassandra Summit 2014: Fuzzy Entity Matching at Scale
