Measuring Search Engine Quality using Spark and Python
Sujit Pal
March 13, 2016
Introduction
• About Me
  - Work at Elsevier Labs
  - Interests: Search, NLP and Distributed Processing
  - URL: labs.elsevier.com
  - Email: sujit.pal@elsevier.com
  - Blog: Salmon Run
  - Twitter: @palsujit
• About Elsevier
  - World's largest publisher of STM Books and Journals
  - Uses data to inform and enable consumers of STM info
Agenda
• Problem Description
• Our Solution
• Other Uses
• Future Work
• Q&A
Problem Description
Background
• Migrate the ScienceDirect search engine from Microsoft FAST to Apache Solr.
Success Criteria
• Keep search quality consistent across platforms.
• Full-text downloads from Solr must equal or exceed those from FAST.
Measurement
• A/B testing
But…
• A/B tests happen in production
  - Expensive: requires production deployment of the search engine(s).
  - Risky: a bad customer experience during the A/B test can drive a customer away.
  - Limited scope for iterative improvement, because of the expense and risk.
  - A/B tests take time to become statistically meaningful.
• We needed something that
  - Could be run by DEV/QA on demand.
  - Produces a repeatable indicator of search engine quality.
  - Does not require production deployment.
• That tool is the subject of this talk.
Our Solution
Solution Overview
• We have
  - Query logs – the query strings entered by users.
  - Click logs – the download links clicked by users.
• We can generate
  - Search results for each query against the search engine.
• Combining these, we can produce
  - A Click Rank distribution for a search engine (configuration).
Click Rank Definition
• Click Rank is the sum of the ranks of all PIIs (Publisher Item Identifiers, the document IDs used by ScienceDirect) in the result set that match the PIIs clicked for that query, divided by the number of matches.
• Deconstructing the above:
  - Let the documents clicked for a query (from the click logs) be the set Q.
  - Let the top N search results for the query be the ordered list R.
  - Let P be the intersection of Q and R, and let ranks(P) be the one-based positions in R of the documents in P.
  - Click Rank = (sum of the ranks in ranks(P)) / |P|
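A minimal Python sketch of this definition (illustrative names only, not the production code):

```python
def click_rank(clicked_piis, result_piis):
    """Click Rank for a single query.

    clicked_piis: the set Q of PIIs the user downloaded for this query.
    result_piis:  the ordered list R of the top-N PIIs returned by the engine.
    Returns the mean one-based rank of the clicked PIIs found in R,
    or None if there is no overlap.
    """
    ranks = [i + 1 for i, pii in enumerate(result_piis) if pii in clicked_piis]
    return sum(ranks) / float(len(ranks)) if ranks else None

# Example: clicked documents sit at positions 1 and 4, so Click Rank = (1 + 4) / 2 = 2.5
print(click_rank({"S001", "S004"}, ["S001", "S002", "S003", "S004", "S005"]))
```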
Inputs
Preprocess Logs
• Query and Click logs already merged by A/B test framework.
• Provided as line-oriented JSON format.
Generate Search Results for Engine
• Replay the query logs against the search engine (configuration).
• Use the Python multiprocessing module to run queries in parallel.
Search Result Generation Code
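The code on this slide appears only as a screenshot in the original deck. The listing below is a rough reconstruction of the approach just described, assuming a Solr select handler and invented field, path and file names:

```python
import json
import multiprocessing as mp

import requests

SOLR_URL = "http://localhost:8983/solr/sciencedirect/select"   # hypothetical endpoint
TOP_N = 50

def run_query(query_record):
    """Replay one logged query and save the top-N PIIs, one ranked line per result."""
    qid, qtext = query_record["query_id"], query_record["query"]
    params = {"q": qtext, "rows": TOP_N, "fl": "pii", "wt": "json"}
    docs = requests.get(SOLR_URL, params=params).json()["response"]["docs"]
    with open("results/%s.txt" % qid, "w") as fout:
        for rank, doc in enumerate(docs, start=1):
            fout.write("%d\t%s\n" % (rank, doc["pii"]))

if __name__ == "__main__":
    with open("query_logs.json") as fin:
        queries = [json.loads(line) for line in fin]
    pool = mp.Pool(processes=mp.cpu_count())
    pool.map(run_query, queries)
    pool.close()
    pool.join()
```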
Search Results Example Data
• Search Results are saved one file per query.
• Top 50 results extracted (so each file is 50 lines long).
• Maintains parity with the FAST reference query results (provided one-time via a legacy process).
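The example data itself is not reproduced in this transcript. Purely for illustration (the PIIs below are invented), a per-query result file in the format assumed by the sketches here might look like:

```
1	S0001234567890123
2	S0004567890123456
3	S0007890123456789
...
50	S0009999999999999
```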
Compute Click Rank Distribution
• Use Apache Spark to compute the Click Rank Distribution.
• Use Python + Matplotlib to build reports and visualizations.
Spark Code for Generating Click Rank Distribution
• Skeleton of a PySpark Program
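The skeleton itself is a screenshot in the original deck; a minimal PySpark skeleton along the lines described (paths and names are illustrative) might be:

```python
from pyspark import SparkConf, SparkContext

def main():
    conf = SparkConf().setAppName("click-rank-distribution")
    sc = SparkContext(conf=conf)

    # Hypothetical input locations; the real job read from S3.
    clicks = sc.textFile("s3://mybucket/click_logs/")
    results = sc.wholeTextFiles("s3://mybucket/search_results/")

    # Steps #1-#3 (sketched on the following slides) turn these RDDs into
    # a (query_id, click rank) distribution and write it back out to S3.

    sc.stop()

if __name__ == "__main__":
    main()
```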
Spark Code for Generating Click Rank Distribution
• Step #1: Convert the Clicks data to (Query_ID, List of clicked
PIIs)
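Continuing the skeleton above, a sketch of Step #1, assuming each line of the merged click log is a JSON record carrying a query_id and the clicked pii (the real schema may differ):

```python
import json

def parse_click(line):
    """Extract (query_id, clicked PII) from one line of the merged query/click log."""
    rec = json.loads(line)
    return (rec["query_id"], rec["pii"])

# (query_id, [clicked PII, clicked PII, ...])
clicked_by_query = (clicks
    .map(parse_click)
    .groupByKey()
    .mapValues(list))
```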
Spark Code for Generating Click Rank Distribution
• Step #2: Convert the Search Results data to (Query_ID, List of
PIIs in search result)
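A sketch of Step #2 in the same vein, assuming one result file per query, named by its query id, with one "rank<TAB>pii" line per result (as in the earlier sketches):

```python
import os

def parse_result_file(path_and_content):
    """Turn one per-query result file into (query_id, [PII at rank 1, PII at rank 2, ...])."""
    path, content = path_and_content
    qid = os.path.splitext(os.path.basename(path))[0]   # file name is assumed to be the query id
    piis = [line.split("\t")[1] for line in content.strip().split("\n")]
    return (qid, piis)

# results was read with sc.wholeTextFiles(), so each element is (file path, file content)
results_by_query = results.map(parse_result_file)
```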
Spark Code for Generating Click Rank Distribution
• Step #3: Join the two RDDs and compute Click Rank from the
intersection of the clicked PIIs and the result PIIs for each
Query_ID.
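A sketch of Step #3, joining the pair RDDs from the two previous sketches and averaging the one-based ranks of the clicked PIIs:

```python
def rank_of_clicks(pair):
    """Average one-based rank of the clicked PIIs within the result list, or None if no overlap."""
    clicked_piis, result_piis = pair
    clicked = set(clicked_piis)
    ranks = [i + 1 for i, pii in enumerate(result_piis) if pii in clicked]
    return sum(ranks) / float(len(ranks)) if ranks else None

click_ranks = (clicked_by_query
    .join(results_by_query)        # (query_id, ([clicked PIIs], [result PIIs]))
    .mapValues(rank_of_clicks)
    .filter(lambda kv: kv[1] is not None))

click_ranks.saveAsTextFile("s3://mybucket/click_rank_distribution/")   # hypothetical output location
```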
Generate Reports
• Download Click Rank Distribution for Search Engine (configuration).
• Use Python + Matplotlib to build reports and visualizations.
Outputs from Tool
• Step #4: Download the distribution from S3 and aggregate it into a chart and a spreadsheet.
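A small illustration of Step #4 with pandas and Matplotlib, assuming the distribution has been pulled down from S3 as a two-column CSV (file names are hypothetical):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Per-query Click Ranks downloaded from S3 (hypothetical file name)
df = pd.read_csv("click_ranks.csv", names=["query_id", "click_rank"])

df["click_rank"].hist(bins=50)
plt.xlabel("Click Rank")
plt.ylabel("Number of queries")
plt.title("Click Rank Distribution")
plt.savefig("click_rank_distribution.png")

# Spreadsheet-friendly summary of the same distribution
df["click_rank"].describe().to_csv("click_rank_summary.csv")
```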
How did we do (in our A/B test)?
• Solr PDF downloads were 99.6% of FAST downloads.
• Difference in download rates not statistically significant.
• Decision made to put Solr into production.
[Chart: "SOLR Downloads as % of FAST Downloads" by month, Jan (AB #1) through Apr (AB #4), with separate PDF and HTML series and a reference line marking the level of FAST downloads; the y-axis runs from 90% to 105%.]
Other Uses
Find Search Result Overlap between Configurations
• Measure drift between two search configurations.
• Ordered and unordered comparisons at different top-N cutoffs.
• Result-set overlap increases with N.
• There is a lot of positional overlap in the top N positions across engines.
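A minimal sketch of the overlap measure described above (illustrative, not the production code):

```python
def result_overlap_at_n(results_a, results_b, n, ordered=False):
    """Overlap between the top-N result lists of two engine configurations.

    ordered=True counts only positions where both engines return the same PII;
    ordered=False compares the top-N sets regardless of position.
    """
    if ordered:
        matches = sum(a == b for a, b in zip(results_a[:n], results_b[:n]))
    else:
        matches = len(set(results_a[:n]) & set(results_b[:n]))
    return matches / float(n)
```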
Search Quality as Overlap between Title and Query
• Measures the overlap of title words with query words at various top-N cutoffs.
• Overlap @ N is defined as the total number of query words that appear in the first N titles, normalized by N times the number of words in the query.
• Overlap @ N decreases monotonically with N.
• Solr engines seem to do better at this measure.
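A minimal sketch of Overlap @ N as defined above (illustrative only):

```python
def title_query_overlap_at_n(query, titles, n):
    """Query-word overlap summed over the first N titles,
    normalized by N times the number of words in the query."""
    q_words = set(query.lower().split())
    hits = sum(len(q_words & set(title.lower().split())) for title in titles[:n])
    return hits / float(n * len(q_words))
```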
Click Distribution
• Measures the distribution of clicked positions across the top 50
positions for each engine and compares them.
• In the accompanying chart, FAST showed a higher number of clicks at the top positions than the Solr configurations shown.
Distribution of Publication Dates in Results
• The engine has a temporal component in its ranking algorithm.
• Compares the distribution of publication dates across search
engine configurations to visualize its behavior.
More Uses …
• Measuring impact of query response time on click rank.
• Comparing click rank distributions by document type.
• …
Future Work
• Compute (average/median) Click Rank per user.
• Compute Click Rank per query and user.
• Use these as input to Learning to Rank algorithms.
• Other ideas…
Thank you for listening!
Questions?
My email: sujit.pal@elsevier.com