Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component 
Steven Bower 
©2014 Bloomberg L.P.
Bloomberg 
• Largest provider of financial news and information 
• Our strength is quickly and accurately delivering data, news and analytics 
• Creating high performance and accurate information retrieval systems is core to 
our strength
Bloomberg Search Team 
• Search infrastructure
  • Develop and support search as a service platform
  • Support for other search applications within the company
• Consultancy
  • Provide design consultancy/support to application teams
  • Promote search best practices/standardization throughout the company
• Machine learning
  • Develop machine learning techniques to improve relevancy
  • Create natural language processors to answer questions
• Unified search
  • Create information retrieval tools to organize and connect the vast and varied datasets provided to our clients
Our Challenge
Our Approach 
• Use Search/Solr, as it provides flexible search/filtering over large, fast-moving result sets
• Initially used StatsComponent, but quickly ran into limitations
• Wanted to push the bounds of analytics capabilities in Solr/Lucene
• Needed a pluggable framework to perform complex calculations/aggregations on numerical time-series data
• DocValues provided high-performance columnar access to fields in the index (without un-inversion cost)
DocValues 
• DocValues provide high-performance columnar access to fields in the index
• No un-inversion cost
• Increased storage footprint
• Helps achieve NRT (near-real-time) search
• Values live off-heap in memory-mapped files
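To make "un-inversion" concrete, here is a minimal Python sketch. It is a toy model, not Lucene code: the `price` field, the data and the function names are invented for illustration. An inverted index maps each value to the documents that contain it, so reading one value per document means rebuilding a doc-to-value array first. A columnar structure stores that array directly.

```python
# Toy illustration only: Lucene's real data structures are far more elaborate.

# Inverted index for a hypothetical "price" field: value -> doc ids.
inverted = {10.0: [0, 3], 12.5: [1], 99.0: [2]}


def uninvert(index, num_docs):
    """Rebuild a doc -> value column by walking every posting list."""
    column = [None] * num_docs
    for value, doc_ids in index.items():
        for doc_id in doc_ids:
            column[doc_id] = value
    return column


# Columnar layout: the doc -> value array is written at index time,
# so no rebuild step is needed before aggregating.
column = [10.0, 12.5, 99.0, 10.0]

matching_docs = [0, 2, 3]  # doc ids matched by some query

# Same aggregate either way; the inverted path pays for uninvert() first.
total_from_inverted = sum(uninvert(inverted, 4)[d] for d in matching_docs)
total_from_column = sum(column[d] for d in matching_docs)
```

Both paths produce the same sum. The difference is that the inverted path has to touch every posting list before it can answer, which is the cost the slides say DocValues avoid.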
Analytics Component 
• New component from the ground up 
• Designed/implemented by the Bloomberg Search Team over the summer of 2013
• Initial implementation was built using the DocValues API directly, but moved to FieldCache
• Refactored existing faceting implementation to support analytics
• Created simple prefix notation for statistical expressions
• Available as a Solr Contrib module in Solr 5.x, or as patches for 4.8+ on SOLR-5302
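The slides do not show how the component parses its prefix notation. As a rough sketch of the idea, the Python below turns an expression such as `pow( stddev(field_a), const_num(2) )` into a nested tuple. The tuple representation and the `parse` function are assumptions made for this example, and the parser does no validation of malformed input.

```python
import re

# Identifiers, plain numbers, or one of the punctuation marks ( ) ,
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z_0-9]*|-?\d*\.?\d+|[(),])")


def parse(text):
    """Parse 'name(arg, ...)' prefix notation into nested tuples.

    A bare identifier is treated as a field reference and a number as a
    literal; both are returned as strings. Malformed input is not checked.
    """
    tokens = TOKEN.findall(text)
    pos = 0

    def expr():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1  # consume "("
            args = [expr()]
            while tokens[pos] == ",":
                pos += 1  # consume ","
                args.append(expr())
            pos += 1  # consume ")"
            return (tok, *args)
        return tok

    return expr()


tree = parse("pow( stddev(field_a), const_num(2) )")
```

Each tuple holds the operator name followed by its arguments, so an evaluator can walk the tree recursively: inner map operators run per document, and reduction operators collapse the results.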
Features 
• Flexible/extensible framework for adding additional statistics/faceting
• Supports multiple analytics requests per query execution
  • Multiple statistic calculations per request
  • Multiple facets per request
  • Each request can facet statistics over different fields and ranges
Features - Faceting 
• Field faceting
  • Support for int, long, float, double, date and string fields
  • Support for multi-value fields
  • Support for limit, offset and mincount
  • Support for sorting of stats-facets by any statistic (e.g. sort by mean)
• Range faceting
  • Numeric types and dates
  • Dynamically calculate range/gap based on calculated statistics
• Support for query faceting of stats
  • Use calculated statistics to generate facet queries
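The field-faceting options above can be pictured with a small Python sketch. This is not the component's code: `field_facet`, the sample documents and the descending sort order are assumptions, and `limit`, `offset` and `mincount` are assumed to behave like the regular Solr facet parameters of the same names.

```python
from collections import defaultdict
from statistics import mean


def field_facet(docs, facet_field, stat_field, limit=None, offset=0, mincount=1):
    """Group docs by facet_field, compute mean(stat_field) per bucket,
    sort buckets by that mean (descending), then page with offset/limit."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[facet_field]].append(doc[stat_field])

    rows = [
        {"value": value, "count": len(values), "mean": mean(values)}
        for value, values in buckets.items()
        if len(values) >= mincount  # drop buckets with too few documents
    ]
    rows.sort(key=lambda row: row["mean"], reverse=True)
    end = None if limit is None else offset + limit
    return rows[offset:end]


# Hypothetical documents, invented for this example.
docs = [
    {"sector": "tech", "price": 10.0},
    {"sector": "tech", "price": 30.0},
    {"sector": "energy", "price": 50.0},
    {"sector": "retail", "price": 5.0},
    {"sector": "retail", "price": 15.0},
]
top = field_facet(docs, "sector", "price", limit=2, mincount=2)
```

With `mincount=2` the single-document `energy` bucket is dropped, and the remaining buckets come back ordered by their mean price.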
Features – Map Operators 
• Basic math
  • neg(<expr>)
  • add(<expr>,...)
  • mult(<expr>,...)
  • div(<expr>,<expr>)
  • pow(<expr>,<expr>)
  • log(<expr>,<expr>)
• Constants
  • const_num(<number>)
  • const_date(<date>)
  • const_str(<string>)
• Date math
  • date_math(<date expr>,<date op>,...)
• String operations
  • rev(<expr>)
  • concat(<expr>,...)
• Field
  • <field>
• Missing values
  • miss(<expr>,<value>)
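Map operators work on one document at a time. One way to picture that, as a sketch only, is to model each operator as a Python function that returns a per-document function. The dict-based document, the use of `None` for a missing value and the reading of `miss` as "substitute a default" are assumptions. `log` and `date_math` are left out because the slides do not specify their argument semantics.

```python
import math

# A document is modeled as a plain dict; None stands for a missing value.
doc = {"price": 8.0, "qty": None, "ticker": "IBM"}


def field(name):
    return lambda d: d.get(name)


def const(value):
    # Stands in for const_num / const_str: ignores the document.
    return lambda d: value


def neg(expr):
    return lambda d: -expr(d)


def add(*exprs):
    return lambda d: sum(e(d) for e in exprs)


def mult(*exprs):
    return lambda d: math.prod(e(d) for e in exprs)


def rev(expr):
    return lambda d: expr(d)[::-1]


def concat(*exprs):
    return lambda d: "".join(e(d) for e in exprs)


def miss(expr, default):
    """Substitute `default` when the wrapped expression has no value."""
    def run(d):
        value = expr(d)
        return default if value is None else value
    return run


notional = mult(field("price"), miss(field("qty"), 0))(doc)
discounted = add(field("price"), neg(const(3)))(doc)
label = concat(rev(field("ticker")), const("-US"))(doc)
```

Because every operator returns a function of the document, expressions nest freely, which is what lets a reduction operator later consume the output of any map expression.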
Features – Reduction Operators 
• Statistical
  • min(<expr>)
  • max(<expr>)
  • sum(<expr>)
  • count(<expr>)
  • miss(<expr>)
  • unique(<expr>)
• Complex
  • sumofsquares(<expr>)
  • mean(<expr>)
  • stddev(<expr>)
  • median(<expr>)
  • percentile(<expr>)
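Reduction operators collapse the per-document values of an expression into a single number. The Python sketch below shows plain arithmetic equivalents for most of the list. It rests on assumptions the slides do not confirm: that `count` ignores missing values, that `miss` counts them, and that `stddev` is the population form.

```python
import math
from statistics import median

# One mapped value per matching document; None marks a missing value.
values = [2.0, 4.0, 4.0, None, 10.0]
present = [v for v in values if v is not None]

# Simple reductions need only a running accumulator each.
count = len(present)
missing = len(values) - count
unique = len(set(present))
minimum, maximum = min(present), max(present)
total = sum(present)
sum_of_squares = sum(v * v for v in present)

# mean and stddev can be derived from the accumulators above.
mean = total / count
stddev = math.sqrt(sum_of_squares / count - mean ** 2)  # population form

# median (like a percentile) needs the values themselves.
middle = median(present)
```

The split matters for cost: sums and counts can be accumulated in one pass with constant memory, while median and percentile have to retain or sort the collected values.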
Examples 
• Weighted average
  • Calculate the weighted average of field_a with field_b as the weight
    div( sum( mult(field_a, field_b) ), sum(field_b) )
• Variance
  • Calculate the variance of field_a
    pow( stddev(field_a), const_num(2) )
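As a quick check of the arithmetic behind these two statistics, the Python sketch below computes a conventional weighted average (sum of value times weight, divided by the sum of the weights) and a variance as the squared standard deviation. The sample numbers are invented, and the population standard deviation is an assumption, since the slides do not say which form `stddev` uses.

```python
from statistics import pstdev

field_a = [10.0, 20.0, 30.0]  # values
field_b = [1.0, 1.0, 2.0]     # weights

# Weighted average: sum(a * b) / sum(b)
weighted_sum = sum(a * b for a, b in zip(field_a, field_b))
weighted_average = weighted_sum / sum(field_b)

# Variance: stddev squared (population form assumed here)
variance = pstdev(field_a) ** 2
```

The heavier weight on 30.0 pulls the weighted average above the plain mean of 20.0.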
Examples 
• T-score
  • Calculate a t-score where ## is the value and all values in your sample are stored in field_a.
    div( add( const_num(##), neg( mean(field_a) ) ),
         div( stddev(field_a), pow( count(field_a), const_num(.5) ) ) )
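Read inside out, the expression is (value - mean) divided by (stddev / sqrt(count)). A minimal Python sketch of the same calculation follows. The `t_score` function and the sample data are made up for illustration, and the use of the sample standard deviation is an assumption.

```python
import math
from statistics import mean, stdev


def t_score(x, sample):
    """(x - mean) / (stddev / sqrt(n)), mirroring the prefix expression."""
    n = len(sample)
    # div( stddev(field_a), pow( count(field_a), const_num(.5) ) )
    standard_error = stdev(sample) / math.pow(n, 0.5)
    # add( const_num(x), neg( mean(field_a) ) )
    return (x + -mean(sample)) / standard_error


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
score = t_score(6.0, sample)
```

Raising the count to the power 0.5 is how the expression gets a square root, since the operator list has `pow` but no dedicated `sqrt`.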
How We Use It 
• Segment, aggregate and analyze financial data quickly
• Aggregate time series data across multiple fields to render charts
• Created flexible diagnostic tools/visualizations to analyze Solr performance
Future Plans 
• Multi-shard support 
• Pivot Facet Support 
• Statistics on multi-value fields
  • To support unique()
• Filter result set based upon calculated statistics 
• Generalize facet implementation
Links and Questions? 
Analytics Component
https://issues.apache.org/jira/browse/SOLR-5302
More About Bloomberg
http://www.bloomberglabs.com/
