Jayesh Govindarajan
Search Relevance @ Salesforce
Improving Enterprise findability
Jayesh Govindarajan
Senior Director, Search Relevance, Data Science
Salesforce
1. How is search in the enterprise different?
2. Enterprise findability problem
3. Relevance, LETOR algorithms
4. Deploying models in Solr
5. A model for every customer
6. Putting the pieces together
Forward-Looking Statements
Statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties
materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or
implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking,
including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements
regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded
services or technology developments and customer contracts or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality
for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and
rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with
completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our
ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer
deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further
information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the
most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter. These documents and others containing
important disclosures are available on the SEC Filings section of the Investor Information section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available
and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that
are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
Largest Enterprise Search Service!
600M+ Queries / Week
7B+ Index Updates / Day
500B+ Documents in the Index
300TB+ Index Size
<2min Incremental Index Latency
<120ms Query Latency on Search Server
1.6 Average Click Rank
The Search Vision
Empower enterprise users to effortlessly find all the information they need in order to be successful with Salesforce
Intelligent, Fast and Powerful
Be a competitive differentiator for Salesforce
What information do you need?
Demo time
1. How is enterprise search different?
Diversity of Data is a Challenge!
Sales Cloud: Structured data (SFA, B2C)
Service Cloud: Unstructured data (Case Mgmt, KB, Field Svc)
Community Cloud: Enterprise Social data (Q&A, Chatter, Files)
App Cloud: Search APIs
Person Search: People data
Diversity of Intentions:
- A Service agent exploring a community forum to educate himself: Recall
- A Service agent looking for a case similar to the one she is currently assigned: Precision
- A Sales rep looking for a named account to call: Precision
- A Sales rep looking for contacts in an industry within a certain geo: Recall
Patterns of search and discovery differ by user role and by searched entity.
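The precision and recall labels above can be made concrete with a small calculation. This is only an illustrative sketch: the record ids and result lists are made up, and the function name is not from the talk.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one query's result list."""
    returned_set, relevant_set = set(returned), set(relevant)
    hits = len(returned_set & relevant_set)
    precision = hits / len(returned_set) if returned_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical record ids, for illustration only.
# A sales rep looking for one named account: a single correct hit is enough.
named_account = precision_recall(returned=["acct-1"], relevant=["acct-1"])

# An agent exploring a forum: many relevant posts, so missed posts hurt recall.
forum_browse = precision_recall(
    returned=["post-1", "post-2", "post-3", "post-9"],
    relevant=["post-1", "post-2", "post-3", "post-4", "post-5", "post-6"],
)
```

In the second case three of the four returned posts are relevant, but only three of the six relevant posts were returned, which is the gap a recall-oriented searcher notices.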
Customer diversity: one size doesn’t fit all
Matching models to Customer Orgs
Some Orgs want a lower coefficient in some cases
Some Orgs want a higher coefficient
2. Understanding the enterprise findability problem
Most ranking functions start off with a few boosts and end up like... this
Form
1. Query independent signals - multiplicative boosts in range [1-3]
2. Entity specific signals - additive boosts in the range of [1-12]
a. Accounts, Contacts, Leads - LastActivityScore, LastModifiedScore
b. Cases - CaseStatus, CaseEscalationScore
3. ...
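A hand-tuned function of this form might look like the sketch below. The slide does not say how the boosts are combined, so the order used here (multiply first, then add) and the clamping to the stated ranges are assumptions, and all names are hypothetical.

```python
def clamp(value, low, high):
    """Keep a boost inside its allowed range."""
    return max(low, min(high, value))


def hand_tuned_score(lucene_score, multiplicative_boosts, additive_boosts):
    """Apply query-independent boosts (range 1-3), then entity boosts (range 1-12).

    Assumption: multiplicative boosts scale the text score first and the
    additive entity-specific boosts are added afterwards.
    """
    score = lucene_score
    for boost in multiplicative_boosts:
        score *= clamp(boost, 1.0, 3.0)
    for boost in additive_boosts:
        score += clamp(boost, 1.0, 12.0)
    return score


# Hypothetical case record: one query-independent boost, two entity boosts
# (for example CaseStatus and CaseEscalationScore).
case_score = hand_tuned_score(0.5, [2.0], [4.0, 1.0])
```

Every new signal adds another term and another hand-picked range, which is how such a function grows hard to reason about.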
Getting to a machine learned function has challenges
Constraint
1. Customers build apps on enterprise search platforms. One cannot simply cut over to a new ranking system.
Key Lessons
2. Understanding the current search equation is key to anticipating customer breakage/impact.
3. It is important to formalize the “Human Intelligence” equation behind a working system.
3. Machine learning and Learning to Rank methods
Build a probabilistic model of relevance.
Chance that a user clicks on record r shown at position i in response to query q: pr(r,q,i) = L(r,q) * Rb(i)
- L() is a function which maps the (record, query) pair to the likelihood that a user clicks on r in response to q
- Rb() is a function which corrects for positional bias
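As a rough sketch of how the two factors combine, the snippet below multiplies a relevance likelihood by a per-position bias. The numeric values and the list-based representation of Rb are assumptions for illustration, not figures from the talk.

```python
def click_probability(relevance_likelihood, rank_bias, position):
    """pr(r, q, i) = L(r, q) * Rb(i), with a 1-based position i."""
    return relevance_likelihood * rank_bias[position - 1]


# Hypothetical values: L(r, q) = 0.4 and a bias that decays with position.
RANK_BIAS = [1.0, 0.8, 0.6, 0.5, 0.4]

top = click_probability(0.4, RANK_BIAS, position=1)
third = click_probability(0.4, RANK_BIAS, position=3)
```

The same record is less likely to be clicked in position 3 than in position 1 even though its relevance L(r, q) is unchanged, which is why clicks cannot be read as relevance without the Rb() correction.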
Master Relevance Equation
Goal: learn the best linear function of these variables from queries and clicks.
Logistic Regression
queryId      docPV   score    Clicked
-hnjnxlbxd   0       10.892   1
1ttuuy6n3    5       0.230    0
1ttuuy6n3    0       0.232    0
1ttuuy6n3    0       0.230    0
1ttuuy6n3    0       0.230    1
1ttuuy6n3    0       0.244    0
1ttuuy6n3    0       0.231    0
1ttuuy6n3    6       0.228    0
1ttuuy6n3    5       0.228    0
1ttuuy6n3    0       0.231    0
If this were the data, the simplest approach would be logistic regression:
P(clicked) = sigmoid(a0 + a1 * docPV + a2 * score)
- a0: bias term that affects all observations equally
- a1, a2: incremental effects of docPV and score on relevance
The only thing to change about this is that we want a separate bias term for each positional rank.
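The formula can be evaluated directly on rows like the ones in the table. The coefficients a0, a1 and a2 below are made-up values chosen only to show the mechanics; the talk does not give them for this example.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def p_clicked(doc_pv, score, a0, a1, a2):
    """P(clicked) = sigmoid(a0 + a1 * docPV + a2 * score)."""
    return sigmoid(a0 + a1 * doc_pv + a2 * score)


# Hypothetical coefficients, not learned from real data.
A0, A1, A2 = -2.0, 0.1, 0.5

# Two rows from the table above: (queryId, docPV, score).
rows = [("-hnjnxlbxd", 0, 10.892), ("1ttuuy6n3", 5, 0.230)]
predictions = {qid: p_clicked(pv, s, A0, A1, A2) for qid, pv, s in rows}
```

With these assumed coefficients the high-scoring first row gets a click probability above 0.9, while the low-scoring second row stays well below 0.5.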
Example: Query/Click Data
[Figure: one row of query/click data, with callouts for the page views and Lucene score of the result in position 1, the same for the result in position 3, and the position that was clicked.]
Learning
[Figure: Result 1 through Result 5, each with its own bias term b1 through b5.]
● Five logistic regressions with shared weights, but different biases.
● Coefficients and biases are learned via MLE (SGD).
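One way to read "shared weights, but different biases" is a single weight vector plus one bias per position, fitted by stochastic gradient descent on the log-loss. The sketch below follows that reading on a tiny synthetic dataset; the data, learning rate and epoch count are all assumptions, and this is not the production training code.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, n_features, n_ranks=5, lr=0.1, epochs=200, seed=0):
    """examples: list of (features, rank, clicked) with a 0-based rank."""
    rng = random.Random(seed)
    weights = [0.0] * n_features   # shared across all positions
    biases = [0.0] * n_ranks       # one bias term per position
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for features, rank, clicked in data:
            z = biases[rank] + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - clicked   # gradient of the log-loss w.r.t. z
            for j, x in enumerate(features):
                weights[j] -= lr * error * x
            biases[rank] -= lr * error
    return weights, biases


# Synthetic data: 10 impressions per (rank, feature value); click counts are
# made up so that a higher feature value and a higher position get more clicks.
CLICKS_OUT_OF_10 = {1.0: [8, 7, 6, 5, 4], 0.0: [4, 3, 2, 2, 1]}
examples = [
    ([x], rank, 1 if n < clicks else 0)
    for x, per_rank in CLICKS_OUT_OF_10.items()
    for rank, clicks in enumerate(per_rank)
    for n in range(10)
]

weights, biases = train(examples, n_features=1)
```

On this toy data the learned feature weight comes out positive and the bias for position 1 ends up larger than the bias for position 5, mirroring the structure of the data.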
Results
---------- ACCOUNTS ----------
Coefficients
-------------------------
docPC 0.203
docPV 0.312
doclm_score 0.642
lastAccessed_score 0.34
score 0.251
Rank Bias
-------------------------
Rank 1 1.0
Rank 2 0.884
Rank 3 0.843
Rank 4 0.788
Rank 5 0.94
The model is much better at predicting which of the five results will be clicked.
It detects 50% of occurrences where the 5th result was clicked, and is wrong on 1 out of 8 attempts.
All else being equal, the odds of clicking position 2 are about 0.884 times the odds of clicking position 1.
Opportunities are the subject of more general searches. E.g. “Which opportunities are John Smith working on?”
Searches for accounts or cases are more likely to be very specific. E.g. “I have a specific account in mind…”
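Reading the rank-bias values as odds ratios relative to position 1, as the slide does for position 2, they can be turned into click probabilities once a baseline is fixed. The 0.5 baseline below is a hypothetical number chosen to keep the arithmetic easy to follow.

```python
def probability_at_rank(p_rank1, odds_ratio):
    """Scale the position-1 odds by an odds ratio, then convert back to a probability."""
    odds_rank1 = p_rank1 / (1.0 - p_rank1)
    odds = odds_rank1 * odds_ratio
    return odds / (1.0 + odds)


# Rank-bias values from the Accounts results above.
RANK_BIAS = {1: 1.0, 2: 0.884, 3: 0.843, 4: 0.788, 5: 0.94}

# Hypothetical baseline: a record with a 50% click probability in position 1.
P_RANK1 = 0.5
by_rank = {rank: probability_at_rank(P_RANK1, ratio) for rank, ratio in RANK_BIAS.items()}
```

With even odds in position 1, the same record in position 2 has odds of 0.884 to 1, which is a click probability of 0.884 / 1.884, or roughly 0.47.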
Results: Coefficients (normalized by relative influence)
Columns: q1 / docPV, q2 / docPC, d1 / doclm_score, d2 / lastaccessed, d3 / oppclosedate, d4 / oppclosed, d5 / caseEscState, d6 / caseClosed, Lucene Score
users 3.12 7.24
groups 4.59 4.59
files 1.32 3.74
cases 0.85 0.15 0.49 0.33 1.21
leads 2.01 2.30 1.01 1.62
contacts 1.87 1.42 1.15 0.76 2.76
accounts 1.64 1.07 1.65 0.96 4.17
oppy 2.27 0.53 0.53 0.62 0.81 3.64
kb 0.50 0.32 2.00
● Lucene score is always most important (except for leads)
● LastModified is extremely important for accounts and leads, but not at all for cases.
4. Implementing Model representation in Solr
Relevance Metadata JSON format
{
  "schema": 1.0,
  "global": {
    "pc_s": 1.5,      // Boost Parent Child scores by 1.5
    "pv_s": 2.0,      // Boost PageView counts by 2
    "lm_s": 1.333     // Last Modified
  },
  "entity": {
    "500": {          // Specific for Cases (key prefix 500)
      "cc_s": 4.0,    // Boost Open Cases
      "lm_s": 1.0     // Apply a different boost for Last Modified
    }
  }
}
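A consumer of this format has to decide which boost applies to a given entity. The sketch below assumes that entity-level values override the global ones, which is how the "Apply a different boost for Last Modified" note reads; the function name and the "001" prefix used for a non-Case entity are illustrative assumptions.

```python
import json

# The slide's example with the annotations removed so that it parses as JSON.
RMD_JSON = """
{
  "schema": 1.0,
  "global": {"pc_s": 1.5, "pv_s": 2.0, "lm_s": 1.333},
  "entity": {"500": {"cc_s": 4.0, "lm_s": 1.0}}
}
"""


def resolve_boosts(rmd, key_prefix):
    """Global boosts, overridden by any boosts defined for the entity key prefix.

    Assumption: entity-level values replace global values of the same name.
    """
    boosts = dict(rmd.get("global", {}))
    boosts.update(rmd.get("entity", {}).get(key_prefix, {}))
    return boosts


rmd = json.loads(RMD_JSON)
case_boosts = resolve_boosts(rmd, "500")      # Cases: entity section applies
account_boosts = resolve_boosts(rmd, "001")   # no entity section: globals only
```

Under that assumption Cases pick up the open-case boost and a Last Modified boost of 1.0, while any entity without its own section keeps the global 1.333.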
Relevance Metadata (RMD)
Relevancy coefficients are expressed in a JSON data structure, so that we can easily specify per-entity or global-to-org coefficients.
Relevance Model
{
  schemaVersion:"V1",
  orgId:"00D1234567",
  Account:{
    PV:2.1,
    LastMod:0.5
  },...
  QIR:"Solr",
  DBRerank:"CoreApp"
}
Model Builder (offline)
Solutions to help us (Devs / PM etc.) build the model JSON files.
Model Store
JSON is stored in a Blob field in Setup BPO or HBase table. Changes to format / schema won't affect the table (it's just a blob).
AB Experiment
Name: / Org: / Params: JSON
Model Deploy
Org: / Params: JSON
The same JSON is used to run the AB experiment and eventually deploy into production.
Querying
Pass the same JSON to the Query layer and the Solr Server; each should have code that knows what to do given the JSON. Pass the entire JSON to Solr, or just the boost function.
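For the "just boost function" option, the query layer could turn one entity's coefficients into a Solr function-query string built from the standard `sum` and `product` functions. The sketch below is a guess at what that translation might look like: it assumes the coefficient names (PV, LastMod) are also numeric field names in the index, which the slides do not state.

```python
def boost_function(coefficients):
    """Build a function-query string that sums coefficient * field terms.

    Assumption: each coefficient name is also the name of a numeric field.
    """
    terms = [f"product({coef},{field})" for field, coef in sorted(coefficients.items())]
    if not terms:
        return "0"        # no boosts: additive identity
    if len(terms) == 1:
        return terms[0]
    return "sum(" + ",".join(terms) + ")"


# The Account section of the example model above.
model = {
    "schemaVersion": "V1",
    "orgId": "00D1234567",
    "Account": {"PV": 2.1, "LastMod": 0.5},
}
account_bf = boost_function(model["Account"])
```

Sorting the fields keeps the generated string stable, so the same model yields the same query text in an A/B experiment and in production.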
5. Stacking base and custom models
Recap: And one size doesn’t fit all
Cluster Orgs based on their ACR (Average Click Rank) response curves
Three distinct clusters observed in hierarchical clustering:
- Green orgs are hurt badly by increasing coefficient changes
- Blue orgs are hurt badly by decreasing coefficient changes
- Reds are hurt badly either way
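The clustering step can be illustrated with a small agglomerative (bottom-up hierarchical) routine. Everything specific here is an assumption: the response curves are invented, each curve holds the ACR degradation at a decreased, unchanged and increased coefficient, and Euclidean distance with average linkage is just one reasonable choice.

```python
def distance(a, b):
    """Euclidean distance between two response curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(c1, c2, curves):
    """Mean pairwise distance between the members of two clusters."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(distance(curves[i], curves[j]) for i, j in pairs) / len(pairs)


def agglomerate(curves, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in curves]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], curves)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical ACR degradation at coefficient x0.5, x1 and x2 (higher = worse).
curves = {
    "org_a": [0.0, 0.0, 0.9], "org_b": [0.1, 0.0, 1.0],   # hurt by increases
    "org_c": [0.9, 0.0, 0.1], "org_d": [1.0, 0.0, 0.0],   # hurt by decreases
    "org_e": [0.9, 0.0, 1.0], "org_f": [1.0, 0.0, 0.9],   # hurt either way
}
clusters = agglomerate(curves, n_clusters=3)
```

On these made-up curves the three groups recovered correspond to the green, blue and red behaviours described above.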
Stacking models
RELEVANCE PIPELINE
Base Model (All orgs, all entities)
Entity Signals: Accounts Model, Case Model, Knowledge Article Model, Feeds Model, ...
Org 1, Org 2, ..., Org n
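The slides do not spell out how the layers are combined, so the sketch below takes the simplest reading: coefficients from the base model are overridden by the entity model, and those in turn by any org-specific values, before a linear score is computed. All names and numbers are hypothetical.

```python
def stack_models(base, entity_models, org_overrides, entity, org_id):
    """Layer coefficients: base model, then entity model, then org overrides.

    Assumption: a later layer replaces coefficients of the same name.
    """
    coefficients = dict(base)
    coefficients.update(entity_models.get(entity, {}))
    coefficients.update(org_overrides.get(org_id, {}).get(entity, {}))
    return coefficients


def linear_score(coefficients, features):
    """Weighted sum of the features that have a coefficient."""
    return sum(coef * features.get(name, 0.0) for name, coef in coefficients.items())


# Hypothetical layers.
base = {"score": 1.0, "docPV": 0.3}
entity_models = {"Account": {"lastAccessed": 0.5}, "Case": {"caseEscState": 0.8}}
org_overrides = {"org_1": {"Account": {"docPV": 0.6}}}

features = {"score": 2.0, "docPV": 1.0, "lastAccessed": 1.0}
org1_score = linear_score(stack_models(base, entity_models, org_overrides, "Account", "org_1"), features)
org2_score = linear_score(stack_models(base, entity_models, org_overrides, "Account", "org_2"), features)
```

An org with no overrides, such as org_2 here, falls back to the base and entity layers, so only orgs that need custom behaviour carry extra coefficients.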
Putting the pieces together:
Relevance ML Pipeline, Runtime
Relevance ML Pipeline
RELEVANCE PIPELINE
Common representation of ranking model. Infra to automate training, A/B testing and deployment.
FEATURE DETAILS
● Config driven Model Deployment
● Automated model generation
● Training and A/B experimentation
[Diagram: LEARN, TEST, SHIP. Search Query Logs feed the ML Training Infrastructure (Model Building, Training Models), which produces RMD JSON, the Relevance Model Representation. The RMD JSON passes through Model Evaluation (A/B Experiments) and Model Deployment to the Ranker in the Relevance Runtime (Core App, Solr).]
Relevance Runtime Infrastructure
RELEVANCE RUNTIME
Executes machine learning models as a service in Solr, at scale
FEATURE DETAILS
● Ranking functions in Solr (linear and non-linear)
● Support for org, entity specific models
● Query Understanding
[Diagram: Salesforce Cloud enterprise data (Content, Users/Actions, Query/Intent) feeds the Index and the Training Data Signals (Interaction, Behavior: Clicks, Likes, Mentions), which go through Feature Engineering into the Machine Learning Pipeline. A query passes through Query Understanding (NLP, Q&A), then Level 1: Top K Ranker (TF/IDF), Level 2: ML Ranker Model, and Level 3: Post-Ranking Model, with Snippet Generation, to produce Search Results.]
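The three ranking levels can be sketched as a simple cascade. This is an illustration under assumptions rather than the runtime itself: documents are plain dictionaries, Level 2 is a linear model, and the Level 3 step is shown as a visibility filter, which is a made-up stand-in since the slides do not say what the post-ranking model does.

```python
def level1_top_k(candidates, k):
    """Level 1: keep the K candidates with the highest text-match score."""
    return sorted(candidates, key=lambda d: d["text_score"], reverse=True)[:k]


def level2_ml_rank(candidates, weights):
    """Level 2: re-rank the survivors with a linear model over document features."""
    def model_score(doc):
        return sum(weights.get(name, 0.0) * value for name, value in doc["features"].items())
    return sorted(candidates, key=model_score, reverse=True)


def level3_post_rank(candidates, is_visible):
    """Level 3: post-ranking step, shown here as a hypothetical visibility filter."""
    return [doc for doc in candidates if is_visible(doc)]


# Hypothetical documents and model weights.
docs = [
    {"id": "d1", "text_score": 3.0, "features": {"pv": 1, "lm": 0.2}, "visible": True},
    {"id": "d2", "text_score": 2.5, "features": {"pv": 10, "lm": 0.9}, "visible": True},
    {"id": "d3", "text_score": 0.1, "features": {"pv": 50, "lm": 1.0}, "visible": True},
    {"id": "d4", "text_score": 2.0, "features": {"pv": 4, "lm": 0.5}, "visible": False},
]
weights = {"pv": 0.1, "lm": 1.0}

shortlist = level1_top_k(docs, k=3)
reranked = level2_ml_rank(shortlist, weights)
results = [doc["id"] for doc in level3_post_rank(reranked, lambda d: d["visible"])]
```

The cheap text score prunes the candidate set first, so the more expensive model only runs on K documents; note that d3 never reaches Level 2 despite its strong features.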
Thank you
We are Hiring!
ML Engineers, Engineering Managers, Software Engineers, Data Scientists
Join Salesforce Search Cloud
Mining Intent @ Work