Applying data science to sales pipelines – for fun and profit
Andy Twigg, CTO
[Diagram: data science at the intersection of data, domain expertise and machine learning]
Data:
•  62B sales pipeline records
•  Structured and unstructured data
•  3rd-party public data
•  Fine-grained temporal data
Deep expertise:
•  Sales
•  Forecasting
•  Revenue models
Machine learning:
•  Automated ML infrastructure
•  ML models tuned for specific problems
CUSTOMERS
DATA SCIENCE @ C9
•  Opportunity scoring
   •  What is Pr(win) for this deal?
   •  What is Pr(win in quarter) for this deal?
   •  How does this compare to sales team commits?
   •  Which deals can we influence most?
•  Forecasting
   •  How much will we close this quarter?
SALES PIPELINES & OPPORTUNITIES
•  Opportunities are temporal creatures: while ‘open’ they proceed through a number of observations, and they terminate in one of a discrete set of ‘closed’ states, typically won or lost
•  They usually proceed through ‘stages’, except that:
   •  An opportunity can be entered into the CRM system as already closed (no open observations)
   •  Stages are only a partial order: an opportunity can skip or revisit stages
   •  An opportunity can be reopened after it has closed
•  As the opportunity evolves, we get more and more data about it
•  A pipeline is the set of open opportunities
[Figure: example opportunity timeline]
Lead created → Stage: Qualifying → Email sent → Email sent: response → Amount = $1,000 → Call → Stage: Demo → Meeting → Demo → Push close date → Stage: Negotiation → Closed/won → Reopened → Amount = $2,000 → Closed/won
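The bullets above map onto a small data model: an opportunity is a time-ordered list of observations, its state at any moment is its latest observation, and a pipeline is whatever is open at that moment. Because the state is read from the latest observation rather than from a fixed stage sequence, skipped stages, opportunities entered as already closed, and reopened opportunities need no special handling. This is a minimal sketch; the class names, fields and stage labels are hypothetical, not C9's schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Hypothetical labels for the terminal 'closed' states.
CLOSED_STATES = {"Closed/won", "Closed/lost"}


@dataclass
class Observation:
    """One snapshot of an opportunity's CRM fields at a point in time."""
    when: date
    stage: str
    amount: float
    close_date: date


@dataclass
class Opportunity:
    opp_id: str
    observations: List[Observation] = field(default_factory=list)

    def state_at(self, when: date) -> Optional[Observation]:
        """Latest observation on or before `when`, or None if not yet created."""
        seen = [o for o in self.observations if o.when <= when]
        return max(seen, key=lambda o: o.when) if seen else None

    def is_open_at(self, when: date) -> bool:
        # Only the latest state matters, so reopened deals count as open again.
        obs = self.state_at(when)
        return obs is not None and obs.stage not in CLOSED_STATES


def pipeline_at(opps: List[Opportunity], when: date) -> set:
    """A pipeline is the set of opportunities open at `when`."""
    return {o.opp_id for o in opps if o.is_open_at(when)}
```

An opportunity entered directly as closed never appears in any pipeline snapshot, and one that is reopened reappears until it closes again.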
ANATOMY OF AN OPPTY
[Figure: history of a single opportunity, annotated: close date pushed out, then pulled back in; committed here (by the sales rep); final outcome: won; predicted won from the start; predicted won in the correct quarter]
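The pushed-out / pulled-back-in pattern in this figure suggests a simple per-opportunity signal. The sketch below counts how often the expected close date moved later or earlier between consecutive observations. The function name, and the idea of feeding these counts to a model, are illustrative assumptions rather than C9's documented features.

```python
from datetime import date
from typing import Dict, Sequence


def close_date_moves(close_dates: Sequence[date]) -> Dict[str, int]:
    """Count pushes (close date moved later) and pull-ins (moved earlier).

    `close_dates` holds the expected close date recorded at each successive
    observation of one opportunity, in observation order.
    """
    pushes = pulls = 0
    for prev, curr in zip(close_dates, close_dates[1:]):
        if curr > prev:
            pushes += 1
        elif curr < prev:
            pulls += 1
    return {"pushes": pushes, "pull_ins": pulls}
```

For a history of Mar 31 → Jun 30 → Jun 30 → May 15, this reports one push and one pull-in.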
PREDICTIVE ENGINE
Build a fine-grained history of closed opportunities
•  Cleaning
•  Preprocessing
•  Featurizing
•  ~10 GB/customer
•  ~1M training rows
1,000s of raw signals per opportunity
•  Structured (CRM, ERP)
•  Unstructured (NLP)
•  Firmographic
•  Gov sources
•  SEC filings
•  Crunchbase
•  …
Identify historic deals with similar behavior
Continuously re-score opportunities as they evolve
Update model as opportunities close
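The slide does not say how historic deals with similar behavior are identified. One common approach, shown here purely as an assumption, is k-nearest-neighbours over standardized feature vectors; the share of won deals among the neighbours then gives a rough, interpretable win estimate. The function names and the Euclidean distance are illustrative choices.

```python
import math


def _column_stats(rows):
    """Mean and population standard deviation of each feature column."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0  # avoid /0 on constant columns
        for c, m in zip(cols, means)
    ]
    return means, sds


def similar_deals(history, query, k=3):
    """Find the k closed deals most similar to an open one.

    history: list of (feature_vector, won) pairs for closed opportunities
    query:   feature vector of the open opportunity
    Returns (nearest deals, fraction of them that were won).
    """
    means, sds = _column_stats([x for x, _ in history])

    def scale(x):
        # Standardize so no single feature dominates the distance.
        return [(v - m) / s for v, m, s in zip(x, means, sds)]

    q = scale(query)
    nearest = sorted(history, key=lambda h: math.dist(scale(h[0]), q))[:k]
    win_rate = sum(won for _, won in nearest) / len(nearest)
    return nearest, win_rate
```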
•  Fully automated model-rebuilding and scoring platform
•  Model input features:
   •  Historic observations of the opportunity
   •  Sales-specific features, e.g. momentum
   •  Temporal features, e.g. std(amount over last 30 days)
   •  Industry-wide features, e.g. avg_sales_cycle(target)
•  Continuously cross-validated model tuning
•  Extensible, scalable platform using Hadoop (HDFS) and Python
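The temporal feature std(amount over last 30 days) can be computed per opportunity from its observation history. A minimal sketch, assuming the window is (as_of − 30 days, as_of], that each observation counts once (no time weighting), and that the population standard deviation is wanted; the function name is hypothetical.

```python
import statistics
from datetime import date, timedelta
from typing import List, Tuple


def amount_std_last_n_days(history: List[Tuple[date, float]], as_of: date,
                           days: int = 30) -> float:
    """std(amount over last `days` days) for one opportunity.

    history: (observation_date, amount) pairs for the opportunity.
    Returns 0.0 when fewer than two observations fall in the window.
    """
    start = as_of - timedelta(days=days)
    window = [amt for when, amt in history if start < when <= as_of]
    return statistics.pstdev(window) if len(window) >= 2 else 0.0
```

A non-zero value flags an amount that is fluctuating, the kind of signal the editor's notes call ‘amount is fluctuating’.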
BEHIND THE SCORES
•  Win/Loss Model (Random Forest): estimate Pr(win)
•  Duration Model (Poisson Regression): estimate Pr(win in quarter)
•  Influencer Model (Linear): positive/negative drivers
Features:
•  Standard features
•  Temporal features
•  Derived features
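The editor's notes describe the duration model as estimating days until close, assuming the deal is won, and then converting that into Pr(win in quarter). One way to do that conversion, assuming the Poisson regression yields a mean λ > 0 (expected days to close) and that the win/loss score and the timing combine multiplicatively, is Pr(win in quarter) = Pr(win) · P(D ≤ days left in quarter), with D ~ Poisson(λ). The function names are hypothetical.

```python
import math


def poisson_cdf(k, lam):
    """P(D <= k) for D ~ Poisson(lam), lam > 0, summing pmf terms in log space."""
    if k < 0:
        return 0.0
    return sum(
        math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
        for i in range(int(k) + 1)
    )


def pr_win_in_quarter(pr_win, expected_days_to_close, days_left_in_quarter):
    """Combine the win/loss score with the duration model's timing estimate."""
    p_close_in_time = poisson_cdf(days_left_in_quarter, expected_days_to_close)
    # Clamp guards against the floating-point sum creeping just above 1.
    return pr_win * min(p_close_in_time, 1.0)
```

A deal with Pr(win) = 0.8 and an expected 20 days to close keeps almost all of its score with a full quarter left, but very little with only 5 days left.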
©2014 All Rights Reserved
Sales team: good precision (~70-80%) but poor recall (~10-40%)
C9 won precision ~ sales team won precision
C9 won recall ~ 3x sales team won recall

                 First observation            Last observation
                 precision  recall  F1        precision  recall  F1
C9 scoring       0.65       0.86    0.74      0.75       0.93    0.83
Commit           0.70       0.07    0.13      0.87       0.45    0.59
(Commit = deals committed by the sales reps)
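The F1 column is the harmonic mean of precision and recall, the same one-line formula given in the editor's notes for this slide. The check below recomputes it from the table's precision and recall values. It is a consistency check only; the underlying predictions are not available here.

```python
def f1(p, r):
    """Harmonic mean of precision p and recall r (0.0 if both are zero)."""
    return 2.0 * p * r / (p + r) if p + r else 0.0


# (precision, recall, reported F1) taken from the table above
rows = {
    "C9 scoring, first observation": (0.65, 0.86, 0.74),
    "C9 scoring, last observation": (0.75, 0.93, 0.83),
    "Commit, first observation": (0.70, 0.07, 0.13),
    "Commit, last observation": (0.87, 0.45, 0.59),
}
for label, (p, r, reported) in rows.items():
    print(f"{label}: computed F1 = {f1(p, r):.2f}, reported = {reported:.2f}")
```

All four reported values agree with the formula to two decimal places. The low F1 of commits at the first observation comes almost entirely from their 0.07 recall.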
FORECASTING: TOP-DOWN VS BOTTOM-UP
Top-down: predict the current quarter from previous quarters
+ Accounts for seasonality and trends
- Ignores the state of the current pipeline
[Figure: Decomposition of additive time series; panels: observed, trend, seasonal, random; x-axis: time, 2013.0 to 2014.4]
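The figure shows a classical additive decomposition, y = trend + seasonal + random. Below is a minimal standard-library sketch of that textbook moving-average method, assuming a regularly spaced series with a known seasonal period and at least two full periods of data. It illustrates the top-down idea and is not necessarily what C9 ran in production.

```python
def decompose_additive(y, period):
    """Classical additive decomposition of y into trend + seasonal + remainder.

    Trend is a centred moving average over one full period (a 2 x m average
    when the period m is even); trend and remainder are None near the ends.
    """
    n = len(y)
    if n < 2 * period:
        raise ValueError("need at least two full seasonal periods")
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        if period % 2 == 0:
            # 2 x m moving average: the two end points get half weight
            total = 0.5 * y[t - half] + sum(y[t - half + 1 : t + half]) + 0.5 * y[t + half]
        else:
            total = sum(y[t - half : t + half + 1])
        trend[t] = total / period

    # Seasonal figure: mean detrended value at each position in the period,
    # shifted so that the seasonal effects sum to zero.
    sums, counts = [0.0] * period, [0] * period
    for t, tr in enumerate(trend):
        if tr is not None:
            sums[t % period] += y[t] - tr
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    offset = sum(raw) / period
    figure = [r - offset for r in raw]
    seasonal = [figure[t % period] for t in range(n)]

    remainder = [
        y[t] - trend[t] - seasonal[t] if trend[t] is not None else None
        for t in range(n)
    ]
    return trend, seasonal, remainder
```

A top-down forecast then extrapolates the trend and adds the seasonal figure for the coming quarter, which is why it captures seasonality but cannot see the current pipeline.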
Bottom-up: predict the current quarter from the currently open pipeline
+ Considers the quality of deals in the pipe
- Ignores trends and deals not in the pipe
[Figure: example bottom-up forecast: $157,000 at 77% + $200,000 at 37% + $82,000 at 86% = $265,410]
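The bottom-up total is an expected value: each open deal contributes its amount weighted by its score, so 157,000 × 0.77 + 200,000 × 0.37 + 82,000 × 0.86 = 120,890 + 74,000 + 70,520 = 265,410. A minimal sketch of that roll-up, assuming the percentages are the deals' win probabilities (the slide does not label them; Pr(win in quarter) would be the natural choice when only this quarter's closes count):

```python
def bottom_up_forecast(pipeline):
    """Expected closed amount: sum of amount x win probability over open deals."""
    return sum(amount * p_win for amount, p_win in pipeline)


deals = [(157_000, 0.77), (200_000, 0.37), (82_000, 0.86)]
print(f"${bottom_up_forecast(deals):,.0f}")  # → $265,410
```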
HYBRID FORECASTING
Top-down + bottom-up
[Figure: amount ($M) vs. weeks to end of quarter (EOQ), from 11 weeks down to 1; legend: C9, Final Amount, Actual Amount, Amount Forecast]
•  Augment the time-series model with side information from the bottom-up model, e.g.:
   •  Amount predicted to close in the current quarter
   •  Average score of currently open opportunities
   •  Average predicted days to close
•  Sometimes known as ARIMAX
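$$\log(y_t) \sim \sum_{i=1}^{12} \log(y_{t-i}) + \log(x^{(1)}_{t-12}) + \log(x^{(2)}_{t-12}) + \log(x^{(3)}_{t-12})$$
where x^{(1)}, x^{(2)}, x^{(3)} are the three side-information series listed above.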
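The formula is a log-linear regression of the current period's amount on its own 12 lags plus the three bottom-up side-information series. Below is a minimal standard-library sketch of fitting it by ordinary least squares. Strictly, this is an ARX model (autoregression with exogenous inputs); a full ARIMAX would also include differencing and moving-average terms. The function names, the intercept (an R model formula includes one by default) and the normal-equation solver are assumptions for illustration.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_log_arx(y, exog, p=12, lag=12):
    """OLS fit of log(y_t) on an intercept, log(y_{t-1..t-p}) and log(x^(k)_{t-lag}).

    y:    positive series to forecast (e.g. amount closed per period)
    exog: list of positive side-information series, each the same length as y
    Returns [intercept, p lag coefficients, one coefficient per exog series].
    """
    ly = [math.log(v) for v in y]
    lx = [[math.log(v) for v in series] for series in exog]
    rows, targets = [], []
    for t in range(max(p, lag), len(y)):
        rows.append([1.0] + [ly[t - i] for i in range(1, p + 1)] + [s[t - lag] for s in lx])
        targets.append(ly[t])
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * tgt for row, tgt in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)
```

With p = 12 and three side-information series the fit has 16 coefficients, so it needs well over 28 periods of history (12 lags plus 16 parameters) before the estimates mean anything.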

Editor's Notes

  • #3: Why is data science work at C9 different from that at other MLaaS/BI/SaaS companies? It’s ML put to work for a specific application, using domain knowledge.
  • #5: Answering each of these questions uses different techniques
  • #6: Let’s dig deeper into what goes into the scores. We take more than 1,000 raw signals per opportunity: structured CRM data, unstructured text data (NLP), firmographic data including government sources (registration filings, credit unions, SEC filings), and other sources such as job postings, Crunchbase, etc. This data is noisy, so a lot of effort goes into cleaning and preparing it: removing highly correlated subsets of fields, matching across tables, etc. Finally, the preprocessed data is transformed into a form that can be used for machine learning. For each customer, we have a large, fine-grained history of closed opportunities, stored in a temporal database; each customer typically has several GB of raw historical data. The predictive model pipeline is fully automated and runs in production using HDFS, Java and Python.
This shows some of the detail surfaced by our predictive models for a specific opportunity. We surface three main results: Is the deal likely to win? Is it likely to win this quarter? What are the main positive/negative influencers?
Can I win: a random forest model, estimating Pr(win) regardless of time horizon. Number of signals: ~200-500 raw fields, yielding more than 1,000 signals after cleaning. Sources: structured CRM data, NLP on unstructured data, ~30 government sources (registration filings, credit unions, SEC filings: number of employees, revenue), Crunchbase.
Can I win this quarter: we construct a duration model that estimates the number of days until the opportunity will close, assuming the eventual outcome is a win. We then transform this into an estimate of Pr(win in quarter).
Key indicators: we use a model that takes in specific influencing features, such as ‘budget secured’ and ‘deal momentum’. It includes both standard features in CRM data (e.g. industry) and domain-specific and temporal features (e.g. ‘momentum’, ‘amount is fluctuating’). It then surfaces the 3 features that most strongly positively or negatively influence the overall score of the deal. Constructing these fields requires domain expertise: knowing which measures are important, and which are also actionable by managers and sales reps.
  • #7: This clearly shows the temporal nature of an opportunity. Opportunities are (usually) either open or closed. They usually proceed through stages, but: they can go straight to closed; stages are only a partial order (they can skip or revisit stages); and they can be reopened after closing. As the opportunity evolves, we get more and more data. A pipeline is a set of open opportunities.
  • #13: What’s the competition? Reps’ commits. We have found that sales people typically have good won precision (~70%) but poor recall (30-45%): they aren’t good at identifying _which_ deals will close. We can identify about 2x as many deals that will eventually win, at the start of the quarter. F1 score: def f1(p, r): return 2.0 * (p * r) / (p + r). Many customers follow this pattern.
  • #14: world’s most interesting salesman
  • #15: Slide objective: present the unique capabilities of C9 OppScore, show those capabilities in the demo, and then connect them to business benefit.
Questions: Let’s dive into the product. We’ll start by talking about how C9 OppScore helps reps determine which deals they should focus on and the specific actions they can take to improve the probability of winning. How much quality coaching time do managers have with reps each week? How do your reps prioritize which deals to pursue, and how do they determine when to walk away? How long does it take a new rep to ramp, and what are you doing to enable new reps? How do you ensure your reps identify and execute on the sales strategies that have had the highest rate of success?
Talking points: C9 can help you address all of these points. In the next few minutes you’re going to see how C9 lets you understand how data science can be used to get reps focused on the right deals and applying the best actions to close those deals. After the demo, recap the business benefits.
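The key-indicators model in note #6 is linear and surfaces the three features that push a deal's score up or down most strongly. A minimal sketch, assuming each feature's contribution is its coefficient times its deviation from a baseline value (for example the historical mean); the feature names and numbers in the usage line are made up for illustration.

```python
def top_drivers(coefs, values, baseline, n=3):
    """Rank per-feature contributions of a linear model for one deal.

    contribution_j = coef_j * (value_j - baseline_j)
    Returns (top positive drivers, top negative drivers), strongest first.
    """
    contrib = {
        name: coefs[name] * (values[name] - baseline[name]) for name in coefs
    }
    ranked = sorted(contrib.items(), key=lambda kv: kv[1], reverse=True)
    positive = [(k, v) for k, v in ranked if v > 0][:n]
    negative = [(k, v) for k, v in reversed(ranked) if v < 0][:n]
    return positive, negative


# Hypothetical coefficients and one deal's feature values
coefs = {"budget_secured": 1.2, "momentum": 0.8, "amount_fluctuating": -0.9,
         "days_in_stage": -0.05, "industry_win_rate": 2.0}
values = {"budget_secured": 1, "momentum": 0.5, "amount_fluctuating": 1,
          "days_in_stage": 40, "industry_win_rate": 0.3}
baseline = {"budget_secured": 0.4, "momentum": 0.2, "amount_fluctuating": 0.3,
            "days_in_stage": 20, "industry_win_rate": 0.3}
pos, neg = top_drivers(coefs, values, baseline)
```

Here budget_secured and momentum come out as positive drivers, while days_in_stage and amount_fluctuating are the negative ones. A feature sitting exactly at its baseline contributes nothing in either direction.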