Applying data science to sales pipelines – for fun and profit

Andy Twigg
Chief Scientist
WHY APPLY DATA SCIENCE TO SALES?
Problem: sales teams are biased

•  Unrealistic targets – “you must have 3x coverage”
•  Happy ears – “they said they’ll definitely buy it”
•  Sandbagging – reps want to look like heroes, so they don’t report deals until late in the quarter

We should be able to remove these biases
•  Stat: since 1995, CRM data has increased ~150x, but forecast accuracy has dropped by 10%

⇒ data is available, but not helping
PROBLEMS
Opportunity Scoring
•  Pr(win)?
•  Pr(win in quarter)?
•  How does this compare to sales team commits?
•  Which deals can we influence most?
Forecasting
•  How much will be won this quarter?
SALES OPPORTUNITIES
•  Opportunities are temporal: each is either open or closed, and once closed, either won or lost
•  Usually proceed through stages, except:
    •  Stages are a partial order: opportunities can skip or revisit stages
    •  An opportunity can be entered as closed (no open observations)
•  As the opportunity evolves, we get more and more data about it
•  Sales teams mark an opportunity ‘committed’ when they predict a win within the quarter
•  A pipeline is a set of open opportunities
•  We want to estimate Pr(final outcome = won), Pr(closed before time t), …
[Figure: example opportunity timeline. Lead created → Stage: Qualifying → Email sent → Email opened → Amount = $1000 → Call → Stage: Validate → Meeting → Demo → Close date changed → Stage: negotiation → Outcome: Closed/won. The timeline is marked with open, committed, and closed phases.]
Data science at InsideSales.com
•  Sales team: good precision (~70-80%) but poor recall (~10-40%)
•  Model won precision ≈ sales team won precision
•  Model won recall ≈ 3 × sales team won recall

              First observation           Last observation
              precision  recall  F1       precision  recall  F1
model         0.65       0.86    0.74     0.75       0.93    0.83
sales team    0.70       0.07    0.13     0.87       0.45    0.59
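
The F1 columns in the table follow from the precision and recall columns through the standard harmonic-mean definition, F1 = 2PR / (P + R). A minimal sketch that recomputes them is given here; the function name `f1_score` and the dictionary layout are illustrative choices, not part of the original deck.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table in the slides.
table = {
    ("model", "first"): (0.65, 0.86),
    ("model", "last"): (0.75, 0.93),
    ("sales team", "first"): (0.70, 0.07),
    ("sales team", "last"): (0.87, 0.45),
}

for (who, when), (p, r) in table.items():
    print(f"{who:10s} {when:5s} F1 = {f1_score(p, r):.2f}")
```

Because F1 is a harmonic mean, the sales team's very low recall at the first observation pulls its F1 down sharply even though its precision is comparable to the model's.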
ANATOMY OF AN OPPTY
[Figure: timeline of a single opportunity with annotations: pushed out; pulled back in; committed here (by the sales rep); predicted won from the start; predicted won in the correct quarter; final outcome: won.]
SALES OPPORTUNITIES
[Figure: the example opportunity timeline, from Lead created through the Qualifying, Validate, and negotiation stages to Outcome: Closed/won, with each observation mapped to a state x0, …, xt and the final outcome mapped to the target y=1.]
•  Sequence of observations x0, x1, …
    •  associated with a fixed target y ∈ {0,1}
•  Consider states as an MDP: state xt encodes temporal features about previous states (cf. RFM features)
    •  # times this stage was previously visited, time between successive visits, time in current stage, direction of amount change, …
•  States also contain
    •  Sales-specific features, e.g. momentum
    •  External data, e.g. firmographic
    •  Global features, e.g. avg_sales_cycle(target)
•  Gives examples {(x0,y),(x1,y),…} for each opportunity
•  Shuffle to break correlations between successive examples
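
The construction of per-state training examples can be sketched as follows. This is only an illustration under stated assumptions: the `Observation` fields, the feature names, and the helpers `build_examples` and `build_training_set` are hypothetical, and only a few of the temporal features listed above are computed.

```python
import random
from dataclasses import dataclass


@dataclass
class Observation:
    day: int       # days since the opportunity was created (assumed field)
    stage: str     # stage name at this observation
    amount: float  # deal amount at this observation


def build_examples(observations, won):
    """Turn one opportunity's history into [(features, y), ...] with a fixed y."""
    examples = []
    visits = {}           # stage -> number of times the stage has been entered
    stage_entered = None  # day on which the current stage was entered
    prev = None
    for obs in observations:
        if prev is None or obs.stage != prev.stage:
            visits[obs.stage] = visits.get(obs.stage, 0) + 1
            stage_entered = obs.day
        if prev is None or obs.amount == prev.amount:
            direction = 0
        else:
            direction = 1 if obs.amount > prev.amount else -1
        features = {
            "stage": obs.stage,
            "prior_visits_to_stage": visits[obs.stage] - 1,
            "days_in_current_stage": obs.day - stage_entered,
            "amount_direction": direction,
        }
        examples.append((features, int(won)))
        prev = obs
    return examples


def build_training_set(opportunities, seed=0):
    """opportunities: list of (observations, won) pairs; returns shuffled examples."""
    examples = []
    for observations, won in opportunities:
        examples.extend(build_examples(observations, won))
    # Shuffle so that successive states of one opportunity are not adjacent.
    random.Random(seed).shuffle(examples)
    return examples


history = [
    Observation(0, "qualifying", 1000),
    Observation(5, "validate", 1000),
    Observation(9, "qualifying", 1500),   # stage revisited, amount raised
    Observation(12, "qualifying", 1500),
]
for features, y in build_examples(history, won=True):
    print(features, y)
```

Every state of a won opportunity carries the same label y=1, which is why the examples from one opportunity are strongly correlated before shuffling.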
DURATION MODEL
•  Win/loss model
    •  Pr(win)
    •  Independent of time horizon
    •  RF/GBDT

•  Duration model
    •  Pr(win within quarter)
    •  Poisson regression: assume that in the current state xt, there is a fixed probability of closing each day
    •  Train a model to predict the expected duration d, conditioned on outcome = win
    •  Integrating the corresponding exponential distribution (of interarrival times) gives Pr(close < t | win) = 1 − exp(−t/d)
    •  Pr(win < t) = Pr(win) · Pr(close < t | win)
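
The last two steps can be sketched numerically. The sketch assumes that time to close, given a win, is exponentially distributed with mean d (the predicted expected duration), so Pr(close < t | win) = 1 − exp(−t/d). The function names and the example deal are hypothetical.

```python
import math


def prob_close_within(t_days: float, expected_days: float) -> float:
    """CDF of an exponential distribution whose mean is expected_days."""
    if expected_days <= 0:
        raise ValueError("expected_days must be positive")
    return 1.0 - math.exp(-max(t_days, 0.0) / expected_days)


def prob_win_within(p_win: float, t_days: float, expected_days: float) -> float:
    """Pr(win < t) = Pr(win) * Pr(close < t | win)."""
    return p_win * prob_close_within(t_days, expected_days)


# Hypothetical deal: Pr(win) = 0.6 from the win/loss model, predicted duration
# of 45 days, and 30 days left in the quarter.
print(round(prob_win_within(0.6, 30, 45), 3))
```

Splitting the score this way lets the win/loss model stay independent of the time horizon, while the duration model alone decides how much of that win probability falls inside the quarter.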
FORECASTING: BOTTOM-UP
Bottom-up: Predict current quarter based on currently open pipeline

+  Considers quality of deals in pipeline
-  Ignores trends, deals not in pipeline

Obvious solution: expected amount in pipeline wrt Pr(win in quarter) scores

Example: $157,000 × 77% + $200,000 × 37% + $82,000 × 86% = $265,410
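
The expected-amount calculation is a probability-weighted sum over the open pipeline. A minimal sketch, using the three deal amounts and Pr(win in quarter) scores shown on the slide; the function name and the tuple layout are illustrative assumptions.

```python
def expected_pipeline_amount(pipeline):
    """Sum of amount * Pr(win in quarter) over the open opportunities."""
    return sum(amount * p_win_in_quarter for amount, p_win_in_quarter in pipeline)


# (amount in dollars, Pr(win in quarter)) for the three deals on the slide.
pipeline = [(157_000, 0.77), (200_000, 0.37), (82_000, 0.86)]

forecast = expected_pipeline_amount(pipeline)
print(f"${forecast:,.0f}")  # matches the $265,410 total shown on the slide
```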
FORECASTING: TOP-DOWN
Top-down: Predict current quarter based on previous quarters

+  Accounts for seasonality and trending
-  Ignores state of current pipeline

[Figure: “Decomposition of additive time series”, with panels observed, trend, seasonal, and random plotted against Time (2013.0 to 2014.4).]

Typical decomposition of revenue time series into 3 components:

•  Trend component
•  Seasonal component
•  Random component

Idea: try to reduce the random component by taking into account current pipeline
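
For readers unfamiliar with the decomposition, here is a minimal sketch of a classical additive decomposition (centered moving-average trend, per-position seasonal means, remainder). It is not the code behind the slide's plot; the function name and the toy series are assumptions for illustration.

```python
def decompose_additive(series, period):
    """Classical additive decomposition: series = trend + seasonal + random.

    Returns (trend, seasonal, random_part); trend and random_part are None at
    the edges where the centered moving average is undefined.
    """
    n = len(series)
    if n < 2 * period:
        raise ValueError("need at least two full periods of data")
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            # 2 x period moving average: the two end points get half weight.
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[t] = total / period

    # Average the detrended values at each position within the period.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    center = sum(means) / period  # force the seasonal effects to sum to zero
    seasonal = [means[t % period] - center for t in range(n)]

    random_part = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, random_part


# Toy quarterly series: linear trend plus a repeating seasonal pattern.
pattern = [3, -1, -2, 0]
toy = [10 + 2 * t + pattern[t % 4] for t in range(12)]
trend, seasonal, random_part = decompose_additive(toy, period=4)
print(seasonal[:4])
```

On real revenue data the random part would not vanish as it does for this toy series, and that remainder is what the pipeline features are meant to help explain.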
‘HYBRID’ FORECASTING
Top-down + bottom-up
•  Idea: augment the ARIMA model with side information from the bottom-up model
•  Allows the model to adjust coefficients in response to bottom-up features (representing the current pipeline) while retaining ARIMA features
    •  Amount predicted to close in current quarter
    •  Average score of currently open opportunities
    •  Average predicted days to close
    •  Historic adjusted coverage ratios

•  Sometimes known as ARIMAX [1]
[1] robjhyndman.com/hyndsight/arimax
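
To make the idea concrete, here is a deliberately simplified sketch: an AR(1) model with a single exogenous pipeline feature, y_t = c + φ·y_{t−1} + β·x_t, fitted by ordinary least squares. This is a stand-in for illustration, not a full ARIMA/ARIMAX fit and not the deck's actual model; the function names, the choice of one feature, and the toy data are all assumptions.

```python
def solve_linear(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_arx1(revenue, pipeline_feature):
    """Least-squares fit of y_t = c + phi * y_{t-1} + beta * x_t."""
    if len(revenue) != len(pipeline_feature) or len(revenue) < 4:
        raise ValueError("need equally long series with at least 4 periods")
    rows = [[1.0, revenue[t - 1], pipeline_feature[t]]
            for t in range(1, len(revenue))]
    targets = revenue[1:]
    k = 3
    # Normal equations: (X'X) coeffs = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)


def forecast_next(coeffs, last_revenue, current_pipeline_feature):
    """One-step forecast from last quarter's revenue and the current pipeline."""
    c, phi, beta = coeffs
    return c + phi * last_revenue + beta * current_pipeline_feature


# Toy data: x_t could be, for example, the bottom-up expected amount per quarter.
x = [0, 3, 1, 4, 1, 5, 9, 2, 6]
y = [10.0]
for t in range(1, len(x)):
    y.append(5 + 0.5 * y[-1] + 2 * x[t])

coeffs = fit_arx1(y, x)
print([round(v, 3) for v in coeffs])
```

In practice a time-series library with exogenous-regressor support would replace this hand-rolled fit; the sketch only shows how a bottom-up feature enters the regression alongside the autoregressive term.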
WORD VECTORS
•  Train a word2vec model on the text fields of opportunities
    •  description, status, risks, …
    •  “deal pushed out because no budget this quarter”

•  ~200m words
•  Gives 300-dimensional ‘neural’ word embeddings
•  Compare to the GoogleNews model
•  Learned some sales-specific concepts
In [23]: model.most_similar('lost')
Out[23]:
[('disqualified', 0.7105633020401001),
 ('killed', 0.6871206164360046),
 ('won', 0.6662579774856567),
 ('abandoned', 0.6619119048118591),
 ('closing', 0.6464139223098755),
 ('moved', 0.6406350135803223),
 ('reopened', 0.6268107891082764),
 ('closed_lost', 0.6187739968299866),
 ('low_probability', 0.6092942953109741),
 ('closed', 0.6073518395423889)]

In [24]: gn_model.most_similar('lost')
Out[24]:
[(u'losing', 0.7544215321540833),
 (u'lose', 0.7136349081993103),
 (u'regained', 0.618366003036499),
 (u'loses', 0.6115548610687256),
 (u'loosing', 0.576453447341919),
 (u'gained', 0.5561528205871582),
 (u'dropped', 0.5492223501205444),
 (u'loss', 0.5399519205093384),
 (u'won', 0.5263957977294922),
 (u'regain', 0.5241336822509766)]

In [8]: model.most_similar('pushed')
Out[8]:
[('moved', 0.8117796778678894),
 ('pushing', 0.72132408618927),
 ('delayed', 0.7004601955413818),
 ('stalled', 0.6817235946655273),
 ('indefinitely', 0.6797506809234619),
 ('until', 0.6696473360061646),
 ('shelved', 0.6633578538894653),
 ('slowed_down', 0.6619900465011597),
 ('might_slip', 0.6591036915779114),
 ('gone', 0.6582096815109253)]

In [9]: gn_model.most_similar('pushed')
Out[9]:
[(u'pushing', 0.762706458568573),
 (u'push', 0.695708692073822),
 (u'nudged', 0.6802582144737244),
 (u'shoved', 0.6162334084510803),
 (u'bumped', 0.6148176789283752),
 (u'pushes', 0.610393762588501),
 (u'dragged', 0.5916476845741272),
 (u'pulled', 0.5719939470291138),
 (u'moved', 0.5660783052444458),
 (u'inched', 0.5563575029373169)]

In [49]: model.most_similar('sdr')
Out[49]:
[('mktg', 0.6193182468414307),
 ('lead_gen', 0.5637482404708862),
 ('ppl', 0.5618690252304077),
 ('lss', 0.5492127537727356),
 ('reps', 0.5445878505706787),
 ('cold_calling', 0.5426461696624756),
 ('mkt', 0.5422939658164978),
 ('marketo', 0.5341131687164307),
 ('team', 0.532421886920929),
 ('guru', 0.5259524583816528)]

In [50]: gn_model.most_similar('sdr')
KeyError: "word 'sdr' not in vocabulary"
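
The scores in these listings are similarities between word vectors. Assuming the `most_similar` calls rank neighbours by cosine similarity, as word2vec toolkits such as gensim do, the lookup can be sketched in plain Python. The three-dimensional vectors below are made up for illustration and are not taken from either model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def most_similar(vectors, word, topn=3):
    """Return the topn (word, similarity) pairs closest to `word`."""
    if word not in vectors:
        # Mirrors the out-of-vocabulary error seen for 'sdr' in the GoogleNews model.
        raise KeyError(f"word '{word}' not in vocabulary")
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Made-up toy vectors; real embeddings are learned and 300-dimensional.
toy_vectors = {
    "pushed": [0.9, 0.1, 0.0],
    "delayed": [0.8, 0.2, 0.1],
    "moved": [0.7, 0.3, 0.0],
    "won": [0.0, 0.1, 0.9],
}

print(most_similar(toy_vectors, "pushed", topn=2))
```

A word that never appears in the training text has no vector at all, which is why a general-purpose model can fail on sales jargon such as 'sdr' while the model trained on opportunity text returns sensible neighbours.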
We’re hiring

data {scientists, engineers}

andy.twigg@insidesales.com
