Lessons Learned:
Machine Learning and
Technical Debt
Matthew Kirk
@mjkirk
Who uses data?
Responsive Enterprise
A Golden Opportunity
The Danger
The High Interest Debt of
Machine Learning
What we’re covering
• Boundary Erosion
• Data Dependencies
• Spaghetti Code
• The Real World
`whoami`
• O’Reilly author of Thoughtful Machine Learning. Use
code AUTHD to get a discount on oreilly.com.
• Former Financial Quant
• Independent Consultant
• @mjkirk
Boundary Erosion
• Entanglement
• Visibility Debt
Entanglement
Entanglement: Solution
• Isolate models as much as possible
• Regularization
Visibility Debt
Solutions
• Keeping an API Log
• Monitoring of tool use
• No sharing of usernames :)
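An API log can be as small as a wrapper that records the caller before each model call. The sketch below is only an illustration: the AccessLog class, the service names, and the model name are all hypothetical, and the log lives in memory rather than in a real store.

```ruby
# Hypothetical in-memory access log: records who called which model and when.
class AccessLog
  Entry = Struct.new(:caller_id, :model, :at)

  attr_reader :entries

  def initialize
    @entries = []
  end

  # Record the call, then run the block that actually queries the model.
  def record(caller_id, model)
    @entries << Entry.new(caller_id, model, Time.now.utc)
    yield
  end

  # Number of calls per consumer for one model, to see who depends on it.
  def consumers_of(model)
    @entries.select { |e| e.model == model }
            .group_by(&:caller_id)
            .transform_values(&:size)
  end
end

log = AccessLog.new
log.record('billing-service', 'churn_model') { 0.42 }
log.record('email-service', 'churn_model') { 0.17 }
log.record('billing-service', 'churn_model') { 0.80 }
```

Calling log.consumers_of('churn_model') then lists every consumer of the model, which only works if each consumer has its own username.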
Data Dependencies
• Unstable
• Underutilized
Unstable Data
Solution
• Versioning
• Keep a specific version of a dataset, for instance a
timestamped version of language data.
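One possible way to pin a dataset version is to copy the file under a name that carries a timestamp and a content hash. This is a sketch under assumptions: snapshot_dataset, the versions directory, and the naming scheme are hypothetical and not from the talk.

```ruby
require 'digest'
require 'fileutils'

# Copy a dataset into a versions directory under a name that carries a
# UTC timestamp and a short content hash, so a model can pin the exact
# data it was trained on. The naming scheme is an assumption.
def snapshot_dataset(path, versions_dir, now: Time.now.utc)
  digest = Digest::SHA256.file(path).hexdigest[0, 12]
  stamp = now.strftime('%Y%m%dT%H%M%SZ')
  ext = File.extname(path)
  base = File.basename(path, ext)
  target = File.join(versions_dir, "#{base}-#{stamp}-#{digest}#{ext}")
  FileUtils.mkdir_p(versions_dir)
  FileUtils.cp(path, target)
  target
end
```

Training code would then read the returned path instead of the live file, so an upstream change cannot silently alter the model's input.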
Underutilized
Solution
• Feature engineering: PCA, ICA, Random Feature
Selection, VIMP, etc.
Spaghetti Code
• Glue Code
• Pipeline Jungle
• Experimental Paths
• Configuration Debt
Glue Code
R, Matlab, Python, Java: all to use that one
implementation.
Solution
• Write your own implementation of the algorithm…
Pipeline Jungle
Conway’s Law
The Clymb’s Database V1.0
PS: No Monitoring on any of this.
Clymb DB V2.0
Solution
• Map the systems and reduce the connections between them
• Reduce organizational disconnects by attending
stand-ups and being a part of the engineering team
Experimental Paths
Solution: Tombstones
• def run_this_once_in_prod!;
Tombstone.new('2014-01-02'); end
• When you think something is dead, put a
Tombstone on it
• https://www.youtube.com/watch?v=29UXzfQWOhQ
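The slides do not show the Tombstone class itself, so the following is a guess at a minimal version: it records a hit whenever supposedly dead code runs. The hits list and its fields are assumptions; a real setup would more likely write to a log or metrics system.

```ruby
require 'date'

# Hypothetical Tombstone: marks code believed dead since a given date and
# records every time that code is reached anyway.
class Tombstone
  @hits = []

  class << self
    attr_reader :hits
  end

  def initialize(date, location = caller_locations(1, 1).first)
    @date = Date.parse(date)
    # Keep the date the code was declared dead and where the hit came from.
    self.class.hits << { since: @date, at: location.to_s }
  end
end

def run_this_once_in_prod!
  Tombstone.new('2014-01-02')
end
```

If Tombstone.hits stays empty after a long enough stretch in production, the method is a candidate for deletion. Any hit means the path is still alive.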
Configuration Debt
Solution
• Find optimal configurations regularly
• Revisit the initial configuration with new data points.
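As a toy illustration of revisiting a configuration, the sketch below re-runs a small search whenever new data arrives. The best_setting helper, the candidate values, and the constant-prediction "model" are all made up for the example.

```ruby
# Pick the candidate setting with the lowest error on the given data.
# error_fn is a hypothetical callable: (setting, data) -> numeric error.
def best_setting(candidates, data, error_fn)
  candidates.min_by { |setting| error_fn.call(setting, data) }
end

# Toy model: predict every value with one constant. The "configuration"
# is that constant, scored by mean absolute error.
mean_abs_error = lambda do |constant, data|
  data.sum { |y| (y - constant).abs } / data.size.to_f
end

candidates = [10, 20, 30]
initial_data = [9, 11, 10, 12]
new_data = initial_data + [28, 31, 29, 30, 32, 27]

initial_choice = best_setting(candidates, initial_data, mean_abs_error) # => 10
current_choice = best_setting(candidates, new_data, mean_abs_error)     # => 30
```

The setting that was best on the initial data (10) is the worst of the three once the new points are included, which is the debt a stale configuration carries.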
External World Changes
• Fixed Thresholds
• Correlation changes
Fixed Thresholds
• Laws change: the drinking age used to be 19 in
many states.
Solution
• Rebuild, or make prediction accuracy part of what
your model optimizes.
• Minimize Cost = |Actual - Predicted|
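One way to avoid a hard-coded threshold is to learn the cutoff from labeled examples and relearn it when the world changes. In this sketch the cost is the number of misclassified examples. The learn_threshold helper and the age data are invented for illustration.

```ruby
# Choose the cutoff that misclassifies the fewest labeled examples.
# examples is an array of [score, label] pairs; label is true or false.
# Candidate cutoffs are the observed scores themselves (an assumption).
def learn_threshold(examples)
  candidates = examples.map(&:first).uniq.sort
  candidates.min_by do |cutoff|
    examples.count { |score, label| (score >= cutoff) != label }
  end
end

# [age, allowed to drink] under two made-up sets of observations.
old_law = [[17, false], [18, false], [19, true], [20, true], [21, true]]
new_law = [[17, false], [18, false], [19, false], [20, false], [21, true], [22, true]]

old_cutoff = learn_threshold(old_law) # => 19
new_cutoff = learn_threshold(new_law) # => 21
```

Because the cutoff comes from the data, a change in the law shows up as a change in the learned value instead of a silent error.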
Correlations Change
Solution
• Be careful when looking for causal evidence.
Ask what happens if the model stops working.
• Iterate often
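A simple check for changing correlations is to compare the correlation seen at training time with the one in recent data. The sketch below assumes Pearson correlation and a hypothetical tolerance of 0.5. It also assumes neither series is constant, since that would make the denominator zero.

```ruby
# Pearson correlation between two equal-length numeric arrays.
def pearson(xs, ys)
  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  cov = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  var_x = xs.sum { |x| (x - mean_x)**2 }
  var_y = ys.sum { |y| (y - mean_y)**2 }
  cov / Math.sqrt(var_x * var_y)
end

# True when the correlation in the recent window differs from the
# training-time correlation by more than tolerance (an assumed setting).
def correlation_drifted?(train_x, train_y, recent_x, recent_y, tolerance: 0.5)
  (pearson(train_x, train_y) - pearson(recent_x, recent_y)).abs > tolerance
end
```

Running this on every retraining cycle gives an early warning that a feature no longer relates to the target the way the model assumes.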
Questions?
The Blissful Land of
Opportunity
Lessons Learned in One
Slide
Danger               Solutions
Entanglement         Regularize or Isolate Models
Visibility Debt      Keep an access log of who uses what
Unstable Data        Version datasets
Underutilized Data   Trim by finding better features
Glue Code            Write your own implementations
Pipeline Jungle      Find minimum cut in systems
Experimental Paths   Use Tombstones
Configuration Debt   Reconfigure with new datasets
Fixed Thresholds     Include accuracy as part of model
Correlation Changes  Trim non-causal data from models
Links and Contact
• @mjkirk
• matt@matthewkirk.com
• Machine Learning: The High-Interest Credit Card of
Technical Debt: https://bit.ly/1zs9TXi
• Is that code dead?: http://bit.ly/1sg0B1L
Photo Sources
• Cost of gigabyte: http://royal.pingdom.com/2011/12/19/would-you-pay-7260-for-a-3-tb-drive-charting-hdd-and-ssd-prices-over-time/
• Golden Opportunity: https://flic.kr/p/7xvfZr
• Problems are Opportunities: https://flic.kr/p/ifFos
• Master Charge: https://flic.kr/p/noQUh1
• Erosion: https://flic.kr/p/9agH2q
• Coupler: https://flic.kr/p/ppm9HG
• Fruit Loops: https://flic.kr/p/5rkLhP
• Somewhere in Quản Bạ, Hà Giang: https://flic.kr/p/q4K9Bo
• Data Dependencies: https://flic.kr/p/dVq7vg
• Unstable!: https://flic.kr/p/s7RLj
• Underutilized Piano: https://flic.kr/p/2sZVP
• Spaghetti: https://flic.kr/p/tuwkp
• Glue: https://flic.kr/p/6L13SK
• Pipelines at google: https://flic.kr/p/pvLQG2
