www.scling.com
DataOps - Lean principles and practices
Data 2030 Summit, 2021-02-11
Lars Albertsson, Founder, Scling
Ask not what, but how
Ideas << execution
DataOps is the "how" of data & ML
2013: Transform @ Spotify
2014: "DataOps" term first seen
2018: Conference talk rejected
2019: Most watched recording @ Data Innovation Summit
2021: DataOps day @ Data 2030 Summit
Enabling innovation
"The actual work that went into Discover Weekly was very little, because we're reusing things we already had."
https://youtu.be/A259Yo8hBRs
https://youtu.be/ZcmJxli8WS8
https://musically.com/2018/08/08/daniel-ek-would-have-killed-discover-weekly-before-launch/
"Discover Weekly wasn't a great strategic plan and 100 engineers. It was 3 engineers that decided to build something."
"I would have killed it. All of a sudden, they shipped it. It’s one of the most loved product features that we have."
- Daniel Ek, CEO
IT craft to factory
Security; Waterfall; Application delivery; Traditional operations; Traditional QA; Infrastructure
DevSecOps; Agile; Containers; DevOps; CI/CD; Infrastructure as code
Data factories
Security; Waterfall; Application delivery; Traditional operations; Traditional QA; Infrastructure; DB-oriented architecture
DevSecOps; Agile; Containers; DevOps; CI/CD; Infrastructure as code; Data factories, data pipelines, DataOps
From craft to process
Multiple time windows
Assess ingress data quality
Repair broken data from complementary source
Forecast based on history, multiple parameter settings
Assess outcome data quality
Assess forecast success, adapt parameters
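The first steps of such a process, assessing ingress data quality and repairing broken data from a complementary source, could be sketched as below. This is an illustrative sketch only, not code from the talk: the function names, the record layout (lists of dicts) and the "required field is not None" quality measure are all assumptions.

```python
def assess_quality(records, required_fields):
    """Return the fraction of records that have every required field set."""
    if not records:
        return 0.0
    ok = sum(
        all(r.get(f) is not None for f in required_fields) for r in records
    )
    return ok / len(records)


def repair(primary, complementary, key, required_fields):
    """Fill missing fields in primary records from a complementary source."""
    lookup = {r[key]: r for r in complementary}
    repaired = []
    for record in primary:
        fixed = dict(record)  # never mutate the ingress data
        fallback = lookup.get(record[key], {})
        for field in required_fields:
            if fixed.get(field) is None and fallback.get(field) is not None:
                fixed[field] = fallback[field]
        repaired.append(fixed)
    return repaired


# Hypothetical example data
primary = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
complementary = [{"id": 2, "amount": 7.5}]

before = assess_quality(primary, ["amount"])
after = assess_quality(repair(primary, complementary, "id", ["amount"]), ["amount"])
```

Measuring quality both before and after the repair step makes the effect of the repair itself observable, which is what turns a one-off fix into a repeatable process.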
Naive ML
Towards sustainable production ML
Multiple models, parameters, features
Assess ingress data quality
Repair broken data from complementary source
Choose model and parameters based on performance and input data
Benchmark models
Try multiple models, measure, A/B test
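Benchmarking several models and choosing one based on measured performance might look like the minimal sketch below. The two toy forecasters, the mean-absolute-error metric and all names are assumptions made for illustration; a production setup would plug in real models and metrics.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def naive_last_value(history, horizon):
    """Toy model: repeat the last observed value."""
    return [history[-1]] * horizon


def moving_average(history, horizon, window=3):
    """Toy model: repeat the mean of the last `window` values."""
    avg = sum(history[-window:]) / window
    return [avg] * horizon


def choose_model(models, history, holdout):
    """Benchmark every model on the holdout; return the best name and all scores."""
    scores = {
        name: mean_absolute_error(holdout, model(history, len(holdout)))
        for name, model in models.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical example data
models = {"naive_last_value": naive_last_value, "moving_average": moving_average}
best, scores = choose_model(models, history=[10, 12, 14], holdout=[12, 13])
```

Keeping the scores for every candidate, not only the winner, leaves a record that can be monitored over time as input data changes.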
The Toyota Way
Selected lean principles:
● Long-term over short-term
● The right process will produce the right results
● Eliminate waste (muda)
● Continuous improvement (kaizen)
● Use pull systems to avoid unnecessary production
● Quality takes precedence (jidoka)
○ Stop to fix problems
● Standardised tasks and processes
● Reliable technology that serves people and process
● Develop your people
● Decisions slowly by consensus
● Relentless reflection (hansei), organisational learning
Common waste species
● Cognitive waste
● Technology waste
● Delivery waste
● Operational waste
● Product waste
Companies are generally good at handling some waste forms, and blind to others.
Your blindness is your potential.
Cognitive waste
● Why do we have 25 time formats?
○ ISO 8601, UTC assumed
○ ISO 8601 + timezone
○ Millis since epoch, UTC
○ Nanos since epoch, UTC
○ Millis since epoch, user local time
○ …
○ Float of seconds since epoch, as string. WTF?!?
● my-kafka-topic-name, your_topic_name
● Definition of an order:
○ Abandoned cart?
○ Payment refused?
○ Returned goods?
○ Free promotion?
● Data entity source of truth
○ MySQL, Kafka, data lake?
● Code and documentation sprawl
○ Repositories & branches
○ Wikis
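One way to shrink the time-format sprawl listed above is to normalise every timestamp to a single canonical form at ingress. The helper below is a hypothetical sketch, not something from the talk: `to_utc_iso`, its `kind` labels, and the choice of ISO 8601 in UTC as the canonical format are all assumptions.

```python
from datetime import datetime, timezone


def to_utc_iso(value, kind):
    """Normalise a timestamp to ISO 8601 in UTC.

    `kind` is a hypothetical label naming the source format.
    """
    if kind == "iso_utc_assumed":
        # ISO 8601 without offset; the producer promises UTC.
        dt = datetime.fromisoformat(value).replace(tzinfo=timezone.utc)
    elif kind == "iso_with_tz":
        # ISO 8601 with an explicit offset; convert to UTC.
        dt = datetime.fromisoformat(value).astimezone(timezone.utc)
    elif kind == "epoch_millis_utc":
        dt = datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)
    elif kind == "epoch_seconds_float_string":
        dt = datetime.fromtimestamp(float(value), tz=timezone.utc)
    else:
        # Unknown formats fail loudly instead of being guessed at.
        raise ValueError(f"Unknown timestamp kind: {kind}")
    return dt.isoformat()


# The same instant, arriving in four different formats
same_instant = [
    to_utc_iso("2021-02-11T10:00:00", "iso_utc_assumed"),
    to_utc_iso("2021-02-11T11:00:00+01:00", "iso_with_tz"),
    to_utc_iso(1613037600000, "epoch_millis_utc"),
    to_utc_iso("1613037600.0", "epoch_seconds_float_string"),
]
```

Note that "millis since epoch, user local time" cannot be normalised without knowing the user's timezone, which is exactly the kind of cognitive waste the slide points at.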
What causes cognitive waste?
● We are autonomous!
○ Teams can choose technology, format, process, ...
● Cognitive debt
○ Short-term over long-term
○ Decisions without consensus
● Recognition and rewards
○ "You have made a similar independent pipeline, great work!"
Avoiding cognitive waste
● Reusing semantic definitions
● Reusing code & technical definitions
○ Code transparency & sharing
○ Standardised technology
○ Document decisions & consensus process
● Read-only sharing not enough
○ Must be empowered to
■ change for reuse
■ improve quality
■ delete unused
○ Low risk - what will I break downstream?
○ Standardised, end-to-end QA processes
Delivery waste - code inventory
● Code not yet fully utilised
● Code on its way to production
○ In a notebook
○ Waiting for approval
○ Waiting for release
○ Internally released, waiting for dependants to upgrade
● Tests not fully used
○ Tests that cover code (shared component), but are not yet executed
Eliminating delivery waste
● Friction from code to production
○ Positive engineering: research, writing code, tests, docs, refactor, improve
○ All else is negative
● You are limited by your assumptions
○ State of practice is far from state of the art
But the test suite takes 3 hours.
We have this checklist.
Security must approve.
X must be released before Y.
That is another team's job.
We don't have access.
We must test in staging first.
We haven't performance tested yet.
So get rid of the waste. Resources:
No tradeoff between speed and quality!
Data inventory
● Data collected, but not yet fully processed
○ Traditional lazy joins & SQL processing at runtime
○ Extract-load-transform (ELT)
● Eliminate with eager processing = pipeline
○ Process, join, denormalise
○ Extract-transform-load (ETL)
● Fatal problems → offline crash
○ "Andon" cord - stop and fix before significant harm is done
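Eager processing with an "Andon" cord could be sketched as a pipeline step that joins and denormalises up front, and stops the run when data quality is fatally bad. Everything here is an assumption for illustration: the record layout, the `DataQualityError` name and the unmatched-ratio threshold are hypothetical.

```python
class DataQualityError(Exception):
    """Raised to stop a pipeline run before bad data reaches consumers."""


def join_orders_with_users(orders, users, max_unmatched_ratio=0.01):
    """Eagerly join and denormalise orders with user attributes."""
    users_by_id = {u["user_id"]: u for u in users}
    joined, unmatched = [], 0
    for order in orders:
        user = users_by_id.get(order["user_id"])
        if user is None:
            unmatched += 1
            continue
        joined.append({**order, "country": user["country"]})
    if orders and unmatched / len(orders) > max_unmatched_ratio:
        # Pull the "Andon" cord: crash offline instead of publishing bad output.
        raise DataQualityError(
            f"{unmatched} of {len(orders)} orders have no matching user"
        )
    return joined


# Hypothetical example data
orders = [{"order_id": 1, "user_id": "a"}, {"order_id": 2, "user_id": "b"}]
users = [{"user_id": "a", "country": "SE"}]
```

With the default threshold, this example raises `DataQualityError` because half of the orders lack a matching user. The failure happens in the offline job, before any downstream consumer sees the output.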
Technology waste
NoSQL
Stream processing
Spark/Flink
Hadoop
In-memory databases
Schema registry
Data catalogue
Feature store
Change data capture
Data versioning
Governance system
Data warehouse
Lakehouse
Scaled out compute
Kubernetes
Essential
Compute machines
Workflow orchestration
RDBMS
File storage
Code version control
Visualisation
Graph processing
Deep learning
Operational waste
● Friction in operational manoeuvres
○ Fear of mistakes
○ Application-specific tooling
● Cost of incidents
○ Time to recovery
○ Impact of incident
○ Frequency of incidents
Separating offline and online
Diagram labels: Raw; Fraud service; Fraud model; Orders (twice); Replication / Backup
Prudent procedures (twice); Careful handover (twice)
Lightweight procedures
● QA driven by internal efficiency
● Continuous deployment
● New pipeline < 1 day
● Upgrade < 1 hour
● Bug recovery < 1 hour
Data processing tradeoff
Many nines uptime (99.99.. %) vs a couple of sevens
Data speed vs innovation speed
Diagram labels: Online; Nearline; Offline; Stream; Job
Product waste
● Work not driven by use case
● Unrealised data potential due to friction
○ Unawareness of data
○ Difficulty to use data
● Collaboration and communication
○ Connection
○ Overhead
Data democratisation - making data accessible and usable
Form teams aligned to value flows.
Continuous improvement & learning
● Products, not projects
○ Owned, never done, always improving
● To production early
○ Minimal fear
○ Measure and monitor to learn
● Fail & iterate
○ No blame, no penalties
● Communication across organisation essential
○ Data source team - data processing team - stakeholders
Data product quality assurance
● Product quality = f(code, data)
○ Cannot do full QA on code only
○ Only real data is production data
● Test in production
○ Quick QA cycle = quick production deployment
○ Measure, monitor, validate
Infrastructure waste
● Production environment only
○ Dev, test, staging lack production data
● Dark pipelines
○ Run in parallel
○ Monitor diff vs production
○ Roll out slowly?
∆?
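Monitoring the diff between a dark pipeline and production could be as simple as comparing the two outputs record by record. The sketch below is an assumption about how such a comparison might look: `diff_outputs` and the keyed-record layout are hypothetical names, not part of the talk.

```python
def diff_outputs(production, candidate, key):
    """Compare dark pipeline output with production output, record by record."""
    prod = {r[key]: r for r in production}
    cand = {r[key]: r for r in candidate}
    return {
        # Records the candidate pipeline dropped
        "missing": sorted(prod.keys() - cand.keys()),
        # Records only the candidate pipeline produced
        "added": sorted(cand.keys() - prod.keys()),
        # Records present in both, but with different content
        "changed": sorted(
            k for k in prod.keys() & cand.keys() if prod[k] != cand[k]
        ),
    }


# Hypothetical example outputs
production = [{"id": 1, "v": 1}, {"id": 2, "v": 2}]
candidate = [{"id": 2, "v": 3}, {"id": 3, "v": 4}]
delta = diff_outputs(production, candidate, "id")
```

An empty diff suggests that the dark pipeline is safe to promote; a non-empty one shows exactly which records to inspect before rolling out.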
Slow cycle - slow learning
Learning more about Lean & DataOps
Scling - data-value-as-a-service
Data value through collaboration
Customer
Data factory
Data platform & lake
data
domain expertise
Value from data!
Rapid data innovation
Learning by doing, in collaboration
