Strata 2017
Creating a DevOps
Practice for Analytics
Bob Eilbacher
September 28, 2017
Agenda
 About Caserta
 DevOps
 DevOps for Analytics
 Organization and Teams
 Questions
About Caserta
 Data Intelligence Consulting and Modern Data Engineering
 Award-winning data innovation
 Internationally recognized work force
 Strategy, Architecture, Governance, Implementation
About Caserta
 Architecture & Design
 Implementation Services
 Disruption Management
 Strategic Technical Consulting
 Training & Education
 Application Innovation
 Cloud Management
What is DevOps for Analytics?
First some terminology…
 DevOps
 Associated with movement primarily in application
development space for last 5-10 years
 Focused on very fast and continuous software product
releases
 Think intra-day Prod releases at Netflix, Amazon, etc.
 Convergence of development and operations
methodologies to minimize TTR
 Tons of resources – devops.com, DZone
What is DevOps for Analytics?
Some more terminology…
 DataOps
 Re-emergent term
 Seems to have a broader context
 Applying DevOps to data management or to handling
backend databases
 Also tends to carry real legacy connotation
 Manual operations of database backups and restores,
What is DevOps for Analytics?
And finally…
 AnalyticsOps
 This is a term that we see starting to be used more
 It’s focused on applying DevOps practices within a data
analytics and data science context
 This is the area we’re interested in for this talk
 We’ll use the terms AnalyticsOps or the more explicit
DevOps for Analytics interchangeably
DevOps…
 Speak with anyone and they will tell you first that DevOps
is a culture
 Based primarily on teamwork
 Aims to address the underlying conflict between
development and operations objectives
Innovation @ speed vs. Performance @ quality
Change vs. Stability
 Culture is not “implemented”
 It needs to evolve
 Good news is it can be seeded
DevOps…
 It works!
 75% of IT and product dev organizations were successfully
using DevOps to some extent
– Source: RightScale 2016 State of the Cloud Report
 It’s flexible
 No two companies’ DevOps approaches will look the same
 Infinite number of ways to create teamwork
 A reflection of the organization itself
DevOps…
 DevOps tenets
 Continuous Integration
 Test Automation
 Continuous Delivery
 Continuous Deployment
 End-to-end automation is still aspirational for most
companies
 Justify how much automation you need based on business
requirements.
DevOps…
 DevOps is not a toolchain implementation
 Tools help the team execute within the culture
 Don’t run out and put an end-to-end toolchain in place and then
expect adoption
 Let’s talk about tools for a minute…
 Explosion of both open-source and commercial DevOps
tooling
 Serve every discrete need
 requirements management, SCM, test automation, defect
tracking, build, deployment, monitoring and more
 1,500+ tools available
DevOps…
 Tooling categories:
 Code : Code development, version control tools, code merging
 Build : Continuous integration tools, build status
 Test : Test and results determine performance
 Package : Artifact repository, application pre-deployment
staging
 Release : Change management, release automation
 Configure : Infrastructure configuration and management,
Infrastructure as Code tools
 Monitor : Applications performance monitoring, end user
experience
DevOps…
Source: XebiaLabs
Why DevOps for Analytics?
“The fact is that analytic teams are
being compared by their businesses to
Amazon Prime – 2-day delivery of
almost anything”
Source: Unknown
Why DevOps for Analytics?
 A couple of recent real-world examples…
Data Science Rock Star
Process Overengineering
Why DevOps for Analytics?
 In analytics and data science projects, what used to take
months to achieve is now happening in days or hours
 Businesses typically like that and want more…
 Enabled by the strong trend toward cloud analytic
platforms/services
 Infrastructure as code (IaC) allows extension of software
development practices to servers and infrastructure
 We can automate the build of complex analytic pipelines -
storage, processing engines, etc. with relative ease
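To make the infrastructure-as-code idea concrete, here is a minimal sketch in Python. It assumes the pipeline's infrastructure is declared as plain data and compared against what currently exists. The resource names, fields, and the `plan` helper are all hypothetical and do not belong to any real IaC tool.

```python
# Desired infrastructure, declared as data. Names and fields are hypothetical.
DESIRED = {
    "raw-bucket": {"kind": "storage", "size_gb": 500},
    "spark-cluster": {"kind": "compute", "nodes": 4},
}


def plan(desired, actual):
    """List the actions needed to move the actual state to the desired state."""
    actions = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            # Anything not declared is treated as leftover and scheduled for removal.
            actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    # Hypothetical current state: an undersized cluster and a forgotten bucket.
    current = {
        "spark-cluster": {"kind": "compute", "nodes": 2},
        "old-bucket": {"kind": "storage", "size_gb": 10},
    }
    for action, name in plan(DESIRED, current):
        print(action, name)
```

Because the declaration is ordinary text, it can be versioned, reviewed, and tested like any other code, which is what lets software development practices extend to servers and infrastructure.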
DevOps for Analytics
 DevOps for Analytics combines the development and
operations teams and establishes best practices that
improve coordination between data science and operations
 BUT… Data Science and Analytics are different from
application development
 Especially in Big Data environments - need big data to test big
data applications
 Much more diverse mix of tools and technologies – not just Java
 Some differences in approach are needed
DevOps for Analytics
 AnalyticsOps is still in its early days
 There aren’t any solid industry success stories published yet
 People are still trying to figure out what works and aren’t
openly sharing experiences just yet
 Not a lot of experienced practitioners
 But there are some early themes and guidelines emerging
DevOps for Analytics
 Environments
 Separate DEV and PROD environments
 Should you reuse any of the PROD data assets?
 Separate landing area, destination area (Data Lake), etc.
 Trickier with increasing data volumes – do it smart to avoid
double costs
 Sharing compute cluster resources is OK
 Make all job inputs and outputs configuration-driven (the same
code runs in PROD and DEV) – for CI
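One way to keep job inputs and outputs configuration-driven is sketched below in Python. The environment names come from the slide. The paths, the `CONFIGS` table, and the helper names are assumptions made for illustration only.

```python
# Hypothetical per-environment settings; the paths and keys are illustrative only.
CONFIGS = {
    "DEV": {
        "input_path": "/data/dev/landing/orders",
        "output_path": "/data/dev/lake/orders",
    },
    "PROD": {
        "input_path": "/data/prod/landing/orders",
        "output_path": "/data/prod/lake/orders",
    },
}


def load_config(env):
    """Return the settings for one environment, failing loudly on unknown names."""
    try:
        return CONFIGS[env.upper()]
    except KeyError:
        raise ValueError(f"unknown environment: {env!r}") from None


def plan_job(env):
    """Describe a job run; the job logic itself never mentions DEV or PROD."""
    cfg = load_config(env)
    return f"read {cfg['input_path']} -> write {cfg['output_path']}"


if __name__ == "__main__":
    print(plan_job("dev"))
```

In practice the settings would more likely live in a separate configuration repo than in the code, so that CI can promote the same build from DEV to PROD unchanged.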
DevOps for Analytics
 Automated Testing
 It’s almost impossible to get full code coverage
 How do you unit test SPARK SQL scripts? Regression tests?
Data validation?
 Test data is a complex problem – handle as a cross-functional
initiative.
 Analytic results are often buried in complex outputs, QA
becomes forensic data analysis
 Automate what you can, supplement with community-based
real-world data testing in a parallel Dev/Test environment
 The role of the Test/QA Engineer is still really important
 Test/QA Engineers need Data Engineering experience
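One possible answer to the unit-testing question is to run the SQL under test against a tiny hand-built fixture and assert on the exact result. The Python sketch below uses the standard-library `sqlite3` module as a stand-in engine so that it runs anywhere. It is not Spark, and Spark SQL's dialect differs. The table, query, and function names are hypothetical.

```python
import sqlite3

# The query under test. In a real pipeline this text would live in its own file.
DAILY_REVENUE_SQL = """
SELECT order_date, SUM(amount) AS revenue
FROM orders
WHERE status = 'complete'
GROUP BY order_date
ORDER BY order_date
"""


def run_query(rows):
    """Load fixture rows into an in-memory table and run the query against them."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL, status TEXT)")
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        return conn.execute(DAILY_REVENUE_SQL).fetchall()
    finally:
        conn.close()


def test_cancelled_orders_are_excluded():
    fixture = [
        ("2017-09-27", 10.0, "complete"),
        ("2017-09-27", 5.0, "cancelled"),
        ("2017-09-28", 7.5, "complete"),
    ]
    assert run_query(fixture) == [("2017-09-27", 10.0), ("2017-09-28", 7.5)]


if __name__ == "__main__":
    test_cancelled_orders_are_excluded()
    print("fixture test passed")
```

Small fixtures like this cover query logic only. They do not replace the real-world data testing described above, since volume and data quality problems only show up at scale.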
DevOps for Analytics
 Monitoring
 Tracking and analyzing intra-day demand and longer-term trends
in infrastructure performance (standard DevOps)
 But then…
 By their nature analytics processes require monitoring and
tuning over time with real-world inputs
 Data drifts; Predictive models have a finite lifetime
 Silent failures
 Feedback to developers so they can see how their code is
performing and affecting the Prod environment
 Continuous improvement
 The next wave is analytics on analytics…
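As a rough illustration of monitoring for data drift and silent failures, the Python sketch below compares each production batch of a numeric feature against a baseline. The threshold, the warning strings, and the `check_batch` name are assumptions. A real drift detector would usually look at more than the mean.

```python
from statistics import mean, stdev


def check_batch(baseline, batch, max_shift=3.0, min_rows=1):
    """Return a list of warnings for one batch of a numeric feature.

    baseline: values seen when the model was built (at least two values)
    batch: values from the latest production run
    max_shift: allowed shift of the batch mean, in baseline standard deviations
    """
    warnings = []
    if len(batch) < min_rows:
        # A job that "succeeds" but produces no rows is a typical silent failure.
        warnings.append("too few rows")
        return warnings

    base_mean, base_std = mean(baseline), stdev(baseline)
    shift = abs(mean(batch) - base_mean)
    # With a constant baseline, any change at all counts as drift.
    if (base_std == 0 and shift > 0) or (base_std > 0 and shift / base_std > max_shift):
        warnings.append("mean drifted")
    return warnings


if __name__ == "__main__":
    baseline = [10, 11, 9, 10, 10]
    print(check_batch(baseline, [20, 21, 19]))
```

Feeding warnings like these back to the developers who own the job is one way to close the feedback loop described above.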
DevOps for Analytics
 Emerging DevOps for Analytics environments usually contain
 SCM
 CI
 Repo to store analytics app
 Repo to store configuration
 An API to deploy to the cluster
 Mechanism to monitor behavior and performance
DevOps for Analytics Organization
 Building a DevOps for Analytics culture is not an easy
undertaking
 Should fall under the purview of a dedicated data organization
 These organizations are typically led by the Chief Data
Officer
 More recently by a Chief Data Scientist or a Chief Analytics Officer
 Key responsibilities include
 Fostering adoption
 Clarifying and aligning to the business' vision
 Securing reasonable funding
DevOps for Analytics Organization
 The goal over time is to create lean, highly performant,
cross-functional, extremely effective teams
 Business Stakeholders
 Data Engineers
 Data Analysts & Data Scientists
 QA
 Operations
 All of these skills are important - but when in doubt get more Data
Engineers!
 Everyone on the team has an equal voice
 Everyone codes & Everyone needs to know what Prod looks like
DevOps for Analytics Organization
 Start-up Condition: Bring in an experienced set of DevOps for
Analytics Engineers
 Help define the culture, lead by example
 Identify the Innovators and get them involved and leading
 The DevOps Engineers’ job is ultimately to engineer themselves out
of the equation
Source: Matthew Skelton, DevOps Patterns - Team Topologies
Final Thoughts
“We aim to engineer systems and processes
to better integrate development and
operations, resulting in decreased time to
market and an application infrastructure
that is instrumented, scalable and fault
tolerant… and immortal!”
- Will Liu, Equinox Data Team
Final Thoughts
 There are plenty of benefits in establishing a DevOps
for Analytics culture for your organization
 For the business: Speed to insight
 For the teams: Professional and personal satisfaction
 Be Fearless –
go build your own DevOps for Analytics culture!
Questions
Happy Birthday Joe Caserta!
Thank You
 Bob Eilbacher
 Vice President Operations, Caserta
 bob@casertaconcepts.com
Upcoming Training Opportunity:
Caserta is hosting 3 Days of Training Courses October 18-20th in NYC,
taught by Joe Caserta, co-author of The Data Warehouse ETL Toolkit:
Day 1: Agile Data Warehouse Design & Dimensional Modeling
Day 2: ETL Architecture & Design
Day 3: Big Data for Data Warehouse Practitioners
More info at casertaconcepts.com/event/
