JustEnoughDevOpsForDataScientists

Just Enough DevOps for Data Scientists
abida@salesforce.com
@anyabida1
Anya Bida, SRE at Salesforce

About Anya
Sr. Member of Technical Staff (SRE)
Salesforce Production Engineering
Salesforce Einstein Platform
Co-organizer SF Big Analytics
Spark Tuning
• Cheat-sheet
• Talks
Previously at Alpine Data, SRI
PhD Mayo Clinic, BS Johns Hopkins
@anyabida1

What I am going to talk about
What is DevOps
Salesforce Einstein Scales
Our goal
Top 10 tips
What’s next?

What is DevOps?
Software Development
Network & Security
Infrastructure
Build & Release

What is DevOps?
Network &
Build & Release
Data Science

What is DevOps?
Network &
Build & Release
Data Science
• Awesome library
on SparkML
• Spark clusters
• Microservices
• Cluster, Containers

Fastest Growing Top 5 Enterprise Software Company
Revenue by fiscal year:
• FY11: $1.7B
• FY12: $2.3B
• FY13: $3.1B
• FY14: $4.1B
• FY15: $5.4B
• FY16: $6.7B
• FY17: $8.4B
• FY18 Q2: $2.56B (quarterly)
Recognition (award badges dated 2009 to 2017, September 2016, and 2011 to 2017):
• “The world’s most innovative companies”
• “Innovator of the Decade”

Our Goal
Time
Number of Predictions
Infrastructure Costs

Tip 1: Plan for Failure
Take off that Data Scientist hat now.

Simple Dashboard with KPIs

https://www.slideshare.net/jiboumans/how-to-measure-everything-a-million-metrics-per-second-with-minimal-developer-overhead
• Request & error rates
• Longest response times: upper 95th & 99th percentile
• Capacity
• Events
Jos Boumans,
Salesforce DMP
slides

Collect metrics from every machine.
Troubleshoot with all the metrics at your disposal.
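As a rough illustration of the KPIs listed above, the sketch below computes an error rate and the 95th and 99th percentile response times from a batch of requests. The function names, the nearest-rank percentile method, the "status >= 500 counts as an error" rule, and the sample data are all assumptions made for this example, not part of the talk.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(requests):
    """requests: list of (response_time_ms, status_code) tuples."""
    times = [t for t, _ in requests]
    # Assumption: any 5xx status counts as an error.
    errors = sum(1 for _, status in requests if status >= 500)
    return {
        "request_count": len(requests),
        "error_rate": errors / len(requests),
        "p95_ms": percentile(times, 95),
        "p99_ms": percentile(times, 99),
    }


# Hypothetical sample: 100 requests taking 1..100 ms, two of them server errors.
sample = [(ms, 500 if ms in (50, 100) else 200) for ms in range(1, 101)]
kpis = summarize(sample)
```

Tracking the upper percentiles rather than the mean keeps the slowest requests visible on the dashboard.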

Tip 2: Blue Green Deployments
https://docs.mobingi.com/official/guide/bg-deploy
Blue Machine
(old)
Green Machine
(new)
Users
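A minimal sketch of the blue/green idea, assuming a single router that points all users at one environment at a time. The `Router` class, the `healthy` check, and the environment dictionaries are hypothetical names for illustration; a real setup would switch a load balancer or DNS record instead.

```python
class Router:
    """Sends all user traffic to exactly one environment at a time."""

    def __init__(self, live):
        self.live = live
        self.history = [live["name"]]

    def switch_to(self, target, health_check):
        """Cut over only if the target passes its health check."""
        if not health_check(target):
            return False
        self.live = target
        self.history.append(target["name"])
        return True


def healthy(env):
    # Hypothetical health check; a real one would probe the new machines.
    return env["status"] == "ok"


blue = {"name": "blue", "version": "1.0", "status": "ok"}    # old
green = {"name": "green", "version": "1.1", "status": "ok"}  # new
broken = {"name": "green", "version": "1.2", "status": "failing"}

router = Router(live=blue)
rejected = router.switch_to(broken, healthy)   # unhealthy target: users stay on blue
switched = router.switch_to(green, healthy)    # healthy target: users move to green
rolled_back = router.switch_to(blue, healthy)  # rollback is the same operation in reverse
```

Because the old environment stays up until the new one is proven, rolling back is just another switch.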

Tip 3: Assume people make mistakes
Technical debt
• Every manual change
• Duplicate metrics
Scale down resources
• Terminate unused machines
• Janitor Monkey
• Understand the cost per job
• Jobs should not accumulate files on disk
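One way to keep jobs from accumulating files on disk is to write intermediate data into a scratch directory that is deleted when the job finishes. The sketch below uses Python's standard `tempfile.TemporaryDirectory`; the `run_job` function and its toy workload are assumptions for illustration.

```python
import os
import tempfile


def run_job(rows):
    """Write intermediate results to scratch space that is removed when the job ends."""
    with tempfile.TemporaryDirectory(prefix="job-") as scratch:
        path = os.path.join(scratch, "intermediate.txt")
        with open(path, "w") as handle:
            handle.writelines(f"{row}\n" for row in rows)
        with open(path) as handle:
            total = sum(int(line) for line in handle)
    # The scratch directory and everything in it no longer exist here.
    return total, scratch


total, scratch_path = run_job([1, 2, 3])
```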

Tip 4: Changes should be auditable
Schaper - the tool to compare schemas
https://www.linkedin.com/in/huqixiu/
Qixiu “Q” Hu

Qixiu “Q” Hu
CREATE TABLE myConferences (
    name text,
    city text,
    early_bird timeuuid,
    late_bird timeuuid,
    PRIMARY KEY ((name, city), early_bird)
) WITH CLUSTERING ORDER BY (early_bird DESC);
name text ,
city text,
late_bird timeuuid,
early_bird)
(early_bird DESC);

Qixiu “Q” Hu
name text ,
city text,
late_bird timeuuid,
early_bird)
(early_bird DESC);
name text ,
city text,
late_bird timeuuid,
discount_code string,
early_bird)
(early_bird DESC);
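The kind of comparison shown on these slides can be approximated with a small column diff. This is not Schaper, whose implementation is not shown in the talk; it is only a sketch that assumes each schema is available as a mapping of column name to type. The `before` columns come from the `myConferences` table above, while the `after` version is a hypothetical change.

```python
def diff_columns(old, new):
    """Compare two {column_name: type} mappings and report what changed."""
    added = {c: t for c, t in new.items() if c not in old}
    removed = {c: t for c, t in old.items() if c not in new}
    retyped = {c: (old[c], new[c]) for c in old if c in new and old[c] != new[c]}
    return {"added": added, "removed": removed, "retyped": retyped}


# Columns of myConferences as declared on the slide.
before = {
    "name": "text",
    "city": "text",
    "early_bird": "timeuuid",
    "late_bird": "timeuuid",
}
# Hypothetical later version with one extra column.
after = dict(before, discount_code="text")

changes = diff_columns(before, after)
```

Storing the output of such a diff alongside each deployment gives a record of who changed which column and when.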

Tip 5: Configuration management
Network Connectivity
• 20 parameters
User Access
• 50 parameters
Deploy cluster (eg Mesos)
• 20 non-default parameters
Deploy a microservice
• 50 parameters
Schedule a job
• 3 parameters
SUM × 3 regions × 20 metrics
Approx. 6000
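With this many parameters, one common approach is to keep a single template and render it once per region rather than editing each copy by hand. The sketch below uses Python's standard `string.Template`; the template fields, region names, and `render_all` helper are hypothetical and only illustrate the idea.

```python
from string import Template

# Hypothetical template; the parameter names are illustrative only.
SERVICE_TEMPLATE = Template(
    "service=$service env=$env region=$region replicas=$replicas"
)

# Assumed region names, not taken from the talk.
REGIONS = ["us-west", "us-east", "eu-west"]


def render_all(service, env, replicas):
    """Render one config line per region from a single template."""
    return [
        SERVICE_TEMPLATE.substitute(
            service=service, env=env, region=region, replicas=replicas
        )
        for region in REGIONS
    ]


configs = render_all("scoring", "prod", 3)
```

`substitute` raises `KeyError` when a parameter is missing, so an incomplete configuration fails at render time instead of in production.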

Tip 6: Pick a naming convention
<service>.<environment>.<region>.<hostname>.<metric>
Templates for Automation
Service discovery
Creating dashboards
• Prod, non-prod, …
Log queries
Cost analysis
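A naming convention like the one above pays off when names can be built and parsed by code. The sketch below assumes the five fields are joined with dots and that only the final metric field may itself contain dots (so hostnames must be dot-free); `MetricName`, `parse_metric`, and the example values are hypothetical.

```python
from typing import NamedTuple


class MetricName(NamedTuple):
    service: str
    environment: str
    region: str
    hostname: str
    metric: str

    def __str__(self):
        return ".".join(self)


def parse_metric(name):
    """Split a dotted name into its five fields; the metric itself may contain dots."""
    parts = name.split(".", 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 dot-separated fields: {name!r}")
    return MetricName(*parts)


example = MetricName("scoring", "prod", "us-west", "host01", "requests.error_rate")
full_name = str(example)
round_trip = parse_metric(full_name)
```

Once every metric parses this way, dashboards, log queries, and cost reports can filter on `environment` or `region` without per-service special cases.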

Tip 7: Permissions
Every user, service, & job should have specific, auditable permissions.
Cluster Manager
Scheduler
IAM
IAM Roles
• User has an IAM Role
• Job has an IAM Role
• IAM Roles determine read / write access to data
IAM
Out
Logs
IAM
In
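The sketch below illustrates the principle of specific, auditable permissions with a plain in-memory role table. It does not call any real IAM API; the role names, data paths, and `is_allowed` helper are assumptions made for this example.

```python
# Hypothetical role names and data paths, for illustration only.
ROLES = {
    "analyst": {"read": {"/data/in"}, "write": set()},
    "scoring-job": {"read": {"/data/in"}, "write": {"/data/out", "/logs"}},
}

audit_log = []


def is_allowed(principal, role, action, path):
    """Check a read or write against the role and record the decision."""
    allowed = path in ROLES.get(role, {}).get(action, set())
    audit_log.append((principal, role, action, path, allowed))
    return allowed


job_can_write = is_allowed("job-42", "scoring-job", "write", "/data/out")
user_can_write = is_allowed("user-1", "analyst", "write", "/data/out")
unknown_role = is_allowed("user-2", "intern", "read", "/data/in")
```

Denied requests are logged as well as granted ones, so the audit trail shows what was attempted and not only what succeeded.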

Tip 8: Understand resource allocation
Node Memory
Container Memory: 8 GB
Source: “Understanding Memory Management in Spark For Fun And Profit”, Shivnath Babu (Duke University, Unravel Data Systems) and Mayuresh Kunjir (Duke University)

Node Memory (3 nodes)
4 GB used
8 GB total
Can my 8 GB container launch on this cluster?
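The question above can be checked with a few lines of code. The sketch assumes the diagram means three nodes, each with 4 GB used out of 8 GB, and relies on the fact that a container has to fit within a single node's free memory. The `can_launch` function and the dictionary layout are hypothetical.

```python
def can_launch(container_gb, nodes):
    """A container must fit in the free memory of one node; it cannot span nodes."""
    return any(
        node["total_gb"] - node["used_gb"] >= container_gb for node in nodes
    )


# Assumed reading of the slide: three nodes, each with 4 GB used out of 8 GB.
cluster = [{"total_gb": 8, "used_gb": 4} for _ in range(3)]

free_total = sum(node["total_gb"] - node["used_gb"] for node in cluster)
fits_8gb = can_launch(8, cluster)
fits_4gb = can_launch(4, cluster)
```

Under these assumptions the cluster has 12 GB free in total, yet the 8 GB container cannot launch because no single node has more than 4 GB free.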

Tip 9: Monitor multiple viewpoints
https://light.co/camera

Tip 9: Monitor multiple viewpoints
Connectivity Viewer
https://www.linkedin.com/in/vaibhavt/
Vaibhav Tandon

Getting started tips:
1. Plan for failure
2. Blue / Green Deployments
3. Assume people make mistakes
4. Changes should be auditable
5. Configuration management
6. Pick a naming convention
7. Permissions
• user, service, job
8. Understand resource allocation
9. Monitor multiple viewpoints
10. Infrastructure as Code

Did we just automate ourselves out of our jobs?
Nope. Now we have time to take on new projects and grow…

More info:
• Jos Boumans, Salesforce DMP: slides
• Site Reliability Engineering: How Google Runs Production Systems (book)
• James Ward, Engineering & Open Source Ambassador at Salesforce
• High Performance Spark (book)

More info:
Real Time ML Pipelines in Multi-Tenant Environments
Director of Engineering Karl Skucha & Lead Engineer Yan Yang
Introduction to Machine Learning
Engineering & Open Source Ambassador James Ward
Fantastic ML apps and how to build them
Principal Engineer, Matthew Tovbin
Fireworks - lighting up the sky with millions of Sparks
Director of Engineering Thomas Gerber
Functional Linear Algebra in Scala
Engineer & Professor Vlad Patryshev
Panel: Functional Programming for Machine Learning
Saturday @ 2:10pm: Complex Machine Learning Pipelines Made Easy
Machine Learning Engineers Till Bergmann & Chris Rupley

abida@salesforce.com
@anyabida1
Anya Bida, SRE at Salesforce
