From Data Science to Production
deploy, scale, enjoy!
Sergii Khomenko, Data Scientist
sergii.khomenko@stylight.com, @lc0d3r
PyData Amsterdam - March 12, 2016
Sergii Khomenko
Data scientist at Stylight, one of the biggest fashion communities.
Data analysis and visualisation hobbyist, working on problems not only during working hours but also in free time, for fun and for personal data visualisations.
Comes from a computer engineering background.
Speaker at Berlin Buzzwords 2014, ApacheCon Europe 2014, Puppet Camp London 2015, Berlin Buzzwords 2015, Tableau Conference on Tour 2015, Budapest BI Forum 2015, Crunchsconf 2015, FOSDEM 2016.
Fellow DevOps
Quentin Nerden, Milos Radovanovic, Patrick Roelke
Profitable Leads
Stylight provides its partners with high-quality leads, enabling partner shops to leverage Stylight as an ROI-positive traffic channel.
Inspiration
Stylight offers shoppable inspiration that makes it easy to know what to buy and how to style it.
Branding & Reach
Stylight offers a unique opportunity for brands to reach an audience that is actively looking for style online.
Shopping
Stylight helps users search and shop fashion and lifestyle products smarter across hundreds of shops.
Stylight – Make Style Happen
Core Target Group
Stylight helps aspiring women between 18 and 35 to evolve their style through shoppable inspiration.
Stylight – acting on a global scale
Experienced & Ambitious Team
Innovative cross-functional organisation with flat hierarchy builds a unique team spirit.
• +200 employees
• 40 PhDs/Engineers
• 28 years average age
• 63% female
• 23 nationalities
• 0 suits
Data Scientist: Person who is better at statistics than any software engineer and better at software engineering than any statistician.
Agenda
• Early days of startups
• Software engineering
• Immutable infrastructure
• Serverless architecture
The Early Days of Startups
Problem definition:
• Many different technologies
• Hard to reproduce data science results
• Issues with backward compatibility
• Dependency hell
• Hard to scale products
• Hard to on-board new people
Software engineering
built circa 2015-16
Our stack
You are most likely doing it already
• Version control
• Cover code with tests
• nosetests, pytest, unittest2
- start small with doctests
- try out TDD: rednose, nose-watch
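A minimal sketch of starting small with doctests. The helper `conversion_rate` is hypothetical and not from the talk; it only illustrates how examples in a docstring double as tests that the standard-library `doctest` module can run.

```python
import doctest


def conversion_rate(clicks, orders):
    """Return orders per click, or 0.0 when there were no clicks.

    (Hypothetical helper, used only to illustrate doctests.)

    >>> conversion_rate(200, 5)
    0.025
    >>> conversion_rate(0, 0)
    0.0
    """
    if clicks == 0:
        return 0.0
    return orders / clicks


if __name__ == "__main__":
    # Runs every example found in the docstrings of this module.
    print(doctest.testmod())
```

pytest can collect the same examples when run with `--doctest-modules`, so the docstring tests can later sit alongside regular test functions.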
You are most likely doing it already
• Cover code with tests
• yes, even your R application could have tests
- testthat
- devtools
• Code reviews
• Pair programming
Some of the mentioned problems
• Many different technologies
• Issues with backward compatibility
• Dependency hell
• Hard to on-board new people
image from http://udaypal.com/
How it could help:
• Every technology has its own container
- just docker run
• Every package with its version defined in the Dockerfile
- have a base image for more advanced cases
• New people
- just docker run
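The idea of pinning every package version in a Dockerfile can be sketched in Python as below. This is an illustration only: `render_dockerfile`, the base image tag, and the package versions are assumptions chosen for the example, not the files shown in the talk.

```python
import json


def render_dockerfile(base_image, pip_packages, command):
    """Build Dockerfile text that pins the base image and every package version.

    All arguments are illustrative; nothing here is taken from the talk.
    """
    lines = ["FROM {}".format(base_image)]
    for name, version in sorted(pip_packages.items()):
        # One RUN line per package keeps the sketch readable;
        # real Dockerfiles often combine installs into one layer.
        lines.append("RUN pip install {}=={}".format(name, version))
    # The exec form of CMD is a JSON array, so json.dumps produces valid syntax.
    lines.append("CMD {}".format(json.dumps(command)))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile(
        "python:3.5",                                # assumed base image tag
        {"requests": "2.9.1", "flask": "0.10.1"},    # assumed pinned versions
        ["python", "app.py"],                        # assumed entry point
    ))
```

With the generated file saved as `Dockerfile`, a colleague would only need `docker build` and `docker run` to get the same environment.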
r-base/Dockerfile
lc0/docker-shiny-server
Known issues
• Images could be really huge
• Try to skip anything you do not need
• Alpine Linux as a base image
• 5 MB base image (musl libc and BusyBox)
• Iron.io has pre-built images based on Alpine
• Python, Scala, Java, Elixir, etc.
Known issues
16 MB
232 MB
Some of the mentioned problems
• Hard to roll out
• Hard to maintain production dependencies
AWS ECR
CircleCI deployments
Immutable infrastructure
Infrastructure as Code
Need to upgrade? No problem. Build a new, upgraded system and throw the old one away. New app revision? Same thing. Build a server (or image) with a new revision and throw away the old ones.
CloudFormation
cloudtools/troposphere
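A rough sketch of infrastructure as code: generating a CloudFormation template from Python. troposphere provides Python classes for this purpose; to stay dependency-free, the sketch below builds the same kind of JSON document with the standard library only. The function `build_template`, the logical name `AppServer`, and the image ID are placeholders, not values from the talk.

```python
import json


def build_template(image_id, instance_type="t2.micro"):
    """Return a CloudFormation template (as a dict) describing one EC2 instance.

    The resource name and defaults are hypothetical placeholders.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "App server built from image {}".format(image_id),
        "Resources": {
            "AppServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": image_id,
                    "InstanceType": instance_type,
                },
            }
        },
    }


if __name__ == "__main__":
    # A new app revision means a new image ID and a freshly generated template.
    print(json.dumps(build_template("ami-00000000"), indent=2, sort_keys=True))
```

Because the template is produced by code, it can live in version control and go through the same review and CI steps as the application.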
Terraform
Kubernetes and Docker {Swarm, Compose}
Serverless architecture
Possibilities
• all Lambdas in one place with version control
• integration tests with real events
• proper CI/CD setup
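Testing a Lambda with events could look roughly like the sketch below. It assumes a Python handler fed by Kinesis, where each record carries a base64-encoded payload under `record["kinesis"]["data"]`. The handler logic, the `type` field, and the helper `make_kinesis_event` are hypothetical; a captured real event stored as JSON could replace the helper.

```python
import base64
import json


def handler(event, context):
    """Decode Kinesis records from a Lambda event and count them by type."""
    counts = {}
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded inside each record.
        raw = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        payload = json.loads(raw)
        event_type = payload.get("type", "unknown")  # "type" is an assumed field
        counts[event_type] = counts.get(event_type, 0) + 1
    return counts


def make_kinesis_event(payloads):
    """Wrap plain dicts in the minimal event structure the handler reads."""
    return {
        "Records": [
            {"kinesis": {"data": base64.b64encode(
                json.dumps(p).encode("utf-8")).decode("ascii")}}
            for p in payloads
        ]
    }


def test_handler_counts_event_types():
    event = make_kinesis_event(
        [{"type": "click"}, {"type": "click"}, {"type": "sale"}])
    assert handler(event, None) == {"click": 2, "sale": 1}
```

A test like this runs under pytest in CI before any deployment step, which is what makes keeping all Lambdas in one repository practical.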
CircleCI deployments
Cloud functions
Use case: outlier detection
[Diagram: Departments, Business Intelligence, internal processes; variety of event types and structures; custom unification pipeline]
Outlier detection to Slack
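The slides do not say which detection method is used, so the sketch below is an assumption: it flags values by the modified z-score based on the median absolute deviation, then builds the JSON body that a Slack incoming webhook accepts (an object with a `text` field). The function names, the metric name, and the threshold of 3.5 are illustrative, and the HTTP POST to a webhook URL is left out.

```python
import json
import statistics


def find_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # no spread, so this rule cannot flag anything
    return [
        (i, v)
        for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]


def slack_payload(metric, outliers):
    """Build the JSON body for a Slack incoming webhook message."""
    lines = ["Outliers detected in {}:".format(metric)]
    lines += ["- position {}: {}".format(i, v) for i, v in outliers]
    return json.dumps({"text": "\n".join(lines)})


if __name__ == "__main__":
    daily_clicks = [100, 102, 98, 101, 99, 250, 100]  # made-up sample data
    found = find_outliers(daily_clicks)
    if found:
        print(slack_payload("daily_clicks", found))
```

A function of this size fits the cloud-function model well: it can be triggered by incoming events, needs no server of its own, and only produces a message when something looks unusual.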
www.stylight.com
sergii.khomenko@stylight.com
@lc0d3r
Related links
1. Testing Your Code - The Hitchhiker's Guide to Python
2. https://hub.docker.com/_/r-base/
3. http://www.alpinelinux.org/
4. https://github.com/iron-io/dockers
5. Docker Hub: A new stack plus ecosystem partners automate developer workflows
6. Trash Your Servers and Burn Your Code: Immutable Infrastructure and Disposable Components
7. https://github.com/cloudtools/troposphere
8. CloudFormation UpdatePolicy Attribute
9. https://www.terraform.io/
10. (Docker Compose + Docker Swarm) or Kubernetes
11. Google Cloud Functions
12. https://github.com/apex/apex
13. Streaming Data Processing with Amazon Kinesis and AWS Lambda