Evolution of AWS infrastructure for ML: from Zero to Hero
About me
{
  "First name": "Taras",
  "Last name": "Slipets",
  "Occupation": "Senior Software Engineer",
  "Company": "Flixbus",
  "Interests": [
    "Software Engineering",
    "Music",
    "Skiing"
  ]
}
Disclaimer: my own personal experience
Agenda
● Problem statement
● Infrastructure start point
● Infrastructure evolution
○ First automations
○ Containerisation and cost reduction
○ Reactive approach
● Future improvements
● Summary and Q&A
Customer Lifetime Value
Heterogeneous data sources
● Purchase history
● Website activities
● etc...
Fresh account
Speed up Data Science experiments
Git + SageMaker Notebook Instances
[Diagram: AWS SageMaker]
Key takeaways
● K.I.S.S.
● Concentrate efforts on business value, not on
the technology stack
Split up pipeline into clear steps
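A minimal sketch of what "clear steps" can look like in code, assuming the three steps named on the following slides (input data collection, feature extraction, model training and inference). The `Step` type, the function names and the toy data are hypothetical and only illustrate the separation; they are not the pipeline from the talk.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Context = Dict[str, Any]


@dataclass(frozen=True)
class Step:
    """One pipeline step: a name plus a function from context to context."""
    name: str
    run: Callable[[Context], Context]


def collect_input_data(ctx: Context) -> Context:
    # Placeholder data; a real step would read purchase history, website activity, etc.
    rows = [{"customer_id": 1, "purchases": 3}, {"customer_id": 2, "purchases": 0}]
    return {**ctx, "raw_rows": rows}


def extract_features(ctx: Context) -> Context:
    # Derive one toy feature per customer from the raw rows.
    features = [
        {"customer_id": r["customer_id"], "has_purchased": r["purchases"] > 0}
        for r in ctx["raw_rows"]
    ]
    return {**ctx, "features": features}


def train_and_infer(ctx: Context) -> Context:
    # Placeholder "model": a fixed score derived from the single toy feature.
    scores = {
        f["customer_id"]: 1.0 if f["has_purchased"] else 0.0
        for f in ctx["features"]
    }
    return {**ctx, "scores": scores}


PIPELINE: List[Step] = [
    Step("input data collection", collect_input_data),
    Step("feature extraction", extract_features),
    Step("model training and inference", train_and_infer),
]


def run_pipeline(steps: List[Step], ctx: Optional[Context] = None) -> Context:
    """Run the steps in order, passing each step's output to the next one."""
    current: Context = dict(ctx or {})
    completed: List[str] = []
    for step in steps:
        current = step.run(current)
        completed.append(step.name)
    current["completed_steps"] = completed
    return current
```

Because each step only depends on the context it receives, a step can later be moved to its own service or container without touching the others.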
Input data collection at scale
Feature extraction at scale
[Diagram: Lambda, Redshift, Lambda, S3, Glue, Athena]
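The diagram names Athena among the services used here. As a hedged illustration, the sketch below builds the request for the boto3 Athena call `start_query_execution`. The database, table, SQL and S3 output location are hypothetical; `athena_client` is assumed to be something like `boto3.client("athena")` and is passed in, so the request builder runs without AWS access.

```python
from typing import Any, Dict


def build_feature_query_request(
    database: str, table: str, output_location: str
) -> Dict[str, Any]:
    """Build keyword arguments for Athena's start_query_execution."""
    # Hypothetical feature: number of purchases per customer.
    query = (
        "SELECT customer_id, COUNT(*) AS purchase_count "
        f'FROM "{table}" GROUP BY customer_id'
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def start_feature_query(athena_client: Any, request: Dict[str, Any]) -> str:
    """Submit the query and return its execution id (the query runs asynchronously)."""
    response = athena_client.start_query_execution(**request)
    return response["QueryExecutionId"]
```

Athena writes the query result to the given S3 location, which a later step can read as its input.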
Model training and inference
[Diagram: AWS SageMaker, S3, S3, CRON + Bash]
We used a VERY big machine
Key takeaways
● Clearly define what you actually need - there might be
a ready-to-use service from your cloud provider
● Cloud resources are cheaper than lengthy manual
infrastructure setup and optimisation
MVP in 1 month
Containerisation
[Diagram: ECR]
CI / CD Pipelines
Pipeline Scheduling
[Diagram: ECR, S3, S3, ECS]
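One way a scheduler could start a containerised pipeline step on ECS is the boto3 call `run_task`. The sketch below is an assumption about how such a trigger might look, not the setup from the talk: the cluster, task definition, container name and command are hypothetical, and `ecs_client` is assumed to be something like `boto3.client("ecs")`. No launch type is set, so the cluster's default capacity settings would apply.

```python
from typing import Any, Dict, List, Sequence


def build_run_task_request(
    cluster: str,
    task_definition: str,
    container_name: str,
    command: Sequence[str],
) -> Dict[str, Any]:
    """Build keyword arguments for ECS run_task for a single pipeline step."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "count": 1,
        "overrides": {
            "containerOverrides": [
                # Override the container command so one image can serve several steps.
                {"name": container_name, "command": list(command)}
            ]
        },
    }


def run_step(ecs_client: Any, request: Dict[str, Any]) -> List[str]:
    """Start the task and return the ARNs of the tasks that were launched."""
    response = ecs_client.run_task(**request)
    return [task["taskArn"] for task in response["tasks"]]
```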
Cost reduction
[Diagram: CRON, ECS, 24 / 7 On Demand, 44%]
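The arithmetic behind moving from an always-on machine to scheduled execution can be sketched as follows. The hourly rate and the hours per day are hypothetical and are not the numbers behind the 44% shown on the slide; the sketch also assumes the same hourly price in both cases and ignores storage and data transfer.

```python
def monthly_cost(hourly_rate: float, hours_per_day: float, days: int = 30) -> float:
    """Cost of running one machine for the given hours per day over a month."""
    return hourly_rate * hours_per_day * days


def saving_fraction(hours_per_day: float) -> float:
    """Share of the 24/7 cost saved by running only hours_per_day hours."""
    return 1 - hours_per_day / 24


# Hypothetical example: a machine at 2.0 per hour, needed 6 hours per day.
always_on = monthly_cost(2.0, 24)   # 2.0 * 24 * 30
scheduled = monthly_cost(2.0, 6)    # 2.0 * 6 * 30
```

Measured run times per step are what make this calculation meaningful, which is why the numbers should come from the actual pipeline rather than from estimates.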
Key takeaways
● Get actual practical numbers
● Actual bottlenecks might be completely different from your
preliminary gut feelings
● Pick the low-hanging fruit first
Scheduled steps
[Diagram: ECS]
Aligned reactive execution
[Diagram: ECS, Step Function]
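A rough sketch of how a Step Functions state machine can chain ECS tasks so that each step starts as soon as the previous one finishes, instead of at a fixed time. It builds an Amazon States Language definition using the `ecs:runTask.sync` service integration, which waits for the task to complete. The step names, cluster ARN and task definition names are hypothetical, and a real definition would likely need more parameters (for example launch type and networking).

```python
import json
from typing import Any, Dict, List, Tuple


def build_state_machine(
    cluster_arn: str, steps: List[Tuple[str, str]]
) -> Dict[str, Any]:
    """Chain (state name, task definition) pairs into a sequential state machine."""
    if not steps:
        raise ValueError("at least one step is required")
    states: Dict[str, Any] = {}
    for index, (name, task_definition) in enumerate(steps):
        state: Dict[str, Any] = {
            "Type": "Task",
            # .sync makes Step Functions wait until the ECS task has stopped.
            "Resource": "arn:aws:states:::ecs:runTask.sync",
            "Parameters": {
                "Cluster": cluster_arn,
                "TaskDefinition": task_definition,
            },
        }
        if index + 1 < len(steps):
            state["Next"] = steps[index + 1][0]
        else:
            state["End"] = True
        states[name] = state
    return {
        "Comment": "Sequential ML pipeline (illustrative)",
        "StartAt": steps[0][0],
        "States": states,
    }


def to_json(definition: Dict[str, Any]) -> str:
    """Serialise the definition in the form expected when creating a state machine."""
    return json.dumps(definition, indent=2)
```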
Monitoring and metrics
[Diagram: CloudWatch, QuickSight]
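As an illustration of publishing pipeline metrics to CloudWatch, the sketch below builds requests for the boto3 call `put_metric_data`. The namespaces, metric names and the `PipelineStep` dimension are hypothetical; using one namespace for technical metrics and another for business metrics is just one possible way to keep the two apart. `cloudwatch_client` is assumed to be something like `boto3.client("cloudwatch")`.

```python
from typing import Any, Dict

# Hypothetical namespaces that keep technical and business metrics apart.
TECHNICAL_NAMESPACE = "MLPipeline/Technical"
BUSINESS_NAMESPACE = "MLPipeline/Business"


def build_metric_request(
    namespace: str, metric_name: str, value: float, unit: str, step: str
) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch put_metric_data with one data point."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": metric_name,
                "Dimensions": [{"Name": "PipelineStep", "Value": step}],
                "Value": float(value),
                "Unit": unit,
            }
        ],
    }


def publish(cloudwatch_client: Any, request: Dict[str, Any]) -> None:
    """Send the data point to CloudWatch."""
    cloudwatch_client.put_metric_data(**request)


# Example requests: step duration (technical) and customers scored (business).
duration = build_metric_request(
    TECHNICAL_NAMESPACE, "StepDuration", 42.5, "Seconds", "feature-extraction"
)
scored = build_metric_request(
    BUSINESS_NAMESPACE, "CustomersScored", 1000, "Count", "inference"
)
```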
Infrastructure as code
Key takeaways
● Think about observability of the entire end-to-end flow
● Start with simple, but business-relevant KPIs
● Separate business and technical metrics
● Offload data engineers' knowledge into code
Future improvements
● AWS SageMaker ecosystem usage for model training and
inference
● Feature extraction optimisation via AWS Athena and
Glue
● Custom AWS CloudWatch metrics for pipeline execution
monitoring
● Make model training and inference independent
● Enable real-time inference alongside batch processing
● ...
Summary
● Start simple, bring business value
● Cloud resources are cheap and fast to set up
● Premature optimisation based on guessing is bull$&it
● Fully-fledged infrastructure automation should come after you
have an MVP, not before
● Have measurable and observable metrics
Q&A
