© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Roy Ben-Alta, Head Of WW Data Analytics Practice at AWS
November 2019
Modern Data Platforms
Thinking Data Flywheel on the Cloud
Amazon Flywheel
The Data Flywheel
The Cloud Was Built for Data Analytics and ML
• Agility: Try more, fail fast, go big or start small, and process data at any scale
• Scalability: Run jobs any time, without guessing capacity or limiting functionality
• Get to Insights Faster: Focus on data science, not the undifferentiated heavy lifting of managing raw data
• Broadest and Deepest Capabilities: Numerous serverless & managed Big Data services to address any workload
• Low Cost: Pay only for what you use, when you use it
• Data Migrations Made Easy: Move exabyte-scale data to the cloud quickly and cost-effectively
Built on top of open source
Data problem: high-level solution approach
Problem statement | AWS solution approach
Data is undiscovered in on-premises historians | Build a consistent and open datastore for all your sites using flexible ingestion tools
Customers want to use AI/ML, but are usually early in their data journey | Deploy hosted and managed future-proof tools as you move from descriptive to preventive to predictive analytics
Data has gravity but is often untapped | Democratize access to the data for actionable operational insights
You need a Data Lake
Diagram labels: Data from Sources; Customer and Insights
1. Collect and store any data at any scale and at low cost
2. One home for all data – no silos
3. Data security is the #1 priority
4. Right tool for the job
5. Democratize access to the data as per security entitlements
6. Connect to internal and external apps
Key ‘Components’ of a Data Lake on AWS
Diagram labels: Athena; Query Service; Batch; Glue; IoT; Lambda; SageMaker; Glue Catalog
Storing is not enough, data needs to be discoverable
“Dark data are the information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).” Gartner
Source: https://www.gartner.com/it-glossary/dark-data
Diagram labels: Traditional enterprise data; a.k.a. Big data; Dark data; CRM; ERP; Data warehouse; Mainframe data; Web; Social; Log files; Machine data; Semi-structured; Unstructured
There’s a service for that: AWS Lake Formation
Diagram labels: Data Lake Storage; Data Catalog; Access Control; Data import; Lake Formation; Crawlers; ML-based data prep; Amazon S3
Use Amazon S3 as the storage layer for Lake Formation
• Ask Lake Formation to create required S3 buckets and import data into them
• Register existing S3 buckets that contain your data
• Data is stored in your account: You have direct access. No lock-in.
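The "register existing S3 buckets" step might look roughly like the Python sketch below. It is an illustration under assumptions, not code from the deck: the bucket name `example-data-lake-bucket` and the helpers `bucket_arn` and `build_register_request` are hypothetical. The sketch only builds the arguments for the boto3 Lake Formation client's `register_resource` call, so it runs without AWS credentials.

```python
def bucket_arn(bucket: str, prefix: str = "") -> str:
    """Return the S3 ARN for a bucket, optionally narrowed to a key prefix."""
    arn = f"arn:aws:s3:::{bucket}"
    return f"{arn}/{prefix.strip('/')}" if prefix else arn


def build_register_request(bucket: str, prefix: str = "") -> dict:
    """Build keyword arguments for Lake Formation's register_resource call.

    UseServiceLinkedRole=True asks Lake Formation to access the location
    with its service-linked role instead of a custom IAM role.
    """
    return {
        "ResourceArn": bucket_arn(bucket, prefix),
        "UseServiceLinkedRole": True,
    }


# Hypothetical bucket name, used only for illustration.
request = build_register_request("example-data-lake-bucket", "raw/")
print(request["ResourceArn"])

# With AWS credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("lakeformation").register_resource(**request)
```

Keeping the request builder separate from the AWS call makes the registration parameters easy to review and test before anything touches the account.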
How Does It Work
EMR: Big Data Toolbox
Amazon EMR provides a managed framework for Hadoop, Spark, and more that makes it easy, fast, and cost-effective to process vast amounts of data, and to train models on it, on dynamically scalable instance fleets.
Easy & Cost Effective
Spin up clusters in minutes
Easy
• Launch Hadoop & Spark clusters in minutes
• No need to install or maintain Hadoop
• Cluster tuning and configuration done for you
• Latest Hadoop versions within 30 days of release
Cost Effective
• EMR provides 57% lower costs vs. on-premises in Year 1
• 342% ROI over 5 years
• New design patterns for storage, transient clusters, Spot Instances
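The transient-cluster and Spot Instance pattern named above might be sketched in Python as follows. Everything specific here is an assumption made for illustration: the helper `build_transient_cluster_request`, the release label, the instance types, the S3 paths, and the default role names are not from the deck. The sketch only builds the arguments for the boto3 EMR client's `run_job_flow` call and does not contact AWS.

```python
def build_transient_cluster_request(name: str, log_uri: str, script_uri: str,
                                    core_nodes: int = 2) -> dict:
    """Build keyword arguments for the EMR run_job_flow call."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.28.0",  # assumed release; choose a current one
        "LogUri": log_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                # Keep the master node On-Demand so the cluster stays stable.
                {"Name": "Master", "InstanceRole": "MASTER",
                 "Market": "ON_DEMAND", "InstanceType": "m5.xlarge",
                 "InstanceCount": 1},
                # Run the core nodes on Spot capacity to reduce cost.
                {"Name": "Core", "InstanceRole": "CORE",
                 "Market": "SPOT", "InstanceType": "m5.xlarge",
                 "InstanceCount": core_nodes},
            ],
            # False makes the cluster transient: it shuts down after its steps.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "spark-job",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default instance profile
        "ServiceRole": "EMR_DefaultRole",      # assumed default service role
    }


# Hypothetical names and S3 paths, used only for illustration.
request = build_transient_cluster_request(
    "nightly-etl",
    "s3://example-data-lake-bucket/emr-logs/",
    "s3://example-data-lake-bucket/jobs/etl.py",
)
print([group["Market"] for group in request["Instances"]["InstanceGroups"]])

# With AWS credentials configured, the cluster could be launched like this:
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Because input and output live in S3 rather than on the cluster, the cluster can be thrown away after each job, which is what makes the transient pattern workable.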
Amazon SageMaker:
Build, train, and deploy ML models at scale
1. Collect and prepare training data; choose and optimize your ML algorithm
2. Set up and manage environments for training; train and tune ML models
3. Deploy models in production; scale and manage the production environment
Web/App
• Websites, web apps, mobile apps
• Enterprise applications (BI, CRM, etc.)
• External data (partners, weather, traffic, etc.)
Classic
• Traditional databases, data warehousing
• Historical data, logs, archives
• Big data
Industry
• Production facilities
• Control devices
• IoT sensors
Data Lake (diagram built up across several slides)
• Data sources: Web/App, Classic, Industry
• Into the Data Lake: Streaming, Import/Batch
• Data Lake: Encryption, Data catalog classification, Access control
• On top of the Data Lake: ETL, Pre-processing, Machine learning, Monitoring, Control, Applications, API
Building a data lake
Diagram (built up across several slides): Data Lake, fed by Streaming and Import/Batch
• On-premises Data Movement: AWS Direct Connect, AWS Snowball, AWS Snowmobile, AWS Database Migration Service
• Real-time Data Movement: Amazon Managed Streaming for Apache Kafka, Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams, Amazon Kinesis Video Streams
• Analytics: Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Elasticsearch Service, Amazon Kinesis Data Analytics, Amazon QuickSight
• Machine Learning: Amazon SageMaker, Amazon Rekognition, Amazon Comprehend, Amazon Translate, Amazon Transcribe, etc.
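As one illustration of real-time data movement, the Python sketch below prepares a record for Amazon Kinesis Data Firehose. It is an assumption-laden example rather than anything shown in the deck: the delivery stream name `example-clickstream`, the event fields, and the helper `build_firehose_record` are hypothetical. The sketch only builds the arguments for the boto3 Firehose client's `put_record` call.

```python
import json


def build_firehose_record(stream_name: str, event: dict) -> dict:
    """Build keyword arguments for the Firehose put_record call.

    The event is serialized as one JSON object per line, so that objects
    delivered to the data lake stay line-delimited and easy to query.
    """
    payload = json.dumps(event, sort_keys=True) + "\n"
    return {
        "DeliveryStreamName": stream_name,
        "Record": {"Data": payload.encode("utf-8")},
    }


# Hypothetical stream name and event, used only for illustration.
request = build_firehose_record(
    "example-clickstream",
    {"device_id": "sensor-42", "temperature_c": 21.5},
)
print(request["Record"]["Data"])

# With AWS credentials configured, the record could be sent like this:
#   import boto3
#   boto3.client("firehose").put_record(**request)
```

The trailing newline matters in practice: Firehose concatenates records when it writes to S3, so without a delimiter the JSON objects would run together.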
Customer Case Study
Customers running data analytics and ML on AWS
Hybrid Cloud Storage
Diagram labels: On-Premises Data Center; AWS Storage Gateway; AWS DataSync; AWS Transfer for SFTP; Amazon S3; Archival; Processing; Analytics
Thank You
benaltar@amazon.com
AWS data and analytic services
Any analytic workload, any scale, at the lowest possible cost
• Insights: Amazon QuickSight, Amazon SageMaker, Amazon Comprehend, Amazon Rekognition
• Analytics: Amazon Redshift (data warehouse), Amazon EMR (data processing), Amazon Athena (interactive), Amazon Elasticsearch Service, Amazon Kinesis Data Analytics; one further label, Real-time, appears in this row
• Data Lake: Amazon S3/Amazon Glacier (storage), AWS Glue (ETL), AWS Lake Formation / AWS Glue Data Catalog (metadata & governance)
• Data Movement: AWS Database Migration Service | AWS Snowball | AWS Snowmobile | Amazon Kinesis Data Firehose | Amazon Kinesis Data Streams
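To make the interactive analytics layer concrete, the Python sketch below prepares an Amazon Athena query against a table registered in the Glue Data Catalog. The database `example_lake_db`, the table `events`, the results location, and the helper `build_athena_query_request` are hypothetical names chosen for illustration. The sketch only builds the arguments for the boto3 Athena client's `start_query_execution` call.

```python
def build_athena_query_request(database: str, table: str,
                               output_location: str, limit: int = 10) -> dict:
    """Build keyword arguments for the Athena start_query_execution call."""
    # int() guards the LIMIT clause against non-numeric input.
    query = f'SELECT * FROM "{table}" LIMIT {int(limit)}'
    return {
        "QueryString": query,
        # The database name refers to a database in the Glue Data Catalog.
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical names, used only for illustration.
request = build_athena_query_request(
    "example_lake_db",
    "events",
    "s3://example-data-lake-bucket/athena-results/",
    limit=5,
)
print(request["QueryString"])

# With AWS credentials configured, the query could be started like this:
#   import boto3
#   boto3.client("athena").start_query_execution(**request)
```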
