© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
@julsimon
Webinar “Salon du Big Data”
02/03/2016
Simplify Big Data with AWS
Julien Simon, Principal Technical Evangelist
Simplify Big Data Processing
•  Stages: ingest / collect, store, process / analyze, consume / visualize
•  Constraints: Time to Answer (Latency), Throughput, Cost
Collect / Ingest
Types of Data
•  Transactional
    •  Database reads & writes (OLTP)
    •  Cache
•  Search
    •  Logs
    •  Streams
•  File
    •  Log files (/var/log)
    •  Log collectors & frameworks
•  Stream
    •  Log records
    •  Sensors & IoT data
[Diagram: applications (mobile apps on iOS and Android, web apps), logging (Logstash), and IoT devices produce transactional, search, file, and stream data. In the Collect and Store stages these map to a database, search, file storage, and stream storage respectively.]
Store
•  Database: Amazon RDS (SQL), Amazon DynamoDB (NoSQL), Amazon ElastiCache (cache)
•  Search: Amazon ES
•  File storage: Amazon S3, Amazon Glacier
•  Stream storage: Apache Kafka, Amazon Kinesis, Amazon DynamoDB
[Diagram: the Collect and Store pipeline, built up over several slides, with each data type (transactional, search, file, stream) routed from mobile apps, web apps, applications, logging (Logstash), and IoT to the matching store above.]
Database + Search Tier
[Diagram: the Collect and Store pipeline again, focusing on the database and search tier. Stores shown: Amazon RDS / Aurora, Amazon DynamoDB, Amazon ElastiCache, Amazon ES, Amazon S3, Amazon Glacier, Apache Kafka, Amazon Kinesis.]
Database + Search Tier Anti-pattern
[Diagram: applications using a single RDBMS as the whole database + search tier.]
What Data Store Should I Use?

                       | Amazon ElastiCache | Amazon DynamoDB      | Amazon Aurora     | Amazon Elasticsearch | Amazon EMR (HDFS) | Amazon S3                  | Amazon Glacier
Average latency        | ms                 | ms                   | ms, sec           | ms, sec              | sec, min, hrs     | ms, sec, min (~ size)      | hrs
Data volume            | GB                 | GB–TB (no limit)     | GB–TB (64 TB max) | GB–TB                | GB–PB (~ nodes)   | MB–PB (no limit)           | GB–PB (no limit)
Item size              | B–KB               | KB (400 KB max)      | KB (64 KB)        | KB (1 MB max)        | MB–GB             | KB–GB (5 TB max)           | GB (40 TB max)
Request rate           | High – Very High   | Very High (no limit) | High              | High                 | Low – Very High   | Low – Very High (no limit) | Very Low
Storage cost, GB/month | $$                 | ¢¢                   | ¢¢                | ¢¢                   | ¢                 | ¢                          | ¢/10
Durability             | Low – Moderate     | Very High            | Very High         | High                 | High              | Very High                  | Very High

Data temperature, left to right: Hot Data, Warm Data, Cold Data
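As a rough illustration of reading the table along one dimension, the sketch below filters stores by the item-size ceilings printed on the slide. The figures are the 2016 slide values and may not match current service limits. The function name, the binary unit constants, and the choice to treat Aurora's "(64 KB)" as a ceiling are assumptions made for this example, not part of the original deck.

```python
# Item-size ceilings as printed on the slide (2016 figures, possibly outdated).
# Stores for which the slide states no ceiling are mapped to None.
KB, MB, TB = 1024, 1024**2, 1024**4  # assumption: binary units

MAX_ITEM_BYTES = {
    "Amazon ElastiCache": None,
    "Amazon DynamoDB": 400 * KB,
    "Amazon Aurora": 64 * KB,  # assumption: "(64 KB)" read as a ceiling
    "Amazon Elasticsearch": 1 * MB,
    "Amazon EMR (HDFS)": None,
    "Amazon S3": 5 * TB,
    "Amazon Glacier": 40 * TB,
}


def stores_with_room_for(item_bytes):
    """Return the stores whose stated item-size ceiling fits item_bytes.

    Stores without a stated ceiling are left out, because the table gives
    no number to compare against.
    """
    return [
        name
        for name, limit in MAX_ITEM_BYTES.items()
        if limit is not None and item_bytes <= limit
    ]


candidates = stores_with_room_for(500 * KB)
```

Item size is only one row of the table. A real choice would also weigh latency, request rate, cost, and durability, which this sketch ignores.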
Process / Analyze
[Diagram: the Analyze stage added to the Collect and Store pipeline. Processing categories: batch, interactive, stream processing, and ML. Tools shown: Amazon Redshift, Impala, Pig, Streaming, Amazon ML, Amazon Kinesis, AWS Lambda, Amazon Elastic MapReduce. Stores are labeled hot, warm, or cold.]
Analysis Tools and Frameworks
Machine Learning
•  Mahout, Spark ML, Amazon ML
Interactive Analytics
•  Amazon Redshift, Presto, Impala, Spark
Batch Processing
•  MapReduce, Hive, Pig, Spark
Stream Processing
•  Micro-batch: Spark Streaming, KCL, Hive, Pig
•  Real-time: Storm, AWS Lambda, KCL
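The difference between the two stream processing styles can be sketched in plain Python. This is only an illustration of the idea, not how Spark Streaming, Storm, KCL, or AWS Lambda are implemented: the function names are hypothetical, and the batches here are count-based for simplicity, whereas micro-batch frameworks typically cut batches by time interval.

```python
from itertools import islice


def process_per_record(records, handle):
    """Real-time style: call handle() once for every incoming record."""
    return [handle([record]) for record in records]


def process_micro_batch(records, handle, batch_size):
    """Micro-batch style: buffer up to batch_size records per handle() call."""
    iterator = iter(records)
    results = []
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        results.append(handle(batch))
    return results


events = [3, 1, 4, 1, 5, 9, 2]
per_record = process_per_record(events, sum)                  # one result per event
micro_batch = process_micro_batch(events, sum, batch_size=3)  # one result per batch
```

Per-record handling gives the lowest latency for each event, while micro-batching trades some latency for fewer, larger processing calls.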
[Diagram: the Analyze column of the pipeline, grouping Amazon Redshift, Impala, Pig, Streaming, Amazon Machine Learning, Amazon Kinesis, AWS Lambda, and Amazon Elastic MapReduce under batch, interactive, stream processing, and ML.]
What Data Processing Technology Should I Use?

                  | Amazon Redshift | Impala    | Presto    | Spark          | Hive
Query latency     | Low             | Low       | Low       | Low            | Medium (Tez) – High (MapReduce)
Durability        | High            | High      | High      | High           | High
Data volume       | 1.6 PB max      | ~ Nodes   | ~ Nodes   | ~ Nodes        | ~ Nodes
Managed           | Yes             | Yes (EMR) | Yes (EMR) | Yes (EMR)      | Yes (EMR)
Storage           | Native          | HDFS / S3 | HDFS / S3 | HDFS / S3      | HDFS / S3
SQL compatibility | High            | Medium    | High      | Low (SparkSQL) | Medium (HQL)

Query latency: low is better.
Consume / Visualize
[Diagram: the Collect, Store, Analyze, Consume pipeline. The Consume stage adds analysis & visualization (Amazon QuickSight), notebooks, predictions, apps & APIs, and IDE, with ETL between stages. Stores are labeled hot, warm, or cold; processing paths are labeled fast or slow.]
Consume
•  Predictions
•  Analysis and Visualization
•  Notebooks
•  IDE
•  Applications & API
[Diagram: the Consume column, fed by the Store and Analyze stages and by ETL: analysis & visualization (Amazon QuickSight), notebooks, predictions, apps & APIs, IDE. Audiences shown: business users; data scientists and developers.]
Putting It All Together
[Diagram: the complete Collect, Store, Analyze, Consume reference architecture, from data sources (mobile apps, web apps, applications, logging, IoT) through the stores and processing tools to Amazon QuickSight, notebooks, predictions, apps & APIs, and IDE.]
Interactive & Batch Analytics
•  Producer
•  store: Amazon S3
•  process (batch): Amazon EMR (Hive, Pig, Spark)
•  process (interactive): Amazon Redshift, Amazon EMR (Presto, Impala, Spark)
•  Amazon ML: Batch Prediction, Real-time Prediction
•  Consume
Real-time Analytics
•  Producer
•  store: Apache Kafka, Amazon Kinesis, DynamoDB Streams
•  process: KCL, AWS Lambda, Spark Streaming, Apache Storm
•  Downstream services: Amazon SNS (Notifications), Amazon ML, Amazon ElastiCache (Redis), Amazon DynamoDB, Amazon RDS, Amazon ES
•  Results: Alert, App state, Real-time Prediction, KPI
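To make the process step concrete, here is a minimal sketch of a stream consumer that keeps a rolling KPI and raises an alert flag. It stands in for whatever runs in the processing tier (KCL, AWS Lambda, Spark Streaming, or Storm) and uses no AWS API. The class name, the window size, and the threshold are all hypothetical choices for this example.

```python
from collections import deque


class RollingKpi:
    """Keep the mean of the last `window` values and flag threshold breaches."""

    def __init__(self, window, threshold):
        self.values = deque(maxlen=window)  # old values fall off automatically
        self.threshold = threshold

    def update(self, value):
        """Ingest one record; return (kpi, alert) for the current window."""
        self.values.append(value)
        kpi = sum(self.values) / len(self.values)
        return kpi, kpi > self.threshold


latency_kpi = RollingKpi(window=3, threshold=100.0)
readings = [90, 95, 100, 120, 130]
outputs = [latency_kpi.update(reading) for reading in readings]
```

In a pipeline like the one on the slide, the KPI value would presumably be written to one of the listed stores and the alert flag would trigger a notification. Both steps are left out here.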
Lambda Architecture
•  data: Amazon Kinesis
•  Batch Layer
    •  store: Amazon Kinesis S3 Connector, Amazon S3
    •  process: Amazon Redshift, Amazon EMR (Presto, Hive, Pig, Spark)
•  Speed Layer: Amazon ML, KCL, AWS Lambda, Spark Streaming, Storm
•  Serving Layer: Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon ES
•  Applications receive an answer from each layer
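The three layers can be sketched in a few lines of plain Python. This is a toy model of the general Lambda Architecture pattern under stated assumptions, not the AWS services on the slide: the page-view events, the function and class names, and the choice to merge views by adding counts are all invented for illustration.

```python
from collections import Counter


def build_batch_view(master_dataset):
    """Batch layer: recompute counts from the full, immutable history."""
    return Counter(event["page"] for event in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events the batch view has not seen."""

    def __init__(self):
        self.realtime_view = Counter()

    def on_event(self, event):
        self.realtime_view[event["page"]] += 1

    def reset(self):
        """Drop the real-time view once a new batch view covers its events."""
        self.realtime_view.clear()


def serve(batch_view, realtime_view, page):
    """Serving layer: answer a query by merging both views."""
    return batch_view[page] + realtime_view[page]


history = [{"page": "/home"}, {"page": "/home"}, {"page": "/pricing"}]
batch_view = build_batch_view(history)

speed = SpeedLayer()
for event in [{"page": "/home"}, {"page": "/docs"}]:
    speed.on_event(event)

home_views = serve(batch_view, speed.realtime_view, "/home")
```

The batch view is accurate but stale, and the real-time view is fresh but covers only recent events. Merging the two at query time is what lets the application get a complete answer with low latency.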
Summary
•  Use the right tool for the job
    •  Latency, throughput, access patterns
•  Leverage AWS managed services
    •  No/low admin
•  Be cost conscious
    •  Big data ≠ big cost
Thank you. Let’s keep in touch!
@aws_actus @julsimon
facebook.com/groups/AWSFrance/
AWS User Groups in Paris, Lyon, Nantes, Lille & Rennes (meetup.com)
March 7-8
AWS Summit
May 31st
April 20-22
March 23-24 April 6-7 (Lyon)
April 25
March 16
Customer references & further reading
•  Amazon Kinesis: https://aws.amazon.com/solutions/case-studies/supercell/
•  Amazon DynamoDB: https://aws.amazon.com/fr/solutions/case-studies/adroll/
•  Amazon S3 / Glacier: https://aws.amazon.com/fr/solutions/case-studies/soundcloud/
•  Amazon EMR: https://aws.amazon.com/fr/solutions/case-studies/yelp/
•  Amazon Aurora: https://aws.amazon.com/fr/rds/aurora/testimonials/
•  Amazon Redshift: https://aws.amazon.com/fr/solutions/case-studies/financial-times/
•  AWS Lambda: https://aws.amazon.com/fr/solutions/case-studies/nordstrom/
•  Many more case studies at https://aws.amazon.com/fr/solutions/case-studies/big-data/
•  Whitepaper: “Big Data Analytics Options on AWS”: http://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf
•  AWS Big Data blog: https://blogs.aws.amazon.com/bigdata
