Big Data on AWS 
Johann Romefort
Agenda 
• What is Big Data? 
• What is AWS? 
• Presenting the tools: How Big Data and AWS fit 
together
What is Big Data? 
• It’s at the intersection of the 3 Vs of data: 
• Velocity (batch / real time / streaming) 
• Volume (terabytes / petabytes) 
• Variety (structured / semi-structured / unstructured)
Why is everybody talking about it? 
• The cost of generating data has gone down 
• By 2015, 3B people will be online, pushing the volume 
of data created to 8 zettabytes 
• More data = More insights = Better decisions 
• Processing is getting easier and cheaper thanks to 
cloud platforms
Data flow and constraints 
Generate → Ingest / Store → Process → Visualize / Share 
The 3 Vs introduce heterogeneity, which makes 
each of these steps hard to achieve
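
The four stages above can be sketched as a chain of small functions. This is only an illustration in plain Python, not AWS code: the event shapes, field names, and function names are all assumptions made up for the example. It shows how Variety (mixed record shapes) complicates the ingest step before any processing can happen.

```python
from typing import Iterable, Iterator


def generate() -> Iterator[dict]:
    """Generate: hypothetical raw events arriving in mixed shapes."""
    yield {"sensor": "a", "value": 21.5}
    yield {"sensor": "b", "value": "22.1"}  # number stored as text
    yield {"sensor": "a"}                   # value missing entirely


def ingest(events: Iterable[dict]) -> list[dict]:
    """Ingest / Store: normalize usable records, drop the rest."""
    stored = []
    for event in events:
        if "value" not in event:
            continue
        stored.append({"sensor": event["sensor"], "value": float(event["value"])})
    return stored


def process(records: list[dict]) -> dict[str, float]:
    """Process: average value per sensor."""
    sums: dict[str, float] = {}
    counts: dict[str, int] = {}
    for record in records:
        sums[record["sensor"]] = sums.get(record["sensor"], 0.0) + record["value"]
        counts[record["sensor"]] = counts.get(record["sensor"], 0) + 1
    return {sensor: sums[sensor] / counts[sensor] for sensor in sums}


def visualize(averages: dict[str, float]) -> list[str]:
    """Visualize / Share: render one text line per sensor."""
    return [f"{sensor}: {avg:.1f}" for sensor, avg in sorted(averages.items())]


report = visualize(process(ingest(generate())))
```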
What is AWS? 
• AWS is a cloud computing platform 
• On-demand delivery of IT resources 
• Pay-as-you-go pricing model
Cloud Computing 
Compute + Storage + Networking 
Adapts dynamically to ever-changing needs, 
closely tracking the requirements of the user's 
infrastructure and applications
How does AWS help 
with Big Data? 
• Removes constraints on the ingesting, storing, and 
processing layers and adapts closely to demand. 
• Provides a collection of integrated tools to address 
the 3 Vs of Big Data 
• Virtually unlimited storage capacity and processing power 
adapt well to changing data storage and analysis 
requirements.
Computing Solutions 
for Big Data on AWS 
EC2 EMR 
Kinesis 
Redshift
Computing Solutions 
for Big Data on AWS 
EC2 
All-purpose computing instances. 
Dynamic Provisioning and resizing 
Lets you scale your infrastructure 
at low cost 
Use Case: Well suited for running custom or proprietary 
applications (e.g. SAP HANA, Tableau…)
Computing Solutions 
for Big Data on AWS 
EMR 
‘Hadoop in the cloud’ 
Adapts to the complexity of the analysis 
and the volume of data to process 
Use Case: Offline processing of very large volumes of data, 
possibly unstructured (Variety variable)
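
To make the "Hadoop in the cloud" idea concrete, here is a minimal sketch of the MapReduce pattern that Hadoop popularized, written in plain Python. It does not use EMR or any Hadoop API; the function names and the word-count task are assumptions chosen for illustration. On a real cluster the map and reduce phases would run in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable


def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data on AWS", "Big Data tools"])
```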
Computing Solutions 
for Big Data on AWS 
Kinesis 
Stream Processing 
Real-time data 
Scales to adapt to the flow of 
inbound data 
Use Case: Complex Event Processing, click streams, 
sensor data, computation over time windows
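
"Computation over time windows" can be illustrated with a tumbling-window count. This sketch is plain Python and does not call the Kinesis API; the event format (timestamp in seconds, page name) and the function name are assumptions for the example. A stream consumer would apply the same bucketing to records as they arrive.

```python
from collections import Counter
from typing import Iterable


def tumbling_window_counts(
    events: Iterable[tuple[int, str]], window_seconds: int
) -> dict[int, Counter]:
    """Count events per key inside fixed, non-overlapping time windows.

    Each event is (timestamp_seconds, key). The result maps the start
    time of each window to a Counter of the keys seen in that window.
    """
    windows: dict[int, Counter] = {}
    for timestamp, key in events:
        window_start = (timestamp // window_seconds) * window_seconds
        windows.setdefault(window_start, Counter())[key] += 1
    return windows


# Hypothetical click stream: (seconds since start, page)
clicks = [(0, "home"), (12, "cart"), (59, "home"), (61, "home"), (130, "cart")]
per_minute = tumbling_window_counts(clicks, 60)
```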
Computing Solutions 
for Big Data on AWS 
Redshift 
Data Warehouse in the cloud 
Scales to Petabytes 
Supports SQL Querying 
Start small for just $0.25/h 
Use Case: BI analysis, use of legacy ODBC/JDBC software 
to analyze or visualize data
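
As a rough worked example of pay-as-you-go pricing, the sketch below multiplies running hours by the $0.25/h entry price quoted on this slide. The rate is the slide's figure and may not match current pricing; the function name and the two usage patterns are assumptions for illustration.

```python
HOURLY_RATE_USD = 0.25  # entry price quoted on the slide; current pricing may differ


def on_demand_cost(hours_running: float, nodes: int = 1,
                   rate: float = HOURLY_RATE_USD) -> float:
    """Cost in USD of running `nodes` nodes for `hours_running` hours."""
    return round(hours_running * nodes * rate, 2)


always_on = on_demand_cost(24 * 30)    # one node, 30 days around the clock
office_hours = on_demand_cost(8 * 22)  # one node, 8 h/day on 22 working days
```

Under these assumptions an always-on node costs 180.00 USD for the month, while running it only during office hours costs 44.00 USD.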
Storage Solution 
for Big Data on AWS 
DynamoDB Redshift 
S3 Glacier
Storage Solution 
for Big Data on AWS 
DynamoDB 
NoSQL Database 
Consistent 
Low latency access 
Column-based, flexible 
data model 
Use Case: Offline processing of very large volumes of data, 
possibly unstructured (Variety variable)
Storage Solution 
for Big Data on AWS 
S3 
Versatile storage system 
Low-cost 
Fast data retrieval 
Use Case: Backups and Disaster recovery, Media storage, 
Storage for data analysis
Storage Solution 
for Big Data on AWS 
Glacier 
Archive storage of cold data 
Extremely low-cost 
Optimized for infrequently 
accessed data 
Use Case: Storing raw logs of data. Storing media archives. 
Magnetic tape replacement
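
The storage slides suggest a rule of thumb: low-latency lookups point to DynamoDB, general-purpose storage to S3, and cold archives to Glacier. The sketch below encodes that reading as a small decision function. The thresholds, field names, and the function itself are assumptions for illustration, not AWS guidance.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    reads_per_day: float
    needs_low_latency_lookups: bool


# Hypothetical cut-off: data read less than about once a month counts as cold.
COLD_READS_PER_DAY = 1 / 30


def suggest_store(dataset: Dataset) -> str:
    """Pick a storage service from the access pattern (illustrative only)."""
    if dataset.needs_low_latency_lookups:
        return "DynamoDB"
    if dataset.reads_per_day < COLD_READS_PER_DAY:
        return "Glacier"
    return "S3"


suggestions = {
    d.name: suggest_store(d)
    for d in [
        Dataset("user-sessions", reads_per_day=50_000, needs_low_latency_lookups=True),
        Dataset("analysis-input", reads_per_day=4, needs_low_latency_lookups=False),
        Dataset("raw-logs-2012", reads_per_day=0.01, needs_low_latency_lookups=False),
    ]
}
```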
What makes AWS different 
when it comes to big data?
Integrated Environment for Big Data 
Given the 3 Vs, a collection of tools is usually 
needed for your data processing and storage. 
AWS Big Data solutions already come integrated with 
each other 
AWS Big Data solutions also integrate with the whole AWS 
ecosystem (Security, Identity Management, Logging, Backups, 
Management Console…)
Example of products interacting with 
each other.
Tightly integrated rich 
environment of tools 
+ 
On-demand scaling that tracks 
processing requirements 
= 
Extremely cost-effective, easy-to-deploy 
solution for big data needs
Use Case: 
Real-time IoT Analytics 
Gathering data in real time from sensors deployed in 
a factory and sending it for immediate processing 
• Error Detection: Real-time detection of hardware 
problems 
• Optimization and Energy management
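
Real-time error detection of this kind often comes down to evaluating a rule over the most recent readings. The sketch below flags a problem when the mean of the last few readings exceeds a threshold. It is plain Python, not the system described in the talk: the class name, window size, threshold, and sample readings are all assumptions for illustration.

```python
from collections import deque


class SlidingMeanRule:
    """Fire when the mean of the last `size` readings exceeds `threshold`."""

    def __init__(self, size: int, threshold: float) -> None:
        self.window: deque[float] = deque(maxlen=size)
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Add one reading and report whether the rule fires."""
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return False  # not enough readings yet to judge
        return sum(self.window) / len(self.window) > self.threshold


# Hypothetical temperature readings from one sensor
rule = SlidingMeanRule(size=3, threshold=80.0)
readings = [70.0, 75.0, 78.0, 85.0, 90.0, 72.0]
alerts = [index for index, reading in enumerate(readings) if rule.observe(reading)]
```

With these sample values the rule fires on the fifth and sixth readings (indexes 4 and 5), once the three-reading mean rises above 80.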
First Version of the 
infrastructure 
[Architecture diagram. Components: sensors (on customer site), Node.js stream processor, MongoDB, in-house Hadoop cluster. Flow labels: aggregate sensor data; evaluate rules over time window; feed algorithm; write raw data for further processing; backup.]
Second Version of the 
infrastructure 
[Architecture diagram. Components: sensors (on customer site), Kinesis, Redshift, Glacier. Flow labels: aggregate sensor data; evaluate rules over time window; write raw data for archiving; for BI analysis.]
Thank You 
romefort@gmail.com 
follow me on @romefort
