The Road Less Traveled
Highlights and Challenges from Running Spark on Mesos in Production
Morri Feldman
morri@appsflyer.com
The Plan
1. Attribution & Overall Architecture
2. Retention
3. Data Infrastructure - Spark on Mesos
The Flow
[Diagram labels: Media sources, User Device, AppsFlyer Servers, Redirected, Store, -OR-]
Enables
• Cost Per Install (CPI)
• Cost Per In-app Action (CPA)
• Revenue Share
• Network Optimization
• Retargeting
Retention
[Figure: retention chart; x-axis: install day, then days 1 to 12]
Retention Scale
> 30 Million Installs / Day
> 5 Billion Sessions / Day
Retention Dimensions
Retention V1 (MVP)
Two Dimensions (App-Id and Media-Source)
Cascalog
Datalog / logic programming over Cascading / Hadoop
Retention – Spark SQL / Parquet
S3 Data v1 – Hadoop Sequence files:
Key, Value <Kafka Offset, JSON Message>
Gzip compressed, ~1.8 TB / day
S3 Data v2 – Parquet files (schema on write)
Retain fields required for retention; apply some business logic while converting.
Generates “tables” for installs and sessions.
Retention v2 – “SELECT … JOIN ON ...”
18 dimensions vs. 2 in the original report
Retention Calculation Phases
1. Daily aggregation
Cohort_day, Activity_day, <Dimensions>, Retained Count
2. Pivot
Cohort_day, <Dimensions>, Day0, Day1, Day2 …
After Aggregation and Pivot ~ 1 billion rows
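The two phases above can be sketched in plain Python on a toy data set. This is only an illustration of the shape of the computation, not the Spark SQL job from the talk: the function names, the sample users, and the choice to count distinct users per day as the "Retained Count" are all assumptions.

```python
from collections import defaultdict
from datetime import date


def daily_aggregation(installs, sessions):
    """Phase 1: retained count per (cohort_day, activity_day, dimensions).

    installs maps user_id -> (install_day, dimensions tuple).
    sessions is an iterable of (user_id, activity_day).
    Assumption: a user counts once per activity day, however many sessions.
    """
    retained = defaultdict(set)
    for user_id, activity_day in sessions:
        if user_id not in installs:
            continue  # session without a known install belongs to no cohort
        cohort_day, dims = installs[user_id]
        if activity_day >= cohort_day:
            retained[(cohort_day, activity_day, dims)].add(user_id)
    return {key: len(users) for key, users in retained.items()}


def pivot(daily_counts, max_days=3):
    """Phase 2: one row per (cohort_day, dimensions) with Day0..DayN columns."""
    rows = defaultdict(lambda: [0] * (max_days + 1))
    for (cohort_day, activity_day, dims), count in daily_counts.items():
        offset = (activity_day - cohort_day).days
        if offset <= max_days:
            rows[(cohort_day, dims)][offset] = count
    return dict(rows)


# Hypothetical sample data: two installs on the same day, same dimensions.
installs = {
    "u1": (date(2016, 1, 1), ("app-a", "network-x")),
    "u2": (date(2016, 1, 1), ("app-a", "network-x")),
}
sessions = [
    ("u1", date(2016, 1, 1)),
    ("u2", date(2016, 1, 1)),
    ("u1", date(2016, 1, 2)),
    ("u1", date(2016, 1, 2)),  # second session on the same day
    ("u2", date(2016, 1, 4)),
]

table = pivot(daily_aggregation(installs, sessions))
for (cohort_day, dims), days in table.items():
    print(cohort_day, dims, days)
```

Keeping the daily aggregation separate from the pivot means each new activity day only adds rows to phase 1; the wide Day0, Day1, Day2 … layout is derived afterwards.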
Data Warehouse v3
Parquet Files – Schema on Read
Retain almost all fields from the original JSON
Do not apply any business logic
Business logic applied when reading, through use of a shared library
Spark and Spark Streaming: ETL for Druid
SQL
Why?
All Data on S3 – No need for HDFS
Spark & Mesos have a long history
Some interest in moving our attribution services to Mesos
Began using Spark with EC2 “standalone” cluster scripts (no VPC)
Easy to set up
Culture of trying out promising technologies
Mesos Creature Comforts
Nice UI – job outputs / sandbox easy to find
Driver and Slave logs are accessible
Mesos Creature Comforts
Fault tolerant – Masters store data in ZooKeeper and can fail over smoothly
Nodes join and leave the cluster automatically at bootup / shutdown
Job Scheduling – Chronos
? https://aphyr.com/posts/326-jepsen-chronos
Specific Lessons / Challenges using Spark, Mesos & S3
-or-
What Went Wrong with Spark / Mesos & S3 and How We Fixed It.
Spark / Mesos in production for nearly 1 year
S3 is not HDFS
S3n gives tons of timeouts and DNS errors @ 5pm daily
Can compensate for timeouts with spark.task.maxFailures set to 20
Use S3a from Hadoop 2.7
(S3a in 2.6 generates millions of partitions – HADOOP-11584)
https://www.appsflyer.com/blog/the-bleeding-edge-spark-parquet-and-s3/
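As a rough sketch, the two S3 settings on this slide could be collected in one place and rendered as spark-submit flags. The helper names and the bucket name are hypothetical. Setting fs.s3a.impl explicitly is an assumption that may be unnecessary if the Hadoop 2.7 defaults are already on the classpath.

```python
# Settings named on the slide, plus the S3A filesystem class (assumed to be
# set explicitly here; Hadoop 2.7 may already provide it by default).
S3_CONF = {
    "spark.task.maxFailures": "20",
    "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
}


def to_submit_args(conf):
    """Render a settings dict as a flat list of spark-submit --conf flags."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


def use_s3a(path):
    """Rewrite an s3n:// URI to the s3a:// scheme; other paths are unchanged."""
    prefix = "s3n://"
    if path.startswith(prefix):
        return "s3a://" + path[len(prefix):]
    return path


print(" ".join(to_submit_args(S3_CONF)))
print(use_s3a("s3n://example-bucket/events/2016-01-01"))  # hypothetical bucket
```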
S3 is not HDFS part 2
Use a Direct Output Committer
https://www.appsflyer.com/blog/the-bleeding-edge-spark-parquet-and-s3/
Spark writes files to a staging area and renames them at the end of the job
Rename on S3 is an expensive operation (~10s of minutes for thousands of files)
Direct Output Committers write to the final output location
(Safe because an S3 object write is atomic: the object only appears once the write completes)
Disadvantages – incompatible with speculative execution
Poor recovery from failures during write operations
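The cost difference between the two commit styles can be illustrated with a toy in-memory object store that counts requests. This is not a real S3 client or a real Spark committer. The class, the path layout, and the file count are made up for the illustration. The only property it relies on is that an object store has no native rename, so a rename is a copy followed by a delete.

```python
class ToyObjectStore:
    """In-memory stand-in for an object store; counts requests issued."""

    def __init__(self):
        self.objects = {}
        self.requests = 0

    def put(self, key, data):
        self.requests += 1
        self.objects[key] = data

    def rename(self, src, dst):
        # No native rename: one copy request plus one delete request.
        self.requests += 2
        self.objects[dst] = self.objects.pop(src)


def commit_via_staging(store, parts):
    """Write every part to a staging prefix, then rename into place."""
    for name, data in parts.items():
        store.put(f"out/_temporary/{name}", data)
    for name in parts:
        store.rename(f"out/_temporary/{name}", f"out/{name}")


def commit_direct(store, parts):
    """Write every part straight to its final location."""
    for name, data in parts.items():
        store.put(f"out/{name}", data)


parts = {f"part-{i:05d}": b"rows" for i in range(1000)}
staged, direct = ToyObjectStore(), ToyObjectStore()
commit_via_staging(staged, parts)
commit_direct(direct, parts)
print(staged.requests, direct.requests)  # 3000 1000
```

The same sketch hints at the listed disadvantages: with direct writes a half-finished or duplicated task attempt leaves its output in the final location, where nothing cleans it up.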
Avoid .0 releases if possible
https://www.appsflyer.com/blog/the-bleeding-edge-spark-parquet-and-s3/
Worst example
Spark 1.4.0 randomly loses data, especially on jobs with many output partitions
Fixed by SPARK-8406
Coarse-Grained or Fine-Grained?
TL;DR – Use coarse-grained
Not perfect, but stable
Coarse-Grained – Disadvantages
spark.cores.max (not dynamic)
Coarse-Grained with Dynamic Allocation
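A minimal sketch of the two coarse-grained setups, written as settings dicts with a small consistency check. The property names are standard Spark settings. The core count of 64 and the helper name are hypothetical. The check encodes two documented constraints: dynamic allocation needs the external shuffle service, and on Mesos it only applies to coarse-grained mode.

```python
# Static coarse-grained: the job holds up to spark.cores.max for its lifetime.
COARSE_STATIC = {
    "spark.mesos.coarse": "true",
    "spark.cores.max": "64",  # hypothetical cap, fixed for the life of the job
}

# Coarse-grained with dynamic allocation: executors are added and released.
COARSE_DYNAMIC = {
    "spark.mesos.coarse": "true",
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
}


def problems(conf):
    """Return a list of inconsistencies found in a settings dict."""
    issues = []
    dynamic = conf.get("spark.dynamicAllocation.enabled") == "true"
    if dynamic and conf.get("spark.shuffle.service.enabled") != "true":
        issues.append("dynamic allocation needs the external shuffle service")
    if dynamic and conf.get("spark.mesos.coarse") != "true":
        issues.append("dynamic allocation on Mesos requires coarse-grained mode")
    return issues


for name, conf in (("static", COARSE_STATIC), ("dynamic", COARSE_DYNAMIC)):
    print(name, problems(conf) or "ok")
```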
Tuning Jobs in Coarse-Grained
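Set executor memory to ~ the entire memory of a machine (200 GB for r3.8xlarge)
spark.task.cpus then effectively sets the Spark memory per task: the more CPUs each task claims, the fewer tasks share the executor's memory
[Diagram: executor with 200 GB and 32 CPUs; OOM!!]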
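The arithmetic behind that point, as a small sketch. It uses the slide's 200 GB and 32 CPUs and treats the whole executor heap as evenly shared by concurrent tasks. That is a simplifying assumption, since Spark reserves part of the heap for storage and overhead. The function name is hypothetical.

```python
def memory_per_task_gb(executor_memory_gb, executor_cores, task_cpus):
    """Rough upper bound on the memory each concurrently running task gets."""
    concurrent_tasks = executor_cores // task_cpus
    return executor_memory_gb / concurrent_tasks


for task_cpus in (1, 2, 4):
    per_task = memory_per_task_gb(200, 32, task_cpus)
    print(f"spark.task.cpus={task_cpus}: ~{per_task} GB per task")
```

Raising spark.task.cpus from 1 to 2 halves the number of concurrent tasks (32 to 16) and doubles the rough memory share per task (6.25 GB to 12.5 GB).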
Tuning Jobs in Coarse-Grained
Use more shuffle partitions to avoid OOM!!
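One way to reason about the partition count, as a sketch: pick a target size per shuffle partition and divide. The 1000 GB shuffle size and 0.5 GB target are hypothetical numbers, and the helper name is made up. spark.sql.shuffle.partitions is the standard Spark SQL setting for the number of shuffle partitions.

```python
import math


def shuffle_partitions_for(shuffle_gb, target_partition_gb):
    """Smallest partition count that keeps each partition at or under target."""
    return math.ceil(shuffle_gb / target_partition_gb)


# Hypothetical job: 1000 GB shuffled, aiming for partitions of about 0.5 GB.
partitions = shuffle_partitions_for(1000, 0.5)
conf = {"spark.sql.shuffle.partitions": str(partitions)}
print(conf)
```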
Spark on Mesos Future Improvements
Increased stability – Dynamic allocation
Tungsten
Mesos Maintenance Primitives, experimental in 0.25.0
Gracefully reduce size of cluster by marking nodes that will soon be killed
Inverse Offers – preemption, more dynamic scheduling
How We Generated Duplicate Data
-OR-
S3 is Still Not HDFS
S3 is Still Not HDFS
S3 is Eventually Consistent
We are Hiring!
https://www.appsflyer.com/jobs/
