Spark Job Server
Evan Chan and Kelvin Chu
Overview
Why We Needed a Job Server
• Created at Ooyala in 2013
• Our vision for Spark is as a multi-team big data service
• What gets repeated by every team:
• Bastion box for running Hadoop/Spark jobs
• Deploys and process monitoring
• Tracking and serializing job status, progress, and job results
• Job validation
• No easy way to kill jobs
• Polyglot technology stack - Ruby scripts run jobs, Go services
Spark as a Service
• REST API for Spark jobs and contexts. Easily operate Spark from any
language or environment.
• Runs jobs in their own contexts, or shares one context among jobs
• Great for sharing cached RDDs across jobs and low-latency jobs
• Works for Spark Streaming as well!
• Works with Standalone, Mesos, any Spark config
• Jars, job history and config are persisted via a pluggable API
• Async and sync API, JSON job results
http://github.com/ooyala/spark-jobserver
Open Source!!
Creating a Job Server Project
✤ In your build.sbt, add this:

resolvers += "Ooyala Bintray" at "http://dl.bintray.com/ooyala/maven"

libraryDependencies += "ooyala.cnd" % "job-server" % "0.3.1" % "provided"

✤ sbt assembly -> fat jar -> upload to job server
✤ "provided" is used - we don't want SBT assembly to include the whole job server jar
✤ Java projects should be possible too
Example Job Server Job
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import scala.util.Try
// Job server API imports (package as in the 0.3.x job server; may vary by version):
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobInvalid, SparkJobValidation}

/**
 * A super-simple Spark job example that implements the SparkJob trait and
 * can be submitted to the job server.
 */
object WordCountExample extends SparkJob {
  override def validate(sc: SparkContext, config: Config): SparkJobValidation = {
    Try(config.getString("input.string"))
      .map(x => SparkJobValid)
      .getOrElse(SparkJobInvalid("No input.string"))
  }

  override def runJob(sc: SparkContext, config: Config): Any = {
    val dd = sc.parallelize(config.getString("input.string").split(" ").toSeq)
    dd.map((_, 1)).reduceByKey(_ + _).collect().toMap
  }
}
What’s Different?
• Job does not create Context, Job Server does
• You decide how the job runs: in its own context, or in a pre-created context
• Upload new jobs to diagnose your RDD issues:
• POST /contexts/newContext
• POST /jobs .... context=newContext
• Upload a new diagnostic jar... POST /jars/newDiag
• Run diagnostic jar to dump info on cached RDDs
Submitting and Running a Job
✦ curl --data-binary @../target/mydemo.jar localhost:8090/jars/demo
OK

✦ curl -d "input.string = A lazy dog jumped mean dog" 'localhost:8090/jobs?appName=demo&classPath=WordCountExample&sync=true'
{
"status": "OK",
"RESULT": {
"lazy": 1,
"jumped": 1,
"A": 1,
"mean": 1,
"dog": 2
}
}
Retrieve Job Statuses
✦ curl 'localhost:8090/jobs?limit=2'
[{
"duration": "77.744 secs",
"classPath": "ooyala.cnd.CreateMaterializedView",
"startTime": "2013-11-26T20:13:09.071Z",
"context": "8b7059dd-ooyala.cnd.CreateMaterializedView",
"status": "FINISHED",
"jobId": "9982f961-aaaa-4195-88c2-962eae9b08d9"
}, {
"duration": "58.067 secs",
"classPath": "ooyala.cnd.CreateMaterializedView",
"startTime": "2013-11-26T20:22:03.257Z",
"context": "d0a5ebdc-ooyala.cnd.CreateMaterializedView",
"status": "FINISHED",
"jobId": "e9317383-6a67-41c4-8291-9c140b6d8459"
}]
Use Case: Fast Query Jobs
Spark as a Query Engine
✤ Goal: Spark jobs that run in under a second and answer queries on shared RDD data
✤ Query params passed in as job config
✤ Need to minimize context creation overhead
✤ Thus many jobs sharing the same SparkContext
✤ On-heap RDD caching means no serialization loss
✤ Need to consider concurrent jobs (fair scheduling - see the sketch below)
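A minimal sketch (not from the deck) of configuring such a shared, long-lived context; spark.scheduler.mode is a standard Spark setting, while the object and names are illustrative:

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: one long-lived SparkContext shared by many query jobs.
// FAIR scheduling lets concurrent small jobs share executors rather than
// queueing FIFO behind each other.
object SharedQueryContext {
  def create(): SparkContext = {
    val conf = new SparkConf()
      .setAppName("shared-query-context")    // illustrative name
      .set("spark.scheduler.mode", "FAIR")   // fair scheduling for concurrent jobs
    new SparkContext(conf)
  }
}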
LOW-LATENCY QUERY JOBS
[Diagram: the REST Job Server creates a query context (new SparkContext) on the Spark executors, loads data from Cassandra into a cached RDD, and then serves repeated query jobs, each returning a query result.]
Sharing Data Between Jobs
✤ RDD Caching
✤ Benefit: no need to serialize data. Especially useful for indexes etc.
✤ Job server provides a NamedRdds trait for thread-safe CRUD of cached RDDs by name (see the sketch below)
✤ (Compare to SparkContext's API, which uses an integer ID and is not thread safe)
✤ For example, at Ooyala a number of fields are multiplexed into the RDD name: timestamp:customerID:granularity
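A hedged sketch of a job using named RDD caching; the NamedRddSupport mixin and getOrElseCreate follow the job server's NamedRdds API as described above, but the job, path, and RDD name here are illustrative:

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
// Job server imports as in the earlier example; package may vary by version:
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation, NamedRddSupport}

object CachedQueryJob extends SparkJob with NamedRddSupport {
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any = {
    // Fields multiplexed into the RDD name, Ooyala-style
    // (timestamp:customerID:granularity):
    val name = "20131126:customer42:daily"
    // The first caller computes and caches the RDD; later jobs get the cached copy:
    val events: RDD[String] = namedRdds.getOrElseCreate(name) {
      sc.textFile("hdfs:///events/customer42")   // illustrative source
    }
    events.count()
  }
}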
Data Concurrency
✤ Single writer, multiple readers
✤ Managing multiple updates to RDDs
✤ The cache keeps track of which RDDs are being updated
✤ Example: a Spark job on thread A creates RDD "A" at t0
✤ Thread B fetches RDD "A" at t1 > t0
✤ Both threads A and B, using NamedRdds, will get the RDD at time t2, when thread A finishes creating RDD "A" (see the sketch below)
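The same assumed API, sketched against the timeline above (inside a job with NamedRddSupport):

// Hedged sketch of the single-writer / multiple-readers timeline;
// the NamedRdds method names are the same assumptions as above.

// Thread A, at t0: starts creating and caching "A"; the cache marks it in-flight.
def threadA(sc: org.apache.spark.SparkContext) =
  namedRdds.getOrElseCreate("A") {
    sc.textFile("hdfs:///raw/a")   // illustrative, slow build
  }

// Thread B, at t1 > t0: "A" is still in-flight, so the cache makes this call
// wait until t2 (thread A done); both threads then see the same cached RDD.
def threadB() = namedRdds.get[String]("A")   // Some(rdd) once creation completes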
Production Usage
Persistence
✤ What gets persisted?
✤ Job status (success, error, why it failed)
✤ Job configuration
✤ Jars
✤ JDBC database configuration: spark.sqldao.jdbc.url
✤ jdbc:mysql://dbserver:3306/jobserverdb
✤ Multiple Job Servers can share the same database (see the sketch below).
✤ The default will be H2 - single file on disk.
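A minimal sketch of the shared-database idea; only the spark.sqldao.jdbc.url key and the MySQL URL come from the slide, and the parsing code is purely illustrative:

import com.typesafe.config.ConfigFactory

// Illustrative: the job server reads its settings from Typesafe Config;
// every job server pointed at this URL shares the same metadata DB.
val daoConfig = ConfigFactory.parseString(
  """spark.sqldao.jdbc.url = "jdbc:mysql://dbserver:3306/jobserverdb" """)

println(daoConfig.getString("spark.sqldao.jdbc.url"))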
Deployment and Metrics
✤ spark-jobserver repo comes with a full suite of tests and deploy scripts:
✤ server_deploy.sh for regular server pushes
✤ server_package.sh for Mesos and Chronos .tar.gz
✤ /metricz route for codahale-metrics monitoring
✤ /healthz route for health checks
Challenges and Lessons
• Spark is based around contexts - we need a Job Server oriented around
logical jobs
• Running multiple SparkContexts in the same process
• Much easier with Spark 0.9+ … no more global System properties
• Have to be careful with SparkEnv
• Dynamic jar and class loading is tricky (contributed back to Spark)
• Manage threads carefully - each context uses lots of threads
Future Work
Future Plans
✤ Spark-contrib project list, so this and other projects can gain visibility (SPARK-1283)
✤ HA mode using Akka Cluster or Mesos
✤ HA and Hot Failover for Spark Drivers/Contexts
✤ REST API for job progress
✤ Swagger API documentation
HA and Hot Failover for Jobs
[Diagram: Job Server 1 runs the active job context and Job Server 2 a standby job context; the two servers gossip, and the active context checkpoints to HDFS.]
✤ Job context dies:
✤ Job Server 2 notices and spins up the standby context, restoring the checkpoint
Thanks for your contributions!
✤ All of these were community contributed:
✤ index.html main page
✤ saving and retrieving job configuration
✤ Your contributions are very welcome on Github!
Architecture
Completely Async Design
✤ http://spray.io - probably the fastest JVM HTTP microframework
✤ Akka Actor based, non blocking
✤ Futures used to manage individual jobs (note that Spark now uses Scala Futures to manage job stages) - see the sketch below
✤ Single JVM for now, but easy to distribute later via remote Actors / Akka Cluster
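A hedged sketch of the Future-per-job idea; the actor names echo the flow on the next slide, but this code is illustrative, not the actual job server implementation:

import akka.actor.ActorRef
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

// Illustrative: each job runs in its own Future so the HTTP layer never
// blocks; completion and failure are pushed to status/result actors.
def runJobAsync(job: SparkJob, sc: SparkContext, config: Config,
                statusActor: ActorRef, resultActor: ActorRef)
               (implicit ec: ExecutionContext): Future[Any] = {
  val f = Future(job.runJob(sc, config))
  f.onComplete {
    case Success(result) => resultActor ! result   // Job Result Actor
    case Failure(err)    => statusActor ! err      // Job Status Actor
  }
  f
}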
Async Actor Flow
[Diagram: Spray web API → Request actor → Local Supervisor → Job Manager; the Job Manager runs Job 1 and Job 2 as Futures and reports to the Job Status Actor and the Job Result Actor.]
Message flow fully documented
Thank you!
And Everybody is Hiring!!
Using Tachyon

Pros:
✤ Off-heap storage: no GC
✤ Can be shared across multiple processes
✤ Data can survive process loss
✤ Backed by HDFS

Cons:
✤ ByteBuffer API - need to pay deserialization cost
✤ Does not support random access writes