Highlights and Challenges from Running Spark on Mesos in Production by Morri Feldman

The Road Less Traveled
Highlights and Challenges from Running Spark on Mesos in Production
Morri Feldman
morri@appsflyer.com

The Plan
1. Attribution & Overall Architecture
2. Retention
3. Data Infrastructure - Spark on Mesos

The Flow
[Diagram labels: Media sources, AppsFlyer Servers, Redirected, Store -OR- User Device]
Enables
• Cost Per Install (CPI)
• Cost Per In-app Action (CPA)
• Revenue Share
• Network Optimization
• Retargeting

Retention
[Figure: retention timeline, Install day followed by days 1 to 12]

Retention Scale
> 30 Million Installs / Day
> 5 Billion Sessions / Day

Retention V1 (MVP)
Two Dimensions (App-Id and Media-Source)
Cascalog: Datalog / logic programming over Cascading / Hadoop

Retention – Spark SQL / Parquet
S3 Data v1 – Hadoop Sequence files:
Key, Value <Kafka Offset, JSON Message>
Gzip compressed, ~1.8 TB / day
S3 Data v2 – Parquet files (schema on write):
Retain the fields required for retention and apply some business logic while converting.
Generates “tables” for installs and sessions.
Retention v2 – “SELECT … JOIN ON ...”
18 dimensions vs. 2 in the original report
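A minimal sketch of the “SELECT … JOIN ON ...” idea, using Python's built-in sqlite3 in place of Spark SQL so that it runs anywhere. The table layout, column names (device_id, app_id, media_source, install_day, activity_day) and sample rows are illustrative assumptions, not the production schema, and only two of the 18 dimensions are shown.

```python
import sqlite3

# In-memory stand-ins for the installs and sessions "tables".
# All column names and rows here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE installs (device_id TEXT, app_id TEXT, media_source TEXT, install_day TEXT);
CREATE TABLE sessions (device_id TEXT, app_id TEXT, activity_day TEXT);
""")
conn.executemany("INSERT INTO installs VALUES (?, ?, ?, ?)", [
    ("d1", "app.a", "net_x", "2015-10-01"),
    ("d2", "app.a", "net_x", "2015-10-01"),
    ("d3", "app.a", "net_y", "2015-10-01"),
])
conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)", [
    ("d1", "app.a", "2015-10-02"),
    ("d1", "app.a", "2015-10-02"),  # second session on the same day, counted once
    ("d2", "app.a", "2015-10-02"),
    ("d1", "app.a", "2015-10-03"),
    ("d3", "app.a", "2015-10-03"),
])

# Join each session to the install of the same device and app, then count
# distinct retained devices per cohort day, activity day and dimensions.
RETENTION_SQL = """
SELECT i.install_day AS cohort_day, s.activity_day, i.app_id, i.media_source,
       COUNT(DISTINCT s.device_id) AS retained_count
FROM installs i
JOIN sessions s ON s.device_id = i.device_id AND s.app_id = i.app_id
WHERE s.activity_day >= i.install_day
GROUP BY i.install_day, s.activity_day, i.app_id, i.media_source
ORDER BY cohort_day, s.activity_day, i.media_source
"""
daily_rows = conn.execute(RETENTION_SQL).fetchall()
for row in daily_rows:
    print(row)
```

Adding a dimension in this style means adding a column to the SELECT and GROUP BY lists, which is what makes the jump from 2 to 18 dimensions manageable.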

Retention Calculation Phases
1. Daily aggregation
Cohort_day, Activity_day, <Dimensions>, Retained Count
2. Pivot
Cohort_day, <Dimensions>, Day0, Day1, Day2 …
After Aggregation and Pivot ~ 1 billion rows
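The pivot phase can be sketched in plain Python as follows. The row shape (cohort_day, activity_day, dimensions, retained_count) follows the phases above; the function name and the sample values are assumptions made for illustration, and a production job would do this in Spark rather than in a single process.

```python
from collections import defaultdict
from datetime import date


def pivot_retention(daily_rows):
    """Turn daily aggregation rows into one row per (cohort_day, dimensions).

    daily_rows: iterable of (cohort_day, activity_day, dimensions, retained_count)
    where the days are datetime.date objects and dimensions is a tuple.
    Returns {(cohort_day, dimensions): {"Day0": n0, "Day1": n1, ...}}.
    """
    pivoted = defaultdict(dict)
    for cohort_day, activity_day, dims, count in daily_rows:
        offset = (activity_day - cohort_day).days  # days since install
        column = f"Day{offset}"
        row = pivoted[(cohort_day, dims)]
        row[column] = row.get(column, 0) + count
    return dict(pivoted)


# Hypothetical daily aggregation output for one cohort and one dimension set.
example_rows = [
    (date(2015, 10, 1), date(2015, 10, 1), ("app.a", "net_x"), 3),
    (date(2015, 10, 1), date(2015, 10, 2), ("app.a", "net_x"), 2),
    (date(2015, 10, 1), date(2015, 10, 3), ("app.a", "net_x"), 1),
]
table = pivot_retention(example_rows)
```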

Data Warehouse v3
Parquet Files – Schema on Read
Retain almost all fields from original json
Do not apply any business logic
Business logic applied when reading through
use of a shared library
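A rough sketch of schema on read with a shared library: the stored record keeps its original fields, and every reader goes through the same function to apply business logic. The rule shown (treating a missing media_source as "organic") and all names are hypothetical, chosen only to make the pattern concrete.

```python
import json


def apply_business_logic(record):
    """Hypothetical shared-library rule applied at read time.

    The stored record is left untouched; a derived copy is returned.
    """
    out = dict(record)
    # Illustrative rule only: default a missing or empty media source.
    out["media_source"] = record.get("media_source") or "organic"
    return out


def read_records(lines):
    """Parse raw JSON lines and apply the shared business logic to each."""
    return [apply_business_logic(json.loads(line)) for line in lines]


raw_lines = [
    '{"app_id": "app.a", "media_source": "net_x"}',
    '{"app_id": "app.a"}',
]
records = read_records(raw_lines)
```

Because the rule lives in the library and not in the stored files, changing it does not require rewriting historical data.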

Spark and Spark Streaming: ETL for Druid
SQL

Why?
All Data on S3 – No need for HDFS
Spark & Mesos have a long history
Some interest in moving our attribution services to Mesos
Began using Spark with the EC2 “standalone” cluster scripts (no VPC)
Easy to set up
Culture of trying out promising technologies

Mesos Creature Comforts
Nice UI – job outputs / sandbox are easy to find
Driver and Slave logs are accessible

Mesos Creature Comforts
Fault tolerant – Masters store data in ZooKeeper and can fail over smoothly
Nodes join and leave the cluster automatically at bootup / shutdown

Job Scheduling – Chronos
? https://aphyr.com/posts/326-jepsen-chronos

Specific Lessons / Challenges using Spark, Mesos & S3
-or-
What Went Wrong with Spark / Mesos & S3 and How We Fixed It
Spark / Mesos in production for nearly 1 year

S3 is not HDFS
S3n gives tons of timeouts and DNS errors @ 5pm daily
Can compensate for timeouts with spark.task.maxFailures set to 20
Use S3a from Hadoop 2.7 (S3a in 2.6 generates millions of partitions – HADOOP-11584)
https://www.appsflyer.com/blog/the-bleeding-edge-spark-parquet-and-s3/
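The two settings above can be sketched as a small helper that builds spark-submit arguments. spark.task.maxFailures and the fs.s3a.impl Hadoop property (passed through the spark.hadoop. prefix) are real settings; the helper names and the bucket path are assumptions for illustration, and credentials and other S3a tuning are left out.

```python
def s3_spark_conf(max_failures=20):
    """Spark settings that tolerate S3 timeouts and select the S3a filesystem."""
    return {
        # Retry each task up to 20 times before failing the job.
        "spark.task.maxFailures": str(max_failures),
        # Hadoop property forwarded by Spark via the "spark.hadoop." prefix.
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }


def to_submit_args(conf):
    """Render a conf dict as a list of spark-submit --conf arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


def use_s3a(path):
    """Rewrite an s3n:// path to s3a://, leaving other paths unchanged."""
    prefix = "s3n://"
    if path.startswith(prefix):
        return "s3a://" + path[len(prefix):]
    return path


print(to_submit_args(s3_spark_conf()))
print(use_s3a("s3n://example-bucket/installs/"))  # hypothetical bucket
```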

S3 is not HDFS part 2
Use a Direct Output Committer
Spark writes files to a staging area and renames them at the end of the job
Rename on S3 is an expensive operation (~10s of minutes for thousands of files)
Direct Output Committers write to the final output location
(Safe because an S3 object write is atomic: a file is either fully written or absent)
Disadvantages – Incompatible with speculative execution
Poor recovery from failures during write operations
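Since a direct committer is incompatible with speculative execution, it may help to check the two settings together before submitting a job. The sketch below assumes the committer is selected through spark.sql.parquet.output.committer.class and that its class name contains "Direct". The class name in the example is the one recalled for Spark 1.4 / 1.5 and should be treated as an assumption, since it moved between Spark versions.

```python
COMMITTER_KEY = "spark.sql.parquet.output.committer.class"


def direct_committer_conf(committer_class):
    """Select a direct committer and switch speculative execution off."""
    return {COMMITTER_KEY: committer_class, "spark.speculation": "false"}


def check_conf(conf):
    """Raise if a direct committer is combined with speculative execution."""
    uses_direct = "Direct" in conf.get(COMMITTER_KEY, "")
    speculating = conf.get("spark.speculation", "false").lower() == "true"
    if uses_direct and speculating:
        raise ValueError(
            "direct output committer must not be used with spark.speculation=true"
        )
    return conf


# Assumed class name for Spark 1.4 / 1.5; verify against the Spark version in use.
conf = check_conf(
    direct_committer_conf("org.apache.spark.sql.parquet.DirectParquetOutputCommitter")
)
```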

Avoid .0 releases if possible
Worst example: Spark 1.4.0 randomly loses data, especially on jobs with many output partitions
Fixed by SPARK-8406

Coarse-Grained or Fine-Grained?
TL;DR – Use coarse-grained
Not Perfect, but Stable

Coarse-Grained – Disadvantages
spark.cores.max (not dynamic)

Coarse-Grained with Dynamic Allocation
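A sketch of the settings involved, assuming a Spark version in which dynamic allocation is available on Mesos. The property names are real Spark settings; the helper function and the core count are illustrative. Dynamic allocation needs the external shuffle service, so the two are enabled together.

```python
def coarse_grained_conf(cores_max=None, dynamic_allocation=False):
    """Build Spark settings for coarse-grained mode on Mesos."""
    conf = {"spark.mesos.coarse": "true"}
    if cores_max is not None:
        # Fixed upper bound on cores for the whole application (not dynamic).
        conf["spark.cores.max"] = str(cores_max)
    if dynamic_allocation:
        conf["spark.dynamicAllocation.enabled"] = "true"
        # Required so shuffle files outlive executors that are released.
        conf["spark.shuffle.service.enabled"] = "true"
    return conf


static_conf = coarse_grained_conf(cores_max=128)  # hypothetical core count
dynamic_conf = coarse_grained_conf(dynamic_allocation=True)
```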

Tuning Jobs in Coarse-Grained
Set executor memory to ~ the entire memory of a machine (200 GB for r3.8xlarge)
spark.task.cpus then effectively controls Spark memory per task
[Figure: one executor with 200 GB and 32 cpus; label: OOM!!]
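The arithmetic behind this can be sketched as follows. It is a rough estimate that divides executor memory evenly among concurrently running tasks and ignores Spark's internal memory fractions; the function name is an assumption.

```python
def memory_per_task_gb(executor_memory_gb, executor_cores, task_cpus=1):
    """Rough memory share per concurrently running task in one executor."""
    concurrent_tasks = executor_cores // task_cpus
    return executor_memory_gb / concurrent_tasks


# 200 GB executor with 32 cpus, as on the slide.
for task_cpus in (1, 2, 4):
    share = memory_per_task_gb(200, 32, task_cpus)
    print(f"spark.task.cpus={task_cpus}: ~{share} GB per task")
```

Raising spark.task.cpus from 1 to 2 halves the number of concurrent tasks (32 to 16) and doubles the share per task (6.25 GB to 12.5 GB), which is why it acts as a memory knob.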

Tuning Jobs in Coarse-Grained
More Shuffle Partitions
OOM!!
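One way to reason about how many shuffle partitions to ask for is to keep each partition under a target size. The sketch below is a back-of-the-envelope estimate with hypothetical numbers; spark.sql.shuffle.partitions is the real Spark SQL setting (default 200) that the result would feed into.

```python
import math


def shuffle_partitions_for(shuffle_gb, target_partition_gb):
    """Smallest partition count that keeps each partition at or under the target size."""
    return max(1, math.ceil(shuffle_gb / target_partition_gb))


# Hypothetical job: 1000 GB of shuffle data, aiming for 0.5 GB per partition.
partitions = shuffle_partitions_for(1000, 0.5)
conf = {"spark.sql.shuffle.partitions": str(partitions)}
```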

Spark on Mesos Future Improvements
Increased stability –
Dynamic allocation
Tungsten
Mesos Maintenance Primitives, experimental in 0.25.0
Gracefully reduce size of cluster by marking nodes that will soon be killed
Inverse Offers – preemption, more dynamic scheduling

How We Generated Duplicate Data
OR
S3 is Still Not HDFS

S3 is Still Not HDFS
S3 is Eventually Consistent

We are Hiring!
https://www.appsflyer.com/jobs/
