This document discusses AppsFlyer's experience running Spark on Mesos in production for retention data processing and analytics. Key points include:
- AppsFlyer processes over 30 million installs and 5 billion sessions daily for retention reporting across 18 dimensions using Spark, Mesos, and S3.
- Challenges included timeouts and errors when using Spark's S3 connectors, caused by S3's eventual consistency. These were addressed by using more robust connectors and configuration options.
- Coarse-grained Mesos scheduling proved more stable than fine-grained scheduling, though it has limitations, such as static core allocation, that future Mesos improvements may address.
- Tuning jobs for coarse-
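
As a rough illustration of the coarse-grained point above, the sketch below builds the Spark properties that select coarse-grained Mesos mode and cap the cores a job may hold. `spark.mesos.coarse`, `spark.cores.max`, and `spark.executor.memory` are standard Spark properties; the helper names and the values (48 cores, 4g) are hypothetical and are not taken from AppsFlyer's setup.

```python
def coarse_grained_conf(max_cores, executor_memory="4g"):
    """Return Spark properties for coarse-grained Mesos mode.

    In this mode the cores are held for the lifetime of the job,
    so spark.cores.max acts as a static allocation.
    The default memory value is an illustrative assumption.
    """
    if max_cores <= 0:
        raise ValueError("max_cores must be positive")
    return {
        "spark.mesos.coarse": "true",
        "spark.cores.max": str(max_cores),
        "spark.executor.memory": executor_memory,
    }


def to_submit_args(conf):
    """Flatten a property dict into spark-submit --conf arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    # Hypothetical job sized at 48 cores.
    print(" ".join(["spark-submit"] + to_submit_args(coarse_grained_conf(48))))
```

Because the core count is fixed up front, a job sized this way keeps its cores even while idle, which is the static-allocation limitation noted in the list.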