Tachyon in xPatterns
Tachyon Meetup SF
Oct 2014
Agenda 
• xPatterns architecture 
• BDAS++ 
• Demos 
• Tachyon internals (APIs)
• Lessons learned 
BDAS++ 
• Tachyon patch (https://github.com/amplab/tachyon/pull/482)
• Jaws, xPatterns HTTP Spark SQL server: http://github.com/Atigeo/http-spark-sql-server
  ◦ Backward compatible with Shark and Spark 0.x stack
• Spark Job Server: https://github.com/Atigeo/spark-job-rest
  ◦ multiple Spark contexts in same JVM, job submission in Java + Scala
• Mesos framework starvation bug
  ◦ submitted patch… detailed Tech Blog link soon at http://xpatterns.com
• *SchedulerBackend update, 0.9.0 patches (shuffle spill, Mesos fine-grained)
Demos … 
Lessons learned 
• partial in-memory file storage bug 
• journal file on hdfs -> backup of local master disk 
• hdfs api 
• RawTable in Shark 
• persist(OFF_HEAP) temporary storage 
• RDD.persist(): OFF_HEAP > MEMORY_AND_DISK_SER
• native API: getInStream(CACHE|NO_CACHE) -> local workers 
• do not evict blocks when streaming to Tachyon/hdfs 
• Tachyon > Spark JVM Cache for long running jobs 
• kryo/defaultCodec/sequenceFile format to minimize memory footprint 
• 25 million emails/month, 2 TB, 3-45 nodes, 120-170 GB of RAM for Tachyon
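
The CACHE|NO_CACHE and "do not evict blocks when streaming" bullets can be illustrated with a toy model. The sketch below is not Tachyon's API: `ToyWorkerCache`, its LRU policy, and the dictionary standing in for the under store are all assumptions made for illustration. It only shows why a one-off scan read with NO_CACHE leaves a hot working set in worker memory, while the same scan read with CACHE pushes it out.

```python
from collections import OrderedDict

# Read-type names mirror the slide; everything else here is a hypothetical toy model.
CACHE, NO_CACHE = "CACHE", "NO_CACHE"


class ToyWorkerCache:
    """LRU block cache standing in for one worker's memory (assumed policy)."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # block id -> bytes, least recently used first

    def read(self, block_id, read_type, under_store):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # refresh recency on a hit
            return self.blocks[block_id]
        data = under_store[block_id]  # miss: fetch from the under store (e.g. HDFS)
        if read_type == CACHE:
            self.blocks[block_id] = data
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict the least recently used block
        return data


under_store = {i: f"block-{i}".encode() for i in range(10)}
worker = ToyWorkerCache(capacity_blocks=3)

# Warm the cache with the hot working set.
for block_id in (0, 1, 2):
    worker.read(block_id, CACHE, under_store)

# A one-off streaming scan with NO_CACHE leaves the hot blocks in place.
for block_id in range(3, 10):
    worker.read(block_id, NO_CACHE, under_store)
hot_after_no_cache = list(worker.blocks)

# The same scan with CACHE replaces the hot blocks with the scan's tail.
for block_id in range(3, 10):
    worker.read(block_id, CACHE, under_store)
hot_after_cache = list(worker.blocks)
```

In this model the first scan keeps blocks 0, 1 and 2 resident; after the second scan only the last three scanned blocks remain.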
Q & A 
© 2013 Atigeo, LLC. All rights reserved. Atigeo and the xPatterns logo are trademarks of Atigeo. The information herein is for informational purposes only and represents the current view of Atigeo as of the date of this 
presentation. Because Atigeo must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Atigeo, and Atigeo cannot guarantee the accuracy of any information provided 
after the date of this presentation. ATIGEO MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.



Editor's Notes

  • #4: The physical architecture diagram for our largest customer deployment, demonstrating the enterprise-grade attributes of the platform: scalability, high availability, performance, resilience, and manageability, while providing means for geo-failover (warehouse), geo-replication (real-time DB), data and system monitoring, instrumentation, and backup & restore. Cassandra rings are DC-replicated across the EC2 east and west coast regions, with data between geo-replicas synchronized in real time through an IPsec tunnel (VPC-to-VPC). Geo-replicated APIs behind an AWS Route 53 DNS service (latency-based resource record sets) and ELBs ensure that user requests are served from the closest geographical location. The failure of an entire region (it happened to us during a big conference!) does not affect our availability and SLAs. User-facing dashboards are served from Cassandra (real-time store), with data being exported from a data warehouse (Shark/Hive) built on top of a Mesos-managed Spark/Hadoop cluster. Export jobs are instrumented and provide a throttling mechanism to control throughput. Export jobs run on the east coast only; data is synchronized in real time with the west coast ring. Generated APIs are automatically instrumented (Graphite) and monitored (Nagios).
  • #6: Referral Provider Network: one of the 6 applications that we built for our healthcare customer using the xPatterns APIs and tools on the new beyond-Hadoop infrastructure: ELT Pipeline, Export to NoSQL API. The dashboard for the RPN application was built using D3.js and Angular against the generic API published by the export tool. The application allows for building a graph of downstream and upstream referred and referring providers, grouped by specialty and with computed aggregates like patient counts, claim counts and total charged amounts. RPN is used both for fraud detection and for aiding a clinic buying decision, by following the busiest graph paths. The dataset behind the app consists of 8 billion medical records, from which we extracted 1.7 million providers (Shark warehouse) and built 53 million relationships in the graph (persisted in Cassandra).
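
The referral graph described in note #6 can be sketched in a few lines. This is only an in-memory illustration under stated assumptions: the record layout, the provider names, and the helpers `build_referral_graph` and `neighbours` are hypothetical, and the real application builds the graph from the Shark warehouse and persists it in Cassandra. The sketch shows the shape of the aggregates mentioned above: patient counts, claim counts and total charged amounts per referral edge, grouped by specialty.

```python
from collections import defaultdict

# Hypothetical claim records:
# (referring provider, referred provider, specialty of referred provider, patient id, charged amount)
claims = [
    ("dr_a", "dr_b", "cardiology", "p1", 120.0),
    ("dr_a", "dr_b", "cardiology", "p2", 80.0),
    ("dr_a", "dr_c", "radiology", "p1", 300.0),
    ("dr_b", "dr_c", "radiology", "p3", 50.0),
]


def build_referral_graph(records):
    """Fold claim records into one aggregate per (referring, referred, specialty) edge."""
    edges = defaultdict(lambda: {"patients": set(), "claims": 0, "charged": 0.0})
    for src, dst, specialty, patient, amount in records:
        edge = edges[(src, dst, specialty)]
        edge["patients"].add(patient)  # distinct patients on this edge
        edge["claims"] += 1
        edge["charged"] += amount
    return edges


def neighbours(edges, provider, direction):
    """direction='down': providers that `provider` refers to; 'up': providers referring to it.

    Results are grouped by the edge's specialty, as
    (other provider, patient count, claim count, total charged).
    """
    grouped = defaultdict(list)
    for (src, dst, specialty), agg in edges.items():
        here, other = (src, dst) if direction == "down" else (dst, src)
        if here == provider:
            grouped[specialty].append(
                (other, len(agg["patients"]), agg["claims"], agg["charged"])
            )
    return dict(grouped)


graph = build_referral_graph(claims)
downstream_of_a = neighbours(graph, "dr_a", "down")
upstream_of_c = neighbours(graph, "dr_c", "up")
```

Keying the aggregates by edge keeps both the downstream and the upstream view a single pass over the same structure.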