ADAM
Fast, Scalable Genome Analysis
Adam
Parquet
• Open-source software created by Twitter and Cloudera, based on Google Dremel
• Columnar file format
• Limits I/O to only the data that is needed
• Compresses very well: ADAM files are 5-25% smaller than BAM files, with no loss of data
• 3 layers of parallelism: file/row group, column chunk, page
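
The I/O point above can be illustrated with a toy sketch. It pivots a few records into one list per field and then reads only two of those lists. This is not Parquet's on-disk format (which adds row groups, pages, encodings and compression), and the field names are hypothetical, loosely modeled on alignment records.

```python
# Toy illustration only: real Parquet adds row groups, pages, encodings and compression.
# The field names are hypothetical, loosely modeled on alignment records.
rows = [
    {"read_name": "r1", "contig": "chr1", "start": 100, "sequence": "ACGTACGT"},
    {"read_name": "r2", "contig": "chr1", "start": 150, "sequence": "TTGACCAA"},
    {"read_name": "r3", "contig": "chr2", "start": 20, "sequence": "GGGTTTAA"},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per field (a columnar layout)."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def project(columns, wanted):
    """Return only the requested columns; the other columns are never touched."""
    return {name: columns[name] for name in wanted}


columns = to_columns(rows)
positions = project(columns, ["contig", "start"])

# Count how many stored values the projection had to touch.
values_read = sum(len(values) for values in positions.values())
values_total = sum(len(values) for values in columns.values())
print(positions)
print(f"read {values_read} of {values_total} stored values")
```

In a row-oriented layout the same query would have to scan every field of every record, including the sequence strings it does not need.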
Spark
[Diagram: a Driver sends tasks to three Workers and collects their results; each Worker holds a data block (Block 1-3) and a cache (Cache 1-3).]
Parquet/Spark integration
• 1 row group in Parquet maps to 1 partition in Spark
• We interact with Parquet via input/output formats
• Spark builds and executes a computation directed acyclic graph (DAG), and manages data locality and errors/retries
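
The mapping from row groups to partitions can be sketched in plain Python. The sketch below uses no Spark or Parquet API: a thread pool stands in for the workers, the records are made up, and DAG scheduling, data locality and retries are left out entirely. It only shows the shape of the computation: one task per partition, with partial results combined on the driver side.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_row_groups(records, rows_per_group):
    """Chunk records into fixed-size groups, standing in for Parquet row groups."""
    return [
        records[i:i + rows_per_group]
        for i in range(0, len(records), rows_per_group)
    ]


def count_mapped(partition):
    """Task: runs independently on one partition and returns a partial result."""
    return sum(1 for record in partition if record["mapped"])


# Made-up records: every third read is marked as unmapped.
records = [{"read_name": f"r{i}", "mapped": i % 3 != 0} for i in range(10)]

row_groups = split_into_row_groups(records, rows_per_group=4)
partitions = row_groups  # assumption being illustrated: 1 row group -> 1 partition

# The thread pool plays the role of the workers; this block plays the driver.
with ThreadPoolExecutor(max_workers=3) as workers:
    partial_results = list(workers.map(count_mapped, partitions))

total_mapped = sum(partial_results)
print(partial_results, total_mapped)
```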
K-mers
http://www.homolog.us/blogs/blog/2011/07/28/de-bruijn-graphs-i/
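
For readers new to the term, a k-mer is a length-k substring of a sequence. A minimal counting sketch follows, using made-up reads rather than the demo data. It is a plain-Python illustration, not ADAM's implementation.

```python
from collections import Counter


def kmers(sequence, k):
    """Yield every length-k substring of sequence, sliding one base at a time."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def count_kmers(sequences, k):
    """Count how often each k-mer occurs across all sequences."""
    counts = Counter()
    for sequence in sequences:
        counts.update(kmers(sequence, k))
    return counts


reads = ["ACGTAC", "GTACGT"]  # made-up reads, for illustration only
counts = count_kmers(reads, k=3)
for kmer, count in sorted(counts.items()):
    print(kmer, count)
```

A sequence shorter than k yields no k-mers, so short reads simply contribute nothing to the counts.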
small.sam
Demo
adam-submit
Transform
• SAM/BAM to ADAM format
Flagstat
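
As a rough idea of what a flagstat-style summary computes, the sketch below tallies a few bits of the SAM FLAG field (the second column of each alignment line). The bit values come from the SAM specification; the alignment lines are made up and are not taken from small.sam. Real flagstat tools report more categories than this, and this is not ADAM's code.

```python
# FLAG bits as defined by the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
DUPLICATE = 0x400


def flagstat(sam_lines):
    """Tally a few FLAG-based counts over SAM alignment lines."""
    stats = {"total": 0, "mapped": 0, "paired": 0,
             "properly_paired": 0, "duplicates": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        stats["total"] += 1
        if not flag & UNMAPPED:
            stats["mapped"] += 1
        if flag & PAIRED:
            stats["paired"] += 1
        if flag & PROPER_PAIR:
            stats["properly_paired"] += 1
        if flag & DUPLICATE:
            stats["duplicates"] += 1
    return stats


# Made-up SAM content, for illustration only.
sam_lines = [
    "@HD\tVN:1.5\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t8M\t=\t200\t108\tACGTACGT\tIIIIIIII",
    "r1\t147\tchr1\t200\t60\t8M\t=\t100\t-108\tTTGACCAA\tIIIIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGTTTAA\tIIIIIIII",
    "r3\t1024\tchr2\t20\t30\t8M\t*\t0\t0\tCCCCAAAA\tIIIIIIII",
]

stats = flagstat(sam_lines)
print(stats)
```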
K-mers
adam-shell
K-mers
