Taming Snakemake
1/27/14
Why Make?


What are Make's advantages (over Perl and shell scripts)?


Make forces you to think about file transformation in terms of inputs and outputs, recipes and rules. In Perl you are forced to think at the level of variables, conditionals, and loops. In shell you are forced to think like a caveman.



Unfortunately, bioinformatics is still largely about files and their suffixes. Make has a very powerful syntax based almost entirely around file suffixes.



Make knows what's been made and what hasn't. Make can be interrupted and restarted safely, and without overwriting finished work.



Make knows what's changed and what hasn't. If an input is newer than an output, it will attempt to rebuild the output.
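
The timestamp rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Make is implemented; the function name `needs_rebuild` is hypothetical.

```python
import os


def needs_rebuild(output, inputs):
    """Return True if `output` is missing or older than any file in `inputs`.

    A simplified model of Make's timestamp check. Real Make also handles
    phony targets, missing inputs, and chains of dependent rules.
    """
    if not os.path.exists(output):
        # Nothing has been made yet, so the recipe must run.
        return True
    output_mtime = os.path.getmtime(output)
    # Rebuild when at least one input was modified after the output.
    return any(os.path.getmtime(path) > output_mtime for path in inputs)
```

Because the decision depends only on which files exist and how old they are, an interrupted run can be restarted and only the missing or stale outputs are rebuilt.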



Make allows you to add new input files without worrying about overwriting old ones.



Make is well supported. There are 1333 Make questions on Stack Overflow alone.



When people see a Makefile, they immediately know how to run it.



Make does not force you to wrap shell statements in quotes.



Make is a DSL. It will attempt to validate your syntax.



Make is ancient, ubiquitous, and reliable.



Make can parallelize with --jobs.



Make recipes encourage reuse.

https://share.chop.edu/pages/viewpage.action?pageId=138478819
Make review

http://github.research.chop.edu/BiG/err_chip_seq/blob/master/Makefile
Pipelines and Workflows
Other pipelines
Ruffus

Queue

GKNO
Why Snakemake?
• Addresses Makefile weaknesses without throwing out the good stuff:
  • Difficult to implement control flow
  • No cluster support
  • Inflexible wildcards
  • Too much reliance on sentinel files
  • No reporting mechanism

Johannes Köster
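
To illustrate the point about wildcards: a Make pattern rule has a single `%` stem, while Snakemake lets a file pattern carry several named wildcards such as `{sample}`. The sketch below imitates that matching with a regular expression. It is not Snakemake's own code; the helper names `pattern_to_regex` and `match_wildcards` and the file names are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Turn 'mapped/{sample}.{ext}' into a regex with one named group per wildcard."""
    parts = []
    last = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        # Literal text between wildcards must match exactly.
        parts.append(re.escape(pattern[last:m.start()]))
        # Each wildcard matches one or more characters (greedy).
        parts.append("(?P<%s>.+)" % m.group(1))
        last = m.end()
    parts.append(re.escape(pattern[last:]))
    return re.compile("^" + "".join(parts) + "$")


def match_wildcards(pattern, path):
    """Return a dict of wildcard values if `path` fits `pattern`, else None."""
    m = pattern_to_regex(pattern).match(path)
    return m.groupdict() if m else None


print(match_wildcards("mapped/{sample}.{ext}", "mapped/A.bam"))
```

This sketch assumes each wildcard name appears only once in a pattern; repeating a name would need extra handling.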
Syntax
Make
Variables

Targets

Rules

Snakemake
Utilities
• Logs - wire them up manually

• Cluster support pretty decent
source /nas/is1/leipzig/martin/variome-env/bin/activate
snakemake --directory /nas/is1/leipzig/martin/snake-env --snakefile /nas/is1/leipzig/martin/snake-env/Snakefile -c qsub -j 16
source /mnt/isilon/cbmi/variome/leipzig/martin/respublica-env/bin/activate
snakemake --directory /mnt/isilon/cbmi/variome/leipzig/martin/snake-env --snakefile /mnt/isilon/cbmi/variome/leipzig/martin/snake-env/Snakefile -c qsub -j 16

• Cores/jobs/resources
Useful stuff
• dry-runs
• keep-going
• touch
• version changes
• workflow diagrams
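
Most of the features listed above correspond to Snakemake command-line options. The Python sketch below assembles such a command as an argument list; the `OPTIONS` table and the `snakemake_command` helper are hypothetical names used for illustration, and the default Snakefile path is an assumption. The "version changes" feature is left out because the slide does not name an option for it.

```python
# Long-form Snakemake options for the features named on the slide.
OPTIONS = {
    "dry-run": "--dryrun",         # show what would run without executing
    "keep-going": "--keep-going",  # continue with independent jobs after a failure
    "touch": "--touch",            # update output timestamps without running recipes
    "workflow diagram": "--dag",   # print the job graph in Graphviz dot format
}


def snakemake_command(feature, snakefile="Snakefile"):
    """Build the argument list for one feature (hypothetical helper)."""
    return ["snakemake", "--snakefile", snakefile, OPTIONS[feature]]


print(" ".join(snakemake_command("dry-run")))
```

The output of `--dag` is usually piped to Graphviz, for example `snakemake --dag | dot -Tpdf > dag.pdf`.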
Python legal
Client websites with Jekyll
• Jekyll is a templating engine for blogs that accepts Markdown
• Layouts use the Liquid markup

http://mitomap.org/martin-rnaseq/
A workflow that reports itself
Avoiding Sweave-Hell
The bad way
Caching chunks?
R/Snakemake integration
git submodule add git@github.research.chop.edu:BiG/rna-seq-common-functions.git common/rna-seq
Leave a paper trail
Reproducible Checklist
• repository github.research.chop.edu
• workflow of some kind from beginning to end
• website at mybic.chop.edu
Ties that bind
