flowr: streamlining computing workflows
speed up analysis using a computing cluster
why bother?

➡ when one needs to wrangle a lot of data
➡ and there are multiple steps involved
➡ esp. when some of the steps can be further broken down and processed in parallel
➡ use a computing cluster, submit a web of jobs

✓ Effectively process a multi-step pipeline, spawning it across the computing cluster
✓ Reproducible and transparent, with cleanly structured execution logs
✓ Track and re-run flows
✓ Lean and portable, with easy installation
✓ Run the same pipeline in the cloud (using StarCluster) OR on a local machine
✓ Supports multiple cluster computing platforms (torque, lsf, sge, slurm, …)
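Getting started is a one-liner from CRAN; a sketch, assuming the released package (setup(), which links the command-line helper, is taken from the package docs and is optional):

```r
install.packages("flowr")  # released version from CRAN

library(flowr)
setup()  # optional: enables the `flowr` command-line interface
```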
five simple terms, defining all relationships

submission types (decide how pieces of a single step are processed):
➡ scatter: all in parallel
➡ serial: sequentially

dependency types (decide the relationship b/w steps):
➡ serial
➡ gather
➡ burst

A step's submission type combined with its dependency type defines its relationship to the previous step: many-to-many, many-to-one, or one-to-many.
Using a genomics example flow, with flowr concepts

➡ align each lane: fastq → sam → bam, five lanes side by side
  (submission: scatter, dependency: serial; a many-to-many relationship)
➡ combine the lane-level bams into a single merged bam
  (submission: serial, dependency: gather; a many-to-one relationship)
➡ fan out from the merged bam into alignment stats, sort & index, and
  downstream analysis: Mutations, Copy Number variation, Indel Calling
  (submission: scatter, dependency: burst; a one-to-many relationship)
a simple pipeline, where
★ we would sleep for a few seconds
★ create a few small files
★ merge those files
★ get the size of the resulting merged file
simple pipeline in bash

echo 'Hello World !'     # say Hello to the world
sleep 5
sleep 5                  # wait for a few seconds…
echo $RANDOM > tmp1
echo $RANDOM > tmp2      # create two small files
cat tmp1 tmp2 > tmp      # merge the two files
du -sh tmp               # check the size of the resulting file
wrap bash commands into R

hello = 'echo Hello World !'          # say Hello to the world
sleep = c('sleep 5', 'sleep 5')       # wait for a few seconds…
tmp   = c('echo $RANDOM > tmp1',
          'echo $RANDOM > tmp2')      # create two small files
merge = 'cat tmp1 tmp2 > tmp'         # merge the two files
size  = 'du -sh tmp'                  # check the size of the resulting file
create a table of all commands

library(flowr)

# create a named list
lst = list(hello = hello,
           sleep = sleep,
           tmp   = tmp,
           merge = merge,
           size  = size)

# create a table
flowmat = to_flowmat(lst, "samp1")

a simple tab-delim table:

|samplename |jobname |cmd                 |
|:----------|:-------|:-------------------|
|samp1      |hello   |echo Hello World !  |
|samp1      |sleep   |sleep 5             |
|samp1      |sleep   |sleep 5             |
|samp1      |tmp     |echo $RANDOM > tmp1 |
|samp1      |tmp     |echo $RANDOM > tmp2 |
|samp1      |merge   |cat tmp1 tmp2 > tmp |
|samp1      |size    |du -sh tmp          |
connect the dots…
the flow definition decides the sequence of steps
create a flow definition

flowdef = to_flowdef(flowmat,
    sub_type = c("serial", "scatter", "scatter", "serial", "serial"),
    dep_type = c("none", "burst", "serial", "gather", "serial"),
    platform = "local")

a simple tab-delim table:

|jobname |sub_type |prev_jobs |dep_type | cpu|
|:-------|:--------|:---------|:--------|---:|
|hello   |serial   |none      |none     |   1|
|sleep   |scatter  |hello     |burst    |   1|
|tmp     |scatter  |sleep     |serial   |   1|
|merge   |serial   |tmp       |gather   |   1|
|size    |serial   |merge     |serial   |   1|

plot_flow(flowdef)
[flow chart: hello (dep: none, sub: serial) → sleep (dep: burst, sub: scatter) → tmp (dep: serial, sub: scatter) → merge (dep: gather, sub: serial) → size (dep: serial, sub: serial)]
stitch a flow…

use the flowmat and the flowdef to create a flow object, then stitch & submit to the cluster (cloud or server):

fobj = to_flow(flowmat, flowdef, execute = TRUE)

Working on: hello
|=====                             | 25%
Working on: sleep
|================                  | 50%
Working on: merge
|==================================| 100%
Working on: size
Flow is being processed. Track it from R/Terminal using:
flowr status x=~/flowr/runs/flowname-samp1-20151005-16-01-38-M8WniKJo
OR from R using:
status(x='~/flowr/runs/flowname-samp1-20151005-16-01-38-M8WniKJo')
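The same flow can target a scheduler instead of the local machine by switching the platform; a sketch, assuming one of the platforms named earlier (queue and resource defaults come from flowr's configuration):

```r
library(flowr)

# identical definition, but jobs go through the cluster scheduler
flowdef = to_flowdef(flowmat,
    sub_type = c("serial", "scatter", "scatter", "serial", "serial"),
    dep_type = c("none", "burst", "serial", "gather", "serial"),
    platform = "lsf")  # or "torque", "sge", "slurm"

fobj = to_flow(flowmat, flowdef, execute = TRUE)
```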
submit a flow, then…

status(): monitor the status of a single flow OR multiple flows
kill(): kill all the associated jobs of one or many flows
rerun(): rerun the flow from an intermediate step
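A minimal sketch of these three calls; the run directory is the one printed at submission time, and the start_from argument to rerun() is an assumption based on the package docs:

```r
library(flowr)

# run directory printed when the flow was submitted
wd = "~/flowr/runs/flowname-samp1-20151005-16-01-38-M8WniKJo"

status(x = wd)  # one flow; a wildcard such as "~/flowr/runs/flowname-*"
                # summarizes multiple flows at once
kill(x = wd)    # kill all jobs associated with this flow

# redo the flow from an intermediate step onwards
rerun(x = wd, start_from = "merge")
```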
github.com/sahilseth/flowr
complete documentation: docs.flowr.space
email: sahil.seth@me.com
Extra details

Flow mat
- use any language to create a flow mat (a tsv file)
- the cmd column defines the commands to run

|samplename |jobname |cmd                                |
|:----------|:-------|:----------------------------------|
|sample1    |A       |sleep 2 && sleep 5; echo hello     |
|sample1    |A       |sleep 13 && sleep 7; echo hello    |
|sample1    |B       |head -c 100000 /dev/urandom > tmp1 |
|sample1    |B       |head -c 100000 /dev/urandom > tmp1 |
|sample1    |C       |cat tmp1 tmp2 tmp3 > merged        |
|sample1    |D       |du -sh merged                      |
|sample1    |D       |ls merged                          |

Flow Definition
- creatively define relationships using submission and dependency types
- each row describes resources for one step, providing full flexibility

|jobname |submission type |previous job(s) |dependency type |queue  | memory|time  | cpu|platform |
|:-------|:---------------|:---------------|:---------------|:------|------:|:-----|---:|:--------|
|A       |scatter         |none            |none            |medium | 163185|23:00 |   1|lsf      |
|B       |scatter         |A               |serial          |medium | 163185|23:00 |   1|lsf      |
|C       |serial          |B               |gather          |medium | 163185|23:00 |   1|lsf      |
|D       |scatter         |C               |burst           |medium | 163185|23:00 |   1|lsf      |
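Since a flow mat is just a plain tab-delimited file, it can be generated without R at all; a minimal sketch from the shell (the file name flowmat.tsv and the rows in it are illustrative):

```shell
# write the header flowr expects, then one row per command
printf 'samplename\tjobname\tcmd\n'                       >  flowmat.tsv
printf 'sample1\tA\tsleep 2 && sleep 5; echo hello\n'     >> flowmat.tsv
printf 'sample1\tB\thead -c 100000 /dev/urandom > tmp1\n' >> flowmat.tsv
printf 'sample1\tC\tcat tmp1 > merged\n'                  >> flowmat.tsv

cat flowmat.tsv   # a valid flow mat, ready to hand to flowr
```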

More Related Content

PDF
Faster PHP apps using Queues and Workers
KEY
Gearman and CodeIgniter
PPT
Gearman - Job Queue
PDF
Queue your work
PDF
Distributed Queue System using Gearman
PDF
Packaging is the Worst Way to Distribute Software, Except for Everything Else
PDF
Puppet Camp New York 2015: Puppet Enterprise Scaling Lessons Learned (Interme...
PPTX
Distributed Applications with Perl & Gearman
Faster PHP apps using Queues and Workers
Gearman and CodeIgniter
Gearman - Job Queue
Queue your work
Distributed Queue System using Gearman
Packaging is the Worst Way to Distribute Software, Except for Everything Else
Puppet Camp New York 2015: Puppet Enterprise Scaling Lessons Learned (Interme...
Distributed Applications with Perl & Gearman

Viewers also liked (9)

PPTX
Pozycki slideshare
PDF
PPTX
Who am i
PPS
C3 Network - Slides apresentacao
PPTX
Androids
PDF
Basic SEO
PDF
PPT
PDF
Pozycki slideshare
Who am i
C3 Network - Slides apresentacao
Androids
Basic SEO
Ad

Similar to flowr streamlining computing workflows (20)

PDF
High performance computing tutorial, with checklist and tips to optimize clus...
PDF
Overview of Scientific Workflows - Why Use Them?
PPT
BioMake BOSC 2004
PPTX
Your data isn't that big @ Big Things Meetup 2016-05-16
PPTX
C-SCALE Tutorial: Slurm
PPTX
Lrz kurs: big data analysis
PPTX
Advances in Scientific Workflow Environments
PPTX
It summit 150604 cb_wcl_ld_kmh_v6_to_publish
PPT
Parallel_and_Cluster_Computing.ppt
PPTX
TASK AND DATA PARALLELISM in Computer Science pptx
PDF
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
PPTX
Using R on High Performance Computers
PPTX
Scheduling in distributed systems - Andrii Vozniuk
PDF
Systems Bioinformatics Workshop Keynote
PPTX
2022.03.24 Snakemake.pptx
PDF
Scaling Systems for Research Computing
PDF
WorDS of Data Science in the Presence of Heterogenous Computing Architectures
PPTX
MapReduce presentation
PDF
UNIX Basics and Cluster Computing
High performance computing tutorial, with checklist and tips to optimize clus...
Overview of Scientific Workflows - Why Use Them?
BioMake BOSC 2004
Your data isn't that big @ Big Things Meetup 2016-05-16
C-SCALE Tutorial: Slurm
Lrz kurs: big data analysis
Advances in Scientific Workflow Environments
It summit 150604 cb_wcl_ld_kmh_v6_to_publish
Parallel_and_Cluster_Computing.ppt
TASK AND DATA PARALLELISM in Computer Science pptx
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
Using R on High Performance Computers
Scheduling in distributed systems - Andrii Vozniuk
Systems Bioinformatics Workshop Keynote
2022.03.24 Snakemake.pptx
Scaling Systems for Research Computing
WorDS of Data Science in the Presence of Heterogenous Computing Architectures
MapReduce presentation
UNIX Basics and Cluster Computing
Ad

Recently uploaded (20)

PDF
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
PPTX
Microbiology with diagram medical studies .pptx
PPTX
2. Earth - The Living Planet earth and life
PPTX
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
PDF
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
PPTX
7. General Toxicologyfor clinical phrmacy.pptx
PDF
Sciences of Europe No 170 (2025)
PPT
POSITIONING IN OPERATION THEATRE ROOM.ppt
PPTX
EPIDURAL ANESTHESIA ANATOMY AND PHYSIOLOGY.pptx
PPT
protein biochemistry.ppt for university classes
PDF
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
PPTX
Comparative Structure of Integument in Vertebrates.pptx
PPTX
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
PDF
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
DOCX
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
PPTX
Cell Membrane: Structure, Composition & Functions
PPTX
Vitamins & Minerals: Complete Guide to Functions, Food Sources, Deficiency Si...
PPTX
neck nodes and dissection types and lymph nodes levels
PDF
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
PDF
lecture 2026 of Sjogren's syndrome l .pdf
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
Microbiology with diagram medical studies .pptx
2. Earth - The Living Planet earth and life
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
7. General Toxicologyfor clinical phrmacy.pptx
Sciences of Europe No 170 (2025)
POSITIONING IN OPERATION THEATRE ROOM.ppt
EPIDURAL ANESTHESIA ANATOMY AND PHYSIOLOGY.pptx
protein biochemistry.ppt for university classes
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
Comparative Structure of Integument in Vertebrates.pptx
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
Cell Membrane: Structure, Composition & Functions
Vitamins & Minerals: Complete Guide to Functions, Food Sources, Deficiency Si...
neck nodes and dissection types and lymph nodes levels
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
lecture 2026 of Sjogren's syndrome l .pdf

flowr streamlining computing workflows

  • 3. ➡when one needs to wrangle a lot of data whybother?
  • 4. ➡when one needs to wrangle a lot of data ➡and there are multiple steps involved whybother?
  • 5. ➡when one needs to wrangle a lot of data ➡and there are multiple steps involved ➡esp. when some of the steps can be further broken down and processed in parallel whybother?
  • 6. ➡when one needs to wrangle a lot of data ➡and there are multiple steps involved ➡esp. when some of the steps can be further broken down and processed in parallel ➡use a computing cluster, submit a web of jobs whybother?
  • 7. ➡when one needs to wrangle a lot of data ➡and there are multiple steps involved ➡esp. when some of the steps can be further broken down and processed in parallel ➡use a computing cluster, submit a web of jobs whybother? ✓ Effectively process a multi-step pipeline, spawning it across the computing cluster
  • 8. ➡when one needs to wrangle a lot of data ➡and there are multiple steps involved ➡esp. when some of the steps can be further broken down and processed in parallel ➡use a computing cluster, submit a web of jobs whybother? ✓ Effectively process a multi-step pipeline, spawning it across the computing cluster ✓ Reproducible and transparent, with cleanly structured execution logs
  • 9. ➡when one needs to wrangle a lot of data ➡and there are multiple steps involved ➡esp. when some of the steps can be further broken down and processed in parallel ➡use a computing cluster, submit a web of jobs whybother? ✓ Effectively process a multi-step pipeline, spawning it across the computing cluster ✓ Reproducible and transparent, with cleanly structured execution logs ✓ Track and re-run flows
  • 10. ➡when one needs to wrangle a lot of data ➡and there are multiple steps involved ➡esp. when some of the steps can be further broken down and processed in parallel ➡use a computing cluster, submit a web of jobs whybother? ✓ Effectively process a multi-step pipeline, spawning it across the computing cluster ✓ Reproducible and transparent, with cleanly structured execution logs ✓ Track and re-run flows ✓ Lean and Portable, with easy installation
  • 11. ➡when one needs to wrangle a lot of data ➡and there are multiple steps involved ➡esp. when some of the steps can be further broken down and processed in parallel ➡use a computing cluster, submit a web of jobs whybother? ✓ Effectively process a multi-step pipeline, spawning it across the computing cluster ✓ Reproducible and transparent, with cleanly structured execution logs ✓ Track and re-run flows ✓ Lean and Portable, with easy installation ✓ Run the same pipeline in the cloud (using star cluster) OR a local machine
  • 12. ➡when one needs to wrangle a lot of data ➡and there are multiple steps involved ➡esp. when some of the steps can be further broken down and processed in parallel ➡use a computing cluster, submit a web of jobs whybother? ✓ Effectively process a multi-step pipeline, spawning it across the computing cluster ✓ Reproducible and transparent, with cleanly structured execution logs ✓ Track and re-run flows ✓ Lean and Portable, with easy installation ✓ Run the same pipeline in the cloud (using star cluster) OR a local machine ✓ Supports multiple cluster computing platforms (torque, lsf, sge, slurm …)
  • 16. submission types scatter serial fivesimpleterms,definingallrelationships decide how pieces of a single step are processed all in parallel
  • 17. submission types scatter serial fivesimpleterms,definingallrelationships decide how pieces of a single step are processed all in parallel sequentially
  • 18. serial dependency types burstgather submission types scatter serial fivesimpleterms,definingallrelationships decide how pieces of a single step are processed all in parallel sequentially
  • 19. serial dependency types burstgather submission types scatter serial fivesimpleterms,definingallrelationships decide how pieces of a single step are processed decide the relationship b/w steps all in parallel sequentially
  • 20. serial dependency types burstgather submission types scatter serial fivesimpleterms,definingallrelationships decide how pieces of a single step are processed decide the relationship b/w steps many-to-many all in parallel sequentially
  • 21. serial dependency types burstgather submission types scatter serial fivesimpleterms,definingallrelationships decide how pieces of a single step are processed decide the relationship b/w steps many-to-many many-to-one all in parallel sequentially
  • 22. serial dependency types burstgather submission types scatter serial fivesimpleterms,definingallrelationships decide how pieces of a single step are processed decide the relationship b/w steps many-to-many many-to-one one-to-manyall in parallel sequentially
  • 26. fastq sam bam fastq sam bam fastq sam bam fastq sam bam fastq sam bam scatter serial many to many merged bam serial gather many to one alignment stats sort & index Mutations Copy Number variation Indel Calling downstream analysis scatter burst one to many submission type dependency type relationship Usingagenomicsexampleflow,withflowrconcepts
  • 30-31. a simple pipeline, where
★ we would sleep for a few seconds
★ create a few small files
★ merge those files
★ get the size of the resulting merged file
  • 32-37. simple pipeline in bash
echo 'Hello World !'   # say Hello to the world
sleep 5                # wait for a few seconds…
sleep 5
echo $RANDOM > tmp1    # create two small files
echo $RANDOM > tmp2
cat tmp1 tmp2 > tmp    # merge the two files
du -sh tmp             # check the size of the resulting file
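The commands above can be collected into one runnable script; a minimal sketch (sleeps shortened to 1 s and run in the background to mimic the parallel scatter step, and `echo $RANDOM` used so the random number is actually written, since `cat` would treat it as a filename):

```shell
#!/usr/bin/env bash
set -e

echo 'Hello World !'     # say Hello to the world

sleep 1 &                # wait for a few seconds; both sleeps run in
sleep 1 &                # parallel, as the scatter step would on a cluster
wait

echo $RANDOM > tmp1      # create two small files
echo $RANDOM > tmp2

cat tmp1 tmp2 > tmp      # merge the two files
du -sh tmp               # check the size of the resulting file
```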
  • 38-43. wrap bash commands into R
hello = 'echo Hello World !'                             # say Hello to the world
sleep = c('sleep 5', 'sleep 5')                          # wait for a few seconds…
tmp   = c('echo $RANDOM > tmp1', 'echo $RANDOM > tmp2')  # create two small files
merge = 'cat tmp1 tmp2 > tmp'                            # merge the two files
size  = 'du -sh tmp'                                     # check the size of the resulting file
  • 44-48. create a table of all commands
library(flowr)
# create a named list
lst = list(hello = hello, sleep = sleep, tmp = tmp, merge = merge, size = size)
# create a table
flowmat = to_flowmat(lst, "samp1")
a simple tab-delim table:
|samplename |jobname |cmd                 |
|:----------|:-------|:-------------------|
|samp1      |hello   |echo Hello World !  |
|samp1      |sleep   |sleep 5             |
|samp1      |sleep   |sleep 5             |
|samp1      |tmp     |echo $RANDOM > tmp1 |
|samp1      |tmp     |echo $RANDOM > tmp2 |
|samp1      |merge   |cat tmp1 tmp2 > tmp |
|samp1      |size    |du -sh tmp          |
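Since a flowmat is just a tab-delimited table, the same result can also be produced without R; a sketch in plain shell (the file name `flowmat.tsv` is my choice, not flowr's):

```shell
# write the flowmat shown above as a tab-delimited file, no R required
{
  printf 'samplename\tjobname\tcmd\n'
  printf 'samp1\thello\techo Hello World !\n'
  printf 'samp1\tsleep\tsleep 5\n'
  printf 'samp1\tsleep\tsleep 5\n'
  printf 'samp1\ttmp\techo $RANDOM > tmp1\n'
  printf 'samp1\ttmp\techo $RANDOM > tmp2\n'
  printf 'samp1\tmerge\tcat tmp1 tmp2 > tmp\n'
  printf 'samp1\tsize\tdu -sh tmp\n'
} > flowmat.tsv
```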
  • 52-54. create a flow definition
flowdef = to_flowdef(flowmat,
    sub_type = c("serial", "scatter", "scatter", "serial", "serial"),
    dep_type = c("none", "burst", "serial", "gather", "serial"),
    platform = "local")
a simple tab-delim table:
|jobname |sub_type |prev_jobs |dep_type | cpu|
|:-------|:--------|:---------|:--------|---:|
|hello   |serial   |none      |none     |   1|
|sleep   |scatter  |hello     |burst    |   1|
|tmp     |scatter  |sleep     |serial   |   1|
|merge   |serial   |tmp       |gather   |   1|
|size    |serial   |merge     |serial   |   1|
plot_flow(flowdef) draws the resulting graph: hello (dep: none, sub: serial) ➡ sleep (dep: burst, sub: scatter) ➡ tmp (dep: serial, sub: scatter) ➡ merge (dep: gather, sub: serial) ➡ size (dep: serial, sub: serial)
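The flow definition is likewise a plain tab-delimited table, so it can be sanity-checked outside R; a sketch (the file name `flowdef.tsv` is assumed) that verifies every `prev_jobs` entry names a job defined on an earlier row:

```shell
# write the flow definition shown above as a tsv
{
  printf 'jobname\tsub_type\tprev_jobs\tdep_type\tcpu\n'
  printf 'hello\tserial\tnone\tnone\t1\n'
  printf 'sleep\tscatter\thello\tburst\t1\n'
  printf 'tmp\tscatter\tsleep\tserial\t1\n'
  printf 'merge\tserial\ttmp\tgather\t1\n'
  printf 'size\tserial\tmerge\tserial\t1\n'
} > flowdef.tsv

# every prev_jobs value must be "none" or a jobname seen on an earlier row
awk -F'\t' 'NR > 1 {
  if ($3 != "none" && !($3 in seen)) { print "unknown prev_job: " $3; bad = 1 }
  seen[$1] = 1
}
END { exit bad }' flowdef.tsv && echo 'flowdef OK'
```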
  • 57-59. stitch & submit to the cluster (cloud or server): flowmat + flowdef
fobj = to_flow(flowmat, flowdef, execute = TRUE)
Working on: hello
|=====                             |  25%
Working on: sleep
|================                  |  50%
Working on: merge
|==================================| 100%
Working on: size
Flow is being processed. Track it from R/Terminal using:
flowr status x=~/flowr/runs/flowname-samp1-20151005-16-01-38-M8WniKJo
OR from R using:
status(x='~/flowr/runs/flowname-samp1-20151005-16-01-38-M8WniKJo')
  • 61-63. submit a flow, then…
➡ status(): monitor the status of a single flow OR multiple flows
➡ kill(): kill all the associated jobs of one or many flows
➡ rerun(): rerun the flow from an intermediate step
  • 66-73. Flow mat
- use any language to create a flow mat (a tsv file)
- the cmd column defines commands to run
|samplename |jobname |cmd                                |
|:----------|:-------|:----------------------------------|
|sample1    |A       |sleep 2 && sleep 5; echo hello     |
|sample1    |A       |sleep 13 && sleep 7; echo hello    |
|sample1    |B       |head -c 100000 /dev/urandom > tmp1 |
|sample1    |B       |head -c 100000 /dev/urandom > tmp2 |
|sample1    |C       |cat tmp1 tmp2 tmp3 > merged        |
|sample1    |D       |du -sh merged                      |
|sample1    |D       |ls merged                          |
Flow Definition: define relationships & resource requirements
- creatively define relationships using submission and dependency types
- each row describes resources for one step, providing full flexibility
|jobname |sub_type |prev_jobs |dep_type |queue  | memory|time  | cpu|platform |
|:-------|:--------|:---------|:--------|:------|------:|:-----|---:|:--------|
|A       |scatter  |none      |none     |medium | 163185|23:00 |   1|lsf      |
|B       |scatter  |A         |serial   |medium | 163185|23:00 |   1|lsf      |
|C       |serial   |B         |gather   |medium | 163185|23:00 |   1|lsf      |
|D       |scatter  |C         |burst    |medium | 163185|23:00 |   1|lsf      |
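As the slide notes, any language that writes tab-separated text can create a flowmat; a shell sketch of the table above, plus a quick check that every row has the three expected columns (`flowmat.tsv` is an assumed file name):

```shell
# write the A-D flowmat as a tsv; commands are stored as text, not run here
{
  printf 'samplename\tjobname\tcmd\n'
  printf 'sample1\tA\tsleep 2 && sleep 5; echo hello\n'
  printf 'sample1\tA\tsleep 13 && sleep 7; echo hello\n'
  printf 'sample1\tB\thead -c 100000 /dev/urandom > tmp1\n'
  printf 'sample1\tB\thead -c 100000 /dev/urandom > tmp2\n'
  printf 'sample1\tC\tcat tmp1 tmp2 tmp3 > merged\n'
  printf 'sample1\tD\tdu -sh merged\n'
  printf 'sample1\tD\tls merged\n'
} > flowmat.tsv

# a flowmat needs exactly these three columns on every row
awk -F'\t' 'NF != 3 { print "bad row " NR; exit 1 }' flowmat.tsv && echo 'flowmat OK'
```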