A meta-analysis of computational biology benchmarks reveals predictors of programming accuracy

Paul Gardner
University of Canterbury
Christchurch
New Zealand

ResBaz
I want to say a big thank you to the organisers of ResBaz and NeSI and
Aleksandra and...!
Everything you are about to see is built using tools you have learned at
ResBaz...
Warning: the following research is a work in progress, conclusions may
change (after I’ve triple-checked data & claims)

Pretend we want to build a phylogenetic tree...

Building trees...
Bioinformaticians are bad, impatient & intolerant people!
Once you have gathered your data, you are faced with a problem...
Parsimony (useful if we want to publish in Cladistics)
47 methods
ARB FootPrinter LVB Parsimov POY
Bionumerics Freqpars MALIGN PAST PRAP
BIRCH Gambit MEGA PAUP* PSODA
Bosque GAPars Mesquite PAUPRat RA
BPAnalysis GelCompar-II Murka PaupUp SeaView
CAFCA GeneTree Network phangorn SeqState
CRANN gmaes NimbleTree PHYLIP Simplot
DAMBE Hennig86 NONA PhyloNet sog
EMBOSS IDEA Notung Phylo_win TCS
TNT
Felsenstein http://evolution.genetics.washington.edu/phylip/software.html

Building trees...
Maximum likelihood
97 methods
ALIFRITZ EMBOSS MOLPHY PHYLLAB rRNA-phylogeny
aLRT EREM MrAIC PhyloCoCo SeaView
ARB fastDNAml MrModeltest Phylo_win Segminator
Bio++ fastDNAmlRev MrMTgui PHYML SEMPHY
Bionumerics FASTML MultiPhyl PhyML-Multi SeqPup
BIRCH FastTree NEPAL PhyNav SeqState
BootPHYML GARLI NHML PHYSIG SIMMAP
Bosque GZ-Gamma nhPhyML PLATO Simplot
CodeAxe HY-PHY NimbleTree Porn* SLR
CoMET IQPNNI p4 PRAP Spectronet
Concaterpillar Kakusan4 PAL PROCOV Spectrum
CONSEL Leaphy PAML ProtTest SplitsTree
Crux Mac5 PARAT PTP SSA
DAMBE McRate PARBOOT r8s-bootstrap TipDate
DART Mesquite PASSML Rate4Site Treefinder
Darwin MetaPIGA PAUP* rate-evolution TREE-PUZZLE
dnarates MixtureTree PAUPRat RAxML Vanilla
DPRML Modelfit PaupUp raxmlGUI
DT-ModSel ModelGenerator phangorn RevDNArates

Building trees...
Bayesian methods
28 methods
AMBIORE BEST IMa2 p4 SIMMAP
ANC-GENE Bio++ Mesquite PAL tracer
BAli-Phy bms_runner MrBayes PAML Vanilla
BAMBE burntrees MrBayesPlugin PHASE
BayesPhylogenies Cadence MrBayes-tree-scanners PHYLLAB
BEAST Crux Multidivtime PhyloBayes
Felsenstein http://evolution.genetics.washington.edu/phylip/software.html

How can we choose software?
Which of the 172 methods do you use?

Can we trust the authors of software?
We can read all the manuscripts & manuals describing 172 software
packages. But...

How should we choose software?
Some possibilities (assuming you don’t create another method...)
Do you know the developer? Are they famous?
Select the most recently published tool?
Has the software been widely adopted?
Is it published in a good journal?
Is the software fast?
We could test the software...

Neutral comparison studies (a.k.a. benchmarks)
A. The main focus of the article is the comparison itself.
B. The authors should be reasonably neutral.
C. The evaluation criteria, methods, and data sets should be chosen in a
rational way.

Try approaching software like a scientist
Are any good controls available?
Positive: databases, publications, simulation, ...
Negative: randomized, select relevant negative data, ...
Some common accuracy metrics:
Sensitivity (true positive rate)
Specificity (true negative rate)
Matthews correlation coefficient
Area under an ROC curve
[Figure: ROC curves (true positive rate vs. false positive rate, both from 0.0 to 1.0) for Pfam, Treefam, Custom, PROVEAN, Polyphen-2, FATHMM and FATHMM (unweighted).]
Wheeler et al. (2016) A Profile-Based Method for
Measuring the Impact of Genetic Variation. bioRxiv.
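
The accuracy metrics listed on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names are made up for the example, labels are assumed to be binary (1 = positive control, 0 = negative control), and the AUC uses the rank-based definition (the probability that a random positive outscores a random negative).

```python
from math import sqrt

def confusion_counts(labels, predictions):
    """Count (tp, tn, fp, fn) for binary labels and predictions (1 = positive)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, tn, fp, fn

def sensitivity(tp, fn):
    """True positive rate."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate."""
    return tn / (tn + fp)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative (made-up) controls: four positives, four negatives.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 is an arbitrary cutoff
tp, tn, fp, fn = confusion_counts(labels, predictions)
print(sensitivity(tp, fn), specificity(tn, fp), mcc(tp, tn, fp, fn), roc_auc(labels, scores))
```

Sensitivity, specificity and MCC depend on the chosen score cutoff, whereas the AUC summarises performance over all cutoffs, which is one reason benchmarks often report it alongside the others.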

Benchmarks are useful, and fun...

Tools can be slow and inaccurate!
[Figure: A) Sum of log odds scores, phylum level. Deviation (axis from 0 to 50) plotted against log2 of runtime in minutes (axis from 0 to 15) for CLARK, Kraken, OneCodex, LMAT, MG-RAST, MetaPhlAn, mOTU, Genometa, QIIME, EBI, MetaPhyler, MEGAN, taxator-tk and GOTTCHA. Runtime annotations: ~30 mins, ~17 hrs, ~23 days.]

Is there really a relationship between speed & accuracy?
Can we run a meta-analysis of bioinformatic benchmarks?
What factors are predictive of accuracy?
Training articles:
initially 10 (historical knowledge)
Candidate articles:
((bioinformatics) AND (algorithmic OR algorithms OR biotechnologies OR
computational OR kernel OR methods OR procedure OR programs OR software
OR technologies)) AND (accuracy OR analysis OR assessment OR benchmark
OR benchmarking OR biases OR comparing OR comparison OR comparisons OR
comprehensive OR effectiveness OR estimation OR evaluation OR metrics
OR efficiency OR performance OR perspective OR quality OR rated OR
robust OR strengths OR suitable OR suitability OR superior OR survey OR
weaknesses) AND (benchmark OR competing OR complexity OR cputime OR
duration OR fast OR faster OR perform OR performance OR slow OR speed
OR time)
568,130 articles
Background articles:
(bioinformatics [TIAB] 2013:2015 [dp]) #sorted on first author
154,485 articles

Hunting for relevant articles
After trying Abstrackr (& getting annoyed)...
Flowchart of the screening pipeline:
Inputs: training articles, background articles
1. Remove high-frequency words
2. Compute word & di-word frequencies
3. Compute word scores: $lo(word) = \log_2 \frac{f_{training}(word) + \delta}{f_{background}(word) + \delta}$
4. Score & rank candidate articles: $\sum_i lo(w_i)$
5. Manually evaluate high-scoring articles (yes/no)
6. Build model

logOdds  tnFreq  bgFreq  word
 5.28    0.0019  0.0000  benchmarking
 5.21    0.0061  0.0002  benchmark
 4.91    0.0011  0.0000  noisy
 4.85    0.0022  0.0001  metrics
 4.85    0.0003  0.0000  encouragingly
 ...
-7.90    0.0000  0.0024  disease
-8.02    0.0000  0.0026  associated
-8.09    0.0000  0.0027  mirnas
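
The first two steps of this pipeline (dropping high-frequency words, then counting words and di-words) could look roughly like the sketch below. The stop list, the tokeniser and the function names are assumptions made for illustration; the talk does not say which words were removed or how text was tokenised. Di-words are formed here after stop-word removal, which is also an assumption.

```python
import re
from collections import Counter

# Hypothetical stop list: the slides do not list the high-frequency words removed.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "for", "is", "we", "with"}

def tokens(text):
    """Lower-case alphanumeric tokens with high-frequency words removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]

def term_frequencies(articles):
    """Relative frequencies of words and di-words (adjacent word pairs)."""
    counts = Counter()
    for text in articles:
        words = tokens(text)
        counts.update(words)                                             # single words
        counts.update(" ".join(pair) for pair in zip(words, words[1:]))  # di-words
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}

# Illustrative (made-up) titles standing in for article text.
freqs = term_frequencies(["A benchmark of alignment methods",
                          "Benchmark of the benchmark"])
print(freqs["benchmark"], freqs["benchmark alignment"])
```

Running the same function over the training set and the background set gives the two frequency tables that the word score compares.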

Word and article scores
Can use the same scoring scheme for words that we use for scoring
biological sequences...
$logOdds(word) = \log_2 \frac{f_{training}(word) + \delta}{f_{background}(word) + \delta}$
$articleScore = \sum_{word \in article} logOdds(word)$
[Figure: head & tail word scores (bits), axis from −10 to 5.
Low-scoring (background-enriched) words: expression, mirnas, associated, patients, binding, mirna, expressed, network, involved, regulated, levels, revealed, database, mutations, drug, response, tumor, system, activity, induced, ...
High-scoring (training-enriched) words: benchmarking, sequencers, benchtop, merits, correctness, benchmark, kernels, convolution, winner, supertree, structal, seeker, choosing, corpora, supermatrix, phenocopy, epistasis, segmod, encad, balibase]
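
A minimal sketch of the word and article scores defined on this slide. The pseudocount value `delta`, the function names and the toy frequencies are assumptions for illustration, since the talk does not state the δ that was used. The article score here sums over the distinct words in an article, which is one reading of "word ∈ article".

```python
from math import log2

def log_odds(word, f_training, f_background, delta=1e-3):
    """log2 ratio of training to background frequency, smoothed by delta."""
    return log2((f_training.get(word, 0.0) + delta) /
                (f_background.get(word, 0.0) + delta))

def article_score(words, f_training, f_background, delta=1e-3):
    """Sum of log-odds scores over the distinct words in one article."""
    return sum(log_odds(w, f_training, f_background, delta) for w in set(words))

# Toy frequencies, not the values from the talk.
f_training = {"benchmark": 0.007}
f_background = {"benchmark": 0.001, "disease": 0.003}

print(log_odds("benchmark", f_training, f_background))  # positive: training-enriched
print(log_odds("disease", f_training, f_background))    # negative: background-enriched
print(article_score(["benchmark", "benchmark", "novel"], f_training, f_background))
```

The pseudocount keeps the ratio finite for words seen in only one set, and a word absent from both sets scores exactly zero, so unseen vocabulary does not move an article up or down the ranking.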

Iteratively checking articles...
1. Score and rank candidate articles
2. Check the highest scoring articles, add to either training or background
articles
3. Return to 1.
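
The loop above can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the talk: `judge` is a hypothetical callback standing in for the manual check, `batch_size` and `max_rounds` are made-up parameters, and the scoring is a simplified single-word log-odds score.

```python
from collections import Counter
from math import log2

def word_freqs(texts):
    """Relative word frequencies across a list of texts."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values()) or 1
    return {w: n / total for w, n in counts.items()}

def score_article(text, ft, fb, delta=1e-3):
    """Sum of smoothed log-odds scores over the distinct words of one article."""
    return sum(log2((ft.get(w, 0.0) + delta) / (fb.get(w, 0.0) + delta))
               for w in set(text.lower().split()))

def screen(candidates, training, background, judge, batch_size=2, max_rounds=10):
    """Iteratively rank candidates, check the top batch, and update both sets."""
    training, background = list(training), list(background)
    remaining = list(candidates)
    for _ in range(max_rounds):
        if not remaining:
            break
        # 1. Score and rank candidates against the current training/background sets.
        ft, fb = word_freqs(training), word_freqs(background)
        remaining.sort(key=lambda t: score_article(t, ft, fb), reverse=True)
        # 2. Check the highest-scoring articles and file each one.
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        for text in batch:
            (training if judge(text) else background).append(text)
        # 3. Return to step 1 with the updated sets.
    return training, background, remaining

# Made-up titles; the lambda stands in for a human reading each article.
candidates = ["benchmark of aligners speed accuracy", "mirna expression in disease",
              "a benchmark comparison of assemblers", "tumor drug response"]
result = screen(candidates, ["benchmark comparison accuracy speed"],
                ["disease expression patients"], judge=lambda t: "benchmark" in t)
print([len(part) for part in result])
```

Because every checked article is added to one of the two sets, the word scores are re-estimated each round, so later rankings benefit from the earlier manual decisions.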

So far we have...
found 35 matching articles. Manually extracted ranks, IF, H, ...
84 benchmarks (method accuracies and speeds)
203 bioinformatic methods
63 journals (47 Bioinformatics, 17 BMC Bioinformatics, ...)
124 author Google Scholar profiles
abyss bwasw dialigntx gossamer mafftfftns2 mpest paralign repeatfinder seqmap ssake velvet
antepiseeker caml diffsplice gottcha mafftlinsi mpjclustalw pass repeatgluer sga ssap wmrpmp
apg camp diginormvelvet greedyft maq mpsclustalw perm repeatscout sharcgs ssearch woodhams
barry ce dima gsnap mats mrfast phylonetft rmap shrimp ssm wublast
bfast celera djigsaw heidge megan mrpml piler rnacofold simulatedannealing sst xalign
bismark clark downhillsimplex hmmer metaphlan mrpmp poa rnaduplex sl st xcmswithcorrection
biss clc dsgseq idbaud metaphyler mrsfast poy rnahybrid smalt starbeast xcmswithoutretentiontime
boost clustalomega ebi igtpduplossft mgrast msinspect poystar rnaplex snap strcutal zema
bowtie clustalw edenanonstrict inchworm minia multalin pragcz rnaup snpruler swissmodel
bowtie2 comus edenastrict infernal mira muscle probalign rsearch soap taipan
bratbw coprarna edit intarna mlclustalw musclemaxiters probcons rsmatch soap2 targetrna
bsmap cosine epimode kalign mlclustalwquicktree mzmine probtree sam soapdenovo targetrna2
bsseeker cro erpin kbsps mlmafft ncbiblast pso sate spades taxatortk
buckycon cufflinks fa kraken mlmafftparttree nest pt scro sparse tcoffee
buckymrbayes dali fasta kthse mlmuscle newbler qiime scwrl sparseassembler team
buckymrbayesspa de fasttree leidnl mlopal novoalign qsra scwrlcons spcomp tmap
buckypop dexseq gassst lmat mlprankgt oases ravenna segemehl specarray transabyss
buckyraxml dialign genometa lsqman modellerv onecodex raxml segmodencad spt trinity
builder dialign22 gojobori mafft mosaik openms raxmllimited seqgsea srmapper upmes
bwa dialignt goldman mafftfftns motu pairfold rdiffparam seqman ssaha vcake

Possible predictors of accuracy...
[Figure: histograms (frequency) of each candidate predictor]
Number of citations (#citations), axis from 1 to 100,000
Journal impact factor (journal.IF), axis from 0.5 to 50
Journal H5 index, Google Scholar (journal.H5), axis from 10 to 500
Corresponding author's H-index (author.H), axis from 5 to 150
Corresponding author's M-index (author.M), axis from 2 to 8
Relative age, axis from 0.0 to 1.0

I have found no *significant* predictors of accuracy!
Z = −1.52; p = 0.94
[Figure: Correlations with accuracy rank. Spearman's rho (axis from −0.10 to 0.10) for author.M, author.H, journal.H5, relative age, speed, #citations and journal.IF.]
[Figure: Accuracy vs. Speed. Mean normalised accuracy rank plotted against mean normalised speed rank, with quadrants labelled fast+accurate, fast+inaccurate, slow+inaccurate and slow+accurate. * = hi profile journal; o = hi profile author; x = hi cited]
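
One way to obtain a "mean normalised rank" per method is sketched below. The talk does not define the normalisation, so this is an assumption: within each benchmark the best method gets 1 and the worst gets 0, and a method's value is the mean over the benchmarks it appears in. The function names, the 0.5 quadrant cutoff and the tool names are hypothetical, and ties are not handled.

```python
from collections import defaultdict

def normalised_ranks(ranking):
    """Map a best-first list of methods to scores in [0, 1] (best = 1, worst = 0)."""
    n = len(ranking)
    if n == 1:
        return {ranking[0]: 1.0}
    return {method: 1 - i / (n - 1) for i, method in enumerate(ranking)}

def mean_normalised_rank(benchmarks):
    """Average each method's normalised rank over all benchmarks it appears in."""
    scores = defaultdict(list)
    for ranking in benchmarks:
        for method, score in normalised_ranks(ranking).items():
            scores[method].append(score)
    return {method: sum(v) / len(v) for method, v in scores.items()}

def quadrant(speed, accuracy, cutoff=0.5):
    """Label a method by which side of the cutoff its two mean ranks fall on."""
    return (("fast" if speed >= cutoff else "slow") + "+" +
            ("accurate" if accuracy >= cutoff else "inaccurate"))

# Two made-up benchmarks, each listing hypothetical tools from most to least accurate.
accuracy = mean_normalised_rank([["toolA", "toolB", "toolC"], ["toolB", "toolA"]])
print(accuracy)
print(quadrant(speed=0.9, accuracy=accuracy["toolC"]))
```

Normalising within each benchmark puts studies with different numbers of methods on the same 0 to 1 scale before they are pooled.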

IF & #citations
IF: Spearman’s ρ = 0.104; p-value = 0.20
#cites: Spearman’s ρ = 0.101; p-value = 0.18
[Figure: Accuracy vs. IF (journal impact factor, axis from 0.5 to 50) and Accuracy vs. #citations (# citations, axis from 1 to 100,000).]
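
Spearman's ρ, as reported above, is the Pearson correlation of the ranks. The sketch below computes it in plain Python and attaches a two-sided permutation p-value. The permutation test is an assumption for illustration: the talk does not say how its p-values were obtained, and the function names, `n_perm` and `seed` are made up. Inputs must not be constant, otherwise the correlation is undefined.

```python
import random

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: share of shuffles with |rho| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(spearman_rho(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman_rho(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Made-up values: a predictor (e.g. citation counts) against accuracy ranks.
predictor = [12, 150, 33, 980, 41, 7]
accuracy_rank = [0.4, 0.9, 0.3, 0.5, 0.8, 0.1]
print(spearman_rho(predictor, accuracy_rank), permutation_p_value(predictor, accuracy_rank))
```

Rank-based correlation suits predictors such as citation counts and impact factors, which span several orders of magnitude.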

Conclusions
Nothing appears to be predictive of accuracy [1]
Fast software undergoes more developmental iterations
Can heuristic approaches produce a better result than mathematically
complete approaches?
It doesn’t appear to matter how famous you are, which journals you
publish in, whether you’re early or late, or how often your work is cited:
you can still write great software!
[1] There is still a chance I have screwed something up...

Thanks
Stephanie McGimpsey
Fatemeh Ashari Ghomi
Sinan Uğur Umu
Funded by: Rutherford Discovery Fellowship, BPRC and Biological Heritage: National Science Challenge.
