@yannick__ http://yannick.poulet.org
Social insect evolution:
genomics opportunities
& approaches
2014-10-15-NextBUG
2014 10-15-Nextbug edinburgh
© Alex Wild & others
© National Geographic
Atta leaf-cutter ants
Oecophylla Weaver ants
© ameisenforum.de
© forestryimages.org © wynnie@flickr
Tofilski et al 2008
Forelius pusillus hides the nest entrance at night
Before
Workers staying outside die
“preventive self-sacrifice”
Dorylus driver ants: ants with no home
© BBC
© Dirk Mezger
Ritualized fighting
© Carsten Brühl
Camponotus gigas Pfeiffer & Linsenmair 2001
Army ant milling - “spiral of death”
Animal biomass (Brazilian rainforest)
from Fittkau & Klinge 1973
Other insects: 49.6
Amphibians: 2.8
Reptiles: 3.7
Birds: 5.3
Mammals: 14.5
Earthworms: 17.3
Spiders: 4.7
Soil fauna excluding earthworms, ants & termites: 148
Ants & termites: 114
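For a sense of scale, the share of ants & termites in the listed biomass can be worked out directly. This is a minimal sketch using only the figures listed for the chart; the slide text gives no units, so none are assumed, and the helper name `shareOf` is made up for illustration.

```javascript
// Values as listed on the slide (from Fittkau & Klinge 1973).
// No units appear in the slide text, so none are assumed here.
const biomass = {
  'Other insects': 49.6,
  'Amphibians': 2.8,
  'Reptiles': 3.7,
  'Birds': 5.3,
  'Mammals': 14.5,
  'Earthworms': 17.3,
  'Spiders': 4.7,
  'Soil fauna excluding earthworms, ants & termites': 148,
  'Ants & termites': 114
};

// Fraction of the listed total that one group accounts for.
function shareOf(group, table) {
  const total = Object.values(table).reduce((sum, value) => sum + value, 0);
  return table[group] / total;
}

const antShare = shareOf('Ants & termites', biomass);
console.log((100 * antShare).toFixed(1) + '% of the listed animal biomass');
```

By these numbers, ants & termites make up roughly a third of the listed total.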
Well-studied:
• behavior
• morphology
• evolutionary context
• ecology
This changes everything.
454
Illumina
SOLiD...
Any lab can sequence anything!
Major research areas
Genes/mechanisms for evolution of
social behavior?
Solenopsis invicta fire ants are a big problem!
Very well studied!
Ascunce et al 2011
Solenopsis invicta fire ant: two social forms
Single-queen form:
• 1 large queen
• Independent founding
• Highly territorial
• Many sizes of workers
Multiple-queen form:
• 2-100 smaller queens
• Dependent founding
• No inter-colony aggression
• All workers similar size
Fire ants
Population genetics: allozyme screen
Ken Ross, L. Keller
“starch gel”
=> “Gp-9” locus associated with social form
Single queen form vs. multiple queen form
Ken Ross and colleagues
Laurent Keller and colleagues
Social form completely associated with Gp-9 locus
Genotypes: BB, Bb, bb
(>15%) (<5%)
Gp-9 bb females rare
Sex chromosomes
X Y
Gp-9 B
Gp-9 b
SB Sb
“Social chromosomes”
?
Wang et al Nature 2013
Major research areas
Genes/mechanisms for differences (e.g., lifespan?)?
Genes/mechanisms for evolution of
social behavior?
genome evolution social evolution
Genomics is hard.
• Biology/life is complex
• Field is young.
• Biologists lack computational training.
• Generally, analysis tools suck.
  • badly written
  • badly tested
  • hard to install
  • output quality… often questionable.
• Understanding/visualizing/massaging data is hard.
• Datasets continue to grow!
Inspiration?
Best Practices for Scientific Computing
Greg Wilson, D.A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H.D. Haddock, Katy Huff, Ian M. Mitchell, Mark D. Plumbley, Ben Waugh, Ethan P. White, Paul Wilson
arXiv:1210.0530v3 [cs.MS] 29 Nov 2012
Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists’ productivity and the reliability of their software.
Software is as important to modern scientific research as telescopes and test tubes. From groups that work exclusively on computational problems, to traditional laboratory and field scientists, more and more of the daily operation of science revolves around computers. This includes the development of new algorithms, managing and analyzing the large amounts of data that are generated in single research projects, and combining disparate datasets to assess synthetic problems.
1. Write programs for people, not computers.
2. Automate repetitive tasks.
3. Use the computer to record history.
4. Make incremental changes.
5. Use version control.
6. Don’t repeat yourself (or others).
7. Plan for mistakes.
8. Optimize software only after it works correctly.
9. Document the design and purpose of code rather than its mechanics.
10. Conduct code reviews.
Inspiration?
• Technologies
• Planning for mistakes
• Automated testing
• Continuous
• Writing for people: use style guide
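As a minimal sketch of "planning for mistakes" and "automated testing" (not taken from the talk; the function `gcContent` and its behaviour are illustrative assumptions), a small function can validate its input and be exercised by assertions from Node's built-in `assert` module:

```javascript
// Hypothetical example: defensive programming plus a tiny automated test.
const { strictEqual, throws } = require('assert');

// Return the fraction of G and C bases in a DNA sequence.
// Throws on empty input or unexpected characters instead of silently
// returning a misleading number.
function gcContent(sequence) {
  if (typeof sequence !== 'string' || sequence.length === 0) {
    throw new TypeError('sequence must be a non-empty string');
  }
  const upper = sequence.toUpperCase();
  if (!/^[ACGTN]+$/.test(upper)) {
    throw new Error('sequence contains characters other than A, C, G, T, N');
  }
  const gcCount = (upper.match(/[GC]/g) || []).length;
  return gcCount / upper.length;
}

// Automated checks: rerun these after every change to the function.
strictEqual(gcContent('GGCC'), 1);
strictEqual(gcContent('atgc'), 0.5);
throws(() => gcContent('ATGX'));
throws(() => gcContent(''));
```

Failing loudly on bad input is a design choice here: a crash is easier to notice than a plausible but wrong number further down an analysis.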
Code for people: Use a style guide
• For R: http://r-pkgs.had.co.nz/style.html
R style guide extract
Coding for people: Indent your code!
Line length
Strive to limit your code to 80 characters per line. This fits comfortably on a printed page with a
reasonably sized font. If you find yourself running out of room, this is a good indication that you
should encapsulate some of the work in a separate function.

R style guide extract
ant_measurements <- read.table(file = '~/Downloads/Web/ant_measurements.txt', header=TRUE, se

ant_measurements <- read.table(file = '~/Downloads/Web/ant_measurements.txt',
                               header = TRUE,
                               sep = '\t',
                               col.names = c('colony', 'individual', 'headwidth', 'mass')
                               )

ant_measurements <- read.table(file = '~/Downloads/Web/ant_measurements.txt', header=TRUE,
sep='\t', col.names = c('colony', 'individual', 'headwidth', 'mass'))
Code for people: Use a style guide
• For R: http://r-pkgs.had.co.nz/style.html	

• For Ruby: https://github.com/bbatsov/ruby-style-guide 	

Automatically check your code:
install.packages("lint") # once
library(lint) # every time
lint("file_to_check.R")
Four tools that suck less.
(hopefully)
1. SequenceServer
“Can you BLAST this for me?”
Anurag Priyam, 

Mechanical engineering student, IIT Kharagpur
Sure, I can
help you…
“Can you BLAST this for me?”
Antgenomes.org SequenceServer 	

BLAST made easy
(well, we’re trying...)
http://www.sequenceserver.com/
(requires a BLAST+ install)
1. Install.
gem install sequenceserver
2. Configure.
# ~/.sequenceserver.conf
bin: ~/ncbi-blast-2.2.25+/bin/
database: /Users/me/blast_databases/
Do you have BLAST-formatted databases? If not:
sequenceserver format-databases /path/to/fastas
3. Launch.
sequenceserver
### Launched SequenceServer at: http://0.0.0.0:4567
New release
(soon)
Demo
Web server:
Anurag Priyam & Git community - http://sequenceserver.com
blast on 48-core
512gig fat machine
via ssh
2. Bionode
Module counts
Node = “NPM”
Reusable, small and tested
modules
Examples
BASH
JavaScript
bionode.io (online shell)
bionode-ncbi urls assembly Solenopsis invicta | grep genomic.fna
http://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000188075.1_Si_gnG/GCA_000188075.1_Si_gnG_genomic.fna.gz
bionode-ncbi download sra arthropoda | bionode-sra
bionode-ncbi download gff bacteria
var ncbi = require('bionode-ncbi')
// Look up assembly URLs, then hand them to the callback
ncbi.urls('assembly', 'Solenopsis invicta', gotData)
function gotData(urls) {
  var genome = urls[0].genomic.fna
  download(genome) // download() is not defined on the slide
}
# Get descriptions for papers related to SRA search
bionode ncbi search sra Solenopsis invicta |
        tool-stream extractProperty uid |
        bionode ncbi link sra pubmed |
        tool-stream extractProperty destUID |
        bionode ncbi search pubmed
Difficulty writing scalable, reproducible and
complex bioinformatic pipelines.
Solution: Node.js everywhere
Streams
var ncbi = require('bionode-ncbi')
var tool = require('tool-stream')
var through = require('through2')
var fork1 = through.obj()
var fork2 = through.obj()
ncbi
.search('sra', 'Solenopsis invicta')
.pipe(fork1)
.pipe(dat.reads)
fork1
.pipe(tool.extractProperty('expxml.Biosample.id'))
.pipe(ncbi.search('biosample'))
.pipe(dat.samples)
fork1
.pipe(tool.extractProperty('uid'))
.pipe(ncbi.link('sra', 'pubmed'))
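The stream idea in the snippet above can be tried without any Bionode install. The following is a rough sketch using only Node's built-in `stream` module: `extractProperty` here is a stand-in written for this example (it is not the `tool-stream` implementation), `extractAll` is a hypothetical helper, and the records are made-up placeholders rather than real NCBI results.

```javascript
// Sketch using only Node's built-in stream module (no bionode required).
const { Readable, Transform, Writable, pipeline } = require('stream');

// Object-mode transform that emits one (possibly nested) property of each
// incoming object, in the spirit of tool.extractProperty in the slide code.
function extractProperty(path) {
  const keys = path.split('.');
  return new Transform({
    objectMode: true,
    transform(record, _encoding, done) {
      let value = record;
      for (const key of keys) {
        value = value == null ? undefined : value[key];
      }
      // Skip records lacking the property rather than breaking the pipeline.
      if (value != null) this.push(value);
      done();
    }
  });
}

// Pipe an array of records through extractProperty and collect the output.
function extractAll(records, path) {
  return new Promise((resolve, reject) => {
    const results = [];
    const sink = new Writable({
      objectMode: true,
      write(value, _encoding, done) {
        results.push(value);
        done();
      }
    });
    pipeline(Readable.from(records), extractProperty(path), sink, (err) => {
      if (err) reject(err);
      else resolve(results);
    });
  });
}

// Made-up records standing in for search results.
const exampleRecords = [
  { uid: '1', expxml: { Biosample: { id: 'S1' } } },
  { uid: '2', expxml: {} },
  { uid: '3', expxml: { Biosample: { id: 'S3' } } }
];

extractAll(exampleRecords, 'expxml.Biosample.id')
  .then((ids) => console.log(ids));
```

Each step only sees one record at a time, which is what lets this style of pipeline handle datasets larger than memory.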
Working with Gene predictions
Gene prediction
Dozens of software algorithms: dozens of predictions
20% failure rate:
• missing pieces
• extra pieces
• incorrect merging
• incorrect splitting
Visual inspection... and manual fixing required.
1 gene = 5 minutes to 3 days
Yandell & Ence 2013 NRG
GTTTTtACCTGTTTTtGAAAAGGTAATTTTCTTTAGATATATTATGTTGAATaTTAGGGTTTTTATAAAGAATGTGTATATTGUTTACAATATAAAAGACACAATTGCAAACTAGCATGATTGTAAACAATTGCTAAACGGATCAATATAAATTAAAATTGTAATATTAAGTATCAAACCGATAATTTTTA
Evidence
Consensus:
3. GeneValidator
Monica Dragan
Ismail Moghul
https://github.com/monicadragan/GeneValidator
https://github.com/IsmailM/GeneValidatorApp
GeneValidator
Run on:
★ whole geneset: identify most problematic predictions
★ alternative models for a gene (choose best)
★ individual genes (while manually curating)
Warning: Work in Progress
gem install GeneValidator
gem install GeneValidatorApp
http://afra.sbcs.qmul.ac.uk/genevalidator
4. Afra: Crowdsourcing
gene model curation
Gene prediction
Dozens of software algorithms: dozens of predictions
20% failure rate:
• missing pieces
• extra pieces
• incorrect merging
• incorrect splitting
Visual inspection... and manual fixing required.
1 gene = 20 minutes to 3 days
15,000 genes * 20 species =
impossible.
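To put a number on "impossible", here is a back-of-the-envelope sketch using the slide's own figures and its most optimistic time of 20 minutes per gene. The function name `curationHours` is made up for illustration.

```javascript
// Rough curation workload, using the figures on the slide.
const MINUTES_PER_HOUR = 60;

// Total person-hours needed to inspect every gene once.
function curationHours(genesPerSpecies, species, minutesPerGene) {
  return (genesPerSpecies * species * minutesPerGene) / MINUTES_PER_HOUR;
}

// 15,000 genes * 20 species at 20 minutes each.
const bestCaseHours = curationHours(15000, 20, 20);
console.log(bestCaseHours + ' person-hours in the best case');
```

That is 300,000 gene models and 100,000 person-hours even if every gene takes only 20 minutes.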
Yandell & Ence 2013 NRG
TTTTtACCTGTTTTtGAAAAGGTAATTTTCTTTAGATATATACAGTTTGTAATaTTAGGTATTTTATAAACAGTGTGTATATTTCTTACAATATAAAAGACACAATTGCAAACTAGCATGATTGTAAACAATTGCTAAACGGATCAATATAAATTAAAATTGTAATATTAAGTATCAAACCGATAATTTTT
Evidence
Consensus:
Algorithm discovery by protein folding game players
Firas Khatib, Seth Cooper, Michael D. Tyka, Kefan Xu, Ilya Makedon, Zoran Popović, David Baker, and Foldit Players
Department of Biochemistry; Department of Computer Science and Engineering; and Howard Hughes Medical Institute, University of Washington, Box 357370, Seattle, WA 98195
Contributed by David Baker, October 5, 2011 (sent for review June 29, 2011)
Foldit is a multiplayer online game in which players collaborate and compete to create accurate protein structure models. For specific hard problems, Foldit player solutions can in some cases outperform state-of-the-art computational methods. However, very little is known about how collaborative gameplay produces these results and whether Foldit player strategies can be formalized and structured so that they can be used by computers. To determine whether high performing player strategies could be collectively codified, we augmented the Foldit gameplay mechanics with tools for players to encode their folding strategies as “recipes” and to share their recipes with other players, who are able to further modify and redistribute them. Here we describe the rapid social evolution of player-developed folding algorithms that took place in the year following the introduction of these tools. Players developed over 5,400 different recipes, both by creating new algorithms and by modifying and recombining successful recipes developed by other players. The most successful recipes rapidly spread through the Foldit player population, and two of the recipes became particularly dominant. Examination of the algorithms encoded in these two recipes revealed a striking similarity to an unpublished algorithm developed by scientists over the same period. Benchmark calculations show that the new algorithm independently discovered by scientists and by Foldit players outperforms previously published methods. Thus, online scientific game frameworks have the potential not only to solve hard scientific problems, but also to discover and formalize effective new strategies and algorithms.
citizen science | crowd-sourcing | optimization | structure prediction | strategy
doi:10.1073/pnas.1115898108
http://Fold.it
• Recruiting & retaining contributors
Crowd-sourcing the visual inspection + correction
of gene models.
Challenges
Recruiting & retaining contributors
Plan A: get students.
• Increase accessibility:
  • Make tasks small & simple
  • Need excellent tutorials & training
  • Need an intelligent “mothering” user interface.
• Provide rewards:
  • Better grades
  • Learning experience
  • Good karma (helping science)
  • Prestige & pride (on facebook; points & badges “leaderboard”, with certificates, in publications)
  • Opportunities to develop expertise & responsibilities
Crowd-sourcing the visual inspection + correction
of gene models.
Challenges
• Recruiting & retaining contributors	

• Ensuring quality
Ensuring quality
• Excellent tutorials/training
• Make tasks small & simple
• Redundancy
• Review of conflicts by senior users.
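One way to picture "redundancy" plus "review of conflicts by senior users" is a simple agreement rule over repeated curations of the same gene. The sketch below is hypothetical and is not Afra's implementation: the function name `resolveCurations`, the `minAgreement` threshold, and the string representation of a curated model are all assumptions made for illustration.

```javascript
// Hypothetical sketch: combine redundant curations of one gene model.
// Each submission is a string describing the curated model (for example a
// serialized list of exon coordinates); this format is made up.
function resolveCurations(submissions, minAgreement = 2) {
  // Count how many curators submitted each distinct model.
  const counts = new Map();
  for (const model of submissions) {
    counts.set(model, (counts.get(model) || 0) + 1);
  }

  // Find the most frequently submitted model.
  let best = null;
  let bestCount = 0;
  for (const [model, count] of counts) {
    if (count > bestCount) {
      best = model;
      bestCount = count;
    }
  }

  // Accept only if enough curators agree and no other model ties;
  // otherwise flag the gene for review by a senior user.
  const tied = [...counts.values()].filter((c) => c === bestCount).length > 1;
  if (bestCount >= minAgreement && !tied) {
    return { status: 'accepted', model: best };
  }
  return { status: 'needs_review', model: null };
}

console.log(resolveCurations(['exons:1-50,80-120', 'exons:1-50,80-120', 'exons:1-60']));
console.log(resolveCurations(['exons:1-50', 'exons:1-60', 'exons:1-70']));
```

Raising `minAgreement` trades curator time for confidence, so the threshold would need tuning against real curation data.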
Diagram labels: Begin; Curate / Being curated (three times); Submit (three times)
Crowd-sourcing the visual inspection + correction.
Challenges
http://afra.sbcs.qmul.ac.uk
Anurag Priyam http://github.com/yeban/afra
• Recruiting & retaining contributors	

• Ensuring quality
Warning: Work in Progress
Timelines
• Rolled out to:	

• 8 MSc students	

• 20 3rd year students	

• Need to improve tutorials/guidance/documentation	

• Roll out to 200 first years (few months)	

• Expand
Summary
• Ants are cool
• Exciting times & big challenges
• Inspiration from people working with computers more/longer
• SequenceServer - set up custom BLAST servers
• Bionode - modular streams for bioinformatics
• GeneValidator - identifying problems with gene predictions
• Afra - infrastructure to crowdsource gene curation to the masses
Recruiting Genomehacker/
Bioinformatics support
GitHub
Thanks!
y.wurm@qmul.ac.uk
@yannick__
http://yannick.poulet.org
Colleagues & Collaborators
@ QMUL & UNIL
Anurag Priyam 		 @yeban
Monica Dragan
Ismail Moghul
Vivek Rai
Bruno Vieira @bmpvieira
Maybe
genome evolution social evolution
Generally
Single- vs. Multiple queenness
in fire ants
in similar independent species
•one or many loci?	

•one or many genes?	

•convergence?
Social parasitism
Strengths of selection in
social evolution
concepts & mechanisms
Medically relevant questions
Candidate gene studies
Vitellogenin
Sex determination genes
functional testing....
Tools for genomics work on emerging model organisms
Molecular response to
social upheaval

2014 10-15-Nextbug edinburgh

  • 1. @yannick__ http://yannick.poulet.org Social insect evolution: genomics opportunities & approaches 2014-10-15-NextBUG
  • 3. © Alex Wild & others
  • 5. © National Geographic Atta leaf-cutter ants
  • 6. © National Geographic Atta leaf-cutter ants
  • 7. © National Geographic Atta leaf-cutter ants
  • 9. Oecophylla Weaver ants © ameisenforum.de
  • 13. Tofilski et al 2008 Forelius pusillus
  • 14. Tofilski et al 2008 Forelius pusillus hides the nest entrance at night
  • 15. Tofilski et al 2008 Forelius pusillus hides the nest entrance at night
  • 16. Tofilski et al 2008 Forelius pusillus hides the nest entrance at night
  • 17. Tofilski et al 2008 Forelius pusillus hides the nest entrance at night
  • 18. Avant Workers staying outside die « preventive self-sacrifice » Tofilski et al 2008 Forelius pusillus hides the nest entrance at night
  • 19. Dorylus driver ants: ants with no home © BBC
  • 20. © Dirk Mezger Ritualized fighting © Carsten Brühl Camponotus gigas Pfeiffer & Linsenmair 2001
  • 21. Army ant milling - “spiral of death”
  • 22. Animal biomass (Brazilian rainforest) from Fittkau & Klinge 1973 Other insects 49.6 Amphibians 2.8 Reptiles 3.7 Birds 5.3 Mammals 14.5 ! Earthworms 17.3 ! ! Spiders 4.7 Soil fauna excluding earthworms, ants & termites 148 Ants & termites 114
  • 24. Well-studied: • behavior • morphology • evolutionary context • ecology
  • 26. Major research areas Genes/mechanisms for evolution of social behavior?
  • 27. Solenopsis invicta fire ants are a big problem! Very well studied! Ascunce et al 2011
  • 28. Solenopsis invicta fire ant: two social forms. Single-queen form: • 1 large queen • Independent founding • Highly territorial • Many sizes of workers. Multiple-queen form: • 2-100 smaller queens • Dependent founding • No inter-colony aggression • All workers similar size
  • 29. Fire ants + Population genetics: Allozyme screen (“starch gel”). Ken Ross, L. Keller. => “Gp-9” locus associated with social form
  • 31. Single queen form Multiple queen form Ken Ross and colleagues Laurent Keller and colleagues Social form completely associated to Gp-9 locus
  • 32. bbbbBB BB Bb bb Ken Ross and colleagues Laurent Keller and colleagues Single queen form Multiple queen form Social form completely associated to Gp-9 locus (>15% )(< 5% )
  • 33. bbBB BB Bb x Gp-9 bb females rare Ken Ross and colleagues Laurent Keller and colleagues Single queen form Multiple queen form Social form completely associated to Gp-9 locus (>15% )(< 5% )
  • 34. BB BB Bb Ken Ross and colleagues Laurent Keller and colleagues Single queen form Multiple queen form Social form completely associated to Gp-9 locus (>15% )(< 5% )
  • 35. BB BB Bb x Ken Ross and colleagues Laurent Keller and colleagues Single queen form Multiple queen form Social form completely associated to Gp-9 locus (>15% )(< 5% )
  • 36. BB BB Bb x x Ken Ross and colleagues Laurent Keller and colleagues Social form completely associated to Gp-9 locus Single queen form Multiple queen form (>15% )(< 5% )
  • 37. BB BB Bb x x x Ken Ross and colleagues Laurent Keller and colleagues Single queen form Multiple queen form (>15% )(< 5% ) Social form completely associated to Gp-9 locus
  • 38. Sex chromosomes X Y Gp-9 B Gp-9 b SB Sb “Social chromosomes” ? Wang et al Nature 2013
  • 39. Major research areas Genes/mechanisms for differences (e.g., lifespan?)? Genes/mechanisms for evolution of social behavior? genome evolution social evolution
  • 44. • Biology/life is complex • Field is young. • Biologists lack computational training. • Generally, analysis tools suck. • badly written • badly tested • hard to install • output quality… often questionable. • Understanding/visualizing/massaging data is hard. • Datasets continue to grow! Genomics is hard.
  • 47. Best Practices for Scientific Computing. Greg Wilson, D.A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H.D. Haddock, Katy Huff, Ian M. Mitchell, Mark D. Plumbley, Ben Waugh, Ethan P. White, Paul Wilson. arXiv:1210.0530v3 [cs.MS] 29 Nov 2012.
    Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists’ productivity and the reliability of their software.
    Software is as important to modern scientific research as telescopes and test tubes. From groups that work exclusively on computational problems, to traditional laboratory and field scientists, more and more of the daily operation of science revolves around computers. This includes the development of new algorithms, managing and analyzing the large amounts of data that are generated in single research projects, and combining disparate datasets to assess synthetic problems.
    1. Write programs for people, not computers. 2. Automate repetitive tasks. 3. Use the computer to record history. 4. Make incremental changes. 5. Use version control. 6. Don’t repeat yourself (or others). 7. Plan for mistakes. 8. Optimize software only after it works correctly. 9. Document the design and purpose of code rather than its mechanics. 10. Conduct code reviews.
  • 50. Inspiration? • Technologies • Planning for mistakes • Automated testing • Continuous • Writing for people: use style guide
  • 51. Code for people: Use a style guide • For R: http://r-pkgs.had.co.nz/style.html
  • 52. R style guide extract
  • 53. Coding for people: Indent your code! Damian Conway
  • 54. Line length. Strive to limit your code to 80 characters per line. This fits comfortably on a printed page with a reasonably sized font. If you find yourself running out of room, this is a good indication that you should encapsulate some of the work in a separate function. (R style guide extract)
    Example 1: ant_measurements <- read.table(file = '~/Downloads/Web/ant_measurements.txt', header=TRUE, se
    Example 2: ant_measurements <- read.table(file = '~/Downloads/Web/ant_measurements.txt', header = TRUE, sep = '\t', col.names = c('colony', 'individual', 'headwidth', 'mass') )
    Example 3: ant_measurements <- read.table(file = '~/Downloads/Web/ant_measurements.txt', header=TRUE, sep='\t', col.names = c('colony', 'individual', 'headwidth', 'mass'))
  • 55. Code for people: Use a style guide • For R: http://r-pkgs.had.co.nz/style.html • For Ruby: https://github.com/bbatsov/ruby-style-guide Automatically check your code: install.packages("lint") # once; library(lint) # every time; lint("file_to_check.R")
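The 80-character guideline from the style guide extract is easy to check mechanically. The sketch below is only an illustration in JavaScript (the language of the deck's later pipeline examples); `findLongLines` is a hypothetical helper, not part of the R `lint` package, and it checks nothing other than line length.

```javascript
// Hypothetical helper: report every line of `source` longer than `maxLength`.
// Returns an array of { line, length } objects, with 1-based line numbers.
function findLongLines(source, maxLength = 80) {
  return source
    .split('\n')
    .map((text, index) => ({ line: index + 1, length: text.length }))
    .filter((entry) => entry.length > maxLength);
}

// Two R statements used purely as sample input text.
const example = [
  "ants <- read.table(file = 'ant_measurements.txt', header = TRUE)",
  "ants <- read.table(file = '~/Downloads/Web/ant_measurements.txt', header = TRUE, sep = '\\t')",
].join('\n');

// Only the second statement exceeds the limit.
console.log(findLongLines(example));
```

A dedicated linter covers far more than line length, so a check like this is best seen as a quick sanity test rather than a replacement.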
  • 61. “Can you BLAST this for me?”
  • 62. Anurag Priyam, Mechanical engineering student, IIT Kharagpur: “Sure, I can help you…”
  • 63. “Can you BLAST this for me?” Antgenomes.org SequenceServer BLAST made easy (well, we’re trying...)
  • 64. http://www.sequenceserver.com/ (requires a BLAST+ install)
    1. Install: gem install sequenceserver
    2. Configure: # ~/.sequenceserver.conf
       bin: ~/ncbi-blast-2.2.25+/bin/
       database: /Users/me/blast_databases/
    3. Launch: sequenceserver
       ### Launched SequenceServer at: http://0.0.0.0:4567
    Do you have BLAST-formatted databases? If not: sequenceserver format-databases /path/to/fastas
  • 66. “Can you BLAST this for me?” Antgenomes.org SequenceServer BLAST made easy (well, we’re trying...) Web server: Anurag Priyam & Git community - http://sequenceserver.com blast on 48-core 512gig fat machine via ssh
  • 68. Module counts Node = “NPM”
  • 70. Reusable, small and tested modules
  • 71. Examples: BASH and JavaScript. bionode.io (online shell)
    BASH:
      bionode-ncbi urls assembly Solenopsis invicta | grep genomic.fna
      http://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000188075.1_Si_gnG/GCA_000188075.1_Si_gnG_genomic.fna.gz
      bionode-ncbi download sra arthropoda | bionode-sra
      bionode-ncbi download gff bacteria
    JavaScript:
      var ncbi = require('bionode-ncbi')
      ncbi.urls('assembly', 'Solenopsis invicta', gotData)
      function gotData(urls) {
        var genome = urls[0].genomic.fna
        download(genome)
      }
    BASH:
      # Get descriptions for papers related to SRA search
      bionode ncbi search sra Solenopsis invicta |
        tool-stream extractProperty uid |
        bionode ncbi link sra pubmed |
        tool-stream extractProperty destUID |
        bionode ncbi search pubmed
  • 72. Difficulty writing scalable, reproducible and complex bioinformatic pipelines. Solution: Node.js everywhere + Streams
      var ncbi = require('bionode-ncbi')
      var tool = require('tool-stream')
      var through = require('through2')
      var fork1 = through.obj()
      var fork2 = through.obj()
      ncbi
        .search('sra', 'Solenopsis invicta')
        .pipe(fork1)
        .pipe(dat.reads)
      fork1
        .pipe(tool.extractProperty('expxml.Biosample.id'))
        .pipe(ncbi.search('biosample'))
        .pipe(dat.samples)
      fork1
        .pipe(tool.extractProperty('uid'))
        .pipe(ncbi.link('sra', 'pubmed'))
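The forking pattern on this slide can be tried without any network access. The sketch below uses only Node's built-in `stream` and `events` modules; `extractProperty`, `collectInto` and `forkAndCollect` are stand-ins written for this illustration (they are not the `tool-stream` or `bionode-ncbi` implementations), and the records are made up.

```javascript
const { Readable, Transform, Writable } = require('stream');
const { once } = require('events');

// Object-mode transform that emits one property of each incoming record.
// A stand-in for the extractProperty step on the slide.
function extractProperty(name) {
  return new Transform({
    objectMode: true,
    transform(record, _encoding, callback) {
      callback(null, record[name]);
    },
  });
}

// Object-mode sink that appends every chunk it receives to `target`.
function collectInto(target) {
  return new Writable({
    objectMode: true,
    write(chunk, _encoding, callback) {
      target.push(chunk);
      callback();
    },
  });
}

// Fork one source into two branches and wait until both sinks have finished.
async function forkAndCollect(records) {
  const uids = [];
  const samples = [];
  const uidSink = collectInto(uids);
  const sampleSink = collectInto(samples);

  const source = Readable.from(records); // object mode by default
  // Piping the same readable twice sends every record down both branches.
  source.pipe(extractProperty('uid')).pipe(uidSink);
  source.pipe(extractProperty('sample')).pipe(sampleSink);

  await Promise.all([once(uidSink, 'finish'), once(sampleSink, 'finish')]);
  return { uids, samples };
}

// Made-up records standing in for search results.
forkAndCollect([
  { uid: '1', sample: 'A' },
  { uid: '2', sample: 'B' },
]).then((result) => console.log(result));
```

Each branch is an independent chain of small steps, which is the property that makes such pipelines easy to test and recombine.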
  • 75. Working with Gene predictions
  • 76. Gene prediction. Dozens of software algorithms: dozens of predictions. 20% failure rate: • missing pieces • extra pieces • incorrect merging • incorrect splitting. Visual inspection... and manual fixing required. 1 gene = 5 minutes to 3 days. Yandell & Ence 2013 NRG. [Figure: evidence tracks above a consensus DNA sequence]
  • 81. GeneValidator Run on: ★whole geneset: identify most problematic predictions ★alternative models for a gene (choose best) ★individual genes (while manually curating)
  • 82. Warning: Work in progress. gem install GeneValidator; gem install GeneValidatorApp. http://afra.sbcs.qmul.ac.uk/genevalidator
  • 84. Gene prediction. Dozens of software algorithms: dozens of predictions. 20% failure rate: • missing pieces • extra pieces • incorrect merging • incorrect splitting. Visual inspection... and manual fixing required. 1 gene = 20 minutes to 3 days. 15,000 genes * 20 species = impossible. Yandell & Ence 2013 NRG. [Figure: evidence tracks above a consensus DNA sequence]
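To make "impossible" concrete, the figures on this slide can simply be multiplied out. The short calculation below uses only numbers stated on the slide (15,000 genes, 20 species, 20 minutes to 3 days per gene) and assumes, for the sake of the estimate, that every gene model is inspected once; the variable names are illustrative.

```javascript
// Figures taken from the slide.
const genesPerSpecies = 15000;
const species = 20;
const minutesPerGeneBest = 20; // fastest case: 20 minutes per gene
const daysPerGeneWorst = 3;    // slowest case: 3 days per gene

const totalGenes = genesPerSpecies * species;                  // gene models to inspect
const bestCaseHours = (totalGenes * minutesPerGeneBest) / 60;  // person-hours, fastest case
const worstCaseDays = totalGenes * daysPerGeneWorst;           // person-days, slowest case

console.log({ totalGenes, bestCaseHours, worstCaseDays });
```

That is 300,000 gene models, which even at 20 minutes each comes to 100,000 person-hours of manual curation.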
  • 86. Algorithm discovery by protein folding game players. Firas Khatib, Seth Cooper, Michael D. Tyka, Kefan Xu, Ilya Makedon, Zoran Popović, David Baker, and Foldit Players. Department of Biochemistry; Department of Computer Science and Engineering; and Howard Hughes Medical Institute, University of Washington, Box 357370, Seattle, WA 98195. Contributed by David Baker, October 5, 2011 (sent for review June 29, 2011).
    Foldit is a multiplayer online game in which players collaborate and compete to create accurate protein structure models. For specific hard problems, Foldit player solutions can in some cases outperform state-of-the-art computational methods. However, very little is known about how collaborative gameplay produces these results and whether Foldit player strategies can be formalized and structured so that they can be used by computers. To determine whether high performing player strategies could be collectively codified, we augmented the Foldit gameplay mechanics with tools for players to encode their folding strategies as “recipes” and to share their recipes with other players, who are able to further modify and redistribute them. Here we describe the rapid social evolution of player-developed folding algorithms that took place in the year following the introduction of these tools. Players developed over 5,400 different recipes, both by creating new algorithms and by modifying and recombining successful recipes developed by other players. The most successful recipes rapidly spread through the Foldit player population, and two of the recipes became particularly dominant. Examination of the algorithms encoded in these two recipes revealed a striking similarity to an unpublished algorithm developed by scientists over the same period. Benchmark calculations show that the new algorithm independently discovered by scientists and by Foldit players outperforms previously published methods. Thus, online scientific game frameworks have the potential not only to solve hard scientific problems, but also to discover and formalize effective new strategies and algorithms.
    Keywords: citizen science | crowd-sourcing | optimization | structure prediction | strategy
    Citizen science is an approach to leveraging natural human abilities for scientific purposes. Most such efforts involve visual tasks such as tagging images or locating image features (1–3). In contrast, Foldit is a multiplayer online scientific discovery game, in which players become highly skilled at creating accurate protein structure models through extended game play (4, 5). Foldit recruits online gamers to optimize the computed Rosetta energy using human spatial problem-solving skills. Players manipulate protein structures with a palette of interactive tools and manipulations. Through their interactive exploration Foldit players also utilize user-friendly versions of algorithms from the Rosetta structure prediction methodology (6) such as wiggle (gradient-based energy minimization) and shake (combinatorial side chain rotamer packing). The potential of gamers to solve more complex scientific problems was recently highlighted by the solution of a long-standing protein structure determination problem by Foldit players (7).
    One of the key strengths of game-based human problem exploration is the human ability to search over the space of possible strategies and adapt those strategies to the type of problem and stage of problem solving (5). The variability of tactics and strategies stems from the individuality of each player as well as multiple methods of sharing and evolution within the game (group play, game chat), and outside of the game [wiki pages (8)]. One way to arrive at algorithmic methods underlying successful human Foldit play would be to apply machine learning techniques to the detailed logs of expert Foldit players (9). We chose instead to rely on a superior learning machine: Foldit players themselves. As the players themselves understand their strategies better than anyone, we decided to allow them to codify their algorithms directly, rather than attempting to automatically learn approximations. We augmented standard Foldit play with the ability to create, edit, share, and rate gameplay macros, referred to as “recipes” within the Foldit game (10). In the game each player has their own “cookbook” of such recipes, from which they can invoke a variety of interactive automated strategies. Players can share recipes they write with the rest of the Foldit community or they can choose to keep their creations to themselves.
    In this paper we describe the quite unexpected evolution of recipes in the year after they were released, and the striking convergence of this very short evolution on an algorithm very similar to an unpublished algorithm recently developed independently by scientific experts that improves over previous methods.
    Results. In the social development environment provided by Foldit, players evolved a wide variety of recipes to codify their diverse strategies to problem solving. During the three and a half month study period (see Materials and Methods), 721 Foldit players ran 5,488 unique recipes 158,682 times and 568 players wrote 5,202 recipes. We studied these algorithms and found that they fell into four main categories: (i) perturb and minimize, (ii) aggressive rebuilding, (iii) local optimize, and (iv) set constraints. The first category goes beyond the deterministic minimize function provided to Foldit players, which has the disadvantage of readily being trapped in local minima, by adding in perturbations to lead the minimizer in different directions (11). The second category uses the rebuild tool, which performs fragment insertion with loop closure, to search different areas of conformation space; these recipes are often run for long periods of time as they are designed to rebuild entire regions of a protein rather than just refining them (Fig. S1). The third category of recipes performs local minimizations along the protein backbone in order to improve the Rosetta energy for every segment of a protein. The final category of recipes assigns constraints between beta strands or pairs of residues (rubber bands), or changes the secondary structure assignment to guide subsequent optimization.
    Different algorithms were used with very different frequencies during the experiment. Some are designated by the authors as public and are available for use by all Foldit players, whereas others are private and available only to their creator or their Foldit team. The distribution of recipe usage among different players is shown in Fig. 1 for the 26 recipes that were run over 1,000 times. Some recipes, such as the one represented by the leftmost bar, were used many times by many different players, while others, such as the one represented by the pink bar in the
    Author contributions: F.K., S.C., Z.P., and D.B. designed research; F.K., S.C., M.D.T., and F.P. performed research; F.K., S.C., M.D.T., K.X., and I.M. analyzed data; and F.K., S.C., Z.P., and D.B. wrote the paper. The authors declare no conflict of interest. Freely available online through the PNAS open access option. To whom correspondence should be addressed. E-mail: dabaker@u.washington.edu. This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1115898108/-/DCSupplemental. Biophysics and Computational Biology; Psychological and Cognitive Sciences. http://Fold.it
  • 87. • Recruiting & retaining contributors Crowd-sourcing the visual inspection + correction of gene models. Challenges
  • 88. Recruiting & retaining contributors Plan A: get students. • Increase accessibility: • Make tasks small & simple • Need excellent tutorials & training • Need an intelligent “mothering” user interface. • Provide rewards: • Better grades • Learning experience • Good karma (helping science) • Prestige & pride (on facebook; points & badges “leaderboard”, with certificates, in publications) • Opportunities to develop expertise & responsibilities
  • 89. Crowd-sourcing the visual inspection + correction of gene models. Challenges • Recruiting & retaining contributors • Ensuring quality
  • 90. Ensuring quality • Excellent tutorials/training • Make tasks small & simple • Redundancy • Review of conflicts by senior users. [Diagram: task states Begin, Being curated, Curate, Submit, repeated three times]
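The combination of redundancy and senior review listed on this slide can be expressed as a small decision rule. The sketch below is a hypothetical illustration, not Afra's actual logic: `resolveTask`, the `minAgreement` threshold and the status names are all assumptions, and each submission is represented as a plain string describing a curated gene model.

```javascript
// Hypothetical rule: accept a curated gene model when enough independent
// curators submitted the same model; otherwise escalate to a senior user.
function resolveTask(submissions, minAgreement = 2) {
  // Count how many curators submitted each distinct model.
  const counts = new Map();
  for (const model of submissions) {
    counts.set(model, (counts.get(model) || 0) + 1);
  }

  // Find the most frequently submitted model.
  let best = null;
  let bestCount = 0;
  for (const [model, count] of counts) {
    if (count > bestCount) {
      best = model;
      bestCount = count;
    }
  }

  // Accept only with a strict majority that also meets the threshold.
  if (bestCount >= minAgreement && bestCount > submissions.length / 2) {
    return { status: 'accepted', model: best };
  }
  return { status: 'needs_senior_review', candidates: [...counts.keys()] };
}

console.log(resolveTask(['exons:1-3', 'exons:1-3', 'exons:1-4']));
console.log(resolveTask(['exons:1-3', 'exons:1-4', 'exons:2-4']));
```

With three redundant curations per task, as the diagram suggests, two matching submissions are enough to accept a model, and any three-way disagreement goes to review.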
  • 91. Crowd-sourcing the visual inspection + correction. Challenges • Recruiting & retaining contributors • Ensuring quality. http://afra.sbcs.qmul.ac.uk Anurag Priyam http://github.com/yeban/afra
  • 96. Timelines • Rolled out to: • 8 MSc students • 20 3rd year students • Need to improve tutorials/guidance/documentation • Roll out to 200 first years (few months) • Expand
  • 97. Summary • Ants are cool • Exciting times & big challenges • Inspiration from people working with computers more/longer • SequenceServer - set up custom BLAST servers • Bionode - modular streams for bioinformatics • GeneValidator - identifying problems with gene predictions • Afra - infrastructure to crowdsource gene curation to the masses
  • 100. Thanks! y.wurm@qmul.ac.uk @yannick__ http://yannick.poulet.org Colleagues & Collaborators @ QMUL & UNIL Anurag Priyam @yeban Monica Dragan Ismail Moghul Vivek Rai Bruno Vieira @bmpvieira
  • 102. Maybe
  • 103. genome evolution social evolution Generally Single- vs. Multiple queenness in fire ants in similar independent species •one or many loci? •one or many genes? •convergence? Social parasitism Strengths of selection in social evolution concepts & mechanisms Medically relevant questions Candidate gene studies Vitellogenin Sex determination genes functional testing.... Tools for genomics work on emerging model organisms Molecular response to social upheaval