Making your data work for you:
                           Scratchpads, publishing & the
                               Biodiversity Data Journal



EBI, UK              Vince Smith1, Dave Roberts1 & Lyubomir Penev2
25 September, 2012                  1. Natural History Museum, London
                                  2. Pensoft Publishers, Sofia, Bulgaria

                                                     vince@vsmith.info
Our informatics grand challenge…

 “Link together evolutionary
 data… by developing
 analytical tools and proper
 documentation and then
 use this framework to
 conduct comparative
 analyses, studies of
 evolutionary process and
 biodiversity analyses”


         Cyndy Parr, Rob Guralnick, Nico
         Cellinese and Rod Page. TREE.
         doi:10.1016/j.tree.2011.11.001
Our informatics grand challenge…

 This requires data, information & knowledge to be…
   • Digital: not printed paper
   • Openly accessible: not behind barriers
   • Linked-up: not in silos
Most of our output is not digital, open or linked
  •   15-20k new spp. described annually (2M total) [1]
  •   30k nomenclatural acts (12M total) [1]
  •   20k phylogenies (750k total) [2]
  •   31k taxa sequenced (360k taxa total) [3]
  •   800k BioMed papers (40M total pp. of taxonomy) [4]
  •   Countless specimens, images, maps, keys…

     Typically generated by small communities for “local” research projects

     Figures from [1] Zhang, Zootaxa 2011 4, 1-4; [2] Web of Science; [3] GenBank and [4] PubMed.
Scratchpad
Virtual Research Environments

    Making taxonomy digital, open & linked
What is a Scratchpad?
 A website for you & your community

   1. Your data
   2. Uploaded & tagged
   3. “Published” & reviewed on your site

   Fast • Intuitive • Fit for use
Scratchpads
                        • EDIT (07-11), ViBRANT / eMonocot (11-13)
                        • Hosted websites for taxonomists
                        • Taxonomic, regional or societal
                        • Research & publication platform
                        • Supports the taxonomic workflow
                        • Modular (Drupal) & flexible
                        • Two full time developers
                        • Ecosystem of communities (~450)




http://scratchpads.eu
Categories of Scratchpads




                                      Taxa
 (Classifications, taxon profiles, specimens, literature, images, maps, phenotypic,
               genotypic & morphometric datasets, keys, phylogenies)




    Conservation           Projects             Regions              Societies
Summary of what Scratchpads can do
  •   Taxon pages, generated from tagged content (plant/animal)
  •   Bibliography management
  •   Character matrices
  •   Specimen records
  •   Distribution maps (from specimens and regional)
  •   Images, video and sound (bulk import)
  •   Excel spreadsheet import (dynamically generated templates)
  •   Darwin Core Archive export
  •   Tabular data editing
  •   Custom content
  •   User management
  •   Custom webforms
  •   EOL data import (taxonomy, species information)
  •   GBIF Map integration
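Taxon pages built from tagged content are the core idea in the list above. The Python sketch below shows one plausible way such pages could be assembled; it is not Scratchpads' Drupal code. It assumes a toy classification (PARENT, using placeholder names in the “Aus bus” style), hypothetical content items, and that an item tagged with a taxon should also appear on the page of every higher taxon above it.

```python
from collections import defaultdict

# Hypothetical classification: each taxon name maps to its parent (None = root).
PARENT = {
    "Aidae": None,
    "Aus": "Aidae",
    "Aus bus": "Aus",
    "Cus": "Aidae",
}

# Hypothetical content items, each tagged with one or more taxon names.
CONTENT = [
    {"type": "image", "title": "Habitus photo", "tags": ["Aus bus"]},
    {"type": "specimen", "title": "Specimen A", "tags": ["Aus bus"]},
    {"type": "literature", "title": "Revision paper", "tags": ["Cus"]},
    {"type": "map", "title": "Family distribution", "tags": ["Aidae"]},
]


def lineage(taxon):
    """Yield the taxon itself, then every higher taxon up to the root."""
    while taxon is not None:
        yield taxon
        taxon = PARENT.get(taxon)


def build_taxon_pages(content):
    """Return {taxon: {content type: [titles]}}, rolling items up to higher taxa."""
    pages = defaultdict(lambda: defaultdict(list))
    for item in content:
        # Collect each taxon once per item, even if two tags share an ancestor.
        taxa = {t for tag in item["tags"] for t in lineage(tag)}
        for taxon in taxa:
            pages[taxon][item["type"]].append(item["title"])
    return pages


pages = build_taxon_pages(CONTENT)
for content_type, titles in sorted(pages["Aidae"].items()):
    print(content_type, titles)
```

Rolling items up the classification is a design choice: a family page then gathers everything tagged to its genera and species, at the cost of very long pages for large higher taxa.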
Scratchpad v.1 usage (2007 - Mar. 2012)

   Nodes: 430,948
   Sites: 326
   Users: 6,809
   Active users: 5,733 (273 w / 759 m)

   [Chart: users vs. sites. Range: 1-1049; mean: 15; mode: 1. Timeline labels: ViBRANT, SP 2]

 • Prof. scientists
 • Amateur naturalists
 • Citizen scientists
Scratchpad 2 – the new version of Scratchpads
                                     • Launched March 2012
                                     • 120 sites to date
                                     • EOL Fellows
                                     • SP1 migration ongoing

                                     • More professional
                                     • Easier to…
                                         - configure (workflows)
                                         - navigate (facets)
                                         - & populate (MS Excel templates)
                                     •   Greater standardisation
                                     •   Still highly flexible
                                     •   Project profiles (eMonocot)
                                     •   Framework for integration
e.g. http://ihs.myspecies.info/
Getting data in and out of Scratchpads 2
Online community revision
                          • Taxonomy is in perpetual beta
                            - Constantly evolving
                            - Changing contributors
                            - Small granular contributions
                          • Sustainability
                            - A permanent space to work
                            - Guaranteed access (2016)
                            - Easy ways to get the data out
                          • Open science
                            - Beyond Open Access
                            - New ways of working
                            - Data management plans
Freeloader flies: http://milichiidae.info
                            • Need incentives to use
                            - More efficient (functions & reuse)
                            - Attribution & provenance
                            - Credit via citation
                          • New forms of publication
Publishing observations & taxon data
http://scratchpads.eu > http://gbif.org & http://eol.org

   Specimen records & species pages on Scratchpads
     > Darwin Core Archive (DwCA) >
   Pushed to GBIF & EOL (requires site registration with GBIF & EOL)

   Scratchpads: >19k specimen records; >122k species pages
   GBIF: >377M specimen records; EOL: >1M species pages
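The slide names the Darwin Core Archive as the hand-off from Scratchpads to GBIF and EOL. A DwCA is a zip file holding one or more delimited data files plus a meta.xml descriptor that maps each column to a Darwin Core term URI. The sketch below writes a minimal occurrence-only archive in Python; the column list, the clean() helper and the example record are illustrative assumptions, and this is not the Scratchpads exporter.

```python
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"

# Illustrative column choice; occurrenceID doubles as the record identifier.
FIELDS = ["occurrenceID", "scientificName", "country", "eventDate", "basisOfRecord"]


def meta_xml(fields, filename="occurrence.txt"):
    """Describe the core file so consumers can map each column to a Darwin Core term."""
    field_lines = "\n".join(
        f'    <field index="{i}" term="{DWC}{name}"/>' for i, name in enumerate(fields)
    )
    # The escaped \\t and \\n below appear in the XML as the literal text \t and \n.
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="{DWC}Occurrence">
    <files><location>{filename}</location></files>
    <id index="0"/>
{field_lines}
  </core>
</archive>
"""


def clean(value):
    """Collapse whitespace so tabs or line breaks cannot corrupt the unquoted file."""
    return "" if value is None else " ".join(str(value).split())


def write_dwca(records, fields=FIELDS):
    """Return the bytes of a zip holding a tab-separated occurrence.txt and meta.xml."""
    rows = ["\t".join(fields)]
    rows += ["\t".join(clean(rec.get(f)) for f in fields) for rec in records]
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("occurrence.txt", "\n".join(rows) + "\n")
        zf.writestr("meta.xml", meta_xml(fields))
    return buf.getvalue()
```

A real export would normally also include an eml.xml dataset metadata document (referenced from meta.xml) and be checked with a validator before the site is registered with GBIF.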
Experiments with article publishing
http://scratchpads.eu > http://pensoft.net

   Paper assembled from Scratchpad database
     > XML submission, peer review & marked-up publication by Pensoft
       (doi:10.3897/zookeys.50.539), output as XML, HTML and PDF

   5-step workflow for selecting data, adding metadata & previewing
   Published in ZooKeys & PhytoKeys (worldwide coverage)
Example papers via Scratchpads…
  • Blagoderov V, Hippa H, Nel A (2010). ZooKeys 50: 79–90. doi:10.3897/zookeys.50.506
    Live (updated) version: http://sciaroidea.info/node/44428
  • Faulwetter S, Chatzigeorgiou G, Galil BS, Nicolaidou A, Arvanitidis C (2011). ZooKeys 150: 327–345. doi:10.3897/zookeys.150.1877
    Live (updated) version: http://polychaetes.marbigen.org/node/35
  • Brake I, von Tschirnhaus M (2010). ZooKeys 50: 91–96. doi:10.3897/zookeys.50.505
    Live (updated) version: http://milichiidae.info/node/14995
BDJ
The Biodiversity Data Journal

        Making small data big!
Why do we need another new journal?
    Taxonomy needs less fragmentation, not more!

 BUT…
 • We need to encourage taxonomists to mobilize & describe their data
 • This takes considerable effort (e.g. Scratchpads)
 • “Arguably” this is best rewarded through credit
 • This means papers and citations
 • Process must be very easy for authors
 • Process must facilitate data reuse
 • Meet “Open Data” policy commitments

 • The Biodiversity Data Journal is very different…
Biodiversity Data Journal (BDJ)

• All data matters: No lower or upper limit of manuscript size!
• Multiple publishing routes (not just Scratchpads)
• ALL within a single online collaborative platform, including
  the writing of the manuscript!
• New collaborative article authoring tool
• Community peer review with “open” & “public” options
• This is in addition to conventional peer-review
• Online editorial process and version control
• Standards-compliant (Darwin Core, Dublin Core, NLM etc.)
• Pre-defined Code-compliant article templates
BDJ publication & dissemination workflow
   Authors submit:
     • Conventional manuscripts (MS Word, Open Office)
     • GBIF-generated manuscripts from metadata descriptions
     • Scratchpads-generated manuscripts
     • Manuscripts generated from authors’ databases
   via the Pensoft Journal System (PJS) and the Pensoft Writing Tool (PWT)

   Output: marked-up final publication in PDF, HTML and XML formats
Pensoft manuscript writing tool
   The lead author invites co-authors (authoring) and contributors (mentor, linguistic
   editor, copy editor, potential reviewer, colleague/friend; contributing) to a
   template-based manuscript: taxon treatment, interactive key, checklist or data paper.

   • Collaborative online editing
   • Rich text capabilities
   • Various templates for taxon treatments
   • Identification keys builder
   • Species occurrence data import (Darwin Core compliant)
   • Smart citation for figures, tables, references & automated positioning
   • Assembling plates from single figures
   • References import (CrossRef, PubMed Central, etc.)
Testing screenshots of the writing tool
   [Screenshots: manuscript preview • multi-figure plates • plate layout • ID key preview • ID key builder]
Why publish in the BDJ?

• Joining (small) data into a large data pool
• Open access, archiving and re-use of your data
  through data aggregators
• Providing a citation record and credibility for data in
  the form of peer-reviewed publications
• Facilitating online article authoring and the editorial
  process for authors, reviewers and editors
• Truly innovative dissemination of atomized
  content
• Very low cost: free in the launch phase, thereafter at a
  fee that anyone can afford!
What will BDJ publish?

• Single taxon treatments and nomenclatural acts
• Local or regional checklists
• Sampling reports and occasional inventories
• Habitat-based checklists and inventories
• Ecological and biological observations of species
  and communities?
• Single identification keys
• ANY KIND of biodiversity-related database, including
  genomic, ecological and environmental data (data
  papers)
• Biodiversity-related software tools

    Starting late 2012 / early 2013; recruiting editors now
BDJ
     Barcoding, genomic &
environmental sequence papers
        Making small data big!
Mammal taxa added to GenBank annually
   [Chart: “Aus sp.” = “dark taxa”, taxa (specimens) that aren't identified to a known species, vs. proper Linnaean names]
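The charts contrast “Aus sp.”-style names with proper Linnaean names. A rough way to make that split over a list of GenBank-style organism names is a pattern test, sketched below in Python. The regular expression, the PLACEHOLDERS set and the sample names (including the BOLD-style label) are assumptions for illustration, not the method behind the charts; names with authorities, hybrid formulas or informal strain labels would need more careful rules.

```python
import re

# Heuristic pattern: capitalised genus + lower-case epithet, optional lower-case
# infraspecific epithet. Anything else (sp., cf., codes, digits) fails to match.
LINNAEAN = re.compile(r"[A-Z][a-z]+ [a-z][a-z-]+(?: [a-z][a-z-]+)?")

# Placeholder words that can slip through the pattern when written without a dot.
PLACEHOLDERS = {"sp", "spp", "cf", "aff", "nr", "indet"}


def is_dark(name: str) -> bool:
    """Flag a name as a 'dark taxon' unless it looks like a plain Linnaean name."""
    normalised = " ".join(name.split())
    if not LINNAEAN.fullmatch(normalised):
        return True
    return any(word in PLACEHOLDERS for word in normalised.split()[1:])


def dark_fraction(names):
    """Proportion of names flagged as dark; 0.0 for an empty list."""
    names = list(names)
    if not names:
        return 0.0
    return sum(is_dark(n) for n in names) / len(names)


sample = ["Mus musculus", "Mus sp.", "Crocidura sp. BOLD:AAA0001", "Rattus rattus"]
print(dark_fraction(sample))  # 0.5
```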
Proportion of mammal dark taxa in GenBank
   [Chart: “Aus sp.” vs. proper Linnaean names]
Proportion of invertebrate dark taxa in GenBank
   [Chart label: BOLD]
Dark taxa are the norm for bacteria
A lesson in principles for dealing with dark taxa
Roth v. Wikipedia




http://www.newyorker.com/online/blogs/books/2012/09/an-open-letter-to-wikipedia.html
But Wikipedia said “no”


   “I understand your point that the
   author is the greatest authority on
   their own work,” writes the Wikipedia
   Administrator, “but we require
   secondary sources.”
But Wikipedia said “no”

 One of Wikipedia’s core principles, along
 with things like neutrality, is verifiability: a
 reader must be able to look at a statement
 in a Wikipedia article and find out where
 it comes from.




 http://quominus.org/archives/981
Lessons for taxonomy & dark taxa…


       Taxonomic statements should be verifiable

                         Literature is the
                   evidence base for taxonomy

                    Literature should be the
                   evidence base for dark taxa


 http://quominus.org/archives/981
Example templates & dissemination
   Templates: occurrence data • morphometric data • image galleries •
   any other data • “dark” taxon data • genome descriptions •
   environmental sequence data

   BIODIVERSITY MANUSCRIPT > XML MARK UP > structured text (data!)

   Disseminated as: articles • bibliographies • occurrence data •
   taxon treatments • taxon names
   Aggregators shown: BHL, Plazi, Wiki, COL
Example template & data fields
Workflow describing “Dark Taxa”
  1. Dark taxon sequenced, with metadata: voucher specimen, images, locality, etc.
  2. Automated submission to the Pensoft Writing Tool
  3. PWT: collaborative article authoring tool
  4. Manuscript finalisation & submission
  5. BDJ: peer review
  6. Manuscript published
  7. Automated update of bibliographic metadata, taxon name, ZooBank record, etc.
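The dark-taxon workflow above is essentially a linear pipeline with a metadata gate before automated submission. The Python sketch below models it as a small state machine; the Stage names, the REQUIRED field names and the DarkTaxonRecord class are hypothetical, and the post-publication updates (bibliographic metadata, taxon name, ZooBank record) are not modelled.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Stage(Enum):
    SEQUENCED = auto()
    IN_WRITING_TOOL = auto()   # automated submission to the Pensoft Writing Tool
    SUBMITTED = auto()         # manuscript finalisation & submission
    IN_REVIEW = auto()         # BDJ peer review
    PUBLISHED = auto()


# Hypothetical field names for the metadata the slide lists.
REQUIRED = ("voucher_specimen", "images", "locality")


@dataclass
class DarkTaxonRecord:
    sequence_id: str
    metadata: dict = field(default_factory=dict)
    stage: Stage = Stage.SEQUENCED

    def missing_metadata(self):
        """Required fields that are absent or empty."""
        return [key for key in REQUIRED if not self.metadata.get(key)]

    def advance(self):
        """Move to the next stage; refuse automated submission without metadata."""
        if self.stage is Stage.SEQUENCED and self.missing_metadata():
            raise ValueError("missing metadata: " + ", ".join(self.missing_metadata()))
        stages = list(Stage)  # definition order is the workflow order
        position = stages.index(self.stage)
        if position == len(stages) - 1:
            raise ValueError("record is already published")
        self.stage = stages[position + 1]
        return self.stage
```

Gating only the first transition mirrors the slide's emphasis on capturing voucher, image and locality metadata up front, so that the later stages can be automated.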
Data published
   Nomenclature • Literature • Descriptions • Images • Occurrences
   Plazi
“Dark Taxon” papers
  • Should contain…
   -   The scope of the taxonomic, ecological & geographic coverage
   -   The sources of voucher specimens
   -   The sampling & lab. protocols used
   -   The process used to ID taxa to which vouchers belong

  • Possible data fields include…
   -   Average no. of records per taxon
   -   Range of records per taxon (min-max)
   -   Average, min. and max. sequence length
   -   Range of intraspecific variation
   -   Median variation within taxon (X%)
   -   Range of divergence to closest known taxon pairs (min & max?)
   -   Median divergence between closest taxon pairs
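Most of the candidate data fields above are summary statistics over barcode records grouped by taxon. The Python sketch below computes them under stated assumptions: sequences are already aligned to equal length, each record already carries a taxon label, and divergence is the uncorrected p-distance ignoring gaps and Ns (barcoding studies often report Kimura 2-parameter distances instead). The function names and the result keys are hypothetical; multiply distances by 100 for the percentages the slide mentions.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, median

GAPS = set("-N")


def p_distance(a, b):
    """Uncorrected p-distance over aligned sites where neither sequence has a gap or N."""
    sites = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in GAPS and y not in GAPS]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def summarise(records):
    """Summary fields for a non-empty iterable of (taxon, aligned_sequence) pairs."""
    by_taxon = defaultdict(list)
    for taxon, seq in records:
        by_taxon[taxon].append(seq)

    counts = [len(seqs) for seqs in by_taxon.values()]
    lengths = [len(s.replace("-", "")) for seqs in by_taxon.values() for s in seqs]
    # All within-taxon pairwise distances.
    intra = [p_distance(a, b) for seqs in by_taxon.values()
             for a, b in combinations(seqs, 2)]
    # For each taxon, the smallest distance to any sequence of any other taxon.
    nearest = []
    for taxon, seqs in by_taxon.items():
        others = [p_distance(a, b)
                  for other, other_seqs in by_taxon.items() if other != taxon
                  for a in seqs for b in other_seqs]
        if others:
            nearest.append(min(others))

    def span(values):
        return (min(values), max(values)) if values else None

    return {
        "mean_records_per_taxon": mean(counts),
        "records_per_taxon_range": span(counts),
        "sequence_length_mean_min_max": (mean(lengths), min(lengths), max(lengths)),
        "intraspecific_range": span(intra),
        "intraspecific_median": median(intra) if intra else None,
        "nearest_neighbour_range": span(nearest),
        "nearest_neighbour_median": median(nearest) if nearest else None,
    }
```

Taxa represented by a single record contribute to the nearest-neighbour figures but not to the intraspecific ones, which is why the two ranges are computed from separate lists.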
Possible discussion points…

  • The concept…
    - Is it a good approach to incentivize data publishing & good metadata
      practices?
    - The suitability for “Dark Taxa”, new genomes and env. sequence data
    - Is this more suitable for some data papers (e.g. dark taxa) than others?




  • The practicalities…
    - The fit to existing systems (both for data collection and dissemination)
    - The data fields (“Dark Taxa”, new genomes and env. sequence data)
    - Next steps in developing this concept
Acknowledgements
  • Scratchpad technical development
   - Simon Rycroft, Ben Scott, Ed Baker, Alice Heaton, Katherine Boulton
  • Scratchpad outreach
   - Irina Brake, Laurence Livermore, Dimitris Koureas
  • eMonocot
   - Paul Wilkin & the Kew team, Charles Godfray & the Oxford team
  • ViBRANT
   - Dave Roberts, Lucy Reeve & many many more
  • Pensoft
   - Lyubomir Penev, Teodor Georgiev & colleagues


  • Our 7,000+ users
Pensoft Writing Tool (PWT) and Pensoft Journal System (PJS): peer-review options
  • Authors: collaborative online writing, online editing, author’s revised manuscript
  • Editor: online editing, review requests, editorial decision & feedback
  • Peer-review options: public, community, closed
  • Reviewers: nominated reviewers, panel reviewers, public reviewers
  • All reviews assembled into a single online version
  • Publication & dissemination
Why we need new methods of publishing…

   Primary data > Publishing and sharing of primary data > RE-USE of CONTENT

   Drawings: Slavena Peneva
Source: Wikipedia

More Related Content

PPTX
Scratchpads training course introduction
PDF
Scratchpads past,present,future
PDF
Curating and Preserving Collaborative Digital Experiments
PDF
Wf4Ever: Workflow Preservation
PPTX
myExperiment and the Rise of Social Machines
PDF
OAI7 Research Objects
PDF
Workflow Preservation
PPTX
NISO Forum, Denver, Sept. 24, 2012: DataCite and Campus Data Services
Scratchpads training course introduction
Scratchpads past,present,future
Curating and Preserving Collaborative Digital Experiments
Wf4Ever: Workflow Preservation
myExperiment and the Rise of Social Machines
OAI7 Research Objects
Workflow Preservation
NISO Forum, Denver, Sept. 24, 2012: DataCite and Campus Data Services

What's hot (20)

PDF
Collaborative Digital Experiments
PDF
OeRC Seminar
PPTX
Needs for Data Management & Citation Throughout the Information Lifecycle
PPTX
NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
KEY
NISO Forum, Denver, Sept. 24, 2012: Data Equivalence
PDF
Preservation and institutional repositories for the digital arts and humanities
PPTX
NISO Forum, Denver, Sept. 24, 2012: EZID: Easy dataset identification & manag...
PDF
New Metaphors: Data Papers and Data Citations
PDF
Risk management and auditing
PPTX
Data Publishing in Archaeozoology
PPTX
Scott Edmunds: Data Dissemination in the era of "Big-Data"
PDF
2012 02 pre_hbs_grid_overview_ianstokesrees_pt2
PPTX
DuraSpace is OPEN, OR2016
PDF
Escaping Datageddon
PPTX
ESI Supplemental 1 E-research Support Slides
PPT
Saving private data, sharing Open Data? Role of libraries and institutional r...
PDF
ESI Supplemental Webinar 2 - DataONE presentation slides
PPTX
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
PPTX
Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...
PDF
4.2.15 Slides, “Hydra: many heads, many connections. Enriching Fedora Reposit...
Collaborative Digital Experiments
OeRC Seminar
Needs for Data Management & Citation Throughout the Information Lifecycle
NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
NISO Forum, Denver, Sept. 24, 2012: Data Equivalence
Preservation and institutional repositories for the digital arts and humanities
NISO Forum, Denver, Sept. 24, 2012: EZID: Easy dataset identification & manag...
New Metaphors: Data Papers and Data Citations
Risk management and auditing
Data Publishing in Archaeozoology
Scott Edmunds: Data Dissemination in the era of "Big-Data"
2012 02 pre_hbs_grid_overview_ianstokesrees_pt2
DuraSpace is OPEN, OR2016
Escaping Datageddon
ESI Supplemental 1 E-research Support Slides
Saving private data, sharing Open Data? Role of libraries and institutional r...
ESI Supplemental Webinar 2 - DataONE presentation slides
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...
4.2.15 Slides, “Hydra: many heads, many connections. Enriching Fedora Reposit...
Ad

Viewers also liked (8)

PPT
Setting the Scene for ViBRANT – Strategy, Philosophy and Communication
PDF
Community web sites: small pieces loosely joined
PDF
Apresentação Lourdes Casanova | OIS 2011 | Seminário - 23/11
PPT
Google Chronicles: Analytics And Chrome
PDF
Worldwide security requirements
PDF
Luciana Hashiba | OIS 2012 | Painel de estudo de casos Brasileiros: o que apr...
PDF
Maria Cristina | OIS 2012 | Painel de estudo de casos Brasileiros: o que apre...
PDF
Augusto de Franco | OIS 2012 | O desafio das redes de inovação
Setting the Scene for ViBRANT – Strategy, Philosophy and Communication
Community web sites: small pieces loosely joined
Apresentação Lourdes Casanova | OIS 2011 | Seminário - 23/11
Google Chronicles: Analytics And Chrome
Worldwide security requirements
Luciana Hashiba | OIS 2012 | Painel de estudo de casos Brasileiros: o que apr...
Maria Cristina | OIS 2012 | Painel de estudo de casos Brasileiros: o que apre...
Augusto de Franco | OIS 2012 | O desafio das redes de inovação
Ad

Similar to Making your data work for you: Scratchpads, publishing & the biodiversity data journal (20)

PPTX
Making your data work for you: Scratchpads, publishing & the Biodiversity Dat...
PPT
Scratchpad training
PPTX
Scratchpad 2014-introduction
PPT
Small pieces loosely joined: towards a unified theory of biodiversity for the...
PPTX
Scratchpads introductory presentation 45mins
PDF
Introduction to Scratchpads & ViBRANT
PPT
Scratchpads: past, present and future
PPT
Scratchpads: past, present and future
PPTX
Delivering biodiversity knowledge in the information age
PPT
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...
PPTX
Vince smith-delivering biodiversity knowledge in the information age-notext
PPT
Small pieces loosely joined: getting louse research online.
PPT
Scratchpad 2, Virtual Research Environment: Project Update
PPTX
Scratchpads: the Virtual Research Environment for biodiversity data
PDF
Scratchpads: Building web communities supporting biodiversity science
PDF
Scratchpads Training Course
PDF
Sharing, linking and publishing biodiversity data the ViBRANT way
PPT
WP6 Overview: From prototypes to industry standards: Markup, semantic enhance...
PPT
A summary of Scratchpad functionality
PPT
Scratchpad Requirements Exercise
Making your data work for you: Scratchpads, publishing & the Biodiversity Dat...
Scratchpad training
Scratchpad 2014-introduction
Small pieces loosely joined: towards a unified theory of biodiversity for the...
Scratchpads introductory presentation 45mins
Introduction to Scratchpads & ViBRANT
Scratchpads: past, present and future
Scratchpads: past, present and future
Delivering biodiversity knowledge in the information age
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...
Vince smith-delivering biodiversity knowledge in the information age-notext
Small pieces loosely joined: getting louse research online.
Scratchpad 2, Virtual Research Environment: Project Update
Scratchpads: the Virtual Research Environment for biodiversity data
Scratchpads: Building web communities supporting biodiversity science
Scratchpads Training Course
Sharing, linking and publishing biodiversity data the ViBRANT way
WP6 Overview: From prototypes to industry standards: Markup, semantic enhance...
A summary of Scratchpad functionality
Scratchpad Requirements Exercise

More from Vince Smith (20)

PPTX
DiSSCo institutional benefits
PPTX
NHM Data Portal: first steps toward the Graph-of-Life
PPT
Moving beyond the box: automating the digitisation of insect collections
PPT
FP7 Funded RI Project experiences: some overly honest tips from a project coo...
PPTX
Use it or lose it: a hybrid model for sustaining e-infrastructures
PPTX
No specimen left behind: Collections digitisation at the NHM, London*
PPT
SYNTHESYS 3 Overview
PPTX
Consolidated ViBRANT Project Final Review Presentations
PDF
Assisted restructure of web content for paper-based presentation: a look at w...
PDF
Bibliography of Life: Comprehensive services for biodiversity bibliographic r...
PDF
Next generation sequencing requires next generation publishing: the Biodivers...
PPTX
Use it or lose it: crowdsourcing support and outreach activities in a hybrid ...
PPTX
The biodiversity informatics landscape: a systematics perspective
PPTX
Building data infrastructures for science
PPTX
Don't make me think: biodiversity data publishing made easy
PDF
The Biodiversity Informatics Landscape
PPTX
Don’t make me think: biodiversity data publishing made easy
PPTX
Digitised collections: Toward a digital strategy for for the NHM, London
PPTX
Virtual Research Environments supporting biodiversity research: Needs & prior...
PPT
2013 02 data portal science group update -v smith
DiSSCo institutional benefits
NHM Data Portal: first steps toward the Graph-of-Life
Moving beyond the box: automating the digitisation of insect collections
FP7 Funded RI Project experiences: some overly honest tips from a project coo...
Use it or lose it: a hybrid model for sustaining e-infrastructures
No specimen left behind: Collections digitisation at the NHM, London*
SYNTHESYS 3 Overview
Consolidated ViBRANT Project Final Review Presentations
Assisted restructure of web content for paper-based presentation: a look at w...
Bibliography of Life: Comprehensive services for biodiversity bibliographic r...
Next generation sequencing requires next generation publishing: the Biodivers...
Use it or lose it: crowdsourcing support and outreach activities in a hybrid ...
The biodiversity informatics landscape: a systematics perspective
Building data infrastructures for science
Don't make me think: biodiversity data publishing made easy
The Biodiversity Informatics Landscape
Don’t make me think: biodiversity data publishing made easy
Digitised collections: Toward a digital strategy for for the NHM, London
Virtual Research Environments supporting biodiversity research: Needs & prior...
2013 02 data portal science group update -v smith

Recently uploaded (20)

PDF
DASA ADMISSION 2024_FirstRound_FirstRank_LastRank.pdf
PDF
WOOl fibre morphology and structure.pdf for textiles
PDF
A novel scalable deep ensemble learning framework for big data classification...
PDF
Zenith AI: Advanced Artificial Intelligence
PPTX
Making your data work for you: Scratchpads, publishing & the biodiversity data journal

  • 1. Making your data work for you: Scratchpads, publishing & the Biodiversity Data Journal. EBI, UK, 25 September 2012. Vince Smith (1), Dave Roberts (1) & Lyubomir Penev (2). (1) Natural History Museum, London; (2) Pensoft Publishers, Sofia, Bulgaria. vince@vsmith.info
  • 2. Our informatics grand challenge… “Link together evolutionary data… by developing analytical tools and proper documentation and then use this framework to conduct comparative analyses, studies of evolutionary process and biodiversity analyses” (Cyndy Parr, Rob Guralnick, Nico Cellinese and Rod Page. TREE. doi:10.1016/j.tree.2011.11.001)
  • 3. Our informatics grand challenge (continued)… This requires data, information & knowledge to be: • Digital (not printed paper) • Openly accessible (not behind barriers) • Linked-up (not in silos)
  • 4. Most of our output is not digital, open or linked • 15–20k new spp. described annually (2M total) [1] • 30k nomenclatural acts (12M total) [1] • 20k phylogenies (750k total) [2] • 31k taxa sequenced (360k taxa total) [3] • 800k BioMed papers (40M total pp. of taxonomy) [4] • Countless specimens, images, maps, keys… Typically generated by small communities for “local” research projects. Figures from: [1] Zhang, Zootaxa 2011 4, 1–4; [2] Web of Science; [3] GenBank; [4] PubMed.
  • 5. Scratchpad Virtual Research Environments Making taxonomy digital, open & linked
  • 6. What is a Scratchpad? A website for you & your community: (1) your data, (2) uploaded & tagged, (3) “published” & reviewed on your site. Fast, intuitive, fit for use.
  • 7. Scratchpads • EDIT (2007–11), ViBRANT / eMonocot (2011–13) • Hosted websites for taxonomists • Taxonomic, regional or societal • Research & publication platform • Supports the taxonomic workflow • Modular (Drupal) & flexible • Two full-time developers • Ecosystem of communities (~450) http://scratchpads.eu
  • 8. Categories of Scratchpads: Taxa (classifications, taxon profiles, specimens, literature, images, maps, phenotypic, genotypic & morphometric datasets, keys, phylogenies); Conservation; Projects; Regions; Societies
  • 9. Summary of what Scratchpads can do • Taxon pages, generated from tagged content (plant/animal) • Bibliography management • Character matrices • Specimen records • Distribution maps (from specimens and regional) • Images, video and sound (bulk import) • Excel spreadsheet import (dynamically generated templates) • Darwin Core Archive export • Tabular data editing • Custom content • User management • Custom webforms • EOL data import (taxonomy, species information) • GBIF map integration
  • 10. Scratchpad v.1 usage (2007 to Mar. 2012) • Nodes: 430,948 • Sites: 326 • Users: 6,809 • Active users: 5,733 (273 w / 759 m) • Users/sites: range 1–1,049, mean 15, mode 1 • Users: prof. scientists, amateur naturalists, citizen scientists
  • 11. Scratchpad 2 – the new version of Scratchpads • Launched March 2012 • 120 sites to date • EOL Fellows • SP1 migration ongoing • More professional • Easier to… - configure (workflows) - navigate (facets) - & populate (MS Excel templates) • Greater standardisation • Still highly flexible • Project profiles (eMonocot) • Framework for integration, e.g. http://ihs.myspecies.info/
  • 12. Getting data in and out of Scratchpads 2
  • 13. Online community revision • Taxonomy is in perpetual beta - Constantly evolving - Changing contributors - Small granular contributions • Sustainability - A permanent space to work - Guaranteed access (2016) - Easy ways to get the data out • Open science - Beyond Open Access - New ways of working - Data management plans • Need incentives to use - More efficient (functions & reuse) - Attribution & provenance - Credit via citation • New forms of publication (Image: Freeloader flies, http://milichiidae.info)
  • 14. Publishing observations & taxon data: http://scratchpads.eu > http://gbif.org & http://eol.org • Specimen records & species pages on Scratchpads are pushed to GBIF & EOL as a Darwin Core Archive (DwCA) (requires site registration with GBIF & EOL) • Scratchpads: >19k specimen records, >122k species pages • GBIF: >377M specimen records • EOL: >1M species pages
  • 15. Experiments with article publishing: http://scratchpads.eu > http://pensoft.net • Paper assembled from the Scratchpad database; XML submission, peer review & marked-up publication by Pensoft (doi:10.3897/zookeys.50.539), output as XML, HTML & PDF • 5-step workflow for selecting data, adding metadata & previewing • Published in ZooKeys & PhytoKeys (worldwide coverage)
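To make the DwCA step concrete: a Darwin Core Archive is a zip holding a delimited data file plus a `meta.xml` descriptor that maps each column to a Darwin Core term. The sketch below builds a minimal occurrence-core archive in memory. It is an illustration under stated assumptions, not the Scratchpads exporter: the column choice (`FIELDS`), the file name `occurrence.txt`, the helper names and the sample record values are all hypothetical, and the `eml.xml` dataset metadata that GBIF normally expects is omitted for brevity.

```python
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"
# Hypothetical column choice; any Darwin Core terms could be listed here.
FIELDS = ["occurrenceID", "basisOfRecord", "scientificName",
          "eventDate", "decimalLatitude", "decimalLongitude"]


def build_meta_xml(fields):
    """Describe a tab-delimited Occurrence core file for DwC-A readers."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">',
        # "\\t" and "\\n" write the literal escapes \t and \n into the XML
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"'
        f' fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="{DWC}Occurrence">',
        "    <files><location>occurrence.txt</location></files>",
        '    <id index="0"/>',  # column 0 (occurrenceID) is the record id
    ]
    lines += [f'    <field index="{i}" term="{DWC}{name}"/>'
              for i, name in enumerate(fields)]
    lines += ["  </core>", "</archive>"]
    return "\n".join(lines)


def clean(value):
    """No field enclosure is declared, so tabs and line breaks become spaces."""
    return str(value).replace("\t", " ").replace("\r", " ").replace("\n", " ")


def build_dwca(records):
    """Return the bytes of a zip holding occurrence.txt and meta.xml."""
    rows = ["\t".join(FIELDS)]  # header row, skipped via ignoreHeaderLines="1"
    for rec in records:
        # Missing terms are written as empty cells so columns stay aligned.
        rows.append("\t".join(clean(rec.get(f, "")) for f in FIELDS))
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("occurrence.txt", "\n".join(rows) + "\n")
        zf.writestr("meta.xml", build_meta_xml(FIELDS))
    return buf.getvalue()
```

For example, `build_dwca([{"occurrenceID": "hyp-001", "scientificName": "Aus bus"}])` returns archive bytes that could be written to disk. Declaring `fieldsEnclosedBy=""` keeps the data file simple, which is why values are flattened rather than quoted.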
  • 16. Example papers via Scratchpads… • Blagoderov V, Hippa H, Nel A (2010). ZooKeys 50: 79–90. doi:10.3897/zookeys.50.506 (http://sciaroidea.info/node/44428) • Faulwetter S, Chatzigeorgiou G, Galil BS, Nicolaidou A, Arvanitidis C (2011). ZooKeys 150: 327–345. doi:10.3897/zookeys.150.1877 (http://polychaetes.marbigen.org/node/35) • Brake I, von Tschirnhaus M (2010). ZooKeys 50: 91–96. doi:10.3897/zookeys.50.505 (http://milichiidae.info/node/14995) • The Scratchpad links in brackets are live (updated) versions of these papers
  • 17. BDJ The Biodiversity Data Journal Making small data big!
  • 18. Why do we need another new journal!!! Taxonomy needs less fragmentation, not more! BUT… • We need to encourage taxonomists to mobilize & describe their data • This takes considerable effort (e.g. Scratchpads) • “Arguably” this is best rewarded through credit • This means papers and citations • Process must be very easy for authors • Process must facilitate data reuse • Meet “Open Data” policy commitments • The Biodiversity Data Journal is very different…
  • 19. Biodiversity Data Journal (BDJ) • All data matters: no lower or upper limit of manuscript size! • Multiple publishing routes (not just Scratchpads) • ALL within a single online collaborative platform, including the writing of the manuscript! • New collaborative article authoring tool • Community peer review with “open” & “public” options • This is in addition to conventional peer review • Online editorial process and version control • Standards-compliant (Darwin Core, Dublin Core, NLM etc.) • Pre-defined Code-compliant article templates
  • 20. BDJ publication & dissemination workflow: GBIF-generated manuscripts from metadata descriptions; Scratchpads-generated manuscripts; manuscripts generated from authors’ databases; conventional manuscripts from authors (MS Word, Open Office) > Pensoft Writing Tool (PWT) & Pensoft Journal System (PJS) > marked-up final publication in PDF, HTML and XML formats
  • 21. Pensoft manuscript writing tool • Collaborative online editing (mentor, linguistic editor, copy editor, potential reviewer, colleague/friend) • Rich text capabilities • Various templates for taxon treatments • Identification keys builder • Species occurrence data import (Darwin Core compliant) • Smart citation for figures, tables, references & automated positioning • Assembling plates from single figures • References import (CrossRef, PubMed Central, etc.) • Diagram: the lead author creates a template-based manuscript (taxon treatment, interactive key, checklist, data paper) and invites co-authors (authoring) and contributors (contributing)
  • 22. Testing screenshots of the writing tool: manuscript preview; multi-figure plates; plate layout; ID key builder; ID key preview
  • 23. Why publish in the BDJ? • Joining (small) data into a large data pool • Open access, archiving and re-use of your data through data aggregators • Providing a citation record and credibility for data in the form of peer-reviewed publications • Facilitating online article authoring and the editorial process for authors, reviewers and editors • Truly innovative dissemination of atomized content • Very low cost: free in the launch phase, thereafter at a fee that anyone can afford!
  • 24. What will BDJ publish? • Single taxon treatments and nomenclatural acts • Local or regional checklists • Sampling reports and occasional inventories • Habitat-based checklists and inventories • Ecological and biological observations of species and communities? • Single identification keys • ANY KIND of biodiversity-related database, including genomic, ecological and environmental data (data papers) • Biodiversity-related software tools Starting late 2012, early 2013 Recruiting editors now
  • 25. BDJ Barcoding, genomic & environmental sequence papers Making small data big!
  • 26. Mammal taxa added to GenBank annually: “Aus sp.” = “dark taxa”, i.e. taxa (specimens) that aren't identified to a known species, vs. proper Linnaean names
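One way to picture how a “dark taxon” share like the one charted here could be estimated from sequence labels is a simple name classifier. The sketch below is a hypothetical heuristic, not the method behind these slides: it treats a label as dark if it carries a placeholder such as `sp.`, `cf.` or `aff.`, is genus-only, or otherwise fails a loose “Genus epithet [infraspecific epithet]” pattern. The function names, the placeholder set and the example labels (including the BOLD-style `BOLD:AAA0001`) are illustrative assumptions.

```python
import re

# Hypothetical heuristic: capitalised genus + lower-case epithet,
# letters and hyphens only, optionally one infraspecific epithet.
LINNAEAN = re.compile(r"[A-Z][a-z]+ [a-z][a-z-]+(?: [a-z][a-z-]+)?")
# Tokens that mark a specimen as not identified to a known species.
PLACEHOLDERS = {"sp", "sp.", "spp", "spp.", "cf", "cf.", "aff", "aff."}


def is_dark_taxon(name: str) -> bool:
    """True if a sequence label does not look like a proper Linnaean name."""
    tokens = name.split()
    if len(tokens) < 2:
        return True  # genus-only (or empty) labels are not species-level IDs
    if any(t.lower() in PLACEHOLDERS for t in tokens[1:]):
        return True  # e.g. "Aus sp.", "Aus cf. bus", "Aus sp. BOLD:AAA0001"
    return LINNAEAN.fullmatch(" ".join(tokens)) is None


def dark_fraction(names) -> float:
    """Proportion of labels classed as dark taxa (0.0 for an empty list)."""
    names = list(names)
    if not names:
        return 0.0
    return sum(is_dark_taxon(n) for n in names) / len(names)
```

A real analysis would check names against a nomenclator rather than rely on string shape; the regex only separates “Aus sp.”-style labels from well-formed binomials.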
  • 27. Proportion of mammal dark taxa in GenBank (“Aus sp.” vs. proper Linnaean names)
  • 28. Proportion of invert. dark taxa in GenBank (BOLD)
  • 29. Dark taxa are the norm for bacteria
  • 30. A lesson in principles for dealing with dark taxa: Roth v. Wikipedia http://www.newyorker.com/online/blogs/books/2012/09/an-open-letter-to-wikipedia.html
  • 31. But Wikipedia said “no”: “I understand your point that the author is the greatest authority on their own work,” writes the Wikipedia Administrator, “but we require secondary sources.”
  • 32. But Wikipedia said “no”: One of Wikipedia’s core principles, along with things like neutrality, is verifiability: a reader must be able to look at a statement in a Wikipedia article and find out where it comes from. http://quominus.org/archives/981
  • 33. Lessons for taxonomy & dark taxa… • Taxonomic statements should be verifiable • Literature is the evidence base for taxonomy • Literature should be the evidence base for dark taxa http://quominus.org/archives/981
  • 34. Example templates & dissemination: biodiversity manuscript templates (occurrence data; “dark” taxon data; genome descriptions; environmental sequence data; morphometric data; image galleries; any other data) > XML mark-up > articles and structured text (data!): taxon treatments, taxon names, bibliographies, occurrence data > COL, Plazi, Wiki, BHL
  • 35. Example template & data fields
  • 36. Workflow describing “dark taxa”: dark taxon sequenced > automated submission to the Pensoft Writing Tool (metadata: voucher specimen, images, locality, etc.) > PWT collaborative article authoring tool > manuscript finalisation & submission > BDJ peer review > manuscript published > automated update of bibliographic metadata, taxon name, ZooBank record, etc.
  • 37. Data published Nomenclature Literature Descriptions Plazi Images Occurrences
  • 38. “Dark taxon” papers • Should contain… - The scope of the taxonomic, ecological & geographic coverage - The sources of voucher specimens - The sampling & lab. protocols used - The process used to ID the taxa to which vouchers belong • Possible data fields include… - Average no. of records per taxon - Range of records per taxon (min–max) - Average, min. and max. sequence length - Range of intraspecific variation - Median variation within taxon (X%) - Range of divergence to closest known taxon pairs (min & max?) - Median divergence between closest taxon pairs
  • 39. Possible discussion points… • The concept… - Is it a good approach to incentivize data publishing & good metadata practices? - Its suitability for “dark taxa”, new genomes and env. sequence data - Is this more suitable for some data papers (e.g. dark taxa) than others? • The practicalities… - The fit to existing systems (both for data collection and dissemination) - The data fields (“dark taxa”, new genomes and env. sequence data) - Next steps in developing this concept
  • 40. Acknowledgements • Scratchpad technical development - Simon Rycroft, Ben Scott, Ed Baker, Alice Heaton, Katherine Boulton • Scratchpad outreach - Irina Brake, Laurence Livermore, Dimitris Koureas • eMonocot - Paul Wilkin & the Kew team, Charles Godfray & the Oxford team • ViBRANT - Dave Roberts, Lucy Reeve & many, many more • Pensoft - Lyubomir Penev, Teodor Georgiev & colleagues • Our 7,000+ users
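Several of the candidate data fields above are simple summary statistics over aligned barcode sequences. The sketch below shows one possible way to compute them; it is an assumption-laden illustration, not a BDJ template. It uses the uncorrected p-distance (gaps and `N` sites skipped) where a real paper might prefer a model-corrected distance such as K2P, and it reads “divergence to closest known taxon” as each taxon's minimum distance to any sequence of another taxon. The input layout (`{taxon label: [aligned sequences]}`, each taxon with at least one sequence) and all function and key names are hypothetical.

```python
from itertools import combinations
from statistics import mean, median


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Sites with a gap ('-') or ambiguous base ('N') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def summarise(taxa: dict) -> dict:
    """Candidate 'dark taxon' paper fields from {taxon label: [aligned sequences]}."""
    counts = [len(seqs) for seqs in taxa.values()]
    lengths = [len(s.replace("-", "")) for seqs in taxa.values() for s in seqs]
    # All pairwise distances within each taxon (intraspecific variation).
    intra = [p_distance(a, b)
             for seqs in taxa.values() for a, b in combinations(seqs, 2)]
    # For each taxon, the distance to its closest sequence in any other taxon.
    nearest = []
    for label, seqs in taxa.items():
        others = [s for other, o_seqs in taxa.items() if other != label
                  for s in o_seqs]
        if others:
            nearest.append(min(p_distance(a, b) for a in seqs for b in others))
    return {
        "mean_records_per_taxon": mean(counts),
        "records_per_taxon_range": (min(counts), max(counts)),
        "sequence_length_mean": mean(lengths),
        "sequence_length_range": (min(lengths), max(lengths)),
        "intraspecific_range": (min(intra), max(intra)) if intra else None,
        "median_intraspecific": median(intra) if intra else None,
        "nearest_taxon_range": (min(nearest), max(nearest)) if nearest else None,
        "median_nearest_taxon": median(nearest) if nearest else None,
    }
```

For instance, two hypothetical taxa `{"Aus sp. 1": ["ACGTACGTAC", "ACGTACGTAA"], "Aus sp. 2": ["TCGTACGTAC"]}` give a mean of 1.5 records per taxon and an intraspecific distance of 0.1. The all-pairs loops are quadratic, which is fine for a single paper's dataset but not for a whole reference library.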
  • 43. Pensoft peer-review options: closed review, community review, public review. Pensoft Writing Tool (PWT): collaborative online writing by authors > Pensoft Journal System (PJS): review requests from the editor to nominated reviewers and panel reviewers, plus public reviewers > all reviews assembled into a single online version > online editing > editorial decision & feedback > author’s revised manuscript > online editing > publication & dissemination
  • 44. Why we need new methods of publishing… Re-use of content; publishing and sharing of primary data (drawings: Slavena Peneva)