NeXML: A future data exchange standard for phylogenetics
Rutger Vos, University of British Columbia

Outline: Introduction (The problem, EvoInfo interests, This subproject, Nexus issues, Parsing, Extensibility, XML goodies); Design (Principles, Re-use, Patterns, Inheritance, References); Implementation (Approach, ERD, Inheritance, Anatomy, Characters, Trees); Current status (Schema blocks, Parsers & writers, Experiments, To do); Resources

Introduction (1/7): The problem
Increased automation in evolutionary informatics is hampered by poorly defined "standards".

Introduction (2/7): EvoInfo interests
Addressing interoperability problems by coding our way out of it:
- Syntax: NeXML
- Semantics: CDAO
- Transport: PhyloWS

Introduction (3/7): This subproject's mission
To create a file format like NEXUS*, but:
- Fix (some) problems with NEXUS
- Give access to data at a higher level
- Be extensible
- Expose data to XML goodies

* Maddison, Swofford and Maddison, 1997. NEXUS: An Extensible File Format for Systematic Information. Syst. Biol. 46(4):590-621.

Introduction (4/7): NEXUS issues
(see https://www.nescent.org/wg_evoinfo/NEXUS_Problems)
- No explicit versions
- Nothing ever deprecated
- No public extensions
- Leads to hacks such as 'mixed' data and 'hot comments'
- Phylogenetics post-'80s lives in private blocks
- Hard or impossible to validate

Introduction (5/7): Parsing plain text versus parsing XML
- Processing NEXUS data involves lexing + parsing + processing
- XML allows choosing a parser library; data can be processed as a structure that hides tokenization issues
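To illustrate the point about hidden tokenization, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The fragment is hypothetical: namespaces are omitted and the `otus`/`otu`/`label` names simply follow the examples on these slides. Quoting, whitespace and the `&amp;` entity are all handled by the parser library, so the code only walks a tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical NeXML-like fragment (namespaces omitted for brevity).
DOC = """<nexml>
  <otus id="taxa1">
    <otu id="t1" label="Homo sapiens"/>
    <otu id="t2" label="Pan troglodytes (&amp; allies)"/>
  </otus>
</nexml>"""

def otu_labels(xml_text):
    """Return {otu id: label} without any hand-written lexing.

    The parser deals with quoting, whitespace and entity escapes;
    we only iterate over the resulting element tree.
    """
    root = ET.fromstring(xml_text)
    return {otu.get("id"): otu.get("label") for otu in root.iter("otu")}

print(otu_labels(DOC))
```

The equivalent NEXUS reader would have to tokenize TAXLABELS itself, including quoted labels and comments; here that work belongs to the library.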

Introduction (6/7): Extensibility
An extensible file format should provide the ability to:
- Define new data types that implement described 'interfaces'
- Attach typed data structures to core types
- Attach custom XML

Introduction (7/7): XML goodies
A large stack of off-the-shelf tools:
- XML parser libraries
- Web service toolkits
- Native XML databases
- Editors / IDEs
- Serialization / data binding tools

Design (1/5): Design principles
- Re-use of prior art
- Follow design patterns
- Referencing
- Verbose and compact representations

Design (2/5): Re-use of prior art
- Generic key/value attachments following Apple's plist semantics:
    <dict> <key>prior</key> <float>0.78</float> </dict>
- Trees and networks following GraphML
- General file structure following NEXUS concepts, i.e. blocks that reference each other
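Because the key/value attachments borrow plist semantics, a processor can map them straight onto native dictionaries. The sketch below is a hypothetical converter, not part of any NeXML library. It handles only a few value elements; note that Apple's own plist format spells floating-point values `<real>`, whereas the slide shows `<float>`, so both are accepted here.

```python
import xml.etree.ElementTree as ET

# Value elements handled by this sketch (Apple plists use <real> for floats;
# the NeXML example on the slide uses <float>).
_SCALARS = {"string": str, "integer": int, "float": float, "real": float}

def parse_value(elem):
    """Convert one plist-style value element to a Python object."""
    if elem.tag == "dict":
        return parse_dict(elem)
    if elem.tag in ("true", "false"):
        return elem.tag == "true"
    if elem.tag in _SCALARS:
        return _SCALARS[elem.tag](elem.text or "")
    raise ValueError(f"unsupported value element <{elem.tag}>")

def parse_dict(elem):
    """Convert a <dict> of alternating <key>/value children to a Python dict."""
    children = list(elem)
    if len(children) % 2:
        raise ValueError("<dict> must contain key/value pairs")
    result = {}
    for key_el, val_el in zip(children[0::2], children[1::2]):
        if key_el.tag != "key":
            raise ValueError(f"expected <key>, found <{key_el.tag}>")
        result[key_el.text] = parse_value(val_el)
    return result

annotation = parse_dict(ET.fromstring("<dict><key>prior</key><float>0.78</float></dict>"))
print(annotation)  # prints {'prior': 0.78}
```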

Design (3/5): XML design patterns
- "Declare before use"
- "Metadata first"
- "Venetian blinds"
- Abstract inheritance through extension, concrete inheritance through restriction
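"Declare before use" means a processor that reads the file once, top to bottom, can check every reference as soon as it meets it. A minimal streaming sketch with `ElementTree.iterparse` follows. The `REFERENCE_ATTRS` table is a hypothetical stand-in covering only the two references shown on these slides (`node/@otu` and `characters/@otus`); a real processor would derive it from the schema.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical table: element name -> {reference attribute: referenced element name}.
REFERENCE_ATTRS = {"node": {"otu": "otu"}, "characters": {"otus": "otus"}}

def check_declare_before_use(xml_text):
    """Return (element, attribute, id) for each reference to a not-yet-declared id."""
    declared = {}  # element name -> ids seen so far in document order
    errors = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, elem in ET.iterparse(source, events=("start",)):
        # Attributes are available at the start event, before any children.
        for attr, target in REFERENCE_ATTRS.get(elem.tag, {}).items():
            ref = elem.get(attr)
            if ref is not None and ref not in declared.get(target, set()):
                errors.append((elem.tag, attr, ref))
        if elem.get("id") is not None:
            declared.setdefault(elem.tag, set()).add(elem.get("id"))
    return errors
```

The single forward pass is the point of the pattern: no second pass or in-memory tree is needed to detect a dangling or forward reference.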

Design (4/5): Inheritance
- Base (optional base/lang/href attributes)
- Annotated (optional dict elements) extends Base
- Labelled (optional label attribute) extends Annotated
- IDTagged (required id attribute) extends Labelled
- AbstractElement (in root schema) extends IDTagged
- ConcreteElement (in instance document) restricts AbstractElement
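The same chain can be mirrored in a language with classes, which is roughly what data binding tools generate from such a schema. The sketch below uses Python dataclasses; `AbstractSeqRow` and `DnaSeqRow` are hypothetical stand-ins for an AbstractElement/ConcreteElement pair. Python subclasses cannot truly restrict a type the way XML Schema does, so restriction is approximated here by narrowing validation.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Base:
    """Optional base/lang/href attributes."""
    base: Optional[str] = None
    lang: Optional[str] = None
    href: Optional[str] = None

@dataclass
class Annotated(Base):
    """Adds optional key/value annotations (the <dict> elements)."""
    dicts: list = field(default_factory=list)

@dataclass
class Labelled(Annotated):
    label: Optional[str] = None

@dataclass
class IDTagged(Labelled):
    # Required in the schema; dataclass fields after defaulted ones need a
    # default, so the requirement is enforced in __post_init__ instead.
    id: Optional[str] = None

    def __post_init__(self):
        if not self.id:
            raise ValueError(f"{type(self).__name__} requires an id")

@dataclass
class AbstractSeqRow(IDTagged):
    """Hypothetical abstract element (extension): adds a sequence string."""
    seq: str = ""

@dataclass
class DnaSeqRow(AbstractSeqRow):
    """Hypothetical concrete element (restriction): same fields, narrower values."""
    ALLOWED = frozenset("ACGTRYSWKMBDHVN-?")  # class attribute, not a field

    def __post_init__(self):
        super().__post_init__()
        bad = set(self.seq.upper()) - self.ALLOWED
        if bad:
            raise ValueError(f"illegal DNA symbols: {sorted(bad)}")

row = DnaSeqRow(id="row1", label="Homo sapiens", seq="ACGT-N")
```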

Design (5/5): Referencing
- Elements sometimes refer to other elements, much like in NEXUS
- In NeXML, an element refers to the id of another element through an attribute named after the referenced element:
    <otu id="t1"/>
    <!-- referenced later: -->
    <node id="n1" otu="t1"/>
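Because the attribute name is the name of the referenced element, a single generic resolver can follow any reference: index every element by (element name, id), then look up (attribute name, attribute value). A minimal sketch, with a hypothetical un-namespaced document:

```python
import xml.etree.ElementTree as ET

def index_by_tag_and_id(root):
    """Map (element name, id) -> element for every element carrying an id."""
    return {(el.tag, el.get("id")): el for el in root.iter() if el.get("id") is not None}

def resolve(elem, attr, index):
    """Follow a reference attribute; its name doubles as the target element name."""
    ref = elem.get(attr)
    if ref is None:
        return None
    try:
        return index[(attr, ref)]
    except KeyError:
        raise KeyError(f"<{elem.tag} {attr}={ref!r}>: no <{attr} id={ref!r}>") from None

doc = ET.fromstring(
    '<nexml><otus id="taxa1"><otu id="t1" label="Homo sapiens"/></otus>'
    '<trees id="trees1" otus="taxa1"><tree id="tree1"><node id="n1" otu="t1"/></tree></trees></nexml>'
)
idx = index_by_tag_and_id(doc)
print(resolve(doc.find(".//node"), "otu", idx).get("label"))  # prints Homo sapiens
```

No per-attribute lookup table is needed, which is the practical payoff of the naming convention.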

Implementation (1/6): Approach
- Schema design
- Community feedback through wiki, email, telecons and projects (EvoInfo, pPOD, MIAPA), etc.
- Processors (Perl, Java, Python, C++, VB, JavaScript) developed in parallel
- Experiments with XML tools (web services, databases, data binding tools)

Implementation (2/6): Entity relationships
[Figure: entity relationship diagram]

Implementation (3/6): Inheritance tree for elements
[Figure: inheritance tree for elements]

Implementation (4/6): Anatomy of a "block"
    <characters
        id="c1"
        xsi:type="nex:DnaSeqs"
        otus="t1">
      <dict>
        <key>desc</key>
        <string>description … </string>
      </dict>
      Contents…
    </characters>
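A processor can read a block's header without touching its contents: the id, the concrete type from `xsi:type`, the OTU reference, and (because metadata comes first) the annotation dict as the first child. The sketch below is hypothetical; the `BLOCK` text and the `block_header` helper are illustrations only. ElementTree treats the `xsi:type` value as plain text, so the `nex:` prefix is simply split off; a schema validator would additionally need `xmlns:nex` declared.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

BLOCK = """<characters xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    id="c1" xsi:type="nex:DnaSeqs" otus="t1">
  <dict><key>desc</key><string>an example alignment</string></dict>
  <!-- block contents would follow here -->
</characters>"""

def block_header(elem):
    """Summarise a block: identity, concrete type, OTU link and leading metadata."""
    qname = elem.get(f"{{{XSI}}}type", "")
    desc = None
    first = elem[0] if len(elem) else None
    if first is not None and first.tag == "dict":  # "metadata first"
        pairs = list(first)
        for key_el, val_el in zip(pairs[0::2], pairs[1::2]):
            if key_el.text == "desc":
                desc = val_el.text
    return {
        "id": elem.get("id"),
        "type": qname.rpartition(":")[2],  # local name, e.g. 'DnaSeqs'
        "otus": elem.get("otus"),
        "desc": desc,
    }

print(block_header(ET.fromstring(BLOCK)))
```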

Implementation (5/6): Character classes
    Data type     Cells (verbose)     Sequence (compact)
    DNA           DnaCells            DnaSeqs
    RNA           RnaCells            RnaSeqs
    Protein       ProteinCells        ProteinSeqs
    Standard      StandardCells       StandardSeqs
    Continuous    ContinuousCells     ContinuousSeqs
    Restriction   RestrictionCells    RestrictionSeqs
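The Cells and Seqs variants carry the same data at different granularity: one element per matrix cell versus one string per row. A minimal sketch of converting between the two, assuming one symbol per character as for DNA (continuous data would need a tokenised form instead). The character ids `ch1`…`ch4` are hypothetical.

```python
def seq_to_cells(seq, char_ids):
    """Expand a compact sequence string into verbose (character id, state) cells."""
    if len(seq) != len(char_ids):
        raise ValueError("sequence length does not match number of characters")
    return list(zip(char_ids, seq))

def cells_to_seq(cells, char_ids):
    """Collapse verbose cells (in any order) back into a compact string."""
    by_char = dict(cells)
    return "".join(by_char[c] for c in char_ids)

ids = ["ch1", "ch2", "ch3", "ch4"]
cells = seq_to_cells("AC-T", ids)
print(cells)  # prints [('ch1', 'A'), ('ch2', 'C'), ('ch3', '-'), ('ch4', 'T')]
```

The verbose form is larger but gives each cell its own addressable element; the compact form suits large alignments.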

Implementation (6/6): Tree classes
    Graph type    Int           Float
    Tree          IntTree       FloatTree
    Network       IntNetwork    FloatNetwork
(Int/Float refers to the type of the edge lengths.)
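Following GraphML, a tree is a flat list of nodes plus a flat list of edges, each edge naming its source and target. The sketch below computes root-to-node path lengths from such lists. The attribute names `root`, `length`, `source` and `target` are assumptions for illustration, and `length_type` stands in for the Int/Float split (pass `int` for an IntTree). It handles trees only: a network node with two parents would keep whichever path was visited last.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TREE = """<tree id="tree1">
  <node id="n1" root="true"/>
  <node id="n2" otu="t1"/>
  <node id="n3"/>
  <node id="n4" otu="t2"/>
  <node id="n5" otu="t3"/>
  <edge id="e1" source="n1" target="n2" length="0.5"/>
  <edge id="e2" source="n1" target="n3" length="0.25"/>
  <edge id="e3" source="n3" target="n4" length="1.0"/>
  <edge id="e4" source="n3" target="n5" length="0.75"/>
</tree>"""

def root_to_tip(tree_elem, length_type=float):
    """Return {node id: summed edge length from the root} for a rooted tree."""
    children = defaultdict(list)
    for edge in tree_elem.iter("edge"):
        length = length_type(edge.get("length", 0))
        children[edge.get("source")].append((edge.get("target"), length))
    roots = [n.get("id") for n in tree_elem.iter("node") if n.get("root") == "true"]
    if len(roots) != 1:
        raise ValueError("expected exactly one root node")
    dist = {roots[0]: length_type(0)}
    stack = [roots[0]]
    while stack:  # depth-first walk from the root along the edge list
        parent = stack.pop()
        for child, length in children[parent]:
            dist[child] = dist[parent] + length
            stack.append(child)
    return dist

print(root_to_tip(ET.fromstring(TREE)))
```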

Current status (1/4): Schema blocks
Done:
- OTUs
- Characters: DNA, RNA, nucleotide, protein, categorical, continuous, restriction (compact and verbose)
- Trees: GraphML trees and networks, various edge formats and rootings

Current status (2/4): Parsers and writers
NeXML parsers and writers:
- Mesquite (Java NeXML class libraries)
- Bio::Phylo (BioPerl compatible)
- pyNexml (Python)
- DAMBE (Visual Basic)
- NCL (C++)
- JavaScript

Current status (3/4): Experiments
- Semantic annotation (CDAO) using SAWSDL
- Scalability:
  - Indexed files in dbxml
  - Created large files from tolweb, rbcL
  - XInclude with tinyseq XML
- REST web services:
  - ToL service
  - Validation service
  - nexml2json, nexus2xml
  - Schema inclusion in WSDL

Current status (4/4): To do
- Publish standard
- More restricted-vocabulary attachments (e.g. Darwin Core, CDAO-mediated terms)
- Substitution model descriptions
- Sets (in progress, using class identifiers)
- Distances
- Splits

Resources
- NeXML base URL: http://www.nexml.org
  - Wiki: /wiki
  - Mailing list: /mail
  - Issue tracker: /tracker
  - SVN repository: /code
- EvoInfo: http://evoinfo.nescent.org
- CDAO: http://www.evolutionaryontology.org

Acknowledgements
- Contributions: Jason Caravas, Mark Holder, Peter Midford, Jeet Sukumaran, Xuhua Xia
- Feedback: wg-evoinfo, pPOD, Wayne Maddison, David Maddison
- Additional funding and support: NESCent, GSoC

More Related Content

PPT
NeXML - phylogenetic data as XML
PPTX
Placement oriented data structures
PDF
Brief Review of Common Modeling Formalisms and Representation Approaches
PDF
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
DOC
Gellish A Standard Data And Knowledge Representation Language And Ontology
PDF
RuleML 2015: Ontology Reasoning using Rules in an eHealth Context
PDF
OOPS!: on-line ontology diagnosis by Maria Poveda
NeXML - phylogenetic data as XML
Placement oriented data structures
Brief Review of Common Modeling Formalisms and Representation Approaches
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Gellish A Standard Data And Knowledge Representation Language And Ontology
RuleML 2015: Ontology Reasoning using Rules in an eHealth Context
OOPS!: on-line ontology diagnosis by Maria Poveda

What's hot (20)

PDF
Programming the Semantic Web
PDF
A summary of various COMBINE standardization activities
PDF
A little more semantics goes a lot further!  Getting more out of Linked Data ...
PDF
Recent developments in the world of SBML (the Systems Biology Markup Language)
PPTX
Ontology-based Data Integration
PPTX
247th ACS Meeting: Experiment Markup Language (ExptML)
PPT
download
PPTX
Overview of XSL, XPath and XSL-FO
PDF
Ontologies Ontop Databases
PDF
VDOS2013-Zhe-Slides
PDF
Ontology-based data access: why it is so cool!
ODP
An RDF Metadata Model for OpenDocument Format 1.2
PPT
Understanding and Configuring the FO Plug-in for Generating PDF Files: Part I...
PDF
SBML (the Systems Biology Markup Language)
PDF
Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...
PPTX
Ontology
PDF
Recent Developments in SBML
PPT
CustomizingStyleSheetsForHTMLOutputs
PDF
Executable specifications for xtext
PDF
Linked List Problems
Programming the Semantic Web
A summary of various COMBINE standardization activities
A little more semantics goes a lot further!  Getting more out of Linked Data ...
Recent developments in the world of SBML (the Systems Biology Markup Language)
Ontology-based Data Integration
247th ACS Meeting: Experiment Markup Language (ExptML)
download
Overview of XSL, XPath and XSL-FO
Ontologies Ontop Databases
VDOS2013-Zhe-Slides
Ontology-based data access: why it is so cool!
An RDF Metadata Model for OpenDocument Format 1.2
Understanding and Configuring the FO Plug-in for Generating PDF Files: Part I...
SBML (the Systems Biology Markup Language)
Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewri...
Ontology
Recent Developments in SBML
CustomizingStyleSheetsForHTMLOutputs
Executable specifications for xtext
Linked List Problems
Ad

Viewers also liked (7)

PPT
TreeBASE CIPRES
PPT
Introduction to Computer
PPTX
Phyloinformatics and the Semantic Web
PPT
Computer กับกระบวนการทางธุรกิจ
PPT
Retail Saa S 2011 1
PPTX
Synthesising disparate data resources to obtain composite estimates of geophy...
PPTX
Biomechatronics
TreeBASE CIPRES
Introduction to Computer
Phyloinformatics and the Semantic Web
Computer กับกระบวนการทางธุรกิจ
Retail Saa S 2011 1
Synthesising disparate data resources to obtain composite estimates of geophy...
Biomechatronics
Ad

Similar to NeXML (20)

ODP
Cdao Evolution08
PPT
Sedna XML Database: Query Parser & Optimizing Rewriter
PDF
prefix based labelling scheme for xml data
PPT
Ontology-based Cooperation of Information Systems
PPT
Building Semantic Web Portals with WebML
PPTX
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
PDF
NetIKX Semantic Search Presentation
PPTX
Building nTier Applications with Entity Framework Services (Part 1)
PDF
Part2- The Atomic Information Resource
PDF
MT_LinqXml _Introduce Linq to XML and Application.pdf
PPT
PhD Presentation
PPT
The return of the hierarchical model
PPTX
eXtensible Markup Language (XML)
PPTX
Building nTier Applications with Entity Framework Services (Part 1)
PPT
osm.cs.byu.edu
PDF
On Parameterised Types and Java Generics
PDF
On Java Generics, History, Use, Caveats v1.1
PPTX
Optimizing Application Architecture (.NET/Java topics)
PPT
Facilitating Busines Interoperability from the Semantic Web
PPT
Introduction to odbms
Cdao Evolution08
Sedna XML Database: Query Parser & Optimizing Rewriter
prefix based labelling scheme for xml data
Ontology-based Cooperation of Information Systems
Building Semantic Web Portals with WebML
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
NetIKX Semantic Search Presentation
Building nTier Applications with Entity Framework Services (Part 1)
Part2- The Atomic Information Resource
MT_LinqXml _Introduce Linq to XML and Application.pdf
PhD Presentation
The return of the hierarchical model
eXtensible Markup Language (XML)
Building nTier Applications with Entity Framework Services (Part 1)
osm.cs.byu.edu
On Parameterised Types and Java Generics
On Java Generics, History, Use, Caveats v1.1
Optimizing Application Architecture (.NET/Java topics)
Facilitating Busines Interoperability from the Semantic Web
Introduction to odbms

More from Rutger Vos (20)

PDF
Anna Karenina on hooves - what makes an animal fit for domestication?
PDF
10 Misverstanden Over Evolutie
PDF
Crash Course Biodiversiteit
PDF
Natural history research as a replicable data science
PDF
Species delimitation - species limits and character evolution
PDF
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
PDF
Robot eye for the butterfly
PDF
Taxonomic classification of digitized specimens using machine learning
PDF
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
PPTX
Assembling the Tree of Life from public DNA sequence data
PDF
Hoe leer je een robot soorten te herkennen?
PDF
Modeling the biosphere: the natural historian's perspective
PDF
Kunnen we een tomaat van 400 jaar oud proeven
PPTX
PhyloTastic: names-based phyloinformatic data integration
PPTX
SUPERSMART pipeline intro
PPTX
Reconstructing paleoenvironments using metagenomics
PDF
The Galaxy bioinformatics workflow environment
PDF
Retrieving useful information from connected specimen- and data collections
PPTX
Vos at NCB Naturalis
PPTX
Tree of Life
Anna Karenina on hooves - what makes an animal fit for domestication?
10 Misverstanden Over Evolutie
Crash Course Biodiversiteit
Natural history research as a replicable data science
Species delimitation - species limits and character evolution
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Robot eye for the butterfly
Taxonomic classification of digitized specimens using machine learning
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Assembling the Tree of Life from public DNA sequence data
Hoe leer je een robot soorten te herkennen?
Modeling the biosphere: the natural historian's perspective
Kunnen we een tomaat van 400 jaar oud proeven
PhyloTastic: names-based phyloinformatic data integration
SUPERSMART pipeline intro
Reconstructing paleoenvironments using metagenomics
The Galaxy bioinformatics workflow environment
Retrieving useful information from connected specimen- and data collections
Vos at NCB Naturalis
Tree of Life

Recently uploaded (20)

PPT
What is a Computer? Input Devices /output devices
PPTX
GROUP4NURSINGINFORMATICSREPORT-2 PRESENTATION
PDF
OpenACC and Open Hackathons Monthly Highlights July 2025
PPTX
Custom Battery Pack Design Considerations for Performance and Safety
PPTX
AI IN MARKETING- PRESENTED BY ANWAR KABIR 1st June 2025.pptx
PDF
NewMind AI Weekly Chronicles – August ’25 Week III
PDF
Produktkatalog für HOBO Datenlogger, Wetterstationen, Sensoren, Software und ...
PPTX
Microsoft Excel 365/2024 Beginner's training
PDF
UiPath Agentic Automation session 1: RPA to Agents
PPT
Module 1.ppt Iot fundamentals and Architecture
PPTX
Benefits of Physical activity for teenagers.pptx
PDF
The influence of sentiment analysis in enhancing early warning system model f...
PDF
How ambidextrous entrepreneurial leaders react to the artificial intelligence...
PDF
Taming the Chaos: How to Turn Unstructured Data into Decisions
PDF
Improvisation in detection of pomegranate leaf disease using transfer learni...
PDF
Architecture types and enterprise applications.pdf
PDF
sbt 2.0: go big (Scala Days 2025 edition)
PDF
Hybrid horned lizard optimization algorithm-aquila optimizer for DC motor
PDF
Getting started with AI Agents and Multi-Agent Systems
PDF
Convolutional neural network based encoder-decoder for efficient real-time ob...
What is a Computer? Input Devices /output devices
GROUP4NURSINGINFORMATICSREPORT-2 PRESENTATION
OpenACC and Open Hackathons Monthly Highlights July 2025
Custom Battery Pack Design Considerations for Performance and Safety
AI IN MARKETING- PRESENTED BY ANWAR KABIR 1st June 2025.pptx
NewMind AI Weekly Chronicles – August ’25 Week III
Produktkatalog für HOBO Datenlogger, Wetterstationen, Sensoren, Software und ...
Microsoft Excel 365/2024 Beginner's training
UiPath Agentic Automation session 1: RPA to Agents
Module 1.ppt Iot fundamentals and Architecture
Benefits of Physical activity for teenagers.pptx
The influence of sentiment analysis in enhancing early warning system model f...
How ambidextrous entrepreneurial leaders react to the artificial intelligence...
Taming the Chaos: How to Turn Unstructured Data into Decisions
Improvisation in detection of pomegranate leaf disease using transfer learni...
Architecture types and enterprise applications.pdf
sbt 2.0: go big (Scala Days 2025 edition)
Hybrid horned lizard optimization algorithm-aquila optimizer for DC motor
Getting started with AI Agents and Multi-Agent Systems
Convolutional neural network based encoder-decoder for efficient real-time ob...

NeXML

  • 1. NeXML A future data exchange standard for phylogenetics Rutger Vos University of British Columbia
  • 2. Increased automation in evolutionary informatics is hampered by poorly defined “standards” Introduction (1/7) The problem Introduction      The problem      EvoInfo interests      This subproject      Nexus issues      Parsing      Extensibility      XML goodies Design      Principles      Re-use      Patterns      Inheritance      References Implementation      Approach      ERD      Inheritance      Anatomy      Characters      Trees Current status      Schema blocks      Parsers & writers      Experiments      To do Resources
  • 3. Addressing interoperability problems by coding our way out of it Introduction (2/7) EvoInfo interests Syntax: NeXML Semantics: CDAO Transport: PhyloWS Introduction      The problem      EvoInfo interests      This subproject      Nexus issues      Parsing      Extensibility      XML goodies Design      Principles      Re-use      Patterns      Inheritance      References Implementation      Approach      ERD      Inheritance      Anatomy      Characters      Trees Current status      Schema blocks      Parsers & writers      Experiments      To do Resources
  • 4. Introduction (3/7) This subproject’s mission To create a file format like nexus* * Maddison, Swofford and Maddison , 1997. NEXUS: An Extensible File Format for Systematic Information. Syst. Biol. 46 (4):590-621 Introduction      The problem      EvoInfo interests      This subproject      Nexus issues      Parsing      Extensibility      XML goodies Design      Principles      Re-use      Patterns      Inheritance      References Implementation      Approach      ERD      Inheritance      Anatomy      Characters      Trees Current status      Schema blocks      Parsers & writers      Experiments      To do Resources Fix (some) problems with nexus Give access to data at higher level Be extensible Expose data to xml goodies , but:
  • 5. Introduction (4/7) Nexus issues Introduction      The problem      EvoInfo interests      This subproject      Nexus issues      Parsing      Extensibility      XML goodies Design      Principles      Re-use      Patterns      Inheritance      References Implementation      Approach      ERD      Inheritance      Anatomy      Characters      Trees Current status      Schema blocks      Parsers & writers      Experiments      To do Resources https://www.nescent.org/wg_evoinfo/NEXUS_Problems No explicit versions Nothing ever deprecated No public extensions Leads to hacks such as ‘mixed’ data, ‘hot comments’ Phylogenetics post-’80s in private blocks Hard/impossible to validate
  • 6. Introduction (5/7) Parsing plain text versus parsing XML Processing nexus data involves lexing + parsing + processing XML allows choosing a parser library , data can be processed as a structure that hides tokenization issues Introduction      The problem      EvoInfo interests      This subproject      Nexus issues      Parsing      Extensibility      XML goodies Design      Principles      Re-use      Patterns      Inheritance      References Implementation      Approach      ERD      Inheritance      Anatomy      Characters      Trees Current status      Schema blocks      Parsers & writers      Experiments      To do Resources
  • 7. Introduction (6/7) Extensibility Extensible file format should provide the ability to: Introduction      The problem      EvoInfo interests      This subproject      Nexus issues      Parsing      Extensibility      XML goodies Design      Principles      Re-use      Patterns      Inheritance      References Implementation      Approach      ERD      Inheritance      Anatomy      Characters      Trees Current status      Schema blocks      Parsers & writers      Experiments      To do Resources Define new data types that implement described ‘interfaces’ Attach typed data structures to core types Attach custom XML
  • 8. Introduction (7/7) XML goodies Large stack of off-the-shelf tools: Introduction      The problem      EvoInfo interests      This subproject      Nexus issues      Parsing      Extensibility      XML goodies Design      Principles      Re-use      Patterns      Inheritance      References Implementation      Approach      ERD      Inheritance      Anatomy      Characters      Trees Current status      Schema blocks      Parsers & writers      Experiments      To do Resources XML parser libraries Web service toolkits Native XML databases Editors / IDEs Serialization / data binding tools
  • 9. Design (1/5) Design principles Re-use of prior art Follow design patterns Referencing Verbose and compact representations Introduction      The problem      EvoInfo interests      This subproject      Nexus issues      Parsing      Extensibility      XML goodies Design      Principles      Re-use      Patterns      Inheritance      References Implementation      Approach      ERD      Inheritance      Anatomy      Characters      Trees Current status      Schema blocks      Parsers & writers      Experiments      To do Resources
  • 10. Design (2/5) Re-use of prior art Generic key/value attachments following apple’s plist semantics: <dict> <key>prior</key> <float>0.78</float> </dict> Trees and networks following graphml General file structure following nexus concepts, i.e. blocks that reference each other Introduction      The problem      EvoInfo interests      This subproject      Nexus issues      Parsing      Extensibility      XML goodies Design      Principles      Re-use      Patterns      Inheritance      References Implementation      Approach      ERD      Inheritance      Anatomy      Characters      Trees Current status      Schema blocks      Parsers & writers      Experiments      To do Resources
  • 11. Design (3/5) XML design patterns “ Declare before use ” Introduction      The problem      EvoInfo interests      This subproject      Nexus issues      Parsing      Extensibility      XML goodies Design      Principles      Re-use      Patterns      Inheritance      References Implementation      Approach      ERD      Inheritance      Anatomy      Characters      Trees Current status      Schema blocks      Parsers & writers      Experiments      To do Resources “ Metadata first ” “ Venetian blinds ” Abstract inheritance through extension, concrete inheritance through restriction
  • 12. Design (4/5) Inheritance IDTagged (required id attribute) Labelled (optional label attribute) Annotated (optional dict elements) Base (optional base/lang/href attributes) AbstractElement (in root schema) ConcreteElement (in instance document) extends extends extends extends restricts Introduction      The problem      EvoInfo interests      This subproject      Nexus issues      Parsing      Extensibility      XML goodies Design      Principles      Re-use      Patterns      Inheritance      References Implementation      Approach      ERD      Inheritance      Anatomy      Characters      Trees Current status      Schema blocks      Parsers & writers      Experiments      To do Resources
  • 13. Design (5/5) Referencing Elements sometimes refer to other elements, much like in nexus In nexml, elements refer to the id of other elements by the name of the referenced element:   <otu id=&quot;t1&quot;/>   <!-- referenced later: -->   <node id=&quot;n1&quot; otu=&quot;t1&quot;/> Introduction      The problem      EvoInfo interests      This subproject      Nexus issues      Parsing      Extensibility      XML goodies Design      Principles      Re-use      Patterns      Inheritance      References Implementation      Approach      ERD      Inheritance      Anatomy      Characters      Trees Current status      Schema blocks      Parsers & writers      Experiments      To do Resources
  • 14. Implementation (1/6) Approach: schema design with community feedback through wiki, email, telecons and projects (EvoInfo, pPOD, MIAPA), etc.; processors (Perl, Java, Python, C++, VB, JavaScript) developed in parallel; experiments with XML tools (web services, databases, data binding tools)
  • 15. Implementation (2/6) Entity relationships
  • 16. Implementation (3/6) Inheritance tree for elements
  • 17. Implementation (4/6) Anatomy of a “block”:
    <characters id="c1" xsi:type="nex:DnaSeqs" otus="t1">
      <dict>
        <key>desc</key>
        <string>description …</string>
      </dict>
      Contents…
    </characters>
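A reader of such a block needs three things from the opening tag (its id, its concrete type from `xsi:type`, and the OTUs block it refers to) plus the key/value pairs of the leading dict. A minimal Python sketch using the standard library's ElementTree follows. The element names are left unqualified for brevity, which is an assumption, and the helper names and description text are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xsi:type under the XML Schema instance namespace URI.
XSI = "{http://www.w3.org/2001/XMLSchema-instance}"

BLOCK = """<characters xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    id="c1" xsi:type="nex:DnaSeqs" otus="t1">
  <dict>
    <key>desc</key>
    <string>an example description</string>
  </dict>
</characters>"""

def read_dict(dict_elem):
    """Pair each <key> with the typed value element that follows it."""
    children = list(dict_elem)
    return {k.text: v.text for k, v in zip(children[0::2], children[1::2])}

def read_block(xml_text):
    block = ET.fromstring(xml_text)
    header = {
        "id": block.get("id"),
        "type": block.get(XSI + "type"),  # kept as the raw QName string
        "otus": block.get("otus"),
    }
    d = block.find("dict")  # metadata comes first, if present
    meta = read_dict(d) if d is not None else {}
    return header, meta

header, meta = read_block(BLOCK)
print(header)  # {'id': 'c1', 'type': 'nex:DnaSeqs', 'otus': 't1'}
print(meta)    # {'desc': 'an example description'}
```

The `is not None` test matters because an ElementTree element with no children is falsy.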
  • 18. Implementation (5/6) Character classes: each data type has a Cells class and a Seqs (sequence) class:
    Restriction: RestrictionCells, RestrictionSeqs
    Continuous: ContinuousCells, ContinuousSeqs
    Standard: StandardCells, StandardSeqs
    Protein: ProteinCells, ProteinSeqs
    RNA: RnaCells, RnaSeqs
    DNA: DnaCells, DnaSeqs
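A reader typically needs to know which of these classes a characters block uses, for instance the `nex:DnaSeqs` of slide 17. A Python sketch follows that checks a type name against the twelve class names listed on the slide and splits it into data type and representation. The function name is hypothetical, and it simply drops the namespace prefix instead of resolving it.

```python
DATA_TYPES = ("Dna", "Rna", "Protein", "Standard", "Continuous", "Restriction")
FORMS = ("Cells", "Seqs")

# The twelve concrete character-block class names from the slide.
CHARACTER_CLASSES = {t + f for t in DATA_TYPES for f in FORMS}

def split_xsi_type(xsi_type):
    """Split e.g. 'nex:DnaSeqs' into ('Dna', 'Seqs'); the prefix is dropped."""
    local = xsi_type.rsplit(":", 1)[-1]
    if local not in CHARACTER_CLASSES:
        raise ValueError(f"unknown character block type: {xsi_type!r}")
    form = "Cells" if local.endswith("Cells") else "Seqs"
    return local[: -len(form)], form

print(split_xsi_type("nex:DnaSeqs"))        # ('Dna', 'Seqs')
print(split_xsi_type("nex:StandardCells"))  # ('Standard', 'Cells')
```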
  • 19. Implementation (6/6) Tree classes: trees and networks each come in an Int and a Float variant (the type of the edge lengths):
    Tree: IntTree, FloatTree
    Network: IntNetwork, FloatNetwork
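The tree/network split turns on whether any node has more than one parent, and the Int/Float split on the edge lengths. The Python sketch below picks one of the four class names for a list of directed (parent, child, length) edges. It assumes a rooted graph given as a non-empty edge list. As a simplification, it calls anything that is not a single rooted tree a network, so a forest would also come out as a network. The function name is hypothetical.

```python
from collections import defaultdict, deque

def classify(edges):
    """Pick IntTree/FloatTree/IntNetwork/FloatNetwork for (parent, child, length) edges."""
    nodes, indegree, children = set(), defaultdict(int), defaultdict(list)
    for parent, child, _ in edges:
        nodes.update((parent, child))
        indegree[child] += 1
        children[parent].append(child)
    roots = [n for n in nodes if indegree[n] == 0]
    is_tree = False
    # Rooted tree: one root, no node with two parents, every node reachable.
    if len(roots) == 1 and all(indegree[n] <= 1 for n in nodes):
        seen, queue = {roots[0]}, deque(roots)
        while queue:
            for nxt in children[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        is_tree = seen == nodes
    # Integer edge lengths -> Int*, anything else -> Float*.
    all_int = all(isinstance(length, int) and not isinstance(length, bool)
                  for _, _, length in edges)
    return ("Int" if all_int else "Float") + ("Tree" if is_tree else "Network")

print(classify([("r", "a", 1), ("r", "b", 2)]))  # IntTree
print(classify([("r", "a", 1), ("r", "b", 1), ("a", "h", 1), ("b", "h", 1)]))  # IntNetwork
```

The reachability check also rules out cycles. With a single root and no node having two parents, any cycle would have to be unreachable from the root.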
  • 20. Current status (1/4) Schema blocks. Done: OTUs; characters: DNA, RNA, nucleotide, protein, categorical, continuous, restriction (compact and verbose); trees: GraphML trees and networks, various edge formats and rootings
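The compact and verbose variants hold the same matrix row in two shapes. Roughly, a compact row is one string of states, while a verbose row lists each cell paired with the character it belongs to. The Python sketch below shows the round trip. It assumes single-character states (as with DNA) and uses plain tuples rather than the actual NeXML elements. The function names and character ids are hypothetical.

```python
def to_verbose(seq, char_ids):
    """Expand a compact row ('ACGT') into verbose (char_id, state) cells."""
    if len(seq) != len(char_ids):
        raise ValueError("row length must equal the number of characters")
    return list(zip(char_ids, seq))

def to_compact(cells):
    """Collapse verbose (char_id, state) cells back into a compact row."""
    return "".join(state for _, state in cells)

chars = ["c1", "c2", "c3", "c4"]
cells = to_verbose("ACGT", chars)
print(cells)              # [('c1', 'A'), ('c2', 'C'), ('c3', 'G'), ('c4', 'T')]
print(to_compact(cells))  # ACGT
```

Multi-character states, such as continuous values, would need a separator in the compact form, so this one-character-per-state mapping does not carry over to them.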
  • 21. Current status (2/4) Parsers and writers: NeXML parsers and writers exist in Mesquite (Java NeXML class libraries), Bio::Phylo (BioPerl compatible), pyNexml (Python), DAMBE (Visual Basic), NCL (C++) and JavaScript
  • 22. Current status (3/4) Experiments: semantic annotation (CDAO) using SAWSDL; scalability: indexed files in dbxml, large files created from tolweb and rbcL, XInclude with TinySeq XML; REST web services: ToL service, validation service, nexml2json, nexus2xml; schema inclusion in WSDL
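The nexml2json service is only named here. As a rough illustration of what such a conversion involves, the Python sketch below maps each element to a JSON object holding its tag, attributes, children and text. This mapping convention is an assumption, not the service's actual output format. The sketch also ignores tail text and namespaces, and the function name is hypothetical.

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Hypothetical mapping: {'tag', 'attributes', optional 'children', optional 'text'}."""
    node = {"tag": elem.tag, "attributes": dict(elem.attrib)}
    children = [element_to_dict(child) for child in elem]
    if children:
        node["children"] = children
    text = (elem.text or "").strip()
    if text:
        node["text"] = text
    return node

doc = ET.fromstring('<otus id="ts"><otu id="t1" label="Homo sapiens"/></otus>')
print(json.dumps(element_to_dict(doc), indent=2))
```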
  • 23. Current status (4/4) To do: publish the standard; more restricted vocabulary attachments (e.g. Darwin Core, CDAO-mediated terms); substitution model descriptions; sets (in progress, using class identifiers); distances; splits
  • 24. Resources: NeXML base URL: http://www.nexml.org (wiki: /wiki; mailing list: /mail; issue tracker: /tracker; SVN repository: /code); EvoInfo: http://evoinfo.nescent.org; CDAO: http://www.evolutionaryontology.org
  • 25. Acknowledgements Contributions: Jason Caravas, Mark Holder, Peter Midford, Jeet Sukumaran, Xuhua Xia Feedback: wg-evoinfo, pPOD, Wayne Maddison, David Maddison Additional funding, support: NESCent, GSoC