João André Carriço,
Microbiology Institute and Instituto de Medicina Molecular,
Faculty of Medicine, University of Lisbon
jcarrico@fm.ul.pt twitter: @jacarrico
Session SY024 Controversies in interpreting whole genome sequence data
26th ECCMID, Amsterdam, Netherlands
7-12 April 2016
Virulence Factors:
• Class of gene products
• Help pathogens to invade the host and evade specific host defense mechanisms
• Enhance the pathogen’s potential to cause disease
Virulence Factors (examples):
• Bacterial toxins (endotoxins and exotoxins)
• Adherence factors (pili)
• Cell surface carbohydrates and proteins that protect a bacterium (streptococcal M protein)
• Hydrolytic enzymes that may contribute to the pathogenicity of the bacterium (hyaluronidase)
• Factors that compete with the host for nutrient uptake (siderophores)
Sources: VFDB / Medical Microbiology, 4th edition (http://www.ncbi.nlm.nih.gov/books/NBK7627/)
[Figure: diagram labelled Virulome, Core genome, Accessory genome and Mobilome]
• VFDB (http://www.mgc.ac.cn/VFs/main.htm)
• Pathosystems Resource Integration Center (PATRIC) VF (https://www.patricbrc.org/)
• Victors (http://www.phidias.us/victors/)
• PHI-Base (http://www.phi-base.org/)
• MvirDB (http://mvirdb.llnl.gov/)
Criteria for choice:
• Databases focused mainly on virulence factors (as defined in the first slide)
• Excludes antibiotic resistance databases (CARD, ARDB, ARGO, RAC, …)
VFDB
* Created to facilitate the screening of HTS data
Database last update: Tue Feb 23 22:05:25 2016
Pathosystems Resource Integration Center (PATRIC)
• 6 NIAID priority genera:
  • Mycobacterium
  • Salmonella
  • Escherichia
  • Shigella
  • Listeria
  • Bartonella
• 1572 VFs
• 1071 articles
• Use of controlled vocabulary
• Integrates VFDB and Victors VF information
• PATRIC supports:
  • Genome annotation
  • Comparative genomics
  • Transcriptomics
  • Pathways
  • Host-pathogen interaction
  • Disease-related information
• Database last update: March 2016
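Victors
• 5177 virulence factors
• 126 pathogens (class: number of species, number of VFs):
  • Gram +: 15 species, 1160 VFs
  • Gram –: 36 species, 3488 VFs
  • Virus: 54 species, 179 VFs
  • Parasites: 13 species, 105 VFs
  • Fungi: 8 species, 245 VFs
• Last DB update: 27/8/2014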
PHI-Base
• Pathogenicity, virulence and effector genes from:
  • Fungal pathogens
  • Oomycete pathogens
  • Bacterial pathogens
• Hosts:
  • Animal
  • Plant
  • Fungal
  • Insect
MvirDB
• Biodefense focused
• Last update: 2007?
• Data still available for download
• All the databases have:
  • manually curated data
  • links to the original publications
• However, manual curation is a major caveat because the process is hard to sustain
• Querying annotation on the website
• Selecting species of interest and browsing the website
• BLAST query for DNA or protein
• Download the gene/protein databases and use them as templates for searching your own data
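The download-and-search approach above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the downloaded VF set is taken to be a FASTA file, and exact substring matching on both strands stands in for a real similarity search such as BLAST. The function names (`parse_fasta`, `find_vf_hits`) and the sequence identifiers are hypothetical and do not come from any of the databases.

```python
from typing import Dict, Iterable, List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record id: upper-case sequence}."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""  # id = first word of header
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def find_vf_hits(
    vf_genes: Dict[str, str], contigs: Dict[str, str]
) -> List[Tuple[str, str, str, int]]:
    """Report (vf id, contig id, strand, 0-based start) for exact matches.

    Exact matching is a deliberate simplification: real data would need
    a tolerant search (e.g. BLAST) to find divergent alleles.
    """
    hits = []
    for vf_id, gene in vf_genes.items():
        for contig_id, contig in contigs.items():
            for strand, query in (("+", gene), ("-", reverse_complement(gene))):
                pos = contig.find(query)
                if pos != -1:
                    hits.append((vf_id, contig_id, strand, pos))
    return hits


# Toy input: two hypothetical VF genes and a two-contig "assembly".
vf_genes = parse_fasta(
    [">vf_toxinA hypothetical toxin", "ATGAAACCC", "GGGTTT",
     ">vf_adhesinB", "ATGCCCGGGAAA"]
)
contigs = {
    "contig1": "TT" + vf_genes["vf_toxinA"] + "AA",
    "contig2": "GG" + reverse_complement(vf_genes["vf_adhesinB"]) + "CC",
}
hits = find_vf_hits(vf_genes, contigs)
```

With a real download, `parse_fasta` would be given an open file handle instead of a list of strings, and the contigs would come from the user's own assembly.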
MVLST/MLST-v
• With HTS, several core genome / whole genome MLST schemas are becoming available or being developed:
  • Neisseria sp.
  • Campylobacter sp.
  • Staphylococcus aureus
  • Legionella pneumophila
  • Listeria monocytogenes
  • Enterococcus faecium
  • Mycobacterium tuberculosis
  • Acinetobacter baumannii
  • Salmonella enterica
  • E. coli
  • …
• Loci in these schemas can be annotated / linked to the Virulence Factor DBs for automatic allele annotation through these systems:
  • Seqsphere+
  • http://pubmlst.org/
  • http://bigsdb.web.pasteur.fr/
  • https://enterobase.warwick.ac.uk/
  • Bionumerics 7.5
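One way to picture the linking of schema loci to virulence factor entries is the sketch below. It is an assumption-laden illustration, not how Seqsphere+, BIGSdb, EnteroBase or Bionumerics do it: a locus is linked to a VF entry when one of its alleles is identical to a VF gene sequence, which real pipelines would replace with a similarity search. `VFEntry`, `build_vf_index`, `annotate_schema` and all identifiers are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class VFEntry:
    """Minimal, hypothetical view of one virulence factor record."""
    vf_id: str
    name: str
    source_db: str


def seq_key(seq: str) -> str:
    """Case-insensitive digest of a sequence, used as a lookup key."""
    return hashlib.sha1(seq.upper().encode("ascii")).hexdigest()


def build_vf_index(records: List[Tuple[VFEntry, str]]) -> Dict[str, VFEntry]:
    """Index VF entries by the digest of their gene sequence."""
    return {seq_key(seq): entry for entry, seq in records}


def annotate_schema(
    schema: Dict[str, List[str]], vf_index: Dict[str, VFEntry]
) -> Dict[str, Optional[VFEntry]]:
    """Map each locus to the first VF entry matched by any of its alleles.

    Loci without an identical VF sequence are mapped to None.
    """
    annotation: Dict[str, Optional[VFEntry]] = {}
    for locus, alleles in schema.items():
        annotation[locus] = next(
            (vf_index[seq_key(a)] for a in alleles if seq_key(a) in vf_index),
            None,
        )
    return annotation


# Toy schema (locus -> allele sequences) and a one-record VF set.
schema = {
    "locus_0001": ["ATGAAATAA", "ATGAAGTAA"],
    "locus_0002": ["ATGCCCTAA"],
}
toxin = VFEntry("VF_X1", "hypothetical toxin", "ExampleDB")
annotation = annotate_schema(schema, build_vf_index([(toxin, "atgaagtaa")]))
```

Once a locus carries a VF annotation, every allele called at that locus in a new isolate inherits it, which is what makes the annotation automatic.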
 So far we have seen what is available
How can we design actionable virulome databases?
Actionable: able to be done or acted on; having practical value (New Oxford American Dictionary)
• Available databases still lack interfaces for programmatic access:
  • RESTful APIs would allow:
    ▪ easy automatic querying from scripts, without the need for web interfaces or downloads
    ▪ database updates by authorized groups (distributed curation effort)
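To make the idea of script-based querying concrete, here is a minimal sketch of what a client for such an API could look like. Everything about the service is an assumption: the base URL (on the reserved `example.org` domain), the `/virulence-factors` path, the query parameters and the JSON response layout are invented for illustration, since the databases discussed here did not offer such an interface.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint: no real virulence factor database is implied.
BASE_URL = "https://virulome.example.org/api/v1"


def build_query_url(organism: str, gene: str = "", base_url: str = BASE_URL) -> str:
    """Build the GET URL for a virulence factor query."""
    params = {"organism": organism}
    if gene:
        params["gene"] = gene
    return f"{base_url}/virulence-factors?{urlencode(params)}"


def parse_response(payload: str) -> List[Dict[str, Any]]:
    """Extract the fields of interest from the assumed JSON layout."""
    data = json.loads(payload)
    return [
        {"id": rec["id"], "gene": rec["gene"], "organism": rec["organism"]}
        for rec in data["results"]
    ]


def fetch_virulence_factors(organism: str, gene: str = "") -> List[Dict[str, Any]]:
    """Query the (hypothetical) service and return parsed records."""
    with urlopen(build_query_url(organism, gene), timeout=30) as response:
        return parse_response(response.read().decode("utf-8"))


# Offline demonstration with a canned response in the assumed layout.
url = build_query_url("Escherichia coli", "stx2")
sample = '{"results": [{"id": "VF0001", "gene": "stx2", "organism": "Escherichia coli", "extra": 1}]}'
records = parse_response(sample)
```

The second point on the slide, updates by authorized groups, would map onto authenticated `POST` or `PUT` requests against the same resources; that part is left out of the sketch.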
APIs: Application Programming Interfaces
• Existing DBs reuse each other's datasets without true database interoperability: there is a need for common ontologies (controlled vocabularies already exist but are not used by all)
• Ontologies and computer-readable data formats (JSON-LD or RDF) can allow true database interoperability, letting bioinformaticians extract the targeted information with a single query that reaches multiple databases
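The role of a shared ontology can be illustrated with a small sketch. It assumes two databases that export JSON-LD-style documents with different local field names, each with an `@context` mapping those names to terms of a common ontology. The ontology IRI, field names and records are hypothetical, and the `expand` function only handles flat term mapping; it is not a full JSON-LD processor.

```python
from typing import Any, Dict, List

# Hypothetical shared ontology namespace.
ONT = "http://example.org/vf-ontology#"

# Two databases describing the same kind of record with different keys.
DB_A = {
    "@context": {"gene_symbol": ONT + "geneName", "species": ONT + "organism"},
    "@graph": [
        {"gene_symbol": "stx2", "species": "Escherichia coli"},
        {"gene_symbol": "invA", "species": "Salmonella enterica"},
    ],
}
DB_B = {
    "@context": {"locus": ONT + "geneName", "taxon": ONT + "organism"},
    "@graph": [
        {"locus": "eae", "taxon": "Escherichia coli"},
        {"locus": "hly", "taxon": "Listeria monocytogenes"},
    ],
}


def expand(document: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Rewrite local keys to ontology IRIs using the document's @context."""
    context = document["@context"]
    return [
        {context.get(key, key): value for key, value in record.items()}
        for record in document["@graph"]
    ]


def query(documents: List[Dict[str, Any]], term: str, value: str) -> List[Dict[str, Any]]:
    """Run one query across all documents using a shared ontology term."""
    return [
        record
        for document in documents
        for record in expand(document)
        if record.get(term) == value
    ]


ecoli = query([DB_A, DB_B], ONT + "organism", "Escherichia coli")
genes = sorted(record[ONT + "geneName"] for record in ecoli)
```

Because both documents resolve to the same terms, a single query reaches both sources without any database-specific code, which is the interoperability the slide argues for.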
Trends Microbiol 17, 279–285 (2009).
• Major problems of databases:
  • Manual curation is still a necessity
  • Academic model for sustainability of a resource: lack of funding leads to “dead” databases
• Existing virulome databases provide a wealth of data
• A large part of the available VF data overlaps between DBs. The overlap depends largely on each database's last update and on what was included.
• They are always a work in progress, relying heavily on manual curation
• Novel HTS-based techniques such as cg/wgMLST can use these databases to annotate schemas and provide a much richer picture of VF diversity at the DNA/protein level.
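The overlap between databases mentioned above could be quantified along the following lines. This is a sketch with invented toy data, and it assumes that VF identifiers have already been harmonized across databases, which in practice is the hard part. The database names and gene sets are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index: shared items divided by all distinct items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(
    databases: Dict[str, Set[str]]
) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """For each pair of databases, return (shared count, Jaccard index)."""
    return {
        (name_a, name_b): (
            len(databases[name_a] & databases[name_b]),
            jaccard(databases[name_a], databases[name_b]),
        )
        for name_a, name_b in combinations(sorted(databases), 2)
    }


# Toy VF sets keyed by hypothetical database names.
databases = {
    "DB_A": {"stx1", "stx2", "eae"},
    "DB_B": {"stx2", "eae", "hlyA"},
    "DB_C": {"invA"},
}
overlap = pairwise_overlap(databases)
```

Repeating the calculation after each database update would show how much of the overlap is explained by update dates alone.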
• UMMI Members
  • Mário Ramirez
  • José Melo-Cristino
• EFSA INNUENDO Project (https://sites.google.com/site/innuendocon/)
  • Mirko Rossi
• FP7 PathoNGenTrace (http://www.patho-ngen-trace.eu/):
  • Dag Harmsen (Univ. Muenster)
  • Stefan Niemann (Research Center Borstel)
  • Keith Jolley, James Bray and Martin Maiden (Univ. Oxford)
  • Joerg Rothganger (RIDOM)
  • Hannes Pouseele (Applied Maths)
• Genome Canada IRIDA (Integrated Rapid Infectious Disease Analysis) project (www.irida.ca)
  • Franklin Bristow, Thomas Matthews, Aaron Petkau, Morag Graham and Gary Van Domselaar (NLM, PHAC)
  • Ed Taboada and Peter Kruczkiewicz (Lab Foodborne Zoonoses, PHAC)
  • Fiona Brinkman (SFU)
  • William Hsiao (BCCDC)

ECCMID 2016 - How to build actionable virulome databases
