How Bioinformatic and Sequencing Data Might Inform the Regulatory Process (Industry Perspective)
OECD Expert Group BioPesticides Seminar: Bioinformatics and Regulation of Microbial Pesticides
Andre Silvanovich, PhD
June 24, 2019
Regulatory Risk Assessment of Microbials & Whole Genome Sequencing (WGS)
The current risk assessment paradigm for microbials:
• Is fit for purpose
• Has been and is used to successfully evaluate risk for commercialized products
At this time, WGS coupled with genome assembly and annotation does not improve the quality or value of a regulatory risk assessment
• Information on protein functionality is lacking
Commoditization of WGS has created unrealistic expectations for its use
Evolution of Whole Genome Sequencing
• Rapid evolution through platform changes: 454, Illumina, PacBio
• Quality and throughput
• Steadily decreasing cost per base
• Accessible through fee-for-service vendors
• Routinely described in the popular press: "$200 genome", personalized medicine, "$999 consumer-friendly genome sequence"
Image and quotations downloaded from https://www.wired.com/story/whole-genome-sequencing-cost-200-dollars/ May 22, 2019
Raw data are strings of nucleotides (G, A, T, and C) that are of no value without further analysis
WGS Data are Meaningless Without Further Analysis
Assembly
• Combine short nucleotide strings into longer strings to construct a collection of "contigs" (longer strings) or a complete chromosome
Identification of single nucleotide variants (SNVs) and insertions or deletions of bases (indels)
• Added nucleotides that are present or nucleotides that are missing relative to a reference sequence
• Changes in amino acid sequence
Use nucleotide sequence to identify taxonomy
• Specific sequence unique to a genus or species reference sequence
• Coupled with additional experimental data (e.g., morphology and growth on selective media)
Gene prediction
• Identify promoter-coding-3' untranslated region (UTR) sequences, or amino acid sequence
• Use predicted amino acid sequence to assign predicted gene function
• Build an annotated genome
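As a rough illustration of the variant-identification step above, the following Python sketch compares a sample sequence with a reference, lists the SNVs, and reports which of them change the amino acid sequence. It is a minimal sketch under stated assumptions: both nucleotide strings are hypothetical, they are assumed to be already aligned and of equal length (no indels), and the standard genetic code is used. The function names are illustrative only and do not come from any particular pipeline.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical, pre-aligned coding sequences (assumption: same length, no indels).
REFERENCE = "ATGACCACCCGT"
SAMPLE = "ATGACTACCCAT"


def find_snvs(reference, sample):
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles equal-length, pre-aligned sequences")
    return [(i + 1, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


def translate(sequence):
    """Translate a coding sequence codon by codon, ignoring any trailing partial codon."""
    usable = len(sequence) - len(sequence) % 3
    return "".join(CODON_TABLE[sequence[i:i + 3]] for i in range(0, usable, 3))


def amino_acid_changes(reference, sample):
    """Return substitutions such as 'R4H' where the translated proteins differ."""
    ref_protein, alt_protein = translate(reference), translate(sample)
    return [f"{r}{i + 1}{a}" for i, (r, a) in enumerate(zip(ref_protein, alt_protein)) if r != a]


print(find_snvs(REFERENCE, SAMPLE))           # two SNVs, at positions 6 and 11
print(amino_acid_changes(REFERENCE, SAMPLE))  # only one of them alters the protein
```

In this toy case one SNV is silent and the other changes an arginine to a histidine. The sketch can report that the residue changed, but nothing in the sequence says whether the change affects what the protein does.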
Sequence is not enough to inform a microbial regulatory process
WGS Data Have Overwhelmed Our Analytical Capacity
"In a study published Tuesday in PLOS Biology, researchers at Northwestern University reported that of our 20,000 protein-coding genes, about 5,400 have never been the subject of a single dedicated paper."
"Most of our other genes have been almost as badly neglected, the subjects of minor investigation at best. A tiny fraction — 2,000 of them — have hogged most of the attention, the focus of 90 percent of the scientific studies published in recent years."
Cited text viewed and image downloaded Sept. 19, 2018
https://www.nytimes.com/2018/09/18/science/why-your-dna-is-still-uncharted-territory.html?action=click&module=Discovery&pgtype=Homepage
May 2019 Release Statistics for UniProt*
We Lack Experimental Data on Protein Function
Downloaded from https://www.ebi.ac.uk/uniprot/TrEMBLstats May 22, 2019
• Fewer than 1% of protein sequences have evidence of their existence through experimentation
• Greater than 99% of protein sequences are predicted from genome sequencing activities
* The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. UniProt is a collaboration between the European Bioinformatics Institute (EMBL-EBI), the SIB Swiss Institute of Bioinformatics and the Protein Information Resource (PIR).
Knowledge of protein function is important to the regulatory process
Bacterial Protein Sequences Lack Data on Protein Function
• Thousands of bacterial genomes sequenced
• Millions of protein sequences
• A small proportion of these proteins have been studied experimentally
• One third do not display sufficient similarity for functional assignment
Price et al., 2018. Mutant phenotypes for thousands of bacterial genes of unknown function. Nature 557:503-509.
Chang et al., 2016. COMBREX-DB: an experiment centered database of protein function: knowledge, predictions and knowledge gaps. Nucleic Acids Research 44:D330-D335.
[Figure: annotation assignment. A protein with experimental evidence (the "triangle protein") is the basis for functional assignment of "triangle-like" proteins by alignment; other proteins receive functional assignments based upon alignment with a "square protein" or a "pentagon protein".]
• Most of the proteins have predicted function based on sequence similarity
• Automated pipelines create annotations (function assignments) with what is known at the time
• Functional assignment has not been systematically documented
• No standardized process for functional assignment
• Inaccurate or incorrect annotations are sometimes produced and propagated by pipelines
• A controlled lexicon is not used
• No understanding of the functional context is produced (does the "gene" yield a protein?)
Knowledge of protein function is important to the regulatory process
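The similarity-based assignment described above can be sketched in a few lines of Python. This is only a toy model under loud assumptions: the sequences, the names "triangle" and "square", the functions "function A" and "function B", and the 0.4 identity threshold are all hypothetical, and identity is computed position by position on equal-length strings rather than with a real alignment tool. The point is to show how a prediction is produced and why recording its source matters.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical proteins with experimental evidence: name -> (sequence, function).
CHARACTERIZED = {
    "triangle": ("MTTRVIALDL", "function A"),
    "square": ("MKKLLPSAGW", "function B"),
}

MIN_IDENTITY = 0.4  # assumed threshold; no standardized value is implied


@dataclass
class Annotation:
    description: str
    evidence: str
    source: Optional[str]  # which characterized protein the function was copied from
    identity: float


def identity(a, b):
    """Fraction of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def annotate(query):
    """Copy the function of the most similar characterized protein, if similar enough."""
    best_name = max(CHARACTERIZED, key=lambda name: identity(query, CHARACTERIZED[name][0]))
    best_identity = identity(query, CHARACTERIZED[best_name][0])
    if best_identity >= MIN_IDENTITY:
        function = CHARACTERIZED[best_name][1]
        return Annotation(function, "predicted by similarity", best_name, best_identity)
    return Annotation("hypothetical protein", "none", None, best_identity)


print(annotate("MTTRVLALDV"))  # triangle-like: inherits "function A" as a prediction
print(annotate("MQEWYHNCFP"))  # too dissimilar: remains a hypothetical protein
```

Keeping the evidence and source fields with each annotation is one way to document the annotation lineage. If a pipeline drops them, a prediction becomes indistinguishable from an experimentally supported assignment once it is copied to the next protein.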
Count Description
373 phosphotransferase
42 pyridoxal phosphate (plp) phosphatase
21 putative phosphatase
230 phosphatase ybha
1 conserved hypothetical protein
3 predicted hydrolase
2 ygha hydrolase
209 cof-like hydrolase family protein
3 hydrolase of the had superfamily
475 pyridoxal phosphate phosphatase
1 putative phosphatase ywpj
101 pyridoxal phosphate phosphatase ybha
2 phosphatase
2 cof hydrolase
702 pyridoxal phosphatase
1 hypothetical protein samn05216485_102158
1 hypothetical protein samn04487822_11863
515 cof-type had-iib family hydrolase
1 hydrolase_3, haloacid dehalogenase-like hydrolase
3 pyridoxal phosphatase / fructose 1,6-bisphosphatase
This bacterial protein sequence is found 2689 times in the GenBank NR database:
MTTRVIALDLDGTLLTPKKTLLPSSIEALARAREAGYQLIIVTGRHHVAIHPFYQALALDTPAICCNGTY
LYDYHAKTVLEADPMPVNKALQLIEMLNEHHIHGLMYVDDAMVYEHPTGHVIRTSNWAQTLPPEQRPTFT
QVASLAETAQQVNAVWKFALTHDDLPQLQHFGKHVEHELGLECEWSWHDQVDIARGGNSKGKRLTKWVEA
QGWSMENVVAFGDNFNDISMLEAAGTGVAMGNADDAVKARANIVIGDNTTDSIAQFIYSHLI
Despite being the same sequence in each of the 2689 GenBank entries, it is described in multiple ways:
• Phosphotransferase
• Phosphatase
• Hydrolase
• Hypothetical protein
Given the count associated with each description, there are three primary annotation lineages that describe different biological activities for the same sequence
Phosphotransferase activity is the opposite of phosphatase activity
• Phosphotransferase adds phosphate, phosphatase removes phosphate
What is the relationship between hydrolase and phosphatase/transferase activities?
Annotations of Identical Protein Sequences Vary Considerably
Knowledge of protein function is important to the regulatory process
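The grouping of descriptions into annotation lineages can be reproduced with a short tally. The Python sketch below uses the counts and descriptions from the table on this slide; the keyword-matching rule (check for "phosphotransferase", then "phosphatase", then "hydrolase", then "hypothetical") is an assumption made here for illustration, not a method stated in the presentation.

```python
from collections import Counter

# (count, description) pairs as transcribed from the slide's table.
ANNOTATIONS = [
    (373, "phosphotransferase"),
    (42, "pyridoxal phosphate (plp) phosphatase"),
    (21, "putative phosphatase"),
    (230, "phosphatase ybha"),
    (1, "conserved hypothetical protein"),
    (3, "predicted hydrolase"),
    (2, "ygha hydrolase"),
    (209, "cof-like hydrolase family protein"),
    (3, "hydrolase of the had superfamily"),
    (475, "pyridoxal phosphate phosphatase"),
    (1, "putative phosphatase ywpj"),
    (101, "pyridoxal phosphate phosphatase ybha"),
    (2, "phosphatase"),
    (2, "cof hydrolase"),
    (702, "pyridoxal phosphatase"),
    (1, "hypothetical protein samn05216485_102158"),
    (1, "hypothetical protein samn04487822_11863"),
    (515, "cof-type had-iib family hydrolase"),
    (1, "hydrolase_3, haloacid dehalogenase-like hydrolase"),
    (3, "pyridoxal phosphatase / fructose 1,6-bisphosphatase"),
]

# Assumed grouping rule: the first keyword found in the description decides the lineage.
LINEAGE_KEYWORDS = ["phosphotransferase", "phosphatase", "hydrolase", "hypothetical"]


def lineage(description):
    """Assign a description to a lineage by simple keyword matching."""
    for keyword in LINEAGE_KEYWORDS:
        if keyword in description:
            return keyword
    return "other"


totals = Counter()
for count, description in ANNOTATIONS:
    totals[lineage(description)] += count

for name, count in totals.most_common():
    print(f"{count:5d}  {name}")
print(f"{sum(totals.values()):5d}  total")
```

With this rule the tally is 1577 phosphatase, 735 hydrolase, 373 phosphotransferase and 3 hypothetical entries. The rows as transcribed sum to 2688, one fewer than the 2689 entries quoted on the slide.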
Potential Conclusions Related to Disparate Annotations for Identical Protein Sequences
• The protein has multiple enzyme functions and all annotations are correct
• The protein is a hydrolase
InterPro says the following:
• Haloacid dehalogenase-like family that includes cof-hydrolases, ATPases and phosphatases
• Haloacid dehalogenase family that includes hydrolases and eukaryotic phosphatases
• Some annotations for hydrolases are due to activity against a common laboratory test compound used to determine phosphatase activity
• There is no evidence that this phosphatase activity is physiologically relevant to hydrolase activity
• Perhaps the phosphotransferase annotation is a bookkeeping error or typo that is propagated
If not a multifunction enzyme, resolution of the disparate annotations would require:
• Investigation of the annotation lineage to determine if one or more annotations are based upon physicochemical or genetic analysis of mutants
• Genetic analysis of mutants
• Purification of the protein followed by physicochemical analysis
Knowledge of protein function is important to the regulatory process
Current State of WGS Data Analysis & the Regulatory Process
Tremendous strides have been made in:
• Sequencing platforms and technologies
• Software and tools used to interrogate and catalogue sequence data
The gap between sequence data collection and experimentation continues to widen
Given the current body of knowledge and state of the field, at this time WGS coupled with genome assembly and annotation does not improve the quality or value of a regulatory risk assessment
• High quality sequence data are readily collected & bioinformatic tools are robust and fit for purpose
• However, the total body of experimental protein data is significantly lacking
• Without experimental data, the majority of functional assignments are simply predictions
- 1st OECD EGBP document 2001
- EPA BPPD established in 1995
Provision of Selected Microbial Sequences to Regulatory Authorities Could Add Value
Hypothesis-driven evaluation of specific sequence / bioinformatic information has the ability to inform a regulatory risk assessment
Examples include:
• Taxonomic identification
• Presence of potential resistance to clinically relevant antibiotics
These hypotheses are underpinned with existing experimental evidence / data that indicate they may yield meaningful hazard identification
If a hazard is identified via sequence / bioinformatic information, experimentation / data generation would still be needed to assess hazard & determine risk
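A hypothesis-driven screen of this kind might look like the Python sketch below, which checks predicted proteins against a small panel of markers and flags candidates for experimental follow-up. Everything in it is an assumption for illustration: the marker and protein sequences are made up, the panel name is hypothetical, and similarity is approximated by the fraction of the marker's 5-residue substrings (k-mers) found in each protein. A real screen would rely on curated reference data and a dedicated alignment tool.

```python
# Hypothetical panel of experimentally characterized resistance markers.
PANEL = {
    "hypothetical_marker_1": "MAGWTLKQDERNPYVHCSIF",
}

# Hypothetical predicted proteins from an annotated genome.
PROTEINS = {
    "orf_001": "MLLAVGTMAGWTLKQDERNPYVHCSIFKKRST",
    "orf_002": "MTTRVIALDLDGTLLTPKKTLLPSS",
}


def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_fraction(protein, marker, k):
    """Fraction of the marker's k-mers that also occur in the protein."""
    marker_kmers = kmers(marker, k)
    if not marker_kmers:
        return 0.0
    return len(marker_kmers & kmers(protein, k)) / len(marker_kmers)


def screen(proteins, panel, k=5, min_fraction=0.8):
    """List (protein, marker, fraction) hits that warrant experimental follow-up."""
    hits = []
    for protein_id, protein in proteins.items():
        for marker_id, marker in panel.items():
            fraction = shared_fraction(protein, marker, k)
            if fraction >= min_fraction:
                hits.append((protein_id, marker_id, fraction))
    return hits


hits = screen(PROTEINS, PANEL)
for protein_id, marker_id, fraction in hits:
    print(f"{protein_id} resembles {marker_id} ({fraction:.0%} of k-mers shared)")
```

A hit here identifies a potential hazard only. Consistent with the slide, whether the protein is expressed and confers resistance would still have to be determined experimentally.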
WGS in Microbial Risk Assessment – Not Ready for Prime Time
The current risk assessment paradigm that focuses on experimental evaluation:
• Is fit for purpose
• Has been and is used to successfully evaluate risk for commercialized products
At this time, WGS coupled with genome assembly and annotation does not improve the quality or value of a risk assessment
• Hypothesis-driven evaluation of specific sequence / bioinformatic information does have the ability to inform a regulatory risk assessment
In the future, as the body of experimental protein knowledge expands, assembly and annotation may prove to be of value for a hazard assessment
• A standardization of software tools, thresholds and references will likely be required
Thank you for the opportunity
