STRING: Prediction of protein networks through integration of diverse large-scale data sets
Lars Juhl Jensen, EMBL Heidelberg
STRING integrates many types of evidence
  • Genomic neighborhood
  • Species co-occurrence
  • Gene fusions
  • Database imports
  • Experimental interaction data
  • Microarray expression data
  • Literature co-mentioning
Integrating physical interaction screens
  • Make a binary representation of complexes
  • Yeast two-hybrid data sets are inherently binary
  • Calculate a score from the number of (co-)occurrences
  • Calculate a score from non-shared partners
  • Calibrate against KEGG maps
  • Infer associations in other species
  • Combine evidence from experiments
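The first steps on this slide can be sketched as follows. This is a minimal illustration, not STRING's actual scoring: it assumes the matrix model (every pair of proteins in a purified complex counts as one binary link) and a hypothetical normalization that divides co-occurrences by the geometric mean of how often each protein occurs, so that proteins with many non-shared partners are down-weighted. The function names are made up for this example.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def complexes_to_pairs(complexes):
    """Matrix model: every pair of members in a complex becomes one binary link."""
    pairs = []
    for members in complexes:
        # Sort so that (A, B) and (B, A) are counted as the same pair.
        pairs.extend(combinations(sorted(set(members)), 2))
    return pairs


def cooccurrence_scores(complexes):
    """Hypothetical score: co-occurrences / sqrt(occurrences of A * occurrences of B)."""
    occurrences = Counter(p for members in complexes for p in set(members))
    co = Counter(complexes_to_pairs(complexes))
    return {
        (a, b): n / sqrt(occurrences[a] * occurrences[b])
        for (a, b), n in co.items()
    }


complexes = [["A", "B", "C"], ["A", "B"], ["A", "D"]]
scores = cooccurrence_scores(complexes)
# (A, B) co-occur twice; A occurs 3 times and B twice, so the score is 2 / sqrt(6).
```

Yeast two-hybrid interactions, which are already binary, can be passed in as two-member complexes.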
Gene fusion: predicting physical interactions
  • Detect cases where multiple proteins match one (fused) protein
  • Exclude overlapping alignments
  • Infer associations in other species
  • Calibrate against KEGG maps
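A minimal sketch of the fusion test, assuming the alignments have already been reduced to tuples of (query protein, target protein, start, end) with coordinates on the target. Two queries count as fusion partners only if they hit the same target in non-overlapping regions; overlapping hits are excluded, as on the slide. The protein names and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def overlaps(a, b):
    """True if two inclusive (start, end) intervals on the target overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def fusion_pairs(hits):
    """Find protein pairs that align to different, non-overlapping parts of one protein.

    hits: iterable of (query, target, start, end), coordinates on the target.
    Returns {(query_a, query_b): set of targets supporting the link}.
    """
    by_target = defaultdict(list)
    for query, target, start, end in hits:
        by_target[target].append((query, (start, end)))

    links = defaultdict(set)
    for target, matches in by_target.items():
        for (qa, ra), (qb, rb) in combinations(matches, 2):
            if qa != qb and not overlaps(ra, rb):
                links[tuple(sorted((qa, qb)))].add(target)
    return dict(links)


hits = [
    ("geneA", "fusedX", 1, 250),
    ("geneB", "fusedX", 260, 640),
    ("geneC", "fusedX", 200, 300),  # overlaps both other hits, so it is excluded
]
links = fusion_pairs(hits)
```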
Mining microarray expression databases
  • Re-normalize arrays using a modern method to remove biases
  • Build an expression matrix
  • Combine similar arrays by PCA
  • Construct a predictor by Gaussian kernel density estimation
  • Calibrate against KEGG maps
  • Infer associations in other species
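The kernel density step can be sketched as below. This is an assumption-laden illustration, not the STRING implementation: it skips re-normalization and PCA, assumes expression correlations are already computed for a reference set (pairs sharing a KEGG map) and for a background set of all pairs, and uses an arbitrary bandwidth of 0.1. The predictor returns the log ratio of the two estimated densities, so positive values mean a correlation is more typical of reference pairs.

```python
import math


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` with a Gaussian kernel."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def make_predictor(reference_corrs, background_corrs, bandwidth=0.1):
    """Log-likelihood ratio of a correlation value: reference pairs vs. all pairs.

    Far from every sample both densities can underflow to zero; a real
    implementation would need to guard against that.
    """
    ref = gaussian_kde(reference_corrs, bandwidth)
    bg = gaussian_kde(background_corrs, bandwidth)
    return lambda r: math.log(ref(r) / bg(r))


# Hypothetical correlations for pairs on a shared KEGG map and for all pairs.
reference = [0.8, 0.7, 0.9, 0.6]
background = [0.0, 0.1, -0.2, 0.3, 0.8, -0.1]
predict = make_predictor(reference, background)
```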
Gene neighborhood: predicting co-expression
  • Identify runs of adjacent genes with the same transcriptional direction
  • Score each gene pair based on the intergenic distance
  • Calibrate against KEGG maps
  • Infer associations in other species
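The two neighborhood steps might look like this in a minimal form. The sketch assumes genes on a single linear chromosome with 1-based inclusive coordinates and ignores wrap-around on circular genomes. `distance_score` and its `midpoint` and `width` parameters are a hypothetical raw score, meant to be replaced by calibration against KEGG maps.

```python
import math
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def same_strand_runs(genes):
    """Split genes (ordered by start) into runs of adjacent genes on the same strand."""
    runs = []
    for gene in sorted(genes, key=lambda g: g.start):
        if runs and runs[-1][-1].strand == gene.strand:
            runs[-1].append(gene)
        else:
            runs.append([gene])
    return runs


def neighbor_pairs(genes):
    """Yield (gene_a, gene_b, bases between them) for adjacent genes within each run."""
    for run in same_strand_runs(genes):
        for a, b in zip(run, run[1:]):
            yield a.name, b.name, b.start - a.end - 1


def distance_score(distance, midpoint=100.0, width=50.0):
    """Hypothetical raw score: short intergenic distances score close to 1."""
    return 1.0 / (1.0 + math.exp((distance - midpoint) / width))


genes = [
    Gene("g1", 100, 1000, "+"),
    Gene("g2", 1020, 2000, "+"),
    Gene("g3", 2500, 3000, "+"),
    Gene("g4", 3100, 4000, "-"),  # opposite strand: starts a new run
]
pairs = list(neighbor_pairs(genes))
```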
Co-mentioning in the scientific literature
  • Associate abstracts with species
  • Identify gene names in title/abstract
  • Count (co-)occurrences of genes
  • Test significance of associations
  • Calibrate against KEGG maps
  • Infer associations in other species
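Counting and significance testing can be sketched as below, assuming gene-name recognition has already produced one set of gene identifiers per abstract for a given species. The one-sided hypergeometric test used here is one reasonable choice for "test significance of associations"; the slide does not say which test STRING uses, and the function names are hypothetical.

```python
from math import comb


def count_mentions(abstracts, gene_a, gene_b):
    """Count abstracts mentioning gene A, gene B, and both.

    abstracts: list of sets of gene names found in each title/abstract.
    """
    n_a = sum(gene_a in genes for genes in abstracts)
    n_b = sum(gene_b in genes for genes in abstracts)
    both = sum(gene_a in genes and gene_b in genes for genes in abstracts)
    return n_a, n_b, both


def comention_pvalue(total, n_a, n_b, both):
    """One-sided hypergeometric P(X >= both) if mentions of A and B were independent."""
    tail = sum(
        comb(n_a, k) * comb(total - n_a, n_b - k)
        for k in range(both, min(n_a, n_b) + 1)
    )
    # Integer arithmetic until the final division keeps the result exact.
    return tail / comb(total, n_b)


abstracts = [{"geneA", "geneB"}, {"geneA", "geneB"}, {"geneA"}, {"geneC"}, set(), set()]
n_a, n_b, both = count_mentions(abstracts, "geneA", "geneB")
p = comention_pvalue(len(abstracts), n_a, n_b, both)
```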
Phylogenetic profiles: co-mentioning in genomes
  • Align all proteins against each other
  • Calculate best-hit profiles
  • Join similar species by PCA
  • Calculate distances between PC profiles
  • Calibrate against KEGG maps
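A simplified sketch of the profile steps, assuming the all-against-all alignments are available as (genome, score) pairs for each query protein. It skips the PCA step that joins similar species and uses plain Euclidean distance as a stand-in for whatever distance STRING computes on the PC profiles. Genome labels and scores are hypothetical.

```python
import math


def best_hit_profile(hits, genomes):
    """Best alignment score of one protein against each genome (0.0 if no hit).

    hits: iterable of (genome, score) pairs for a single query protein.
    """
    best = {}
    for genome, score in hits:
        best[genome] = max(score, best.get(genome, 0.0))
    return [best.get(g, 0.0) for g in genomes]


def profile_distance(p, q):
    """Euclidean distance between two profiles; small distances suggest co-occurrence."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


genomes = ["sp1", "sp2", "sp3", "sp4"]  # hypothetical genome labels
prof_a = best_hit_profile([("sp1", 200), ("sp1", 150), ("sp2", 180)], genomes)
prof_b = best_hit_profile([("sp1", 190), ("sp2", 170)], genomes)
prof_c = best_hit_profile([("sp3", 210), ("sp4", 160)], genomes)
```

Proteins A and B occur in the same genomes and end up close together; C occurs elsewhere and is far from both.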
Multiple evidence types from several species
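Several slides end with "Infer associations in other species". One way to sketch that transfer, assuming a hypothetical orthology assignment that maps each protein to a species and an orthologous group: links are projected onto pairs of groups and then back onto the proteins of every species that has members in both groups. A real pipeline would likely discount transferred scores; this sketch copies the best score unchanged.

```python
from collections import defaultdict


def transfer_associations(associations, orthologs):
    """Project links onto orthologous groups, then back onto proteins of every species.

    associations: {(protein_a, protein_b): score} observed in any species
    orthologs: {protein: (species, group)}, a hypothetical orthology assignment
    Returns {(protein_a, protein_b): score} for pairs within the same species.
    """
    # 1. Best score seen for each pair of orthologous groups.
    group_scores = {}
    for (a, b), score in associations.items():
        key = tuple(sorted((orthologs[a][1], orthologs[b][1])))
        group_scores[key] = max(group_scores.get(key, 0.0), score)

    # 2. Index proteins by species and group.
    members = defaultdict(lambda: defaultdict(list))
    for protein, (species, group) in orthologs.items():
        members[species][group].append(protein)

    # 3. Every species with members in both groups inherits the link.
    transferred = {}
    for (ga, gb), score in group_scores.items():
        for groups in members.values():
            for pa in groups.get(ga, []):
                for pb in groups.get(gb, []):
                    if pa != pb:
                        key = tuple(sorted((pa, pb)))
                        transferred[key] = max(transferred.get(key, 0.0), score)
    return transferred


orthologs = {
    "s1_a": ("S1", "G1"), "s1_b": ("S1", "G2"),
    "s2_a": ("S2", "G1"), "s2_b": ("S2", "G2"), "s2_c": ("S2", "G3"),
}
links = transfer_associations({("s1_a", "s1_b"): 0.8}, orthologs)
```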
Score calibration against a common reference
  • Many diverse types of evidence
    • The quality of each is judged by very different raw scores
    • These are all calibrated against the same reference set
  • Requirements for a reference
    • Must represent a compromise across all types of evidence
    • Broad species coverage
  • Both a strength and a weakness
    • Scores for all evidence types are directly comparable
    • The type of interaction is currently not predicted
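Calibration against a common reference can be sketched with simple binning: raw scores are grouped into bins, and each bin is assigned the fraction of its benchmark pairs that share a KEGG map. Because every evidence type is mapped onto the same scale, the calibrated scores become directly comparable. The combination rule below, 1 minus the product of (1 minus each score), assumes the evidence channels are independent; it is an illustrative assumption, not taken from the slides. Bin edges and data are hypothetical.

```python
from bisect import bisect_right


def calibrate(raw_scores, in_reference, edges):
    """Map raw-score bins to the fraction of benchmark pairs found in the reference.

    raw_scores: raw score for each benchmark pair
    in_reference: True if the pair shares a KEGG map
    edges: sorted bin boundaries; bin 0 lies below edges[0], bin i covers
           [edges[i-1], edges[i]), and the last bin lies at or above edges[-1]
    Returns a function mapping a raw score to a calibrated confidence.
    """
    hits = [0] * (len(edges) + 1)
    totals = [0] * (len(edges) + 1)
    for score, positive in zip(raw_scores, in_reference):
        b = bisect_right(edges, score)
        totals[b] += 1
        hits[b] += positive
    # Empty bins get a confidence of 0.0.
    precision = [h / t if t else 0.0 for h, t in zip(hits, totals)]
    return lambda score: precision[bisect_right(edges, score)]


def combine(confidences):
    """Combine calibrated scores, assuming independent evidence channels."""
    p_none = 1.0
    for c in confidences:
        p_none *= 1.0 - c
    return 1.0 - p_none


raw = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
in_kegg = [False, False, True, True, True, False]
confidence = calibrate(raw, in_kegg, edges=[0.5])
```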
Getting more specific – generally speaking
Other possible improvements
  • Bidirectionally transcribed gene pairs: a new genomic context method that may also work on eukaryotes [Korbel et al., Nature Biotechnology 2004]
  • Information extraction from PubMed using shallow parsing [Saric et al., Proceedings of ACL 2004]
  • Add more types of experimental data, e.g. protein expression levels
  • Infer functional relations from feature similarity
  • Hook up STRING with a robot
Acknowledgments
  • The STRING team: Christian von Mering, Berend Snel, Martijn Huynen, Daniel Jaeggi, Steffen Schmidt, Mathilde Foglierini, Peer Bork
  • ArrayProspector web service: Julien Lagarde, Chris Workman
  • NetView visualization tool: Sean Hooper
  • Analysis of yeast cell cycle: Ulrik de Lichtenberg, Thomas Skøt, Anders Fausbøll, Søren Brunak
  • Web resources: string.embl.de, www.bork.embl.de/ArrayProspector, www.bork.embl.de/synonyms
Thank you!
