The STRING database
Lars Juhl Jensen
EMBL Heidelberg
data integration
Jensen et al., Drug Discovery Today: Targets, 2004
functional interactions
Bork et al., Current Opinion in Structural Biology, 2005
373 proteomes
Genome Reviews
RefSeq
Ensembl
model organism databases
genomic context methods
gene fusion
 
gene neighborhood
 
phylogenetic profiles
 
 
 
 
(Figure labels: cell, cellulosomes, cellulose)
automation
scoring scheme
correct interactions
wrong associations
gene fusion
sequence similarity
 
gene neighborhood
sum of intergenic distances
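The gene neighborhood raw score named above can be sketched roughly as follows. This is a minimal illustration, not the STRING implementation: the gene coordinates, the names `geneA` to `geneC`, and the function `sum_intergenic_distances` are hypothetical, and the sketch assumes all genes lie on the same strand of one contig.

```python
from typing import List, Tuple

# Hypothetical record: (name, start, end), all genes on the same strand.
Gene = Tuple[str, int, int]

def sum_intergenic_distances(genes: List[Gene], a: str, b: str) -> int:
    """Sum the gaps between consecutive genes lying between genes a and b."""
    ordered = sorted(genes, key=lambda g: g[1])
    names = [g[0] for g in ordered]
    i, j = sorted((names.index(a), names.index(b)))
    total = 0
    for left, right in zip(ordered[i:j], ordered[i + 1:j + 1]):
        # Overlapping genes contribute a gap of zero.
        total += max(0, right[1] - left[2])
    return total

# Made-up coordinates for a run of three neighboring genes.
run = [("geneA", 100, 400), ("geneB", 420, 900), ("geneC", 950, 1500)]
score = sum_intergenic_distances(run, "geneA", "geneC")
print(score)
```

A smaller sum means the two genes sit closer together, so for this evidence type a low raw score is the stronger signal.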
 
phylogenetic profiles
SVD (Singular Value Decomposition)
Euclidean distance
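A minimal sketch of the distance step for phylogenetic profiles. It deliberately omits the SVD step mentioned on the slide and compares raw presence/absence vectors directly; the protein names and profiles are made up for illustration.

```python
import math
from typing import Sequence

def profile_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean distance between two phylogenetic profiles of equal length."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same set of genomes")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

# Hypothetical profiles: 1 = gene present in that genome, 0 = absent.
profiles = {
    "protX": [1, 1, 0, 1, 0],
    "protY": [1, 1, 0, 1, 0],
    "protZ": [0, 1, 1, 0, 1],
}

d_xy = profile_distance(profiles["protX"], profiles["protY"])
d_xz = profile_distance(profiles["protX"], profiles["protZ"])
print(d_xy, d_xz)
```

Proteins with identical presence/absence patterns get distance zero, which is the signal this method looks for.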
 
raw quality scores
not comparable
sequence similarity
sum of intergenic distances
Euclidean distance
benchmarking
calibrate vs. gold standard
 
raw quality scores
probabilistic scores
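One way to picture the calibration step is to rank benchmark pairs by raw score, cut the ranking into bins, and record the fraction of gold-standard-correct pairs in each bin. The sketch below assumes exactly that; the function `calibrate`, the bin size, and the benchmark values are hypothetical, and it is not claimed to be the fitting procedure STRING uses.

```python
from typing import List, Tuple

def calibrate(benchmark: List[Tuple[float, bool]],
              bin_size: int) -> List[Tuple[float, float]]:
    """Return (mean raw score, fraction correct) for each bin of ranked pairs."""
    ranked = sorted(benchmark, key=lambda item: item[0], reverse=True)
    curve = []
    for k in range(0, len(ranked), bin_size):
        chunk = ranked[k:k + bin_size]
        mean_raw = sum(score for score, _ in chunk) / len(chunk)
        fraction_correct = sum(1 for _, ok in chunk if ok) / len(chunk)
        curve.append((mean_raw, fraction_correct))
    return curve

# Made-up benchmark: (raw score, agrees with the gold standard?).
benchmark = [(0.9, True), (0.8, True), (0.7, True), (0.6, False),
             (0.4, True), (0.3, False), (0.2, False), (0.1, False)]
curve = calibrate(benchmark, bin_size=4)
for mean_raw, fraction_correct in curve:
    print(f"{mean_raw:.2f} -> {fraction_correct:.2f}")
```

Because every evidence type is mapped onto the same scale (probability of being correct), scores from sequence similarity, intergenic distance, and profile distance become comparable. For raw scores where lower is better, such as distances, the ranking direction would be reversed.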
curated knowledge
KEGG (Kyoto Encyclopedia of Genes and Genomes)
Reactome
MIPS (Munich Information center for Protein Sequences)
STKE (Signal Transduction Knowledge Environment)
primary experimental data
many sources
many parsers
physical protein interactions
BIND (Biomolecular Interaction Network Database)
GRID (General Repository for Interaction Datasets)
MINT (Molecular Interactions Database)
DIP (Database of Interacting Proteins)
HPRD (Human Protein Reference Database)
merge data by publication
topology-based scores
von Mering et al., Nucleic Acids Research, 2005
co-expression
GEO (Gene Expression Omnibus)
correlation coefficient
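As an illustration of a co-expression raw score, the sketch below computes the Pearson correlation coefficient between two expression profiles. The slide does not say which correlation coefficient is used, so Pearson is an assumption, and the expression values are made up.

```python
import math
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical expression levels across four conditions.
gene1 = [1.0, 2.0, 3.0, 4.0]
gene2 = [2.0, 4.0, 6.0, 8.0]   # rises with gene1
gene3 = [4.0, 3.0, 2.0, 1.0]   # falls as gene1 rises

r_12 = pearson(gene1, gene2)
r_13 = pearson(gene1, gene3)
print(r_12, r_13)
```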
literature mining
different gene identifiers
synonyms lists
MEDLINE
SGD (Saccharomyces Genome Database)
The Interactive Fly
OMIM (Online Mendelian Inheritance in Man)
co-mentioning
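Co-mentioning can be sketched as counting how often two genes appear in the same abstract after names are normalized through a synonyms list. Everything below is illustrative: the synonym table, the second abstract, and the function `co_mentions` are assumptions, and real name matching is considerably more involved than whole-token lookup.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable

def co_mentions(abstracts: Iterable[str], synonyms: Dict[str, str]) -> Counter:
    """Count abstracts in which each pair of normalized gene names co-occurs."""
    counts: Counter = Counter()
    for text in abstracts:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        # Map every recognized synonym to its canonical identifier.
        found = sorted({synonyms[t] for t in tokens if t in synonyms})
        counts.update(combinations(found, 2))
    return counts

# Hypothetical synonyms list (lowercase name -> canonical identifier).
synonyms = {"cyc1": "CYC1", "cyc7": "CYC7", "hap1": "HAP1", "hap1p": "HAP1"}

abstracts = [
    "The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1.",
    "Hap1p activates CYC1 transcription.",  # made-up sentence for illustration
]
counts = co_mentions(abstracts, synonyms)
print(counts[("CYC1", "HAP1")])
```

The raw count would then be calibrated against a gold standard like any other evidence type.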
NLP (Natural Language Processing)
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxgene The GAL4 gene]
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
calibrate vs. gold standard
 
combine all evidence
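Once every evidence type has a probabilistic score, one simple way to combine them is to assume the evidence channels are independent and compute the probability that at least one of them is right. The sketch below shows only that naive combination; the function name and the input scores are hypothetical, and any prior correction a production scheme may apply is left out.

```python
from typing import Iterable

def combine(scores: Iterable[float]) -> float:
    """Combine independent probabilistic scores: 1 - product of (1 - s_i)."""
    p_all_wrong = 1.0
    for s in scores:
        p_all_wrong *= (1.0 - s)
    return 1.0 - p_all_wrong

# Hypothetical scores from three evidence channels for one protein pair.
combined = combine([0.5, 0.5, 0.6])
print(round(combined, 3))
```

Under this assumption, several moderately confident channels add up to a combined score higher than any single one, which is the point of integrating the evidence.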
spread over many species
transfer by orthology
von Mering et al., Nucleic Acids Research, 2005
two modes
 
orthologous groups
von Mering et al., Nucleic Acids Research, 2005
fuzzy orthology
von Mering et al., Nucleic Acids Research, 2005
Bayesian scoring scheme
Bork et al., Current Opinion in Structural Biology, 2005
Acknowledgments
The STRING team (EMBL): Christian von Mering, Berend Snel, Martijn Huynen, Sean Hooper, Samuel Chaffron, Julien Lagarde, Mathilde Foglierini, Peer Bork
Literature mining project (EML Research): Jasmin Saric, Rossitza Ouzounova, Isabel Rojas
