Data integration - Integration of functional associations using STRING
Lars Juhl Jensen
Jensen, Kuhn et al., Nucleic Acids Research, 2009
functional associations
confidence scores
cross-species integration
630 genomes
model organism databases
Ensembl
RefSeq
defining orthology
two modes
protein mode
von Mering et al., Nucleic Acids Research, 2005
COG mode
von Mering et al., Nucleic Acids Research, 2005
genomic context
gene fusion
Korbel et al., Nature Biotechnology, 2004
conserved neighborhood
operons
Korbel et al., Nature Biotechnology, 2004
bidirectional promoters
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
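The phylogenetic-profile idea can be illustrated with a small sketch: genes that are present and absent in the same genomes are candidates for a functional association. The slides do not say how profile similarity is scored, so the mutual-information measure below is an assumption chosen for illustration, and the gene names and profiles are made-up toy data.

```python
from math import log2


def mutual_information(profile_a, profile_b):
    """Mutual information (in bits) between two presence/absence profiles.

    Each profile is a list of 0/1 values, one per genome, in the same order.
    This is an illustrative similarity measure, not necessarily the one
    used by STRING.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    n = len(profile_a)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            joint = sum(1 for x, y in zip(profile_a, profile_b)
                        if x == a and y == b) / n
            p_a = sum(1 for x in profile_a if x == a) / n
            p_b = sum(1 for y in profile_b if y == b) / n
            if joint > 0:
                mi += joint * log2(joint / (p_a * p_b))
    return mi


# Hypothetical toy profiles over eight genomes (1 = gene present).
profiles = {
    "geneA": [1, 1, 0, 0, 1, 0, 1, 0],
    "geneB": [1, 1, 0, 0, 1, 0, 1, 0],  # same pattern as geneA
    "geneC": [1, 0, 1, 0, 1, 0, 1, 0],  # partly different pattern
}

mi_ab = mutual_information(profiles["geneA"], profiles["geneB"])
mi_ac = mutual_information(profiles["geneA"], profiles["geneC"])
```

In this toy case geneA and geneB share an identical profile and therefore score higher than geneA and geneC, which only partly co-occur.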
examples
bacterial Cox assembly
 
Banci et al., PNAS, 2005
Banci et al., PNAS, 2005
cellulose degradation
 
 
 
Figure labels: cell, cellulosomes, cellulose
experimental data
protein interactions
yeast two-hybrid
affinity purification
fragment complementation
Jensen & Bork, Science, 2008
genetic interactions
Beyer et al., Nature Reviews Genetics, 2007
BIND Biomolecular Interaction Network Database
BioGRID General Repository for Interaction Datasets
DIP Database of Interacting Proteins
IntAct
MINT Molecular Interactions Database
HPRD Human Protein Reference Database
PDB Protein Data Bank
inferred associations
gene coexpression
 
GEO Gene Expression Omnibus
expression compendia
curated knowledge
complexes
MIPS Munich Information center for Protein Sequences
Gene Ontology
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
KEGG Kyoto Encyclopedia of Genes and Genomes
MetaCyc
Reactome
PID NCI-Nature Pathway Interaction Database
literature mining
>10 km
MEDLINE
SGD Saccharomyces Genome Database
The Interactive Fly
OMIM Online Mendelian Inheritance in Man
co-mentioning
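Co-mentioning can be sketched as counting how often two gene names appear in the same abstract. The function name, the whitespace tokenizer, and the three toy sentences below are assumptions made for illustration; a real pipeline would need a proper name dictionary and synonym handling.

```python
from collections import Counter
from itertools import combinations


def comention_counts(abstracts, names):
    """Count, for each pair of names, the abstracts that mention both.

    Matching is exact and case-sensitive on whitespace-separated tokens,
    which is a deliberate simplification.
    """
    counts = Counter()
    for text in abstracts:
        tokens = set(text.replace(",", " ").replace(".", " ").split())
        found = sorted(name for name in names if name in tokens)
        counts.update(combinations(found, 2))
    return counts


# Toy sentences standing in for abstracts (not real MEDLINE records).
abstracts = [
    "The expression of CYC1 and CYC7 is controlled by HAP1.",
    "HAP1 activates CYC1 in this toy sentence.",
    "GAL4 is mentioned alone here.",
]
names = {"CYC1", "CYC7", "HAP1", "GAL4"}

counts = comention_counts(abstracts, names)
```

Here the pair (CYC1, HAP1) is counted twice, while GAL4 never co-occurs with another name and so contributes no pair.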
NLP Natural Language Processing
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxgene The GAL4 gene]
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
 
easy in theory …
…  but not in practice
many data types
not comparable
variable quality
many sources
different file formats
different gene identifiers
partially redundant
spread over 630 genomes
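The problems listed above (different identifiers, partial redundancy across sources) suggest a normalization step before any scoring. The sketch below maps source-specific gene names to one identifier space and merges duplicate pairs; the alias table, source names, and identifiers are all hypothetical.

```python
def normalize_pairs(records, alias_to_id):
    """Map each record to a canonical, order-independent protein pair.

    records: iterable of (source, gene_a, gene_b) tuples.
    alias_to_id: dict from any known alias to a canonical identifier.
    Returns a dict from canonical pair to the set of supporting sources.
    """
    merged = {}
    for source, gene_a, gene_b in records:
        try:
            pair = tuple(sorted((alias_to_id[gene_a], alias_to_id[gene_b])))
        except KeyError:
            continue  # skip records with an identifier we cannot map
        merged.setdefault(pair, set()).add(source)
    return merged


# Hypothetical alias table and records from three made-up sources.
alias_to_id = {"abc1": "P001", "ABC-1": "P001", "xyz2": "P002", "XYZ-2": "P002"}
records = [
    ("db1", "abc1", "xyz2"),
    ("db2", "XYZ-2", "ABC-1"),    # same pair, other aliases, reversed order
    ("db3", "abc1", "unknown"),   # dropped: unmapped identifier
]

merged = normalize_pairs(records, alias_to_id)
```

Sorting the two identifiers makes the pair independent of the order in which a source reports it, so redundant entries collapse into one key.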
quality scores
reproducibility
von Mering et al., Nucleic Acids Research, 2005
intergenic distances
 
benchmarking
calibrate vs. gold standard
von Mering et al., Nucleic Acids Research, 2005
raw quality scores
probabilistic scores
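One way to picture the step from raw quality scores to probabilistic scores is to bin the raw scores and measure, per bin, the fraction of pairs found in a gold standard. The slides do not give the procedure, so the binning approach, the function name, and all data below are assumptions for illustration only.

```python
def calibrate(scored_pairs, gold_positive, bin_edges):
    """Return, per raw-score bin, the fraction of pairs in the gold standard.

    scored_pairs: list of (pair, raw_score) tuples.
    gold_positive: set of pairs regarded as true associations.
    bin_edges: ascending edges; bins are half-open, the last one closed.
    Bins without any pair yield None.
    """
    n_bins = len(bin_edges) - 1
    totals = [0] * n_bins
    hits = [0] * n_bins
    for pair, raw in scored_pairs:
        for i in range(n_bins):
            is_last = i == n_bins - 1
            in_bin = bin_edges[i] <= raw < bin_edges[i + 1]
            if in_bin or (is_last and raw == bin_edges[-1]):
                totals[i] += 1
                hits[i] += pair in gold_positive
                break
    return [h / t if t else None for h, t in zip(hits, totals)]


# Hypothetical raw scores and gold standard.
scored_pairs = [
    (("a", "b"), 0.10), (("a", "c"), 0.20),
    (("b", "c"), 0.60), (("c", "d"), 0.70),
    (("d", "e"), 0.90), (("a", "e"), 0.95),
]
gold_positive = {("b", "c"), ("d", "e"), ("a", "e")}

calibrated = calibrate(scored_pairs, gold_positive, [0.0, 0.5, 0.8, 1.0])
```

The per-bin fractions can then serve as a lookup from raw score to an estimated probability; a smooth fitted curve would be a natural refinement.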
integrate over orthologs
protein mode
von Mering et al., Nucleic Acids Research, 2005
COG mode
von Mering et al., Nucleic Acids Research, 2005
combine all evidence
Frishman et al., Modern Genome Annotation, 2009
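Once every evidence type carries a probabilistic score, the scores for one protein pair can be combined. A minimal sketch, assuming the evidence types are independent, is the complement of the product of the complements. The slides do not state the formula, so treat this as an illustration rather than the exact STRING scheme, which may include further corrections.

```python
def combine_scores(scores):
    """Combine per-evidence probabilities, assuming independence.

    Returns 1 - product(1 - s) over all scores; an empty list gives 0.0.
    """
    remaining = 1.0
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must be probabilities in [0, 1]")
        remaining *= 1.0 - s
    return 1.0 - remaining


# Hypothetical scores for one pair from three evidence types.
combined = combine_scores([0.6, 0.7, 0.4])
```

The combined score is never lower than the strongest single score, so several moderate lines of evidence can add up to a confident association.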
small molecules
Kuhn et al., Nucleic Acids Research, 2008
metametabolomics
Acknowledgments: Christian von Mering, Michael Kuhn, Manuel Stark, Samuel Chaffron, Philippe Julien, Monica Campillos, Tobias Doerks, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork