Integration of heterogeneous data Lars Juhl Jensen
 
 
 
 
data mining
text mining
interaction networks
 
Kuhn et al., Nucleic Acids Research, 2010
parts lists
630 genomes
2.5 million proteins
~74,000 small molecules
many databases
different formats
model organism databases
Ensembl
RefSeq
PubChem
genomic context
gene fusion
Korbel et al., Nature Biotechnology, 2004
conserved neighborhood
operons
Korbel et al., Nature Biotechnology, 2004
bidirectional promoters
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
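A minimal sketch of the phylogenetic-profile idea, assuming each gene family is described by the set of genomes in which it occurs. The Jaccard similarity, the function name and the toy profiles are illustrative assumptions, not the scoring used in the cited work.

```python
def profile_similarity(genomes_a, genomes_b):
    """Jaccard similarity between two presence/absence profiles.

    Each profile is given as the collection of genomes in which
    the gene family is present.
    """
    a, b = set(genomes_a), set(genomes_b)
    union = a | b
    if not union:
        return 0.0  # two empty profiles carry no evidence
    return len(a & b) / len(union)


# Hypothetical toy profiles: genome labels in which each gene occurs.
profiles = {
    "geneA": {"g1", "g2", "g3", "g5"},
    "geneB": {"g1", "g2", "g3", "g5"},
    "geneC": {"g4"},
}

for other in ("geneB", "geneC"):
    score = profile_similarity(profiles["geneA"], profiles[other])
    print("geneA", other, score)
```

Genes that are gained and lost together across genomes get a high similarity, which is the signal this kind of genomic-context evidence relies on.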
experimental data
gene coexpression
 
protein interactions
Jensen & Bork, Science, 2008
genetic interactions
Beyer et al., Nature Reviews Genetics, 2007
small molecule interactions
in vitro binding assays
cellular activity assays
many databases
GEO Gene Expression Omnibus
BIND Biomolecular Interaction Network Database
BioGRID General Repository for Interaction Datasets
DIP Database of Interacting Proteins
IntAct
MINT Molecular Interactions Database
HPRD Human Protein Reference Database
PDB Protein Data Bank
BindingDB
CTD Comparative Toxicogenomics Database
DrugBank
GLIDA GPCR-Ligand Database
MATADOR
PDSP Ki Psychoactive Drug Screening Program
PharmGKB Pharmacogenomics Knowledge Base
different formats
different identifiers
partially redundant
Campillos & Kuhn et al., Science, 2008
curated knowledge
complexes
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
many databases
Gene Ontology
MIPS Munich Information center for Protein Sequences
KEGG Kyoto Encyclopedia of Genes and Genomes
MetaCyc
Reactome
PID NCI-Nature Pathway Interaction Database
high confidence
different formats
different identifiers
partially redundant
literature mining
>10 km
human readable
not computer readable
different names
text corpus
MEDLINE
SGD Saccharomyces Genome Database
The Interactive Fly
OMIM Online Mendelian Inheritance in Man
thesaurus
co-mentioning
statistical methods
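A minimal sketch of co-mention scoring, assuming every document has already been reduced to the list of gene or protein names recognized in it. The observed-over-expected ratio is one simple statistic chosen for illustration; the function names and toy documents are hypothetical and are not taken from the talk.

```python
from collections import Counter
from itertools import combinations


def comention_counts(documents):
    """Count in how many documents each name and each name pair occurs."""
    single = Counter()
    pair = Counter()
    for names in documents:
        unique = sorted(set(names))  # count each name once per document
        single.update(unique)
        pair.update(combinations(unique, 2))
    return single, pair


def observed_over_expected(a, b, single, pair, n_docs):
    """Observed co-mentions divided by the number expected under independence."""
    key = tuple(sorted((a, b)))
    expected = single[a] * single[b] / n_docs
    return pair[key] / expected if expected else 0.0


# Hypothetical toy corpus: names tagged in four abstracts.
docs = [
    ["CYC1", "HAP1"],
    ["CYC1", "HAP1", "GAL4"],
    ["GAL4"],
    ["CYC7"],
]
single, pair = comention_counts(docs)
print(observed_over_expected("HAP1", "CYC1", single, pair, len(docs)))
```

A ratio above 1 means the two names appear together more often than their individual frequencies would suggest.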
NLP Natural Language Processing
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxgene The GAL4 gene]
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
 
restricted access
Reflect
augmented browsing
Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
integration
the easy problems
many databases
different formats
different identifiers
partially redundant
parsers
thesaurus
bookkeeping
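The three remedies for the easy problems can be sketched together: parsed records from several sources are mapped through a thesaurus to shared identifiers, and redundant entries are merged while recording where each one came from. This is a minimal sketch; the record layout, the identifiers P1 and P2, and the source names are hypothetical.

```python
def normalize(records, thesaurus):
    """Map names to canonical identifiers and merge redundant records.

    records   : iterable of (name_a, name_b, source) tuples from the parsers
    thesaurus : dict mapping lower-cased names to canonical identifiers
    Returns a dict from an identifier pair to the set of supporting sources.
    """
    merged = {}
    for name_a, name_b, source in records:
        id_a = thesaurus.get(name_a.lower())
        id_b = thesaurus.get(name_b.lower())
        if id_a is None or id_b is None:
            continue  # name not in the thesaurus; skipped in this sketch
        pair = tuple(sorted((id_a, id_b)))  # treat interactions as undirected
        merged.setdefault(pair, set()).add(source)  # keep provenance
    return merged


# Hypothetical thesaurus: different names for the same two proteins.
thesaurus = {"cdc28": "P1", "cdk1": "P1", "cln2": "P2"}

# Hypothetical parsed records from two databases using different names.
records = [
    ("CDC28", "CLN2", "db_one"),
    ("Cln2", "CDK1", "db_two"),
    ("CDC28", "unknown_protein", "db_two"),
]
merged = normalize(records, thesaurus)
print(merged)
```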
the hard problems
many data types
not comparable
variable quality
raw quality scores
intergenic distances
Korbel et al., Nature Biotechnology, 2004
correlations
 
reproducibility
von Mering et al., Nucleic Acids Research, 2005
score calibration
gold standard
von Mering et al., Nucleic Acids Research, 2005
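A minimal sketch of calibrating raw scores against a gold standard, assuming the gold standard provides both known positive and known negative pairs. Raw scores are binned, and each bin is assigned the fraction of its benchmarkable pairs that are positives. The binning scheme, function name and toy data are assumptions for illustration, not the procedure from the cited paper.

```python
def calibrate(scored_pairs, positives, negatives, bin_edges):
    """Return, for each raw-score bin, the fraction of gold-standard pairs
    in that bin that are positives (None for bins without such pairs)."""
    n_bins = len(bin_edges) - 1
    counts = [[0, 0] for _ in range(n_bins)]  # [positives, total] per bin
    for pair, score in scored_pairs:
        if pair not in positives and pair not in negatives:
            continue  # pair is not covered by the gold standard
        for i in range(n_bins):
            if bin_edges[i] <= score < bin_edges[i + 1]:
                counts[i][1] += 1
                if pair in positives:
                    counts[i][0] += 1
                break
    return [tp / total if total else None for tp, total in counts]


# Hypothetical raw scores and gold standard.
scored = [
    (("a", "b"), 8), (("a", "c"), 7), (("b", "c"), 6),
    (("c", "d"), 2), (("d", "e"), 1), (("e", "f"), 3),
]
positives = {("a", "b"), ("a", "c"), ("d", "e")}
negatives = {("b", "c"), ("c", "d")}

calibrated = calibrate(scored, positives, negatives, bin_edges=[0, 5, 10])
print(calibrated)
```

Because every evidence type is mapped onto the same scale in this way, scores that start out in incomparable units become comparable.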
spread over 630 genomes
transfer by orthology
von Mering et al., Nucleic Acids Research, 2005
two modes
COG mode
von Mering et al., Nucleic Acids Research, 2005
protein mode
von Mering et al., Nucleic Acids Research, 2005
combine all evidence
$P = 1 - (1 - P_1)(1 - P_2)(1 - P_3) \cdots$
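The combination rule on this slide can be sketched in a few lines. The sketch assumes that each P_i is a calibrated probability from one evidence type and that the evidence types are treated as independent; the function name is hypothetical.

```python
def combine_scores(probabilities):
    """Combine per-evidence probabilities as P = 1 - prod(1 - P_i)."""
    not_supported = 1.0
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
        not_supported *= 1.0 - p  # chance that this evidence is also wrong
    return 1.0 - not_supported


# Two moderately confident evidence types reinforce each other.
print(combine_scores([0.5, 0.5]))
```

The combined score is never lower than the strongest single piece of evidence, and several weak but independent lines of evidence can add up to a confident association.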
visualize
Kuhn et al., Nucleic Acids Research, 2010
access
access for humans
web interfaces
 
 
 
access for computers
web services
REST Representational State Transfer
SOAP Simple Object Access Protocol
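A minimal sketch of what REST-style access looks like from a script: the query is encoded entirely in a URL. The URL layout below (base address, `tsv` format, `network` method, `identifiers` and `species` parameters) is an assumption based on STRING's later public API documentation and is not taken from the slides. The function only builds the URL and sends no request.

```python
from urllib.parse import urlencode


def build_network_url(identifiers, species, output_format="tsv",
                      base="https://string-db.org/api"):
    """Build a REST-style URL requesting the network around some proteins.

    The path layout and parameter names are assumptions; check the
    current API documentation before relying on them.
    """
    query = urlencode({
        # Several identifiers are joined with carriage returns (encoded as %0D).
        "identifiers": "\r".join(identifiers),
        "species": species,  # NCBI taxonomy identifier, e.g. 9606 for human
    })
    return f"{base}/{output_format}/network?{query}"


url = build_network_url(["TP53", "MDM2"], species=9606)
print(url)
```

The result could be fetched with any HTTP client, which is the main practical appeal of REST over SOAP for simple queries.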
Acknowledgments
STITCH: Michael Kuhn, Damian Szklarczyk, Andrea Franceschini, Monica Campillos, Christian von Mering, Lars Juhl Jensen, Andreas Beyer, Peer Bork
Reflect: Sean O’Donoghue, Heiko Horn, Sune Frankild, Evangelos Pafilis, Michael Kuhn, Nigel Brown, Reinhardt Schneider
STRING: Christian von Mering, Michael Kuhn, Manuel Stark, Samuel Chaffron, Chris Creevey, Jean Muller, Tobias Doerks, Philippe Julien, Alexander Roth, Milan Simonovic, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork
