Integration of diverse large-scale datasets
Lars Juhl Jensen
 
 
 
promoter analysis
Jensen et al., Bioinformatics, 2000
DNA structure
genome visualization
Pedersen et al., Journal of Molecular Biology, 2000
microarray normalization
Workman et al., Genome Biology, 2002
protein function prediction
 
 
 
 
STRING
 
integrate diverse evidence
functional interactions
Bork et al., Current Opinion in Structural Biology, 2005
179 proteomes
evolution
 
 
statistics
(the original sin)
prokaryotes
genomic context methods
gene fusion
 
gene neighborhood
 
phylogenetic profiles
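A minimal sketch of the phylogenetic-profile idea: genes that are present and absent in the same set of genomes are candidates for a functional link. The slides do not say which similarity measure is used, so the Jaccard index, the gene names, and the toy presence/absence vectors below are all assumptions made for illustration.

```python
from itertools import combinations

def profile_similarity(a, b):
    """Jaccard similarity of two presence/absence profiles (sequences of 0/1)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

# Hypothetical toy data: one column per genome, 1 = gene present, 0 = absent.
profiles = {
    "geneA": (1, 1, 0, 1, 0, 1),
    "geneB": (1, 1, 0, 1, 0, 1),
    "geneC": (0, 1, 1, 0, 1, 0),
}

def rank_pairs(profiles):
    """Return (similarity, gene1, gene2) tuples, most similar pair first."""
    pairs = [
        (profile_similarity(profiles[g], profiles[h]), g, h)
        for g, h in combinations(sorted(profiles), 2)
    ]
    return sorted(pairs, reverse=True)

best = rank_pairs(profiles)[0]
```

Here `geneA` and `geneB` share an identical profile and therefore rank first; a real analysis would also need to account for the phylogenetic relatedness of the genomes.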
 
 
 
 
[Figure labels: cell, cellulosomes, cellulose]
eukaryotes
integrate diverse datasets
Jensen et al., Drug Discovery Today: Targets, 2004
curated knowledge
MIPS Munich Information center for Protein Sequences
KEGG Kyoto Encyclopedia of Genes and Genomes
STKE Signal Transduction Knowledge Environment
Reactome
literature mining
MEDLINE
SGD Saccharomyces Genome Database
The Interactive Fly
OMIM Online Mendelian Inheritance in Man
co-mentioning
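Co-mentioning can be sketched as counting how often two gene names appear in the same abstract. The tokenization, the name list, and the example abstracts below are hypothetical simplifications; a real pipeline would have to deal with synonyms and ambiguous names, and the slides do not specify the scoring that is applied to the counts.

```python
from collections import Counter
from itertools import combinations

def comention_counts(abstracts, names):
    """Count, for each pair of names, the number of abstracts mentioning both."""
    counts = Counter()
    for text in abstracts:
        # Naive tokenization: strip commas and periods, split on whitespace.
        tokens = set(text.replace(",", " ").replace(".", " ").split())
        found = sorted(n for n in names if n in tokens)
        counts.update(combinations(found, 2))
    return counts

# Hypothetical toy corpus.
abstracts = [
    "The expression of CYC1 and CYC7 is controlled by HAP1.",
    "HAP1 regulates CYC1.",
    "GAL4 activates galactose genes.",
]
names = {"CYC1", "CYC7", "HAP1", "GAL4"}

counts = comention_counts(abstracts, names)
```

Pairs are stored in alphabetical order, so `counts[("CYC1", "HAP1")]` gives the number of abstracts that mention both names.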
NLP Natural Language Processing
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxgene The GAL4 gene]
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
 
primary experimental data
microarray expression data
GEO Gene Expression Omnibus
physical protein interactions
BIND Biomolecular Interaction Network Database
MINT Molecular Interactions Database
GRID General Repository for Interaction Datasets
DIP Database of Interacting Proteins
HPRD Human Protein Reference Database
problems
many sources
(different gene identifiers)
many types of evidence
questionable quality
not directly comparable
spread over many species
huge synonym lists
calculate raw quality scores
calibrate vs. gold standard
KEGG Kyoto Encyclopedia of Genes and Genomes
von Mering et al., Nucleic Acids Research, 2005
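One way to picture calibration against a gold standard: sort the scored pairs, split them into bins, and record what fraction of each bin is confirmed by the gold standard. The binning scheme, the function name `calibrate`, and the toy pairs are assumptions for illustration only; the slides state that raw scores are calibrated against KEGG but not how.

```python
def calibrate(scored_pairs, gold_positive, n_bins=2):
    """Return (lowest raw score in bin, fraction of gold-standard pairs) per bin.

    If the number of pairs is not divisible by n_bins, a smaller extra bin
    is produced at the end.
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1])
    size = max(1, len(ranked) // n_bins)
    table = []
    for start in range(0, len(ranked), size):
        chunk = ranked[start:start + size]
        hits = sum(1 for pair, _ in chunk if pair in gold_positive)
        table.append((chunk[0][1], hits / len(chunk)))
    return table

# Hypothetical raw scores for protein pairs, and a toy gold standard
# (for example, pairs annotated to the same pathway).
scored = [
    (("a", "b"), 0.9), (("a", "c"), 0.8), (("b", "c"), 0.7),
    (("a", "d"), 0.3), (("b", "d"), 0.2), (("c", "d"), 0.1),
]
gold = {("a", "b"), ("a", "c"), ("c", "d")}

table = calibrate(scored, gold)
```

The resulting table maps raw scores from any evidence type onto the same scale (fraction confirmed), which is what makes otherwise incomparable sources comparable.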
transfer based on orthology
combine all evidence
Bork et al., Current Opinion in Structural Biology, 2005
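Once every evidence type has a calibrated probability, the scores can be merged. The sketch below assumes the evidence channels are independent and combines them as one minus the product of the individual failure probabilities; the slide does not give the formula, so treat this as one plausible scheme rather than the exact procedure used.

```python
from math import prod

def combine_scores(scores):
    """Combine independent probabilities: 1 - product of (1 - s)."""
    return 1.0 - prod(1.0 - s for s in scores)

# Hypothetical calibrated scores for one protein pair from three channels.
combined = combine_scores([0.6, 0.7, 0.2])
```

Under this scheme the combined score is never lower than the strongest single line of evidence, and several weak signals can add up to a confident prediction.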
cell cycle
qualitative modeling
 
Chen et al., Molecular Biology of the Cell, 2004
Chen et al., Molecular Biology of the Cell, 2004
synchronized cell culture
 
microarray time series
 
periodically expressed genes
 
S. cerevisiae
Cho et al.
Spellman et al.
numerous analysis methods
Cho et al.
Spellman et al.
Zhao et al.
Johansson et al.
Luan and Li
Lu et al.
Ahdesmäki et al.
Willbrand et al.
no benchmarking
de Lichtenberg et al., Bioinformatics, 2005
reproducibility
de Lichtenberg et al., Bioinformatics, 2005
regulation vs. periodicity
de Lichtenberg et al., Bioinformatics, 2005
list of 600 periodic genes
S. pombe
several expression studies
reproducibility
Marguerat et al., Yeast, 2006
name inconsistencies
Marguerat et al., Yeast, 2006
different analysis methods
no benchmarking
Marguerat et al., Yeast, 2006
Marguerat et al., Yeast, 2006
too many genes suggested
Marguerat et al., Yeast, 2006
Marguerat et al., Yeast, 2006
averaging better than voting
Marguerat et al., Yeast, 2006
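To illustrate the difference between voting and averaging when combining several studies, the sketch below uses rank averaging on made-up gene rankings. The slides give only the conclusion, not the scoring used, so the functions `vote` and `average_rank`, the cutoffs, and the data are all hypothetical.

```python
def vote(rankings, top_n, min_votes):
    """Select genes that fall in the top_n of at least min_votes studies."""
    votes = {}
    for ranking in rankings:
        for gene in ranking[:top_n]:
            votes[gene] = votes.get(gene, 0) + 1
    return {gene for gene, n in votes.items() if n >= min_votes}

def average_rank(rankings, top_n):
    """Select the top_n genes by mean rank across all studies."""
    genes = rankings[0]
    mean = {g: sum(r.index(g) for r in rankings) / len(rankings) for g in genes}
    return sorted(genes, key=lambda g: mean[g])[:top_n]

# Hypothetical rankings from three studies, most periodic gene first.
rankings = [
    ["a", "b", "c", "d", "e"],
    ["d", "e", "c", "a", "b"],
    ["a", "d", "c", "b", "e"],
]

voted = vote(rankings, top_n=2, min_votes=2)
averaged = average_rank(rankings, top_n=3)
```

Gene `c` is consistently ranked third, just below the per-study cutoff, so it never receives a vote; averaging keeps that consistent signal, whereas voting discards everything below the threshold.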
S. cerevisiae
list of 600 periodic genes
protein interaction data
 
von Mering et al., Nucleic Acids Research, 2005
de Lichtenberg et al., Science, 2005
dynamic proteins
static proteins
de Lichtenberg et al., Science, 2005
reproduces what is known
de Lichtenberg et al., Science, 2005
many detailed predictions
de Lichtenberg et al., Science, 2005
global trends
dynamic proteins
de Lichtenberg et al., Science, 2005
static proteins
de Lichtenberg et al., Science, 2005
just-in-time assembly
de Lichtenberg et al., Science, 2005
de Lichtenberg et al., Science, 2005
coordinated regulation
periodically expressed genes
Cdc28p substrates
PEST degradation signals
the human interactome
yeast two-hybrid
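[Venn diagram: Stelzl et al., Rual et al., small-scale studies; region counts as extracted: 1936, 13, 4, 4, 1385, 65, 18465]
[Venn diagram: Stelzl et al., Rual et al., small-scale studies; region counts as extracted: 32, 0, 3, 4, 18, 4, 23]
[Venn diagram: small-scale studies, Stelzl et al., Rual et al.; counts as extracted: 62, 8, 39, 852, 17, 473, 432, 69, 260]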
3.5% and 21% sensitivity
in a couple of years
the human interactome
100% = 1/5?
the yeast interactome
five years ago
yeast two-hybrid
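[Venn diagram: Uetz et al., Ito et al., small-scale studies; region counts as extracted: 1150, 117, 117, 72, 4053, 118, 4469]
[Venn diagram: Uetz et al., Ito et al., small-scale studies; region counts as extracted: 162, 53, 34, 72, 180, 29, 338]
[Venn diagram: small-scale studies, Uetz et al., Ito et al.; counts as extracted: 511, 189, 616, 439, 178, 759, 897, 190, 1347]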
19% and 12% sensitivity
the challenge
how to get from here …
[Venn diagram: Stelzl et al., Rual et al., small-scale studies; region counts as extracted: 1936, 13, 4, 4, 1385, 65, 18465]
…  to there …
de Lichtenberg et al., Science, 2005
Acknowledgments
The STRING team (EMBL): Christian von Mering, Berend Snel, Martijn Huynen, Sean Hooper, Mathilde Foglierini, Julien Lagarde, Peer Bork
Literature mining project (EML Research): Jasmin Saric, Rossitza Ouzounova, Isabel Rojas
Cell cycle studies (CBS): Ulrik de Lichtenberg, Thomas Skøt Jensen, Søren Brunak
S. pombe cell cycle (Sanger): Samuel Marguerat, Jürg Bähler
Inspiration for presentation: Lawrence Lessig, Dick Clarence Hardt, Anders Gorm Pedersen
Thank you!
