Text and data integration
Lars Juhl Jensen
interaction networks
association networks
guilt by association
molecular networks
STRING
protein networks
9.6 million proteins
Szklarczyk et al., Nucleic Acids Research, 2015; string-db.org
STITCH
chemical networks
430,000 chemicals
Kuhn et al., Nucleic Acids Research, 2014; stitch-db.org
Exercise 1
Go to http://string-db.org/
Query for human thymidylate synthase (TYMS) using search by name
Make sure you are in evidence view (check the buttons below the network)
Why are there multiple lines
curated knowledge
(what we know)
text-book material
metabolic pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
signaling pathways
protein complexes
3D structures
very incomplete
text mining
(what we buried)
named entity recognition
proteins
chemicals
information extraction
co-mentioning
counting
within documents
within paragraphs
within sentences
scoring scheme
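The co-mention counting above can be sketched in a few lines. This is only an illustration, assuming entity names have already been recognized in each sentence; the function name `comention_scores`, the nested-list input layout, and the weights are hypothetical and are not the scoring scheme actually used by STRING.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights: a co-mention in a tighter context counts more.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 4.0}

def comention_scores(documents, weights=WEIGHTS):
    """Score entity pairs by co-mentioning at three levels of context.

    `documents` is a list of documents; each document is a list of
    paragraphs; each paragraph is a list of sentences; each sentence is
    a set of already-recognized entity names.
    """
    scores = defaultdict(float)
    for doc in documents:
        sentences = [set(s) for paragraph in doc for s in paragraph]
        paragraphs = [set().union(*paragraph) for paragraph in doc]
        whole_doc = [set().union(*sentences)]
        levels = {"sentence": sentences,
                  "paragraph": paragraphs,
                  "document": whole_doc}
        for level, units in levels.items():
            # A pair is counted at most once per level per document.
            pairs = set()
            for unit in units:
                pairs.update(combinations(sorted(unit), 2))
            for pair in pairs:
                scores[pair] += weights[level]
    return dict(scores)

# Toy corpus (made up): two documents.
docs = [
    [[{"TYMS", "DHFR"}, {"TYMS"}], [{"TP53"}]],  # same sentence in paragraph 1
    [[{"TYMS"}, {"DHFR"}]],                      # same paragraph, different sentences
]
scores = comention_scores(docs)
```

In this toy corpus the TYMS/DHFR pair scores highest because it is co-mentioned within a sentence once and within a paragraph twice, while pairs involving TP53 only share a document.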
NLP
Natural Language Processing
experimental data
(what we measured)
protein interactions
Jensen & Bork, Science, 2008
genetic interactions
Beyer et al., Nature Reviews Genetics, 2007
gene coexpression
chemical screens
Karaman et al., Nature Biotechnology, 2008
predictions
(what we infer)
genomic context
gene fusion
Korbel et al., Nature Biotechnology, 2004
gene neighborhood
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
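The idea behind phylogenetic profiles is that genes present and absent in the same set of genomes are likely to be functionally linked. A minimal sketch follows, assuming each gene is represented by the set of genomes in which an ortholog is found. The Jaccard index is used here purely for illustration (the slides do not say which similarity measure is used), and the gene and species names are made up.

```python
def jaccard(profile_a, profile_b):
    """Similarity of two presence/absence profiles given as sets of genomes."""
    a, b = set(profile_a), set(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical profiles: gene -> genomes in which an ortholog is present.
profiles = {
    "geneA": {"sp1", "sp2", "sp4"},
    "geneB": {"sp1", "sp2", "sp4"},
    "geneC": {"sp3"},
}

sim_ab = jaccard(profiles["geneA"], profiles["geneB"])  # identical profiles
sim_ac = jaccard(profiles["geneA"], profiles["geneC"])  # no shared genomes
```

Genes with identical profiles get the maximum similarity and genes that never occur in the same genome get zero; a real analysis would also need to account for the phylogenetic relatedness of the genomes.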
a real example
Cell
Cellulosomes
Cellulose
Exercise 2
(Continue from where exercise 1 ended)
Which types of evidence support the interaction between TYMS and DHFR?
Click on the interaction to view the popup, which has buttons linking to full details
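The same TYMS and DHFR network can also be requested programmatically. The sketch below only builds a request URL and does not contact the server. It assumes the STRING REST API's `network` method with the `identifiers`, `species`, and `required_score` parameters as described in the STRING API documentation; check that documentation before relying on it. The helper name `string_network_url` is hypothetical.

```python
from urllib.parse import urlencode

def string_network_url(identifiers, species=9606, required_score=None, fmt="tsv"):
    """Build a STRING API URL for the network among the given proteins.

    Assumes the documented URL layout https://string-db.org/api/<format>/network
    and that multiple identifiers are separated by a carriage return.
    `species` is an NCBI taxonomy identifier (9606 is human);
    `required_score` is a confidence cutoff on a 0-1000 scale.
    """
    params = {"identifiers": "\r".join(identifiers), "species": species}
    if required_score is not None:
        params["required_score"] = required_score
    return f"https://string-db.org/api/{fmt}/network?{urlencode(params)}"

url = string_network_url(["TYMS", "DHFR"])
strict_url = string_network_url(["TYMS", "DHFR"], required_score=900)
```

Fetching `url` (for example with `urllib.request.urlopen`) should return a tab-separated table of interactions with their scores; `strict_url` applies a cutoff corresponding to a confidence of 0.9.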
complications
many databases
different formats
different identifiers
variable quality
not comparable
not same species
hard work
parsers
mapping files
quality scores
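As an illustration of the parser and mapping-file work, the sketch below translates interactions from two sources with different identifiers into one common identifier space. All identifiers, the file layout, and the function names `load_mapping` and `to_common` are made up for the example.

```python
import csv
import io

# Hypothetical mapping file: source-specific identifier -> common identifier.
MAPPING_TSV = (
    "source_id\tcommon_id\n"
    "dbA:0001\tTYMS\n"
    "dbA:0002\tDHFR\n"
    "dbB:tyms_h\tTYMS\n"
    "dbB:dhfr_h\tDHFR\n"
)

def load_mapping(text):
    """Parse a two-column, tab-separated mapping file into a dict."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["source_id"]: row["common_id"] for row in reader}

def to_common(interactions, mapping):
    """Map interaction pairs to common identifiers, dropping unmapped pairs."""
    mapped = set()
    for a, b in interactions:
        if a in mapping and b in mapping:
            # Sort so that (A, B) and (B, A) are treated as the same pair.
            mapped.add(tuple(sorted((mapping[a], mapping[b]))))
    return mapped

mapping = load_mapping(MAPPING_TSV)
from_db_a = to_common([("dbA:0001", "dbA:0002")], mapping)
from_db_b = to_common([("dbB:dhfr_h", "dbB:tyms_h"), ("dbB:unknown", "dbB:tyms_h")], mapping)
```

Once both sources are expressed in the same identifiers, the two records are recognized as the same interaction; the pair with an unmapped identifier is silently dropped here, which a real pipeline would probably want to log.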
affinity purification
von Mering et al., Nucleic Acids Research, 2005
phylogenetic profiles
score calibration
gold standard
von Mering et al., Nucleic Acids Research, 2005
implicit weighting by quality
common scale
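Calibration against a gold standard can be sketched as follows: bin the raw scores from one evidence source and, for each bin, measure the fraction of scored pairs that are also in the gold standard. That fraction can then serve as a score on a scale shared by all sources. This is a simplified illustration with made-up data and simple binning; it is not the exact procedure of von Mering et al., and the function name `calibrate` is hypothetical.

```python
def calibrate(raw_scores, gold_positive, bin_edges):
    """Per bin of raw score, return the fraction of pairs in the gold standard.

    `raw_scores` maps a pair to its raw score, `gold_positive` is the set of
    pairs known to be true, and `bin_edges` are ascending bin boundaries
    (each bin is lower-inclusive, upper-exclusive). Empty bins yield None.
    """
    n_bins = len(bin_edges) - 1
    totals = [0] * n_bins
    hits = [0] * n_bins
    for pair, score in raw_scores.items():
        for i in range(n_bins):
            if bin_edges[i] <= score < bin_edges[i + 1]:
                totals[i] += 1
                hits[i] += pair in gold_positive
                break
    return [h / t if t else None for h, t in zip(hits, totals)]

# Made-up raw scores and gold standard.
raw = {
    ("a", "b"): 0.1, ("a", "c"): 0.2,
    ("b", "c"): 0.6, ("b", "d"): 0.7, ("c", "d"): 0.8, ("a", "d"): 0.9,
}
gold = {("a", "b"), ("b", "d"), ("c", "d"), ("a", "d")}
calibrated = calibrate(raw, gold, [0.0, 0.5, 1.0])
```

In this toy data, half of the low-scoring pairs and three quarters of the high-scoring pairs are in the gold standard, so a noisy source automatically ends up with lower calibrated scores than a reliable one.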
homology-based transfer
orthologous groups
two-step process
Franceschini et al., Nucleic Acids Research, 2013
Exercise 3
(Continue from where exercise 2 ended)
Change the network to the confidence view
Change the confidence cutoff to 0.9; do you see changes in the proteins or interactions?
Turn off all evidence types except
summary
association networks
text mining
heterogeneous data
common identifiers
quality scores
protein networks
Szklarczyk et al., Nucleic Acids Research, 2015; string-db.org
chemical networks
Kuhn et al., Nucleic Acids Research, 2014; stitch-db.org
COMPARTMENTS
subcellular localization
Binder et al., Database, 2014; compartments.jensenlab.org
TISSUES
tissue expression
Santos et al., PeerJ, 2015; tissues.jensenlab.org
DISEASES
disease associations
Frankild et al., Methods, 2015; diseases.jensenlab.org
Exercise 4
Open http://stitch-db.org
Query for thymidylate synthase (TYMS) and inspect the TYMS–pemetrexed interaction
Open http://diseases.jensenlab.org
Search for thymidylate synthase (TYMS)
What is the strongest associated
