STRING
Large-scale integration of data and text
Lars Juhl Jensen
9.6 million proteins
association network
guilt by association
genomic context
gene fusion
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
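A phylogenetic profile records in which genomes a gene is present; genes with similar profiles are candidates for functional association. The sketch below is only an illustration of that idea: the Jaccard index is a stand-in similarity measure chosen here for simplicity, not necessarily the one used in STRING, and the gene and genome names are hypothetical.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles (sets of genomes)."""
    a, b = set(profile_a), set(profile_b)
    union = a | b
    if not union:
        return 0.0  # two empty profiles carry no evidence
    return len(a & b) / len(union)

# Hypothetical toy data: gene -> genomes in which an ortholog is present
profiles = {
    "geneA": {"g1", "g2", "g4"},
    "geneB": {"g1", "g2", "g4"},
    "geneC": {"g3"},
}

sim_ab = jaccard(profiles["geneA"], profiles["geneB"])  # identical profiles
sim_ac = jaccard(profiles["geneA"], profiles["geneC"])  # disjoint profiles
```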
experimental data
gene coexpression
physical interactions
Jensen & Bork, Science, 2008
curated knowledge
protein complexes
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
many databases
different formats
different identifiers
variable quality
not comparable
hard work
parsers
mapping files
quality scores
affinity purification
von Mering et al., Nucleic Acids Research, 2005
score calibration
gold standard
von Mering et al., Nucleic Acids Research, 2005
implicit weighting by quality
common scale
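One way to picture calibration against a gold standard: rank the pairs by raw score, bin them, and ask what fraction of each bin the gold standard confirms. The sketch below assumes a hypothetical benchmark dictionary (`same_pathway`) and equal-sized bins; the function name, the binning choice, and the toy data are all assumptions made for illustration, not the published procedure.

```python
def calibrate(scored_pairs, same_pathway, bin_size):
    """Map raw scores to the fraction of benchmark-confirmed pairs per bin.

    scored_pairs: list of (raw_score, pair) tuples
    same_pathway: dict pair -> bool, only for pairs the gold standard can judge
    Returns a list of (mean_raw_score, fraction_confirmed), best bin first.
    """
    # Keep only pairs the gold standard can judge, highest raw score first
    judged = sorted(
        ((score, same_pathway[pair]) for score, pair in scored_pairs
         if pair in same_pathway),
        key=lambda item: item[0],
        reverse=True,
    )
    curve = []
    for start in range(0, len(judged), bin_size):
        chunk = judged[start:start + bin_size]
        mean_score = sum(score for score, _ in chunk) / len(chunk)
        confirmed = sum(1 for _, ok in chunk if ok) / len(chunk)
        curve.append((mean_score, confirmed))
    return curve

# Hypothetical toy data: raw scores from one evidence source
scored_pairs = [
    (10, ("A", "B")), (8, ("A", "C")), (3, ("B", "C")),
    (2, ("C", "D")), (9, ("X", "Y")),  # X-Y is not covered by the benchmark
]
same_pathway = {
    ("A", "B"): True, ("A", "C"): True, ("B", "C"): True, ("C", "D"): False,
}
curve = calibrate(scored_pairs, same_pathway, bin_size=2)
```

Because every evidence source is mapped to the same kind of quantity (fraction confirmed by the benchmark), scores from different sources become comparable, and low-quality sources end up with low calibrated scores.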
cross-species transfer
orthologous groups
Franceschini et al., Nucleic Acids Research, 2013
missing most of the data
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
comprehensive lexicon
cyclin dependent kinase 1
CDC2
orthographic variation
spaces and hyphens
cyclin dependent kinase 1
cyclin-dependent kinase 1
prefixes and suffixes
CDC2
hCdc2
“black list”
SDS
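The matching rules above can be sketched as a small dictionary lookup. This is a minimal illustration, not the actual tagger: the lexicon contents, the identifiers, the single `h` prefix, and the function names are all assumptions made for the example.

```python
import re

def normalize(name):
    """Lower-case and drop spaces and hyphens so spelling variants collide."""
    return re.sub(r"[\s\-]+", "", name.lower())

# Hypothetical toy lexicon: name -> identifier (a real lexicon is far larger)
NAMES = {
    "cyclin dependent kinase 1": "CDK1",
    "CDC2": "CDK1",
    "SDS": "toy-id-for-SDS",  # in the lexicon, but too ambiguous in text
}
LEXICON = {normalize(name): ident for name, ident in NAMES.items()}
BLACKLIST = {normalize("SDS")}  # names that should never be tagged
PREFIXES = ("h",)               # assumed species prefix, as in hCdc2

def lookup(mention):
    """Return the identifier for a text mention, or None if it is not tagged."""
    key = normalize(mention)
    if key in BLACKLIST:
        return None
    if key in LEXICON:
        return LEXICON[key]
    # Retry with a known prefix stripped
    for prefix in PREFIXES:
        stripped = key[len(prefix):]
        if key.startswith(prefix) and stripped in LEXICON:
            return LEXICON[stripped]
    return None
```

With this toy setup, `lookup("cyclin-dependent kinase 1")` and `lookup("hCdc2")` resolve to the same identifier, while `lookup("SDS")` is suppressed by the black list.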
co-mentioning
counting
within documents
within paragraphs
within sentences
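Counting co-mentions at the three levels can be sketched as follows. The weights are purely illustrative assumptions (closer co-mentions count more), as are the input layout and the entity names; the real scoring scheme is not given on the slides.

```python
from collections import Counter
from itertools import combinations

# Assumed, illustrative weights: the closer the co-mention, the more it counts
WEIGHTS = {"document": 1, "paragraph": 2, "sentence": 4}

def pairs(entities):
    """All unordered pairs of distinct entities, as sorted tuples."""
    return {tuple(sorted(pair)) for pair in combinations(set(entities), 2)}

def comention_scores(documents):
    """documents: list of documents; each is a list of paragraphs; each
    paragraph is a list of sentences; each sentence is a set of tagged names."""
    scores = Counter()
    for document in documents:
        doc_entities = set()
        for paragraph in document:
            par_entities = set()
            for sentence in paragraph:
                for pair in pairs(sentence):
                    scores[pair] += WEIGHTS["sentence"]
                par_entities |= set(sentence)
            for pair in pairs(par_entities):
                scores[pair] += WEIGHTS["paragraph"]
            doc_entities |= par_entities
        for pair in pairs(doc_entities):
            scores[pair] += WEIGHTS["document"]
    return scores

# Hypothetical toy corpus: one document with two paragraphs
corpus = [
    [
        [{"A", "B"}, {"A"}],  # paragraph 1: A and B share a sentence
        [{"C"}],              # paragraph 2: C is mentioned alone
    ]
]
scores = comention_scores(corpus)
```

In the toy corpus, A and B are co-mentioned at all three levels, whereas C shares only the document with them and scores much lower.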
quality scores
score calibration
cross-species transfer
combine all evidence
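Once every evidence type is on a common probabilistic scale, one simple way to combine them is a probabilistic OR: the chance that at least one channel is right, assuming the channels are independent. The sketch below shows only that basic form; the independence assumption and the function name are assumptions of this example, and the published STRING scoring may include further corrections not shown here.

```python
def combine(scores):
    """Probabilistic OR over per-channel scores in [0, 1].

    Assumes the evidence channels are independent.
    """
    all_wrong = 1.0
    for score in scores:
        all_wrong *= 1.0 - score  # probability that this channel is also wrong
    return 1.0 - all_wrong

# Two moderately confident channels support the same association
combined = combine([0.5, 0.5])
```

Under this scheme, several weak but independent lines of evidence can add up to a confident association, and an additional channel can never lower the combined score.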
Szklarczyk et al., Nucleic Acids Research, 2015
string-db.org
web resource
Cytoscape App
larsjuhljensen.wordpress.com
Acknowledgments
Damian Szklarczyk
Michael Kuhn
Andrea Franceschini
Milan Simonovic
Alexander Roth
Sune Pletscher-Frankild
John “Scooter” Morris
Christian von Mering
Peer Bork
