Large-scale integration of data and text
Lars Juhl Jensen
cellular network biology
association networks
guilt by association
molecular networks
proteins
string-db.org
small molecules
stitch-db.org
subcellular localization
compartments.jensenlab.org
tissue expression
tissues.jensenlab.org
disease associations
usage statistics
heavily used
especially in the US
data integration
heterogeneous data
curated knowledge
experimental data
computational predictions
many databases
different formats
different identifiers
variable quality
not comparable
hard work
common identifiers
quality scores
score calibration
missing most of the data
text mining
>10 km
named entity recognition
comprehensive lexicon
cyclin dependent kinase 1
CDC2
orthographic variation
hCdc2
“black list”
SDS
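The named entity recognition slides above (lexicon, orthographic variation, black list) can be illustrated with a minimal sketch. It is not the speaker's tagger: the tiny lexicon, the `CDK1` and `SDS` identifier labels, the single species prefix `h`, and the function names `normalize`, `lookup`, and `tag` are all assumptions chosen for illustration.

```python
import re

# Hypothetical mini-lexicon: normalized name -> identifier (labels are illustrative).
LEXICON = {
    "cyclin dependent kinase 1": "CDK1",
    "cdc2": "CDK1",
    "sds": "SDS",
}
# Names too ambiguous to tag; "SDS" is also a common detergent abbreviation.
BLACK_LIST = {"sds"}
# Assumed orthographic variation: an "h" prefix marking the human protein, as in hCdc2.
SPECIES_PREFIXES = ("h",)


def normalize(name):
    """Lowercase and treat hyphens and runs of whitespace as a single space."""
    return re.sub(r"[\s\-]+", " ", name.lower()).strip()


def lookup(candidate):
    """Return the identifier for a candidate name, or None if unknown or black-listed."""
    key = normalize(candidate)
    if key in BLACK_LIST:
        return None
    if key in LEXICON:
        return LEXICON[key]
    for prefix in SPECIES_PREFIXES:
        stem = key[len(prefix):]
        if key.startswith(prefix) and stem in LEXICON and stem not in BLACK_LIST:
            return LEXICON[stem]
    return None


def tag(text, max_words=4):
    """Greedy longest-match tagging over word tokens."""
    tokens = re.findall(r"[A-Za-z0-9\-]+", text)
    hits = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            identifier = lookup(candidate)
            if identifier is not None:
                hits.append((candidate, identifier))
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return hits


example_hits = tag("hCdc2 is also called cyclin-dependent kinase 1 and runs on SDS gels")
```

In this sketch both `hCdc2` and `cyclin-dependent kinase 1` resolve to the same identifier, while `SDS` is suppressed by the black list.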
co-mentioning
counting
within documents
within paragraphs
within sentences
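Counting co-mentions at the three levels listed above could look roughly like the following sketch. It assumes paragraphs are separated by blank lines, sentences end in `.`, `!` or `?`, and each pair is counted at most once per document at each level; the function names and the naive name matching are illustrative, not the method used in the talk.

```python
import re
from collections import Counter
from itertools import combinations


def entities_in(text, names):
    """Return the subset of names that occur as whole words in text (case-insensitive)."""
    lowered = text.lower()
    return {n for n in names
            if re.search(r"\b" + re.escape(n.lower()) + r"\b", lowered)}


def count_comentions(documents, names):
    """Count, per level, the number of documents in which each pair is co-mentioned."""
    counts = {"document": Counter(), "paragraph": Counter(), "sentence": Counter()}
    for doc in documents:
        paragraphs = [p for p in doc.split("\n\n") if p.strip()]
        sentences = [s for p in paragraphs
                     for s in re.split(r"(?<=[.!?])\s+", p) if s.strip()]
        levels = (("document", [doc]),
                  ("paragraph", paragraphs),
                  ("sentence", sentences))
        for level, units in levels:
            pairs_seen = set()
            for unit in units:
                found = sorted(entities_in(unit, names))
                pairs_seen.update(combinations(found, 2))
            counts[level].update(pairs_seen)  # one count per pair per document
    return counts


docs = [
    "CDK1 binds CCNB1. TP53 is unrelated.\n\nTP53 and CDK1 are studied.",
    "CDK1 is a kinase.\n\nCCNB1 is a cyclin.",
]
comention_counts = count_comentions(docs, ["CDK1", "CCNB1", "TP53"])
```

A pair that shares a sentence necessarily shares the paragraph and the document, so the sentence-level counts are the strictest and the document-level counts the most permissive.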
IDG-specific tasks
target classification
text mining
“protein studiedness”
probabilistic counting
resource integration
disease associations
tissue expression
subcellular localization
automation of updates
web services
remapping of identifiers
predictions for dark matter
network-based inference
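One simple form of network-based inference is guilt by association: an unannotated protein inherits the label that its annotated neighbors support most strongly. The sketch below is a generic weighted neighbor vote under that assumption, not the approach the slides refer to; the node names, edge weights, class labels, and the function `guilt_by_association` are all hypothetical.

```python
def guilt_by_association(edges, known_labels):
    """Predict a label for each unlabeled node by confidence-weighted neighbor voting.

    edges: iterable of (node_a, node_b, weight) for an undirected association network.
    known_labels: dict mapping annotated nodes to their label.
    Returns a dict: unlabeled node -> (best_label, summed_weight).
    """
    neighbors = {}
    for a, b, weight in edges:
        neighbors.setdefault(a, []).append((b, weight))
        neighbors.setdefault(b, []).append((a, weight))

    predictions = {}
    for node, linked in neighbors.items():
        if node in known_labels:
            continue
        votes = {}
        for other, weight in linked:
            label = known_labels.get(other)
            if label is not None:
                votes[label] = votes.get(label, 0.0) + weight
        if votes:  # nodes without any labeled neighbor get no prediction
            predictions[node] = max(votes.items(), key=lambda item: item[1])
    return predictions


# Hypothetical toy network: A, B, C are annotated; X and Y are "dark" proteins.
toy_edges = [("A", "X", 0.75), ("B", "X", 0.5), ("C", "X", 0.5), ("C", "Y", 0.25)]
toy_labels = {"A": "kinase", "B": "ion channel", "C": "ion channel"}
toy_predictions = guilt_by_association(toy_edges, toy_labels)
```

Here `X` has one strong kinase neighbor but two ion-channel neighbors whose weights sum higher, so the vote favors the ion-channel label.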
questions?
