Lars Juhl Jensen
The pragmatic text miner
It’s just another type of poorly standardized data
why text mining?
data mining
guilt by association
structured data
unstructured text
biomedical literature
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
comprehensive lexicon
synonyms
prostate specific antigen
KLK3
expansion rules
prefixes and suffixes
KLK3
hKLK3
flexible matching
hyphens and spaces
prostate specific antigen
prostate-specific antigen
“black list”
SDS
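The dictionary-based recognition steps above (lexicon of synonyms, prefix expansion, flexible hyphen/space matching, and a black list) can be sketched roughly as follows. This is a minimal illustration, not the tagger used in the talk: the tiny `LEXICON`, the `PREFIXES` tuple, and the function names `normalize`, `compile_lexicon`, and `tag` are all assumptions made for the example.

```python
import re

# Hypothetical mini-lexicon: synonym -> common identifier (illustrative only).
LEXICON = {
    "prostate specific antigen": "KLK3",
    "KLK3": "KLK3",
    "SDS": "SDS",
}
# Names that match too many unrelated things in text are black-listed.
BLACKLIST = {"sds"}
# Expansion rule (assumed): an optional "h" prefix, as in hKLK3.
PREFIXES = ("h",)


def normalize(name):
    """Lower-case a name and collapse hyphens and whitespace to one space."""
    return re.sub(r"[-\s]+", " ", name.strip().lower())


def compile_lexicon(lexicon=LEXICON, blacklist=BLACKLIST, prefixes=PREFIXES):
    """Turn each non-black-listed name into a case-insensitive regex."""
    prefix = ""
    if prefixes:
        prefix = "(?:" + "|".join(re.escape(p) for p in prefixes) + ")?"
    compiled = []
    for name, identifier in lexicon.items():
        key = normalize(name)
        if key in blacklist:
            continue
        # Between words, accept any run of hyphens and spaces.
        body = r"[-\s]+".join(re.escape(word) for word in key.split(" "))
        pattern = re.compile(r"\b" + prefix + body + r"\b", re.IGNORECASE)
        compiled.append((pattern, identifier))
    return compiled


def tag(text, compiled=None):
    """Return sorted (start, end, matched text, identifier) tuples."""
    compiled = compiled or compile_lexicon()
    hits = []
    for pattern, identifier in compiled:
        for match in pattern.finditer(text):
            hits.append((match.start(), match.end(), match.group(0), identifier))
    return sorted(hits)


hits = tag("hKLK3, also known as prostate-specific antigen, was run on an SDS gel.")
```

In this sketch both "hKLK3" and "prostate-specific antigen" resolve to the same identifier, while "SDS" is skipped because of the black list. For simplicity the prefix rule is applied to every name; a real lexicon would probably restrict it to gene symbols.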
co-mentioning
within documents
within paragraphs
within sentences
weighted score
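One way to picture the weighted co-mentioning score is to count each pair of entities once per document and add a larger weight when the two names also share a paragraph or a sentence. The sketch below does exactly that; the weight values, the nested-list corpus layout, and the names `WEIGHTS`, `pairs`, and `comention_scores` are assumptions for illustration, since the slides do not give the actual scoring scheme.

```python
from collections import Counter
from itertools import combinations

# Hypothetical weights: closer co-mentions count more.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 3.0}


def pairs(entities):
    """All unordered pairs of entities, each as an alphabetically sorted tuple."""
    return set(combinations(sorted(entities), 2))


def comention_scores(corpus, weights=WEIGHTS):
    """Sum weighted co-mentions over a corpus.

    corpus: list of documents; a document is a list of paragraphs;
    a paragraph is a list of sentences; a sentence is a set of
    entity identifiers already produced by a tagger.
    """
    scores = Counter()
    for document in corpus:
        sentence_pairs, paragraph_pairs = set(), set()
        document_entities = set()
        for paragraph in document:
            paragraph_entities = set()
            for sentence in paragraph:
                sentence_pairs |= pairs(sentence)
                paragraph_entities |= set(sentence)
            paragraph_pairs |= pairs(paragraph_entities)
            document_entities |= paragraph_entities
        for pair in pairs(document_entities):
            scores[pair] += weights["document"]
            if pair in paragraph_pairs:
                scores[pair] += weights["paragraph"]
            if pair in sentence_pairs:
                scores[pair] += weights["sentence"]
    return scores


# One document with two paragraphs; the first paragraph has two sentences.
corpus = [[[{"KLK3", "AR"}, {"TP53"}], [{"MDM2"}]]]
scores = comention_scores(corpus)
```

Here the pair that shares a sentence collects all three weights, a pair that only shares a paragraph collects two, and a pair that only shares the document collects one.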
text corpus
what we normally use
Medline abstracts
what we should use
full-text articles
many publishers
different interfaces
different formats
different licenses
unclear terms
unifying text & data
text mining
curated knowledge
experimental data
computational predictions
integrated web resources
protein networks
string-db.org
chemical networks
stitch-db.org
subcellular localization
compartments.jensenlab.org
tissue expression
tissues.jensenlab.org
disease associations
many sources
different formats
different identifiers
variable quality
not comparable
hard work
common identifiers
quality scores
score calibration
data visualization
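To make the "common identifiers" and "score calibration" steps concrete, here is a rough sketch under stated assumptions. It maps source-specific names to one identifier and then converts raw scores into the fraction of pairs, per raw-score bin, that appear in a gold standard. Benchmarking against a gold standard is one plausible way to make sources comparable and is assumed here; the alias table, bin edges, and function names are hypothetical and not taken from the resources listed above.

```python
from bisect import bisect_right

# Hypothetical alias table: source-specific name -> common identifier.
ALIASES = {"PSA": "KLK3", "KLK3": "KLK3"}


def to_common_id(name, aliases=ALIASES):
    """Map a source-specific name to the common identifier, or None."""
    return aliases.get(name.strip().upper())


def calibration_table(scored_pairs, gold_standard, bin_edges):
    """Per bin, the fraction of pairs that are in the gold standard.

    scored_pairs: iterable of (pair, raw_score), pair being a frozenset.
    bin_edges: ascending lower edges; bin i covers [edge i, edge i+1).
    Bins without any pairs get None.
    """
    totals = [0] * len(bin_edges)
    correct = [0] * len(bin_edges)
    for pair, raw in scored_pairs:
        i = bisect_right(bin_edges, raw) - 1
        if i < 0:
            continue  # raw score below the lowest edge
        totals[i] += 1
        correct[i] += pair in gold_standard
    return [c / t if t else None for c, t in zip(correct, totals)]


def calibrated_score(raw, bin_edges, table):
    """Look up the calibrated value for a raw score."""
    i = bisect_right(bin_edges, raw) - 1
    return table[i] if i >= 0 else None


bin_edges = [0.0, 5.0]
scored = [
    (frozenset({"A", "B"}), 6.0),
    (frozenset({"A", "C"}), 7.0),
    (frozenset({"B", "C"}), 1.0),
    (frozenset({"C", "D"}), 2.0),
]
gold = {frozenset({"A", "B"}), frozenset({"A", "C"}), frozenset({"C", "D"})}
table = calibration_table(scored, gold, bin_edges)
```

Because every source is calibrated against the same gold standard, the resulting values share one scale even when the raw scores do not.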
collaboration model
domain experts
what?
why?
problem
manpower
me
how?
technology
guidance
biodiversity
organisms
environments
Encyclopedia of Life
Biodiversity Heritage Library
pharmacovigilance
drugs
adverse drug reactions
electronic health records
what we need
one place to get all
the format is not crucial
the license is
Acknowledgments
Protein networks
Michael Kuhn
Damian Szklarczyk
Andrea Franceschini
Milan Simonovic
Alexander Roth
Sune Pletscher-Frankild
Jianyi Lin
Pablo Minguez
Christian von Mering
Peer Bork
Localization and disease
Sune Pletscher-Frankild
Alberto Santos
Janos Binder
Kalliopi Tsafou
Christian Stolte
Albert Palleja
Heiko Horn
Evangelos Pafilis
Reinhard Schneider
Sean O’Donoghue
