Lars Juhl Jensen
The pragmatic text miner
It’s just another type of poorly standardized data
why text mining?
data mining
guilt by association
structured data
unstructured text
biomedical literature
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
text corpus
comprehensive lexicon
synonyms
expansion rules
prefixes and suffixes
flexible matching
hyphens and spaces
“black list”
a
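The named entity recognition steps above (lexicon, flexible matching of hyphens and spaces, black list) can be sketched roughly as follows. This is a minimal illustration, not the tagger used in the talk: the lexicon entries, the identifiers (`ID:1` etc.), the blacklist contents, and the function names are all hypothetical.

```python
import re

def normalize(name):
    """Lowercase and drop hyphens and whitespace, so 'IL-2', 'IL 2' and 'il2' share one key."""
    return re.sub(r"[-\s]+", "", name.lower())

def build_index(lexicon, blacklist):
    """Map normalized names to identifiers, skipping blacklisted names."""
    blocked = {normalize(name) for name in blacklist}
    index = {}
    for name, entity_id in lexicon.items():
        key = normalize(name)
        if key not in blocked:
            index[key] = entity_id
    return index

def tag(text, index, max_tokens=3):
    """Greedy longest-match lookup over runs of up to max_tokens tokens."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"[A-Za-z0-9]+", text)]
    matches = []
    i = 0
    while i < len(tokens):
        hit = None
        for n in range(min(max_tokens, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            key = normalize(text[start:end])
            if key in index:
                hit = (start, end, text[start:end], index[key])
                break
        if hit:
            matches.append(hit)
            i += n  # continue after the matched span
        else:
            i += 1
    return matches

# Hypothetical toy lexicon; the identifiers are placeholders, not real database IDs.
lexicon = {"IL-2": "ID:1", "cyclin B": "ID:2", "WAS": "ID:3"}
index = build_index(lexicon, blacklist=["was"])  # 'was' is too ambiguous as a name
hits = tag("IL 2 was induced by cyclin-B", index)
# hits == [(0, 4, 'IL 2', 'ID:1'), (20, 28, 'cyclin-B', 'ID:2')]
```

Normalizing both the lexicon and the text with the same function is what makes the matching flexible; the black list is applied once, when the index is built.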
co-mentioning
within documents
within paragraphs
within sentences
weighted score
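One way to combine the three co-mention levels into a weighted score is to give each document a separate weight for a pair appearing anywhere in it, in the same paragraph, and in the same sentence. The sketch below assumes the text has already been tagged, so each sentence is a set of entity identifiers; the weights are placeholder values chosen for illustration, not values from the talk.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder weights (assumptions): closer co-mentions count more.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 3.0}

def pairs(entities):
    """All unordered pairs of distinct entities, as sorted tuples."""
    return set(combinations(sorted(set(entities)), 2))

def comention_scores(corpus, weights=WEIGHTS):
    """corpus: list of documents; document: list of paragraphs;
    paragraph: list of sentences; sentence: set of entity identifiers.
    Each document contributes each level's weight at most once per pair."""
    scores = defaultdict(float)
    for document in corpus:
        sentence_pairs, paragraph_pairs = set(), set()
        document_entities = set()
        for paragraph in document:
            paragraph_entities = set()
            for sentence in paragraph:
                sentence_pairs |= pairs(sentence)
                paragraph_entities |= set(sentence)
            paragraph_pairs |= pairs(paragraph_entities)
            document_entities |= paragraph_entities
        for pair in pairs(document_entities):
            scores[pair] += weights["document"]
        for pair in paragraph_pairs:
            scores[pair] += weights["paragraph"]
        for pair in sentence_pairs:
            scores[pair] += weights["sentence"]
    return dict(scores)

corpus = [
    [[{"A", "B"}, {"C"}], [{"A"}]],  # document 1: two paragraphs
    [[{"A"}], [{"B"}]],              # document 2: A and B in different paragraphs
]
scores = comention_scores(corpus)
# scores == {('A', 'B'): 7.0, ('A', 'C'): 3.0, ('B', 'C'): 3.0}
```

A raw count like this favors frequently mentioned entities, so in practice it would still need normalizing against how often each entity is mentioned overall.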
unifying text & data
text mining
curated knowledge
experimental data
computational predictions
protein networks
string-db.org Szklarczyk et al., Nucleic Acids Research, 2015
chemical networks
stitch-db.org Kuhn et al., Nucleic Acids Research, 2014
subcellular localization
compartments.jensenlab.org Binder et al., Database, 2014
tissue expression
tissues.jensenlab.org Santos et al., submitted, 2015
disease associations
diseases.jensenlab.org Frankild et al., Methods, 2015
many databases
different formats
different identifiers
variable quality
not comparable
hard work
common identifiers
quality scores
calibrate vs. gold standard
von Mering et al., Nucleic Acids Research, 2005
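Calibrating against a gold standard can be illustrated with a simple binning scheme: rank the scored pairs, measure in each bin what fraction is confirmed by the gold standard, and use that curve to translate raw scores into comparable quality scores. This is only a sketch under stated assumptions. It treats every pair absent from the gold standard as wrong, uses linear interpolation between bins, and all names and numbers are made up for the example; it is not the procedure from the cited paper.

```python
def calibration_curve(raw_scores, gold_standard, bin_size):
    """Return sorted (mean raw score, fraction in gold standard) points, one per bin."""
    ranked = sorted(raw_scores.items(), key=lambda item: item[1], reverse=True)
    curve = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        mean_score = sum(score for _, score in chunk) / len(chunk)
        precision = sum(pair in gold_standard for pair, _ in chunk) / len(chunk)
        curve.append((mean_score, precision))
    return sorted(curve)

def calibrate(score, curve):
    """Map a raw score to a calibrated one by linear interpolation on the curve."""
    if score <= curve[0][0]:
        return curve[0][1]
    if score >= curve[-1][0]:
        return curve[-1][1]
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= score <= x1:
            if x1 == x0:  # two bins with the same mean score
                return y1
            return y0 + (y1 - y0) * (score - x0) / (x1 - x0)

# Made-up raw scores and gold standard, for illustration only.
raw_scores = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.3, ("c", "d"): 0.1}
gold_standard = {("a", "b"), ("a", "c"), ("c", "d")}
curve = calibration_curve(raw_scores, gold_standard, bin_size=2)
# curve is approximately [(0.2, 0.5), (0.85, 1.0)]
```

Once every evidence type has been mapped onto the same scale this way, scores from different sources become directly comparable.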
general framework
interactive web resources
semantic web services
augmented browsing
reflect.ws Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
medical data mining
Jensen et al., Nature Reviews Genetics, 2012
structured data
Jensen et al., Nature Reviews Genetics, 2012
119 million diagnoses
6.2 million patients
distributions
Jensen et al., Nature Communications, 2014
trajectories
Jensen et al., Nature Communications, 2014
clinical narrative
unstructured text
Danish
busy doctors
comprehensive lexicon
adverse drug events
drugs
Clozapine
clozapin
clossapin
klozapine
chlosapin
chlosapine
chlozapin
chlozapine
klossapin
closapine
klozapin
klosapin
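Spelling variants like these can be folded onto one lexicon entry by reducing every word to a crude spelling key before lookup. The rules below (c/ch to k, z to s, collapsed double s, dropped final e) are assumptions picked only because they cover the variants listed above; they are not the rules used in the cited work, and a real system for Danish clinical notes would need a broader, validated rule set.

```python
import re

def spelling_key(word):
    """Reduce a word to a rough key so that common misspellings collide."""
    key = word.lower()
    key = key.replace("ch", "k").replace("c", "k")  # ch/c/k written interchangeably
    key = key.replace("z", "s")                     # z/s written interchangeably
    key = re.sub(r"s+", "s", key)                   # collapse doubled s
    return key.rstrip("e")                          # optional final e

# Hypothetical one-entry drug lexicon.
DRUGS = ["Clozapine"]
INDEX = {spelling_key(name): name for name in DRUGS}

def find_drugs(text):
    """Return (word as written, canonical drug name) for every matching word."""
    found = []
    for match in re.finditer(r"[^\W\d_]+", text):  # runs of letters, incl. non-ASCII
        canonical = INDEX.get(spelling_key(match.group()))
        if canonical:
            found.append((match.group(), canonical))
    return found

variants = ["clozapin", "clossapin", "klozapine", "chlosapin", "chlosapine",
            "chlozapin", "chlozapine", "klossapin", "closapine", "klozapin",
            "klosapin"]
mentions = find_drugs("Pt. started on klossapin and chlozapine")
# mentions == [('klossapin', 'Clozapine'), ('chlozapine', 'Clozapine')]
```

Rules this aggressive will also merge unrelated words that happen to share a key, so each key would need checking against the full lexicon and a black list.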
rule-based system
Eriksson et al., Drug Safety, 2014
[Figure labels: Drug introduction; Drug discontinuation; Identification start; Adverse event; Negative modifier; Indication; Pre-existing condition; Adverse drug reaction; Possible adverse drug reaction; ADR of additional drug]
direct medical implications
Acknowledgments
STRING/STITCH
Michael Kuhn
Damian Szklarczyk
Andrea Franceschini
Milan Simonovic
Alexander Roth
Sune Pletscher-Frankild
Jianyi Lin
Pablo Minguez
Christian von Mering
Peer Bork
Text mining
Sune Pletscher-Frankild
Jasmin Saric
Evangelos Pafilis
Alberto Santos
Janos Binder
Kalliopi Tsafou
Heiko Horn
Michael Kuhn
Reinhardt Schneider
Sean O’Donoghue
EHR mining
Anders Boeck Jensen
Robert Eriksson
Peter Bjødstrup Jensen
Andreas Bok Andersen
Sabrina Gade Ellesøe
Henriette Schmock
Tudor Oprea
Pope Moseley
Thomas Werge
Søren Brunak
