Applied text mining
>10 km
too much to read
exponential growth
~40 seconds per paper
computer
as smart as a dog
teach it specific tricks
information retrieval
named entity recognition
information extraction
text/data integration
medical text mining
information retrieval
find the relevant papers
ad hoc retrieval
user-specified query
“yeast AND cell cycle”
PubMed
indexing
fast lookup
stemming
word endings
dynamic query expansion
MeSH terms
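Indexing, stemming and query expansion can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the suffix stripper is a deliberately naive stand-in for a real stemmer, the `EXPANSIONS` table is a hypothetical placeholder for MeSH-based expansion, and the four documents are invented. It is not a description of how PubMed works internally.

```python
import re
from collections import defaultdict


def stem(word):
    """Naive suffix stripping (a stand-in for a real stemmer such as Porter's)."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and not word.endswith("ss") and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    return [stem(w) for w in re.findall(r"[A-Za-z0-9]+", text)]


def build_index(docs):
    """Inverted index: stemmed term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


# Hypothetical expansion table standing in for MeSH-based query expansion.
EXPANSIONS = {"yeast": ["yeast", "Saccharomyces", "cerevisiae"]}


def search_and(index, query_terms):
    """Boolean AND over query terms; each term is ORed with its expansions."""
    result = None
    for raw in query_terms:
        hits = set()
        for variant in EXPANSIONS.get(raw.lower(), [raw]):
            for term in tokenize(variant):
                hits |= index.get(term, set())
        result = hits if result is None else result & hits
    return result or set()


docs = {
    1: "Cell cycle regulation in budding yeast",
    2: "Saccharomyces cerevisiae cells arrest before mitosis",
    3: "Cycles of cell division in Saccharomyces",
    4: "Human cell cycle checkpoints",
}
index = build_index(docs)
print(search_and(index, ["yeast", "cell", "cycle"]))  # {1, 3}
```

Document 3 is found only because stemming maps "Cycles" to "cycle" and expansion maps "yeast" to "Saccharomyces"; document 2 never mentions the cell cycle, so the AND drops it.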
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
no tool will find that
named entity recognition
identify the concepts
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
comprehensive lexicon
CDC2
cyclin dependent kinase 1
orthographic variation
flexible matching
upper- and lower-case
CDC2
Cdc2
spaces and hyphens
cyclin dependent kinase 1
cyclin-dependent kinase 1
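Flexible matching of orthographic variants can be sketched as normalizing every name to a common key before lookup. The two-entry `LEXICON` below is a hypothetical example (CDC2 is an alias of human CDK1); a real lexicon is far larger.

```python
import re


def normalize(name):
    """Ignore case and treat runs of spaces and hyphens as one separator."""
    return re.sub(r"[\s\-]+", " ", name.lower()).strip()


# Hypothetical two-entry lexicon mapping synonyms to one identifier.
LEXICON = {
    "CDC2": "CDK1",
    "cyclin dependent kinase 1": "CDK1",
}
NORMALIZED = {normalize(name): ident for name, ident in LEXICON.items()}


def lookup(name):
    return NORMALIZED.get(normalize(name))


for variant in ["CDC2", "Cdc2", "cyclin dependent kinase 1", "cyclin-dependent kinase 1"]:
    print(variant, "->", lookup(variant))
```

Folding case for every name is a simplification; it can make short names collide with ordinary words.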
name expansions
prefixes and postfixes
CDC2
hCDC2
“black list”
SDS
efficient tagger
Pafilis et al., PLOS ONE, 2013
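Name expansions, a black list and dictionary tagging can be combined into a toy tagger. This is a sketch, not the tagger described by Pafilis et al.: the `h` prefix rule, the greedy longest-match over token n-grams, and the `max_tokens` limit are assumptions chosen for brevity. The black-listed example is SDS, which is a gene symbol but in lab text typically means sodium dodecyl sulfate.

```python
import re


def normalize(s):
    return re.sub(r"[\s\-]+", " ", s.lower()).strip()


def expand(names):
    """Hypothetical expansion rule: add an 'h' (human) prefix to one-word names."""
    out = set(names)
    out.update("h" + n for n in names if " " not in n)
    return out


def build_dictionary(lexicon, blacklist):
    dictionary = {}
    for entity, names in lexicon.items():
        for name in expand(names):
            key = normalize(name)
            if key not in blacklist:
                dictionary[key] = entity
    return dictionary


def tag(text, dictionary, max_tokens=4):
    """Greedy longest match over token n-grams; returns (start, end, entity)."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    hits, i = [], 0
    while i < len(tokens):
        for n in range(min(max_tokens, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            entity = dictionary.get(normalize(text[start:end]))
            if entity:
                hits.append((start, end, entity))
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return hits


lexicon = {"CDK1": ["CDC2", "cyclin dependent kinase 1"], "SDS": ["SDS"]}
dictionary = build_dictionary(lexicon, blacklist={"sds"})
text = "hCDC2 and cyclin-dependent kinase 1 were run on SDS gels"
for start, end, entity in tag(text, dictionary):
    print(text[start:end], "->", entity)
```

Both mentions of CDK1 are found, while "SDS" is ignored because its normalized form is on the black list.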
benchmarking
the formal way
manually annotated corpus
precision
recall
much work
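Against a manually annotated corpus, precision and recall follow from comparing predicted and gold annotations. This sketch assumes exact matching of (document, start, end, entity) tuples; many benchmarks also give credit for partially overlapping spans. The annotations are invented.

```python
def precision_recall(predicted, gold):
    """Exact-match comparison of (doc, start, end, entity) annotations."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


gold = {("d1", 0, 5, "CDK1"), ("d1", 10, 35, "CDK1"), ("d2", 3, 7, "SWE1"), ("d3", 0, 4, "CDC5")}
predicted = {("d1", 0, 5, "CDK1"), ("d2", 3, 7, "SWE1"), ("d2", 20, 23, "SDS")}
print(precision_recall(predicted, gold))
```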
the pragmatic way
random sampling
precision
no recall
much less work
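The pragmatic way amounts to checking a random sample of tagger output by hand. Because only output is sampled, the names the tagger missed are never seen, which is why this gives precision but no recall. The Wilson score interval used here is one standard choice for a binomial proportion, added as an assumption; the 90/10 judgements are hypothetical.

```python
import math
import random


def sample_for_review(hits, n, seed=0):
    """Draw a random subset of tagger output for manual checking."""
    return random.Random(seed).sample(hits, min(n, len(hits)))


def estimate_precision(judgements, z=1.96):
    """Precision and an approximate 95% Wilson score interval from True/False judgements."""
    n = len(judgements)
    p = sum(judgements) / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, (centre - half, centre + half)


to_check = sample_for_review(list(range(10_000)), 100)
judgements = [True] * 90 + [False] * 10  # hypothetical outcome of checking the sample
print(estimate_precision(judgements))
```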
augmented browsing
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
Reflect
reflect.ws
information extraction
formalize the facts
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
two approaches
the formal way
NLP
Natural Language Processing
grammatical analysis
part-of-speech tagging
multiword detection
semantic tagging
sentence parsing
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
extract stated facts
high precision
poor recall
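Why extracting stated facts gives high precision but poor recall can be shown with a pattern-based sketch. It is far simpler than a full NLP parse: it assumes the gene names are already recognized and uses a hypothetical `PATTERNS` table of passive verb phrases. The slide's example sentence yields two facts; an active-voice paraphrase yields none.

```python
import re

# Hypothetical verb phrases mapped to a relation type.
PATTERNS = {"is controlled by": "regulates", "is regulated by": "regulates"}


def mentions(text, genes):
    return [g for g in genes if re.search(rf"\b{re.escape(g)}\b", text)]


def extract(sentence, genes):
    """Return (regulator, relation, target) triples for 'TARGETS <phrase> REGULATOR'."""
    facts = []
    for phrase, relation in PATTERNS.items():
        if phrase in sentence:
            left, right = sentence.split(phrase, 1)
            facts += [(r, relation, t) for r in mentions(right, genes) for t in mentions(left, genes)]
    return facts


sentence = "The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1"
print(extract(sentence, ["CYC1", "CYC7", "HAP1"]))
print(extract("HAP1 controls CYC1", ["CYC1", "HAP1"]))  # []  (active voice is missed)
```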
the pragmatic way
guilt by association
co-mentioning
counting
within documents
within paragraphs
within sentences
quality score
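Counting co-mentions within documents, paragraphs and sentences can be sketched as a weighted count, followed by a score that rewards pairs seen together more often than their individual frequencies predict. The weights `W_DOC`, `W_PAR`, `W_SENT` and the exponent `ALPHA` are illustrative assumptions, and the score S = C_ij^a * (C_ij * C_all / (C_i * C_j))^(1 - a) is one scheme of this shape, not necessarily the exact one behind the slides.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative weights and exponent (assumptions, not values from the slides).
W_DOC, W_PAR, W_SENT = 1.0, 2.0, 0.2
ALPHA = 0.6


def pairs(entities):
    return set(combinations(sorted(entities), 2))


def co_mention_counts(documents):
    """documents -> paragraphs -> sentences -> set of entity ids."""
    counts = defaultdict(float)
    for doc in documents:
        doc_ents, par_pairs, sent_pairs = set(), set(), set()
        for paragraph in doc:
            par_ents = set().union(*paragraph)
            par_pairs |= pairs(par_ents)
            for sentence in paragraph:
                sent_pairs |= pairs(sentence)
            doc_ents |= par_ents
        for pair in pairs(doc_ents):
            counts[pair] += W_DOC + W_PAR * (pair in par_pairs) + W_SENT * (pair in sent_pairs)
    return counts


def co_mention_scores(counts):
    """S = C_ij^a * (C_ij * C_all / (C_i * C_j))^(1 - a), from a symmetric count matrix."""
    row = defaultdict(float)
    for (a, b), c in counts.items():
        row[a] += c
        row[b] += c
    total = 2 * sum(counts.values())  # each unordered pair appears twice in the matrix
    return {
        (a, b): c**ALPHA * (c * total / (row[a] * row[b])) ** (1 - ALPHA)
        for (a, b), c in counts.items()
    }


documents = [
    [[{"CDC28", "SWE1"}, {"CDC5"}], [{"SWE1"}]],
    [[{"CDC28"}], [{"SWE1"}]],
]
counts = co_mention_counts(documents)
scores = co_mention_scores(counts)
for pair, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(counts[pair], 2), round(s, 2))
```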
high recall
high precision
undirected associations
unknown type
text/data integration
STRING
protein associations
string-db.org
STITCH
STRING + 300k chemicals
stitch-db.org
COMPARTMENTS
subcellular localization
compartments.jensenlab.org
TISSUES
tissue expression
tissues.jensenlab.org
DISEASES
disease–gene associations
diseases.jensenlab.org
curated knowledge
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
experimental data
gene expression
computational predictions
gene neighborhood
Korbel et al., Nature Biotechnology, 2004
many databases
different formats
different identifiers
variable quality
not comparable
hard work
common identifiers
quality scores
score calibration
visualization
web interfaces
bulk download
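Score calibration and integration can be sketched as two steps: map each source's raw scores to probabilities by binning against a gold standard, then combine the calibrated channels assuming they are independent (a noisy-OR). The bin edges, scores and labels are made up. Real resources such as STRING also correct for the probability of random observations before combining, which this sketch omits.

```python
from bisect import bisect_right


def calibrate(raw_scores, labels, edges):
    """Map raw scores to the fraction of gold-standard positives in each score bin."""
    counts = [[0, 0] for _ in range(len(edges) + 1)]
    for s, y in zip(raw_scores, labels):
        b = bisect_right(edges, s)
        counts[b][0] += y
        counts[b][1] += 1
    table = [pos / n if n else 0.0 for pos, n in counts]
    return lambda s: table[bisect_right(edges, s)]


def combine(probabilities):
    """Noisy-OR: probability that at least one independent channel is right."""
    miss = 1.0
    for p in probabilities:
        miss *= 1.0 - p
    return 1.0 - miss


to_probability = calibrate([0.1, 0.2, 0.6, 0.7, 0.8, 0.9], [0, 0, 1, 0, 1, 1], edges=[0.5])
print(to_probability(0.65), combine([to_probability(0.65), 0.5]))
```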
why so many resources?
Swiss army knife syndrome
medical text mining
electronic health records
opt-out
opt-in
structured data
Jensen et al., Nature Reviews Genetics, 2012
unstructured data
clinical narrative
Danish
busy doctors
psychiatric patients
named entity recognition
custom dictionaries
diseases
drugs
adverse events
expansion rules
phonetic spelling
typos
sentence filters
negations
family members
delusions
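Typo tolerance and sentence filters can be sketched as fuzzy dictionary matching inside sentences that pass a negation and family-member filter. Everything here is an assumption for illustration: the English cue lists stand in for the Danish ones a real system would need, dropping whole sentences is cruder than scoping negations, and the edit-distance threshold is arbitrary.

```python
import re

# Hypothetical English cue lists; the clinical notes in question are Danish.
NEGATION_CUES = {"no", "not", "denies", "without"}
FAMILY_CUES = {"mother", "father", "sister", "brother"}


def keep_sentence(sentence):
    """Drop sentences that negate a finding or describe a family member."""
    words = set(re.findall(r"\w+", sentence.lower()))
    return not (words & NEGATION_CUES or words & FAMILY_CUES)


def edit_distance(a, b):
    """Levenshtein distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def match_terms(sentence, terms, max_typos=1):
    """Return one-word dictionary terms found in a kept sentence, tolerating small typos."""
    if not keep_sentence(sentence):
        return set()
    found = set()
    for word in re.findall(r"\w+", sentence.lower()):
        for term in terms:
            if len(word) >= 5 and edit_distance(word, term) <= max_typos:
                found.add(term)
    return found


terms = {"psychosis", "depression"}
for s in ["Patient shows signs of psycosis", "No signs of depression", "Mother had depression"]:
    print(s, "->", match_terms(s, terms))
```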
detailed disease profiles
Roque et al., PLOS Computational Biology, 2011
[Figure: assigned codes vs. text-mined codes]
comorbidity
Roque et al., PLOS Computational Biology, 2011
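Comorbidity can be quantified from per-patient disease profiles. This sketch swaps in one simple measure, the relative risk RR = C_ij * N / (P_i * P_j), where C_ij counts patients with both codes, P_i and P_j count patients with each code, and N is the number of patients; it is not necessarily the statistic used by Roque et al. The ICD-10 codes are toy data.

```python
from collections import Counter
from itertools import combinations


def relative_risks(patients):
    """patients: list of sets of disease codes. RR = C_ij * N / (P_i * P_j)."""
    n = len(patients)
    single = Counter(code for p in patients for code in p)
    pair = Counter(pr for p in patients for pr in combinations(sorted(p), 2))
    return {(a, b): c * n / (single[a] * single[b]) for (a, b), c in pair.items()}


patients = [{"F20", "F32"}, {"F20", "F32"}, {"F20"}, {"F32"}, {"E11"}]
print(relative_risks(patients))
```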
patient stratification
Roque et al., PLOS Computational Biology, 2011
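Patient stratification can be sketched as grouping patients with similar disease profiles. The technique here is an assumption chosen for brevity: Jaccard similarity between code sets and single-linkage grouping at a fixed `threshold` (connected components via union-find). It is not claimed to be the clustering used by Roque et al.

```python
from itertools import combinations


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def stratify(patients, threshold=0.5):
    """Single-linkage grouping: link patients whose code sets overlap at or above threshold."""
    parent = list(range(len(patients)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(patients)), 2):
        if jaccard(patients[i], patients[j]) >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(patients)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


patients = [{"F20", "F32"}, {"F20", "F32", "F41"}, {"E11", "I10"}, {"E11"}]
print(stratify(patients))
```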
pharmacovigilance
structured medication data
text-mined adverse events
Eriksson et al., submitted, 2013
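Combining structured medication data with text-mined adverse events can be sketched as comparing how often an event is mentioned after a drug is started with how often it appears in patients not taking the drug. This simple rate ratio is a swapped-in illustration, not the method of Eriksson et al.; the patient records, `drug_a` and `nausea` are hypothetical.

```python
from datetime import date


def exposure_ratio(patients, drug, event):
    """Event rate after starting the drug divided by the event rate among non-users.

    Each patient is a dict with 'drugs' {name: start_date} (structured data)
    and 'events' {name: first_mention_date} (text-mined adverse events).
    """
    a = b = c = d = 0
    for p in patients:
        start, seen = p["drugs"].get(drug), p["events"].get(event)
        if start is not None:
            if seen is not None and seen >= start:
                a += 1  # exposed, event mentioned after drug start
            else:
                b += 1  # exposed, no event after drug start
        elif seen is not None:
            c += 1  # unexposed, event mentioned
        else:
            d += 1  # unexposed, no event
    exposed = a / (a + b) if a + b else 0.0
    unexposed = c / (c + d) if c + d else 0.0
    return exposed / unexposed if unexposed else float("inf")


def patient(drugs=None, events=None):
    return {"drugs": drugs or {}, "events": events or {}}


patients = [
    patient({"drug_a": date(2010, 1, 1)}, {"nausea": date(2010, 2, 1)}),
    patient({"drug_a": date(2010, 1, 1)}, {"nausea": date(2010, 3, 1)}),
    patient({"drug_a": date(2010, 1, 1)}, {"nausea": date(2009, 6, 1)}),
    patient({"drug_a": date(2010, 1, 1)}),
    patient(events={"nausea": date(2010, 5, 1)}),
    patient(), patient(), patient(),
]
print(exposure_ratio(patients, "drug_a", "nausea"))
```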
EMBO Practical Course Computational Biology:
Genomes to Systems
Puerto Varas, 3-9 April 2014
Thank you!
