Lars Juhl Jensen
The pragmatic text miner
It’s just another type of poorly standardized data
why text mining?
data mining
guilt by association
structured data
unstructured text
biomedical literature
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
comprehensive lexicon
synonyms
prostate specific antigen
KLK3
expansion rules
prefixes and suffixes
KLK3
hKLK3
flexible matching
hyphens and spaces
prostate specific antigen
prostate-specific antigen
“black list”
SDS
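
The named entity recognition steps above (lexicon with synonyms, prefix expansion, hyphen/space-tolerant matching, and a black list) can be sketched roughly as follows. This is a minimal illustration, not the tagger used in the talk: the tiny lexicon, the "h" prefix rule, the serine dehydratase entry for SDS, and the function names are all assumptions made for the example, and overlapping matches are not resolved.

```python
import re

# Hypothetical mini-lexicon: identifier -> synonyms (illustrative only).
LEXICON = {
    "KLK3": ["KLK3", "prostate specific antigen"],
    "SDS": ["SDS", "serine dehydratase"],  # assumed entry to show the black list
}
PREFIXES = ["h"]     # assumed expansion rule: "h" for human, e.g. hKLK3
BLACKLIST = {"sds"}  # names too ambiguous to tag (SDS is also a detergent)


def normalize(name):
    """Lowercase and treat runs of hyphens and spaces as one space."""
    return re.sub(r"[-\s]+", " ", name.lower()).strip()


def build_dictionary(lexicon, prefixes, blacklist):
    """Expand synonyms with prefixes and drop black-listed names."""
    names = {}
    for identifier, synonyms in lexicon.items():
        for synonym in synonyms:
            variants = [synonym]
            if " " not in synonym:  # only single-token symbols get prefixes
                variants += [prefix + synonym for prefix in prefixes]
            for variant in variants:
                key = normalize(variant)
                if key not in blacklist:
                    names[key] = identifier
    return names


def tag(text, names):
    """Return (matched text, identifier) pairs in order of appearance."""
    hits = []
    for key, identifier in names.items():
        # Allow any mix of hyphens and spaces between the words of a name.
        words = [re.escape(word) for word in key.split(" ")]
        pattern = r"\b" + r"[-\s]+".join(words) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.group(0), identifier))
    return [(matched, identifier) for _, matched, identifier in sorted(hits)]


names = build_dictionary(LEXICON, PREFIXES, BLACKLIST)
sentence = "hKLK3 encodes prostate-specific antigen; samples were run on SDS gels."
print(tag(sentence, names))
```

In this sketch both "hKLK3" and "prostate-specific antigen" resolve to KLK3, while the black-listed "SDS" is left untagged.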
co-mentioning
within documents
within paragraphs
within sentences
weighted score
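
One way to picture the weighted co-mentioning score is the sketch below. It assumes each document has already been tagged and split into paragraphs and sentences, and it simply adds a weight for each level at which two entities co-occur. The weight values, the data layout, and the function name are hypothetical choices for illustration, not the scheme behind the resources mentioned in the talk.

```python
# Hypothetical weights for each co-mention level (illustrative values only).
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 3.0}


def comention_score(documents, a, b, weights=WEIGHTS):
    """Sum weighted co-mentions of entities a and b over a corpus.

    documents: list of documents; each document is a list of paragraphs;
    each paragraph is a list of sentences; each sentence is a set of
    entity identifiers found by the tagger.
    """
    score = 0.0
    for doc in documents:
        sentences = [s for paragraph in doc for s in paragraph]
        in_document = any(a in s for s in sentences) and any(b in s for s in sentences)
        in_paragraph = any(
            any(a in s for s in paragraph) and any(b in s for s in paragraph)
            for paragraph in doc
        )
        in_sentence = any(a in s and b in s for s in sentences)
        # Booleans count as 0 or 1, so each level contributes at most once per document.
        score += (
            weights["document"] * in_document
            + weights["paragraph"] * in_paragraph
            + weights["sentence"] * in_sentence
        )
    return score


corpus = [
    [[{"KLK3", "AR"}]],       # same sentence
    [[{"KLK3"}, {"AR"}]],     # same paragraph, different sentences
    [[{"KLK3"}], [{"AR"}]],   # same document, different paragraphs
    [[{"KLK3"}]],             # no co-mention
]
print(comention_score(corpus, "KLK3", "AR"))
```

A pair mentioned in the same sentence also counts at the paragraph and document levels, so closer co-mentions contribute more to the total.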
text corpus
what we normally use
Medline abstracts
what we should use
full-text articles
many publishers
different interfaces
different formats
different licenses
unclear terms
unifying text & data
text mining
curated knowledge
experimental data
computational predictions
integrated web resources
protein networks
string-db.org
chemical networks
stitch-db.org
subcellular localization
compartments.jensenlab.org
tissue expression
tissues.jensenlab.org
disease associations
many sources
different formats
different identifiers
variable quality
not comparable
hard work
common identifiers
quality scores
score calibration
data visualization
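
To make scores from different sources comparable, raw scores can be calibrated against a gold standard. The sketch below is one simple way to do that, assuming a set of known true pairs is available: it bins pairs by raw score and uses the fraction of gold-standard pairs in each bin as the calibrated score. The binning approach, function names, and toy data are assumptions for illustration only.

```python
def calibrate(scored_pairs, gold_standard, n_bins=2):
    """Build a lookup table of (lowest raw score in bin, precision in bin).

    scored_pairs: list of (pair, raw_score) tuples.
    gold_standard: set of pairs assumed to be true associations.
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1])
    size = max(1, len(ranked) // n_bins)
    table = []
    for start in range(0, len(ranked), size):
        chunk = ranked[start:start + size]
        hits = sum(1 for pair, _ in chunk if pair in gold_standard)
        table.append((chunk[0][1], hits / len(chunk)))
    return table


def calibrated_score(raw, table):
    """Return the precision of the highest bin whose lower bound is <= raw."""
    result = table[0][1]
    for lower_bound, precision in table:
        if raw >= lower_bound:
            result = precision
    return result


# Toy data: four scored pairs, three of which are in the gold standard.
scored = [("a", 0.1), ("b", 0.2), ("c", 0.8), ("d", 0.9)]
gold = {"a", "c", "d"}
table = calibrate(scored, gold)
print(table)
print(calibrated_score(0.85, table))
```

Once every source is mapped onto the same scale in this way, a score of 0.8 from text mining and 0.8 from an experimental dataset can be read the same way.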
collaboration model
domain experts
what?
why?
problem
manpower
me
how?
technology
guidance
biodiversity
organisms
environments
Encyclopedia of Life
Biodiversity Heritage Library
pharmacovigilance
drugs
adverse drug reactions
electronic health records
what we need
one place to get all
the format is not crucial
the license is
Acknowledgments
Protein networks
Michael Kuhn
Damian Szklarczyk
Andrea Franceschini
Milan Simonovic
Alexander Roth
Sune Pletscher-Frankild
Jianyi Lin
Pablo Minguez
Christian von Mering
Peer Bork
Localization and disease
Sune Pletscher-Frankild
Alberto Santos
Janos Binder
Kalliopi Tsafou
Christian Stolte
Albert Palleja
Heiko Horn
Evangelos Pafilis
Reinhard Schneider
Sean O’Donoghue