Literature mining and large-scale data integration
Lars Juhl Jensen
EMBL Heidelberg
literature mining
why?
 
too much to read
information retrieval
finding the papers
ad hoc retrieval
user-specified query
“yeast AND cell cycle”
stemming
yeast / yeasts
dynamic query expansion
yeast / S. cerevisiae
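The two ideas above (stemming and dynamic query expansion) can be sketched as follows. This is a minimal illustration, not the method of any particular search engine: the `SYNONYMS` table, the crude plural-stripping `stem` function, and the function names are all assumptions made for the example.

```python
import re

# Hypothetical synonym table; a real system would use a curated resource.
SYNONYMS = {"yeast": ["S. cerevisiae", "Saccharomyces cerevisiae"]}


def stem(word):
    """Crude plural stripping, standing in for a real stemmer."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def expand(terms):
    """Turn each query term into a list of acceptable alternatives."""
    return [[stem(t)] + SYNONYMS.get(stem(t), []) for t in terms]


def matches(text, terms):
    """Boolean AND over the terms; each term may match via any alternative."""
    lowered = text.lower()
    stems = {stem(tok) for tok in re.findall(r"[a-z0-9]+", lowered)}

    def found(alternative):
        alt = alternative.lower()
        if re.fullmatch(r"[a-z0-9]+", alt):
            return stem(alt) in stems  # single word: compare stems
        return alt in lowered          # phrase: plain substring match

    return all(any(found(a) for a in alts) for alts in expand(terms))
```

With this sketch, the query `["yeast", "cell cycle"]` matches a sentence that says "yeasts" (via stemming) or "S. cerevisiae" (via expansion), but only if "cell cycle" is also present.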
ranking
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
no tool will find it
entity recognition
identifying the substance(s)
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
Cdc28 → yeast
Cdc28 → cell cycle
good synonyms list
manual curation
orthographic variation
CDC28
Cdc28p
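A dictionary-based tagger that tolerates the orthographic variation shown above might look like the sketch below. The name list is taken from the example sentence; the normalization rules (case folding and dropping a trailing "p") and all function names are assumptions for illustration, and a real synonyms list would be far larger and manually curated.

```python
import re

# Canonical names from the example sentence; a stand-in for a curated list.
NAMES = ["Clb2", "Cdc28", "Swe1", "Cdc5"]
LOOKUP = {name.lower(): name for name in NAMES}


def normalize(token):
    """Fold case and drop the trailing 'p' used for yeast proteins (Cdc28p)."""
    key = token.lower()
    if key.endswith("p") and key[:-1] in LOOKUP:
        key = key[:-1]
    return key


def tag(text):
    """Return (start, end, canonical name) for every recognized token."""
    hits = []
    for match in re.finditer(r"[A-Za-z0-9]+", text):
        key = normalize(match.group())
        if key in LOOKUP:
            hits.append((match.start(), match.end(), LOOKUP[key]))
    return hits
```

Both `CDC28` and `Cdc28p` resolve to the canonical `Cdc28`. Ambiguous names are deliberately not handled here; they need a separate disambiguation step.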
disambiguation
hairy
SDS
APC
Cdc2
still too much to read
information extraction
formalizing the facts
 
co-mentioning
statistical methods
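One simple statistical treatment of co-mentioning is to compare how often two entities appear in the same document with how often they would be expected to by chance. The sketch below uses an observed/expected ratio; that particular score, the function name, and the input format (one list of tagged entity names per document) are assumptions for illustration, not necessarily what the slides refer to.

```python
from collections import Counter
from itertools import combinations


def comention_scores(docs):
    """Score each entity pair by observed / expected co-mention count.

    docs: one list of entity names per document (e.g. per abstract).
    """
    n_docs = len(docs)
    single = Counter()
    pair = Counter()
    for entities in docs:
        unique = sorted(set(entities))   # count each entity once per document
        single.update(unique)
        pair.update(combinations(unique, 2))

    scores = {}
    for (a, b), observed in pair.items():
        # Expected count if the two entities were mentioned independently.
        expected = single[a] * single[b] / n_docs
        scores[(a, b)] = observed / expected
    return scores
```

Pairs with a ratio well above 1 are mentioned together more often than chance would suggest. With small counts the ratio is noisy, so a real system would add smoothing or a significance test.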
NLP (Natural Language Processing)
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
 
no new discoveries
text mining
undiscovered links
 
Raynaud’s syndrome
fish oil
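The search for undiscovered links can be sketched as a two-step walk over co-mention pairs: find terms that share intermediate terms with the starting concept but are never mentioned together with it directly. The function name and input format are assumptions for illustration. The intermediate terms in the usage note are the kind reported in Swanson's original fish oil analysis, but the pair list itself is a toy input, not extracted data.

```python
from collections import defaultdict


def indirect_links(pairs, start):
    """Find terms linked to `start` only through shared intermediates.

    pairs: iterable of (term, term) co-mentions from the literature.
    Returns {candidate term: set of intermediate terms}.
    """
    neighbors = defaultdict(set)
    for a, b in pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    direct = set(neighbors[start])
    found = defaultdict(set)
    for intermediate in direct:
        for candidate in neighbors[intermediate]:
            # Keep only terms never co-mentioned with the start term itself.
            if candidate != start and candidate not in direct:
                found[candidate].add(intermediate)
    return dict(found)
```

Given toy pairs linking both "Raynaud's syndrome" and "fish oil" to "blood viscosity" and "platelet aggregation", calling `indirect_links(pairs, "Raynaud's syndrome")` returns "fish oil" together with the two intermediates that connect it.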
 
temporal trends
 
buzzwords
 
data integration
association networks
 
information extraction
 
curated knowledge
 
protein interaction data
 
genetic interaction data
 
gene expression data
 
computational predictions
conserved neighborhood
 
gene fusion
 
phylogenetic profiles
 
variable reliability
raw quality scores
not comparable
benchmarking
calibrate vs. gold standard
 
probabilistic scores
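Calibration against a gold standard can be sketched as follows: sort the predictions by raw score, cut them into bins, and record the fraction of each bin that the gold standard confirms. That fraction serves as the probabilistic score for raw scores in that range. Simple equal-size binning, the function name, and the input format are assumptions here; this is not necessarily the procedure behind the slides.

```python
def calibrate(scored_pairs, gold_positive, bin_size):
    """Map raw scores to the fraction of pairs confirmed by a gold standard.

    scored_pairs: list of (pair, raw_score) from one evidence type.
    gold_positive: set of pairs that the gold standard considers true.
    Returns a list of (mean raw score, fraction confirmed), best bin first.
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    table = []
    for i in range(0, len(ranked), bin_size):
        chunk = ranked[i:i + bin_size]
        hits = sum(1 for pair, _ in chunk if pair in gold_positive)
        mean_score = sum(score for _, score in chunk) / len(chunk)
        table.append((mean_score, hits / len(chunk)))
    return table
```

Because every evidence type is mapped onto the same scale (probability of being correct according to the gold standard), scores from different sources become comparable.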
spread over many species
373 genomes
 
transfer by orthology
 
combine all evidence
$P = 1 - (1 - P_1)(1 - P_2)(1 - P_3) \cdots$
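The combination formula above treats the evidence types as independent: the combined probability is one minus the probability that every individual line of evidence is wrong. A minimal sketch, with a function name chosen for illustration:

```python
def combine(probabilities):
    """Combine independent evidence: P = 1 - product of (1 - P_i)."""
    all_wrong = 1.0
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie between 0 and 1")
        all_wrong *= 1.0 - p
    return 1.0 - all_wrong
```

Two independent sources that are each 50% reliable combine to 0.75, so agreement between weak evidence types yields a stronger association than either alone.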
web resources
signaling networks
phosphoproteomics
 
in vivo phosphosites
kinases are unknown
computational methods
 
overprediction
context
scaffolders
association networks
 
NetworKIN
 
benchmarking
 
2.5-fold better accuracy
web resources
summary
literature mining is good
data integration is better
Acknowledgments
Reflect & NLP: Evangelos Pafilis, Jasmin Saric, Rossitza Ouzounova, Sean O’Donoghue, Isabel Rojas
STRING & STITCH: Christian von Mering, Michael Kuhn, Manuel Stark, Samuel Chaffron, Philippe Julien, Tobias Doerks, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork
NetworKIN & NetPhorest: Rune Linding, Martin Lee Miller, Gerard Ostheimer, Francesca Diella, Karen Colwill, Jing Jin, Pavel Metalnikov, Vivian Nguyen, Adrian Pasculescu, Jin Gyoon Park, Leona D. Samson, Nikolaj Blom, Rob Russell, Peer Bork, Søren Brunak, Michael Yaffe, Tony Pawson
http://larsjuhljensen.wordpress.com
