Data and Text Mining

Lars Juhl Jensen
sequence analysis
protein networks
de Lichtenberg, Jensen et al., Science, 2005
adverse drug reactions
Campillos, Kuhn et al., Science, 2008
group leader
cofounder
data mining
proteomics
text mining
biomedical literature
electronic health records
protein networks
guilt by association
STRING
Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
computational predictions
gene fusion
Korbel et al., Nature Biotechnology, 2004
gene neighborhood
operons
Korbel et al., Nature Biotechnology, 2004
bidirectional promoters
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
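A minimal sketch of the phylogenetic-profile idea, assuming toy presence/absence data: genes that are present and absent in the same genomes are candidates for a functional association. The gene names, the five genomes, and the choice of Jaccard similarity are illustrative assumptions, not the scoring scheme of any particular resource.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) of each gene across five genomes.
profiles = {
    "geneA": (1, 1, 0, 1, 0),
    "geneB": (1, 1, 0, 1, 0),
    "geneC": (0, 1, 1, 0, 1),
}


def jaccard(p, q):
    """Genomes containing both genes divided by genomes containing either."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


def profile_similarities(profiles):
    """Return {(gene1, gene2): similarity} for every unordered gene pair."""
    return {
        (g1, g2): jaccard(profiles[g1], profiles[g2])
        for g1, g2 in combinations(sorted(profiles), 2)
    }


similarities = profile_similarities(profiles)
```

Here geneA and geneB share an identical profile and score highest, while geneC overlaps with them in only one genome.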
a real example
[Figure labels: cell, cellulosomes, cellulose]
experimental data
gene coexpression
protein interactions
Jensen & Bork, Science, 2008
genetic interactions
Beyer et al., Nature Reviews Genetics, 2007
curated knowledge
complexes
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
many databases
different formats
different identifiers
variable quality
not comparable
not same species
hard work
quality scores
von Mering et al., Nucleic Acids Research, 2005
calibrate vs. gold standard
von Mering et al., Nucleic Acids Research, 2005
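One way to picture calibration against a gold standard is sketched below: rank the pairs from one evidence type by raw score, cut the ranking into bins, and record the fraction of pairs in each bin that the gold standard confirms. The function name, the fixed bin size, and the toy data are assumptions made for illustration; this is a simplification, not the published procedure.

```python
def calibrate(scored_pairs, gold_positive, bin_size):
    """Map raw scores to the fraction of gold-standard positives per bin.

    scored_pairs: list of (pair, raw_score).
    gold_positive: set of pairs treated as true associations.
    Returns [(mean_raw_score, precision), ...] from highest to lowest score.
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    curve = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        mean_score = sum(score for _, score in chunk) / len(chunk)
        hits = sum(1 for pair, _ in chunk if pair in gold_positive)
        curve.append((mean_score, hits / len(chunk)))
    return curve


# Hypothetical raw scores from one evidence type and a toy gold standard.
scored = [
    (("a", "b"), 0.9),
    (("a", "c"), 0.8),
    (("b", "d"), 0.4),
    (("c", "d"), 0.2),
]
gold = {("a", "b"), ("a", "c"), ("c", "d")}
curve = calibrate(scored, gold, bin_size=2)
```

The resulting curve relates a raw score to an estimated probability of being correct, which is what would make scores from different evidence types comparable.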
homology-based transfer
Franceschini et al., Nucleic Acids Research, 2013
missing most of the data
text mining
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
comprehensive lexicon
CDC2
cyclin dependent kinase 1
expansion rules
hCdc2
CDC2
flexible matching
cyclin-dependent kinase 1
cyclin dependent kinase 1
“black list”
SDS
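The ingredients above (lexicon, expansion rules, flexible matching, black list) can be sketched as a small dictionary-based tagger. Everything here is a simplified assumption: the identifiers are placeholders, the only expansion rule is an "h" prefix on single-token names (inferred from the hCdc2 example), and flexible matching is limited to ignoring case and treating hyphens like spaces.

```python
import re

# Toy lexicon: name -> placeholder identifier (hypothetical values).
lexicon = {
    "CDC2": "CDK1",
    "cyclin dependent kinase 1": "CDK1",
    "SDS": "SDS_GENE",
}
# Names that are too ambiguous to tag, stored in normalized form.
blacklist = {"sds"}


def tokens(text):
    """Lower-cased alphanumeric tokens with their character offsets."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"[A-Za-z0-9]+", text)]


def normalize(name):
    """Flexible matching: case and hyphen/space differences are ignored."""
    return " ".join(tok for tok, _, _ in tokens(name))


def build_dictionary(lexicon, blacklist):
    """Normalize names, drop blacklisted ones, and add 'h'-prefixed variants."""
    dictionary = {}
    for name, identifier in lexicon.items():
        key = normalize(name)
        if key in blacklist:
            continue
        dictionary[key] = identifier
        if " " not in key:  # assumed expansion rule for single-token names
            dictionary["h" + key] = identifier
    return dictionary


def tag(text, dictionary, max_len=4):
    """Return (matched_text, identifier) pairs, trying the longest match first."""
    toks = tokens(text)
    matches = []
    i = 0
    while i < len(toks):
        for n in range(min(max_len, len(toks) - i), 0, -1):
            key = " ".join(t for t, _, _ in toks[i:i + n])
            if key in dictionary:
                start, end = toks[i][1], toks[i + n - 1][2]
                matches.append((text[start:end], dictionary[key]))
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return matches


dictionary = build_dictionary(lexicon, blacklist)
text = "hCdc2, also called cyclin-dependent kinase 1, was run on an SDS gel."
matches = tag(text, dictionary)
```

With this toy input, both "hCdc2" and "cyclin-dependent kinase 1" resolve to the same identifier, while "SDS" is left untagged because of the black list.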
augmented browsing
Reflect
browser add-on
real-time text mining
Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
O’Donoghue et al., Journal of Web Semantics, 2010
information extraction
co-mentioning
within documents
within paragraphs
within sentences
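Co-mentioning at the three levels listed above can be sketched as a simple counting exercise. The sketch assumes that paragraphs are separated by blank lines, that sentences end in ".", "!" or "?", and that names are found by case-insensitive substring search (which would wrongly match a name inside a longer one); the example text and names are toy data.

```python
import re
from collections import Counter
from itertools import combinations


def pairs_in(text, names):
    """All unordered pairs of names that both occur in the text."""
    lowered = text.lower()
    found = sorted(n for n in names if n.lower() in lowered)
    return list(combinations(found, 2))


def comention_counts(documents, names):
    """Count co-mentions per pair at document, paragraph, and sentence level."""
    counts = {
        "document": Counter(),
        "paragraph": Counter(),
        "sentence": Counter(),
    }
    for doc in documents:
        counts["document"].update(pairs_in(doc, names))
        for paragraph in re.split(r"\n\s*\n", doc):
            counts["paragraph"].update(pairs_in(paragraph, names))
            for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
                counts["sentence"].update(pairs_in(sentence, names))
    return counts


# Toy corpus with one document made of two paragraphs.
docs = ["CDK1 binds CCNB1. WEE1 was also studied.\n\nWEE1 inhibits CDK1."]
names = ["CDK1", "CCNB1", "WEE1"]
counts = comention_counts(docs, names)
```

In this toy corpus, CCNB1 and WEE1 are co-mentioned in the document and in a paragraph but never in the same sentence, so a scoring scheme could weight the narrower levels more heavily.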
text corpus
~22 million abstracts
no access
millions of full-text articles
localization and disease
general approach
COMPARTMENTS
TISSUES
DISEASES
curated knowledge
experimental data
text mining
computational predictions
common identifiers
quality scores
visualization
compartments.jensenlab.org
tissues.jensenlab.org
dissemination
web interfaces
web services
diseases.jensenlab.org
bulk download
Acknowledgments
STRING
Christian von Mering
Damian Szklarczyk
Michael Kuhn
Manuel Stark
Samuel Chaffron
Chris Creevey
Jean Muller
Tobias Doerks
Philippe Julien
Alexander Roth
Milan Simonovic
Jan Korbel
Berend Snel
Martijn Huynen
Peer Bork

Text mining
Sune Frankild
Evangelos Pafilis
Kalliopi Tsafou
Alberto Santos
Janos Binder
Heiko Horn
Michael Kuhn
Nigel Brown
Reinhardt Schneider
Sean O’Donoghue
