Large-scale integration of data and text
Lars Juhl Jensen
association networks
text mining
localization and diseases
me
promoter analysis
Jensen & Knudsen, Bioinformatics, 2000
function prediction
Jensen, Gupta et al., Journal of Molecular Biology, 2002
protein networks
de Lichtenberg, Jensen et al., Science, 2005
chemoinformatics
Campillos, Kuhn et al., Science, 2008
data mining
text mining
electronic health records
association networks
guilt by association
STRING
~2.6 million proteins
Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
STITCH
~300,000 small molecules
Kuhn et al., Nucleic Acids Research, 2012
genomic context
gene fusion
Korbel et al., Nature Biotechnology, 2004
operons
Korbel et al., Nature Biotechnology, 2004
bidirectional promoters
Korbel et al., Nature Biotechnology, 2004
metagenome neighborhood
Harrington et al., PNAS, 2007
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
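Phylogenetic profiles can be compared with a simple similarity measure. The sketch below assumes each gene family is described by a presence/absence vector over a fixed list of genomes and uses Jaccard similarity; the gene names, the profiles, and the choice of Jaccard are illustrative assumptions, not the scoring used in STRING.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two presence/absence profiles."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0

# Hypothetical profiles: 1 = gene family present in that genome, 0 = absent.
profiles = {
    "geneA": [1, 1, 0, 1, 0, 1],
    "geneB": [1, 1, 0, 1, 0, 0],
    "geneC": [0, 0, 1, 0, 1, 0],
}

# Genes that are gained and lost together get a high similarity.
sim_ab = jaccard(profiles["geneA"], profiles["geneB"])
sim_ac = jaccard(profiles["geneA"], profiles["geneC"])
```

Here geneA and geneB co-occur in most genomes, while geneA and geneC never do, so only the first pair would be a candidate functional association.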
a real example
[Figure labels: Cell, Cellulosomes, Cellulose]
experimental data
gene coexpression
protein interactions
Jensen & Bork, Science, 2008
curated knowledge
drug targets
complexes
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
many databases
different formats
different identifiers
variable quality
not comparable
hard work
quality scores
von Mering et al., Nucleic Acids Research, 2005
calibrate vs. gold standard
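One way to picture calibration against a gold standard: bin the raw scores of a source and record, per bin, the fraction of pairs that the gold standard confirms. The sketch below assumes every scored pair is covered by the gold standard, so a pair missing from `gold_positive` counts as a negative; the pairs, scores, and bin edges are made up for illustration and this is not the published STRING procedure.

```python
def calibrate(scored_pairs, gold_positive, bin_edges):
    """Per raw-score bin, return the fraction of pairs found in the gold standard."""
    n_bins = len(bin_edges) - 1
    totals = [0] * n_bins
    hits = [0] * n_bins
    for pair, raw in scored_pairs:
        for i in range(n_bins):
            if bin_edges[i] <= raw < bin_edges[i + 1]:
                totals[i] += 1
                hits[i] += pair in gold_positive
                break
    # None marks bins without any pairs, where no estimate is possible.
    return [h / t if t else None for h, t in zip(hits, totals)]

# Hypothetical raw scores from one evidence source.
scored_pairs = [
    (("A", "B"), 0.9),
    (("A", "C"), 0.8),
    (("B", "C"), 0.3),
    (("C", "D"), 0.2),
    (("A", "D"), 0.1),
]
gold_positive = {("A", "B"), ("B", "C")}  # hypothetical gold standard
precision_per_bin = calibrate(scored_pairs, gold_positive, [0.0, 0.5, 1.01])
```

Once every source is mapped onto the same scale (fraction confirmed by the gold standard), scores from different databases become comparable.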
missing most of the data
text mining
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
comprehensive lexicon
cyclin dependent kinase 1
CDK1
CDC2
flexible matching
spaces and hyphens
cyclin dependent kinase 1
cyclin-dependent kinase 1
orthographic variation
CDC2
hCdc2
“black list”
SDS
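The lexicon, flexible matching, and black list can be sketched with regular expressions. This is a minimal illustration, not the tagger used in the talk: the tiny lexicon, the optional lowercase "h" prefix, and the handling of "SDS" are assumptions chosen to mirror the examples above.

```python
import re

# Tiny illustrative lexicon: name -> entity it refers to.
LEXICON = {
    "cyclin dependent kinase 1": "CDK1",
    "CDK1": "CDK1",
    "CDC2": "CDK1",
    "SDS": "SDS",  # also a common lab reagent abbreviation, hence blacklisted
}
BLACKLIST = {"sds"}

def name_to_pattern(name):
    """Build a pattern in which spaces and hyphens are interchangeable or absent."""
    tokens = re.split(r"[\s-]+", name)
    body = r"[\s-]?".join(re.escape(token) for token in tokens)
    # Assumption: allow an optional species prefix "h" (as in hCdc2).
    return rf"\bh?{body}\b"

def tag(text):
    """Return (matched text, entity) for every non-blacklisted lexicon match."""
    hits = []
    for name, entity in LEXICON.items():
        if name.lower() in BLACKLIST:
            continue
        for match in re.finditer(name_to_pattern(name), text, flags=re.IGNORECASE):
            hits.append((match.group(0), entity))
    return hits

hits = tag("hCdc2, also called cyclin-dependent kinase 1, was run on an SDS gel.")
```

Case-insensitive matching covers CDC2 versus Cdc2, the separator class covers spaces versus hyphens, and the black list keeps "SDS" from being tagged as a gene.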
information extraction
count co-mentioning
within documents
within paragraphs
within sentences
scoring scheme
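Co-mention counting at the three levels can be sketched as a weighted count. The example assumes the text has already been tagged, so each sentence is a set of entity identifiers; the weights and the toy corpus are hypothetical, and the actual scoring scheme may differ.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights: closer co-mentions count more.
W_DOC, W_PAR, W_SEN = 1.0, 2.0, 3.0

def comention_scores(corpus):
    """Sum weighted co-mentions per entity pair over all documents.

    corpus: list of documents; a document is a list of paragraphs;
    a paragraph is a list of sentences; a sentence is a set of entity names.
    """
    scores = defaultdict(float)
    for document in corpus:
        par_pairs, sen_pairs = set(), set()
        doc_entities = set()
        for paragraph in document:
            par_entities = set()
            for sentence in paragraph:
                sen_pairs.update(combinations(sorted(sentence), 2))
                par_entities |= sentence
            par_pairs.update(combinations(sorted(par_entities), 2))
            doc_entities |= par_entities
        for pair in combinations(sorted(doc_entities), 2):
            scores[pair] += W_DOC + W_PAR * (pair in par_pairs) + W_SEN * (pair in sen_pairs)
    return dict(scores)

corpus = [
    [[{"CDK1", "CCNB1"}, {"TP53"}], [{"MDM2"}]],  # document 1: two paragraphs
    [[{"CDK1"}, {"CCNB1"}]],                      # document 2: one paragraph
]
scores = comention_scores(corpus)
```

Pairs mentioned in the same sentence accumulate all three weights, while pairs that only share a document receive the document weight alone.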
corpora
~22 million abstracts
no access
~4 million full-text articles
augmented browsing
Reflect
browser add-on
real-time text mining
Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
O’Donoghue et al., Journal of Web Semantics, 2010
localization and disease
small molecules
proteins
compartments
tissues
diseases
organisms
environments
suite of web resources
common backend database
jensenlab.org
text mining
curated knowledge
experimental data
computational predictions
quality scores
web-centric databases
DISEASES
visualization
COMPARTMENTS
compartments.jensenlab.org
TISSUES
tissues.jensenlab.org
project onto networks
Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
compartments.jensenlab.org
tissues.jensenlab.org
diseases.jensenlab.org
summary
bioinformatics
more than alignment
data/text mining
save you much time
Acknowledgments
STRING/STITCH: Christian von Mering, Damian Szklarczyk, Michael Kuhn, Manuel Stark, Samuel Chaffron, Chris Creevey, Jean Muller, Tobias Doerks, Philippe Julien, Alexander Roth, Milan Simonovic, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork
Literature mining: Sune Frankild, Evangelos Pafilis, Janos Binder, Kalliopi Tsafou, Alberto Santos, Heiko Horn, Michael Kuhn, Nigel Brown, Reinhardt Schneider, Sean O’Donoghue
Questions?
