Network biology: Large-scale data and text mining


Lars Juhl Jensen

This document discusses network biology and large-scale text mining. It describes how computational predictions, experimental data, curated knowledge, and text mining are integrated into protein networks for many species, even though the source databases use different formats and identifiers and vary in quality. Quality scores calibrated against a gold standard make the sources comparable. It also describes how named entity recognition, based on a comprehensive lexicon with expansion rules, flexible matching, and a black list, is combined with co-mentioning to extract associations from millions of abstracts and full-text articles. These associations link proteins to each other and to subcellular compartments, tissues, and diseases. The results are made available through web interfaces, web services, and bulk downloads.

Network biology
Large-scale data and text mining
Lars Juhl Jensen

guilt by association


protein networks

STRING

computational predictions

gene fusion

Korbel et al., Nature Biotechnology, 2004
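
The gene fusion method infers a functional link between two proteins when their homologs are fused into a single protein in another genome. The sketch below shows one way to do this. It assumes homology search has already been run, with the hits supplied as (reference gene, start, end) regions on each protein from the other genomes. The input format, the function name, and the gene names in the example are hypothetical, and a real pipeline would apply stricter checks on alignment quality and coverage.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical input format: a homology hit is (reference gene, start, end) of
# the matched region on a protein from another genome, 1-based and inclusive.
Hit = Tuple[str, int, int]


def fusion_links(hits_per_protein: Dict[str, List[Hit]]) -> Set[Tuple[str, str]]:
    """Return pairs of reference genes whose homologs appear as separate,
    non-overlapping regions of one protein in another genome."""
    links: Set[Tuple[str, str]] = set()
    for hits in hits_per_protein.values():
        for (g1, s1, e1), (g2, s2, e2) in combinations(hits, 2):
            if g1 == g2:
                continue  # two hits to the same gene say nothing about a fusion
            # The two reference genes must match distinct parts of the protein.
            if e1 < s2 or e2 < s1:
                links.add(tuple(sorted((g1, g2))))
    return links
```

For example, if one protein matches geneA over residues 1 to 250 and geneB over residues 260 to 640, the pair (geneA, geneB) is reported. Overlapping matches are ignored because they more likely reflect a shared domain than a fusion.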

gene neighborhood

Korbel et al., Nature Biotechnology, 2004
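
The gene neighborhood method relies on the observation that functionally linked genes, particularly in prokaryotes, tend to remain next to each other across genomes. The following sketch counts the number of genomes in which two groups are direct neighbors on the same strand. It assumes each genome has already been reduced to an ordered list of (orthologous group, strand) tuples. The data layout and the `min_genomes` threshold are illustrative assumptions, not the published scoring.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical layout: each genome is an ordered list of (orthologous group, strand).
Gene = Tuple[str, str]


def conserved_neighbors(
    genomes: Dict[str, List[Gene]], min_genomes: int = 2
) -> Dict[Tuple[str, str], int]:
    """Count the genomes in which two groups are adjacent on the same strand."""
    counts: Counter = Counter()
    for genes in genomes.values():
        adjacent = set()
        for (g1, strand1), (g2, strand2) in zip(genes, genes[1:]):
            if strand1 == strand2 and g1 != g2:
                adjacent.add(tuple(sorted((g1, g2))))
        counts.update(adjacent)  # each pair counts at most once per genome
    # Keep only pairs whose adjacency is conserved in enough genomes.
    return {pair: n for pair, n in counts.items() if n >= min_genomes}
```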

phylogenetic profiles

Korbel et al., Nature Biotechnology, 2004
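
A phylogenetic profile records which genomes contain a homolog of a given gene. Genes with similar patterns of presence and absence are predicted to be functionally linked. The minimal sketch below assumes binary profiles over the same ordered list of genomes. It scores similarity as the fraction of genomes in which the two genes agree, which is an illustrative choice; published methods use more refined measures.

```python
from typing import Dict, List, Tuple


def profile_similarity(p: List[int], q: List[int]) -> float:
    """Fraction of genomes in which two genes share presence (1) or absence (0)."""
    if len(p) != len(q) or not p:
        raise ValueError("profiles must be non-empty and cover the same genomes")
    return sum(a == b for a, b in zip(p, q)) / len(p)


def rank_partners(profiles: Dict[str, List[int]], query: str) -> List[Tuple[str, float]]:
    """Rank all other genes by profile similarity to the query gene."""
    scores = [
        (gene, profile_similarity(profiles[query], profile))
        for gene, profile in profiles.items()
        if gene != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Shared absences count as agreement here. As a result, genes missing from most genomes can look similar by default, which is one reason this simple score is only a starting point.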

experimental data

gene coexpression
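
Coexpression is commonly quantified as the correlation between two genes' expression levels across many samples or conditions. The minimal Pearson correlation sketch below assumes the expression values have already been normalized and are aligned by condition. It is not the specific procedure used for STRING.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two expression profiles measured in the same conditions."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two aligned profiles with at least two conditions")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (sd_x * sd_y)
```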


protein interactions

Jensen & Bork, Science, 2008

curated knowledge

complexes

pathways

Letunic & Bork, Trends in Biochemical Sciences, 2008

many databases

different formats

different identifiers

variable quality

not comparable

not same species

hard work

quality scores

von Mering et al., Nucleic Acids Research, 2005

calibrate vs. gold standard

von Mering et al., Nucleic Acids Research, 2005
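
Calibrating against a gold standard turns each source's raw scores into comparable probabilities. The minimal sketch below assumes the gold standard is given as two sets of sorted gene pairs. Positives are, for example, proteins annotated to the same pathway. Negatives are pairs in which both proteins are annotated but never together. Raw scores are grouped into half-open bins. The binning and all names are illustrative assumptions. Pairs outside the gold standard are skipped because they cannot be benchmarked.

```python
import bisect
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]


def calibrate(
    raw_scores: Dict[Pair, float],
    positives: Set[Pair],
    negatives: Set[Pair],
    bin_edges: List[float],
) -> List[Optional[float]]:
    """Fraction of benchmarkable pairs in each bin [edge_i, edge_i+1) that are positives."""
    n_bins = len(bin_edges) - 1
    hits = [0] * n_bins
    totals = [0] * n_bins
    for (a, b), score in raw_scores.items():
        pair = tuple(sorted((a, b)))
        if pair in positives:
            label = 1
        elif pair in negatives:
            label = 0
        else:
            continue  # not in the gold standard, so it cannot be benchmarked
        i = bisect.bisect_right(bin_edges, score) - 1
        if 0 <= i < n_bins:  # scores outside the edges are ignored
            hits[i] += label
            totals[i] += 1
    # None marks bins that received no benchmarkable pairs.
    return [h / t if t else None for h, t in zip(hits, totals)]
```

The value returned for each bin can then replace the raw score of every pair that falls into that bin.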

homology-based transfer

Franceschini et al., Nucleic Acids Research, 2013
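
Homology-based transfer uses orthologs to carry associations observed in one species over to another. The minimal sketch below assumes a precomputed mapping from source-species genes to their target-species orthologs. The `penalty` factor is hypothetical. It stands in for whatever down-weighting of transferred evidence a real pipeline would use.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def transfer_links(
    source_links: Dict[Pair, float],
    orthologs: Dict[str, Set[str]],
    penalty: float = 0.8,  # hypothetical down-weighting of transferred evidence
) -> Dict[Pair, float]:
    """Project scored links onto the target species via ortholog mappings."""
    transferred: Dict[Pair, float] = {}
    for (a, b), score in source_links.items():
        # Links whose partners lack orthologs in the target species are dropped.
        for ta in orthologs.get(a, set()):
            for tb in orthologs.get(b, set()):
                if ta == tb:
                    continue
                pair = tuple(sorted((ta, tb)))
                # Keep the strongest evidence if several source links map to one pair.
                transferred[pair] = max(transferred.get(pair, 0.0), score * penalty)
    return transferred
```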

missing most of the data

text mining

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks


named entity recognition

comprehensive lexicon

CDC2

cyclin dependent kinase 1
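
Dictionary-based named entity recognition looks up names from a comprehensive lexicon in the text and maps each one to a single identifier. This way, CDC2 and cyclin dependent kinase 1 resolve to the same protein. The sketch below is a minimal longest-match tagger. It assumes the lexicon is a plain dict from space-separated names to identifiers. The identifier "CDK1" and the `max_words` limit are illustrative choices.

```python
import re
from typing import Dict, List, Tuple


def tag(text: str, lexicon: Dict[str, str], max_words: int = 6) -> List[Tuple[str, str]]:
    """Return (name, identifier) for each lexicon name found, preferring longest matches."""
    tokens = re.findall(r"\w+", text)  # words only; punctuation is dropped
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                matches.append((phrase, lexicon[phrase]))
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return matches
```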

expansion rules

hCdc2

CDC2
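
Expansion rules generate extra synonyms from lexicon entries. This lets variants such as hCdc2, which adds a species prefix and changes the case, be recognized as CDC2. The minimal sketch below assumes only two hypothetical rules: case variants and a species prefix. A real rule set would be larger and would have to avoid generating ambiguous names.

```python
from typing import Dict, Iterable, Set


def expand(name: str, species_prefixes: Iterable[str] = ("h",)) -> Set[str]:
    """Generate case variants of a name, plus each variant with a species prefix."""
    base = {name, name.upper(), name.lower(), name.capitalize()}
    variants = set(base)
    for prefix in species_prefixes:
        variants |= {prefix + v for v in base}
    return variants


def build_expanded_lexicon(lexicon: Dict[str, str]) -> Dict[str, str]:
    """Map every generated variant to the identifier of the name it came from."""
    expanded: Dict[str, str] = {}
    for name, ident in lexicon.items():
        for variant in expand(name):
            # Keep the first identifier on collisions; a real system should
            # flag such ambiguous variants instead.
            expanded.setdefault(variant, ident)
    return expanded
```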

flexible matching

cyclin-dependent kinase 1

cyclin dependent kinase 1
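
Flexible matching treats trivially different spellings as the same name, for example cyclin-dependent kinase 1 and cyclin dependent kinase 1. It does this by normalizing both the lexicon and the text before comparing them. The minimal sketch below assumes normalization means lower-casing and collapsing hyphens and whitespace. Lower-casing everything is a simplification, since short gene symbols can become ambiguous when case is ignored.

```python
import re
from typing import Dict, Optional


def normalize(name: str) -> str:
    """Lower-case and treat runs of hyphens and whitespace as a single space."""
    return re.sub(r"[\s\-]+", " ", name.lower()).strip()


def build_index(lexicon: Dict[str, str]) -> Dict[str, str]:
    """Index lexicon entries by their normalized form."""
    return {normalize(name): ident for name, ident in lexicon.items()}


def lookup(name: str, index: Dict[str, str]) -> Optional[str]:
    """Find the identifier for a name after normalizing it the same way."""
    return index.get(normalize(name))
```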

“black list”

SDS
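
A black list removes names that are formally in the lexicon but almost always mean something else in text. SDS, for example, is a human gene symbol but usually refers to sodium dodecyl sulfate. The minimal sketch below assumes the tagger returns (matched name, identifier) pairs. The placeholder identifiers are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Match = Tuple[str, str]  # (matched name, identifier)


def apply_blacklist(matches: Iterable[Match], blacklist: Set[str]) -> List[Match]:
    """Drop matches whose surface name is black-listed (case-sensitive comparison)."""
    return [(name, ident) for name, ident in matches if name not in blacklist]
```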

augmented browsing

Reflect

browser add-on

real-time text mining

Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
O’Donoghue et al., Journal of Web Semantics, 2010

information extraction

co-mentioning

within documents

within paragraphs

within sentences
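
Co-mentioning counts how often two entities are mentioned together. Co-mentions in the same paragraph or sentence earn more credit than co-mentions that only share a document. The minimal sketch below assumes each document is a list of paragraphs, each paragraph a list of sentences, and each sentence the set of entity identifiers the tagger found in it. Several parts are illustrative assumptions, not published parameters: the equal nested weights W_DOC, W_PARAGRAPH and W_SENTENCE, the exponent `alpha`, and the score that blends the raw count with an observed-over-expected ratio.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]
W_DOC, W_PARAGRAPH, W_SENTENCE = 1.0, 1.0, 1.0  # hypothetical weights


def _pairs(entities: Set[str]) -> Set[Pair]:
    return {tuple(sorted(p)) for p in combinations(entities, 2)}


def co_mention_counts(documents: List[List[List[Iterable[str]]]]) -> Dict[Pair, float]:
    """Weighted co-mention counts; each level counts at most once per document."""
    counts: Dict[Pair, float] = defaultdict(float)
    for doc in documents:
        doc_entities: Set[str] = set()
        paragraph_pairs: Set[Pair] = set()
        sentence_pairs: Set[Pair] = set()
        for paragraph in doc:
            paragraph_entities: Set[str] = set()
            for sentence in paragraph:
                entities = set(sentence)
                sentence_pairs |= _pairs(entities)
                paragraph_entities |= entities
            paragraph_pairs |= _pairs(paragraph_entities)
            doc_entities |= paragraph_entities
        for p in _pairs(doc_entities):
            counts[p] += W_DOC
        for p in paragraph_pairs:
            counts[p] += W_PARAGRAPH
        for p in sentence_pairs:
            counts[p] += W_SENTENCE
    return dict(counts)


def co_mention_scores(counts: Dict[Pair, float], alpha: float = 0.6) -> Dict[Pair, float]:
    """Blend the raw count with an observed/expected ratio (assumed form)."""
    totals: Dict[str, float] = defaultdict(float)
    for (a, b), c in counts.items():
        totals[a] += c
        totals[b] += c
    grand_total = sum(counts.values())
    return {
        (a, b): c ** alpha * (c * grand_total / (totals[a] * totals[b])) ** (1 - alpha)
        for (a, b), c in counts.items()
    }
```

With these weights, a pair mentioned in the same sentence collects 3, a pair that only shares a paragraph collects 2, and a pair that only shares a document collects 1. The ratio term keeps frequently mentioned entities from dominating purely through volume.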

text corpus

~22 million abstracts

no access

millions of full-text articles


localization and disease

general approach

COMPARTMENTS

TISSUES

DISEASES

curated knowledge

experimental data

text mining

computational predictions

common identifiers

quality scores
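
Once every evidence channel carries a calibrated score, the channels can be merged into one confidence score per association. The minimal sketch below assumes the channels are independent and combines them as 1 - prod(1 - s_i), a naive-Bayes-style rule of the kind described for STRING. Any prior correction is omitted. Applying the same rule to localization and disease associations is an assumption.

```python
from math import prod
from typing import Dict


def combine_channels(scores: Dict[str, float]) -> float:
    """Combine calibrated per-channel scores, assuming independent channels."""
    for channel, s in scores.items():
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score for {channel!r} must be in [0, 1]")
    # Probability that at least one channel is right if the scores are
    # independent probabilities; no channels gives a score of 0.
    return 1.0 - prod(1.0 - s for s in scores.values())
```

For example, two channels that each give 0.5 combine to 0.75. This is stronger than either channel alone but still short of certainty.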

visualization

compartments.jensenlab.org

tissues.jensenlab.org

dissemination

web interfaces


web services

diseases.jensenlab.org

bulk download

Acknowledgments

STRING: Christian von Mering, Damian Szklarczyk, Michael Kuhn, Manuel Stark, Samuel Chaffron, Chris Creevey, Jean Muller, Tobias Doerks, Philippe Julien, Alexander Roth, Milan Simonovic, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork

Text mining: Sune Frankild, Evangelos Pafilis, Kalliopi Tsafou, Alberto Santos, Janos Binder, Heiko Horn, Michael Kuhn, Nigel Brown, Reinhardt Schneider, Sean O’Donoghue
