Integration of diverse large-scale datasets

1. Integration of diverse large-scale datasets

2. Lars Juhl Jensen

6. promoter analysis

7. Jensen et al., Bioinformatics, 2000

8. DNA structure

9. genome visualization

10. Pedersen et al., Journal of Molecular Biology, 2000

11. microarray normalization

12. Workman et al., Genome Biology, 2002

13. protein function prediction

18. STRING

20. integrate diverse evidence

21. functional interactions

22. Bork et al., Current Opinion in Structural Biology, 2005

23. 179 proteomes

24. evolution

27. statistics

28. (the original sin)

29. prokaryotes

30. genomic context methods

31. gene fusion

33. gene neighborhood

35. phylogenetic profiles
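A minimal sketch of the idea behind phylogenetic profiles: each gene is described by its presence or absence across a set of genomes, and genes with similar profiles are candidates for a functional link. The slides do not say which similarity measure was used; the Jaccard index below, the function name, and the example profiles are assumptions made for illustration.

```python
def profile_similarity(profile_a, profile_b):
    """Jaccard similarity between two presence/absence profiles.

    Each profile is a sequence of 0/1 flags, one per genome, in the
    same genome order. This measure is an illustrative choice only.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    # Genomes in which both genes are present.
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    # Genomes in which at least one of the two genes is present.
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Hypothetical profiles over four genomes.
gene_x = [1, 1, 0, 0]
gene_y = [1, 1, 0, 0]
gene_z = [1, 0, 1, 0]
print(profile_similarity(gene_x, gene_y))
print(profile_similarity(gene_x, gene_z))
```

Identical profiles score highest, while genes that co-occur in only a few genomes score low.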

40. Cell Cellulosomes Cellulose

41. eukaryotes

42. integrate diverse datasets

43. Jensen et al., Drug Discovery Today: Targets, 2004

44. curated knowledge

45. MIPS Munich Information center for Protein Sequences

46. KEGG Kyoto Encyclopedia of Genes and Genomes

47. STKE Signal Transduction Knowledge Environment

48. Reactome

49. literature mining

50. MEDLINE

51. SGD Saccharomyces Genome Database

52. The Interactive Fly

53. OMIM Online Mendelian Inheritance in Man

54. co-mentioning

55. NLP Natural Language Processing

56. Gene and protein names; cue words for entity recognition; verbs for relation extraction. [nxgene The GAL4 gene] [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

58. primary experimental data

59. microarray expression data

60. GEO Gene Expression Omnibus

61. physical protein interactions

62. BIND Biomolecular Interaction Network Database

63. MINT Molecular Interactions Database

64. GRID General Repository for Interaction Datasets

65. DIP Database of Interacting Proteins

66. HPRD Human Protein Reference Database

67. problems

68. many sources

69. (different gene identifiers)

70. many types of evidence

71. questionable quality

72. not directly comparable

73. spread over many species

74. huge synonyms lists

75. calculate raw quality scores

76. calibrate vs. gold standard

77. KEGG Kyoto Encyclopedia of Genes and Genomes

78. von Mering et al., Nucleic Acids Research, 2005

79. transfer based on orthology

80. combine all evidence
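A minimal sketch of one way to combine calibrated scores from several evidence types into a single confidence value. It assumes each score is a probability that the interaction is real and that the evidence types are independent (a naive-Bayes-style combination). Whether this matches the exact procedure behind the slide is an assumption, and the function name and example values are illustrative.

```python
def combine_scores(scores):
    """Combine per-evidence probabilities assuming independence.

    The combined score is the probability that at least one line of
    evidence is correct: 1 - product(1 - s_i).
    """
    prob_all_wrong = 1.0
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must be probabilities in [0, 1]")
        prob_all_wrong *= 1.0 - s
    return 1.0 - prob_all_wrong


# Hypothetical calibrated scores for one protein pair, e.g. from
# gene neighborhood, co-mentioning, and microarray co-expression.
print(combine_scores([0.5, 0.5]))
```

Under this assumption, several weak but independent lines of evidence add up to a higher combined confidence than any one of them alone.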

81. Bork et al., Current Opinion in Structural Biology, 2005

82. cell cycle

83. qualitative modeling

85. Chen et al., Molecular Biology of the Cell, 2004

86. Chen et al., Molecular Biology of the Cell, 2004

87. synchronized cell culture

89. microarray time series

91. periodically expressed genes

93. S. cerevisiae

94. Cho et al.

95. Spellman et al.

96. numerous analysis methods

97. Cho et al.

98. Spellman et al.

99. Zhao et al.

100. Johansson et al.

101. Luan and Li

102. Lu et al.

103. Ahdesmäki et al.

104. Willbrand et al.

105. no benchmarking

106. de Lichtenberg et al., Bioinformatics, 2005

107. reproducibility

109. regulation vs. periodicity

111. list of 600 periodic genes

112. S. pombe

113. several expression studies

114. reproducibility

115. Marguerat et al., Yeast, 2006

116. name inconsistencies

118. different analysis methods

119. no benchmarking

122. too many genes suggested

125. averaging better than voting
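A minimal sketch contrasting the two ways of merging gene lists from several studies. The slide states only the conclusion; the rank-based averaging, the cutoff-based voting, and all names and data below are assumptions made for illustration.

```python
def combine_by_averaging(rankings):
    """Order genes by their mean rank across all studies (best first)."""
    positions = [{g: i + 1 for i, g in enumerate(r)} for r in rankings]
    genes = set(positions[0])
    if any(set(p) != genes for p in positions):
        raise ValueError("all rankings must contain the same genes")
    mean_rank = {g: sum(p[g] for p in positions) / len(positions)
                 for g in genes}
    # Ties are broken alphabetically to keep the output deterministic.
    return sorted(genes, key=lambda g: (mean_rank[g], g))


def combine_by_voting(rankings, top_n, min_votes):
    """Keep genes that appear in the top_n of at least min_votes studies."""
    votes = {}
    for ranking in rankings:
        for gene in ranking[:top_n]:
            votes[gene] = votes.get(gene, 0) + 1
    return sorted(g for g, v in votes.items() if v >= min_votes)


# Three hypothetical studies ranking the same four genes.
studies = [
    ["A", "B", "C", "D"],
    ["B", "A", "D", "C"],
    ["A", "C", "B", "D"],
]
print(combine_by_averaging(studies))
print(combine_by_voting(studies, top_n=2, min_votes=2))
```

Averaging uses every study's full ranking, whereas voting discards everything below each study's cutoff, which is one plausible reason for averaging to perform better.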

127. S. cerevisiae

128. list of 600 periodic genes

129. protein interaction data

131. von Mering et al., Nucleic Acids Research, 2005

132. de Lichtenberg et al., Science, 2005

133. dynamic proteins

134. static proteins

136. reproduces what is known

138. many detailed predictions

140. global trends

141. dynamic proteins

143. static proteins

145. just-in-time assembly

148. coordinated regulation

149. periodically expressed genes

150. Cdc28p substrates

151. PEST degradation signals

152. the human interactome

153. yeast two-hybrid

154. [Venn diagram: Stelzl et al., Rual et al., small-scale studies; region counts 1936, 13, 4, 4, 1385, 65, 18465]

156. [Figure: small-scale studies, Stelzl et al., Rual et al.; values 62, 8, 39, 852, 17, 473, 432, 69, 260]

157. 3.5% and 21% sensitivity

158. in a couple of years

159. the human interactome

160. 100% = 1/5?

161. the yeast interactome

162. five years ago

163. yeast two-hybrid

164. [Venn diagram: Uetz et al., Ito et al., small-scale studies; region counts 1150, 117, 117, 72, 4053, 118, 4469]

165. [Venn diagram: Uetz et al., Ito et al., small-scale studies; region counts 162, 53, 34, 72, 180, 29, 338]

166. [Figure: small-scale studies, Uetz et al., Ito et al.; values 511, 189, 616, 439, 178, 759, 897, 190, 1347]

167. 19% and 12% sensitivity

168. the challenge

169. how to get from here …

171. … to there …

173. Acknowledgments. The STRING team (EMBL): Christian von Mering, Berend Snel, Martijn Huynen, Sean Hooper, Mathilde Foglierini, Julien Lagarde, Peer Bork. Literature mining project (EML Research): Jasmin Saric, Rossitza Ouzounova, Isabel Rojas. Cell cycle studies (CBS): Ulrik de Lichtenberg, Thomas Skøt Jensen, Søren Brunak. S. pombe cell cycle (Sanger): Samuel Marguerat, Jürg Bähler. Inspiration for presentation: Lawrence Lessig, Dick Clarence Hardt, Anders Gorm Pedersen.

174. Thank you!
