bio data

Research
Search extension transforms Wiki into a relational system: A case for flavonoid
metabolite database

Masanori Arita1,2,3* and Kazuhiro Suwa1

* Corresponding author: Masanori Arita arita@k.u-tokyo.ac.jp

Author Affiliations

1 Department of Computational Biology, Graduate School of Frontier Sciences, The
University of Tokyo, Kashiwanoha 5-1-5 CB05, Kashiwa, Japan

2 Metabolome Informatics Unit, Plant Science Center, RIKEN, Japan

3 Institute for Advanced Biosciences, Keio University, Japan


BioData Mining 2008, 1:7 doi:10.1186/1756-0381-1-7

The electronic version of this article is the complete one and can be found
online at: http://www.biodatamining.org/content/1/1/7

Received: 23 May 2008
Accepted: 17 September 2008
Published: 17 September 2008

© 2008 Arita and Suwa; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which
permits unrestricted use, distribution, and reproduction in any medium, provided
the original work is properly cited.
Abstract
Background

In computer science, database systems are based on the relational model
introduced by Edgar Codd in 1970. In biology, on the other hand, the word
'database' often refers to loosely formatted, very large text files. Such
bio-databases have the advantage that they can record conflicts or ambiguities
(e.g. a protein pair reported both to interact and not to interact, or unknown
parameters), but the flexibility of the data format comes at the cost of a
systematic query mechanism equivalent to the widely used SQL.
Results

To overcome this disadvantage, we propose embeddable string-search commands for
a Wiki-based system and have designed a half-formatted database. As proof of
principle, a flavonoid database with 6902 molecular structures from over 1687
plant species was implemented on MediaWiki, the background system of Wikipedia.
Registered users can describe any information in an arbitrary format. The
structured part is subject to text-string searches that realize relational
operations. The system was written in PHP as an extension of MediaWiki. All
modifications are open-source and publicly available.
Conclusion

This scheme benefits from both the free-formatted Wiki style and the concise,
structured relational-database style. MediaWiki supports multi-user environments
for document management, which reduces the cost of database maintenance.
Background
Why is database maintenance unappreciated?

In many research fields, building or maintaining a database system is not a
sought-after task and researchers tend to avoid the chore because: 1) the
inputting and checking of data are routine and tedious, 2) novel findings are
rarely based on a collection of old data, 3) database developers often do not
receive deserved credit especially when data are distributed for free, and 4) it
is difficult to evaluate the quality and value of data. However, most
bioinformatics research requires high-quality databases. Their significance is
clear from the success of major data-servicing institutes such as the National
Center for Biotechnology Information (NCBI; USA) or the European Bioinformatics
Institute (EBI; UK). Without doubt, data collection and management are important
activities in scientific research.
Input, management, and query are the keys

How can we promote the development and maintenance of high-quality databases?
The key processes in data handling are the input, management, and query of data.
The effort required for the first two activities can be significantly reduced by
introducing a Wiki-based system. Wikipedia, a web-based free encyclopaedia, for
example, continues its rapid growth despite criticism by experts of 'lack of
quality control' and 'vulnerability to website vandals' [1]. Its English version
now boasts over 2 million articles, followed by 0.7 and 0.6 million in German
and French, respectively [2]. What remains unsupported is flexibility in query
mechanisms and in presentation, such as displaying customized statistics in a
user-friendly way.

In the rapidly evolving frontiers of biology research, the development of
flexible and systematic query mechanisms has not been pursued actively.
Biological data and their formats often co-evolve; the definition of data type
in a repository requires frequent, major updates. This is not compatible with
the relational model proposed by E. Codd [3]. The relational model has been the
gold standard for data management for decades, but it requires the database
schema to be fixed prior to data input. Currently, many biologists prefer using
a spreadsheet such as Excel (Microsoft Corp., Seattle, WA, USA) or a simple text
file to organize their experimental results, and biology databases provide only
full-text searches to access large-scale, unstructured text data. Ideally, a
bio-database needs to
serve as a searchable, digital laboratory notebook where users can efficiently
organize and query data. As most Wiki systems only provide a collection of
independent pages with weak query functions, we tested the possibility of
installing a flexible query mechanism on a Wiki-based system. Here we propose an
extension of MediaWiki that can emulate traditional database operations [4]. We
demonstrate our idea with molecular information on flavonoids, a major category
of plant secondary metabolites.
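
The idea of emulating relational operations by string search over half-formatted pages can be illustrated with a minimal Python sketch. It is not the authors' PHP extension: the `| key = value` line convention, the sample page contents, and the helper names `rows`, `select`, and `project` are all assumptions made for illustration. Lines that follow the convention behave like table columns, while all other text on a page stays free-form and is ignored by the query.

```python
import re

# Assumed convention: a structured line looks like "| key = value".
FIELD = re.compile(r"^\|\s*(\w+)\s*=\s*(.*?)\s*$", re.MULTILINE)

# Hypothetical wiki pages (title -> page text), mixing fields and free notes.
PAGES = {
    "Quercetin": "| formula = C15H10O7\n| species = Allium cepa\n"
                 "Abundant in onion skin; content varies by cultivar.",
    "Kaempferol": "| formula = C15H10O6\n| species = Camellia sinensis\n"
                  "Reported values conflict between sources.",
    "Naringenin": "| formula = C15H12O5\n| species = Citrus paradisi\n",
}

def rows(pages):
    """Yield one record per page, built only from its structured lines."""
    for title, text in pages.items():
        record = {"title": title}
        record.update(FIELD.findall(text))  # list of (key, value) pairs
        yield record

def select(pages, field, pattern):
    """Relational selection: keep records whose field matches a regex."""
    rx = re.compile(pattern)
    return [r for r in rows(pages) if rx.search(r.get(field, ""))]

def project(records, *fields):
    """Relational projection: keep only the named fields, in order."""
    return [tuple(r.get(f) for f in fields) for r in records]

if __name__ == "__main__":
    hits = select(PAGES, "formula", r"^C15H10")
    print(project(hits, "title", "species"))
```

Because no schema is declared in advance, a user could add a new `| key = value` line to any page and query it immediately; the price is that consistency of keys and values is a matter of convention rather than enforcement.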

This paper is organized as follows. The basic function and flexibility of
MediaWiki are introduced in the Methods section from a computer-science
perspective. Using this functionality, we introduce the implementation of the
flavonoid database in the Results section. The advantages of our approach and
its differences from other approaches are reviewed in the Discussion section.
Readers are encouraged to access our website at http://metabolomics.jp/.
