Research
Search extension transforms Wiki into a relational system: A case for flavonoid
metabolite database

Masanori Arita1,2,3* and Kazuhiro Suwa1

    * Corresponding author: Masanori Arita arita@k.u-tokyo.ac.jp

Author Affiliations

1 Department of Computational Biology, Graduate School of Frontier Sciences, The
University of Tokyo, Kashiwanoha 5-1-5 CB05, Kashiwa, Japan

2 Metabolome Informatics Unit, Plant Science Center, RIKEN, Japan

3 Institute for Advanced Biosciences, Keio University, Japan

BioData Mining 2008, 1:7 doi:10.1186/1756-0381-1-7

The electronic version of this article is the complete one and can be found
online at: http://www.biodatamining.org/content/1/1/7

Received:    23 May 2008
Accepted:    17 September 2008
Published:   17 September 2008

© 2008 Arita and Suwa; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which
permits unrestricted use, distribution, and reproduction in any medium, provided
the original work is properly cited.
Abstract
Background

In computer science, database systems are based on the relational model
introduced by Edgar Codd in 1970. In biology, on the other hand, the word
'database' often refers to loosely formatted, very large text files. Such
bio-databases have the advantage that they can record conflicts or ambiguities
(e.g. a protein pair reported both to interact and not to interact, or unknown
parameters), but the flexibility of the data format comes at the cost of a
systematic query mechanism equivalent to the widely used SQL.
Results

To overcome this disadvantage, we propose embeddable string-search commands for
a Wiki-based system and have designed a half-formatted database. As proof of
principle, a database of flavonoids with 6902 molecular structures from over
1687 plant species was implemented on MediaWiki, the software behind Wikipedia.
Registered users can describe any information in an arbitrary format. The
structured part is subject to text-string searches that realize relational
operations. The system was written in PHP as an extension of MediaWiki. All
modifications are open-source and publicly available.
Conclusion

This scheme benefits from both the free-formatted Wiki style and the concise and
structured relational-database style. MediaWiki supports multi-user environments
for document management, which reduces the cost of database maintenance.
Background
Why is database maintenance unappreciated?

In many research fields, building or maintaining a database system is not a
sought-after task and researchers tend to avoid the chore because: 1) the
inputting and checking of data are routine and tedious, 2) novel findings are
rarely based on a collection of old data, 3) database developers often do not
receive deserved credit especially when data are distributed for free, and 4) it
is difficult to evaluate the quality and value of data. However, most
bioinformatics research requires high-quality databases. Their significance is
clear from the success of major data-servicing institutes such as the National
Center for Biotechnology Information (NCBI; USA) or the European Bioinformatics
Institute (EBI; UK). Without doubt, data collection and management are important
activities in scientific research.
Input, management, and query are the keys

How can we promote the development and maintenance of high-quality databases?
The key processes in data handling are the input, management, and query of data.
The effort required for the first two activities can be significantly reduced by
introducing a Wiki-based system. Wikipedia, a free web-based encyclopaedia, for
example, continues its rapid growth despite criticism by experts of 'lack of
quality control' and 'vulnerability to website vandals' [1]. Its English version
now has over 2 million articles, followed by the German and French versions with
0.7 and 0.6 million, respectively [2]. What remains unsupported is flexibility
in query mechanisms and presentation, such as displaying customized statistics
in a user-friendly way.

In the rapidly evolving frontiers of biology research, the development of
flexible and systematic query mechanisms has not been pursued actively.
Biological data and their formats often co-evolve; the definition of data types
in a repository requires frequent, major updates. This is not compatible with
the relational model proposed by E. Codd [3]. The relational model has been the
gold standard for data management for decades, but it requires the database
schema to be fixed prior to data input. Currently, many biologists prefer using
a spreadsheet such as Excel (Microsoft Corp., Seattle, WA, USA) or plain text
files to organize their experimental results, and biology databases provide only
full-text searches to access large-scale, unstructured text data. Ideally, a
bio-database should serve as a searchable, digital laboratory notebook where
users can efficiently organize and query data. As most Wiki systems provide only
a collection of independent pages with weak query functions, we tested the
possibility of installing a flexible query mechanism on a Wiki-based system.
Here we propose an extension of MediaWiki that can emulate traditional database
operations [4]. We demonstrate our idea with molecular information on
flavonoids, a major category of plant secondary metabolites.
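
The idea of emulating database operations by string search over half-formatted
pages can be illustrated with a minimal Python sketch. This is not the PHP
extension described in this paper and does not reproduce its command syntax: the
page texts, the "key: value" line convention, and the function names rows,
select and project are all assumptions made only for illustration.

```python
import re

# Hypothetical half-formatted pages: free text mixed with "key: value" lines.
# The page layout and contents are illustrative, not taken from the database.
PAGES = {
    "Quercetin": (
        "Quercetin is a flavonol found in many plants.\n"
        "formula: C15H10O7\n"
        "species: Allium cepa\n"
        "species: Camellia sinensis\n"
    ),
    "Naringenin": (
        "formula: C15H12O5\n"
        "species: Citrus paradisi\n"
    ),
    "Kaempferol": (
        "Some notes first.\n"
        "formula: C15H10O6\n"
        "species: Camellia sinensis\n"
    ),
}

# A structured line is a word, a colon, and a value on the same line.
FIELD = re.compile(r"^(\w+):[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def rows(pages):
    """Yield (title, key, value) triples found by string search in each page."""
    for title, text in pages.items():
        for key, value in FIELD.findall(text):
            yield title, key.lower(), value


def select(pages, key, pattern):
    """Selection: titles of pages with a `key` field whose value matches."""
    rx = re.compile(pattern)
    return sorted({t for t, k, v in rows(pages) if k == key and rx.search(v)})


def project(pages, titles, key):
    """Projection: the values of one field for the given page titles."""
    wanted = set(titles)
    result = {t: [] for t in titles}
    for t, k, v in rows(pages):
        if t in wanted and k == key:
            result[t].append(v)
    return result


if __name__ == "__main__":
    tea = select(PAGES, "species", r"Camellia")
    print(tea)
    print(project(PAGES, tea, "formula"))
```

Free-text lines are simply ignored by the search, so a page can gain new fields
or notes at any time without a schema change; only lines that follow the
structured convention take part in queries.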

This paper is organized as follows. The basic functions and flexibility of
MediaWiki are introduced in the Methods section from a computer-science
perspective. Using this functionality, we introduce the implementation of the
flavonoid database in the Results section. The advantages of our approach and
its differences from other approaches are reviewed in the Discussion. Readers
are encouraged to access our website at http://metabolomics.jp/.




