The Ovarian Kaleidoscope database (OKdb) –

an online resource for the ovarian research community

Chandra P. Leo, Ursula A. Vitt and Aaron J. W. Hsueh

Division of Reproductive Biology, Department of Gynecology and Obstetrics
Stanford University School of Medicine, Stanford, California 94305-5317


 
The URL for the Ovarian Kaleidoscope database is: http://ovary.stanford.edu
 

ABSTRACT

        The Ovarian Kaleidoscope database (OKdb) is a collaborative online resource for scientists investigating the ovary. It provides information regarding the biological function, expression pattern, and regulation of genes expressed in the ovary, as well as the phenotypes associated with their mutation. In addition, the records in the OKdb are linked to other sites offering online information about biomedical publications, nucleotide and amino acid sequences, and human genes and genetic disorders. A powerful search tool allows the retrieval of records for specific genes and gene products based on their properties at the molecular, cellular, ovarian, or organism level. Researchers working on particular aspects of ovarian physiology can submit information to the database through a simple web-based form and instantly update their records as additional data become available. Because of this approach, the OKdb website could serve as a tool with which to navigate through the rapidly expanding amount of information about the expression and function of individual genes in the ovary, and could also enhance communication within the ovarian research community. Moreover, the design of the OKdb could serve as a model for the development of other online databases of tissue-specific gene expression and function. The Ovarian Kaleidoscope database can be accessed at http://ovary.stanford.edu/.

INTRODUCTION / BACKGROUND

        The complex reproductive and endocrine functions of the ovary are orchestrated by the interplay of thousands of genes expressed in this organ. Information is rapidly emerging about the function, expression, and regulation of these genes and their gene products in different ovarian cell types under physiologic and pathophysiologic conditions and in different species. In the text-based PubMed database, one can identify 59,000 papers by searching 'ovary NOT CHO', whereas in the sequence-based UniGene database, 10,991 and 5,015 individual ovarian gene records have been deposited for human and rat, respectively. In addition, UniGene has >1,500 individual gene records from unfertilized mouse eggs, and a recent study using serial analysis of gene expression (SAGE)-PCR identified 21,000 known genes and ESTs as well as 6,000 unknown tags in human oocytes (Neilson et al., 2000). Based on these data, one can estimate that ovarian cells could express 20-30% of the estimated 80,000 genes in the human genome. The high abundance of expressed ovarian genes is in direct contrast to the lower number of expressed genes in several other human tissues (e.g. muscle: 5,321 genes, liver: 16,894 genes, and prostate: 13,974 genes) according to the UniGene database.

        This wealth of data on ovarian genes, however, is not readily accessible to the individual ovarian scientist because it is distributed across a host of different publications and online resources. Although a number of large public databases, including GenBank, UniGene, and OMIM (Online Mendelian Inheritance in Man), have been established to facilitate the management of the rapidly expanding amount of biomedical information, most of the existing databases provide only minimal information about the tissue-specific expression pattern and function of genes. The text string-based search functions used in many of these databases (e.g. PubMed) suffer from low specificity and sensitivity. Moreover, the individual entries in GenBank generally lack review and annotation by experts in the specific field. In addition, information on reproductive tissues is scattered across different databases, which prevents scientists from obtaining an integrated view.

        To alleviate the present situation, we have initiated an Internet database project for the ovarian research community termed the Ovarian Kaleidoscope database (OKdb). The OKdb was created to provide a unified online gateway to store, search, review, and update information about genes expressed in the ovary. The searchable database provides information regarding the biological function, expression pattern, and hormonal regulation of genes expressed in the ovary. Furthermore, it provides links to other online information resources offering data about nucleotide and amino acid sequences, chromosomal localization, human and murine mutation phenotypes, and biomedical publications relevant to the ovarian research community. Each ovarian gene entry is created and updated by experts working in the particular research area, and the database is linked to multiple external databases to allow easy access to information.
        This information is now accessible online through the OKdb and is searchable not only by gene name but also by various criteria of the gene and gene products such as cellular and ovarian function, expression in different ovarian cell types, or association with specific ovarian phenotypes. By searching the expanding database, new patterns of ovarian gene expression and regulation could emerge. The Ovarian Kaleidoscope database represents a virtual platform for the entire community of ovarian researchers. This database is projected to adapt and expand in order to reflect the most current state of knowledge about genes expressed in the ovary. In order to achieve this goal, scientists around the world are invited to participate in this community effort through an online interface by submitting new records to the database and by suggesting updates to existing records.
  Herein, we describe the development and current status of the OKdb and discuss its use as a resource for the scientific community.
 

MATERIALS AND METHODS

        The OKdb is implemented as a relational database using a MySQL server (T.c.X. DataConsultAB, Stockholm, Sweden) running on a dedicated Windows NT system. The database consists of four tables. The largest table holds the functional information for each gene and includes the links to biomedical publications and to nucleotide and amino acid sequences. It also links to the corresponding record in the OMIM database, which describes the general function of a given gene and the mutations found in humans. A second table contains the information on mutations and phenotypes found in different species, including mouse. A third table, to which the records for each gene are linked, allows each user to manage and change their own records. The fourth table contains information on the location and accessibility of DNA sequences for specific rodent and human genes.
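        The four-table layout described above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not give table or column names, so every identifier below is hypothetical, and an in-memory SQLite database stands in for the MySQL server so that the sketch is self-contained.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative
# assumptions, not the actual OKdb definitions.
SCHEMA = """
CREATE TABLE submitter (              -- third table: who owns which record
    submitter_id INTEGER PRIMARY KEY,
    username     TEXT UNIQUE NOT NULL,
    name         TEXT,
    institution  TEXT,
    email        TEXT
);
CREATE TABLE gene (                   -- largest table: functional information
    gene_id           INTEGER PRIMARY KEY,
    gene_name         TEXT NOT NULL,
    species           TEXT,
    general_function  TEXT,
    ovarian_function  TEXT,
    pubmed_ids        TEXT,           -- links to publications
    genbank_accession TEXT,           -- link to sequences
    omim_id           TEXT,           -- link to the OMIM record
    submitter_id      INTEGER NOT NULL REFERENCES submitter(submitter_id)
);
CREATE TABLE mutation (               -- second table: mutations and phenotypes
    mutation_id INTEGER PRIMARY KEY,
    gene_id     INTEGER NOT NULL REFERENCES gene(gene_id),
    species     TEXT,
    phenotype   TEXT
);
CREATE TABLE dna_sequence (           -- fourth table: DNA sequence availability
    sequence_id  INTEGER PRIMARY KEY,
    gene_id      INTEGER NOT NULL REFERENCES gene(gene_id),
    species      TEXT,
    location     TEXT,
    availability TEXT
);
"""


def create_database():
    """Create an empty in-memory database with the sketched schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def records_for_user(conn, username):
    """Return the gene names owned by one submitter, in alphabetical order."""
    rows = conn.execute(
        "SELECT g.gene_name FROM gene AS g "
        "JOIN submitter AS s ON g.submitter_id = s.submitter_id "
        "WHERE s.username = ? ORDER BY g.gene_name",
        (username,),
    )
    return [row[0] for row in rows]
```

        Linking each gene record to a submitter row is what would let the interface show a user only the records they are allowed to change.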
          Initially the website was developed using Macromedia Dreamweaver 2.0 with Microsoft Active Server Pages code as the interface to the database. The Active Server Pages were later replaced by DTML and SQL Methods embedded in a ZOPE server (version: 2.1.4; binary release, python 1.5.2, win32-x86).  Since August 1999, the present database has been supported by NICHD/NIH through cooperative agreement (U54) as part of the Specialized Cooperative Centers Program in Reproduction Research.
 

DESCRIPTION

        The Ovarian Kaleidoscope database (OKdb) can be accessed at the URL http://ovary.stanford.edu/ (users of the OKdb are encouraged to cite the present paper as the primary reference). At this writing, the OKdb contains more than 450 searchable gene records from different species such as human, mouse, rat, and bovine. The records in the database contain hypertext links to a total of >2,800 PubMed records. In contrast to curated online databases such as OMIM, the creation and updating of gene records in OKdb is not limited to the authors of the database. Instead, any scientist working in reproductive biology, endocrinology, or neighboring fields can submit information on specific genes expressed in the ovary and update their records when new data become available. This design enables researchers to share knowledge from their particular area of expertise with the general ovarian research community and to conversely benefit from the knowledge of their colleagues in other specific areas. As a consequence of this approach, scientists contributing to the OKdb retain full credit, and responsibility, for the accuracy of the information submitted; each individual database record carries the name, institution, and email address of its author.
          Through the navigation bar on the home page, OKdb users have access to the four main functions (SEARCH, BROWSE, SUBMIT, and UPDATE) and to additional features and information. Although searching and browsing the database do not require logging in, submitting and updating gene records require the establishment of a username and password.

        Search function. Upon entering the SEARCH mode, the user can look for individual genes or subsets of genes by using an online form and selecting one or several search criteria. At the most basic level, the user may enter the name or abbreviation of a gene into the GENE NAME text field, search only for genes from a specific species, or search all records in the database for keywords. More importantly, OKdb enables the user to perform highly specific searches for genes according to their general or ovarian function, cellular localization, regulation, and/or expression in various ovarian cell types at different stages of follicle development. The search criteria are entered by selecting check boxes or items from menus, or by entering text into fields on the search form. If more than one search criterion is entered, only gene records meeting all the selected criteria are retrieved. For example, selecting 'receptor' as the general function, 'ovulation' as the ovarian function, and 'granulosa cells' as the ovarian cell type will retrieve all records of genes encoding receptors involved in ovulation that are expressed in granulosa cells. An additional feature allows the search for genes for which there exist naturally occurring mutations or genetically modified (knockout or transgenic) animal strains, as well as searches based on abnormal reproductive phenotypes in humans or animals (e.g. infertility with ovarian defect or subfertility).
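        The rule that a record must meet every selected criterion can be sketched as a conjunctive filter. The sketch below is an assumption-laden illustration: the record fields, the function names, and the three sample records (GENE_A, GENE_B, GENE_C and their attribute values) are hypothetical placeholders, not OKdb data or code.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class GeneRecord:
    """Hypothetical, simplified gene record (field names are assumptions)."""
    gene_name: str
    species: str
    general_function: str
    ovarian_functions: Set[str] = field(default_factory=set)
    cell_types: Set[str] = field(default_factory=set)


def matches(record: GeneRecord,
            species: Optional[str] = None,
            general_function: Optional[str] = None,
            ovarian_function: Optional[str] = None,
            cell_type: Optional[str] = None) -> bool:
    """True only if the record satisfies every criterion that was selected.

    A criterion left as None was not selected and is ignored.
    """
    checks = [
        species is None or record.species == species,
        general_function is None or record.general_function == general_function,
        ovarian_function is None or ovarian_function in record.ovarian_functions,
        cell_type is None or cell_type in record.cell_types,
    ]
    return all(checks)


def search(records: Iterable[GeneRecord], **criteria) -> List[GeneRecord]:
    """Return the records that meet all selected criteria (logical AND)."""
    return [r for r in records if matches(r, **criteria)]


# Made-up placeholder records, for illustration only.
RECORDS = [
    GeneRecord("GENE_A", "rat", "receptor", {"ovulation"}, {"granulosa cells"}),
    GeneRecord("GENE_B", "rat", "receptor", {"luteolysis"}, {"granulosa cells"}),
    GeneRecord("GENE_C", "human", "hormone", {"ovulation"}, {"theca cells"}),
]

hits = search(RECORDS, general_function="receptor",
              ovarian_function="ovulation", cell_type="granulosa cells")
print([r.gene_name for r in hits])
```

        With these placeholder records, only GENE_A satisfies all three criteria; leaving a criterion unselected widens the result set rather than narrowing it.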
          After submitting the search criteria, the user is presented with an overview of the search results, listing the gene name, species, submitter, and creation date of each record found. The user can either modify the search criteria by returning to the search screen or review individual gene records from the search results list. The utility of the present database is becoming evident.  In June 2000, a search for ‘hormones’ generated 83 records whereas a search for ‘infertile with ovarian defect’ generated 35 records.  These records were not easily accessible before and are not found in any single review article.

        Gene information screen. After selecting a record from the search results, the user is taken to a screen where detailed information is displayed about the properties of the gene, including function, localization, expression, regulation, and mutations. In addition to the information directly stored in the OKdb, the gene information screen also contains links to nucleotide and amino acid sequences in GenBank, publication abstracts in PubMed, gene records in the OMIM database, and other relevant sites. Furthermore, the email address and, if available, homepage of the scientist who submitted the record appear at the bottom of the page, along with the dates on which the record was created and last updated. To maintain the quality of submitted records, we are planning to set up a scientific board to provide periodic review of the individual records submitted.

        Submit/update functions. When submitting a gene record to the OKdb, first-time users are asked to create a username and password as a means to authenticate subsequent updates and further submissions. They can then proceed directly to enter information about the gene of interest onto a gene submission form. Check boxes and list menus on this form are designed to facilitate precise searches for subsequent users; however, contributors are encouraged to enter additional comments and annotations into the accompanying text fields provided for each category of information. Use of a specific format enables researchers to create hypertext links from their comments to the PubMed abstracts of the publications cited. After entering their information onto the form, users can submit the gene record, thus making it instantly available to other scientists using the SEARCH function. Alternatively, users may choose to submit an incomplete record using the HIDE mode, thus rendering it invisible to queries from other users. They can then return at a later time to complete the record using the UPDATE function (see below) before releasing it into the searchable portion of the database.
        Using the UPDATE function also requires authentication by username and password. The user is then presented with a list of all the records they have submitted, including any hidden records. Selecting a record from the list will take the user to the gene update form displaying all the information previously entered that can now be modified or completed as desired.
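        The interplay between HIDE mode, the SEARCH function, and the UPDATE list can be sketched as a visibility flag on each record. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, and the username/password check is deliberately left out (the owner argument is assumed to be already authenticated).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    """Hypothetical stored record; field names are illustrative assumptions."""
    gene_name: str
    owner: str
    hidden: bool = False


class RecordStore:
    """Toy in-memory store showing hidden versus searchable records."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def submit(self, owner: str, gene_name: str, hide: bool = False) -> Record:
        """Add a record; hide=True corresponds to submitting in HIDE mode."""
        record = Record(gene_name, owner, hide)
        self._records.append(record)
        return record

    def searchable(self) -> List[Record]:
        """Records visible to queries from any user (hidden ones excluded)."""
        return [r for r in self._records if not r.hidden]

    def owned_by(self, owner: str) -> List[Record]:
        """Records listed for UPDATE: everything the owner submitted,
        including hidden records."""
        return [r for r in self._records if r.owner == owner]

    def release(self, owner: str, gene_name: str) -> bool:
        """Make one of the owner's hidden records searchable.

        Returns False if the owner has no record with that gene name.
        """
        for r in self._records:
            if r.owner == owner and r.gene_name == gene_name:
                r.hidden = False
                return True
        return False
```

        Keeping ownership and visibility as separate attributes means a hidden record is still listed for its submitter while staying out of every other user's search results.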
        Beginning in July 1999, all appropriate PubMed records dealing with ovarian genes (matching the keywords ovary, follicle, granulosa, luteal, theca, or oocyte) have been added to the database weekly. In addition, UniGene records on ovarian expressed sequence tags (ESTs), based on the sequencing of normal ovarian cDNA libraries, have been sorted for incorporation into the OKdb.
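        A weekly keyword query of this kind could be expressed today as an NCBI E-utilities ESearch request. The paper does not say how the PubMed records were actually retrieved, so the sketch below is an assumption throughout: it only builds the request URL (no network call is made), and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Keywords listed in the text.
KEYWORDS = ["ovary", "follicle", "granulosa", "luteal", "theca", "oocyte"]

# NCBI E-utilities ESearch endpoint (assumed here as the retrieval route;
# the original update mechanism is not described in the paper).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(keywords):
    """Combine the keywords so that a record matching any one of them is found."""
    return " OR ".join(keywords)


def build_weekly_query_url(keywords, days=7, retmax=100):
    """Build an ESearch URL for PubMed records entered in the last `days` days."""
    params = {
        "db": "pubmed",
        "term": build_term(keywords),
        "reldate": days,      # restrict to the most recent `days` days
        "datetype": "edat",   # by the date the record entered PubMed
        "retmax": retmax,     # maximum number of IDs returned
    }
    return ESEARCH_URL + "?" + urlencode(params)


print(build_weekly_query_url(KEYWORDS))
```

        Records returned by such a query would still need expert selection before entry, since the text specifies that only appropriate records are incorporated.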
        Browse function. This feature allows the user to browse through the alphabetical listing of all ovarian gene records.
 

CONCLUSIONS AND OUTLOOK

        We have recently expanded the contents to include history and review sections in which review articles about the history of ovarian research (by Drs. R. Short and A. R. Zeleznik) have been published; articles on other topics will be solicited. We are also planning pathway and gene circuitry sections in the OKdb in order to accommodate the results of ovarian gene expression analysis. The present categorization of gene networks for specific ovarian functions will facilitate studies using ovarian DNA microarrays. Current methods for the analysis of microarray data include hierarchical clustering (Eisen et al., 1998), self-organizing maps (Tamayo et al., 1999; Toronen et al., 1999), and knowledge-based support vector machines (Brown et al., 2000). The data derived using any of these algorithms can be embedded and linked to the respective records of the database. Furthermore, the information on ovarian gene function and localization that is available in the database can be integrated with future microarray analyses to allow identification of genes with similar function and expression patterns. Automated expansion of the database is also possible by including genes discovered in microarray trials and amending information on already submitted genes. The large amounts of data generated in the future by global gene analysis can therefore be made available to ovarian researchers using a platform that is accessible to the public. In the future, the OKdb will also complement other databases such as ArrayDB (Ermolaeva et al., 1998), which contains expression data from a subset of 15K human genes that are linked to GenBank.
         Thus, the OKdb serves as a foundation for future hypothesis-driven research and for uncovering previously unsuspected relationships among ovarian genes. With the sequencing of the entire human genome, the biomedical field is undergoing a major revolution. The integration of databases containing text-based literature derived from experimental results and sequence-based gene information suitable for bioinformatic computation is essential for a better understanding of different physiological and pathophysiological processes. The Ovarian Kaleidoscope Database represents an attempt to provide integrated information using a tissue-specific approach. In addition to its utility for ovarian scientists, the database could provide a model for establishing similar online resources for other tissues and organs of the human body.
 

ACKNOWLEDGMENTS

We acknowledge the setup of the original ASP interface by Tom Burns (Intelligent Network Solutions, CA) and the ZOPE server integration by Matthias Braun. The server is maintained by Ursula Vitt, with individual records submitted by investigators around the world.
 
 

Published: Endocrinology, September 2000, Vol. 141