Author: David Hummel (updated 23Sep2004 by Nancy Lui)
Relational tables have been designed for all 38 GrainGenes classes. Another, more graphical view of the schema is here. Each original ACEDB class and tag has been translated to a MySQL table and column. Complex, tree-structured ACEDB tags were handled as described below.
A Class Browser has been developed, intended as the simplest user interface for searching and viewing GrainGenes records. The search results link to specific Object Reports.
Object Reports have been created for all of the data classes. Here is a sample report for the Gene class: Glu-A1 (Triticum). A special report for Markers, integrating the information currently in the separate reports for Locus, Probe and Gene, is also available.
An SQL Interface allows advanced users to perform raw SQL queries against the GrainGenes database. The interface includes a menu of Premade Queries as editable examples. The Batch SQL Interface is a further advanced version, which allows supplying the raw SQL query with a list of arguments to the "where .. in" clause. The Quick Queries page includes all the queries available in the ACEDB version of GrainGenes and is implemented using the SQL Interface, providing additional editable (and useful) examples.
A full-text search has been implemented using the ht://Dig search engine, for rapid simple searching of all data in GrainGenes. Most of the data classes have been indexed and are available for searching.
The CMap comparative map display has been implemented and integrated with the main MySQL database interface, with links to and from the Object Reports.
A Field-based Search is being developed, allowing searches for data objects based on the contents of one or more fields in the data class. Here is a sample query for etiolated shoot sequences.
The new GrainGenes Homepage provides a multi-featured portal to the new GrainGenes.
Curation, Data flow
Currently all primary curation and validation are being done in ACEDB. Results are available next day at www.graingenes.org and the three mirror sites indicated as "daily" at http://wheat.pw.usda.gov/GG2/pickGG.shtml.
Weekly, the main MySQL database is updated from the ACEDB database. The procedure takes ca. 20 hours to execute, so more frequent updates aren't practical. Once the main MySQL database is updated another script updates the CMap MySQL database with the current state of the GrainGenes maps.
The data contained in ACEDB is highly object-oriented in nature, existing in tree-like nested structures which are not directly translatable to a relational format. To maintain the complexity and breadth of data contained in ACEDB, several relational tables are required to express the data in a single class. These tables, linked together via foreign keys, allow for the one-to-many, many-to-one, and many-to-many relationships that exist in the tree-like ACEDB models. The following relational strategy was used:
Given the high degree of model complexity for certain ACEDB classes (i.e. Sequence), these classes require many SQL tables to express the data. To minimize the number of tables required, it was helpful to reduce the complexity of the class models, wherever possible, before deriving the SQL tables. The following steps were taken to simplify the models:
All of the other major plant genomic database resources (PlantGDB, MaizeGDB, Gramene, tair, NCBI) are using relational databases to house their data. Each site has developed and employed various on-line query, display, report and analysis tools providing access to the data. Several tools are publicly available for use at other sites, such as CMap and GBrowse, which are useful for displaying maps and browsing genomic sequences.
Our intention is to leverage these other database resources as much as possible by utilizing tools such as CMap and GBrowse for use on GrainGenes. In fact, CMap and GBrowse have already been tested and deployed to display genetic and physical map data housed in GrainGenes. Please visit our new GBrowse and CMap sites.
Each of these tools use relational databases with specific schemas. Since it is not practical to make custom modifications to each of these tools, it will be necessary to develop programs to convert the data from the final GrainGenes schema, to the specialized schemas of each display tool. This can be done in an automated and seamless fashion.
The user interface is important in determining how the data should be organized in the underlying relational database. GrainGenes has a user interface that is distinctly different from that of other databases. It is important to identify areas of the GrainGenes user interface that should be maintained, replaced with tools from other database resources, or developed from scratch. To a certain extent, this has facilitated and simplified the development of the GrainGenes relational schema.