A Database for Triticeae and Avena
Olin D. Anderson¹, David Matthews², and Jon Wong¹
¹U.S. Department of Agriculture, Agricultural Research Service, Western Regional Research Center, 800 Buchanan Street, Albany, CA 94710, USA.
²Department of Biometry and Plant Breeding, Cornell University, Ithaca, NY 14853, USA.
The United States Department of Agriculture's national research initiative in plant genomes has the goal of developing new technologies and genes for enhancing U.S. agricultural production. A major focus is the use of modern genetic mapping techniques to identify and isolate agronomically important genes. However, the identification and isolation of genes is of limited utility unless an efficient mechanism exists for disseminating the increasing amounts of molecular and genetic information to researchers. Thus, the establishment and maintenance of computer databases are integral parts of the plant genome initiative. These databases will serve both as repositories of information and as research tools containing interrelated data types.
The initial focus of the Plant Genome Database Program (Dr. Jerome Miksche, director) was to develop prototype databases using four model plant systems: maize, soybeans, trees, and wheat. As the database models have stabilized, additional crops have been included, e.g., barley, rice, sorghum, cotton, other legumes, and the Solanaceae. Separate databases for each crop, or crop group, are assembled at sites around the United States. The data are then transferred to the National Agricultural Library (NAL) in Washington, D.C., which is intended to be the primary access point for database users. Individual databases may also be accessible directly from different locations, but general access to all crops will be through NAL.
It was immediately obvious in the wheat database project that important data from crops other than wheat needed to be included. The ability to make wide crosses within the Triticeae and the similarity of the different species led us fairly early in the project to expand our data assembly to include barley, rye, and those wild grasses that can be crossed to wheat. In addition, we have included data from oats and sugarcane. The inclusion of data on many of the small grains led us to name our database GrainGenes (with an apology to sugarcane).
The database is intended for all researchers working with the crops we cover. Although the initial impetus for the project came from newly accumulating molecular data, we found that sources of nonmolecular data were of great interest to researchers. Many of these additional data were not appropriate for a traditional database but were better suited to browsing or to simple searches of large files. To accommodate the different data types, we are using several presentations (gopher, Acedb, World-Wide-Web, and CD-ROM), which are referred to collectively as the "GrainGenes" database.
The two main presentations are the GrainGenes gopher and Acedb GrainGenes. The gopher format is text-based, whereas the Acedb format is a graphical interface with extensive graphic and query capabilities. The gopher is the simplest to access. A user must be able to log onto a computer connected to the Internet (the early version of the Information Superhighway we hear so much about). From there, the user connects to the GrainGenes gopher and navigates with basic keystrokes. From the text-based display, the user can reach information and services such as the following:
- Search the GrainGenes Acedb database for information
- Retrieve files and images to a local computer
- Search or download the Wheat Gene Catalog
- Search the Commercial Wheat Cultivar Catalog
- Browse or download yearly performance and quality evaluations
- Browse the Cereal Rust Bulletin
- Download raw mapping scores
- etc.
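The gopher navigation described above rests on a very small protocol (RFC 1436). As an illustration only, the Python sketch below shows roughly what a gopher client does behind the keystrokes: it sends a selector string terminated by CRLF to port 70, and the server replies with menu lines, each holding an item-type character, a display string, a selector, a host, and a port separated by tabs. The names `fetch`, `parse_menu`, and `MenuItem` are hypothetical, and this is a minimal sketch, not the software used to run the GrainGenes gopher.

```python
import socket
from dataclasses import dataclass

# A few item-type codes from RFC 1436 (not a complete list).
ITEM_TYPES = {
    "0": "text file",
    "1": "menu (directory)",
    "7": "search server",
    "9": "binary file",
    "g": "GIF image",
    "I": "image",
}

GOPHER_PORT = 70  # well-known Gopher port


@dataclass
class MenuItem:
    item_type: str
    display: str
    selector: str
    host: str
    port: int


def build_request(selector: str) -> bytes:
    """A Gopher request is the selector string followed by CRLF."""
    return (selector + "\r\n").encode("ascii")


def parse_menu_line(line: str) -> MenuItem:
    """Parse '<type><display>TAB<selector>TAB<host>TAB<port>'."""
    line = line.rstrip("\r\n")
    fields = line[1:].split("\t")
    if len(fields) < 4:
        raise ValueError(f"not a Gopher menu line: {line!r}")
    display, selector, host, port = fields[:4]
    return MenuItem(line[0], display, selector, host, int(port))


def parse_menu(text: str) -> list[MenuItem]:
    """Parse a full menu; a line holding a single '.' ends it."""
    items = []
    for line in text.splitlines():
        if line == ".":
            break
        if line:  # skip blank lines
            items.append(parse_menu_line(line))
    return items


def fetch(host: str, selector: str = "", port: int = GOPHER_PORT) -> str:
    """Send one request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(selector))
        chunks = []
        while data := sock.recv(4096):
            chunks.append(data)
    return b"".join(chunks).decode("latin-1")
```

For example, `parse_menu_line("1Wheat Gene Catalog\t/catalog\tgreengenes.cit.cornell.edu\t70")` yields a `MenuItem` whose `item_type` is `"1"` (a submenu) and whose `port` is 70; the selector `/catalog` is invented for illustration. In principle, `parse_menu(fetch("greengenes.cit.cornell.edu"))` would list the top-level menu of the server named in the contact information.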
A graphical connection to Acedb GrainGenes requires more complex hardware and software; the basic requirements are a direct Internet connection and X-windows software on the user's computer. Acedb is a multiwindowed, mouse-controlled environment with both graphic and text displays. Among the data and capabilities of the Acedb format are:
- Image displays
- Active map displays
- Complex query capability
- 24 maps and 166 linkage groups
- Comparative maps of rice, maize, and wheat
- More than 2,500 loci and 600 genes
- More than 2,000 probes, 200 of them with nucleotide end sequences
- Information on 22,000 germplasm accessions
- Information on 1,400 species of plants and plant and insect pathogens
- Results of 4 QTL studies in wheat
- Names, addresses, and research interests of 1,000 colleagues
- More than 1,400 relevant bibliographic citations
- 450 pathology entries, some with digitized images of disease symptoms
- Images of pathogens, morphologies, and DNA hybridizations from mapping studies
- HMW-glutenin gene complements for 1,800 wheat cultivars
Both presentations have advantages, and a single interface with all data types may eventually develop. An early version of this union may be the World-Wide-Web (WWW), a graphical interface with the same windowed appearance on all platforms. The user runs "Web-browser" software on the local computer (PC, Mac, or Unix) and interacts via nearly identical-looking windows and buttons. We are currently using the Mosaic browser, but several others, both commercial and free, are becoming available. One endearing characteristic of all the basic software we are currently using is that it is free and can be downloaded over the Internet from a number of source sites. The exception is the X-windowing software, which must be purchased from one of a number of vendors (unless your system already has X-window capability).
The final presentation format is CD-ROM, with Mosaic used to access the data. The CD-ROM is pressed by NAL, and current plans are to update it two or three times a year. Distribution is free at the moment, but it will probably eventually be necessary to charge for the cost of the discs. The second version of the CD-ROM is now being distributed, and several bugs in the first prototype have been fixed. Although some problems remain to be addressed in future versions, the current CD-ROM is reasonably straightforward to install and includes data not just for GrainGenes but also for several other crop databases (maize, rice, Solanaceae, soybeans, and trees). The CD-ROM has two advantages. First, users with poor or no network access will be able to use the databases. These can include users in less-developed countries, users at remote sites, and users who want to work at home, where Internet access is not yet common. Second, the CD-ROM adds a mobile dimension: besides home use, newer laptop computers are beginning to include CD-ROM drives.
Even though GrainGenes is funded by the United States Department of Agriculture, it is, by necessity, international in character. Our crops are among the most widely grown in the world; thus, not only are potential users of the database found throughout the world, but many of the important data sources are international (either literally from other countries or with an international component). We therefore encourage, and are enjoying, successful interactions with scientists throughout the world. Examples include the curator of the Wheat Gene Catalog (Bob McIntosh, Australia); germplasm and trait data from CIMMYT (Mexico City); and maps from Australia, the United Kingdom, Germany, and France.
We are always eager to help users connect to the databases, and we encourage comments and suggestions for improvements. If you are a small-grains scientist and, after viewing the GrainGenes databases, wish that data you possess, or know about, were part of GrainGenes, please contact us. Improvements in data presentation and in the breadth and depth of the data are driven mainly by user interactions with the GrainGenes personnel. Similarly, the collation and maintenance of data depend on scientists' contributions to the database project. For example, we are currently organizing curators for specific areas in pathology; individuals or small groups will be responsible for curating data in their areas of expertise. E-mail notices of this effort will go out as our organization proceeds, and anyone interested in participating is encouraged to contact us.
Contacts:
Olin Anderson: oandersn@pw.usda.gov (510) 559-5773
Jon Wong: jwong@pw.usda.gov (510) 559-5614
USDA, ARS, WRRC, 800 Buchanan Street,
Albany, CA 94710, USA.
David Matthews: matthews@greengenes.cit.cornell.edu (607) 255-9951
Dept. of Biometry and Plant Breeding,
Cornell University, Ithaca, NY 14853, USA.
GrainGenes Acedb access requires a password from David Matthews.
Gopher access is via greengenes.cit.cornell.edu or probe.nalusda.gov.
Mosaic access is via http://probe.nalusda.gov:8300.