AWN Vol 41

STATUS OF THE WHEAT DATABASE

Olin D. Anderson (1), David Matthews (2), and Jon Wong (1).

(1) U.S. Department of Agriculture, Agricultural Research Service, Western Regional Research Center, 800 Buchanan Street, Albany, CA 94710, USA.

(2) Department of Biometry and Plant Breeding, Cornell University, Ithaca, NY 14853, USA.

The United States Department of Agriculture's national research initiative in plant genomes has the goal of developing new technologies and genes for enhancing U.S. agricultural production. A major focus is the use of modern genetic mapping techniques to identify and isolate agronomically important genes. However, the identification and isolation of genes is of limited utility unless an efficient mechanism exists for disseminating the increasing amounts of molecular and genetic information to researchers. Thus, the establishment and maintenance of computer databases are integral parts of the plant genome initiative. These databases will serve both as repositories of information and as research tools containing interrelated data types.

The initial focus of the Plant Genome Database Program (Dr. Jerome Miksche, director) was to develop prototype databases using four model plant systems: maize, soybeans, trees, and wheat. As the database models have stabilized, additional crops have been included, e.g., barley, rice, sorghum, cotton, other legumes, and the Solanaceae. Separate databases for each crop, or crop group, are assembled at sites around the United States. Data are then transferred to the National Agricultural Library (NAL) in Washington, D.C., which is the intended primary access point for database users. Individual databases may also be accessible directly from different locations, but general access to all crops will be at NAL.

It was immediately obvious in the wheat database project that important data needed to be included from crops other than wheat. The ability to make wide crosses within the Triticeae and the similarity of the different species led us fairly early in the project to expand our data assembly to include barley, rye, and those wild grasses that can be crossed to wheat. In addition, we have included data from oats and sugarcane. The inclusion of data on many of the small grains led us to name our database GrainGenes (with an apology to sugarcane).

The database is intended for all researchers working with the crops we cover. Although the initial impetus for the project came from newly accumulating molecular data, we found that sources of nonmolecular data were of great interest to researchers. Many of these additional data were not appropriate for a traditional database, but were more suited to browsing or simple searches of large files. To accommodate the different types, we are using several different data presentations (gopher, Acedb, World-Wide-Web, and CD-ROM), but they are referred to collectively as the "GrainGenes" database.

The two main presentations are the GrainGenes gopher and the Acedb GrainGenes. The gopher format is text-based, and the Acedb format is a graphical interface with extensive graphic and query capabilities. The gopher has the simplest access. A user must be able to log onto a computer connected to the Internet (the early version of the Information Superhighway we hear so much about). From there, the user connects to the GrainGenes gopher and uses basic keystrokes to navigate. From the text-based display, the user can carry out tasks such as the following:

- Search the GrainGenes Acedb database for information

- Retrieve files and images to a local computer

- Search or download the Wheat Gene Catalog

- Search the Commercial Wheat Cultivar Catalog

- Browse or download yearly performance and quality evaluations

- Browse the Cereal Rust Bulletin

- Download raw mapping scores

- etc.

More complex hardware and software are needed for a graphic connection to the Acedb GrainGenes, whose basic requirements are a direct Internet connection and X-windows software on the user's computer. Acedb has a graphical interface and is a multiwindowed, mouse-controlled environment with both graphic and text displays. Among the data and capabilities of the Acedb format are:

- Image displays

- Active map displays

- Complex query capability

- 24 maps and 166 linkage groups

- Comparative maps of rice, maize, and wheat

- More than 2,500 loci and 600 genes

- More than 2,000 probes with nucleotide end sequences of 200

- Information on 22,000 germplasms

- Information on 1,400 species of plants and plant and insect pathogens

- Results of 4 QTL studies in wheat

- Names, addresses, and research interests of 1,000 colleagues

- More than 1,400 relevant bibliographic citations

- 450 pathology entries, some with digitized images of disease symptoms

- Images of pathogens, morphologies, and DNA hybridizations from mapping studies

- HMW-glutenin gene complements for 1,800 wheat cultivars

Both presentations have advantages, and a single interface with all data types may eventually develop. Perhaps an early version of this union is the World-Wide-Web (WWW), a graphic interface with the same windowing appearance on all platforms. The user runs resident software, a "Web browser," on the local computer (PC, Mac, or Unix) and interacts via nearly identical-looking windows and buttons. The browser we are using currently is Mosaic, but several others, both commercial and free, are becoming available. One endearing characteristic of all the basic software we are using currently is that it is free and can be downloaded over the Internet from a number of source sites. The exception is the X-windowing software, which must be purchased from any of a number of vendors (unless your system already has X-window capability).

The final presentation format is via CD-ROM using Mosaic to access the data. The CD-ROM is pressed by the NAL, and current plans are to update it 2-3 times a year. Distribution is free at the moment, but charging for the cost of the discs will probably be necessary eventually. The second version of the CD-ROM is now being distributed. Several bugs from the first prototype have been fixed. Although there are still some problems to be addressed in future versions, the current CD-ROM is reasonably straightforward to install and includes data not just for GrainGenes, but also for several other crop databases (maize, rice, Solanaceae, soybeans, and trees). The advantage of the CD-ROM is two-fold. First, users without good, or any, network access will be able to use the databases. These can include users in less-developed countries, users at more remote sites, and users wanting to work at home, where Internet access is not yet common. Second, the CD-ROM adds a mobile dimension. Besides home use, newer models of laptop computers are beginning to include CD players.

Even though GrainGenes is funded by the United States Department of Agriculture, it is, by necessity, international in character. Our crops are among the most widely grown in the world; thus, not only are potential users of the database found throughout the world, but many of the important data sources are international (either literally from other countries or with an international component). We therefore encourage, and are enjoying, successful interactions with scientists throughout the world. Just a few examples include the curator of the Wheat Gene Catalog (Bob McIntosh, Australia); germplasm and trait data from CIMMYT (Mexico City); and maps from Australia, the United Kingdom, Germany, and France.

We are always eager to help users connect to the databases, and we encourage comments and suggestions for improvements. If you are a small grains scientist and, after viewing the GrainGenes databases, wish that data you possess, or know about, were part of GrainGenes, then please contact us. Improvements in data presentation and in the breadth and depth of data are driven mainly by user interactions with the GrainGenes personnel. Similarly, the collation and maintenance of data depend on scientists' contributions to the database project. As an example, we currently are organizing curators for specific areas in pathology. Individuals or small groups will be responsible for curating data in areas of their expertise. E-mail notices of this effort will be forthcoming as we organize further, and anyone interested in participating is encouraged to contact us.

Contacts:

Olin Anderson: oandersn@pw.usda.gov (510) 559-5773

Jon Wong: jwong@pw.usda.gov (510) 559-5614

USDA, ARS, WRRC, 800 Buchanan Street, Albany, CA 94710, USA.

David Matthews: matthews@greengenes.cit.cornell.edu (607) 255-9951

Dept. of Biometry and Plant Breeding, Cornell University, Ithaca, NY 14853, USA.

GrainGenes Acedb access requires a password from David Matthews.

Gopher access is via greengenes.cit.cornell.edu or probe.nalusda.gov.

Mosaic access is via http://probe.nalusda.gov:8300.
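
For readers who would rather script the gopher connection than use a gopher client, the following is a minimal sketch in Python of how a gopher menu can be requested and read. It assumes the standard gopher conventions (TCP port 70, a selector string followed by a carriage return and line feed, and tab-separated menu lines); the function names `parse_menu_line` and `fetch_menu` are illustrative only and are not part of any GrainGenes software. Whether the hosts listed above answer on port 70 at any given time is also an assumption.

```python
import socket


def parse_menu_line(line):
    """Split one gopher menu line into its fields.

    A menu line is assumed to look like:
        <type char><display text> TAB <selector> TAB <host> TAB <port>
    Returns None for lines that do not fit this layout.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 4 or not fields[0]:
        return None
    try:
        port = int(fields[3])
    except ValueError:
        return None
    return {
        "type": fields[0][0],      # e.g. "0" = text file, "1" = menu
        "display": fields[0][1:],  # text shown to the user
        "selector": fields[1],     # string to send back to retrieve the item
        "host": fields[2],
        "port": port,
    }


def fetch_menu(host, selector="", port=70, timeout=10.0):
    """Request a menu from a gopher server and return the parsed items.

    An empty selector asks for the server's top-level menu.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(selector.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)

    items = []
    for line in b"".join(chunks).decode("latin-1").splitlines():
        if line == ".":  # a lone period marks the end of the menu
            break
        item = parse_menu_line(line)
        if item is not None:
            items.append(item)
    return items
```

Under these assumptions, a call such as `fetch_menu("greengenes.cit.cornell.edu")` would return the top-level menu as a list of dictionaries, and the `selector` of any item could be passed back to `fetch_menu` to descend into that item.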