ANNOUNCEMENT

International Triticeae EST Cooperative (ITEC) Update: Collaboration Guidelines

At the 9th International Wheat Genetics Symposium, held at the University of Saskatchewan, Saskatoon, August 2-7, 1998, a proposal was developed to establish a public database of Expressed Sequence Tags (ESTs) from species of the Triticeae. The target is to have at least 40,000 ESTs publicly available from 1 July 2000.

This activity was seen as the first stage in developing an international effort to produce a public set of information and materials for Triticeae genome research. Following the Symposium, a discussion period was set until September 14 and a webpage (http://wheat.pw.usda.gov/genome/) and bulletin board (http://wheat.pw.usda.gov:8000/cgi-bin/mboard/ITEC/list.cgi) were established to facilitate public discussion.

The ITEC Steering Committee has reviewed the suggestions, comments, and questions and has produced the following description and guidelines for the ITEC effort. Comments and offers to participate should be emailed to genome@pw.usda.gov; all ITEC Steering Committee members receive mail sent to this address. Offers to participate should provide full contact information and details of the proposed participation with respect to the guidelines below. A summary of commentary will be posted to the ITEC webpage and bulletin board. A roster of participants will also be maintained there and updated as offers of participation are received and confirmed.

ITEC Guidelines

  1. The definition of an ITEC participating laboratory will be flexible to accommodate the great variety of funding and staffing levels and public or private status of interested organizations. The overall goal is production of a large body of information and materials in the public domain in as efficient, rapid, and generous a manner as possible. These criteria will guide the definition. In many cases, this means that a reasonable balance between the size (funding and staff) of an organization and the type and level of its contribution to the ITEC effort will need to be worked out between the ITEC Steering Committee and individual organizations with interest in participating. It is anticipated that labs that produce or arrange to purchase fewer than the target number of sequences will also contribute in other ways, such as providing cDNA libraries, coordination, analysis, or other services. The roster of participants and the type and level of contribution will be part of the publicly available database.

  2. The general target for each participating laboratory will be the submission, by 1 July 1999, of 1,000 EST sequences, a sample of the clone used for each sequence, and the trace file from the sequencing. More will come from some participants and fewer from others, with a minimum number set at 250 sequences. Sequences will be accepted after 1 July 1999, but the submitting laboratory will have access to the database only after its sequences have been received. All sequences will be publicly available on 1 July 2000 irrespective of submission date.

  3. The EST sequences should be from random clones of an unamplified or prescreened cDNA library. Each sequence should contain at least 300 bp of non-vector sequence with a maximum of 5% ambiguous bases; this is the minimum needed to achieve reasonable reliability with BLAST searches. The cDNA can be from any species of the Triticeae, although most data are expected to be from wheat and barley. A call will be made for information about existing, available cDNA libraries. Watch the ITEC webpage.

    High-quality cDNA libraries are critical to the ITEC effort. Therefore, the following standards must be met before libraries will be acceptable for ITEC: The contributor must confirm that a sample of at least 100 of the clones has been sequenced to ensure that inserts are present in most clones and to monitor the size of the inserted sequences. Useful sequences (longer than 300 bases) should be present in at least 80% of the sequenced samples.

  4. On 1 July 2000 the entire database will be made publicly available.

  5. All participating laboratories will have free access to the database for one year, from 1 July 1999 until the date of public access to the database (1 July 2000). Access to the database during this year will be password protected. Any participating laboratory may make the sequences which it submitted to ITEC publicly available in advance of 1 July 2000, but not through the channel of the ITEC database.

  6. The repository for the data during both the restricted and general access periods will be the GrainGenes database (http://wheat.pw.usda.gov/) maintained by the US Dept. of Agriculture. Data will be available for simple downloading via an ftp address. Importing the data into the ACEDB environment of GrainGenes will be only one part of the pre-processing; it may simplify the search for desired sequences and will simplify the post-processing phase, when sequence data will be distributed publicly via GrainGenes.

  7. The sample of each clone sequenced will be maintained through the GrainGenes Probe Repository.

  8. Another component of the database will be a roster of the cDNA libraries for which sequencing has been undertaken and the tissue/stress/developmental stage status of each library. The database will serve as a clearinghouse for linking available libraries with laboratories interested in particular tissues and developmental stages and as a means for locating labs which can produce libraries for distribution. The objective is to avoid duplication of effort and enhance the diversity of the resulting ESTs.

  9. Securing funding for the data and clone repository will be an ongoing task for the Triticeae research community. While the maintained data will be available electronically, at some point a charge for distribution of materials will be necessary. Initially, the database will fall under the mandate of activities for GrainGenes. However, the GrainGenes Probe Repository will need support for the maintenance of the clone samples. Additional contributions of financial support have been offered and others will be sought with the proviso that the information and materials remain in the public domain.
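
The numeric thresholds in guideline 3 can be illustrated with a minimal sketch. This is not an ITEC tool: all function and constant names below are hypothetical, the reads are assumed to have had vector sequence trimmed already, and any character other than A, C, G, or T is assumed to count as an ambiguous base (the guidelines do not define this).

```python
# Illustrative check of the guideline 3 thresholds.
# All names are hypothetical; ITEC does not prescribe any software.

MIN_LENGTH_BP = 300          # EST: at least 300 bp of non-vector sequence
MAX_AMBIGUOUS_PERCENT = 5    # EST: at most 5% ambiguous bases
MIN_CLONES_SAMPLED = 100     # library: at least 100 clones sequenced
MIN_USEFUL_PERCENT = 80      # library: at least 80% useful sequences


def count_ambiguous(seq: str) -> int:
    """Count bases that are not A, C, G, or T (an assumed definition)."""
    upper = seq.upper()
    return len(upper) - sum(upper.count(base) for base in "ACGT")


def est_meets_guideline(seq: str) -> bool:
    """Apply the length and ambiguity thresholds to a vector-trimmed read."""
    if len(seq) < MIN_LENGTH_BP:
        return False
    # Integer comparison avoids floating-point rounding at exactly 5%.
    return count_ambiguous(seq) * 100 <= MAX_AMBIGUOUS_PERCENT * len(seq)


def library_meets_standard(sample_reads: list[str]) -> bool:
    """Apply the library standard to reads from a sample of clones."""
    if len(sample_reads) < MIN_CLONES_SAMPLED:
        return False
    # The library standard is worded as "longer than 300 bases".
    useful = sum(1 for read in sample_reads if len(read) > MIN_LENGTH_BP)
    return useful * 100 >= MIN_USEFUL_PERCENT * len(sample_reads)
```

For example, a 300-base read with 15 ambiguous bases (exactly 5%) passes est_meets_guideline, while the same read with 16 ambiguous bases does not.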

Scenarios of ITEC participation.

Given that there will be a flexible definition of participating laboratory, there will be several possibilities that should allow laboratories without sequencing capabilities to become participants and achieve early access to the database. Following are several of the scenarios that have been envisioned.

  1. A participating laboratory produces or acquires a cDNA library, sequences 1,000 clones in house, and submits the data, a sample of each clone sequenced, and the sequencing trace files to the GrainGenes database and repository.

  2. A participating laboratory produces or acquires a cDNA library and contracts to have 1,000 clones sequenced, subsequently submitting the data, a sample of each clone sequenced, and trace files to GrainGenes.

  3. A participating laboratory produces a cDNA library and provides it and funding to a sponsoring participating laboratory which will arrange for the sequencing and data and file submission to GrainGenes.

  4. A participating laboratory provides funding to a sponsoring participating laboratory for a set number of ESTs. The sponsoring laboratory will arrange for the sequencing of that number of ESTs from a cDNA library of its choice and will submit the data and files to GrainGenes.

Recognition of contributions of participants.

A given sequence in the ITEC database will be designated with a label that allows identification of the participating organization which funded its production. This, along with the database's roster of participants and the type and level of their contribution, will allow identification of an organization's contribution to the ITEC effort. This labelling is for recognition purposes only and should not be construed to imply any ownership of an EST with respect to any subsequent use.

Facilitation of contracting for sequence production.

The ITEC webpage will maintain links to organizations which will provide, on a fee basis, the range of technologies and services relevant to ITEC's goals: e.g., cDNA library construction, prescreening, clone picking, and sequencing. At least three such organizations are currently known to the Steering Committee, and it appears that the cost of providing 1,000 sequences currently ranges from US$8,000 to US$10,000.
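
As a rough planning aid, the quoted price range can be scaled to other contribution sizes. The sketch below assumes that cost scales linearly with the number of sequences, which the announcement does not state and which service providers may not follow; the function name is hypothetical.

```python
# Back-of-envelope cost estimate from the quoted range for 1,000 sequences.
# Linear scaling is an assumption, not a statement from any provider.

COST_PER_1000_USD = (8_000, 10_000)  # quoted range for 1,000 sequences


def contract_cost_range(n_sequences: int) -> tuple[int, int]:
    """Return the (low, high) estimated cost in US dollars."""
    low, high = COST_PER_1000_USD
    return (n_sequences * low // 1000, n_sequences * high // 1000)


print(contract_cost_range(250))     # minimum contribution: (2000, 2500)
print(contract_cost_range(1_000))   # general target: (8000, 10000)
print(contract_cost_range(40_000))  # overall ITEC target: (320000, 400000)
```

Under the same assumption, reaching the 40,000-sequence target at the general contribution level of 1,000 sequences would take the equivalent of 40 participating laboratories.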

Further ITEC activity.

Since 40,000 sequences is a very minimal target (a new US National Science Foundation-funded project to produce ESTs for tomato has a target of 90,000), a second (and possibly larger) stage of sequence accumulation may be organized. In other words, another collection date may be set for accumulating a new target number of sequences, again followed by a one-year lag until public availability.

We thank those of you who have contributed to the discussion about the establishment of ITEC.

ITEC Steering Committee members:

Peter Langridge, Univ. of Adelaide, AUSTRALIA
Olin Anderson, USDA-ARS, USA
Perry Gustafson, USDA-ARS, USA
Mike Gale, John Innes Centre, UNITED KINGDOM
Cal Qualset, ITMI, UC Davis, USA
Pat McGuire, ITMI, UC Davis, USA