ANNOUNCEMENT
International Triticeae EST Cooperative (ITEC) Update: Collaboration Guidelines
At the 9th International Wheat Genetics Symposium, held at the University of
Saskatchewan, Saskatoon, August 2-7, 1998, a proposal was developed to
establish a public database of Expressed Sequence Tags (ESTs) from species
of the Triticeae. The target is to have at least 40,000 ESTs available
publicly from 1 July 2000.
This activity was seen as the first stage in developing an international
effort to produce a public set of information and materials for Triticeae
genome research. Following the Symposium, a discussion period was set until
September 14 and a webpage (http://wheat.pw.usda.gov/genome/) and bulletin
board (http://wheat.pw.usda.gov:8000/cgi-bin/mboard/ITEC/list.cgi) were
established to facilitate public discussion.
The ITEC Steering Committee has reviewed the suggestions, comments, and
questions and has produced the following description and guidelines for the
ITEC effort. Comments and offers to participate should be emailed to
genome@pw.usda.gov. Offers to participate should provide full contact
information and details of the proposed participation with respect to the
guidelines below. The ITEC Steering Committee members are recipients at this
address. A summary of commentary will be posted to the ITEC webpage and
bulletin board. A roster of participants will be maintained there as well,
as offers of participation are received and confirmed.
ITEC Guidelines
- The definition of an ITEC participating laboratory will be flexible to
accommodate the great variety of funding and staffing levels and public or
private status of interested organizations. The overall goal is production
of a large body of information and materials in the public domain in as
efficient, rapid, and generous a manner as possible. These criteria will
guide the definition. In many cases, this means that a reasonable balance
between the size (funding and staff) of an organization and the type and
level of its contribution to the ITEC effort will need to be worked out
between the ITEC Steering Committee and individual organizations with
interest in participating. It is anticipated that labs that produce or
arrange to purchase fewer than the target number of sequences will also
contribute in other ways, such as providing cDNA libraries, coordination,
analysis, or other services. The roster of participants and the type and
level of contribution will be part of the publicly available database.
- The general target for each participating laboratory will be the
submission, by 1 July 1999, of 1,000 EST sequences, a sample of the clone
used for each sequence, and the trace file from the sequencing. More will
come from some participants and fewer from others, with a minimum number set
at 250 sequences. Sequences will be accepted after 1 July 1999, but a
submitting laboratory will have access to the database only after its
sequences have been received. All sequences will be publicly available on
1 July 2000 irrespective of submission date.
- The EST sequences should be of random clones from an unamplified or
prescreened cDNA library. Each sequence should be at least 300 bp of
non-vector sequence with a maximum of 5% ambiguous bases, the minimum to
achieve reasonable reliability with BLAST searches. The cDNA can be from any
species of the Triticeae, although most data are expected to be from wheat
and barley. A call will be made for information about existing, available
cDNA libraries. Watch the ITEC webpage.
High-quality cDNA libraries are critical to the ITEC effort. The following
standards must therefore be met before a library will be accepted for ITEC:
The contributor must confirm that a sample of at least 100 of the clones has
been sequenced to ensure that inserts are present in most clones and to
monitor the size of the inserted sequences. Useful sequences (longer than
300 bases) should be present in at least 80% of the sequenced samples.
- On 1 July 2000 the entire database will be made publicly available.
- All participating laboratories will have free access to the database for
one year, from 1 July 1999 until the date of public access to the database
(1 July 2000). Access to the database during this year will be password
protected. Any participating laboratory may make the sequences which it
submitted to ITEC publicly available in advance of 1 July 2000, but not
through the channel of the ITEC database.
- During both the restricted and the general access periods, the data will
be held in the GrainGenes database (http://wheat.pw.usda.gov/) maintained by
the US Dept. of Agriculture. Data will be available for simple downloading
via an ftp address. Importing the data into the ACEDB environment of
GrainGenes will be only one part of the pre-processing; it may simplify the
search for desired sequences and will simplify the post-processing phase,
when sequence data will be distributed publicly via GrainGenes.
- The sample of each clone sequenced will be maintained through the
GrainGenes Probe Repository.
- Another component of the database will be a roster of the cDNA libraries
for which sequencing has been undertaken and the tissue/stress/developmental
stage status of each library. The database will serve as a clearinghouse for
linking available libraries with laboratories interested in particular
tissues and developmental stages and as a means for locating labs which can
produce libraries for distribution. The objective is to avoid duplication of
effort and enhance the diversity of the resulting ESTs.
- Securing funding for the data and clone repository will be an ongoing
task for the Triticeae research community. While the maintained data will be
available electronically, at some point a charge for distribution of
materials will be necessary. Initially, the database will fall under the
mandate of activities for GrainGenes. However, the GrainGenes Probe
Repository will need support for the maintenance of the clone samples.
Additional contributions of financial support have been offered and others
will be sought with the proviso that the information and materials remain in
the public domain.
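The numeric criteria in the guidelines above (at least 300 bp of non-vector
sequence with at most 5% ambiguous bases per EST; at least 100 clones sampled
per library, with useful sequences in at least 80% of them) could be checked
before submission with something like the following Python sketch. It is only
an illustration and not an ITEC tool: the function names are hypothetical,
vector sequence is assumed to have been trimmed already, and any base other
than A, C, G, or T is assumed to count as ambiguous.

```python
# Hypothetical pre-submission checks based on the thresholds in the guidelines.
UNAMBIGUOUS = set("ACGT")  # assumption: every other symbol (N, R, Y, ...) is ambiguous


def ambiguous_fraction(seq: str) -> float:
    """Fraction of bases that are not an unambiguous A, C, G, or T."""
    seq = seq.upper()
    if not seq:
        return 1.0  # treat an empty read as entirely unusable
    return sum(base not in UNAMBIGUOUS for base in seq) / len(seq)


def est_meets_guidelines(non_vector_seq: str,
                         min_length: int = 300,
                         max_ambiguous: float = 0.05) -> bool:
    """At least 300 bp of non-vector sequence, at most 5% ambiguous bases."""
    return (len(non_vector_seq) >= min_length
            and ambiguous_fraction(non_vector_seq) <= max_ambiguous)


def library_meets_guidelines(sampled_lengths: list[int],
                             min_sample: int = 100,
                             useful_length: int = 300,
                             min_useful_fraction: float = 0.80) -> bool:
    """At least 100 sampled clones, with at least 80% longer than 300 bases."""
    if len(sampled_lengths) < min_sample:
        return False
    useful = sum(length > useful_length for length in sampled_lengths)
    return useful / len(sampled_lengths) >= min_useful_fraction
```

Here sampled_lengths is assumed to hold one usable read length per sampled
clone, with 0 recorded for a clone that has no insert.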
Scenarios of ITEC participation.
Given that there will be a flexible definition of participating laboratory,
there will be several possibilities that should allow laboratories without
sequencing capabilities to become participants and achieve early access to
the database. Following are several of the scenarios that have been envisioned.
- A participating laboratory produces or acquires a cDNA library, sequences
1000 clones in house, and submits the data, a sample of each clone
sequenced, and the sequencing trace files to the GrainGenes database and
repository.
- A participating laboratory produces or acquires a cDNA library and
contracts to have 1000 clones sequenced, subsequently submitting the data, a
sample of each clone sequenced, and trace files to GrainGenes.
- A participating laboratory produces a cDNA library and provides it and
funding to a sponsoring participating laboratory which will arrange for the
sequencing and data and file submission to GrainGenes.
- A participating laboratory provides funding to a sponsoring participating
laboratory for a set number of ESTs. The sponsoring laboratory will arrange
for the sequencing of that number of ESTs from a cDNA library of its choice
and will submit the data and files to GrainGenes.
Recognition of contributions of participants.
A given sequence in the ITEC database will be designated with a label that
allows identification of the participating organization which funded its
production. This, along with the database's roster of participants and the
type and level of their contribution, will allow identification of an
organization's contribution to the ITEC effort. This labelling is for
recognition purposes only and should not be construed to imply any ownership
of an EST with respect to any subsequent use.
Facilitation of contracting for sequence production.
The ITEC webpage will maintain links to organizations which will provide, on
a fee basis, the range of technologies and services relevant to ITEC's
goals: e.g., cDNA library construction, prescreening, clone picking,
sequencing, etc. At least three such organizations are currently known to
the Steering Committee, and it appears that the cost of providing 1000
sequences currently ranges from US$8,000 to US$10,000.
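For rough budgeting, the quoted range can be scaled to other contribution
sizes. The Python sketch below assumes that cost scales linearly with the
number of sequences, which the figures above do not guarantee (small orders
may well cost more per sequence); the function name is hypothetical.

```python
def sequencing_cost_range(n_sequences: int,
                          low_per_1000: int = 8000,
                          high_per_1000: int = 10000) -> tuple[int, int]:
    """Return (low, high) cost in US dollars, assuming linear scaling."""
    return (n_sequences * low_per_1000 // 1000,
            n_sequences * high_per_1000 // 1000)


# Contribution sizes mentioned in the guidelines.
for label, n in [("minimum contribution", 250),
                 ("per-laboratory target", 1000),
                 ("overall ITEC target", 40000)]:
    low, high = sequencing_cost_range(n)
    print(f"{label}: {n} sequences, roughly US${low:,} to US${high:,}")
```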
Further ITEC activity.
Since 40,000 sequences is a very minimal target (a new US National Science
Foundation-funded project to produce ESTs for tomato has a target of
90,000), a second (and possibly larger) stage of sequence accumulation may
be organized. In other words, another collection date may be set for a new
target number of sequences, again with a one-year lag until public
availability.
We thank those of you who have contributed to the discussion about the
establishment of ITEC.
ITEC Steering Committee members:
- Peter Langridge, Univ. of Adelaide, AUSTRALIA
- Olin Anderson, USDA-ARS, USA
- Perry Gustafson, USDA-ARS, USA
- Mike Gale, John Innes Centre, UNITED KINGDOM
- Cal Qualset, ITMI, UC Davis, USA
- Pat McGuire, ITMI, UC Davis, USA