Triticeae Unigene Set

Triticeae Unigenes

The Triticeae Unigene Set was generated from the global EST assembly using phrap. Detailed information on the phrap assembly can be found at the Wheat EST Assembly page.

In summary, the first Unigene Set (generated on January 29, 2001) consisted of 6,900 contigs, each containing at least two ESTs; around 20,000 ESTs remained as singletons. The second round of contig assembly was carried out on April 1, 2002 with 77,022 ESTs, which were assembled into 11,758 contigs, leaving 18,903 singletons and bringing the number of Triticeae Unigenes to 30,661. The latest assembly, done on January 8, 2003, included 115,510 ESTs and yielded 18,876 contigs and 23,034 singletons, or 41,910 Unigenes.
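The Unigene totals above are simply the number of contigs plus the number of singletons in each assembly. A minimal Python sketch of that arithmetic follows; the dictionary layout and the function name unigene_count are illustrative assumptions, and only the numbers come from the text. The first assembly is left out because its singleton count is given only approximately.

```python
# Counts copied from the summary above; the data layout is illustrative only.
assemblies = {
    "2002-04-01": {"ests": 77022, "contigs": 11758, "singletons": 18903},
    "2003-01-08": {"ests": 115510, "contigs": 18876, "singletons": 23034},
}


def unigene_count(contigs: int, singletons: int) -> int:
    """A Unigene is either a contig or an EST left unassembled (a singleton)."""
    return contigs + singletons


if __name__ == "__main__":
    for date, counts in assemblies.items():
        total = unigene_count(counts["contigs"], counts["singletons"])
        print(f"{date}: {total} Unigenes from {counts['ests']} ESTs")
```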

To query contig information from all versions, please go to the wEST-SQL page.

Nonduplicated Triticeae Unigenes

Since the majority of the ESTs used for generating the Unigene Set were from 5' sequencing, more than one Unigene may correspond to the same gene when ESTs from that gene do not overlap and therefore cannot be joined during assembly. Thus, a strategy to identify nonduplicated Unigenes was developed. The resulting nonduplicated Unigenes can be treated as an anchor Unigene Set to assist in building an expanded Unigene Set later. These nonduplicated Unigenes have been used as probes for the Deletion Mapping Project and will be the source for microarray work in the future.

The Nonduplicated Unigenes Selection Protocol illustrates the strategy used at Albany, CA to screen and validate nonduplicated ESTs selected from the Unigene Set. The candidate list derived from the Unigene Set contains one EST chosen from each contig (ESTs of hexaploid wheat origin were usually preferred) plus all of the unassembled ESTs. After undesirable Unigenes were removed, namely those with BLAST hits to retroelements, organellar gene sequences, or rRNA, the remaining ESTs were validated using both 5' and 3' sequence data to verify the original EST identity and to remove duplicates. Duplicates were detected with cross_match: any two reads with sequence similarity of 90% or higher over a stretch of 100 bases are considered duplicates. Under this criterion, alleles of the same gene or different members of the same gene family can escape the duplicate check if their sequence similarity is lower than 90%.
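The duplicate criterion can be illustrated with a simplified Python sketch. This is not cross_match and does not reproduce its alignment algorithm or scoring: it assumes ungapped, same-strand comparison of fixed windows, and the names window_identity and is_duplicate are hypothetical. Only the two thresholds (90% similarity, 100 bases) are taken from the text.

```python
def window_identity(a: str, b: str) -> float:
    """Fraction of positions that match between two equal-length strings."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def is_duplicate(read1: str, read2: str,
                 window: int = 100, min_identity: float = 0.90) -> bool:
    """Return True if any ungapped window of `window` bases in read1
    matches some window in read2 at `min_identity` or higher.

    Simplifying assumptions (not cross_match behaviour): no gaps,
    no reverse-complement search, brute-force comparison of all windows.
    """
    r1, r2 = read1.upper(), read2.upper()
    if len(r1) < window or len(r2) < window:
        return False  # too short to satisfy the 100-base stretch
    for i in range(len(r1) - window + 1):
        seg1 = r1[i:i + window]
        for j in range(len(r2) - window + 1):
            if window_identity(seg1, r2[j:j + window]) >= min_identity:
                return True
    return False
```

With these defaults, two 100-base reads that differ at 10 positions are flagged as duplicates, while reads that differ at 11 or more positions are not. This mirrors the caveat above that alleles or gene family members below 90% similarity pass the check.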