Wheat Transcriptome Sequences

We have generated three transcriptome datasets: one for Triticum urartu, one for Triticum turgidum, and a complementary set of published wheat transcripts not present in the T. turgidum set.

Transcripts
The transcript datasets include 5' and 3' untranslated regions. T. urartu: 86,247; T. turgidum: 140,118

Open reading frames
ORFs start at ATG and end at the stop codon unless they are truncated. Predicted pseudogenes have been excluded. T. urartu: 37,806; T. turgidum: 66,633
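As a rough illustration of this ORF definition, the Python sketch below scans the three forward reading frames of a transcript for the longest ATG-to-stop stretch. If no in-frame stop codon is reached before the end of the sequence, it reports the ORF as 3'-truncated. This is a naive, hypothetical helper (`longest_orf`), not findorf: it ignores the reverse strand, 5'-truncated ORFs that lack an ATG, and the BLASTX homology evidence that findorf relies on. It also does not detect pseudogenes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest ORF on the forward strand of `seq` (hypothetical helper).

    An ORF starts at ATG and runs to the first in-frame stop codon (inclusive).
    If no stop is found before the end of the sequence, the ORF is treated as
    3'-truncated and runs to the last complete codon.
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        # Step through complete codons only.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens an ORF
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ATG
        if start is not None:
            # 3'-truncated ORF: no in-frame stop before the end of the transcript.
            end = start + (len(seq) - start) // 3 * 3
            orf = seq[start:end]
            if len(orf) > len(best):
                best = orf
    return best
```

For example, `longest_orf("CCATGAAATTTTAGCC")` returns `"ATGAAATTTTAG"`, while `longest_orf("ATGAAACCCG")` returns the truncated ORF `"ATGAAACCC"`.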

Translated ORFs
Select BLASTP to see the protein datasets. Predicted pseudogenes have been excluded. T. urartu: 37,806; T. turgidum: 66,633
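Translating an ORF into its protein sequence can be sketched as follows, assuming the standard genetic code (NCBI translation table 1). The names `CODON_TABLE` and `translate_orf` are illustrative, and rendering codons that contain ambiguous bases as X is an assumption of this sketch, not necessarily how the protein datasets were produced.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_orf(orf: str) -> str:
    """Translate an ORF codon by codon, stopping at the first stop codon.

    Codons not in the table (e.g. containing N) are translated as 'X'.
    """
    orf = orf.upper()
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")
        if aa == "*":
            break  # the stop codon is not part of the protein sequence
        protein.append(aa)
    return "".join(protein)
```

For example, `translate_orf("ATGAAATTTTAG")` returns `"MKF"`.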

"Complementary" wheat transcribed sequences
Since our tetraploid transcriptome was assembled from a limited number of tissues and developmental stages, we generated an additional non-redundant set of sequences from
- published wheat transcriptomes [1,2,3],
- full-length cDNA datasets [4], and
- re-assembled wheat ESTs.
The initial non-redundant set contained 146,300 contigs and is available in the dataset "Published wheat transcripts".
Based on BLASTX searches against databases of characterized plant proteins, we used findorf to predict 65,921 ORFs (>30 amino acids, excluding pseudogenes) in this set. We then used CD-HIT-2D to identify and remove ORFs already present in our T. turgidum dataset. The remaining 27,544 non-redundant ORFs are available in "Complementary wheat ORFs" and "Complementary wheat proteins".
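The final filtering step can be approximated with the sketch below. It keeps candidate proteins longer than 30 amino acids and drops any that occur as an exact substring of a T. turgidum protein. This containment test is a deliberately simplified stand-in for CD-HIT-2D, which compares sequences against a user-set identity threshold rather than requiring exact matches; the function name `complementary_proteins` and its inputs are hypothetical.

```python
def complementary_proteins(candidates: dict[str, str],
                           reference: dict[str, str],
                           min_aa: int = 30) -> dict[str, str]:
    """Return candidate proteins that pass the length filter and are absent from `reference`.

    candidates: name -> protein sequence (e.g. predicted from published transcripts)
    reference:  name -> protein sequence (e.g. the T. turgidum set)
    """
    kept = {}
    for name, protein in candidates.items():
        # Length filter: keep only ORFs longer than `min_aa` amino acids.
        if len(protein) <= min_aa:
            continue
        # Simplified redundancy test: discard the candidate if it is contained
        # verbatim in any reference protein (CD-HIT-2D uses an identity cutoff instead).
        if any(protein in ref for ref in reference.values()):
            continue
        kept[name] = protein
    return kept
```

Exact containment corresponds to 100% identity over the full candidate length, so it removes fewer sequences than a CD-HIT-2D run at a lower identity threshold, and the all-against-all scan is only practical for small examples.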

Cautionary note: some of the assembled T. turgidum transcripts and predicted ORFs are chimeras between A- and B-genome transcripts.

If you use this data in your publications, please cite:

Krasileva, K.V., V. Buffalo, P. Bailey, S. Pearce, M. Soria, F. Tabbita, C. Uauy, International Wheat Genome Sequencing Consortium, and J. Dubcovsky*. 2013. Separating homeologs by phasing in the tetraploid wheat transcriptome. Genome Biology (in press).


