At the Dundee meeting of the International Triticeae Mapping Initiative, September 2001, there was a call to create a database of Triticeae genomic repeat sequences.
Such a database would be valuable for masking repeats out of sequences where they
don't belong, e.g. ESTs. The same plea was sounded by Lukas Wagner of NCBI, for
improving the quality of his UniGene sets for wheat and barley.
A good knowledge of the different types of repetitive elements is also important for the sequencing and analysis of large regions. It can therefore make the life of Triticeae researchers much easier.
The TREP database contains a collection of repetitive DNA sequences from different Triticeae species. All elements were classified and annotated as far as possible. An individual TREP entry should contain only the element of interest, without flanking sequences, so that the actual borders of the element in your query sequence are easier to determine. Elements with coding capacity (i.e. transposons) are deposited in 5’ – 3’ orientation.
The non-redundant database contains only one or very few copies of the different types of repetitive elements in order to make Blast searches more efficient.
The complete database contains all elements which were submitted and allows studies of the different types of elements.
The database for hypothetical proteins (PTREP) contains deduced amino acid sequences. Due to the degenerate nature of most transposable elements, many protein sequences contain in-frame stop codons. For the deduction of the hypothetical proteins, frameshifts had to be removed in many cases. PTREP is useful for the identification of divergent repeats which do not show similarity at the DNA level.
All repetitive sequences have a unique identifier, the TREP accession number (the prefix “TREP” followed by a number). Each annotation header contains the classification of the repetitive element. In addition, there is information about where the element was found (i.e. the GenBank or EMBL accession number), a reference if the source sequence was published, and the name of the person or institution who submitted the sequence to TREP. Where the data were available, the positions of special features such as long terminal repeats (LTRs), coding regions etc. were annotated.
Hypothetical protein sequences have the identifier “PTREP” followed by a number.
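The two identifier schemes can be told apart with a simple pattern match. The following is a minimal sketch in Python, not part of TREP itself. It assumes the number follows the prefix directly, with no separator, which the description above does not state. The function name classify_identifier and the example identifiers are hypothetical.

```python
import re

# Assumption: the number follows the prefix directly, with no separator.
_ID_PATTERN = re.compile(r"^(PTREP|TREP)(\d+)$")


def classify_identifier(identifier):
    """Return ("dna" or "protein", number) for a TREP-style identifier.

    Returns None if the string does not match either scheme.
    """
    match = _ID_PATTERN.match(identifier.strip())
    if match is None:
        return None
    kind = "protein" if match.group(1) == "PTREP" else "dna"
    return kind, int(match.group(2))


# Hypothetical identifiers, used only to illustrate the two schemes.
print(classify_identifier("TREP12"))
print(classify_identifier("PTREP7"))
```

A helper like this could be used to route Blast hits to the DNA or the protein collection by their identifier alone.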
The description (DE) line
The description line corresponds to the information shown in the results of your Blast search. The classification of each repetitive element is given in hierarchical order. If the element was given a specific name by the researcher, that name is also displayed as part of the classification.
Additional remarks (e.g. “complete element” or “fragment”) are separated from the classification by a semicolon. In the case of a complete element, the size of the target site duplication (TSD) may also be indicated as additional evidence that the complete element was indeed isolated.
The DE line of a well classified element looks like this:
DE retrotransposon, LTR, copia, “angela-6”; complete element (5 bp TSD)
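A DE line of this shape can be split into its parts mechanically. The Python sketch below is only an illustration based on the example line above. It assumes that the classification levels are comma-separated, that a researcher-given name appears as a quoted last level, and that a TSD size is written as "(N bp TSD)". The function name parse_de_line and the returned field names are hypothetical.

```python
import re

# Straight and typographic double quotes that may surround an element name.
_QUOTES = "\"\u201c\u201d"


def parse_de_line(line):
    """Split a TREP-style DE line into classification, name and remark."""
    # Drop the leading "DE" tag if present.
    text = re.sub(r"^DE\s+", "", line.strip())
    # Classification and remarks are separated by the first semicolon.
    classification_part, _, remark = text.partition(";")
    levels = [p.strip() for p in classification_part.split(",") if p.strip()]
    # Assumption: a researcher-given name is the quoted last level.
    name = None
    if levels and levels[-1][0] in _QUOTES:
        name = levels.pop().strip(_QUOTES)
    remark = remark.strip()
    # Assumption: a TSD size is written as "(N bp TSD)" in the remark.
    tsd = re.search(r"\((\d+)\s*bp\s+TSD\)", remark)
    return {
        "classification": levels,
        "name": name,
        "remark": remark or None,
        "tsd_bp": int(tsd.group(1)) if tsd else None,
    }


example = "DE retrotransposon, LTR, copia, \u201cangela-6\u201d; complete element (5 bp TSD)"
print(parse_de_line(example))
```

Keeping the hierarchy as an ordered list preserves the order in which the classification is written, so hits can be grouped at any level.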
The minimal classification is: