Triticeae Repeat Classification
Classification of repetitive elements is a hotly debated issue. After extensive discussions, a unified classification system for transposable elements (TEs) was proposed (Wicker, Sabot, Hua-Van et al., Nat. Rev. Genet. 8:973-982, 2007). In this system, repetitive elements are classified into two main groups (classes), which are further subdivided (in hierarchical order) into subclasses, orders, superfamilies and families. A three-letter code that precedes the family name describes class, order and superfamily, so that an element's classification can be seen at a glance.
The following sections outline the main groups of TEs that are found in plants. Each description includes some of the characteristics by which the elements can be identified. Please be aware that the described characteristics apply to most of the TEs of a particular group, but there are always exceptions. More detailed descriptions can be found in the review by Wicker, Sabot, Hua-Van et al. (2007).
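The three-letter code described above can be read programmatically. The following is a minimal sketch, not part of TREP itself. The lookup tables contain only the letters that occur in the codes mentioned in this text (RLG, RLC, RLX, RIX and XXX); any other letter is reported as "not in table". The underscore separator between code and family name, and the family name "Example", are assumptions made for illustration.

```python
# Letters taken from the codes that appear in this text: RLG, RLC, RLX, RIX, XXX.
CLASS = {"R": "Class 1 (retrotransposon)", "X": "unknown"}
ORDER = {"L": "LTR retrotransposon", "I": "LINE", "X": "unknown"}
SUPERFAMILY = {"G": "Gypsy", "C": "Copia", "X": "unknown"}


def decode_code(code):
    """Translate a three-letter code into (class, order, superfamily) labels."""
    if len(code) != 3:
        raise ValueError("expected a three-letter code, got %r" % (code,))
    c, o, s = code.upper()
    return (
        CLASS.get(c, "not in table"),
        ORDER.get(o, "not in table"),
        SUPERFAMILY.get(s, "not in table"),
    )


def split_name(name):
    """Split a name such as 'RLG_Example' into (code, family).

    Assumption: code and family name are joined by an underscore.
    """
    code, _, family = name.partition("_")
    return code, family


if __name__ == "__main__":
    code, family = split_name("RLG_Example")  # hypothetical element name
    print(family, decode_code(code))
```

Keeping one table per position mirrors the hierarchy (class, then order, then superfamily), so letters for further groups can be added without touching the decoding logic.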
Class 1 elements (Retrotransposons)
They replicate via an mRNA that is reverse transcribed into DNA. Thus, each replication cycle produces an additional copy (copy-and-paste mechanism). Key proteins are gag, reverse transcriptase (RT), RNase H (RH) and integrase (INT).
Order LTR retrotransposons
Long terminal repeat (LTR) retrotransposons have an internal domain which usually contains several genes and is flanked by direct repeats (LTRs) that contain promoter and downstream regions. The two LTRs range in size from a few hundred bp to almost 5 kb, depending on the repeat family. The elements are usually flanked by a 5 bp target site duplication (TSD), and the LTRs terminate in the highly conserved TG…CA motifs.
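The structural hallmarks named above (TG…CA termini on both LTRs and a 5 bp TSD) can be checked on a candidate element with a few string comparisons. This is a minimal sketch under stated assumptions: the function name, its arguments and the idea that the LTR length is already known are all hypothetical, and real elements may deviate from these hallmarks.

```python
def check_ltr_hallmarks(upstream, element, downstream, ltr_len, tsd_len=5):
    """Report whether a candidate shows TG...CA LTR termini and a TSD.

    upstream / downstream: genomic sequence directly flanking the element.
    element: sequence from the first base of the 5' LTR to the last base
             of the 3' LTR.
    ltr_len: assumed length of each LTR (hypothetical input, e.g. from an
             alignment of the two element ends).
    """
    if ltr_len < 2 or 2 * ltr_len > len(element):
        raise ValueError("ltr_len does not fit the element")
    element = element.upper()
    ltr5, ltr3 = element[:ltr_len], element[-ltr_len:]

    # Both LTRs should start with TG and end with CA.
    termini_ok = all(
        ltr.startswith("TG") and ltr.endswith("CA") for ltr in (ltr5, ltr3)
    )

    # The TSD is the same short sequence directly on both sides of the element.
    left = upstream[-tsd_len:].upper()
    right = downstream[:tsd_len].upper()
    tsd_ok = len(left) == tsd_len and left == right

    return {"tg_ca_termini": termini_ok, "tsd": tsd_ok}


if __name__ == "__main__":
    ltr = "TGACGTACGTCA"  # toy 12 bp LTR, far shorter than real LTRs
    element = ltr + "GGGGCCCCAAAA" + ltr
    print(check_ltr_hallmarks("AAAAGATTC", element, "GATTCTTTT", ltr_len=12))
```

The function returns the two checks separately because, as noted in the introduction, there are always exceptions: an element that fails one test may still be a genuine LTR retrotransposon.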
Superfamily Gypsy (code RLG)
Protein domain order: gag – RT/RH – INT
Superfamily Copia (code RLC)
Protein domain order: gag – INT – RT/RH
Superfamily Unknown (code RLX)
Many LTR retrotransposons do not contain any coding regions in their internal domain and consist only of two LTRs with an internal domain of variable size. Depending on their overall length, they are sometimes referred to as TRIMs (terminal-repeat retrotransposons in miniature) or LARDs (large retrotransposon derivatives). Because they lack coding regions, they cannot be classified as Gypsy or Copia (hence their code “RLX”).
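The distinction between the three LTR superfamily codes comes down to the order of the INT and RT domains. The sketch below illustrates that rule only; it assumes the protein domains have already been identified by some other means, and the function name and domain labels are hypothetical. Treating every element that lacks either domain as RLX is a simplification.

```python
def classify_ltr_superfamily(domains):
    """Assign RLG, RLC or RLX from the 5'->3' order of protein domains.

    domains: list of domain labels in the order they occur, for example
             ["gag", "RT", "RH", "INT"]. The labels are assumptions made
             for this sketch.
    """
    names = [d.upper() for d in domains]
    if "INT" not in names or "RT" not in names:
        # No usable coding information: superfamily stays unknown.
        return "RLX"
    # Copia: INT precedes RT/RH. Gypsy: RT/RH precedes INT.
    return "RLC" if names.index("INT") < names.index("RT") else "RLG"


if __name__ == "__main__":
    print(classify_ltr_superfamily(["gag", "RT", "RH", "INT"]))
    print(classify_ltr_superfamily(["gag", "INT", "RT", "RH"]))
    print(classify_ltr_superfamily([]))
```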
Order LINE (long interspersed nuclear elements)
Also called “Non-LTR retrotransposons”, they do not contain LTRs and are probably under the control of an internal promoter. Sometimes, the 3’ end terminates in a poly-A tail that originates from the reverse transcription of the mRNA transcript. Due to their insufficient characterisation in plants, LINE superfamilies have not been assigned yet. Thus, all LINEs in TREP have the code “RIX”.
Due to frequent premature abortion of reverse transcription, most LINEs are only fragments of the 3’ region. Full-length elements are very rare. They have a size of 6-10 kb and usually contain two ORFs: a 5’ ORF whose function is not entirely clear and a 3’ ORF that contains the reverse transcriptase domain.
Order SINE (short interspersed nuclear elements)
SINEs are only poorly characterised in plants. Some of the unclassified (“XXX”) elements in TREP might actually be SINEs.
Class 2 elements (DNA Transposons)
They usually encode a transposase and sometimes other proteins. DNA transposons move via a DNA intermediate, which means that the DNA itself is excised from the genome and integrated elsewhere (cut-and-paste mechanism). They multiply in a complex process called replicative transposition.
Order TIR (Terminal Inverted Repeat)
This order contains all elements that are flanked by terminal inverted repeats (TIRs), which can range in size from a few to several hundred bp. Autonomous elements often encode more than one protein product and their coding regions can contain introns. However, most TIR elements are non-autonomous and consist only of TIR sequences that flank an internal domain with no coding capacity. Thus, the sequence of the TIRs and the size of the TSD are the most important classification criteria.
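Because the TIRs are one of the main classification criteria, it can help to measure how far the inverted-repeat structure extends from the ends of a candidate element. The following is a minimal sketch that counts perfectly matching positions only; real TIRs often contain mismatches, so an alignment-based approach would be more robust. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def perfect_tir_length(element):
    """Count how many bases at the 5' end match the reverse complement
    of the 3' end without any mismatch.

    The comparison stops at the first mismatch and never extends past
    the middle of the element.
    """
    seq = element.upper()
    rc = revcomp(seq)
    length = 0
    for a, b in zip(seq[: len(seq) // 2], rc):
        if a != b:
            break
        length += 1
    return length


if __name__ == "__main__":
    left_tir = "CAGGTTAC"  # toy 8 bp TIR
    element = left_tir + "AAAAAAAAAA" + revcomp(left_tir)
    print(perfect_tir_length(element))
```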
CACTA elements are flanked by relatively short TIRs (15 – 100 bp) which terminate in a conserved CACTA…TAGTG motif. They produce a 3 bp TSD. Many of them are non-autonomous elements. CACTAs can be very difficult to recognise because the TIRs of different subfamilies show very little sequence conservation besides the CACTA motif. Even that is sometimes modified to CACTG.
They usually have very large terminal inverted repeats (200-500 bp) and are flanked by a 9 bp target site duplication. Most are non-autonomous derivatives which contain no coding sequence and basically consist of large inverted repeats which are separated by an internal domain of a few hundred bp.
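Since the terminal motif is often the only conserved part of a CACTA TIR, a first screen can simply test both ends of a candidate for it and look for a 3 bp TSD. This is a minimal sketch with hypothetical function names. It assumes that the CACTG variant may occur at either end, which the text does not state explicitly.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Terminal motifs read from the outside inwards. CACTG is the modified form.
CACTA_MOTIFS = {"CACTA", "CACTG"}


def revcomp(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_cacta_termini(element):
    """True if the 5' end starts with the motif and the 3' end carries its
    reverse complement (e.g. CACTA ... TAGTG)."""
    five = element[:5].upper()
    three = revcomp(element[-5:])
    return five in CACTA_MOTIFS and three in CACTA_MOTIFS


def has_tsd(upstream, downstream, tsd_len=3):
    """True if the tsd_len bases flanking the element on both sides are identical."""
    left = upstream[-tsd_len:].upper()
    right = downstream[:tsd_len].upper()
    return len(left) == tsd_len and left == right


if __name__ == "__main__":
    element = "CACTA" + "GGGTTTAAACCC" + "TAGTG"  # toy element
    print(has_cacta_termini(element), has_tsd("GGCATC", "ATCGGA"))
```

A positive result is only a hint: a five-base motif occurs by chance fairly often, so additional evidence such as the TIR length and the TSD should be taken into account.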
So far, only very few such elements are deposited at TREP. It is therefore not yet possible to make general statements about their structure and sequence organisation. They can have very short TIRs of only a few bp and are flanked by an 8 bp TSD.
For simplicity’s sake they are called Mariner here. These elements have short TIRs of about 10-30 bp and are generally flanked by a TA TSD. The vast majority of them are small non-autonomous elements. The best known are the large group of Stowaway MITEs (miniature inverted-repeat transposable elements). They are small (50-500 bp), have a well-conserved inverted repeat structure and do not encode proteins.
All Stowaways terminate in a CTCCCTCC motif (or something very similar to that). In the TREP database, Stowaway elements were further classified into several families. These differ in size but all contain the conserved Stowaway motif.
These elements are here simply called Harbinger. They contain short TIRs which often terminate in a short stretch of C’s or G’s and are flanked by a T/A rich 3 bp TSD. Many are MITEs of the Tourist group, which have a G-rich TIR sequence in common.
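The Stowaway characteristics described above (size of 50-500 bp, TA TSD, CTCCCTCC terminal motif) can be combined into a simple screen. The sketch below is an illustration under stated assumptions, not the procedure used to curate TREP: the function names are hypothetical, and allowing one mismatch is an arbitrary way of modelling "something very similar".

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOWAWAY_MOTIF = "CTCCCTCC"


def revcomp(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def mismatches(a, b):
    """Number of differing positions between two equally long strings."""
    return sum(x != y for x, y in zip(a, b))


def looks_like_stowaway(upstream, element, downstream, max_mismatch=1):
    """Screen a candidate for the Stowaway hallmarks.

    max_mismatch: tolerated differences from the terminal motif at each end
                  (assumed threshold, chosen for illustration).
    """
    element = element.upper()
    n = len(STOWAWAY_MOTIF)
    size_ok = 50 <= len(element) <= 500
    # A TA target site duplication directly on both sides of the element.
    tsd_ok = upstream.upper().endswith("TA") and downstream.upper().startswith("TA")
    five_ok = mismatches(element[:n], STOWAWAY_MOTIF) <= max_mismatch
    three_ok = mismatches(revcomp(element[-n:]), STOWAWAY_MOTIF) <= max_mismatch
    return size_ok and tsd_ok and five_ok and three_ok


if __name__ == "__main__":
    element = STOWAWAY_MOTIF + "AT" * 20 + revcomp(STOWAWAY_MOTIF)  # toy, 56 bp
    print(looks_like_stowaway("GGCTA", element, "TACCG"))
```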
So far, there are only 3 Helitrons from Triticeae and a few from rice in TREP. Soon there should be more information on them available here.
These are elements which could not (yet) be classified into one of the above groups, but which occur in multiple copies. The presence of a target site duplication is often the only indication that they represent mobile DNA elements. Some of them probably represent SINEs.
Repeats of up to several hundred bp occur frequently in the Triticeae genomes. Tandem repeats are often found within CACTA transposons (e.g. “Afa” repeats) and are likely to be a functional part of these elements. Many types, however, do not appear to have a common structure or specific function.