Nomenclature, classification and annotation of repetitive elements

Olin Anderson, Jorge Dubcovsky, Dave Matthews, François Sabot and Thomas Wicker


4 December 2002, revised April 2005

The nomenclature for transposable elements needs to be standardized so that researchers can quickly recognize the type (Class) of an element, its origin (sequence ID) and its structure (complete element, solo-LTR, fragment etc.).

Repeat nomenclature

After discussion among ourselves we have agreed to recommend the following nomenclature for identifying individual sequenced repeat elements in the Triticeae:

Examples:

The first component should be the type of element, capitalized, followed by an underscore (_). Examples of known element types may be found in total_TREP_list.html. Retrotransposons (Class I) are usually given female names, DNA transposons and Helitrons (Class II) male names, and foldback elements such as MITEs or LITEs (Class II) mythological names.

The second component may be either the GenBank accession of the sequence containing the element, or the name of the clone (BAC etc.) containing it. The former is preferred where feasible.

The third component is a hyphen (-) followed by a number indicating which of the elements of that type in the sequence or clone is meant. For example, Angela_BHG4-2 denotes the second Angela element in BAC clone BHG4.
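The three components can be put together and taken apart mechanically. The following Python sketch is only an illustration and not part of the recommendation: the function names make_repeat_name and parse_repeat_name are hypothetical, and the pattern assumes that the element type contains no underscore and that the copy number follows the last hyphen in the name.

```python
import re

# Assumed pattern: Type_SequenceID-N, e.g. "Angela_BHG4-2".
# The type is taken to end at the first underscore and the copy number
# to start after the last hyphen; both are assumptions of this sketch.
_NAME_RE = re.compile(r"^(?P<type>[^_]+)_(?P<source>.+)-(?P<copy>[1-9]\d*)$")


def make_repeat_name(element_type, source_id, copy_number):
    """Build a name from element type, accession or clone name, and copy number."""
    if copy_number < 1:
        raise ValueError("copy number must be 1 or greater")
    # Capitalize only the first letter and leave the rest untouched.
    element_type = element_type[:1].upper() + element_type[1:]
    return f"{element_type}_{source_id}-{copy_number}"


def parse_repeat_name(name):
    """Split a name into (element type, source ID, copy number)."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a valid repeat name: {name!r}")
    return match["type"], match["source"], int(match["copy"])
```

For instance, make_repeat_name("angela", "BHG4", 2) gives the name used in the example above, and parse_repeat_name splits such a name back into its three parts.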

Classification and annotation

 

Transposable elements are classified by their similarity to known elements at the DNA level. Two elements are considered the same if they show more than 80% sequence identity over most of their length (i.e. there are one or more very strong BLASTN alignments).
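The 80% rule can be written as a simple test on an alignment. The sketch below is an illustration under stated assumptions: the function name is_same_element is hypothetical, and because the text gives no number for "most of their length", the coverage cutoff of 0.5 is a placeholder that should be adjusted. The identity and alignment length could be taken, for example, from the pident and length columns of BLASTN tabular output.

```python
def is_same_element(percent_identity, aligned_length, query_length,
                    subject_length, min_identity=80.0, min_coverage=0.5):
    """Return True if an alignment meets the identity and coverage criteria.

    min_coverage is an assumption of this sketch: the text says only
    "most of their length", so 0.5 is used as a placeholder.
    """
    # Coverage is measured against the longer of the two elements, so that
    # a short fragment matching part of a long element is not accepted.
    coverage = aligned_length / max(query_length, subject_length)
    return percent_identity > min_identity and coverage > min_coverage
```

Measuring coverage against the longer element is a deliberately strict choice; a truncated copy of a known element would fail this test and would need to be judged by eye.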

 

Elements detected only by protein homology or by software (e.g. LTR_STRUC or DOTTER) can be considered novel elements and should be given a new name.

The name is followed by the accession number of the source sequence and the number of the individual copy of the element on that sequence (e.g. Hades_AF325187-1).

 

In the annotation of the sequence that contains the element, it should be annotated either as LTR (for solo-LTRs) or repeat_unit (for all others).

Below are proposed annotation features that classify and characterize a transposable element. An element can have several of the described features, as long as they do not contradict one another (e.g. “partial element” and “complete element”).

 

/rpt_type (or LTR) = Class, Subclass, Family

E.g. Class I, LTR retrotransposon, copia

 

/name (or label) = Name_accession number

E.g. “Angela_AF325187-1”

 

/note=“Complete element”

if the element is complete, with two intact ends (with TSD) and no major internal deletions

 

/note=“Partial element, description”

 if the element is truncated on one end (e.g. “Partial element, truncated 5’ end”)

 

/note=“Fragmented element”

if there is only a fragment of the internal sequence

 

/note=“Degenerated element”

if there are only small parts of the element (less than 100 bp for an LTR retrotransposon, for example)

 

/note=“Interrupted element”

if the element carries nested insertions of other elements  (the element should also be complete).

 

/note=“TSD is NNNNN”

for complete elements

 

/note=“TSD is probably NNNNN”

for partial elements (e.g. only one intact end is present)
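Whether a set of notes is free of contradictions can be checked automatically. The Python sketch below is an illustration only; the function name check_notes is hypothetical. It encodes three rules read from the list above: “Complete element” and “Partial element” exclude each other, an interrupted element should be complete (read here as: it cannot also be partial), and a definite “TSD is ...” note belongs to complete elements only.

```python
def check_notes(notes):
    """Return a list of problems found in a collection of /note values."""
    lowered = [note.strip().lower() for note in notes]
    is_complete = "complete element" in lowered
    is_partial = any(n.startswith("partial element") for n in lowered)
    is_interrupted = "interrupted element" in lowered
    # "TSD is probably ..." is the hedged form and is not counted here.
    has_definite_tsd = any(
        n.startswith("tsd is ") and not n.startswith("tsd is probably")
        for n in lowered
    )

    problems = []
    if is_complete and is_partial:
        problems.append("'Complete element' contradicts 'Partial element'")
    if is_interrupted and is_partial:
        problems.append("an interrupted element should be complete, not partial")
    if is_partial and has_definite_tsd:
        problems.append("partial elements should use 'TSD is probably ...'")
    return problems
```

An empty list means that none of these three rules is violated; it does not mean that the annotation is biologically correct.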

 

For “coding” sequences, we distinguish between the putative functional protein (in general found only in complete elements) and partial protein sequences.

 

For a complete sequence, we give the ID of the protein that shows the best hit (e.g. /similarity=“AAO00696, putative reverse transcriptase (O. sativa) E=e-116”) and, if possible, the level (percentage) of similarity. If the protein does not appear to be functional, it can be characterized as follows:

 

/note=“fragment of putative protein XXX; non-functional”

 

If protein sequences are derived from transposable elements, they can be submitted to TREP and will be integrated in the collection of hypothetical proteins (PTREP).

 

Example

The following example characterizes the second Angela element found on BAC 111A01. The element is in reverse orientation. It is a complete element that carries the nested insertion of another repeat.

 

repeat_unit   complement(10000..15000,19000..22000)

              /rpt_type=“Class I, LTR retrotransposon, copia”

              /name=“Angela_111A01-2”

              /note=“Interrupted element”

              /note=“TSD is ACGTA”
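An entry in this layout can also be generated by a small script. The Python sketch below is an illustration under stated assumptions: the function name format_repeat_feature and its parameters are hypothetical, the qualifiers follow the /rpt_type, /name and /note features proposed above, and the location is written in the same notation as in the example rather than in any stricter database syntax.

```python
def format_repeat_feature(name, rpt_type, segments, notes,
                          complement=False, solo_ltr=False):
    """Render one repeat annotation as feature-table style text."""
    # Solo-LTRs are annotated as LTR, all other elements as repeat_unit.
    key = "LTR" if solo_ltr else "repeat_unit"

    # Segments are (start, end) pairs; a nested insertion splits the
    # element into several segments, written here as in the example.
    spans = ",".join(f"{start}..{end}" for start, end in segments)
    location = f"complement({spans})" if complement else spans

    qualifiers = [f'/rpt_type="{rpt_type}"', f'/name="{name}"']
    qualifiers += [f'/note="{note}"' for note in notes]

    indent = " " * 14
    lines = [f"{key:<14}{location}"]
    lines += [indent + qualifier for qualifier in qualifiers]
    return "\n".join(lines)


print(format_repeat_feature(
    name="Angela_111A01-2",
    rpt_type="Class I, LTR retrotransposon, copia",
    segments=[(10000, 15000), (19000, 22000)],
    notes=["Interrupted element", "TSD is ACGTA"],
    complement=True,
))
```

The two segments correspond to the parts of the Angela element on either side of the nested insertion.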

 

Complications

There will probably be complicated cases that are not covered by the recommendations above. If you find logical problems that need addressing, please let us know and we can discuss them further.