Repetitive DNA

When the chromosomes of an organism such as wheat or pine are dissected at the molecular level, stretches of nucleotide sequence that occur once or only a few times - including the genes - represent as little as 5% of the DNA. Most plant and animal genomes consist largely of repetitive DNA - perhaps 30 sequence motifs, typically one to 10 000 nucleotides long, each present many hundreds or thousands of times in the genome and located either at a few defined chromosomal sites or widely dispersed.

Species across wide taxonomic groupings have similar genes and similar arrangements of genes along the chromosomes - they show conserved synteny. However, knowledge of synteny - provided by high-density, marker-saturated genetic maps and genomic DNA sequence data - tells us relatively little about the large-scale physical organization of the chromosomes or about the repetitive DNA elements that make up the bulk of most genomes. We are working on tandem repeats (satellite DNA) in animals - Drosophila, bovids and bivalves (scallops).

Repetitive DNA, with different selective pressures from those acting on genes and evolutionarily successful multigene modules, may show extensive differences in sequence motifs and abundance even between closely related species. The repetitive DNA in the genome is also important for evolutionary, genetic, taxonomic and applied studies.

An excellent database of repetitive sequences from plants is maintained by Jiri Macas at the Czech Academy of Sciences, Budejovice. Not surprisingly, many of these (probably about 50) are from Triticeae, and many, although reported under their original species (particularly rye), are also present in other species. The database is available at http://w3lamc.umbr.cas.cz/PlantSat/

For Triticeae, there is also the Triticeae Repeat Sequence Database, TREP.

 

A few repetitive sequences are known to have well-defined functions. The telomeric sequences, added at the ends of most plant and animal chromosomes, allow a linear replication unit to be maintained, protect chromosome ends and overcome the 'end replication problem'. The 18S-5.8S-25S and 5S rRNA gene loci, clustered at a small number of sites, encode the structural RNA components of ribosomes. Mobile DNA sequences - such as transposons and retrotransposons - make up a high proportion of most plant and animal genomes. One major class, the retroelements, encodes the proteins necessary for its own reverse transcription and integration, and sometimes represents 50% of the genome. Because retroelements are transcribed into RNA, reverse transcribed into DNA and then integrated into the genome, they have a dispersed distribution along chromosomes. Notably, telomeres, rDNA and retroelement sequences are all ancient - they are found in all animals and plants, and might be considered as early derivatives of the 'RNA-world' from which DNA-based organisms evolved.

Tandemly repeated sequences normally have characteristic chromosomal locations - sub-telomeric, intercalary or centromeric - with blocks of each motif present, in plants, on most or all chromosomes in the genome. Centromeric repeats are frequent, with arrays of 140-360 bp monomers often spanning more than 1 Mbp. It is notable that nucleotide stretches homologous to key parts of the yeast and human centromere boxes CDEIII and CENP-B can be identified in some plant sequences that are located at the centromere, indicating that functional centromere motifs may soon be identified in higher plants. Many tandem repeat units have a complex structure, sometimes including simple sequence repeats, resulting from rounds of rearrangement and amplification during evolution.
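To give a feel for the scale of the centromeric arrays described above, the following minimal Python sketch works out how many monomer copies fit in an array. It assumes 1 Mbp = 1 000 000 bp and an array made up purely of head-to-tail monomers; the function name is hypothetical and only for illustration.

```python
def monomer_copies(array_bp, monomer_bp):
    """Whole monomer copies that fit head-to-tail in an array of array_bp base pairs."""
    if monomer_bp <= 0:
        raise ValueError("monomer length must be positive")
    return array_bp // monomer_bp


ARRAY_BP = 1_000_000  # 1 Mbp, the array size quoted in the text

# The text gives a monomer range of 140-360 bp for centromeric repeats.
copies_short = monomer_copies(ARRAY_BP, 140)  # shortest monomer, most copies
copies_long = monomer_copies(ARRAY_BP, 360)   # longest monomer, fewest copies

print(copies_long, copies_short)
```

On these assumptions, a 1 Mbp array holds somewhere between roughly 2 800 and 7 100 copies of the monomer, consistent with repeats being present thousands of times in the genome.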

Isolation and localization by in situ hybridization of multiple repetitive sequences, each representing a substantial fraction of the genome, provides a novel view of genomic organization, chromosome structure and landmarks for looking at genes, their clustering and orientation. It is a top-down chromosomal approach to complement bottom-up DNA marker and clone-based genome analysis.

Survey of major DNA sequences of plant nuclear genomes. Satellite DNAs have varying monomer lengths, but 140-180 or 300-360 bp are frequent, corresponding to mono- or dinucleosomes. Microsatellites are runs of simple sequence repeats (with motifs 1-5 bp long), while minisatellites have longer and more complex repeating units (up to 40 bp). Telomeric DNA, consisting of conserved 7 bp repeats (CCCTAAA), is added to the chromosome termini by telomerase. Retroelements, amplifying and transposing via RNA intermediates, are divided into mobile sequences with long terminal repeats (LTRs) and non-LTR retroposons (LINEs, long interspersed nuclear elements, and the related SINEs, short interspersed nuclear elements). Plant genomes may also contain solo-LTRs, miniature inverted-repeat transposable elements (MITEs) and virus-like sequences. Transposons move as DNA elements, and non-autonomous copies may be trans-activated by active autonomous elements. Connections between boxes indicate similarities in genome organization. Arrows indicate dynamic changes between sequence classes - divergence and dispersion of tandem repeats, and clustering and homogenization of dispersed sequences.
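The definitions of microsatellites (motifs 1-5 bp long, repeated in tandem) and of the 7 bp telomeric repeat can be illustrated with a short Python sketch that scans a sequence string for perfect tandem runs. This is only an illustration, not a method used in the work described here: the function names, the threshold of four copies and the toy sequence are all assumptions, only the given strand is searched, and imperfect or diverged repeats are not detected.

```python
import re

# 7 bp telomeric unit as written in the text (one strand only).
TELOMERE_UNIT = "CCCTAAA"


def find_microsatellites(seq, min_unit=1, max_unit=5, min_copies=4):
    """Return (start, unit, copies) for perfect tandem runs of a short motif.

    min_copies is an arbitrary illustrative threshold, not a value from the text.
    """
    # The lazy quantifier tries the shortest repeating unit first,
    # so "AAAA" is reported as A x 4 rather than AA x 2.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


def longest_tandem_run(seq, unit):
    """Return (start, copies) of the longest head-to-tail run of unit, or None."""
    pattern = re.compile("(?:%s)+" % re.escape(unit.upper()))
    best = None
    for m in pattern.finditer(seq.upper()):
        copies = len(m.group(0)) // len(unit)
        if best is None or copies > best[1]:
            best = (m.start(), copies)
    return best


# Toy sequence (made up for this example): flank, (AC)x6, flank, telomere unit x3, flank.
toy = "GATT" + "AC" * 6 + "GGT" + TELOMERE_UNIT * 3 + "TGCA"

print(find_microsatellites(toy))
print(longest_tandem_run(toy, TELOMERE_UNIT))
```

In the toy sequence the scan reports the (AC) run as a microsatellite, while the 7 bp telomeric unit is too long to count as one and is found separately by searching for the named unit.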