When the chromosomes of an organism
such as wheat or pine are dissected at the molecular level, stretches of
nucleotide sequence that occur once or only a few times - including the
genes - represent as little as 5% of the DNA. Most plant and animal genomes
consist largely of repetitive DNA - perhaps 30 sequence motifs, typically
one to 10 000 nucleotides long, present many hundreds or thousands of
times in the genome, which may be located at a few defined chromosomal
sites or widely dispersed.
Species over wide taxonomic groupings
have similar genes and arrangements of genes along the chromosomes - they
show conserved synteny. However, knowledge of synteny - provided by high-density,
marker-saturated genetic maps and genomic DNA sequence data - tells us
relatively little about the large-scale physical organization of the chromosomes
and the repetitive DNA elements that make up the bulk of most genomes. We are working on tandem repeats (satellite DNA) in animals - Drosophila, bovids and bivalves (scallops).
Repetitive DNA, with different selective
pressures from those acting on genes and evolutionarily successful multigene
modules, may show extensive differences in sequence motifs and abundance
even between closely related species. The repetitive DNA in the genome
is also important for evolutionary, genetic, taxonomic and applied studies.
An excellent database of
various repetitive sequences from plants is maintained
by Jiri Macas at the Czech Academy of Sciences, Budejovice. Not
surprisingly, many of these (probably about 50) are from Triticeae, and
many, although reported under their original species (particularly
rye), are also present in
other species. The Triticeae Repeat Sequence Database, TREP, is also
available.
A few repetitive sequences are known
to have well defined functions. The telomeric sequences, added at the
ends of most plant and animal chromosomes, allow a linear replication
unit to be maintained, protect chromosome ends and overcome the 'end replication
problem'. The 18S-5.8S-25S and 5S rRNA gene loci, clustered at a small
number of sites, encode the structural RNA components of ribosomes. Mobile
DNA sequences - such as transposons and retrotransposons - make up a high
proportion of most plant and animal genomes. A major class, the retroelements,
encode the proteins necessary for their own reverse transcription and
integration, and sometimes represent 50% of the genome. As a result of
their transcription into RNA, reverse transcription into DNA and integration
into the genome, they have a dispersed distribution along chromosomes.
Notably, telomeres, rDNA and retroelement sequences are all ancient -
they are found in all animals and plants, and might be considered as early
derivatives of the 'RNA-world' from which DNA-based organisms evolved.
Tandemly repeated sequences normally
have characteristic chromosomal locations - sub-telomeric, intercalary
or centromeric - with blocks of each motif present, in plants, on most
or all chromosomes in the genome. Centromeric repeats are frequent, with
arrays of 140-360 bp monomers often spanning more than 1 Mbp. It is notable
that nucleotide stretches homologous to key parts of the yeast and human
centromere boxes CDEIII and CENP-B can be identified in some plant
sequences that locate at the centromere, indicating that functional centromere
motifs may soon be identified in higher plants. Many tandem repeat units
have a complex structure, sometimes including simple sequence repeats,
resulting from rounds of rearrangement and amplification during evolution.
Isolation and localization by in
situ hybridization of multiple repetitive sequences, each representing
a substantial fraction of the genome, provide a novel view of genomic
organization and chromosome structure, and landmarks for looking at genes,
their clustering and orientation. It is a top-down chromosomal approach
to complement bottom-up DNA marker and clone-based genome analysis.
Survey of major DNA sequences of
plant nuclear genomes. Satellite DNAs have varying monomer lengths but
140-180 or 300-360 bp are frequent, corresponding to mono- or dinucleosomes.
Microsatellites are runs of simple sequence repeats (with motifs 1-5 bp
long), while minisatellites have longer and more complex repeating units
(up to 40 bp). Telomeric DNA, consisting of conserved 7 bp repeats (CCCTAAA),
is added to the chromosome termini by telomerase. Retroelements, amplifying
and transposing via RNA intermediates, are divided into mobile sequences
with long terminal repeats (LTRs) and non-LTR retroposons (LINEs, long
interspersed nuclear elements, and the related SINEs, short interspersed
nuclear elements). Plant genomes may also contain solo-LTRs, miniature
inverted-repeat transposable elements (MITEs) and virus-like sequences.
Transposons move as DNA elements, and non-autonomous copies may be trans-activated
by active autonomous elements. Connections between boxes indicate similarities
in genome organization. Arrows indicate dynamic changes between sequence
classes - divergence and dispersion of tandem repeats, and clustering
and homogenization of dispersed sequences.
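
The size classes in the survey above can be illustrated with a small Python sketch. It is only a toy under stated assumptions: it finds perfect, uninterrupted tandem repeats by brute force, whereas real repeat arrays are usually diverged and need dedicated tools. The function names, the copy-number and span thresholds, and the test sequence are all hypothetical choices made for this illustration; only the motif-length boundaries (1-5 bp, up to 40 bp) and the CCCTAAA telomeric unit are taken from the text.

```python
TELOMERE_UNITS = ("CCCTAAA", "TTTAGGG")  # plant-type unit and its reverse complement


def find_tandem_repeats(seq, max_unit=40, min_copies=3, min_span=8):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    Thresholds are illustrative assumptions, not published criteria.
    """
    seq = seq.upper()
    hits = []
    pos = 0
    while pos < len(seq):
        best = None
        for unit_len in range(1, max_unit + 1):
            unit = seq[pos:pos + unit_len]
            if len(unit) < unit_len:
                break  # ran off the end of the sequence
            copies = 1
            while seq.startswith(unit, pos + copies * unit_len):
                copies += 1
            span = copies * unit_len
            if copies >= min_copies and span >= min_span:
                # Keep the longest array; ties go to the shortest unit.
                if best is None or span > best[2] * len(best[1]):
                    best = (pos, unit, copies)
        if best:
            hits.append(best)
            pos += best[2] * len(best[1])  # skip past the array
        else:
            pos += 1
    return hits


def classify_unit(unit):
    """Assign a repeat unit to one of the size classes named in the text."""
    unit = unit.upper()
    for motif in TELOMERE_UNITS:
        # A rotation of the motif is the same repeat read in a different frame.
        if len(unit) == len(motif) and unit in motif + motif:
            return "telomere-like"
    if len(unit) <= 5:
        return "microsatellite"
    if len(unit) <= 40:
        return "minisatellite"
    return "satellite"


# Invented test sequence: a (CA)6 run and four telomere-type units.
example = "GATTG" + "CA" * 6 + "GGATGT" + "CCCTAAA" * 4 + "TTGACG"
for start, unit, copies in find_tandem_repeats(example):
    print(start, unit, copies, classify_unit(unit))
```

Raising max_unit towards 400 would let the same scan pick up satellite-sized monomers (140-180 or 300-360 bp), although perfect copies of that length are unlikely in real genomic sequence.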