DNA structure
DNA is usually a double-helix and has two strands running in opposite
directions. (There are some examples of viral DNA which are single-stranded).
Each chain is a polymer of subunits called nucleotides (hence the name
polynucleotide).
Each strand has a backbone made up of (deoxy-ribose) sugar molecules
linked together by phosphate groups. The 3' C of a sugar molecule is connected
through a phosphate group to the 5' C of the next sugar. This linkage is
also called a 3'-5' phosphodiester linkage. By convention, DNA strands are
read from the 5' to the 3' end, where the 5' end terminates in a phosphate
group and the 3' end terminates in the hydroxyl (-OH) group of a sugar molecule.
Each sugar molecule is covalently linked to one of 4 possible bases
(Adenine, Guanine, Cytosine and Thymine). A and G are double-ringed larger
molecules (called purines); C and T are single-ringed smaller molecules
(called pyrimidines).
In the double-stranded DNA, the two strands run in opposite directions
and the bases pair up such that A always pairs with T and G always pairs
with C. The A-T base-pair has 2 hydrogen bonds and the G-C base-pair has
3 hydrogen bonds. The G-C interaction is therefore stronger (by about 30%)
than A-T, and A-T rich regions of DNA are more prone to thermal fluctuations.
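The pairing rules above (A with T, G with C, antiparallel strands, 2 or 3 hydrogen bonds per pair) can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequence are hypothetical and do not come from the text.

```python
# Watson-Crick partners and hydrogen-bond counts as stated in the text.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complementary_strand(seq_5_to_3: str) -> str:
    """Return the partner strand, also written 5' to 3'.

    The two strands run in opposite directions, so the complement is reversed.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq_5_to_3))


def hydrogen_bond_count(seq_5_to_3: str) -> int:
    """Total number of inter-strand hydrogen bonds in the duplex."""
    return sum(H_BONDS[base] for base in seq_5_to_3)


example = "ATGCGC"  # hypothetical sequence, for illustration only
print(complementary_strand(example))  # GCGCAT
print(hydrogen_bond_count(example))   # 16
```

Counting bonds this way only captures the hydrogen-bonding contribution; it says nothing about the stacking interactions between neighboring base-pairs.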
The bases are oriented perpendicular to the helix axis. They are hydrophobic
in the direction perpendicular to the plane of the bases (cannot form hydrogen
bonds with water). The interaction energy between two bases in a double-helical
structure is therefore a combination of hydrogen-bonding between complementary
bases, and hydrophobic interactions between the neighboring stacks of base-pairs.
Even in the single-stranded state, the bases prefer to be stacked (like
the steps of a spiral staircase if the bases are identical) and a single-stranded
chain can also have regions of helical conformation.
The backbone of polynucleotides is highly charged (1 unit negative
charge for each phosphate group; 2 negative charges per base-pair). If
there is no salt in the surrounding medium, there is a strong repulsion
between the two strands and they will fall apart. Therefore counter-ions
are essential for the double-helical structure. Counter-ions shield the
charges on the sugar-phosphate backbone. Fluctuations of the counter-ions
around the backbone may also contribute an attractive interaction, similar
to the van der Waals interactions between fluctuating induced dipoles.
The most common DNA structure in solution is the B-DNA. Under conditions
of applied force or twists in the DNA, or under low hydration conditions,
it can adopt several helical conformations, referred to as the A-DNA, Z-DNA,
S-DNA...
Shown in picture above are three crystallized states of DNA, the A-DNA
(left), B-DNA (middle) and Z-DNA (right). The A-form crystallizes under
low hydration conditions and is not normally found for DNA in the cell.
It is, however, the structure adopted by double-stranded regions in RNA
as well as the transient double-helix between DNA and RNA during transcription.
Both A- and B-DNA are right-handed helices whereas Z-DNA is a left-handed
helix and is commonly found in regions of DNA that have an alternating
purine-pyrimidine sequence (e.g. 5'-CGCGCGCG-3' or 5'-CGCGCATGC-3'). The
table below summarizes some of the major differences.
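Whether a sequence has the alternating purine-pyrimidine pattern mentioned above can be checked mechanically. The sketch below is a minimal illustration; the function name is hypothetical, and passing the check says nothing about whether a given stretch of DNA actually adopts the Z-form.

```python
PURINES = set("AG")       # double-ringed bases
PYRIMIDINES = set("CT")   # single-ringed bases


def alternates_purine_pyrimidine(seq: str) -> bool:
    """True if every pair of neighboring bases switches between purine and pyrimidine."""
    for base in seq:
        if base not in PURINES and base not in PYRIMIDINES:
            raise ValueError(f"unknown base: {base!r}")
    is_purine = [base in PURINES for base in seq]
    return all(a != b for a, b in zip(is_purine, is_purine[1:]))


# The two example sequences quoted in the text both alternate.
print(alternates_purine_pyrimidine("CGCGCGCG"))   # True
print(alternates_purine_pyrimidine("CGCGCATGC"))  # True
print(alternates_purine_pyrimidine("CGAT"))       # False (G and A are both purines)
```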
                            A-DNA             B-DNA           Z-DNA
Helix sense                 Right-handed      Right-handed    Left-handed
Overall shape               Short and broad   Long and thin   Longer and thinner
Helix diameter              25.5 Å            23.7 Å          18.4 Å
Rise / base-pair            2.3 Å             3.4 Å           3.8 Å
Base-pairs / helical turn   ~ 11              ~ 10            ~ 12
Helix pitch                 25 Å              34 Å            47 Å
Tilt of the bases           20 deg            -1 deg          -9 deg
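The helix pitch is the rise per base-pair multiplied by the number of base-pairs per turn, so the table entries can be cross-checked against each other. The sketch below copies the tabulated values; because the base-pairs per turn are only approximate ("~"), the agreement is expected to be approximate too. The names used are hypothetical.

```python
# (rise per base-pair in Å, base-pairs per helical turn, quoted pitch in Å)
HELIX_PARAMETERS = {
    "A-DNA": (2.3, 11, 25.0),
    "B-DNA": (3.4, 10, 34.0),
    "Z-DNA": (3.8, 12, 47.0),
}


def estimated_pitch(rise: float, bp_per_turn: float) -> float:
    """Pitch (Å) = rise per base-pair (Å) x base-pairs per turn."""
    return rise * bp_per_turn


for form, (rise, bp_per_turn, quoted) in HELIX_PARAMETERS.items():
    pitch = estimated_pitch(rise, bp_per_turn)
    print(f"{form}: rise x bp/turn = {pitch:.1f} Å (table: {quoted:.0f} Å)")
```

For the tabulated values this gives about 25.3 Å, 34.0 Å and 45.6 Å, close to the quoted pitches.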
The ball-and-stick representation shown above can be misleading because
it suggests that there is empty space between the two strands and between
the base-pair stacks. Another representation is the space-filling
representation, in which each atom is shown as a ball whose radius is its
van der Waals radius. The picture below shows this view for the
3 DNA structures shown above.
Here, the B-DNA is on the left and the A-DNA is in the middle. The blue
and white atoms are the sugar-phosphate backbone atoms, the red are G-C
base-pairs and the yellow are A-T base-pairs. The B-DNA picture shows very
clearly the 'grooves' in between the backbones that also spiral around
the DNA structure; the grooves in B-DNA come in two sizes, the minor groove
and the major groove.
A DNA molecule is not a rigid, static structure as X-ray diffraction
pictures might suggest, and the crystallographic parameters shown above
are average parameters. In reality, each of these structures is subject to
constant thermal fluctuations, which result in local twisting, stretching,
bending, and unwinding of the double-strands. Also, certain sequences lead
to permanent bends or kinks in the direction of the helix. These local
(sequence-specific) fluctuations are essential for the recognition of specific
binding sites along the DNA molecule where proteins involved in replication,
transcription, regulation of gene expression, or DNA-damage repair can
bind.
RNA structures
RNA molecules are also polynucleotides with a sugar-phosphate backbone
and four kinds of bases. The main differences between RNA and DNA are:
- RNA molecules are single-stranded
- The sugar in RNA is a ribose sugar (as opposed to deoxy-ribose) and has
  an -OH at the 2' C position highlighted in red in the figure below (DNA
  sugars have -H at that position)
- Thymine in DNA is replaced by Uracil in RNA. T has a methyl (-CH3)
  group instead of the H atom shown in red in U.
The picture shows an ATP molecule (adenosine tri-phosphate) about to
be incorporated into an RNA chain with the release of a di-phosphate.
RNA molecules do not have a regular helical structure like DNA. Instead,
they can form complicated 3-dimensional structures where the strands can
loop back and form intra-strand base-pairs from self-complementary
regions along the chain.
(Figure panels: DNA structure; RNA structure)
There are three classes of RNA molecules:
- messenger RNA (mRNA), which acts as a template for protein synthesis and
  has the same sequence of bases (read from the 5' to the 3' end, with U in
  place of T) as the DNA strand that has the gene sequence. mRNA can range
  from ~300 nucleotides to ~7000 nucleotides, depending on the size and the
  number of proteins that it codes for.
- transfer RNA (tRNA), one for each triplet codon that codes for a specific
  amino-acid (amino-acids are the building blocks of proteins). tRNA
  molecules are covalently attached to the corresponding amino-acid at one
  end, and at the other end they have a triplet sequence (called the
  anti-codon) that is complementary to the triplet codon on the mRNA. All
  tRNA molecules are in the range ~70-90 nucleotides. They have a molecular
  weight of ~25,000 and a sedimentation coefficient of ~ 4 Svedberg (S) units.
- ribosomal RNA (rRNA), which makes up an integral part of the ribosome, the
  protein synthesis machinery in the cell.
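The relation between an mRNA and the DNA strand carrying the gene sequence (same bases read 5' to 3', with Uracil in place of Thymine) amounts to a simple substitution. The sketch below is only an illustration; the function name and the example sequence are hypothetical.

```python
def mrna_from_coding_strand(coding_strand_5_to_3: str) -> str:
    """Return the mRNA sequence (5' to 3') for a DNA strand carrying the gene sequence.

    The bases are the same as on the DNA strand, except that T becomes U.
    """
    return coding_strand_5_to_3.upper().replace("T", "U")


print(mrna_from_coding_strand("ATGGCC"))  # AUGGCC
```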
Secondary and tertiary structures of tRNA molecules
The crystal structures of several tRNA molecules have been determined.
All tRNA molecules have very similar secondary structures in which
the single-stranded chain is folded in a 'clover-leaf' structure that has
three hairpins and an acceptor stem where the amino-acid is covalently
attached. The acceptor stem is the 3' end of the chain and always terminates
in the sequence 5'-CCA-3'.
This particular tRNA is specific for the amino-acid Alanine, whose codon
on the mRNA is 5'-GCC-3'; the anti-codon loop of the tRNA reads 5'-GGC-3'.
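The anti-codon is the reverse complement of the codon, because the two pair in an antiparallel fashion. The sketch below reproduces the Alanine example; it assumes strict Watson-Crick pairing (A-U, G-C) and ignores the chemically modified bases found in real tRNA. The function name is hypothetical.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon_5_to_3: str) -> str:
    """Return the anti-codon, written 5' to 3', that pairs with the given codon."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon_5_to_3))


print(anticodon_for("GCC"))  # GGC
```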
The grey circles are examples of unusual, chemically modified, bases.
The secondary structure then folds up to form a 3-dimensional structure
which looks like an inverted L.
The end of one arm of the L (the 3' end of the chain) is the acceptor stem.
The end of the other arm is the anti-codon loop that has to match the codon
on the mRNA. The distance between the two ends of the L is ~ 7 nm.
The corner of the L is used for correct positioning on the ribosome where
the protein synthesis takes place.
In the tertiary (3-dimensional) structures of RNA, bases sometimes
make hydrogen bonds with more than one partner, as illustrated in the picture
above. These extra hydrogen bonds compensate for the distortion in the
double-stranded helical regions when the RNA folds up and help stabilize
the tertiary structure.
The covalent attachment between the tRNA and its corresponding
amino-acid is achieved by yet another adaptor molecule (this time a protein
molecule called aminoacyl-tRNA synthetase) of which there are at least
20 varieties, one for each kind of amino-acid. The synthetases recognize
the detailed shape and properties of a specific amino-acid and the detailed
shape of the acceptor stem in the folded tRNA molecule and catalyze the
covalent attachment between the amino-acid and its corresponding tRNA.
Ribosomal RNA
The ribosome is a large machine (~ 20 nm in diameter, 70S sedimentation
coefficient for bacterial ribosomes) made of two subunits: a large subunit
(~ 50S) and a small subunit (~ 30S). The large subunit is in turn made of
two ribosomal RNAs (5S and 23S) and ~ 34 proteins, whereas the small
subunit has one ribosomal RNA (16S) and ~ 21 proteins. The 23S rRNA is
~ 3000 nucleotides long, and the 16S rRNA is ~ 1500 nucleotides long.
The structures of ribosomal RNA can get very complicated because of
the large number of ways in which hairpins and loops can be formed. Predicting
these structures requires a combination of both computational methods (in
which the most probable secondary structures are determined from estimates
of free energy for a given structure) and a variety of experimental techniques.
Oligonucleotide mapping techniques
This technique is useful in identifying exposed single-stranded regions
of a folded RNA molecule by hybridization with short synthesized nucleotide
chains (also called oligonucleotides) that are complementary to, for instance,
the loop regions in RNA.
Folded RNA molecules are confined to one region, separated from
another region by a semi-permeable membrane. On the other side of the partition
are radioactive oligonucleotides (~ 5-10 nucleotides long) that can pass
through the membrane and bind to RNA molecules; the RNA molecules,
which are much bigger, cannot pass through.
At equilibrium, free oligomers are at the same concentration on both
sides of the partition. However, the radioactivity on the side with the
RNA molecules is larger than on the other side because some oligomers will
associate with (bind to) RNA if the sequences of oligomers and loop regions
are complementary. The ratio ($r_d$) of the radioactivity on the two sides
gives a measure of the binding or association constant

$$K_a = \frac{[X]}{[\mathrm{RNA}]\,[O]}$$

where [X] is the concentration of the RNA-oligomer complex, [O] is the free
oligomer concentration on either side, and [RNA] is the concentration of RNA
molecules that are not bound to an oligomer.
The ratio of the radioactivity on the RNA side (free plus bound oligomers)
to that on the other side (free oligomers only) is

$$r_d = \frac{[O] + [X]}{[O]} = 1 + K_a\,[\mathrm{RNA}]$$

If [RNA] >> [O], then the RNA concentration can be assumed the same
before and after mixing and the ratio becomes

$$r_d \approx 1 + K_a\,[\mathrm{RNA}]_0$$

where $[\mathrm{RNA}]_0$ is the RNA concentration before mixing.
Therefore, a measurement of $r_d$ yields a direct measure of $K_a$.
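The step from a measured radioactivity ratio to an association constant can be sketched numerically. The code assumes that the radioactivity on the RNA side is proportional to [O] + [X] and on the other side to [O], so that r_d = 1 + K_a [RNA], with [RNA] taken as the known RNA concentration (valid when [RNA] >> [O]). The function names and the numbers are hypothetical.

```python
def radioactivity_ratio(k_a: float, rna_concentration: float) -> float:
    """r_d = 1 + K_a [RNA]; K_a in 1/M, concentration in M."""
    return 1.0 + k_a * rna_concentration


def association_constant(r_d: float, rna_concentration: float) -> float:
    """Invert r_d = 1 + K_a [RNA] to obtain K_a (in 1/M)."""
    if rna_concentration <= 0:
        raise ValueError("RNA concentration must be positive")
    if r_d < 1:
        raise ValueError("r_d below 1 is not compatible with this model")
    return (r_d - 1.0) / rna_concentration


# Hypothetical measurement, for illustration only.
k_a = association_constant(r_d=3.0, rna_concentration=1e-5)
print(f"K_a ~ {k_a:.2e} per molar")
```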
All oligonucleotides will lead to some association since there is always
a match at a single base-pair level. Therefore $r_d > 1$ for
any oligonucleotide. For oligonucleotides ~ 4 bases long that match an
exposed loop region on the RNA the free energy change upon association
is substantially larger (by ~ 10-15 $k_B T$) than the free
energy change from single base-pair matches. This leads to an increase in
the association constant by a factor of $10^4$ to $10^6$.
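The quoted factor follows from the Boltzmann weight: an extra binding free energy of n k_B T multiplies the association constant by exp(n). A quick check, with a hypothetical function name:

```python
import math


def association_enhancement(extra_free_energy_kT: float) -> float:
    """Factor by which K_a grows for an extra binding free energy given in units of k_B T."""
    return math.exp(extra_free_energy_kT)


for n in (10, 15):
    print(f"{n} kBT -> factor {association_enhancement(n):.1e}")
# 10 kBT -> factor 2.2e+04
# 15 kBT -> factor 3.3e+06
```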
This technique can easily distinguish between two possible conformations
of an RNA molecule which have different sequences in their loop regions.
We can also estimate which structure is more probable (i.e. which one
has the lower free energy).
The free energy of a hairpin can be broken into two parts, the free
energy of forming a loop closed by a single base-pair, $\Delta G_{loop}$,
and the free energy for the base-paired 'stem' of the hairpin.
In RNA molecules the most probable loop size consists of ~ 6-7 bases
in the loop. Smaller loops are energetically unfavorable as a result of
steric hindrances among the bases and atoms of the backbone. Larger loops
are entropically unfavorable. The loss of entropy when loops are formed
increases with increasing loop size.

The loop free energy $\Delta G_{loop}$ for the optimal sized loop
closed by a G-C base-pair is ~ 7-8 $k_B T$
in 1M NaCl. In our example we have a loop with 10 bases in structure 1
($\Delta G_{loop}$ = ...) and 2 loops
with 4 bases each in structure 2 ($\Delta G_{loop}$ = ... for
each loop).
Note that $\Delta G_{loop}$ is
a positive quantity; it is unfavorable to make loops relative to the random
coil conformation.
The hairpin structures are stabilized when the free energy gain from
base-pair formation exceeds the free energy cost of loop formation.
The gain $\Delta G_{bp}$ from
adding a base-pair to an already existing G-C pair is ~ ... for
adding a G-C base-pair and ~ ... for
adding an A-U base-pair.
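The balance between the cost of loop formation and the gain from base-pairing can be sketched as a simple sum. The numbers below are placeholders chosen for illustration; they are not the values used in the worked example of the text. The sign convention (costs and gains both entered as positive magnitudes) and the function name are also assumptions.

```python
def hairpin_net_free_energy(loop_costs_kT, base_pair_gains_kT):
    """Net free energy of folding, in units of k_B T, relative to the random coil.

    loop_costs_kT: positive cost of forming each loop.
    base_pair_gains_kT: positive magnitude of the gain from each added base-pair.
    A negative result means the hairpin is more stable than the random coil.
    """
    return sum(loop_costs_kT) - sum(base_pair_gains_kT)


# Placeholder numbers for illustration only; NOT the values from the text.
loop_cost = 7.5               # inside the ~7-8 k_B T range quoted for an optimal loop
gains = [3.0, 3.0, 2.0, 2.0]  # hypothetical per-base-pair gains
print(hairpin_net_free_energy([loop_cost], gains))  # -2.5
```

A structure with two loops would simply pass two entries in `loop_costs_kT`.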
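Therefore the net change in free energy for structure 1 is ...
and for structure 2 is ...
Structure 1 is more stable (although marginally) and the relative populations
of the two structures are given by the Boltzmann distribution

$$\frac{P_1}{P_2} = \exp\left(-\frac{\Delta G_1 - \Delta G_2}{k_B T}\right)$$

where $\Delta G_1$ and $\Delta G_2$ are the net free energy changes for
structures 1 and 2.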
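The relative populations of two competing structures can be computed from their net free energies with the Boltzmann weight exp(-ΔG / k_B T). The sketch below works in units of k_B T; the free energies in the example are hypothetical, since the values for structures 1 and 2 are not reproduced here.

```python
import math


def population_ratio(dg1_kT: float, dg2_kT: float) -> float:
    """P1 / P2 = exp(-(dG1 - dG2)), with both free energies in units of k_B T."""
    return math.exp(-(dg1_kT - dg2_kT))


def populations(free_energies_kT):
    """Normalized Boltzmann populations for a list of free energies (units of k_B T)."""
    lowest = min(free_energies_kT)
    # Subtracting the lowest energy leaves the ratios unchanged and avoids overflow.
    weights = [math.exp(-(g - lowest)) for g in free_energies_kT]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical free energies: structure 1 is lower by 0.5 k_B T.
print(f"P1/P2 = {population_ratio(-3.0, -2.5):.2f}")  # P1/P2 = 1.65
print(populations([-3.0, -2.5]))
```

A difference of only a fraction of k_B T gives comparable populations, which is what "marginally more stable" means in practice.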