Molecular Biology 08: 'DNA replication and repair'
These are my notes from lecture 08 in Harvard’s BCMP 200: Molecular Biology course, delivered by Johannes Walter on September 22, 2014.
The diploid human genome contains 6e9 bp - compare to 1.6e10 characters in English Wikipedia. Double-helical DNA is about 0.34 nm long per bp, so the diploid genome adds up to about 2 meters of DNA per cell [source].
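As a quick check on that arithmetic, here is a minimal sketch that multiplies the base-pair count by a per-base-pair rise. The 0.34 nm figure is the commonly cited rise for B-form DNA and is an assumption on my part, as are the variable and function names.

```python
# Back-of-the-envelope length of the DNA in one diploid human cell.
DIPLOID_BP = 6e9        # base pairs, as quoted in the notes
RISE_NM_PER_BP = 0.34   # assumed: commonly cited rise per bp for B-form DNA


def total_length_m(n_bp: float, rise_nm: float) -> float:
    """Convert a base-pair count into meters of double helix."""
    return n_bp * rise_nm * 1e-9  # nanometers -> meters


length_m = total_length_m(DIPLOID_BP, RISE_NM_PER_BP)
print(f"{length_m:.2f} m of DNA per cell")  # 2.04 m of DNA per cell
```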
A few years after Watson and Crick published their proposed structure for DNA, experimental evidence from E. coli confirmed that replication is indeed “semi-conservative”, meaning that it produces two helices, each of which contains one strand from the original double helix [Meselson & Stahl 1958]. This requires DNA helicases to unwind DNA; their motor mechanism is described in [Enemark & Joshua-Tor 2006]. In bacteria, DnaB helicases work 5’ to 3’, and in eukaryotes the MCM2-7 helicases work 3’ to 5’. The classic helicase assay involves running DNA on a gel to distinguish double-helical from single-stranded DNA [Kaplan 2003]. To demonstrate that a protein acts as a helicase, it is necessary to mutate its ATPase motif (here, that of DnaB) and show that unwinding is lost; absent this proof, the apparent helicase activity could be due to contaminants. Kaplan biotinylated either the 5’ or 3’ end of DNA and bound it to streptavidin. He found that blocking the 5’ end severely inhibited DNA unwinding, while blocking the 3’ end actually increased the efficiency of unwinding. This demonstrated that one ssDNA tail is for loading the helicase, and the other is for “pushing against”.
Another landmark experiment involved purifying DNA from calf thymus and purifying DNA polymerase I from E. coli (1 g of enzyme per 100 kg of bacterial paste), and showing that the enzyme could incorporate radiolabeled dNTPs into new strands of DNA [Lehman 1958]. This swiftly earned senior author Arthur Kornberg the 1959 Nobel Prize in Physiology or Medicine. DNA polymerase synthesizes DNA by extending a primer in the 5’ to 3’ direction (which means walking from 3’ to 5’ on the template strand). The reaction releases 2 phosphates (pyrophosphate) from the incoming dNTP, and the pyrophosphate is then cleaved into two individual inorganic phosphates (by pyrophosphatase, not by the polymerase itself), which helps power the ligation of the remaining phosphate on the dNMP into the DNA polymer. The ligation step’s own ΔG is barely negative, so the coupled pyrophosphate cleavage greatly improves the thermodynamics of DNA replication.
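To make the directionality concrete, here is a toy sketch of primer extension. The function name `extend_primer` and the example sequences are made up for illustration. It ignores everything about real polymerases except the rule that nucleotides are added to the 3’ end of the primer while the template is read 3’ to 5’.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_5to3: str, primer_5to3: str) -> str:
    """Return the nascent strand (written 5'->3') after extending the primer
    all the way to the 5' end of the template."""
    # The polymerase reads the template from its 3' end, so reverse it.
    template_3to5 = template_5to3[::-1]

    # The primer must base-pair with the 3' end of the template.
    n = len(primer_5to3)
    expected = "".join(COMPLEMENT[b] for b in template_3to5[:n])
    if primer_5to3 != expected:
        raise ValueError("primer does not anneal to the 3' end of the template")

    nascent = list(primer_5to3)
    for base in template_3to5[n:]:
        # Each new nucleotide is appended to the 3' end of the growing strand.
        nascent.append(COMPLEMENT[base])
    return "".join(nascent)


print(extend_primer("GATTACA", "TGT"))  # TGTAATC
```

The result is simply the reverse complement of the template, built one nucleotide at a time from the primer’s 3’ end.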
The leading strand is the strand where the 5’ to 3’ synthesis points into the replication fork and proceeds continuously. On the other, lagging strand, the nascent strand is synthesized discontinuously as Okazaki fragments, which are ligated together later.
All DNA polymerases require a primer. In DNA replication in vivo the primer is supplied by a primase, such as DnaG in E. coli, which creates short RNA primers. On the lagging strand this is done repeatedly, once for each Okazaki fragment, and the RNA is removed at the time of ligation.
This video shows the leading strand being synthesized continuously, and Okazaki fragments being created in serial:
DNA polymerase not only has 5’→3’ polymerase activity, it also has 3’→5’ exonuclease activity. If starved of dNTPs it will back up and degrade the strand it just made. It will also back up to remove incorrectly incorporated nucleotides - this is called “proofreading activity”. DNA polymerase I additionally has an activity unique among polymerases: a 5’→3’ exonuclease activity, which enables “nick translation”, i.e. moving nicks in the 3’ direction (think of the geometric meaning of “translation”).
DNA polymerase has a Km of 10 μM for dGTP and dATP, and of about 50 μM for dCTP and dTTP. For optimal activity, therefore, the concentration of dNTPs has to be >50 μM. Under such conditions the rate of incorporation is about 50 dNTPs/second. DNA polymerase has a processivity of ~50 dNTPs, where processivity refers to the number of catalytic events (dNTP incorporations) that it can perform per DNA binding event. In other words, each time it binds a strand of DNA, it can add about 50 nucleotides. Measurements of processivity require watching the activity of one binding event for a single molecule. This can be achieved by (1) using immobilized primer templates and washing away excess polymerase, (2) adding a large amount of non-radiolabeled competitor DNA, or (3) using single molecule imaging.
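The numbers above can be plugged into a small sketch. It assumes simple Michaelis-Menten kinetics for each dNTP independently, which is my simplification rather than something stated in the lecture, and the function names are hypothetical.

```python
# Km values in micromolar, as quoted in the notes.
KM_UM = {"dGTP": 10.0, "dATP": 10.0, "dCTP": 50.0, "dTTP": 50.0}


def fraction_of_vmax(conc_um: float, km_um: float) -> float:
    """Michaelis-Menten saturation: v / Vmax = [S] / (Km + [S])."""
    return conc_um / (km_um + conc_um)


def seconds_per_binding_event(processivity_nt: float, rate_nt_per_s: float) -> float:
    """Average time the polymerase spends synthesizing before it falls off."""
    return processivity_nt / rate_nt_per_s


for conc in (10.0, 50.0, 500.0):
    row = ", ".join(
        f"{name}: {fraction_of_vmax(conc, km):.2f}" for name, km in KM_UM.items()
    )
    print(f"[dNTP] = {conc:5.0f} uM -> {row}")

# ~50 nt per binding event at ~50 nt/s
print(seconds_per_binding_event(50, 50), "s per binding event")
```

Under this assumption, at exactly 50 μM the enzyme runs at only half its maximal rate for dCTP and dTTP, so near-maximal activity needs concentrations well above the highest Km. With a processivity of ~50 and a rate of ~50 dNTPs/second, a binding event lasts about one second.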
Klenow discovered that a certain protease treatment could separate DNA polymerase I into a large fragment with 5’→3’ polymerase activity and 3’→5’ exonuclease activity, and a small fragment with 5’→3’ exonuclease activity. The large (“Klenow”) fragment was used in sequencing for many years, until bacteriophage T7 polymerase came along.
In the active site of DNA polymerase, two metal cations bound to two aspartates catalyze the nucleophilic attack of the 3’OH of the nascent strand on the 5’ phosphate of the next dNTP. DNA polymerase is said to contain a “palm” and “fingers” which form the catalytic site, and a “thumb” which holds the DNA in place. The active site is an extremely confined space, such that only proper Watson-Crick base pairing will allow a dNTP to fit [Goodman 1997]. The active site geometry alone is sufficient to give DNA polymerase an intrinsic error rate of 1e-5, i.e. one error in 100,000 base pairs. Additional fidelity is achieved through the proofreading exonuclease. If a wrong nucleotide is incorporated, it distorts the geometry of the strand such that the polymerase stalls upon attempting to incorporate the next nucleotide, giving the exonuclease a chance to remove the mismatch. This activity brings the error rate to 1e-7. After replication, the DNA mismatch repair pathway recognizes any wrongly incorporated nucleotides left behind by the replication fork and fixes them, bringing the final error rate to 1e-10, so that there is <1 mistake per cell division event in, say, the human genome. Note that the error rates and the relative contributions of the different mechanisms are similar in prokaryotes and eukaryotes.
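The “<1 mistake per cell division” claim follows directly from the per-base-pair rates. The sketch below just multiplies each rate by the 6e9 bp diploid genome size quoted earlier; the dictionary keys and function name are my own labels.

```python
GENOME_BP = 6e9  # diploid human genome, as quoted in the notes

# Cumulative error rate per base pair after each layer of error control.
ERROR_RATE_PER_BP = {
    "active site geometry only": 1e-5,
    "plus proofreading exonuclease": 1e-7,
    "plus mismatch repair": 1e-10,
}


def expected_errors(genome_bp: float, error_rate: float) -> float:
    """Expected number of misincorporations per round of replication."""
    return genome_bp * error_rate


for stage, rate in ERROR_RATE_PER_BP.items():
    n = expected_errors(GENOME_BP, rate)
    print(f"{stage:<32} {n:>10,.1f} expected errors per replication")
```

With these figures, proofreading contributes a 100-fold improvement and mismatch repair a further 1,000-fold, taking the expected count from 60,000 to 600 to 0.6 errors per round of replication.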
Knocked-in mutations in DNA polymerase that increase the error rate lead to death by cancer in mice [Goldsby 2001].
After Kornberg’s experiments, people figured that the DNA polymerase he had isolated was probably responsible for DNA replication in E. coli, but no one was sure yet. De Lucia & Cairns 1969 created thousands of clones of mutagenized E. coli and found one clone which had <1% of the wild-type DNA polymerase activity, yet could still replicate, suggesting that DNA polymerase I was dispensable for life - although that clone did exhibit increased sensitivity to UV light. Arthur Kornberg’s son Tom subsequently isolated DNA polymerase III, which turns out to be the essential one in E. coli for DNA replication [Kornberg & Gefter 1972]. In fact, the mutant that De Lucia and Cairns isolated turned out to have a W382X mutation in DNA pol I which preserves the 5’→3’ exonuclease domain of DNA pol I, which is essential for life. DNA polymerase I and III are essential, therefore, while DNA polymerase II is dispensable.
The DNA polymerase III “holoenzyme” is composed of several subunits, some of which come in >1 copy per complex, and which are divided into “core”, “clamp” and “γ complex” subassemblies. The β subunit is essential for producing long strands - its crystal structure [Kong 1992] eventually revealed that it is a ring which fits around dsDNA and keeps the complex from falling off, and it turns out to increase the processivity of the complex from 25 to 15,000 dNTPs. The γ complex is a AAA+ ATPase “clamp loader” which pops open the β subunit and fits it around the dsDNA. The γ complex must bind ATP in order to open the β subunit, and then must hydrolyze the ATP in order to close it again and release the β subunit. In the movie above, the green ring is the β subunit, and the gray blob under it is the γ complex.
Once Okazaki fragments were revealed to exist, Bruce Alberts felt there must be some sort of temporal coordination between leading and lagging strand synthesis to prevent the leading strand from outrunning the lagging strand. In the “trombone model” he hypothesized that the replication complex for the lagging strand is bound to the replication complex of the leading strand, forming a loop that grows until the lagging-strand polymerase hits the previous Okazaki fragment, at which point the lagging strand is released and a new loop is gathered up. This model is diagrammed here and is also how it is depicted in the movie above. This hypothesis is currently believed to be correct: the γ complex contains two copies of a subunit called Tau, which are thought to each bind to one of the two copies of DNA polymerase III.
Replication is more complex in eukaryotes. Eukaryotes have different polymerases for the leading (DNA pol ε) and lagging (DNA pol δ) strands, and an enzyme called DNA pol α which performs both the RNA primase activity and the synthesis of the first ~30 nt of DNA. DNA pol α cannot bind to the processivity factor PCNA, so it falls off after ~30 nt, at which point the other polymerases take over. DNA pol δ will actually peel away the RNA primer of the preceding fragment and keep going, leaving an RNA “flap” which is later cut by Flap Endonuclease Fen1, leaving a nick that is then ligated by DNA ligase I.
Okazaki fragments are ~3 kb in bacteria and ~150 bp in eukaryotes. The replication fork moves about 30 kb/min in bacteria and 1 kb/min in eukaryotes. E. coli has only one origin of replication, from which two forks set out, one in each direction. Human cells have ~100,000 points where replication can be initiated.
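These figures allow a rough, idealized estimate of replication time and Okazaki fragment counts. The sketch below rests on several assumptions of mine: an E. coli genome of ~4.6e6 bp (a commonly cited size, not from the lecture), all origins firing at the same moment with evenly spaced forks, and the ~100,000 human origins applying to the 6e9 bp diploid genome. Real cells do not fire all origins at once, so the human number is a lower bound rather than a prediction.

```python
def min_replication_time_min(genome_bp: float, n_origins: int,
                             fork_rate_bp_per_min: float) -> float:
    """Idealized lower bound: every origin fires at t=0 and sends out two forks."""
    n_forks = 2 * n_origins
    return genome_bp / (n_forks * fork_rate_bp_per_min)


def okazaki_fragment_count(genome_bp: float, fragment_bp: float) -> float:
    """Each stretch of the genome has one lagging strand, so the total length
    synthesized as Okazaki fragments equals the genome length."""
    return genome_bp / fragment_bp


ECOLI_BP = 4.6e6   # assumed genome size, not from the lecture
HUMAN_BP = 6e9     # diploid genome, as quoted in the notes

print(f"E. coli: {min_replication_time_min(ECOLI_BP, 1, 30_000):.0f} min, "
      f"{okazaki_fragment_count(ECOLI_BP, 3_000):,.0f} Okazaki fragments")
print(f"Human:   {min_replication_time_min(HUMAN_BP, 100_000, 1_000):.0f} min, "
      f"{okazaki_fragment_count(HUMAN_BP, 150):,.0f} Okazaki fragments")
```

The comparison shows why eukaryotes need so many origins: with forks that move 30 times more slowly and a genome over a thousand times larger, a single origin would take far too long.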