Biochemistry 04: proteins and nucleic acids

These are notes from lecture 4 of Harvard Extension’s biochemistry class.

protein separation and purification

Purification of proteins often begins with an initial “crude” purification. This uses salts to “salt out” (precipitate) proteins of similar solubility, leaving undesired ones in the supernatant. (I think precipitation of PrPSc by NaPTA [Safar 1998] is an example of this.) This step is then followed by a more specific purification based on the protein’s properties. There are several such purification procedures, most of which are some form of column chromatography. This involves a glass column with a reservoir into which you pour the homogenate / lysate. The sample drips onto a solid but porous matrix called the “stationary phase”. This basic concept can be adapted to separate molecules based on a few different properties:

size exclusion chromatography, also called “gel filtration” or “gel exclusion” chromatography. The column is packed with porous beads (like golf balls). Small molecules enter the pores and take a longer path through the column, while large molecules (too big to fit into the pores) pass around the beads, move faster and elute first.
ion exchange chromatography. The solid matrix is either positively or negatively charged. For instance, positively charged proteins will stick to a negatively charged matrix. Once you’ve washed away the neutral and negative proteins that you don’t want, you can raise the salt concentration (salt ions compete for the charged sites) or change the pH (altering the proteins’ net charge) to elute the originally positive proteins.
affinity chromatography. The column contains a covalently attached ligand specific to the protein of interest, so that protein binds while the others wash through.
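To make the ion-exchange logic concrete, here is a minimal Python sketch that estimates a peptide’s net charge at a given pH with the Henderson-Hasselbalch equation and predicts which kind of matrix would retain it. The pKa values are rough textbook approximations (an assumption; real values shift with a protein’s local environment), and the function names are hypothetical.

```python
# Approximate pKa values (assumption: rough textbook figures; real values vary)
BASIC_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
ACIDIC_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERM_PKA = 9.0
C_TERM_PKA = 2.0


def net_charge(sequence: str, ph: float) -> float:
    """Estimate a peptide's net charge at a given pH (Henderson-Hasselbalch)."""
    seq = sequence.upper()
    # Basic groups carry +1 when protonated; protonated fraction = 1 / (1 + 10**(pH - pKa))
    positive = 1 / (1 + 10 ** (ph - N_TERM_PKA))
    positive += sum(seq.count(aa) / (1 + 10 ** (ph - pka)) for aa, pka in BASIC_PKA.items())
    # Acidic groups carry -1 when deprotonated; deprotonated fraction = 1 / (1 + 10**(pKa - pH))
    negative = 1 / (1 + 10 ** (C_TERM_PKA - ph))
    negative += sum(seq.count(aa) / (1 + 10 ** (pka - ph)) for aa, pka in ACIDIC_PKA.items())
    return positive - negative


def predicted_binding(sequence: str, ph: float) -> str:
    """Hypothetical helper: which ion-exchange matrix would retain this peptide?"""
    charge = net_charge(sequence, ph)
    if charge > 0:
        return "cation exchanger (negatively charged matrix)"
    if charge < 0:
        return "anion exchanger (positively charged matrix)"
    return "neither (no net charge)"


print(predicted_binding("GKKRAG", 7.0))  # lysine/arginine-rich peptide
print(predicted_binding("GDDEAG", 7.0))  # aspartate/glutamate-rich peptide
```

Calling `net_charge` with a lower `ph` makes every group more protonated, which shows how a pH change shifts net charge, one of the elution levers described above.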

Once these processes are done, how do we test the purity of the protein? We define “specific activity” as the number of enzyme units per milligram of total protein; it reaches its maximum when the protein is completely pure.

Purifying a protein often takes several different steps. At each step you lose volume, total protein and, to a lesser extent, total activity. But because you lose less of your protein of interest (measured by total activity) than of other proteins (measured by total protein), your specific activity rises dramatically.
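The bookkeeping behind a purification table is simple arithmetic. This Python sketch computes, for each step, the specific activity, the yield (percent of the starting activity retained) and the fold purification (specific activity relative to the crude step). The step names and numbers are invented for illustration, not real data.

```python
from dataclasses import dataclass


@dataclass
class PurificationStep:
    name: str
    total_protein_mg: float
    total_activity_units: float

    @property
    def specific_activity(self) -> float:
        """Enzyme units per mg of total protein."""
        return self.total_activity_units / self.total_protein_mg


def summarize(steps: list[PurificationStep]) -> list[tuple[str, float, float, float]]:
    """Return (name, specific activity, % yield, fold purification) for each step."""
    first = steps[0]
    rows = []
    for step in steps:
        yield_pct = 100 * step.total_activity_units / first.total_activity_units
        fold = step.specific_activity / first.specific_activity
        rows.append((step.name, step.specific_activity, yield_pct, fold))
    return rows


# Hypothetical numbers for illustration only
steps = [
    PurificationStep("crude lysate", 10000.0, 100000.0),
    PurificationStep("salt precipitation", 3000.0, 90000.0),
    PurificationStep("ion exchange", 400.0, 80000.0),
    PurificationStep("affinity", 20.0, 60000.0),
]
for name, sa, pct, fold in summarize(steps):
    print(f"{name:20s} SA={sa:8.1f} U/mg  yield={pct:5.1f}%  purification={fold:6.1f}x")
```

In these made-up numbers only 40% of the activity is lost while total protein drops 500-fold, so specific activity rises 300-fold.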

Once a protein is purified, how do we characterize it?

SDS-PAGE. Sodium dodecyl sulfate (SDS) disrupts non-covalent bonds, denaturing the protein and coating it with negative charge, while β-mercaptoethanol reduces disulfide bonds. Now all proteins are negatively charged, with roughly the same charge per unit mass, and will run towards the positive electrode. Thus this separates proteins not by charge but by molecular weight, with smaller proteins moving faster through the gel. You can then transfer the proteins to nitrocellulose, label the one of interest with a specific antibody, and detect the antibody. This is a Western blot.
2-D gel electrophoresis. The first step is a gel run without SDS across a pH gradient, so each protein migrates until it reaches the pH equal to its isoelectric point (pI), where its net charge is zero; the pI depends on its R groups. In the second step, add SDS to give all proteins a negative charge and then run them in the direction perpendicular to the original pH gradient, which separates them by molecular weight.
Polypeptide hydrolysis. Acid, base or enzymes break the peptide bonds in a protein. The free amino acids are then labeled with a fluorescent tag and separated by some sort of chromatography, especially HPLC, so that the fluorescence intensity of each peak tells you the relative abundance of each amino acid. This tells you the composition but not the sequence of the amino acids.
Protein sequencing. If you have two peptide chains linked by a disulfide bond, first reduce that bond to separate the chains, then use chemical or enzymatic methods to break each chain into fragments of different lengths. Typical enzymes used for this include trypsin, chymotrypsin, elastase, thermolysin, pepsin and endopeptidase V8, each of which cleaves next to different amino acids. Several come from animal digestive tissues (trypsin, chymotrypsin and elastase from pancreas, pepsin from stomach), while thermolysin and V8 are bacterial. Sequence each fragment, then use the overlaps between fragments from different digests to reassemble the full protein sequence. But how do you sequence each fragment? With a technique called Edman degradation. Edman’s reagent, phenylisothiocyanate, tags only the amino acid at the N terminus. Anhydrous acid then releases that N-terminal amino acid, which you identify by chromatography, and you repeat the cycle on the shortened chain.
Mass spec proteomics. See review [Steen & Mann 2004 (ft)]. Basically, separate and digest the proteins, separate the peptides by HPLC or ion exchange, then ionize the sample with electrospray ionization or MALDI and send the peptide ions into the mass spectrometer. Match the observed mass/charge ratios against peptide masses predicted from a sequence database such as UniProt.
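The trypsin specificity and the mass lookup can be combined into a toy in-silico digest, the same idea a database search uses to predict which peptide masses a known protein should produce. This Python sketch assumes the classic trypsin rule (cleave after K or R unless the next residue is P) and standard monoisotopic residue masses; the example sequence is made up, and real search engines also handle missed cleavages and modifications.

```python
# Monoisotopic amino acid residue masses in daltons (standard values)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free N and C termini
PROTON = 1.007276   # mass added per positive charge


def tryptic_digest(sequence: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (classic trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without a final K/R
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(peptide: str, charge: int) -> float:
    """Mass/charge ratio of the peptide carrying `charge` extra protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


for pep in tryptic_digest("MAKPGRDEKWLSR"):  # hypothetical sequence
    print(f"{pep:10s} M={peptide_mass(pep):10.4f}  m/z(2+)={mz(pep, 2):9.4f}")
```

Note that the K in “AKP” is not cut because proline follows it, so the digest yields MAKPGR, DEK and WLSR.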

nucleic acids

Why talk about DNA and RNA in a biochemistry class? Because DNA mutations cause protein changes, of course.

A nucleoside is just the pentose sugar and the nitrogenous base. A nucleotide also has one or more phosphate groups, which in a nucleic acid chain form the phosphate backbone. The purines adenine and guanine have a double (fused) ring; the pyrimidines cytosine, thymine and uracil have a single ring. A free nucleotide can carry one, two or three phosphates (mono-, di- or triphosphate).

base (base-H)	nucleoside (base-ribose)	nucleotide (base-ribose phosphate)
Adenine (Ade, A)	Adenosine (Ado)	Adenylic acid / adenosine monophosphate (AMP)
Cytosine (Cyt, C)	Cytidine (Cyd)	Cytidylic acid / cytidine monophosphate (CMP)
Guanine (Gua, G)	Guanosine (Guo)	Guanylic acid / guanosine monophosphate (GMP)
Uracil (Ura, U)	Uridine (Urd)	Uridylic acid / uridine monophosphate (UMP)
Thymine (Thy, T)	Deoxythymidine (dThd / dT)	Deoxythymidylic acid / deoxythymidine monophosphate (dTMP)

By adding additional phosphates you get other familiar names e.g. AMP → ADP → ATP.

Nucleotides are linked by a phosphodiester bond from the 3′ carbon of one sugar to the 5′ carbon of the next. One end of an oligonucleotide has a free 5′ phosphate group; the other has a free 3′ hydroxyl (OH). By convention, nucleotide sequences are always written 5′ to 3′, with “p” representing a phosphate at the 5′ end (e.g. pACGT) or between two nucleotides (e.g. CpG).
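Because the two strands of a double helix are antiparallel, writing the partner strand in the 5′ to 3′ convention means complementing each base and then reversing the order. A minimal Python sketch (DNA letters only; the function name is hypothetical):

```python
# Watson-Crick complements for DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq_5to3: str) -> str:
    """Return the partner strand, also written 5' to 3'."""
    # Complement each base, then reverse so the result reads 5' -> 3'
    return seq_5to3.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("AACG"))  # CGTT
print(reverse_complement("CG"))    # CG: a CpG site reads the same on both strands
```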

Friedrich Miescher isolated and characterized DNA in 1868. He called it “nuclein” and knew that it was acidic and contained phosphorus. Then in 1943 Oswald Avery discovered that DNA purified from one strain of bacteria could transfer that strain’s traits to another strain. In 1950 Erwin Chargaff found that base composition varies between species, but that the amounts of C and G are tightly linked, as are the amounts of A and T. In the next few years, Maurice Wilkins and Rosalind Franklin found that DNA had a characteristic X-ray diffraction pattern implying it is helical, but wanted to be sure before publishing. Wilkins then shared the data with James Watson and Francis Crick, who proposed the 3-dimensional structure we now know; Watson, Crick and Wilkins won the Nobel Prize in 1962.

Watson and Crick had three pieces of information. 1) Bases have different tautomers (enol and keto), and the keto form, which predominates under physiological conditions, is necessary for correct base pairing (see keto-enol tautomerism). 2) DNA is helical, with its aromatic bases stacked on top of one another. 3) The Chargaff rule: [A] = [T] and [C] = [G].

For double-stranded DNA, if you know just [A] you can derive the others. E.g. [A] = 32% implies [T] = 32%, so [C] + [G] = 100% − 64% = 36%, and [C] = [G] = 18%.
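The same derivation as a tiny Python function. It assumes double-stranded DNA, since the rule depends on base pairing; the function name is hypothetical.

```python
def chargaff_composition(percent_a: float) -> dict[str, float]:
    """Base composition (%) of double-stranded DNA from [A] alone, via Chargaff's rule."""
    if not 0 <= percent_a <= 50:
        raise ValueError("[A] must be between 0 and 50% because [A] = [T]")
    percent_t = percent_a                          # [A] = [T]
    percent_gc_each = (100 - 2 * percent_a) / 2    # remainder split evenly, since [C] = [G]
    return {"A": percent_a, "T": percent_t, "C": percent_gc_each, "G": percent_gc_each}


print(chargaff_composition(32))  # {'A': 32, 'T': 32, 'C': 18.0, 'G': 18.0}
```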

The twisting of DNA creates two grooves of unequal width, the major and minor grooves, because the glycosidic bonds of each base pair are not directly opposite each other. A cross section shows the major groove and the minor groove together in the same plane.

Bases are stacked on top of each other with the planes of their rings parallel, attracted to each other through van der Waals forces. Stacking interactions are stronger for GC pairs: different stacked dimers have different stacking energies, with GC dimers (CpG or GpC) being more stable than others.

DNA can adopt different helical forms. B-form is the normal, physiological, right-handed helix. A-form is also right-handed but wider. Z-form is left-handed.

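DNA’s melting temperature (Tm), the temperature at which half of the double helix has come apart, is a good indicator of its stability. Aromatic bases absorb more UV light when unstacked, so absorbance rises along a sigmoidal curve as you heat DNA, and Tm is the midpoint of that curve.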
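Reading Tm off a measured curve amounts to locating the midpoint of that sigmoid. A minimal Python sketch, assuming flat baselines at both ends and simple linear interpolation between readings (real analyses usually fit a curve or take the peak of the first derivative); the absorbance data below are synthetic, and the function name is hypothetical.

```python
import math


def estimate_tm(temps: list[float], absorbance: list[float]) -> float:
    """Temperature where absorbance is halfway between its lowest and highest values.

    Assumes absorbance rises monotonically with temperature and reaches flat
    baselines at both ends (fully stacked and fully melted).
    """
    half = (min(absorbance) + max(absorbance)) / 2
    points = list(zip(temps, absorbance))
    for (t1, a1), (t2, a2) in zip(points, points[1:]):
        if a1 <= half <= a2 and a2 > a1:
            # Linear interpolation between the two readings that bracket the midpoint
            return t1 + (half - a1) * (t2 - t1) / (a2 - a1)
    raise ValueError("absorbance never crosses its midpoint")


# Synthetic sigmoidal melting curve centered at 70 °C (made-up data)
temps = [float(t) for t in range(40, 101)]
a260 = [1.0 + 0.4 / (1 + math.exp(-(t - 70) / 2)) for t in temps]
print(round(estimate_tm(temps, a260), 1))  # 70.0
```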

RNA, being single stranded, has a ton of conformational freedom and can form A-form helices (see above), hairpin loops, internal loops, bulges.

DNA replication is semiconservative, meaning that each daughter molecule contains one strand of the parent molecule. The point where DNA is unwound is the replication fork. Helicase is a ring-shaped enzyme that pulls DNA through the ring, burning ATP for the energy to unwind it; each ATP molecule is enough to pull apart 5 nucleotides. As the DNA opens up, single-strand binding (SSB) proteins coat the single strands to keep them from re-annealing.

DNA polymerase can’t initiate synthesis; it can only extend a pre-existing chain in the 5′ to 3′ direction. The 3′ OH group at the end of the existing chain attacks the innermost (α) of the three phosphates on the 5′ carbon of the incoming dNTP, forming a new phosphodiester bond and releasing pyrophosphate. Because the energy-carrying triphosphate is on the incoming nucleotide rather than on the growing chain, synthesis can’t go 3′ to 5′. An RNA primer provides the initial free 3′ OH group so that polymerase has a chain to extend.
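A toy model of these rules: nothing happens without a primer, and each new nucleotide is added only to the 3′ end of the growing strand, copying the template antiparallel. For simplicity this Python sketch writes the primer with DNA letters even though the real primer is RNA, and the function name is hypothetical.

```python
# Watson-Crick pairing used to pick each incoming dNTP
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_5to3: str, primer_5to3: str) -> str:
    """Copy a template by extending a primer 5' -> 3'; returns the new strand 5' -> 3'."""
    if not primer_5to3:
        raise ValueError("no primer: DNA polymerase needs a free 3'-OH to extend")
    # The new strand is antiparallel, so walk the template starting from its 3' end
    template_3to5 = template_5to3[::-1]
    for primer_base, template_base in zip(primer_5to3, template_3to5):
        if PAIR[template_base] != primer_base:
            raise ValueError("primer does not anneal to the template's 3' end")
    new_strand = list(primer_5to3)
    for template_base in template_3to5[len(primer_5to3):]:
        # Each step joins one nucleotide to the 3'-OH at the end of the chain
        new_strand.append(PAIR[template_base])
    return "".join(new_strand)


print(extend_primer("GACTTAGC", "GCT"))  # GCTAAGTC
```

The finished strand is simply the reverse complement of the template; the point of the loop is that it only ever appends at the 3′ end.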

Past the replication fork, the leading strand is the new strand copied from the parent strand that is read 3′ to 5′ as the fork moves. This allows the new strand to be synthesized continuously 5′ to 3′ into the replication fork, which is the “correct” way. The template for the opposite, lagging strand runs the other way, so it loops out and its new strand is made discontinuously as short Okazaki fragments, each started from its own RNA primer, over and over as the fork opens. These get stitched together later by DNA ligase.

Types of RNA: rRNA, tRNA, siRNA, miRNA, lincRNA, snRNA, snoRNA.

Transcription factors bind to promoter regions, recruit RNA polymerase, position DNA at the RNA polymerase active site, and recruit helicase to unwind the DNA. RNA polymerase makes RNA complementary to the non-coding (template) strand, so the RNA is identical in sequence to the coding strand, except that U replaces T. Helicases unwind the DNA ahead of the polymerase and rewind it behind; in between there is a strip of 10–17 bases that are fully unwound, and RNA polymerase moves in lockstep with the helicases.
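In code, transcription is complementing the template strand read 3′ to 5′ and pairing A with U, which is why the product matches the coding strand with U for T. A minimal Python sketch with a made-up six-base gene; the function name is hypothetical.

```python
# Each template-strand DNA base and the RNA base that pairs with it
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3: str) -> str:
    """mRNA (5' -> 3') made from the template strand, which is read 3' -> 5'."""
    return "".join(RNA_PAIR[base] for base in reversed(template_5to3.upper()))


coding = "ATGGCC"
template = "GGCCAT"  # the coding strand's partner, written 5' -> 3'
mrna = transcribe(template)
print(mrna)                                # AUGGCC
print(mrna == coding.replace("T", "U"))    # True
```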

mRNA is processed in several ways: a 5′ cap protects that end from exonucleases. A ~200-base-long 3′ poly(A) tail protects the other end from exonucleases and is thought to provide a “handle” for transport. Introns are spliced out and exons joined.

Each tRNA molecule gets “charged” with one amino acid. tRNAs are depicted as having three large hairpin loops, with the anticodon on the middle one and the amino acid attached to the 3′ end. Pairing between the third base of the codon and the first (5′) base of the anticodon is weak, so this is the wobble position. U in this position can base pair with A or G; G in this position can base pair with U or C. Inosine (I) is very flexible and can pair with U, C or A. Thanks to this flexibility you don’t need a separate tRNA for each of the 61 sense codons.
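The wobble rules can be turned into a small lookup that lists which codons a given anticodon reads. This Python sketch uses only the wobble pairings named above, plus standard Watson-Crick pairing for A and C at the wobble position (a simplifying assumption); the function name is hypothetical.

```python
# Standard Watson-Crick pairs for the first two codon positions
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Wobble base (5' base of anticodon) -> codon 3' bases it can pair with
WOBBLE = {"U": "AG", "G": "UC", "I": "UCA", "A": "U", "C": "G"}


def codons_read(anticodon_5to3: str) -> set[str]:
    """Codons (5' -> 3') recognized by an anticodon written 5' -> 3'."""
    wobble, middle, last = anticodon_5to3
    # Antiparallel pairing: the anticodon's 3' base pairs with the codon's 1st base
    first, second = WATSON_CRICK[last], WATSON_CRICK[middle]
    return {first + second + third for third in WOBBLE[wobble]}


print(sorted(codons_read("GAA")))  # ['UUC', 'UUU'] (both Phe)
print(sorted(codons_read("IGC")))  # ['GCA', 'GCC', 'GCU'] (all Ala)
```

One inosine-containing tRNA covers three alanine codons here, which is how the cell gets by with fewer than 61 tRNAs.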

Translation involves 3 steps: 1) initiation at AUG; 2) elongation 5′ to 3′ / N terminal to C terminal; 3) termination when the ribosome encounters a stop codon.
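These three steps can be sketched in Python with the standard genetic code: start at the first AUG, read codons 5′ to 3′, and stop at the first stop codon. The 64-letter string encodes the standard codon table in U, C, A, G order, with “*” marking stops; the mRNA in the example is made up.

```python
BASES = "UCAG"
# Standard genetic code; index = 16*first + 4*second + third (bases in UCAG order)
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Initiate at the first AUG, elongate codon by codon, terminate at a stop codon."""
    start = mrna.find("AUG")                    # 1) initiation
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):  # 2) elongation, N terminus first
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":                   # 3) termination
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCUUUUAAGG"))  # MAF
```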

Ribosomes have two key tRNA-binding sites: the P (peptidyl) site and the A (aminoacyl) site. In the P site, a tRNA is bound to the mRNA through its anticodon and carries the growing polypeptide chain at its 3′ end. The adjacent A site holds a new tRNA charged with just one amino acid. The peptide chain is then passed from the P-site tRNA onto the amino acid in the A site, forming a new peptide bond.

Termination proteins, or release factors (RF-1 and RF-2), recognize a stop codon in the A site and make the ribosome transfer the peptidyl group to water, freeing the polypeptide chain.

Post-translational modifications can include GPI anchoring, phosphorylation and glycosylation. One role of glycosylation is to protect the protein from degradation.

aside: RNA editing

Most common: A-to-I deamination. C-to-U is rarer. U-to-C, G-to-A and A-to-G changes have also been observed but are still considered unusual.