Molecular Biology 18: 'Gene regulation IV - transcription initiation'
These are my notes from lecture 18 in Harvard’s BCMP 200: Molecular Biology course, delivered by Timur Yusufzai on October 24, 2014.
Intro to transcription factors
In vitro, a combination of RNA polymerase, a DNA template, NTPs, and Mg2+ is sufficient to cause transcription to start at arbitrary sites. So how is transcription started specifically at promoters in cells? Fractionation experiments showed that specific transcription requires a series of transcription factors named “transcription factor [for RNA polymerase] II” - such as TFIIB, TFIID, TFIIE, and so on, pronounced “TF two B”, etc. [Sawadogo & Roeder 1985]. These are huge complexes (1.5 MDa), even larger than RNA pol II itself (0.5 MDa). These complexes must recognize a promoter sequence, recruit RNA pol II, unwind DNA to load RNA pol II, and then cause RNA pol II to start transcribing.
You can remember the functions of the TF complexes based on the first initial of what they do, as follows:
- TFIID: DNA-binding
- TFIIH(K): Helicase / Kinase
- TFIIF: Facilitates RNA pol II loading
- TFIIE: Entry onto DNA
- TFIIA: Activator, or Anti-repressor
- TFIIB: Bridges TFIID and pol II
Some facts about how they work
TFIIB can bind DNA upstream or downstream of the 90° bend.
TFIIB and TFIIA help stabilize the binding of TBP to DNA.
TFIIH is a helicase of 0.5 MDa [Gibbons 2012] which helps recruit RNA pol II.
TFIIE and TFIIH create a transcription “bubble” of unwound DNA.
A non-hydrolyzable ATP analogue, AMP-PNP, prevents transcription initiation and elongation. Thus, ATP hydrolysis is required for both initiation and elongation. Three separate activities require ATP: helicase, kinase and polymerase activity.
In mammals, RNA pol II contains a C-terminal domain (CTD) with 52 repeats of the sequence YSPTSPS. The serines at positions 2 and 5 of each repeat (S2 and S5) are phosphorylatable. Their phosphorylation is involved in elongation. This was first characterized in [Meinhart & Cramer 2004] and was the subject of BBS 230 week 05 [Kwon 2013].
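To make the repeat arithmetic concrete, here is a minimal sketch that builds an idealized CTD and lists where the S2 and S5 serines fall. It assumes 52 perfect copies of the consensus heptad, which is a simplification (real CTDs contain repeats that deviate from the consensus). The function names `idealized_ctd` and `phospho_serine_positions` are hypothetical, made up for this illustration.

```python
# Consensus heptad of the RNA pol II C-terminal domain (from the notes).
HEPTAD = "YSPTSPS"
N_REPEATS = 52  # mammalian repeat count given in the notes


def idealized_ctd(n_repeats: int = N_REPEATS) -> str:
    """Return an idealized CTD made only of perfect consensus repeats.

    Assumption: every repeat matches the consensus exactly.
    """
    return HEPTAD * n_repeats


def phospho_serine_positions(ctd: str) -> dict:
    """Map 'S2' and 'S5' to the 0-based indices of those serines.

    The sequence is walked one heptad at a time; position 2 of a repeat
    is index 1 within it, and position 5 is index 4.
    """
    positions = {"S2": [], "S5": []}
    step = len(HEPTAD)
    for start in range(0, len(ctd) - step + 1, step):
        repeat = ctd[start:start + step]
        if repeat[1] == "S":
            positions["S2"].append(start + 1)
        if repeat[4] == "S":
            positions["S5"].append(start + 4)
    return positions


ctd = idealized_ctd()
sites = phospho_serine_positions(ctd)
print(len(ctd), len(sites["S2"]), len(sites["S5"]))
```

Under these assumptions the idealized CTD is 52 × 7 = 364 residues long, with one S2 and one S5 per repeat.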
The structure of the pre-initiation complex (PIC) is a subject of controversy between two famous scientists, Eva Nogales [He 2013] and Roger Kornberg [Murakami 2013]. Nogales visualized the process of transcription initiation via cryo-electron microscopy (cryo-EM) of the PIC and presented it as a video [He 2013].
Kornberg used a combination of cryo-EM and cross-linking mass spec and did not make a video. The two groups’ structures differ considerably. We still don’t know which, if either, is correct.
Polymerase at promoters
The presence of RNA pol II at a promoter does not guarantee transcription. ChIP-seq has found RNA pol II, for instance, to be present at promoters of non-expressed genes, and also in non-promoter regions that do not act as transcription start sites [Adelman & Lis 2012]. The nomenclature for this is confusing. RNA pol II is said to be “paused” if there is a peak of bound RNA pol II at a transcription start site, indicating that it has piled up there. Thus:
- a “paused, expressed” gene is one with a pol II peak at the start site but some pol II signal throughout
- a “paused, non-expressed” gene is one with a pol II peak at the start site and no pol II signal elsewhere (and also no evidence for expression from RNA-seq)
- a “non-paused, expressed” gene is one with pol II signal evenly distributed throughout the gene, consistent with active transcription
- a “non-paused, non-expressed” gene is simply one with no pol II signal at all (and also no evidence for expression from RNA-seq)
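The four categories above can be sketched as a small decision rule on two numbers per gene: pol II signal at the start site and pol II signal over the rest of the gene. This is only an illustration. The function name `classify_gene`, the `background` cutoff, and the `pause_ratio` threshold are all hypothetical choices of mine, not values from the lecture, and the sketch ignores the RNA-seq evidence that the non-expressed categories also call for.

```python
def classify_gene(promoter_signal: float, body_signal: float,
                  background: float = 1.0, pause_ratio: float = 2.0) -> str:
    """Assign one of the four pausing/expression labels from the notes.

    promoter_signal: pol II ChIP-seq signal at the transcription start site
    body_signal:     pol II ChIP-seq signal over the rest of the gene
    background:      hypothetical cutoff below which signal counts as absent
    pause_ratio:     hypothetical promoter/body ratio that counts as a peak
    """
    has_promoter = promoter_signal > background
    has_body = body_signal > background

    if not has_promoter and not has_body:
        return "non-paused, non-expressed"
    if has_promoter and not has_body:
        return "paused, non-expressed"
    # From here on there is pol II signal in the gene body.
    if promoter_signal >= pause_ratio * body_signal:
        return "paused, expressed"
    return "non-paused, expressed"


for promoter, body in [(10.0, 0.2), (10.0, 3.0), (4.0, 3.0), (0.1, 0.2)]:
    print(promoter, body, classify_gene(promoter, body))
```

In practice the thresholds would have to be chosen from the data, for example relative to input or to signal in intergenic regions.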
Pausing usually occurs at early elongation, after initiation. Early elongation is a very tightly regulated step in gene expression.
More facts
Within the promoter, the core promoter is the region ±40 bp from the transcription start site. The DNA sequence in the core promoter is important for recruiting TFIID.
TBP is part of TFIID.
TFIID makes multiple contacts with DNA at the core promoter, including but not limited to the TATA box.
People still use DNase I and hydroxyl radical footprinting to study TFIID [Cianfrocco 2013].
<25% of mammalian promoters have a consensus TATA box. Instead, most mammalian genes are transcribed starting from CpG islands with multiple start sites. This somehow relates to SAGA, a complex that Fred Winston studied - see commentary in [Timmers & Tora 2005]. CpGs at promoters must be unmethylated for transcription to occur.
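As an aside on what makes a sequence look like a CpG island, here is a minimal sketch of the usual observed/expected CpG calculation: the number of CG dinucleotides, times the sequence length, divided by the product of the C count and the G count. The thresholds used below (length of at least 200 bp, GC fraction of at least 0.5, observed/expected of at least 0.6) are widely used heuristics and were not given in the lecture, so treat them as assumptions. The function names are hypothetical.

```python
def cpg_stats(seq: str) -> tuple:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CG cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently: c * g / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Apply the heuristic thresholds (assumptions, not from the lecture)."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_oe


print(cpg_stats("CGCGCGCG"))
print(looks_like_cpg_island("CG" * 100))
print(looks_like_cpg_island("AT" * 100))
```

This only scores a single window. Scanning a genome would mean sliding such a window along the sequence, and it says nothing about methylation state, which has to be measured separately.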