Multiple alignment of mammalian PrP amino acid sequences
Here is a multiple alignment of the PrP amino acid sequence from several different mammals:
At the end of this post I'll give quick instructions on how to make one. First, some discussion. The multiple alignment let me see for myself several things I'd been hearing about for a while. Notice that mouse has one fewer glycine than human right here:
That's why every amino acid after that point is numbered one lower in mouse (e.g. A117V in humans is A116V in mice, D178N in humans is D177N in mice, etc.). Similarly, you can see that cows have one extra copy of the octapeptide repeat PHGGGWGQ compared to all the other mammals I'm looking at:
(And apparently opossums have a decapeptide repeat instead!)
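The renumbering between species falls straight out of the alignment: count only non-gap residues in each row, and a single gap shifts every later residue by one. Here is a minimal sketch of that bookkeeping, assuming the two rows are taken from an aligned output with "-" as the gap character. The function name and the seven-residue toy rows are hypothetical, chosen only to mimic one missing glycine; they are not real PrP sequence.

```python
from __future__ import annotations


def residue_number_map(aligned_a: str, aligned_b: str) -> dict[int, int | None]:
    """Map 1-based residue numbers in row A to residue numbers in row B.

    Both inputs are rows of the same alignment, with "-" marking gaps.
    A residue of A that is aligned to a gap in B maps to None.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    mapping: dict[int, int | None] = {}
    pos_a = pos_b = 0
    for char_a, char_b in zip(aligned_a, aligned_b):
        # Advance each residue counter only when that row has a residue here.
        if char_a != "-":
            pos_a += 1
        if char_b != "-":
            pos_b += 1
        if char_a != "-":
            mapping[pos_a] = pos_b if char_b != "-" else None
    return mapping


# Hypothetical toy rows (not real PrP): the second row lacks one glycine.
human_row = "MAGGGAV"
mouse_row = "MA-GGAV"
print(residue_number_map(human_row, mouse_row))
# {1: 1, 2: 2, 3: None, 4: 3, 5: 4, 6: 5, 7: 6}
```

Which glycine in a run gets the gap is an arbitrary choice made by the aligner, but every residue after the run ends up shifted by one either way, which is the human-to-mouse offset described above.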
Codon 178 (right, below) is completely conserved as D in all these mammals, and even codon 129 (which has a non-disease-causing polymorphism in humans) is M in the reference genome for all of these as well:
Overall, you can see just how much red there is throughout the alignment. (T-Coffee colors residues by how consistently they are aligned, so red doesn't necessarily mean a perfect amino acid match: if you look carefully at the above you'll see some functionally similar amino acids, e.g. the hydrophobic I, M, V and L, substituted for one another.) PRNP is a very highly conserved gene among mammals. That's actually quite peculiar: you'd expect a highly conserved gene to be vitally important, so that knocking it out would severely affect the organism or even make it non-viable. That's the case with HTT. But not with PRNP: it's highly conserved, and yet knockouts have no really severe or obvious phenotype [Bueler 1992] (later studies have found some phenotypes, though they are subtle; that's a topic for a future post).
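The distinction between identical and merely similar residues can be made explicit column by column. This is a minimal sketch loosely in the spirit of Clustal's "*" (identical) and ":" (similar) markers, not T-Coffee's consistency coloring. The similarity groups are a simplified, hypothetical list (only I/L/M/V comes from the discussion above), and the toy rows are made up.

```python
from __future__ import annotations

# Simplified, hypothetical similarity groups; real tools use substitution
# matrices or their own scoring schemes rather than this exact list.
SIMILARITY_GROUPS = [set("ILMV"), set("FWY"), set("KRH"), set("DE"), set("ST"), set("NQ")]


def column_symbol(column: str) -> str:
    """Return '*' if all residues are identical, ':' if they share one group, else ' '."""
    residues = set(column)
    if "-" in residues:
        return " "  # treat any gap in the column as not conserved
    if len(residues) == 1:
        return "*"
    if any(residues <= group for group in SIMILARITY_GROUPS):
        return ":"
    return " "


def conservation_line(rows: list[str]) -> str:
    """Summarize every column of an alignment given as equal-length rows."""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("aligned rows must have the same length")
    return "".join(column_symbol("".join(row[i] for row in rows)) for i in range(width))


toy_rows = ["MKVLG", "MKILA", "MRMLG"]  # hypothetical rows, not real PrP
print(conservation_line(toy_rows))  # prints "*::* "
```

The third column (V, I, M) gets ":" rather than "*": similar but not identical, which is the kind of column that can still look well conserved in a colored alignment.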
Notice I said PRNP was highly conserved “among mammals.” Here’s a multiple alignment of human, mouse, chicken and zebrafish. Zebrafish has three paralogs of PRNP so all are included here.
Notice how much poorer the alignment gets once these non-mammals are added: there's barely any red in that graphic. There is, though, one stretch (from about codon 114 to 150 in human numbering) that is pretty highly conserved. The stretch around human codon 178 also shows fairly good similarity to the chicken and zebrafish sequences, though few of the amino acids are actually identical.
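Spotting a conserved stretch by eye can also be approximated numerically with a sliding window over fully identical columns. The sketch below is one simple way to do that; the window size and identity threshold are arbitrary, hypothetical parameters, and the toy rows are made up. The returned ranges are alignment columns, so they still need converting to human codon numbers by counting the non-gap residues in the human row.

```python
from __future__ import annotations


def identical_columns(rows: list[str]) -> list[bool]:
    """True for alignment columns where every row has the same, non-gap residue."""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("aligned rows must have the same length")
    return [len({row[i] for row in rows}) == 1 and rows[0][i] != "-" for i in range(width)]


def conserved_stretches(rows: list[str], window: int = 5,
                        min_fraction: float = 0.8) -> list[tuple[int, int]]:
    """Merge sliding windows whose fraction of identical columns is >= min_fraction.

    Returns 0-based, half-open (start, end) alignment-column ranges.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    flags = identical_columns(rows)
    stretches: list[tuple[int, int]] = []
    for start in range(len(flags) - window + 1):
        if sum(flags[start:start + window]) / window >= min_fraction:
            end = start + window
            if stretches and start <= stretches[-1][1]:
                # Overlapping or adjacent window: extend the previous stretch.
                stretches[-1] = (stretches[-1][0], end)
            else:
                stretches.append((start, end))
    return stretches


toy_rows = ["MKVLGAWT", "MKVLGSRE"]  # hypothetical rows, not real PrP
print(conserved_stretches(toy_rows, window=3, min_fraction=1.0))  # [(0, 5)]
```

A strict identity threshold will miss regions like the one around codon 178, which are similar rather than identical; swapping in a similarity test per column would catch those.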
So PRNP is highly conserved among mammals and not so conserved among vertebrates more broadly. Yet although the sequence is poorly conserved, if you look at zebrafish PrP in the Genome Browser you'll see that it has the same exon structure as its human homologue: two exons, with all of the coding sequence in the second exon.
Appendix: How to create a multiple alignment like this
1. Go to the UCSC Genome Browser, select Human and search for PRNP. Scroll down to the list of RefSeq genes and click on the first (or any) hit. This will get you to a view kind of like this.
2. In the leftmost column of the browser, click on the “PRNP” icon:
3. Scroll down and click on the “Predicted Protein” link. This gives you the amino acid sequence. It is the protein as translated directly from the mRNA, so for PrP it includes the N- and C-terminal signal sequences that are later cleaved off during post-translational processing.
4. Repeat from step 1 for each other organism you want to include, and paste all the results into a text file.*
5. Paste the contents of your text file into your favorite multiple alignment tool. I am partial to T-Coffee [Di Tommaso 2011]; some people prefer Clustal Omega.
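If you collect the sequences in a script rather than by hand, the "text file" in the steps above is most conveniently a multi-FASTA file, which both T-Coffee and Clustal Omega accept. Here is a minimal sketch of building one; the function name, the cleanup rules (stripping whitespace and a trailing "*" stop symbol, in case one gets copied) and the toy records are assumptions for illustration.

```python
from __future__ import annotations


def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Build multi-FASTA text from {name: protein sequence}, wrapping lines at `width`."""
    lines: list[str] = []
    for name, seq in records.items():
        # Drop spaces/newlines picked up when copying from a web page,
        # and a trailing '*' stop symbol if one was copied.
        seq = "".join(seq.split()).upper().rstrip("*")
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy input; in practice the values would be the pasted "Predicted Protein" sequences.
print(to_fasta({"human_toy": "mkv lg\n", "mouse_toy": "MKIL*"}, width=3), end="")
```

Writing the result with `Path("prp.fasta").write_text(to_fasta(records))` (after `from pathlib import Path`) gives a file you can paste into a web aligner or pass to a command-line one, e.g. `clustalo -i prp.fasta -o prp_aln.fasta`.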
*Am I making step 4 more difficult than it needs to be? If you click on the mRNAs for other organisms in the human Genome Browser you can get the nucleotide sequence, but I don't see a link for the protein. For the small number of species I was doing, I figured it would be quicker to just click through for each of them than to copy and paste each nucleotide sequence into another tool to translate it to amino acids. Alternatively, you can take the first amino acid sequence you get from UCSC, BLAST it, and copy the top hits. One advantage of that approach is that you'll pick up orthologs that aren't named “prnp”. Yet another approach is to just use the multiple alignment that UCSC produces for you: from step 2, click on “CDS FASTA alignment” and you'll get to something like this. This UCSC tool is wonderfully easy to use: you just check boxes for the species you want included. However, it only shows deletions relative to the human sequence (or whichever sequence you started with), not insertions: “Because the CDS FASTA alignments are based on one reference genome, any amino acids or nucleotides that are not in the reference genome are not displayed.” (UCSC User Guide). So if you want a true multiple alignment, with insertions and deletions of every sequence relative to every other, you have to gather all the amino acid sequences separately and align them with a separate tool.
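To see why a reference-based view loses insertions, take a full multiple alignment and keep only the columns where the reference row has a residue. This minimal sketch mimics that effect; it is not a claim about how UCSC actually builds its CDS FASTA alignments. The function name is hypothetical, and the toy rows are made up except for the PHGGGWGQ repeat unit, which appears in this post.

```python
from __future__ import annotations


def project_onto_reference(rows: dict[str, str], reference: str) -> dict[str, str]:
    """Keep only the alignment columns where the reference row has a residue.

    Columns where the reference has a gap (insertions in other sequences)
    are dropped, which is exactly what a reference-based view cannot show.
    """
    ref_row = rows[reference]
    if any(len(row) != len(ref_row) for row in rows.values()):
        raise ValueError("aligned rows must have the same length")
    keep = [i for i, char in enumerate(ref_row) if char != "-"]
    return {name: "".join(row[i] for i in keep) for name, row in rows.items()}


# Toy rows, not real PrP: "cow" carries one extra PHGGGWGQ repeat.
toy = {
    "human": "PHGGGWGQ--------PQ",
    "cow":   "PHGGGWGQPHGGGWGQPQ",
}
print(project_onto_reference(toy, "human"))
# {'human': 'PHGGGWGQPQ', 'cow': 'PHGGGWGQPQ'}
```

After projection the cow's extra repeat has vanished and the two rows look identical, which is why the bovine octapeptide insertion would not show up in a human-referenced view.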