
Background:
Proteins bind ligands/substrates through molecular interactions provided by specific amino
acids in the binding pocket. These interactions (e.g. hydrogen bonds, hydrophobic interactions,
and electrostatic interactions) are key to binding and, in turn, to modulating the protein’s
function. As a result, one common drug design strategy is to design molecules that
make favorable interactions with these essential amino acids in order to outcompete the native
substrate and inhibit the protein’s function. At the same time, regions that vary between species can be
exploited to design organism-specific drugs and avoid off-target
effects (e.g. maybe you want the drug inert in humans but highly functional in dogs). In this
case you want to take advantage of the sequence differences between species. In this lab, you
are going to identify and visualize these important amino acids through protein sequence
analysis using bioinformatics tools.
Instructions:
1. We are going to use a sequence visualization program (Jalview). This is where we can identify
the residues we think are interacting with the molecule in the active site.
2. On the page for your protein on the PDB website, click on FASTA sequence under the “Display
Files” menu.
3. This will display the FASTA sequence in your browser (which is the amino acid sequence for your
enzyme).
4. Copy this sequence then go to the BLAST web server:
5. Select Protein Blast (blastp)
6. Paste in your sequence. Wait! Don’t hit submit. Instead, let’s add some parameters to make sure
you have enough diversity in sequences to get a good analysis of conservation:
a. Change database to: Reference Proteins (refseq_protein). This database is a bit less
redundant than the default nr database. Ideally we would use a database that is nonredundant
at the 75% level (other servers such as HMMER allow that, but they are more
difficult to use).
b. Expand “Algorithm Parameters” and select 250 on “Max Target Sequences”. This will
make sure you have enough sequences to measure conservation while making sure that all the
proteins identified are predicted to be highly related.
*IMPORTANT NOTE: If you see %’s in the “Ident” column in the BLAST results
going below 30%, re-adjust this to 100 or even 50. Below 30% you are in what is
considered the “twilight” zone, and the sequences you are finding are not
necessarily related in structure or function. Basically, you have drifted too far in
evolutionary space. Roughly speaking, proteins with 30-70% Ident have been found to be
related in both structure and function; above 70% it is almost certain that
the proteins are nearly identical in structure and function.
7. Hit “BLAST” to start the search.
8. Analyze your results. Make sure all the hits are “Pink or Red” for alignment score (i.e. large
portions of the sequence aligned). In the “Descriptions” table with “Sequences producing
significant alignments”, make sure the Ident column never goes below 30. If it does, see the note above.
9. If all looks good, select “Multiple Alignment” near the top in the “Other Reports” section.
10. From this output (don’t worry if an error is thrown for graphical overview) select:
“Download” -> Fasta plus gaps
11. Now we are going to open another program named Jalview. Click
Launch Jalview Desktop and open the downloaded file.
12. Under the “File” menu, select Input Alignment, then From File, and open the alignment file you just
downloaded from BLAST.
13. In Jalview, open the Colour menu at the top and click Percentage Identity. Then go back to
Colour and click Above Identity Threshold. Set the identity threshold to 99%.
14. The residues highlighted here are critical for either protein structure or function. All of the
residues you identified in lab 1 (i.e. the catalytic residues) should be highlighted.
15. With Percentage Identity colouring on, move the threshold bar back and forth to
determine the percent conservation for a residue (look for the point where it switches from being
colored to not colored).
16. Now, in the assignment below you will need to find non-catalytic residues (i.e. residues not
directly involved in the reaction mechanism) that are either highly conserved OR have low
conservation. Roughly speaking (based on the search parameters above):
High Conservation residues have >70% identity across the homologous sequences
Low Conservation residues have <30% identity across the homologous sequences
Residues in the 30-70% range are likely important but not critical for structure or function.
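The conservation bands above can also be checked outside Jalview. The following is a minimal Python sketch and not part of the required workflow. It assumes the "Fasta plus gaps" download is a standard aligned FASTA file in which every sequence has the same length and gaps are written as "-". The names read_fasta, column_identity, and classify are hypothetical helpers made up for this illustration, and the toy alignment is invented. Identity is taken here as the share of sequences carrying the most common residue in a column, which may differ slightly from how Jalview treats gaps.

```python
from collections import Counter

def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def column_identity(seqs, col):
    """Percent of sequences sharing the most common residue in a column.
    Gaps count toward the total but never as the consensus residue
    (an assumption of this sketch)."""
    column = [s[col].upper() for s in seqs]
    residues = Counter(c for c in column if c != "-")
    if not residues:
        return 0.0
    return 100.0 * residues.most_common(1)[0][1] / len(column)

def classify(pct):
    """Apply the rough bands from the instructions: >70 high, <30 low."""
    if pct > 70:
        return "high"
    if pct < 30:
        return "low"
    return "intermediate"

# Toy alignment for illustration only; replace with the text of your own file.
example = """>query
MKHA-
>hit1
MKHG-
>hit2
MRHSA
>hit3
MKHTA
"""
seqs = [seq for _, seq in read_fasta(example)]
for col in range(len(seqs[0])):
    pct = column_identity(seqs, col)
    print(col + 1, round(pct), classify(pct))
```

Column numbers printed here are alignment columns, which only match residue numbers in your enzyme when there are no gaps in the query sequence before that column.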
Assignment—
For your assigned enzyme group complete the following assignments with the help of the Lab
2 instructions:
1) What NON-CATALYTIC residues in the active site of the enzyme are most likely
important for function?
a. Highlight three highly conserved residues, show both structural interactions
AND conservation using PyMOL and Jalview. Hypothesize why they are critical for
function.
b. Highlight one low conservation residue, show both structural interactions AND
conservation using PyMOL and Jalview. What other amino acids are observed at this position?
Hypothesize why the alternate residues are tolerated.
2) How does the inhibitor interact and how would you further optimize it?
a. Describe WHY it acts as an inhibitor.
b. What key interactions are made with the inhibitor? Do any of those interactions
mimic those of the natural substrate?
c. If you were to modify the inhibitor what new interactions would you try to take
advantage of? (hint…conserved residues in the protein are unlikely to change
while maintaining protein structure or function)
Case 1 = ACE: 2X91
Case 2 = OD: 2TOD
Case 3 = α-Fucosidase: 2ZX5
Case 4 = PNP: 1A9S
Case 5 = CD: 2FR6
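
For question 1b, it can help to list which amino acids appear at a low-conservation position. The following is a minimal Python sketch under stated assumptions: the first sequence in the alignment is your query, and positions are counted along the ungapped query sequence starting at 1. PDB residue numbering may be offset from this count, so check it against your structure. The names query_column and residues_at are hypothetical helpers, and the aligned sequences are invented for illustration.

```python
from collections import Counter

def query_column(query_aligned, position):
    """Return the 0-based alignment column that holds the given 1-based
    residue position of the gapped query sequence."""
    seen = 0
    for col, char in enumerate(query_aligned):
        if char != "-":
            seen += 1
            if seen == position:
                return col
    raise ValueError("position is beyond the end of the query sequence")

def residues_at(aligned_seqs, col):
    """Count each residue (or gap) observed in one alignment column."""
    return Counter(seq[col].upper() for seq in aligned_seqs)

# Toy aligned sequences, query first; replace with your own alignment.
aligned = ["MK-HA", "MKGHG", "MR-HS", "MK-HT"]
col = query_column(aligned[0], 4)  # fourth residue of the query
print(residues_at(aligned, col).most_common())
```

The gap in the toy query shows why the lookup is needed: the query's third residue sits in the fourth alignment column.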