Exercises for Chapter 9 – Complete Process of Data Extraction and Analysis
9.1 Lectin example (Validation, extraction, comparison, charge calculation)
Pseudomonas aeruginosa is an opportunistic pathogen associated with a number of chronic infections. This pathogen forms a biofilm, enabling it to survive both the response of the host immune system and antibiotic treatment. One of the cornerstones of biofilm formation is the presence of the sugar-binding protein LecB (PA-IIL). Its inhibition is considered a promising approach to anti-pseudomonal treatment.
For this reason, we will examine a sugar-binding site of LecB. Specifically, we will find all occurrences of this binding site in the Protein Data Bank, validate them (removing potentially incorrect structures) and ask the following questions:
1. Does this binding site also occur in organisms other than Pseudomonas aeruginosa and in proteins other than LecB?
2. Do the binding sites have a common amino acid composition? Specifically, are
there amino acids which are present in all occurrences of the binding site (or
most of them)? And are there binding sites which have a different amino acid
composition, and thus seem to be outliers?
3. Is the 3D structure of the common amino acid part (or parts) similar? That is, does the binding site have a conserved structural pattern (or patterns)?
4. Is there any common charge distribution within the binding site?
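Question 2 can be prototyped once each validated binding site has been reduced to the list of residues it contains. The following Python sketch is a minimal illustration, not the method used by any particular tool. It counts the fraction of sites in which each residue type occurs and treats residue types above a chosen threshold as the conserved core. It then flags as possible outliers the sites whose residue set overlaps that core poorly, measured by Jaccard similarity. The site names, residue lists and both thresholds are hypothetical.

```python
from collections import Counter


def residue_frequencies(sites: dict[str, list[str]]) -> dict[str, float]:
    """Fraction of binding sites containing each residue type at least once."""
    counts = Counter()
    for residues in sites.values():
        counts.update(set(residues))  # count each residue type once per site
    return {res: c / len(sites) for res, c in counts.items()}


def conserved_residues(sites, min_fraction=0.75):
    """Residue types present in at least `min_fraction` of all sites."""
    return {res for res, f in residue_frequencies(sites).items() if f >= min_fraction}


def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|; defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def find_outliers(sites, min_fraction=0.75, min_similarity=0.5):
    """Sites whose residue-type set is dissimilar to the conserved core."""
    core = conserved_residues(sites, min_fraction)
    return [name for name, residues in sites.items()
            if jaccard(set(residues), core) < min_similarity]


# Hypothetical example data (not real LecB binding sites).
sites = {
    "site_A": ["ASP", "ASP", "GLU", "ASN", "SER", "GLY", "THR"],
    "site_B": ["ASP", "ASP", "GLU", "ASN", "SER", "GLY"],
    "site_C": ["ASP", "GLU", "ASN", "SER", "GLY", "THR"],
    "site_D": ["LYS", "ARG", "TYR", "PHE"],
}

print(sorted(conserved_residues(sites)))  # ['ASN', 'ASP', 'GLU', 'GLY', 'SER']
print(find_outliers(sites))               # ['site_D']
```

Setting `min_fraction=1.0` answers the "present in all occurrences" variant of question 2, and lowering it covers "most of them". Because the sketch compares residue types only, a fuller analysis would also take residue positions into account.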
9.2 Cytochrome P450 example (Database search, detection of channels, channel characterization)
We have already encountered cytochromes P450 (CYPs) in previous examples on databases and channel detection. Here, however, we focus on an overall analysis of this biomacromolecule using known data sources, and we test whether the effects of known mutations can be linked to amino acids lining the channels or rather to amino acids that bind ligands. When starting to analyze any new macromolecule, it is wise to first look up the known data in a concise form. For proteins, the UniProt database is a good place to start, so let's focus on the data about human CYPs presented there.
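Task 1 can also be approached programmatically through the UniProt REST API. The Python sketch below builds a search URL for reviewed human entries whose protein name contains "cytochrome P450". It requests the accession, gene names and PDB cross-references as TSV, then ranks the entries by their number of PDB IDs. Several assumptions apply. The query string is an assumption, since a name-based query may miss or over-include entries. PDB cross-references also cover NMR and cryo-EM entries, not only crystal structures. The sample TSV is hypothetical so that the example runs offline.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_query_url(size: int = 500) -> str:
    """UniProtKB search URL for reviewed human entries named 'cytochrome P450'.

    The query itself is an assumption; refine it as needed.
    """
    params = {
        "query": '(organism_id:9606) AND (reviewed:true) AND (protein_name:"cytochrome P450")',
        "fields": "accession,gene_names,xref_pdb",
        "format": "tsv",
        "size": str(size),
    }
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


def fetch_tsv(url: str) -> str:
    """Download the search result (requires network access; unused below)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def rank_by_pdb_entries(tsv_text: str) -> list[tuple[str, str, int]]:
    """Return (accession, gene names, number of PDB IDs), most structures first."""
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    ranked = []
    for line in lines[1:]:  # first line is the column header
        accession, genes, pdb_field = (line.split("\t") + ["", ""])[:3]
        pdb_ids = [p for p in pdb_field.split(";") if p.strip()]
        ranked.append((accession, genes, len(pdb_ids)))
    return sorted(ranked, key=lambda row: row[2], reverse=True)


# Hypothetical TSV in the assumed column order (accession, gene names, PDB).
sample = (
    "Entry\tGene Names\tPDB\n"
    "EXAMPLE1\tCYPA\t1AAA;1AAB;\n"
    "EXAMPLE2\tCYPB\t2BBB;2BBC;2BBD;\n"
    "EXAMPLE3\tCYPC\t\n"
)
print(build_query_url())
print(rank_by_pdb_entries(sample)[0])  # ('EXAMPLE2', 'CYPB', 3)
```

To query the live service, pass `build_query_url()` to `fetch_tsv()` and feed the result to `rank_by_pdb_entries()`. A page size of 500 is assumed to be enough for the few dozen human CYPs; larger result sets would need pagination. Check the hits by hand, because a name query can also return related proteins that are not CYPs themselves.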
9.2.1 Database search
1. Find the human CYP with the largest number of crystal structures. Note its UniProt ID.
2. What are the molecular functions and biological processes connected with this
protein according to its GO annotation? Restrict yourself to major keywords.
3. State the most generic catalytic activity of this selected CYP. Write the equation
of this chemical reaction.
4. What is the EC number of this protein?
5. Where is this protein located within the cell?
6. List the interactions of this protein with small molecules available in ChEMBL, DrugBank and BindingDB. Which database contains the most chemical interaction data?
7. Find known problematic mutations for this protein. List any variants with a
known effect on protein function.
8. Find the closest protein interaction partners via the cross-link to the STRING database. List those that are supported by experimental evidence.
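For task 3, it may help to keep in mind the general form of the P450 monooxygenase reaction. One atom of O2 is inserted into the substrate RH and the other is reduced to water. In human microsomal CYPs the two electrons are usually delivered from NADPH by NADPH-cytochrome P450 reductase. The UniProt entry of the selected CYP may state its catalytic activity in a more specific or differently written form (e.g. with the reductase as an explicit reactant). Treat the equations below as a reference point rather than the expected answer.

```latex
\begin{align*}
\mathrm{RH} + \mathrm{O_2} + 2\,\mathrm{H^+} + 2\,e^- &\longrightarrow \mathrm{ROH} + \mathrm{H_2O}\\
\mathrm{RH} + \mathrm{O_2} + \mathrm{NADPH} + \mathrm{H^+} &\longrightarrow \mathrm{ROH} + \mathrm{H_2O} + \mathrm{NADP^+}
\end{align*}
```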
After collecting information about the protein in general, it is usually a good
option to look at the structure in structural databases:
9. Select the structure of the protein with the best resolution and open it in the PDBe database to find out whether this protein dimerizes or forms any other macromolecular assemblies.
10. Find similar 3D structures using PDBeFold – which other protein has the most
similar structure? What is its sequence similarity?
11. Try to find the active site by using the ligands present within the structure.
12. Use the Protein Feature View in RCSB to compare the coverage of the sequence
with the extracted information about the sources of disorder within the structure.
13. Compare the structure of the CYP protein with others from its family using
PDBflex. Which region exhibits the largest local flexibility?
14. Based on the global flexibility analysis, find representative structures of the individual clusters of protein conformations. Also select the two most distant structures.
15. Using PDBsum, find which ligands occupy the active site of the most distant
structures from the previous task.
16. Using LigPlot, analyze how much their surrounding residues differ, and compare them with the catalytic residues listed in the Catalytic Site Atlas.
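The global flexibility analysis in tasks 13 and 14 already provides the clusters of conformations. The Python sketch below only illustrates the two selection steps on a pairwise RMSD matrix, assuming the cluster assignment is known. It picks each cluster's medoid as its representative, that is, the member with the smallest summed RMSD to the other members. It reports the pair with the largest RMSD as the two most distant structures. The structure names, RMSD values (in Å) and clusters are hypothetical. The medoid is one reasonable choice of representative, not necessarily the one used by PDBflex.

```python
from itertools import combinations


def most_distant_pair(ids, rmsd):
    """Return (id_i, id_j, RMSD) for the pair with the largest RMSD."""
    i, j = max(combinations(range(len(ids)), 2), key=lambda p: rmsd[p[0]][p[1]])
    return ids[i], ids[j], rmsd[i][j]


def medoid(members, ids, rmsd):
    """Cluster member with the smallest summed RMSD to all other members."""
    index = {name: k for k, name in enumerate(ids)}
    return min(members, key=lambda m: sum(rmsd[index[m]][index[o]] for o in members))


# Hypothetical symmetric RMSD matrix (in Å) for five structures.
ids = ["s1", "s2", "s3", "s4", "s5"]
rmsd = [
    [0.0, 0.4, 0.5, 2.8, 3.1],
    [0.4, 0.0, 0.3, 2.6, 2.9],
    [0.5, 0.3, 0.0, 2.7, 3.0],
    [2.8, 2.6, 2.7, 0.0, 0.6],
    [3.1, 2.9, 3.0, 0.6, 0.0],
]
clusters = {"cluster_1": ["s1", "s2", "s3"], "cluster_2": ["s4", "s5"]}

for name, members in clusters.items():
    print(name, "representative:", medoid(members, ids, rmsd))
print("most distant:", most_distant_pair(ids, rmsd))  # ('s1', 's5', 3.1)
```

In a two-member cluster both members have the same summed RMSD, so the sketch simply returns the first one listed.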
9.2.2 Channel detection
17. Analyze whether these two most distant structures share all channels leading from the catalytic site. Use MOLEonline 2.0 with HETATMs ignored so that channels blocked by ligands are also detected.
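MOLEonline reports the lining residues of each channel it finds. One simple way to decide whether a channel is shared by the two structures (task 17) is therefore to match channels by the overlap of their lining residues. The Python sketch below is an assumption-based illustration, not a MOLEonline feature. It pairs each channel of structure A with the channel of structure B whose lining residues have the highest Jaccard similarity. A channel is left unmatched if the best similarity falls below a chosen cut-off. Channel names, residue labels and the 0.3 cut-off are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|, or 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_channels(channels_a, channels_b, min_similarity=0.3):
    """Map each channel of A to (best channel of B or None, similarity)."""
    matches = {}
    for name_a, lining_a in channels_a.items():
        best_name, best_score = None, 0.0
        for name_b, lining_b in channels_b.items():
            score = jaccard(lining_a, lining_b)
            if score > best_score:
                best_name, best_score = name_b, score
        if best_score < min_similarity:
            best_name = None  # no sufficiently similar channel in B
        matches[name_a] = (best_name, best_score)
    return matches


# Hypothetical lining residues (residue name + number) from two structures.
channels_a = {
    "A1": {"LEU10", "PHE12", "SER20", "ILE21"},
    "A2": {"PHE50", "LEU51", "THR60"},
    "A3": {"GLU90", "ARG92"},
}
channels_b = {
    "B1": {"LEU10", "PHE12", "SER20", "VAL30"},
    "B2": {"PHE50", "LEU51", "THR60", "PHE55"},
}

for name, (partner, score) in match_channels(channels_a, channels_b).items():
    print(name, "->", partner, round(score, 2))
# A1 -> B1 0.6
# A2 -> B2 0.75
# A3 -> None 0.0
```

The matching is greedy per channel, so two channels of A may map to the same channel of B. A one-to-one assignment would need an extra step. The unmatched A3 is the kind of channel that task 17 asks about.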
9.2.3 Channel characterization
18. One of the structures contains a channel which is wider than the ProbeRadius (you can check the molecular surface). To analyze this channel as well, enlarge the ProbeRadius to twice the original value (i.e., to 6 Å) and redo the calculations. Are there visually similar channels in both structures now?
19. Compare the lining residues in the channels in both structures and list the
channel-lining residues for channels with the largest overlap. How much do
they differ in their composition, charge, hydropathy and polarity?
20. Compare the channel-lining residues with the mutated amino acids that have a known effect (task 7). Are there any overlaps?
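MOLEonline reports its own physicochemical properties for each channel, so the following Python sketch is only a rough, independent cross-check for tasks 19 and 20. It assumes residue labels of the form name plus sequence number (e.g. "PHE12") and uses the Kyte-Doolittle hydropathy scale. Asp/Glu count as -1 and Lys/Arg as +1, with His treated as neutral. The polar set is one common definition, not a universal one. These choices may differ from those made by MOLEonline, and all residue lists and mutated positions are hypothetical.

```python
# Kyte-Doolittle hydropathy values (three-letter residue codes).
KYTE_DOOLITTLE = {
    "ALA": 1.8, "ARG": -4.5, "ASN": -3.5, "ASP": -3.5, "CYS": 2.5,
    "GLN": -3.5, "GLU": -3.5, "GLY": -0.4, "HIS": -3.2, "ILE": 4.5,
    "LEU": 3.8, "LYS": -3.9, "MET": 1.9, "PHE": 2.8, "PRO": -1.6,
    "SER": -0.8, "THR": -0.7, "TRP": -0.9, "TYR": -1.3, "VAL": 4.2,
}
# Side-chain charge at roughly neutral pH; His is treated as neutral (assumption).
CHARGE = {"ASP": -1, "GLU": -1, "LYS": 1, "ARG": 1}
# One common definition of polar residues (assumption; conventions vary).
POLAR = {"ARG", "ASN", "ASP", "GLN", "GLU", "HIS", "LYS", "SER", "THR", "TYR"}


def split_label(label: str) -> tuple[str, int]:
    """'PHE12' -> ('PHE', 12); assumes a three-letter code followed by a number."""
    return label[:3], int(label[3:])


def profile(lining: set[str]) -> dict[str, float]:
    """Net charge, mean hydropathy and polar fraction of a set of lining residues."""
    names = [split_label(label)[0] for label in lining]
    n = len(names)
    return {
        "net_charge": sum(CHARGE.get(res, 0) for res in names),
        "mean_hydropathy": sum(KYTE_DOOLITTLE[res] for res in names) / n,
        "polar_fraction": sum(res in POLAR for res in names) / n,
    }


def mutation_overlap(lining: set[str], mutated_positions: set[int]) -> list[int]:
    """Sequence positions that both line the channel and carry a known mutation."""
    return sorted({split_label(label)[1] for label in lining} & mutated_positions)


# Hypothetical lining residues of a matched channel pair and mutated positions.
lining_a = {"LEU10", "PHE12", "SER20", "ILE21", "ASP25"}
lining_b = {"LEU10", "PHE12", "SER20", "LYS33", "GLU34"}
mutated = {12, 33, 77}

print("shared:", sorted(lining_a & lining_b))
print("only in A:", sorted(lining_a - lining_b), "only in B:", sorted(lining_b - lining_a))
for name, lining in (("A", lining_a), ("B", lining_b)):
    print(name, profile(lining), "mutated lining positions:", mutation_overlap(lining, mutated))
```

The shared and unique residue lists cover the composition part of task 19. Comparing the two profiles covers charge, hydropathy and polarity, and `mutation_overlap` answers task 20 for each channel.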