Exercises for Chapter 9 – Complete Process of Data Extraction and Analysis

9.1 Lectin example (Validation, extraction, comparison, charge calculation)

Pseudomonas aeruginosa is an opportunistic pathogen associated with a number of chronic infections. This pathogen forms a biofilm, enabling it to survive both the response of the host immune system and antibiotic treatment. One of the cornerstones of biofilm formation is the presence of the sugar-binding protein LecB (PA-IIL). Its inhibition is considered to be a promising approach for anti-pseudomonadal treatment. For this reason, we will examine a sugar-binding site of LecB. Specifically, we will find all occurrences of this binding site in the Protein Data Bank, validate them (removing potentially erroneous structures) and ask the following questions:

1. Does this binding site also occur in organisms other than Pseudomonas aeruginosa and in proteins other than LecB?
2. Do the binding sites have a common amino acid composition? Specifically, are there amino acids which are present in all occurrences of the binding site (or most of them)? And are there binding sites which have a different amino acid composition, and thus seem to be outliers?
3. Is the 3D structure of the common amino acid part (or parts) similar? That is, does the binding site have some conserved structural pattern(s)?
4. Is there any common charge distribution within the binding site?

9.2 Cytochrome P450 example (Database search, detection of channels, channel characterization)

We have already encountered cytochromes P450 (CYPs) in the earlier examples on databases and channel detection. Here, however, we focus on an overall analysis of a given biomacromolecule from known sources and ask whether the effect of known mutations can be linked to amino acids in the channels or rather to amino acids that bind ligands. First, for any new macromolecule to be analyzed, it is wise to look up known data in a concise form. For proteins, the UniProt database is a good place to start.
So let’s focus on data about the human CYPs presented in this database.

9.2.1 Database search

1. Find the human CYP with the largest number of crystal structures. Note its UniProt ID.
2. What are the molecular functions and biological processes connected with this protein according to its GO annotation? Restrict yourself to major keywords.
3. State the most generic catalytic activity of this selected CYP. Write the equation of this chemical reaction.
4. What is the EC number of this protein?
5. Where is this protein located within the cell?
6. List the interactions of this protein with small molecules available in ChEMBL, DrugBank and BindingDB. Which database contains the most chemical interaction data?
7. Find known problematic mutations for this protein. List any variants with a known effect on protein function.
8. Find the closest protein partners via the cross-link to the STRING database. List those which are known from experiment.

After collecting information about the protein in general, it is usually a good idea to look at the structure in structural databases:

9. Select the structure of the protein with the best resolution and open it in the PDBe database to find whether this protein dimerizes or forms any other macromolecular assemblies.
10. Find similar 3D structures using PDBeFold. Which other protein has the most similar structure? What is its sequence similarity?
11. Try to find the active site by using the ligands present within the structure.
12. Use the Protein Feature View in RCSB to compare the coverage of the sequence with the extracted information about the sources of disorder within the structure.
13. Compare the structure of the CYP protein with others from its family using PDBflex. Which region exhibits the largest local flexibility?
14. Based on global flexibility analysis, find representative structures of individual clusters of conformations of the protein. Also select the two most distant structures.
15. Using PDBsum, find which ligands occupy the active site of the most distant structures from the previous task.
16. Analyze how different their surrounding residues are using LigPlot, and compare them to catalytic residues from the Catalytic Site Atlas.

9.2.2 Channels detection

17. Analyze whether these two most distant structures share all channels from the catalytic site. Use MOLEonline 2.0 without HETATMs so that channels blocked by ligands are also included.

9.2.3 Channels characterization

18. One of the structures contains a channel which is wider than the ProbeRadius (you can check the molecular surface). In order to analyze this channel as well, enlarge the ProbeRadius to twice the original value (i.e. to 6 Å) and redo the calculations. Are there visually similar channels in both structures now?
19. Compare the lining residues in the channels in both structures and list the channel-lining residues for channels with the largest overlap. How much do they differ in their composition, charge, hydropathy and polarity?
20. Compare residues lining the channels with mutated amino acids with a known effect. Are there any overlaps?
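
The comparisons in tasks 19 and 20 can also be done with a short script once the channel-lining residues have been copied out of the channel detection results. The following is only a minimal sketch under stated assumptions: the residue sets and mutated positions are made-up placeholders (not real CYP data), both structures are assumed to share the same residue numbering, histidine is treated as neutral, and the mean Kyte-Doolittle hydropathy is used as a simple stand-in that will not necessarily match the values reported by MOLEonline. The helper names `summarize` and `jaccard` are hypothetical.

```python
from collections import Counter

# Kyte-Doolittle hydropathy index, keyed by three-letter residue code.
KYTE_DOOLITTLE = {
    "ALA": 1.8, "ARG": -4.5, "ASN": -3.5, "ASP": -3.5, "CYS": 2.5,
    "GLN": -3.5, "GLU": -3.5, "GLY": -0.4, "HIS": -3.2, "ILE": 4.5,
    "LEU": 3.8, "LYS": -3.9, "MET": 1.9, "PHE": 2.8, "PRO": -1.6,
    "SER": -0.8, "THR": -0.7, "TRP": -0.9, "TYR": -1.3, "VAL": 4.2,
}

# Assumed formal charges at neutral pH; histidine is treated as neutral.
CHARGE = {"ASP": -1, "GLU": -1, "LYS": 1, "ARG": 1}


def summarize(residues):
    """Composition, net charge and mean hydropathy of (name, number) pairs."""
    names = [name for name, _ in residues]
    return {
        "composition": Counter(names),
        "charge": sum(CHARGE.get(name, 0) for name in names),
        "hydropathy": sum(KYTE_DOOLITTLE[name] for name in names) / len(names),
    }


def jaccard(a, b):
    """Fraction of residues shared by two sets (0 = disjoint, 1 = identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


# Hypothetical lining residues of one channel in each structure.
channel_a = {("PHE", 57), ("ARG", 105), ("LEU", 210), ("GLU", 308), ("ILE", 369)}
channel_b = {("PHE", 57), ("ARG", 105), ("LEU", 210), ("ASP", 214), ("VAL", 370)}

# Hypothetical positions of variants with a known functional effect.
mutated_positions = {105, 214, 400}

shared = channel_a & channel_b
overlap = jaccard(channel_a, channel_b)

# Task 20: lining residues that coincide with a mutated position.
hits = sorted(pos for _, pos in channel_a | channel_b if pos in mutated_positions)

print("shared residues:", sorted(shared, key=lambda r: r[1]))
print("overlap (Jaccard):", round(overlap, 2))
print("channel A:", summarize(channel_a))
print("channel B:", summarize(channel_b))
print("mutated positions lining a channel:", hits)
```

With several channels per structure, `jaccard` can be evaluated for every pair of channels and the pair with the highest value taken as the one with the largest overlap.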