Data analysis and visualization

distance_tree

This module loads a file named tabulated.tsv (as generated by the tabulate module) and produces a distance matrix, a UPGMA tree based on that distance matrix, and a file with the input data ordered as shown on the UPGMA tree. One parameter must be declared in the config file, vt_columns, which determines the size of the svg image that is produced (we suggest 5000). An optional parameter, distance_tree_coordinates, restricts the analysis to selected positions of the protein being analyzed. For instance, distance_tree_coordinates="1-100,400-500" means that only the regions between amino acid positions 1-100 and 400-500 will be considered.
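
The distance_tree_coordinates value can be read as a comma-separated list of start-end ranges. The following is a minimal sketch of that interpretation, not the module's actual code: the function names parse_coordinates and position_selected are hypothetical, and treating both ends of each range as inclusive is an assumption.

```python
def parse_coordinates(spec):
    """Parse a string such as "1-100,400-500" into (start, end) tuples."""
    ranges = []
    for part in spec.split(","):
        start, end = part.strip().split("-")
        ranges.append((int(start), int(end)))
    return ranges


def position_selected(position, ranges):
    """Return True if a position lies in any range (ends assumed inclusive)."""
    return any(start <= position <= end for start, end in ranges)


ranges = parse_coordinates("1-100,400-500")
# Keep only the amino acid positions that fall inside the declared regions.
kept = [p for p in (50, 250, 400, 501) if position_selected(p, ranges)]
print(kept)  # [50, 400]
```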

get_pattern

This module takes as input the table created by the tabulate module and returns a file named pattern_regions.txt with the pattern positions for structure 1, according to the rule and the cutoff value specified in the auto-p2docking configuration file, which may look like this:

pattern_cutoff=50

pattern="111[01]1"

where pattern_cutoff is the value (as a percentage) above which a site is assigned “1” in the rule pattern (50% in this example), and pattern is a regular expression (the pattern rule). In this example it is “111[01]1”, which means that a pattern region is defined as a sequence of at least three consecutive sites with more than 50% frequency, followed by a site that may have more or less than 50% frequency, provided that it is followed by another site with more than 50% frequency.
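
The cutoff and pattern rule can be illustrated by encoding each site as “1” or “0” and searching the resulting string with the regular expression. This is only a sketch of the idea, not the get_pattern implementation: the function name find_pattern_regions and the example frequencies are hypothetical, and how the module treats overlapping matches is not stated here, so the sketch simply reports non-overlapping matches.

```python
import re


def find_pattern_regions(frequencies, cutoff=50, pattern="111[01]1"):
    """Return 1-based, inclusive (start, end) positions matching the rule."""
    # A site is encoded as "1" when its frequency is above the cutoff.
    encoded = "".join("1" if f > cutoff else "0" for f in frequencies)
    # finditer scans left to right and reports non-overlapping matches.
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, encoded)]


# Hypothetical per-site frequencies (in percent); they encode to "0111010".
frequencies = [10, 60, 75, 90, 40, 80, 20]
print(find_pattern_regions(frequencies))  # [(2, 6)]
```

Here sites 2 to 4 are above the cutoff, site 5 is below it, and site 6 is above it again, so sites 2 to 6 form one pattern region.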

highlight_regions

This module applies colors to PDB structures to highlight regions of interest, based on the pattern_regions.txt file created by the get_pattern module, which must be in the module’s input folder. The .pdb files to be colored must be placed in a folder called files_to_keep, inside a folder called pisa_extract, and organized in subfolders named after the respective ligand (for instance, P59665_ditasser). If this module is used in a pipeline that has a docking method and pisa_server_extract or pisa_ccp4_extract, the .pdb files will automatically be placed in the files_to_keep folder. Chain A (receptor) and chain B (ligand) are colored green and blue, respectively, and pattern amino acid sites are colored red. This module also uses pymol (https://pymol.org/) to save a png file of the colored structure. The output directory will contain the colored .pdb files and the corresponding .png files. This module works for any type of pdb, provided that the structure to be colored is chain A.

tabulate

This module takes as input .tsv files (generated by the pisa_server_extract, pisa_ccp4_extract, or pisa_xml_extract modules) and creates a table with all the receptor ligand sites when in complex with each one of the ligands. The files must be organized in subfolders named after the respective ligand (for instance, P59665_ditasser) within the input folder of the module.
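
The input layout expected by tabulate (one subfolder per ligand, each holding .tsv files) can be traversed as in the following sketch. It only shows how files might be grouped by ligand name and is not the module's own code: the function name collect_tsv_by_ligand and the file name interface.tsv are hypothetical.

```python
import tempfile
from pathlib import Path


def collect_tsv_by_ligand(input_dir):
    """Map each ligand subfolder name to the sorted .tsv files inside it."""
    files_by_ligand = {}
    for subfolder in sorted(Path(input_dir).iterdir()):
        if subfolder.is_dir():
            tsv_files = sorted(subfolder.glob("*.tsv"))
            if tsv_files:
                files_by_ligand[subfolder.name] = tsv_files
    return files_by_ligand


# Build a throwaway input folder with one ligand subfolder to try it out.
with tempfile.TemporaryDirectory() as tmp:
    ligand_dir = Path(tmp) / "P59665_ditasser"
    ligand_dir.mkdir()
    (ligand_dir / "interface.tsv").write_text("")  # hypothetical file name
    found = collect_tsv_by_ligand(tmp)
    names = {ligand: [f.name for f in files] for ligand, files in found.items()}

print(names)  # {'P59665_ditasser': ['interface.tsv']}
```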