#############################################################################
# EASE-AA: Evolutionary And Structural Encodings with Amino Acid parameters #
#############################################################################

EASE-AA is a sequence-only machine learning method for the prediction of
mutation-induced stability changes in unseen non-homologous proteins.

-------------------------
Contents
-------------------------
Installation requirements
Installation
Running SIFT web-server
Usage
How to cite
References
Contact

-------------------------
Installation requirements
-------------------------
EASE-AA has been developed for Linux, which is currently the only supported
platform. Nevertheless, if you can get all the other software running under a
different operating system, there should be no problem running EASE-AA as well.

* Linux (tested with Ubuntu 12.04)
   - http://www.ubuntu.com/
* Python (tested with version 2.7.3)
   - http://www.python.org/
* LibSVM (tested with version 3.17)
   - http://www.csie.ntu.edu.tw/~cjlin/libsvm/
* PSI-BLAST (tested with legacy 'blastpgp' version 2.2.25) and NCBI Non-Redundant Database
   - http://blast.ncbi.nlm.nih.gov
   - ftp://ftp.ncbi.nlm.nih.gov/blast/db/
   - see Altschul et al. (1997)
* Spine-D (tested with version 1.3)
   - http://sparks.informatics.iupui.edu/SPINE-D/
   - see Zhang et al. (2012)
* Spine-X (tested with version 2.0)
   - http://sparks.informatics.iupui.edu/SPINE-X/
   - see Faraggi et al. (2012)
* SIFT (tested only with the web-server)
   - http://sift.bii.a-star.edu.sg/www/SIFT_seq_submit2.html
   - Currently, you will need to run SIFT on the web-server, save the
     prediction results into a text file, and supply it to EASE-AA for the
     prediction of stability changes.
   - see Ng & Henikoff (2001)

------------
Installation
------------
1) Install Python
   - Repositories: $ sudo apt-get install python
2) Install LibSVM
   - Follow the instructions in the LibSVM package.
3) Install legacy BLAST and configure the NCBI Non-Redundant Database.
   - Repositories: $ sudo apt-get install blast2
   - Installation tutorial: http://biskit.pasteur.fr/install/applications/local-blast
4) Install Spine-X
   - Follow the instructions in the Spine-X package.
5) Install Spine-D
   - Follow the instructions in the Spine-D package.
6) Configure EASE-AA
   - Once all the required software is installed and the non-redundant (NR)
     database is configured, you will need to configure EASE-AA. Specifically,
     configure the locations of LibSVM and Spine-D.
   6.1) Open the file ease.config
   6.2) Configure the path to each installed package:
        LIBSVM = /your_path/libsvm-3.17/
        SPINE_D = /your_path/spineD/bin/
   6.3) Configure the path where temporary files can be created:
        TEMP_DIR = ./temp
   6.4) Please note that standalone SIFT is not supported yet.
   6.5) Save all changes.
7) Test whether EASE-AA is working correctly
   7.1) Execute the following command (pre-calculated predictive features will be used):
        $ python ./runEase.py test/1BNI.fasta Q15P --sift test/1BNI.sift --pssm test/1BNI.pssm --spineX test/1BNI.spXout --spineD test/1BNI.spd --debug > my_prediction.txt
        - Check that the result in my_prediction.txt is the same as in
          ./test/test_prediction_saved_features.txt
   7.2) Execute the following command (may take from several minutes up to an
        hour, depending on your configuration):
        $ python ./runEase.py test/1BNI.fasta Q15P --sift test/1BNI.sift --debug > my_prediction.txt
        - Check that the result in my_prediction.txt is similar to the one in
          ./test/test_prediction.txt
        - Check that the PSSM, Spine-X, and Spine-D predictions in ./temp/ are
          similar to those in ./test/
        - Most likely, there will be some differences because of the different
          versions of the NR database.

-----------------------
Running SIFT web-server
-----------------------
SIFT is used to calculate some predictive features for EASE-AA. Currently,
standalone SIFT is not supported.
You will need to run SIFT on the web-server, save the prediction results into
a text file, and supply it to EASE-AA for the prediction of stability changes.

Run SIFT via the following web-page:
http://sift.bii.a-star.edu.sg/www/SIFT_seq_submit2.html

1) Enter your e-mail address to be able to collect the results.
2) Input the target sequence in FASTA format.
3) Do NOT enter any substitutions of interest. This way, scores for all
   possible amino acid substitutions at every residue will be calculated.
4) Use the SwissProt+TrEMBL database.
5) Leave the default 'Median conservation of sequences' threshold of 3.0.
6) Leave the default sequence similarity threshold of 90% for removing
   identical sequences.
7) When the prediction result is displayed in the web-browser:
   7.1) Click on the link 'Scaled Probabilities for Entire Protein' (it will
        open in a new window).
   7.2) Use Edit->Select All (Ctrl+A) to select all text on the web-page.
   7.3) Open your favourite text editor and paste the text into an empty text
        document with Edit->Paste (Ctrl+V).
   7.4) Save the file and supply it for the EASE-AA prediction using the
        --sift argument.

-----
Usage
-----
EASE-AA is a sequence-only machine learning method for the prediction of
mutation-induced stability changes in unseen non-homologous proteins. Both
classification and real-value prediction are performed at the same time.

EASE-AA is written in Python. To calculate predictive features for the target
sequence, the main script calls Spine-D, which in turn calls PSI-BLAST,
Spine-X, and other required software. SIFT is also used to calculate
predictive features. Unfortunately, standalone SIFT is not supported by
EASE-AA, so you will need to run SIFT on the web-server, save the prediction
results into a text file, and supply it to EASE-AA for the prediction of
stability changes.

The most computationally expensive part of the prediction is the calculation
of the predictive features.
Thus, if you want to predict more than one mutation of the same protein
sequence, use the --debug option, which prevents deletion of the PSSM,
Spine-X, and Spine-D files. Locate these files in the 'temp' directory and
save them to a preferred location. Then, use the --pssm, --spineX, and
--spineD options to supply these files.

EASE-AA has three mandatory arguments:
$ python ./runEase.py input_sequence_file mutation --sift SIFT

To display all options, use the --help option:
$ python ./runEase.py --help

usage: runEase.py [-h] --sift SIFT [--pssm PSSM] [--spineX SPINEX]
                  [--spineD SPINED] [--configuration CONFIGURATION] [-d]
                  input_sequence_file mutation

positional arguments:
  input_sequence_file   input sequence filename (fasta format)
  mutation              mutation in the following format: amino acid X at
                        position n substituted by amino acid Y - XnY, e.g.,
                        A15C

optional arguments:
  -h, --help            show this help message and exit
  --sift SIFT           predicted SIFT scores (web-server, format: ASCII
                        plain)
  --pssm PSSM           PSSM profile from PSI-BLAST (format: ASCII plain)
  --spineX SPINEX       predicted secondary structure and accessible surface
                        area from Spine-X (format: ASCII plain)
  --spineD SPINED       predicted disorder probability from Spine-D (format:
                        ASCII plain)
  --configuration CONFIGURATION
                        use non-default configuration file
  -d, --debug           do not delete intermediate files

-----------
How to cite
-----------
Folkman, L., Stantic, B. & Sattar, A. (2014), ‘Towards sequence-based
prediction of mutation-induced stability changes in unseen non-homologous
proteins’, BMC Genomics 15(Suppl 1), S4.

----------
References
----------
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W.
& Lipman, D.J. (1997), ‘Gapped BLAST and PSI-BLAST: A new generation of
protein database search programs’, Nucleic Acids Research 25(17), 3389-3402.

Ng, P.C. & Henikoff, S. (2001), ‘Predicting deleterious amino acid
substitutions’, Genome Research 11(5), 863-874.

Faraggi, E., Zhang, T., Yang, Y., Kurgan, L. & Zhou, Y.
(2012), ‘SPINE X: Improving protein secondary structure prediction by
multistep learning coupled with prediction of solvent accessible surface area
and backbone torsion angles’, Journal of Computational Chemistry 33(3),
259-267.

Zhang, T., Faraggi, E., Xue, B., Dunker, A.K., Uversky, V.N. & Zhou, Y.
(2012), ‘SPINE-D: Accurate prediction of short and long disordered regions by
a single neural-network based method’, Journal of Biomolecular Structure and
Dynamics 29(4), 799-813.

-------
Contact
-------
e-mail: ease (dot) mutations (at) gmail (dot) com
web:    www.ict.griffith.edu.au/bioinf/ease
author: Lukas Folkman (www.ict.griffith.edu.au/lukas)

Please help us by reporting any bugs or problems. All queries will be
answered.

We are planning to build a web-server soon; please e-mail us if you would
like to be kept updated on our progress.

If you need to process a large number of mutations, please contact us; we may
be able to help you.
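
For a handful of mutations in the same protein, the feature reuse described
under Usage can be scripted. The following is a minimal sketch, not part of
the EASE-AA distribution: the helper names (read_fasta_sequence,
check_mutation, build_command) and the toy file names are hypothetical, and
the check assumes that the position n in XnY is counted from 1. Only the
runEase.py options listed in the --help output are used.

```python
import re

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def read_fasta_sequence(text):
    """Return the sequence of the first record in FASTA-formatted text."""
    sequence = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if sequence:
                break  # a second record starts here; ignore the rest
            continue
        if line:
            sequence.append(line)
    return "".join(sequence).upper()


def check_mutation(mutation, sequence):
    """Validate an XnY mutation against the sequence.

    Assumption (not stated in the README): position n is 1-based.
    Returns (wild_type, position, mutant) or raises ValueError.
    """
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError("not in XnY format: {0}".format(mutation))
    wild, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if wild not in AMINO_ACIDS or mutant not in AMINO_ACIDS:
        raise ValueError("unknown amino acid in {0}".format(mutation))
    if position < 1 or position > len(sequence):
        raise ValueError("position out of range in {0}".format(mutation))
    if sequence[position - 1] != wild:
        raise ValueError("wild-type residue mismatch in {0}".format(mutation))
    if wild == mutant:
        raise ValueError("no substitution in {0}".format(mutation))
    return wild, position, mutant


def build_command(fasta, mutation, sift, pssm=None, spinex=None, spined=None):
    """Assemble the runEase.py argument list from the documented options."""
    command = ["python", "./runEase.py", fasta, mutation, "--sift", sift]
    optional = (("--pssm", pssm), ("--spineX", spinex), ("--spineD", spined))
    for flag, path in optional:
        if path is not None:
            command += [flag, path]
    return command


if __name__ == "__main__":
    # Toy sequence and hypothetical file names, for illustration only.
    toy_sequence = read_fasta_sequence(">toy\nMKTAYIAKQR\n")
    for mutation in ["K2A", "Y5F"]:
        check_mutation(mutation, toy_sequence)
        command = build_command("toy.fasta", mutation, "toy.sift",
                                pssm="saved/toy.pssm",
                                spinex="saved/toy.spXout",
                                spined="saved/toy.spd")
        print(" ".join(command))
```

Each printed line can be run in a shell, or the argument list can be passed
to subprocess.call. Checking the wild-type residue before launching a run is
a cheap way to catch an off-by-one position before any features are computed.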