Intelligent Computing Lab.
Bioinformatics in NCTU, Taiwan.

Propensity scores for prediction and characterization of bioluminescent proteins from sequences
Hui-Ling Huang 1,2
1Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
2Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan

Release 1.1, Last update: September 25, 2014


Bioluminescent proteins (BLPs) are a class of proteins from luminous organisms that emit light through various mechanisms, such as bioluminescence and fluorescence. Identifying BLPs, including luciferases and fluorescent proteins (FPs), is valuable for commercial and medical applications but challenging because their sequences are highly diverse. Furthermore, characterizing BLPs helps mutagenesis analysis aimed at enhancing bioluminescence and fluorescence. This study proposes a novel approach that estimates propensity scores of 400 dipeptides and 20 amino acids, and uses them to design two prediction methods and to characterize BLPs based on a scoring card method (SCM). The method SCMBLP for predicting BLPs achieved a 10-fold cross-validation accuracy of 90.83%, better than existing support vector machine based methods, and a test accuracy of 82.85% on a new balanced dataset of 548 samples. A new dataset consisting of 269 luciferases and 216 FPs was established to design the prediction method SCMLFP, which achieved training and test accuracies of 97.10% and 96.28%, respectively. Furthermore, using the estimated propensity scores, we identified four informative physicochemical properties of the 20 amino acids that give insight into the characterization of BLPs: 1) high transfer free energy from inside to the protein surface, 2) high occurrence frequency of residues in the transmembrane regions of the protein, 3) large hydrophobicity scale from native protein structure, and 4) high correlation coefficient (R = 0.921) between the amino acid compositions of BLPs and integral membrane proteins. Further analysis of BLPs shows that luciferases have a larger value of R (0.937) than FPs (0.635), suggesting that luciferases tend to be located nearer the cell membrane than FPs, for convenient receipt of extracellular ions.
The propensity scores of dipeptides and amino acids, together with the identified properties, are helpful for prediction, characterization, and applications of BLPs including luciferases, photoproteins, and FPs.
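To illustrate how a scoring card over dipeptides can be turned into a predictor, here is a minimal Python sketch. It assumes that a sequence is scored as the sum of dipeptide propensity scores weighted by the sequence's dipeptide composition, and that the score is then compared with a threshold. The function names, the threshold, and the averaging rule used to derive amino acid scores are illustrative assumptions. They are not taken from the SCM Tool, and the optimized propensity scores themselves are not reproduced here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the fraction of each of the 400 dipeptides in a sequence."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return {dp: 0.0 for dp in DIPEPTIDES}
    return {dp: n / total for dp, n in counts.items()}


def scm_score(sequence, propensity):
    """Weighted sum of dipeptide propensity scores (assumed scoring rule)."""
    comp = dipeptide_composition(sequence)
    return sum(propensity.get(dp, 0.0) * frac for dp, frac in comp.items())


def predict(sequence, propensity, threshold):
    """Classify as positive when the score exceeds a chosen threshold."""
    return scm_score(sequence, propensity) > threshold


def amino_acid_propensity(propensity):
    """Assumed rule: average the scores of all dipeptides containing a residue."""
    result = {}
    for aa in AMINO_ACIDS:
        scores = [propensity.get(dp, 0.0) for dp in DIPEPTIDES if aa in dp]
        result[aa] = sum(scores) / len(scores)
    return result
```

With a hypothetical score table such as `{"AC": 300.0, "CA": 600.0}`, calling `scm_score("ACAC", table)` weights each score by how often its dipeptide occurs in the sequence. In practice the table would hold all 400 scores estimated from training data.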


Download SCM Tool
Please read the README file for the manual.

The system flowchart of the proposed scoring card method (SCM)

Sequence prediction:

Contact:
Hui-Ling Huang, Shinn-Ying Ho.

Cite SCM:
Prediction and analysis of protein solubility using a novel scoring card method with dipeptide composition.