Peptide Knowledge Center

Prediction of anticancer peptides: adding secondary structure

Anticancer peptides (ACPs) are a class of antimicrobial peptides with marked antitumor activity. In recent years, biologists have carried out extensive research on them. As this work progressed, scientists found that anticancer peptides not only eliminate pathogenic bacteria quickly and effectively but also act on human tumor cells: chromosome synthesis in the nucleus of the tumor cell is blocked and the DNA is broken, which leads to the death of the cell. The discovery of ACPs offers new hope for cancer treatment, because ACPs do not impair the normal physiological functions of the body. Over the past decade, many anticancer peptides targeting various tumors have been applied clinically, suggesting that ACPs may become a means of treating cancer.

In 2013, Tyagi et al. applied a Support Vector Machine (SVM) to the prediction of anticancer peptides, using amino acid composition as the feature information; the total accuracy (Acc) was 88.89%. In 2014, Hajisharifi et al. used the SVM algorithm with pseudo amino acid composition and a local alignment kernel to predict anticancer peptides under five-fold cross-validation, and the total accuracy (Acc) reached 89.7%. In 2016, Chen et al. predicted anticancer peptides with a sequence-based tool under five-fold cross-validation, and the total accuracy (Acc) reached 94.77%.

In the study summarized here, prediction was carried out with the quadratic discriminant method, using three kinds of feature information: the 20 amino acid components (20AAC), the three protein secondary structure components (3PSS), and the six hydrophilic and hydrophobic amino acid group components (6HP). The best total accuracy (Acc) reached 94%. In a comparison with other prediction algorithms, the quadratic discriminant method gave the better results.

Protein secondary structure components

Three protein secondary structure components (3PSS) were selected as feature parameters. The three secondary structure states are α-helix, β-sheet, and random coil. The data set contained 138 anticancer peptide sequences in the positive set and 206 non-anticancer peptide sequences in the negative set. The secondary structure information was predicted with the PSIPRED software.
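How the three components are computed is not spelled out above. A minimal sketch, assuming each component is simply the fraction of residues assigned to each state in a PSIPRED-style three-state string (H for helix, E for strand, C for coil), might look like this. The function name and the example string are hypothetical.

```python
from collections import Counter

# PSIPRED-style three-state labels: H = alpha-helix, E = beta-strand, C = coil.
STATES = "HEC"


def secondary_structure_composition(ss_string):
    """Return the fraction of residues in each state, ordered as H, E, C.

    Assumption: the 3PSS features are plain state frequencies.
    """
    ss = ss_string.strip().upper()
    if not ss:
        raise ValueError("empty secondary structure string")
    unknown = set(ss) - set(STATES)
    if unknown:
        raise ValueError(f"unexpected state labels: {sorted(unknown)}")
    counts = Counter(ss)
    return [counts[state] / len(ss) for state in STATES]


# Hypothetical 10-residue prediction string, for illustration only.
features_3pss = secondary_structure_composition("CCHHHHHHCC")
print(dict(zip(STATES, features_3pss)))
```

The three values always sum to 1, so they describe the relative share of each state regardless of peptide length.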

Amino acid components

The 20 amino acid components (20AAC) were selected as the characteristic parameters.
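As an illustration, the 20 amino acid components can be read as the relative frequency of each standard amino acid in a peptide. The sketch below assumes that reading; the function name, the residue ordering, and the example sequence are hypothetical choices, not taken from the study.

```python
# The 20 standard amino acids in one-letter code (alphabetical order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each amino acid."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


# Hypothetical peptide sequence, for illustration only.
features_20aac = amino_acid_composition("GLFDIVKKVV")
for aa, value in zip(AMINO_ACIDS, features_20aac):
    if value > 0:
        print(aa, value)
```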

Hydrophilic and hydrophobic amino acid components

According to their hydrophilic or hydrophobic character, the 20 amino acids were divided into 6 categories. The strongly hydrophilic amino acids, aspartic acid (D), arginine (R), glutamic acid (E), asparagine (N), glutamine (Q), lysine (K), and histidine (H), were placed in one group, denoted H. The strongly hydrophobic amino acids, alanine (A), methionine (M), phenylalanine (F), leucine (L), isoleucine (I), and valine (V), were placed in one group, denoted L. The weakly hydrophilic or weakly hydrophobic amino acids, serine (S), threonine (T), tyrosine (Y), and tryptophan (W), were placed in one group, denoted W. The remaining three amino acids, proline (P), glycine (G), and cysteine (C), each form their own group because of their special chemical structures. In this way, the 20 amino acids are merged into 6 groups (H, L, W, P, C, G), and the 6 hydrophilic and hydrophobic group components (6HP) are calculated as features.
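The grouping above can be sketched as a lookup table followed by a frequency count. The code assumes the 6HP features are the fractions of residues falling in each group, and it uses the standard one-letter codes (E for glutamic acid, Q for glutamine, W for tryptophan, G for glycine). The names and the example sequence are hypothetical.

```python
# Six hydropathy groups as described in the text, keyed by group label.
GROUPS = {
    "H": "DRENQKH",  # strongly hydrophilic
    "L": "AMFLIV",   # strongly hydrophobic
    "W": "STYW",     # weakly hydrophilic or weakly hydrophobic
    "P": "P",        # proline
    "C": "C",        # cysteine
    "G": "G",        # glycine
}
GROUP_ORDER = "HLWPCG"

# Reverse lookup: residue letter -> group label.
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def hydropathy_group_composition(sequence):
    """Return the fraction of residues in each group, ordered as HLWPCG."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = dict.fromkeys(GROUP_ORDER, 0)
    for aa in seq:
        if aa not in RESIDUE_TO_GROUP:
            raise ValueError(f"non-standard residue: {aa}")
        counts[RESIDUE_TO_GROUP[aa]] += 1
    return [counts[g] / len(seq) for g in GROUP_ORDER]


# Hypothetical peptide sequence, for illustration only.
features_6hp = hydropathy_group_composition("GLFDIVKKVV")
print(dict(zip(GROUP_ORDER, features_6hp)))
```

Note that the group label W and the residue code W (tryptophan) are different things; the dictionary keys are group labels, the string members are residues.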

The anticancer peptides were predicted with the quadratic discriminant (QD) method and with random forest (RF).
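For orientation, the textbook form of quadratic discriminant analysis assigns a feature vector x to the class k with the largest discriminant score. Whether the study used exactly this form or a variant of it is not stated here, so the expression below is an assumption. The symbols μ_k, Σ_k, and π_k denote the mean vector, covariance matrix, and prior probability of class k, all estimated from the training set.

```latex
\delta_k(\mathbf{x}) = -\tfrac{1}{2}\ln\lvert\Sigma_k\rvert
  - \tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_k)^{\top}\Sigma_k^{-1}(\mathbf{x}-\boldsymbol{\mu}_k)
  + \ln\pi_k,
\qquad
\hat{y} = \arg\max_{k}\,\delta_k(\mathbf{x})
```

Because each class has its own covariance matrix, the boundary between the two classes (anticancer and non-anticancer peptides) is a quadratic surface rather than a plane.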

Classification prediction performance evaluation

At present, the methods commonly used to verify the performance of prediction algorithms are mainly the independent test and the K-fold cross-validation test.

In the 7-fold cross-validation test, the data set is randomly divided into 7 subsets; one subset is taken as the test set and the remaining 6 subsets serve as the training set. This process is repeated 7 times, so that each subset is used as the test set once. The main purpose of evaluating any prediction algorithm is to ensure that it generalizes to new samples from the same data domain. In this study, sensitivity (Sn), specificity (Sp), total accuracy (Acc), and the Matthews correlation coefficient (MCC) were the main measures used to evaluate the effectiveness of the prediction algorithm.
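The procedure and the four measures can be sketched in plain Python. The formulas used are the standard confusion-matrix definitions, which is an assumption since the definitions are not given above: Sn = TP/(TP+FN), Sp = TN/(TN+FP), Acc = (TP+TN)/(TP+TN+FP+FN), and MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function names, the random seed, and the confusion-matrix counts are hypothetical and are not results from the study.

```python
import math
import random


def k_fold_indices(n_samples, k=7, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k=7, seed=0):
    """Yield (train, test) index lists; each fold is the test set once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(tp, tn, fp, fn):
    """Return Sn, Sp, Acc, and MCC from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Sn": _ratio(tp, tp + fn),
        "Sp": _ratio(tn, tn + fp),
        "Acc": _ratio(tp + tn, tp + tn + fp + fn),
        "MCC": _ratio(tp * tn - fp * fn, denom),
    }


# 344 = 138 positive + 206 negative sequences, as in the data set above.
for train, test in train_test_splits(344, k=7):
    print(len(train), len(test))

# Made-up counts, only to show the call; not results from the study.
print(evaluate(tp=8, tn=9, fp=1, fn=2))
```

In practice the counts from the 7 test folds would be pooled, or the per-fold measures averaged, before reporting; which of the two the study did is not stated.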

In this work, the three protein secondary structure components (3PSS) were extracted as feature parameters for the first time. They were combined with the 20 amino acid components (20AAC) and the 6 hydrophilic and hydrophobic amino acid group components (6HP) as feature information, and prediction was carried out with the quadratic discriminant (QD) method. The results showed that combining the 20 amino acid components (20AAC) with the 3 secondary structure components (3PSS) gave higher accuracy, with a total accuracy (Acc) of up to 94%, which was higher than that of the other prediction algorithms. It is hoped that this prediction model can also be applied to the recognition of other antimicrobial peptides.
