In Silico tool for predicting, designing and scanning IL-2 inducing peptides

Mehta, Naman Kumar; Lathwal, Anjali; Kumar, Rajesh; Kaur, Dilraj; Raghava, Gajendra P. S.

doi:10.1038/s41598-025-08388-2


Article
Open access
Published: 16 July 2025


Scientific Reports volume 15, Article number: 25692 (2025)


Abstract

Interleukin-2 (IL-2) based immunotherapy has been approved for treating certain types of cancer, as IL-2 plays a crucial role in regulating the immune system. In this study, we developed a method for predicting IL-2-inducing peptides. Our method was trained, tested, and validated on a main dataset containing 6,574 experimentally validated major histocompatibility complex (MHC) binders, including 3,429 IL-2-inducing and 3,145 non-inducing peptides. A primary analysis of IL-2-inducing and non-inducing peptides revealed that certain residues, such as alanine and leucine, are more abundant in IL-2-inducing peptides. Initially, we developed alignment-based methods, which demonstrated high precision but limited coverage. Subsequently, we developed artificial intelligence-based models, including machine learning (ML), deep learning (DL), and large language models (LLM), to predict IL-2-inducing peptides. Our Extra Tree-based model, developed using dipeptide composition and peptide length, achieved a maximum AUC of 0.82. Finally, we constructed ensemble models that combined artificial intelligence and alignment-based methods. Our best ensemble model, which integrates the Extra Tree-based model with MERCI, achieved the highest AUC of 0.84 and an MCC of 0.51 on the main dataset. One limitation of the main dataset is that both IL-2-inducing and non-inducing peptides are MHC binders. To address this limitation, we created two additional datasets: Alternate Dataset 1, consisting of 3,429 IL-2-inducing peptides and 3,429 non-inducing peptides (MHC non-binders), and Alternate Dataset 2, consisting of 3,429 IL-2-inducing peptides and 3,439 non-inducing peptides (MHC binders + MHC non-binders). Our best ensemble model achieved AUCs of 0.9 and 0.8 with MCCs of 0.61 and 0.44 on Alternate Datasets 1 and 2, respectively. To assist the scientific community, we have integrated the best models from this study into standalone software and a web server, IL2pred, which enables users to predict, scan, and design IL-2-inducing peptides (https://webs.iiitd.edu.in/raghava/il2pred/).


Introduction

Traditional cancer treatment strategies, such as chemotherapy and radiotherapy, have been the cornerstone of cancer care for decades. While effective, these approaches often come with significant short- and long-term side effects, including toxicity, fatigue, myalgias, cognitive dysfunction, and infertility¹. Immunotherapy is often considered the “fifth pillar” of new-age therapies that aim to eliminate cancer cells via immune system activation. This can be achieved by targeting the tumor antigens or by passively enhancing the anti-tumor immune response by administering cultured lymphocytes². Recent clinical studies have confirmed the efficacy and safety of immunotherapy in managing large cohorts of patients^3,4,5. Several targeted immunotherapies (e.g., drug-target conjugates), therapeutic cancer vaccines (e.g., T-VEC), adoptive targeted therapies (e.g., CAR T-cells), and immunomodulators such as interleukins have been approved and marketed for use in different cancer types^2,6. Interleukins are small glycoproteins that can have pleiotropic effects on the immune system. They bind to receptors on the cell surface and can aid the differentiation, survival, and proliferation of immune cells⁷.

Currently, two interleukin-based drugs are approved by the Food and Drug Administration (FDA) for treating human malignancies. These are the interferon-alpha-based drug Roferon-A and the IL-2-based drug Aldesleukin (Proleukin)⁸. IL-2 is an early cytokine that was commercially approved by the FDA to treat metastatic renal carcinoma and metastatic melanoma. IL-2 is a 15.5 kDa alpha-helical cytokine that belongs to the gamma chain family of immune modulators. CD4+ T-cells chiefly produce IL-2, but it can also be made by CD8+ T-cells, dendritic cells, and natural killer cells (NK-cells)³. While IL-2 is known for its immune-stimulating activity at low concentrations, it also promotes the expansion of regulatory T-cells and immune tolerance at higher concentrations, reflecting its dual role in immune regulation. IL-2 works through the JAK-STAT pathway and helps maintain and differentiate CD4+ T-cells into various subsets, including Th1, Th2, and Th17. It can promote CD8+ T-cell and NK-cell cytotoxic activity and the generation of memory cells⁹. The excellent immune-stimulating activity of IL-2 has led to the development of several treatment regimens. IL-2 has been used in clinical settings both as a monotherapy agent and in combination with several other therapeutic regimens. Low-dose IL-2 combined with interferon-alpha shows greater therapeutic benefit for patients with renal cell carcinoma¹⁰. IL-2 combined with chemotherapeutic agents, including cisplatin and dacarbazine, has been extensively investigated in patients with metastatic melanoma¹¹. IL-2 combined with targeted therapies such as gefitinib showed a positive response in non-small cell lung cancer (NSCLC) patients¹². Combining IL-2 with a therapeutic cancer vaccine also indicates a synergistic effect in advanced melanoma¹³.

IL-2 therapy has a number of limitations, including a short half-life; in addition, high doses of IL-2 lead to toxicity, vascular leakage, and hypotension^14,15,16. Clinicians and researchers have employed several strategies to overcome these challenges. One strategy widely reported in the literature to improve the efficacy of IL-2 therapy is the generation of IL-2 inducing mutant peptides. The mutant version of IL-2 inducing peptides reportedly has low toxicity and can activate natural killer cells (NK-cells) without producing a high level of pro-inflammatory cytokines, thus preventing vascular leakage^7,17. Other mutants of IL-2 inducing peptides can activate immune cells without pro-inflammatory activity and are thus free from toxicity-related problems¹⁷. These mutant versions of IL-2 inducing peptides are known as “Superkine” because of their enhanced antitumor property. These points highlight that the mutant version of natural IL-2 inducing peptides possesses a high therapeutic index. Thus, generating an improved version of IL-2 has become an important area of research. However, identifying and generating such mutants in clinical setups is time-consuming and cost-intensive. In silico computational methods can help scientists and clinicians in this regard. Several methods have already been developed that target and utilize the therapeutic potential of interleukin-based therapy for managing various human malignancies. CytoPred is one such method, developed for the identification and classification of cytokines¹⁸. In addition, methods have been developed for predicting peptides that induce specific cytokines, including IL4pred¹⁹, IL10Pred²⁰, IFNepitope²¹, and IL6pred²² for IL-4, IL-10, IFN-gamma, and IL-6, respectively. Despite the huge therapeutic potential, no method is available in the literature for identifying and generating peptides that can induce the natural secretion of IL-2. In the present study, a computational method for predicting and designing peptides that can potentially induce IL-2 cytokine has been developed.

Results

In this study, three distinct datasets were used to develop the prediction algorithm – main dataset, alternate dataset 1 and alternate dataset 2. The distribution of peptides with respect to the MHC allele in our main dataset is available in Supplementary File 1 (S1.xlsx). All analyses and model building were conducted on these datasets. Based on this comprehensive analysis, different prediction models were built, trained, and externally validated to assess their performance in predicting IL-2 inducing and non-inducing peptides.

Dataset length distribution analysis

Both positive and negative datasets were subjected to length distribution analysis; the results for the main dataset are provided in Fig. 1. IL-2 inducing (positive) peptides predominantly range from 12 to 20 residues in length, with 15-residue peptides the most abundant. IL-2 non-inducing (negative) peptides were mainly 15, 17, and 20 residues long. The distribution analysis of the other two datasets is reported in Supplementary File 2: Figure SF1 and Figure SF2.

Average amino acid composition analysis

The positive and negative datasets were analysed for their average amino acid composition. In the main positive dataset, the residues Ala, Phe, Leu, Pro, and Tyr are significantly more abundant (adjusted p-value < 0.05), while in the main negative dataset, the residues Cys, Asp and Thr are significantly more abundant. The average amino acid composition analysis for the main dataset is provided in Fig. 2, along with the adjusted p-values, and the analyses for the other two datasets are provided in Supplementary File 2: Figure SF3 and Figure SF4. Asterisks (*) denote residues showing statistically significant differences after Benjamini-Hochberg correction for multiple comparisons (adjusted p < 0.05).
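The composition comparison and the multiple-testing correction described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function names are hypothetical, and the raw per-residue p-values are assumed to come from whichever two-group test is applied to the positive and negative sets.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(peptide):
    """Percent composition of the 20 standard residues in one peptide."""
    counts = Counter(peptide.upper())
    n = len(peptide)
    return {aa: 100.0 * counts[aa] / n for aa in AMINO_ACIDS}

def mean_aac(peptides):
    """Average per-residue composition over a set of peptides."""
    profiles = [aac(p) for p in peptides]
    return {aa: sum(p[aa] for p in profiles) / len(profiles)
            for aa in AMINO_ACIDS}

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A residue would then be flagged with an asterisk when its adjusted p-value falls below 0.05.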

Positional preference analysis

The two-sample logo (TSL) reveals the positional preference of amino acids in the positive and negative datasets. In the main positive dataset, IL-2 inducing peptides show a strong enrichment for hydrophobic residues like L, G, Y, and F, and positively charged residues like R and K at specific positions. The depletion of amino acids such as T, D, S, and E suggests that IL-2-inducing peptides avoid hydrophilic and acidic residues at these positions, as shown in Fig. 3. The positional preference analysis for the other two datasets is provided in Supplementary File 2: Figure SF5 and Figure SF6.

Alignment-based approaches

Motif search

The MERCI software was used to determine the motifs present exclusively in the IL-2 inducing peptides but not in IL-2 non-inducing peptides. Similarly, the motifs exclusive to IL-2 non-inducing peptides were computed. The different parameters and classification methods (NONE/BETTS-RUSSELL/KOOLMAN) available in MERCI were used to separate the motifs into those of IL-2 inducing and non-inducing peptides. Among these, for the main dataset, the “NONE” classification scheme with fp 10 was selected as the best parameter setting because of its high accuracy, despite its low coverage. The top 10 motifs and the number of sequences from the independent dataset in which they occurred, for the positive and negative classes of the main dataset, are presented in Table 1. Notably, the motifs were hydrophobic in nature and exclusive to IL-2 inducers, which agrees with the positional preference analysis. The motif coverage results for the other two datasets are presented in Supplementary File 1 (S2.xlsx).
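The idea of class-exclusive motifs can be illustrated with a deliberately simplified stand-in. The sketch below is not the MERCI algorithm (which also handles gapped and residue-class motifs); it only collects contiguous k-mers that occur in at least `min_pos` positive peptides and in at most `max_neg` negative peptides. All names and default values are hypothetical.

```python
from collections import Counter

def kmers(peptide, k):
    """Set of distinct contiguous k-mers in a peptide."""
    return {peptide[i:i + k] for i in range(len(peptide) - k + 1)}

def exclusive_motifs(positives, negatives, k=3, min_pos=2, max_neg=0):
    """k-mers found in >= min_pos positive and <= max_neg negative peptides."""
    pos_counts = Counter(m for p in positives for m in kmers(p, k))
    neg_counts = Counter(m for p in negatives for m in kmers(p, k))
    hits = {m: c for m, c in pos_counts.items()
            if c >= min_pos and neg_counts[m] <= max_neg}
    # Most frequent first; ties broken alphabetically for stable output.
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
```

Swapping the two input lists gives the motifs exclusive to the non-inducing class.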

Table 1 Top 10 IL-2 inducer motifs of the main dataset and the number of sequences in which they occurred.


BLAST search

The blastp-short task of BLAST was applied to perform a similarity search against a training dataset (main dataset) of IL-2 inducing and IL-2 non-inducing peptides. The search was run with e-values ranging from 10^−7 to 10^−1 (Table 2). Blastp-short achieved optimal performance at an e-value of 10^−3, correctly identifying 2100 IL-2 inducing peptides and 402 IL-2 non-inducing peptides, with 272 incorrect hits. E-values lower than 10^−3 did not provide adequate sequence coverage, while values higher than 10^−3 exhibited a higher error rate. The complete BLAST results for e-values from 10^−7 to 10^−1 for the other two datasets are detailed in Supplementary File 1 (S3.xlsx).
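A similarity-based prediction of this kind can be sketched by assigning each query peptide the class of its best hit. The example assumes tabular BLAST+ output (for instance from `blastp -task blastp-short -evalue 1e-3 -outfmt 6`), in which the e-value is the eleventh column, and a hypothetical naming scheme in which training sequence IDs start with `pos_` or `neg_`. Queries without a hit below the cutoff receive no prediction, which is why coverage is limited.

```python
def classify_by_top_hit(blast_lines, evalue_cutoff=1e-3):
    """Map each query ID to the label of its best hit at or below the cutoff."""
    best = {}  # query id -> (evalue, label)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > evalue_cutoff:
            continue
        # Hypothetical naming scheme: subject IDs start with "pos_" or "neg_".
        label = "inducer" if subject.startswith("pos_") else "non-inducer"
        if query not in best or evalue < best[query][0]:
            best[query] = (evalue, label)
    return {q: label for q, (_, label) in best.items()}
```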

Table 2 BLAST results for test datasets searched against the training datasets (main dataset).


AI-based classification methods

ML models

Various ML-based prediction models were trained, including DT, RF, KNN, MLP, ET, SVR, XGB and Lasso. First, the features of IL-2 inducing and non-inducing peptides were computed using Pfeature. Using only DPC as the feature, the ET method yielded the best results, with an AUC of 0.81 and an MCC of 0.45 on the test data of the main dataset. Peptide length was then added as a feature because of its importance in MHC antigen presentation and IL-2 induction. The ET-based model using these hybrid features (DPC and peptide length) achieved a higher AUC of 0.82 and an MCC of 0.48 on the test set. The best classifier for each feature set on the independent dataset of the main dataset is reported in Table 3. The statistical details for different classifiers using various feature sets for all three datasets are in Supplementary File 1 (S4, S5 and S6.xlsx).
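The hybrid DPC-length feature vector can be sketched in plain Python as below. This is an illustrative re-implementation rather than the Pfeature code used in the study; the function name is hypothetical, and expressing DPC as percentages of overlapping dipeptides is an assumption about the normalisation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

def dpc_length_features(peptide):
    """Return 400 dipeptide percentages followed by the peptide length."""
    peptide = peptide.upper()
    total = len(peptide) - 1  # number of overlapping dipeptides
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = peptide[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    dpc = [100.0 * counts[d] / total for d in DIPEPTIDES]
    return dpc + [float(len(peptide))]
```

Each peptide yields a 401-dimensional vector, which could then be passed to a tree ensemble such as scikit-learn's `ExtraTreesClassifier`.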

Table 3 The test data performance for the best classifiers developed using various types of peptide features for the main dataset (only the best models are shown).


DL models

Using the TabNet classification model, a maximum AUC of 0.65 and an MCC of 0.20 were achieved on the test data of the main dataset with Composition-enhanced Transition and Distribution (CeTD) as the feature.

For the 1D CNN model, an AUC of 0.71 and an MCC of 0.32 were achieved on the test data of the main dataset using DPC-length features. The top 3 best performances for the main dataset, using the independent dataset, have been provided in Table 4, and the detailed results for the TabNet and CNN models applied to IL-2 sequences for the other two datasets are provided in Supplementary File 1 (S7.xlsx).

Table 4 The top 3 best feature performances of the DL in the main dataset.


LLM models

For this study, a pre-trained large language model, protBERT, was utilised. The model was fine-tuned for this task, and the number of epochs was varied. Among the epochs tested, epoch 5 provided the best results, with a maximum AUC of 0.69 and an MCC of 0.21 for the main dataset.

Furthermore, since embeddings generated by the fine-tuned model can serve as valuable features, these features were extracted and applied in various ML algorithms. The ET method was the best-performing model, achieving an AUC of 0.71 and an MCC of 0.31. The performance of the fine-tuned model and the best classifier for the main dataset is provided in Table 5, and the detailed results of this LLM approach for each dataset are given in Supplementary File 1 (S8.xlsx).

Table 5 The performance of the fine-tuned and best classifier model for the main dataset.


Feature selection

Four feature selection methods (mRMR, SVC-L1, RFE, and SHAP) were applied to the DPC-length feature on each dataset. The best model’s AUC for the main dataset decreased with all four methods when 10 or 100 features were selected, whereas with 200 features it remained at approximately 0.82 for each method. The performance metrics of our feature selection approach on the main dataset using SVC-L1, mRMR, RFE, and SHAP are provided in Table 6. The selected features are reported in Supplementary File 1 (S9.xlsx).

Table 6 Performance metrics results of our feature selection approach on the main dataset using SVC-L1, mRMR, RFE and SHAP.


Ensemble method

To enhance the predictive capability of our model, we employed an ensemble approach in this study. On the main dataset, the ensemble method achieved the highest AUC and MCC of all methods, including ML, DL, and LLM. Combining our top-performing model (ET trained on DPC-length features) with MERCI motif scores, using fp set to 10 and “NONE” as the classification method, yielded an AUC of 0.84 and an MCC of 0.51. We also tried the other classification methods, but this configuration outperformed all of them, highlighting its effectiveness in predicting biologically significant IL-2 inducers. The performance of the top 10 motifs on fp 10 developed using hybrid features (DPC-LEN and motif score) on all three datasets can be found in Table 7. The statistical details of various classifiers built using the hybrid feature set (DPC + Length) are presented in Supplementary File 1 for all the datasets (S10, S11 and S12.xlsx). The comparative performance of ML, DL, LLM and ensemble models is provided in Table 8, highlighting the best-performing approach across all three datasets.
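One way to combine a classifier probability with motif evidence is an additive score, sketched below. The weighting is an assumption made for illustration (the text above does not state how the two scores are merged): a fixed bonus is added when an inducer motif is found, a fixed penalty when a non-inducer motif is found, and the result is clipped to [0, 1]. All names and the 0.5 weight are hypothetical.

```python
def motif_score(peptide, inducer_motifs, non_inducer_motifs, weight=0.5):
    """+weight if an inducer motif occurs, -weight for a non-inducer motif."""
    if any(m in peptide for m in inducer_motifs):
        return weight
    if any(m in peptide for m in non_inducer_motifs):
        return -weight
    return 0.0

def ensemble_score(ml_probability, peptide, inducer_motifs, non_inducer_motifs):
    """Add the motif score to the ML probability and clip to [0, 1]."""
    combined = ml_probability + motif_score(
        peptide, inducer_motifs, non_inducer_motifs)
    return min(1.0, max(0.0, combined))
```

With this scheme a peptide that the ML model scores just below the decision threshold can still be called an inducer if it carries an inducer-exclusive motif.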

Table 7 The performance of the top 10 motifs on fp 10 developed using hybrid features (DPC-LEN and motif score) on all three datasets.


Table 8 Comparative performance of ML, DL, LLM, and ensemble models across the three datasets.


Benchmarking

To properly evaluate the performance and significance of our IL2pred method, we conducted comprehensive benchmarking against three established cytokine prediction tools: IL6pred, IL4pred and IL10pred^19,20,22. This comparative analysis is important for validating that our method achieves performance comparable to existing state-of-the-art cytokine predictors and for demonstrating that IL-2 induction represents a distinct prediction task requiring specialised modelling. The comparative performance metrics are provided in Table 9. To further analyse the biological differences between cytokine-inducing peptides, we generated an average amino acid composition difference plot comparing IL-2 inducers with the benchmark data (Fig. 4). Additionally, we performed pairwise statistical comparisons of amino acid compositions between IL-2, IL-4, IL-6, and IL-10 using the Mann-Whitney U test. These results provide further evidence of significant compositional differences, supporting the need for a dedicated IL-2 predictor. The detailed statistical results are provided in Supplementary File 1 (S13.xlsx).

Table 9 Performance comparison of cytokine prediction methods on independent dataset.


Methods

Datasets

Main dataset

A total of 8596 experimentally validated IL-2 inducing and non-inducing peptides were extracted from the Immune Epitope Database (IEDB), the largest repository of immune epitopes, filtering for MHC binders from any host organism that were experimentally confirmed to either induce or not induce IL-2 production. Of these, 4475 peptides were MHC binders that trigger IL-2 secretion, as measured by different immunological assays. These epitopes were termed IL-2 inducing peptides and grouped under a positive set. We also extracted 4121 MHC-binding peptides that do not trigger IL-2 secretion; these were termed non-inducers and grouped under a negative set. Literature evidence suggests that peptides of length between 8 and 25 are most suitable for MHC antigen processing and presentation. Thus, all peptides of length below 8 and above 25 were removed. Additionally, all the redundant peptides were removed. The final main dataset consists of 3429 IL-2 inducing and 3145 non-inducing peptides. One of the major features of our main dataset is that all peptides are experimentally validated.
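The two filtering steps described above (length window and redundancy removal) can be sketched as follows. The function name is hypothetical, and treating "redundant" as exact duplicate sequences is an assumption; the source does not state the redundancy criterion.

```python
def build_dataset(peptides, min_len=8, max_len=25):
    """Keep peptides within the length window and drop exact duplicates."""
    seen = set()
    kept = []
    for pep in peptides:
        pep = pep.strip().upper()
        if not (min_len <= len(pep) <= max_len):
            continue  # outside the 8-25 residue window
        if pep in seen:
            continue  # exact duplicate of an earlier peptide
        seen.add(pep)
        kept.append(pep)
    return kept
```

The function preserves the input order, so it can be applied separately to the positive and negative sets before any train/test split.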

Alternate dataset 1

The main dataset consists entirely of MHC binders, which means models developed on it are only suitable for MHC binders. If the user does not know whether a given peptide is an MHC binder or a non-binder, the model developed on the main dataset cannot be used. To overcome this limitation, we changed the negative dataset of non-inducers from MHC binders to non-binders. For Alternate dataset 1, we extracted and selected 3429 non-MHC-binding peptides from IEDB and assigned them as non-inducers. Alternate dataset 1 therefore contains 3429 MHC binding IL-2 inducing peptides as positive peptides and 3429 MHC non-binding IL-2 non-inducing peptides as negative peptides. Models developed on Alternate dataset 1 are suitable for predicting IL-2 inducing peptides in MHC non-binders.

Alternate dataset 2

Models developed on the main dataset are suitable for predicting IL-2 inducing peptides among MHC binding peptides. Similarly, models developed on Alternate dataset 1 are suitable for predicting IL-2 inducing peptides among MHC non-binding peptides. If the user has no idea whether their peptide is an MHC binder or a non-binder, neither of these datasets is suitable. We therefore propose Alternate dataset 2, which contains 3429 MHC binding IL-2 inducing peptides, referred to as positive peptides, and 3429 IL-2 non-inducers. In Alternate dataset 2, the IL-2 non-inducers are a mixture of MHC binders and non-binders that do not induce IL-2.

Length distribution and composition analysis

To better understand the characteristics of the peptides in both the positive and negative datasets, a comprehensive analysis was performed on their length distribution and amino acid composition. This analysis was carried out using custom Python scripts, which were designed to generate bar plots for visual representation.
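A minimal sketch of the length distribution step is given below. It is not the authors' script, and the function name is hypothetical; it returns the percentage of peptides at each length, which is the quantity a bar plot of the distribution would display.

```python
from collections import Counter

def length_distribution(peptides):
    """Percentage of peptides at each length, ordered by length."""
    counts = Counter(len(p) for p in peptides)
    total = len(peptides)
    return {length: 100.0 * n / total
            for length, n in sorted(counts.items())}
```

The resulting dictionary can be drawn with any plotting library, for example by passing its keys and values to a bar-chart function, once for the positive set and once for the negative set.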