Introduction

Human T-cell leukemia virus type 1 (HTLV-1) is the first identified retrovirus known to cause chronic lifelong infections in humans1. It causes adult T-cell leukemia-lymphoma (ATL), a form of blood cancer, and HTLV-1-associated myelopathy (HAM), a chronic inflammatory disease of the central nervous system. While most infected individuals remain asymptomatic, approximately 2 to 5% of HTLV-1 carriers develop ATL2, and 0.25 to 4% develop HAM3. The annual national incidence rate of new HTLV-1 infections in Japan has been reported as 3.8 cases per 100,000 person-years4. The combined strategies of immune evasion and suppression allow HTLV-1 to persist in the host for a prolonged time without causing symptoms, which complicates the identification of carriers at higher risk of developing ATL or HAM5,6,7. Sex and age also correlate with disease onset: ATL occurs more often in males and HAM in females, which might reflect hormonal or immunological differences8,9. ATL has been reported to occur in adults at least 20 to 30 years after infection, whereas the latency period for HAM ranges from 4 months to 30 years8,10; onset therefore typically falls around middle age. While it remains unknown why some HTLV-1 carriers develop disease, it is of interest to develop methods that can identify high-risk asymptomatic individuals.

For HTLV-1-infected individuals, the diversity of the infected cells and immune responses is influenced by the interaction between the virus and the host’s immune system. A disruption of the equilibrium between viral persistence and host immunity can trigger the onset of associated diseases. Distinct immunological responses may arise from impaired immune regulation against the HTLV-1 infection11. As reported previously, HTLV-1 infection evokes both cellular and humoral immunity12,13,14. Therefore, immunological markers can be instrumental in evaluating the risk of associated diseases. Although proviral load (PVL) has been considered a risk factor, its interpretation in asymptomatic carriers is still challenging because of individual variation and the lack of definitive thresholds for disease risk15,16. Additionally, some studies reported that PVL remains constant over several years, regardless of clinical manifestation16,17,18,19. In contrast, antibody profiling, when combined with other markers such as PVL, has been reported to be useful for distinguishing asymptomatic carriers from ATL or HAM patients20,21,22,23 and to have potential for predicting disease progression. Moreover, serum-based testing is simpler and cheaper than genetic analysis. Hence, analyzing antibody responses can provide a quantitative and specific method for identifying infected individuals at elevated risk.

In our previous study using a modified luciferase immunoprecipitation system (LIPS) assay, we screened patients for antibodies against each of the viral proteins: the HTLV-1 Gag proteins (p15, p19, p24), Env, and Tax. Using multivariate analysis of the antibody titers followed by targeted sequencing, we identified carriers at high risk for ATL. We also listed important factors for separating the HAM subgroup22; however, that analysis might have overlooked the possibility that some carriers already show antibody responses similar to those of HAM patients, which needs to be explored in more depth. Although longitudinal studies are invaluable for observing disease progression over time, they remain challenging given the prolonged latency period. Moreover, because of treatment effects, longitudinal data may not fully capture the hidden patterns. Cohort studies usually maintain high data quality and standardized data collection protocols. This reduces noise and inconsistencies, making machine learning (ML) models trained on such data more compatible and generalizable24. ML models can surpass traditional multivariate statistical methods by handling complex, high-dimensional data and uncovering non-linear patterns. Based on clinical data alone, ML models offer a robust alternative tool in data-rich contexts25,26,27,28.

In the present study, we developed a two-tiered ML-based framework integrating HTLV-1 PVL with antibody titers to Tax, Env, and the immunogenic mature Gag proteins p15, p19, and p24 to identify and characterize asymptomatic carriers with a higher likelihood of developing HAM. Collectively, our results show that the ML-based approach can be effective for risk prediction and early intervention in HAM cases that might otherwise remain undetected by conventional diagnostic approaches.

Results

Detection of potentially high-risk carriers

At the data preprocessing step, before applying ML models to our dataset, we handled the multicollinearity issue29 by excluding Gag p19, which had the highest variance inflation factor (VIF) score (see Determination of key variables in Methods and Supplementary Tables S1A and S1B for details). Out of 264 asymptomatic carriers, the Isolation forest model detected 17 carrier samples as anomalies, which we labeled as anomaly carriers (AC). One carrier who later developed HAM (CDH) was also identified as an anomaly and included in the AC group, suggesting that the Isolation forest model effectively filters carrier samples with a possible risk of HTLV-1-associated diseases. For details about the Isolation forest algorithm and the choice of hyperparameters, see Anomaly detection by isolation forest algorithm in Methods.

Next, we compared the performance of four ML models, based on PRAUC scores, in classifying samples into three subgroups: non-anomaly carrier, HAM, and ATL. We chose the Random forest (RF) model for its superior performance (see Classification modeling in Methods for details). Table 1 shows each classifier’s average overall mean PRAUC (mean of per-class PRAUCs) across five repeats of one-vs-rest nested CV. The RF classifier was then used to predict the held-out set of AC samples (n = 17). Interestingly, the classifier predicted the majority of AC samples (n = 13 out of 17) as HAM [Fig. 1]. The one CDH sample was also predicted as HAM by the classifier. Moreover, among the 17 anomaly carrier samples, all feature values correlated positively with the predicted probability of HAM [Supplementary Table S4].

Table 1 Average of overall PRAUC (mean of per-class PRAUCs) for each classifier across five stratified subsampling repeats in one-vs-rest nested cross-validation.

The Partial Least Squares (PLS) distribution of the sample groups further revealed that AC samples were localized near the HAM cluster (Fig. 2). This result, combined with the classification result in which most AC samples were predicted as HAM, suggests that AC samples share significant similarities with HAM samples, further supporting our hypothesis that AC samples exhibit ‘HAM-like’ characteristics.

Fig. 1

Illustration of the prediction results for anomaly carrier samples by the random forest classifier model. The left bar shows the training and test data for the classifier model. The model was trained on three sample groups and predicted the anomaly carrier samples as unseen test data. The anomaly carrier samples (n = 17) were classified into three prediction groups: 76.47% of the samples (n = 13) were predicted as HAM, whereas only 17.65% and 5.88% were predicted as carrier (n = 3) and ATL (n = 1), respectively (shown on the right bar).

Fig. 2

PLS-based visualization of sample distribution across clinical groups. This PLS plane shows the distribution of all 369 samples from non-anomaly carriers (green), ATL patients (blue), HAM patients (yellow), anomaly carriers (red), and the CDH (white center with a dark red edge). Each dot on the plot represents an individual sample. The plot depicts the clustering of the sample groups based on the analyzed variables, with anomaly carrier samples positioned near the HAM cluster.

Comparison of feature value distributions among the subgroups

Figure 3 compares the feature values of the four subgroups. All features were significantly higher in AC than in non-anomaly carriers. In contrast, we found no statistically significant difference between AC and HAM for PVL (p = 1.0), Tax (p = 1.0), Gag p15 (p = 0.34), Gag p19 (p = 0.65), or Gag p24 (p = 0.89); only Env differed significantly (p = 0.0048) (for details about the statistical tests, see Boxplot visualization and statistical analysis in Methods). These observations collectively indicate that AC samples display a high degree of similarity with HAM. Consistent with previous research11, we found that antibody responses to the immunodominant proteins (Env, Tax, Gags) were higher in HAM patients (n = 56). Conversely, PVL in ATL patients was significantly higher than in all other subgroups (Fig. 3, Supplementary Table S6), which is consistent with previous studies4,30,31.

Fig. 3

The boxplots collectively illustrate the distribution of PVL and Antibody titers to HTLV-1 antigens Tax, Env, Gag p15, Gag p19, and Gag p24 across different clinical subgroups: non-anomaly carrier (green), ATL (blue), HAM (yellow), and anomaly carrier (red). The individual data points overlaid on the boxplots show the actual distribution and density of the data.

Exploring biomarkers for anomaly carrier detection

To determine the crucial factors that characterize anomaly carriers and HAM, SHapley Additive exPlanations (SHAP) analysis was performed. SHAP is an interpretable machine learning framework that can assess the impact of each feature on the classification of each class (non-anomaly carriers, ATL, HAM, and anomaly carriers). Figure 4 shows the SHAP bar plot from the XGB classifier, which performed best [Supplementary Fig. S6]. We found that Tax is the most important feature for HAM, and Gag p15 for anomaly carriers, followed by Env and Gag p24. Although the prediction above indicated high similarity between the anomaly carrier and HAM subgroups, their feature importance rankings differ qualitatively. Gag p15 and Env are influential features in anomaly carriers, but their relative rankings are not uniformly elevated in HAM. In addition, Tax ranks lower in anomaly carriers.

Fig. 4

SHAP feature analysis of HAM and anomaly carriers. Each bar shows the absolute median value over 300 iterations with different random seeds, and the black line represents the standard deviation. SHAP analysis for non-anomaly carriers and ATL is shown in Supplementary Fig. S7.

Discussion

In this study, we developed a machine learning-based approach to capture HTLV-1 carriers at elevated risk of HAM progression. The Isolation forest anomaly detection algorithm identified a subgroup of anomaly samples from the asymptomatic HTLV-1 carrier population. Further characterization through classifier prediction and statistical analysis revealed that the anomaly carrier samples closely resemble the characteristics of HAM, suggesting a similar disease trajectory. Additionally, different patterns of antibody response were observed among the asymptomatic carriers and other clinical subgroups which enabled us to further investigate the risk factors. Finally, we utilized SHAP for comparative feature analysis among the sample groups (non-anomaly carrier, anomaly carrier, ATL, and HAM) to identify the key driving features that characterize each subgroup and contribute to the disease progression.

The main aim of this study was to shed light on asymptomatic carriers who are at high risk of progressing to HAM. Most of the anomaly carrier samples were predicted as HAM by the RF classifier [Fig. 1], and our hypothesis was further supported when the CDH sample, purposely included in the carrier population, was also identified as an anomaly and subsequently predicted as HAM. The potential similarities in the underlying profiles of the anomaly carrier samples are also reflected in their clustering near the HAM samples (Fig. 2). All features were significantly higher in anomaly carriers than in non-anomaly carriers (Fig. 3). Elevated antibody responses in anomaly carriers might reflect higher immune activity during disease progression. Interestingly, we found that only the anti-Env antibody titer in anomaly carriers differed significantly from that of HAM patients (Fig. 3, Supplementary Table S6), whereas other features showed no significant differences. Env is one of the structural proteins of the virion and is necessary for cell-to-cell transmission; thus, it is a primary target of the antibody response32,33,34,35. Furthermore, elevated anti-Env antibody responses have been associated with HAM in several studies, which supports our result11,20,36,37. A novel implication is that, before onset, the rate of progression accelerates, as evidenced by the increased antibody levels. In HAM, the immune response is fully engaged; however, in progressive asymptomatic carriers, this saturation has yet to be reached20. This phase might represent a snapshot of dynamic host-virus interaction, in which rising antibody titers likely reflect heightened viral activity and the immune system’s escalating response as the disease advances toward clinical manifestation. Ultimately, a saturation point is reached at the onset of the disease, where antibody levels plateau as the immune response shifts into a steady-state phase. This might be reflected in the feature analysis, where the SHAP value of Env is relatively high in anomaly carriers but not in HAM or non-anomaly carriers [Fig. 4, Supplementary Fig. S7].

We found Tax to be the predominant feature of HAM, consistent with findings from multiple studies20,38. Furthermore, prior studies have reported significantly higher antibody responses to Env and Gag proteins in HAM patients, reinforcing their potential role in HAM11,20. It is known that during infection, Gag and Env proteins are initially unpolarized in isolated T cells and accumulate at the cell-cell junction upon contact. Gag protein is subsequently transferred from HTLV-1-infected T cells to uninfected T cells39. In line with these observations, we found that the Gag p15, Gag p24, and Env values of anomaly carrier samples showed a significant inverse relationship with their anomaly scores; because lower (more negative) scores indicate stronger anomalies, higher feature values correspond to higher anomaly levels [Supplementary Table S2, Table S5, Supplementary Fig. S5]40. Assessment of humoral immunity to Gag therefore shows potential as a biomarker for detecting high-risk individuals. Our results suggest that Gag p15 may play an important role in progression to HAM [Figs. 3 and 4 and Supplementary Fig. S7]; however, we refrain from drawing specific conclusions about Gag p15, and further research is required to identify the specific functions of the mature Gag proteins (p15, p19, and p24). It is noteworthy that, although the SHAP value of Gag p24 falls among the high-ranking features that characterize anomaly carriers in some classifiers, we opted not to interpret the Gag features further because of their inconsistent contribution patterns across the multiple classifiers employed in this study [Supplementary Fig. S7].

Identifying the risk of developing HAM is challenging compared with other HTLV-1-associated diseases. In the case of ATL, for example, the risk can often be characterized by changes in the clonality of infected cells, since a single infected clone expands during disease progression. In addition, several driver mutations are reported to promote malignancy, allowing pathogenic cells to survive and outcompete other infected cells, which leads to monoclonal proliferation41. While these promising markers can detect the risk of ATL onset, markers for the early diagnosis of HAM are less well described, owing to its slow progression42. Moreover, complicated host immune responses against infected cells vary widely between patients with different lifestyles, which makes prediction more difficult43. With anti-Env at the top of the list, elevated antibody titers might be a key observation for evaluating disease progression.

Of interest is the significant heterogeneity in immune response among the asymptomatic carriers in our study. Surprisingly, in many asymptomatic carriers, antibody responses (against Env, Tax, and Gags) and PVL were observed at the same elevated levels as in HTLV-1-related diseases (ATL and HAM). This finding led us to our initial hypothesis and to the detection of high-risk asymptomatic carriers (i.e., anomaly carriers) who are likely to progress to disease onset. Although heterogeneity may seem obvious given the various lifestyle backgrounds of patients, it is noteworthy that we could confirm it in our large set of asymptomatic carrier data. This motivated our distinct approach, which nevertheless aligns with our previous findings using the same dataset, showing the latent and diverse potential of asymptomatic carriers22.

Our work has some limitations. First, we do not have information on whether the anomaly carriers develop HTLV-1-related diseases in the future, except for one sample who was later diagnosed with HAM (CDH). To fully evaluate our predictions and hypothesis, especially for HAM, further data accumulation would be critical (for example, a prospective study like15). Second, little is known about the relationship between the antibody titers and the host immune defense, as mentioned above. For a LIPS assay dataset like ours to be used for clinical diagnosis, these interplays should be explored in more depth. Furthermore, inconsistent results in antibody titers from previous studies have discouraged clinical application and make it difficult to choose consensus cutoff values for disease distinction42. Additionally, our dataset exhibits class imbalance (more carriers than ATL and HAM patients), which reflects the skewed prevalence of the diseases, yet it is still more balanced than the real-world population (larger proportions of ATL and HAM than in reality). We implemented repeated downsampling to prevent biased model training, although each downsampling excludes some carrier samples, and thus we run the risk of understating model generalizability when applied to the full population. Integrating additional clinical metadata on clinical background or comorbidities, which we do not have here, may enhance the interpretability of the anomaly-detected carrier subgroup. Finally, as this study was designed as a data-driven investigation focusing primarily on immunological patterns inferred from antibody titer profiles, our findings should be considered exploratory and hypothesis-generating. Given the absence of external model validation, the results remain preliminary. The identified risk indicators are not clinically actionable conclusions but should be viewed as starting points for further prospective studies involving larger cohorts and independent validation.

Methods

Ethics statement

This study was performed in accordance with the Declaration of Helsinki and was approved by the Ethics Committees of Kumamoto University (accession numbers: G489, G499, and E2214). Written informed consent was waived because of the retrospective design. Consent for publication was obtained from all patients.

Study population

The data used in this study were published previously by Yamada et al.22. PVL and antibody titer data (non-time series) against the HTLV-1 antigens Tax, Env, Gag p15, p19, and p24 were collected using the LIPS assay. No cut-off was applied to LIPS antibody titers, allowing continuous evaluation of their distribution and diagnostic relevance. In our analysis, the ATL group comprised both individuals who had been diagnosed with ATL at the time of sample collection (n = 25) and those who were carriers at the time of sampling but were later diagnosed with ATL (CDA, n = 24). We also had only one carrier who later developed HAM (CDH), and this sample was purposefully included in the carrier population. Therefore, we focused our study on 264 asymptomatic carriers, 49 ATL patients, and 56 HAM patients.

Determination of key variables

Initially, Spearman’s rank correlation revealed a significant correlation between Gag p19 and p24 [Supplementary Fig. S2]. To address the multicollinearity issue and choose the variables to use in the ML analysis, the Variance Inflation Factor (VIF) score was used29. See Supplementary Tables S1A and S1B.
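
The VIF screen can be illustrated with a minimal sketch. It assumes NumPy and uses a synthetic feature matrix with a deliberately planted collinear pair; the column names are hypothetical and the values are random, so this is not the study's data or code. It relies on the standard identity that the VIFs equal the diagonal of the inverse feature correlation matrix.

```python
import numpy as np

def vif_scores(X):
    """Return one VIF per column of X (rows = samples, columns = features).

    VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal entry of the
    inverse of the feature correlation matrix.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Synthetic data only: three independent columns plus one column built to
# track another (a stand-in for a collinear pair of titers).
rng = np.random.default_rng(0)
n = 300
base = rng.normal(size=(n, 3))
collinear = base[:, 2] + 0.1 * rng.normal(size=n)
X = np.column_stack([base, collinear])
names = ["feat_a", "feat_b", "feat_c", "feat_c_copy"]  # hypothetical names

vifs = vif_scores(X)
for name, v in zip(names, vifs):
    print(f"{name}: VIF = {v:.2f}")

# The feature with the largest VIF is the first candidate for exclusion.
drop = names[int(np.argmax(vifs))]
print("candidate to exclude:", drop)
```

Both members of a collinear pair receive a high VIF, so which one to drop is a judgment call; the text reports dropping the feature with the highest score.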

Anomaly detection by isolation forest algorithm

For identifying potential outliers or anomalous data points from the asymptomatic carrier population (n = 264), we selected the Isolation Forest Anomaly Detection algorithm, an unsupervised machine learning technique based on decision trees, as our primary method because of its unique approach of isolating anomalies rather than profiling normal data. For each datapoint (sample), the following process is repeated until the datapoint is isolated:

  1. Randomly select a feature (e.g. PVL).

  2. Randomly choose a threshold between the minimum and maximum values of the selected feature (e.g. PVL = 0.1) and divide the data points into those below and those above the threshold.
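
The two steps above can be sketched in plain Python as follows. This is an illustrative toy on synthetic two-feature data, not the implementation used in the study; the function name `isolation_depth` and all values are assumptions made for the example.

```python
import random

def isolation_depth(point, data, rng):
    """Count the random splits needed until `point` is alone in its partition."""
    current = list(data)
    depth = 0
    while len(current) > 1:
        # Only features that still vary within the partition can be split on.
        splittable = [f for f in range(len(point))
                      if min(p[f] for p in current) < max(p[f] for p in current)]
        if not splittable:
            break  # remaining points are identical; cannot isolate further
        f = rng.choice(splittable)            # step 1: random feature
        lo = min(p[f] for p in current)
        hi = max(p[f] for p in current)
        t = rng.uniform(lo, hi)               # step 2: random threshold
        # Keep only the side of the split that contains `point`.
        side = point[f] < t
        current = [p for p in current if (p[f] < t) == side]
        depth += 1
    return depth

rng = random.Random(0)
# Synthetic data: a tight cluster plus one far-away point.
cluster = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
outlier = (8.0, 8.0)
data = cluster + [outlier]
# Use the most central cluster point as a typical "normal" sample.
inlier = min(cluster, key=lambda p: p[0] ** 2 + p[1] ** 2)

def mean_depth(point, trials=200):
    """Average isolation depth over repeated random partitionings."""
    return sum(isolation_depth(point, data, rng) for _ in range(trials)) / trials

depth_outlier = mean_depth(outlier)
depth_inlier = mean_depth(inlier)
print("mean depth, outlier:", depth_outlier)
print("mean depth, inlier: ", depth_inlier)
```

The far-away point is separated after far fewer splits than the central one, which is the property the anomaly score builds on.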

The key idea is that data points with anomalous feature values are likely to be isolated within only a few iterations. The algorithm constructs an ensemble of isolation trees for a given dataset and uses the path length from the root to the leaf to determine the anomaly score. Given that m is the number of data points, the anomaly score s for a datapoint x is defined as

$$s(x,m)=2^{-\frac{E\left(h_{i}(x)\right)}{c(m)}}$$
(1)

where \(h_{i}(x)\) represents the path length of x in the \(i\)-th isolation tree, \(E\left(h_{i}(x)\right)=\frac{1}{T}\sum_{i=1}^{T}h_{i}(x)\) denotes the average path length across the ensemble of \(T\) isolation trees,

$$c(m)=\begin{cases}2H(m-1)-\frac{2(m-1)}{m} & (m>2)\\ 1 & (m=2)\\ 0 & (\text{otherwise})\end{cases}$$

is the average path length for a dataset with m points, utilized as a normalization factor44, and \(H(k)\) is the k-th harmonic number. The sklearn implementation of the decision function of Isolation Forest yields negative anomaly scores, where lower (negative) scores indicate potential anomalies40.
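
As a worked illustration of Eq. (1), the following plain-Python sketch evaluates the normalization factor c(m) and the score s(x, m). The function names are chosen here for illustration, and the harmonic number is computed exactly by summation.

```python
def harmonic(k):
    """Exact k-th harmonic number H(k) = 1 + 1/2 + ... + 1/k."""
    return sum(1.0 / i for i in range(1, k + 1))

def c(m):
    """Normalization factor: average path length for a dataset with m points."""
    if m > 2:
        return 2.0 * harmonic(m - 1) - 2.0 * (m - 1) / m
    if m == 2:
        return 1.0
    return 0.0

def anomaly_score(mean_path_length, m):
    """s(x, m) = 2 ** (-E(h(x)) / c(m)); values near 1 suggest an anomaly."""
    return 2.0 ** (-mean_path_length / c(m))

m = 256
print("c(256) =", round(c(m), 3))
# A point whose average path length equals c(m) scores exactly 0.5.
print("score at average depth:", anomaly_score(c(m), m))
# A point isolated after only 2 splits on average scores much closer to 1.
print("score at depth 2:", round(anomaly_score(2.0, m), 3))
```

Note that s increases as the path shortens, whereas scikit-learn's `score_samples` returns the negative of this quantity and `decision_function` additionally shifts it by an offset, which is why lower values there mark anomalies.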

By applying a cutoff threshold of -0.05 to the anomaly scores of the Isolation forest, we isolated the anomalous data points for further investigation45. This threshold was chosen to capture approximately the 5% most extreme samples (inversely corresponding to the 95th percentile of the normal data distribution) in our carrier population [Supplementary Fig. S4]. Since around 4% of carriers develop HAM3,46, we aimed to mirror this proportion.
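
A minimal sketch of this thresholding step, assuming scikit-learn's `IsolationForest` and a synthetic stand-in for the carrier feature matrix. The hyperparameters shown are illustrative defaults rather than the study's settings, and one extreme sample is planted so that the cutoff has something to catch.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic stand-in for the carrier feature matrix (264 samples, 5 features),
# with one planted extreme sample in the last row.
X = rng.normal(size=(264, 5))
X[-1] = 10.0

# Illustrative hyperparameters only; not taken from the study.
forest = IsolationForest(n_estimators=200, random_state=0).fit(X)

# decision_function: lower (more negative) values are more anomalous.
scores = forest.decision_function(X)

CUTOFF = -0.05  # cutoff reported in the text
is_anomaly = scores < CUTOFF

anomaly_idx = np.flatnonzero(is_anomaly)      # held out for later prediction
remaining_idx = np.flatnonzero(~is_anomaly)   # kept for classifier training
print(f"{anomaly_idx.size} of {len(X)} samples fall below {CUTOFF}")
```

Using a fixed cutoff on the decision function, rather than the model's built-in `predict`, keeps the fraction of flagged samples under direct control.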

The resulting anomaly carrier samples were then removed from the carrier data and treated as a holdout test set (unseen data) for further classification analysis. The remaining non-anomaly carrier, ATL, and HAM samples were used for training and cross-validation of the classifier models. Additionally, the Spearman correlation between the feature values of the anomaly carrier samples and their anomaly scores was assessed. The difference between the sample groups was evaluated by plotting all the samples in a PLS plane.
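
For readers unfamiliar with rank correlation, here is a small plain-Python sketch of Spearman's rho applied to invented titer and score values. The numbers are made up so that the titer rises as the decision score becomes more negative; they are not study data.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Invented values: higher titer pairs with a more negative decision score.
titer = [1.2, 3.4, 2.2, 5.0, 4.1]
score = [-0.06, -0.11, -0.08, -0.20, -0.15]
rho = spearman(titer, score)
print(rho)  # -1.0
```

A negative rho against a decision score in which lower means more anomalous therefore indicates that higher feature values go with stronger anomalies.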

Classification modeling

We employed the One-vs-Rest (OvR) approach to address the multiclass classification problem. This approach breaks down the multiclass classification into multiple binary classification tasks, where one classifier is trained for each class against all others. To determine the best-performing model, we evaluated four different classifiers: Random Forest (RF), XGBoost (XGB), Extra Trees (ETC), and Support Vector Machine (SVM). Given our data volume, we relied on these classifiers because they represent diverse and well-established approaches suitable for our classification task. Combining tree-based ensemble models with SVMs has been shown to balance performance, interpretability, and generalization in high-dimensional biomedical data47.

Nested cross-validation (CV) was used to ensure robust performance evaluation and avoid overfitting. To address class imbalance, we first performed bootstrap downsampling of the carrier group: in each of five independent iterations, we randomly sampled 147 carriers with replacement and merged these with the full ATL and HAM cohorts to form a training/validation subset. We then applied nested CV to each subset, using an outer 5-fold loop to estimate model generalization and an inner 5-fold loop within each outer training fold to tune the hyperparameters of each classifier with GridSearch. The overall mean area under the precision-recall curve (PRAUC)46,48 served as the optimization criterion in the inner loop and as the performance metric in the outer loop.
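
The procedure can be sketched as follows, assuming scikit-learn and synthetic stand-in data. The group sizes follow the text (264 − 17 = 247 non-anomaly carriers, 49 ATL, 56 HAM), but the feature values, the hyperparameter grid, and the helper `one_repeat` are assumptions for illustration. Only one OvR task (HAM vs. rest) is shown, and scikit-learn's `average_precision` scorer is used as a common estimate of PRAUC.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in data: labels 0 = carrier, 1 = ATL, 2 = HAM.
sizes = {0: 247, 1: 49, 2: 56}
X = np.vstack([rng.normal(loc=k, size=(n, 5)) for k, n in sizes.items()])
y = np.concatenate([np.full(n, k) for k, n in sizes.items()])

carrier_idx = np.flatnonzero(y == 0)
disease_idx = np.flatnonzero(y != 0)

def one_repeat(seed, target=2):
    """One downsampling repeat followed by nested CV for one OvR task."""
    r = np.random.default_rng(seed)
    # Bootstrap downsampling: draw 147 carriers with replacement.
    sampled = r.choice(carrier_idx, size=147, replace=True)
    idx = np.concatenate([sampled, disease_idx])
    Xs, ys = X[idx], (y[idx] == target).astype(int)  # one-vs-rest labels

    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed + 1)
    # Inner loop: grid search scored by average precision.
    search = GridSearchCV(
        RandomForestClassifier(n_estimators=25, random_state=seed),
        param_grid={"max_depth": [2, 4]},  # illustrative grid only
        scoring="average_precision",
        cv=inner,
    )
    # Outer loop: score the tuned model on folds unseen during tuning.
    return cross_val_score(search, Xs, ys, scoring="average_precision", cv=outer).mean()

prauc_per_repeat = [one_repeat(seed) for seed in range(5)]
print([round(v, 3) for v in prauc_per_repeat])
```

Passing the `GridSearchCV` object itself to `cross_val_score` is what makes the validation nested: tuning is repeated inside every outer training fold.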

Next, for each candidate classifier (RF, XGB, ETC, SVM), we averaged its per-class PRAUCs over the five outer folds to obtain an “overall mean PRAUC” per repeat, and then averaged these values across the five repeats to yield a mean for each model. The model with the highest average overall mean PRAUC was chosen as the best, retrained on the full training set, and then applied to the held-out anomaly carrier samples. For each anomaly carrier sample, we extracted the predicted probability of HAM and then analyzed its correlation with the feature values. The workflow is depicted in [Supplementary Fig. S1]. For the classification models used in this study, the implementation available in the sklearn library was used40.
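
The aggregation can be made concrete with a short plain-Python example. All PRAUC values and model names below are invented solely to show the arithmetic and do not correspond to Table 1.

```python
from statistics import mean

# Invented per-class PRAUCs for two hypothetical models over two repeats.
per_class_prauc = {
    "model_a": [
        {"carrier": 0.90, "ATL": 0.70, "HAM": 0.80},
        {"carrier": 0.80, "ATL": 0.60, "HAM": 0.70},
    ],
    "model_b": [
        {"carrier": 0.60, "ATL": 0.50, "HAM": 0.70},
        {"carrier": 0.70, "ATL": 0.60, "HAM": 0.80},
    ],
}

summary = {}
for model, repeats in per_class_prauc.items():
    # Overall mean PRAUC per repeat = mean of the per-class PRAUCs.
    overall_per_repeat = [mean(r.values()) for r in repeats]
    # Average overall mean PRAUC = mean across repeats.
    summary[model] = mean(overall_per_repeat)

best_model = max(summary, key=summary.get)
print(summary, "->", best_model)
```

Averaging per class first gives each class equal weight regardless of its sample size, which matters when the classes are imbalanced.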

Boxplot visualization and statistical analysis

We employed a combination of visual and statistical methods to make an initial comparison of the feature distributions among the different sample groups, including anomaly carriers. The Kruskal-Wallis test was performed with a significance level of α = 0.05. For Dunn’s post-hoc analysis, p-values were adjusted for multiple comparisons using the Bonferroni correction to maintain the overall type I error rate. The statistical analysis was performed using the Python SciPy package49,50.
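
The post-hoc step can be sketched in plain Python as below. This is a simplified illustration that assumes no tied values (so the tie correction is omitted) and uses invented group data; it is not the code used in the study.

```python
import math
from itertools import combinations

def dunn_bonferroni(groups):
    """Pairwise Dunn z-tests on pooled ranks with Bonferroni-adjusted p-values.

    `groups` maps a label to a list of values. Assumes no tied values, so the
    tie correction to the variance is omitted.
    """
    pooled = sorted((v, g) for g, vals in groups.items() for v in vals)
    n_total = len(pooled)
    rank_sum = {g: 0.0 for g in groups}
    for rank, (_, g) in enumerate(pooled, start=1):
        rank_sum[g] += rank
    mean_rank = {g: rank_sum[g] / len(groups[g]) for g in groups}

    pairs = list(combinations(groups, 2))
    adjusted = {}
    for a, b in pairs:
        se = math.sqrt(n_total * (n_total + 1) / 12.0
                       * (1.0 / len(groups[a]) + 1.0 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided normal p-value
        adjusted[(a, b)] = min(1.0, p * len(pairs))  # Bonferroni: multiply, cap at 1
    return adjusted

# Invented example: group C is clearly shifted, A and B overlap.
example = {"A": [1.0, 4.0, 5.0], "B": [2.0, 3.0, 6.0], "C": [7.0, 8.0, 9.0]}
adj = dunn_bonferroni(example)
for pair, p in adj.items():
    print(pair, round(p, 4))
```

Because the Bonferroni adjustment multiplies each raw p-value by the number of comparisons and caps the result at 1, adjusted values of exactly 1.0 are expected for clearly overlapping groups.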

Interpretation with SHapley additive explanations (SHAP) analysis

As an approach to interpreting the model’s behavior, the SHapley Additive exPlanations (SHAP) framework was used51,52. It provides a SHAP value for each feature for every sample and explains how much an increase in each feature value affects the predicted probability for each clinical subgroup (non-anomaly carriers, ATL, HAM, and anomaly carriers). A higher SHAP value indicates a greater impact on the classification of a sample into a specific subgroup, while a lower SHAP value corresponds to a smaller impact. For this analysis, four classifiers (RF, ETC, XGB, and SVM) were compared in terms of PRAUC using nested cross-validation, and SHAP values were calculated for 300 different random seeds (i.e., different values of the parameter random_state). Multiple random seeds were used because we wanted to extract SHAP values that remain consistent regardless of the randomization in the learning process, which allows us to evaluate the results with a high degree of confidence. For each random seed, hyperparameters were optimized by GridSearch on all data without cross-validation and then used for calculating SHAP values. The absolute median of the SHAP values over all samples was collected for each of the 300 random seeds, and the median of these values and its standard deviation were then calculated for visualization. Specifically, KernelSHAP was applied to all classifiers using the SHAP Python package (version 0.45.1)52.
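
The seed-wise aggregation can be illustrated with NumPy on a synthetic array standing in for SHAP values of one class. The feature names and spreads are arbitrary, and "absolute median" is read here as the median of |SHAP| over samples, which is an assumption about the intended definition.

```python
import numpy as np

rng = np.random.default_rng(0)
features = ["feat_0", "feat_1", "feat_2", "feat_3", "feat_4"]  # hypothetical
n_seeds, n_samples = 300, 50

# Synthetic stand-in for SHAP values: shape (seed, sample, feature).
# Each feature gets an arbitrary spread so that a ranking is visible.
spread = np.array([0.01, 0.05, 0.03, 0.04, 0.02])
shap_values = rng.normal(scale=spread, size=(n_seeds, n_samples, len(features)))

# Per seed: median over samples of |SHAP| (assumed reading of "absolute median").
per_seed = np.median(np.abs(shap_values), axis=1)   # shape (seed, feature)

# Across seeds: bar height and error bar for a summary bar plot.
bar_height = np.median(per_seed, axis=0)
bar_error = per_seed.std(axis=0)

ranking = [features[i] for i in np.argsort(bar_height)[::-1]]
print(ranking)
```

Summarizing over many seeds separates features whose importance is stable from those whose ranking depends on the randomization of a single fit.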