Introduction

Nasal extranodal natural killer/T-cell lymphoma (ENKL) is a rare subset of lymphomas, and is relatively prevalent in Asia1. The patients with the aggressive disease often have good performance status at the first visit2, but frequently exhibit multidrug resistance (MDR) and do not well respond to the anthracycline-based regimens (CHOP)3.

Currently, discriminating the patients with unfavorable prognostic factors is important for the selection of treatment modalities4. According to the study from Yang et al.4, for stage IE patients without risk-factors (age < 60 y, Eastern Cooperative Oncology Group performance status <2, normal LDH, Ann Arbor stage I, and no primary tumor invasion), radiotherapy (RT) alone could achieve a 5-year overall survival rate (OS rate) of 88.8%. For stage IE patients with one of these risk-factors, the treatment of combined RT and chemotherapy (CT) could achieve a 5-year OS rate of 72.2%.

Low hemoglobin level is associated with poor outcome of some lymphomas, such as Hodgkin and diffuse large B-Cell lymphoma5, 6. Although nasal ENKL is classified as a type of lymphoma7, and shares some prognostic factors with others, such as International Prognostic Index (IPI) and lactic dehydrogenase level (LDH)8, 9, seldom studies had evaluated the prognostic ability of hemoglobin for nasal ENKL. Therefore, we designed this study to explore its prognostic ability for the patients in stage I–IV, and to integrate it into a prognostic nomogram.

Methods

Patients, Treatment and Follows

At the Shanxi Cancer Hospital and Institute (SCHI) and the First Affiliated Hospital of Anhui Medical University (FAHAMU), nasal ENKL cases between 2000 and 2015 were collected according to the morphological and immunohistological criteria of the World Health Organization classification10. Before any treatment, patient information included history taking, physical and laboratory examinations, and results of computerized tomography. Hemoglobin levels were classified into lower or higher than 120 g/L. To minimize migrations of staging techniques through a time-span of 15 years, only computerized tomography series of PET/CT scans were used to stage and evaluate the diseases (n = 17). Our protocol was approved by the ethics committee at SCHI and FAHAMU.

Because tumor heterogeneity affects the robustness of predictors11, some re-sampling approaches are used to ensure our model can be applied in other cohorts of patients. Using SPSS software (version 10.01), 20% of all recruited patients (group A) were randomized into the group C, and the rest was as the group B. Both the group A and B were used to develop prognostic model for guaranteeing its repeatability. And then, the group B and C were used to validate the model for assuring its reliability. The approach was similar to the external validation procedure, i.e. the developed model (from group B) was validated in another cohort of patients (group C).

Additionally, using the bootstrap method, a 10 fold cross-validation was used to test the generalization ability of the model12. The patients (group A and B) were equally and randomly divided into10 subsets. And then, the model was repeatedly trained and validated 10 times. Each time, the pooled data of 9 subsets were used to train the model, which was validated in the retained subset subsequently. The average error across 10 rounds (integrated Brier score, IBS) could estimate the error in generalizing the model in an independent dataset, and a lower IBS indicated a better model13.

According to the paradigm of the two hospitals, CT was the first-line treatment, and only the poor responders would receive RT. After the publication of the study by Li et al.14 in 2006, more patients received RT than before; however, RT was still not the first-line treatment. From medical records or by telephone, all patients were followed to the end of August 2016. Overall survival (OS) was measured from the day of diagnosis to death from any cause.

Prognostic Model and Validation

Separately using the data of the group A and B, the significance of prognostic factors against OS was univariately identified by the Kaplan–Meier and the Log-rank test (P < 0.05). And then, the multivariate Cox regression was used to select the qualified factors against OS from the significant ones. At last, the same factors between the group A and B were used to develop our model.

Table 1 Patient characteristics.

To validate IPI, Korean prognostic index (KPI), Yang’s9, and our models, their discriminatory ability of the three groups was compared by C-index (mean ± SE), which was similar to the area under the receiver operating curve (ROC) of the models. After that, the nomogram of our model was built, and calibration plots of the model were constructed between the predicted and the observed survival probabilities. A better model would have a C-index closing to 1, and a calibration curve closing to the line passing through the original point with a slope of 1.

Additionally, to evaluate the prognostic ability of factors involved in these models, especially age and hemoglobin, an indicator of random survival forest (RSF) classifier was used. Although RSF is not so popularly used as the Cox multivariate regression, it is a more accurate method for analyzing survival data. Minimal depth is an indicator of RSF classifier to evaluate the prognostic ability of each factor. A smaller minimal depth of a factor is, the more ability it has on prognosis15, 16.

Data were analyzed with the SPSS statistical software (version 10.01) and the R Project software package (version 3.3.1). A two sides of P < 0.05 was considered as the significant level.

Ethics approval and informed consent

Our protocol was approved by the ethics committee of the Shanxi Cancer Hospital and Institute, and the First Affiliated Hospital of Anhui Medical University. The study was conducted in accordance with the relevant guidelines and regulations. Informed consent was obtained from all participants according to the institutional guidelines.

Data availability statement

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Results

Patient Characteristics and Treatments

The median ages of the group A (n = 192), B (n = 155) and C (n = 37) were 42.8 y (9–79 y), 42.7 y (9–79 y) and 43.0 y (17–74 y), respectively. Patient characteristics are listed in Table 1, and are well balanced between the group B and C.

Patient treatments are presented in Fig. 1. Among the patients received CT alone (n = 76), the CHOP or CHOP like, L-ASP, GEM, and other regimens were administered 4–7 cycles (n = 28, median 6), 4–8 cycles (n = 27, median 6), 4–8 cycles (n = 9, median 6) and 4–7 cycles (n = 10, median 6), respectively. Among the patients received combined treatment (n = 107), CHOP or CHOP like, L-ASP, GEM and other regimens were administered 4–8 cycles (n = 49, median 6), 4–9 cycles (n = 45, median 6), 4–8 cycles (n = 8, median 6) and 4–7 cycles (n = 5, median 6), respectively. Between the group B and C, neither regimens (CHOP vs non-CHOP, x 2 = 0.002, P = 0.966) nor CT cycles (x 2 = 11.757, P = 0.302) existed significant difference.

Figure 1
figure 1

Patient treatments.

RT (n = 115) was given by 6-MV linear accelerators (Varian), and the median dose was 50 Gy (range 8–70 Gy). The most patients (n = 75) received conventional radiotherapy (46/75 patients ≥ 50 Gy), and others (n = 40) received intensity modulated radiotherapy or three dimensional conformal radiation therapy (33/40 patients ≥ 50 Gy). Between the group B and C, radiation dose of 66/92 and 13/23 patients were higher than 50 Gy (x 2 = 1.981, P = 0.159), and 60/75 and 32/40 patients received conventional RT (x 2 = 0.000, P = 1.000), respectively.

Outcome and Prognostic Model

By the end of August 2016, the median observation times on the group A, B and C were 24.0 months (range 0–142), 23.0 months (range 0–131) and 27.0 months (range 1–142), respectively. Eighty-one patients died within 0.0 to 127.0 months (median 12.0 months). The results of univariate analysis of the group A and B are presented in Table 2, and only the significance of the factor of distant metastasis is different between the two groups. Additionally, the factor of treatment was significant for OS in both groups.

Table 2 Results of univariate analysis.

In the multivariate Cox regression of group A, treatment (Score = 1.644, P = 0.200) was excluded from the model, while Ann Arbor stage (OR = 1.389, P = 0.025), primary tumor invasion (OR = 2.004, P = 0.003), LDH (OR = 2.073, P = 0.003), hemoglobin (OR = 2.341, P = 0.003), and ECOG performance status (OR = 4.116, P = 0.000) were significant. For the group B, both distant metastasis (Score = 0.511, P = 0.475) and treatment (Score = 1.802, P = 0.180) were excluded. Therefore, the significant factors were same between the group A and B, and were Ann Arbor stage, primary tumor invasion (PTI), LDH, hemoglobin and Eastern Cooperative Oncology Group performance status (ECOG PS).

Prognostic Model and Nomogram Validation

To validate prognostic models, C-index of IPI, KPI, Yang’s, and our model was separately calculated for the three groups (Fig. 2A), and indicated that ours and the Yang’s model were obviously better than IPI or KPI index. Furthermore, C-index of our model (mean ± SE) was slightly better than that of Yang’s in the group A (75.7% ± 4.4% vs. 75.4% ± 4.4%), B (74.3% ± 4.8% vs. 74.0% ± 4.8%) and C (78.2% ± 10.7% vs. 77.5% ± 10.6%).

Figure 2
figure 2

Evaluation of prognostic models and factors. (A) C-index of the models. (B) Minimal depth of factors involved in Yang’s model and ours. (C) Integrated Brier score of the models.

Because our model differed from Yang’s in the substitution of age with hemoglobin, survival plots of age and hemoglobin were compared (Fig. 3), and indicated that hemoglobin was better than age in both group A and B. The result was also confirmed by RSF classifier. Both in the group A and B, the minimal depth of hemoglobin were lower than that of age, and were even lower than that of LDH (Fig. 2B).

Figure 3
figure 3

Survival plots of hemoglobin and age for OS in the group A and B. The left and right columns are those for the group A and B, respectively. The upper and lower rows are those for hemoglobin and age, respectively.

In the 10 fold cross-validation of the group A and B (Fig. 2C), our model’s IBS was obviously lower than others’, and the Yang’s was even the highest. Therefore, these results indicated that the generalization error of our model was the smallest among the evaluated models.

In the group A, the nomogram of our model for predicting 3-year and 5-year OS is plotted in Fig. 4A. To calibrate the nomogram, the predicted 3-year and 5 year OS was separately plotted against corresponding actual OS (Fig. 4B and C). On the plots, the curve between predicted and actual OS closes to the line passing the original point with a slope of 1, which indicates the good agreement between them.

Figure 4
figure 4

Nomogram and calibration of our model. (A) Nomogram. (B,C) are the calibration plots for 3-year and 5-year prediction, respectively.

Discussion

Our study indicates that hemoglobin level is a prognostic factor for the patients with nasal extranodal natural killer/T-cell lymphoma in stage I - IV, and the validated prognostic nomogram (Ann Arbor stage, PTI, LDH, hemoglobin, and ECOG PS) can be used to predict the outcome of the patients. Although, among the evaluated models, ours just slightly improves C-index, its generalization error is the smallest.

Classified as a kind of lymphoma, nasal ENKL shared some prognostic factors with other lymphomas, such as IPI and LDH level8, 9. Previously, using multivariate Cox regression, several studies tried to relate hemoglobin to the prognosis of the patients. However, because of the limited cases, Ma17 and Kim18 (n = 64 and 62, respectively) failed. In the study from Xu et al.19 (n = 170), according to hemoglobin levels, the recruited patients were grouped (threshold value 100 g/L, vs. 120 g/L of our study). In the univariate analysis, the factor of hemoglobin was significant against progression free survival (P = 0.034), but not significant against OS (P = 0.057). In a cohort of 321 patients, Wang et al.20 found that hemoglobin, ECOG PS, age, LDH and Ann Arbor stage were significant factors for both progression free survival and OS, but they did not validate the results in another group of patients20. Therefore, we confirmed their results that hemoglobin level was a prognostic factor for nasal ENKL patients.

Compared to multivariate Cox regression, random survival forest classifier is better in modeling non-linear effects and complex interactions among factors. Furthermore, the classifier can provide indicators, such as minimal depth, to evaluate the prognostic ability of each factor21. As indicated by the depth of RSF classifier, hemoglobin was better than age in predicting the outcome of the patients. Additionally, the results were also confirmed by the survival plots of age and hemoglobin (Fig. 3).

Therefore, the factor of age in Yang’s model, which regressed from the patients in stage I and II9, was substituted by hemoglobin, and the substitution could slightly improve the C-index of the model. Furthermore, as indicated by the IBS of 10 fold cross-validation, the generalization error of our model was the smallest among the evaluated models, especially was obviously better than Yang’s. It should be noted that, for the comparison of models, we re-regressed Yang’s model, which differed from its origin in the scoring points of factors. Therefore, the re-regression made up the difference of C-index between Yang’s and our model.

For diffuse large B-cell lymphoma and non-Hodgkin lymphoma, hemoglobin < 120 g/L is a frequent sign at diagnosis, and interleukin-6 plays a vital role in its development8. Because it had been reported that both interleukin-9 and −10 related to the poor prognosis of the nasal ENKL patients, the underlining mechanism might be that interleukins act as growth factors of tumor cells and participate in the production of erythropoietin (EPO)22, 23. However, further studies should be conducted to confirm this hypothesis.

Besides the prognostic factors, there were other differences between Yang’s and our nomogram. In the Yang’s model9, Ann Arbor stage had the highest score, that was followed by ECOG PS, PTI, age and LDH. In contrast, the sequence of our nomogram was ECOG PS, hemoglobin, LDH, PTI and Ann Arbor stage. Additionally, Yang’s model was developed for stage I-II patients, and ours was for those from stage I to IV.

Besides the prognostic factors identified by previous studies, other powerful factors might be long non-coding RNAs (lnRNAs), microRNAs (miRNAs), and so on. The potential relation between these markers and nasal ENKL could be predicted by some computational models24,25,26. After verifying the markers in laboratory research, our future work would focus on their prognostic ability.

As a study with limited cases, we tried another way to validate our model. Firstly, the enrolled patients (n = 192, group A) were randomly divided into group B (n = 155) and C (n = 37). Subsequently, to validate the repeatability of our model, both the group A and B were separately used to develop prognostic models. And then, C-index of all groups was used to evaluate the discriminatory of models. The approach was similar to the external validation procedure, i.e. the developed model (from group B) was validated in another cohort of patients (group C). At least in this study, the prognostic ability of hemoglobin was coincided with the results from Wang et al.20, and indicated that the validation method might be useful for other studies with limited cases.

Conclusions

Hemoglobin is a prognostic factor for nasal extranodal natural killer/T-cell lymphoma patients from stage I to IV, and integrating it into a validated prognostic nomogram, whose generalization error is the smallest among the evaluated models, can be used to predict the outcome of the patients.