Fig. 4: Development and validation of an independent gene-expression signature to predict treatment outcomes in DLBCL patients.

The RNAseq and microarray datasets were merged into a larger cohort (n = 1376) using quantile normalization. The samples were then randomly divided into a discovery cohort (70%, n = 964; A–F) and a validation cohort (30%, n = 412; G–J). A Forest plots showing the association between the expression levels of the 24 genes and PFS in the discovery cohort. B Distribution of the 24-gene expression scores across DLBCL patients (each dot represents one patient) and the association between PFS and risk group in the discovery cohort. Patients were assigned to high- and low-risk groups using the optimal ROC-curve threshold of −0.521. C Kaplan–Meier analysis of PFS in the high- and low-risk groups of the discovery cohort; the p value was calculated by the log-rank test. D Bar plots showing the distribution of high- and low-risk patients within the poor- and good-outcome groups of the discovery cohort; the p value was determined by Fisher’s exact test. E Univariate and multivariable Cox regression analyses demonstrating the prognostic independence of the 24-gene expression score in the discovery cohort. Key clinical parameters, including age, subtype, stage, and IPI factors, were included in the analysis. F ROC curves showing the performance of different parameters in identifying DLBCL patients with poor outcomes at two years in the discovery cohort; AUC values are indicated. G Kaplan–Meier analysis of PFS in the high- and low-risk groups of the validation cohort (n = 412). Patients were classified into high- and low-risk groups using the threshold established in the discovery cohort (−0.521); the p value was calculated by the log-rank test. H Bar plots showing the distribution of high- and low-risk patients within the poor- and good-outcome groups of the validation cohort; the p value was determined by Fisher’s exact test.
I Univariate and multivariable Cox regression analyses demonstrating the independent prognostic value of the 24-gene expression score in the validation cohort. J ROC curves showing the performance of different parameters in identifying DLBCL patients with poor outcomes at two years in the validation cohort. K, L The 24-gene risk score was further evaluated in two additional independent cohorts: an RNAseq cohort (n = 49; CNP0001327) and a microarray cohort (n = 484; the remaining samples of the GSE181063 cohort, for which only OS data were available). HR hazard ratio, CI confidence interval, ROC receiver operating characteristic, AUC area under the curve, OS overall survival, PFS progression-free survival, IPI International Prognostic Index.
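Two computational steps named in this caption but not spelled out, merging platforms by quantile normalization and choosing an "optimal" ROC cutoff that is then reused on the validation cohort, can be illustrated with the minimal sketch below. It is not the authors' pipeline. The caption does not state the cutoff criterion, so Youden's J (sensitivity + specificity − 1) is assumed here. Patients scoring at or above the cutoff are assumed to be high risk, and ties in quantile normalization are broken by position for simplicity. The function names `quantile_normalize`, `youden_threshold`, and `assign_risk` are hypothetical.

```python
# Minimal sketch, not the authors' pipeline. Assumptions (not stated in the
# caption): the "optimal" ROC threshold maximizes Youden's J, and a score at
# or above the cutoff means high risk. All function names are hypothetical.

def quantile_normalize(samples):
    """Quantile-normalize samples given as equal-length lists of gene values.

    Every value is replaced by the mean, across samples, of the values that
    share its rank, so all samples end up with the same distribution.
    Ties are broken by position (a simplification).
    """
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must contain the same genes")
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value in each sample.
    reference = [
        sum(col[k] for col in sorted_samples) / len(samples)
        for k in range(n_genes)
    ]
    normalized = []
    for s in samples:
        # Gene indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene_idx in enumerate(order):
            out[gene_idx] = reference[rank]
        normalized.append(out)
    return normalized


def youden_threshold(scores, labels):
    """Return (cutoff, J) maximizing sensitivity + specificity - 1.

    labels: 1 = poor outcome (positive class), 0 = good outcome.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both poor- and good-outcome patients")
    best_cut, best_j = None, float("-inf")
    # J only changes at observed scores, so those are the only candidates.
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict '>' keeps the lowest cutoff among ties
            best_cut, best_j = cut, j
    return best_cut, best_j


def assign_risk(scores, cutoff):
    """Label each patient using a cutoff fixed beforehand (e.g. in discovery)."""
    return ["high" if s >= cutoff else "low" for s in scores]


if __name__ == "__main__":
    print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
    cutoff, j = youden_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
    print(cutoff, j)
    print(assign_risk([0.2, 0.5], cutoff))
```

Consistent with panels G–J, the cutoff in this sketch is estimated once on the discovery cohort and then passed unchanged to `assign_risk` for the validation cohort, so the validation outcomes play no part in choosing the threshold.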