Introduction

Retroperitoneal sarcomas (RPS) are rare, heterogeneous malignant mesenchymal tumors that account for a minority of soft-tissue sarcomas yet carry substantial morbidity and mortality due to their deep anatomic location and large size at presentation1. Within this group, retroperitoneal leiomyosarcoma (RLS) constitutes an aggressive histologic subtype. Histology-based analyses demonstrate that recurrence patterns and risk profiles differ markedly across RPS subtypes; in particular, leiomyosarcoma exhibits a greater propensity for distant spread than liposarcoma, underscoring the need for subtype-informed treatment strategies and robust prognostic frameworks in RLS2.

Surgical resection remains the cornerstone of potentially curative therapy for RPS. Contemporary consensus statements emphasize management in experienced sarcoma centers with multidisciplinary planning to maximize the likelihood of complete (R0/R1) resection, guide the approach to recurrent disease, and standardize follow-up3. Nonetheless, optimal integration of adjunctive therapies remains unsettled. The randomized STRASS trial showed no overall improvement in abdominal recurrence-free survival with the addition of preoperative radiotherapy to surgery for primary RPS, while post-hoc and histology-specific signals continue to be debated4. In parallel, evidence supporting routine perioperative chemotherapy is heterogeneous and limited for most retroperitoneal histologies. Observational series focused on RLS suggest that carefully selected salvage surgery for recurrence may still yield meaningful benefit, but high-quality, subtype-specific evidence remains scarce5. Recent multidisciplinary reviews similarly call for large, methodologically rigorous analyses that account for selection bias and histologic diversity6.

Population-based registries such as the Surveillance, Epidemiology, and End Results (SEER) program enable the study of rare cancers at scale by capturing detailed demographic, pathologic, treatment, and outcome data across diverse real-world settings7,8,9. Leveraging SEER can overcome many limitations of single-center experiences in RLS, where small cohorts and heterogeneity impede precise estimation of treatment effects and reliable identification of prognostic factors. However, the observational nature of registry data necessitates analytic strategies that mitigate confounding and strengthen causal interpretation.

To this end, propensity score matching (PSM) can reduce measured treatment-selection bias by balancing baseline covariates between surgical and non-surgical cohorts, with standardized differences providing recommended balance diagnostics10.

Complementing confounding adjustment, machine-learning-based survival modeling—particularly random survival forests (RSF)—can flexibly capture nonlinear effects and higher-order interactions and provides variable-importance rankings that summarize prognostic patterns in large registries. Importantly, however, registry-based comparisons of surgery versus no surgery are inherently susceptible to selection bias because resectability, anatomical tumor relationships, operative risk, and performance status strongly influence whether surgery is offered and are not captured in SEER. Accordingly, using SEER data (2000–2019), we examined the association between surgical resection status and overall survival (OS) and cancer-specific survival (CSS) in retroperitoneal leiomyosarcoma, and we evaluated the relative importance of routinely recorded demographic, stage, grade, and treatment variables using multivariable Cox regression and RSF-based importance analyses, while emphasizing that these methods adjust for measured confounders only and do not provide causal estimates of surgical benefit.

Materials and methods

Data source and study population

We conducted a population-based retrospective cohort study using the Surveillance, Epidemiology, and End Results (SEER) program. Cases with a pathological diagnosis of RLS from January 1, 2000, to December 31, 2019, were identified. Data were retrieved with SEER*Stat (v8.3.9). SEER data are de-identified and publicly available; therefore, this study was exempt from institutional review board oversight and informed consent requirements.

Case identification, inclusion, and exclusion criteria

RLS was defined using ICD-O-3 histology codes for leiomyosarcoma (8890/3, 8891/3, 8896/3) and a retroperitoneum primary site. Inclusion criteria were: (1) primary RLS at initial diagnosis; (2) pathologically confirmed; (3) diagnosis between 2000 and 2019; and (4) available survival data.

Exclusion criteria were: (1) cases identified by autopsy or death certificate only; (2) multiple primary tumors with RLS not the first malignancy; and (3) missing or incomplete key variables required for analysis (e.g., survival time, vital status, surgical status, stage, or grade).

Variables and definitions

From SEER, we extracted demographics (age at diagnosis, sex, race), tumor characteristics (tumor differentiation as recorded in the SEER grade/differentiation field), SEER summary stage (localized, regional, distant, unknown), treatment variables (surgery, radiotherapy, chemotherapy), and survival outcomes. Stage was classified using the SEER Summary Stage system, in which “localized” indicates disease confined to the site of origin, “regional” generally reflects contiguous/direct extension into adjacent tissues and/or organs (and/or regional lymph nodes), and “distant” denotes metastatic disease. Given that lymph node metastases are uncommon in retroperitoneal leiomyosarcoma, SEER “regional” stage in this context most often represents locally advanced tumors with direct extension rather than nodal spread; therefore, we interpreted this category accordingly and note that SEER stage categories are registry constructs that may not map directly onto specialist retroperitoneal sarcoma terminology.

  • Age was dichotomized at 60 years (≤ 60 vs > 60) based on prior literature and clinical relevance.

  • Tumor differentiation (SEER grade/differentiation field) was grouped as low grade (I/II: well or moderately differentiated), high grade (III/IV: poorly differentiated or undifferentiated/anaplastic), or unknown. This SEER variable reflects tumor differentiation and is not equivalent to a dedicated sarcoma grading system (e.g., the French Federation of Cancer Centers Sarcoma Group (FNCLCC) grading system).

  • Race was categorized as White, Black, or Other (including American Indian/Alaska Native and Asian/Pacific Islander).

  • Treatment variables (surgery, radiotherapy, chemotherapy) were extracted from SEER treatment fields and categorized as yes, no, or unknown (if applicable), reflecting treatments delivered during the initial course of therapy for the primary tumor episode.

  • Survival outcomes included overall survival (OS) and cancer-specific survival (CSS). OS was defined as time from diagnosis to death from any cause, and CSS as time from diagnosis to death attributed to the index cancer, with patients censored at last follow-up if alive (or if death occurred from other causes for CSS).
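The covariate groupings listed above can be sketched in code as follows. This is a minimal illustration, not the extraction script used in the study: the function names and the raw label strings (such as "II" or "White") are hypothetical stand-ins for whatever the SEER*Stat export actually contains.

```python
from typing import Optional

def age_group(age_years: int) -> str:
    """Dichotomize age at diagnosis at 60 years (<= 60 vs > 60)."""
    return "<=60" if age_years <= 60 else ">60"

# Hypothetical raw grade labels; the real export strings may differ.
_GRADE_MAP = {
    "I": "I/II",      # well differentiated
    "II": "I/II",     # moderately differentiated
    "III": "III/IV",  # poorly differentiated
    "IV": "III/IV",   # undifferentiated/anaplastic
}

def grade_group(seer_grade: Optional[str]) -> str:
    """Collapse the SEER differentiation field into I/II, III/IV, or Unknown."""
    if seer_grade is None:
        return "Unknown"
    return _GRADE_MAP.get(seer_grade.strip().upper(), "Unknown")

def race_group(race: str) -> str:
    """Keep White and Black as-is; every other recorded category becomes Other."""
    cleaned = race.strip().lower()
    if cleaned in ("white", "black"):
        return cleaned.capitalize()
    return "Other"
```

Any value that does not match a known grade label falls into "Unknown", which mirrors the three-level grouping described above.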

Endpoints

Primary endpoints were OS and CSS:

  • OS was defined as time (months) from diagnosis to death from any cause; patients alive at last follow-up were censored.

  • CSS was defined as the time from diagnosis to RLS-specific death; deaths from other causes and survivors at last follow-up were censored. The administrative follow-up cutoff in SEER for this extraction was December 31, 2019.
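The two censoring rules can be made concrete with a short sketch. The `Patient` record and its field names are assumptions introduced for illustration; they are not SEER variable names.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    survival_months: int        # time from diagnosis to death or last follow-up
    dead: bool                  # vital status at last follow-up
    died_of_index_cancer: bool  # only meaningful when dead is True

def os_event(p: Patient) -> int:
    """OS: 1 = death from any cause, 0 = censored (alive at last follow-up)."""
    return 1 if p.dead else 0

def css_event(p: Patient) -> int:
    """CSS: 1 = RLS-specific death, 0 = censored (alive, or died of another cause)."""
    return 1 if (p.dead and p.died_of_index_cancer) else 0

# Three illustrative (made-up) patients.
alive = Patient(survival_months=48, dead=False, died_of_index_cancer=False)
cancer_death = Patient(survival_months=12, dead=True, died_of_index_cancer=True)
other_death = Patient(survival_months=30, dead=True, died_of_index_cancer=False)
```

The third patient shows where the endpoints diverge: a death from another cause counts as an event for OS but is censored for CSS.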

Propensity score matching (PSM)

Because treatment allocation (surgery vs no surgery) is non-random in observational data, we implemented 1:1 nearest-neighbor PSM without replacement on the logit of the propensity score to balance baseline characteristics between surgical and non-surgical cohorts. The propensity model included age group, sex, race, SEER stage, differentiation grade, radiotherapy, and chemotherapy. A caliper of 0.01 was used to restrict matches. Covariate balance after matching was assessed using absolute standardized mean differences (SMDs), with SMD ≤ 0.10 considered acceptable balance10. However, propensity score methods can only adjust for measured covariates; unmeasured confounding may remain, particularly from factors related to technical resectability, anatomic tumor extent, patient fitness, and center expertise that are not captured in SEER.
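The matching and balance steps can be illustrated with a simplified sketch; it is not the matching routine used in the study. It assumes that propensity scores have already been estimated (the logistic model is omitted), that matching is greedy in input order, and that the 0.01 caliper is an absolute width on the logit scale, which the text does not specify. All function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def match_nearest(ps_treated: Dict[str, float],
                  ps_control: Dict[str, float],
                  caliper: float = 0.01) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbor matching without replacement on logit(PS)."""
    available = {cid: logit(p) for cid, p in ps_control.items()}
    pairs = []
    for tid, p in ps_treated.items():
        if not available:
            break
        lt = logit(p)
        # Closest remaining control on the logit scale.
        cid = min(available, key=lambda c: abs(available[c] - lt))
        if abs(available[cid] - lt) <= caliper:
            pairs.append((tid, cid))
            del available[cid]  # without replacement
    return pairs

def smd_binary(treated: List[int], control: List[int]) -> float:
    """Absolute standardized mean difference for a 0/1 covariate."""
    p1 = sum(treated) / len(treated)
    p0 = sum(control) / len(control)
    pooled_sd = math.sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2.0)
    if pooled_sd == 0.0:
        return 0.0 if p1 == p0 else math.inf
    return abs(p1 - p0) / pooled_sd

# Made-up scores: t1 finds a control within the caliper, t2 does not.
pairs = match_nearest({"t1": 0.800, "t2": 0.500}, {"c1": 0.801, "c2": 0.300})
```

Treated patients with no control inside the caliper are dropped, which is why a matched cohort can be much smaller than the smaller of the two original groups. After matching, `smd_binary` would be evaluated for each covariate level against the 0.10 threshold.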

Survival analysis and multivariable modeling

Kaplan–Meier methods were used to estimate OS and CSS, with log-rank tests comparing survival curves between groups. We fitted Cox proportional hazards models to estimate hazard ratios (HRs) and 95% confidence intervals (CIs). Variables with clinical relevance and/or P < 0.10 in univariable analyses were entered into multivariable models. Variance inflation factors (VIFs) were examined to assess multicollinearity; VIF < 10 was considered acceptable. Proportional hazards assumptions were assessed using Schoenfeld residuals and log–log plots.
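For readers less familiar with these estimators, the Kaplan–Meier product-limit estimate and the two-group log-rank statistic can be written out in a few lines. This is a plain-Python sketch for illustration only; the analyses themselves were run in Stata and R, and the function names and toy data here are hypothetical.

```python
import math
from typing import List, Tuple

def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (event time, survival probability) steps of the KM estimator."""
    surv = 1.0
    curve = []
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

def logrank_chi2(times_a: List[float], events_a: List[int],
                 times_b: List[float], events_b: List[int]) -> Tuple[float, float]:
    """Two-group log-rank statistic (1 degree of freedom) and its p-value."""
    pooled = zip(times_a + times_b, events_a + events_b)
    o_minus_e, var = 0.0, 0.0
    for t in sorted({ti for ti, e in pooled if e == 1}):
        n_a = sum(1 for ti in times_a if ti >= t)
        n_b = sum(1 for ti in times_b if ti >= t)
        d_a = sum(1 for ti, e in zip(times_a, events_a) if ti == t and e == 1)
        d_b = sum(1 for ti, e in zip(times_b, events_b) if ti == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        o_minus_e += d_a - d * n_a / n          # observed minus expected in group A
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0.0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = o_minus_e ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return chi2, p_value

# Toy data: one censored observation at t = 2.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
chi2, p = logrank_chi2([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
```

The censored observation at t = 2 contributes to the number at risk at t = 1 but not afterwards, which is how censoring enters the estimate without being counted as an event.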

Machine-learning survival modeling

To complement regression modeling and rank prognostic importance without parametric assumptions, we trained RSF models on the matched cohort. Candidate predictors included all covariates above, plus treatment (surgery). We used standard hyperparameters (e.g., number of trees ≥ 1000; log-rank splitting; internal out-of-bag estimation) and computed variable importance scores to identify the strongest predictors of OS and CSS11,12.
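Fitting a random survival forest requires a dedicated package, so the sketch below does not build one. It illustrates only the idea behind permutation-based variable importance: shuffle one predictor, recompute a concordance measure, and record the loss in performance. The `predict` callable is a hypothetical stand-in for a fitted forest's risk prediction, and for brevity the importance is computed on the same data rather than on out-of-bag samples as an RSF would do.

```python
import random
from typing import Callable, Dict, List

Row = Dict[str, float]

def c_index(times: List[float], events: List[int], risks: List[float]) -> float:
    """Harrell's C: share of comparable pairs in which the patient with the
    shorter observed survival also has the higher predicted risk."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable if i had an event strictly before j's observed time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

def permutation_importance(rows: List[Row], times: List[float], events: List[int],
                           predict: Callable[[Row], float], variable: str,
                           n_repeats: int = 20, seed: int = 0) -> float:
    """Mean drop in C-index after shuffling one predictor across patients."""
    rng = random.Random(seed)
    base = c_index(times, events, [predict(r) for r in rows])
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[variable] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, variable: v} for r, v in zip(rows, shuffled)]
        drops.append(base - c_index(times, events, [predict(r) for r in permuted]))
    return sum(drops) / n_repeats

# Made-up data and a toy risk rule that uses surgery and ignores "noise".
rows = [{"surgery": 1, "noise": 0.3}, {"surgery": 1, "noise": 0.9},
        {"surgery": 0, "noise": 0.5}, {"surgery": 0, "noise": 0.1}]
times, events = [30, 40, 5, 8], [1, 1, 1, 1]
toy_risk = lambda r: 1.0 - r["surgery"]
```

A predictor the model never uses has an importance of exactly zero under this scheme, while shuffling a predictor the model relies on degrades concordance. Larger values therefore indicate a greater predictive contribution, not a causal effect.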

Software

All analyses were performed with Stata/MP 16.0 (StataCorp) and R 4.2.3 (R Foundation for Statistical Computing). PSM was implemented with standard matching routines; RSF modeling used established survival-tree packages. A two-sided α = 0.05 defined statistical significance.

Ethics statement

This study used de-identified SEER data and complied with the Declaration of Helsinki. No human subjects were directly involved; IRB approval and informed consent were not required.

Results

Baseline characteristics before and after matching

A total of 1041 patients with RLS met the inclusion criteria, including 817 (78.5%) treated with surgery and 224 (21.5%) without surgery. Baseline clinicopathological characteristics were imbalanced between groups before matching (P < 0.05 for multiple covariates). After 1:1 propensity score matching (PSM), 318 patients were retained (159 per group) with well-balanced baseline features (P > 0.05 across covariates). The median follow-up was 17 months (IQR, 6–41) (Table 1).

Table 1 Baseline characteristics of patients with retroperitoneal leiomyosarcoma (RLS) before and after propensity score matching (PSM).

Survival outcomes in the matched cohort

Within the matched cohort, 92/159 (57.9%) patients in the surgery group died, including 76 (47.8%) cancer-specific deaths; corresponding figures in the non-surgery group were 135/159 (84.9%) and 117 (73.6%), respectively. One-year OS was 83.14% with surgery versus 51.46% without surgery, and one-year CSS was 85.03% versus 55.58%, respectively. Survival differed significantly between groups for both endpoints (log-rank χ² = 60.15 for OS and χ² = 52.00 for CSS; both P < 0.001) (Fig. 1).

Fig. 1

Kaplan–Meier survival by surgery after propensity score matching. (A) Overall survival (OS) and (B) cancer-specific survival (CSS) in the matched cohort (n = 318; surgery n = 159, no surgery n = 159). Curves compare surgery (blue) versus no surgery (red) with 95% confidence-interval ribbons; tick marks denote censoring. Risk tables display the number at risk (number censored) at prespecified time points, with cumulative numbers of events reported below. Group differences by log-rank test were significant for both endpoints (P < 0.001). Abbreviations: OS, overall survival; CSS, cancer-specific survival.

Univariable Cox analysis

On univariable analysis, tumor stage, differentiation grade, surgery, and radiotherapy were associated with both OS and CSS; poorer differentiation and more advanced stage predicted worse outcomes, while surgery and radiotherapy showed protective associations (all P < 0.05). Age, sex, and race were not significant in univariable models for either endpoint (Table 2).

Table 2 Univariable Cox proportional hazards analysis for overall survival (OS) and cancer-specific survival (CSS).

Multivariable Cox analysis

In multivariable models, age, tumor stage, differentiation grade, and surgery were independent predictors of OS (all P < 0.05), whereas tumor stage, differentiation grade, and surgery independently predicted CSS (all P < 0.05). Receipt of surgery remained strongly associated with improved outcomes in the matched cohort (OS HR = 0.34, 95% CI 0.25–0.45; CSS HR = 0.34, 95% CI 0.25–0.46; both P < 0.001) (Fig. 2), noting that this association may also reflect underlying selection related to resectability and patient fitness.

Fig. 2

Multivariable Cox models for survival after propensity score matching. Forest plots display adjusted hazard ratios (HRs) with 95% confidence intervals for (A) OS and (B) CSS. Covariates included age (> 60 vs ≤ 60 years), sex (female vs male), race (Black/Other vs White), differentiation grade (III/IV or Unknown vs I/II), SEER summary stage (Regional/Distant/Unknown vs Localized), radiotherapy (yes vs no), chemotherapy (yes vs no), and surgery (yes vs no). In each contrast the second-listed category is the reference, so HR < 1 indicates a lower hazard in the first-listed category (e.g., surgery yes vs no). Proportional-hazards assumptions were evaluated using Schoenfeld residuals; two-sided α = 0.05. HR, hazard ratio; CI, confidence interval; SEER, Surveillance, Epidemiology, and End Results.

Machine-learning survival modeling

RSF analysis corroborated regression findings: among all candidate variables, the highest importance scores for both OS and CSS were observed for surgery, followed by tumor stage and differentiation grade (Fig. 3).

Fig. 3

Random survival forest (RSF) tuning and variable importance. For (A) OS and (B) CSS, the left panels plot out-of-bag (OOB) error against the number of trees; the vertical dashed line marks the ntree with the lowest OOB error (best ntree). Right panels show permutation-based variable importance ranked from highest to lowest; larger bars indicate greater predictive contribution, with surgery, SEER summary stage, and differentiation grade among the top predictors. RSF models used log-rank splitting and internal OOB estimation; details appear in Methods. RSF, random survival forest; OOB, out-of-bag; SEER, Surveillance, Epidemiology, and End Results.

Discussion

In this population-based analysis of retroperitoneal leiomyosarcoma, receipt of surgical resection was strongly associated with longer overall and cancer-specific survival after propensity score matching, while SEER summary stage and tumor differentiation remained independently associated with outcomes. Concordant results from propensity-adjusted Cox models and random survival forests suggested a stable ordering of prognostic features, with surgery status, stage, and differentiation consistently ranked among the most influential predictors in our models10,11. These findings are broadly aligned with contemporary recommendations that emphasize complete gross resection as the cornerstone of curative-intent management for retroperitoneal sarcomas when feasible and delivered within multidisciplinary teams at experienced centers13,14,15. Evidence from randomized trials, including the EORTC STRASS study, did not demonstrate an overall benefit of routine preoperative radiotherapy across histologies, which is consistent with our observation that radiotherapy was not among the top prognostic features in this registry-based analysis4. Prior histology-aware studies have highlighted leiomyosarcoma’s propensity for distant relapse and the role of surgery in selected settings, supporting risk-adapted surveillance and management strategies2,5. Importantly, SEER reflects care delivered across a wide spectrum of institutions, including many low-volume centers, and outcomes may differ meaningfully by center volume and multidisciplinary expertise16,17. In contrast, high-volume reference-center series from the Trans-Atlantic Retroperitoneal Sarcoma Working Group represent outcomes achieved within specialized sarcoma pathways.
Accordingly, the absolute survival estimates observed in SEER should be interpreted as population-level real-world outcomes and may not be directly comparable to results from high-volume sarcoma centers; nonetheless, both settings underscore the importance of expert multidisciplinary evaluation and appropriate patient selection.

A central consideration in interpreting these results is the profound selection inherent to surgical management of retroperitoneal sarcoma. In routine practice, the decision to operate is determined by nuanced assessments of anatomical relationships and technical resectability (e.g., involvement of the retrohepatic inferior vena cava, hepatic veins, portal structures, aorta, and renal hilum/vasculature), as well as patient fitness and anticipated perioperative risk. Consequently, patients recorded as not undergoing surgery in SEER are frequently those deemed unresectable or unsafe to resect, rather than patients for whom curative-intent surgery was appropriate but omitted. Because SEER lacks granular data on tumor complexity, resectability, margin status, operative intent, perioperative morbidity, and center expertise, neither propensity score matching nor machine-learning models can fully adjust for these determinants. Therefore, the observed survival differences should be interpreted as prognostic associations rather than evidence that surgery would be beneficial or appropriate for all patients, and they underscore the importance of multidisciplinary resectability assessment in specialized sarcoma centers.

Methodologically, pairing propensity score matching with random survival forests strengthens inference and clinical interpretability. Propensity techniques, with standardized balance diagnostics, address measured selection bias and enable more balanced comparisons between surgical and non-surgical cohorts10. Random survival forests, free of proportional-hazards and linearity assumptions, capture nonlinearities and interactions while providing variable-importance rankings that are intuitive for clinicians; in our matched data, these rankings reproduced the regression hierarchy and reinforced the prominence of surgery and the roles of stage and grade6,11,15.

This work has important limitations that are particularly relevant to interpreting surgery in retroperitoneal leiomyosarcoma. Most critically, SEER does not capture technical resectability, detailed anatomical extent (including major vascular/organ involvement), performance status, perioperative risk, surgeon judgment, or center expertise—factors that strongly determine whether surgery is offered and cannot be fully addressed by propensity score methods or machine-learning models. Consequently, residual confounding and selection bias are expected, and the observed associations should not be construed as causal. In addition, registry data lack surgical margin status, tumor size/volume and complexity, vascular encasement, detailed treatment intent and sequencing, perioperative morbidity, recurrence patterns, and center-level quality metrics, which limits clinical granularity, increases heterogeneity, and may influence both observed survival and treatment-selection patterns10,11. Despite these limitations, the results provide population-level context and motivate clearer next steps. Practice patterns and perioperative strategies continue to evolve under guideline influence, and histology-tailored systemic approaches remain under active evaluation13,14. Prospective, leiomyosarcoma-focused trials—such as STRASS2, which evaluates neoadjuvant chemotherapy versus surgery alone—may better define which patients are most likely to benefit from systemic therapy in addition to surgery, and systematic capture of tumor biology and biomarkers is needed to address the persistent risk of distant relapse18,19. Finally, integrating centralization with measurable quality and multidisciplinary-process metrics may help translate population-level insights into more consistent outcomes across care settings16.

Conclusions

In this population-based SEER analysis, receipt of surgery was strongly associated with survival and emerged as a leading prognostic factor across both conventional regression and machine-learning survival models. However, because SEER lacks key determinants of surgical candidacy—especially technical resectability, anatomical tumor complexity, margin feasibility, operative risk, and performance status—these findings should not be interpreted as evidence that surgery is appropriate or beneficial for all patients. Clinical decision-making should remain individualized within multidisciplinary sarcoma teams, and our results primarily underscore the importance of timely referral for expert assessment of resectability and comprehensive sarcoma care.