Introduction

Postoperative patient allocation to appropriate monitoring levels is a fundamental challenge in neurosurgical care. Decisions regarding placement on a regular ward, an intermediate care unit (IMC), or an intensive care unit (ICU) must balance early detection of deterioration with finite staffing and bed resources. Overly conservative triage increases resource utilization, whereas insufficient monitoring risks delayed recognition of adverse events1,2,3.

Neurosurgical patients represent a particularly demanding population for postoperative triage. Intracranial procedures carry risks such as hemorrhage, cerebral edema, and acute neurological decline, which may evolve rapidly and require frequent neurological assessment. As a result, many institutions default to routine ICU admission after elective craniotomy, despite longstanding evidence that only a subset of patients require ICU-level interventions4,5,6,7.

Prior efforts to formalize postoperative neurosurgical triage have relied primarily on binary (ICU vs. non-ICU) classification schemes and regression-based risk scores. For example, in an attempt to identify preoperative factors associated with postoperative ICU-requiring events, Hanak et al. examined 400 elective craniotomy cases and, using multivariate analysis, identified factors predictive of postoperative ICU admission. Several other tools and scores have been developed to assess the risk of postoperative ICU-level care in neurosurgical patients, analyzing parameters such as ASA score, tumor volume, and surgical duration, and reporting ROC-AUC values between 0.70 and 0.771,2; these tools have since been validated in studies on novel patient cohorts9,10,33.

However, existing approaches share important limitations. By collapsing postoperative care into a binary ICU versus non-ICU decision, they fail to account for the growing role of intermediate care units8, which serve patients who require closer monitoring than a ward but do not meet criteria for full ICU support. Moreover, these models do not explicitly account for the asymmetric consequences of misclassification, and they typically optimize for statistical performance metrics rather than operational or resource-aware considerations1,2,9,10.

Recent advances in machine learning enable more flexible modeling of complex perioperative data to improve neurosurgical care9,10,11,12,13,14, and boosted decision trees have been shown to outperform traditional regression-based statistical methods10. Nevertheless, to date, no published framework integrates cost sensitivity or provides a transparent mechanism to adapt triage thresholds to local resource constraints and institutional risk tolerance. To address this gap, we propose Neuro-TACTIC (Neurosurgical Triage & Acuity algorithm via Cost-Tuned ICU/IMC Classification). This cost-sensitive machine learning framework models postoperative triage as a three-class problem and introduces a tunable parameter (ζ) to balance harm and resource considerations (Fig. 1). In this proof-of-concept study, we aim to develop this three-tiered classification paradigm and internally and externally evaluate the algorithm as a precursor to further clinical studies.

Fig. 1

Following elective cranial tumor surgery, patients undergo early postoperative surveillance to assess their risk of neurologic or hemodynamic deterioration. Based on their risk profile, patients are stratified into three acuity tiers, low risk (green, left), medium risk (yellow, middle), or high risk (red, right), and allocated accordingly. Recovery room / regular ward (left, 10:1 patient-to-nurse ratio): reserved for low-risk patients requiring only routine vital-sign monitoring and on-call physician coverage. Intermediate Care Unit (middle, IMC; 4:1 ratio): assigned to medium-risk patients needing continuous surveillance (EKG, SpO₂), on-site physician availability, noninvasive support (oxygen supplementation, CPAP), and prompt management of events such as intracranial-pressure-lowering medications, new cranial-nerve deficits, seizures, or a ≥ 2-point rise in NIHSS. Intensive Care Unit (right, ICU; 2:1 ratio): reserved for high-risk patients who require constant physician presence, invasive monitoring (ICP, CVP, arterial lines), advanced therapies (EVD, vasopressors), mechanical ventilation, and rapid intervention for critical events (resuscitation, reintubation, immediate surgical revision, impaired consciousness, dysphagia).

Methods

Patient population

For the main dataset, 1072 consecutive adult patients admitted to our ICU following elective craniotomy at our institution between 01/2019 and 07/2020 were included in the study. Parts of this dataset have been previously analyzed in the context of a binary ICU/non-ICU decision paradigm9,10. For the evaluation dataset, 81 adult patients undergoing elective brain tumor surgery in 11–12/2024 were consecutively enrolled (Table 1). Emergency cases were excluded. Overall, the majority of patients did not require either ICU or IMC surveillance (62.9% in the main dataset, 67.1% in the evaluation dataset; Tables 2, 3).

Table 1 Overview of demographic features of patients included in the analysis. The chi-square test was used to assess categorical variables with more than two categories, in which case the resulting p-value is denoted in the first row relating to that category. For binary variables, Fisher’s exact test was used, and an unpaired t-test with Welch’s correction for continuous variables.
Table 2 Overview of tumor-relevant features extracted from the preoperative radiology report. The chi-square test was used to assess categorical variables with more than two categories, in which case the resulting p-value is denoted in the first row relating to that category. For binary variables, Fisher’s exact test was used, and an unpaired t-test with Welch’s correction for continuous variables.
Table 3 Overview of ICU and IMC events in the main and evaluation datasets. Fisher’s exact test was used to test for significant differences between groups.

Our in-house protocol does not mandate routine ICU surveillance after surgery. The decision for postoperative surveillance in the ICU was made prior to surgery by joint judgment of the neurosurgeon and the anesthesiologist.

Data collection

Data relevant to the study were retrospectively extracted from clinical records, and preoperative imaging was reviewed. The anesthesia report and subsequent ICU and neurosurgical floor documentation were scrutinized for any intra- or postoperative adverse events. A routine postoperative CT scan is not performed at our institution. There were no missing data in our study.

Prognostic features

Based on the literature1,2,9,10, we selected candidate features from five categories that were potentially prognostic for predicting adverse events. Table 1 summarizes these medical features, while Table 2 presents the imaging-derived features.

Postoperative events and levels of surveillance

Based on the literature1,2,9,10, we defined events and interventions that require treatment in an ICU setting (Table 3). We also recorded other events that require specific actions or an elevated level of staff attention and defined these as IMC events (Table 3). The ICU was defined as the highest level of care, capable of managing both ICU and IMC events, whereas IMC could only successfully treat IMC events. The third option, the ward, was defined as incapable of treating either IMC or ICU events.

Training and validation of machine learning algorithms

For the classification task, a supervised gradient boosting technique was used. Gradient boosting uses an ensemble of decision trees that are iteratively optimized to minimize the loss function. The XGBoost framework (V1.7.6) was selected15. Boosted trees were trained, optimized, and their final performance validated using fivefold cross-validation repeated 5 times, for a total of 25 training and validation runs (training set: 800 cases, validation set: 200 cases). Hyperparameters were optimized using fivefold cross-validation on each training set with the Optuna framework (V3.3.0) and a tree-structured Parzen estimator, with 5000 iterations per run. Additional ML tasks were performed with the scikit-learn package (V1.2.2)32.

Integration of a cost-matrix and a weighted loss function

To incorporate domain-specific misclassification costs into model training, we implemented a custom multiclass objective function in XGBoost. We defined a cost-sensitive loss based on a class-dependent cost matrix that specifies penalties for misclassifying each class. This matrix was parameterized by a tunable ζ value to adjust the relative severity of the different errors. During training, we implemented a custom gradient and Hessian computation, which modifies the optimization process to minimize the expected misclassification cost rather than the nominal classification error. This allowed the model to prioritize clinically relevant distinctions, particularly when misclassifying certain classes (e.g., ICU) has a higher practical impact. The use of custom objectives in XGBoost builds upon its native support for user-defined loss functions through second-order gradient optimization15. The mathematical details are explained in the supplementary methods.
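For a cost matrix C, true class y, logits s, and p = softmax(s), the expected misclassification cost is L = Σₖ pₖ·C[y,k], with gradient ∂L/∂sⱼ = pⱼ(C[y,j] − E) and diagonal Hessian pⱼ(C[y,j] − E)(1 − 2pⱼ), where E = Σₖ pₖ·C[y,k]. The sketch below implements these quantities in NumPy, as an XGBoost-style custom objective would return them; the cost-matrix values and the Hessian clamping are illustrative assumptions, not the authors' exact implementation (which is detailed in their supplement).

```python
import numpy as np

# Illustrative 3x3 cost matrix (rows: true class ward/IMC/ICU,
# columns: predicted class). Values are placeholders only.
C = np.array([[0.0, 1.0, 2.0],
              [4.0, 0.0, 1.0],
              [8.0, 4.0, 0.0]])

def cost_sensitive_softmax(logits, labels, cost=C):
    """Per-sample gradient and diagonal Hessian of the expected
    misclassification cost L = sum_k p_k * cost[y, k], the quantities
    an XGBoost custom objective returns from the raw margins."""
    z = logits - logits.max(axis=1, keepdims=True)        # stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    c = cost[labels]                                      # (n, K): cost row per sample
    e = (p * c).sum(axis=1, keepdims=True)                # expected cost E
    grad = p * (c - e)                                    # dL/ds_j = p_j (C[y,j] - E)
    hess = p * (c - e) * (1.0 - 2.0 * p)                  # d2L/ds_j^2 (diagonal)
    hess = np.maximum(hess, 1e-6)                         # clamp for boosting stability
    return grad, hess

# For a true ICU case (label 2) with uninformative logits, the gradient
# pushes the ward logit down and the ICU logit up: under-triage is
# penalized more strongly than over-triage.
grad, hess = cost_sensitive_softmax(np.zeros((1, 3)), np.array([2]))
```

Such a function would be wired into XGBoost via the custom-objective argument of its training API, with grad and hess flattened according to the library's multiclass convention.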

Classification performance with HC and RC

In our model, the trade-off between staffing resources and under-triage risk is mediated by the factor ζ, which determines the relative weight of cost considerations in this relationship. On one hand, the relative cost matrix (RC) operationalizes nurse-to-patient ratios (ward 10:1, IMC 4:1, ICU 2:1) as a transparent proxy for relative nursing resource intensity and is row-normalized. This choice is supported by the health services literature, in which nursing coverage/workload is explicitly used as a proxy for ICU resource utilization16. Moreover, nursing staffing constitutes a major component of ICU costs and has been linked to workload-based resource planning approaches17. RC is not intended to represent total economic cost, which also depends on physician staffing, monitoring infrastructure, medication use, institutional accounting, and opportunity costs; these components would require institution-specific microcosting approaches, for which nursing activity measures have been proposed as key inputs18.

On the other hand, the harm-cost matrix (HC) weights under-triage errors using inverse class frequencies in the development cohort to penalize misclassification of rare ICU/IMC outcomes more strongly. This frequency-based weighting is a pragmatic proxy for unequal consequences of misclassification and should not be interpreted as a direct measure of clinical severity. The final cost matrix was computed as a ζ-weighted combination of RC and HC (full equations in Supplementary Methods).
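To make the two matrices concrete, the sketch below constructs a row-normalized RC from the stated staffing ratios and an inverse-frequency HC that penalizes only under-triage. The class prevalences and the linear ζ-blend in combined_cost are illustrative assumptions; the paper's exact ζ-weighting is given in its Supplementary Methods.

```python
import numpy as np

# Relative cost (RC): staffing ratios as a resource proxy. One nurse
# covers 10 ward, 4 IMC, or 2 ICU patients, so per-patient nursing
# intensity scales as 1/10, 1/4, 1/2; rows are then normalized.
intensity = np.array([1 / 10, 1 / 4, 1 / 2])     # ward, IMC, ICU
RC = np.tile(intensity, (3, 1))                  # cost depends on predicted level
RC = RC / RC.sum(axis=1, keepdims=True)          # row-normalized, as described

# Harm cost (HC): inverse class frequencies penalize under-triage of the
# rarer IMC/ICU outcomes (prevalences here are illustrative).
freq = np.array([0.63, 0.25, 0.12])              # ward, IMC, ICU prevalence
HC = np.zeros((3, 3))
for true_cls in range(3):
    for pred_cls in range(3):
        if pred_cls < true_cls:                  # predicted below true acuity
            HC[true_cls, pred_cls] = 1.0 / freq[true_cls]

def combined_cost(zeta):
    # Hypothetical linear blend; the paper's exact zeta-weighting may differ.
    return RC + zeta * HC
```

With this form, strongly negative ζ makes escalation prohibitively expensive (all-ward behavior), while large positive ζ makes under-triage dominant (escalation toward ICU), matching the qualitative behavior reported in the Results.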

Practical guidance for institution-specific calibration of ζ (data requirements, validation workflow, and ζ-sensitivity analysis) is provided in the Supplementary Methods.

Cross-validation and performance evaluation

We evaluated our classifier on both an internal and an external hold-out dataset using a repeated stratified cross-validation paradigm. After training, we computed probabilistic predictions on the held-out test fold and on the entire external cohort. We quantified discrimination using the area under the cost-weighted multiclass ROC surface (AUCµ) and the class-imbalance-aware F1 score30,31. Specifically, we recorded both a single weighted F1 (averaging per-class F1 by ground-truth support) and the individual F1 for each of the three outcome classes. Confusion-matrix counts and runtimes were also logged. All fold-level metrics were exported for downstream summary and statistical comparison across ζ settings. For external evaluation, each of the 25 trained cross-validation models was applied to the independent evaluation cohort, yielding a distribution of external-cohort performance metrics across models; we report the mean ± SD and 95% confidence interval for the mean across these 25 model evaluations.
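The weighted-F1 summary described above (per-class F1 averaged by ground-truth support) can be computed directly from confusion counts; a minimal NumPy sketch with a toy three-class example:

```python
import numpy as np

def per_class_and_weighted_f1(y_true, y_pred, n_classes=3):
    """Per-class F1 plus a single F1 averaged by ground-truth support,
    computed directly from confusion counts."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    f1 = np.zeros(n_classes)
    support = np.zeros(n_classes)
    for k in range(n_classes):
        tp = np.sum((y_pred == k) & (y_true == k))
        fp = np.sum((y_pred == k) & (y_true != k))
        fn = np.sum((y_pred != k) & (y_true == k))
        denom = 2 * tp + fp + fn
        f1[k] = 2 * tp / denom if denom else 0.0
        support[k] = np.sum(y_true == k)
    weighted = float(np.sum(f1 * support) / support.sum())
    return f1, weighted

# Toy cohort: ward (0), IMC (1), ICU (2)
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2, 1]
f1, wf1 = per_class_and_weighted_f1(y_true, y_pred)
```

The support-weighted average keeps the summary interpretable under the marked class imbalance between ward, IMC, and ICU outcomes.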

Feature importance

In our analysis, we used two complementary approaches to assess feature importance from the trained XGBoost models: gain-based importance and SHAP values.

Gain importance reflects the total gain in the objective function brought by each feature when it is used in a decision tree split across all boosting rounds. Specifically, we extracted gain values via model.get_score(importance_type='gain'), which provides a global measure of feature utility based on the training data and the structure of the fitted trees.

To complement this with a more individualized and model-agnostic explanation, we computed SHAP values19. SHAP is a unified framework grounded in cooperative game theory that attributes a contribution value to each feature for a given prediction. For multiclass outputs, SHAP produces class-specific explanations that we aggregated across classes and samples by averaging the absolute SHAP values, yielding a robust per-feature importance estimate tailored to the input data. This dual approach provides both a training-time (gain) and a prediction-time (SHAP) perspective on the relevance of each input feature.
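The class-and-sample aggregation described above reduces to a mean of absolute SHAP values. Assuming class-specific attributions arranged as (classes, samples, features), as a tree explainer typically returns one array per class for multiclass models, a minimal sketch:

```python
import numpy as np

def aggregate_multiclass_shap(shap_values):
    """Collapse class-specific SHAP attributions into a single importance
    per feature by averaging absolute values over classes and samples.
    Expects shape (n_classes, n_samples, n_features)."""
    shap_values = np.asarray(shap_values, dtype=float)
    return np.abs(shap_values).mean(axis=(0, 1))  # -> (n_features,)

# Toy attributions: 2 classes, 3 samples, 2 features
toy = [[[0.2, -0.1], [0.4, 0.0], [-0.6, 0.1]],
       [[0.1, 0.0], [-0.3, 0.2], [0.5, -0.2]]]
importance = aggregate_multiclass_shap(toy)  # per-feature importance
```

Taking absolute values before averaging prevents positive and negative attributions from canceling, so the result reflects each feature's overall influence rather than its direction.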

Statistics

Data were collected and processed in Microsoft Excel (Microsoft Corp., Redmond, WA, USA) and subjected to descriptive and statistical analysis using GraphPad Prism (version 10.1.0; GraphPad Software LLC, Boston, MA, USA) and the scipy package (version 1.11.2)20,21.

The comparison between ratios of occurrence relied on chi-square testing for categorical variables with more than two values, and Fisher’s exact test for 2 × 2 comparisons. Comparisons of continuous variables between two groups relied on two-sided unpaired t-tests with Welch’s correction. In all cases, a p-value < 0.05 was considered statistically significant. Additional details about the statistical analysis are provided in the figure legends. Absolute values are provided as mean ± standard deviation (s.d.), unless otherwise noted in the figure legends.

For figure design, bar graphs, box plots, and other data illustrations were exported as vector graphics from Python and GraphPad Prism, and were arranged in Adobe Illustrator and Photoshop (Adobe Inc., San Jose, CA, USA).

Ethics approval

The study was performed in accordance with the Declaration of Helsinki. The design of this study, as well as the retrospective collection and analysis of patient data, were approved by the institutional review board. The requirement of informed consent for collection and processing of pseudonymized patient data was waived.

Results

Demographics and comorbidities

The main cohort comprised 1072 patients, with an independent evaluation sample of 81 patients. Demographics and comorbidities were similar between cohorts (Table 1), except for higher rates of cardiovascular and other diseases in the evaluation set (p < 0.001; p = 0.0235). These distribution shifts indicate a measurable change in case mix between cohorts and motivate a cautious interpretation of external performance estimates, particularly for the rarer ICU class.

Tumor-related features

Radiological and tumor-related features likewise showed substantial concordance (Table 2). Mean tumor volume did not differ (21.5 ± 31.2 vs. 19.8 ± 27.2 mL, p = 0.607), nor did rates of hydrocephalus (5.7% vs. 3.7%, p = 0.616) or midline shift (17.4% vs. 25.9%, p = 0.074). However, tumor location was more often supratentorial in the main cohort (73.6% vs. 59.3%, p = 0.002), and the distribution of suspected diagnoses differed (p = 0.0001), with fewer vestibular schwannomas and low-grade gliomas in the evaluation data.

ICU and IMC events

Postoperative ICU and IMC events were generally rare and occurred at a similar frequency (Table 3). Regarding the ICU events, rates of cardiopulmonary resuscitation, cerebrospinal fluid drainage, immediate operative revision, catecholamine use, reintubation, dysphagia, prolonged mechanical ventilation, and impaired consciousness showed no significant differences between groups except for “medication to lower intracranial pressure,” which occurred in none of the evaluation patients (p = 0.0437). Among IMC events, IV antihypertensive administration (23.5% vs. 7.4%, p = 0.0004), new cranial nerve deficits (6.9% vs. 22.2%, p < 0.0001), severe hemiparesis (4.5% vs. 0%, p = 0.0437), and worsening mNIHSS by ≥ 2 points (7.0% vs. 0%, p = 0.0079) were more frequent in the main cohort. At the same time, seizures did not differ significantly (p = 0.0676).

The proportions of patients classified as requiring IMC (i.e., experiencing at least one IMC event without ICU transfer) versus those classified as requiring ICU care (i.e., experiencing at least one ICU event) were similar (p = 0.79 and p = 0.22, respectively).

Classification performance

To identify the model’s optimal operating range, we performed a coarse sweep over ζ from −5 to +5 and recorded the resulting fractions of patients allocated to the three classes (Fig. 2A). For both the main and the independent evaluation cohort, the curves trace a smooth transition: at ζ ≈ −5 the model sends almost all patients to the ward, since any upward reclassification would incur prohibitive resource-use penalties; as ζ increases toward zero, the penalty for under-triage grows relative to resource cost, and the IMC curve rises first, peaking around ζ ≈ +0.5, where a roughly equal mix of ward and IMC assignments maximizes balanced coverage. Beyond ζ ≈ +1.0, the ICU curve ascends steeply: at these settings, the harm-cost term dominates, so even marginal cases are escalated. Crucially, the near-perfect overlap between the main and evaluation curves confirms that our cost-driven decision logic replicates well on held-out data.

Fig. 2

Cost-sensitive triage performance and optimal ζ selection. (A) Fractions of patients assigned to regular ward (black), intermediate-care unit (IMC; blue), or intensive care unit (ICU; red) as the cost-sensitivity parameter ζ varies from −5 to +5 in increments of 0.1. Solid lines denote the development cohort (n = 1072), and lighter markers with error bars denote the independent evaluation cohort (n = 81). When ζ is highly negative, prohibitive over-triage costs drive almost all assignments to the ward; as ζ increases, IMC assignments peak in the shaded “tuning” region (ζ ≈ 0.5–1.5), before ICU allocations dominate at high ζ. Error bars represent ± s.d. across 25 cross-validation folds. (B) Histogram of ζ inflection points, defined as the ζ value at which the derivative of the total over-triage rate reaches its maximum, across 50 bootstrapped cross-validation runs. The median (green dashed line, 0.975) and mean (red dashed line, 0.833) are indicated. Clinically, this inflection point serves as an operating-point heuristic within the tuning range: it marks the region where further increases in ζ begin to yield diminishing reductions in under-triage while disproportionately increasing over-triage (i.e., escalation to higher-acuity care), reflecting a practical “diminishing returns” trade-off between under-triage risk and resource use. (C) Stacked plot of under-triage (red), correct allocation (green), and over-triage (yellow) rates as functions of ζ, illustrating how the balance of error types shifts smoothly across the tuning range. (D, E) Nine-cell confusion matrices for representative ζ settings (−2.0, −1.5, −1.0, −0.5, 0.0, +1.0, +1.5, +2.0) in (D) the development cohort and (E) the evaluation cohort. In each matrix, rows correspond to the true acuity class (ward, IMC, ICU), and columns correspond to the predicted class; cell shading reflects the proportion of patients in each category.
At ζ = −2.0, under-triage (lower-left cells) is maximal, whereas at ζ = +1.0, diagonal dominance indicates balanced, majority-correct triage in both datasets. Detailed confusion matrices for the main and evaluation cohorts are further supplied in supplementary Figures S1 and S2.

Figure 2D–E shows confusion matrices at representative ζ values. At negative ζ, more than 90% of true IMC and ICU cases are assigned to lower-acuity settings (under-triage), illustrating the clinical risk when the objective function heavily penalizes resource utilization. As ζ increases toward 0.0, correct regular-ward classifications increase, and IMC classification performance begins to recover. At ζ = +1.0, class-wise correct allocation reaches approximately 50% or more in the development cohort (with comparable behavior in the evaluation cohort; Fig. 2D–E). Between these extremes, performance changes only modestly across a plateau (approximately ζ = 0.5–1.5). We therefore performed a fine-grained sweep in this interval (Δζ = 0.025) to characterize the under- versus over-triage trade-off (Fig. 2C).

Across bootstrapped resamples, the median inflection-point estimate was ζ = 0.975 (Fig. 2B), which was used as the operating point for subsequent analyses. The relevant performance metrics are listed in Table 4, and their dynamics are depicted in supplementary Fig. S3. Because the evaluation cohort contained few ICU-level events (6/81), class-specific ICU metrics are sensitive to small absolute-count changes. We therefore summarize external-cohort performance across 25 independently trained models (mean ± SD and 95% CI of the mean; Supplementary Table S1).
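The inflection-point definition used here (the ζ at which the derivative of the over-triage rate is maximal) amounts to a finite-difference argmax over the sweep grid. A minimal sketch on a synthetic, sigmoid-shaped over-triage curve (the curve shape and its center are invented for illustration):

```python
import numpy as np

def zeta_inflection(zetas, over_triage_rate):
    """Zeta at which the finite-difference derivative of the
    over-triage rate reaches its maximum."""
    d = np.gradient(over_triage_rate, zetas)
    return float(zetas[int(np.argmax(d))])

# Synthetic sigmoid-shaped over-triage curve with its steepest rise
# near zeta = 1.0 (shape and center invented for illustration).
zetas = np.arange(-5.0, 5.0 + 0.025, 0.025)
rate = 1.0 / (1.0 + np.exp(-4.0 * (zetas - 1.0)))
z_star = zeta_inflection(zetas, rate)
```

Applied per bootstrap resample, this yields the distribution of inflection points summarized in Fig. 2B.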

Table 4 Main performance characteristics for ζ = 0.975 (mean ± SD).

Relative importance of prognostic features

Understanding which features most influence triage decisions is crucial for model transparency and clinical trust. To this end, we employed two complementary metrics (GAIN and SHAP) to identify the variables driving acuity assignments. Specifically, we evaluated how the decision is determined at our optimal ζ = 0.975 and how these importances evolve as ζ diverges from this value.

The GAIN-derived importance scores (Fig. 3A, left side, blue bars) quantify the total information gain each feature contributes to the model’s splits. Operative duration was the predominant predictor by both GAIN and SHAP, followed by tumor volume and surgical position. As operative duration is definitively known only after completion of surgery, the present model is primarily applicable to immediate postoperative disposition rather than preoperative scheduling; future iterations will therefore evaluate reduced feature-set (preoperative-only) variants. Together with BMI and patient age, these variables comprise the top five factors, followed by intraoperative ranges of relative risk, suspected tumor entity, and ASA status. Importantly, rankings are highly stable across the 50 repeated bootstrap samples.

Fig. 3

Feature importance of Neuro-TACTIC and its dependence on the cost-sensitivity parameter ζ. (A) Global importance of the ten most influential features in the development and validation cohorts (n = 1072, n = 81) at the optimal ζ = 0.975. GAIN-based importance values (blue bars, plotted to the left) are computed on each training split during cross-validation and are therefore only available for the main dataset; features are sorted in descending order of mean GAIN. Mean absolute SHAP values are shown for both the development cohort (red bars) and the independent evaluation cohort (black bars), capturing each feature’s average contribution to individual predictions across all samples. Surgery duration ranks highest by both metrics, followed by tumor volume, surgical position, body mass index (BMI), and patient age. Error bars denote mean ± s.d. across 25 cross-validation folds. (B–E) Evolution of feature importance across ζ values between 0.5 and 1.5 (the “tuning” region) for the four top-ranked variables: surgical duration (B), tumor volume (C), surgical position (D), and BMI (E). In each panel, red circles and solid lines indicate GAIN-based importance, while blue squares and dashed lines represent mean absolute SHAP values, both averaged over 25 cross-validation runs (error bars represent mean ± 1 s.d.). As ζ increases (greater emphasis on avoiding under-triage), gain importance for each feature rises, indicating that higher ζ settings drive the model to rely more heavily on these predictors to escalate care, whereas SHAP importance peaks at lower ζ, reflecting that when resource cost is prioritized, small changes in these features more strongly influence individual acuity assignments.

SHAP-derived importance scores (Fig. 3A, right side, red for the main dataset, black for the evaluation set) confirm these findings. SHAP values measure the average absolute change in the model’s output attributable to each feature. Again, surgery duration leads, driving roughly 0.15 units of predicted acuity shift on average, with tumor volume and surgical position close behind. BMI and age occupy an intermediate tier, while tumor location and ASA status have comparatively marginal effects. Crucially, the similarity between the main and evaluation datasets indicates that the model generalizes well to the held-out data.

To assess the changes in feature importance over the ζ continuum, we derived the respective GAIN and SHAP values from 0.5 to 1.5 in increments of 0.025 for the four features most important at ζ = 0.975 (Fig. 3, bottom panels). All four features exhibit a characteristic “crossover” behavior. For example, the GAIN importance of surgery duration steadily increases with ζ: when harm costs dominate, the model places more emphasis on procedural time to decide whether to escalate care. Conversely, its SHAP importance peaks toward the lower end of the ζ range, where small changes in duration can tip the scale from ward to IMC or ICU under stricter resource constraints. Tumor volume and surgical position display the same inverse relationship between GAIN and SHAP curves, while BMI’s influence remains relatively muted overall but still shifts subtly with ζ.

Discussion

Neuro-TACTIC introduces a cost-sensitive, three-tiered approach to postoperative neurosurgical triage. By defining a relative cost matrix that encodes personnel expenditure for regular wards (10:1 patient-to-nurse), IMC (4:1), and ICU (2:1) settings and combining it with a harm-cost matrix reflecting event rarity, the model’s ζ parameter smoothly navigates between under- and over-triage priorities, yielding a plateau (≈0.5–1.5) where IMC assignments peak. In our current dataset, we consistently identified ζ = 0.975 as a good trade-off between resource conservation and under-triage risk and fixed this value for all subsequent analyses.

All previous models were based on a binary ICU-versus-non-ICU framework and therefore overlook the growing middle tier of intermediate care1,2,10,22. For example, Hanak and colleagues applied a logistic regression model to 400 elective craniotomy cases and found that only age and diabetes predicted postoperative ICU requirements, without providing any gradation in intermediate monitoring need22. Similarly, Munari et al. developed a simple point score combining ASA-PS, tumor volume, and surgery duration and reported an ROC-AUC of 0.774 for ICU admission, whereas CranioScore yielded an ROC-AUC of 0.70. Follow-up validations on novel cohorts reproduced discrimination in the 0.65–0.75 range, yet both tools collapsed regular-ward and IMC candidates into a single low-acuity category.

Even ML efforts that moved beyond linear models remained binary, consistently outperforming regression models10,23, but they too produced only ICU versus non-ICU recommendations10,23. Additionally, none of these prior approaches included a transparent lever to adjust sensitivity versus specificity in accordance with local staffing ratios or definitions of which events truly mandate ICU care. By contrast, Neuro-TACTIC’s three-level output explicitly accommodates intermediate care and introduces ζ as a cost-sensitive tuning parameter.

Essentially, our approach formalizes what expert clinicians implicitly do: consider placing every patient in the ICU as the upper bound of safety (corresponding to large ζ), and then use a cost-sensitive framework to reduce ICU admissions to the minimum necessary. Notably, because our clinical setting does not routinely require ICU admission, the patients included in this study had been classified as “high-risk” by experienced clinicians.

In the retrospective cohorts analyzed, the model produced assignment distributions consistent with reduced ICU allocation at selected cost settings. These findings demonstrate the feasibility of cost-sensitive triage modeling. Routine ICU care for elective craniotomy has been shown to consume multiple times the staff and cost of ward or IMC care, yet benefits only a small minority of patients3,4,24,25. A recent study examined 1,070 consecutive craniotomy patients, of whom 674 were monitored overnight in an ICU and 396 were observed postoperatively in a PACU before transfer to the regular ward; the incidence of any relevant event within 24 h postoperatively was virtually identical (4% in ICU, 2.8% in PACU). In our cohorts, the overall rate of ICU-level events was substantially higher (12.4% in the main dataset and 7.4% in the evaluation set), reflecting either a different case mix (e.g., markedly more frequent infratentorial surgeries in our collective) or more inclusive event monitoring; similar rates of postoperative ICU-level events have been reported by others24.

In practice, a hospital that classifies minor neurological deficits as ICU-level events can increase the corresponding under-triage penalty, whereas a facility with constrained ICU capacity can increase the resource-cost weight associated with ICU allocation. Importantly, despite measurable case-mix differences between the development and evaluation cohorts, the model exhibited similar ζ-dependent allocation trajectories and operating-point behavior across datasets. At ζ = 0.975, assignment distributions were comparable between cohorts and reflected lower ICU allocation at this operating point. These findings suggest stable model behavior under the observed cohort shift, while acknowledging that ICU-class performance estimates in the evaluation cohort remain imprecise due to the low number of ICU events. Performance in the independent evaluation cohort was modest for the IMC and ICU classes, which is expected given marked class imbalance and only six ICU-level events. Nonetheless, our findings support the feasibility and stability of the cost-sensitive multiclass framework; they do not justify autonomous clinical deployment at this stage. Because ζ explicitly encodes institution-specific trade-offs, there is no single universal AUC/F1 threshold for implementation; rather, prospective translation would require pre-specified safety/utility targets agreed with stakeholders (e.g., constraints on ICU under-triage) and evaluation against current standard practice.

To understand which factors drive Neuro-TACTIC’s triage recommendations, we assessed two complementary measures of feature importance across our XGBoost models (GAIN and SHAP). Gain- and SHAP-based analyses both ranked operative duration, tumor volume, and surgical position as top predictors (Fig. 3A). Emphasizing these features is consistent with numerous studies assessing risk factors for postoperative ICU events (for a summary, see Table 5)1,2,10,22,23,24,26.

Table 5 Qualitative comparison of whether key predictors identified by Neuro-TACTIC were evaluated or reported in prior studies of postoperative monitoring/ICU disposition after elective craniotomy. “X” indicates the variable was explicitly included or reported; blank indicates not reported.

Limitations

Several limitations warrant careful consideration. First, the dataset underlying the computational framework was derived from patients who had already been pre-allocated to the ICU by the treating surgeon, representing a clinically enriched population rather than an unselected elective craniotomy cohort. Such enrichment may influence baseline ICU/IMC event frequencies and potentially affect discrimination characteristics. However, published series of elective craniotomy report substantial variability in postoperative ICU-level event rates, ranging from low single-digit unplanned ICU admissions in selectively monitored PACU cohorts to rates in the low teens for adverse events requiring intensive care intervention3,27,28. Accordingly, while our cohort reflects a higher-risk population, the observed ICU-level event frequencies remain within ranges described in the literature. Nonetheless, extrapolation to broader, lower-risk elective populations should be undertaken cautiously.

In addition, the single-center, retrospective design may limit generalizability, as local practice patterns (such as criteria for IMC admission versus ward observation), perioperative care protocols, and definitions of “ICU-level” events vary across institutions and healthcare systems. The model’s decision boundaries and ζ-derived trade-offs may therefore require recalibration before application elsewhere, although the tunable ζ parameter is designed to facilitate such site-specific adjustment. Neuro-TACTIC should accordingly be interpreted as a methodological proof of concept rather than a clinical decision rule.

Second, the relative rarity of true ICU-level complications (class imbalance) poses challenges to robustly estimating sensitivity. Although cross-validation and our held-out evaluation cohort (n = 81) yielded similar assignment rates, the small absolute number of positive events means that confidence intervals for each individual adverse outcome remain wide. This limits our ability to explore more granular event subtypes (e.g., reintubation vs. catecholamine use), each of which may warrant its own risk profile and cost parameter. For meaningful external validation of ICU-class performance, future cohorts should include substantially larger numbers of ICU-level events (ideally ≥ 50), to allow reasonably precise estimation of ICU sensitivity/F1; this will likely require prospective multicenter data collection.
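The imprecision introduced by so few events can be made concrete with a standard Wilson score interval. The event counts below are hypothetical and chosen only to show how wide a 95% interval becomes when only six ICU-level events are available:

```python
# Sketch of why six ICU-level events yield imprecise sensitivity estimates.
import math

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for an observed proportion successes/n."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# Hypothetical: 5 of 6 true ICU-level events detected. The interval for
# sensitivity then spans roughly half the unit range:
lo, hi = wilson_ci(5, 6)
print(f"95% CI ({lo:.2f}, {hi:.2f})")  # → roughly (0.44, 0.97)
```

With ≥ 50 events, as suggested above, the same calculation narrows the interval several-fold, which is why prospective multicenter collection is proposed before ICU-class performance claims can be made with precision.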

Third, while SHAP and gain-based analyses reliably identify the most influential features, the complex feature interactions underlying individual predictions are not immediately transparent to end users. Further work is therefore needed to translate these insights into intuitive decision aids.

Finally, our current framework focuses exclusively on preoperative and intraoperative variables to predict immediate postoperative care needs. It does not account for dynamic changes during the recovery period that might further refine acuity assignments. Because operative duration is only fully known at the end of surgery, the current Neuro-TACTIC instantiation is primarily applicable to immediate postoperative disposition decisions (ward vs. IMC vs. ICU) rather than preoperative scheduling. This is a limitation, as earlier risk stratification would be desirable for operational planning and capacity management. Future iterations will therefore evaluate reduced feature-set variants (e.g., preoperative-only models) to enable preoperative scheduling decisions, consistent with our proof-of-concept positioning.

Conclusions

Neuro-TACTIC introduces a three-tier, cost-sensitive modeling framework for postoperative neurosurgical triage that explicitly incorporates intermediate care alongside regular ward and intensive care unit assignment. By encoding relative resource-use and harm considerations into the tunable parameter ζ, the framework enables systematic exploration of trade-offs between under- and over-triage in retrospective data. In the cohorts analyzed, a ζ value of 0.975 was selected as the operating point, yielding stable assignment behavior in both the development (n = 1,072) and independent evaluation (n = 81) datasets.

Across models, operative duration, tumor volume, surgical position, body mass index, and patient age emerged as the most influential predictors, consistent with established risk factors reported in prior studies. This concordance supports the framework’s interpretability and its alignment with known clinical determinants of postoperative acuity.

The present study is intended as a methodological proof of concept. Prospective, multicenter validation, incorporation of dynamic perioperative data, and evaluation of clinical and economic outcomes will be necessary to determine the utility of Neuro-TACTIC before any clinical application.