Table 2 Model performance across test sets

From: Prediction of functional outcomes in aneurysmal subarachnoid hemorrhage using pre-/postoperative noncontrast CT within 3 days of admission

Cohort / model	Regression task (mRS score)	Classification task (poor functional outcome)			
	MAE	AUC	Sensitivity (%)	Specificity (%)	PPV (%)
AH
Pre-operative model	1.46 (1.35–1.58)	0.82 (0.76–0.88)	76.8 (67.5–84.9)	79.0 (72.6–85.6)	64.8 (55.7–74.1)
Post-operative model	1.17 (1.07–1.28)	0.86 (0.80–0.91)	68.3 (57.8–78.4)	86.4 (81.1–91.2)	72.0 (62.1–81.1)
Stacking imaging model	0.98 (0.86–1.10)	0.87 (0.82–0.91)	65.8 (55.1–75.6)	88.3 (82.8–93.2)	73.8 (63.3–82.7)
Clinical model	1.10 (0.95–1.25)	0.84 (0.78–0.89)	57.4 (46.9–67.1)	88.9 (83.8–93.3)	72.3 (60.0–83.6)
Fusion model	0.88 (0.76–0.99)	0.90 (0.85–0.93)	64.7 (54.2–74.6)	91.4 (87.0–95.5)	79.1 (69.2–88.7)
FY
Pre-operative model	1.41 (1.30–1.53)	0.82 (0.76–0.88)	75.2 (65.2–83.8)	78.2 (71.7–84.8)	67.0 (57.1–76.5)
Post-operative model	1.07 (0.98–1.17)	0.90 (0.85–0.94)	83.2 (75.0–90.8)	79.6 (72.9–85.9)	70.2 (61.7–78.7)
Stacking imaging model	0.91 (0.79–1.02)	0.91 (0.86–0.95)	83.2 (75.0–90.5)	81.5 (75.0–87.3)	72.6 (63.2–81.2)
Clinical model	0.94 (0.81–1.08)	0.88 (0.84–0.92)	61.6 (51.8–72.4)	92.8 (88.4–96.6)	83.6 (74.2–91.9)
Fusion model	0.73 (0.63–0.84)	0.93 (0.89–0.96)	81.8 (73.1–89.5)	91.7 (86.9–95.9)	85.0 (76.9–92.3)
TL
Pre-operative model	1.47 (1.35–1.58)	0.88 (0.83–0.92)	77.7 (68.4–86.7)	85.3 (79.4–91.1)	77.8 (68.3–86.2)
Post-operative model	1.12 (1.01–1.23)	0.93 (0.89–0.96)	86.3 (78.2–93.4)	87.7 (81.7–93.0)	82.2 (74.1–90.1)
Stacking imaging model	0.87 (0.76–0.99)	0.93 (0.89–0.96)	85.4 (77.5–92.2)	88.5 (82.9–93.9)	83.2 (74.7–90.7)
Clinical model	0.96 (0.81–1.12)	0.89 (0.83–0.94)	64.4 (53.9–74.7)	94.3 (89.8–97.7)	88.1 (78.9–95.5)
Fusion model	0.74 (0.63–0.85)	0.94 (0.90–0.97)	84.1 (75.6–91.5)	93.5 (88.4–97.5)	89.6 (82.2–96.1)
Test-Combined
Pre-operative model	1.45 (1.38–1.52)	0.84 (0.80–0.87)	76.3 (71.2–81.7)	80.5 (76.8–84.3)	69.5 (64.2–75.3)
Post-operative model	1.12 (1.06–1.18)	0.89 (0.87–0.92)	79.2 (74.2–84.3)	84.5 (81.0–87.8)	74.7 (69.3–79.8)
Stacking imaging model	0.92 (0.86–0.99)	0.90 (0.87–0.92)	78.0 (72.8–83.2)	86.2 (82.6–89.2)	76.6 (71.1–81.7)
Clinical model	1.00 (0.92–1.09)	0.87 (0.84–0.90)	60.9 (54.8–66.9)	91.8 (89.2–94.2)	81.0 (75.0–86.3)
Fusion model	0.79 (0.72–0.85)	0.92 (0.90–0.94)	76.9 (71.5–82.0)	92.0 (89.4–94.7)	84.7 (79.9–89.1)

Model performance metrics for predicting mRS score and poor functional outcome in the different test cohorts. Values are shown with 95% confidence intervals in parentheses. Test-Combined includes pooled data from the AH, FY, and TL cohorts. Sensitivity, specificity, and PPV were calculated using a threshold of 2.5 for binary classification. All performance metrics and their corresponding confidence intervals were estimated using 1000 bootstrap resamples and are reported as mean values.
MAE Mean Absolute Error, AUC Area Under the Receiver Operating Characteristic Curve, PPV Positive Predictive Value, mRS Modified Rankin Scale, WN First Affiliated Hospital of Wannan Medical College, AH First Affiliated Hospital of Anhui Medical University, FY Fuyang People’s Hospital, TL Tongling People’s Hospital.
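
The procedure described in the table note (binarizing at a threshold of 2.5, then summarizing each metric over 1000 bootstrap resamples) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes that the threshold is applied to the mRS score so that values above 2.5 count as a poor functional outcome, and that the 95% confidence interval is a percentile interval; neither detail is stated in the note. The function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def mean_absolute_error(true_mrs, predicted_mrs):
    """MAE between observed and predicted mRS scores."""
    return mean(abs(t - p) for t, p in zip(true_mrs, predicted_mrs))


def classification_metrics(true_mrs, predicted_mrs, threshold=2.5):
    """Sensitivity, specificity, and PPV after binarizing at `threshold`.

    Assumption: a score above the threshold means "poor functional outcome".
    A metric with a zero denominator is returned as None.
    """
    tp = fp = tn = fn = 0
    for t, p in zip(true_mrs, predicted_mrs):
        actual_poor = t > threshold
        predicted_poor = p > threshold
        if actual_poor and predicted_poor:
            tp += 1
        elif actual_poor:
            fn += 1
        elif predicted_poor:
            fp += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


def bootstrap_summary(true_mrs, predicted_mrs, metric, n_resamples=1000, seed=0):
    """Mean and percentile 95% interval of `metric` over bootstrap resamples.

    Assumption: a percentile interval is used; resamples where the metric is
    undefined (None) are skipped.
    """
    rng = random.Random(seed)
    n = len(true_mrs)
    values = []
    for _ in range(n_resamples):
        # Resample patients with replacement, keeping each pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([true_mrs[i] for i in idx], [predicted_mrs[i] for i in idx])
        if value is not None:
            values.append(value)
    values.sort()
    lower = values[int(0.025 * (len(values) - 1))]
    upper = values[int(0.975 * (len(values) - 1))]
    return mean(values), lower, upper


# Hypothetical toy data, for illustration only (not from the study).
toy_true = [0, 1, 2, 3, 4, 5, 6, 1]
toy_pred = [0.4, 1.2, 2.8, 3.1, 2.2, 4.9, 5.5, 0.9]

if __name__ == "__main__":
    print(classification_metrics(toy_true, toy_pred))
    print(bootstrap_summary(
        toy_true, toy_pred,
        lambda t, p: classification_metrics(t, p)["sensitivity"],
    ))
```

Passing `mean_absolute_error` as the `metric` argument gives the corresponding summary for the regression task. AUC is omitted here to keep the sketch dependency-free.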
