Fig. 2: Prompt development and initial model evaluations.

a Structure of the prompt framework, comprising the objective definition, European Perioperative Clinical Outcome (EPCO) diagnostic criteria for 22 postoperative complications, a structured JavaScript Object Notation (JSON) output format, and clinical data components (general information, postoperative medical records, abnormal test results, and examination results). b Token count distributions of clinical documentation, showing substantial heterogeneity between medical centers: Center 2 had a broader distribution, whereas Center 1 was more concentrated (Center 1 mean: 6841.6 tokens; Center 2 mean: 8084.3 tokens). c Example of the basic structured JSON output without Chain-of-Thought (CoT) prompting, showing direct complication identification and severity grading. d Initial performance of multiple state-of-the-art language models, reported as micro-averaged metrics with confidence intervals from five repeated inferences (macro-averaged results in Supplementary Fig. 1). Reasoning models outperformed general models, and several AI models exceeded human expert benchmarks. e Example of CoT-enhanced JSON output, in which a “think” field records the diagnostic reasoning and makes the clinical decision process transparent. f Performance comparison after CoT implementation, reported as micro-averaged metrics with 95% confidence intervals from five repeated inferences and patient-level bootstrap paired testing (macro-averaged results in Supplementary Fig. 2; detailed statistics, effect sizes, and CIs in Supplementary Tables 1–2). Asterisks indicate significance levels from bootstrap paired testing: * p < 0.05, ** p < 0.01, *** p < 0.001. General models improved significantly, while reasoning models maintained consistently high F1 score, recall, and precision.
Figure created using the Python matplotlib library; final composition assembled using Canva (www.canva.com).
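
For readers unfamiliar with the statistics named in panels d and f, the following is a minimal sketch of micro-averaged F1 and a patient-level paired bootstrap. It is not the authors' analysis code. The per-patient `(tp, fp, fn)` count representation, the function names `micro_f1` and `paired_bootstrap`, the number of resamples, and the percentile-based interval and p-value are all assumptions made for illustration, and the averaging over five repeated inferences is not modelled.

```python
import random


def micro_f1(counts):
    """Micro-averaged F1 from per-patient (tp, fp, fn) tuples.

    Counts are pooled over all patients (and, implicitly, all
    complication types) before the score is computed.
    """
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def paired_bootstrap(counts_a, counts_b, n_boot=2000, seed=0):
    """Patient-level paired bootstrap of the micro-F1 difference (b - a).

    counts_a and counts_b hold the counts for the same patients under two
    conditions (hypothetically: without and with CoT prompting). Each
    resample draws patients with replacement and applies the same draw to
    both conditions, which keeps the comparison paired.
    Returns (observed difference, (ci_low, ci_high), two-sided p-value).
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both conditions must cover the same patients")
    rng = random.Random(seed)
    n = len(counts_a)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        f1_a = micro_f1([counts_a[i] for i in idx])
        f1_b = micro_f1([counts_b[i] for i in idx])
        diffs.append(f1_b - f1_a)
    diffs.sort()
    # Percentile 95% interval (assumed; other interval types exist).
    ci = (diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1])
    # Two-sided p-value from the share of resamples on either side of zero.
    frac_le = sum(d <= 0 for d in diffs) / n_boot
    frac_ge = sum(d >= 0 for d in diffs) / n_boot
    p_value = min(1.0, 2 * min(frac_le, frac_ge))
    observed = micro_f1(counts_b) - micro_f1(counts_a)
    return observed, ci, p_value
```

Resampling whole patients rather than individual complication labels respects the fact that the labels within one patient record are not independent, which is presumably why the caption specifies a patient-level test.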