Introduction

Bile duct strictures are a common and challenging clinical problem arising from a broad spectrum of benign and malignant biliary tract lesions1. Early and accurate differentiation between benign and malignant strictures is critical because it directly determines therapeutic strategies and impacts patient outcomes2. Initial non-invasive imaging modalities such as high-resolution ultrasound, contrast-enhanced computed tomography (CT), and magnetic resonance cholangiopancreatography (MRCP) are excellent for detecting and localizing biliary strictures3. However, imaging alone often cannot definitively distinguish malignant from benign causes, necessitating tissue sampling for pathological confirmation4.

Endoscopic retrograde cholangiopancreatography (ERCP) has long been a cornerstone in the diagnosis and management of biliary disorders, allowing not only radiographic visualization of strictures but also the ability to obtain biopsy specimens during the same procedure5. In China, ERCP technology has reached an internationally advanced level, solidifying its role in diagnosing and treating conditions such as cholelithiasis and obstructive jaundice6. The safety and efficacy of ERCP have been well-documented, establishing it as an essential tool in clinical practice7,8. Traditionally, ERCP-guided tissue sampling is performed with forceps biopsies of the stricture site9. A major limitation of this conventional approach is the inability to assess specimen adequacy immediately; each biopsy sample must be sent to pathology for later analysis, with no feedback during the ongoing procedure. As a result, multiple biopsy attempts are often undertaken empirically to maximize the chance of a diagnostic specimen. This can substantially prolong the procedure and increase patient discomfort and procedural risk10. Prolonged ERCP with numerous sampling passes elevates the likelihood of complications such as bleeding, perforation, and post-ERCP cholangitis11. There is a clear need for improved intra-procedural techniques that can enhance diagnostic yield with fewer biopsy attempts, thereby reducing procedure time and complication risk.

Rapid on-site evaluation (ROSE) has emerged as an innovative technique to address this need by providing immediate cytopathological assessment of specimen adequacy in the procedure room12. During ROSE, smears are prepared and stained from biopsy material while the procedure is underway, enabling a cytopathologist or trained specialist to evaluate cellular content and preliminarily identify malignant cells within minutes13. Real-time confirmation of adequacy enables the endoscopist to decide whether additional biopsies are necessary before completing the ERCP. Once an adequate specimen (e.g., containing diagnostic cellular material) is obtained, further biopsies can be avoided14. This approach has proven successful in other fields, such as endobronchial ultrasound-guided lung biopsy, where ROSE has significantly improved diagnostic yield and efficiency15. In biliary disease, ROSE has been applied primarily to brush cytology samples, with studies showing increased sensitivity of biliary brushing for malignancy16. However, evidence on integrating ROSE with forceps biopsy for histological diagnosis of biliary strictures remains limited. Because biopsy specimens generally provide higher diagnostic accuracy than cytology, evaluating the value of ROSE in this context is important.

Given this background, we conducted a retrospective cohort study to evaluate whether ROSE-enhanced ERCP-guided biopsy improves the diagnostic accuracy for biliary strictures compared to conventional ERCP biopsy without ROSE. In addition to comparing standard diagnostic performance outcomes (accuracy, sensitivity, specificity, PPV, NPV) and procedural metrics (number of biopsies, procedure duration, complications) between the two approaches, we also aimed to develop a multivariable predictive model for diagnostic accuracy. By identifying independent factors influencing whether a biopsy correctly identifies the nature of a stricture, we sought to control for potential confounders in this non-randomized study and to provide clinicians with a nomogram that could estimate the probability of obtaining a correct diagnosis in individual patients. Our goal is that the findings from this study will provide a practical foundation for broader implementation of ROSE in ERCP and improve diagnostic strategies for patients with biliary strictures. Ultimately, this work may contribute to establishing a new, more effective standard of care for tissue diagnosis in biliary strictures, leading to better-informed clinical decisions and improved patient outcomes.

Methods

Study design and population

This single-center retrospective cohort study was conducted in accordance with the STROBE guidelines for observational research. We reviewed all patients referred to the ERCP Diagnosis and Treatment Center of Tianjin Third Central Hospital for diagnostic ERCP with biliary stricture biopsy between January 1, 2021 and December 31, 2022.

Inclusion criteria were: (1) radiologically confirmed biliary duct stricture on at least one imaging modality–such as high-resolution abdominal ultrasound, contrast-enhanced multidetector CT, or three-dimensional MRCP–with all imaging reviewed and the stricture confirmed by two independent radiologists; and (2) clinical indication for ERCP-guided tissue sampling of the stricture (for example, unexplained progressive obstructive jaundice, cholangitis not responding to conservative therapy, or strong suspicion of malignancy requiring histological confirmation).

Exclusion criteria included any condition precluding safe ERCP or biopsy, specifically: severe cardiopulmonary disease (e.g., congestive heart failure or unstable coronary disease) requiring ongoing anticoagulation that could not be temporarily suspended; inability to discontinue anticoagulant or antiplatelet medications for at least 7–10 days prior to the procedure; history of severe contrast allergy; uncorrectable coagulopathy; anatomical obstructions of the upper gastrointestinal tract preventing duodenoscope passage; active acute pancreatitis; uncontrolled systemic infection or sepsis; and pregnancy.

Eligibility was determined by two hepatobiliary surgeons independent of the ERCP team. All data were de-identified and extracted retrospectively from medical records. The Institutional Review Board of the Tianjin Third Central Hospital Ethics Committee approved the study and waived individual informed consent in view of the retrospective design and use of anonymized clinical data. Patient confidentiality was maintained in accordance with the Declaration of Helsinki (2024 revision).

ERCP procedure and biopsy technique

All patients underwent ERCP with biliary stricture biopsy according to institutional protocols. Patients fasted for at least 6 h before the procedure. ERCP was performed under monitored anesthesia care with the patient in the left lateral decubitus (semi-prone) position. Continuous monitoring of electrocardiogram, blood pressure, heart rate, and oxygen saturation was provided throughout.

A therapeutic duodenoscope (Olympus TJF-260 V or JF-260 V, Olympus Corp., Tokyo, Japan) was advanced through the esophagus and stomach into the second portion of the duodenum to identify the major duodenal papilla. Fluoroscopic guidance was provided by a Philips ProxiDiagnost N90 fluoroscopy system integrated with an Olympus CLV-290S endoscopic video system. After cannulation of the bile duct using a 0.035-inch, 480 cm METII35480 guidewire (Wilson-Cook Medical, Winston-Salem, NC, USA) and a standard ERCP catheter, contrast was injected to delineate the stricture under X-ray. When necessary, an endoscopic sphincterotomy was performed using a papillotomy knife to facilitate access.

Targeted forceps biopsies of the stricture were then obtained using Radial Jaw 4 biliary biopsy forceps (Boston Scientific Corp., Natick, MA, USA) passed through the duodenoscope channel. For all cases, a maximum of five biopsy samples per stricture was planned, unless a definitive diagnosis was reached with fewer passes. The biopsy protocol differed between cohorts as follows:

ROSE-enhanced biopsy workflow

In the ROSE cohort, each forceps bite was evaluated in real time by a hepatobiliary surgeon trained in cytopathology, with the microscope stationed in the endoscopy suite. Immediately after retrieval, a portion of the tissue was touch-imprinted/smeared onto a glass slide to avoid loss of visible tissue, air-dried, and stained with Diff-Quik. The rapid sequence (fixative, brief buffer rinse, counter-stain, water rinse, air-dry) required ~ 30–60 s per pass, enabling near-instant feedback. On-site assessment focused on cellularity and malignant cytologic features, including disordered clusters, nuclear enlargement, irregular membranes, and high nucleus-to-cytoplasm ratios. Adequacy was predefined as the presence of interpretable material, generally ≥ 40 well-preserved cells.

Smears were categorized as ROSE-positive (malignant/suspicious) or ROSE-negative/insufficient. Biopsy strategy followed a standardized algorithm. If a smear was ROSE-positive, the endoscopist obtained one confirmatory targeted bite from the same stricture segment for histology and then terminated sampling to minimize trauma. If ROSE-negative or inadequate, the forceps were repositioned 3–5 mm proximally or distally within the stricture (or the approach angle was altered), and another pass was performed. Re-sampling proceeded sequentially up to a maximum of five passes per lesion, or until a diagnostic smear was obtained.
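The stopping rule described above can be sketched as a short loop. This is only an illustrative outline under stated assumptions: the function name `rose_guided_sampling` and the `assess_smear` callback (standing in for the on-site reader) are hypothetical, and the confirmatory bite is tracked separately from the ROSE-assessed passes because the protocol text does not say whether it counts toward the five-pass maximum.

```python
from typing import Callable, Dict, Union

MAX_PASSES = 5  # maximum passes per lesion stated in the protocol


def rose_guided_sampling(
    assess_smear: Callable[[int], str],
) -> Dict[str, Union[int, bool, str]]:
    """Simulate the ROSE-guided stopping rule.

    assess_smear(pass_number) is a hypothetical callback returning
    'positive', 'negative', or 'inadequate' for the smear of that pass.
    """
    assessed = 0
    while assessed < MAX_PASSES:
        assessed += 1
        if assess_smear(assessed) == "positive":
            # ROSE-positive: one confirmatory bite for histology, then stop.
            return {
                "assessed_passes": assessed,
                "confirmatory_bite": True,
                "stopped": "rose_positive",
            }
        # ROSE-negative or inadequate: reposition 3-5 mm (or change the
        # approach angle) and take another pass, up to the maximum.
    return {
        "assessed_passes": assessed,
        "confirmatory_bite": False,
        "stopped": "max_passes",
    }


# Hypothetical readers: one turns positive on the second pass, one never does.
positive_on_second = rose_guided_sampling(
    lambda n: "positive" if n == 2 else "negative"
)
never_positive = rose_guided_sampling(lambda n: "inadequate")
```

In this sketch a smear that becomes positive on the second pass ends sampling after two assessed passes plus the confirmatory bite, whereas a persistently negative or inadequate smear runs to the five-pass limit.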

All ROSE assessments were performed on-site by two designated surgeons with cytology training; discrepant impressions were resolved immediately by joint review. After smear preparation, the remaining tissue from each pass was placed in 10% neutral-buffered formalin and submitted for routine histopathology, which served as the reference standard. This workflow allowed dynamic, targeted sampling guided by immediate adequacy feedback, truncating the procedure when diagnostic material was confirmed and prompting strategic re-sampling when initial yields were insufficient.

Conventional biopsy (NON-ROSE) workflow

In the NON-ROSE cohort, no immediate cytological assessment was available during the procedure. Accordingly, a standardized protocol of five biopsy passes was followed for each stricture, or fewer if the endoscopist felt that an obvious tumor tissue fragment had been obtained. Without on-site feedback, the operator empirically sampled different quadrants of the stricture to maximize the chance of capturing diagnostic tissue.

All biopsy specimens were placed in formalin for standard pathology examination. Notably, all ERCP procedures and biopsies in both cohorts were performed by a single experienced endoscopist using identical technique, which minimized inter-operator variability.

Definition of outcomes and reference standards

The primary endpoint was the diagnostic accuracy of ERCP-guided forceps biopsy for differentiating malignant from benign biliary strictures, benchmarked against an independent reference standard.

For each patient, the reference standard was defined as: (i) surgical pathology when resection was performed, or (ii) clinical–radiologic follow-up of ≥ 6 months when surgery was not undertaken.

Biopsy classification: (i) true positive: Malignant histology on ERCP biopsy confirmed by surgical pathology, or—if unresected—by unequivocal malignant progression on follow-up (lesion growth, vascular invasion, or nodal/distant metastasis). (ii) true negative: Benign or inflammatory biopsy findings confirmed by benign surgical pathology or by stable/resolving stricture without progression during follow-up. (iii) false negative: Biopsy negative for malignancy, but malignancy confirmed by the reference standard. (iv) false positive: Biopsy positive for malignancy, but reference standard benign.

Using these definitions, we calculated accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for ROSE-assisted and conventional biopsy. In all metrics, a positive test denoted biopsy evidence of malignancy; a negative test denoted benign/non-malignant biopsy.
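For readers who wish to reproduce these measures, the following minimal sketch shows how the five metrics follow from the four biopsy classifications defined above. The class and function names are illustrative, and the example counts are hypothetical values chosen for easy arithmetic; they are not the study data.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # biopsy malignant, reference standard malignant
    fp: int  # biopsy malignant, reference standard benign
    fn: int  # biopsy non-malignant, reference standard malignant
    tn: int  # biopsy non-malignant, reference standard benign


def diagnostic_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Return the five metrics as proportions (0-1).

    Assumes every denominator is non-zero, i.e. each row and column
    of the 2x2 table contains at least one case.
    """
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts for illustration only; not taken from the study.
example = ConfusionCounts(tp=40, fp=5, fn=10, tn=45)
metrics = diagnostic_metrics(example)
```

With these hypothetical counts, accuracy is 85/100, sensitivity 40/50, specificity 45/50, PPV 40/45, and NPV 45/55.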

Secondary endpoints assessed procedural efficiency and specimen quality. Specimen adequacy was evaluated inversely via the rate of unqualified specimens, defined operationally as cases in which biopsy sampling failed to identify tumor despite a true neoplasm being present by the reference standard. In the ROSE cohort, an unqualified case required both on-site smear assessment and final histology to show only benign/inflammatory elements while the clinical reference later confirmed tumor. In the NON-ROSE cohort, unqualified samples were those with non-diagnostic/benign biopsy yet tumor subsequently proven by surgery or follow-up. We recorded the number and anatomic distribution (hilar, mid, distal bile duct) of unqualified cases to explore site-specific sampling challenges.

Procedural efficiency was measured by the number of biopsy passes (total forceps bites per patient) and the biopsy sampling time, defined as the cumulative interval from the insertion of the biopsy forceps into the narrow biliary duct and initiation of mucosal tissue sampling to the withdrawal of the biopsy forceps from the opening of the major duodenal papilla (i.e., sum of all individual sampling attempts). Time was recorded in minutes.

Safety outcomes included intraprocedural or postprocedural biliary hemorrhage, cholangitis, bile-duct perforation, and post-ERCP pancreatitis. Minor bleeding controlled endoscopically without transfusion was recorded. All adverse events were attributed to the index ERCP/biopsy session and tabulated by cohort.

Statistical analysis

Statistical analyses were conducted using SPSS version 20.0 (IBM Corp., Armonk, NY, USA). Categorical variables were summarized as frequencies or percentages and compared using Chi-square or Fisher’s exact tests. Continuous variables were expressed as mean ± standard deviation (SD) and analyzed with independent-samples t-tests. Statistical significance was defined as P < 0.05.

To control for confounding due to non-randomization, multivariate logistic regression analysis was performed to identify independent predictors of diagnostic accuracy. The outcome variable was biopsy accuracy (correct vs. incorrect diagnosis). The dataset (n = 200) was randomly split into training (60%, n = 120) and validation cohorts (40%, n = 80). Variables significant in univariate analysis (P < 0.05) were entered into a forward stepwise multivariate regression model. Model performance was evaluated by receiver operating characteristic (ROC) curves in both cohorts, with calibration assessed by the Hosmer–Lemeshow test. Clinical utility was explored using decision curve analysis (DCA), comparing net benefit across a range of threshold probabilities. Finally, a nomogram based on the multivariate model was constructed for individualized prediction. All tests were two-sided.
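As a point of reference for the decision curve analysis, net benefit at a threshold probability is conventionally computed as the true-positive rate minus the false-positive rate weighted by the odds of the threshold. The sketch below implements that standard formula; the function names and the four-patient example are hypothetical, and the source does not state which software or implementation was used for DCA.

```python
from typing import Sequence


def net_benefit(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> float:
    """Net benefit of acting on predictions at or above `threshold`.

    y_true holds 0/1 outcomes; y_prob holds predicted probabilities.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of the default strategy that classifies everyone positive."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


# Hypothetical four-patient example; not study data.
outcomes = [1, 1, 0, 0]
predicted = [0.9, 0.6, 0.7, 0.2]
nb_model = net_benefit(outcomes, predicted, threshold=0.5)
nb_all = net_benefit_treat_all(outcomes, threshold=0.5)
```

The "classify none" strategy has a net benefit of zero by definition, so a model is considered useful at a given threshold when its net benefit exceeds both zero and the "classify all" value.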

Results

Patient inclusion and baseline characteristics

During the two-year study period, a total of 233 patients met the initial inclusion criteria for biliary stricture evaluation with ERCP-guided biopsy. After applying the exclusion criteria, 33 patients were omitted (due to factors such as high-risk comorbid conditions or loss to follow-up), leaving 200 patients in the final analysis. These patients were allocated into two cohorts based on whether on-site cytological evaluation was employed during their ERCP procedure (Fig. 1). Specifically, if real-time ROSE support was available at the time of the procedure (that is, a cytopathology-trained hepatobiliary specialist was present and rapid smear interpretation could be performed), the patient’s biopsy was managed with ROSE guidance (ROSE cohort). If ROSE personnel or cytology equipment were not available (due to scheduling or resource constraints), the patient underwent a conventional ERCP-guided biopsy without on-site evaluation (NON-ROSE cohort). Importantly, group assignment was not randomized; it depended on logistical availability of ROSE and, in some cases, physician recommendation and patient consent to the ROSE procedure. Baseline features were well balanced between the two cohorts (Table S1).

Fig. 1

Flowchart of patient enrollment and inclusion in the study.

Pathological findings and final diagnoses

Final diagnostic outcomes (combining biopsy pathology and clinical follow-up or surgery) demonstrated that a majority of these biliary strictures were malignant (adenocarcinoma, neuroendocrine tumors). Concordance of biopsy pathology with the reference diagnosis consistently favored ROSE. For malignant disease, ROSE correctly identified 68/69 cases (98.55%) versus 62/79 (78.48%) with NON-ROSE. For benign tumors (adenoma, papilloma, inflammatory pseudotumor), concordance was 7/8 (87.5%) with ROSE and 4/6 (66.67%) with NON-ROSE. For inflammatory strictures, ROSE achieved 24/24 agreement (100.0%) compared with 15/34 (44.12%) in NON-ROSE. Overall, ROSE markedly reduced false negatives, especially in malignant and inflammatory etiologies, resulting in superior alignment between biopsy results and true final diagnoses. These findings support ROSE-enhanced ERCP biopsy as a more reliable diagnostic approach for indeterminate biliary strictures (Tables S2–4, Fig. 2).

Fig. 2

Agreement rate between pathological diagnosis and clinical diagnosis in the ROSE and NON-ROSE cohorts.

Diagnostic performance comparison

The ROSE cohort exhibited superior diagnostic performance compared to the NON-ROSE group. Among 100 patients in each cohort, ROSE-guided biopsy achieved a diagnostic accuracy of 92.0%, significantly higher than the 78.0% observed in NON-ROSE. The sensitivity for detecting malignant strictures was markedly improved with ROSE, indicating a substantial reduction in false negatives. Importantly, the negative predictive value (NPV) was also significantly higher in the ROSE cohort, underscoring the reliability of a negative ROSE result in excluding malignancy. Conversely, the specificity was numerically lower in the ROSE group, primarily due to a higher number of false positives (Table S5). These false positives were predominantly associated with severe inflammatory changes mimicking malignancy on cytopathology. Despite this, the positive predictive value (PPV) remained high and statistically comparable between cohorts, indicating that a biopsy diagnosed as malignant was highly reliable in either group. Overall, ROSE significantly improved both sensitivity and NPV, critical parameters for minimizing missed diagnoses of malignancy, while maintaining high PPV. Although specificity was slightly reduced, the overall accuracy advantage supports the diagnostic superiority of ROSE-assisted biopsy in the evaluation of biliary strictures (Table 1, Fig. 3).

Table 1 Comparison of accuracy, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) between ROSE and NON-ROSE (%).
Fig. 3

Accuracy, sensitivity, specificity, PPV, and NPV in the ROSE and NON-ROSE cohorts (%). PPV, positive predictive value; NPV, negative predictive value.

Specimen adequacy and unqualified samples

The quality of biopsy specimens, as reflected by the rate of unqualified (insufficient or false-negative) samples, was significantly better with ROSE. Unqualified samples, biopsies negative despite a true tumor, occurred in 3/100 (3.0%) ROSE cases versus 18/100 (18.0%) NON-ROSE cases. Misses clustered at the hilar bile duct, 2.0% with ROSE vs 11.0% without. Distal bile-duct misses were less frequent and not significantly different. No mid-duct misses occurred in either cohort (Table S6). These data indicate that ROSE markedly reduces false-negative/insufficient biopsies overall and is particularly advantageous for technically challenging hilar strictures. Clinically, the six-fold lower unqualified rate with ROSE supports greater confidence in a negative result and fewer repeat procedures or ancillary tests.

Biopsy passes and procedure time

ROSE significantly reduced sampling burden and time (Table 2; Fig. 4). The mean number of biopsy passes was 2.43 ± 0.90 with ROSE versus 3.61 ± 0.67 without, a 32.7% reduction. A single-pass definitive diagnosis was achieved in 17% of ROSE cases (0% in NON-ROSE, which followed a multi-pass convention). Correspondingly, tissue-sampling duration was shorter with ROSE (2.71 ± 0.74 min) than with NON-ROSE, a ~ 37% decrease. The reduction in procedure time has clinical implications not only for patient comfort but also potentially for safety, as a shorter ERCP means less anesthesia time and potentially lower risk of complications. These efficiency gains reflect real-time adequacy confirmation, permitting early termination once diagnostic material is obtained. All procedures in both cohorts were completed within the pre-specified maximum of five passes.
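The reported reduction in biopsy passes can be checked directly from the two means given above. The helper name below is illustrative; the inputs are the mean pass counts stated in the text.

```python
def relative_reduction(baseline: float, comparison: float) -> float:
    """Percentage reduction of `comparison` relative to `baseline`."""
    return (baseline - comparison) / baseline * 100.0


# Mean biopsy passes reported above: 3.61 (NON-ROSE) vs 2.43 (ROSE).
passes_reduction = relative_reduction(3.61, 2.43)
print(f"{passes_reduction:.1f}% fewer passes with ROSE")
```

This reproduces the 32.7% figure quoted for the number of passes.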

Table 2 Comparison of the biopsy number and the operation duration between the two cohorts (x̄ ± s).
Fig. 4

Enhanced comparison between biopsy numbers and operation time in ROSE and NON-ROSE.

Adverse events

Both biopsy approaches were generally safe, with a low incidence of complications (Table 3). In the ROSE cohort, 1/100 (1.0%) patients had mild biliary bleeding controlled endoscopically; no cholangitis or perforation occurred. In the NON-ROSE cohort, 4/100 (4.0%) patients experienced complications: three mild bleeding events managed endoscopically and one small hepatic-duct perforation treated with a covered stent and antibiotics; all recovered without surgery. No pancreatitis was observed in either group. Although the overall complication rate was numerically lower with ROSE (1.0% vs 4.0%), the difference was not statistically significant. Given the low event counts, the study was not powered to detect modest between-group differences. Importantly, ROSE did not increase adverse events and showed a favorable trend, plausibly related to shorter sampling time (Fig. 5). Most events were minor and self-limited; no patient required intensive care or emergent surgery.
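To illustrate why a difference of 1/100 versus 4/100 does not reach significance, the sketch below computes a two-sided Fisher's exact test from the hypergeometric distribution using only the standard library. Fisher's exact test is assumed here because of the small event counts; the text does not state which test was applied to this particular comparison, and the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k events in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1.0 + 1e-9  # guard against floating-point ties
    p_value = sum(
        prob(k)
        for k in range(k_min, k_max + 1)
        if prob(k) <= p_observed * tolerance
    )
    return min(1.0, p_value)


# Rows: ROSE, NON-ROSE; columns: complication, no complication.
p_complications = fisher_exact_two_sided(1, 99, 4, 96)
```

Under this assumption the p-value is roughly 0.37, well above the 0.05 threshold, which is consistent with the statement that the difference was not statistically significant.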

Table 3 Comparison of incidence of postoperative adverse events in two cohorts.
Fig. 5

Number of postoperative adverse events in ROSE and NON-ROSE.

Multivariate analysis and predictive model

To address the non-randomized allocation and identify factors independently associated with a correct biopsy diagnosis, we developed a predictive model with internal validation. The full cohort (n = 200) was randomly divided into a training set (n = 120) and a test set (n = 80), ensuring comparable baseline characteristics between the two groups (Table 4).

Table 4 Baseline characteristics of 200 patients in the training and test cohorts (n (%) or mean ± SD).

Candidate predictors were prespecified based on clinical relevance and data availability: age, sex, hematologic indices (WBC, HB, PLT), liver biochemistries (ALT, AST, ALP, GGT, TBIL, DBIL), CA19-9, cohort (ROSE vs NON-ROSE), and lesion nature (benign, inflammatory, malignant).

Univariable analysis in the full cohort identified ROSE utilization as positively associated with diagnostic success (P = 0.034). Among cholestasis markers, ALP reached nominal significance (P = 0.032) and GGT showed a non-significant trend (P = 0.054). Age, sex, and other laboratory parameters were not significantly associated with diagnostic accuracy (all P > 0.10). Lesion nature was a strong correlate: malignant strictures were far more likely to yield a correct diagnosis than benign ones (P = 0.009), consistent with greater cytologic atypia and cellular yield.

Variables with P < 0.05 were entered into a forward stepwise logistic regression model. Two independent predictors remained: ROSE use (adjusted odds ratio [OR] 4.03, 95% CI 1.13–14.34; P = 0.032) and malignant lesion nature versus benign (adjusted OR 4.48, 95% CI 1.32–15.24; P = 0.016) (Table 5).

Table 5 Univariable and multivariable logistic regression analysis identifying predictors of accurate diagnosis in 200 patients with biliary stricture.

Model discrimination was well preserved across data splits, with an area under the receiver operating characteristic curve (AUC) of 0.806 (95% CI 0.700–0.913) in the training set and 0.804 (95% CI 0.678–0.929) in the test set (Fig. 6), indicating stable performance without evidence of overfitting. Calibration by Hosmer–Lemeshow test was satisfactory in both cohorts (Fig. 7 A-B), with predicted probabilities closely aligned with observed accuracy across deciles of risk. Potential clinical utility was assessed using decision curve analysis (DCA) in the combined cohort. Across a broad range of threshold probabilities relevant to post-biopsy decision-making (e.g., whether to accept a negative result or pursue additional sampling), the model demonstrated a positive net benefit compared with default strategies of “trust all biopsy results” or “trust none” (Fig. 7 C-D). This suggests the model could reduce missed malignancies without incurring excessive unnecessary downstream testing.

Fig. 6

Receiver operating characteristic (ROC) curves of the combined predictive model for clinical endpoints in 200 patients with biliary stricture.

Fig. 7

Calibration curves and decision curve analysis (DCA) of the combined predictive model in patients with biliary stricture. A, C: training cohort; B, D: test cohort. (A, B) Calibration plots showing agreement between predicted probability of a correct biopsy diagnosis and observed outcomes; (C, D) DCA curves depicting net benefit across threshold probabilities.

The final model was translated into a nomogram incorporating the two independent predictors, ROSE utilization and lesion nature (benign, inflammatory, malignant), to estimate the individualized probability that a given biopsy accurately reflects the true diagnosis (Fig. 8).

Fig. 8

Nomogram for individualized prediction of diagnostic accuracy in patients with biliary stricture.

Summing points for ROSE “Yes” and a malignancy-consistent clinical impression typically yields a predicted accuracy exceeding 0.90, whereas the absence of ROSE in a lesion judged benign or inflammatory corresponds to a substantially lower probability, signaling the need for adjunctive diagnostics in such scenarios. Collectively, the modeling supports the primary comparative findings: the association between ROSE and higher diagnostic accuracy is independent of case mix, and malignant biology inherently favors correct classification, whereas benign and inflammatory strictures remain diagnostically challenging. Although internally validated with consistent AUCs and satisfactory calibration, the model derives from a single-center experience and should undergo external validation before widespread adoption. Nevertheless, it provides a pragmatic, quantitative framework to tailor confidence in biopsy results, prioritize patients for further testing, and operationalize the incremental value of ROSE within ERCP workflows.

Discussion

Distinguishing malignant from benign biliary strictures is both challenging and essential, as management strategies and prognoses differ markedly17,18. Non-invasive imaging (ultrasound, CT, MRCP) can define stricture location and extent but often cannot provide definitive histological classification. Consequently, tissue acquisition via ERCP is frequently required. Early and accurate differentiation between benign and malignant biliary lesions is crucial, as it directly influences treatment strategies, prognosis, and patient quality of life19. Conventional ERCP-guided biopsy, however, is essentially blind—sampling is performed without immediate adequacy feedback, often necessitating multiple passes and prolonging the procedure4. This inefficiency can increase the risk of complications from repeated bile duct manipulation, such as bleeding, perforation, or cholangitis20.

In this study, we integrated rapid on-site cytopathological evaluation (ROSE) into the ERCP workflow to provide real-time assessment of specimen adequacy. ROSE-enhanced ERCP markedly improved diagnostic performance: overall accuracy was ~ 92% with ROSE versus 78% with standard biopsy, and sensitivity increased from ~ 75 to 97%, substantially reducing false negatives. Nearly all malignancies in the ROSE group were diagnosed in a single session, compared with frequent initial misses in the conventional group. Consequently, the negative predictive value (NPV) rose from ~ 49 to ~ 92%, allowing a negative ROSE-guided biopsy to reliably exclude malignancy in most cases. This is clinically critical, as it prevents both treatment delays and unnecessary surgery for undiagnosed cancers. The positive predictive value (PPV) exceeded 90% in both groups, although specificity decreased slightly with ROSE (79% vs. 90%) due to a small number of benign inflammatory strictures misinterpreted as malignant on rapid cytology. These rare false positives underscore the importance of experienced cytopathology review and correlation with the clinical context. Nevertheless, the small specificity trade-off is outweighed by the large sensitivity gain, because investigating an occasional false positive is far preferable to missing a cancer.

Beyond accuracy, ROSE improved procedural efficiency and potentially safety. Real-time adequacy feedback enabled the endoscopist to cease sampling once diagnostic tissue was obtained, reducing the mean number of biopsy passes by approximately one-third and shortening procedure time by ~ 37%. Shorter, more focused procedures reduce anesthesia exposure, improve patient comfort, and optimize endoscopy suite utilization. Fewer passes may also reduce tissue trauma; indeed, complication rates were numerically lower with ROSE (1% vs. 4%), though this difference was not statistically significant.

Specimen adequacy improved markedly, even for anatomically challenging hilar strictures. The nondiagnostic rate fell to 3% with ROSE versus 18% with standard biopsy, reflecting the ability to redirect sampling immediately if an initial smear was inadequate. This transforms ERCP biopsy from blind sampling to a guided, iterative process that maximizes diagnostic yield.

Multivariate logistic regression confirmed ROSE as the strongest independent predictor of diagnostic success (adjusted OR ~ 4), even after accounting for demographics, laboratory parameters, and lesion characteristics. Malignant lesions were also more likely to yield correct diagnoses, consistent with greater cytologic atypia and cellularity. The resulting nomogram, incorporating these two predictors, demonstrated good discrimination (AUC ~ 0.80) and calibration on internal validation, with decision curve analysis indicating net clinical benefit across a wide range of decision thresholds. Such a tool could support individualized risk estimation and guide decisions on whether further sampling is warranted.

These improvements in diagnostic certainty carry important clinical implications. With ROSE, a single ERCP session can frequently provide a definitive diagnosis, enabling timely surgical or oncologic management for malignancies and avoiding unnecessary surgery for benign disease. By reducing repeat procedures, ROSE may shorten hospital stays and lower healthcare costs. Implementation requires minimal additional resources, primarily standard cytology supplies and staff training, and can be feasible even without a full-time cytopathologist, as demonstrated by our center’s clinician-performed smears.

Limitations

This retrospective, single-center study was subject to potential selection bias, as ROSE use depended on availability rather than randomization. Although mitigated by consecutive case inclusion and multivariate adjustment, residual confounding is possible. The moderate sample size limited power to detect differences in low-frequency outcomes such as complications or specificity changes. As a tertiary referral center with a high prevalence of malignancy, our predictive values (especially NPV and PPV) may not generalize to settings with a higher proportion of benign disease. Finally, ROSE effectiveness depends on cytopathology expertise; replication at centers without on-site cytology may require dedicated training or telepathology support.

Conclusion

ROSE-enhanced ERCP represents a significant advance in the evaluation of indeterminate biliary strictures. By enabling real-time confirmation of specimen adequacy, it substantially improves diagnostic accuracy and efficiency, which can lead to more timely and appropriate patient management. Given its ease of integration and strong performance, ROSE merits broader consideration as a standard adjunct to ERCP when biliary malignancy is suspected. Future multicenter, prospective studies should validate these findings and explore integrating ROSE with complementary diagnostic modalities. Our results strongly support ROSE as a valuable tool that elevates the care of patients with biliary strictures.