Introduction

Crohn’s disease (CD) is a chronic, relapsing inflammatory bowel disease (IBD) characterized by gastrointestinal inflammation1. The incidence of CD is increasing in newly industrialized countries2. Approximately 60% of CD patients develop complications, with intestinal strictures being the most common (affecting 50% of complicated cases and 31% overall)3,4. Strictures significantly impair quality of life and prognosis while increasing the healthcare burden5.

Previous studies have reported that 70% to 80% of CD patients with strictures require intestinal surgery within 20 years of diagnosis6. Some patients experience recurrent strictures and multiple surgeries, with postoperative complications further diminishing quality of life. Histologically, strictures are classified as fibrotic, inflammatory, or mixed (a combination of both)7. Inflammatory strictures can often be alleviated with intensified medical therapy, potentially delaying or avoiding surgery8. In contrast, fibrotic strictures may not respond to medical therapy, leading to obstructive symptoms and missed surgical windows. Thus, accurate assessment of stricture nature is essential for determining appropriate treatment strategies.

Determining stricture nature in clinical practice remains challenging. While existing guidelines, including those from ECCO, address intestinal strictures, they offer limited and non-standardized guidance for accurately distinguishing stricture nature in routine practice. Imaging-based approaches—including CT, MRI, and intestinal ultrasound—have been explored to distinguish stricture types9,10,11,12. CT enterography may suggest fibrosis but is limited by radiation exposure and unsuitability for long-term monitoring13. Advanced MRI techniques (e.g., MT-MRI, T2WI) show promise in experimental models but lack human validation and are costly14. Intestinal ultrasound offers high sensitivity, low cost, and feasibility for repeated follow-up; however, its performance remains constrained by operator dependence and limited protocol standardization15,16. These limitations highlight that reliance on a single modality or discipline often results in suboptimal stricture assessment.

Management of CD-related intestinal strictures often requires multidisciplinary input, as no single specialty can access the full spectrum of clinical, imaging, and procedural information needed to accurately determine stricture nature and guide optimal therapy. Ultrasound is particularly valuable for longitudinal monitoring compared with CT and MRI, making gastroenterologists, ultrasonographers, and surgeons the primary contributors to stricture evaluation in current practice. Beyond the recognized importance of MDT involvement, the primary challenge lies in achieving consistent assessments among specialists within the same discipline and ensuring the overall accuracy of integrated MDT decisions. In this study, intra-disciplinary consistency refers to agreement among specialists from the same discipline when independently assessing identical clinical cases. Addressing this challenge requires identifying determinants of decision variability and developing structured mechanisms to integrate diverse disciplinary perspectives within the MDT framework. However, the specific roles of each specialty and standardized MDT workflows remain undefined. Variability in team composition, internal agreement, individual characteristics, meeting structure, and training may lead to inconsistent decisions. Prior evidence from cardiac MDTs has demonstrated substantial inter-expert variability and limited reproducibility17,18, and most MDT research to date has focused on cardiology or oncology19,20,21. Data on IBD-specific MDTs are scarce. One study showed that MDT participation improved diagnostic accuracy, surgical planning, and reduced perioperative events in IBD patients, but did not evaluate MDT decision-making quality or consistency22.

This study employed a mixed-methods design to assess diagnostic accuracy and intra-disciplinary consistency in the evaluation of CD-associated stricture nature within an MDT context, to identify factors influencing stricture nature assessment and decision variability, and to develop a structured framework for integrating diverse disciplinary perspectives in MDT practice.

Methods

Study design

This mixed-methods study combined cross-sectional analysis of diagnostic agreement with grounded-theory interviews to examine MDT decision-making in CD-related strictures (Supplementary Fig. 1). This study was registered with the Chinese Clinical Trial Registry (ChiCTR2300076774) and was conducted in strict compliance with the approved protocol and relevant regulations. The qualitative component of this study was conducted in accordance with the Consolidated Criteria for Reporting Qualitative Research (COREQ) guidelines.

Quantitative study

Patient inclusion criteria and data collection

We included patients with Crohn’s disease (CD) and intestinal strictures managed by the Peking Union Medical College Hospital (PUMCH) IBD-MDT between 2018 and 2024, with ≥ 6-month follow-up. Eligibility criteria were: age ≥ 16 years; CD diagnosis per the 2023 Chinese and ECCO guidelines23,24; imaging-confirmed intestinal stricture (CT, MRI, ultrasound, or endoscopy) (Supplementary Table 1)25; and MDT evaluation. Exclusion criteria included pregnancy; non-CD strictures; incomplete cross-sectional or endoscopic assessment; surgical indications unrelated to strictures; contraindications to surgery; >6-month delay in surgery after MDT recommendation; and < 6-month medical therapy following MDT advice.

Demographic, clinical, laboratory, imaging, and endoscopic data were collected. Surgical specimens, where available, underwent histopathologic assessment including Masson’s trichrome staining to characterize stricture nature.

Experts: Eight MDT experts (gastroenterology, surgery, ultrasound, and MDT chairs) with ≥ 1-year MDT experience were enrolled. Demographics, clinical experience, MDT participation, and personality traits (Chinese TIPI) were recorded26,27 (Supplementary Table 2).

Procedures: Thirty-two cases were randomly selected from the study cohort for blinded, independent review by paired experts within each specialty. Each specialty was asked to focus on predefined domain-specific perspectives during independent assessment: gastroenterologists primarily evaluated inflammatory activity and medical responsiveness; surgeons focused on surgical feasibility, complication risk, and anatomical considerations; and ultrasound specialists concentrated on imaging features and dynamic bowel characteristics. Reviewers assessed stricture nature and recommended treatment without access to the historical MDT decisions (referred to below as post-MDT decisions). Inter-rater agreement within each specialty was quantified using the prevalence- and bias-adjusted kappa (PABAK) to account for the three-category classification28. Agreement strength was interpreted using established benchmarks. If a specialty reached PABAK ≥ 0.60 in the initial assessment, 10 additional cases were added, and the resulting evaluation of 42 cases in total was used for subsequent analyses. If agreement fell below this threshold (PABAK < 0.60), a standardized consensus-training session was held. The targeted training focused on standardizing the interpretation of key indicators for stricture assessment29, including bowel wall thickness, wall stratification, Limberg vascularity grade, and proximal dilation on ultrasound, as well as enhancing conceptual understanding of stricture nature and the integration of clinical, laboratory, endoscopic, and imaging information for stricture classification. Following training, ten additional cases were evaluated, again yielding a total of 42 assessments, to confirm improvement in agreement (target PABAK ≥ 0.60). The post-training PABAK analysis was intended as an exploratory, proof-of-concept assessment of whether targeted standardization could improve intra-disciplinary consistency, rather than as definitive evidence of training effectiveness.
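As an illustration of the agreement statistic described above, the following minimal Python sketch computes PABAK for one pair of raters. It assumes the commonly used k-category form, PABAK = (k·p_o − 1)/(k − 1), where p_o is the observed proportion of exact agreement; the study does not state the exact formula it used, and the function name `pabak` and the example labels are hypothetical, not study data.

```python
from collections.abc import Sequence


def pabak(rater_a: Sequence[str], rater_b: Sequence[str], n_categories: int = 3) -> float:
    """Prevalence- and bias-adjusted kappa for two raters.

    Assumes the k-category form PABAK = (k * p_o - 1) / (k - 1),
    where p_o is the observed proportion of exact agreement.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of cases")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    p_o = agreements / len(rater_a)
    return (n_categories * p_o - 1) / (n_categories - 1)


# Hypothetical labels for six cases (illustrative only, not study data).
expert_1 = ["inflammatory", "inflammatory", "fibrotic", "mixed", "inflammatory", "fibrotic"]
expert_2 = ["inflammatory", "mixed", "fibrotic", "mixed", "inflammatory", "inflammatory"]

# The raters agree on 4 of 6 cases, so p_o = 2/3.
example_pabak = pabak(expert_1, expert_2)
print(f"PABAK = {example_pabak:.2f}")
```

In this toy example the value falls below the 0.60 threshold, which in the procedure described above would trigger consensus training.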

Accuracy criteria: A surgical decision was considered valid if postoperative Masson staining and H&E inflammatory scoring confirmed a fibrotic or mixed stricture. A medical decision was considered valid if clinical remission (CDAI) or imaging improvement was achieved at 6 months. The experimental protocol is detailed in Supplementary Table 3.
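The two accuracy rules can be written as a small predicate. This is only a sketch of the logic as stated in the text; the function name, argument names, and string labels are hypothetical, and the full protocol in Supplementary Table 3 may contain conditions not captured here.

```python
from typing import Optional


def decision_is_accurate(
    decision: str,
    histology: Optional[str] = None,
    clinical_remission: bool = False,
    imaging_improved: bool = False,
) -> bool:
    """Apply the stated accuracy criteria to one case.

    decision: "surgical" or "medical" (hypothetical labels).
    histology: histologic class of the resected stricture, if available.
    clinical_remission / imaging_improved: status at 6 months of medical therapy.
    """
    if decision == "surgical":
        if histology is None:
            raise ValueError("surgical decisions need a histologic reference")
        # Valid when histology confirms a fibrotic or mixed stricture.
        return histology in {"fibrotic", "mixed"}
    if decision == "medical":
        # Valid when either clinical remission or imaging improvement is reached.
        return clinical_remission or imaging_improved
    raise ValueError(f"unknown decision type: {decision!r}")
```

For example, `decision_is_accurate("medical", imaging_improved=True)` evaluates to `True`, whereas a surgical decision with inflammatory histology evaluates to `False`.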

Stricture Nature Classification: To establish histologic reference standards, we consulted pathologists and analyzed data from a previously published study conducted at our center, which included 64 resected Crohn’s disease strictures. In that study, the median collagen proportion on Masson’s trichrome staining (40.1%) was identified and was therefore adopted as the cutoff in the present analysis29. Strictures were classified as follows: fibrotic, > 40.1% collagen with minimal inflammation or > 60% collagen regardless of inflammation; inflammatory, ≤ 40.1% collagen with any degree of inflammation; and mixed, 40.1%–60% collagen with marked inflammation (score 2–3)11. Inflammation scores were assessed on H&E staining (criteria in Supplementary Table 4).
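The histologic rules above can be sketched as a short Python function. Two points are assumptions rather than statements from the text: the inflammation score is taken to range from 0 to 3, and "minimal inflammation" is read as a score of 0–1 (the complement of the "marked" range, 2–3). The actual scoring criteria are in Supplementary Table 4, and the names used here are hypothetical.

```python
COLLAGEN_CUTOFF = 40.1    # % collagen on Masson's trichrome (median from the earlier series)
HIGH_COLLAGEN = 60.0      # % collagen above which inflammation no longer changes the class
MARKED_INFLAMMATION = 2   # H&E score 2-3 is "marked"; 0-1 is ASSUMED to mean "minimal"


def classify_stricture(collagen_pct: float, inflammation_score: int) -> str:
    """Return 'inflammatory', 'mixed', or 'fibrotic' from the stated thresholds."""
    if not 0 <= collagen_pct <= 100:
        raise ValueError("collagen_pct must be a percentage between 0 and 100")
    if inflammation_score not in (0, 1, 2, 3):
        raise ValueError("inflammation_score is assumed to be an integer from 0 to 3")

    if collagen_pct <= COLLAGEN_CUTOFF:
        # At or below the cutoff: inflammatory with any degree of inflammation.
        return "inflammatory"
    if collagen_pct > HIGH_COLLAGEN:
        # Above 60% collagen: fibrotic regardless of inflammation.
        return "fibrotic"
    # Between 40.1% (exclusive) and 60% (inclusive): inflammation decides.
    return "mixed" if inflammation_score >= MARKED_INFLAMMATION else "fibrotic"
```

Under these assumptions, a specimen with 50% collagen is classified as mixed when the inflammation score is 2 and as fibrotic when it is 1.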

Qualitative study

Eight experts (two gastroenterologists, two ultrasonographers, two surgeons, and two MDT chairs) participated in semi-structured interviews (Supplementary Table 5). The interview guide was informed by MDT observations, literature review, and expert input. Interviews were conducted in private, audio-recorded, transcribed verbatim, and analyzed using grounded theory, a qualitative methodology that generates conceptual frameworks directly from empirical data through iterative coding and constant comparison30. Two trained qualitative researchers conducted each session and performed open, axial, and selective coding to derive themes related to MDT decision-making.

Sample size

For the quantitative assessment, a sample size of 34 patients was calculated assuming PABAK = 0.80, a lower 95% CI limit of 0.40, 80%–90% power, and α = 0.05, based on anticipated decision distribution (30% fibrotic, 60% inflammatory, 10% mixed) using PASS software. Qualitative interviews continued until thematic saturation.

Statistical analysis

We evaluated (1) MDT and specialty-specific diagnostic accuracy; (2) intra-disciplinary agreement using kappa and PABAK statistics; and (3) determinants of diagnostic accuracy and disagreement identified through grounded-theory qualitative analysis. Continuous variables are presented as mean ± SD and categorical variables as percentages. Statistical analyses were performed using SPSS v20.0.
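For reference, the two agreement statistics named above differ only in how chance agreement is estimated. The definitions below are the standard ones and are given as an assumption, since the text does not write out its formulas; here \(p_o\) is the observed proportion of agreement, \(k\) the number of categories, and \(p_c^{(A)}\), \(p_c^{(B)}\) the proportions of cases that raters A and B assign to category \(c\).

```latex
\[
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_e = \sum_{c=1}^{k} p_c^{(A)}\, p_c^{(B)},
\]
\[
\mathrm{PABAK} = \frac{k\,p_o - 1}{k - 1}
             = \frac{p_o - \tfrac{1}{k}}{1 - \tfrac{1}{k}} .
\]
```

PABAK therefore corresponds to kappa with chance agreement fixed at \(1/k\). With the three stricture categories used here (\(k = 3\)) it reduces to \((3p_o - 1)/2\), which makes it insensitive to the skewed category prevalence (for example, a predominance of inflammatory strictures) that can depress Cohen's kappa.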

For the qualitative component, analysis followed COREQ guidelines. Grounded-theory methods (open, axial, and selective coding) were applied to interview transcripts, supported by NVivo 15 Plus, to derive themes and identify factors shaping MDT performance and optimization.

Results

Patient characteristics

A total of 42 patients with CD-associated intestinal strictures were included. The mean age was 37.98 ± 4.24 years, and 23 patients (54.76%) were male. Stricture locations included the small intestine in 24 cases (57.14%), colon in 10 (23.81%), ileocolonic region in 11 (26.19%), and anastomotic site in 2 (4.76%). Twenty-two patients (52.38%) had multiple stricture sites, and 25 (59.52%) presented with obstructive symptoms. Following MDT management, 32 patients (76.19%) received medical treatment, while 10 (23.81%) underwent surgery (Table 1). In patients who underwent surgery, diagnostic accuracy was evaluated using histological findings as the reference standard, whereas in medically treated patients, accuracy was assessed based on clinical response and imaging improvement.

Table 1 Patient Characteristics.

Expert characteristics

Eight MDT experts were included, with a mean age of 50.3 years (range: 41–67). Two were male, and the group comprised 1 attending physician, 4 associate chief physicians, and 3 chief physicians. The average duration of clinical practice was 23.3 years (range: 13–43), and the average duration of MDT participation was 96.8 months. Personality traits varied by specialty. Compared to others, surgeons tended to be more extroverted, less agreeable, slightly less conscientious, more emotionally unstable, and more imaginative and open to new experiences (Table 2).

Table 2 Characteristics of MDT Specialists.

Intra-disciplinary decision consistency and accuracy

In the first round, intra-disciplinary PABAK values were 0.75 for gastroenterologists, 0.68 for surgeons, and 0.46 for ultrasound specialists; agreement among ultrasound specialists improved to 0.93 after targeted training. Gastroenterologists predominantly classified strictures as inflammatory (64–69%), surgeons favored fibrotic classifications (33–38%), and ultrasonographers frequently labeled them as mixed (48–52%). Post-MDT decisions were closely aligned with the accuracy standard, with most strictures identified as inflammatory (69–71%) and only a small proportion as mixed (7%) (Supplementary Fig. 2). In terms of accuracy, the historical MDT decisions were correct in 92.9% of cases. Average decision accuracy was 89.3% for gastroenterologists, 69.1% for surgeons, and 66.7% for ultrasound specialists (Table 3).
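To give the reported PABAK values a more intuitive reading, they can be converted back into the proportion of cases on which the two raters agreed. The sketch below inverts the k-category PABAK formula, p_o = ((k − 1)·PABAK + 1)/k, with k = 3. This is a back-calculation under an assumed formula, not a result reported by the study, and the dictionary keys are hypothetical labels.

```python
def implied_observed_agreement(pabak_value: float, n_categories: int = 3) -> float:
    """Invert PABAK = (k * p_o - 1) / (k - 1) to recover p_o."""
    return ((n_categories - 1) * pabak_value + 1) / n_categories


# PABAK values as reported in the text.
reported = {
    "gastroenterologists": 0.75,
    "surgeons": 0.68,
    "ultrasound_round_1": 0.46,
    "post_training": 0.93,
}

implied = {group: implied_observed_agreement(value) for group, value in reported.items()}
for group, p_o in implied.items():
    print(f"{group}: PABAK {reported[group]:.2f} -> observed agreement of about {p_o:.0%}")
```

Under these assumptions, a PABAK of 0.75 corresponds to agreement on roughly 83% of cases, and a PABAK of 0.46 to roughly 64%.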

Table 3 Decision Consistency and Accuracy.

Recommendations for stricture nature assessment based on grounded theory

Qualitative analysis based on grounded theory identified four major themes and 16 subthemes influencing stricture nature assessment: clinical factors, laboratory results, imaging findings, and endoscopic evaluation. Based on these findings, we proposed structured recommendations for distinguishing stricture types. Inflammatory strictures were more frequently associated with a shorter disease duration (< 5 years), fever, and a higher Crohn’s Disease Activity Index (CDAI > 150). These patients generally responded well to medical therapy. Laboratory tests commonly showed elevated ESR, CRP, and fecal calprotectin levels, indicating ongoing inflammation. On ultrasound, bowel wall stratification was often lost, with a wall thickness > 6 mm, high-grade Limberg vascularity (grade 3–4), and mild or absent proximal dilation. CT imaging frequently revealed dilation at the stenotic site, marked mucosal enhancement, and perienteric exudation. Endoscopically, inflammatory strictures typically presented with ulcerative lesions. In contrast, fibrotic strictures were associated with a longer disease course (≥ 5 years), often lacked systemic symptoms such as fever, and had a lower CDAI (< 150). These patients were generally unresponsive to anti-inflammatory treatment. Laboratory markers such as ESR, CRP, and fecal calprotectin were usually normal or only mildly elevated. Ultrasound showed preserved bowel wall stratification, wall thickness < 6 mm, and low Limberg vascularity grades (0–2), with marked proximal dilation. CT scans typically showed non-dilatable stenotic segments, minimal mucosal enhancement, and minimal perienteric changes. On endoscopy, fibrotic strictures had a non-inflammatory appearance without active ulcers (Table 4).

Table 4 Suggested Criteria for Stricture Classification in CD.

Improved MDT practice workflow based on grounded theory

Open coding results revealed clear logical relationships among themes. We further summarized 19 subthemes across seven core domains: personnel selection, team building, team training, pre-meeting preparation, meeting procedures, post-meeting follow-up, and continuous improvement. Accordingly, we developed a structured recommendation for optimizing IBD-MDT workflows. Personnel from Ultrasound, Gastroenterology, and Surgery were selected based on predefined experience thresholds and trait profiles assessed via TIPI-C. Team formation included core and optional departments, with 2–3 trained members per specialty. Standardized intra-department training and MDT meeting observation were implemented for skill alignment. For each MDT session, six cases were reviewed, with > 20 min allocated per patient. Pre-meeting case summaries were distributed 2 days in advance. Final decisions were patient-centered, resolving disagreements through moderated discussion. Post-meeting review ensured follow-up planning and communication, supported by quarterly case reviews and annual updates on IBD care (Fig. 1).

Fig. 1 Proposed MDT Workflow for CD Management.

Discussion

In this mixed-methods study, we comprehensively assessed the intra-disciplinary consistency, diagnostic accuracy, and decision-making processes of a multidisciplinary team (MDT) managing CD-associated intestinal strictures. Our findings highlight several important contributions. First, we identified substantial variability in decision consistency among disciplines, with gastroenterologists demonstrating higher baseline agreement than surgeons and ultrasound specialists. Notably, in an exploratory assessment, targeted training raised agreement among ultrasound specialists from a PABAK of 0.46 to 0.93, underscoring the importance of structured education within MDT frameworks. Second, we provided empirical evidence supporting the diagnostic accuracy of MDT-based assessments, with an overall correctness of 92.9%, supporting the potential clinical utility of MDT discussions in complex stricture evaluation. Third, using grounded theory, we delineated a set of multimodal clinical, laboratory, imaging, and endoscopic features that distinguish inflammatory from fibrotic strictures, thereby proposing a structured and potentially reproducible approach to stricture characterization. Finally, our study proposed a standardized MDT workflow that integrates personnel selection, procedural protocols, and continuous quality improvement, offering a practical blueprint for enhancing the efficiency and reliability of IBD-MDTs.

To the best of our knowledge, no prior studies have systematically examined MDT decision-making consistency for CD-related strictures or verified historical MDT accuracy. In our study, gastroenterologists showed high intra-disciplinary agreement (PABAK = 0.75) and accuracy (up to 90.5%), whereas ultrasound and surgical specialists demonstrated lower concordance and accuracy. These findings indicate that gastroenterologists at PUMCH apply relatively consistent criteria when evaluating stricture nature. Qualitative results further suggest that gastroenterologists integrate a wider spectrum of clinical, laboratory, and imaging information than other specialists, underscoring the value of a multifactorial assessment rather than reliance on a single modality. Specifically, the observed differences in stricture classification across specialties reflect variations in professional perspectives inherent to each discipline, rather than inconsistencies or deficiencies in individual expertise. Each specialty approaches stricture assessment through its own professional lens, shaped by training background, clinical responsibilities, and decision-making priorities. Consequently, independent assessments naturally yield heterogeneous classification tendencies. The multidisciplinary team framework serves to integrate these diverse perspectives, transforming specialty-specific viewpoints into a unified and clinically actionable decision. Our findings suggest that differences among specialties are not merely related to personality traits, but more fundamentally reflect distinct professional goals and cognitive frameworks. MDT meetings function as a platform for integrating these heterogeneous perspectives through case presentation and cross-specialty discussion, with the aim of reaching a consensus-based final decision that is broadly accepted across participating disciplines.

Despite lower diagnostic accuracy among surgeons and ultrasound specialists, both disciplines remain essential to MDT decision-making. Surgeons provide crucial evaluation of surgical feasibility, timing, and long-term strategy, particularly in fibrotic or complicated strictures. Ultrasound specialists offer real-time, noninvasive assessment of bowel wall features and disease dynamics, complementing other imaging modalities and aiding follow-up when advanced imaging is limited. Thus, even with variability in individual accuracy, each specialty contributes unique strengths that collectively enhance comprehensive MDT management.

Moreover, the relatively high accuracy observed in previous MDT decisions at our center highlights the potential value of multidisciplinary collaboration in the management of CD-associated strictures. As one of the earliest institutions in China to establish an IBD-focused MDT, PUMCH has demonstrated high MDT performance quality31. However, this performance should be interpreted in the context of a tertiary referral center and retrospective evaluation, and may have been influenced by factors such as case selection and verification bias. Accordingly, while our findings support the feasibility and potential clinical utility of MDT-based decision-making, the qualitative insights and implementation recommendations derived from our MDT model should be regarded as context-specific and hypothesis-generating.

In the first round, the PABAK value for intra-team agreement among ultrasonographers was below 0.60. However, after targeted training focused on standardizing the interpretation of key indicators related to CD-associated strictures, agreement markedly improved in the second round. This suggests that inconsistent understanding of critical assessment parameters was a major contributor to the initially low consistency15,16. Furthermore, these findings highlight that variations in MDT composition, specifically the involvement of different physicians, may lead to inconsistent MDT decisions27. To ensure decision stability and accuracy, it is essential to evaluate intra-team agreement and provide consistency training for all new members during MDT establishment or personnel transitions.

By integrating both quantitative and qualitative approaches, we identified a range of factors influencing the characterization of CD-associated strictures across clinical presentation, laboratory testing, imaging, and endoscopy. For each indicator, we provided preliminary recommendations regarding its tendency to suggest either an inflammatory or fibrotic stricture nature. In clinical practice, variability in physicians’ understanding or weighting of these factors may affect both diagnostic accuracy and decision-making consistency. To date, most existing studies have focused on single-modality or single-discipline assessments—such as CT, MRI, or ultrasound—for stricture characterization. However, no prior research has offered a multidisciplinary, multifactorial framework for distinguishing between inflammatory and fibrotic strictures in CD patients9,10,11.

We also identified factors shaping MDT diagnostic performance. Hierarchical dynamics emerged as a key influence: as one gastroenterologist noted, “I would definitely follow the senior doctor’s decision,” consistent with prior evidence that junior clinicians may defer to senior voices, potentially suppressing alternative assessments32,33. Personality traits similarly affected discussion dynamics. An assertive surgeon commented, “If you believe your assessment is reasonable, how you present it at the meeting matters,” suggesting that dominant personalities may disproportionately shape decisions and, in some cases, contribute to inaccuracy34,35. Drawing on these insights, we developed the first structured implementation framework for IBD-focused MDTs, encompassing seven domains: member selection, team development, training, pre-meeting preparation, in-meeting conduct, post-meeting follow-up, and continuous improvement mechanisms. This framework aims to enhance decision quality and stability in the management of CD-related strictures.

To our knowledge, this is the first mixed-methods investigation of clinical MDT performance in diagnosing CD-associated strictures. The study adhered to established methodological standards for mixed-methods research in healthcare36,37. Several limitations merit consideration. First, this was a single-centre study with a limited sample size, which may constrain generalizability. Although the sample size was based on prior power calculations, the absolute number of cases remains modest, particularly in certain subgroups such as surgical cases. Therefore, subgroup-related findings should be interpreted with caution and regarded as exploratory. Nevertheless, the PUMCH MDT is among the earliest and most experienced IBD teams in China, and demonstrated high diagnostic accuracy in our quantitative analysis, partially mitigating this concern. Second, we did not assess inter-MDT or longitudinal agreement; thus, long-term stability of MDT decision-making remains to be evaluated. Nevertheless, our intra-specialty consistency assessment provides indirect evidence that variability in team composition may affect decision robustness. Third, only core specialties (gastroenterology, surgery, and ultrasound) were included, rather than the full MDT roster. While these disciplines represent the principal decision-making units for stricture classification, broader inclusion may yield additional insights. Fourth, the criteria used to define the accuracy of medical decisions, including clinical remission and imaging improvement, do not exclude the presence of a fibrotic component. This approach reflects a clinically pragmatic assessment aligned with real-world practice, but is histologically limited, as pathological confirmation is generally unavailable in medically managed patients. Finally, our recommended assessment indicators and process framework were derived from qualitative synthesis and may be subject to interpretive bias. 
These recommendations are not intended as a rigid decision algorithm, but rather as a supportive framework to help structure MDT discussions and facilitate shared understanding across specialties. The findings reflect perceived impact and expert reasoning within MDT meetings, and therefore remain hypothesis-generating in nature. Future work should include multi-centre validation, prospective designs, and quantitative validation, as well as the development of predictive models or decision support systems to refine stricture assessment and standardize MDT practice.

Conclusion

In assessing stricture nature in CD, intra-disciplinary consistency was moderate among gastroenterologists and surgeons, and low among ultrasonographers. Influencing factors span clinical, laboratory, imaging, and endoscopic domains. We also identified seven organizational components critical to MDT performance. Based on these findings, we propose a comprehensive set of recommendations for stricture assessment and MDT practice in IBD, which may facilitate structured MDT discussions and support clinical decision-making in CD-associated intestinal strictures.