Abstract
We detect and interactively visualize occurrence, frequency, sequence, and clustering of extraintestinal manifestations (EIM) and associated immune disorders (AID) in 30,334 inflammatory bowel disease (IBD) patients (Crohn’s disease (CD) n = 15924, ulcerative colitis (UC) n = 11718, IBD unclassified, IBD-U n = 2692, 52% female, median age 40 years (IQR: 25)) with artificial intelligence (AI). 57% (CD > UC 60% vs. 54%, p < 0.00001) had one or more EIM and/or AID. Mental, musculoskeletal and genitourinary disorders were most frequently associated with IBD: 18% (CD vs. UC 19% vs. 16%, p < 0.00001), 17% (CD vs. UC 20% vs. 15%, p < 0.00001) and 11% (CD vs. UC 13% vs. 9%, p < 0.00001), respectively. AI detected 4 vs. 5 vs. 5 distinct EIM/AID communities with 420 vs. 396 vs. 467 nodes and 11,492 vs. 9116 vs. 16,807 edges (links) in CD vs. UC vs. IBD, respectively. Our newly developed interactive free web app shows previously unknown communities, relationships, and temporal patterns—the diseasome and interactome.
Introduction
Chronic inflammatory bowel diseases (IBD)1, i.e. Crohn’s disease (CD)2 and ulcerative colitis (UC)3, result from an inappropriate immune response towards the commensal microbiota4 in genetically5 susceptible individuals, exacerbated and promoted by environmental factors such as Western lifestyle, diet and industrialization. They cannot be cured and require lifelong medical therapy6.
IBD mainly affects the digestive tract, including the liver7, but because of its systemic nature it can involve virtually all parts of the human body through extraintestinal manifestations (EIM) and associated autoimmune diseases (AID)8,9. Previous research10,11 on extraintestinal manifestations and associated autoimmune disorders in IBD was based on cohorts that were not population-representative, mostly from academic tertiary referral centers, with roughly 1000 patients and relying on self-reporting.
Here we apply—for the first time—the concept of network medicine to IBD12. By mapping network-based dependencies between diseases a concept of an unbiased IBD diseasome evolves, with color-coded disease maps whose nodes are diseases and whose links represent various relationships between CD and UC associated EIM and AID. We report and digitally visualize the frequency and relationship of EIM and AID to the underlying disease (CD vs. UC vs. IBD) and among each other at different organization levels.
Our study is the most comprehensive, first interactive, and first artificial intelligence-supported analysis of extraintestinal manifestations and associated autoimmune disorders in IBD based on the largest, most diverse, population-representative cohort to date including 30 times more patients than previous studies, spanning nearly two decades, identifying disease clusters and networks and thereby introducing the concept of network medicine into IBD.
Through our interactive models (web app available in the online supplement) we hope to assist clinicians in their daily practice by recognizing relationships between individual and clustered EIM and AID and optimizing clinical decision-making on their management and advance research in this field.
Results
Patients
IBD patient cohort extraction
We initially identified individual patients who had been diagnosed at least once with IBD (either CD or UC), as described in the Materials and methods, over an observation period of up to 18 years, from 2002 to 2020. After application of the previously validated case definition13, 30,334 IBD patients (CD n = 15,924, UC n = 11,718, IBD unclassified [IBD-U] n = 2692) remained for evaluation (Table 1).
IBD patient cohort characterization
The studied cohort included nearly equal proportions of female (51.6%) and male (48.4%) patients, most of them adults aged between 18 and 64 years (83.9%), with a median age of 40 years (IQR: 25). As expected, CD was statistically significantly more often complicated by obstructive problems, fistulas, and abscesses. Patients’ medical treatment history was typical for IBD: CD patients were statistically significantly more often exposed to antimetabolites, anti-TNFα, and anti-IL-12/23 biologics, whereas UC patients were, in line with professional treatment guidelines, more often prescribed aminosalicylates, calcineurin inhibitors, and anti-integrins. Healthcare utilization per patient per year was higher among CD patients than among UC patients. Patient features are further detailed in Table 1.
Frequency of extraintestinal manifestations and associated autoimmune disorders
Data were grouped according to the Canadian version of the official WHO ICD code domains for comparability and secondary data use, applying the selection criteria described in the Materials and methods: not all ICD codes (comorbidities) were included, only those related to EIM and AID (Table 2, Web Tables 1–14). More than half of all IBD patients experience EIM or AID. EIM and AID occur significantly more frequently in CD (60%) than in UC (54%) (p < 0.00001) (Table 2).
Mental and musculoskeletal conditions are the most common EIM and AID in IBD
Mental, behavioral, and neurodevelopmental disorders (IBD 18%, CD vs. UC 19% vs. 16%, p < 0.00001) as well as diseases of the musculoskeletal system and connective tissue (IBD 17%, CD vs. UC 20% vs. 15%, p < 0.00001) are most frequently associated with IBD (Table 2). Among the mental disorders, depression and anxiety dominate (Supplementary Table 1). Among the musculoskeletal disorders, arthropathies, ankylosing spondylitis, and myalgia dominate (Supplementary Table 2).
Genitourinary, cerebrovascular, circulatory, respiratory, and digestive EIM and AID are less frequently associated with IBD
Overall, genitourinary conditions are more frequently associated with Crohn’s disease (IBD 11%, CD vs. UC 13% vs. 9%, p < 0.00001) (Table 2). Calculi of the kidney, ureter, and bladder dominate and occur more frequently in UC, while tubulo-interstitial nephritis is more commonly seen in CD (Supplementary Table 3). Cerebrovascular diseases are associated with 10% of IBD patients with no preference for CD or UC (10% vs. 10%, p = 0.48) (Table 2). Among the cerebrovascular disorders, phlebitis and thrombophlebitis, embolism and thrombosis, and stroke dominate (Supplementary Table 4). Circulatory system diseases are associated with 10% of IBD patients with no preference for CD or UC (10% vs. 10%, p = 0.08) (Table 2). Among the circulatory disorders, cardiac ischemia and pulmonary embolism dominate (Supplementary Table 5). Respiratory system diseases are associated with 10% of IBD patients and occur more often in CD than in UC (10% vs. 9%, p = 0.029) (Table 2). Among the respiratory disorders, asthma dominates by far and occurs significantly more frequently in CD than in UC (Supplementary Table 6).
General, digestive, hematological, dermatological, neurological, endocrine, ophthalmic, and otolaryngeal EIM and AID are least frequently associated with IBD
General symptoms, signs, and abnormal clinical and laboratory findings not elsewhere classified are more frequently associated with CD (CD vs. UC 9% vs. 8%, p = 0.0017) (Table 2). Malaise and fatigue dominate in this category (Supplementary Table 7). Digestive disorders (other than the underlying conditions CD and UC) are more frequently associated with UC (UC vs. CD 7% vs. 8%, p = 0.04) (Table 2). Celiac disease and autoimmune liver diseases dominate here (Supplementary Table 8). Diseases of the blood and blood-forming organs and certain disorders involving the immune system are more frequently associated with CD (CD vs. UC 5% vs. 3%, p < 0.000001) (Table 2). Anemia, coagulation defects, immunodeficiencies, and immune diseases dominate here (Supplementary Table 9). Diseases of the skin and subcutaneous tissue are more frequently associated with CD (CD vs. UC 5% vs. 3%, p < 0.000001) (Table 2). Psoriasis, pyoderma, and erythema nodosum dominate here (Supplementary Table 10). Diseases of the nervous system are equally associated with CD and UC (CD vs. UC 3% vs. 3%, p = 0.99). Transient cerebral ischemia and multiple sclerosis dominate here (Supplementary Table 11). Endocrine, nutritional, and metabolic diseases are slightly more frequently associated with UC (UC vs. CD 2% vs. 2%, p = 0.03). Type 1 diabetes dominates here (Supplementary Table 12). Diseases of the eye and adnexa are more frequently associated with CD (CD vs. UC 3% vs. 2%, p < 0.000001) (Table 2). Iridocyclitis, episcleritis, and scleritis dominate here (Supplementary Table 13). Diseases of the ear and mastoid process are least frequently associated with IBD (Table 2); sensorineural hearing loss dominates here (Supplementary Table 14).
Clustering of EIM and AID—community detection network analysis
Below we summarize the main clusters and associations of EIM and AID shown in the static printed figures. Many more clusters, networks, and relationships can be explored by interrogating our interactive dataset individually. This can be done either by (a) clicking on any node of the network to highlight its edges (links) and connected nodes and (b) dragging the clicked nodes to rearrange them in the online Interactive Fig. 1, Interactive Fig. 2, and Interactive Fig. 3, or more systematically, including additional statistics and tables, with our Interactive App.
Louvain network analysis identifies two large and three smaller distinct EIM and AID clusters in IBD
Most EIM and AID associated with IBD occur in two large clusters that appear in blue and yellow and three smaller red, green, and purple clusters (Fig. 1, Interactive Fig. 1).
Fig. 1. Most EIM and AID associated with IBD occur in two large clusters that appear in blue and yellow and three smaller red, green, and purple clusters. An interactive version is available online as Interactive Figure 1.
In the blue cluster the central, largest node depicts unspecified CD (K50.9) connected to a smaller blue node depicting CD of the small and large intestine (K50.8). They form a tightly woven network of links (edges) to rheumatoid arthritis (M06.9) and most other musculoskeletal and connective tissue conditions.
The largest node in the yellow cluster is malaise and fatigue (R53). It is most closely related (thick link/edge) to unspecified CD (K50.8), to a much lesser degree to ulcerative proctitis (K51.2), and to a tight network (many links/edges) of distinct endocrine, cerebrovascular, circulatory, neurological, and some musculoskeletal EIM and AID.
The most significant nodes in the smaller green cluster are CD of the small intestine (K50.0) and to a lesser degree ulcerative proctitis (K51.2) and left-sided colitis (K51.5) with dominant clustering of depression (F32.9), panic disorder (F41), anxiety (F41.4), and limb pain (M79.60).
The smaller purple cluster revolves around CD of the large intestine (K50.1), with strong relationships to calculus of the kidney (N20.0) and ureter (N20.1) and tubulo-interstitial nephritis (N12), but also to intraoperative and postprocedural complications of digestive disorders (K91.8).
The smaller red cluster centers around ulcerative colitis (K51.9), with tightly woven networks (links/edges) to phlebitis and thrombophlebitis of other and unspecified deep vessels of the lower extremities (I80.2), pulmonary embolism without acute cor pulmonale (I26.9), and embolism and thrombosis of other specified veins (I82.2).
Network analysis identifies two large and two smaller distinct EIM and AID clusters in CD
Most EIM and AID associated with CD occur in two large clusters that appear in blue and yellow and two smaller green and red clusters (Fig. 2, Interactive Fig. 2).
Fig. 2. Most EIM and AID associated with CD occur in two large clusters that appear in blue and yellow and two smaller green and red clusters. An interactive version is available online as Interactive Figure 2.
The yellow cluster is focused on its largest node, CD of the large intestine (K50.1), and is connected to smaller yellow nodes representing venous thromboembolic events (VTE) and major cardiovascular events (MACE), such as phlebitis and thrombophlebitis of other and unspecified deep vessels of the lower extremities (I80.2), pulmonary embolism without acute cor pulmonale (I26.9), transient ischemic attacks (G45.9), angina pectoris (I20.9), non-ST-elevation myocardial infarction (NSTEMI) (I21.4), and vascular disorders of the intestine (K55.9), and to a tightly woven network of distinct, mostly cerebrovascular, circulatory, and endocrine EIM and AID.
The blue cluster is focused on its largest node, CD of both small and large intestine (K50.9), and smaller blue nodes such as unspecified rheumatoid arthritis (M06.9), ankylosing spondylitis (M45.9), celiac disease (K90.0), unspecified iridocyclitis (H20.9), and immunodeficiency (D84.9), and a tightly woven network of distinct, mostly musculoskeletal and some dermatological EIM and AID.
The smaller green cluster is centered around CD of the small intestine (K50.0), with larger green sub-nodes for calculus of the kidney (N20.0) and ureter (N20.1) and tubulo-interstitial nephritis (N12), but also intraoperative and postprocedural complications of digestive disorders (K91.8).
The smaller red cluster is centered around unspecified CD (K50.9) and is strongly associated with red sub-nodes such as malaise and fatigue (R53), major depressive disorder (F32.9), and anxiety (F41.9), but also unspecified asthma (J45.90) and limb pain (M79.60), as well as a tightly woven network of mostly mental EIM and AID.
Network analysis identifies two large and two smaller distinct EIM and AID clusters in UC
Most EIM and AID associated with UC occur in two large clusters that appear in green and red, two smaller yellow and blue clusters, and one very small purple cluster (Fig. 3, Interactive Fig. 3).
Fig. 3. Most EIM and AID associated with UC occur in two large clusters that appear in green and red, two smaller yellow and blue clusters, and one very small purple cluster. An interactive version is available in the online supplement as Interactive Figure 3.
The main node series in the green cluster centers around other ulcerative colitis (K51.8), ulcerative proctitis (K51.2), ulcerative rectosigmoiditis (K51.3), and left-sided UC (K51.5), but also indeterminate colitis (K52.3), associated with calculus of the kidney (N20.0) and ureter (N20.1) and tubulo-interstitial nephritis (N12), and a tightly woven network of distinct, mostly musculoskeletal and dermatological EIM and AID.
The main node in the red cluster is malaise and fatigue (R53), which is strongly connected (thick edge) to the smaller yellow cluster of unspecified UC (K51.9) and to a tightly woven network of venous thromboembolic events (VTE) and major cardiovascular events (MACE), such as phlebitis and thrombophlebitis of other and unspecified deep vessels of the lower extremities (I80.2), pulmonary embolism without acute cor pulmonale (I26.9), transient ischemic attacks (G45.9), angina pectoris (I20.9), non-ST-elevation myocardial infarction (NSTEMI) (I21.4), and vascular disorders of the intestine (K55.9).
The smaller yellow cluster is centered around unspecified UC (K51.9), with strong associations with mental disorders such as anxiety (F41.9), major depression (F32.9), and generalized (F41.1) and mixed anxiety disorders (F41.3), as well as asthma (J45.90), but also acute laryngitis (J04.0) and celiac disease (K90.0).
The smaller blue cluster is centered around pancolitis (K51.0) and is strongly associated with inflammatory polyps (K51.4) and various VTE/MACE disorders such as phlebitis and thrombophlebitis of other and unspecified deep vessels of the lower extremities (I80.2), pulmonary embolism without acute cor pulmonale (I26.9), angina pectoris (I20.9), non-ST-elevation myocardial infarction (NSTEMI) (I21.4), and vascular disorders of the intestine (K55.9).
One very small purple cluster revolves around rheumatic heart conditions (I00).
Algorithm-detected EIM and AID communities are not random
The mean overlap score (Dice-Sørensen coefficient) for randomly generated communities was 0.232 ± 0.001 SEM. Figure 4 illustrates the expected community overlap variability driven by the order in which ICD code co-occurrences are input to the Louvain algorithm. Figure 5 illustrates the overlap variability driven by patient sampling differences. The differences between the quartile-restricted scores were pronounced when the source of variability was patient sampling. The most frequently observed ICD codes form the most consistent community associations across trials, supporting the notion that the algorithm-detected communities are based on a stable “core” set of ICD codes. Irrespective of whether the source of variability was algorithmic or patient sampling differences, the mean overlap scores using real ICD co-occurrence data were significantly higher than when the communities were randomly assigned (one-sided Welch’s t-test, p < 0.001).
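To make the overlap score concrete, the following minimal sketch computes a Dice-Sørensen coefficient between communities from two runs and compares it with randomly assigned communities. The text does not state how per-community scores were aggregated, so the best-match average used here, the function names, and the toy ICD code sets are assumptions for illustration only.

```python
import random

def dice(a, b):
    """Dice-Sorensen coefficient between two sets: 2|A & B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def mean_best_match(partition_a, partition_b):
    """Average, over communities in partition_a, of the best Dice score against
    any community in partition_b (an assumed aggregation, not the paper's)."""
    best = [max(dice(c, d) for d in partition_b) for c in partition_a]
    return sum(best) / len(best)

def random_partition(codes, k, rng):
    """Assign each code to one of k communities uniformly at random."""
    groups = [set() for _ in range(k)]
    for code in codes:
        groups[rng.randrange(k)].add(code)
    return [g for g in groups if g]

# Toy communities from two hypothetical runs (codes chosen for illustration).
run_1 = [{"K50.9", "M06.9", "M45.9"}, {"K51.9", "I80.2", "I26.9"}]
run_2 = [{"K50.9", "M06.9"}, {"K51.9", "I80.2", "I26.9", "M45.9"}]
score = mean_best_match(run_1, run_2)

# Random baseline: mean score over many pairs of random partitions.
rng = random.Random(0)
codes = sorted(set().union(*run_1))
baseline = sum(
    mean_best_match(random_partition(codes, 2, rng), random_partition(codes, 2, rng))
    for _ in range(1000)
) / 1000
print(f"observed overlap = {score:.3f}, random baseline = {baseline:.3f}")
```

A comparison of the observed and baseline distributions (for example with a one-sided Welch's t-test) would then follow the logic described above.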
Figs. 4 and 5. Error bars represent SEM. The lightly colored bar gives the mean community overlap score for 1000 pairs of randomly generated communities. The x-axis values indicate the quantile of ICD code frequencies in the patient population used as a threshold: only ICD codes whose frequencies fell in the stated quantiles were included in the overlap score calculation.
Further interactive disease network exploration
While the main goal of our study was to discover and report disease clusters within the main network categories IBD, CD, and UC, the data can be explored further with our newly developed web-based Interactive App accessible in any web browser.
By default (i.e. without any additional user input in the web browser) IBD is visualized in cluster mode including all WHO ICD disease domains (hierarchies) and without any code pair restrictions, i.e. even if only one code association exists, it will be included. This default setting allows for a comprehensive overview.
However, the user may want to explore specific aspects of the data in a research or clinical setting. Several visualization options are available in the left panel. First, one of the three disease networks (IBD, CD, or UC) can be selected. Second, the network can be grouped by clusters (default) or by the WHO ICD hierarchy (essentially organ system domains). As described in the Materials and methods, in cluster mode the discovered disease clusters are grouped by the same color, while in ICD mode the WHO disease domains are grouped by the same color. Third, the number of WHO disease domains to be considered (i.e. the hierarchical organization) can be reduced from 7 (default) to 1. A reduced number of WHO disease domains allows for the identification of the strongest clusters. Lastly, the minimum number of ICD code pairs can be varied from the default setting up to 1000. A lower number of ICD code pairs considers even very rare associations, while a higher number focuses on the most frequent associations in a cluster.
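The effect of the ICD code pair threshold can be sketched as a simple filter on co-occurrence counts. This is an illustration under assumptions, not the app's implementation: the function name and the counts below are hypothetical.

```python
def filter_edges(pair_counts, min_patients=1):
    """Keep only ICD code pairs shared by at least `min_patients` patients."""
    return {pair: n for pair, n in pair_counts.items() if n >= min_patients}

# Made-up co-occurrence counts, for illustration only.
pair_counts = {
    ("K50.9", "R53"): 900,
    ("K50.9", "M06.9"): 120,
    ("K51.9", "I80.2"): 3,
}

all_edges = filter_edges(pair_counts)            # keeps even very rare associations
frequent = filter_edges(pair_counts, 100)        # keeps only frequent associations
print(len(all_edges), "edges vs.", len(frequent), "edges")
```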
Discussion
To the best of our knowledge, this is the most comprehensive, first interactive, and first artificial intelligence14,15,16 supported analysis of extraintestinal manifestations and associated autoimmune disorders in IBD. Our results are based on the largest cohort to date (i.e. >30 times more patients than the last study, published more than a decade ago)11. Our work goes beyond reporting frequencies and associated classic statistics as we have identified and visualized disease clusters and networks in IBD using a novel methodology.
The concept of network medicine, i.e. a network-based approach to the understanding of complex human diseases originated in genetics12, a natural data-rich branch of biomedical research that adopted computational methodology early on and successfully uses it until this very day17. Previously, computational molecular genetics was used to discover gene interactions18, and disease modules19, identify disease pathways5, and predict other disease genes to develop new therapeutic concepts.
The increasing availability of population-representative large electronic health record-based data sets20 enables the application of the network medicine concept to such sufficiently deep, rich, and structured resources and drives unbiased discovery from the clinical setting, including hypothesis generation for molecular research. Our study demonstrates the highly interconnected nature, the interactome, of IBD and its respective EIM and AID. Using nearly two decades of big data from electronic health records, it validates and expands previously established molecular relationships between various chronic systemic inflammatory conditions8. It shows, in ways not previously possible, that these diseases cannot be considered independent of one another. The mapping of network-based dependencies has culminated in the concept of the “diseasome”, which represents disease maps whose nodes are diseases and whose links represent various relationships between them. Our interactive Web App allows clinicians to uncover possibly associated EIM and AID in IBD and apply these insights to their respective patients. This advanced knowledge is another important step towards precision medicine and helps guide therapy in a way that addresses all medical needs.
Our results align with the efforts of large international expert panels to take a much broader look at extraintestinal manifestations and associated autoimmune disorders8,9. For example, many clinical trials are designed to investigate luminal inflammation exclusively. Whereas previous epidemiological studies10,11,21, genome-wide association studies5,22,23,24,25, and national consensus panels26 directed clinicians’ focus mostly to other obvious chronic inflammatory conditions such as musculoskeletal, dermatological, ophthalmological, or hepatobiliary disorders, our work uncovers, for the first time, that mental, behavioral, and neurodevelopmental disorders, especially anxiety and depression, are actually the conditions most frequently associated with IBD. The lack of attention to this problem27,28 is perhaps more surprising than the finding itself, since a bi-directional brain-gut axis is a well-established concept29, and neuroinflammation30 and other molecular mechanisms31 underlying a disturbed blood–brain barrier have recently been unraveled.
Strengths of our work include the population-representative nature of this study with the inclusion of patients from academic and community healthcare centers, rural and urban settings, ethnic and cultural diversity, nearly two-decade observation period, and the ability to interrogate the data set beyond our analysis through the supplied interactive figures and software applications.
The results we present further validate, but also expand our understanding of extraintestinal manifestations and associated autoimmune disorders in IBD. We would like to highlight and discuss the following findings.
Our work emphasizes the importance of cerebrovascular and circulatory disorders. Major cardiovascular, cerebrovascular, and other thromboembolic events have recently received attention mostly in the context of hospitalized patients with severe IBD32, for whom preventive therapies have been recommended33, or as side effects of novel small molecules for the management of IBD, such as Janus kinase (JAK) inhibitors6. Since JAK inhibitor exposure was only 0.5% but the frequency of these events was 20 times higher, this appears to be an independent phenomenon. There are alternative mechanistic explanations involving heat shock protein 47 and its regulation of thromboinflammation34. Our findings are supported by a population-based, sibling-controlled cohort study from Sweden between 1969 and 2019 that reported an increased risk of mostly ischemic strokes in IBD35. The increased risk of thromboembolic events independent of age-related processes and medications is further highlighted by reports of their occurrence in children36 and young adults37 and by genetic susceptibility38. Literature searches of PubMed since its inception have reported an increased frequency of cardiovascular events in IBD39,40, and several molecular mechanisms have been proposed41,42.
The findings of our work are supported by a British-Danish study that has not yet been fully published but was presented at United European Gastroenterology Week (UEGW)43. While that study looked at all comorbidities and we focused on EIM and AID, i.e. disorders that are mechanistically related to IBD, as stated in our methods section, both arrive at comparable findings and independently conclude that IBD is a multisystemic disease, particularly manifesting with metabolic, immune, and neuropsychological disorders, and that some conditions substantially precede the diagnosis of CD or UC. We consider the independent, indirect validation of our results another major strength of our work.
Our work has limitations. We have used a very stringent selection algorithm, one that errs on the side of specificity and, by design, excludes early IBD. This is a compromise we accepted for this study to ensure that we are dealing with established rather than suspected IBD. Our analyses rely on ICD codes. These codes and their WHO-assigned domains have limitations in that they do not always reflect our latest pathophysiology-driven understanding and classification of the respective diseases, and their designations sometimes change. We are also aware of potential coding errors, especially in the trailing first, second, or even third digits. Coding in Alberta’s electronic health record system does not rely exclusively on clinicians; their initial choice of codes undergoes a variety of plausibility tests, independent validations, and corrections as necessary. None of these quality assurance measures, however, can completely rule out errors. Although it is tempting to infer otherwise, associated codes, i.e. the coincidence of disorders, do not necessarily establish causality. This, however, is also true for genome-, microbiome-, or metabolome-wide association studies. Lastly, EIM and AID frequencies are likely also related to patient demographic factors, disease extent, disease activity, and disease complications over time, as well as treatment details. Not all these details are available in our data set, but future work in our group will attempt to investigate and dissect their impact.
Overall, our results support the hypothesis that IBD is a systemic inflammatory disorder with new and further reaching evidence from a large population-representative data set that requires a holistic view of the patient, including a multidisciplinary approach beyond digestive diseases44.
The power of interactive visualization in our work may help remind clinicians in their daily practice to actively look for EIM and AID, especially cardiovascular and mental disorders, which have not received the same attention as others in the past. We believe that our work can inspire and drive new research, using basic and clinical data to complement the common vision of precision health.
Materials and methods
The healthcare system in Alberta
Alberta is home to more than 4.85 million people. The population has diverse ethnic and cultural origins with 250 distinct groups including First Nations and immigrants from all continents according to the latest census of Statistics Canada45. All legal residents of Alberta are entitled to publicly funded and administered healthcare. Their care is documented in the Alberta Electronic Health Record Information System (EHRIS) dating back to 1997. EHRIS is jointly operated by Alberta’s Ministry of Health and Alberta Health Services46. Its main domains comprise access tools, repositories, registries, and infrastructure. All available data items are cataloged in the Alberta Health Data Asset Directory and the Alberta Health Services Data Asset Inventory Summary.
The Alberta Inflammatory Bowel Disease Patient Registry
A Provincial IBD Patient Registry was developed and implemented by author D.C.B. in the Alberta EHRIS Connect Care, which is based on a highly customized version of Epic Hyperspace (Epic Inc., WI, USA). IBD patients as well as their EIM and frequently associated AID were identified with the Canadian versions of the WHO International Classification of Diseases (ICD) codes ICD-10CA (K50.X or K51.X) and ICD-9CA (555.X or 556.X) and their respective systematized nomenclature of medicine clinical terms (SNOMED CT).
The selection criteria for the analysis cohort were refined with a previously validated algorithm.
It was derived from a total of 150 IBD case definitions using 1399 IBD patients and 15,439 controls in the development phase. In the validation phase, 318,382 endoscopic procedures were searched and 5201 IBD patients were identified. After consideration of the sensitivity, specificity, and temporal stability of each validated case definition, a diagnosis of IBD was assigned to individuals who experienced at least two hospitalizations, had four physician claims, or had two medical contacts in the Ambulatory Care Classification System database with an IBD diagnostic code within a 2-year period (specificity 99.8%; sensitivity 83.4%; positive predictive value 97.4%; negative predictive value 98.5%). An alternative case definition was developed for regions without access to the Ambulatory Care Classification System database. A novel scoring system was developed that detected Crohn's disease and ulcerative colitis patients with a specificity of >99% and a sensitivity of 99.1% and 86.3%, respectively13. Those not meeting either criterion were labeled inflammatory bowel disease undetermined (IBDU). Similar algorithms have been reliably used by various IBD researchers all over the world47.
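The contact-based part of this case definition can be sketched as follows. This is a minimal illustration, not the validated algorithm: the source labels, the function name, and the approximation of "2-year period" as 730 days are assumptions, and the CD/UC scoring system is not reproduced.

```python
from datetime import date, timedelta

# Minimum number of IBD-coded contacts per data source (from the case definition).
# The source labels themselves are hypothetical.
THRESHOLDS = {"hospitalization": 2, "physician_claim": 4, "ambulatory": 2}
WINDOW = timedelta(days=730)  # "within a 2-year period", approximated as 730 days

def meets_case_definition(contacts):
    """contacts: iterable of (source, date) pairs, each carrying an IBD code.
    Returns True if any single source reaches its threshold within the window."""
    by_source = {}
    for source, when in contacts:
        by_source.setdefault(source, []).append(when)
    for source, needed in THRESHOLDS.items():
        dates = sorted(by_source.get(source, []))
        # Check every run of `needed` consecutive contacts from this source.
        for i in range(len(dates) - needed + 1):
            if dates[i + needed - 1] - dates[i] <= WINDOW:
                return True
    return False

# Two IBD-coded hospitalizations one year apart satisfy the definition.
example = [("hospitalization", date(2010, 3, 1)), ("hospitalization", date(2011, 3, 1))]
print(meets_case_definition(example))
```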
Extraintestinal manifestations (EIM) and associated autoimmune disorders (AID)
Historically, EIM and AID were thought to be limited to a very small number of inflammatory skin (e.g. psoriasis), joint (“arthropathy”/“arthritis”), eye (e.g. uveitis), and liver diseases (e.g. PSC). However, our understanding of the systemic nature of Crohn’s disease and ulcerative colitis has substantially evolved due to the study of the genome, microbiome, and more recently metabolome8. This has impacted the most recent professional society treatment guidelines9, which recognize the following conditions: musculoskeletal, ocular, oral, aural, nasal, skin, urogenital, hepato-pancreatico-biliary, neurological, cardiovascular, pulmonary, hematological, and endocrine. All of these conditions are grouped broadly into multifocal inflammation8 or extraintestinal manifestations (EIM) and associated immune disorders (AID). EIM have a clearly established molecular mechanistic basis, while the mechanisms of AID have not been fully elucidated. Some of these conditions may also be disease complications and/or aggravated by their treatment. It is important to recognize that we have not tried to look at all comorbidities of IBD or make any claims about the incidence or prevalence of these conditions compared with the average population, which would be an epidemiological study and a very different research goal.
Ethics approval
The study did not require informed consent and the protocol was approved by the Health Research Ethics Board of the University of Alberta Institutional Review Board (Pro00093304).
Statistical analyses
For descriptive statistics, medians and interquartile ranges (IQR) were reported where applicable. Statistical significance was tested using the Chi-Square test for binary variables. All analyses are based on available data. Patient demographic variables (age, gender), disease type (CD, UC, IBD), medication exposure (main drug classes and individual compounds), and healthcare utilization (number of hospitalizations per year, number of ER visits per year) are reported.
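As an illustration of the Chi-Square comparison for a binary variable, the sketch below tests the proportion of patients with at least one EIM/AID in CD versus UC. The counts are back-calculated from the rounded percentages reported in the Results (60% of 15,924 and 54% of 11,718) and are therefore approximate assumptions rather than the study data; the function name is hypothetical and no continuity correction is applied.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]; returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Approximate counts reconstructed from rounded percentages (assumption).
cd_total, uc_total = 15924, 11718
cd_with = round(0.60 * cd_total)
uc_with = round(0.54 * uc_total)

stat, p = chi_square_2x2(cd_with, cd_total - cd_with, uc_with, uc_total - uc_with)
print(f"chi2 = {stat:.1f}, p = {p:.2e}")
```

With these approximate counts the p-value falls well below the 0.00001 threshold quoted in the Results, consistent with the reported significance.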
Louvain community detection network analysis
To identify and visualize communities of EIM and AID in relation to CD, UC, and IBD (CD + UC + IBD-U) as defined above13, we deployed a previously described method to extract the community structure of large networks. It is a heuristic method based on modularity optimization, which was previously shown to outperform all other known community detection methods in terms of computation time, quality of the community detection, and accuracy validated in a network with 118 million nodes and more than one billion edges (links)48,49. This graph-theoretic method is a systems medicine approach to the clinical challenge of EIM and AID and, if performed according to recent recommendations, is robust enough to reliably reveal and visualize their associations with either CD or UC50.
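The quantity that modularity optimization seeks to maximize can be made concrete with a short sketch. The code below only evaluates the weighted modularity Q of a given partition. It does not implement the Louvain heuristic itself, and the toy graph and the function name are illustrative assumptions.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Weighted modularity Q of a partition.
    edges: iterable of (u, v, weight) for an undirected graph without self-loops.
    community_of: dict mapping each node to a community label."""
    total_weight = 0.0
    internal = defaultdict(float)    # weight of edges inside each community
    degree_sum = defaultdict(float)  # summed weighted degree of each community
    for u, v, w in edges:
        total_weight += w
        degree_sum[community_of[u]] += w
        degree_sum[community_of[v]] += w
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += w
    if total_weight == 0:
        return 0.0
    return sum(
        internal[c] / total_weight - (degree_sum[c] / (2 * total_weight)) ** 2
        for c in degree_sum
    )

# Toy graph: two triangles joined by a single edge (all weights 1).
edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
         ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
         ("c", "d", 1)]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
print(q_split, q_merged)
```

Splitting the toy graph into its two triangles scores higher than keeping all nodes in one community, which is the kind of gain a modularity-based method searches for. Libraries such as NetworkX provide a ready-made Louvain routine (`networkx.community.louvain_communities`). Which implementation was used in this study is not stated here.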
The visualization consisting of nodes and edges is based on vis-network51. Vis-network uses HTML canvas for rendering. Therefore, the interactive figures should be viewed in the online appendix with a modern web browser. In this unbiased analysis, the size of an ICD code-derived node is proportional to √n × 0.3 + 2.5, where n is the number of unique patients assigned that ICD code. The thickness of each edge for a pair of ICD codes = m × 0.01, where m is the number of patients having both of those ICD codes, i.e. the strength of the disease relationship. The colors of the nodes and edges identify nodes in the detected community (cluster). The node positions are initialized randomly and then governed by the force-atlas-2-based physics of vis-network.
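The node and edge scaling can be sketched as follows. This is a minimal illustration that reads the node formula as sqrt(n) × 0.3 + 2.5, which is an assumption about operator precedence. The patient records, ICD codes, and function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def build_cooccurrence(patient_codes):
    """Count patients per ICD code (nodes) and per unordered code pair (edges).
    patient_codes: dict mapping a patient id to an iterable of ICD codes."""
    node_counts = Counter()
    edge_counts = Counter()
    for codes in patient_codes.values():
        unique = sorted(set(codes))  # one count per patient, canonical pair order
        node_counts.update(unique)
        edge_counts.update(combinations(unique, 2))
    return node_counts, edge_counts

def node_size(n):
    """Node size for n unique patients (assumed reading: sqrt(n) * 0.3 + 2.5)."""
    return math.sqrt(n) * 0.3 + 2.5

def edge_width(m):
    """Edge thickness for m patients who carry both codes."""
    return m * 0.01

# Hypothetical toy records, not study data.
patient_codes = {
    "p1": ["K50", "M13", "F32"],
    "p2": ["K50", "M13"],
    "p3": ["K50", "F32"],
    "p4": ["K50"],
}
node_counts, edge_counts = build_cooccurrence(patient_codes)
for code, n in sorted(node_counts.items()):
    print(code, n, round(node_size(n), 3))
for pair, m in sorted(edge_counts.items()):
    print(pair, m, edge_width(m))
```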
When one hovers over a node, it shows the tooltip with: the ICD code, ICD full name, and the absolute number and percentage of patients with that ICD code per disease, respectively. When one hovers over an edge, it shows the tooltip with: ICD codes of the nodes at the ends of the edge, and the number of patients having both codes, where —>, == and <— indicate the order of occurrence (timeline) of their first diagnoses. This allows one to determine for any ICD-coded condition whether it precedes or succeeds the condition it is linked to. One can drag and rearrange nodes to enable reading the edge tooltips. One can also set the minimum number of patient pairs for edges or the maximum length of the ICD code, to display a sub-network of interest.
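The timeline information in the edge tooltip can be derived from first-diagnosis dates. The sketch below counts, for one pair of codes, how many patients received the first code earlier, on the same day, or later. The data layout, the ASCII arrow labels, and the example dates are assumptions made for illustration.

```python
from collections import Counter
from datetime import date

def order_counts(first_dx, code_a, code_b):
    """Count the order of first diagnoses for patients who have both codes.
    first_dx: dict mapping patient id -> {ICD code: date of first diagnosis}.
    Returns a Counter with keys '-->' (a before b), '==' (same day), '<--' (a after b)."""
    counts = Counter()
    for dx in first_dx.values():
        if code_a in dx and code_b in dx:
            if dx[code_a] < dx[code_b]:
                counts["-->"] += 1
            elif dx[code_a] > dx[code_b]:
                counts["<--"] += 1
            else:
                counts["=="] += 1
    return counts

# Hypothetical first-diagnosis dates, not study data.
first_dx = {
    "p1": {"K50": date(2010, 1, 5), "M13": date(2012, 3, 1)},
    "p2": {"K50": date(2015, 6, 1), "M13": date(2015, 6, 1)},
    "p3": {"K50": date(2018, 2, 2), "M13": date(2016, 9, 9)},
    "p4": {"K50": date(2011, 1, 1)},
    "p5": {"K50": date(2009, 1, 1), "M13": date(2010, 1, 1)},
}
counts = order_counts(first_dx, "K50", "M13")
print(dict(counts))
```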
Louvain community detection robustness evaluation
Overlap of ICD codes between corresponding communities identified by consecutive runs of the Louvain algorithm was scored using the Dice-Sørensen coefficient.
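A minimal sketch of the Dice-Sørensen score for two sets of ICD codes is given below. How corresponding communities were paired between runs is not described here, so the best-match pairing in `mean_best_overlap` is an assumption, and the example communities are hypothetical.

```python
def dice(a, b):
    """Dice-Sorensen coefficient 2|A ∩ B| / (|A| + |B|) for two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention for two empty sets
    return 2 * len(a & b) / (len(a) + len(b))

def mean_best_overlap(run_1, run_2):
    """Average, over communities in run_1, of the best Dice score against run_2.
    Assumption: a community's counterpart is its highest-scoring match."""
    best = [max(dice(c1, c2) for c2 in run_2) for c1 in run_1]
    return sum(best) / len(best)

# Hypothetical communities from two runs.
run_1 = [{"A", "B", "C"}, {"D", "E"}]
run_2 = [{"A", "B"}, {"C", "D", "E"}]
pair_score = dice({1, 2, 3}, {2, 3, 4})
run_score = mean_best_overlap(run_1, run_2)
print(pair_score, run_score)
```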
To evaluate variability arising from the greedy nature of the Louvain algorithm, we performed consecutive runs of Louvain on all patient data but shuffled the order of ICD code co-occurrences input to each run. In each experiment, we scored the average overlap in ICD codes between corresponding communities detected by consecutive runs and averaged that value over all runs.
To evaluate variability arising from patient sampling, we generated two equally sized, mutually exclusive, random partitions of patients, created an ICD co-occurrence graph and performed community detection for each partition, and calculated community overlap. When studying variability arising from differences in the patient samples, we controlled algorithmic variability by inputting ICD co-occurrences into the Louvain algorithm in fixed alphabetical order. To assess whether the most frequently observed ICD codes form more consistent community associations than infrequent ones, we calculated overlap scores using ICD codes that appeared in either the zeroth, first, second, or third quartiles of ICD code frequencies in the patient population.
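The split-half setup can be sketched as follows: patients are divided into two random, disjoint halves of equal size, and the co-occurrences of each half are listed in fixed alphabetical order before being passed to community detection. The detection step itself is omitted. The seed, the handling of an odd patient count, and all names and records are assumptions.

```python
import random
from collections import Counter
from itertools import combinations

def split_half(patient_ids, seed=0):
    """Return two equally sized, disjoint random halves of the patient ids.
    With an odd number of patients, one patient is left out (assumption)."""
    ids = sorted(patient_ids)          # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:2 * half]

def sorted_cooccurrences(patient_codes, patient_ids):
    """ICD code pair counts for the given patients, in alphabetical pair order."""
    counts = Counter()
    for pid in patient_ids:
        counts.update(combinations(sorted(set(patient_codes[pid])), 2))
    return sorted(counts.items())

# Hypothetical toy records, not study data.
patient_codes = {
    f"p{i}": (["K50", "M13"] if i % 2 else ["K50", "F32"]) for i in range(10)
}
half_a, half_b = split_half(patient_codes)
edges_a = sorted_cooccurrences(patient_codes, half_a)
edges_b = sorted_cooccurrences(patient_codes, half_b)
print(edges_a)
print(edges_b)
```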
We compared the results with a null distribution of overlap scores generated by randomly assigning ICD codes to communities and reported the mean and its standard error (SEM) over 1000 repeated trials.
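One way to build such a null distribution is sketched below. It assumes that random assignment preserves the observed community sizes and that communities are compared position by position, neither of which is specified in the text. The code universe, the sizes, and the seed are hypothetical.

```python
import math
import random
import statistics

def dice(a, b):
    """Dice-Sorensen coefficient for two non-empty sets."""
    return 2 * len(a & b) / (len(a) + len(b))

def random_partition(codes, sizes, rng):
    """Randomly assign codes to communities with the given sizes."""
    shuffled = list(codes)
    rng.shuffle(shuffled)
    partition, start = [], 0
    for size in sizes:
        partition.append(set(shuffled[start:start + size]))
        start += size
    return partition

def null_overlap(codes, sizes, trials=1000, seed=0):
    """Mean and standard error of the mean (SEM) of the average Dice overlap
    between two independent random partitions, over repeated trials."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        first = random_partition(codes, sizes, rng)
        second = random_partition(codes, sizes, rng)
        scores.append(statistics.mean(dice(a, b) for a, b in zip(first, second)))
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(trials)
    return mean, sem

# Hypothetical universe of 40 codes split into 4 communities of 10 codes each.
codes = [f"code{i}" for i in range(40)]
null_mean, null_sem = null_overlap(codes, sizes=[10, 10, 10, 10])
print(f"null overlap = {null_mean:.3f} +/- {null_sem:.4f}")
```

For this toy setup the expected overlap of two random 10-code communities drawn from 40 codes is 0.25, so an observed overlap well above that value would indicate agreement beyond chance.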
Data availability
Our online supplementary information includes data tables and interactive HTML figures, and our app allows for virtually unlimited opportunities to further explore the data set. The raw, patient-level source data that support the findings of this study were provided by Alberta Health Services and the Alberta Real World Evidence Consortium and are not publicly available. Researchers may file reasonable requests with these entities; data availability is governed by applicable law.
Code availability
The underlying code for this study is not publicly available but may be made available to qualified researchers on reasonable request from the corresponding author.
References
Baumgart, D. C. Crohn’s Disease and Ulcerative Colitis - From Epidemiology and Immunobiology to a Rational Diagnostic and Therapeutic Approach 2nd edn https://doi.org/10.1007/978-3-319-33703-6 (Springer Nature, 2017).
Baumgart, D. C. & Sandborn, W. J. Crohn’s disease. Lancet 380, 1590–1605 (2012).
Ordas, I., Eckmann, L., Talamini, M., Baumgart, D. C. & Sandborn, W. J. Ulcerative colitis. Lancet 380, 1606–1619 (2012).
Lloyd-Price, J. et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 569, 655–662 (2019).
Graham, D. B. & Xavier, R. J. Pathway paradigms revealed from the genetics of inflammatory bowel disease. Nature 578, 527–539 (2020).
Baumgart, D. C. & Le Berre, C. Newer biologic and small-molecule therapies for inflammatory bowel disease. N. Engl. J. Med. 385, 1302–1315 (2021).
Trivedi, P. J. & Hirschfield, G. M. Recent advances in clinical practice: epidemiology of autoimmune liver diseases. Gut 70, 1989–2003 (2021).
Hedin, C. R. H. et al. The pathogenesis of extraintestinal manifestations: implications for IBD research, diagnosis, and therapy. J. Crohns Colitis 13, 541–554 (2019).
Gordon, H. et al. ECCO guidelines on extraintestinal manifestations in inflammatory bowel disease. J. Crohns Colitis 18, 1–37 (2024).
Vavricka, S. R. et al. Chronological order of appearance of extraintestinal manifestations relative to the time of IBD diagnosis in the Swiss Inflammatory Bowel Disease Cohort. Inflamm. Bowel Dis. 21, 1794–1800 (2015).
Vavricka, S. R. et al. Frequency and risk factors for extraintestinal manifestations in the Swiss inflammatory bowel disease cohort. Am. J. Gastroenterol. 106, 110–119 (2011).
Barabasi, A. L., Gulbahce, N. & Loscalzo, J. Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 12, 56–68 (2011).
Rezaie, A., Quan, H., Fedorak, R. N., Panaccione, R. & Hilsden, R. J. Development and validation of an administrative case definition for inflammatory bowel diseases. Can. J. Gastroenterol. 26, 711–717 (2012).
Yu, K. H., Healey, E., Leong, T. Y., Kohane, I. S. & Manrai, A. K. Medical artificial intelligence and human values. N. Engl. J. Med. 390, 1895–1904 (2024).
Longo, L., Goebel, R., Lecue, F., Kieseberg, P. & Holzinger, A. in Machine Learning and Knowledge Extraction. (eds Holzinger, A., Kieseberg, P., Min Tjoa, A. & Weippl, E.) 1–16 (Springer International Publishing).
Tjoa, E. & Guan, C. A survey on explainable artificial intelligence (XAI): toward medical XAI. IEEE Trans. Neural Netw. Learn. Syst. 32, 4793–4813 (2021).
Barrio-Hernandez, I. et al. Network expansion of genetic associations defines a pleiotropy map of human cell biology. Nat. Genet. 55, 389–398 (2023).
Galindez, G., Sadegh, S., Baumbach, J., Kacprowski, T. & List, M. Network-based approaches for modeling disease regulation and progression. Comput. Struct. Biotechnol. J. 21, 780–795 (2023).
Martin, J. C. et al. Single-cell analysis of Crohn’s disease lesions identifies a pathogenic cellular module associated with resistance to anti-TNF therapy. Cell 178, 1493–1508e1420 (2019).
Alberta Health Data Asset Directory. http://www.albertarwe.ca/ (2023).
Bernstein, C. N., Blanchard, J. F., Rawsthorne, P. & Yu, N. The prevalence of extraintestinal diseases in inflammatory bowel disease: a population-based study. Am. J. Gastroenterol. 96, 1116–1122 (2001).
Huang, H. et al. Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature 547, 173–178 (2017).
Marigorta, U. M. et al. Transcriptional risk scores link GWAS to eQTLs and predict complications in Crohn’s disease. Nat. Genet. 49, 1517–1521 (2017).
Sazonovs, A. et al. Large-scale sequencing identifies multiple genes and rare variants associated with Crohn’s disease susceptibility. Nat. Genet. 54, 1275–1283 (2022).
Liu, Z. et al. Genetic architecture of the inflammatory bowel diseases across East Asian and European ancestries. Nat. Genet. 55, 796–806 (2023).
Falloon, K. et al. A United States expert consensus to standardise definitions, follow-up, and treatment targets for extra-intestinal manifestations in inflammatory bowel disease. Aliment. Pharm. Ther. 55, 1179–1191 (2022).
Sciberras, M. et al. Mental health, work presenteeism, and exercise in inflammatory bowel disease. J. Crohns Colitis 16, 1197–1201 (2022).
Moulton, C. D., Norton, C., Powell, N., Mohamedali, Z. & Hopkins, C. W. P. Depression in inflammatory bowel disease: risk factor, prodrome or extraintestinal manifestation?. Gut 69, 609–610 (2020).
Fairbrass, K. M. et al. Bidirectional brain–gut axis effects influence mood and prognosis in IBD: a systematic review and meta-analysis. Gut 71, 1773–1780 (2022).
Craig, C. F. et al. Neuroinflammation as an etiological trigger for depression comorbid with inflammatory bowel disease. J. Neuroinflamm. 19, 4 (2022).
Carloni, S. et al. Identification of a choroid plexus vascular barrier closing during intestinal inflammation. Science 374, 439–448 (2021).
Faye, A. S. et al. Increasing rates of venous thromboembolism among hospitalised patients with inflammatory bowel disease: a nationwide analysis. Aliment. Pharm. Ther. 56, 1157–1167 (2022).
Olivera, P. A. et al. International consensus on the prevention of venous and arterial thrombotic events in patients with inflammatory bowel disease. Nat. Rev. Gastroenterol. Hepatol. 18, 857–873 (2021).
Thienel, M. et al. Immobility-associated thromboprotection is conserved across mammalian species from bear to human. Science 380, 178–187 (2023).
Sun, J. et al. Long-term risk of stroke in patients with inflammatory bowel disease: a Population-Based, Sibling-Controlled Cohort Study, 1969–2019. Neurology 101, e653–e664 (2023).
Gandhi, J., Mages, K., Kucine, N. & Chien, K. Venous thromboembolism in pediatric inflammatory bowel disease: a Scoping Review. J. Pediatr. Gastroenterol. Nutr. 77, 491–498 (2023).
Kirchgesner, J. et al. Increased risk of acute arterial events in young patients and severely active IBD: a nationwide French cohort study. Gut 67, 1261–1268 (2018).
Naito, T. et al. Prevalence and effect of genetic risk of thromboembolic disease in inflammatory bowel disease. Gastroenterology 160, 771–780.e774 (2021).
Pepe, M. et al. Inflammatory bowel disease and acute coronary syndromes: from pathogenesis to the fine line between bleeding and ischemic risk. Inflamm. Bowel Dis. 27, 725–731 (2021).
Pemmasani, G. et al. Epidemiology and clinical outcomes of patients with inflammatory bowel disease presenting with acute coronary syndrome. Inflamm. Bowel Dis. 27, 1017–1025 (2021).
Zhao, J. H. et al. Genetics of circulating inflammatory proteins identifies drivers of immune-mediated disease risk and therapeutic targets. Nat. Immunol. 24, 1540–1551 (2023).
Xiao, Y., Powell, D. W., Liu, X. & Li, Q. Cardiovascular manifestations of inflammatory bowel diseases and the underlying pathogenic mechanisms. Am. J. Physiol. Regul. Integr. Comp. Physiol. 325, R193–R211 (2023).
Ebert, A. et al. Inflammatory bowel disease and risk of more than 1500 comorbidities: a disease-wide pre- and post-diagnostic phenomic association study. Preprint at medRxiv https://doi.org/10.1101/2024.02.14.24302206 (2024).
Guillo, L. et al. Endpoints for extraintestinal manifestations in inflammatory bowel disease trials: the EXTRA consensus from the International Organization for the Study of Inflammatory Bowel Diseases. Lancet Gastroenterol. Hepatol. 7, 254–261 (2022).
Statistics Canada. (2021).
Baumgart, D. C. Digital advantage in the COVID-19 response: perspective from Canada’s largest integrated digitalized healthcare system. NPJ Digit. Med. 3, 114 (2020).
Hutfless, S. et al. A systematic review of Crohn’s disease case definitions in administrative or claims databases. Inflamm. Bowel Dis. 29, 705–715 (2023).
Blondel, V. D., Guillaume, J. -L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech.: Theory Exp. 2008, P10008 (2008).
Rostami, M., Oussalah, M., Berahmand, K. & Farrahi, V. Community detection algorithms in healthcare applications: a systematic review. IEEE Access 11, 30247–30272 (2023).
Brunson, J. C., Agresta, T. P. & Laubenbacher, R. C. Sensitivity of comorbidity network analysis. JAMIA Open 3, 94–103 (2020).
Vis Network v. 9.1.6. Github. https://github.com/visjs/vis-network (2025).
Acknowledgements
This work has been supported by research grants from the University of Alberta, the Canadian Institute for Advanced Research (CIFAR), the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council of Canada (NSERC), Mathematics of Information Technology and Complex Systems (Mitacs), the Alberta Machine Intelligence Institute (Amii), the University of Alberta Hospital Foundation, and Alberta Innovates. This work is part of the International Collaborative Research and Training Experience (NSERC CREATE) “From Data to Decision (FD2D)—digital transformation and artificial intelligence from data value chain to human value” (https://fd2d.org) led by D.C.B.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Contributions
D.C.B. had the idea, designed, and led the project. J.R.M. conceived of and performed the initial network analysis. C.H.C. and T.X.D. performed additional network analyses. M.P. performed the Louvain community detection robustness evaluation. C.H.C., T.X.D., J.R.M., and D.C.B. analyzed and interpreted the data. D.C.B. wrote the first draft. All authors (D.C.B., C.H.C., T.X.D., M.P., D.C.S., E.W., F.H., B.P.H., A.M.L., S.Z.G., K.W., F.P., R.G., J.R.M.) edited and finally approved the manuscript including all tables, panels, figures, and apps prior to submission.
Ethics declarations
Competing interests
The opinions expressed in this manuscript are the authors’ own and do not necessarily reflect those of the University of Alberta, Alberta Health Services (AHS), Alberta Health (AH), or other Government of Alberta entities. The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Baumgart, D.C., Cheng, C.H., Du, T.X. et al. Network analysis of extraintestinal manifestations and associated autoimmune disorders in Crohn’s disease and ulcerative colitis. npj Digit. Med. 8, 209 (2025). https://doi.org/10.1038/s41746-025-01504-6