Abstract
There are no established prognostic tools for predicting relapse risk in first-episode bipolar disorder (FEBD), limiting the use of personalized treatment approaches. We aimed to develop and validate a machine learning (ML) model to predict relapse in FEBD patients and assess whether pharmacotherapy effectiveness varies across predicted risk strata. We used nationwide registry data from Sweden (n = 30,402; follow-up = 2006–2021) and Finland (n = 13,790; follow-up = 1996–2018). The developed ML model achieved an area under the receiver operating characteristic curve (AUROC) of 0.71 (95% CI = 0.69–0.72, highest for relapse due to psychotic mania [AUROC = 0.85, 95% CI = 0.80–0.89]) in the Swedish cohort (internal validation) and 0.68 (95% CI = 0.66–0.69, highest for nonpsychotic mania [AUROC = 0.74, 95% CI = 0.69–0.78]) in the Finnish cohort (external validation). The model incorporated only seven accessible predictors, including pharmacotherapies during the first 30 days post-FEBD (lithium, combination treatments, and antipsychotics), outpatient follow-up within 30 days post-FEBD, prolonged initial hospitalization for FEBD, history of prior psychiatric hospitalizations, and sickness absence days. Among high-risk patients (N = 10,119, 31.77%), long-acting injectable (LAI) antipsychotics were associated with the greatest reduction in psychiatric rehospitalization risk (HR = 0.44, 95% CI 0.29–0.67). In the low-risk group (n = 21,736, 68.23%), only combinations of quetiapine with valproate (HR = 0.79, 95% CI = 0.65–0.95) or lamotrigine (HR = 0.86, 95% CI = 0.74–0.99) were associated with reduced rehospitalization risk. Available online for research purposes, our internally and externally validated prognostic model may inform clinical decision-making by identifying high-risk individuals who could benefit from early initiation of LAI antipsychotics, promoting a shift toward proactive, risk-guided treatment in FEBD patients.
Introduction
Most individuals with first-episode bipolar disorder (FEBD) experience relapse during the course of their illness, with annual relapse rates estimated at 21.9−26.3% [1], and nearly half require psychiatric rehospitalization shortly after diagnosis [2]. While epidemiological studies have identified risk factors such as residual symptoms of mania or depression, previous affective relapses, poor baseline functioning, and substance use [3,4,5], these have not been translated into prognostic tools. In other medical fields, models such as the Framingham risk score routinely guide risk stratification and treatment [6]; however, similar advancements in bipolar disorder (BD) treatment are lacking. Existing efforts to develop prediction models for bipolar relapse have largely focused on advanced illness stages, relying on small (<1000 patients) potentially selected samples without external validation [7,8,9,10,11]. The development of robust prognostic models requires the use of large, representative cohorts with long-term follow-up [12] and crucial external validation to ensure generalizability [13].
Although clinical guidelines emphasize the need to tailor treatment for people with BD to meet individual patient characteristics, their recommendations remain broad [14,15,16]. This gap in prognostic capability likely contributes to the use of reactive, trial-and-error treatment strategies for BD patients. In some patients, medication regimens become progressively complex over time due to limited responses to initial therapies, often requiring multiple medication classes in the later stages of BD [17]. Additionally, while extensive observational data have suggested that long-acting injectable (LAI) antipsychotics are associated with the lowest risk of rehospitalization for BD [18], their universal applicability in FEBD remains unclear. Overall, it is unclear whether different pharmacotherapies could be effectively targeted through proactive, risk-based strategies during the early stages of BD.
Here, we aimed to develop and validate a novel machine learning (ML) model for predicting relapse risk in patients with FEBD, focusing on creating a parsimonious prognostic model with few routinely collected variables to enhance clinical applicability. Additionally, we examined whether the model could help differentiate the relative effectiveness of pharmacotherapies in reducing psychiatric rehospitalization risk across patients in different risk strata. Finally, given the inherent uncertainty of psychiatric diagnostic boundaries [19], we assessed the model’s transdiagnostic applicability to ensure its usefulness in uncertain cases of BD diagnosis. For these purposes, we used two unselected nationwide cohorts with up to 23 years of follow-up for model development and internal and external validation.
Methods
Study design and data acquisition
We followed the TRIPOD + AI reporting guidelines [20]. A prestudy protocol or registration was not prepared, and the study had no public involvement. We analyzed two nationwide registry-based cohorts from Sweden and Finland extracted from comparable registry databases using identical exclusion criteria (details in the Supplementary Methods and Supplementary Fig. 1). Ethical approval was granted by the Regional Ethics Board of Stockholm (decision number: 2007/762-31), the Swedish Ethical Review Authority (2024-08708-02), the Finnish National Institute for Health and Welfare, the Social Insurance Institution of Finland, the Finnish Centre for Pensions, and Statistics Finland (permissions THL/5279/14.06.00/2023, 31/522/2019, 19023, and TK-53-569-19). Given that this study was based on registry data and did not involve direct participant contact, informed consent was not required under the legislation of either country. The study was conducted in accordance with the Declaration of Helsinki.
Both cohorts included individuals diagnosed with FEBD (ICD-10: F30-F31) in inpatient or outpatient settings, with a minimum follow-up of 2 years and an age limit of 45 years at inclusion. The Swedish cohort included patients with documented treatment contacts across Sweden followed from July 1, 2006, to December 31, 2021. These patients were identified through the National Patient Register (covering inpatient and specialized outpatient care) and the MiDAS Register (tracking disability pensions and sickness absences). Similarly, the Finnish cohort included individuals diagnosed with FEBD between January 1, 1996, and December 31, 2018. The data for this cohort were sourced from the Hospital Discharge Register maintained by the National Institute of Health and Welfare and sickness absence and disability pension records from the Social Insurance Institution of Finland and the Finnish Centre for Pensions.
For transdiagnostic assessments, we included two additional cohorts from the same Swedish registry databases as the FEBD cohort—individuals with nonaffective first-episode psychosis (FEP; F20–F29) and first-episode psychotic depression (FEPD; F32.3, F33.3)—followed over identical periods.
Model development and validation
Data analyses were conducted between January 2023 and January 2025. We conducted ML analyses on relapse risk (defined as psychiatric hospitalization due to bipolar relapse) prediction in R, version 4.1.1 (R Project for Statistical Computing). The primary outcome was the prediction performance of all-cause hospitalization due to bipolar relapse (ICD-10: F30-F31) within two years post-FEBD; the secondary outcome was the prediction performance of specific hospitalizations due to BD. The primary ML task was binary classification of 2-year relapse (psychiatric hospitalization due to bipolar relapse vs. no hospitalization), with secondary analyses extending this framework to cause-specific hospitalizations (e.g., psychiatric hospitalization due to mania vs. no hospitalization). Performance was assessed in terms of discrimination using the area under the receiver operating characteristic curve (AUROC) and calibration (i.e., the alignment between predicted and observed probabilities) through calibration plots and metrics (Brier score, calibration slope, and intercept/calibration-in-the-large). We also evaluated the fairness of the predictions by analyzing discrimination and calibration across different subgroups, including immigration status, sex, and educational levels (details in the Supplementary Methods). The potential clinical utility of the model was gauged via decision curve analysis [21] across relapse risk thresholds of 0–40%, as higher thresholds are unlikely to be clinically acceptable (details in the Supplementary Methods). To evaluate the model’s ability to predict long-term outcomes, we used Cox proportional hazards regression to analyze relapse risk across all available follow-up periods and calculated the C-index as a measure of predictive performance.
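As a minimal illustration of the two headline performance measures named above, the sketch below computes a rank-based AUROC and the Brier score in plain Python. The function names and the toy data are assumptions made for illustration; they are not taken from the study's R code, and confidence intervals are omitted.

```python
def auroc(y_true, y_score):
    """Rank-based AUROC: the probability that a randomly chosen relapser
    receives a higher score than a randomly chosen non-relapser
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data invented for illustration (1 = relapse within two years).
y = [0, 0, 1, 0, 1, 1, 0, 0]
p = [0.05, 0.10, 0.30, 0.20, 0.15, 0.40, 0.08, 0.12]

print(f"AUROC = {auroc(y, p):.3f}")
print(f"Brier = {brier_score(y, p):.3f}")
```

The pairwise loop is quadratic in sample size, which is fine for a sketch but would normally be replaced by a sort-based implementation for registry-scale data.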
In the Swedish cohort, the nationwide sample was randomly geographically split into development (10 counties) and validation (the remaining 11 counties) datasets, with the development sample size determined to meet minimum requirements for robust model training. The final model was restricted to a maximum of 15 predictors. A rate of hospitalization due to BD relapse of 6% was assumed based on prior observational data [4], with an anticipated AUROC of 0.70. Minimum sample size calculations, adapted from methods for multivariable models using the pmsampsize package [22], yielded a requirement of 4455 individuals with 268 events, ensuring 17.82 events per predictor parameter. Given that machine learning models often require larger samples than regression-based models [23], we doubled this estimate, setting a minimum development sample size of 8910 individuals.
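The arithmetic behind these figures can be checked in a few lines. The sketch below takes the reported minimum sample size as given (it does not reimplement pmsampsize) and assumes that the events-per-parameter figure refers to the maximum of 15 predictor parameters; variable names are illustrative.

```python
import math

assumed_event_rate = 0.06  # assumed 2-year relapse hospitalization rate (from the text)
min_sample_size = 4455     # minimum sample size as reported in the text
max_parameters = 15        # assumption: one parameter per candidate predictor

expected_events = min_sample_size * assumed_event_rate      # about 267.3
events_rounded_up = math.ceil(expected_events)              # 268
events_per_parameter = expected_events / max_parameters     # about 17.82
doubled_sample_size = 2 * min_sample_size                   # 8910

print(events_rounded_up, round(events_per_parameter, 2), doubled_sample_size)
```

Under these assumptions the reported 268 events, 17.82 events per predictor parameter, and the doubled target of 8910 individuals are mutually consistent.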
We developed an ML model to predict the risk of hospitalization due to BD within two years following FEBD diagnosis, excluding admissions within 30 days, which were considered part of the initial hospitalization if diagnosed in an inpatient setting. We incorporated a broad set of clinical, sociodemographic, and socioeconomic variables from all available registries at FEBD diagnosis or within the preceding 1–2 years. A total of 79 variables covering clinical history, first-line FEBD treatments, prior medication use, employment history, disability pensions, sickness absences, and demographic factors were explored (details in Supplementary Table 2). First-line treatments were included as predictors since these decisions likely influence the subsequent course of illness. As the actual patient records were unavailable, these predictors were approximated using register-based data. Specifically, treatments (i.e., first pharmacotherapy choices and early outpatient care) initiated within 30 days were used as proxy measures of first-line treatment decisions. Note that there was no temporal overlap between the predictors and the outcome variable, as only the relapses that occurred after 30 days post-FEBD were considered outcomes.
Initial ML modeling was performed using eXtreme Gradient Boosting (XGBoost) [24] within a nested cross-validation framework in the development cohort, incorporating all 79 predictors (details in the Supplementary Methods and Supplementary Fig. 2). Feature selection was then applied, reducing the variable set to the 15 most important predictors based on feature gain. A sequential forward selection (SFS) approach within cross-validation was used to refine the model further, iteratively adding variables until no additional predictors improved performance. The final model, trained on this optimized set, was recalibrated using logistic regression. The contributions of the model’s predictors were estimated using the Shapley Additive Explanation (SHAP) [25]. Performance, assessed through discrimination and calibration, was validated internally on held-out Swedish data, externally on the Finnish cohort, and transdiagnostically in Swedish samples with FEP and FEPD. To evaluate the impact of the modeling strategy, elastic net regression, support vector machine, and random forest models were trained and compared against the final XGBoost model (details in the Supplementary Methods).
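The forward-selection step can be sketched generically as follows. This is an illustrative outline, not the authors' implementation: `score_fn` stands in for a cross-validated performance estimate (for example, mean AUROC across folds), and the feature names and toy gains are hypothetical.

```python
def sequential_forward_selection(candidates, score_fn, min_gain=0.0):
    """Greedy forward selection: repeatedly add the candidate that most
    improves the score, and stop when no candidate improves it by more
    than min_gain."""
    selected = []
    remaining = list(candidates)
    best_score = score_fn(selected)
    while remaining:
        scored = [(score_fn(selected + [c]), c) for c in remaining]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score - best_score <= min_gain:
            break  # no remaining predictor improves performance
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical stand-in for a cross-validated AUROC: each feature adds a
# fixed amount to a baseline of 0.5 (real gains are not additive).
toy_gain = {"feature_a": 0.10, "feature_b": 0.05, "feature_c": 0.0, "feature_d": -0.01}


def toy_score(features):
    return 0.5 + sum(toy_gain[f] for f in features)


chosen, final_score = sequential_forward_selection(list(toy_gain), toy_score)
print(chosen, round(final_score, 2))
```

In this toy run the procedure keeps the two features with positive gain and stops once the best remaining candidate adds nothing, mirroring the stopping rule described above.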
Pharmacoepidemiologic analysis
We investigated whether the association between pharmacotherapy and the risk of all-cause psychiatric hospitalization (ICD-10: all F diagnoses) varied according to the ML model’s predicted risk strata over the total follow-up period (up to 23 years in the Finnish cohort and 15 years in the Swedish cohort). We used all-cause psychiatric hospitalizations to increase event rates, also recognizing that hospitalizations due to other conditions (e.g., suicidal behavior) are partly driven by BD-related behavior in FEBD [26]. The analysis aimed to assess whether the relationship between specific pharmacotherapies, including combination treatments, and psychiatric hospitalization differed across predicted risk levels. The Swedish and Finnish individuals in the validation cohort were stratified using the model’s prediction cutoff, determined by Youden’s index [27], in the development cohort.
The primary treatment exposures included specific mood stabilizers (Anatomical Therapeutic Chemical [ATC] codes N03AF01, N03AG01, N03AX09, and N05AN01), antipsychotics (N05A, excluding lithium), and their combinations, with nonuse of either class serving as the reference group. Medications and their combinations were included if at least 15 outcome events were observed. Medication exposure periods were derived using the PRE2DUP method [28], which estimates time-varying drug use by analyzing prescription purchases, incorporating defined daily doses, purchase amounts, individual use patterns, hospitalization periods, and stockpiling.
Within-individual Cox regression analyses were conducted to examine the association between pharmacotherapy use and psychiatric rehospitalization using SAS version 9.4. The within-individual approach mitigates selection bias by comparing treatment periods within the same individual while accounting for repeated outcome events [29]. The follow-up time was reset to zero after each rehospitalization, and the models were adjusted for temporal factors, including treatment sequence, time since cohort entry, and concomitant psychotropic medication use (e.g., antidepressants, benzodiazepines, ADHD medications, and medications for substance use disorders).
The results from the Swedish and Finnish cohorts were pooled using fixed-effect meta-analysis with the metafor package in R [30], generating pooled hazard ratios (HRs) and 95% confidence intervals (CIs) for each pharmacotherapy.
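For readers unfamiliar with fixed-effect pooling, the sketch below shows the standard inverse-variance calculation on the log hazard ratio scale, with standard errors recovered from the 95% CIs. It is a generic illustration rather than the authors' metafor call, and the input values are invented.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% CI


def se_from_ci(ci_lower, ci_upper, z=Z_95):
    """Standard error of log(HR), recovered from a CI on the HR scale."""
    return (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * z)


def pool_fixed_effect(estimates, z=Z_95):
    """Inverse-variance fixed-effect pooling.

    estimates: iterable of (hr, ci_lower, ci_upper), one tuple per cohort.
    Returns (pooled_hr, pooled_ci_lower, pooled_ci_upper).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for hr, ci_lower, ci_upper in estimates:
        weight = 1.0 / se_from_ci(ci_lower, ci_upper, z) ** 2
        total_weight += weight
        weighted_sum += weight * math.log(hr)
    pooled_log_hr = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log_hr),
        math.exp(pooled_log_hr - z * pooled_se),
        math.exp(pooled_log_hr + z * pooled_se),
    )


# Invented cohort-level estimates, for illustration only.
cohort_estimates = [(0.50, 0.30, 0.83), (0.40, 0.20, 0.80)]
pooled_hr, pooled_lower, pooled_upper = pool_fixed_effect(cohort_estimates)
print(f"pooled HR = {pooled_hr:.2f} ({pooled_lower:.2f} to {pooled_upper:.2f})")
```

Because weights are inversely proportional to variance, the cohort with the narrower CI dominates the pooled estimate.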
Results
Demographic results
We gathered data from 30,402 patients with FEBD from the Swedish cohort (mean [SD] age, 29.68 [8.10] years; 10,085 [33.17%] men; 2-year BD relapse-related hospitalization rate, 2786 [9.16%]; mean [SD] follow-up, 8.53 [3.69] years; individuals on disability pensions at baseline, 2236 [7.35%]) and 13,790 patients from the Finnish cohort (mean [SD] age, 30.73 [8.16] years; 5817 [42.18%] men; 2-year BD relapse-related hospitalization rate, 1949 [14.13%]; mean [SD] follow-up, 10.58 [4.95] years; individuals on disability pensions at baseline, 1240 [8.99%]). In both cohorts (Table 1), bipolar depression was the leading specific cause of rehospitalization, followed by bipolar mania. The Swedish cohort was geographically split, with half of the counties allocated for model development (n = 12,337, exceeding the minimum required sample of 8910) and half for internal validation (n = 18,065). Patients who relapsed within two years post-FEBD were more often diagnosed in inpatient settings; had higher rates of substance use disorder; and were more likely to receive first-line treatment with antidepressants, antipsychotics, mood stabilizers, or benzodiazepines. The transdiagnostic validation samples included 23,362 patients with FEP and 5491 with FEPD (Supplementary Table 1).
Model development and validation results
In the development cohort, the out-of-training predictions using all 79 predictors within a nested cross-validation framework yielded an AUROC of 0.70 (95% CI = 0.69–0.72). SFS identified seven key predictors of relapse: prolonged first hospitalization for BD, first-line (i.e., the first 30 days post-FEBD) pharmacotherapies (lithium, combination treatments, and antipsychotics), number of psychiatric hospital visits (within one year pre-FEBD), outpatient visit within 30 days post-FEBD, and a high number of previous sickness absence days (Fig. 1).
Each point represents an observation from a given validation sample for a given predictor (feature). The x-axis displays the SHAP value, which indicates both the direction and magnitude of the predictor’s impact on relapse risk (positive values denote increased risk, negative values denote decreased risk). The color gradient reflects the predictor’s relative value (warm tones for higher, cool for lower), and predictors are ranked from top to bottom by the mean absolute SHAP value. (A) The Swedish internal validation sample; (B) the Finnish external validation sample.
The final ML model, which included these seven predictors, achieved an AUROC of 0.71 (95% CI = 0.69–0.72) for all-cause relapse within two years in the Swedish cohort (internal validation, Fig. 2), outperforming alternative ML models (Supplementary Fig. 3). The predictive performance was the highest for relapse due to psychotic mania (AUROC = 0.85, 95% CI = 0.80–0.89) and the lowest for psychotic depression (AUROC = 0.71, 95% CI = 0.59–0.82). In the Finnish cohort (external validation; Fig. 2), the model yielded an AUROC of 0.68 (95% CI = 0.66–0.69) for all-cause bipolar relapse, with the highest performance for nonpsychotic mania (AUROC = 0.74, 95% CI = 0.69–0.78) and the lowest for psychotic mania (AUROC = 0.65, 95% CI = 0.61–0.70). An online version of the model with interpretable predictions is available for research purposes: https://johannes-lieslehto.shinyapps.io/biporacle/.
In the development cohort, the optimal cutoff for the ML model, determined using Youden’s index, was 9.15% (details in Supplementary Fig. 4). At this threshold, the model achieved a sensitivity of 61.82%, specificity of 70.83%, positive predictive value (PPV) of 17.49%, and negative predictive value (NPV) of 94.89% in the internal validation sample. In the external Finnish validation sample, the sensitivity was 56.08%, the specificity was 72.80%, the PPV was 25.34%, and the NPV was 90.97%. Discrimination metrics across various thresholds are reported in Supplementary Tables 3–4.
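The threshold-selection step can be illustrated with a short sketch that scans candidate cutoffs, picks the one maximizing Youden's J (sensitivity + specificity − 1), and reports the four classification metrics at that cutoff. The function names and toy data are assumptions for illustration; patients at or above the cutoff are treated as high risk, matching the ≥9.15% convention used for stratification.

```python
def confusion_metrics(y_true, y_prob, threshold):
    """Sensitivity, specificity, PPV and NPV when predictions at or above
    the threshold are classified as high risk."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        high_risk = p >= threshold
        if high_risk and y == 1:
            tp += 1
        elif high_risk and y == 0:
            fp += 1
        elif not high_risk and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def youden_threshold(y_true, y_prob):
    """Return the observed score that maximizes Youden's J, and J itself."""
    best_threshold, best_j = None, float("-inf")
    for t in sorted(set(y_prob)):
        m = confusion_metrics(y_true, y_prob, t)
        j = m["sensitivity"] + m["specificity"] - 1.0
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Toy data invented for illustration (1 = relapse within two years).
y = [0, 0, 1, 0, 1, 1, 0, 0]
p = [0.05, 0.10, 0.30, 0.20, 0.15, 0.40, 0.08, 0.12]

cutoff, j_stat = youden_threshold(y, p)
metrics_at_cutoff = confusion_metrics(y, p, cutoff)
print(cutoff, round(j_stat, 2), metrics_at_cutoff)
```

Unlike sensitivity and specificity, PPV and NPV depend on the event rate in the sample, so they are expected to shift between cohorts with different relapse rates (9.16% in the Swedish and 14.13% in the Finnish cohort) even at a fixed cutoff.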
Visual inspection indicated good calibration in the internal validation sample but slight underestimation in the external validation sample (Fig. 3). In the internal validation sample, the Brier score was 0.08 (95% CI = 0.074–0.08), with a calibration slope of 0.95 (95% CI = 0.89–1.01) and an intercept of −0.17 (95% CI = −0.31 to −0.04). In the external validation sample, the Brier score was 0.12 (95% CI = 0.11–0.12), with a calibration slope of 0.82 (95% CI = 0.75–0.87) and an intercept of 0.09 (95% CI = −0.06 to 0.22). No significant prediction bias was detected for immigration status, sex, or education level (all P > 0.1; Supplementary Table 5). Decision curve analysis suggested potential clinical benefits across relapse risk thresholds of 4–34% in the internal validation cohort and 9–40% in the external validation cohort (Supplementary Fig. 5).
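Decision curve analysis rests on the net benefit statistic, which weighs true positives against false positives at a chosen risk threshold. The sketch below implements the standard formula, net benefit = TP/n − (FP/n) × pt/(1 − pt), alongside the treat-all reference strategy. Function names and toy data are illustrative assumptions, not the study's code.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is at or
    above the threshold probability (0 <= threshold < 1)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating everyone regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


# Toy data invented for illustration (1 = relapse within two years).
y = [0, 0, 1, 0, 1, 1, 0, 0]
p = [0.05, 0.10, 0.30, 0.20, 0.15, 0.40, 0.08, 0.12]

threshold = 0.15
nb_model = net_benefit(y, p, threshold)
nb_all = net_benefit_treat_all(y, threshold)
print(f"model: {nb_model:.3f}, treat all: {nb_all:.3f}, treat none: 0.000")
```

A model is considered potentially useful at a given threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy); repeating the calculation over a grid of thresholds traces the decision curve.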
We also conducted time-to-event analyses using the available follow-up data (Supplementary Figs. 6, 7). In the Swedish internal validation cohort, the 15-year relapse rate was 45.57% (95% CI, 41.91–49.01%) among patients in the top 20% of predicted risk and 13.16% (95% CI, 11.49–14.79%) among those in the bottom 20% (HR, 4.61 [95% CI, 4.12–5.16]; Harrell’s C-index of the developed model = 0.67 [95% CI, 0.66–0.68]). In the Finnish validation cohort, the 23-year relapse rate was 46.6% (95% CI, 44.23–48.87%) among patients in the top 20% of predicted risk and 15.98% (95% CI, 13.82–18.08%) among those with the lowest predicted risk (HR, 3.84 [95% CI, 3.46–4.26]; Harrell’s C-index of the developed model = 0.65 [95% CI, 0.64–0.66]).
In the transdiagnostic assessment, the model achieved an AUROC of 0.68 (95% CI = 0.65–0.71) for predicting relapse in FEPD patients and 0.60 (95% CI = 0.59–0.61) in FEP patients (Supplementary Fig. 8). Calibration metrics and visual inspection indicated poor calibration for both disorders (Supplementary Fig. 9).
Pharmacoepidemiologic results
Using Youden’s index (threshold 9.15%), patients in the two validation cohorts were classified into a low-risk subgroup (<9.15% predicted relapse risk, 68.23%, N = 21,736) and a high-risk subgroup (≥9.15% predicted relapse risk, 31.77%, N = 10,119). In the low-risk group (Fig. 4A), the combination of quetiapine and valproate was associated with the lowest adjusted hazard ratio (HR) for psychiatric rehospitalization (HR = 0.79, 95% CI = 0.65–0.95). Furthermore, periods of combined quetiapine and lamotrigine use (vs. nonuse periods of antipsychotics or mood stabilizers) were associated with a reduced risk of psychiatric rehospitalization (HR = 0.86, 95% CI = 0.74–0.99). Among high-risk patients (Fig. 4B), LAI antipsychotic treatment was associated with the lowest risk of future psychiatric rehospitalization compared with the same individuals’ nonuse periods of antipsychotics or mood stabilizers (HR = 0.44, 95% CI = 0.29–0.67). No such association was observed in the low-risk group (HR = 1.16, 95% CI = 0.78–1.70). In addition to LAIs, the combination of olanzapine and lithium was related to a lower risk of psychiatric rehospitalization among high-risk patients (HR = 0.84, 95% CI = 0.72–0.98).
Discussion
Using two nationwide patient cohorts, we developed an interpretable web-based ML model incorporating seven routinely measured clinical variables that predict future relapse in FEBD patients with performance comparable to that of established risk calculators, such as the Framingham risk score [6]. The model was deliberately limited to a handful of predictors to enhance parsimony and improve feasibility for clinical implementation. The model exhibited generalizable predictive performance without bias toward immigration status, sex, or education level. Decision curve analyses supported the model’s potential clinical utility across relevant risk thresholds. Pharmacoepidemiologic findings indicated that the effectiveness of different pharmacotherapies differs according to patients’ predicted risk of relapse. Although the present model is not sufficient for direct clinical implementation, it represents an important proof-of-concept demonstrating the feasibility of developing the first externally validated relapse risk prediction model for FEBD patients.
Previous research has indicated that unaided clinicians overestimate favorable outcomes for low-frequency events such as suicidal behavior [31], underscoring the need for predictive models to enhance risk assessment. The internally and externally validated prediction model represents a step forward in developing individualized risk assessment tools for BD. The model incorporated a small number of key predictors consistent with prior epidemiological and previous ML research on predicting bipolar relapse, including first-line pharmacotherapy (lithium, combination treatments, antipsychotics), immediate outpatient follow-up post-FEBD, and prolonged first hospitalization. These predictors likely reflect baseline illness severity, with more intensive pharmacotherapy and longer hospitalizations marking greater symptom burden at FEBD and associated treatment complexity, consistent with previous research [3, 4, 7,8,9,10]. The association between prior psychiatric hospitalizations and relapse risk also aligns with established evidence [5, 11].
Risk prediction enables stratified care to allocate intensive interventions to those most likely to benefit while sparing unnecessary interventions among lower-risk individuals. However, unclear diagnostic boundaries—for instance, conversion rates between schizophrenia and bipolar disorder of 4.5–10.1% [32] and up to 17% of depression cases later reclassified as bipolar disorder [33]—challenge the development of a psychiatric prediction model. To assess the applicability of our model in diagnostically uncertain patients, we evaluated its transdiagnostic performance, which demonstrated its predictive value for FEPD and FEP. While adaptable, its use beyond FEBD will require recalibration and potentially the addition of disorder-specific variables. Nonetheless, the observed converging performance indicates shared relapse mechanisms across FEBD, FEP, and FEPD patients.
We observed variations in relapse risk across different pharmacotherapies according to an individual’s predicted risk profile. Among high-risk patients (approximately one-third of the validation cohorts), LAI antipsychotics were associated with the greatest reduction in psychiatric rehospitalization risk, potentially reflecting their efficacy in managing mania and addressing treatment nonadherence [34, 35]. These findings also align with the model’s strongest predictive performance for mania-related relapses. In contrast, LAI antipsychotic use was not linked to reduced psychiatric rehospitalization risk in the majority of patients predicted to have a low risk of relapse. Interestingly, among these individuals, only quetiapine in combination with lamotrigine or valproate was linked to a decreased risk of psychiatric rehospitalization. Given the established efficacy of these combinations in treating depressive symptoms [36, 37], it is plausible that lower-risk patients predominantly exhibit depressive polarity.
The pharmacoepidemiologic findings across risk groups highlight the limitations of a one-size-fits-all approach and demonstrate the potential of ML to guide personalized treatment in BD patients, similar to stratified treatment strategies routinely employed in other medical fields (e.g., oncology [38]). Our results suggest that an individual’s risk profile may influence the effectiveness of specific pharmacotherapies. Although LAI antipsychotics are widely recognized as effective at preventing psychiatric hospitalizations in patients with BD [18, 34], our findings indicate that this benefit may be primarily driven by a subgroup of high-risk patients, with the majority of patients deriving less pronounced benefits. If prospectively validated, these results could inform treatment guidelines that emphasize risk-based pharmacotherapy selection at baseline, moving away from the current reactive paradigm, where new therapies are typically introduced after treatment failure [17, 39].
Strengths and limitations
This study’s strengths include the use of two large, unselected nationwide cohorts with up to two decades of follow-up. This approach enabled the development and internal and external validation of an ML-based risk prediction model in more than 40,000 patients with FEBD, enhancing its generalizability. The within-individual design of the pharmacoepidemiologic analysis mitigates selection bias, a key limitation in observational studies, by using patients as their own controls [29]. Nonetheless, the model’s real-world clinical utility remains unproven: clinician-derived relapse risk benchmarks are lacking, and the model depends on the availability of all seven predictors, which may not be consistently recorded in routine practice. Additionally, observational data preclude causal inference, and despite within-individual analysis, confounding by indication cannot be excluded in pharmacoepidemiologic analyses. Prospective trials are therefore needed to evaluate the clinical impact of the model on treatment selection. Finally, nonpsychotic major depressive episodes, which are common initial presentations of bipolar disorder, could not be included as index episodes because they are infrequently treated in hospital settings; thus, their outcomes are not reliably captured in registers.
Conclusions
This prognostic study developed and externally validated a relapse risk model using routine clinical data. The prediction model, available online for research purposes, may facilitate the identification of high-risk individuals and inform more tailored pharmacotherapy in FEBD patients.
Data availability
The data used in this study cannot be made publicly available due to privacy regulations. According to the General Data Protection Regulation, the Swedish law SFS 2018:218, the Swedish Data Protection Act, the Swedish Ethical Review Act, and the Public Access to Information and Secrecy Act, these types of sensitive data can be made available only for specific purposes, including research, that meets the criteria for accessing sensitive and confidential data as determined by a legal review. The readers may contact Professor Kristina Alexanderson (kristina.alexanderson@ki.se) regarding the Swedish data. The Finnish data collected for this study are proprietary to Finnish government agencies, the Social Insurance Institution of Finland, and the National Institute for Health and Welfare, which granted the researchers permission and access to the data. The data supporting this study’s findings are available from these authorities, but restrictions apply to the availability of these data.
Code availability
The analysis codes used to analyze these data are available at https://github.com/johpulkk/FEBD_relapse_ML.
References
Vázquez GH, Holtzman JN, Lolich M, Ketter TA, Baldessarini RJ. Recurrence rates in bipolar disorder: Systematic comparison of long-term prospective, naturalistic studies versus randomized controlled trials. Eur Neuropsychopharmacol. 2015;25:1501–12.
Funding
This work was supported by The Swedish Research Council for Health, Working Life and Welfare, FORTE (2021-01079). JL was funded by the Finnish Medical Association (grant 7709), and HT was funded by the Sigrid Jusélius Foundation. The study’s funders had no role in the study design, data collection, data analysis, data interpretation, or writing of the report. We utilized data from the REWHARD consortium supported by the Swedish Research Council (grant number 2021-00154). The Finnish data were funded by the Finnish Ministry of Social Affairs and Health through the developmental fund for Niuvanniemi Hospital. Open access funding provided by Karolinska Institute.
Author information
Contributions
JL, HT, and JT conceptualized the paper. JT, ML, HT, EM-R, and AT oversaw data collection and project development. JL, HT, and AT were responsible for the statistical analyses. JL drafted the manuscript and provided data interpretation. ML, RA, SL, CC, AA, AK, BA, and EM-R assisted with the data interpretation. All authors participated in finalizing the manuscript, agreed upon its final version, and meet the definition of an author as stated by the International Committee of Medical Journal Editors.
Ethics declarations
Competing interests
JT, HT, and AT participated in research projects funded by grants from Janssen-Cilag to their employing institution. JL owns shares in the publicly traded companies Orion, Aiforia, and Optomed. JT has been a consultant and/or advisor to and/or received honoraria and/or support for attending meetings from Healthcare Global Village, HLS Therapeutics, Janssen-Cilag, Lundbeck, Orion Pharma, Otsuka, Teva, and WebMD Global. HT reports personal fees from Gedeon Richter, Janssen-Cilag, Lundbeck and Otsuka. ML is a board member of Genomi Solutions Ltd. and Springflux Ltd. and has received honoraria from Sunovion, Orion Pharma, Camurus, Lundbeck, Otsuka Pharma, Recordati, Janssen and Janssen-Cilag and research funding from the Finnish Cultural Foundation and the Emil Aaltonen Foundation. CUC has been a consultant and/or advisor to or has received honoraria from AbbVie, Acadia, Adcock Ingram, Alkermes, Allergan, Angelini, Aristo, Biogen, Boehringer-Ingelheim, Cardio Diagnostics, Cerevel, CNX Therapeutics, Compass Pathways, Darnitsa, Denovo, Gedeon Richter, Hikma, Holmusk, IntraCellular Therapies, Jamjoom Pharma, Janssen/J&J, Karuna, Lundbeck, MedAvante-ProPhase, MedInCell, Merck, Mindpax, Mitsubishi Tanabe Pharma, Mylan, Neurocrine, Neurelis, Newron, Noven, Novo Nordisk, Otsuka, Pharmabrain, PPD Biotech, Recordati, Relmada, Reviva, Rovi, Sage, Seqirus, SK Life Science, Sumitomo Pharma America, Sunovion, Sun Pharma, Supernus, Takeda, Teva, Tolmar, Vertex, and Viatris. He provided expert testimony for Janssen and Otsuka. He served on a Data Safety Monitoring Board for Compass Pathways, Denovo, Lundbeck, Relmada, Reviva, Rovi, Supernus, Teva and Xenon. He has received grant support from Janssen and Takeda. He received royalties from UpToDate and is also a stock option holder for Cardio Diagnostics, Kuleon Biosciences, LB Pharma, MindLink, Mindpax, Terran and Quantic.
SL has received honoraria as an advisor and/or for lectures and/or for educational material from Alkermes, Angelini, Apsen, Eisai, Gedeon Richter, Janssen, Karuna, Kynexis, Lundbeck, Medichem, Medscape, Merck Sharp and Dohme, Mitsubishi, Neurotorium, Novo Nordisk, Otsuka, Recordati, Roche, Rovi, Sanofi Aventis, and Teva. The other coauthors report no conflicts of interest.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lieslehto, J., Tiihonen, J., Lähteenvuo, M. et al. Relapse risk prediction in patients with first-episode bipolar disorder: development, external validation, and pharmacotherapy associations of a machine learning model. Mol Psychiatry 30, 5722–5730 (2025). https://doi.org/10.1038/s41380-025-03316-2
DOI: https://doi.org/10.1038/s41380-025-03316-2