Fairness analysis of machine learning predictions of aggression in acute psychiatric care

Wang, Yifan; Sikstrom, Laura; Xiao, Robert; Findlay, Zoe; Zaheer, Juveria; Hill, Sean L.; Maslej, Marta M.

doi:10.1038/s44184-026-00194-6


Article
Open access
Published: 02 March 2026


Yifan Wang^1,2,
Laura Sikstrom^1,3,
Robert Xiao¹,
Zoe Findlay¹,
Juveria Zaheer^1,4,
Sean L. Hill ORCID: orcid.org/0000-0001-8055-860X¹ &
Marta M. Maslej^1,4

npj Mental Health Research volume 5, Article number: 16 (2026)



Abstract

Machine learning (ML) is increasingly being developed to support individualized risk assessment and de-escalation in acute psychiatry. However, ML algorithms have been shown to exhibit unfair behavior based on protected characteristics, such as an individual’s sex or ethnicity. The fairness of ML-based predictions of aggression in acute psychiatry has received limited investigation. To address this gap, we trained an ML algorithm to predict aggressive incidents from structured electronic health records corresponding to 17,703 patients at a large psychiatric hospital between January 2016 and May 2022 (n = 42,719 observation days). We analyzed predictions for fairness by assessing disparities in false positive rates (FPR) and true positive rates (TPR), based on patient race/ethnicity, gender, admission mode, citizenship, and housing status, as well as intersections of race/ethnicity and gender. A random forest algorithm attained ROC-AUC = 0.81. Fairness analyses revealed significant disparities in FPR and TPR across subgroups: FPR was higher for Middle Eastern and Black patients, men, those admitted into emergency care by the police, and those with unstable or supportive forms of housing. Our analysis demonstrates the potential for ML algorithms to reinforce and amplify known social and structural inequities, highlighting the importance of considering and addressing model fairness prior to clinical implementation.


Introduction

Patient aggression is a major concern in clinical settings, such as acute psychiatry, and encompasses a range of behaviors including verbal abuse, sexual harassment, and physical violence. It has adverse effects on the quality of care, patient and staff safety, and public perceptions of mental health care^1,2,3. However, coercive interventions used to manage the risk of aggression can be similarly problematic: the administration of medications (i.e., chemical restraints) and physical restraints to manage aggression have been shown to negatively impact a patient’s experience of care in a potentially traumatizing way^4,5. Machine learning (ML) has increasingly been applied to predict the risk of patient aggression in psychiatric and forensic settings, as it can leverage complex datasets to generate more individualized predictions to enable earlier and more targeted de-escalation using non-coercive forms of intervention. Previous studies have trained ML algorithms on diverse datasets, from clinical data to neuroimaging scans, with these algorithms often exceeding the predictive performance of current clinical instruments⁶. However, there has been limited investigation of the fairness of these models in acute psychiatry—that is, whether their predictions display any prejudice or favoritism towards an individual or group based on certain inherent or acquired characteristics, such as race or sex (i.e., a protected group)⁷.

Algorithmic fairness has been a growing area of focus in ML research, and widely used algorithms for criminal recidivism prediction and healthcare resource allocation have been shown to be unfair towards marginalized groups, such as Black or lower-income individuals^8,9,10, as well as towards the intersections of underserved groups, like Hispanic females¹¹. The potential for algorithmic unfairness is particularly concerning in the context of predicting aggression in acute psychiatry because of pervasive inequities embedded in the training data. Inequities can be structural (interpersonal and systemic processes which create inequities in power and resources¹²) and social (disparities relating to an individual’s proximal social, political, and economic environment¹²). These include racial profiling in police apprehensions for involuntary psychiatric admission, gendered biases in clinician perceptions of inpatient violence risk, and disparities in access to quality mental health care based on socioeconomic status^13,14,15. Inequities defined by the intersection of race, ethnicity and gender are also a significant concern, given the well-documented challenges Black men face in accessing and receiving mental health care^16,17,18. These inequities can be readily embedded in the data used to train ML algorithms. Evidence of performance disparities in ML-based predictions of violence against hospital providers has emerged in one prior study, suggesting less accurate predictions for Asian and Native Hawaiian patient groups, but that analysis only examined fairness stratified by patient race, and its conclusions were limited by small sample sizes¹⁹. There is a need to better assess how both proximal social indicators, like race/ethnicity and sex, and upstream factors, such as policing or housing, are related to the fairness of ML-based predictions of aggression.

This analysis was part of a larger mixed-methods study dissecting the construction and use of predictive care tools, whose protocol is described in Sikstrom et al.²⁰, and had three main objectives. First, we used demographic and clinical features to train a supervised ML algorithm to predict whether a patient would become aggressive on a given day, and validated its performance on an internal test set. Second, we performed a fairness assessment, examining the algorithm’s performance stratified by patient characteristics, focusing on gender, race/ethnicity, citizenship, admission mode, and housing status. Third, we characterized how the model performed for groups of patients defined by the intersection of race/ethnicity and gender¹¹. Overall, by examining algorithmic unfairness in ML predictions of aggression, our findings highlight the importance of assessing fairness across diverse social and demographic factors during model development and evaluation, to prevent the deployment of ML models in acute psychiatry that can harm specific populations.

Methods

Study population

This analysis used electronic health records (EHRs) from ten inpatient care units at the Centre for Addiction and Mental Health (CAMH), a large mental health and addictions hospital in Toronto, Canada, between January 2016 and May 2022. This constituted a large, clinically heterogeneous dataset that provided sufficient sample sizes (n > 50) for fairness analysis of each demographic subgroup. Only patients who were admitted to inpatient units via the hospital’s emergency department (ED) were included, to enable consideration of admission mode (e.g., apprehension for admission by police). Patients were excluded from analysis if they were referred from a corrections facility or another hospital. We did not exclude patients with prior acute care visits, so the analysis could include multiple inpatient hospitalizations from the same patient.

All demographic data were obtained from patient-reported forms routinely collected at admission, and included age, gender, sexual orientation, citizenship, housing status, income, highest education level, language, ethnicity, and marital status. All clinical and contextual factors documented at patient intake were also extracted, including primary psychiatric diagnosis assessed by ED psychiatrists via a brief diagnostic interview, presence of substance-induced symptoms, mode of admission, and inpatient unit location. Finally, risk assessment data included ratings on the Dynamic Appraisal of Situational Aggression (DASA), a clinically validated instrument that assesses each patient’s risk of aggression over the next 24 h. The assessment is based on seven dichotomous items that capture behavioral and interpersonal factors related to this risk (e.g., agitation, sensitivity to provocation, verbal threats)^21,22. At CAMH, DASA scores are generated by nurses each morning that a patient is on the unit, based on their clinical observations and relevant information from a chart review of the past 24 h.

This study was approved by the CAMH research ethics board (REB #053-2021). Direct patient consent was not required for this study, as approved by the CAMH research ethics board. All patient EHR data were processed in adherence with protocols reviewed by the CAMH Privacy Department and research ethics board.

Prediction task

The prediction outcome comprised aggressive incidents involving patients, as documented by any attending staff (e.g., nurses, clinicians, security guards or program assistants) in CAMH’s reporting tool. Incidents were included as outcomes if staff categorized them as either “abuse/assault/violence” or “physical/sexual/verbal behaviors and assaults”. Any documented use of chemical restraints, physical restraints, or seclusion, alone or in combination, due to “harm to others” or “harm to both self and others” was also included as an outcome, since these interventions are only used when violence or aggression is deemed imminent^20,23. This outcome definition was applied identically across all demographic groups.

Because the DASA assesses patients for the risk of imminent aggression (i.e., within the next 24 hours), predictions were made on each day of the acute care stay (Fig. 1). Accordingly, the prediction time (i.e., when a model prediction was made) was the time at which the DASA ratings were recorded in the morning. The prediction window (i.e., the period after the prediction time over which the outcome was predicted) ended either when an outcome occurred or when the next DASA was recorded (approximately 24 hours after the prediction time). The predictor/observation window (i.e., the period covered by the predictors used for prediction) encompassed the data collected at admission, as well as the preceding 24 hours of the admission or inpatient stay, which informed the DASA score given for that day.

**Fig. 1: Illustration of prediction window and timing of measures.**

Most outcomes were expected to occur within the first three days that a patient was receiving acute care²⁴. For this reason, we included up to three days (i.e., prediction windows) for each visit. If one or more outcomes occurred during a given visit, we only included data collected until the first occurrence, since interventions used to manage the outcome may alter risk. In other words, each visit contributed up to three prediction windows of approximately 24 hours each, the first starting at the first DASA assessment and each ending at the next DASA assessment.

Clinical, sociodemographic and admission data were collected only once, at admission, for each visit, so these values were repeated across the three days of that visit. An outcome (i.e., aggressive incident, restraint, or seclusion) occurring before the first DASA assessment of the same visit (i.e., before admission into an inpatient care unit, such as during transport to CAMH or during the ED course) was treated as a predictor (i.e., a binary variable indicating whether such an event occurred prior to the first DASA), which was likewise repeated across the three days of that visit.
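The windowing rules above (at most three daily windows per visit, truncation after the first outcome, visit-level data and the pre-DASA outcome flag repeated on every day) can be sketched as follows. This is a minimal illustration, not the study's pipeline: the `Visit` container and all field names are hypothetical, and each day's DASA items are assumed to already be available as a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Visit:
    """Hypothetical container for one inpatient visit (field names are illustrative)."""
    patient_id: str
    visit_id: str
    static_features: dict          # collected once at admission (demographics, admission mode, ...)
    pre_dasa_outcome: bool         # incident/restraint/seclusion before the first DASA
    daily_dasa: list = field(default_factory=list)     # one dict of DASA items per morning
    daily_outcome: list = field(default_factory=list)  # True if an outcome followed that DASA


def build_observation_days(visit, max_days=3):
    """Return one row per ~24 h prediction window, stopping after the first outcome."""
    rows = []
    for day, (dasa, outcome) in enumerate(zip(visit.daily_dasa, visit.daily_outcome), start=1):
        if day > max_days:
            break
        rows.append({
            "patient_id": visit.patient_id,
            "visit_id": visit.visit_id,
            "day": day,
            **visit.static_features,                          # repeated on every day of the visit
            "pre_dasa_outcome": int(visit.pre_dasa_outcome),  # also repeated as a predictor
            **dasa,                                           # that morning's DASA items
            "label": int(outcome),
        })
        if outcome:
            break  # later days are dropped: interventions after an outcome may alter risk
    return rows


# Example: an outcome on day 2 means day 3 is never used.
visit = Visit(
    patient_id="p1",
    visit_id="v1",
    static_features={"admission_mode": "police"},
    pre_dasa_outcome=True,
    daily_dasa=[{"irritability": 1}, {"irritability": 0}, {"irritability": 1}],
    daily_outcome=[False, True, False],
)
rows = build_observation_days(visit)
```

A visit without any outcome yields exactly three rows, while the example visit above yields two, the second labelled positive.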

Data processing

All variables were manually reviewed to verify the quality and clinical relevance of each predictor. Additionally, the intake demographic questionnaire allowed open-ended responses when ‘other’ was selected; all such responses were manually recoded into the existing categories in consultation with CAMH acute care clinicians. Gender was grouped into three categories: male, female, and gender expansive. Race/ethnicity was grouped into Black, Asian, South Asian, Indigenous, Latin American, Middle Eastern, Mixed, and White. Each of the seven DASA items was included as an individual dichotomous variable to retain information about specific aggression-related factors. Primary diagnoses appeared as a descriptive field in the EHR; the study team, in consultation with clinicians, grouped them into ten diagnosis types guided by the DSM-5 categories (Supplementary table 2). The final predictor dataset included 16 categorical variables.

A 70%/30% train-test split was performed. Randomization for the split was done by patient rather than by observation day, so that different inpatient days or multiple presentations to acute care for the same patient were never divided between the two sets. No functions were fitted on the test set, which was withheld until the final performance and fairness evaluation. Missing values were imputed according to each variable’s proportion of missingness: variables with ≤ 20% missing were imputed with the mode, and, following Suchting and colleagues²³, missing values for variables with > 20% missing were assigned a new “missing” category to preserve the potential informativeness of high missingness, which may reflect underlying social factors. Non-binary variables were one-hot encoded in preparation for model training.
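The patient-level split and the missingness-dependent imputation can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the study's code. Rows are assumed to be dictionaries with a `patient_id` key and `None` marking a missing answer. The function names are hypothetical. For brevity, every listed column is one-hot encoded, whereas the paper one-hot encoded only non-binary variables. Fill values and one-hot levels are learned from the training rows only, mirroring the rule that nothing is fitted on the test set.

```python
import random
from collections import Counter


def split_by_patient(rows, train_frac=0.7, seed=0):
    """Assign every observation day of a patient to the same side of the split."""
    patients = sorted({r["patient_id"] for r in rows})
    random.Random(seed).shuffle(patients)
    train_ids = set(patients[: round(train_frac * len(patients))])
    train = [r for r in rows if r["patient_id"] in train_ids]
    test = [r for r in rows if r["patient_id"] not in train_ids]
    return train, test


def fit_imputer(train, columns, threshold=0.20):
    """Learn one fill value per column from the training rows only."""
    fill = {}
    for col in columns:
        observed = [r.get(col) for r in train if r.get(col) is not None]
        missing_frac = 1 - len(observed) / len(train)
        if missing_frac > threshold or not observed:
            fill[col] = "missing"  # high missingness kept as its own category
        else:
            fill[col] = Counter(observed).most_common(1)[0][0]  # mode
    return fill


def fit_levels(train, fill, columns):
    """Categories seen in the (imputed) training data, used as one-hot columns."""
    return {
        col: sorted({str(fill[col] if r.get(col) is None else r[col]) for r in train})
        for col in columns
    }


def one_hot(rows, fill, levels):
    """Impute, then one-hot encode; unseen test categories map to all zeros."""
    encoded = []
    for r in rows:
        vec = {}
        for col, cats in levels.items():
            value = str(fill[col] if r.get(col) is None else r[col])
            for cat in cats:
                vec[f"{col}={cat}"] = int(value == cat)
        encoded.append(vec)
    return encoded
```

Shuffling patient IDs rather than rows is what keeps repeated visits by one patient from leaking between training and test data.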

Addressing class imbalance

In the study population, the outcome was imbalanced at a ratio of almost 33:1, with far more no-incident cases than incident cases. To prevent the decision boundary from greatly favoring the majority class at the expense of minority-class predictions (e.g., by making almost exclusively negative predictions), the F1 score, the harmonic mean of precision and recall, was used as the primary evaluation metric, as it offers a more reliable evaluation of classification on imbalanced data than accuracy. Additionally, a range of resampling algorithms were tested as part of model tuning, in which the training set was either undersampled by removing cases from the majority class or oversampled by adding synthetic cases to the minority class, to balance the distribution of positive and negative cases.
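The reason for preferring F1 over accuracy at this level of imbalance shows up in a toy example. The labels below are invented to match a 33:1 ratio and are not study data. A classifier that never predicts an incident looks excellent on accuracy but has an F1 score of zero. A classifier that catches some incidents at the cost of a few false alarms has lower accuracy but a much better F1.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall (0 when undefined)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels at 33:1 imbalance (illustrative, not study data).
y_true = [1] * 3 + [0] * 99
always_negative = [0] * len(y_true)
# Flags 2 of the 3 positives at the cost of 4 false alarms.
some_positives = [1, 1, 0] + [1] * 4 + [0] * 95

print(accuracy(y_true, always_negative), f1_score(y_true, always_negative))  # ≈ 0.971, 0.0
print(accuracy(y_true, some_positives), f1_score(y_true, some_positives))    # ≈ 0.951, ≈ 0.444
```

Selecting by accuracy would reward the useless all-negative model, while selecting by F1 rewards the model that actually flags incidents.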

Model selection and optimization

Model selection and optimization were performed on the training set using 5-fold cross-validation (CV), optimizing the F1 score at a classification probability threshold of 0.5. Logistic regression, naïve Bayes, random forest, gradient boosting, support vector machine, decision tree, and a simple neural network were evaluated as candidate classification models. Additionally, no resampling, random undersampling, NearMiss undersampling, and SMOTE oversampling were evaluated as candidate resampling methods in combination with all candidate models.
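Random undersampling, the simplest of the candidate resamplers, can be sketched as follows. This is a minimal stand-in written for illustration. The paper does not name its resampling implementation; libraries such as imbalanced-learn provide `RandomUnderSampler`, `NearMiss` and `SMOTE`. The sketch assumes that label 1 (incident) is the minority class. Within cross-validation, a resampler like this would be applied to the training part of each fold only, so that validation folds and the test set keep their natural class balance.

```python
import random


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class (label 0) rows until both classes are equal in size.

    Intended for training folds only; held-out data keep their natural class balance.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    kept_neg = rng.sample(neg, k=min(len(pos), len(neg)))
    keep = sorted(pos + kept_neg)  # preserve original row order
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy training fold at 33:1 imbalance: row i has feature value i (illustrative only).
X = list(range(102))
y = [1] * 3 + [0] * 99
X_bal, y_bal = random_undersample(X, y)
```

Undersampling to a 1:1 ratio discards most negative observation days, which is one reason it can lose to no resampling at all, as happened for the final model here.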

Model selection and optimization were carried out in two steps. First, an initial search for an optimal classifier and resampler was conducted by training a model for every possible classifier-resampler combination. For each model, hyperparameters were loosely tuned using 5-fold CV across 100 iterations of a randomized hyperparameter search (Supplementary table 5). Second, more extensive hyperparameter optimization was conducted on the top three performing model architectures identified in the previous step, using 5-fold CV across a comprehensive grid search (Supplementary table 6). The architecture and hyperparameters of the top-performing model by F1 score were used for the final model.
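The two-step procedure can be sketched as a small search routine. This is a schematic under stated assumptions, not the study's code. `cv_f1` is a hypothetical callable that returns the mean 5-fold CV F1 score for a classifier, resampler and hyperparameter setting. The grid dictionaries are likewise placeholders for the search spaces in Supplementary tables 5 and 6.

```python
import itertools
import random


def two_stage_search(coarse_grids, cv_f1, fine_grids=None, n_random=100, top_k=3, seed=0):
    """Two-stage model selection.

    coarse_grids: {(classifier, resampler): {param: [candidate values]}}
    cv_f1: hypothetical callable (classifier, resampler, params) -> mean 5-fold CV F1
    fine_grids: optional larger grids for stage 2 (falls back to coarse_grids)
    """
    rng = random.Random(seed)

    # Stage 1: loose randomized search for every classifier-resampler combination.
    ranked = []
    for (clf, res), grid in coarse_grids.items():
        draws = [{k: rng.choice(v) for k, v in grid.items()} for _ in range(n_random)]
        best_score = max(cv_f1(clf, res, params) for params in draws)
        ranked.append((best_score, clf, res))
    ranked.sort(key=lambda item: item[0], reverse=True)

    # Stage 2: exhaustive grid search over the top-k architectures only.
    best = None
    for _, clf, res in ranked[:top_k]:
        grid = (fine_grids or {}).get((clf, res), coarse_grids[(clf, res)])
        names = sorted(grid)
        for values in itertools.product(*(grid[n] for n in names)):
            params = dict(zip(names, values))
            score = cv_f1(clf, res, params)
            if best is None or score > best[0]:
                best = (score, clf, res, params)
    return best
```

The design trades thoroughness for cost. Only three architectures receive the expensive exhaustive search, so a combination that was unlucky in the random stage cannot recover in the second stage.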

Model training

The final model and sampler were refit on the entire training set, and performance was measured on the model’s predictions for the hold-out test set. No resampling was performed on the test set (Fig. 2). Final model performance on the test set was quantified using F1 score, PR-AUC, ROC-AUC, accuracy, sensitivity, and specificity. Standard deviations and confidence intervals for all performance and fairness metrics were calculated by training the model five times with five different random seeds, then applying each trained model to the test set¹¹. Feature importances were extracted using impurity-based importance as implemented in scikit-learn. Model recalibration was not performed, as we wished to examine the fairness characteristics of the classifier without any post-hoc group-specific adjustment.
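A summary of a metric across the five seeded runs can be computed with the standard library, as below. This is an assumption-laden sketch: the paper does not say how its confidence intervals were constructed. Here a t-based 95% interval with 4 degrees of freedom is assumed (two-sided quantile ≈ 2.776).

```python
import statistics

# Two-sided 95% t quantile for 4 degrees of freedom (five seeds). Assumption:
# the paper does not state how its confidence intervals were constructed.
T_CRIT_95_DF4 = 2.776


def summarize_over_seeds(values, t_crit=T_CRIT_95_DF4):
    """Mean, sample standard deviation, and t-based 95% CI across repeated runs."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # n - 1 denominator
    half_width = t_crit * sd / len(values) ** 0.5
    return mean, sd, (mean - half_width, mean + half_width)
```

The test set is fixed and only the training seed changes, so this spread reflects training randomness rather than test-set sampling variability. That may be why some small subgroups show a standard deviation of 0.0000: every seed makes the same predictions for their few positive cases.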

Fairness assessment

Fairness analysis was conducted using observational criteria based on post-hoc analysis of model outputs, true outcomes, and sensitive attributes. In our paper, we operationalize fairness using the equalized odds criterion, which requires that a fair model have equal false positive rates (FPR) and true positive rates (TPR) across subgroups of a sensitive attribute, where $\mathrm{FPR}=\frac{FP}{FP+TN}$ (the denominator being all actual negatives) and $\mathrm{TPR}=\frac{TP}{TP+FN}$ (the denominator being all actual positives)⁷. Given that false positives (i.e., incorrectly flagging individuals as being at high risk of aggression) carry the most potential harm with respect to reinforcing social and structural inequities, we primarily focus our assessment on FPR, which represents the disparate mistreatment criterion of fairness²⁵. To understand the fairness behavior of the algorithm more thoroughly, group-specific F1 scores, ROC curves, and calibration curves were also assessed.

Attributes that were analyzed for fairness include race/ethnicity, gender, admission mode, citizenship, and housing status. Intersectional fairness analysis was performed for the intersection of gender and race/ethnicity. When performing fairness analysis for a given feature, individuals with imputed values for that feature were excluded from that analysis. This ensures the results of the fairness analysis reflect only real-world observations without being affected by any potential bias introduced by imputation.
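The per-group rates behind the equalized odds criterion, including the exclusion of imputed attribute values, can be computed as in the sketch below. The function names are hypothetical. The max-minus-min gap is one common summary chosen here for illustration. The paper itself reports per-group FPR and TPR rather than a single gap statistic.

```python
import math
from collections import defaultdict


def group_rates(y_true, y_pred, groups, imputed):
    """Per-group FPR = FP / (FP + TN) and TPR = TP / (TP + FN).

    Rows whose sensitive attribute was imputed are skipped, so only observed
    attribute values enter the fairness analysis.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g, was_imputed in zip(y_true, y_pred, groups, imputed):
        if was_imputed:
            continue
        cell = ("tp" if p else "fn") if t else ("fp" if p else "tn")
        counts[g][cell] += 1
    rates = {}
    for g, c in counts.items():
        negatives, positives = c["fp"] + c["tn"], c["tp"] + c["fn"]
        rates[g] = {
            "FPR": c["fp"] / negatives if negatives else math.nan,
            "TPR": c["tp"] / positives if positives else math.nan,
        }
    return rates


def equalized_odds_gaps(rates):
    """Largest between-group difference in FPR and in TPR; both are 0 under equalized odds."""
    fprs = [r["FPR"] for r in rates.values() if not math.isnan(r["FPR"])]
    tprs = [r["TPR"] for r in rates.values() if not math.isnan(r["TPR"])]
    return max(fprs) - min(fprs), max(tprs) - min(tprs)
```

Because FPR is computed only over actual negatives, a group's FPR is unaffected by how many incidents that group has. This is what makes it a direct measure of how often people who were not aggressive were wrongly flagged.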

Results

Sample characteristics

Across all observation days, there were a total of 41,447 “no incident” cases (i.e., observation days on which a violent or aggressive incident was not reported) and 1,272 “incident” cases (i.e., days on which such an incident was reported), corresponding to 17,703 unique patients. These observation days were split into 29,879 cases in the training set (n = 12,398 unique patients) and 12,840 in the test set (n = 5,305 unique patients).

Patients were relatively evenly distributed across age categories, but they were predominantly male, White, single, and Canadian citizens. The most common diagnoses were psychotic disorders, and patients were most commonly accompanied to the ED by family or friends. Citizenship, housing, marital status, and admission mode each had more than 20% missing observations and were therefore imputed with a “missing” category. All features differed significantly between the no-incident and incident populations (p < 0.001). Sample characteristics at the level of observation days in the overall data are shown in Table 1. Sample characteristics at the level of unique patients are reported in Supplementary table 3, and characteristics by train/test set in Supplementary table 4.

Table 1 Sample characteristics at the level of observation days


Model performance

The best-performing model in cross-validation on the training set was a 200-estimator random forest (RF) with no oversampling or undersampling. Performance of the other candidate models is reported in Supplementary tables 5 and 6. On the hold-out test set, the random forest obtained an F1 score of 0.2213 ± 0.0031, a PR-AUC of 0.1301 ± 0.0025, an ROC-AUC of 0.8120 ± 0.0016, and an accuracy of 0.9323 ± 0.0004 (Fig. 3A, B). At a probability threshold of 0.5, the model had a sensitivity/TPR of 0.3265 ± 0.0057 and a specificity of 0.9507 ± 0.0005. Feature importances extracted from the RF revealed that the DASA items, especially irritability, as well as the presence of a violent/aggressive incident or restraint occurring prior to admission into acute care, were highly important for predictions (Fig. 3D).
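The reported metrics are mutually consistent, and a short calculation shows why a model with an ROC-AUC of 0.81 still has an F1 score near 0.22. The only inputs are the reported sensitivity and specificity, plus one assumption: the test-set incident rate is taken to match the overall rate of 1,272 incidents in 42,719 observation days. The test-set prevalence is not reported separately.

```python
# Reported test-set operating point (threshold 0.5).
sensitivity = 0.3265
specificity = 0.9507
# Assumption: the test-set incident rate matches the overall rate (1,272 of 42,719 days).
prevalence = 1272 / 42719

# Expected fraction of all observation days falling in each confusion-matrix cell.
tp = sensitivity * prevalence
fp = (1 - specificity) * (1 - prevalence)
tn = specificity * (1 - prevalence)

precision = tp / (tp + fp)                                     # ≈ 0.17
f1 = 2 * precision * sensitivity / (precision + sensitivity)  # ≈ 0.22
accuracy = tp + tn                                             # ≈ 0.93
false_alarms_per_hit = fp / tp                                 # ≈ 4.9
```

With only about 3% of days positive, a false positive rate of roughly 5% applied to the large negative class produces about five false alarms for every correctly flagged day. For the same reason, even modest between-group FPR differences translate into many additional patients being wrongly flagged.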

Fairness assessment

With respect to race/ethnicity, Middle Eastern individuals had the highest FPR among all ethnic groups (FPR [standard deviation] = 0.0801 [0.0048]), followed by Black (0.0694 [0.002]), Indigenous (0.0552 [0.0037]), Mixed (0.0525 [0.0021]), White (0.0404 [0.0008]), South Asian (0.0356 [0.0019]), Asian (0.0322 [0.0028]), and Latin American (0.0313 [0.0028]) individuals (Fig. 4, Supplementary table 7). There was also significant variation in TPR: Latin American (TPR [standard deviation] = 0.3846 [0.0000]) and Middle Eastern (0.3778 [0.0222]) individuals had the highest TPR, and Asian (0.2381 [0.0000]) and South Asian (0.2571 [0.0350]) individuals had the lowest (Supplementary table 7). The F1 score was highest in Middle Eastern individuals (F1 score [standard deviation] = 0.2372 [0.0158]). ROC curves revealed significant differences in the TPR-FPR trade-offs between groups, with Black individuals having considerably higher FPR at any given TPR (Supplementary Fig. 1).

**Fig. 4: True positive rate (TPR) and false positive rate (FPR) by demographic attributes.**

Men (0.0542 [0.0005]) had higher FPR than women (0.0426 [0.0009]) and gender expansive individuals (0.0418 [0.006]). TPR, F1 score and ROC-AUC were all lower in men than in women. At conservative prediction thresholds, men had higher FPR at any given TPR compared to women and gender expansive individuals.

Examining fairness by admission mode, individuals admitted by police had a significantly higher FPR (0.0941 [0.0019]) than any other group, followed by other (0.0547 [0.002]), mobile crisis (0.0476 [0.0000]), self (0.0326 [0.0007]), case worker/nurse (0.0303 [0.0007]), and friend/family (0.0264 [0.0011]). Police admissions also had relatively high TPR (0.4174 [0.0165]) and F1 score (0.2405 [0.0158]).

Canadian citizens had higher FPR and TPR (FPR = 0.0427 [0.004], TPR = 0.3145 [0.0072]) than non-citizens (FPR = 0.0210 [0.0024], TPR = 0.2696 [0.0174]). There was a significant mismatch in the ROC curves between the two groups, with Canadian citizens having higher FPR at any given TPR at conservative prediction thresholds.

With respect to housing, those who were unstably housed or unhoused (0.0829 [0.0014]) or living in supportive housing (0.0502 [0.0017]) had higher FPR than those with other forms of housing, such as owning (0.0344 [0.002]), renting (0.0318 [0.0007]), and living with family (0.0273 [0.001]). Individuals living in supportive housing had considerably lower predictive performance than other groups, with the lowest TPR (0.2308 [0.0243]), F1 score (0.1356 [0.0123]) and ROC-AUC (0.7651 [0.0048]).

Intersectional fairness assessment

Intersectional analysis was performed for the intersection of race/ethnicity and gender (Fig. 5; Supplementary table 8). The gender expansive group was excluded due to low sample sizes (N < 15 observations) for all races/ethnicities except White. All other intersectional groups had more than 50 observations. Middle Eastern men had the highest FPR (FPR [standard deviation] = 0.0933 [0.0074]), with a highly pronounced gender-specific effect: Middle Eastern women had a significantly lower FPR (0.0372 [0.0047]), although both genders had similar TPRs. Black men (0.0759 [0.0026]) and Indigenous men (0.0747 [0.0062]) also had relatively high FPRs, and their TPRs tended to be higher than those of Black women (TPR [standard deviation] = 0.2353 [0.0372]) and Indigenous women (0.2500 [0.0000]). Across all races/ethnicities, men had an FPR equal to or greater than that of women.
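The intersectional FPR computation can be sketched by keying each observation on its (race/ethnicity, gender) pair. This is a minimal illustration under stated assumptions. The function name and the `min_n` cut-off are hypothetical: the study excluded the gender expansive group as a whole rather than applying a per-cell size rule, and `min_n=50` simply mirrors the size floor reported for the remaining cells. The corresponding TPR computation would be analogous, restricted to actual positives.

```python
from collections import Counter, defaultdict


def intersectional_fpr(y_true, y_pred, race, gender, min_n=50):
    """FPR for each (race/ethnicity, gender) cell with at least min_n observations.

    min_n is an assumed cut-off for illustration; the study excluded the gender
    expansive group as a whole rather than applying a per-cell rule.
    """
    cells = list(zip(race, gender))
    sizes = Counter(cells)
    fp = defaultdict(int)
    negatives = defaultdict(int)
    for t, p, cell in zip(y_true, y_pred, cells):
        if sizes[cell] < min_n or t == 1:
            continue  # skip small cells and actual positives (FPR uses negatives only)
        negatives[cell] += 1
        fp[cell] += int(p == 1)
    return {cell: fp[cell] / negatives[cell] for cell in negatives}
```

Cells are formed before any filtering. As a result, an intersectional group can be too small to analyze even when each of its marginal groups is large.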

Discussion

In this study, we assessed whether ML predictions of inpatient aggression in acute psychiatric care are unfair. To our knowledge, this is the most comprehensive fairness assessment of ML for this outcome, and it builds on previous work by Dobbins et al.¹⁹ by examining a wider range of social determinants and applying an intersectional approach. A random forest model was trained on a range of demographic, clinical, admission, and risk assessment data, yielding an F1 score of 0.22, a PR-AUC of 0.13, and an ROC-AUC of 0.81. Although maximizing predictive performance was not an emphasis of this study, the model achieved performance comparable to that of ML algorithms trained on tabular data in clinically heterogeneous psychiatric populations in prior research (ROC-AUC obtained in Suchting et al. = 0.78²³, Menger et al. = 0.76²⁶, Wang et al. = 0.63²⁷, Danielsen et al. = 0.87²⁸). The fairness assessment revealed that the algorithm violates both the disparate mistreatment and equalized odds criteria: there were significant disparities in FPR, TPR, and ROC curves across race/ethnicity, gender, admission mode, citizenship, and housing status. Relative to other groups, FPR was elevated in individuals who are Middle Eastern or Black, who identify as male, who are admitted into emergency care by the police, who are Canadian citizens, and who have unstable or supportive forms of housing. Intersectional analyses revealed that Middle Eastern men had the highest FPR among all intersectional groups. The TPR and ROC curves of each group also differed in relation to its FPR, suggesting that the nature of algorithmic unfairness differs between groups. For example, for patients who are Middle Eastern, unstably housed or unhoused, or admitted by police, FPR and TPR were both elevated relative to other groups, suggesting that for these groups the model traded a higher FPR for greater sensitivity. Conversely, for other groups, such as Black patients, the model had high FPR and low TPR, suggesting poor overall performance.

Importantly, observational measures of unfairness, such as TPR and FPR, are merely outcome measures that do not explain how unfair predictions arise. Rather, these results must be understood in the context of underlying social and structural inequities that can give rise to unfair predictions in the first place, such as racial profiling in the criminal justice system, racial residential segregation, or barriers to accessing mental healthcare²⁹. We discuss some of these parallels below.

Black individuals are less likely to receive adequate outpatient psychiatric treatment, more likely to be involuntarily admitted into inpatient treatment, and may also present with more severe psychotic symptoms, compared to White individuals^13,14. Black men in particular face significant barriers in accessing mental health care, and they are more likely than White men to be misdiagnosed with psychotic disorders^16,17,18. Interpersonal bias is also a possibility: structurally reinforced stereotypes may lead to higher risk perceptions for racially marginalized individuals on clinical risk instruments like the DASA, though research is largely inconclusive on whether these instruments are themselves biased. Both male gender and Black race have been found to be significantly associated with violence in psychiatric settings. Findings from our study suggest that these associations can become embedded in clinical datasets, which may lead to unfair treatment by ML algorithms, both via increased false positive predictions and poorer performance in identifying at-risk individuals^2,30.

Police apprehension for admission into the ED is also regarded by clinicians as a relevant factor in risk assessment, owing to an increased likelihood of aggression in patients admitted involuntarily and/or referred by the police^2,31. It is therefore perhaps not surprising that this mode of admission was associated with the highest FPR of any subgroup in the fairness assessment. Patients apprehended by police for admission into emergency psychiatric care are indeed more likely to become violent or aggressive, which likely accounts for the relatively high FPR and TPR for this group³². At the same time, racially marginalized and Indigenous groups have increased rates of involuntary admission into psychiatric care by police, likely due to various factors, such as barriers to accessing mental health care or racial profiling^31,33,34,35. This tendency may in part explain the higher FPRs among Black men, and potentially among Middle Eastern and Indigenous individuals as well.

The fairness assessment also highlights housing as a potential source of algorithmic unfairness, specifically for those with unstable or supportive forms of housing. On a social level, unstable housing has been associated with psychiatric conditions, such as trauma and substance use, as well as with lower educational attainment and disrupted support networks^36,37. Conditions of unstable housing may contribute to food or water insecurity, sleep deprivation, and hypervigilance, which can lead to behaviors that are rated as precursors of aggression on clinical instruments such as the DASA (e.g., irritability, sensitivity to provocation, and unwillingness to follow instructions). Structurally, current psychiatric care systems are not well equipped to meet the constellation of needs of unhoused individuals, which may contribute to their increased ED use and higher false positive predictions for the risk of violence in inpatient care^37,38,39,40,41. Supportive housing services for people with severe mental illness offer more stability, but they are in high demand and extremely under-resourced, and are often unable to meet complex, individual needs⁴².

We also identified performance disparities that are not linked to well-researched inequities. For example, while qualitative analyses have shown a general distrust of biomedical mental health services among Middle Eastern individuals, there is a considerable research gap in characterizing how they interact with these systems⁴³. Although our analyses suggest that high FPR for Middle Eastern patients may be in part related to improved model TPR/sensitivity, social and structural determinants likely play a role in the way their risk of violence or aggression is perceived; these may be related to cultural communication barriers, or expressions of distrust manifesting as increased irritability or an unwillingness to follow instructions. However, the gender discrepancy in FPR (but not in TPR) for this group suggests this effect may only extend to men. Similarly, the algorithm displayed modest FPR differences based on citizenship, which is also not a well-documented demographic feature in the psychiatric literature. Nevertheless, citizenship may be an important factor to consider in future fairness assessments of ML models in healthcare, given its impact on access to community, social, and health services.

These findings highlight the importance of thoughtful documentation and processing of demographic data, which is a strength of our study. Access to high-quality and diverse sociodemographic information is necessary for evaluating ML models for fairness, making it critical that these data are collected and not lost during processing⁴⁴. Middle Eastern ethnicity, for example, does not appear to be commonly encoded as a distinct racial or ethnic category in research datasets, which inevitably precludes the discovery of important trends in this population such as those identified in our study⁴⁵. Demographics in our dataset were drawn from CAMH’s health equity form, which was designed to capture a range of rich features that are not frequently characterized, such as specific ethnic and gender minorities⁴⁶.

Overall, our results suggest that if fairness is not properly considered, deploying ML algorithms to support the prediction of aggression in acute psychiatric care and other clinical settings could cause significant harm to socially and structurally disadvantaged groups, in the form of both disparate mistreatment and violations of equalized odds⁴⁵. Bias in ML algorithms has already been shown to reduce clinician accuracy⁴⁷. In psychiatric risk assessment, the unwarranted use of interventions based on a false positive prediction can cause unnecessary distress, erode trust in the therapeutic relationship or the health system, and may even precipitate violent or aggressive incidents that would not otherwise have occurred⁴⁸. Furthermore, an extensive literature highlights the cyclical nature of algorithmic unfairness: algorithms can reproduce and amplify existing inequalities, which can then become embedded in new datasets used to develop ML algorithms or inform care^45,49. Even if an unfair recommendation is not followed, disagreement between providers and ML algorithms may lead providers to fear legal repercussions, which may negatively affect care⁵⁰. Given these concerns, both patients and providers recognize algorithmic unfairness as a major barrier to the clinical implementation of predictive risk models^50,51.

A range of algorithmic methods exists to improve a model’s fairness, such as integrating fairness benchmarks into the optimization criteria during model training, resampling the input data itself, or enforcing specific fairness criteria with group-specific prediction thresholds⁵². Several studies have now applied “debiasing” methods to clinical ML algorithms, with promising results^53,54,55. Our findings highlight the need to assess fairness properly so that these measures can be applied, as appropriate, to predictive risk models before they are deployed. An important consideration, however, is that most debiasing methods use the ground-truth outcome label as the benchmark for determining whether a model is fair⁵⁶. In other words, most methods seek to faithfully replicate “the world as it is”: no more, but also no less, unfair than the input data. However, as we have discussed, data relating to inpatient aggression, particularly the administration of coercive interventions, are deeply intertwined with societal inequities. Debiasing metrics and methods in this context must therefore use some “true” notion of fairness that represents “the world as it should be”. Algorithmic interventions alone thus do not constitute a complete solution. To enable algorithmic debiasing approaches, practitioners must first define how a fair and equitable ML algorithm should behave; this is a social question, not a technical one.
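To make the last of these approaches concrete, the following is a minimal Python sketch of group-specific thresholding as a post-processing step. It assumes a model that outputs a risk score per observation, recorded binary outcome labels, and a subgroup label, and for each subgroup it picks the lowest threshold whose false positive rate does not exceed a shared target. The function names, the target FPR, and the toy data are hypothetical illustrations, not the method used in this study. Because the FPR is computed against the recorded outcome labels, the sketch inherits the limitation described above: it equalizes error rates relative to “the world as it is”.

```python
import math

def fpr_at(scores, labels, threshold):
    """False positive rate: share of negatives (label 0) with score >= threshold."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not negatives:
        return float("nan")  # undefined when a group has no negatives
    return sum(s >= threshold for s in negatives) / len(negatives)

def threshold_for_target_fpr(scores, labels, target_fpr):
    """Lowest candidate threshold whose FPR is at most target_fpr.

    FPR can only fall as the threshold rises, so the first qualifying candidate
    (in ascending order) flags the most observations the constraint allows.
    """
    candidates = sorted(set(scores)) + [math.inf]  # inf flags no one
    for t in candidates:
        if fpr_at(scores, labels, t) <= target_fpr:
            return t
    return math.inf

def group_thresholds(scores, labels, groups, target_fpr):
    """Hypothetical post-processing step: one decision threshold per subgroup."""
    thresholds = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        thresholds[g] = threshold_for_target_fpr(
            [scores[i] for i in idx], [labels[i] for i in idx], target_fpr
        )
    return thresholds

# Toy data (illustrative only): risk scores, recorded outcomes, subgroup labels.
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.7, 0.6, 0.5, 0.95, 0.1]
labels = [1, 0, 0, 0, 0, 1, 0, 0, 1, 0]
groups = ["A"] * 5 + ["B"] * 5

if __name__ == "__main__":
    print(group_thresholds(scores, labels, groups, target_fpr=0.25))
```

On this toy data the call returns `{'A': 0.8, 'B': 0.7}`. By contrast, a single shared threshold of 0.5 would give group A an FPR of 0.25 and group B an FPR of about 0.67, which is the kind of disparity such post-processing targets.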

Ultimately, ML systems do not operate in a vacuum but as part of sociotechnical systems in which algorithms and societal inequities interact in complex ways. We emphasize that ML fairness assessments can identify inequities across large, complex datasets and help target further investigation. However, fairness analysis alone cannot characterize in depth the social and structural drivers of unfairness, nor the exact processes by which they ultimately produce unfair predictions. When seeking to understand algorithmic fairness, therefore, it is important to also examine these biases and inequities at a social level, for example through qualitative approaches that reveal patient and provider experiences^57,58.

It is also important to note that there is no single optimal way to assess the fairness of ML algorithms. There are over 70 definitions of fairness, many of which conflict, so it is impossible to satisfy all of them simultaneously⁵⁹. We restricted our analysis to a single a priori perspective on what constitutes a fair ML model, focusing on disparate mistreatment and equalized odds; our analysis may therefore have missed other relevant fairness considerations or perspectives. For example, in contrast to the group notion of fairness used in this study, individual fairness postulates that similar individuals should receive similar ML predictions, drawing on philosophies of consistency and individual justice rather than on anti-discrimination frameworks^49,60. Individual fairness often relies on counterfactual or explanation-based definitions of fairness, neither of which was assessed in this study⁶⁰. In addition, we examined the fairness of only one model architecture, since our aim was to evaluate models that could be advanced for further testing and implementation (i.e., those performing best on training or validation data). Although underlying societal inequities are likely to affect different types of ML models in similar ways, there is evidence that fairness performance can vary with model architecture^55,61. Future research could therefore examine how fairness characteristics differ between model types and investigate the impact of integrating fairness considerations into model selection itself^62,63.
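The group notion of fairness adopted here can be illustrated with a short Python sketch. Under equalized odds, a classifier’s true positive rate (TPR) and false positive rate (FPR) should be approximately equal across subgroups, and disparate mistreatment can likewise be summarized as between-group differences in error rates. The helper names and toy predictions below are hypothetical, and the sketch reports only raw gaps; it omits the significance testing that a real fairness assessment would require.

```python
import math
from collections import defaultdict

def error_rates_by_group(y_true, y_pred, groups):
    """Return {group: (TPR, FPR)} from binary outcome labels and binary predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for y, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        if y == 1:
            c["tp" if p == 1 else "fn"] += 1
        else:
            c["fp" if p == 1 else "tn"] += 1
    rates = {}
    for g, c in counts.items():
        pos, neg = c["tp"] + c["fn"], c["fp"] + c["tn"]
        tpr = c["tp"] / pos if pos else float("nan")  # undefined with no positives
        fpr = c["fp"] / neg if neg else float("nan")  # undefined with no negatives
        rates[g] = (tpr, fpr)
    return rates

def equalized_odds_gaps(rates):
    """Largest between-group TPR and FPR differences (0.0 means exact parity)."""
    tprs = [t for t, _ in rates.values() if not math.isnan(t)]
    fprs = [f for _, f in rates.values() if not math.isnan(f)]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)

# Toy data (illustrative only): outcomes, thresholded predictions, subgroups.
y_true = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
groups = ["A"] * 5 + ["B"] * 5

if __name__ == "__main__":
    rates = error_rates_by_group(y_true, y_pred, groups)
    print(rates)
    print(equalized_odds_gaps(rates))
```

For this toy data, group A has a TPR of 0.5 and an FPR of 1/3, while group B has a TPR of 1.0 and an FPR of 2/3, giving gaps of 0.5 and about 0.33. With small subgroups, such as the intersectional groups noted in the limitations, these point estimates are noisy, so gaps would normally be reported with confidence intervals or formal tests.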

The dataset used for this study also has limitations. Our algorithm was trained on data from an urban Canadian population; although underlying inequities appear pervasive across populations, our findings may not generalize to other settings⁶⁴. Moreover, the analysis relied on EHR data, which are known to vary in quality. For instance, some aggressive incidents may not have been documented, or modes of admission may have been mislabelled. Following prior work²³, we included restraints in the outcome under the assumption that they were applied only when aggressive incidents were imminent, which may not always hold. Additionally, segmenting the dataset into subgroups reduced the sample size for the fairness assessment, especially for minority and intersectional groups. For example, limited sample sizes required us to collapse granular descriptions of ethnic heritage into a broad “Black” category, which may mask additional disparities in ML fairness within this heterogeneous group⁶⁵. Similar limitations applied to gender: we grouped all genders other than male or female into a single category, which still lacked sufficient size for intersectional analysis. We were also limited in our exploration of indicators of socioeconomic status beyond housing, such as income or area-level deprivation and marginalization, which may have offered valuable insights into how these features affect model fairness. We therefore encourage future ML studies in this context to perform fairness assessments that leverage rich dataset features, such as granular ethnic breakdowns, larger sample sizes for intersectional groups, and multiple indicators of socioeconomic status. This will enable a more nuanced and thorough understanding of algorithmic fairness and how it may differ across populations.

In conclusion, ML predictions of aggression in acute psychiatric care and other clinical settings have the potential to be unfairly biased. However, this is not meant to be an argument against the use of ML in such contexts. Rather, we suggest that it is critical to consider fairness before such models are implemented, and we illustrate how performing fairness analyses can shed light on underlying inequities. To this end, we encourage future ML work in psychiatry to treat fairness as a critical element of evaluation and to conduct further research interrogating the inequities identified here.

Data availability

The dataset is restricted because it comprises confidential EHRs. Inquiries about the data can be directed to the corresponding author.

Code availability

The code and prediction model supporting this study are not publicly available but can be shared upon reasonable request to the corresponding author.

References

Itzhaki, M. et al. Exposure of mental health nurses to violence associated with job stress, life satisfaction, staff resilience, and post-traumatic growth. Int. J. Ment. Health Nurs. 24, 403–412 (2015).
Iozzino, L., Ferrari, C., Large, M., Nielssen, O. & de Girolamo, G. Prevalence and risk factors of violence by psychiatric acute inpatients: a systematic review and meta-analysis. PLoS ONE 10, e0128536 (2015).
Pescosolido, B. A., Manago, B. & Monahan, J. Evolving public views on the likelihood of violence from people with mental illness: stigma and its consequences. Health Aff. 38, 1735–1743 (2019).
Zaheer, J. Documenting Restraint: Minimizing Trauma. in Interrogating Psychiatric Narratives of Madness: Documented Lives (eds. Daley, A. & Pilling, M. D.) 111–135 (Springer International Publishing, 2021). https://doi.org/10.1007/978-3-030-83692-4_5.
Lu, W., Mueser, K. T., Rosenberg, S. D., Yanos, P. T. & Mahmoud, N. Posttraumatic reactions to psychosis: a qualitative analysis. Front. Psychiatry 8, 129 (2017).
Parmigiani, G., Barchielli, B., Casale, S., Mancini, T. & Ferracuti, S. The impact of machine learning in predicting risk of violence: a systematic review. Front. Psychiatry 13, 1015914 (2022).
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K. & Galstyan, A. A survey on bias and fairness in machine learning. ACM Comput. Surv. 54, 1–35 (2021).
Angwin, J., Larson, J., Mattu, S. & Kirchner, L. Machine Bias. ProPublica https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing (2016).
Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453 (2019).
Chen, I. Y., Szolovits, P. & Ghassemi, M. Can AI help reduce disparities in general medical and mental health care? AMA J. Ethics 21, E167–E179 (2019).
Seyyed-Kalantari, L., Zhang, H., McDermott, M. B. A., Chen, I. Y. & Ghassemi, M. Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations. Nat. Med. 27, 2176–2182 (2021).
National Collaborating Centre for Determinants of Health. Glossary of essential health equity terms. (NCCDH, 2022).
Hairston, D. R., Gibbs, T. A., Wong, S. S. & Jordan, A. Clinician Bias in Diagnosis and Treatment. in Racism and Psychiatry: Contemporary Issues and Interventions (eds. Medlock, M. M., Shtasel, D., Trinh, N.-H. T. & Williams, D. R.) 105–137 (Springer International Publishing, 2019). https://doi.org/10.1007/978-3-319-90197-8_7.
Smith, C. M. et al. Association of black race with physical and chemical restraint use among patients undergoing emergency psychiatric evaluation. Psychiatr. Serv. Wash. DC 73, 730–736 (2022).
Kirkbride, J. B. et al. The social determinants of mental health and disorder: evidence, prevention and recommendations. World Psychiatry 23, 58–90 (2024).
Motley, R. & Banks, A. Black males, trauma, and mental health service use: a systematic review. Perspect. Soc. Work J. Dr. Stud. Univ. Houst. Grad. Sch. Soc. Work 14, 4–19 (2018).
Olbert, C. M., Nagendra, A. & Buck, B. Meta-analysis of Black vs. White racial disparity in schizophrenia diagnosis in the United States: do structured assessments attenuate racial disparities? J. Abnorm. Psychol. 127, 104–115 (2018).
Tegnerowicz, J. “Maybe It Was Something Wrong With Me”: On the Psychiatric Pathologization of Black Men. in Inequality, Crime, and Health Among African American Males vol. 20 73–94 (Emerald Publishing Limited, 2018).
Dobbins, N. J. et al. Deep learning models can predict violence and threats against healthcare providers using clinical notes. npj Ment. Health Res. 3, 61 (2024).
Sikstrom, L. et al. Predictive care: a protocol for a computational ethnographic approach to building fair models of inpatient violence in emergency psychiatry. BMJ Open 13, e069255 (2023).
Ogloff, J. R. P. & Daffern, M. The dynamic appraisal of situational aggression: an instrument to assess risk for imminent aggression in psychiatric inpatients. Behav. Sci. Law 24, 799–813 (2006).
Lantta, T., Kontio, R., Daffern, M., Adams, C. E. & Välimäki, M. Using the dynamic appraisal of situational aggression with mental health inpatients: a feasibility study. Patient Prefer. Adherence 10, 691–701 (2016).
Suchting, R., Green, C. E., Glazier, S. M. & Lane, S. D. A data science approach to predicting patient aggressive events in a psychiatric hospital. Psychiatry Res. 268, 217–222 (2018).
Weltens, I. et al. Aggression on the psychiatric ward: prevalence and risk factors. A systematic review of the literature. PLoS ONE 16, e0258346 (2021).
Zafar, M. B., Valera, I., Rodriguez, M. G. & Gummadi, K. P. Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment. In Proceedings of the 26th International Conference on World Wide Web 1171–1180 (2017). https://doi.org/10.1145/3038912.3052660.
Menger, V., Spruit, M., van Est, R., Nap, E. & Scheepers, F. Machine learning approach to inpatient violence risk assessment using routinely collected clinical notes in electronic health records. JAMA Netw. Open 2, e196709 (2019).
Wang, K. Z. et al. Prediction of physical violence in schizophrenia with machine learning algorithms. Psychiatry Res. 289, 112960 (2020).
Danielsen, A. A., Fenger, M. H. J., Østergaard, S. D., Nielbo, K. L. & Mors, O. Predicting mechanical restraint of psychiatric inpatients by applying machine learning on electronic health data. Acta Psychiatr. Scand. 140, 147–157 (2019).
El-Azab, S. & Nong, P. Clinical algorithms, racism, and “fairness” in healthcare: a case of bounded justice. Big Data Soc. 10, 20539517231213820 (2023).
Watts, D., Leese, M., Thomas, S., Atakan, Z. & Wykes, T. The prediction of violence in acute psychiatric units. Int. J. Forensic Ment. Health 2, 173–180 (2003).
Maharaj, R., Gillies, D., Andrew, S. & O’Brien, L. Characteristics of patients referred by police to a psychiatric hospital. J. Psychiatr. Ment. Health Nurs. 18, 205–212 (2011).
Dharma, C. et al. Examining systemic and interpersonal bias in violence risk assessments of patients in acute psychiatric care. Psychiatr. Serv. 76, 326–335 (2025).
Meerai, S., Abdillahi, I. & Poole, J. An introduction to anti-black sanism. Intersect. Glob. J. Soc. Work Anal. Res. Polity Pract. 5, 18–35 (2016).
Bhui, K. et al. Ethnic variations in pathways to and use of specialist mental health services in the UK. Systematic review. Br. J. Psychiatry J. Ment. Sci. 182, 105–116 (2003).
Chow, J. C.-C., Jaffee, K. & Snowden, L. Racial/ethnic disparities in the use of mental health services in poverty areas. Am. J. Public Health 93, 792–797 (2003).
Schreiter, S. et al. Housing situation and healthcare for patients in a psychiatric centre in Berlin, Germany: a cross-sectional patient survey. BMJ Open 9, e032576 (2019).
Narendorf, S. C. Intersection of homelessness and mental health: a mixed methods study of young adults who accessed psychiatric emergency services. Child. Youth Serv. Rev. 81, 54–62 (2017).
Amato, S., Nobay, F., Amato, D. P., Abar, B. & Adler, D. Sick and unsheltered: Homelessness as a major risk factor for emergency care utilization. Am. J. Emerg. Med. 37, 415–420 (2019).
Kushel, M. B., Perry, S., Bangsberg, D., Clark, R. & Moss, A. R. Emergency department use among the homeless and marginally housed: results from a community-based study. Am. J. Public Health 92, 778–784 (2002).
Serper, M. R. et al. Predictors of aggression on the psychiatric inpatient service. Compr. Psychiatry 46, 121–127 (2005).
Mauri, M. C. et al. Aggressiveness and violence in psychiatric patients: a clinical or social paradigm? CNS Spectr. 24, 564–573 (2019).
Sanford, S., Roche, B., Molina, I., Weston, N. A. & Sirotich, F. Toronto Supportive Housing Growth Plan: Needs Assessment (Wellesley Institute & Canadian Mental Health Association-Toronto, Toronto, ON, 2022).
Tahir, R., Due, C., Ward, P. & Ziersch, A. Understanding mental health from the perception of Middle Eastern refugee women: a critical systematic review. SSM - Ment. Health 2, 100130 (2022).
Andrus, M., Spitzer, E., Brown, J. & Xiang, A. What We Can’t Measure, We Can’t Understand: Challenges to Demographic Data Procurement in the Pursuit of Fairness. in Proc. 2021 ACM Conference on Fairness, Accountability, and Transparency 249–260 (Association for Computing Machinery, 2021). https://doi.org/10.1145/3442188.3445888.
Soliman, L., Jain, A., Rozel, J. & Rachal, J. Safe spaces: mitigating potential aggression in acute care psychiatry. FOCUS 21, 46–51 (2023).
Centre for Addiction and Mental Health. We Ask Because We Care: Answers to Frequently Asked Questions about Patient Demographic Data Collection (Centre for Addiction and Mental Health, Toronto, ON) https://www.camh.ca/-/media/files/socio-demographic_patient_pamphlet-pdf.
Jabbour, S. et al. Measuring the impact of AI in the diagnosis of hospitalized patients: a randomized clinical vignette survey study. JAMA 330, 2275–2284 (2023).
Ling, S., Cleverley, K. & Perivolaris, A. Understanding mental health service user experiences of restraint through debriefing: a qualitative analysis. Can. J. Psychiatry Rev. Can. Psychiatr. 60, 386–392 (2015).
Caton, S. & Haas, C. Fairness in machine learning: a survey. ACM Comput. Surv. 56, 3616865 https://doi.org/10.1145/3616865 (2023).
Giddings, R. et al. Factors influencing clinician and patient interaction with machine learning-based risk prediction models: a systematic review. Lancet Digit. Health 6, e131–e144 (2024).
Sax, D. R., Sturmer, L. R., Mark, D. G., Rana, J. S. & Reed, M. E. Barriers and opportunities regarding implementation of a machine learning-based acute heart failure risk stratification tool in the emergency department. Diagnostics 12, 2463 (2022).
Feng, Q., Du, M., Zou, N. & Hu, X. Fair machine learning in healthcare: a survey. IEEE Trans. Artif. Intell. 6, 493–507 https://doi.org/10.1109/TAI.2024.3361836.
Zhu, Y. et al. M³Fair: Mitigating bias in healthcare data through multi-level and multi-sensitive-attribute reweighting method. Preprint at https://arxiv.org/abs/2306.04118v1 (2023).
Yang, J., Soltan, A. A. S., Eyre, D. W., Yang, Y. & Clifton, D. A. An adversarial training framework for mitigating algorithmic biases in clinical machine learning. npj Digit. Med. 6, 1–10 (2023).
Li, F. et al. Evaluating and mitigating bias in machine learning models for cardiovascular disease prediction. J. Biomed. Inform. 138, 104294 (2023).
Hellström, T., Dignum, V. & Bensch, S. Bias in Machine Learning -- What is it Good for? Preprint at https://doi.org/10.48550/arXiv.2004.00686 (2020).
Chin, M. H. et al. Guiding principles to address the impact of algorithm bias on racial and ethnic disparities in health and health care. JAMA Netw. Open 6, e2345050 (2023).
Aquino, Y. S. J. et al. Practical, epistemic and normative implications of algorithmic bias in healthcare artificial intelligence: a qualitative study of multidisciplinary expert perspectives. J. Med. Ethics https://doi.org/10.1136/jme-2022-108850 (2023).
Kleinberg, J., Mullainathan, S. & Raghavan, M. Inherent Trade-Offs in the Fair Determination of Risk Scores. In 8th Innovations in Theoretical Computer Science Conference (ITCS 2017), Vol. 67, 43:1–43:23 (2017).
Binns, R. On the apparent conflict between individual and group fairness. In Proc. 2020 Conference on Fairness, Accountability, and Transparency 514–524 (2020).
Feng, C. H., Deng, F., Disis, M. L., Gao, N. & Zhang, L. Towards machine learning fairness in classifying multicategory causes of deaths in colorectal or lung cancer patients. Brief. Bioinform. 26, bbaf398 (2025).
Yang, Y. et al. A responsible framework for assessing, selecting, and explaining machine learning models in cardiovascular disease outcomes among people with type 2 diabetes: methodology and validation study. JMIR Med. Inform. 13, e66200 (2025).
Dang, V. N. et al. Fairness and bias correction in machine learning for depression prediction across four study populations. Sci. Rep. 14, 7848 (2024).
Silva, M., Loureiro, A. & Cardoso, G. Social determinants of mental health: a review of the evidence. Eur. J. Psychiatry 30, 259–292 (2016).
Movva, R. et al. Coarse race data conceals disparities in clinical risk score performance. In Proc. 8th Machine Learning for Healthcare Conference 443–472 (PMLR, 2023).


Acknowledgements

We thank members of CAMH's Data & Insights Team for their support with health record extraction and interpretation. This work was supported by a Dalla Lana School of Public Health Interdisciplinary Data Science Seed Grant (S.L.H., no award/grant number), the Krembil Foundation (L.S. and M.M.M., no award/grant number), the Social Sciences and Humanities Research Council Insight Development Grant (L.S.: #430-2021-01166), and a Google Award for Inclusion Research (L.S. and S.L.H., no award/grant number).

Author information

Authors and Affiliations

The Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health, Toronto, ON, Canada
Yifan Wang, Laura Sikstrom, Robert Xiao, Zoe Findlay, Juveria Zaheer, Sean L. Hill & Marta M. Maslej
Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
Yifan Wang
Department of Anthropology, University of Toronto, Toronto, ON, Canada
Laura Sikstrom
Department of Psychiatry, University of Toronto, Toronto, ON, Canada
Juveria Zaheer & Marta M. Maslej

Authors

Yifan Wang
Laura Sikstrom
Robert Xiao
Zoe Findlay
Juveria Zaheer
Sean L. Hill
Marta M. Maslej

Contributions

L.S., M.M.M., J.Z., and S.L.H. conceptualized the study and acquired funding for its completion. Y.W., Z.F., and R.X. contributed to the interpretation and processing of data. Y.W., R.X., M.M.M., L.S., and S.L.H. conceptualized the methods. Y.W. completed the analysis and visualizations and drafted the paper. All authors reviewed and contributed edits to the manuscript.

Corresponding author

Correspondence to Marta M. Maslej.

Ethics declarations

Competing interests

S.L.H., J.Z., L.S., and M.M.M. report financial support from the University of Toronto Dalla Lana School of Public Health. S.L.H., J.Z., L.S., and M.M.M. report financial support from the Social Sciences and Humanities Research Council of Canada. L.S. and S.L.H. report financial support from Google Research. These funders had no role in study conceptualization, design, implementation, or dissemination of findings. The other authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Study registration

The study was not registered.

Public involvement

Two patient advisors with lived experience of mental illness and acute mental healthcare were consulted on the study aims and methods.

Supplementary information

Supplementary Information (PDF)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Wang, Y., Sikstrom, L., Xiao, R. et al. Fairness analysis of machine learning predictions of aggression in acute psychiatric care. npj Mental Health Res 5, 16 (2026). https://doi.org/10.1038/s44184-026-00194-6


Received: 04 October 2025
Accepted: 06 February 2026
Published: 02 March 2026
Version of record: 02 March 2026
DOI: https://doi.org/10.1038/s44184-026-00194-6
