Introduction

Mental health disorders represent a significant public health concern, affecting approximately 19.86% of adults in the United States (U.S.) annually, which translates to nearly 50 million Americans. Of these, 4.91% experience severe mental illness1,2,3. While mental health disorders impact individuals across diverse racial, ethnic, and gender demographics, certain groups face disproportionate burdens in both prevalence and impact4,5,6,7.

African Americans, comprising 13.6% of the U.S. population, experience unique challenges in mental health care. Socioeconomic factors exacerbate these challenges, with 20.1% of African Americans living in poverty8 and 10.8% lacking health insurance9. While African Americans experience mental illness at rates similar to the general population, they face significant barriers to accessing quality mental health care. Only one in three African Americans in need receives mental health care, with lower rates of service use compared to non-Hispanic whites10. These disparities stem from various factors, including racial and ethnic biases11, stigma12, limited access to care due to financial and geographic constraints, historical trauma13, distrust of the healthcare system14, poorer quality of care, and lack of culturally competent services15. Moreover, African Americans are less likely to receive guideline-consistent care, are underrepresented in research, and are more likely to use emergency rooms or primary care for mental health needs16,17,18,19.

Diagnostic disparities further complicate the landscape, with African Americans more frequently diagnosed with schizophrenia and less frequently with mood disorders compared to whites presenting with similar symptoms20. Additionally, African Americans with mental health conditions, particularly schizophrenia, bipolar disorders, and other psychoses, face higher rates of incarceration than individuals of different races21,22. Factors such as gender, age, complications, comorbidities, insurance type, and admission source shape mental health outcomes within this group18,23,24,25. African Americans exhibit higher rates of mental health disorders due to psychosocial stressors such as marital problems, involvement with the justice system, abuse, and financial crises26,27,28,29,30. Challenges such as inadequate assessment tools and biases in clinical decision-making impede accurate reporting of mental health symptoms among African Americans29,31,32,33,34,35.

Despite the growing body of research on mental health disparities, there remains a significant gap in our understanding of the specific patterns, predictors, and outcomes of mental health disorders among African Americans in regionally defined areas. This study aims to address this gap by leveraging a comprehensive dataset from Southeastern Virginia and employing advanced analytical techniques to identify specific trends that can inform targeted interventions and policy decisions. The primary objective was to apply artificial intelligence (AI) and machine learning (ML) methodologies to analyze patterns and predictors of mental health outcomes among underserved populations in the Southeastern Virginia region, with an emphasis on African American communities. Specifically, the study aimed to (a) examine the impact of various factors (including gender, age, complications, comorbidities, insurance type, and admission source) on mental health outcomes in the Southeastern Virginia area and (b) develop comprehensive prediction models for mental health outcomes using advanced machine learning techniques.

By leveraging a large, comprehensive dataset from the Virginia Health Information (VHI) system, this study seeks to address critical gaps in the literature and provide valuable insights into the unique mental health challenges faced by African Americans. The findings aim to inform targeted interventions and health policies to reduce mental health disparities and improve outcomes for this underserved population, contributing to a more equitable and effective mental health care system.

Materials and methods

This study employs a quantitative, cross-sectional design using retrospective data from 2016 to 2020. The analysis incorporates traditional statistical methods and innovative machine-learning techniques to examine patterns and predictors of mental health outcomes for African Americans in Southeastern Virginia.

Ethics

The study was approved by the Eastern Virginia Medical School (EVMS) Institutional Review Board and Human Subjects’ Protection (IRB #23-07-NH-0174), which determined that it did not involve human subjects research and was therefore exempt from IRB review. Due to the retrospective nature of the study, a waiver of informed consent was granted by the EVMS Institutional Review Board and Human Subjects’ Protection, and all patient data were deidentified to maintain confidentiality. All research methods followed the guidelines and regulations set forth by the EVMS IRB and Human Subjects’ Protection committee. The research team extracted demographic, administrative, clinical, and financial data from the VHI database, including data on comorbidities. To ensure data safety throughout the project, deidentified data were transferred via a secure File Transfer Protocol behind the EVMS firewall during the collection phase. After collection, data were stored on password-protected devices with access restricted to authorized research team members, and regular backups were performed.

Data collection

This study used aggregated hospital discharge data from VHI, a comprehensive healthcare information repository. The dataset provided by the Virginia Department of Health focuses on mental health among underserved populations in Southeastern Virginia. VHI consolidates data from various sources, ensuring accuracy and objectivity. It includes medical and pharmacy claims for around five million Virginia residents covered by commercial, Medicaid, and Medicare plans. The database covers patient demographics, care locations, provider details, diagnoses, and service costs. VHI integrates data from both commercial and public insurance carriers, representing most of Virginia’s insured population, and collaborates with the Department of Medical Assistance Services and nine commercial carriers to ensure data quality and compliance36.

Study population

The target population includes African American adults aged 18 to 85 years residing in the Southeastern Virginia region who sought mental healthcare services between 2016 and 2020. The extracted data comprised demographic information, comorbidities, clinical characteristics, and hospital details. Each discharge record contained one primary diagnosis code, often accompanied by multiple additional codes reflecting the patient’s mental health status. Diagnoses in the VHI system are based on ICD-10 codes assigned by healthcare providers during patient encounters. To ensure diagnostic accuracy and consistency, the study utilized the ICD-10 Classification of Mental and Behavioural Disorders: Clinical Descriptions and Diagnostic Guidelines37 as the reference for diagnostic criteria. This standardized classification system ensures consistency in coding across the dataset and aligns with international diagnostic standards. Table 1 summarizes the ICD-10 codes used in this study, categorized by significant mental health disorders:

Table 1 ICD-10 codes for mental health disorders used in the study.

This comprehensive categorization of mental health disorders using standardized ICD-10 codes enables a detailed and reliable analysis of mental health patterns and trends within the study population. By adhering to these international diagnostic standards, the study ensures comparability with other research and enhances the validity of its findings.

Statistical analysis plan

All statistical analyses were conducted in collaboration with Research and Infrastructure Service Enterprise at EVMS. Data analysis was conducted using a combination of R, Python, and SAS to capitalize on the unique strengths of each software. R (tidyverse package) was employed for data cleaning and initial exploratory analyses, enabling efficient data preprocessing and visualization. Python (pandas, numpy, scipy.stats, scikit-learn and statsmodels libraries) was utilized for implementing and evaluating various machine learning models, leveraging its extensive libraries and frameworks for predictive modeling. SAS was used to conduct complex statistical procedures.

Data cleaning involved checking for and addressing missing or inconsistent data. Standardized procedures were applied to handle missing data, including imputation methods where appropriate. The study employed both traditional statistical analysis and machine learning to extract meaningful insights from the patient dataset. Descriptive statistics were computed for several parameters to identify factors relevant to the research question. Demographic factors such as sex and age, clinical factors including complications or comorbidities, and administrative aspects like admission status and length of stay (LOS) were explored. Frequencies were calculated for all categorical parameters, while means, standard deviations, and medians were calculated for all numeric parameters. Chi-square tests were used to analyze categorical variables. Because the numeric data did not meet parametric assumptions, the nonparametric Kruskal-Wallis and Wilcoxon tests were used for numeric variables. These tests evaluated differences in demographic, clinical, and administrative factors across mental and behavioral diseases and disorders (MBDD) groups, such as mood affective disorders (MAD); schizophrenia, schizotypal, and delusional disorders (SSDD); mental and behavioral disorders due to psychoactive substance use (MBD); and neurotic, stress-related, and somatoform disorders (NSRS). Significance levels were defined as follows: ***: p-value < 0.0001; **: 0.0001 ≤ p-value < 0.01; *: 0.01 ≤ p-value < 0.05.
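The tests named above can be illustrated with a minimal Python sketch using scipy.stats, one of the libraries listed in the analysis plan. The contingency table and length-of-stay samples below are synthetic placeholders, not study data, and the choice of the rank-sum form of the Wilcoxon test is an assumption, since the text does not state which variant was applied.

```python
import numpy as np
from scipy import stats


def significance_stars(p):
    """Map a p-value to the star notation defined in the text."""
    if p < 0.0001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


rng = np.random.default_rng(0)

# Hypothetical 2x2 contingency table (rows: sex, columns: admission type).
table = np.array([[120, 80],
                  [90, 110]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

# Hypothetical length-of-stay samples (days) for three disorder groups.
los_a = rng.exponential(scale=5.0, size=200)
los_b = rng.exponential(scale=8.0, size=200)
los_c = rng.exponential(scale=6.0, size=200)

# Kruskal-Wallis: nonparametric comparison across three or more groups.
h_stat, p_kw = stats.kruskal(los_a, los_b, los_c)

# Wilcoxon rank-sum: nonparametric comparison of two independent groups.
w_stat, p_w = stats.ranksums(los_a, los_b)

for name, p in [("chi-square", p_chi2), ("Kruskal-Wallis", p_kw), ("Wilcoxon", p_w)]:
    print(f"{name}: p = {p:.4g} {significance_stars(p)}")
```

The helper `significance_stars` is a hypothetical name introduced here only to encode the thresholds stated in the text.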

Machine learning techniques

To develop robust mental health outcome prediction models for MBDD, machine learning techniques were implemented using Python’s scikit-learn library38. The ML models included gradient boosting (GB), random forest (RF), artificial neural network (ANN), logistic regression (LR), and Naive Bayes (NB). The selection of ML models was based on their diverse strengths and suitability for the study’s objectives. GB and RF, as ensemble methods, can effectively handle complex interactions and nonlinearities in the data. ANN is powerful in capturing intricate patterns and dependencies. LR, as a probabilistic classifier, provides interpretable results and is widely used in healthcare settings. NB, despite its simplicity, can serve as a robust baseline. This combination of models allows for a comprehensive evaluation of predictive performance and insights into the underlying data structure.

Hyperparameter tuning was performed using grid search with 5-fold cross-validation. The optimal hyperparameters for each model were:

GB: learning_rate = 0.1, n_estimators = 100, max_depth = 3.

RF: n_estimators = 100, max_depth = None, min_samples_split = 2.

ANN: hidden_layer_sizes=(100,), solver=’adam’, alpha = 0.0001.

LR: penalty=’l2’, C = 1.0, solver=’lbfgs’.

Naive Bayes: default hyperparameters.
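For illustration, the reported settings can be written as scikit-learn estimators roughly as follows. This is a sketch under stated assumptions: the Naive Bayes variant is not named in the text, so `GaussianNB` is a placeholder; the search grid and the synthetic data are hypothetical; and the L2 penalty for logistic regression is left at the scikit-learn default rather than passed explicitly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

# Estimators configured with the hyperparameters reported in the text.
models = {
    "GB": GradientBoostingClassifier(learning_rate=0.1, n_estimators=100, max_depth=3),
    "RF": RandomForestClassifier(n_estimators=100, max_depth=None, min_samples_split=2),
    "ANN": MLPClassifier(hidden_layer_sizes=(100,), solver="adam", alpha=0.0001),
    # L2 regularization is the scikit-learn default for LogisticRegression.
    "LR": LogisticRegression(C=1.0, solver="lbfgs"),
    # Assumption: the Naive Bayes variant is not specified in the text.
    "NB": GaussianNB(),
}

# Synthetic stand-in data; the study data are not reproduced here.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hypothetical grid: the candidate values searched in the study are not reported.
param_grid = {"learning_rate": [0.05, 0.1], "max_depth": [2, 3]}
search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=100, random_state=0),
    param_grid,
    cv=5,               # 5-fold cross-validation, as described
    scoring="roc_auc",  # assumption: the selection metric is not stated
)
search.fit(X, y)
print(search.best_params_)
```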

The performance of these models was assessed using a comprehensive set of evaluation metrics: area under the curve (AUC), classification accuracy (CA), F-measure or F-score (F1), precision (Prec), and recall (sensitivity, or the true positive rate). Models were validated using 100 repeats of 5-fold cross-validation to ensure reliability and accuracy in distinguishing between classes and predicting outcomes.
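A validation scheme of this kind could be sketched with scikit-learn as shown below. The data are synthetic, stratified splitting is an assumption (the text does not say whether folds were stratified), and the number of repeats is reduced from the reported 100 so that the example runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate

# Synthetic, mildly imbalanced stand-in data.
X, y = make_classification(
    n_samples=300, n_features=10, weights=[0.8, 0.2], random_state=0
)

N_REPEATS = 3  # the study reports 100 repeats; kept small here for speed
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=N_REPEATS, random_state=0)

# Metric names follow the abbreviations used in the text.
scoring = {
    "AUC": "roc_auc",
    "CA": "accuracy",
    "F1": "f1",
    "Prec": "precision",
    "Recall": "recall",
}

results = cross_validate(
    GradientBoostingClassifier(random_state=0), X, y, cv=cv, scoring=scoring
)

# Average each metric over all folds and repeats.
summary = {name: results[f"test_{name}"].mean() for name in scoring}
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Averaging over repeats reduces the dependence of each reported metric on any single random partition of the data.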

Predictive nomograms

Predictive nomograms were developed using the Logistic Regression classifier to integrate demographic, clinical, and administrative predictors for MAD, MBD, and SSDD. These nomograms provided a visual representation of the risk factors and their respective contributions to the probability of each disorder. The top ten predictors for each disorder were identified, and their respective weights were calculated to aid in clinical decision-making. The nomograms serve as quantitative tools, enabling clinicians to assess the probability of specific mental disorders based on a comprehensive profile of individual risk factors.
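One common way to derive nomogram points from a fitted logistic regression is to rescale each predictor's contribution to the log-odds so that the predictor with the widest range spans 0 to 100 points. The sketch below follows that convention on synthetic data; it is an illustration under that assumption, not the software or scaling actually used to produce the study's nomograms, and ranking predictors by point range is likewise an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data with four predictors.
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X, y)
beta, intercept = model.coef_[0], model.intercept_[0]

# Log-odds contribution of each predictor at its observed extremes.
contrib_at_min = beta * X.min(axis=0)
contrib_at_max = beta * X.max(axis=0)
lo = np.minimum(contrib_at_min, contrib_at_max)
hi = np.maximum(contrib_at_min, contrib_at_max)

# Log-odds per point: the widest predictor range maps to 100 points.
scale = (hi - lo).max() / 100.0


def points(x):
    """Points contributed by each predictor for one observation."""
    return (beta * x - lo) / scale


def probability(total_points):
    """Convert a total point score back to a predicted probability."""
    linear_predictor = intercept + lo.sum() + total_points * scale
    return 1.0 / (1.0 + np.exp(-linear_predictor))


# Predictors ordered by the number of points they can contribute.
ranking = np.argsort(hi - lo)[::-1]

x0 = X[0]
p_nomogram = probability(points(x0).sum())
p_model = model.predict_proba(x0.reshape(1, -1))[0, 1]
print(ranking, round(p_nomogram, 4), round(p_model, 4))
```

Because the point scale is a linear rescaling of the log-odds, the probability read from the total points matches the model's own prediction for the same observation.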

Results

Prevalence rates

Table 2 Mental and behavioral diseases and disorders.

Table 2 provides a breakdown of the prevalence of various MBDD among discharged patients within the Southeastern Virginia area. The total number of readmissions recorded was 22,254. MAD was the most common, constituting approximately 41.66% of the cases, followed closely by SSDD, which represented about 39.57%. MBD accounted for 14.30% of the readmissions, while NSRS comprised 4.46% of the total.

Demographic, administrative, clinical, and comorbidity characteristics

Table 3 Demographic, administrative, clinical, and comorbidity characteristics of mental and behavioral disorders.

Table 3 details the demographic, administrative, clinical, and comorbidity characteristics of patients diagnosed with various MBDDs. Females predominantly constitute the patient population for MAD and NSRS, with percentages of 54.54% and 56.50%, respectively. In contrast, SSDD and MBD are less prevalent among females.

The mean age of patients across disorders ranges from the late thirties to the mid-forties. Emergency admissions are the most common admission type across all MBDD groups and are particularly pronounced in the MBD group at 71.28%. When examining insurance types, a significant proportion of SSDD patients are covered by Medicare (34.21%), whereas a higher percentage of MBD patients utilize Medicaid (26.08%). Regarding comorbidity profiles, SSDD patients tend to have fewer comorbidities, with 28.24% having none, while MBD patients show a higher prevalence of multiple comorbidities. Specifically, 10.37% of MBD patients present with five or more comorbidities. The table also reports length of stay (LOS), with SSDD patients experiencing the longest stays, averaging 8.54 days. The geographical distribution indicates that Norfolk and Virginia Beach are prominent locations for these patients, suggesting regional variations in the prevalence or treatment availability of mental health conditions.

Table 4 Comparative analysis of demographic, clinical, and administrative differences in mental and behavioral disorders.

Table 4 presents a comparative analysis of demographic, clinical, and administrative characteristics across MBDD groups. The analysis highlights statistically significant gender differences, with SSDD showing the largest disparity (24.1%, p < 0.0001) between males and females. Age differences vary across groups, with statistically significant differences noted in MAD (2.9 ± 0.2, p < 0.0001) and MBD (4.4 ± 1, p < 0.0001). Medicare coverage differs significantly across groups, with notable differences in SSDD (31.6%, p < 0.0001) and MBD (57.3%, p < 0.0001), indicating distinct patterns in insurance utilization. The comparison of patients with and without primary procedures reveals significant findings in all groups, particularly MAD (57.5%, p < 0.0001), SSDD (60.6%, p < 0.0001), and MBD (7.1%, p < 0.0001). Complication rates are consistently high across all groups, but the differences do not reach statistical significance, suggesting a general trend of high complication rates irrespective of specific disorders. Emergency admission types show significant differences, especially in MBD (42.6%, p < 0.0001), emphasizing the urgency in admissions for this group. LOS analysis further emphasizes gender differences, particularly in SSDD, where males exhibit a significantly longer LOS compared to females (2.1 ± 2, p < 0.0001). This indicates a more complex clinical pathway for males in this group. Post-operative LOS also reflects significant gender differences in NSRS (0.9 ± 2, p < 0.05) and SSDD (2.1 ± 2, p < 0.0001), suggesting differential recovery times based on gender (Table 4).

Total charge differences in patients

Table 5 Total charge groups differences (%) with procedure.

Table 5 examines the impact of various factors on total charge differences for patients with MBDD who underwent procedures. In the gender category, a notable increase in charges is observed for SSDD in male patients compared to female patients (5.8%, p < 0.0001). Medicare recipients generally see higher charges, with significant increases noted in the MAD (7.5%, p < 0.0001) and SSDD (16.8%, p < 0.0001) groups. The presence of complications corresponds to an increase in total charges, with a substantial effect seen in MBD and SSDD, although it did not reach statistical significance. Different admission types also show significant differences in charges, with emergency admissions generally resulting in higher costs compared to urgent and elective admissions, especially in NSRS (43.0%, p < 0.0001) and MBD (33.8%, p < 0.0001). The number of comorbidities correlates with charge differences, where more comorbidities typically lead to higher charges, notably in MBD, with a 25.5% increase when moving from 4 to 5+ comorbidities (p < 0.0001) (Table 5).

Table 6 Total charge groups differences (%) without procedure.

Table 6 explores total charge differences for patients without procedures across MBDD. Gender differences are particularly stark in NSRS, with females incurring 33.4% higher charges than males (p < 0.05). Medicare recipients show significantly higher charges in all groups, especially in SSDD (48.2%, p < 0.0001). The absence of complications, particularly in NSRS, dramatically lowers charges, highlighting the cost impact of managing complications in mental health care. Differences in admission types are less pronounced here than in Table 5 but still significant, with emergency versus elective admissions showing large disparities, particularly in MBD (30.0%, p < 0.0001). As in Table 5, an increase in comorbidities consistently correlates with higher charges, especially in NSRS, where moving from 4 to 5+ comorbidities corresponds to a 77.3% increase (p < 0.01).

AI and ML models performance

Table 7 Performance metrics of machine learning models for predicting mental and behavioral disorders.

Table 7 presents the performance metrics for various AI and ML models that predict outcomes for MAD, MBD, and SSDD. The models evaluated include GB, LR, ANN, and RF. These models were validated using 100 repeats of 5-fold cross-validation, and their performance was assessed based on area under the curve (AUC), classification accuracy (CA), F1 score, precision (Prec), and recall.

For MBD, the Gradient Boosting model demonstrated the highest performance with an AUC of 0.955 and a CA of 0.929, along with robust F1 (0.747), precision (0.79), and recall (0.709) scores, indicating its superior predictive capability. LR and ANN also showed strong performance with AUCs of 0.937 and 0.936, respectively, and similar CA and precision metrics. In the MAD category, the GB model again led in performance with an AUC of 0.832 and a balanced F1 score of 0.719, reflecting its reliability in prediction. However, the overall performance metrics for MAD were lower compared to MBD, suggesting potential complexity in modeling MAD outcomes. For SSDD, the GB model achieved the highest AUC at 0.832 and an F1 score of 0.709, highlighting its effectiveness. The LR and ANN models also performed well but exhibited slightly lower metrics across all evaluation criteria. Overall, Gradient Boosting consistently outperformed other models across all disorder categories, particularly excelling in MBD predictions (Table 7).
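As a quick consistency check on the reported metrics, F1 is the harmonic mean of precision and recall. The short sketch below applies that standard definition to the Gradient Boosting values reported for MBD; the function name is a hypothetical helper introduced for illustration.

```python
def f1_from_precision_recall(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported Gradient Boosting metrics for MBD: precision 0.79, recall 0.709.
f1_mbd_gb = f1_from_precision_recall(0.79, 0.709)
print(round(f1_mbd_gb, 3))  # 0.747, matching the reported F1
```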

Predictive nomograms results

The study developed predictive nomograms for three of the most prevalent MBDDs: MAD, MBD, and SSDD, depicted in Figs. 1, 2, and 3, respectively. These nomograms were developed using Logistic Regression classifiers to integrate demographic, clinical, and administrative predictors, estimating the probability of each disorder.

Fig. 1

Predictive nomogram and risk factors for Mood Affective Disorders (MAD).

For MAD (Fig. 1), the most robust predictors include alcohol and drug use, with significant regional differences, notably residents from Poquoson City contributing the highest points. For MBD (Fig. 2), psychological factors and age played pivotal roles, with older individuals showing a higher likelihood of MBD, and clinical factors such as complications and liver function underscored the interplay between physical and mental health.

Fig. 2

Predictive nomogram and risk factors for mental and behavioral disorders due to psychoactive substance use (MBD).

The SSDD nomogram (Fig. 3) identified psychiatric symptoms and drug use as the top predictors, along with impactful demographic factors like insurance type and county of residence. Specific insurance types like Medicare and regions like Suffolk City were associated with higher probabilities of SSDD.

Fig. 3

Predictive nomogram and risk factors for schizophrenia, schizotypal, and delusional disorders (SSDD).

Discussion

Our study provides significant insights into the patterns and predictors of mental health outcomes among underserved African American adults in Southeastern Virginia, contributing to the broader field of mental health disparities research. The findings align with and expand upon existing literature, offering a comprehensive understanding of the complex interplay between demographic, clinical, and socioeconomic factors in shaping mental health outcomes in this population.

Our results indicate that MAD is the most prevalent (41.66%), followed by SSDD and MBD. This prevalence pattern is consistent with national data39,40,41,42, though our higher rates suggest a potentially more significant mental health burden in our study population. This finding underscores the critical need for targeted interventions in this region.

Key predictors of mental health outcomes identified in our study include gender, age, comorbidities, and insurance type. Females predominantly constituted the patient population for MAD and NSRS, aligning with previous research showing higher rates of mood and anxiety disorders among women43,44,45,46,47. The mean age of patients across disorders was in the late thirties to mid-forties, with a trend of decreasing mental disorders with age, consistent with previous findings48,49. The significant proportion of SSDD patients covered by Medicare highlights the role of insurance type in mental health outcomes, echoing earlier findings50,51,52.

Our study revealed significant differences in total charges based on demographic and clinical factors, particularly for patients with comorbidities and emergency admissions. This finding emphasizes the economic impact of these variables on mental and behavioral disorder care costs, aligning with prior research53,54,55. These insights can guide healthcare policy and clinical practice in optimizing care delivery and managing healthcare costs for underserved populations.

ML algorithms have revolutionized mental health diagnostics, offering diverse methodological approaches with varying degrees of effectiveness56,57,58,59,60. These computational tools enable personalized prediction of mental health outcomes, facilitating targeted interventions across diverse populations.

Our study’s principal contribution lies in applying sophisticated ML techniques to an extensive regional dataset. The implementation of GB, RF, and LR models, complemented by predictive nomograms, provides a robust empirical framework for understanding mental health trajectories.

In the broader literature, Support Vector Machines (SVM) and Random Forests have emerged as leading classification methods61, while Convolutional Neural Networks (CNNs) have achieved superior accuracy in bipolar disorder diagnosis62,63,64. Gradient Boosting algorithms have demonstrated enhanced predictive capabilities through their iterative error-learning mechanisms65,66.

The field faces persistent methodological challenges, particularly concerning data quality and diagnostic heterogeneity, resulting in variable model performance across research groups62. Model performance varies significantly based on the specific mental health condition, data modality (clinical documentation, patient-reported outcomes, neuroimaging), and algorithmic selection62,65,67,68,69. Despite these constraints, ML approaches consistently demonstrate improved diagnostic and predictive accuracy compared to conventional methodologies, particularly in analyzing complex, large-scale datasets.

While our study focused on African Americans in Southeastern Virginia, the findings have broader implications. The higher prevalence of mental health disorders in our study population compared to national averages highlights potential disparities that may exist in other underserved communities. This emphasizes the importance of region-specific research and tailored interventions to address mental health disparities effectively.

Our study lays the groundwork for future research in several key areas. First, validating these findings in broader populations could provide insights into the generalizability of our results. Second, exploring the effectiveness of interventions tailored to the specific needs of underserved communities, as identified by our predictive models, could lead to more effective mental health care strategies. Finally, further investigation into the economic implications of mental health disparities could inform policy decisions and resource allocation.

In conclusion, our study contributes novel insights to mental health disparities research through its comprehensive analysis of mental health patterns, application of advanced machine learning techniques, and focus on an underserved population. These findings have the potential to inform targeted interventions and personalized care strategies, representing a significant step forward in addressing mental health disparities among African Americans in Southeastern Virginia and potentially in other underserved communities.

Policy implications and recommendations

Our findings have several important implications for mental health policy and practice in Southeastern Virginia:

(1) Targeted screening and intervention programs should be developed, particularly for Mood Affective Disorders and Schizophrenia, Schizotypal, and Delusional Disorders, which were found to be most prevalent.

(2) Healthcare providers should be trained to recognize and address the unique mental health needs of African American patients, considering the gender and age-related patterns identified in our study.

(3) Efforts should be made to improve insurance coverage and access to mental health services, given the significant impact of insurance type on mental health outcomes and healthcare utilization.

(4) Community-based mental health programs should be strengthened, particularly in areas identified as having higher risk factors.

(5) Future mental health initiatives should adopt data-driven approaches, utilizing predictive models to identify at-risk individuals and tailor interventions accordingly.

Strengths

Our in-depth examination of 28 comorbid illnesses offers a nuanced view of the complex healthcare needs within this population. This comprehensive approach provides a more holistic understanding of the interplay between mental health and other medical conditions, informing more integrated care strategies. The application of machine learning models, including GB, RF, and LR, represents a methodological advancement in mental health research. These techniques allowed us to identify subtle patterns and predictors that might be overlooked by traditional statistical methods, offering new perspectives on risk factors and potential intervention points.

By exploring gender and age differences in mental health prevalence and readmission rates, our study highlights subgroups that may require tailored interventions. This granular analysis contributes to a more personalized approach to mental health care and policy development. Our findings have substantial implications for health policy, particularly in addressing healthcare disparities and improving mental health outcomes. The study provides evidence-based recommendations for targeted interventions, potentially influencing policy decisions to enhance healthcare access and equity for African Americans.

The development of predictive nomograms represents a significant contribution to clinical practice. These tools can assist healthcare providers in identifying high-risk individuals and tailoring interventions, potentially improving patient outcomes and resource allocation. By focusing on Southeastern Virginia, our study provides locally relevant insights that can inform targeted interventions and policy decisions specific to this region while also offering a model for similar region-specific analyses elsewhere.

While we acknowledge the limitations inherent in our cross-sectional design, the strengths of our study significantly outweigh these constraints. The comprehensive nature of our dataset, coupled with advanced analytical techniques and a focus on an underserved population, positions our research as a valuable contribution to the field of mental health disparities. Our findings not only enhance the current understanding of mental health challenges among African Americans but also pave the way for future longitudinal studies that can build upon this foundation to examine causal relationships and long-term trends.

Limitations

Our study, while providing valuable insights into mental health disparities among African Americans in Southeastern Virginia, is subject to several limitations that warrant consideration when interpreting the results.

Firstly, the implementation of machine learning algorithms in mental health diagnostics presents distinct methodological challenges and limitations across different model architectures70. GB, while adept at handling complex interactions, requires meticulous tuning of critical hyperparameters, including learning rate, tree depth, and boosting iterations, to mitigate overfitting risks and optimize the bias-variance trade-off71. RF exhibits robustness against overfitting but demonstrates sensitivity to hyperparameters such as tree count and feature split threshold, potentially compromising computational efficiency and model interpretability in high-dimensional datasets72. ANNs present unique challenges in hyperparameter optimization, particularly in architecting optimal layer structures and selecting appropriate activation functions. Their performance significantly depends on training data quality and requires careful tuning of layer composition, neuron count, learning rate, and regularization parameters73. LR, while offering superior interpretability, faces limitations with non-linear relationships and necessitates precise feature selection and regularization parameter optimization to address multicollinearity concerns74,75. Naive Bayes classifiers, despite their computational efficiency, are constrained by their fundamental assumption of feature independence, which rarely holds in clinical settings. While requiring less intensive hyperparameter tuning, these models demand careful consideration of prior selection and zero-frequency handling76.

These algorithmic limitations are particularly pronounced in mental health applications due to the inherent complexity and heterogeneity of psychiatric data, emphasizing the necessity for robust cross-validation and systematic hyperparameter optimization protocols.

Secondly, our reliance on retrospective, pre-existing hospital discharge data introduces potential biases related to data collection and documentation practices. This approach may lead to misclassification or underreporting of conditions, as we were unable to control the data collection process. Consequently, specific nuances and contextual factors that could influence mental health outcomes may have been overlooked. The retrospective nature of the data also limits our ability to establish causal relationships between the identified factors and mental health outcomes.

Thirdly, while our use of ICD-10 codes for diagnosis enhances the validity and reliability of our findings by providing a standardized framework, it may not capture the full complexity of mental health conditions. Diagnostic accuracy can be influenced by factors such as clinician expertise, cultural competence, and the specific manifestation of symptoms in different populations. Future studies could benefit from incorporating structured diagnostic interviews or additional clinical assessments to further validate these ICD-10-based diagnoses.

Fourthly, the geographic specificity of our dataset to Southeastern Virginia, while providing valuable local insights, limits the generalizability of our findings to other U.S. regions. Mental health disparities and their underlying factors may vary across different geographic and cultural contexts, necessitating caution when extrapolating our results to other populations.

Lastly, our approach to handling missing data by excluding instances with less than 0.001% missing values, while pragmatic, may have overlooked subtle patterns or biases. This could affect the accuracy and completeness of our analysis, particularly if the missing data were not randomly distributed across the dataset.
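One way to probe whether missingness is related to an observed variable is to compare missing-value rates between groups. The sketch below does this for a 2x2 table using Pearson's chi-square test. The records, field names, and counts are invented for illustration and are not drawn from the study data. When missing values are very rare, expected cell counts become small and an exact test would be more appropriate than this approximation.

```python
import math

def missing_rate_by_group(records, field, group):
    """Return {group value: (n_missing, n_total)} for one field."""
    table = {}
    for rec in records:
        miss, total = table.get(rec[group], (0, 0))
        table[rec[group]] = (miss + (rec.get(field) is None), total + 1)
    return table

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 df) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # degenerate table: no evidence of association
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

# Hypothetical records: "age" is missing more often in the "public" group.
records = (
    [{"insurance": "public", "age": None}] * 10
    + [{"insurance": "public", "age": 40}] * 90
    + [{"insurance": "private", "age": None}] * 2
    + [{"insurance": "private", "age": 40}] * 98
)
table = missing_rate_by_group(records, "age", "insurance")
(m1, n1), (m2, n2) = table["public"], table["private"]
stat, p_value = chi_square_2x2(m1, n1 - m1, m2, n2 - m2)
print(table, round(stat, 2), round(p_value, 3))
```

A small p-value here would suggest that missingness differs between the two groups, in which case dropping incomplete records could shift the composition of the analyzed sample.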

Despite these limitations, our study makes significant contributions to the understanding of mental health disparities among African Americans. The large, comprehensive dataset and advanced analytical techniques employed provide robust insights into patterns and predictors of mental health outcomes in this underserved population. These findings have important implications for health policy and clinical practice, informing targeted interventions aimed at improving mental health equity and outcomes for African Americans. Furthermore, our study lays a foundation for future research, highlighting areas where more in-depth, longitudinal studies could further elucidate the complex factors influencing mental health disparities.

Conclusions

Our study provides significant insights into the patterns and predictors of mental health outcomes among African American adults in Southeastern Virginia, leveraging an extensive and comprehensive dataset from the VHI system. Through robust statistical analyses and advanced predictive modeling, we have uncovered critical findings contributing to understanding mental health disparities in this underserved population. The high prevalence of MAD and SSDD within our study population aligns with global mental health patterns, underscoring the urgent need for targeted interventions. Our research has identified key demographic and clinical predictors, including gender, age, comorbidities, and insurance type, which significantly influence mental health outcomes. These findings reaffirm the importance of these factors in addressing mental health disparities and provide a foundation for developing more effective, personalized interventions. A significant strength of our study lies in the application of advanced machine learning techniques, particularly Gradient Boosting, which demonstrated high accuracy and reliability in predicting mental health outcomes. This approach not only enhances the precision of our findings but also showcases the potential of these analytical techniques to revolutionize mental health research and clinical practice. The development of predictive nomograms further translates our research into practical tools for clinicians, enabling more accurate assessment of individual risk profiles for specific mental disorders. Our study’s focus on an underrepresented population, coupled with the use of a comprehensive dataset and detailed comorbidity analysis, provides a nuanced understanding of mental health disparities among African Americans. These insights have significant implications for health policy and the development of targeted interventions. 
While we acknowledge limitations such as the retrospective design, regional specificity, and potential biases in handling missing data, the value of our contributions to the field of mental health research remains substantial. In conclusion, this study offers a detailed examination of mental health outcomes among African Americans in Southeastern Virginia, identifying key predictors and demonstrating the power of machine learning in predictive modeling for mental health. Our findings support the development of targeted health policies and interventions to reduce mental health disparities and improve outcomes for underserved populations. Moving forward, we recommend validating these findings in broader and more diverse populations to enhance the generalizability and impact of our conclusions. This research not only advances our understanding of mental health disparities but also paves the way for more equitable and effective mental health care strategies for African American communities and potentially other underserved populations.