Abstract
The persistent increase in healthcare expenditure has become a major challenge for the sustainability of public financing worldwide. Identifying the characteristics of at-risk populations, and how well those characteristics predict healthcare use, is therefore crucial to inform targeted policies and interventions that curb rising healthcare use and expenditure. Drawing on three waves of the Household, Income and Labour Dynamics in Australia (HILDA) survey that included a ‘health module’, the study applied four machine learning (ML) methods (Random Forest, Gradient Boosting Decision Trees, Extreme Gradient Boosting, and multilayer perceptron neural networks) alongside conventional logistic regression to predict non-emergency healthcare use (specifically primary and tertiary inpatient hospital care). Predictive performance of the classifiers was evaluated using accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and Matthews Correlation Coefficient (MCC). Calibration of the models was assessed using the Brier score, which measures the mean squared difference between predicted probabilities and observed outcomes, with lower values indicating better calibration. Finally, Local Interpretable Model-agnostic Explanations (LIME) was used to explain the model’s predictive behaviour, and SHapley Additive exPlanations (SHAP) results are provided for each wave, along with a representative SHAP plot on the probability-contribution scale as a demonstration. Based on 47,899 observations and 741 variables, our model identified socio-economic factors (age, socio-economic status, private insurance status), health-related variables (e.g. previous contact with healthcare services), and having a designated doctor to see when sick or for health advice as strong predictors of healthcare use.
Among the ML techniques, Gradient Boosting Decision Trees provided better prediction of healthcare use than logistic regression across all three waves. Standard logistic regression produced an AUC of 0.69, a PPV of 71%, and an NPV of 52%, with 86% sensitivity and 30% specificity. By comparison, the ML models produced AUCs ranging from 0.68 to 0.76, PPVs of 75% to 77%, and NPVs of 61% to 63%, with sensitivity between 86% and 89%, specificity between 40% and 44%, and Brier scores between 0.11 and 0.28. Applying ML techniques to large, nationally representative longitudinal household survey data covering a range of domains is novel and provided more robust estimates of the factors influencing future healthcare use (primary and inpatient elective care), which are important for informing resource allocation decisions and priority setting.
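The evaluation measures named above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the 0.5 classification threshold, and the eight-row toy data set are assumptions made purely for illustration, and the standard library is used so that each formula stays visible.

```python
import math


def auc_from_pairs(y_true, y_prob):
    """AUC as the share of (user, non-user) pairs ranked correctly; ties count 0.5."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Threshold-based metrics plus Brier score for a binary outcome.

    The threshold of 0.5 is an illustrative assumption, not a value
    taken from the study.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return NaN rather than raising when a denominator is empty.
        return num / den if den else float("nan")

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
        # Brier score: mean squared gap between probability and outcome.
        "brier": sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true),
        "auc": auc_from_pairs(y_true, y_prob),
    }


# Hypothetical toy data: 1 = used healthcare, 0 = did not.
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.6, 0.4, 0.7, 0.2, 0.1, 0.55]
metrics = classification_metrics(y_true, y_prob)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity, specificity, PPV, NPV, and MCC all depend on the chosen threshold, whereas AUC and the Brier score are computed directly from the predicted probabilities. This is one reason a model can show high sensitivity alongside low specificity, as reported above, while its AUC remains moderate.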
Data availability
This paper uses unit record data from the Household, Income and Labour Dynamics in Australia (HILDA) Survey conducted by the Australian Government Department of Social Services (DSS). The findings and views reported in this paper, however, are those of the authors and should not be attributed to the Australian Government. The datasets analysed and/or generated during the current study are subject to the Confidentiality Deed signed with the Commonwealth of Australia (as represented by the Department of Social Services) and to the Commonwealth privacy laws. Data are accessible from the NCLD by application (https://www.dss.gov.au/national-centre-for-longitudinal-data-ncld/access-to-dss-longitudinal-datasets), and any questions about applying for the DSS longitudinal datasets should be addressed to NCLD (ncld@dss.gov.au).
Acknowledgements
This paper uses unit record data from the Household, Income and Labour Dynamics in Australia (HILDA) Survey. The HILDA survey was initiated and funded by the Australian Government Department of Social Services (DSS) and is managed by the Melbourne Institute of Applied Economic and Social Research (Melbourne Institute; https://melbourneinstitute.unimelb.edu.au/hilda). The findings and views reported in this paper, however, are those of the authors and should not be attributed to the Australian Government.
Funding
No specific funding was used to undertake this study.
Author information
Contributions
EL conceptualised the study. JZ was responsible for the analytical design. All authors were involved in data analysis, interpretation, and drafting of the manuscript. All authors approved the final draft of the manuscript for submission and take responsibility for the manuscript content.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval and consent to participate
This paper uses unit record data from the Household, Income and Labour Dynamics in Australia (HILDA) Survey. The Melbourne Institute: Applied Economic and Social Research at the University of Melbourne is responsible for the design and management of the survey, and ethics approval to conduct the HILDA Survey was obtained from the Office of Research Ethics and Integrity, University of Melbourne. All experimental protocols were approved by the Office of Research Ethics and Integrity, University of Melbourne. Informed consent was obtained from all subjects and/or their legal guardian(s). All methods were carried out in accordance with relevant guidelines and regulations.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Lee, E., Zhang, J. Predicting non-emergency healthcare use in Australia using machine learning on longitudinal household data. Sci Rep (2026). https://doi.org/10.1038/s41598-025-28968-6