Predicting non-emergency healthcare use in Australia using machine learning on longitudinal household data

Lee, Evelyn; Zhang, Jinhui

doi:10.1038/s41598-025-28968-6

Article
Open access
Published: 03 February 2026

Evelyn Lee¹ &
Jinhui Zhang²

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

The persistent increase in healthcare expenditure has become a major challenge for the sustainability of public financing worldwide. Identifying the characteristics of at-risk populations, and how well their healthcare use can be predicted, is therefore crucial to inform targeted policies and interventions to curb increasing healthcare use and expenditure. Drawing on three waves of the HILDA survey that included a ‘health module’, the study applied four machine learning (ML) methods (Random Forest, Gradient Boosting Decision Trees, Extreme Gradient Boosting, and multilayer perceptron neural networks) alongside conventional logistic regression to predict non-emergency healthcare use (specifically primary and tertiary inpatient hospital care). Predictive performance of the classifiers was evaluated using accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and Matthews Correlation Coefficient (MCC). Calibration of the models was assessed using the Brier score, which measures the mean squared difference between predicted probabilities and observed outcomes, with lower values indicating better calibration. Finally, Local Interpretable Model-agnostic Explanations (LIME) were used to explain the models’ predictive behaviour, while SHAP results are provided for each wave along with a representative SHAP plot, shown on the probability-contribution scale, as a demonstration. Based on 47,899 observations and 741 variables, our models identified socio-economic factors (age, socio-economic status, private insurance status), health-related variables (e.g. previous contact with healthcare services), and having a designated doctor to see when sick or for health advice as strong predictors of healthcare use.
Among the ML techniques, Gradient Boosting Decision Trees provided better prediction of healthcare use than logistic regression across all three waves. Whereas standard logistic regression produced an AUC of 0.69, a PPV of 71%, and an NPV of 52%, with 86% sensitivity and 30% specificity, the ML models produced AUCs in the range of 0.68 to 0.76, PPVs of 75% to 77%, and NPVs of 61% to 63%, with sensitivity ranging between 0.86 and 0.89, specificity between 0.40 and 0.44, and Brier scores between 0.11 and 0.28. The novel application of ML techniques to large, nationally representative longitudinal household survey data covering a range of domains provided more robust estimates of the factors influencing future healthcare use (primary and inpatient elective care), which are important to inform resource allocation decisions and priority setting.
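
For readers less familiar with the evaluation measures named in the abstract, the following is a minimal Python sketch of how they are conventionally computed from binary outcomes and predicted probabilities. It is not the authors' code: the function names (`classification_metrics`, `brier_score`, `auc_pairwise`), the 0.5 decision threshold, and the eight-row toy data are all assumptions made for illustration and have no connection to the HILDA data.

```python
import math


def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels coded 0/1 (hypothetical helper)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def auc_pairwise(y_true, y_prob):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
brier = brier_score(y_true, y_prob)
auc = auc_pairwise(y_true, y_prob)
print(metrics, brier, auc)
```

On this toy example each of accuracy, sensitivity, specificity, PPV and NPV equals 0.75, the MCC is 0.5, the Brier score is 0.125 and the AUC is 0.9375. The pairwise AUC is quadratic in the sample size, so it is suitable for illustration rather than for a dataset of the size reported here.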

Data availability

This paper uses unit record data from the Household, Income and Labour Dynamics in Australia (HILDA) Survey conducted by the Australian Government Department of Social Services (DSS). The findings and views reported in this paper, however, are those of the authors and should not be attributed to the Australian Government. The datasets analysed and/or generated during the current study are subject to the Confidentiality Deed signed with the Commonwealth of Australia (as represented by the Department of Social Services) and to the Commonwealth privacy laws. Data are accessible from the National Centre for Longitudinal Data (NCLD) by application (https://www.dss.gov.au/national-centre-for-longitudinal-data-ncld/access-to-dss-longitudinal-datasets), and any questions about applying for the DSS longitudinal datasets should be addressed to NCLD (ncld@dss.gov.au).

Acknowledgements

This paper uses unit record data from the Household, Income and Labour Dynamics in Australia (HILDA) Survey. The HILDA survey was initiated and funded by the Australian Government Department of Social Services (DSS) and is managed by the Melbourne Institute of Applied Economic and Social Research (Melbourne Institute; https://melbourneinstitute.unimelb.edu.au/hilda). The findings and views reported in this paper, however, are those of the authors and should not be attributed to the Australian Government.

Funding

No specific funding was used to undertake this study.

Author information

Authors and Affiliations

The Leeder Centre for Health Policy, Economics and Data, School of Public Health, Medicine and Health, The University of Sydney, Sydney, Australia
Evelyn Lee
Department of Actuarial Studies and Business Analytics, Macquarie Business School, Macquarie University, Sydney, NSW, Australia
Jinhui Zhang

Contributions

EL conceptualised the study. JZ was responsible for the analytical design. All authors were involved in data analysis, interpretation, and drafting of the manuscript. All authors approved the final draft of the manuscript for submission and take responsibility for the manuscript content.

Corresponding author

Correspondence to Evelyn Lee.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval and consent to participate

This paper uses unit record data from the Household, Income and Labour Dynamics in Australia (HILDA) Survey. The Melbourne Institute: Applied Economic and Social Research at the University of Melbourne is responsible for the design and management of the survey, and ethics approval to conduct the HILDA Survey was obtained from the Office of Research Ethics and Integrity, University of Melbourne. All experimental protocols were approved by the Office of Research Ethics and Integrity, University of Melbourne. Informed consent was obtained from all subjects and/or their legal guardian(s). All methods were carried out in accordance with relevant guidelines and regulations.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Lee, E., Zhang, J. Predicting non-emergency healthcare use in Australia using machine learning on longitudinal household data. Sci Rep (2026). https://doi.org/10.1038/s41598-025-28968-6

Received: 23 May 2025
Accepted: 13 November 2025
Published: 03 February 2026
DOI: https://doi.org/10.1038/s41598-025-28968-6
