Scientific Reports
Predicting non-emergency healthcare use in Australia using machine learning on longitudinal household data
  • Article
  • Open access
  • Published: 03 February 2026

  • Evelyn Lee1 &
  • Jinhui Zhang2 

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Health care
  • Health care economics

Abstract

The persistent increase in healthcare expenditure has become a major challenge to the sustainability of public financing worldwide. Identifying the characteristics of at-risk populations, and how well their healthcare use can be predicted, is therefore crucial to inform targeted policies and interventions that curb growing healthcare use and expenditure. Drawing on three waves of the Household, Income and Labour Dynamics in Australia (HILDA) survey that included a ‘health module’, the study applied four machine learning (ML) methods (Random Forest, Gradient Boosting Decision Trees, Extreme Gradient Boosting, and multilayer perceptron neural networks), alongside conventional logistic regression, to predict non-emergency healthcare use (specifically primary care and tertiary inpatient hospital care). Predictive performance was evaluated using accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and the Matthews correlation coefficient (MCC). Calibration was assessed using the Brier score, the mean squared difference between predicted probabilities and observed outcomes, where lower values indicate better calibration. Finally, Local Interpretable Model-agnostic Explanations (LIME) were used to explain the models’ predictive behaviour; SHAP (SHapley Additive exPlanations) results are also reported for each wave, together with a representative SHAP plot on the probability-contribution scale. Based on 47,899 observations and 741 variables, the models identified socio-economic factors (age, socio-economic status, private insurance status), health-related variables (e.g. previous contact with healthcare services), and having a designated doctor to see when sick or for health advice as strong predictors of healthcare use.
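The evaluation measures listed above are standard, and a minimal pure-Python sketch of how they are typically computed from binary outcomes and predicted probabilities may help readers interpret them. This is not the authors’ code: the function names, the 0.5 decision threshold and the toy data are assumptions for illustration, since the abstract does not state the threshold used.

```python
import math
from typing import Dict, Sequence


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def auc_pairwise(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return _safe_div(wins, len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probabilities and observed 0/1 outcomes."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def evaluate(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Threshold the probabilities (assumed cut-off) and compute confusion-matrix metrics."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))  # used services, predicted to use
    tn = pairs.count((0, 0))  # did not use, predicted not to use
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _safe_div(tp + tn, tp + tn + fp + fn),
        "sensitivity": _safe_div(tp, tp + fn),
        "specificity": _safe_div(tn, tn + fp),
        "ppv": _safe_div(tp, tp + fp),
        "npv": _safe_div(tn, tn + fn),
        "mcc": _safe_div(tp * tn - fp * fn, mcc_den),
        "auc": auc_pairwise(y_true, y_prob),
        "brier": brier_score(y_true, y_prob),
    }


if __name__ == "__main__":
    # Hypothetical toy data, not HILDA records.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    probs = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
    print(evaluate(labels, probs))
```

AUC and the Brier score use the raw probabilities, whereas sensitivity, specificity, PPV, NPV and MCC depend on the chosen threshold, so changing the cut-off trades sensitivity against specificity without altering AUC.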
Among the ML techniques, Gradient Boosting Decision Trees predicted healthcare use better than logistic regression across all three waves. The standard logistic regression achieved an AUC of 0.69, a PPV of 71% and an NPV of 52%, with 86% sensitivity and 30% specificity, whereas the ML models achieved AUCs of 0.68 to 0.76, PPVs of 75% to 77%, NPVs of 61% to 63%, sensitivity of 0.86 to 0.89, specificity of 0.40 to 0.44, and Brier scores of 0.11 to 0.28. The novel application of ML techniques to a large, nationally representative longitudinal household survey covering a range of domains provided more robust estimates of the factors influencing future healthcare use (primary and elective inpatient care), which are important for informing resource allocation decisions and priority setting.
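Because PPV and NPV depend on outcome prevalence as well as on sensitivity and specificity, the reported logistic regression figures can be checked for internal consistency. The sketch below assumes that all four figures come from a single confusion matrix on the same evaluation sample (the abstract does not say so explicitly); it backs out the implied prevalence from sensitivity, specificity and PPV, then recomputes NPV. The function names are hypothetical.

```python
def implied_prevalence(sensitivity: float, specificity: float, ppv: float) -> float:
    """Solve PPV = Se*p / (Se*p + (1 - Sp)*(1 - p)) for the prevalence p."""
    fpr = 1.0 - specificity  # false positive rate
    return ppv * fpr / (sensitivity * (1.0 - ppv) + ppv * fpr)


def ppv_from(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV = Se*p / (Se*p + (1 - Sp)*(1 - p))."""
    tp_rate = sensitivity * prevalence
    fp_rate = (1.0 - specificity) * (1.0 - prevalence)
    return tp_rate / (tp_rate + fp_rate)


def npv_from(sensitivity: float, specificity: float, prevalence: float) -> float:
    """NPV = Sp*(1 - p) / (Sp*(1 - p) + (1 - Se)*p)."""
    tn_rate = specificity * (1.0 - prevalence)
    fn_rate = (1.0 - sensitivity) * prevalence
    return tn_rate / (tn_rate + fn_rate)


if __name__ == "__main__":
    # Rounded logistic regression figures as reported in the abstract.
    se, sp, ppv = 0.86, 0.30, 0.71
    p = implied_prevalence(se, sp, ppv)
    print(f"implied prevalence ~ {p:.2f}, recomputed NPV ~ {npv_from(se, sp, p):.2f}")
```

With these rounded inputs the implied prevalence is about 0.67 and the recomputed NPV is about 0.52, in line with the reported value; rounding in the published figures limits how precise such a check can be.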

Data availability

This paper uses unit record data from the Household, Income and Labour Dynamics in Australia (HILDA) Survey, initiated and funded by the Australian Government Department of Social Services (DSS). The findings and views reported in this paper, however, are those of the authors and should not be attributed to the Australian Government. The datasets analysed and/or generated during the current study are subject to the Confidentiality Deed signed with the Commonwealth of Australia (as represented by the Department of Social Services) and to Commonwealth privacy laws. Data are accessible by application from the National Centre for Longitudinal Data (NCLD) (https://www.dss.gov.au/national-centre-for-longitudinal-data-ncld/access-to-dss-longitudinal-datasets), and any questions about applying for the DSS longitudinal datasets should be addressed to the NCLD (ncld@dss.gov.au).


Acknowledgements

This paper uses unit record data from the Household, Income and Labour Dynamics in Australia (HILDA) Survey. The HILDA Survey was initiated and funded by the Australian Government Department of Social Services (DSS) and is managed by the Melbourne Institute of Applied Economic and Social Research (Melbourne Institute; https://melbourneinstitute.unimelb.edu.au/hilda). The findings and views reported in this paper, however, are those of the authors and should not be attributed to the Australian Government.

Funding

No specific funding was used to undertake this study.

Author information

Authors and Affiliations

  1. The Leeder Centre for Health Policy, Economics and Data, School of Public Health, Medicine and Health, The University of Sydney, Sydney, Australia

    Evelyn Lee

  2. Department of Actuarial Studies and Business Analytics, Macquarie Business School, Macquarie University, Sydney, NSW, Australia

    Jinhui Zhang

Contributions

EL conceptualised the study. JZ was responsible for the analytical design. All authors were involved in aspects of data analysis, interpretation, and drafting of the manuscript. All authors approved the final draft of the manuscript for submission and take responsibility for the manuscript content.

Corresponding author

Correspondence to Evelyn Lee.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval and consent to participate

This paper uses unit record data from the Household, Income and Labour Dynamics in Australia (HILDA) Survey. The Melbourne Institute: Applied Economic and Social Research at the University of Melbourne is responsible for the design and management of the survey, and ethics approval to conduct the HILDA Survey was obtained from the Office of Research Ethics and Integrity, University of Melbourne. All experimental protocols were approved by the Office of Research Ethics and Integrity, University of Melbourne. Informed consent was obtained from all subjects and/or their legal guardian(s). All methods were carried out in accordance with relevant guidelines and regulations.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Lee, E., Zhang, J. Predicting non-emergency healthcare use in Australia using machine learning on longitudinal household data. Sci Rep (2026). https://doi.org/10.1038/s41598-025-28968-6

  • Received: 23 May 2025

  • Accepted: 13 November 2025

  • Published: 03 February 2026

  • DOI: https://doi.org/10.1038/s41598-025-28968-6

Keywords

  • Machine learning
  • Random forest
  • Australia
  • Healthcare expenditure
  • Health informatics classifications
  • Area-under-the-curve
  • Prediction
Scientific Reports (Sci Rep)

ISSN 2045-2322 (online)

Springer Nature

© 2026 Springer Nature Limited
