  • Article
  • Open access
  • Published: 21 January 2026

Taxonomical modeling and classification in space hardware failure reporting

  • Daniel Palacios1,2,3 &
  • Terry R. Hill3 

Scientific Reports, Article number: (2026)


We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Aerospace engineering
  • Computational science

Abstract

NASA Johnson Space Center has collected more than 54,000 space hardware failure reports. Obtaining engineering process trends or performing root cause analysis by manual inspection is impractical. Fortunately, novel data science tools in Machine Learning and Natural Language Processing (NLP) can be used to perform text mining and knowledge extraction. In NLP, the use of taxonomies (classification trees) is key to structuring text data, extracting knowledge and important concepts from documents, and facilitating the identification of correlations and trends within the data set. Usually, these taxonomies and text structures live in the heads of experts in their specific field. However, when an expert is not available, when taxonomies and ontologies cannot be found in databases, or when the field of study is too broad, an automated approach can provide structure to the text content of a record set. In this paper, an automated taxonomical model is presented that combines Latent Dirichlet Allocation (LDA) with Bidirectional Encoder Representations from Transformers (BERT). Additionally, the limitations and outcomes of causal relationship rule mining models, commercial tools, and deep neural networks are discussed.
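The LDA-BERT combination described in the abstract can be pictured as follows: each report receives a bag-of-words topic mixture from LDA and a contextual sentence embedding from a pre-trained BERT-style encoder, the two representations are concatenated, and the joint vectors are clustered to form candidate taxonomy nodes. The sketch below is only an illustration of that idea under assumed tooling (scikit-learn for LDA, the sentence-transformers library for the embeddings, k-means for clustering) and an assumed weighting factor gamma; it is not the authors' implementation, and the report snippets are hypothetical.

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.cluster import KMeans
    from sentence_transformers import SentenceTransformer

    # Placeholder failure-report snippets (hypothetical examples, not NASA data).
    reports = [
        "Valve leakage detected during pre-flight pressure test",
        "Seal degradation observed after thermal vacuum cycling",
        "Telemetry dropout traced to a loose harness connector",
        "Connector pin bent during vibration test setup",
    ]

    # 1) LDA: bag-of-words topic mixture per report.
    counts = CountVectorizer(stop_words="english").fit_transform(reports)
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    topic_mix = lda.fit_transform(counts)          # shape: (n_reports, n_topics)

    # 2) Contextual sentence embedding per report from a pre-trained encoder.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = encoder.encode(reports)           # shape: (n_reports, 384)

    # 3) Concatenate both views; gamma (assumed) balances topic vs. embedding scale.
    gamma = 15.0
    features = np.hstack([topic_mix * gamma, embeddings])

    # 4) Cluster the joint representation; clusters become candidate taxonomy
    #    nodes that an engineer can label and arrange into a classification tree.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
    for report, label in zip(reports, labels):
        print(label, report)

In this kind of sketch, the cluster assignments would then be labeled and organized hierarchically to yield the taxonomy used for structuring the failure-report corpus.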

Data availability

The datasets generated and/or analysed during the current study are not publicly available because they contain sensitive documents related to NASA’s engineering processes, but they are available from the corresponding author on reasonable request.


Acknowledgements

The authors declare the work conducted on this project was in support of NASA-internal business practices to understand the effectiveness of standard flight hardware processes. Special thanks to the Langley Research Center Data Science Team: Charles A. Liles for GCP guidance and Jam Session organization; Theodore D. Sidehamer for IBM Watson Explorer support, demo, and access; Hari S. Ilangovan for NLP INDRA-EIDOS discussions and resources. Thanks to the Johnson Space Center: (SA) Ram Pisipati and Robert J. Reynolds for early NLP guidance; (EA IT team) Jacci Bloom, Remyi Cole, Michael Patterson, and Jeffrey Myerson for providing software access and troubleshooting support; (EX Intern) Dianeliz Ortiz Martes for giving Power BI tutorials; (EX Interns) Heriberto Triana, Emanuel Sanchez, Jacquelyne Black, Nathan Berg, Sarah Smith, Rishi K. Chitturi, (GSFC Intern) Alexandra Carpenter, and others for helping me navigate my NASA experience; David Kelldorf and Martin Garcia for early GCP discussions; (Intern Coordinators) Hiba Akram, Jennifer Becerra, Annalise Giuliani, and Rosie Patterson. Additional thanks to the Marshall Space Flight Center: Trevor Gevers, Micheal Steele, Adam Gorski, James Lane, and Frank S. King III for AWS Comprehend guidance and access. Thanks also to Ames Research Center/Arizona State University: Dr. Yongming Liu, Dr. Yan, and Xinyu Zhao for providing useful resources to study BERT. Thanks to David C. Smith, Samantha N. Bianco, and Aref F. Malek for LDA-BERT improvement suggestions from the NASA community GCP AI/ML agency presentation. Finally, thanks to the Goddard Space Flight Center, NASA Center for Climate Simulation support: Ellen M. Salmon and Mark L. Carroll for granting a virtual machine with a Linux environment to test models.

Funding

NASA’s Office of STEM Engagement, Minority University Research and Education Project (MUREP).

Author information

Authors and Affiliations

  1. Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital, Houston, TX, USA

    Daniel Palacios

  2. Graduate Program of Quantitative & Computational Biosciences, Baylor College of Medicine, Houston, TX, USA

    Daniel Palacios

  3. NASA, Engineering Processes and Methods Branch, Johnson Space Center, Houston, TX, USA

    Daniel Palacios & Terry R. Hill


Contributions

T.H.: Project Conceptualization, Data Curation, Funding, Supervision, Project Administration, General Resources. D.P.: Project Formulation, Formal Analysis, Investigation, Methodology, Visualizations, and Writing.

Corresponding author

Correspondence to Terry R. Hill.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Palacios, D., Hill, T.R. Taxonomical modeling and classification in space hardware failure reporting. Sci Rep (2026). https://doi.org/10.1038/s41598-026-36813-7

  • Received: 26 June 2023

  • Accepted: 16 January 2026

  • Published: 21 January 2026

  • DOI: https://doi.org/10.1038/s41598-026-36813-7
