npj Computational Materials
Unsupervised defect clustering in atomic-resolution microscopy using a convolutional variational autoencoder
  • Article
  • Open access
  • Published: 10 March 2026

  • R. A. W. Ayyubi1,
  • Seyfal Sultanov2,
  • James P. Buban1 &
  • …
  • Robert F. Klie1 

npj Computational Materials (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Techniques and instrumentation
  • Theory and computation

Abstract

Atomic-scale defects govern many functional properties of materials, yet their systematic identification and quantification remain challenging because supervised learning approaches require extensive labeled datasets, which are scarce in atomic-resolution microscopy due to the complexity and diversity of defect structures. To overcome this limitation, we introduce a fully unsupervised machine learning framework capable of discovering and clustering defect structures without prior labeling or predefined defect classes. The framework employs a convolutional variational autoencoder (CVAE) to reconstruct ideal, defect-free images, enabling the generation of difference images that isolate local structural anomalies. From these, 47 features are extracted and refined through a three-tier feature selection process to minimize redundancy and noise. Dimensionality reduction via principal component analysis (PCA), combined with silhouette score optimization, guides the determination of the optimal cluster number prior to applying k-means clustering, which yields well-separated groups corresponding to distinct defect types. Validated on CdTe and SrTiO3 datasets, this unsupervised, label-free approach enables high-throughput defect discovery and clustering in scanning transmission electron microscopy (STEM) and related imaging modalities.
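The final stage described in the abstract (PCA, silhouette-guided choice of the cluster number, then k-means) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses scikit-learn, and the function name `cluster_features`, the 95% explained-variance threshold, the candidate range k = 2 to 6, and the synthetic 47-column feature matrix are all assumptions made for illustration. In the actual workflow the features would be extracted from the CVAE difference images and passed through the three-tier feature selection first.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_features(features, k_values=range(2, 7), variance=0.95, seed=0):
    """Standardize, reduce with PCA, choose k by silhouette score, run k-means.

    All defaults here are illustrative assumptions, not values from the paper.
    """
    # Put every feature on a comparable scale before PCA.
    scaled = StandardScaler().fit_transform(features)
    # Keep enough principal components to explain `variance` of the total variance.
    reduced = PCA(n_components=variance, svd_solver="full").fit_transform(scaled)

    # Score each candidate cluster number with the mean silhouette coefficient.
    scores = {}
    for k in k_values:
        candidate = KMeans(n_clusters=k, n_init=10, random_state=seed)
        scores[k] = silhouette_score(reduced, candidate.fit_predict(reduced))

    # Refit k-means with the best-scoring cluster number.
    best_k = max(scores, key=scores.get)
    final = KMeans(n_clusters=best_k, n_init=10, random_state=seed)
    return best_k, final.fit_predict(reduced), scores


# Synthetic stand-in for a (n_defects x 47) feature matrix; not real STEM data.
X, _ = make_blobs(n_samples=300, n_features=47, centers=3,
                  cluster_std=1.0, random_state=0)
best_k, labels, scores = cluster_features(X)
print("selected k:", best_k)
print({k: round(float(s), 3) for k, s in scores.items()})
```

Choosing k before the final k-means fit mirrors the order given in the abstract. The silhouette score is only one possible criterion, and on real data the well-separated groups reported by the authors would depend on the preceding feature selection.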

Data availability

The data and Python codes supporting the findings of this study are openly available on GitHub at https://github.com/RAW-Ayyubi/defect-classification-in-stem-images.

Acknowledgements

This work was supported by a grant from the National Science Foundation, NSF-DMR 2309396 and was in part based on research sponsored by the U.S. Department of Energy Office of Energy Efficiency and Renewable Energy Solar Energy Technologies Office agreement number 37989 through National Laboratory of the Rockies, operated under Contract No. DE-AC36-08GO28308. The views expressed in the article do not necessarily represent the views of the DOE or the U.S. Government. The U.S. Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for U.S. Government purposes.

Author information

Authors and Affiliations

  1. Department of Physics, University of Illinois Chicago, Chicago, IL 60607, USA

    R. A. W. Ayyubi, James P. Buban & Robert F. Klie

  2. Department of Computer Science, University of Illinois Chicago, Chicago, IL 60607, USA

    Seyfal Sultanov

Contributions

R.F.K. and J.P.B. conceived the project and supervised the research. R.A.W.A. conducted the investigation, developed the software, generated the visualizations, and prepared the original draft of the manuscript. S.S. contributed to formal analysis and validation. J.P.B. and R.A.W.A. carried out methodology development. R.F.K. and J.P.B. contributed to conceptualization. R.F.K. led funding acquisition, project administration, and resource provision. R.F.K. further shared supervision responsibilities. All authors reviewed and commented on the final version of the manuscript.

Corresponding author

Correspondence to R. A. W. Ayyubi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Materials

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Ayyubi, R.A.W., Sultanov, S., Buban, J.P. et al. Unsupervised defect clustering in atomic-resolution microscopy using a convolutional variational autoencoder. npj Comput Mater (2026). https://doi.org/10.1038/s41524-026-02024-x

  • Received: 10 June 2025

  • Accepted: 19 February 2026

  • Published: 10 March 2026

  • DOI: https://doi.org/10.1038/s41524-026-02024-x

Associated content

Collection

Machine learning for automated experimentation in scanning transmission electron microscopy (STEM)
