Abstract
Large-volume 3D dense prediction is essential in industrial applications like energy exploration and medical image segmentation. However, existing deep learning models struggle to process full-size volumetric inputs at inference due to memory constraints and inefficient operator execution. Conventional solutions, such as tiling or compression, often introduce artifacts, compromise spatial consistency, or require retraining. Here we present a retraining-free inference optimization framework that enables accurate, efficient, whole-volume prediction without performance degradation. Our approach integrates operator spatial tiling, operator fusion, normalization statistic aggregation, and on-demand feature recomputation to reduce memory usage and accelerate runtime. Validated across multiple seismic exploration models, our framework supports full-size inference on volumes exceeding 1024³ voxels. On FaultSeg3D, for instance, it completes inference on a 1024³ volume in 7.5 seconds using just 27.6 GB of memory, whereas conventional inference can handle only 448³ inputs under the same budget, marking a 13× increase in volume size without loss in performance. Unlike traditional patch-wise inference, our method preserves global structural coherence, making it particularly suited for tasks inherently incompatible with chunked processing, such as implicit geological structure estimation. This work offers a generalizable, engineering-friendly solution for deploying 3D models at scale across industrial domains.
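The core idea behind operator spatial tiling is that a convolution applied tile by tile, with a halo overlap equal to the kernel's receptive-field margin, stitches together into exactly the result of running the operator on the full volume, so peak memory scales with the tile size rather than the volume size. The sketch below is a minimal illustration of this principle in PyTorch, not the authors' released torchseis code; the function name `tiled_conv3d` and the depth-only tiling scheme are our own simplifying assumptions.

```python
import torch
import torch.nn.functional as F


def tiled_conv3d(x, weight, tile=32):
    """'Same'-padded conv3d computed tile by tile along the depth axis.

    x: (N, C, D, H, W); weight: (C_out, C, k, k, k) with odd k.
    Each tile reads a halo of k // 2 extra voxels on both sides, so the
    concatenated tile outputs are bit-for-bit the full-volume result.
    """
    k = weight.shape[-1]
    pad = k // 2
    d = x.shape[2]
    outs = []
    for start in range(0, d, tile):
        stop = min(start + tile, d)
        # Slice the tile plus its halo, clamped at the volume boundary.
        lo, hi = max(start - pad, 0), min(stop + pad, d)
        chunk = x[:, :, lo:hi]
        # Zero-pad H and W fully; pad depth only where the halo would
        # extend past the volume boundary (pad order: W, H, D pairs).
        chunk = F.pad(chunk, (pad, pad, pad, pad,
                              max(pad - start, 0), max(stop + pad - d, 0)))
        outs.append(F.conv3d(chunk, weight))  # valid conv on the padded tile
    return torch.cat(outs, dim=2)


# Usage: the tiled result matches a single 'same'-padded conv3d call.
x = torch.randn(1, 2, 20, 8, 8)
w = torch.randn(3, 2, 3, 3, 3)
assert torch.allclose(tiled_conv3d(x, w, tile=7),
                      F.conv3d(x, w, padding=1), atol=1e-5)
```

In practice the same halo bookkeeping must be propagated through every layer of the network (and combined with statistic aggregation for normalization layers), which is the engineering burden the framework described above automates.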
Data availability
All data and benchmark results (e.g., runtime, memory usage, and error scores) supporting the figures and findings of this study are publicly available in the Zenodo repository at https://doi.org/10.5281/zenodo.17810070.
Code availability
The source code is publicly available at https://github.com/JintaoLee-Roger/torchseis. Pretrained models used in this study are available on Hugging Face at https://huggingface.co/shallowclose/torchseis-efficiency-infer.
Acknowledgements
This research is supported by the National Science Foundation of China under Grant 42374127. We thank the USTC Supercomputing Center for providing computational resources for this work. We thank Lei Qiao and Yuzhou Zhang from NVIDIA for their insightful discussions and valuable suggestions on the analysis and interpretation of the inference results. Their feedback greatly helped us strengthen the experimental evaluation of this work.
Author information
Authors and Affiliations
Contributions
Jintao Li: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing - Original Draft Preparation, Visualization. Xinming Wu: Conceptualization, Funding acquisition, Resources, Supervision, Validation, Project administration, Writing - Review & Editing.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Engineering thanks Behzad Alaei, Stefan Carpentier and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Carmine Galasso and Wenjie Wang, Rosamund Daw. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, J., Wu, X. Memory-efficient full-volume inference for large-scale 3D dense prediction without performance degradation. Commun Eng (2026). https://doi.org/10.1038/s44172-025-00576-2
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s44172-025-00576-2


