Scientific Reports
PADP: progressive and adaptive data pruning for efficient incremental learning
  • Article
  • Open access
  • Published: 13 March 2026


  • Biqing Duan1,
  • Di Liu2,
  • Zhenli He1,
  • Wei Zhou1 &
  • Shengfa Miao1 

Scientific Reports (2026)

  • 581 Accesses


We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note that errors affecting the content may be present, and all legal disclaimers apply.

Subjects

  • Engineering
  • Mathematics and computing

Abstract

Data pruning is a key technique for reducing training costs and improving model performance. However, most existing methods rely on fixed pruning rates or a single metric and are designed for static training settings, making them unsuitable for incremental learning, where data distributions and model states change dynamically. To address this, we propose PADP, a progressive and adaptive data pruning method for incremental learning that dynamically evaluates each sample's difficulty, and how that difficulty changes during training, to enable adaptive sample selection for each incremental task. Specifically, we introduce two metrics: the instant difficulty score, which measures how hard a sample is to learn, and the difficulty variation score, which measures how its difficulty varies over a training interval. The two metrics are combined to guide pruning decisions. To prevent any class from being removed entirely, we further introduce a class-balance retention mechanism. Experimental results show that PADP outperforms existing data selection methods on CIFAR-100 and Tiny-ImageNet and generalizes across multiple incremental learning frameworks. It matches or exceeds full-dataset accuracy while reducing training time by up to 52.90%, demonstrating its effectiveness and practical value.
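To make the scoring scheme concrete, the sketch below illustrates one plausible reading of the abstract: per-sample loss stands in for "difficulty", the instant difficulty score is the difficulty at the latest checkpoint, the difficulty variation score is the change in difficulty over the interval, and a class-balance retention step guarantees every class keeps some samples. The paper's exact formulas are not given in this excerpt, so the loss-based difficulty proxy, the `alpha` weighting, and the `padp_prune` function itself are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def padp_prune(losses, labels, keep_ratio=0.5, min_per_class=1, alpha=0.5):
    """Illustrative PADP-style pruning sketch (not the paper's exact method).

    losses: (T, N) array of per-sample losses at T checkpoints over a
            training interval; loss is an assumed stand-in for "difficulty".
    labels: (N,) class labels.
    Returns a sorted list of indices of samples to keep.
    """
    losses = np.asarray(losses, dtype=float)
    labels = np.asarray(labels)

    # Instant difficulty score: difficulty at the latest checkpoint.
    instant = losses[-1]
    # Difficulty variation score: change in difficulty over the interval.
    variation = np.abs(losses[-1] - losses[0])
    # Combine the two metrics (alpha weighting is a hypothetical choice).
    score = alpha * instant + (1.0 - alpha) * variation

    # Keep the highest-scoring (hardest / most volatile) samples.
    n_keep = max(1, int(round(keep_ratio * len(labels))))
    keep = set(np.argsort(score)[::-1][:n_keep].tolist())

    # Class-balance retention: no class may be pruned away entirely.
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        if not keep.intersection(idx.tolist()):
            best = idx[np.argsort(score[idx])[::-1][:min_per_class]]
            keep.update(best.tolist())
    return sorted(keep)
```

In an incremental setting, a selector like this would run once per task on that task's data, with `losses` collected from the model as it trains, so the retained subset adapts as the model state evolves.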


Data availability

The datasets used in this study are publicly available. CIFAR-100 is available at https://www.cs.toronto.edu/~kriz/cifar.html, and Tiny ImageNet is available at https://www.kaggle.com/c/tiny-imagenet. The source code used for all experiments is publicly available at https://github.com/duanbiqing/PADP.


Funding

Open access funding provided by NTNU Norwegian University of Science and Technology (incl. St. Olavs Hospital - Trondheim University Hospital). This work is supported in part by NSFC under Grant 62362068, the National Key Research and Development Program of China under Grant No. 2024YFC3014300, and the Yunnan Province Major Science and Technology Project under Grant No. 202302AD080006.

Author information

Authors and Affiliations

  1. School of Software, Yunnan University, Kunming, China

    Biqing Duan, Zhenli He, Wei Zhou & Shengfa Miao

  2. Department of Computer Science, NTNU, Trondheim, Norway

    Di Liu


Contributions

B.D. and D.L. formulated the problem; B.D. implemented the method; Z.H. and S.M. secured funding; W.Z. supervised. All authors polished and reviewed the manuscript.

Corresponding authors

Correspondence to Di Liu or Shengfa Miao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Duan, B., Liu, D., He, Z. et al. PADP: progressive and adaptive data pruning for efficient incremental learning. Sci Rep (2026). https://doi.org/10.1038/s41598-026-43959-x

Download citation

  • Received: 15 January 2026

  • Accepted: 09 March 2026

  • Published: 13 March 2026

  • DOI: https://doi.org/10.1038/s41598-026-43959-x

Scientific Reports (Sci Rep)

ISSN 2045-2322 (online)
