Abstract
Data pruning is a key technique for reducing training costs and improving model performance. However, most existing methods rely on fixed pruning rates or single metrics and are largely designed for static training settings, making them unsuitable for incremental learning, where data distributions and model states change dynamically. To address this, we propose PADP, a progressive and adaptive data pruning method for incremental learning, which dynamically evaluates sample difficulty and its changes during training to enable adaptive sample selection for each incremental learning task. Specifically, we propose two metrics: the instant difficulty score and the difficulty variation score. The former evaluates the learning difficulty of a sample, while the latter evaluates the variation in that difficulty over a training interval. These two metrics are combined to guide pruning decisions. To prevent certain classes from being completely removed, we also introduce a class-balance retention mechanism. Experimental results show that PADP outperforms existing data selection methods on CIFAR-100 and Tiny-ImageNet, generalizes across multiple incremental learning frameworks, and maintains or exceeds the original accuracy even when training time is reduced by up to 52.90% compared to using the full dataset, demonstrating its effectiveness and practical value.
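To make the selection procedure described above concrete, the following is a minimal sketch of a PADP-style pruning step. It is illustrative only: the function name `padp_select`, the use of per-sample loss as the difficulty proxy, the standard deviation as the variation measure, and the weighting parameter `alpha` are all assumptions, not the paper's exact formulation.

```python
import numpy as np

def padp_select(losses, labels, keep_ratio=0.5, min_per_class=1, alpha=0.5):
    """Hypothetical sketch of PADP-style sample selection.

    losses: (T, N) array of per-sample losses recorded at T checkpoints
            over a training interval (a proxy for sample difficulty).
    labels: (N,) class label per sample.
    Returns sorted indices of samples to keep.
    """
    losses = np.asarray(losses, dtype=float)
    labels = np.asarray(labels)
    # Instant difficulty score: difficulty at the most recent checkpoint.
    instant = losses[-1]
    # Difficulty variation score: spread of difficulty over the interval.
    variation = losses.std(axis=0)
    # Combined score guiding pruning (alpha balances the two metrics).
    score = alpha * instant + (1 - alpha) * variation
    n_keep = max(1, int(len(labels) * keep_ratio))
    keep = set(np.argsort(-score)[:n_keep].tolist())
    # Class-balance retention: every class keeps >= min_per_class samples.
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        if sum(1 for i in idx if i in keep) < min_per_class:
            # Re-admit the highest-scoring samples of the missing class.
            best = idx[np.argsort(-score[idx])[:min_per_class]]
            keep.update(best.tolist())
    return sorted(keep)
```

In an incremental setting, a call like this would run once per task on that task's data, so the retained subset adapts as the model and data distribution evolve.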
Data availability
The datasets used in this study are publicly available. CIFAR-100 is available at https://www.cs.toronto.edu/~kriz/cifar.html, and Tiny ImageNet is available at https://www.kaggle.com/c/tiny-imagenet. The source code used for all experiments is publicly available at: https://github.com/duanbiqing/PADP.
Funding
Open access funding provided by NTNU Norwegian University of Science and Technology (incl St. Olavs Hospital - Trondheim University Hospital). This work is supported in part by NSFC under Grant 62362068, the National Key Research and Development Program of China under Grant No. 2024YFC3014300, and the Yunnan Province Major Science and Technology Project under Grant No. 202302AD080006.
Author information
Contributions
B.D. and D.L. formulated the problem, B.D. implemented the method, Z.H. and S.M. secured funding, and W.Z. supervised. All authors polished and reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Duan, B., Liu, D., He, Z. et al. PADP: progressive and adaptive data pruning for efficient incremental learning. Sci Rep (2026). https://doi.org/10.1038/s41598-026-43959-x