Abstract
Data pruning is a key technique for reducing training costs and improving model performance. However, most existing methods rely on fixed pruning rates or single metrics and are largely designed for static training settings, making them unsuitable for incremental learning, where data distributions and model states change dynamically. To address this, we propose PADP, a progressive and adaptive data pruning method for incremental learning, which dynamically evaluates sample difficulty and its changes during training to enable adaptive sample selection for each incremental learning task. Specifically, we propose two metrics: the instant difficulty score and the difficulty variation score. The former evaluates the learning difficulty of a sample, while the latter evaluates the variation in that difficulty over a training interval. These two metrics are combined to guide pruning decisions. To prevent certain classes from being completely removed, we also introduce a class-balance retention mechanism. Experimental results show that PADP outperforms existing data selection methods on CIFAR-100 and Tiny-ImageNet, generalizes across multiple incremental learning frameworks, and maintains or exceeds the original accuracy even when training time is reduced by up to 52.90% compared to using the full dataset, demonstrating its effectiveness and practical value.
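To make the selection procedure described above concrete, the following is a minimal sketch of a PADP-style pruning step. It is illustrative only: the function name `padp_select`, the use of per-sample loss as the difficulty proxy, the standard deviation as the variation measure, and the weighting parameter `alpha` are all assumptions, not the paper's exact formulation.

```python
import numpy as np

def padp_select(losses, labels, keep_ratio=0.5, min_per_class=1, alpha=0.5):
    """Hypothetical sketch of PADP-style sample selection.

    losses: (T, N) array of per-sample losses recorded at T checkpoints
            over a training interval (a proxy for sample difficulty).
    labels: (N,) class label per sample.
    Returns sorted indices of samples to keep.
    """
    losses = np.asarray(losses, dtype=float)
    labels = np.asarray(labels)
    # Instant difficulty score: difficulty at the most recent checkpoint.
    instant = losses[-1]
    # Difficulty variation score: spread of difficulty over the interval.
    variation = losses.std(axis=0)
    # Combined score guiding pruning (alpha balances the two metrics).
    score = alpha * instant + (1 - alpha) * variation
    n_keep = max(1, int(len(labels) * keep_ratio))
    keep = set(np.argsort(-score)[:n_keep].tolist())
    # Class-balance retention: every class keeps >= min_per_class samples.
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        if sum(1 for i in idx if i in keep) < min_per_class:
            # Re-admit the highest-scoring samples of the missing class.
            best = idx[np.argsort(-score[idx])[:min_per_class]]
            keep.update(best.tolist())
    return sorted(keep)
```

In an incremental setting, a call like this would run once per task on that task's data, so the retained subset adapts as the model and data distribution evolve.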
Data availability
The datasets used in this study are publicly available. CIFAR-100 is available at https://www.cs.toronto.edu/~kriz/cifar.html, and Tiny ImageNet is available at https://www.kaggle.com/c/tiny-imagenet. The source code used for all experiments is publicly available at: https://github.com/duanbiqing/PADP.
Funding
Open access funding provided by NTNU Norwegian University of Science and Technology (incl St. Olavs Hospital - Trondheim University Hospital). This work is supported in part by NSFC under Grant 62362068, the National Key Research and Development Program of China under Grant No. 2024YFC3014300, and the Yunnan Province Major Science and Technology Project under Grant No. 202302AD080006.
Author information
Contributions
B.D. and D.L. formulated the problem, B.D. implemented the method, Z.H. and S.M. secured funding, and W.Z. supervised. All authors polished and reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Duan, B., Liu, D., He, Z. et al. PADP: progressive and adaptive data pruning for efficient incremental learning. Sci Rep (2026). https://doi.org/10.1038/s41598-026-43959-x