Abstract
With the rapid development of large language models (LLMs), a growing number of applications rely on cloud-based LLM APIs to reduce usage costs. However, because the parameters and gradients of cloud-based models are inaccessible, users must adjust prompts manually or with heuristic algorithms to steer LLM outputs, which requires costly optimization procedures. In-context learning (ICL) has recently emerged as a promising paradigm that enables LLMs to adapt to new tasks using examples provided within the input, eliminating the need for parameter updates. Nevertheless, the advancement of ICL is often hindered by the scarcity of high-quality data, which is frequently sensitive and difficult to share. Federated learning (FL) offers a potential solution by enabling collaborative training of distributed LLMs while preserving data privacy. Despite this potential, previous FL approaches that incorporate ICL have struggled with severe straggler problems and with heterogeneous, non-identically distributed data. To address these problems, we propose an asynchronous distributed bilevel tuning (AsynDBT) algorithm that optimizes both in-context learning samples and prompt fragments based on feedback from the LLM, thereby enhancing downstream task performance. Benefiting from its distributed architecture, AsynDBT provides privacy protection and adaptability to heterogeneous computing environments. Furthermore, we present a theoretical analysis establishing the convergence guarantees of the proposed algorithm. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness and efficiency of AsynDBT.
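The gradient-free, asynchronous flavor of the tuning loop summarized above can be illustrated with a minimal sketch. All names here (`score`, `zo_grad`, `server_apply`) and the quadratic stand-in for LLM feedback are illustrative assumptions, not the paper's actual implementation: since a cloud LLM exposes no gradients, a client estimates an ascent direction with a two-point zeroth-order probe, and the server applies updates with a staleness-discounted step size so slow clients (stragglers) cannot destabilize the global prompt parameters.

```python
import random

def score(prompt_vec, example_ids):
    # Stand-in for black-box LLM feedback; in practice this would be a
    # downstream-task score returned by querying a cloud LLM API.
    return -sum((p - 0.5) ** 2 for p in prompt_vec) + 0.01 * len(example_ids)

def zo_grad(prompt_vec, example_ids, mu=1e-3):
    # Two-point zeroth-order estimate: perturb the prompt parameters along a
    # random Gaussian direction and difference the black-box scores.
    u = [random.gauss(0.0, 1.0) for _ in prompt_vec]
    plus = [p + mu * d for p, d in zip(prompt_vec, u)]
    minus = [p - mu * d for p, d in zip(prompt_vec, u)]
    g = (score(plus, example_ids) - score(minus, example_ids)) / (2 * mu)
    return [g * d for d in u]

def server_apply(global_vec, client_grad, staleness, lr=0.05):
    # Staleness-aware step size: older (straggler) updates get a smaller step.
    eta = lr / (1 + staleness)
    return [p + eta * g for p, g in zip(global_vec, client_grad)]
```

A driver loop would repeatedly call `zo_grad` on each client's local examples and fold the result into the global parameters via `server_apply`; the staleness discount is what distinguishes the asynchronous scheme from a synchronous round-based one.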
Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Funding
This work was supported by the Tianchi Talents - Young Doctor Program (5105250183m), the Science and Technology Program of Xinjiang Uyghur Autonomous Region (2024B03028, 2025B04051), and the Regional Fund of the National Natural Science Foundation of China (202512120005).
Author information
Authors and Affiliations
Contributions
Hui Ma: Writing – Review and Editing, Conceptualization; Shaoyu Dou: Writing – Original Draft, Software, Methodology; Ya Liu: Experimental Analysis; Fei Xing: Supervision; Li Feng: Validation, Data curation; Feng Pi: Supervision. All authors reviewed the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ma, H., Dou, S., Liu, Y. et al. AsynDBT: asynchronous distributed bilevel tuning for efficient in-context learning with large language models. Sci Rep (2026). https://doi.org/10.1038/s41598-026-39582-5
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-026-39582-5