Scientific Reports
Evaluating routing stability and coordination in swarm-based multi-agent task-oriented dialogue systems
  • Article
  • Open access
  • Published: 03 March 2026

  • Abuzar Khan1,
  • Fahad Masood1,
  • Abid Iqbal2,
  • Ahmad Junaid1,
  • Saad Arif3,
  • Mohammed Al-Naeem4,
  • Ghassan Husnain1 &
  • Ali Saeed Alzahrani2 

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Engineering
  • Mathematics and computing

Abstract

Conversational systems are becoming a primary interface for services and enterprise automation, and rapid market growth is pushing deployments into safety- and cost-sensitive settings. Reliability remains a bottleneck when interactions span multiple domains: an orchestrator must choose the next specialist, maintain shared dialogue state, and recover from mistakes before they cascade across handoffs. Despite rising interest in swarm-like multi-agent designs, orchestration is rarely evaluated with coordination-centric metrics, making it hard to compare routing policies beyond surface fluency. We present an evaluation-first pipeline for multi-domain task-oriented dialogue on MultiWOZ 2.2 that decouples routing from generation and exposes measurable failure modes. A DeBERTa-based router selects domain specialists, while a FLAN-T5 generator produces structured actions and belief-state updates under a shared memory interface. The protocol tracks delegation correctness, slot-progress coverage, switching and bouncing instability, loop behavior, and recovery after misroutes, and it links early-turn errors to downstream collapse using cascading-error attribution. We further introduce stress tests that simulate reformulation, long-horizon corrections, and tool-latency delays to probe robustness beyond static annotations. Across routing variants, confidence-aware gating yields the strongest stability improvement, achieving routing accuracy of 0.77 while substantially reducing handoff churn, with switching 0.11 and bounce 0.01, relative to a learned baseline with 0.65 accuracy, switching 0.44, and bounce 0.09. At the same time, confidence gating can trade progress for precision when it suppresses belief updates, highlighting an accuracy-progress tension that is important for deployment tuning. Diagnostic summaries identify misrouting and empty-state updates as dominant contributors, while looping is comparatively rare. Finally, applying the same evaluation to SGD shows that coordination challenges persist under schema shift. Overall, the proposed metrics and implementation blueprint provide a reproducible basis for diagnosing coordination failures and selecting orchestration policies for deployment.
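
The abstract names routing accuracy, switching, bounce, and confidence-aware gating, but this page does not give their formal definitions. The following is a minimal sketch under stated assumptions, not the authors' implementation: switching is taken as the fraction of consecutive turn pairs in which the selected domain changes, bounce as the fraction of turn triples that follow an A, B, A pattern, and gating as keeping the previous domain whenever the router's top score falls below a threshold. The function names `gate_route` and `routing_metrics`, the threshold of 0.6, and the toy scores are all hypothetical.

```python
from typing import Dict, List, Optional


def gate_route(scores: Dict[str, float], previous: Optional[str],
               threshold: float = 0.6) -> str:
    """Pick the top-scoring domain, but keep the previous domain when the
    top score is below `threshold` (an assumed gating rule)."""
    best = max(scores, key=scores.get)
    if previous is not None and scores[best] < threshold:
        return previous
    return best


def routing_metrics(predicted: List[str], gold: List[str]) -> Dict[str, float]:
    """Compute accuracy, switching, and bounce under the assumed definitions."""
    if not predicted or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal length")
    n = len(predicted)
    accuracy = sum(p == g for p, g in zip(predicted, gold)) / n
    # Switching: the selected domain differs from the one at the previous turn.
    switches = sum(predicted[i] != predicted[i - 1] for i in range(1, n))
    # Bounce: the router leaves a domain and returns to it one turn later.
    bounces = sum(
        predicted[i] == predicted[i - 2] and predicted[i] != predicted[i - 1]
        for i in range(2, n)
    )
    return {
        "accuracy": accuracy,
        "switching": switches / (n - 1) if n > 1 else 0.0,
        "bounce": bounces / (n - 2) if n > 2 else 0.0,
    }


# Toy four-turn dialogue with made-up router scores (illustrative only).
turn_scores = [
    {"hotel": 0.90, "train": 0.10},
    {"hotel": 0.45, "train": 0.55},  # low-confidence flip to "train"
    {"hotel": 0.80, "train": 0.20},
    {"hotel": 0.30, "train": 0.70},
]
gold = ["hotel", "hotel", "hotel", "train"]

# Ungated routing: always take the arg-max domain.
ungated = [max(s, key=s.get) for s in turn_scores]

# Gated routing: carry the previous decision through low-confidence turns.
gated: List[str] = []
for s in turn_scores:
    gated.append(gate_route(s, gated[-1] if gated else None))

ungated_metrics = routing_metrics(ungated, gold)
gated_metrics = routing_metrics(gated, gold)
print("ungated:", ungated, ungated_metrics)
print("gated:  ", gated, gated_metrics)
```

In this toy case the gate suppresses the low-confidence flip at the second turn, which removes the bounce. A threshold set too high would also hold the router on a stale domain after a genuine topic change, which is one way the accuracy-progress tension described in the abstract could appear in practice.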

Data availability

The MultiWOZ 2.2 and Schema-Guided Dialogue (SGD) datasets, along with the complete source code used in this study, are publicly available at the following links:

  • Dataset: MultiWOZ 2.2 Dataset
  • Dataset: Schema-Guided Dialogue (SGD) Dataset
  • Source Code: GitHub Repository

Acknowledgements

We thank the original authors of the MultiWOZ 2.2 dataset and the Schema-Guided Dialogue (SGD) dataset for making these resources publicly available.

Funding

This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia (Grant No. KFU260581).

Author information

Authors and Affiliations

  1. Department of Computer Science, CECOS University of IT and Emerging Sciences, Peshawar, Pakistan

    Abuzar Khan, Fahad Masood, Ahmad Junaid & Ghassan Husnain

  2. Department of Computer Engineering, College of Computer Sciences and Information Technology, King Faisal University, Al-Ahsa, 31982, Saudi Arabia

    Abid Iqbal & Ali Saeed Alzahrani

  3. Department of Mechanical Engineering, College of Engineering, King Faisal University, Al-Ahsa, 31982, Saudi Arabia

    Saad Arif

  4. Department of Computer Networks Communications, College of Computer Sciences and Information Technology, King Faisal University, Al-Ahsa, 31982, Saudi Arabia

    Mohammed Al-Naeem

Contributions

Abuzar Khan contributed to the conceptualization of the study, methodology design, and initial drafting of the manuscript. Fahad Masood assisted in data collection, experimental implementation, and result analysis. Abid Iqbal contributed to supervision, validation of results, and critical revision of the manuscript. Ahmad Junaid supported software development, simulations, and visualization of results. Saad Arif assisted in data preprocessing, performance evaluation, and literature review. Mohammed Al-Naeem contributed to formal analysis, resource provision, and manuscript review. Ghassan Husnain contributed to overall supervision, project administration, funding acquisition, and final manuscript approval. Ali Saeed Alzahrani contributed to conceptual guidance, critical proofreading, and technical refinement of the manuscript.

Corresponding authors

Correspondence to Abid Iqbal or Ghassan Husnain.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics Statement

This study does not involve human subjects, personal data or animal experiments and therefore does not require ethical approval.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Khan, A., Masood, F., Iqbal, A. et al. Evaluating routing stability and coordination in swarm-based multi-agent task-oriented dialogue systems. Sci Rep (2026). https://doi.org/10.1038/s41598-026-42158-y

  • Received: 05 January 2026

  • Accepted: 24 February 2026

  • Published: 03 March 2026

  • DOI: https://doi.org/10.1038/s41598-026-42158-y

Keywords

  • Multi Agent Orchestration
  • Task Oriented Dialogue
  • Routing Evaluation
  • Dialogue State Tracking
  • Conversational AI