Introduction

The explosive growth of artificial intelligence applications has fundamentally reshaped the computational landscape of modern data centers, where diverse workloads ranging from deep learning training to real-time inference demand unprecedented levels of resource orchestration [1]. Cloud service providers now face mounting pressure to maximize resource utilization while maintaining quality-of-service guarantees across increasingly heterogeneous computing infrastructures that integrate CPUs, GPUs, FPGAs, and specialized AI accelerators [2]. This complexity intensifies as organizations deploy multiple AI frameworks and model architectures simultaneously, each exhibiting distinct computational patterns and resource consumption behaviors that traditional static allocation schemes struggle to accommodate effectively.

Current approaches to workload prediction and resource allocation reveal significant limitations when confronted with the dynamic nature of AI workloads. Conventional methods often rely on historical statistical analysis or simple time-series forecasting, which inadequately capture the intricate temporal dependencies and burst patterns characteristic of modern neural network training and inference tasks [3]. Meanwhile, existing resource allocation strategies typically operate on predetermined policies that fail to adapt to rapidly changing workload compositions, resulting in either resource wastage during low-demand periods or performance degradation when demand surges unexpectedly [4]. The mismatch between predicted and actual resource requirements frequently leads to service-level agreement violations, particularly for latency-sensitive inference workloads where millisecond-level delays can cascade into significant user experience degradation.

Research efforts in both academia and industry have explored various directions to address these challenges. Machine learning-based prediction models have shown promise in capturing workload patterns, yet many existing approaches treat prediction and allocation as separate optimization problems rather than unified processes [5]. Several studies have investigated reinforcement learning techniques for dynamic resource management, though convergence speed and sample efficiency remain problematic in production environments where exploration costs translate directly to service disruptions [6]. Recent work has begun incorporating attention mechanisms into workload characterization, recognizing that different temporal features contribute variably to prediction accuracy depending on workload phases and system states [7]. However, these efforts have not fully exploited the potential of attention-based architectures to jointly optimize prediction accuracy and allocation efficiency across heterogeneous computing resources.

The pressing need for more sophisticated solutions stems from several converging factors. First, the economic imperative to reduce operational costs in large-scale data centers demands intelligent resource management that minimizes both under-utilization and over-provisioning. Second, the environmental impact of computational infrastructure requires energy-efficient scheduling strategies that align workload placement with hardware capabilities. Third, the proliferation of edge computing scenarios introduces new constraints where resource scarcity and distributed coordination compound the allocation challenge. Addressing these multifaceted requirements necessitates a holistic framework that seamlessly integrates predictive analytics with adaptive resource orchestration.

This paper proposes an attention mechanism-based approach that fundamentally rethinks the coupling between workload prediction and resource allocation for AI workloads in heterogeneous computing environments [8]. Our primary contributions include: (1) a multi-head attention architecture that dynamically weighs temporal and spatial features for improved workload forecasting accuracy; (2) a joint optimization framework that considers prediction uncertainty when making allocation decisions, thereby reducing the impact of forecast errors on system performance; (3) a heterogeneity-aware scheduling algorithm that matches workload characteristics with specific accelerator capabilities to maximize throughput and minimize energy consumption. Rather than treating prediction and allocation as sequential stages, we develop an end-to-end learnable system where allocation feedback continuously refines prediction models, creating a closed-loop improvement cycle that adapts to evolving workload patterns without manual intervention.

Related theory and technical foundation

AI workload characteristic analysis and prediction technology

AI workloads exhibit distinctive temporal patterns that fundamentally differ from traditional enterprise applications, presenting unique challenges for prediction systems. Training workloads for deep neural networks often demonstrate strong temporal autocorrelation, where resource consumption at time \(t\) closely relates to consumption at previous time steps through the relationship \({W}_{t}=f({W}_{t-1},{W}_{t-2},...,{W}_{t-n})\), though the depth of this dependency varies considerably across different model architectures [9]. Inference workloads, by contrast, display pronounced burstiness driven by user request patterns and upstream service behaviors, creating sudden spikes that can exceed baseline resource demands by orders of magnitude within seconds [10]. These burst characteristics pose particular difficulty because the magnitude and duration of spikes rarely follow predictable distributions that conventional smoothing techniques can adequately capture.

Periodicity emerges as another salient feature, particularly in production environments where training jobs often execute on scheduled intervals while inference traffic mirrors human activity cycles across daily, weekly, and sometimes seasonal timeframes. The superposition of multiple periodicities complicates prediction, as a single workload stream may simultaneously exhibit 24-hour diurnal patterns, 7-day weekly rhythms, and longer-term trends that standard Fourier decomposition struggles to disentangle cleanly [11].

Traditional time series forecasting approaches, including ARIMA models and exponential smoothing variants, attempt to model workload evolution through statistical representations of historical data. The ARIMA framework expresses future values through autoregressive and moving average components as \(\varphi\left(B\right){(1-B)}^{d}{W}_{t}=\theta\left(B\right){\epsilon}_{t}\) (standard ARIMA formulation), where \(B\) denotes the backshift operator and \({\epsilon}_{t}\) represents white noise. Yet these methods presume stationarity and linear dependencies, assumptions that AI workloads routinely violate when training phases shift or when model serving patterns change abruptly [12].

Machine learning approaches brought greater flexibility through algorithms like support vector regression and random forests, which detect nonlinear relationships between features and future resource demands [13]. Their main weakness lies in manual feature engineering requirements and limited capacity to capture long-range temporal dependencies spanning hundreds or thousands of time steps. Deep learning methods, particularly recurrent neural networks and their variants, address this limitation by automatically learning hierarchical representations from raw time series data [14]. The hidden state update mechanism \({h}_{t}=\tanh\left({W}_{hh}{h}_{t-1}+{W}_{xh}{x}_{t}+{b}_{h}\right)\) (standard RNN formulation) enables information propagation across time, though vanilla RNNs suffer from vanishing gradients that constrain their effective memory horizon [15].
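
The hidden state recurrence above can be sketched numerically in a few lines; the dimensions and random weights below are purely illustrative:

```python
import numpy as np

def rnn_step(h_prev, x_t, W_hh, W_xh, b_h):
    """One vanilla-RNN update: h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

rng = np.random.default_rng(0)
hidden, features = 4, 3                      # illustrative sizes
W_hh = rng.normal(scale=0.1, size=(hidden, hidden))
W_xh = rng.normal(scale=0.1, size=(hidden, features))
b_h = np.zeros(hidden)

h = np.zeros(hidden)
for x_t in rng.normal(size=(5, features)):   # unroll over 5 time steps
    h = rnn_step(h, x_t, W_hh, W_xh, b_h)
```

Because each step squashes the state through tanh and re-multiplies by \(W_{hh}\), gradients flowing back through many such steps shrink geometrically, which is the vanishing-gradient effect noted above.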

Evaluating prediction quality requires metrics beyond simple error measures. Mean Absolute Percentage Error quantifies relative prediction accuracy, while coverage probability assesses whether prediction intervals contain actual values at specified confidence levels. For resource allocation decisions, we need metrics that account for asymmetric costs—underestimation triggers performance degradation while overestimation wastes resources, and these consequences carry different operational weights that prediction systems must balance thoughtfully.
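
These asymmetric-cost considerations can be made concrete with a small sketch; the 3:1 under- versus over-prediction weighting is an assumed example value, not one taken from this work:

```python
import numpy as np

def mape(actual, predicted):
    """Mean Absolute Percentage Error (assumes nonzero actuals)."""
    actual = np.asarray(actual, float)
    predicted = np.asarray(predicted, float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

def asymmetric_cost(actual, predicted, under_weight=3.0, over_weight=1.0):
    """Penalize under-prediction (SLA risk) more heavily than
    over-prediction (wasted capacity)."""
    err = np.asarray(predicted, float) - np.asarray(actual, float)
    return float(np.sum(np.where(err < 0, under_weight * -err, over_weight * err)))
```

For actuals [100, 200] and predictions [110, 180], MAPE is 10% in both directions, but the asymmetric cost charges the 20-unit shortfall three times as heavily as the 10-unit surplus.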

Despite these advances, existing prediction approaches exhibit several critical limitations that our work specifically addresses. Traditional statistical methods like ARIMA assume stationarity and struggle with the non-linear, multi-modal distributions common in AI workloads. Deep learning approaches, while more flexible, typically model each workload stream independently and fail to capture cross-task correlations that arise from shared infrastructure resources. Recent transformer-based methods have shown promise; for instance, the variational mode decomposition combined with sample entropy optimization and Transformer architecture demonstrated improved cloud resource load prediction by decomposing complex signals into intrinsic mode functions [16]. However, such decomposition-based approaches introduce additional computational overhead and may lose fine-grained temporal dependencies during the signal separation process. Our spatial-temporal attention mechanism offers a fundamentally different approach: rather than pre-processing signals through decomposition, we learn to attend directly to relevant temporal and spatial patterns end-to-end, preserving information flow while reducing pipeline complexity. This distinction becomes particularly important for real-time scheduling decisions where prediction latency directly impacts allocation quality.

Attention mechanism principles and applications

Self-attention mechanisms have revolutionized sequence modeling by enabling models to weigh the relevance of different positions when processing each element, essentially allowing the network to decide which parts of the input deserve more focus [17]. The fundamental operation transforms input sequences into query, key, and value representations through learned projections, then computes attention scores that measure the affinity between queries and keys. This scoring mechanism follows the formulation:

$$\text{Attention}(Q,K,V)=\text{softmax}\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V$$
(1)

where \(Q\), \(K\), and \(V\) represent query, key, and value matrices respectively, while \({d}_{k}\) denotes the dimensionality of key vectors [18]. The scaling factor \(\sqrt{{d}_{k}}\) prevents the dot products from growing excessively large, which would push the softmax function into regions with vanishingly small gradients during training.
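
Equation (1) translates directly into a few lines of NumPy; this is a minimal single-head sketch with toy inputs, not the full model:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Eq. (1): softmax(Q K^T / sqrt(d_k)) V. Returns output and weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

# Toy example: 3 positions, d_k = 3
Q = K = np.eye(3)
V = np.arange(9.0).reshape(3, 3)
out, weights = scaled_dot_product_attention(Q, K, V)
```

Each row of `weights` is a probability distribution over input positions, so every output row is a convex combination of the rows of \(V\).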

Multi-head attention extends this concept by computing attention in parallel across multiple representation subspaces, capturing diverse dependency patterns that a single attention function might miss [19]. Each head performs independent attention operations with its own learned projection matrices, and the mechanism concatenates their outputs before applying a final linear transformation:

$${\text{MultiHead}}(Q,K,V) = {\text{Concat}}({\text{head}}_{1} ,...,{\text{head}}_{h} )W^{O}$$
(2)

where \(\text{head}_{i}=\text{Attention}(Q{W}_{i}^{Q},K{W}_{i}^{K},V{W}_{i}^{V})\) (standard multi-head attention formulation) and \({W}^{O}\) denotes the output projection matrix [20]. This parallel processing enables the model to jointly attend to information from different positions and representation subspaces, something sequential models accomplish only through multiple stacked layers.
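
A minimal sketch of Eq. (2), with per-head projections and the output projection \(W^{O}\); head count and dimensions are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

def multi_head_attention(X, head_params, W_O):
    """Eq. (2): run h independent heads, concatenate, project with W_O.
    head_params: list of (W_Q, W_K, W_V) tuples, one per head."""
    heads = [attention(X @ W_Q, X @ W_K, X @ W_V)
             for W_Q, W_K, W_V in head_params]
    return np.concatenate(heads, axis=-1) @ W_O

rng = np.random.default_rng(1)
T, D, h = 6, 8, 2          # sequence length, model dim, number of heads
d_k = D // h               # each head operates on D/h dimensions
head_params = [tuple(rng.normal(scale=0.1, size=(D, d_k)) for _ in range(3))
               for _ in range(h)]
W_O = rng.normal(scale=0.1, size=(h * d_k, D))
out = multi_head_attention(rng.normal(size=(T, D)), head_params, W_O)
```

Splitting \(D\) into \(h\) subspaces keeps the total computation comparable to single-head attention while letting each head learn a distinct relevance pattern.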

Spatial-temporal attention architectures specifically address scenarios where data exhibits both temporal dynamics and spatial correlations, a common situation in distributed computing environments where multiple servers or accelerators interact [21]. These designs typically separate temporal and spatial attention modules or employ factorized attention patterns that first capture dependencies within temporal dimensions before modeling spatial relationships, thereby reducing computational complexity from quadratic to linear with respect to sequence length.

For time series prediction tasks, attention mechanisms offer several compelling advantages over recurrent architectures. They eliminate the sequential computation bottleneck that prevents efficient parallelization during training, dramatically reducing the time required to process long historical sequences [22]. More importantly, attention scores provide direct paths between any two positions in the sequence, circumventing the information bottleneck that forces RNNs to compress all history into fixed-size hidden states. The attention weight matrix itself becomes interpretable, revealing which historical time steps the model considers most relevant for predicting each future point [23].

The computation of attention weights involves applying the softmax function to scaled dot-product similarities, yielding a probability distribution over input positions:

$$\alpha _{{ij}} = \frac{{{\text{exp}}\left( {e_{{ij}} } \right)}}{{\sum\nolimits_{{k = 1}}^{n} {{\text{exp}}\left( {e_{{ik}} } \right)} }}$$
(3)

where \(e_{ij}=\frac{q_{i}\cdot k_{j}}{\sqrt{d_{k}}}\) (standard attention score formulation) quantifies the compatibility between query \(i\) and key \(j\) [24]. This normalization ensures weights sum to one while amplifying differences between high and low relevance positions. The resulting weighted combination of values creates contextualized representations that embed information from across the entire input sequence, with contribution magnitudes determined by learned relevance rather than fixed positional proximity.

Heterogeneous computing resource scheduling strategies

Modern data centers embrace architectural diversity, deploying CPUs, GPUs, TPUs, and FPGAs within unified infrastructures to match workload characteristics with hardware strengths. CPUs excel at sequential logic and branching operations but deliver limited parallelism for matrix-intensive computations that dominate neural network training [25]. GPUs provide thousands of lightweight cores optimized for throughput-oriented tasks, achieving 10-100x speedups over CPUs for dense linear algebra operations, though their memory hierarchies and programming models introduce complexities that not all workloads can exploit efficiently [26]. TPUs represent domain-specific accelerators designed explicitly for tensor operations, offering superior performance per watt for transformer models and convolutional networks, yet their specialized instruction sets constrain flexibility when executing non-standard computational graphs. This diversity creates both opportunity and complexity: schedulers must navigate a multidimensional design space where placement decisions profoundly impact performance, cost, and energy consumption.

Static allocation strategies partition resources among workloads according to predetermined policies that remain fixed throughout execution. These approaches often assign resources based on historical average demands or worst-case requirements, guaranteeing each workload receives sufficient capacity but frequently leaving resources idle during low-utilization periods [27]. The simplicity of static schemes appeals to administrators seeking predictable behavior, yet their inflexibility proves costly when workload patterns shift or when bursty applications create temporal imbalances across the resource pool.

Dynamic allocation strategies continuously adjust resource assignments in response to runtime conditions, monitoring current utilization and redistributing capacity to match evolving demands [28]. Reactive policies increase allocations when detecting resource pressure while reclaiming underused resources for reassignment, though determining appropriate thresholds and response timescales poses challenges. Predictive dynamic schedulers attempt to anticipate future needs, preemptively adjusting allocations before demand materializes, but their effectiveness hinges critically on prediction accuracy [29].

Heuristic scheduling algorithms apply domain knowledge and approximate optimization techniques to navigate the combinatorial complexity of resource assignment problems. Genetic algorithms encode allocation decisions as chromosomes and evolve populations toward improved solutions through selection and mutation operators, though convergence speed remains problematic for online scheduling scenarios [30]. Greedy heuristics make locally optimal choices at each decision point, offering computational efficiency at the cost of potentially missing globally superior configurations.

Resource utilization quantifies the fraction of available capacity actively performing useful work, formulated as:

$$U = \frac{1}{T}\sum\limits_{{t = 1}}^{T} {\frac{{\sum\nolimits_{{i = 1}}^{N} {r_{i} \left( t \right)} }}{{R_{{total}} }}}$$
(4)

where \({r}_{i}\left(t\right)\) represents resource consumption of workload \(i\) at time \(t\) and \({R}_{total}\) denotes total capacity (standard utilization definition). High utilization indicates efficient resource employment yet risks performance degradation when demand spikes exceed available headroom.
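
Equation (4) amounts to averaging the per-step aggregate usage over the horizon; a minimal sketch with a \(T\times N\) consumption matrix (values illustrative):

```python
import numpy as np

def average_utilization(consumption, total_capacity):
    """Eq. (4): mean over T steps of (sum of N workloads' usage) / R_total.
    consumption: array of shape (T, N)."""
    per_step = consumption.sum(axis=1) / total_capacity
    return float(per_step.mean())

# Two time steps, two workloads, total capacity 100 units
usage = np.array([[50.0, 30.0],
                  [20.0, 10.0]])
u = average_utilization(usage, 100.0)   # (0.8 + 0.3) / 2 = 0.55
```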

Minimizing task completion time directly impacts user experience and throughput, while energy consumption increasingly dominates operational expenses and environmental considerations. Multi-objective optimization frameworks balance these competing goals through weighted combinations or Pareto optimality:

$${\text{min}}_{x} \left[ {\alpha \cdot T_{{completion}} \left( x \right) + \beta \cdot E_{{total}} \left( x \right) - \gamma \cdot U\left( x \right)} \right]$$
(5)

where \(x\) represents the allocation decision vector and \(\alpha,\beta,\gamma\) denote preference weights (standard multi-objective formulation) [31]. Determining appropriate weight configurations requires understanding application-specific priorities and system constraints that vary across deployment contexts.

Beyond performance and energy optimization, recent research has increasingly recognized the importance of reliability, fault-tolerance, and security in heterogeneous computing environments. RT-SEAT introduced a hybrid real-time scheduling approach that jointly optimizes energy consumption and thermal management for heterogeneous multicore platforms, demonstrating that thermal-aware scheduling can prevent performance degradation from throttling [32]. The HEAT scheduler extended this work by incorporating efficient temperature management mechanisms that proactively migrate tasks before thermal violations occur [33]. For mission-critical applications, FRESH proposed fault-tolerant scheduling strategies that maintain real-time guarantees even under processor failures, achieving graceful degradation rather than catastrophic system failures [34]. More recently, TREAFET addressed temperature-aware scheduling specifically for FinFET-based multicores, accounting for the unique thermal characteristics of advanced semiconductor processes [35]. Security considerations have also gained prominence, with SAMIT proposing secure multi-authority access control with dynamic attribute updates for IoT-CPS systems [36], while e-SAFE addressed secure access control with user revocation capabilities in fog-enhanced IoT environments [37]. Our current framework focuses primarily on performance and energy optimization, deliberately leaving security and fault-tolerance mechanisms as explicit future extensions. We acknowledge this scope limitation while noting that the modular architecture we propose can accommodate such extensions without fundamental redesign.

Attention mechanism-based workload prediction and resource allocation algorithm design

Multi-head spatial-temporal attention-based workload prediction model

We construct a multi-head spatial-temporal attention network that decomposes the workload prediction problem into two complementary perspectives: temporal evolution within individual workload streams and spatial correlations across concurrent task types. This factorization reduces computational complexity while preserving the model’s capacity to capture intricate dependencies that simpler architectures miss [38].

The temporal attention module processes historical workload sequences to identify which past time steps most strongly influence future resource demands. Given an input sequence \(X\in\:{\mathbb{R}}^{T\times\:D}\) where \(T\) represents the temporal window length and \(D\) denotes feature dimensions, we project each time step into query, key, and value representations through learned linear transformations. The temporal attention computation follows:

$${A}_{temp}=\text{softmax}\left(\frac{{Q}_{temp}{K}_{temp}^{T}}{\sqrt{{d}_{k}}}\right){V}_{temp}$$
(6)

where subscript “temp” indicates temporal-specific projection matrices that differ from those employed in spatial attention. This separation allows the model to learn distinct relevance patterns for temporal versus spatial relationships [39]. We implement multi-head temporal attention by partitioning the \(D\) dimensions into \(h\) heads, each operating on \({d}_{k}=D/h\) dimensions, enabling parallel capture of diverse temporal patterns such as short-term fluctuations and long-term trends simultaneously.

Spatial attention operates orthogonally to temporal attention, examining relationships between different workload types at each time step. In heterogeneous environments where training jobs, batch inference, and online serving coexist, resource contention and shared infrastructure create dependencies that naive per-workload predictions overlook. The spatial attention mechanism computes cross-workload affinities:

$${A}_{spatial}=\text{softmax}\left(\frac{{Q}_{spatial}{K}_{spatial}^{T}}{\sqrt{{d}_{k}}}\right){V}_{spatial}$$
(7)

By attending across the workload dimension rather than the temporal dimension, this module identifies scenarios where one task type’s behavior signals impending changes in another: for instance, when training job completions typically precede surges in inference requests as newly trained models deploy [40].

To ensure reproducibility, we now formally specify the computational process for both attention modules. For temporal attention, given input tensor \(X\in\:{\mathbb{R}}^{B\times\:T\times\:N\times\:D}\) where \(B\) denotes batch size, \(T\) represents temporal window length, \(N\) indicates the number of concurrent workload types, and \(D\) specifies feature dimensions, we first reshape the tensor to \({X}_{temp}\in\:{\mathbb{R}}^{\left(B\cdot\:N\right)\times\:T\times\:D}\), treating each workload stream independently. The query, key, and value matrices are constructed through learned linear projections:

$${Q}_{temp}={X}_{temp}{W}_{temp}^{Q},\quad {K}_{temp}={X}_{temp}{W}_{temp}^{K},\quad {V}_{temp}={X}_{temp}{W}_{temp}^{V}$$

where \({W}_{temp}^{Q},{W}_{temp}^{K},{W}_{temp}^{V}\in\:{\mathbb{R}}^{D\times\:{d}_{k}}\) are learnable parameter matrices with \({d}_{k}=D/h\) for \(h\) attention heads. We apply causal masking to prevent information leakage from future time steps:

$$M_{causal}\left(i,j\right)=\begin{cases}0 & \text{if } i\ge j\\ -\infty & \text{otherwise}\end{cases}$$

The masked attention scores undergo softmax normalization row-wise, ensuring each query position attends only to current and preceding positions. For spatial attention operating across workload types, we reshape the temporally-attended output to \({X}_{spatial}\in\:{\mathbb{R}}^{\left(B\cdot\:T\right)\times\:N\times\:D}\) and apply analogous projections without causal masking, as all concurrent workloads can legitimately influence each other. The integration mechanism concatenates outputs from all heads and applies a final linear transformation followed by layer normalization and residual connection:

$${H}_{out}=\text{LayerNorm}\left(X+\text{Concat}\left(\text{head}_{1},...,\text{head}_{h}\right){W}^{O}\right)$$

This formulation ensures that temporal dependencies are captured before spatial correlations, allowing the model to first understand individual workload evolution patterns before reasoning about cross-workload interactions.
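
The reshaping and causal-masking steps above can be sketched as follows; tensor sizes are illustrative and the learned projections are omitted for brevity, with raw features standing in for queries, keys, and values:

```python
import numpy as np

def causal_mask(T):
    """M(i, j) = 0 if i >= j, -inf otherwise: queries attend only to
    current and preceding positions."""
    return np.where(np.tril(np.ones((T, T), bool)), 0.0, -np.inf)

def masked_softmax(scores):
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

B, T, N, D = 2, 4, 3, 8                       # batch, window, workload types, features
X = np.random.default_rng(2).normal(size=(B, T, N, D))

# Temporal view: treat each of the B*N workload streams independently.
X_temp = X.transpose(0, 2, 1, 3).reshape(B * N, T, D)
scores = X_temp @ X_temp.transpose(0, 2, 1) / np.sqrt(D)
weights = masked_softmax(scores + causal_mask(T))   # mask broadcasts over batch

# Spatial view: at each of the B*T steps, attend across the N workload types
# (no causal mask, since concurrent workloads may influence each other).
X_spatial = X.reshape(B * T, N, D)
```

After masking, every attention row still sums to one, but all weight on future positions is exactly zero, which is the information-leakage guarantee the causal mask provides.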

Fig. 1

Multi-head spatial-temporal attention network architecture for workload prediction.

Figure 1 illustrates the complete network architecture, showing how temporal and spatial attention modules interleave to progressively refine representations. The input layer accepts normalized workload metrics including CPU utilization, memory consumption, network throughput, and GPU occupancy across multiple time steps and workload types.

Positional encoding injects temporal order information into the model, which attention mechanisms lack inherently due to their permutation-invariant nature. We adopt sinusoidal positional embeddings that encode absolute position through varying frequency components:

$$P{E}_{(pos,2i)}=\sin\left(\frac{pos}{{10000}^{2i/D}}\right),\quad P{E}_{(pos,2i+1)}=\cos\left(\frac{pos}{{10000}^{2i/D}}\right)$$
(8)

where \(pos\) denotes the position index and \(i\) represents the dimension index (standard positional encoding formulation). These embeddings sum with input features before attention computation, enabling the model to distinguish temporal ordering without requiring recurrent connections.
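
Equation (8) vectorizes naturally; this sketch assumes an even embedding dimension \(D\):

```python
import numpy as np

def positional_encoding(T, D):
    """Eq. (8): sinusoidal position encodings of shape (T, D), D even.
    Even columns carry sin terms, odd columns the matching cos terms."""
    pos = np.arange(T)[:, None]               # (T, 1) position indices
    i = np.arange(D // 2)[None, :]            # (1, D/2) dimension indices
    angle = pos / (10000 ** (2 * i / D))
    pe = np.empty((T, D))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe
```

Because each dimension pair oscillates at a different frequency, any fixed offset between two positions corresponds to a linear transformation of their encodings, which is what lets attention reason about relative order.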

Residual connections wrap each attention module, mitigating gradient degradation in deep networks while providing shortcuts that preserve low-level features. The formulation combines attention outputs with original inputs through element-wise addition:

$${H}^{(l+1)}=\text{LayerNorm}\left({H}^{(l)}+\text{Attention}\left({H}^{(l)}\right)\right)$$
(9)

where \({H}^{\left(l\right)}\) represents the hidden state at layer \(l\) and LayerNorm denotes layer normalization that stabilizes training dynamics [41].

The end-to-end prediction pipeline processes raw workload time series through embedding layers, stacks multiple spatial-temporal attention blocks, and produces future workload forecasts via fully connected output layers. We train the model by minimizing the combined loss:

$${\mathcal{L}} = \frac{1}{N}\mathop \sum \limits_{{i = 1}}^{N} \left[ {(\hat{w}_{i} - w_{i} )^{2} + \lambda \cdot{\text{max}}\left( {0,\hat{w}_{i} - c_{i} } \right)} \right]$$
(10)

where \(\hat{w}_{i}\) and \({w}_{i}\) denote predicted and actual workloads, \({c}_{i}\) represents capacity constraints, and \(\lambda\) penalizes predictions exceeding physical limits (custom loss formulation) [42]. This asymmetric penalty encourages the model to avoid generating infeasible predictions that would mislead downstream allocation algorithms.
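
A minimal NumPy sketch of the loss in Eq. (10); the penalty weight \(\lambda\) shown is an arbitrary example value, not a tuned hyperparameter from this work:

```python
import numpy as np

def prediction_loss(predicted, actual, capacity, lam=0.5):
    """Eq. (10): mean of squared error plus lam * max(0, w_hat - c),
    a hinge penalty on predictions above physical capacity."""
    predicted = np.asarray(predicted, float)
    actual = np.asarray(actual, float)
    capacity = np.asarray(capacity, float)
    mse = (predicted - actual) ** 2
    overflow = np.maximum(0.0, predicted - capacity)
    return float(np.mean(mse + lam * overflow))
```

For predictions [1.0, 3.0] against actuals [1.0, 2.0] with capacities [2.0, 2.5], only the second prediction is both wrong and infeasible, so it contributes the squared error plus the hinge term.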

Table 1 summarizes the key hyperparameters governing model architecture and training dynamics. As presented in Table 1, we configure 8 attention heads to balance representation capacity against computational overhead, while the embedding dimension of 512 provides sufficient expressiveness for capturing complex workload patterns without excessive parameter counts.

Table 1 Model parameter configuration for multi-head spatial-temporal attention network.

Heterogeneous resource dynamic allocation algorithm framework

Building upon the workload predictions generated by our spatial-temporal attention model, we now establish a decision framework that translates forecasts into concrete resource assignments across heterogeneous computing infrastructure. The framework operates in three stages: resource state modeling, multi-objective allocation optimization, and adaptive adjustment through preemption and migration mechanisms when actual demands deviate from predictions.

Our resource pool modeling approach represents the heterogeneous infrastructure as a directed graph \(\mathcal{G}=(\mathcal{N},\mathcal{E})\) where nodes \(\mathcal{N}\) correspond to computing devices and edges \(\mathcal{E}\) capture interconnection topology and bandwidth constraints [43]. Each node \({n}_{i}\in\mathcal{N}\) maintains a state vector \({s}_{i}\left(t\right)={\left[{c}_{i}\left(t\right),{m}_{i}\left(t\right),{g}_{i}\left(t\right),{e}_{i}\left(t\right)\right]}^{T}\) encoding available CPU cores, memory capacity, GPU resources, and current energy consumption rate respectively. Table 2 summarizes the key characteristics distinguishing different resource types within our heterogeneous pool. As presented in Table 2, GPUs deliver substantially higher throughput for parallel workloads but consume considerably more power per unit time compared to CPUs, creating inherent trade-offs that allocation algorithms must navigate carefully.
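
The node state vector \(s_i(t)\) and pool topology can be sketched with a simple data structure; the field names, node names, and capacity figures below are illustrative assumptions, not values from our testbed:

```python
from dataclasses import dataclass

@dataclass
class NodeState:
    """State vector s_i(t) = [c_i, m_i, g_i, e_i]: free CPU cores,
    free memory (GB), free GPU devices, current power draw (W)."""
    cpu_cores: int
    memory_gb: float
    gpus: int
    power_w: float

    def can_host(self, cpu, mem, gpu=0):
        """Check a workload's (cpu, mem, gpu) demand against free capacity."""
        return (self.cpu_cores >= cpu and self.memory_gb >= mem
                and self.gpus >= gpu)

# Directed-graph view of the pool: node states plus edge bandwidths (Gb/s).
pool = {"gpu-node-0": NodeState(32, 256.0, 8, 1800.0),
        "cpu-node-0": NodeState(128, 512.0, 0, 400.0)}
links = {("gpu-node-0", "cpu-node-0"): 100.0}
```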

Table 2 Characteristics of heterogeneous computing resource types.

The allocation decision at each time step determines a mapping \(\phi:\mathcal{W}\to\mathcal{N}\) from workloads \(\mathcal{W}\) to computing nodes, constrained by capacity limits and workload requirements [44]. We formulate this as a multi-objective optimization problem that balances performance, energy efficiency, and resource utilization:

$$\min_{\phi} f\left(\phi\right)=\left[f_{1}\left(\phi\right),\,f_{2}\left(\phi\right),\,f_{3}\left(\phi\right)\right],\quad f_{1}\left(\phi\right)=\sum_{j\in\mathcal{W}}T_{j}\left(\phi\right),\;\; f_{2}\left(\phi\right)=\sum_{i\in\mathcal{N}}E_{i}\left(\phi\right),\;\; f_{3}\left(\phi\right)=-\frac{1}{\left|\mathcal{N}\right|}\sum_{i\in\mathcal{N}}U_{i}\left(\phi\right)$$
(11)

where \({T}_{j}\left(\phi\right)\) represents completion time for workload \(j\), \({E}_{i}\left(\phi\right)\) denotes energy consumed by node \(i\), and \({U}_{i}\left(\phi\right)\) measures utilization of node \(i\) under allocation \(\phi\) (custom multi-objective formulation). The negative sign on the utilization term transforms maximization into minimization for consistent optimization direction.

Subject to constraints that ensure feasibility, allocations must respect resource capacities at all nodes:

$$\mathop \sum \limits_{{j:\phi \left( j \right) = i}}^{{}} r_{j}^{{cpu}} \le c_{i} \left( t \right),\mathop \sum \limits_{{j:\phi \left( j \right) = i}}^{{}} r_{j}^{{mem}} \le m_{i} \left( t \right),\forall i \in {\mathcal{N}}$$
(12)

where \({r}_{j}^{cpu}\) and \({r}_{j}^{mem}\) denote CPU and memory requirements of workload \(j\) (standard capacity constraints). Additional constraints enforce workload-specific requirements, such as minimum GPU memory for training large language models or maximum network latency for real-time inference services [45].
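
The capacity constraints in Eq. (12) reduce to per-node sums that a feasibility check can verify; a minimal sketch, with dictionary-based inputs as an illustrative representation:

```python
def allocation_feasible(assignment, demands, capacity):
    """Eq. (12): for every node, summed CPU and memory demands of the
    workloads mapped to it must not exceed that node's capacity.
    assignment: workload -> node; demands: workload -> (cpu, mem);
    capacity: node -> (cpu, mem)."""
    used = {node: [0.0, 0.0] for node in capacity}
    for job, node in assignment.items():
        cpu, mem = demands[job]
        used[node][0] += cpu
        used[node][1] += mem
    return all(used[n][0] <= capacity[n][0] and used[n][1] <= capacity[n][1]
               for n in capacity)
```

An allocator can call this check on each candidate mapping \(\phi\) before scoring it against the objectives, discarding infeasible candidates outright.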

Fig. 2

Dynamic resource allocation algorithm flowchart incorporating prediction and feedback.

Figure 2 demonstrates the complete allocation algorithm flow, beginning with workload prediction from the attention-based model and proceeding through initial allocation, runtime monitoring, and adaptive adjustments. The framework continuously compares actual resource consumption against predictions, triggering reallocation when deviations exceed predefined thresholds.

We employ a weighted Tchebycheff approach to scalarize the multi-objective problem, converting it into a single-objective formulation that explores the Pareto frontier systematically:

$${\text{min}}_{\phi } {\text{max}}_{{k = 1,2,3}} \left\{ {w_{k} \cdot\frac{{f_{k} \left( \phi \right) - f_{k}^{{\text{*}}} }}{{f_{k}^{{worst}} - f_{k}^{{\text{*}}} }}} \right\}$$
(13)

where \({w}_{k}\) represents user-specified weights reflecting objective priorities, \({f}_{k}^{\text{*}}\) denotes the ideal value for objective \(k\), and \({f}_{k}^{worst}\) represents the nadir point (standard Tchebycheff formulation) [46]. This approach generates diverse Pareto-optimal solutions by varying weight vectors, allowing administrators to select allocations matching their operational priorities.
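
Equation (13) scalarizes candidate allocations as follows; the objective values, weights, and ideal/nadir points in the usage example are illustrative, with objectives ordered as \((f_1, f_2, f_3)\) = (completion time, energy, negated utilization):

```python
def tchebycheff_score(objectives, weights, ideal, nadir):
    """Eq. (13): max over k of w_k * (f_k - f_k*) / (f_k^worst - f_k*),
    i.e. the worst normalized, weighted deviation from the ideal point."""
    return max(w * (f - lo) / (hi - lo)
               for f, w, lo, hi in zip(objectives, weights, ideal, nadir))

def best_allocation(candidates, weights, ideal, nadir):
    """Pick the candidate objective vector minimizing the scalarized score."""
    return min(candidates,
               key=lambda f: tchebycheff_score(f, weights, ideal, nadir))

# Two candidate allocations: (time, energy, -utilization)
cand_a = (50.0, 20.0, -0.8)
cand_b = (30.0, 60.0, -0.6)
chosen = best_allocation([cand_a, cand_b],
                         weights=(0.5, 0.3, 0.2),
                         ideal=(0.0, 0.0, -1.0),
                         nadir=(100.0, 100.0, 0.0))
```

Sweeping the weight vector and re-solving traces out different Pareto-optimal allocations, which is how the framework exposes the trade-off frontier to administrators.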

Prediction errors inevitably occur despite sophisticated forecasting models, necessitating mechanisms that gracefully handle mismatches between anticipated and actual resource demands. Our preemption mechanism temporarily suspends lower-priority workloads when higher-priority tasks encounter resource shortages:

$$\mathrm{Preempt}(j) \Leftrightarrow U_{i}(\phi) > \theta_{high} \wedge \mathrm{Priority}(j) < \min_{k:\phi(k)=i} \mathrm{Priority}(k)$$
(14)

where \(\theta_{high}\) denotes the utilization threshold triggering preemption and \(\mathrm{Priority}(j)\) quantifies workload \(j\)'s importance (custom preemption rule). Preempted workloads migrate to alternative nodes with available capacity or queue for future execution, preserving progress through checkpointing47.

The utilization threshold \({\theta}_{high}\) that triggers preemption is determined through an adaptive calibration procedure rather than static configuration. We initialize \({\theta}_{high}=0.85\) based on common industry practice, then adjust dynamically based on observed SLA violation rates over sliding windows. Specifically, if SLA violations exceed 5% over the past hour, we decrease \({\theta}_{high}\) by 0.02 to trigger earlier preemption; conversely, if violations remain below 1%, we increase \({\theta}_{high}\) by 0.01 to allow tighter resource packing. This adaptive mechanism converges to environment-specific thresholds that balance utilization against reliability. In our experiments, the threshold stabilized between 0.82 and 0.88 across different datasets, reflecting varying workload volatility characteristics. We also maintain a lower threshold \({\theta}_{low}=0.3\) below which nodes become candidates for consolidation and potential power-down during low-demand periods.
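The calibration rule above can be written as a small update function; the violation-rate bounds and step sizes follow the text, while the clamping bounds are illustrative safeguards of our own:

```python
def update_threshold(theta, sla_violation_rate,
                     hi_rate=0.05, lo_rate=0.01,
                     step_down=0.02, step_up=0.01,
                     floor=0.5, ceil=0.95):
    """Adaptive calibration of the preemption threshold theta_high:
    lower it by 0.02 when SLA violations over the window exceed 5%,
    raise it by 0.01 when they stay below 1%, otherwise keep it.
    The floor/ceil clamps (assumed values) prevent runaway drift."""
    if sla_violation_rate > hi_rate:
        theta -= step_down
    elif sla_violation_rate < lo_rate:
        theta += step_up
    return min(max(theta, floor), ceil)
```

Applied once per sliding window, this converges toward the environment-specific 0.82–0.88 band reported in the experiments.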

Migration decisions balance the cost of transferring workload state against the benefit of improved resource matching. We quantify this trade-off through explicit cost modeling:

$$\mathrm{Gain}_{migrate}(j, i \to i') = \Delta T_{j} \cdot \mathrm{Priority}(j) - C_{transfer}(j, i \to i')$$

where \(\Delta T_{j}\) represents the expected reduction in completion time and \(C_{transfer}\) quantifies migration overhead48. The transfer cost decomposes into three measurable components:

$$C_{transfer}(j, i \to i') = t_{checkpoint} + t_{network} + t_{warmup}$$

Here, \(t_{checkpoint} = S_{j}/\mathrm{BW}_{disk}\) represents checkpoint saving time proportional to workload state size \(S_{j}\) and disk bandwidth; \(t_{network} = S_{j}/\mathrm{BW}_{net}(i, i')\) captures network transfer time dependent on inter-node bandwidth; and \(t_{warmup}\) accounts for cache warming and memory allocation delays on the destination node, empirically measured at 2–5 s for GPU workloads.

The resource matching improvement is computed as:

$$\Delta T_{j} = T_{j}^{current}(i) - T_{j}^{estimated}(i') = \frac{R_{j}}{\mathrm{Perf}(i, \mathrm{type}_{j})} - \frac{R_{j}}{\mathrm{Perf}(i', \mathrm{type}_{j})}$$

where \({R}_{j}\) denotes remaining computation and \(\text{Perf}\left(n,\text{type}\right)\) represents the performance coefficient of node \(n\) for workload type \(\text{type}\). This coefficient is pre-computed through microbenchmarking, capturing hardware-workload affinity (e.g., A100 GPUs achieve 1.7× performance coefficient for transformer training compared to V100s). Migrations proceed only when positive gain exceeds a minimum threshold of 30 s, preventing thrashing from marginal improvements.
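The gain and transfer-cost formulas combine into a single decision rule. A sketch under the assumption that all quantities share consistent units (seconds for times, matching units for state size and bandwidth; names are illustrative):

```python
def migration_gain(remaining, perf_src, perf_dst, priority,
                   state_size, bw_disk, bw_net, t_warmup):
    """Net migration gain: priority-weighted completion-time reduction
    (remaining work divided by per-node performance coefficients) minus
    checkpoint, network-transfer, and warm-up costs."""
    delta_t = remaining / perf_src - remaining / perf_dst
    c_transfer = state_size / bw_disk + state_size / bw_net + t_warmup
    return delta_t * priority - c_transfer

def should_migrate(gain, min_gain=30.0):
    """Migrate only when the gain clears the 30 s anti-thrashing threshold."""
    return gain > min_gain
```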

The allocation algorithm iterates between prediction, optimization, and adjustment phases. Each cycle begins by invoking the attention-based prediction model to forecast workload demands over the upcoming planning horizon, typically spanning 15–30 min to match common workload phase durations:

$$\hat{W}_{t+1:t+H} = \mathrm{Predictor}(W_{t-L:t}, S_{t})$$
(15)

where \(\hat{W}_{t+1:t+H}\) represents predicted workloads over horizon \(H\), \(W_{t-L:t}\) denotes historical workload observations over lookback window \(L\), and \(S_{t}\) captures current system state (prediction invocation). The optimization solver then determines initial allocations, which runtime monitors refine through preemption and migration as actual demands materialize.

Algorithm optimization and implementation

Training the spatial-temporal attention prediction model requires careful design of both optimization strategy and loss function to ensure convergence toward accurate, robust forecasts. We adopt a curriculum learning approach that progressively increases prediction horizon length during training, beginning with single-step-ahead forecasts before advancing to longer horizons49. This strategy accelerates early learning by focusing initially on simpler prediction tasks, then gradually introducing the complexity of multi-step forecasting as model parameters stabilize. The training loss combines mean squared error with a quantile regression term that captures prediction uncertainty:

$$\mathcal{L}_{train} = \frac{1}{N}\sum_{i=1}^{N}\left[(y_{i} - \hat{y}_{i})^{2} + \sum_{q \in Q} \rho_{q}(y_{i} - \hat{y}_{i}^{q})\right]$$
(16)

where \(\rho_{q}(e) = e \cdot (q - 1_{e<0})\) is the quantile loss function at quantile level \(q\), \(\hat{y}_{i}^{q}\) denotes the \(q\)-th quantile prediction, and \(Q=\{0.1, 0.5, 0.9\}\) defines the prediction intervals (quantile loss formulation). This formulation produces prediction distributions rather than point estimates, information that downstream allocation algorithms exploit when determining safety margins.
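The quantile term \(\rho_q\) follows directly from its definition. A NumPy sketch (illustrative, not the authors' training code, which would operate on autograd tensors):

```python
import numpy as np

def pinball_loss(y, y_hat_q, q):
    """Quantile (pinball) loss rho_q(e) = e * (q - 1[e < 0]), averaged
    over samples. Under-prediction is weighted by q, over-prediction
    by (1 - q), so high quantiles penalize under-estimates more."""
    e = np.asarray(y, dtype=float) - np.asarray(y_hat_q, dtype=float)
    return float(np.mean(e * (q - (e < 0).astype(float))))
```

At \(q=0.9\), missing low costs nine times more than missing high, which pushes the 0.9-quantile forecast toward an upper bound on demand.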

Solving the multi-objective resource allocation optimization problem exactly becomes computationally prohibitive as the number of workloads and nodes grows, motivating our development of an efficient heuristic that produces near-optimal solutions within tight time budgets. We decompose the allocation problem into two stages: first clustering workloads by resource profile similarity, then solving smaller subproblems for each cluster independently50. The clustering employs k-means on normalized resource requirement vectors, partitioning workloads into groups that likely benefit from similar accelerator types. Each subproblem then determines node assignments within its cluster through a greedy procedure that iteratively places workloads on nodes offering maximum utility gain:

$$n^{*} = \mathop{\mathrm{argmax}}_{n \in \mathcal{N}} \left\{ \Delta U(j,n) - \beta \cdot E_{n} \cdot \Delta t_{j}(n) \right\}$$
(17)

where \(\Delta U(j,n)\) quantifies the utilization improvement from assigning workload \(j\) to node \(n\), \(E_{n}\) represents the energy consumption rate of node \(n\), \(\Delta t_{j}(n)\) estimates execution duration, and \(\beta\) balances performance against energy concerns (greedy selection rule). This decomposition reduces complexity from examining all possible assignments to considering only relevant subsets.
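The greedy placement of Eq. (17) is an argmax over candidate nodes within a cluster. An illustrative sketch with dictionary lookups standing in for the per-node quantities:

```python
def greedy_select(nodes, delta_util, energy_rate, duration, beta=0.1):
    """Greedy node choice of Eq. (17): pick the node maximizing
    utilization gain minus the beta-weighted energy cost of running
    the workload there for its estimated duration. beta is assumed."""
    return max(
        nodes,
        key=lambda n: delta_util[n] - beta * energy_rate[n] * duration[n],
    )
```

Iterating this selection over the workloads of each k-means cluster yields the two-stage heuristic described above.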

Workload characteristics drift over time as application mixes evolve and user behavior patterns shift, requiring the prediction model to adapt continuously rather than remaining static after initial training. Our online update mechanism implements incremental learning through periodic fine-tuning on recent data windows51. Every \(\tau\) time steps, the system collects newly observed workload traces and performs gradient descent updates on model parameters:

$$\theta_{t+\tau} = \theta_{t} - \eta \nabla_{\theta} \mathcal{L}(\mathcal{D}_{recent}; \theta_{t})$$
(18)

where \({\theta}_{t}\) denotes model parameters at time \(t\), \(\eta\) represents a reduced learning rate for stability, and \({\mathcal{D}}_{recent}\) contains the most recent observations (online update rule). This approach maintains model relevance without discarding previously learned patterns, as the small learning rate prevents catastrophic forgetting of historical knowledge.

Computational complexity analysis reveals important scaling characteristics for practical deployment. The prediction model requires \(O(T^{2} \cdot D \cdot h + N^{2} \cdot D \cdot h)\) operations per forward pass, where the first term corresponds to temporal attention over sequence length \(T\) and the second term captures spatial attention across \(N\) concurrent workload types. For typical configurations (\(T=128\), \(N=50\), \(D=512\), \(h=8\)), temporal attention dominates computation. The allocation algorithm exhibits \(O(|\mathcal{W}| \cdot |\mathcal{N}| \cdot k)\) complexity after clustering, a substantial improvement over exhaustive search, which scales as \(O(|\mathcal{N}|^{|\mathcal{W}|})\).

Table 3 presents empirical runtime measurements across varying cluster scales to validate these theoretical bounds. As cluster size increases from 20 to 500 nodes, prediction latency grows sub-linearly due to efficient batching, while allocation time scales approximately linearly with node count. For clusters exceeding 200 nodes, total scheduling cycle time remains under 500 ms, well within acceptable bounds for minute-level allocation decisions.

Table 3 Scalability analysis: runtime overhead across different cluster scales.

Space complexity for the prediction model scales as \(O(L \cdot D + D^{2} \cdot h \cdot n_{layers})\) to store parameters and intermediate activations during training. The allocation algorithm maintains \(O(|\mathcal{N}| + |\mathcal{W}|)\) memory for tracking resource states and workload assignments. These memory requirements remain modest even for large-scale deployments, with our 500-node experiments consuming under 20 GB of GPU memory.

For clarity regarding the training objective, we consolidate the loss formulations from sections “Multi-head spatial-temporal attention-based workload prediction model” and “Algorithm optimization and implementation” into a unified expression that governs all key experiments reported in this paper. The complete training loss combines three components applied jointly in a single optimization phase:

$$\mathcal{L}_{total} = \underbrace{\frac{1}{N}\sum_{i=1}^{N}(\hat{w}_{i} - w_{i})^{2}}_{\text{MSE term}} + \underbrace{\lambda_{1} \cdot \frac{1}{N}\sum_{i=1}^{N}\max(0, \hat{w}_{i} - c_{i})}_{\text{Capacity penalty}} + \underbrace{\lambda_{2} \cdot \frac{1}{N}\sum_{i=1}^{N}\sum_{q \in Q}\rho_{q}(w_{i} - \hat{w}_{i}^{q})}_{\text{Quantile regression}}$$

where the MSE term ensures point prediction accuracy, the capacity penalty discourages infeasible predictions exceeding physical resource limits \(c_{i}\), and the quantile regression term produces prediction intervals at the specified confidence levels \(Q=\{0.1, 0.5, 0.9\}\). We set \(\lambda_{1}=0.5\) and \(\lambda_{2}=0.3\) based on validation set performance, with these hyperparameters remaining fixed across all datasets. All three components are computed simultaneously during each training iteration, with gradients backpropagated through the entire network end-to-end. This joint optimization ensures that the model learns to balance accuracy against feasibility and uncertainty quantification from the outset, rather than requiring separate training phases that might lead to conflicting objectives.
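The unified objective assembles its three terms exactly as written. The NumPy sketch below is illustrative (the `w_hat_q` mapping from quantile level to predictions is an assumed API); in training, the same computation would run on autograd tensors so gradients flow end-to-end:

```python
import numpy as np

def total_loss(w, w_hat, w_hat_q, cap, lam1=0.5, lam2=0.3,
               quantiles=(0.1, 0.5, 0.9)):
    """Unified training loss: MSE + lambda1 * capacity penalty
    + lambda2 * summed pinball losses over the quantile levels."""
    w = np.asarray(w, dtype=float)
    w_hat = np.asarray(w_hat, dtype=float)
    cap = np.asarray(cap, dtype=float)
    mse = np.mean((w_hat - w) ** 2)
    cap_pen = np.mean(np.maximum(0.0, w_hat - cap))  # only over-capacity counts
    quant = 0.0
    for q in quantiles:
        e = w - np.asarray(w_hat_q[q], dtype=float)
        quant += np.mean(e * (q - (e < 0).astype(float)))
    return float(mse + lam1 * cap_pen + lam2 * quant)
```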

The complete algorithm implementation follows this pseudocode structure:

Algorithm:

Attention-based Workload Prediction and Dynamic Resource Allocation.

The algorithmic structure emphasizes modularity, separating prediction, allocation, and adaptation into distinct phases that operate at different timescales—predictions update every few minutes, allocations adjust continuously, and model retraining occurs hourly or daily depending on drift rates observed in production deployments.

Experiments and results analysis

Experimental environment and datasets

We construct a heterogeneous computing testbed that mirrors production data center configurations, enabling realistic evaluation of our proposed algorithms under diverse workload conditions. The hardware platform comprises 20 computing nodes distributed across four resource types: 8 nodes equipped with dual Intel Xeon Gold 6248R processors (48 cores total per node) and 256GB DDR4 memory, 6 nodes featuring NVIDIA V100 GPUs (32GB HBM2 memory each) alongside Intel Xeon Silver CPUs, 4 nodes with NVIDIA A100 GPUs (80GB HBM2e memory) representing the latest generation accelerators, and 2 nodes containing Google TPU v3 pods for specialized tensor workloads. All nodes interconnect through a 100 Gbps Ethernet fabric with sub-microsecond latency, ensuring that network overhead does not artificially constrain scheduling decisions during experiments52.

Fig. 3

Experimental platform architecture showing heterogeneous computing resources and network topology.

Figure 3 illustrates the complete experimental platform architecture, depicting how computing nodes organize into resource pools with distinct capabilities. The topology reflects realistic data center designs where accelerator-equipped nodes concentrate in specific racks while general-purpose CPU nodes distribute more broadly.

Our software stack builds upon Ubuntu 20.04 LTS with Linux kernel 5.15 for stable device driver support across heterogeneous accelerators. We implement the prediction model using PyTorch 2.0, which provides efficient distributed training primitives and automatic differentiation for rapid prototyping. The resource allocation framework employs Python 3.10 with NumPy for numerical computations and Ray 2.5 for distributed execution coordination. We integrate Prometheus for real-time metrics collection at 1-s granularity, capturing CPU utilization, memory consumption, GPU occupancy, power draw, and network throughput across all nodes53.

Dataset selection poses challenges because few publicly available traces capture fine-grained resource consumption for diverse AI workloads across extended periods. We employ three complementary datasets to evaluate different aspects of our system. The Google Cluster Trace 2019 provides month-long execution records from production clusters running thousands of concurrent jobs, though it lacks GPU-specific metrics. We augment this with the Alibaba Cluster Trace 2020, which includes detailed GPU utilization patterns for training and inference workloads. Finally, we collect proprietary traces from an academic research cluster running language model experiments, neural architecture search, and computer vision tasks over six weeks of continuous operation54.

Table 4 summarizes the statistical characteristics of these datasets, revealing substantial diversity in workload composition and temporal dynamics. As presented in Table 4, the academic cluster exhibits much higher GPU utilization variability compared to industrial traces, reflecting the exploratory nature of research workloads where training runs frequently encounter hyperparameter configurations that either converge rapidly or fail to progress.

Table 4 Statistical characteristics of workload datasets used in experiments.

Data preprocessing applies several transformations to raw traces before feeding them into prediction models. We normalize resource utilization metrics to [0, 1] ranges using min–max scaling per resource type, handle missing values through forward-fill imputation for gaps under 5 min and linear interpolation for longer gaps, resample all traces to uniform 30-s intervals to align temporal granularity, and extract rolling window features capturing mean, standard deviation, and percentiles over 5- and 30-min lookback periods. We partition datasets chronologically into 70% training, 15% validation, and 15% test sets, ensuring that evaluation occurs on future time periods unseen during training to simulate realistic deployment scenarios.
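Two of these preprocessing steps, short-gap forward-fill imputation and per-resource min-max scaling, can be sketched as follows (resampling and rolling-window features are omitted for brevity; this is illustrative, not the authors' pipeline):

```python
import numpy as np

def forward_fill(x):
    """Forward-fill NaN gaps in a 1-D utilization trace (the short-gap
    imputation step; a leading NaN is left untouched, and linear
    interpolation for long gaps is omitted here)."""
    x = np.asarray(x, dtype=float).copy()
    for t in range(1, len(x)):
        if np.isnan(x[t]):
            x[t] = x[t - 1]
    return x

def min_max_scale(x):
    """Min-max normalize a trace to [0, 1], applied per resource type."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.nanmin(x), np.nanmax(x)
    return (x - lo) / (hi - lo)
```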

To facilitate reproducibility, Table 5 provides comprehensive dataset specifications including exact data splits and prediction task configurations used in all experiments.

Table 5 Detailed dataset specifications for experimental reproducibility.

The input window of 128 steps corresponds to approximately 10.7 h for Google traces, 2.1 h for Alibaba traces, and 21 min for academic cluster data. The prediction horizon of 32 steps translates to 2.7 h, 32 min, and 5.3 min respectively, chosen to balance lookahead utility against prediction degradation at longer horizons.

Fig. 4

Distribution of workload features across datasets showing CPU, memory, and GPU utilization patterns.

Figure 4 presents the distribution of key workload features across our datasets, revealing distinct patterns that differentiate workload categories. Training jobs demonstrate sustained high GPU utilization with periodic spikes during checkpoint saving, while inference workloads exhibit bursty patterns correlated with request arrival rates.

For comprehensive performance comparison, we implement several baseline algorithms spanning both prediction and allocation dimensions. Prediction baselines include LSTM networks with two hidden layers of 256 units each, GRU networks with equivalent capacity, vanilla Transformer models with 6 encoder layers but without our spatial-temporal decomposition, and simple exponential smoothing as a classical time series method. Resource allocation baselines encompass First-Fit and Best-Fit bin packing heuristics, Kubernetes default scheduler representing production-grade systems, and a reinforcement learning approach based on Proximal Policy Optimization that learns allocation policies through interaction.

To ensure fair comparison and address concerns about baseline rigor, we provide comprehensive details of our PPO-based reinforcement learning scheduler implementation. The RL agent follows the actor-critic architecture with separate policy and value networks sharing a common feature extractor.

State representation

The state vector \(s_{t} \in \mathbb{R}^{d_{s}}\) concatenates three categories of features: (1) cluster-level statistics including per-node CPU utilization, memory usage, GPU occupancy, and power consumption (dimensionality: \(4\times|\mathcal{N}|\)); (2) workload queue characteristics including pending job count, average resource requirements, priority distribution, and estimated completion times (dimensionality: 32); (3) temporal context including time-of-day encoding, day-of-week indicators, and recent utilization trends over 5-min windows (dimensionality: 24). The complete state dimension is \(d_{s}=4\times 20+32+24=136\) for our 20-node testbed.

Action space

We encode resource assignment decisions through a two-stage hierarchical action structure. The first stage selects a workload from the pending queue (a discrete action over queue positions), while the second stage assigns it to a computing node (a discrete action over node indices). This factorization reduces the action space from \(O(|\mathcal{W}|\times|\mathcal{N}|)\) to \(O(|\mathcal{W}|+|\mathcal{N}|)\), addressing scalability concerns inherent to combinatorial scheduling.

Reward design

The reward function balances multiple objectives through a weighted combination:

$$r_{t} = -\alpha_{1} \cdot \Delta T_{queue} - \alpha_{2} \cdot E_{t} + \alpha_{3} \cdot U_{t} - \alpha_{4} \cdot 1_{SLA\_violation}$$

where \(\Delta T_{queue}\) penalizes increased queue waiting time, \(E_{t}\) represents energy consumption during timestep \(t\), \(U_{t}\) rewards high resource utilization, and the indicator \(1_{SLA\_violation}\) heavily penalizes SLA breaches. We set \(\alpha_{1}=0.3\), \(\alpha_{2}=0.2\), \(\alpha_{3}=0.4\), \(\alpha_{4}=10.0\) based on hyperparameter search.

Training regime

The PPO agent trains for 50,000 episodes using a clipping parameter \(\epsilon=0.2\), discount factor \(\gamma=0.99\), and GAE parameter \(\lambda=0.95\). We employ separate learning rates for the actor (\(3\times 10^{-4}\)) and critic (\(1\times 10^{-3}\)) networks. Training converges after approximately 35,000 episodes, requiring 72 h on a single NVIDIA V100 GPU. We save checkpoints every 1000 episodes and select the best-performing model based on validation set cumulative reward.

Workload prediction performance evaluation

We evaluate prediction accuracy across multiple metrics that capture different aspects of forecast quality. Mean absolute error (MAE) measures average prediction deviation, root mean squared error (RMSE) penalizes large errors more heavily through squaring, while mean absolute percentage error (MAPE) provides scale-independent assessment of relative accuracy:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_{i} - \hat{y}_{i}\right|, \qquad \mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{y_{i} - \hat{y}_{i}}{y_{i}}\right|$$
(19)

where \(y_{i}\) and \(\hat{y}_{i}\) denote actual and predicted workload values, respectively (standard error metrics). We compute these metrics separately for CPU utilization, memory consumption, and GPU occupancy predictions, then report macro-averaged results across resource types.
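The error metrics of Eq. (19) translate directly into NumPy (a minimal sketch; MAPE assumes no zero actuals in the evaluation window):

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error: average magnitude of prediction deviation."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    """Mean absolute percentage error, in percent; scale-independent
    but undefined when any actual value y_i is zero."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(100.0 * np.mean(np.abs((y - y_hat) / y)))
```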

Table 6 presents comprehensive prediction performance comparisons across all baseline methods and our proposed multi-head spatial-temporal attention approach. The results in Table 6 indicate that our method achieves substantial improvements over recurrent baselines, reducing MAE by 23.7% compared to LSTM and 18.9% compared to GRU networks on the Alibaba dataset. Interestingly, the vanilla Transformer performs worse than GRU despite its theoretical advantages, suggesting that the raw attention mechanism without spatial-temporal decomposition struggles to identify relevant patterns in workload data where both dimensions carry distinct semantics55.

To position our contribution against state-of-the-art time series forecasting methods, we extend our baseline comparison to include recent Transformer variants specifically designed for long-sequence prediction. These include Informer with ProbSparse attention for computational efficiency56, Autoformer employing series decomposition with auto-correlation mechanisms57, FEDformer leveraging frequency-domain representations through Fourier transforms58, and PatchTST utilizing patch-based tokenization with channel independence59. We also compare against the VMD-SE-Transformer approach that combines variational mode decomposition with sample entropy optimization16.

Table 6 Prediction performance comparison across different methods and datasets.

The results reveal that our spatial-temporal attention architecture achieves the best performance across all metrics, though the margin over recent Transformer variants is narrower than over traditional baselines. Compared to PatchTST, our method reduces MAE by 6.8% on Google Cluster and 3.7% on Alibaba Cluster. The improvement over VMD-SE-Transformer is particularly noteworthy: while their decomposition-based approach effectively captures periodic patterns, our end-to-end learned attention mechanism better adapts to the irregular burst patterns characteristic of AI workloads without requiring explicit signal preprocessing. Furthermore, our method achieves competitive training times despite modeling both temporal and spatial dimensions, benefiting from the factorized attention design that reduces computational complexity compared to joint attention over the full space-time tensor.

Our spatial-temporal attention architecture demonstrates particular strength in handling burst events that confound recurrent models. When workload spikes occur—such as when multiple training jobs simultaneously reach data loading phases—the spatial attention mechanism detects correlations between concurrent workload types and adjusts predictions accordingly, whereas LSTMs treat each workload stream independently and miss these cross-task dependencies.

Fig. 5

Prediction accuracy comparison across different forecast horizons for all methods.

Figure 5 demonstrates how prediction accuracy degrades as forecast horizon extends from 5 min to 60 min ahead. All methods exhibit declining performance with longer horizons, yet our approach maintains substantially smaller error growth rates. At the 30-min horizon—critical for proactive resource allocation decisions—our method achieves 11.8% MAPE while LSTM reaches 18.4% and GRU reaches 16.9%, representing practical differences that directly impact allocation quality.

To establish statistical rigor, we conduct each experiment across five independent runs with different random seeds, reporting mean values with 95% confidence intervals. Table 7 presents pairwise statistical comparisons between our method and key baselines using paired t-tests with Bonferroni correction for multiple comparisons.

Table 7 Statistical significance analysis for prediction performance (p values from paired t test).

The results confirm statistically significant improvements over traditional baselines (LSTM, GRU, Vanilla Transformer) at the highest confidence level (p < 0.001). Improvements over recent Transformer variants achieve significance at the 0.05 level for most metrics, though margins narrow for PatchTST comparisons where some differences fall slightly above conventional thresholds. This pattern reflects the competitive landscape of modern time series forecasting where architectural innovations yield diminishing marginal returns, yet our spatial-temporal decomposition provides consistent, if modest, advantages specifically for workload prediction tasks with strong cross-stream dependencies.

The attention mechanism contributes to accuracy improvements through two primary pathways. First, multi-head attention captures diverse temporal patterns simultaneously: some heads specialize in detecting diurnal cycles while others focus on abrupt transitions between workload phases. Second, the spatial attention module identifies leading indicators where changes in one workload type reliably precede shifts in others, enabling the model to anticipate cascading effects that propagate through the system60.

We observe particularly pronounced benefits when predicting GPU utilization compared to CPU or memory forecasts. GPU workloads exhibit stronger temporal structure because training iterations follow relatively consistent timing patterns determined by model architecture and batch sizes. CPU utilization displays more stochastic variation driven by auxiliary processes like data preprocessing and garbage collection, which attention mechanisms capture less reliably. Memory consumption proves most challenging to predict accurately because it depends heavily on framework-specific memory management strategies that vary across jobs.

Fig. 6

Analysis of prediction error distribution and temporal characteristics.

Figure 6 depicts the distribution of prediction errors across different time-of-day periods and workload phases. The error distribution approximates a Laplace distribution rather than Gaussian, indicating that most predictions remain highly accurate with occasional large deviations during unprecedented events. Morning hours between 8 and 10 a.m. exhibit elevated error rates as users submit daily job batches, creating sudden demand spikes. Conversely, late evening periods show remarkably low errors when workload patterns stabilize around long-running training jobs.

Analyzing errors by workload phase reveals that our method performs exceptionally well during steady-state execution (5.2% MAPE), compared with 14.8% during job startup phases and 18.3% during completion phases, when resource release patterns become irregular. This phase-dependent accuracy suggests opportunities for incorporating explicit phase detection as a preprocessing step before prediction, potentially through hidden Markov models that segment workload traces into distinct operational regimes.

Prediction confidence intervals generated through the quantile regression component of our loss function prove well-calibrated, with 90% prediction intervals containing actual values 89.3% of the time across test samples. This calibration enables downstream allocation algorithms to make informed decisions about provisioning safety margins—tighter intervals justify aggressive resource packing while wider intervals trigger more conservative allocations that maintain performance buffers.
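The reported calibration can be verified by measuring empirical coverage, the fraction of actual values falling inside the interval spanned by the \(q=0.1\) and \(q=0.9\) forecasts. A minimal sketch:

```python
import numpy as np

def interval_coverage(y, lower, upper):
    """Empirical coverage of prediction intervals: fraction of actual
    values landing between the lower and upper quantile forecasts.
    A well-calibrated 90% interval should score close to 0.90."""
    y = np.asarray(y, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return float(np.mean((y >= lower) & (y <= upper)))
```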

Resource allocation performance and system analysis

Resource utilization represents a critical metric for data center efficiency, directly impacting operational costs and infrastructure capacity. Table 8 summarizes resource allocation performance across different scheduling strategies applied to our heterogeneous testbed. As presented in Table 8, our attention-based prediction and dynamic allocation framework achieves 78.4% average resource utilization compared to 62.1% for Kubernetes default scheduler and 67.3% for Best-Fit heuristics, demonstrating substantial improvements in infrastructure exploitation without compromising service quality.

To substantiate claims of competitive performance, we extend our scheduling baseline comparison to include influential systems from the deep learning cluster scheduling literature. Gandiva exploits intra-job predictability for efficient GPU time-slicing and job packing48. Optimus builds online resource-performance models to dynamically resize GPU allocations61. We implement these baselines following their published specifications, adapting them to our heterogeneous testbed while preserving core algorithmic principles.

Table 8 Comparative analysis of resource allocation effectiveness across different strategies.

We report mean and standard deviation across five independent runs to enable statistical significance assessment. Our full framework achieves statistically significant improvements over all baselines (p < 0.01 via paired t test). Compared to Gandiva, our method improves utilization by 7.7% while reducing SLA violations by 39.5%. The margin over Optimus, which also employs prediction-based allocation, is narrower but still significant: 6.7% higher utilization and 34.3% fewer violations. These improvements stem from our spatial-temporal attention mechanism’s superior prediction accuracy, which enables more confident proactive allocation decisions. Gandiva and Optimus rely on simpler performance models that cannot capture the complex cross-workload dependencies our attention mechanism identifies.

The improvements stem from our framework’s ability to proactively position workloads on hardware best suited to their computational patterns before demand materializes. Traditional reactive schedulers place jobs after they arrive, frequently encountering situations where optimal resources remain occupied by less-suitable workloads. Our prediction-driven approach reserves capacity in anticipation of incoming demands, reducing wait times and improving matches between workload characteristics and accelerator capabilities61.

Fig. 7

Resource utilization comparison across different allocation strategies over 24-h period.

Figure 7 illustrates resource utilization patterns throughout a typical operational day. Our method maintains consistently high utilization during both peak and off-peak hours by dynamically adjusting allocations as workload composition shifts. The diurnal variation shows that morning job submission spikes create temporary utilization dips in baseline methods as schedulers struggle to rapidly redistribute existing allocations, while our framework preemptively relocates lower-priority workloads based on predicted demand surges.

Average task completion time decreases by 25.8% compared to Kubernetes and 18.4% compared to reinforcement learning approaches. This acceleration results from both improved resource matching—placing CNN training on Tensor Core-equipped A100 GPUs rather than older V100s yields 1.7× speedup—and reduced queueing delays through predictive capacity reservation. System throughput consequently increases to 93.7 jobs per hour, approaching the theoretical maximum of 98.4 jobs per hour achieved by an oracle scheduler with perfect future knowledge.

Energy consumption represents an increasingly important consideration as data centers account for approximately 2% of global electricity usage. We measure total energy draw across all computing nodes during week-long experiment runs. Our framework reduces energy consumption by 15.1% compared to Kubernetes through two mechanisms: first, better resource packing allows powering down idle nodes during low-demand periods; second, workload-aware placement on appropriate accelerators avoids energy waste from running tasks on overpowered hardware62.

The cost-effectiveness metric combines resource utilization, energy consumption, and SLA violation penalties into a unified score:

$$\text{Cost-Effectiveness} = \frac{\text{Throughput} \times \left(1 - \text{SLA\_Violation\_Rate}\right)}{\text{Energy\_Cost} + \alpha \cdot \left(1 - \text{Utilization}\right)}$$
(20)

where \(\alpha\) weights the opportunity cost of unused capacity (a custom cost metric). Our method achieves 2.34× better cost-effectiveness than the default Kubernetes scheduler and a 1.58× improvement over PPO-based reinforcement learning, demonstrating practical economic benefits beyond raw performance gains.
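As an illustration, the score in Eq. (20) can be computed directly. The helper below is a hypothetical sketch, not part of the framework's released code; `alpha` is the opportunity-cost weight from the equation:

```python
def cost_effectiveness(throughput, sla_violation_rate, energy_cost, utilization, alpha=1.0):
    """Eq. (20): reward effective throughput, penalize energy spend and idle capacity.

    throughput          -- completed jobs per hour
    sla_violation_rate  -- fraction of jobs violating their SLA, in [0, 1]
    energy_cost         -- total energy cost over the measurement window
    utilization         -- mean resource utilization, in [0, 1]
    alpha               -- weight on the opportunity cost of unused capacity
    """
    effective_throughput = throughput * (1.0 - sla_violation_rate)
    return effective_throughput / (energy_cost + alpha * (1.0 - utilization))
```

A fully utilized cluster (`utilization=1.0`) with no violations reduces the score to throughput divided by energy cost, which matches the equation term by term.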

Load balancing effects manifest through reduced variance in per-node utilization. We compute the coefficient of variation across node utilization percentages every minute, finding our method maintains CV of 0.18 compared to 0.42 for Best-Fit and 0.35 for Kubernetes. This uniformity prevents hotspots where some nodes saturate while others idle, improving overall system reliability and predictability.
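The load-balance metric above is the coefficient of variation of per-node utilization, computed each minute. A minimal stdlib-only sketch (the function name is ours, and we assume the population standard deviation, as the samples cover every node):

```python
import statistics

def utilization_cv(node_utils):
    """Coefficient of variation: population std-dev divided by mean.

    Lower CV means more uniform per-node utilization (fewer hotspots).
    """
    mean = statistics.mean(node_utils)
    return statistics.pstdev(node_utils) / mean
```

Perfectly uniform utilization yields CV = 0; the reported values of 0.18 vs 0.42 quantify how much tighter the distribution is under the proposed allocator.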

Resource fragmentation occurs when available capacity scatters across nodes in quantities too small to accommodate new workloads, wasting resources despite aggregate availability. Our clustering-based allocation reduces fragmentation rate from 23.7% in Kubernetes to 9.4%, achieved by grouping similar workloads and consolidating allocations on fewer nodes while leaving others empty for large incoming jobs.
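One plausible way to operationalize the fragmentation rate is the fraction of total free capacity stranded in per-node slots too small for the smallest pending job. This is a hypothetical sketch under that simplifying assumption, not the paper's exact measurement procedure:

```python
def fragmentation_rate(free_per_node, min_job_size):
    """Fraction of aggregate free capacity unusable by any pending job.

    free_per_node -- free capacity (e.g. GPU count) on each node
    min_job_size  -- resource demand of the smallest pending job
    """
    total_free = sum(free_per_node)
    if total_free == 0:
        return 0.0
    stranded = sum(f for f in free_per_node if f < min_job_size)
    return stranded / total_free
```

Consolidating allocations onto fewer nodes shrinks the stranded term while keeping whole nodes free, which is exactly the mechanism the clustering-based allocator exploits.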

Fig. 8 Comprehensive system performance analysis showing multiple metrics across different allocation strategies.

Figure 8 presents a radar chart comparing our framework against baseline methods across six dimensions: resource utilization, throughput, energy efficiency, load balance, low latency, and cost-effectiveness. The visualization reveals that our approach achieves balanced improvements across all objectives rather than optimizing one dimension at others’ expense.

Burst workload adaptation tests our framework’s robustness when actual demands diverge from predictions. We inject synthetic burst events where workload intensity doubles within 5-min windows, mimicking scenarios like sudden inference traffic spikes when news breaks or training job storms after conference deadlines. Our preemption and migration mechanisms successfully maintain 97.3% SLA compliance during bursts compared to 84.6% for Kubernetes, with average recovery time of 2.8 min to restore steady-state utilization levels.

Ablation experiments isolate individual component contributions by systematically removing modules from the complete framework. Without spatial attention, prediction MAPE increases from 12.15% to 16.82%, demonstrating the value of modeling cross-workload correlations. Disabling the preemption mechanism raises SLA violations from 2.3% to 6.9% during burst periods, confirming its importance for handling prediction errors. Removing online model updates degrades prediction accuracy by 11.7% after two weeks as workload patterns drift, validating the necessity of continuous adaptation. The complete framework significantly outperforms every single-component ablation, indicating that prediction accuracy, allocation optimization, and adaptive mechanisms contribute synergistically to overall performance.

Discussion

The experimental results reveal several crucial insights about workload prediction and resource allocation in heterogeneous computing environments. Most strikingly, our attention-based framework achieves 78.4% resource utilization while maintaining only 2.3% SLA violations—a combination that baseline methods struggle to reach simultaneously. This performance gap suggests that the fundamental challenge lies not merely in prediction accuracy or allocation optimization individually, but rather in their tight integration where prediction uncertainty informs allocation conservativeness.

The attention mechanism’s effectiveness stems from its ability to selectively focus on relevant historical patterns rather than treating all past observations equally. When predicting future GPU demands, certain historical windows prove far more informative than others—specifically, the model learns to attend strongly to periods exhibiting similar workload composition and time-of-day characteristics. This selective attention becomes particularly valuable during phase transitions, such as when training jobs complete and inference workloads begin deploying newly trained models. Traditional recurrent architectures compress all history into fixed-size hidden states, losing the nuanced contextual information that attention mechanisms preserve through direct connections across arbitrary time spans.
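The selective weighting described here is the standard scaled dot-product attention applied over historical windows. The dependency-free sketch below illustrates the idea only; the actual model uses learned multi-head projections, and all names here are hypothetical:

```python
import math

def temporal_attention(query, keys, values):
    """Scaled dot-product attention: weight each historical window's value
    vector by its similarity to the current query state.

    query  -- feature vector summarizing the present system state
    keys   -- one feature vector per historical window
    values -- one value vector per historical window (e.g. observed demand)
    """
    d = len(query)
    # Similarity of each past window to the current state, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Numerically stable softmax turns scores into attention weights.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of values: informative windows dominate the forecast context.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
```

Unlike a recurrent hidden state, the weighted sum draws directly on any past window, so a morning spike three weeks ago can inform today's forecast without being squeezed through intermediate steps.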

Spatial attention contributes differently but equally importantly. In production environments, workloads rarely execute in isolation—resource contention, shared infrastructure bottlenecks, and operational dependencies create intricate correlations between concurrent tasks. By modeling these cross-workload relationships explicitly, spatial attention enables the predictor to anticipate cascading effects. For instance, when batch processing jobs consume excessive network bandwidth, interactive inference services often experience delayed data loading, subsequently reducing their GPU utilization. Capturing such indirect dependencies requires reasoning beyond individual workload streams.

Our approach demonstrates particular strength in scenarios where workload diversity remains high and resource heterogeneity offers meaningful optimization opportunities. Cloud providers serving diverse customer applications represent ideal deployment contexts. Conversely, the framework provides diminishing returns in homogeneous environments where all workloads exhibit similar resource profiles or when infrastructure consists primarily of identical machines. The prediction overhead—approximately 100 ms per forecast on our testbed—becomes negligible amortized across hundreds of concurrent workloads but might prove excessive for edge deployments managing only a handful of tasks.

Prediction errors inevitably propagate into allocation decisions, yet our framework mitigates their impact through several design choices. The quantile regression loss generates prediction intervals rather than point estimates, allowing allocation algorithms to provision safety margins proportional to uncertainty. When predictions indicate high confidence, aggressive resource packing proceeds safely; uncertain forecasts trigger more conservative allocations maintaining performance buffers. Furthermore, the preemption and migration mechanisms provide reactive fallbacks when predictions prove insufficient, preventing cascading failures from isolated forecast errors.
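The quantile regression loss mentioned above is commonly implemented as the pinball loss: an asymmetric penalty that trains the model to output a chosen quantile rather than the mean. A minimal sketch (the specific quantile levels are illustrative, not the paper's configuration):

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss for quantile level q in (0, 1).

    Under-prediction is penalized with weight q, over-prediction with
    weight (1 - q); minimizing this drives y_pred toward the q-th quantile.
    """
    diff = y_true - y_pred
    return max(q * diff, (q - 1.0) * diff)
```

Training one head at q = 0.5 and another at, say, q = 0.9 yields a point forecast plus an upper bound; the gap between them is the uncertainty signal that scales the allocator's safety margins.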

Practical deployment in production environments raises considerations that extend beyond algorithmic performance metrics. We address three critical aspects that practitioners must evaluate before adopting our framework.

Regarding runtime overhead, Table 6 demonstrates that prediction and allocation cycle times scale acceptably with cluster size, remaining under 500 ms for clusters up to 200 nodes. The prediction component, dominated by attention computation, exhibits sub-linear scaling due to efficient batching across workloads. The allocation optimization, while theoretically linear in node count, benefits from our clustering-based decomposition that reduces the effective problem size. For clusters exceeding 500 nodes, we recommend hierarchical deployment where multiple framework instances manage sub-clusters independently, with a lightweight coordination layer handling cross-cluster load balancing. This architectural pattern mirrors production systems at major cloud providers.

Integration with existing production systems such as Kubernetes requires careful consideration of scheduling interfaces and failure handling. Our framework can operate as an external scheduler accessed through Kubernetes’ scheduler extender mechanism, where the default kube-scheduler delegates placement decisions to our optimization engine for GPU-intensive workloads while retaining native scheduling for standard containers. Alternatively, for organizations willing to invest in deeper integration, our allocation algorithm can be implemented as a custom scheduler plugin that replaces default scoring functions with prediction-aware placement logic. We provide reference implementations for both integration patterns in our supplementary materials. The extender approach requires minimal cluster modification and enables gradual adoption, while the plugin approach offers lower latency at the cost of increased operational complexity.
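As a schematic sketch of the extender pattern: the real Kubernetes scheduler extender exchanges JSON filter/prioritize requests over HTTP, but the core placement decision delegated to the optimization engine reduces to a filter over candidate nodes. Everything below is hypothetical illustration, not the reference implementation:

```python
def filter_nodes(pod_gpu_demand, node_free_gpus):
    """Extender-style filter step: keep only nodes whose free GPU capacity
    can satisfy the pod's demand; the prioritize step would then score the
    survivors using prediction-aware placement logic.

    pod_gpu_demand -- GPUs requested by the incoming pod
    node_free_gpus -- mapping of node name to currently free GPU count
    """
    return [node for node, free in node_free_gpus.items() if free >= pod_gpu_demand]
```

In the extender deployment mode, kube-scheduler calls out with the candidate node list and the engine returns this filtered subset, so standard containers never touch the external scheduler at all.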

Known limitations and failure cases warrant explicit acknowledgment. Our framework struggles with extremely abrupt workload pattern shifts that lack historical precedent—for instance, when organizations deploy entirely new model architectures whose resource consumption profiles differ fundamentally from training data. In such scenarios, prediction accuracy degrades substantially during the initial adaptation period (typically 2–4 h based on our online learning update frequency). Severe data quality issues, including missing metrics from node failures or network partitions, can corrupt predictions if not properly handled; we implement gap detection and fallback to conservative heuristic allocation when metric coverage drops below 80%. Additionally, our current implementation assumes relatively stable cluster topology; frequent node additions or removals require model retraining to incorporate new hardware characteristics.
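The coverage-based fallback described above can be sketched as follows. The 80% threshold mirrors the text; the function name, mode labels, and inputs are hypothetical:

```python
def select_allocation_mode(reporting_nodes, expected_nodes, threshold=0.8):
    """Gap detection: if fewer than `threshold` of the expected nodes reported
    metrics this cycle (e.g. due to node failures or network partitions),
    distrust the predictor and fall back to conservative heuristic allocation.

    reporting_nodes -- collection of node IDs that delivered metrics
    expected_nodes  -- total number of nodes in the cluster
    """
    coverage = len(reporting_nodes) / expected_nodes
    return "predictive" if coverage >= threshold else "heuristic_fallback"
```

Treating degraded observability as a first-class mode switch keeps a partial metrics outage from silently corrupting every downstream placement decision.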

We acknowledge that our evaluation relies primarily on simulation using publicly available cluster traces (Google and Alibaba) and a controlled academic testbed rather than deployment in production data centers serving real customers. This limitation reflects the practical challenges of conducting disruptive scheduling experiments in live environments where performance degradation directly impacts revenue and user experience. However, three factors support the validity of our findings. First, the Google and Alibaba traces represent genuine production workloads collected from large-scale operational clusters, preserving authentic workload characteristics including burstiness, diurnal patterns, and job heterogeneity. Second, our 20-node heterogeneous testbed, while smaller than production clusters, incorporates real hardware diversity (CPUs, V100s, A100s, TPUs) with actual interconnection topologies rather than simulated abstract resources. Third, the baseline systems we compare against (Kubernetes, Gandiva, Optimus) use identical trace-driven simulation methodology in their published evaluations, enabling fair comparison under consistent conditions.

Future validation should pursue staged production deployment, beginning with shadow-mode operation where our framework generates allocation recommendations alongside production schedulers without affecting actual placement. This approach enables real-world accuracy assessment without risking service disruption. Subsequently, A/B testing on isolated cluster partitions can validate performance improvements under genuine operational conditions. We are actively pursuing partnerships with cloud infrastructure providers to enable such validation, though the lengthy approval and instrumentation processes extend beyond this paper’s timeline.

Future research directions appear promising across multiple fronts, with potential for broader impact on intelligent infrastructure management. Incorporating explicit workload phase detection before prediction could improve accuracy during startup and completion periods where current performance lags. Extending the framework to distributed multi-cluster scenarios introduces challenges around global coordination and network-aware placement that our single-cluster design does not address.

The integration of self-supervised learning approaches holds particular promise for reducing dependence on labeled historical data. Pre-training attention models on unlabeled workload traces through contrastive learning or masked prediction tasks could enable rapid adaptation to new deployment environments with minimal labeled examples. This direction aligns with broader trends in foundation models, suggesting the possibility of general-purpose workload understanding models that transfer across heterogeneous infrastructure configurations.

Federated learning techniques offer another compelling avenue, enabling collaborative model training across organizations while preserving proprietary workload information. Multiple cloud providers or enterprise data centers could jointly train prediction models that benefit from diverse workload patterns without exposing sensitive operational data. This approach accelerates adoption through shared knowledge while respecting competitive boundaries, addressing a key barrier to widespread deployment of learning-based resource management.

Beyond immediate technical extensions, our work contributes to the broader vision of autonomous data center operations where intelligent systems continuously optimize resource allocation without human intervention. As AI workloads continue proliferating and computing infrastructures grow increasingly heterogeneous, prediction-driven resource management systems like ours represent essential building blocks for sustainable, efficient datacenter operations that minimize environmental impact while maximizing service quality. The convergence of improved prediction accuracy, adaptive optimization algorithms, and closed-loop learning mechanisms points toward a future where infrastructure intelligence becomes as fundamental as the computing resources themselves.

Conclusion

This research addresses the critical challenge of workload prediction and resource allocation in heterogeneous computing environments supporting diverse AI applications. We have developed an integrated framework that combines multi-head spatial-temporal attention mechanisms for workload forecasting with dynamic resource allocation algorithms that jointly optimize performance, energy efficiency, and resource utilization. Through systematic experimentation on real-world cluster traces and a heterogeneous testbed comprising CPUs, GPUs, and TPUs, we demonstrate substantial improvements over existing approaches across multiple operational metrics.

Our primary contributions encompass three interconnected dimensions. First, we designed a novel spatial-temporal attention architecture that decomposes workload prediction into complementary perspectives—temporal attention captures evolution patterns within individual workload streams while spatial attention models correlations across concurrent task types. This factorization enables the model to identify both intra-workload dependencies and cross-workload relationships that simpler architectures overlook. Second, we formulated resource allocation as a multi-objective optimization problem balancing completion time, energy consumption, and utilization, then developed efficient solution methods incorporating workload-specific constraints and hardware heterogeneity awareness. The allocation framework explicitly accounts for prediction uncertainty through quantile-based safety margins, reducing the impact of forecast errors on system performance. Third, we established adaptive mechanisms including preemption and migration policies that respond gracefully when actual demands deviate from predictions, alongside online learning procedures that maintain model accuracy as workload patterns drift over time.

The innovations manifest both methodologically and architecturally. Methodologically, we pioneer the tight integration of prediction and allocation as a unified end-to-end system rather than treating them as independent sequential stages. This integration allows allocation decisions to inform prediction refinement through feedback loops, creating continuous improvement cycles. Architecturally, the multi-head attention design with separate temporal and spatial modules represents a novel application of attention mechanisms to datacenter resource management, exploiting structural properties of workload data that general-purpose sequence models cannot capture effectively.

From a theoretical perspective, our work advances understanding of how attention mechanisms can model complex temporal-spatial dependencies in resource consumption patterns. The mathematical framework for joint prediction-allocation optimization provides a foundation for future research exploring trade-offs between forecast accuracy and allocation robustness. Practically, our system delivers immediate operational benefits for cloud providers and enterprise data centers. Experimental results demonstrate 78.4% resource utilization with only 2.3% SLA violations, 25.8% reduction in average task completion time, and 15.1% decrease in energy consumption compared to production-grade baselines. These improvements translate directly to reduced operational costs, enhanced service quality, and diminished environmental impact—concerns that increasingly dominate data center management priorities.

Several limitations warrant acknowledgment. Our framework assumes availability of substantial historical training data, potentially limiting applicability in newly deployed systems lacking adequate trace collections. The attention mechanism’s quadratic complexity with respect to sequence length constrains the temporal window we can process efficiently, though recent advances in linear attention variants may alleviate this bottleneck. Our evaluation focuses primarily on AI workloads; extending the approach to heterogeneous workload mixes including databases, web services, and analytics requires further investigation. The current system handles resource allocation at job granularity rather than finer-grained container or function levels, leaving opportunities for more precise resource provisioning.

Future research directions appear promising across multiple fronts. Incorporating explicit workload phase detection before prediction could improve accuracy during startup and completion periods where current performance lags. Extending the framework to distributed multi-cluster scenarios introduces challenges around global coordination and network-aware placement that our single-cluster design does not address. Investigating self-supervised learning approaches might reduce dependence on labeled historical data, enabling deployment in environments with limited observability. The integration of cost models considering cloud pricing dynamics, particularly spot instance opportunities and reserved capacity constraints, would enhance practical applicability. Finally, exploring federated learning techniques could enable collaborative model training across organizations while preserving proprietary workload information, accelerating adoption through shared knowledge without compromising competitive advantages. As AI workloads continue proliferating and computing infrastructures grow increasingly heterogeneous, intelligent prediction-driven resource management systems will become indispensable for sustainable, efficient datacenter operations.