Abstract
The rapid proliferation of artificial intelligence applications in modern data centers demands intelligent resource management strategies that can effectively handle diverse workloads across heterogeneous computing infrastructures. This paper proposes an integrated framework that combines multi-head spatial-temporal attention mechanisms for workload prediction with dynamic resource allocation algorithms optimized for heterogeneous environments. The spatial-temporal attention architecture separately models temporal evolution patterns within individual workload streams and spatial correlations across concurrent task types, enabling accurate forecasting of resource demands. The allocation framework formulates resource assignment as a multi-objective optimization problem that jointly considers performance, energy efficiency, and utilization while explicitly accounting for prediction uncertainty. Experimental evaluation on real-world cluster traces demonstrates that our approach achieves 78.4% resource utilization with only 2.3% SLA violations, reduces average task completion time by 25.8%, and decreases energy consumption by 15.1% compared to production-grade baseline methods. The framework provides practical benefits for cloud service providers and enterprise data centers seeking to maximize infrastructure efficiency while maintaining service quality guarantees.
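The factorized attention the abstract describes — one head attending over time within each workload stream, another attending across streams at each timestep — can be illustrated with a minimal sketch. This is not the authors' implementation; it uses single-head scaled dot-product attention with identity projections and simple concatenation as the fusion step, purely to make the temporal/spatial split concrete.

```python
# Illustrative sketch (not the paper's model): factorized spatial-temporal
# attention over a workload tensor of shape (streams, timesteps, features).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Scaled dot-product self-attention along the first axis of x.
    x: (seq_len, d). Identity Q/K/V projections, for brevity."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)          # (seq_len, seq_len)
    return softmax(scores, axis=-1) @ x    # (seq_len, d)

def spatial_temporal_attention(workloads):
    """workloads: (S streams, T timesteps, d features).
    Temporal head: attends over time within each stream.
    Spatial head: attends across streams at each timestep."""
    S, T, d = workloads.shape
    temporal = np.stack([self_attention(workloads[s]) for s in range(S)])
    spatial = np.stack([self_attention(workloads[:, t]) for t in range(T)],
                       axis=1)
    # Concatenate the two heads along the feature axis as a simple fusion.
    return np.concatenate([temporal, spatial], axis=-1)   # (S, T, 2d)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8, 4))   # 3 workload streams, 8 timesteps, 4 features
y = spatial_temporal_attention(x)
print(y.shape)   # (3, 8, 8)
```

In the paper's full model the two heads would use learned projections and feed a forecasting head; here the point is only the separation of temporal and spatial correlation structure.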
Data availability
The experimental results and analytical data supporting the findings of this study are provided in the Supplementary Materials accompanying this manuscript. The public datasets (Google Cluster Trace 2019 and Alibaba Cluster Trace 2020) are available from their respective repositories. Implementation code, trained model checkpoints, and detailed experimental configurations are available from the corresponding author, Peiqing Ye (15061073566@163.com), upon reasonable request. The Supplementary Materials include: (1) complete hyperparameter configurations for all baseline methods, (2) additional ablation study results, (3) per-dataset performance breakdowns, and (4) runtime profiling data across different cluster scales.
Abbreviations
- AI: Artificial intelligence
- ARIMA: Autoregressive integrated moving average
- CPU: Central processing unit
- CV: Computer vision; coefficient of variation
- FPGA: Field-programmable gate array
- GPU: Graphics processing unit
- GRU: Gated recurrent unit
- LLM: Large language model
- LSTM: Long short-term memory
- MAE: Mean absolute error
- MAPE: Mean absolute percentage error
- NAS: Neural architecture search
- PPO: Proximal policy optimization
- RNN: Recurrent neural network
- RMSE: Root mean squared error
- SLA: Service level agreement
- TPU: Tensor processing unit
Acknowledgements
The authors acknowledge the computational resources provided by Northwestern Polytechnical University and Tsinghua University for conducting the experiments reported in this study.
Funding
Not applicable.
Author information
Contributions
S.S. contributed to the conceptualization of the research, designed and implemented the multi-head spatial-temporal attention prediction model, conducted the experimental validation, and drafted the original manuscript. X.D. participated in the development of the resource allocation algorithms, performed data preprocessing and experimental setup, and contributed to the methodology section. B.Z. provided the heterogeneous computing infrastructure resources, assisted in system deployment and performance evaluation, and contributed to the experimental analysis. P.Y. conceived the overall research framework, supervised the project, provided critical revisions to the manuscript, and served as the corresponding author coordinating all aspects of the research. All authors read, revised, and approved the final manuscript.
Corresponding author
Correspondence to Peiqing Ye.
Ethics declarations
Ethics approval and consent to participate
Not applicable. This study involves computational experiments on workload prediction and resource allocation systems and does not involve human subjects, animal subjects, or clinical data.
Consent for publication
All authors have reviewed the manuscript and consent to its publication.
Competing interests
The authors declare no competing interests.
Clinical trial number
Not applicable.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Shao, S., Ding, X., Zhao, B. et al. Attention-based workload prediction and dynamic resource allocation for heterogeneous computing environments. Sci Rep (2026). https://doi.org/10.1038/s41598-026-38622-4
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-026-38622-4