Table 4 Statistical characteristics of workload datasets used in experiments.

From: Attention-based workload prediction and dynamic resource allocation for heterogeneous computing environments

| Dataset | Duration (days) | Number of jobs | Avg. GPU utilization (%) | Workload types | Sampling interval |
|---|---|---|---|---|---|
| Google cluster 2019 | 30 | 147,523 | N/A (CPU only) | Batch, service, ML training | 5 min |
| Alibaba cluster 2020 | 28 | 68,941 | 62.3 ± 28.7 | GPU training, inference | 1 min |
| Academic research cluster | 42 | 3,847 | 48.5 ± 35.2 | LLM, NAS, CV research | 10 s |
| Synthetic workload | 14 | 25,000 | 70.1 ± 22.4 | Mixed AI workloads | 30 s |
| Combined dataset | 114 | 245,311 | 58.7 ± 30.1 | All categories | Variable |
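
As a quick consistency check, the duration and job count reported for the combined dataset can be recomputed from the four individual datasets. The sketch below hard-codes the values from Table 4; the variable names are illustrative and do not come from the paper. The combined GPU utilization is not recomputed, because the table does not state how it was aggregated.

```python
# Values copied from Table 4: (duration in days, number of jobs).
# The dictionary layout and names are illustrative assumptions.
datasets = {
    "Google cluster 2019": (30, 147_523),
    "Alibaba cluster 2020": (28, 68_941),
    "Academic research cluster": (42, 3_847),
    "Synthetic workload": (14, 25_000),
}

# Sum each column across the four individual datasets.
total_days = sum(days for days, _ in datasets.values())
total_jobs = sum(jobs for _, jobs in datasets.values())

print(total_days)  # 114
print(total_jobs)  # 245311
```

Both totals match the combined dataset's reported duration (114 days) and job count (245,311).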