Table 8 Scalability and Resource Metrics Under Load Replay.

Metric	Result
Load replay volume	5 million labeled URLs
Ingestion rate	\(\approx\) 2,300 requests per second
p99 inference latency	41.6 milliseconds
Horizontal Pod Autoscaler (HPA) scaling behavior	Scaled from 3 to 12 pods (CPU target = 70%)
GPU utilization (peak during retraining)	74%
Storage overhead after 60 days	\(\le\) 1.4 TB (MinIO lifecycle-managed)

Quick links

Search