Introduction

Intelligent transportation systems (ITS) have become a cornerstone of modern urban planning, integrating advanced sensing, computation, and communication to alleviate congestion, reduce emissions, and enhance safety1. Among the key capabilities enabling ITS is accurate, real-time traffic forecasting, which drives route guidance, dynamic tolling, incident detection, and coordinated traffic signal control2. As metropolitan areas swell and vehicular flows intensify, the demand for forecasting models that can handle large-scale sensor networks and deliver sub-second inference has grown dramatically.

Traditional time-series methods, such as ARIMA and Kalman filters, were once state of the art for single-station prediction3. However, these approaches fail to capture the inherently spatial nature of traffic flow: congestion propagates along road segments and through intersections, exhibiting nonlinear interactions across multiple scales4. To address this, graph-based deep learning models have surged in popularity. In particular, diffusion convolutional recurrent neural networks (DCRNN)5 exploit a diffusion process on a predefined sensor graph to propagate information spatially, while recurrent units capture temporal dependencies. Similarly, spatio-temporal graph convolutional networks (ST-GCN)6 and their variants7,8,9 integrate graph convolutions with temporal modules (e.g. gated recurrent units or temporal convolutions) to jointly model space–time correlations.

Despite their success, most existing works have focused solely on traffic flow data, neglecting an emerging dimension: the underlying communication network characteristics. With the advent of 5G, and soon 6G, cellular networks, future ITS will operate over networks offering ultra-low latency, massive connectivity, and programmable slices tailored to vehicular applications10,11. Network slicing will enable ITS to reserve dedicated bandwidth and quality-of-service (QoS) guarantees for critical applications, while channel-quality metrics will reflect real-time link conditions. These 6G features present a tantalizing opportunity: by incorporating slice-level bandwidth and channel-quality as node-level attributes, forecasting models might better anticipate patterns induced by communication variability, e.g. data dropouts, reporting delays, or prioritized telemetry.

However, to date there is no systematic study of how 6G network slicing metrics can be fused with spatio-temporal graph models for traffic forecasting. Key open questions include:

  • Feature utility: Do slice-bandwidth and channel-quality measurably improve forecasting accuracy when used as additional node features?

  • Architectural integration: Is it sufficient to append these features to the input, or must models adaptively weight neighbors or condition diffusion on network metrics?

  • Latency trade-offs: Can we incorporate 6G metrics without sacrificing the real-time inference constraints (sub-25 ms) required for vehicular control loops?

In this work, we conduct the first end-to-end investigation of 6G-aware traffic forecasting on the METR-LA benchmark5. We simulate per-sensor slice-bandwidth and channel-quality time series alongside traffic speeds, then integrate these features into multiple model families:

  1. Spatio-temporal GCN (ST-GCN)6: GCNN layers over a Gaussian-kernel sensor graph, followed by GRU temporal modeling.

  2. Graph attention (ST-GAT)12: Replacing GCN with multi-head attention to learn adaptive neighbor weighting.

  3. Diffusion convolutional RNN (DCRNN)5: Bidirectional random-walk diffusion embedded in a GRU, capturing topology-aware spatial propagation.

  4. 6G-conditional DCRNN (DCRNN6G): An enhanced DCGRU cell that scales and diffuses the hidden state according to simulated slice-bandwidth, thereby conditioning the spatial propagation on network metrics.

We systematically evaluate each model under four feature-ablation regimes:

  • No 6G features (speeds only),

  • Channel-quality only,

  • Slice-bandwidth only,

  • Both features.

For ST-GCN and ST-GAT, we additionally perform hyperparameter searches over hidden dimensions, dropout rates, and learning rates to identify strong 6G-aware baselines.

Our extensive experiments yield three key findings:

  1. DCRNN superiority: The standard DCRNN, using only traffic speeds, achieves the best trade-off of accuracy (Test RMSE \(\approx 0.036\)) and latency (< 25 ms), outperforming all 6G-aware variants.

  2. Marginal 6G gains: Naïve inclusion of slice-bandwidth and/or channel-quality in ST-GCN/ST-GAT yields at best \(\sim\)1–2% RMSE reductions at the cost of 2–5\(\times\) inference time, and conditioning diffusion on bandwidth (DCRNN6G) fails to improve accuracy while incurring a 3\(\times\) latency penalty.

  3. Error diagnostics: Detailed error and topology analyses (error histograms, sensor-wise RMSE distributions, MAE heatmaps) reveal a small subset of sensors and time periods where models struggle, suggesting avenues for anomaly-aware extensions.

Contributions

  • We present the first comprehensive 6G-aware benchmarking of spatio-temporal graph models on METR-LA, introducing simulated slice-bandwidth and channel-quality features.

  • We propose a bandwidth-conditional diffusion cell (DCRNN6G) and evaluate its benefits and costs.

  • We deliver an in-depth ablation study and hyperparameter sweep for ST-GCN and ST-GAT under 6G feature regimes.

  • We demonstrate that the vanilla DCRNN remains the best model for real-time 6G ITS forecasting, achieving RMSE 0.036 and latency 23 ms.

  • We provide a rich suite of diagnostic plots characterizing error distributions, graph topology, and temporal dependencies, guiding future research.

The remainder of this paper is organized as follows. Section "Related work" reviews related work in graph-based traffic forecasting and 6G ITS. Section "Dataset description and research framework" details the METR-LA dataset, data cleaning, and feature simulation. Section "Methodology" describes the model architectures and methodology. Section "Experimental results and analysis" presents experimental settings, ablation studies, and results. Section "Discussion and future directions" discusses implications and limitations, and Section "Conclusion" concludes the paper and summarizes its limitations.

Related work

Traffic forecasting has witnessed dramatic evolution over the past two decades, driven by increasing data availability, advances in statistical learning, and the rise of graph-based deep neural networks. In this section, we provide a comprehensive survey of related work spanning:

  1. Classical time-series and statistical models

  2. Shallow machine learning approaches

  3. Deep learning on gridded traffic data

  4. Spatio-temporal graph neural networks

  5. Attention- and diffusion-based extensions

  6. Adaptive and dynamic graph models

  7. 6G-aware ITS and network-traffic fusion

Across these categories, we highlight key mathematical formulations, summarize empirical performance, and identify open challenges. Table 2 at the end of this section provides a consolidated overview of 40 representative works, categorized by modeling paradigm, features used, and core contributions.

Classical time-series and statistical models

Early traffic forecasting relied on univariate time-series models treating each sensor independently. The canonical ARIMA family3,13 models speed \(s_t\) via:

$$\begin{aligned} s_t = \sum _{i=1}^p \phi _i s_{t-i} + \epsilon _t - \sum _{j=1}^q \theta _j \epsilon _{t-j},\quad \epsilon _t \sim {\mathcal {N}}(0,\sigma ^2). \end{aligned}$$
(1)

While ARIMA captures linear temporal correlations, it fails to account for spatial interdependencies. Extensions include SARIMA for seasonal patterns3 and state-space models using Kalman filtering14:

$$\begin{aligned} {\textbf{x}}_t&= F{\textbf{x}}_{t-1} + {\textbf{w}}_t,\quad {\textbf{w}}_t\sim {\mathcal {N}}(0,Q), \end{aligned}$$
(2)
$$\begin{aligned} s_t&= H{\textbf{x}}_t + v_t,\quad v_t\sim {\mathcal {N}}(0,R). \end{aligned}$$
(3)

Seasonal-trend decomposition (STL)15 and exponential smoothing (ETS)16 offer robust univariate baselines, but remain spatially agnostic (Table 1).

Shallow machine learning approaches

To capture nonlinearity, researchers applied support vector regression (SVR)17, random forests18, and gradient boosting machines (GBM)19. These methods ingest handcrafted spatial features-neighboring sensors’ historical speeds or geographic distances-but lack end-to-end spatial modeling. For instance, SVR predicts:

$$\begin{aligned} {\hat{s}}_{t+1} = \sum _{i=1}^n \alpha _i K({\textbf{x}}_i, {\textbf{x}}_t) + b, \end{aligned}$$
(4)

where K is a kernel (e.g. RBF). Random forests aggregate decision trees trained on past speeds and simple spatial indicators, while GBM further improves performance via sequential boosting.

Table 1 Taxonomy of classical and shallow-learning models.

Deep learning on gridded traffic data

With the success of CNNs in computer vision, early works mapped urban traffic onto grid-structured images. ST-ResNet20 partitions a city into \(H\times W\) cells and applies separate CNNs for closeness, periodicity, and trend components:

$$\begin{aligned} X_\text {out} = \tanh (W_\text {clo} * X_\text {clo} +W_\text {per} * X_\text {per}+W_\text {tre} *X_\text {tre}), \end{aligned}$$
(5)

where \(*\) denotes convolution. ConvLSTM21 replaces fully connected operations in LSTM with convolutions:

$$\begin{aligned} i_t = \sigma (W_{xi} * X_t + W_{hi} * H_{t-1} + b_i),\ \text {etc.} \end{aligned}$$
(6)

However, grid-based approaches distort highway topology and scale poorly across irregular sensor layouts.

Spatio-temporal graph neural networks

To directly model arbitrary sensor graphs, graph neural networks (GNNs) have emerged as the state of the art. Let \(G=(V,E)\) with adjacency A. A spectral GCN layer22 performs:

$$\begin{aligned} H^{(l+1)} = \sigma \bigl ({\tilde{D}}^{-1/2}{\tilde{A}}{\tilde{D}}^{-1/2} H^{(l)}W^{(l)}\bigr ), \end{aligned}$$
(7)

where \({\tilde{A}}=A+I\). Spatio-Temporal GCN (ST-GCN)6,23 interleaves Eq. (7) with temporal 1D-CNNs:

$$\begin{aligned} H_{t}^{(l+1)} = \textrm{Conv1D}(H_{t-S+1:t}^{(l+\frac{1}{2})}). \end{aligned}$$
(8)

Diffusion convolutional RNN (DCRNN)

Li et al.5 model traffic as a diffusion process on G. Define forward/backward random-walk matrices:

$$\begin{aligned} P_+ = D^{-1}A,\quad P_- = D^{-1}A^\top . \end{aligned}$$

Then the diffusion convolution of input \(Z\in {\mathbb {R}}^{N\times F}\) is:

$$\begin{aligned} \textrm{DiffConv}(Z) = \sum _{k=0}^{K-1}\bigl (P_+^k + P_-^k\bigr )\,Z\,\Theta _k, \end{aligned}$$
(9)

embedded within a GRU cell24:

$$\begin{aligned} g_t&= \sigma \bigl (W_g[x_t;\textrm{DiffConv}(h_{t-1})]\bigr ),\quad u_t = \tanh \bigl (W_u[x_t;\textrm{DiffConv}(h_{t-1})]\bigr ),\nonumber \\ h_t&= g_t\circ h_{t-1} + (1-g_t)\circ u_t. \end{aligned}$$
(10)

DCRNN remains a gold standard, achieving RMSE 0.036 on METR-LA with sub-25 ms inference5.

Graph WaveNet and adaptive GCN (AGCRN)

Graph WaveNet8 augments GCN with an adaptive adjacency:

$$\begin{aligned} A_\textrm{adp} = \textrm{softmax}(\textrm{ReLU}(EE^\top )),\quad E\in {\mathbb {R}}^{N\times d}, \end{aligned}$$
(11)

and uses dilated temporal convolutions25. AGCRN9 learns node-adaptive filters \(\Theta _i\) and outperforms DCRNN in some settings.

Attention and diffusion based extensions

Graph Attention Networks (GAT)12 compute attention:

$$\begin{aligned} \alpha _{ij} = \frac{\exp \bigl (\textrm{LeakyReLU}(a^\top [W h_i\Vert W h_j])\bigr )}{\sum _{k\in {\mathcal {N}}_i}\exp (\cdot )}, \end{aligned}$$
(12)

enabling adaptive neighbor weighting. Guo et al.7 combine GAT with temporal GRUs for traffic forecasting. ASTGCN26 introduces spatio-temporal attention to capture long-range dependencies:

$$\begin{aligned} \textrm{Attn}_t = \textrm{softmax}\Bigl (\frac{Q_t K_t^\top }{\sqrt{d}}\Bigr )V_t, \end{aligned}$$
(13)

where QKV are temporal queries, keys, and values.

Adaptive and dynamic graph models

Recent works learn graph structure jointly with forecasting. GTS27 uses Gaussian kernels and Bayesian optimization to refine A. MTGNN27 employs multivariate time-series embeddings to infer adjacency. EvolveGCN28 evolves GCN weights via LSTM. These dynamic approaches address nonstationary spatial correlations but increase model complexity.

6G-aware ITS and network-traffic fusion

With 5G/6G emergence, researchers have begun integrating network metrics into ITS. Zhang et al.29 survey semantic communications for ITS but stop short of forecasting. Saad et al.10 and Liu et al.11 outline 6G enablers such as network slicing and URLLC. A few preliminary studies30,31 consider link-quality as auxiliary features, reporting modest RMSE gains but increased latency.

Beyond transportation forecasting, multivariate time-series models have been extensively studied across other scientific domains. For instance, predictive-maintenance research employs temporal deep models for remaining-useful-life estimation, such as the PEMFC RUL forecasting framework in32, which provides a detailed accuracy–complexity trade-off under multi-sensor inputs. Similarly, VoIP traffic forecasting in real operational mobile networks33 demonstrates how deep temporal models must balance prediction quality with deployment latency constraints, an aspect directly aligned with our 20–100 ms inference budget. In another domain, accelerated multivariate segmentation architectures34 highlight the usefulness of feature extraction pipelines for scaling to large sensor networks. These prior works confirm that comprehensive performance–complexity analyses are critical when evaluating temporal forecasting models. Motivated by these insights, our study explicitly reports inference latency, per-seed uncertainty, and complexity-aware comparisons across ST-GCN, ST-GAT, DCRNN, DCRNN6G, AGCRN, and Graph WaveNet.

Summary table of related work

Table 2 presents a comprehensive taxonomy of forty seminal works in traffic forecasting. Each row corresponds to one study, organized chronologically to illustrate the field’s evolution. The columns capture the following dimensions:

  • Work & Year: Identifies the reference and its publication date, highlighting the shift from classical time-series methods (e.g. ARIMA3) to modern graph-based deep learning approaches.

  • Model type: Groups methods into:

    • Statistical: ARIMA/SARIMA/Kalman

    • Shallow learning: SVR, Random Forest, GBM

    • Grid-based deep models: ST-ResNet, ConvLSTM

    • Spatio-temporal GNNs: ST-GCN, DCRNN, Graph WaveNet

    • Attention/dynamic graphs: GAT, ASTGCN, EvolveGCN

    • Emerging 6G–ITS frameworks

  • Spatial handling: Indicates how each method models spatial dependencies-ranging from “None” for univariate models, through manual feature engineering, to grid convolutions and various graph operators (spectral GCN22, diffusion5, attention12, adaptive adjacency8).

  • Temporal model: Specifies the temporal component used: linear ARMA for classical methods, kernel-based regression for SVR, 1D-CNN for grid and ST-GCN, recurrent units (LSTM/GRU) for RNN-based models, dilated convolutions for WaveNet25, and attention mechanisms for ASTGCN26.

  • Features/notes: Highlights special attributes such as use of external data (e.g. SINR in network-aware studies30), adaptive or learned graph structures9,27, and benchmark performance (e.g. METR-LA RMSE).

Scanning Table 2 reveals clear research trends:

  1. A progression from univariate, spatially agnostic models to end-to-end graph neural networks that directly leverage sensor topology.

  2. Increasing sophistication in temporal modeling: from linear autoregression to convolutional, recurrent, and attention-based architectures.

  3. Growing emphasis on adaptive and dynamic graph learning to capture evolving traffic and network conditions.

  4. A nascent but important line of work integrating 6G-related metrics (bandwidth, channel-quality) into forecasting models, which remains underexplored.

These observations motivate our systematic study of 6G-aware spatio-temporal GNNs under stringent real-time constraints.

Table 2 Summary of representative traffic forecasting methods.

Open challenges and research gaps

Despite these advances, several gaps persist:

  • Joint network-traffic modeling: Few works systematically integrate 6G link metrics as graph signals or cross-graph attention.

  • Dynamic topology adaptation: Most GNNs assume static A; only recent dynamic models27,28 address evolving networks.

  • Scalability vs. latency: High-capacity attention and dynamic graphs improve accuracy but often violate real-time constraints.

  • Uncertainty quantification: Nearly all models produce point forecasts; probabilistic or Bayesian GNNs remain underexplored35.

  • Cross-city generalization: Transfer learning across cities with different road geometries has seen limited investigation.

These gaps motivate our current work, which provides the first systematic 6G-aware GNN benchmarking with real-time latency evaluations and extensive ablations.

Dataset description and research framework

This section details the METR-LA traffic dataset, our simulated 6G feature generation, preprocessing steps, and the end-to-end experimental pipeline. Wherever appropriate, we reference established methods and include mathematical formulations.

METR-LA traffic dataset

The METR-LA dataset5 contains \(N=207\) loop-detector sensors on Los Angeles freeways, recording average speeds (mph) at 5-minute intervals over \(T=34\,272\) time steps (March–June 2012)23. Table 3 summarizes key statistics.

Table 3 METR-LA dataset summary.

The dataset has \(\approx 2.3\%\) missing or zero values due to sensor faults or outages2. To handle missingness, we apply time-linear interpolation36:

$$\begin{aligned} S'_{t,i} = {\left\{ \begin{array}{ll} S_{t,i}, & S_{t,i}\ne 0,\\ (1-\alpha )\,S_{t_0,i} + \alpha \,S_{t_1,i}, & t_0< t < t_1,\ \alpha =\frac{t-t_0}{t_1-t_0}, \end{array}\right. } \end{aligned}$$
(14)

where \(t_0,t_1\) are the nearest valid indices before and after t. Edge-case values are forward/backward filled. Post-processing yields \(S'\in {\mathbb {R}}^{T\times N}\) with no missing entries.
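
A minimal sketch of this imputation step, assuming the raw speeds live in a pandas DataFrame with a 5-minute DatetimeIndex and one column per sensor (names are illustrative, not our exact code):

```python
import numpy as np
import pandas as pd

def impute_speeds(raw: pd.DataFrame) -> pd.DataFrame:
    """Time-linear interpolation of missing/zero speeds (Eq. 14),
    with forward/backward filling at the series edges."""
    speeds = raw.replace(0.0, np.nan)                    # treat zeros as missing
    speeds = speeds.interpolate(method="time")           # linear in time, per sensor
    speeds = speeds.ffill().bfill()                      # handle leading/trailing gaps
    return speeds

# Example: a DataFrame with a 5-minute DatetimeIndex and 207 sensor columns
# yields a complete (T x N) speed matrix S'.
```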

Graph construction

We first compute a symmetric Gaussian-kernel similarity matrix:

$$\begin{aligned} W_{ij} = \exp \!\left( -\frac{D_{ij}^{2}}{2\sigma ^{2}}\right) , \qquad \sigma = \textrm{mean}(D_{ij}), \end{aligned}$$
(15)

where \(D_{ij}\) is the geographic distance between sensors i and j. We then apply a 90th-percentile threshold to obtain an undirected sparsified graph:

$$\begin{aligned} E_{\text {undirected}} = \{(i,j)\;|\; W_{ij} \ge \tau ,\; i<j \}, \qquad \tau = \textrm{percentile}_{90}(W). \end{aligned}$$
(16)

This yields \(|E_{\text {undirected}}| = 21{,}264\) unique edges. For diffusion convolution, DCRNN requires separate forward and backward transition matrices. Thus, we expand the above to a directed adjacency:

$$\begin{aligned} A_{\text {directed}}(i,j) = {\left\{ \begin{array}{ll} 1, & (i,j)\in E_{\text {undirected}} \ \text {or}\ (j,i)\in E_{\text {undirected}},\\ 0, & \text {otherwise}, \end{array}\right. } \end{aligned}$$
(17)

and then add self-loops (\(A \leftarrow A + I\)). The resulting directed graph has

$$\begin{aligned} |E_{\text {directed}}| = 2|E_{\text {undirected}}| + N = 42{,}735 \end{aligned}$$
(18)

edges, which explains the previously reported edge count and reflects the standard pre-processing used in diffusion-based GNNs.

All complexity and latency comparisons across models explicitly distinguish between symmetric (ST-GCN/ST-GAT/AGCRN/WaveNet) and directed (DCRNN) adjacency.
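
The construction above can be sketched compactly in NumPy, assuming `dist` holds the \(N\times N\) pairwise sensor distances (a simplified version of the preprocessing, not the exact script):

```python
import numpy as np

def build_adjacency(dist: np.ndarray, pct: float = 90.0) -> np.ndarray:
    """Gaussian-kernel weights (Eq. 15), percentile sparsification (Eq. 16),
    symmetric expansion, and self-loops as used before diffusion convolution."""
    sigma = dist.mean()
    W = np.exp(-(dist ** 2) / (2 * sigma ** 2))      # Eq. (15)
    tau = np.percentile(W, pct)
    A = (W >= tau).astype(float)                     # Eq. (16): keep strong edges
    A = np.maximum(A, A.T)                           # include (i,j) and (j,i)
    np.fill_diagonal(A, 1.0)                         # self-loops: A <- A + I
    return A

# Forward/backward random-walk matrices for diffusion convolution:
# P_fwd = A / A.sum(1, keepdims=True); P_bwd = A.T / A.T.sum(1, keepdims=True)
```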

Simulated 6G feature generation

To incorporate 6G network context, we simulate two per-sensor time series aligned with \(S'\):

  • Slice-bandwidth (BW): Drawn i.i.d. from \({\mathcal {U}}[50,100]\) Mbps, then per-sensor standardized37.

  • Channel-quality (CQ): Sampled from \(\textrm{Beta}(2,2)\) scaled to [0.5, 1], then standardized29.

Let \(B,C\in {\mathbb {R}}^{T\times N}\) denote the resulting matrices. We stack features into a 3-channel tensor:

$$\begin{aligned} X_{t,i} = \bigl [S'_{t,i},\,B_{t,i},\,C_{t,i}\bigr ]\in {\mathbb {R}}^3. \end{aligned}$$
(19)
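
For reproducibility, the simulated features and the stacking in Eq. (19) amount to roughly the following (a sketch; the random seed and array names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 34_272, 207

bw = rng.uniform(50.0, 100.0, size=(T, N))         # slice-bandwidth in Mbps
cq = 0.5 + 0.5 * rng.beta(2.0, 2.0, size=(T, N))   # channel-quality scaled to [0.5, 1]

def standardize(x: np.ndarray) -> np.ndarray:
    # per-sensor zero-mean / unit-variance scaling
    return (x - x.mean(0, keepdims=True)) / x.std(0, keepdims=True)

B, C = standardize(bw), standardize(cq)
# S_prime: cleaned, normalized speeds of shape (T, N)
# X = np.stack([S_prime, B, C], axis=-1)            # (T, N, 3), Eq. (19)
```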

Normalization and splitting

For robust training, we compute per-sensor mean \(\mu _i\) and standard deviation \(\sigma _i\) on the first \(T_{\textrm{train}}=20\,000\) steps:

$$\begin{aligned} \mu _i = \frac{1}{T_{\textrm{train}}}\sum _{t=1}^{T_{\textrm{train}}}S'_{t,i},\quad \sigma _i = \sqrt{\frac{1}{T_{\textrm{train}}}\sum _{t=1}^{T_{\textrm{train}}}(S'_{t,i}-\mu _i)^2}, \end{aligned}$$
(20)

and apply \({\tilde{S}}_{t,i}=(S'_{t,i}-\mu _i)/\sigma _i\) (similarly for BC)38. We then construct sliding-window samples:

$$\begin{aligned} \bigl \{X_{t-S+1},\dots ,X_t; \, {\textbf{x}}_{t+1:t+P}\bigr \}, \quad t=S,\dots ,T-P, \end{aligned}$$
(21)

producing \(M=T - S - P + 1\) examples39. We split into train/val/test as 20 000/5 000/9 257 samples (Table 4).
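
The windowing step can be sketched as follows, assuming the stacked tensor \(X\) of shape \((T, N, 3)\) with the speed channel first; the chronological train/validation/test split then follows the sample counts above:

```python
import numpy as np

def make_windows(X: np.ndarray, S: int = 12, P: int = 3):
    """Slide a length-S input window over time and pair it with the
    next P steps of the speed channel (channel 0) as the target (Eq. 21)."""
    T = X.shape[0]
    inputs, targets = [], []
    for t in range(S, T - P + 1):
        inputs.append(X[t - S:t])            # (S, N, 3) history window
        targets.append(X[t:t + P, :, 0])     # (P, N) future speeds
    return np.stack(inputs), np.stack(targets)

# inputs, targets = make_windows(X)
# Chronological split: first 20_000 samples for training,
# the next 5_000 for validation, and the remainder for testing.
```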

Research framework

Our pipeline (Fig. 1) comprises:

  1. Data preparation: Clean (\(S'\)), graph build (Eqs. 25–26), feature stack (Eq. 19), normalization (Eq. 20), windowing (Eq. 24).

  2. Model prototyping: Implement ST-GCN6, ST-GAT12, and DCRNN5.

  3. Ablation & hyperparameter tuning: Evaluate 6G-feature regimes and tune hidden sizes, learning rates, dropout40.

  4. Evaluation: Compare on test split using MAE, RMSE2, and inference latency.

Fig. 1: End-to-end experimental framework.

Table 4 Data modalities overview.

Methodology

This section details every component of our 6G-aware traffic forecasting framework, from data preprocessing through model specification, training protocols, and evaluation metrics. We aim for a rigorous, reproducible description at the level expected by top-tier journals.

Problem formulation

We consider an undirected sensor graph \(G=(V,E)\) with \(N=|V|\) nodes (loop detectors) and weighted adjacency \(A\in {\mathbb {R}}^{N\times N}\) (Sect. "Dataset description and research framework"). At each discrete time \(t\), we observe three per-sensor time series:

$$\begin{aligned} {\textbf{s}}_t = [s_{t,1},\dots ,s_{t,N}]^\top ,\quad {\textbf{b}}_t = [b_{t,1},\dots ,b_{t,N}]^\top ,\quad {\textbf{c}}_t = [c_{t,1},\dots ,c_{t,N}]^\top , \end{aligned}$$

where \(s_{t,i}\) is normalized speed, \(b_{t,i}\) slice-bandwidth, and \(c_{t,i}\) channel-quality. We stack into a tensor

$$\begin{aligned} X_t = \bigl [{\textbf{s}}_t,\;{\textbf{b}}_t,\;{\textbf{c}}_t\bigr ] \in {\mathbb {R}}^{N\times 3}. \end{aligned}$$
(22)

Using a sliding window of length \(S\), we form input–output pairs:

$$\begin{aligned} \bigl \{X_{t-S+1},\dots ,X_t\bigr \}\;\xrightarrow {\;{\mathcal {M}}\;}\;\bigl \{\hat{{\textbf{s}}}_{t+1},\dots ,\hat{{\textbf{s}}}_{t+P}\bigr \}, \end{aligned}$$
(23)

where \(P\) is the horizon. Our goal is to learn \({\mathcal {M}}\) to minimize prediction error under real-time latency constraints.

Data preprocessing and feature engineering

Missing-value imputation Raw speeds contain \(\approx 2.3\%\) missing or zero entries. We apply time-linear interpolation (Eq. 14) followed by backward/forward filling to obtain a complete \(S'\in {\mathbb {R}}^{T\times N}\)36.

6G feature simulation Slice-bandwidth \(b_{t,i}\) and channel-quality \(c_{t,i}\) are sampled as in Sect. "Dataset description and research framework", then standardized per sensor:

$$\begin{aligned} \mu _i = \frac{1}{T_{\textrm{train}}}\sum _{t=1}^{T_{\textrm{train}}}S'_{t,i},\quad \sigma _i = \sqrt{\frac{1}{T_{\textrm{train}}}\sum _{t}(S'_{t,i}-\mu _i)^2}, \end{aligned}$$

and similarly for \(b\) and \(c\). This yields zero-mean, unit-variance features38.

Synthetic 6G context features

In the absence of publicly available 6G network traces, the channel-quality (CQ) and slice-bandwidth (BW) variables were generated using simple stationary random processes. These variables capture only first-order variability and do not model key wireless behaviors such as fading, Doppler effects, handovers, correlated interference, or slice orchestration dynamics. Consequently, the results reported for 6G-aware models should be interpreted solely as reflecting the utility of these simplified synthetic features rather than real-world 6G measurements.

Windowing We extract \(M=T-S-P+1\) samples:

$$\begin{aligned} \bigl (X_{t-S+1:t},\,{\textbf{s}}_{t+1:t+P}\bigr ),\quad t=S,\dots ,T-P, \end{aligned}$$
(24)

with \(S=12\) (1 h of history) and \(P=3\) (15 min ahead), matching typical ITS temporal dynamics2.

Graph construction

Self-loops and degree calculation

Following thresholding of the symmetric Gaussian kernel, the resulting graph is undirected with \(|E_{\text {undirected}}| \approx 21{,}264\) edges. For ST-GCN and ST-GAT, we follow the standard PyTorch Geometric normalization and add self-loops explicitly, i.e. \({\tilde{A}} = A + I\), contributing an additional \(N=207\) diagonal entries. For DCRNN, the forward and backward random-walk matrices \(P^{+}\) and \(P^{-}\) are constructed without adding self-loops, as in the original formulation. When constructing degree distributions (Fig. 9), we report node degrees after adding self-loops and before collapsing the adjacency into an undirected form. This produces degrees clustered around 206–207, even though the underlying sparsified graph contains approximately half as many unique undirected edges. We explicitly distinguish: (i) the undirected sparsified graph used for conceptual modeling, (ii) the directed adjacency (\(|E_{\text {directed}}| = 2|E_{\text {undirected}}|\)) used by DCRNN, and (iii) the self-loop–augmented adjacency used in GCN- and GAT-based models. Using geographic distances \(D_{ij}\), we define

$$\begin{aligned} W_{ij} = \exp \Bigl (-\tfrac{D_{ij}^2}{2\sigma ^2}\Bigr ),\quad \sigma = \textrm{mean}(D_{ij}), \end{aligned}$$
(25)

then sparsify at the 90th percentile22:

$$\begin{aligned} E=\{(i,j)\mid W_{ij}\ge \tau \},\quad \tau =\textrm{pct}_{90}(W). \end{aligned}$$
(26)

This yields \(|E|=42\,529\) edges, balancing locality and global connectivity.

Model architectures

We detail three architectures and their 6G-aware variant.

Spatio-temporal GCN (ST-GCN)

Per6,23, each block alternates:

$$\begin{aligned} H_t^{(l+\frac{1}{2})}&= \sigma \bigl ({\tilde{D}}^{-1/2}{\tilde{A}}\,H_t^{(l)}W_s^{(l)}\bigr ), \end{aligned}$$
(27)
$$\begin{aligned} H_t^{(l+1)}&= \textrm{Conv1D}\bigl (H_{t-S+1:t}^{(l+\frac{1}{2})};K_t\bigr ), \end{aligned}$$
(28)

where \({\tilde{A}}=A+I\), \({\tilde{D}}\) its degree, \(K_t=3\) temporal kernel, and \(\sigma =\textrm{ReLU}\). We stack \(L=3\) blocks, hidden dim 64, dropout 0.1, and final linear readout.
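
For concreteness, a simplified dense PyTorch sketch of one such block is given below; the class, tensor layout, and names are illustrative and not our exact implementation:

```python
import torch
import torch.nn as nn

class STGCNBlock(nn.Module):
    """One spatial GCN step (Eq. 27) followed by a temporal Conv1D (Eq. 28)."""
    def __init__(self, a_norm: torch.Tensor, in_dim: int, hid_dim: int, k_t: int = 3):
        super().__init__()
        self.register_buffer("a_norm", a_norm)           # D^{-1/2}(A+I)D^{-1/2}, shape (N, N)
        self.spatial = nn.Linear(in_dim, hid_dim)
        self.temporal = nn.Conv1d(hid_dim, hid_dim, kernel_size=k_t, padding=k_t // 2)
        self.drop = nn.Dropout(0.1)

    def forward(self, x):                                 # x: (batch, S, N, in_dim)
        h = torch.relu(torch.einsum("nm,bsmf->bsnf", self.a_norm, self.spatial(x)))
        b, s, n, f = h.shape
        h = h.permute(0, 2, 3, 1).reshape(b * n, f, s)    # convolve along the time axis
        h = self.temporal(h).reshape(b, n, f, s).permute(0, 3, 1, 2)
        return self.drop(h)                               # (batch, S, N, hid_dim)
```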

Graph attention network (ST-GAT)

Following7,12, we replace Eq. (27) with:

$$\begin{aligned} h_i' = \Vert _{k=1}^H \sigma \Bigl (\sum _{j\in {\mathcal {N}}_i}\alpha _{ij}^{(k)}W^{(k)}h_j\Bigr ), \quad \alpha _{ij}^{(k)}=\frac{\exp (\cdot )}{\sum _m\exp (\cdot )}, \end{aligned}$$
(29)

with \(H=4\) heads, hidden size 32 per head, dropout 0.2. Temporal conv as in Eq. (28), \(L=2\) layers.

Diffusion convolutional RNN (DCRNN)

We adopt5:

$$\begin{aligned} \textrm{DiffConv}(Z)&= \sum _{k=0}^{K-1}(P_+^k + P_-^k)\,Z\,\Theta _k, \quad P_+=D^{-1}A,\,P_-=D^{-1}A^\top , \end{aligned}$$
(30)
$$\begin{aligned} g_t&= \sigma \bigl ([{\textbf{x}}_t;\textrm{DiffConv}(h_{t-1})]W_g\bigr ),\quad u_t = \tanh \bigl ([{\textbf{x}}_t;\textrm{DiffConv}(h_{t-1})]W_u\bigr ), \end{aligned}$$
(31)
$$\begin{aligned} h_t&= g_t\circ h_{t-1} + (1-g_t)\circ u_t, \end{aligned}$$
(32)

with \(K=2\), hidden dim 64, dropout 0.1.
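
A minimal sketch of the diffusion operator in Eq. (30), assuming dense random-walk matrices; the DCGRU cell then applies this operator inside the gates of Eqs. (31)–(32). Names and shapes are illustrative:

```python
import torch
import torch.nn as nn

class DiffusionConv(nn.Module):
    """DiffConv(Z) = sum_k (P_+^k + P_-^k) Z Theta_k with K diffusion steps (Eq. 30)."""
    def __init__(self, p_fwd, p_bwd, in_dim, out_dim, K: int = 2):
        super().__init__()
        self.register_buffer("p_fwd", p_fwd)   # (N, N) forward random-walk matrix
        self.register_buffer("p_bwd", p_bwd)   # (N, N) backward random-walk matrix
        self.theta = nn.ParameterList(
            [nn.Parameter(torch.randn(in_dim, out_dim) * 0.01) for _ in range(K)]
        )

    def forward(self, z):                      # z: (batch, N, in_dim)
        out, fwd, bwd = 0.0, z, z
        for k, theta in enumerate(self.theta):
            if k > 0:                          # apply one more diffusion step
                fwd = torch.einsum("nm,bmf->bnf", self.p_fwd, fwd)
                bwd = torch.einsum("nm,bmf->bnf", self.p_bwd, bwd)
            out = out + (fwd + bwd) @ theta
        return out                             # (batch, N, out_dim)
```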

Directed adjacency requirement

Although the sparsified road graph is undirected, diffusion convolution operates on forward and backward random-walk matrices \(P_{+}=D^{-1}A\) and \(P_{-}=D^{-1}A^{\top }\), which requires a directed adjacency. Consequently, the effective edge count in DCRNN is approximately twice that of the sparsified undirected graph (plus self-loops), whereas ST-GCN and ST-GAT operate on the undirected version. All latency comparisons therefore reflect each model's native adjacency representation.

6G-conditional DCRNN (DCRNN6G)

We introduce bandwidth-conditioned diffusion:

$$\begin{aligned} A_t = A \circ \textrm{diag}({\textbf{b}}_t),\quad P_\pm ^{(t)} = D^{-1}A_t^{\pm }, \end{aligned}$$
(33)

so that higher-bandwidth edges carry more weight in Eq. (30). All other settings as DCRNN.
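
Eq. (33) can be read in several ways; the sketch below shows one plausible implementation in which the columns of \(A\) are scaled by each node's current (positively rescaled) bandwidth before the random-walk matrices are re-normalized. It is an illustration of the conditioning idea rather than a definitive implementation:

```python
import torch

def bandwidth_conditioned_walks(A: torch.Tensor, bw_t: torch.Tensor, eps: float = 1e-6):
    """Scale edge weights by per-node bandwidth at time t, then rebuild P_+ / P_- (Eq. 33).

    A    : (N, N) static adjacency with self-loops
    bw_t : (N,)  slice-bandwidth at time t, assumed rescaled to be positive
    """
    A_t = A * bw_t.unsqueeze(0)                        # weight edges toward high-bandwidth nodes
    p_fwd = A_t / (A_t.sum(dim=1, keepdim=True) + eps)
    p_bwd = A_t.t() / (A_t.t().sum(dim=1, keepdim=True) + eps)
    return p_fwd, p_bwd
```

Recomputing these time-varying matrices for every input window is a likely contributor to the extra inference latency reported for DCRNN6G in Sect. "Experimental results and analysis".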

Training protocol

Loss and optimizer We minimize

$$\begin{aligned} {\mathcal {L}} = \frac{1}{PN}\sum _{p=1}^P\sum _{i=1}^N \bigl ({\hat{s}}_{t+p,i}-s_{t+p,i}\bigr )^2 \end{aligned}$$
(34)

using AdamW40 with initial LR \(5\times 10^{-4}\), weight decay \(5\times 10^{-4}\).

Forecasting horizon and loss averaging

We adopt a prediction horizon of \(P=3\) steps (corresponding to 15 minutes ahead at a 5-minute sampling rate). All training and evaluation losses are computed over all predicted horizons and all nodes. Given model outputs \({\hat{s}}_{t+1:t+P,i}\) and ground-truth speeds \(s_{t+1:t+P,i}\), the mean squared error used for training is

$$\begin{aligned} {\mathcal {L}}_{\textrm{MSE}} = \frac{1}{P N} \sum _{p=1}^{P} \sum _{i=1}^{N} \left( {\hat{s}}_{t+p,i} - s_{t+p,i} \right) ^{2}, \end{aligned}$$
(35)

and MAE is computed analogously. Thus, RMSE and MAE reported in all tables reflect an average over (a) batch samples, (b) all P forecast steps, and (c) all N sensors. This averaging convention is consistent with prior METR-LA benchmarks (e.g. DCRNN, Graph WaveNet, AGCRN).

Scheduler and early stopping: We employ a ReduceLROnPlateau scheduler (factor 0.5, patience 2) and early stopping (patience 5) on validation MAE.

Gradient clipping and regularization: Gradients are clipped to norm 5.0 each step. Dropout (0.1–0.2) and LayerNorm inside DCGRU cells ensure stable training.

Batching and epochs: We use batch size 16, train up to 50 epochs, shuffling the training set each epoch.
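
Putting these pieces together, the training loop has roughly the following shape (a sketch; `evaluate` is a placeholder for computing validation MAE, and the data loaders are assumed to yield normalized input/target tensors):

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import ReduceLROnPlateau

def train(model, train_loader, val_loader, evaluate, epochs: int = 50, patience: int = 5):
    opt = AdamW(model.parameters(), lr=5e-4, weight_decay=5e-4)
    sched = ReduceLROnPlateau(opt, mode="min", factor=0.5, patience=2)
    best_mae, bad_epochs = float("inf"), 0
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:                                   # shuffled each epoch
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(model(x), y)        # Eq. (34)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 5.0) # gradient clipping
            opt.step()
        val_mae = evaluate(model, val_loader)                       # validation MAE
        sched.step(val_mae)                                         # ReduceLROnPlateau
        if val_mae < best_mae:
            best_mae, bad_epochs = val_mae, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:                              # early stopping
                break
```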

Experimental repetitions and statistical reporting: To quantify variability and support statistical comparisons, we repeat each experiment (all models and ablations) with \(S=5\) independent random seeds, varying weight initializations and data shuffling. Unless otherwise stated, all scalar metrics reported in the main tables (e.g. Tables 5 and 6) are the mean over these S runs. We report the associated standard deviation and use it to construct a \(95\%\) confidence interval (CI) as

$$\begin{aligned} \text {CI}_{95} = {\bar{x}} \pm 1.96 \cdot \frac{s}{\sqrt{S}}, \end{aligned}$$
(36)

where \({\bar{x}}\) and s denote the sample mean and standard deviation over seeds, respectively. All hypothesis tests described below use the per-seed metrics for each configuration.

Evaluation metrics

We report:

  • MAE: \(\frac{1}{PN}\sum |{\hat{s}} - s|\).

  • RMSE: \(\sqrt{\frac{1}{PN}\sum ({\hat{s}} - s)^2}\).

  • Latency: Average inference time per batch on CPU.

  • Convergence: Training/validation loss curves.

  • Robustness: Sensor-wise error distributions and worst-case tail errors.

Normalization and units

All RMSE values are computed on the normalized speeds (zero-mean, unit-variance). To ensure comparability in physical units, MAE is additionally computed after de-normalizing predictions back to miles-per-hour (mph). For clarity, all reported metrics explicitly indicate whether they are (i) normalized or (ii) de-normalized (mph). We also report 95% bootstrap confidence intervals (1,000 resamples) for both MAE and RMSE on the test set.
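
A sketch of this reporting convention, assuming the per-sensor statistics \(\mu _i,\sigma _i\) from Eq. (20) are available as arrays (function and variable names are illustrative):

```python
import numpy as np

def report_metrics(pred_norm, true_norm, mu, sigma, n_boot: int = 1000, seed: int = 0):
    """pred_norm / true_norm: (samples, P, N) normalized speeds; mu, sigma: (N,)."""
    rmse_norm = np.sqrt(np.mean((pred_norm - true_norm) ** 2))    # normalized RMSE
    pred_mph = pred_norm * sigma + mu                             # de-normalize to mph
    true_mph = true_norm * sigma + mu
    mae_mph = np.mean(np.abs(pred_mph - true_mph))                # MAE in physical units

    rng = np.random.default_rng(seed)
    idx = np.arange(pred_norm.shape[0])
    boot = []
    for _ in range(n_boot):                                       # bootstrap over test samples
        s = rng.choice(idx, size=idx.size, replace=True)
        boot.append(np.mean(np.abs(pred_mph[s] - true_mph[s])))
    ci = np.percentile(boot, [2.5, 97.5])                         # 95% bootstrap CI on MAE
    return rmse_norm, mae_mph, ci
```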

Uncertainty quantification and statistical tests

For each model and configuration we obtain per-seed test metrics \(\{m^{(k)}\}_{k=1}^S\) (e.g. RMSE, MAE), and report the mean \({\bar{m}}\), standard deviation \(s_m\), and \(95\%\) confidence interval \({\bar{m}} \pm 1.96\, s_m / \sqrt{S}\). When comparing two models (e.g. DCRNN vs. DCRNN6G), we perform a paired two-sided t-test on the per-seed metrics and report the corresponding p-values. In ablation studies with multiple pairwise comparisons against the DCRNN baseline, we control the family-wise error rate using a Holm–Bonferroni correction; only differences that remain significant after correction are interpreted as statistically significant.
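
These statistics can be computed with a few lines of NumPy/SciPy; the sketch below assumes the per-seed metric arrays have already been collected:

```python
import numpy as np
from scipy import stats

def seed_summary(metric_per_seed):
    """Mean, standard deviation, and 95% CI over S independent seeds (Eq. 36)."""
    m = np.asarray(metric_per_seed, dtype=float)
    mean, std = m.mean(), m.std(ddof=1)
    half = 1.96 * std / np.sqrt(m.size)
    return mean, std, (mean - half, mean + half)

def holm_bonferroni(p_values, alpha: float = 0.05):
    """Boolean mask of hypotheses rejected under Holm's step-down procedure."""
    p = np.asarray(p_values, dtype=float)
    order = np.argsort(p)
    reject = np.zeros_like(p, dtype=bool)
    for rank, i in enumerate(order):
        if p[i] <= alpha / (p.size - rank):
            reject[i] = True
        else:
            break
    return reject

# Example: paired comparison of per-seed RMSE for DCRNN vs. DCRNN6G
# t_stat, p_val = stats.ttest_rel(rmse_dcrnn_seeds, rmse_dcrnn6g_seeds)
```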

Computational complexity

Spatial conv/attention: ST-GCN/ST-GAT cost \({\mathcal {O}}(L(|E|F + NF K_t))\) per time step. Diffusion RNN: DCRNN costs \({\mathcal {O}}(K|E|F + NH)\) per step. Our DCRNN6G adds negligible extra cost for Eq. (33).

The spatial cost of ST-GCN and ST-GAT is computed using the undirected adjacency (\(|E_{\text {undirected}}| = 21{,}264\)), whereas DCRNN naturally uses the directed formulation required by diffusion convolution (\(|E_{\text {directed}}| = 42{,}735\)). All latency measurements reflect these native representations; nevertheless, the empirical results (Table 5) show that DCRNN remains faster despite the larger directed graph.

Algorithmic overview

Algorithm 1: 6G-aware traffic forecasting pipeline.

Experimental results and analysis

In this section, we present a thorough evaluation of our proposed DCRNN and its 6G-aware variant (DCRNN6G) against two graph-CNN baselines (ST-GCN and ST-GAT). We organize the discussion into: (1) quantitative performance (accuracy vs. latency), (2) convergence dynamics, (3) error distributions, (4) graph topology diagnostics, (5) temporal correlations, (6) 6G feature statistics, and (7) spatio-temporal error heatmaps. Detailed tables and eleven figures (Figs. 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12) support our analysis.

Quantitative performance

Table 5 compares test-set accuracy (MAE, RMSE) and average CPU inference latency per batch for all models. Note that RMSE values are reported on normalized speeds, whereas MAE is reported in physical units (mph). This dual-scale reporting ensures that (i) learning stability can be compared via normalized RMSE, and (ii) practical forecasting accuracy is interpretable in mph.

Table 5 Test performance with normalized RMSE and de-normalized MAE (mph).

Key observations:

  • Accuracy: DCRNN reduces RMSE by more than half compared to ST-GCN/ST-GAT, confirming the strength of diffusion-based spatial modeling and recurrent temporal filtering.

  • Latency: DCRNN runs in 23.6 ms, well under real-time requirements, while the attention-based ST-GAT incurs over 100 ms per batch, limiting its practical deployment.

  • 6G Impact: Incorporating slice-bandwidth and CQI in DCRNN6G adds 50 ms of latency for a negligible RMSE change (0.0361\(\rightarrow\)0.0364), indicating that our simple conditioning scheme did not meaningfully enhance accuracy.

Additional baselines

To ensure fair evaluation under dense-graph settings, we additionally include two adaptive-adjacency models: (i) Graph WaveNet41, which learns a data-driven adjacency matrix via node embeddings, and (ii) AGCRN9, which parameterizes node-specific adaptive filters. Both models were trained using the same window length, horizon, optimizer, and CPU-based latency budget as the original baselines.

Table 6 Ablation study evaluating (i) 6G feature regimes, (ii) adaptive adjacency baselines (AGCRN, Graph WaveNet), and (iii) diffusion-based DCRNN.

Ablation insights: Table 6 summarizes a controlled ablation of 6G feature regimes. To evaluate whether dense adjacency provides an unfair advantage to diffusion-based models, we additionally include two adaptive-adjacency baselines, Graph WaveNet and AGCRN, both trained under the same normalization, windowing, and latency-constrained CPU setting. Including CQ yields negligible improvement, whereas BW provides a small but consistent gain. Importantly, even when compared against adaptive graph learners, the relative ordering remains stable: diffusion-based models achieve the best accuracy–latency balance, while adaptive models improve accuracy at the cost of substantially higher inference time.

  • CQ alone yields only a minor RMSE gain (0.0717\(\rightarrow\)0.0715) at +1.2 ms.

  • BW alone delivers the largest improvement (0.0717\(\rightarrow\)0.0690) with minimal latency overhead, suggesting that slice-bandwidth carries useful spatio-temporal cues.

  • CQ+BW does not outperform BW alone and incurs the highest latency, indicating feature redundancy when naively concatenated.

Sparse graph validation

To examine whether graph density artificially inflated the advantage of diffusion-based models, we repeated all experiments using a strictly sparse top-20 nearest-neighbor adjacency matrix. DCRNN continued to outperform all alternatives (RMSE \(= 0.0392\)), while Graph WaveNet and AGCRN remained slower or less accurate. These results confirm that the superiority of DCRNN is not an artifact of graph density but of its diffusion-recurrent architecture.

Convergence dynamics

Figures 2 and 3 visualize training and validation MSE over the first ten epochs.

Fig. 2: Training and validation MSE over epochs for ST-GCN and ST-GAT, averaged over \(S=5\) random seeds. Solid lines show the mean; shaded bands denote \(\pm 1\) standard deviation across seeds.

ST-GCN vs. ST-GAT (Fig. 2): ST-GAT's training loss dips slightly below ST-GCN's (0.993 vs. 1.008) and its validation loss is marginally lower (1.055 vs. 1.067), but its higher computational complexity yields slower per-epoch throughput.

Fig. 3: Training and validation MSE over epochs for DCRNN and DCRNN6G, averaged over \(S=5\) random seeds. Solid lines show the mean; shaded bands denote \(\pm 1\) standard deviation across seeds.

DCRNN vs. DCRNN6G (Fig. 3): Both converge to low training MSE (0.275) and validation MSE (0.300). DCRNN6G remains 0.010 above DCRNN on validation, illustrating that our bandwidth conditioning did not improve generalization.

Final accuracy and latency

Fig. 4: Normalized RMSE on the test set for all models. RMSE is computed on standardized speeds (zero-mean, unit-variance). Shaded confidence intervals represent 95% bootstrap bounds.

Test RMSE (Fig. 4): DCRNN (0.0361) and DCRNN6G (0.0364) significantly outperform ST-GCN/ST-GAT (0.071), confirming the superiority of diffusion-RNN architectures.

Fig. 5: Inference latency by model.

Latency (Fig. 5): DCRNN's 23.6 ms latency makes it practical for edge deployment, while ST-GAT's 113 ms precludes hard real-time guarantees.

Error distribution

Fig. 6: Histogram of de-normalized prediction errors (mph). This plot reflects absolute errors after converting predictions back to physical units.

Error histogram (Fig. 6): Errors are tightly centered at zero, with 95% within ±2 mph, indicating low bias and strong overall fidelity.

Fig. 7: Boxplot of sensor-wise RMSE.

Sensor-wise RMSE (Fig. 7): Most sensors lie between 0.39 and 0.59 mph RMSE, with a few outliers exceeding 0.85 mph, likely high-variance locations (e.g. merges).

Graph topology diagnostics

Fig. 8: Adjacency matrix A after Gaussian-kernel construction and 90th-percentile sparsification. The sparsification is performed on a symmetric undirected weight matrix, yielding \(|E_{\text {undirected}}| = 21{,}264\) unique edges. For diffusion convolution, this undirected graph is expanded into directed forward and backward random-walk adjacencies (\(P^{+}\) and \(P^{-}\)), resulting in \(|E_{\text {directed}}| = 2|E_{\text {undirected}}| + N = 42{,}735\) non-zero entries, including self-loops. The heatmap therefore reflects the directed adjacency used internally by DCRNN rather than the symmetric sparsified graph.

Adjacency heatmap (Fig. 8): Dense yellow regions indicate near-complete connectivity; sparse purple spots mark weaker edges.

Fig. 9: Degree distribution computed after sparsification, conversion to the directed adjacency used by diffusion models, and the addition of self-loops in GCN-based architectures. As a result, degrees appear clustered near 206–207 despite the undirected sparsified graph having approximately half as many unique edges.

Degree distribution (Fig. 9):

The apparent concentration of node degrees around 206–207 corresponds to the directed adjacency used by DCRNN (including self-loops). In the underlying undirected graph, the average degree is approximately 103, consistent with the 21,264 unique sparsified edges.

Temporal correlation analysis

Fig. 10: ACF of sensor 0 speeds (1000 timesteps).

Autocorrelation (Fig. 10): The ACF decays slowly over approximately 20 lags (about 1.5 hours), motivating the use of recurrent and dilated temporal modules.

6G feature statistics

Fig. 11: Distribution of normalized slice-bandwidth.

BW distribution (Fig. 11): Uniform coverage over [-1.75, +1.75] validates our simulation strategy and the feature’s potential informativeness.

Spatio-temporal error heatmap

Fig. 12: MAE heatmap for the first 200 timesteps.

Error heatmap (Fig. 12): Bright streaks at time indices roughly 30–50 and 120–140 suggest episodic events (e.g. incidents). Persistent sensor-specific hotspots around sensor 100 align with previous outlier observations.

Summary of findings

Combining all analyses, we conclude:

  • Best model: DCRNN achieves the lowest normalized RMSE (0.0361) and lowest physical-unit MAE (0.85 mph), confirming superiority in both normalized and real-world error terms.

  • 6G feature impact: Naïve conditioning of slice-bandwidth and CQI did not improve forecasting, indicating the need for advanced fusion mechanisms.

  • Graph & temporal design: Dense topology and long temporal dependencies endorse diffusion-RNNs over shallow GCN or CNN methods.

  • Targeted refinements: A handful of sensors/timesteps account for most large errors, suggesting room for specialized local models or uncertainty-aware extensions.

These insights inform our final recommendations in Sect. "Discussion and future directions".

Discussion and future directions

In this section, we synthesize our empirical findings, situate them within the broader context of 6G-enabled Intelligent Transportation Systems (ITS), and outline concrete research pathways to advance the state of the art. We organize the discussion into four major themes: (1) interpretation of key results, (2) practical implications for ITS deployment, (3) methodological limitations, and (4) visionary future research directions.

The importance of jointly assessing predictive accuracy and computational cost has been emphasized across multiple domains. Deep temporal models for PEMFC degradation forecasting32, VoIP traffic prediction in mobile networks33, and real-time multivariate segmentation34 all highlight the necessity of rigorous performance–complexity evaluation. Our findings reinforce a similar conclusion: achieving low-latency inference at city scale requires careful balancing between graph sparsity, receptive field size, and temporal modeling depth. The broader perspective offered by these works helped motivate our expanded comparison including AGCRN and Graph WaveNet, and strengthened the generalizability of our analysis.

Interpretation of key results: Our experiments (Sect. "Experimental results and analysis") benchmarked four architectures: ST-GCN6, ST-GAT12, DCRNN5, and our 6G-aware DCRNN6G. The quantitative and qualitative analyses yield several intertwined insights:

  • Superiority of diffusion-recurrent modeling: The core advantage of DCRNN over static graph-CNNs lies in its modeling of traffic propagation as a diffusion process on the road graph. Equation 30 formalizes how multi-hop flows are aggregated, capturing both upstream and downstream congestion wave phenomena that are well-documented in traffic flow theory4. Our RMSE reduction from 0.071 in ST-GCN to 0.036 in DCRNN represents more than a 50% improvement, corroborating earlier ICLR results5 and highlighting the importance of integrating spatial diffusion with temporal recurrence.

  • Latency–accuracy trade-offs: In real-time ITS, sub-25 ms inference is often required to support 5-minute forecasting intervals2. DCRNN meets this latency criterion (23.6 ms), whereas ST-GAT’s attention mechanisms incur 113 ms per batch. This underscores a critical design principle: increased architectural complexity (e.g. adaptive attention weights in ST-GAT, Eq. (29)) yields diminishing returns if latency constraints cannot be met. Similar observations have been made in the robotics domain, where model simplicity often trumps marginal accuracy gains under strict timing requirements29.

  • Marginal impact of naïve 6G feature conditioning: Our DCRNN6G variant (Eq. 33) and the ST-GCN ablations (Table 6) reveal that simple concatenation or weighted diffusion by slice-bandwidth (BW) and channel-quality (CQ) yields only modest RMSE improvements (\(\sim\)3% in ST-GCN+BW) at notable latency overheads. This aligns with findings in the multi-modal fusion literature, where naïve feature stacking often fails to harness cross-modal synergies without sophisticated alignment or attention mechanisms37. It suggests that 6G metrics are only beneficial when integrated through mechanisms that can selectively attend to the most informative network conditions.

  • Interpretation of 6G-feature utility: The observation that 6G context features provide limited predictive benefit must be interpreted with caution. Because CQ and BW were sampled from simple independent distributions, the resulting signals lack realistic temporal and spatial dynamics observed in operational wireless systems. Real 6G traces-exhibiting fading, handover discontinuities, multi-slice orchestration, and congestion dynamics-may introduce stronger correlations with traffic states. Therefore, the present findings do not generalize to actual 6G deployments, but instead quantify performance under simplified synthetic conditions.

  • Error concentration and outlier dynamics: Our sensor-wise RMSE boxplots (Fig. 7) and spatio-temporal error heatmap (Fig. 12) show that although average errors are low, a small subset of sensors-often located at on-ramps, merges, or busy interchanges-and specific time windows account for the majority of large residuals. These “hard” cases likely correspond to non-recurrent congestion events or rapid demand fluctuations that violate the diffusion assumption. Similar patterns have been observed in urban transit studies, where incident detection modules are required to capture these anomalies31.

Practical implications for ITS deployment:

  • Edge inference feasibility: The sub-25 ms inference time of DCRNN on commodity CPU hardware suggests it can be deployed on roadside units (RSUs) or in-vehicle edge processors to provide near-real-time predictions. This capability enables dynamic traffic control strategies-such as adaptive ramp metering and variable speed limits-to react swiftly to predicted congestion patterns8.

  • Model maintenance and retraining cadence: Our experiments assume a static graph over a four-month data span. In practice, roadway networks and communication links evolve (construction, accidents, changing slice configurations). However, Fig. 9 shows the graph is nearly fully connected, implying that small topological changes may not substantially degrade model performance. Thus, periodic retraining (e.g. weekly or monthly) rather than continuous graph adaptation may suffice, reducing operational overhead.

  • Modular extension for local high-variance zones: The identification of persistent error hotspots at specific sensors suggests a hybrid modeling approach: deploy a global DCRNN for the majority of sensors, and lightweight specialized submodels or anomaly detectors for the few problematic nodes. Such modular architectures have been proposed in hybrid weather forecasting systems, improving tail-error performance without sacrificing global fluency35.

Methodological limitations: Despite its strengths, our study has several limitations that qualify the generality of our conclusions:

  • Synthetic nature of 6G features: The 6G-related variables (channel quality and slice bandwidth) in our experiments were generated from simple i.i.d. distributions to offer lightweight contextual signals; however, such synthetic features do not capture critical wireless behaviors, including small-scale fading, Doppler-induced variability, mobility-driven handovers, or slice-orchestration dynamics such as resource-block scheduling and admission control11. As a result, the modest performance gains observed for our 6G-aware variants (e.g. DCRNN6G) should not be interpreted as evidence that wireless context provides limited utility. Rather, the gap highlights the need for richer cross-modal modeling-e.g. attention-based fusion, heterogeneous graph integration, or co-training strategies-which may reveal substantially larger benefits when supplied with realistic 5G/6G telemetry. Bridging this simulation-to-reality divide using real operator datasets or high-fidelity network simulators remains an essential direction for future work.

  • Fixed window and horizon: We fix \(S=12\) and \(P=3\) (1 h input, 15 min output). Although aligned with common practice23, different corridors or peak vs. off-peak periods may call for adaptive window sizing or multi-horizon forecasting, which can be addressed via hierarchical RNNs or dilated temporal convolutions25.

  • Point forecasts without uncertainty quantification: Our use of MSE loss (Eq. 34) yields point estimates and does not quantify predictive uncertainty. For applications like incident management or autonomous vehicle routing, measures of confidence (e.g. quantile regression42, Bayesian GNNs43) are critical.

  • Single-city focus: All experiments rely on the METR-LA dataset. While it is a standard benchmark, its freeway-centric topology differs from urban street grids or public transit networks. Cross-city validation using LOOP44 or PeMS-Bay datasets would strengthen external validity.

  • Simplified wireless modeling: A key limitation is the use of synthetically generated 6G features. These variables omit fading, mobility-driven handovers, slice-level resource orchestration, and spatiotemporally correlated wireless impairments. As a result, conclusions regarding the impact of 6G context should not be interpreted as reflective of real 6G environments.

Future research directions: Building on our insights and acknowledging the limitations, we identify five promising research directions:

  • Heterogeneous graph fusion: Develop fused graphs that jointly model road topology and communication infrastructure (e.g. base station locations, slice networks). Cross-graph attention mechanisms can learn when to attend to network vs. traffic signals, akin to coattention in multi-modal transformers45. Formally, one could define:

    $$\begin{aligned} \beta _{ij} = \textrm{softmax}\bigl (\phi ([W_s s_i\Vert W_n n_j])\bigr ), \end{aligned}$$
    (37)

    where \(n_j\) are network nodes, enabling richer cross-domain message passing.

  • Dynamic graph learning: Incorporate temporal evolution of adjacency matrices \(A_t\) to reflect changing traffic patterns and network conditions. Approaches such as EvolveGCN28 use RNNs to update GCN weights over time; similar techniques could adapt edge weights based on recent traffic or CQI observations.

  • Incident-aware and multi-task forecasting: Augment the primary forecasting task with auxiliary tasks-such as incident detection or congestion classification-to help the model allocate capacity where it matters most. Multi-task learning frameworks have improved robustness in healthcare and finance applications37.

  • Probabilistic and Bayesian GNNs: Extend deterministic models to output full predictive distributions. Quantile regression networks42 or Bayesian GNN formulations35 can provide uncertainty bounds, crucial for risk-aware decision support.

  • Real-world 5G/6G testbed validation: Deploy and evaluate models on live 5G/6G testbeds with real network slicing, URLLC sessions, and mobility scenarios. Such studies will uncover practical issues-latency variance, measurement noise, packet loss-that are absent in simulations.

Concluding remarks

Our study presents a comprehensive and rigorously controlled evaluation of diffusion-based, convolution-based, and attention-based graph architectures for traffic forecasting under emerging 6G-aware conditions. While the overall architectural novelty is intentionally incremental-with synthetic 6G features and lightweight conditioning mechanisms-our contribution lies in providing the first unified and systematically benchmarked framework for assessing how wireless context may influence spatiotemporal traffic prediction. The results demonstrate that diffusion-recurrent models consistently deliver the most favorable balance between accuracy and real-time inference, whereas naïve 6G feature concatenation yields only modest gains under simplified synthetic conditions.

Importantly, to strengthen the reproducibility and future impact of this line of research, we plan to release our 6G-feature generator and full benchmarking pipeline as an open, stand-alone toolkit. Such a resource will enable the community to (i) easily simulate diverse wireless conditions, (ii) incorporate real 5G/6G traces when available, and (iii) perform controlled cross-modal ablations at scale. We believe this open benchmarking platform will support deeper investigations into advanced fusion mechanisms-such as cross-modal attention, heterogeneous graph coupling, and co-training strategies-and ultimately help unlock the full predictive benefits of real 6G telemetry in next-generation intelligent transportation systems.

Conclusion

In this work, we have presented a comprehensive study of spatio-temporal graph neural networks for real-time traffic forecasting in the context of emerging 6G networks. Building on the METR-LA benchmark, we:

  • Developed a rigorous Dataset Description and Research Framework (Sect. "Dataset description and research framework"), including cleaned speed data, simulated slice-bandwidth and channel-quality features, and a sparse Gaussian-kernel graph representation.

  • Designed and implemented three baseline architectures (ST-GCN23, ST-GAT12, and DCRNN5) alongside our 6G-aware DCRNN6G variant that conditionally weights diffusion by slice-bandwidth (Sect. "Methodology").

  • Proposed an enhanced training protocol featuring per-sensor normalization, early stopping, learning-rate scheduling, gradient clipping, and LayerNorm within diffusion GRU cells to ensure stable, reproducible performance.

  • Conducted exhaustive Experimental results and analysis (Sect. "Experimental results and analysis"), comparing accuracy (MAE, RMSE), inference latency, convergence behavior, error distributions, graph topology diagnostics, temporal correlations, and feature statistics across eleven figures and detailed ablation studies.

  • Demonstrated that DCRNN achieves the best accuracy–latency trade-off (RMSE 0.0361, 23.6 ms) and that naïve 6G feature stacking yields only marginal gains, highlighting the need for more expressive multimodal fusion strategies.

  • Identified key limitations-synthetic network features, static graph assumption, lack of uncertainty quantification, and single-city focus-and outlined rich avenues for future work (Sect. "Discussion and future directions"), including heterogeneous graph fusion, dynamic adjacency learning, incident-aware modules, probabilistic forecasting, and real-world 5G/6G testbed validation.

Generalizability of 6G findings:

Because the channel-quality and bandwidth traces were synthetically generated without incorporating fading statistics, handover events, or slice-management dynamics, the conclusions regarding the limited impact of 6G context are restricted to the simplified setting studied here. Richer and more realistic wireless traces may enable significantly stronger cross-modal benefits, as shown in recent works on channel-aware forecasting and network digital twins. Evaluating our framework on real 5G/6G logs, ray-tracing-derived channel maps, or open testbeds constitutes an important direction for future work.

Future work will incorporate realistic 6G measurements, such as ray-traced propagation data, OAI/ns-3-based emulation, and operator-derived slice telemetry, enabling a more faithful evaluation of how 6G dynamics influence short-horizon traffic prediction.

Limitations

While our study advances the state of the art, several limitations merit mention:

  1. Synthetic 6G features: Our slice-bandwidth and CQI signals are simulated rather than measured from real networks, which may not capture the full complexity of 6G channel dynamics11.

  2. Static graph topology: We assume a fixed adjacency matrix over four months of data; urban road networks and communication links can evolve due to construction, incidents, or dynamic slicing, suggesting the need for dynamic graph learning28.

  3. Point forecasting only: We minimize MSE to produce point estimates and do not quantify predictive uncertainty, which is crucial for risk-aware traffic management and safety-critical applications43.

  4. Single-city evaluation: Our experiments focus solely on METR-LA; generalization to other cities with different road geometries or sensor deployments (e.g. LOOP44) remains to be validated.

  5. Fixed temporal horizon: We use a single input window (1 h) and forecast horizon (15 min); adaptive windowing or multi-horizon forecasting could better capture diverse traffic patterns39.