Conditional autoregressive model based on next scale prediction for missing data reconstruction

Wang, Shuang; Wang, Xiangpeng; Yang, Yuhan; Jiang, Peifan; Wang, Bin; Li, Yuanhao

doi:10.1038/s41598-025-08830-5


Article
Open access
Published: 04 July 2025



Scientific Reports volume 15, Article number: 23904 (2025)


Abstract

Seismic data collected under complex field conditions often contain missing traces. Traditional theory-driven methods rely heavily on empirically selected parameters and struggle to reconstruct continuous missing traces effectively. With advancements in deep learning, various generative models have exhibited strong reconstruction capabilities. However, diffusion model-based methods face significant reconstruction time overhead due to their iterative sampling strategies. Existing transformer-based autoregressive methods flatten two-dimensional seismic data into one-dimensional sequences, disrupting the inherent two-dimensional structure and compromising the spatial locality of seismic information. To address these limitations, we propose a conditional autoregressive model based on next-scale prediction. Starting from the smallest scale, the model incrementally predicts larger-scale data using information from preceding smaller scales, ultimately achieving robust data reconstruction. This next-scale prediction approach avoids flattening the data, thereby preserving its spatial structure. Additionally, conditional constraints during autoregressive generation ensure that the predicted data at each scale remains consistent and aligns with the distribution of the known data. Reconstruction experiments on both field and synthetic datasets demonstrate that our method achieves superior reconstruction accuracy compared to existing approaches and effectively handles various complex missing data scenarios.


Introduction

Seismic data collected under complex field conditions often contain missing traces, presenting significant challenges for subsequent data processing^1,2. Consequently, reconstructing complete seismic data is a crucial step in ensuring the effectiveness of data processing.

Traditional theory-driven methods primarily reconstruct complete seismic data based on their physical properties^3,4,5. For example, the f-k interpolation method performs interpolation by transforming data into the f-k domain^6,7, while wave-equation-based methods leverage subsurface velocity as a prior^8,9. However, obtaining accurate subsurface velocity priors remains a significant challenge. Another approach involves transforming seismic data into sparse domains and applying compressed sensing theory for interpolation under sparse constraints^10,11. These traditional methods, despite their reliance on physical properties, are heavily dependent on empirically selected parameters and struggle to effectively reconstruct continuous missing data¹².

The development of deep learning has led to significant interest in using data-driven generative neural networks for seismic data reconstruction^13,14. For instance, ResNet-based methods have demonstrated the potential of neural networks in this field¹⁵. Similarly, Wang et al. utilized convolutional autoencoders to highlight the effectiveness of convolutional networks in seismic data reconstruction¹⁶. Zhang et al. combined deep learning-based image denoising techniques with projection onto convex sets (POCS)¹⁷ interpolation methods, enhancing the reconstruction capabilities of traditional approaches¹⁸. With advancements in generative adversarial networks (GAN), Kaur et al. proposed a self-learning GAN¹⁹ for seismic data reconstruction²⁰. Yu et al. further improved reconstruction performance by introducing attention mechanisms and hybrid loss functions into UNet architectures²¹. Guo et al. employed transformers to address the limitations of convolutional neural networks in global modeling²². Recently, diffusion models²³ have drawn attention for their strong generative capabilities, particularly in seismic data reconstruction²⁴. Conditional diffusion models have shown impressive performance, effectively overcoming the constraints of convolutional networks²⁵.

While diffusion models^26,27 and transformer-based networks²⁸ address the limitations of convolutional networks in global modeling²⁹, diffusion models demand significant computational resources and time due to their iterative sampling strategies³⁰, leading to considerably prolonged reconstruction processes. Existing transformer-based autoregressive methods rely on next-token prediction³¹, where each data point is treated as a token. This approach flattens the two-dimensional seismic data into a one-dimensional sequence and performs token-by-token predictions³². However, this flattening disrupts the inherent two-dimensional structure of the data, compromising its spatial locality. Additionally, the unidirectional dependency of next-token prediction in autoregressive models restricts these methods to predicting subsequent data solely based on preceding data^33,34,35. As a result, they struggle to account for future dependencies and face challenges in accurately predicting upper portions of the data when constrained by information from the bottom.

To address these challenges, we propose SeisCAR, a conditional autoregressive model based on next-scale prediction. Unlike next-token prediction methods, our model employs a next-scale prediction strategy, as illustrated in Fig. 1. Beginning with a randomly sampled smallest scale, the model progressively predicts larger-scale data by utilizing information from preceding smaller scales, ultimately reconstructing the complete dataset. Since next-scale prediction avoids flattening the data into a one-dimensional sequence, it effectively preserves the spatial structure of the data. However, generating data from a randomly sampled smallest scale may introduce discrepancies between the generated data and the distribution of known data. To mitigate this, we incorporate conditional constraints to ensure alignment between the predicted data at each scale and the known data, maintaining consistency and a shared probability distribution. The primary contributions of this study are as follows:

1. We propose a seismic data reconstruction method based on an autoregressive model employing next-scale prediction. By progressively predicting larger scales rather than flattening the data for next-token prediction, this approach effectively preserves the spatial structure of the data, leading to enhanced reconstruction performance.
2. We incorporate conditional constraints into the autoregressive model to ensure that the predicted next scale aligns closely with the known data in both consistency and distribution.
3. Experimental results on both synthetic and field datasets demonstrate that the proposed method achieves more accurate interpolation than existing approaches.

Related work

Autoregressive data generation originated in image synthesis³⁶, initially adopting the next-token prediction approach^37,38, where each subsequent token is predicted sequentially to construct the image. In this process, two-dimensional data are typically encoded into one-dimensional token sequences using specific encoding methods, which are later learned and reorganized into two-dimensional structures. For a token sequence $x = (x_1, x_2, \dots , x_T)$, the prediction of the next token $x_t$ using an autoregressive model depends exclusively on the preceding $t-1$ tokens ($x_1, x_2, \dots , x_{t-1}$). This relationship is formally expressed as:

$$\begin{aligned} p(x_1, x_2, \ldots , x_T) = \prod _{t=1}^T p(x_t | x_1, x_2, \ldots , x_{t-1}) \end{aligned}$$

(1)
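As a minimal illustration of Eq. (1), the sketch below scores a short token sequence with the chain rule. The tables `start` and `cond` are hypothetical stand-ins for a learned network, and the toy model conditions only on the previous token, whereas a real autoregressive model conditions on the full prefix.

```python
import numpy as np

def sequence_log_prob(tokens, start, cond):
    """Log-probability of `tokens` accumulated factor by factor, as in Eq. (1).

    start[v]   : p(x_1 = v)
    cond[u, v] : p(x_t = v | x_{t-1} = u)  (toy first-order assumption)
    """
    log_p = np.log(start[tokens[0]])
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        log_p += np.log(cond[prev, cur])
    return float(log_p)

# Hypothetical two-symbol vocabulary.
start = np.array([0.5, 0.5])
cond = np.array([[0.9, 0.1],
                 [0.2, 0.8]])
log_p = sequence_log_prob([0, 0, 1], start, cond)  # log(0.5 * 0.9 * 0.1)
```

Each factor depends only on earlier tokens, which is the unidirectional dependency discussed above.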

Next-token prediction-based methods disrupt the two-dimensional structure of the data and the inherent dependencies between data points. To address these limitations, next-scale prediction-based generative methods have been developed. Starting from the smallest 1 $\times$ 1 scale, an autoregressive model incrementally generates larger scales, which are ultimately decoded into complete images. This approach preserves the two-dimensional structure of the data and captures the overall dependencies within the dataset, achieving state-of-the-art performance in image generation.

Methods

After transforming the autoregressive reconstruction model from next-token prediction to next-scale prediction, the unit of autoregressive computation shifts from a single token to a token map $r_k$, where $r_k \in (r_1, r_2, \ldots , r_K)$ and the scale of $r_k$ progressively increases. The autoregressive expression is then given by:

$$\begin{aligned} p(r_1, r_2, \ldots , r_K) = \prod _{k=1}^K p(r_k | r_1, r_2, \ldots , r_{k-1}) \end{aligned}$$

(2)

where $r_k$ denotes the token map at the $k$-th scale, and the scale increases with $k$. The smallest token map is 1 $\times$ 1, analogous to a single token in next-token prediction. During training, a multi-scale variational autoencoder (VAE) encodes the feature map of the real data into token maps of different scales to train the autoregressive model.

During generation, the process begins with a randomly sampled 1 $\times$ 1 token map for next-scale prediction. Since generation starts from a randomly sampled token map, the resulting data may not align with the distribution of the known data. To ensure that the generated data matches the distribution of the known data, we incorporate conditional constraints based on the known data during next-scale prediction, thereby maintaining greater consistency between the generated and known data.

Multi-scale VAE

The multi-scale VAE serves as the first-stage model³⁹. It is trained to generate multi-scale token maps for the conditional autoregressive model and is also used to reconstruct the complete data after the autoregressive model generates data at the largest scale, as shown in Fig. 2.

Our multi-scale VAE is implemented based on the vector quantized generative adversarial network (VQGAN)³¹, with the addition of $K$ extra convolutional layers, $\{\phi _k\}_{k=1}^{K}$, to encode the multi-scale token maps. The complete encoding process is outlined in Algorithm 1. Given input data $x$, the feature map $f = \varepsilon (x)$ is obtained through the VAE encoder. This feature map is then interpolated into a token map at the specified scale, and the information represented by the token map at the current scale is subtracted from the overall feature map $f$. This residual-like design has been shown to outperform independent interpolation⁴⁰. The reconstruction process from the multi-scale token maps is illustrated in Algorithm 2. Given a set of token maps, they are progressively added to the feature map $f$, and the VAE decoder is finally used to decode and obtain the data, $x = D(f)$.
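The residual encoding and additive reconstruction described above can be sketched as follows. This is a simplified illustration and not the authors' Algorithms 1 and 2: it works on a single-channel square feature map, uses nearest-neighbour interpolation, and omits the vector quantization step and the convolutional layers $\phi _k$. The function names and the scale schedule are assumptions.

```python
import numpy as np

def resize_nearest(a, size):
    """Nearest-neighbour resize of a square 2-D array to (size, size)."""
    rows = (np.arange(size) * a.shape[0] / size).astype(int)
    cols = (np.arange(size) * a.shape[1] / size).astype(int)
    return a[np.ix_(rows, cols)]

def encode_multiscale(f, scales):
    """Split feature map f into residual maps, from coarse to fine."""
    residual = f.astype(float).copy()
    maps = []
    for s in scales:
        r = resize_nearest(residual, s)             # map at the current scale
        maps.append(r)
        residual -= resize_nearest(r, f.shape[0])   # remove what r explains
    return maps

def decode_multiscale(maps, full_size):
    """Rebuild the feature map by adding the upsampled maps back together."""
    f_hat = np.zeros((full_size, full_size))
    for r in maps:
        f_hat += resize_nearest(r, full_size)
    return f_hat

rng = np.random.default_rng(0)
f = rng.standard_normal((16, 16))       # placeholder feature map
scales = (1, 2, 4, 8, 16)               # hypothetical scale schedule
maps = encode_multiscale(f, scales)
f_hat = decode_multiscale(maps, 16)
```

Because the last scale equals the full resolution, the final residual is zero and the sum of the upsampled maps recovers `f`. With quantization in the loop the recovery would only be approximate.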

Conditional autoregressive model

With the multi-scale token maps encoded by the multi-scale VAE, training of the autoregressive model can proceed. Starting from a randomly sampled 1 $\times$ 1 token map, next-scale prediction is performed to generate data. At this stage, the generated data may exhibit discrepancies in distribution compared to the known data. To address this, we incorporate the known data as conditional constraints into the autoregressive model, as shown in Fig. 3. By conditioning on the known data, the generated data can maintain greater consistency with the known data.

The network architecture of the conditional autoregressive model is shown in Fig. 4. After embedding the known token map and condition together, they are processed through $N$ CAR blocks, which ultimately output the predicted next token map. The overall architecture follows a GPT-style design⁴¹, with each CAR block consisting of a self-attention module and an FFN module.
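One way to let a GPT-style self-attention module respect the ordering of Eq. (2) is a block-wise causal mask, in which a token may attend to every token at its own scale or a smaller one. The text does not state the mask used in SeisCAR, so the sketch below is an assumption about how such a mask could be built; the function name and scale schedule are hypothetical.

```python
import numpy as np

def blockwise_causal_mask(scales):
    """Boolean mask M with M[i, j] = True if token i may attend to token j.

    Token maps of side lengths `scales` are flattened and concatenated from
    the smallest to the largest scale.
    """
    # Scale index of every token in the concatenated sequence.
    scale_id = np.concatenate(
        [np.full(s * s, k) for k, s in enumerate(scales)]
    )
    return scale_id[:, None] >= scale_id[None, :]

mask = blockwise_causal_mask((1, 2, 3))   # 1 + 4 + 9 = 14 tokens
```

Unlike the token-wise triangular mask of next-token prediction, all tokens within one scale see each other, so each token map is predicted as a whole.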

Examples

Evaluation metrics

To quantitatively evaluate the quality of the reconstruction results, we selected three commonly used metrics: mean square error (MSE), signal-to-noise ratio (SNR), and structure similarity index measure (SSIM). MSE measures the error between the reconstruction results and the ground truth, and its calculation formula is as follows:

$$\begin{aligned} MSE=\frac{1}{n}\sum _{i=1}^{n}(x_{i}^{r}-x_{i}^{t})^2 \end{aligned}$$

(3)

Here, $x_i^r$ represents the data reconstructed by the network, and $x_i^t$ represents the ground truth. The closer the value of MSE is to 0, the more similar the reconstruction result is to the ground truth. SNR measures the quality of the reconstruction, and its calculation formula is as follows:

$$\begin{aligned} SNR=10\log _{10}\frac{\parallel x_r \parallel _F^2 }{\parallel x_t-x_r \parallel _F^2} \end{aligned}$$

(4)

Here, $x_r$ represents the data reconstructed by the network, $x_t$ represents the ground truth, and $\Vert \cdot \Vert _F$ denotes the Frobenius norm. The larger the SNR value, the higher the quality of the reconstruction. SSIM measures the structural similarity of the reconstruction result, and its calculation formula is as follows:

$$\begin{aligned} SSIM=\frac{(2\mu _r\mu _t+c_1)(2\sigma _{rt}+c_2)}{(\mu _r^2+\mu _t^2+c_1)(\sigma _r^2+\sigma _t^2+c_2)} \end{aligned}$$

(5)

Here, $\mu _r$ is the mean of the reconstruction result, $\mu _t$ is the mean of the ground truth, $\sigma _{rt}$ is the covariance between the reconstruction result and the ground truth, $\sigma _r^2$ is the variance of the reconstruction result, $\sigma _t^2$ is the variance of the ground truth, and $c_1$ and $c_2$ are two constants introduced to avoid numerical instability. The closer the SSIM value is to 1, the more similar the reconstruction result is to the ground truth.
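The three metrics can be computed along the following lines. This is a sketch under stated assumptions: SNR follows Eq. (4) as written, SSIM uses the standard form with $\mu _r^2+\mu _t^2$ in the denominator and global statistics over the whole patch (practical implementations usually average SSIM over local windows), and the default values of `c1` and `c2` are illustrative, since the text does not report them.

```python
import numpy as np

def mse(x_r, x_t):
    """Mean square error between reconstruction x_r and ground truth x_t."""
    return float(np.mean((x_r - x_t) ** 2))

def snr(x_r, x_t):
    """SNR in dB, following Eq. (4) as written (squared Frobenius norms)."""
    num = np.sum(x_r ** 2)
    den = np.sum((x_t - x_r) ** 2)
    return float(10.0 * np.log10(num / den))

def ssim_global(x_r, x_t, c1=1e-4, c2=9e-4):
    """Single-window SSIM from global means, variances and covariance."""
    mu_r, mu_t = x_r.mean(), x_t.mean()
    var_r, var_t = x_r.var(), x_t.var()
    cov_rt = np.mean((x_r - mu_r) * (x_t - mu_t))
    num = (2 * mu_r * mu_t + c1) * (2 * cov_rt + c2)
    den = (mu_r ** 2 + mu_t ** 2 + c1) * (var_r + var_t + c2)
    return float(num / den)

rng = np.random.default_rng(0)
x_t = rng.standard_normal((64, 64))                 # placeholder ground truth
x_r = x_t + 0.1 * rng.standard_normal((64, 64))     # placeholder reconstruction
scores = (mse(x_r, x_t), snr(x_r, x_t), ssim_global(x_r, x_t))
```

A perfect reconstruction gives an MSE of 0 and an SSIM of 1. The SNR is undefined in that case because the denominator vanishes.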

Training

We conducted experiments on two publicly available datasets: the SEG C3 synthetic dataset and the Mobil AVO Viking Graben Line 12 field dataset. The SEG C3 synthetic dataset consists of 45 shot gathers, each with a 201 $\times$ 201 receiver grid. The sampling interval is 8 ms, with 625 samples per trace. From this dataset, we randomly cropped 2000 patches, each of size 256 $\times$ 256, using 1600 patches for training and 400 for testing. The Mobil AVO Viking Graben Line 12 field dataset has a 1001 $\times$ 120 receiver grid, with a sampling interval of 4 ms and 1500 samples per trace. Similarly, we randomly cropped 2000 patches, each of size 256 $\times$ 256, with 1600 patches used for training and 400 for testing. We trained the model for 200,000 steps on an Nvidia A6000 GPU. During training, we set the number of CAR blocks $N$ to 16, the batch size to 16, and used AdamW as the optimizer with a learning rate of 0.0001.
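The patch extraction and the 80/20 split can be sketched as below. The section array, its shape and the helper names are placeholders rather than the authors' data pipeline, and the demo draws only 20 patches to stay small; the experiments above use 2000 patches split into 1600 and 400.

```python
import numpy as np

def random_patches(section, n_patches, size, rng):
    """Randomly crop `n_patches` square patches from a 2-D section."""
    n_rows, n_cols = section.shape
    tops = rng.integers(0, n_rows - size + 1, size=n_patches)
    lefts = rng.integers(0, n_cols - size + 1, size=n_patches)
    return np.stack([section[t:t + size, l:l + size]
                     for t, l in zip(tops, lefts)])

def split_train_test(patches, train_fraction=0.8):
    """Split patches into training and test subsets (80/20 by default)."""
    n_train = int(round(train_fraction * len(patches)))
    return patches[:n_train], patches[n_train:]

rng = np.random.default_rng(0)
# Placeholder 2-D section (time samples x traces); real data would be loaded.
section = rng.standard_normal((625, 400)).astype(np.float32)
patches = random_patches(section, n_patches=20, size=256, rng=rng)
train, test = split_train_test(patches)
```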

We selected four of the most advanced and distinct reconstruction methods for comparison: the UNet-based Anet method²¹, which modifies the network’s loss function by incorporating a hyperparameter loss and introduces an attention mechanism into the UNet architecture; the transformer-based MST method²², a U-shaped transformer architecture; STUGAN, a generative adversarial network (GAN)-based reconstruction model that uses the Swin Transformer as its backbone⁴²; and the Conditional Constraint Diffusion Model (CCDM)²⁵, which enhances the diffusion model by introducing conditional constraints and modifying its sampling process. For the comparison, we used the same dataset, set the hyperparameters as described in the respective papers, and trained each model for 200,000 steps on an Nvidia A6000 GPU.

Synthetic dataset

To validate the effectiveness of the proposed method, we first conducted experiments using the SEG C3 synthetic dataset. We then evaluated its performance under two types of missing data scenarios: random and continuous missing data.

Random missing

For the random missing data scenario, the missing ratio was set to 70%, with missing traces represented by zeros. Figure 5 presents the reconstruction results of the different methods. It can be observed that all five methods successfully reconstructed the data; however, the residual plots demonstrate that our method yielded the best reconstruction results.
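A random-missing test case of this kind can be simulated as in the following sketch. It assumes that patches are stored as (time samples, traces), so a missing trace is a zeroed column; the function name and the placeholder patch are hypothetical.

```python
import numpy as np

def random_trace_mask(n_traces, missing_ratio, rng):
    """Boolean mask over traces: True = recorded, False = missing."""
    n_missing = int(round(missing_ratio * n_traces))
    mask = np.ones(n_traces, dtype=bool)
    mask[rng.choice(n_traces, size=n_missing, replace=False)] = False
    return mask

rng = np.random.default_rng(0)
data = rng.standard_normal((256, 256))        # placeholder patch
mask = random_trace_mask(data.shape[1], 0.70, rng)
observed = data * mask[None, :]               # missing traces become zeros
```

The mask also identifies which traces are known, which is the information a conditional constraint needs.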

To quantitatively compare the performance of the five methods, we calculated three evaluation metrics, as shown in Table 1. Our method outperforms the others, achieving the best reconstruction performance and successfully reconstructing the data.

Table 1 Comparison of five reconstruction networks under random missing on synthetic dataset.

Continuous missing

To evaluate the model’s reconstruction performance for large-scale continuous missing data, we simulated a scenario with a missing range of 50 traces. The missing traces were represented by setting their values to 0. The reconstruction results of the five methods are shown in Fig. 6. It can be observed that the reconstruction results of Anet, based on convolution, exhibit inconsistencies, likely due to the limited receptive field of convolution, which hinders global modeling. According to the residual maps, our method achieved the best reconstruction performance.
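A continuous gap can be simulated in the same way by zeroing a block of adjacent traces. In this sketch the (time samples, traces) layout and the gap position `start=75` are assumptions, since the text specifies only the gap width of 50 traces.

```python
import numpy as np

def continuous_trace_mask(n_traces, gap, start):
    """Boolean mask with `gap` consecutive missing traces from `start`."""
    if start < 0 or start + gap > n_traces:
        raise ValueError("gap must lie inside the patch")
    mask = np.ones(n_traces, dtype=bool)
    mask[start:start + gap] = False
    return mask

rng = np.random.default_rng(0)
data = rng.standard_normal((256, 256))                 # placeholder patch
mask = continuous_trace_mask(256, gap=50, start=75)    # start is hypothetical
observed = np.where(mask[None, :], data, 0.0)          # zero the missing block
```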

To quantitatively evaluate the reconstruction performance under continuous missing data, we computed three evaluation metrics, as shown in Table 2. The results indicate that Anet performs the worst in this scenario. The performance of the diffusion model is comparable to that of our method, though slightly inferior.

Table 2 Comparison of five reconstruction networks under continuous missing on synthetic dataset.

To better illustrate SeisCAR’s reconstruction performance and amplitude preservation, we plotted the reconstruction of the 100th trace when 50 continuous traces were missing. The results of all five methods are shown in Fig. 7. SeisCAR provides the best reconstruction of local details, with the reconstructed data closely matching the original seismic records.

To better compare the five methods, we plotted the f-k spectra under the condition of 50 continuous missing traces, as shown in Fig. 8. It can be observed that our method exhibits the closest resemblance to the ground truth with the fewest artifacts. In contrast, Anet and the other three methods exhibit some degree of spatial aliasing, indicating that SeisCAR is effective in recovering missing seismic trace data.
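An f-k amplitude spectrum of the kind compared here can be obtained with a two-dimensional Fourier transform, as in this sketch. It assumes a (time samples, traces) layout. The 8 ms sampling interval is taken from the SEG C3 description, while the 25 m trace spacing and the synthetic test signal are hypothetical.

```python
import numpy as np

def fk_spectrum(section, dt, dx):
    """Amplitude f-k spectrum of a (time, trace) section.

    Returns the centred amplitude spectrum with its frequency axis (Hz)
    and wavenumber axis (cycles per unit of dx).
    """
    spec = np.fft.fftshift(np.fft.fft2(section))
    f = np.fft.fftshift(np.fft.fftfreq(section.shape[0], d=dt))
    k = np.fft.fftshift(np.fft.fftfreq(section.shape[1], d=dx))
    return np.abs(spec), f, k

dt, dx = 0.008, 25.0                     # dx is a hypothetical spacing
t = np.arange(250) * dt
# Test signal: a 12.5 Hz sinusoid repeated identically on 64 traces.
section = np.sin(2 * np.pi * 12.5 * t)[:, None] * np.ones((1, 64))
amp, f, k = fk_spectrum(section, dt, dx)
i, j = np.unravel_index(np.argmax(amp), amp.shape)
```

For this test signal the energy concentrates at a frequency of 12.5 Hz and zero wavenumber, which gives a quick check that the axes are oriented as intended.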

To evaluate the reliability of the models, we generated uncertainty maps of the reconstruction results under the condition of 50 continuous missing traces, as shown in Fig. 9. It can be observed that our method demonstrates higher reliability compared to the other methods, indicating that SeisCAR can reliably recover missing seismic trace data.

Field dataset

To further evaluate the applicability of our method, we conducted experiments using the Mobil AVO Viking Graben Line 12 field dataset. The experiments were divided into two scenarios: random missing data and continuous missing data.

Random missing

For the random missing data scenario, the missing ratio was set to 70%, with traces assigned a value of 0 to represent the missing data. Figure 10 illustrates the reconstruction results of the different methods. It can be observed that all five methods successfully reconstructed the data; however, the residual maps indicate that our method provided the best reconstruction results.

To quantitatively assess the reconstruction performance for random missing data on the field dataset, we calculated three evaluation metrics, as shown in Table 3. The results indicate that our method achieves the best reconstruction performance.

Table 3 Comparison of five reconstruction networks under random missing on field dataset.

Continuous missing

To evaluate the model’s reconstruction performance for continuous missing data on the field dataset, we simulated a scenario with 40 consecutive missing traces, represented by traces set to 0. The reconstruction results of the five methods are shown in Fig. 11. Although the convolution-based Anet successfully reconstructed the data, the reconstructed results significantly deviated from the ground truth due to the limitations of convolution. The residual maps further confirm that our method achieved the best reconstruction performance.

Table 4 presents the results of three evaluation metrics. These results demonstrate that our method achieves the best reconstruction performance, with the smallest deviation from the ground truth.

Table 4 Comparison of five reconstruction networks under continuous missing on field dataset.

To further evaluate the model’s reconstruction performance, we plotted the reconstruction of the 60th trace with 40 consecutive missing traces from the field data, as shown in Fig. 12. The reconstruction result of SeisCAR is closest to the ground truth, with the smallest deviation, demonstrating that SeisCAR achieves the best performance.

To better compare the five methods, we generated f-k spectra under the condition of 40 continuous missing traces, as shown in Fig. 13. Compared to the other methods, SeisCAR achieves the best detail restoration, demonstrating its effectiveness in recovering missing seismic trace data.

To assess the reliability of the model on field data, we generated uncertainty maps of the reconstruction results under the condition of 40 continuous missing traces, as shown in Fig. 14. The results demonstrate that SeisCAR can reliably reconstruct complex field datasets.

Generalization

To assess the generalization capability of the proposed method, we conducted an experiment using the SEG C3 synthetic dataset for training. The model, trained on SEG C3, was then directly applied to reconstruct a more complex field dataset collected in Tibet, China. In this experiment, we simulated a scenario with 30 consecutive missing traces, which were set to 0 to represent the missing data for reconstruction. The reconstruction results of the five methods are shown in Fig. 15. It can be observed that Anet, being a convolutional point-to-point reconstruction method, introduces significant information from the training set into the reconstruction. In contrast, both CCDM and our method, which incorporate known data as constraints, exhibit stronger consistency in the reconstructed data.

Table 5 presents three evaluation metrics used to quantitatively assess the model’s generalization capability. It can be observed that Anet exhibits the weakest generalization ability. In contrast, both CCDM and our method, which incorporate known data as constraints, show superior generalization performance compared to point-to-point reconstruction models. Our method, in particular, demonstrates robust generalization capability.

Table 5 Comparison of generalization ability of five reconstruction networks with 30 continuous missing traces in field dataset.

Table 6 Comparison of adaptability of five reconstruction networks with 30 continuous missing traces in field dataset.

Table 7 Comparison of computational efficiency.

Transfer learning

To evaluate the adaptability of the proposed method to different datasets, we conducted transfer learning by applying the model trained on SEG C3 to the target dataset. The target dataset consists of seismic data collected from Tibet, China, which was used for the generalization experiments. During transfer learning, only a small sample of 200 patches was selected for training, and the model was then used to reconstruct data with 30 continuous missing traces. The reconstruction results, shown in Fig. 16, indicate that SeisCAR exhibits the strongest adaptability when trained with a small sample, effectively learning the characteristics of the target dataset.

In Table 6, we computed three evaluation metrics to quantitatively assess the model’s performance. The results indicate that Anet exhibits the weakest adaptability, whereas our method demonstrates strong adaptability, successfully adapting to the target data through small-sample transfer learning.

Computational efficiency

To verify that SeisCAR reduces computational costs compared to diffusion-based methods while maintaining efficiency comparable to other transformer-based models, we analyzed memory consumption during training, inference time, and model complexity.

Table 8 Different missing scenarios.

The hardware and software configurations for both training and inference were standardized as follows: an NVIDIA A6000 GPU and an Intel Xeon 64-core processor were used, with PyTorch 2.2.1 and CUDA 11.8 for implementation. During training, we used the SEG C3 dataset with data dimensions of 256 $\times$ 256, setting the batch size to 8. The memory consumption during training is reported in Table 7. The results indicate that CCDM has the highest memory consumption and complexity, primarily because its constrained diffusion model employs two separate networks, one for handling constraints and the other for processing data. Our method exhibits memory usage comparable to MST, while STUGAN consumes more due to the additional discriminator in its architecture.

We also measured inference time for reconstruction using the SEG C3 dataset with 50 continuous missing traces, as illustrated in Fig. 6a. The inference time comparison, shown in Table 7, reveals that CCDM has the longest inference time due to its iterative sampling process.

Table 9 Different missing rate.

Other missing scenarios

To evaluate whether our method can handle missing data in different situations, including some extreme cases, we devised several mixed missing scenarios, as shown in Fig. 17b and d. Figure 17d represents a highly extreme missing scenario, where a malfunction occurs in a geophone midway through its operation, causing several nearby geophones to also fail, resulting in data being collected from only the upper half of the array.

It can be observed that our method is capable of effectively reconstructing the data across different missing scenarios. We quantitatively calculated the results for various missing cases, as shown in Table 8. The reconstruction performance does not degrade with the variation in missing data scenarios.

To discuss the maximum missing rate at which the proposed method can reconstruct data, we set up several extreme missing rate scenarios, specifically 85%, 90%, and 95% random missing. Using the SEG C3 dataset for testing, the reconstruction results are shown in Table 9.

It can be observed that when the missing rate reaches 95%, the performance begins to decline significantly, indicating that the maximum missing rate SeisCAR can handle is between 90% and 95%, and it is unable to process data with more than 95% missingness.

Noisy data

To evaluate the robustness of the proposed method to noise, we examined the impact of noise on reconstruction results. We introduced field noise collected during actual construction into the data, as shown in Fig. 18a and b, which display the test data with severe noise and the original data, respectively. Subsequently, a 60% random missing rate was applied, as illustrated in Fig. 18c. The model trained on SEG C3 was then used to reconstruct the noisy and incomplete data.

Table 10 Contribution of conditional constraints.

The reconstruction results are shown in Fig. 18d, demonstrating that SeisCAR can effectively reconstruct the missing data, which confirms the robustness of the proposed method to noise. However, in the blank regions of the original data, some noise-like artifacts were also reconstructed, indicating that a model trained on clean data cannot completely eliminate noise interference. To enhance reconstruction performance, we trained the model using a noisy dataset. By introducing field noise collected from actual construction into the SEG C3 dataset, we created a noisy dataset for training, following the same training procedure as for the clean data. The resulting reconstruction, shown in Fig. 18e, illustrates that after training with noisy data, the model can directly reconstruct clean data.

Ablation study

To quantify the contribution of conditional constraints and evaluate the appropriateness of the settings for $N$ and the learning rate, we conducted ablation experiments on the SEG C3 dataset.

The reconstruction results with and without conditional constraints are shown in Fig. 19. Although the reconstructed data without conditional constraints appears visually similar to the ground truth, the generated data in this case does not align with the distribution of the known data.

In Table 10, we computed three evaluation metrics to quantitatively assess the impact of conditional constraints. The results indicate that conditional constraints enable the generated data to maintain a higher consistency with the known data.

To evaluate the appropriateness of the $N$ setting, we varied its value to 12, 16, 20, and 24 and assessed the reconstruction performance on the SEG C3 dataset, where the reconstructed data involved 50 continuous missing traces.

The results, shown in Table 11, indicate that when $N$ is set to 12, the model size is insufficient to fully represent the entire dataset, leading to a performance decline. When $N$ is set to 16 and 20, the model capacity increases, but the performance does not improve significantly, suggesting that 16 is a reasonable choice. However, when $N$ is increased to 24, performance degradation occurs, likely because a larger model requires more training effort to reach optimal performance.

Table 11 The impact of CAR Block number.

We investigated the impact of the learning rate on training performance, using the SEG C3 dataset with 50 continuous missing traces to evaluate reconstruction performance. As shown in Table 12, it can be observed that when the learning rate is too large, training quickly converges to a suboptimal state, resulting in degraded reconstruction performance. On the other hand, when the learning rate is too small, the convergence process slows down, requiring more training resources. Therefore, setting the learning rate to 0.0001 is a reasonable choice.

Table 12 The impact of learning rate.

Higher resolution

To test the performance of SeisCAR on higher resolution data, we conducted tests on large-scale data collected from the North China Plain in eastern China. During training, we set the size of each patch to 512 $\times$ 512. The model was trained for 200,000 steps on an Nvidia A6000 GPU. The settings for training were $N = 16$, batch size = 16, and AdamW as the optimizer with a learning rate of 0.0001. The trained model was then used to reconstruct missing data from the test set, with missing scenarios being 70% random missing and 70 continuous missing traces. The reconstruction results, shown in Fig. 20, demonstrate that the proposed method is still able to effectively reconstruct data even with higher resolution, exhibiting strong scalability.

Conclusions

This paper introduces a conditional autoregressive model based on next-scale prediction for seismic data reconstruction. Unlike methods relying on next-token prediction, the proposed model begins with the smallest scale and progressively predicts larger-scale data by utilizing information from preceding smaller scales, ultimately reconstructing the data. Since next-scale prediction does not require unfolding the data into a one-dimensional sequence, it effectively preserves the spatial structure of the data. During autoregressive generation, conditional constraints are incorporated to ensure that the predicted data at each scale remain consistent with the known data and follow its distribution. Experimental results on synthetic and field datasets demonstrate that our method yields more accurate interpolation results than existing methods.

Data availability

The Society of Exploration Geophysicists (SEG) C3 dataset used in this study is available at https://wiki.seg.org/wiki/SEG_C3_45_shot. The Mobil AVO Viking Graben Line 12 field dataset used in this study is available at https://wiki.seg.org/wiki/Mobil_AVO_viking_graben_line_12.

Code availability

Our implementation is available at https://github.com/WAL-l/SeisCAR.

Funding

This research was funded by the Natural Science Foundation of Sichuan Province (Grant No. 2025ZNSFSC0312), Longyan Key Projects (Grant No. 2023LYF9003) and the Sichuan Achievement Transformation Program (Grant No. 2024ZHCG0022).

Author information

Authors and Affiliations

Key Laboratory of Earth Exploration and Information Techniques of Education Ministry, College of Geophysics, Chengdu University of Technology, Chengdu, 610059, China
Shuang Wang, Xiangpeng Wang, Yuhan Yang, Peifan Jiang & Yuanhao Li
College of Computer Science and Cyber Security, Chengdu University of Technology, Chengdu, 610059, China
Bin Wang

Contributions

Conceptualization, methodology, software, validation, formal analysis, and original draft preparation, S.W.; data curation, Y.Y.; visualization, B.W. and Y.L.; review and editing, P.J. and X.W.

Corresponding author

Correspondence to Xiangpeng Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wang, S., Wang, X., Yang, Y. et al. Conditional autoregressive model based on next scale prediction for missing data reconstruction. Sci Rep 15, 23904 (2025). https://doi.org/10.1038/s41598-025-08830-5

Received: 15 January 2025
Accepted: 24 June 2025
Published: 04 July 2025
Version of record: 04 July 2025
DOI: https://doi.org/10.1038/s41598-025-08830-5
