Abstract
Seismic data collected under complex field conditions often contain missing traces. Traditional theory-driven methods rely heavily on empirically selected parameters and struggle to reconstruct continuous missing traces effectively. With advancements in deep learning, various generative models have exhibited strong reconstruction capabilities. However, diffusion model-based methods face significant reconstruction time overhead due to their iterative sampling strategies. Existing transformer-based autoregressive methods flatten two-dimensional seismic data into one-dimensional sequences, disrupting the inherent two-dimensional structure and compromising the spatial locality of seismic information. To address these limitations, we propose a conditional autoregressive model based on next-scale prediction. Starting from the smallest scale, the model incrementally predicts larger-scale data using information from preceding smaller scales, ultimately achieving robust data reconstruction. This next-scale prediction approach avoids flattening the data, thereby preserving its spatial structure. Additionally, conditional constraints during autoregressive generation ensure that the predicted data at each scale remains consistent and aligns with the distribution of the known data. Reconstruction experiments on both field and synthetic datasets demonstrate that our method achieves superior reconstruction accuracy compared to existing approaches and effectively handles various complex missing data scenarios.
Introduction
Seismic data collected under complex field conditions often contain missing traces, presenting significant challenges for subsequent data processing1,2. Consequently, reconstructing complete seismic data is a crucial step in ensuring the effectiveness of data processing.
Traditional theory-driven methods primarily reconstruct complete seismic data based on their physical properties3,4,5. For example, the f-k interpolation method performs interpolation by transforming data into the f-k domain6,7, while wave-equation-based methods leverage subsurface velocity as a prior8,9. However, obtaining accurate subsurface velocity priors remains a significant challenge. Another approach involves transforming seismic data into sparse domains and applying compressed sensing theory for interpolation under sparse constraints10,11. These traditional methods, despite their reliance on physical properties, are heavily dependent on empirically selected parameters and struggle to effectively reconstruct continuous missing data12.
The development of deep learning has led to significant interest in using data-driven generative neural networks for seismic data reconstruction13,14. For instance, ResNet-based methods have demonstrated the potential of neural networks in this field15. Similarly, Wang et al. utilized convolutional autoencoders to highlight the effectiveness of convolutional networks in seismic data reconstruction16. Zhang et al. combined deep learning-based image denoising techniques with projection onto convex sets (POCS)17 interpolation methods, enhancing the reconstruction capabilities of traditional approaches18. With advancements in generative adversarial networks (GAN), Kaur et al. proposed a self-learning GAN19 for seismic data reconstruction20. Yu et al. further improved reconstruction performance by introducing attention mechanisms and hybrid loss functions into UNet architectures21. Guo et al. employed transformers to address the limitations of convolutional neural networks in global modeling22. Recently, diffusion models23 have drawn attention for their strong generative capabilities, particularly in seismic data reconstruction24. Conditional diffusion models have shown impressive performance, effectively overcoming the constraints of convolutional networks25.
While diffusion models26,27 and transformer-based networks28 address the limitations of convolutional networks in global modeling29, diffusion models demand significant computational resources and time due to their iterative sampling strategies30, leading to considerably prolonged reconstruction processes. Existing transformer-based autoregressive methods rely on next-token prediction31, where each data point is treated as a token. This approach flattens the two-dimensional seismic data into a one-dimensional sequence and performs token-by-token predictions32. However, this flattening disrupts the inherent two-dimensional structure of the data, compromising its spatial locality. Additionally, the unidirectional dependency of next-token prediction in autoregressive models restricts these methods to predicting subsequent data solely based on preceding data33,34,35. As a result, they struggle to account for future dependencies and face challenges in accurately predicting upper portions of the data when constrained by information from the bottom.
To address these challenges, we propose SeisCAR, a conditional autoregressive model based on next-scale prediction. Unlike next-token prediction methods, our model employs a next-scale prediction strategy, as illustrated in Fig. 1. Beginning with a randomly sampled smallest scale, the model progressively predicts larger-scale data by utilizing information from preceding smaller scales, ultimately reconstructing the complete dataset. Since next-scale prediction avoids flattening the data into a one-dimensional sequence, it preserves the spatial structure of the data. However, generating data from a randomly sampled smallest scale may introduce discrepancies between the generated data and the distribution of the known data. To mitigate this, we incorporate conditional constraints so that the predicted data at each scale remains consistent with the known data and follows the same probability distribution. The primary contributions of this study are as follows:
1. We propose a seismic data reconstruction method based on an autoregressive model employing next-scale prediction. By progressively predicting larger scales rather than flattening the data for next-token prediction, this approach preserves the spatial structure of the data, leading to enhanced reconstruction performance.
2. We incorporate conditional constraints into the autoregressive model so that the predicted token map at each scale aligns closely with the known data in both consistency and distribution.
3. Experimental results on both synthetic and field datasets demonstrate that the proposed method achieves more accurate interpolation than existing approaches.
(a) Reconstruction process based on next-token prediction. (b) Reconstruction process based on next-scale prediction.
Related work
Autoregressive data generation originated in image synthesis36, initially adopting the next-token prediction approach37,38, where each subsequent token is predicted sequentially to construct the image. In this process, two-dimensional data are typically encoded into one-dimensional token sequences using specific encoding methods, which are later learned and reorganized into two-dimensional structures. For a token sequence \(x = (x_1, x_2, \dots , x_T)\), the prediction of token \(x_t\) using an autoregressive model depends exclusively on the preceding \(t-1\) tokens (\(x_1, x_2, \dots , x_{t-1}\)). This relationship is formally expressed as:
\[p(x_1, x_2, \dots , x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, x_2, \dots , x_{t-1})\]
Next-token prediction-based methods disrupt the two-dimensional structure of the data and the inherent dependencies between data points. To address these limitations, next-scale prediction-based generative methods have been developed. Starting from the smallest 1 \(\times\) 1 scale, an autoregressive model incrementally generates larger scales, which are ultimately decoded into complete images. This approach preserves the two-dimensional structure of the data and captures the overall dependencies within the dataset, achieving state-of-the-art performance in image generation.
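The loss of spatial locality caused by flattening can be illustrated with a small sketch. It assumes a row-major (raster-scan) ordering, which is one common choice and not necessarily the encoding used by the cited methods; the grid size and the helper name `raster_index` are hypothetical.

```python
import numpy as np

def raster_index(row, col, width):
    """Position of element (row, col) in a row-major (raster-scan) flattening."""
    return row * width + col

# Hypothetical small grid standing in for a 2-D seismic section.
height, width = 4, 6
grid = np.arange(height * width).reshape(height, width)
flat = grid.reshape(-1)  # row-major flattening into a 1-D sequence

# Horizontal neighbours stay adjacent in the sequence...
horizontal_gap = raster_index(1, 3, width) - raster_index(1, 2, width)
# ...but vertical neighbours end up `width` positions apart.
vertical_gap = raster_index(2, 2, width) - raster_index(1, 2, width)

print(horizontal_gap, vertical_gap)
```

For a 256-sample-wide patch the same vertical neighbours would be separated by 256 sequence positions, which is the kind of disruption that next-scale prediction avoids.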
Methods
After transforming the autoregressive reconstruction model from next-token prediction to next-scale prediction, the unit of autoregressive computation shifts from a single token to a token map \(r_k\), \(k = 1, 2, \ldots , K\), where the scale of \(r_k\) progressively increases. The autoregressive expression is then given by:
\[p(r_1, r_2, \ldots , r_K) = \prod_{k=1}^{K} p(r_k \mid r_1, r_2, \ldots , r_{k-1})\]
Here, each \(r_k\) is a token map whose scale increases with \(k\). The smallest token map is 1 \(\times\) 1, analogous to a single token in next-token prediction. During training, a multi-scale variational autoencoder (VAE) encodes the feature map of the real data into token maps of different scales to train the autoregressive model.
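To make the change of autoregressive unit concrete, the following sketch counts the tokens contributed by each scale and builds a block-wise causal attention mask in which every token of scale \(k\) may attend to all tokens of scales \(1, \ldots , k\). Both the scale schedule `(1, 2, 4)` and the use of a block-wise causal mask are assumptions borrowed from next-scale image generation; the text does not specify how the dependency on preceding scales is enforced.

```python
import numpy as np

def block_causal_mask(scales):
    """Boolean mask M[i, j]: True if token i may attend to token j.

    Tokens belonging to scale k may attend to every token of scales 1..k,
    so dependencies run from smaller scales to larger ones only.
    """
    sizes = [s * s for s in scales]                       # tokens per scale
    scale_id = np.repeat(np.arange(len(scales)), sizes)   # scale index per token
    return scale_id[:, None] >= scale_id[None, :]

scales = (1, 2, 4)            # hypothetical schedule: 1x1, 2x2, 4x4 token maps
mask = block_causal_mask(scales)
total_tokens = mask.shape[0]  # 1 + 4 + 16 tokens across all scales
print(total_tokens)
```

Within one scale the mask is fully connected, so all positions of a token map can be predicted in parallel rather than one token at a time.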
During generation, the process begins with a randomly sampled 1 \(\times\) 1 token map for next-scale prediction. Since generation starts from a randomly sampled token map, the resulting data may not align with the distribution of the known data. To ensure that the generated data matches the distribution of the known data, we incorporate conditional constraints based on the known data during next-scale prediction, thereby maintaining greater consistency between the generated and known data.
Multi-scale VAE
The multi-scale VAE serves as the first-stage model39. It is trained to generate multi-scale token maps for the conditional autoregressive model and is also used to reconstruct the complete data after the autoregressive model generates data at the largest scale, as shown in Fig. 2.
Encoding data into multi-scale token maps and reconstructing data based on multi-scale token maps.
Our multi-scale VAE is implemented based on the vector quantized generative adversarial network (VQGAN)31, with the addition of \(K\) extra convolutional layers, \(\{\phi _k\}_{k=1}^{K}\), to encode the multi-scale token maps. The complete encoding process is outlined in Algorithm 1. Given input data \(x\), the feature map \(f = \varepsilon (x)\) is obtained through the VAE encoder. This feature map is then interpolated into a token map at the specified scale, and the information represented by the token map at the current scale is subtracted from the overall feature map \(f\), so that each subsequent scale encodes only the remaining residual. This residual-like design has been shown to outperform independent interpolation40. The reconstruction process from the multi-scale token maps is illustrated in Algorithm 2. Given a set of token maps, they are progressively added to the feature map \(f\), and the VAE decoder is finally used to decode and obtain the data, \(x = D(f)\).
Algorithm 1. Encoding
Algorithm 2. Reconstruction
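The residual encoding and the additive reconstruction described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it omits vector quantization and the convolutional layers \(\phi _k\), uses block averaging and nearest-neighbour upsampling as stand-ins for the interpolation, and operates on a single-channel square feature map. The function names and the scale schedule are hypothetical.

```python
import numpy as np

def downsample(f, s):
    """Block-average a square (h, h) map to (s, s); h must be divisible by s."""
    b = f.shape[0] // s
    return f.reshape(s, b, s, b).mean(axis=(1, 3))

def upsample(r, h):
    """Nearest-neighbour upsampling of an (s, s) map to (h, h)."""
    b = h // r.shape[0]
    return np.repeat(np.repeat(r, b, axis=0), b, axis=1)

def encode(f, scales):
    """Residual multi-scale encoding: each scale encodes what is still unexplained."""
    residual = f.copy()
    token_maps = []
    for s in scales:
        r = downsample(residual, s)                   # token map at scale s
        token_maps.append(r)
        residual = residual - upsample(r, f.shape[0])  # remove encoded information
    return token_maps

def reconstruct(token_maps, h):
    """Progressively add the upsampled token maps back into the feature map."""
    f_hat = np.zeros((h, h))
    for r in token_maps:
        f_hat += upsample(r, h)
    return f_hat

rng = np.random.default_rng(0)
f = rng.standard_normal((16, 16))   # stand-in for the encoder feature map
scales = (1, 2, 4, 8, 16)           # hypothetical schedule ending at full size
token_maps = encode(f, scales)
f_hat = reconstruct(token_maps, 16)
coarse_error = np.abs(reconstruct(token_maps[:3], 16) - f).mean()
full_error = np.abs(f_hat - f).mean()
print(coarse_error, full_error)
```

Because the last scale in this sketch matches the feature-map size, the final residual vanishes and the sum of all upsampled token maps recovers \(f\) up to floating-point error, while using only the coarser maps leaves a larger error.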
Conditional autoregressive model
With the multi-scale token maps encoded by the multi-scale VAE, training of the autoregressive model can proceed. Starting from a randomly sampled 1 \(\times\) 1 token map, next-scale prediction is performed to generate data. At this stage, the generated data may exhibit discrepancies in distribution compared to the known data. To address this, we incorporate the known data as conditional constraints into the autoregressive model, as shown in Fig. 3. By conditioning on the known data, the generated data can maintain greater consistency with the known data.
Using known data as a condition guides the model to generate more consistent data.
The network architecture of the conditional autoregressive model is shown in Fig. 4. The known token map and the condition are embedded together and processed through \(N\) CAR blocks, which output the predicted next token map. The overall architecture follows a GPT-style design41, with each CAR block consisting of a self-attention module and a feed-forward network (FFN) module.
Conditional autoregressive model network architecture.
Examples
Evaluation metrics
To quantitatively evaluate the quality of the reconstruction results, we selected three commonly used metrics: mean square error (MSE), signal-to-noise ratio (SNR), and structural similarity index measure (SSIM). MSE measures the error between the reconstruction results and the ground truth, and its calculation formula is as follows:
\[\textrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left( x_i^r - x_i^t \right)^2\]
Here, \(x_i^r\) represents the data reconstructed by the network, \(x_i^t\) represents the ground truth, and \(n\) is the number of data points. The closer the value of MSE is to 0, the more similar the reconstruction result is to the ground truth. SNR measures the quality of the reconstruction, and its calculation formula is as follows:
\[\textrm{SNR} = 10 \log_{10} \frac{\Vert x_t \Vert _F^2}{\Vert x_t - x_r \Vert _F^2}\]
Here, \(x_r\) represents the data reconstructed by the network, \(x_t\) represents the ground truth, and \(\Vert \cdot \Vert _F\) denotes the Frobenius norm. The larger the SNR value, the higher the quality of the reconstruction. SSIM measures the structural similarity of the reconstruction result, and its calculation formula is as follows:
\[\textrm{SSIM} = \frac{(2 u_r u_t + c_1)(2 \sigma _{rt} + c_2)}{(u_r^2 + u_t^2 + c_1)(\sigma _r + \sigma _t + c_2)}\]
Here, \(u_r\) is the mean of the reconstruction result, \(u_t\) is the mean of the ground truth, \(\sigma _{rt}\) is the covariance between the reconstruction result and the ground truth, \(\sigma _r\) is the variance of the reconstruction result, \(\sigma _t\) is the variance of the ground truth, and \(c_1\) and \(c_2\) are two constants introduced to avoid numerical instability. The closer the SSIM value is to 1, the more similar the reconstruction result is to the ground truth.
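As a concrete illustration of the three metrics, the sketch below evaluates them with NumPy under the standard definitions. It is an assumption-laden example: the SSIM here is computed over a single global window (published implementations usually average over local windows), and the default values of `c1` and `c2` are hypothetical.

```python
import numpy as np

def mse(x_r, x_t):
    """Mean square error between reconstruction x_r and ground truth x_t."""
    return float(np.mean((x_r - x_t) ** 2))

def snr(x_r, x_t):
    """SNR in dB; np.linalg.norm gives the Frobenius norm for 2-D arrays."""
    signal = np.linalg.norm(x_t) ** 2
    noise = np.linalg.norm(x_t - x_r) ** 2
    return float(10.0 * np.log10(signal / noise))

def ssim_global(x_r, x_t, c1=1e-4, c2=9e-4):
    """SSIM over one global window; c1 and c2 are illustrative constants."""
    u_r, u_t = x_r.mean(), x_t.mean()
    var_r, var_t = x_r.var(), x_t.var()
    cov_rt = np.mean((x_r - u_r) * (x_t - u_t))
    numerator = (2 * u_r * u_t + c1) * (2 * cov_rt + c2)
    denominator = (u_r ** 2 + u_t ** 2 + c1) * (var_r + var_t + c2)
    return float(numerator / denominator)

# Toy example: the "reconstruction" is the ground truth plus a constant offset.
x_t = np.arange(16, dtype=float).reshape(4, 4)
x_r = x_t + 0.1
print(mse(x_r, x_t), snr(x_r, x_t), ssim_global(x_r, x_t))
```

Note that `snr` is undefined when the reconstruction equals the ground truth exactly, since the denominator is then zero.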
Training
We conducted experiments on two publicly available datasets: the SEG C3 synthetic dataset and the Mobil AVO Viking Graben Line 12 field dataset. The SEG C3 synthetic dataset consists of 45 shot gathers, each with a 201 \(\times\) 201 receiver grid. The sampling interval is 8 ms, with 625 samples per trace. From this dataset, we randomly cropped 2000 patches, each of size 256 \(\times\) 256, using 1600 patches for training and 400 for testing. The Mobil AVO Viking Graben Line 12 field dataset has a 1001 \(\times\) 120 receiver grid, with a sampling interval of 4 ms and 1500 samples per trace. Similarly, we randomly cropped 2000 patches, each of size 256 \(\times\) 256, with 1600 patches used for training and 400 for testing. We trained the model for 200,000 steps on an Nvidia A6000 GPU. During training, we set the number of CAR blocks \(N\) to 16 and the batch size to 16, and used AdamW as the optimizer with a learning rate of 0.0001.
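The patch preparation described above can be sketched as follows. The example assumes the data are available as a single 2-D array (time samples by traces); the array shape, the random stand-in data, and the helper name `random_patches` are hypothetical. To keep the example light it crops 20 patches and keeps the same 80/20 split, rather than the 2000 patches (1600 training, 400 testing) used in the experiments.

```python
import numpy as np

def random_patches(section, n_patches, size, rng):
    """Crop n_patches random (size, size) windows from a 2-D section."""
    n_t, n_x = section.shape
    tops = rng.integers(0, n_t - size + 1, size=n_patches)    # upper-left rows
    lefts = rng.integers(0, n_x - size + 1, size=n_patches)   # upper-left columns
    return np.stack([section[t:t + size, l:l + size]
                     for t, l in zip(tops, lefts)])

rng = np.random.default_rng(0)
# Random numbers standing in for a real (time samples, traces) section.
section = rng.standard_normal((625, 600)).astype(np.float32)

patches = random_patches(section, n_patches=20, size=256, rng=rng)
n_train = int(0.8 * len(patches))            # same 80/20 ratio as 1600/400
train_patches, test_patches = patches[:n_train], patches[n_train:]
print(train_patches.shape, test_patches.shape)
```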
We selected four advanced and architecturally distinct reconstruction methods for comparison: the UNet-based Anet method21, which introduces an attention mechanism into the UNet architecture and modifies the network's loss function by incorporating a hybrid loss; the transformer-based MST method22, a U-shaped transformer architecture; STUGAN, a generative adversarial network (GAN)-based reconstruction model that uses a Swin Transformer as its backbone42; and the Conditional Constraint Diffusion Model (CCDM)25, which enhances the diffusion model by introducing conditional constraints and modifying its sampling process. For the comparison, we used the same datasets, set the hyperparameters as described in the respective papers, and trained each model for 200,000 steps on an Nvidia A6000 GPU.
Synthetic dataset
To validate the effectiveness of the proposed method, we first conducted experiments using the SEG C3 synthetic dataset. We then evaluated its performance under two types of missing data scenarios: random and continuous missing data.
Random missing
For the random missing data scenario, the missing ratio was set to 70%, with missing traces represented by zeros. Figure 5 presents the reconstruction results of the different methods. It can be observed that all five methods successfully reconstructed the data; however, the residual plots demonstrate that our method yielded the best reconstruction results.
(a) 70% random missing data, (b) ground truth, (c)–(g) are the reconstruction results of Anet, MST, STUGAN, CCDM, and SeisCAR, respectively, and (h)–(l) are the residuals of each reconstruction result relative to the ground truth.
(a) 50 continuous missing traces, (b) ground truth, (c)–(g) are the reconstruction results of Anet, MST, STUGAN, CCDM, and SeisCAR, respectively, and (h)–(l) are the residuals of each reconstruction result relative to the ground truth.
To quantitatively compare the performance of the five methods, we calculated the three evaluation metrics, as shown in Table 1. Our method achieves the best values among the five methods.
Continuous missing
To evaluate the model’s reconstruction performance for large-scale continuous missing data, we simulated a scenario with a missing range of 50 traces. The missing traces were represented by setting their values to 0. The reconstruction results of the five methods are shown in Fig. 6. It can be observed that the reconstruction results of Anet, based on convolution, exhibit inconsistencies, likely due to the limited receptive field of convolution, which hinders global modeling. According to the residual maps, our method achieved the best reconstruction performance.
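The two missing-trace scenarios used in these experiments (random missing traces and a block of continuous missing traces, both zero-filled) can be simulated with a per-trace mask. This is a minimal sketch: the patch is random stand-in data, traces are assumed to lie along the second axis, and the start position of the gap is a hypothetical choice because the text does not state where the missing block begins.

```python
import numpy as np

def random_trace_mask(n_traces, missing_ratio, rng):
    """Per-trace mask: 1 = trace kept, 0 = trace missing (chosen at random)."""
    n_missing = int(round(n_traces * missing_ratio))
    mask = np.ones(n_traces, dtype=np.float32)
    mask[rng.choice(n_traces, size=n_missing, replace=False)] = 0.0
    return mask

def continuous_trace_mask(n_traces, gap, start):
    """Per-trace mask with `gap` consecutive missing traces from `start`."""
    mask = np.ones(n_traces, dtype=np.float32)
    mask[start:start + gap] = 0.0
    return mask

rng = np.random.default_rng(0)
data = rng.standard_normal((256, 256)).astype(np.float32)  # (time, traces)

random_mask = random_trace_mask(256, missing_ratio=0.70, rng=rng)
gap_mask = continuous_trace_mask(256, gap=50, start=100)   # start is hypothetical

# Missing traces are represented by zeros, as in the experiments.
data_random = data * random_mask[None, :]
data_gap = data * gap_mask[None, :]
print(int(random_mask.sum()), int(gap_mask.sum()))
```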
To quantitatively evaluate the reconstruction performance under continuous missing data, we computed three evaluation metrics, as shown in Table 2. The results indicate that Anet performs the worst in this scenario. The performance of the diffusion model is comparable to that of our method, though slightly inferior.
To better illustrate SeisCAR’s reconstruction performance and amplitude preservation, we plotted the reconstruction of the 100th trace when 50 continuous traces were missing. The results of all five methods are shown in Fig. 7. SeisCAR provides the best reconstruction of local details, with the reconstructed data closely matching the original seismic records.
Comparison of the 100th trace.
To better compare the five methods, we plotted the f-k spectra under the condition of 50 continuous missing traces, as shown in Fig. 8. It can be observed that the spectrum of our method most closely resembles that of the ground truth and contains the fewest artifacts. In contrast, Anet and the other three methods exhibit some degree of spatial aliasing, indicating that SeisCAR is effective in recovering missing seismic trace data.
f-k spectra. (a) ground truth, (b) 50 continuous missing traces, (c) Anet reconstruction, (d) MST reconstruction, (e) STUGAN reconstruction, (f) CCDM reconstruction, (g) SeisCAR reconstruction.
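An f-k spectrum of the kind compared here is commonly obtained from the two-dimensional Fourier transform of a section over time and trace position. The sketch below shows one such computation with NumPy and checks it on a synthetic plane wave. It is an illustrative assumption, not the plotting procedure used for the figure: the trace spacing `dx`, the section size, and the helper name `fk_spectrum` are hypothetical, and only the 8 ms sampling interval is taken from the text.

```python
import numpy as np

def fk_spectrum(section, dt, dx):
    """Amplitude f-k spectrum of a (time, trace) section.

    Returns the centred amplitude spectrum with its frequency (Hz) and
    wavenumber (1/m) axes.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(section))
    freqs = np.fft.fftshift(np.fft.fftfreq(section.shape[0], d=dt))
    wavenumbers = np.fft.fftshift(np.fft.fftfreq(section.shape[1], d=dx))
    return np.abs(spectrum), freqs, wavenumbers

# Synthetic plane wave whose frequency and wavenumber fall on FFT bins.
dt, dx = 0.008, 25.0             # 8 ms sampling; 25 m trace spacing is hypothetical
n_t, n_x = 128, 64
t = np.arange(n_t) * dt
x = np.arange(n_x) * dx
f0 = 16 / (n_t * dt)             # 15.625 Hz
k0 = 4 / (n_x * dx)              # 0.0025 cycles per metre
section = np.cos(2 * np.pi * (f0 * t[:, None] - k0 * x[None, :]))

amplitude, freqs, wavenumbers = fk_spectrum(section, dt, dx)
i_f, i_k = np.unravel_index(np.argmax(amplitude), amplitude.shape)
print(abs(freqs[i_f]), abs(wavenumbers[i_k]))
```

The energy of the plane wave concentrates at a single frequency-wavenumber pair (and its conjugate), whereas zeroed traces or reconstruction artifacts spread energy away from the true events.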
To evaluate the reliability of the models, we generated uncertainty maps of the reconstruction results under the condition of 50 continuous missing traces, as shown in Fig. 9. It can be observed that our method demonstrates higher reliability compared to the other methods, indicating that SeisCAR can reliably recover missing seismic trace data.
Uncertainty maps. (a) Anet reconstruction, (b) MST reconstruction, (c) STUGAN reconstruction, (d) CCDM reconstruction, (e) SeisCAR reconstruction.
Field dataset
To further evaluate the applicability of our method, we conducted experiments using the Mobil AVO Viking Graben Line 12 field dataset. The experiments were divided into two scenarios: random missing data and continuous missing data.
Random missing
For the random missing data scenario, the missing ratio was set to 70%, with traces assigned a value of 0 to represent the missing data. Figure 10 illustrates the reconstruction results of the different methods. It can be observed that all five methods successfully reconstructed the data; however, the residual maps indicate that our method provided the best reconstruction results.
(a) 70% random missing data, (b) ground truth, (c)–(g) are the reconstruction results of Anet, MST, STUGAN, CCDM, and SeisCAR, respectively, and (h)–(l) are the residuals of each reconstruction result relative to the ground truth.
To quantitatively assess the reconstruction performance for random missing data on the field dataset, we calculated three evaluation metrics, as shown in Table 3. The results indicate that our method achieves the best reconstruction performance.
Continuous missing
To evaluate the model’s reconstruction performance for continuous missing data on the field dataset, we simulated a scenario with 40 consecutive missing traces, represented by traces set to 0. The reconstruction results of the five methods are shown in Fig. 11. Although the convolution-based Anet successfully reconstructed the data, the reconstructed results significantly deviated from the ground truth due to the limitations of convolution. The residual maps further confirm that our method achieved the best reconstruction performance.
(a) 40 continuous missing traces, (b) ground truth, (c)–(g) are the reconstruction results of Anet, MST, STUGAN, CCDM, and SeisCAR, respectively, and (h)–(l) are the residuals of each reconstruction result relative to the ground truth.
Table 4 presents the results of three evaluation metrics. These results demonstrate that our method achieves the best reconstruction performance, with the smallest deviation from the ground truth.
To further evaluate the model’s reconstruction performance, we plotted the reconstruction of the 60th trace with 40 consecutive missing traces from the field data, as shown in Fig. 12. The reconstruction result of SeisCAR is closest to the ground truth, with the smallest deviation, demonstrating that SeisCAR achieves the best performance.
Comparison of the 60th trace.
To better compare the five methods, we generated f-k spectra under the condition of 40 continuous missing traces, as shown in Fig. 13. Compared to the other methods, SeisCAR achieves the best detail restoration, demonstrating its effectiveness in recovering missing seismic trace data.
f-k spectra. (a) ground truth, (b) 40 continuous missing traces, (c) Anet reconstruction, (d) MST reconstruction, (e) STUGAN reconstruction, (f) CCDM reconstruction, (g) SeisCAR reconstruction.
To assess the reliability of the model on field data, we generated uncertainty maps of the reconstruction results under the condition of 40 continuous missing traces, as shown in Fig. 14. The results demonstrate that SeisCAR can reliably reconstruct complex field datasets.
Uncertainty maps. (a) Anet reconstruction, (b) MST reconstruction, (c) STUGAN reconstruction, (d) CCDM reconstruction, (e) SeisCAR reconstruction.
(a) 30 continuous missing traces, (b) ground truth, (c)–(g) are the reconstruction results of Anet, MST, STUGAN, CCDM, and SeisCAR, respectively, and (h)–(l) are the residuals of each reconstruction result relative to the ground truth.
(a) 30 continuous missing traces, (b) ground truth, (c)–(g) are the reconstruction results of Anet, MST, STUGAN, CCDM, and SeisCAR, respectively, and (h)–(l) are the residuals of each reconstruction result relative to the ground truth.
Generalization
To assess the generalization capability of the proposed method, we conducted an experiment using the SEG C3 synthetic dataset for training. The model, trained on SEG C3, was then directly applied to reconstruct a more complex field dataset collected in Tibet, China. In this experiment, we simulated a scenario with 30 consecutive missing traces, which were set to 0 to represent the missing data for reconstruction. The reconstruction results of the five methods are shown in Fig. 15. It can be observed that Anet, being a convolutional point-to-point reconstruction method, introduces significant information from the training set into the reconstruction. In contrast, both CCDM and our method, which incorporate known data as constraints, exhibit stronger consistency in the reconstructed data.
Table 5 presents three evaluation metrics used to quantitatively assess the model’s generalization capability. It can be observed that Anet exhibits the weakest generalization ability. In contrast, both CCDM and our method, which incorporate known data as constraints, show superior generalization performance compared to point-to-point reconstruction models. Our method, in particular, demonstrates robust generalization capability.
Transfer learning
To evaluate the adaptability of the proposed method to different datasets, we conducted transfer learning by applying the model trained on SEG C3 to the target dataset. The target dataset consists of seismic data collected from Tibet, China, which was used for the generalization experiments. During transfer learning, only a small sample of 200 patches was selected for training, and the model was then used to reconstruct data with 30 continuous missing traces. The reconstruction results, shown in Fig. 16, indicate that SeisCAR exhibits the strongest adaptability when trained with a small sample, effectively learning the characteristics of the target dataset.
In Table 6, we computed three evaluation metrics to quantitatively assess the model's performance. The results indicate that Anet exhibits the weakest adaptability, whereas our method adapts successfully to the target data through small-sample transfer learning.
Computational efficiency
To verify that SeisCAR reduces computational costs compared to diffusion-based methods while maintaining efficiency comparable to other transformer-based models, we analyzed memory consumption during training, inference time, and model complexity.
The hardware and software configurations for both training and inference were standardized as follows: an NVIDIA A6000 GPU and an Intel Xeon 64-core processor were used, with PyTorch 2.2.1 and CUDA 11.8 for implementation. During training, we used the SEG C3 dataset with data dimensions of 256 \(\times\) 256, setting the batch size to 8. The memory consumption during training is reported in Table 7. The results indicate that CCDM has the highest memory consumption and complexity, primarily because its constrained diffusion model employs two separate networks, one for handling constraints and the other for processing data. Our method exhibits memory usage comparable to MST, while STUGAN consumes more due to the additional discriminator in its architecture.
We also measured inference time for reconstruction using the SEG C3 dataset with 50 continuous missing traces, as illustrated in Fig. 6a. The inference time comparison, shown in Table 7, reveals that CCDM has the longest inference time due to its iterative sampling process.
Other missing scenarios
To evaluate whether our method can handle missing data in different situations, including some extreme cases, we devised several mixed missing scenarios, as shown in Fig. 17b and d. Figure 17d represents a highly extreme missing scenario, where a malfunction occurs in a geophone midway through its operation, causing several nearby geophones to also fail, resulting in data being collected from only the upper half of the array.
(a) Ground truth, (b) continuous and random missing data mixture, (c) reconstruction results for (b), (d) extreme missing scenario with random missing data and sensor failure, (e) reconstruction results for (d).
(a) Ground truth, (b) Added field noise, (c) 60% random missingness, (d) Reconstruction results from a model trained on clean data, (e) Reconstruction results from a model trained on noisy data.
It can be observed that our method is capable of effectively reconstructing the data across different missing scenarios. We quantitatively calculated the results for various missing cases, as shown in Table 8. The reconstruction performance does not degrade with the variation in missing data scenarios.
To discuss the maximum missing rate at which the proposed method can reconstruct data, we set up several extreme missing rate scenarios, specifically 85%, 90%, and 95% random missing. Using the SEG C3 dataset for testing, the reconstruction results are shown in Table 9.
It can be observed that performance begins to decline significantly when the missing rate reaches 95%, indicating that the maximum random missing rate SeisCAR can handle lies between 90% and 95%, beyond which reconstruction is no longer reliable.
Noisy data
To evaluate the robustness of the proposed method to noise, we examined the impact of noise on reconstruction results. We introduced field noise recorded during actual field operations into the data, as shown in Fig. 18a and b, which display the original data and the test data with severe noise, respectively. Subsequently, a 60% random missing rate was applied, as illustrated in Fig. 18c. The model trained on SEG C3 was then used to reconstruct the noisy and incomplete data.
The reconstruction results are shown in Fig. 18d, demonstrating that SeisCAR can still reconstruct the missing data in the presence of noise, which supports the robustness of the proposed method. However, in the blank regions of the original data, some noise-like artifacts were also reconstructed, indicating that a model trained on clean data cannot completely eliminate noise interference. To enhance reconstruction performance, we trained the model using a noisy dataset. By introducing field noise recorded during actual field operations into the SEG C3 dataset, we created a noisy dataset for training, following the same training procedure as for the clean data. The resulting reconstruction, shown in Fig. 18e, illustrates that after training with noisy data, the model can directly reconstruct clean data.
Ablation study
To quantify the contribution of conditional constraints and evaluate the appropriateness of the settings for \(N\) and the learning rate, we conducted ablation experiments on the SEG C3 dataset.
The reconstruction results with and without conditional constraints are shown in Fig. 19. Although the reconstructed data without conditional constraints appears visually similar to the ground truth, the generated data in this case does not align with the distribution of the known data.
(a) 50 continuous missing traces, (b) ground truth, (c) reconstruction results without conditional constraints, (d) reconstruction results with conditional constraints, and (e), (f) the residuals of each reconstruction result relative to the ground truth.
In Table 10, we computed three evaluation metrics to quantitatively assess the impact of conditional constraints. The results indicate that conditional constraints enable the generated data to maintain a higher consistency with the known data.
To evaluate the appropriateness of the \(N\) setting, we varied its value to 12, 16, 20, and 24 and assessed the reconstruction performance on the SEG C3 dataset, where the reconstructed data involved 50 continuous missing traces.
The results, shown in Table 11, indicate that when \(N\) is set to 12, the model capacity is insufficient to represent the dataset, leading to a performance decline. Increasing \(N\) from 16 to 20 enlarges the model but does not improve performance significantly, suggesting that 16 is a reasonable choice. When \(N\) is increased to 24, performance degrades, likely because a larger model requires more training to reach optimal performance.
We investigated the impact of the learning rate on training performance, using the SEG C3 dataset with 50 continuous missing traces to evaluate reconstruction performance. As shown in Table 12, it can be observed that when the learning rate is too large, training quickly converges to a suboptimal state, resulting in degraded reconstruction performance. On the other hand, when the learning rate is too small, the convergence process slows down, requiring more training resources. Therefore, setting the learning rate to 0.0001 is a reasonable choice.
Higher resolution
To test the performance of SeisCAR on higher resolution data, we conducted tests on large-scale data collected from the North China Plain in eastern China. During training, we set the size of each patch to 512 \(\times\) 512. The model was trained for 200,000 steps on an Nvidia A6000 GPU. The settings for training were \(N = 16\), batch size = 16, and AdamW as the optimizer with a learning rate of 0.0001. The trained model was then used to reconstruct missing data from the test set, with missing scenarios being 70% random missing and 70 continuous missing traces. The reconstruction results, shown in Fig. 20, demonstrate that the proposed method is still able to effectively reconstruct data even with higher resolution, exhibiting strong scalability.
(a) Ground truth, (b) random missing 70% traces, (c) reconstruction results of the model for (b), (d) 70 continuous missing traces, (e) reconstruction results of the model for (d).
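The two missing-trace scenarios used in this test can be illustrated with a minimal NumPy sketch. This is not taken from the SeisCAR implementation: the function names, the zero-filling of missing traces, the convention that traces lie along the second axis of a patch, and the random stand-in patch are all assumptions made for illustration.

```python
import numpy as np


def random_missing_mask(n_traces, missing_ratio, rng):
    """Boolean keep-mask with a fraction `missing_ratio` of traces dropped at random."""
    n_missing = int(round(missing_ratio * n_traces))
    missing_idx = rng.choice(n_traces, size=n_missing, replace=False)
    keep = np.ones(n_traces, dtype=bool)
    keep[missing_idx] = False
    return keep


def continuous_missing_mask(n_traces, gap_width, rng):
    """Boolean keep-mask with one contiguous gap of `gap_width` missing traces."""
    # The upper bound of rng.integers is exclusive, so the gap always fits.
    start = rng.integers(0, n_traces - gap_width + 1)
    keep = np.ones(n_traces, dtype=bool)
    keep[start:start + gap_width] = False
    return keep


def apply_trace_mask(patch, keep):
    """Zero out missing traces; `patch` is assumed to be (time_samples, traces)."""
    return patch * keep[np.newaxis, :]


rng = np.random.default_rng(0)
# Hypothetical stand-in for one 512 x 512 seismic patch.
patch = rng.standard_normal((512, 512)).astype(np.float32)

keep_random = random_missing_mask(512, 0.70, rng)   # 70% randomly missing traces
keep_gap = continuous_missing_mask(512, 70, rng)    # 70 continuous missing traces

observed_random = apply_trace_mask(patch, keep_random)
observed_gap = apply_trace_mask(patch, keep_gap)
```

Under these assumptions, `observed_random` and `observed_gap` play the role of panels (b) and (d), and the keep-masks mark the known traces that a conditional reconstruction would be expected to preserve.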
Conclusions
This paper introduces a conditional autoregressive model based on next-scale prediction for seismic data reconstruction. Unlike methods relying on next-token prediction, the proposed model begins with the smallest scale and progressively predicts larger-scale data using information from the preceding smaller scales, ultimately reconstructing the data. Because next-scale prediction does not require unfolding the data into a one-dimensional sequence, it preserves the spatial structure of the data. During autoregressive generation, conditional constraints are incorporated so that each predicted scale remains consistent with the known data and aligned with its distribution. Experiments on synthetic and field datasets demonstrate that our method produces more accurate interpolation results than existing methods.
Data availability
The Society of Exploration Geophysicists (SEG) C3 dataset used in this study is available at https://wiki.seg.org/wiki/SEG_C3_45_shot. The Mobil AVO Viking Graben Line 12 field dataset used in this study is available at https://wiki.seg.org/wiki/Mobil_AVO_viking_graben_line_12.
Code availability
Our implementation is available at https://github.com/WAL-l/SeisCAR.
References
Chai, X. et al. Deep learning for irregularly and regularly missing data reconstruction. Sci. Rep. 10, 3302 (2020).
Chen, Y., Chen, X., Wang, Y. & Zu, S. The interpolation of sparse geophysical data. Surv. Geophys. 40, 73–105 (2019).
Niu, X., Fu, L., Zhang, W. & Li, Y. Seismic data interpolation based on simultaneously sparse and low-rank matrix recovery. IEEE Trans. Geosci. Remote Sens. 60, 1–13 (2021).
Zhang, W., Fu, L. & Liu, Q. Nonconvex log-sum function-based majorization-minimization framework for seismic data reconstruction. IEEE Geosci. Remote Sens. Lett. 16, 1776–1780 (2019).
Innocent Oboué, Y., Chen, W., Wang, H. & Chen, Y. Robust damped rank-reduction method for simultaneous denoising and reconstruction of 5d seismic data. Geophysics 86, V71–V89 (2021).
Porsani, M. J. Seismic trace interpolation using half-step prediction filters. Geophysics 64, 1461–1467 (1999).
Gülünay, N. Seismic trace interpolation in the Fourier transform domain. Geophysics 68, 355–369 (2003).
Fomel, S. Seismic reflection data interpolation with differential offset and shot continuation. Geophysics 68, 733–744 (2003).
Ronen, J. Wave-equation trace interpolation. Geophysics 52, 973–984 (1987).
Latif, A. & Mousa, W. A. An efficient undersampled high-resolution radon transform for exploration seismic data processing. IEEE Trans. Geosci. Remote Sens. 55, 1010–1024 (2016).
Wang, J., Ng, M. & Perz, M. Seismic data interpolation by greedy local radon transform. Geophysics 75, WB225–WB234 (2010).
Chang, D. et al. Seismic data interpolation using dual-domain conditional generative adversarial networks. IEEE Geosci. Remote Sens. Lett. 18, 1856–1860 (2020).
He, T., Wu, B. & Zhu, X. Seismic data consecutively missing trace interpolation based on multistage neural network training process. IEEE Geosci. Remote Sens. Lett. 19, 1–5 (2021).
Wang, X. et al. Reconstructing regularly missing seismic traces with a classifier-guided diffusion model. IEEE Trans. Geosci. Remote Sens. 62, 1–14 (2024).
Wang, B., Zhang, N., Lu, W. & Wang, J. Deep-learning-based seismic data interpolation: A preliminary result. Geophysics 84, V11–V20 (2019).
Wang, Y., Wang, B., Tu, N. & Geng, J. Seismic trace interpolation for irregularly spatial sampled data using convolutional autoencoder. Geophysics 85, V119–V130 (2020).
Gao, J.-J., Chen, X.-H., Li, J.-Y., Liu, G.-C. & Ma, J. Irregular seismic data reconstruction based on exponential threshold model of POCS method. Appl. Geophys. 7, 229–238 (2010).
Zhang, H., Yang, X. & Ma, J. Can learning from natural image denoising be used for seismic data interpolation?. Geophysics 85, WA115–WA136 (2020).
Creswell, A. et al. Generative adversarial networks: An overview. IEEE Signal Process. Mag. 35, 53–65 (2018).
Kaur, H., Pham, N. & Fomel, S. Seismic data interpolation using deep learning with generative adversarial networks. Geophys. Prospect. 69, 307–326 (2021).
Yu, J. & Wu, B. Attention and hybrid loss guided deep learning for consecutively missing seismic data reconstruction. IEEE Trans. Geosci. Remote Sens. 60, 1–8 (2021).
Guo, Y., Fu, L. & Li, H. Seismic data interpolation based on multi-scale transformer. IEEE Geosci. Remote Sens. Lett. 20, 1–5 (2023).
Dhariwal, P. & Nichol, A. Diffusion models beat gans on image synthesis. Adv. Neural. Inf. Process. Syst. 34, 8780–8794 (2021).
Wang, S. et al. Seisfusion: Constrained diffusion model with input guidance for 3d seismic data interpolation and reconstruction. IEEE Trans. Geosci. Remote Sens. (2024).
Deng, F., Wang, S., Wang, X. & Fang, P. Seismic data reconstruction based on conditional constraint diffusion model. IEEE Geosci. Remote Sens. Lett. (2024).
Liu, Q. & Ma, J. Generative interpolation via a diffusion probabilistic model. Geophysics 89, V65–V85 (2024).
Wei, X. et al. Seismic data interpolation based on denoising diffusion implicit models with resampling. arXiv preprint arXiv:2307.04226 (2023).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 1 (2017).
Liu, Z. et al. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision 10012–10022 (2021).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).
Esser, P., Rombach, R. & Ommer, B. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 12873–12883 (2021).
Dosovitskiy, A. et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020).
Bai, Y. et al. Sequential modeling enables scalable learning for large vision models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 22861–22872 (2024).
Lu, J., Clark, C., Zellers, R., Mottaghi, R. & Kembhavi, A. Unified-io: A unified model for vision, language, and multi-modal tasks. In The Eleventh International Conference on Learning Representations (2022).
Lu, J. et al. Unified-io 2: Scaling autoregressive multimodal models with vision language audio and action. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 26439–26455 (2024).
Wang, W. et al. Visionllm: Large language model is also an open-ended decoder for vision-centric tasks. Adv. Neural Inf. Process. Syst. 36 (2024).
Dai, X. et al. Emu: Enhancing image generation models using photogenic needles in a haystack. arXiv preprint arXiv:2309.15807 (2023).
Chen, Z. et al. Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 24185–24198 (2024).
Razavi, A., Van den Oord, A. & Vinyals, O. Generating diverse high-fidelity images with vq-vae-2. Adv. Neural Inf. Process. Syst. 32 (2019).
Tian, K., Jiang, Y., Yuan, Z., Peng, B. & Wang, L. Visual autoregressive modeling: Scalable image generation via next-scale prediction. arXiv preprint arXiv:2404.02905 (2024).
Floridi, L. & Chiriatti, M. GPT-3: Its nature, scope, limits, and consequences. Minds Mach. 30, 681–694 (2020).
Zhang, Y., Zhang, Y., Dong, H. & Song, L. Stugan: An integrated swin transformer based generative adversarial networks for seismic data reconstruction and denoising. IEEE Trans. Geosci. Remote Sens. (2024).
Funding
This research was funded by the Natural Science Foundation of Sichuan Province (Grant No. 2025ZNSFSC0312), the Longyan Key Projects (Grant No. 2023LYF9003), and the Sichuan Achievement Transformation Program (Grant No. 2024ZHCG0022).
Author information
Contributions
Conceptualization, methodology, software, validation, formal analysis, S.W.; data curation, Y.Y.; writing (original draft preparation), S.W.; writing (review and editing), P.J. and X.W.; visualization, B.W. and Y.L.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, S., Wang, X., Yang, Y. et al. Conditional autoregressive model based on next scale prediction for missing data reconstruction. Sci Rep 15, 23904 (2025). https://doi.org/10.1038/s41598-025-08830-5