Abstract
Growing pollution levels and their detrimental effects on human health have made air quality monitoring and forecasting crucial issues in urban environmental management. This paper presents a hybrid deep learning framework for air quality prediction that combines Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN) with multi-source remote sensing data. The proposed Multi-Modal Attention-based Spatio-Temporal Network (MAST-Net) jointly analyzes satellite imagery, meteorological variables, and ground observations to predict concentrations of PM₂.₅, PM₁₀, NO₂, and O₃. The framework leverages data from the Sentinel-5P, MODIS, and Landsat-8 satellites, integrates a dynamic feature selection strategy, and incorporates uncertainty quantification to enhance reliability. Experimental validation over three metropolitan areas shows that, compared with conventional approaches, MAST-Net reduces RMSE by 23–31% and reaches coefficients of determination (R²) of 0.91–0.94 for the different pollutants. Given its strong predictive performance across a range of geographic and seasonal conditions, the proposed architecture shows considerable promise for real-time air quality management systems.
Introduction
Air quality prediction has become a critical research focus in environmental monitoring and Earth system science, given the rapid increase in air pollution levels worldwide and their severe implications for human health, ecosystems, and climate change. Accurate forecasting of pollutants such as PM₂.₅, PM₁₀, NO₂, and O₃ is essential for policy-making, public health, and sustainable urban development1,2,3. Traditionally, ground-based monitoring stations have provided valuable insights into pollutant concentrations, but their sparse distribution limits spatial coverage, particularly in developing regions4. As a result, integrating ground-based observations with multi-source remote sensing data and advanced predictive models has become a promising direction for robust air quality forecasting5,6.
Machine learning (ML) and deep learning (DL) have played a transformative role in advancing predictive air quality modeling3,7,8. Conventional methods, such as autoregressive integrated moving average (ARIMA), have long been employed for pollutant forecasting but are often constrained by linear assumptions and limited spatio-temporal representation9. More recent studies have employed ML algorithms like Random Forests, Support Vector Machines (SVM), and ensemble learning approaches, which provide improved accuracy but still struggle to generalize well under complex meteorological conditions2,10,11. Comparative analyses have shown that deep learning-based methods, particularly those leveraging spatio-temporal architectures, consistently outperform traditional ML by capturing intricate dependencies among multi-source variables12,13,14.
Recent studies highlight the significant role of multi-source and multimodal data fusion in enhancing prediction accuracy. For example, Xia et al.15 demonstrated the effectiveness of combining multi-station time-series data with remote sensing imagery for air quality forecasting in Beijing and Tianjin, while Cai et al.16 emphasized the importance of multi-source fusion in enabling energy-efficient and scalable computing frameworks for large-scale monitoring. Similarly, Anggraini et al.4 showcased how combining ground stations with satellite data enables the development of a global Air Quality Index (AQI) system, providing improved spatial and temporal coverage. Such fusion frameworks are aligned with the broader paradigm of digital twins in Earth system science, where heterogeneous datasets are integrated to provide real-time monitoring and simulation capabilities6.
In addition to data fusion, the choice of deep learning architectures plays a crucial role in prediction performance. A wide range of hybrid and ensemble models have been developed to enhance spatio-temporal representation. Chen et al.12 proposed a hybrid DL model based on neighborhood selection and spatio-temporal attention, significantly improving PM₂.₅ forecasting. Nguyen et al.17 introduced an attention-driven hybrid deep learning framework optimized through quantum-inspired particle swarm optimization, demonstrating the growing trend of combining metaheuristics with DL for improved convergence and accuracy. Duan et al.9 further integrated ARIMA with CNN and LSTM architectures optimized by a dung beetle algorithm, while Cui et al.13 conducted a comparative study of transformer-based and CNN-LSTM-attention models, highlighting the superior capability of attention mechanisms in capturing long-range dependencies.
Moreover, interpretability and robustness remain key concerns in predictive air quality modeling. Houdou et al.8 reviewed interpretable ML approaches, emphasizing the importance of transparency in decision-support systems. Agbehadji and Obagbuwa7 systematically analyzed ML and DL spatio-temporal prediction methods, concluding that ensemble-based approaches and uncertainty-aware models are critical for trustworthy predictions. Similarly, Binbusayyis et al.14 proposed a deep learning approach tailored for smart city applications, where scalability and interpretability are equally important. These perspectives align with the findings of Kaur et al.18, who conducted a systematic review of computational deep learning techniques for air quality prediction, underscoring the need for hybrid frameworks that balance predictive accuracy with computational efficiency.
The growing research interest in integrating remote sensing with advanced predictive models has led to notable contributions in satellite-driven approaches. Ahmed et al.19 developed a DL model incorporating hydro-climatological variables derived from satellite data to forecast AQI, demonstrating the power of Earth observation in filling monitoring gaps. Rahman et al.10 developed AirNet, a predictive ML model with a web interface for air quality forecasting, exemplifying the increasing trend of deploying user-centric, scalable systems for real-time decision-making. Bahadur et al.5 reviewed applications of remote sensing and ML in air quality monitoring, highlighting their synergistic potential for global-scale pollution modeling.
Recent advances in hybrid deep learning models have significantly improved spatio-temporal PM₂.₅ prediction. Ahmad20 combined a bidirectional GRU with a 1D-CNN to better capture local spatial patterns across Delhi monitoring stations. Ahmad and Kumar21 addressed missing data using a random imputation method integrated with an RNN-BiGRU model. Ahmad and Kumar22 developed a hybrid time-series framework for the spatio-temporal analysis of PM₂.₅, and Ahmad and Kumar23 achieved superior performance in complex urban environments by integrating WaveNet with XGBoost.
Despite notable progress, current research on air quality prediction still faces key challenges, including the limited integration of multi-source remote sensing data with varying spatial and temporal resolutions, inadequate use of attention mechanisms for dynamic feature weighting, insufficient incorporation of uncertainty quantification and restricted evaluation across diverse geographical and climatic contexts. To address these gaps, this study proposes the Multi-Modal Attention-based Spatio-Temporal Network (MAST-Net), which leverages novel multimodal fusion strategies, attention-driven feature selection, and comprehensive uncertainty analysis to deliver robust and generalizable air quality predictions24.
The primary contributions of this work are as follows:
- Multi-source data integration with complementary spectral and temporal features.
- Hybrid deep learning architecture for spatio-temporal pattern recognition.
- Uncertainty quantification to ensure reliable prediction intervals.
- Dynamic feature selection adapting to environmental variability.
The proposed MAST-Net advances remote sensing-based air quality prediction with improved accuracy and practical applicability.
Table 1 compares recent state-of-the-art approaches to air quality prediction in terms of their data sources, key innovations, and research gaps.
The rest of this paper is structured as follows. Section 2 describes the study area and data collection. Section 3 details the methodology. Section 4 presents the experimental results that confirm the viability of the approach. Finally, Sect. 5 summarizes the findings and concludes the paper.
Study area and data collection
To develop and evaluate the proposed air quality prediction framework, ground truth data were obtained from official regulatory monitoring networks for the period 2019–2023. These networks provided daily measurements of four key pollutants (PM₂.₅, PM₁₀, NO₂, and O₃), which served as reference observations for model training and validation. In parallel, multi-source satellite observations were acquired via Google Earth Engine (GEE), ensuring consistent spatial and temporal coverage. These datasets provide complementary information on atmospheric composition, land surface conditions, and meteorological variables, all of which are critical for capturing the spatio-temporal dynamics of pollutant dispersion.
Validation experiments were designed across three metropolitan regions with distinct geographical and climatic characteristics: Delhi, India, representing a tropical monsoon climate; Los Angeles, USA, characterized by a Mediterranean climate with prevalent photochemical smog events; and Beijing, China, exhibiting a continental climate with complex topographical influences on pollutant dispersion. These diverse study areas provide a robust basis for assessing the generalizability and adaptability of the proposed framework under varying environmental and atmospheric conditions.
Methodology
General architecture overview
The MAST-Net framework, shown in Fig. 1, is a multi-modal attention-based spatio-temporal architecture designed specifically for air quality prediction from heterogeneous remote sensing data. Traditional methods rely on single-source satellite data or basic statistical models. MAST-Net instead uses an integrated deep learning framework that concurrently processes multi-spectral satellite imagery, meteorological reanalysis data, and ground-based measurements. It predicts multiple air pollutants with improved accuracy and quantified uncertainty.
Data integration and preprocessing
Five heterogeneous data sources with different spatial and temporal resolutions were integrated: (D₁) Sentinel-5P TROPOMI (7 × 3.5 km), (D₂) MODIS aerosol products (1 × 1 km), (D₃) Landsat-8 surface reflectance (30 m), (D₄) ERA5 meteorological reanalysis (0.25°), and (D₅) ground-based station measurements (point data). First, to enable joint analysis, all satellite-derived variables and ground-based measurements were resampled to a common 1 × 1 km grid using bilinear interpolation. This ensures uniform spatial alignment and reduces biases caused by differing sensor footprints:
\[ X(x, y) = \sum_{(i, j) \in \mathcal{N}(x, y)} W_{\text{bilinear}}(x, y; i, j)\, X_{\text{src}}(i, j) \]
Here \(W_{\text{bilinear}}\) represents the bilinear interpolation weights, \(\mathcal{N}(x, y)\) is the set of four source pixels surrounding the target location, \(X_{\text{src}}\) is the source field, and (x, y) are the target grid coordinates. Second, pollutant concentrations and auxiliary variables were temporally harmonized through daily aggregation, which smooths short-term fluctuations while preserving meaningful temporal trends. In this aggregation, \(t_k\) denotes the available timestamps, \(w_k\) the corresponding temporal weights, and \(\sigma_t\) the parameter that controls the degree of temporal smoothing.
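The two harmonization steps can be sketched without any geospatial library. The following is a minimal illustration, not the authors' code. `bilinear_interpolate` samples a source grid at fractional pixel coordinates. `gaussian_daily_value` aggregates timestamped observations. The Gaussian form of the temporal weights \(w_k\) is an assumption suggested by the smoothing parameter \(\sigma_t\). Both function names are hypothetical.

```python
import math


def bilinear_interpolate(grid, x, y):
    """Sample a 2-D source grid (list of rows) at fractional pixel coordinates.

    x indexes columns and y indexes rows; both must lie inside the grid,
    which must be at least 2 x 2.
    """
    rows, cols = len(grid), len(grid[0])
    # Upper-left neighbour, clamped so the 2 x 2 stencil stays in bounds.
    x0 = min(int(math.floor(x)), cols - 2)
    y0 = min(int(math.floor(y)), rows - 2)
    dx, dy = x - x0, y - y0
    # The four bilinear weights (W_bilinear) always sum to 1.
    return ((1 - dx) * (1 - dy) * grid[y0][x0]
            + dx * (1 - dy) * grid[y0][x0 + 1]
            + (1 - dx) * dy * grid[y0 + 1][x0]
            + dx * dy * grid[y0 + 1][x0 + 1])


def gaussian_daily_value(timestamps, values, t, sigma_t):
    """Weighted mean of observations at timestamps t_k for a target time t.

    ASSUMPTION: Gaussian weights w_k = exp(-(t - t_k)^2 / (2 * sigma_t^2)).
    """
    weights = [math.exp(-((t - tk) ** 2) / (2 * sigma_t ** 2)) for tk in timestamps]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Usage: regrid a toy 2 x 2 field, then aggregate three hourly readings.
field = [[0.0, 1.0], [2.0, 3.0]]
print(bilinear_interpolate(field, 0.5, 0.5))
print(gaussian_daily_value([11.0, 12.0, 13.0], [10.0, 20.0, 30.0], 12.0, 1.0))
```

In Google Earth Engine, the regridding step would more typically be delegated to `ee.Image.resample('bilinear')` followed by a reprojection to the 1 km grid. The pure-Python version only makes the weighting explicit.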
Third, quality control procedures were applied to detect and remove anomalous values, so that sensor errors, cloud contamination, and measurement noise did not affect model training. In particular, Sentinel-5P retrievals were filtered using thresholds on the TROPOMI qa_value quality-assurance flag.
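The quality-control step can be illustrated with two small filters. The paper does not state its qa_value threshold or its anomaly test, so both are assumptions here:

- The 0.75 cut-off is a value commonly used for TROPOMI NO₂.
- The median-absolute-deviation (modified z-score) rule stands in for the unspecified outlier detection.

`filter_by_qa` and `mask_outliers` are hypothetical helper names. Masked values are returned as `None` so that the later imputation step can fill them.

```python
import statistics

# ASSUMPTION: illustrative threshold; the study does not report the value it used.
QA_THRESHOLD = 0.75


def filter_by_qa(pixels, threshold=QA_THRESHOLD):
    """Mask (set to None) Sentinel-5P retrievals whose qa_value is below threshold."""
    return [p["value"] if p["qa_value"] >= threshold else None for p in pixels]


def mask_outliers(values, cutoff=3.5):
    """Mask anomalous values using the modified z-score (median absolute deviation).

    ASSUMPTION: this MAD rule stands in for the paper's unspecified anomaly test.
    Entries that are already None are passed through unchanged.
    """
    valid = [v for v in values if v is not None]
    med = statistics.median(valid)
    mad = statistics.median([abs(v - med) for v in valid])
    if mad == 0:
        return list(values)  # no spread, so nothing can be flagged
    return [None if v is not None and abs(0.6745 * (v - med) / mad) > cutoff else v
            for v in values]


# Usage: a low-quality retrieval and a spike are both masked.
print(filter_by_qa([{"value": 5.0, "qa_value": 0.9}, {"value": 7.0, "qa_value": 0.5}]))
print(mask_outliers([10, 11, 12, 11, 10, 100]))
```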
Finally, to address inevitable gaps in observational records, missing data imputation was performed using a spatio-temporal interpolation strategy, leveraging spatial correlations between neighboring grid cells and temporal continuity to reconstruct complete and reliable datasets.
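The spatio-temporal imputation idea can be sketched on a small data cube indexed `[time][row][col]`, with `None` marking gaps. This is one simple instance of the strategy described above, not the study's exact procedure. A missing cell is estimated from the mean of its observed 4-connected spatial neighbours on the same day and the mean of the same cell on the previous and next days. The two estimates are blended with a weight `alpha`, assumed here to be 0.5. Function names are hypothetical.

```python
import copy


def impute_cell(cube, t, r, c, alpha=0.5):
    """Estimate the missing value cube[t][r][c] from spatial and temporal neighbours.

    alpha weights the spatial estimate against the temporal one (assumed 0.5).
    Returns None if no neighbour was observed.
    """
    T, R, C = len(cube), len(cube[0]), len(cube[0][0])
    spatial = [cube[t][rr][cc]
               for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
               if 0 <= rr < R and 0 <= cc < C and cube[t][rr][cc] is not None]
    temporal = [cube[tt][r][c] for tt in (t - 1, t + 1)
                if 0 <= tt < T and cube[tt][r][c] is not None]
    s = sum(spatial) / len(spatial) if spatial else None
    m = sum(temporal) / len(temporal) if temporal else None
    if s is None:
        return m  # may still be None for an isolated cell
    if m is None:
        return s
    return alpha * s + (1 - alpha) * m


def impute(cube, alpha=0.5):
    """Return a copy of cube with every None filled where neighbours allow."""
    filled = copy.deepcopy(cube)
    for t, frame in enumerate(cube):
        for r, row in enumerate(frame):
            for c, value in enumerate(row):
                if value is None:
                    # Estimates read the original cube, so fill order does not matter.
                    filled[t][r][c] = impute_cell(cube, t, r, c, alpha)
    return filled


# Usage: three days of a 1 x 3 strip with one missing centre value.
cube = [[[1.0, 2.0, 3.0]], [[4.0, None, 6.0]], [[7.0, 8.0, 9.0]]]
print(impute(cube)[1][0])
```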
Multi-modal convolutional feature extraction
The feature extraction stage captures rich spatio-temporal representations from heterogeneous inputs by combining parallel CNNs, residual learning, and adaptive pooling. Specifically, four parallel CNN branches are employed, each tailored to a specific data modality. For the i-th CNN branch with input \(X_i \in \mathbb{R}^{H \times W \times C_i}\), the convolution operation is defined as:
\[ F_{i,l}^{(j)} = \sigma\left( \sum_{k} W_{i,l}^{(j,k)} * F_{i,l-1}^{(k)} + b_{i,l}^{(j)} \right) \]
Here \(F_{i,l}^{(j)}\) denotes the j-th feature map at layer l of branch i (with \(F_{i,0} = X_i\)). \(W_{i,l}^{(j,k)}\) is the convolution kernel connecting input map k to output map j, and \(b_{i,l}^{(j)}\) is the bias. * represents the convolution operation, and σ(x) = max(0, x) is the ReLU activation function. To mitigate vanishing gradients in deep networks, residual connections are incorporated, enabling more stable training:
\[ F_{i,l+1} = F_{i,l} + \mathcal{G}(F_{i,l}) \]
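A single-channel, dependency-free sketch shows the convolution, ReLU, and residual connection described above. It is illustrative only: the real branches use many channels (64, 128, and 256 in the reported configuration) and a deep learning framework. Following common deep learning practice, the "convolution" is implemented as cross-correlation with zero padding so the output keeps the input size. The residual function is assumed to be a single convolution followed by ReLU. Names are hypothetical.

```python
def relu(v):
    return max(0.0, v)


def conv2d_same(x, kernel, bias=0.0):
    """Single-channel 2-D cross-correlation with zero padding ('same' size).

    kernel must be square with an odd side length.
    """
    H, W = len(x), len(x[0])
    k = len(kernel)
    p = k // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = bias
            for a in range(k):
                for b in range(k):
                    ii, jj = i + a - p, j + b - p
                    if 0 <= ii < H and 0 <= jj < W:  # zero padding outside
                        acc += kernel[a][b] * x[ii][jj]
            out[i][j] = acc
    return out


def conv_relu(x, kernel, bias=0.0):
    """F_l = ReLU(W * F_{l-1} + b) for one channel."""
    return [[relu(v) for v in row] for row in conv2d_same(x, kernel, bias)]


def residual_block(x, kernel, bias=0.0):
    """F_{l+1} = F_l + G(F_l); G is assumed to be one conv + ReLU layer."""
    g = conv_relu(x, kernel, bias)
    return [[xv + gv for xv, gv in zip(xr, gr)] for xr, gr in zip(x, g)]


# Usage: an identity kernel makes G(F) = ReLU(F), so negatives pass through unchanged.
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
print(residual_block([[1.0, -2.0], [3.0, 4.0]], identity))
```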
Here \(\mathcal{G}\) denotes the residual function, which learns the mapping \(\mathcal{H}(F_{i,l}) - F_{i,l}\). Finally, to preserve critical spatial information, adaptive pooling is used instead of conventional fixed pooling. Each pooled output at position (s, t) aggregates the features within a region R(s, t) determined by feature importance scores. Together, these mechanisms enable the module to learn discriminative, multi-scale features while maintaining computational efficiency and robust gradient propagation.
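The exact adaptive pooling formula is not reproduced in this text, so the following is only one hypothetical reading of it. Within each non-overlapping window, the region R(s, t) keeps the cells whose importance score is at least the window's mean importance. The pooled value is the mean feature over that region. The window size, the mean-importance rule, and the name `adaptive_pool` are all assumptions.

```python
def adaptive_pool(features, importance, size=2):
    """Pool non-overlapping size x size windows over an adaptive region R(s, t).

    HYPOTHETICAL rule: R(s, t) holds the cells of each window whose importance
    is at least the window's mean importance; the output is their mean feature.
    """
    H, W = len(features), len(features[0])
    pooled = []
    for s in range(0, H - size + 1, size):
        row = []
        for t in range(0, W - size + 1, size):
            cells = [(features[s + a][t + b], importance[s + a][t + b])
                     for a in range(size) for b in range(size)]
            mean_imp = sum(w for _, w in cells) / len(cells)
            region = [v for v, w in cells if w >= mean_imp]
            # Fall back to the whole window if rounding leaves the region empty.
            region = region or [v for v, _ in cells]
            row.append(sum(region) / len(region))
        pooled.append(row)
    return pooled


# Usage: only the most important cell of the window contributes.
print(adaptive_pool([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 1.0]]))
```

With uniform importance the rule reduces to ordinary average pooling. This makes it easy to compare against a fixed-pooling baseline.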
Dynamic feature selection
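A dynamic feature selection strategy is used to identify the most relevant inputs under changing environmental conditions. First, mutual information measures the dependency between each feature \(X_i\) and the target Y, so that features with stronger predictive power are prioritized:
\[ I(X_i; Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \]
Here p(x, y) is the joint distribution of \(X_i\) and Y, and p(x) and p(y) are the corresponding marginal distributions.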
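A plug-in estimate of the mutual information above can be computed from counts once continuous features are binned. This is a minimal sketch: equal-width binning and the helper names `discretize` and `mutual_information` are assumptions. scikit-learn's `mutual_info_regression` offers a nearest-neighbour estimator that avoids binning altogether.

```python
import math
from collections import Counter


def discretize(values, bins=4):
    """Equal-width binning of a continuous feature into integer bin labels."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


# Usage: a perfectly informative feature versus an independent one.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # log(2) nats
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```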
Next, gradient-based scoring estimates how strongly each feature influences the model's predictions by averaging the gradient of the loss function with respect to that feature over the training samples.
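For a model whose gradient is available in closed form, the gradient-based score is easy to illustrate. This sketch assumes a linear predictor with squared-error loss \(L = (w \cdot x + b - y)^2\). For that loss, \(\partial L / \partial x_i = 2 (\hat{y} - y) w_i\). Absolute values are averaged so that positive and negative gradients do not cancel; that choice is an assumption. For the full network the same quantity would be obtained with automatic differentiation, for example `tf.GradientTape`. `gradient_scores` is a hypothetical name.

```python
def gradient_scores(X, y, w, b=0.0):
    """Mean |dL/dx_i| over samples for L = (w . x + b - y)^2 (toy linear model)."""
    n, d = len(X), len(w)
    scores = [0.0] * d
    for xs, target in zip(X, y):
        pred = sum(wi * xi for wi, xi in zip(w, xs)) + b
        residual = pred - target
        for i in range(d):
            # dL/dx_i = 2 * (prediction - target) * w_i
            scores[i] += abs(2 * residual * w[i])
    return [s / n for s in scores]


# Usage: the feature with the larger weight receives the larger score.
print(gradient_scores([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [2.0, 0.5]))
```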
Finally, an adaptive threshold retains only the most informative features. This lets the model select different feature subsets as conditions change, improving robustness across regions and climates.
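One way to combine the two scores with an adaptive threshold is sketched below. The paper does not specify the combination rule, so the following are assumptions:

- Min-max normalization of each score.
- An equal-weight average of the two normalized scores.
- A threshold of mean + λ·std over the combined scores.

Because the threshold is recomputed from the current scores, the selected subset changes when the scores shift between regions or seasons. Names are hypothetical.

```python
import statistics


def min_max(scores):
    """Rescale scores to [0, 1]; a constant score vector maps to all ones."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def select_features(mi_scores, grad_scores, lam=0.0):
    """Return indices of features whose combined score passes mean + lam * std.

    ASSUMPTION: equal weighting of normalized mutual-information and gradient scores.
    """
    combined = [(a + b) / 2 for a, b in zip(min_max(mi_scores), min_max(grad_scores))]
    threshold = statistics.mean(combined) + lam * statistics.pstdev(combined)
    return [i for i, c in enumerate(combined) if c >= threshold]


# Usage: a stricter lambda keeps fewer features.
print(select_features([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))
print(select_features([0.0, 1.0, 2.0], [1.0, 2.0, 3.0], lam=1.0))
```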
Bi-directional LSTM for temporal dependency modeling
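To capture temporal dependencies in air quality dynamics, the framework employs LSTM units, which process the sequential feature representations \(F_t \in \mathbb{R}^d\) at each time step t. The internal operations of the LSTM cell are defined as:
\[ f_t = \sigma_g(W_f [h_{t-1}, F_t] + b_f) \]
\[ i_t = \sigma_g(W_i [h_{t-1}, F_t] + b_i) \]
\[ o_t = \sigma_g(W_o [h_{t-1}, F_t] + b_o) \]
\[ \tilde{c}_t = \sigma_c(W_c [h_{t-1}, F_t] + b_c) \]
\[ c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \]
\[ h_t = o_t \odot \sigma_c(c_t) \]
Here, \(f_t\), \(i_t\), and \(o_t\) denote the forget, input, and output gates, respectively. \(\tilde{c}_t\), \(c_t\), and \(h_t\) are the candidate cell state, cell state, and hidden state. \([\cdot, \cdot]\) denotes concatenation and ⊙ element-wise multiplication. \(\sigma_g\) and \(\sigma_c\) are the sigmoid and tanh activation functions, and W, b are learnable parameters. To capture both past and future dependencies, the framework extends this design with bi-directional LSTM processing, in which the outputs of the forward and backward passes are concatenated:
\[ h_t^{\text{bi}} = [\overrightarrow{h}_t ; \overleftarrow{h}_t] \]
Here \(\overrightarrow{h}_t\) and \(\overleftarrow{h}_t\) are the forward and backward hidden states at time t.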
This bi-directional structure enables the model to leverage contextual information from the entire sequence, thereby improving its ability to model complex temporal patterns in air quality variations.
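The gate equations and the forward/backward concatenation can be traced in a dependency-free sketch. Small random weights, the zero initial state, and the names `LSTMCell`, `run_lstm`, and `bilstm` are illustrative assumptions. In Keras, a comparable layer would be `tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(256, return_sequences=True))`, whose default merge mode is concatenation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """One LSTM cell; every gate acts on the concatenation [h_{t-1}, F_t]."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        n = hidden_dim + input_dim
        self.hidden_dim = hidden_dim
        # One weight matrix and bias per gate: f(orget), i(nput), o(utput), c(andidate).
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_dim)] for g in "fioc"}
        self.b = {g: [0.0] * hidden_dim for g in "fioc"}

    def step(self, x, h, c):
        z = h + x  # list concatenation [h_{t-1}, F_t]
        pre = {g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])]
               for g in "fioc"}
        f = [sigmoid(v) for v in pre["f"]]          # sigma_g
        i = [sigmoid(v) for v in pre["i"]]
        o = [sigmoid(v) for v in pre["o"]]
        c_tilde = [math.tanh(v) for v in pre["c"]]  # sigma_c
        c_new = [fv * cv + iv * ct for fv, cv, iv, ct in zip(f, c, i, c_tilde)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


def run_lstm(cell, sequence):
    """Run a cell over a sequence of input vectors; return every hidden state."""
    h = [0.0] * cell.hidden_dim
    c = [0.0] * cell.hidden_dim
    outputs = []
    for x in sequence:
        h, c = cell.step(list(x), h, c)
        outputs.append(h)
    return outputs


def bilstm(forward_cell, backward_cell, sequence):
    """Concatenate forward and backward hidden states at each time step."""
    fwd = run_lstm(forward_cell, sequence)
    bwd = run_lstm(backward_cell, sequence[::-1])[::-1]  # restore time order
    return [hf + hb for hf, hb in zip(fwd, bwd)]


# Usage: a 7-step sequence of 3 features gives 7 vectors of size 2 * hidden_dim.
seq = [[0.1 * t, 0.2, -0.1] for t in range(7)]
states = bilstm(LSTMCell(3, 4, seed=1), LSTMCell(3, 4, seed=2), seq)
print(len(states), len(states[0]))
```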
Multi-head attention fusion mechanism
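To integrate heterogeneous data sources, the framework adopts a multi-head attention fusion strategy. The process begins with scaled dot-product attention, which uses the similarity between queries (Q) and keys (K) to weight the values (V):
\[ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V \]
Here \(d_k\) is the dimension of the keys.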
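Multiple attention heads are then combined to capture diverse representation subspaces:
\[ \mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_{n_h})\, W^O, \quad \mathrm{head}_m = \mathrm{Attention}(Q W_m^Q, K W_m^K, V W_m^V) \]
Here \(W_m^Q\), \(W_m^K\), and \(W_m^V\) are the learned projections of the m-th head, \(W^O\) is the output projection, and \(n_h\) is the number of heads.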
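Scaled dot-product attention and its multi-head extension reduce to a few matrix operations. The sketch below implements them on plain Python lists. Token matrices are row-major (one row per token), and the per-head projections are passed in explicitly. These are illustrative assumptions rather than the paper's implementation, which would more plausibly use `tf.keras.layers.MultiHeadAttention(num_heads=8, key_dim=...)`.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for row-major token matrices."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)


def multi_head(Q, K, V, WQ, WK, WV, WO):
    """Concat(head_1..head_n) W^O with head_m = Attention(Q WQ[m], K WK[m], V WV[m])."""
    heads = [attention(matmul(Q, wq), matmul(K, wk), matmul(V, wv))
             for wq, wk, wv in zip(WQ, WK, WV)]
    # Concatenate the heads feature-wise for each token.
    concat = [sum((head[r] for head in heads), []) for r in range(len(Q))]
    return matmul(concat, WO)


# Usage: identical keys give uniform weights, so the output is the mean value row.
print(attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [4.0, 2.0]]))
```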
Finally, to emphasize the most informative data sources, cross-modal attention weighting is applied:
\[ \alpha_{i,j} = \frac{\exp(e_{i,j})}{\sum_{k} \exp(e_{i,k})} \]
Here \(e_{i,j}\) measures the compatibility between the representations of modalities i and j, and \(\alpha_{i,j}\) is the resulting attention weight. This mechanism allows the model to focus dynamically on the most relevant features across modalities, improving the robustness of air quality prediction.
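The cross-modal weighting can be sketched with one embedding vector per modality. This example assumes the compatibility score \(e_{i,j}\) is a scaled dot product between modality embeddings. The paper does not define \(e_{i,j}\) further, so this choice is an assumption. Each modality's fused representation is the attention-weighted sum of all modality embeddings. The modality keys in the usage line are hypothetical labels.

```python
import math


def cross_modal_fuse(modal_embeddings):
    """Softmax-normalized cross-modal weights alpha_ij and fused embeddings.

    ASSUMPTION: e_ij = (z_i . z_j) / sqrt(d). All embeddings must share length d.
    Returns (fused, weights) keyed by modality name.
    """
    names = list(modal_embeddings)
    Z = [modal_embeddings[n] for n in names]
    d = len(Z[0])
    fused, weights = {}, {}
    for i, zi in enumerate(Z):
        e = [sum(a * b for a, b in zip(zi, zj)) / math.sqrt(d) for zj in Z]
        m = max(e)
        exps = [math.exp(v - m) for v in e]
        total = sum(exps)
        alpha = [v / total for v in exps]  # each row of weights sums to 1
        weights[names[i]] = dict(zip(names, alpha))
        fused[names[i]] = [sum(alpha[j] * Z[j][k] for j in range(len(Z)))
                           for k in range(d)]
    return fused, weights


# Usage: the satellite modality attends more to the similar MODIS embedding.
fused, weights = cross_modal_fuse({"s5p": [1.0, 0.0], "modis": [1.0, 0.0], "era5": [0.0, 1.0]})
print(weights["s5p"])
```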
Uncertainty quantification framework
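To ensure reliable air quality predictions, the framework combines several uncertainty quantification strategies. First, Monte Carlo (MC) dropout keeps dropout active during inference and approximates the predictive distribution through S repeated stochastic forward passes:
\[ \hat{y} = \frac{1}{S} \sum_{s=1}^{S} f_{\hat{\theta}_s}(x), \qquad \hat{\sigma}^2 = \frac{1}{S} \sum_{s=1}^{S} \left( f_{\hat{\theta}_s}(x) - \hat{y} \right)^2 \]
Here \(\hat{\theta}_s\) denotes the network weights under the s-th sampled dropout mask.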
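MC dropout can be demonstrated on a toy linear model with inverted dropout on its inputs. Repeated stochastic passes yield a predictive mean and variance. The toy model and the names `stochastic_forward` and `mc_dropout_predict` are assumptions. The dropout rate of 0.3 mirrors the configuration reported in the Results. In Keras, the analogous behaviour is obtained by calling the model with `training=True` at inference time.

```python
import random
import statistics


def stochastic_forward(x, weights, p, rng):
    """One pass of a toy linear model with inverted dropout (rate p < 1) on inputs."""
    keep = 1.0 - p
    total = 0.0
    for w, xi in zip(weights, x):
        mask = 1.0 / keep if rng.random() < keep else 0.0  # rescale kept units
        total += w * xi * mask
    return total


def mc_dropout_predict(x, weights, p=0.3, samples=200, seed=0):
    """Predictive mean and variance over `samples` passes with dropout left on."""
    rng = random.Random(seed)
    preds = [stochastic_forward(x, weights, p, rng) for _ in range(samples)]
    return statistics.mean(preds), statistics.pvariance(preds)


# Usage: with p = 0 the passes are identical and the variance vanishes.
print(mc_dropout_predict([1.0, 2.0], [3.0, 4.0], p=0.0))
print(mc_dropout_predict([1.0, 2.0], [3.0, 4.0], p=0.3))
```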
Second, a deep ensemble of independently trained models provides both a mean prediction and a variance estimate across its members, increasing robustness to model uncertainty.
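The ensemble idea can be shown with several cheap members. This sketch makes one deliberate substitution. Deep ensembles normally vary the random initialization of each network, but a least-squares line has no initialization to vary. Each member is therefore fit on a bootstrap resample (bagging) so that the members differ. The ensemble mean serves as the prediction and the spread across members as the variance estimate. Names are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # degenerate resample: fall back to a constant model
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def train_ensemble(xs, ys, members=5, seed=0):
    """Fit each member on a bootstrap resample (stand-in for random initialization)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(members):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def ensemble_predict(models, x):
    """Mean prediction and variance across ensemble members."""
    preds = [slope * x + intercept for slope, intercept in models]
    return statistics.mean(preds), statistics.pvariance(preds)


# Usage: noisy data make the members disagree, which shows up as variance.
xs = [float(i) for i in range(10)]
ys = [2 * x + 1 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
print(ensemble_predict(train_ensemble(xs, ys), 5.0))
```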
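Finally, quantile regression directly estimates prediction intervals by minimizing the quantile (pinball) loss:
\[ \mathcal{L}_{\tau}(y, \hat{y}_{\tau}) = \max\left( \tau (y - \hat{y}_{\tau}),\; (\tau - 1)(y - \hat{y}_{\tau}) \right) \]
Here \(\hat{y}_{\tau}\) is the predicted τ-quantile and τ ∈ (0, 1) defines the desired quantile level. Together, these methods provide complementary measures of predictive uncertainty, improving decision reliability for environmental monitoring applications.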
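The pinball loss and an interval-coverage check fit in a few lines. Training one output at τ = 0.05 and another at τ = 0.95, for example, would give a nominal 90% prediction interval. The coverage helper reports how often observations actually fall inside it. Both function names are hypothetical, and the quantile levels are examples rather than values reported in the study.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss at level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction costs tau per unit; over-prediction costs (1 - tau).
        total += max(tau * diff, (tau - 1) * diff)
    return total / len(y_true)


def interval_coverage(y_true, lower, upper):
    """Fraction of observations inside [lower, upper]."""
    inside = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return inside / len(y_true)


# Usage: at tau = 0.9, under-predicting by 2 costs 9x more than over-predicting by 2.
print(pinball_loss([10.0], [8.0], 0.9), pinball_loss([10.0], [12.0], 0.9))
```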
Results
The proposed MAST-Net framework was implemented in TensorFlow 2.x with Keras and trained on NVIDIA V100 GPUs (32 GB) using mixed-precision training and a multi-GPU distributed strategy. The model was trained on 7-day input sequences. CNN branches extracted features using 64, 128, and 256 channels, followed by an LSTM layer with 256 hidden units and 8-head attention for spatio-temporal fusion. Regularization used a dropout rate of 0.3, and the learning rate followed a cosine decay schedule starting at 0.001. The data were split into 80% training, 10% validation, and 10% testing, and robustness was further checked with 5-fold cross-validation.
The computational cost of MAST-Net is compatible with practical deployment. Training required approximately 2.3 h on an NVIDIA V100 GPU, inference averaged 0.15 s per prediction, and the framework used around 4.2 GB of GPU memory. These figures indicate that MAST-Net can be deployed in real-time operational environments with acceptable computational overhead.
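The training setup can be made concrete with three small helpers: the cosine learning-rate schedule, 7-day input windows, and an 80/10/10 split. The formula in `cosine_decay` matches the one documented for `tf.keras.optimizers.schedules.CosineDecay` (with `alpha` as the final fraction of the initial rate). The paper does not say whether its split preserves time order. The chronological split here is an assumption that avoids leaking future observations into training. Helper names are hypothetical, and 5-fold cross-validation is not shown.

```python
import math


def cosine_decay(step, decay_steps, initial_lr=1e-3, alpha=0.0):
    """Cosine-decayed learning rate; alpha is the final fraction of initial_lr."""
    step = min(step, decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / decay_steps))
    return initial_lr * ((1.0 - alpha) * cosine + alpha)


def make_windows(series, window=7):
    """Pair each 7-day input window with the value on the following day."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]


def chronological_split(samples, train=0.8, val=0.1):
    """80/10/10 split that preserves time order (ASSUMPTION: no shuffling)."""
    n = len(samples)
    n_train, n_val = int(n * train), int(n * val)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])


# Usage: learning rate at the start, midpoint, and end of 100 decay steps.
print([cosine_decay(s, 100) for s in (0, 50, 100)])
```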
Table 2 presents an overview of the ground-based pollution data for all study regions (2019–2023).
The statistical summary in Table 2, computed with df.describe(), provides a transparent overview of the dataset characteristics. Table 3 compares MAST-Net with state-of-the-art methods from recent research, reporting performance across study regions and pollutants.
MAST-Net consistently outperforms the CNN-LSTM and Random Forest baselines. It achieves the lowest RMSE values (e.g., 8.2 µg/m³ for PM₂.₅ and 6.7 µg/m³ for NO₂) and the highest R² scores (up to 0.94). Compared with CNN-LSTM, MAST-Net reduces RMSE by 23–31%, highlighting the benefits of multi-modal attention and spatio-temporal modeling.
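The reported metrics follow standard definitions, sketched below for reference. `relative_reduction` expresses how a percentage RMSE improvement over a baseline is calculated. Note that R² (coefficient of determination) is not the same quantity as Pearson's correlation coefficient r. The values in the usage line are made up for illustration and are not results from this study.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def relative_reduction(baseline_rmse, model_rmse):
    """Percentage RMSE reduction of a model relative to a baseline."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# Usage with hypothetical numbers: a baseline RMSE of 10.0 reduced to 7.5.
print(relative_reduction(10.0, 7.5))
```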
Table 4 summarizes the effect of the number of attention heads on model performance, attention diversity, and computational cost. Increasing the number of heads generally improves prediction accuracy and diversity up to 8 heads. At 8 heads, PM₂.₅ and NO₂ RMSE reach their lowest values (8.2 and 6.7 µg/m³, respectively) and diversity peaks (0.58). Beyond this point, additional heads increase errors, reduce diversity, and rapidly raise computational cost. This indicates that 8 heads provide the best balance between accuracy, efficiency, and representational richness.
Figure 2 highlights the impact of hyperparameters on model performance. Figure 2(a) shows that RMSE is minimized at a learning rate around 0.001, with both very small and very large learning rates leading to higher error. Figure 2(b) indicates that higher learning rates speed up convergence (fewer epochs needed) but reduce training stability, suggesting a trade-off between speed and reliability. Figure 2(c) demonstrates that increasing dropout reduces the overfitting gap, with both training and validation RMSE improving up to a point (around 0.3–0.5) before slightly increasing again. Overall, these analyses emphasize the importance of carefully tuning learning rate and dropout to balance error minimization, convergence efficiency, and generalization.
Hyperparameter sensitivity analysis. (a) Learning rate vs. RMSE, showing the relationship between learning rate and prediction error. (b) Convergence analysis, showing the trade-off between convergence speed (epochs) and training stability across learning rates. (c) Effect of dropout rate on overfitting, comparing training RMSE, validation RMSE, and the overfitting gap across dropout rates of 0.0–0.7.
Cross-regional validation (Table 5) highlights the robustness of MAST-Net across diverse climatic and geographical conditions. Models trained in Delhi transferred effectively to Beijing, achieving R² values between 0.82 and 0.87. Los Angeles-trained models maintained R² values of 0.79–0.84 when applied to the other regions. Fine-tuning with even limited local data further improved transferability, reducing the performance gap to within 5% of region-specific models. These results indicate that MAST-Net not only delivers accurate predictions but also adapts well to new regions, supporting its use for large-scale air quality monitoring.
Ablation study
Ablation studies were conducted to evaluate the contribution of each major component of the MAST-Net framework. As summarized in Table 6, specific modules were systematically removed or replaced to assess their relative impact on predictive performance for the PM₂.₅, PM₁₀, NO₂, and O₃ forecasting tasks.
Results demonstrate that the attention mechanism plays a pivotal role, contributing to an average 15% improvement by dynamically weighting spatio-temporal features. The inclusion of multi-modal fusion further enhanced performance, achieving a 12% gain compared to single-source baselines, thereby confirming the importance of integrating complementary satellite, meteorological, and ground-based observations.
Incorporating temporal modeling through LSTM layers had the largest impact, with an 18% improvement over spatial-only variants, underscoring the critical role of sequential dependencies in air quality dynamics. Finally, although uncertainty quantification did not significantly alter the core accuracy metrics, it provided substantial reliability benefits through calibrated prediction intervals, which are crucial for operational deployment.
Overall, the ablation study confirms that each component contributes meaningfully to MAST-Net’s superior performance. Temporal modeling and attention emerge as the most influential factors, while multi-modal fusion ensures robust utilization of heterogeneous data sources. Uncertainty quantification, though not directly improving accuracy, adds essential interpretability and confidence to the predictions, strengthening the framework’s applicability in real-world air quality monitoring.
Conclusion
MAST-Net is a multi-modal, attention-driven spatio-temporal deep learning framework that significantly improves urban air quality prediction. It fuses five heterogeneous remote sensing and ground-based datasets within a hybrid CNN–BiLSTM architecture. This yields substantial accuracy gains, including a 23–31% RMSE reduction for PM₂.₅ and R² values of 0.91–0.94 across major pollutants. It also shows strong cross-regional transferability and efficient training and inference. Ablation results highlight the contributions of the attention mechanism, multi-modal fusion, and temporal modeling. Challenges remain, including limited spatial resolution, cloud interference, and sensor drift. Future integration of geostationary satellite data, physics-informed modeling, and IoT sensor networks promises to enhance resolution, robustness, and real-time applicability.
Data availability
All datasets used in this study are publicly available from the following sources.
Satellite and reanalysis data:
1. Sentinel-5P TROPOMI (NO₂, O₃, SO₂, CO): Google Earth Engine, https://developers.google.com/earth-engine/datasets/catalog/sentinel-5p
2. MODIS MAIAC Aerosol Optical Depth (MCD19A2, Collection 6.1): Google Earth Engine, https://developers.google.com/earth-engine/datasets/catalog/MODIS_061_MCD19A2_GRANULES
3. Landsat-8 OLI/TIRS Surface Reflectance (Collection 2, Level-2): Google Earth Engine, https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C02_T1_L2
4. ERA5 Meteorological Reanalysis: Google Earth Engine, https://developers.google.com/earth-engine/datasets/catalog/ECMWF_ERA5_DAILY. Access requires free CDS API registration at https://cds.climate.copernicus.eu/how-to-api
Ground-based air quality measurements:
1. Delhi, India: The Central Pollution Control Board (CPCB) provides hourly air quality data through the Continuous Ambient Air Quality Monitoring Stations (CAAQMS) network, accessible at https://app.cpcbccr.com/ccr/. Historical data are available at https://airquality.cpcb.gov.in/ccr/#/caaqm-dashboard-all/caaqm-landing
2. Los Angeles, USA: The South Coast Air Quality Management District (SCAQMD) provides hourly air quality measurements at https://www.aqmd.gov/home/air-quality/air-quality-data-studies/historical-data-by-year. Additional data are available from the California Air Resources Board at https://www.arb.ca.gov/aqmis2/aqdselect.php
3. Beijing, China: The Beijing Municipal Ecological Environment Monitoring Center provides hourly measurements at https://www.bjmemc.com.cn/ (Chinese-language interface). English data portal: http://english.mee.gov.cn/Resources/Data/. Historical archives: https://quotsoft.net/air/
References
Singh, S. Machine learning and deep learning approaches for PM2.5 prediction: a study on urban air quality in Jaipur, India. Earth Sci. Inform. 18 (1), 97. https://doi.org/10.1007/s12145-024-01648-1 (2025).
Özüpak, Y., Alpsalaz, F. & Aslan, E. Air quality forecasting using machine learning: comparative analysis and ensemble strategies for enhanced prediction. Water Air Soil Pollut. 236 (7), 464. https://doi.org/10.1007/s11270-025-08122-8 (2025).
Tang, D., Yang, F. & Zhan, Y. A review of machine learning for modeling air quality: overlooked but important issues. Atmos. Res. 300, 107261. https://doi.org/10.1016/j.atmosres.2024.107261 (2024).
Anggraini, T. S., Irie, H., Sakti, A. D. & Wikantika, K. Machine learning-based global air quality index development using remote sensing and ground-based stations. Environ. Adv. 15, 100456. https://doi.org/10.1016/j.envadv.2023.100456 (2024).
Bahadur, F. T., Shah, S. R. & Nidamanuri, R. R. Applications of remote sensing vis-à-vis machine learning in air quality monitoring and modelling: a review. Environ. Monit. Assess. 195 (12), 1502. https://doi.org/10.1007/s10661-023-12001-2 (2023).
Li, X. et al. Big data in Earth system science and progress towards a digital twin. Nat. Rev. Earth Environ. 4 (5), 319–332. https://doi.org/10.1038/s43017-023-00409-w (2023).
Agbehadji, I. E. & Obagbuwa, I. C. Systematic review of machine learning and deep learning techniques for spatiotemporal air quality prediction. Atmosphere 15 (11), 1352. https://doi.org/10.3390/atmos15111352 (2024).
Houdou, A. et al. Interpretable machine learning approaches for forecasting and predicting air pollution: a systematic review. Aerosol Air Qual. Res. 24 (1), 230151. https://doi.org/10.4209/aaqr.230151 (2024).
Duan, J., Gong, Y., Luo, J. & Zhao, Z. Air-quality prediction based on the ARIMA-CNN-LSTM combination model optimized by Dung beetle optimizer. Sci. Rep. 13 (1), 12127. https://doi.org/10.1038/s41598-023-36620-4 (2023).
Rahman, M. M. et al. AirNet: predictive machine learning model for air quality forecasting using web interface. Environ. Syst. Res. 13 (1), 44. https://doi.org/10.1186/s40068-024-00378-z (2024).
Khadom, A. A., Albawi, S., Abboud, A. J. & Mahood, H. B. Predicting air quality index and fine particulate matter levels in Bagdad City using advanced machine learning and deep learning techniques. J. Atmos. Solar Terr. Phys. 262, 106312. https://doi.org/10.1016/j.jastp.2024.106312 (2024).
Chen, G., Chen, S., Li, D. & Chen, C. A hybrid deep learning air pollution prediction approach based on neighborhood selection and spatio-temporal attention. Sci. Rep. 15 (1), 3685. https://doi.org/10.1038/s41598-025-88086-1 (2025).
Cui, B. et al. Deep learning methods for atmospheric PM2.5 prediction: a comparative study of transformer and CNN-LSTM-attention. Atmos. Pollut. Res. 14 (9), 101833. https://doi.org/10.1016/j.apr.2023.101833 (2023).
Binbusayyis, A., Khan, M. A., Mustaq Ahmed, M. & Sam Emmanuel, W. R. A deep learning approach for prediction of air quality index in smart city. Discov. Sustain. 5 (1), 89. https://doi.org/10.1007/s43621-024-00272-9 (2024).
Xia, H., Chen, X., Wang, Z., Chen, X. & Dong, F. A multi-modal deep-learning air quality prediction method based on multi-station time-series data and remote-sensing images: case study of Beijing and Tianjin. Entropy 26 (1), 91. https://doi.org/10.3390/e26010091 (2024).
Cai, J. et al. Multi-source fusion enhanced power-efficient sustainable computing for air quality monitoring. IEEE Internet Things J. https://doi.org/10.1109/JIOT.2024.3420956 (2024).
Nguyen, A. et al. Predicting air quality index using attention hybrid deep learning and quantum-inspired particle swarm optimization. J. Big Data. 11 (1), 71. https://doi.org/10.1186/s40537-024-00926-5 (2024).
Kaur, M. et al. Computational deep air quality prediction techniques: a systematic review. Artif. Intell. Rev. 56 (Suppl. 2), 2053–2098. https://doi.org/10.1007/s10462-023-10570-9 (2023).
Ahmed, A. A. et al. An advanced deep learning predictive model for air quality index forecasting with remote satellite-derived hydro-climatological variables. Sci. Total Environ. 906, 167234. https://doi.org/10.1016/j.scitotenv.2023.167234 (2024).
Ahmad, N. Spatio-temporal forecasting using a hybrid BiGRU-1DCNN model for PM2.5 concentrations in Delhi, India (2018–2023) across multiple monitoring stations. Water Air Soil Pollut. 236 (7), 459. https://doi.org/10.1007/s11270-025-08103-x (2025).
Ahmad, N. & Kumar, V. Enhancing PM2.5 air pollution forecasting with novel random imputation based on hybrid RNN-bidirectional GRU (nRI RNN-BiGRU) model. SN Comput. Sci. 6 (6), 637. https://doi.org/10.1007/s42979-025-04167-y (2025).
Ahmad, N. & Kumar, V. A hybrid time series model for the spatio-temporal analysis of air pollution prediction based on PM2.5. In International Conference on Advanced Network Technologies and Intelligent Computing, 62–81 (Springer Nature Switzerland, 2023). https://doi.org/10.1007/978-3-031-64067-4_5
Ahmad, N. & Kumar, V. Effective air pollution prediction using WaveNet deep learning with XGBoost (1DCNN-BiLSTM-XgRC) for urban US embassies. Theor. Appl. Climatol. 156, 9. https://doi.org/10.1007/s00704-025-05715-5 (2025).
Yu, M., Huang, Q. & Li, Z. Deep learning for spatiotemporal forecasting in Earth system science: a review. Int. J. Digit. Earth 17 (1). https://doi.org/10.1080/17538947.2024.2391952 (2024).
Author information
Contributions
Kalaiselvi S and Anitha V: Writing – review & editing, original draft, Software, Methodology, Investigation, Conceptualization. Manimaran V: Review, Validation, Supervision. Thomas Samraj Lawrence: Writing – review & editing, Methodology.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Kalaiselvi, S., Anitha, V., Manimaran, V. et al. Air quality prediction using multi-source remote sensing data integration with hybrid deep learning framework. Sci Rep 16, 2688 (2026). https://doi.org/10.1038/s41598-025-32466-0