Abstract
Spatiotemporally resolved ambient temperature data are essential for environmental epidemiology, especially in urban areas where temperature can vary sharply over short distances, influencing population exposure. Additionally, heat distribution often reflects built environment patterns and may correlate with existing social and environmental disparities. Continuous temporal records at high spatial resolution are, however, often lacking, especially in low- and middle-income countries. We developed a generalizable tree-based machine learning approach to estimate daily mean temperatures at 500 × 500 m resolution. We used São Paulo, a megacity in Brazil, as a case study to demonstrate the approach's utility in highly urbanized settings with a heterogeneous urban fabric and unevenly distributed temperature monitoring stations. We trained a Random Forest (RF) model using open-access remote sensing data, along with derived products, and temperature measurements from 43 ground stations. To prevent overfitting and select relevant features, we employed a forward feature selection algorithm with target-oriented (spatial) cross-validation. Hyperparameters were tuned using a grid search. The model was validated through ten-fold station-based cross-validation and an external hold-out dataset. The model demonstrated strong performance (RF: RMSE = 0.80; R2 = 0.95), with slightly reduced accuracy in rural areas (R2 = 0.91) compared with urban areas (R2 = 0.95). The Random Forest model outperformed traditional multilinear regression approaches (MLR: RMSE = 1.02; R2 = 0.92), likely due to its ability to better capture microclimates and complex relationships between data sources. This 500 × 500 m daily temperature dataset is the first of its kind in South America, and the São Paulo pipeline and data are freely accessible. The approach is adaptable to other regions with appropriate retraining and validation, enabling high-resolution exposure assessments.
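To illustrate the station-based cross-validation mentioned above, the following is a minimal sketch in Python using scikit-learn. It is not the authors' pipeline (that code is in the GitHub repository listed under Data availability). The data are synthetic, the predictors are simple stand-ins for variables such as LST and NDVI, and the function names `make_synthetic_stations` and `station_cv_scores` are hypothetical. The point it shows is that all observations from one station are kept in the same fold, so the model is always evaluated on stations it has not seen during training.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GroupKFold, cross_val_predict


def make_synthetic_stations(n_stations=43, n_days=60, seed=0):
    """Build a hypothetical station-day table (synthetic, for illustration only)."""
    rng = np.random.default_rng(seed)
    n = n_stations * n_days
    station = np.repeat(np.arange(n_stations), n_days)  # station id for each row
    lst = rng.normal(25.0, 5.0, n)                       # stand-in for land surface temperature
    ndvi = rng.uniform(0.0, 0.8, n)                      # stand-in for a vegetation index
    # One elevation value per station, repeated for every day of that station.
    elev = np.repeat(rng.uniform(700.0, 1100.0, n_stations), n_days)
    # Invented relationship between predictors and temperature, plus noise.
    y = 0.6 * lst - 4.0 * ndvi**2 - 0.006 * (elev - 700.0) + rng.normal(0.0, 0.5, n)
    X = np.column_stack([lst, ndvi, elev])
    return X, y, station


def station_cv_scores(X, y, station, n_splits=10, seed=0):
    """Return (RMSE, R2) from out-of-fold predictions with stations kept whole."""
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    # GroupKFold never places rows from one station in both the train and test fold.
    cv = GroupKFold(n_splits=n_splits)
    pred = cross_val_predict(model, X, y, groups=station, cv=cv)
    rmse = float(np.sqrt(mean_squared_error(y, pred)))
    return rmse, float(r2_score(y, pred))


if __name__ == "__main__":
    X, y, station = make_synthetic_stations()
    rmse, r2 = station_cv_scores(X, y, station)
    print(f"station-based 10-fold CV: RMSE = {rmse:.2f}, R2 = {r2:.2f}")
```

Grouping folds by station rather than splitting rows at random matters because consecutive days at one station are strongly correlated. A random split would let the model see the same station in training and testing, which tends to give optimistic error estimates for unmonitored locations.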
Data availability
The datasets generated and analysed during the current study are available in this [Zenodo repository](https://zenodo.org/records/15868840?token=eyJhbGciOiJIUzUxMiJ9.eyJpZCI6ImNkM2ViZTYzLTUzZjctNDVmYy05NjZjLWViZWQxYWFlNmM4MyIsImRhdGEiOnt9LCJyYW5kb20iOiIxNDQ3ZTAwNDFmNjkwN2Y2YTViNmViYjQzYzcyYWRiYyJ9.jR0CTaNtlOunm0XQYBwjC73yFJIdsjtSXUe-F94VXx5a1vgsCEeQP5-XIPnsRa36rV-fZZuCsw4WtZeDMI9IPA) (ref. 87). All the code used in the analyses is available on GitHub at https://github.com/AinaRB/DailyTemperature_RandomForest_SaoPaulo. Additionally, a public-facing website with accessible, non-technical information about the project and its findings is available at https://ainarb.github.io/climate_and_health/ (ref. 88).
Abbreviations
- BSA: Black sky albedo
- CV: Cross-validation
- d2m: Dew point temperature at 2 m
- DEM: Digital elevation model
- ECMWF: European Centre for Medium-Range Weather Forecasts
- ERA5: 5th generation European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalysis
- FFS: Forward feature selection
- GEE: Google Earth Engine
- LMIC: Low- and middle-income countries
- LST: Land surface temperature
- MLR: Multi-linear regression
- NDVI: Normalized difference vegetation index
- R2: R-squared (coefficient of determination)
- RF: Random Forest
- rh: Relative humidity
- RMSE: Root mean square error
- SZA: Solar zenith angle
- t2m: Ambient temperature at 2 m
- Ta: Ambient temperature
- u10: Eastward wind component at 10 m
- v10: Northward wind component at 10 m
References
Baccini, M. et al. Heat effects on mortality in 15 European cities. Epidemiology 19, 711–719 (2008).
Gasparrini, A. et al. Mortality risk attributable to high and low ambient temperature: a multicountry observational study. Lancet 386, 369–375 (2015).
Gasparrini, A. et al. Temporal variation in heat–mortality associations: A multicountry study. Environ. Health Perspect. 123, 1200–1207 (2015).
Zhao, Q. et al. Global, regional, and national burden of mortality associated with non-optimal ambient temperatures from 2000 to 2019: a three-stage modelling study. Lancet Planet. Health. 5, e415–e425 (2021).
Wouters, H. et al. Heat stress increase under climate change twice as large in cities as in rural areas: A study for a densely populated midlatitude maritime region. Geophys. Res. Lett. 44, 8997–9007 (2017).
Roca-Barceló, A. et al. Trends in temperature-associated mortality in São Paulo (Brazil) between 2000 and 2018: an example of disparities in adaptation to cold and heat. J. Urb. Health. 99, 1012–1026 (2022).
Tuholske, C. et al. Global urban population exposure to extreme heat. Proc. Natl. Acad. Sci. U.S.A. 118, 1–9 (2021).
UN. World urbanization prospects: the 2018 revision (ST/ESA/SER.A/420). Demographic Research 12 (2019).
Masselot, P. et al. Excess mortality attributed to heat and cold: a health impact assessment study in 854 cities in Europe. Lancet Planet. Health 271–281 (2023).
Mistry, M. N. et al. Comparison of weather station and climate reanalysis data for modelling temperature-related mortality. Sci. Rep. 12, 1–14 (2022).
Yao, R. et al. Global seamless and high-resolution temperature dataset (GSHTD), 2001–2020. Remote Sens. Environ. 286 (2023).
Peng, J., Hu, Y., Dong, J., Liu, Q. & Liu, Y. Quantifying spatial morphology and connectivity of urban heat islands in a megacity: A radius approach. Sci. Total Environ. 714, 136792 (2020).
Ghandehari, M., Emig, T. & Aghamohamadnia, M. Surface temperatures in New York City: Geospatial data enables the accurate prediction of radiative heat transfer. Sci. Rep. 8, 1–10 (2018).
Kousis, I., Pigliautile, I. & Pisello, A. L. Intra-urban microclimate investigation in urban heat island through a novel mobile monitoring system. Sci. Rep. 11, 1–17 (2021).
de Lima, G. N. & Magaña Rueda, V. O. The urban growth of the metropolitan area of Sao Paulo and its impact on the climate. Weather Clim. Extrem. 21, 17–26 (2018).
Arnfield, A. J. Two decades of urban climate research: A review of turbulence, exchanges of energy and water, and the urban heat island. Int. J. Climatol. 23, 1–26 (2003).
Hu, K. et al. Evidence for urban–rural disparity in temperature–mortality relationships in Zhejiang Province. Environ. Health Perspect. 127, 037001 (2019).
de Hoogh, K. et al. Development of West-European PM2.5 and NO2 land use regression models incorporating satellite-derived and chemical transport modelling data. Environ. Res. 151, 1–10 (2016).
de Hoogh, K., Héritier, H., Stafoggia, M., Künzli, N. & Kloog, I. Modelling daily PM2.5 concentrations at high spatio-temporal resolution across Switzerland. Environ. Pollut. 233, 1147–1154 (2018).
Schneider, R. Estimating spatio-temporal air temperature in London (UK) using machine learning and Earth observation satellite data. Int. J. Appl. Earth Obs. Geoinf. 88 (2020).
Sekulić, A., Kilibarda, M., Heuvelink, G. B. M., Nikolić, M. & Bajat, B. Random forest spatial interpolation. Remote Sens. (Basel). 12, 1–29 (2020).
Huerta, A. et al. High-resolution grids of daily air temperature for Peru - the new PISCOt v1.2 dataset. Sci. Data. 10, 1–22 (2023).
Bussalleu, A. et al. Modelling Europe-wide fine resolution daily ambient temperature for 2003–2020 using machine learning. Sci. Total Environ. 928 (2024).
Verdin, A. et al. Development and validation of the CHIRTS-daily quasi-global high-resolution daily temperature data set. Sci. Data. 7, 1–14 (2020).
Funk, C. et al. A high-resolution 1983–2016 TMAX climate data record based on infrared temperatures and stations by the climate hazard center. J. Clim. 32, 5639–5658 (2019).
Beck, H. E. et al. Present and future Köppen-Geiger climate classification maps at 1-km resolution. Sci. Data. 5, 1–12 (2018).
Shiff, S., Helman, D. & Lensky, I. M. Worldwide continuous gap-filled MODIS land surface temperature dataset. Sci. Data. 8, 1–10 (2021).
Wan, Z. Collection-6 MODIS Land Surface Temperature Products Users’ Guide (University of California, 2013).
Saha, S. et al. NCEP Climate Forecast System Version 2 (CFSv2) 6-hourly Products. Preprint at (2011).
Jang, J. D., Viau, A. A. & Anctil, F. Neural network estimation of air temperatures from AVHRR data. Int. J. Remote Sens. 25, 4541–4554 (2004).
Vancutsem, C., Ceccato, P., Dinku, T. & Connor, S. J. Evaluation of MODIS land surface temperature data to estimate air temperature in different ecosystems over Africa. Remote Sens. Environ. 114, 449–465 (2010).
Didan, K. MODIS/Terra Vegetation Indices 16-Day L3 Global 1km (MOD13A2_v006) NASA LP DAAC [Data set]. NASA EOSDIS Land Processes Distributed Active Archive Center. (2015). https://lpdaac.usgs.gov/products/mod13a2v006/
Qu, Y. et al. Mapping surface broadband albedo from satellite observations: A review of literatures on algorithms and products. Remote Sens. (Basel). 7, 990–1020 (2015).
Schaaf, C. & Wang, Z. MODIS/Terra + Aqua BRDF/Albedo Daily L3 Global – 500m V061. LP DAAC - MCD43A3 v061 [Data set]. NASA EOSDIS Land Processes Distributed Active Archive Center (2021). https://lpdaac.usgs.gov/products/mcd43a3v061/
Muñoz-Sabater, J. et al. ERA5-Land: A state-of-the-art global reanalysis dataset for land applications. Earth Syst. Sci. Data. 13, 4349–4383 (2021).
Tatem, A. J. WorldPop, open data for spatial demography. Sci. Data. 4, 1–4 (2017). https://doi.org/10.1038/sdata.2017.4
Stevens, F. R., Gaughan, A. E., Linard, C. & Tatem, A. J. Disaggregating census data for population mapping using random forests with remotely-sensed and ancillary data. PLoS One. 10, 1–22 (2015).
Gorelick, N. et al. Google Earth Engine (GEE): Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 202, 18–27 (2017).
Comitê da Bacia Hidrográfica do Alto Tietê (CBH-AT). Reservatórios da Região Metropolitana de São Paulo.
Zhang, X. et al. Development of a global 30-m impervious surface map using multi-source and multi-temporal remote sensing datasets with the Google Earth Engine platform. 3505079, 1–27 (2020).
Zhang, X. et al. Development of a global 30m impervious surface map using multisource and multitemporal remote sensing datasets with the Google Earth Engine platform. Earth Syst. Sci. Data. 12, 1625–1648 (2020).
Schneider, R. et al. A satellite-based spatio-temporal machine learning model to reconstruct daily PM2.5 concentrations across Great Britain. Remote Sens. (Basel). 12, 3803 (2020).
Yang, Y. Z., Cai, W. H. & Yang, J. Evaluation of MODIS land surface temperature data to estimate near-surface air temperature in Northeast China. Remote Sens. (Basel). 9, 1–19 (2017).
Gerber, F., De Jong, R., Schaepman, M. E., Schaepman-Strub, G. & Furrer, R. Predicting missing values in spatio-temporal remote sensing data. IEEE Trans. Geosci. Remote Sens. 56, 2841–2853 (2018).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Mehnert, P., Bröde, P. & Griefahn, B. Gender-related difference in sweat loss and its impact on exposure limits to heat stress. Int. J. Ind. Ergon. 29, 343–351 (2002).
Vicente-Serrano, S. M., Saz-Sánchez, M. A. & Cuadrat, J. M. Comparative analysis of interpolation methods in the middle Ebro Valley (Spain): application to annual precipitation and temperature. Clim. Res. 24, 161–180 (2003).
Zhang, H., Zhang, F., Ye, M., Che, T. & Zhang, G. Estimating daily air temperatures over the Tibetan Plateau by dynamically integrating MODIS LST data. J. Geophys. Res. Atmos. 121, 11,425 (2016).
Meyer, H., Reudenbach, C., Hengl, T., Katurji, M. & Nauss, T. Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation. Environ. Model Softw. 101, 1–9 (2018).
Pedregosa, F. et al. Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Loecher, M. Unbiased variable importance for random forests. Commun. Stat. Theory Methods. 51, 1413–1425 (2022).
Kloog, I., Nordio, F., Coull, B. A. & Schwartz, J. Predicting spatiotemporal mean air temperature using MODIS satellite surface temperature measurements across the Northeastern USA. Remote Sens. Environ. 150, 132–139 (2014).
Zhao, P. & He, Z. A first evaluation of ERA5-Land reanalysis temperature product over the Chinese Qilian Mountains. Front. Earth Sci. (Lausanne). 10, 1–10 (2022).
Zou, J. et al. Performance of air temperature from ERA5-Land reanalysis in coastal urban agglomeration of Southeast China. Sci. Total Environ. 828, 154459 (2022).
Lee, J. & Dessler, A. E. Improved surface urban heat impact assessment using GOES satellite data: A comparative study with ERA-5. Geophys. Res. Lett. 51, e2023GL107364 (2024).
Voogt, J. A. & Oke, T. R. Effects of urban surface geometry on remotely-sensed surface temperature. Int. J. Remote Sens. 19, 895–920 (1998).
Ho, H. C. et al. Mapping maximum urban air temperature on hot summer days. Remote Sens. Environ. 154, 38–45 (2014).
Zhu, W., Lu, A. & Jia, S. Estimation of daily maximum and minimum air temperature using MODIS land surface temperature products. Remote Sens. Environ. 130, 62–73 (2013).
Yoo, C., Im, J., Park, S. & Quackenbush, L. J. Estimation of daily maximum and minimum air temperatures in urban landscapes using MODIS time series satellite data. ISPRS J. Photogramm. Remote Sens. 137, 149–162 (2018).
Do Nascimento, A. C. L., Galvani, E., Gobo, J. P. A. & Wollmann, C. A. Comparison between air temperature and land surface temperature for the city of São Paulo, Brazil. Atmos. (Basel). 13, 1–21 (2022).
Mildrexler, D. J., Zhao, M. & Running, S. W. A global comparison between station air temperatures and MODIS land surface temperatures reveals the cooling role of forests. J. Geophys. Res. Biogeosci. 116, 1–15 (2011).
Lian, X. et al. Spatiotemporal variations in the difference between satellite-observed daily maximum land surface temperature and station-based daily maximum near-surface air temperature. J. Geophys. Res. 122, 2254–2268 (2017).
Fu, G. et al. Estimating air temperature of an alpine meadow on the Northern Tibetan Plateau using MODIS land surface temperature. Acta Ecol. Sin. 31, 8–13 (2011).
Dousset, B. AVHRR-derived cloudiness and surface temperature patterns over the Los Angeles area and their relationship to land use. in Proceedings of IGARSS-89 2132–2137 (IEEE, New York, NY, 1989).
Friedl, M. A. & Davis, F. W. Sources of variation in radiometric surface temperature over a tallgrass prairie. Remote Sens. Environ. 48, 1–17 (1994).
Goward, S. N. & Hope, A. S. Evapotranspiration from combined reflected solar and emitted terrestrial radiation: preliminary results from AVHRR data. Adv. Space Res. 9, 239–249 (1989).
Zhu, X., Zhang, Q., Xu, C. Y., Sun, P. & Hu, P. Reconstruction of high spatial resolution surface air temperature data across China: A new geo-intelligent multisource data-based machine learning technique. Sci. Total Environ. 665, 300–313 (2019).
Gutiérrez-Avila, I. et al. A spatiotemporal reconstruction of daily ambient temperature using satellite data in the megalopolis of central Mexico from 2003 to 2019. Int. J. Climatol. 41, 4095–4111 (2021).
Kloog, I., Chudnovsky, A., Koutrakis, P. & Schwartz, J. Temporal and spatial assessments of minimum air temperature using satellite surface temperature measurements in Massachusetts, USA. Sci. Total Environ. 432, 85–92 (2012).
Silva, F. B., Longo, K. M. & De Marques, F. Spatial and temporal variability patterns of the urban heat island in São Paulo. (2017). https://doi.org/10.3390/environments4020027
Shi, L. et al. Estimating daily air temperature across the Southeastern United States using high-resolution satellite data: A statistical modeling study. Environ. Res. 146, 51–58 (2016).
Kloog, I. et al. Modelling spatio-temporally resolved air temperature across the complex geo-climate area of France using satellite-derived land surface temperature data. Int. J. Climatol. 37, 296–304 (2017).
Rosenfeld, A. et al. Estimating daily minimum, maximum, and mean near surface air temperature using hybrid satellite models across Israel. Environ. Res. 159, 297–312 (2017).
Milà, C., Ballester, J., Basagaña, X., Nieuwenhuijsen, M. J. & Tonne, C. Estimating daily air temperature and pollution in Catalonia: A comprehensive spatiotemporal modelling of multiple exposures. Environ. Pollut. 337, 122501 (2023).
Flückiger, B. et al. Modelling daily air temperature at a fine spatial resolution dealing with challenging meteorological phenomena and topography in Switzerland. Int. J. Climatol. 42, 6413–6428 (2022).
Zhou, B. et al. Estimating near-surface air temperature across Israel using a machine learning based hybrid approach. Int. J. Climatol. 40, 6106–6121 (2020).
Xu, Y., Knudby, A. & Ho, H. C. Estimating daily maximum air temperature from MODIS in British Columbia, Canada. Int. J. Remote Sens. 35, 8108–8121 (2014).
Mohsenzadeh Karimi, S., Kisi, O., Porrajabali, M., Rouhani-Nia, F. & Shiri, J. Evaluation of the support vector machine, random forest and geo-statistical methodologies for predicting long-term air temperature. ISH J. Hydraulic Eng. 26, 376–386 (2020).
Hashimoto, H. et al. High-resolution mapping of daily climate variables by aggregating multiple spatial data sets with the random forest algorithm over the conterminous United States. Int. J. Climatol. 39, 2964–2983 (2019).
Li, J., Heap, A. D., Potter, A. & Daniell, J. J. Application of machine learning methods to spatial interpolation of environmental variables. Environ. Model Softw. 26, 1647–1659 (2011).
Sobstyl, J. M., Emig, T., Qomi, M. J. A., Ulm, F. J. & Pellenq, R. J. M. Role of city texture in urban heat islands at nighttime. Phys. Rev. Lett. 120, 108701 (2018).
Shi, H., Xian, G., Auch, R., Gallo, K. & Zhou, Q. Urban heat island and its regional impacts using remotely sensed thermal data – a review of recent developments and methodology. Land. (Basel). 10, 867 (2021).
Li, C., Zhao, J., Thinh, N. X., Yang, W. & Li, Z. Analysis of the spatiotemporally varying effects of urban spatial patterns on land surface temperatures. J. Environ. Eng. Landsc. Manage. 26, 216–231 (2018).
Schinasi, L. H., Benmarhnia, T. & De Roos, A. J. Modification of the association between high ambient temperature and health by urban microclimate indicators: A systematic review and meta-analysis. Environ. Res. 161, 168–180 (2018).
Hersbach, H. et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 146, 1999–2049 (2020).
Pelta, R. & Chudnovsky, A. A. Spatiotemporal estimation of air temperature patterns at the street level using high resolution satellite imagery. Sci. Total Environ. 579, 675–684 (2017).
Roca-Barceló, A. AinaRB. GitHub (2025). https://github.com/AinaRB
Roca-Barceló, A. Climate and health in urban areas: the case study of Sao Paulo Brazil - Climate health burden. (2022). https://ainarb.github.io/climate_and_health/
Acknowledgements
This work was supported by the Imperial College PhD President Scholarship awarded to Dr Aina Roca-Barceló. The content of this article is not officially endorsed by the funder. The authors declare no competing financial interest.
Author information
Contributions
Dr Aina Roca-Barceló: Conceptualization, Data Curation, Formal Analysis, Funding Acquisition, Investigation, Methodology, Validation, Visualization, Writing – Original Draft, Project Administration; Rochelle Schneider: Supervision, Methodology, Validation, Writing – Review And Editing; Monica Pirani: Supervision, Methodology, Validation, Writing – Review And Editing; Alessandro Sebastianelli: Methodology, Writing – Review And Editing; Frédéric B. Piel: Supervision, Validation, Writing – Review And Editing; Paolo Vineis: Supervision, Validation, Writing – Review And Editing; Adelaide Cassia Nardocci: Resources, Writing – Review And Editing; Daniela Fecht: Supervision, Methodology, Validation, Writing – Review And Editing, Project Administration.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Roca-Barceló, A., Schneider, R., Pirani, M. et al. A satellite based machine learning approach for estimating high resolution daily average air temperature in a megacity in Brazil. Sci Rep (2026). https://doi.org/10.1038/s41598-026-35689-x
DOI: https://doi.org/10.1038/s41598-026-35689-x