Introduction

Considerable research efforts have been made to investigate the relationship between the urban acoustic environment (AE) and human health. The adverse health effects of environmental noise exposure (e.g., cardiovascular disease and mental health issues) are well documented [1]. However, the AE can be defined as “the sound from all sound sources as modified by the environment” [2], encompassing a broader range of acoustic properties than environmental noise alone (e.g., underlying sound sources, temporal variations or frequency characteristics). A growing body of research is now examining the potential health benefits of additional acoustic properties [3]. Laboratory studies, for instance, have shown that calm and pleasant sounds are associated with greater reductions in heart rate [4], that natural sounds can promote stress recovery [5], and that biophonic AEs can positively influence functional brain connectivity [6]. Beyond laboratory settings, several field studies have also investigated the impact of natural sounds on various aspects of human health, reporting associations with reduced pain, lower stress levels, and improved cognitive performance [7].

However, there is a significant lack of comprehensive population-based studies examining the relationship between the AE and human health beyond environmental noise, which are crucial for examining the patterns, causes, and effects of health outcomes in defined population groups within real-life settings [8] (pp. 555-583). Such studies require high spatial resolution data to assess exposure to the respective acoustic properties under investigation. For research on the relation between environmental noise exposure and human health, data are already available through strategic noise maps (SNMs), mandated by the Environmental Noise Directive (END) in the EU [9]. Here, SNMs with high spatial resolution are created for agglomerations of over 100,000 inhabitants, as well as for major roads, railways, and airports. The SNMs assess exposure to “unwanted or harmful outdoor sound created by human activities” [9]. While they are effective at quantifying noise emitted by “transport, road traffic, rail traffic, air traffic, and from sites of industrial activity” [9], there is currently no established method to estimate exposure to additional acoustic properties of the AE at a comparable spatial resolution. Consequently, the current unavailability of high spatial resolution data on the AE beyond environmental noise exposure is a major barrier to conducting population-based studies on the health impacts of the wider AE.

A promising approach to overcome this barrier of exposure assessment involves the use of land use-based models. These models can estimate environmental exposures at high spatial resolution by leveraging the statistical relationships between measured exposure data (e.g., air pollution) and land use types (LUTs) (e.g., highways or green space). Land use-based models have already been used successfully to estimate outdoor air pollution [10]. Their applicability to acoustics is demonstrated by their effective use in estimating average noise levels and traffic noise, and in extrapolating SNMs [11,12,13,14,15,16,17,18,19,20,21]. Major advantages of LUT-based models are that they are cost-effective and computationally inexpensive, especially when compared to more complex alternatives such as the creation of SNMs [17]. In addition, they are easily scalable, as LUT information is often readily available, and are thus suited to data-poor regions where detailed exposure measurements are limited or unavailable (e.g., agglomerations with fewer than 100,000 inhabitants). Furthermore, as they are based on land use information, they can be directly integrated into urban planning processes. However, to the best of our knowledge, land use-based models have not yet been applied to estimate acoustic indices beyond noise.

In this work, we investigate the potential of LUT-based models to estimate acoustic properties beyond environmental noise exposure, as defined by the END. We focus on the acoustic properties captured by the Articulation Index [22], the Bioacoustic Index [23], the Link Density [24] and the maximum Sharpness [25]. These indices were chosen because previous work has shown their relations to LUTs as well as to the human perception of pleasantness [26], health-related urban greenspace [27] and biophonic activity [23, 28], thus indicating potentially health-relevant acoustic properties. In addition, we integrate the A-weighted sound pressure level (LAeq) as a measure of total environmental noise. This also enables us to compare our model, the SNM and results from the literature in terms of their performance in predicting total environmental noise. To facilitate a scalable approach, we rely exclusively on LUTs as predictor variables because they are usually easily accessible and, in many cases, even legally mandated. Specifically, the research questions are:

  1. To what extent can gradient boosting models based on LUTs predict properties of the urban AE at high spatial resolution?

  2. How well do these models perform in estimating:

    i. Acoustic properties beyond noise, when tested on the same-source data and external-source measurements?

    ii. Total environmental noise levels, when tested on the same-source data, external-source measurements and in comparison to SNM predictions?

Material and methods

To address our research questions, we train a gradient boosting model on LUT data provided by local authorities and acoustic measurements from the SALVE (AcouStic QuAlity and HeaLth in Urban EnVironmEnts) project in Bochum [29]. We evaluate the performance of the model using 5-fold cross-validation, test data from SALVE, as well as acoustic measurements from the Be-MoVe (Participation-based transformation of active mobility for health-promoting urban and transport infrastructures) project conducted in Essen, Germany [30]. Additionally, we compare the model’s performance in estimating total noise levels with the performance of the SNM of Bochum. Finally, we demonstrate the application of the model to the area of Bochum, Essen and the neighbouring city of Mülheim an der Ruhr.

Data on the acoustic environment

SALVE

To train and test our model, we rely on acoustic measurements from the SALVE project in Bochum. A comprehensive description of the SALVE study design is provided in Haselhoff et al. [29]. The sampling strategy was designed to capture the environmental contexts in close proximity to people’s homes. The recording locations were selected by stratified LUT sampling, as described in Haselhoff et al. [29]. At each sampling point, four audio recordings (5-min, 48 kHz, 24-bit depth) were made, one per season, between March 2019 and March 2020. Recordings took place between 09:00 and 17:00. The recording device was an NTi XL2 sound recorder with the M2230 omnidirectional microphone [31], mounted at a height of around 1.65 m. The device was calibrated to meet the standards IEC 61672:2013, IEC 61672:2003, IEC 61260:2014, IEC 61260:2003, IEC 60651 and IEC 60804. A total of 2746 audio recordings were gathered at 785 locations. The number of recording locations differs from 730, as we considered measurement location deviations of more than 5 m as significant.

Be-MoVe

To test our model, we draw data from the Be-MoVe project, which aimed to test co-created alternative forms of mobility and designs of public spaces in urban neighbourhoods. The study design is described in detail in Hornberg et al. [30]. Briefly, as part of the Be-MoVe project, soundwalks according to DIN ISO 12913 [2] were conducted to assess the acoustic environment at 22 locations in two districts in Essen. The districts are characterised by (i) apartment buildings, recreational areas and open space, and (ii) a diverse urban landscape, ranging from shopping streets and commercial centres to densely populated residential zones. For each soundwalk, an acoustic measurement (3-min, 48 kHz, 24-bit depth) was conducted at each listening station, using the same device as in the SALVE project. Recordings were made between 10.05.2023 and 06.09.2023 between 18:00 and 19:30. Originally, the project was conducted from 2022 to 2023, but acoustic measurements from 2022 were made using an uncalibrated recording device. In total, we included 90 measurements across 22 listening stations.

Acoustic properties

For each recording, a number of acoustic indices are derived. The total noise level is assessed using the A-weighted sound pressure level (LAeq), which was directly provided by the recording devices. As a measure of intelligibility of human voice, we use the Articulation Index based on the root mean square (RMS) of the signal. The Articulation Index indicates how strongly background noise interferes with human speech and ranges from 0 (no speech understood) to 1 (all speech understood) [22]. As a measure of the energetic centre of gravity between frequencies, we use the maximum Sharpness value during each recording, according to DIN 45692 [25]. It is measured in Acum and ranges from zero to infinity, where higher values indicate sounds with more energy in higher frequency bands (i.e., whether a sound is perceived as sharp, shrill, bright or hissing). Both the Articulation Index and the Sharpness are calculated using Artemis SUITE (version 15.1). Additionally, we use the Bioacoustic Index (BIO). The BIO ranges from zero to infinity, where higher values indicate greater differences between the quietest and the loudest 1 kHz frequency band between 2 and 8 kHz. In more rural areas, higher values are intended to indicate higher avian abundance, while in urban areas, higher values are found to be more indicative of road traffic and less green space [23, 28]. The BIO is calculated following the script used in Lawrence et al. [32]. Furthermore, we include the Link Density as a measure of acoustic dominance, i.e., a measure of how many factors (i.e., sound sources) contribute to the overall spectral dynamic. It ranges from zero to one, where higher values indicate higher acoustic dominance (e.g., by cars passing by). The Link Density is calculated following the procedure outlined in Haselhoff et al. [27].

As we want to predict the average acoustic properties at each location, we calculate the mean of all indices (using energetic averages for the LAeq) for each location from the measurements conducted at each sampling point, resulting in 785 measurement locations for the SALVE and 22 for Be-MoVe.
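The difference between an energetic and an arithmetic average can be illustrated with a minimal Python sketch. The function name and the seasonal values are hypothetical and serve only as an example; they are not taken from the SALVE data.

```python
import math

def energetic_mean(levels_db):
    """Energetic (power) average of sound levels given in dB."""
    # Convert each level to relative sound energy, average, convert back to dB.
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)

# Hypothetical seasonal LAeq values for one location, in dB(A).
seasonal_laeq = [52.0, 55.0, 61.0, 58.0]
location_laeq = energetic_mean(seasonal_laeq)
arithmetic_laeq = sum(seasonal_laeq) / len(seasonal_laeq)

print(f"energetic mean:  {location_laeq:.1f} dB(A)")
print(f"arithmetic mean: {arithmetic_laeq:.1f} dB(A)")
```

Because the decibel scale is logarithmic, the energetic mean is dominated by the loudest recordings and is never lower than the arithmetic mean of the same values.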

Training, validation, and test split

We partition the SALVE dataset into a training/validation (85%, n = 668) and a test (15%, n = 117) subset, following Almansi et al. [33] and the commonly used 70/30 split between training/validation and test data [34]. We refer to the test subset as SALVETest. The training/validation data are used for model development through 5-fold cross-validation. The SALVETest subset and the Be-MoVe dataset are used to evaluate the model’s performance on unseen data.

Strategic noise maps

To evaluate our model’s performance in predicting total environmental noise, we additionally compare it to the performance of the SNM in Bochum. For this, we received the SNM from the city of Bochum in 2022. To match our recording times, we use the LDay in 1 dB(A) increments, i.e., the A-weighted long-term average sound level between 06:00 and 18:00. The provided values range from 34.5 to 79.5 dB(A). Since there are multiple SNMs depicting different noise sources (road traffic, railway traffic and industry), we combine them by overlaying the maps and energetically summing their respective LAeq values. As the SNM has a resolution of 5 × 5 m, we compare the LAeq point measurements from SALVE with corresponding polygon values from the SNM by extracting the predicted LAeq from the SNM at each measurement location.
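The energetic summation of overlaid map layers can be sketched as follows. This is an illustration under assumptions: the function name, the handling of missing layers and the example values are hypothetical and do not describe the actual GIS workflow.

```python
import math

def energetic_sum(levels_db):
    """Energetically sum sound levels in dB, ignoring missing layers (None)."""
    present = [level for level in levels_db if level is not None]
    return 10 * math.log10(sum(10 ** (level / 10) for level in present))

# Hypothetical LDay values of the three source-specific maps at one location.
cell = {"road": 62.5, "rail": 55.5, "industry": 48.5}
combined = energetic_sum(cell.values())
print(f"combined level: {combined:.1f} dB(A)")

# Two equally loud sources raise the level by about 3 dB.
print(f"60 + 60 dB(A) -> {energetic_sum([60.0, 60.0]):.2f} dB(A)")
```

The combined level is only slightly above the loudest single source, since quieter sources contribute little energy to the sum.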

Environmental data

The predictor variables are based on land use data provided by the Ruhr Regional Association (RVR) for the year 2019 [35]. In total, there are 146 LUT categories. We derive the LUTs around each recording location by calculating the proportion of each LUT category within buffers of (i) 50 m, to capture the immediate surroundings of each location, and (ii) 300 m radius, to capture potentially important acoustic impacts of LUT within a wider distance. Altogether, this results in 292 possible predictor variables per recording location.

Methods

Descriptive statistics

To provide an overview of the acoustic properties of our recordings, we show descriptive statistics, including the arithmetic mean, the minimum, the maximum and the standard deviation of the SALVE and the Be-MoVe datasets. In addition, since we have multiple recordings for each location, we report the within-location standard deviation to capture the inherent variability of respective acoustic properties. The reported within-location standard deviation is the arithmetic mean of each within-location standard deviation per recording location.
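As a minimal sketch of these two measures of spread, the Python example below computes a between-location and a within-location standard deviation. It assumes that the between-location value is the sample standard deviation of the location means; the function name and the data are hypothetical.

```python
from statistics import mean, stdev

def between_and_within_std(recordings):
    """recordings maps a location id to the index values recorded there."""
    # Between-location STD: spread of the per-location means (assumption).
    location_means = [mean(values) for values in recordings.values()]
    between = stdev(location_means)
    # Within-location STD: arithmetic mean of the per-location STDs.
    per_location = [stdev(v) for v in recordings.values() if len(v) > 1]
    within = mean(per_location)
    return between, within

# Hypothetical index values for three locations with four recordings each.
example = {
    "loc_a": [2.1, 2.4, 1.9, 2.2],
    "loc_b": [1.6, 1.8, 1.5, 1.7],
    "loc_c": [2.9, 2.5, 3.1, 2.7],
}
between_std, within_std = between_and_within_std(example)
print(f"between: {between_std:.3f}, within: {within_std:.3f}")
```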

Feature reduction

To improve the efficiency of the gradient boosting models, we remove unrelated LUT categories from the set of predictor variables by using Boruta feature selection. Briefly, Boruta feature selection is a method that identifies all relevant features by comparing the importance of the actual variables to that of randomly permuted “shadow features” of these variables, using a random forest classifier [36]. We repeat this for each acoustic index against all 292 LUT variables in our dataset and keep those LUT variables that were identified by the algorithm as “important” or “tentative”. Finally, we harmonize all identified LUT variables into a single predictor set for all acoustic indices (Appendix 1). We perform the calculation using the “Boruta” function from the Boruta package (version 8.0.0) in R (4.1.3), applying the default settings as recommended by the authors [36].

Gradient boosting

We train five separate gradient boosting regressor models to predict the respective acoustic indices using the LUT features identified by the Boruta algorithm [36]. Gradient boosting builds a predictive model by sequentially combining multiple weak learners. The gradient boosting framework consists of three key components: (i) a loss function to be minimized, (ii) a weak learner for generating predictions, and (iii) an additive model that incorporates new learners to reduce the residual errors of the ensemble. There are three main hyperparameters, which impact the performance of the model: the number of trees, the maximum depth of the trees, and the learning rate [37]. We optimize these parameters by performing a grid search combined with a 5-fold cross-validation. A 5-fold cross-validation is an evaluation technique used to assess how well a model generalizes to unseen data. The dataset is randomly divided into five equal parts (folds). In each of five iterations, four folds are used to train the model, and the remaining fold is used for validation. This process repeats until each fold has been used once for validation. The final performance metric is the average of the five validation results. For the grid search, we iterate over the hyperparameters number of trees (10, 50, 100, 500, 1000, 5000), maximum tree depth (3, 4, 5, 6), and learning rate (0.0001, 0.001, 0.01, 0.1, 1.0). For each hyperparameter combination, a 5-fold cross-validation is used to evaluate the model performance. The optimal hyperparameter combination is selected based on the lowest root mean square error (RMSE). In the following, we call the model based on 668 locations SALVETrain. Once the optimal hyperparameters are defined, and after the model is tested against the SALVETest data, we train the model on the entire SALVE dataset to maximize its learning from all available data and prepare it for the evaluation on the Be-MoVe data. This model is referred to as SALVEAll.
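The tuning procedure can be sketched with scikit-learn as follows. This is an illustration on synthetic data, not the code used in this study: the predictors and target are randomly generated, the grid is reduced to keep the run time short, and the mapping of the three hyperparameters to `n_estimators`, `max_depth` and `learning_rate` is our assumption.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for LUT proportions (X) and an acoustic index (y).
rng = np.random.default_rng(0)
X = rng.random((120, 6))
y = 50 + 20 * X[:, 0] - 10 * X[:, 1] + rng.normal(0, 2, 120)

# Reduced grid for illustration; the grid described in the text is larger.
param_grid = {
    "n_estimators": [10, 50, 100],
    "max_depth": [3, 4],
    "learning_rate": [0.01, 0.1],
}

# Every combination is scored by 5-fold cross-validation; scikit-learn
# maximises scores, so the RMSE is used with a negative sign.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

best_rmse = -search.best_score_
print(search.best_params_, round(best_rmse, 2))
```

By default, `GridSearchCV` refits the best combination on all supplied data, which corresponds to training one final model once the hyperparameters are fixed.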

Furthermore, as we observed significant spatial autocorrelation in selected acoustic indices in previous publications [27], which can lead to artificial inflation of performance measures, we investigate the robustness of our results using leave-spatial-out-cross-validation [17]. Recording locations are first grouped according to spatial proximity using a regular grid-based approach (300 and 1000 m). These spatial groupings are then used to define the folds in a grouped k-fold cross-validation scheme, ensuring that each spatial group is used exactly once as the test set across all folds.
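A minimal sketch of such a grid-based grouping is given below. The function names, the round-robin assignment of grid cells to folds and the coordinates are hypothetical; they only illustrate that all locations within one grid cell end up in the same fold.

```python
def grid_cell_id(x, y, cell_size):
    """Index of the regular grid cell containing the point (x, y) in metres."""
    return (int(x // cell_size), int(y // cell_size))

def spatial_folds(coords, cell_size, n_folds=5):
    """Assign each location to a fold so that grid cells are never split."""
    cells = sorted({grid_cell_id(x, y, cell_size) for x, y in coords})
    # Round-robin assignment of whole cells to folds (an assumption).
    fold_of_cell = {cell: i % n_folds for i, cell in enumerate(cells)}
    return [fold_of_cell[grid_cell_id(x, y, cell_size)] for x, y in coords]

# Hypothetical projected coordinates of five recording locations.
coords = [(10, 10), (20, 290), (310, 10), (305, 20), (1000, 1000)]
folds = spatial_folds(coords, cell_size=300)
print(folds)
```

Locations that share a 300 m cell receive the same fold index, so nearby and potentially autocorrelated recordings are never split between training and validation data.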

Performance measures

To evaluate the performance of each model, we mainly rely on three performance measures: The root mean square error (RMSE), the mean absolute error (MAE) and the coefficient of determination (R2). RMSE provides a measure of the average magnitude of the prediction error, with greater weight given to larger errors. MAE quantifies the average absolute difference between predicted and observed values, offering a more interpretable and less sensitive measure to outliers compared to the RMSE. To facilitate a comparison of model performance, the MAE and RMSE are further normalized by (i) the range (i.e., the difference between the maximum and minimum values) and (ii) the mean of the acoustic index within the respective target dataset. In addition, R² indicates the proportion of variance in the observed data explained by the model, with values closer to 1 signifying better predictive performance. We provide a visualisation to compare the model predictions against the true values using scatter plots (including Pearson’s r) as well as histograms.
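The performance measures and their normalized variants can be written out explicitly. The following Python sketch uses hypothetical values; the function name and the dictionary keys are illustrative.

```python
import math

def performance(observed, predicted):
    """MAE, RMSE, R2 and range-/mean-normalized errors for paired values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    obs_mean = sum(observed) / n
    obs_range = max(observed) - min(observed)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return {
        "MAE": mae,
        "RMSE": rmse,
        "R2": 1 - ss_res / ss_tot,  # negative if worse than the mean
        "MAE_range": mae / obs_range,
        "RMSE_range": rmse / obs_range,
        "MAE_mean": mae / obs_mean,
        "RMSE_mean": rmse / obs_mean,
    }

# Hypothetical observed and predicted LAeq values in dB(A).
observed = [50.0, 55.0, 60.0, 65.0]
predicted = [52.0, 54.0, 61.0, 62.0]
result = performance(observed, predicted)
print({name: round(value, 3) for name, value in result.items()})
```

R² becomes negative as soon as the squared prediction errors exceed the squared deviations of the observations from their own mean, i.e., when the model performs worse than always predicting that mean.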

SHAP

For a more detailed investigation of the model’s performance, we assess the importance of each predictor variable using SHAP (SHapley Additive exPlanations). SHAP values provide a unified measure of feature importance by quantifying the contribution of each variable to individual predictions [38]. This approach allows for a consistent and interpretable explanation of the model output by attributing the prediction difference from the mean to each feature. We report SHAP values using a beeswarm plot for the five most important predictors for each model. SHAP values are derived by the application of the SALVEAll model on the entire SALVE dataset.

Sound maps

To demonstrate the scalable properties of the LUT-based model, we generate “sound maps” for the cities of Bochum, Essen, and Mülheim. If the models demonstrate sufficient performance, they could enable population-level exposure estimates to AE properties beyond conventional environmental noise metrics. To construct the sound maps, we overlay a grid of equally spaced sampling points (30 m apart) across the three cities. For each point, we calculate the LUT variables used in the model within buffers of 50 m and 300 m. We then apply the final prediction model to estimate acoustic properties at each location. For visualisation purposes only, we use Kriging [39] to interpolate values between the point estimates, to create a continuous surface of predicted values.
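The construction of the regular point grid can be sketched in a few lines of Python. The function name and the bounding box are hypothetical; the actual sampling was carried out in a GIS environment.

```python
def sampling_grid(xmin, ymin, xmax, ymax, spacing=30.0):
    """Equally spaced points covering a bounding box (coordinates in metres)."""
    nx = int((xmax - xmin) // spacing) + 1
    ny = int((ymax - ymin) // spacing) + 1
    return [
        (xmin + i * spacing, ymin + j * spacing)
        for i in range(nx)
        for j in range(ny)
    ]

# Hypothetical 90 m x 60 m bounding box with the 30 m spacing used here.
points = sampling_grid(0.0, 0.0, 90.0, 60.0)
print(len(points), points[0], points[-1])
```

Each grid point is then treated like a recording location: the LUT proportions within the 50 m and 300 m buffers are computed and passed to the trained model to obtain a prediction.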

We calculate gradient boosting, hyperparameter tuning and performance measures using the sklearn package (0.42.2) and SHAP values using the shap package (0.46.0) in Python (3.9.7). The calculation of LUT areas in two buffer sizes, the sampling of the sound map points and the Kriging are performed using ArcGIS (3.2.0).

Results

Data description

The LAeq for the SALVE dataset has a mean of 55.3 dB(A) with a range between 33.6 and 79.3 dB(A) (Table 1). The standard deviation (STD) is higher between recording locations than within recording locations, showing a more stable sound pressure level at each location than between locations. In the Be-MoVe dataset, the STD is also higher between recording locations (5.1 dB(A)) than within them (2.5 dB(A)), but the values are overall lower here, underlining a less diverse AE in terms of LAeq than the one measured in the SALVE project. The Articulation Index has a mean of 0.846, a minimum of 0.207 and a maximum of 1 in the SALVE dataset, while also having a much higher between (0.136) than within STD (0.076). This tendency is even stronger in the Be-MoVe dataset, with a between STD of 0.146 and a within STD of 0.051. In contrast, the between STD for the BIO is smaller than the within STD in both datasets (SALVE: 0.985, 1.190; Be-MoVe: 0.824, 1.249), showing a higher variation within recording locations than between. For the Link Density, this behaviour is also observable for the SALVE dataset, but much less pronounced, with a between STD of 0.203 and a within STD of 0.210. Furthermore, the mean is higher in SALVE with 0.563 against 0.478 in Be-MoVe, and the Be-MoVe minimum (0.081) and maximum (0.927) fall within the range between the SALVE minimum (0.05) and maximum (0.984). The mean sharpness of the SALVE dataset is 2.029 Acum, ranging from 1.4 to 3.328 Acum. Similar to the Bioacoustic Index, a higher within STD (0.356 Acum) than between STD (0.305 Acum) can be observed, which is also the case for the Be-MoVe dataset (0.266 & 0.185 Acum resp.).

Table 1 Descriptive Statistics for the SALVE and Be-MoVe dataset.

Modeling

Feature reduction

From the 292 initial predictor variables, 215 are identified as unimportant, resulting in a list of 77 LUT categories relevant for predicting acoustic properties in the urban environment (28 LUT categories for a 50 m buffer and 49 LUT categories for a 300 m buffer). A list of all relevant LUT categories can be found in Appendix 1.

Hyperparameter tuning

We apply a grid search in combination with a 5-fold cross-validation to tune the hyperparameters of the gradient boosting model on the training/validation data of the SALVE dataset. We find the optimal parameters for number of trees, maximum tree depth, and learning rate for the indices LAeq (1000, 3, 0.01), Articulation Index (5000, 3, 0.001), Bioacoustic Index (500, 3, 0.01), Link Density (500, 3, 0.01) and Sharpness (5000, 3, 0.001). The results for the performance measures from the final 5-fold cross-validation can be found in Table 2. For the LAeq, we find an MAE of 3.9 (STD = 0.4) and an RMSE of 5 (0.5). The higher RMSE indicates an impact of larger outliers between predictions and measurements. This pattern can be found for each acoustic index, with the RMSE being approximately 1.3 times greater than the MAE. Overall, the R² varies considerably across the models, ranging from a low of 0.133 (0.078) for Sharpness to a high of 0.522 (0.054) for the Articulation Index. The model for the BIO also shows a relatively low explanatory power (0.135 ± 0.104), while for the LAeq, it is located at the upper end (0.463 ± 0.058), and for Link Density, it falls in the mid-range (0.286 ± 0.056). These results indicate that the models show varying predictive quality [40]. In addition, the relatively low STDs in comparison to the means indicate a stability of model performance across different folds.

Table 2 Results of the 5-fold cross-validation.

Model performance

To investigate the model performance on unseen data, we apply the SALVETrain model on the SALVETest data and the SALVEAll model on the Be-MoVe data. Results can be found in Table 3 and in Appendix 2. For a more detailed investigation, we also provide scatter plots and histograms between true values and model predictions, as well as SHAP values to investigate the most predictive features (Fig. 1).

Fig. 1: Model performance for Articulation Index, Bioacoustic Index, Link Density & Sharpness.

Model performance is visualized using scatter plots between test data and model predictions (incl. 95% confidence interval), histograms of the difference between test data and model predictions as well as SHAP values for the top five most important predictor variables in the gradient boosting model for Articulation Index (a), Bioacoustic Index (b), Link Density (c) and Sharpness (d); (RMSE = Root Mean Square Error, r = Pearson correlation). The unit of LAeq is dB(A). The unit of Sharpness is Acum.

Table 3 Performance measures for model application on test data.

For the LAeq predictions of the SALVETest dataset, we find slightly lower MAE and RMSE as well as an increased R² in comparison to the 5-fold cross-validation results. This tendency is more pronounced for the model application on the Be-MoVe data, with even lower values for MAE and RMSE. However, all values fall within the results from 5-fold cross-validation (mean ± STD), underlining the robustness of the model for completely unseen LAeq data. For the Articulation Index, the BIO and the Sharpness, we find that the model’s performance is comparable to or even better than that of the cross-validation when applied to the SALVETest data. However, the performance decreases strongly when applied to the Be-MoVe data. Here, MAE and RMSE double for the Articulation Index and increase by a factor of approx. 1.5 for the BIO. Although the values slightly decrease for Sharpness, the R² value drops below 0 for all three indices, indicating poor explanatory power in the model’s prediction for the Be-MoVe data. This is underlined by the scatter plots of the linear fit between predicted and measured values (Fig. 1). With a Pearson’s r of 0.23 (95% CI [–0.21, 0.6]) and –0.18 ([–0.56, 0.26]), respectively, the BIO and the Sharpness show no substantial linear relation, especially when compared to the model application on the SALVETest data (r = 0.57 [0.44, 0.68] and 0.54 [0.4, 0.66] resp.). In contrast, the Articulation Index still exhibits a linear but weaker relationship, with the r decreasing from 0.8 [0.72, 0.85] to 0.59 [0.22, 0.81]. Especially pronounced is the heteroscedastic pattern, which shows a very good fit for higher but an increasingly poor fit for lower Articulation Index values. The performance measures for Link Density remain relatively robust. Results on the SALVETest data are consistent with those from cross-validation, while the application to the Be-MoVe data shows a slight increase in MAE and RMSE, along with a modest increase in R².

Regarding the predictive importance of LUT categories, SHAP values indicate that “Main Streets” are consistently among the top predictors, ranking as the most important variable for all indices except Sharpness, where it ranks second. Similarly, “Highways” (for Articulation Index and Sharpness) as well as “Residential streets” (for BIO and Link Density) are among the top five most important predictors. Furthermore, different commercial areas play an important role for the AE with regard to Articulation Index and Link Density. The presence of all these LUTs almost exclusively increases (BIO & Link Density) or decreases (Articulation Index & Sharpness) the respective indices. The opposite can be observed for the frequently important predictor “Deciduous Forest”. Looking at the distribution by buffer sizes, the immediate surroundings seem to be slightly more important than the broader environment for predicting the acoustic indices, with twelve of the twenty top-five predictors relating to the 50 m buffers. Here, the Link Density stands out, as all of its most important predictors belong to the 50 m buffer.

The results from leave-spatial-out-cross-validation (LSOCV) to account for the potential impact of spatial autocorrelation on the model performance can be found in Appendix 3. By applying LSOCV using spatial groupings based on a 300 and 1000 m regular grid, we find no artificial inflation of model performance, as all measures fall between the mean ± the standard deviation reported in Table 2.

Model performance in comparison to the strategic noise map

For an improved understanding of the results and to embed them in the overall context of predicting the environmental AE, we compare the performance of our model in predicting total environmental noise to the performance of the SNM of Bochum (Fig. 2). For all investigated measures, we see an overall improved performance of the LUT-based models compared to the SNM (Fig. 2a). In direct comparison, the SNM MAE for predicting the LAeq is 1.84 dB(A) higher than that of the SALVETrain model in predicting LAeq measures from SALVE, while the RMSE is 3 dB(A) higher. Considering the R², the SNM estimates show a poor performance in predicting total environmental noise, with a value of –0.307, indicating that the SNM performs worse than a naive model using the arithmetic mean of the test data. However, in contrast to the BIO and Sharpness models, the scatter plots (Fig. 2b) still reveal a substantial linear relation of r = 0.57 (95% CI [0.52, 0.61]) between predicted and measured LAeq values. The biggest deviations between measured and predicted LAeq values are found for low predictions around 35 dB(A) paired with measured values between approx. 40 and 70 dB(A).

Fig. 2: Model performance for LAeq.

a Model performance comparison in predicting total environmental noise, using the strategic noise map, the SALVETrain model and the SALVEAll model against test data. “SALVE” corresponds to the total SALVE dataset of 785 recordings; (MAE=Mean Absolute Error, RMSE=Root Mean Square Error, R2=Coefficient of Determination). b Model performance visualized using scatter plots between test data (x-axis) and model predictions (incl. 95% Confidence Interval) (y-axis), as well as histograms of the difference between test data and model predictions. LAeq in dB(A) is given, if not indicated otherwise; (r = Pearson correlation, CI=Confidence interval).

This tendency of the SNM to underestimate measured levels is further emphasized by the histogram, which shows a right-skewed distribution of residuals ranging from –13.5 dB(A) to 35.1 dB(A). In contrast, residuals are more symmetrically distributed around 0 for the model predictions of the SALVE and Be-MoVe data, ranging from –11.3 to 16.7 dB(A) and –5.4 to 8.7 dB(A), respectively. Still, underestimates remain more prevalent for these models as well.

Discussion

The goal of this work is to investigate whether LUT-based models can predict properties of the urban AE and how they perform in doing so. As LUT data is often widely available, such models offer an efficient approach for estimating AE properties at high spatial resolution—information that is critical for large-scale studies on AE properties beyond noise modeled by SNMs.

For the LUT model applied to predict LAeq, all performance metrics indicate improved performance compared to the predictions from the SNM. We find improved (i.e., decreased) MAE (by approx. 2 dB(A)) and RMSE (3 dB(A)) for the model application on unseen data from two datasets. As an increase of 3 dB(A) corresponds to a doubling in energy, these represent substantial differences between the LUT-based model and the SNM. This is underlined by comparing the models’ R2, which even becomes negative for the SNM predictions. However, it should be noted that SNMs are not specifically designed to estimate total environmental noise, but rather noise from specific sound sources (major road, rail and air traffic as well as industry noise). Furthermore, estimates are made for a height of ~4 m and at a resolution of 5 × 5 m. Therefore, the SNM may be unable to capture acoustic differences at finer scales, unlike point estimates from the gradient boosting model, which, in theory, can achieve arbitrarily high resolution. In addition, the exclusion of roads with fewer than six million vehicle passages a year complicates the use of SNMs for total noise assessment. Although SNM results are often treated as synonymous with total noise pollution, our results should not be viewed as a shortcoming of the SNM. Rather, the results highlight that there are substantial noise sources contributing to total environmental noise that are not considered by SNMs. This finding is in line with several results from the literature, which also highlight the conceptual shortcoming of SNMs when interpreted as total environmental noise [41, 42].

Our findings are largely in line with those of previous studies that modeled comparable noise concepts based on land use data. Liu et al. [14] report an MAE of 3.47 dB(A) and an RMSE of 4.44 dB(A) with an R² of 0.58 in predicting LAeq measurements from five Canadian cities. Aguilera et al. [11] report R² values between 0.66 and 0.87 and also provide comparisons to SNMs, with a Pearson’s r² between 0.38 and 0.61. However, they predict road traffic noise, which represents only a part of the urban AE. Another study compared noise predictions from five different models to noise measurements from five cities in Bulgaria, focusing mainly on traffic lanes and industrial sites [43]. They find the best performing model to be extreme gradient boosting, with an RMSE of 4.74 dB(A) and an R² of 0.68. In addition, there are several other studies that support our findings [13, 15, 16, 44]. Although these models include information in addition to LUTs (e.g., traffic volume, meteorological data) and predict different forms of noise, their performance is close to what we find in our study. In comparison to the noise map performance here (RMSE = 7.8 dB(A); R² = –0.31), these results suggest that such models could be more effective in predicting noise exposure at a high spatial resolution.

In addition to predicting the LAeq, we also predicted the Articulation Index, the BIO, the Link Density and the maximum Sharpness. Here, results are mixed. While a moderate to good performance on the SALVETest data can be observed for all indices, performance drops substantially when the models are applied to the Be-MoVe data. This is especially true for the Articulation Index model, whose R² was the highest of all indices (0.609) but became negative for the Be-MoVe data (–0.149). Inspecting the scatter plots, we find that there is still a substantial linear relation between predicted and measured values (r = 0.59, 95% CI [0.22, 0.81]), though model performance clearly declines at Articulation Index values lower than ~0.8. As 68% of the Be-MoVe dataset's values are below this value, this might explain the low performance for predictions in denser and more diverse built-up urban environments. Still, the model performs reasonably well in predicting Articulation Index values at locations with a high percentage of intelligible speech. Neither such patterns nor substantial linear relationships are found for the models predicting the BIO and the Sharpness. In both cases, the R² becomes negative for the Be-MoVe data. For the BIO, the MAE and the RMSE also strongly increase, while they stay approximately the same for the Sharpness. Overall, this indicates that the models incorporating only LUT information do not perform well in predicting these indices. One reason could be that the indices focus on frequency power-related information, rather than overall sound pressures. Since frequency power tends to vary more over time than across locations [45], a model based on seasonal averages may overlook important temporal dynamics. In addition, the suitability of the BIO and Sharpness indices for capturing meaningful information about the urban AE is still subject to debate [27].
In contrast, the model predicting the Link Density shows similar performance measures between the applications on both datasets. Although the R² values between 0.27 and 0.31 only show a moderate fit, an r of 0.52 (95% CI [0.38, 0.64]) for the SALVETest data and of 0.79 (95% CI [0.56, 0.91]) for the Be-MoVe dataset indicate robust performance across datasets. While similar concerns regarding frequency information also apply to Link Density, the current results are promising and are likely to improve with models that incorporate temporal information.
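The width of the confidence intervals reported above depends strongly on sample size. As a minimal sketch, the following computes a 95% interval for a Pearson correlation with the Fisher z-transformation. The text does not state which method was used for the reported intervals, so this choice is an assumption, as is the use of n = 22 (the number of Be-MoVe recording locations given in the limitations). The function name `fisher_ci` is illustrative.

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate confidence interval for a Pearson correlation r from n pairs.

    Uses the Fisher z-transformation: atanh(r) is treated as approximately
    normal with standard error 1 / sqrt(n - 3). z_crit = 1.959964 gives a
    two-sided 95% interval.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Assumed inputs: r = 0.59 (Articulation Index, Be-MoVe) and n = 22 locations.
lower, upper = fisher_ci(0.59, 22)
print(round(lower, 2), round(upper, 2))
```

Under these assumptions the interval is close to the reported [0.22, 0.81], which illustrates how wide the uncertainty around a correlation remains with only 22 locations.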

To the best of our knowledge, this is the first study to use LUT-based models for predicting acoustic indices that capture acoustic properties beyond noise at a citywide scale. Therefore, no comparable results for the specific indices used here are available. However, Clark et al. [46] used a LUT-based random forest approach to model the presence of different sound sources (e.g., traffic, animal) in Accra, Ghana. As model performance measures, they report r values ranging from 0.01 (Nature) to 0.72 (Animal and insects), with the majority of values being around 0.5.

Strengths and limitations

One of the major strengths of this study is the demonstration that models solely based on readily available LUT information represent a promising approach for predicting selected AE properties (here, LAeq and Link Density). Once the necessary LUT features are calculated and the model is built, the respective AE properties can be predicted quickly at high spatial resolution. For example, Fig. 3 represents the application of the SALVEAll model to predict the LAeq and the Link Density for the research area of three cities (~450 km²). The calculation of LUT features took approximately 48 h, while predicting each of the respective indices took less than 10 s on a standard office PC (using an 11th Gen Intel(R) Core(TM) i9-11900K at 3.50 GHz and 64 GB of RAM with no parallelization). In theory, these models can easily be applied to predict further acoustic properties of interest. Additional strengths are the comparison of our results with the performance of the SNM, based on a comprehensive dataset of acoustic measurements across the city of Bochum from the SALVE project, and the model evaluation on two independent test datasets.

Fig. 3: Predicted sound maps for LAeq and Link Density.

Application of the gradient boosting models to predict total environmental noise (a) and Link Density (b) for the area of the cities of Bochum, Essen and Mülheim an der Ruhr.

However, this study has several limitations. As already mentioned, the models neglect temporal information, which might be especially important for predicting frequency-related AE properties. In addition, we only focus on the daytime AE, as measurements are only available between 09:00 and 19:30. As the night-time AE plays an important role for health-related issues like sleep [1], our models may not be suitable for accurately estimating exposure levels during this time frame. Moreover, although two independent datasets were used for the evaluation of the model performance, the Be-MoVe dataset only comprises 22 recording locations. Another important limitation is that the model is only based on the relations between LUTs in the surroundings of acoustic measurements in Bochum; thus, potentially important LUTs (e.g., airports) are not considered in predicting AE properties. Therefore, additional studies with larger sample sizes and from different cities are needed to demonstrate the scalability of LUT-based approaches, especially for the prediction of acoustic indices beyond noise. Methodologically, we use a gradient boosting model, whose predictions are bounded between the minimum and maximum values provided in the training data. While the empirical measures are close to the theoretical range for indices like the Link Density or the Articulation Index, the model will fail for the other indices at locations whose values lie above or below those in the training data. Furthermore, although we performed a sensitivity analysis using extreme gradient boosting [47] without observing performance improvements, other models may outperform gradient boosting in the future.
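The inability of tree-based boosting to extrapolate, noted above, can be illustrated with a toy example. The following is a minimal pure-Python sketch of gradient boosting with one-split regression trees (stumps) on a single hypothetical feature. It is not the model or library used in this study, and all names and values are illustrative assumptions. Because every tree returns a constant per leaf, the ensemble output stays flat outside the feature range seen in training.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that minimises the squared error of the residuals."""
    best = None
    cuts = sorted(set(xs))
    for lo, hi in zip(cuts, cuts[1:]):
        t = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1], best[2], best[3]

def fit_boosting(xs, ys, n_rounds=200, lr=0.1):
    """Squared-error gradient boosting: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + lr * (lm if x <= t else rm) for x, p in zip(xs, preds)]
    return base, stumps, lr

def predict(model, x):
    base, stumps, lr = model
    return base + sum(lr * (lm if x <= t else rm) for t, lm, rm in stumps)

# Hypothetical training data: the feature spans 0..10, the target rises from 40 to 70.
xs = [float(i) for i in range(11)]
ys = [40.0 + 3.0 * x for x in xs]
model = fit_boosting(xs, ys)

inside = predict(model, 10.0)   # largest feature value seen in training
outside = predict(model, 20.0)  # far beyond the training range
print(inside, outside)          # identical: no extrapolation of the upward trend
```

In this sketch the underlying linear trend would give 100 at x = 20, but the ensemble returns the same value as at x = 10. This is the practical consequence of the bounded-prediction property for locations unlike those in the training data.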

Outlook

In this work, we demonstrate the predictive power of solely LUT-based models to estimate properties of the urban AE at high spatial resolution. As the model for the LAeq outperforms the SNM in predicting total environmental noise and similar performance was found throughout the literature, it represents a promising approach for estimating total environmental noise at high spatial resolution. Although the predictive power of the Link Density model was lower than that of the LAeq model, its results were also robust across datasets. Our results are particularly important for epidemiological studies, as the models offer fast, large-scale estimates using readily available information. In the future, these models could be used to estimate exposure to AE properties, which can then be analysed in relation to human health, enabling the investigation of associations already observed in laboratory and field studies within population-based research. From a spatial planning perspective, processes that redefine or update land use mapping can leverage our model to account for the impacts on the acoustic environment. While planning measures aimed at altering existing land use types are often gradual and spatially constrained, structural transformations like the repurposing and qualification of former industrial brownfield sites create strategic opportunities for targeted redefinitions of land use. Furthermore, future work should investigate whether the inclusion of temporal information along with available environmental data (e.g., satellite imagery) will enhance the predictive power.