Correction to: Nature Human Behaviour https://doi.org/10.1038/s41562-021-01244-z, published online 27 January 2022.
In the original version, there was a bug in the code of the model experiment measuring localization accuracy as a function of azimuth. The implementation of this experiment contained two errors, which partially offset each other. First, the stimuli were supposed to be rendered in the horizontal plane (0 degrees elevation) but were accidentally rendered at a wide range of elevations instead. Second, the stimuli for the model omitted the noise that was added to the stimuli in the original human experiment. Correcting the bug leaves the conclusions unaffected. Nonetheless, some text passages and Figures 3c, 7b and 7c as well as Extended Data Figures 3, 7, 8 and 9b required updating.
The graphs for the affected experiment, shown in Figure 3c and Extended Data Figure 9b, were updated. In Figure 3c, the graph showing localization accuracy of our model for broadband noise at different azimuthal positions was replaced. The graph plots mean absolute localization error (Mean abs. error) for the same noise bursts used in the human experiment. As the new experiment tested fewer positions (to match the original human experiment), the number of positions plotted in the graph changed. Similarly, all four panels of Extended Data Figure 9b were updated to show the correct model accuracy for broadband noise bursts at different azimuthal positions.
Several graphs that combine data across experiments were also affected and needed updating. The graphs in Figures 7b and 7c, showing overall human–model dissimilarity and the experiment-specific effect of training conditions, were updated. Figure 7b shows the overall human–model dissimilarity for natural and unnatural training conditions. The results did not change qualitatively, but the dissimilarity for the naturalistic condition was adjusted to 0.201 instead of 0.183. The dissimilarities in the other conditions were also adjusted (anechoic condition: to 0.252 instead of 0.245; noiseless condition: to 0.263 instead of 0.226; unnatural sounds condition: to 0.302 instead of 0.293). Figure 7c visualizes the effect of unnatural training conditions on human–model dissimilarity for individual experiments, expressed as the effect size of the difference in dissimilarity between the natural and each unnatural training condition (Cohen’s d, computed between human–model dissimilarity for networks in normal and modified training conditions). Positive numbers denote a worse resemblance to human data compared to the model trained in normal conditions. The effect sizes for the azimuth experiment were modest in the original version of the graph and remain so in the updated version. These values have been updated for each of the unnatural training conditions (anechoic condition: to –0.05 instead of 0.46; noiseless condition: to 1.17 instead of –0.96; unnatural sounds condition: to 0.05 instead of 0.16). The bars of the bar graph have been modified to reflect this change.
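For readers who want to see how an effect size of this kind is obtained, the following is a minimal sketch. It assumes the common pooled-standard-deviation form of Cohen’s d, which the correction does not specify, and it uses made-up dissimilarity values rather than the published data. The function name `cohens_d` and both input lists are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(modified, normal):
    """Cohen's d between two groups of human-model dissimilarity scores.

    Assumption: pooled-standard-deviation variant. A positive value means
    the modified training condition resembles human data less well than
    the normal training condition.
    """
    n1, n2 = len(modified), len(normal)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(modified)
                  + (n2 - 1) * variance(normal)) / (n1 + n2 - 2)
    return (mean(modified) - mean(normal)) / sqrt(pooled_var)


# Hypothetical per-network dissimilarities (illustrative values only).
normal_networks = [1.0, 2.0, 3.0]
modified_networks = [2.0, 3.0, 4.0]
effect_size = cohens_d(modified_networks, normal_networks)
```

With these illustrative inputs both groups have unit variance and their means differ by one, so the effect size is 1.0.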
Similarly, the graph of overall human–model dissimilarity for individual network architectures in Extended Data Figure 7 and the bar graph in Extended Data Figure 8 were revised.
Correcting the bug and updating the affected results also required several textual revisions.
In the Results section, in the third paragraph of the subsection “Effect of optimization for unnatural environments,” the effect size of the difference in dissimilarity between the naturalistic training condition results and each of the other training conditions was corrected to be d = 2.13 instead of d = 3.06 for the anechoic condition, d = 2.75 instead of d = 3.05 for the noiseless condition and d = 3.06 instead of d = 3.01 for the unnatural sounds condition.
The majority of the textual changes occurred in the Methods section, primarily in the subsection “Azimuthal localization of broadband sounds: stimuli.” In the first paragraph, details of the human experiment were added. The original article neglected to mention that background noise was added to the stimuli to bring performance below ceiling. As these details are critical to measuring the effect of interest, they were added in the updated version of the article. The added passages are as follows: “The reference speaker position ranged from –97.5° to +97.5° azimuth in 15° intervals. […] 18 speakers arranged at 15° intervals from –127.5° to +127.5° azimuth simultaneously played white noise during all trials, producing spatially diffuse background noise that served to bring performance below ceiling. The SNR of the stimulus was set individually for each participant. To determine the SNR, stimuli were played from the speakers at +90° or –90° azimuth and participants judged if each stimulus was to their right or left. This procedure was repeated at different SNRs and the SNR where the participant performed at 95% accuracy was chosen for the main experiment,” and “The stimuli were presented in spatially diffuse background noise, generated by presenting white noise from 19 positions at 15° intervals from –135° to +135°. The SNR was set for each network individually by measuring its left/right accuracy on stimuli rendered at +90 or –90 degrees at a range of SNRs spaced in 1 dB increments, and then selecting the highest SNR at which the network performed below 95% accuracy. The SNRs selected in this way ranged from –8 dB to –14 dB depending on the network.”
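The per-network SNR selection rule quoted above can be sketched as follows. This is only an illustration of the stated rule (highest SNR at which left/right accuracy falls below 95%), not the authors’ code. The function name `select_snr` and the accuracy values are hypothetical.

```python
def select_snr(accuracy_by_snr, threshold=0.95):
    """Return the highest SNR (in dB) whose left/right accuracy is below threshold.

    accuracy_by_snr maps an SNR in dB to the measured proportion correct.
    """
    below = [snr for snr, acc in accuracy_by_snr.items() if acc < threshold]
    if not below:
        raise ValueError("no tested SNR yields accuracy below the threshold")
    return max(below)


# Hypothetical left/right accuracies for one network, SNRs in 1 dB steps.
example_accuracy = {-6: 0.99, -7: 0.97, -8: 0.94, -9: 0.90, -10: 0.85}
chosen_snr = select_snr(example_accuracy)
```

For the illustrative values above, the rule picks –8 dB, because –7 dB and –6 dB are still at or above 95% accuracy.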
When we reran the experiment after correcting the discovered bug, we also corrected two inconsistencies between our experiment and the original experiment (presented in ref. 80). The first inconsistency was in the stimulus repetition rate used in the model and human experiments. The original model experiment had an inter-stimulus interval of 100 ms separating the 15-ms noise bursts, whereas the human experiment had the burst onsets spaced by 100 ms (i.e., yielding an inter-stimulus interval of 85 ms, with the bursts repeated at 10 Hz). In the corrected experiment, each noise burst was repeated at 10 Hz to be consistent with the human experiment. To reflect this change, the statement “Each noise pulse was 15 ms in duration and the delay between pulses was 100 ms” was corrected to read “Each noise pulse was 15 ms in duration and repeated at 10 Hz.” The second inconsistency concerned the stimulus locations. In the original article, we had run the model on a superset of the locations that were tested in humans. In the new version of the experiment, we ran a set of locations as closely matched as possible to the set used with humans. That is, instead of azimuthal positions ranging from 0° to +355° in 5° steps, stimuli were rendered at azimuthal positions ranging from –90° to +90° in 15° steps.
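The timing and location differences described above reduce to simple arithmetic, sketched below. The variable and function names are illustrative and are not taken from the authors’ code.

```python
BURST_MS = 15  # duration of each noise burst, as stated in the text


def azimuth_grid(start_deg, stop_deg, step_deg):
    """Inclusive list of azimuths from start_deg to stop_deg."""
    return list(range(start_deg, stop_deg + 1, step_deg))


# Original model experiment: a 100 ms silent gap between 15 ms bursts.
original_period_ms = BURST_MS + 100          # onset-to-onset spacing
original_rate_hz = 1000 / original_period_ms

# Corrected experiment: onsets every 100 ms, i.e. a 10 Hz repetition rate.
corrected_period_ms = 100
corrected_gap_ms = corrected_period_ms - BURST_MS
corrected_rate_hz = 1000 / corrected_period_ms

# Azimuthal positions before and after the correction.
original_grid = azimuth_grid(0, 355, 5)
corrected_grid = azimuth_grid(-90, 90, 15)
```

Under these definitions the original model stimuli had onsets 115 ms apart (about 8.7 Hz), the corrected stimuli have an 85 ms gap, and the number of tested azimuths drops from 72 to 13.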
Additional changes were made to parts of the subsection “Limited spectral resolution of elevation cues: stimuli.” While the experiment described in that section was implemented correctly, there were errors in the description of the methodology. The original description erroneously stated that the stimuli were all at an elevation of zero degrees and reported the wrong number of stimuli. The original text that read “Each exemplar was rendered at 0° elevation and azimuthal positions ranging from 0° to 355° in 5° steps using each smoothed set of HRTFs. This yielded 12,960 stimuli (9 smoothed sets of HRTFS × 20 exemplars × 72 locations)” was corrected to “The exemplars were rendered at elevations between 0° and 60° in 10° steps and a set of azimuths ranging from 0° to 355°, the spacing of which varied with elevation due to the locations in the original set of HRTFs. This yielded 74,340 stimuli (9 smoothed sets of HRTFS × 20 exemplars × 413 locations).”
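As a quick consistency check on the two stimulus counts quoted above, the products can be recomputed directly. The variable names are illustrative only.

```python
SMOOTHED_HRTF_SETS = 9
EXEMPLARS = 20

# Original description: 72 azimuths at 0 degrees elevation (0 to 355 in 5 degree steps).
original_locations = len(range(0, 360, 5))
original_count = SMOOTHED_HRTF_SETS * EXEMPLARS * original_locations

# Corrected description: 413 locations across elevations of 0 to 60 degrees.
corrected_locations = 413
corrected_count = SMOOTHED_HRTF_SETS * EXEMPLARS * corrected_locations
```

Both products agree with the totals reported in the original and corrected passages (12,960 and 74,340 stimuli, respectively).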
Finally, two reporting errors were corrected. The first occurred in several places in the Methods section. In the original article, in the subsection “Stimulus preprocessing: cochlear model,” the number of bandpass filters was reported to be 36. This was corrected to 39. This also affected the last sentence of the section, which was corrected to read “The neural network thus received 1 s of input from the cochlear model, as a 39 × 8,000 × 2 tensor (39 frequency channels × 8,000 samples at 8 kHz × 2 ears)”. In the section “Neural network models,” this again affected the first and last sentences. In both cases, 36 was corrected to read 39.
The second reporting error was in two cells in Extended Data Figure 3 that specify the model architectures. In column 3, row 4, the dimensions of the convolutional layer have been corrected from [3, 16, 32] to [3, 32, 32]. In column 9, row 23, the dimensions of the convolutional layer have been corrected from [2, 3, 512] to [2, 8, 512]. In addition, the original table accidentally duplicated rows 15–18 in column 4. These duplicated rows have been removed in the updated figure.
The changes have been made to the HTML and PDF versions of the article.
Francl, A., McDermott, J.H. Author Correction: Deep neural network models of sound localization reveal how perception is adapted to real-world environments. Nat Hum Behav 6, 1743–1744 (2022). https://doi.org/10.1038/s41562-022-01448-x
DOI: https://doi.org/10.1038/s41562-022-01448-x