Extended Data Fig. 9: Behavioural change based on alternative distance metrics and features.
From: Nearest neighbours reveal fast and slow components of motor learning

To demonstrate the robustness of the proposed nearest-neighbour statistics, we verified that the inferred time course of behavioural change is reproducible using several different distance metrics (used to define nearest neighbours) and parameterizations of vocalizations. a–d, We recomputed the main analyses using a Pearson’s correlation metric on 68-ms onset-aligned spectrogram segments (first row); and the Euclidean distance on onset-to-offset spectrogram segments that were linearly time-warped to a duration of 100 ms (second row). For comparison, the main analyses in the text were based on Euclidean distance on 68-ms onset-aligned spectrogram segments (for example, Fig. 2c–f, 3). a, t-SNE visualization based on the corresponding distance metrics and sound representation for the example bird, analogous to Fig. 2a. b, Repertoire dating averaged over birds, analogous to Fig. 3a, b. c, Stratified mixing matrices averaged over birds, analogous to Fig. 3g. The mixing values are highly correlated across distance metrics: Euclidean (main text) versus correlation, variance explained = 92%; Euclidean (main text) versus time-warped Euclidean, 93%. d, Stratified behavioural trajectories based on c, analogous to Fig. 3h–k. The results in a–d are consistent with those in Fig. 3, showing that our findings are robust with respect to the exact definition of nearest neighbours. Moreover, the overall structure of the behavioural trajectory appears to depend only minimally on changes in tempo and spectrogram magnitude (first row: Pearson’s correlation is invariant to changes in overall magnitude of vocalizations; second row: time-warped Euclidean distance is invariant to changes in tempo).
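The invariances noted for panels a–d can be illustrated with a minimal sketch. It is not the analysis code used for the figure. It assumes that a spectrogram segment is stored as a frequency-by-time array, that the correlation metric is converted to a distance as 1 − r, and that time-warping is done by linear interpolation to a fixed number of frames; the function names and the frame count are hypothetical.

```python
import numpy as np


def euclidean_distance(a, b):
    """Euclidean distance between two equally shaped spectrogram segments."""
    return float(np.linalg.norm(a.ravel() - b.ravel()))


def correlation_distance(a, b):
    """1 - Pearson's r between the flattened segments (assumed convention)."""
    r = np.corrcoef(a.ravel(), b.ravel())[0, 1]
    return float(1.0 - r)


def time_warp(segment, n_frames):
    """Linearly resample a (frequency x time) segment to n_frames time bins."""
    n_time = segment.shape[1]
    old_t = np.linspace(0.0, 1.0, n_time)
    new_t = np.linspace(0.0, 1.0, n_frames)
    # Interpolate each frequency row independently along the time axis.
    return np.vstack([np.interp(new_t, old_t, row) for row in segment])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    segment = rng.random((8, 20))
    louder = 3.0 * segment  # same shape, larger overall magnitude

    # Correlation distance ignores the overall magnitude; Euclidean does not.
    print("correlation:", correlation_distance(segment, louder))
    print("euclidean:  ", euclidean_distance(segment, louder))

    # The same ramp produced at two tempi (50 versus 80 frames).
    fast = np.outer(np.ones(4), np.linspace(0.0, 1.0, 50))
    slow = np.outer(np.ones(4), np.linspace(0.0, 1.0, 80))
    print("time-warped:", euclidean_distance(time_warp(fast, 100),
                                             time_warp(slow, 100)))
```

In this toy example a uniformly louder rendition is at essentially zero correlation distance from the original, and two renditions that differ only in tempo coincide after warping to a common length.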
e–h, We recomputed all main analyses with four additional parameterizations of vocalizations: time-dependent normalized acoustic feature traces for 16 acoustic features within 68-ms windows after syllable onset (first row); means and variances of the same 16 acoustic features over entire syllables (second row); means and variances of 8 of the 16 acoustic features (third row); and a one-dimensional parameterization consisting solely of entropy variance computed over entire syllables (fourth row). Feature means and variances were z-scored across all syllables. For all of these parameterizations we defined nearest neighbours with the Euclidean distance. e, Embedding using t-SNE based on the corresponding parameterization and metric. For entropy variance alone, the embedding appears locally one dimensional (for visibility, data points are larger than for the other parameterizations). Entropy variance maps mostly smoothly onto this one-dimensional manifold (data not shown). f, Repertoire dating averaged over birds, analogous to Fig. 3a, b. Repertoire dating based on entropy variance alone fails to reproduce most of the results in Fig. 3 obtained with spectrogram segments. The percentile curves are almost flat, indicating that renditions cannot be reliably assigned to their production times on the basis of entropy variance alone. In this case, vertical separation between percentiles cannot be interpreted as spread along the DiSC (see Extended Data Fig. 5e). For entropy variance alone, span is greater than zero across all percentiles, but consolidation is consistently close to zero. g, Stratified mixing matrix averaged over birds, analogous to Fig. 3g. The match with the mixing matrix in Fig. 3g decreases as the dimensionality of the parameterization is reduced (spectrogram versus time-dependent feature traces: variance explained = 93%; spectrogram versus 16 acoustic feature means and variances, 91%; spectrogram versus 8 acoustic feature means and variances, 84%; spectrogram versus entropy variance, 54%). h, Stratified behavioural trajectories based on g, as in Fig. 3h–k. The inferred behavioural trajectories are similar across the first three song parameterizations. However, these alternative parameterizations result in more vertical separation between percentiles in f, suggesting that they capture the direction of slow change less well (compare with Fig. 3a and Extended Data Fig. 5e). Parameterizations of reduced dimensionality also result in progressively less well-defined syllable clusters in the embeddings (e, top to bottom). These observations suggest that a parameterization based on the full spectrogram is better suited to capture the different directions of change explored during development (see also Extended Data Fig. 7). Note that for entropy variance (bottom row), the projections onto the local direction of slow change are highly magnified compared with the projections in the top panels.
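As a rough illustration of the feature-based pipeline in e–h, the following sketch z-scores a feature matrix across syllables, finds Euclidean nearest neighbours, and compares two mixing matrices. It is not the code used for the figure. In particular, computing "variance explained" as the squared Pearson correlation between matrix entries is an assumption, as are the function names and the brute-force neighbour search.

```python
import numpy as np


def zscore_features(features):
    """Z-score each feature (column) across all syllables (rows)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # leave constant features unscaled
    return (features - mean) / std


def nearest_neighbours(features, k):
    """Indices of the k Euclidean nearest neighbours of every row (brute force)."""
    diff = features[:, None, :] - features[None, :, :]
    squared = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(squared, np.inf)  # a rendition is not its own neighbour
    return np.argsort(squared, axis=1)[:, :k]


def variance_explained(matrix_a, matrix_b):
    """Squared Pearson correlation between entries of two matrices (assumed)."""
    r = np.corrcoef(matrix_a.ravel(), matrix_b.ravel())[0, 1]
    return float(r ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical data: 200 syllables x 32 values (16 means and 16 variances).
    features = rng.normal(size=(200, 32)) * rng.uniform(0.1, 10.0, size=32)
    z = zscore_features(features)
    neighbours = nearest_neighbours(z, k=5)
    print(neighbours.shape)

    # Two hypothetical mixing matrices from different parameterizations.
    mixing_a = rng.random((10, 10))
    mixing_b = mixing_a + 0.1 * rng.normal(size=(10, 10))
    print(variance_explained(mixing_a, mixing_b))
```

Z-scoring before the neighbour search keeps features with large numerical ranges from dominating the Euclidean distance. The same functions apply unchanged to a one-column feature matrix such as entropy variance alone.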