Extended Data Fig. 7: Testing of significance by shuffling.
From: A measure of smell enables the creation of olfactory metamers

We randomly shuffled the performance outcomes in the previously published dataset (ref. 15) and in experiments 4–6. For each MC-odorant pair, we randomly reassigned performance (the mean across participants) 10,000 times, and each time computed the correlation between angle distance and the ‘shuffled’ performance. a, A copy of Fig. 3b. b, A set of 100 traces (randomly picked for visualization purposes) of a moving average of shuffled data, similar to the black line in a. The red dashed line in a and b marks performance of d′ = 1 (41.8% correct). c–f, Histograms of correlations between angle distance and shuffled performance. The red line is the correlation of the observed data. c, The previously published data (ref. 15). The correlation of the observed data (r = 0.50, n = 310 comparisons) exceeds the correlations of the shuffled data (P < 10^−4, n = 10,000 repetitions). d–f, Angle distance is shown on a log scale. d, Experiment 4. The correlation of the observed data (r = 0.51, n = 50 comparisons) exceeds the correlations of the shuffled data (P < 10^−4, n = 10,000 repetitions). e, Experiment 5. The correlation of the observed data (r = 0.42, n = 50 comparisons) is significantly stronger than the correlations of the shuffled data (P = 0.0009, n = 10,000 repetitions). f, Experiment 6. The correlation of the observed data (r = 0.53, n = 40 comparisons) is significantly stronger than the correlations of the shuffled data (P = 0.0013, n = 10,000 repetitions). g–i, Same as d–f, but with angle distance analysed on a linear rather than a logarithmic scale. g, Experiment 4. The correlation of the observed data (r = 0.61, n = 50 comparisons) exceeds the correlations of the shuffled data (P < 10^−4, n = 10,000 repetitions). h, Experiment 5. The correlation of the observed data (r = 0.43, n = 50 comparisons) is significantly stronger than the correlations of the shuffled data (P = 0.0015, n = 10,000 repetitions).
i, Experiment 6. The correlation of the observed data (r = 0.45, n = 40 comparisons) exceeds the correlations of the shuffled data (P < 10^−4, n = 10,000 repetitions). j–l, Verification of the choice of performance threshold, d′ = 1, in our data. For this verification, we calculated the null distribution of d′ for the discrimination tasks in experiments 4–6. To generate a meaningful distribution, the shuffling was chosen as follows: we shuffled the correct responses of each participant within each session and assigned them to different MC-odorant pairs, using a different label assignment for each participant. This decouples performance from the difficulty of each comparison and yields the frequency at which each d′ would be expected by chance. The histograms show performance in each experiment after the participants' data have been shuffled. The red areas show the bottom and top 5%; the grey line is d′ = 1. j, Experiment 4. k, Experiment 5. l, Experiment 6.
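
The two shuffling procedures described in this legend can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function names (`shuffle_test_correlation`, `shuffled_pair_means`), the use of a Pearson correlation, the one-sided P value computed as a plain fraction of shuffles, and the toy data are all assumptions made for illustration. The second function works in proportion correct; converting to d′ depends on the task design and is left out.

```python
import numpy as np


def shuffle_test_correlation(angle_distance, performance,
                             n_shuffles=10_000, seed=0):
    """Shuffle test for the correlation between two paired vectors.

    Assumptions (not stated in the legend): Pearson correlation and a
    one-sided P value, i.e. the fraction of shuffles whose correlation
    is at least as large as the observed one.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(angle_distance, dtype=float)
    y = np.asarray(performance, dtype=float)

    r_observed = np.corrcoef(x, y)[0, 1]

    r_shuffled = np.empty(n_shuffles)
    for i in range(n_shuffles):
        # Reassign performance values to pairs at random.
        r_shuffled[i] = np.corrcoef(x, rng.permutation(y))[0, 1]

    p_value = np.mean(r_shuffled >= r_observed)
    return r_observed, r_shuffled, p_value


def shuffled_pair_means(correct, n_shuffles=1_000, seed=0):
    """Null distribution of per-pair mean performance.

    `correct` is a hypothetical (participants x pairs) array of
    proportion correct. Each participant's row is shuffled
    independently, so every participant gets a different assignment
    of responses to pairs, and the rows are then averaged per pair.
    """
    rng = np.random.default_rng(seed)
    correct = np.asarray(correct, dtype=float)
    null_means = np.empty((n_shuffles, correct.shape[1]))
    for i in range(n_shuffles):
        # permuted(..., axis=1) shuffles each row independently.
        null_means[i] = rng.permuted(correct, axis=1).mean(axis=0)
    return null_means.ravel()


# Synthetic toy data, for illustration only (not the study's data).
toy_rng = np.random.default_rng(1)
toy_angle = toy_rng.uniform(0.01, 1.0, size=50)
toy_perf = 0.4 + 0.5 * toy_angle + toy_rng.normal(0.0, 0.1, size=50)

r_obs, r_null, p = shuffle_test_correlation(toy_angle, toy_perf)

# Log-scale variant, analogous to panels d-f.
r_obs_log, r_null_log, p_log = shuffle_test_correlation(
    np.log10(toy_angle), toy_perf)

toy_correct = toy_rng.integers(0, 2, size=(20, 50))
null_means = shuffled_pair_means(toy_correct, n_shuffles=200)
```

In this sketch, a histogram of `r_null` with a vertical line at `r_obs` would correspond in form to panels c–i, and a histogram of `null_means` with its bottom and top 5% marked would correspond in form to panels j–l. When no shuffle reaches the observed correlation, the fraction is zero, which would be reported as P < 1/n_shuffles.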