Fig. 4: Probability of identification for increasing time period length of auxiliary data.
From: Interaction data are identifiable even across long periods of time

For each k ∈ {1, 2, 3}, we plot \(p_{k}\), the probability of correct identification (R = 1) when the attacker’s auxiliary data \({\mathcal{T}}_{2}\) consist of L weeks, 1 ≤ L ≤ 20 (the largest value for each k is marked). The 95% confidence interval is shown in light blue. The inset shows the difference quotient \(\Delta p_{k}(L) = p_{k}(L) - p_{k}(L-1)\) for 2 ≤ L ≤ 20. The probability of correct identification increases rapidly before plateauing around L = 8 weeks for all values of k, and even decreases slightly after L = 16 and L = 15 for k = 2 and k = 3, respectively. The largest values are \(p_{k=1}\) = 19.4% at L = 20, \(p_{k=2}\) = 66.0% at L = 16, and \(p_{k=3}\) = 69.3% at L = 13. This shows that having more auxiliary data further improves the performance of the attack, although data that are more distant in time seem less useful than data closer in time, and can even be slightly detrimental.
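
The difference quotient in the inset and the marked maxima can be computed from a curve of per-week probabilities. The following is a minimal sketch, assuming the curve for a given k is available as a list ordered by L = 1, 2, ...; the function names and the `toy_p` values are hypothetical and are not the measurements reported in the figure.

```python
def difference_quotient(p):
    """Return {L: p(L) - p(L-1)} for L = 2..len(p).

    `p` is a list where p[0] holds p(1), p[1] holds p(2), and so on.
    """
    return {L: p[L - 1] - p[L - 2] for L in range(2, len(p) + 1)}


def argmax_week(p):
    """Return (L, p(L)) for the number of weeks L with the highest probability."""
    best = max(range(len(p)), key=lambda i: p[i])
    return best + 1, p[best]


# Illustrative toy curve only (assumption): NOT the values from the figure.
toy_p = [0.10, 0.30, 0.45, 0.50, 0.48]

deltas = difference_quotient(toy_p)
best_L, best_p = argmax_week(toy_p)

for L, d in deltas.items():
    print(f"L={L}: delta={d:+.2f}")
print(f"largest value at L={best_L}: p={best_p:.2f}")
```

A negative entry in `deltas` corresponds to a point where adding one more week of auxiliary data lowers the identification probability, which is the behaviour the caption describes for the later weeks.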