Fig. 3: Loss-function variance decay.
From: Towards large-scale quantum optimization solvers with few qubits

Main: Average sample variance \(\overline{\,{\mbox{Var}}\,({{\mathcal{L}}})}\) of \({{\mathcal{L}}}\), normalized by \({\alpha }^{4}{\sum }_{(i.j)\in E}{w}_{ij}^{2}\), after plateauing, for the encodings Π(1), Π(2), and Π(3), as a function of m (in log-linear scale). Error bars are not shown since they are too-small to be visible in this scale. The agreement with the analytical expression in Eq. (3) is excellent (the dashed, blue curve corresponds to the first term of the equation, which decreases as 2−2n). Since \(n={{\mathcal{O}}}({m}^{1/k})\), this translates into a super-polynomial suppression in m of the decay speed of \(\overline{\,{\mbox{Var}}\,({{\mathcal{L}}})}\) for k > 1. Inset: average variances of the entries of the gradient of \({{\mathcal{L}}}\) as functions of the number of layers, for quadratic compression (Π(2)). Each curve corresponds to a different n. Note the decay of the plateau (rightmost) values with n. The black crosses indicate the depths needed to reach average approximation ratios > 0.941 (computational-hardness threshold) with the quantum-circuit’s output x alone, i.e., excluding the final bit-swap search. In all cases such ratios are attained before the variances have converged to their asymptotic, steady values.