Introduction

Quantum state tomography (QST) is widely used for estimating quantum states1,2,3,4,5. To reconstruct the density matrix with high accuracy, measurements must be performed on a large number of identical copies; specifically, for single-copy (i.e., non-collective) measurements, a minimum of \(O({4}^{n})\) total copies is required to estimate the density matrix of an n-qubit system with a bounded recovery error, as measured by the Frobenius norm between the reconstructed and true density matrices6. Various methods have been proposed to achieve efficient and accurate QST. Classical computational approaches include linear inversion7, maximum likelihood estimation4,5,8, Bayesian inference9,10,11, region estimation12,13, classical machine learning14, and least squares estimators15,16,17. In contrast, quantum machine learning methods encompass algorithms such as variational quantum circuits18,19, quantum principal component analysis20, and quantum variational algorithms combined with classical statistics21.

A significant reduction in the number of required state copies can be achieved by assuming one of two common low-dimensional structures: low-rankness or matrix product operators (MPOs). (i) Low-rank density matrices frequently emerge in quantum systems with pure or nearly pure states that exhibit low entropy6,16,22,23,24, and low-rank assumptions are employed in various state estimation procedures, with a range of associated measurement processes, including 4-designs22, Pauli strings23,25, Clifford gates16, and Haar-random projective measurements24. When the density matrix has rank r, the required number of total state copies can be reduced to \(O({2}^{n}r)\)6,16, yet this remains exponential in n, posing challenges for current quantum computers exceeding 100 qubits. (ii) MPOs, on the other hand, offer a more scalable alternative for certain quantum systems, including one-dimensional spatial systems26, Hamiltonians with decaying long-range interactions27, and states generated by noisy quantum devices28. When employing Haar-random projective measurements29 or specific classes of informationally complete positive operator-valued measures (IC-POVMs)30, the required number of total state copies can be reduced to polynomial scaling, namely \(O({n}^{3})\) or \(O(n)\), respectively, while ensuring bounded recovery error for MPO states.

While algorithms with low-rank assumptions or other low-dimensional structures can enable significantly improved scaling, they still face considerable computational complexity, which in existing approaches can be attributed to four potential operations: (i) the calculation of matrix inverses; (ii) repeated inner product operations between matrices that grow exponentially with n; (iii) multiple projection steps onto the target subspace; and (iv) additional matrix multiplications introduced by iterative algorithms to enforce low-rankness or MPO representations. Recently, an efficient and experimentally feasible approach, known as classical shadow (CS) estimation, was introduced in ref. 31 to infer limited sets of state properties such as fidelity, entanglement measures, and correlations. By exploiting efficient classical computation and storage, all processing needed to predict these properties can be carried out on classical hardware. This has sparked a series of studies leveraging the CS method32,33,34,35,36. Meanwhile, CS has also been utilized for full quantum state reconstruction37,38, and integrated with projection techniques to recover physical quantum states in refs. 39,40. Yet, to the best of our knowledge, no existing theoretical analysis of the sample complexity of CS-based methods addresses the reconstruction of structured quantum states beyond simply enforcing physicality. Thus, a theoretical understanding of whether the CS-based method can be extended to full-state reconstruction (i.e., QST) with provable performance guarantees remains absent.

In this paper, we derive performance guarantees for QST using a method we term projected classical shadow (PCS), which projects CS estimators onto target subspaces of the Hilbert space, as illustrated in Fig. 1. Since the original CS density matrix is Hermitian but not in general positive semidefinite (PSD), our method projects its eigenvalues onto the simplex41. We demonstrate that \(O({4}^{n})\) total state copies suffice for this approach to achieve a bounded recovery error in the Frobenius norm. For low-rank states, we further leverage a (truncated) low-rank eigenvalue decomposition and show that the required number of total state copies can be reduced to \(O({2}^{n}r)\) for the same accuracy. Finally, for MPO states, we form the PCS step from a quasi-optimal MPO projection, namely tensor-train singular value decomposition (TT-SVD)42 followed by a simplex projection, and demonstrate that with \(O({n}^{2})\) total state copies the method reliably recovers the ground-truth state. While suboptimal relative to the degrees of freedom of MPO states, this approach improves upon the theoretical \(O({n}^{3})\) scaling in ref. 29. PCS also offers a framework for incorporating prior knowledge about the form of the target state into the CS approach.

Fig. 1: Illustration of proposed PCS method.

Given an initial CS estimate ρCS lying in the space of Hermitian, unit-trace matrices (not necessarily PSD), we compute the closest state ρPCS in the physical space of interest: either the space of all possible states (left) or a subspace possessing a desired structure (right).

Notation

We use bold capital letters (e.g., X) to denote matrices, bold lowercase letters (e.g., x) to denote column vectors, and italic letters (e.g., x) to denote scalar quantities. Matrix elements are denoted in parentheses. For example, \({\boldsymbol{X}}({i}_{1},{i}_{2})\) denotes the element in position \(({i}_{1},{i}_{2})\) of the matrix X. The superscripts \({(\cdot )}^{\top }\) and \({(\cdot )}^{\dagger }\) denote the transpose and Hermitian transpose, respectively. For two matrices A, B of the same size, \(\left\langle {\boldsymbol{A}},{\boldsymbol{B}}\right\rangle =\mathrm{trace}\,({{\boldsymbol{A}}}^{\dagger }{\boldsymbol{B}})\) denotes the inner product. \(\Vert {\boldsymbol{X}}\Vert\), \({\Vert {\boldsymbol{X}}\Vert }_{1}\), and \({\Vert {\boldsymbol{X}}\Vert }_{F}\) respectively represent the spectral, trace, and Frobenius norms of X. For two positive quantities \(a,b\in {{\mathbb{R}}}^{+}\), the inequality \(b\lesssim a\) or b = O(a) implies b ≤ ca for some universal constant c; likewise, \(b\gtrsim a\) or b = Ω(a) represents b ≥ ca for some universal constant c.

Results

Classical shadows

Quantum information science harnesses quantum states for information processing43. The state of an n-qubit system can be described by the density operator \({\boldsymbol{\rho }}\in {{\mathbb{C}}}^{{2}^{n}\times {2}^{n}}\), which is PSD (\({\boldsymbol{\rho }}\succcurlyeq {\boldsymbol{0}}\)) and has unit trace (trace(ρ) = 1). To estimate this state, measurements are performed on a collection of identical copies.

Projective measurements

Within the most general quantum measurement framework of positive operator-valued measures (POVMs) (Specifically, a POVM is characterized as a set of PSD matrices \(\{{{\boldsymbol{A}}}_{1},\ldots ,{{\boldsymbol{A}}}_{K}\}\subset {{\mathbb{C}}}^{{2}^{n}\times {2}^{n}}\) such that \(\mathop{\sum}\nolimits_{k = 1}^{K}{{\boldsymbol{A}}}_{k}={{\bf{I}}}_{{2}^{n}}\). Each POVM element \({{\boldsymbol{A}}}_{k}\) corresponds to a potential outcome of a quantum measurement; projective measurements are the special case in which all \({{\boldsymbol{A}}}_{k}\) are pairwise orthogonal projection operators, i.e., \({{\boldsymbol{A}}}_{k}^{2}={{\boldsymbol{A}}}_{k}\) and \({{\boldsymbol{A}}}_{k}{{\boldsymbol{A}}}_{j}={\boldsymbol{0}}\) for \(k\ne j\).), the special case of projective measurements is often employed, where the measurement outcomes are associated with an orthonormal basis of the system. To implement a measurement defined by an arbitrary orthonormal basis \(\{{{\boldsymbol{\phi }}}_{k}:{{\boldsymbol{\phi }}}_{k}^{\dagger }{{\boldsymbol{\phi }}}_{l}={\delta }_{kl}\}\), we can introduce the unitary matrix \({\boldsymbol{U}}=\left[\begin{array}{ccc}{{\boldsymbol{\phi }}}_{1}&\cdots \,&{{\boldsymbol{\phi }}}_{{2}^{n}}\end{array}\right]\in {{\mathbb{C}}}^{{2}^{n}\times {2}^{n}}\), apply \({{\boldsymbol{U}}}^{\dagger }\) to the state (\({\boldsymbol{\rho }}\mapsto {{\boldsymbol{U}}}^{\dagger }{\boldsymbol{\rho }}{\boldsymbol{U}}\)), and then conduct a projective measurement in the computational basis \(\{{{\boldsymbol{e}}}_{k}\}\), where \({\boldsymbol{U}}{{\boldsymbol{e}}}_{k}={{\boldsymbol{\phi }}}_{k}\). The probability of observing the k-th outcome is given by:

$${p}_{k}=\langle {{\boldsymbol{\phi }}}_{k}{{\boldsymbol{\phi }}}_{k}^{\dagger },{\boldsymbol{\rho }}\rangle ={{\boldsymbol{e}}}_{k}^{\dagger }\left({{\boldsymbol{U}}}^{\dagger }{\boldsymbol{\rho }}{\boldsymbol{U}}\right){{\boldsymbol{e}}}_{k}.$$
(1)
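As a minimal sketch of Eq. (1), the snippet below draws a Haar-random unitary with the standard QR-based construction (a complex Gaussian matrix followed by a phase correction; this sampler is our choice and is not specified in the text) and evaluates the outcome probabilities as the diagonal of \({{\boldsymbol{U}}}^{\dagger }{\boldsymbol{\rho }}{\boldsymbol{U}}\). The function names haar_unitary and outcome_probs are hypothetical.

```python
import numpy as np

def haar_unitary(d, rng):
    """Sample a d x d Haar-random unitary (QR of a complex Ginibre matrix)."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    # Fix the phase ambiguity of QR so that the distribution is exactly Haar.
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # scales column k of q by phases[k]

def outcome_probs(rho, U):
    """Eq. (1): p_k = e_k^dagger (U^dagger rho U) e_k, i.e. diag(U^dagger rho U)."""
    return np.real(np.diag(U.conj().T @ rho @ U))

rng = np.random.default_rng(0)
n = 2
d = 2 ** n
psi = np.zeros(d, dtype=complex)
psi[0] = 1.0
rho = np.outer(psi, psi.conj())   # |00><00| as an example ground truth
U = haar_unitary(d, rng)
p = outcome_probs(rho, U)         # nonnegative, sums to trace(rho) = 1
```

Column k of U plays the role of \({{\boldsymbol{\phi }}}_{k}\), so p[k] equals \({{\boldsymbol{\phi }}}_{k}^{\dagger }{\boldsymbol{\rho }}{{\boldsymbol{\phi }}}_{k}\).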

However, a single projective measurement, even if repeated infinitely many times, provides only partial information on ρ, so multiple projective measurements must be conducted in various bases. In the subsequent discussion, we denote the number of distinct measurement bases by M, and the measurement operators for the m-th projective measurement by \(\{{{\boldsymbol{\phi }}}_{m,1}{{\boldsymbol{\phi }}}_{m,1}^{\dagger },\ldots ,{{\boldsymbol{\phi }}}_{m,{2}^{n}}{{\boldsymbol{\phi }}}_{m,{2}^{n}}^{\dagger }\}\).

Classical shadow (CS)

Consider the original CS proposal with single-shot Haar-random projective measurements. Given an unknown n-qubit ground truth \({{\boldsymbol{\rho }}}^{\star }\), we repeatedly execute the measurement procedure described above Eq. (1), in which U is drawn at random from the Haar distribution and each measurement is performed on only one copy (i.e., a new U is selected for each copy measured). The observed outcome \({{\boldsymbol{e}}}_{{j}_{m}}\) yields a snapshot, or “shadow,” of the underlying quantum state, which for Haar-distributed unitaries can be expressed as31:

$$\begin{array}{l}{{\boldsymbol{\rho }}}_{m}=({2}^{n}+1){{\boldsymbol{U}}}_{m}{{\boldsymbol{e}}}_{{j}_{m}}{{\boldsymbol{e}}}_{{j}_{m}}^{\dagger }{{\boldsymbol{U}}}_{m}^{\dagger }-{{\bf{I}}}_{{2}^{n}}\\\quad\,\,\,\,=({2}^{n}+1){{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }-{{\bf{I}}}_{{2}^{n}}.\end{array}$$
(2)

By construction, this snapshot equals the ground truth in expectation (over both unitaries and measurement outcomes): \({\mathbb{E}}[{{\boldsymbol{\rho }}}_{m}]={{\boldsymbol{\rho }}}^{\star }\). Executing this process M times produces an array of M independent classical snapshots for the total CS estimator:

$$\begin{array}{l}{{\boldsymbol{\rho }}}_{{\rm{CS}}}=\frac{1}{M}\sum\limits_{m = 1}^{M}{{\boldsymbol{\rho }}}_{m}\\\qquad=\frac{1}{M}\sum\limits_{m = 1}^{M}\left[({2}^{n}+1){{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }-{{\bf{I}}}_{{2}^{n}}\right].\end{array}$$
(3)
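The procedure in Eqs. (2) and (3) can be simulated directly. The sketch below is an illustration under our own choices (NumPy, a QR-based Haar sampler, and the hypothetical function name classical_shadow): for each copy it draws a fresh unitary, samples one outcome from the probabilities of Eq. (1), forms the snapshot, and averages.

```python
import numpy as np

def haar_unitary(d, rng):
    """Haar-random unitary via QR of a complex Gaussian matrix with phase fix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def classical_shadow(rho, M, rng):
    """Eqs. (2)-(3): average of M single-shot Haar-random snapshots."""
    d = rho.shape[0]
    rho_cs = np.zeros((d, d), dtype=complex)
    for _ in range(M):
        U = haar_unitary(d, rng)                  # fresh unitary for each copy
        p = np.real(np.diag(U.conj().T @ rho @ U))
        p = np.clip(p, 0.0, None)
        p /= p.sum()                              # guard against rounding
        j = rng.choice(d, p=p)                    # single measurement outcome
        phi = U[:, j]                             # phi_{m, j_m} = U e_{j_m}
        rho_cs += (d + 1) * np.outer(phi, phi.conj()) - np.eye(d)
    return rho_cs / M

rng = np.random.default_rng(1)
d = 4
rho = np.eye(d, dtype=complex) / d   # maximally mixed 2-qubit state as an example
rho_cs = classical_shadow(rho, M=500, rng=rng)
```

Every snapshot has trace \(({2}^{n}+1)-{2}^{n}=1\) and is Hermitian, so ρCS is Hermitian with unit trace, but it generally has negative eigenvalues.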

CS for Tomography?

Although CS estimators can efficiently predict observables of \({{\boldsymbol{\rho }}}^{\star }\), to our knowledge there exist no theoretical results concerning the recovery error of the full state. Following the detailed derivation in the section “Methods”, we find the expected squared Frobenius error:

$${\mathbb{E}}\Vert{{\boldsymbol{\rho }}}_{{\rm{CS}}}-{{\boldsymbol{\rho }}}^{\star }{\Vert}_{F}^{2}\,=\,\frac{{4}^{n}+{2}^{n}-1-\Vert {{\boldsymbol{\rho}}}^{\star }{\Vert}_{F}^{2}}{M}.$$
(4)

Given that \(\Vert{{\boldsymbol{\rho}}}^{\star }{\Vert}_{F}^{2}\le {\left[{\mathrm{trace}}\,({{\boldsymbol{\rho }}}^{\star })\right]}^{2}=1\), it follows that Eq. (4) can be simplified to

$${\mathbb{E}}\parallel {{\boldsymbol{\rho }}}_{{\rm{CS}}}-{{\boldsymbol{\rho }}}^{\star }{\parallel }_{F}^{2}\approx \frac{{4}^{n}}{M}$$
(5)

for large n. Eq. (5) demonstrates that stable recovery of the full state can be achieved only when M scales proportionally to \({4}^{n}\), aligning with the optimal M required in QST for general states6.
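One way to see where Eq. (4) comes from: because the snapshots are independent and \({\mathbb{E}}[{{\boldsymbol{\rho }}}_{m}]={{\boldsymbol{\rho }}}^{\star }\), the expected error of their average is \(({\mathbb{E}}\Vert {{\boldsymbol{\rho }}}_{m}{\Vert }_{F}^{2}-\Vert {{\boldsymbol{\rho }}}^{\star }{\Vert }_{F}^{2})/M\), and since ϕ is a unit vector every snapshot in Eq. (2) has the same norm \(\Vert {{\boldsymbol{\rho }}}_{m}{\Vert }_{F}^{2}={({2}^{n}+1)}^{2}-2({2}^{n}+1)+{2}^{n}={4}^{n}+{2}^{n}-1\). The sketch below checks that identity numerically and uses Eq. (4) to count the copies needed for a target error; the helper names are hypothetical.

```python
import numpy as np

def snapshot_sq_norm(n, rng):
    """||(2^n + 1) phi phi^dagger - I||_F^2 for a random unit vector phi."""
    d = 2 ** n
    phi = rng.normal(size=d) + 1j * rng.normal(size=d)
    phi /= np.linalg.norm(phi)
    snap = (d + 1) * np.outer(phi, phi.conj()) - np.eye(d)
    return np.linalg.norm(snap, "fro") ** 2

def expected_cs_error(n, purity, M):
    """Eq. (4), where purity = ||rho*||_F^2."""
    return (4 ** n + 2 ** n - 1 - purity) / M

def copies_for_error(n, purity, target):
    """Smallest M for which Eq. (4) does not exceed the target squared error."""
    return int(np.ceil((4 ** n + 2 ** n - 1 - purity) / target))

rng = np.random.default_rng(2)
print(snapshot_sq_norm(3, rng))        # close to 4**3 + 2**3 - 1 = 71
print(copies_for_error(4, 1.0, 0.25))  # 270 / 0.25 = 1080 for a pure 4-qubit state
```

The count grows by roughly a factor of four per added qubit, which is the \({4}^{n}\) scaling of Eq. (5).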

A comparison between CS and traditional QST yields several key observations of relevance to this study:

  1. CS yields an unbiased estimate (\({\mathbb{E}}[{{\boldsymbol{\rho }}}_{{\rm{CS}}}]={{\boldsymbol{\rho }}}^{\star }\)), whereas the solution from QST is often biased44, because most QST methods impose physical constraints such as positivity and unit trace.

  2. While the CS estimator is typically unphysical (not PSD), leading QST methods such as maximum likelihood estimation (MLE)5, projected least squares45, and Bayesian inference9 enforce physicality by construction.

  3. CS has significantly lower computational complexity than standard QST methods such as MLE using the fixed-point (FP) algorithm46,47, MLE using gradient descent (GD)48, least squares (LS) using GD29, and the one-step LS method17. For the iterative methods [e.g., MLE (FP), MLE (GD), and LS (GD)], the number of iterations required for convergence significantly increases the computational cost. Moreover, the need for a suitable initialization imposes a strong and often nontrivial condition for successful recovery. Although the one-step LS method17 avoids iterations, it does not incorporate any constraint and still involves matrix inversion and multiple matrix multiplications, resulting in substantial computational cost. In contrast, CS offers the lowest computational complexity among these methods: it is also a one-step approach, and its primary cost is computing the outer product of two vectors.

  4. When M is small compared with \({2}^{n}\), CS outperforms QST in predicting certain linear observables, but not in predicting the entire state17,49.

  5. Including prior information about state structure allows for a reduction in scaling in QST (see Tables 1 and 2). Apart from specialized CS methods tailored to states generated by shallow circuits and Hamiltonian dynamics50,51, which aim to improve the accuracy of predicting quantum state properties, there currently exists no known approach that similarly reduces the sample complexity of CS for full quantum state reconstruction. In other words, CS requires \(O({4}^{n})\) measurements for estimating the full state, as demonstrated in Eq. (5).

In the next section we investigate methods for incorporating prior information about state structure into CS to reduce the scaling shown in Eq. (5).

Projected Classical Shadow (PCS) for QST

In this section, we study the application of CS to reconstructing the full quantum state and show that, with a simple projection step, CS estimators are also effective for QST and achieve (nearly) information-theoretically optimal bounds for broad classes of states. Let \({\mathbb{X}}\) denote the class of states of interest, and assume that the underlying ground truth \({{\boldsymbol{\rho }}}^{\star }\in {\mathbb{X}}\). For instance, \({\mathbb{X}}\) could contain all physical states (PSD and unit-trace) or be restricted to a specific structure with compact representations, such as low-rank or MPO states. Because previous sample complexity results are stated in terms of the Frobenius norm, we define ρPCS as the projection of ρCS onto the set \({\mathbb{X}}\) that minimizes the Frobenius error, i.e.,

$${{\boldsymbol{\rho }}}_{{\rm{PCS}}}={{\mathcal{P}}}_{{\mathbb{X}}}({{\boldsymbol{\rho }}}_{{\rm{CS}}}):= \arg \mathop{\min }\limits_{{\boldsymbol{\rho }}\in {\mathbb{X}}}{\left\Vert {\boldsymbol{\rho }}-{{\boldsymbol{\rho }}}_{{\rm{CS}}}\right\Vert }_{F}.$$
(6)

To provide a unified and general analysis of Eq. (6), we enlist tools from ϵ-nets and covering numbers29,52 to capture the complexity of the classes of states within the set \({\mathbb{X}}\). First, consider the set \({\mathcal{N}}=\left\{\frac{{\boldsymbol{\rho }}}{\parallel {\boldsymbol{\rho }}{\parallel }_{F}}:{\boldsymbol{\rho }}\in {\mathbb{X}}\right\}\) scaled to unit Frobenius norm. For ϵ > 0, the set \({{\mathcal{N}}}_{\epsilon }\subset {\mathcal{N}}\) is said to be an ϵ-net (or an ϵ-cover) over \({\mathcal{N}}\) if for all \(\frac{{\boldsymbol{\rho }}}{\parallel {\boldsymbol{\rho }}{\parallel }_{F}}\in {\mathcal{N}}\), there exists \(\frac{{{\boldsymbol{\rho }}}^{{\prime} }}{\parallel {{\boldsymbol{\rho }}}^{{\prime} }{\parallel }_{F}}\in {{\mathcal{N}}}_{\epsilon }\) such that \({\left\Vert \frac{{\boldsymbol{\rho }}}{\parallel {\boldsymbol{\rho }}{\parallel }_{F}}-\frac{{{\boldsymbol{\rho }}}^{{\prime} }}{\parallel {{\boldsymbol{\rho }}}^{{\prime} }{\parallel }_{F}}\right\Vert }_{F}\le \epsilon\). The size of an ϵ-net with the smallest cardinality is called the covering number of \({\mathbb{X}}\), denoted by \({N}_{\epsilon }({\mathbb{X}})\). Intuitively speaking, a covering number is the minimum number of balls of a specified radius ϵ needed to cover a given set entirely. Coverings are useful for managing the complexity of a large set: instead of directly analyzing the behavior of an uncountable number of points in \({\mathcal{N}}\), we can analyze the finite number of points in \({{\mathcal{N}}}_{\epsilon }\). The behavior of all points in \({\mathcal{N}}\) is similar to that of the points in \({{\mathcal{N}}}_{\epsilon }\), as each point in \({\mathcal{N}}\) is close to some point in the covering.

Instead of the covering number \({N}_{\epsilon }({\mathbb{X}})\), our analysis will rely on the covering number of the set \(\overline{{\mathbb{X}}}\) formed by the differences between the elements in \({\mathbb{X}}\):

$$\overline{{\mathbb{X}}}=\left\{{{\boldsymbol{\rho }}}_{1}-{{\boldsymbol{\rho }}}_{2}:\,{{\boldsymbol{\rho }}}_{1},{{\boldsymbol{\rho }}}_{2}\in {\mathbb{X}},{{\boldsymbol{\rho }}}_{1}\ne {{\boldsymbol{\rho }}}_{2}\right\}.$$
(7)

In many cases, the covering number \({N}_{\epsilon }(\overline{{\mathbb{X}}})\) can be upper-bounded by \({N}_{\epsilon }^{2}({\mathbb{X}})\). For convenience, we work with \(\overline{{\mathbb{X}}}\) in what follows.

The covering number when \({\mathbb{X}}\) comprises all physical quantum states can be computed as \(\log {N}_{\epsilon }(\overline{{\mathbb{X}}})=O({4}^{n}\log \frac{9}{\epsilon })\). By comparison, for quantum states with rank at most r, this reduces to \(\log {N}_{\epsilon }(\overline{{\mathbb{X}}})=O({2}^{n}r\log \frac{9}{\epsilon })\); when the density matrices are represented by MPOs with bond dimension D, the covering number can be further reduced to \(\log {N}_{\epsilon }(\overline{{\mathbb{X}}})=O\left(4n{D}^{2}\log \frac{4n+\epsilon }{\epsilon }\right)\), as discussed in the next part.
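As a rough illustration of how these covering numbers compare, and how they translate into copy counts once Theorem 1 below is applied (\(M\propto \log {N}_{1/2}(\overline{{\mathbb{X}}})/{\xi }^{2}\) for a target Frobenius error ξ), the sketch evaluates the leading terms at ϵ = 1/2. Setting every hidden universal constant to one is an assumption made purely for illustration, so only ratios between outputs are meaningful; the function names are hypothetical.

```python
import math

def log_covering(n, kind, r=1, D=1, eps=0.5):
    """Leading term of log N_eps(Xbar) for each state class, constants set to 1."""
    if kind == "general":
        return 4 ** n * math.log(9 / eps)
    if kind == "low-rank":
        return 2 ** n * r * math.log(9 / eps)
    if kind == "mpo":
        return 4 * n * D ** 2 * math.log((4 * n + eps) / eps)
    raise ValueError(f"unknown kind: {kind}")

def copies_scaling(n, kind, xi, **kw):
    """M ~ log N_{1/2}(Xbar) / xi^2 from Theorem 1, O(.) constant set to 1."""
    return log_covering(n, kind, **kw) / xi ** 2

for kind, kw in [("general", {}), ("low-rank", {"r": 4}), ("mpo", {"D": 4})]:
    print(kind, round(copies_scaling(20, kind, xi=0.1, **kw)))
```

The MPO term grows only as \(n\log n\) for fixed D, whereas the other two grow exponentially in n.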

Theorem 1

For a given \({{\boldsymbol{\rho }}}^{\star }\in {\mathbb{X}}\), let ρPCS be the projected CS in Eq. (6). Then with probability at least \(1-{e}^{-\Omega (\log {N}_{1/2}(\overline{{\mathbb{X}}}))}\),

$$\Vert{{\boldsymbol{\rho }}}_{{\rm{PCS}}}-{{\boldsymbol{\rho}}}^{\star }{\Vert}_{F}\,\le\,O\left(\sqrt{\frac{\log {N}_{1/2}(\overline{{\mathbb{X}}})}{M}}\right).$$
(8)

The proof is given in the section “Methods”. Here, the set \({\mathbb{X}}\subset \{{\boldsymbol{\rho }}\in {{\mathbb{C}}}^{{2}^{n}\times {2}^{n}}:{\boldsymbol{\rho }}={{\boldsymbol{\rho }}}^{\dagger },\mathrm{trace}\,({\boldsymbol{\rho }})=1\}\) is any subset of Hermitian, trace-one matrices (tan space in Fig. 1). The set \({\mathbb{X}}\) will be specialized to PSD matrices only (blue space in Fig. 1) in Corollary 1 and to low-dimensional structures (green space in Fig. 1) in Theorems 3 and 4. Theorem 1 guarantees stable recovery of the ground truth \({{\boldsymbol{\rho }}}^{\star }\) to within ξ in the Frobenius norm, provided \(M=O(\log {N}_{1/2}(\overline{{\mathbb{X}}})/{\xi }^{2})\) Haar-random projective measurements are performed, a number that scales linearly with the logarithm of the covering number. For structured sets \({\mathbb{X}}\) that are nonconvex, such as MPO states, computing the optimal projection \({{\mathcal{P}}}_{{\mathbb{X}}}\) might be difficult or even NP-hard. For these cases, we can use numerical methods to compute an approximate projection \({\widetilde{{\mathcal{P}}}}_{{\mathbb{X}}}\) that we assume is α-approximately optimal (α ≥ 1), satisfying

$${\widetilde{{\mathcal{P}}}}_{{\mathbb{X}}}({\boldsymbol{\rho }})\in {\mathbb{X}},\quad {\left\Vert {\widetilde{{\mathcal{P}}}}_{{\mathbb{X}}}({\boldsymbol{\rho }})-{\boldsymbol{\rho }}\right\Vert }_{F}\le \sqrt{\alpha }{\left\Vert {{\mathcal{P}}}_{{\mathbb{X}}}({\boldsymbol{\rho }})-{\boldsymbol{\rho }}\right\Vert }_{F}$$
(9)

for any ρ. As will be explained in the next sections, there exist efficient methods to find approximate projections for low-rank and MPO states. Denote by \({\widetilde{{\boldsymbol{\rho }}}}_{{\rm{PCS}}}={\widetilde{{\mathcal{P}}}}_{{\mathbb{X}}}({{\boldsymbol{\rho }}}_{{\rm{CS}}})\) the PCS estimator obtained with this approximate projection. The following theorem extends Theorem 1 to \({\widetilde{{\boldsymbol{\rho }}}}_{{\rm{PCS}}}\).

Theorem 2

For a given \({{\boldsymbol{\rho }}}^{\star }\in {\mathbb{X}}\), let \({\widetilde{{\boldsymbol{\rho }}}}_{{\rm{PCS}}}\) be the approximate PCS estimator obtained with an α-approximate projection satisfying Eq. (9). Then with probability at least \(1-{e}^{-\Omega (\log {N}_{1/2}(\overline{{\mathbb{X}}}))}\),

$$\Vert{\widetilde{{\boldsymbol{\rho }}}}_{{\rm{PCS}}}-{{\boldsymbol{\rho }}}^{\star }{\Vert}_{F}\,\le\, O\left(\sqrt{\frac{\alpha \log {N}_{1/2}(\overline{{\mathbb{X}}})}{M}}\right).$$
(10)

General physical states

We first specialize \({\mathbb{X}}\) to all physical quantum states (We chose the label “simplex” for this set since the eigenvalues {λk} of all physical states define a standard simplex, i.e., λk ≥ 0 and ∑kλk = 1.):

$${{\mathbb{X}}}_{{\rm{simplex}}}=\{{\boldsymbol{\rho }}\in {{\mathbb{C}}}^{{2}^{n}\times {2}^{n}}:{\boldsymbol{\rho }}\succcurlyeq {\boldsymbol{0}},\mathrm{trace}\,({\boldsymbol{\rho }})=1\}.$$
(11)

For \({{\mathbb{X}}}_{{\rm{simplex}}}\), the PCS projection in Eq. (6) can be implemented by performing an eigenvalue decomposition, followed by projecting the eigenvalues onto the simplex \(\{{\boldsymbol{x}}\in {{\mathbb{R}}}^{{2}^{n}}:{x}_{i}\ge 0,\mathop{\sum}\nolimits_{i = 1}^{{2}^{n}}{x}_{i}=1\}\) using the algorithm proposed in Refs. 41,45, while keeping the eigenvectors unchanged. This approach has also been employed in Refs. 39,40 to ensure the physical structure of the reconstructed state. The computational complexity of the projection step is \(O(a\log a)\), where a denotes the number of nonzero eigenvalues.
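A minimal sketch of this projection follows. It uses a standard sort-based algorithm for the Euclidean projection onto the probability simplex; since that projection is unique, any correct algorithm (including the one in refs. 41,45) returns the same vector, but the code itself is our own illustration and the function names are hypothetical. The sort dominates the cost, giving the O(a log a) complexity noted above.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a real vector v onto {x : x_i >= 0, sum_i x_i = 1}."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                      # descending order: O(a log a)
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    support = u - (css - 1.0) / k > 0         # entries that stay positive
    kmax = k[support][-1]
    theta = (css[kmax - 1] - 1.0) / kmax      # common shift applied to all entries
    return np.maximum(v - theta, 0.0)

def project_physical(H):
    """P_simplex: keep the eigenvectors of Hermitian H and project its eigenvalues."""
    H = (H + H.conj().T) / 2                  # symmetrize against rounding errors
    lam, V = np.linalg.eigh(H)
    return (V * project_simplex(lam)) @ V.conj().T

print(project_simplex([0.5, 0.7, -0.2]))      # approximately [0.4, 0.6, 0.0]
```

Applied to a matrix that is already a valid density matrix, project_physical returns it unchanged (up to rounding).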

Since the corresponding set \(\overline{{\mathbb{X}}}\) has covering number \(\log {N}_{\epsilon }(\overline{{\mathbb{X}}})=O({4}^{n}\log \frac{9}{\epsilon })\), we can plug this into Theorem 1 to obtain a recovery guarantee for \({{\mathcal{P}}}_{{\rm{simplex}}}({{\boldsymbol{\rho }}}_{{\rm{CS}}})\).

Corollary 1

For a given physical state \({{\boldsymbol{\rho }}}^{\star }\in {{\mathbb{C}}}^{{2}^{n}\times {2}^{n}}\), we perform M projective measurements to obtain the CS estimate ρCS. Then with probability at least \(1-{e}^{-\Omega ({4}^{n})}\), the projected classical shadow \({{\mathcal{P}}}_{{\rm{simplex}}}({{\boldsymbol{\rho }}}_{{\rm{CS}}})\) satisfies

$$\Vert{{\mathcal{P}}}_{{\rm{simplex}}}({{\boldsymbol{\rho}}}_{{\rm{CS}}})-{{\boldsymbol{\rho}}}^{\star }{\Vert}_{F}\,\le\,O\left(\sqrt{\frac{{4}^{n}}{M}}\right).$$
(12)

Low-rank states

We next explore the structure of pure or nearly pure quantum states characterized by low entropy and represented as low-rank density matrices. Assuming \({{\boldsymbol{\rho }}}^{\star }\) has rank \(r\le {2}^{n}\), we can restrict our attention to the set \({{\mathbb{X}}}_{r}=\{{\boldsymbol{\rho }}\in {{\mathbb{C}}}^{{2}^{n}\times {2}^{n}}:{\boldsymbol{\rho }}\succcurlyeq {\boldsymbol{0}},\mathrm{trace}\,({\boldsymbol{\rho }})=1,\,\text{rank}\,({\boldsymbol{\rho }})=r\}\). Denote by \({{\mathcal{P}}}_{{{\mathbb{X}}}_{r}}(\cdot )\) the optimal projection defined by Eq. (6). It follows from Theorem 1 and the covering number of the corresponding set \(\log {N}_{\epsilon }(\overline{{\mathbb{X}}})=O({2}^{n}r\log \frac{9}{\epsilon })\) that \(\Vert{{\mathcal{P}}}_{{{\mathbb{X}}}_{r}}({{\boldsymbol{\rho}}}_{{\rm{CS}}})-{{\boldsymbol{\rho }}}^{\star }{\Vert }_{F}\,\le\, O(\sqrt{{2}^{n}r/M})\).

However, since we are unaware of an algorithm to perform the ideal projection \({{\mathcal{P}}}_{{{\mathbb{X}}}_{r}}(\cdot )\), we instead consider a two-step alternative to obtain the low-rank projected classical shadow (LR-PCS):

$${{\boldsymbol{\rho }}}_{{\rm{LR}}\text{-}{\rm{PCS}}}={{\mathcal{P}}}_{{\rm{simplex}}}({{\mathcal{P}}}_{{\rm{rank}}-r}({{\boldsymbol{\rho }}}_{{\rm{CS}}})),$$
(13)

where \({{\mathcal{P}}}_{\text{rank-}r\text{}}(\cdot )\) denotes the rank-r projection obtained by setting all eigenvalues beyond the r-th largest eigenvalue to zero. We can show that ρLR-PCS shares a similar guarantee as \({{\mathcal{P}}}_{{{\mathbb{X}}}_{r}}({{\boldsymbol{\rho }}}_{{\rm{CS}}})\).
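A minimal sketch of Eq. (13): keep the r largest eigenvalues of ρCS together with their eigenvectors, then project those r eigenvalues onto the simplex. Projecting only the retained eigenvalues, which keeps the rank at most r and matches the O(a log a) cost quoted above with a = r, is our reading of the composition rather than a detail confirmed by the text. The function names are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a real vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    kmax = k[u - (css - 1.0) / k > 0][-1]
    theta = (css[kmax - 1] - 1.0) / kmax
    return np.maximum(np.asarray(v, dtype=float) - theta, 0.0)

def lr_pcs(rho_cs, r):
    """Eq. (13): rank-r eigenvalue truncation followed by a simplex projection."""
    H = (rho_cs + rho_cs.conj().T) / 2
    lam, V = np.linalg.eigh(H)              # eigenvalues in ascending order
    lam_r, V_r = lam[-r:], V[:, -r:]        # keep the r largest eigenvalues
    # Assumption: only the r retained eigenvalues are projected, so rank <= r.
    w = project_simplex(lam_r)
    return (V_r * w) @ V_r.conj().T
```

For example, lr_pcs(rho_cs, 1) returns the projector onto the leading eigenvector of ρCS, which is always a valid pure state.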

Theorem 3

Given M Haar-random projective measurements on a physical state \({{\boldsymbol{\rho }}}^{\star }\in {{\mathbb{X}}}_{r}\), with probability at least \(1-{e}^{-\Omega ({2}^{n}r)}\), the estimator \({{\boldsymbol{\rho }}}_{{\rm{LR-PCS}}}\) defined in Eq. (13) satisfies

$$\Vert{{\boldsymbol{\rho }}}_{{\rm{LR-PCS}}}-{{\boldsymbol{\rho }}}^{\star }{\Vert}_{F}\,\le\,O\left(\sqrt{\frac{{2}^{n}r}{M}}\right).$$
(14)

The detailed proof appears in the section “Methods”. This theoretical recovery error is optimal, given that the ground truth \({{\boldsymbol{\rho }}}^{\star }\) has \(O({2}^{n}r)\) degrees of freedom. This highlights that LR-PCS can achieve the optimal error scaling for QST using independent measurements, without requiring multiple iterations of an optimization algorithm.

To compare LR-PCS with prior results, we convert the result of Theorem 3 to trace norm leveraging the inequality between the Frobenius and the trace norms6, namely \(\Vert{{\boldsymbol{\rho }}}_{{\rm{LR}}{\mbox{-}}{\rm{PCS}}}-{{\boldsymbol{\rho }}}^{\star }{\Vert}_{1}\le \sqrt{2r}\Vert{{\boldsymbol{\rho}}}_{{\rm{LR}}{\mbox{-}}{\rm{PCS}}}-{{\boldsymbol{\rho }}}^{\star }{\Vert}_{F}\le O(\sqrt{{2}^{n}{r}^{2}/M})\), which matches the optimal guarantee (up to small log terms) with independent measurements according to ref. 6. We have summarized the comparison in Table 1. We note that the sufficient condition for PCS in the general setting matches the necessary condition established in ref. 6. Similarly, the sufficient condition for PCS in the low-rank setting is also close to the corresponding necessary condition in ref. 6, up to logarithmic factors.
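The conversion relies on two facts: the difference of two states of rank at most r has rank at most 2r, and any matrix A satisfies \({\Vert {\boldsymbol{A}}\Vert }_{1}\le \sqrt{\mathrm{rank}({\boldsymbol{A}})}\,{\Vert {\boldsymbol{A}}\Vert }_{F}\) (Cauchy-Schwarz applied to the singular values). A quick numerical check on random rank-r states is sketched below; random_rank_r_state is a hypothetical helper.

```python
import numpy as np

def random_rank_r_state(d, r, rng):
    """Random d x d density matrix of rank r (normalized G G^dagger)."""
    G = rng.normal(size=(d, r)) + 1j * rng.normal(size=(d, r))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(4)
d, r = 16, 2
A = random_rank_r_state(d, r, rng) - random_rank_r_state(d, r, rng)
s = np.linalg.svd(A, compute_uv=False)      # singular values of the difference
trace_norm = s.sum()
frob = np.linalg.norm(A, "fro")
rank = int(np.sum(s > 1e-10))               # numerical rank, at most 2r
```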

Table 1 Comparing the total number of state copies in PCS using single-shot global Haar unitaries to that in optimal QST

MPO states

While the computational and storage requirements for low-rank density matrices are significantly smaller compared to general ones, they still grow exponentially in the number of qubits n. Moreover, the assumption of high purity on which the low-rank approximation is based becomes increasingly tenuous in practice for existing processors in the noisy intermediate-scale quantum (NISQ) era. For this reason, reducing parameter count through alternative assumptions is worth pursuing. Examples such as ground states of many quantum systems with short-range interactions and states generated by such systems within a finite duration26 often possess entanglement localized to subsystems of the entire quantum computer. Consequently, they can be compactly represented using MPOs, whose degrees of freedom scale only polynomially in n. To assist in the development of an MPO-PCS method, we will first establish their connection to tensor train (TT) decompositions42, a technique widely utilized in signal processing and machine learning.

For an n-qubit density matrix \({{\boldsymbol{\rho }}}^{\star }\in {{\mathbb{C}}}^{{2}^{n}\times {2}^{n}}\), we employ a single index array \({i}_{1}\cdots {i}_{n}\) (\({j}_{1}\cdots {j}_{n}\)) to denote the row (column) indices, where \({i}_{1},\ldots ,{i}_{n},{j}_{1},\ldots ,{j}_{n}\in \{1,2\}\) (Specifically, \({i}_{1}\cdots {i}_{n}\) represents the \(({i}_{1}+\mathop{\sum}\nolimits_{\ell = 2}^{n}{2}^{\ell -1}({i}_{\ell }-1))\)-th row.). We designate \({{\boldsymbol{\rho }}}^{\star }\) as an MPO if we can represent its \(({i}_{1}\cdots {i}_{n},{j}_{1}\cdots {j}_{n})\)-th element using the following matrix product53:

$${{\boldsymbol{\rho }}}^{\star }({i}_{1}\cdots {i}_{n},{j}_{1}\cdots {j}_{n})={{\boldsymbol{X}}}_{1}^{{i}_{1},{j}_{1}}{{\boldsymbol{X}}}_{2}^{{i}_{2},{j}_{2}}\cdots {{\boldsymbol{X}}}_{n}^{{i}_{n},{j}_{n}},$$
(15)

where \({{\boldsymbol{X}}}_{\ell }^{{i}_{\ell },{j}_{\ell }}\in {{\mathbb{C}}}^{D\times D}\) for \(\ell \in \{2,\ldots ,n-1\}\), \({{\boldsymbol{X}}}_{1}^{{i}_{1},{j}_{1}}\in {{\mathbb{C}}}^{1\times D}\), \({{\boldsymbol{X}}}_{n}^{{i}_{n},{j}_{n}}\in {{\mathbb{C}}}^{D\times 1}\), and D is the bond dimension. We can thus introduce the set of physical MPO states with bond dimension D as

$${{\mathbb{X}}}_{D}=\left\{{\boldsymbol{\rho }}\in {{\mathbb{C}}}^{{2}^{n}\times {2}^{n}}:\,{\boldsymbol{\rho }}\succcurlyeq {\boldsymbol{0}},\mathrm{trace}\,({\boldsymbol{\rho }})=1,\,{\rm{bond}}\, {\rm{dimension}}\,({\boldsymbol{\rho }})=D\right\}.$$
(16)

Here, the corresponding difference set \(\overline{{\mathbb{X}}}\) has a covering number \(\log {N}_{\epsilon }(\overline{{\mathbb{X}}})=O\left(4n{D}^{2}\log \frac{4n+\epsilon }{\epsilon }\right)\), which is proportional to the \(O(4n{D}^{2})\) degrees of freedom of MPO states. Given the optimal projection \({{\mathcal{P}}}_{{{\mathbb{X}}}_{D}}(\cdot )\), it follows from Theorem 1 that \(\Vert{{\mathcal{P}}}_{{{\mathbb{X}}}_{D}}\left({{\boldsymbol{\rho }}}_{{\rm{CS}}}\right)-{{\boldsymbol{\rho }}}^{\star }{\Vert}_{F}\,\le\,O(\sqrt{n{D}^{2}\log n/M})\).
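To make the index convention of Eq. (15) concrete, the sketch below assembles a \({2}^{n}\times {2}^{n}\) matrix from MPO cores \({{\boldsymbol{X}}}_{\ell }^{{i}_{\ell },{j}_{\ell }}\), using the row map \({i}_{1}+\mathop{\sum}\nolimits_{\ell = 2}^{n}{2}^{\ell -1}({i}_{\ell }-1)\) stated above (so qubit 1 is the least significant bit). Storing each core as an array of shape (2, 2, D_left, D_right) is our own layout choice, and the function names are hypothetical. The loop visits all \({4}^{n}\) entries, so it is meant only for checking small cases.

```python
import itertools
import numpy as np

def flat_index(idx):
    """Row (or column) index from (i_1, ..., i_n) in {1, 2}^n, returned 0-based."""
    return sum((il - 1) * 2 ** l for l, il in enumerate(idx))

def mpo_element(cores, i, j):
    """Eq. (15): X_1^{i_1 j_1} X_2^{i_2 j_2} ... X_n^{i_n j_n} (a 1 x 1 product)."""
    out = np.ones((1, 1), dtype=complex)
    for X, il, jl in zip(cores, i, j):
        out = out @ X[il - 1, jl - 1]      # X[il-1, jl-1] has shape (D_left, D_right)
    return out[0, 0]

def mpo_to_matrix(cores):
    """Dense 2^n x 2^n matrix represented by the MPO cores."""
    n = len(cores)
    rho = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for i in itertools.product((1, 2), repeat=n):
        for j in itertools.product((1, 2), repeat=n):
            rho[flat_index(i), flat_index(j)] = mpo_element(cores, i, j)
    return rho

# Bond dimension D = 1: a product state of qubit 1 (A) and qubit 2 (B).
A = np.array([[0.7, 0.1], [0.1, 0.3]], dtype=complex)
B = np.array([[0.6, 0.2j], [-0.2j, 0.4]], dtype=complex)
rho = mpo_to_matrix([A.reshape(2, 2, 1, 1), B.reshape(2, 2, 1, 1)])
```

Because qubit 1 is the least significant index under this map, the result equals np.kron(B, A) rather than np.kron(A, B).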

However, we have been unable to implement the optimal \({{\mathcal{P}}}_{{{\mathbb{X}}}_{D}}(\cdot )\) due to the difficulty in satisfying both the MPO and simplex conditions simultaneously. Therefore, we introduce a quasi-optimal projection based on a sequential singular value decomposition (SVD) algorithm, commonly referred to as tensor train SVD (TT-SVD)42. Based on tensor-matrix equivalence, we can design a two-step MPO PCS method:

$${{\boldsymbol{\rho }}}_{{\rm{MPO}}\text{-}{\rm{PCS}}}={{\mathcal{P}}}_{{\rm{simplex}}}({\text{SVD}\,}_{D}^{tt}({{\boldsymbol{\rho }}}_{{\rm{CS}}})),$$
(17)

where \({\,\text{SVD}\,}_{D}^{tt}(\cdot )\) denotes the TT-SVD operation. It is worth noting that the bond dimension of ρMPO-PCS may differ slightly from D due to the simplex projection, but the recovery error still depends on D. We analyze the recovery error of Eq. (17) as follows:

Theorem 4

Consider an MPO state \({{\boldsymbol{\rho }}}^{\star }\in {{\mathbb{X}}}_{D}\), measured M times with Haar-random projections. With ρMPO-PCS defined as in Eq. (17), with probability \(1-{e}^{-\Omega (n{D}^{2}\log n)}\) we have

$$\Vert{{\boldsymbol{\rho }}}_{{\rm{MPO}}{\mbox{-}}{\rm{PCS}}}-{{\boldsymbol{\rho}}}^{\star }{\Vert}_{F}\,\le\, O\left(\sqrt{\frac{{n}^{2}{D}^{2}\log n}{M}}\right).$$
(18)

The proof can be found in the section “Methods”. Because the TT-SVD is only quasi-optimal, the upper bound in Eq. (18) is not optimal with respect to the \(O(4n{D}^{2})\) degrees of freedom of \({{\boldsymbol{\rho }}}^{\star }\). To our knowledge, no existing method can guarantee both the MPO and PSD constraints simultaneously. Should such an optimal MPO projection be found, however, it would enforce the exact bond dimension and could potentially remove one factor of n from the numerator of Eq. (18). In Table 2, we summarize the total number of state copies required for MPO-PCS compared to existing QST methods. It is important to highlight that all of these QST results represent sufficient, rather than necessary, conditions. Compared to the constrained LS method using Haar measurements in ref. 29, MPO-PCS exhibits better sample complexity. The result in ref. 30, based on constrained least squares with spherical 3-design POVMs (which are informationally complete), could potentially outperform PCS in terms of sample complexity (although PCS achieves the same complexity given an optimal projection); however, attaining the bound in ref. 30 requires solving a highly nonconvex optimization problem to global optimality (see also the runtime comparison in Table 3), and there is currently a lack of practical algorithms capable of achieving this bound. The gradient-based iterative algorithm proposed in ref. 30 provides only a suboptimal guarantee and is initialization-dependent, which may limit its practical applicability. Additionally, spherical 3-designs are not known to have efficient implementations using current local quantum circuits, whereas Haar measurements are generally more feasible in experimental settings.
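A minimal sketch of the two-step estimator in Eq. (17): the matrix is reshaped into an n-way tensor whose ℓ-th mode merges the row and column indices \(({i}_{\ell },{j}_{\ell })\) of one qubit, truncated by a sequential TT-SVD that keeps at most D singular values per unfolding, reshaped back, and passed through the eigenvalue simplex projection. Two assumptions to flag: the code uses NumPy's row-major ordering (qubit 1 as the most significant bit), which lists the qubits in the opposite order from the index map below Eq. (15), and it Hermitian-symmetrizes the TT-SVD output before the eigendecomposition as a numerical safeguard. All function names are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a real vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    kmax = k[u - (css - 1.0) / k > 0][-1]
    theta = (css[kmax - 1] - 1.0) / kmax
    return np.maximum(np.asarray(v, dtype=float) - theta, 0.0)

def matrix_to_tensor(rho, n):
    """2^n x 2^n matrix -> n-way tensor whose mode l indexes the pair (i_l, j_l)."""
    t = rho.reshape([2] * (2 * n))                  # axes i_1..i_n, j_1..j_n
    perm = [ax for l in range(n) for ax in (l, n + l)]
    return t.transpose(perm).reshape([4] * n)

def tensor_to_matrix(t, n):
    """Inverse of matrix_to_tensor."""
    t = t.reshape([2] * (2 * n))                    # axes i_1, j_1, i_2, j_2, ...
    order = list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2))
    return t.transpose(order).reshape(2 ** n, 2 ** n)

def tt_svd(t, D):
    """Sequential truncated SVDs (TT-SVD) with every TT rank capped at D."""
    dims, cores, r_prev = t.shape, [], 1
    c = t.reshape(1, -1)
    for k in range(len(dims) - 1):
        c = c.reshape(r_prev * dims[k], -1)
        U, S, Vh = np.linalg.svd(c, full_matrices=False)
        r = min(D, S.size)
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        c = S[:r, None] * Vh[:r]                    # remainder carried to the next mode
        r_prev = r
    cores.append(c.reshape(r_prev, dims[-1], 1))
    return cores

def tt_to_tensor(cores):
    """Contract the TT cores back into a full tensor."""
    full = cores[0]
    for G in cores[1:]:
        full = np.tensordot(full, G, axes=([-1], [0]))
    return full.reshape([G.shape[1] for G in cores])

def mpo_pcs(rho_cs, n, D):
    """Eq. (17): TT-SVD truncation to bond dimension D, then simplex projection."""
    X = tensor_to_matrix(tt_to_tensor(tt_svd(matrix_to_tensor(rho_cs, n), D)), n)
    X = (X + X.conj().T) / 2                        # numerical Hermitian safeguard
    lam, V = np.linalg.eigh(X)
    return (V * project_simplex(lam)) @ V.conj().T
```

On a product state (bond dimension 1) with D = 1, the TT-SVD step is exact and mpo_pcs returns the input state; on a generic Hermitian, unit-trace input, the output is always PSD with unit trace.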

Table 2 Total number of copies in MPO-PCS compared to constrained LS using Haar measures, and spherical 3-designs
Table 3 Average runtime per trial (reported as mean ± standard deviation, in seconds) of numerical experiments in Fig. 4, for the case M = 1000

Simulation results

In this section, we conduct numerical QST experiments with Haar-random projective measurements to compare CS, LR-PCS, and MPO-PCS. For each configuration, we run 10 Monte Carlo tomography experiments in which each Haar measurement and its outcome are sampled at random, and we report the average over the 10 trials. For the random-state cases (Figs. 2, 3), each trial uses a different randomly chosen ground truth, whereas for the tailored-state cases (Figs. 4, 5), all trials in a given average use the same ground truth. Because \(\Vert {{\boldsymbol{\rho }}}^{\star }{\Vert }_{F}\) differs across quantum states with different ranks or bond dimensions, we use the normalized Frobenius norm to enable fair comparisons and to provide a consistent measure of reconstruction accuracy.

Fig. 2: Estimating four-qubit low-rank states by CS and LR-PCS methods.

Mean squared error as a function of state copies M, averaged over trials on ten randomly chosen ground truth states of rank r = 1 (a), r = 4 (b), and r = 16 (c). The horizontal axes span M = 250 to M = 10000. Uncertainty is defined as the sample standard deviation.

Fig. 3: Estimating seven-qubit MPO states by CS and MPO-PCS methods.

Mean squared error as a function of state copies M, where each point is the average over trials on ten randomly chosen ground truth states for bond dimension D = 1 (a) and D = 4 (b). Uncertainty is defined as the sample standard deviation.

Fig. 4: Estimating seven-qubit thermal and GHZ states.

Mean squared error of the different methods as a function of the number of state copies M for (a) the thermal state (T = 0.2), (b) the thermal state (T = 2), and (c) the GHZ state. All panels start at M = 100.

In the first set of tests, we compare CS and LR-PCS for a fixed rank r as a function of the number of measurements M. We generate random ground-truth density matrices \({{\boldsymbol{\rho }}}^{\star }={{\boldsymbol{F}}}^{\star }{{{\boldsymbol{F}}}^{\star }}^{\dagger }\in {{\mathbb{C}}}^{16\times 16}\) (n = 4 qubits), where \({{\boldsymbol{F}}}^{\star }=\frac{{{\boldsymbol{A}}}^{\star }+{\rm{i}}{{\boldsymbol{B}}}^{\star }}{\parallel {{\boldsymbol{A}}}^{\star }+{\rm{i}}{{\boldsymbol{B}}}^{\star }{\parallel }_{F}}\in {{\mathbb{C}}}^{16\times r}\) and the entries of \({{\boldsymbol{A}}}^{\star }\) and \({{\boldsymbol{B}}}^{\star }\) are independent and identically distributed (i.i.d.) samples from the standard normal distribution. Note that when r = 16, LR-PCS reduces to projection onto the set of general physical states defined in Eq. (11). The results in Fig. 2 for \(r\in \{1,4,16\}\) reveal three key observations: (i) the recovery error of all methods consistently decreases as the rank r decreases and the number of measurements M increases, with the squared error scaling as expected (\({4}^{n}/M\) for CS and \({2}^{n}r/M\) for LR-PCS); (ii) for every r and M, LR-PCS outperforms standard CS (even at full rank), as it preserves physicality under any rank constraint; and (iii) in Fig. 2c, since r = 16 (i.e., \({{\boldsymbol{\rho }}}^{\star }\) is full rank), LR-PCS adds only the PSD constraint relative to CS, so the performance gap between LR-PCS and CS narrows as M increases: the PSD constraint alone has limited impact on reducing the recovery error.
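A minimal NumPy sketch of this first experiment is given below for illustration. The ground-truth construction follows the text; everything else is an assumption on our part rather than the authors' implementation: Haar-random unitaries are drawn by QR decomposition of a complex Gaussian matrix with a phase correction, each single-shot snapshot is \(({2}^{n}+1){{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }-{{\bf{I}}}_{{2}^{n}}\) as in Eq. (21), and the rank-r physical projection keeps the r largest eigenvalues and projects them onto the probability simplex (one way to compute the Frobenius-nearest density matrix of rank at most r). All function names are hypothetical.

```python
import numpy as np


def haar_unitary(dim, rng):
    """Haar-random unitary: QR of a complex Gaussian matrix with a phase fix (assumed sampler)."""
    z = (rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    diag = np.diag(r)
    return q * (diag / np.abs(diag))  # multiply column k by the phase of r[k, k]


def random_low_rank_state(n, r, rng):
    """rho = F F^dagger with F = (A + iB) / ||A + iB||_F, as in the text."""
    f = rng.standard_normal((2 ** n, r)) + 1j * rng.standard_normal((2 ** n, r))
    f /= np.linalg.norm(f)  # Frobenius norm of a 2-D array
    return f @ f.conj().T


def classical_shadow(rho, num_copies, rng):
    """Average of snapshots (2^n + 1) phi phi^dagger - I over Haar-random projective measurements."""
    dim = rho.shape[0]
    est = np.zeros((dim, dim), dtype=complex)
    for _ in range(num_copies):
        u = haar_unitary(dim, rng)
        # Born probabilities p_k = u_k^dagger rho u_k for the columns u_k of u.
        probs = np.clip(np.real(np.einsum("ik,ij,jk->k", u.conj(), rho, u)), 0.0, None)
        phi = u[:, rng.choice(dim, p=probs / probs.sum())]
        est += (dim + 1) * np.outer(phi, phi.conj()) - np.eye(dim)
    return est / num_copies


def project_simplex(v):
    """Euclidean projection of a real vector onto {x >= 0, sum(x) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, v.size + 1)
    k = idx[u - css / idx > 0][-1]
    return np.maximum(v - css[k - 1] / k, 0.0)


def project_rank_r_state(x, r):
    """Hypothetical LR-PCS step: nearest (Frobenius) density matrix of rank <= r."""
    evals, evecs = np.linalg.eigh((x + x.conj().T) / 2)  # eigenvalues in ascending order
    vecs = evecs[:, -r:]
    lam = project_simplex(evals[-r:])
    return (vecs * lam) @ vecs.conj().T


rng = np.random.default_rng(0)
n, r, m = 4, 1, 2000
rho_true = random_low_rank_state(n, r, rng)
rho_cs = classical_shadow(rho_true, m, rng)
rho_lr = project_rank_r_state(rho_cs, r)
err_cs = np.linalg.norm(rho_cs - rho_true) ** 2
err_lr = np.linalg.norm(rho_lr - rho_true) ** 2
print(f"CS: {err_cs:.4f}  LR-PCS: {err_lr:.4f}")
```

With these settings the squared CS error should be close to \(({4}^{n}+{2}^{n}-1-\Vert {{\boldsymbol{\rho }}}^{\star }{\Vert }_{F}^{2})/M\) from Eq. (4), and the projected estimate should be noticeably more accurate; the exact numbers depend on the random seed.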

In the second set of trials, we test CS and MPO-PCS across varying numbers of measurements M and bond dimensions D. We consider n = 7-qubit matrix product states (MPSs, the pure-state special case of MPOs) of the form \({{\boldsymbol{\rho }}}^{\star }={{\boldsymbol{u}}}^{\star }{{{\boldsymbol{u}}}^{\star }}^{\dagger }\in {{\mathbb{C}}}^{128\times 128}\), where \({{\boldsymbol{u}}}^{\star }\in {{\mathbb{C}}}^{128\times 1}\) satisfies \(\Vert {{\boldsymbol{u}}}^{\star }{\Vert }_{2}=1\) and its \(({i}_{1}\cdots {i}_{7})\)-th element can be written in matrix product form as \({{\boldsymbol{u}}}^{\star }({i}_{1}\cdots {i}_{7})={{{\boldsymbol{U}}}_{1}^{\star }}^{{i}_{1}}\cdots {{{\boldsymbol{U}}}_{7}^{\star }}^{{i}_{7}}\). Here, each matrix \({{{\boldsymbol{U}}}_{\ell }^{\star }}^{{i}_{\ell }}\) has size d × d, except for \({{{\boldsymbol{U}}}_{1}^{\star }}^{{i}_{1}}\) and \({{{\boldsymbol{U}}}_{7}^{\star }}^{{i}_{7}}\), which have dimensions 1 × d and d × 1, respectively.

To generate each MPS \({{\boldsymbol{u}}}^{\star }\), we draw a length-128 complex vector with i.i.d. standard normal entries, truncate it to an MPS with TT-SVD42, and normalize the result to unit length. As a result, the entry \({{\boldsymbol{\rho }}}^{\star }({i}_{1}\cdots {i}_{7},{j}_{1}\cdots {j}_{7})\) can be expressed as \({{\boldsymbol{\rho }}}^{\star }({i}_{1}\cdots {i}_{7},{j}_{1}\cdots {j}_{7})=({{{\boldsymbol{U}}}_{1}^{\star }}^{{i}_{1}}\otimes \overline{{{{\boldsymbol{U}}}_{1}^{\star }}^{{j}_{1}}})\cdots ({{{\boldsymbol{U}}}_{7}^{\star }}^{{i}_{7}}\otimes \overline{{{{\boldsymbol{U}}}_{7}^{\star }}^{{j}_{7}}})={{{\boldsymbol{X}}}_{1}^{\star }}^{{i}_{1},{j}_{1}}\cdots {{{\boldsymbol{X}}}_{7}^{\star }}^{{i}_{7},{j}_{7}}\), where ⊗ denotes the Kronecker product and the overline denotes entrywise complex conjugation. Thus, \({{\boldsymbol{\rho }}}^{\star }={{\boldsymbol{u}}}^{\star }{{{\boldsymbol{u}}}^{\star }}^{\dagger }\) is also an MPO with bond dimension \(D={d}^{2}\) (equal for all qubits). As shown in Fig. 3, MPO-PCS attains significantly lower error than CS because it exploits the underlying MPO structure. Moreover, the recovery error of MPO-PCS increases with the MPO bond dimension (in line with Table 2), whereas that of CS does not depend on D.
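The Kronecker-product identity above can be checked numerically on a small example. The sketch below is a simplification, not the authors' generator: it draws random MPS cores directly instead of truncating a Gaussian vector with TT-SVD, forms the MPO cores as \({{\boldsymbol{U}}}^{i}\otimes \overline{{{\boldsymbol{U}}}^{j}}\) (entrywise complex conjugate in the second factor, so that the Kronecker mixed-product rule applies), and confirms that contracting them reproduces \({{\boldsymbol{u}}}^{\star }{{{\boldsymbol{u}}}^{\star }}^{\dagger }\) with bond dimension d². The helper names are hypothetical.

```python
import numpy as np


def random_mps_cores(n, d, rng):
    """Hypothetical generator: random complex cores of shape (left, 2, right), boundary bonds 1."""
    cores = []
    for site in range(n):
        left = 1 if site == 0 else d
        right = 1 if site == n - 1 else d
        shape = (left, 2, right)
        cores.append(rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
    return cores


def contract_mps(cores):
    """Full vector u(i_1...i_n) = U_1^{i_1} ... U_n^{i_n}, with i_1 the most significant index."""
    vec = cores[0]
    for core in cores[1:]:
        vec = np.tensordot(vec, core, axes=([-1], [0]))
    return vec.reshape(-1)


def mpo_cores_from_mps(cores):
    """MPO cores X^{i,j} = U^{i} kron conj(U^{j}), stored with shape (left^2, 2, 2, right^2)."""
    mpo = []
    for core in cores:
        left, _, right = core.shape
        x = np.empty((left * left, 2, 2, right * right), dtype=complex)
        for i in range(2):
            for j in range(2):
                x[:, i, j, :] = np.kron(core[:, i, :], core[:, j, :].conj())
        mpo.append(x)
    return mpo


def contract_mpo(mpo):
    """Full 2^n x 2^n matrix rho(i_1...i_n, j_1...j_n) from MPO cores."""
    n = len(mpo)
    t = mpo[0]
    for x in mpo[1:]:
        t = np.tensordot(t, x, axes=([-1], [0]))
    # Axes are now (1, i_1, j_1, ..., i_n, j_n, 1); group all row indices before column indices.
    t = t.reshape([2] * (2 * n))
    perm = list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2))
    return t.transpose(perm).reshape(2 ** n, 2 ** n)


rng = np.random.default_rng(1)
n, d = 5, 2
mps = random_mps_cores(n, d, rng)
mps[0] = mps[0] / np.linalg.norm(contract_mps(mps))  # rescale so that ||u||_2 = 1
u = contract_mps(mps)
mpo = mpo_cores_from_mps(mps)
rho = contract_mpo(mpo)
print(np.allclose(rho, np.outer(u, u.conj())))  # True
print([x.shape[-1] for x in mpo])               # [4, 4, 4, 4, 1]: bond dimension d**2
```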

In the third set of trials, we simulate measurements on three 7-qubit density matrices: (i) a thermal state with temperature T = 0.2 (a relatively low temperature, close to the ground state); (ii) a thermal state with T = 2 (a relatively high temperature); and (iii) the Greenberger–Horne–Zeilinger (GHZ) state. The thermal states are generated from the 1D quantum Ising model \(H=\mathop{\sum}\nolimits_{j = 1}^{n-1}{\sigma }_{j}^{z}{\sigma }_{j+1}^{z}+\mathop{\sum}\nolimits_{j = 1}^{n}{\sigma }_{j}^{x}\) with \({\sigma }_{j}^{a}={{\bf{I}}}_{{2}^{j-1}}\otimes {\sigma }^{a}\otimes {{\bf{I}}}_{{2}^{n-j}}\in {{\mathbb{R}}}^{{2}^{n}\times {2}^{n}},a=x,z\) and \({\sigma }^{x}=\left[\begin{array}{cc}0&1\\ 1&0\end{array}\right],{\sigma }^{z}=\left[\begin{array}{cc}1&0\\ 0&-1\end{array}\right]\), and are defined as \({{\boldsymbol{\rho }}}^{\star }=\frac{{e}^{-H/T}}{\mathrm{trace}\,({e}^{-H/T})}\). The GHZ state is \({{\boldsymbol{\rho }}}^{\star }={\boldsymbol{g}}{{\boldsymbol{g}}}^{\top }\) with \({\boldsymbol{g}}={\left[\begin{array}{ccccc}\frac{1}{\sqrt{2}}&0&\cdots &0&\frac{1}{\sqrt{2}}\end{array}\right]}^{\top }\in {{\mathbb{R}}}^{{2}^{n}\times 1}\). The low-temperature thermal state (i) and the GHZ state (iii) simultaneously exhibit low-rank and MPO structures16,48,54, making them well suited for demonstrating the advantages of exploiting structured subspaces. We impose rank constraints \(r\in \{4,24,1\}\) on the estimator for the three states, respectively. For T = 0.2, the ground-truth density matrix has rank approximately 4, while for T = 2 it is full rank; for LR-PCS we select r = 24, somewhat arbitrarily, which is sufficient to capture 80% of the sum of the eigenvalues of the ground-truth density matrix. In addition, we apply TT-SVD to the CS estimator and adaptively select the bond dimensions using an error tolerance of \({10}^{-14}\).
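For reference, the three test states can be built for any n as in the sketch below. We compute \({e}^{-H/T}\) through the eigendecomposition of H and shift the spectrum by its minimum for numerical stability (the shift cancels after normalization). The helper `rank_for_fraction`, which returns the smallest r whose r largest eigenvalues carry a given fraction of the trace, is our own illustration of the 80% criterion and is not taken from the paper; we do not claim it reproduces the value r = 24 used above.

```python
import numpy as np

SIGMA_X = np.array([[0.0, 1.0], [1.0, 0.0]])
SIGMA_Z = np.array([[1.0, 0.0], [0.0, -1.0]])


def site_op(op, j, n):
    """sigma_j^a = I_{2^(j-1)} kron sigma^a kron I_{2^(n-j)}, with 1-based site index j."""
    return np.kron(np.kron(np.eye(2 ** (j - 1)), op), np.eye(2 ** (n - j)))


def ising_hamiltonian(n):
    """H = sum_{j=1}^{n-1} sigma_j^z sigma_{j+1}^z + sum_{j=1}^{n} sigma_j^x."""
    h = np.zeros((2 ** n, 2 ** n))
    for j in range(1, n):
        h += site_op(SIGMA_Z, j, n) @ site_op(SIGMA_Z, j + 1, n)
    for j in range(1, n + 1):
        h += site_op(SIGMA_X, j, n)
    return h


def thermal_state(n, temperature):
    """exp(-H/T) / trace(exp(-H/T)) via the eigendecomposition of H."""
    energies, vecs = np.linalg.eigh(ising_hamiltonian(n))
    weights = np.exp(-(energies - energies.min()) / temperature)  # shift cancels below
    weights /= weights.sum()
    return (vecs * weights) @ vecs.T


def ghz_state(n):
    """rho = g g^T with g = (e_0 + e_{2^n - 1}) / sqrt(2)."""
    g = np.zeros(2 ** n)
    g[0] = g[-1] = 1 / np.sqrt(2)
    return np.outer(g, g)


def rank_for_fraction(rho, fraction=0.8):
    """Illustrative helper: smallest r whose r largest eigenvalues sum to >= fraction."""
    evals = np.sort(np.linalg.eigvalsh(rho))[::-1]
    return int(np.searchsorted(np.cumsum(evals), fraction)) + 1


n = 7
rho_low, rho_high, rho_ghz = thermal_state(n, 0.2), thermal_state(n, 2.0), ghz_state(n)
print(rank_for_fraction(rho_low), rank_for_fraction(rho_high), rank_for_fraction(rho_ghz))
```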
To enable a comprehensive comparison between the PCS-based methods and MLE, we introduce the low-rank MLE (LR-MLE) and matrix product operator MLE (MPO-MLE) algorithms, detailed in the section “Methods”. For LR-MLE we use random initialization (we generate a density matrix \({{\boldsymbol{\rho }}}_{0}={{\boldsymbol{F}}}_{0}{{{\boldsymbol{F}}}_{0}}^{\dagger }\in {{\mathbb{C}}}^{{2}^{n}\times {2}^{n}}\), where \({{\boldsymbol{F}}}_{0}=\frac{{{\boldsymbol{A}}}_{0}+i\cdot {{\boldsymbol{B}}}_{0}}{\parallel {{\boldsymbol{A}}}_{0}+i\cdot {{\boldsymbol{B}}}_{0}{\parallel }_{F}}\in {{\mathbb{C}}}^{{2}^{n}\times r}\), with the entries of \({{\boldsymbol{A}}}_{0}\) and \({{\boldsymbol{B}}}_{0}\) drawn independently from the standard normal distribution), and for MPO-MLE we use spectral initialization30. The step sizes are set to 0.5, 0.05, and 0.1 for LR-MLE, and to 0.3, 0.05, and 0.1 for MPO-MLE, for the three sets of quantum states above; the number of iterations is fixed at 500 to ensure convergence in all cases. Because the trial-to-trial variation of the error in reconstructing the thermal and GHZ states is less than 10% of the mean error, we omit it from the figure for clarity. Fig. 4 shows that the proposed LR-PCS and MPO-PCS methods outperform standard CS in Frobenius-norm error. Furthermore, MPO-PCS outperforms LR-PCS, which can be attributed to the fewer degrees of freedom of the MPO structure relative to the low-rank structure (cf. Eqs. (14) and (18)). In addition, LR-PCS performs comparably to LR-MLE, consistent with its information-theoretically optimal error bound. In contrast, MPO-PCS performs slightly worse than MPO-MLE, primarily because its recovery error bound is suboptimal, containing a factor of \({n}^{2}\) rather than n.
Nevertheless, Table 3 shows that CS-based methods are significantly more computationally efficient than MLE-based methods (more than 100× faster in the cases considered), as the latter require iterative optimization.

In the final test, we examine how the recovery error scales with the qubit number n, using r = 4 for T = 0.2, r = 4(n − 1) for T = 2, r = 1 for the GHZ state, and an error tolerance of \({10}^{-14}\) for determining the bond dimension D. As Fig. 5 shows, both LR-PCS and MPO-PCS substantially curb the growth of the recovery error as the system size n increases, in contrast to standard CS; this improvement is attributed to their use of low-dimensional structure. Moreover, the error bound of MPO-PCS scales polynomially with n, as indicated in Eq. (18), rather than exponentially as in Eq. (14) for LR-PCS, consistent with MPO-PCS outperforming LR-PCS in recovery error.

Fig. 5: Estimating thermal and GHZ states with varied number of qubits.

Mean squared error as a function of the number of qubits n, with M = 3000, for (a) the thermal state (T = 0.2), (b) the thermal state (T = 2), and (c) the GHZ state.

Discussion

This paper has introduced the projected classical shadow (PCS) method to address the computational challenges of quantum state tomography (QST) in large Hilbert spaces by combining the classical shadow (CS) framework with a physical projection step. The method provides guaranteed performance under Haar-random measurements. Theoretical results show that PCS achieves high accuracy in reconstructing general and low-rank quantum states with a number of state copies that meets information-theoretically optimal bounds. Moreover, PCS reduces the number of state copies required for matrix product operator (MPO) states compared with existing results using Haar-random measurements. Numerical validation further demonstrates the practicality and computational efficiency of PCS for large-scale quantum state reconstruction.

More broadly, our formalism points to a promising new general direction for CS methods. Although CS was originally introduced for estimating properties of a state rather than the state itself31, it nevertheless relies on an estimator \({{\boldsymbol{\rho }}}_{{\rm{CS}}}\) of the full density matrix. As our results reveal, this generally unphysical estimator can be projected onto a physical space of interest, whether the entire Hilbert space or a subset thereof (Fig. 1), with performance guarantees that attain information-theoretic bounds (for arbitrary and low-rank states) or improve upon previous scaling results (for MPO states). By merging the conceptual simplicity of CS with the scaling improvements possible in structured quantum systems, our results suggest a compelling role for PCS in traditional quantum state estimation, with opportunities for future exploration of further subspaces tailored to specific physical conditions or prior knowledge, such as projected entangled pair operator (PEPO)55 and multiscale entanglement renormalization ansatz (MERA)56 constructions.

Another promising direction is to analyze the PCS method under local measurements. Although global measurements, which act jointly on all qubits, are theoretically advantageous, implementing them with practical quantum circuits poses substantial challenges. In contrast, local measurements, whether drawn from the Haar measure16, the Pauli set23, or local informationally complete POVMs48, are significantly more compatible with current quantum architectures and can be implemented more efficiently in experiments. While the projection-based framework employed in this work could, in principle, be applied directly to local measurements, the theoretical machinery developed here does not extend to such cases, because the concentration inequality in Eq. (27) [see section “Methods”] cannot be established under local measurements. Closing this gap requires new analytical tools tailored to locality constraints, which we leave as an important direction for future investigation.

Methods

This section provides detailed proofs and a comprehensive description of the MLE-based methods (LR-MLE and MPO-MLE) used in the section “Simulation results”.

Proof of Equation 4

Proof

We expand \({\mathbb{E}}{\Vert{{\boldsymbol{\rho }}}_{{\rm{CS}}}-{{\boldsymbol{\rho }}}^{\star }\Vert}_{F}^{2}\) as follows:

$$\begin{array}{ll}\qquad\quad{\mathbb{E}}{\|{\boldsymbol{\rho}}_{{{\text{CS}}}} - {\boldsymbol{\rho}}^\star \|}_F^2 \\\qquad={\mathbb{E}}{\left\| \frac{1}{M}\sum\limits_{m=1}^M{\boldsymbol{\rho}}_m - {\boldsymbol{\rho}}^\star\right\|}_F^2 \\\qquad={\mathbb{E}}\left\langle \frac{1}{M}\sum\limits_{m=1}^M ({\boldsymbol{\rho}}_m - {\boldsymbol{\rho}}^\star), \frac{1}{M}\sum\limits_{m=1}^M ({\boldsymbol{\rho}}_m - {\boldsymbol{\rho}}^\star)\right\rangle \\\qquad=\frac{1}{M^2}{\mathbb{E}}\sum\limits_{m=1}^M{\|{\boldsymbol{\rho}}_m - {\boldsymbol{\rho}}^\star \|}_F^2 \\\qquad=\frac{1}{M}{\mathbb{E}}{\|{\boldsymbol{\rho}}_1 - {\boldsymbol{\rho}}^\star \|}_F^2 \\\qquad=\frac{1}{M}({\|{\boldsymbol{\rho}}^\star\|}_F^2 - 2{\mathbb{E}}\langle {\boldsymbol{\rho}}_1, {\boldsymbol{\rho}}^\star \rangle + {\mathbb{E}}\langle {\boldsymbol{\rho}}_1,{\boldsymbol{\rho}}_1 \rangle ) \\\qquad=\frac{1}{M}\left[- {\|{\boldsymbol{\rho}}^\star \|}_F^2 + {(2^n+1)}^2{\mathbb{E}}\langle {\boldsymbol{\phi}}_{1,j_1}{\boldsymbol{\phi}}_{1,j_1}^\dagger, {\boldsymbol{\phi}}_{1,j_1}{\boldsymbol{\phi}}_{1,j_1}^\dagger \rangle \right. \\\qquad\left.-2(2^n+1){\mathbb{E}}\langle {\boldsymbol{\phi}}_{1,j_1}{\boldsymbol{\phi}}_{1,j_1}^\dagger, {\bf{I}}_{2^n} \rangle +2^n \right] \\\qquad=\frac{4^n + 2^n - 1 - {\|{\boldsymbol{\rho}}^\star \|}_F^2}{M}, \end{array}$$
(19)

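where the third equality uses the independence of the M measurements together with \({\mathbb{E}}[{{\boldsymbol{\rho }}}_{m}]={{\boldsymbol{\rho }}}^{\star }\), so that all cross terms vanish; the fourth uses the fact that all measurements are identically distributed; the sixth uses \({\mathbb{E}}\langle {{\boldsymbol{\rho }}}_{1},{{\boldsymbol{\rho }}}^{\star }\rangle =\Vert {{\boldsymbol{\rho }}}^{\star }{\Vert }_{F}^{2}\), \({{\boldsymbol{\rho }}}_{1}=({2}^{n}+1){{\boldsymbol{\phi }}}_{1,{j}_{1}}{{\boldsymbol{\phi }}}_{1,{j}_{1}}^{\dagger }-{{\bf{I}}}_{{2}^{n}}\), and \(\langle {{\bf{I}}}_{{2}^{n}},{{\bf{I}}}_{{2}^{n}}\rangle ={2}^{n}\); and the last follows from the normalization \(\langle {{\boldsymbol{\phi }}}_{1,{j}_{1}}{{\boldsymbol{\phi }}}_{1,{j}_{1}}^{\dagger },{{\boldsymbol{\phi }}}_{1,{j}_{1}}{{\boldsymbol{\phi }}}_{1,{j}_{1}}^{\dagger }\rangle =\langle {{\boldsymbol{\phi }}}_{1,{j}_{1}}{{\boldsymbol{\phi }}}_{1,{j}_{1}}^{\dagger },{{\bf{I}}}_{{2}^{n}}\rangle =1\).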
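Equation (19) can be sanity-checked by Monte Carlo for a single snapshot (M = 1). The sketch below is our own check, not part of the paper: it assumes a Haar sampler based on the QR decomposition of a complex Gaussian matrix with phase correction, draws the outcome by Born's rule, and uses a random pure state, for which M times Eq. (19) equals \({4}^{n}+{2}^{n}-2\). The sample size is an arbitrary choice.

```python
import numpy as np


def haar_unitary(dim, rng):
    """Haar-random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = (rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    diag = np.diag(r)
    return q * (diag / np.abs(diag))


def snapshot_sq_error(rho, rng):
    """||rho_1 - rho||_F^2 for one Haar-random projective measurement of rho."""
    dim = rho.shape[0]
    u = haar_unitary(dim, rng)
    probs = np.clip(np.real(np.einsum("ik,ij,jk->k", u.conj(), rho, u)), 0.0, None)
    phi = u[:, rng.choice(dim, p=probs / probs.sum())]  # outcome j_1 drawn by Born's rule
    snapshot = (dim + 1) * np.outer(phi, phi.conj()) - np.eye(dim)
    return np.linalg.norm(snapshot - rho) ** 2


rng = np.random.default_rng(7)
n = 2
dim = 2 ** n
psi = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())  # pure state, so ||rho||_F^2 = 1
monte_carlo = np.mean([snapshot_sq_error(rho, rng) for _ in range(10_000)])
exact = 4 ** n + 2 ** n - 1 - np.linalg.norm(rho) ** 2  # Eq. (19) with M = 1
print(f"Monte Carlo: {monte_carlo:.3f}, Eq. (19): {exact:.3f}")
```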

Proof of Theorem 1

Proof

We define a restricted Frobenius norm as

$$\begin{array}{l}\Vert{{\boldsymbol{\rho }}}_{{\rm{PCS}}}-{{\boldsymbol{\rho }}}^{\star }{\Vert}_{F,\widehat{{\mathbb{X}}}}=\Vert{{\boldsymbol{\rho }}}_{{\rm{PCS}}}-{{\boldsymbol{\rho }}}^{\star }{\Vert}_{F}\\\qquad\qquad\qquad\quad=\mathop{\max }\limits_{{\boldsymbol{\rho }}\in \widehat{{\mathbb{X}}}}\langle {{\boldsymbol{\rho }}}_{{\rm{PCS}}}-{{\boldsymbol{\rho }}}^{\star },{\boldsymbol{\rho }}\rangle ,\end{array}$$
(20)

where \(\widehat{{\mathbb{X}}}=\{{\boldsymbol{\rho }}\in {{\mathbb{C}}}^{{2}^{n}\times {2}^{n}}:\mathrm{trace}\,({\boldsymbol{\rho }})=0,{\boldsymbol{\rho }}={{\boldsymbol{\rho }}}^{\dagger },\parallel\!\!{\boldsymbol{\rho }}{\parallel }_{F}\le 1\}\). By the definition of the restricted Frobenius norm in Eq. (20), we can further analyze

$$\begin{array}{l}\qquad\Vert{{\boldsymbol{\rho }}}_{{\rm{PCS}}}-{{\boldsymbol{\rho }}}^{\star }{\Vert}_{F}\\=\Vert {{\boldsymbol{\rho }}}_{{\rm{PCS}}}-{{\boldsymbol{\rho }}}^{\star }{\Vert}_{F,\widehat{{\mathbb{X}}}}\le \Vert {{\boldsymbol{\rho }}}_{{\rm{CS}}}-{{\boldsymbol{\rho }}}^{\star }{\Vert}_{F,\widehat{{\mathbb{X}}}}\\=\mathop{\max }\limits_{{\boldsymbol{\rho }}\in \widehat{{\mathbb{X}}}}\left\langle \frac{1}{M}\sum\limits_{m = 1}^{M}\left[({2}^{n}+1){{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }-{{\bf{I}}}_{{2}^{n}}\right]-{{\boldsymbol{\rho }}}^{\star },{\boldsymbol{\rho }}\right\rangle,\end{array}$$
(21)

where the inequality follows from the nonexpansiveness of the physical projection \({{\mathcal{P}}}_{{\mathbb{X}}}(\cdot )\), which holds under the assumption that this projection is optimal: since \({{\boldsymbol{\rho }}}^{\star }\in {\mathbb{X}}\) is left unchanged by the projection, \(\Vert {{\mathcal{P}}}_{{\mathbb{X}}}({{\boldsymbol{\rho }}}_{{\rm{CS}}})-{{\boldsymbol{\rho }}}^{\star }{\Vert }_{F}\le \Vert {{\boldsymbol{\rho }}}_{{\rm{CS}}}-{{\boldsymbol{\rho }}}^{\star }{\Vert }_{F}\). Next, we bound the maximum in the last line of Eq. (21) using a covering argument. Following the assumption, we first construct an ϵ-net \(\widetilde{{\mathbb{X}}}=\{{{\boldsymbol{\rho }}}^{(1)},\ldots ,{{\boldsymbol{\rho }}}^{({N}_{\epsilon }(\widetilde{{\mathbb{X}}}))}\}\subset \widehat{{\mathbb{X}}}\) of size \({N}_{\epsilon }(\widetilde{{\mathbb{X}}})\) such that

$$\begin{array}{r}\mathop{\sup}\limits_{{\boldsymbol{\rho }}\in \widehat{{\mathbb{X}}}}\mathop{\min }\limits_{p\le {N}_{\epsilon }(\widetilde{{\mathbb{X}}})}\Vert{\boldsymbol{\rho }}-{{\boldsymbol{\rho }}}^{(p)}{\Vert}_{F}\le \epsilon.\end{array}$$
(22)

In addition, we denote \({{\boldsymbol{B}}}_{m}=\frac{1}{M}(({2}^{n}+1){{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }-{{\bf{I}}}_{{2}^{n}}-{{\boldsymbol{\rho }}}^{\star })\) and derive

$$\begin{array}{l}\qquad\qquad\mathop{\max }\limits_{{\boldsymbol{\rho }}\in \widehat{{\mathbb{X}}}}\left\langle \sum\limits_{m = 1}^{M}{{\boldsymbol{B}}}_{m},{\boldsymbol{\rho }}\right\rangle \\\qquad\quad=\mathop{\max }\limits_{{\boldsymbol{\rho }}\in \widehat{{\mathbb{X}}}}\left\langle \sum\limits_{m = 1}^{M}{{\boldsymbol{B}}}_{m},{\boldsymbol{\rho }}-{{\boldsymbol{\rho }}}^{(p)}+{{\boldsymbol{\rho }}}^{(p)}\right\rangle \\\qquad\quad\le\mathop{\max }\limits_{{{\boldsymbol{\rho }}}^{(p)}\in \widetilde{{\mathbb{X}}}}\left\langle \sum\limits_{m = 1}^{M}{{\boldsymbol{B}}}_{m},{{\boldsymbol{\rho }}}^{(p)}\right\rangle +\epsilon \mathop{\max }\limits_{{\boldsymbol{\rho }}\in \widehat{{\mathbb{X}}}}\left\langle \sum\limits_{m = 1}^{M}{{\boldsymbol{B}}}_{m},{\boldsymbol{\rho }}\right\rangle.\end{array}$$

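Here \({{\boldsymbol{\rho }}}^{(p)}\) denotes the net point closest to ρ, and the inequality uses \(({\boldsymbol{\rho }}-{{\boldsymbol{\rho }}}^{(p)})/\epsilon \in \widehat{{\mathbb{X}}}\). Setting ϵ = 0.5 and moving the second term on the right-hand side to the left, we get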

$$\begin{array}{r}\mathop{\max }\limits_{{\boldsymbol{\rho }}\in \widehat{{\mathbb{X}}}}\left\langle \sum\limits_{m = 1}^{M}{{\boldsymbol{B}}}_{m},{\boldsymbol{\rho }}\right\rangle \le \mathop{\max }\limits_{{{\boldsymbol{\rho }}}^{(p)}\in \widetilde{{\mathbb{X}}}}2\left\langle \sum\limits_{m = 1}^{M}{{\boldsymbol{B}}}_{m},{{\boldsymbol{\rho }}}^{(p)}\right\rangle .\end{array}$$
(23)

Next, we establish a concentration inequality for the right-hand side of Eq. (23). First, we define

$$\begin{array}{r}\sum\limits_{m = 1}^{M}{s}_{m}=\sum\limits_{m = 1}^{M}\left\langle ({2}^{n}+1){{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }-{{\bf{I}}}_{{2}^{n}}-{{\boldsymbol{\rho }}}^{\star },{{\boldsymbol{\rho }}}^{(p)}\right\rangle ,\\ \end{array}$$
(24)

and due to \({\mathbb{E}}[({2}^{n}+1){{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }-{{\bf{I}}}_{{2}^{n}}-{{\boldsymbol{\rho }}}^{\star }]={\bf{0}}\), we have \({\mathbb{E}}[{s}_{m}]=0\). Moreover, we rewrite sm as

$$\begin{array}{l}{s}_{m}=\left\langle ({2}^{n}+1){{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }-{{\bf{I}}}_{{2}^{n}}-{{\boldsymbol{\rho }}}^{\star },{{\boldsymbol{\rho }}}^{(p)}\right\rangle \\\quad\,\,=({2}^{n}+1)\left\langle {{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }-\frac{{{\boldsymbol{\rho }}}^{\star }}{{2}^{n}+1},{{\boldsymbol{\rho }}}^{(p)}\right\rangle \\\quad\,\,=({2}^{n}+1)\left\langle {{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger },{{\boldsymbol{\rho }}}^{(p)}-\frac{\langle {{\boldsymbol{\rho }}}^{\star },{{\boldsymbol{\rho }}}^{(p)}\rangle }{{2}^{n}+1}{{\bf{I}}}_{{2}^{n}}\right\rangle \\\quad\,\,=({2}^{n}+1)\left\langle {{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger },{\boldsymbol{D}}\right\rangle ,\end{array}$$
(25)

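where the second line follows from \({\mathrm{trace}}\,({{\boldsymbol{\rho }}}^{(p)})=\langle {{\bf{I}}}_{{2}^{n}},{{\boldsymbol{\rho }}}^{(p)}\rangle =0\), the third from \(\langle {{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger },{{\bf{I}}}_{{2}^{n}}\rangle =1\), and the last line defines the Hermitian matrix \({\boldsymbol{D}}={{\boldsymbol{\rho }}}^{(p)}-\frac{\langle {{\boldsymbol{\rho }}}^{\star },{{\boldsymbol{\rho }}}^{(p)}\rangle }{{2}^{n}+1}{{\bf{I}}}_{{2}^{n}}\). We can further compute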

$$\begin{array}{lll}{\mathbb{E}}\left[| {s}_{m}{| }^{a}\right]&=&{\mathbb{E}}\left[{({2}^{n}+1)}^{a} \left| \left\langle {{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger },{\boldsymbol{D}}\right\rangle \right| ^{a}\right]\\ && \le {({2}^{n}+1)}^{a}{\mathbb{E}}\left[{(\mathrm{trace}\,({{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }| {\boldsymbol{D}}| ))}^{a}\right]\\ && =\frac{{({2}^{n}+1)}^{a}}{{C}_{{2}^{n}+a-1}^{a}}{\mathrm{trace}}\,(| {\boldsymbol{D}}{| }^{\otimes a}{P}_{{\rm{Sym}}})\\ && \le\frac{{({2}^{n}+1)}^{a}}{{C}_{{2}^{n}+a-1}^{a}}\Vert | {\boldsymbol{D}}| {\Vert}_{F}^{\otimes a}\Vert {P}_{{\rm{Sym}}}\Vert \\ && \le 6\times {2}^{a-2}a!,\end{array}$$
(26)

where \(| {\boldsymbol{D}}| =\sqrt{{{\boldsymbol{D}}}^{2}}={\boldsymbol{U}}\sqrt{{\boldsymbol{\Sigma }}}{{\boldsymbol{V}}}^{\dagger }\) denotes the absolute value of the matrix D, with compact SVD \({{\boldsymbol{D}}}^{2}={\boldsymbol{U}}{\boldsymbol{\Sigma }}{{\boldsymbol{V}}}^{\dagger }\), and \({\boldsymbol{A}}^{{\otimes}a} = \underbrace{{\boldsymbol{A}}\otimes \cdots \otimes {\boldsymbol{A}}}_{{a}}\) denotes the a-fold Kronecker power of a matrix A. The first inequality uses \(| \langle {{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger },{\boldsymbol{D}}\rangle | \le \mathrm{trace}\,({{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }| {\boldsymbol{D}}| )\), which holds for Hermitian D. Since the Haar measure on the unitary group is a unitary p-design for every p, as exemplified in [ref. 57, Example 51], we obtain the third line, where \({P}_{{\rm{Sym}}}\) denotes the orthogonal projector onto the symmetric subspace. The second inequality follows from [ref. 58, Lemma 7] and \(\Vert| {\boldsymbol{D}}| {\parallel }_{F}^{\otimes a}=\Vert | {\boldsymbol{D}}{| }^{\otimes a}{\Vert }_{F}\), owing to the positive semidefiniteness of \(| {\boldsymbol{D}}{| }^{\otimes a}\) and the orthogonality of the projection \({P}_{{\rm{Sym}}}\). In the last line, we use \(\Vert | {\boldsymbol{D}}| {\Vert }_{F}\le \Vert {{\boldsymbol{\rho }}}^{(p)}{\Vert }_{F}+\Vert \frac{\langle {{\boldsymbol{\rho }}}^{\star },{{\boldsymbol{\rho }}}^{(p)}\rangle }{{2}^{n}+1}{{\bf{I}}}_{{2}^{n}}{\Vert}_{F}\le 1+\frac{{2}^{n}}{{2}^{n}+1}\Vert {{\boldsymbol{\rho }}}^{(p)}{\Vert }_{F}\Vert {{\boldsymbol{\rho }}}^{\star }{\Vert }_{F}\le 2\), \(\Vert {P}_{{\rm{Sym}}}\Vert \le 1\), and \(\frac{{({2}^{n}+1)}^{a}}{{C}_{{2}^{n}+a-1}^{a}}\le \frac{3}{2}a!\).
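The algebraic steps in Eqs. (25) and (26) can be spot-checked numerically. The sketch below is our own illustration with random test inputs: it verifies (a) the rewriting \({s}_{m}=({2}^{n}+1)\langle {\boldsymbol{\phi }}{{\boldsymbol{\phi }}}^{\dagger },{\boldsymbol{D}}\rangle\) of Eq. (25) for a trace-zero, unit-norm \({{\boldsymbol{\rho }}}^{(p)}\); (b) the a = 2 case of the symmetric-subspace moment in the third line of Eq. (26), namely \({\mathbb{E}}[{({{\boldsymbol{\phi }}}^{\dagger }{\boldsymbol{A}}{\boldsymbol{\phi }})}^{2}]=({\mathrm{trace}}{({\boldsymbol{A}})}^{2}+{\mathrm{trace}}({{\boldsymbol{A}}}^{2}))/(N(N+1))\) for a uniformly (Haar) random unit vector in dimension N, estimated by Monte Carlo with a PSD stand-in for |D|; and (c) the bound \({({2}^{n}+1)}^{a}/{C}_{{2}^{n}+a-1}^{a}\le \frac{3}{2}a!\) in exact integer arithmetic.

```python
import math

import numpy as np

rng = np.random.default_rng(3)
n = 3
dim = 2 ** n


def frob_inner(a, b):
    """Frobenius inner product <A, B> = trace(A^dagger B)."""
    return np.vdot(a, b)


# (a) Eq. (25) with random inputs: trace-zero unit-norm Hermitian rho_p, density matrix rho_star,
#     and unit vector phi.
h = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
rho_p = (h + h.conj().T) / 2
rho_p -= np.trace(rho_p).real / dim * np.eye(dim)
rho_p /= np.linalg.norm(rho_p)
g = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
rho_star = g @ g.conj().T / np.trace(g @ g.conj().T).real
phi = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)
phi /= np.linalg.norm(phi)
proj = np.outer(phi, phi.conj())

s_direct = frob_inner((dim + 1) * proj - np.eye(dim) - rho_star, rho_p)
d_mat = rho_p - frob_inner(rho_star, rho_p) / (dim + 1) * np.eye(dim)
s_rewritten = (dim + 1) * frob_inner(proj, d_mat)

# (b) Second moment of phi^dagger A phi for uniformly random unit vectors phi.
b = rng.standard_normal((dim, dim))
a_psd = b @ b.T
z = rng.standard_normal((100_000, dim)) + 1j * rng.standard_normal((100_000, dim))
z /= np.linalg.norm(z, axis=1, keepdims=True)
quad = np.real(np.einsum("si,ij,sj->s", z.conj(), a_psd, z))
moment_mc = np.mean(quad ** 2)
moment_exact = (np.trace(a_psd) ** 2 + np.trace(a_psd @ a_psd)) / (dim * (dim + 1))


# (c) (2^n + 1)^a / C(2^n + a - 1, a) <= (3/2) a!, checked without floating point.
def ratio_bound_holds(n_qubits, a):
    big_n = 2 ** n_qubits
    return 2 * (big_n + 1) ** a <= 3 * math.factorial(a) * math.comb(big_n + a - 1, a)


print(np.isclose(s_direct, s_rewritten))  # True
print(moment_mc / moment_exact)           # close to 1
print(all(ratio_bound_holds(q, a) for q in range(1, 11) for a in range(1, 31)))  # True
```

Part (c) holds for every n ≥ 1 because the ratio equals \(a!\,\frac{{2}^{n}+1}{{2}^{n}}\prod_{k=2}^{a-1}\frac{{2}^{n}+1}{{2}^{n}+k}\le a!\,\frac{{2}^{n}+1}{{2}^{n}}\); the loop only spot-checks it.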

Based on Lemma 1 with \({\mathbb{E}}[{s}_{m}]=0\) and \({\mathbb{E}}[| {s}_{m}{| }^{a}]\le 6\times {2}^{a-2}a!\), for any \(t\in [0,1]\) we have

$$\begin{array}{l}{\mathbb{P}}\left(\frac{1}{M}\left\vert \sum\limits_{m = 1}^{M}\left\langle ({2}^{n}+1){{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }-{{\bf{I}}}_{{2}^{n}}-{{\boldsymbol{\rho }}}^{\star },{{\boldsymbol{\rho }}}^{(p)}\right\rangle \right\vert \ge t\right)\\ \le 2{e}^{-\frac{M{t}^{2}}{28}}.\end{array}$$
(27)
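As a reading aid, the constant 28 in Eq. (27) can be traced as follows, assuming that Lemma 1 is the moment form of Bernstein's inequality, i.e., that independent zero-mean variables \({X}_{m}\) with \({\mathbb{E}}| {X}_{m}{| }^{a}\le \frac{a!}{2}{\sigma }^{2}{b}^{a-2}\) for all a ≥ 2 satisfy \({\mathbb{P}}(| \mathop{\sum}\nolimits_{m=1}^{M}{X}_{m}| \ge Mt)\le 2{e}^{-\frac{M{t}^{2}}{2({\sigma }^{2}+bt)}}\):

$$\begin{array}{l}6\times {2}^{a-2}a!=\frac{a!}{2}\times 12\times {2}^{a-2}\quad \Rightarrow \quad {\sigma }^{2}=12,\ b=2,\\ 2({\sigma }^{2}+bt)=24+4t\le 28\quad \text{for}\ t\in [0,1].\end{array}$$

The exponent is therefore at most \(-M{t}^{2}/28\); replacing t by t/2 in Eq. (28) turns the constant into 4 × 28 = 112.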

Combining Eqs. (23) and (27) and taking a union bound over the ϵ-net \(\widetilde{{\mathbb{X}}}\) of \(\widehat{{\mathbb{X}}}\), we obtain

$$\begin{array}{l}\quad\,\,{\mathbb{P}}\left(\mathop{\max }\limits_{{\boldsymbol{\rho }}\in \widehat{{\mathbb{X}}}}\left\langle \frac{1}{M}\sum\limits_{m = 1}^{M}\left[({2}^{n}+1){{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }\,\,-\,\,{{\bf{I}}}_{{2}^{n}}\right]\,\,-\,\,{{\boldsymbol{\rho }}}^{\star },{\boldsymbol{\rho }}\right\rangle \ge t\right)\\\,\, \le {\mathbb{P}}\left(\,\mathop{\max }\limits_{{{\boldsymbol{\rho }}}^{(p)}\in \widetilde{{\mathbb{X}}}}\frac{1}{M}\sum\limits_{m = 1}^{M}\left\langle ({2}^{n}\,\,+\,\,1){{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }\,\,-\,\,{{\bf{I}}}_{{2}^{n}}\,\,-\,\,{{\boldsymbol{\rho }}}^{\star },{{\boldsymbol{\rho }}}^{(p)}\right\rangle \ge \frac{t}{2}\right)\\\,\, \le {\mathbb{P}}\left(\,\mathop{\max }\limits_{{{\boldsymbol{\rho }}}^{(p)}\in \widetilde{{\mathbb{X}}}}\frac{1}{M}\left\vert \sum\limits_{m = 1}^{M}\left\langle ({2}^{n}\,\,+\,\,1){{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }\,\,-\,\,{{\bf{I}}}_{{2}^{n}}\,\,-\,\,{{\boldsymbol{\rho }}}^{\star },{{\boldsymbol{\rho }}}^{(p)}\right\rangle \right\vert \ge \frac{t}{2}\right)\\\,\, \le 2{N}_{\epsilon }(\widetilde{{\mathbb{X}}}){e}^{-\frac{M{t}^{2}}{112}}\\\,\, \le {e}^{-\frac{M{t}^{2}}{112}+\log 2{N}_{\epsilon }(\widetilde{{\mathbb{X}}})}.\end{array}$$
(28)

We opt for \(t=O\left(\sqrt{\frac{\log {N}_{\epsilon }(\widetilde{{\mathbb{X}}})}{M}}\right)\), and subsequently, with probability \(1-{e}^{-\Omega (\log {N}_{\epsilon }(\widetilde{{\mathbb{X}}}))}\), we derive

$$\Vert{{\boldsymbol{\rho }}}_{{\rm{PCS}}}-{{\boldsymbol{\rho }}}^{\star}{\Vert}_{F}\le O\left(\sqrt{\frac{\log {N}_{\epsilon }(\widetilde{{\mathbb{X}}})}{M}}\right).$$
(29)

Proof of Theorem 3

Proof

We define a restricted Frobenius norm as follows:

$$\begin{array}{l}\Vert {{\boldsymbol{\rho }}}_{1}-{{\boldsymbol{\rho }}}_{2}{\Vert}_{F,2r}=\Vert {{\boldsymbol{\rho }}}_{1}-{{\boldsymbol{\rho }}}_{2}{\Vert}_{F}\\\qquad\qquad\qquad\,=\mathop{\max }\limits_{{\boldsymbol{\rho }}\in {\widehat{{\mathbb{X}}}}_{2r}}\langle {{\boldsymbol{\rho }}}_{1}-{{\boldsymbol{\rho }}}_{2},{\boldsymbol{\rho }}\rangle ,\end{array}$$
(30)

where the set \({\widehat{{\mathbb{X}}}}_{r}\) is defined as follows:

$$\begin{array}{l}\qquad\quad{\widehat{{\mathbb{X}}}}_{r}=\left\{{\boldsymbol{\rho }}\in {{\mathbb{C}}}^{{2}^{n}\times {2}^{n}}:{\boldsymbol{\rho }}={{\boldsymbol{\rho }}}^{\dagger },\right.\\\qquad\quad\quad\quad\,\,\,\left.\,{\text{rank}}\,({\boldsymbol{\rho }})=r,{\mathrm{trace}}\,({\boldsymbol{\rho }})=0,\Vert {\boldsymbol{\rho }}{\Vert}_{F}\le 1\right\}.\end{array}$$
(31)

By the definition of the restricted Frobenius norm in Eq. (30), we can further analyze

$$\begin{array}{l}\qquad\Vert {{\boldsymbol{\rho }}}_{{\rm{LR}}\text{-}{\rm{PCS}}}-{{\boldsymbol{\rho }}}^{\star }{\Vert}_{F}\\\quad=\Vert {{\boldsymbol{\rho }}}_{{\rm{LR}}{\mbox{-}}{\rm{PCS}}}-{{\boldsymbol{\rho }}}^{\star }{\Vert}_{F,2r}\\\quad \le \Vert{{\mathcal{P}}}_{{\rm{ED}}}({{\boldsymbol{\rho }}}_{{\rm{CS}}})-{{\boldsymbol{\rho }}}^{\star }{\Vert}_{F,2r}\\\, \quad\le 2\parallel {{\boldsymbol{\rho }}}_{{\rm{CS}}}-{{\boldsymbol{\rho }}}^{\star }{\parallel }_{F,2r}\\\quad=2\mathop{\max }\limits_{{\boldsymbol{\rho }}\in {\widehat{{\mathbb{X}}}}_{2r}}\left\langle \frac{1}{M}\sum\limits_{m = 1}^{M}[({2}^{n}+1){{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }-{{\bf{I}}}_{{2}^{n}}]\,\,-\,\,{{\boldsymbol{\rho }}}^{\star },{\boldsymbol{\rho }}\right\rangle ,\\ \end{array}$$
(32)

where the first two inequalities follow, respectively, from the nonexpansiveness of the projection and the quasi-optimality of the eigenvalue decomposition (ED) projection42. Next, we bound the maximum in the last line of Eq. (32) using a covering argument. According to [ref. 59, Lemma 3.1], we first construct an ϵ-net \({\widetilde{{\mathbb{X}}}}_{2r}=\{{{\boldsymbol{\rho }}}^{(1)},\ldots ,{{\boldsymbol{\rho }}}^{({N}_{\epsilon }({\widetilde{{\mathbb{X}}}}_{2r}))}\}\subset {\widehat{{\mathbb{X}}}}_{2r}\) whose size satisfies \({N}_{\epsilon }({\widetilde{{\mathbb{X}}}}_{2r})\le {(\frac{9}{\epsilon })}^{({2}^{n+2}+2)r}\) and such that

$$\mathop{\sup }\limits_{{\boldsymbol{\rho }}\in {\widehat{{\mathbb{X}}}}_{2r}}\mathop{\min }\limits_{p\le {N}_{\epsilon }({\widetilde{{\mathbb{X}}}}_{2r})}\Vert {\boldsymbol{\rho }}-{{\boldsymbol{\rho }}}^{(p)}{\Vert }_{F}\le \epsilon.$$
(33)

Combining Eqs. (23) and (27) and taking a union bound over the ϵ-net \({\widetilde{{\mathbb{X}}}}_{2r}\) of \({\widehat{{\mathbb{X}}}}_{2r}\), we obtain

$$\begin{array}{l}\quad\,\,{\mathbb{P}}\left(\mathop{\max }\limits_{{\boldsymbol{\rho }}\in {\widehat{{\mathbb{X}}}}_{2r}}\langle \frac{1}{M}\sum\limits_{m = 1}^{M}\left[({2}^{n}\,\,+\,\,1){{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }\,\,-\,\,{{\bf{I}}}_{{2}^{n}}\right]\,\,-\,\,{{\boldsymbol{\rho }}}^{\star },{\boldsymbol{\rho }}\rangle \ge t\right)\\ \le {\mathbb{P}}\left(\,\mathop{\max }\limits_{{{\boldsymbol{\rho }}}^{(p)}\in {\widetilde{{\mathbb{X}}}}_{2r}}\frac{1}{M}\sum\limits_{m = 1}^{M}\langle ({2}^{n}\,\,+\,\,1){{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }\,\,-\,\,{{\bf{I}}}_{{2}^{n}}\,\,-\,\,{{\boldsymbol{\rho }}}^{\star },{{\boldsymbol{\rho }}}^{(p)}\rangle \ge \frac{t}{2}\right)\\ \le {\mathbb{P}}\left(\,\mathop{\max }\limits_{{{\boldsymbol{\rho }}}^{(p)}\in {\widetilde{{\mathbb{X}}}}_{2r}}\frac{1}{M}\left\vert \sum\limits_{m = 1}^{M}\langle ({2}^{n}\,\,+\,\,1){{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }\,\,-\,\,{{\bf{I}}}_{{2}^{n}}\,\,-\,\,{{\boldsymbol{\rho }}}^{\star },{{\boldsymbol{\rho }}}^{(p)}\rangle \right\vert \ge \frac{t}{2}\right)\\ \le 2{\left(\frac{9}{\epsilon }\right)}^{({2}^{n+2}+2)r}{e}^{-\frac{M{t}^{2}}{112}}\\\le {e}^{-\frac{M{t}^{2}}{112}+C{2}^{n}r},\end{array}$$
(34)

where we set \(\epsilon =\frac{1}{2}\) and C is a positive constant. We opt for \(t=O\left(\sqrt{\frac{{2}^{n}r}{M}}\right)\) and subsequently, with probability \(1-{e}^{-\Omega ({2}^{n}r)}\), derive

$$\Vert{{\boldsymbol{\rho }}}_{{\rm{LR}}{\mbox{-}}{\rm{PCS}}}-{{\boldsymbol{\rho }}}^{\star }{\Vert}_{F}\le O\left(\sqrt{\frac{{2}^{n}r}{M}}\right).$$
(35)
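For concreteness, one admissible (not necessarily tight) choice of the constant C and of t in Eqs. (34) and (35) is worked out below. With ϵ = 1/2, and using \({2}^{n+2}+2\le 6\cdot {2}^{n}\) and \(\log 2\le {2}^{n}r\),

$$\begin{array}{l}\log \left[2{(18)}^{({2}^{n+2}+2)r}\right]\le (6\log 18+1)\,{2}^{n}r=:C{2}^{n}r,\\ t=\sqrt{\frac{224\,C{2}^{n}r}{M}}\quad \Rightarrow \quad -\frac{M{t}^{2}}{112}+C{2}^{n}r=-C{2}^{n}r.\end{array}$$

Hence, with probability at least \(1-{e}^{-C{2}^{n}r}\), Eq. (32) gives \(\Vert {{\boldsymbol{\rho }}}_{{\rm{LR}}{\mbox{-}}{\rm{PCS}}}-{{\boldsymbol{\rho }}}^{\star }{\Vert }_{F}\le 2t=\sqrt{896\,C{2}^{n}r/M}\). Because Eq. (27) is stated for deviations in [0, 1] and is applied here with t/2, this particular choice requires \(M\ge 56\,C{2}^{n}r\).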

Proof of Theorem 4

Proof

We define a restricted Frobenius norm as follows:

$$\begin{array}{l}\Vert {{\boldsymbol{\rho }}}_{1}-{{\boldsymbol{\rho }}}_{2}{\Vert}_{F,2D}=\Vert {{\boldsymbol{\rho }}}_{1}-{{\boldsymbol{\rho }}}_{2}{\Vert}_{F}\\\qquad\qquad\qquad\,\,\,=\mathop{\max }\limits_{{\boldsymbol{\rho }}\in {\widehat{{\mathbb{X}}}}_{2D}}\langle {{\boldsymbol{\rho }}}_{1}-{{\boldsymbol{\rho }}}_{2},{\boldsymbol{\rho }}\rangle ,\end{array}$$
(36)

where we denote by \({\widehat{{\mathbb{X}}}}_{D}\) the normalized set of MPOs with bond dimension D:

$$\begin{array}{l}{\widehat{{\mathbb{X}}}}_{D}=\left\{{\boldsymbol{\rho }}\in {{\mathbb{C}}}^{{2}^{n}\times {2}^{n}}:\,{\boldsymbol{\rho }}={{\boldsymbol{\rho }}}^{\dagger },\Vert {\boldsymbol{\rho }}{\Vert }_{F}\le 1,{\mathrm{trace}}\,({\boldsymbol{\rho }})=0,\right.\\\qquad\left.\,{\rm{bond}}\, {\rm{dimension}}\,({\boldsymbol{\rho }})=D\right\}.\end{array}$$
(37)

The orthonormality constraints that appear in the equivalent parametrization of Eq. (39) below entail no loss of generality because, according to ref. 60, any TT form is equivalent to a left-orthogonal TT form42.

We define \({{\mathcal{P}}}_{\mathrm{trace}\,}(\cdot )\) as the projection onto the convex (affine) set \(\{{\boldsymbol{\rho }}\in {{\mathbb{C}}}^{{2}^{n}\times {2}^{n}}:\mathrm{trace}\,({\boldsymbol{\rho }})=1\}\). By the definition of the restricted Frobenius norm in Eq. (36), we can derive

$$\begin{array}{l}\quad\Vert {{\boldsymbol{\rho }}}_{{\rm{MPO}}{\mbox{-}}{\rm{PCS}}}-{{\boldsymbol{\rho }}}^{\star }{\Vert }_{F}\\ \le\,\parallel {{\mathcal{P}}}_{\mathrm{trace}\,}({\,\text{SVD}\,}_{D}^{tt}({{\boldsymbol{\rho }}}_{{\rm{CS}}}))-{{\boldsymbol{\rho }}}^{\star }{\parallel }_{F}\\ =\Vert{{\mathcal{P}}}_{\mathrm{trace}\,}({\,{\rm{SVD}}\,}_{D}^{tt}({{\boldsymbol{\rho }}}_{{\rm{CS}}}))-{{\boldsymbol{\rho }}}^{\star }{\Vert}_{F,2D}\\\,\le \Vert{\,{\rm{SVD}}\,}_{D}^{tt}({{\boldsymbol{\rho }}}_{{\rm{CS}}})-{{\boldsymbol{\rho }}}^{\star }{\Vert}_{F,2D}\\\,\le\,(1+\sqrt{n-1})\Vert {{\boldsymbol{\rho }}}_{{\rm{CS}}}-{{\boldsymbol{\rho }}}^{\star }{\Vert }_{F,2D}\\ =(1+\sqrt{n-1})\mathop{\max }\limits_{{\boldsymbol{\rho }}\in {\widehat{{\mathbb{X}}}}_{2D}}\left\langle \frac{1}{M}\sum\limits_{m = 1}^{M}\left(({2}^{n}+1)\right.{{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }\right.\\\quad \left.\left.-\,{{\bf{I}}}_{{2}^{n}}\right)-{{\boldsymbol{\rho }}}^{\star },{\boldsymbol{\rho }}\right\rangle \\ =(1+\sqrt{n-1})\mathop{\max }\limits_{{\boldsymbol{\rho }}\in {\widehat{{\mathbb{X}}}}_{2D}}\left\langle \frac{1}{M}\sum\limits_{m = 1}^{M}\left(({2}^{n}+1)\right.{{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }\right.\\\quad \left.\left.-\,{{\bf{I}}}_{{2}^{n}}\right)-{{\boldsymbol{\rho }}}^{\star },{\boldsymbol{\rho }}\right\rangle \end{array}$$
(38)

where the first two inequalities follow from the nonexpansiveness of projections onto convex sets, and the third inequality follows from the quasi-optimality of the TT-SVD projection42. Additionally, expressing each element in left-orthogonal TT form, we denote

$$\begin{array}{ll}{\widehat{{\mathbb{X}}}}_{D}=\left\{{\boldsymbol{\rho }}\in {{\mathbb{C}}}^{{2}^{n}\times {2}^{n}}:\,{\boldsymbol{\rho }}={{\boldsymbol{\rho }}}^{\dagger },\mathrm{trace}\,({\boldsymbol{\rho }})=0,\right.\\\qquad\quad{\boldsymbol{\rho }}({i}_{1}\cdots {i}_{n},{j}_{1}\cdots {j}_{n})={{\boldsymbol{X}}}_{1}^{{i}_{1},{j}_{1}}{{\boldsymbol{X}}}_{2}^{{i}_{2},{j}_{2}}\cdots {{\boldsymbol{X}}}_{n}^{{i}_{n},{j}_{n}},\\\qquad\quad{{\boldsymbol{X}}}_{1}^{{i}_{1},{j}_{1}}\in {{\mathbb{C}}}^{1\times D},{{\boldsymbol{X}}}_{n}^{{i}_{n},{j}_{n}}\in {{\mathbb{C}}}^{D\times 1},{{\boldsymbol{X}}}_{\ell }^{{i}_{\ell },{j}_{\ell }}\in {{\mathbb{C}}}^{D\times D},\\ \qquad\quad\left.\parallel L({{\boldsymbol{X}}}_{\ell })\parallel \le 1,\ell \in [n-1],\parallel L({{\boldsymbol{X}}}_{n}){\parallel }_{F}\le 1\right\}\,.\end{array}$$
(39)

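The last line of (38) writes the same maximization using the parameterization (39). For a left-orthogonal TT form, \(\Vert {\boldsymbol{\rho }}{\Vert }_{F}=\Vert L({{\boldsymbol{X}}}_{n}){\Vert }_{F}\le 1\) [ref. 61, Eq. (44)], so the Frobenius-norm constraint in (37) becomes the constraint on \(L({{\boldsymbol{X}}}_{n})\) in (39).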
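The following is a minimal NumPy sketch of the TT-SVD projection \(\text{SVD}_D^{tt}\) and of \({\mathcal{P}}_{\rm trace}\). It uses our own conventions and is not the authors' code. Mode \(\ell\) of the reordered tensor carries the index pair \((i_\ell,j_\ell)\), cores have shape \((r_{\ell-1},4,r_\ell)\), and all helper names are hypothetical. The sketch checks three facts used above:

- The TT-SVD error is at most \(\sqrt{n-1}\) times the distance to any bond-dimension-\(D\) MPO (quasi-optimality), here tested against a reference MPO `B`.
- \({\mathcal{P}}_{\rm trace}\) does not increase the distance to a unit-trace matrix.
- Because TT-SVD returns left-orthogonal cores, \(\Vert{\boldsymbol{\rho}}\Vert_F\) equals the Frobenius norm of the last core.

```python
import numpy as np

def to_tt_tensor(rho, n):
    """Reorder a 2^n x 2^n matrix so that mode l carries the index pair (i_l, j_l)."""
    t = rho.reshape([2] * (2 * n))                       # axes i_1..i_n, j_1..j_n
    perm = [ax for l in range(n) for ax in (l, n + l)]   # i_1, j_1, i_2, j_2, ...
    return t.transpose(perm).reshape([4] * n)

def from_tt_tensor(t, n):
    """Inverse of to_tt_tensor."""
    t = t.reshape([2] * (2 * n))                         # axes i_1, j_1, ..., i_n, j_n
    perm = list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2))
    return t.transpose(perm).reshape(2 ** n, 2 ** n)

def tt_svd(rho, n, D):
    """TT-SVD: left-to-right truncated SVDs keeping at most D singular values per cut.
    Returns cores of shape (r_{l-1}, 4, r_l); all but the last are left-orthogonal."""
    cores, r_prev = [], 1
    rest = to_tt_tensor(rho, n).reshape(4, -1)
    for _ in range(n - 1):
        u, s, vh = np.linalg.svd(rest, full_matrices=False)
        r = min(D, s.size)
        cores.append(u[:, :r].reshape(r_prev, 4, r))
        rest = (s[:r, None] * vh[:r]).reshape(r * 4, -1)  # carry S V^H to the next cut
        r_prev = r
    cores.append(rest.reshape(r_prev, 4, 1))
    return cores

def tt_to_matrix(cores, n):
    """Contract MPO cores back into a 2^n x 2^n matrix."""
    t = cores[0]
    for g in cores[1:]:
        t = np.tensordot(t, g, axes=([-1], [0]))
    return from_tt_tensor(t.reshape([4] * n), n)

def project_trace(A):
    """Frobenius projection onto the affine set {trace = 1}."""
    d = A.shape[0]
    return A + (1 - np.trace(A)) / d * np.eye(d)

rng = np.random.default_rng(0)
n, D = 4, 2
shapes = [(1 if l == 0 else D, 4, 1 if l == n - 1 else D) for l in range(n)]
B = tt_to_matrix([rng.standard_normal(s) + 1j * rng.standard_normal(s) for s in shapes], n)
A = B + 1e-2 * (rng.standard_normal(B.shape) + 1j * rng.standard_normal(B.shape))

cores = tt_svd(A, n, D)
A_D = tt_to_matrix(cores, n)
err_svd = np.linalg.norm(A - A_D)   # TT-SVD truncation error
err_ref = np.linalg.norm(A - B)     # error of one particular bond-dimension-D MPO
rho_ref = np.eye(2 ** n) / 2 ** n   # any unit-trace matrix
```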

Next, we apply a covering argument to bound the maximum in (38). By Lemma 3, there exists an ϵ-net \({\widetilde{{\mathbb{X}}}}_{2D}\subset {\widehat{{\mathbb{X}}}}_{2D}\) of \({\widehat{{\mathbb{X}}}}_{2D}\). For each fixed \(\widetilde{{\boldsymbol{\rho }}}\in {\widetilde{{\mathbb{X}}}}_{2D}\), we use Eq. (23) and the concentration inequality in Eq. (27), and then take a union bound over the net, which gives

$$\begin{array}{ll}\quad\,\,{\mathbb{P}}\left(\mathop{\max }\limits_{{\boldsymbol{\rho }}\in {\widehat{{\mathbb{X}}}}_{2D}}\langle \frac{1}{M}\sum\limits_{m = 1}^{M}(({2}^{n}+1){{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }\,\,-\,\,{{\bf{I}}}_{{2}^{n}})\,-\,\,{{\boldsymbol{\rho }}}^{\star },{\boldsymbol{\rho }}\rangle \ge t\right)\\\le\,{\mathbb{P}}\left(\mathop{\max }\limits_{\widetilde{{\boldsymbol{\rho }}}\in {\widetilde{{\mathbb{X}}}}_{2D}}\frac{1}{M}\sum\limits_{m = 1}^{M}\langle ({2}^{n}+1){{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }\,\,-\,\,{{\bf{I}}}_{{2}^{n}}\,\,-\,\,{{\boldsymbol{\rho }}}^{\star },\widetilde{{\boldsymbol{\rho }}}\rangle \ge \frac{t}{2}\right)\\\le\,{\mathbb{P}}\left(\mathop{\max }\limits_{\widetilde{{\boldsymbol{\rho }}}\in {\widetilde{{\mathbb{X}}}}_{2D}}\frac{1}{M}\left\vert \sum\limits_{m = 1}^{M}\langle ({2}^{n}+1){{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }\,\,-\,\,{{\bf{I}}}_{{2}^{n}}\,\,-\,\,{{\boldsymbol{\rho }}}^{\star },\widetilde{{\boldsymbol{\rho }}}\rangle \right\vert \ge \frac{t}{2}\right)\\\le\,2{\left(\frac{4n+\epsilon }{\epsilon }\right)}^{4n{(2D)}^{2}}{e}^{-\frac{M{t}^{2}}{112}}\\ \le{e}^{-\frac{M{t}^{2}}{112}+Cn{D}^{2}\log n},\end{array}$$
(40)

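where we set \(\epsilon =\frac{1}{2}\) and C is a positive constant. Choosing \(t=O\left(\sqrt{\frac{n{D}^{2}\log n}{M}}\right)\) with a sufficiently large constant makes the exponent in (40) equal to \(-\Omega (n{D}^{2}\log n)\). Hence, with probability at least \(1-{e}^{-\Omega (n{D}^{2}\log n)}\), the maximum in (38) is smaller than t. Multiplying by the factor \(1+\sqrt{n-1}=O(\sqrt{n})\) in (38) then gives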

$$\Vert{{\boldsymbol{\rho }}}_{{\rm{MPO}}{\mbox{-}}{\rm{PCS}}}-{{\boldsymbol{\rho }}}^{\star }{\Vert}_{F}\le O\left(\sqrt{\frac{{n}^{2}{D}^{2}\log n}{M}}\right).$$
(41)
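The constant bookkeeping behind (40) and (41) can be spelled out as follows. This is our own expansion of the step. It assumes n ≥ 2 and applies Lemma 3 at bond dimension 2D with \(\epsilon =\frac{1}{2}\):

$$\begin{array}{l}2{\left(\frac{4n+\epsilon }{\epsilon }\right)}^{4n{(2D)}^{2}}\Big|_{\epsilon =\frac{1}{2}}=2{(8n+1)}^{16n{D}^{2}}={e}^{\log 2+16n{D}^{2}\log (8n+1)}\le {e}^{Cn{D}^{2}\log n},\\ {t}^{2}=\frac{224\,Cn{D}^{2}\log n}{M}\;\Longrightarrow\;-\frac{M{t}^{2}}{112}+Cn{D}^{2}\log n=-Cn{D}^{2}\log n,\\ (1+\sqrt{n-1})\,t\le 2\sqrt{n}\,t=2\sqrt{224\,C}\,\sqrt{\frac{{n}^{2}{D}^{2}\log n}{M}}.\end{array}$$

For n ≥ 2, the first inequality holds with, for example, C = 68. This is because \(\log (8n+1)\le \log 9+\log n\le 4.17\log n\) and \(\log 2\le n{D}^{2}\log n\). The success probability is then at least \(1-{e}^{-Cn{D}^{2}\log n}\).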

Maximum likelihood estimation for low-rank states and MPO states

Maximum likelihood estimation (MLE) is a widely used technique for quantum state reconstruction. Under single-shot measurements, the MLE loss function can be formulated as follows47,62,63,64,65,66:

$$\min\limits_{{{\boldsymbol{\rho}}\succeq {{{\bf{0}}}},}\atop{{\rm{trace}}({\boldsymbol{\rho}}) = 1 }}f({\boldsymbol{\rho}}) = -\frac{1}{M}\sum\limits_{m=1}^M \log(\langle {\boldsymbol{\phi}}_{m,j_m}{\boldsymbol{\phi}}_{m,j_m}^\dagger, {\boldsymbol{\rho}}\rangle).$$
(42)

However, the objective function in (42) does not leverage the structural properties inherent in quantum states. To address this limitation, we propose two MLE methods tailored for (1) low-rank states and (2) MPO states.
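To make the loss in (42) concrete, the following NumPy sketch simulates single-shot data and evaluates \(f({\boldsymbol{\rho}})\). For illustration only, it assumes that each shot measures in an independently drawn Haar-random orthonormal basis and records the observed basis vector \({\boldsymbol{\phi}}_{m,j_m}\). The basis is generated by QR decomposition of a complex Gaussian matrix with a phase correction. The function names are hypothetical. For the maximally mixed state the loss equals \(\log 2^n\) exactly, and the true state should attain a smaller value once M is moderately large.

```python
import numpy as np

def haar_unitary(d, rng):
    """Haar-random d x d unitary: QR of a complex Gaussian matrix with diag(R) phases removed."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    diag = np.diag(r)
    return q * (diag / np.abs(diag))

def simulate_measurements(rho, M, rng):
    """Return the M observed basis vectors phi_{m, j_m}; each shot uses a fresh Haar-random basis."""
    d = rho.shape[0]
    phis = np.empty((M, d), dtype=complex)
    for m in range(M):
        U = haar_unitary(d, rng)                                      # columns phi_{m, j}
        probs = np.real(np.einsum('ij,ik,kj->j', U.conj(), rho, U))   # <phi_j, rho phi_j>
        probs = np.clip(probs, 0.0, None)
        j = rng.choice(d, p=probs / probs.sum())                      # Born-rule outcome j_m
        phis[m] = U[:, j]
    return phis

def mle_loss(rho, phis):
    """f(rho) = -(1/M) sum_m log <phi_m phi_m^dagger, rho> = -(1/M) sum_m log(phi_m^dagger rho phi_m)."""
    vals = np.real(np.einsum('mi,ij,mj->m', phis.conj(), rho, phis))
    return -np.mean(np.log(vals))

rng = np.random.default_rng(4)
n = 2
d = 2 ** n
psi = rng.standard_normal(d) + 1j * rng.standard_normal(d)
psi /= np.linalg.norm(psi)
rho_true = np.outer(psi, psi.conj())
phis = simulate_measurements(rho_true, M=2000, rng=rng)
loss_true = mle_loss(rho_true, phis)
loss_mixed = mle_loss(np.eye(d) / d, phis)   # equals log(d), since each phi is a unit vector
```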

Low-rank MLE

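When the density matrix is low-rank, we adopt a Riemannian gradient descent (RGD) algorithm on the unit Frobenius-norm sphere. Specifically, a quantum state \({{\boldsymbol{\rho }}}^{\star }\in {{\mathbb{C}}}^{{2}^{n}\times {2}^{n}}\) of rank at most r, satisfying \(\mathrm{trace}({{\boldsymbol{\rho }}}^{\star })=1\) and \({{\boldsymbol{\rho }}}^{\star }\succeq {\bf{0}}\), can be factorized as \({{\boldsymbol{\rho }}}^{\star }={{\boldsymbol{F}}}^{\star }{{{\boldsymbol{F}}}^{\star }}^{\dagger }\) with \({{\boldsymbol{F}}}^{\star }\in {{\mathbb{C}}}^{{2}^{n}\times r}\). Since \(\mathrm{trace}({{\boldsymbol{F}}}^{\star }{{{\boldsymbol{F}}}^{\star }}^{\dagger })=\Vert {{\boldsymbol{F}}}^{\star }{\Vert }_{F}^{2}\), the unit-trace constraint becomes \(\Vert {{\boldsymbol{F}}}^{\star }{\Vert }_{F}=1\). This leads to the reformulated MLE objective: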

$$\min\limits_{{\boldsymbol{F}}\in{\mathbb{C}}^{2^n\times r}, \atop \|{\boldsymbol{F}}\|_F=1}f_1({\boldsymbol{F}}) = -\frac{1}{M}\sum\limits_{m=1}^M \log(\langle {\boldsymbol{\phi}}_{m,j_m}{\boldsymbol{\phi}}_{m,j_m}^\dagger, {\boldsymbol{F}}{\boldsymbol{F}}^\dagger\rangle).$$

The corresponding Riemannian gradient descent update reads:

$${\widehat{{\boldsymbol{F}}}}_{t}={{\boldsymbol{F}}}_{t-1}-\mu {{\mathcal{P}}}_{{T}_{{{\boldsymbol{F}}}_{t-1}}\text{Sp}}({\nabla }_{{\boldsymbol{F}}}{f}_{1}({{\boldsymbol{F}}}_{t-1}))\,\,\,{\rm{and}}\,\,\,{{\boldsymbol{F}}}_{t}=\frac{{\widehat{{\boldsymbol{F}}}}_{t}}{\Vert {\widehat{{\boldsymbol{F}}}}_{t}{\Vert}_{F}},$$

where the Euclidean gradient is \({\nabla }_{{\boldsymbol{F}}}{f}_{1}({\boldsymbol{F}})=-\frac{1}{M}\mathop{\sum}\nolimits_{m = 1}^{M}\frac{{{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }}{\langle {{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger },{\boldsymbol{F}}{{\boldsymbol{F}}}^{\dagger }\rangle }{\boldsymbol{F}}\). The operator \({{\mathcal{P}}}_{{T}_{{\boldsymbol{F}}}\text{Sp}}({\boldsymbol{V}})={\boldsymbol{V}}-\langle {\boldsymbol{F}},{\boldsymbol{V}}\rangle {\boldsymbol{F}}\) denotes the projection onto the tangent space \({T}_{{\boldsymbol{F}}}\text{Sp}\) at F of the unit sphere \(\text{Sp}=\{{\boldsymbol{F}}:\Vert {\boldsymbol{F}}{\Vert }_{F}=1\}\). Here, μ is the step size.
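Below is a minimal NumPy sketch of this RGD iteration. It assumes the same kind of simulated data as before (one Haar-random basis per shot, an illustrative assumption) and a random initialization on the sphere. The initialization, step size, and iteration count are our arbitrary choices, not values from the paper, and the helper names are hypothetical. The loop follows the update above literally: it computes the Euclidean gradient, applies the tangent-space projection \({\boldsymbol{V}}-\langle{\boldsymbol{F}},{\boldsymbol{V}}\rangle{\boldsymbol{F}}\), and then renormalizes.

```python
import numpy as np

def haar_unitary(d, rng):
    """Haar-random unitary: QR of a complex Gaussian matrix with diag(R) phases removed."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    diag = np.diag(r)
    return q * (diag / np.abs(diag))

def simulate_measurements(rho, M, rng):
    """Observed vectors phi_{m, j_m}, one fresh Haar-random basis per shot (assumption)."""
    d = rho.shape[0]
    phis = np.empty((M, d), dtype=complex)
    for m in range(M):
        U = haar_unitary(d, rng)
        p = np.clip(np.real(np.einsum('ij,ik,kj->j', U.conj(), rho, U)), 0.0, None)
        phis[m] = U[:, rng.choice(d, p=p / p.sum())]
    return phis

def loss_F(F, phis):
    """f_1(F) = -(1/M) sum_m log ||phi_m^dagger F||^2."""
    c = np.sum(np.abs(phis.conj() @ F) ** 2, axis=1)   # <phi phi^dagger, F F^dagger>
    return -np.mean(np.log(c))

def rgd(phis, r, mu, iters, rng):
    """Riemannian gradient descent on the unit Frobenius sphere; returns (initial F, final F)."""
    d = phis.shape[1]
    F = rng.standard_normal((d, r)) + 1j * rng.standard_normal((d, r))
    F /= np.linalg.norm(F)                       # random start on the sphere (our choice)
    F0 = F.copy()
    for _ in range(iters):
        proj = phis.conj() @ F                   # row m: phi_m^dagger F
        c = np.sum(np.abs(proj) ** 2, axis=1)    # <phi_m phi_m^dagger, F F^dagger>
        grad = -(phis.T / c) @ proj / len(c)     # Euclidean gradient of f_1
        rgrad = grad - np.vdot(F, grad) * F      # V - <F, V> F (<F, V> is real here up to round-off)
        F_hat = F - mu * rgrad
        F = F_hat / np.linalg.norm(F_hat)        # renormalize back onto the sphere
    return F0, F

rng = np.random.default_rng(5)
d, r = 4, 1
psi = rng.standard_normal(d) + 1j * rng.standard_normal(d)
psi /= np.linalg.norm(psi)
phis = simulate_measurements(np.outer(psi, psi.conj()), M=2000, rng=rng)
F0, F = rgd(phis, r=r, mu=0.02, iters=400, rng=rng)
rho_hat = F @ F.conj().T
```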

MPO-based MLE

When the density matrix admits an MPO representation with bond dimension D, we consider the constrained optimization problem:

$$\mathop{\min }\limits_{{\boldsymbol{\rho }}\in {{\mathbb{X}}}_{D}}{f}_{2}({\boldsymbol{\rho }})=-\frac{1}{M}\sum\limits_{m = 1}^{M}\log (\langle {{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger },{\boldsymbol{\rho }}\rangle ).$$

We solve this problem using a projected gradient descent (PGD) scheme:

$${{\boldsymbol{\rho }}}_{t}={{\mathcal{P}}}_{{\rm{Simplex}}}({\text{SVD}\,}_{D}^{tt}({{\boldsymbol{\rho }}}_{t-1}-\mu {\nabla }_{{\boldsymbol{\rho }}}{f}_{2}({{\boldsymbol{\rho }}}_{t-1}))),$$

where \({\nabla }_{{\boldsymbol{\rho }}}{f}_{2}({\boldsymbol{\rho }})=-\frac{1}{M}\sum\limits_{m = 1}^{M}\frac{{{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger }}{\langle {{\boldsymbol{\phi }}}_{m,{j}_{m}}{{\boldsymbol{\phi }}}_{m,{j}_{m}}^{\dagger },{\boldsymbol{\rho }}\rangle }\), and μ is the step size.
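The PGD scheme can be sketched as below. Two ingredients are our assumptions rather than statements from the text:

- We read \({\mathcal{P}}_{\rm Simplex}\) as the Frobenius projection onto density matrices \(\{{\boldsymbol{\rho}}\succeq{\bf 0},\ \mathrm{trace}({\boldsymbol{\rho}})=1\}\). This is computed by projecting the eigenvalues of the Hermitian part onto the probability simplex with the standard sort-based rule.
- We implement \(\text{SVD}_D^{tt}\) as a TT-SVD whose \(\ell\)-th mode carries the index pair \((i_\ell,j_\ell)\).

Data are simulated with one Haar-random basis per shot. The initialization (the maximally mixed state), step size, iteration count, and all helper names are illustrative.

```python
import numpy as np

def haar_unitary(d, rng):
    """Haar-random unitary: QR of a complex Gaussian matrix with diag(R) phases removed."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    diag = np.diag(r)
    return q * (diag / np.abs(diag))

def simulate_measurements(rho, M, rng):
    """Observed vectors phi_{m, j_m}, one fresh Haar-random basis per shot (assumption)."""
    d = rho.shape[0]
    phis = np.empty((M, d), dtype=complex)
    for m in range(M):
        U = haar_unitary(d, rng)
        p = np.clip(np.real(np.einsum('ij,ik,kj->j', U.conj(), rho, U)), 0.0, None)
        phis[m] = U[:, rng.choice(d, p=p / p.sum())]
    return phis

def tt_truncate(A, n, D):
    """SVD_D^tt (our implementation): TT-SVD with modes (i_l, j_l) and bond dimension <= D."""
    t = A.reshape([2] * (2 * n)).transpose([ax for l in range(n) for ax in (l, n + l)])
    cores, r_prev, rest = [], 1, t.reshape(4, -1)
    for _ in range(n - 1):
        u, s, vh = np.linalg.svd(rest, full_matrices=False)
        r = min(D, s.size)
        cores.append(u[:, :r].reshape(r_prev, 4, r))
        rest = (s[:r, None] * vh[:r]).reshape(r * 4, -1)
        r_prev = r
    cores.append(rest.reshape(r_prev, 4, 1))
    full = cores[0]
    for g in cores[1:]:
        full = np.tensordot(full, g, axes=([-1], [0]))
    full = full.reshape([2] * (2 * n))               # axes i_1, j_1, ..., i_n, j_n
    back = list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2))
    return full.transpose(back).reshape(2 ** n, 2 ** n)

def project_simplex(v):
    """Euclidean projection of a real vector onto the probability simplex (sort-based rule)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    last = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[last] / (last + 1), 0.0)

def project_density(A):
    """Assumed meaning of P_Simplex: project the Hermitian part's eigenvalues onto the simplex."""
    H = (A + A.conj().T) / 2
    w, V = np.linalg.eigh(H)
    return (V * project_simplex(w)) @ V.conj().T

def grad_f2(rho, phis):
    """-(1/M) sum_m phi_m phi_m^dagger / <phi_m phi_m^dagger, rho>."""
    c = np.real(np.einsum('mi,ij,mj->m', phis.conj(), rho, phis))
    return -(phis.T / c) @ phis.conj() / len(c)

def mpo_pgd(phis, n, D, mu, iters):
    rho = np.eye(2 ** n, dtype=complex) / 2 ** n       # maximally mixed start (bond dimension 1)
    for _ in range(iters):
        rho = project_density(tt_truncate(rho - mu * grad_f2(rho, phis), n, D))
    return rho

rng = np.random.default_rng(6)
n, D = 3, 2
psi = np.ones(1, dtype=complex)
for _ in range(n):                                     # random product state: MPO bond dimension 1
    v = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    psi = np.kron(psi, v / np.linalg.norm(v))
phis = simulate_measurements(np.outer(psi, psi.conj()), M=1000, rng=rng)
rho_hat = mpo_pgd(phis, n, D, mu=0.05, iters=50)
```

The final eigenvalue projection can raise the bond dimension above D. The sketch simply follows the order of operations in the update above.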

Materials

Lemma 1

(Classical Bernstein’s inequality23, Theorem 6) Let \({s}_{1},\ldots ,{s}_{n}\in {\mathbb{R}}\) denote i.i.d. copies of a mean-zero random variable s that obeys \({\mathbb{E}}[| s{| }^{p}]\le p!{R}^{p-2}{\sigma }^{2}/2\) for all integers p ≥ 2, where R, σ² > 0 are constants. Then, for all t > 0,

$${\mathbb{P}}\left(\left\vert \sum\limits_{i = 1}^{n}{s}_{i}\right\vert \ge t\right)\le 2{e}^{-\frac{{t}^{2}/2}{n{\sigma }^{2}+Rt}}.$$
(43)
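As a quick Monte Carlo sanity check of (43), we use Rademacher variables (s = ±1 with equal probability) as an illustrative example. They are mean-zero and satisfy the moment condition with R = σ² = 1, since \({\mathbb{E}}|s|^p=1\le p!/2\) for p ≥ 2. The empirical tail probabilities should stay below the bound. Sample sizes and thresholds are arbitrary choices.

```python
import numpy as np

def bernstein_tail(n, t, sigma2, R):
    """Right-hand side of (43): 2 exp(-(t^2 / 2) / (n sigma^2 + R t))."""
    return 2.0 * np.exp(-(t ** 2 / 2) / (n * sigma2 + R * t))

rng = np.random.default_rng(1)
n, trials = 200, 20000
# The sum of n Rademacher variables has the same law as 2 * Binomial(n, 1/2) - n.
sums = np.abs(2 * rng.binomial(n, 0.5, size=trials) - n)

checks = []
for t in (20.0, 40.0, 60.0):
    empirical = np.mean(sums >= t)                          # estimate of P(|sum s_i| >= t)
    checks.append((t, empirical, bernstein_tail(n, t, 1.0, 1.0)))
```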

Lemma 2

(ref. 29, Lemma 10) For any \({{\boldsymbol{A}}}_{i},{{\boldsymbol{A}}}_{i}^{\star }\in {{\mathbb{R}}}^{{r}_{i-1}\times {r}_{i}},i\in \{1,\ldots ,N\}\), we have

$$\begin{array}{rcl}&&{{\boldsymbol{A}}}_{1}{{\boldsymbol{A}}}_{2}\cdots {{\boldsymbol{A}}}_{N}-{{\boldsymbol{A}}}_{1}^{\star }{{\boldsymbol{A}}}_{2}^{\star }\cdots {{\boldsymbol{A}}}_{N}^{\star }\\ &=&\sum\limits_{i = 1}^{N}{{\boldsymbol{A}}}_{1}^{\star }\cdots {{\boldsymbol{A}}}_{i-1}^{\star }({{\boldsymbol{A}}}_{i}-{{\boldsymbol{A}}}_{i}^{\star }){{\boldsymbol{A}}}_{i+1}\cdots {{\boldsymbol{A}}}_{N}.\end{array}$$
(44)
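The telescoping identity in Lemma 2 is easy to verify numerically. The sketch below compares both sides for random real matrices with arbitrary shapes, taking \(r_0=r_N=1\) as for MPO boundary cores.

```python
import numpy as np
from functools import reduce

def chain(mats):
    """Ordered matrix product A_1 A_2 ... A_k."""
    return reduce(np.matmul, mats)

rng = np.random.default_rng(2)
N = 5
r = [1, 3, 4, 2, 3, 1]                                   # r_0, ..., r_N
A = [rng.standard_normal((r[i], r[i + 1])) for i in range(N)]
A_star = [rng.standard_normal((r[i], r[i + 1])) for i in range(N)]

lhs = chain(A) - chain(A_star)
# term i: A*_1 ... A*_{i-1} (A_i - A*_i) A_{i+1} ... A_N   (0-indexed here)
rhs = sum(chain(A_star[:i] + [A[i] - A_star[i]] + A[i + 1:]) for i in range(N))
```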

Lemma 3

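There exists an ϵ-net \({\widetilde{{\mathbb{X}}}}_{D}\) for \({\widehat{{\mathbb{X}}}}_{D}\) in Eq. (39) under the Frobenius norm. That is, for every \({\boldsymbol{\rho }}\in {\widehat{{\mathbb{X}}}}_{D}\) there exists \({{\boldsymbol{\rho }}}^{(p)}\in {\widetilde{{\mathbb{X}}}}_{D}\) with \(\Vert {\boldsymbol{\rho }}-{{\boldsymbol{\rho }}}^{(p)}{\Vert }_{F}\le \epsilon\), and the net obeys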

$${N}_{\epsilon }({\widetilde{{\mathbb{X}}}}_{D})\le {\left(\frac{4n+\epsilon }{\epsilon }\right)}^{4n{D}^{2}},$$
(45)

where \({N}_{\epsilon }({\widetilde{{\mathbb{X}}}}_{D})\) denotes the number of elements in the set \({\widetilde{{\mathbb{X}}}}_{D}\).

Proof

For each set of matrices \(\{L({{\boldsymbol{X}}}_{\ell })\in {{\mathbb{R}}}^{4D\times D}:\parallel L({{\boldsymbol{X}}}_{\ell })\parallel \le 1\}\), according to ref. 67, we can construct a ξ-net \(\{L({{\boldsymbol{X}}}_{\ell }^{(1)}),\ldots ,L({{\boldsymbol{X}}}_{\ell }^{({N}_{\ell })})\}\) with the covering number \({N}_{\ell }\le {(\frac{4+\xi }{\xi })}^{4{D}^{2}}\) such that

$$\mathop{\sup }\limits_{L({{\boldsymbol{X}}}_{\ell }):\parallel L({{\boldsymbol{X}}}_{\ell })\parallel \le 1}\,\mathop{\min }\limits_{{p}_{\ell }\le {N}_{\ell }}\parallel L({{\boldsymbol{X}}}_{\ell })-L({{\boldsymbol{X}}}_{\ell }^{({p}_{\ell })})\parallel \le \xi ,$$
(46)

for all \(\ell \in \{1,\ldots ,n-1\}\). Also, we can construct a ξ-net \(\{L({{\boldsymbol{X}}}_{n}^{(1)}),\ldots ,L({{\boldsymbol{X}}}_{n}^{({N}_{n})})\}\) for \(\{L({{\boldsymbol{X}}}_{n})\in {{\mathbb{R}}}^{4D\times 1}:\parallel L({{\boldsymbol{X}}}_{n}){\parallel }_{F}\le 1\}\) such that

$$\mathop{\sup }\limits_{L({{\boldsymbol{X}}}_{n}):\parallel L({{\boldsymbol{X}}}_{n}){\parallel }_{F}\le 1}\mathop{\min }\limits_{{p}_{n}\le {N}_{n}}\parallel L({{\boldsymbol{X}}}_{n})-L({{\boldsymbol{X}}}_{n}^{({p}_{n})}){\parallel }_{F}\le \xi ,$$
(47)

with the covering number \({N}_{n}\le {(\frac{2+\xi }{\xi })}^{4D}\).

Therefore, we can construct a ξ-net \(\{[{{\boldsymbol{X}}}_{1}^{(1)},\ldots ,{{\boldsymbol{X}}}_{n}^{(1)}],\ldots ,[{{\boldsymbol{X}}}_{1}^{({N}_{1})},\ldots ,{{\boldsymbol{X}}}_{n}^{({N}_{n})}]\}\) with covering number

$${{{\Pi }}}_{\ell = 1}^{n}{N}_{\ell }\le {\left(\frac{4+\xi }{\xi }\right)}^{4n{D}^{2}}$$
(48)

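for any MPO \({\boldsymbol{\rho }}=[{{\boldsymbol{X}}}_{1},\ldots ,{{\boldsymbol{X}}}_{n}]\) with bond dimension D. Then we expand \(\Vert {\boldsymbol{\rho }}-{{\boldsymbol{\rho }}}^{(p)}{\Vert }_{F}\) as follows: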

$$\begin{array}{ll}\quad\,\,\Vert {\boldsymbol{\rho }}-{{\boldsymbol{\rho }}}^{(p)}{\Vert}_{F}\\ =\Vert [{{\boldsymbol{X}}}_{1},\ldots ,{{\boldsymbol{X}}}_{n}]-[{{\boldsymbol{X}}}_{1}^{({p}_{1})},\ldots ,{{\boldsymbol{X}}}_{n}^{({p}_{n})}]{\parallel }_{F}\\ =\Vert\mathop{\sum }\limits_{{a}_{l}=1}^{n}[{{\boldsymbol{X}}}_{1}^{({p}_{1})},\ldots ,{{\boldsymbol{X}}}_{{a}_{l}-1}^{({p}_{{a}_{l}-1})},{{\boldsymbol{X}}}_{{a}_{l}}^{({p}_{{a}_{l}})}\,\,-\,\,{{\boldsymbol{X}}}_{{a}_{l}},{{\boldsymbol{X}}}_{{a}_{l}+1},\ldots ,{{\boldsymbol{X}}}_{n}]{\Vert}_{F}\\ \le\,\mathop{\sum }\limits_{{a}_{l}=1}^{n}\parallel [{{\boldsymbol{X}}}_{1}^{({p}_{1})},\ldots ,{{\boldsymbol{X}}}_{{a}_{l}-1}^{({p}_{{a}_{l}-1})},{{\boldsymbol{X}}}_{{a}_{l}}^{({p}_{{a}_{l}})}\,\,-\,\,{{\boldsymbol{X}}}_{{a}_{l}},{{\boldsymbol{X}}}_{{a}_{l}+1},\ldots ,{{\boldsymbol{X}}}_{n}]{\parallel }_{F}\\ \le\mathop{\sum }\limits_{{a}_{l}=1}^{n-1}\parallel L({{\boldsymbol{X}}}_{{a}_{l}}^{({p}_{{a}_{l}})})\,-\,L({{\boldsymbol{X}}}_{{a}_{l}})\parallel +\parallel\!\! L({{\boldsymbol{X}}}_{n}^{({p}_{n})})\,-\,L({{\boldsymbol{X}}}_{n}){\parallel }_{F}\\\le\,n\xi =\epsilon ,\end{array}$$

where the second equality follows from Lemma 2, the first inequality follows from the triangle inequality, and the second inequality follows from [ref. 29, Eq. (47)]. In the last line we choose \(\xi =\frac{\epsilon }{n}\). Substituting this choice into (48), we can construct an ϵ-net \(\{{{\boldsymbol{\rho }}}^{(1)},\ldots ,{{\boldsymbol{\rho }}}^{({N}_{1}\cdots {N}_{n})}\}\) with covering number

$${N}_{\epsilon }({\widetilde{{\mathbb{X}}}}_{D})\le {\left(\frac{4n+\epsilon }{\epsilon }\right)}^{4n{D}^{2}}$$
(49)

for any MPO \({\boldsymbol{\rho }}\in {\widehat{{\mathbb{X}}}}_{D}.\)
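For completeness, the substitution \(\xi =\frac{\epsilon }{n}\) behind (48) and (49) works out as follows. This is our own expansion of the step and uses only the covering numbers stated above:

$$\begin{array}{l}\frac{4+\xi }{\xi }\Big|_{\xi =\frac{\epsilon }{n}}=\frac{4+\epsilon /n}{\epsilon /n}=\frac{4n+\epsilon }{\epsilon },\qquad {N}_{n}\le {\left(\frac{2+\xi }{\xi }\right)}^{4D}\le {\left(\frac{4+\xi }{\xi }\right)}^{4{D}^{2}},\\ \prod\limits_{\ell =1}^{n}{N}_{\ell }\le {\left(\frac{4+\xi }{\xi }\right)}^{4(n-1){D}^{2}}{\left(\frac{4+\xi }{\xi }\right)}^{4{D}^{2}}={\left(\frac{4n+\epsilon }{\epsilon }\right)}^{4n{D}^{2}}.\end{array}$$

The bound on \({N}_{n}\) uses \(\frac{2+\xi }{\xi }\le \frac{4+\xi }{\xi }\), the fact that \(\frac{4+\xi }{\xi }>1\), and \(4D\le 4{D}^{2}\) for D ≥ 1.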