Fig. 2: Benchmarking and scaling of model accuracy.

a–d, Measured ground truth (blue circle) and model prediction (red cross) of \(S_{31}\) for ten unseen random metasurface configurations for the linear model (a), the deep-learning (DL) model (b), and the physical model (c), all calibrated only with \(S_{31}\), as well as for the physical model calibrated with the \(2\times 2\) transmission matrix (d). The achieved accuracy \(\zeta_{31}\) and the number of model parameters \(N_{\mathrm{params}}\) are indicated. e, Scaling of the accuracy \(\zeta_{\mathrm{SISO}}\) of a single predicted transmission coefficient as a function of the size \(N_{\mathrm{data}}\) of the calibration data set. We consider models calibrated to predict only a single transmission coefficient (\(h\), purple), a \(2\times 2\) transmission matrix (\(\mathbf{H}\), green), and a \(4\times 4\) scattering matrix (\(\mathbf{S}\), blue). We compare these cases for the physical model and the two benchmark models (DL and linear). For the linear model, the accuracy does not depend on the number of predicted scattering coefficients. For models predicting \(\mathbf{H}\) or \(\mathbf{S}\), \(\zeta_{\mathrm{SISO}}\) is averaged over all available transmission coefficients.