Introduction

Hybrid quantum-classical algorithms1,2 like the quantum approximate optimization algorithm (QAOA) and the variational quantum eigensolver (VQE) show promise for implementation on early quantum devices that cannot yet run quantum error correction. There have been several experimental demonstrations of these algorithms on quantum hardware such as trapped ions3,4, neutral atoms5, and superconducting qubits6,7,8. Although these algorithms are suitable for a wide range of optimization problems, from electronic structure calculations9 to general combinatorial optimization problems10, the demonstration of a practical quantum advantage over purely classical techniques remains an outstanding challenge.

Quantum metrology, which leverages inherently quantum effects such as entanglement to surpass classical limits on resolution and sensitivity11, is a promising area for near-term applications of noisy quantum hardware. Numerous theoretical studies have shown that variational methods can be applied to identify optimal non-classical probe states and measurements in the presence of unknown noise processes and imperfections12,13,14,15,16. Recent experimental demonstrations with photonic platforms have also highlighted the feasibility of these techniques for multi-parameter sensing in the few-photon limit17 and for supervised learning assisted by continuous-variable entangled networks for multi-dimensional data classification18.

We extend previous investigations to the practically relevant regime of optical phase estimation using a squeezed coherent state and homodyne measurements19. This combination has been successfully deployed to enhance gravitational-wave detection20,21, magnetic field sensing22, and biological imaging23 beyond classical limits. We demonstrate that a variational algorithm can effectively steer the experiment towards the optimal probe state and measurement basis in the presence of various imperfections like phase fluctuations and loss. Our approach involves optimizing the classical Fisher information, a key determinant of metrological precision. We develop and implement parameter shift rules24 to calculate and differentiate this quantity in continuous variable systems. This allows us to implement two approaches: one that estimates the gradient of the classical Fisher information through additional measurements for gradient descent optimization, and another that employs a gradient-free Bayesian optimizer to further fine-tune the optimization process.

Our results confirm the potential of variational techniques for quantum sensing tasks and highlight the remaining challenges for their practical application. We find that the additional time overhead incurred by calculating the gradient is balanced by the gradient-based optimizer’s better handling of slow drifts in the sensing apparatus. Conversely, the gradient-free optimization method requires more fine-tuning of the exploration/exploitation trade-off, but enables faster convergence. Further investigation of variational methods in more complex setups with intricate parametrizations, and a detailed study of the potentials and limitations of different optimization techniques, will be crucial for the widespread adoption of variational metrology schemes.

Our task is to optimize a metrological protocol, which includes preparing a probe state and a suitable measurement, for estimation of a small phase shift imprinted on a mode of a continuous-variable quantum system. We choose to focus on a local estimation task considering small shifts around a fixed phase as this is of fundamental interest in applied quantum sensing. Without loss of generality, this fixed phase can be assumed to be zero since the phase of the probe can always be adjusted relative to the fixed phase. We have chosen the continuous-variable platform, as these types of systems are widely used and have proven their usefulness in practically relevant scenarios20,21,23.

The conventional approach to developing metrological protocols typically starts with a theoretical description, followed by protocol development and implementation. This strategy has multiple downsides: A completely faithful model of a system is often hard to devise and when trying to push the system to its limits, every inaccuracy in the theoretical model can negatively impact performance. Furthermore, even with a faithful model, the system often depends on parameters that are not stable over time, causing parameter drift. Therefore, methods that adapt to the actual physical conditions during execution, without requiring a complete theoretical model, are highly desirable.

Variational quantum algorithms for quantum metrology12,13,14,15,16 have been proposed for this purpose. In these approaches, a cost function is defined to represent the performance of a metrological protocol and is then optimized. This process does not require a full theoretical model, allowing the protocol to implicitly account for unmodeled effects.

Our experiment broadly consists of three stages. First, a displaced squeezed state is prepared as a probe state, where the displacement phase angle relative to the squeezed quadrature, ϕα, is a free parameter. We fix the squeezing level r and displacement amplitude α to work with probes of a fixed photon number. Next, the state undergoes a phase shift that encodes the parameter to be sensed. Finally, we perform homodyne detection, where the homodyne detection angle ϕHD, relative to the squeezing angle, is another free parameter. Building on the approach of ref. 16, we use the inverse of the classical Fisher information25 as a cost function \(C({\phi }_{\alpha },{\phi }_{HD})=1/{\mathcal{F}}({\phi }_{\alpha },{\phi }_{HD})\): the inverse of the Fisher information represents the lowest achievable variance with many repetitions of the experiment, serving as a good proxy for metrological precision. The methods (sections IIIB and IIIC) detail the computation of this metric in continuous-variable systems using Gaussian parameter-shift rules. As outlined in the introduction, we combine this setup with two different optimizers: a gradient-based one for ab initio optimization and a Bayesian one for fine-tuning. The optimization happens in real time: the experiment runs continuously while measurements are taken, the cost function is estimated, and the parameters are optimized and adjusted.

Results

Experimental principle

The principle behind the experiment is illustrated in Fig. 1. We prepare a displaced squeezed state by pumping a hemilithic optical parametric oscillator (OPO) (previously described in ref. 26) at 775 nm. This process generates squeezed vacuum at 1550 nm. The squeezed vacuum is then combined with a coherent state on a 99/1 beamsplitter, and a phase modulator in the coherent beam displaces the squeezed light at the 5 MHz sideband. After interacting with the lab environment, the squeezed light is measured by a homodyne detector. The relative phases between the squeezed light, the local oscillator and the displacement beam are locked using the coherent locking technique27. This technique employs a 40 MHz phase-locked sideband mode transmitted alongside the squeezed light, which serves as a phase reference and allows full access to the phase space for both the squeezed light and the displacement.

Fig. 1: Variational experiment principle.
figure 1

A probe state is prepared as a squeezed, displaced state. The free parameters of our system are the measurement basis angle, ϕHD, and the displacement angle, ϕα, both relative to the squeezing angle ϕr. After interacting with the environment, the state is detected in a homodyne detector. The cost function is estimated by varying the measurement basis of the detector. Subsequently, the experiment either estimates the gradient of the cost function to determine the next set of initial parameters or employs a Gaussian Process (GP) in a Bayesian gradient-free optimization algorithm for parameter selection.

A detailed description of the experimental setup can be found in the methods section (Fig. 4). For the measurements in this paper, the OPO’s pump power is 2.7 mW, resulting in around 5 dB of measured squeezing and 12 dB of anti-squeezing. The displacement added to the squeezed light is approximately α = 5.2, a regime of interest since the contributions of the squeezed and the coherent photons to the classical Fisher information are comparable.

The homodyne detector’s output is sampled using a data-acquisition card, after which it is downmixed to the 5 MHz sideband and subsequently lowpass filtered with a bandwidth of 1 MHz. The processed data are then used to characterize the measurement statistics. The cost function is estimated by varying the phases of the local oscillator according to the parameter-shift rules (see “methods section IIIB”). Depending on the algorithm employed, the local oscillator and displacement phases are either further shifted during measurements to generate gradients for the gradient descent algorithm, or the cost function is fed directly to the Bayesian optimizer for the gradient-free algorithm. Based on either the gradient of the cost function or the decision function of the Bayesian optimizer, a new set of phase parameters is found. The process is repeated until convergence.
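Schematically, the gradient-based variant of this feedback cycle can be summarized as follows. This is a minimal sketch only: `measure_cost` and `estimate_gradient` are hypothetical callbacks standing in for the homodyne measurements and parameter-shift measurements described in the methods, and the learning rate is an assumed value.

```python
import numpy as np

def feedback_loop(measure_cost, estimate_gradient, epochs=24, lr=0.1):
    """Skeleton of the real-time optimization cycle described above."""
    phi = np.array([0.0, 0.0])            # [phi_alpha, phi_HD], initial guess
    history = []
    for _ in range(epochs):
        history.append(measure_cost(phi))  # five homodyne settings (methods IIIB)
        grad = estimate_gradient(phi)      # extra shifted settings per parameter
        phi = phi - lr * grad              # gradient-descent update
    return phi, history
```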

In the data discussed below, each experimental sequence consists of repeated measurements of the relevant quantities tracked in real time. This allows us to directly observe the fluctuations of the experimental parameters such as the cost function and their stabilization to an optimal setting as a result of the classical feedback cycle.

Gradient descent optimization

In Fig. 2a, we present a run of the gradient descent-based optimization over 24 epochs. There is a clear trend of the cost function starting at a high value and then converging to a low one. The black dotted line of the middle plot of Fig. 2a represents the theoretical shot noise limit, which is based on the average number of photons in the probe state and the number of samples used to estimate each value of the cost function. Notably, the optimized cost function falls below this limit after convergence, indicating that the algorithm successfully identifies an optimum below the classical limit. Examining the resulting quadrature values, we observe the variance dropping below shot noise and approaching a value determined by the basis angle that maximizes the quadrature variance contribution to our analytical model of the classical Fisher information. The theoretical model, detailed in the methods sections IIIB and IIIC, describes a displaced squeezed probe state undergoing optical loss and can be evaluated analytically (see eq. 8).

Fig. 2: Results of the gradient descent experiment.
figure 2

a Demonstration of the gradient descent algorithm across 24 optimization epochs. (Top) The optical phase angles ϕHD and ϕα estimated from the measurements. (Middle) The measured cost function \(C=1/{\mathcal{F}}\). The dotted line represents the shot noise limit accounting for the number of photons in the measurement and the number of samples used to estimate the cost function. (Bottom) The measured quadrature mean values and variances. The dotted lines in the top and bottom plots indicate the optimal values as predicted by theory (See “section III”). Note that in this particular measurement, the displacement appears to be slightly larger than α = 5.2, likely due to improved spatial overlap between the displacement beam and local oscillator, as the added modulation was consistent across all measurements. b Kick-test of the gradient descent optimization over 45 epochs. The arrows labeled “kick” indicate the points immediately following the application of a kick.

We observe unexpected local minima during the optimizer runs, evident from the mean value of the measurement quadrature. Ideally, we would expect this mean value to stabilize around 0, but it occasionally stabilizes around an intermediate value (in this case ~4). This is not expected from the analytical model of our experiment (eq. 8). The value of the cost function is roughly the same for this intermediate value and the one predicted by the analytical model indicating that both settings are valid minima of the cost function.

We attempt to simulate the gradient descent experiment by numerically introducing Gaussian-distributed phase noise, approximating the experimental conditions as closely as possible. While we observe a similar trend, the simulation does not find the exact same minimum, illustrating the difficulty of modelling experimental noise environments. The exact cause of this phenomenon is a subject for further investigation; it does, however, showcase an advantage of variational approaches, as they can naturally uncover minima not predicted by either the analytical or the numerical model of our experiment.

In Fig. 2b, we display the results of a different run of the experiment. To test the algorithm’s robustness against disturbances, we allow the system to optimize for 14 epochs. Then, on the 15th epoch, we apply a kick to both control parameters, effectively displacing the system from its optimum. Subsequently, the system is permitted to optimize for another 14 epochs before being subjected to another kick.

Once again, we observe that the algorithm successfully optimizes the system after each kick, consistently reaching below the shot noise limit. While the quadrature variance converges towards the expected value, the mean value tends to stabilize around an intermediate point. Interestingly, in this particular run, during the final optimization step, the mean value does settle around the expected optimum. However, we also note that the difference in cost function between the various minima is very small and essentially indistinguishable in our measurements. This observation suggests that the variance impacts the cost function significantly more than the mean value does. This hypothesis is further supported by a simulation presented in Fig. 10 of the methods section: a 2D simulation of the cost-function landscape with the kick measurements superimposed, which reveals that the minima are quite broad and shallow as a function of the displacement angle, especially once the measurement angle has been optimized.

Post-hoc Bayesian optimization for fine-tuning

In our search for an even better solution than those obtained from the gradient descent approach, we employ a post-hoc gradient-free optimization using data from low-cost areas. We use Bayesian optimization (BO), a probabilistic algorithm often used in machine learning for tuning model hyperparameters28,29,30. This is suitable for the local estimation task considered here. In the case of ab initio phase estimation, adaptive Bayesian techniques31 could potentially be employed to pinpoint approximate values around which local estimation could be performed. Bayesian optimization is particularly suited to our experiment as it utilizes a probabilistic approximation of the underlying model. This approach is beneficial for incorporating uncertainty related to our necessarily inexact modelling of the system and to the random processes occurring in the experiment. Furthermore, the gradient-based algorithm requires five measurements to estimate the cost function and an additional eight measurements per control parameter to estimate the gradient, whereas the gradient-free algorithm only needs the five measurements for the cost function estimation. This results in a significant reduction in experimental resources per optimization step.

A Bayesian optimization routine comprises two parts. First, a surrogate function models the underlying cost function landscape C(ϕHD, ϕα) probabilistically, indicated by the predictive mean μ(C(ϕHD, ϕα)) ≈ C(ϕHD, ϕα) and an uncertainty captured by the covariance matrix Σ(C(ϕHD, ϕα)). Second, an acquisition function determines which data points to query during iterative optimization – essentially, it quantitatively identifies a “good” next point for the experiment.

We selected a Gaussian Process (GP)32 for the surrogate function. For every point (ϕHD, ϕα), the predictive distribution of the surrogate model follows a normal distribution with mean μ(C(ϕHD, ϕα)) and covariance Σ(C(ϕHD, ϕα)). The model’s covariance is a key hyperparameter that balances the trade-off between exploration (favoring larger steps in parameter space) and exploitation (focusing on the vicinity of the current optimum). For the acquisition function, we opted for Expected Improvement (EI), a commonly used technique in Bayesian optimization33. Intuitively, EI selects the next point that, according to the current model, is expected to achieve the lowest value of the cost function.

Initially, the Gaussian Process was fitted with a data set of 136 data points obtained from the gradient descent experiment along with randomly sampled points in our search space. This approach is hence referred to as post-hoc Bayesian optimization for fine-tuning. The next parameter set for the experiment is determined by maximizing the acquisition function, which relies solely on the surrogate model of the Bayesian Optimizer.

After selecting this new set of parameters, the physical experiment is conducted, and the actual cost value is obtained. This new data point is then incorporated into the dataset, and the updated dataset is used to fit a new Gaussian Process for the subsequent step of the algorithm. This process is repeated iteratively for 50 epochs, and we report the lowest obtained cost function value. See the methods section for technical details and Sec. 2.6–2.8 in ref. 34 for a full derivation.

From Fig. 3, we observe once again that the algorithm reaches an optimum of the system close to the expected theoretical minimum. In part a of Fig. 3, we set the hyperparameters of the Gaussian Process more loosely, resulting in the algorithm exploring the parameter landscape around the minimum. This exploration comes at the cost of the optimization process’s stability, causing the cost function to fluctuate to relatively high values. Optimizing the balance between exploration and exploitation is achievable by tuning the hyper-parameters of the model. The results of this tuning are evident in part b of Fig. 3, where the model initially focuses on exploring new areas of the cost function landscape C(ϕHD, ϕα) for the first 30 epochs. It then shifts to a more stable exploitation of the achieved minimum in the subsequent 20 epochs. Although not depicted in this trace, the gradient-free algorithm also exhibited susceptibility to the same local minima as those identified using the gradient descent-based algorithm. This adds further evidence that our theoretical model does not fully capture the entire noise landscape of the experiment.

Fig. 3: Results of the gradient free experiment.
figure 3

Demonstration of Bayesian optimization over 50 epochs, with (a) loose and (b) optimized hyper-parameters of the Gaussian Process (see “section IIIL”). (Top) The optical phase angles ϕHD and ϕα estimated from the measurements. (Middle) The measured cost function \(C=1/{\mathcal{F}}\). The dotted line represents the shot noise limit, accounting for the number of photons in the measurement and the number of samples used to estimate the cost function. (Bottom) The measured quadrature mean values and variances. The dotted lines in the top and bottom plots indicate the optimal values as predicted by theory (“section IIIB”).

The gradient-free Bayesian optimization method employed here offers two advantages over the gradient descent optimization. First, it reduces the risk of getting trapped in a local minimum, a common problem with gradient descent. Second, as it does not require the evaluation of the gradient, it uses fewer experimental resources. However, employing Bayesian optimization is not without its drawbacks. As discussed in previous sections, the gradient-based method will be able to follow slow drifts of the system as these drifts alter the gradients. In contrast, the Bayesian Optimizer is likely to be slower in optimizing a non-stationary system due to the need for an exploration phase whenever the experimental setup undergoes changes. Additionally, the performance of the gradient-free algorithm heavily depends on the careful optimization of its hyperparameters.

Discussion

In summary, we have investigated the feasibility of hybrid quantum-classical optimization algorithms for optical phase estimation with squeezed coherent light and homodyne detection. Specifically, we have investigated the performance of both a gradient-based optimization and post-hoc gradient-free Bayesian optimization.

Our results confirm that both algorithms can successfully adjust the control parameters of the system to achieve optimal estimation performance. This includes preparing the optimal probe state and setting the measurement parameters. Notably, the algorithms achieve this without prior knowledge about the noise processes in the hardware, as evidenced by the discovery of optima that were not anticipated by our theoretical model. Additionally, we have shown how the gradient-based algorithm can automatically adjust the system to the optimal setting when the phase to be estimated changes.

These findings underscore the potential of variational quantum algorithm-based quantum metrology, particularly for optical phase estimation with squeezed coherent light and homodyne detection. This motivates further investigation of such techniques in more complex quantum sensing systems with additional control parameters, such as the degree of squeezing and the level of coherent excitation. In addition, the application of these variational algorithms in multi-parameter sensing systems35, which are relevant for quantum imaging36, entangled sensor networks37,38, and networked atomic clocks39, is also a promising avenue. In particular, recent theoretical12 and experimental work18 on entanglement-assisted supervised learning has shown that multi-mode entangled networks can outperform non-entangled methods for certain data classification tasks. These works considered an ad-hoc cost function and gradient-free optimization methods. In comparison, we have considered a more general cost function based on the classical Fisher information, which provides a saturable lower bound on the performance of any unbiased estimator. While we have focused on phase estimation in this work, the application of similar techniques, including parameter shift rules for gradient estimation, to data-classification problems as studied in refs. 12,18 could further elucidate the promise of quantum enhanced sensing.

Our study also reveals that the choice of classical optimizers may vary depending on the specific estimation task. We observe that the gradient-based optimizer is effective for ab initio phase estimation but incurs a higher measurement overhead compared to the gradient-free Bayesian optimization. Conversely, our implementation of the Bayesian optimization was adept at fine-tuning control parameters, though it required a training set from the gradient descent optimization for optimal performance. We also noted that careful optimization of the hyper-parameters of the algorithms such as the learning rate and ratio of exploration/exploitation is crucial for good performance. Further investigations in this area, including the potential benefit of switching between different classical optimizers for practical phase estimation tasks, will be vital to further validate the practical applicability of variational techniques in quantum-enhanced metrology.

Methods

Experimental system

The experimental system is shown in Fig. 4. A hemilithic, doubly resonant optical parametric oscillator (OPO) is pumped with a light field at 775 nm, and the squeezed light is generated at 1550 nm. The squeezed light source, which is a modification of the source presented in ref. 40, has a FWHM bandwidth of 66 MHz and a threshold power of around 6 mW. The source is pumped with 2.7 mW, and the system has around 72% efficiency and 30 mrad RMS phase noise (between squeezed light and local oscillator), yielding around 5 dB of squeezing and 11.8 dB of anti-squeezing at a sideband frequency of 5 MHz. From these values, a squeezing strength of r ≈ 1.52 can be estimated.
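As a quick consistency check of these numbers (using the standard loss model that also underlies eq. (8) below): with shot noise normalized to 1, a squeezed state of strength r detected with total efficiency η has quadrature variances ηe∓2r + 1 − η. Plugging in the quoted values recovers the measured squeezing levels:

```python
import numpy as np

eta, r = 0.72, 1.52   # efficiency and squeezing strength quoted above

v_sq = eta * np.exp(-2 * r) + (1 - eta)    # squeezed-quadrature variance
v_asq = eta * np.exp(+2 * r) + (1 - eta)   # anti-squeezed-quadrature variance

print(10 * np.log10(v_sq))    # ~ -5.0 dB  (squeezing)
print(10 * np.log10(v_asq))   # ~ +11.9 dB (anti-squeezing)
```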

Fig. 4: Sketch of the experimental setup.
figure 4

CLF Coherent-locking-field, DISP Displacement field, LO Local oscillator, DAQ Data acquisition, OPO Optical-parametric-oscillator, PD Photo-detector, FI Faraday Isolator, BS Beam-splitter, AOM Acousto-optic modulator, PM Phase modulator, PS Phase shifter, PBS Polarizing beam-splitter, DBS Dichroic beam-splitter, HWP Half-wave plate.

The OPO is stabilized via the Pound-Drever-Hall (PDH) technique41, and a 40 MHz frequency-shifted beam is injected into the OPO to act as a phase reference in a coherent locking scheme27. This 40 MHz reference tone (CLF) is used to stabilize the phase between the squeezed light, the displacement beam (DISP) and the local oscillator (LO). Tuning the electrical down-mixing phase of the two phase locks allows full control of the displacement angle ϕα and the homodyne basis angle ϕHD relative to the squeezed quadrature angle ϕr, which is arbitrarily set by the phase of the pump light.

Classical Fisher information of a Gaussian state and the cost function

The cost function used in the experiment is the inverse of the classical Fisher Information

$$C=\frac{1}{{\mathcal{F}}}$$
(1)

We assume our system to be a CW Gaussian state described by quadrature operators \(\hat{{\bf{X}}}\equiv ({\hat{{\bf{a}}}}^{\dagger }+\hat{{\bf{a}}})\) and \(\hat{{\bf{P}}}\equiv i({\hat{{\bf{a}}}}^{\dagger }-\hat{{\bf{a}}})\) and fully characterized by the first two statistical moments \({\mu }_{x}=\langle \hat{{\bf{X}}}\rangle\) and \({V}_{x}=\langle {\hat{{\bf{X}}}}^{2}\rangle -{\langle \hat{{\bf{X}}}\rangle }^{2}\) (with similar moments for \(\hat{{\bf{P}}}\)) and with a photon number operator given by

$$\hat{{\bf{n}}}=\frac{1}{4}({\hat{{\bf{X}}}}^{2}+{\hat{{\bf{P}}}}^{2}-2{\mathbb{I}}),$$
(2)

using the commutator \([\hat{{\bf{X}}},\hat{{\bf{P}}}]=2i\).

The classical Fisher information for the estimation of a phase shift ϕ, given the probability distribution P(x) of measured \(\hat{{\bf{X}}}\) quadrature values parameterized by the parameters {Θ}, is given by

$${\mathcal{F}}=\int{\rm{d}}xP(x| \phi ,\{\Theta \}){\left(\frac{\partial \log (P(x| \phi ,\{\Theta \}))}{\partial \phi }\right)}^{2}.$$
(3)

Equation (3) can be evaluated, since we have a Gaussian state, as

$${\mathcal{F}}=\frac{1}{{V}_{x}}{\left(\frac{\partial {\mu }_{x}}{\partial \phi }\right)}^{2}+\frac{1}{2{V}_{x}^{2}}{\left(\frac{\partial {V}_{x}}{\partial \phi }\right)}^{2},$$
(4)

where μx and Vx will be functions of the parameters {Θ}.
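In the experiment, the derivatives entering eq. (4) are obtained from measurements at shifted phase settings rather than from a model. The sketch below illustrates one way this could look, using the Gaussian parameter-shift rules derived in the following subsections (eqs. (23) and (43)) together with the variance derivative of eq. (7); the dictionary-based interface is hypothetical.

```python
import numpy as np

def fisher_information(samples):
    """Estimate the classical Fisher information of eq. (4) from homodyne data.

    `samples` maps a shift of the measured phase (0, +-pi/4, +-pi/2) to an
    array of quadrature samples; these are the five measurement settings per
    cost function evaluation mentioned in the main text.
    """
    mean = {s: np.mean(x) for s, x in samples.items()}
    m2 = {s: np.mean(np.asarray(x) ** 2) for s, x in samples.items()}
    var0 = m2[0] - mean[0] ** 2
    dmu = (mean[np.pi / 2] - mean[-np.pi / 2]) / 2   # linear shift rule, eq. (23)
    dm2 = m2[np.pi / 4] - m2[-np.pi / 4]             # quadratic shift rule, eq. (43)
    dvar = dm2 - 2 * mean[0] * dmu                   # eq. (7)
    return dmu ** 2 / var0 + dvar ** 2 / (2 * var0 ** 2)
```

Only squares of the derivatives enter eq. (4), so the sign convention of the shifts (rotating the state versus rotating the measurement basis) is immaterial.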

The gradient of the cost function with respect to the parameters Θ is given by

$${{\boldsymbol{\nabla }}}_{\Theta }C=\frac{-{{\boldsymbol{\nabla }}}_{\Theta }{\mathcal{F}}}{{{\mathcal{F}}}^{2}}.$$
(5)

The gradient of the classical Fisher Information is finally given by

$$\begin{array}{l}{{\boldsymbol{\nabla }}}_{\Theta }{\mathcal{F}}=\frac{1}{{V}_{x}^{3}}\left(2{V}_{x}^{2}\frac{\partial {\mu }_{x}}{\partial \phi }{{\boldsymbol{\nabla }}}_{\Theta }\frac{\partial {\mu }_{x}}{\partial \phi }-{V}_{x}{\left(\frac{\partial {\mu }_{x}}{\partial \phi }\right)}^{2}{{\boldsymbol{\nabla }}}_{\Theta }{V}_{x}\right)\\\qquad\;\;+\,\frac{1}{{V}_{x}^{3}}\left({V}_{x}\frac{\partial {V}_{x}}{\partial \phi }{{\boldsymbol{\nabla }}}_{\Theta }\frac{\partial {V}_{x}}{\partial \phi }-{\left(\frac{\partial {V}_{x}}{\partial \phi }\right)}^{2}{{\boldsymbol{\nabla }}}_{\Theta }{V}_{x}\right).\end{array}$$
(6)

The derivative of the variance is in general given by

$$\frac{\partial {V}_{x}}{\partial \phi }=\frac{\partial }{\partial \phi }\langle {\hat{{\bf{X}}}}^{2}\rangle -2\langle \hat{{\bf{X}}}\rangle \frac{\partial }{\partial \phi }\langle \hat{{\bf{X}}}\rangle .$$
(7)

The two main contributions to the loss of information are the loss of squeezed photons and phase noise between the three interacting fields (the displacement and squeezing fields and the local oscillator). The loss of photons comes from the limited escape efficiency of the squeezer ηesc, the limited efficiency of optical components between the squeezer and the detector ηopt, the imperfect visibility between signal and local oscillator \({{\mathcal{V}}}^{2}\), and finally the imperfect quantum efficiency of the photodiodes ηQE. Phase noise comes mainly from the inability of the phase-stabilization loops to remove all classical phase fluctuations, owing to their limited bandwidths and the shot noise of the light fields. In the case of pure loss, ignoring phase noise, the classical Fisher information can be evaluated analytically by starting from eq. (4)

$$\begin{array}{lll}{\mathcal{F}}(\phi )\;=\;\frac{1}{{V}_{x}}{\left(\frac{\partial {\mu }_{x}}{\partial \phi }\right)}^{2}+\frac{1}{2{V}_{x}^{2}}{\left(\frac{\partial {V}_{x}}{\partial \phi }\right)}^{2}\\\qquad\;\;\,=\;\frac{1}{\eta \left({e}^{-2r}{\cos }^{2}(\phi )+{e}^{2r}{\sin }^{2}(\phi )\right)+1-\eta }{\left(\frac{\partial }{\partial \phi }2\sqrt{\eta }| \alpha | \cos ({\phi }_{\alpha }-\phi )\right)}^{2}\\\qquad\;\;\,+\;\frac{1}{{\left(\eta \left({e}^{-2r}{\cos }^{2}(\phi )+{e}^{2r}{\sin }^{2}(\phi )\right)+1-\eta \right)}^{2}}{\left(\frac{\partial }{\partial \phi }\left[\eta \left({e}^{-2r}{\cos }^{2}(\phi )+{e}^{2r}{\sin }^{2}(\phi )\right)+1-\eta \right]\right)}^{2}\\\qquad\;\;\,=\;\frac{4\eta | \alpha {| }^{2}{\sin }^{2}({\phi }_{\alpha }-\phi )}{\eta \left({e}^{-2r}{\cos }^{2}(\phi )+{e}^{2r}{\sin }^{2}(\phi )\right)+1-\eta }+\frac{2{\eta }^{2}{\sinh }^{2}(2r){\sin }^{2}(2\phi )}{{\left(\eta \left({e}^{-2r}{\cos }^{2}(\phi )+{e}^{2r}{\sin }^{2}(\phi )\right)+1-\eta \right)}^{2}}.\end{array}$$
(8)

In the zero loss scenario, the variance term will always scale faster with increasing photon numbers, meaning that concentrating the photons in the squeezed state is the more effective strategy. In this case the optimal measurement phase is given by \({\phi }_{opt}=\arccos (\tanh (2r))/2\).

In the case of relatively low loss, there exists an optimal ratio between squeezed and coherent photons, with the optimal relative angle being ϕα − ϕ = π/2 and an optimal measurement angle given by

$${\phi }_{opt}^{{\mathcal{L}}}=\frac{1}{2}\arccos \left(\frac{\frac{\eta {e}^{2r}+(1-\eta )}{\eta {e}^{-2r}+(1-\eta )}-1}{\frac{\eta {e}^{2r}+(1-\eta )}{\eta {e}^{-2r}+(1-\eta )}+1}\right).$$
(9)

In the high-loss scenario, the mean value term will dominate, meaning most photons should be put into the coherent state, with the optimal relative angle being ϕα − ϕ = π/2 and the optimal measurement angle being ϕ = 0.
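A short numerical sanity check of eqs. (8) and (9) can be useful here; the parameter values below are hypothetical, not the experimental ones. Setting α = 0 isolates the variance contribution of eq. (8), whose maximum is the angle given by eq. (9):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fisher(phi, eta, r, alpha, phi_alpha):
    # Classical Fisher information, eq. (8).
    v = eta * (np.exp(-2 * r) * np.cos(phi) ** 2
               + np.exp(2 * r) * np.sin(phi) ** 2) + 1 - eta
    mean_term = 4 * eta * abs(alpha) ** 2 * np.sin(phi_alpha - phi) ** 2 / v
    var_term = 2 * eta ** 2 * np.sinh(2 * r) ** 2 * np.sin(2 * phi) ** 2 / v ** 2
    return mean_term + var_term

eta, r = 0.95, 1.0   # hypothetical low-loss parameters

# Optimal measurement angle of eq. (9):
R = (eta * np.exp(2 * r) + 1 - eta) / (eta * np.exp(-2 * r) + 1 - eta)
phi_eq9 = 0.5 * np.arccos((R - 1) / (R + 1))

# Numerical maximum of the variance term alone (alpha = 0 removes the mean term):
res = minimize_scalar(lambda p: -fisher(p, eta, r, alpha=0.0, phi_alpha=0.0),
                      bounds=(1e-6, np.pi / 2), method="bounded")
print(phi_eq9, res.x)   # the two angles agree
```

In the limit η → 1, the ratio inside the arccos of eq. (9) becomes tanh(2r), recovering the lossless expression quoted above.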

Parameter-shift rules for quadratic operators

In general, an arbitrary operator can be expressed as

$$\hat{{\bf{A}}}=(a,b,c,d,e\ldots )\cdot \left(\begin{array}{c}{\mathbb{I}}\\ \hat{{\bf{X}}}\\ \hat{{\bf{P}}}\\ {\hat{{\bf{X}}}}^{2}\\ {\hat{{\bf{P}}}}^{2}\\ \vdots \end{array}\right)$$
(10)

When a gate acts upon this arbitrary operator \(\hat{{\bf{A}}}\), we in principle need to know the infinite-dimensional gate matrix describing the transformation of all entries of the vector, \({\hat{{\bf{A}}}}_{G}=G[\hat{{\bf{A}}}]={M}_{G}^{T}\hat{{\bf{A}}}\), where MG is a matrix describing the action of the gate upon the operator. Note the transposition of the matrix, a subtle but important detail. The derivatives of operators linear in the quadratures (e.g. \({{\boldsymbol{\nabla }}}_{\Theta }\langle \hat{{\bf{X}}}\rangle\)) can easily be evaluated using the parameter shift rules from ref. 24 by truncating the vector space to include only the linear operators. It is also possible to derive parameter shift rules for operators quadratic in the quadratures, by finding and differentiating the higher-order entries of the gate matrices using

$$G[\hat{{\bf{A}}}\hat{{\bf{B}}}]=G[\hat{{\bf{A}}}]G[\hat{{\bf{B}}}]$$
(11)
$$\frac{\partial }{\partial {\Theta }_{i}}G({\Theta }_{i})\left[{\hat{{\bf{A}}}}^{2}\right]=\frac{\partial }{\partial {\Theta }_{i}}G({\Theta }_{i})\left[\hat{{\bf{A}}}\right]G({\Theta }_{i})\left[\hat{{\bf{A}}}\right]+G({\Theta }_{i})\left[\hat{{\bf{A}}}\right]\frac{\partial }{\partial {\Theta }_{i}}G({\Theta }_{i})\left[\hat{{\bf{A}}}\right],$$
(12)

where Gi) is a gate parameterized by the parameter Θi, and where eq. (12) has been truncated to only include linear quadrature operators \(\hat{{\bf{A}}}\subseteq [{\mathbb{I}},\hat{{\bf{X}}},\hat{{\bf{P}}}]\).

In the following, we first reproduce the results of ref. 24, finding the linear parameter shift rules for the relevant Gaussian gates, and then extend them to also include quadratic operators. Throughout, we adopt the definition of the Gaussian gates used in ref. 24.

Squeezing gate for linear operators

The squeezing gate is parameterized by the squeezing strength r, and the gate matrix for linear operators is given by

$${M}_{S}(r)=\left(\begin{array}{ccc}1&0&0\\ 0&{e}^{-r}&0\\ 0&0&{e}^{r}\end{array}\right),$$
(13)
$$\frac{\partial }{\partial r}{M}_{S}(r)=\left(\begin{array}{ccc}0&0&0\\ 0&-{e}^{-r}&0\\ 0&0&{e}^{r}\end{array}\right).$$
(14)

We now want to express the derivative of the matrix as a linear combination of shifted versions of the gate matrix itself

$$\begin{array}{lll}\frac{\partial }{\partial r}{M}_{S}(r)\;=\;\left(\begin{array}{ccc}0&0&0\\ 0&-{e}^{-r}\frac{{e}^{s}-{e}^{-s}}{2\sinh (s)}&0\\ 0&0&{e}^{r}\frac{{e}^{s}-{e}^{-s}}{2\sinh (s)}\end{array}\right)\\\qquad\qquad\,=\;\frac{1}{2\sinh (s)}\left(\begin{array}{ccc}0&0&0\\ 0&{e}^{-(r+s)}-{e}^{-(r-s)}&0\\ 0&0&{e}^{(r+s)}-{e}^{(r-s)}\end{array}\right)\\\qquad\qquad\,=\;\frac{1}{2\sinh (s)}\left({M}_{S}(r+s)-{M}_{S}(r-s)\right),\end{array}$$
(15)

which is the parameter shift rule for the squeezing gate, with s being an arbitrary shift in the squeezing strength.
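The rule is exact for any finite shift s, which is quickly confirmed numerically (test values arbitrary):

```python
import numpy as np

def m_squeeze(r):
    # Gate matrix of the squeezing gate for linear operators, eq. (13).
    return np.diag([1.0, np.exp(-r), np.exp(r)])

r, s, eps = 0.7, 0.3, 1e-7
rule = (m_squeeze(r + s) - m_squeeze(r - s)) / (2 * np.sinh(s))   # eq. (15)
numeric = (m_squeeze(r + eps) - m_squeeze(r - eps)) / (2 * eps)
print(np.allclose(rule, numeric))   # True
```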

Displacement gate for linear operators

The displacement gate is parameterized by the displacement amplitude α and the displacement angle ϕα, and is given by the matrix

$${M}_{D}(\alpha ,{\phi }_{\alpha })=\left(\begin{array}{ccc}1&0&0\\ 2\alpha \cos ({\phi }_{\alpha })&1&0\\ 2\alpha \sin ({\phi }_{\alpha })&0&1\end{array}\right),$$
(16)
$$\frac{\partial }{\partial \alpha }{M}_{D}(\alpha ,{\phi }_{\alpha })=\left(\begin{array}{ccc}0&0&0\\ 2\cos ({\phi }_{\alpha })&0&0\\ 2\sin ({\phi }_{\alpha })&0&0\end{array}\right),$$
(17)
$$\frac{\partial }{\partial {\phi }_{\alpha }}{M}_{D}(\alpha ,{\phi }_{\alpha })=\left(\begin{array}{ccc}0&0&0\\ -2\alpha \sin ({\phi }_{\alpha })&0&0\\ 2\alpha \cos ({\phi }_{\alpha })&0&0\end{array}\right).$$
(18)

We can once again easily calculate the parameter shift rule for the linear operators

$$\begin{array}{l}\frac{\partial }{\partial \alpha }{M}_{D}(\alpha ,{\phi }_{\alpha })=\left(\begin{array}{ccc}0&0&0\\ 2\cos ({\phi }_{\alpha })\frac{\alpha +s-\left(\alpha -s\right)}{2s}&0&0\\ 2\sin ({\phi }_{\alpha })\frac{\alpha +s-\left(\alpha -s\right)}{2s}&0&0\end{array}\right)\\\qquad\qquad\qquad=\displaystyle\frac{1}{2s}\left({M}_{D}(\alpha +s,{\phi }_{\alpha })-{M}_{D}(\alpha -s,{\phi }_{\alpha })\right),\end{array}$$
(19)
$$\begin{array}{l}\frac{\partial }{\partial {\phi }_{\alpha }}{M}_{D}(\alpha ,{\phi }_{\alpha })=\left(\begin{array}{ccc}0&0&0\\ 2\alpha \frac{1}{2}\left(\cos ({\phi }_{\alpha }+\pi /2)-\cos ({\phi }_{\alpha }-\pi /2)\right)&0&0\\ 2\alpha \frac{1}{2}\left(\sin ({\phi }_{\alpha }+\pi /2)-\sin ({\phi }_{\alpha }-\pi /2)\right)&0&0\end{array}\right)\\\qquad\qquad\qquad\;\,=\displaystyle\frac{1}{2}\left({M}_{D}(\alpha ,{\phi }_{\alpha }+\pi /2)-{M}_{D}(\alpha ,{\phi }_{\alpha }-\pi /2)\right).\end{array}$$
(20)

Rotation gate for linear operators

The final gate of this analysis is the rotation gate R(ϕ) given by the matrix

$${M}_{R}(\phi )=\left(\begin{array}{ccc}1&0&0\\ 0&\cos (\phi )&-\sin (\phi )\\ 0&\sin (\phi )&\cos (\phi )\end{array}\right),$$
(21)
$$\frac{\partial }{\partial \phi }{M}_{R}(\phi )=\left(\begin{array}{ccc}0&0&0\\ 0&-\sin (\phi )&-\cos (\phi )\\ 0&\cos (\phi )&-\sin (\phi )\end{array}\right),$$
(22)

From the derivative matrix we can, similarly to the displacement gate, find the basic parameter shift rule by recognizing that a difference of cosines shifted by ±π/2 gives a sine and vice versa; we arrive at the same parameter shift rule as for the displacement angle

$$\frac{\partial }{\partial \phi }{M}_{R}(\phi )=\frac{1}{2}\left({M}_{R}(\phi +\pi /2)-{M}_{R}(\phi -\pi /2)\right).$$
(23)

Squeezing gate for quadratic operators

We begin by applying eq. (11) to the squeezing gate to find the higher-order entries of the gate matrix

$$S[\hat{{\bf{X}}}\hat{{\bf{X}}}]=S[\hat{{\bf{X}}}]S[\hat{{\bf{X}}}]={e}^{-2r}{\hat{{\bf{X}}}}^{2},$$
(24)
$$S[\hat{{\bf{P}}}\hat{{\bf{P}}}]=S[\hat{{\bf{P}}}]S[\hat{{\bf{P}}}]={e}^{2r}{\hat{{\bf{P}}}}^{2},$$
(25)
$$S[\hat{{\bf{X}}}\hat{{\bf{P}}}]=S[\hat{{\bf{X}}}]S[\hat{{\bf{P}}}]=\hat{{\bf{X}}}\hat{{\bf{P}}},$$
(26)
$$S[\hat{{\bf{P}}}\hat{{\bf{X}}}]=S[\hat{{\bf{P}}}]S[\hat{{\bf{X}}}]=\hat{{\bf{P}}}\hat{{\bf{X}}}.$$
(27)

The gate matrix is then expanded to include

$${M}_{S}(r)=\left(\begin{array}{ccccccc}1&0&0&0&0&0&0\\ 0&{e}^{-r}&0&0&0&0&0\\ 0&0&{e}^{r}&0&0&0&0\\ 0&0&0&{e}^{-2r}&0&0&0\\ 0&0&0&0&{e}^{2r}&0&0\\ 0&0&0&0&0&1&0\\ 0&0&0&0&0&0&1\end{array}\right).$$
(28)

If we truncate the matrix to only the quadratic terms, we can derive parameter shift rules that apply to the quadratic operators

$$\begin{array}{l}\frac{\partial }{\partial r}{M}_{S,quad}(r)=\left(\begin{array}{cccc}-2{e}^{-2r}&0&0&0\\ 0&2{e}^{2r}&0&0\\ 0&0&0&0\\ 0&0&0&0\end{array}\right)\\\qquad\qquad\quad\;\;\,=\frac{1}{\sinh (2s)}\left({M}_{S,quad}(r+s)-{M}_{S,quad}(r-s)\right)\end{array}$$
(29)

Displacement gate for quadratic operators

Once again we can repeat the calculation from before, finding the entries of the gate matrix for the quadratic operators

$$\begin{array}{l}D(\alpha ,{\phi }_{\alpha })[\hat{{\bf{X}}}\hat{{\bf{X}}}]=(2\alpha \cos ({\phi }_{\alpha })+\hat{{\bf{X}}})(2\alpha \cos ({\phi }_{\alpha })+\hat{{\bf{X}}})=4{\alpha }^{2}{\cos }^{2}({\phi }_{\alpha })\\\qquad\qquad\qquad\;\;+\,{\hat{{\bf{X}}}}^{2}+4\alpha \cos ({\phi }_{\alpha })\hat{{\bf{X}}},\end{array}$$
(30)
$$\begin{array}{l}D(\alpha ,{\phi }_{\alpha })[\hat{{\bf{P}}}\hat{{\bf{P}}}]=(2\alpha \sin ({\phi }_{\alpha })+\hat{{\bf{P}}})(2\alpha \sin ({\phi }_{\alpha })+\hat{{\bf{P}}})=4{\alpha }^{2}{\sin }^{2}({\phi }_{\alpha })\\\qquad\qquad\qquad\,+\,{\hat{{\bf{P}}}}^{2}+4\alpha \sin ({\phi }_{\alpha })\hat{{\bf{P}}},\end{array}$$
(31)
$$\begin{array}{l}D(\alpha ,{\phi }_{\alpha })[\hat{{\bf{X}}}\hat{{\bf{P}}}]=(2\alpha \cos ({\phi }_{\alpha })+\hat{{\bf{X}}})(2\alpha \sin ({\phi }_{\alpha })+\hat{{\bf{P}}})=2{\alpha }^{2}\sin (2{\phi }_{\alpha })\\\qquad\qquad\qquad\;+\,\hat{{\bf{X}}}\hat{{\bf{P}}}+2\alpha \left(\cos ({\phi }_{\alpha })\hat{{\bf{P}}}+\sin ({\phi }_{\alpha })\hat{{\bf{X}}}\right),\end{array}$$
(32)
$$\begin{array}{l}D(\alpha ,{\phi }_{\alpha })[\hat{{\bf{P}}}\hat{{\bf{X}}}]=(2\alpha \sin ({\phi }_{\alpha })+\hat{{\bf{P}}})(2\alpha \cos ({\phi }_{\alpha })+\hat{{\bf{X}}})=2{\alpha }^{2}\sin (2{\phi }_{\alpha })\\\qquad\qquad\qquad\;+\,\hat{{\bf{P}}}\hat{{\bf{X}}}+2\alpha \left(\cos ({\phi }_{\alpha })\hat{{\bf{P}}}+\sin ({\phi }_{\alpha })\hat{{\bf{X}}}\right).\end{array}$$
(33)

The resulting gate matrix including quadratic operators is then

$${M}_{D}(\alpha ,{\phi }_{\alpha })=\left(\begin{array}{ccccccc}1&0&0&0&0&0&0\\ 2\alpha \cos ({\phi }_{\alpha })&1&0&0&0&0&0\\ 2\alpha \sin ({\phi }_{\alpha })&0&1&0&0&0&0\\ 4{\alpha }^{2}{\cos }^{2}({\phi }_{\alpha })&4\alpha \cos ({\phi }_{\alpha })&0&1&0&0&0\\ 4{\alpha }^{2}{\sin }^{2}({\phi }_{\alpha })&0&4\alpha \sin ({\phi }_{\alpha })&0&1&0&0\\ 2{\alpha }^{2}\sin (2{\phi }_{\alpha })&2\alpha \sin ({\phi }_{\alpha })&2\alpha \cos ({\phi }_{\alpha })&0&0&1&0\\ 2{\alpha }^{2}\sin (2{\phi }_{\alpha })&2\alpha \sin ({\phi }_{\alpha })&2\alpha \cos ({\phi }_{\alpha })&0&0&0&1\end{array}\right).$$
(34)

Once again, we limit ourselves to the quadratic operators and find the corresponding parameter shift rules. This is a bit more involved than for the squeezing gate. Consider first the entries of the first column proportional to α². For the displacement amplitude α, if we naively assume that the parameter shift takes the same form as the linear one but with a different normalization k, then differentiating gives us the following equation

$$\begin{array}{lll}\frac{\partial 4{\alpha }^{2}{\cos }^{2}({\phi }_{\alpha })}{\partial \alpha }\;=\;8\alpha {\cos }^{2}({\phi }_{\alpha })=\displaystyle\frac{4}{k}\left({\left(\alpha +s\right)}^{2}-{\left(\alpha -s\right)}^{2}\right){\cos }^{2}({\phi }_{\alpha })\Rightarrow \\\qquad\quad\;\, 2\alpha \;=\;\displaystyle\frac{1}{k}\left({\left(\alpha +s\right)}^{2}-{\left(\alpha -s\right)}^{2}\right)\Rightarrow \\\qquad\quad\;\;\;\, k\;=\;2s,\end{array}$$
(35)

which leads to the same parameter shift rule as with the linear operators

$$\frac{\partial }{\partial \alpha }{M}_{D,quad}(\alpha ,{\phi }_{\alpha })=\frac{1}{2s}\left({M}_{D,quad}(\alpha +s,{\phi }_{\alpha })-{M}_{D,quad}(\alpha -s,{\phi }_{\alpha })\right).$$
(36)

The parameter shift rule for the displacement angle ϕα can be derived by considering the derivatives \({\cos }^{2}(\phi )\to -\sin (2\phi ),{\sin }^{2}(\phi )\to \sin (2\phi )\) and \(\cos (\phi )\sin (\phi )\to \cos (2\phi )\). These can be expressed as linear differences of the original functions shifted up and down by π/4, similar to the basic rotation gate parameter shift, with an additional π/2-shift term correcting the entries linear in α. The resulting parameter shift is then

$$\begin{array}{lll}\displaystyle\frac{\partial }{\partial {\phi }_{\alpha }}{M}_{D,quad}(\alpha ,{\phi }_{\alpha })\,=\,\left({M}_{D,quad}(\alpha ,{\phi }_{\alpha }+\pi /4)-{M}_{D,quad}(\alpha ,{\phi }_{\alpha }-\pi /4)\right)\\\qquad\qquad\qquad\qquad\;\;\;+\,\frac{1-\sqrt{2}}{2}\left({M}_{D,quad}(\alpha ,{\phi }_{\alpha }+\pi /2)-{M}_{D,quad}(\alpha ,{\phi }_{\alpha }-\pi /2)\right).\end{array}$$
(37)

The above expressions can be verified by looking at the derivatives of the number operator expectation value \({\partial }_{\alpha }\left\langle \hat{{\bf{n}}}\right\rangle =2\alpha\) and \({\partial }_{\phi \alpha }\left\langle \hat{{\bf{n}}}\right\rangle =0\), as we would expect.
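Beyond the number operator check, eqs. (36) and (37) can be verified entrywise against a finite-difference derivative of the matrix in eq. (34); the sketch below does so with arbitrary test values:

```python
import numpy as np

def m_disp_quad(alpha, phi):
    # Gate matrix of eq. (34), basis ordering (I, X, P, X^2, P^2, XP, PX).
    c, s = np.cos(phi), np.sin(phi)
    m = np.eye(7)
    m[1, 0], m[2, 0] = 2 * alpha * c, 2 * alpha * s
    m[3, 0], m[3, 1] = 4 * alpha ** 2 * c ** 2, 4 * alpha * c
    m[4, 0], m[4, 2] = 4 * alpha ** 2 * s ** 2, 4 * alpha * s
    for row in (5, 6):  # the XP and PX rows share the same lower-order entries
        m[row, 0] = 2 * alpha ** 2 * np.sin(2 * phi)
        m[row, 1], m[row, 2] = 2 * alpha * s, 2 * alpha * c
    return m

alpha, phi, s, eps = 1.3, 0.4, 0.2, 1e-7

rule_alpha = (m_disp_quad(alpha + s, phi)
              - m_disp_quad(alpha - s, phi)) / (2 * s)              # eq. (36)
rule_phi = (m_disp_quad(alpha, phi + np.pi / 4)
            - m_disp_quad(alpha, phi - np.pi / 4)) \
    + (1 - np.sqrt(2)) / 2 * (m_disp_quad(alpha, phi + np.pi / 2)
                              - m_disp_quad(alpha, phi - np.pi / 2))  # eq. (37)

num_alpha = (m_disp_quad(alpha + eps, phi)
             - m_disp_quad(alpha - eps, phi)) / (2 * eps)
num_phi = (m_disp_quad(alpha, phi + eps)
           - m_disp_quad(alpha, phi - eps)) / (2 * eps)
print(np.allclose(rule_alpha, num_alpha), np.allclose(rule_phi, num_phi))  # True True
```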

Rotation gate for quadratic operators

We begin again by finding the quadratic entries of the rotation matrix

$$R(\phi )[\hat{{\bf{X}}}\hat{{\bf{X}}}]={\cos }^{2}(\phi ){\hat{{\bf{X}}}}^{2}+{\sin }^{2}(\phi ){\hat{{\bf{P}}}}^{2}-\cos (\phi )\sin (\phi )\left(\hat{{\bf{X}}}\hat{{\bf{P}}}+\hat{{\bf{P}}}\hat{{\bf{X}}}\right),$$
(38)
$$R(\phi )[\hat{{\bf{P}}}\hat{{\bf{P}}}]={\cos }^{2}(\phi ){\hat{{\bf{P}}}}^{2}+{\sin }^{2}(\phi ){\hat{{\bf{X}}}}^{2}+\cos (\phi )\sin (\phi )\left(\hat{{\bf{X}}}\hat{{\bf{P}}}+\hat{{\bf{P}}}\hat{{\bf{X}}}\right),$$
(39)
$$R(\phi )[\hat{{\bf{X}}}\hat{{\bf{P}}}]={\cos }^{2}(\phi )\hat{{\bf{X}}}\hat{{\bf{P}}}-{\sin }^{2}(\phi )\hat{{\bf{P}}}\hat{{\bf{X}}}+\cos (\phi )\sin (\phi )\left({\hat{{\bf{X}}}}^{2}-{\hat{{\bf{P}}}}^{2}\right),$$
(40)
$$R(\phi )[\hat{{\bf{P}}}\hat{{\bf{X}}}]={\cos }^{2}(\phi )\hat{{\bf{P}}}\hat{{\bf{X}}}-{\sin }^{2}(\phi )\hat{{\bf{X}}}\hat{{\bf{P}}}+\cos (\phi )\sin (\phi )\left({\hat{{\bf{X}}}}^{2}-{\hat{{\bf{P}}}}^{2}\right).$$
(41)

The gate matrix for rotations including quadratic operators is then given by

$${M}_{R,quad}(\phi )=\left(\begin{array}{ccccccc}1&0&0&0&0&0&0\\ 0&\cos (\phi )&-\sin (\phi )&0&0&0&0\\ 0&\sin (\phi )&\cos (\phi )&0&0&0&0\\ 0&0&0&{\cos }^{2}(\phi )&{\sin }^{2}(\phi )&-\cos (\phi )\sin (\phi )&-\cos (\phi )\sin (\phi )\\ 0&0&0&{\sin }^{2}(\phi )&{\cos }^{2}(\phi )&\cos (\phi )\sin (\phi )&\cos (\phi )\sin (\phi )\\ 0&0&0&\cos (\phi )\sin (\phi )&-\cos (\phi )\sin (\phi )&{\cos }^{2}(\phi )&-{\sin }^{2}(\phi )\\ 0&0&0&\cos (\phi )\sin (\phi )&-\cos (\phi )\sin (\phi )&-{\sin }^{2}(\phi )&{\cos }^{2}(\phi )\end{array}\right).$$
(42)

The derivative of MR,quad(ϕ) is found using the same considerations as for ϕα, and we arrive at

$$\frac{\partial }{\partial \phi }{M}_{R,quad}(\phi )=\left({M}_{R,quad}(\phi +\pi /4)-{M}_{R,quad}(\phi -\pi /4)\right).$$
(43)
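The same finite-difference check confirms eq. (43) for the quadratic block of eq. (42):

```python
import numpy as np

def m_rot_quad(phi):
    # Quadratic block of eq. (42), basis ordering (X^2, P^2, XP, PX).
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c * c,  s * s, -c * s, -c * s],
                     [s * s,  c * c,  c * s,  c * s],
                     [c * s, -c * s,  c * c, -s * s],
                     [c * s, -c * s, -s * s,  c * c]])

phi, eps = 0.37, 1e-7
rule = m_rot_quad(phi + np.pi / 4) - m_rot_quad(phi - np.pi / 4)      # eq. (43)
numeric = (m_rot_quad(phi + eps) - m_rot_quad(phi - eps)) / (2 * eps)
print(np.allclose(rule, numeric))   # True
```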

Calibration of displacement and homodyne angles

Since the parameter shift rules assume a certain shift of experimental parameters, it is necessary to calibrate the experimental apparatus to make sure the correct operations are implemented. This section will deal with the calibrations of the displacement angle and the homodyne angle.

In the coherent locking scheme that stabilizes the phase between the squeezed light and the displacement (and between the squeezed light and the local oscillator), changing the phase of the 40 MHz electrical local oscillator (ELO) that downmixes the error signal of the feedback loop rotates the angle ϕα (ϕHD). The correspondence between the set ELO angle and the actual quadrature angle is not linear, and thus needs to be calibrated. To do this, the phase of the displacement (homodyne) ELO function generator is swept through 2π over 2 seconds, while the homodyne output is recorded at 50 MSa s−1 and downmixed to the 5 MHz sideband using a 1 MHz lowpass filter. The data are normalized to the shot noise standard deviation. For the calibration of the displacement angle, a 5 MHz displacement is added and the homodyne angle is manually set to the squeezed quadrature. For the homodyne angle calibration, the displacement is removed, leaving only vacuum squeezing. The raw data of these measurements are shown in Fig. 5.

Fig. 5: Swept homodyne outputs, no calibration.
figure 5

X quadrature measurements normalized to shot noise as a function of the swept ELO phase for (a) the displacement phase lock with the homodyne angle locked to squeezing and (b) the homodyne phase lock without displacement.

Using the data we can estimate the quadrature phase. For the measurement of the homodyne phase using squeezed vacuum, the method used in ref. 40 can be directly applied; for the displacement phase measurement, however, the marginal probability distribution has to be modified as

$$P(X| {\phi }_{\alpha })=\frac{1}{\sqrt{2\pi {V}_{-}}}{e}^{-\displaystyle\frac{{\left(X-2| \alpha | \cos ({\phi }_{\alpha })\right)}^{2}}{2{V}_{-}}}.$$
(44)
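For illustration only, a simple moment-based estimate consistent with eq. (44) inverts ⟨X⟩ = 2|α|cos(ϕα). The actual analysis follows the method of ref. 40, so the helper below is a hypothetical simplification:

```python
import numpy as np

def estimate_displacement_phase(x_samples, alpha):
    # Invert <X> = 2|alpha| cos(phi_alpha); clipping guards against sample
    # means that fall slightly outside the physical range.
    m = np.clip(np.mean(x_samples) / (2 * abs(alpha)), -1.0, 1.0)
    return np.arccos(m)   # value in [0, pi]; the sweep direction resolves the branch
```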

The estimated phases are unwrapped, and the function generator phases of the displacement (homodyne) ELOs are interpolated as a function of the estimated phases using a third-order B-spline with a smoothing parameter of 1.3. The estimated phases and corresponding spline representations are shown in Fig. 6.

Fig. 6: Phase calibration.
figure 6

Plot of the function generator phase for (a) the displacement phase lock and (b) homodyne phase lock as a function of the estimated phases. The orange dots are the corresponding B-spline representations.

Using these B-spline interpolations, we can calibrate the system angles. Although the oscillatory behaviour of these calibrations generally results in set angles that correspond to the desired angles, the squeezing angle is by itself ill-defined, which can occasionally cause the calibration to fail. This can result in a slight error between the set angle and the desired angle.

These calibrations can be verified by plotting the data in Fig. 5 as a function of the unwrapped estimated phases instead of the function generator phase as shown in Fig. 7.

Fig. 7: Swept homodyne outputs, with calibration.
figure 7

X quadrature measurements normalized to shot noise as a function of the estimated quadrature phase for (a) the displacement phase lock and (b) the homodyne phase lock.

Simulation model of the experiment

The experimental setup was modelled analytically using Mathematica, and the numerical simulation of the gradient descent-based variational quantum algorithm was done using the PennyLane42 Python library. Since this is a continuous-variable system, a Gaussian quantum simulator was utilized as the backend.

A single-mode displaced squeezed state is prepared, characterized by a fixed degree of squeezing and displacement magnitude. The displacement angle, ϕα, and the homodyne detection angle, ϕHD, are taken as the free parameters. The outputs from the quantum system are the mean and variance of the probed quadrature.

The initial probe state is thus,

$$\left\vert {\psi }_{0}({\phi }_{\alpha })\right\rangle =\hat{D}(\alpha ,{\phi }_{\alpha })\hat{S}(r,{\phi }_{r})\left\vert 0\right\rangle$$
(45)

where \(\hat{D}\) and \(\hat{S}\) represent the displacement and quadrature squeezing operators respectively. r is the degree of squeezing and α the magnitude of displacement, corresponding to the experiment.

To simulate the environmental interaction on the probe state, the photon loss and thermal noise in the system are modelled by coupling the probe mode to a thermal noise mode \(\left\vert \bar{n}\right\rangle\) (with annihilation operator \({\hat{{\bf{a}}}}_{th}\)) on a fictitious beamsplitter with a transmittivity of η (Fig. 8), which transforms the probe mode operator as

$$\hat{{\bf{a}}}\to \sqrt{\eta }\,\hat{{\bf{a}}}+\sqrt{1-\eta }\,{\hat{{\bf{a}}}}_{th}.$$
(46)
Fig. 8: Schematic outline of the simulated system.
figure 8

Corresponding to the efficiency of the experimental system, the beamsplitter was simulated with a transmittivity of η = 0.72 and no thermal photons (\(\bar{n}=0\)).

The phase noise in the measurement is modelled by encoding a random phase ϕp, sampled from a Gaussian distribution whose root-mean-square width corresponds to the experimental scheme. The variational algorithm is then simulated by using a gradient descent optimization scheme to minimize the cost function, as shown in Fig. 9. Note that choosing a suitable learning rate and initial parameters is necessary for the optimizer to converge to the minimum without requiring a large number of epochs.
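A minimal sketch of such a simulation, assuming PennyLane's "default.gaussian" device and its continuous-variable operations (the beamsplitter implements the loss model above; phase noise and sampling noise are omitted for brevity, and the phase derivatives in the cost are taken by central differences):

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.gaussian", wires=2)   # wire 1 is the loss ancilla
r, alpha, eta = 1.52, 5.2, 0.72                  # parameters quoted above

def probe(phi_alpha, phase):
    qml.Squeezing(r, 0.0, wires=0)
    qml.Displacement(alpha, phi_alpha, wires=0)
    qml.Rotation(phase, wires=0)                                  # phase to be sensed
    qml.Beamsplitter(np.arccos(np.sqrt(eta)), 0.0, wires=[0, 1])  # transmittivity eta

@qml.qnode(dev)
def mean_x(phi_alpha, phi_hd, phase=0.0):
    probe(phi_alpha, phase)
    return qml.expval(qml.QuadOperator(phi_hd, wires=0))

@qml.qnode(dev)
def var_x(phi_alpha, phi_hd, phase=0.0):
    probe(phi_alpha, phase)
    return qml.var(qml.QuadOperator(phi_hd, wires=0))

def cost(params, dphi=1e-4):
    # Inverse single-shot Fisher information, eq. (4).
    pa, phd = params
    dmu = (mean_x(pa, phd, dphi) - mean_x(pa, phd, -dphi)) / (2 * dphi)
    dv = (var_x(pa, phd, dphi) - var_x(pa, phd, -dphi)) / (2 * dphi)
    v = var_x(pa, phd)
    return 1.0 / (dmu ** 2 / v + dv ** 2 / (2 * v ** 2))

opt = qml.GradientDescentOptimizer(stepsize=0.05)
params = np.array([0.3, 0.3], requires_grad=True)
for _ in range(100):
    params = opt.step(cost, params)
```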

Fig. 9: Simulation of the gradient descent-based experiment.
figure 9

The experimental parameters are taken to be r = 1.52, α = 5.2, and phase noise RMS = 0.03 rad. The cost function represents the single-shot Fisher Information and does not take into account the number of samples used, hence the higher value compared to the experiment.

In Fig. 10 the cost function landscape is visualized by setting the simulation model with the experimentally determined parameters and probing the entire parameter space. The experimental data points from the kick-test described in the main text are superimposed on the landscape. We can observe that the optimizer effectively converges close to the theoretical minimum.

Fig. 10: Simulated cost function landscape.
figure 10

The landscape has the kick-measurements super-imposed. The points represent the cost for each epoch, with the colour intensity increasing progressively (the first epoch is depicted in white and the last epoch is the darkest blue). The black dotted line represents the theoretical optimum for the measurement angle ϕHD.

Bayesian optimization

Bayesian optimization is an iterative and gradient-free way of estimating the global minimum x* of some function f(x),

$${{\bf{x}}}_{* }=\mathop{{\rm{argmin}}}\limits_{{\bf{x}}}\,f({\bf{x}})$$
(47)

Throughout the main paper, x = [ϕα, ϕHD] and f(x) is the corresponding cost function value measured for those parameters. However, in many real-world experiments, f(x) might be corrupted with additive noise, such that the only measurable quantity y(x) is given by

$$y({\bf{x}})=f({\bf{x}})+\epsilon$$
(48)

where it is often assumed that \(\epsilon \sim {\mathcal{N}}(0,{\sigma }_{noise}^{2})\). For low-dimensional x (in our case only 2 dimensions) and expensive (e.g. in time or money) queries to y(x), a Bayesian optimizer is a very relevant candidate43. This is indeed the case for our experiment outlined in the main text. In order to estimate x*, the Bayesian optimization framework needs two quantities: 1) a surrogate function and 2) an acquisition function. The surrogate function “mimics” the observed datapoints y(x). Using this surrogate, the acquisition function chooses the next point to query by taking its maximizing argument. In our experiments, we choose a Gaussian Process as the surrogate and Expected Improvement as the acquisition function. In the next sections, we introduce these two quantities.

Gaussian Process

A Gaussian Process is a non-parametric model that probabilistically models a variable \(p({y}_{* }| {{\bf{x}}}_{* })\) with a normal distribution. It does so by conditioning on the corresponding input point x* as well as previously seen datapoints: a collection of N input/output datapoints \({({{\bf{x}}}_{n},{y}_{n})}_{n = 1}^{N}\), where xn is the n’th input and yn is the corresponding output. The input datapoints can be collected in an input matrix X and the corresponding outputs in a vector y. Specifically, the distribution over any output y*, which together with the corresponding input x* we refer to as a test point, is given by the normal distribution

$$p({y}_{* }| {{\bf{x}}}_{* },{\bf{X}},{\bf{y}})={\mathcal{N}}({{\boldsymbol{\mu }}}_{{y}_{* }| {{\bf{x}}}_{* },{\bf{X}},{\bf{y}}},{{\mathbf{\Sigma }}}_{{y}_{* }| {{\bf{x}}}_{* },{\bf{X}},{\bf{y}}}),$$
(49)
$${{\boldsymbol{\mu }}}_{{y}_{* }| {{\bf{x}}}_{* },{\bf{X}},{\bf{y}}}=K{({{\bf{x}}}_{* },{\bf{X}})}^{\top }{[K({\bf{X}},{\bf{X}})+{\sigma }_{n}^{2}I]}^{-1}{\bf{y}},$$
(50)
$${{\mathbf{\Sigma }}}_{{y}_{* }| {{\bf{x}}}_{* },{\bf{X}},{\bf{y}}}=K({{\bf{x}}}_{* },{{\bf{x}}}_{* })-K{({{\bf{x}}}_{* },{\bf{X}})}^{\top }{[K({\bf{X}},{\bf{X}})+{\sigma }_{n}^{2}I]}^{-1}K({\bf{X}},{{\bf{x}}}_{* }).$$
(51)

and where K(X, X) is called the kernel matrix. The kernel matrix is an N × N positive semi-definite matrix that contains pairwise similarity measures between the training points (vectors). Similarly, K(x*, X) is an N-dimensional vector of similarity measures between the test point and all the training points. It is easily verified from eqs. (50) and (51) that both \({{\boldsymbol{\mu }}}_{{y}_{* }| {{\bf{x}}}_{* },{\bf{X}},{\bf{y}}}\) and \({{\mathbf{\Sigma }}}_{{y}_{* }| {{\bf{x}}}_{* },{\bf{X}},{\bf{y}}}\) are scalars. For a full derivation, we refer to ref. 34.

A popular choice of similarity measure between two datapoints (represented as vectors) is the Radial Basis Function (RBF) Kernel (also called a Gaussian Kernel) given by

$${K}_{RBF}({\bf{x}},{{\bf{x}}}^{{\prime} })=\exp \left(-\frac{| | {\bf{x}}-{{\bf{x}}}^{{\prime} }| {| }^{2}}{2{\sigma }^{2}}\right)$$
(52)

which is a function whose similarity measure decays exponentially as the Euclidean distance between two points increases. The hyperparameter σ is called the length scale, as it defines how quickly the similarity decreases. In our experiments, we use a log-normal prior for the length scale and estimate it by maximizing the log-likelihood of the Gaussian Process. We also include an output scale hyperparameter σscale, such that the final kernel function is given by \({\sigma }_{scale}\cdot {K}_{RBF}({\bf{x}},{{\bf{x}}}^{{\prime} })\).
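Equations (50)-(52) translate directly into a few lines of numpy. The sketch below is a bare-bones illustration with placeholder hyperparameter values; a practical implementation would use a Cholesky factorization and a GP library to fit the hyperparameters:

```python
import numpy as np

def rbf_kernel(A, B, length=1.0, scale=1.0):
    # Pairwise RBF similarities between the rows of A and the rows of B, eq. (52).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return scale * np.exp(-d2 / (2 * length ** 2))

def gp_posterior(X, y, x_star, sigma_n=0.1, **kernel_kw):
    # Predictive mean and variance at a single test point, eqs. (50) and (51).
    K = rbf_kernel(X, X, **kernel_kw) + sigma_n ** 2 * np.eye(len(X))
    k_star = rbf_kernel(X, x_star[None, :], **kernel_kw)          # shape (N, 1)
    mu = k_star.T @ np.linalg.solve(K, y)
    var = (rbf_kernel(x_star[None, :], x_star[None, :], **kernel_kw)
           - k_star.T @ np.linalg.solve(K, k_star))
    return mu.item(), var.item()
```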

Acquisition function

We use 136 datapoints, referred to as our initial training set \({{\mathcal{D}}}_{0}:= [{\bf{X}},{\bf{y}}]\); using the above equations, we obtain the predictive distribution of the Gaussian Process. We refer to this Gaussian Process as our surrogate model, since it models the underlying loss landscape.

We now iteratively query points by finding the input point x* that maximizes an acquisition function. A popular choice of acquisition function to pair with a Gaussian Process surrogate model is the expected improvement, given by

$${\text{EI}}\,({{\bf{x}}}_{* })=(\mu ({{\bf{x}}}_{* })-f({{\bf{x}}}_{* }^{+}))\Phi (Z({{\bf{x}}}_{* }))+\sigma ({{\bf{x}}}_{* })\phi (Z({{\bf{x}}}_{* })),$$
(53)

where \(f({{\bf{x}}}_{* }^{+})\) is the current best guess of the global minimum and \({{\bf{x}}}_{* }^{+}\) the corresponding parameter setting, \(Z({{\bf{x}}}_{* })=(\mu ({{\bf{x}}}_{* })-f({{\bf{x}}}_{* }^{+}))/\sigma ({{\bf{x}}}_{* })\), Φ is the cumulative distribution function of a standard normal distribution, ϕ is the probability density function of a standard normal distribution, and μ(x*) and σ(x*) come from the surrogate predictive distribution. EI is based on calculating the expected marginal utility gain of the Gaussian Process after observing the candidate parameters44. The next queried input is thus given by

$${{\bf{x}}}_{next}=\mathop{{\rm{argmax}}}\limits_{{\bf{x}}}\,\,\text{EI}\,({\bf{x}})$$
(54)

Note that xnext is a parameter combination [ϕHD, ϕα] that we have not used before. This parameter set is now used in the experiment to obtain the corresponding loss value y(xnext). The dataset is then updated with this value to obtain \({{\mathcal{D}}}_{1}\), and so on. For a pedagogical illustration of this process, see Fig. 2.6 in ref. 34.
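On top of the gp_posterior helper sketched above, eqs. (53) and (54) then read as follows; the candidate grid and incumbent bookkeeping are illustrative choices, and for a cost that is minimized one can equivalently apply EI to the negated observations:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    # EI as written in eq. (53), with Z = (mu - f_best) / sigma.
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)

def next_point(X, y, candidates, f_best):
    # Eq. (54): evaluate EI over candidate [phi_HD, phi_alpha] pairs and
    # return the maximizing argument.
    scores = []
    for c in candidates:
        mu, var = gp_posterior(X, y, np.asarray(c))
        scores.append(expected_improvement(mu, np.sqrt(var), f_best))
    return candidates[int(np.argmax(scores))]
```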

Hyper-parameters of the Gaussian process

The hyperparameters of the Gaussian Process must be chosen very carefully: poorly chosen values can degrade the model’s performance or, in critical cases, prevent it from performing any optimization at all. Defining the hyperparameters involves setting the mean and standard deviation of the priors on the aforementioned length scale and output scale. These values can be well chosen (as for the results presented in the main section of the paper), too strict (too low), or too loose (too high). Figure 11a shows results for a too loose definition of the hyper-parameters, and Fig. 11b shows results for too strict values.

Fig. 11: Bayesian optimization with bad parameters.
figure 11

Demonstration of the Bayesian optimization over 50 epochs for (a) very high and (b) very low mean and variance of the hyper-parameters of the Gaussian Process. (Top) The phase angles set by the algorithm. (Middle) The measured cost function C = 1/F. The dotted line is the shot noise limit taking into account the number of photons in the measurement and the number of samples used to estimate the cost function. (Bottom) The measured quadrature mean values and variances. The dotted lines are the optimal values predicted by the theory (“section IIIB”).

In Fig. 11b we see that too strict hyperparameter values lead the optimizer to make very small changes to the angle parameters, causing it to get stuck in regions that are not necessarily optimal. This is supported by the relatively high values of the cost function, which rarely drop below the shot noise limit. In Fig. 11a we see that too loose values of the hyperparameter mean and variance result in significant changes of the angles between consecutive epochs, accompanied by considerable changes in the mean value of the X quadrature, its variance relative to the shot noise, and the values of the cost function. This behaviour is an example of a completely wrong choice of hyperparameters, for which the model cannot relate its prior to the observations gathered during measurements; the result is a “random walk” over the domain of angle parameters without effective optimization.

Both of these cases show the importance of specifying the model correctly and with meticulous attention.