Introduction

Neural network quantization is a technique for model compression through bit-width reduction of network parameters. It offers substantial memory footprint reduction, inference acceleration, and energy saving with minimal performance degradation1. Its efficacy has attracted numerous researchers to innovate in architecture and training strategies, such as distribution rectification distillation2, imbalanced quantization3, and zero-shot quantization4. Extensive research and engineering practice have demonstrated the effectiveness of neural network quantization in accelerating real-valued tasks and compressing real-valued networks5,6. However, for tasks involving complex-valued physical fields, the direct application of existing real-valued quantization operations often leads to a significant degradation in the quality of physical field reconstruction7,8,9.

Complex-valued neural networks possess strong capabilities in fitting complex physical fields, serving as potent tools for resolving intricate physical problems. They align closely with the mathematical essence of complex-valued physical fields. A complex number integrates the key information of a physical field (amplitude and phase) into a unified format. It encodes the relationship between phase and amplitude, thereby enabling the description of phenomena such as wave superposition and interference in electromagnetic waves, sound waves, quantum mechanics, and seismic waves10,11,12,13,14,15. Complex values possess elegant differential properties, allowing differential equation problems to be transformed into simple algebraic equations, thereby effectively reducing computational costs. In addition, the exponential form of complex numbers systematically captures oscillatory and periodic behaviors. Waves, vibrations, and modes within physical fields all exhibit periodicity, rendering their representation through complex numbers both highly natural and compact. For example, in fluid dynamics, complex-valued methods are highly valuable for the spectral analysis and boundary analysis of turbulence, laminar flow, and vortices16,17,18. In thermodynamics, complex numbers simplify the periodic behaviors in heat conduction problems into algebraic operations19.

Although quantization for real-valued neural networks is a mature field, it is not directly applicable to complex-valued networks. Prevailing approaches treat the real and imaginary components as independent channels for quantization, a method we term independent component quantization. This paradigm, while straightforward, is mathematically suboptimal for coherent systems. It ignores the algebraic coupling between real and imaginary components, leading to uncorrelated quantization errors that severely disrupt the phase relationship during multiplication. Consequently, it introduces non-physical noise that degrades amplitude and—more critically—phase fidelity, resulting in artifacts that limit model utility in phase-sensitive tasks like holography or synthetic aperture radar (SAR) imaging. This limitation stems from a fundamental divergence in how physical information is mathematically represented.
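The phase disruption described above can be illustrated numerically. The following is a minimal sketch, not the quantizer used in this work: it assumes a plain uniform rounding grid (the hypothetical `step` parameter) applied separately to the real and imaginary parts, and measures the resulting phase error of a complex product.

```python
import cmath
import math


def quantize(x, step):
    """Uniform rounding to a grid of spacing `step` (an illustrative assumption)."""
    return round(x / step) * step


def quantize_independent(z, step):
    """Quantize the real and imaginary parts as two unrelated real numbers."""
    return complex(quantize(z.real, step), quantize(z.imag, step))


def phase_error(z, w, step):
    """Absolute phase error (radians) of the product after independent quantization."""
    exact = z * w
    approx = quantize_independent(z, step) * quantize_independent(w, step)
    if approx == 0:
        return math.pi  # phase is undefined when the product collapses to zero
    return abs(cmath.phase(approx / exact))


# Two unit-magnitude phasors with hypothetical phases of 0.3 rad and 1.1 rad.
z = cmath.exp(1j * 0.3)
w = cmath.exp(1j * 1.1)

coarse = phase_error(z, w, step=0.5)   # very few levels per component
fine = phase_error(z, w, step=0.01)    # many levels per component
print(f"coarse grid: {coarse:.4f} rad, fine grid: {fine:.4f} rad")
```

On the coarse grid the rounding errors of the two components are unrelated, so the angle of each factor shifts and the shifts accumulate in the product.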

To bridge this gap, we propose a foundational rethinking of quantization for complex-valued neural networks. We introduce a universal framework that respects the mathematical properties of complex operations, ensuring quantization noise is structured to preserve phase coherence. As shown in Fig. 1, our core theoretical innovation is a joint real-imaginary quantization scheme that explicitly models the error propagation in complex multiplication, minimizing corruption of the resultant vector angle and magnitude. To further enhance efficiency, we augment this scheme with an adaptive-precision mechanism that dynamically allocates layer-wise bit-widths, guided by the sensitivity of the physical task-specific objective to phase and amplitude inaccuracies.

Fig. 1: Overview. The proposed complex-valued network quantization for ultra-efficient physical field computing.

The color image, “Green and yellow macaw bird” on the right-hand side, is originally posted to Unsplash by Andrew Li at https://unsplash.com/photos/gold-and-blue-macaw-on-brown-wooden-stick-iLVaLbfe9_g.

The contributions of this work are threefold. (1) We formally reveal the limitation of independent component quantization and establish the necessity for algebraically consistent quantization in complex-valued deep learning. (2) We devise a holistic quantization framework that jointly optimizes real and imaginary parts for amplitude and phase fidelity, ensuring the quantized network remains a physically valid approximator. (3) We demonstrate that this framework enables unprecedented efficiency while preserving performance. Our method generalizes across diverse domains—hologram generation, audio classification, wireless signal recognition, and SAR processing—achieving superior accuracy while reducing computational load and memory footprint. Notably, we demonstrate inference speedup on a mobile device, proving the practical feasibility of deploying high-fidelity, complex-valued scientific models at the edge.

By establishing a principled approach to complex-valued network compression, this work enables the development of lightweight AI models for computationally demanding scientific fields, from electromagnetics and thermodynamics to quantum physics.

Results

Complex-valued mixed-precision quantization

Our complex-valued mixed-precision quantization method comprises two training stages: identifying the optimal quantization bit width for each network layer and quantizing the network to the optimal bit widths, as shown in Fig. 2.

Fig. 2: The training procedure of the proposed complex-valued mixed-precision quantization block.

Our training strategy comprises two stages. a Training stage 1: Identifying the optimal quantization bit width of the parameters in each network layer. b Training stage 2: Quantizing the network to the optimal bit widths. \({a}_{{real}},{a}_{{imag}},{w}_{{real}}\), and \({w}_{{imag}}\) are the real and imaginary parts of the input activations and weights. \(\bar{{a}_{{real}}},\bar{{a}_{{imag}}},\bar{{w}_{{real}}}\), and \(\bar{{w}_{{imag}}}\) are the expected values of the quantized activations and weights. \({a}_{{out}}\) is the output of the complex-valued mixed-precision block. \({s}_{a}\) and \({s}_{w}\) are the quantization spacings for activations and weights. \(\sigma\) is the variance of the distribution of \(w\). \(f\left(a\right)\) and \(f\left(w\right)\) are the probability distributions of activations and weights. Conv denotes the real-valued convolution operator.

Training stage 1: Identifying the optimal quantization bit width of the parameters in each network layer. As illustrated in Fig. 2, this stage aims to determine the best quantization bit width for the real and imaginary parts of the activations and weights (\({a}_{{real}},{a}_{{imag}},{w}_{{real}}\), and \({w}_{{imag}}\)) of each layer. In this study, full precision is employed for the input and output layers. For the intermediate layers, we determine the optimal bit width within the range of 1 to 4 bits. We initialize four probabilities corresponding to selecting 1 to 4 bits, represented by learnable parameters (\({p}_{0},{p}_{1},{p}_{2}\), and \({p}_{3}\)). During training, these probabilities are continuously adjusted via gradient descent to minimize the network’s loss, which comprises task loss and complexity loss. Task loss assesses the quality of the quantized images, while complexity loss regulates the quantization bit width. Task loss encourages the selection of higher bit widths, whereas complexity loss favors lower bit widths. By minimizing the network loss, we can achieve a balance between image quality and network complexity. After this training stage, the values of \({p}_{0},{p}_{1},{p}_{2}\), and \({p}_{3}\) for each parameter are learnt. Among the four probabilities of one network parameter, the highest probability \({p}_{n}\) indicates that the optimal quantization bit width for this parameter is \(n+1\).
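The selection rule at the end of stage 1 can be sketched as follows. This is an illustrative example only: the function name `select_bit_widths` and the probability values are hypothetical, and the sketch covers just the final step of picking the bit width \(n+1\) with the highest learnt probability \({p}_{n}\), not the training that produces those probabilities.

```python
def select_bit_widths(layer_probs):
    """Map each tensor's learnt probabilities (p_0..p_3) to a bit width n + 1."""
    chosen = {}
    for name, probs in layer_probs.items():
        # Index of the largest probability; index n corresponds to n + 1 bits.
        n = max(range(len(probs)), key=lambda i: probs[i])
        chosen[name] = n + 1
    return chosen


# Hypothetical learnt probabilities for one intermediate layer.
layer_probs = {
    "a_real": [0.05, 0.15, 0.60, 0.20],
    "a_imag": [0.10, 0.55, 0.25, 0.10],
    "w_real": [0.70, 0.10, 0.10, 0.10],
    "w_imag": [0.05, 0.05, 0.10, 0.80],
}

bits = select_bit_widths(layer_probs)
print(bits)  # {'a_real': 3, 'a_imag': 2, 'w_real': 1, 'w_imag': 4}
```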

Training stage 2: Quantizing the network to the optimal bit widths. In the previous stage, the optimal quantization bit widths for each layer’s parameters \({a}_{{real}},{a}_{{imag}},{w}_{{real}}\), and \({w}_{{imag}}\) are identified. In this second stage, the network is reconstructed and retrained with these optimal bit widths. The network loss is minimized by tuning the network weights \({w}_{{real}}\) and \({w}_{{imag}}\). Here, the network loss only includes task loss to ensure high-quality image reconstruction. Ultimately, the quantized lightweight network is capable of generating high-quality holograms.

Complex-valued mixed-precision network for hologram generation

We take computer-generated holography (CGH)20,21,22,23,24,25,26,27,28 as a representative task for detailed methodological discussion and comparison. Hologram generation is highly sensitive to computational errors29,30,31,32,33, making it a suitable benchmark for evaluating the impact of quantization errors on model performance. Furthermore, due to the complexity of inverse complex-valued light field reconstruction and the conversion from complex-valued fields to phase-only encoding, hologram generation is generally a computationally intensive task34,35,36,37,38,39. Thus, it is necessary to develop corresponding compression techniques to enable efficient execution on resource-constrained devices. In this work, we propose a light-field-aware ultra-low bit network (ULBN) with complex-valued mixed-precision quantization. This network design has two features:

Light-field-aware mixed-precision quantization: In holography, small perturbations, such as quantization noise, in the complex field can propagate and amplify during wavefront reconstruction. To mitigate this problem, we retain full-precision representations in shallow layers that directly interact with the light field, where preserving fine-grained amplitude and phase information is critical. As the signal moves deeper into the network, where operations become more abstract and less sensitive to high-frequency detail, we apply lower precision using an adaptive mixed-precision scheme. This sensitivity-aware bit allocation improves efficiency without sacrificing reconstruction quality.

Light-field-aware loss function: Our reconstruction quality loss is computed on the complex field after light propagation. This means that any quantization errors introduced during forward propagation are penalized according to their impact on the reconstructed light field, thereby establishing an optical feedback mechanism within the optimization process.

The proposed network architecture is depicted in Fig. 3. Our light-field-aware ULBN takes the amplitude of the target image as input. The output is a phase-only hologram (POH). The ULBN is composed of three subnetworks, namely a phase generator, a POH encoder, and a ringing artifacts compensator. The phase generator network takes the target amplitude distribution as input and predicts the phase of the target field. The resultant target complex field propagates backward from the target display plane to the spatial light modulator (SLM) plane via the angular spectrum method (ASM). The ringing artifacts compensator is proposed in our previous work40. It is a plug-in neural network model that effectively reduces the ringing artifacts caused by the modeling error in the widely used forward and backward propagation CGH methods. The module generates a residual complex field to compensate for the two-fold diffraction propagation modeling error. The calculated residual complex field is added to the complex field at the SLM plane to generate the compensated complex field. The POH encoder transfers the complex-valued light field into a POH. The compensated complex light field and the predicted POH at the modulator plane propagate forward individually via the ASM model, generating two reconstructed amplitudes \({A}_{{rec\_compen}}\) and \({A}_{{rec\_poh}}\) at the target display plane.

Fig. 3: Structural diagram of our light-field-aware ultra-low bit network.

The proposed model consists of three subnetworks: a phase generator, a phase-only hologram (POH) encoder, and a ringing artifacts compensator. The subnetworks have downsampling and upsampling complex-valued mixed-precision blocks. FASM−1 and FASM correspond to opposite propagation distances and propagate optical fields using the ASM. The target amplitude image is originally posted to Flickr by laszlo-photo at https://www.flickr.com/photos/40467171@N00/4972189987.

Simulation results

The proposed ULBN network, U-Net35, Holo-encoder36, and HoloNet37 are evaluated on the DIV2K dataset41 with bit operations, memory, peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), learned perceptual image patch similarity (LPIPS), natural image quality evaluator (NIQE), floating point operations (FLOPs), and parameters (Params) in Fig. 4a–h, respectively. Because our method effectively compresses the complex-valued model to 1–4 bits, it achieves substantially lower computational cost and memory usage than other CGH algorithms. Our ULBN requires only 88.352 gigabit operations to generate the phase data of the holography experiment, which is 99.1% less than HoloNet37. The memory footprint is just 28.433 kilobytes, representing a 99.8% reduction compared to HoloNet. Despite these minimal resource demands, our method delivers the best overall performance across all evaluation metrics. For PSNR and SSIM, higher values indicate better quality, whereas for LPIPS and NIQE, lower values reflect greater perceptual similarity. With a PSNR of 30.75 dB, our ULBN surpasses HoloNet by about 4 dB. For the perceptual metrics, including SSIM, LPIPS, and NIQE, our approach still outperforms all other algorithms, delivering a better visual experience for viewers. Additional test results on DIV2K (Table S4), Kodak (Table S5), and Flickr2K (Table S6) datasets are provided in Section 4.3 of the Supplementary Information.

Fig. 4: Performance and computational comparison of different methods.

Comparison of U-Net, Holo-encoder, HoloNet, and our ULBN in terms of a bit operations, b memory, c PSNR, d SSIM, e LPIPS, f NIQE, g FLOPs, and h Params on the DIV2K dataset.

The visualized results are shown in Fig. 5a. Our ULBN produces the best reconstructions, free from speckle noise and artifacts across the entire image. The Gerchberg–Saxton (GS) algorithm42 requires iterative optimization for each input. The results of U-Net, Holo-encoder, and HoloNet contain noticeable artifacts that degrade the overall visual quality. Our quantized complex-valued network achieves improvements in both computational efficiency and reconstruction quality.

Fig. 5: Comparison of holographic image reconstruction methods and experimental setup.

a Simulated reconstruction of two-dimensional images. b Experimental optical reconstruction of two-dimensional images. c Experimental optical reconstruction of three-dimensional images. The image of the scenario is from the Couch dataset50. d Experimental optical reconstruction of binary images. e Schematic diagram of the experimental setup. f Photograph of the experimental setup. The ground truth images in the first column of (a) and (b) are the “Green and yellow macaw bird,” which is originally posted to Unsplash by Andrew Li at https://unsplash.com/photos/gold-and-blue-macaw-on-brown-wooden-stick-iLVaLbfe9_g. The other images in the first column of (a) and (b) are reproduced from the ground truth image. The images in (c) are reproduced from www.bigbuckbunny.org (© 2008, Blender Foundation) under a Creative Commons license (https://creativecommons.org/licenses/by/3.0/).

Experimental results

We conduct comprehensive optical experiments to evaluate the practical performance of our proposed methods on a benchtop prototype. Figure 5b–d shows the reconstructions captured with the optical setup in Fig. 5e and f.

The 2D holographic image simulation results are shown in Fig. 5a. For comparison, we present the corresponding experimental results in Fig. 5b. We compare the captured color images of GS, U-Net, Holo-encoder, HoloNet, and our proposed ULBN. For the GS algorithm, speckle noise is present in the optical reconstruction results, leading to noticeable image quality degradation. Our proposed ULBN method models the properties of complex values in its quantization process and adopts an adaptive complex-valued mixed-precision quantization strategy, effectively reducing light field computation quantization errors and resulting in smoother POHs. This approach significantly reduces the speckle noise problem in the reconstructions. Consequently, compared with the simulations, the optical reconstructions of our method show minimal degradation, and the speckle noise is mostly suppressed, ensuring higher quality and clearer images.

To validate the generalizability of our method across different datasets, we conducted experiments on a binary image dataset (Fig. 5d); enlarged results are provided in Section 4.4 of the Supplementary Information and Fig. S2. The results demonstrate that our approach consistently achieves high-resolution binary images. The checkerboard image displays distinct alternating dark and light stripes. The digits on the Indian head image are clearly discernible. Both the numbers and stripes in the USAF-1951 image are distinctly visible.

To demonstrate the capabilities of our network in three-dimensional holographic displays, we present a 3D light-field-aware network architecture in the Supplementary Information Section 5.1 and Fig. S3. The 3D ULBN requires 104 giga FLOPs and 620 gigabit operations for computation, and 335 kilobytes for memory. As illustrated in Fig. 5c, our experimental results show high-quality 3D images captured at different distances. Additional 3D holographic display results are presented in Figs. S4 and S5 of the Supplementary Information Sections 5.2 and 5.3. The camera is positioned at 19.5 cm, 20 cm, and 20.5 cm to capture pictures focusing on the hippopotamus, owl, and ring, respectively. The reconstructed images at varying distances demonstrate the ULBN’s capability of generating 3D holograms.

Model deployment

The models are deployed on representative hardware platforms, including a high-performance desktop CPU and a resource-constrained Android phone. Given the inadequate support for extremely low bit-width mixed-precision quantization of complex-valued models in current deployment frameworks, we utilize standard uniform INT8 quantization to measure on-device latency of quantized models. Despite the lack of mixed-precision strategies, the results still affirm a key advantage: the effectiveness of our real-imaginary joint quantization approach. For the desktop evaluation, the INT8 models are produced using the PyTorch post-training static quantization workflow (torch.ao.quantization) and executed in PyTorch 2.8 on an Intel Core i9-10980XE CPU. For the mobile evaluation, all the models are tested on an HONOR 70 smartphone with a Qualcomm Snapdragon 778G Plus processor, 12 GB RAM, and Android 12. The models are exported and quantized using PyTorch 2.8. For mobile deployment, we convert them to the ExecuTorch 0.7.0 format, leveraging the XNNPACK backend, a highly optimized neural network inference engine for ARM CPUs. The resulting ExecuTorch models are packaged into an Android application package (APK) using the standard Java-based Android build tools for deployment and benchmarking. The latency results are summarized in Table 1, with detailed results and analysis provided in Section 6 and Table S7 of the Supplementary Information.

Table 1 Latency of hologram generation models on a desktop CPU and an Android device. The best performance in latency is marked in bold

On the desktop CPU, our 8-bit quantized model achieves more than a 2× speedup over its 32-bit (unquantized) counterpart and substantially outperforms prior CGH methods such as HoloNet, Holo-encoder, and U-Net.

As shown in Table 1, the latency trends on the mobile platform are consistent with those observed on the CPU. Our 8-bit joint real-imaginary quantization method achieves the lowest latency and delivers a 389× speedup compared with HoloNet. These results demonstrate that quantization accelerates hologram generation effectively, without introducing noticeable degradation in output quality.

Generalization capability of ULBN on other complex-valued physical signals

To evaluate the generalization capability of ULBN, we apply it to three representative complex-valued scenarios: acoustics, wireless modulation, and SAR, as illustrated in Fig. 6 and Section 7 of the Supplementary Information.

Fig. 6: Our methods applied to diverse physical signals.

a Audio classification from the short-time Fourier transform (STFT) of raw audio clips. b Wireless modulation classification from complex-valued signals under noise. c SAR target recognition from complex-valued SAR data. In the subfigures, Conv, DSConv, Linear, BN, ReLU, MPool, and APool represent convolution, depthwise separable convolution, linear layer, batch normalization, rectified linear unit, max pooling, and average pooling, respectively.

The audio classification task aims to identify the speaker of acoustic signals. In this study, we utilize a subset of the LibriSpeech dataset43, which contains 28 speaker classes for training and testing. The structure of our baseline network is shown in Fig. 6a. Raw audio clips are transformed using the short-time Fourier transform (STFT) to obtain amplitude and phase representations, which are then processed by our ultra-low-bit and mixed-precision complex-valued network. The results are presented in Fig. 7a. Our quantized model achieves an accuracy of 98.93%, closely matching the full-precision complex-valued baseline (99.36%) and outperforming the real-valued model (97.65%). Crucially, the quantized network delivers substantial efficiency gains: the number of bit operations decreases by 85% relative to the complex-valued baseline, and memory usage decreases by more than 80%.

Fig. 7: Comparison of real-valued, complex-valued, and our quantized complex-valued networks across diverse physical signal tasks.

a Results on the audio classification task. b Results on the wireless modulation classification task. c Results on the SAR target recognition task.

For wireless modulation, complex-valued wireless signals are typically degraded by noise, fading, and frequency shifts, requiring accurate classification of the modulation mode. We use the RadioML 2016.10a dataset44, which provides simulated wireless signals and the corresponding modulation mode. The backbone architecture is illustrated in Fig. 6b. With the proposed quantization scheme, the network efficiently classifies modulation modes from complex-valued noisy inputs. Our results are illustrated in Fig. 7b. Compared with the complex-valued method, our quantized complex-valued model achieves comparable accuracy. Meanwhile, our method reduces bit operations by nearly 85% and memory usage by about 81%, indicating substantial efficiency gains with minor compromise in performance. Compared with the real-valued method, our method achieves better classification accuracy, with about a 41% reduction in bit operations and about a 67% reduction in memory.

For SAR target recognition, complex-valued microwave signals carrying information about objects are reflected from the ground and captured by receivers. We use the MSTAR dataset45 for training and testing. Figure 6c showcases the network for evaluation. Figure 7c summarizes the results. Our quantized complex-valued model achieves 97.98% accuracy, outperforming the real-valued baseline and closely approaching the full-precision complex-valued model. Meanwhile, our method substantially reduces bit operations by 87% and memory consumption by 80% compared to the complex-valued baseline. Relative to the real-valued model, our approach reduces bit operations by about 47% and memory usage by 59%, while delivering superior accuracy.

Discussion

In this study, we propose a universal quantization framework for complex-valued neural networks, offering an efficient approach to capture the intrinsic coupling relations and physical laws in complex physical fields. By integrating physics-aware adaptive precision training, our approach achieves high-quality outputs with minimal computational and memory overhead, making it suitable for deployment on resource-constrained devices. Extensive evaluations across hologram generation, audio classification, wireless signal classification, and synthetic aperture radar tasks demonstrate that our method achieves superior performance and substantial reductions in computational load and memory usage. Compared to the state-of-the-art model HoloNet, our ULBN achieves approximately 4 dB improvement in PSNR while reducing computational and memory costs by 99.1% and 99.8%, respectively. Real-world edge device deployment further validates its practicality. These results highlight the potential of lightweight complex-valued neural networks for scientific computing, providing a broadly applicable solution with implications that extend beyond computational optics to the broader fields of machine learning and computational physics.

Methods

Two training stages for ULBN

Training stage 1: Identifying the optimal quantization bit width for each network layer. Each complex-valued mixed-precision block receives an input \({a}_{{in}}\), which is the output from the preceding network layer. This input \({a}_{{in}}\) comprises real and imaginary components, expressed as \({a}_{{in}}={a}_{{real}}+j{a}_{{imag}}\). The identical quantization method is employed for both the real and imaginary parts. In this paper, \(a\) represents either \({a}_{{real}}\) or \({a}_{{imag}}\). We assume that \(a\) follows a Gaussian distribution. The comparative results presented in Sections 1 and 2 of the Supplementary Information demonstrate that Gaussian quantization (GQ) outperforms learning step quantization (Table S1) and uniform quantization (Table S2), consistent with the Gaussian hypothesis. Firstly, as shown in Eq. (1) and visualized in Fig. 2, the Half-Wave Gaussian Quantization (HWGQ) method is utilized to perform ReLU activation and quantization simultaneously46,47,48. The use of HWGQ is motivated by the prior assumption that the activations follow a Gaussian distribution, combined with the ReLU activation function. The probability distribution of the negative part of \(a\) becomes zero after this operation, resulting in a half-wave Gaussian distribution for \(a\). Because HWGQ includes activation and quantization processes, it is called activation quantization in this work and is calculated by:

$${{{\rm{HWGQ}}}}\left(a,b\right)={{{\rm{clip}}}}\left\{{{{\rm{round}}}}\left(\frac{a}{{s}_{a}}\right)\cdot {s}_{a},0,\left({2}^{b}-1\right)\cdot {s}_{a}\right\},$$
(1)

where \({{\rm{clip}}}\{\cdot,\cdot,\cdot \}\) is a truncation function. Its first argument is the quantization of activations, which generates discrete values. Its second and third arguments are the lower and upper bounds of the quantization range. \(b\) denotes the bit width after quantization. Here, \(b\in \{{{\mathrm{1,2,3,4}}}\}\). \({s}_{a}\) is the quantization spacing. The value of \({s}_{a}\) is given by Lloyd’s algorithm (\({s}_{a}\in \{{{\mathrm{0.799,0.538,0.3217,0.185}}}\}\), corresponding to the four bit widths)46. The learnable parameters \({p}_{s}^{a}\) represent the probability of selecting the bit width \(s+1\) as the optimal activation quantization bit width, where \(s\in \{{{\mathrm{0,1,2,3}}}\}\). For each activation parameter, the highest probability \({p}_{s}^{a}\) indicates that its corresponding quantization bit width \(s+1\) is optimal. Since the values of probability \({p}_{s}^{a}\) should lie between 0 and 1, it is obtained by normalizing another learnable parameter \(\alpha\). The probabilities are calculated by:

$${p}_{s}^{a}=\frac{\exp \left({\alpha }_{s}\right)}{{\sum }_{m=0}^{{||}{B}^{a}{||}-1}\exp \left({\alpha }_{m}\right)},$$
(2)

where \({B}^{a}\) is the set of possible bit widths for activation. \(s\) and \(m\) are the indexes of \({{{\rm{\alpha }}}}\) and \({p}_{s}^{a}\). Here, \({B}^{a}=\{{{\mathrm{1,2,3,4}}}\}\). \(s\) and \(m\in \{{{\mathrm{0,1,2,3}}}\}\). Notably, \({p}_{0}^{a}+{p}_{1}^{a}+{p}_{2}^{a}+{p}_{3}^{a}=1\). The expected value \(\bar{a}\) of the quantized activations \({{\rm{HWGQ}}}\left(a,{b}_{s}^{a}\right)\) is calculated as a weighted sum across different quantization bit widths, with each weight given by the probability \({p}_{s}^{a}\).

$$\bar{a}=\sum\limits_{s=0}^{{||}{B}^{a}{||}-1}{{{{\rm{p}}}}}_{s}^{a}\cdot {{{\rm{HWGQ}}}}\left(a,{{{{\rm{b}}}}}_{s}^{a}\right),$$
(3)

where \(\bar{a}\) represents either \(\bar{{a}_{{real}}}\) or \(\bar{{a}_{{imag}}}\). \({b}_{s}^{a}\) \(\in {B}^{a}\).
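Equations (1)–(3) can be traced with a short scalar sketch. It is a minimal illustration under stated assumptions rather than the implementation used in this work: the function names are hypothetical, the input is assumed to be already normalized so that the listed spacings apply, the spacings are assumed to map to bit widths 1 to 4 in the order given, and Python's built-in `round` (which rounds ties to the nearest even integer) stands in for the rounding operator.

```python
import math

# Quantization spacings s_a from the text, assumed to map to bit widths 1..4 in order.
S_A = {1: 0.799, 2: 0.538, 3: 0.3217, 4: 0.185}


def hwgq(a, b):
    """Eq. (1): round to the grid of spacing s_a, then clip to [0, (2^b - 1) * s_a]."""
    s = S_A[b]
    q = round(a / s) * s
    return min(max(q, 0.0), (2 ** b - 1) * s)


def softmax(alpha):
    """Eq. (2): normalize the learnable parameters alpha into probabilities."""
    m = max(alpha)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in alpha]
    total = sum(exps)
    return [e / total for e in exps]


def expected_activation(a, alpha):
    """Eq. (3): probability-weighted sum of HWGQ outputs over bit widths 1..4."""
    p = softmax(alpha)
    return sum(p[s] * hwgq(a, s + 1) for s in range(len(alpha)))


alpha = [0.0, 0.0, 0.0, 0.0]  # hypothetical initial values: all bit widths equally likely
print(hwgq(-1.0, 2))          # negative inputs are clipped to zero (the ReLU part)
print(hwgq(10.0, 1))          # large inputs saturate at (2^1 - 1) * 0.799
print(expected_activation(0.5, alpha))
```

With equal `alpha` values the expected activation is the plain average of the four quantized outputs; as training sharpens the probabilities, it approaches the output of a single bit width.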

Secondly, quantization of weights \(w\) is performed. The symbol \(w\) represents either \({w}_{{real}}\) or \({w}_{{imag}}\). As visualized in Fig. 2, unlike the activation quantization, the range of quantized weights extends from \(-\infty\) to \(\infty\). The Gaussian Quantization method46,47,48 is adopted to quantize the network weights. Similar to HWGQ, the use of GQ is motivated by the assumption that the weights \(w\) follow a Gaussian distribution. The weights after quantization are calculated by:

$${{{\rm{GQ}}}}\left(w,b\right)={{{\rm{clip}}}}\left\{\left({{{\rm{round}}}}\left(\frac{w}{{\sigma s}_{w}}+\frac{1}{2}\right)-\frac{1}{2}\right)\cdot {\sigma s}_{w},-\left(\frac{{2}^{b}-1}{2}\right)\cdot {\sigma s}_{w},\left(\frac{{2}^{b}-1}{2}\right)\cdot {\sigma s}_{w}\right\},$$
(4)

where \(b\) denotes the bit width. \({s}_{w}\) is the quantization spacing. The value of \({s}_{w}\) is given by Lloyd’s algorithm (\({s}_{w}\in \{{{\mathrm{1.596,0.996,0.586,0.336}}}\}\))46. \({{{\rm{\sigma }}}}\) is the variance of the distribution of \(w\). The probability associated with a quantized weight is:

$${p}_{t}^{w}=\frac{\exp \left({\beta }_{t}\right)}{{\sum }_{n=0}^{{||}{B}^{w}{||}-1}\exp \left({\beta }_{n}\right)},$$
(5)

where \({B}^{w}\) is the set of possible bit widths for weights. \(t\) and \(n\) are the indices of learnable parameters \(\beta\) and \({p}_{t}^{w}\). \(t\), \(n\) \(\in\) \(\{0,1,2,3\}\). The expected values \(\bar{w}\) of the quantized weights \({{\rm{GQ}}}\left(w,{b}_{t}^{w}\right)\) are formulated by:

$$\bar{w}=\sum\limits_{t=0}^{{||}{B}^{w}{||}-1}{{{{\rm{p}}}}}_{t}^{w}\cdot {{{\rm{GQ}}}}\left(w,{{{{\rm{b}}}}}_{t}^{w}\right),$$
(6)

where \(\bar{w}\) represents either \({\overline{{w}_{{real}}}}\) or \({\overline{{w}_{{imag}}}}\). \({b}_{t}^{w}\) \(\in\) \({B}^{w}\). The details of the learnt optimal quantization bit widths of each layer are presented in the Supplementary Information Section 1.
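As an illustration of Eqs. (4) to (6), the following NumPy sketch quantizes a weight vector at each candidate bit width and mixes the results with softmax probabilities. It is a minimal sketch and not the training code of this work: the function names `gq`, `softmax`, and `expected_weight` are hypothetical, the assignment of the four listed step sizes to 1 to 4 bits in that order is an assumption, and \(\sigma\) is estimated with `np.std` as an assumption about how the scale of \(w\) is obtained.

```python
import numpy as np

# Step sizes s_w from the text; mapping them to b = 1..4 in this order is an assumption.
STEP_SIZES = {1: 1.596, 2: 0.996, 3: 0.586, 4: 0.336}

def gq(w, b, sigma):
    """Eq. (4): quantize w onto a half-integer grid of step sigma * s_w, then clip."""
    step = sigma * STEP_SIZES[b]
    # np.round resolves exact ties with round-half-to-even; either neighbor is a nearest level.
    q = (np.round(w / step + 0.5) - 0.5) * step
    limit = (2 ** b - 1) / 2 * step
    return np.clip(q, -limit, limit)

def softmax(beta):
    """Eq. (5): probabilities over candidate bit widths (shifted for numerical stability)."""
    e = np.exp(beta - np.max(beta))
    return e / e.sum()

def expected_weight(w, beta, bit_widths=(1, 2, 3, 4)):
    """Eq. (6): probability-weighted sum of the quantized weights."""
    sigma = np.std(w)  # assumed estimate of the scale of w
    p = softmax(np.asarray(beta, dtype=float))
    return sum(p_t * gq(w, b, sigma) for p_t, b in zip(p, bit_widths))

w = np.array([-2.0, -0.3, 0.1, 0.7, 1.9])
w_bar = expected_weight(w, beta=[0.0, 0.0, 0.0, 0.0])  # equal probability for each bit width
```

Because the mixture in `expected_weight` is a smooth function of `beta`, the bit-width preference can be learned by gradient descent together with the weights, which is the role Eq. (6) plays in Stage 1.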

Thirdly, we propose a fusion method for quantizing complex-valued neural networks, which leverages the commutative and associative properties of complex multiplication. The output of this network layer \({a}_{{out}}\) is represented as:

$$\begin{array}{rcl}{a}_{{out}}&=&{\overline{{a}_{{complex}}}}*{\overline{{w}_{{complex}}}}\\ &=&\left({\overline{{a}_{{real}}}}*{\overline{{w}_{{real}}}}-{\overline{{a}_{{imag}}}}*{\overline{{w}_{{imag}}}}\right)+j\left({\overline{{a}_{{real}}}}*{\overline{{w}_{{imag}}}}+{\overline{{a}_{{imag}}}}*{\overline{{w}_{{real}}}}\right),\end{array}$$
(7)

where \({\overline{{a}_{{complex}}}}={\overline{{a}_{{real}}}}+j{\overline{{a}_{{imag}}}}\) and \({\overline{{w}_{{complex}}}}={\overline{{w}_{{real}}}}+j{\overline{{w}_{{imag}}}}\). The operator \(*\) indicates convolution.
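The decomposition in Eq. (7) can be checked numerically. The sketch below is illustrative only: it uses one-dimensional `np.convolve` instead of the two-dimensional convolution layers of the network, random data in place of quantized activations and weights, and a hypothetical helper name `complex_conv_from_real`.

```python
import numpy as np

def complex_conv_from_real(a_real, a_imag, w_real, w_imag):
    """Eq. (7): assemble a complex convolution from four real-valued convolutions."""
    out_real = np.convolve(a_real, w_real) - np.convolve(a_imag, w_imag)
    out_imag = np.convolve(a_real, w_imag) + np.convolve(a_imag, w_real)
    return out_real + 1j * out_imag

rng = np.random.default_rng(0)
a = rng.normal(size=8) + 1j * rng.normal(size=8)  # stand-in for complex activations
w = rng.normal(size=3) + 1j * rng.normal(size=3)  # stand-in for complex weights

fused = complex_conv_from_real(a.real, a.imag, w.real, w.imag)
reference = np.convolve(a, w)  # direct complex convolution
print(np.allclose(fused, reference))
```

The identity relies only on the linearity of convolution in each argument, so it also holds for the cross-correlation that deep learning frameworks usually call convolution. Each of the four real-valued terms can therefore run on a low-bit real-valued kernel.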

Training stage 2: Quantizing the network to the optimal bit widths. Before deployment, the network is retrained and quantized to the optimal bit widths obtained from Stage 1. Similarly, this stage is divided into three parts, namely quantization of activations, quantization of weights, and complex-valued convolution of activations and weights. Firstly, the real and imaginary parts of the activations are quantized to the best bit widths \({b}_{{opt}}^{a}\) with outputs \({{\rm{HWGQ}}}\left({a}_{{real}},{b}_{{opt}}^{a}\right)\) and \({{\rm{HWGQ}}}\left({a}_{{imag}},{b}_{{opt}}^{a}\right)\). Secondly, the real and imaginary parts of the weights are quantized to the optimal bit widths \({b}_{{opt}}^{w}\) with the outputs \({{\rm{GQ}}}\left({w}_{{real}},{b}_{{opt}}^{w}\right)\) and \({{\rm{GQ}}}\left({w}_{{imag}},{b}_{{opt}}^{w}\right)\). Thirdly, the complex-valued quantized activations and the complex-valued quantized weights are involved in the convolution calculation. The output of the network layer \({a}_{{out}}^{{opt}}\) is formulated as:

$$\begin{array}{c}{a}_{{out}}^{{opt}}=\left\{{{{\rm{HWGQ}}}}\left({a}_{{real}},{b}_{{opt}}^{a}\right)*{{{\rm{GQ}}}}\left({w}_{{real}},{b}_{{opt}}^{w}\right)-{{{\rm{HWGQ}}}}\left({a}_{{imag}},{b}_{{opt}}^{a}\right)*{{{\rm{GQ}}}}\left({w}_{{imag}},{b}_{{opt}}^{w}\right)\right\}\\+j\left\{{{{\rm{HWGQ}}}}\left({a}_{{real}},{b}_{{opt}}^{a}\right)*{{{\rm{GQ}}}}\left({w}_{{imag}},{b}_{{opt}}^{w}\right)+{{{\rm{HWGQ}}}}\left({a}_{{imag}},{b}_{{opt}}^{a}\right)*{{{\rm{GQ}}}}\left({w}_{{real}},{b}_{{opt}}^{w}\right)\right\},\end{array}$$
(8)

Training configuration

The loss function is formulated with the objectives of enhancing hologram quality and determining the optimal bit width. Task loss promotes the selection of higher bit widths to enhance image quality, while complexity loss promotes lower bit widths to reduce network complexity. By minimizing the overall network loss, an optimal balance between image quality and network complexity is achieved. The total loss function \({{{{\mathcal{L}}}}}_{{total}}\) is expressed by:

$${{{{\mathcal{L}}}}}_{{total}}={{{{\mathcal{L}}}}}_{{task}}+\mu {{{{\mathcal{L}}}}}_{{complexity}},$$
(9)

where \({{{{\mathcal{L}}}}}_{{task}}\) and \({{{{\mathcal{L}}}}}_{{complexity}}\) denote the task loss and complexity loss, respectively. \(\mu\) is the hyperparameter to control the complexity loss.

The task loss \({{{{\mathcal{L}}}}}_{{task}}\) represents the discrepancy between the reconstructed image and the target image. It is given by:

$$\begin{array}{rcl}{{{{\mathcal{L}}}}}_{{task}}&=&{{{{\mathcal{L}}}}}_{{mse}}\left({A}_{{rec}{{{\rm{\_}}}}{poh}},{A}_{{target}}\right)+{\lambda }_{{com}}{{{{\mathcal{L}}}}}_{{mse}}\left({A}_{{rec}{{{\rm{\_}}}}{compen}},{A}_{{target}}\right)\\ &&+{{{{\mathcal{L}}}}}_{{vgg}}\left({A}_{{rec}{{{\rm{\_}}}}{poh}},{A}_{{target}}\right)+{\lambda }_{{com}}{{{{\mathcal{L}}}}}_{{vgg}}\left({A}_{{rec}{{{\rm{\_}}}}{compen}},{A}_{{target}}\right),\end{array}$$
(10)

where \({{{{\mathcal{L}}}}}_{{mse}}\) represents the pixel-wise mean squared error (MSE) between the target image and the reconstructed image. When the pixel-wise error approaches zero, the gradient of this loss is too small to properly cover high-frequency variations, resulting in blurred image reconstruction. Therefore, we also include a perceptual loss. It encourages generating natural and perceptually pleasing images based on high-level features extracted from pre-trained networks like the Visual Geometry Group (VGG) Net49. \({{{{\mathcal{L}}}}}_{{vgg}}\) is the Euclidean distance between the VGG-Net’s hidden-layer feature representations of the reconstructed image and the target image. \({A}_{{target}}\) is the amplitude of the target image. \({A}_{{rec}{{{\rm{\_}}}}{poh}}\) is the amplitude of the reconstructed image from POH. \({\lambda }_{{com}}\) is a parameter to adjust the weight of the ringing artifacts compensation module. \({{{{\mathcal{L}}}}}_{{mse}}({A}_{{rec}{{{\rm{\_}}}}{compen}},{A}_{{target}})\) represents the loss between the target image \({A}_{{target}}\) and the reconstructed amplitude from the compensated complex light field \({A}_{{rec}{{{\rm{\_}}}}{compen}}\). \({{{{\mathcal{L}}}}}_{{mse}}({A}_{{rec}{{{\rm{\_}}}}{poh}},{A}_{{target}})\) represents the loss between the target image \({A}_{{target}}\) and the reconstructed amplitude from the predicted POH \({A}_{{rec}{{{\rm{\_}}}}{poh}}\). The coexistence of the two losses decouples the role of the ringing artifacts compensation network and the POH encoder, reducing the learning burden of the network.

The objective of the complexity loss \({{{{\mathcal{L}}}}}_{{complexity}}\) is to minimize the computational cost of network inference. It is proportional to the total bit operations across all network layers. The complexity loss of each layer is related to the expected value of the quantized parameter bit widths, the number of channels, as well as the output and convolution kernel sizes. The calculation of complexity loss is given as follows20:

$${{{{\mathcal{L}}}}}_{{complexity}}=\sum\limits_{l=0}^{L-1}\left(\sum\limits_{t=0}^{{||}{B}^{a}{||}-1}\left({p}_{t}^{l,a}{b}_{t}^{a}\right)\cdot \sum\limits_{s=0}^{{||}{B}^{w}{||}-1}\left({p}_{s}^{l,w}{b}_{s}^{w}\right)\right)\cdot {{comp}}^{l},$$
(11)
$${{comp}}^{l}=\frac{{c}_{{in}}^{l}\cdot {c}_{{out}}^{l}\cdot {h}_{k}^{l}\cdot {w}_{k}^{l}\cdot {h}_{{out}}^{l}\cdot {w}_{{out}}^{l}}{{\left({s}^{l}\right)}^{2}},$$
(12)

where \(L\) represents the total number of layers and \(l\) is the layer index. \({p}_{t}^{l,a}\) and \({p}_{s}^{l,w}\) represent the probability of selecting the bit width \(t+1\) or \(s+1\) as the optimal quantization bit width for the activation \(a\) or weight parameter \(w\) of the layer \(l\). The values of \({p}_{t}^{l,a}\) and \({p}_{s}^{l,w}\) are obtained from Eqs. (2) and (5). The symbol \({b}_{t}^{a}\) or \({b}_{s}^{w}\) represents the candidate bit width from the set \({B}^{a}\) or \({B}^{w}\) for the activation or weight parameter. \({com}{p}^{l}\) is the complexity factor of the layer \(l\). \({c}_{{in}}^{l}\) and \({c}_{{out}}^{l}\) are the number of input channels and the number of output channels of the layer \(l\). \({h}_{k}^{l}\) and \({w}_{k}^{l}\) are the height and width of the convolution kernel in the layer \(l\). \({h}_{{out}}^{l}\) and \({w}_{{out}}^{l}\) are the height and width of the outputs in the layer \(l\). \({s}^{l}\) is the filter stride of the layer \(l\).
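To make Eqs. (11) and (12) concrete, the sketch below evaluates the complexity loss for a toy two-layer configuration. It is a minimal sketch under stated assumptions: the layer shapes and logits are invented for illustration, the names `layer_complexity` and `complexity_loss` are hypothetical, and the factor \({com}{p}^{l}\) is read as lying inside the sum over \(l\).

```python
import numpy as np

def softmax(x):
    """Probabilities over candidate bit widths, as in Eqs. (2) and (5)."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def layer_complexity(c_in, c_out, h_k, w_k, h_out, w_out, stride):
    """Eq. (12), evaluated exactly as written in the text."""
    return c_in * c_out * h_k * w_k * h_out * w_out / stride ** 2

def complexity_loss(layers, bit_widths=(1, 2, 3, 4)):
    """Eq. (11): expected activation bits times expected weight bits times comp^l, summed over layers."""
    b = np.asarray(bit_widths, dtype=float)
    total = 0.0
    for layer in layers:
        exp_bits_a = softmax(np.asarray(layer["alpha"], dtype=float)) @ b
        exp_bits_w = softmax(np.asarray(layer["beta"], dtype=float)) @ b
        total += exp_bits_a * exp_bits_w * layer_complexity(**layer["shape"])
    return total

# Hypothetical layer configurations, not taken from the network in this work.
shape_1 = dict(c_in=2, c_out=2, h_k=3, w_k=3, h_out=4, w_out=4, stride=1)
shape_2 = dict(c_in=2, c_out=2, h_k=3, w_k=3, h_out=4, w_out=4, stride=2)
layers = [
    {"alpha": [0, 0, 0, 0], "beta": [0, 0, 0, 0], "shape": shape_1},
    {"alpha": [0, 0, 0, 0], "beta": [0, 0, 0, 0], "shape": shape_2},
]
loss = complexity_loss(layers)
```

With uniform logits the expected bit width is 2.5 for both activations and weights. Shifting probability mass toward lower bit widths reduces the loss, which is the pressure that the term \(\mu {{{{\mathcal{L}}}}}_{{complexity}}\) adds to the total loss.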

According to our ablation study of network structures in the Supplementary Information Section 3 and Table S3, our ULBN network is designed with full-precision layers (FPL) at the input and output, along with quantization-aware training (QAT) and mixed-precision quantization (MPQ) of 1-4 bits in the intermediate layers, for high-quality holography at a reasonable computational cost. The model is trained using Python 3.8 on Ubuntu 20.04, with an NVIDIA A40 GPU and an AMD EPYC 7543 32-Core Processor. In both stages, the models are trained with a learning rate of \({10}^{-3}\) and a batch size of 1. The weights for reconstruction loss, compensation loss, and complexity regularization are 1.0, 0.1, and \({10}^{-6}\), respectively. The training on this hardware takes about one hour. The details of the quantization step size estimation method (Algorithm S1) and the bit widths of each network layer (Fig. S1) are in Sections 4.1 and 4.2 of the Supplementary Information.

Experimental details

In the experiment, a FISBA READYBeam emitting red light at 638 nm, green light at 520 nm, and blue light at 450 nm is utilized, as illustrated in Fig. 5e and f. The phase-only SLM is a HOLOEYE LETO-3-CF5−127 LCD modulator with a resolution of 1920 × 1080 and a pixel pitch of 6.4 μm. A Sony A7M3 camera serves as the receiver. The color holographic display is realized by time-multiplexing the red, green, and blue laser sources.