Scaling up for end-to-end on-chip photonic neural network inference

Wu, Bo; Huang, Chaoran; Zhang, Jialong; Zhou, Hailong; Wang, Yilun; Dong, Jianji; Zhang, Xinliang

doi:10.1038/s41377-025-02029-z

Download PDF

Article
Open access
Published: 17 September 2025

Scaling up for end-to-end on-chip photonic neural network inference

Light: Science & Applications volume 14, Article number: 328 (2025) Cite this article

12k Accesses
12 Citations
13 Altmetric
Metrics details

Subjects

Abstract

Optical neural networks are emerging as a competitive alternative to their electronic counterparts, offering distinct advantages in bandwidth and energy efficiency. Despite these benefits, scaling up on-chip optical neural networks for end-to-end inference is facing significant challenges. First, network depth is constrained by the weak cascadability of optical nonlinear activation functions. Second, the input size is constrained by the scale of the optical matrix. Herein, we propose a scaling up strategy called partially coherent deep optical neural networks (PDONNs). By leveraging an on-chip nonlinear activation function based on opto-electro-opto conversion, PDONN enables network depth expansion with positive net gain. Additionally, convolutional layers achieve rapid dimensionality reduction, thereby allowing for an increase in the accommodated input size. The use of a partially coherent optical source significantly reduces reliance on narrow-linewidth laser diodes and coherent detection. Owing to their broader spectral characteristics and simpler implementation, such sources are more accessible and compatible with scalable integration. Benefiting from these innovations, we designed and fabricated a monolithically integrated optical neural network with the largest input size and the deepest network depth, comprising an input layer with a size of 64, two convolutional layers, and two fully connected layers. We successfully demonstrate end-to-end two-class classification of fashion images and four-class classification of handwritten digits with accuracies of 96% and 94%, respectively, using an in-situ training method. Notably, performance is well maintained with partially coherent illumination. This proposed architecture represents a critical step toward realizing energy-efficient, scalable, and widely accessible optical computing.

An on-chip photonic deep neural network for image classification

Article 01 June 2022

Single-chip photonic deep neural network with forward-only training

Article 02 December 2024

Deep learning with coherent VCSEL neural networks

Article 17 July 2023

Introduction

Artificial intelligence is a driving force behind technological advancements, with neural networks serving as its foundational cornerstone. However, these networks require substantial energy to operate, underscoring their significant power consumption¹. Optical neural networks (ONNs), proposed within the last decade, have emerged as a promising alternative to their electronic counterparts². This promise lies in their inherent advantages, including the low energy consumption of optical propagation and access to high-bandwidth resources across multiple parallel dimensions^3,4. End-to-end ONNs refer to architectures that directly map input information to inference results in the optical domain^5,6. This streamlined approach eliminates the need for separate preprocessing, information relay, or post-processing stages in the electrical domain, thereby reducing energy consumption and latency. Various paradigms of ONNs have been proposed, broadly categorized into spatial diffraction-based and on-chip schemes⁷. Spatial diffraction-based networks enable connections among millions of spatial sampling points in the weight matrix^8,9,10. However, they are typically bulky, with a low integration density, and their performance is constrained by the refresh rate of spatial light modulators or digital micromirror devices, which is usually limited to kilohertz or megahertz-level frequencies⁸. In contrast, on-chip schemes leverage linear weight implementations through mechanisms such as Mach-Zehnder interferometer (MZI) meshes^{11,12,13,14,15}, micro-ring resonator (MRR) weight banks^16,17, crossbar arrays^18,19, or on-chip diffraction^12,20,21,22. These on-chip implementations overcome the refresh rate limitations of spatial diffraction schemes, achieving significantly higher rates—up to tens of gigahertz²³.

Although significant efforts have been made to develop various architectures for on-chip ONNs, the majority of these designs utilize the optical chip merely as an operator within a complex neural network model, falling short of achieving end-to-end optical inference. Recent advancements have made significant strides in end-to-end ONNs, showcasing exceptional energy efficiency and minimal latency^5,6. However, scaling up these architectures remains challenging due to constraints on network depth and the limited size of the on-chip optical matrix (Fig. 1a). Nonlinear activation functions (NAFs) are essential for the flexible functionality of deep neural networks and can be implemented in either the electrical or optical domain. Electrical nonlinearity involves converting signals between optical (analog) and electrical (digital) domains, which not only precludes end-to-end optical inference but also introduces significant energy consumption and latency²⁴. A competent optical NAF must meet several key requirements, including cascadability (offering positive net gain), compatibility with massive production processes, low energy consumption, and high bandwidth^{25,26,27,28,29,30}. Among various integrated schemes, opto-electro-opto (O-E-O) nonlinearity stands out due to its strong cascadability and compatibility with scalable production processes^5,6,17. However, demonstrations of on-chip deep ONNs to date have either failed to achieve positive net gain or have relied on external electrical amplifiers, limiting their efficiency and scalability^5,6. Besides, the size of the on-chip optical matrix is constrained by the inherent integration density of optical devices and the reliance on narrow-linewidth lasers. Narrow-linewidth lasers require either an increased number of wavelengths for incoherent computing or scalable coherent detection for coherent computing, producing additional complexity^6,31,32. In contrast, partially coherent optical sources offer significant advantages, such as reduced feedback control and thermal management requirements, greater parallelism, and enhanced robustness³². These benefits facilitate the scaling up of on-chip optical matrices.

**Fig. 1: Comparison of scaling limitations in existing ONNs and the proposed solution.**

In this work, we introduce a partially-coherent deep optical neural network (PDONN) chip to overcome the scaling up issue (Fig. 1b). The architecture comprises an input layer with a record scale of 64, two convolutional layers, and two fully connected layers. The convolution layers enable rapid data dimensionality reduction, effectively overcoming the limitations of input size. The proposed O-E-O nonlinearity facilitates scalable cascading across multiple hidden layers with positive net gain. Notably, we demonstrate that PDONN can effectively utilize a partially coherent source for both linear and nonlinear operations, achieving performance comparable to the same architecture employing a narrow-linewidth laser diode. Furthermore, the network is configured with an in-situ training method, which enhances its robustness against fabrication errors and improves its generalization ability across different applications. Our results demonstrate excellent end-to-end classification accuracy on benchmark tasks, underscoring the advantages of PDONN in terms of computational power, energy efficiency, and adaptability. This advancement highlights the potential of the proposed architectures for next-generation optical computing applications.

Results

Principle of the PDONN chip

The mathematical framework of the proposed optical deep neural network is illustrated in Fig. 2a. The process begins with an input image (8 × 8), which is convolved with a kernel (2 × 2) using a stride of 2. After the NAF is applied, the image’s dimensions are reduced to 4 × 4. Subsequently, two kernels of the same size (2 × 2) are separately applied to the 4 × 4 image, producing an 8 × 1 vector that serves as the input to two fully connected layers. The output vector, where the highest value indicates the classification result, completes the end-to-end inference. Overall, this architecture is a standard convolutional neural network, and the following sections detail its implementation on a fully-integrated silicon photonic chip.

**Fig. 2: The overall architecture of the proposed optical neural network.**

The input image is reshaped into a one-dimensional vector (64 × 1) and encoded onto the optical source. To enable high-speed refreshing of input information, an intensity modulator (IM) based on carrier injection is employed as the information encoder (Fig. 2b). The core component of the convolutional layer is a real-valued optical dot-product unit, depicted in Fig. 2c. The input optical signal is split into positive and negative channels based on the kernel’s value, using a thermally tunable MZI. Specifically, the normalized kernel value is defined as κ = 2α-1where α represents the ratio of optical power entering the positive channel. By adjusting the thermal phase shifter in one arm of the MZI, κ can be tuned to any value within the range [−1, 1]. The light in the positive and negative channels is directed to corresponding photodetectors connected in a differential configuration. According to the Kirchhoff’s current law, the differential photocurrent is calculated as I = I₊₁ + I₊₂ − I_–1 − I_–2 and it represents the convolution result. This photocurrent directly drives a micro-ring modulator (MRM) to achieve nonlinearity. When the injected current I > 0, the MRM is forward-biased, causing a significant blue shift in the transmission spectrum as I increases. Conversely, for I < 0, the MRM is reverse-biased, resulting in a slight redshift due to the lower carrier-depletion modulation efficiency³³. The output light intensity from the MRM’s through port follows a linearized sigmoid function relative to the driving current. The intensity of the supply light determines the gain of the NAF, and a positive net gain is achieved when the intensity reaches a sufficiently high level, indicating that the PNONN can be effectively cascaded.

The fully connected layer is implemented using a simplified real-valued MZI mesh, as shown in Fig. 2d. The 4 × 8 MZI matrix is decomposed into two 4 × 4 MZI matrices, with the resulting optical intensities summed via multi-port photodetectors³⁴. This customized MZI mesh is optimized for incoherent optical inputs, minimizing the number of phase shifters and eliminating redundancy³⁵. It outputs the differential optical power as a real-valued result, with cascaded differential photodetectors processing the signals. Similar to the convolutional layer, the nonlinear optical response is achieved by feeding the differential current to an MRM.

The optical source used for input and for powering the NAF is partially coherent, with a linewidth Δλ, and can be realized using a light-emitting diode (LED) or amplified spontaneous emission (ASE) source (see Supplementary Information S1). Unlike narrow-linewidth laser sources, it only requires direct detection and does not limit the number of input channels in the optical matrix, thereby facilitating the scaling up of ONNs³². To fully exploit the ultra-broadband wavelength resources, different wavelength bands are directed into different optical layers. The proposed PDONN operates in the real-valued domain, enabling the direct representation of both positive and negative weights without requiring additional encoding. This reduces hardware complexity and energy consumption, and facilitates scaling up the network size. Real-valued computation is particularly advantageous under partially coherent illumination, where representing signed values through phase manipulation becomes challenging³².

Chip fabrication and characterization

The PDONN chip was fabricated in a standard silicon photonics foundry, with its microscope image presented in Fig. 3a. The total chip footprint is approximately 17 mm², integrating hundreds of optical devices within this compact area. Figure 3b highlights the packaged PDONN chip, equipped with optical and electrical interfaces to facilitate experimental testing. The on-chip IM was first calibrated, as shown in Fig. 3c, d. The measured 3-dB bandwidth of the IM is 22 MHz, significantly exceeding the bandwidth of traditional thermally tuned MZI encoders. The modulation efficiency was determined to be 0.125 dB/mA. For encoding purposes, a current range of 0 mA to 23 mA was selected, corresponding to an attenuation range of 0 dB to 5.18 dB. The latency of the chip was characterized by comparing the optical pulse through a reference waveguide (230 μm in length) with the output pulse of the first nonlinear layer, as shown in Fig. 3e. The latter pulse exhibited an additional latency of approximately 1 ns, attributed to signal propagation within silicon waveguides, metal wiring, and the RC delay of the O-E-O nonlinearity. Signal propagation accounted for about 100 ps, while the RC latency, modeled as a first-order system, was approximately 1.67 ns (dashed line, see Supplementary Information S2 for details). Under the same input pulse, the total latency of the PDONN chip after three nonlinear layers was estimated as 4.1 ns, comprising 3.7 ns from NAFs and 400 ps from signal propagation. Figure 3f illustrates the optical NAF response as a function of varying differential input powers, with the center wavelength resonating at zero input power. The saturable optical power was measured as only 0.2 mW, reflecting high energy efficiency. In addition, the reference 0 dB-net-gain line (dotted line) is shown on the plot, with the vertical axis in mW. The measured transfer function, above the zero-net-gain line, demonstrates a positive net gain, confirming that the PDONN can be efficiently cascaded.

**Fig. 3: Fabrication and characterization of the PDONN chip.**

Image classification results

Using the same chip, we can flexibly deploy either 4-classification or 2-classification task based on the selected output channels. In our experiment, the drop ports of the MRMs at the first fully connected layer serve as the output channels for the 4-classification task. For the 2-classification task, the power differences across three ports in the final layer represent the computing results (see Supplementary Information S3). The softmax function is applied to process the outputs from the PDONN, and the cross-entropy loss function is used for training the network. A gradient descent algorithm is employed for in-situ training of the PDONN (see “Methods”). It treats the PDONN as a black box, making it inherently robust to fabrication errors and environmental disturbances, thereby enhancing its generalization ability across various applications. We first use the performance of the ONN with a narrow-linewidth optical source as a reference. The wavelengths of the laser sources are slightly detuned to achieve equivalent incoherent linear computing with the MZI mesh²⁵. For the 4-classification task, we use the MNIST dataset as the benchmark, selecting 100 images containing the handwritten digits “0”, “1”, “2”, and “5” as the training set. More datasets can enhance the generalization capability of the model and reduce overfitting. The images are resized to 8 × 8 pixels using bilinear interpolation before being loaded onto the on-chip IMs. Another set of 100 images from the dataset is used for testing. To implement a positive ONN, we disconnect the negative photodetectors in the convolution layers from the electrical bias system, while maintaining the optical NAF of the fully connected layer in the real domain. The training process and the final confusion matrix of the positive ONN are shown in Fig. 4a. The classification accuracy for the training and testing datasets is 73% and 68%, respectively. For the same 4-classification task, the accuracy of the real-valued ONN is 95% and 87% for the training and test datasets, respectively (Fig. 4b). These results demonstrate that the real-valued ONN outperforms the positive ONN.

**Fig. 4: The results on the 4-classification of handwritten digits.**

After benchmarking with narrow-linewidth lasers, we further validate the PDONN’s robustness by employing a partially coherent optical source. The performance of the optical NAF using a partially coherent source depends on balancing the extinction ratio of the MRM with the noise level of the source (see Supplementary Information S1). The extinction ratio of the MRM was studied as a function of the linewidth of the partially coherent source (Fig. 5a). As the linewidth increased, the extinction ratio decreased sharply, adversely affecting the expressiveness of the ONN. To evaluate modulation quality, the eye diagrams of the MRM were measured at a modulation rate of 140 MHz with varying linewidths of partially coherent sources (Fig. 5b). With increased linewidth, the eye diagram quality initially improved due to increased noise frequency but would deteriorate with further increases, caused by the lower extinction ratio. Consequently, a linewidth of 0.4 nm was selected to balance noise and extinction ratio, enabling optimal performance for the PDONN. In the experiment, a wavelength-selective switch partitions the ASE source into four wavelength bands with linewidth of 0.4 nm, each of which enters one layer of ONNs. The optical paths of the different channels of supplying light are carefully configured to ensure that the path difference exceeds the coherence length of the optical source. The intensity of the input light is maintained at the same level as in the previous experiment. For the four-classification task, the partially coherent ONN achieves high accuracies of 94% and 90% for the training and test datasets, respectively (Fig. 5c). These results suggest that the moderate degradation of the extinction ratio in the nonlinear layer, caused by the partially coherent source, does not significantly impair the overall performance of the ONN. Results for the 2-classification task can be found in the Supplementary Information S3.

**Fig. 5: Demonstration of partially coherent optical neural network.**

Discussion

The multiplexing technology of the PDONN

Compared to coherent optical sources, partially coherent optical sources offer significantly higher parallelism in wavelength dimensions³². This advantage paves the way for a promising architecture of a wavelength-multiplexed deep ONN, as illustrated in Fig. 6. Leveraging the broadband feature of the linear computing layer enables the processing of multiple wavelength channels in parallel³⁶. With carefully designed optical paths, the MZI mesh can maintain cascadability without compromising its broadband characteristics (see Supplementary Information S1). In the nonlinear layer, information is first demultiplexed into individual wavelength channels and converted into photocurrent. Subsequently, this photocurrent drives MRMs resonating at distinct wavelength bands to achieve the corresponding NAF. The multiplexing functionality of MRMs aggregates multi-channel information into a bus waveguide, which delivers the nonlinear results to the next linear layer. To avoid inter-channel crosstalk, the wavelength bands must be spectrally separated. For denser wavelength utilization, the bus waveguide can be replaced by a wavelength division multiplexer (WDM) to efficiently combine the different wavelength channels with less crosstalk.

Key performance comparison of different ONNs

The classification performance of our fabricated PDONN is primarily limited by the extinction ratio of data loading and optical NAF, both of which present promising avenues for improvement. The low extinction ratio of the fabricated MRM (6 dB) constrains the signal-to-noise ratio of the partially coherent source when using a linewidth of only 0.4 nm in the experiment. There are two viable optimization strategies for accommodating moderate extinction ratios with larger linewidths. The first approach is to enhance the intrinsic extinction ratio of the MRM, which can readily exceed 10 dB with optimized fabrication techniques³⁷. Another promising direction is to increase the free spectral range (FSR) of the on-chip modulator. The current fabricated MRM has a radius of approximately 10 μm, corresponding to an FSR of nearly 12 nm. By adopting nanocavity-based optical modulators, it is feasible to achieve an FSR-free spectrum with a broad 3-dB optical bandwidth, providing sufficient spectral range to support partially coherent optical sources with linewidths exceeding 1 nm³⁸. In our experiments, only the simple MNIST dataset and a limited set of 100 training samples were used. We anticipate that more complex datasets and larger training sets can be explored with further scaling of the network and improvements in data communication speed.

Latency and energy efficiency are two critical metrics of an ONN. The pulse latency per inference for the fabricated chip is estimated as 4.1 ns. The system’s bandwidth is primarily determined by the RC bandwidth of the optical NAF, calculated as approximately 105 MHz, corresponding to an RC latency of 1.67 ns. With optimized parasitic parameters in the optical modulator and photodetector, the bandwidth could extend to the GHz range, reducing latency to as low as 100 ps³⁹. The data-loading speed, currently limited by the IMs, could be significantly enhanced by adopting carrier-depleted modulators²³. The power consumption of the system is derived from three main sources: the laser source, external electrical circuits, and the optical chip. In total, the power consumption of the experimental system is 10.57 W (see Supplementary Information S4 for details). For computational operations, a dot product involves 2N operations, where N is the vector size, and each NAF is equivalent to three additional operations. For the PDONN, the total operations per inference are 356 OPs. The energy efficiency is calculated as (based on the current chip without optimization): 10.57 W × 4.1 ns/356 Ops = 121.7 pJ/OP. Increasing the size of the optical matrix could further enhance energy efficiency⁴⁰.

Table 1 compares the PDONN with state-of-the-art on-chip ONNs. This work demonstrates the largest input size and number of optical layers, while maintaining low latency and high energy efficiency. Optical NAFs with a net gain greater than 0 dB facilitate the scalable cascading of multiple optical layers on a single chip. The latency of 4.1 ns is higher than that of other on-chip schemes, due to the use of wider optical pulses and the inclusion of RC latency in the NAFs. However, the 400 ps propagation latency remains comparable to previous works that rely on O-E-O nonlinearity. Moreover, this work represents the first demonstration of partially coherent real-valued computing using an on-chip ONN, alleviating the reliance on narrow-linewidth lasers. This relaxation in source requirements enhances the scalability of the on-chip optical matrix while maintaining high inference performance.

Table 1 Comparison of different on-chip ONNs

Full size table

In conclusion, we propose a scaling up strategy for on-chip ONNs by employing cascadable optical NAFs, an efficient convolution architecture, and a partially coherent optical source. These innovations collectively enable exceptional scalability and significantly enhance the performance of widely accessible ONNs. We successfully designed and fabricated a cutting-edge deep ONN chip, comprising two convolutional layers and two fully connected layers. Experimental evaluations demonstrated strong performance in both 4-class and 2-class classification tasks, underscoring the advantages of real-valued computing and the partial coherence of the optical source. Future improvements, such as enhancing the extinction ratio of modulators and further reducing system latency, will enable even better scalability and performance. With its groundbreaking design and demonstrated capabilities, the proposed PDONN is a strong candidate for large-scale artificial intelligence applications, marking a significant step toward the development of energy-efficient and scalable optical computing systems.

Materials and methods

Dataset

The dataset of MNIST handwritten digits and MNIST fashion products was respectively taken from https://yann.lecun.com/exdb/mnist/ and https://github.com/zalandoresearch/fashion-mnist.

In-situ training algorithm

We use the gradient descent algorithm to train the ONN, aiming to minimize the mean entropy loss of the training dataset by optimizing the voltages applied to the thermal phase shifters. The detailed training process is as follows:

1. Initialization: Randomly initialize the voltages applied to the thermal phase shifters.

2. Gradient approximation:

Increment the voltage by 0.05 V for each thermal phase shifter and calculate the corresponding loss function L(U + 0.05).
Decrement the voltage by 0.05 V for each thermal phase shifter and calculate the loss function L(U − 0.05).
Estimate the approximate gradient of the loss function using the formula:

$$G=\frac{L(U+0.05)-L(U-0.05)}{0.1}$$

(1)

3. Voltage update: Update the voltages using the Adam algorithm, a fast-converging gradient descent method, as described by the formula⁴¹:

$$\begin{array}{c}U({\rm{iter}}+1)=U({\rm{iter}})-\alpha ({v}_{{\rm{iter}}}/(1-{{\beta }_{1}}^{{\rm{iter}}}))/\sqrt{{s}_{{\rm{iter}}}/(1-{{\beta }_{2}}^{{\rm{iter}}})+\varepsilon }\\ {v}_{{\rm{iter}}}={\beta }_{1}{v}_{{\rm{iter}}-1}+(1-{\beta }_{1})G\\ {s}_{{\rm{iter}}}={\beta }_{2}{s}_{{\rm{iter}}-1}+(1-{\beta }_{2}){G}^{2}\end{array}$$

(2)

where iter is the current iteration number, α is the learning rate which is set to 0.05 during training, β₁, β_2, and ε are hyperparameters set to 0.9, 0.999, and 10^–8, respectively. The initial values of v_iter and s_iter are zero.

4. Iteration: Repeat steps 2 and 3 until the loss function converges.

5. Save optimized parameters: Once the training is complete, save the optimized voltages for deployment.

This gradient-based training process effectively fine-tunes the ONN for high-performance computation.

Experiment methods

The chip was fabricated using a 200 mm CMOS process line with a two-layer copper interconnect and a small line width of 130 nm. The on-chip photodetectors employ a lateral PIN structure with a 260 nm-thick germanium layer. Calibration of the MRMs on the chip was achieved by sweeping the thermal phase shifters integrated into them. The bandwidth of the on-chip IMs was measured using a 20 GHz vector network analyzer (Anritsu MS2028C). The eye diagram of the MRMs was recorded using a setup comprising an arbitrary waveform generator (GIGOL DG4202, 500 MSa/s), a photodetector with 200 MHz bandwidth, and an oscilloscope (GIGOL DS4022, 4 GSa/s). For chip latency measurements, the input laser source was modulated using a lithium niobate intensity modulator with 10 GHz bandwidth. The pulse signal was generated with an arbitrary waveform generator (GIGOL DG4202), and the output optical signal was detected by an 18 GHz bandwidth photodetector. The detected signal was captured by an oscilloscope (Tektronix DSA72004B). The thermal phase shifters and the bias voltage of the optical NAFs were driven by a digital-to-analog converter (LTC2688) controlled by a field-programmable gate array (FPGA) chip (7K325T). The entire experimental setup was controlled via a personal computer through serial ports, and the chip was thermally stabilized using a thermoelectric cooler (TEC).

Data availability

The datasets used and/or analyzed in the current study are available from the corresponding author upon reasonable request.

References

Zhang, W. Q. et al. Neuro-inspired computing chips. Nat. Electron. 3, 371–382 (2020).
Article ADS Google Scholar
Fu, T. Z. et al. Optical neural networks: progress and challenges. Light Sci. Appl. 13, 263 (2024).
Article ADS Google Scholar
Shekhar, S. et al. Roadmapping the next generation of silicon photonics. Nat. Commun. 15, 751 (2024).
Article ADS Google Scholar
Zhang, W. K. et al. Large-scale optical programmable logic array for two-dimensional cellular automaton. Adv. Photonics 6, 056007 (2024).
Article ADS Google Scholar
Ashtiani, F., Geers, A. J. & Aflatouni, F. An on-chip photonic deep neural network for image classification. Nature 606, 501–506 (2022).
Article ADS Google Scholar
Bandyopadhyay, S. et al. Single-chip photonic deep neural network with forward-only training. Nat. Photonics 18, 1335–1343 (2024).
Article ADS Google Scholar
Zhou, H. L. et al. Photonic matrix multiplication lights up photonic accelerator and beyond. Light Sci. Appl. 11, 30 (2022).
Article ADS Google Scholar
Zhou, T. K. et al. Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit. Nat. Photonics 15, 367–373 (2021).
Article ADS Google Scholar
Lin, X. et al. All-optical machine learning using diffractive deep neural networks. Science 361, 1004–1008 (2018).
Article ADS MathSciNet Google Scholar
Chen, Y. T. et al. All-analog photoelectronic chip for high-speed vision tasks. Nature 623, 48–57 (2023).
Article ADS Google Scholar
Wu, B. et al. Chip-to-chip optical multimode communication with universal mode processors. PhotoniX 4, 37 (2023).
Article Google Scholar
Xu, Z. H. et al. Large-scale photonic chiplet Taichi empowers 160-TOPS/W artificial general intelligence. Science 384, 202–209 (2024).
Article ADS Google Scholar
Du, Z. C. et al. Ultracompact and multifunctional integrated photonic platform. Sci. Adv. 10, eadm7569 (2024).
Article Google Scholar
Pai, S. et al. Experimentally realized in situ backpropagation for deep learning in photonic neural networks. Science 380, 398–404 (2023).
Article ADS Google Scholar
Wu, B. et al. Monolithically integrated asynchronous optical recurrent accelerator. eLight 5, 7 (2025).
Article Google Scholar
Zhang, W. P. et al. A system-on-chip microwave photonic processor solves dynamic RF interference in real time with picosecond latency. Light Sci. Appl. 13, 14 (2024).
Article ADS Google Scholar
Huang, C. R. et al. A silicon photonic-electronic neural network for fibre nonlinearity compensation. Nat. Electron. 4, 837–844 (2021).
Article Google Scholar
Feldmann, J. et al. Parallel convolutional processing using an integrated photonic tensor core. Nature 589, 52–58 (2021).
Article ADS Google Scholar
Dong, B. W. et al. Higher-dimensional processing using a photonic tensor core with continuous-time data. Nat. Photonics 17, 1080–1088 (2023).
Article ADS Google Scholar
Cheng, J. W. et al. Multimodal deep learning using on-chip diffractive optics with in situ training capability. Nat. Commun. 15, 6189 (2024).
Article ADS Google Scholar
Fu, T. Z. et al. Photonic machine learning with on-chip diffractive optics. Nat. Commun. 14, 70 (2023).
Article ADS Google Scholar
Zhang, J. L. et al. Highly integrated all-optical nonlinear deep neural network for multi-thread processing. Adv. Photonics 7, 046003(2025).
Article Google Scholar
Rahim, A. et al. Taking silicon photonics modulators to a higher performance level: state-of-the-art and a review of new technologies. Adv. Photonics 3, 024003 (2021).
Article ADS Google Scholar
Shen, Y. C. et al. Deep learning with coherent nanophotonic circuits. Nat. Photonics 11, 441–446 (2017).
Article ADS Google Scholar
Wu, B. et al. Programmable integrated photonic coherent matrix: principle, configuring, and applications. Appl. Phys. Rev. 11, 011309 (2024).
Article ADS Google Scholar
Wu, B. et al. Low-threshold all-optical nonlinear activation function based on a Ge/Si hybrid structure in a microring resonator. Opt. Mater. Express 12, 970–980 (2022).
Article ADS MathSciNet Google Scholar
Shi, Y. et al. Nonlinear germanium-silicon photodiode for activation and monitoring in photonic neuromorphic networks. Nat. Commun. 13, 6048 (2022).
Article ADS Google Scholar
Wang, T. Y. et al. Image sensing with multilayer nonlinear optical neural networks. Nat. Photonics 17, 408–415 (2023).
Article ADS Google Scholar
Huang, Z. et al. Pre-sensor computing with compact multilayer optical neural network. Sci. Adv. 10, eado8516 (2024).
Article Google Scholar
Zhao, B. H. et al. Cascadable optical nonlinear activation function based on Ge–Si. Opt. Lett. 49, 6149–6152 (2024).
Article ADS Google Scholar
Zhang, H. et al. An optical neural chip for implementing complex-valued neural network. Nat. Commun. 12, 457 (2021).
Article ADS Google Scholar
Dong, B. W. et al. Partial coherence enhances parallelized photonic computing. Nature 632, 55–62 (2024).
Article ADS Google Scholar
Tait, A. N. et al. Silicon photonic modulator neuron. Phys. Rev. Appl. 11, 064043 (2019).
Article ADS Google Scholar
Tang, R. et al. Waveguide-multiplexed photonic matrix-vector multiplication processor using multiport photodetectors. Optica 12, 812–820 (2025).
Article ADS Google Scholar
Wu, B. et al. Real-valued optical matrix computing with simplified MZI mesh. Intell. Comput. 2, 0047 (2023).
Article Google Scholar
Wang, X. Y. et al. Chip-based high-dimensional optical neural network. Nano Micro Lett. 14, 221 (2022).
Article ADS Google Scholar
Rizzo, A. et al. Massively scalable Kerr comb-driven silicon photonic link. Nat. Photonics 17, 781–790 (2023).
Article ADS Google Scholar
Zhang, J. H. et al. Compact, efficient, and scalable nanobeam core for photonic matrix-vector multiplication. Optica 11, 190–196 (2024).
Article ADS Google Scholar
Nozaki, K. et al. Femtofarad optoelectronic integration demonstrating energy-saving signal conversion and nonlinear functions. Nat. Photonics 13, 454–459 (2019).
Article ADS Google Scholar
McMahon, P. L. The physics of optical computing. Nat. Rev. Phys. 5, 717–734 (2023).
Article Google Scholar
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In Proc. 3rd International Conference on Learning Representations (San Diego, 2015).
Feldmann, J. et al. All-optical spiking neurosynaptic networks with self-learning capabilities. Nature 569, 208–214 (2019).
Article ADS Google Scholar

Download references

Acknowledgements

The work was supported by the Fundamental Research Funds for the Central Universities.

Author information

Authors and Affiliations

Wuhan National Laboratory for Optoelectronics, School of Optical and Electronic Information, Huazhong University of Science and Technology, Wuhan, 430074, China
Bo Wu, Jialong Zhang, Hailong Zhou, Yilun Wang, Jianji Dong & Xinliang Zhang
Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
Chaoran Huang

Authors

Bo Wu
View author publications
Search author on:PubMed Google Scholar
Chaoran Huang
View author publications
Search author on:PubMed Google Scholar
Jialong Zhang
View author publications
Search author on:PubMed Google Scholar
Hailong Zhou
View author publications
Search author on:PubMed Google Scholar
Yilun Wang
View author publications
Search author on:PubMed Google Scholar
Jianji Dong
View author publications
Search author on:PubMed Google Scholar
Xinliang Zhang
View author publications
Search author on:PubMed Google Scholar

Contributions

B.W., H.L.Z. and J.J.D. conceived the idea. BW designed and fabricated the chip. B.W. carried out theoretical analysis and simulation. B.W. designed and performed the experiments. J.L.Z., H.L.Z., J.J.D. and B.W. discussed and analyzed data. BW prepared the manuscript. H.L.Z. and J.J.D. revised the paper and X.L.Z. supervised the project. All authors contributed to the writing of the manuscript.

Corresponding authors

Correspondence to Hailong Zhou or Jianji Dong.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Supplementary information

Supplementary information for Scaling up for end-to-end on-chip photonic neural network inference (download DOCX )

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Wu, B., Huang, C., Zhang, J. et al. Scaling up for end-to-end on-chip photonic neural network inference. Light Sci Appl 14, 328 (2025). https://doi.org/10.1038/s41377-025-02029-z

Download citation

Received: 13 April 2025
Revised: 24 July 2025
Accepted: 31 August 2025
Published: 17 September 2025
Version of record: 17 September 2025
DOI: https://doi.org/10.1038/s41377-025-02029-z

This article is cited by

Demonstrating completeness in optical neural computing
- Krzysztof Tyszka
Light: Science & Applications (2026)
Femto-joule threshold reconfigurable all-optical nonlinear activators for picosecond pulsed optical neural networks
- Ruizhe Liu
- Zijia Wang
- Hongtao Lin
Light: Science & Applications (2026)
Gradient-graphene-enabled directional photothermal regulation for self-aligned laser transfer printing
- Mengxin Gai
- Jing Bian
- YongAn Huang
Light: Science & Applications (2026)
Integrated photonic neural network with on-chip backpropagation training
- Farshid Ashtiani
- Mohamad Hossein Idjadi
- Kwangwoong Kim
Nature (2026)