Introduction

The desire for intelligent systems capable of autonomous learning, reasoning, and adaptation has fueled significant advancements in artificial intelligence (AI), transforming various application landscapes. As the demand for computational resources rapidly grows, traditional electronic computing approaches for AI are reaching their inherent limits in speed and energy efficiency for parallel processing. This limitation has stimulated the exploration of novel computing architectures employing different computational paradigms1,2,3. One example is photonic computing4,5,6,7,8,9, which uses photons to perform computational tasks traditionally executed by electronic systems. Since photons possess unique properties such as high bandwidth enabled by high optical carrier frequencies and inherent parallelism leveraging both the frequency and polarization degrees of freedom, photonic systems have been shown to be capable of computing at unprecedented speeds and energy efficiencies10,11,12,13,14,15.

Driven by these advantages, as well as the rapid development of integrated photonics as a field, photonic computing has emerged as a promising solution for realizing next-generation computing accelerators. Demonstrations using Mach-Zehnder interferometer arrays4,16,17,18, free space optics5,9,19,20,21, silicon photonics with on-chip attenuators, photodetectors, and ring banks8,22,23,24, VCSELs with free space optics25, as well as parallel processors utilizing frequency multiplexing6,7,26,27 and phase change materials7,28,29, showcase the versatility and potential of the photonics approach. Despite these advances, a critical challenge persists. Low speed and high energy consumption are typically associated with the electro-optic (EO) conversion of data from electronic memory into the optical domain15,30. However, such EO conversion is unavoidable, as the majority of data is stored and processed electronically. Current approaches utilizing variable attenuators or fast EO modulators6,7,16,17,18,22,23,24,25,27,28,29,31, typically implemented in silicon photonics or bulk systems, are capable of preparing data at rates ranging from the kHz to GHz but are plagued by high electronic energy consumption and optical losses. Alternatively, approaches using static amplitude masks or spatial light modulators can be more energy efficient yet suffer from low refresh rates5,8,9,19,20,21. As a result, EO conversion remains the most crucial hurdle for practical implementations of photonic computing in real-world applications. Ultimately, high-performance EO modulation is necessary to achieve a high optical data rate that can match the low optical latency while simultaneously maintaining a low energy budget. However, a photonic accelerator that combines high-speed processing with low energy consumption, addressing the challenges faced by traditional EO conversion, is still missing.

Here, we demonstrate high-speed and energy-efficient photonic computation, leveraging the EO effect in a thin-film lithium niobate (TFLN) photonic circuit32,33. Recent progress in the TFLN platform has enabled development of groundbreaking EO devices including modulators34,35,36, frequency shifters37, and comb generators38,39, among others. The capabilities of these devices stem from the strong EO effect and are facilitated by the tight confinement of both optical and electronic modes as well as low propagation loss40. By seamlessly integrating these high-performance EO devices into an optimized large-scale photonic circuit, we build a TFLN computing accelerator that performs matrix-vector-multiplication (MVM) operations and addresses the speed and power limitations of current computing architectures. Complementary to recent demonstrations of TFLN-based optical neural networks41,42, our approach features a high level of component integration as well as a demonstration of a functional integrated EO circuit with a hybrid-integrated laser source and heterogeneous-integrated fast III-V photodiode. This advance represents an important step towards compact, large-scale TFLN-based photonic circuits enhanced by integrated EO conversion, including, but not limited to, photonic computing.

Results

Accelerator architecture

The accelerator imprints electronic data onto the optical domain by modulating optical amplitudes, performs computations of the data through successive EO modulation, and returns the result back to the electrical domain through optical-to-electronic conversion (Fig. 1a). This workflow is implemented using two high-performance TFLN amplitude modulators connected in series, and photonic advantage is achieved through massively parallel multiplexing in the time domain. In the context of fully-connected deep neural networks (DNNs), where MVM operations for data inference is crucial, an input data vector may be encoded onto the amplitude of light by the first TFLN amplitude modulator. The data-modulated light is then fanned-out into distinct spatial channels, each channel subsequently mapping a machine-learned weight vector onto the data vector using a second TFLN amplitude modulator. The intensity of light passing through both modulators is thus proportional to the product of signals applied to each modulator, which amounts to multiplication of the weight vector with the data vector. This intensity is detected using a photodetector and summed electronically, converting the result back into the electrical domain.

Fig. 1: Photonic computing accelerator on thin-film lithium niobate.
figure 1

a Concept of photonic computing accelerators. Data stored in the electronic system (e.g. a computer) are sent to the photonic accelerator at high rates and are converted into the optical domain. Parallel computations are then performed by the accelerator and results are returned to the electronic system. b Illustration of the photonic computing working principle. Continuous-wave light passes through two cascaded amplitude modulators (AMs) which sequentially encode elements of \(\vec{x}\) and \(\vec{a}\) onto the amplitude of light, effectively performing element-wise multiplication of the two vectors. The components contributing to \(\vec{x}\cdot \vec{a}\) are read out by optical-to-electronic conversion using a low-noise and high-speed detector, and electronic summation of these components finally yields \(\vec{x}\cdot \vec{a}\). c The vision for a fully integrated computing core based on TFLN photonics, consisting of laser, detectors, and TFLN modulators for high-speed and energy-efficient EO conversion and computation. An input vector is first encoded in the time domain of the optical field through an amplitude modulator and then fanned-out into \(N\) spatial channels (\(N=16\) in this figure) to leverage massive spatial parallelism. In each channel, another amplitude modulator is used to multiply weights with the input vector. Finally, detectors convert the multiplication results back into electronic signals.

Photonic computing circuit implementation on TFLN

To demonstrate the proposed computing core, we designed individual photonic-integrated components and combined them into a single TFLN circuit. Our accelerator contains \(M=2\) computing cores, each with \(N=16\) channels realized by a spatial fan-out. Multiplication is established owing to the large set of synchronously-functional EO modulators. In practice, one modulator failed during the fabrication process. Further, numerous passive optical elements (e.g. 17\(\times\)2 grating couplers, and 15\(\times\)2 Y-splitters for fan-out splitter trees) are fabricated through high-quality, direct dry etching of TFLN, and smooth waveguide sidewalls over the entire circuit lead to a low optical propagation loss of 0.28 dB/cm, achieved by optimizing the reactive ion etching process (Fig. 2d). Low optical propagation loss critically enables a large-scale TFLN circuit such as ours, as well as energy-efficiency through a one-to-sixteen fan-out splitter tree with added insertion losses of 0.135 dB per channel. The microwave circuit for modulation is accomplished using two layers of gold (heights labeled by \({h}_{1}\) and \({h}_{2}\) in Fig. 2e) with nickel-chromium (NiCr) resistors (Fig. 2e). The bottom layer of gold is required for efficient modulation and direct-current (DC) tunability, while the top layer interfaces with resistors. The use of NiCr ensures high-resistance terminators with high-power handling ability and minimal feature size (Fig. 2f). Each 1-cm long modulator features >40 GHz bandwidth (limited by the 40-GHz detector used), >20 dB microwave reflection suppression, switching voltage to modulator length product (\({V}_{\pi }\cdot L\)) \(\sim 2.2\,{{\rm{V}}}\cdot {{\rm{cm}}}\), and >20 dB extinction ratio (Fig. 2g, h). We note that the variations in modulator electrical properties are minimal and do not compromise their EO conversion ability, as evaluated in the following section.

Fig. 2: Integrated thin-film lithium niobate photonic circuit implementation of computing core.
figure 2

a Optical microscope and scanning electron microscope images of the building blocks used in the integrated TFLN circuit: waveguide array (top left) for signal routing; grating coupler (top center) for efficient in- and out-coupling of light; ring resonator (top right) for evaluating the propagation loss and etch quality; fan-out splitter tree (bottom center) to distribute light into distinct spatial channels; microwave transmission line (bottom left) for delivering efficient EO modulation; and on-chip terminator (bottom right) for high-quality microwave impedance matching. b Optical microscope image of seven weight modulators in the TFLN accelerator. c Full image of one computing core (our circuit contains two such computing cores on the same chip). d Scanning electron microscope image of a high-quality waveguide featuring low propagation loss, enabling low optical energy consumption for the entire circuit. e Cross-section illustration of the TFLN circuit, including gold, nickel-chromium (NiCr), lithium niobate, silicon dioxide, and silicon. \(d=100\,{{\rm{nm}}}\), \(w=1.5\,{{\rm{\mu }}}{{\rm{m}}}\), \({h}_{0}=300\,{{\rm{nm}}}\), \({h}_{1}=1.0\,{{\rm{\mu }}}{{\rm{m}}}\), \({h}_{2}=800\,{{\rm{nm}}}\), \(t=300\,{{\rm{nm}}}\), \({t}_{1}=4.7\,{{\rm{\mu }}}{{\rm{m}}}\), \({t}_{2}=525\,{{\rm{\mu }}}{{\rm{m}}}\). f Measured terminator resistance vs. length. g EO forward transmission (\({S}_{21}\) EO) and electric-electric input reflection (\({S}_{11}\,\)EE) response of the modulators in our circuit. Our circuit features a combined total of 31 spatial channels (32 designed, but one failed during the fabrication process). All modulators have a bandwidth beyond 40 GHz (measurement limited by our detector bandwidth). h Representative \({V}_{\pi }\cdot L\) of the modulators (length 1 cm) in our circuit.

High-speed and energy-efficient photonic computing on TFLN

We first demonstrate high-speed vector operations with low energy consumption (Fig. 3a). For generality, we generate as data and weight vectors two random lists of length 1,000. Both vectors are encoded into the time domain at a rate \(r=\)43.8 GOPS per channel and multiplied after light passes through the data and weight modulators. We note that the vectors are first converted from digital to analog using a high-speed arbitrary waveform generator, and these electronic signals are delivered to the modulators via contact probes. The twice-modulated optical signal, which represents the multiplication result, is detected by a photodetector and digitized using a high-speed oscilloscope. Direct electronic integration of the sampled time traces gives the photonic dot products, and comparison with expected values allows for assessment of computational accuracy per channel (Fig. 3b and see Supplementary information for details). Slight variations in accuracy are likely due to the passive components in each channel, such as input-output coupling and optical path length differences between channels. Furthermore, the time traces are symbol-wise in good agreement with expectation, confirming the high computational speed per channel of our circuit (Fig. 3c). We evaluate the channel by channel performance of the computing core and find that this computational speed (43.8 GOPS/channel) is well maintained. Operating at this speed, we then evaluate the energy consumption while simultaneously assessing computational accuracy under varying optical power conditions. Our experiments reveal a minimum energy consumption of 0.0576 pJ/OP, demonstrating robust computational accuracy with high energy efficiency for the computing circuit (Fig. 3d and see “Computational accuracy” in Supplementary information for details). This assessment includes energy consumption of the pump laser, microwave energy dissipated by the modulators and detectors, as well as other related components  (see Methods and “System characterization: speed and energy consumption calculations” in Supplementary information for details). Altogether, our results confirm the high-speed and energy-efficient multiplication of generalized vectors using the TFLN computing system.

Fig. 3: High-speed and energy-efficient photonic computing on thin-film lithium niobate.
figure 3

a Two-dimensional illustration of the photonic computing core structure. b Computational accuracy for different channels in one computing core. The differences in accuracy between channels is minimal though can be further reduced through fine tuning of the system operational parameters. c Example waveforms of a temporally-multiplexed computing operation between two random vectors \(\vec{x}\) and \(\vec{a}\) with 22.8 ps/symbol. Deviations between theory and experiment can be attributed to limited system bandwidth and residual inaccuracy in time delay between \(\vec{x}\) and \(\vec{a}\). Since the final MVM result is a summation of the data trace, it averages out these deviations and further reduces the error. d Computational accuracy vs. energy consumption by varying the optical power, showing a lowest energy consumption of 0.0576 pJ/OP at 22.8 ps/symbol, while still maintaining computational accuracy. The inset gives error of the computation (Error = Measured-Expected). The \(\sigma\) is the standard deviation of the error.

Inference tasks

We utilize the TFLN circuit to perform computation required by real algorithms. First, we tackle a simple binary classification problem (Fig. 4) over points lying in a two-dimensional plane labeled by “exclusive-or” rule. One such data point \(\vec{x}=\left(\begin{array}{c}{x}_{1}\\ {x}_{2}\end{array}\right)\) is input into the circuit for a small number of computing operations to be carried out, and nonlinear activation via an electronic computer is used to obtain the label. The sign of the label indicates the category of the data point inferred by photonics (Fig. 4a). A total of 400 data points is tested, with the classification results shown in the 2D plane (Fig. 4b), achieving an overall accuracy of 93.8% (93.5% on the electronic computer). Further analysis histograms reveal an accuracy of 98.6% (99.1% on the electronic computer) for data points with positive ground truth and 88.4% (87.3% on the electronic computer) for data points with negative ground truth (Fig. 4c). The photonic computing system is thus effective for performing binary classification tasks, supported by the agreement in its classification accuracies with those achieved by traditional electronic computers.

Fig. 4: Photonic binary classification.
figure 4

a Illustration of binary classification conducted over a data set consisting of two-dimensional vectors \(\vec{x}={\left[{x}_{1},{x}_{2}\right]}^{T}\), where each vector is labeled positive or negative based on the exclusive-or condition. To infer the label of some \(\vec{x}\), a number of computing operations are carried out by our photonic accelerator to compute \({\vec{x}}^{T}Q\vec{x}\) followed by electronic nonlinear activation. Here, \(Q\) is a pre-trained kernel matrix and nonlinearity is electronically applied. Accuracy of inference thus depends on the accuracy of photonic computations. b Classification of 400 randomly selected \(\vec{x}\) in the problem space, colored by their photonics-inferred positive (blue) and negative (red) labels. c Comparison between photonic and electronic classification results over the 400-vector test set, which shows good agreement between photonic and electronic computing. The photonic circuit (electronic computer) achieves a classification accuracy of 93.8% (93.5%).

Next, we apply the photonic accelerator to a handwritten digit classification problem over the Modified National Institute of Standards and Technology (MNIST) dataset43. A two-layer feedforward model is chosen to perform the classification (Fig. 5a), with input vectors of size 784 temporally encoded in layer 1, and 10 neurons in layer 2. These input vectors undergo photonic MVM processing followed by electronic softmax activations. Photonic and electronic confusion matrices (Fig. 5b) generated over a 500-image test set indicate that our circuit can efficiently perform computing operations for machine learning inference, as the circuit achieves a classification accuracy of 88%, compared to 92% obtained from electronic computing. In addition to binary classification, our results demonstrate the versatility of our photonic computing system in handling various models, including neural network tasks with vectors of arbitrarily long length. Importantly, we also evaluated the system stability by running identical computation tasks on the photonic circuit for over 20 h, without applying explicit temperature control or adjusting the input/output light coupling to the chip. We observed good stability with a standard deviation in the error fluctuation of 0.04% (Fig. 5c and see Supplementary information for details).

Fig. 5: Photonic computing for fully-connected layers of photonic neural networks.
figure 5

a Classification of an MNIST handwritten digit. The image is flattened into a single vector encoded in the time domain. An example image (number six) is shown on the left. A two-layer photonic neural network is used to perform the classification task (center). The photonic computing result at the end of the network is then sent back to the computer to perform a nonlinear activation (right). The final classification results agree well with the electronically computed result. b Statistics of MNIST handwritten digit recognition. 500 MNIST images are selected as the test set and processed. The confusion matrices show a classification accuracy of 88% using our circuit (92% using electronic computer). c Stability. The computing core is programmed to continuously run identical computing tasks over 20 h, experiencing minimal fluctuations of 0.04%. d Real image classification. Images resembling real-life objects are selected from the CIFAR-10 database and classified using a convolutional neural network. The images are preprocessed through convolution layers, flattened into vectors, and then sent into our circuit to be classified by the remaining fully-connected layers (bottom left). Nine example figures including a truck, cat, bird, automobile, frog, deer, ship, airplane, and horse (top left), together with the classification results (right) are shown, indicating our photonic computing circuit can accurately process large, multi-layer networks.

Finally, to assess the practical utility of our computing system in solving real-world problems, we extend our demonstrations to image classification over the Canadian Institute for Advanced Research 10-class (CIFAR-10) dataset44. Images undergo preprocessing with a convolution layer, a step that could be directly mapped to MVMs and performed with the processor of this work, or accelerated by creating a specialized photonic circuit for convolution operations. In our setup, the circuit is used to propagate preprocessed images through the fully connected layers of the convolutional neural network (CNN). The first layer consists of an input vector of size 1,024 encoded in time, which is subsequently transformed into an intermediate layer with 128 neurons, and then a final output layer with 10 neurons representative of the ten classes. We tested images in all 10 categories. Examples including trucks, cats, birds, automobiles, frogs, deer, ships, airplanes, and horses are presented, with accurate classification results (Fig. 5d). This confirms our system’s potential for image recognition and analysis, typically requiring large, multi-layer neural networks, since the correct classification of one image requires a large amount of successive computing operations.

Hybrid- and heterogenous-integrated TFLN photonic computing circuit

Next generation high-performance and low power consumption photonic computing circuits would feature lasers and detectors integrated on TFLN chips, alongside active-EO and passive TFLN components. Importantly, wafer-scale fabrication of TFLN (Fig. 6a) is ideally suited for the batch production of stand-alone optical computing cores (Fig. 6b, c), and it would unlock significant potential for greater spatial parallelism, in conjunction with the already demonstrated temporal-multiplexing and efficient EO conversion characteristics unique to the TFLN photonic computing platform.

Fig. 6: Hybrid- and heterogenous- integrated TFLN photonic computing circuit.
figure 6

a Wafer-scale fabrication of computing cores comprising a TFLN photonic computing circuit. b Chiplets of TFLN computing cores from the wafer-scale process. c, Measurement setup for characterizing the hybrid- and heterogeneous-integrated system: light from a hybrid-integrated DFB laser source is butt-coupled to the TFLN computing core, while a heterogeneous-integrated MUTC-PD is used to perform optical-to-electronic conversion of the computing signal (the electrical signal is extracted by contact probes). d Optical microscope images of essential building blocks used for the integrated TFLN circuit similar to Fig. 2a, except bilayer-taper edge couplers (top middle) are employed as a low-loss interface between the TFLN waveguide mode and the DFB waveguide mode. e Schematic of a single channel integrated system. f Optical microscope image of DFB laser and 2-D simulation of the DFB waveguide mode. P (N): positively- (negatively-) doped region; QW: quantum well. g Optical microscope image of TFLN photonic computing circuit, with an array of cascaded amplitude modulators (left) and an array of MUTC-PDs (right). h Optical microscope image of MUTC-PD and schematic of its cross section. i DFB output power measured by an integrating sphere vs. injection current, with about 50 mA current threshold and 0.25 W/A slope efficiency. j MUTC-PD dark current vs. bias voltage. For high-speed operation, a reverse-bias of −2 V is held for all measurements. k Example waveforms of a temporally multiplexed computing operation between two random vectors \(\vec{x}\) and \(\vec{a}\) with 96.8 ps/symbol (10.33 GOPS per channel). l Computational accuracy \(\sigma\), as previously defined, vs. computing energy consumption as the DFB output power is varied. Three energy consumptions are evaluated based on experimental conditions provided DFB injection currents of 85, 125, and 150 mA, respectively.

As an important first step in this direction, we fabricated a TFLN photonic computing accelerator chip with heightened degree of integration but featuring a single spatial channel. Our chip consists of active EO modulators and passive components (waveguides, Y-splitters, on-chip terminators, etc.), with the addition of bilayer-taper edge couplers that are used for hybrid-integration of a distributed feedback (DFB) laser source with the TFLN chip, and heterogeneous-integration of a modified uni-traveling-carrier photodiode (MUTC-PD) evanescently coupled to TFLN signal waveguides (Fig. 6d–h). We calibrate the DFB laser power against injection current and find a lasing threshold of about 50 mA (Fig. 6i) with 0.25 W/A slope efficiency. We also measure the MUTC-PD dark current as function of bias voltage (Fig. 6j) and find a low dark current of 2.7 nA when reverse biased at -2 V for high-speed operation (see Methods and Supplementary information for further details on DFB characterization and PD fabrication). Given these performances, we evaluate key performance metrics for computing, such as high-speed and low energy consumption computations, by operating the single spatial channel at a rate \(r=\)10.33 GOPS (Fig. 6k) and assessing the computational accuracy vs. various full-system energy consumptions (Fig. 6l). We show high-fidelity encoding and multiplication of pairs of random vectors, as well as computational accuracies characterized by \(\sigma\) < 0.5% across all energy consumptions attempted. Here, the reduction in demonstrated data rate (compared to previously demonstrated 43.8 GOPS per channel) is attributed to imperfect on-chip termination resistance. The larger energy consumptions overall (compared to previously demonstrated 0.0576 pJ/OP) are mostly due to overdriving the laser source in our proof-of-concept demonstration. We note that the DFB source operated just above lasing threshold already provides more optical power than needed for accurate computation utilizing a single spatial channel.

Discussion

In conclusion, our work presents a photonic computing acceleration architecture on TFLN, capable of performing increasingly complex algorithmic tasks, from binary classification, handwritten digit classification, to actual image classification. Testing larger datasets, computing with models containing more parameters, and introducing environmental disturbances during computation could further assess the system’s performance. Unlike architectures designed for specific applications such as convolution or vision-based tasks, ours is well suited for general computing tasks. Our experimental demonstration is enabled by a large scale, system-level TFLN circuit. Leveraging the TFLN platform, we include the process of EO conversion in our demonstration to achieve high speed and low energy consumption photonic computing. This is made possible by increasing the scale of TFLN integration while maintaining the high performance of individual components. We then advance our system by fabricating on-chip detectors and replacing the bench-top laser with DFB source butt-coupled to our TFLN chip. Future improvements to enhance performance metrics to match or exceed state-of-the-art photonic accelerators are straightforward: reduce the \({V}_{\pi }\) to 1 V36, increase the bandwidth to 100 GHz36,45,46, and minimize the optical propagation loss of waveguides to 0.03 dB/cm40,47, each of which has been previously demonstrated in TFLN devices. State-of-the-art modulator designs leveraging multi-layer capacitance loading and folded electrodes may further improve the modulation efficiency per chip area, resulting in compact EO modulator networks also featuring a high production yield48,49,50. While these modulator advances are mainly driven by the needs of optical communication technology, they can readily be adopted in, and will have positive impact on, photonic computing architectures such as ours. Transitioning the accelerator to visible wavelengths may further reduce energy consumption as well as improve scalability, leveraging sub-volt TFLN modulators51,52 that may also be shorter in length. Utilizing the frequency degree-of-freedom with TFLN soliton microcombs53 may extend applicability to photonic convolution acceleration. Various optical packaging methods, including photonic wire bonding54, can improve the stability of light coupling to the chip, which may enable accurate computations even in the presence of environmental disturbances. For a comprehensive contextualization against state-of-the-art electronic and photonic systems, as well as future projections in performance, see “System characterization: speed and energy consumption calculations” in Supplementary information.

It is worth noting that our approach to address the EO conversion challenge in photonic computing is compatible with other photonic accelerators. Thus, our work may motivate novel hybrid approaches (e.g. combining TFLN EO conversion with free space optics) that require high bandwidth and low power modulation for data encoding. TFLN stands out as the most powerful platform for this task, owing to its exceptional EO conversion capability, thus offering the potential to overcome existing speed and energy bottlenecks in various optical computation schemes. While we did not operate all channels of our computing circuit simultaneously and only examined them separately, due to limited testing equipment, the associated challenge of potential crosstalk between adjacent TFLN modulators has recently been experimentally shown to not play a role46. Moreover, the full potential of the platform can be unlocked by leveraging photonic-electronic integration through the nascent lateral and 3D integration strategies55,56,57,58,59, especially integration with high-speed electronic circuits, including multi-channel DACs, ADCs, and FPGAs to utilize the spatial multiplexing capability. In this context, a comprehensive understanding of not just the energy consumptions of novel electronics, but also the uncertainty in their estimations, is critically required for accurately mapping the path forward of TFLN photonics for computing. More broadly, we believe that large-scale TFLN circuits similar to those demonstrated here may hold great promise for applications in vision9, sensing21, ranging60,61,62, and even quantum computing63,64,65, and we hope that our work will stimulate further exploration of such applications.

Methods

Device fabrication

The devices in our study are fabricated using a commercial X-cut thin-film lithium niobate (TFLN) on insulator wafer (NANOLN). The wafer comprises a 600 nm-thick LN layer, a 4.7 μm buried oxide layer (thermally grown), all mounted on a 525 μm-thick silicon (Si) handle. The fabrication process involves the following steps. (1) Patterning of the optical layer: electron-beam lithography is used to define patterns for the optical layer of the device, including rib waveguides and microring resonators. Ar+-based reactive ion etching is then used to etch the optical layer down by 300 nm. (2) Defining the bottom-layer electrode: microwave electrodes are defined using a combination of photolithography, electron-beam evaporation, and a bilayer lift-off process. Using this approach, an 800 nm-thick layer of gold (Au) is deposited to form the bottom layer of the microwave electrodes. (3) Cladding with SiO2: the devices are cladded with a 1.0 μm-thick layer of silicon dioxide (SiO2) using plasma-enhanced chemical vapor deposition (PECVD). (3) Depositing the nickel-chromium resistor: the nickel-chromium (NiCr) layer is defined using photolithography with deposition through electron-beam evaporation, followed by a bilayer lift-off process. (4) Defining the top-layer electrode: a second layer of Au is deposited using a combination of photolithography, electron beam evaporation, and bilayer lift-off. The thickness of the top layer gold is 800 nm. The fabrication process results in a TFLN circuit with high-quality optical components and a well-defined microwave circuit, which together enable high-performance computing operations on TFLN.

The on-chip PD is based on the modified uni-traveling carrier (MUTC) structure with optimized transit time limited bandwidth of 165 GHz66. They are heterogeneously integrated on TFLN using adhesive bonding with SU-8, involving the following steps. (1) The bare epitaxial structure is bonded on the TFLN photonic chip with ~100 nm-thick SU-8. (2) The InP substrate is removed using HCl wet etch. (3) PD mesas are formed by a combination of standard III-V material dry etch and wet etch recipes. (4) PDs are passivated using SU-8 first before forming RF probe pads to improve yield and dark current. (5) The SU-8 is fully cured and was compatible with other post processing steps needed. We note that the RF pads were designed to be impedance matched to 50 \(\Omega\) in a ground-signal-ground configuration with 150 µm pitch, matching the available contact probes.

Experimental architecture and setup

Photonic computing for matrix-vector-multiplication using our TFLN circuit is based on the cascaded amplitude modulation of light, which takes advantage of the efficient and high-speed EO modulation characteristics of TFLN. Continuous wave light in the telecommunications C-band is coupled into the chips. This light then passes through the first amplitude modulator (AM) which encodes the data vector \(\vec{x}\) onto the light’s amplitude. The modulated optical signal is split into multiple spatial channels, using the on-chip fan-out splitting tree, each containing another AM. When these AMs are driven in time-synchrony with the data encoded using the first AM, they apply the weight vector \(\vec{a}\) resulting in overall amplitude encoding corresponding to the elements comprising the sum of \(\vec{x}\cdot \vec{a}\), which can be converted back to the electrical domain by photodetection. The generated photocurrent tracks amplitude fluctuations of the impinging light up to the detection bandwidth, and the components of \(\vec{x}\cdot \vec{a}\) can be directly digitized by an oscilloscope (Keysight UXR1102A). An electronic summation of the components yields the final result, \(\vec{x}\cdot \vec{a}\). To enable dot products between signed vectors, we adopt an approach proposed in earlier work20 that converts dot products between signed vectors to that between unsigned vectors, requiring two overhead calculations by the photonic accelerator. This overhead can be removed in future work albeit with higher system-level complexity, for example by integrating balanced photodetection67. In our experiment, the electronic signals \(\vec{x}\) and \(\vec{a}\) are generated by a high-speed arbitrary waveform generator (AWG, Keysight M8196A) and delivered to on-chip microwave transmission lines via high-speed cables and contact probes (GGB 40A-GSG). Due to limitations in the electronic equipment, all interference tasks utilize a single neuron (channel) on the TFLN chip. We note that the synchronization of \(\vec{x}\) and \(\vec{a}\) electrical drives is accomplished through adjusting the delay between them, up to an integer multiple of the sampling rate of the AWG. Both drives are generated by the same AWG. For the full spatial-multiplexing to be utilized, synchronization between each \({\vec{x}}_{i}\) and \(\vec{a}\) is possible by using multiple synchronized DACs and delay adjustment.