Probabilistic photonic computing with chaotic light

Brückerhoff-Plückelmann, Frank; Borras, Hendrik; Klein, Bernhard; Varri, Akhil; Becker, Marlon; Dijkstra, Jelle; Brückerhoff, Martin; Wright, C. David; Salinga, Martin; Bhaskaran, Harish; Risse, Benjamin; Fröning, Holger; Pernice, Wolfram

doi:10.1038/s41467-024-54931-6


Article
Open access
Published: 01 December 2024


Nature Communications volume 15, Article number: 10445 (2024)


Abstract

Biological neural networks effortlessly tackle complex computational problems and excel at predicting outcomes from noisy, incomplete data. Artificial neural networks (ANNs), inspired by these biological counterparts, have emerged as powerful tools for deciphering intricate data patterns and making predictions. However, conventional ANNs can be viewed as “point estimates” that do not capture the uncertainty of prediction, which is an inherently probabilistic process. In contrast, treating an ANN as a probabilistic model derived via Bayesian inference poses significant challenges for conventional deterministic computing architectures. Here, we use chaotic light in combination with incoherent photonic data processing to enable high-speed probabilistic computation and uncertainty quantification. We exploit the photonic probabilistic architecture to simultaneously perform image classification and uncertainty prediction via a Bayesian neural network. Our prototype demonstrates the seamless cointegration of a physical entropy source and a computational architecture that enables ultrafast probabilistic computation by parallel sampling.


Introduction

According to the neuroscience principle of free energy minimization (FEM), living organisms develop internal models of their environment to guide actions that minimize surprise and reduce uncertainty^1,2. This objective stands in contrast to that of biologically inspired artificial neural networks (ANNs), which typically aim to maximize accuracy³. Shifting focus from accuracy to handling uncertainty is pivotal in explaining the efficiency and adaptability of biological neural networks. To date, ANNs have been very successfully implemented on deterministic conventional hardware and have led to breakthrough results in areas including weather forecasting⁴, medical diagnostics⁵, autonomous driving⁶ and natural language processing^7,8,9,10. However, deterministic models are point estimates based on known data and do not take the complete posterior distribution of the parameters into account¹¹. Bayesian neural networks (BNNs) replace the deterministic network parameters with probability distributions to capture the probabilistic nature of inferring from incomplete observed data^12,13. In this way, BNNs allow for distinguishing between epistemic uncertainties due to the lack of data and aleatoric uncertainties arising from noise in the data itself^14,15. Consequently, BNNs are also significantly more robust against overfitting to small data sets^16,17. Bayesian inference also lies at the heart of the FEM principle.

Processing complex probabilistic models poses major challenges for conventional deterministic hardware. Because the integral formulations used in describing probabilistic models become intractable already for a small number of parameters, Monte Carlo methods are employed to provide approximate solutions^13,16. This includes sampling from the model’s posterior distribution multiple times and subsequently evaluating the model for each drawn sample. Thus, high-speed (true) random number generators are required in combination with an architecture capable of evaluating the full model for each sample in a reasonable time. In conventional hardware implementations, one major factor contributing to the inefficiency of machine learning systems is the reliance on the von Neumann digital architecture, which, contrary to the physics of computing substrates, enforces determinism and separates memory from computation¹⁸. Brain-inspired computing differs from conventional digital computing by emphasizing in-memory analog processing, fine-grained parallelism, reduced precision, increased randomness, adaptability, and possibly, spike-based communication¹⁹. Co-designing FEM-based learning with brain-inspired computing platforms can enhance energy efficiency and adaptability by shifting the learning objective from noise reduction (accuracy) to instead harnessing hardware noise as a valuable computational resource¹⁹. For electronic crossbar arrays, memristors serve as the main in-memory computation element due to their tunable conductance. At the same time, programming and reading the conductance of a memristor is a stochastic process due to the inherent randomness of the switching process in addition to drifts and instabilities^20,21. Since the randomness is programmable by deploying multiple memristors for a single matrix weight, it can be deployed for Bayesian inference.
In this case, sampling from the posterior distribution is implemented by reading, and potentially rewriting, the memristor several times while the neuromorphic crossbar architecture ensures the efficient evaluation of the model²². To avoid the need for sequential sampling and the random structural changes within memristive materials, transitioning to the optical domain allows for probabilistic computing in parallel with single-shot readout by deploying chaotic light. Chaotic light is an ideal entropy source for true random number generation^23,24,25,26 and can, moreover, easily be generated at standard telecom wavelengths by amplified spontaneous emission in erbium-doped fibers or erbium-doped waveguides^27,28,29,30,31. Moreover, the incoherent nature and large optical bandwidth of chaotic light allow for high-speed data processing in photonic crossbar arrays by exploiting wavelength division multiplexing.

In the following, we present a photonic neuromorphic architecture capable of performing probabilistic single-shot computations with a photonic crossbar array. We harness chaotic light fields as the entropy source of the system and as the optical carrier for probabilistic information encoding. For photonic in-memory computing, we employ the non-volatile phase change material Germanium-Antimony-Telluride (GST). Using time-amplitude modulation, we perform probabilistic data encoding and achieve parallel sampling based on spectral demultiplexing. We quantify the precision of the stochastic multiply-and-accumulate operations performed by the photonic circuit. With an incoherent photonic processor, we calculate high-speed probabilistic convolutions on visual inputs, making use of parallel spectral sampling from the output distributions. We deploy stochastic variational inference in a Bayesian neural network based on the LeNet-5³² architecture to minimize the divergence between the true posterior of our model parameters and the variational distributions educible by our encoding scheme. We benchmark the BNN’s accuracy and out-of-domain rejection on an incomplete MNIST³³ data set.

Results

System architecture

Photonic probabilistic computing relies on the capability to generate analog signals which encode input vectors with tailored mean and variance. In order to provide the desired mean/variance tuples we employ chaotic light as the entropy source of our system. In a chaotic light source, the beating between the various frequency components leads to a time varying optical intensity. Since the variance of the intensity fluctuations is proportional to the squared mean intensity, desired intensity distributions can be conveniently shaped by amplitude modulation. In addition, chaotic light offers the unique possibility to tune the autocorrelation of the fluctuations by varying the optical bandwidth. The correlation between samples drawn from the fluctuation approaches zero for sampling rates smaller than the optical bandwidth, since the coherence time, given by the Wiener-Khinchin theorem, is approximately the inverse bandwidth, see Supplementary Fig. 3. Making use of wavelength division multiplexing (WDM), we can employ a single chaotic light source, which easily spans an optical bandwidth of several THz, to provide multiple independent entropy sources. Since the computing mechanism is solely intensity-based, incoherent photonic crossbar arrays which are naturally broadband and compatible with broadband chaotic light can be used for efficient probabilistic computing. Photonic crossbars support parallel computing by WDM³⁴ and thus, a single chaotic light source can serve as the entropy source for all parallel computation channels.
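The link between beating, optical bandwidth and decorrelation can be illustrated with a toy simulation. The sketch below is not the authors' model: it assumes a chaotic field built from equal-amplitude components with random frequencies and phases inside a normalized bandwidth, and all function names and parameter values are hypothetical. It prints the relative intensity fluctuation and the correlation between intensity samples taken well within and well beyond one inverse bandwidth.

```python
import cmath
import math
import random


def chaotic_intensity(n_modes, bandwidth, times, rng):
    """Intensity of a toy chaotic field: many components with random
    frequencies (uniform within the bandwidth) and random phases."""
    freqs = [rng.uniform(-bandwidth / 2, bandwidth / 2) for _ in range(n_modes)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_modes)]
    intensity = []
    for t in times:
        field = sum(
            cmath.exp(1j * (2.0 * math.pi * f * t + p))
            for f, p in zip(freqs, phases)
        )
        intensity.append(abs(field) ** 2 / n_modes)  # normalized to mean ~1
    return intensity


def correlation(series, lag):
    """Pearson correlation between a series and itself shifted by lag >= 1."""
    a, b = series[:-lag], series[lag:]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(1)
bandwidth = 1.0                      # normalized; coherence time ~ 1 / bandwidth
step = 0.1 / bandwidth               # ten samples per coherence time
times = [i * step for i in range(3000)]
intensity = chaotic_intensity(100, bandwidth, times, rng)

mean = sum(intensity) / len(intensity)
std = math.sqrt(sum((x - mean) ** 2 for x in intensity) / len(intensity))
print("std / mean:", std / mean)
print("correlation at 0.1 coherence times:", correlation(intensity, 1))
print("correlation at 5.5 coherence times:", correlation(intensity, 55))
```

In this idealized, unfiltered picture the standard deviation is comparable to the mean, so scaling the field scales both together, and samples spaced by several inverse bandwidths are nearly uncorrelated. A real detector additionally averages over several coherence cells, which the sketch deliberately leaves out.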

Figure 1 sketches the working principle of the probabilistic processor. As a chaotic light source, we split the amplified spontaneous emission (ASE) of an erbium-doped fiber into four different waveguides and delay the signals with respect to each other beyond the coherence time with fiber loops. In this way, the superposition of the chaotic fields behaves like a single chaotic field with a mean intensity given by the sum of the individual intensities, see Supplementary Fig. 4. A desired input distribution is encoded in a sequence of pulse shapes which are modulated onto the optical carrier signals via electro-optic modulators (EOMs). These input distributions form the vector entries to the photonic crossbar which is used to perform the matrix vector multiplications (MVM). MVMs form the backbone of arithmetic operations in artificial neural networks and our photonic architecture is designed to do this in a fast and efficient way. Within the photonic crossbar array, the matrix weights are encoded into an optical attenuation using appropriate states of the non-volatile phase change material Germanium-Antimony-Telluride (GST). Here, we perform additions (accumulations) by overlapping propagating pulses in a single waveguide and multiplications by attenuating the pulses with GST cells corresponding to matrix weights. At the output of the crossbar, we demultiplex the broadband ASE light according to the 200 GHz ITU grid. Since we perform the encoding on all frequency components in parallel at the input and only demultiplex the field before detection at the output, all wavelength channels carry the same intensity distribution. Therefore, WDM enables independent parallel sampling from the output distribution while minimizing data-shuffling.

**Fig. 1: Photonic probabilistic processor.**

Probabilistic encoding

We drive the EOMs with a symbol rate of 17.6 GBaud and sample in parallel from the four ITU wavelength channels C28, C30, C32 and C34 with a readout circuit electronically limited to 30 GHz. Figure 2a shows the mean and standard deviation of the photodetector signal for ten symbols encoded in subsequent time slots separated by 56.8 ps measured in C28. There are two main contributions to the measured output distribution for a given mean $\bar{\mathrm{x}}$. First, there are intensity fluctuations due to the chaotic carrier signal, which are described by an M-fold Bose-Einstein distribution. The fluctuations in the measured signal are proportional to the mean intensity value. The degeneracy factor M depends on the number of independent temporal coherence cells within a measurement interval and is therefore linked to the optical bandwidth and the electrical bandwidth of the readout circuit. Due to the high photon numbers used in our experiments, the optical shot noise is negligible. Second, there is electronic ground noise, which can be described by a Gaussian distribution with standard deviation $\sigma_{\mathrm{el}}$. In an idealized system, the measured output distribution $\mathrm{p}\left(\mathrm{x},\bar{\mathrm{x}}\right)$ is the convolution of those two independent random processes, given as:

$$\mathrm{p}\left(\mathrm{x},\bar{\mathrm{x}}\right)=\int_{0}^{\infty}\left[\frac{\mathrm{M}^{\mathrm{M}}}{\bar{\mathrm{x}}\cdot\Gamma\left(\mathrm{M}\right)}\cdot\left(\frac{\mathrm{v}}{\bar{\mathrm{x}}}\right)^{\mathrm{M}-1}\cdot\mathrm{e}^{-\mathrm{M}\cdot\mathrm{v}/\bar{\mathrm{x}}}\right]\cdot\left[\frac{1}{\sqrt{2\cdot\pi\cdot\sigma_{\mathrm{el}}^{2}}}\cdot\mathrm{e}^{-0.5\cdot\left(\mathrm{x}-\mathrm{v}\right)^{2}/\sigma_{\mathrm{el}}^{2}}\right]\mathrm{d}\mathrm{v}$$

(1)
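As a numerical illustration of Eq. 1, the following sketch evaluates the convolution integral with a simple trapezoidal rule and draws samples as the sum of a gamma-distributed intensity and Gaussian noise. The parameter values (mean 1, M = 6, σ_el = 0.08) and all function names are hypothetical, the code assumes M > 1, and system imperfections are ignored. Because the two contributions are independent, the mean of the resulting distribution is x̄ and its variance is x̄²/M + σ_el².

```python
import math
import random
import statistics


def photonic_pdf(v, mean, m):
    """First bracket of Eq. 1: intensity density for degeneracy factor m > 1."""
    if v <= 0.0:
        return 0.0
    log_p = (m * math.log(m) - math.log(mean) - math.lgamma(m)
             + (m - 1.0) * math.log(v / mean) - m * v / mean)
    return math.exp(log_p)


def electronic_pdf(x, sigma_el):
    """Second bracket of Eq. 1: zero-mean Gaussian ground noise."""
    return math.exp(-0.5 * x * x / sigma_el ** 2) / math.sqrt(2.0 * math.pi * sigma_el ** 2)


def output_pdf(x, mean, m, sigma_el, steps=2000):
    """Eq. 1 evaluated with the trapezoidal rule over the intensity v."""
    v_max = mean * (1.0 + 10.0 / math.sqrt(m))  # ~10 photonic std above the mean
    dv = v_max / steps
    total = 0.0
    for i in range(steps + 1):
        v = i * dv
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * photonic_pdf(v, mean, m) * electronic_pdf(x - v, sigma_el)
    return total * dv


def sample_output(mean, m, sigma_el, rng):
    """Draw one sample: gamma-distributed intensity plus Gaussian noise."""
    return rng.gammavariate(m, mean / m) + rng.gauss(0.0, sigma_el)


mean, m, sigma_el = 1.0, 6.0, 0.08   # hypothetical values
rng = random.Random(0)
samples = [sample_output(mean, m, sigma_el, rng) for _ in range(100_000)]
print("sample mean:", statistics.fmean(samples))
print("sample variance:", statistics.pvariance(samples))
print("model variance:", mean ** 2 / m + sigma_el ** 2)
print("p(x = mean):", output_pdf(mean, mean, m, sigma_el))
```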

**Fig. 2: Chaotic light as an entropy source.**

Since the mean $\bar{\mathrm{x}}$ depends on the encoded input symbol, i.e., the transmission through the EOM, the chaotic carrier enables shaping of the measured output distribution by changing the programmed waveform. To describe the measured distributions, we also take system imperfections into account, such as the limited extinction ratio of the modulator and detector saturation as described in the Supplement. Figure 2b shows the output distributions measured for different encoded means at the photodetector, fitted to the complete physical model derived in the Supplementary Methods. For zero mean (i.e. maximum attenuation by the EOM), electrical noise is the dominant source of randomness, thus leading to a Gaussian shape of the output distribution. With increasing mean, intensity fluctuations described by an M-fold Bose-Einstein distribution become the major contribution to the shape of the distribution. For the largest measured mean, detector saturation reduces the width of the distribution as the maximal measured voltage is limited.

Next, we investigate the behavior of independent samples drawn from the four WDM wavelength channels shown in Fig. 2c when sampling is performed in parallel. Figure 2d shows the measured standard deviation of the output distribution in each channel as a function of the mean of the distribution. For each wavelength the standard deviation follows the model prediction. Small differences between the output distributions are caused by slightly different spectral shapes of the WDM channels and the fact that there are four different readout circuits with slightly different ground noise. Since the ASE can be modelled as the superposition of independent random emitters with fixed wavelength, there is no correlation between the intensity fluctuations in different wavelength channels for ideal demultiplexing. Practically, the measured correlation coefficients between different channels during parallel sampling are below 10^-2, as shown in Fig. 2e.

For a single symbol the mean of the distribution is directly connected to the variance by Eq. 1. In order to generate mean values with desired variance, we take the sum of 9 subsequent symbols, which enables shaping the distribution of the measured sum. Figure 2f shows the distribution for three different mean-variance tuples encoded in channel C34. If we encode the mean of the distribution only in a single symbol and set all others to zero (dark blue trace), the distribution of the sum behaves like that of a single symbol (standard deviation of 0.47). In contrast, spreading the same mean over all 9 symbols (light blue trace) leads to a distribution with the same mean but lower variance as the noise partially averages out (standard deviation of 0.29). In this way, we can tune the mean and variance of the output distribution independently. The main advantage of this encoding scheme is that the electronic readout circuit always performs identical operations, i.e., summing over 9 received symbols, and does not require any information about the noise distribution. The distribution is solely encoded in the waveform modulated onto the chaotic optical carrier at the input and is propagated through the circuit. We note that employing longer time sequences with more symbols provides a wider tuning range in the variance at the cost of longer integration time and an increased impact of the ground noise.
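The effect of spreading a mean over several symbols can be reproduced with a small toy model. The sketch below assumes that the nine symbols fluctuate independently, that each symbol carries a gamma-distributed intensity as in Eq. 1, and that each symbol measurement adds independent Gaussian electronic noise. The degeneracy factor and noise level are hypothetical values picked by hand so that the toy model lands near the reported standard deviations of 0.47 and 0.29; they are not the fitted parameters of the experiment, and detector saturation and the finite extinction ratio are ignored.

```python
import math
import random
import statistics

N_SYMBOLS = 9          # symbols summed by the readout (from the text)
M = 6.5                # hypothetical degeneracy factor
SIGMA_EL = 0.086       # hypothetical electronic noise per symbol


def encode(mean, n_active):
    """Spread `mean` evenly over the first n_active of the nine symbols."""
    return [mean / n_active] * n_active + [0.0] * (N_SYMBOLS - n_active)


def model_std(symbols):
    """Std of the summed readout under the toy model (independent symbols)."""
    photonic = sum(s * s for s in symbols) / M
    electronic = len(symbols) * SIGMA_EL ** 2
    return math.sqrt(photonic + electronic)


def sample_sum(symbols, rng):
    """One readout value: sum of nine noisy symbol measurements."""
    total = 0.0
    for s in symbols:
        intensity = rng.gammavariate(M, s / M) if s > 0.0 else 0.0
        total += intensity + rng.gauss(0.0, SIGMA_EL)
    return total


rng = random.Random(0)
for n_active in (1, 3, 9):
    symbols = encode(1.0, n_active)
    draws = [sample_sum(symbols, rng) for _ in range(20_000)]
    print(n_active, statistics.fmean(draws), statistics.pstdev(draws), model_std(symbols))
```

In this model the photonic variance falls as the mean is spread over more symbols, while the electronic contribution stays fixed, which is why the tuning range of the variance is bounded from below by the ground noise.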

Photonic in-memory computing

We employ a photonic crossbar array for probabilistic computation. The architecture exploits photonic in-memory computing with waveguide-coupled GST nanocells used as memory and multiplication units. We tune the optical attenuation of the phase change material by partially switching it between its barely absorptive amorphous and highly absorptive crystalline states, see Supplementary Fig. 5. Since both states are non-volatile, GST enables in-waveguide multiplication without requiring a constant power supply to hold the memory state. Figure 3a shows the concept of photonic multiplication operations performed on time-varying input waveforms. The optical pulses corresponding to a desired input value propagate through the waveguide and couple evanescently to the GST cell. Through the interaction, the pulses are attenuated by an amount depending on the phase state of the GST and are used for further processing. Besides multiplication, we perform photonic additions by overlapping two pulse shapes in a single waveguide as sketched in Fig. 3b. As described in the Supplementary Methods, the sum of two chaotic light fields behaves like a single field with a mean corresponding to the sum of the input means. Since both multiplication and addition operations are linear and (optical) phase-insensitive, we parallelize them via wavelength division multiplexing.

We analyze the multiplication operation by optically writing a relative transmission coefficient of 0.6 into the GST nanocell. As an example, we choose an input distribution with a mean of 1 and sample from the output distribution in four WDM channels in parallel. Figure 3c shows the measured input and output distribution together with the model prediction for the different wavelength channels. As expected, decreasing the transmission through the GST cell decreases the mean of the distribution accordingly. In contrast, the standard deviation is not linearly decreased. The imperfection is due to the electronic ground noise of the readout circuit, e.g., the thermal noise of the detector and transimpedance amplifier, and is therefore independent of the optical intensity. Since we do not have direct control over the distribution of the electronic ground noise, we cannot exploit it for probabilistic modelling. Next, we investigate the accumulation, i.e. addition, of two input distributions. Figure 3d shows the measured mean of each output distribution as a function of the sum of the input distributions’ means. As expected, the means of the distributions add up and align with the model prediction with an average error of 0.4% and a spread of 2.6%. In addition, we compare the standard deviation of the output distribution with the standard deviation of the input distributions. Because the electronic noise is independent of the mean of the distribution, the standard deviations do not add up in the same way as the means. When comparing the width of the output distribution with the model prediction, we observe good alignment with the model within 0.6%, with a spread of 1.2%.
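The different scaling of mean and standard deviation can be made explicit with a short calculation. The sketch below assumes the idealized model of Eq. 1, in which the photonic variance is the squared mean divided by M and the electronic noise adds a constant variance, and it uses the statement that overlapping chaotic fields behave like a single field with the summed mean. The values of M and σ_el are hypothetical and the function name is illustrative.

```python
import math

M = 6.5            # hypothetical degeneracy factor
SIGMA_EL = 0.26    # hypothetical electronic noise of one measurement


def output_moments(input_means, weights):
    """Mean and std after weighting (GST attenuation) and accumulation."""
    mean = sum(w * x for w, x in zip(weights, input_means))
    std = math.sqrt(mean ** 2 / M + SIGMA_EL ** 2)
    return mean, std


# Multiplication: one input with mean 1, transmission 1.0 versus 0.6.
mean_in, std_in = output_moments([1.0], [1.0])
mean_out, std_out = output_moments([1.0], [0.6])
print("mean ratio:", mean_out / mean_in)
print("std ratio:", std_out / std_in)

# Accumulation: two inputs measured separately and then overlapped.
mean_a, std_a = output_moments([0.4], [1.0])
mean_b, std_b = output_moments([0.6], [1.0])
mean_sum, std_sum = output_moments([0.4, 0.6], [1.0, 1.0])
print("means add:", mean_a + mean_b, mean_sum)
print("stds do not:", std_a + std_b, std_sum)
```

The mean follows the programmed transmission exactly, whereas the standard deviation shrinks by less than that factor because the electronic term does not scale with the optical intensity.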

Probabilistic convolutions

Moving beyond individual operations on distributions, we fabricate the 4×4 photonic crossbar array shown in Fig. 4a which combines multiple multiplication and addition units. The high parallelism of the crossbar architecture is well suited for convolution processing due to the shared weights³⁵. In Fig. 4b we illustrate the application of a programmable probabilistic convolution operation to a visual (image) input. As in the deterministic (convolutional) counterpart³⁵, we encode the convolution kernel weights in the GST cells and encode the image pixel values in the mean input intensities. In addition, we selectively choose how to spread the mean over the 9 subsequent input symbols contributing to the output distribution. Here, we employ an encoding represented by a probabilistic mask. In the outer area of the picture, we encode the mean given by the pixel value in a single symbol and set the remaining symbols to zero (wide standard deviation of 0.45 for mean 1). In the inner region we spread the mean over all nine symbols (narrow standard deviation of 0.28 for mean 1). In this way, the output distributions in the outer area exhibit larger noise levels in the convolution output compared to the inner area. The measured noise level is the superposition of the photonic noise, which is tunable via waveform encoding, and the electronic ground noise, which is not tunable and thus deteriorates the dynamic range of the noise encoding. It is important to note that the electronic readout circuit and post-processing always perform the same operation and are independent of the encoded noise. The noise level is solely encoded in the waveform modulated onto the optical carrier signal and thus no additional communication between the input and output of the processor is required.

**Fig. 4: Probabilistic convolution processing.**

We optically program the kernel weights for average pooling into the GST nanocells within the crossbar and calculate a probabilistic convolution on the input image with a stride of two. We encode the input vectors on four chaotic light carrier signals and sample from the output distribution after the crossbar in parallel with four wavelength channels. We perform both the signal encoding and sampling with 17.6 GBaud per channel. Figure 4c-f shows a sample from the convolution’s output distribution for each wavelength channel, color coded by the ITU WDM channel as in Fig. 2c. As determined by the probabilistic mask, the average pooled output is qualitatively noisier in the outer region of the picture than in the inner region. We quantitatively compare the pooling of pixels with identical means in both regions. As expected, the standard deviation is smaller for the same mean in the inner region. For both cases the output distribution aligns well with the model prediction.
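The masked pooling operation can be mimicked in software with a toy model. The sketch below is an illustration under stated assumptions rather than a description of the experiment: it uses a 2×2 average-pooling kernel with stride two, a hypothetical 8×8 constant image, a mask whose inner 4×4 block uses the narrow encoding, the idealized noise model of Eq. 1, and hand-picked values for M and the per-symbol electronic noise. Within each symbol slot the weighted pulses overlap, so their means add before the chaotic fluctuation is drawn.

```python
import random
import statistics

M = 6.5                 # hypothetical degeneracy factor
SIGMA_EL = 0.086        # hypothetical electronic noise per symbol
N_SYMBOLS = 9
KERNEL = [0.25, 0.25, 0.25, 0.25]   # 2x2 average pooling weights


def encode(value, narrow):
    """Nine-symbol waveform: spread over all symbols (narrow) or one symbol (wide)."""
    if narrow:
        return [value / N_SYMBOLS] * N_SYMBOLS
    return [value] + [0.0] * (N_SYMBOLS - 1)


def sample_pooled(image, mask, rng):
    """One sample of the stride-2 pooled image; mask[r][c] is True for narrow pixels."""
    out = []
    for r in range(0, len(image) - 1, 2):
        row = []
        for c in range(0, len(image[0]) - 1, 2):
            patch = [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]
            waves = [encode(image[i][j], mask[i][j]) for i, j in patch]
            total = 0.0
            for k in range(N_SYMBOLS):
                # Weighted pulses overlap in one waveguide: the means add.
                mean_k = sum(w * wave[k] for w, wave in zip(KERNEL, waves))
                light = rng.gammavariate(M, mean_k / M) if mean_k > 0.0 else 0.0
                total += light + rng.gauss(0.0, SIGMA_EL)
            row.append(total)
        out.append(row)
    return out


size = 8
image = [[1.0] * size for _ in range(size)]
mask = [[2 <= r < 6 and 2 <= c < 6 for c in range(size)] for r in range(size)]

rng = random.Random(0)
samples = [sample_pooled(image, mask, rng) for _ in range(500)]
outer = [s[0][0] for s in samples]
inner = [s[1][1] for s in samples]
print("outer mean/std:", statistics.fmean(outer), statistics.pstdev(outer))
print("inner mean/std:", statistics.fmean(inner), statistics.pstdev(inner))
```

Each call to `sample_pooled` plays the role of one wavelength channel: the readout code is identical for both regions, and only the input waveform decides how noisy a pooled pixel is.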

Bayesian inference

Integrating the photonic probabilistic computing architecture in a neural network enables out-of-domain (OOD) detection via Bayesian inference. In this way the network does not only generate a prediction but also quantifies the similarity between the input and previously observed data, providing a measure for the confidence of the neural network. Thus, uncertainty estimation via BNNs is fundamentally different from uncertainty estimation in deterministic ANNs. For example, training an ANN for a classification task via the cross-entropy loss will primarily optimize for confident classification within known classes and does not consider whether an input is substantially different from the training distribution. Thus, deterministic ANNs can be easily fooled by OOD data³⁶. We compare the OOD detection of the BNN with different deterministic methods like Maximum Predicted Softmax Probability³⁷, MaxLogit³⁸ and energy-based ones³⁹ in Supplementary Fig. 7.

We design a modified LeNet-5³² deep neural network architecture, shown in Fig. 5a, and deploy the MNIST dataset for training and benchmarking. The dataset contains images of handwritten digits, which are to be categorized into ten classes, representing the numbers zero to nine. We create an OOD scenario by training the network only with nine classes, numbers zero to eight, and deploy the handwritten nines as out-of-domain data. While many options for training via Bayesian inference exist, we choose to make use of the natural similarity between stochastic variational inference (SVI) and the photonic accelerator. Variational inference approximates the true posterior of the model parameters by utilizing simpler distributions $q(w)$, in this case the probabilistic properties of the chaotic light. During training, SVI maximizes the so-called evidence lower bound (ELBO) using backpropagation. The ELBO consists of two terms: the first term represents the expected log-likelihood of the data, and the second term is the Kullback-Leibler (KL) divergence between the approximated posterior $q(w)$ and the prior $p(w)$. It can be broadly written as⁴⁰:

$$\mathrm{ELBO}\left(q\right)=\mathrm{E}\left[\log p\left(\mathrm{D}|\mathrm{w}\right)\right]-\mathrm{KL}\left(q\left(\mathrm{w}\right)\,||\,p\left(\mathrm{w}\right)\right)$$

(2)
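To make the two terms of Eq. 2 concrete, the sketch below estimates the ELBO for a deliberately tiny problem. Everything in it is an assumption chosen for illustration: a single scalar parameter w, a Gaussian variational distribution q(w), a standard normal prior p(w), and toy data modelled as w plus unit-variance Gaussian noise. The expectation is approximated by Monte Carlo sampling from q(w), and the KL term uses the closed form for two Gaussians. It does not reproduce the network or the training code of the paper.

```python
import math
import random


def kl_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between two univariate Gaussians (closed form)."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)


def log_likelihood(data, w, noise_sigma=1.0):
    """log p(D | w) for a toy model: each observation is w plus Gaussian noise."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * noise_sigma ** 2)
        - 0.5 * ((y - w) / noise_sigma) ** 2
        for y in data
    )


def elbo(data, mu_q, sigma_q, mu_p=0.0, sigma_p=1.0, n_samples=2000, seed=0):
    """Monte Carlo estimate of Eq. 2 with w drawn from q(w)."""
    rng = random.Random(seed)
    expected_ll = sum(
        log_likelihood(data, rng.gauss(mu_q, sigma_q)) for _ in range(n_samples)
    ) / n_samples
    return expected_ll - kl_gaussians(mu_q, sigma_q, mu_p, sigma_p)


data = [1.8, 2.1, 2.4, 1.9, 2.2]   # hypothetical observations
print("q centred near the data:", elbo(data, mu_q=1.7, sigma_q=0.4))
print("q centred far away:", elbo(data, mu_q=-1.0, sigma_q=0.4))
```

A variational distribution that explains the data while staying close to the prior obtains the larger ELBO, which is the quantity that training maximizes.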

**Fig. 5: Bayesian inference on the incomplete MNIST dataset.**

To accelerate the off-chip training, we approximate the complex photonic distribution shown in Eq. 1 by a Gaussian distribution during training.

Figure 5b shows the classification performance, as well as the average OOD performance during training. We use the intuitive metric of accuracy on a test subset of known classes to monitor classification performance. To determine how well the BNN distinguishes OOD images from in-domain (ID) images, we compare Mutual Information (MI) on test images of known against unknown classes^41,42. The network quickly learns to correctly classify known images, as the test accuracy rises to over 99%. While the difference in Mutual Information between known (low MI) and unknown (high MI) samples improves at the same time, it is much slower and converges to a fixed difference after about 100 epochs. This shows that by using SVI training the BNN has effectively learned to correctly classify images of known class, while being able to identify OOD images.
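One common way to compute such a mutual information score from repeated stochastic forward passes is to subtract the average entropy of the individual predictions from the entropy of the averaged prediction. The sketch below assumes this definition, which may differ in detail from the one used in the cited references, and uses made-up class probabilities for three classes.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; terms with zero probability contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mutual_information(sampled_probs):
    """Entropy of the mean prediction minus the mean entropy of the samples."""
    n = len(sampled_probs)
    n_classes = len(sampled_probs[0])
    mean_probs = [sum(s[c] for s in sampled_probs) / n for c in range(n_classes)]
    mean_entropy = sum(entropy(s) for s in sampled_probs) / n
    return entropy(mean_probs) - mean_entropy


# Hypothetical softmax outputs from three samples of the same input.
consistent = [[0.97, 0.02, 0.01], [0.96, 0.03, 0.01], [0.98, 0.01, 0.01]]
conflicting = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]
print("consistent samples:", mutual_information(consistent))
print("conflicting samples:", mutual_information(conflicting))
```

Samples that agree with each other give a score near zero, while samples that are individually confident but disagree give a high score, which matches the low and high mutual information expected for known and unknown classes.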

Finally, we transfer the learned parameters to the physical encoding scheme shown in Fig. 2f; the exact description of the implementation is given in the Supplement. Figure 5c shows the output distributions for an ID image, clearly assigning the highest output scores to the correct classes. In contrast, the output distributions overlap for the unknown number nine as shown in Fig. 5d. ID and OOD data can be distinguished on a per-image basis using Mutual Information as shown in Fig. 5e. Evaluating the network again on the test subset leads to similar results as for the Gaussian approximation during training. The accuracy slightly decreases from 99.41% to 99.37% whereas the relative difference in average mutual information between OOD and ID data increases from ×23.24 to ×25.60.

Discussion

The probabilistic photonic processing architecture outlined above enables parallel sampling of distributions at high speed dictated by telecom frequencies. In contrast to electronic probabilistic processors which employ the switching dynamics of stochastic magnetic tunnel junctions, hafnium-oxide-based filamentary memristors or phase change materials as an entropy source, chaotic light sources provide physical entropy with very high bandwidth. Differing from optical entropy sources, such electronic probabilistic approaches are limited by their sequential sampling process and the material properties, i.e., large switching times in comparison to optical encoding and limited endurance. Our approach overcomes those limitations by using a chaotic light source in combination with a broadband incoherent photonic crossbar array, encoding the distribution in subsequent temporal bins and deploying broadband computation with spectral demultiplexing at the outputs for parallel sampling. With an electronic bandwidth of 30 GHz, a symbol rate of 17.6 GBaud per channel and sampling from 4 channels in parallel, the effective sample rate from a single matrix output distribution is 70.4 GS/s. In comparison, the sample rate, i.e., the inverse time for programming and reading the underlying material entropy source, ranges from 500 MS/s²⁰ to 1 MS/s²¹, which implies a speedup of more than 2 orders of magnitude with photonic sampling. Similarly, (pseudo) random number generation on conventional deterministic hardware slows the system down to MS/s rates as shown in benchmark tests with a 2x AMD EPYC ROME, 32 Cores / 64 Threads CPU system in Supplementary Table 1. Thus far, prior photonic implementations realized probabilistic computing only on a single synapse level²⁵, were experimentally limited to sampling rates in the kS/s range⁴³ or were envisioned for future hybrid processors⁴⁴.
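The quoted rates follow from simple arithmetic, reproduced in the short sketch below. The numbers are taken from the text; the variable names are illustrative only.

```python
SYMBOL_RATE_GBAUD = 17.6      # per channel, from the text
N_CHANNELS = 4                # parallel WDM channels, from the text
ELECTRONIC_RATES_MSPS = {"fast": 500.0, "slow": 1.0}  # quoted range

symbol_period_ps = 1e3 / SYMBOL_RATE_GBAUD
photonic_rate_gsps = SYMBOL_RATE_GBAUD * N_CHANNELS

print(f"symbol period: {symbol_period_ps:.1f} ps")
print(f"effective sample rate: {photonic_rate_gsps:.1f} GS/s")
for label, rate_msps in ELECTRONIC_RATES_MSPS.items():
    speedup = photonic_rate_gsps * 1e3 / rate_msps
    print(f"speedup vs {label} electronic source: {speedup:.0f}x")
```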

The physical entropy source is a natural fit for stochastic variational inference (SVI) as one of the major methods for Bayesian neural networks since SVI allows for operation on arbitrary parametrized probability distributions. This enables us to design an SVI representation of the photonic processor, including probabilistic and deterministic parameters, to train a Bayesian neural network that allows reasoning about uncertainties. In contrast to standard BNN implementations based on probabilistic weights, we show that we can adhere to hardware properties by employing learnable probabilistic activations in our BNN architecture. We demonstrate the effectiveness of this approach on a BNN architecture trained for image classification with simultaneous OOD detection.

From a computing perspective, employing chaotic light as a carrier leads to interesting properties that pave the way for future scaling and integration. First, arbitrary convolution operations will require larger crossbar arrays as the number of inputs is given by the squared kernel size times the number of input layers instead of the four inputs required for depthwise pooling. Since the photonic processor proposed here is based on the beating between the frequency components of chaotic light, we can use the same optical wavelength channel for all crossbar inputs, thus decoupling the number of inputs from the required optical bandwidth. Second, all components, and especially the optical sources, should be integrated for practical use with the rest of the processor. As chaotic light arises from amplified spontaneous emission, semiconductor optical amplifiers can also be used to generate the required carrier signals. With recent advances in the integration of Indium Phosphide as a gain medium^45,46, silicon photonics platforms will be well suited for future integration. In addition, silicon-on-insulator (SOI) allows more compact matrix cells on the order of 30 µm × 30 µm³⁵, and potentially offers PIN heaters for simplified PCM switching⁴⁷ as well as the required high-speed modulators and detectors. Third, the speed of the I/O of the photonic processor must be compatible with the other components in the full signal chain to avoid dataflow and memory problems. Practically, this means that the sampling rate of a single channel will likely need to be reduced. The probabilistic computing capabilities remain unaffected by this change, as the photonic noise is dependent on the ratio between the electronic and optical bandwidth. Thus, a slower electronic I/O only requires a smaller optical bandwidth for one measurement channel. The resulting increase in optical coherence time is directly compensated by the fact that the sampling rate is reduced.
Similarly, the maximal throughput is unaffected as the smaller optical bandwidth directly enables more parallel channels for sampling and thus compensates the lower sampling rate. It is important to note that first findings show that a single probabilistic layer is sufficient to capture the uncertainty of the model⁴⁸. Therefore, a single photonic processor as introduced here might be sufficient to handle the full probabilistic workload while deterministic hardware with high memory density and optimized for parallel processing computes the subsequent layers, which must be evaluated independently for each sample.
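The scaling argument can be written down as a small bookkeeping exercise. The sketch below is a simplified model under assumptions that are not taken from the paper: the per-channel sample rate is set equal to the electronic bandwidth, the optical width of a channel is a fixed multiple of the electronic bandwidth (so that the photonic noise level stays the same), and the total usable optical bandwidth is a hypothetical 4 THz. All names and numbers are illustrative.

```python
def operating_point(total_optical_ghz, electronic_ghz, bandwidth_ratio):
    """Toy scaling model: channel width follows the electronic bandwidth."""
    channel_ghz = bandwidth_ratio * electronic_ghz      # optical width per channel
    n_channels = int(total_optical_ghz // channel_ghz)  # parallel sampling channels
    sample_rate_gsps = electronic_ghz                   # assumed proportional to I/O speed
    return {
        "channels": n_channels,
        "noise_ratio": channel_ghz / electronic_ghz,    # sets the photonic noise level
        "throughput_gsps": n_channels * sample_rate_gsps,
    }


fast = operating_point(total_optical_ghz=4000.0, electronic_ghz=20.0, bandwidth_ratio=10.0)
slow = operating_point(total_optical_ghz=4000.0, electronic_ghz=5.0, bandwidth_ratio=10.0)
print(fast)
print(slow)
```

Under these assumptions, slowing the electronic interface by a factor of four leaves the bandwidth ratio unchanged and quadruples the number of channels, so the aggregate sample rate stays the same.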

A key feature of our architecture is that both the computation of multiply-and-accumulate operations with the photonic crossbar array and the tunable noise generation with a chaotic light source are passive transmission measurements. Thus, limitations such as limited endurance and low sampling rates arising from entropy sources based on material switching dynamics do not apply. A photonic crossbar array is functional over a range of several THz^34,35, chaotic light sources easily support dozens of THz, and only a single wavelength channel is needed to draw a sample from the output distribution. Hence, the overall speed is solely limited by the electronic interface. Overall, our approach to probabilistic computing provides an effective method to remove the computational bottleneck of probabilistic modelling with conventional deterministic hardware^49,50,51,52.

Methods

Nanofabrication

We create the photonic chip design with gdshelpers⁵³, a Python-based open-source design framework for integrated circuits. Our material stack consists of HSQ-clad stoichiometric LPCVD Si3N4 films (330 nm) atop a SiO2 dielectric (3300 nm), with silicon serving as the substrate material. The wafers are obtained from Rogue Valley Microdevices and are annealed prior to fabrication to improve the quality of the Si3N4 film. The fabrication process encompasses four stages. First, we deposit gold markers for aligning the various masks with respect to each other. Second, we pattern the photonics. Third, we sputter the phase-change material Germanium-Antimony-Tellurium (GST-225). Finally, we clad the waveguides with HSQ. For exposing the various resists, we deploy the 100 kV Raith EBPG 5150 electron beam lithography tool.

To create the gold markers, we start by spin-coating the positive resist polymethyl methacrylate (PMMA) from the AllResist AR-P 672 series. Following resist baking, we expose the marker regions. Subsequently, we develop the resist in a methyl isobutyl ketone (MIBK) and isopropyl alcohol (IPA) solution and evaporate a stack of chromium (5 nm)/gold (80 nm)/chromium (5 nm) through physical vapor deposition (PVD). The chromium layers at the bottom and top enhance adhesion and protect the gold surface. We lift off the unexposed areas via sonication in acetone. Next, we pattern the photonic circuit into the negative resist AR-N 7520.12 (Allresist), which is spin-coated with a thickness of 350 nm. After development in an MF-319 (Microposit) solution, we etch the mask into the silicon nitride layer via reactive ion etching in a CHF3/O2 plasma (Oxford PlasmaPro RIE 80). Then we remove the mask with oxygen plasma. We fabricate the PMMA mask for GST deposition in the same way as the mask for gold evaporation. After development, we sputter-deposit 10 nm of GST-225 covered by 10 nm of Al2O3, which locally confines the GST during melt-quenching and furthermore protects it from oxidation. Next, we lift off the unexposed areas with acetone. Finally, we spin-coat and expose 800 nm of the negative resist HSQ/FOX16 (Dow Corning) to clad the photonic circuit.

Experimental setup

We deploy an Agiltron ASES-1611A3113 as a chaotic light source and, after amplifying the light with a PriTel FA-33-IO, filter it to the relevant wavelength region, channels C28/C30/C32/C34 of the ITU grid. Afterwards, we split the light into four input channels and delay the channels by at least 1.25 ns with respect to each other. Then we modulate the pulse shapes onto the chaotic carrier signal with OptiLab IML-1550-40-PM-V electro-optic modulators (EOMs). The EOMs are controlled by a Keysight M9502A. For each pulse shape, we optimize the coupling to the chip by adjusting the polarization. To measure the output of the system, we amplify the signal with a PriTel LNTFA-20-NMA before splitting it into the four wavelength channels. For detection, we deploy Thorlabs RXM38AF detectors, which are connected to a Keysight DSA-X 95004Q to measure the optical intensity. The overall bandwidth of the detection system is limited to 30 GHz by the oscilloscope. We use the Python interfaces of the arbitrary waveform generator and the oscilloscope to control the complete system from a PC, as shown in Supplementary Fig. 1.
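
For orientation, the channel labels and the delay quoted above can be converted into frequencies, wavelengths and a fibre length. The short Python sketch below relies on two assumptions that are not stated in the setup description: the common 100 GHz C-band labelling in which channel n sits at 190.0 THz + n × 0.1 THz, and a delay implemented in standard single-mode fibre with a group index of about 1.468. The function names are hypothetical.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def itu_channel_frequency_thz(channel: int) -> float:
    """Assumed convention: 100 GHz grid, f = 190.0 THz + 0.1 THz * channel."""
    return 190.0 + 0.1 * channel


def wavelength_nm(frequency_thz: float) -> float:
    """Vacuum wavelength in nm for an optical frequency given in THz."""
    return C_M_PER_S / (frequency_thz * 1e12) * 1e9


def fibre_length_for_delay_m(delay_s: float, group_index: float = 1.468) -> float:
    """Fibre length producing `delay_s`, assuming the given group index."""
    return C_M_PER_S * delay_s / group_index


for channel in (28, 30, 32, 34):
    f_thz = itu_channel_frequency_thz(channel)
    print(f"C{channel}: {f_thz:.1f} THz, {wavelength_nm(f_thz):.2f} nm")

delay_length = fibre_length_for_delay_m(1.25e-9)
print(f"1.25 ns delay corresponds to about {delay_length:.3f} m of fibre")
```

Under the assumed labelling, the four channels are spaced by 200 GHz, which is well above the 30 GHz detection bandwidth.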

Phase change photonics

The photonic crossbar arrays consist of multiple cells, as shown in Supplementary Fig. 5a, each representing one matrix weight. The input light corresponding to the vector component is coupled by a directional coupler to a crossing with integrated Germanium-Antimony-Telluride (GST) on top, which serves as a tunable, non-volatile attenuator. Afterwards, the light is coupled by a second directional coupler to the output waveguide. The transmission through the GST crossing strongly depends on the phase state of the GST, which is highly absorptive in its crystalline state but absorbs only weakly in its amorphous state. We can trigger a phase transition of the GST, and hence tune the matrix, by sending high-power optical pulses through the GST cell. Supplementary Fig. 5b shows a typical programming of the GST to different transmission levels relative to the crystalline one. In a closed loop, we measure the transmission through the cell and adjust the power of the 200 ns write pulse to obtain the desired weight. With pulse powers between 4 mW and 14 mW, we can set the transmission with an error below 1%.
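
The closed-loop programming can be pictured as a simple search over the write-pulse power. The following Python sketch is illustrative and is not our control software: it assumes that the transmission increases monotonically with pulse power and that each pulse sets the state independently of the previous one, it uses bisection as one possible update rule, and `toy_cell` is a hypothetical, noise-free stand-in for the hardware.

```python
def program_cell(write_and_measure, target, p_min_mw=4.0, p_max_mw=14.0,
                 tolerance=0.01, max_pulses=30):
    """Bisect the write-pulse power until the measured transmission is
    within `tolerance` (relative error) of `target`.

    `write_and_measure(power_mw)` must apply one write pulse and return the
    resulting transmission relative to the crystalline state.
    """
    low, high = p_min_mw, p_max_mw
    for pulse in range(1, max_pulses + 1):
        power = 0.5 * (low + high)
        measured = write_and_measure(power)
        relative_error = (measured - target) / target
        if abs(relative_error) < tolerance:
            return power, measured, pulse
        if relative_error < 0:
            low = power   # transmission too low: try a stronger pulse
        else:
            high = power  # transmission too high: try a weaker pulse
    raise RuntimeError("target transmission not reached")


def toy_cell(power_mw):
    """Hypothetical cell model: linear response between 4 mW and 14 mW."""
    return 1.0 + 0.08 * (power_mw - 4.0)


power, transmission, pulses = program_cell(toy_cell, target=1.5)
print(power, transmission, pulses)
```

A real cell responds nonlinearly and with pulse-to-pulse variation, so a practical loop would re-measure after every pulse, as described above, rather than rely on a fixed response curve.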

Data availability

All data are available in the main text or supplementary materials.

References

1. Friston, K. The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138 (2010).
2. Friston, K. et al. The free energy principle made simpler but not too simple. Phys. Rep. 1024, 1–29 (2023).
3. Lecun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
4. Bi, K. et al. Accurate medium-range global weather forecasting with 3D neural networks. Nature 619, 533–538 (2023).
5. Ker, J. & Wang, L. Deep Learning Applications in Medical Image Analysis. IEEE Access 6, 9375–9389 (2018).
6. Rao, Q. & Frtunikj, J. Deep Learning for Self-Driving Cars: 2018. IEEE/ACM 1st Int. Work. Softw. Eng. AI Auton. Syst. 35, 38 (2018).
7. Arkhangelskaya, E. O. & Nikolenko, S. I. Deep learning for natural language processing: a survey. J. Math. Sci. 273, 533–582 (2023).
8. Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 5999, 6009 (2017).
9. Bahdanau, D., Cho, K. H. & Bengio, Y. Neural machine translation by jointly learning to align and translate. 3rd Int. Conf. Learn. Represent. ICLR 2015 - Conf. Track Proc. (2015).
10. Sutskever, I., Vinyals, O. & Le, Q. V. Sequence to sequence learning with neural networks. Adv. Neural Inf. Process. Syst. 4, 3104–3112 (2014).
11. Nixon, J., Dusenberry, M. & Liu, J. Measuring Calibration in Deep Learning. (2015).
12. Mackay, D. J. C. A Practical Bayesian Framework for Backprop Networks. 74, 1–20 (1992).
13. Jospin, L. V. Hands-on Bayesian Neural Networks – A Tutorial for Deep Learning Users. arXiv (2020).
14. Kiureghian, A. Der & Ditlevsen, O. Aleatory or epistemic? Does it matter? Struct. Saf. 31, 105–112 (2009).
15. Hüllermeier, E. & Waegeman, W. Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Machine Learning 110 (Springer US, 2021).
16. Ghahramani, Z. Probabilistic machine learning and artificial intelligence. https://doi.org/10.1038/nature14541 (2015).
17. Bishop, C. M. Pattern Recognition and Machine Learning. EAI/Springer Innovations in Communication and Computing (2021). https://doi.org/10.1007/978-3-030-57077-4_11.
18. Sebastian, A., Le Gallo, M., Khaddam-Aljameh, R. & Eleftheriou, E. Memory devices and applications for in-memory computing. Nat. Nanotechnol. 15, 529–544 (2020).
19. Mehonic, A. & Kenyon, A. J. Brain-inspired computing needs a master plan. Nature 604, 255–260 (2022).
20. Liu, S. et al. Bayesian neural networks using magnetic tunnel junction-based probabilistic in-memory computing. Front. Nanotechnol. 4, 1–16 (2022).
21. Bonnet, D. et al. Bringing uncertainty quantification to the extreme-edge with memristor-based Bayesian neural networks. Nat. Commun. 14, 1–13 (2023).
22. Le Gallo, M. et al. A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference. Nat. Electron. 6, 680–693 (2023).
23. Guo, Y. et al. 40 Gb/s quantum random number generation based on optically sampled amplified spontaneous emission. APL Photonics 6, 066105 (2021).
24. Wu, C. et al. Harnessing optoelectronic noises in a photonic generative network. Sci. Adv. 8, 1–8 (2022).
25. Wu, C., Yang, X., Chen, Y. & Li, M. Photonic Bayesian Neural Network Using Programmed Optical Noises. IEEE J. Sel. Top. Quantum Electron. 29, 1–16 (2023).
26. Ma, B., Zhang, J. & Li, X. Stochastic photonic spiking neuron for Bayesian inference with unsupervised learning. Opt. Lett. 48, 1411–1414 (2023).
27. Vannucci, G. & Teich, M. C. Computer simulation of superposed coherent and chaotic radiation. Appl. Opt. 19, 548 (1980).
28. Goodman, J. Statistical optics. (2000).
29. Shimoda, K., Takahasi, H. & Townes, C. H. Fluctuations in Amplification of Quanta with Application to Maser Amplifiers. J. Phys. Soc. Jpn. 12, 686–700 (1957).
30. Pietralunga, S. M., Martelli, P. & Martinelli, M. Photon statistics of amplified spontaneous emission in a dense wavelength-division multiplexing regime. Opt. Lett. 28, 152 (2003).
31. Liu, Y. et al. A photonic integrated circuit-based erbium-doped amplifier. Science 1313, 1309–1313 (2022).
32. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2323 (1998).
33. Deng, L. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Process. Mag. 29, 141–142 (2012).
34. Brückerhoff-Plückelmann, F. et al. Broadband photonic tensor core with integrated ultra-low crosstalk wavelength multiplexers. Nanophotonics 11, 4063–4072 (2022).
35. Feldmann, J. et al. Parallel convolutional processing using an integrated photonic tensor core. Nature 589, 52–58 (2021).
36. Nguyen, A., Yosinski, J. & Clune, J. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 07-12-June, 427–436 (2015).
37. Hendrycks, D. & Gimpel, K. A baseline for detecting misclassified and out-of-distribution examples in neural networks. 5th Int. Conf. Learn. Represent. ICLR 2017 - Conf. Track Proc. 1–12 (2017).
38. Hendrycks, D. et al. Scaling Out-of-Distribution Detection for Real-World Settings. (2022).
39. Liu, W., Wang, X., Owens, J. D. & Li, Y. Energy-based out-of-distribution detection. Adv. Neural Inf. Process. Syst. 2020-December (2020).
40. Blei, D. M., Kucukelbir, A. & McAuliffe, J. D. Variational Inference: A Review for Statisticians. J. Am. Stat. Assoc. 112, 859–877 (2017).
41. Depeweg, S., Hernandez-Lobato, J. M., Doshi-Velez, F. & Udluft, S. Decomposition of Uncertainty in Bayesian Deep Learning for Efficient and Risk-sensitive Learning. 35th Int. Conf. Mach. Learn. ICML 2018 3, 1920–1934 (2018).
42. Wimmer, L., Sale, Y., Hofman, P., Bischl, B. & Hüllermeier, E. Quantifying Aleatoric and Epistemic Uncertainty in Machine Learning: Are Conditional Entropy and Mutual Information Appropriate Measures? Proc. Mach. Learn. Res. 216, 2282–2292 (2023).
43. Choi, S. et al. Photonic probabilistic machine learning using quantum vacuum noise. 1–8 (2024).
44. Roques-Carmes, C. et al. Heuristic recurrent algorithms for photonic Ising machines. Nat. Commun. 11, 249 (2020).
45. Yan, Z. et al. A monolithic InP/SOI platform for integrated photonics. Light Sci. Appl. 10, 200 (2021).
46. Shang, K., Pathak, S., Guan, B., Liu, G. & Yoo, S. J. B. Low-loss compact multilayer silicon nitride platform for 3D photonic integrated circuits. Opt. Express 23, 21334 (2015).
47. Erickson, J. R. et al. Comparing the thermal performance and endurance of resistive and PIN silicon microheaters for phase-change photonic applications. Opt. Mater. Express 13, 1677 (2023).
48. Sharma, M., Farquhar, S., Nalisnick, E. & Rainforth, T. Do Bayesian Neural Networks Need To Be Fully Stochastic? Proc. Mach. Learn. Res. 206, 7694–7722 (2023).
49. Bingham, E., Chen, J. P., Szerlip, P. & Goodman, N. D. Pyro: Deep Universal Probabilistic Programming. 0–5.
50. Schrijvers, T., Van Den Berg, B. & Riguzzi, F. Automatic Differentiation in Prolog. Theory Pract. Log. Program. 1–4 (2023). https://doi.org/10.1017/S1471068423000145.
51. Rossum, G. Van & Drake, F. L. Python Ref. Man. Oct. 22, 9117–9129 (2006).
52. Lam, S. K., Pitrou, A. & Seibert, S. Numba: A LLVM-based Python JIT Compiler. in Proceedings of LLVM-HPC 2015: 2nd Workshop on the LLVM Compiler Infrastructure in HPC - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis (2015). https://doi.org/10.1145/2833157.2833162.
53. Gehring, H., Blaicher, M., Hartmann, W. & Pernice, W. H. P. Python based open source design framework for integrated nanophotonic and superconducting circuitry with 2D-3D-hybrid integration. OSA Contin. 2, 3091 (2019).


Acknowledgements

We thank Jochen Stuhrmann, from Illustrato, and Jonas Schütte for their assistance with the illustrations. Also, we thank Ivonne Bente and Niklas Vollmar for supporting us during the fabrication process. The research is funded by:

‐ German Research Foundation under Germany's Excellence Strategy EXC 2181/1-390900948 (the Heidelberg STRUCTURES Excellence Cluster), the Excellence Cluster 3D Matter Made to Order (EXC-2082/1-390761711) and CRC 1459 “Intelligent matter”

‐ European Union’s Horizon 2020 research and innovation programme (grant no. 101017237, PHOENICS project) and the European Union’s Innovation Council Pathfinder programme (grant no. 101046878, HYBRAIN project).

‐ COMET program within the K2 Center “Integrated Computational Material, Process and Product Engineering” (IC-MPPE) (Project No 886385)

‐ Austrian Federal Ministries for Climate Action, Environment, Energy, Mobility, Innovation and Technology (BMK) and for Labour and Economy (BMAW), represented by the Austrian Research Promotion Agency (FFG), and the federal states of Styria, Upper Austria and Tyrol

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Physical Institute, University of Münster, Münster, 48149, Germany
Frank Brückerhoff-Plückelmann, Akhil Varri & Wolfram Pernice
Kirchhoff-Institute for Physics, University of Heidelberg, Heidelberg, 69120, Germany
Frank Brückerhoff-Plückelmann, Jelle Dijkstra & Wolfram Pernice
Institute of Computer Engineering, University of Heidelberg, Heidelberg, 69120, Germany
Hendrik Borras, Bernhard Klein & Holger Fröning
Institute for Geoinformatics, University of Münster, Münster, 48149, Germany
Marlon Becker & Benjamin Risse
Faculty of Mathematics & Computer Science, University of Münster, Münster, 48149, Germany
Marlon Becker & Benjamin Risse
DEVK RE, Cologne, 50668, Germany
Martin Brückerhoff
Department of Engineering, University of Exeter, Exeter, EX44QF, UK
C. David Wright
Institute of Materials Physics, University of Münster, Münster, 48149, Germany
Martin Salinga
Department of Materials, University of Oxford, Oxford, OX43PJ, UK
Harish Bhaskaran

Contributions

Conceptualization: F.B.P., W.P., Harish B., C.D.W., H.F. Methodology: F.B.P., Hendrik B., B.K., A.V., Marlon B., J.D., Martin B. Investigation: F.B.P., Hendrik B., B.K., A.V., Marlon B., J.D., Martin B. Visualization: F.B.P., Hendrik B., B.K., J.D. Funding acquisition: W.P., C.D.W., M.S., Harish B., B.R., H.F. Project administration: W.P., H.F. Supervision: W.P., C.D.W., M.S., Harish B., B.R., H.F. Writing – original draft: F.B.P., Hendrik B., W.P. Writing – review and editing: All authors.

Corresponding author

Correspondence to Wolfram Pernice.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Thomas Ferreira de Lima and Lorenzo de Marinis for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Brückerhoff-Plückelmann, F., Borras, H., Klein, B. et al. Probabilistic photonic computing with chaotic light. Nat Commun 15, 10445 (2024). https://doi.org/10.1038/s41467-024-54931-6

Received: 10 July 2024
Accepted: 26 November 2024
Published: 01 December 2024
DOI: https://doi.org/10.1038/s41467-024-54931-6
