Introduction

Probabilistic machine learning can accelerate image generation1,2, heuristic optimization3,4, and probabilistic inference5,6 by leveraging stochasticity to encode uncertainty and enable statistical modeling7,8. These approaches are well suited for real-life applications which must account for uncertainty and variability, including autonomous driving9, medical diagnosis10, and drug discovery11. However, digital complementary metal-oxide-semiconductor (CMOS) technology requires extensive resource overhead to simulate randomness and control probabilities, which leads to significantly increased power consumption and decreased operational speed12. These challenges have sparked recent proposals for beyond-CMOS hardware such as low-barrier magnetic tunnel junctions13 and diffusive memristors14—both of which leverage intrinsic noise as a source of randomness.

Concurrently, optical neural networks (ONNs)15,16 have shown remarkable progress in energy efficiency17,18, speed19, and bandwidth20 for solving deterministic tasks such as image classification21 and speech recognition22. An important feature of ONNs is the inherent presence of noise in their operation. Photonic computing hardware therefore typically implements computational tasks that are robust to optical noise16. ONNs have also been explored in regimes where deterministic tasks are performed with high accuracy despite high levels of inherent noise18. Conversely, ONNs in which optoelectronic noise is intentionally added have been proposed for optimization23 and generative networks24. Interestingly, quantum optics offers a natural source of randomness in the ground state of the electromagnetic field, known as quantum vacuum noise25,26,27. This intrinsic noise source is ubiquitous in optics and has been used to achieve high-data-rate random number generation28,29. In addition, optical systems influenced by quantum vacuum noise have shown a natural ability to generate probability distributions30,31,32, which are of strong interest for computing applications13,14. However, the experimental demonstration of a photonic probabilistic machine learning system has so far remained elusive, mostly due to the lack of programmable stochastic photonic elements.

Here, we experimentally demonstrate a probabilistic computing platform utilizing photonic probabilistic neurons (PPNs). Our PPN is implemented as a biased degenerate optical parametric oscillator (OPO), which leverages quantum vacuum noise to generate a probability distribution encoded by a bias field. We realized a hybrid optoelectronic probabilistic machine learning system which combines time-multiplexed PPNs and electronic processors with algorithm-specific measurement-and-feedback strategies. We demonstrate probabilistic inference of MNIST-handwritten digits with a stochastic binary neural network (SBNN), highlighting how quantum vacuum noise can encode classification uncertainty in discriminative models. Additionally, we showcase the generation of MNIST-handwritten digits with a pixel convolutional neural network (pixelCNN), demonstrating how statistical sampling in generative models can be facilitated by quantum vacuum noise. Furthermore, we provide a thorough discussion of the potential of an all-optical probabilistic machine learning system, offering a possible performance enhancement by a factor of 100 in both speed and energy over traditional CMOS implementations, thereby opening new avenues in high-speed, energy-efficient computing applications.

Results

Probabilistic computing with time-multiplexed PPNs

We first provide a brief overview of two probabilistic machine learning models and their optical implementation with PPNs (Fig. 1).

Fig. 1: Probabilistic machine learning with stochastic photonic elements.
figure 1

Probabilistic machine learning enabled by physical random sources, solving a inference and b generation tasks. Neural networks learn a decision line for inference tasks and the overall distribution for generation tasks. a Random sources encode uncertainty in neural network parameters, allowing a statistical interpretation of inference results. b Stochastic image generation seeded by random sources samples new images from the probability distributions stored in neural networks. Both computational tasks require controllable stochastic photonic elements that can learn probability distributions and perform statistically independent sampling, which we refer to as photonic probabilistic neurons (PPNs). c Schematic of PPNs. One of the output states (α(0) or α(1)) of a multistable optical system is randomly selected from a state probability distribution p(α(1) | b) controlled by a bias field level b. Subsequently, a processing unit reads the output state and updates the bias value b for the next sampling. N independent outcomes can be sampled from different probabilities by time-multiplexing the bias signal. Optical elements in (c), originally published by GWoptics; released under a Creative Commons Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0). MNIST images in (a and b), originally published by LeCun, et al.37; released under a Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0).

Discriminative models learn decision lines that encode classification boundaries between different images (Fig. 1a, left)33. Probabilistic neural networks (Fig. 1a, middle) then impart statistical properties onto network parameters (e.g., weight uncertainty5 or stochastic layer nodes34). The network can therefore provide a statistical ensemble of classification results, shown as the probabilities of the image being classified under certain labels (Fig. 1a, right). Probabilistic inference can quantify classification uncertainty, which becomes critical for ambiguous images located near the decision boundary35,36.

On the other hand, generative models learn the underlying probability distribution of the training dataset (e.g., images) in order to create new ones (Fig. 1b, left)33. When generating new images, generative models use random sources to seed stochastic image sampling based on the probability distribution learned by the network (Fig. 1b, middle). As a result, images with different labels can be generated (Fig. 1b, right).

In both of these computational tasks, probabilistic machine learning requires stochastic photonic elements whose probability distribution can be tuned, and that can perform statistically independent sampling. We refer to the optical implementation of this capability as PPNs (purple circles in Fig. 1a, b).

The proposed PPN is depicted in Fig. 1c. The building block consists of a synchronously pumped degenerate OPO30. An OPO consists of a nonlinear medium (e.g., a second-order nonlinear crystal that down-converts the photon frequency) surrounded by an optical cavity. The phase of the initial optical field is random due to electromagnetic field fluctuations inside the cavity (quantum vacuum noise). When the pump laser power exceeds a certain threshold, the phase-sensitive gain of the OPO allows the initial state to fall into one of two bistable output states with phase 0 or π rad28. In other words, quantum vacuum noise acts as a perfect random source that manifests itself in the output phase. This random source is an intrinsic noise source ubiquitous in quantum optics25,26,27. When a vacuum-level external bias field b is introduced into the OPO cavity, the probability distribution of the output steady states can be coherently controlled30. Specifically, our OPO-based PPN encodes a Bernoulli trial B(p) with binary outcomes of probability p and 1 − p. Independent random sampling and processing can be realized by time-multiplexing the bias signal, resulting in N independent outcomes with encoded probabilities, depicted by the different heights in Fig. 1c.
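Conceptually, each PPN thus behaves as a bias-controlled Bernoulli sampler. The following minimal numerical sketch illustrates this behavior; the logistic form of p(b) and the `scale` parameter are illustrative assumptions (the actual relation is the measured curve in Fig. 2b), and a pseudorandom generator stands in for the quantum vacuum noise of the experiment:

```python
import numpy as np

rng = np.random.default_rng()

def ppn_sample(bias, scale=1.0):
    """One bias-controlled Bernoulli trial B(p), standing in for a PPN.

    The logistic p(bias) is an illustrative stand-in for the measured
    bias-probability curve of the OPO; the pseudorandom draw emulates
    the quantum vacuum noise that selects the output phase (0 or pi).
    """
    p = 1.0 / (1.0 + np.exp(-bias / scale))  # probability of the pi-phase state
    return int(rng.random() < p)             # 0 rad -> 0, pi rad -> 1

# Time-multiplexing: N independent outcomes sampled from N different biases.
biases = np.linspace(-2.0, 2.0, 8)
bits = [ppn_sample(b) for b in biases]
```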

The experimental system realizing the PPN, and its implementation into a probabilistic computing system, is shown in Fig. 2. The system consists of three modules: biased OPO (purple area), detection (green area), and processing unit (blue area). We time-multiplex OPO signals with an amplitude modulator along the pump path to sample multiple binary outputs from a single optical cavity at a rate of 10 kHz. This bit rate is chosen to ensure the statistical independence of each PPN28,30. We use a homodyne detector to measure the optical phase of the steady state and map it to the corresponding bit value (i.e., 0 rad → 0 and π rad → 1).

Fig. 2: Experimental demonstration of a photonic probabilistic computer.
figure 2

a Experimental setup, consisting of an ultrafast laser pumping a nonlinear cavity, and homodyne detection to measure the phase of the OPO signal. Electronic processing units (FPGA/GPU) generate electrical signals to tune the probability. AM amplitude modulator, PM phase modulator, FM flat mirror, DM dichroic mirror, ICM in-coupling mirror, (P)BS (polarization) beam-splitter, SM spherical mirror, PPLN periodically poled lithium niobate nonlinear crystal, PZT piezoelectric actuator, λ/2 half waveplate, PD photodiode. b Modulator voltage–probability relationship. Error bars represent the standard deviation. Optical and electronic elements in (a), originally published by GWoptics; released under a Creative Commons Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0).

During each cycle, a bit (value 0 or 1) is measured by the homodyne detector, conditioned on the bias value b. This bit, or a collection of bit values (“bitstream”), is then fed into an electronic processing unit to update the bias field value and sample the PPN in the next cycle. In our experiment, the processing unit is either a field-programmable gate array (FPGA) or a graphics processing unit (GPU). The FPGA is better suited to real-time bitstream processing and control of the optical system, while the GPU can accelerate complex machine learning algorithms such as image generation, at the cost of slower system control.

Individual \(p_i\) values are encoded in the phase of the bias field \(b_i\) by applying a calibrated square-wave voltage to a phase modulator in the bias line path. The voltage–probability relation provided by the phase modulator is shown in Fig. 2b. This relation is used in the following computing experiments to control the bias voltage. A detailed description of the experimental setup is provided in Supplementary Note 1.
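To make the use of such a calibration concrete, the sketch below fits a smooth monotonic model to hypothetical calibration data and inverts it to obtain the modulator voltage for a target probability. Both the sigmoidal form and the calibration points are assumptions for illustration, not the measured relation of Fig. 2b:

```python
import numpy as np
from scipy.optimize import curve_fit

def p_of_v(v, v0, s):
    """Hypothetical monotonic model of the voltage-probability curve."""
    return 1.0 / (1.0 + np.exp(-(v - v0) / s))

# Hypothetical calibration data; in practice these would be measured
# switching fractions of the OPO output state versus modulator voltage.
v_cal = np.linspace(-1.0, 1.0, 11)
p_cal = p_of_v(v_cal, 0.05, 0.3)

(v0_fit, s_fit), _ = curve_fit(p_of_v, v_cal, p_cal, p0=(0.0, 0.5))

def voltage_for(p):
    """Invert the fitted calibration: modulator voltage for a target p."""
    p = np.clip(p, 1e-6, 1.0 - 1e-6)  # stay away from the saturating tails
    return v0_fit + s_fit * np.log(p / (1.0 - p))
```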

Photonic probabilistic computer for image classification

We now perform probabilistic image classification of MNIST-handwritten digits37 using a pre-trained SBNN model on our optical probabilistic computing platform (Fig. 3a). The SBNN encodes inference uncertainty by substituting the deterministic layer nodes of conventional fully connected neural networks with stochastic binary nodes38. In a conventional fully connected neural network, the jth node value in the (n + 1)th layer \(X_{j,n+1}\) is computed in two steps: (1) a matrix–vector multiplication (MVM) between the weight matrix W and the nth layer \(X_n\), \(z_{j,n} \equiv \sum_i W_{j,i} X_{i,n}\); followed by (2) a nonlinear activation function \(\sigma(\cdot)\): \(X_{j,n+1}=\sigma(z_{j,n})\).

Fig. 3: Probabilistic inference uncertainty encoded by quantum vacuum noise.
figure 3

a Hybrid photonic-electronic architecture for stochastic binary neural networks (SBNNs). The original grayscale MNIST handwritten digit is binarized 10 times with PPNs, and each binarized image propagates through the SBNN (left panel). Binary nodes are sampled by PPNs and their corresponding p values are evaluated by the FPGA (middle panel). Because the nodes are stochastic, inference results vary (right panel). b Confusion matrices of the image classification results. A total of 1000 binary images (100 grayscale test images × 10 binarizations = 1000 input images) are tested. c Diagnosing inference results with the aid of quantum vacuum noise. Breadth in probability and low classification accuracy reflect the ambiguity of the input image. MNIST images in (a and c), originally published by LeCun, et al.37; released under a Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0).

Within our SBNN model, each layer node is represented by a PPN, and a single layer (yellow areas in Fig. 3a) is described as a bitstream of time-multiplexed PPNs. Because of the nonlinear nature of the bias–probability relationship (Fig. 2b), sampling a binary output \(X_{j,n}\) with our PPN from the given bias \(b_{j,n}\) (or, equivalently, the bias modulator voltage \(V_{j,n}\) in our experiment) naturally corresponds to passing through a nonlinear activation function: \(X_{j,n}=B(p_{j,n})=B[\sigma(V_{j,n})]\). The modulator voltage \(V_{j,n}\) is calculated via an MVM between the weight matrix \(W_{n-1}\) and the \((n-1)\)th layer \(X_{n-1}\) (gray areas in Fig. 3a; performed by the FPGA in our experiment). In other words, each PPN node binarizes its input, a weighted sum of the previous layer's nodes, with probability \(p_{j,n}\). Because of the stochastic nature of the nodes, their probabilities change for every inference, leading to a probabilistic interpretation of the classification results for an identical input image (Fig. 3a, right).
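The stochastic forward pass can be sketched in a few lines. In this sketch the logistic activation, the pseudorandom Bernoulli draws, and the randomly initialized weights are stand-ins for the measured bias-probability curve, the PPN samples, and the pre-trained weights, respectively; the output layer is read out as deterministic scores:

```python
import numpy as np

rng = np.random.default_rng()

def sigma(z):
    """Illustrative logistic activation implied by the bias-probability curve."""
    return 1.0 / (1.0 + np.exp(-z))

def sbnn_forward(x, weights):
    """One stochastic forward pass, X_{j,n} = B[sigma(V_{j,n})], as in the text.

    In the experiment the FPGA computes the MVM V = W @ x and the PPNs draw
    the Bernoulli samples; here a pseudorandom generator emulates the PPNs.
    """
    for W in weights[:-1]:
        p = sigma(W @ x)                           # node probabilities p_{j,n}
        x = (rng.random(p.shape) < p).astype(int)  # stochastic binary nodes
    return weights[-1] @ x                         # output scores O_j

# Hypothetical (untrained) weights for the 784 -> 128 -> 64 -> 10 network.
weights = [rng.normal(0.0, 0.1, (128, 784)),
           rng.normal(0.0, 0.1, (64, 128)),
           rng.normal(0.0, 0.1, (10, 64))]
x0 = (rng.random(784) < 0.5).astype(int)  # a binarized input image
label = int(np.argmax(sbnn_forward(x0, weights)))
```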

To perform image classification of MNIST-handwritten digits with our optical SBNN, we first binarize the original MNIST-handwritten digits (Fig. 3a, left). The original grayscale digits (pixel values ranging from 0 to 255) are normalized between 0 and 1, and the resulting pixel values serve as the probability value for each PPN. The grayscale images are binarized by sampling the PPNs. The binary images are propagated through the network (784 → 128 → 64 → 10), with real-time communication between the PPNs and the FPGA. The output layer \(O_{0,1,\ldots,9}\) is used to interpret the classification result, a higher \(O_j\) corresponding to a higher probability that the image represents digit “j”. The network is pre-trained in silico and the weights are implemented on the FPGA. A detailed description of the training process and of how the FPGA communicates with the optical setup is given in Supplementary Note 2.

To test the performance of our optical SBNN, a batch of 100 grayscale MNIST-handwritten digits from the test set is selected. By binarizing each grayscale MNIST-handwritten digit 10 times to encode statistical uncertainty, we prepared a total of 1000 binarized MNIST-handwritten digits to be classified by our optical SBNN. While propagating to the output layer, the PPNs in the input and hidden layers encode uncertainty by stochastically sampling binary values from the given probabilities. Once the output layer is reached, we can collect statistics from the 10 inference results for each input image. The confusion matrices in Fig. 3b show that the overall experimental classification accuracy (96.5%) is in close agreement with the accuracies obtained from numerical simulations for the single batch (97.0%) and for the full test set (98.3%) (see Supplementary Note 2). The classification accuracy of our photonic probabilistic computing hardware is also comparable with that of other optical computing platforms, which reach more than 95%21,39,40.

Figure 3c shows how our probabilistic neural network can diagnose the reliability of inference results by harnessing quantum vacuum noise. Unlike in deterministic neural networks, the variability of layer nodes in SBNNs results in a different probability for each inference. One factor that can degrade classification performance is the ambiguity of the image (i.e., how close the image lies to the decision boundary, as shown in Fig. 1a). By encoding uncertainty during inference, our photonic probabilistic computing hardware suggests all possible labels under which an ambiguous image can be classified. We choose two ambiguous and two unambiguous images from the test dataset and plot the probability of each binarized grayscale MNIST-handwritten digit being classified under a certain label. Because each grayscale image is binarized 10 times, 10 probability values are shown for each label.

Three different scenarios are described in Fig. 3c. Unambiguous images such as “0” and “9” (achieving 100% classification accuracy) show relatively consistent classification results, with probabilities of correct classification close to 1. In this scenario, the probabilistic neural network behaves like a deterministic one, which always returns the same classification result. When the input image becomes ambiguous (image “5”, underlined in red, achieving 50% classification accuracy), our SBNN model indicates that the image could be either “3” or “5”. Accordingly, the distribution of probabilities for each label broadens, with its average value close to 50%. The worst-case scenario is depicted by image “2” (underlined in blue), showing low overall accuracy (20%) and strong inconsistency in the classification results. Such a scenario clearly showcases how probabilistic sampling can provide additional information to the end user. Classification results for labels not included in Fig. 3c can be found in Supplementary Note 2.

Offering both overall accuracy and statistics of the classification results, probabilistic neural networks can diagnose inference results by providing a confidence level for the decision. The full classification results for each input image can be found in Supplementary Note 2.
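The uncertainty diagnosis of Fig. 3c amounts to collecting label statistics over repeated stochastic inferences of the same grayscale image. A minimal sketch, reusing the hypothetical `rng` and `sbnn_forward` from the example above:

```python
import numpy as np

def classify_with_uncertainty(gray_image, weights, n_repeats=10):
    """Binarize one grayscale image n_repeats times and collect label statistics.

    Returns the per-label frequency over repeated stochastic inferences;
    a broad spread flags an ambiguous image close to the decision boundary.
    """
    p_pixels = gray_image.reshape(-1) / 255.0  # pixel values -> probabilities
    counts = np.zeros(10)
    for _ in range(n_repeats):
        x = (rng.random(p_pixels.shape) < p_pixels).astype(int)  # PPN binarization
        counts[int(np.argmax(sbnn_forward(x, weights)))] += 1
    return counts / n_repeats
```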

Generating images from quantum vacuum noise with photonic generative models

We now turn to the demonstration of generative models with our photonic probabilistic computing platform (Fig. 4), demonstrating the use of quantum optical randomness as a source for generative machine learning models. We use a type of autoregressive model (pixelCNN), which models the conditional probability of the current pixel value given the previous pixels41.

Fig. 4: Images sampled from quantum vacuum noise.
figure 4

a PixelCNN generating binary MNIST-handwritten digits. PPNs sample the pixel value \(X_N\) from the given \(p_N\) value, and the GPU calculates the value \(p_{N+1}\) for the next pixel from the previous pixels \(X_{i\le N}\). b Branching off to different MNIST-handwritten digits, guided by quantum vacuum noise. Stochastic sampling allows the generation of images with different digits and features. c One hundred images generated by the pixelCNN, starting from a completely empty image. MNIST images in the left side of (b), originally published by LeCun, et al.37; released under a Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0).

Our implementation protocol for the pixelCNN with PPNs is described in Fig. 4a. A binary image with the first N − 1 pixels \(X_{i\le N-1}\) specified is given as input to the network. In principle, N can be any natural number, N = 1 corresponding to the case where the pixelCNN creates an image using only quantum vacuum noise as a random seed. Given the input image, a pre-trained pixelCNN model on the GPU evaluates, from the previous pixels \(X_{i\le N-1}\), the probability \(p_N\) to be encoded on the PPN, which generates a binary value for the Nth pixel \(X_N\). The probability \(p_{N+1}\) is then computed based on the pixel values \(X_{i\le N}\). This process is repeated until the full image is generated (28 × 28 = 784 pixels). Our hybrid optoelectronic computing system can thus generate new images using quantum vacuum noise as a random seed. Details of the network structure and training method can be found in Supplementary Note 3.
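The generation loop can be summarized as follows. Here `pixelcnn_prob` is a hypothetical placeholder for the pre-trained conditional model running on the GPU, and a pseudorandom Bernoulli draw stands in for each PPN sample:

```python
import numpy as np

rng = np.random.default_rng()

def generate_image(pixelcnn_prob, n_pixels=28 * 28, seed_pixels=()):
    """Autoregressive sampling: evaluate p_N from X_{i<=N-1}, draw X_N ~ B(p_N).

    `pixelcnn_prob(pixels)` is a placeholder returning the conditional
    probability of the next pixel being 1; an empty `seed_pixels` corresponds
    to generating an image from quantum vacuum noise alone (N = 1).
    """
    pixels = list(seed_pixels)                     # the first N - 1 pixels
    while len(pixels) < n_pixels:
        p_next = pixelcnn_prob(pixels)             # conditional probability p_N
        pixels.append(int(rng.random() < p_next))  # PPN sample of pixel X_N
    return np.array(pixels).reshape(28, 28)

# Usage with a trivial stand-in model (every pixel equally likely to be 0 or 1):
image = generate_image(lambda px: 0.5)
```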

Different MNIST-handwritten digits, all generated from the same incomplete input image, highlight how quantum vacuum noise enables stochastic image sampling (Fig. 4b). Although they all start from the same “ancestor” image, the stochastic pixel values sampled by the PPNs branch off into “descendant” MNIST-handwritten digits with different labels. It is also possible to generate different images with the same label (here, most likely “2”).

We produced 100 examples of handwritten digit images from quantum vacuum noise using our photonic probabilistic computing platform (Fig. 4c), initiating the generation with an empty input image to our optical pixelCNN. We also evaluate the negative log-likelihood (NLL) of the generated images, \({\rm{NLL}}\equiv -{\sum}_{i}[{X}_{i}\ln({p}_{i})+(1-{X}_{i})\ln(1-{p}_{i})]\), where the sum runs over the pixel indices i = 1, …, 784. A lower NLL indicates statistical similarity to the distribution of training images; we obtain 71.1 ± 18.8 for our experimental results and 64.9 ± 15.4 for numerical simulations. This shows that our system has learned an accurate representation of the image distribution. Details of the performance of image generation can be found in Supplementary Note 3.
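The NLL above is the Bernoulli cross-entropy between the sampled pixel values \(X_i\) and the conditional probabilities \(p_i\) produced during generation; a direct transcription of the formula:

```python
import numpy as np

def negative_log_likelihood(X, p, eps=1e-12):
    """NLL = -sum_i [X_i ln(p_i) + (1 - X_i) ln(1 - p_i)] over the 784 pixels."""
    p = np.clip(p, eps, 1.0 - eps)  # guard against log(0)
    return float(-np.sum(X * np.log(p) + (1.0 - X) * np.log(1.0 - p)))
```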

Discussion

In our demonstration of photonic probabilistic machine learning, the speed and energy efficiency were limited by the PPN sampling rate and data transfer bandwidth between electronic processors and PPNs. In the following, we propose an all-optical probabilistic computing platform which can overcome these challenges, and evaluate the potential benefit in terms of speed and energy efficiency compared to the electronic state of the art.

To increase sampling rate and reduce energy consumption, we propose an all-optical implementation. For instance, PPNs can be implemented with injection-seeded vertical-cavity surface-emitting lasers, reaching  >1 Gbps42 and providing energy-efficient operation43. Fast control of the probability and state detection can be achieved with high-bandwidth modulators and detectors44,45,46,47,48, suggesting that PPNs achieving 1 Gbps sampling rate are within reach (detailed explanations can be found in Supplementary Note 4).

Furthermore, our programmable stochastic element naturally implements an all-optical nonlinearity through the bias–probability relationship, which has been a historical challenge in the implementation of energy-efficient all-optical ONNs15. Typically, ONNs rely on optoelectronic measurement-feedback schemes to update the network layers39,49. Conversely, in the proposed scheme, an optical signal (the vacuum-level bias) controls the nonlinearity of the layer. Because the bias signal can be derived directly from the accumulated PPN outputs, bypassing active components, the scheme can reduce the energy consumption per multiply-accumulate (MAC) operation to as low as ~5 fJ/MAC. State-of-the-art stochastic electronic devices, such as low-barrier magnetic tunnel junctions and diffusive memristors integrated with conventional CMOS technologies, are expected to achieve ~0.1 Gbps50,51 and consume ~900 fJ/MAC38. Comparatively, our proposed photonic platform can be ~10× faster and ~100× more energy efficient. A detailed discussion of this all-optical probabilistic computing platform is found in Supplementary Note 4.

We now compare the speed and energy performance of our photonic platform to that of a state-of-the-art FPGA52,53 on an image classification task with a binary neural network. The deterministic FPGA implementation demonstrated classification of ~1.6 million images per second with ~23 W power consumption. Adopting the network structure of our SBNN model in Fig. 3, we can calculate the computation time and the number of MAC operations required for each inference. Our estimate gives ~4 ns and ~\(10^5\) MAC operations per classification, which results in ~250 million image classifications per second with a power consumption of ~0.1 W. Therefore, the suggested all-optical probabilistic computing hardware could perform ~100× faster while consuming ~100× less power. A detailed discussion can be found in Supplementary Note 4.
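For transparency, these figures follow directly from the per-inference numbers quoted above: a ~4 ns inference time yields \(1/(4\,{\rm ns})=2.5\times 10^{8}\) classifications per second, and at ~5 fJ/MAC the power is \(2.5\times 10^{8}\,{\rm s}^{-1}\times 10^{5}\,{\rm MAC}\times 5\,{\rm fJ/MAC}\approx 0.13\,{\rm W}\), consistent with the ~0.1 W quoted.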

One possible extension of our work is to train the network physically54,55. This becomes critical when accurate digital modeling of the physical system is challenging due to its complexity. Without the additional cost of simulating randomness in digital models, several training methods that rely on stochasticity, including stochastic gradient descent56, dropout34, and noise injection57, could potentially be realized with PPNs. By harnessing quantum vacuum noise in optical elements for both training and testing, our PPNs will pave the way for all-optical probabilistic physical neural networks, which can benefit state-of-the-art machine learning applications including large language models58 and diffusion models59.

Our platform could also be used to implement other important computational tasks. The first is alternative, interpretable neural network models with trainable activation functions60, which could be implemented with PPNs by taking advantage of their tunable bias–probability relationship. The second is Ising model solvers with external magnetic fields, which can be modeled by injecting a bias field into a network of OPOs61.