Table 1 Performance comparison of our approach to conventional computing systems and other optical/optoelectronic approaches

From: Low-power scalable multilayer optoelectronic neural networks enabled with incoherent light

| Technique | Approach | Throughput (TOPS) | Efficiency (Expt), TOPS/W | Efficiency (Proj), TOPS/W | Precision (bit) | Reference |
|---|---|---|---|---|---|---|
| NVIDIA B200 | GPU | 144 × 10³* | 10.01 | | 4 | 49 |
| | | 57 × 10³* | 5.03 | | 8 | |
| NVIDIA RTX 4090 | GPU | 660.60** | 0.78 | | 8 | 50 |
| Google TPUv4 | ASIC | 275 | 1.62 | | 8 | 51,52 |
| Photonic WDM/PCM in-memory computing | Photonic | 0.65 | 0.50 | 7.00 | 5 | 21 |
| Image intensifier | Incoherent Free Space | 5.76 × 10⁻⁷ | 3.03 × 10⁻⁷ | 66.67 | 8 | 35 |
| Photonic convolutional accelerator | Photonic | 0.48 | 1.26 | | 8 | 22 |
| Free space optoelectronic neural network | Incoherent Free Space | 1.6 × 10⁻³ | 11.45 × 10⁻³ | 35.09 | 8 | This Work |

The numbers for NVIDIA B200* and RTX 4090** represent the performance of thousands of CUDA and Tensor cores. Although the core count for the B200 is not publicly available, the RTX 4090 has 16384 CUDA cores and 512 Tensor cores. It is quite likely that the NVIDIA B200 has significantly more of these cores.
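Because the table lists both throughput (TOPS) and experimental efficiency (TOPS/W), dividing one by the other gives a rough sense of the power scale each system operates at. The sketch below does only that arithmetic. It assumes that each experimental efficiency was measured at the listed throughput, which the table itself does not state, and it reads the B200 throughput entries as 144 × 10³ and 57 × 10³ TOPS. The helper name `implied_power_watts` and the row labels are illustrative, and the resulting wattages are derived estimates, not values reported in the table.

```python
# Rows transcribed from Table 1: (label, throughput in TOPS, experimental efficiency in TOPS/W).
# The B200 throughputs assume the entries read 144 x 10^3 and 57 x 10^3 TOPS.
ROWS = [
    ("NVIDIA B200 (4-bit)", 144e3, 10.01),
    ("NVIDIA B200 (8-bit)", 57e3, 5.03),
    ("NVIDIA RTX 4090", 660.60, 0.78),
    ("Google TPUv4", 275.0, 1.62),
    ("Photonic WDM/PCM in-memory computing", 0.65, 0.50),
    ("Image intensifier", 5.76e-7, 3.03e-7),
    ("Photonic convolutional accelerator", 0.48, 1.26),
    ("Free space optoelectronic neural network (this work)", 1.6e-3, 11.45e-3),
]


def implied_power_watts(throughput_tops, efficiency_tops_per_watt):
    """Return throughput / efficiency in watts.

    This is an estimate under the assumption that the efficiency was
    measured at the listed throughput; it is not a value from the table.
    """
    return throughput_tops / efficiency_tops_per_watt


# Map each row label to its implied power draw.
IMPLIED_POWER = {
    label: implied_power_watts(tops, tops_per_watt)
    for label, tops, tops_per_watt in ROWS
}

for label, watts in IMPLIED_POWER.items():
    print(f"{label}: {watts:.3g} W")
```

Read this way, the electronic accelerators sit in the hundreds-of-watts to kilowatt range, while the free-space optoelectronic systems operate at far lower power and far lower absolute throughput, which is why the table reports efficiency alongside throughput.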