Table 3 Performance comparison (throughput in tera operations per second (TOPS), density and efficiency) between hybrid CMOS/memristor prototypes and full-CMOS neuromorphic accelerators
From: Hardware implementation of memristor-based artificial neural networks
Accelerator | Exp./Sim. | Type | Process (nm) | Activation resolution | Weight resolution | Clock speed | Benchmarked workload | Weight storage | R_high (Ω) | R_low (Ω) | Array size | ADC type | Throughput (TOPS) | Density (TOPS per mm²) | Efficiency (TOPS per W) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NVIDIA T4 (ref. 277) | Exp. | Full-CMOS | 12 | 8-bit int | 8-bit int | 2.6 GHz | ResNet-50 (batch = 128) | -- | -- | -- | -- | -- | 22.2, 130 (peak) | 0.04, 0.24 (peak) | 0.32 |
Google TPU v1 (ref. 19) | Exp. | Full-CMOS | 28 | 8-bit int | 8-bit int | 700 MHz | MLPs, LSTMs, CNNs | -- | -- | -- | -- | -- | 21.4, 92 (peak) | 0.06, 0.28 (peak) | 2.3 (peak) |
Habana Goya HL-1000 (ref. 278) | Exp. | Full-CMOS | 16 | 16-bit int | 16-bit int | 2.1 GHz (CPU) | ResNet-50 (batch = 10) | -- | -- | -- | -- | -- | 63.1 | -- | 0.61 |
DaDianNao (ref. 279) | Sim. | Full-CMOS | 28 | 16-bit fixed-pt. | 16-bit fixed-pt. | 606 MHz | Peak performance | -- | -- | -- | -- | -- | 5.58 | 0.08 | 0.35 |
UNPU (ref. 280) | Exp. | Full-CMOS | 65 | 16 bits | 1 bit | 200 MHz | Peak performance | -- | -- | -- | -- | -- | 7.37 | 0.46 | 50.6 |
Reference mixed-signal (ref. 281) | Exp. | Full-CMOS | 28 | 1 bit | 1 bit | 10 MHz | Binary CNN (CIFAR-10) | -- | -- | -- | -- | -- | 0.478 | 0.1 | 532 |
ISAAC (ref. 160) | Exp. | RRAM-CMOS | 32 | 16 bits | 16 bits | 1.2 GHz | Peak performance | ReRAM (8×2-bit) | ~2 M | ~2 k | 128×128 | SAR (8-bit) | 41.3 | 0.48 | 0.63 |
Newton (ref. 282) | Exp. | RRAM-CMOS | 32 | 16 bits | 16 bits | 1.2 GHz | Peak performance | ReRAM (8×2-bit) | ~2 M | ~2 k | 128×128 | SAR (8-bit) | -- | 0.68 | 0.92 |
PUMA (ref. 154) | Exp. | RRAM-CMOS | 32 | 16 bits | 16 bits | 1.0 GHz | Peak performance | ReRAM (8×2-bit) | 1 M | 100 k | 128×128 | SAR | 26.2 | 0.29 | 0.42 |
PRIME (ref. 125) | Sim. | RRAM-CMOS | 65 | 6 bits | 8 bits | 3.0 GHz (CPU) | -- | ReRAM | 20 k | 1 k | 256×256 | Ramp (6-bit) | -- | -- | -- |
Memristive Boltzmann machine (ref. 283) | Sim. | RRAM-CMOS | 22 | 32 bits | 32 bits | 3.2 GHz (CPU) | -- | ReRAM | 1.1 G | 315 k | 512×512 | SAR | -- | -- | -- |
3D-aCortex (ref. 83) | Exp. | RRAM-CMOS | 55 | 4 bits | 4 bits | 1.0 GHz | GNMT | NAND flash | -- | 2.3 M | 64×128 | Temporal to digital (4-bit) | 10.7 | 0.58 | 70.4 |
Analog-AI Using Dense 2-D Mesh (ref. 284) | Sim. | RRAM-CMOS | 14 | 8 bits | Analogue | 1.0 GHz | RNN/LSTM | PCM | No data | No data | 512×512 | Current controlled oscillator based | 376.7 | No data | 65.6 |
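
The three performance columns are related. If density is read as throughput per unit area and efficiency as throughput per unit power, an implied silicon area and power draw can be backed out of each complete row. The sketch below does this for a few rows copied from the table. It assumes all three figures in a row were reported at the same operating point, which the table does not guarantee, so the results are rough consistency checks rather than reported chip specifications. The function names are illustrative and do not come from the source.

```python
# Implied area and power from the throughput, density and efficiency columns.
# Assumption (not stated in the table): density = throughput / area and
# efficiency = throughput / power, all taken at the same operating point.

# (name, throughput in TOPS, density in TOPS/mm^2, efficiency in TOPS/W)
# Values copied from rows of Table 3 that report all three figures.
ROWS = [
    ("DaDianNao", 5.58, 0.08, 0.35),
    ("UNPU", 7.37, 0.46, 50.6),
    ("ISAAC", 41.3, 0.48, 0.63),
    ("PUMA", 26.2, 0.29, 0.42),
    ("3D-aCortex", 10.7, 0.58, 70.4),
]


def implied_area_mm2(throughput_tops, density_tops_per_mm2):
    """Area in mm^2 implied by throughput and areal density."""
    return throughput_tops / density_tops_per_mm2


def implied_power_w(throughput_tops, efficiency_tops_per_w):
    """Power in W implied by throughput and energy efficiency."""
    return throughput_tops / efficiency_tops_per_w


if __name__ == "__main__":
    for name, tops, density, efficiency in ROWS:
        area = implied_area_mm2(tops, density)
        power = implied_power_w(tops, efficiency)
        print(f"{name:12s} area ~ {area:7.1f} mm^2   power ~ {power:7.2f} W")
```

For the ISAAC row, for example, 41.3 TOPS at 0.48 TOPS per mm² and 0.63 TOPS per W implies roughly 86 mm² and 66 W under these assumptions.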