Table 1 Downstream task results
From: Analog in-memory computing attention mechanism for fast and energy-efficient large language models
| Model | ARC-E (acc ↑) | ARC-C (acc ↑) | WinoGrande (acc ↑) | HellaSwag (acc ↑) | LAMBADA (ppl ↓) | LAMBADA (acc ↑) | PIQA (acc ↑) | WikiText-2 (ppl ↓) | Average (acc ↑) | Average (ppl ↓) |
|---|---|---|---|---|---|---|---|---|---|---|
| Public software model | 43.81 | 22.70 | 51.62 | 31.14 | 35.15 | 45.96 | 62.89 | 37.37 | 43.02 | 36.26 |
| Software model trained from scratch | 42.34 | 23.46 | 50.20 | 29.73 | 46.39 | 41.56 | 61.48 | 41.25 | 41.46 | 43.82 |
| Linear hardware model | 42.80 | 23.46 | 52.41 | 30.31 | 51.69 | 38.10 | 61.21 | 39.79 | 41.38 | 45.74 |
| Nonlinear hardware model with adaptation | 42.09 | 22.87 | 50.51 | 30.10 | 76.59 | 31.61 | 61.53 | 42.34 | 39.79 | 59.47 |
| Nonlinear hardware model with adaptation and fine-tuning | 43.94 | 22.78 | 51.14 | 30.18 | 43.08 | 40.16 | 62.62 | 39.97 | 41.80 | 41.52 |
| Public software model-XL | 58.29 (+14.48) | 28.50 (+5.80) | 58.33 (+6.71) | 50.89 (+19.75) | 9.68 (−25.47) | 63.87 (+17.91) | 70.84 (+7.95) | 20.38 (−16.99) | 55.12 (+12.10) | 15.03 (−21.23) |
| Software model trained from scratch-XL | 53.82 (+11.48) | 25.76 (+2.30) | 53.75 (+3.55) | 42.54 (+12.81) | 14.82 (−31.57) | 56.33 (+14.77) | 68.71 (+7.23) | 24.98 (−16.27) | 50.15 (+8.69) | 19.90 (−23.92) |
| Linear hardware model-XL | 54.08 (+11.28) | 27.47 (+4.01) | 57.93 (+5.52) | 45.51 (+15.20) | 12.32 (−39.37) | 58.54 (+20.44) | 68.01 (+6.80) | 23.26 (−16.53) | 51.92 (+10.54) | 17.79 (−27.95) |
| Nonlinear hardware model-XL | 53.79 (+9.85) | 27.30 (+4.52) | 54.70 (+3.56) | 46.70 (+16.52) | 12.17 (−30.91) | 59.48 (+19.32) | 68.17 (+5.55) | 22.29 (−17.68) | 51.69 (+9.89) | 17.23 (−24.29) |
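
The table does not state how the Average columns or the parenthesized values in the XL rows are computed. A minimal sketch, assuming that the averages are unweighted means over the six accuracy columns and the two perplexity columns, and that each parenthesized value is the XL result minus the result of the corresponding base-size model, can be used to check those readings against the reported numbers. The names `ACC_IDX`, `PPL_IDX`, `averages`, and `deltas` are illustrative only and do not come from the paper.

```python
# Column order used here (an assumption matching the table layout):
# ARC-E, ARC-C, WinoGrande, HellaSwag, LAMBADA ppl, LAMBADA acc, PIQA, WikiText-2 ppl
ACC_IDX = (0, 1, 2, 3, 5, 6)  # accuracy columns (higher is better)
PPL_IDX = (4, 7)              # perplexity columns (lower is better)

# Values copied from the "Public software model" rows of Table 1.
rows = {
    "Public software model": [43.81, 22.70, 51.62, 31.14, 35.15, 45.96, 62.89, 37.37],
    "Public software model-XL": [58.29, 28.50, 58.33, 50.89, 9.68, 63.87, 70.84, 20.38],
}


def averages(values):
    """Return (mean accuracy, mean perplexity), rounded to two decimals."""
    acc = sum(values[i] for i in ACC_IDX) / len(ACC_IDX)
    ppl = sum(values[i] for i in PPL_IDX) / len(PPL_IDX)
    return round(acc, 2), round(ppl, 2)


def deltas(xl_values, base_values):
    """Return per-column differences (XL minus base), rounded to two decimals."""
    return [round(x - b, 2) for x, b in zip(xl_values, base_values)]


base_avg = averages(rows["Public software model"])
xl_avg = averages(rows["Public software model-XL"])
xl_delta = deltas(rows["Public software model-XL"], rows["Public software model"])

print("base averages (acc, ppl):", base_avg)
print("XL averages (acc, ppl):  ", xl_avg)
print("XL minus base per column:", xl_delta)
```

Under these assumptions the computed averages and differences match the reported entries for the public software model pair, which suggests the parenthesized values are absolute differences in percentage points (accuracy) or perplexity units rather than relative changes. The fine-tuned nonlinear hardware row appears to be the baseline for the Nonlinear hardware model-XL differences.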