A kernel approximation method is demonstrated on a multicore analogue in-memory computing (AIMC) chip. By enabling linear-complexity attention computation on AIMC hardware, it delivers superior energy efficiency.
- Julian Büchel
- Giacomo Camposampiero
- Abu Sebastian
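
To make the idea of linear-complexity attention via kernel approximation concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the summary above does not say which feature map is used, so the positive random features below are an assumption chosen for illustration, and the names `positive_random_features`, `linear_attention` and `softmax_attention` are hypothetical. The projection `x @ w.T` is the kind of matrix-vector product that could plausibly be mapped onto an in-memory computing core; here it simply runs in floating point.

```python
import numpy as np

def positive_random_features(x, w):
    """Map rows of x (n, d) to positive features (n, m).

    With w drawn from a standard normal, the dot product of two feature
    vectors is an unbiased estimate of exp(x_i . x_j).
    Assumption: this feature map is illustrative, not taken from the source.
    """
    m = w.shape[0]
    proj = x @ w.T                                      # (n, m) random projection
    half_sq_norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)
    return np.exp(proj - half_sq_norm) / np.sqrt(m)

def linear_attention(q, k, v, w):
    """Approximate softmax attention in time linear in the sequence length."""
    scale = q.shape[-1] ** -0.25                        # splits the 1/sqrt(d) factor between q and k
    q_f = positive_random_features(q * scale, w)        # (n, m)
    k_f = positive_random_features(k * scale, w)        # (n, m)
    kv = k_f.T @ v                                      # (m, d_v), summed over keys once
    z = k_f.sum(axis=0)                                 # (m,), normaliser terms
    return (q_f @ kv) / (q_f @ z)[:, None]              # the (n, n) score matrix is never formed

def softmax_attention(q, k, v):
    """Reference quadratic-cost softmax attention, for comparison only."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, m = 16, 8, 2048                               # sequence length, head size, feature count
    q = 0.3 * rng.standard_normal((n, d))
    k = 0.3 * rng.standard_normal((n, d))
    v = rng.standard_normal((n, d))
    w = rng.standard_normal((m, d))                     # random projection matrix
    error = np.abs(linear_attention(q, k, v, w) - softmax_attention(q, k, v)).max()
    print(f"max abs deviation from softmax attention: {error:.4f}")
```

The saving comes from reordering the products: `k_f.T @ v` is computed once and reused for every query, so cost grows with the sequence length times the number of random features rather than with the square of the sequence length. The approximation quality depends on the number of features and on the magnitude of the queries and keys.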