Transformers and large language models excel at processing long sequences, but the cost of self-attention grows quadratically with sequence length, and new approaches have been proposed that incorporate recurrence to overcome it. Tiezzi et al. discuss recurrent and state-space models and the promise they hold for future sequence-processing networks.
- Matteo Tiezzi
- Michele Casoni
- Stefano Melacci