Large language models learn from, and often memorize, their training data. Here, the authors show that these models memorize far more than previously thought, by stitching together pieces of text like a mosaic. Such memorization is difficult to control and challenges widely used practices for benchmarking and protecting privacy.
- Igor Shilov
- Matthieu Meeus
- Yves-Alexandre de Montjoye