Table 4 Retrieval performance metrics for different models at K = 5 and K = 10.
From: A Question Answering Dataset for Temporal-Sensitive Retrieval-Augmented Generation
| K | Method | Overall Recall | Overall MAP | Overall NDCG | Multiple Recall | Multiple MAP | Multiple NDCG | Single Recall | Single MAP | Single NDCG |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Native RAG | 54.6 ± 1.1 | 52.5 ± 0.9 | 62.9 ± 1.3 | 26.9 ± 1.5 | 42.5 ± 1.8 | 49.5 ± 2.1 | 70.6 ± 1.0 | 58.3 ± 0.8 | 70.6 ± 1.0 |
| 5 | Temporal Filter | 49.0 ± 1.4 | 45.1 ± 1.2 | 56.8 ± 1.5 | 15.5 ± 1.9 | 23.2 ± 2.2 | 30.8 ± 2.5 | 68.5 ± 1.2 | 57.8 ± 1.1 | 70.9 ± 0.9 |
| 5 | Query Rewrite | 55.7† ± 1.0 | 53.3† ± 0.8 | 64.0† ± 1.1 | 29.0† ± 1.4 | 44.2† ± 1.6 | 51.6† ± 1.9 | 71.3† ± 0.9 | 58.6 ± 0.7 | 71.3† ± 0.9 |
| 5 | Query Decomposition | 61.9† ± 0.9 | 56.4† ± 0.7 | 71.8† ± 1.0 | 45.2† ± 1.2 | 52.4† ± 1.4 | 72.1† ± 1.5 | 71.6† ± 0.9 | 58.6 ± 0.7 | 71.6† ± 0.9 |
| 10 | Native RAG | 61.8 ± 1.0 | 53.5 ± 0.9 | 71.8 ± 1.1 | 35.6 ± 1.6 | 43.7 ± 1.7 | 62.6 ± 1.9 | 77.1 ± 0.9 | 59.1 ± 0.8 | 77.1 ± 0.9 |
| 10 | Temporal Filter | 51.7 ± 1.3 | 43.7 ± 1.3 | 60.7 ± 1.6 | 17.5 ± 2.0 | 20.6 ± 2.4 | 35.5 ± 2.7 | 71.6 ± 1.1 | 57.2 ± 1.2 | 74.3 ± 1.0 |
| 10 | Query Rewrite | 62.6† ± 0.9 | 53.8 ± 0.8 | 72.3† ± 1.0 | 36.7† ± 1.5 | 44.3† ± 1.6 | 63.2† ± 1.8 | 77.7† ± 0.8 | 59.3 ± 0.7 | 77.7† ± 0.8 |
| 10 | Query Decomposition | 68.2† ± 0.8 | 57.6† ± 0.7 | 78.9† ± 0.9 | 53.9† ± 1.3 | 53.1† ± 1.5 | 83.2† ± 1.4 | 76.5 ± 0.9 | 59.3 ± 0.7 | 76.5 ± 0.9 |
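
Recall, MAP, and NDCG in the table are ranking metrics evaluated over the top K retrieved documents. The exact formulas are not given here, so the following is only a minimal sketch under common textbook definitions with binary relevance. The function names, the normalisation of average precision by min(K, number of relevant documents), and the toy document IDs are assumptions for illustration, not the authors' evaluation code. MAP would be the mean of the per-query average precision over all queries.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def average_precision_at_k(ranked, relevant, k):
    """Sum of precision at each rank holding a relevant document,
    divided by min(k, number of relevant documents) (assumed normalisation)."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / min(k, len(relevant))


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    if not relevant:
        return 0.0
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg


# Hypothetical query with two relevant documents and a ranked list without duplicates.
ranked_docs = ["d3", "d1", "d7", "d2", "d9"]
relevant_docs = {"d1", "d2"}

recall = recall_at_k(ranked_docs, relevant_docs, 5)
ap = average_precision_at_k(ranked_docs, relevant_docs, 5)
ndcg = ndcg_at_k(ranked_docs, relevant_docs, 5)
```

In this toy case both relevant documents fall inside the top 5, so recall is 1.0, while average precision and NDCG are lower because the relevant documents sit at ranks 2 and 4 instead of ranks 1 and 2. This shows how a method can score higher on Recall than on MAP or NDCG.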