Table 7 Retrieval strategies in title_plus_ask (GPT-4o, RAG).

From: Medical QA dialogue datasets in RAG systems performance evaluation and ChatGPT optimization

| Label | ROUGE-L F1 | Δ vs. baseline | 95% CI | p-value | BERTScore-F1 | BLEU-2 | Cosine |
|---|---|---|---|---|---|---|---|
| baseline_vector_gpt4o | 0.1491 | | | | 0.7038 | 0.0300 | 0.4261 |
| hybrid_rerank_gpt4o | 0.1551 | +0.0061 | [+0.0018, +0.0104] | 0.003 | 0.7015 | 0.0325 | 0.4593 |
| hybrid_rrf_gpt4o | 0.1566 | +0.0076 | [+0.0032, +0.0121] | 0.001 | 0.7020 | 0.0319 | 0.4824 |
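The Δ column in Table 7 can be sanity-checked against the rounded ROUGE-L F1 values. The following is a minimal sketch, assuming Δ is the difference in ROUGE-L F1 between each hybrid strategy and baseline_vector_gpt4o; the table itself does not state how Δ, the confidence interval, or the p-value were computed. The helper name `delta_vs_baseline` is hypothetical and used only for illustration.

```python
# ROUGE-L F1 values copied from Table 7 (already rounded to four decimals there).
rouge_l_f1 = {
    "baseline_vector_gpt4o": 0.1491,
    "hybrid_rerank_gpt4o": 0.1551,
    "hybrid_rrf_gpt4o": 0.1566,
}

# Delta and 95% CI bounds as reported in Table 7.
reported = {
    "hybrid_rerank_gpt4o": {"delta": 0.0061, "ci": (0.0018, 0.0104)},
    "hybrid_rrf_gpt4o": {"delta": 0.0076, "ci": (0.0032, 0.0121)},
}


def delta_vs_baseline(scores, baseline_label):
    """Return score minus baseline score for every non-baseline label.

    Assumption: the table's delta is a plain difference of ROUGE-L F1 means.
    """
    base = scores[baseline_label]
    return {
        label: round(value - base, 4)
        for label, value in scores.items()
        if label != baseline_label
    }


recomputed = delta_vs_baseline(rouge_l_f1, "baseline_vector_gpt4o")

for label, delta in recomputed.items():
    low, high = reported[label]["ci"]
    gap = round(reported[label]["delta"] - delta, 4)
    # A CI whose lower bound is above zero excludes "no improvement".
    excludes_zero = low > 0
    print(
        f"{label}: recomputed {delta:+.4f}, "
        f"reported {reported[label]['delta']:+.4f}, gap {gap:+.4f}, "
        f"CI [{low:+.4f}, {high:+.4f}] excludes zero: {excludes_zero}"
    )
```

The recomputed differences (+0.0060 and +0.0075) each fall 0.0001 below the reported Δ. That gap would be consistent with Δ having been computed from unrounded scores, though the table does not confirm this. Both reported intervals lie entirely above zero, which agrees with the reported p-values of 0.003 and 0.001.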