Table 6 Test accuracy of the FLAN-T5 models in Scenarios 1 and 3 for the three datasets.

From: Verbal lie detection using Large Language Models

| Model | Opinion | Memory | Intention |
| --- | --- | --- | --- |
| Bag-of-words baseline | 76.16 ± 2.9% | 57.57 ± 7.66% | 67.07 ± 3.18% |
| Literature baseline | 65.16 ± 5.7% |  | 69.00 [63; 74]%; 69.86 ± 2.34%; 70.61 ± 2.58% |
| FLAN-T5 small (Scenario 1) | 80.64 ± 2.03% | 76.87 ± 2.06% | 71.46 ± 3.65% |
| FLAN-T5 base (Scenario 1) | 82.60 ± 3.01% | **80.61 ± 1.41%** | 71.52 ± 2.21% |
| FLAN-T5 small (Scenario 3) | 79 ± 2.11% | 75.67 ± 1.90% | 69.32 ± 3.75% |
| FLAN-T5 base (Scenario 3) | **82.72 ± 2.39%** | 79.87 ± 1.60% | **72.25 ± 2.86%** |

  1. Reported values are means ± standard deviations over the 10 folds. Best results per evaluation metric are in bold. The literature baseline for the Opinion dataset is the average accuracy and standard deviation over all within-topic accuracies of the FastText Embedding + Transformer model [51]. The literature baselines for the Intention dataset are, in order: the accuracy of a Vanilla Random Forest using LIWC features (confidence interval in square brackets) [49], and the average accuracy and standard deviation of the RoBERTa + Transformers + Co-Attention model and of the BERT + co-attention model [50].
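
As an illustration of how a "mean ± standard deviation over the 10 folds" entry can be computed, the following is a minimal Python sketch. The fold accuracies are hypothetical placeholders, not values from the paper, and the helper names `summarize_folds` and `format_cell` are invented for this example. The sketch assumes the sample standard deviation (n − 1 denominator); the source does not state which estimator was used.

```python
from statistics import mean, stdev


def summarize_folds(fold_accuracies):
    """Return (mean, sample standard deviation) of per-fold accuracies in percent."""
    if len(fold_accuracies) < 2:
        raise ValueError("need at least two folds to compute a standard deviation")
    # stdev() uses the n - 1 denominator (an assumption; the paper does not specify).
    return mean(fold_accuracies), stdev(fold_accuracies)


def format_cell(fold_accuracies):
    """Format per-fold accuracies as a 'mean ± std%' table cell."""
    m, s = summarize_folds(fold_accuracies)
    return f"{m:.2f} ± {s:.2f}%"


# Hypothetical per-fold test accuracies (percent) for one model on one dataset.
example_folds = [80.0, 82.0, 84.0, 81.0, 83.0, 79.0, 85.0, 82.0, 81.0, 83.0]
print(format_cell(example_folds))
```

Swapping `stdev` for `statistics.pstdev` would give the population standard deviation instead, which yields slightly smaller values for 10 folds.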