Table 3 Ablation analysis, varying the value of the tolerance with 3 constrained actions

From: Fast, slow, and metacognitive thinking in AI

  

| Agent | Metric | pRand = 1 | pRand = 0.75 | pRand = 0.5 | pRand = 0.25 | pRand = 0 |
|---|---|---|---|---|---|---|
| S1 solver | Avg. Time (ms) | 0.34 (0.56) | 1.61 (1.7) | 1.59 (1.51) | 1.69 (1.95) | 11.24 (26.2) |
| | Avg. reward | −2452.34 (1760.79) | −1010.94 (916.45) | −575.27 (602.68) | −424.45 (551.9) | −833.13 (1017.25) |
| | Avg. length | 75.92 (52.1) | 33.62 (28.43) | 20.81 (19.24) | 17.26 (19.04) | 38.86 (43.27) |
| | Avg. Viol. Constr. | 22.39 (31.98) | 9.2 (15.0) | 5.21 (9.37) | 3.77 (8.07) | 7.05 (14.96) |
| S2 solver | Avg. Time (ms) | 63.03 (47.54) | 63.03 (47.54) | 63.03 (47.54) | 63.03 (47.54) | 63.03 (47.54) |
| | Avg. reward | −40.26 (54.48) | −40.26 (54.48) | −40.26 (54.48) | −40.26 (54.48) | −40.26 (54.48) |
| | Avg. length | 7.11 (4.98) | 7.11 (4.98) | 7.11 (4.98) | 7.11 (4.98) | 7.11 (4.98) |
| | Avg. Viol. Constr. | 0.38 (0.89) | 0.38 (0.89) | 0.38 (0.89) | 0.38 (0.89) | 0.38 (0.89) |
| | Avg. Perc. Use S2 | 1.0 (0.0) | 1.0 (0.0) | 1.0 (0.0) | 1.0 (0.0) | 1.0 (0.0) |
| SOFAI t2 = 0.5 | Avg. Time (ms) | 20.98 (45.71) | 21.98 (31.78) | 20.96 (38.43) | 15.0 (35.53) | 8.31 (14.51) |
| | Avg. reward | −2346.9 (1805.86) | −629.67 (461.66) | −290.84 (224.91) | −137.4 (128.97) | −41.39 (74.11) |
| | Avg. length | 73.84 (53.03) | 24.64 (16.67) | 15.45 (10.48) | 11.88 (9.63) | 9.5 (14.32) |
| | Avg. Viol. Constr. | 21.37 (31.45) | 5.6 (7.97) | 2.45 (3.63) | 0.99 (1.73) | 0.09 (0.34) |
| | Avg. Perc. Use S2 | 0.07 (0.14) | 0.11 (0.14) | 0.13 (0.15) | 0.13 (0.15) | 0.11 (0.17) |
| SOFAI t2 = 0.95 | Avg. Time (ms) | 64.54 (49.35) | 49.6 (44.47) | 36.4 (41.39) | 27.31 (39.38) | 11.97 (23.42) |
| | Avg. reward | −429.75 (249.84) | −262.42 (148.05) | −160.82 (110.41) | −104.1 (92.81) | −38.42 (59.74) |
| | Avg. length | 20.56 (11.26) | 15.13 (8.43) | 12.26 (7.52) | 11.44 (9.0) | 8.81 (11.7) |
| | Avg. Viol. Constr. | 3.67 (4.59) | 2.16 (2.66) | 1.22 (1.69) | 0.66 (1.11) | 0.09 (0.34) |
| | Avg. Perc. Use S2 | 0.37 (0.18) | 0.33 (0.18) | 0.31 (0.2) | 0.25 (0.2) | 0.15 (0.19) |
| SOFAI t2 = 1 | Avg. Time (ms) | 63.99 (45.47) | 51.53 (46.53) | 37.43 (34.95) | 27.33 (29.9) | 12.45 (24.25) |
| | Avg. reward | −415.18 (238.82) | −244.34 (143.5) | −148.84 (98.86) | −95.99 (87.81) | −40.58 (65.57) |
| | Avg. length | 19.96 (10.79) | 14.6 (7.9) | 11.88 (6.59) | 11.06 (8.19) | 9.23 (12.97) |
| | Avg. Viol. Constr. | 3.55 (4.41) | 1.99 (2.5) | 1.11 (1.52) | 0.59 (1.04) | 0.09 (0.34) |
| | Avg. Perc. Use S2 | 0.38 (0.18) | 0.37 (0.18) | 0.33 (0.2) | 0.27 (0.21) | 0.16 (0.19) |
| SOFAI REF t2 = 0.5 | Avg. Time (ms) | 23.05 (34.68) | 24.93 (36.73) | 23.26 (40.2) | 21.12 (39.04) | 13.34 (36.48) |
| | Avg. reward | −2381.86 (1838.78) | −648.65 (481.49) | −277.1 (216.26) | −127.33 (120.01) | −46.14 (95.83) |
| | Avg. length | 74.12 (53.68) | 25.48 (17.77) | 14.8 (10.25) | 11.48 (8.87) | 9.92 (19.21) |
| | Avg. Viol. Constr. | 21.72 (31.91) | 5.76 (8.18) | 2.34 (3.42) | 0.9 (1.56) | 0.13 (0.43) |
| | Avg. Perc. Use S2 | 0.07 (0.14) | 0.12 (0.13) | 0.17 (0.13) | 0.2 (0.15) | 0.16 (0.18) |
| SOFAI REF t2 = 0.95 | Avg. Time (ms) | 69.57 (46.45) | 53.75 (47.52) | 43.31 (47.55) | 37.31 (44.43) | 18.96 (27.24) |
| | Avg. reward | −397.6 (242.45) | −241.46 (142.31) | −153.55 (108.12) | −94.0 (91.56) | −44.29 (80.56) |
| | Avg. length | 19.49 (11.02) | 14.65 (8.38) | 11.9 (7.49) | 10.27 (7.6) | 9.79 (16.51) |
| | Avg. Viol. Constr. | 3.38 (4.24) | 1.96 (2.42) | 1.16 (1.57) | 0.61 (1.08) | 0.11 (0.37) |
| | Avg. Perc. Use S2 | 0.42 (0.18) | 0.41 (0.19) | 0.39 (0.21) | 0.38 (0.23) | 0.24 (0.23) |
| SOFAI REF t2 = 1 | Avg. Time (ms) | 70.05 (52.97) | 55.03 (43.44) | 46.51 (40.27) | 37.26 (42.37) | 19.32 (38.7) |
| | Avg. reward | −351.27 (211.5) | −231.46 (148.21) | −141.93 (96.43) | −92.06 (93.35) | −42.21 (75.94) |
| | Avg. length | 18.2 (10.48) | 14.34 (8.13) | 11.5 (6.62) | 10.19 (6.91) | 9.31 (15.05) |
| | Avg. Viol. Constr. | 2.96 (3.67) | 1.87 (2.4) | 1.06 (1.41) | 0.59 (1.15) | 0.11 (0.36) |
| | Avg. Perc. Use S2 | 0.45 (0.18) | 0.43 (0.2) | 0.44 (0.22) | 0.39 (0.24) | 0.24 (0.24) |

  1. The analysis is performed with and without the reflection phase. "SOFAI REF" refers to agents that employed the reflection phase.