Table 3 Ablation analysis, varying the value of the tolerance with 3 constrained actions
| Solver | Metric | pRand = 1 | pRand = 0.75 | pRand = 0.5 | pRand = 0.25 | pRand = 0 |
|---|---|---|---|---|---|---|
| S1 solver | Avg. Time (ms) | 0.34 (0.56) | 1.61 (1.7) | 1.59 (1.51) | 1.69 (1.95) | 11.24 (26.2) |
| | Avg. reward | −2452.34 (1760.79) | −1010.94 (916.45) | −575.27 (602.68) | −424.45 (551.9) | −833.13 (1017.25) |
| | Avg. length | 75.92 (52.1) | 33.62 (28.43) | 20.81 (19.24) | 17.26 (19.04) | 38.86 (43.27) |
| | Avg. Viol. Constr. | 22.39 (31.98) | 9.2 (15.0) | 5.21 (9.37) | 3.77 (8.07) | 7.05 (14.96) |
| S2 solver | Avg. Time (ms) | 63.03 (47.54) | 63.03 (47.54) | 63.03 (47.54) | 63.03 (47.54) | 63.03 (47.54) |
| | Avg. reward | −40.26 (54.48) | −40.26 (54.48) | −40.26 (54.48) | −40.26 (54.48) | −40.26 (54.48) |
| | Avg. length | 7.11 (4.98) | 7.11 (4.98) | 7.11 (4.98) | 7.11 (4.98) | 7.11 (4.98) |
| | Avg. Viol. Constr. | 0.38 (0.89) | 0.38 (0.89) | 0.38 (0.89) | 0.38 (0.89) | 0.38 (0.89) |
| | Avg. Perc. Use S2 | 1.0 (0.0) | 1.0 (0.0) | 1.0 (0.0) | 1.0 (0.0) | 1.0 (0.0) |
| SOFAI t2 = 0.5 | Avg. Time (ms) | 20.98 (45.71) | 21.98 (31.78) | 20.96 (38.43) | 15.0 (35.53) | 8.31 (14.51) |
| | Avg. reward | −2346.9 (1805.86) | −629.67 (461.66) | −290.84 (224.91) | −137.4 (128.97) | −41.39 (74.11) |
| | Avg. length | 73.84 (53.03) | 24.64 (16.67) | 15.45 (10.48) | 11.88 (9.63) | 9.5 (14.32) |
| | Avg. Viol. Constr. | 21.37 (31.45) | 5.6 (7.97) | 2.45 (3.63) | 0.99 (1.73) | 0.09 (0.34) |
| | Avg. Perc. Use S2 | 0.07 (0.14) | 0.11 (0.14) | 0.13 (0.15) | 0.13 (0.15) | 0.11 (0.17) |
| SOFAI t2 = 0.95 | Avg. Time (ms) | 64.54 (49.35) | 49.6 (44.47) | 36.4 (41.39) | 27.31 (39.38) | 11.97 (23.42) |
| | Avg. reward | −429.75 (249.84) | −262.42 (148.05) | −160.82 (110.41) | −104.1 (92.81) | −38.42 (59.74) |
| | Avg. length | 20.56 (11.26) | 15.13 (8.43) | 12.26 (7.52) | 11.44 (9.0) | 8.81 (11.7) |
| | Avg. Viol. Constr. | 3.67 (4.59) | 2.16 (2.66) | 1.22 (1.69) | 0.66 (1.11) | 0.09 (0.34) |
| | Avg. Perc. Use S2 | 0.37 (0.18) | 0.33 (0.18) | 0.31 (0.2) | 0.25 (0.2) | 0.15 (0.19) |
| SOFAI t2 = 1 | Avg. Time (ms) | 63.99 (45.47) | 51.53 (46.53) | 37.43 (34.95) | 27.33 (29.9) | 12.45 (24.25) |
| | Avg. reward | −415.18 (238.82) | −244.34 (143.5) | −148.84 (98.86) | −95.99 (87.81) | −40.58 (65.57) |
| | Avg. length | 19.96 (10.79) | 14.6 (7.9) | 11.88 (6.59) | 11.06 (8.19) | 9.23 (12.97) |
| | Avg. Viol. Constr. | 3.55 (4.41) | 1.99 (2.5) | 1.11 (1.52) | 0.59 (1.04) | 0.09 (0.34) |
| | Avg. Perc. Use S2 | 0.38 (0.18) | 0.37 (0.18) | 0.33 (0.2) | 0.27 (0.21) | 0.16 (0.19) |
| SOFAI REF t2 = 0.5 | Avg. Time (ms) | 23.05 (34.68) | 24.93 (36.73) | 23.26 (40.2) | 21.12 (39.04) | 13.34 (36.48) |
| | Avg. reward | −2381.86 (1838.78) | −648.65 (481.49) | −277.1 (216.26) | −127.33 (120.01) | −46.14 (95.83) |
| | Avg. length | 74.12 (53.68) | 25.48 (17.77) | 14.8 (10.25) | 11.48 (8.87) | 9.92 (19.21) |
| | Avg. Viol. Constr. | 21.72 (31.91) | 5.76 (8.18) | 2.34 (3.42) | 0.9 (1.56) | 0.13 (0.43) |
| | Avg. Perc. Use S2 | 0.07 (0.14) | 0.12 (0.13) | 0.17 (0.13) | 0.2 (0.15) | 0.16 (0.18) |
| SOFAI REF t2 = 0.95 | Avg. Time (ms) | 69.57 (46.45) | 53.75 (47.52) | 43.31 (47.55) | 37.31 (44.43) | 18.96 (27.24) |
| | Avg. reward | −397.6 (242.45) | −241.46 (142.31) | −153.55 (108.12) | −94.0 (91.56) | −44.29 (80.56) |
| | Avg. length | 19.49 (11.02) | 14.65 (8.38) | 11.9 (7.49) | 10.27 (7.6) | 9.79 (16.51) |
| | Avg. Viol. Constr. | 3.38 (4.24) | 1.96 (2.42) | 1.16 (1.57) | 0.61 (1.08) | 0.11 (0.37) |
| | Avg. Perc. Use S2 | 0.42 (0.18) | 0.41 (0.19) | 0.39 (0.21) | 0.38 (0.23) | 0.24 (0.23) |
| SOFAI REF t2 = 1 | Avg. Time (ms) | 70.05 (52.97) | 55.03 (43.44) | 46.51 (40.27) | 37.26 (42.37) | 19.32 (38.7) |
| | Avg. reward | −351.27 (211.5) | −231.46 (148.21) | −141.93 (96.43) | −92.06 (93.35) | −42.21 (75.94) |
| | Avg. length | 18.2 (10.48) | 14.34 (8.13) | 11.5 (6.62) | 10.19 (6.91) | 9.31 (15.05) |
| | Avg. Viol. Constr. | 2.96 (3.67) | 1.87 (2.4) | 1.06 (1.41) | 0.59 (1.15) | 0.11 (0.36) |
| | Avg. Perc. Use S2 | 0.45 (0.18) | 0.43 (0.2) | 0.44 (0.22) | 0.39 (0.24) | 0.24 (0.24) |