Table 1 Average performance of the S1 solver, the S2 solver, and SOFAI, measured by each trajectory's reward (that is, the penalties accumulated over the moves), its length, and the time (in milliseconds) needed to generate it
Solver | Metric | pRand = 1 | pRand = 0.75 | pRand = 0.5 | pRand = 0.25 | pRand = 0 |
---|---|---|---|---|---|---|
S1 solver | Avg. Time (ms) | 0.34 (0.56) | 1.50 (1.29) | 1.59 (1.35) | 1.32 (1.22) | 26.28 (110.19) |
S1 solver | Avg. Reward | −3,244.77 (2,320.64) | −1,280.35 (1,220.32) | −721.18 (838.84) | −452.64 (652.03) | −621.68 (891.34) |
S1 solver | Avg. Length | 73.91 (51.08) | 31.16 (28.07) | 18.73 (19.85) | 12.15 (15.30) | 21.55 (33.29) |
S2 solver | Avg. Time (ms) | 235.84 (337.64) | 235.84 (337.64) | 235.84 (337.64) | 235.84 (337.64) | 235.84 (337.64) |
S2 solver | Avg. Reward | −208.07 (436.69) | −208.07 (436.69) | −208.07 (436.69) | −208.07 (436.69) | −208.07 (436.69) |
S2 solver | Avg. Length | 11.47 (16.25) | 11.47 (16.25) | 11.47 (16.25) | 11.47 (16.25) | 11.47 (16.25) |
SOFAI | Avg. Time (ms) | 206.91 (323.62) | 147.82 (251.05) | 110.34 (223.74) | 69.55 (186.78) | 99.61 (239.41) |
SOFAI | Avg. Reward | −213.22 (463.57) | −179.07 (319.69) | −146.21 (267.56) | −121.50 (191.13) | −156.37 (283.48) |
SOFAI | Avg. Length | 12.10 (17.35) | 10.90 (14.11) | 10.12 (13.32) | 9.44 (11.93) | 16.31 (27.82) |