Table 1 Average performance of the S1 solver, the S2 solver, and SOFAI in terms of trajectory reward (that is, penalties accumulated over the moves), trajectory length, and time (in milliseconds) to generate the trajectory

		pRand = 1	pRand = 0.75	pRand = 0.5	pRand = 0.25	pRand = 0
S1 solver	Avg. Time (ms)	0.34 (0.56)	1.50 (1.29)	1.59 (1.35)	1.32 (1.22)	26.28 (110.19)
	Avg. Reward	−3244.77 (2,320.64)	−1280.35 (1,220.32)	−721.18 (838.84)	−452.64 (652.03)	−621.68 (891.34)
	Avg. Length	73.91 (51.08)	31.16 (28.07)	18.73 (19.85)	12.15 (15.30)	21.55 (33.29)
S2 solver	Avg. Time (ms)	235.84 (337.64)	235.84 (337.64)	235.84 (337.64)	235.84 (337.64)	235.84 (337.64)
	Avg. Reward	−208.07 (436.69)	−208.07 (436.69)	−208.07 (436.69)	−208.07 (436.69)	−208.07 (436.69)
	Avg. Length	11.47 (16.25)	11.47 (16.25)	11.47 (16.25)	11.47 (16.25)	11.47 (16.25)
SOFAI	Avg. Time (ms)	206.91 (323.62)	147.82 (251.05)	110.34 (223.74)	69.55 (186.78)	99.61 (239.41)
	Avg. Reward	−213.22 (463.57)	−179.07 (319.69)	−146.21 (267.56)	−121.50 (191.13)	−156.37 (283.48)
	Avg. Length	12.10 (17.35)	10.90 (14.11)	10.12 (13.32)	9.44 (11.93)	16.31 (27.82)

Each column corresponds to a version of the S1 solver with a degree of randomness given by the variable pRand: pRand = 1 means that the S1 solver is completely random, while pRand = 0 means that the solver always selects the most promising move. The S2 solver does not depend on pRand, so its rows show the same numbers in all five columns. Best values for other rows are highlighted in bold. The average and standard deviation (shown in parentheses) are computed over 10 grids and 500 trajectories per grid. SOFAI outperforms or otherwise performs comparably to the S2 solver on all criteria. Moreover, it does better than all instances of the S1 solver except for time.
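
The role of pRand can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `s1_select_move`, `moves`, `score`, and `p_rand` are hypothetical, and treating intermediate pRand values as the probability of choosing a random move is an assumption, since the note above only defines the two endpoints.

```python
import random


def s1_select_move(moves, score, p_rand, rng=random):
    """Pick one move from `moves` (hypothetical helper, not from the paper).

    With probability `p_rand` a move is drawn uniformly at random;
    otherwise the move with the highest `score` is returned.
    Assumption: intermediate p_rand values mix the two behaviours.
    """
    if not moves:
        raise ValueError("no moves available")
    if not 0.0 <= p_rand <= 1.0:
        raise ValueError("p_rand must lie in [0, 1]")
    # rng.random() returns a float in [0, 1), so p_rand = 1 always takes
    # the random branch and p_rand = 0 never does.
    if rng.random() < p_rand:
        return rng.choice(moves)
    return max(moves, key=score)


# Illustrative scores only; a higher score means a more promising move.
example_scores = {"up": -1.0, "down": -5.0, "left": -2.0}
greedy_move = s1_select_move(list(example_scores), example_scores.get, p_rand=0.0)
```

Under these assumptions, `p_rand=0.0` always returns the most promising move (`"up"` for the illustrative scores), while `p_rand=1.0` ignores the scores entirely.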
