Extended Data Fig. 3: TTRL ablations when varying the quantity of synthetic variants and with different variant generator LLMs.
From: Olympiad-level formal mathematical reasoning with reinforcement learning

a Increasing the number of variants in the curriculum (from “Top 10” to “Top 100k”) leads to a higher proportion of solved variants in the TTRL training set. b More synthetic variants consistently improve the final prove rate on the target problems, demonstrating the significant benefit of a larger set of problem-specific variants. c The variants in the TTRL training set generated by Gemini 1.5 Flash S (grey) show a higher solve rate than those generated by the stronger Gemini 2.0 Flash S (pink), suggesting that Gemini 2.0 Flash S’s variants may form a more challenging learning curriculum. d Using Gemini 2.0 Flash S for variant generation results in a notably higher prove rate on the target problems, indicating that its more challenging learning curriculum is indeed more effective.