Extended Data Fig. 1: Detailed overview of the OpenScholar inference pipeline (top) and training pipeline (bottom).
From: Synthesizing scientific literature with retrieval-augmented language models

At inference time, given an input x, OpenScholar first uses a retriever to identify relevant papers from a specialized data store (the OpenScholar data store, OSDS), and then uses a reranker to refine the candidates and select the top N documents. The retrieved documents are passed to the LM, which (1) generates an initial response y0, (2) generates self-feedback fi on its current output and (3) incorporates the feedback fi to generate a revised response yi. The LM repeats steps (2) and (3) for a pre-defined number of iterations, stopping early once all feedback has been incorporated. To train a smaller yet competitive 8B LM, we generate high-quality training data using this inference-time pipeline, followed by data filtering and mixing. For the retriever component, we use our new OpenScholar retriever, continually pre-trained on the OSDS; for the reranker component, we use the OpenScholar reranker, initialized from the BGE reranker and fine-tuned on synthetically generated data. For OpenScholar-8B, we use Llama 3.1 8B as the generator; for OpenScholar-GPT-4o, we use GPT-4o as the LM.
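
A minimal Python sketch of the inference loop described above is given below. It is illustrative only: the function names, signatures, iteration budget (max_iterations) and stopping rule are assumptions for exposition, not the released OpenScholar implementation.

```python
from typing import Callable, List

def openscholar_infer(
    x: str,
    retrieve: Callable[[str], List[str]],               # retriever over the OSDS
    rerank: Callable[[str, List[str]], List[str]],      # reranker over candidates
    generate: Callable[[str, List[str]], str],          # LM call: initial response y0
    feedback: Callable[[str, List[str], str], str],     # LM call: self-feedback fi
    refine: Callable[[str, List[str], str, str], str],  # LM call: revised response yi
    top_n: int = 10,
    max_iterations: int = 3,
) -> str:
    # Retrieve candidate papers, then rerank and keep the top N documents.
    documents = rerank(x, retrieve(x))[:top_n]
    # (1) Generate an initial response grounded in the retrieved documents.
    y = generate(x, documents)
    # (2)-(3) Iteratively generate self-feedback and incorporate it,
    # up to a pre-defined number of iterations.
    for _ in range(max_iterations):
        f = feedback(x, documents, y)
        if not f:  # stop early once no further feedback is produced
            break
        y = refine(x, documents, y, f)
    return y
```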
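
The training-data generation step can be sketched in the same spirit: the inference-time pipeline is run over a corpus of queries and the outputs are filtered before mixing. The helper names below (infer, keep) are hypothetical placeholders, and the mixing step is omitted.

```python
from typing import Callable, Dict, List

def build_training_data(
    queries: List[str],
    infer: Callable[[str], str],       # e.g. a pipeline like openscholar_infer above
    keep: Callable[[str, str], bool],  # hypothetical quality filter
) -> List[Dict[str, str]]:
    # Run the inference-time pipeline over each query to synthesize
    # (input, output) training pairs, then drop low-quality pairs.
    pairs = [{"input": x, "output": infer(x)} for x in queries]
    return [p for p in pairs if keep(p["input"], p["output"])]
```

The filtered synthetic pairs would then be mixed with other training data before fine-tuning the 8B generator.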