Fig. 2: Model accuracy versus number of “N/A” outputs, with bubble size indicating model size.

This figure plots model accuracy against the frequency of “N/A” outputs. A higher frequency of “N/A” outputs indicates a less useful model. The size of each bubble represents the model's number of parameters. The plot highlights that OncoLLM performs close to GPT-4 despite having fewer parameters.