Fig. 7: Concluding failure rates of DeepSeek-R1 with respect to four diagnostic metrics. | npj Digital Medicine