Assessing the degree to which medical large language models reliably convey existing, trustworthy knowledge is crucial. This study introduces SourceCheckup, an automated framework whose evaluation reveals that large language models frequently cite medical references that do not fully support, or even contradict, their responses, highlighting significant gaps in their reliability for clinical use.