Fig. 1: Study design.

Sampling strategy, preprocessing, and randomly ordered assessment by blinded cardiologists. A total of 75 questions were randomly selected from the original pool of 300. Each chatbot answered all 75 prompts, with each prompt posed once on the respective interface during its query session. Evaluation was blinded and randomly ordered: the three chatbots' responses were shuffled within the question set and assigned to three rounds in a 1:1:1 ratio for blinded assessment by three cardiologists, with a 48-hour wash-out interval between rounds to mitigate recency bias.
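
A minimal sketch of this sampling and round-assignment procedure is shown below. It assumes hypothetical question and chatbot identifiers, a fixed random seed, and a per-question 1:1:1 split of the three responses across rounds; none of these implementation details are specified in the figure.

```python
import random

# Assumed seed and identifiers for illustration only; the actual
# randomization tooling is not described in the study design.
random.seed(0)

question_pool = [f"Q{i:03d}" for i in range(1, 301)]  # original pool of 300 questions
chatbots = ["chatbot_A", "chatbot_B", "chatbot_C"]    # hypothetical labels

# Randomly select 75 questions from the pool of 300.
selected = random.sample(question_pool, 75)

# For each selected question, shuffle the three chatbot responses and assign
# one per round (1:1:1), so each blinded round contains one response per
# question; a 48-hour wash-out separates rounds in the actual protocol.
rounds = {1: [], 2: [], 3: []}
for q in selected:
    shuffled = random.sample(chatbots, len(chatbots))  # blinded, shuffled order
    for round_no, bot in enumerate(shuffled, start=1):
        rounds[round_no].append((q, bot))

for round_no, batch in rounds.items():
    print(f"round {round_no}: {len(batch)} responses")  # 75 responses per round
```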