Fig. 1: Feature distribution in 500 MIMIC present medical histories.
From: Privacy-preserving large language models for structured medical information retrieval

a The bar chart visualizes data from 500 present medical history reports extracted from the MIMIC-IV database. It displays the counts for five extracted features, with “true” counts in red and “false” counts in blue. b The sunburst plot shows, for each feature, the number of reports in which the feature’s term is explicitly mentioned, as a share of the true and false counts. Liver cirrhosis and ascites are the features with the highest share of explicit mentions, and every such mention corresponds to a “true” classification in the ground truth evaluation. Abdominal pain and shortness of breath were the features most frequently mentioned across all reports. “Explicit features” are consistently described with identical terminology (e.g., ascites, cirrhosis), whereas “implicit features” vary in description (e.g., shortness of breath: “SOB,” “difficulties in breathing,” “dyspnea”).