Fig. 2: Exploratory data analysis.

a Percentage of discharge summary records containing radiology-related features in each of the three cohorts. b Number of principal components (PCs) needed to reach each PCA total variance cutoff for the 2027 YNHH and MGH features in three cases: non-discretized features with all continuous features standardized, discretized features with the age feature standardized, and discretized features with no standardization. c Scatter plots of PC1 versus PC2 for the three cases in b, colored by class and by cohort. d Top features present in >50% of non-cryptogenic stroke records for each TOAST class, with their significance by chi-squared tests.
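
To illustrate the quantity shown in panel b, the following minimal Python sketch counts how many PCs are needed to reach a given cumulative explained-variance cutoff. It is not the authors' code: the function name `n_components_for_cutoffs`, the cutoff values, and the toy matrix are assumptions made for illustration, and any standardization or discretization is assumed to have been applied to the feature matrix beforehand.

```python
import numpy as np


def n_components_for_cutoffs(X, cutoffs=(0.80, 0.90, 0.95, 0.99)):
    """Map each cutoff to the smallest number of PCs whose cumulative
    explained-variance ratio is at least that cutoff.

    X is a (records x features) matrix; hypothetical helper for illustration.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # PCA operates on centered data
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2                             # proportional to per-PC variance
    total = var.sum()
    if total == 0:
        raise ValueError("X has zero total variance")
    cum = np.cumsum(var) / total             # cumulative explained-variance ratio
    cum[-1] = 1.0                            # guard against rounding just below 1
    # searchsorted(side="left") gives the first index with cum >= cutoff
    return {c: int(np.searchsorted(cum, c, side="left")) + 1 for c in cutoffs}


# Toy example (assumed data): three uncorrelated features whose
# variances are in the ratio 18 : 8 : 2.
X_toy = np.array([
    [3, 0, 0], [-3, 0, 0],
    [0, 2, 0], [0, -2, 0],
    [0, 0, 1], [0, 0, -1],
])
print(n_components_for_cutoffs(X_toy, cutoffs=(0.5, 0.9, 0.95)))
# → {0.5: 1, 0.9: 2, 0.95: 3}
```

In the toy example the cumulative ratios are 18/28, 26/28, and 28/28, so one, two, and three PCs are needed for the 50%, 90%, and 95% cutoffs respectively.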