Fig. 2: Gene-SGAN identifies the ground truth in semi-synthetic experiments.

To construct distinct ground-truth subtypes, we imposed synthesized imaging patterns, specifically volumetric changes in brain regions of interest (ROIs), on HC imaging features, simulating disease effects modulated by fully synthetic SNP variations (Method 6). With the synthetic ground truth known, we tested Gene-SGAN’s clustering performance in several experimental scenarios: (a) generalizability, (b) robustness to missing genotypes, (c) comparison to previous methods, and (d, e) model interpretability. In (a), (b), and (c), each box plot was generated from 50 data points, which are the clustering accuracies from 50 iterations of hold-out cross-validation or model runs. a Gene-SGAN shows robust generalizability to test data. Across different hyperparameter (gene-lr) settings, Gene-SGAN consistently achieves comparable clustering accuracies on the training and test sets. With more confounders in the imaging features (bottom vs. top), a higher gene-lr is needed for the model to reach its optimal performance. b Gene-SGAN is robust to different levels of missing SNPs. Clustering accuracies remain high but gradually decrease as the SNP missing rate increases. c Gene-SGAN outperforms other clustering methods. We report the clustering performance of the seven models (Gene-SGAN vs. others) under different levels of simulated imaging confounders and different dimensions of genetic data. SGAN: Smile-GAN; CCA: Canonical Correlation Analysis; MSC: Multiview-Spectral-Clustering; MKmeans: Multiview-KMeans. d Gene-SGAN accurately recovers the SNPs’ minor allele frequency (MAF) within each simulated subtype. We present the simulated subtype-associated SNPs (marked with an asterisk) and the simulated confounding SNPs (not marked; Method 6). e Gene-SGAN captures the dominant characteristics of the ground-truth imaging patterns (those associated with subtypes) while avoiding the confounding ones.
The ROIs in the ground-truth imaging patterns are colored with a ratio of 0.15, the average ratio of the simulated changes (which range from 0 to 0.3). The ROIs in the confounding patterns are left blank. Imaging patterns characterized by the model are defined as the ratios of ROI changes made by the transformation function f, which approximates the disease effects (Method 6). To visualize the important ROIs captured by f, we color only ratios > 0.05. MTL: medial temporal lobe. (Box plots: centerline, median; red marker, mean; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; points, outliers.)
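The thresholding used for panel e can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy volumes, and the definition of the change ratio as the subject-averaged fractional decrease (x - f(x)) / x are all assumptions made for illustration. Only the 0.05 display threshold comes from the caption.

```python
import numpy as np

def roi_change_ratios(x, fx):
    """Per-ROI change ratio, averaged over subjects.

    ASSUMPTION: the ratio is the fractional volume decrease (x - f(x)) / x,
    where x holds input ROI volumes and fx the transformed volumes f(x).
    Both arrays have shape (n_subjects, n_rois).
    """
    x = np.asarray(x, dtype=float)
    fx = np.asarray(fx, dtype=float)
    return ((x - fx) / x).mean(axis=0)

def rois_to_color(ratios, threshold=0.05):
    """Indices of ROIs whose change ratio exceeds the display threshold."""
    return np.flatnonzero(np.asarray(ratios) > threshold)

# Toy data (hypothetical): two subjects, three ROIs.
x = np.array([[100.0, 200.0, 50.0],
              [120.0, 180.0, 60.0]])
# Hypothetical transformation output: 15% loss, 2% loss, and no change.
fx = x * np.array([0.85, 0.98, 1.0])

ratios = roi_change_ratios(x, fx)
colored = rois_to_color(ratios)
print(ratios, colored)
```

In this toy case only the first ROI would be colored, since the 2% change in the second ROI falls below the 0.05 threshold.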