Fig. 1: AnimalGAN overview and study design. | Nature Communications

From: A generative adversarial network model alternative to animal studies for clinical pathology assessment

a AnimalGAN model development. AnimalGAN was developed using 80% of the TG-GATEs data (the training set), which consists of 6442 rats exposed to 110 compounds at four time points (i.e., 3/7/14/28 days) and three dose levels (i.e., low/medium/high). The chemical representation (i.e., 1826 Mordred descriptors), time point, dose level, and Gaussian noise were input to the Generator (G) to yield 38 synthetic clinical pathology measurements, which were compared to the real data by the Discriminator (D). The average of 100 generated sets of clinical pathology measurements that passed the blood cell count check was used to represent the clinical pathology measurements. Once the Discriminator (D) could no longer distinguish the synthetic data from the real data, the AnimalGAN model was established. b AnimalGAN model evaluation. The AnimalGAN model was used to generate the 38 clinical pathology measurements for 20% of the TG-GATEs dataset (the test set), which consists of 332 treatment conditions covering 28 different compounds at four time points (i.e., 3/7/14/28 days) and three dose levels (i.e., low/medium/high). For each treatment condition, we averaged 100 generated sets of clinical pathology measurements that met the blood cell count criterion to represent the AnimalGAN output and compared this average to the corresponding real measurements. Boxplots of c root mean square error (RMSE) and d cosine similarity between AnimalGAN-generated synthetic data and real animal testing data for treatment conditions in the test set. The statistical difference between the RMSEs/cosine similarities of AnimalGAN-generated synthetic data and real animal testing data for the n = 332 treatment conditions in the test set and the RMSEs/cosine similarities of real data across any two treatment conditions (n = 1,358,776, derived from 1649 × 1648/2) was determined using a two-tailed Wilcoxon rank-sum test without adjustment for multiple comparisons. The boxplots display the distribution of RMSEs/cosine similarities, with the centerline representing the median, the bounds of the box representing the first and third quartiles, and the whiskers extending to 1.5 times the interquartile range (IQR). e t-SNE visualization of generated data and real data for treatment conditions in the test set. Each point depicts one treatment condition. Source data are provided as a Source Data file.
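
The two agreement metrics in panels c and d, and the pair count quoted for the real-versus-real comparison, can be illustrated with a minimal Python sketch. It assumes each treatment condition is represented as a plain vector of measurements. The function names and the toy values are hypothetical and are not taken from the AnimalGAN code or from TG-GATEs, and the caption does not state whether or how measurements were scaled before the metrics were computed.

```python
import math
from typing import Sequence


def rmse(synthetic: Sequence[float], real: Sequence[float]) -> float:
    """Root mean square error between two equal-length measurement vectors."""
    if len(synthetic) != len(real) or len(real) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    squared = [(s - r) ** 2 for s, r in zip(synthetic, real)]
    return math.sqrt(sum(squared) / len(squared))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length measurement vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def n_unordered_pairs(n: int) -> int:
    """Number of distinct unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Toy vectors (hypothetical values, for illustration only).
real_profile = [1.0, 2.0, 3.0]
synthetic_profile = [1.0, 2.0, 5.0]

print(rmse(synthetic_profile, real_profile))
print(cosine_similarity(synthetic_profile, real_profile))

# Pair count quoted in the caption: 1649 x 1648 / 2.
print(n_unordered_pairs(1649))
```

Given two lists of such per-condition scores (synthetic versus real, and real versus real), a two-sided rank-sum comparison could then be run with, for example, `scipy.stats.ranksums`; this is one possible tool, and the caption does not say which software was used.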
