Figure 3
From: Automated classification of estrous stage in rodents using deep learning

EstrousNet performs similarly across groups. (A) Area under the receiver operating characteristic (auROC) curves for each estrous stage. True positives for each stage are determined by benchmark classifications. (B) Specificity (SP) vs. sensitivity (SN) curves for EstrousNet, with the probability cutoff of 0.26 defined as the intersection of the two curves (dotted grey line). Standard error is shown in orange for sensitivity and in blue for specificity. (C) Out-of-sample (OOS) testing across five different stains: hematoxylin and eosin (HE), Shorr stain (SH), Giemsa stain (GE), cresyl violet (CreV), and crystal violet (CryV). Test accuracy is represented as a distribution across 1000 testing iterations, with mean ± SE shown. Accuracy differences between stains are not significant [F(4,198) = 3.14, p = 0.10, one-way ANOVA]. (D) OOS testing between mouse and rat species. Test accuracy is represented as a distribution across 1000 testing iterations, with mean ± SE shown. Accuracy differences between species are not significant [F(1,198) = 7.87, p = 0.73, one-way ANOVA]. (E) OOS classification for each stage of the estrous cycle across n = 36 animals, represented as a heatmap. Benchmark classification was used as a proxy for ground truth. K-fold cross-validation was used to estimate accuracy across stages, with k = 6 groups of n = 6 subjects each. Testing accuracy was averaged across folds to reduce bias in the estimate across all groups. Accuracy differences between subjects are not significant [F(5,198) = 6.98, p = 0.60, one-way ANOVA].
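
As an illustration only, the two procedures named in panels (B) and (E) can be sketched in a few lines of Python. This is not the authors' implementation: the function names (`sensitivity_specificity`, `crossover_cutoff`, `subject_folds`), the one-vs-rest 0/1 label encoding, and the evenly spaced cutoff grid are all assumptions made for the example. The first two functions locate the probability cutoff at which the sensitivity and specificity curves cross; the third splits subjects into k groups so that no subject contributes to both the training and the testing side of a fold.

```python
from typing import Hashable, List, Sequence, Tuple


def sensitivity_specificity(
    probs: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at one cutoff.

    Assumes one-vs-rest labels: 1 for the stage of interest, 0 otherwise.
    A sample is called positive when its probability is >= cutoff.
    """
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= cutoff)
    fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < cutoff)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < cutoff)
    fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= cutoff)
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    return sn, sp


def crossover_cutoff(
    probs: Sequence[float], labels: Sequence[int], n_steps: int = 100
) -> float:
    """Scan an evenly spaced grid on [0, 1] (an assumption of this sketch)
    and return the first cutoff where |sensitivity - specificity| is smallest."""
    best_cutoff, best_gap = 0.0, float("inf")
    for i in range(n_steps + 1):
        cutoff = i / n_steps
        sn, sp = sensitivity_specificity(probs, labels, cutoff)
        gap = abs(sn - sp)
        if gap < best_gap:
            best_cutoff, best_gap = cutoff, gap
    return best_cutoff


def subject_folds(subject_ids: Sequence[Hashable], k: int) -> List[list]:
    """Partition the unique subjects into k folds so that every sample from
    a given subject lands in exactly one fold."""
    unique = sorted(set(subject_ids))
    return [unique[i::k] for i in range(k)]


if __name__ == "__main__":
    # Toy data, invented for illustration; not taken from the paper.
    toy_probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
    print("crossover cutoff:", crossover_cutoff(toy_probs, toy_labels))

    # 36 hypothetical subject IDs split into 6 folds of 6 subjects each.
    for fold in subject_folds(list(range(36)), k=6):
        print(fold)
```

In a cross-validation loop built on this sketch, each fold would serve once as the held-out test set and the per-fold accuracies would then be averaged, mirroring the averaging described in panel (E).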