Extended Data Fig. 5: Evolution of F1-score with respect to training iterations and training set size on real cryo-ET Dataset #2, Chlamydomonas reinhardtii (3 classes). | Nature Methods

From: Deep learning improves macromolecule identification in 3D cellular cryo-electron tomograms

a, The loss, which quantifies segmentation quality, is computed for both the training set and the validation set. Comparing the two curves allows assessment of the generalization capability of DeepFinder: ideally they overlap, and a divergence indicates overfitting (the network memorizes the training samples instead of learning discriminative features). One epoch equals 100 training iterations. b, The F1-score, which quantifies localization performance, computed on the test set by comparing the membrane-bound ribosomes found by DeepFinder to expert annotations. The time axis was obtained with a Tesla K80 GPU. The curve indicates that competitive particle-picking results are obtained after 20 epochs, or 4.3 h on this GPU. This analysis used 21 tomograms for training, one tomogram for validation and eight tomograms for testing. c, As in Extended Data Fig. 4, this curve provides an estimate of the quantity of training data required to achieve a competitive result. That quantity is about 1,400 ribosomes (nine tomograms), a typical size for a cryo-ET dataset. At first glance, this estimate seems to contradict those in Extended Data Fig. 4: the numbers do not coincide (the curve labeled ‘Large’ estimates that quantity at 208 particles). Note, however, that SHREC’19 is a synthetic dataset composed of 12 classes, whereas here we are dealing with a real cellular dataset consisting of three classes (membrane, mb-ribo and ct-ribo). A larger number of classes appears to enable the use of smaller training sets. On the other hand, real data are more difficult, notably because of ‘label noise’ (errors from the annotation pipeline) and other sources of signal corruption such as the missing wedge, the contrast transfer function and the low signal-to-noise ratio (partly caused by the increased molecular crowding inside cells). This analysis used one tomogram for validation and eight tomograms for testing.
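The F1-score in panel b is obtained by matching the particles picked by DeepFinder to expert annotations. A minimal sketch of such a tolerance-based localization metric is shown below; the function name, the greedy one-to-one matching strategy and the tolerance radius are illustrative assumptions, not the authors' exact evaluation protocol.

```python
import numpy as np

def f1_localization(pred, gt, tol=10.0):
    """Match predicted particle positions to annotated positions within a
    tolerance radius (in voxels) and return precision, recall and F1.

    Greedy one-to-one matching: each annotation can be claimed by at most
    one prediction; a prediction farther than `tol` from every remaining
    annotation counts as a false positive.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    unmatched = list(range(len(gt)))  # indices of not-yet-claimed annotations
    tp = 0
    for p in pred:
        if not unmatched:
            break
        d = np.linalg.norm(gt[unmatched] - p, axis=1)
        j = int(np.argmin(d))
        if d[j] <= tol:
            tp += 1
            unmatched.pop(j)  # annotation claimed, cannot be matched again
    fp = len(pred) - tp  # picks with no nearby annotation
    fn = len(gt) - tp    # annotations missed by the picker
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, with three annotations and three picks of which two fall within the tolerance, the function returns precision = recall = F1 = 2/3.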
