Fig. 3: Model pipeline: Glimpses are submitted as a batch to a convolutional neural network (purple shaded area).

Intermediate outputs (red boxes) are input to an attention sub-network. Feature maps ($f_1$–$f_n$) are weighted by their attention values ($a_1$–$a_n$) and aggregated by weighted averaging (oval). The representation learning sub-network estimates the gestational age (GA) from the aggregated feature map $f$. The mean squared error $(\widehat{\mathrm{GA}} - \mathrm{GA})^2$ over a batch of 64 glimpses is used for backpropagation. The whole pipeline is trained end to end.
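
The aggregation and loss steps in the caption can be illustrated with a minimal sketch. It is not the authors' implementation: the softmax normalization of the attention scores, the linear `predict_ga` head standing in for the representation learning sub-network, and all function names are assumptions made for illustration, and feature maps are flattened to plain vectors.

```python
import math


def softmax(scores):
    """Turn raw attention scores into positive weights that sum to 1.

    Assumption: the caption does not say how attention is normalized.
    """
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(features, attention):
    """Weighted average of per-glimpse feature vectors f_1..f_n using a_1..a_n."""
    norm = sum(attention)
    dim = len(features[0])
    return [
        sum(a * f[d] for a, f in zip(attention, features)) / norm
        for d in range(dim)
    ]


def predict_ga(f, weights, bias):
    """Hypothetical linear head mapping the aggregated feature f to a GA estimate."""
    return sum(w * x for w, x in zip(weights, f)) + bias


def squared_error(ga_pred, ga_true):
    """Squared error between the estimated and the reference gestational age."""
    return (ga_pred - ga_true) ** 2


# Toy example: three glimpses with 2-dimensional features (made-up numbers).
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
attention = softmax([0.1, 2.0, -1.0])
f = aggregate(features, attention)
loss = squared_error(predict_ga(f, weights=[0.5, 0.5], bias=20.0), ga_true=24.0)
```

In an actual network these operations would run on tensors, with gradients of the loss flowing back through both the attention weights and the feature extractor. That is what allows the pipeline to be trained end to end.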