
Extended Data Fig. 2: Temperature scaling during the marginalization procedure.

From: Cell tracking with accurate error prediction


a) Visual illustration of the temperature-scaling procedure. The prediction on a single link (encoded in its weight) is not independent of the predictions on nearby links. To account for this shared information, we divide the link energy by a 'calibration temperature' (Eq. 1). The marginalized link probability, P(A|B), is then computed from the scaled energies (Eq. 2). The calibration temperature is found by minimizing the cross-entropy (CE) loss between the marginalized predictions and the ground truth (y) with respect to this temperature (Eq. 3; a code sketch of this step follows the legend). This scaling temperature has to be calibrated only once after training a set of division and link neural networks. The calibration can be done on the same data that was used for neural network training. For data far outside the training distribution, calibration can be repeated on a small, manually corrected set of links.

b) Illustration of the different subsets used in the subsequent plots. Red symbolizes the naïve approach, where the subset is simply the link of interest. In this case, no marginalization is performed. The green subset considers only links at a distance of one step from the target node of the link of interest. The blue subset extends to a distance of three steps; this is the largest set that is computationally feasible (~1 h of computation time for 300 frames).

c) Prediction performance, measured by the binary cross-entropy loss, versus subgraph size; each dot is an individual organoid. The loss at the optimal temperature (the lowest point in panel d) is shown. Increasing the subgraph size improves prediction, but there is no further improvement beyond three steps; see the Methods section 'Marginalization' for further discussion.

d) Cross-entropy loss between predictions and ground truth (based on all 9 organoids in the training dataset) for different neighborhoods and temperatures. The minimum loss (dotted line) is achieved at higher temperatures as the subset grows larger. Each line represents an individual organoid.

e) Optimal calibration temperature for every neighborhood. Error bars represent the 95% confidence interval (a lower bound, as not all observed links are truly independent). Restricting the calibration to the validation dataset that was left out during training gives the same results.

f) Optimal temperatures and confidence intervals (again lower bounds) per organoid. The estimates overlap substantially between organoids, validating the use of a single calibration temperature for all.

g) Predicted versus actual log-likelihoods after marginalization over different subsets. When marginalizing over larger subsets, predictions become overconfident: if the predictions suggest that links are incorrect, a larger-than-expected fraction is actually correct, and vice versa (a reliability check of this kind is sketched below).

h) After scaling the energies with the proper temperature for each subset, the link predictions are well calibrated.
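The calibration described in panel a (Eqs. 1-3) amounts to a one-dimensional optimization over the temperature. Below is a minimal sketch in Python on synthetic stand-in data; the sigmoid in `link_probability` stands in for the full marginalization over the subgraph, and all names (`energies`, `y`, `link_probability`) are hypothetical rather than taken from the paper's code.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Synthetic stand-in data (hypothetical): `energies` are raw link
# energies from the networks; `y` are ground-truth labels (1 = link correct).
rng = np.random.default_rng(0)
energies = rng.normal(loc=-1.0, scale=2.0, size=1000)
y = (rng.random(1000) < 1.0 / (1.0 + np.exp(energies))).astype(float)


def link_probability(energies, temperature):
    """Eqs. 1-2, schematically: divide the energies by the calibration
    temperature, then map to a probability. A sigmoid stands in here
    for the full marginalization over the subgraph."""
    return 1.0 / (1.0 + np.exp(energies / temperature))


def cross_entropy(temperature):
    """Eq. 3: binary cross-entropy between the scaled predictions and
    the ground truth, as a function of the temperature alone."""
    p = np.clip(link_probability(energies, temperature), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


# One-dimensional minimization over the temperature, bounded away from zero.
result = minimize_scalar(cross_entropy, bounds=(0.1, 10.0), method="bounded")
print(f"calibration temperature T = {result.x:.2f}")
```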
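The overconfidence diagnosed in panel g, and its removal in panel h, can be probed with a simple reliability check: bin the links by predicted probability and compare against the observed fraction of correct links. This is a related diagnostic, not the paper's exact log-likelihood plot; the sketch below (hypothetical function name) reuses the variables from the snippet above.

```python
def reliability_bins(p_pred, y_true, n_bins=10):
    """Compare predicted probabilities with observed frequencies.
    Well-calibrated predictions lie on the diagonal (panel h);
    overconfident ones deviate from it (panel g)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (p_pred >= lo) & (p_pred < hi)
        if mask.any():
            print(f"predicted {lo:.1f}-{hi:.1f}: "
                  f"observed {y_true[mask].mean():.2f} (n={mask.sum()})")


# Usage with the temperature-scaled predictions from the sketch above:
reliability_bins(link_probability(energies, result.x), y)
```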
