Extended Data Fig. 1: Network architecture.

Our model adopts the 3D U-Net²⁰ architecture, which is composed of a 3D encoder module, a 3D decoder module, and three skip connections from the encoder module to the decoder module. The encoder module contains three encoder blocks. Each consists of two 3 × 3 × 3 convolutional layers followed by a leaky rectified linear unit (LeakyReLU), a group normalization layer, and a 2 × 2 × 2 max-pooling layer with a stride of 2 in all three dimensions. The decoder module contains three decoder blocks. Each consists of two 3 × 3 × 3 convolutional layers followed by a LeakyReLU, a group normalization layer, and a 3D nearest-neighbor interpolation for upsampling. The skip connections pass feature maps from the encoder module to the decoder module so that low-level and high-level features are integrated. Feature maps of the encoder and decoder modules are shown in different colors. All operations are 3D, so every feature map is a 4D tensor. For simplicity, the figure depicts feature maps as 3D (c, t, x) tensors, with one dimension omitted.
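To make the shape bookkeeping concrete, the pure-Python sketch below traces tensor shapes through three encoder blocks and three decoder blocks. It is not the authors' implementation. The following are assumptions made only for illustration:

- 'same' padding on the 3 × 3 × 3 convolutions.
- The hypothetical channel widths [32, 64, 128].
- The axis naming (c, t, x, y).
- Skip features being tapped just before each max-pooling layer.
- A decoder order of upsample, concatenate, then convolve. The caption lists the operations without fixing their order.

```python
"""Shape trace for a three-level 3D U-Net (illustrative sketch only).

Assumptions not stated in the caption: 'same'-padded 3x3x3 convolutions,
hypothetical channel widths, skip features taken just before each max pool,
and decoder blocks that upsample, concatenate the matching skip, then convolve.
"""
from typing import List, Tuple

# Assumed axis naming (channels, t, x, y); the figure shows only (c, t, x).
Shape = Tuple[int, int, int, int]


def conv_block(shape: Shape, out_ch: int) -> Shape:
    # Two 3x3x3 convolutions with padding 1 keep the spatial size;
    # LeakyReLU and GroupNorm do not change the tensor shape.
    _, t, x, y = shape
    return (out_ch, t, x, y)


def max_pool(shape: Shape) -> Shape:
    # 2x2x2 max pooling with stride 2 halves every spatial dimension.
    c, t, x, y = shape
    return (c, t // 2, x // 2, y // 2)


def upsample(shape: Shape) -> Shape:
    # Nearest-neighbor interpolation with scale factor 2 doubles each spatial dimension.
    c, t, x, y = shape
    return (c, t * 2, x * 2, y * 2)


def concat(a: Shape, b: Shape) -> Shape:
    # Skip connection: channel-wise concatenation needs matching spatial sizes.
    assert a[1:] == b[1:], f"spatial mismatch {a} vs {b}"
    return (a[0] + b[0],) + a[1:]


def trace_unet(input_shape: Shape, widths: List[int]) -> List[Tuple[str, Shape]]:
    log: List[Tuple[str, Shape]] = []
    skips: List[Shape] = []
    s = input_shape
    for i, ch in enumerate(widths):             # three encoder blocks
        s = conv_block(s, ch)
        skips.append(s)                         # feature map sent through the skip connection
        s = max_pool(s)
        log.append((f"enc{i + 1}", s))
    for i, ch in enumerate(reversed(widths)):   # three decoder blocks
        s = upsample(s)
        s = concat(s, skips.pop())              # deepest skip is consumed first
        s = conv_block(s, ch)
        log.append((f"dec{i + 1}", s))
    return log


if __name__ == "__main__":
    for name, shape in trace_unet((1, 16, 64, 64), [32, 64, 128]):
        print(name, shape)
```

With a single-channel (1, 16, 64, 64) input, the encoder reduces the spatial size to (2, 8, 8) after the third pooling layer. The decoder then restores it to (16, 64, 64) with 32 channels. Under these assumptions, each spatial dimension of the input must be divisible by 2³ = 8 so that the upsampled maps line up with their skip connections.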