Fig. 1: Architecture of the UI-Trans network during the training phase in the LSFM setup.

a The UI-Trans network comprises a dual-branch encoder and a decoder. The encoder consists of a convolutional encoder and a transformer encoder running in parallel, which repeatedly perform feature extraction and down-sampling. The decoder consists of alternating convolutional and up-sampling layers. b Low light-dosage LSFM images (acquired with full-frame exposure) are used as the input, and high light-dosage confocal LS-LSFM images (acquired by synchronizing the light scanning with the rolling-shutter exposure) are used as the ground truth. c Optical geometry of the LSFM system. Cat – concatenation.
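The data flow described in panel a can be illustrated with a minimal shape-bookkeeping sketch. It is not the authors' implementation: the number of stages, the channel widths, the factor-of-two down- and up-sampling, and the placement of the concatenation (Cat) at the bottleneck are all assumptions made for illustration, and the names `Shape`, `down_sample`, `up_sample`, `concat`, and `trace_shapes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """Feature-map shape as (channels, height, width)."""
    channels: int
    height: int
    width: int


def down_sample(s: Shape, out_channels: int) -> Shape:
    # Assumption: each encoder stage halves the spatial size.
    return Shape(out_channels, s.height // 2, s.width // 2)


def up_sample(s: Shape, out_channels: int) -> Shape:
    # Assumption: each decoder stage doubles the spatial size.
    return Shape(out_channels, s.height * 2, s.width * 2)


def concat(a: Shape, b: Shape) -> Shape:
    # Channel-wise concatenation ("Cat"); spatial sizes must agree.
    if (a.height, a.width) != (b.height, b.width):
        raise ValueError("spatial sizes must match for concatenation")
    return Shape(a.channels + b.channels, a.height, a.width)


def trace_shapes(inp: Shape, widths=(32, 64, 128)) -> list:
    """Trace feature-map shapes through the assumed dual-branch network."""
    trace = [("input", inp)]
    conv = trans = inp
    # Parallel branches: both are assumed to emit matching shapes per stage.
    for i, w in enumerate(widths, 1):
        conv = down_sample(conv, w)    # convolutional branch
        trans = down_sample(trans, w)  # transformer branch
        trace.append((f"encoder stage {i} (each branch)", conv))
    # Assumption: the two branches are merged once, at the bottleneck.
    x = concat(conv, trans)
    trace.append(("cat", x))
    # Decoder: alternating convolution and up-sampling.
    for i, w in enumerate(reversed(widths), 1):
        x = up_sample(x, w)
        trace.append((f"decoder stage {i}", x))
    # Assumed final convolution restoring the input channel count.
    x = Shape(inp.channels, x.height, x.width)
    trace.append(("output", x))
    return trace


if __name__ == "__main__":
    for name, shape in trace_shapes(Shape(1, 256, 256)):
        print(f"{name:32s} {shape}")
```

Under these assumptions the output has the same shape as the input, which is what an image-to-image mapping from low light-dosage LSFM images to LS-LSFM ground truth (panel b) requires.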