Introduction

Recent progress in microscopy demands higher imaging speeds to accommodate dynamic biological studies1, including cell biology2, 3D genome organization3, developmental biology4, single-cell optical imaging and microscopy5, mass spectrometry imaging6, and high-throughput experiments7. At the same time, resolution enhancements often increase noise because fewer photons are available per pixel, especially under low illumination. These challenges call for methods that enable efficient data acquisition without compromising image quality.

Confocal laser scanning microscopy (CLSM) offers high-resolution imaging with optical sectioning and detailed 3-D reconstruction, with applications spanning porous media8, corneal surfaces9, human cementocytes10, orthopedic research11, breast cancer12, cell biology13, and the optical microscopy of various materials14.

However, under practical conditions, CLSM images are often degraded by diffraction-limited resolution, which is fundamentally dictated by the system’s point spread function (PSF)15,16. The PSF imposes intrinsic limits on spatial resolution and introduces blurring artifacts17,18, and its precise characterization remains essential for understanding imaging performance across modalities19. Additionally, dark current and speckle noise degrade CLSM image quality further, to the point that maintaining clarity under low-light and/or power-sensitive conditions becomes challenging. Under-sampling or non-uniform illumination worsens these issues further.

Traditional approaches to improving CLSM image quality rely on post-acquisition techniques such as deconvolution and denoising, which often treat the symptoms of degradation without incorporating the underlying physical principles of image formation. Hardware-based solutions, including advanced detectors or adaptive optics, provide direct improvements but are often costly and not universally accessible.

Deep learning has demonstrated remarkable success in image restoration, with both general-purpose frameworks20,21 and application-specific designs, such as those tailored for endoscopic videos22 and tomographic imaging23. However, most existing methods are constructed as broadly applicable models, often without explicitly incorporating the underlying physics of the imaging process. Physics-guided computational methods offer an alternative by incorporating optical system constraints directly into imaging models. Unlike standard deep learning approaches, which primarily optimize visual fidelity, physics-guided models integrate physical degradation mechanisms such as PSF convolution, photon shot noise, motion blur, dark current, speckle noise, and undersampling into the training process. This ensures that image reconstruction remains consistent with the physical principles of CLSM.

In the current work, we developed a physics-guided autoencoder for CLSM image restoration that incorporates the imaging system’s point spread function (PSF), diffraction effects, noise mechanisms, and sampling constraints directly into its design. This study addresses the imaging challenges posed by noisy conditions in CLSM of various biosystems, where prolonged light exposure and/or high-intensity lasers can damage or, in some cases, destroy the samples. Preserving sample integrity therefore imposes imaging under low light and low laser power, which in turn leads to under-sampling, increased noise levels, and reduced image resolution. By integrating such physical constraints into an autoencoder model, we aim to provide a robust solution for restoring high-quality CLSM images while ensuring minimal impact on delicate biological samples. We simulated common CLSM imaging degradations, including photon shot noise, motion blur, speckle noise, and under-sampling, and used these to train a physics-constrained neural network that restores high-quality CLSM images from degraded inputs. Our method reduces the need for additional hardware or modifications to existing CLSM systems, focusing instead on software-based solutions to improve image quality. The proposed method is tested on varied synthetic CLSM datasets to demonstrate its capability to restore spatial resolution and reduce noise while maintaining consistency with the physical principles of CLSM imaging.

Materials and methods

Common noise types in confocal laser scanning microscopy

Confocal laser scanning microscopy (CLSM) imaging faces inherent limitations due to physical degradations caused by optical diffraction, noise sources, and undersampling. These degradations are amplified under conditions of low laser power, which is essential to prevent photodamage to delicate samples.

The imaging resolution in CLSM is fundamentally limited by diffraction, characterized by the point spread function (PSF)16, which defines the intensity distribution in the focal plane. For an ideal circular aperture, the PSF follows the Airy disk profile24,25:

$$PSF\left(r\right)=\left[\frac{2J_{1}\!\left(\pi D\,r/\lambda\right)}{\pi D\,r/\lambda+\varepsilon}\right]^{2}$$
(1)

where \(J_{1}\) is the first-order Bessel function of the first kind, D is the numerical aperture, r is the radial distance in the focal plane, \(\lambda\) is the laser wavelength, and \(\varepsilon\) is a small constant introduced to avoid division by zero.

The lateral resolution of CLSM is then24,25

$$d=\frac{0.61\,\lambda}{D}$$
(2)

Similarly, the axial resolution (which defines depth discrimination), with n the refractive index of the immersion medium, is given as

$$d_{z}=\frac{2\lambda n}{D^{2}}$$
(3)
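As a quick worked example of Eqs. (2) and (3), the following sketch evaluates both resolutions for representative values (λ = 500 nm and D = 1.4 as used later in Table 1; the oil-immersion refractive index n = 1.518 is an illustrative choice of ours, not taken from the text):

```python
wavelength_nm = 500.0                       # emission wavelength
na = 1.4                                    # numerical aperture D
n = 1.518                                   # refractive index (assumed oil immersion)

d_lateral = 0.61 * wavelength_nm / na       # Eq. (2): ~218 nm
d_axial = 2.0 * wavelength_nm * n / na**2   # Eq. (3): ~775 nm
print(f"lateral: {d_lateral:.0f} nm, axial: {d_axial:.0f} nm")
```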

Several other noise sources degrade CLSM images. Photon shot noise arises from the quantum nature of light, where the number of detected photons fluctuates statistically. This phenomenon limits the signal-to-noise ratio at low photon counts. For an optical signal \(I(x,y)\) measured in photon counts per pixel, the arrival of photons follows a Poisson process. At moderate to high photon flux, this Poisson distribution can be approximated by a Gaussian distribution with variance equal to the signal intensity. Light intensity affected by photon shot noise is modeled as

$$I_{shot}\left(x,y\right)=I\left(x,y\right)+R_{random}\sqrt{I\left(x,y\right)}$$
(4)

where \(I(x,y)\) is the true intensity at pixel \((x,y)\) and \(R_{random}\) is a Gaussian random variable with zero mean and unit variance.

Similarly, thermally activated carriers within the photodetector contribute a background signal even in the absence of illumination. This dark current is independent of the signal and can be modeled as an additive zero-mean Gaussian process. The intensity affected by dark-current noise, with \(R_{dark}\) a zero-mean, unit-variance Gaussian random variable and \(\sigma_{dark}\) the dark-current noise strength, is expressed as

$$I_{dark}\left(x,y\right)=I\left(x,y\right)+R_{dark}\,\sigma_{dark}$$
(5)

Speckle arises from coherent interference of backscattered laser light and manifests as multiplicative intensity fluctuations, modeled as a random modulation of the underlying intensity. For the same pixel, with \(R_{speckle}\) a zero-mean, unit-variance Gaussian random variable and \(\sigma_{speckle}\) the speckle noise strength:

$$I_{speckle}\left(x,y\right)=I\left(x,y\right)\left(1+R_{speckle}\,\sigma_{speckle}\right)$$
(6)

Mechanical instability, drift, or sample motion during the raster scan introduces spatially coherent blur. This is modeled as a deterministic convolution of the true image with a one-dimensional linear kernel \(K_{blur}\) of fixed length and direction:

$$I_{blur}\left(x,y\right)=K_{blur}*I\left(x,y\right)$$
(7)

where ‘*’ denotes 2D convolution. The kernel is chosen to emulate a 5-pixel linear blur at random angles.

Fluctuations in laser power may also introduce noise into CLSM images; they are represented by a global multiplicative scaling factor and expressed mathematically as

$$I_{fluctuation}\left(x,y\right)=I\left(x,y\right)\left(1+R_{fluctuation}\,\sigma_{fluctuation}\right)$$
(8)

The undersampling of a CLSM image can be simulated using a binary mask \(M(x,y)\) with pixel omission probability p, defined as

$$M\left(x,y\right)=\begin{cases}1 & \text{with probability } 1-p\\[2pt] 0 & \text{with probability } p\end{cases}$$
(9)

The affected intensity will then be

$$I_{undersampled}\left(x,y\right)=I\left(x,y\right)\cdot M\left(x,y\right)$$
(10)

To ensure reproducibility of our synthetic data generation, we detail below the specific degradation parameters used in the simulation pipeline, which were selected to reflect typical noise characteristics observed in experimental CLSM systems. The point spread function (PSF) was modeled using an Airy disk approximation with a numerical aperture (NA) of 1.4 and an emission wavelength λ = 500 nm. Photon shot noise was introduced via Poisson-distributed counts with a scaling factor of 0.1, while Gaussian-distributed additive noise with a standard deviation σ = 0.05 was used to simulate dark current. Speckle noise was applied as multiplicative Gaussian noise with σ = 0.05. Motion blur was introduced by convolving the image with a horizontal linear kernel of size 5 pixels. Additionally, multiplicative Gaussian noise with σ = 0.05 was used to emulate laser power fluctuations. Undersampling was simulated using a random binary mask that removed 20% of the pixels, corresponding to an undersampling ratio of 0.2. Finally, to simulate optical attenuation, an exponential decay term with an attenuation coefficient of 1.20 was applied to the convolved image. These parameter settings form the basis of our degradation model and were held constant throughout training. These details are also summarized in Table 1.

Table 1 Parameters used for synthetic degradation of CLSM training data.
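For concreteness, the following Python sketch reproduces the degradation pipeline of Eqs. (1) and (4)-(10) with the Table 1 parameters. It is a minimal illustration rather than our exact implementation: the pixel size, the PSF kernel size, and the choice to apply the exponential attenuation along the horizontal scan axis are our own assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve
from scipy.special import j1

def airy_psf(size=17, na=1.4, wavelength_um=0.5, pixel_um=0.05, eps=1e-9):
    """Airy-disk PSF of Eq. (1); NA = 1.4 and lambda = 500 nm as in Table 1."""
    ax = (np.arange(size) - size // 2) * pixel_um
    xx, yy = np.meshgrid(ax, ax)
    v = np.pi * na * np.sqrt(xx**2 + yy**2) / wavelength_um
    psf = (2.0 * j1(v) / (v + eps)) ** 2
    psf[size // 2, size // 2] = 1.0          # limit of [2*J1(v)/v]^2 as v -> 0
    return psf / psf.sum()

def degrade(img, rng, sigma=0.05, shot_scale=0.1, blur_len=5, p_drop=0.2, mu=1.2):
    """Sequentially apply Eqs. (4)-(10) to a clean image normalized to [0, 1]."""
    out = fftconvolve(img, airy_psf(), mode="same")              # PSF blur, Eq. (1)
    out *= np.exp(-mu * np.linspace(0.0, 1.0, out.shape[1]))     # attenuation (axis assumed)
    out = rng.poisson(np.clip(out, 0, None) / shot_scale) * shot_scale  # shot noise, Eq. (4)
    out = out + rng.normal(0.0, sigma, out.shape)                # dark current, Eq. (5)
    out = out * (1.0 + rng.normal(0.0, sigma, out.shape))        # speckle, Eq. (6)
    kernel = np.ones((1, blur_len)) / blur_len                   # horizontal motion blur, Eq. (7)
    out = fftconvolve(out, kernel, mode="same")
    out = out * (1.0 + rng.normal(0.0, sigma))                   # laser fluctuation, Eq. (8)
    mask = rng.uniform(size=out.shape) > p_drop                  # binary mask, Eqs. (9)-(10)
    return np.clip(out * mask, 0.0, 1.0)

rng = np.random.default_rng(0)
clean = rng.uniform(size=(256, 256))     # stand-in for a ground-truth image
noisy = degrade(clean, rng)
```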

Adaptive physics autoencoder model

To mitigate these degradations, we developed an autoencoder model whose training is constrained by the physics of confocal laser scanning microscopy and by all the noise types described in the previous section. An autoencoder is a type of neural network with an encoder-decoder structure. The encoder maps a noise-degraded image X into a compressed latent space Z

$$Z=f_{encoder}\left(X;\theta\right)$$
(11)

The latent-space representation \(Z\) encodes the image while filtering out noise. It follows the constraints imposed by the PSF defined earlier in Eq. (1), which governs the spatial resolution. The encoder consists of four convolutional layers, which gradually extract features while reducing the image size. Each layer applies a convolution operation followed by a LeakyReLU activation function, allowing the model to detect patterns while reducing noise. The detailed architecture of the encoder is shown in Table 2, while the decoder is described in Table 3.

The decoder then reconstructs a clean denoised image from Z.

$$\widehat{X}=f_{decoder}\left(Z;\varphi\right)$$
(12)
Table 2 Encoder architecture of the adaptive physics autoencoder, consisting of four convolutional layers.
Table 3 Decoder architecture of the adaptive physics autoencoder, using three transpose Convolution layers, followed by a final Convolutional layer with a sigmoid activation function.
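A minimal PyTorch sketch of this encoder-decoder structure is given below. The kernel sizes, strides, and channel widths are placeholders chosen to be consistent with the four-layer encoder, the three-transpose-convolution decoder with a final sigmoid layer, and the 64 × 64 × 256 latent shape of Fig. 1; the exact values are those listed in Tables 2 and 3.

```python
import torch
import torch.nn as nn

class AdaptivePhysicsAE(nn.Module):
    """Encoder-decoder sketch; channel widths are placeholders (see Tables 2-3)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                  # Eq. (11): Z = f_encoder(X; theta)
            nn.Conv2d(1, 32, 3, stride=1, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),    # 256 -> 128
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2),   # 128 -> 64
            nn.Conv2d(128, 256, 3, stride=1, padding=1), nn.LeakyReLU(0.2),  # 64 x 64 x 256 latent
        )
        self.decoder = nn.Sequential(                  # Eq. (12): X_hat = f_decoder(Z; phi)
            nn.ConvTranspose2d(256, 128, 3, stride=1, padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),  # 64 -> 128
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),   # 128 -> 256
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),   # intensities scaled to [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```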

Here \(\theta\) and \(\varphi\) are the encoder and decoder parameters, respectively. The training objective minimizes the Mean Squared Error (MSE) between reconstructed images \(\widehat{X}\) and ground-truth images \(X\). The primary loss term, the MSE, is defined for N training samples as

$$\mathcal{L}_{MSE}=\frac{1}{N}\sum_{i=1}^{N}\left\|\widehat{X}^{\left(i\right)}-X^{\left(i\right)}\right\|^{2}$$
(13)

To ensure the reconstructed image adheres to the photon conservation law in CLSM, a photon loss between reconstructed and ground truth images is introduced.

$$\mathcal{L}_{photon}=\left|\sum_{x,y}\widehat{X}\left(x,y\right)-\sum_{x,y}X\left(x,y\right)\right|$$
(14)

The morphological preservation of features in CLSM images was encouraged by an edge loss, defined as

$$\mathcal{L}_{edge}=\left|\nabla\widehat{X}-\nabla X\right|$$
(15)

The total loss function was defined as a weighted sum of the pixel-wise reconstruction loss and the physics-consistency penalties, with scalar hyperparameters \(\lambda_{1}\) and \(\lambda_{2}\) governing the relative contribution of each term. We used \(\lambda_{1}=\lambda_{2}=0.1\) to moderately favor physics-guided consistency during training.

$$\mathcal{L}_{total}=\mathcal{L}_{MSE}+\lambda_{1}\mathcal{L}_{photon}+\lambda_{2}\mathcal{L}_{edge}$$
(16)
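The composite objective of Eqs. (13)-(16) can be sketched as follows; the per-pixel normalization of the photon term and the finite-difference gradients used for the edge term are our assumptions about implementation detail.

```python
import torch
import torch.nn.functional as F

def physics_guided_loss(x_hat, x, lam1=0.1, lam2=0.1):
    """Total loss of Eq. (16): MSE + photon-conservation + edge terms."""
    mse = F.mse_loss(x_hat, x)                             # Eq. (13)
    photon = (x_hat.sum() - x.sum()).abs() / x.numel()     # Eq. (14), normalized per pixel (assumed)

    def grads(t):
        # Finite-difference image gradients as a proxy for the nabla operator in Eq. (15)
        return t[..., 1:, :] - t[..., :-1, :], t[..., :, 1:] - t[..., :, :-1]

    gy_hat, gx_hat = grads(x_hat)
    gy, gx = grads(x)
    edge = (gy_hat - gy).abs().mean() + (gx_hat - gx).abs().mean()
    return mse + lam1 * photon + lam2 * edge               # Eq. (16)
```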

To simulate real-world imaging conditions encountered in confocal laser scanning microscopy (CLSM), the training dataset was augmented with a series of physically motivated degradations. These included convolution with a point spread function (PSF) modeled as an Airy disk, followed by the sequential application of photon shot noise (Poisson-distributed), additive Gaussian noise representing dark current, and multiplicative Gaussian noise simulating speckle effects. Motion blur was introduced by convolving each image with a linear kernel of 5 pixels at random orientations. Additional multiplicative noise was applied to emulate laser power fluctuations, and random undersampling was implemented by removing 20% of pixels using a binary mask. These augmentation steps were applied consistently across the dataset to train the model under a diverse range of degradation scenarios.

The model was trained using the Adam optimizer with a fixed learning rate of 0.001 and a batch size of 4. All training was conducted on a standard CPU workstation to maximize reproducibility, requiring approximately 6–8 h for 300 epochs on a dataset of ~ 180 images (256 × 256 pixels). When executed on a single mid-range GPU (NVIDIA RTX 3050), total training time reduced to approximately 1 h. The model architecture is compact, comprising roughly 5 million trainable parameters, and achieves inference times under 0.1 s per image on GPU. These characteristics make the model suitable for real-time or near-real-time deployment in microscopy workflows, even in resource-constrained environments.

Fig. 1

Deep learning architecture for image restoration in confocal laser scanning microscopy (CLSM).

Figure 1 illustrates the Adaptive Physics Autoencoder for CLSM image restoration. On the left, noisy and degraded CLSM images are input to the encoder. The encoder extracts features using multiple convolutional layers with LeakyReLU activation. The image is compressed into a latent space (64 × 64 × 256), capturing its core structure. The decoder, composed of transpose convolution layers, progressively reconstructs the image while reducing noise and restoring fine details. The final convolutional layer with sigmoid activation ensures correct intensity scaling. The output on the right shows a restored CLSM image with improved clarity and structural preservation.

Unless otherwise stated, all image intensities are normalized to the range [0, 1] and are unitless. Wavelengths are in nanometers (nm), spatial resolution is expressed in micrometers (µm), and kernel sizes and image dimensions are given in pixels. Standard deviations (σ) for noise models are unitless and refer to normalized image scales. Peak Signal-to-Noise Ratio (PSNR) is reported in decibels (dB), and the Structural Similarity Index (SSIM) is dimensionless. All units used are summarized in Table 4.

Table 4 Summary of units employed for physical quantities involved in this study.

Model training

The model was trained to reconstruct clean images by minimizing the total loss function defined by Eq. (16). Training was optimized using the Adam optimizer with a learning rate of 0.001 for 300 epochs with a batch size of 4 images. The model learns to restore CLSM images by continuously improving its ability to remove noise while maintaining physically meaningful structures.
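A minimal training loop consistent with this description, reusing the hypothetical AdaptivePhysicsAE model and physics_guided_loss function sketched above, might look like:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 'degraded' and 'clean' are hypothetical float tensors of shape (N, 1, 256, 256) in [0, 1].
model = AdaptivePhysicsAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loader = DataLoader(TensorDataset(degraded, clean), batch_size=4, shuffle=True)

for epoch in range(300):
    for x_noisy, x_true in loader:
        optimizer.zero_grad()
        loss = physics_guided_loss(model(x_noisy), x_true)   # Eq. (16)
        loss.backward()
        optimizer.step()
```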

Model evaluation

The reconstructed images were evaluated using the Structural Similarity Index (SSIM), Peak Signal-to-Noise Ratio (PSNR), and intensity profile matching.

  • SSIM measures structural similarity between the reconstructed and original images.

$$SSIM\left(X,\widehat{X}\right)=\frac{\left(2\mu_{X}\mu_{\widehat{X}}+C_{1}\right)\left(2\sigma_{X\widehat{X}}+C_{2}\right)}{\left(\mu_{X}^{2}+\mu_{\widehat{X}}^{2}+C_{1}\right)\left(\sigma_{X}^{2}+\sigma_{\widehat{X}}^{2}+C_{2}\right)}$$
(17)

where \(\mu_{X},\mu_{\widehat{X}}\) are the mean intensities, \(\sigma_{X}^{2},\sigma_{\widehat{X}}^{2}\) are the variances, \(\sigma_{X\widehat{X}}\) is the covariance between the images, and \(C_{1},C_{2}\) are small stabilizing constants.

  • Peak Signal-to-Noise Ratio measures the pixel-wise accuracy of the reconstructed image.

$$PSNR=10\,\log_{10}\frac{\left(\max I\right)^{2}}{MSE}$$
(18)

where \(\max I\) is the maximum possible intensity value.
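Both metrics are available off the shelf; for example, with scikit-image and images normalized to [0, 1] (variable names here are illustrative):

```python
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

# 'ground_truth' and 'restored' are hypothetical 2-D float arrays normalized to [0, 1].
ssim_val = structural_similarity(ground_truth, restored, data_range=1.0)    # Eq. (17)
psnr_val = peak_signal_noise_ratio(ground_truth, restored, data_range=1.0)  # Eq. (18)
```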

The performance of the autoencoder was also compared against traditional image restoration methods.

  • The Richardson-Lucy (RL) deconvolution method is given by:

$$X_{k+1}=X_{k}\cdot\left(\frac{I}{X_{k}*PSF}*PSF^{\dagger}\right)$$
(19)

where ‘*’ represents convolution, \(PSF^{\dagger}\) is the flipped (mirrored) point spread function, and \(I\) is the measured image.

  • Non-Negative Least Squares (NNLS) solves:

\(\min_{X}\left\|PSF*X-I\right\|^{2}\) subject to \(X\ge 0\)20.

  • Total Variation regularization minimizes:

$$\min_{X}\left\|PSF*X-I\right\|^{2}+\lambda\sum_{i,j}\sqrt{\left(X_{i+1,j}-X_{i,j}\right)^{2}+\left(X_{i,j+1}-X_{i,j}\right)^{2}}$$
(20)

where \(\lambda\) controls the smoothness of the reconstructed image.
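Reference implementations of these baselines exist in scikit-image (exact argument names may vary slightly between versions); the NNLS baseline additionally requires assembling the convolution operator explicitly, e.g., for scipy.optimize.nnls, and is omitted here:

```python
from skimage import restoration

# 'noisy' is the degraded image and 'psf' the kernel from the degradation sketch above.
rl_result = restoration.richardson_lucy(noisy, psf, num_iter=30)   # Eq. (19)
tv_result = restoration.denoise_tv_chambolle(noisy, weight=0.1)    # TV denoising, cf. Eq. (20)
wiener_result = restoration.wiener(noisy, psf, balance=0.1)        # Wiener-filtering baseline
```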

Results and discussion

The individual effect of each noise type introduced in Sect. 2.1, simulated on an example image, is displayed in Fig. 2, with the corresponding intensity profiles shown in Fig. 2h. Real-life CLSM imaging, however, may contain a combination of several or all of these noise types.

Fig. 2

Common degradation mechanisms in confocal laser scanning microscopy (CLSM). (a) Point spread function optical diffraction and blurring. (b) Photon shot noise. (c) Dark current noise. (d) Speckle noise. (e) Motion blur. (f) Laser fluctuation noise. (g) Undersampling artifacts. (h) Intensity profiles corresponding to each noise type.

Fig. 3

Cumulative effects of noise types in CLSM imaging; each subsequent image adds a new noise type to the previous ones. Right panel: normalized intensity profiles corresponding to each cumulative degradation.

Figure 3 displays the sequential degradation of an example image. Following this, the training data was augmented by applying CLSM-specific noise types (PSF convolution, photon shot noise, motion blur, speckle noise, and undersampling), so that the model learns degradation characteristics aligned with experimentally observed CLSM conditions.

While our synthetic degradations are physically grounded, real CLSM images often contain unmodeled effects such as illumination drift, photobleaching, and sample heterogeneity, which may affect model performance under real conditions. To address this, we are initiating collaborations to acquire paired experimental datasets for validation and fine-tuning. Our framework supports transfer learning and can be adapted to real-world data once such training sets are available, making it a scalable starting point for practical deployment. The training progression of the Adaptive Physics Autoencoder for restoring CLSM images is shown in Fig. 4. Starting from the degraded input (Network Input), the reconstructed images progressively converge toward the Ground Truth, with gradual recovery of fine structural details as training progresses. This is complemented quantitatively by normalized intensity profiles that show improved fidelity in the reconstructions, with closer alignment to the Ground Truth over successive epochs. Quantitative metrics, namely the Structural Similarity Index (SSIM) and Peak Signal-to-Noise Ratio (PSNR), also indicate consistent improvements in image quality throughout training, reaching optimal performance at 300 epochs.

Fig. 4

Training progression of the proposed adaptive physics autoencoder for confocal laser scanning microscopy (CLSM) image restoration.

Performance on CLSM images of lipid droplet morphology in adipocytes

We first demonstrated the ability of the proposed Adaptive Physics Autoencoder to improve the degraded CLSM images of lipid droplets in a gel matrix. Figure 5 shows the visualization of the recovery performance of different algorithms.

The Network Input represents the noisy, degraded image, while the ground truth shows the high-resolution target. The network output demonstrates the reconstructed image compared to the results from the Richardson-Lucy (RL) and Non-Negative Least Squares (NNLS) deconvolution algorithms. Enlarged Regions of Interest (ROIs) highlight the superior restoration of droplet morphology and surrounding features by the proposed network. Quantitative evaluations using SSIM and PSNR further validate the performance improvement achieved by the proposed method.

The network output images are inferred from the degraded images, i.e., the network input image. We compare the visualized reconstruction performance of the proposed network with widely used image deconvolution algorithms, including the non-negative least squares (NNLS) algorithm and the RL algorithm.

For a fair comparison, the degraded images were up-sampled by bicubic interpolation before being deconvolved. It can be seen that the resolution of the images processed by the two deconvolution algorithms is somewhat improved compared to the input images.

Fig. 5

Visualization of the recovery performance of the proposed Adaptive Physics Autoencoder for restoring degraded CLSM images of lipid droplets in a gel matrix.

The reconstructed confocal images of the network output, however, present a much finer structure than the deconvolution results. From the enlarged regions (white dotted frames) shown in the bottom row of Fig. 5, we can see that the network output reconstructs the circular droplet structures with a more distinct outline and a surrounding network structure closer to the real image.

Furthermore, we quantified the performance of images generated by the different algorithms using the SSIM and peak signal-to-noise ratio (PSNR) indices. The quantitative results (bottom row) show that the SSIM of our algorithm is approximately 0.98 and the PSNR exceeds 36 dB. The experiment was repeated with 20 images, yielding similar results.

Performance on CLSM images of neuronal networks in cerebral organoids

The proposed Adaptive Physics Autoencoder was also applied to synthetic CLSM images of neuronal networks in cerebral organoids to assess its capability in reconstructing structurally complex and densely connected systems.

Figure 6 illustrates the comparative reconstruction performance of the network alongside traditional deconvolution methods. The network’s output is directly inferred from degraded inputs, while the other approaches, the RL and non-negative least squares (NNLS) deconvolution algorithms, require preprocessing, including bilinear interpolation to match resolution requirements. Unlike the RL and NNLS methods, which partially enhance the image resolution but struggle to reconstruct fine neuronal details, the network output excels in restoring the complexity of the neuronal architecture, including well-defined filaments and continuous network structures. ROIs in the bottom row further emphasize the network’s ability to recover delicate connections and accurately represent the topology of neuronal systems. Quantitative metrics support these visual observations, with the network achieving an SSIM of approximately 0.98 and a PSNR of 35.88 dB, significantly outperforming the deconvolution-based methods.

Performance on CLSM images of sparse fibrillar structures

The evaluation of the proposed Adaptive Physics Autoencoder was extended to synthetic CLSM images of sparse fibrillar structures. Figure 7 displays the reconstruction performance of our model compared to traditional deconvolution methods. While RL and NNLS algorithms result in slight enhancements to the resolution of the input images, they struggle to accurately reconstruct the sparse fibrillar features. On the other hand, the network output demonstrates significant improvements, with well-defined fibrillar structures and enhanced contrast closely resembling the Ground Truth.

Fig. 6

Confocal laser scanning microscopy (CLSM) images of neuronal networks derived from cerebral organoids.

Quantitative analysis also confirms these visual findings. The SSIM and PSNR values of the network output are markedly higher than those of RL and NNLS, with an SSIM of approximately 0.94 and a PSNR of 35.25 dB.

Performance comparison with traditional reconstruction methods

The proposed Adaptive Physics Autoencoder is further evaluated by comparing its performance to additional image reconstruction methods, including Total Variation (TV) regularization, Wiener filtering, and Wavelet denoising, on simulated CLSM images of spherical structures embedded in a gel matrix, resembling microplastic particles. Figure 8 demonstrates the reconstruction performance of all methods using low-resolution, noisy inputs (Network Input) as the starting point, with the Ground Truth serving as the ideal reference.

Fig. 7

Simulated CLSM images of sparse fibrillar structures. The Network Input represents the degraded images.

While TV regularization, Wiener filtering, and Wavelet denoising improve the input image quality to some extent, they struggle to reconstruct fine details and introduce unwanted artifacts or excessive smoothing. On the other hand, the network output achieves better reconstruction of the spherical structures and preserves their boundaries and overall morphology closely matching the Ground Truth. The ROIs show the superior performance of the network in restoring high-quality CLSM images.

Fig. 8

Comparison of the proposed Adaptive Physics Autoencoder with traditional methods on simulated confocal laser scanning microscopy (CLSM) images of spherical structures embedded in a gel matrix, resembling microplastic particles.

Ablation study on physics-guided loss

To quantitatively and qualitatively assess the role of the physics-guided loss terms in the training process, we performed an ablation study by training our model under two conditions: (1) using the mean squared error (MSE) alone as the loss function, and (2) using the full physics-guided composite loss incorporating the MSE, the photon-consistency penalty of Eq. (14), and the edge-preservation term of Eq. (15). The goal was to isolate the contribution of the physics constraints to reconstruction fidelity across training epochs.

Results of this study on lysosome-like structures are displayed in Fig. 9. Visual comparisons over training epochs (5 to 300) reveal that both versions of the model progressively reconstruct finer structural details. However, the physics-guided variant consistently achieves sharper, more accurate restorations earlier in training, with greater structural consistency in localized regions (magnified insets). This advantage is particularly evident in the normalized intensity profiles, where the physics-augmented model more closely tracks the ground-truth peaks and valleys across the pixel domain.

Quantitative trends in SSIM and PSNR further support this finding. As shown in the comparison plots in Fig. 9, the model trained with physics penalties exhibits a steeper and more consistent improvement trajectory in both metrics. At 300 epochs, it reaches an SSIM of ~ 0.93 and a PSNR exceeding 33 dB—both of which are higher than the corresponding values for the MSE-only model (~ 0.80 SSIM, ~ 31.5 dB PSNR). Notably, the performance gap begins to diverge substantially beyond 150 epochs, indicating that the added physical constraints guide the network toward more faithful and data-consistent solutions.

These results validate the efficacy of Physics-guided regularization in accelerating convergence and enhancing fidelity in image reconstruction, thereby demonstrating the non-trivial impact of physical priors in microscopy-specific restoration tasks.

Fig. 9

Ablation study comparing training with the physics-guided loss versus the MSE-only loss on lysosome-like structures. The top panel shows reconstructed outputs at selected epochs (5–300), intensity profiles, and SSIM and PSNR progression using the physics-constrained loss, while the bottom panel displays the same outputs and metrics for MSE-only training. Physics-based training shows superior structural recovery and faster convergence across all metrics.

Conclusions

Our method reduces equipment costs, ensures consistency with real imaging conditions, and reduces the need for humans in the loop, a step towards self-driving labs. Comparisons with widely used deconvolution algorithms and other reconstruction methods demonstrated its ability to recover fine structural details, validated both qualitatively and quantitatively through comparison of network outputs with the ground truth and through SSIM and PSNR metrics. In summary, the Adaptive Physics Autoencoder represents a step forward in interpretable deep learning by integrating physics-guided models with deep learning for confocal microscopy imaging, real-time denoising of biological and medical imaging, and self-driving labs.