Fig. 1

From: Single view generalizable 3D reconstruction based on 3D Gaussian splatting

Overall Network Architecture. The U-Net takes as input both RGB images and depth values estimated by a monocular depth estimator. After the encoder and decoder, the network predicts, for each pixel, a set of parameters describing the corresponding 3D Gaussian ellipsoid and its offsets in the x, y, and z directions. To prevent the Gaussian ellipsoids from overfitting to the depth surface of the object, which can yield visually satisfactory renderings from the input view but inaccurate results from novel viewpoints, a homography depth loss L_H, defined from the source camera parameters, regularizes the Gaussian positions throughout training. Additionally, a depth loss is defined on the target-camera rendering: instead of computing an L2 loss on per-pixel depth values, the image is divided into patches of varying sizes, and depth consistency is enforced by a Pearson correlation constraint on the depth within each patch. The photometric loss for the rendered images remains the L1 and SSIM losses used in the vanilla 3DGS approach.
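The patch-wise Pearson depth term lends itself to a compact implementation. Below is a minimal PyTorch sketch, assuming square non-overlapping patches and image dimensions divisible by `patch_size`; the function name and patch handling are illustrative, not the authors' released code.

```python
import torch

def pearson_depth_loss(rendered_depth: torch.Tensor,
                       estimated_depth: torch.Tensor,
                       patch_size: int = 32) -> torch.Tensor:
    """Penalize low Pearson correlation between rendered and estimated
    depth inside each non-overlapping patch of an (H, W) depth map."""
    def to_patches(d: torch.Tensor) -> torch.Tensor:
        # (H, W) -> (num_patches, patch_size * patch_size)
        p = d.unfold(0, patch_size, patch_size).unfold(1, patch_size, patch_size)
        return p.reshape(-1, patch_size * patch_size)

    p_r = to_patches(rendered_depth)
    p_e = to_patches(estimated_depth)

    # Center each patch, then compute cov / (std_r * std_e) per patch.
    p_r = p_r - p_r.mean(dim=1, keepdim=True)
    p_e = p_e - p_e.mean(dim=1, keepdim=True)
    cov = (p_r * p_e).mean(dim=1)
    std_r = (p_r * p_r).mean(dim=1).sqrt()
    std_e = (p_e * p_e).mean(dim=1).sqrt()
    corr = cov / (std_r * std_e + 1e-8)

    # Correlation is invariant to per-patch scale and shift of depth,
    # which is why it is used here in place of an L2 loss on raw depths.
    return (1.0 - corr).mean()
```

Since the caption specifies patches of varying sizes, one plausible realization is to average this loss over several `patch_size` values (e.g. 16, 32, 64); the exact schedule is not given in the figure.

For the photometric term, "vanilla 3DGS" refers to the objective of the original 3D Gaussian splatting paper, which mixes L1 and D-SSIM losses with a fixed weight:

```latex
\mathcal{L}_{\text{photo}} = (1 - \lambda)\,\mathcal{L}_{1} + \lambda\,\mathcal{L}_{\text{D-SSIM}}, \qquad \lambda = 0.2
```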
