Introduction

With the rapid development of computer vision technology, 3D light field reconstruction has become increasingly advanced, and its unique 3D display effects have attracted widespread attention1. This technology holds significant scientific and practical value in both everyday life and military fields.

Common techniques in 3D display include holographic display2,3,7, lenticular lens display4,5, volumetric 3D display6,7,8, and projection-based 3D light field display9,13. Among these, projection-based 3D light field display is considered to have broad prospects due to features such as large display size, wide viewing angles, and glasses-free viewing10. In 2006, the Balogh team from France first developed a 3D display system using a planar scattering screen and projectors11.

In projection-array-based light field display systems, which share a similar display principle with integral imaging, the typical algorithm employed is the SPOC algorithm. This approach reconstructs three-dimensional scenes from a large number of disparity images12,18, preserving the scene's true details without the need for complex 3D modeling. A notable example is the study conducted in 2020 by Yu Haiyang, Jiang Xiaoyu, and colleagues from the PLA Army Armored Force Engineering College, China13. Their research focused on achieving light field reconstruction17,18,19 using the SPOC algorithm14,15,16,20,21,22 within a fixed ring sampling system. The algorithm proposed in this paper aims to overcome the limitations of the ring (symmetric) framework and the need for dense camera sampling.

This novel algorithm diverges from conventional projection-array-based 3D light field reconstruction approaches by eschewing circular structural simplifications. Rather, it incorporates parallel camera arrays from real-world scenarios, introducing an innovative light field encoding paradigm. The algorithm encodes imagery based on camera intrinsic parameters and their world coordinate system positions, thereby overcoming the limitations of traditional frameworks. Its core innovation resides in the elimination of constraints imposed by circular (symmetric) architectures. By leveraging camera intrinsic parameters and positional data, it accurately determines the optimal pixel locations in disparity images captured by parallel cameras that correspond to object points. This methodology enables 3D light field reconstruction using parallel camera arrays under sparse sampling scenarios. Additionally, the flexible mapping between display and sampling pixels substantially improves the display accuracy and photorealism of the reconstructed light field.

This paper's Sect. "Structure of the projection array and traditional light field encoding algorithms" introduces the fundamental principles of projection-array-based 3D light field display systems and conventional light field encoding algorithms. Section "Projection-based 3D light field reconstruction using parallel camera arrays" delves into the detailed explanation of the 3D reconstruction algorithm based on parallel camera arrays and its optimization strategies. Section "Experimental results and analysis" validates the algorithm's effectiveness through experimental verification and presents the corresponding results. Section "Problems and prospects" summarizes the experimental findings, discusses the current challenges, and outlines prospective directions for future research.

Structure of the projection array and traditional light field encoding algorithms

Projection array display system

The projection display system is composed of a projection array and a holographic scattering screen. The optical axes of the projectors converge on the geometric center of the screen, where light emitted by the laser projectors is focused. The screen’s anisotropic light modulation characteristics induce distinct scattering angles for transmitted light in the horizontal and vertical directions, enabling 3D light field reconstruction. Given that the human eye is more sensitive to horizontal disparity changes than vertical ones—and considering bandwidth limitations—both the traditional SPOC algorithm and the proposed algorithm disregard vertical disparity, retaining and reconstructing only horizontal disparity to enhance stereoscopic visual effects.

SPOC algorithm

The SPOC (Smart Pseudoscopic-to-Orthoscopic Conversion) algorithm operates on the principle of utilizing disparity images acquired through sampling as the elemental image array for simulated display. It employs this image array to encode the light field and synthesize an elemental image array that aligns with the parameters of the real display lens array. When the reference plane coincides with the elemental image plane, this process can be characterized as a direct mapping from the sampled pixel space to the display pixel space.

Fig. 1 Schematic diagram of the principle of the traditional SPOC algorithm.

As illustrated in Fig. 1, the geometric center of the holographic scattering screen is designated as the origin of the coordinate system. The unit length of this coordinate system is defined as the size of a unit pixel projected by the projector onto the scattering screen. The formula can be expressed as:

$$e=\frac{R_p}{N \cdot K}$$
(1)

Where \(R_p\) denotes the radius of the projector array, \(N\) represents the horizontal resolution of the projector, and \(K\) is the projection ratio of the projector. \(\Delta PAB\) encompasses all light information emitted by projector P. For specific pixels \(P_1\) and \(P_2\), suppose the pixel projected by projector P is \(P_n\), with the coordinates of point P being \(({x_p},{z_p})\) and the coordinates of point \(P_n\) being \((0,{z_n})\).

$$\left\{ {\begin{array}{*{20}{c}} {{x_{\text{p}}}={R_p} \times \cos (\phi - p \cdot {\theta _p})} \\ {{z_{\text{p}}}={R_p} \times \sin (\phi - p \cdot {\theta _p})} \end{array}} \right.$$
(2)

Where \(p\) denotes the projector index, and \(\phi\) is the angle between the negative half-axis of the x-axis and the projector with index 0. \({\theta _p}\) represents the angular separation between adjacent projectors, which are evenly arranged on the arc.

$${z_n}=(n - \frac{N}{2}) \cdot e,n \in \left[ {0,N - 1} \right]$$
(3)

The equation for the display light ray can be expressed as:

$$z=\frac{{{z_p} - {z_n}}}{{{x_p}}} \times x+{z_n}$$
(4)

By combining the equation of the display light ray with the equation of the camera trajectory, their intersection point can be denoted as \(({x_{ideal}},{z_{ideal}})\). If a camera is positioned at this intersection (for instance, camera \({C_1}\) at the intersection of line \(L_1\) with the camera trajectory), the information of object point \({O_1}\) conveyed by the display light ray \(P{P_2}\) can be completely sampled by that camera. However, because the spacing between sampling cameras is non-zero, when no camera lies at an intersection such as \({C_2}\) between line \(L_2\) and the camera trajectory, the information of object point \({O_2}\) cannot be collected. The algorithm is designed to synthesize sampling rays that precisely coincide with the display light rays; in other words, it seeks to generate sampling pixels that perfectly align with the display pixels, thereby enabling the assignment of pixel values.
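
The following sketch illustrates the SPOC geometry of Eqs. (1)-(4) in code. The projection ratio K, the reference angle \(\phi\), the camera-trajectory radius R_c and the choice of intersection root are illustrative assumptions rather than values taken from this paper.

```python
import numpy as np

R_p = 1.7                    # radius of the projector array (m), from the experimental setup
N = 1280                     # horizontal resolution of the projector
K = 1.2                      # projection ratio (assumed value)
phi = np.deg2rad(117.0)      # angle of projector index 0 (assumed)
theta_p = np.deg2rad(0.5)    # angular separation between adjacent projectors

e = R_p / (N * K)            # Eq. (1): unit pixel size on the scattering screen

def projector_position(p):
    """Eq. (2): position (x_p, z_p) of projector index p on the arc."""
    ang = phi - p * theta_p
    return R_p * np.cos(ang), R_p * np.sin(ang)

def screen_pixel_z(n):
    """Eq. (3): coordinate z_n of screen pixel n."""
    return (n - N / 2) * e

def ideal_camera(p, n, R_c=2.5):
    """Intersect the display ray of Eq. (4) with an assumed circular camera
    trajectory x^2 + z^2 = R_c^2 and return (x_ideal, z_ideal)."""
    x_p, z_p = projector_position(p)
    z_n = screen_pixel_z(n)
    m = (z_p - z_n) / x_p                    # slope of the display ray z = m*x + z_n
    a, b, c = 1 + m ** 2, 2 * m * z_n, z_n ** 2 - R_c ** 2
    disc = b ** 2 - 4 * a * c
    if disc < 0:
        return None                          # ray misses the camera trajectory
    x = (-b - np.sqrt(disc)) / (2 * a)       # root on the camera side (assumption)
    return x, m * x + z_n

print(e, projector_position(0), ideal_camera(p=30, n=900))
```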

Projection-based 3D light field reconstruction using parallel camera arrays

The SPOC algorithm relies on a ring-shaped system framework, leading to insufficient collection of light information and poor light field reconstruction performance. To overcome the limitations of the ring (symmetric) framework and the challenges of dense camera array sampling, this paper proposes a projection-based 3D light field reconstruction algorithm using parallel camera arrays. This algorithm breaks away from the traditional framework, aiming to capture more comprehensive light field information and significantly enhance the reconstruction results.

In practical applications, installation errors of parallel camera arrays may lead to deviations in camera positions, optical axes, and other parameters, thereby introducing systematic errors. To address this, this paper proposes a method integrating camera calibration and coordinate transformation to effectively mitigate the impact of such errors. In this algorithm, the asymmetry of the system framework—resulting from departures from traditional symmetric structures—exacerbates the incompleteness of captured light field information. To tackle this, an approximate point matching algorithm based on the Lambertian properties of objects is further proposed, aiming to enhance the accuracy and stability of light field reconstruction.

Camera parameters and coordinate transformation method

To eliminate errors, this algorithm first necessitates the acquisition of both intrinsic and extrinsic parameters of the free-moving camera. The intrinsic parameters encompass the camera's focal length and the coordinates of the optical center, while the extrinsic parameters denote the rotation and translation offsets of the camera with respect to the coordinate system. The camera calibration and coordinate transformation process involves conversions among multiple coordinate systems, such as the world coordinate system, the camera coordinate system, and the image coordinate system, as illustrated in Fig. 2. Through these transformations, the spatial position and orientation of the camera can be accurately calibrated, effectively mitigating the impact of errors on light field reconstruction.

Fig. 2 World Coordinate System (a), Camera Coordinate System (b), Pixel Coordinate System (c).

Any point \(({x_w},{y_w},{z_w})\) in the world coordinate system can be transformed into its camera-coordinate-system counterpart via Eq. (5).

$$\left[ {\begin{array}{*{20}{c}} {{x_c}} \\ {{y_c}} \\ {{z_c}} \end{array}} \right]=R * \left[ {\begin{array}{*{20}{c}} {{x_w}} \\ {{y_w}} \\ {{z_w}} \end{array}} \right]+T$$
(5)

Similarly, for a point \(({x_c},{y_c},{z_c})\) in the camera coordinate system, the corresponding point \(({{\text{x}}_w},{y_w},{z_w})\) in the world coordinate system can be calculated based on Eq. (6).

$$\left[ {\begin{array}{*{20}{c}} {{x_w}} \\ {{y_w}} \\ {{z_w}} \end{array}} \right]={R^{ - 1}} * \left[ {\begin{array}{*{20}{c}} {{x_c}} \\ {{y_c}} \\ {{z_c}} \end{array}} \right] - {R^{ - 1}}T$$
(6)

Let \({R^{ - 1}}\) be \({R_{{\text{cw}}}}\) and \({R^{ - 1}}T\) be \({T_{cw}}\).

Any point \(({x_c},{y_c},{z_c})\) on the imaging plane in the camera coordinate system can be mapped to pixel coordinates based on Eq. (7).

$$\left\{ {\begin{array}{*{20}{c}} {u=\frac{{{x_c}}}{{dx}}+{u_0}} \\ {v=\frac{{{y_c}}}{{dy}}+{v_0}} \end{array}} \right.$$
(7)

By combining Eqs. (5) and (7), the formula for directly transforming from the world coordinate system to the pixel coordinate system can be derived as:

$${Z_c}\left[ {\begin{array}{*{20}{c}} u \\ v \\ 1 \end{array}} \right]=\left[ {\begin{array}{*{20}{c}} {\frac{1}{{dx}}}&0&{{u_0}} \\ 0&{\frac{1}{{dy}}}&{{v_0}} \\ 0&0&1 \end{array}} \right]\left[ {\begin{array}{*{20}{c}} f&0&0&0 \\ 0&f&0&0 \\ 0&0&1&0 \end{array}} \right]\left[ {\begin{array}{*{20}{c}} R&T \\ 0&1 \end{array}} \right]\left[ {\begin{array}{*{20}{c}} {{x_w}} \\ {{y_w}} \\ {{z_w}} \\ 1 \end{array}} \right]$$
(8)

Similarly, if the pixel coordinates are given as (u, v), the coordinates in the world coordinate system can be expressed as:

$$\left[ {\begin{array}{*{20}{c}} {{x_w}} \\ {{y_w}} \\ {{z_w}} \end{array}} \right]={R_{cw}}\left[ {\begin{array}{*{20}{c}} {(u - {u_0}) \cdot dx} \\ {(v - {v_0}) \cdot dy} \\ f \end{array}} \right] - {T_{cw}}$$
(9)

Here f denotes the focal length of the camera.
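
A minimal numerical sketch of the transformations in Eqs. (5)-(9) is given below. The calibration values (R, T, f, dx, dy, u0, v0) are placeholders, and Eq. (7) is applied to points lying on the imaging plane, following the convention above.

```python
import numpy as np

R = np.eye(3)                    # rotation, world -> camera (placeholder calibration)
T = np.array([0.0, 0.0, 2.0])    # translation, world -> camera (m, placeholder)
f = 0.035                        # focal length (m)
dx = dy = 5e-6                   # physical pixel size (m)
u0, v0 = 640.0, 360.0            # principal point (pixels)

R_cw = np.linalg.inv(R)          # R^{-1}
T_cw = R_cw @ T                  # R^{-1} T, as defined after Eq. (6)

def world_to_camera(p_w):
    """Eq. (5): camera coordinates of a world point."""
    return R @ np.asarray(p_w) + T

def camera_to_world(p_c):
    """Eq. (6): world coordinates of a camera-frame point."""
    return R_cw @ np.asarray(p_c) - T_cw

def camera_to_pixel(p_c):
    """Eq. (7): pixel coordinates of a point on the imaging plane."""
    x_c, y_c, _ = p_c
    return x_c / dx + u0, y_c / dy + v0

def pixel_to_world(u, v):
    """Eq. (9): world coordinates of the imaging-plane point behind pixel (u, v)."""
    p_img = np.array([(u - u0) * dx, (v - v0) * dy, f])
    return camera_to_world(p_img)

p_w = pixel_to_world(700, 360)                       # imaging-plane point of pixel (700, 360)
print(p_w, camera_to_pixel(world_to_camera(p_w)))    # round trip recovers (700.0, 360.0)
```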

Approximate point matching algorithm based on parallel camera arrays

In the ideal scenario, we assume that the parallel camera array is positioned at a height h from the geometric central plane of the object. The camera trajectory is then expressed as:

$${\text{z}}=h$$
(10)

As shown in Fig. 3, the intersection of the camera trajectory and the display light ray \(P{P_n}\) is denoted as \(({x_i},{z_i})\). The equation for the display light ray \(P{P_n}\) is given by:

$$z=\frac{{{z_p}}}{{{x_p} - {x_n}}}(x - {x_n})$$
(11)

Where \(({x_p},{z_p})\) are the coordinates of point \(P\), and \(({x_n},0)\) are the coordinates of point \({P_n}\).

$$\left\{ {\begin{array}{*{20}{c}} {{x_i}=\frac{{h \times ({x_p} - {x_n})}}{{{z_p}}}+{x_n}} \\ {{z_i}=h} \end{array}} \right.$$
(12)
Fig. 3 Simplified schematic of the parallel camera array.

The intersection of the camera trajectory and the display light ray denotes the position of the ideal camera. The parallel camera array is uniformly distributed, with the horizontal coordinates of two adjacent cameras forming an interval \(({x_j},{x_{j+1}})\). When \({x_i} \in ({x_j},{x_{j+1}})\) is satisfied, the two cameras closest to the ideal camera can be identified. Let these two closest cameras be denoted as \({C_A}\) and \({C_B}\). Given that the distance from the object to the sampling cameras is significantly larger than the interval between adjacent cameras, according to the Lambertian properties of the object, for the same object point, the pixel pair formed by the light rays passing through cameras \({C_A}\) and \({C_B}\) is the most similar pixel pair.
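
A short sketch of Eqs. (10)-(12) and the neighbour search follows; the camera height h, the camera positions and the example ray are assumptions used only for illustration.

```python
import numpy as np

h = 1.5                                  # height of the parallel camera trajectory z = h (assumed)
cam_x = np.linspace(-0.6, 0.6, 9)        # horizontal camera positions, sparse and uniform (assumed)

def ideal_camera_x(x_p, z_p, x_n):
    """Eq. (12): x-coordinate where the display ray P(x_p, z_p) -> P_n(x_n, 0)
    crosses the camera trajectory z = h."""
    return h * (x_p - x_n) / z_p + x_n

def neighbouring_cameras(x_i):
    """Indices (A, B) of the two cameras enclosing x_i, or None if x_i falls
    outside the array."""
    j = np.searchsorted(cam_x, x_i) - 1
    if j < 0 or j + 1 >= len(cam_x):
        return None
    return j, j + 1

x_i = ideal_camera_x(x_p=0.3, z_p=-1.7, x_n=0.05)   # projector on the far side of the screen (assumed)
print(x_i, neighbouring_cameras(x_i))               # ideal camera position and the pair (C_A, C_B)
```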

Since the position of object point \(O\) is unknown, its location must be determined to compute the corresponding pixel information. To enhance computational efficiency, the algorithm first defines the range of the virtual active space. As illustrated in Fig. 4, the coordinates of the near and far points on the plane \(y=0\) are derived by solving the system of equations formed by the display light ray and the virtual boundary equation:

$$z=l,z= - l$$
(13)

Here \(l\) denotes the distance from the virtual boundary to the screen plane \(z=0\).

Let the near and far points be denoted as \(({x_{nA}},0,{z_{nA}})\) and \(({x_{fA}},0,{z_{fA}})\), respectively. In the camera coordinate system, the coordinates of the camera's optical center are \((0,0,0)\), and the coordinates of the principal point are \((0,0,f)\). Based on Eq. (6), the coordinates of the camera's optical center \(C({x_{cw}},{y_{cw}},{z_{cw}})\) and the principal point \(F({x_{iw}},{y_{iw}},{z_{iw}})\) in the world coordinate system can be determined.
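
As a sketch, the near and far points of Eq. (13) can be obtained by evaluating the display ray of Eq. (11) at the two boundary depths; the ray parameters and l below are assumed values.

```python
def active_space_endpoints(x_p, z_p, x_n, l):
    """Clip the display ray of Eq. (11) to the virtual active space of Eq. (13)
    and return the two boundary points (x, 0, z) on the plane y = 0."""
    def x_at(z):                          # invert Eq. (11): x = z*(x_p - x_n)/z_p + x_n
        return z * (x_p - x_n) / z_p + x_n
    return (x_at(l), 0.0, l), (x_at(-l), 0.0, -l)

# Example with assumed ray parameters; which endpoint is "near" depends on the camera side.
print(active_space_endpoints(x_p=0.3, z_p=-1.7, x_n=0.05, l=0.2))
```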

Fig. 4 Virtual Active Space and Imaging Pixel Space.

Since the optical axis of the camera is perpendicular to the imaging plane, the normal vector of the imaging plane can be expressed as:

$$\overrightarrow n =({x_{nw}},{y_{nw}},{z_{nw}})=({x_{iw}} - {x_{cw}},{y_{iw}} - {y_{cw}},{z_{iw}} - {z_{cw}})$$
(14)

The mathematical expression for the imaging plane is:

$${x_{nw}} \cdot (x - {x_{cw}})+{y_{nw}} \cdot (y - {y_{cw}})+{z_{nw}} \cdot (z - {z_{cw}})=0$$
(15)

When camera \({C_B}\) samples the near point \(({x_{nA}},0,{z_{nA}})\), the expression for the sampling ray in the world coordinate system is:

$$\left\{ {\begin{array}{*{20}{c}} {x=t \cdot ({x_{cwB}} - {x_{nA}})+{x_{nA}}} \\ {y=t \cdot {y_{cwB}}} \\ {z=t \cdot ({z_{cwB}} - {z_{nA}})+{z_{nA}}} \end{array}} \right.$$
(16)

Where \(({x_{cwB}},{y_{cwB}},{z_{cwB}})\) are the coordinates of camera \({C_B}\) in the world coordinate system.

The intersection point of the sampling ray with the imaging plane represents the camera’s sampling information. The coordinates of the intersection point can be obtained by calculating t, where:

$$t=\frac{{{x_{nw}}({x_{cw}} - {x_{nA}})+{y_{nw}}{y_{cw}}+{z_{nw}}({z_{cw}} - {z_{nA}})}}{{{x_{nw}}({x_{cwB}} - {x_{nA}})+{y_{nw}}{y_{cwB}}+{z_{nw}}({z_{cwB}} - {z_{nA}})}}$$
(17)

By combining the coordinate transformation, the pixel coordinates \({I_{BN}}({u_{NB}},{v_{NB}})\) corresponding to the near-field point can be determined. Similarly, the pixel coordinates \({I_{BF}}({u_{FB}},{v_{FB}})\) for the far-field point can be obtained. The pixel coordinates \((u,v)\) corresponding to the object point \(O\) should satisfy that \(u\) lies within the interval defined by \({u_{NB}}\) and \({u_{FB}}\), and \(v\) lies within the interval defined by \({v_{NB}}\) and \({v_{FB}}\).
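
The ray-plane intersection behind Eqs. (14)-(17) can be sketched as follows. The camera pose is a placeholder, and the imaging plane is anchored here at a point on the optical axis (the principal point in this sketch, an assumption); the intersection is subsequently converted to pixel coordinates with Eqs. (7)-(8).

```python
import numpy as np

C_B = np.array([0.20, 0.0, 1.50])      # optical centre of camera C_B in world coordinates (assumed)
F_B = np.array([0.20, 0.0, 1.465])     # principal point of C_B in world coordinates (assumed)
near = np.array([0.04, 0.0, 0.20])     # near point (x_nA, 0, z_nA) from Eq. (13)

n_vec = F_B - C_B                      # Eq. (14): normal of the imaging plane (optical axis direction)

def intersect_ray_plane(origin, through, plane_point, normal):
    """Solve for the parameter t of Eq. (17) at which the ray origin + t*(through - origin)
    meets the plane of Eq. (15) through plane_point with the given normal, and return
    the intersection point itself."""
    origin, through = np.asarray(origin), np.asarray(through)
    d = through - origin                               # direction of the sampling ray, Eq. (16)
    t = normal @ (np.asarray(plane_point) - origin) / (normal @ d)
    return origin + t * d

# Sampling ray of Eq. (16): from the near point towards C_B's optical centre.
hit = intersect_ray_plane(near, C_B, F_B, n_vec)
print(hit)                             # world-coordinate point that maps to pixel I_BN(u_NB, v_NB)
```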

Due to the lack of symmetry in the system framework, the relative order of \({u_{NB}}\) and \({u_{FB}}\), as well as of \({v_{NB}}\) and \({v_{FB}}\), cannot be determined in advance, so the cases must be discussed separately. Assume that \({u_{NB}}<{u_{FB}}\) and \({v_{NB}}<{v_{FB}}\); then:

$$u \in [{u_{NB}},{u_{FB}}],v \in \left[ {{v_{NB}},{v_{FB}}} \right]$$
(18)

Let the pixel index of the object point \(O\) after being sampled by camera \({C_B}\) be denoted as \({I_{OB}}\), and the pixel indices corresponding to the near and far points be denoted as \({I_{nB}}\) and \({I_{fB}}\), respectively. Then:

$$0 \leqslant {I_{{\text{n}}B}}<{I_{OB}}<{I_{fB}} \leqslant K - 1$$
(19)

K is the horizontal resolution of the camera.

The imaging interval of the object point \(O\) has been determined above. Next, we need to determine the actual imaging pixel of the object point \(O\). Let the actual imaging pixel be denoted as \({I_{SB}}\), then \({I_{SB}} \in [{I_{nB}},{I_{fB}}]\). The pixel coordinates of \({I_{SB}}\) can be converted to world coordinates through Eq. (9). Since the camera's optical center \(C({x_{cw}},{y_{cw}},{z_{cw}})\) is known, the equation of the sampling ray \({P_n}{I_{SB}}\) is known. Let the equation be:

$$y={k_c} \cdot x+{b_c}$$
(20)

By combining Eqs. (20) and (11), the position of \({P_n}\) can be determined. Then, by using the position of \({P_n}\) and the location of \({C_A}\), the mathematical expression of the line \({P_n}{C_A}\) can be established. The intersection of the line \({P_n}{C_A}\) with the imaging plane of \({C_A}\) gives the pixel \({I_{SA}}\), thus determining a set of pixel pairs. By traversing each imaging pixel in the range \([{I_{nB}},{I_{fB}}]\), multiple sets of pixel pairs can be determined.

Figure 5(a) shows the schematic diagram of the overall algorithm, and Fig. 5(b) is a partial enlarged view. When \({I_{OB}} \ne {I_{SB}}\), the actual object points corresponding to \({I_{SB}}\) and \({I_{SA}}\) are \({O_B}\) and \({O_A}\), respectively. Since \({O_B}\) and \({O_A}\) are unrelated points, the corresponding imaging pixels differ significantly. When \({I_{SB}}={I_{OB}}\), it implies that \({I_{SA}}={I_{OA}}\). Since \({I_{OA}}\) and \({I_{OB}}\) both sample the same object point \(O\), the sampled pixels are very similar. Based on the above analysis, this algorithm uses variance to determine the pixel corresponding to the real object point \(O\). The imaging pixels of the real object point after being sampled by two adjacent cameras can be obtained by traversing and calculating multiple sets of pixel pairs: the pixel pair with the smallest variance is taken as the pixel pair corresponding to the real object point \(O\).
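
A compact sketch of this search is given below: each candidate column \(I_{SB}\) of camera \(C_B\) is paired with a column \(I_{SA}\) of camera \(C_A\) through the geometry of Eqs. (9), (11) and (20), and the pair with the smallest variance of Eq. (21) is kept. The helper column_pair_for stands in for that geometric mapping and is hypothetical, as are the toy images.

```python
import numpy as np

def pick_pixel_pair(img_A, img_B, I_nB, I_fB, column_pair_for):
    """Return the column pair (I_SA, I_SB) that minimises the variance of Eq. (21)."""
    best, best_var = None, np.inf
    for I_SB in range(I_nB, I_fB + 1):
        I_SA = column_pair_for(I_SB)          # geometric mapping from C_B's column to C_A's column
        a = img_A[:, I_SA].astype(float)      # J vertical pixels of column I_SA in C_A's image
        b = img_B[:, I_SB].astype(float)      # J vertical pixels of column I_SB in C_B's image
        mean = (a + b) / 2.0
        var = np.sum(((a - mean) ** 2 + (b - mean) ** 2) / 2.0)   # Eq. (21)
        if var < best_var:
            best, best_var = (I_SA, I_SB), var
    return best

# Toy demonstration with random images and an identity column mapping (both assumptions).
rng = np.random.default_rng(0)
img_A = rng.integers(0, 256, (720, 1280))
img_B = rng.integers(0, 256, (720, 1280))
print(pick_pixel_pair(img_A, img_B, 600, 640, lambda c: c))
```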

Fig. 5 Schematic diagram of the light field reconstruction algorithm based on approximate point matching (a) and the zoomed-in view (b).

Similarly, the light field reconstruction algorithm remains unchanged for different vertical heights. The real object point \(O\) corresponds to a continuous column of light field information, which, after camera sampling, is discretized into \(J\) imaging pixels, where \(J\) is the vertical resolution of the camera. Therefore, the variance \({s^2}\) is:

$${s^2}=\sum\limits_{j=0}^{J-1} \frac{\left(I_{SB}^{j} - \frac{I_{SB}^{j}+I_{SA}^{j}}{2}\right)^2+\left(I_{SA}^{j} - \frac{I_{SB}^{j}+I_{SA}^{j}}{2}\right)^2}{2}$$
(21)

Here, \(I_{{SB}}^{j}\) represents the jth pixel in column \({I_{SB}}\) of the disparity image captured by camera \({C_B}\), and \(I_{{SA}}^{j}\) represents the jth pixel in column \({I_{SA}}\) of the disparity image captured by camera \({C_A}\).

By combining the above formula with the pixel coordinate system, we get:

$${s^2}=\sum\limits_{m=0}^{J-1} \frac{\left(P_{A}^{m} - \frac{P_{A}^{m}+P_{B}^{m}}{2}\right)^2+\left(P_{B}^{m} - \frac{P_{A}^{m}+P_{B}^{m}}{2}\right)^2}{2}$$
(22)

Where \(P_{A}^{m}\) represents the pixel value at pixel coordinates \(({u_{NAm}},{v_{NAm}})\) on the imaging plane of camera \({C_A}\), and \(P_{B}^{m}\) represents the pixel value at pixel coordinates \(({u_{NBm}},{v_{NBm}})\) on the imaging plane of camera \({C_B}\).

Let the pixel coordinates that minimize the variance for each row of contiguous object points be denoted as \(({u_{nam}},{v_{nam}})\) and \(({u_{nbm}},{v_{nbm}})\). A weighted fusion method is then applied to assign values to the composite pixel. The resulting value of the pixel in the m-th row is:

$$P_{n}^{m}=0.5 * P_{{OA}}^{m}+0.5*P_{{OB}}^{m}$$
(23)

Where \(P_{{OA}}^{m}\) represents the pixel value at pixel coordinates \(({u_{nam}},{v_{nam}})\) on the imaging plane of camera \({C_A}\), and \(P_{{OB}}^{m}\) represents the pixel value at pixel coordinates \(({u_{nbm}},{v_{nbm}})\) on the imaging plane of camera \({C_B}\).
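
A minimal sketch of the equal-weight fusion of Eq. (23), which also feeds the row-wise mapping of Eq. (24), might look like the following; the example columns are arbitrary.

```python
import numpy as np

def fuse_columns(col_A, col_B):
    """Eq. (23): P_n^m = 0.5 * P_OA^m + 0.5 * P_OB^m for every row m; the result is
    written directly into the display column, as in Eq. (24)."""
    return 0.5 * np.asarray(col_A, dtype=float) + 0.5 * np.asarray(col_B, dtype=float)

print(fuse_columns([10, 20, 30], [20, 30, 40]))   # -> [15. 25. 35.]
```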

In computing the composite pixel, assigning each row of pixels in the display column the value calculated for the corresponding sampled row of the composite pixel is equivalent to completing the vertical light field mapping. Therefore, the mapping relationship between the display column pixels and the composite column pixels can be expressed as follows:

$$dP_{n}^{m}=P_{n}^{m}$$
(24)

Where \(dP_{n}^{m}\) represents the pixel value of the m-th row pixel in the display column \({P_n}\).

The above derivation assumes \({u_{NB}}<{u_{FB}}\) and \({v_{NB}}<{v_{FB}}\), so that \(u \in [{u_{NB}},{u_{FB}}]\) and \(v \in \left[ {{v_{NB}},{v_{FB}}} \right]\). The algorithm principle for the other cases, \(u \in [{u_{NB}},{u_{FB}}],v \in \left[ {{v_{FB}},{v_{NB}}} \right]\); \(u \in [{u_{FB}},{u_{NB}}],v \in \left[ {{v_{NB}},{v_{FB}}} \right]\); and \(u \in [{u_{FB}},{u_{NB}}],v \in \left[ {{v_{FB}},{v_{NB}}} \right]\), is similar.

Improvement of the approximate point matching algorithm

The approximate point matching algorithm assumes \({u_{NB}}<{u_{FB}}\), \({v_{NB}}<{v_{FB}}\), with \(u \in [{u_{NB}},{u_{FB}}]\) and \(v \in \left[ {{v_{NB}},{v_{FB}}} \right]\). In practice, however, the horizontal distance between the two pixels may be smaller than one pixel, i.e., \(\left| {{u_{NB}} - {u_{FB}}} \right|<1\). In this case the horizontal position of the real object point on that plane is already determined, since the whole segment of the display ray within the active space projects into a single pixel column.

Fig. 6 Schematic diagram of the improved algorithm.

As illustrated in Fig. 6, the display light ray \(P{P_n}\) can be fully sampled by camera \({C_B}\). Thus, it is unnecessary to determine the real position of the object point \(O\). Instead, we only need to compute the intersection of \(P{P_n}\) with the imaging plane using Eqs. (11) and (15), and determine the pixel information corresponding to the object point \(O\) on the horizontal plane \(y=0\) through coordinate transformation. For the pixels in a column projected by the projector, it is only necessary to calculate the projection height of the projector onto the holographic scattering screen, namely:

$${z_h}= - e * (i - M/2),i \in \left[ {0,M - 1} \right]$$
(25)

Where \(M\) denotes the vertical resolution of the projector. Next, based on the intersection of the display light ray passing through the point \(({x_n},0,{z_h})\) with the imaging plane, the pixel information corresponding to this intersection can be computed. This simplification of the computational process allows for faster calculation of the projection image array.
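
For illustration, the vertical projection heights of Eq. (25) can be tabulated directly; the values of e and M below are assumptions consistent with the earlier definitions (M taken as a vertical projector resolution of 720).

```python
e = 0.0011          # unit pixel size from Eq. (1) (assumed numeric value)
M = 720             # vertical resolution of the projector (assumed)

# Eq. (25): projection height z_h of row i of one projector column on the screen.
z_h = [-e * (i - M / 2) for i in range(M)]

# Each height defines a point (x_n, 0, z_h[i]); the ray through it is then intersected
# with the imaging plane (Eq. 15) and converted to a pixel as before.
print(z_h[:3], z_h[-1])
```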

Experimental results and analysis

The projection array employed in this system is depicted in Fig. 7. It comprises 108 projectors with a resolution of 1280 × 720 and a holographic scattering screen. The projectors are evenly distributed along an arc with a 1.7-meter radius at an angular interval of 0.5 degrees, covering an azimuthal angle range of [− 27°, 27°]. The center of the holographic scattering screen coincides with the arc center, and its plane is perpendicular to the optical axis of the central projector at an azimuthal angle of 0°.

Fig. 7 Real-life image of the arc-shaped projection array display device.

Using the disparity images obtained from sampling as the input, the projection image array is computed by the approximate point matching algorithm and the traditional SPOC algorithm (with the placement and rotation of the projection array taken into account). The imaging results are shown in Fig. 8.

Fig. 8 Projection image arrays calculated using the parallel camera approximate point algorithm (a) and the traditional SPOC algorithm (b).

Owing to the discrepancies in the acquisition processes of the two methods, the input image arrays exhibit substantial variations, and the shooting angles also vary significantly, introducing numerous variables. Hence, this paper refrains from discussing the occurrence of image deformation and instead focuses on performance metrics such as the overall clarity after image reconstruction and the computational speed of the algorithm.

Table 1 Parameters of the two Algorithms.

As tabulated in Table 1, the parallel camera-based approximate point matching algorithm exhibits a twofold speedup compared to the SPOC algorithm. Overall, both algorithms yield favorable results for the computed projection image arrays, with both clearly resolving object contours. Notable discrepancies emerge in image details, however. Compared to Fig. 8(b), the image in Fig. 8(a) is free from mosaic artifacts and stripe distortion. Relying on a single reference plane, the SPOC algorithm struggles to capture continuous object information under complex depth variations. By contrast, the proposed algorithm dispenses with the need for a reference plane, instead determining optimal pixel matches through object point similarity comparisons to assign values to the projection image array. This approach enables more accurate reconstruction of object information across diverse depth layers under sparse sampling conditions, substantially enhancing image detail representation.

To further quantify the image display quality, the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM), both widely adopted in image quality assessment, are introduced. The projection image array derived from the disparity images sampled under dense conditions serves as the reference, whereas the projection image arrays generated by the two algorithms from disparity images sampled under sparse conditions serve as the experimental images. PSNR and SSIM are calculated for two cases (a small evaluation sketch follows the list below):

1) Compare the first projected image in the reference images with the first projected image in the experimental images.

2) Compare all projected images in the reference images with all projected images in the experimental images.
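
A brief evaluation sketch under the above protocol, assuming scikit-image as the metric implementation and placeholder arrays in place of the actual projection images:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def compare(reference, experimental):
    """PSNR and SSIM between one reference projection image and one experimental image."""
    psnr = peak_signal_noise_ratio(reference, experimental, data_range=255)
    ssim = structural_similarity(reference, experimental, channel_axis=-1, data_range=255)
    return psnr, ssim

def compare_all(references, experimentals):
    """Case 2): average PSNR/SSIM over all projector images."""
    scores = [compare(r, e) for r, e in zip(references, experimentals)]
    return tuple(np.mean(scores, axis=0))

# Toy RGB arrays standing in for the dense-reference and sparse-experiment images.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (720, 1280, 3), dtype=np.uint8)
exp = ref.copy()
exp[::2] //= 2                         # crude degradation so the metrics are non-trivial
print(compare(ref, exp))               # case 1): first image only
print(compare_all([ref], [exp]))       # case 2): all images (one here)
```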

Table 2 Comparison of PSNR and SSIM numerical results in the first case.
Table 3 Comparison of PSNR and SSIM numerical results in the second case.

Both Tables 2 and 3 demonstrate that the parallel camera-based approximate point matching algorithm outperforms the SPOC algorithm on both metrics.

Finally, the projection image array is loaded into the projector array to observe the light field reconstruction effect. The display result is presented in Fig. 9.

Fig. 9 Images captured by cameras at corresponding angles using the approximate point matching algorithm and the SPOC algorithm. (a) Left side of the holographic scattering screen. (b) Center of the holographic scattering screen. (c) Right side of the holographic scattering screen.

As depicted in Fig. 9, the image details produced by the SPOC algorithm are significantly more blurred than those of the parallel camera algorithm, with severe aliasing evident in Fig. 9(a). This arises because the SPOC algorithm directly uses pixel values from the closest cameras to assign pixel values to object points in the projection array. When camera positions deviate from ideal positions, this pixel assignment method introduces substantial errors. Multiple misassigned pixels corresponding to light rays converge at the holographic scattering screen, readily leading to pronounced ghosting artifacts.

As the viewing angle changes, the parallel camera-based approximate point matching algorithm demonstrates less pronounced variations in perspective compared to the SPOC algorithm. This stems from the relatively narrow angular capture range of the parallel camera array. Theoretically, the maximum viewing angle achievable by the parallel array is restricted, whereas a ring-shaped camera array can cover a far broader angular span. Specifically, the parallel camera array is limited to a 180-degree maximum viewing angle, while the ring-shaped configuration can theoretically achieve a full 360-degree coverage.

Problems and prospects

The approximate point matching algorithm based on parallel camera sampling proposed in this study breaks through the technical bottleneck of the traditional SPOC algorithm, which relies on a symmetric architecture for the display and acquisition systems. By leveraging the Lambertian properties of objects, the algorithm achieves high-quality light field reconstruction under sparse sampling conditions. Experimental data show that the computational efficiency of the approximate point matching algorithm is approximately twice that of the SPOC algorithm with the same number of disparity images. However, the current algorithm still has a long running time, and its computational efficiency can be further improved in the future by exploiting the CUDA architecture.

Although the algorithm breaks the constraints of the ring (symmetric) architecture and outperforms the SPOC algorithm in terms of display effect and computational efficiency, its viewing angle variation range is narrower than that of the SPOC algorithm. It is worth noting that as the sparsity of the camera array increases, the algorithm may suffer from problems such as image aliasing and color boundary blurring, and excessively sparse sampling also leads to substantial information loss. Future research can explore deep learning-based automatic multi-view image generation to address information completion in sparse sampling scenarios.