Table 13 Comparison of methodological approaches: distance estimation.

From: Measuring political polarization through visible interactions between religious and non-religious citizens

Approach

Main description (major improvements over the previous approach)

Potential problem

1

1) Using the area ratio of the bounding boxes to represent the relative depth (z)

\(z = \frac{\max(\text{Area of Bounding Box 1},\ \text{Area of Bounding Box 2})}{\min(\text{Area of Bounding Box 1},\ \text{Area of Bounding Box 2})}\)

\(\text{Relative Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}\)

1) The relationship between the area ratio and the relative depth is not linear, so the area ratio does not represent depth accurately

2) Combining z with x and y

2) The combination lacks a theoretical basis

2

1) Discovering an inverse square root relationship between the area of a bounding box and the depth of the object relative to the camera, which resolves the non-linearity problem

\(z = \frac{1}{\sqrt{\text{Area of Bounding Box}}}\)

\(\text{Relative Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}\)

1) The usefulness of this approach has not been replicated by other researchers and would need numerous validation rounds, which are time-consuming

2) Combining z with x and y using a 3D Euclidean formula to make the combination method more credible

2) Although an inverse square root relationship holds when the video is shot from an eye-level perspective, the camera angle varies between videos, so the equation for calculating z would be inaccurate

3

1) Using MiDaR to get a relative depth (z). This approach offers more credibility

2) Combining z with x and y using a 3D Euclidean formula to make the combination method more robust

\(\text{Relative Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}\)

1) MiDaR would be inaccurate in this study's setting if the centers of the bounding boxes that capture people in the frames overlap
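
The formulas in the table can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `(x_min, y_min, x_max, y_max)` box format, and the use of box centers as the (x, y) coordinates are assumptions made for illustration. The inverse square root form in approach 2 matches what a simple pinhole camera model would suggest: apparent width and height each shrink roughly in proportion to 1/depth, so area shrinks roughly as 1/depth².

```python
import math


def box_area(box):
    """Area of a bounding box given as (x_min, y_min, x_max, y_max). Format is assumed."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def box_center(box):
    """Center (x, y) of a bounding box; assumed here to be the person's image position."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def area_ratio(box1, box2):
    """Approach 1: z = max(area1, area2) / min(area1, area2)."""
    a1, a2 = box_area(box1), box_area(box2)
    if min(a1, a2) <= 0:
        raise ValueError("bounding boxes must have positive area")
    return max(a1, a2) / min(a1, a2)


def inverse_sqrt_depth(box):
    """Approach 2: z = 1 / sqrt(area of bounding box)."""
    area = box_area(box)
    if area <= 0:
        raise ValueError("bounding box must have positive area")
    return 1.0 / math.sqrt(area)


def relative_distance(p1, p2):
    """3D Euclidean distance between two (x, y, z) points, as in the table."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p1, p2)))


# Two hypothetical detections in the same frame (pixel coordinates).
box_a = (0, 0, 100, 100)
box_b = (200, 0, 250, 50)

# Approach 2 applied to each box, then combined with the box centers.
point_a = (*box_center(box_a), inverse_sqrt_depth(box_a))
point_b = (*box_center(box_b), inverse_sqrt_depth(box_b))

print(area_ratio(box_a, box_b))
print(relative_distance(point_a, point_b))
```

The table does not state how x, y and z are scaled relative to one another. In this sketch x and y are in raw pixels while z is a small dimensionless proxy, so z contributes very little to the result unless the three coordinates are first brought onto a comparable scale.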