Introduction

The demand for precise localization is escalating with the rapid advancement of wireless systems. Traditional localization methods often struggle in complex urban and indoor environments due to their reliance on line-of-sight (LOS) propagation paths. Reconfigurable intelligent surfaces (RIS) offer a promising solution by creating controllable reflective paths that enable effective localization in non-line-of-sight (NLOS) scenarios. Furthermore, RIS present a cost-effective and energy-efficient alternative to deploying additional anchors or relays1, as a single receive antenna can estimate the target location with RIS assistance2,3. Moreover, RIS feature a simplified hardware design, which facilitates easier deployment and maintenance4.

Research has underscored the potential of RIS to enhance wireless positioning accuracy. Theoretical analyses using the Cramér-Rao Lower Bound (CRLB) established fundamental limits of localization accuracy under varying RIS configurations5, with Wymeersch et al.3 demonstrating a reduction in positioning error compared to passive scattering approaches. Investigations revealed that localization error generally decreases as RIS aperture size grows, and that distributed RIS deployments can simultaneously lower the CRLB and extend coverage6. In RIS-assisted localization, the target’s position is determined using anchor nodes with known coordinates by extracting parameters such as Time of Arrival (ToA), Angle of Arrival (AoA), and Angle of Departure (AoD)7,8,9,10. These approaches fall into two categories: model-based methods, which rely on mathematical models and signal processing while maintaining robustness11,12, and data-driven methods, which employ machine learning but may suffer performance degradation outside their training conditions. Various model-based implementations demonstrate practical capabilities: Keykhosravi et al.10 introduced a low-complexity estimator combining ToA and AoD, Fascista et al.13 developed a Maximum Likelihood-based AoA method, and Zhou et al.14 deployed two aerial RISs with a discrete Fourier transform-based positioning framework. Recent advances include a single-antenna full-duplex system achieving centimeter-level accuracy in indoor scenarios15 and the work of Zhang et al.16, who pioneered sub-RIS beam steering-assisted localization with environment-aware probabilistic modeling.

In RIS-assisted localization systems, the element spacing is typically designed as half the wavelength of the incident signal17. However, in practical scenarios, the wavelength of the target signal is often unknown, and spatial aliasing occurs once the signal frequency increases to the point where the element spacing exceeds half a wavelength18. To mitigate this issue, difference frequency (DF) technology has been widely adopted as an effective solution, suppressing grating lobes through signal down-conversion while maintaining localization accuracy19,20. DF-based methods have demonstrated robust performance, achieving meter-level source positioning via measured-to-simulated wavefield correlation20,21 and enabling high-frequency direction-of-arrival (DOA) estimation comparable to low-frequency conventional beamforming (CBF)21. Further improvements integrate DF with deconvolution to refine the angular resolution of CBF20. Additionally, DF-based matched field processing enhances target localization by correlating received signals with model-simulated wavefields generated through acoustic propagation modeling22,23. A DF-integrated Multiple Signal Classification (MUSIC) technique has been developed to address spatial aliasing and suppress spurious DOA components in high-frequency DOA estimation. By analyzing three distinct DF combination strategies (time, frequency, and hybrid time-frequency processing), the method effectively resolves spatial aliasing while suppressing artifact DOAs24. The aforementioned algorithms are primarily designed for LOS environments and often simplify location parameters during modeling. This simplification can lead to incomplete data at the receiver, ultimately degrading localization accuracy. Moreover, to the best of our knowledge, there has been no investigation of RIS-assisted localization systems operating under spatial aliasing at high frequencies.

In this paper, we present a RIS-based localization system designed to pinpoint high-frequency signals under NLOS propagation. By fully exploiting multi-domain information, the proposed algorithm effectively resolves spatial aliasing and achieves robust localization. The key contributions of this paper are as follows:

  (1) We propose a cost-effective localization system that operates robustly under NLOS conditions. Unlike conventional systems that rely on antenna arrays, the proposed architecture achieves accurate positioning using only a single-sensor receiver and a single RIS. This design eliminates the need for costly phased arrays or dense sensor deployments while maintaining functionality in obstructed environments.

  2. (2)

    We introduce a ​spatio-temporal-frequency information fusion (STFIF) direct localization algorithm to overcome challenges of high-frequency NLOS localization. The STFIF framework resolves these challenges through ​tri-domain fusion:1) Sparse signal reconstruction via semi-definite programming (SDP) for preliminary parameter estimation in the spatial domain, 2) DF processing to suppress spatial aliasing and artifacts targets, and 3) temporal-domain information derived from RIS’s time-varying configurations, fully leveraging multi-dimensional signal characteristics.

  3. (3)

    We perform a comparative analysis by integrating the DF-based techniques presented in prior literature into our proposed RIS-assisted direct localization framework. Moreover, we demonstrate the versatility and effectiveness of our proposed algorithm by validating its performance in scenarios with both single and multiple snapshot conditions. The numerical results indicate robust localization performance in both cases, highlighting the adaptability and practical feasibility of our proposed STFIF algorithm.

System models

Fig. 1 Scenario of one receive sensor, K target sources and a moving RIS.

The proposed localization system comprises a single receive sensor, a RIS in controlled motion, and K target sources (two closely spaced sources in the simulations). The RIS moves along a straight line at constant velocity and performs Q signal reception cycles, each partitioned into P non-overlapping segments. During the q-th reception cycle, the RIS is assumed to remain stationary at position \({{\mathbf{u}}_q}=[{u_{q,x}},{u_{q,y}}]\) \(\left( {q=1,2, \cdots ,Q} \right)\). This assumption is reasonable because the signal acquisition time at each position is typically short, so any micro-movements during data collection are negligible. As illustrated in Fig. 1, the direct path from the targets to the receiver is obstructed, preventing LOS transmission. Consequently, the target signals are received exclusively through reflections from the RIS. The primary objective is to estimate the positions of the targets from the signals recorded at the receiver. The RIS contains \({N_{ris}}\) programmable elements with an inter-element spacing of d. Consider K targets located at \({{\varvec{e}}_k}=\left[ {{e_{k,x}},{e_{k,y}}} \right]\) \(\left( {k=1,2, \cdots ,K} \right)\), each transmitting a signal at frequency f. The source signal matrix is defined as \({{\varvec{S}}_f}=\left[ {{s_{{t_1},f}},{s_{{t_2},f}}, \cdots ,{s_{{t_L},f}}} \right] \in {{\mathbb{C}}^{K \times L}}\). The received signal matrix \({{\varvec{X}}_f}=\left[ {{x_{{t_1},f}},{x_{{t_2},f}}, \cdots ,{x_{{t_L},f}}} \right] \in {{\mathbb{C}}^{P \times L}}\) over L snapshots is given by

$${{\varvec{X}}_f}={{\mathbf{H}}_R}{{\mathbf{A}}_r}{{\varvec{S}}_f}+{\varvec{W}}$$
(1)

where \({\varvec{W}} \in {{\mathbb{C}}^{P \times L}}\) denotes white Gaussian noise whose entries follow \(\mathcal{N}(0,{\sigma ^2})\). The time-varying measurement matrix \({{\mathbf{H}}_R} \in {{\mathbb{C}}^{P \times {N_{ris}}}}\) accounts for the dynamic reconfiguration of the RIS over time, incorporating P measurements across its \({N_{ris}}\) programmable elements. \({{\mathbf{A}}_r}=[{\mathbf{a}}({{\mathbf{e}}_1}),{\mathbf{a}}({{\mathbf{e}}_2}), \cdots ,{\mathbf{a}}({{\mathbf{e}}_K})] \in {{\mathbb{C}}^{{N_{ris}} \times K}}\) denotes the steering matrix of the RIS, with

$${\mathbf{a}}({{\mathbf{e}}_k})={[{a_0}({{\mathbf{e}}_k}),{a_1}({{\mathbf{e}}_k}), \cdots ,{a_{{N_{ris}} - 1}}({{\mathbf{e}}_k})]^{\text{T}}} \in {{\mathbb{C}}^{{N_{ris}} \times 1}}$$
(2)
$${a_n}({{\mathbf{e}}_k})={e^{j2\pi f\frac{{{d_n}}}{{\text{c}}}\sin {\theta _k}}}$$
(3)

where c is the sound speed and \({d_n}\) is the distance of the n-th element from the reference element (\({d_n}=nd\) for a uniform inter-element spacing d). \({\theta _k}\) is the AoA of the k-th target signal at the RIS, which is related to the target position as follows:

$$\sin {\theta _k}=\frac{{{u_{q,x}} - {e_{k,x}}}}{{\left\| {{{\mathbf{u}}_q} - {{\varvec{e}}_k}} \right\|}}$$
(4)
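As a minimal sketch of Eqs. (2)-(4), the snippet below computes \(\sin {\theta _k}\) from a RIS position and a target position and then builds the corresponding steering vector. It assumes a uniform array with \({d_n}=nd\) and uses an illustrative sound speed of 1500 m/s, which is not stated in the text; the function names `sin_aoa` and `steering_vector` are hypothetical.

```python
import cmath
import math

def sin_aoa(u_q, e_k):
    """sin(theta_k) from Eq. (4): x-offset between RIS and target over their distance."""
    dx = u_q[0] - e_k[0]
    dy = u_q[1] - e_k[1]
    return dx / math.hypot(dx, dy)

def steering_vector(u_q, e_k, f, d, n_ris, c=1500.0):
    """Steering vector of Eqs. (2)-(3).

    Assumptions (not from the text): uniform array with d_n = n * d,
    and c = 1500 m/s as an illustrative sound speed.
    """
    s = sin_aoa(u_q, e_k)
    return [cmath.exp(2j * math.pi * f * (n * d) / c * s) for n in range(n_ris)]

# Illustrative geometry: RIS at the origin, target at (50, 100) m.
u_1 = (0.0, 0.0)
e_1 = (50.0, 100.0)
a = steering_vector(u_1, e_1, f=35e3, d=0.075, n_ris=32)
```

Every entry of `a` has unit modulus, and the reference element `a[0]` equals 1, so the vector carries only the phase progression across the RIS elements.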

STFIF direct localization algorithm

  1. Spatial sparse reconstruction.

To obtain the target location information, we exploit the sparsity of the signal through atomic norm minimization (ANM). The resulting problem is formulated as:

$$\mathop {\min }\limits_{{{\varvec{Y}}_f}} \left\| {{{\varvec{X}}_f} - {{\mathbf{H}}_R}{{\varvec{Y}}_f}} \right\|_{2}^{2}+\rho {\left\| {{{\varvec{Y}}_f}} \right\|_\mathcal{A}}$$
(5)

In this formulation, we define \({{\varvec{Y}}_f}={{\varvec{A}}_{\text{r}}}{{\varvec{S}}_f} \in {{\mathbb{C}}^{{N_{ris}} \times L}}\) to simplify the representation. \({\left\| {{{\varvec{Y}}_f}} \right\|_\mathcal{A}}\) denotes the atomic norm of \({{\varvec{Y}}_f}\), and \(\rho\) is a regularization parameter that balances data fidelity against sparsity, typically set to \(\rho \approx \sigma \sqrt {{N_{ris}}\log {N_{ris}}}\). Following the approach described in reference 25, the atomic norm is given by:

$${\left\| {{{\varvec{Y}}_f}} \right\|_\mathcal{A}}=\mathop {\inf }\limits_{{\mathbf{u}},\varvec{\Upsilon}} \left\{ {\frac{1}{{2{N_{ris}}}}\text{Tr}\{ \text{Toep}({\mathbf{u}})\} +\frac{1}{2}\text{Tr}(\varvec{\Upsilon}):\left( {\begin{array}{*{20}{c}} {\text{Toep}({\mathbf{u}})}&{{{\varvec{Y}}_f}} \\ {{\varvec{Y}}_f^{\text{H}}}&\varvec{\Upsilon} \end{array}} \right) \succcurlyeq 0} \right\}$$
(6)

where \({\mathbf{u}} \in {{\mathbb{C}}^{{N_{ris}} \times 1}}\) and \(\text{Toep}({\mathbf{u}})\) is the Toeplitz matrix defined as:

$$\text{Toep}({\mathbf{u}}) \triangleq \left[ {\begin{array}{*{20}{c}} {{u_0}}&{{u_{ - 1}}}& \cdots &{{u_{ - ({N_{ris}} - 1)}}} \\ {{u_1}}&{{u_0}}& \cdots &{{u_{ - ({N_{ris}} - 2)}}} \\ \vdots & \vdots & \ddots & \vdots \\ {{u_{{N_{ris}} - 1}}}&{{u_{{N_{ris}} - 2}}}& \cdots &{{u_0}} \end{array}} \right]$$
(7)
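The structure of Eq. (7) can be illustrated with a short sketch. It assumes the Hermitian convention \({u_{ - n}}=u_{n}^{*}\), which is needed for the block matrix in the positive semidefinite constraint to be Hermitian; the helper name `toep` and the example vector are hypothetical.

```python
def toep(u):
    """Hermitian Toeplitz matrix of Eq. (7).

    Entry (i, j) is u_{i-j}; for i < j the assumed convention
    u_{-n} = conj(u_n) is used.
    """
    n = len(u)
    return [[u[i - j] if i >= j else u[j - i].conjugate() for j in range(n)]
            for i in range(n)]

# Small illustrative vector (values chosen arbitrarily).
u = [2.0 + 0j, 0.5 + 0.5j, -0.25j]
T = toep(u)

# The trace term in the ANM objective reduces to N * u_0.
trace = sum(T[i][i] for i in range(len(u)))
```

Because the diagonal of the matrix is constant, the trace term in the objective depends only on the first entry of \({\mathbf{u}}\).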

Substituting Eq. (6) into Eq. (5), the ANM problem can be restated as follows:

$$\begin{gathered} \hbox{min} \left\| {{{\varvec{X}}_{\text{f}}} - {{\mathbf{H}}_R}{{\varvec{Y}}_f}} \right\|_{2}^{2}+\frac{\rho }{{2{N_{{\text{ris}}}}}}\text{T}\text{r}\{ \text{T}\text{o}\text{e}\text{p}({\mathbf{u}})\} +\frac{\rho }{2}\text{T}\text{r}(\varvec{\Upsilon}) \\ s.t.\left( {\begin{array}{*{20}{c}} {\text{T}\text{o}\text{e}\text{p}({\mathbf{u}})}&{{{\varvec{Y}}_{\text{f}}}} \\ {{{\varvec{Y}}_f}^{\text{H}}}&\varvec{\Upsilon} \end{array}} \right) \succcurlyeq 0. \\ \end{gathered}$$
(8)

The sparse signal \({{\varvec{Y}}_{{f}}}\) can then be recovered by solving this SDP, with the parameter \(\rho\) usually set as \(\rho \approx \sigma \sqrt {{N_{ris}}\log {N_{ris}}}\). Eq. (8) can be solved using the CVX convex optimization toolbox in MATLAB.

The formulation presented above applies generally to multiple snapshots. In the single-snapshot scenario, the scalar t replaces the matrix \(\varvec{\Upsilon}\), leading to the simplified form

$$\begin{gathered} \min \left\| {{{\varvec{X}}_f} - {{\mathbf{H}}_R}{{\varvec{Y}}_f}} \right\|_{2}^{2}+\frac{\rho }{{2{N_{ris}}}}\text{Tr}\{ \text{Toep}({\mathbf{u}})\} +\frac{\rho }{2}t \\ s.t.\left( {\begin{array}{*{20}{c}} {\text{Toep}({\mathbf{u}})}&{{{\varvec{Y}}_f}} \\ {{\varvec{Y}}_f^{\text{H}}}&t \end{array}} \right) \succcurlyeq 0. \\ \end{gathered}$$
(9)
  2. Difference frequency.

We leverage the difference frequency to establish an equivalent array response with enhanced spatial resolution. This is achieved through the Hadamard product (\(\odot\)) between the upper-frequency data matrix \({{\varvec{Y}}_{{f_i}^{U}}}\) and the complex conjugate of the corresponding lower-frequency data matrix \({{\varvec{Y}}_{{f_i}^{L}}}\)24, formulated as

$$\begin{gathered} {{\varvec{Y}}_{\Delta {f_i}}}={{\varvec{Y}}_{f_{i}^{U}}} \odot {\varvec{Y}}_{f_{i}^{L}}^{*} \\ =\left[ {{{\varvec{y}}_{{t_1},f_{i}^{U}}},{{\varvec{y}}_{{t_2},f_{i}^{U}}}, \cdots ,{{\varvec{y}}_{{t_L},f_{i}^{U}}}} \right] \odot \left[ {{\varvec{y}}_{{t_1},f_{i}^{L}}^{*},{\varvec{y}}_{{t_2},f_{i}^{L}}^{*}, \cdots ,{\varvec{y}}_{{t_L},f_{i}^{L}}^{*}} \right] \\ =\sum\limits_{k=1}^{K} {\left[ {{{\varvec{a}}_{f_{i}^{U}}}({{\varvec{e}}_k}) \odot {\varvec{a}}_{f_{i}^{L}}^{*}({{\varvec{e}}_k})} \right]\left[ {s_{kL,f_{i}^{U}}^{\rm T} \odot s_{kL,f_{i}^{L}}^{\rm H}} \right]} +\sum\limits_{k^{\prime}=1}^{K} {\sum\limits_{\substack{ k^{\prime\prime}=1 \\ k^{\prime\prime} \ne k^{\prime} }}^{K} {\left[ {{{\varvec{a}}_{f_{i}^{U}}}({{\varvec{e}}_{k^{\prime}}}) \odot {\varvec{a}}_{f_{i}^{L}}^{*}({{\varvec{e}}_{k^{\prime\prime}}})} \right]\left[ {s_{k^{\prime}L,f_{i}^{U}}^{\rm T} \odot s_{k^{\prime\prime}L,f_{i}^{L}}^{\rm H}} \right]} } +{{\varvec{W}}_\Delta } \\ \end{gathered}$$
(10)

To simplify the expression, let \({\varvec{a}}_{{\Delta {f_i}}}^{{}}({\varvec{e}})={\varvec{a}}_{{f_{i}^{U}}}^{{}}({\varvec{e}}) \odot {\varvec{a}}_{{f_{i}^{L}}}^{*}({\varvec{e}})\). The cross-terms in Eq. (10) generate false response values, resulting in \({K^2} - K\) artifact targets for K true targets, which degrade the localization accuracy. Mitigating the impact of these artifact targets is therefore critical in the subsequent processing stages. A fundamental characteristic of the artifact targets is that their apparent positions in the spatial spectrum depend on the specific frequencies employed in the DF calculation. In contrast, the physical locations of the true targets remain fixed, so their peaks in the spatial spectrum are consistent across the different frequencies. The STFIF algorithm exploits this frequency-dependent property of the artifacts: by fusing the spatial spectra obtained at different frequencies, it reinforces the frequency-consistent true-target peaks while suppressing the frequency-dependent artifact peaks.
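The two properties described above can be checked numerically with a small sketch. It first verifies that the Hadamard product of the upper-frequency steering vector with the conjugate lower-frequency one equals a steering vector at \(\Delta f\), and then evaluates the apparent \(\sin \theta\) of a true term and of a cross-term for several frequency pairs. The sound speed of 1500 m/s, the use of 35, 42.5 and 50 kHz as upper frequencies, the 8-element array, and all function names are assumptions made for illustration only.

```python
import cmath
import math

C = 1500.0          # assumed sound speed in m/s (not given in the text)
DF = 10e3           # difference frequency in Hz
D = C / (2 * DF)    # element spacing d = c / (2 * DF)

def steer(f, sin_theta, n_ris=8):
    """Steering vector at frequency f for a uniform array with d_n = n * D."""
    return [cmath.exp(2j * math.pi * f * n * D / C * sin_theta) for n in range(n_ris)]

def df_vector(f_u, f_l, sin_u, sin_l, n_ris=8):
    """Hadamard product of a_{f_U}(sin_u) with conj(a_{f_L}(sin_l))."""
    a_u, a_l = steer(f_u, sin_u, n_ris), steer(f_l, sin_l, n_ris)
    return [x * y.conjugate() for x, y in zip(a_u, a_l)]

def apparent_sin(f_u, f_l, sin_u, sin_l):
    """Apparent sin(theta) of a DF term, wrapped to [-1, 1) for d = c / (2 * DF)."""
    s = (f_u * sin_u - f_l * sin_l) / (f_u - f_l)
    return (s + 1.0) % 2.0 - 1.0

# AoAs of two targets at (50, 100) m and (60, 100) m seen from a RIS at the origin.
sin_1 = -50.0 / math.hypot(50.0, 100.0)
sin_2 = -60.0 / math.hypot(60.0, 100.0)

# True term: the DF product is exactly the steering vector at DF.
identity_error = max(abs(x - y) for x, y in
                     zip(df_vector(35e3, 25e3, sin_1, sin_1), steer(DF, sin_1)))

upper_freqs = (35e3, 42.5e3, 50e3)
true_peaks = [apparent_sin(f, f - DF, sin_1, sin_1) for f in upper_freqs]
artifact_peaks = [apparent_sin(f, f - DF, sin_1, sin_2) for f in upper_freqs]
```

Under these assumptions, `true_peaks` stays at the same value for every frequency pair, whereas `artifact_peaks` moves as the upper frequency changes, which is the property the fusion step relies on.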

  (1) DF-CBF.

The DF data \({{\varvec{Y}}_{\Delta {f_i}}}\) from Eq. (10) and the DF steering vector \({\varvec{a}}_{{\Delta {f_i}}}^{{}}({\varvec{e}})\) are used to carry out conventional beamforming (CBF). The DF sample covariance matrix (DF-SCM), denoted by \({R_{cbf}}\), is used to estimate the unknown location \(\hat {{\varvec{e}}}\). The resulting CBF power at each candidate location is given by

$$P_{{\Delta {f_i}}}^{{{\text{CBF}}}}(\hat {{\varvec{e}}})=a_{{\Delta {f_i}}}^{H}(\hat {{\varvec{e}}}){\mkern 1mu} {R_{cbf}}{\mkern 1mu} {a_{\Delta {f_i}}}(\hat {{\varvec{e}}})$$
(11)
$$\begin{gathered} {R_{cbf}}=\frac{1}{L}\sum\limits_{{l=1}}^{L} {{{\varvec{y}}_{l,\Delta {f_i}}}} {\mkern 1mu} {\varvec{y}}_{{l,\Delta {f_i}}}^{H} \\ =\frac{1}{L}\sum\limits_{{l=1}}^{L} ( y_{{l,{f_i}^{U}}}^{{}}y_{{l,{f_i}^{U}}}^{{\rm H}}){\mkern 1mu} \circ {(y_{{l,{f_i}^{L}}}^{{}}y_{{l,{f_i}^{L}}}^{{\rm H}})^*} \\ \end{gathered}$$
(12)

The target locations are obtained as the maximizers of Eq. (11).
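A minimal single-source, single-snapshot sketch of the DF-CBF scan is given below. For simplicity it scans over \(\sin \theta\) rather than over candidate locations, ignores noise and the RIS measurement matrix, and uses the fact that with \(d=c/(2\Delta f)\) the DF steering phase at element n is \(\pi n\sin \theta\). The names `df_steer` and `cbf_power` and the grid resolution are illustrative choices, not part of the original method description.

```python
import cmath
import math

N_RIS = 32

def df_steer(sin_theta, n_ris=N_RIS):
    """DF steering vector for d = c / (2 * DF): phase pi * n * sin(theta) at element n."""
    return [cmath.exp(1j * math.pi * n * sin_theta) for n in range(n_ris)]

def cbf_power(y_df, sin_theta):
    """Single-snapshot CBF power a^H R a with R = y y^H, which equals |a^H y|^2."""
    a = df_steer(sin_theta, len(y_df))
    return abs(sum(x.conjugate() * y for x, y in zip(a, y_df))) ** 2

# Noise-free DF snapshot of one target at (50, 100) m seen from a RIS at the origin.
true_sin = -50.0 / math.hypot(50.0, 100.0)
y_df = df_steer(true_sin)

# Grid search over sin(theta); the maximizer is the DF-CBF estimate.
grid = [i / 100.0 for i in range(-99, 100)]
best = max(grid, key=lambda s: cbf_power(y_df, s))
```

With several sources, the same scan would also show peaks at the cross-term positions, which is why DF-CBF alone cannot separate true targets from artifacts.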

  (2) Multi-information fusion.

Recalling Eq. (10), a single snapshot vector \({y_{\Delta {f_i},l}}\) can be written as

$${y_{\Delta {f_i},l}}=\sum\limits_{k=1}^{K} {{s_{kl,f_{i}^{U}}}s_{kl,f_{i}^{L}}^{*}{{\varvec{a}}_{\Delta {f_i}}}({{\varvec{e}}_k})} +\sum\limits_{k^{\prime}=1}^{K} {\sum\limits_{\substack{ k^{\prime\prime}=1 \\ k^{\prime\prime} \ne k^{\prime} }}^{K} {{s_{k^{\prime}l,f_{i}^{U}}}s_{k^{\prime\prime}l,f_{i}^{L}}^{*}\left[ {{{\varvec{a}}_{f_{i}^{U}}}({{\varvec{e}}_{k^{\prime}}}) \odot {\varvec{a}}_{f_{i}^{L}}^{*}({{\varvec{e}}_{k^{\prime\prime}}})} \right]} } +{w_{\Delta ,l}}$$
(13)

where the first (true-target), second (cross-term), and third (noise) terms are mutually uncorrelated. The covariance matrix can therefore be formulated as

$${\varvec{R}}={\mathbb{E}}\left[ {{y_{\Delta {f_i},l}}y_{{\Delta {f_i},l}}^{{\rm H}}} \right]={\varvec{A}}{{\varvec{R}}_x}{{\varvec{A}}^{\rm H}}+{{\varvec{R}}_c}+{{\varvec{R}}_\Delta }$$
(14)
$${\varvec{A}}=\left[ {{{\varvec{a}}_{\Delta {f_i}}}({{\varvec{e}}_1}),{{\varvec{a}}_{\Delta {f_i}}}({{\varvec{e}}_2}), \cdots ,{{\varvec{a}}_{\Delta {f_i}}}({{\varvec{e}}_K})} \right] \in {{\mathbb{C}}^{{N_{ris}} \times K}}$$
(15)

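where \({\left[ {{{\varvec{R}}_x}} \right]_{i,j}}={\mathbb{E}}\left[ {{x_i}x_{j}^{*}} \right]\), and \({{\varvec{R}}_c}\) and \({{\varvec{R}}_\Delta }\) are the covariance matrices of the cross-terms and the noise, respectively. The estimated covariance matrix is obtained from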

$$\hat {{\varvec{R}}}=\frac{{{{\varvec{Y}}_{\Delta {f_i}}}{\varvec{Y}}_{{\Delta {f_i}}}^{{\rm H}}}}{L}$$
(16)

By performing eigenvalue decomposition of \({\varvec{R}}\), the signal subspace and noise subspace components can be obtained, written as:

$${\varvec{R}}={{\varvec{U}}_S}{\varvec{\varSigma}_S}{\varvec{U}}_{S}^{{\rm H}}+{{\varvec{U}}_N}{\varvec{\varSigma}_N}{\varvec{U}}_{N}^{{\rm H}}$$
(17)

Due to the presence of cross-terms, the eigenvectors associated with the \({K^2}\) largest eigenvalues span the signal subspace (which still contains noise components), while the eigenvectors associated with the remaining \({N_{{\text{ris}}}} - {K^2}\) eigenvalues span the noise subspace. The two subspaces are mutually orthogonal, so that

$${\varvec{a}}_{{\Delta {f_i}}}^{{}}{({{\varvec{e}}_k})^\text{H}}{{\varvec{U}}_N}{{\varvec{U}}_N}^{\text{H}}{\varvec{a}}_{{\Delta {f_i}}}^{{}}({{\varvec{e}}_k})=0,k=1,2, \cdots ,{K^2}$$
(18)

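The target locations are then estimated as the minimizers of the following cost function, which accumulates the noise-subspace projection over the Q RIS positions and the F difference frequencies: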

$${\rm Z}({{\varvec{e}}_k})=\sum\limits_{{q=1}}^{Q} {\sum\limits_{{i=1}}^{F} {{{\left\| {{{\varvec{U}}_N}^{\text{H}}{\varvec{a}}_{{\Delta {f_i}}}^{{}}({{\varvec{e}}_k})} \right\|}^2}} }$$
(19)

The formulation above applies to the multi-snapshot case. In the single-snapshot scenario, the covariance matrix cannot be estimated from Eq. (16) because only one snapshot is available, so additional information is required to construct it. Here, we instead use the F frequency-domain components to form the covariance matrix:

$${\hat {{\varvec{R}}}_{ss}}=\frac{{{{\varvec{Y}}_F}{\varvec{Y}}_{F}^{{\rm H}}}}{F}$$
(20)
$${{\varvec{Y}}_F}=\left[ {{y_{\Delta {f_1}}},{y_{\Delta {f_2}}}, \cdots ,{y_{\Delta {f_F}}}} \right]$$
(21)

Performing eigenvalue decomposition on \({\hat {{\varvec{R}}}_{ss}}\) yields the corresponding signal subspace and the noise subspace \({\varvec{U}}_{N}^{{ss}}\). We formulate the cost function as:

$${{\rm Z}_{ss}}({{\varvec{e}}_k})=\sum\limits_{{q=1}}^{Q} {{{\left\| {{{\left( {{\varvec{U}}_{{_{N}}}^{{ss}}} \right)}^\text{H}}{\varvec{a}}_{{\Delta {f_i}}}^{{}}({{\varvec{e}}_k})} \right\|}^2}}$$
(22)

To provide a more detailed exposition of the algorithm, the pseudocode for the STFIF algorithm is shown in Table 1.

Table 1 Implementation of the STFIF algorithm.

To clearly illustrate the characteristics and advantages of the proposed STFIF algorithm relative to relevant baseline methods, we summarize their comparison across key technical dimensions in Table 2.

Table 2 Comparison of the proposed STFIF algorithm with prior work.

Simulation results

In this section, we conduct numerical simulations to validate the effectiveness of the proposed STFIF localization algorithm, focusing on its ability to distinguish two closely located targets. The simulated scenario involves two targets placed at \({{\varvec{e}}_1}=(50,100)\) m and \({{\varvec{e}}_2}=(60,100)\) m, with an intentionally small spatial separation of only 10 m. The simulation environment is configured as follows: an RIS composed of \({N_{ris}}=32\) elements, employing a random coding scheme for its reflection matrix \({{\varvec{H}}_R}\), gathers signal data over \(P=8\) time slots with \(L=50\) snapshots. The element spacing of the RIS is set to \(d=c/(2\Delta f)\), and the frequency of the transmitted signal lies between 3.5 and 5 times \(\Delta f\). Here, we define the uniformly spaced DF as \(\Delta f=f_{i}^{U} - f_{i}^{L}=10\) kHz. To execute \(Q=5\) measurements, the RIS starts at \({{\varvec{u}}_1}=(0,0)\) m and moves along the x-axis, taking additional measurements at intervals of 15 m, i.e., at \({{\varvec{u}}_2}=(15,0)\) m, \({{\varvec{u}}_3}=(30,0)\) m, \({{\varvec{u}}_4}=(45,0)\) m, and \({{\varvec{u}}_5}=(60,0)\) m.
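To make the aliasing condition in this configuration concrete, the short calculation below evaluates the ratio of element spacing to wavelength at the three frequencies shown in the spectra (35, 42.5 and 50 kHz). The ratio equals \(f/(2\Delta f)\) and is therefore independent of the sound speed; the value of 1500 m/s is an assumption used only to obtain a physical spacing, and the variable names are illustrative.

```python
DF = 10e3                      # difference frequency in Hz, from the text
C = 1500.0                     # assumed sound speed in m/s (not stated in the text)
d = C / (2 * DF)               # element spacing d = c / (2 * DF), in metres

# Spacing-to-wavelength ratio at each signal frequency; aliasing-free
# operation of a conventional array requires a ratio of at most 0.5.
ratios = {}
for f in (35e3, 42.5e3, 50e3):
    wavelength = C / f
    ratios[f] = d / wavelength

# The five RIS measurement positions along the x-axis, 15 m apart.
ris_positions = [(15.0 * q, 0.0) for q in range(5)]
```

All three ratios lie well above 0.5 (between 1.75 and 2.5), so a conventional beamformer at the signal frequency would be spatially aliased, whereas at \(\Delta f\) the same spacing corresponds to exactly half a wavelength.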

As shown in Fig. 2, the spatial spectra obtained using the proposed DF technology are presented for three frequency components (35 kHz, 42.5 kHz, and 50 kHz). The results demonstrate that while our DF approach successfully overcomes the spatial aliasing problem that commonly affects conventional beamforming methods, unwanted artifact targets still appear in the spectra. The two prominent central peaks highlighted by green circles correspond to the true target positions, whereas the additional peaks marked by red arrows represent artifact targets that could lead to false detections. These artifact targets exhibit clear frequency-dependent characteristics—notably shifting their positions in the spatial domain as the operating frequency changes from 35 kHz to 50 kHz. By applying our proposed algorithm, as demonstrated in Fig. 3, the 3D plot shows successful suppression of artifact targets, leaving only the two true targets clearly visible. The algorithm achieves enhanced target localization by exploiting the frequency-dependent characteristics of artifact peaks while preserving the consistent spatial signatures of genuine targets.

Fig. 2 Spatial spectrum obtained using DF technology at three different frequencies (35 kHz, 42.5 kHz, and 50 kHz). The 3D plots illustrate the spectral power distribution across the x-y coordinate plane, where peaks indicate potential target locations. True targets are highlighted with green circles, while artifact targets (false positives) are marked with red arrows.

We compare the proposed STFIF method with DF-CBF and DF-MUSIC. The performance of STFIF in the single-snapshot scenario is also evaluated to highlight its effectiveness when few snapshots are available. We use the root mean square error (RMSE) to measure the localization performance. The CRLB25 is included in the comparison as a theoretical benchmark, representing the minimum variance achievable by any unbiased estimator. All simulations were performed over 500 Monte Carlo trials, and the RMSE is computed as:

$$\text{RMSE}=\sqrt {\frac{1}{{\text{monte}}}\sum\limits_{i=1}^{\text{monte}} {\sum\limits_{k=1}^{K} {{{\left\| {{{\hat {{\varvec{e}}}}_k} - {{\varvec{e}}_k}} \right\|}^2}} } }$$
(23)

where \(monte\) represents the total number of Monte Carlo trials and \({\hat {{\varvec{e}}}_k}\) is the estimated position.
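The RMSE of Eq. (23) can be sketched as follows, interpreting the squared difference between the estimated and true positions as a squared Euclidean distance. The function name `rmse` and the two example trials are hypothetical and serve only to show the computation.

```python
import math

def rmse(estimates, truths):
    """RMSE of Eq. (23).

    estimates: one entry per Monte Carlo trial, each a list of K (x, y) estimates.
    truths:    list of the K true (x, y) positions.
    """
    total = 0.0
    for trial in estimates:
        for e_hat, e in zip(trial, truths):
            total += (e_hat[0] - e[0]) ** 2 + (e_hat[1] - e[1]) ** 2
    return math.sqrt(total / len(estimates))

# Two illustrative trials for the two simulated targets.
truths = [(50.0, 100.0), (60.0, 100.0)]
estimates = [
    [(50.3, 100.4), (60.0, 100.0)],
    [(50.0, 100.0), (60.0, 99.0)],
]
value = rmse(estimates, truths)
```

As in Eq. (23), the squared errors of all K targets are summed within each trial and the sum is averaged over the number of trials only.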

Fig. 3 Results of the proposed algorithm: artifact-free spatial spectra demonstrating successful target localization.

Figure 4 presents a comparative analysis of the RMSE performance for various algorithms across different noise levels, with the SNR ranging from 5 dB to 20 dB. It is evident that the proposed STFIF algorithm consistently outperforms DF-CBF and DF-MUSIC, exhibiting notably lower RMSE values at identical SNR conditions. Although all evaluated methods demonstrate improvements as the SNR increases, the STFIF method achieves superior estimation accuracy closer to the CRLB than other comparative algorithms. The gap between STFIF and the CRLB indicates potential for further algorithmic refinement, while clearly highlighting STFIF’s robustness and enhanced ability to handle noise in comparison to alternative approaches.

Figure 5 analyzes the impact of the number of RIS elements on localization performance, with the SNR fixed at 10 dB and the number of RIS elements varying from 15 to 40. The simulation results demonstrate a clear trend: as the number of RIS elements increases, the RMSE of all algorithms decreases, with our proposed STFIF algorithm consistently achieving the lowest error. Notably, the STFIF method maintains superior performance across the entire range compared to DF-CBF, DF-MUSIC, and STFIF-single approaches, with its error curve closest to the theoretical CRLB limit. This performance advantage becomes more pronounced with larger RIS arrays, where STFIF achieves approximately 0.13 m RMSE at 40 RIS elements, significantly outperforming conventional methods. The decreasing gap between STFIF and CRLB with increasing RIS elements suggests that our proposed algorithm can more effectively leverage the additional spatial diversity provided by larger RIS arrays, making it particularly well-suited for high-precision localization applications that employ substantial reconfigurable surfaces.

Fig. 4 Performance comparison of different algorithms under varying SNR.

Fig. 5 Performance comparison of different algorithms under a varying number of RIS elements.

Beyond localization accuracy, computational complexity is another critical metric for evaluating the practical feasibility of localization algorithms. The complexity of these algorithms is primarily determined by three operations: the computation of the covariance matrix from the received signals, the eigendecomposition of this matrix, and the evaluation of the spatial spectrum over a grid of candidate positions. While the cost of the covariance computation and the eigendecomposition is the same for all three algorithms, the spatial spectrum search dominates the overall computational cost. The proposed STFIF algorithm incurs a marginally higher complexity in this step due to its additional processing, but the overhead relative to DF-CBF and DF-MUSIC is negligible. The detailed quantitative comparison is presented in Table 3, where \({N_{grid}}\) denotes the number of grid points in the spatial search.

Table 3 Comparison of computational complexities.

Conclusions

This study investigates RIS-assisted high-frequency localization and addresses the spatial aliasing that arises in conventional array architectures. To overcome this challenge, we propose the STFIF algorithm, which integrates three components: (1) sparse signal reconstruction via SDP for initial parameter estimation, (2) DF processing to mitigate spatial aliasing and artifact targets, and (3) fusion of the time-varying RIS configurations, the measurements taken at multiple RIS positions, and the accumulated multi-frequency components. By effectively exploiting this multi-dimensional information, the proposed STFIF algorithm achieves superior positioning performance with enhanced resolution and accuracy.