Abstract
Millimeter-Wave (MMW) imaging is a promising technique for contactless security inspection. However, the high cost of the requisite large-scale antenna arrays hinders its widespread application in high-throughput scenarios. Here, we report a large-scale single-shot MMW imaging framework that achieves low-cost, high-fidelity security inspection. We first analyzed the statistical ranking of each array element using 1934 full-sampled MMW echoes. The highest-ranked elements are preferentially selected to build an experimentally optimal sparse sampling strategy that reduces antenna array cost by one order of magnitude. Additionally, we derived an untrained interpretable learning scheme that realizes robust and accurate MMW image reconstruction from sparsely sampled echoes. Last, we developed a neural network for automatic object detection and experimentally demonstrated successful detection of concealed centimeter-sized targets using a 10% sparse array, whereas all other contemporary approaches failed at such a low sampling ratio. With its strong detection ability and order-of-magnitude cost reduction, we anticipate that this technique provides a practical route to large-scale single-shot MMW imaging.
Introduction
Security checks at public places such as airports and railway stations require effective personnel surveillance techniques to counter the rising threat of terrorist attacks1. However, conventional surveillance techniques have limited utility in practice: metal detectors can only detect metallic weapons and explosives, X-ray machines expose individuals to harmful ionizing radiation2, infrared imaging systems3,4 are sensitive to environmental disturbance, and visible-light cameras cannot see through clothing. A practical security check system typically relies on a combination of these different techniques. Consequently, such a system suffers from a complex inspection workload and poor passage efficiency. In this context, millimeter-wave (MMW) imaging has emerged as a promising alternative, owing to its high-resolution, clothing-penetrating imaging ability and its safety for human exposure compared with X-ray machines5. MMW imaging thus offers the possibility of an all-in-one high-throughput security screening system with high resolution, high penetrability, and strong safety.
Existing MMW imaging systems are generally categorized into two types: passive and active. Passive systems6 rely on detecting the naturally occurring MMW radiation emitted by targets. However, they often encounter limited imaging quality due to the low radiation contrast between targets and the environment. Active systems5,7,8,9,10,11, including single-input single-output (SISO) and multiple-input multiple-output (MIMO) configurations12, leverage additional MMW illumination and capture the scattered electromagnetic waves to achieve superior imaging resolution compared to passive systems. While a full SISO array can achieve high-quality imaging, it requires a prohibitively large number of antenna elements. Consequently, a mechanically scanning linear array with the range migration algorithm (RMA)7 has emerged as a more economical alternative. However, the resulting images may suffer from blur if subjects move during data acquisition13, which degrades the subsequent concealed-object detection accuracy14. Fully electronic MIMO arrays require fewer elements and enable data collection in a single snapshot, but demand time-consuming high-dimensional data processing12,15,16,17,18,19. In short, existing MMW imaging systems face an inevitable tradeoff among imaging quality, cost, and running efficiency.
The sparse array synthesis (SAS)20,21,22,23,24,25 technique is emerging as a cost-effective alternative that minimizes the manufacturing expense of fully sampled arrays. However, state-of-the-art SAS methods struggle to synthesize large arrays25, making them less applicable to security-check scenarios where a large aperture is usually required. The next challenge lies in the reconstruction methods for sparse array-based systems. Existing compressive sensing (CS) techniques26,27,28,29 may fail at low sampling ratios with large-scale arrays. In recent years, deep learning (DL) approaches have been introduced to either enhance pre-reconstructed images30,31 or directly solve the inverse scattering problem32,33,34,35, yielding successful reconstruction on specific datasets. However, DL methods rely heavily on training datasets, potentially leading to poor generalization on unseen or rarely represented distributions.
To tackle the above challenges, here we present a low-cost, large-scale single-shot MMW security inspection framework, which is capable of successfully detecting concealed centimeter-sized targets. First, we collected a set of real-captured echoes using a scanning-based full-sampled antenna array, and then analyzed the statistical importance ranking map of the array elements. Based on the ranking map, we experimentally derived a statistically optimized sampling strategy to sparsely select antenna elements at low sampling ratios (e.g., 10% and 25%). Second, we proposed an interpretable untrained learning approach, which ensures robust MMW reconstruction from sparsely sampled echoes. Unlike conventional DL imaging methods30,31,32,33,34,35, this physics-informed learning strategy operates without dataset training, ensuring case-specific optimization with strong robustness. Compared to existing CS-based approaches, it yields high-fidelity reconstruction with up to 3.4 dB PSNR improvement at low sampling ratios. The combination of statistics-based sparse sampling and untrained learning reduces the cost of the antenna array by up to an order of magnitude. Third, we developed a neural network that achieved successful centimeter-sized concealed target detection at a 10% sampling ratio, while other contemporary approaches failed. To conclude, this work leverages the statistical sparse prior and the emerging model-learning joint optimization framework, providing new insights into the development of low-cost and efficient MMW security inspection systems.
Results
Statistically optimized sparse array
In contrast to prevailing handcrafted array designs, our approach capitalizes on the statistical prior afforded by a large-scale dataset. To this end, we first gathered real echoes using a full-sampled antenna array. We complied with all pertinent ethical regulations and secured informed consent from all fourteen volunteers. These volunteers carried five types of concealed objects, including knives, phones, wrenches, pistols (metal replica models), and explosive powdered material (EPM; silica gel with similar MMW absorption properties was employed in our experiments). They were positioned around 0.4 meters in front of the array, as depicted in Fig. 1a. Throughout the data collection process, the volunteers were instructed to maintain a steady position. Under this experimental configuration, we obtained a total of 1934 echo samples from ten volunteers, constituting the MMW imaging dataset. Each sample spans three dimensions (vertical, horizontal, and frequency) and contains 430 × 186 × 50 voxels.
a We collected a full-sampled MMW concealed target inspection dataset containing 1934 echoes. b The statistical ranking \(\bar{M}\) was obtained by multiplying the average amplitude \(\bar{A}\) and the inverse phase gradient \(\bar{P}\). c With the preset sampling control hyperparameter S and a uniform random function r(n), we select the n-th point if r(n) > S, following the statistical ranking in b. d The reconstruction accuracy (PSNR) of 16 randomly selected echoes w.r.t. different S. The error bar represents the standard deviation. The optimal sparse patterns at sampling ratios (SR) of 10% and 25% correspond to S = 0.8 and S = 0.5, respectively. More details are in Supplementary Note 3. e The reconstructed 3D scenes at a 25% sampling ratio. f The reconstructed results of concealed target detection at a 25% sampling ratio. In e and f, we compare random sampling with conventional CS-CG43 reconstruction against the reported statistically sparse sampling with untrained reconstruction.
To analyze the distribution of the 3D echoes, we extracted a 2D slice of each echo at the center frequency along the frequency dimension. This slice was chosen because it provides the most representative characterization of the echo within this frequency band. We then averaged all the obtained 2D slices to generate a 2D average echo, which reflects the characteristics of the 3D averaged echo and facilitates investigation of the statistical importance ranking of elements in the 2D antenna array. Further, we extracted the average amplitude map \(\bar{A}\) and the inverse phase gradient map \(\bar{P}\) (the reciprocal of the phase gradient) of this cross-section, as shown in Fig. 1b. A larger value in \(\bar{A}\) indicates that the corresponding element carries more significance in the antenna array, and vice versa. As for the phase term, a continuous target distribution results in a flattened phase-gradient distribution over the array ("Analysis on the statistical maps of MMW echoes" and Supplementary Note 4). Consequently, attention should be focused on the portion of the array with smaller phase gradients (larger values in \(\bar{P}\)). A larger value of the product \(\bar{M}=\bar{A}\times \bar{P}\) signifies greater importance of the element within the antenna array. We denote by n = 1, ..., N the order of each element sorted by \(\bar{M}\); a smaller n means higher statistical importance. Given the statistical importance ranking \(\bar{M}\) and a uniform random function \(r(n)\) ranging from 0 to 1 (Fig. 1c), we selected the elements n satisfying \(r(n) > S\), where S is a hyperparameter controlling the sparsity of the sampling patterns ("Statistically sparse sampling").
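As a concrete illustration of this selection rule, the minimal sketch below computes the statistical ranking from a stack of center-frequency echo slices and draws a sparse pattern for a given S. The array shapes, the simple 1D phase unwrapping, and the variable names are illustrative assumptions rather than our exact implementation.

```python
import numpy as np

def statistical_ranking(echo_slices):
    """Compute the statistical ranking M_bar = A_bar * P_bar from complex 2D echo slices.

    echo_slices: complex array of shape (num_samples, H, W), the center-frequency
    slices of the full-sampled echoes (illustrative shape).
    """
    avg_echo = echo_slices.mean(axis=0)                  # 2D average echo
    A_bar = np.abs(avg_echo)                             # average amplitude map
    phase = np.unwrap(np.angle(avg_echo))                # simplistic 1D unwrap, for illustration
    gy, gx = np.gradient(phase)                          # phase gradient maps
    P_bar = 1.0 / (np.hypot(gx, gy) + 1e-12)             # inverse phase gradient map
    return A_bar * P_bar                                 # statistical ranking M_bar

def sparse_pattern(M_bar, S, sampling_ratio, seed=0):
    """Select elements in order of statistical importance, each kept with probability 1 - S."""
    rng = np.random.default_rng(seed)
    order = np.argsort(-M_bar, axis=None)                # n = 1..N, most important first
    keep = rng.uniform(size=order.size) > S              # keep element n if r(n) > S
    selected = order[keep][: int(sampling_ratio * order.size)]  # discard out-of-range tail
    pattern = np.zeros(M_bar.size, dtype=bool)
    pattern[selected] = True
    return pattern.reshape(M_bar.shape)

# Example: a 10% sparse pattern with the empirically optimal S = 0.8
# slices = np.load("center_freq_slices.npy")             # hypothetical file
# pattern = sparse_pattern(statistical_ranking(slices), S=0.8, sampling_ratio=0.10)
```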
We conducted a series of simulations to test the performance of the reported sparse sampling strategy. The root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM)36 were employed to quantify reconstruction accuracy. We compared both the 2D maximum-value projections along the range direction and the 3D results reconstructed by various imaging algorithms. The dynamic range of the reconstruction results was set to 20 dB. Images retrieved by RMA from full-sampling echoes were regarded as references. Figure 1d demonstrates that there exists an optimal sparsity for each sampling ratio. For instance, S = 0.8 yields the optimal pattern at a 10% sampling ratio, while S = 0.5 does so at a 25% sampling ratio. The statistics in Table 1 reveal that statistically sparse sampling yields average improvements of 20% in RMSE, 2 dB in PSNR, and 0.22 in SSIM across all reconstruction methods compared to random sampling. Both the visual and quantitative comparisons (Fig. 1e, f and Table 1) validate the superiority of the reported statistically sparse sampling in terms of reconstruction accuracy.
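For reference, these metrics can be computed as in the sketch below using scikit-image; the projection axis and the mapping of the 20 dB dynamic range onto [0, 1] are assumptions about preprocessing rather than our exact pipeline.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def to_db_image(volume, dyn_range_db=20.0):
    """Max-value projection along range, normalized and clipped to a 20 dB dynamic range."""
    proj = np.abs(volume).max(axis=-1)                      # assume last axis is range
    img_db = 20 * np.log10(proj / proj.max() + 1e-12)
    return np.clip(img_db, -dyn_range_db, 0.0) / dyn_range_db + 1.0   # map to [0, 1]

def evaluate(recon_volume, reference_volume):
    """Return (RMSE, PSNR, SSIM) against the full-sampling RMA reference."""
    rec, ref = to_db_image(recon_volume), to_db_image(reference_volume)
    rmse = np.sqrt(np.mean((rec - ref) ** 2))
    psnr = peak_signal_noise_ratio(ref, rec, data_range=1.0)
    ssim = structural_similarity(ref, rec, data_range=1.0)
    return rmse, psnr, ssim
```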
The untrained learning reconstruction
To ensure reconstruction fidelity, we employed a physical constraint to regulate the output of the neural network, following the interpretable untrained learning strategy37,38. By updating the parameters θ of the neural network fθ, the untrained reconstruction minimizes the loss between \(\mathcal{H}(f_{\theta}(z))\) and the measurement Es, where \(\mathcal{H}\) is the physical model of MMW scattering and z is the input of the network. Considering the complex-valued nature of MMW echoes, we constructed a lightweight complex-valued convolutional network (CCN)39,40,41, as shown in Fig. 2a (see "Untrained reconstruction based on CCN" for details). We applied complex-valued total variation (CTV)42 regularization to enhance the network's output. The network does not require pre-training. Instead, the interplay between \(\mathcal{H}\) and fθ causes the prior of Es to be captured by the neural network. When the optimization converges, the resulting fθ corresponds to the inverse model of \(\mathcal{H}\), which can be applied to reconstruct the target scene \(I=f_{\theta}(z)\).
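A minimal sketch of this untrained optimization loop is given below. Here `ccn`, `forward_H`, and the regularization weight `tau` are placeholders for the CCN, the MMW scattering model \(\mathcal{H}\), and the CTV weight; the loop is illustrative rather than our exact implementation.

```python
import torch

def complex_tv(x):
    """Complex-valued total variation: mean magnitude of spatial differences of the complex image."""
    dx = x[..., 1:, :] - x[..., :-1, :]
    dy = x[..., :, 1:] - x[..., :, :-1]
    return dx.abs().mean() + dy.abs().mean()

def untrained_reconstruct(ccn, forward_H, E_s, z, iters=200, lr=1e-3, tau=1e-3):
    """Fit the network to a single sparse measurement E_s; no pre-training is involved.

    ccn:       complex-valued CNN f_theta (placeholder module)
    forward_H: callable implementing the MMW scattering model H (placeholder)
    E_s:       sparsely sampled complex echo
    z:         network input, e.g. the RMA back-projection of E_s
    """
    opt = torch.optim.Adam(ccn.parameters(), lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        I = ccn(z)                                                   # current scene estimate f_theta(z)
        loss = (forward_H(I) - E_s).abs().pow(2).mean() + tau * complex_tv(I)
        loss.backward()
        opt.step()
    return ccn(z).detach()                                           # reconstructed target scene
```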
a We optimize the CCN parameters following the physical constraint (Eq. (4)) in an untrained learning manner to reconstruct the target scene. The CCN network is composed of a complex-valued Conv-BN-ReLU layer, five complex-valued res-blocks, and a complex-valued Conv layer. b The reconstruction results of various algorithms with the statistically sparse sampling strategy.
To evaluate the performance of the reported reconstruction method, the commonly used RMA, state-of-the-art CS-based methods including compressed sensing-complex gradient (CS-CG)43 and the alternating direction method of multipliers (ADMM)44, and DL-based reconstruction networks were employed for comparison. We trained four complex-valued UNet-based45 networks, corresponding to the random array at a 10% sampling ratio, the statistically optimized array at a 10% sampling ratio, the random array at a 25% sampling ratio, and the statistically optimized array at a 25% sampling ratio. Details of these DL networks can be found in Supplementary Note 5.1. As shown in Fig. 2, the reported untrained reconstruction retrieves clearer textures of concealed targets, while RMA produces more artifacts and the other CS-based methods can neither suppress clutter components nor recover target details. Moreover, the comparison in Table 1 shows that the reported untrained learning outperforms existing algorithms under both random and statistics-based sampling, with average PSNR improvements of 2.61 dB and 4.19 dB over existing CS-based approaches at 10% and 25% sampling ratios, respectively. Compared to the DL methods, untrained learning employs physics-based constraints, making the reconstruction quality more robust (with a smaller standard deviation). More details are provided in Supplementary Note 7.
Based on our experimental implementation, the average solution times of the RMA, CS-CG, and ADMM methods, run in Matlab on 2 × Intel E5-2687W CPUs, are 2 s, 610 s, and 66 s, respectively. While the untrained method has relatively high complexity, our testing shows that 100–200 iterations were sufficient for identifying concealed targets in most cases, taking <21 s on an NVIDIA RTX 4090 GPU; the running time drops below 1.6 s on an NVIDIA H100 GPU. Details on the complexity and running time analysis are provided in Supplementary Note 5.4. The reported framework does not require a scanning process, so it can be adapted to most application scenarios, such as airports and major train stations, where it can safely and conveniently inspect items hidden under clothing and enhance the efficiency of security checks.
Concealed target detection
To evaluate the concealed target detection performance of the reported technique, we adopted the commonly used object detection network YOLOv846. Similar to multi-class detection works47,48,49, the detection network was employed to distinguish five kinds of common hidden targets, including knives, wrenches, phones, guns, and EPM. We utilized the F1 score and mean average precision (mAP)50 to evaluate the detection results and the imaging methods' performance. More details can be found in Supplementary Note 5.5. The results are presented in Fig. 3. When using the detection network trained with full-sampled echo reconstructions, the combination of our statistically sparse array and untrained reconstruction exhibits superior accuracy over the others. At a 25% sampling ratio, this combination produces absolute improvements of more than 0.4 in F1 score and mAP50 compared to the existing imaging schemes (Fig. 3a). Figure 3b illustrates the detection results with bounding boxes at a 10% sampling ratio. At this extremely low sampling ratio, only the combination of the statistically sparse array and untrained reconstruction yields accurate localization and classification of the concealed objects. The DL method with statistically sparse sampling can recover the body shape but fails to reconstruct the concealed target. If we train the detection network on the corresponding sparsely sampled echo reconstructions, we achieve higher accuracy than when training on full-sampled echo reconstructions (Fig. 3c). To sum up, by employing the statistically sparse array design and untrained learning for robust reconstruction, we achieve successful single-shot detection of targets using only 10% of the original full antenna array elements. This indicates that the reported technique is promising for reducing the cost of antenna arrays for single-shot MMW security inspection by an order of magnitude.
a The numerical comparison of detection performance at a 25% sampling ratio. The detection network was trained with full-sampled echo reconstructions. b The detection results (SR = 10%) using the network trained with full-sampled echo reconstructions. The combination of statistical sampling strategy and untrained reconstruction can stably recover clear textures from 10% sparsely sampled echoes for detecting concealed targets. c The detection results of the network trained with sparsely sampled echo reconstructions. According to the metrics and detection results, these networks exhibit higher accuracy compared to the networks trained with full-sampled echo reconstructions. 'EPM' is the abbreviation for explosive powdered material.
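The detection stage follows the standard Ultralytics YOLOv8 workflow, as in the illustrative sketch below; the dataset configuration file, model size, and thresholds are assumptions rather than our exact training settings.

```python
from ultralytics import YOLO

# Train a detector on reconstructed MMW images labeled with the five target classes
# (knife, wrench, phone, gun, EPM); 'mmw_targets.yaml' is a hypothetical dataset config.
model = YOLO("yolov8n.pt")
model.train(data="mmw_targets.yaml", epochs=100, imgsz=640)

# Evaluate on the validation split (reports mAP50, mAP50-95, etc.)
metrics = model.val()

# Run detection on a reconstructed image and inspect the predicted boxes
results = model.predict("reconstruction.png", conf=0.25)
for box in results[0].boxes:
    print(box.cls, box.conf, box.xyxy)   # class id, confidence, bounding box
```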
Applicability and generalization
We conducted extensive experiments to assess our system's applicability and generalization for reconstruction and detection, encompassing different clothing, subject positioning, body shape, and target positioning and status. Exemplar reconstruction and detection results are presented in Fig. 4, and more detailed results can be found in Supplementary Note 8. Drawing from the experimental results, we summarize the main conclusions as follows:
1. Clothing: As shown in Fig. 4a and Supplementary Note 8.1, common materials such as cotton, synthetic fibers, and blended fabrics are penetrable by MMW, so wearing clothes made of these materials does not affect the accuracy of reconstruction and detection. Woolen products allow MMW penetration to some extent but can affect the detection of MMW-absorbing targets. In contrast, MMW can hardly penetrate leather items, which obstructs the reconstruction and detection of targets concealed under such clothing. Thicker garments made from MMW-penetrable materials (such as sweaters and down jackets), or layering multiple garments (e.g., T-shirt + sweater + down jacket), do not hinder the reconstruction and detection processes. Hence, we advise operators to instruct individuals undergoing scans to remove clothing made of leather, wool, or fur.

2. Body shape: The imaging coverage area of our system is set at around 185 cm in height and 90 cm in width. We tested multiple subjects ranging from 160 cm to 189 cm in height and from 45 kg to 95 kg in weight, and experimentally validated that the system can successfully image and detect concealed targets on people of common body shapes (Supplementary Note 8.2).

3. Subject position: Orientation variation of the subject may impact detection accuracy. We take the orientation facing and parallel to the antenna array as the reference. When targets are hidden on the subject's chest, the detectable range extends to a left or right rotation of 20° at both 10% and 25% sampling ratios (Fig. 4b, Supplementary Note 8.3). The detectable ranges for forward, backward, and lateral movements are 10 cm, 20 cm, and 40 cm (25% sampling ratio) / 15 cm (10% sampling ratio), respectively, as depicted in Fig. 4c and Supplementary Note 8.3. In practice, subjects can be positioned within the detectable range to ensure successful detection.

4. Target shape, position, and status: Our prototype was designed to detect multiple classes of concealed targets, including EPM, knives, phones, guns, and wrenches (Fig. 3c). The length of these targets ranges from 7 to 15 cm, and their width from 2 to 11 cm. The employed YOLOv8 network has been validated in handling variations of target position and status46. In addition, we conducted a series of experiments (Supplementary Note 8.4) to validate that, within the detectable range, the detection network successfully detects hidden targets regardless of their position, rotation angle, or open/closed status.
a MMW-penetrable clothing includes common materials such as cotton, synthetic fiber, and blended fabric. While wool offers some degree of MMW penetration, it can potentially impact detection accuracy. MMW can hardly penetrate leather items. Common layering of multiple MMW-penetrable garments does not affect the reconstructed image quality or detection accuracy. b The detectable range under subject rotation: up to 20° of rotation to the left or right, with the human body facing the array as the reference. c The reconstruction robustness (PSNR/dB) and detectable range under varying subject positions. We show the difference between the PSNR values at offset positions and the PSNR value at the original position. The red box indicates the range within which hidden targets can be detected by the detection network. More details are presented in Supplementary Note 8.
Discussion
In this work, we developed a large-scale single-shot MMW imaging system for low-cost security inspection. We collected a large-scale MMW human security inspection dataset of real-captured MMW echoes. Based on the dataset, we proposed a statistical sampling approach for sparse MMW array design that maintains high-fidelity reconstruction quality while minimizing the number of elements. In addition, we reported an untrained reconstruction approach based on a lightweight complex-valued neural network for sparsely sampled measurements. By integrating iterative optimization and neural network techniques, this approach overcomes the black-box limitation of end-to-end neural networks and offers superior reconstruction quality and high reliability for sensitive security inspection. Furthermore, we trained a concealed target detection network, achieving automatic high-precision detection from the reconstructed images. Experiments demonstrated that the reported framework achieved accurate detection of concealed centimeter-sized targets with a 10% sparse array, whereas other contemporary approaches failed at such a low sampling ratio. We conducted comprehensive experiments to validate the performance of the reported technique under variations in clothing, subject positioning, body shape, target positioning, and target status. These experiments validated that the reported system is effective and robust in common security-check scenarios. Overall, the technique has the potential to decrease array cost by more than one order of magnitude for efficient MMW security checks.
Aiming at practical applications, the generalization and running efficiency of the reported technique can be further enhanced. First, dataset diversity can be improved by collecting echo data corresponding to a wider variety of body shapes. Second, convolution-accelerating hardware such as high-FLOPS GPUs (see Supplementary Note 5.4), FPGAs51, and dedicated SoCs52,53 can be utilized to speed up reconstruction in practice. We can also investigate low-precision networks with increased efficiency and comparable performance54,55. Third, based on a prior on the subject's physique, defining a region of interest (ROI) and ignoring reconstruction outside the ROI can effectively reduce the number of network parameters, thereby further improving computational efficiency. Fourth, we can introduce joint learning of the sampling array and the reconstruction network to further boost efficiency and accuracy. When training converges, the network takes sparsely sampled signals as input and outputs the reconstruction results without the need for additional iterations.
The statistical sparse sampling strategy can be applied not only to SISO arrays but also to MIMO ones. For topology design, we can apply the statistical sampling strategy to the equivalent SISO array of a full MIMO array. The statistically sampled equivalent SISO array is then mapped to a MIMO array with fewer antenna elements. Besides the aforementioned synthetic aperture radar (SAR) imaging systems, the statistical sparse sampling strategy can also be extended to electrical scanning imaging systems such as electronically scanned antennas (ESAs). In the context of dynamic beamforming, conventional ESAs rely on the phased array technique, which enables high-fidelity beam control. However, phased arrays entail significant system cost and hardware complexity, which limits their applicability in certain contexts. By employing the statistical sparse sampling strategy, it is possible to reduce array density with effective cost savings.
Further, we can introduce the end-to-end sensing strategy8,56 to enhance efficiency; it omits the imaging process and extracts high-dimensional semantic information directly from raw measurements. Such an image-free sensing strategy can effectively alleviate the storage and bandwidth load originating from the image reconstruction process. Besides, it remains impervious to the signal distortion and loss resulting from image reconstruction, thereby facilitating improvements in recognition accuracy. Moreover, joint training of the sampling matrix and the sensing network can help achieve optimal sensing performance, further enhancing the accuracy of concealed target recognition.
Method
System implementation
We built an MMW imaging prototype for the following experiments (Fig. 5a). The prototype emits broadband linear frequency modulated (LFM) signals and receives echoes using the dechirp technique. The system diagram is shown in Fig. 5b. The first and second local oscillators (LO1 and LO2) output carrier signals f1 and f2, respectively, which are mixed with the LFM signal fd generated by direct digital frequency synthesis to obtain f1 + fd and f2 + fd. After N-fold frequency multiplication, the transmission signal N(f1 + fd) and the reference signal N(f2 + fd) are obtained. The echo N(f1 + fd) is demodulated with the reference signal to obtain the intermediate frequency signal N(f1 − f2). The reference intermediate frequency signal N(f1 − f2) is obtained by mixing and N-fold frequency multiplication of the LO signals. Finally, analog IQ demodulation produces the baseband signal, followed by digital sampling.
a The working system for human security inspection. b The block diagram of the transceiver system. c A resolution target and corresponding full-sampled imaging results. The groups 4, 5, 6, and 7 correspond to resolutions of 4 mm, 5 mm, 6 mm, and 7 mm, respectively. d The untrained reconstructions of random and statistically optimized sparse arrays. The resolutions of these arrays are: 10% random, >7 mm; 10% statistically optimized, 6 mm; 25% random, 6 mm; 25% statistically optimized, 5 mm; full array, 5 mm.
The working frequency of the prototype ranges from 32 to 37 GHz. The prototype consists of a linear monostatic array mounted on a vertical mechanical scanning structure. The antennas are waveguide slot antennas with vertical linear polarization. The transmit and receive arrays are arranged in a staggered manner, so an equivalent sampling point is formed at the center between two neighboring transmit and receive elements. The spacing of the equivalent sampling points meets the system requirements57. The transmit antennas work in sequence, with the two neighboring receive antennas collecting the EM wave simultaneously. The aperture size is 2 m × 1 m, providing a spatial resolution of 5 mm in both dimensions (Fig. 5d). Statistically optimized arrays exhibit higher resolution than random arrays. With the reported statistically sparse sampling strategy, a 25% sparse array achieves the same resolution as the original full array; even a 10% sparse array achieves a resolution of 6 mm.
Analysis on the statistical maps of MMW echoes
We analyzed the phase gradient map of a collection of real-captured echoes from a fully sampled antenna array. This section elucidates why this map effectively reflects the statistical importance ranking of the phase in MMW echoes.
We consider a scenario where the target is distributed along the x and y axes. In this context, \(\sigma \left(x,y\right)\) represents the scattering coefficient, k0 corresponds to the wavenumber, and \(x^{\prime}\) and \(y^{\prime}\) denote the respective antenna locations along these axes. Additionally, R0 denotes the distance between the target and the antenna array. The gradient of echo along the x-axis can be mathematically expressed as
where C is a constant number, and
Herein, Θx and Θy represent the antenna beamwidths along the azimuth and height directions, respectively. Due to the relatively uniform distribution of the scattering coefficients σ(x, y) of human targets, the derivatives \(\frac{\partial s(x^{\prime},y^{\prime})}{\partial x^{\prime}}\to 0\) and \(\frac{\partial s(x^{\prime},y^{\prime})}{\partial y^{\prime}}\to 0\). In other words, selecting elements with small phase gradients can improve the quality of illumination of the target of interest. A more detailed analysis is provided in Supplementary Note 4.
Moreover, the averaged amplitude map reflects element significance, with higher amplitude usually indicating greater importance. By multiplying the averaged amplitude map with the inverse phase gradient, we obtain a statistical ranking of element importance. This ranking enables selectively choosing antenna elements of the highest importance at various sampling ratios.
Statistically sparse sampling
We developed a quantitative statistically sparse sampling strategy to obtain the sparse pattern M. Different from handcrafted designs or other sparse sampling strategies58, we selected the elements in the order of statistical importance in \(\bar{M}\) with a fixed probability. Given the statistical prior \(\bar{M}\) and a uniform random function \(r(n)\) ranging from 0 to 1, we selected the elements n satisfying \(r(n) > S\), where S is a hyperparameter controlling the sparsity of the sampling pattern. The sparse pattern M can be formulated as

$$M(n)=\begin{cases}1, & r(n) > S,\\ 0, & \text{otherwise},\end{cases}$$

where '1' indicates that the element is selected. When 1 − S is larger than the sampling ratio, the total number of selected elements exceeds the preset number determined by the sampling ratio, and the last out-of-range elements are discarded. When 1 − S equals the sampling ratio, the obtained array is a uniformly random array. When 1 − S is less than the sampling ratio, there are not enough elements in the resulting sparse array.
Untrained reconstruction based on CCN
The objective of the reported untrained reconstruction is formulated as
where θ denotes the parameters of the CCN fθ, Es is the sparse MMW measurement, \(\mathcal{H}\) is the physical model of MMW scattering, and z is the input of the network. The scattering process \(\mathcal{H}\) can be denoted as
where \(\mathcal{F}_{3D}\{\cdot\}\) represents the 3D spatial Fourier transform over all spatial dimensions of the imaging region, \(\mathcal{F}_{2D}^{-1}\{\cdot\}\) denotes the 2D spatial inverse Fourier transform over the 2D array aperture, and \(\mathrm{IN}_{k}\) indicates the interpolation with respect to the wavenumber k.
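For concreteness, the following is a minimal sketch of the objective and forward operator consistent with the definitions above; the regularization weight \(\tau\) and the exact operator composition are illustrative assumptions rather than the verbatim published equations.

$$\hat{\theta}=\arg\min_{\theta}\ \big\Vert \mathcal{H}\big(f_{\theta}(z)\big)-E_{s}\big\Vert_{2}^{2}+\tau\,\mathrm{CTV}\big(f_{\theta}(z)\big),\qquad \mathcal{H}(I)=\mathcal{F}_{2D}^{-1}\Big\{\mathrm{IN}_{k}\big[\mathcal{F}_{3D}\{I\}\big]\Big\},\qquad I=f_{\hat{\theta}}(z).$$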
As shown in Fig. 2a, the reported lightweight CCN has 7 blocks, consisting of an input complex-valued Conv-BN-ReLU block, 5 complex-valued res-blocks, and an output complex-valued Conv layer in sequence. We utilized BFloat16 (BF16) quantization to reduce the computational workload while maintaining high reconstruction accuracy. All the complex-valued convolutional layers in the network have 256 complex-valued kernels (kernel size = 3, stride = 1, padding = 1). What differentiates the CCN from a real-valued convolutional network is the complex-valued convolutional layer, which takes the real and imaginary parts of a complex-valued feature as a two-channel input and convolves them with complex-valued kernels. We took the 3D scene reconstructed by RMA as the input z and used the Adam59 solver with a learning rate of 0.001 to update the parameters θ. Detailed analyses of the network depth, quantization, gradient descent algorithm, and regularization can be found in Supplementary Note 5.2. The network was implemented on the PyTorch 2.0 platform.
1. Complex-valued convolutional layer: Given the input complex-valued feature map F = FR + iFI and the complex-valued convolutional kernel K = KR + iKI, the complex-valued convolution is denoted as

$$F * K = (F_{R}+iF_{I}) * (K_{R}+iK_{I}) = (F_{R} * K_{R}-F_{I} * K_{I})+i\,(F_{R} * K_{I}+F_{I} * K_{R}), \quad (6)$$

where ∗ denotes the convolution operation. In the implementation, we treat complex values as two-channel real values (real channel R and imaginary channel I).

2. Complex-valued ReLU: The complex-valued ReLU is denoted as

$$\text{Complex ReLU}(x)=\text{ReLU}(x_{R})+i\,\text{ReLU}(x_{I}), \quad (7)$$

where the ReLU function is

$$\text{ReLU}(x)=\begin{cases}x, & \text{if } x\ge 0,\\ 0, & \text{otherwise}.\end{cases} \quad (8)$$

3. Complex-valued batch normalization: The complex-valued batch normalization whitens the complex-valued features by multiplying the 0-centered data \(F-\mathbb{E}(F)\) by the inverse square root of the 2 × 2 covariance matrix V:

$$\widetilde{F}={V}^{-\frac{1}{2}}\left(F-\mathbb{E}(F)\right), \quad (9)$$

where \(\mathbb{E}\) represents the mathematical expectation, and the covariance matrix V is denoted as

$$V=\begin{bmatrix}V_{rr}&V_{ri}\\ V_{ir}&V_{ii}\end{bmatrix}=\begin{bmatrix}\text{Cov}(F_{R},F_{R})&\text{Cov}(F_{R},F_{I})\\ \text{Cov}(F_{I},F_{R})&\text{Cov}(F_{I},F_{I})\end{bmatrix}. \quad (10)$$

Same as the traditional BN, the complex-valued BN also has the learnable scaling parameter γ and shift parameter β:

$$\text{Complex BN}(F)=\gamma \widetilde{F}+\beta . \quad (11)$$

4. Complex-valued res-block: The complex-valued res-block contains complex-valued Conv-BN-ReLU-Conv-BN layers arranged sequentially, and the final BN layer's output feature is directly summed with the input feature. In our implementation, we employed a network architecture comprising five complex-valued res-blocks.
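A minimal sketch of these building blocks is given below, assuming the two-channel real-valued implementation described above. The complex convolution follows Eq. (6) and the split ReLU follows Eq. (7), while batch normalization is simplified to independent real-valued BN on the two channels rather than the whitening form of Eqs. (9)-(11); layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex convolution (F_R + iF_I) * (K_R + iK_I), Eq. (6), via two real-valued convolutions."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)   # K_R
        self.conv_i = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)   # K_I

    def forward(self, x_r, x_i):
        return (self.conv_r(x_r) - self.conv_i(x_i),    # real part
                self.conv_r(x_i) + self.conv_i(x_r))    # imaginary part

def complex_relu(x_r, x_i):
    """Split ReLU applied to real and imaginary channels, Eq. (7)."""
    return torch.relu(x_r), torch.relu(x_i)

class ComplexResBlock(nn.Module):
    """Conv-BN-ReLU-Conv-BN with an identity skip connection (BN simplified to per-channel real BN)."""
    def __init__(self, ch=256):
        super().__init__()
        self.conv1, self.conv2 = ComplexConv2d(ch, ch), ComplexConv2d(ch, ch)
        self.bn1_r, self.bn1_i = nn.BatchNorm2d(ch), nn.BatchNorm2d(ch)
        self.bn2_r, self.bn2_i = nn.BatchNorm2d(ch), nn.BatchNorm2d(ch)

    def forward(self, x_r, x_i):
        y_r, y_i = self.conv1(x_r, x_i)
        y_r, y_i = complex_relu(self.bn1_r(y_r), self.bn1_i(y_i))
        y_r, y_i = self.conv2(y_r, y_i)
        y_r, y_i = self.bn2_r(y_r), self.bn2_i(y_i)
        return x_r + y_r, x_i + y_i                      # residual summation
```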
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The minimum data generated in this study have been deposited in the Zenodo database under accession code https://zenodo.org/doi/10.5281/zenodo.11091264 (ref. 60). The complete data are available under restricted access in accordance with the funded project requirements. Access can be obtained by reasonable request to the corresponding authors.
Code availability
The demo code of the reported technique is available at https://github.com/bianlab/MMW.
References
Triplett, W. Technology will assist the fight against terrorism. Nature 413, 238–240 (2001).
Li, S., Wang, S., An, Q., Zhao, G. & Sun, H. Cylindrical MIMO array-based near-field microwave imaging. IEEE Trans. Antennas Propag. 69, 612–617 (2020).
Scribner, D. A., Kruer, M. R. & Killiany, J. M. Infrared focal plane array technology. Proc. IEEE 79, 66–85 (1991).
De Chaumont, F. et al. Real-time analysis of the behaviour of groups of mice via a depth-sensing camera and machine learning. Nat. Biomed. Eng. 3, 930–942 (2019).
Li, L. et al. Machine-learning reprogrammable metasurface imager. Nat. Commun. 10, 1082 (2019).
Lynch, J. J. et al. Passive millimeter-wave imaging module with preamplified zero-bias detection. IEEE Trans. Microw. Theory Techn. 56, 1592–1600 (2008).
Sheen, D. M., McMakin, D. L. & Hall, T. E. Three-dimensional millimeter-wave imaging for concealed weapon detection. IEEE Trans. Microw. Theory Tech. 49, 1581–1592 (2001).
Liu, C. et al. A programmable diffractive deep neural network based on a digital-coding metasurface array. Nat. Electron. 5, 113–122 (2022).
Hunt, J. et al. Metamaterial apertures for computational imaging. Science 339, 310–313 (2013).
Cui, T. J., Qi, M. Q., Wan, X., Zhao, J. & Cheng, Q. Coding metamaterials, digital metamaterials and programmable metamaterials. Light Sci. Appl. 3, 218–218 (2014).
Li, L. et al. Electromagnetic reprogrammable coding-metasurface holograms. Nat. Commun. 8, 197 (2017).
Zhuge, X. & Yarovoy, A. G. Three-dimensional near-field MIMO array imaging using range migration techniques. IEEE Trans. Image Process. 21, 3026–3033 (2012).
Liu, H. et al. Millimeter-wave image deblurring via cycle-consistent adversarial network. Electronics 12, 741 (2023).
Kupyn, O., Budzan, V., Mykhailych, M., Mishkin, D. & Matas, J. DeblurGAN: blind motion deblurring using conditional adversarial networks. In: IEEE/CVF Conf. Comput. Vis. Pattern Recogn., pp. 8183–8192 (2018).
Desai, M. D. & Jenkins, W. K. Convolution backprojection image reconstruction for spotlight mode synthetic aperture radar. IEEE Trans. Image Process. 1, 505–517 (1992).
Fromenteze, T. et al. A transverse spectrum deconvolution technique for MIMO short-range fourier imaging. IEEE Trans. Geosci. Remote Sens. 57, 6311–6324 (2019).
Álvarez, Y. et al. Fourier-based imaging for multistatic radar systems. IEEE Trans. Microw. Theory Tech. 62, 1798–1810 (2014).
Abbasi, M., Shayei, A., Shabany, M. & Kavehvash, Z. Fast fourier-based implementation of synthetic aperture radar algorithm for multistatic imaging system. IEEE Trans. Instrum. Meas. 68, 3339–3349 (2018).
Li, S., Wang, S., Amin, M. G. & Zhao, G. Efficient near-field imaging using cylindrical MIMO arrays. IEEE Trans. Aerosp. Electron. Syst. 57, 3648–3660 (2021).
Yang, B., Zhuge, X., Yarovoy, A. & Ligthart, L. UWB MIMO antenna array topology design using PSO for through dress near-field imaging. In: Eur. Microw. Conf., pp. 1620–1623 (2008).
Gonzalez-Valdes, B. et al. Sparse array optimization using simulated annealing and compressed sensing for near-field millimeter wave imaging. IEEE Trans. Antennas Propag. 62, 1716–1722 (2013).
Tan, K., Wu, S., Wang, Y., Ye, S., Chen, J. & Fang, G. A novel two-dimensional sparse MIMO array topology for UWB short-range imaging. IEEE Antennas Wireless Propag. Lett. 15, 702–705 (2015).
Tan, K. et al. On sparse MIMO planar array topology optimization for uwb near-field high-resolution imaging. IEEE Trans. Antennas Propag. 65, 989–994 (2016).
Wang, S. et al. Compressive sensing based sparse MIMO array synthesis for wideband near-field millimeter-wave imaging. IEEE Trans. Aerosp. Electron. Syst. 59, 7681–7697 (2023).
Wang, S. et al. Convex optimization-based design of sparse arrays for 3-D near-field imaging. IEEE Sens. J. 23, 9640–9648 (2023).
Coker, J. D. & Tewfik, A. H. Compressed sensing and multistatic SAR. In: Int. Conf. Acoust. Speech Signal Process., pp. 1097–1100 (2009).
Li, S., Zhao, G., Zhang, W., Qiu, Q. & Sun, H. ISAR imaging by two-dimensional convex optimization-based compressive sensing. IEEE Sens. J. 16, 7088–7093 (2016).
Li, S. et al. Near-field radar imaging via compressive sensing. IEEE Trans. Antennas Propag. 63, 828–833 (2014).
Barzegar, A. S., Cheldavi, A., Sedighy, S. H. & Nayyeri, V. 3-D through-the-wall radar imaging using compressed sensing. IEEE Geosci. Remote Sens. Lett. 19, 1–5 (2021).
Ichikawa, K. & Hirose, A. Singular unit restoration in InSAR using complex-valued neural networks in the spectral domain. IEEE Trans. Geosci. Remote Sens. 55, 1717–1723 (2016).
Hu, C., Wang, L., Li, Z. & Zhu, D. Inverse synthetic aperture radar imaging using a fully convolutional neural network. IEEE Geosci. Remote Sens. Lett. 17, 1203–1207 (2019).
Li, L. et al. DeepNIS: Deep neural network for nonlinear electromagnetic inverse scattering. IEEE Trans. Antennas Propag. 67, 1819–1825 (2018).
Wei, Z. & Chen, X. Deep-learning schemes for full-wave nonlinear inverse scattering problems. IEEE Trans. Geosci. Remote Sens. 57, 1849–1860 (2018).
Rostami, P., Zamani, H., Fakharzadeh, M., Amini, A. & Marvasti, F. A deep learning approach for reconstruction in millimeter-wave imaging systems. IEEE Trans. Antennas Propag. 71, 1180–1184 (2022).
Bao, J. et al. Fine-grained image generation network with radar range profiles using cross-modal visual supervision. IEEE Trans. Microw. Theory Tech. 72, 1339–1352 (2023).
Wang, Z., Bovik, A. C., Sheikh, H. R. & Simoncelli, E. P. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13, 600–612 (2004).
Ulyanov, D., Vedaldi, A. & Lempitsky, V. Deep image prior. In: IEEE/CVF Conf. Comput. Vis. Pattern Recogn., pp. 9446–9454 (2018).
Wang, F. et al. Phase imaging with an untrained neural network. Light Sci. Appl. 9, 77 (2020).
Hirose, A. Complex-valued neural networks: advances and applications (Wiley-IEEE Press, 2013).
Zhang, Z., Wang, H., Xu, F. & Jin, Y.-Q. Complex-valued convolutional neural network and its application in polarimetric sar image classification. IEEE Trans. Geosci. Remote Sens. 55, 7177–7188 (2017).
Trabelsi, C. et al. Deep complex networks. In: Int. Conf. Learn. Representations https://arxiv.org/abs/1705.09792 (2018).
Gao, Y. & Cao, L. Iterative projection meets sparsity regularization: towards practical single-shot quantitative phase imaging with in-line holography. Light Adv. Manuf. 4, 1–17 (2023).
Li, S., Zhao, G., Sun, H. & Amin, M. Compressive sensing imaging of 3-d object by a holographic algorithm. IEEE Trans. Antennas Propag. 66, 7295–7304 (2018).
Boyd, S., Parikh, N., Chu, E., Peleato, B. & Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3, 1–122 (2011).
Chang, X. et al. Complex-domain-enhancing neural network for large-scale coherent imaging. Adv. Photon. Nexus 2, 046006–046006 (2023).
Jocher, G., Chaurasia, A. & Qiu, J. YOLO by Ultralytics. https://github.com/ultralytics.
Wang, C. et al. Concealed object detection for millimeter-wave images with normalized accumulation map. IEEE Sens. J. 21, 6468–6475 (2020).
Yuan, M., Zhang, Q., Li, Y., Yan, Y. & Zhu, Y. A suspicious multi-object detection and recognition method for millimeter wave sar security inspection images based on multi-path extraction network. Remote Sens. 13, 4978 (2021).
Su, B. & Yuan, M. Object recognition for millimeter wave MIMO-SAR images based on high-resolution feature recursive alignment fusion network. IEEE Sens. J. 23, 16413–16427 (2023).
Zheng, L. et al. Scalable person re-identification: a benchmark. In: IEEE Int. Conf. Comput. Vis., pp. 1116–1124 (2015).
Zhang, C. et al. Caffeine: toward uniformed representation and acceleration for deep convolutional neural networks. IEEE T. Comput.-Aided Des. Integr. Circuits Syst. 38, 2072–2085 (2018).
Hegde, G. & Kapre, N. CaffePresso: accelerating convolutional networks on embedded SoCs. IEEE Trans. Embed. Comput. Syst. 17, 1–26 (2017).
Meloni, P. et al. NEURAghe: exploiting CPU-FPGA synergies for efficient and flexible CNN inference acceleration on Zynq SoCs. ACM Trans. Reconfig. Technol. Syst. 11, 1–24 (2018).
Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R. & Bengio, Y. Quantized neural networks: training neural networks with low precision weights and activations. J. Mach. Learn. Res. 18, 6869–6898 (2017).
Sun, X. et al. Ultra-low precision 4-bit training of deep neural networks. Adv. Neural Inf. Process. 33, 1796–1807 (2020).
Wetzstein, G. & Kauvar, I. Optically sensing neural activity without imaging. Nat. Photonics 14, 340–341 (2020).
Sheen, D., McMakin, D. & Hall, T. Near-field three-dimensional radar imaging techniques and applications. Appl. Opt. 49, 83–93 (2010).
Li, D., Gao, Z. & Bian, L. Efficient large-scale single-pixel imaging. Opt. Lett. 47, 5461–5464 (2022).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. arXiv https://arxiv.org/abs/1412.6980 (2014).
Bian, L. et al. Towards large-scale single-shot millimeter-wave imaging for low-cost security inspection. https://doi.org/10.5281/zenodo.11091264.
Acknowledgements
We appreciate the assistance from Beijing Zhongdun Anmin Fenxi Technology Limited Company in conducting certain measurements. This work was supported by the National Natural Science Foundation of China under Grants 62322502 (L.B.) and 62131003 (L.B.), and the Guangdong Province Key Laboratory of Intelligent Detection in Complex Environment of Aerospace, Land and Sea under Grant 2022KSYS016 (L.B.).
Author information
Authors and Affiliations
Contributions
L.B., D.L., S.W., and S.L. conceived the idea. D.L., S.W., H.L., J.W., C.T., H.X., G.Z., and X.C. conducted the experiments. D.L. and S.W. performed data analysis. L.B., S.L., and J.Z. supervised the project. All the authors participated in the analysis and discussion of the results.
Corresponding authors
Ethics declarations
Competing interests
L.B. and D.L. hold patents on technologies related to the devices developed in this work (China patent numbers ZL202210778396.7 and ZL202010522279.5) and submitted related patent applications. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Qi Ye, Shitao Zhu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Bian, L., Li, D., Wang, S. et al. Towards large-scale single-shot millimeter-wave imaging for low-cost security inspection. Nat Commun 15, 6459 (2024). https://doi.org/10.1038/s41467-024-50288-y