Introduction

With the advancement of science and technology, intelligent vehicles and autonomous driving are increasingly becoming part of everyday life1. Intelligent vehicles are multidisciplinary integrated systems2,3, typically comprising three main components: environmental perception, behavior decision-making, and motion planning and control. Among these, environmental perception is the prerequisite and foundation for intelligent vehicles, and its performance directly impacts the quality of vehicle decision-making and control4. As major participants in traffic environments, vehicles are crucial detection objects in environmental perception tasks5, and their reliable recognition plays a vital role in improving traffic safety and operational efficiency6,7. Recognizing vehicle types in real-world scenarios such as underground garages, indoor parking lots, and intersections can not only assist traffic management and law enforcement, but also provide necessary environmental perception information for unmanned vehicles8. In recent years, with the rapid development of computer technology and sensor hardware, many new techniques and methods have been applied to vehicle detection and recognition9,10,11. From the perspective of sensors, common automatic vehicle-type recognition and classification methods are mainly based on video images12,13,14 or LiDAR15,16,17.

Nowadays, vehicle detection and recognition have achieved high levels of performance in favorable road environments18,19. However, the performance of camera-based detection and recognition is greatly reduced in harsh conditions such as nighttime, rain, snow, and fog20, whereas LiDAR can maintain good performance even in low-light situations21. As one of the primary sensors equipped on most new energy vehicles, LiDAR detects obstacles by transmitting and receiving laser echo signals, thereby providing perception information about the traffic environment through point clouds22,23. In comparison to cameras, LiDAR is unaffected by lighting conditions and adapts well to various adverse weather conditions. Moreover, the point clouds it collects contain multi-channel data, which can not only be processed directly from a three-dimensional perspective24 but also be converted into two-dimensional images for more efficient processing25. This makes LiDAR widely used in many fields, including intelligent vehicle systems26. Jin et al.27 proposed a robust LiDAR-based vehicle detection method in which an improved Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm effectively handled the uneven distribution of point clouds and the resulting deterioration of clustering quality, and an advanced PointNet built with deep learning technology stably completed vehicle target recognition. Zhou et al.28 developed a new framework based on a Convolutional Neural Network and LiDAR data, achieving vehicle detection from roadside LiDAR data. Wu et al.29 fused features obtained by projecting 3D-LiDAR data into image space with region features from camera images, accurately classifying various vehicle types and pedestrians using a Faster R-CNN. Wu et al.30 proposed an improved DBSCAN method named 3D-SDBSCAN for distinguishing vehicle points from snowflakes in LiDAR data, effectively addressing weather-related occlusion problems. The aforementioned methods effectively leverage the high precision and robustness of LiDAR; however, they only provide coarse-grained recognition and classification, merely differentiating between objects such as cars, trucks, or pedestrians without achieving fine-grained recognition of specific vehicle types.

In order to achieve precise recognition of vehicle types, numerous deep learning-based methods31,32 have been proposed that utilize video images captured by cameras for car identification. Ke et al.33 introduced a dense attention network-based approach for fine-grained detection and recognition of vehicle types. Liu et al.34 presented the Multi-task Visual Language Model for vehicle recognition, while Joseph Sanjaya et al.35 conducted experiments validating the effectiveness of the EfficientNet architecture for car image recognition. However, images captured at night often suffer from low brightness or quality, and the features used for recognition and classification may not be distinct enough. To address these issues, researchers have proposed various image preprocessing methods to enhance image quality and improve recognition accuracy. Histogram equalization36,37,38 enhances image contrast by redistributing pixel values so that the histogram approaches a uniform distribution, thereby improving visual effect; however, global histogram equalization can reduce image detail and ignores local contrast enhancement. Median filtering39 improves clarity by replacing each pixel value with the median of its neighborhood, reducing noise but also removing some detail; it is relatively slow and cannot remove strong noise effectively. The dark channel prior40 restores image detail and contrast by suppressing over-bright areas, but its low computational efficiency makes it unsuitable for real-time processing. Retinex41, based on the processing mechanism of the human retina, separates the reflection component of an image from the illumination component to recover detail and contrast; however, its color retention is weak, especially for brighter images. Although these methods can improve the accuracy of car image recognition and classification to a certain extent, in extremely low-light environments such as underground garages, enhancing camera-captured images may not be feasible at all.

Aiming to address the issue of LiDAR’s inability to achieve precise recognition of car models and the limitation of camera images in extremely low-light conditions, this paper proposes a novel approach that combines LiDAR point cloud data with EfficientNet and image enhancement technology to accomplish fine-grained recognition of car models. The main contributions of this study are as follows:

1) Leveraging LiDAR-collected car point cloud data, which undergoes rotation, translation, and angle projection, we realize car classification based on data augmentation techniques and the EfficientNet architecture. This provides a new processing method for fine-grained recognition of car models in the field of unmanned driving perception.

2) We propose a data augmentation method that combines Contrast Limited Adaptive Histogram Equalization (CLAHE) with Gamma correction to enhance low-brightness datasets converted from point clouds. This improves vehicle recognition accuracy and demonstrates its effectiveness across multiple models.

3) Testing and evaluation on our self-collected dataset demonstrate the practicality of the proposed method for fine-grained recognition of car models under dim conditions, achieving an overall recognition accuracy of 98.88% and an F1-score of 98.86%.

Methodology

Aiming to address the challenge of recognizing vehicle types accurately under low-light conditions, which poses significant inconvenience to traffic safety and human activities, this paper proposes a refined method for fine-grained vehicle type recognition based on EfficientNet using LiDAR point cloud data and image enhancement techniques. This approach effectively enhances the accuracy of vehicle recognition in dim conditions and exhibits enhanced practicality.

The overall framework of the proposed method is shown in Fig. 1. After data acquisition using LiDAR, the specific processing workflow includes three core steps:

1) Point Cloud Processing Module: Convert the LiDAR-captured point cloud data into common JPEG images.

2) Data Enhancement: Use CLAHE and Gamma correction techniques to process the converted images, enhancing contrast and details to improve recognition capability.

3) EfficientNet Classification: Classify the enhanced images using the EfficientNet network to achieve accurate vehicle recognition.

Point cloud processing: from PCD to JPG

Point cloud data is widely used in fields such as remote sensing target detection and autonomous driving. However, due to its large data volume and inherent sparsity, processing and recognizing point clouds directly, particularly for 3D object detection tasks, poses significant challenges in terms of computational complexity, memory requirements, and real-time performance. Converting multi-channel LiDAR point cloud data into dual-channel images on the x and y planes serves as a dimensionality reduction step, which significantly reduces the data volume while preserving critical spatial information. This transformation allows the use of well-established image processing techniques and pre-trained deep learning models, such as the EfficientNet network, to achieve higher recognition accuracy without the need to design complex 3D-specific architectures from scratch.

Although 3D vehicle detection methods can theoretically provide higher precision, they often come with increased computational costs, higher model complexity, and greater training data demands, which may not be suitable for real-time applications or resource-constrained environments. By leveraging image-based processing, we can effectively balance accuracy and efficiency, enabling the system to achieve robust recognition performance even under limited computational resources. Furthermore, this approach facilitates the integration of advanced image enhancement techniques to improve the quality of data and further enhance recognition accuracy. Therefore, converting point cloud data into dual-channel images represents a practical and efficient alternative, particularly when considering real-time applications and the challenges associated with raw 3D data processing.

Point cloud data typically contains a large number of three-dimensional coordinate points, providing information on the distribution of objects in 3D space. Preprocessing involves denoising and downsampling the point cloud data to reduce computational load and enhance processing efficiency. To facilitate subsequent processing, the point cloud data must be rotated and translated to align with a unified coordinate system. First, calculate the centroid of the point cloud data and translate it to the origin. The centroid calculation formula is as follows:

$${\text{centroid}}=\frac{1}{N}\sum\limits_{{i=1}}^{N} {{P_i}}$$
(1)

where N is the number of points and \({P_i}\) is the coordinate of the i-th point.

Fig. 1

Overall framework for LiDAR point cloud vehicle recognition combining image enhancement and EfficientNet.

Next, compute the principal direction of the point cloud data and generate a rotation matrix to align the point cloud with the coordinate axes. To convert 3D point cloud data into 2D images, project the point cloud data onto a 2D plane. The angle projection method maps each point in 3D space to 2D coordinates by calculating the projection angle:

$${\theta _y}=\arctan 2(y,x)$$
(2)
$${\theta _z}=\arctan 2(z,\sqrt {{x^2}+{y^2}} )$$
(3)

where \({\theta _y}\) and \({\theta _z}\) represent the projection angles on the y-axis and z-axis, respectively.

After projecting the point cloud data onto the 2D plane, some blank spots may appear. To fill these gaps, interpolation is applied to the image. Using the barycentric coordinate interpolation method, each pixel’s value is interpolated based on the barycentric coordinates within its adjacent triangle. The barycentric coordinate formula is as follows:

$$\left\{ \begin{gathered} u=\frac{{(v_1 \cdot v_1)(v_2 \cdot v_0) - (v_0 \cdot v_1)(v_2 \cdot v_1)}}{{(v_0 \cdot v_0)(v_1 \cdot v_1) - (v_0 \cdot v_1)^2}} \hfill \\ v=\frac{{(v_0 \cdot v_0)(v_2 \cdot v_1) - (v_0 \cdot v_1)(v_2 \cdot v_0)}}{{(v_0 \cdot v_0)(v_1 \cdot v_1) - (v_0 \cdot v_1)^2}} \hfill \\ w=1 - u - v \hfill \\ \end{gathered} \right.$$
(4)

where u, v, and w are the three barycentric weights; \(v_0\) and \(v_1\) are the vectors from one triangle vertex to the other two vertices, \(v_2\) is the vector from that vertex to the pixel, and each product in Eq. (4) denotes a dot product.
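As a worked illustration of Eq. (4), the sketch below computes the barycentric weights of a point p with respect to a triangle with vertices a, b, and c; the names a, b, c, and p are introduced here only for illustration.

```python
import numpy as np

def barycentric_weights(p, a, b, c):
    """Barycentric weights of point p in triangle (a, b, c), following Eq. (4)."""
    v0, v1, v2 = b - a, c - a, p - a            # edge vectors and point vector
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1   # dot products
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    u = (d11 * d20 - d01 * d21) / denom
    v = (d00 * d21 - d01 * d20) / denom
    return u, v, 1.0 - u - v                    # w = 1 - u - v

# Example: a pixel at the triangle centroid has weights close to (1/3, 1/3, 1/3).
print(barycentric_weights(np.array([1/3, 1/3]),
                          np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])))
```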

To meet specified size requirements, the generated images are scaled and padded. First, scale the image proportionally to the target size, then pad the edges to fit the specified dimensions. As shown in Fig. 2, these steps are integrated to efficiently convert large amounts of point cloud data into image data, facilitating subsequent image enhancement and deep learning model training.

Fig. 2

Visualization of original vehicle point cloud (Left) and converted vehicle image (Right).
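As a concrete illustration of this pipeline, the following Python sketch converts a single point cloud into an image using Open3D, NumPy, and OpenCV. The file names, output resolution, and the use of a single range-coded intensity channel are illustrative assumptions, and the barycentric gap-filling step of Eq. (4) is omitted for brevity.

```python
import numpy as np
import open3d as o3d
import cv2

def pcd_to_image(pcd_path, out_path, size=224):
    """Sketch of the PCD-to-JPG conversion (illustrative parameters)."""
    pts = np.asarray(o3d.io.read_point_cloud(pcd_path).points)  # N x 3 coordinates

    # Translate the centroid to the origin (Eq. 1).
    pts = pts - pts.mean(axis=0)

    # Align the principal directions of the cloud with the coordinate axes via SVD/PCA.
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    pts = pts @ vt.T

    # Angle projection (Eqs. 2-3): map each 3D point to two projection angles.
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    theta_y = np.arctan2(y, x)
    theta_z = np.arctan2(z, np.sqrt(x ** 2 + y ** 2))

    # Rasterize the two angles onto a square grid; encode range as pixel intensity.
    u = ((theta_y - theta_y.min()) / (theta_y.max() - theta_y.min() + 1e-9) * (size - 1)).astype(int)
    v = ((theta_z - theta_z.min()) / (theta_z.max() - theta_z.min() + 1e-9) * (size - 1)).astype(int)
    img = np.zeros((size, size), dtype=np.uint8)
    rng = np.sqrt(x ** 2 + y ** 2 + z ** 2)
    img[v, u] = np.clip(rng / (rng.max() + 1e-9) * 255, 0, 255).astype(np.uint8)

    # Gap filling (Eq. 4) and padding to the target size would follow here.
    cv2.imwrite(out_path, img)

pcd_to_image("car_sample.pcd", "car_sample.jpg")  # hypothetical file names
```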

Data enhancement: CLAHE + Gamma correction

Data enhancement techniques play a crucial role in improving image visibility and contrast, especially under poor lighting and environmental conditions. In this study, we use a combination of contrast-limited adaptive histogram equalization and Gamma correction to enhance images generated from point cloud data. This method significantly improves visual effects, thereby increasing the recognition accuracy of deep learning models.

CLAHE involves calculating the number of pixels at each gray level, and then using the cumulative distribution function to represent the cumulative probability of pixels at a given gray level, as shown in Eq. (5).

$$Fcd(k)=\sum\limits_{{j=0}}^{k} {P(j)}$$
(5)

where k is the current gray level and P(j) is the probability of gray level j.

By normalizing \(Fcd(k)\), the histogram equalization transformation function G is obtained, as shown in Eq. (6).

$$G=\frac{{Fcd(k) - Fc{d_{\hbox{min} }}}}{{1 - Fc{d_{\hbox{min} }}}}*(L - 1)$$
(6)

where \(Fc{d_{\hbox{min} }}\) is the minimum value of \(Fcd(k)\) and L is the number of gray levels.

To address the issue of excessive noise amplification in homogeneous regions of the image, a contrast limiting function B(g) is introduced, as shown in Eq. (7).

$$B(g)=\left\{ \begin{gathered} {B_{\hbox{max} }},G>{B_{\hbox{max} }} \hfill \\ g,{B_{\hbox{min} }}<G<{B_{\hbox{max} }} \hfill \\ {B_{\hbox{min} }},G<{B_{\hbox{min} }} \hfill \\ \end{gathered} \right.$$
(7)

where \({B_{\hbox{min} }}\) and \({B_{\hbox{max} }}\) are the minimum and maximum contrast values set according to the characteristics of the vehicle images, and \(B(g)\) is the output value of the function. Finally, the contrast limiting function is applied to each pixel in the image to obtain the equalized image.

Gamma correction is a nonlinear operation used to adjust the brightness of an image. By changing the Gamma value of an image, it can be made brighter or darker to better match the human visual characteristics. The Gamma correction formula is as follows:

$${I_{out}}=I_{{in}}^{\gamma }$$
(8)

where \({I_{out}}\) and \({I_{in}}\) represent the output and input pixel values, respectively, and γ is the Gamma value.

By selecting an appropriate Gamma value, the details and contrast of the image can be enhanced. Typically, the Gamma value ranges from 0.5 to 2.5. The specific value needs to be adjusted based on the characteristics and application requirements of the image. In the method proposed in this paper, the selected Gamma value is 1.5.

In practical applications, CLAHE is first applied to the input image to enhance contrast, followed by Gamma correction to adjust brightness. This significantly improves the visual quality of the image. Figure 3 shows a comparison of images before and after applying the combination of CLAHE and Gamma correction.
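A minimal OpenCV sketch of this two-step enhancement is given below, assuming grayscale input. The clip limit and tile grid size are illustrative choices not reported in the text, while the Gamma value of 1.5 follows the setting above.

```python
import cv2
import numpy as np

def enhance(img_gray, clip_limit=2.0, tile=(8, 8), gamma=1.5):
    """CLAHE followed by Gamma correction (Eq. 8); clip_limit and tile are illustrative."""
    # Step 1: contrast-limited adaptive histogram equalization.
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile)
    equalized = clahe.apply(img_gray)

    # Step 2: Gamma correction applied via a lookup table on normalized pixel values.
    lut = ((np.arange(256) / 255.0) ** gamma * 255.0).astype(np.uint8)
    return cv2.LUT(equalized, lut)

img = cv2.imread("car_sample.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical file name
cv2.imwrite("car_sample_enhanced.jpg", enhance(img))
```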

Fig. 3

Original image (Left) and enhanced image (Right).

As shown in Fig. 3, the combination of CLAHE and Gamma correction has a significant impact on the original images converted from vehicle point clouds. The contrast of the original image was 28.73, and the brightness was 32.03. After enhancement, the image’s contrast increased to 39.67, and the brightness to 44.72. The overall image became brighter, and the contrast was significantly enhanced, demonstrating the effectiveness of CLAHE and Gamma correction in adjusting image contrast and brightness.
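The text does not state how the contrast and brightness figures were computed; a common convention, assumed in the sketch below, is to take the mean gray level as brightness and the standard deviation as contrast.

```python
import cv2

def brightness_and_contrast(path):
    # Brightness = mean gray level, contrast = standard deviation (assumed convention).
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return float(gray.mean()), float(gray.std())

print(brightness_and_contrast("car_sample.jpg"))
print(brightness_and_contrast("car_sample_enhanced.jpg"))
```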

Vehicle recognition: based on EfficientNet-B0 network

Fig. 4

Vehicle recognition architecture.

Convolutional Neural Networks (CNNs) are a prevalent type of deep learning network widely used in image classification, object detection, and image segmentation. As shown in Fig. 4, we chose the EfficientNet-B0 network for vehicle recognition in this study.

EfficientNet, proposed by Google, is a highly efficient and accurate convolutional neural network. Its design significantly improves model performance and efficiency through a compound scaling method that adjusts the network’s depth, width, and resolution. The largest model in the EfficientNet series has 66 million parameters and achieves a top-1 accuracy of 84.3% on ImageNet, with a smaller network structure and faster runtime than the best models of its time.

The core idea of EfficientNet-B0 is to adjust the network's three dimensions (depth, width, and resolution) simultaneously through compound scaling. Traditionally, model scaling focused on a single dimension, such as increasing the number of layers or the input image resolution, which could lead to wasted computational resources or performance bottlenecks. Compound scaling instead balances model complexity and computational cost by adjusting the network's depth, width, and resolution together, achieving higher performance.
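For reference, the compound scaling rule can be written as below; the coefficient values are those reported by Tan and Le in the original EfficientNet paper for the B0 baseline, quoted here rather than derived in this study.

$$d=\alpha ^{\phi },\quad w=\beta ^{\phi },\quad r=\gamma ^{\phi },\qquad {\text{s.t.}}\ \ \alpha \cdot \beta ^{2} \cdot \gamma ^{2} \approx 2,\ \ \alpha =1.2,\ \beta =1.1,\ \gamma =1.15$$

where d, w, and r scale the network depth, width, and input resolution, and the compound coefficient φ controls the overall resource budget.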

As illustrated in Fig. 4, EfficientNet-B0 consists of multiple convolutional layers, pooling layers, and fully connected layers. Its main features include:

MBConv Modules: EfficientNet-B0 extensively uses Mobile Inverted Bottleneck Convolution (MBConv) modules. These modules enhance the model's expressive power and performance while maintaining low computational cost. MBConv is similar to the Inverted Residual Block in the MobileNetV3 network, with two main differences: MBConv in EfficientNet uses the Swish activation function, and each MBConv includes a Squeeze-and-Excitation (SE) module. Figure 5 depicts the MBConv structure of EfficientNet-B0.

Fig. 5

EfficientNet-B0 MBConv structure.

As shown in Fig. 5, the MBConv structure mainly consists of a 1 × 1 convolution for dimensionality expansion, followed by Batch Normalization (BN) and Swish; a k × k depthwise convolution, with typical kernel sizes of 3 × 3 and 5 × 5; an SE module; and a 1 × 1 convolution for dimensionality reduction, followed by BN and a Dropout layer.

The SE module, depicted in Fig. 6, comprises a global average pooling layer and two fully connected layers. The first fully connected layer has one-fourth as many nodes as the input MBConv feature matrix has channels and uses the Swish activation function. The second fully connected layer has the same number of nodes as the number of channels in the feature matrix produced by the depthwise convolution layer and uses the Sigmoid activation function.

Fig. 6

SE module structure.
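A simplified PyTorch sketch of the SE module and an MBConv block as described above follows. The squeeze width of one quarter of the block's input channels and the use of SiLU (Swish) and Sigmoid follow the text; the expansion factor, kernel size, and the use of plain Dropout in place of stochastic depth are simplifying assumptions.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """SE module: global average pooling followed by two FC layers (Swish, then Sigmoid)."""
    def __init__(self, channels, squeeze_ch):
        super().__init__()
        self.fc1 = nn.Conv2d(channels, squeeze_ch, 1)   # 1x1 conv acting as the first FC layer
        self.fc2 = nn.Conv2d(squeeze_ch, channels, 1)   # second FC layer, back to 'channels'
        self.act, self.gate = nn.SiLU(), nn.Sigmoid()

    def forward(self, x):
        s = x.mean(dim=(2, 3), keepdim=True)            # global average pooling
        s = self.gate(self.fc2(self.act(self.fc1(s))))
        return x * s                                    # channel-wise re-weighting

class MBConv(nn.Module):
    """Simplified MBConv: 1x1 expansion -> k x k depthwise conv -> SE -> 1x1 projection."""
    def __init__(self, in_ch, out_ch, expand=6, k=3, stride=1, drop=0.2):
        super().__init__()
        mid = in_ch * expand
        self.expand = nn.Sequential(nn.Conv2d(in_ch, mid, 1, bias=False),
                                    nn.BatchNorm2d(mid), nn.SiLU())
        self.dw = nn.Sequential(nn.Conv2d(mid, mid, k, stride, k // 2, groups=mid, bias=False),
                                nn.BatchNorm2d(mid), nn.SiLU())
        self.se = SqueezeExcite(mid, max(1, in_ch // 4))  # squeeze to 1/4 of the block's input channels
        self.project = nn.Sequential(nn.Conv2d(mid, out_ch, 1, bias=False),
                                     nn.BatchNorm2d(out_ch))
        self.drop = nn.Dropout(drop)
        self.use_residual = stride == 1 and in_ch == out_ch

    def forward(self, x):
        out = self.drop(self.project(self.se(self.dw(self.expand(x)))))
        return x + out if self.use_residual else out

block = MBConv(16, 16)
print(block(torch.randn(1, 16, 56, 56)).shape)  # torch.Size([1, 16, 56, 56])
```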

Experiments setup

Data collection and processing

In this section, we validate the effectiveness of our proposed method through real-world experiments. The experiments utilized a LiDAR sensor manufactured by RoboSense (China), specifically the M1 solid-state LiDAR. Table 1 presents the detailed parameters of the LiDAR.

Table 1 LiDAR parameters.

The M1 is mounted on the actual platform within a dim working environment, as depicted on the left side of Fig. 7. Subsequently, a point cloud dataset required for the experiment is acquired under low-light conditions. This dataset encompasses point cloud data from three distinct vehicle models: BYD Song Pro, Nissan Xuan Yi (Sylphy), and Toyota Corolla. Each vehicle type comprises 1000 point cloud samples, accompanied by an additional 1000 samples representing the environmental background. Consequently, the overall dataset consists of 4000 samples.

Fig. 7

Experimental platform (Left) and point-cloud-to-JPG dataset (Right).

To improve the model’s generalization and robustness, we pre-processed the point cloud data, including converting point clouds to images (PCD to JPG) and enhancing the images using CLAHE and Gamma correction. Figure 7 (right) shows the vehicle JPG dataset after point cloud conversion and image enhancement.

Experimental setup

The experiments were conducted on a workstation equipped with an NVIDIA RTX 4070 GPU and an Intel i7-13700KF CPU, running Windows 11. The primary software environment included Python 3.10, PyTorch 2.2.1 with CUDA 11.8, Open3D 0.18.0, and OpenCV 4.9.0.

During training, the batch size was set to 32, the learning rate to 0.001, and the number of epochs to 10. The enhanced vehicle JPG dataset was loaded and pre-processed, resizing the training and test set images to (224, 224), converting them to tensors, and performing normalization and standardization.

The dataset was split into a training set (80% of the total dataset) and a test set (20%). DataLoader was used to package the training and test sets into iterable data loaders. The training set data loader was set to shuffle the data (shuffle = True) to increase randomness. During model training, the cross-entropy loss function was used, and the Adam optimizer was selected.
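These settings can be condensed into the following PyTorch training sketch. The ImageFolder directory layout, the ImageNet normalization statistics, and the use of ImageNet pre-trained weights are assumptions, since the text only specifies the batch size, learning rate, epoch count, split ratio, loss function, and optimizer.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, models, transforms

# Pre-processing: resize to 224 x 224, convert to tensors, normalize (ImageNet statistics assumed).
tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

dataset = datasets.ImageFolder("enhanced_jpg_dataset", transform=tfm)  # hypothetical folder; 4 classes
n_train = int(0.8 * len(dataset))
train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
test_loader = DataLoader(test_set, batch_size=32)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.efficientnet_b0(weights="IMAGENET1K_V1")       # pre-training assumed, not stated in the paper
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 4)  # 4-way classification head
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(10):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```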

Evaluation metrics

To comprehensively evaluate the model’s performance, we used four metrics: accuracy, precision, recall, and F1 score. The formulas are as follows:

$$Accuracy=\frac{{TN+TP}}{{TN+FP+TP+FN}}$$
(9)
$$Precision=\frac{{TP}}{{TP+FP}}$$
(10)
$$Recall=\frac{{TP}}{{TP+FN}}$$
(11)
$$F1score=\frac{{(2*Recall*Precision)}}{{Recall+Precision}}$$
(12)

where: TP: true positives, the number of correct positive predictions; TN: true negatives, the number of correct negative predictions; FP: false positives, the number of incorrect positive predictions; FN: false negatives, the number of incorrect negative predictions.
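Continuing the training sketch above, the metrics in Eqs. (9)-(12) can be computed from the test-set predictions as follows; macro averaging over the four classes is an assumption, since the averaging scheme is not stated.

```python
import torch
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

model.eval()
y_true, y_pred = [], []
with torch.no_grad():
    for images, labels in test_loader:
        logits = model(images.to(device))
        y_pred.extend(logits.argmax(dim=1).cpu().tolist())
        y_true.extend(labels.tolist())

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```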

Results

Classification performance of the proposed method on various types of samples

Table 2 The classification performance of the proposed method on various types of samples.

Table 2 shows the classification performance of the proposed method on various types of samples. The experimental results demonstrate that the proposed method performs excellently in the classification tasks for different vehicle point cloud data, achieving an overall recognition accuracy of 98.88%, with precision and recall rates of 98.89% and 98.85%, respectively, and an F1-score of 98.86%. Specifically, the recognition accuracy for BYD Song and Toyota Corolla reaches 100%. Misclassification mainly occurs between Nissan Sylphy and background images. The F1-score indicates successful recognition of vehicle samples, while errors in recognizing backgrounds are relatively higher, possibly due to the diverse nature of the background samples, which distracts the model’s attention.

Comparison of classification performance between different models

Fig. 8

Confusion matrix of the prediction results for the proposed method.

Table 3 Comparison of classification performance between different models.

We evaluated the performance of six models: InceptionV3, DenseNet, MobileNetV2, ShuffleNetV2, SENet, and EfficientNet, including the four evaluation metrics and the confusion matrices for the recognition results of each vehicle type. The experimental results are shown in Table 3; Fig. 8.

In terms of overall performance, EfficientNet significantly outperforms the other models across all evaluation metrics, achieving an F1-score of 98.86% and an overall classification accuracy of 98.88%. The confusion matrix analysis reveals that EfficientNet not only provides high precision in distinguishing between the tested vehicle models (Toyota Corolla, Nissan Xuan Yi, and BYD Song Pro) but also avoids misclassifications between different car types. This superior performance can be attributed to EfficientNet's compound scaling approach, which balances network depth, width, and resolution to achieve higher model efficiency and expressiveness.

SENet, ranking second in overall performance, achieves an F1-score of 96.30%, showcasing strong capabilities in vehicle type recognition. Its squeeze-and-excitation modules help emphasize meaningful features in the dataset, though its performance slightly lags behind EfficientNet in fine-grained classification tasks. DenseNet also demonstrates competitive results, with an F1-score of 95.58%. Its densely connected architecture enables feature reuse, enhancing recognition accuracy.

Conversely, lightweight models such as ShuffleNetV2 and MobileNetV2, while offering faster inference times and lower computational complexity, exhibit reduced classification accuracy, with F1-scores of 92.81% and 90.96%, respectively. These models face challenges in distinguishing between fine-grained vehicle features, which may explain the observed performance gap. InceptionV3, despite being widely adopted in image recognition tasks, records the poorest performance in this study, with an F1-score of 84.35%. This underperformance may stem from its architectural constraints, which are less suited for tasks demanding high expressiveness in distinguishing between similar vehicle models.

This study highlights the importance of selecting a model that balances precision and efficiency based on specific application requirements. By leveraging EfficientNet’s high accuracy and efficient design, our proposed recognition method is well-suited for deployment in real-world applications such as underground parking garages, where computational resources may be limited, and accurate vehicle model recognition is critical.

Accuracy comparison before and after image enhancement

Table 4 Comparison of accuracy before and after image enhancement.

Table 4 shows that after applying data enhancement techniques, the classification accuracy of all models improved. This indicates that the enhancement methods used in this study effectively improve the models' image processing capabilities, enabling them to better extract useful features from the images. For EfficientNet, training with the enhanced images rather than the original point-cloud-converted images increased classification accuracy from 96.79% to 98.88%, an improvement of 2.09 percentage points, which, though small, is still meaningful. This may be because EfficientNet already performs well, and the image enhancement further optimized its image processing capability. For InceptionV3, classification accuracy improved from 70.88% to 84.62%, a substantial increase of 13.74 percentage points, showing that image enhancement played an important role in boosting its performance. This may be because InceptionV3 has relatively weak image processing capability for this task, which the image enhancement can partly compensate for. These results indicate that the image enhancement methods used in this study help improve the models' generalization ability and robustness, enhancing recognition accuracy.

Evaluation of recognition results at different distances

Table 5 Comparison of recognition results of datasets collected at different distances.

The experimental datasets for the above three sections were all collected when the LiDAR and the target vehicle were 2 m apart. To verify the validity of the datasets and the feasibility of the recognition method proposed in this study, we evaluated the recognition results of multiple networks at other collection distances. The experimental results are shown in Table 5.

As shown in the table, there are certain differences in the recognition accuracy of each model at 2-meter and 5-meter distances. Among them, the EfficientNet model achieved the highest accuracy under both conditions, with 98.88% at 2 m and 97.75% at 5 m, indicating its strong generalization ability across different distances. SENet and DenseNet followed in accuracy and showed relatively stable performance, with minimal changes in accuracy at the two distances.

From the comparison of recognition accuracy at different distances, all models demonstrated slightly lower accuracy at the 5-meter distance compared to the 2-meter distance, though the overall differences were small. This suggests that the proposed method can adapt to vehicle recognition tasks under varying distance conditions. However, it also reflects that increasing the target distance may impact the model’s ability to extract detailed features.

Complexity and computational cost evaluation of different network models

To comprehensively evaluate the applicability of different network models in the target recognition task, this study compared the complexity and computational cost of the six network models mentioned above. The experimental results are shown in Table 6.

Table 6 Complexity and computational cost evaluation of different network models.

By comprehensively evaluating the models’ recognition performance, complexity, and computational costs, EfficientNet stood out due to its excellent recognition accuracy (98.88%), low complexity (4.013 M parameters, 0.421G FLOPs), and reasonable inference time (5.14ms), demonstrating the best overall performance in this study.

In contrast, although SENet and DenseNet also performed well in terms of recognition accuracy, their large model sizes or long inference times significantly increase computational and storage burdens. ShuffleNetV2 and MobileNetV2 exhibited outstanding performance in lightweight design and inference efficiency but showed relatively low recognition accuracy, which may limit their applicability in scenarios requiring high precision. InceptionV3, on the other hand, did not meet expectations. Its complex architecture and high computational cost did not result in a significant improvement in recognition performance, making it unsuitable for this study’s requirements.
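As a sketch of how the quantities in Table 6 can be obtained (continuing the earlier training sketch), parameter counts can be read directly from the model, FLOPs estimated with a third-party profiler such as thop (its use here is an assumption, not a statement of the paper's tooling), and inference time measured as the mean single-image forward-pass latency.

```python
import time
import torch
from thop import profile  # third-party profiler (assumed; pip install thop); reports MAC-based estimates

dummy = torch.randn(1, 3, 224, 224).to(device)

n_params = sum(p.numel() for p in model.parameters())        # parameter count
flops, _ = profile(model, inputs=(dummy,), verbose=False)    # multiply-accumulate based complexity estimate

model.eval()
with torch.no_grad():
    for _ in range(10):                                      # warm-up passes
        model(dummy)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(100):
        model(dummy)
    if device == "cuda":
        torch.cuda.synchronize()
    latency_ms = (time.perf_counter() - start) / 100 * 1000

print(f"params={n_params / 1e6:.3f}M  FLOPs={flops / 1e9:.3f}G  latency={latency_ms:.2f}ms")
```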

Discussion

The proposed method of vehicle recognition using LiDAR point clouds, combined with image enhancement and EfficientNet, demonstrates significant advantages in low-light environments. Firstly, the experimental data rely solely on LiDAR without dependence on other sensors, ensuring that recognition performance remains unaffected by adverse weather or poor lighting conditions. By converting point cloud data into images and applying image enhancement techniques such as CLAHE and Gamma correction, the quality of the images is significantly enhanced, leading to improved classification accuracy. Experimental results indicate that, when processing the enhanced image data, EfficientNet achieves superior performance with a classification accuracy of 98.88%. This approach of converting point clouds into images for enhancement and recognition not only is effective but also exhibits strong robustness in classifying different car models, validating its practical applicability.

Despite the promising results, there are some limitations to this method. Firstly, due to the constraints of data collection efficiency, the dataset only includes point cloud data for three types of vehicles, which may limit the model's generalization ability to other vehicle types. Moreover, the dataset was created by collecting point cloud data from stationary vehicles at fixed distances from the LiDAR sensor. This setup allows for accurate and low-noise data acquisition, ensuring high-quality baseline data for algorithm development. However, we acknowledge that it represents an idealized scenario compared to real-world environments, where vehicles are often in motion, introducing challenges such as motion distortion, occlusion, and incomplete point clouds. In real-world scenarios, moving vehicles may produce blurred or fragmented point cloud captures, which could degrade recognition performance. Secondly, although the method achieves good recognition performance, the operations, processing, and computational complexity involved are relatively high, potentially hindering real-time applications. Additionally, this study primarily focuses on data processing from a single sensor, without considering the potential advantages of multi-sensor fusion.

To address these potential issues, future iterations of this research will consider augmenting the dataset to include dynamic scenarios where vehicles are in motion and occlusion effects are present. Advanced data preprocessing techniques, such as motion compensation algorithms, can be applied to mitigate motion-induced distortions. For instance, point cloud alignment techniques can be employed to stitch fragmented data into coherent shapes, while occlusion-aware algorithms can improve robustness by reconstructing missing data based on context.

Looking ahead, there are several directions in which this research can be further improved and expanded: First, increasing the diversity of vehicle types and the number of samples in the dataset will enhance the model’s generalization ability. Collecting data under different environmental conditions, including varying distances and dynamic settings, will also contribute to improving robustness. Second, from another perspective, introducing small-sample learning techniques will enable the model to recognize new vehicle models with limited training data, thereby adapting to the frequent introduction of new vehicle designs in the real world. Third, integrating motion compensation and data augmentation techniques will address real-world challenges such as motion distortions and occlusions. Finally, considering practical applications, developing a lightweight model suitable for edge computing environments will be crucial for achieving real-time deployment in autonomous driving systems and other resource-constrained applications.

Conclusion

In this paper, we propose a novel approach that combines image enhancement and EfficientNet to achieve precise vehicle recognition using LiDAR point clouds in low-light conditions. By converting the point cloud data into images and applying CLAHE and Gamma correction techniques, we significantly enhance the quality of the input data. When combined with the EfficientNet model, our method achieves highly accurate recognition results. Experimental results validate the effectiveness of our approach, demonstrating a recognition accuracy of 98.88% and an F1-score of 98.86%. Our method not only introduces novelty but also exhibits exceptional robustness by solely utilizing LiDAR data, thereby providing a new perspective and auxiliary technique for intelligent transportation vehicle type recognition.

The vehicle point cloud dataset collected using LiDAR in the current study is available from the corresponding author upon reasonable request.