Abstract
This work designs an intelligent traffic electronic information signal acquisition system based on the Internet of Things (IoT) and deep learning (DL). It aims to address increasingly severe urban traffic congestion and to improve the efficiency and intelligence of traffic management. First, a system framework is constructed that comprises three core modules: video acquisition and transmission (VAT), video processing (VP), and information processing. The system captures real-time video of traffic scenes through cameras, while vehicle detection and tracking are performed by a video image processor (VIP) to extract traffic parameters, which are then transmitted to the traffic control platform. Second, an improved Multi-Task Convolutional Neural Network (MT-CNN) model, called Attention-Mechanism Multi-Modal Feature Fusion GooGleNet (AM-MMFF-GooGleNet), is proposed. This model integrates Multi-Modal Feature Fusion (MMFF) and a channel Attention Mechanism (AM), significantly improving the accuracy and robustness of vehicle localization and identification. Experimental results show that the AM-MMFF-GooGleNet model achieves an accuracy of 98.6% in the vehicle localization task, 3.2% higher than the original MT-GooGleNet. Under low-light, high-light, and high-background-noise scenarios, its accuracy reaches 97.3%, 96.8%, and 95.5%, respectively, demonstrating strong environmental adaptability. Furthermore, the average detection time of the model is 20.5 milliseconds, indicating good real-time performance. By optimizing the DL model and the system design, the ability to acquire and process vehicle electronic information signals in the intelligent transportation system (ITS) is remarkably enhanced, providing more precise decision-making support for traffic management. This work offers an innovative technical solution for the development of ITS, promoting the deep integration of IoT and DL technologies in the traffic field, and thus provides strong technical support for alleviating urban traffic congestion and improving traffic efficiency.
Introduction
In the wake of rapid social and economic development, improved living standards have led to a surge in vehicular presence on roads. Paradoxically, this has given rise to escalating traffic congestion issues. Concurrently, the advent of the Internet of Things (IoT) and the pervasive growth of Artificial Intelligence (AI) have ushered in transformative changes in contemporary lifestyles and professional landscapes. Notably, the deployment of the intelligent transportation system (ITS) has emerged as a strategic intervention, mitigating traffic congestion and enhancing the operational efficiency of urban traffic1,2.
Efficient implementation of an ITS necessitates the timely and effective acquisition of critical road traffic information, including vehicle type, occupancy rate, traffic flow, and vehicle speed. This pivotal stage demands a robust information acquisition system capable of real-time data collection and seamless transmission to the processing backend3,4,5. Various methodologies exist for collecting traffic information signals, with video acquisition emerging as a predominant technique. Compared to traditional methods, video detection (VD) offers a more visually comprehensive depiction of traffic scenes and boasts a significantly broader detection range. The integration of image recognition (IR) and video analysis (VA) technologies for vehicles facilitates real-time tracking and administration of vehicles6. In the realm of image processing, Convolutional Neural Networks (CNNs) based on deep learning (DL) are a widely adopted approach. Recognized for their swift evolution and performance optimization, CNNs constitute a highly efficient feedforward neural network model. They capitalize on local connections and weight sharing, proving especially beneficial in applications such as natural language processing, speech recognition, machine translation, and image segmentation7,8. The distinctive convolution-pooling operation of CNNs exhibits remarkable efficacy in image processing, establishing them as unmatched performers in 2D graphic tasks like IR and graphic orientation9. The Residual Neural Network (ResNet) signifies a significant advancement in CNNs, engineered so that the convolutional layers (CLs) learn residual mappings rather than the target mapping directly10. Researchers, including Zhao et al.11 and Chen et al.12, have extensively explored such networks' capabilities, emphasizing their efficacy in IR based on UAV remote sensing systems and in prototype-part network architectures for interpretable DL. In the context of vehicle location recognition, CNNs, and notably ResNet, continue to be pivotal components in advancing the capabilities of ITSs.
Although existing DL models perform excellently on some standard datasets, they still face numerous challenges in practical applications. First, vehicle localization and identification tasks are often conducted in complex environments, where factors such as lighting changes, viewpoint variations, and background complexity can degrade model performance. Existing methods often struggle to maintain stable recognition ability under different lighting conditions and complex backgrounds; in particular, when the difference between the target and the background is minimal, the model is more susceptible to interference. Second, the diversity and complexity of vehicle images, such as the variety of vehicle types and shooting angles, increase the difficulty of vehicle localization and identification. Although some methods have achieved good results, they still have limitations in handling multimodal data, adapting to complex environments, and improving robustness. To address these challenges, this work contributes in the following ways: (1) An improved multi-task CNN model, Attention-Mechanism Multi-Modal Feature Fusion GooGleNet (AM-MMFF-GooGleNet), is proposed, which integrates multi-modal features with a channel attention mechanism. This significantly enhances the robustness and accuracy of vehicle localization and recognition in complex environments. (2) A traffic electronic information signal acquisition system framework based on IoT and DL is designed, enabling real-time monitoring and efficient processing of dynamic traffic data. (3) By incorporating a dynamic loss-weighting mechanism and optimizing with attention modules, the model effectively balances performance between localization and classification tasks in multi-task learning. The average detection time of the model is only 20.5 milliseconds, meeting the real-time requirements of practical scenarios. (4) The model's strong adaptability to challenging environments, such as low-light conditions and high background noise, has been validated. It achieves a localization accuracy of 98.6%, a 3.2% improvement over the baseline model. This work addresses key issues in existing methods and proposes an innovative DL model that provides a more reliable solution for vehicle localization and identification tasks in the ITS.
Recent related work
Advancement in IoT-based ITSs
In recent years, IoT technology has injected new vitality into the development of ITS, driving the transformation of traffic management from traditional static modes to dynamic and intelligent paradigms. Li et al.13 proposed a “Traffic Internet” framework, which integrated heterogeneous data sources from roadside sensors, in-vehicle terminals, and mobile devices to construct a cross-platform real-time traffic data interaction network. Built on the 5G-Vehicle-to-Everything (V2X) communication protocol, the framework enables millisecond-level synchronization between vehicles and infrastructure, significantly improving the efficiency of dynamic road network resource allocation. Building upon this, Liu and Ke14 proposed a cloud–edge collaborative IoT traffic control system architecture. At its core lies a dynamic resource allocation algorithm that distributes computational tasks by priority to either the cloud (for global optimization) or edge nodes (for low-latency responses). Experimental results showed that this system reduced incident response times and limited the spread of traffic congestion. To address the multi-objective needs of complex urban traffic management, Musa et al.15 explored the application potential of IoT-driven ITS in smart cities from a sustainability perspective. Their proposed framework integrated AI-based sensors to monitor traffic flow and carbon emissions in real time, dynamically delineated low-emission zones to restrict high-polluting vehicles, and optimized public transport schedules using advanced dispatching algorithms. This approach reduced PM2.5 concentrations and improved public transit usage in a pilot city. Panigrahy and Emany16 focused on Internet of Vehicles (IoV) scenarios and systematically reviewed the application of network optimization techniques in multi-vehicle cooperative communication and route planning. They proposed a reinforcement learning-based dynamic spectrum allocation strategy, which intelligently switched communication frequencies in high-traffic areas to reduce communication latency. They also demonstrated the critical role of edge computing nodes in local path planning. Additionally, Zeng et al.17 examined the effectiveness of IoT sensors in emergency traffic scheduling from the perspective of urban disaster management. By deploying a multi-modal monitoring network consisting of geomagnetic sensors, cameras, and drones, the system can identify road interruptions caused by landslides or floods in real time. Using federated learning, disaster data are shared across regions, improving emergency vehicle dispatch efficiency and reducing rescue response times.
Current state of DL-based information acquisition system research
DL technology is progressively becoming the core driving force behind intelligent traffic information acquisition and processing. Through end-to-end feature learning and the extraction of complex patterns, it significantly enhances the level of system intelligence. Yang et al.18 developed a fault-tolerant navigation system based on edge computing, utilizing a lightweight CNN to enable real-time vehicle trajectory prediction. By compressing model parameters and employing quantization techniques, the system achieved an inference speed of 30 frames per second on embedded in-vehicle devices. Moreover, the integration of a spatiotemporal attention mechanism reduced trajectory prediction errors under complex traffic conditions. Das et al.19 explored the integration potential of blockchain and DL, and proposed a decentralized traffic data verification mechanism. Their multi-party secure computation protocol allowed vehicles to locally train lightweight CNN models, with model parameters aggregated in a distributed manner via blockchain smart contracts. Experiments demonstrated that this approach not only preserved data privacy but also improved the accuracy of vehicle type recognition. Prakash et al.20 applied machine learning algorithms to the design of ITS in vehicular networks. Their case study underscored the importance of model generalization and fault-tolerance mechanisms in complex scenarios. In terms of model optimization, Gong et al.21 systematically reviewed the applications of edge intelligence in ITS, and highlighted that federated learning could mitigate data silos through distributed training. Njoku et al.22 went a step further by introducing metaverse technology into data-driven transportation systems. By constructing high-fidelity virtual simulation environments, they pre-trained DL models to handle extreme traffic scenarios. Experimental results showed that models pre-trained in virtual environments exhibited improved object detection accuracy during real-world road testing and demonstrated significantly enhanced adaptability to varying lighting conditions. Additionally, Karthikeyan and Usha23 designed a cognitive science-based IoT intelligent transportation framework that employed visual-radar multimodal data and multi-task learning to share low-level features between vehicle localization and classification tasks. This framework dynamically reweighted sensor inputs via a channel attention mechanism, effectively reducing localization errors in low-light nighttime conditions. Oladimeji et al.24 further emphasized that multimodal fusion could significantly enhance the adaptability of autonomous driving systems in complex environments. However, its computational complexity still required optimization through model pruning and hardware acceleration. Zhang et al.25, addressing deployment challenges of federated learning in ITS, proposed an asynchronous update strategy. This strategy allowed edge nodes to dynamically adjust their parameter upload frequency based on network conditions, thereby reducing communication overhead while ensuring model convergence. Nonetheless, Garg and Kaur26 pointed out that DL models’ high dependence on computational power and their black-box nature remained major obstacles to large-scale deployment on resource-constrained devices.
Past research has made significant progress in the field of IoT-based ITS, but some challenges and limitations remain. For example, previous studies have mainly focused on vehicle detection and tracking, with relatively little in-depth research on electronic traffic signal acquisition (TSA) systems. Additionally, while some studies have achieved certain accomplishments in vehicle recognition, there is still room for improvement in accurately identifying vehicle models. Therefore, the motivation of this work is to fill these research gaps, with a focus on the design of electronic TSA systems and the optimization of DL network models. In comparison to previous research, this work proposes a comprehensive framework aimed at the real-time acquisition of electronic traffic signals through IoT technology. This framework not only considers vehicle detection and tracking but also pays special attention to the process of acquiring and processing electronic signals. By introducing an enhanced MT-CNN, this work achieves precise recognition of vehicle positions and models. Furthermore, data augmentation and pre-training techniques are introduced to further enhance the model's performance on vehicle classification. Compared to previous research, the innovation of this work lies in proposing a comprehensive solution that covers the various aspects of the electronic TSA system and achieves accurate recognition of vehicle information through the optimization of DL network models. Moreover, this work integrates practical traffic scenarios and requirements, validating the effectiveness and feasibility of the proposed solution through simulated experiments and comprehensive evaluations. Therefore, this work not only fills gaps in previous studies but also provides important references and insights for the further development of ITS.
Design of the acquisition system for the electronic traffic signal
System framework
Figure 1 illustrates the comprehensive framework of the electronic TSA system. The system employs a camera to capture video information from the traffic road scene, which is subsequently transmitted to the Video Image Processor (VIP). The VIP undertakes the processing and analysis of the video data, focusing on detecting and tracking vehicle information. The obtained results are further processed based on specific requirements to derive relevant traffic parameters. Concurrently, these traffic parameters are relayed to the data server and seamlessly integrated into the traffic control platform. Functionally, the system unfolds across three core modules: video acquisition and transmission (VAT), video processing (VP), and information processing.
Functional modules of the system
The pivotal element within the electronic TSA system is the VAT module. This component gathers video imagery from the transportation scenario using the acquisition equipment and transmits it to the multimedia processing unit for further processing. Implementation of traffic video acquisition demands meticulous consideration of two critical facets: the effective acquisition range and the image quality discernible within the video frames. Beyond these, potential challenges such as a restricted detection range, blurred video imagery, low contrast, and pronounced occlusions amplify the intricacies of subsequent image processing. To mitigate these challenges, countermeasures are imperative during the acquisition process. This typically involves adjusting the camera's aperture, shutter, and other parameters to ensure the acquisition of clear and detailed video images above the road.
The VP module is at the heart of the electronic TSA system. This module intricately collects traffic electronic information, either directly or indirectly, derived from the outcomes of automobile identification and monitoring procedures. Consequently, the precision of automobile localization and identification emerges as a pivotal factor directly influencing the gathering of electronic signals related to traffic information. This work, therefore, places a primary emphasis on the nuanced aspects of automobile localization and identification.
Figure 2 illustrates the algorithmic flow crafted for VD tailored to the specific characteristics of urban roads. The process commences with the initialization of key parameters. Subsequently, essential data is extracted from the application environment. By employing virtual line detection based on lane considerations, the algorithm partitions the image into distinct zones: a detection area and a tracking area. Tracking is then executed within the tracking area, contingent upon the dynamics of target tracking. In cases where the target is not tracked within the specified area, the algorithm refrains from unnecessary analysis, thereby mitigating computational overhead and reducing tracking blind spots. A minimal sketch of this loop is given below.
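To make the flow in Fig. 2 concrete, the following is a minimal Python sketch of one detect-then-track iteration. The helper names (`detect_vehicles`, `crosses_virtual_line`, `init_track`, `update_track`) are illustrative placeholders, not the system's actual API.

```python
def process_frame(frame, tracks, detection_area, tracking_area):
    """One iteration of the lane-based detect-then-track loop (Fig. 2)."""
    # Detect new vehicles only inside the detection area, where the
    # virtual lane lines trigger candidate extraction.
    for det in detect_vehicles(frame, region=detection_area):
        if crosses_virtual_line(det):
            tracks.append(init_track(det))

    # Track existing targets only inside the tracking area; targets that
    # leave it are dropped, avoiding unnecessary analysis and blind spots.
    for track in list(tracks):
        if not update_track(track, frame, region=tracking_area):
            tracks.remove(track)
    return tracks
```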
DL Network modules
CNNs
CNNs represent a class of feedforward neural networks seamlessly combining a classifier and a feature extractor. Upon inputting an image, a series of feature vectors emerges after multiple iterations of feature learning, closely mirroring the image's semantic content. Subsequently, this vector set undergoes classification and recognition within the classifier27,28. The typical composition of a CNN includes the input layer, the CL, the down-sampling layer (DSL), the fully connected layer (FCL), and the output layer (OL). CNNs find primary utility in IR, with their fundamental structure depicted in Fig. 3.
The CL is distinctive to CNNs, and Eq. (1) depicts its activation function, ReLU:

\(f(x)=\max(0,x)\)  (1)
The ReLU function truncates the negative part of the input while keeping the positive part unchanged, so it is linear in the positive range and outputs zero in the negative range. Applying ReLU accelerates the training of neural networks and reduces the problem of vanishing gradients, thereby improving network performance. Equation (2) describes the feature computation process of the CL, which involves convolution operations. It is assumed that there is an input feature map \(I\), a bias term \(b\), and convolution kernel weights \(\omega\). Then, the calculation of the output feature map \(O\) can be represented as:

\(y_{n}^{l}=f\left(\sum_{m\in V_{n}^{l}}y_{m}^{l-1}*\omega_{m,n}^{l}+b_{n}^{l}\right)\)  (2)
In Eq. (2), \(y_{n}^{l}\) denotes the n-th feature map (FM) in the l-th CL, \(b_{n}^{l}\) signifies the corresponding offset value in the l-th layer, \(\omega _{{m,n}}^{l}\) represents the weight of the connection between the m-th FM in the (l−1)-th layer and the n-th FM in the l-th CL, and \(V_{n}^{l}\) indicates the set of FMs in the (l−1)-th layer connected to the n-th FM of the l-th CL.
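As a concrete illustration of Eqs. (1) and (2), the following NumPy sketch computes one output FM of a CL as a sum of sliding-window (correlation-style, as is conventional in CNNs) convolutions over its connected input maps, plus a bias, passed through ReLU. The loop-based form and the shapes are illustrative assumptions rather than the system's implementation.

```python
import numpy as np

def relu(x):
    # Eq. (1): truncate the negative part, keep the positive part.
    return np.maximum(0.0, x)

def conv_feature_map(prev_maps, kernels, bias):
    """Eq. (2): one output FM y_n^l from the connected maps V_n^l.

    prev_maps: list of (H, W) arrays y_m^{l-1}; kernels: matching (k, k) weights.
    """
    k = kernels[0].shape[0]
    H, W = prev_maps[0].shape
    out = np.full((H - k + 1, W - k + 1), float(bias))  # start from b_n^l
    for y_prev, w in zip(prev_maps, kernels):           # sum over m in V_n^l
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] += np.sum(y_prev[i:i + k, j:j + k] * w)
    return relu(out)
```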
The DSL, following the CL, maintains an equivalent number of FMs. In this layer, the input is partitioned into multiple blocks, the value for each block is ascertained using a pixel-specific sampling technique, and a bias is added; the final output is obtained by applying the activation function29,30. Down-sampling enhances features, rendering them more resilient to deformations: where local features remain unchanged after deformation, uniformly sampled features exhibit consistency. Simultaneously, the reduced size of the sampled FM significantly diminishes the data volume needed for the subsequent procedure, thereby augmenting training effectiveness31. Equation (3) delineates the characteristic computation process of the DSL:

\(y_{n}^{l}=f\left(\omega_{n}^{l}z_{n}^{l-1}+b_{n}^{l}\right)\)  (3)
In Eq. (3), \(\omega _{n}^{l}\) symbolizes the mapping weight, \(z_{n}^{{l - 1}}\) denotes the value acquired by sampling within a fixed window over the FMs of the (l−1)-th layer, and \(b_{n}^{l}\) signifies the offset value associated with the l-th layer. This process achieves down-sampling by applying sampling techniques, such as max pooling or average pooling, within local regions of the FM.
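A minimal sketch of Eq. (3) follows, assuming non-overlapping max pooling with a 2 × 2 window; the pool size and the choice of max over average pooling are assumptions.

```python
import numpy as np

def down_sample(fmap, weight, bias, s=2):
    """Eq. (3): pool an (H, W) FM over s x s blocks, then scale, shift, ReLU."""
    H, W = fmap.shape
    trimmed = fmap[:H - H % s, :W - W % s]                      # drop ragged edges
    z = trimmed.reshape(H // s, s, W // s, s).max(axis=(1, 3))  # z_n^{l-1}
    return np.maximum(0.0, weight * z + bias)                   # f(w * z + b)
```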
Following the CL is the pooling layer, which is also unique to CNNs; unlike the CL, however, the pooling layer does not apply an activation function32. The combination of the CL and pooling layer can be repeated multiple times in the hidden layers, with the number of repetitions determined by the model's requirements33. The FCL is typically employed after the CL and pooling layers, and the network's final layer often uses a sigmoid or radial basis function. Equation (4) represents the calculation of the sigmoid function in the OL:

\(y_{n}^{l}=f\left(\sum_{m=1}^{N_{l-1}}\omega_{m,n}^{l}y_{m}^{l-1}+b_{n}^{l}\right),\quad f(x)=\frac{1}{1+e^{-x}}\)  (4)
In Eq. (4), \({N_{l - 1}}\) refers to the number of neurons in the (l−1)-th layer, \(\omega _{{m,n}}^{l}\) represents the weight of the connection between the m-th FM in the (l−1)-th layer and the n-th neuron in the l-th layer, and \(y_{m}^{{l - 1}}\) refers to the m-th FM in the (l−1)-th layer.
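For illustration, a direct NumPy transcription of Eq. (4) for a single output neuron (the vector shapes are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def output_neuron(y_prev, weights, bias):
    """Eq. (4): y_prev has length N_{l-1}; weights are omega_{m,n}^l."""
    return sigmoid(np.dot(weights, y_prev) + bias)
```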
The CNN training process involves the application of the back-propagation (BP) algorithm due to the presence of the CL and DSL. These layers significantly reduce the parameters requiring training, contributing to computational efficiency. The supervised learning algorithm is employed in constructing the CNN34,35,36, and the training process involves correcting errors through gradient descent. The forward and backward movement of the gradient in each step of the algorithm consumes considerable time. Figure 4 visually represents the algorithm flow during the training process of the CNN.
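The training loop in Fig. 4 reduces to repeated gradient-descent updates; a schematic sketch follows, where `grad_fn` stands in for the back-propagated gradient computation and the learning rate is an assumed value.

```python
def sgd_step(weights, grad_fn, batch, eta=0.001):
    """One BP/gradient-descent update: move each weight against its gradient."""
    grads = grad_fn(weights, batch)  # backward pass over the mini-batch
    return {name: w - eta * grads[name] for name, w in weights.items()}
```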
CNNs have emerged as a powerful DL method in image processing, excelling in the extraction of intricate image features. The core methodology involves convolving learned filters over the image, extracting features comprehensively from different facets and orientations. This process yields a multitude of features while training only a small number of convolution parameters37,38. The efficiency is primarily attributed to weight sharing, with the network iteratively optimized by training the convolution kernel parameters through the BP method.
Vehicle location and binary classification research based on MT-CNN
In complex images where vehicles may be absent, identifying vehicles necessitates a synergistic approach involving image classification and automobile localization. This work employs an MT-DL model, specifically an MT-CNN, to address automobile localization and image classification concurrently. MT-CNN capitalizes on related tasks to strengthen learning, effectively distinguishing differences between tasks while facilitating the sharing of pertinent features. The accompanying figures illustrate the distinction between single-task DL (Fig. 5A) and MT-DL (Fig. 5B).
The automobile localization and feature extraction process in this work leverages GooGleNet. The rationale behind choosing GooGleNet lies in its use of multiple 1 × 1 convolution kernels, which enhance nonlinear fitting capability and facilitate dimension reduction. Although GooGleNet yields three outputs, the result from the final OL is regarded as the optimal one. Hence, the output from the 22nd layer of the network is selected and combined with the positional output. Figure 6 illustrates the MT-GooGleNet model.
In addressing the automobile localization problem, a modification is introduced to the network's loss function (LF), employing the Euclidean distance function. Considering a multi-class problem with C classes and N training samples, the modified LF is expressed in Eq. (5):

\(E=\frac{1}{2N}\sum_{n=1}^{N}\sum_{k=1}^{C}\left(t_{k}^{n}-y_{k}^{n}\right)^{2}\)  (5)
In Eq. (5), \(t_{k}^{n}\) represents the k-th dimension of the label for the n-th sample, and \(y_{k}^{n}\) denotes the k-th dimension of the network's output for the n-th sample.
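A direct NumPy transcription of Eq. (5), with the 1/(2N) normalization assumed:

```python
import numpy as np

def euclidean_loss(t, y):
    """Eq. (5): t and y are (N, C) arrays of labels and network outputs."""
    n = t.shape[0]
    return np.sum((t - y) ** 2) / (2.0 * n)
```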
The proposed algorithm primarily addresses the collection of electronic traffic signals in ITS, focusing in particular on vehicle localization and recognition. Its core is the DL-based MT-GooGleNet model. The first step of the algorithm is to construct a framework for the electronic TSA system. This framework utilizes cameras to capture video of road traffic scenes, which is then transmitted to a VIP for processing. The VIP is responsible for detecting and tracking vehicle information and converting the processed results into relevant traffic parameters, which are then sent to the data server and integrated into the traffic control platform. The VAT module is a critical part of the system, responsible for obtaining video images from traffic scenes using acquisition devices and transmitting them to the multimedia processing unit. During this process, considerations include the effective acquisition range, image quality, and potential challenges such as detection range limitations, image blurring, low contrast, and occlusion.

Vehicle localization and recognition are the core components of the algorithm. Initially, the image is segmented into detection and tracking areas based on lane considerations using virtual line detection. Within the tracking area, tracking is performed based on the dynamics of the target. If the target is not tracked within the specified area, the algorithm avoids unnecessary analysis to reduce computational overhead and tracking blind spots.

The CNN is a feedforward neural network in DL that combines the functions of a classifier and a feature extractor. After an image is input, a series of feature vectors is generated through multiple iterations of feature learning, and these vectors are classified and recognized in the classifier. This work adopts the MT-CNN model, particularly MT-GooGleNet, to simultaneously address image classification and vehicle positioning. MT-CNN utilizes related tasks to strengthen learning, effectively distinguishing differences between tasks and promoting the sharing of relevant features. For the vehicle localization problem, the network's LF is modified to employ the Euclidean distance function. To capture comprehensive information during the driving process, the MT-GooGleNet model is used to locate vehicles in the image; the vehicle regions are then used as inputs to the final classification model. In other words, MT-GooGleNet first performs vehicle localization, and the localization results serve as inputs to a second CNN that classifies vehicle categories. Table 1 presents the model code, and a simplified sketch of this cascade is given below.
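The following is a minimal sketch of the cascaded inference just described, assuming hypothetical model objects with Keras-style `predict` interfaces; it is an illustration, not the code in Table 1.

```python
def locate_and_classify(image, mt_googlenet, classifier, conf_thresh=0.5):
    """Two-stage cascade: multi-task localization, then category classification."""
    score, box = mt_googlenet.predict(image)  # BC score + bounding-box heads
    if score < conf_thresh:                   # image contains no vehicle
        return None
    x1, y1, x2, y2 = box
    vehicle_crop = image[y1:y2, x1:x2]        # shear out the located region
    return classifier.predict(vehicle_crop)   # second CNN: vehicle category
```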
The present work compares GooGleNet, CaffeNet, and VGGNet in terms of automobile localization accuracy. The experiment utilizes a dataset comprising 30,000 images, consisting of 15,000 positive samples (PSs) and 15,000 artificially generated negative samples (NSs). Within this dataset, 20,000 images are allocated for training, while the remaining 10,000 are reserved for testing, with PSs and NSs distributed proportionally. The binary classification (BC) experiment employs two methods. In the first, MT-GooGleNet undergoes comprehensive training, with all its network layers initialized anew before BC training. In the second, MT-GooGleNet is fine-tuned: its pre-trained parameters initialize the CLs and FCLs of all but the last two layers of MT-GooGleNet before BC training. Following these procedures, a BC experiment is conducted, and the results undergo comparative analysis.
Optimization of the MT-GooGleNet model
Based on the original MT-GooGleNet model, an improved model with MMFF and AM is proposed—AM-MMFF-GooGleNet. The specific structure of the model is depicted in Fig. 7:
In Fig. 7, AM-MMFF-GooGleNet aims to enhance the robustness and accuracy of vehicle localization and identification tasks. The model is innovatively optimized based on the classic MT-GooGleNet structure, incorporating MMFF and AM. In the shallow network, the MMFF module integrates multimodal inputs, such as RGB and grayscale images, to extract features with higher expressiveness, ensuring that the model maintains stable performance under different lighting conditions. Meanwhile, in the deep network, a Squeeze-and-Excitation (SE) block is added to adaptively adjust the weight of each channel's features, enhancing the model's ability to focus on target features and suppressing interference from complex backgrounds (a sketch is given below). Moreover, a dynamic loss-weight mechanism is designed to enhance multi-task learning performance: the model dynamically adjusts the weight of each task in the loss function based on the needs of the classification and localization tasks, achieving a better balance. The overall model consists of an input layer, the MMFF module, convolutional feature extraction layers, the attention optimization module, and an output layer. The improved AM-MMFF-GooGleNet enhances the synergy between modules and significantly improves the model's efficiency and accuracy.
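As an illustration of the channel AM, the following is a minimal SE-block sketch written against the TensorFlow 1.15 API used in the experiments; the reduction ratio r = 16 is an assumption, not a value reported here.

```python
import tensorflow as tf

def se_block(x, r=16):
    """Channel attention: squeeze (global pool), excite (two FC layers),
    then reweight each channel of the (batch, H, W, C) feature map x."""
    c = x.get_shape().as_list()[-1]
    squeeze = tf.reduce_mean(x, axis=[1, 2])                       # (batch, C)
    excite = tf.layers.dense(squeeze, c // r, activation=tf.nn.relu)
    excite = tf.layers.dense(excite, c, activation=tf.nn.sigmoid)  # channel weights
    return x * tf.reshape(excite, [-1, 1, 1, c])
```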
Data processing and decision-making process in ITS
The proposed traffic system processing flow is a complete pipeline involving the collaboration of multiple modules, from data collection to the derivation of traffic parameters. The first step is data collection, carried out primarily by cameras that capture real-time video of road traffic scenes. These video data contain the dynamic behaviors and static features of vehicles and serve as the foundational data source for subsequent processing. The collected video data are transmitted to the VIP through wired or wireless networks; during this process, data compression and transmission efficiency are crucial to ensuring data integrity and real-time performance.

Upon receiving the video data, the VIP preprocesses them, including noise reduction, contrast enhancement, and color correction, to improve image quality for subsequent vehicle detection and tracking. The VIP also converts the video stream into individual frames for processing by the DL models. The processed single-frame images are fed into the DL-based detection models. The enhanced MT-GooGleNet model proposed here plays its role in this step, accurately locating vehicles in the image and tracking their movement trajectories; by learning from a large amount of vehicle image data, the model captures the appearance characteristics and behavioral patterns of vehicles.

The results of vehicle localization and recognition are used to derive traffic parameters. For example, traffic flow can be calculated from vehicle position information, and traffic composition and speed can be analyzed from vehicle type and speed information (a simple sketch is given below). These parameters form the basis for traffic management decisions. The derived traffic parameters are fed into the data processing module, which is responsible for further analysis, for instance, predicting traffic trends through statistical analysis and optimizing traffic signal control through machine learning algorithms. Finally, the processed and analyzed data are integrated into the traffic control platform to provide decision support for traffic management departments; these data can be used to adjust the timing of traffic signals, plan emergency response routes, and disseminate traffic information, among other purposes. Throughout the entire process, the interaction between modules is achieved through data interfaces and communication networks: data interfaces define the data formats and transmission protocols to ensure correct transmission between modules, while communication networks ensure real-time transmission and efficient processing. Through this collaborative approach, the proposed system achieves real-time monitoring and management of traffic flow, improving the operational efficiency and safety of the traffic system.
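To illustrate the parameter-derivation step, a minimal sketch follows; the per-track fields (`distance_m`, `duration_s`) and the flow formula are illustrative assumptions rather than the system's actual computation.

```python
def traffic_parameters(completed_tracks, interval_s):
    """Derive traffic flow (veh/h) and mean speed (m/s) from tracking output."""
    flow = len(completed_tracks) * 3600.0 / interval_s  # vehicles per hour
    speeds = [t["distance_m"] / t["duration_s"]
              for t in completed_tracks if t["duration_s"] > 0]
    mean_speed = sum(speeds) / len(speeds) if speeds else 0.0
    return flow, mean_speed
```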
Identification of vehicle category via MT-CNN
This work validates the effectiveness of the proposed AM-MMFF-GooGleNet model through a series of carefully designed simulation experiments. A DL-based vehicle recognition model is trained for this purpose. First, the images undergo preprocessing, including image enhancement and resizing. Then, MT-GooGleNet is selected as the model architecture, with the LF defined as the cross-entropy LF. The Adam optimizer is employed with an initial learning rate of 0.001, decayed by a factor of 0.1 every 1000 iterations. Training proceeds by mini-batch stochastic gradient descent with 32 samples per batch. Batch normalization is applied to accelerate the convergence of the model, and a Dropout probability of 0.5 is used to prevent overfitting. During training, 5-fold cross-validation is employed to evaluate the performance of the model.
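The configuration above maps onto the TensorFlow 1.15 API roughly as follows; the loss and model tensors are placeholders for the actual graph.

```python
import tensorflow as tf

# Optimizer setup matching the stated schedule: Adam, initial rate 0.001,
# decayed by a factor of 0.1 every 1000 iterations.
global_step = tf.train.get_or_create_global_step()
learning_rate = tf.train.exponential_decay(
    0.001, global_step, decay_steps=1000, decay_rate=0.1, staircase=True)
optimizer = tf.train.AdamOptimizer(learning_rate)

# Placeholders for the actual graph (batch size 32, dropout 0.5,
# batch normalization inside the network):
# loss = tf.losses.softmax_cross_entropy(onehot_labels, logits)
# train_op = optimizer.minimize(loss, global_step=global_step)
```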
In capturing comprehensive information during the driving process, this work employs the MT-GooGleNet model to locate vehicles in images. The vehicle area in the image, after shearing, is then utilized as input for the final classification model. The methodology employs MT-GooGleNet for initial automobile localization, using the localization results as input for the second CNN to classify vehicle categories. Notably, two enhancements are incorporated into GooGleNet. First, the training set of GooGleNet is augmented by mirroring, and GooGleNet is then used to classify the original images. Second, GooGleNet undergoes pre-training on the raw images, utilizing the initial parameters of the MT-DL model, and the resulting models are compared and analyzed. This work utilizes the Vehicle Make and Model Recognition dataset (VMMRdb), available at: https://www.kaggle.com/datasets/abhishektyagi001/vehicle-make-model-recognition-dataset-vmmrdb. This dataset comprises 9,170 categories and 291,752 images, covering vehicle models manufactured from 1950 to 2016. The dataset was collected by different users using various imaging devices, and the images include multiple shooting angles, ensuring the diversity and complexity of the data. This design fully considers variable factors in practical application scenarios, such as lighting conditions, imaging device differences, and viewpoint variations. Additionally, the vehicle images in the dataset are not strictly aligned, and some samples contain irrelevant backgrounds, further increasing the challenge and the robustness requirements for model training. The VMMRdb dataset also covers 712 regions, encompassing vehicle images from all 412 metropolitan areas in the United States, laying a solid foundation for the model's wide applicability in real-world scenarios such as traffic monitoring. Regarding data processing, the raw data are first standardized and preprocessed, including image denoising, brightness equalization, and contrast enhancement, to reduce interference caused by imaging devices and lighting conditions. Furthermore, to improve the model's adaptability to multi-scene and multi-angle samples, a series of data augmentation techniques is applied, such as random rotation, cropping, scaling, horizontal flipping, and color jittering (a sketch is given below). These techniques expand the effective size of the training data and help prevent overfitting. For images containing complex backgrounds, segmentation techniques are used to isolate the target vehicle and reduce interference from irrelevant backgrounds. For data splitting, the dataset is randomly divided into training, validation, and test sets at a ratio of 7:2:1, ensuring an even distribution of samples across categories. Through these steps, the processed dataset not only enhances the model's generalization ability but also distinctly improves its applicability and robustness in complex scenarios, providing a reliable data foundation for subsequent experiments.
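The augmentation pipeline maps onto TensorFlow 1.15 image ops roughly as follows; the jitter magnitudes, rotation range, and crop size are assumptions.

```python
import numpy as np
import tensorflow as tf

def augment(image):
    """One randomly augmented 227 x 227 view of an input image tensor."""
    image = tf.image.random_flip_left_right(image)            # horizontal flip
    image = tf.image.random_brightness(image, max_delta=0.2)  # brightness jitter
    image = tf.image.random_saturation(image, 0.8, 1.2)       # color jitter
    angle = tf.random_uniform([], -np.pi / 18, np.pi / 18)    # about +/- 10 degrees
    image = tf.contrib.image.rotate(image, angle)             # random rotation
    image = tf.image.resize_images(image, [256, 256])         # rescale
    return tf.random_crop(image, [227, 227, 3])               # random crop
```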
During model training, the performance of AM-MMFF-GooGleNet is further enhanced through fine-tuning and proper parameter initialization. First, a fine-tuning strategy is designed to improve the model's performance on the specific tasks, especially in balancing the multiple tasks. The model is initialized with a learning rate of 0.001, and an exponential decay is applied that reduces the learning rate to 0.9 times its previous value every 500 iterations, enabling the model to converge gradually during training. Additionally, to prevent overfitting and enhance robustness, Dropout is applied after the FCLs and CLs with a rate of 0.5, reducing the model's overfitting to the training data. At the same time, to accelerate convergence and stabilize training, Batch Normalization is applied after each CL, which is highly effective in speeding up network training and further improves stability. For parameter initialization, the He initialization method is used for the CLs and FCLs. Compared to the traditional Xavier initialization, He initialization demonstrates faster convergence and higher accuracy when used with the ReLU activation function (a sketch follows). To avoid vanishing or exploding gradients, smaller initial values are set for the weights and biases of the CLs, ensuring stable learning during training. Through these fine-tuning steps and initialization optimizations, AM-MMFF-GooGleNet achieves faster convergence and higher classification performance during training.
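He initialization for a CL can be sketched as below; the bias constant is an illustrative assumption.

```python
import numpy as np

def he_init_conv(k, in_channels, out_channels, seed=0):
    """He initialization: weights ~ N(0, sqrt(2 / fan_in)) for ReLU layers."""
    rng = np.random.default_rng(seed)
    fan_in = k * k * in_channels
    std = np.sqrt(2.0 / fan_in)
    weights = rng.normal(0.0, std, size=(k, k, in_channels, out_channels))
    biases = np.full(out_channels, 0.01)  # small constant bias (assumed)
    return weights, biases
```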
Experimental environment
The experiments are conducted in a dedicated environment tailored for model training and testing. The powerful computational capabilities of the GPU, coupled with the optimization provided by CUDA and cuDNN, enable efficient processing of complex calculations involved in DL models. Additionally, the stability of the Ubuntu operating system and TensorFlow framework ensures smooth execution of the experiments. With these high-performance hardware and software configurations, the accuracy and reliability of the experimental results are ensured.
The experiments run on the Ubuntu 18.04 LTS operating system, a stable Linux distribution widely used for servers and high-performance computing. The GPU is an NVIDIA Tesla K40c, a high-end GPU designed for data centers and scientific computing, with excellent computational performance and energy efficiency. The CPU is an Intel Xeon E5-2670 v3 @ 2.30 GHz, providing robust data processing capabilities. With 64 GB of DDR4 RAM, efficient operation on large-scale datasets is guaranteed. For storage, a 512 GB SSD holds the operating system and all necessary software, ensuring fast data read/write speeds. On the software side, TensorFlow 1.15.2 is utilized as the DL framework, an open-source machine learning framework widely used in research and production. The CUDA version is 10.0, NVIDIA's parallel computing platform and API for GPU programming. The cuDNN version is 7.4.2, NVIDIA's acceleration library for deep neural networks, which significantly enhances the training and inference speed of DL models. Python 3.6.8 is used, a widely adopted high-level programming language suitable for rapid development and prototyping. Each experiment is run in this fixed environment, and the performance of different models (MT-GooGleNet, ResNet-50, You Only Look Once version 5 (YOLOv5), EfficientNet-B7, DEtection TRansformer (DETR), etc.) is compared in terms of accuracy, recall, F1-score, and inference time.
Results and discussion
Analysis of automobile localization accuracy
In comparing MT-GooGleNet with CaffeNet and VGGNet, the results of automobile localization accuracy are presented in Fig. 8. Figure 8A illustrates the PS accuracies, while Fig. 8B depicts the NS error rates.

Figure 8 depicts the accuracy (PS accuracies) and error rates (NS error rates) of the models evaluated under different overlapping rates (OLRs). Figure 8A illustrates that at an OLR of 0.7, the PS accuracies for MT-CaffeNet, MT-GooGleNet, and MT-VGGNet are 96.5%, 97.1%, and 96.5%, respectively. At an OLR of 0.8, the PS accuracies are 96.3%, 95.5%, and 95.3%, and at an OLR of 0.9, they are 94.1%, 95.4%, and 94.3%. As the OLR increases, the accuracy of all three models declines, with MT-GooGleNet exhibiting the highest PS accuracy at most OLRs. Figure 8B indicates that at an OLR of 0.7, the NS error rates for MT-CaffeNet, MT-GooGleNet, and MT-VGGNet are 0.04%, 0.03%, and 0.04%, respectively. At an OLR of 0.8, the NS error rates are 0.13%, 0.06%, and 0.12%, and at an OLR of 0.9, they are 0.4%, 0.1%, and 0.38%. With increasing OLR, the error rates of all three models surge, and MT-GooGleNet consistently exhibits the lowest NS error rate. These results indicate that the MT-GooGleNet model combines high PS accuracy with the lowest NS error rate across all OLRs, suggesting superior accuracy and robustness in vehicle localization tasks. This advantage may stem from the 1 × 1 convolution kernels used in MT-GooGleNet, which enhance the network's nonlinear expressive capability and thereby improve localization accuracy.
BC results analysis for vehicles
Figure 9 exhibits the BC outcomes of MT-GooGleNet following both the refinement and complete training procedures.
Figure 9 shows that in the BC task, experiments are conducted on MT-GooGleNet using two training approaches: full training and fine-tuning. The results show that fine-tuned MT-GooGleNet achieves classification accuracy exceeding 98% at a consistent recall rate, and even reaches 99.5% in certain cases. This underscores the significant impact of parameter initialization on training outcomes and demonstrates the effectiveness of the fine-tuning strategy in improving model performance.
Recognition results of vehicle category
Vehicle type recognition outcomes are obtained through the direct application of fine-tuned GooGleNet, CaffeNet, and VGGNet for classifying the raw image. Figures 10 and 11 present the comparative results for recognition accuracy and LFs, respectively. In Fig. 10, a comparison of recognition accuracy for vehicle types is illustrated based on the MT-GooGleNet model.
Figure 10 presents the accuracy comparison for recognition of vehicle categories via the MT-GooGleNet model. Training encompasses a total of 200,000 iterations, commencing with an initial learning rate of 0.005. Notably, the network converges at 10,000 iterations, with data augmentation and pre-training contributing to the expeditious convergence. Consequently, automobile recognition attains its peak accuracy of 87.2% following the implementation of data augmentation and pre-training. The increased volume of data contributes to the network's training, resulting in enhanced accuracy in vehicle identification.
Figure 11 depicts the variation in the LF of the cascaded MT-GooGleNet (CMT-GooGleNet) model. The graph shows that, through the application of data augmentation and pre-training, the CMT-GooGleNet model achieves rapid convergence, ultimately reaching a final LF close to zero.
Real-time performance comparison of models
Figure 12 presents a real-time performance comparison of the proposed CMT-GooGleNet model. The system runs Ubuntu with an NVIDIA Tesla K40 GPU, features a 22-layer network, and employs images sized 227 × 227. The models are compared using identical inputs. Figure 12A showcases the accuracy results on the test set, while Fig. 12B illustrates the comparison of mean testing times.
In Fig. 12A, the proposed MT-GooGleNet model demonstrates distinct test set accuracies: 68.23% with data augmentation alone, 70.14% with pre-training alone, and, notably, an increased accuracy of 79.96% with the synergistic effect of combined data augmentation and pre-training. Consequently, the proposed MT-GooGleNet model exhibits substantial accuracy improvement through these processing steps. Figure 12B highlights the variation in test time between CPU and GPU for different network structures under the same dataset conditions. Leveraging the formidable graphics computing capabilities of a GPU markedly increases the testing speed of the model. This result indicates that the MT-GooGleNet model not only performs well in accuracy but also holds potential application value in real-time processing.
The comparison of the improved model's performance with other models in the vehicle localization task is outlined in Table 2. It indicates that the improved AM-MMFF-GooGleNet model markedly outperforms the other comparison models in vehicle localization tasks. Specifically, the model achieves an accuracy of 98.6%, a 3.2% improvement over the original MT-GooGleNet. Additionally, the recall and F1-score are increased by 3.7% and 3.5%, respectively, demonstrating a strong overall performance advantage. Compared to ResNet-50 and YOLOv5, the improved model also exhibits significant improvements in accuracy and recall, with a particularly noticeable optimization of the F1-score. This reveals that the improved model can more accurately identify vehicle features in complex scenarios. Although the average detection time of AM-MMFF-GooGleNet is slightly higher than that of YOLOv5, its superior localization accuracy compensates for this shortcoming, highlighting the model's balance between accuracy and speed. Furthermore, compared to the other models, AM-MMFF-GooGleNet is better suited for high-precision applications, particularly excelling in vehicle feature recognition and localization accuracy.
Table 3 presents the localization accuracy and error rate of the improved model under different overlap rates. It can be observed that as the overlap rate increases, the model’s localization accuracy slightly decreases, while the localization error rate slightly increases. However, even in high-complexity scenarios with an overlap rate of 0.9, the model’s localization accuracy remains at a high level of 96.3%, and the localization error rate is controlled at 0.08%, demonstrating the model’s robustness. This performance is attributed to the application of the MMFF module in the improved model, which effectively enhances the model’s feature extraction ability when processing complex background information. In addition, the AM further improves the expression ability of key features. Overall, AM-MMFF-GooGleNet maintains stable localization accuracy when handling datasets with high overlap rates, showing its reliability and adaptability in high-complexity tasks.
The real-time performance comparison experiment is shown in Table 4. Based on the results in Table 4, it can be found that the improved AM-MMFF-GooGleNet model performs excellently in both GPU and CPU environments. In the GPU environment, the average test time of the improved model is 15.6 milliseconds, significantly lower than the 25.3 milliseconds of the original MT-GooGleNet, and comparable to ResNet-50, indicating that the model enhances accuracy without sacrificing runtime speed. In the CPU environment, although the test time increases to 65.4 milliseconds, it is still significantly lower than MT-GooGleNet’s 127.1 milliseconds. Concurrently, it outperforms other more complex DL models in terms of real-time performance and resource utilization efficiency. The improved model’s balance between speed and performance benefits from the introduction of the attention mechanism. This effectively reduces interference from irrelevant information, while the dynamic loss weighting mechanism further optimizes the model’s training efficiency. Therefore, AM-MMFF-GooGleNet excels in accuracy and possesses good real-time performance, meeting the dual requirements of speed and performance in practical application scenarios.
To validate the performance of the improved model, multiple experiments are designed, including comparisons with classical and modern models, as well as performance testing under different scenarios. The model performance comparison results are presented in Table 5. It can be observed that the improved AM-MMFF-GooGleNet outperforms the traditional MT-GooGleNet and other mainstream DL models across multiple performance metrics. Specifically, AM-MMFF-GooGleNet achieves an accuracy of 98.1%, a 3.6% improvement over the original MT-GooGleNet. The recall and F1-score also improve by 4.3% and 4.0%, respectively. Compared to modern models (such as ResNet and EfficientNet), AM-MMFF-GooGleNet still demonstrates higher accuracy (for example, a 0.9% improvement over EfficientNet), particularly showing higher overall performance in the F1-score. Moreover, the model performs excellently in terms of average detection time, averaging 27.6 milliseconds, a 33% reduction from the original model, and outperforms the latest architectures such as Swin Transformer and EfficientNet. This indicates that the improved framework proposed here remarkably enhances accuracy and optimizes inference efficiency, making it better suited to the real-time requirements of practical scenarios.
The model’s performance under different scenarios is listed in Fig. 13.
In Fig. 13A, under normal lighting conditions, the accuracy of AM-MMFF-GooGleNet increases from 94.5% for MT-GooGleNet to 98.2%, demonstrating the new model's significant performance advantage under standard conditions. In low-light and high-light scenarios, the new model's accuracy reaches 97.3% and 96.8%, respectively, improving by 5.2% and 5.1% over the original model and effectively enhancing adaptability to complex lighting conditions. Moreover, in high background noise scenarios, the accuracy of AM-MMFF-GooGleNet is 95.5%, a 6.2% improvement over MT-GooGleNet, showcasing its strong resistance to background interference. These results confirm that the new model exhibits higher robustness and accuracy in various complex scenarios, further supporting its potential for practical application.
Figure 13B reveals that the fine-tuned AM-MMFF-GooGleNet shows improvements across all performance metrics, with accuracy increasing from 95.2% to 98.1% and the F1-score rising from 94.5% to 97.8%. Moreover, the average detection time of the fine-tuned model is also reduced, indicating that the model improves accuracy without significantly increasing computational complexity.
In Fig. 13C, the AM-MMFF-GooGleNet with He initialization reaches 98.1% accuracy after 120 training epochs and converges faster than with Xavier or random initialization. This demonstrates that He initialization offers remarkable advantages in accelerating model convergence and improving accuracy.
Conclusion
This work designs an IoT and DL-based traffic electronic information signal acquisition system, successfully achieving efficient and accurate vehicle positioning and recognition. By introducing the AM-MMFF-GooGleNet model, which combines MMFF and AM, the model’s robustness and accuracy in complex environments are markedly enhanced. Experimental results show that the model achieves an accuracy of 98.6% in vehicle positioning tasks, while maintaining high performance under varying lighting conditions and high background noise scenarios. Furthermore, the system performs excellently in terms of real-time capability, with an average detection time of only 20.5 milliseconds, meeting the real-time application needs of actual traffic scenarios. These findings offer strong technical support for the optimization of ITSs and lay a solid foundation for the intelligent and efficient development of future intelligent traffic management.
However, this work still has some limitations. First, although the AM-MMFF-GooGleNet model performs exceptionally well in various complex scenarios, its performance under extreme weather conditions (such as heavy rain and snow) still needs further verification and optimization. Second, the current system relies mainly on visual information captured by cameras, with insufficient exploration of multi-sensor fusion (such as radar and LiDAR), which may limit the system's applicability in certain special scenarios. Moreover, although the model performs well in real-time capability, the system's computational resource consumption and energy efficiency in large-scale deployments still require further optimization to meet the needs of broader practical applications. Future research could focus on addressing these shortcomings and expanding the system's functionality and application scenarios. On the one hand, multi-sensor fusion technology could be explored, combining the advantages of visual sensors with other sensors to improve the system's stability and reliability in complex environments. For performance under extreme weather conditions, more robust feature extraction methods and model architectures could be studied to further enhance the system's adaptability. On the other hand, attention should be given to optimizing energy consumption and improving computational efficiency through lightweight model designs and hardware acceleration, reducing the system's resource consumption and making it more suitable for large-scale deployment and real-time applications. Furthermore, with the rapid development of autonomous driving and V2X technologies, the integration of this system with related technologies could be explored, providing a more comprehensive and efficient technological solution for future intelligent traffic ecosystems and driving the development of intelligent transportation to higher levels.
Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author Yue Ma on reasonable request via e-mail mayue8999@hrbu.edu.cn.
References
Kumar, R. et al. A multifaceted vigilare system for intelligent transportation services in smart cities. IEEE Internet Things Magazine. 3 (4), 76–80 (2020).
Mollah, M. B. et al. Blockchain for the internet of vehicles towards intelligent transportation systems: A survey. IEEE Internet Things J. 8 (6), 4157–4185 (2021).
Lytras, M. D., Chui, K. T. & Liu, R. W. Moving towards intelligent transportation via artificial intelligence and Internet-of-Things. Sensors 20 (23), 6945 (2020).
Vanitha, N. S., Karthikeyan, J., Kavitha, G. & Radhika, K. Modelling of intelligent transportation system for human safety using IoT. Mater. Today Proc. 33, 4026–4029 (2020).
Zhang, Y., Zhang, Y., Zhao, X., Zhang, Z. & Chen, H. Design and data analysis of sports information acquisition system based on internet of medical things. IEEE Access. 8, 84792–84805 (2020).
Hu, B., Lai, J. H. & Guo, C. C. Location-aware fine-grained vehicle type recognition using multi-task deep networks. Neurocomputing 243, 60–68 (2017).
Huang, H., Li, Q. & Zhang, D. Deep learning based image recognition for crack and leakage defects of metro shield tunnel. Tunn. Undergr. Space Technol. 77, 166–176 (2018).
Dourado, C. M. et al. An open IoHT-based deep learning framework for online medical image recognition. IEEE J. Sel. Areas Commun. 39 (2), 541–548 (2020).
Hu, J. X. The analysis of plants image recognition based on deep learning and artificial neural network. IEEE Access. 8, 68828–68841 (2020).
Toldinas, J. et al. A novel approach for network intrusion detection using multistage deep learning image recognition. Electronics 10 (15), 1854 (2021).
Zhao, K. et al. Application research of image recognition technology based on CNN in image location of environmental monitoring UAV. EURASIP J. Image Video Process. 2018 (1), 1–11 (2018).
Chen, C. et al. This looks like that: deep learning for interpretable image recognition. arXiv preprint arXiv:1806.10574 (2018).
Li, H., Chen, Y., Li, K., Wang, C. & Chen, B. Transportation internet: A sustainable solution for intelligent transportation systems. IEEE Trans. Intell. Transp. Syst. 24 (12), 15818–15829 (2023).
Liu, C. & Ke, L. Cloud assisted internet of things intelligent transportation system and the traffic control system in the smart city. J. Control Decis. 10 (2), 174–187 (2023).
Musa, A. A. et al. Sustainable traffic management for smart cities using Internet-of-Things-oriented intelligent transportation systems (ITS): challenges and recommendations. Sustainability 15 (13), 9859 (2023).
Panigrahy, S. K. & Emany, H. A survey and tutorial on network optimization for intelligent transport system using the internet of vehicles. Sensors 23 (1), 555 (2023).
Zeng, F., Pang, C. & Tang, H. Sensors on the internet of things systems for urban disaster management: A systematic literature review. Sensors 23 (17), 7475 (2023).
Yang, S., Tan, J., Lei, T. & Linares-Barranco, B. Smart traffic navigation system for fault-tolerant edge computing of internet of vehicle in intelligent transportation gateway. IEEE Trans. Intell. Transp. Syst. 24 (11), 13011–13022 (2023).
Das, D., Banerjee, S., Chatterjee, P., Ghosh, U. & Biswas, U. Blockchain for intelligent transportation systems: applications, challenges, and opportunities. IEEE Internet Things J. 10 (21), 18961–18970 (2023).
Prakash, J., Murali, L., Manikandan, N., Nagaprasad, N. & Ramaswamy, K. RETRACTED ARTICLE: A vehicular network based intelligent transport system for smart cities using machine learning algorithms. Sci. Rep. 14 (1), 468 (2024).
Gong, T., Zhu, L., Yu, F. R. & Tang, T. Edge intelligence in intelligent transportation systems: A survey. IEEE Trans. Intell. Transp. Syst. 24 (9), 8919–8944 (2023).
Njoku, J. N., Nwakanma, C. I., Amaizu, G. C. & Kim, D. S. Prospects and challenges of metaverse application in data-driven intelligent transportation systems. IET Intel. Transport Syst. 17 (1), 1–21 (2023).
Karthikeyan, H. & Usha, G. A secured IoT-based intelligent transport system (IoT-ITS) framework based on cognitive science. Soft Comput. 28 (23), 13929–13939 (2024).
Oladimeji, D. et al. Smart transportation: an overview of technologies and applications. Sensors 23 (8), 3880 (2023).
Zhang, S. et al. Federated learning in intelligent transportation systems: recent applications and open problems. IEEE Trans. Intell. Transp. Syst. 25 (5), 3259–3285 (2023).
Garg, T. & Kaur, G. A systematic review on intelligent transport systems. J. Comput. Cogn. Eng. 2 (3), 175–188 (2023).
Chen, H. et al. A deep learning CNN architecture applied in smart near-infrared analysis of water pollution for agricultural irrigation resources. Agric. Water Manage. 240, 106303 (2020).
Gupta, H., Jin, K. H., Nguyen, H. Q., McCann, M. T. & Unser, M. CNN-based projected gradient descent for consistent CT image reconstruction. IEEE Trans. Med. Imaging. 37 (6), 1440–1453 (2018).
Ozkok, F. O. & Celik, M. A hybrid CNN-LSTM model for high resolution melting curve classification. Biomed. Signal Process. Control. 71, 103168 (2022).
Khairandish, M. O., Sharma, M., Jain, V., Chatterjee, J. M. & Jhanjhi, N. Z. A hybrid CNN-SVM threshold segmentation approach for tumor detection and classification of MRI brain images. IRBM 43 (4), 290–299 (2021).
Kattenborn, T., Leitloff, J., Schiefer, F. & Hinz, S. Review on convolutional neural networks (CNN) in vegetation remote sensing. ISPRS J. Photogramm. Remote Sens. 173, 24–49 (2021).
Bertoni, F., Citti, G. & Sarti, A. LGN-CNN: A biologically inspired CNN architecture. Neural Netw. 145, 42–55 (2021).
Siavashi, J., Najafi, A., Ebadi, M. & Sharifi, M. A CNN-based approach for upscaling multiphase flow in digital sandstones. Fuel 308, 122047 (2022).
Vrbančič, G. & Podgorelec, V. Efficient ensemble for image-based identification of pneumonia utilizing deep CNN and SGD with warm restarts. Expert Syst. Appl. 187, 115834 (2022).
Moradzadeh, A., Teimourzadeh, H., Mohammadi-Ivatloo, B. & Pourhossein, K. Hybrid CNN-LSTM approaches for identification of type and locations of transmission line faults. Int. J. Electr. Power Energy Syst. 135, 107563 (2022).
Aslan, M. F., Sabanci, K. & Durdu, A. A CNN-based novel solution for determining the survival status of heart failure patients with clinical record data: numeric to image. Biomed. Signal Process. Control. 68, 102716 (2021).
İnik, Ö., Altıok, M., Ülker, E. & Koçer, B. MODE-CNN: A fast converging multi-objective optimization algorithm for CNN-based models. Appl. Soft Comput. 109, 107582 (2021).
Xie, J., Hu, K., Guo, Y., Zhu, Q. & Yu, J. On loss functions and CNNs for improved bioacoustic signal classification. Ecol. Inform. 64, 101331 (2021).
Funding
The 2023 Harbin University Teacher Teaching Development Fund Project "Promoting Teaching through Competitions: Application of BOPPPS and Peer Instruction in Applied Undergraduate Engineering Teaching—Taking the Course of Automatic Control Principles as an Example" (Project No.: JFQJ2023005); The 2024 Heilongjiang Provincial Teaching Reform Research Project "Construction and Innovative Application of Mechanics Courses Based on Superstar Knowledge Graph + AI Teaching Assistant" (Project No.: SJGYX2024037); The 2023 Higher Education Research Planning Project of the Chinese Higher Education Society "Research on the Cultivation Status and Strategies of Core Literacy in Basic Disciplines under the New College Entrance Examination" (Project No.: 23ZK0407); The 2022 Provincial Higher Education Teaching Reform General Project "Research on the Path of Cultivating Innovation and Entrepreneurship Values of College Students in the New Era" (Project No.: SJGY20220509).
Author information
Contributions
Yue Ma: Conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing—original draft preparation. Chenglong Wang: writing—review and editing, visualization, supervision, project administration, funding acquisition. Tianlei Fu: software, validation, formal analysis. Ziting Meng: visualization, supervision.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ma, Y., Wang, C., Fu, T. et al. The analysis of acquisition system for electronic traffic signal in smart cities based on the internet of things. Sci Rep 15, 20628 (2025). https://doi.org/10.1038/s41598-025-07423-6