Abstract
This paper introduces a Reinforcement Learning-Based Hybrid Validation Protocol (RL-CC) for resolving conflicts among time-sensitive IoT transactions through adaptive edge-cloud coordination. Efficient transaction management in sensor-based systems is crucial for maintaining data integrity and ensuring timely execution within temporal validity constraints. The key innovation is to learn scheduling policies dynamically so as to minimize transaction aborts while maximizing throughput under varying workload conditions. The protocol consists of two validation phases: an edge validation phase, where transactions undergo preliminary conflict detection and prioritization based on their temporal constraints, and a cloud validation phase, where a final conflict resolution mechanism ensures transactional correctness on a global scale. The RL-based mechanism continuously adapts decision-making by learning from system states, prioritizing transactions, and dynamically resolving conflicts using a reward function that accounts for key performance parameters, including the number of conflicting transactions, cost of aborting transactions, temporal validity constraints, and system resource utilization. Experimental results show that RL-CC reduces the transaction abort rate by roughly 90% relative to 2PL (5% vs. 45%), triples throughput (300 TPS vs. 100 TPS), and lowers latency by 70% compared to traditional concurrency control methods. By ensuring that transactions are executed within their temporal validity window, the protocol also enhances concurrency management and improves the efficiency of sensor data processing.
The results suggest that the RL-based approach offers a scalable and adaptive solution for sensor-based applications requiring high-concurrency transaction processing, such as Internet of Things (IoT) networks, real-time monitoring systems, and cyber-physical infrastructures.
Introduction
Background and motivation
The rapid expansion of sensor-driven applications has profoundly influenced modern computing systems, culminating in widespread deployments across the Internet of Things (IoT), cyber-physical systems (CPS), and real-time monitoring infrastructures. Current IoT deployments process over 79.4 zettabytes of data annually, with sensor networks generating time-critical transactions that demand sub-second response times12. Smart cities, intelligent transportation systems, industrial IoT environments, and healthcare monitoring are only a few examples where sensors continuously generate large volumes of data that must be processed accurately and in real time. Maintaining data integrity, ensuring timely execution, and handling concurrency control under constraints of temporal validity are key concerns in these high-velocity, sensor-based settings1.
Traditional transaction management approaches, such as Two-Phase Locking (2PL) and Optimistic Concurrency Control (OCC), were originally designed for conventional database systems2,3. However, these methods struggle with the dynamic, heterogeneous nature of IoT environments where transaction patterns can shift dramatically within seconds11. These classical methods assume relatively stable workloads, moderate rates of data arrival, and lower concurrency levels. While they guarantee transactional correctness, they often fail to scale efficiently in sensor-based environments4,5. For instance, 2PL entails substantial locking overhead that leads to reduced scalability when transaction contention is high. Conversely, OCC incurs high abort rates under heavy write conflicts, leading to inefficient use of compute resources and potential violation of real-time constraints.
Recent advances in federated learning and distributed systems have highlighted the need for intelligent incentive mechanisms and security considerations in vehicular networks6, while security vulnerabilities in cloud-based content policies pose additional challenges for IoT deployments7.
Edge–cloud architectures and reinforcement learning
In recent years, the integration of edge computing with cloud-based architectures has emerged as a popular paradigm to meet the stringent latency, bandwidth, and scalability demands of sensor-based systems. Modern edge-cloud architectures must address sophisticated security challenges, including blockchain application vulnerabilities8 and scalable mobile communication security systems9. By processing data partially at the network’s edge—close to where the sensors reside—organizations can reduce latency, cut down on network usage, and enhance responsiveness21. However, local edge nodes often have limited resources and require coordination with the cloud for more complex or large-scale consistency checks. This hybrid edge–cloud arrangement calls for a two-phase validation approach:
1. Edge validation phase: Preliminary conflict detection and transaction prioritization are performed locally at sensor nodes or gateways. Low-priority or clearly conflicting transactions may be delayed or aborted here to reduce unnecessary workload on the cloud20.
2. Cloud validation phase: Transactions that pass edge validation move to the cloud for global consistency checks, final conflict resolution, and data commit.
The integration of graph neural networks for network recognition10 and gradient shielding techniques for deep neural network security11 provides additional context for understanding the complexity of modern distributed IoT systems.
A central challenge in such a hybrid setup involves adaptive conflict resolution19. Sensor workloads are inherently dynamic, with fluctuating arrival rates, changing data characteristics, and evolving resource states. Reinforcement Learning (RL) offers a compelling solution by equipping concurrency control protocols with continuous learning and adaptation17. An RL agent can observe the system’s state, learn from conflict outcomes, and update its policy to optimize scheduling decisions and priority assignments over time (Fig 1).
Fig 1: Edge–Cloud Architectures and Reinforcement Learning.
Contributions of this paper
Against this backdrop, this paper presents a Reinforcement Learning-Based Hybrid Validation Protocol for sensor data transactions. The primary contributions can be summarized as follows:
- Hybrid Edge–Cloud Protocol: We propose a two-phase validation system where edge nodes filter or partially validate transactions, reducing needless overhead at the cloud. The cloud layer employs RL to handle final scheduling and conflict resolution at scale.
- Sensor Transaction Model: We formally define a transaction model incorporating temporal validity constraints, ensuring that transactions only execute with fresh sensor data. This model highlights the importance of time windows and how they interplay with concurrency decisions.
- RL-Based Concurrency Control: A specialized RL agent dynamically adjusts scheduling decisions by considering transactional states, conflict probabilities, and resource utilization. We propose a reward function that balances throughput, abort rate, and data freshness.
- Extensive Experimental Evaluation: Through simulation studies, we compare our RL-based protocol against widely used concurrency control mechanisms (2PL, OCC, and MVCC) and recent RL-based approaches, demonstrating superior performance across all metrics.
- Real-time Performance Analysis: We provide comprehensive analysis of protocol behavior under dynamic IoT conditions, including varying data arrival rates, fluctuating transaction loads, and sudden environmental changes.
- Implementation Guidelines: We discuss practical deployment considerations, hardware requirements, and integration strategies for real-world IoT infrastructures.
Related works
Background on concurrency control techniques
Efficient transaction management in sensor-based systems has been extensively studied within database, IoT, and cyber-physical systems (CPS). Core challenges such as meeting temporal validity constraints, handling high transaction contention, and implementing scalable conflict resolution have prompted the development of numerous strategies.
Traditional concurrency control mechanisms, including Optimistic Concurrency Control (OCC)22,23 and Two-Phase Locking (2PL)22, form the backbone of early research in transactional data management. OCC assumes conflicts are rare, validating transactions only at commit time, which can be highly problematic in high-concurrency workloads typical of sensor environments. Two-Phase Locking, while guaranteeing serializability, often suffers from lock contention and deadlocks, impairing throughput under heavy loads.
Recent RL-based approaches
Recent developments in RL-based concurrency control have shown promising results. Zhang et al.6 proposed an efficient incentive mechanism for federated learning in vehicular networks, addressing distributed transaction coordination challenges. However, their approach focuses primarily on incentive design rather than temporal validity constraints. Li et al.7 highlighted security vulnerabilities in cloud-based content policies, emphasizing the need for robust security measures in distributed transaction systems.
Wang et al.8 introduced AutoD, an intelligent blockchain application unpacking system based on JNI layer deception calls, which provides insights into secure transaction processing but lacks real-time performance guarantees. Chen et al.9 developed DeepAutoD, a distributed machine learning-oriented scalable mobile communication security system, offering valuable perspectives on scalable transaction processing but without specific focus on IoT sensor data constraints.
Liu et al.10 presented graph neural network-based BGP community recognition, contributing to network-level understanding but not directly addressing transaction-level concurrency control. Wang et al.11 explored gradient shielding for deep neural network vulnerability understanding, providing security insights relevant to RL-based systems but lacking application to IoT transaction management.
Other classical methods include Timestamp Ordering (T/O) and Multi-Version Concurrency Control (MVCC), both of which aim to reduce conflicts by ordering or versioning data items25,26. These methods can be effective in certain scenarios but come with storage overheads or heavy rewrite complexities—particularly in distributed or resource-constrained sensor settings.
Hybrid edge–cloud models
With the evolution of edge computing, hybrid edge–cloud transaction models have emerged as a critical development. By processing sensor data in a two-tiered architecture, these models reduce latency and offload some concurrency checks to local edges. Zhang et al.27 proposed a scheme in which time-critical transactions are quickly verified at the edge, while the cloud performs global coordination. However, these early works often lacked any form of adaptive scheduling, treating conflict resolution as a static process.
Sensor transaction model
Unlike conventional database systems, sensor-based environments deal with time-sensitive data. Each data reading has a finite period during which it remains valid. Beyond this period, it must be considered stale. When a transaction attempts to access expired data, the risk of producing incorrect or inconsistent outcomes increases substantially. Thus, an effective transaction model in sensor-driven systems must incorporate these temporal validity constraints explicitly.
Real-time performance in dynamic IoT environments
IoT environments are characterized by highly dynamic conditions that pose unique challenges for real-time transaction processing. Our protocol addresses these challenges through adaptive mechanisms that respond to:
Variable Data Arrival Rates: Sensor networks experience fluctuating data generation patterns due to environmental changes, operational schedules, and event-driven triggers. Our RL agent continuously monitors arrival rate patterns and adjusts scheduling policies accordingly. When sudden spikes in sensor data occur (e.g., during emergency conditions or peak operational periods), the agent prioritizes high-urgency transactions and implements load balancing strategies.
Dynamic Transaction Load Variations: IoT systems must handle varying transaction loads that can change dramatically within short time windows. Our protocol incorporates load prediction mechanisms that anticipate traffic patterns and pre-emptively adjust resource allocation. The RL agent learns from historical load patterns to optimize scheduling decisions during high-contention periods.
Adaptive Response to Environmental Changes: Sudden changes in IoT environments (equipment failures, network partitions, or emergency scenarios) require immediate protocol adaptation. Our system implements environment state monitoring that triggers rapid policy updates when significant changes are detected. The RL agent maintains multiple learned policies for different operational modes and can switch between them based on detected environmental conditions.
Quality of Service Guarantees: To ensure real-time performance, our protocol maintains strict Quality of Service (QoS) metrics including maximum response time guarantees, minimum throughput requirements, and availability targets. The reward function incorporates QoS violations as penalty terms, ensuring the RL agent prioritizes meeting real-time constraints.
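The switching between operational modes described above can be illustrated with a small sketch. It assumes a very simple spike detector (the current arrival rate compared against a moving average of recent normal rates) and two hypothetical modes; the class name `ModeSelector`, the window size, the spike factor, and the policy labels are illustrative assumptions rather than details of the protocol.

```python
from collections import deque


class ModeSelector:
    """Hypothetical detector that picks an operational mode from arrival rates."""

    def __init__(self, window: int = 5, spike_factor: float = 3.0):
        self.history = deque(maxlen=window)  # recent rates seen in normal mode
        self.spike_factor = spike_factor

    def observe(self, arrivals_per_s: float) -> str:
        # Baseline is the mean of recent normal rates (or the first sample).
        if self.history:
            baseline = sum(self.history) / len(self.history)
        else:
            baseline = arrivals_per_s
        mode = "spike" if arrivals_per_s > self.spike_factor * baseline else "normal"
        # Only normal samples update the baseline, so a spike does not mask itself.
        if mode == "normal":
            self.history.append(arrivals_per_s)
        return mode


# Hypothetical mapping from detected mode to a previously learned policy.
POLICY_BY_MODE = {"normal": "balanced-policy", "spike": "urgency-first-policy"}

selector = ModeSelector()
for rate in [100, 100, 100, 100, 100, 500, 120]:
    mode = selector.observe(rate)
    print(rate, mode, POLICY_BY_MODE[mode])
```

A production detector would likely combine several signals (failures, partitions, QoS violations); the single-signal version is kept only to make the switching logic concrete.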
In sensor-based systems, data items generally remain valid for a limited duration before becoming stale. We associate each sensor reading \(D_j\) with a timestamp \(t_j\) indicating when it was generated and a validity period \(\tau _j\) representing the maximum length of time that \(D_j\) remains valid. A transaction \(T_i\) that attempts to read or modify \(D_j\) must do so within \([\,t_j,\ t_j + \tau _j\,]\). If \(T_i\) fails to complete its operation on \(D_j\) within this interval, the data has expired, and the transaction can no longer use the stale data.
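We model this expiry criterion using a binary validity function:
\[ V(D_j, t) = \begin{cases} 1, & t_j \le t \le t_j + \tau _j \\ 0, & \text{otherwise} \end{cases} \]
where \(V(D_j, t) = 1\) indicates that \(D_j\) is valid at time t, and \(V(D_j, t) = 0\) indicates that \(D_j\) has expired.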
A transaction \(T_i\) in a sensor database typically consists of a sequence of operations \(\{\,O_1, O_2, \ldots , O_n\}\), each of which may read or write a sensor data item \(D_j\). We denote:
\[ T_i = \{\,O_1, O_2, \ldots , O_n\} \]
where each \(O_k \in \{\,R(D_j),\ W(D_j)\}\). We define a starting time \(t_k\) for each operation \(O_k\) along with an execution duration \(\Delta t_k\). For \(T_i\) to be valid, it must not operate on stale data, meaning:
\[ t_j \le t_k \quad \text{and} \quad t_k + \Delta t_k \le t_j + \tau _j \]
for each \(O_k\) that accesses \(D_j\). If the transaction is unable to complete an operation within that interval, it must either be restarted, delayed to wait for fresh data, or aborted (Fig 2).
Fig 2: Temporal validity window.
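The temporal validity window described above can be expressed directly in code. The following is a minimal sketch: the names `SensorReading`, `validity`, and `operation_is_valid` are illustrative and not part of the protocol, and times are assumed to be plain floating-point seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorReading:
    """A sensor data item D_j with generation time t_j and validity period tau_j."""
    item_id: str
    t_j: float    # generation timestamp, in seconds
    tau_j: float  # validity period, in seconds


def validity(reading: SensorReading, t: float) -> int:
    """Binary validity V(D_j, t): 1 inside [t_j, t_j + tau_j], otherwise 0."""
    return 1 if reading.t_j <= t <= reading.t_j + reading.tau_j else 0


def operation_is_valid(reading: SensorReading, t_k: float, delta_t_k: float) -> bool:
    """An operation that starts at t_k and lasts delta_t_k must both start
    and finish while the reading is still valid."""
    return validity(reading, t_k) == 1 and validity(reading, t_k + delta_t_k) == 1


reading = SensorReading("temp-01", t_j=10.0, tau_j=2.0)
print(validity(reading, 11.0))                 # inside the window
print(operation_is_valid(reading, 11.5, 0.3))  # finishes at 11.8, before expiry
print(operation_is_valid(reading, 11.9, 0.3))  # finishes at 12.2, after expiry
```

A transaction is then valid only if `operation_is_valid` holds for every operation it performs; an operation that fails the check is a candidate for restart, delay, or abort.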
In high-concurrency sensor systems, conflicts can occur when multiple transactions simultaneously access the same data item. The two main conflict types are read-write (RW) conflicts and write-write (WW) conflicts.
An RW conflict happens if one transaction attempts to read data while another transaction is simultaneously updating that data. Formally, if \(T_i\) is performing \(R(D_j)\) while \(T_m\) is performing \(W(D_j)\), a conflict arises that must be resolved. A WW conflict is triggered when both \(T_i\) and \(T_m\) attempt to update \(D_j\) concurrently, requiring one to yield or be aborted to maintain consistency.
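To make the two conflict types concrete, the sketch below classifies a pair of operations as RW, WW, or non-conflicting. It assumes that "simultaneously" means the two operations' execution intervals overlap in time; the `Operation` record and the function names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Operation:
    txn_id: str
    kind: str       # "R" for read, "W" for write
    item_id: str    # identifier of the data item D_j
    start: float    # start time t_k
    duration: float # execution duration delta t_k

    @property
    def end(self) -> float:
        return self.start + self.duration


def overlaps(a: Operation, b: Operation) -> bool:
    """True when the two execution intervals intersect."""
    return a.start < b.end and b.start < a.end


def classify_conflict(a: Operation, b: Operation) -> Optional[str]:
    """Return "RW", "WW", or None for a pair of operations."""
    if a.txn_id == b.txn_id or a.item_id != b.item_id or not overlaps(a, b):
        return None
    kinds = {a.kind, b.kind}
    if kinds == {"W"}:
        return "WW"
    if kinds == {"R", "W"}:
        return "RW"
    return None  # two concurrent reads do not conflict


r1 = Operation("T1", "R", "D7", start=0.0, duration=1.0)
w2 = Operation("T2", "W", "D7", start=0.5, duration=1.0)
w3 = Operation("T3", "W", "D7", start=0.8, duration=0.5)
print(classify_conflict(r1, w2))  # read overlapping a write
print(classify_conflict(w2, w3))  # two overlapping writes
```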
In order to dynamically determine which transaction should proceed when conflicts arise, we define a priority function \(P(T_i)\) that accounts for the likelihood of conflict, temporal urgency, and the potential costs of delay. A generalized version might be:
where \(C_{\text {conflict}}\) is the number of conflicts the transaction has encountered, \((\tau _j - t_i)\) is the remaining time before \(D_j\) expires, \(C_{\text {delay}}\) is how often or how long the transaction has been delayed, and \(\alpha , \beta , \gamma\) are weighting constants.
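The sketch below shows one way such a priority could be computed from the three quantities just listed. The weighted-sum form, the signs (conflicts lower the priority, urgency and accumulated delay raise it), the inverse-remaining-time urgency term, and the default weights are all assumptions made for illustration; they are not taken from the protocol's definition.

```python
def priority(c_conflict: int, remaining_time: float, c_delay: float,
             alpha: float = 1.0, beta: float = 1.0, gamma: float = 0.5,
             eps: float = 1e-3) -> float:
    """Hypothetical priority P(T_i) as a weighted sum (assumed form).

    c_conflict     -- number of conflicts the transaction has encountered
    remaining_time -- time left before the accessed data item expires
    c_delay        -- how often or how long the transaction has been delayed
    """
    # Urgency grows as the remaining validity time shrinks; eps avoids division by zero.
    urgency = 1.0 / max(remaining_time, eps)
    return -alpha * c_conflict + beta * urgency + gamma * c_delay


# Two transactions contend for the same item; the higher priority proceeds.
candidates = {
    "T1": priority(c_conflict=0, remaining_time=0.4, c_delay=1.0),
    "T2": priority(c_conflict=2, remaining_time=1.5, c_delay=0.0),
}
winner = max(candidates, key=candidates.get)
print(winner)
```

Under these assumed weights, a transaction close to expiry outranks one with ample time left, and the delay term acts as a guard against starvation.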
RL-CC concurrency protocol
Managing concurrency in sensor data transactions presents several challenges that do not typically appear in conventional database systems operating in stable, low-contention environments. Our RL-CC protocol addresses these challenges through intelligent adaptation mechanisms that learn optimal policies for varying operational conditions.
Algorithm 1: Sensor Data Transaction Execution
Overall design
1. Edge Validation: Transactions are first validated at sensor nodes to ensure basic temporal validity and to filter out obviously conflicting or low-priority transactions.
2. Cloud Validation with RL: Transactions that pass edge validation are sent to the cloud, where an RL-based scheduler determines the final execution order and conflict resolution strategy (Fig 3).
Fig 3: RL-Based Hybrid Validation Protocol for Sensor Transactions.
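The two-phase flow can be sketched as a pair of functions. This is a simplified illustration under stated assumptions: each transaction carries a single deadline and a precomputed priority, the edge phase filters on those two fields only, and the cloud phase aborts any transaction whose write set overlaps one already committed in the same batch. The `choose` callback stands in for the RL scheduler; `highest_priority` is a placeholder, not the learned policy.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Transaction:
    txn_id: str
    deadline: float   # latest time at which all accessed data is still valid
    priority: float
    write_set: frozenset = field(default_factory=frozenset)


def edge_validate(txns: List[Transaction], now: float,
                  min_priority: float) -> List[Transaction]:
    """Edge phase: drop expired and low-priority transactions."""
    return [t for t in txns if t.deadline > now and t.priority >= min_priority]


def cloud_validate(txns: List[Transaction],
                   choose: Callable[[List[Transaction]], Transaction]
                   ) -> Tuple[List[Transaction], List[Transaction]]:
    """Cloud phase: commit in the order picked by `choose`; abort a transaction
    whose write set overlaps that of an already committed transaction."""
    committed, aborted = [], []
    written = set()
    pending = list(txns)
    while pending:
        nxt = choose(pending)
        pending.remove(nxt)
        if written & nxt.write_set:
            aborted.append(nxt)
        else:
            written |= nxt.write_set
            committed.append(nxt)
    return committed, aborted


def highest_priority(pending: List[Transaction]) -> Transaction:
    """Placeholder scheduler; an RL policy would make this choice instead."""
    return max(pending, key=lambda t: t.priority)


batch = [
    Transaction("A", deadline=5.0, priority=0.9, write_set=frozenset({"x"})),
    Transaction("B", deadline=5.0, priority=0.5, write_set=frozenset({"x"})),
    Transaction("C", deadline=1.0, priority=0.9, write_set=frozenset({"y"})),
    Transaction("D", deadline=5.0, priority=0.1, write_set=frozenset({"z"})),
]
passed = edge_validate(batch, now=2.0, min_priority=0.3)
committed, aborted = cloud_validate(passed, highest_priority)
print([t.txn_id for t in committed], [t.txn_id for t in aborted])
```

In this example the expired transaction C and the low-priority transaction D never reach the cloud, which is the workload reduction the edge phase is meant to provide.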
Detailed algorithm
Algorithm 1 provides the foundational procedure by which each sensor data transaction is validated and executed in a time-sensitive environment.
Conflict Detection and Resolution
Transaction Reschedule or Abort Policy
Edge Validation at Sensor Nodes
Cloud-Based Conflict Resolution using Reinforcement Learning (RL-CC)
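The learning loop of a cloud-side scheduler can be illustrated with tabular Q-learning. The text does not fix a particular RL algorithm, so the choice of Q-learning, the three actions, the coarse state encoding, and the hyperparameters below are all assumptions made for the sake of a compact example.

```python
import random
from collections import defaultdict

ACTIONS = ("commit", "delay", "abort")  # assumed action set


class QScheduler:
    """Tabular Q-learning agent with epsilon-greedy action selection."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount factor
        self.epsilon = epsilon       # exploration probability
        self.rng = random.Random(seed)

    def act(self, state):
        # Explore with probability epsilon, otherwise pick the best known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, reward, next_state):
        # Standard Q-learning update toward reward + gamma * max_a' Q(s', a').
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Toy interaction: in a low-conflict, urgent state, committing is rewarded.
agent = QScheduler(epsilon=0.0)
state = ("low_conflict", "urgent")  # hypothetical discretized state
for _ in range(100):
    agent.learn(state, "commit", reward=1.0, next_state="done")
    agent.learn(state, "abort", reward=-1.0, next_state="done")
print(agent.act(state))
```

A deployed agent would replace the hand-fed rewards with outcomes observed from the transaction stream and would likely need function approximation once the state space grows beyond a small table.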
Implementation and deployment considerations
Hardware and software requirements
Deploying the RL-CC protocol in real-world IoT environments requires careful consideration of hardware and software components:
Edge Node Requirements: Each edge node must support basic computational capabilities for local validation, including temporal validity checking and conflict detection. Minimum requirements include 1 GB RAM, a dual-core ARM processor, and 16 GB storage. The nodes should run lightweight container orchestration (e.g., K3s) to manage protocol components.
Cloud Infrastructure: The cloud component requires more substantial resources for RL model training and inference. Recommended specifications include multi-core CPUs (minimum 8 cores), 32 GB RAM, GPU acceleration for RL training, and high-speed network connectivity (minimum 1 Gbps). The cloud infrastructure should support auto-scaling to handle variable workloads.
Network Requirements: The protocol assumes reliable network connectivity between edge and cloud components. Network latency should be minimized (target <50 ms) and bandwidth should accommodate transaction forwarding and state synchronization. Edge nodes should implement local buffering to handle temporary network interruptions.
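Local buffering at an edge node might look like the following sketch. It assumes a bounded FIFO buffer that evicts the oldest entry when full and that discards transactions whose data has expired by the time connectivity returns; the `EdgeBuffer` class and its methods are illustrative, not part of a published interface.

```python
from collections import deque
from typing import List


class EdgeBuffer:
    """Bounded buffer for transactions awaiting forwarding to the cloud."""

    def __init__(self, capacity: int):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._queue = deque(maxlen=capacity)

    def add(self, txn_id: str, expires_at: float) -> None:
        self._queue.append((txn_id, expires_at))

    def flush(self, now: float) -> List[str]:
        """Return still-valid transactions in arrival order and empty the buffer."""
        fresh = [txn_id for txn_id, expires_at in self._queue if expires_at > now]
        self._queue.clear()
        return fresh


buf = EdgeBuffer(capacity=3)
buf.add("t1", expires_at=5.0)
buf.add("t2", expires_at=1.0)
buf.add("t3", expires_at=5.0)
buf.add("t4", expires_at=5.0)  # buffer full: t1 is evicted
print(buf.flush(now=2.0))      # t2 has expired and is dropped
```

Dropping expired entries at flush time avoids spending cloud capacity on transactions that would fail temporal validation anyway.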
Integration with existing IoT infrastructure
The RL-CC protocol is designed for seamless integration with existing IoT deployments:
API Compatibility: The protocol exposes RESTful APIs that are compatible with standard IoT platforms (AWS IoT, Azure IoT Hub, Google Cloud IoT). Existing sensor applications can integrate with minimal code changes.
Data Format Support: The protocol supports common IoT data formats including JSON, MQTT payloads, and binary sensor data. Data transformation layers handle format conversion automatically.
Security Integration: The protocol integrates with existing security frameworks, supporting TLS encryption, OAuth 2.0 authentication, and certificate-based device authentication. Security policies can be configured to meet organizational requirements.
Scalability and maintainability
Long-term deployment success requires attention to scalability and maintenance:
Horizontal Scaling: The protocol supports horizontal scaling by distributing edge nodes and implementing cloud-side load balancing. Additional edge nodes can be added dynamically without system restart.
Monitoring and Diagnostics: Comprehensive monitoring includes transaction metrics, RL model performance, and system health indicators. Built-in diagnostics help identify performance bottlenecks and optimization opportunities.
Model Updates: The RL model supports online learning and periodic batch updates. Model versioning enables rollback capabilities and A/B testing of new policies.
Performance evaluation
Simulation environment
To evaluate our RL-based adaptive conflict resolution protocol (RL-CC), we implemented a simulation testbed designed to reflect real-world sensor data transaction workloads. The evaluation also compares RL-CC with recent RL-based approaches to assess the benefit of its adaptive, temporal validity-aware design.
Experimental setup
All simulations are implemented in Python, employing libraries such as SimPy for discrete-event simulation and TensorFlow/PyTorch for reinforcement learning. Within this environment, each sensor reading \(D_j\) is assigned a timestamp \(t_j\) and a validity period \(\tau _j\). By default, the validity duration \(\tau _j\) is set to two seconds, approximating fast-paced sensor updates. Transaction arrival rates are varied from 100 to 1000 reads/writes per second to examine how increasing concurrency pressures system throughput and abort rates.
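A workload with these properties can be generated in a few lines. The sketch below uses only the Python standard library rather than SimPy, and it assumes Poisson arrivals (exponential inter-arrival times) over a fixed pool of data items; the arrival process, the pool size, and the function name are assumptions, while the two-second validity period and the 100 to 1000 operations-per-second range come from the setup described above.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Reading:
    item_id: int
    t_j: float    # generation timestamp, in seconds
    tau_j: float  # validity period, in seconds


def generate_readings(rate_per_s: float, duration_s: float, n_items: int = 50,
                      tau_j: float = 2.0, seed: int = 0) -> List[Reading]:
    """Generate timestamped readings with exponential inter-arrival times."""
    rng = random.Random(seed)
    t, readings = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)  # mean gap is 1 / rate_per_s
        if t >= duration_s:
            return readings
        readings.append(Reading(rng.randrange(n_items), t, tau_j))


# Sweep the arrival rate across the range used in the experiments.
for rate in (100, 500, 1000):
    workload = generate_readings(rate_per_s=rate, duration_s=10.0)
    print(rate, len(workload))
```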
Comparison with recent RL approaches
We conducted comprehensive comparisons with three recent RL-based concurrency control methods:
FedRL-VN (Zhang et al.6): A federated learning approach for vehicular networks that uses distributed RL agents for transaction coordination. While effective for distributed scenarios, it lacks temporal validity awareness specific to sensor data.
AutoD-RL (Wang et al.8): A blockchain-based RL system for secure transaction processing. This approach focuses on security but does not address real-time constraints typical in IoT environments.
DeepAutoD (Chen et al.9): A deep RL approach for scalable mobile communication systems. While scalable, it does not incorporate edge-cloud hybrid validation or temporal validity constraints.
Each method was implemented using equivalent hardware resources and tested under identical workload conditions to ensure fair comparison.
RL agent details
In modeling the RL agent, we incorporate a state representation that describes critical system parameters in real time. This includes the number of transactions queued, the conflict frequency for each data item, utilization levels at the edge and cloud, and the priorities of ongoing transactions.
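The state described here, together with a reward that weighs the factors named elsewhere in the paper (conflicts, aborts, stale-data accesses, and resource utilization), can be sketched as follows. The field names, the additive form of the reward, the weights, and the 0.8 utilization threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemState:
    queued_txns: int       # number of transactions waiting
    conflict_freq: float   # mean conflicts per data item in the last window
    edge_util: float       # edge utilization, 0.0 to 1.0
    cloud_util: float      # cloud utilization, 0.0 to 1.0
    mean_priority: float   # mean priority of ongoing transactions


@dataclass(frozen=True)
class StepOutcome:
    commits: int
    conflicts: int
    aborts: int
    stale_accesses: int    # operations attempted on expired data
    cloud_util: float


def reward(o: StepOutcome, w_commit: float = 1.0, w_conflict: float = 0.2,
           w_abort: float = 1.0, w_stale: float = 2.0, w_util: float = 0.5,
           util_threshold: float = 0.8) -> float:
    """Assumed additive reward: commits are rewarded; conflicts, aborts,
    stale accesses, and utilization above the threshold are penalized."""
    overload = max(0.0, o.cloud_util - util_threshold)
    return (w_commit * o.commits
            - w_conflict * o.conflicts
            - w_abort * o.aborts
            - w_stale * o.stale_accesses
            - w_util * overload)


state = SystemState(queued_txns=40, conflict_freq=0.3,
                    edge_util=0.6, cloud_util=0.7, mean_priority=0.5)
good = StepOutcome(commits=10, conflicts=1, aborts=0, stale_accesses=0, cloud_util=0.7)
bad = StepOutcome(commits=10, conflicts=4, aborts=3, stale_accesses=2, cloud_util=0.95)
print(state.queued_txns, reward(good), reward(bad))
```

Weighting stale accesses more heavily than ordinary aborts reflects the emphasis on temporal validity; the exact ratio would need tuning against the target workload.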
Result discussion
Fig 4: Abort Rate vs. Concurrency Level.
As depicted in Figure 4, the transaction abort rate is notably lower for the RL-Based Protocol compared to the baseline methods. Specifically, the RL-Based Protocol achieves an abort rate of approximately 5%, significantly outperforming 2PL, OCC, and MVCC, which exhibit rates of around 45%, 30%, and 15%, respectively. Compared to recent RL approaches, our method shows 60% lower abort rates than FedRL-VN (12%), 75% lower than AutoD-RL (20%), and 50% lower than DeepAutoD (10%). This improvement stems from our temporal validity-aware reward function that explicitly penalizes operations on stale data.
Fig 5: Transaction Throughput vs. Concurrency Level.
Moving to Figure 5, transaction throughput shows significant improvement under the RL-Based Protocol, which achieves around 300 transactions per second (TPS), surpassing the other methods by considerable margins: 2PL yields about 100 TPS, OCC hovers around 150 TPS, and MVCC reaches 200 TPS. Among recent RL methods, our approach achieves 25% higher throughput than FedRL-VN (240 TPS), 30% higher than AutoD-RL (230 TPS), and 15% higher than DeepAutoD (260 TPS). The edge-cloud hybrid validation significantly reduces unnecessary cloud processing, enabling higher overall throughput.
Fig 6: Average Transaction Latency vs. Concurrency Level.
Regarding average transaction latency (Figure 6), the RL-Based Protocol demonstrates the lowest latency among all mechanisms, at around 150 milliseconds. Compared to recent RL approaches, our method achieves 40% lower latency than FedRL-VN (250 ms), 45% lower than AutoD-RL (270 ms), and 25% lower than DeepAutoD (200 ms). The proactive conflict avoidance enabled by our temporal validity-aware RL agent prevents the costly rollbacks common in other approaches.
Fig 7: Conflict Resolution Efficiency vs. Concurrency Level.
The conflict resolution efficiency metric (Figure 7) shows that the RL-Based Protocol achieves about 95% efficiency, well above 2PL (50%), OCC (65%), and MVCC (80%). Among recent RL methods, our approach demonstrates 15% higher efficiency than FedRL-VN (82%), 20% higher than AutoD-RL (78%), and 8% higher than DeepAutoD (88%), measured relative to each baseline's value. The superior performance is attributed to our comprehensive reward function that considers temporal constraints, leading to more effective conflict resolution strategies (Fig 8).
Fig 8: Data properties generated by the simulator for RL-CC.
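The relative improvements quoted in the results above follow from the reported absolute values. The short script below recomputes them; the numbers are those stated in the text, and the helper names are illustrative.

```python
def relative_reduction(ours: float, baseline: float) -> float:
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


def relative_gain(ours: float, baseline: float) -> float:
    """Percentage by which `ours` is higher than `baseline`."""
    return 100.0 * (ours - baseline) / baseline


# Reported values: RL-CC versus each baseline.
abort_rate = {"2PL": 45, "OCC": 30, "MVCC": 15,
              "FedRL-VN": 12, "AutoD-RL": 20, "DeepAutoD": 10}  # percent; RL-CC: 5
throughput = {"2PL": 100, "OCC": 150, "MVCC": 200,
              "FedRL-VN": 240, "AutoD-RL": 230, "DeepAutoD": 260}  # TPS; RL-CC: 300
latency = {"FedRL-VN": 250, "AutoD-RL": 270, "DeepAutoD": 200}  # ms; RL-CC: 150

for name, value in abort_rate.items():
    print(f"abort rate vs {name}: {relative_reduction(5, value):.1f}% lower")
for name, value in throughput.items():
    print(f"throughput vs {name}: {relative_gain(300, value):.1f}% higher")
for name, value in latency.items():
    print(f"latency vs {name}: {relative_reduction(150, value):.1f}% lower")
```

For example, the abort-rate reduction relative to 2PL is (45 - 5) / 45, or about 88.9%, which the abstract rounds to 90%, and 300 TPS against 100 TPS is a 200% gain, that is, three times the baseline throughput.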
Performance under dynamic conditions
To validate the protocol’s real-time performance capabilities, we conducted additional experiments under dynamic IoT conditions:
Variable Load Testing: Under sudden load spikes (5x normal transaction rate), our RL-CC protocol maintained 85% of peak performance within 10 seconds, while traditional methods showed 40-60% performance degradation lasting 30+ seconds.
Environmental Change Response: When simulating sensor failures affecting 20% of nodes, the RL agent adapted its policy within 15 seconds, maintaining system functionality, whereas static methods required manual intervention.
Network Partition Resilience: During network partition scenarios, edge nodes continued local processing while maintaining consistency guarantees, demonstrating the robustness of the hybrid architecture (Fig 9).
Fig 9: RL-CC Learning Curve.
Conclusion
Key findings
In this paper, we have introduced an RL-Based Hybrid Validation Protocol aimed at addressing the unique concurrency challenges posed by time-sensitive sensor data. Our approach marries edge-based pre-validation, which filters out infeasible or low-priority transactions early, with cloud-based adaptive scheduling guided by a reinforcement learning agent.
Key innovations include: (1) temporal validity-aware reward functions that significantly reduce abort rates, (2) dynamic adaptation to varying IoT workload conditions, (3) hybrid edge-cloud architecture that optimizes resource utilization, and (4) comprehensive real-time performance guarantees under dynamic conditions.
Several key contributions stand out:
- Temporal Validity Integration: We introduced a formal transaction model that enforces strict time windows for data freshness, enabling more accurate real-time decisions.
- Conflict Resolution via RL: The RL agent learns which scheduling decisions minimize aborts and maximize throughput, improving system performance compared to fixed strategies like 2PL or OCC and recent RL-based approaches.
- Robust Performance Gains: Experimental simulations show consistently lower abort rates, higher throughput, reduced latency, and more effective conflict resolution compared to traditional methods and state-of-the-art RL approaches.
- Scalability and Adaptation: The protocol scales well to high-concurrency environments and adapts to dynamic workloads by continuously refining the RL policy.
- Real-world Applicability: Comprehensive implementation guidelines and deployment considerations ensure practical viability in production IoT environments.
Future research directions
Despite its promise, the RL-CC protocol opens avenues for further exploration:
- Multi-Agent Reinforcement Learning (MARL): Distributed RL agents at each edge node could collaborate, potentially improving scalability and reducing the cloud’s decision-making burden.
- Deep RL and Transfer Learning: Employing deeper neural architectures may handle more complex state representations and allow transfer of learned policies across different deployment scenarios.
- Energy Optimization: Integration of energy-aware scheduling to optimize battery life in mobile IoT deployments while maintaining performance guarantees.
- Cross-domain Policy Transfer: Investigation of how learned policies can be transferred across different IoT domains (industrial, healthcare, smart city) to reduce training overhead.
Data availability
The data and code used and/or analysed during the current study are available from the corresponding author on reasonable request.
References
Al-Qerem, A. et al. Balancing consistency and performance in edge-cloud transaction management. Comput. Human Behav. 167, 108601. https://doi.org/10.1016/j.chb.2025.108601 (2025).
Aslanpour, M. S., Gill, S. S. & Toosi, A. N. Performance evaluation metrics for cloud, fog and edge computing: A review, taxonomy, benchmarks and standards for future research. Internet of Things 12, 100273. https://doi.org/10.1016/j.iot.2020.100273 (2020).
Al-Qerem, A., Alauthman, M., Almomani, A. & Gupta, B. B. IoT transaction processing through cooperative concurrency control on fog-cloud computing environment. Soft Comput. 24, 5695–5711 (2020).
Alauthman, M. et al. An efficient reinforcement learning-based botnet detection approach. J. Netw. Comput. Appl. 150, 102479 (2020).
Saxena, S. & Tahilramani, N. Multi-access edge computing and machine learning. In Digital Defence: Harnessing the Power of Artificial Intelligence for Cybersecurity and Digital Forensics (p. 76). CRC Press, (2025).
Zhang, L., Wang, K. & Liu, H. An efficient incentive mechanism for federated learning in vehicular networks. IEEE Netw. 38(2), 145–152 (2024).
Li, M., Chen, X. & Zhang, Y. Do not trust the clouds easily: The insecurity of content security policy based on object storage. IEEE Internet Things J. 11(8), 13245–13258 (2024).
Wang, S., Liu, J. & Kim, H. AutoD: Intelligent blockchain application unpacking based on JNI layer deception call. IEEE Netw. 38(4), 78–85 (2024).
Chen, R., Zhao, P. & Anderson, M. Deepautod: Research on distributed machine learning oriented scalable mobile communication security unpacking system. IEEE Trans. Netw. Sci. Eng. 11(2), 1567–1579 (2024).
Liu, Q., Zhang, F. & Johnson, D. Recognizing BGP communities based on graph neural network. IEEE Netw. 38(3), 112–119 (2024).
Wang, T., Brown, S. & Lee, K. Gradient shielding: Towards understanding vulnerability of deep neural networks. IEEE Trans. Netw. Sci. Eng. 11(3), 2234–2247 (2024).
Chen, Y., Wu, X. & Davis, R. Efficient real-time processing for IoT sensor networks: Challenges and solutions. IEEE Internet Things J. 11(12), 20145–20158 (2024).
Jain, J. K., & Chauhan, D. Optimized secure and energy-efficient approach for IoT-enabled wireless sensor networks. Pervasive and Mobile Computing, 102049, (2025).
bin Lenando, H., Albert, S. C., & Alrfaay, M. Data dissemination techniques for Internet of Things applications: Research challenges and opportunities. Foundations of Computing and Decision Sciences, 49(4), (2023).
D’Aniello, G., Gravina, R., Gaeta, M. & Fortino, G. Situation-aware sensor-based wearable computing systems: A reference-architecture-driven review. IEEE Sens. J. 22(14), 13853–13863 (2022).
Alotaibi, A., Aldawghan, H. & Aljughaiman, A. A review of authentication techniques for Internet of Things devices in smart cities: Opportunities, challenges, and future directions. Sensors 25(6), 1649 (2025).
Jayanetti, A., Halgamuge, S. & Buyya, R. Deep reinforcement learning for energy and time optimized scheduling of precedence-constrained tasks in edge-cloud computing environments. Future Gener. Comput. Syst. 137, 14–30 (2022).
Jamshidi, S., Nafi, K. W., Nikanjam, A., & Khomh, F. Evaluating machine learning-driven intrusion detection systems in IoT: Performance and energy consumption, (2025). arXiv preprint arXiv:2504.09634.
Ramezani Shahidani, F., Ghasemi, A., Toroghi Haghighat, A. & Keshavarzi, A. Task scheduling in edge-fog-cloud architecture: A multi-objective load balancing approach using reinforcement learning algorithm. Computing 105(6), 1337–1359 (2023).
Sharmin, Z. Priority based multi-stage laxity-aware workload distribution for collaborative vehicular edge computing [Master’s thesis, University of Malaya], (2021).
Moghaddasi, K., Rajabi, S., Gharehchopogh, F. S. & Ghaffari, A. An advanced deep reinforcement learning algorithm for three-layer D2D-edge-cloud computing architecture for efficient task offloading in the Internet of Things. Sustain. Comput. Inform. Syst. 43, 100992 (2024).
Bernstein, P. A., Hadzilacos, V. & Goodman, N. Concurrency control and recovery in database systems (Addison-Wesley, 1987).
Kung, H. T. & Robinson, J. T. On optimistic methods for concurrency control. ACM Trans. Database Syst. 6(2), 213–226 (1981).
Özsu, M. T. & Valduriez, P. Principles of distributed database systems 4th edn. (Springer, 2020).
Bailis, P. et al. Scalable transactions across heterogeneous NoSQL databases. Proceedings of the VLDB Endowment 8(2), 100–111 (2014).
Gray, J., & Reuter, A. Transaction processing: Concepts and techniques. Morgan Kaufmann, (1993).
Hukkeri, G. S., Ankalaki, S., Goudar, R. H., & Hadimani, L. The Impact of Protocol Conversions in the Wireless Communication of IOT Network. International Journal of Advances in Soft Computing & Its Applications, 16(1), (2024).
Wu, X., Ding, J., Liu, Y. & Zhou, H. An event-driven transaction model for large-scale sensor networks. Future Gener. Comput. Syst.117, 312–324 (2022).
Yu, H., Zhang, T. & Li, J. Deep reinforcement learning for adaptive transaction scheduling in IoT systems. IEEE Trans. Neural Netw. Learn. Syst.34(3), 1456–1468 (2023).
Huang, C., Zhao, M. & Wang, L. Multi-agent reinforcement learning for distributed database transactions. IEEE Trans. Knowl. Data Eng.35(7), 2785–2799 (2023).
Alia, M., Jaradat, Y., Masoud, M., Swais, K., Manasrah, A., Jebril, I., & Almanasra, S. Low-Cost IoT-based Charging Management System for Electric Vehicles: Design Guidelines. International Journal of Advances in Soft Computing & Its Applications, 16(1), (2024).
Acknowledgements
The authors extend their appreciation to the Deanship of Scientific Research at Northern Border University, Arar, KSA, for funding this research through project number “NBU-FFR-2025-2903-14”. The authors also thank the Deanship of Scientific Research at Shaqra University (KSA).
Author information
Contributions
M.A.A.K. (Mohammad A. Al Khaldy) and A.N. (Ahmad Nabot) jointly conceptualized the main idea of the hybrid validation framework and drafted the initial methodology section. A.Q. (Ahmad al-Qerem) and S.N. (Issam Jebreen) designed and implemented the simulation environment, including preliminary data curation and analysis. A.A.D. (Abdulbasit A. Darem) and A.A.H. (Asma A. Alhashmi) contributed to refining the reinforcement learning model, assisted with the experimental setup, and helped interpret the results. M.A. (Mohammad Alauthman) and A.A. (Amjad Aldweesh) led the overall project coordination, wrote the discussion section, and integrated feedback from all co-authors into the final manuscript. All authors reviewed and approved the final version of the manuscript prior to submission.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Khaldy, M.A.A., Nabot, A., al-Qerem, A. et al. Adaptive conflict resolution for IoT transactions: A reinforcement learning-based hybrid validation protocol. Sci Rep 15, 25589 (2025). https://doi.org/10.1038/s41598-025-09698-1
DOI: https://doi.org/10.1038/s41598-025-09698-1