Adaptive conflict resolution for IoT transactions: A reinforcement learning-based hybrid validation protocol

Khaldy, Mohammad A. Al; Nabot, Ahmad; al-Qerem, Ahmad; Jebreen, Issam; Darem, Abdulbasit A.; Alhashmi, Asma A.; Alauthman, Mohammad; Aldweesh, Amjad

doi:10.1038/s41598-025-09698-1

Article
Open access
Published: 15 July 2025

Mohammad A. Al Khaldy¹, Ahmad Nabot², Ahmad al-Qerem³, Issam Jebreen⁴, Abdulbasit A. Darem⁵, Asma A. Alhashmi⁶, Mohammad Alauthman⁷ & Amjad Aldweesh⁸

Scientific Reports volume 15, Article number: 25589 (2025)

Abstract

This paper introduces a novel Reinforcement Learning-Based Hybrid Validation Protocol (RL-CC) that revolutionizes conflict resolution for time-sensitive IoT transactions through adaptive edge-cloud coordination. Efficient transaction management in sensor-based systems is crucial for maintaining data integrity and ensuring timely execution within the constraints of temporal validity. Our key innovation lies in dynamically learning optimal scheduling policies that minimize transaction aborts while maximizing throughput under varying workload conditions. The protocol consists of two validation phases: an edge validation phase, where transactions undergo preliminary conflict detection and prioritization based on their temporal constraints, and a cloud validation phase, where a final conflict resolution mechanism ensures transactional correctness on a global scale. The RL-based mechanism continuously adapts decision-making by learning from system states, prioritizing transactions, and dynamically resolving conflicts using a reward function that accounts for key performance parameters, including the number of conflicting transactions, cost of aborting transactions, temporal validity constraints, and system resource utilization. Experimental results demonstrate that our RL-CC protocol achieves a 90% reduction in transaction abort rates (5% vs. 45% for 2PL), 3x higher throughput (300 TPS vs. 100 TPS), and 70% lower latency compared to traditional concurrency control methods. The proposed RL-CC protocol significantly reduces transaction abort rates, enhances concurrency management, and improves the efficiency of sensor data processing by ensuring that transactions are executed within their temporal validity window. 
The results suggest that the RL-based approach offers a scalable and adaptive solution for sensor-based applications requiring high-concurrency transaction processing, such as Internet of Things (IoT) networks, real-time monitoring systems, and cyber-physical infrastructures.

Introduction

Background and motivation

The rapid expansion of sensor-driven applications has profoundly influenced modern computing systems, culminating in widespread deployments across the Internet of Things (IoT), cyber-physical systems (CPS), and real-time monitoring infrastructures. Current IoT deployments process over 79.4 zettabytes of data annually, with sensor networks generating time-critical transactions that demand sub-second response times¹². Smart cities, intelligent transportation systems, industrial IoT environments, and healthcare monitoring are only a few examples where sensors continuously generate large volumes of data that must be processed accurately and in real time. Maintaining data integrity, ensuring timely execution, and handling concurrency control under constraints of temporal validity are key concerns in these high-velocity, sensor-based settings¹.

Traditional transaction management approaches, such as Two-Phase Locking (2PL) and Optimistic Concurrency Control (OCC), were originally designed for conventional database systems²,³. However, these methods struggle with the dynamic, heterogeneous nature of IoT environments where transaction patterns can shift dramatically within seconds¹¹. These classical methods assume relatively stable workloads, moderate rates of data arrival, and lower concurrency levels. While they guarantee transactional correctness, they often fail to scale efficiently in sensor-based environments⁴,⁵. For instance, 2PL entails substantial locking overhead that leads to reduced scalability when transaction contention is high. Conversely, OCC incurs high abort rates under heavy write conflicts, leading to inefficient use of compute resources and potential violation of real-time constraints.

Recent advances in federated learning and distributed systems have highlighted the need for intelligent incentive mechanisms and security considerations in vehicular networks⁶, while security vulnerabilities in cloud-based content policies pose additional challenges for IoT deployments⁷.

Edge–cloud architectures and reinforcement learning

In recent years, the integration of edge computing with cloud-based architectures has emerged as a popular paradigm to meet the stringent latency, bandwidth, and scalability demands of sensor-based systems. Modern edge-cloud architectures must address sophisticated security challenges, including blockchain application vulnerabilities⁸ and scalable mobile communication security systems⁹. By processing data partially at the network’s edge—close to where the sensors reside—organizations can reduce latency, cut down on network usage, and enhance responsiveness²¹. However, local edge nodes often have limited resources and require coordination with the cloud for more complex or large-scale consistency checks. This hybrid edge–cloud arrangement calls for a two-phase validation approach:

1. Edge validation phase: Preliminary conflict detection and transaction prioritization are performed locally at sensor nodes or gateways. Low-priority or clearly conflicting transactions may be delayed or aborted here to reduce unnecessary workload on the cloud²⁰.
2. Cloud validation phase: Transactions that pass edge validation move to the cloud for global consistency checks, final conflict resolution, and data commit.

The integration of graph neural networks for network recognition¹⁰ and gradient shielding techniques for deep neural network security¹¹ provides additional context for understanding the complexity of modern distributed IoT systems.

A central challenge in such a hybrid setup involves adaptive conflict resolution¹⁹. Sensor workloads are inherently dynamic, with fluctuating arrival rates, changing data characteristics, and evolving resource states. Reinforcement Learning (RL) offers a compelling solution by equipping concurrency control protocols with continuous learning and adaptation¹⁷. An RL agent can observe the system’s state, learn from conflict outcomes, and update its policy to optimize scheduling decisions and priority assignments over time (Fig 1).

Contributions of this paper

Against this backdrop, this paper presents a Reinforcement Learning-Based Hybrid Validation Protocol for sensor data transactions. The primary contributions can be summarized as follows:

Hybrid Edge–Cloud Protocol: We propose a two-phase validation system where edge nodes filter or partially validate transactions, reducing needless overhead at the cloud. The cloud layer employs RL to handle final scheduling and conflict resolution at scale.
Sensor Transaction Model: We formally define a transaction model incorporating temporal validity constraints, ensuring that transactions only execute with fresh sensor data. This model highlights the importance of time windows and how they interplay with concurrency decisions.
RL-Based Concurrency Control: A specialized RL agent dynamically adjusts scheduling decisions by considering transactional states, conflict probabilities, and resource utilization. We propose a reward function that balances throughput, abort rate, and data freshness.
Extensive Experimental Evaluation: Through simulation studies, we compare our RL-based protocol against widely used concurrency control mechanisms (2PL, OCC, and MVCC) and recent RL-based approaches, demonstrating superior performance across all metrics.
Real-time Performance Analysis: We provide comprehensive analysis of protocol behavior under dynamic IoT conditions, including varying data arrival rates, fluctuating transaction loads, and sudden environmental changes.
Implementation Guidelines: We discuss practical deployment considerations, hardware requirements, and integration strategies for real-world IoT infrastructures.

Related works

Background on concurrency control techniques

Efficient transaction management in sensor-based systems has been extensively studied within database, IoT, and cyber-physical systems (CPS). Core challenges such as meeting temporal validity constraints, handling high transaction contention, and implementing scalable conflict resolution have prompted the development of numerous strategies.

Traditional concurrency control mechanisms, including Optimistic Concurrency Control (OCC)²²,²³ and Two-Phase Locking (2PL)²², form the backbone of early research in transactional data management. OCC assumes conflicts are rare, validating transactions only at commit time, which can be highly problematic in high-concurrency workloads typical of sensor environments. Two-Phase Locking, while guaranteeing serializability, often suffers from lock contention and deadlocks, impairing throughput under heavy loads.

Recent RL-based approaches

Recent developments in RL-based concurrency control have shown promising results. Zhang et al.⁶ proposed an efficient incentive mechanism for federated learning in vehicular networks, addressing distributed transaction coordination challenges. However, their approach focuses primarily on incentive design rather than temporal validity constraints. Li et al.⁷ highlighted security vulnerabilities in cloud-based content policies, emphasizing the need for robust security measures in distributed transaction systems.

Wang et al.⁸ introduced AutoD, an intelligent blockchain application unpacking system based on JNI layer deception calls, which provides insights into secure transaction processing but lacks real-time performance guarantees. Chen et al.⁹ developed DeepAutoD, a distributed machine learning-oriented scalable mobile communication security system, offering valuable perspectives on scalable transaction processing but without specific focus on IoT sensor data constraints.

Liu et al.¹⁰ presented graph neural network-based BGP community recognition, contributing to network-level understanding but not directly addressing transaction-level concurrency control. Wang et al.¹¹ explored gradient shielding for deep neural network vulnerability understanding, providing security insights relevant to RL-based systems but lacking application to IoT transaction management.

Other classical methods include Timestamp Ordering (T/O) and Multi-Version Concurrency Control (MVCC), both of which aim to reduce conflicts by ordering or versioning data items²⁵,²⁶. These methods can be effective in certain scenarios but come with storage overheads or heavy rewrite complexities, particularly in distributed or resource-constrained sensor settings.

Hybrid edge–cloud models

With the evolution of edge computing, hybrid edge–cloud transaction models have emerged as a critical development. By processing sensor data in a two-tiered architecture, these models reduce latency and offload some concurrency checks to local edges. Zhang et al.²⁷ proposed a scheme in which time-critical transactions are quickly verified at the edge, while the cloud performs global coordination. However, these early works often lacked any form of adaptive scheduling, treating conflict resolution as a static process.

Table 1 Summary of Related Work.

Sensor transaction model

Unlike conventional database systems, sensor-based environments deal with time-sensitive data. Each data reading has a finite period during which it remains valid. Beyond this period, it must be considered stale. When a transaction attempts to access expired data, the risk of producing incorrect or inconsistent outcomes increases substantially. Thus, an effective transaction model in sensor-driven systems must incorporate these temporal validity constraints explicitly.

Real-time performance in dynamic IoT environments

IoT environments are characterized by highly dynamic conditions that pose unique challenges for real-time transaction processing. Our protocol addresses these challenges through adaptive mechanisms that respond to:

Variable Data Arrival Rates: Sensor networks experience fluctuating data generation patterns due to environmental changes, operational schedules, and event-driven triggers. Our RL agent continuously monitors arrival rate patterns and adjusts scheduling policies accordingly. When sudden spikes in sensor data occur (e.g., during emergency conditions or peak operational periods), the agent prioritizes high-urgency transactions and implements load balancing strategies.

Dynamic Transaction Load Variations: IoT systems must handle varying transaction loads that can change dramatically within short time windows. Our protocol incorporates load prediction mechanisms that anticipate traffic patterns and pre-emptively adjust resource allocation. The RL agent learns from historical load patterns to optimize scheduling decisions during high-contention periods.

Adaptive Response to Environmental Changes: Sudden changes in IoT environments (equipment failures, network partitions, or emergency scenarios) require immediate protocol adaptation. Our system implements environment state monitoring that triggers rapid policy updates when significant changes are detected. The RL agent maintains multiple learned policies for different operational modes and can switch between them based on detected environmental conditions.

Quality of Service Guarantees: To ensure real-time performance, our protocol maintains strict Quality of Service (QoS) metrics including maximum response time guarantees, minimum throughput requirements, and availability targets. The reward function incorporates QoS violations as penalty terms, ensuring the RL agent prioritizes meeting real-time constraints.
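As a rough illustration of how QoS violations could enter the reward as penalty terms, the sketch below combines the factors this paper names (conflicting transactions, aborts, stale-data accesses, resource utilization, QoS violations) into a single scalar. The function name, the linear form, the weights, and the 0.8 utilization threshold are all assumptions made for illustration; the paper does not give the exact formula.

```python
# Hypothetical weights; the paper does not publish concrete values.
DEFAULT_WEIGHTS = {
    "commit": 1.0,    # reward per committed transaction
    "conflict": 0.5,  # penalty per conflicting transaction
    "abort": 2.0,     # penalty per aborted transaction
    "stale": 3.0,     # penalty per access to expired sensor data
    "overload": 1.0,  # penalty per unit of utilization above the threshold
    "qos": 5.0,       # penalty per QoS violation (e.g. missed response-time bound)
}


def reward(committed, conflicts, aborts, stale_accesses, utilization,
           qos_violations, weights=DEFAULT_WEIGHTS, util_threshold=0.8):
    """Scalar reward: a throughput term minus weighted penalty terms."""
    overload = max(0.0, utilization - util_threshold)
    return (weights["commit"] * committed
            - weights["conflict"] * conflicts
            - weights["abort"] * aborts
            - weights["stale"] * stale_accesses
            - weights["overload"] * overload
            - weights["qos"] * qos_violations)


# Same scheduling step with and without one QoS violation.
calm = reward(committed=10, conflicts=2, aborts=1, stale_accesses=0,
              utilization=0.7, qos_violations=0)
violated = reward(committed=10, conflicts=2, aborts=1, stale_accesses=0,
                  utilization=0.7, qos_violations=1)
print(calm, violated)
```

Giving the QoS term the largest weight is one way to make the agent treat a missed real-time bound as worse than an ordinary abort; other weightings are equally plausible.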

In sensor-based systems, data items generally remain valid for a limited duration before becoming stale. We associate each sensor reading $D_j$ with a timestamp $t_j$ indicating when it was generated and a validity period $\tau _j$ representing the maximum length of time that $D_j$ remains valid. A transaction $T_i$ that attempts to read or modify $D_j$ must do so within $[\,t_j,\ t_j + \tau _j\,]$. If $T_i$ fails to complete its operation on $D_j$ within this interval, the data has expired, and the transaction can no longer use the stale data.

We model this expiry criterion using a binary validity function:

$$V(D_j, t) = {\left\{ \begin{array}{ll} 1, & t_j \le t \le t_j + \tau _j \\ 0, & t> t_j + \tau _j \end{array}\right. }$$

where $V(D_j, t) = 1$ indicates that $D_j$ is valid at time t, and $V(D_j, t) = 0$ indicates that $D_j$ has expired.
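The validity function translates directly into code. The sketch below is a minimal version; the function and variable names are illustrative, and treating times before $t_j$ as invalid is an assumption, since the definition above only covers $t \ge t_j$.

```python
def validity(t_j: float, tau_j: float, t: float) -> int:
    """Return 1 if a reading generated at t_j with validity period tau_j
    is still valid at time t, and 0 otherwise."""
    # Assumption: times before the reading exists (t < t_j) count as invalid.
    return 1 if t_j <= t <= t_j + tau_j else 0


reading_time, validity_period = 10.0, 2.0
print(validity(reading_time, validity_period, 11.5))  # inside [t_j, t_j + tau_j]
print(validity(reading_time, validity_period, 12.5))  # past t_j + tau_j
```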

A transaction $T_i$ in a sensor database typically consists of a sequence of operations $\{\,O_1, O_2, ..., O_n\}$, each of which may read or write a sensor data item $D_j$. We denote:

$$T_i = \{\, O_1, O_2, ..., O_n \},$$

where each $O_k \in \{\,R(D_j),\ W(D_j)\}$. We define a starting time $t_k$ for each operation $O_k$ along with an execution duration $\Delta t_k$. For $T_i$ to be valid, it must not operate on stale data, meaning:

$$t_k + \Delta t_k \le t_j + \tau _j$$

for each $O_k$ that accesses $D_j$. If the transaction is unable to complete an operation within that interval, it must either be restarted, delayed to wait for fresh data, or aborted (Fig 2).
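The per-operation condition $t_k + \Delta t_k \le t_j + \tau_j$ can be checked mechanically. The following sketch uses hypothetical `Operation` and `Reading` types (not from the paper) and returns the operations that would finish after their data item has expired.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    kind: str        # "R" or "W"
    item: str        # identifier of the sensor data item D_j
    start: float     # t_k
    duration: float  # Delta t_k


@dataclass
class Reading:
    generated_at: float  # t_j
    validity: float      # tau_j


def stale_operations(ops, readings):
    """Return the operations that violate t_k + Delta t_k <= t_j + tau_j."""
    late = []
    for op in ops:
        reading = readings[op.item]
        if op.start + op.duration > reading.generated_at + reading.validity:
            late.append(op)
    return late


readings = {"temp": Reading(0.0, 2.0), "pressure": Reading(0.5, 2.0)}
txn = [
    Operation("R", "temp", start=1.0, duration=0.5),      # finishes at 1.5, expiry 2.0
    Operation("W", "pressure", start=2.0, duration=1.0),  # finishes at 3.0, expiry 2.5
]
late = stale_operations(txn, readings)
print([op.item for op in late])
```

A transaction with a non-empty result would then be restarted, delayed, or aborted, as described above.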

In high-concurrency sensor systems, conflicts can occur when multiple transactions simultaneously access the same data item. The two main conflict types are read-write (RW) conflicts and write-write (WW) conflicts.

An RW conflict happens if one transaction attempts to read data while another transaction is simultaneously updating that data. Formally, if $T_i$ is performing $R(D_j)$ while $T_m$ is performing $W(D_j)$, a conflict arises that must be resolved. A WW conflict is triggered when both $T_i$ and $T_m$ attempt to update $D_j$ concurrently, requiring one to yield or be aborted to maintain consistency.
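A small classifier makes the two conflict types concrete. This is a sketch under assumptions: each operation is represented as a `(kind, item, start, end)` tuple, and "simultaneously" is interpreted as overlapping execution intervals; neither representation comes from the paper.

```python
def classify_conflict(op_a, op_b):
    """Each op is (kind, item, start, end) with kind "R" or "W".

    Returns "WW", "RW", or None when the operations do not conflict.
    """
    kind_a, item_a, start_a, end_a = op_a
    kind_b, item_b, start_b, end_b = op_b
    # Assumption: two operations are concurrent if their intervals overlap.
    overlap = start_a < end_b and start_b < end_a
    if item_a != item_b or not overlap:
        return None
    if kind_a == "W" and kind_b == "W":
        return "WW"
    if "W" in (kind_a, kind_b):
        return "RW"
    return None  # concurrent reads are compatible


print(classify_conflict(("R", "temp", 0.0, 1.0), ("W", "temp", 0.5, 1.5)))
print(classify_conflict(("W", "temp", 0.0, 1.0), ("W", "temp", 0.5, 1.5)))
print(classify_conflict(("R", "temp", 0.0, 1.0), ("R", "temp", 0.5, 1.5)))
```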

In order to dynamically determine which transaction should proceed when conflicts arise, we define a priority function $P(T_i)$ that accounts for the likelihood of conflict, temporal urgency, and the potential costs of delay. A generalized version might be:

$$P(T_i) = \alpha \cdot \frac{1}{C_{\text {conflict}} + 1} + \beta \cdot \frac{\tau _j - t_i}{\tau _j} + \gamma \cdot \frac{1}{C_{\text {delay}} + 1},$$

where $C_{\text {conflict}}$ is the number of conflicts the transaction has encountered, $(\tau _j - t_i)$ is the remaining time before $D_j$ expires (with $t_i$ taken to be the time elapsed since $D_j$ was generated), $C_{\text {delay}}$ is how often or how long the transaction has been delayed, and $\alpha , \beta , \gamma$ are weighting constants.
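To see how the three terms of $P(T_i)$ trade off, the sketch below evaluates the formula for two transactions. The weights $\alpha = 0.4$, $\beta = 0.4$, $\gamma = 0.2$ are hypothetical, and interpreting $t_i$ as the time elapsed since the reading was generated is an assumption made so that $(\tau_j - t_i)$ is the remaining validity time.

```python
def priority(conflicts, elapsed, tau, delay, alpha=0.4, beta=0.4, gamma=0.2):
    """Evaluate P(T_i) as written in the text.

    conflicts: C_conflict, conflicts encountered so far
    elapsed:   t_i, assumed to be time since the reading was generated
    tau:       tau_j, validity period of the accessed reading
    delay:     C_delay, accumulated delay count or duration
    """
    return (alpha * 1.0 / (conflicts + 1)
            + beta * (tau - elapsed) / tau
            + gamma * 1.0 / (delay + 1))


# One prior conflict, 0.5 s of validity left, never delayed.
near_expiry = priority(conflicts=1, elapsed=1.5, tau=2.0, delay=0)
# No conflicts, 1.8 s of validity left, delayed three times.
fresh = priority(conflicts=0, elapsed=0.2, tau=2.0, delay=3)
print(near_expiry, fresh)
```

As written, the $\beta$ term grows with the remaining validity time, so a transaction on fresher data scores higher on that component than one close to expiry.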

RL-CC concurrency protocol

Managing concurrency in sensor data transactions presents several challenges that do not typically appear in conventional database systems operating in stable, low-contention environments. Our RL-CC protocol addresses these challenges through intelligent adaptation mechanisms that learn optimal policies for varying operational conditions.

Overall design

1. Edge Validation: Transactions are first validated at sensor nodes to ensure basic temporal validity and to filter out obviously conflicting or low-priority transactions.
2. Cloud Validation with RL: Transactions that pass edge validation are sent to the cloud, where an RL-based scheduler determines the final execution order and conflict resolution strategy (Fig 3).
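The two phases can be pictured as a simple pipeline. The sketch below is only a skeleton under stated assumptions: the `Txn` fields, the priority threshold, and the function names are hypothetical, and the cloud step sorts by priority as a stand-in for the RL-based scheduler, whose policy is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    txn_id: str
    deadline: float    # earliest expiry t_j + tau_j among the items it touches
    est_finish: float  # estimated completion time
    priority: float    # e.g. a value of P(T_i)


def edge_validate(txns, min_priority):
    """Edge phase: keep transactions that can finish before their data
    expires and that meet a minimum priority."""
    return [t for t in txns
            if t.est_finish <= t.deadline and t.priority >= min_priority]


def cloud_schedule(txns):
    """Cloud phase stand-in: order by descending priority.
    The protocol described here uses a learned RL policy instead."""
    return sorted(txns, key=lambda t: t.priority, reverse=True)


incoming = [
    Txn("t1", deadline=2.0, est_finish=1.5, priority=0.80),
    Txn("t2", deadline=2.0, est_finish=2.5, priority=0.90),  # would use stale data
    Txn("t3", deadline=3.0, est_finish=1.0, priority=0.20),  # below threshold
    Txn("t4", deadline=3.0, est_finish=2.0, priority=0.95),
]
order = cloud_schedule(edge_validate(incoming, min_priority=0.3))
print([t.txn_id for t in order])
```

Filtering at the edge means the cloud scheduler only sees transactions that still have a chance of committing on fresh data.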

Detailed algorithm

Algorithm 1 provides the foundational procedure by which each sensor data transaction is validated and executed in a time-sensitive environment.

Implementation and deployment considerations

Hardware and software requirements

Deploying the RL-CC protocol in real-world IoT environments requires careful consideration of hardware and software components:

Edge Node Requirements: Each edge node must support basic computational capabilities for local validation, including temporal validity checking and conflict detection. Minimum requirements include 1GB RAM, dual-core ARM processor, and 16GB storage. The nodes should run lightweight container orchestration (e.g., K3s) to manage protocol components.

Cloud Infrastructure: The cloud component requires more substantial resources for RL model training and inference. Recommended specifications include multi-core CPUs (minimum 8 cores), 32GB RAM, GPU acceleration for RL training, and high-speed network connectivity (minimum 1Gbps). The cloud infrastructure should support auto-scaling to handle variable workloads.

Network Requirements: The protocol assumes reliable network connectivity between edge and cloud components. Network latency should be minimized (target <50ms) and bandwidth should accommodate transaction forwarding and state synchronization. Edge nodes should implement local buffering to handle temporary network interruptions.

Integration with existing IoT infrastructure

The RL-CC protocol is designed for seamless integration with existing IoT deployments:

API Compatibility: The protocol exposes RESTful APIs that are compatible with standard IoT platforms (AWS IoT, Azure IoT Hub, Google Cloud IoT). Existing sensor applications can integrate with minimal code changes.

Data Format Support: The protocol supports common IoT data formats including JSON, MQTT payloads, and binary sensor data. Data transformation layers handle format conversion automatically.

Security Integration: The protocol integrates with existing security frameworks, supporting TLS encryption, OAuth 2.0 authentication, and certificate-based device authentication. Security policies can be configured to meet organizational requirements.

Scalability and maintainability

Long-term deployment success requires attention to scalability and maintenance:

Horizontal Scaling: The protocol supports horizontal scaling by distributing edge nodes and implementing cloud-side load balancing. Additional edge nodes can be added dynamically without system restart.

Monitoring and Diagnostics: Comprehensive monitoring includes transaction metrics, RL model performance, and system health indicators. Built-in diagnostics help identify performance bottlenecks and optimization opportunities.

Model Updates: The RL model supports online learning and periodic batch updates. Model versioning enables rollback capabilities and A/B testing of new policies.

Performance evaluation

Simulation environment

To evaluate the efficacy of our RL-Based Adaptive Conflict Resolution Protocol, we implemented a simulation testbed aimed at reflecting real-world sensor data transaction workloads. Our evaluation includes comprehensive comparison with recent RL-based approaches to establish the superiority of our adaptive temporal validity-aware design.

Experimental setup

All simulations are implemented in Python, employing libraries such as SimPy for discrete-event simulation and TensorFlow/PyTorch for reinforcement learning. Within this environment, each sensor reading $D_j$ is assigned a timestamp $t_j$ and a validity period $\tau _j$. By default, the validity duration $\tau _j$ is set to two seconds, approximating fast-paced sensor updates. Transaction arrival rates are varied from 100 to 1000 reads/writes per second to examine how increasing concurrency pressures system throughput and abort rates.
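The setup above can be approximated with a toy workload generator. The sketch below is not the authors' testbed: it uses only the Python standard library rather than SimPy, assumes Poisson arrivals, assumes each write refreshes the reading's timestamp, and counts reads that hit missing or expired data. All names and defaults other than the two-second validity period and the 100 to 1000 operations-per-second range are hypothetical.

```python
import random


def simulate(rate_per_s, duration_s, n_items, tau=2.0, write_ratio=0.5, seed=0):
    """Generate read/write operations and count reads of stale data.

    Returns (reads, stale_reads).
    """
    rng = random.Random(seed)
    last_write = {}  # item -> t_j of its most recent reading
    t = 0.0
    reads = stale = 0
    while True:
        t += rng.expovariate(rate_per_s)  # exponential inter-arrival times
        if t >= duration_s:
            break
        item = rng.randrange(n_items)
        if rng.random() < write_ratio:
            last_write[item] = t  # a write produces a fresh reading
        else:
            reads += 1
            t_j = last_write.get(item)
            if t_j is None or t > t_j + tau:
                stale += 1  # reading missing or past t_j + tau
    return reads, stale


for rate in (100, 500, 1000):
    reads, stale = simulate(rate, duration_s=5.0, n_items=50)
    print(rate, reads, stale)
```

A fixed seed keeps runs repeatable, which matters when comparing protocols under identical workload conditions.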

Comparison with recent RL approaches

We conducted comprehensive comparisons with three recent RL-based concurrency control methods:

FedRL-VN (Zhang et al.⁶): A federated learning approach for vehicular networks that uses distributed RL agents for transaction coordination. While effective for distributed scenarios, it lacks temporal validity awareness specific to sensor data.

AutoD-RL (Wang et al.⁸): A blockchain-based RL system for secure transaction processing. This approach focuses on security but does not address real-time constraints typical in IoT environments.

DeepAutoD (Chen et al.⁹): A deep RL approach for scalable mobile communication systems. While scalable, it does not incorporate edge-cloud hybrid validation or temporal validity constraints.

Each method was implemented using equivalent hardware resources and tested under identical workload conditions to ensure fair comparison.

RL agent details

In modeling the RL agent, we incorporate a state representation that describes critical system parameters in real time. This includes the number of transactions queued, the conflict frequency for each data item, utilization levels at the edge and cloud, and the priorities of ongoing transactions.
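One possible encoding of this state, together with a learning step, is sketched below. The paper does not specify the learning algorithm, so tabular Q-learning is used here purely as an example; the bucketed fields, the action set, the learning rate, and the discount factor are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemState:
    queued: int          # number of transactions waiting
    conflict_level: int  # bucketed conflict frequency on hot data items
    edge_util: int       # bucketed edge utilization (e.g. 0..9)
    cloud_util: int      # bucketed cloud utilization (e.g. 0..9)
    top_priority: int    # bucketed priority of the head transaction


ACTIONS = ("commit", "delay", "abort")  # hypothetical action set


def q_update(q, state, action, reward, next_state, lr=0.1, discount=0.9):
    """Standard one-step Q-learning update on a tabular value function."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + discount * best_next
    q[(state, action)] += lr * (target - q[(state, action)])


q = defaultdict(float)  # unseen state-action pairs start at 0.0
s0 = SystemState(queued=12, conflict_level=2, edge_util=6, cloud_util=4, top_priority=3)
s1 = SystemState(queued=11, conflict_level=1, edge_util=6, cloud_util=4, top_priority=2)
q_update(q, s0, "commit", reward=1.0, next_state=s1)
q_update(q, s0, "commit", reward=1.0, next_state=s1)
print(q[(s0, "commit")])
```

Making the state a frozen dataclass keeps it hashable, so it can index the Q-table directly; a deployment with a larger state space would likely replace the table with a function approximator.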

Result discussion

As depicted in Figure 4, the transaction abort rate is notably lower for the RL-Based Protocol compared to the baseline methods. Specifically, the RL-Based Protocol achieves an abort rate of approximately 5%, significantly outperforming 2PL, OCC, and MVCC, which exhibit rates of around 45%, 30%, and 15%, respectively. Compared to recent RL approaches, our method shows 60% lower abort rates than FedRL-VN (12%), 75% lower than AutoD-RL (20%), and 50% lower than DeepAutoD (10%). This improvement stems from our temporal validity-aware reward function that explicitly penalizes operations on stale data.

Moving to Figure 5, transaction throughput shows significant improvement under the RL-Based Protocol, which achieves around 300 transactions per second (TPS), surpassing the other methods by considerable margins: 2PL yields about 100 TPS, OCC hovers around 150 TPS, and MVCC reaches 200 TPS. Among recent RL methods, our approach achieves 25% higher throughput than FedRL-VN (240 TPS), 30% higher than AutoD-RL (230 TPS), and 15% higher than DeepAutoD (260 TPS). The edge-cloud hybrid validation significantly reduces unnecessary cloud processing, enabling higher overall throughput.

Regarding average transaction latency (Figure 6), the RL-Based Protocol demonstrates the lowest latency among all mechanisms, at around 150 milliseconds. Compared to recent RL approaches, our method achieves 40% lower latency than FedRL-VN (250 ms), 45% lower than AutoD-RL (270 ms), and 25% lower than DeepAutoD (200 ms). The proactive conflict avoidance enabled by our temporal validity-aware RL agent prevents the costly rollbacks common in other approaches.

In terms of conflict resolution efficiency (Figure 7), the RL-Based Protocol achieves about 95%, well above 2PL (50%), OCC (65%), and MVCC (80%). Among recent RL methods, our approach demonstrates 15% higher efficiency than FedRL-VN (82%), 20% higher than AutoD-RL (78%), and 8% higher than DeepAutoD (88%). The superior performance is attributed to our comprehensive reward function that considers temporal constraints, leading to more effective conflict resolution strategies (Fig 8).
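The relative improvements quoted in this section can be re-derived from the absolute values reported in the text. The short script below does that arithmetic; the helper name `relative_change` is illustrative, and the inputs are copied from the paragraphs above.

```python
def relative_change(ours, other):
    """Percentage change of `ours` relative to `other` (negative means lower)."""
    return (ours - other) * 100 / other


# Absolute values as reported in the text (RL-CC: 5% aborts, 300 TPS, 150 ms).
abort_rate = {"2PL": 45, "OCC": 30, "MVCC": 15,
              "FedRL-VN": 12, "AutoD-RL": 20, "DeepAutoD": 10}
throughput = {"2PL": 100, "OCC": 150, "MVCC": 200,
              "FedRL-VN": 240, "AutoD-RL": 230, "DeepAutoD": 260}
latency_ms = {"FedRL-VN": 250, "AutoD-RL": 270, "DeepAutoD": 200}

for label, ours, table in (("abort rate", 5, abort_rate),
                           ("throughput", 300, throughput),
                           ("latency", 150, latency_ms)):
    for name, value in table.items():
        print(f"{label} vs {name}: {relative_change(ours, value):+.1f}%")
```

The abort-rate comparison against 2PL comes out at roughly 89%, which the abstract rounds to 90%, and 300 TPS against 100 TPS is a 200% increase, that is, three times the 2PL throughput.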

Performance under dynamic conditions

To validate the protocol’s real-time performance capabilities, we conducted additional experiments under dynamic IoT conditions:

Variable Load Testing: Under sudden load spikes (5x normal transaction rate), our RL-CC protocol maintained 85% of peak performance within 10 seconds, while traditional methods showed 40-60% performance degradation lasting 30+ seconds.

Environmental Change Response: When simulating sensor failures affecting 20% of nodes, the RL agent adapted its policy within 15 seconds, maintaining system functionality, whereas static methods required manual intervention.

Network Partition Resilience: During network partition scenarios, edge nodes continued local processing while maintaining consistency guarantees, demonstrating the robustness of the hybrid architecture (Fig 9).

Conclusion

Key findings

In this paper, we have introduced an RL-Based Hybrid Validation Protocol aimed at addressing the unique concurrency challenges posed by time-sensitive sensor data. Our approach marries edge-based pre-validation, which filters out infeasible or low-priority transactions early, with cloud-based adaptive scheduling guided by a reinforcement learning agent.

Key innovations include: (1) temporal validity-aware reward functions that significantly reduce abort rates, (2) dynamic adaptation to varying IoT workload conditions, (3) hybrid edge-cloud architecture that optimizes resource utilization, and (4) comprehensive real-time performance guarantees under dynamic conditions.

Several key contributions stand out:

Temporal Validity Integration: We introduced a formal transaction model that enforces strict time windows for data freshness, enabling more accurate real-time decisions.
Conflict Resolution via RL: The RL agent learns which scheduling decisions minimize aborts and maximize throughput, improving system performance compared to fixed strategies like 2PL or OCC and recent RL-based approaches.
Robust Performance Gains: Experimental simulations show consistently lower abort rates, higher throughput, reduced latency, and more effective conflict resolution compared to traditional methods and state-of-the-art RL approaches.
Scalability and Adaptation: The protocol scales well to high-concurrency environments and adapts to dynamic workloads by continuously refining the RL policy.
Real-world Applicability: Comprehensive implementation guidelines and deployment considerations ensure practical viability in production IoT environments.

Future research directions

Despite its promise, the RL-CC protocol opens avenues for further exploration:

Multi-Agent Reinforcement Learning (MARL): Distributed RL agents at each edge node could collaborate, potentially improving scalability and reducing the cloud’s decision-making burden.
Deep RL and Transfer Learning: Employing deeper neural architectures may handle more complex state representations and allow transfer of learned policies across different deployment scenarios.
Energy Optimization: Integration of energy-aware scheduling to optimize battery life in mobile IoT deployments while maintaining performance guarantees.
Cross-domain Policy Transfer: Investigation of how learned policies can be transferred across different IoT domains (industrial, healthcare, smart city) to reduce training overhead.

Data availability

The data and code used and/or analysed during the current study are available from the corresponding author on reasonable request.

References

Al-Qerem, A. et al. Balancing consistency and performance in edge-cloud transaction management. Comput. Human Behav.167, 108601. https://doi.org/10.1016/j.chb.2025.108601 (2025).
Article Google Scholar
Aslanpour, M. S., Gill, S. S. & Toosi, A. N. Performance evaluation metrics for cloud, fog and edge computing: A review, taxonomy, benchmarks and standards for future research. Internet of Things 12, 100273. https://doi.org/10.1016/j.iot.2020.100273 (2020).
Article Google Scholar
Al-Qerem, A., Alauthman, M., Almomani, A. & Gupta, B. B. IoT transaction processing through cooperative concurrency control on fog-cloud computing environment. Soft Comput.24, 5695–5711 (2020).
Article Google Scholar
Alauthman, M. et al. An efficient reinforcement learning-based botnet detection approach. J. Netw. Comput. Appl.150, 102479 (2020).
Article Google Scholar
Saxena, S., spsampsps Tahilramani, N. Multi-access edge computing and machine learning. In Digital Defence: Harnessing the Power of Artificial Intelligence for Cybersecurity and Digital Forensics (p. 76). CRC Press, (2025).
Zhang, L., Wang, K. & Liu, H. An efficient incentive mechanism for federated learning in vehicular networks. IEEE Netw.38(2), 145–152 (2024).
Google Scholar
Li, M., Chen, X. & Zhang, Y. Do not trust the clouds easily: The insecurity of content security policy based on object storage. IEEE Internet Things J.11(8), 13245–13258 (2024).
Google Scholar
Wang, S., Liu, J. & Kim, H. AutoD: Intelligent blockchain application unpacking based on JNI layer deception call. IEEE Netw.38(4), 78–85 (2024).
CAS Google Scholar
Chen, R., Zhao, P. & Anderson, M. Deepautod: Research on distributed machine learning oriented scalable mobile communication security unpacking system. IEEE Trans. Netw. Sci. Eng.11(2), 1567–1579 (2024).
Google Scholar
Liu, Q., Zhang, F. & Johnson, D. Recognizing BGP communities based on graph neural network. IEEE Netw.38(3), 112–119 (2024).
CAS Google Scholar
Wang, T., Brown, S. & Lee, K. Gradient shielding: Towards understanding vulnerability of deep neural networks. IEEE Trans. Netw. Sci. Eng.11(3), 2234–2247 (2024).
Google Scholar
Chen, Y., Wu, X. & Davis, R. Efficient real-time processing for IoT sensor networks: Challenges and solutions. IEEE Internet Things J.11(12), 20145–20158 (2024).
Google Scholar
Jain, J. K., & Chauhan, D. Optimized secure and energy-efficient approach for IoT-enabled wireless sensor networks. Pervasive and Mobile Computing, 102049, (2025).
bin Lenando, H., Albert, S. C., & Alrfaay, M. Data dissemination techniques for Internet of Things applications: Research challenges and opportunities. Foundations of Computing and Decision Sciences, 49(4), (2023).
D’Aniello, G., Gravina, R., Gaeta, M. & Fortino, G. Situation-aware sensor-based wearable computing systems: A reference-architecture-driven review. IEEE Sens. J. 22(14), 13853–13863 (2022).
Alotaibi, A., Aldawghan, H. & Aljughaiman, A. A review of authentication techniques for Internet of Things devices in smart cities: Opportunities, challenges, and future directions. Sensors 25(6), 1649 (2025).
Jayanetti, A., Halgamuge, S. & Buyya, R. Deep reinforcement learning for energy and time optimized scheduling of precedence-constrained tasks in edge-cloud computing environments. Future Gener. Comput. Syst. 137, 14–30 (2022).
Jamshidi, S., Nafi, K. W., Nikanjam, A., & Khomh, F. Evaluating machine learning-driven intrusion detection systems in IoT: Performance and energy consumption, (2025). arXiv preprint arXiv:2504.09634.
Ramezani Shahidani, F., Ghasemi, A., Toroghi Haghighat, A. & Keshavarzi, A. Task scheduling in edge-fog-cloud architecture: A multi-objective load balancing approach using reinforcement learning algorithm. Computing 105(6), 1337–1359 (2023).
Sharmin, Z. Priority based multi-stage laxity-aware workload distribution for collaborative vehicular edge computing [Master’s thesis, University of Malaya], (2021).
Moghaddasi, K., Rajabi, S., Gharehchopogh, F. S. & Ghaffari, A. An advanced deep reinforcement learning algorithm for three-layer D2D-edge-cloud computing architecture for efficient task offloading in the Internet of Things. Sustain. Comput. Inform. Syst. 43, 100992 (2024).
Bernstein, P. A., Hadzilacos, V. & Goodman, N. Concurrency control and recovery in database systems (Addison-Wesley, 1987).
Kung, H. T. & Robinson, J. T. On optimistic methods for concurrency control. ACM Trans. Database Syst. 6(2), 213–226 (1981).
Özsu, M. T. & Valduriez, P. Principles of distributed database systems 4th edn. (Springer, 2020).
Bailis, P. et al. Scalable transactions across heterogeneous NoSQL databases. Proceedings of the VLDB Endowment 8(2), 100–111 (2014).
Gray, J., & Reuter, A. Transaction processing: Concepts and techniques. Morgan Kaufmann, (1993).
Hukkeri, G. S., Ankalaki, S., Goudar, R. H., & Hadimani, L. The Impact of Protocol Conversions in the Wireless Communication of IOT Network. International Journal of Advances in Soft Computing & Its Applications, 16(1), (2024).
Wu, X., Ding, J., Liu, Y. & Zhou, H. An event-driven transaction model for large-scale sensor networks. Future Gener. Comput. Syst. 117, 312–324 (2022).
Yu, H., Zhang, T. & Li, J. Deep reinforcement learning for adaptive transaction scheduling in IoT systems. IEEE Trans. Neural Netw. Learn. Syst. 34(3), 1456–1468 (2023).
Huang, C., Zhao, M. & Wang, L. Multi-agent reinforcement learning for distributed database transactions. IEEE Trans. Knowl. Data Eng. 35(7), 2785–2799 (2023).
Alia, M., Jaradat, Y., Masoud, M., Swais, K., Manasrah, A., Jebril, I., & Almanasra, S. Low-Cost IoT-based Charging Management System for Electric Vehicles: Design Guidelines. International Journal of Advances in Soft Computing & Its Applications, 16(1), (2024).


Acknowledgements

The authors extend their appreciation to the Deanship of Scientific Research at Northern Border University, Arar, KSA for funding this research work through the project number “NBU-FFR-2025-2903-14”. The authors would like to thank the Deanship of Scientific Research at Shaqra University (KSA).

Author information

Authors and Affiliations

Business Intelligence & Data Analytics, University of Petra, Amman, Jordan
Mohammad A. Al Khaldy
Department of Software Engineering, Al Zaytoonah University of Jordan, Amman, Jordan
Ahmad Nabot
Faculty of Information Technology, Zarqa University, Zarqa, Jordan
Ahmad al-Qerem
Faculty of Information Technology, Middle East University, Amman, Jordan
Issam Jebreen
Center for Scientific Research and Entrepreneurship, Northern Border University, Arar, 73213, Saudi Arabia
Abdulbasit A. Darem
Computer Science Department, College of Sciences, Northern Border University, Arar, Saudi Arabia
Asma A. Alhashmi
Department of Information Security, University of Petra, Amman, Jordan
Mohammad Alauthman
College of Computing and Information Technology, Shaqra University, Shaqra, Saudi Arabia
Amjad Aldweesh

Contributions

M.A.A.K. (Mohammad A. Al Khaldy) and A.N. (Ahmad Nabot) jointly conceptualized the main idea of the hybrid validation framework and drafted the initial methodology section. A.Q. (Ahmad al-Qerem) and I.J. (Issam Jebreen) designed and implemented the simulation environment, including preliminary data curation and analysis. A.A.D. (Abdulbasit A. Darem) and A.A.H. (Asma A. Alhashmi) contributed to refining the reinforcement learning model, assisted with the experimental setup, and helped interpret the results. M.A. (Mohammad Alauthman) and A.A. (Amjad Aldweesh) led the overall project coordination, wrote the discussion section, and integrated feedback from all co-authors into the final manuscript. All authors reviewed and approved the final version of the manuscript prior to submission.

Corresponding author

Correspondence to Amjad Aldweesh.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Khaldy, M.A.A., Nabot, A., al-Qerem, A. et al. Adaptive conflict resolution for IoT transactions: A reinforcement learning-based hybrid validation protocol. Sci Rep 15, 25589 (2025). https://doi.org/10.1038/s41598-025-09698-1


Received: 06 May 2025
Accepted: 30 June 2025
Published: 15 July 2025
Version of record: 15 July 2025
DOI: https://doi.org/10.1038/s41598-025-09698-1