Introduction

The Industrial Internet of Things (IIoT) is transforming operational technology in industries including energy, oil and gas, and smart manufacturing by enabling real-time monitoring, predictive analytics, and informed decision-making1,2,3. The substantial volume of data produced by distributed IoT devices poses considerable challenges for secure, efficient, and scalable data sharing, particularly in resource-limited and heterogeneous edge environments4.

Conventional cloud-based architectures suffer from significant drawbacks when applied to IIoT environments5,6,7,8. Centralized data processing introduces elevated latency and bandwidth costs while creating single points of failure. While frameworks such as EdgeShare9 attempt to address these challenges through decentralized access control using blockchain, they rely on fixed proxy node selection and rigid consensus models. These design choices limit their ability to dynamically respond to fluctuations in network load, trustworthiness, and device availability10,11,12. Existing approaches often fail to provide adaptive mechanisms that reflect real-time network conditions, resulting in reduced performance and vulnerability to targeted attacks13,14.

Existing blockchain-enabled IIoT data-sharing systems exhibit insufficient dynamic adaptation to fluctuating resource and trust conditions at the network periphery12,15,16. This results in suboptimal proxy selection, diminished performance under load, and heightened susceptibility to targeted attacks9,17,18,19.

This work examines three research questions: How can dynamic proxy node selection improve the efficiency and robustness of blockchain-based data sharing in IIoT environments? Can deep Q-learning be combined with blockchain overlays to support real-time edge node selection under fluctuating network conditions? Which design techniques provide lightweight, secure, and scalable access control using smart contracts tailored for low-power edge devices? The problem definition, objectives, and key idea behind the proposed solution are summarized in the bullet points below under three headings.

Problem definition

  • Static proxy selection leads to inefficient routing and poor adaptability to dynamic IIoT scenarios.

  • Centralized architectures introduce performance bottlenecks and single points of failure.

  • Existing smart contract-based systems lack fine-grained, trust-aware, real-time access control.

Objectives

  • Design a decentralized IIoT data-sharing architecture that responds to dynamic settings.

  • Incorporate reinforcement learning for intelligent proxy selection based on latency, load, and trust.

  • Provide auditable, fine-grained access control using smart contracts connected with blockchain.

Proposed solution

We introduce BDEQ, a Blockchain-based Dynamic Edge Q-learning system that:

  • Dynamically selects Proxy Service Providing Nodes (PSPNs) using Deep Q-learning.

  • Integrates smart contracts for secure, domain-aware access control.

  • Uses a lightweight consensus mechanism designed for resource-constrained edge environments.

The proposed framework is designed to improve performance, scalability, and security in decentralized IIoT networks.

BDEQ is an innovative Blockchain-based Dynamic Edge Q-learning framework aimed at enabling secure, efficient, and adaptive data sharing in IIoT ecosystems. BDEQ utilizes blockchain for tamper-proof, auditable access control, employs smart contracts for precise, context-sensitive permissions, and incorporates Deep Q-learning to dynamically choose Proxy Service Providing Nodes (PSPNs) based on real-time metrics including CPU and memory usage, latency, and trust levels.

Despite notable advancements, existing blockchain-enabled IIoT data-sharing frameworks still suffer from key limitations: they predominantly rely on static proxy node selection, lack real-time adaptability to trust and resource fluctuations, and provide limited fine-grained, decentralized access control. Architectures such as EdgeShare leverage blockchain and smart contracts for secure sharing but are hindered by rigid node configurations and consensus models that do not scale well under dynamic workloads. Moreover, few approaches integrate machine learning for adaptive decision-making, and even fewer address trust-aware node selection in real-time.

This paper addresses this critical gap by proposing BDEQ: a Blockchain-based Dynamic Edge Q-learning framework that unifies smart contracts, a lightweight blockchain consensus mechanism, and deep Q-learning for adaptive proxy node selection. BDEQ is designed to operate efficiently in resource-constrained IIoT environments, ensuring dynamic, secure, and decentralized data access.

This research presents the subsequent key contributions:

  • Design of the BDEQ Framework: A layered, modular framework integrating blockchain and reinforcement learning to support adaptive, secure, and decentralized data sharing in the Industrial Internet of Things (IIoT).

  • Dynamic Proxy Node Selection: A deep Q-network (DQN)-based learning agent continuously selects a suitable edge proxy node as network conditions change.

  • Access Control Governed by Smart Contracts: Smart contracts enforce access controls both within and across domains with a hybrid, role-aware blockchain consensus mechanism tailored for edge environments.

  • Validation of Security and Performance: We present a theoretical attack model, Monte Carlo simulations, and experimental comparisons with EdgeShare and centralized systems, demonstrating enhancements in latency (up to 35%), throughput (up to 28%), and attack resilience.

  • Prototype Implementation: The system is deployed via Docker-based virtual edge nodes and assessed on synthetic gas industry data, demonstrating its applicability for industrial environments.

The remainder of this paper is organized as follows. Sect. “Related work” examines pertinent literature on blockchain-enabled data sharing and reinforcement learning-augmented IoT systems. Section “The proposed framework” delineates the architecture and operational elements of BDEQ. Sect. “Data access control policy (Enhanced with BDEQ)” addresses the construction of smart contracts, the selection of dynamic proxies, and the processes of consensus. Sect. “Working mechanisms of framework” provides an experimental assessment and security analysis. Sect. “Evaluation and results” closes with prospective avenues for future research.

Related work

Secure and efficient data exchange in IIoT contexts has garnered substantial attention, notably in the context of blockchain, smart contracts, and intelligent edge computing. This section evaluates the cutting-edge developments across four principal dimensions: 1. Blockchain-based data storage systems; 2. Smart contract-based access control; 3. Machine learning and reinforcement learning in IIoT; 4. Hybrid frameworks that integrate these technologies.

Blockchain-based data storage and infrastructure

Blockchain has been extensively utilized to guarantee the integrity, traceability, and decentralization of data in the IIoT and associated fields.

Filecoin20 and IPFS21 facilitate decentralized storage using cryptographic proofs, but they require significant processing and storage capabilities, making them unsuitable for lightweight IIoT devices. ChainSplitter22 and LayerChain23 improve scalability by distributing storage responsibilities between edge and cloud tiers. Despite these advances, these methods still rely on fixed node configurations and cannot adapt to changing device or network conditions.

BlockchainDB24 extends blockchain to support structured queries on immutable data and targets enterprise-grade storage. BGP-LSChain25 uses blockchain for trust exchange routing in inter-domain networks. Although both improve trust and security, neither is tailored for real-time, adaptive settings such as IIoT. Notably, these systems lack reinforcement learning and do not offer intelligent, adaptive node selection.

To address the privacy and trust challenges in IIoT digital twin systems, the study in26 introduces a blockchain-assisted federated learning (FL) framework. The approach trains AI models collaboratively across edge devices using FL while preserving data privacy. Blockchain ensures secure, auditable model update tracking, and explainable AI (XAI) improves model transparency. The framework demonstrates enhanced scalability, security, and interpretability in real-world IIoT deployments, offering a practical solution for self-optimizing digital twins.

Kumar et al.27 introduce IoV-6G+, a blockchain-based framework that uses UAVs and edge servers for secure data collection in infrastructure-less IoV systems. It achieves low latency and energy efficiency via lightweight consensus and demonstrates strong security through formal analysis. However, the framework relies heavily on UAV availability, lacks dynamic trust management, and assumes reliable cloud participation, limiting its applicability in large-scale or failure-prone environments.

The IoEPM+28 framework proposes a secure, lightweight blockchain-assisted architecture for environmental pollution monitoring using UAVs and IoT devices within a 6G network. It addresses the security vulnerabilities of public communication channels and resource constraints of mobile nodes through a customized authentication and key agreement protocol. Formal (RoR model) and informal (Scyther tool) security analyses support its robustness, and performance comparisons show reductions in energy use, latency, and communication overhead. However, the framework assumes reliable UAV connectivity and cloud server access, and it does not explicitly address scalability, real-time adaptability, or trust dynamics in heterogeneous deployments.

Smart contract-based access control mechanisms

Smart contracts provide significant assistance for autonomous, policy-based access management in decentralized networks. EdgeShare9 presents a blockchain-based access control framework for IIoT data sharing, utilizing smart contracts to implement role-based access at the edge. Mollah et al.21 devised a cloud-assisted data-sharing framework employing smart contracts for attribute-based access control (ABAC), specifically designed for the constraints of IoT resources.

In the context of E-healthcare, secure data exchange is critical as Electronic Healthcare Records (EHR) transition from centralized, paper-based systems to internet-enabled platforms. The study in29 proposes a blockchain-based authentication framework designed to secure EHR access and data sharing. Through formal and informal security analysis and simulation via the Scyther tool, the framework demonstrates robust protection against attacks. Performance evaluations confirm its efficiency in reducing communication overhead, computational cost, and processing time, making it a compelling approach for trustworthy E-healthcare systems.

Nonetheless, although these models provide data confidentiality and verifiability, their access determinations are predetermined and rigid. Systems such as BlockchainDB24 and the CP-ABE-enhanced architecture by Zhang et al.30,31 utilize cryptographic access controls instead of smart contracts, resulting in policy updates that are computationally intensive and challenging to decentralize. Furthermore, few of these methodologies consider trust swings, load imbalances, or latency variations in real-time contexts.

A recent study32 proposes a secure and lightweight multi-factor authentication framework for cloud-assisted IoT environments, aiming to mitigate the computational burden on gateway nodes while preserving robust user authentication. Designed for highly scalable systems such as smart grids and healthcare, the framework ensures secure device access and communication. It is validated through both informal assessments and formal verification using the Scyther tool. While the scheme effectively balances security and computational efficiency, it assumes consistent gateway availability and does not explicitly address dynamic device onboarding or trust adaptation in large, distributed IoT deployments.

ML and reinforcement learning for IIoT optimization

Recent research has investigated machine learning, specifically reinforcement learning (RL), to enhance performance in dynamic IIoT contexts. Kaur et al.4 introduced a nature-inspired optimization approach aimed at minimizing latency and energy consumption in LoRa-based IoT networks. This solution enhances delay tolerance but lacks blockchain infrastructure and access control measures.

Liu et al.33 integrate deep reinforcement learning (DRL) with blockchain to enhance IIoT data distribution. The DRL agent autonomously chooses data routes to reduce latency and enhance robustness. Lu et al.34 integrate federated learning with blockchain to guarantee secure, privacy-preserving model training among industrial nodes.

These studies demonstrate the efficacy of machine learning in adaptive decision-making; nevertheless, they either do not facilitate access control or lack precise, trust-based node selection.

Hybrid architectures and research gaps

Despite notable advancements in blockchain-integrated frameworks for IIoT, significant limitations remain unaddressed. Solutions like ChainSplitter and LayerChain offer scalable storage but lack intelligent or adaptive node selection, while systems such as EdgeShare9 employ smart contracts for decentralized access control but rely on static proxy configurations, reducing their responsiveness in dynamic environments. Although some recent efforts incorporate machine learning to enhance decision-making33,34,35, they often overlook trust-aware selection or lack integrated access control mechanisms. For instance, one study35 presents a blockchain-assisted federated learning (FL) framework for intrusion detection in Intelligent Transportation Systems (ITS). This framework enables privacy-preserving collaboration among distributed edge nodes while employing a blockchain-based reputation mechanism to maintain the integrity of the FL process. Tested using the UNSW-NB15 dataset, it achieves a high accuracy rate of 99% in detecting zero-day attacks. However, the solution is narrowly focused on intrusion detection and does not support adaptive node selection or fine-grained access control across heterogeneous IIoT domains. In parallel, SecureShare36 introduces a blockchain and digital twin–based framework to facilitate dynamic inter-domain DDoS mitigation across SDN-enabled autonomous systems. SecureShare leverages NFV, SDN, and blockchain to coordinate secure and fair resource sharing between domains and is evaluated using Microsoft Azure Digital Twins and the Ethereum Sepolia testnet. While SecureShare demonstrates efficiency and scalability in DDoS defense, its design is centered on macro-level coordination and lacks support for decentralized trust modeling or intelligent edge node selection in IIoT scenarios. 
These studies underscore the growing trend of combining blockchain with collaborative intelligence but fall short in addressing the core requirements of trust-aware, adaptive proxy selection and smart contract–based access control in resource-constrained IIoT environments.

Table 1 compares the most relevant works along five dimensions: blockchain support, use of machine learning or reinforcement learning, node adaptability, smart contract-based access control, and IIoT compatibility.

Table 1 Comparative analysis of blockchain and RL-enhanced IIoT data sharing frameworks.

The comparison reveals a clear research gap: no existing solution combines blockchain technology, dynamic proxy node selection through reinforcement learning, and smart contract-based access control within a lightweight framework suitable for the IIoT. The proposed BDEQ system addresses this gap by combining Deep Q-learning (DQL) for adaptive PSPN selection, blockchain overlays for decentralized governance, and smart contracts for precise, trust-sensitive access management, all within an architecture designed for resource-constrained IIoT settings.

The proposed framework

In this section, we present BDEQ (Blockchain-based Dynamic Edge Q-learning)—a novel architecture that integrates blockchain smart contracts with Deep Q-learning to enable secure, efficient, and adaptive data sharing in IIoT environments. The design is modular, scalable, and optimized for edge-based deployment in domains like smart gas networks, where devices exhibit fluctuating trust levels, computational constraints, and intermittent connectivity.

Dynamically selected proxy service providing node

We develop an edge agent node, termed Dynamically Selected PSPN (DS-PSPN), for each subnet to incorporate blockchain technology into the IIoT ecosystem. It is generally an edge server with enhanced computational power and storage capacity, capable of executing the proprietary smart contract within our system. During network initialization, one DS-PSPN is dynamically selected per domain. This ensures flexible participation based on real-time resource conditions and trust levels.

Figure 1 illustrates the internal structure of the DSPN. Each DSPN is furnished with a smart contract, an offline storage module, and an online distributed record pool that interfaces with the smart contract, accompanied by a corresponding attribute-permission table. The smart contract handles many data models and executes the key logic functions behind multiple interface types. The offline storage module within each DSPN maintains a local database that saves data from associated member nodes (MNs); this edge-based distributed storage compensates for the blockchain’s restriction in processing large-scale data directly. The record pool operates as a caching database, with its contents synced across all DSPNs to ensure consistency. Any data alteration is broadcast to all DSPNs as a transaction, and only once consensus is reached among them will the transaction be confirmed and committed, prompting updates across all record pools. The attribute-permission table is required for imposing access control within the same domain and is stored in the record pool by default upon network activation.
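The commit rule for the record pool described above can be illustrated with a minimal sketch. The `Dspn` class, the `validate` rule, and the strict-majority threshold are all assumptions made for illustration; they stand in for the framework's actual consensus mechanism, which this section does not specify.

```python
class Dspn:
    """Hypothetical stand-in for a DSPN holding one record pool."""

    def __init__(self, name):
        self.name = name
        self.record_pool = {}

    def validate(self, tx):
        # Placeholder rule (assumption): accept only well-formed updates.
        return "key" in tx and "value" in tx


def broadcast_and_commit(tx, dspns):
    """Commit tx to every record pool only if a strict majority approves.

    The majority threshold is an assumption, not the paper's protocol.
    """
    approvals = sum(1 for node in dspns if node.validate(tx))
    if approvals * 2 <= len(dspns):
        return False  # no consensus: no record pool is touched
    for node in dspns:
        node.record_pool[tx["key"]] = tx["value"]
    return True
```

The point of the sketch is the all-or-nothing behavior: either every record pool receives the update or none does, which is what keeps the pools consistent.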

Fig. 1

The architecture of the BDEQ framework comprising the Network, Resource, and Application Interface layers. The figure illustrates the dynamic selection of DS-PSPN nodes via Q-learning, interaction with smart contracts, and the flow of data and user access. This figure was created by the authors using Microsoft Visio, Version 2412 (Build 18324.20194). The software is available at: https://www.microsoft.com/en-us/microsoft-365/visio/flowchart-software.

Figure 1 illustrates two network domains administered by DSPN1 and DSPN2, respectively, with connections to member nodes MN1 to MN4. Each DSPN operates its own offline storage module and generates a data index when storing MN data. Because MN data comes in diverse forms, DSPNs support several storage options, including relational, non-relational, and file-based systems. Concurrently, each DSPN transmits index data to the blockchain network. Each index entry has the fields Device ID, IP address, Data Type, Write Count, Timestamp, and Domain; that is, it identifies the MN, its data type, update count, collection time, and subnet. This index metadata is documented in the distributed record pool, and DSPNs rely on the consensus mechanism to keep it consistent. Notably, a single index entry is kept for each MN in the DSPN’s record pool. As additional data is recorded, the count and timestamp of the associated entry are updated accordingly.
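The one-entry-per-MN indexing rule can be sketched as follows. The field names mirror the index fields listed in the text, while the `IndexEntry` and `RecordPool` classes and their method names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """One index record per member node (MN); fields follow the text."""
    device_id: str
    ip_address: str
    data_type: str
    domain: str
    write_count: int = 0
    timestamp: float = 0.0


class RecordPool:
    """Hypothetical in-memory stand-in for a DSPN's record pool."""

    def __init__(self):
        self._entries = {}

    def record_write(self, device_id, ip_address, data_type, domain, now):
        # Keep a single entry per MN: create it on the first write, then
        # only bump the count and timestamp on later writes.
        entry = self._entries.get(device_id)
        if entry is None:
            entry = IndexEntry(device_id, ip_address, data_type, domain)
            self._entries[device_id] = entry
        entry.write_count += 1
        entry.timestamp = now
        return entry

    def get_data_index(self):
        # Loosely mirrors the public getDataIndex interface.
        return list(self._entries.values())
```

Because only the count and timestamp change after the first write, the index stays small regardless of how much data the offline storage module accumulates.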

The data index list, which includes details such as data types and storage locations, is accessible via the public interface getDataIndex. To access data, a user must first register with the relevant network domain through the platform to obtain a data reading interface. As illustrated in Fig. 1, once the administrator of DSPN1 (Dynamically Selected Subnet Proxy Node) approves the registration request, User1 is granted access to the private application programming interfaces (APIs). The user can call getIntraData to obtain data from MN1 (inside the same domain as DSPN1) and getCrossData to acquire data from MN2, MN3, and MN4 (across different domains). These APIs allow users such as User1 to build tailored applications that use the obtained data securely and efficiently.

BDEQ framework

As illustrated in Fig. 1, the BDEQ architecture comprises three primary layers: the Resource, Network, and Application Interface layers. The essential elements resemble those of EdgeShare; however, BDEQ adds dynamic edge node selection through Q-learning. The function of each layer is as follows:

  • Resource layer: The resource layer consists of industrial IoT devices that produce data. The devices link to edge nodes, where the RL algorithm dynamically identifies the optimal DS-PSPN for each data-sharing operation according to real-time conditions (e.g., node load, latency, available resources).

  • Network layer: This layer comprises several autonomous subnetworks, each linked to a DS-PSPN. In contrast to EdgeShare, which uses pre-selected proxy nodes, the BDEQ system utilizes dynamic selection for edge nodes. Reinforcement learning methods, particularly Q-Learning (QL), are used to choose the most efficient edge node for each data-sharing request, hence minimizing latency and optimizing resource utilization.

  • Application interface layer: This layer offers abstract interfaces through which applications interact with the system. It exposes both public and private interfaces, which Table 2 describes in full. Private interfaces support secure and efficient data exchange across network domains, whereas public interfaces give access to distributed ledger information and data indexes.

Table 2 Interfaces, both public and private, that are utilized in the proposed framework.

These interfaces separate control-plane operations (user administration, querying rules/keys) from data-plane operations (intra- or inter-domain data access). A user application can be built on these APIs to request and share IIoT data over the network efficiently. For instance, a gas industry application may call getDataIndex to display the available data streams across many sites, and then call getCrossData to retrieve a required sensor reading from another site once it has the necessary authorization.

Dynamic edge node selection utilizing BDEQ

Dynamic edge node selection is a central element of BDEQ: reinforcement learning determines which DS-PSPN should manage each incoming request. Conventional techniques depend on static or round-robin node selection, which cannot adapt to fluctuating network conditions or traffic. In BDEQ, each DS-PSPN continuously monitors the network state, including the current latency across nodes, the CPU and memory utilization of each node, and trust scores. An RL agent uses this information to decide which node will act as the proxy for the next request.

We use a Deep Q-Network (DQN) agent that learns a policy for node selection. The state space comprises real-time metrics of all candidate PSPNs, such as their available resources, queue lengths, and recent response times, whereas the action space corresponds to the selection of a specific PSPN to fulfill the request. The DQN is trained using an epsilon-greedy strategy: it predominantly selects the currently estimated optimal node, while occasionally exploring an alternative node to facilitate learning. The reward function is structured to promote decisions that reduce delay and balance load. For example, if the selection of a specific PSPN yields fast responses and low resource consumption, a positive reward is granted; conversely, if it causes elevated latency or an overloaded node, a negative reward (penalty) is imposed. Over time, this feedback allows the agent to learn which nodes perform best under specific conditions.
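The epsilon-greedy choice described above can be sketched in a few lines. The function name `select_pspn` and the dictionary of per-node Q-value estimates are illustrative assumptions; in BDEQ the estimates would come from the DQN rather than from a hand-written dictionary.

```python
import random


def select_pspn(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over per-PSPN Q-value estimates.

    q_values: dict mapping a PSPN identifier to its estimated Q-value for
    the current state (hypothetical input format).
    """
    if not q_values:
        raise ValueError("no candidate PSPNs")
    if rng.random() < epsilon:
        # Explore: pick any candidate uniformly at random.
        return rng.choice(sorted(q_values))
    # Exploit: pick the candidate with the highest estimated value.
    return max(q_values, key=q_values.get)
```

With `epsilon` set to 0 the function always returns the highest-valued node, and with `epsilon` set to 1 it always explores, so the parameter directly controls the exploration rate.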

Algorithm 1 delineates the DS-PSPN selection methodology. The DQN agent is first initialized with the specified state and action space (line 1). Upon receiving each new user request, the agent assesses the present system state (line 2) and determines an action (i.e., selects a PSPN) according to its Q-values (line 3). The selected node is thereafter appointed as the proxy for managing that request (line 4). The request’s nature is next assessed: if it is an intra-network data request, the designated node processes it immediately and returns the result (lines 5–8). In the case of an inter-network (cross-domain) request, the algorithm initiates supplementary procedures to validate cross-domain permissions: it retrieves the requisite authorization data (lines 9–11), generates an encrypted authorization token, and records a blockchain transaction to document this authorization (lines 12–14). The chosen node then enables cross-network data access utilizing the token (lines 15–16). Upon fulfilling the request, a reward is calculated to assess the efficacy of the selection (line 17), and the DQN agent subsequently updates its Q-values (line 18), so reinforcing favorable decisions.

Algorithm 1

BDEQ Dynamic Proxy Node Selection.

During this reinforcement learning process, the algorithm evaluates how effectively the selected PSPN fulfilled the request. It calculates a reward based on relevant performance indicators such as response time, processing efficiency, or task completion success. This reward is then used to update the agent’s Q-values, reinforcing successful decisions and discouraging inefficient ones. The Q-value update follows the standard Q-learning update rule:

$$Q(s_{t}, a_{t}) \leftarrow Q(s_{t}, a_{t}) + \alpha \left( r_{t} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_{t}, a_{t}) \right)$$
(1)

where \(Q(s_{t}, a_{t})\) is the Q-value for the current state \(s_{t}\) and action \(a_{t}\), \(\alpha\) is the learning rate, \(r_{t}\) is the reward received, \(\gamma\) is the discount factor, and \(\max_{a'} Q(s_{t+1}, a')\) is the maximum Q-value for the next state.
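As a worked illustration of Eq. (1), the sketch below applies one update to a small lookup table. BDEQ approximates Q with a deep network, so the table is a deliberate simplification; the function name `q_update`, the nested-dictionary layout, and the default values of `alpha` and `gamma` are assumptions.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place; return the new value.

    q maps state -> {action: value}. Unseen states and actions are treated
    as having value 0.0 (an assumption of this sketch).
    """
    current = q.setdefault(state, {}).get(action, 0.0)
    next_values = q.get(next_state, {})
    best_next = max(next_values.values(), default=0.0)
    # The temporal-difference target: r_t + gamma * max_a' Q(s_{t+1}, a').
    td_target = reward + gamma * best_next
    q[state][action] = current + alpha * (td_target - current)
    return q[state][action]
```

For example, with a current value of 1.0, a reward of 1.0, a best next-state value of 4.0, and \(\alpha = \gamma = 0.5\), the target is \(1.0 + 0.5 \times 4.0 = 3.0\) and the updated value is \(1.0 + 0.5 \times (3.0 - 1.0) = 2.0\).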

As shown in lines 16–18 of Algorithm 1, the BDEQ framework employs a DQL agent to perform adaptive and trust-aware proxy node selection. At each decision point, the agent observes the current system state \(s_{t}\), which includes a vector of trust scores for candidate PSPNs, their current CPU utilization, and the estimated communication latency between each PSPN and the requesting MN. The action \(a_{t}\) corresponds to selecting a specific PSPN to handle a data access or sharing request. Once the action is taken, the environment responds with a scalar reward \(r_{t}\) designed to encourage the selection of nodes that are trustworthy, underutilized, and responsive. The reward function is defined as:

$$r_{t} = w_{1} \cdot T_{s} - w_{2} \cdot L_{t} - w_{3} \cdot C_{t}$$
(2)

where \(T_{s}\) is the trust score of the selected PSPN (ranging from 0 to 1), \(L_{t}\) is the measured response latency in milliseconds, and \(C_{t}\) is the normalized CPU load (between 0 and 1). The weights are empirically set to \(w_{1} = 0.5\), \(w_{2} = 0.3\), and \(w_{3} = 0.2\) to prioritize trust while moderately penalizing latency and load.
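Eq. (2) can be evaluated directly, as in the minimal sketch below. The function name `bdeq_reward` is hypothetical, and the formula is applied literally with the stated weights. Because the text gives \(L_{t}\) in milliseconds while the other two terms lie in [0, 1], the latency term dominates unless \(L_{t}\) is rescaled; whether it is rescaled is not stated, so the sketch leaves any normalization to the caller.

```python
def bdeq_reward(trust, latency_ms, cpu_load, w1=0.5, w2=0.3, w3=0.2):
    """Literal evaluation of r_t = w1*T_s - w2*L_t - w3*C_t.

    trust and cpu_load are expected in [0, 1]; latency_ms is used exactly
    as passed in (no normalization is assumed or applied here).
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must be in [0, 1]")
    if not 0.0 <= cpu_load <= 1.0:
        raise ValueError("cpu_load must be in [0, 1]")
    return w1 * trust - w2 * latency_ms - w3 * cpu_load
```

For a fully trusted, idle node with a 10 ms response, the reward is \(0.5 \times 1 - 0.3 \times 10 - 0.2 \times 0 = -2.5\), which shows how strongly an unscaled latency term outweighs the trust term.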

The agent is trained using an experience replay mechanism that stores previous transitions \((s_{t}, a_{t}, r_{t}, s_{t+1})\) to break correlation between samples and improve stability. A target network is used to stabilize the Q-value updates. Action selection follows an ε-greedy policy in which ε is gradually decreased during training to shift from exploration to exploitation. Through this learning process, the agent learns a policy that dynamically selects PSPNs to maximize long-term reward, supporting both secure and efficient data sharing across IIoT domains.
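Two of the training ingredients named above, the replay buffer and the decaying ε, can be sketched with the standard library alone. The class and function names, the linear decay schedule, and the default start, end, and step values are assumptions; the text does not state the schedule or hyperparameters actually used.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        # A bounded deque silently evicts the oldest transition when full.
        self._buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks the temporal
        # correlation between consecutive transitions.
        return rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


def decayed_epsilon(step, start=1.0, end=0.05, decay_steps=1000):
    """Linear decay from start to end over decay_steps, then constant."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)
```

A training loop would push each observed transition, sample a mini-batch once the buffer holds enough entries, and pass `decayed_epsilon(step)` to the action-selection routine.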

Finally, the agent enters a loop to process any queued requests (lines 19–24), continuously updating the state and selecting actions for each, thus operating in near real-time. In essence, the RL agent is always running in the background, refining its policy with each request handled.

In our system, only the PSPN selected by the RL agent is permitted to execute the sensitive components of the smart contract for that request. As a result, malicious or overloaded nodes, which the agent is unlikely to select because of their poor state, cannot interfere with critical operations. The approach acts as a dynamic trust mechanism: nodes that behave poorly (e.g., elevated error rates or suspected compromise) receive a low trust score, are selected infrequently, and are thereby effectively isolated.

The integration of dynamic edge node selection with blockchain technology produces a system that is both scalable and secure. Blockchain provides immutable logging of all access events and enforces access control through smart contracts, while the reinforcement learning agent improves performance by strategically allocating requests. This combination enables BDEQ to attain reduced latency and enhanced throughput compared to static methods, without compromising security or decentralization.

The next section incorporates this mechanism into a comprehensive access control policy, which specifies how BDEQ enforces data-sharing restrictions through smart contracts.

Data access control policy (Enhanced with BDEQ)

We augment EdgeShare’s data-sharing paradigm with our BDEQ platform to achieve precise access control across diverse networks and facilitate multi-tier policy enforcement. In BDEQ, the optimal PSPN is dynamically chosen for each access request according to prevailing conditions (CPU load, memory, latency, node dependability, etc.), as detailed in Sect. “Dynamic Edge Node Selection Utilizing BDEQ”. We reconfigure both the smart contract and the consensus mechanism to facilitate this dynamic selection, striving to reduce processing delays while maintaining security. The smart contract is refined to interact with the RL agent’s judgments and to permanently document any policy enforcement operations on-chain.

Design of smart contract and BDEQ-driven access control

EdgeShare’s data-sharing management was governed by a single, immutable smart contract that defined all permissible operations and enforced access control through blockchain transactions. We retain this approach while integrating the BDEQ logic into the contract’s workflow. Before executing any privileged action, the contract now verifies that the request originates from the currently designated DS-PSPN, as determined by the RL agent. This ensures that only a trusted node, selected by our method, can execute privileged operations.

When a user initiates a data access operation, a blockchain transaction is generated and directed to the DS-PSPN chosen by BDEQ. That DS-PSPN, and only that node, then interacts with the smart contract to execute the requested action, provided it is authorized. The contract manages two principal kinds of actions: (a) user management (adding or revoking user access) and (b) data access validation (verifying and granting read permissions). The pseudocode for the improved smart contract process is presented as Algorithm 2 and includes the following steps:

  • Step 0: Integrate with reinforcement learning by utilizing the BDEQ agent’s selection to determine the optimal PSPN for the new request (the function BDEQ_SelectBestNode(request) yields the selected node). This node alone persists in executing the subsequent phases of the contract logic.

Algorithm 2 Smart Contract with BDEQ Proxy Node Selection.

  • Step 1: (user inclusion) – If the operation is addUserInfo (registering a new user/device), the contract logic on the DS-PSPN generates a new public/private key pair for the user and encrypts the supplied information using the public key. It then records the enrollment information on the blockchain (using the report() function) alongside the public key, thereby immutably documenting the new user registration.

  • Step 2: (permission verification) – If the operation constitutes a data access request (indicated by CompareAttribute in the pseudocode), the contract first retrieves the pertinent permission rules from the attribute-permission table (housed in the record pool and accessible through the contract; this is done by getRuleInfo()). It then checks whether the request is intra-domain or cross-domain (queryPermission() returns a flag). If it is intra-domain, the contract immediately calls intraAccessControl() to enable the DS-PSPN to serve the data locally.

  • Step 3: (cross-domain authorization) – If the request is cross-domain, the contract locates the relevant permission record for the requesting user (find() in the permission table) and, if necessary, creates a new authorization item (addAuthorization()). This effectively generates a cryptographic token or proof that the user is authorized to access the requested data from the specified domain. The authorization information is then recorded as a blockchain transaction by report(), and an encrypted token is issued in response.

  • Step 4: (access execution) – Following the authorization step, the contract (continuing on the designated DS-PSPN) transfers control to the relevant function to retrieve the data. If it is an inter-domain request and the authorization has been successfully granted, it will invoke crossAccessControl(userRequest) on the originating DS-PSPN to enable data transfer from the remote domain. If the checks are unsuccessful or the request lacks authorization, the contract returns an access-denied outcome. The result (approved or rejected) is recorded on the blockchain ledger for verification purposes.

In this improved design, BDEQ’s RL algorithm collaborates with the smart contract: the RL agent determines the executor of the contract, while the contract guarantees the enforcement of this decision (only the selected node’s transaction will be recognized) and ensures that all access events are accurately verified and documented. This separation of responsibilities ensures that data owners retain complete control and transparency over access: each cross-domain access must be authorized through a blockchain-logged process, while every intra-domain access is managed by a recognized trustworthy node. The blockchain’s immutable ledger and the requirement for authentic cryptographic tokens for cross-domain data guarantee that even if certain edge nodes are compromised, an attacker cannot circumvent the system’s security by, for instance, fabricating a request from an unauthorized node. The smart contract, enhanced with BDEQ, ensures that access control decisions are made contextually and securely. BDEQ mitigates unauthorized access and alleviates the performance constraints inherent in static designs by dynamically routing requests through appropriate nodes and recording decisions on-chain.
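The contract workflow described in this subsection can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not the authors’ Algorithm 2: the method names mirror the operations named in the text (addUserInfo, CompareAttribute, queryPermission, addAuthorization, report), but the class `AccessContract`, its data layout, the lowest-load rule standing in for the RL agent, and the hash-based token format are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


def bdeq_select_best_node(loads: dict) -> str:
    """Step 0 placeholder: pick the least-loaded PSPN.

    The paper uses a deep Q-learning agent here; min-load is only an
    assumption made so that the sketch is runnable.
    """
    return min(loads, key=loads.get)


@dataclass
class AccessContract:
    user_domain: dict = field(default_factory=dict)   # user id -> home domain
    data_domain: dict = field(default_factory=dict)   # data id -> owning domain
    rules: dict = field(default_factory=dict)         # user id -> permitted data ids
    authorizations: set = field(default_factory=set)  # (user id, data id) grants
    ledger: list = field(default_factory=list)        # stand-in for on-chain records

    def report(self, event: str) -> None:
        # Stand-in for committing a transaction to the blockchain.
        self.ledger.append(event)

    def add_user_info(self, caller: str, selected: str, user: str, domain: str) -> bool:
        # Step 1: only the node chosen in Step 0 may register users.
        if caller != selected:
            self.report(f"rejected addUserInfo from {caller}")
            return False
        self.user_domain[user] = domain
        self.rules.setdefault(user, set())
        self.report(f"registered {user} in {domain}")
        return True

    def compare_attribute(self, caller: str, selected: str, user: str, data_id: str):
        """Steps 2-4: return a token string on success, None on denial."""
        if caller != selected or data_id not in self.rules.get(user, set()):
            self.report(f"denied {user} -> {data_id}")
            return None
        # queryPermission: intra-domain or cross-domain?
        cross = self.user_domain.get(user) != self.data_domain.get(data_id)
        if cross and (user, data_id) not in self.authorizations:
            self.authorizations.add((user, data_id))      # addAuthorization
            self.report(f"authorized {user} -> {data_id}")
        kind = "cross" if cross else "intra"
        self.report(f"granted {kind} access {user} -> {data_id}")
        # Hypothetical token: a hash binding user, data id and access kind.
        return hashlib.sha256(f"{user}|{data_id}|{kind}".encode()).hexdigest()
```

In this sketch a request submitted by any node other than the selected one is denied and the denial is still written to the ledger, which is the enforcement property the text attributes to the contract.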

Block structure in BDEQ

BDEQ employs a permissioned blockchain ledger to document essential events such as user registrations, node selections, and access permissions. Conventional blockchains (e.g., Bitcoin) solely document basic financial transactions, whereas our bespoke block structure is engineered to capture more comprehensive information relevant to data sharing. Each block comprises standard components (a header, a body, and metadata), but we augment the information to incorporate application-specific indexes that support our smart contracts and reinforcement learning decisions.

The block header contains the block’s hash, a timestamp, and the Merkle tree root of the block’s transactions, similar to traditional blockchains. The block body contains the transaction list. Within the context of BDEQ, transactions record events such as “User X registered in Domain Y,” “Node Z designated as proxy for Request Q,” “User A granted access to Data B,” and similar events. These transactions cover both intra-domain events and cross-domain permission tokens, all of which are cryptographically signed and, where required, encrypted. For example, when cross-domain access is permitted, the block may include the encrypted permission token granted to the requesting user.

The metadata component of each block is optimized to support rapid lookup tables and states necessary for the smart contract. We specifically incorporate structures for UserInfo (identification and public keys of registered users/devices), Permission (access control regulations and attributes for each domain), and AuthorizationList (pending cross-domain authorizations given). The metadata entries are modified by transactions and saved in the block, enabling any node to swiftly access current access control states without scanning the full chain. The metadata functions as a snapshot of the access control database at the moment of the block, preserved by the DS-PSPNs via their consensus.

To guarantee ledger integrity, we implement dual layers of security: (1) Merkle tree verification: any alteration to a transaction within the block body will modify the Merkle root in the header, which will be identified by peers (the root is digitally signed as an integral component of the block). (2) Hash chaining: the header of each block contains the hash of the preceding block. Consequently, if an attacker were to modify the contents of a previous block, it would render the hashes of all following blocks invalid. These procedures ensure that the blockchain offers an immutable, tamper-proof record of all DS-PSPN selections and access determinations. Even powerful adversaries cannot modify historical records without detection by the network.
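The two integrity layers can be illustrated with Python’s standard hashlib module. The sketch below is an assumption-laden simplification: the block layout (a dictionary with `prev`, `root`, `txs`, and `hash`), the duplication of the last leaf on odd levels, and the all-zero predecessor of the genesis block are choices made for illustration, and the timestamp and digital signature mentioned in the text are omitted for brevity.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions: list[str]) -> bytes:
    """Pairwise-hash leaves up to a single root (last leaf duplicated on odd levels)."""
    if not transactions:
        return sha256(b"")
    level = [sha256(tx.encode()) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def make_block(prev_hash: bytes, transactions: list[str]) -> dict:
    """Build a block whose hash commits to the previous hash and the Merkle root."""
    root = merkle_root(transactions)
    return {"prev": prev_hash, "root": root, "txs": list(transactions),
            "hash": sha256(prev_hash + root)}


def chain_is_valid(chain: list[dict]) -> bool:
    """Recompute every Merkle root and every link; any mismatch means tampering."""
    prev = b"\x00" * 32  # assumed all-zero predecessor for the genesis block
    for block in chain:
        root = merkle_root(block["txs"])
        if (block["prev"] != prev or block["root"] != root
                or block["hash"] != sha256(prev + root)):
            return False
        prev = block["hash"]
    return True
```

Editing a transaction in an earlier block changes its recomputed Merkle root, so `chain_is_valid` fails at that block; rewriting the stored root and hash as well would break the `prev` link of every later block.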

The BDEQ ledger operates as a reliable record and database for the framework by integrating dynamic metadata with cryptographic linking. All significant interactions (new user addition, data access, node modification, etc.) are permanently documented. This enhances security (audit trails, forensic analysis) and system reliability, as any node can recreate the current permission state by scanning the chain, allowing new DS-PSPNs to rapidly synchronize with the network state upon joining.
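How a newly joined DS-PSPN might recreate the permission state by scanning the chain can be sketched as a simple replay. The transaction schema used here (dictionaries with a `type` field and the values `register`, `revoke`, `set_rule`, and `authorize`) is hypothetical; only the three metadata structures UserInfo, Permission, and AuthorizationList come from the text.

```python
def rebuild_state(chain):
    """Replay transactions in order to recover the access-control metadata.

    `chain` is a list of blocks; each block is a list of transaction dicts.
    """
    state = {"UserInfo": {}, "Permission": {}, "AuthorizationList": set()}
    for block in chain:
        for tx in block:
            kind = tx["type"]
            if kind == "register":
                state["UserInfo"][tx["user"]] = tx["public_key"]
            elif kind == "revoke":
                # Remove the user and any cross-domain grants they held.
                state["UserInfo"].pop(tx["user"], None)
                state["AuthorizationList"] = {
                    grant for grant in state["AuthorizationList"]
                    if grant[0] != tx["user"]
                }
            elif kind == "set_rule":
                domain_rules = state["Permission"].setdefault(tx["domain"], {})
                domain_rules[tx["attribute"]] = tx["allowed"]
            elif kind == "authorize":
                state["AuthorizationList"].add((tx["user"], tx["data"]))
    return state
```

Because the replay is deterministic, two nodes that scan the same chain arrive at the same state, which is what allows the per-block metadata to serve as a verifiable snapshot.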

Consensus mechanism in BDEQ-based edgeshare

We adopt a consensus method specifically designed for the BDEQ architecture to ensure ledger consistency. The Consensus Mechanism in BDEQ is based on Practical Byzantine Fault Tolerance (PBFT) with adaptations for dynamic membership.

The consensus process plays a vital role in guaranteeing consistency and reliability across blockchain networks. Conventional protocols like PBFT and Delegated Proof of Stake (DPoS) are prevalent; nevertheless, in their standard form, their significant resource demands and propensity for centralization render them unsuitable for resource-limited edge situations. The BDEQ-based EdgeShare system employs a lightweight, role-based consensus algorithm that reduces energy usage while ensuring consistency among distributed edge nodes.

Our framework simplifies the consensus process by conceptualizing it as a deterministic sorting service. Transactions are validated using a progressively increasing value referred to as a nonce. Should the nonce submitted by a node fail to receive an acknowledgment from its peers, subsequent transactions from that node would be halted until consensus is achieved. This architecture enforces synchronization without necessitating intensive cryptographic calculations, rendering it exceptionally suitable for low-power edge devices. The approach is fundamentally analogous to PBFT but modified for asynchronous and bandwidth-constrained edge situations.
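The nonce rule can be made concrete with a small sequencer. The sketch assumes that nonces increase by exactly one per node and that a fixed number of peer acknowledgments (`quorum`) finalizes a nonce; neither detail is specified in the text, and the class name `NonceSequencer` is hypothetical.

```python
class NonceSequencer:
    """Tracks the last acknowledged nonce per node and halts out-of-order submissions."""

    def __init__(self, quorum: int):
        self.quorum = quorum   # acknowledgments needed to finalize a nonce (assumed)
        self.confirmed = {}    # node id -> highest acknowledged nonce
        self.pending = {}      # node id -> (nonce, set of acknowledging peers)

    def submit(self, node: str, nonce: int) -> bool:
        """Accept a transaction only if the node has nothing pending and the nonce is next."""
        if node in self.pending:                        # earlier nonce still unacknowledged
            return False
        if nonce != self.confirmed.get(node, 0) + 1:    # must be the next value in sequence
            return False
        self.pending[node] = (nonce, set())
        return True

    def acknowledge(self, node: str, nonce: int, peer: str) -> bool:
        """Record a peer acknowledgment; return True once the nonce reaches quorum."""
        current = self.pending.get(node)
        if current is None or current[0] != nonce:
            return False
        current[1].add(peer)
        if len(current[1]) >= self.quorum:
            self.confirmed[node] = nonce
            del self.pending[node]
            return True
        return False
```

A node whose nonce is still pending cannot submit again, which reproduces the halting behavior described above without any proof-of-work style computation.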

Figure 2 illustrates that the blockchain network in EdgeShare consists of three principal node roles: endorsers, committers, and orderers. Endorser nodes validate proposed transactions, committer nodes complete and record them, and orderer nodes aggregate validated transactions into blocks according to established criteria such as block height or timestamp. These roles are dynamically allocated to qualified PSPNs chosen by the BDEQ algorithm, enabling the system to consistently adjust to fluctuating load situations and trust levels within the network.

Fig. 2 Illustration of the BDEQ consensus mechanism for ledger updates.

The application layer and consensus layer are decoupled, following the Hyperledger Fabric paradigm. This modularity enables EdgeShare to implement its trust model on the Fabric infrastructure. After an endorser verifies a transaction, it is submitted by a committer and subsequently organized into blocks by the appointed orderer. The ordering service guarantees consistency of transaction sequences across nodes, thereby averting ledger forks and maintaining a cohesive global state.

The allocation of ordering responsibilities can adhere to either a random or round-robin technique among PSPNs designated by BDEQ to provide fairness and load balance. This dynamic selection assures that no one node becomes a performance bottleneck or a point of failure. Ultimately, our lightweight consensus mechanism allows the trustworthy functioning of the EdgeShare blockchain while being suited for deployment in distributed, diverse, and resource-limited edge environments.
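Both allocation techniques are straightforward to express. In the sketch below, the list of candidate PSPNs is assumed to have been produced by BDEQ beforehand, and the function names are hypothetical.

```python
import itertools
import random


def round_robin_orderers(pspns):
    """Return an endless iterator that rotates through the designated PSPNs in order."""
    return itertools.cycle(list(pspns))


def random_orderer(pspns, rng: random.Random):
    """Pick one orderer uniformly at random from the designated PSPNs."""
    return rng.choice(list(pspns))
```

Round-robin gives each candidate an equal share of ordering work over time, whereas random selection makes the next orderer harder to predict; the text leaves the choice between the two open.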

Therefore, the consensus algorithm facilitates swift consensus on transactions, such as node selection and access logs, despite the dynamic participation of edge nodes, hence enhancing BDEQ’s scalability.

Working mechanisms of framework

BDEQ functions through a series of fundamental mechanisms that collectively provide secure and efficient data sharing. This section delineates four fundamental operational strategies of the framework: Initialization, Registration, Intra-Network Access, and Cross-Network Access, explicating the system’s functionality from deployment to real-time data inquiries. Figure 3 depicts the execution flow of three essential operational methods after initialization in BDEQ. The foundation of all these activities is the permissioned blockchain, which maintains the ledger of events and policies, and the reinforcement learning agent, which makes adaptive judgments, as previously outlined.

Fig. 3 The execution flow of three fundamental operational strategies following initialization in BDEQ.

Initialization strategy

Upon implementation, the BDEQ framework commences by partitioning the IIoT environment into several network domains (subnetworks), each overseen by an administrator. For each domain, a single edge node is chosen based on characteristics such as maximum capability or trustworthiness to serve as the initial DS-PSPN. The RL agent can subsequently modify this initial selection as circumstances evolve. All these DS-PSPNs constitute an overlay blockchain network. In this phase, each DS-PSPN produces its cryptographic key pair (public and private keys). The public keys, in conjunction with node identities and their domain information, are disseminated and documented on the blockchain ledger. This creates a reliable identification for each edge node in the system from the outset—only nodes with registered public keys are permitted to engage in consensus and data exchange.

Subsequently, the smart contracts (referred to as chaincode on certain platforms) that execute BDEQ’s logic are deployed onto the blockchain. These include contracts for user management, data access control, and logging as specified in Sect. “Data access control policy (Enhanced with BDEQ)”. By deploying them at initialization, we guarantee that all subsequent activities (user registrations, data requests, etc.) will use the same validated logic on each DS-PSPN.

The RL agent is initialized at this juncture. The initiation may involve a neutral policy or one influenced by any preceding training, if applicable. The agent possesses knowledge about the preliminary network topology, including the enumeration of DS-PSPN nodes and potentially their first heuristic weights. No genuine learning has occurred thus far; nevertheless, the conditions are established for the agent to commence observation and optimization once user interaction is initiated.

Ultimately, the system establishes foundational data structures: each DS-PSPN generates an unpopulated distributed record pool and an unfilled attribute-permission table. The record pool will soon have data indices once data begins to flow, and the permission table will populate as administrators establish access protocols. Initially, these may be vacant or populated with default entries (for instance, an administrative account may be pre-registered for each domain to authorize further user requests).

Upon completion of the initialization, the blockchain network becomes operational, albeit with only the genesis block and initial transactions such as node registrations, and all DS-PSPNs are prepared to implement policies. The framework consequently establishes trust through blockchain identities, security through smart contracts, and adaptability through the reinforcement learning agent before the involvement of any user or data.

Registration strategy

After initialization, no external user or device may gain access until they complete the registration process. The Registration Strategy delineates the secure and auditable process by which new users or devices integrate into a domain.

When a user, or an IoT device functioning as a user, seeks to access data from a certain domain, they must initially submit a registration request to that domain’s DS-PSPN. This request generally encompasses the user’s identification details and credentials. Upon receipt of a registration request, the DS-PSPN’s smart contract (activated through a blockchain transaction) executes verification procedures: it may necessitate administrator approval or assess the request against specific criteria. Upon approval, the smart contract will generate a new entry in the ledger signifying that this user is now registered in the domain.

Essentially, the UserRegister interface (refer to Table 1) is invoked behind the scenes. If the system manages keys, the DS-PSPN generates a distinct public–private key pair for the new user; otherwise, the user supplies their own public key. The public key and user ID are documented on-chain, associated with that domain. The user receives credentials (e.g., a certificate or token) signed with the DS-PSPN’s private key, which will be used for further requests. This certificate serves as the verification of the user’s identity within the blockchain.

The registration transaction on the blockchain guarantees that all DS-PSPNs are informed about the new user, as the transaction is disseminated to all nodes. At this juncture, any DS-PSPN can cryptographically authenticate the user’s identity with the documented public key. The attribute-permission table in the domain’s record pool may be revised to incorporate a default rule set for the new user, specifying the data categories they are permitted to access by default, if applicable, or indicating that explicit permits are required.

This technique ensures auditability and security: a hostile actor cannot covertly add a user without leaving a blockchain record, and a user cannot impersonate another individual due to the requirement of unique keys and generally an administrator’s approval. If a user is to be denied access, the UserDelete interface may be invoked similarly, documenting the revocation on the ledger and ensuring that the user’s credentials are no longer recognized thereafter.
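The registration and revocation lifecycle can be sketched as follows. The interface names UserRegister and UserDelete come from the text; the class `DomainRegistry`, the administrator approval set, and the tuple-based ledger entries are assumptions made for illustration, and key generation and credential signing are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class DomainRegistry:
    """Toy registry for one domain; `ledger` stands in for on-chain records."""
    domain: str
    approved_by_admin: set = field(default_factory=set)  # user ids the admin approved
    users: dict = field(default_factory=dict)            # user id -> public key
    ledger: list = field(default_factory=list)

    def user_register(self, user_id: str, public_key: str) -> bool:
        # Rejections are logged too, so no registration attempt goes unrecorded.
        if user_id not in self.approved_by_admin or user_id in self.users:
            self.ledger.append(("register_rejected", self.domain, user_id))
            return False
        self.users[user_id] = public_key
        self.ledger.append(("registered", self.domain, user_id, public_key))
        return True

    def user_delete(self, user_id: str) -> bool:
        # Revocation removes the key and leaves an auditable record.
        if self.users.pop(user_id, None) is None:
            return False
        self.ledger.append(("revoked", self.domain, user_id))
        return True

    def is_recognized(self, user_id: str) -> bool:
        return user_id in self.users
```

Every state change appends a ledger entry, so the sequence of registrations and revocations can be audited after the fact, as the registration strategy requires.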

BDEQ’s Registration Strategy secures user onboarding using smart contracts and blockchain records, establishing a reliable foundation for all users in the data-sharing network. All subsequent data accesses by a user will be linked to this initial registration record.

Intra-network access strategy

Once users and devices are registered, they may begin issuing data requests. Within the Intra-Network Access Strategy, a user requests data from IoT devices located in the same domain, where the same DS-PSPN governs both the user and the data source. This situation exploits the proximity and trust of the local edge. When a registered user submits a data request (via getIntraData) for a resource inside their domain, the request is first processed by the local DS-PSPN. The DS-PSPN verifies the user’s credentials by validating the request’s signature against the public key registered during enrollment. Upon authentication, the DS-PSPN consults its permission table to determine whether the user is authorized to access the requested data, which may be subject to certain attribute or role requirements. Provided the user possesses adequate privileges, the DS-PSPN retrieves the data from its local storage or directly from the target device and delivers it to the user.

The smart contract operates in this process: the DS-PSPN generates a blockchain transaction for the access request (despite being local), invoking a function (such as intraAccessControl) that records the access attempt and its result. Due to its intra-domain nature, the contract can swiftly verify the correctness of the domain and generally authorizes the request if the permission assessment is successful. The transaction is subsequently documented in a new block (or a forthcoming block) stating “User X accessed Data Y (success) at time T,” authenticated by DS-PSPN. The implementation of on-chain logging for intra-domain access is a deliberate design decision aimed at ensuring accountability; nevertheless, it can be executed in a batched or asynchronous manner to mitigate latency (the DS-PSPN can promptly return the data and subsequently log it to the blockchain, as trust exists within a single domain).
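The batched logging option mentioned above can be sketched as a queue that is flushed to the ledger once it reaches a given size. The class name, the `batch_size` parameter, and the in-memory `store` and `permitted` structures are hypothetical; timestamps and DS-PSPN signatures are omitted.

```python
from collections import deque


class IntraAccessLogger:
    """Serve intra-domain reads immediately and log them to the ledger in batches."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.queue = deque()   # access records not yet on-chain
        self.ledger = []       # stand-in for committed blocks (one list per batch)

    def serve(self, user: str, data_id: str, store: dict, permitted: set):
        """Return the data if permitted, else None; queue a log record either way."""
        ok = (user, data_id) in permitted and data_id in store
        self.queue.append((user, data_id, "success" if ok else "denied"))
        if len(self.queue) >= self.batch_size:
            self.flush()
        return store[data_id] if ok else None

    def flush(self) -> None:
        """Commit all queued records as one batch."""
        if self.queue:
            self.ledger.append(list(self.queue))
            self.queue.clear()
```

The trade-off is that records still in the queue are lost if the node fails before a flush, which is acceptable only under the intra-domain trust assumption stated in the text.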

The RL agent may have several DS-PSPNs available for selection, particularly if a domain temporarily possesses multiple candidates or for load balancing purposes, allowing it to potentially choose a different node for an intra-domain request. In reality, for intra-domain scenarios, the default selection typically involves the domain’s own DS-PSPN, as routing to a node in another domain would introduce superfluous hops. Consequently, intra-domain queries are generally managed locally, resulting in minimum latency—one of the primary advantages of edge computing. Within our architecture, we observed that intra-domain read/write operations are completed on the order of a few tens of milliseconds (refer to Sect. “Data access performance”).

The Intra-Network Access Strategy facilitates easy and secure data retrieval when both the consumer and producer are within the same network. BDEQ guarantees that local data exchanges are expedited, safe, and recorded for future auditing by employing the local DS-PSPN and on-chain permission verification.

Cross-network access strategy

The Cross-Network Access Strategy covers data exchange across multiple domains (subnets), which requires inter-domain trust and coordination. The synergy between blockchain and reinforcement learning in BDEQ is most apparent here, enabling decentralized and efficient cross-domain requests.

Suppose a user in Domain A requires data from an IoT device located in Domain B. The user, registered in Domain A, submits a GetCrossData request to their local DS-PSPN (A). Domain A’s DS-PSPN cannot directly get data from Domain B’s device due to its lack of direct authority over Domain B’s resources. The request must be facilitated via Domain B’s DS-PSPN. BDEQ employs the blockchain as a trusted broker and the RL agent to identify the ideal proxy route. For this purpose, the following steps are processed:

  • Proxy selection: Upon receipt of the cross-domain request, DS-PSPN A employs the RL agent to decide whether to engage directly with DS-PSPN B or to route through an alternative DS-PSPN, considering the existence of numerous domains and potential pathways. Typically, the optimal selection is DS-PSPN B (the domain containing the target data); however, in situations involving several domains and varying network conditions, the RL agent’s policy may propose an alternative route (for instance, if B is congested, it may recommend routing through a third node that has already cached the data index). In our gas industry use case, characterized by a limited number of domains, direct selection is customary.

  • Authorization protocol: DS-PSPN A initiates a cross-domain access request transaction on the blockchain, stating “User X from Domain A requests data Z from Domain B.” This activates the cross-network authorization logic of the smart contract. The contract verifies whether user X, identified by their credentials and domain, possesses an existing authorization to access data Z in Domain B. If not, it generates an entry (as outlined in Sect. “Design of smart contract and BDEQ-driven access control”, Step 3), effectively requesting Domain B to confer authorization. DS-PSPN B observes this transaction on the blockchain (as it participates in consensus) and, depending on the rules, accepts it by adding an authorization record. Depending on policy, acceptance may follow from prior approval of specific data for cross-sharing by the Domain B administrator, or it may require real-time manual approval by an administrator. An addAuthorization event relating to User X and Data Z is logged.

  • Data transmission: Upon the establishment of authorization, DS-PSPN B is now authorized to serve User X. DS-PSPN B will get the requested data from its local storage or device, encrypt it using User X’s public key (which it possesses from the registration information on-chain), and transmit it to DS-PSPN A. DS-PSPN A subsequently decrypts the data, if required, and transmits it to the requesting user. All communication is secured by the network’s encryption (TLS) and blockchain-issued tokens to avert tampering.

  • Documentation and revision: The complete transaction is recorded. The DS-PSPN of Domain B generates a transaction stating, “Data Z delivered to User X of Domain A,” which is recorded in the ledger, establishing a traceable history. Furthermore, any alterations in status, such as a reduction in the number of times User X can access Z if quotas are in place, are documented.

The consensus technique guarantees that both domains maintain a uniform perspective on the request and its authorization. The RL-based dynamic selection guarantees that among numerous candidate DS-PSPNs capable of mediating the request, the one yielding the lowest latency or highest trust is selected. If Domain B’s primary node was occupied, the agent may have directed the request through a backup node inside Domain B or through another domain that possessed a replica, provided that such replication was established.

The Cross-Network Access Strategy facilitates federated data exchange without a central authority; each domain governs its data while permitting sharing with others upon obtaining appropriate on-chain authorization. BDEQ guarantees this is executed effectively (minimum interactions outside the blockchain transactions) and securely (robust cryptographic assurances and audit logs). In our implementation, cross-domain access experienced marginally increased latency compared to intra-domain access (attributable to the additional blockchain round-trip and network hop), although it remained far below the acceptable threshold for industrial applications (results in Sect. “System throughput” illustrate BDEQ’s advantage in this regard).

By managing cross-domain requests in a decentralized yet coordinated fashion, BDEQ effectively dismantles “data islands” while maintaining the autonomy and security protocols of each domain. This is especially crucial in sectors such as gas extraction, where various sites (domains) may be overseen by distinct businesses yet require data sharing under stringent control.

Evaluation and results

We assess the BDEQ framework using theoretical analysis and empirical findings. The theoretical analysis evaluates BDEQ’s security and functional advantages relative to conventional models, while the experimental assessment illustrates BDEQ’s performance (latency, throughput, etc.) across many scenarios. All methodologies and configurations are delineated to guarantee reproducibility. The studies were conducted repeatedly, and the presented results are averages; the observed variance was minimal, and the enhancements by BDEQ are statistically significant in most instances.

Theoretical results

This section presents a theoretical examination of the BDEQ framework, assessing its security advantages, efficiency, and scalability relative to conventional centralized models and alternative decentralized systems.

Security analysis

We first examine how BDEQ mitigates prevalent attack vectors in data-sharing systems. Figures 4 and 5 depict the attack surfaces in two scenarios: a conventional centralized architecture (Scenario 1) and the blockchain-based BDEQ design (Scenario 2), respectively. In both cases, an attacker may attempt to corrupt the system during three phases of the data lifecycle: pre-transmission, in-transit, and post-reception. As illustrated by Fig. 4, in a centralized paradigm (Scenario 1), the vulnerabilities are significant.

  • Pre-transmission: In a centralized architecture, data resides and is accessed through a single control node, rendering system security heavily reliant on the integrity of the central server. This creates a concentrated attack surface, exposing the system to several vulnerabilities across the data lifecycle. First, at the local level, attackers may compromise edge devices or local storage to alter or exfiltrate data before it is transmitted. The probability of this type of attack is \(\mathop \prod \limits_{i = 1}^{N} \lambda_{i}\), where \(\lambda_{i}\) is the probability of compromising the edge node i, and \(0 \le \lambda_{i} \le 1\).

  • In-transit: during transmission, data packets traveling between local nodes and the central server are susceptible to interception and forgery, enabling unauthorized access or injection of false information. The probability of this type of attack is \(\mathop \prod \limits_{i = 1}^{N} \eta_{i}\), where \(\eta_{i}\) represents the probability of forgery of data packets between edge node i and the central server, and \(0 \le \eta_{i} \le 1\).

  • Post-reception: All data is stored in a single central database. If the database or control center is breached (with probability \(\mu\), where \(0 \le \mu \le 1\)), the attacker obtains unrestricted access to all stored data and can modify access controls, hence compromising the entire system.

Fig. 4 Attack model in Scenario 1 (Centralized scenario).

Fig. 5 Attack model in the BDEQ scenario (Scenario 2), demonstrating decentralized defense layers using DS-PSPNs, blockchain verification, and trust-aware selection. This figure was created by authors using Microsoft Visio, Version 2412 (Build 18324.20194). The software is available at: https://www.microsoft.com/en-us/microsoft-365/visio/flowchart-software.

The overall success probability of attack represented by \(P_{c}\) is calculated as:

$$P_{c} = \beta (c_{1} \mathop \prod \limits_{i = 1}^{N} \lambda_{i} + c_{2} \mathop \prod \limits_{i = 1}^{N} \eta_{i} + \mu )$$
(3)

Here, the coefficient \(\beta\) is a constant used to ensure that the value of \(P_{c}\) lies between 0 and 1 so that all results are standardized for comparative analyses. \(c_{1}\) and \(c_{2}\) are the thresholds required to launch a valid pre-transmission and in-transit attack, respectively. In this paper, \(\beta\) is set to 1/3 and \(c_{1}\) and \(c_{2}\) are set to 1/2.
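As a quick numerical illustration of Eq. (3), the function below evaluates \(P_{c}\) with the parameter values stated above as defaults. The per-node probabilities used in the example call are hypothetical inputs chosen only to show the calculation; they are not results from the paper.

```python
import math


def centralized_attack_probability(lam, eta, mu, beta=1/3, c1=0.5, c2=0.5):
    """Eq. (3): P_c = beta * (c1 * prod(lam) + c2 * prod(eta) + mu)."""
    return beta * (c1 * math.prod(lam) + c2 * math.prod(eta) + mu)


# Hypothetical inputs: three edge nodes, illustrative probabilities only.
p_c = centralized_attack_probability(lam=[0.1, 0.1, 0.1],
                                     eta=[0.2, 0.2, 0.2],
                                     mu=0.05)
```

With these inputs the two product terms are tiny, so \(P_{c}\) is dominated by the central-database term \(\mu\), which reflects the single point of failure described for Scenario 1.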

On the other hand, the decentralized design of BDEQ distributes data and access control over several different DS-PSPNs. Verification is carried out via blockchain consensus among DS-PSPNs distributed across the network. Even if an adversary compromises several nodes, the integrity of the system is maintained unless the majority of the nodes are compromised. Consequently, the likelihood of successful coordinated attacks is greatly diminished. The immutability of the blockchain ensures that records cannot be altered and can be audited, and asymmetric encryption protects the communication channels. A successful attack on BDEQ is computationally costly because the attacker must simultaneously corrupt a large number of nodes and produce valid cryptographic proofs. This represents a significant security benefit over conventional centralized systems. Table 3 lists the probability of various attacks in the two scenarios defined above.

Table 3 Attack capabilities and success probabilities in centralized versus BDEQ architecture.

As shown by Fig. 5, BDEQ’s decentralized architecture (Scenario 2) drastically reduces these risks. Data and access control are distributed across multiple DS-PSPN nodes with blockchain enforcement:

  • Pre-transmission: Compromising data at the source now requires attacking several edge nodes (since no single node holds all the authority or all the data). For an attacker to alter data before it is shared, they would need to compromise each relevant DS-PSPN or a majority of them. If \(\lambda_{i}^{\prime }\) is the probability of compromising DS-PSPN i, the probability of simultaneously compromising N DS-PSPNs (to corrupt data at source or alter permission rules) is \(\mathop \prod \limits_{i = 1}^{N} \lambda_{i}^{\prime }\), a product of small probabilities that becomes vanishingly small as N grows. Consequently, the failure of any individual node does not compromise the entirety of the data.

  • In-transit: BDEQ employs end-to-end encryption with public key cryptography for each transaction, alongside the blockchain’s intrinsic integrity checks. Even if an attacker intercepts a message, they cannot effectively inject counterfeit data without the necessary private keys and majority consensus. The likelihood of an attacker successfully acquiring the private key of each DS-PSPN is presumed to be independent, denoted as \(\varepsilon_{i}^{\prime }\), where \(0 \le \varepsilon_{i}^{\prime } \le 1\). To execute fake authorization transactions, the attacker must compromise N nodes and successfully obtain their corresponding public–private key pairs. Under the independence assumption, the probability of a successful attack is the product of the key compromise and node compromise probabilities, \(\mathop \prod \limits_{i = 1}^{N} \varepsilon_{i}^{\prime } \times \mathop \prod \limits_{i = 1}^{N} \lambda_{i}^{\prime }\).

  • Post-reception: To retroactively corrupt data or obtain unauthorized access, an attacker would have to compromise a majority of DS-PSPNs and modify the ledger, which is virtually infeasible due to cryptographic safeguards. For instance, modifying a record in the distributed ledger requires control of over 50% of the nodes to amend history, a feat rendered impractical by BDEQ’s decentralized design. The likelihood of an attacker successfully altering authorization information contained in the distributed record pool is presumed to be independent across nodes and is denoted as \(\eta_{i}^{\prime }\), where \(0 \le \eta_{i}^{\prime } \le 1\). A successful tampering attack on the distributed record pool requires the concurrent alteration of authorization information across all N nodes. Therefore, the probability of a successful attack is expressed as \(\mathop \prod \limits_{i = 1}^{N} \varepsilon_{i}^{\prime } \times \mathop \prod \limits_{i = 1}^{N} \eta_{i}^{\prime }\).

Therefore, the overall probability of a successful attack in Scenario 2, denoted as \({P}_{b}\), can be computed by combining the independent probabilities of the three sequential attack steps: compromising the relevant nodes, stealing their private keys, and tampering with all ledger replicas. In this scenario, the trust values of the DS-PSPNs in the selection pool also affect the probability of attack. Given N DS-PSPN candidates, let \(t_{i}\), with \(0 \le t_{i} \le 1\), denote the trust value of the ith DS-PSPN, that is, how trustworthy that node is. A trust value close to 1 means highly trustworthy, while a value close to 0 means highly risky. The product \(\mathop \prod \limits_{i = 1}^{N} t_{i}\) therefore reflects the joint probability that all DS-PSPNs behave correctly. Consequently, the probability that at least one DS-PSPN fails, i.e., that a successful attack can occur, is \(1 - \mathop \prod \limits_{i = 1}^{N} t_{i}\). This yields the following expression:

$$P_{b} = \alpha \left( {1 - \mathop \prod \limits_{i = 1}^{N} t_{i} } \right)\left( {\mathop \prod \limits_{i = 1}^{N} \lambda_{i}^{\prime } + \mathop \prod \limits_{i = 1}^{N} \varepsilon_{i}^{\prime } \times \mathop \prod \limits_{i = 1}^{N} \lambda_{i}^{\prime } + \mathop \prod \limits_{i = 1}^{N} \varepsilon_{i}^{\prime } \times \mathop \prod \limits_{i = 1}^{N} \eta_{i}^{\prime } } \right)$$
(4)

Again, the coefficient \(\alpha\) is set to 1/3 in the formula to ensure that the value of \(P_{b}\) lies between 0 and 1, so that all results are normalized for comparative analysis.
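As an illustration, Eq. (4) can be evaluated directly for a small pool of nodes. The following is a minimal sketch rather than the authors' implementation: the function name and the example values are hypothetical, and the leading coefficient of Eq. (4) is passed as `alpha` with the value 1/3 stated in the text.

```python
import math


def attack_probability_bdeq(trust, lam, eps, eta, alpha=1.0 / 3.0):
    """Evaluate Eq. (4) for one set of per-node values.

    trust, lam, eps, eta are equal-length sequences holding t_i,
    lambda'_i, epsilon'_i and eta'_i for each DS-PSPN (all in [0, 1]).
    The function name and argument layout are illustrative assumptions.
    """
    if not (len(trust) == len(lam) == len(eps) == len(eta)):
        raise ValueError("all per-node sequences must have the same length")

    # Probability that at least one DS-PSPN misbehaves: 1 - prod(t_i).
    p_any_failure = 1.0 - math.prod(trust)

    lam_all = math.prod(lam)  # every node compromised
    eps_all = math.prod(eps)  # every private key obtained
    eta_all = math.prod(eta)  # every ledger replica altered

    # Pre-transmission + in-transit + post-reception terms of Eq. (4).
    stages = lam_all + eps_all * lam_all + eps_all * eta_all
    return alpha * p_any_failure * stages


if __name__ == "__main__":
    # Hypothetical three-node pool, chosen only to make the arithmetic visible.
    p_b = attack_probability_bdeq(
        trust=[0.9, 0.9, 0.9],
        lam=[0.5, 0.5, 0.5],
        eps=[0.5, 0.5, 0.5],
        eta=[0.5, 0.5, 0.5],
    )
    print(f"P_b = {p_b:.6f}")
```

Each of the three stage terms is at most 1, so their sum is at most 3; multiplying by 1/3 is what keeps the result inside [0, 1].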

In conclusion, BDEQ substantially diminishes the likelihood of successful coordinated assaults. Even if an attacker infiltrates a minority of edge nodes, the blockchain’s immutability and the necessity for widespread consensus serve as a robust defense mechanism. Communications are inherently secure, as only authorized DS-PSPNs can decrypt and execute transactions, and the lack of a centralized control center eliminates a singular catastrophic point of failure. By necessitating that an attacker simultaneously compromises multiple components (and does so undetected), BDEQ makes most attacks computationally unfeasible.

The attack success probabilities of the centralized, EdgeShare, and BDEQ architectures were simulated to quantify this. Direct analytical comparison between scenarios is complicated by uncertainty in the model variables, so the proposed framework is evaluated using Monte Carlo simulations. Figures 6 and 7 show the simulated attack success probability for the three competing frameworks, obtained by randomly assigning values between 0.9 and 1 to all probabilistic factors in the equations except the trust values. To simulate adversarial conditions, BDEQ’s trust values in the scalability simulations were assigned dynamically according to the compromise level. At 10% compromise, high-trust nodes (0.99–1.00) predominated, while compromised nodes (0.90–0.95) had slightly lower trust, enabling effective filtering. BDEQ’s advantage decreased at around 25% compromise, where trust values were set to 0.90–0.99 for honest nodes and 0.30–0.50 for compromised nodes. At 50%, trust levels ranged from 0.80–0.95 (honest) to 0.10–0.20 (compromised), reducing the filter’s discrimination. At 100% compromise, all nodes received a low trust rating (0.10–0.20), nullifying the trust filter and aligning BDEQ with EdgeShare.
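The sampling procedure described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the simulation code behind Figs. 6 and 7: values are drawn uniformly, the trial count and seed are arbitrary, the function names are hypothetical, and the trust-based filtering step is omitted because the text does not specify it.

```python
import math
import random


def eq4_attack_probability(trust, lam, eps, eta, alpha=1.0 / 3.0):
    """Eq. (4): alpha * (1 - prod t) * (prod lam + prod eps*prod lam + prod eps*prod eta)."""
    lam_all, eps_all, eta_all = math.prod(lam), math.prod(eps), math.prod(eta)
    stages = lam_all + eps_all * lam_all + eps_all * eta_all
    return alpha * (1.0 - math.prod(trust)) * stages


def simulate_mean_pb(n_nodes, compromised_fraction, honest_range,
                     compromised_range, trials=200, seed=0):
    """Average Eq. (4) over random draws (uniform sampling is an assumption)."""
    rng = random.Random(seed)
    n_bad = round(n_nodes * compromised_fraction)
    total = 0.0
    for _ in range(trials):
        # Trust values depend on whether a node is compromised or honest.
        trust = [rng.uniform(*compromised_range) for _ in range(n_bad)]
        trust += [rng.uniform(*honest_range) for _ in range(n_nodes - n_bad)]
        # All other probabilistic factors are drawn from [0.9, 1], as in the text.
        lam = [rng.uniform(0.9, 1.0) for _ in range(n_nodes)]
        eps = [rng.uniform(0.9, 1.0) for _ in range(n_nodes)]
        eta = [rng.uniform(0.9, 1.0) for _ in range(n_nodes)]
        total += eq4_attack_probability(trust, lam, eps, eta)
    return total / trials


if __name__ == "__main__":
    # 10% compromise level with the trust ranges quoted in the text.
    for n in (100, 400, 1000):
        p = simulate_mean_pb(n, 0.10, (0.99, 1.00), (0.90, 0.95))
        print(f"N={n}: log10(P_b) = {math.log10(p):.1f}")
```

Because every stage term is a product of N factors below 1, the averaged probability shrinks roughly exponentially with N, which is why a logarithmic axis is the natural way to plot it.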

Fig. 6

Resilience comparison – Attack success probability (log₁₀ scale) as a function of the total number of edge nodes (100 to 1000) under varying compromise levels (10%, 25%, 50%, 100%) for three architectures (Centralized, EdgeShare, BDEQ).

Fig. 7

Security impact of node compromise—Average attack success probability in Centralized, EdgeShare, and BDEQ architectures at 10%, 25%, and 50% of nodes compromised.

Figure 6 illustrates the scalability and resilience of Centralized, EdgeShare, and BDEQ architectures under increasing node counts (100–1000) across four compromise levels (10%–100%), with attack success probabilities shown on a log scale.

BDEQ consistently achieves superior resilience, especially at 10% and 25% compromise, where its adaptive trust-aware proxy selection rapidly suppresses attack success probability, reaching below \(10^{-20}\) as N approaches 1000. This exponential decay stems from its reinforcement learning mechanism, which dynamically excludes low-trust nodes. At 50% compromise, BDEQ maintains an advantage, though diminished, as fewer high-trust nodes reduce filtering efficacy. Under 100% compromise, BDEQ’s behavior converges with EdgeShare, since trust-based filtering collapses (with \(\mathop \prod \limits_{i = 1}^{N} t_{i} \simeq 0\)), neutralizing adaptive selection. Both systems then rely solely on static consensus. The centralized architecture, in contrast, remains flat and highly vulnerable regardless of scale, highlighting its inherent single-point-of-failure limitation.

To further analyze the resilience and scalability of the BDEQ framework, we simulate the average attack success probability in BDEQ across varying compromise levels (10%, 25%, and 50%) and compare it with the EdgeShare and Centralized architectures.

Figure 7 illustrates the average attack success probabilities of Centralized, EdgeShare, and BDEQ architectures under 10%, 25%, and 50% node compromise. BDEQ consistently demonstrates the lowest attack success probability across all levels of compromise, attributable to its trust-aware proxy selection mechanism. At 10% and 25% compromise, BDEQ surpasses EdgeShare by 42% and 55%, respectively. At 50% compromise, both decentralized systems exhibit similar performance; however, they maintain significantly greater resilience compared to the centralized model, which remains persistently vulnerable across all scenarios.

Functional comparison with other systems

This section compares the BDEQ framework to EdgeShare and other systems to demonstrate its systemic benefits. The study examines data utilization, storage, security, scalability, and functionality to demonstrate BDEQ’s suitability for IIoT data-sharing applications. Table 4 shows that BDEQ’s dynamic trust-aware node selection and smart contract integration make it more robust and efficient than competing methods.

Table 4 Comparative analysis of architectural features across BDEQ, EdgeShare, and selected related systems.
  1. Security, consistency, and reliability: BDEQ and EdgeShare both use a permissioned blockchain to guarantee secure, tamper-proof recording of data transactions, significantly improving reliability in IIoT data exchange. EdgeShare’s use of static PSPNs and blockchain makes it significantly harder for an attacker to compromise the entire system than in a centralized architecture. BDEQ builds on this with an improved consensus mechanism and dynamic node selection: it can validate transactions more efficiently and exclude nodes considered untrustworthy in real time. Even if certain nodes are compromised, BDEQ preserves overall system integrity through the distributed ledger and the majority of honest nodes. BDEQ retains the security of blockchain by eliminating a single point of failure while adding adaptability, which improves consistency under stress; for instance, if a node exhibits erratic behavior, BDEQ routes around it.

  2. Data rights and privacy protection: Both frameworks ensure that data is shared only with authorized entities. In EdgeShare, the separation of the data plane from the control plane ensures that raw data is never revealed without passing access control checks. BDEQ adopts this and strengthens privacy through careful identity management and encryption. Each access in BDEQ is associated with a single user’s key, and all cross-domain data transfers are encrypted using the requester’s public key, guaranteeing that intermediary nodes cannot decipher the data. Furthermore, BDEQ records all access requests and decisions, providing users with transparency: they can determine who accessed their data and when, which is beneficial for data rights management. EdgeShare establishes a robust foundation of privacy, as blockchain inherently obstructs unauthorized access by requiring verified transactions. However, BDEQ’s capacity to incorporate advanced encryption methods in the future, such as attribute-based encryption or zero-knowledge proofs, could enhance privacy further without compromising usability.

  3. Lightweight operation and compatibility: BDEQ and EdgeShare are both engineered to integrate with existing IoT infrastructure without necessitating modifications to the IoT devices. They function as an overlay network. Communication within BDEQ occurs via blockchain transactions and regular network calls, allowing any device capable of sending a request to the DS-PSPN (via a REST API or Software Development Kit (SDK)) to utilize the system. The cross-platform characteristic is derived from EdgeShare’s utilization of the extremely modular Hyperledger Fabric. BDEQ maintains a lightweight structure as IoT devices do not operate substantial blockchain nodes; this function is managed by the edge servers (DS-PSPNs). Furthermore, by dynamically selecting proxy nodes, BDEQ can enhance load dispersion, hence reducing local resource utilization on each device and preventing any single edge from becoming overwhelmed. Both methods are interoperable with heterogeneous devices and necessitate minimum integration, as devices merely refer to their domain’s DS-PSPN. BDEQ’s adaptive methodology guarantees that when the array of devices evolves or expands, it can seamlessly integrate them by scaling out DS-PSPNs or reallocating roles, all without the end devices’ awareness.

  4. Scalability: EdgeShare implemented a dual-layer network (edge and cloud) to improve scalability, which works well for deployments of modest size. BDEQ improves scalability further by allowing the network to expand and reconfigure at run time. As DS-PSPNs are added or removed, the RL agent adapts, allowing BDEQ to accommodate a growing number of domains or nodes without manual reconfiguration. BDEQ’s efficient consensus (Sect. “Consensus mechanism in BDEQ-based edgeshare”) ensures that transaction overhead stays minimal as additional nodes are incorporated, supporting high throughput. In scenario simulations, as network capacity expanded, EdgeShare’s performance ultimately plateaued due to fixed proxies and a static consensus group, whereas BDEQ sustained throughput by dynamically using additional nodes. This suggests that BDEQ is better suited to large IIoT deployments (hundreds of nodes) while maintaining minimal latency.

  5. Flexibility and functionality: BDEQ is engineered for extensibility. The smart contracts can be enhanced or expanded with additional functionalities, such as incorporating machine learning-based anomaly detection to operate with each data upload, without necessitating a complete framework rewrite. The policy of the RL agent can be re-trained or adjusted for new optimization objectives, such as energy efficiency versus latency. This adaptability is crucial in industrial environments where demands change. EdgeShare, built on Fabric, features modular components; nevertheless, it lacks an AI component such as BDEQ’s reinforcement learning agent. Consequently, BDEQ has enhanced adaptive capabilities; for example, one can configure the RL agent to prioritize specific nodes during peak periods or to implement geo-fencing regulations by “learning” that particular data must remain within designated areas. Dynamic policies are more challenging to execute within static frameworks. Furthermore, BDEQ’s implementation of smart contracts facilitates intricate rule enforcement, such as multi-factor access determinations that include contextual factors, representing a progression beyond basic access control lists.

In summary, the comparative analysis indicates BDEQ’s robust security and feature advantages. We next validate these advantages empirically through a prototype implementation and performance measurements.

Experimental results

We developed a prototype of BDEQ and performed experiments to evaluate its performance against baseline methods. The assessment covers: (a) system implementation and resource use, (b) execution times of the main operations, (c) data access latency in intra- versus inter-domain scenarios, and (d) overall system throughput under load. We compare BDEQ with four alternatives: C-Local (a centralized local server model derived from the design of Kai Fan et al.), C-Cloud (a centralized model utilizing cloud hosting), a basic Blockchain-only system (lacking BDEQ’s dynamic features), and EdgeShare (the static proxy baseline). All systems were tested under uniform conditions to ensure fairness.

Experimental setup

The testbed has five edge servers, representing five domains, interconnected over a local area network (LAN). All servers have identical technical specifications: dual-core CPU, 32 GB RAM, and 100 GB storage, running Ubuntu 22.04 LTS. Docker containerization was employed to emulate network nodes: multiple containers ran on each physical server to act as peer nodes, while one container functioned as an orderer for the blockchain network. The BDEQ software, including smart contracts and reinforcement learning agents, operates within these containers. Table 5 lists the principal configuration characteristics of the environment and blockchain network.

Table 5 The primary configuration attributes of the environment and blockchain network.

MySQL and MongoDB were deployed on each DS-PSPN to store IoT sensor data, representing both the relational and document-oriented storage prevalent in IIoT. The data comprised simulated sensor measurements (e.g., temperature and pressure) pertinent to the gas industry application. These datasets were maintained uniformly across domains to evaluate cross-domain reads.

A DQN was implemented for the RL agent, using a simple neural network with two hidden layers of 64 neurons each to approximate Q-values. The agent’s state comprised normalized CPU load, memory use, and network latency to other nodes, refreshed with each incoming request. We first trained the agent offline using a workload of random queries until convergence (about 200 training episodes) and subsequently introduced it into the live system. The learning rate was set to 0.01 and the discount factor to 0.9; exploration (epsilon) was reduced from 1.0 to 0.1 over the initial 100 episodes and then held at 0.1 during active operation to retain moderate exploration.
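The exploration schedule and discounting described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the text gives only the endpoints of the epsilon schedule, so the linear decay is assumed, and the function names are hypothetical. The Q-network itself (two hidden layers of 64 neurons) is omitted; `q_values` stands in for its output over the candidate DS-PSPNs.

```python
import random

EPS_START = 1.0        # initial exploration rate (from the text)
EPS_END = 0.1          # floor kept during live operation (from the text)
DECAY_EPISODES = 100   # episodes over which epsilon is reduced (from the text)
GAMMA = 0.9            # discount factor (from the text)


def epsilon_for_episode(episode):
    """Linear decay from EPS_START to EPS_END; the linear shape is an assumption."""
    if episode >= DECAY_EPISODES:
        return EPS_END
    fraction = episode / DECAY_EPISODES
    return EPS_START + fraction * (EPS_END - EPS_START)


def select_proxy(q_values, epsilon, rng):
    """Epsilon-greedy choice of a proxy index from per-node Q-values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: random node
    # Exploit: index of the node with the highest estimated Q-value.
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward, next_q_values, gamma=GAMMA, done=False):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


if __name__ == "__main__":
    rng = random.Random(0)
    for episode in (0, 50, 100, 150):
        print(episode, round(epsilon_for_episode(episode), 2))
    print("greedy choice:", select_proxy([0.1, 0.9, 0.3], 0.0, rng))
```

Holding epsilon at 0.1 after deployment means roughly one request in ten is routed to a randomly chosen node, which lets the agent keep observing nodes whose load or latency may have changed.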

To thoroughly assess BDEQ, we compare it with four typical designs encompassing centralized, cloud-based, and decentralized paradigms:

  • C-Local: A centralized data-sharing access control system designed based on the framework suggested by Kai Fan et al.18, employing role-based access control (RBAC) and a local MySQL database as its core management system.

  • C-Cloud: A cloud-based extension of C-Local, wherein the control logic and database are transferred to an Alibaba Cloud server, facilitating the comparison of edge-hosted versus cloud-hosted architectures.

  • Blockchain: A fundamental decentralized framework employing standard blockchain technology, devoid of BDEQ’s adaptive proxy selection and learning-driven enhancements.

  • EdgeShare: A static proxy-based edge-sharing architecture that serves as a direct benchmark to underscore BDEQ’s advantages in adaptability, scalability, and security.

Every system supports the essential operations, including user administration, authentication, and data access. To ensure a fair assessment, all tests were performed under uniform data and workload settings. Table 6 lists the configuration of each system, including IP address, data type (temperature or light intensity), and storage format (MySQL or MongoDB).

Table 6 The Experimental deployment configuration of the evaluated systems.

EdgeShare/BDEQ is deployed across five edge nodes, C-Local runs on a single local server, and C-Cloud is hosted on Alibaba Cloud. This integrated configuration enables uniform benchmarking across centralized, cloud, and edge frameworks. Results indicate that BDEQ’s dynamic trust-based DS-PSPN selection markedly improves flexibility, security, and efficiency in distributed IIoT data exchange.

Execution performance of functional modules

First, we assessed the latency of essential operations within the system, including user registration, login, and authorization tasks, comparing BDEQ with EdgeShare. These tests measure how quickly each framework performs a given operation from start to finish, including any overhead associated with blockchain transactions. Each operation was executed 50 times consecutively, and we recorded the average execution time in milliseconds.

The findings, summarized in Table 7, indicate that BDEQ consistently outperforms EdgeShare in all evaluated operations. For instance, registering a new user (which in EdgeShare entails publishing to the blockchain through a static PSPN) required approximately 64.1 ms on EdgeShare, but averaged around 59.2 ms in BDEQ, an approximately 8% reduction in execution time. Comparable improvements were observed in user login times (42.7 ms in BDEQ against 46.5 ms in EdgeShare) and user deletion times (42.5 ms versus 46.3 ms). Although these gains may appear minor (just a few milliseconds), they reflect the reduced overhead resulting from BDEQ’s adaptive proxy selection. In every instance, BDEQ’s RL agent ensured that the operation was handled by the least-loaded node, thereby reducing delay.

Table 7 Average execution time of various operations in EdgeShare vs. BDEQ.

Cross-domain authorization is a highly security-sensitive procedure, in which a user from one domain seeks access to data from another domain. The system must manage the registration of this cross-domain access, including the granting and subsequent revocation of permissions. In EdgeShare, a fixed mechanism manages this, requiring around 49.6 ms to generate a cross-domain authorization token. BDEQ achieved this in 44.8 ms, approximately a 10% reduction in execution time. BDEQ can delegate this task to a nearby or less busy node, whose smart contract has been refined to execute cross-domain rule verifications more efficiently, as we have restructured certain checks to use cached data.

Notably, all recorded operations for both systems were fast, each under 70 ms, which is well within the limits for industrial control, as the Industrial Internet Consortium criteria designate around 200 ms as the threshold for real-time control loops. BDEQ’s intra-domain access control enforcement was executed in approximately 20.4 ms, compared to 22.9 ms in EdgeShare. This low latency enables BDEQ to be used in situations requiring near-instantaneous responses.
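The percentage improvements quoted in this subsection follow directly from the reported averages. The short sketch below recomputes them from the figures in the text; the helper names are illustrative only.

```python
def relative_reduction(baseline_ms, improved_ms):
    """Percentage by which improved_ms undercuts baseline_ms."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_ms - improved_ms) / baseline_ms * 100.0


# (EdgeShare ms, BDEQ ms) averages as quoted in the text.
REPORTED_MS = {
    "user registration": (64.1, 59.2),
    "user login": (46.5, 42.7),
    "user deletion": (46.3, 42.5),
    "cross-domain authorization": (49.6, 44.8),
    "intra-domain access control": (22.9, 20.4),
}


def summarize(reported):
    """Map each operation to its latency reduction in percent (one decimal)."""
    return {
        name: round(relative_reduction(edgeshare, bdeq), 1)
        for name, (edgeshare, bdeq) in reported.items()
    }


if __name__ == "__main__":
    for name, pct in summarize(REPORTED_MS).items():
        print(f"{name}: {pct}% lower latency in BDEQ")
```

With these inputs the reductions come out at roughly 7.6%, 8.2%, 8.2%, 9.7%, and 10.9%, consistent with the rounded 8% and 10% figures given for registration and cross-domain authorization.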

Finally, we emphasize that these performance improvements did not compromise security or success rate: in all trials, operations in both EdgeShare and BDEQ completed successfully and fulfilled their functional objectives; BDEQ simply executes them faster. This illustrates that the reinforcement learning approach, by alleviating bottlenecks, can improve system responsiveness in practice.

Additionally, we observed the training behavior of the RL agent during these operations. Figure 8 depicts the training convergence of the DQN agent, with the average episode reward settling after roughly 150 episodes. The agent quickly learned an effective proxy selection policy, as shown by the sharp initial rise in rewards and subsequent stabilization near optimal performance. Small post-convergence fluctuations reflect continued exploration, while the consistently high reward at convergence indicates that the agent has learned a policy that minimizes delay and balances load. This stable reinforcement learning policy underpins BDEQ’s sustained performance gains in fluctuating IIoT environments.

Fig. 8

Convergence of the deep Q-learning agent during training.

Data access performance

Next, we focus on data access latency for read and write operations in both intra- and cross-domain contexts. We evaluate BDEQ against three baseline models: EdgeShare, C-Local, and C-Cloud. “Get” signifies a read (data retrieval) operation, while “Put” denotes a write (data insertion) process. In the intra-domain example, C-Local is merely a local database access with RBAC checks, and C-Cloud provides similar access but to a cloud-hosted database.

Figure 9 compares data access latency across different architectures. BDEQ consistently outperforms other models by maintaining read latencies around 30 ms and write latencies near 35 ms. These improvements are attributable to its local processing, dynamic proxy selection, and partial consensus mechanisms. In contrast, EdgeShare, while also edge-based, shows higher latencies due to static proxy assignment and stricter transaction ordering. Centralized models such as C-Local and C-Cloud demonstrate significantly higher delays, particularly C-Cloud, which incurs remote communication overheads that render it impractical for time-sensitive IIoT scenarios.

Fig. 9

Average latency for intra-domain read/write operations across different systems.

EdgeShare, albeit founded on an edge architecture, exhibits elevated latencies—approximately 40 ms for reads and 52 ms for writes. The performance disparity relative to BDEQ stems from EdgeShare’s dependence on static proxy selection and more rigorous consensus mechanisms, which fail to adjust to dynamic workloads or optimize for real-time circumstances.

Conversely, C-Local, a centralized design utilizing role-based access control (RBAC), demonstrates modest performance, with read and write latencies recorded at roughly 60 ms and 55 ms, respectively. While it circumvents the consensus delays characteristic of blockchain systems, it is deficient in parallelism and caching functionalities, necessitating that each operation be executed by a singular control server, hence constraining scalability and responsiveness.

C-Cloud, the cloud-hosted version of C-Local, exhibits the highest latency among all systems, with approximately 80 ms for reads and 70 ms for writes. The results indicate the inevitable network-induced delays associated with accessing remote cloud resources, rendering this strategy inappropriate for time-sensitive IIoT applications.

Smart contract function performance

We analyzed the execution durations of particular smart contract functions to see which aspects of the access control logic are most enhanced by BDEQ’s optimizations. Essential tasks within the smart contract encompass user information management (AddUserInfo, ChangeUserInfo), authorization management (AddAuth, ChangeAuth), and rule querying (GetRuleInfo, PermissQuery). We assessed the execution duration of these functions (from invocation to blockchain commitment) under both BDEQ and EdgeShare.

The findings, depicted in Fig. 10, show that BDEQ executes all tested operations faster than EdgeShare. For example, adding a new user information entry through the contract required approximately 55 ms in BDEQ, whereas it took around 70 ms in EdgeShare. Modifying a user’s information, such as altering their roles or attributes, took approximately 50 ms in BDEQ compared to around 65 ms in EdgeShare. These improvements of about 15 ms are substantial given a baseline of approximately 65–70 ms, corresponding to a speed-up of roughly 20%. Contributing factors include BDEQ’s streamlined contract logic (eliminating some superfluous checks required in static systems) and the RL agent frequently selecting a PSPN that already holds the relevant state in its cache, which speeds up contract execution.

Fig. 10

Execution times of key smart contract functions across different systems.

For the authorization functions, adding an authorization (AddAuth) and modifying or revoking an authorization (ChangeAuth) were approximately 10 ms faster on BDEQ than on EdgeShare. These functions are central to cross-domain access control, indicating that BDEQ speeds up cross-domain permission allocation. Cross-network requests naturally incur higher latency (due to inter-domain authorization), but BDEQ’s design minimizes this overhead, as reflected in Fig. 10. This aligns with the earlier findings in Sects. “Execution performance of functional modules” and “Data access performance”, and is especially advantageous under heavy cross-domain query load, as the system can allocate and modify permissions with little latency and thereby sustain throughput.

The read-intensive functions, GetRuleInfo (retrieves the current access control rules) and PermissQuery (verifies the existence of a specific permission), exhibited the largest relative improvement. BDEQ performed GetRuleInfo in approximately 35 ms compared to around 45 ms in EdgeShare. This improvement of approximately 22% is probably attributable to BDEQ’s caching strategy: because DS-PSPNs maintain an up-to-date record pool, accessing it is faster than in EdgeShare, which may reconstruct information from the blockchain state with each query. Similarly, permission queries took approximately 32 ms compared to around 43 ms, an improvement of approximately 25%. This suggests that BDEQ’s design is highly efficient for routine checks, which are prevalent in a busy system, as each data request may trigger several rule evaluations.

In conclusion, by analyzing individual contract functionalities, it is verified that BDEQ’s benefits extend throughout the system architecture, from the networking and node selection tiers to the execution of smart contract logic. The design choices, such as partial ordering, caching, and dynamic selection, yield quantifiable enhancements in the efficiency of security and access control activities, facilitating quicker response times without compromising security assurances.

System throughput

Finally, we evaluated the system’s throughput under a mix of intra- and cross-domain data requests. Throughput is measured in successful operations per second (Ops/s). We varied the share of cross-domain requests in the workload to evaluate the systems’ scalability, ranging from entirely local requests (100% intra-domain) to entirely remote requests (100% cross-domain).

Figure 11 illustrates the throughput of BDEQ, EdgeShare, and the basic Blockchain baseline as the proportion of cross-domain requests grows. At 0% cross-domain (all requests are intra-domain), all systems achieve good throughput, with BDEQ leading at just over 26 Ops/s, followed by the plain Blockchain at around 24 Ops/s and EdgeShare at approximately 23 Ops/s.

Fig. 11

Throughput of BDEQ vs. EdgeShare vs. a basic Blockchain system, as the fraction of cross-domain requests increases.

The initial disparity arises from BDEQ’s effective local management and parallel processing: despite all requests being local, BDEQ’s reinforcement learning agent can allocate load across the five DS-PSPNs, while EdgeShare may employ each domain’s proxy more rigidly, and the baseline Blockchain lacks edge caching (though it incurs slightly less overhead than the additional logic of EdgeShare, resulting in a marginal advantage over EdgeShare).

As the proportion of cross-domain requests increases, overall throughput diminishes due to the greater weight of cross-domain activities. Nonetheless, BDEQ’s throughput diminishes more slowly. With 50% cross-domain, BDEQ achieves approximately 22 Ops/s, EdgeShare declines to approximately 19 Ops/s, and the baseline reaches around 20 Ops/s. In a scenario of complete 100% cross-domain requests, BDEQ achieves around 19 operations per second, EdgeShare approximately 16 operations per second, and the baseline approximately 17 operations per second.

These findings illustrate BDEQ’s scalability and resilience under diverse workloads. IoT applications frequently use a combination of local and distant data access. BDEQ efficiently manages a substantial volume of concurrent requests due to its dynamic job distribution and prevention of superfluous bottlenecks. EdgeShare’s throughput is compromised due to its static PSPNs, which might be hindered by cross-domain validation jobs. Each cross-domain request in EdgeShare enforces a more stringent sequential consensus, hence restricting parallelism. The fundamental blockchain, devoid of off-chain caching, experiences deceleration as it must process each access over the complete consensus pathway.

An additional finding is that BDEQ’s throughput at 100% cross-domain (~19 Ops/s) is comparable to EdgeShare’s throughput at approximately 50% cross-domain (~18–19 Ops/s). This indicates that BDEQ can handle approximately twice the share of cross-domain traffic as EdgeShare at an equivalent performance level. In a multi-site gas facility, where cross-site data exchange may surge (such as during a coordinated operation or emergency), BDEQ would maintain efficient operation, whereas a more static system may slow down.
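The comparison above can be checked against the three reported operating points per system. The sketch below is illustrative only: it uses the approximate throughput values quoted in the text, assumes linear interpolation between them (the text reports no intermediate points), and uses hypothetical function names.

```python
# (cross-domain fraction, Ops/s) pairs, using the approximate values from the text.
THROUGHPUT = {
    "BDEQ": [(0.0, 26.0), (0.5, 22.0), (1.0, 19.0)],
    "EdgeShare": [(0.0, 23.0), (0.5, 19.0), (1.0, 16.0)],
    "Blockchain": [(0.0, 24.0), (0.5, 20.0), (1.0, 17.0)],
}


def retained_fraction(points):
    """Throughput at 100% cross-domain relative to 0% cross-domain."""
    return points[-1][1] / points[0][1]


def fraction_at_throughput(points, target_ops):
    """Cross-domain fraction at which throughput falls to target_ops.

    Assumes throughput decreases along `points` and interpolates linearly
    between neighbouring measurements. Returns None if the target lies
    outside the measured range.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 <= target_ops <= y0:
            if y0 == y1:
                return x0
            return x0 + (y0 - target_ops) / (y0 - y1) * (x1 - x0)
    return None


if __name__ == "__main__":
    for name, points in THROUGHPUT.items():
        print(f"{name}: retains {retained_fraction(points):.0%} at full cross-domain load")
    bdeq_full = THROUGHPUT["BDEQ"][-1][1]
    share = fraction_at_throughput(THROUGHPUT["EdgeShare"], bdeq_full)
    print(f"EdgeShare matches BDEQ's full-load throughput at {share:.0%} cross-domain")
```

On these figures BDEQ retains about 73% of its all-local throughput at full cross-domain load, against roughly 70% for EdgeShare and 71% for the plain blockchain, and EdgeShare reaches 19 Ops/s at a 50% cross-domain share, which is the basis for the factor of two noted above.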

The experimental study demonstrates that BDEQ achieves its design objectives by enhancing performance (latency and throughput) while ensuring robust security. All employed methodologies (smart contract operations, reinforcement learning training, etc.) are replicable with the supplied configuration, and the outcomes demonstrate clear improvements over current alternatives. BDEQ’s flexibility renders it a potential platform for secure IoT data sharing in dynamic settings, offering the advantages of blockchain security alongside the performance enhancement of edge intelligence. BDEQ’s reduced access latency, increased throughput, and robust security establish it as a viable and impactful option for next-generation IIoT ecosystems.

Conclusion and future work

This study introduced BDEQ, a blockchain-augmented framework designed to address the dual demands of secure and efficient data sharing in IIoT. By combining a permissioned blockchain with deep Q-learning, the system enables dynamic proxy selection based on trust, load, and latency. Our experimental evaluations confirm BDEQ’s ability to reduce access latency, enhance throughput, and maintain robust security in multi-domain gas industry scenarios.

Despite the promising results, the BDEQ framework has certain limitations. The current proxy selection strategy is based on a single-agent Deep Q-Learning model, which may not optimally scale in multi-agent, highly dynamic IIoT environments. Additionally, our evaluation focuses primarily on a simulated gas-industry scenario; deploying BDEQ in real-world systems with physical constraints and cross-domain interactions remains an open challenge. In future work, we aim to explore decentralized multi-agent learning approaches, extend the system to other industrial sectors such as manufacturing or logistics, and evaluate its integration with hardware-based trust modules and real-time anomaly detection systems.

Building on these directions, we also plan to investigate the application of multi-agent reinforcement learning (MARL) to enable collaborative decision-making across distributed edge nodes. Such an approach can improve scalability, fault tolerance, and learning convergence in dynamic environments. Additionally, incorporating online learning strategies and trust adaptation mechanisms would allow BDEQ to respond more effectively to changes in device behavior or emerging security threats. Beyond the gas industry, BDEQ can be applied to other IIoT domains such as precision agriculture, smart healthcare, and intelligent logistics, where secure, decentralized, and adaptive data sharing is crucial. These extensions will help generalize the framework’s applicability and improve its robustness in complex, real-world deployments.