Introduction

With the exponential growth of the Internet of Vehicles (IoV), the amount of data generated by connected vehicles has dramatically increased. Guaranteeing secure storage and sharing of these massive datasets, as well as protecting the privacy of users, has emerged as a vital challenge in intelligent transportation systems. Blockchain technology, known for its decentralized, tamper-resistant, and highly secure attributes, has garnered significant interest as a potential solution to the data storage and security issues in IoV scenarios. Existing studies on applying blockchain to IoV can be grouped into two implementation strategies. The first strategy involves deploying blockchain on Road-Side Units (RSUs). Although this approach offers stable operation, it incurs substantial infrastructure costs and suffers from inadequate decentralization. The second strategy deploys blockchain directly on vehicles, which effectively enhances the degree of decentralization but faces significant technical challenges such as network instability and excessive storage overhead.

To reduce blockchain storage costs, several techniques have been developed, including Simplified Payment Verification (SPV), pruning, and sharding. SPV nodes store only block headers and rely on full nodes for transaction verification, thereby minimizing local storage. Pruning solutions discard historical data deemed unnecessary for block verification, lowering storage overhead. However, if all nodes apply pruning, certain parts of the blockchain data will eventually become inaccessible. Sharding reduces the storage burden on individual nodes by partitioning the network into smaller groups, each of which only maintains a fraction of the blockchain. Nevertheless, the security of sharding heavily depends on the size of each shard: smaller shard sizes may compromise network security. Although these approaches mitigate storage issues, they invariably affect the blockchain’s overall security or integrity.

Recently, the concept of coded blockchain has been introduced to reduce storage overhead while preserving security and integrity. Coded blockchain approaches have shown considerable promise in dynamic IoT settings. Nevertheless, the IoV environment is particularly dynamic, with nodes frequently joining or leaving the network and the node population undergoing large fluctuations, often following a cyclical daily pattern with significant volatility. Existing coded blockchain designs still exhibit shortcomings in coping with such highly dynamic conditions. Therefore, there is a pressing need to devise an improved compression scheme for vehicular blockchain networks.

One of the well-known compression approaches is the enhanced block design proposed by Yang1, wherein certain enhanced blocks are used to reach consensus regarding compression parameters. This method is suitable for scenarios with moderate dynamism, wherein node counts do not experience steep declines. In reality, however, vehicular blockchain networks may undergo substantial reductions in node numbers (e.g., after rush hours). Under such circumstances, the existing method is prone to performance bottlenecks. To tackle these issues, this paper proposes a novel vehicular blockchain compression scheme.

We summarize our main contributions as follows:

  1. IoV-Oriented Block Compression Scheme: We propose a compression framework that significantly alleviates the storage burden on individual nodes in highly dynamic IoV environments, which are also characterized by cyclical variations in network size and activity.

  2. Low-Overhead, Efficient Consensus Mechanism: We introduce a novel consensus mechanism optimized for vehicular blockchain, which adapts dynamically to evolving network conditions while maintaining low overhead and high efficiency.

  3. Dynamic Compression-Rate Network Maintenance: We develop a network maintenance algorithm capable of updating key parameters without re-compressing or restructuring existing blocks. Specifically, this algorithm dynamically adjusts parameter b to confine the failure probability below \(10^{-12}\), while minimizing overhead. It also preserves a compression ratio between 99% and 99.9%, thus obviating the need for block reorganization.

  4. Numerical Simulation and Validation: We verify the effectiveness and reliability of the proposed scheme through a series of numerical simulations, demonstrating its robustness under various network conditions.

Paper organization and section relationships. The remainder of this paper is structured to make the design-to-evaluation flow explicit. Section 2 positions our work in the coded-blockchain and IoV literature. Section 3 formalizes the storage model, rateless (Raptor) coding, and VRF basics that the protocol builds upon. Section 4 summarizes the end-to-end architecture and workflow. Section 5 presents the lightweight, VRF-driven consensus used by the on-board vehicular network. Section 6 defines the decoding-failure risk model and the parameters (e.g., b) subject to adaptive control. Section 7 operationalizes this model via a network maintenance algorithm that adjusts b without re-encoding old blocks. Section 8 analyzes security by combining the proposer selection in Sect. 5 with the redundancy and repair mechanisms in Sect. 7. Section 9 validates the design and the risk bounds through simulation under diurnal churn, and Sect. 10 contrasts our scheme with SPV, pruning, sharding, and edge-assisted approaches under the metrics introduced in Sects. 6, 7, 8, and 9. Section 11 concludes and outlines limitations.

Background

In recent years, blockchain has been widely regarded as a promising approach to achieve trustworthy data sharing and security in the Internet of Things (IoT). However, the traditional blockchain paradigm requires each node to store a complete copy of the ledger, which creates tremendous storage and communication overhead. This overhead is often unaffordable for resource-constrained IoT devices such as sensors and in-vehicle units1,2. For example, the size of the Bitcoin blockchain already exceeded several hundred gigabytes by 2022, making it extremely challenging for low-end devices to serve as full nodes.

To address these issues, the concept of coded blockchain has been introduced in the literature. The fundamental idea is to preserve the decentralized and secure nature of blockchain while incorporating techniques from coding theory, such as erasure codes, to split and redundantly encode the ledger data into fragments stored across different nodes. In this way, each node only stores a subset of the encoded fragments, yet the entire network collectively retains the ability to reconstruct the complete ledger. Early studies have demonstrated that this approach significantly reduces the per-node storage requirement. For instance, Wu et al.2 employed low-density parity-check (LDPC) codes to perform cross-block encoding, so that each node only needed to store an amount of data equivalent to one block out of a group, rather than storing all blocks. Similar frameworks based on erasure codes were also proposed in 2018 by Dai et al.3 and Perard et al.4, paving the way for deploying blockchain in resource-constrained scenarios.

Over time, coded blockchain has evolved from using fixed-rate erasure codes to more sophisticated coding techniques and has gradually been integrated with other blockchain mechanisms such as consensus and sharding. Early approaches primarily employed fixed-parameter erasure codes (e.g., Reed-Solomon codes, LDPC codes) to encode blocks. Subsequently, researchers explored the synergy of coding with blockchain sharding to boost both storage efficiency and throughput. A representative example is the PolyShard architecture proposed by Li et al.5, which combines polynomial-based encoding with sharding. Instead of assigning one shard to each node, PolyShard encodes the entire blockchain across multiple shards via polynomial codes, such that each node stores a mixture of coded data from all shards. This not only reduces the overall storage overhead per node, but also allows every node to participate in verifying transactions for any shard. The scheme theoretically achieves near-linear scalability in throughput and storage, thereby mitigating the conventional “impossible trilemma” in blockchain. During this period, fountain (rateless) codes were also proposed for blockchain to improve storage flexibility. For example, Kadhe et al.6 developed a preliminary idea of random linear fountain encoding to diminish redundant replication of data on chain.

More recent works focus on using Lagrange polynomial coding and advanced rateless coding. Asheralieva and Niyato7 introduced a Lagrange-coded blockchain (LCB) that extends the concept of PolyShard from a sharded network to a single-chain environment, incorporating encoding schemes in both block generation and verification to reduce redundancy and enhance security. Other researchers explored rateless codes (e.g., fountain codes, regenerating codes) to cope with frequent node arrivals and departures in dynamic IoT networks1,8. By allowing nodes to request and generate as many encoded fragments as needed, rateless-based approaches can further reduce storage burdens and accelerate synchronization for new participants. Pal9 showed that a fountain-code-based bootstrap mechanism can substantially reduce the bandwidth required for a new node to obtain ledger data, while Chawla et al.10 demonstrated how a rateless-coded broadcast strategy can alleviate redundant transmissions and speed up block dissemination.

Besides saving storage, coding techniques often strengthen the security and integrity of blockchain data. Standard blockchain integrity checks rely on cryptographic hashing to detect tampering across chained blocks; adding coding can provide an extra layer of redundancy-based detection. If any node provides incorrect data fragments, honest nodes can detect inconsistencies through the decoding process. Meanwhile, in a sharded setting, an attacker must corrupt enough coded fragments on multiple nodes to erase a given transaction beyond recovery. PolyShard leverages this property to push the security threshold to scale with the number of nodes5. In addition, coding can facilitate lightweight proofs such as fraud proofs, or coded Merkle trees, which benefit resource-limited light nodes by ensuring data availability11.

From the perspective of vehicular networks (IoV), coded blockchain holds great potential. Vehicular nodes often lack the capability to store large historical ledgers due to limited onboard resources, and the network topology is highly dynamic due to vehicle mobility. By encoding the blockchain so that each vehicle only needs to hold fragments, one can substantially reduce the storage burden. Moreover, rateless-coded broadcasting can enhance the robustness of blockchain synchronization in the presence of frequent connectivity disruptions. Although most existing IoV-oriented blockchain solutions primarily focus on methods such as digital twin offloading12 or dynamic sharding, coding-based approaches could further improve both reliability and security in a decentralized manner. Research on coded blockchain specifically tailored to vehicle-to-vehicle or vehicle-to-infrastructure contexts is still at an early stage, and future work may explore hybrid architectures that combine roadside units (RSUs) and vehicles as collaborative storage and computation entities.

In summary, coded blockchain has emerged as a viable approach to addressing the storage and scalability issues faced by traditional blockchain systems in resource-constrained, dynamic IoV environments. By integrating advanced coding techniques—ranging from fixed-rate erasure codes to rateless fountain or regenerating codes—these schemes can reduce per-node resource consumption without compromising the ledger’s integrity or security. Incorporating such designs into highly dynamic networks like the Internet of Vehicles remains a promising and ongoing research direction.

Preliminaries

Coded-blockchain storage model

Block grouping and symbolization. Let W denote the height of the latest confirmed block. Choose an integer \(\alpha >0\) such that blocks \(\{0,1,\dots ,W-\alpha \}\) are immune to further chain reorganization. These \((W-\alpha +1)\) blocks are partitioned into M coding groups of k consecutive blocks each, where

$$\begin{aligned} M = \bigl \lceil (W-\alpha +1)\big /\!k \bigr \rceil . \end{aligned}$$
(1)

Denote the m-th coding group by

$$\mathcal {G}_m = \bigl \{B^{(m-1)k},\,B^{(m-1)k+1},\dots ,B^{mk-1}\bigr \}, \qquad m=1,\dots ,M .$$

After removing each block header, the payload is evenly sliced into s symbols. Consequently, group m can be expressed as the matrix

$$\begin{aligned} \textbf{B}^{(m)} = \begin{bmatrix} b_{1,1} & b_{2,1} & \dots & b_{k,1}\\ b_{1,2} & b_{2,2} & \dots & b_{k,2}\\ \vdots & \vdots & \ddots & \vdots \\ b_{1,s} & b_{2,s} & \dots & b_{k,s} \end{bmatrix} \in \mathbb {F}_q^{s\times k}, \end{aligned}$$
(2)

where \(\mathbb {F}_q\) is a finite field of size \(q=2^{p}\).

Linear erasure coding. Select an \([n,k]_q\) linear code with generator matrix \(G\in \mathbb {F}_q^{k\times n}\). Every row of \(\textbf{B}^{(m)}\) is right-multiplied by G, producing

$$\begin{aligned} \textbf{U}^{(m)} \;=\; \textbf{B}^{(m)}\,G \in \mathbb {F}_q^{s\times n}. \end{aligned}$$
(3)

The i-th column \(\textbf{u}_i=(u_{i,1},\dots ,u_{i,s})^{\!\top }\), \(i=1,\dots ,n\), is an encoded block stored by the i-th node.

Systematic generator matrix and offline recovery. If \(G=[\,I_k\;|\;P\,]\) is systematic, the first k columns equal the original block data, i.e., \(\textbf{u}_i=\textbf{b}_i\) for \(i=1,\dots ,k\). If the node that stores \(\textbf{u}_i\) is temporarily offline, a requester can

  (i) collect at least k columns in the systematic case, or

  (ii) collect at least \((1+\varepsilon )k\) columns in the non-systematic case,

and decode to restore the entire matrix \(\textbf{B}^{(m)}\), thereby obtaining the desired block.
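To make the storage model concrete, the following sketch (a minimal, illustrative instance, not the scheme itself) uses the simplest systematic erasure code, a single-parity \([k+1,k]\) code with \(G = [\,I_k \;|\; \mathbf{1}\,]\), and shows how a requester repairs one offline column; the values of k and s and the random payloads are arbitrary.

```python
# Minimal systematic erasure-code demo over GF(2): G = [I_k | 1], i.e. a
# [k+1, k] single-parity code.  A real deployment would use a stronger
# [n, k] code (e.g. the Raptor code of later sections); k, s and the
# random payloads here are purely illustrative.
import os

k, s = 4, 16                        # k blocks per group, s symbols per block
blocks = [os.urandom(s) for _ in range(k)]   # columns of B^(m)

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Encode: systematic columns are the blocks themselves (u_i = b_i), plus
# one parity column equal to the XOR of all k blocks.
parity = blocks[0]
for col in blocks[1:]:
    parity = xor_bytes(parity, col)
encoded = blocks + [parity]         # n = k + 1 encoded columns

# One storing node goes offline; rebuild its column from the k survivors.
lost = 2
repaired = bytes(s)                 # all-zero start value for the XOR
for i, col in enumerate(encoded):
    if i != lost:
        repaired = xor_bytes(repaired, col)

assert repaired == blocks[lost]
```

Any single missing column (systematic or parity) can be rebuilt this way; tolerating more simultaneous losses requires the larger \([n,k]\) codes treated in the remainder of this section.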

Rateless codes

LT code

Encoding. Given n input symbols \(\{u_1,\dots ,u_n\}\), each encoded symbol \(v_j\) is generated as follows:

  (i) sample a degree d from the distribution \(\Omega (d)\);

  (ii) choose a neighbour set \(\mathcal {N}(v_j)\subseteq \{1,\dots ,n\}\) with \(|\mathcal {N}(v_j)|=d\);

  (iii) compute

    $$\begin{aligned} v_j \;=\; \bigoplus _{i\in \mathcal {N}(v_j)} u_i . \end{aligned}$$
    (4)

Peeling decoding. After receiving about \((1+\epsilon )n\) encoded symbols, the decoder iteratively

  (i) finds a degree-1 symbol \(v_j\) and assigns its value to the sole unknown neighbour \(u_i\);

  (ii) for every encoded symbol \(v_\ell\) containing \(u_i\), updates

    $$v_\ell \leftarrow v_\ell \oplus u_i,\quad \mathcal {N}(v_\ell )\leftarrow \mathcal {N}(v_\ell )\setminus \{i\}.$$

Decoding may stall if no degree-1 symbol exists, producing the error floor: the failure probability does not vanish even when \(\epsilon \!\rightarrow \!0\).
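The encoding and peeling steps above can be sketched as follows; this is a toy implementation in which a uniform degree distribution stands in for the robust soliton distribution of a practical LT code, and all parameters are illustrative.

```python
# Toy LT code: Eq. (4) encoding plus iterative peeling decoding over
# integer symbols combined with bitwise XOR.  The uniform degree choice
# is a stand-in for a proper (robust soliton) Omega(d).
import random

def lt_encode(symbols, n_encoded, rng):
    """Return n_encoded (neighbour_set, value) pairs."""
    out = []
    for _ in range(n_encoded):
        d = rng.choice([1, 2, 3, 4])                 # toy Omega(d)
        nbrs = set(rng.sample(range(len(symbols)), d))
        val = 0
        for i in nbrs:
            val ^= symbols[i]
        out.append((nbrs, val))
    return out

def lt_peel(encoded, n):
    """Peeling decoder; returns all n symbols, or None if it stalls."""
    known = [None] * n
    enc = [(set(nbrs), val) for nbrs, val in encoded]
    progress = True
    while progress:
        progress = False
        for nbrs, val in enc:                        # step (i): degree-1 symbols
            if len(nbrs) == 1:
                i = next(iter(nbrs))
                if known[i] is None:
                    known[i] = val
                    progress = True
        for idx, (nbrs, val) in enumerate(enc):      # step (ii): subtract knowns
            for i in [j for j in nbrs if known[j] is not None]:
                nbrs.discard(i)
                val ^= known[i]
            enc[idx] = (nbrs, val)
    return known if all(v is not None for v in known) else None

rng = random.Random(7)
data = [rng.getrandbits(32) for _ in range(20)]

# Peeling can stall (the error floor discussed above), so retry with a
# fresh random encoding until decoding succeeds.
recovered, attempt = None, 0
while recovered is None and attempt < 50:
    attempt += 1
    recovered = lt_peel(lt_encode(data, 60, random.Random(attempt)), len(data))
```

Whenever `lt_peel` returns a result it equals the input symbols; the retry loop mirrors the fact that a plain LT code's failure probability does not vanish, which motivates the Raptor construction below.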

Raptor code

To suppress the error floor, a Raptor code prepends a fixed-rate outer code \(C_{\text {outer}}\):

$$\begin{aligned} (u_1,\dots ,u_n)\; \xrightarrow {C_{\text {outer}}}\; (w_1,\dots ,w_{n'}),\quad n'>n . \end{aligned}$$
(5)

The intermediate symbols \(\{w_i\}\) are then ratelessly encoded by the LT procedure. Decoding first runs LT peeling and then lets \(C_{\text {outer}}\) recover any missing \(w_i\). The overall complexity is \(\mathcal {O}\!\bigl (k\log (1/\epsilon )\bigr )\), while the failure probability can be driven below \(10^{-12}\), matching stringent blockchain reliability requirements.

Verifiable random functions (VRFs)

A VRF maps an input to a unique, publicly verifiable pseudo-random output. Given a secret key sk and public key pk:

$$\begin{aligned} (r,p)&\leftarrow \operatorname {VRF}(\textit{seed}, sk),\end{aligned}$$
(6)
$$\begin{aligned} \operatorname {true/false}&\leftarrow \operatorname {VRF\_Verify}(r,p,\textit{seed}, pk). \end{aligned}$$
(7)
  • Correctness: If \((r, p)\) is produced by the holder of sk, verification returns true.

  • Pseudo-randomness: Without sk, r is computationally indistinguishable from uniform.

  • Unforgeability: No PPT adversary can create a pair \((r, p)\) that passes verification without knowing sk.

In later sections, the VRF is employed for random yet publicly auditable node selection and parameter refresh within the consensus protocol.
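The interface of Eqs. (6)-(7) can be mimicked as follows. Note that HMAC-SHA256 is only a symmetric stand-in chosen to keep the sketch self-contained: unlike a real VRF (e.g. ECVRF), verification here requires the secret key and is therefore not publicly auditable; the sketch only illustrates the \((r,p)\) output/check shape used by the consensus layer.

```python
# Interface sketch of Eqs. (6)-(7).  HMAC-SHA256 stands in for a real VRF
# (e.g. ECVRF).  CAVEAT: HMAC is symmetric, so this toy "verify" needs the
# secret key and is NOT publicly verifiable.
import hashlib
import hmac
import os

def vrf(seed: bytes, sk: bytes):
    r = hmac.new(sk, seed, hashlib.sha256).digest()   # pseudo-random output
    p = hashlib.sha256(r).digest()                    # toy "proof"
    return r, p

def vrf_verify(r: bytes, p: bytes, seed: bytes, sk: bytes) -> bool:
    expected_r, expected_p = vrf(seed, sk)
    return hmac.compare_digest(r, expected_r) and hmac.compare_digest(p, expected_p)

sk = os.urandom(32)
seed = b"block-42"
r, p = vrf(seed, sk)
assert vrf_verify(r, p, seed, sk)          # correctness
assert not vrf_verify(r, p, b"block-43", sk)  # output is seed-specific
```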

Overview

Fig. 1: Architecture of the proposed on-board vehicular blockchain network.

Fig. 2: System-level workflow of the adaptive rateless-coded vehicular blockchain.

Intuitively, Fig. 2 highlights the life-cycle of an on-board vehicular blockchain: vehicles first obtain certificates from a trusted CA, then participate in block compression, VRF-driven consensus, and dynamic maintenance within a fully peer-to-peer network.

In this system, the presence of a widely recognized and trusted Certificate Authority (CA) is assumed. During the blockchain bootstrap phase, the CA (i) authenticates vehicular identities and (ii) issues digital certificates, thereby thwarting Sybil attacks and ensuring that only legitimate vehicles can access the system. It also finalizes the initial network configuration and provides guidance to the participating nodes. As illustrated in Fig. 1, only newly joining vehicles must contact the CA to obtain a certificate.

It is important to emphasize that the CA’s role is confined to system initialization and identity issuance; the subsequent blockchain consensus process and network operation proceed independently of the CA. If the CA is taken offline by an attack, the existing vehicular blockchain still operates normally. The only impact is that new vehicles cannot be enrolled temporarily.

To facilitate the ensuing performance analysis, three working assumptions are introduced. First, the ledger is assumed to grow at a constant rate of \(\beta\) blocks per unit time. Second, reflecting the pronounced diurnal pattern of online vehicles, we model the instantaneous active-vehicle count as a time-varying random process bounded by \(f_{\text{inf}}\) and \(f^{\text{sup}}\); for concreteness, and to match our 30-day aggregate and a representative 24-hour trace (Sect. 6), we approximate it within this envelope by a truncated normal distribution. Finally, on-board units are significantly storage-constrained compared with conventional server-class blockchain nodes, and this limitation must be taken into account when designing data-retention policies.
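Under the second working assumption, the active-vehicle count can be sampled as below; the envelope functions for \(f_{\text{inf}}\), \(f^{\text{sup}}\) and the spread are illustrative placeholders, not fitted traffic data.

```python
# Sampling the active-vehicle count: a normal distribution truncated to
# the diurnal envelope [f_inf(t), f_sup(t)] via rejection sampling.
# The envelope shapes and the spread (sigma) are illustrative assumptions.
import math
import random

def f_inf(t_hours: float) -> float:      # illustrative lower bound over 24 h
    return 200 + 150 * math.sin(math.pi * t_hours / 24) ** 2

def f_sup(t_hours: float) -> float:      # illustrative upper bound
    return 600 + 400 * math.sin(math.pi * t_hours / 24) ** 2

def active_vehicles(t_hours: float, rng: random.Random) -> float:
    lo, hi = f_inf(t_hours), f_sup(t_hours)
    mu, sigma = (lo + hi) / 2, (hi - lo) / 6
    while True:                          # rejection sampling = truncation
        x = rng.gauss(mu, sigma)
        if lo <= x <= hi:
            return x

rng = random.Random(0)
samples = [active_vehicles(t % 24, rng) for t in range(1000)]
assert all(f_inf(t % 24) <= samples[t] <= f_sup(t % 24) for t in range(1000))
```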

Coding process

First, as described in the storage-model subsection of the Preliminaries, we partition each original block \(B^{w}\) into s symbols (defined over a finite field \(\mathbb {F}_q\)). Several original blocks are grouped together. Denote the set of original blocks in the m-th group by \(G_m\), and let \(k_m = |G_m|\) be its size. After partitioning each original block into s symbols, these symbols are placed as columns to form an \(s \times k_m\) matrix. For simplicity, we use k to represent \(k_m\) in the following discussion.

For each group of data, an independent encoding is performed row by row. Specifically, we employ a systematic Raptor code to encode the k original symbols in each row. A Raptor code is composed of a pre-code and an LT code in cascade:

  1. First, a linear pre-code (with rate \(r = k_m/n_m\), length parameters \([n_m, k_m]\), and generator matrix \(G_m\)) maps the k information symbols to \(n_m\) intermediate symbols \(u_1, u_2, \dots , u_{n_m}\).

  2. Then, using a degree distribution \(\Omega (d)\), a systematic LT code encodes these \(n_m\) intermediate symbols into N coded symbols \(v_1, v_2, \dots , v_N\).

As illustrated in Fig. 3, because the pre-code is systematic, the first k intermediate symbols are simply replicas of the original symbols, i.e., \(\{u_1, \dots , u_k\} = \{b_1, \dots , b_k\}\). Moreover, the LT code is also systematic, so the first \(n_m\) coded symbols correspond directly to the intermediate symbols, i.e., \(\{v_1, \dots , v_{n_m}\} = \{u_1, \dots , u_{n_m}\}\), while \(v_{n_m+1}\) through \(v_N\) are non-systematic coded symbols. After this two-stage encoding process, each group of original blocks produces an \(s \times N\) coded matrix (shown in Eq. (8)). Suppose the total number of nodes in the network is N, and each column j of the matrix corresponds to a coded block \(v_j\). For each group of data, each node is required to store b coded blocks based on a target failure probability; for the m-th group, the set of coded blocks stored at node j is denoted by \(V_j^m\).

Fig. 3: An illustrative structure of the two-stage Raptor encoding process.

$$\begin{aligned} \begin{bmatrix} v_{1,1} & v_{2,1} & \cdots & v_{k,1} & \cdots & v_{n,1} & \cdots & v_{N,1} \\ v_{1,2} & v_{2,2} & \cdots & v_{k,2} & \cdots & v_{n,2} & \cdots & v_{N,2} \\ \vdots & \vdots & \ddots & \vdots & \ddots & \vdots & \ddots & \vdots \\ v_{1,s} & v_{2,s} & \cdots & v_{k,s} & \cdots & v_{n,s} & \cdots & v_{N,s} \end{bmatrix} \end{aligned}$$
(8)

Decoding process

When node j requires a particular original block \(b_i\) from the m-th group, it broadcasts a request to the network. If node i (or any node) storing that original block or coded block is still online, it can directly transmit \(v_i\) to node j (since the code is systematic, \(v_i = b_i\)). If node i is offline or has failed, node j can recover (repair) \(b_i\) using coded blocks stored by other online nodes. Specifically, node j obtains sufficient coded blocks from other nodes and applies a repair algorithm to reconstruct \(b_i\). If the repair fails (for example, due to insufficient coded blocks or infeasible combinations), node j will then decode the entire m-th group. In this case, node j collects approximately \((1 + \varepsilon )k\) distinct coded blocks (i.e., some subset of the columns in Eq. (8)). Because the non-systematic coded blocks \(\{v_{n_m+1}, \dots , v_N\}\) are generated randomly, they are highly likely to be independent, thus satisfying the decoding requirements. Node j then performs LT decoding first to retrieve the \(n_m\) intermediate symbols \(\{u_1, \dots , u_{n_m}\}\), and subsequently applies pre-code decoding to recover the original blocks \(\{b_1, \dots , b_k\}\). Since each row of symbols is encoded independently, decoding any single row enables reconstruction of all k original blocks in the group. Therefore, node j can then extract the required block \(b_i\). This procedure is detailed in the work of Yang et al.1.

Consensus mechanism

Basic consensus

  1. Random Number Retrieval via the Chainlink Oracle Network: Chainlink is a decentralized oracle network designed to securely and reliably connect blockchain smart contracts with off-chain systems. By establishing an independent decentralized oracle network, Chainlink can aggregate data from multiple sources and deliver verified data to smart contracts, which helps trigger on-chain executions and reduce centralization risks. In this system, Chainlink’s Verifiable Random Function (VRF) service is leveraged to generate and obtain the random seeds required for subsequent block production. However, because Chainlink requires a certain number of block confirmations to generate and finalize the random value, there exists a time difference between the Chainlink random output and our system’s block interval. To avoid network stalls or reduced efficiency when the random value is not yet available (i.e., “random number delay”) at the actual block production time, we adopt an advance-request strategy:

    • At the beginning of the i-th block period, a contract on our blockchain requests the random seed \(\text {Seed}_{i+j}\) from Chainlink for the future \((i+j)\)-th block.

    • When the \((i+j)\)-th block is constructed, \(\text {Seed}_{i+j}\) is written into that block’s on-chain data.

    • The integer j is determined by the difference between “the confirmation time required by Chainlink to generate VRF” and “the block interval of our system,” ensuring that \(\text {Seed}_{i+j}\) has been securely recorded on-chain before block \((i+j)\) begins construction.

    Furthermore, since no prior random seeds have been requested for the Genesis Block, the trusted Certificate Authority (CA) must pre-generate j random seeds for the first few blocks to initiate block selection. Once Chainlink-provided random seeds are recorded on-chain in subsequent blocks, the system switches to the decentralized random-seed service to maintain normal operation.

  2. Generating the VRF Output: After node k obtains the random seed \(\text {Seed}_i\) corresponding to the current block (i.e., the i-th block), it uses the following verifiable random function (VRF) to calculate its verification values:

    $$\begin{aligned} r_i^k,\; p_i^k \;=\; \textrm{VRF}\bigl (\text {Seed}_i,\;\textrm{SK}_k\bigr ), \end{aligned}$$
    (9)

    where \(\textrm{SK}_k\) denotes node k’s private key, and \(\textrm{VRF}(\cdot )\) is a cryptographic function whose output can be verified by all other nodes via

    $$\begin{aligned} \mathrm {True/False} \;=\;\mathrm {VRF\_Verify}\bigl (r_i^k,\;p_i^k,\;\text {Seed}_i,\;\textrm{PK}_k\bigr ). \end{aligned}$$
    (10)

    Hence, only the node holding the corresponding private key can ascertain whether it is eligible to produce the next block and prove its legitimacy to others.

  3. Determining Block Eligibility: This system uses a threshold-based mechanism to determine which node becomes the block producer. Let \(T_i\) be the threshold value for the i-th round of block production. A node k is considered eligible if

    $$\begin{aligned} \bigl |\textrm{PK}_k - r_i^k\bigr | \;\le \; T_i. \end{aligned}$$
    (11)

    Once node k deems itself “elected,” it collects transactions and constructs the new block. It also appends the necessary proof fields (e.g., \(r_i^k, p_i^k\)) into the block body before broadcasting it to the entire network.
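Two of the steps above admit compact sketches: the advance-request offset j of step 1 and the eligibility test of Eq. (11) in step 3. The timing constants and the SHA-256 mapping of keys and VRF outputs into \([0,1]\) are illustrative assumptions, not part of the protocol specification.

```python
# Toy helpers for the consensus steps above.  The timing figures and the
# SHA-256 mapping into [0, 1] are illustrative assumptions.
import hashlib
import math

def seed_request_offset(vrf_confirmation_s: float, block_interval_s: float) -> int:
    """Step 1: smallest j with j * block_interval >= the oracle's VRF
    confirmation delay, so Seed_{i+j} is on-chain before block i+j."""
    return max(1, math.ceil(vrf_confirmation_s / block_interval_s))

def to_unit_interval(data: bytes) -> float:
    h = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return h / float(1 << 256)

def is_eligible(pk: bytes, r: bytes, threshold: float) -> bool:
    """Step 3: the distance test of Eq. (11) on [0, 1]-mapped values."""
    return abs(to_unit_interval(pk) - to_unit_interval(r)) <= threshold

# e.g. a 45 s VRF confirmation delay with 10 s blocks needs j = 5
assert seed_request_offset(45.0, 10.0) == 5
assert is_eligible(b"pk-node-7", b"vrf-output", 1.0)       # T = 1: always
assert not is_eligible(b"pk-node-7", b"vrf-output", 0.0)   # T = 0: (almost) never
```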

Threshold \(T\) setting

To ensure that the average number of block producers per round remains close to 1, this system must determine a threshold \(T\). First, nodes in the network should periodically estimate the number of vehicles, denoted by \(N\), while controlling communication overhead; see 13,14 for potential approaches. Meanwhile, to avoid excessive or insufficient corrections (overshoot/undershoot) caused by network fluctuations or abrupt node arrivals/departures, the system imposes a maximum adjustment step size for each update, thereby maintaining overall stability and continuity.

Once the network obtains the total vehicle count \(N\), the threshold \(T\) is computed as follows. Let \(\textrm{PK}\) and \(R\) be mapped to the interval \([0,1]\) and be independently and uniformly distributed, where \(\textrm{PK}\) represents the public key of a vehicle and \(R\) is a random number. We seek the probability that \(|\textrm{PK} - R|\le T\). This probability can be computed via the integral

$$\begin{aligned} \int _{0}^{T} (R + T)\,dR \;+\; \int _{T}^{\,1 - T} (2T)\,dR \;+\; \int _{1 - T}^{1} \bigl (1 - R + T\bigr )\,dR, \end{aligned}$$
(12)

which simplifies to

$$\begin{aligned} P\bigl (|\textrm{PK} - R|\le T\bigr ) \;=\; 2T \;-\; T^{2}. \end{aligned}$$
(13)

Because there are \(N\) vehicles in total and they act independently, the expected number of selected producers is

$$\begin{aligned} \mathbb {E}\bigl [\text {selected producers}\bigr ] \;=\; N\,\bigl (2T - T^{2}\bigr ). \end{aligned}$$
(14)
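A quick Monte-Carlo check of Eqs. (13) and (14), with illustrative values of T and N:

```python
# Monte-Carlo check of Eqs. (13)-(14): for i.i.d. PK, R ~ Uniform[0, 1],
# P(|PK - R| <= T) = 2T - T^2; with N independent vehicles the expected
# producer count is N times that probability.  T and N are illustrative.
import random

rng = random.Random(42)
T, N, trials = 0.1, 500, 200_000
hits = sum(abs(rng.random() - rng.random()) <= T for _ in range(trials))
empirical = hits / trials
analytical = 2 * T - T ** 2              # 0.19 for T = 0.1
expected_producers = N * analytical      # Eq. (14)
assert abs(empirical - analytical) < 0.01
```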

Setting the expected number of producers equal to \(c\) gives

$$\begin{aligned} N\,\bigl (2T_{c} - T_{c}^{2}\bigr ) \;=\; c. \end{aligned}$$
(15)

Solving the above yields

$$\begin{aligned} T_{c} \;=\; 1 \;\pm \; \sqrt{\,1 \;-\; \frac{c}{N}\,}. \end{aligned}$$
(16)

Given \(0 \le T \le 1\), we take the smaller positive root, namely

$$\begin{aligned} T_{c} \;=\; 1 \;-\; \sqrt{\,1 \;-\; \frac{c}{N}\,}. \end{aligned}$$
(17)
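Eq. (17) in code, together with a check that it reproduces the target expectation (the values of N and c are illustrative):

```python
# Threshold from Eq. (17): T_c = 1 - sqrt(1 - c/N), the root of
# N(2T - T^2) = c lying in [0, 1].  N and c below are illustrative.
import math

def threshold(c: float, n_vehicles: int) -> float:
    return 1.0 - math.sqrt(1.0 - c / n_vehicles)

N, c = 500, 1
T = threshold(c, N)
# The expected producer count N(2T - T^2) recovers c (up to float error).
assert abs(N * (2 * T - T ** 2) - c) < 1e-9
assert 0.0 <= T <= 1.0
```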

Fork handling and priority evaluation

To establish a unique main chain as quickly as possible after a fork, we adopt a tiered, weighted priority evaluation mechanism. The threshold interval is partitioned into N equal segments. If a block producer is selected only at the x-th relaxation (i.e., when the interval width becomes x/N), then the block is assigned level x, where \(x\in \{1,2,\dots ,N\}\).

Let s(x) denote the priority score of a level-x block. We use an exponentially decaying function

$$\begin{aligned} s(x) = a^{\,x-1}, \qquad 0<a<1, \end{aligned}$$
(18)

so that \(s(1)=1\) and higher levels (i.e., later selections) receive lower scores. The parameter a controls the decay rate. In this scheme, we set \(a=0.5\). When a staircase implementation is desired, the function can be realized in a piecewise-constant (segmented) form (see Fig. 4).
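For concreteness, the scoring function of Eq. (18) with the choice \(a = 0.5\):

```python
# Priority score of Eq. (18): a block first eligible at relaxation level x
# scores s(x) = a^(x-1), with a = 0.5 as chosen in this scheme.
def priority_score(level: int, a: float = 0.5) -> float:
    return a ** (level - 1)

assert priority_score(1) == 1.0       # earliest selection, top priority
assert priority_score(2) == 0.5
assert priority_score(4) == 0.125     # later selections decay geometrically
```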

Fig. 4: Segmented exponential (staircase) scoring function.

No-node-selected scenario

Based on this threshold mechanism and random selection, it is still possible that no node is chosen to produce a block in a given round. Hence, the system enforces a maximum waiting time: if no producer emerges within this time frame, \(T\) is temporarily relaxed to ensure timely block generation.

The method for relaxing \(T\) is described as follows. To produce a normal block in the next round without causing severe forks, we intend to increase \(T\) so that the expected number of block producers grows by about 1, relative to the previous expectation. Analysis of the threshold formulas shows that when \(N\) is much larger than \(c\), \(T_{c}\) grows approximately linearly with \(c\). For simplicity, once a node obtains the current threshold \(T_{1}\), it can quickly derive a new value \(T_{c}\) via an approximate linear function through the origin, thus allowing rapid relaxation when no node is selected. This guarantees that at least one node can qualify to produce a block in due time.
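This linear relaxation can be sanity-checked numerically: for \(N \gg c\), the first-order expansion \(T_c \approx c/(2N) \approx c\,T_1\) stays within a small relative error of the exact value. The value of N below is illustrative.

```python
# Relaxing T when no producer appears: for N >> c,
# T_c = 1 - sqrt(1 - c/N) ~ c/(2N) grows almost linearly in c, so a node
# holding T_1 can approximate T_c = c * T_1 without a square root.
import math

def exact_threshold(c: float, n: int) -> float:
    return 1.0 - math.sqrt(1.0 - c / n)

def relaxed_threshold(c: float, t1: float) -> float:
    return c * t1                     # linear approximation through the origin

N = 10_000
t1 = exact_threshold(1, N)
for c in range(2, 6):
    rel_err = abs(relaxed_threshold(c, t1) - exact_threshold(c, N)) / exact_threshold(c, N)
    assert rel_err < 0.01             # within 1% while N >> c
```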

Additional information in blocks

Fig. 5: An illustrative structure of the block header.

Compared with standard blockchain systems, each block in our system contains the following additional fields:

  • Random Seed (Seed): Each block attaches a seed field. Note that this seed is not the random value used by the block itself, as explained in the consensus mechanism section.

  • VRF Output Parameters r and p: The block producer inserts \((r_i^k, p_i^k)\), generated by the VRF, so that other nodes can verify its block production eligibility.

Additional information for the final block in each group

In the last block of every group, the following extra information must also be included, as shown in Fig. 5:

  • Group Sequence Number: The index (sequence number) of the group, facilitating block lookup and positioning for all network nodes.

  • Hash Values: The hashes of the non-systematic intermediate blocks \(\{u_{k+1}, \ldots , u_{n}\}\). When a new node joins the network, it may need both the original blocks and some encoded blocks to generate its own encoded version. Original blocks can be verified via the Merkle root stored in the block header, while encoded blocks can be validated for integrity by comparing their hashes against this field.

Block verification

When a node receives the \((n \cdot k)\)-th block, in addition to the usual validation (e.g., block header, transaction validity, correctness of the random seed, and eligibility), it must further verify:

  • The correctness of the “group sequence number” in the block.

  • Whether the agreed generation matrix was used to compute the hashes of \(\{u_{k+1}, \ldots , u_{n}\}\) stored in the block.

If these checks pass, the block is deemed valid. The node then uses the randomness to select certain blocks and compute the encoded blocks it should store, subsequently releasing the memory occupied by all blocks in the n-th interval to achieve storage optimization.

Risk evaluation and parameter adjustment

Building on Sects. 3 and 5, we formalize the decoding-failure risk and the adaptive parameter b used later in Sect. 7.

Our primary objective is to keep the risk of block loss (i.e. decoding failure) at a negligible level. Whenever the system detects that this risk reaches a predefined threshold, each node in the same group should temporarily increase the number of stored blocks b so as to preserve overall security. To quantify that risk accurately, one must consider (i) the ratio between “degree-1 blocks” and “random-degree blocks’’ and (ii) the correlation among multiple blocks that reside on the same node. Because Monte-Carlo simulation is impractical on resource-constrained vehicular nodes, we adopt an upper-bound approximation: we analyse the worst-case scenario in which, at one instant, \(\Delta _\gamma\) nodes leave the network simultaneously. In reality, nodes usually depart gradually while new ones join and repair missing blocks, so the true failure probability is lower than this bound.

Given an initial node count \(N\) and a degree distribution \(\Omega (d)\), assume the number of vehicular nodes varies over time between two bounding functions \(f^{\text {sup}}\) and \(f_{\text {inf}}\).

Fig. 6. Assumed upper and lower bounds of the vehicle count over 24 h.

The instantaneous population fluctuates between them according to a normal distribution (the vertical axis in Fig. 6 denotes relative magnitude only). Define

$$\begin{aligned} f\!\bigl (b,N;\Omega (d),f^{\text {sup}},f_{\text {inf}},\gamma ,T\bigr ) \end{aligned}$$
(19)

as the probability that a group of coded data cannot be decoded after \(\gamma\) periods of duration \(T\) when each node stores only \(b\) blocks.

Degree-distribution drift. Some nodes may be unavailable due to failure or disconnection, causing the original distribution \(\Omega (d)\) to shift to a corrected version \(\Omega ^{*}(d)\). Specifically, when a new node—initially choosing its degree according to \(\Omega (d)\)—discovers that certain target neighbours are offline, it stores additional repair blocks during encoding/decoding, thereby changing the global degree profile.

Let \(G\) denote the pre-coding matrix and \(\varepsilon\) the redundancy ratio (overhead). Then

$$\begin{aligned} p\!\bigl (G,\Omega ^{*}(d),\varepsilon \bigr ) \end{aligned}$$
(20)

is the probability that a Raptor code constructed with \(\bigl (G,\Omega ^{*}(d),\varepsilon \bigr )\) fails to recover at least one source symbol. This term can be obtained by simulation or by the analytical methods in15,16.

Combining the above factors yields

$$\begin{aligned} f\!\bigl (b,N;\Omega (d),f^{\text {sup}},f_{\text {inf}},\gamma ,T\bigr ) = \int _{0}^{\infty } p\!\bigl (G,\Omega ^{*}(d),\varepsilon \bigr )\, \Pr \!\bigl (bN+\Delta _{\gamma }=(1+\varepsilon )k\bigr )\, d\varepsilon , \end{aligned}$$
(21)

where

$$\begin{aligned} \Delta _{\gamma } = NR = \frac{N\bigl (N_{1}-N\bigr )}{N_{1}},\qquad N_{1} \sim \mathcal {N}\!\Bigl ( \frac{L+U}{2}, \bigl (\tfrac{U-L}{20}\bigr )^{2} \Bigr ), \end{aligned}$$
(22)

and

$$\begin{aligned} L = f_{\text {inf}}\!\bigl (t_{0}+\gamma T\bigr ),\quad U = f^{\text {sup}}\!\bigl (t_{0}+\gamma T\bigr ),\quad L < U. \end{aligned}$$
(23)

By dynamically adjusting \(b\), the overall decoding-failure probability can be kept below a prescribed safety threshold. Detailed simulation settings and performance evaluations are presented in the Simulation and Performance Assessment section.
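The upper-bound estimate of Eqs. (21)–(23) can be sketched as a small Monte Carlo routine. This is a minimal sketch, not the paper's evaluator: `p_fail` is a caller-supplied stand-in for the Raptor failure term \(p(G,\Omega^{*}(d),\varepsilon)\) of Eq. (20), and the geometric failure curve used in the example is an assumed model, not derived from the paper:

```python
import random

def failure_upper_bound(b, n_nodes, k, L, U, p_fail, trials=50_000, seed=7):
    """Monte Carlo estimate of the decoding-failure upper bound in Eq. (21).

    N1 is drawn from the normal law of Eq. (22); the worst-case simultaneous
    departure Delta_gamma = N(N1 - N)/N1 is formed; the surviving symbol
    count b*N + Delta_gamma is converted into a redundancy ratio eps that
    feeds the caller-supplied failure curve p_fail(eps)."""
    rng = random.Random(seed)
    mu, sigma = (L + U) / 2.0, (U - L) / 20.0
    total = 0.0
    for _ in range(trials):
        n1 = max(1.0, rng.gauss(mu, sigma))
        delta = n_nodes * (n1 - n_nodes) / n1   # Eq. (22); negative on departures
        symbols = b * n_nodes + delta           # surviving coded symbols
        eps = symbols / k - 1.0                 # redundancy ratio
        # fewer than k symbols means certain decoding failure
        total += 1.0 if eps <= 0 else p_fail(eps)
    return total / trials

# Illustrative Raptor-style failure curve, decaying geometrically in the
# number of overhead symbols (an assumed model, not Eq. (20) itself).
curve = lambda eps, k=1000: min(1.0, 0.85 * 0.567 ** (eps * k))
risk_b2 = failure_upper_bound(2, 1600, 1000, 800, 1200, curve)
risk_b4 = failure_upper_bound(4, 1600, 1000, 800, 1200, curve)
```

Raising b enlarges the surviving symbol count and hence the redundancy ratio, which is why the bound falls as b grows.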

Empirical validation of the vehicle-count envelope

Fig. 7. Thirty-day aggregate of active vehicles vs. time of day.

Fig. 8. Representative 24 h profile of active vehicles vs. time of day.

To verify that the idealized envelope shown in Fig. 6 accurately captures real-world behavior, we analyzed the CRAWDAD epfl/mobility dataset (\(\approx 500\) San Francisco taxis, 30 days, 1–30 s sampling). Figure 7 depicts the 30-day aggregated active-vehicle profile versus time of day, while Fig. 8 shows a single representative day. Three key observations follow:

  1. Shape similarity: the trace exhibits a clear morning rise, midday plateau, and evening decline that closely match the assumed envelope.

  2. Milder extremes: the measured peak-to-trough ratio is approximately 3:1, substantially lower than the 5:1 span of the worst-case envelope in Fig. 6, implying that real-world variability yields an even lower decoding-failure probability.

  3. Service-fleet bias: taxi duty cycles sustain higher nighttime activity than typical private vehicles, further smoothing the trough and making this dataset a conservative validation.

Therefore, employing this idealized envelope assumption as an upper bound is justified and does not underestimate system risk.

Network maintenance algorithm

Using the risk model in Sect. 6, this section details how each node adjusts its fragment count b without re-encoding historical blocks.

To ensure the security of the blockchain network, it is necessary to guarantee the decodability of all blocks. Inspired by the NMA scheme proposed by Yang et al.1, we adapt their method to a vehicular blockchain context. When a new node joins the network, it randomly selects a set of blocks based on a predefined degree distribution and then encodes these blocks for storage. The procedure can be summarized as follows:

Initial Retrieval The new node requests the required blocks from other nodes in the network. If these blocks are successfully acquired, they are immediately encoded to generate the necessary coded blocks for local storage.

Repair Attempt If certain blocks cannot be obtained during the initial retrieval, the system offers a two-step repair mechanism:

  1. Recursive Search: The new node recursively searches the network for additional coded blocks related to the missing block. If it finds enough related coded blocks to recover the missing one, it stops here, thereby avoiding large communication overhead.

  2. Extensive Collection: Should the first step fail, the new node performs a broader search to collect at least k blocks from the same group. It then applies erasure decoding to recover the required block and stores the newly repaired block locally.

This two-tiered repair mechanism ensures that most missing blocks can be recovered at the first step. Only in rare cases, where the missing block cannot be retrieved from a small number of nodes, does the mechanism escalate to the second step. Consequently, this design achieves high repair success rates with minimal network overhead.
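The two-tiered logic can be sketched with an in-memory toy of the P2P layer. Everything here is illustrative: `peers` is a list of dicts standing in for reachable nodes, and `decode_from` is a hypothetical hook for the real erasure decoder:

```python
def decode_from(collected, block_id):
    """Toy stand-in for erasure decoding: with >= k blocks of a group, any
    missing block is recoverable (a real node would run the Raptor decoder)."""
    return b"decoded:" + block_id.encode()

def repair_missing_block(block_id, peers, group, k):
    """Two-tiered repair sketch over a toy peer list.

    Tier 1 (recursive search): ask peers directly for the missing block or a
    coded block covering it; cheap, and sufficient in most cases.
    Tier 2 (extensive collection): gather at least k blocks of the same
    group, then erasure-decode the missing one."""
    # Tier 1: peer-by-peer lookup of the block itself.
    for peer in peers:
        if block_id in peer:
            return peer[block_id]
    # Tier 2: collect >= k distinct blocks from the group, then decode.
    collected = {}
    for peer in peers:
        for bid, payload in peer.items():
            if bid.startswith(group + "/"):
                collected[bid] = payload
        if len(collected) >= k:
            return decode_from(collected, block_id)
    raise RuntimeError("fewer than k group blocks reachable; repair failed")
```

Tier 1 touches only a few peers, so bandwidth stays low; tier 2 is the expensive fallback invoked only when the cheap path fails.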

In the proposed vehicular blockchain system, where node membership varies significantly, we must also keep the failure probability below a very low threshold (e.g., \(10^{-12}\)) while keeping the overhead controllable. Hence, the number of coded blocks stored by each node, denoted by b, must be dynamically adjusted. Whenever the measured failure probability exceeds the target level, each node independently evaluates Eq. (21) and determines whether b must be incremented. However, if all nodes simultaneously request new coded blocks, severe network congestion may occur.

To mitigate this risk, we introduce a delayed strategy that spreads out the requests and completes the increment of b over \(\gamma\) time periods, thus maintaining both network security and stability. Specifically,

  • Starting from the time when a heightened risk is detected, the system distributes the probability of “being chosen to acquire new blocks” uniformly among the nodes over the next \(\gamma\) time periods to avoid instantaneous overload.

  • If node i draws a random delay \(l\) (with \(l \le \gamma\)), it determines in advance which coded blocks it should additionally acquire (based on its chosen degree) and then listens to network traffic to see whether other nodes request the same blocks.

  • If node i detects that node j is already requesting a block of interest, i records j’s address. When it reaches time period l, node i can directly request the block from j without broadcasting a new request.

By gradually increasing b in batches, the network avoids congestion caused by synchronized repair requests. These requests are issued by nodes that are already online, so no assumption is made about simultaneous node reconnection. The network then completes the storage adjustments within \(\gamma\) time periods, preserving overall system safety and robustness. Further details regarding experimental settings and performance evaluations are provided in the next section.
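The spreading of requests over \(\gamma\) periods can be sketched as follows. `schedule_b_increment` is a hypothetical helper; in the real protocol each node draws its delay locally rather than through a shared scheduler, but the load-balancing effect is the same:

```python
import random

def schedule_b_increment(node_ids, gamma, seed=None):
    """Spread fragment-acquisition requests over gamma time periods.

    Each node draws a uniform delay l in {1..gamma}; nodes with the same
    delay issue their requests in the same period, so the expected load per
    period is |nodes|/gamma instead of |nodes| all at once."""
    rng = random.Random(seed)
    slots = {l: [] for l in range(1, gamma + 1)}
    for nid in node_ids:
        slots[rng.randint(1, gamma)].append(nid)
    return slots
```

With 100 nodes and \(\gamma = 10\), each period carries about 10 requests on average instead of a burst of 100.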

Assumption contingencies

CA availability. As discussed in Sect. 4, the Certificate Authority (CA) is contacted only when a new vehicle joins or a day-scale revocation list is published. Should the CA be taken offline by an attack or maintenance outage, all already-enrolled vehicles continue to participate with their still-valid certificates; the impact is restricted to (1) temporarily blocking new enrolments and (2) delaying the distribution of the next CRL.

Fallback randomness without Chainlink VRF. If the on-chain smart contract detects that no fresh Chainlink VRF value has been finalised within two block intervals, the proposer derives the block seed as

$$\begin{aligned} R_t = \operatorname {VRF}\!\bigl (\texttt {serial}\_\text {proposer},\,\texttt {sk}\_\text {proposer}\bigr ) \;\oplus \; H\!\bigl (\texttt {hash}_{t-1}\,\Vert \,\dots \,\Vert \,\texttt {hash}_{t-k}\bigr ), \end{aligned}$$
(24)

where \(k\!\ge \!64\) by default. Because at least one of the previous k blocks must have been produced by an honest node and the VRF output is unique to the proposer’s certified key pair, an adversary would need to both monopolise the last k block slots and pre-compute colliding VRF keys at scale to bias \(R_t\), a cost regarded as prohibitively high in vehicular settings. Hence liveness is preserved with negligible bias while the external oracle is unreachable.
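The fallback seed of Eq. (24) can be sketched as below. HMAC-SHA256 stands in for the proposer's VRF over its serial (a real deployment would use a verifiable VRF such as ECVRF, whose proof other nodes can check); `fallback_seed` and its arguments are illustrative names:

```python
import hashlib
import hmac

def fallback_seed(proposer_serial: bytes, proposer_sk: bytes,
                  recent_hashes, k: int = 64) -> bytes:
    """Sketch of the fallback randomness in Eq. (24).

    recent_hashes holds previous block hashes; the last k of them are folded
    together and XORed with a keyed digest standing in for the VRF output."""
    if len(recent_hashes) < k:
        raise ValueError("need at least k recent block hashes")
    vrf_out = hmac.new(proposer_sk, proposer_serial, hashlib.sha256).digest()
    chain_entropy = hashlib.sha256(b"".join(recent_hashes[-k:])).digest()
    # XOR the two 32-byte strings, mirroring the XOR in Eq. (24)
    return bytes(a ^ b for a, b in zip(vrf_out, chain_entropy))
```

Changing any one of the last k block hashes changes `chain_entropy` and hence the seed, which is what forces an adversary to control all k recent slots to bias the outcome.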

Peer retrieval & repair. The above maintenance algorithm guarantees eventual fragment delivery; it does not, however, bound the worst-case latency under severe churn. Formalising such latency guarantees in highly dynamic IoV topologies remains an open problem and is left as future work.

Security analysis

We analyze security by combining the VRF-based proposer selection (Sect. 5) with the redundancy and repair mechanisms in Sect. 7.

Threat model and trust assumptions

Security goals.

  (i) Ledger integrity: valid transactions are irreversibly committed to the chain;

  (ii) Data availability: honest nodes can reconstruct block data within acceptable latency.

Participants. On-board units (OBUs) both produce blocks and store erasure-coded shards; the root CA only issues certificates and rejects illegitimate join requests.

Adversary capabilities.

  • Byzantine nodes: arbitrary deviations from the protocol (forged blocks, double spending, collusion).

  • Message dropping/network partitions: selective suppression of blocks or shards.

  • Sybil identities: the attacker may register multiple identities, but each must hold a valid certificate.

  • DoS: short-term throttling of bandwidth/compute; long-term compromise of key material is out of scope.

Trust assumptions. The root CA is not compromised; the VRF (e.g., Chainlink VRF) is unpredictable and unbiased; network delay remains within design bounds.

IoV context. OBUs are highly mobile with frequent joins/leaves; message drops and short-lived partitions are common. Our mechanism combines unpredictable VRF sampling, a layered “expanding ring” selection, and erasure coding to provide a quantifiable upper bound on reorg risk and robust data availability under high churn.

Attack path and decision rule

To rewrite recent history, the attacker must consecutively obtain block-production rights and extend a private chain; at the end of the window this private chain is released. If its accumulated weight exceeds that of the public chain over the same window, the reorg succeeds; otherwise a single missed round breaks the private chain and the attack fails. We adopt a 12-round finality rule: history older than 12 blocks is considered final, and we evaluate whether the next 12 rounds can be rewritten.

Notation and basic quantities

Let n be the total number of nodes, \(n_A\) the number of attacker nodes, and

$$\begin{aligned} p \triangleq \frac{n_A}{n} \end{aligned}$$
(25)

the attacker fraction. Let K denote the number of nodes that land in the current stopping layer of a round (K follows a zero-truncated Poisson). Conditioned on \(K=k\), the probability that at least one attacker appears in that layer is

$$\begin{aligned} \Pr (\text {attacker appears}\mid K=k)=1-(1-p)^k . \end{aligned}$$
(26)

Marginalizing over K yields the per-round attacker-appearance probability

$$\begin{aligned} q \triangleq \Pr (\text {attacker appears in a round}) = \sum _{k=1}^{\infty }\Bigl [1-(1-p)^k\Bigr ]\Pr (K=k). \end{aligned}$$
(27)

If the mechanism stops at level \(X\in \{1,\dots ,N\}\) (interval length X/N), the round contributes a score

$$\begin{aligned} R \;=\; a^{\,X-1},\qquad a\in (0,1) \end{aligned}$$
(28)

(independent of K). In our scheme we take \(a=0.5\). Define the auxiliary moments

$$\begin{aligned} \mu \triangleq \mathbb {E}[R], \qquad \sigma ^2 \triangleq \textrm{Var}(R). \end{aligned}$$
(29)

Consecutive-win length (limit process)

In the limiting regime, rounds are independent and the attacker wins a round with probability q. Let L be the number of consecutive attacker wins before the first miss. Then

$$\begin{aligned} \Pr (L=\ell )=(1-q)\,q^{\ell },\qquad \ell =0,1,2,\dots \end{aligned}$$
(30)

Attacker score (limit mixture)

Given \(L=\ell\), the attacker’s total score over that run is

$$\begin{aligned} X_A=\sum _{i=1}^{\ell } R_i , \end{aligned}$$
(31)

where \(R_i\) are i.i.d. copies of R. Let g be the PMF of R and let \(f^{(*\ell )}\) denote the \(\ell\)-fold discrete convolution of g with itself (with \(f^{(*0)}\) the unit mass at 0). Unconditionally,

$$\begin{aligned} \Pr (X_A\in \cdot )=\sum _{\ell =0}^{\infty }(1-q)\,q^{\ell }\,f^{(*\ell )}(\cdot ). \end{aligned}$$
(32)

Baseline network score and risk definition

Let \(Y_{12}\) be the baseline (faction-agnostic) cumulative score of the public chain over 12 rounds. Then

$$\begin{aligned} \mathbb {E}[Y_{12}] = 12\,\mu , \qquad \textrm{Var}(Y_{12}) = 12\,\sigma ^2, \end{aligned}$$
(33)

and its PMF is obtained by 12-fold convolution of g. We define the attack success probability over a 12-round window as

$$\begin{aligned} \rho _{12}(p)\;\triangleq \;\Pr \bigl (X_A > Y_{12}\bigr ). \end{aligned}$$
(34)

Quantitative results

We evaluate \(\rho _{12}(p)\) via a semi-analytic + Monte Carlo approach (convolutions for g and partial \(f^{(*\ell )}\) tables, then mixture; \(\ge 5\times 10^5\) samples to ensure small confidence intervals). Representative results are:

$$\begin{aligned} \begin{aligned} \rho _{12}(0.10)&\approx 7\times 10^{-7}\;(0.00007\%),\\ \rho _{12}(0.20)&\approx 3.7\times 10^{-5}\;(0.0037\%),\\ \rho _{12}(0.30)&\approx 2.7\times 10^{-4}\;(0.027\%). \end{aligned} \end{aligned}$$
(35)

Conclusion. Even with \(30\%\) of the nodes under the attacker, the probability of a successful 12-round reorg is only about \(0.027\%\), indicating a comfortable security margin.

Mitigations and IoV impact

  • Unpredictable proposers (mitigates targeted DoS): VRF sampling prevents the attacker from pre-targeting next-round proposers, which is especially valuable in IoV.

  • Quantified reorg cost (mitigates hidden-chain attacks): geometric level weights and a 12-round finality window force long runs of wins; missing any round typically dooms the attack.

  • Sybil resistance: CA issuance limits identity quality; even with multiple certificates the attacker is bounded by the table above via p.

  • Message-drop tolerance / availability: blocks are erasure-coded across many OBUs, allowing reconstruction when a threshold of shards is reachable despite unstable links and local partitions.

  • DoS resilience: short-lived DoS only reduces participation in a few rounds; with a window (e.g., 12) the residual risk is further diluted.

Experimental simulation

Simulations were executed in Python 3.9 on a consumer-grade Intel i7 laptop. The channel model adopts an erasure code with the Maximum-Distance-Separable (MDS) property, e.g. Reed–Solomon. A Monte Carlo procedure is used to estimate the decoding success probability: whenever at least k symbols are collected, successful decoding is declared.

In designing the degree distribution we start from the well-established Robust Soliton Distribution (RSD), which is widely used for LT codes. However, owing to the specifics of our scheme, degree-one nodes are not generated during the initial encoding phase; they appear only after a “block-retrieval failure.” To suppress an excessive number of low-degree nodes—thereby mitigating topological isolation and improving both encoding and repair efficiency—we first compute the conventional RSD mass function \(\mu (d)\) for \(d=1,\dots ,k\), and then redistribute the mass of \(\mu (1)\) uniformly over \(\{2,\dots ,k\}\):

$$\begin{aligned} \Omega (1)=0,\qquad \Omega (d)=\mu (d)+\frac{\mu (1)}{k-1},\quad d=2,\dots ,k. \end{aligned}$$
(36)

Consequently, no degree-one nodes are present at the outset, preserving high network connectivity; only when a node fails to obtain a specific block does it “degenerate” into a degree-one node and trigger a dedicated repair process, thus reducing isolation probability and enhancing overall fault tolerance.
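The redistribution of Eq. (36) can be sketched directly on top of the textbook RSD; `c` and `delta` are the usual RSD tuning parameters (the values below are illustrative defaults, not the paper's):

```python
import math

def robust_soliton(k: int, c: float = 0.1, delta: float = 0.5):
    """Textbook Robust Soliton Distribution mu(d) for d = 1..k
    (index 0 unused)."""
    s = c * math.log(k / delta) * math.sqrt(k)
    rho = [0.0, 1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]
    tau = [0.0] * (k + 1)
    pivot = max(1, min(k, round(k / s)))
    for d in range(1, pivot):
        tau[d] = s / (k * d)
    tau[pivot] = s * math.log(s / delta) / k
    z = sum(rho) + sum(tau)                     # normalization constant
    return [(rho[d] + tau[d]) / z for d in range(k + 1)]

def shifted_distribution(mu):
    """Eq. (36): Omega(1) = 0 and Omega(d) = mu(d) + mu(1)/(k-1) for
    d = 2..k, i.e. the degree-1 mass is spread uniformly over 2..k."""
    k = len(mu) - 1
    share = mu[1] / (k - 1)
    return [0.0, 0.0] + [mu[d] + share for d in range(2, k + 1)]
```

Since exactly the degree-1 mass is redistributed, \(\Omega(d)\) remains a valid probability distribution with \(\Omega(1)=0\).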

Fig. 9. Distribution of vehicle changes.

Fig. 10. Failure probability comparison.

Fig. 11. Vehicle variation in a simulation with 3200 initial vehicle nodes.

Next, we compare the theoretical upper bound on the failure probability with simulation results. With parameters \(N=3200\), \(\gamma =100\), and \(T=0.01\) h (\(\approx 36\) s), and a start time of 10:00, the theoretical vehicle-variation profile \(\Delta _\gamma\) is shown in Fig. 9, the simulated vehicle count in Fig. 11, and the corresponding failure probabilities in Fig. 10. The theoretical bound consistently exceeds the empirical values; in this experiment, when \(k/N>0.45\) additional redundancy b must be introduced to guarantee reliability.

Fig. 12. Effect of different scales N on the failure probability.

We then investigate how N affects the failure probability (Fig. 12). For \(N<1600\) the failure probability is highly sensitive to N; when \(N\ge 1600\), further enlarging the network yields diminishing returns.

Fig. 13. Comparison of failure probability at different times.

To capture the impact of different times of day on the vehicle blockchain, we fix \(N=1600\), \(\gamma =100\), and \(T=0.01\), with results summarized in Fig. 13. Between 12:00–13:00 the vehicle count is stable, and the failure probability rises sharply only when \(k/N>0.75\); between 20:00–21:00 the vehicle count drops rapidly, shifting the threshold forward to \(k/N>0.45\).

Fig. 14. 24 h trend chart for N and b.

We next evaluate storage compression for a single block group with \(k=1000\) under a target failure probability below \(10^{-12}\). Assuming 1 MB per block, \(\gamma =100\), \(T=0.01\), and a start time of 07:00, Fig. 14 reports an average storage of 4.0154 MB per node, i.e., an average compression ratio of 99.6%. Statistics show that 84.13% of vehicles (predominantly during peak hours) store only 1–2 MB, corresponding to compression ratios of 99.8–99.9%; even in off-peak periods the ratio remains above 99%.

Fig. 15. One-week simulation.

Finally, we examine one week of continuous block production. Each node initially stores ten groups of encoded data, still with 1 MB per block, \(k=1000\), \(\gamma =100\), and \(T=0.01\). According to Fig. 15, a total data volume of 26 800 MB results in an average per-node storage of 567.89 MB (average compression 97.88 %), while in the worst case a node stores 1 199 MB, corresponding to a 95.53 % compression ratio.

Hardware budget and timing

Table 1 Per-block-group resource footprint (\(k = 1000\)).

The coding layer dominates both CPU time and memory. As shown in Table 1, on our Intel i7-13700H laptop (AVX2, Ubuntu 22.04, release build) encoding and decoding a \(1\,\textrm{MiB}\) block group complete in well under a millisecond, while the peak resident-set size (RSS) stays below \(2.3\,\textrm{MB}\). To gauge what this implies for an entry-level automotive System-on-Chip, note that RaptorQ’s asymptotic cost is \(O\!\bigl (k\log (1/\varepsilon )\bigr )\) and scales linearly with clock frequency and vector width. Public benchmarks of the same code base on Cortex-A53 @ 1.4 GHz report a slowdown of roughly one order of magnitude compared with AVX2, which would still keep the per-decode latency comfortably below the 100 ms time budget of typical in-vehicle control loops. Because the memory formula \(k^2 + kS\) is architecture-agnostic, the RAM requirement remains in the \(2{-}3\,\textrm{MB}\) range on either platform.
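The architecture-agnostic memory estimate from the text can be checked in one line; `coding_memory_bytes` is a hypothetical helper name, with k the group size and S the symbol size in bytes:

```python
def coding_memory_bytes(k: int, symbol_size: int) -> int:
    """Memory estimate k^2 + k*S from the text: intermediate-matrix entries
    plus k symbol buffers, in bytes."""
    return k * k + k * symbol_size
```

With \(k = 1000\) and \(S = 1024\) B this gives 2 024 000 bytes, about 1.9 MiB, consistent with the measured sub-2.3 MB RSS.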

Comparative analysis

We compare our approach against SPV, pruning, sharding, and edge-assisted schemes using the reliability and storage metrics introduced in Sects. 6, 7, 8, and 9.

Comparison assumptions and unified metrics

To make our head-to-head comparison reproducible, we adopt a unified metric and a common set of assumptions across all baselines (SPV, pruning, dynamic sharding, edge/DT offload) and our adaptive rateless-coded design (ARC).

Metric and unified formulas. Assume each coding group contains k blocks. Every block consists of payload (header-stripped) of size P and a header of size H. Using a full node as the baseline, the per-group storage is:

$$S_{\text {full}} = k (P + H).$$

The per-node storage under different schemes (ignoring constant-level implementation overheads) is:

$$\begin{aligned} \begin{aligned} \text {SPV:}\quad&S_{\text {SPV}} = kH,\\ \text {Pruning (keep last} w \text {full blocks):}\quad&S_{\text {PR}} = kH + wP,\\ \text {Sharding (} S \text {shards):}\quad&S_{\text {SH}} \approx \frac{k}{S}(P + H)^{\dagger },\\ \text {Edge/DT offload (keep a local hot fraction } \rho \text { of payloads):}\quad&S_{\text {EDGE}} = kH + \rho k P,\\ {\textbf {ARC (ours, keep }} b {\textbf {coded columns per node + all headers):}}\quad&S_{\text {ARC}} = kH + bP. \end{aligned} \end{aligned}$$

\(^{\dagger }\) For comparability, we omit cross-shard coordination/availability overhead \(\phi\).
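The unified formulas above can be evaluated directly. `per_node_storage_mb` is a hypothetical helper whose defaults follow the stated parameterization (\(k=1000\), \(P=1\) MB, \(H=256\) B, \(w=100\), \(S=10\), \(\rho=0.2\)); b = 4 is an assumed ARC fragment count chosen to sit near the measured 4.0 MB average:

```python
def per_node_storage_mb(k=1000, P=1.0, H=256 / 2**20,
                        w=100, shards=10, rho=0.2, b=4):
    """Per-group, per-node storage in MB under each scheme's unified
    formula (constant-level implementation overheads ignored)."""
    return {
        "full":     k * (P + H),
        "SPV":      k * H,
        "pruning":  k * H + w * P,
        "sharding": (k / shards) * (P + H),   # cross-shard overhead omitted
        "edge":     k * H + rho * k * P,
        "ARC":      k * H + b * P,
    }
```

With these defaults, ARC stores about 4.24 MB per group versus 1000.24 MB for a full node, matching the order of magnitude reported in the experiments.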

Key assumptions and parameterization (with rationale). Unless otherwise noted, the comparison and plots use the following parameters:

  • Block payload \(P = 1\) MB. Consistent with our experiments; represents a conservative upper bound for IoV second-level aggregation.

  • Block header \(H = 256\) B. Includes standard fields (\(\sim\)80 B) plus seed/VRF outputs and group/hash amortization.

  • Coding group size \(k = 1000\); symbol size \(S = 1024\) B; field GF(\(2^8\)). Matches hardware constraints; balances practicality and performance.

  • Baselines:

    • SPV: \(S_{\text {SPV}} = kH\).

    • Pruning: \(w = 100\) (about one hour given \(\sim\)36 s block time).

    • Sharding: \(S \in \{10, 20\}\), omitting \(\phi\) for comparability.

    • Edge/DT offload: \(\rho = 0.2\).

Table 2 Comparison of storage–reduction techniques for IoV blockchains under unified assumptions (\(k{=}1000\), \(P{=}1\) MB, \(H{=}256\) B).

Comparative discussion

Table 2 summarizes the key characteristics and trade-offs of various storage-reduction techniques for IoV blockchains.

Trust-radius minimisation. SPV and Pruning clients cannot replay historical transactions autonomously; they permanently depend on at least one reachable honest full (or archive) node. Once an adversary monopolizes all connections, a two-phase Eclipse attack can isolate the victim and deliver a counterfeit longest chain, enabling double spending. By contrast, our rateless-coded blockchain merely requires that enough honest fragment nodes remain online to keep the decoding-failure probability \(f(b,N;\Omega (d),f^{\text {sup}},f_{\text {inf}},\gamma ,T)\) below its threshold. Because nodes are role-symmetric and no committee or super node exists, an attacker must simultaneously corrupt that threshold number of randomly chosen peers, so the attack cost grows quadratically with both network size and per-node fragment count.

Comparison with sharding.

  1. No cross-shard coordination. Sharded ledgers rely on two-phase commit or shared sequencers for atomic cross-shard transfers; our chain remains logically single, eliminating deadlock and double-spend paths.

  2. Self-healing under churn. When vehicles disconnect en masse, each survivor autonomously raises its fragment count b, preserving availability without reshuffling validators or resizing shards.

  3. Global security threshold. The \(<\!1/3\) Byzantine bound in sharding is local: compromising one shard can endanger the whole ledger. Our design forces adversaries to launch a broad-spectrum attack across the entire network.

Edge-assisted & twin-offloaded schemes. DT-Offload12 mirrors the on-board ledger to MEC servers, leaving a 15–30% hot-block cache on each vehicle. While latency and security are analyzed, the paper does not quantify storage gains—merely noting that “a substantial portion of historical data can be off-loaded.” The effective compression thus depends on edge availability. Hybrid-RV20 archives cold data in RSUs, with vehicles caching the latest k blocks (or a hot window). Storage reduction is likewise not quantified in the original papers; for comparability we parameterize \(w{=}100\). When RSU coverage is sparse or offline, vehicles must fall back to peer gossip and replay, incurring bandwidth close to a full node—an overhead our self-repair mechanism avoids.

Take-away. Although the compression ratio (97.9%) of our rateless-coded blockchain is not the highest among all schemes, it exhibits strong adaptability and robustness in highly dynamic vehicular environments. The system maintains availability and consistency under frequent node churn, while preserving decentralization and security without relying on centralized infrastructure or cross-domain coordination. This makes the design particularly suitable for sparse and high-mobility intelligent transportation networks.

Conclusion

This study proposes a rateless-coded blockchain architecture tailored for vehicular networks that substantially reduces per-node storage while dynamically tuning coding redundancy to sustain security under diurnal churn. In tandem, a lightweight VRF-based consensus layer is designed to be coding-aware, enabling block confirmation with minimal communication and computation overhead, and preserving on-chain integrity and availability in highly mobile IoV settings.

Through head-to-head comparisons with SPV, pruning, sharding, and edge-assisted offloading, our scheme minimizes the trust radius and removes infrastructure dependence while retaining full verifiability in a purely peer-to-peer network. Extensive simulations and measurements show that the system achieves an average compression ratio of \(97.88\%\) (worst case \(95.53\%\)) and maintains decoding failure probabilities below \(10^{-12}\) via adaptive fragment-count control, all with sub-millisecond encode/decode times and only MB-scale memory footprints on commodity hardware—indicating practicality on automotive SoCs.

From a security standpoint, under a 12-block finality rule, the probability that an adversary controlling \(30\%\) of nodes successfully rewrites history is approximately \(2.7\times 10^{-4}\) (\(0.027\%\)), providing a comfortable margin for dynamic IoV scenarios. Operationally, not every vehicle needs to contact the Chainlink VRF: only the elected proposer queries the oracle, which keeps system overhead low. To preserve liveness during oracle outages, we adopt an on-chain fallback seed derived from the proposer’s VRF and recent block hashes, which introduces negligible bias yet prevents stalls.

Limitations remain: prolonged oracle unavailability can defer block production, and providing formal worst-case latency bounds for fragment retrieval under severe churn is left for future work. Overall, by coupling rateless coding, adaptive maintenance, and auditable proposer selection, the architecture delivers a secure, storage-efficient, and deployable vehicular blockchain at scale.