Introduction

Over the past decades, the vehicle routing problem (VRP) and its variants have been studied extensively because they model practical applications in many fields, including transportation planning, supply chain management in logistics networks, and production management. The goal of the VRP is to design a set of optimal delivery routes for a fleet of vehicles serving customers in logistics distribution; it captures the essence of vehicle allocation and route planning at minimum cost. The VRP is therefore a key problem in logistics distribution and one of the most widely studied problems in combinatorial optimization. Since Dantzig and Ramser1 introduced the truck dispatching problem, researchers have studied the relationship between vehicle routing and delivery planning. That problem is regarded as the prototypical VRP: goods are distributed from a central depot to geographically dispersed customers. Because of factors related to transportation enterprises, customers, and the external environment, vehicle routing in logistics distribution now faces severe challenges.

This paper studies an important variant of the VRP, the reality-oriented multi-constraint VRP, known as the rich VRP (RVRP). The RVRP extends the classical VRP with multiple real-world constraints and objectives, making it more practical but also more complex. In this study, the RVRP is defined by four dimensions of richness. Complex Road Network Constraints: vehicles must travel on urban road networks with traffic restrictions (e.g., one-way streets and no-entry zones) along dynamically optimized shortest paths. Vehicle Capacity Constraints: vehicles have fixed capacities, and demand splitting is allowed to maximize load utilization. Time Window Constraints: customers specify delivery time intervals, and early or late arrivals are penalized. Demand Splitting Constraints: the demand of a single customer node can be delivered by multiple vehicles over multiple visits. Unlike prior RVRP formulations, this model integrates road network complexity with demand splitting and multi-objective trade-offs, reflecting realistic urban logistics challenges.

The VRP and its basic variants have been widely discussed in the literature, and RVRPs motivated by practical problems have become a new research trend in recent years. Pellegrini et al.2 studied an RVRP with four constraints: multiple time windows, a heterogeneous fleet, maximum route duration, and multiple visits. Pisinger et al.3 studied an RVRP with time window constraints, capacity constraints, multiple depots, site-dependent constraints, open routes, and simultaneous pickup and delivery. Goel et al.4 studied an RVRP with time windows, vehicle heterogeneity, multi-dimensional capacity constraints, order/vehicle compatibility constraints, simultaneous pickup and delivery, multiple depots, and other constraints. Subramanian et al.5 studied RVRPs with capacity constraints, asymmetric costs, open routes, simultaneous pickup and delivery, mixed pickup and delivery, multiple depots, and multiple depots with mixed pickup and delivery. Two review studies on the RVRP followed: Arias et al.6 conducted a comprehensive survey of the RVRP, and Lahyani et al.7 classified and defined the RVRP and summarized its components, constraint definitions, and solution methods. Later, Qi et al.8 studied an RVRP with multiple stations, multiple time windows, multiple trips, and multiple vehicle types, and Rabbouch et al.9 studied a multi-depot RVRP with a limited heterogeneous fleet and time windows, covering vehicle quantity, vehicle capacity, time window, and heterogeneous fleet constraints.

The RVRP arises from the practical needs of transportation. It is NP-hard and is also a multi-objective optimization problem whose objectives carry different importance in different fields. For food distribution and the medical industry, for example, delay time is critical. The freight transport industry may treat total travel distance as the key objective, because fuel consumption is proportional to driving distance; from an economic point of view, minimizing the total distance traveled by all vehicles is therefore important. For small enterprises, minimizing the number of vehicles may take priority over other goals. When planning vehicle routes, decision-makers want several routing plans that meet the target requirements simultaneously so that the decision remains stable; that is, at least two equivalent global Pareto-optimal solutions correspond to the same point on the Pareto front (PF)10,11. Hence, when the RVRP is solved as a multi-objective problem, the feasible optimal routes form a set rather than a single plan. To find more equivalent optimal routes corresponding to the same objective values12, the RVRP can be regarded as a multi-modal multi-objective optimization problem (MMOP).

In recent years, researchers have proposed many multi-modal multi-objective evolutionary algorithms (MMEAs) to solve MMOPs. An MMOP has multiple Pareto-optimal solution sets that are usually crowded together, or even coincide, when mapped to the Pareto front in objective space. MMEAs therefore usually need to consider both the decision space and the objective space. On this basis, many well-performing MMEAs have been proposed13,14,15; they can obtain multiple equivalent global optimal solutions simultaneously, giving decision-makers more choices. Li et al.13 proposed MMEAWI, an MMEA based on a weighted indicator, which fuses decision-space diversity information into an objective-space performance indicator to maintain decision-space diversity, and introduces a convergence archive to approach the Pareto front more effectively. Ming and Gong16 proposed CMMO, a coevolutionary algorithm that uses coevolution, objective relaxation, and specially designed environmental and mating selection to balance convergence and diversity in both the objective and decision spaces, and thereby solves MMOPs more effectively. Li et al.17 proposed HREA, which uses a hierarchical ranking method to rank individuals at different levels, promoting the selection and evolution of diverse solutions, and a local convergence quality evaluation method to better maintain decision-space diversity.

An MMOP aims to locate multiple equivalent Pareto-optimal solutions that share the same objective values, which is critical for decision-makers seeking diverse yet equally optimal routes in logistics planning. Despite notable advances in the theory and algorithm design for MMOPs, applying them to real-world logistics scenarios such as the RVRP remains challenging. A multi-objective optimization problem has no single, definitive global optimum, and a large set of non-dominated solutions cannot be implemented directly in practice. The search should therefore focus on identifying sets of equivalent globally optimal solutions.

Although several multi-objective optimization algorithms have been proposed for the RVRP and related variants18,19,20, research results remain relatively scarce because of the complexity of modeling, the difficulty of solving, and the multi-modality of the problem. Pellegrini et al.2 used an ant colony optimization algorithm to solve an RVRP with four constraints. Pisinger et al.3 proposed a hybrid heuristic algorithm. Goel et al.4 proposed an iterative method that changes the neighborhood structure during the search. Subramanian et al.5 proposed a hybrid algorithm for an RVRP with seven different constraints that solves a series of set partitioning (SP) models with a mixed integer programming (MIP) solver. Srivastava et al.21 proposed a non-dominated sorting genetic algorithm (NSGA-II) with a target-specific mutation operator. Konstantakopoulos et al.22 proposed a multi-objective evolutionary algorithm (MOEA) with an improved construction algorithm and crossover operator. Sethanan et al.23 proposed a hybrid differential evolution algorithm with a genetic operator governed by a fuzzy logic controller. Peng et al.24 proposed a hybrid evolutionary algorithm combined with variable neighborhood search. These studies show that evolutionary algorithms have certain advantages in solving the RVRP and related problems.

In recent years, hybrid metaheuristics have shown significant advantages on complex optimization problems, improving search efficiency and solution quality by combining multiple techniques. In vehicle routing, Sathyamurthy et al.25 combined the perturbation mechanism of simulated annealing (SA) with the crossover and mutation of a genetic algorithm (GA), and embedded a mixed integer linear programming (MILP) model, to solve a multi-depot routing problem for rechargeable vehicles; the local search ability of SA and the global exploration of GA were balanced dynamically. Similarly, Ferreira et al.26 proposed a variable neighborhood search algorithm for green vehicle routing with two-dimensional loading constraints, integrating lower-bound procedures, open-space heuristics, and constraint programming models to address the coupling between loading feasibility checks and route optimization. In computational intelligence, Wu et al.27 used a RankNet surrogate model to predict the ranking relationships between individuals and combined it with a local estimation of distribution algorithm (EDA), building a hybrid optimization framework that significantly improved search efficiency on high-dimensional coverage problems. In healthcare, Narasimhan et al.28 used genetic algorithms to optimize random forest feature subsets, jointly solving dynamic demand allocation and disease prediction and co-optimizing feature selection and model performance. These studies illustrate the core advantage of hybrid metaheuristics: complementary techniques, such as balancing global and local search, embedding machine learning, and constraint modeling, overcome the limitations of a single algorithm and provide systematic solutions for multi-constraint, high-dimensional, and dynamic problems.

Previous studies on the RVRP have produced useful results, but several problems remain. (1) Current RVRP research does not consider the impact of complex urban road networks on vehicle routing; it ignores the fact that logistics distribution takes place on such networks. (2) RVRP objectives are relatively narrow. Existing work mainly optimizes a single objective, such as minimum total cost, minimum travel time, or minimum average customer waiting time, whereas actual logistics distribution requires multi-objective optimization. (3) Resource utilization is low. For most logistics enterprises or distribution centers, the number of vehicles and delivery personnel available for each transportation task is limited. Reducing transportation cost, making the most of limited resources, and improving vehicle load rates are therefore central to route optimization research.

Urban logistics distribution faces escalating demands for cost efficiency, environmental sustainability, and customer satisfaction, yet traditional VRP models often oversimplify constraints such as road networks and rely on single-objective optimization, failing to capture real-world complexity. To bridge this gap, this study proposes an RVRP framework that integrates four critical constraints: complex road networks, vehicle capacity, time windows, and demand splitting. Based on an analysis of practical logistics challenges, an integer programming model is constructed that reflects multi-constraint routing requirements and actual road network conditions. Because the RVRP is multi-modal, with a Pareto front reflecting trade-offs among fuel costs, delivery times, and resource utilization, it requires algorithms that can escape local optima while preserving solution diversity. To address this combinatorial complexity, the hybrid OL-DEVND algorithm combines Differential Evolution (DE) with Variable Neighborhood Descent (VND). It embeds Opposition-Based Learning (OL) in initialization to expand the search space and cover dispersed Pareto regions, while VND's adaptive neighborhood switching refines equivalent routes without sacrificing diversity. This dual mechanism pairs DE's global exploration with VND's local precision, overcoming the loss of modality seen in classical DE-based MMOP solvers and outperforming existing methods in preserving equivalent routes. Through adaptive constraint handling, the framework converges robustly to diverse Pareto-optimal routes, improving solution quality for multi-objective logistics planning.

The main contributions of this paper are as follows:

1) Comprehensive RVRP Modeling: unlike traditional models that address isolated constraints (e.g., time windows or multiple depots), the proposed model integrates four critical dimensions (complex road networks, vehicle capacity, time windows, and demand splitting) into a unified integer programming framework, matching real-world logistics operations in which these constraints coexist.

2) Oppositional Learning-Enhanced DE: DE-based methods are known for global search but often neglect population diversity in combinatorial spaces. OL is therefore introduced during initialization to generate opposite solutions.

3) VND-Embedded Local Search: existing MMOP algorithms rely on fixed mutation operators, which limits their ability to escape local optima in the RVRP. VND with adaptive neighborhood switching is embedded in the evolutionary loop to improve solution quality.

Problem description and model establishment

Problem description

In practice, most logistics enterprises deliver on complex urban road networks, must respect the maximum vehicle capacity, can split customer demand, and must meet the delivery time intervals specified by most customers. Under these constraints, the problem can be defined as a rich vehicle routing problem (RVRP) and described as follows:

A distribution center in a known area dispatches several vehicles. The delivery requirements of multiple logistics orders must be fulfilled in order and without repetition on a complex urban road network. If a customer's demand exceeds the vehicle capacity, the demand can be split. Each order specifies a delivery time window; if a vehicle arrives earlier or later than this window, a time penalty cost is added to the total transportation cost. Subject to these constraints, the delivery routes should be planned to minimize the number of vehicles required, the total distribution distance, the distribution time, and the distribution cost.

Model assumptions

Urban logistics route planning depends on many factors, and the full mathematical model is complex and heavily constrained. To facilitate modeling, this study makes the following assumptions:

(1) Only a single logistics distribution center is considered.

(2) Each delivery vehicle starts from the distribution center and returns to it after completing all of its assigned customer deliveries.

(3) Each vehicle serves exactly one route.

(4) The demand and location coordinates of each customer are known and fixed.

(5) Arcs between customer nodes are undirected. For example, an undirected arc between customer nodes i and j means that a vehicle can travel from i to j or from j to i.

(6) Each arc between customer nodes carries two weights, representing distance and time cost.

(7) Disruptions caused by temporary vehicle breakdowns or incorrect deliveries are not considered.

(8) Time Window Constraints: time windows are soft constraints with penalty costs. Early and late arrivals are permitted but penalized in proportion to the deviation time.

(9) Fleet Homogeneity: all vehicles have identical capacity and operating costs.

(10) Demand Splitting: a customer's demand can be split across multiple vehicles if \({q_i}>w\), ensuring full resource utilization.

Notation definition

The notation used in the model and its meaning are listed in Table 1.

Table 1 Notation definition.

Establishment of objective function

The multi-constraint vehicle routing problem is grounded in actual logistics distribution, where the distribution system is subject to multiple constraints such as vehicle capacity, the complex urban road network, customer time windows, and customer demand splitting. To optimize this complex multi-constraint distribution problem, the following mixed integer programming model is established:

\({f_1}\) : the number of vehicles required to complete the distribution task:

$$\hbox{min} {\kern 1pt} {\kern 1pt} {\kern 1pt} {f_1}=R=\left\lceil {\sum\limits_{{i=1}}^{n} {\frac{{{q_i}}}{w}} } \right\rceil$$
(1)

\({f_2}\) : the total driving distance of the vehicles, to be minimized:

$$\hbox{min} {\kern 1pt} {\kern 1pt} {\kern 1pt} {f_2}=\hbox{min} {\kern 1pt} {\kern 1pt} {\kern 1pt} D=\hbox{min} {\kern 1pt} {\kern 1pt} {\kern 1pt} \sum\limits_{{i,j \in C}} {\sum\limits_{{r \in R}} {{d_{ij}}} } x_{{ij}}^{r}$$
(2)

\({f_3}\) : the total vehicle delivery time:

$$\hbox{min} {\kern 1pt} {\kern 1pt} {\kern 1pt} {f_3}=\hbox{min} (\sum\limits_{{i,j \in C}} {\sum\limits_{{r \in R}} {(\frac{{{q_j}{d_{ij}}x_{{ij}}^{r}}}{v}+T{d_j})+\beta \sum\limits_{{i \in C}} {\sum\limits_{{r \in R}} {Z_{i}^{r}} } } } \cdot \hbox{max} \left\{ {{b_{ir}} - L{T_i},0} \right\})$$
(3)

\({f_4}\) : the total cost of completing logistics distribution, which is composed of driving cost, vehicle fixed cost and time delay cost, and takes the minimum value:

$$\hbox{min} {\kern 1pt} {\kern 1pt} {\kern 1pt} {f_4}=\hbox{min} (\sum\limits_{{r \in R}} {\sum\limits_{{i,j \in C}} {f{y_{ij}}x_{{ij}}^{r}} } +\sum\limits_{{r \in R}} {G\sum\limits_{{j \in C'}} {x_{{oj}}^{r}} } +l)$$
(4)
$$l=\alpha \sum\limits_{r \in R} \sum\limits_{i \in S^r} \sqrt{\left(x_i - \frac{1}{\left| S^r \right|}\sum\limits_{k \in S^r} x_k\right)^2+\left(y_i - \frac{1}{\left| S^r \right|}\sum\limits_{k \in S^r} y_k\right)^2}$$
(5)

Here, l is the time delay cost; its value is proportional to the sum of the distances from all customers on a route to the geographic centroid of that route's customers. The time delay cost is used only to compare schemes; its value for a single scheme has no direct operational meaning.
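As a worked illustration of Eqs. (1), (2) and (5), the following Python sketch evaluates them for a candidate solution given as a list of routes. It is a sketch under stated assumptions, not the authors' implementation: node 0 is taken as the depot at which every route starts and ends, Euclidean distance between coordinates stands in for the road-network distance d_ij, and the instance, the α value, and the function names are hypothetical. As written, Eq. (1) depends only on total demand and capacity, so the first function returns the same value for every routing of a given instance.

```python
import math

Coord = tuple[float, float]


def f1_vehicle_bound(demands: dict[int, float], w: float) -> int:
    """Eq. (1): ceil(sum of customer demands q_i / capacity w)."""
    return math.ceil(sum(demands.values()) / w)


def f2_total_distance(routes: list[list[int]], coords: dict[int, Coord], depot: int = 0) -> float:
    """Eq. (2): total length of all routes, each closed at the depot.

    Euclidean distance stands in for the road-network distance d_ij (assumption).
    """
    total = 0.0
    for route in routes:
        path = [depot, *route, depot]
        total += sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))
    return total


def delay_cost(routes: list[list[int]], coords: dict[int, Coord], alpha: float) -> float:
    """Eq. (5): alpha times the summed distance from each customer to its route centroid."""
    total = 0.0
    for route in routes:
        if not route:
            continue
        cx = sum(coords[i][0] for i in route) / len(route)
        cy = sum(coords[i][1] for i in route) / len(route)
        total += sum(math.hypot(coords[i][0] - cx, coords[i][1] - cy) for i in route)
    return alpha * total


# Hypothetical instance: depot 0 and three customers on a 4 x 3 rectangle.
coords = {0: (0.0, 0.0), 1: (0.0, 3.0), 2: (4.0, 3.0), 3: (4.0, 0.0)}
routes = [[1, 2], [3]]
print(f1_vehicle_bound({1: 300, 2: 400, 3: 200}, w=500))  # 2
print(f2_total_distance(routes, coords))                   # 20.0
print(delay_cost(routes, coords, alpha=0.5))               # 2.0
```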

The four objectives are inherently interrelated, reflecting real-world logistics trade-offs:

Minimizing the number of vehicles (\({f_1}\)) often requires consolidating deliveries into fewer routes, which lengthens individual routes and increases the total distance (\({f_2}\)); for example, reducing the fleet from 10 to 8 vehicles may extend average route lengths. This conflict arises from the fixed vehicle capacity w, which forces longer detours to serve all customers. Shorter routes (\({f_2}\)) reduce fuel costs but risk violating time windows (\({f_3}\)) and incurring penalties, whereas strict time-window compliance may require additional vehicles or routes, raising operating costs (\({f_4}\)). Timely deliveries (\({f_3}\)) reduce penalty costs; for instance, eliminating a 1-hour delay for a customer with β = $10/hour directly saves $10 of penalty. \({f_4}\) aggregates driving costs, fixed vehicle costs, and the time delay cost l, making it a composite metric influenced by \({f_1}\), \({f_2}\), and \({f_3}\). Optimizing \({f_4}\) therefore implicitly balances the other objectives but may obscure specific trade-offs.

Constraints:

Each customer is visited at least once:

$$\sum\limits_{{r \in R}} {\sum\limits_{{i \in C}} {x_{{ij}}^{r}} } \geqslant 1,\quad \forall j \in C$$
(6)

Decision variable: \(x_{{ij}}^{r}=1\) if and only if the vehicle on the \({r^{th}}\) route traverses arc \((i,j)\); otherwise \(x_{{ij}}^{r}=0\).

$$x_{{ij}}^{r} \in \left\{ {0,1} \right\},r \in R,i,j \in C'$$
(7)

The distribution needs of each customer are met.

$$\sum\limits_{{r \in R}} {y_{i}^{r}} ={q_i},\forall i \in C'$$
(8)

\(y_{i}^{r}\) is the quantity delivered to customer i on the \({r^{th}}\) route.

$$y_{i}^{r},i \in C,r \in R,{q_i} \geqslant y_{i}^{r} \geqslant 0$$
(9)

Flow conservation: the number of vehicles entering a node equals the number of vehicles leaving it.

$$\sum\limits_{{i \in C}} {x_{{ip}}^{r}} - \sum\limits_{{j \in C'}} {x_{{pj}}^{r}} =0\;p \in C'$$
(10)

Arcs blocked by traffic obstacles (the set NC) must not appear in any delivery route.

$$\sum\limits_{{r \in R}} {x_{{ij}}^{r}} =0,\quad {a_{ij}} \in NC$$
(11)

On each route, the number of arcs between served customers equals the number of served customers minus 1.

$$\sum\limits_{{i \in {S^r}}} {\sum\limits_{{j \in {S^r}}} {x_{{ij}}^{r}} } =\left| {{S^r}} \right| - 1,r \in R,{S^r} \subseteq C'$$
(12)

Customer i can be served on route r only if the vehicle on that route visits customer i.

$$\sum\limits_{{j \in C}} {x_{{ij}}^{r}} {q_i} \geqslant y_{i}^{r}$$
(13)

\(Z_{i}^{r}\) indicates whether the delivery to customer i on route r has a time requirement.

$$Z_{i}^{r} \in \left\{ {0,1} \right\}\;\forall r \in R,i \in C$$
(14)

The load of each delivery vehicle must not exceed the vehicle capacity. In urban logistics, vehicles often operate near full capacity to minimize trips and fuel costs. For example, a vehicle with w = 500 kg serving a customer with \({q_i}\) = 600 kg must split the demand into two trips (500 kg + 100 kg), which directly affects route planning and costs.

$$\sum\limits_{{i \in C}} {{q_i}{y_{ir}}} \leqslant w\quad \forall r \in R,i \in C$$
(15)

\({y_{ir}}\) indicates whether the vehicle on route r visits customer i.

$${y_{ir}} \in \left\{ {0,1} \right\}\quad \forall r \in R,i \in C$$
(16)

An arc leaving customer i can be used on route r only if route r visits customer i.

$$x_{{ij}}^{r} \leqslant {y_{ir}}\quad \forall r \in R,i \in C,j \in C$$
(17)

Every vehicle departs from the depot at time 0.

$${b_{or}}=0\,,\;\forall r \in R$$
(18)

Arrival-time recursion: the arrival time of a vehicle at customer j follows from its arrival time at the preceding customer i plus the travel time \({d_{ij}}/v\).

$${b_{jr}}=x_{{ij}}^{r}({b_{ir}}+\frac{{{d_{ij}}}}{v})\;\forall r \in R,i \in C,j \in C$$
(19)

The dwell time at each customer is a linear function of the delivered quantity.

$$T{d_j}=f({q_j})\;\forall j \in C$$
(20)

Along each route, the arrival time at a later customer is not earlier than the arrival time at the preceding customer.

$${b_{jr}} - {b_{ir}} \geqslant 0\;\forall r \in R,i \in C,j \in C$$
(21)

Soft time windows reflect practical scenarios where minor delays are tolerable but costly. A customer requiring delivery between 10:00–12:00 may accept a 12:30 arrival with a penalty of \(\beta\)=$10/hour, balancing service quality and operational flexibility.
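The departure condition (18), the arrival-time recursion (19), and the lateness term β·max{b_ir − LT_i, 0} of Eq. (3) can be traced along one route with the sketch below. Assumptions: a constant speed v, times in hours, distances stored in a hypothetical nested dictionary `dist`, dwell time left out of the recursion exactly as in Eq. (19), and only late arrivals penalized, mirroring Eq. (3). The function names are illustrative. The last line reproduces the soft-window example: arriving at 12:30 against a 12:00 deadline with β = $10/hour costs $5.

```python
def arrival_times(route: list[int], dist: dict[int, dict[int, float]], v: float,
                  depot: int = 0) -> dict[int, float]:
    """Eqs. (18)-(19): b_depot = 0, then b_j = b_i + d_ij / v for consecutive stops."""
    times: dict[int, float] = {}
    t, prev = 0.0, depot
    for j in route:
        t += dist[prev][j] / v
        times[j] = t
        prev = j
    return times


def lateness_penalty(arrival: float, latest: float, beta: float) -> float:
    """Soft time-window term of Eq. (3): beta * max(b - LT, 0)."""
    return beta * max(arrival - latest, 0.0)


def route_penalty(route, dist, v, latest, beta, depot=0):
    """Sum the lateness penalties of all customers on one route."""
    b = arrival_times(route, dist, v, depot)
    return sum(lateness_penalty(b[j], latest[j], beta) for j in route)


# The 10:00-12:00 example: arriving at 12:30 (12.5 h) with beta = $10/hour.
print(lateness_penalty(12.5, 12.0, beta=10.0))  # 5.0
```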

Solving method

Many recent studies have addressed methods for solving vehicle routing problems; the commonly used methods are exact algorithms and heuristics29,30. As analyzed in Sect. 1, the RVRP is a multi-modal multi-objective optimization problem, and NSGA-II31 and the differential evolution algorithm32 have shown good results on this kind of problem.

DE is an intelligent optimization algorithm with a simple principle, few control parameters, and good robustness, and it is easy to implement. It was originally designed for continuous-variable optimization and is mainly used to find global optima in multidimensional spaces. Because of its simple structure, however, DE may lose population diversity and become trapped in local optima during evolution. This paper therefore introduces VND33 into DE to help it escape local optima. In addition, to obtain more effective solutions, an opposition-based learning mechanism is introduced during initialization to expand the population.

Multi objective optimization framework

Addressing the multi-objective and multimodal nature of RVRP, the OL-DEVND algorithm accomplishes multi-objective optimization via the following steps:

Initialization

A diversified initial population is generated leveraging Oppositional Learning.

Global exploration

Differential Evolution (DE) is employed to generate new solutions, encompassing both the objective space and decision space.

Local refinement

Variable Neighborhood Descent (VND) refines solutions locally, improving their quality and helping the search escape local optima.

Solution set update

Non-dominated solutions are screened using a combination of Pareto dominance and crowding distance.

Termination condition

The Pareto front is output once the convergence criterion is met or the maximum number of iterations is reached.

The flowchart of the OL-DEVND hybrid algorithm is shown in Fig. 1.

Fig. 1

Flowchart of the DEVND.

The pseudocode of the OL-DEVND hybrid algorithm is shown in Algorithm 1.

Algorithm 1: Algorithm OL-DEVND

Input: Population size N, maximum number of iterations T, road network constraints

Output: Pareto optimal solution set

Initialization: Generate the current population and its opposite population, merge them, and select the top N individuals

for t = 1 to T do

Mutation and crossover: DE/rand/1 strategy generates offspring

Local search: Apply VND to the offspring (Exchange → Insert → 2-opt)

Merge populations: parent + offspring

Non-dominated sorting: Hierarchical screening of Pareto fronts

Crowding distance calculation: Sort solutions on the same layer by diversity

Choose a new generation population: retain the first N individuals

end for

Return Pareto optimal solution set
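The sorting and selection steps of Algorithm 1 (non-dominated sorting, crowding distance, keep the first N) correspond to the NSGA-II environmental selection scheme. A compact Python sketch for minimizing objective vectors such as (f1, f2, f3, f4) is given below; the function names are illustrative, and the quadratic-time front construction is chosen for clarity rather than speed.

```python
import math


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fronts(objs):
    """Fast non-dominated sorting: returns fronts as lists of indices into objs."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # dominated[p]: indices that p dominates
    count = [0] * n                     # count[p]: number of solutions dominating p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated[p].append(q)
            elif dominates(objs[q], objs[p]):
                count[p] += 1
        if count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated[p]:
                count[q] -= 1
                if count[q] == 0:
                    nxt.append(q)
        fronts.append(nxt)
        i += 1
    return fronts[:-1]  # drop the trailing empty front


def crowding_distance(objs, front):
    """Crowding distance of each index in one front; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    if not front:
        return dist
    for k in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[ordered[0]][k], objs[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = math.inf
        if hi == lo:
            continue
        for prev, mid, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[mid] += (objs[nxt][k] - objs[prev][k]) / (hi - lo)
    return dist


def select_next_generation(objs, n):
    """Keep whole fronts while they fit, then fill by descending crowding distance."""
    chosen = []
    for front in non_dominated_fronts(objs):
        if len(chosen) + len(front) <= n:
            chosen.extend(front)
        else:
            cd = crowding_distance(objs, front)
            chosen.extend(sorted(front, key=lambda i: cd[i], reverse=True)[: n - len(chosen)])
            break
    return chosen
```

Boundary solutions of a front receive infinite crowding distance, so the extremes of each objective are always kept when a front has to be truncated.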

Population initialization based on oppositional learning

OL is a commonly used strategy for escaping local optima34. It not only helps individuals move quickly away from their current positions but also increases the likelihood that the new positions have better fitness than the current worst position.

Opposite solutions are generated from the position of the current solution using the opposition-based learning strategy, as shown in Eq. (22):

$${X_{opposite}}=rand \times (ub+ul) - X$$
(22)

where X is the current position of the solution, \(ub\) and \(ul\) are the upper and lower bounds of the solution, respectively, and \(rand\) is a random number in [0,1]. The random number increases the diversity of the generated space, and together with the variability of the current position it makes the exploration process less predictable, which is crucial for exploration.

Definition

Opposite point. Let x be a number in [l, u]; the opposite point of x is defined as \(x^{\prime}=l+u - x\). Extending this definition to a d-dimensional space, let \(p=({x_1},{x_2},...,{x_d})\) be a point with \({x_i} \in [{l_i},{u_i}]\), \(i=1,2,...,d\); its opposite point is \(p^{\prime}=({x^{\prime}_1},{x^{\prime}_2},...,{x^{\prime}_d})\), where \({x^{\prime}_i}={l_i}+{u_i} - {x_i}\).
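Elementwise, the deterministic opposite point of the definition and the randomized form of Eq. (22) can be sketched as follows. The sketch assumes real-valued decision vectors with per-dimension bounds (`lower` and `upper` playing the roles of ul and ub). It draws a fresh rand for each dimension and clips values back into the bounds; both choices are assumptions, because Eq. (22) does not specify them. With rand = 1, Eq. (22) reduces to the deterministic opposite point.

```python
import random


def opposite_point(x, lower, upper):
    """Opposite point of the definition: x'_i = l_i + u_i - x_i."""
    return [l + u - xi for xi, l, u in zip(x, lower, upper)]


def random_opposite(x, lower, upper, rng=random):
    """Eq. (22): rand * (ub + ul) - X, with a fresh rand per dimension (assumption).

    Clipping the result back into [l, u] is also an added assumption; Eq. (22)
    itself does not bound the opposite solution.
    """
    out = []
    for xi, l, u in zip(x, lower, upper):
        value = rng.random() * (u + l) - xi
        out.append(min(max(value, l), u))
    return out


print(opposite_point([2.0, 7.0], [0.0, 5.0], [10.0, 10.0]))  # [8.0, 8.0]
```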

Based on the above definition, the steps for generating the initial population with the opposition-based learning strategy are shown in Algorithm 2.

Algorithm 2: Initialization Method Based on Oppositional Learning

Set population size N

for i = 1 to N do

for j = 1 to d do \(X_{i}^{j}=l_{i}^{j}+rand(0,1) \cdot (u_{i}^{j} - l_{i}^{j})\)

end for

end for

for i = 1 to N do

for j = 1 to d do \(OX_{i}^{j}=l_{i}^{j}+u_{i}^{j} - X_{i}^{j}\)

end for

end for

Merge \(\{ X(N) \cup OX(N)\}\) and select the N individuals with the best fitness values as the initial population
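Algorithm 2 can be sketched in Python as follows. Assumptions: real-valued individuals, bounds shared by all individuals (the pseudocode allows l_i^j and u_i^j to vary per individual), and a single scalar fitness to be minimized; the toy `sphere` function is purely illustrative. In the multi-objective setting, the final "best N" step would be replaced by Pareto-based ranking.

```python
import random


def ol_initialize(n, lower, upper, fitness, rng=random):
    """Opposition-based initialization (Algorithm 2); lower fitness is better."""
    d = len(lower)
    # X_i^j = l^j + rand(0,1) * (u^j - l^j)
    pop = [[lower[j] + rng.random() * (upper[j] - lower[j]) for j in range(d)]
           for _ in range(n)]
    # OX_i^j = l^j + u^j - X_i^j
    opp = [[lower[j] + upper[j] - x[j] for j in range(d)] for x in pop]
    # Merge X(N) and OX(N) and keep the N individuals with the best fitness
    return sorted(pop + opp, key=fitness)[:n]


def sphere(x):
    """Toy objective used only to demonstrate the call."""
    return sum(v * v for v in x)


initial = ol_initialize(10, [-5.0] * 3, [5.0] * 3, sphere, random.Random(42))
print(len(initial))  # 10
```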

DEVND algorithm design

Encoding operation

Vehicle route planning admits various encodings. In the first, all customer points are represented by numbers, a feasible solution is formed after optimization, and fixed points are randomly inserted into this solution. The fixed point represents the central depot and is usually encoded with a fixed number such as “1” or “0”; it divides the feasible solution into several routes, each of which starts from and returns to the central depot. This encoding tends to ignore vehicle capacity, so it is difficult to satisfy demand effectively in models with limited capacity. The second encoding adopts the “sort first, cluster later” approach. A chromosome is a non-repeating sequence of all customer point numbers arranged from left to right, and it is split into routes so that the total cargo demand on each route does not exceed the vehicle load. This encoding requires relatively little storage, covers all delivery points, and is efficient. Suppose the optimal chromosome is 4 → 7 → 3 → 8 → 9 → 6 → 10 → 2 → 5 and the vehicle capacity is…; when the vehicle has delivered to customer point “8”, its fully loaded cargo volume C has been used up, so it must return to the central depot, and the route of the first vehicle is 4 → 7 → 3 → 8.
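The “sort first, cluster later” decoding can be sketched as a greedy split of the chromosome. The demands and capacity below are hypothetical, since the capacity value is elided in the text; they are chosen so that the first vehicle is exactly full after customer 8, matching the example route 4 → 7 → 3 → 8. Carrying a customer's unmet remainder over to the next vehicle is one simple way to honor the demand-splitting assumption and is not necessarily the authors' rule.

```python
def split_routes(chromosome, demand, capacity):
    """Greedy "sort first, cluster later" decoding of a customer sequence.

    Returns a list of routes, each a list of (customer, delivered quantity) pairs.
    A customer whose demand does not fit the remaining capacity is split, and the
    remainder is served by the next vehicle (one simple demand-splitting rule).
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    routes, current, load = [], [], 0
    for customer in chromosome:
        remaining = demand[customer]
        while remaining > 0:
            if load >= capacity:  # vehicle full: return to depot and start a new route
                routes.append(current)
                current, load = [], 0
            qty = min(capacity - load, remaining)
            current.append((customer, qty))
            load += qty
            remaining -= qty
    if current:
        routes.append(current)
    return routes


# Hypothetical demands and capacity, chosen so the first vehicle is full after customer 8.
demand = {4: 3, 7: 2, 3: 4, 8: 1, 9: 5, 6: 3, 10: 2, 2: 4, 5: 2}
routes = split_routes([4, 7, 3, 8, 9, 6, 10, 2, 5], demand, capacity=10)
print([[c for c, _ in r] for r in routes])  # [[4, 7, 3, 8], [9, 6, 10], [2, 5]]
print(split_routes([1], {1: 600}, 500))     # [[(1, 500)], [(1, 100)]]
```

The greedy cut is fast but does not guarantee the best partition of a given sequence; an exact split that evaluates all cut positions is a common alternative.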

DEVND algorithm steps

The fitness function F is defined as the reciprocal of the model's objective function, and the differential evolution algorithm, including its crossover and mutation operators, is then applied.

(1) Selection operator.

To rank the feasible solutions in the population, solutions with the highest fitness (that is, the lowest objective value) are preferred. Of the five selection functions available for the selection operator, the roulette wheel strategy is used. The selection probability is:

$$P({x_i})=\frac{{F({x_i})}}{{\sum\limits_{{j=1}}^{N} {F({x_j})} }}$$
(23)
$$L{Q_i}=\sum\limits_{{j=1}}^{i} {P({x_j})}$$
(24)

where \(P({x_i})\) is the probability that feasible solution \({x_i}\) is inherited by the next generation, and \(L{Q_i}\) is the cumulative probability of individual i.
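Eqs. (23) and (24) describe standard roulette-wheel selection, where LQ_i is the running sum P(x_1) + ... + P(x_i). A minimal sketch with F = 1/objective follows; it assumes one positive scalar objective value per individual, because the text does not state how the four objectives are combined for this step, and `roulette_select` is an illustrative name.

```python
import bisect
import itertools
import random


def roulette_select(objectives, k, rng=random):
    """Roulette-wheel selection with fitness F = 1 / objective (minimization).

    Returns k indices; index i is drawn with probability P(x_i) from Eq. (23).
    """
    fitness = [1.0 / f for f in objectives]           # F(x_i); objectives must be > 0
    total = sum(fitness)
    probs = [fi / total for fi in fitness]            # P(x_i), Eq. (23)
    cumulative = list(itertools.accumulate(probs))    # LQ_i = P(x_1) + ... + P(x_i)
    picks = []
    for _ in range(k):
        r = rng.random() * cumulative[-1]  # scaling absorbs rounding in the final sum
        picks.append(bisect.bisect_right(cumulative, r))
    return picks


picks = roulette_select([10.0, 20.0, 40.0], k=1000, rng=random.Random(1))
print(picks.count(0), picks.count(1), picks.count(2))
```

The standard library's `random.choices(range(n), weights=fitness, k=k)` performs the same weighted draw; the explicit version is shown only to mirror Eqs. (23) and (24).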

(2) Crossover operator.

Two random values \({r_1}\) and \({r_2}\) drawn from [0,1] first determine the crossover positions between offspring and parent, and the genes between the two positions are exchanged. Binomial crossover is defined with CR = 0.8, ensuring that at least one dimension of the mutant vector \({V_i}\) is retained.
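For the binomial crossover with CR = 0.8, the usual rule is u_j = v_j if rand_j ≤ CR or j = j_rand, and u_j = x_j otherwise, where the random index j_rand guarantees that at least one component of V_i survives. The sketch below assumes real-valued vectors, since the text does not state how this operator acts on the permutation encoding; the function name is illustrative.

```python
import random


def binomial_crossover(target, mutant, cr=0.8, rng=random):
    """DE binomial crossover producing a trial vector U_i.

    Each component comes from the mutant V_i with probability cr; the component
    at a random index j_rand always comes from V_i, so at least one is retained.
    """
    d = len(target)
    j_rand = rng.randrange(d)
    return [mutant[j] if (j == j_rand or rng.random() <= cr) else target[j]
            for j in range(d)]


trial = binomial_crossover([0.0] * 5, [1.0] * 5, cr=0.8, rng=random.Random(7))
print(trial)
```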

(3) Mutation operation.

In the mutation operation, two points are randomly selected and their positions are swapped, and the mutant vector is generated with the “DE/rand/1” strategy using a scaling factor F = 0.5.
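The two mutation ingredients named above can be sketched separately: the DE/rand/1 mutant \({V_i}={X_{r1}}+F({X_{r2}}-{X_{r3}})\), where F = 0.5 is the scaling factor (not the fitness function F), and a swap mutation on a route. Because DE works on real-valued vectors while routes are permutations, the sketch also includes a random-key decoding that visits customers in ascending order of their keys; this decoding is an assumption for illustration, since the source does not state how OL-DEVND maps DE vectors to routes. All names are hypothetical.

```python
import random
from typing import List, Sequence


def de_rand_1(population: Sequence[Sequence[float]], i: int, f: float,
              rng: random.Random) -> List[float]:
    """DE/rand/1 mutant V_i = X_r1 + F * (X_r2 - X_r3), with r1, r2, r3 distinct and != i."""
    candidates = [k for k in range(len(population)) if k != i]
    r1, r2, r3 = rng.sample(candidates, 3)   # needs a population of at least 4
    x1, x2, x3 = population[r1], population[r2], population[r3]
    return [a + f * (b - c) for a, b, c in zip(x1, x2, x3)]


def swap_mutation(route: Sequence[int], rng: random.Random) -> List[int]:
    """Swap the customers at two randomly chosen positions."""
    new_route = list(route)
    a, b = rng.sample(range(len(new_route)), 2)
    new_route[a], new_route[b] = new_route[b], new_route[a]
    return new_route


def random_key_decode(keys: Sequence[float], customers: Sequence[int]) -> List[int]:
    """Assumed decoding: visit customers in ascending order of their keys."""
    order = sorted(range(len(keys)), key=lambda k: keys[k])
    return [customers[k] for k in order]


rng = random.Random(7)
pop = [[rng.random() for _ in range(5)] for _ in range(6)]
mutant = de_rand_1(pop, 0, f=0.5, rng=rng)
print(random_key_decode(mutant, [2, 3, 4, 5, 6]))
print(swap_mutation([4, 7, 3, 8], rng))
```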

(4) Reinsert offspring.

In the reinsertion strategy, the mutated individuals are returned to the population after crossover so that the best solution of the current iteration is retained.

(5) Variable neighborhood descent search to update solutions.

The meaning of a neighborhood differs from problem to problem. Variable neighborhood descent (VND) is a relatively mature local search algorithm whose main idea is to search systematically with several different neighborhoods. If the current neighborhood improves the solution, the search continues in that neighborhood; when it can no longer improve the solution, the search switches to another neighborhood. VND is embedded into the DE algorithm: each DE iteration generates new individuals, and VND performs a local search on the routes of each individual.

The neighborhood search algorithm uses n neighborhood structures \({N_k}\) (k = 1, 2, ..., n). Here, three neighborhoods are used: the Exchange, Insert and 2-opt optimization neighborhoods. A vehicle route is selected as the starting point, and optimization begins in the Exchange neighborhood. Switching from one optimization neighborhood to another is called a neighborhood action. If an improved solution is found in the current neighborhood, the perturbation continues in that neighborhood; otherwise, the operation is repeated in the next neighborhood. In this study, route-level optimization is used: the three route optimization neighborhoods have the same search effect and range [1,1], and each randomly selects one vehicle route and applies its operation within that route, as shown in Figs. 2, 3 and 4.

Exchange optimization: select a route, swap the nodes at two positions in it, and form a new route.

Fig. 2

Exchange optimization.

Insert optimization: select a route, remove a node, and insert it at another position in the same route.

Fig. 3

Insert optimization.

2-opt optimization: select a route, choose two positions, and reverse the order of the nodes between them to generate a new route.

Fig. 4

2-opt optimization.
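The three neighborhoods and the descent rule can be sketched as follows, assuming a symmetric distance matrix `dist` with the depot at index 0 and a route given as a list of customer indices. Each neighborhood is explored with best improvement; following the text, the search stays in the current neighborhood after an improvement and moves to the next one otherwise (textbook VND instead restarts from the first neighborhood). The function names and the small line-shaped instance are hypothetical.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Route = List[int]


def route_length(route: Sequence[int], dist: Sequence[Sequence[float]]) -> float:
    """Length of depot(0) -> route -> depot(0)."""
    path = [0, *route, 0]
    return sum(dist[a][b] for a, b in zip(path, path[1:]))


def exchange_moves(route: Sequence[int]) -> Iterator[Route]:
    """Swap the customers at positions i and j."""
    for i in range(len(route) - 1):
        for j in range(i + 1, len(route)):
            r = list(route)
            r[i], r[j] = r[j], r[i]
            yield r


def insert_moves(route: Sequence[int]) -> Iterator[Route]:
    """Remove the customer at position i and reinsert it at position j."""
    for i in range(len(route)):
        rest = list(route[:i]) + list(route[i + 1:])
        for j in range(len(route)):
            if j != i:
                yield rest[:j] + [route[i]] + rest[j:]


def two_opt_moves(route: Sequence[int]) -> Iterator[Route]:
    """Reverse the segment between positions i and j (inclusive)."""
    for i in range(len(route) - 1):
        for j in range(i + 1, len(route)):
            yield list(route[:i]) + list(reversed(route[i:j + 1])) + list(route[j + 1:])


def vnd(route: Sequence[int], dist: Sequence[Sequence[float]]) -> Tuple[Route, float]:
    neighborhoods: List[Callable[[Sequence[int]], Iterator[Route]]] = [
        exchange_moves, insert_moves, two_opt_moves]
    best, best_len = list(route), route_length(route, dist)
    k = 0
    while k < len(neighborhoods):
        # best-improvement step inside neighborhood N_k
        candidate = min(neighborhoods[k](best),
                        key=lambda r: route_length(r, dist), default=None)
        if candidate is not None and route_length(candidate, dist) < best_len:
            best, best_len = candidate, route_length(candidate, dist)  # stay in N_k
        else:
            k += 1                                                     # move to N_{k+1}
    return best, best_len


# Customers 1..3 on a line at x = 1, 2, 3; depot at x = 0 (hypothetical data)
dist = [[abs(a - b) for b in range(4)] for a in range(4)]
print(vnd([2, 1, 3], dist))   # ([1, 2, 3], 6)
```

The loop terminates because every accepted move strictly shortens the route and each neighborhood is finite.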

(6) Multi-objective solution set update strategy.

In the multi-objective solution set update strategy, the algorithm maintains the convergence and diversity of the solution set through the following mechanism. First, non-dominated sorting divides the population into Pareto layers according to the objective values, and the non-dominated solutions on the Pareto front are retained with priority, yielding the globally best candidate solution set. Second, for individuals within the same Pareto layer, the crowding distance is used to estimate the density of solutions: it is computed as the weighted sum of the Euclidean distances between adjacent individuals in the objective space (the weights reflect the importance of each objective), and solutions in sparse regions are retained to maintain diversity and avoid local clustering. Finally, an elitist strategy merges the parent and offspring populations and applies non-dominated sorting and crowding distance calculation in sequence to select a new generation that combines high quality with high diversity. This process balances global exploration and local exploitation, so that the algorithm approaches the true Pareto front while covering its diverse regions.
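The three steps can be sketched with the fast non-dominated sort popularized by NSGA-II and a crowding distance in which each objective's normalized gap between neighbors is multiplied by a weight. The source does not give its exact weighting formula, so the per-objective `weights` are an assumption (all weights equal to 1 recovers the standard NSGA-II crowding distance). All objectives are minimized, and the sample objective vectors are hypothetical.

```python
from typing import Dict, List, Sequence

Objectives = Sequence[float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b: no worse in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objs: Sequence[Objectives]) -> List[List[int]]:
    """Fast non-dominated sorting; returns Pareto layers as lists of indices."""
    n = len(objs)
    dominated_sets: List[List[int]] = [[] for _ in range(n)]
    domination_count = [0] * n
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated_sets[p].append(q)
            elif dominates(objs[q], objs[p]):
                domination_count[p] += 1
        if domination_count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated_sets[p]:
                domination_count[q] -= 1
                if domination_count[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]   # drop the trailing empty layer


def crowding_distance(objs: Sequence[Objectives], front: List[int],
                      weights: Sequence[float]) -> Dict[int, float]:
    """Weighted crowding distance; boundary solutions get infinity."""
    distance = {p: 0.0 for p in front}
    for k, w in enumerate(weights):
        order = sorted(front, key=lambda p: objs[p][k])
        lo, hi = objs[order[0]][k], objs[order[-1]][k]
        distance[order[0]] = distance[order[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, mid, nxt in zip(order, order[1:], order[2:]):
            distance[mid] += w * (objs[nxt][k] - objs[prev][k]) / (hi - lo)
    return distance


def elitist_selection(parents: Sequence[Objectives], offspring: Sequence[Objectives],
                      size: int, weights: Sequence[float]) -> List[Objectives]:
    """Merge parents and offspring, then fill the next population layer by layer."""
    merged = list(parents) + list(offspring)
    chosen: List[int] = []
    for front in non_dominated_sort(merged):
        if len(chosen) + len(front) <= size:
            chosen.extend(front)
        else:
            cd = crowding_distance(merged, front, weights)
            chosen.extend(sorted(front, key=lambda p: cd[p], reverse=True)[:size - len(chosen)])
            break
    return [merged[i] for i in chosen]


objs = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
print(non_dominated_sort(objs))                           # [[0, 1, 2], [3], [4]]
print(elitist_selection(objs[:3], objs[3:], 4, (1.0, 1.0)))
```

When the last layer must be truncated, boundary solutions (infinite distance) are kept first, which preserves the extent of the front.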

Experimental results and analysis

Experimental settings

Parameter settings

To select the optimal parameter configuration, preliminary tuning experiments were conducted to evaluate the impact of the key parameters on algorithm performance, and the existing literature was also consulted. The final parameter settings are as follows.

The DE parameters are set as follows: population size Popsize = 50, maximum number of iterations IterMax = 100, crossover probability Pc = 0.8, and mutation probability Pm = 0.1. Feasible solutions of the VRP are first encoded to generate an initial population. The fitness of the initial population is then calculated from the input conditions of the mathematical model, and selection is performed. Based on the objective function, the algorithm terminates when either (1) the current best solution remains unchanged for 10 consecutive generations or (2) the number of iterations exceeds 100.
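The reported settings and the two stopping rules can be collected as in the sketch below. The dictionary keys are illustrative names, not identifiers from the authors' MATLAB code, and `step` is a hypothetical callback that runs one generation and returns the best objective value found in it.

```python
from typing import Callable, Tuple

# Settings reported in the text (key names are illustrative)
DE_SETTINGS = {
    "pop_size": 50,     # Popsize
    "iter_max": 100,    # IterMax
    "pc": 0.8,          # crossover probability Pc (the crossover step also uses CR = 0.8)
    "pm": 0.1,          # mutation probability Pm
    "scale_f": 0.5,     # DE/rand/1 scaling factor F
    "stall_limit": 10,  # generations without improvement before stopping
}


def run_with_termination(step: Callable[[int], float],
                         iter_max: int = 100,
                         stall_limit: int = 10) -> Tuple[float, int]:
    """Call step(gen) until the best value stalls for stall_limit generations
    or iter_max generations have run. Returns (best value, generations run)."""
    best, stall, gen = float("inf"), 0, 0
    for gen in range(1, iter_max + 1):
        value = step(gen)          # best objective value found in this generation
        if value < best:
            best, stall = value, 0
        else:
            stall += 1
            if stall >= stall_limit:
                break
    return best, gen


print(run_with_termination(lambda g: 5.0))        # (5.0, 11): stalls after generation 1
print(run_with_termination(lambda g: 1000 - g))   # (900, 100): reaches IterMax
```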

Hardware configuration

To verify the effectiveness of the proposed method, the experiments were implemented in MATLAB and run on a computer with an Intel Core i7-3630QM 2.40 GHz CPU and 8 GB RAM under Windows 10.

Data set

The proposed method was evaluated on Augerat Set-P35, a widely recognized CVRP benchmark, and Zhou & Wang’s real-world logistics dataset36, which includes 45 instances with varying customer sizes (50–250 nodes), time windows, and vehicle capacities. Augerat Set-P provides standardized instances for reproducibility, while Zhou & Wang’s dataset reflects urban logistics challenges like traffic constraints and split deliveries. No new instances were proposed, as our focus was on enhancing algorithmic performance under established benchmarks.

Evaluation indicators

A single performance indicator cannot comprehensively measure the performance of a multi-objective optimization algorithm. Therefore, four metrics are used: inverse generational distance (IGD)37, coverage metric (C-metric)38, 1/HV, where HV is the hypervolume37, and 1/PSP, where PSP is the Pareto sets proximity32. PSP reflects both the overlap rate and the distance between the true Pareto set (PS) and the obtained PS. IGD and C-metric are the most commonly used performance indicators for multi-objective optimization problems, while 1/PSP and 1/HV measure performance in the decision space and the objective space, respectively. Smaller values of these indicators indicate better performance. They are common evaluation indicators for MMOPs.
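Textbook definitions of IGD and the C-metric are sketched below, assuming all objectives are minimized and a reference Pareto front is available for IGD. C(A, B) is taken as the fraction of solutions in B weakly dominated by at least one solution in A; the source does not specify its normalization or orientation, so this is only an illustration. HV and PSP are omitted because their computation is more involved. The sample fronts are hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def igd(reference_front: Sequence[Point], obtained: Sequence[Point]) -> float:
    """Mean distance from each reference Pareto point to its nearest obtained point."""
    return sum(min(math.dist(r, a) for a in obtained)
               for r in reference_front) / len(reference_front)


def weakly_dominates(a: Point, b: Point) -> bool:
    """a is no worse than b in every (minimized) objective."""
    return all(x <= y for x, y in zip(a, b))


def c_metric(a_set: Sequence[Point], b_set: Sequence[Point]) -> float:
    """C(A, B): fraction of B weakly dominated by at least one member of A."""
    covered = sum(1 for b in b_set if any(weakly_dominates(a, b) for a in a_set))
    return covered / len(b_set)


ref = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]     # hypothetical reference front
found = [(0.0, 1.0), (1.0, 0.0)]
print(igd(ref, found))                         # about 0.2357 = sqrt(0.5) / 3
print(c_metric([(1.0, 1.0)], [(2.0, 2.0), (0.0, 3.0)]))   # 0.5
```

IGD penalizes the gap at (0.5, 0.5) even though both obtained points lie on the reference front, which is why it captures diversity as well as convergence.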

Comparison algorithm

To verify the effectiveness of the proposed algorithm, it is compared with HDMMODE10, INSGA_II39, ACO40, ADPRA41 and Sim-BRIG-LS42 on this problem. All algorithms are executed under identical conditions, i.e., with the same starting and termination criteria, the same number of starting search points, the same dataset, and the same hardware. To reduce the influence of randomness, each experiment was repeated 30 times.

Results analysis of Augerat Set-P

To further demonstrate the efficiency of the proposed algorithm, instances of different sizes and types were randomly selected from the Augerat Set-P dataset. Because this is a standard CVRP benchmark rather than operational data and contains no time window information, the effectiveness of the proposed algorithm is evaluated using only \({f_1}\) and \({f_2}\). In the table, \({f_1}\) is measured in number of vehicles and \({f_2}\) in m.

Table 2 Comparison of different algorithms.

Table 2 shows that, on standard CVRP instances with fixed vehicle capacity and customer demand, the proposed hybrid algorithm outperforms the other algorithms on both objectives, the number of vehicles and the driving distance. Taking P-n40-k5 as an example, compared with Sim-BRIG-LS, the best of the competing algorithms, OL-DEVND uses one fewer vehicle and shortens the driving distance by about 2.52%.

To analyze the significance of the differences, t-tests were performed on the results of all algorithms over randomly selected instances of different sizes and types. The test results are shown in Table 3. According to the t-tests, the results of the proposed algorithm on all objective functions differ significantly from those of each of the other methods.

Table 3 T-test results.

Results analysis of real-world logistics dataset

Because Zhou & Wang’s real-world logistics dataset reflects urban logistics challenges such as traffic constraints and split deliveries, both the objective function values and the performance indicators were validated on this dataset.

Table 4 shows the comparison results of all algorithms on this dataset under different customer scales and time window constraints. The first column gives the instance name in the form “\(nu{m_1} - nu{m_2} - nu{m_3}\)”, where \(nu{m_1}\) is the number of customers, \(nu{m_2}\) is the index of the vehicle capacity type, and \(nu{m_3}\) is the index of the time window configuration. The four objective functions are measured in number of vehicles, m, min and $, respectively.

Table 4 Results of different algorithms on real-world logistics dataset.

To analyze the significance of the differences, t-tests were performed on the results of all algorithms over randomly selected instances of different sizes and types. The test results are shown in Table 5. According to the t-tests, the results of the proposed algorithm on all objective functions differ significantly from those of each of the other methods.

Table 5 T-test results.

Tables 6 and 7 compare IGD, C-metric, 1/HV and 1/PSP. The second, third, fourth and fifth columns give the averages of IGD, C-metric, 1/HV and 1/PSP over 30 runs, respectively. In addition, the Wilcoxon signed-rank test was applied at the 5% significance level, and the test results are given in the last row of each table, where ‘B/S/W’ indicates that the proposed algorithm is significantly better than, basically similar to, or significantly worse than the compared algorithm.

Table 6 Average values of IGD, C-metric, 1/HV, and 1/PSP of OL-DEVND, INSGA_II.
Table 7 Average values of IGD, C-metric, 1/HV, and 1/PSP of OL-DEVND, HDMMODE.

Tables 6 and 7 show that, in terms of IGD, OL-DEVND is significantly better than INSGA_II on all 45 instances and better than HDMMODE on 32 instances. In terms of C-metric, OL-DEVND is significantly better than INSGA_II on 30 instances and better than HDMMODE on 28 instances. OL-DEVND also performs well in terms of 1/HV and 1/PSP. The main reason is that OL-DEVND combines global search (DE with OL) with local refinement (VND), which increases the probability of obtaining well-converged solutions.

Tables 6 and 7 also show that the problem becomes more difficult as the number of customers increases and the vehicle capacity decreases. With more customers and smaller vehicles there are more possible route plans, so it is harder to converge on all objectives. Moreover, any solution that attains the minimum value of any single objective is non-dominated, regardless of its values on the other objectives. In real scenarios, one objective may be preferred over the others. In the four-objective RVRP proposed in this paper, objective \({f_2}\) (total driving distance) may be considered more important than the others, because driving distance is proportional to fuel consumption and therefore directly affects environmental pollution. For logistics companies, objectives \({f_3}\) (total travel time) and \({f_4}\) (total cost) are equally important, because customers must be served within the specified time to improve customer satisfaction, and the total distribution cost must be reduced throughout the logistics process. Since the decision-maker's preferences are in most cases unknown a priori, it is important for an RVRP method to find good values of all objectives simultaneously. This is analyzed in Table 8.

Table 8 Comparison of OL-DEVND with HDMMODE/INSGA_II in terms of the number of runs, and over all 30 runs, in which OL-DEVND achieved better (<), equal (=) and worse (>) values on the four objectives.

The dataset contains 45 instances, and each of the three algorithms is run 30 times on each instance. Each run yields one set of non-dominated solutions, so each algorithm produces 30 sets per instance and 1350 sets in total. In Table 8, each row except the last two summarizes the 30 runs on one instance by counting the runs in which the minimum value of each objective obtained by OL-DEVND is better than (<), equal to (=) or worse than (>) the minimum value of the corresponding objective obtained by HDMMODE/INSGA_II. The first column gives the instance name in the form “\(nu{m_1} - nu{m_2} - nu{m_3}\)”, and the second, third, fourth and fifth columns list the comparisons for the four objective functions over the 30 runs. For example, the second row (instance “50-0-1”) and the third column indicate that, over 30 runs, OL-DEVND finds a better value than HDMMODE 18 times, an equal value 0 times and a worse value 12 times; compared with INSGA_II, it finds a better value 20 times, an equal value 0 times and a worse value 10 times. The penultimate row gives the distribution of all 1350 comparisons over the better, equal and worse categories. The results show that the minimum values obtained by OL-DEVND are better than those of HDMMODE and INSGA_II on two of the objectives, equal on objective \({f_2}\), and slightly worse on the remaining objective. The last row compares, for each objective, the best value obtained by each method over its 30 runs, and counts the instances on which OL-DEVND is better, equal or worse. For one of the objectives, OL-DEVND finds a better value than HDMMODE on 33 instances, an equal value on 2 instances and a worse value on 10 instances; compared with INSGA_II, it finds a better value on 37 instances, an equal value on 3 instances and a worse value on 5 instances.
Therefore, the last two rows of Table 8 show that OL-DEVND outperforms HDMMODE and INSGA_II both in individual runs and in the best of all 30 runs.
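The counting behind Table 8 can be sketched as follows. The helper names and the three-run example are hypothetical, and pairing the i-th run of OL-DEVND with the i-th run of the competitor is an assumption, since the source does not state how runs are matched. `tally` corresponds to the per-instance rows, and `best_of_runs` to the comparison of best values in the last row.

```python
from typing import Dict, List, Sequence

Front = Sequence[Sequence[float]]   # one run's non-dominated solutions


def min_per_run(fronts: Sequence[Front], k: int) -> List[float]:
    """Minimum value of objective k in each run's non-dominated set."""
    return [min(sol[k] for sol in front) for front in fronts]


def tally(ours: Sequence[float], theirs: Sequence[float], tol: float = 0.0) -> Dict[str, int]:
    """Count paired runs where our minimum is better (<), equal (=) or worse (>)."""
    if len(ours) != len(theirs):
        raise ValueError("both methods need the same number of runs")
    counts = {"<": 0, "=": 0, ">": 0}
    for a, b in zip(ours, theirs):
        if abs(a - b) <= tol:
            counts["="] += 1
        elif a < b:
            counts["<"] += 1
        else:
            counts[">"] += 1
    return counts


def best_of_runs(ours: Sequence[float], theirs: Sequence[float], tol: float = 0.0) -> str:
    """Compare the best value over all runs of each method."""
    a, b = min(ours), min(theirs)
    return "=" if abs(a - b) <= tol else ("<" if a < b else ">")


# Three hypothetical runs, two objectives per solution
ours_runs = [[(3.0, 9.0), (4.0, 7.0)], [(5.0, 6.0)], [(2.0, 8.0)]]
their_runs = [[(4.0, 8.0)], [(5.0, 5.0)], [(1.0, 9.0)]]
print(tally(min_per_run(ours_runs, 0), min_per_run(their_runs, 0)))  # {'<': 1, '=': 1, '>': 1}
```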

To further verify the efficiency of the proposed algorithm, it was compared with ACO40, ADPRA41, Adaptive GA43 and MODEA44 on the randomly selected instances “50-0-3”, “50-1-2”, “150-0-1”, “150-2-4”, “250-1-1” and “250-2-3”. The parameter settings of all comparison algorithms follow the original references, and all algorithms were run under the same conditions: the same starting and termination criteria, the same number of starting search points, the same dataset and the same hardware. The comparison results are shown in Table 9.

As can be seen from Table 9, OL-DEVND outperforms the other algorithms in terms of IGD, 1/HV and 1/PSP on the selected instances, and it also performs well in terms of C-metric.

Table 9 The comparison results among different algorithms.

One algorithm is considered better than another if its non-dominated solution set has better convergence and diversity. Because visual representations are easy to understand, they are commonly used to compare the convergence and diversity of solution sets obtained by different methods. To show the convergence and diversity of the non-dominated solutions obtained by OL-DEVND, HDMMODE and INSGA_II, heat map visualization is used. A heat map provides a visual representation of the solution set of a many-objective problem and clearly shows the trade-offs among objectives. It displays the data as a grid of cells whose colors represent values on a scale from maximum (hot) to minimum (cold)45. In the heat map, each row represents a solution, each column represents an objective, and the color of a cell represents the value of that objective for that solution. Colder colors indicate better convergence, and the spread of colors indicates the diversity of the solutions. If the heat map of a solution set shows colder colors while still spanning the whole color range, the non-dominated solutions obtained by that method can be regarded as good. For the heat maps to be comparable, all objectives must share the same scale, so all objective values were normalized to the range [0,1]. Each heat map represents the non-dominated solutions obtained by one method on a specific instance over all 30 runs, with each objective value displayed on a scale from cold (blue) to hot (red).
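The normalization used for the heat maps can be sketched as a per-objective min-max scaling followed by sorting the rows in ascending order of \({f_2}\). Taking the bounds from the solutions of all compared algorithms on the same instance, so that colors are comparable across heat maps, is an assumption; the source only states that values were normalized to [0,1]. The function names and sample values are hypothetical.

```python
from typing import List, Optional, Sequence

Row = List[float]


def normalize(solutions: Sequence[Sequence[float]],
              pooled: Optional[Sequence[Sequence[float]]] = None) -> List[Row]:
    """Min-max scale every objective to [0, 1].

    Bounds come from `pooled` (e.g. all methods' solutions on one instance) so that
    heat maps of different methods share one color scale; defaults to `solutions`.
    """
    basis = pooled if pooled is not None else solutions
    columns = list(zip(*basis))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [[0.0 if hi == lo else (v - lo) / (hi - lo)
             for v, lo, hi in zip(row, lows, highs)]
            for row in solutions]


def heatmap_rows(solutions: Sequence[Sequence[float]],
                 pooled: Optional[Sequence[Sequence[float]]] = None,
                 sort_objective: int = 1) -> List[Row]:
    """Normalized rows sorted in ascending order of one objective (index 1 = f2)."""
    return sorted(normalize(solutions, pooled), key=lambda row: row[sort_objective])


method_a = [[10.0, 300.0], [12.0, 250.0]]      # hypothetical (f1, f2) values
method_b = [[14.0, 400.0]]
print(heatmap_rows(method_a, pooled=method_a + method_b))
```

The resulting matrix can be passed to any plotting library's image or heat map function with a blue-to-red colormap.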

In the test dataset, the more customers and the smaller the vehicle capacity, the more difficult the problem. Therefore, instances with \(nu{m_1}=250\) (i.e., 250 customers) and \(nu{m_2}=2\) (i.e., the third vehicle category, which has the smallest capacity) were selected to visually compare the non-dominated solutions obtained by the three algorithms.

Figure 5 uses heat maps to show the convergence and diversity of the non-dominated solutions obtained by the three algorithms on five instances: 250-2-0, 250-2-1, 250-2-2, 250-2-3 and 250-2-4. For each instance, the first, second and third heat maps show the non-dominated solutions obtained by OL-DEVND, HDMMODE and INSGA_II, respectively. The rows of each heat map are sorted in ascending order of \({f_2}\). On all five instances, the heat maps of OL-DEVND show noticeably colder colors than those of HDMMODE and INSGA_II. Therefore, on all five instances OL-DEVND achieves better convergence in \({f_2}\) than HDMMODE and INSGA_II.

Fig. 5

Heat-maps of non-dominated solutions obtained by OL-DEVND, HDMMODE and INSGA_II on selected instances: (1) OL-DEVND on 250-2-0, (2) HDMMODE on 250-2-0, (3) INSGA_II on 250-2-0, (4) OL-DEVND on 250-2-1, (5) HDMMODE on 250-2-1, (6) INSGA_II on 250-2-1, (7) OL-DEVND on 250-2-2, (8) HDMMODE on 250-2-2, (9) INSGA_II on 250-2-2, (10) OL-DEVND on 250-2-3, (11) HDMMODE on 250-2-3, (12) INSGA_II on 250-2-3, (13) OL-DEVND on 250-2-4, (14) HDMMODE on 250-2-4, (15) INSGA_II on 250-2-4.

All comparison results show that the solutions of HDMMODE and INSGA_II consistently lack diversity, particularly in objectives \({f_1}\) and \({f_2}\). In addition, the heat maps of OL-DEVND show better convergence than those of HDMMODE and INSGA_II in objectives \({f_2}\) and \({f_3}\). Because the objectives conflict with each other, a method is better if its non-dominated solutions are more diverse, that is, if they are spread more widely over the entire color range of all objectives. Therefore, the heat map visualization shows that OL-DEVND outperforms HDMMODE and INSGA_II in both convergence and diversity.

To show the convergence and diversity of the obtained non-dominated solutions more clearly, Fig. 6 plots, for instance “250-2-4”, the normalized heat map values of \({f_2}\) obtained by OL-DEVND, HDMMODE and INSGA_II over 30 experiments.

Fig. 6

Heat map normalization curves of different algorithms.

Fig. 6 shows that the normalized values of OL-DEVND are smaller than those of HDMMODE and INSGA_II in all 30 experiments, and that the curve of OL-DEVND varies over a wider range than those of HDMMODE and INSGA_II. This indicates that OL-DEVND achieves good convergence and diversity, and the best values obtained confirm its better global optimization performance.

Conclusion

To make VRP more realistic, this paper considers RVRP under four constraints: complex road network constraints, capacity constraints, time window constraints, and demand splitting constraints. To meet the route planning needs of different decision makers under different constraints, multiple equivalent optimal routes are sought. RVRP is therefore formulated as an MMOP, and a method combining DE and VND is designed. First, OL is introduced into the basic DE algorithm to broaden the search range of the population. Second, to address DE's tendency toward premature convergence and local optima, VND is embedded to enhance the population's search capability. Experiments on the RVRP datasets showed that the non-dominated solution sets obtained by this method were superior to those of HDMMODE and INSGA_II in terms of convergence and diversity. The method achieved the best overall performance among the compared methods, which verifies the effectiveness of the proposed OL-DEVND. OL-DEVND can obtain multiple equivalent optimal routes when solving RVRP, providing a reference for RVRP and its variants.

The OL-DEVND framework offers actionable solutions for real-world logistics and urban mobility challenges. For example, logistics firms (e.g., e-commerce and cold-chain companies) can integrate OL-DEVND into fleet management systems to adjust routes dynamically under fluctuating demand and traffic conditions. For customers with large orders, the algorithm's demand-splitting capability allows partial deliveries by multiple vehicles, maximizing load utilization. By optimizing routes under complex road network constraints, OL-DEVND may also help reduce peak-hour traffic density in urban corridors.

Future research should consider local search operators tailored to specific objectives to further improve the algorithm's performance. In addition, uncertainty in vehicle operation and delivery order, such as unplanned temporary deliveries, affects route planning. Therefore, the dynamic vehicle routing problem and the problem of finding multiple equivalent optimal routes will be studied further to improve robustness.