Introduction

To find optimal charging and discharging schedules, machine learning (ML) and artificial intelligence (AI) algorithms consider weather, grid load, pricing, and driving history [1]. This approach minimizes battery wear by optimizing the charge cycle, lowers charging costs by charging during periods of low demand, and stabilizes the grid by balancing demand spikes [2]. In a game theory-based dynamic pricing strategy, game-theoretic models are applied to reconcile EV owner preferences with grid demand [3]. EVs bid on charging and discharging slots based on incentives, price signals, and availability [4]. The advantages of this approach include encouraging EV drivers to discharge when the grid needs it most and recharge when it does not [5], deriving maximum economic benefit for EV owners and utilities, and avoiding grid overload by efficiently managing charging load [6]. A reinforcement learning (RL) component learns and adapts continuously to grid conditions, electricity prices, and user trends for optimal V2G planning [7]. By controlling power flows in real time and adjusting to unanticipated factors, such as abrupt changes in grid demand or fluctuations in renewable resources, these real-time adjustments optimize long-term energy savings for EV users and increase V2G efficiency [8]. Unlike the existing literature, which implements these methods separately, this investigation discusses the coordinated interaction of AI forecasting, game-theoretic pricing, and reinforcement learning. The novelty is a single V2G optimization pipeline with a clearly defined information flow and division of roles across the forecasting, pricing, and control layers.

Contributions

The major contributions are:

  • A hierarchical fusion of RL, AI forecasting, and game-theoretic pricing for V2G system control.

  • A pricing mechanism in which Nash equilibrium prices are established using artificial intelligence forecasts.

  • An incentive structure for pricing control that accounts for market conditions, with RL policies that adapt to dynamic prices.

Related works

To control EV charging trends and prevent grid overload, smart scheduling algorithms are required, as the widespread adoption of electric vehicles (EVs) strains the current power grid infrastructure [9]. E-mobility services based on AI techniques have been observed to enhance the performance of energy management systems within EV ecosystems, optimize charging strategies, and adapt to varying operating environments [10]. Machine learning methodologies explored in EV charging management include charging protocols, demand response, energy management, and the integration of renewable energy sources to enhance and accelerate charging operations [11]. Based on machine learning predictions, optimal charging schedules and habits can minimize fossil fuel consumption and environmental impacts [12]. Moreover, to reduce charging costs and enhance the effectiveness of charging systems, AI-inspired optimization methods have focused on scheduling, clustering, and prediction techniques [13].

EV integration into the power system has benefits and drawbacks for both system operators and electric vehicle owners [14]. In recent years, dynamic pricing systems based on game theory have attracted increased interest as a cost-efficient strategy for optimizing energy delivery and charging [15]. Such schemes reduce grid strain by promoting EV charging during off-peak hours, when demand is lower [16]. Game-theoretic formulations in V2G systems have modeled the interactions between grid operators and EV owners as non-cooperative games to optimize pricing policies and charging schedules, thereby avoiding grid congestion and maximizing system-level benefits [17]. Dynamic pricing mechanisms can reduce peak demand and charging prices through strategic interactions between grid operators and EV users [18]. As an additional means to enhance social welfare, stabilize pricing mechanisms, and achieve greater user satisfaction than a non-cooperative solution, cooperative game theory models have been proposed to facilitate coordination between grid operators and EV customers [19].

Power management and electric vehicle charging are two complex, dynamic systems that have attracted the attention of reinforcement learning (RL) researchers for optimizing decision-making [8]. RL-based charging methods are highly suitable in the smart grid setting, as they can adapt to varying grid demand, power pricing, and vehicle availability [20]. To minimize charging costs and meet battery energy demands under uncertainty, electric vehicle charging control problems have been formulated as Markov decision processes (MDPs) [21]. These MDPs are solved using deep reinforcement learning methods, such as the deep deterministic policy gradient (DDPG) and related algorithms [22]. To reduce operational costs while maintaining the EV's best performance, multi-agent deep reinforcement learning systems have proven successful; these methods employ centralized training with decentralized execution [23]. To address the limitations of distribution network capacity and high-power charging, deep reinforcement learning applied to real-time electric vehicle charging control can be considered a solution [24].

In the case of multi-microgrids with V2G technology, the authors proposed coordinated load-frequency control using an improved multi-agent deep deterministic policy gradient (MA-DDPG) algorithm. This strategy coordinates distributed EV aggregators in a dynamic operating environment and regulates the system frequency [25]. Simulation results reveal reduced frequency deviation and increased stability compared to traditional controllers. As shown in this paper, multi-agent reinforcement learning is effective in solving EV-related grid stability problems at massive scale.

Researchers proposed an optimal scheduling process for microgrids that utilizes V2G and deep Q-learning. Even when renewable generation and electric vehicle mobility are unpredictable, the proposed model-free approach can learn charge and discharge policies [26]. The outcomes are more adaptable and have lower operating costs than those of rule-based systems. The study shows that Q-learning methods enable smart grids through intelligent V2G scheduling.

A frequency control method for charging electric vehicles at charging stations in isolated microgrids used model predictive control in conjunction with virtual synchronous generator technology. This approach allows electric vehicle charging stations to comply with charging limitations while also providing rapid inertial support [27]. The simulation results showed lower oscillations and increased frequency stability. The paper demonstrates that AI-based optimization and improved control can ensure microgrid stability, even with high EV penetration.

A V2G scheduling model based on deep reinforcement learning (Soft Actor-Critic) was proposed to enable a multi-energy microgrid to dynamically coordinate electric vehicles to charge or discharge even when prices and demand are unknown. The approach trains the DRL agent to maximize microgrid profitability, subject to operational constraints, by formulating V2G planning as a Markov decision process [28]. Simulations show better performance, both economically and in terms of responsiveness, compared to static, rule-based schedulers. This study identifies the potential of state-of-the-art DRL algorithms for real-time grid optimization in complex V2G settings.

To address the issue of errors in solar power predictions, a journal article in Sustainability represents V2G operations as a Markov decision process solved with deep reinforcement learning. Given the state of charging stations and the uncertainty in solar output, the DRL agent adapts EV charging and discharging [29]. Comparison with more conventional techniques indicates that the proposed solution reduces the impact of forecasting errors while enhancing grid stability. This paper demonstrates how DRL can enhance V2G energy management in the face of renewable variability.

To achieve a balance between grid demands and user autonomy, the authors propose VESTA, a semantically aware, intelligent V2G management platform based on blockchain, edge computing, and artificial intelligence. The semantic model can enhance energy distribution efficiency by approximately 15% and reduce response times by 20%, while prioritizing key vehicles, such as emergency services, during high grid demand [30]. The architecture demonstrates that V2G coordination can be enhanced by integrating AI with decentralized technologies, extending beyond common optimization and pricing algorithms.

A more recent study proposes a spatial-temporal data fusion model grounded in large language models to enhance demand prediction for electric vehicle charging in smart grid settings [31]. By integrating heterogeneous temporal and spatial information, the framework forecasts charging behavior patterns better than traditional forecasting models. The findings show increased prediction accuracy in dynamic urban settings. The method demonstrates the potential of large language models for high-level energy demand forecasting and intelligent grid decision support.

Another related paper presents a closed-loop vehicle-to-vehicle charging scheme based on a non-cooperative game-theoretic model. The model captures competitive relationships among electric vehicles at charging stations and derives equilibrium-based charging policies [32]. Simulations demonstrate that charging efficiency and fairness are enhanced compared to centralized control mechanisms. The paper is useful for understanding decentralized energy exchange and pricing methodologies applicable to V2G systems.

Based on the literature, a summary of EV charging and energy management is provided in Table 1, followed in Table 2 by the research gaps identified along with the proposed solution for each.

Table 1 Summary of EV charging and energy management.
Table 2 Research gaps identified along with the proposed solutions.

Methodology used for smart charging for V2G optimization

This article presents three methods for smart charging in vehicle-to-grid optimization.

Artificial intelligence-based predictive charging

AI and ML models analyse historical driving patterns, weather forecasts, grid demand, and electricity prices to predict optimal charging and discharging schedules. Figure 1 shows the pictorial representation of Smart charging in E-vehicles.

Fig. 1 Pictorial representation of Smart Charging in E-Vehicles.

Algorithm 1 AI-Based Predictive Charging Algorithm.

In the case of EVs operating in a V2G environment, such a system maps an estimated charging schedule through artificial intelligence. The inputs to the trained AI models include projections of renewable energy, user driving behaviour, electricity costs, and past grid demand. These models predict the grid's demand, energy prices, and vehicle availability over the time frame specified in Algorithm 1. The system then uses these forecasts to evaluate all possible future time slots and determine the most efficient charging approach. Charging is done during periods of low grid demand and low electricity prices; discharging is done during periods of high demand and high prices, when the battery state of charge is above a specified threshold. In all other situations, the vehicle is kept idle to avoid unproductive operation. The charging controller executes the selected move, and the AI models are continually fed new information. Overall, the predictive technique enhances the operational stability of V2G networks, minimizes expenses, and decreases grid stress.

Battery Level Update Equation

$$B(t+1)=B(t)+\eta_{c}P_{c}\Delta t-\eta_{d}P_{d}\Delta t$$
(1)

where:

  • B(t) = Battery level at time t.

  • Pc​ = Charging power.

  • Pd​ = Discharging power.

  • ηc, ηd = Charging and discharging efficiencies.

  • Δt = Time interval.

Optimal Charging Cost Calculation

$$C_{charge}=\sum_{t\in T}P_{c}(t)\cdot P_{price}(t)$$
(2)

  • \(C_{charge}\) - Total charging cost.

  • \(P_{price}(t)\) - Electricity price at time t.

  • T - Set of optimal charging time slots.

Charging Decision Function

$$A(t)=\begin{cases}1 & \text{if }P_{price}(t)\text{ is low and }D_{grid}(t)\text{ is low}\\0 & \text{otherwise}\end{cases}$$
(3)
Fig. 2 Flow chart of Artificial Intelligence-Based Predictive Charging.

First, the system boots up and verifies real-time data collection. During data collection, IoT sensors gather battery SOC (state of charge), grid demand, electricity prices, user driving behaviour, and renewable energy availability. The data are then sent through MQTT or a cloud API to a central processing unit. Preprocessing and feature engineering ensure that the data are cleaned, normalized, and formatted for AI processing, as shown in Fig. 2. Time-based attributes (hour, day, month), electricity price trends, and consumption patterns are extracted. The AI prediction model, an LSTM neural network, analyses historical patterns to forecast the optimal charging time and predict when to charge, based on grid stability and cost minimization. If the prediction favours charging, the system issues a command to the EV to initiate charging. If charging is not ideal, the system waits and continues to look for an opportune moment. While charging is underway, the system tracks battery health and dynamically adjusts the charging speed. If charging is delayed, the model reassesses periodically. The system learns and improves continuously from real-time grid variability and user activity.
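The following minimal sketch, in Python, illustrates this decision loop: a threshold-based decision rule in the spirit of Eq. (3), the battery update of Eq. (1), and the cost accumulation of Eq. (2). The forecaster stubs, thresholds, and power ratings are illustrative assumptions rather than values from this study; in the actual system the forecasts would come from the trained LSTM models.

```python
ETA_C, ETA_D = 0.95, 0.95          # charging/discharging efficiencies
P_C, P_D, DT = 7.0, 7.0, 1.0       # powers (kW) and time step (h)
PRICE_LOW, DEMAND_LOW = 0.10, 4.0  # decision thresholds ($/kWh, kW) - assumed
B_MAX, B_RESERVE = 50.0, 15.0      # battery capacity and reserve (kWh) - assumed

def forecast_price(t):   # hypothetical stand-in for the LSTM price model
    return 0.08 + 0.02 * (t % 4)

def forecast_demand(t):  # hypothetical stand-in for the LSTM demand model
    return 3.0 + 1.5 * (t % 5)

def decide(price, demand, b):
    """A(t) from Eq. (3), extended with the discharge case described above."""
    if price <= PRICE_LOW and demand <= DEMAND_LOW and b < B_MAX:
        return "charge"
    if price > PRICE_LOW and demand > DEMAND_LOW and b > B_RESERVE:
        return "discharge"
    return "idle"

def step_battery(b, action):
    """Battery level update, Eq. (1)."""
    if action == "charge":
        return min(B_MAX, b + ETA_C * P_C * DT)
    if action == "discharge":
        return max(0.0, b - ETA_D * P_D * DT)
    return b

b, cost = 10.0, 0.0
for t in range(8):                    # 8-hour planning horizon
    price, demand = forecast_price(t), forecast_demand(t)
    action = decide(price, demand, b)
    if action == "charge":
        cost += P_C * DT * price      # Eq. (2): accumulate charging cost
    b = step_battery(b, action)
    print(t, action, round(b, 1), round(cost, 3))
```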

Game theory-based dynamic pricing strategy

This strategy uses game theory models to balance EV owner preferences with grid demand. EVs “compete” for charging and discharging slots based on incentives, price signals, and availability.

Algorithm 2 Game Theory-Based Dynamic Pricing Strategy Algorithm.

The system starts and gathers real-time information from the grid, EVs, and the electricity market. Smart meters and IoT sensors collect EV battery levels, grid demand, electricity prices, and user preferences, and transmit this information to the centralized V2G control system described in Algorithm 2. The players of the game are then identified: EV owners want to sell surplus energy at the best price, grid operators need to purchase energy at the lowest price, and the electricity market determines dynamic energy pricing. Crucially, either a real-time auction or a Nash equilibrium model is employed to match demand (grid operators) and supply (EV owners), and price incentives are dynamically adjusted based on energy requirements. After this decision-making, there are two scenarios for EV owners: if the price offered is profitable, the EV supplies energy to the grid; if the price is below cost, the EV waits for favourable market conditions. The market dynamically updates prices based on actual transactions, and the process continues for subsequent time slots, as shown in Fig. 3. A minimal sketch of one such auction round follows.
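The sketch below shows one hypothetical clearing round under a simple uniform-price auction assumption; the asking prices, energy offers, and price cap are invented for illustration, and the actual mechanism may instead compute a Nash equilibrium.

```python
def clear_market(evs, grid_demand_kwh, grid_price_cap):
    """Match EV supply to grid demand, cheapest asking prices first."""
    accepted, supplied = [], 0.0
    for ask, energy in sorted(evs):              # ascending ask ($/kWh)
        if supplied >= grid_demand_kwh or ask > grid_price_cap:
            break                                # grid will not pay above its cap
        take = min(energy, grid_demand_kwh - supplied)
        accepted.append((ask, take))
        supplied += take
    # Uniform clearing price: the marginal (highest accepted) ask
    price = max(a for a, _ in accepted) if accepted else None
    return price, accepted

evs = [(0.12, 10.0), (0.18, 8.0), (0.25, 12.0)]  # (ask $/kWh, kWh offered) - assumed
price, accepted = clear_market(evs, grid_demand_kwh=15.0, grid_price_cap=0.20)
# EVs whose asks exceed the clearing price (or the cap) wait for better conditions.
```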

Fig. 3 Flow chart of Game Theory-Based Dynamic Pricing Strategy.

Dynamic Pricing Strategy:

Bid Price Calculation for Charging

$$P_{Bid}^{Charge}=P_{Base}+\alpha\left(1-\frac{B_{Current}}{B_{Max}}\right)$$
(4)

\(P_{Bid}^{Charge}\) - Bid price for charging.

\(P_{Base}\) - Base electricity price.

\(B_{Current}\) - Current battery level.

\(B_{Max}\) - Maximum battery capacity.

\(\alpha\) - Weighting factor for bid adjustment.

Bid Price Calculation for Discharging

$$P_{Bid}^{Discharge}=P_{Base}+\beta\left(\frac{B_{Current}-B_{Min}}{B_{Max}}\right)$$
(5)

\(B_{Min}\) - Minimum required charge level.

\(\beta\) - Weighting factor for the discharging bid.

The weighting factors α and β represent the relative importance of grid demand stress and economic pricing incentives, respectively. These parameters are normalized coefficients constrained to α + β = 1, ensuring interpretability and stability of the bidding formulation.

Profit Calculation for EV Owners

$$Profit_{EV}=\sum_{t}\left[P_{sell}(t)\cdot E_{discharged}(t)-P_{buy}(t)\cdot E_{charged}(t)\right]$$
(6)

\(P_{sell}(t)\) - Selling price at time t.

\(E_{discharged}(t)\) - Energy discharged at time t.

\(P_{buy}(t)\) - Buying price at time t.

\(E_{charged}(t)\) - Energy charged at time t.
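A direct transcription of Eqs. (4)-(6) in Python follows; the numeric arguments in the usage lines are illustrative assumptions only.

```python
def bid_charge(p_base, alpha, b_current, b_max):
    """Charging bid, Eq. (4): bid more when the battery is emptier."""
    return p_base + alpha * (1.0 - b_current / b_max)

def bid_discharge(p_base, beta, b_current, b_min, b_max):
    """Discharging bid, Eq. (5): ask more when the surplus above B_Min is larger."""
    return p_base + beta * ((b_current - b_min) / b_max)

def ev_profit(transactions):
    """Owner profit, Eq. (6), over (P_sell, E_discharged, P_buy, E_charged) tuples."""
    return sum(ps * ed - pb * ec for ps, ed, pb, ec in transactions)

# Illustrative values only: base price $0.15/kWh, alpha + beta = 1
print(bid_charge(0.15, 0.4, b_current=20.0, b_max=60.0))
print(bid_discharge(0.15, 0.6, b_current=50.0, b_min=15.0, b_max=60.0))
print(ev_profit([(0.22, 10.0, 0.10, 6.0), (0.20, 5.0, 0.09, 4.0)]))
```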

Reinforcement learning-based adaptive control

The proposed method uses an RL algorithm to continuously learn and adapt to grid conditions, electricity tariffs, and user behaviour for optimal V2G scheduling. The key benefit of the system is real-time adaptation to unpredictable factors, such as sudden changes in grid demand or fluctuations in renewable energy. It maximizes long-term energy savings for EV owners and increases V2G efficiency by dynamically adjusting power flows.

Algorithm 3 Reinforcement Learning-Based Adaptive Control Algorithm.

First, the state and action spaces are defined and initialised: State = [Battery Level, Grid Demand, Electricity Price, Time] and Action = [Charge, Discharge, Idle]. In Algorithm 3, the RL agent explores different actions, learns from rewards and penalties, and updates the Q-table based on rewards and future state estimates. The key optimization control enforces the following.

  • High reward for charging when prices are low and demand is low (allow).

  • High reward for discharging when prices are high and demand is high (allow).

  • Penalties for inefficient charging or discharging (not allowed).

After training, the model continuously monitors the grid and EV state. It selects the best action based on learned policies. The process repeats periodically, ensuring dynamic V2G optimization as shown in Fig. 4.

Q-Learning Update Rule

$$Q(s,a)=Q(s,a)+\alpha\left(r+\gamma\max_{a'}Q(s',a')-Q(s,a)\right)$$
(7)

where:

Q(s, a) = Q-value for state s and action a.

α = Learning rate.

r = Reward for taking action a in state s.

γ = Discount factor for future rewards.

max Q(s′, a′) = Maximum Q-value for the next state s′.
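A tabular implementation of the update in Eq. (7) might look as follows; the state-space size, learning rate, and discount factor are illustrative assumptions.

```python
import numpy as np

# Tabular Q-learning update per Eq. (7). The discretised state indexes the
# [battery, demand, price, time] tuple described above; sizes and
# hyperparameters here are assumptions for illustration.

N_STATES, N_ACTIONS = 100, 3        # 3 actions: charge, discharge, idle
Q = np.zeros((N_STATES, N_ACTIONS))
ALPHA, GAMMA = 0.1, 0.95            # learning rate, discount factor

def q_update(s, a, r, s_next):
    """Move Q(s,a) toward r + gamma * max_a' Q(s',a'), as in Eq. (7)."""
    td_target = r + GAMMA * np.max(Q[s_next])
    Q[s, a] += ALPHA * (td_target - Q[s, a])

q_update(s=3, a=0, r=10.0, s_next=4)  # e.g. rewarded low-price charging
```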

Fig. 4 Flow chart of Reinforcement Learning-Based Adaptive Control.

Reward Function for Charging

$$R=\begin{cases}+10 & \text{if electricity price is low and grid demand is low (optimal charging)}\\-5 & \text{if charging during high demand}\end{cases}$$
(8)

Reward Function for Discharging

$$R=\begin{cases}+10 & \text{if electricity price is high and grid demand is high (optimal discharging)}\\-5 & \text{if discharging at the wrong time}\end{cases}$$
(9)
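The two reward cases of Eqs. (8)-(9) can be combined into a single function, as in the sketch below; the price and demand thresholds, and the zero reward for idling, are assumptions added for illustration.

```python
# Combined reward of Eqs. (8)-(9); thresholds are assumed values.
PRICE_LOW, PRICE_HIGH = 0.10, 0.20   # $/kWh
DEMAND_LOW, DEMAND_HIGH = 4.0, 7.0   # kW

def reward(action, price, demand):
    if action == "charge":           # Eq. (8)
        return 10.0 if (price <= PRICE_LOW and demand <= DEMAND_LOW) else -5.0
    if action == "discharge":        # Eq. (9)
        return 10.0 if (price >= PRICE_HIGH and demand >= DEMAND_HIGH) else -5.0
    return 0.0                       # idle: no explicit reward in Eqs. (8)-(9)
```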

Collaborative fusion mechanism

The proposed framework consists of three interrelated layers. First, an AI-based forecasting module anticipates grid demand and pricing patterns. These predictions constrain the game-theoretic pricing layer, ensuring that equilibrium solutions to the game remain consistent with the expected conditions of the system. The reinforcement learning agent then receives the resulting dynamic prices as part of its state and reward formulation, enabling adaptive charging decisions informed by market and forecast awareness.

Fig. 5 Collaborative fusion mechanism overview.

Figure 5 represents the fusion architecture and shows a hierarchical coordination scheme for vehicle-to-grid energy management. The predictive module is an AI-based forecasting tool that predicts future demand and electricity price trends in the grid. These predictions constrain the game-theoretic pricing layer, where equilibrium prices are fixed to satisfy the expected system conditions. The dynamically generated prices are then fed into the reinforcement learning controller, which selects real-time charging, discharging, or idle actions. This systematic flow of information turns the individual techniques into a common decision-making system.

$$a_{t}^{*}=\arg\max_{a}Q(S_{t},P_{t}^{GT},\hat{D}_{t},a)$$
(10)

In Eq. (10), \(a_{t}^{*}\) denotes the optimal charging action selected at time \(t\). The state \(S_{t}\) includes the battery state of charge and temporal information. \(P_{t}^{GT}\) represents the dynamic electricity price obtained from the game-theoretic pricing model, while \(\hat{D}_{t}\) is the AI-predicted grid demand. By embedding both the market price and the forecasted demand into the \(Q\)-function, the reinforcement learning agent makes decisions that are economically efficient and grid-aware.
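To make the information flow concrete, the sketch below selects an action per Eq. (10) using a toy linear Q-function; the feature scaling, random weights, and the name q_net are assumptions for illustration, standing in for the trained agent.

```python
import numpy as np

# Action selection per Eq. (10): Q-values conditioned on the game-theoretic
# price P_t^GT and the AI-forecast demand D_hat.

ACTIONS = ["charge", "discharge", "idle"]
rng = np.random.default_rng(0)
W = rng.normal(size=(4, len(ACTIONS)))   # toy linear Q-function weights

def q_net(soc, hour, p_gt, d_hat):
    features = np.array([soc / 100.0, hour / 24.0, p_gt, d_hat / 10.0])
    return features @ W                  # one Q-value per action

def select_action(soc, hour, p_gt, d_hat):
    """a*_t = argmax_a Q(S_t, P_t^GT, D_hat_t, a)."""
    return ACTIONS[int(np.argmax(q_net(soc, hour, p_gt, d_hat)))]

a = select_action(soc=55.0, hour=18, p_gt=0.21, d_hat=8.2)
```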

System model and assumptions

The proposed V2G framework is evaluated using a simulation-based system model that represents interactions between the power grid, electric vehicle fleets, and renewable energy sources. Due to limited access to real-time grid operation data, simulation enables controlled assessment under diverse operating conditions while ensuring reproducibility. Parameter ranges are selected to reflect per-vehicle energy contributions and scalable aggregation behavior consistent with engineering practice. Renewable energy availability is modeled as a stochastic input that affects supply constraints and pricing signals, reflecting the intermittent nature of renewable generation.

Scenario-based sensitivity analysis

To evaluate robustness and generalizability, the proposed framework is tested under varying EV fleet sizes and load conditions. This analysis ensures that observed performance trends are not restricted to a single operating configuration.

Fig. 6
Fig. 6
Full size image

Scenario-based sensitivity analysis.

Figure 6 illustrates the performance of the proposed V2G framework under varying EV fleet sizes, evaluating its robustness and scalability. Charging cost and peak load metrics are compared across different participation levels to assess sensitivity to system scale. The consistent reduction in both metrics as fleet size increases indicates effective coordination between forecasting, pricing, and control layers. These results demonstrate that the proposed framework maintains stable and predictable behavior across diverse operating conditions, supporting the generalizability and engineering relevance of the simulation-based evaluation.

To assess robustness, α and β are varied across representative ranges, and their effects on charging costs and peak load are evaluated. This analysis confirms that the pricing mechanism remains stable and effective across reasonable parameter selections.
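A sweep of this kind can be scripted directly, as in the sketch below; the simulate stub and its U-shaped cost curve are hypothetical placeholders for a full simulation run under the constraint beta = 1 - alpha.

```python
import numpy as np

def simulate(alpha):
    # Hypothetical stand-in for one full simulation run; the U-shaped
    # charging-cost response is invented so the example runs standalone.
    return 50.0 + 30.0 * (alpha - 0.5) ** 2

for alpha in np.linspace(0.1, 0.9, 9):
    beta = 1.0 - alpha                    # normalization constraint
    print(f"alpha={alpha:.1f} beta={beta:.1f} cost={simulate(alpha):.2f}")
```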

Baseline methods for comparison

To evaluate the effectiveness of the proposed framework, comparisons are conducted against representative state-of-the-art (SOTA) and benchmark V2G optimization methods. These include learning-based, game-theoretic, and heuristic optimization approaches commonly adopted in recent literature.

The comparison baselines include:

  (i) Transformer-based load and price prediction,

  (ii) Multi-agent deep reinforcement learning (MADRL),

  (iii) Distributed game-theoretic pricing optimization,

  (iv) Particle swarm optimization (PSO), and

  (v) Genetic algorithm (GA).

All methods are evaluated using unified metrics, including average charging cost, the grid peak–valley difference, and the EV user satisfaction index, ensuring a fair and consistent performance comparison.

Comparative performance evaluation

Fig. 7 Comparative Performance Evaluation.

Figure 7 presents a quantitative comparison between the proposed fusion framework and representative SOTA and benchmark optimization methods. Charging cost and peak–valley difference are used as unified evaluation metrics. While learning-based and heuristic methods demonstrate moderate optimization capability, the proposed framework consistently achieves lower charging costs and improved load balancing. This improvement is attributed to coordinated forecasting, pricing, and adaptive control. The results confirm that the proposed approach outperforms existing standalone and heuristic-based solutions under identical evaluation conditions.

All baseline methods are evaluated under identical simulation conditions and input constraints to ensure fairness. Parameter settings follow commonly adopted values reported in the literature.

Analysis of smart charging algorithm

The simulated data capture charging decisions, price fluctuations, battery levels, and market conditions over time. Below are example datasets for each method, which can be used to plot the relevant graphs.

Artificial intelligence-based predictive charging

This method predicts the optimal charging time based on grid conditions, EV battery state of charge, and electricity prices, as shown in Table 3; a sketch of how such data can be synthesised follows the list below.

Example Data:

  • Battery Level (SOC): 0–100%.

  • Grid Demand: 0–10 kW.

  • Electricity Price: $0.05 - $0.30 per kWh.

  • Charging Decision: 1 (charge), 0 (wait).
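The following is a minimal sketch of how an example dataset with these ranges could be generated; the random distributions and the decision thresholds are assumptions for illustration, not the study's actual data-generation procedure.

```python
import numpy as np

rng = np.random.default_rng(42)
hours = np.arange(8)
soc = np.clip(20 + 10 * hours + rng.normal(0, 3, 8), 0, 100)   # SOC, 0-100 %
demand = rng.uniform(0, 10, 8)                                  # grid demand, kW
price = rng.uniform(0.05, 0.30, 8)                              # $/kWh
decision = ((price < 0.15) & (demand < 5)).astype(int)          # 1 = charge, 0 = wait
```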

Table 3 AI-optimized EV charging data.

Fig. 8 Simulation result of Artificial Intelligence-Based Predictive Charging.

The graph depicts an AI-driven predictive charging approach for an EV over 8 h. The battery charge rises steadily from 20% to almost 100%, indicating regulated charging. The charging choice alternates between charging (1) and waiting (0) in a cycle, most probably to keep energy usage efficient, keep expenses low, and prolong battery life. This periodic charging method keeps grid demand within limits, maintains battery health at optimal levels, and provides a full charge within the prescribed time limit. The outcomes indicate that AI plays a key role in improving the efficiency and effectiveness of EV charging stations while addressing issues such as grid strain, energy management, and environmental concerns, as shown in Fig. 8.

The modeled grid demand range (0–10 kW) represents the effective power contribution of an individual EV rather than the total feeder load. When aggregated across multiple vehicles, this abstraction corresponds to realistic distribution-level demand scales.

Feature selection rationale

The selected features (time attributes, electricity price trends, historical demand) are chosen based on their demonstrated influence on charging behavior and grid load variability reported in prior V2G studies. Time-related features capture periodic demand patterns, while electricity price trends reflect market-driven charging incentives. Historical demand provides temporal correlation, which is essential for short-term forecasting. Feature relevance is quantitatively validated using normalized importance scores derived from model sensitivity analysis, in which prediction performance is evaluated after removing individual features. Results indicate that time attributes and price trends contribute most significantly to forecasting accuracy.
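The ablation-style sensitivity analysis described above can be sketched as follows; evaluate_model is a hypothetical stub returning a validation error, and the numbers inside it are invented purely so the example runs standalone.

```python
# Evaluate the model with one feature group removed and treat the
# resulting error increase as that feature's importance.

FEATURES = ["time_attributes", "price_trend", "historical_demand"]

def evaluate_model(subset):
    contribution = {"time_attributes": 0.9, "price_trend": 0.8,
                    "historical_demand": 0.5}
    return 2.0 - 0.5 * sum(contribution[f] for f in subset)  # toy validation error

full_error = evaluate_model(FEATURES)
scores = {}
for f in FEATURES:
    reduced = [g for g in FEATURES if g != f]
    scores[f] = evaluate_model(reduced) - full_error   # error increase
total = sum(scores.values())
normalized = {f: round(s / total, 2) for f, s in scores.items()}
print(normalized)   # normalized importance scores
```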

Fig. 9 Feature selection analysis.

Figure 9 illustrates the impact of varying the weight factor α on charging cost performance. Results show that moderate weighting between grid demand and pricing incentives yields optimal outcomes, while extreme weighting leads to diminished performance. This confirms that the bidding mechanism is robust to parameter selection and does not rely on fine-tuned values, thereby enhancing reproducibility and practical applicability.

Game theory-based dynamic pricing strategy

This strategy uses dynamic pricing based on supply-and-demand interactions among EV owners, grid operators, and the electricity market, as shown in Table 4.

Example Data:

  • EV Participation: Number of EVs that discharge to the grid.

  • Market Price: Dynamic price of electricity per kWh.

  • Grid Demand: Amount of energy requested by the grid.

  • Profit (EVs): Profit gained from selling energy.

Table 4 Game theory-optimized EV charging data.

Fig. 10 Simulation result of Game Theory-Based Dynamic Pricing.

Dynamic pricing strategies for EV charging based on game theory have been investigated in different models to optimize the interaction between EV owners and grid operators. The models are designed to balance the grid, minimize operating costs, and optimize energy distribution. Nash equilibrium, cooperative game theory, and Stackelberg games are effective in maximizing pricing efficiency and load management. As EV adoption increases, these policies will become even more important for integrating electric cars into the electricity grid and enhancing sustainable energy management. The graph illustrates a game theory-based dynamic pricing policy for EV charging. The top subplot shows the market price ($/kWh) increasing steadily over time, reflecting a dynamic pricing system in which prices increase in steps. The lower subplot shows EV profit ($), which at first rises, peaks at about 4–5 h, and then drops, indicating an optimal charging interval for profit maximization. The strategy aims to balance supply and demand, maximize revenue, and promote strategic charging behaviour, as shown in Fig. 10. The number of participating EVs is allowed to vary over time, capturing the stochastic arrival and departure behavior observed in real-world charging environments.

Reinforcement learning-based adaptive control

This approach uses reinforcement learning to determine optimal charging or discharging actions based on the state (battery level, grid demand, etc.), as shown in Table 5.

Example Data:

  • Battery Level (SOC): 0–100%.

  • Action: 1 (charge), 0 (discharge), 2 (wait).

  • Reward: Positive for beneficial actions, negative for poor actions (e.g., overcharging, undercharging).

Table 5 Reinforcement learning-optimized EV charging data.

The graph shows the Reinforcement Learning-Based Adaptive Control strategy for electric vehicle charging. The upper subplot shows the decision-making for actions (charging, discharging, or waiting) changing over time, which implies an adaptive strategy for energy management. The middle subplot shows a steady increase in battery level (%), indicating controlled charging. The bottom subplot shows the reward function, which dynamically adapts to the actions taken, optimizing efficiency and performance; the reward acts as a measure that balances power consumption, cost, and system stability. The empirical studies presented clearly show the multifaceted value and efficiency of RL, from Q-learning through DRL, in achieving energy delivery efficiency, operational cost savings, and grid stabilization. RL models enable the adjustment of real-time charging schedules based on grid status, energy pricing, and vehicle demand, an aspect that becomes increasingly important as EV adoption grows and grid control becomes more complex. These findings clearly show that RL has the potential to be a strong component of future smart grid solutions and energy management for EVs, as shown in Fig. 11.

Fig. 11 Simulation result of Reinforcement Learning-Based Adaptive Control.

The three methods, AI-Based Predictive Charging, Game Theory-Based Dynamic Pricing, and Reinforcement Learning-Based Adaptive Control, while distinct in their approaches, share several key similarities and correlations when applied to the optimization of EV charging, grid management, and energy distribution.

The reward values are normalized to reflect relative operational preference rather than absolute monetary gain. This normalization ensures stable policy convergence and prevents reward saturation, a common issue in Q-learning-based energy management systems.

While the core state space focuses on battery level, grid demand, electricity price, and time, user travel urgency and battery aging costs are implicitly incorporated through charging constraints and reward penalties, ensuring tractable state dimensionality without compromising decision relevance.

$$R_{t}=-\alpha p_{t}E_{t}-\beta d_{t}-\gamma C_{aging}+\delta U_{t}$$
(11)

The reward function \(R_{t}\) in Eq. (11) balances economic cost \(p_{t}E_{t}\), grid stability \(d_{t}\), battery aging \(C_{aging}\), and user satisfaction \(U_{t}\). The weighting coefficients \(\alpha\), \(\beta\), \(\gamma\), and \(\delta\) regulate trade-offs between system-level and user-centric objectives.

The multi-objective reward formulation enables the RL agent to balance competing operational goals. Economic efficiency and grid stability are prioritized through cost and demand penalties, while battery aging cost discourages excessive cycling. User satisfaction is encouraged via timely charging completion. Weight normalization ensures stable learning and prevents dominance of any single objective. This design aligns with practical V2G operation, where economic, technical, and user-centric factors must be jointly optimized.
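Eq. (11) translates directly into code, as below; the weight values are illustrative assumptions chosen only to satisfy the normalization described above.

```python
# Direct transcription of Eq. (11); weights sum to 1 per the text above.
ALPHA, BETA, GAMMA_W, DELTA = 0.4, 0.3, 0.2, 0.1   # assumed weights

def reward_t(price, energy, demand, aging_cost, satisfaction):
    """R_t = -alpha*p_t*E_t - beta*d_t - gamma*C_aging + delta*U_t."""
    return (-ALPHA * price * energy
            - BETA * demand
            - GAMMA_W * aging_cost
            + DELTA * satisfaction)

r = reward_t(price=0.18, energy=7.0, demand=6.5, aging_cost=0.5, satisfaction=0.9)
```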

Unified performance evaluation metrics

To ensure objective comparison, all methods are evaluated using unified quantitative metrics, including:

  (i) charging cost reduction rate,

  (ii) grid peak–valley difference mitigation ratio, and

  (iii) EV user satisfaction index.

These metrics are computed directly from simulation outputs rather than subjective scoring.

Comparative performance evaluation

Table 6 Performance evaluation.

Charging cost reduction is defined as the percentage reduction in charging costs relative to uncontrolled charging. Peak–valley difference reduction measures the load smoothing of base grid demand, as analysed in Table 6. The mean EV waiting time indicates the level of charging convenience for users. All metrics are generated under the same simulation settings and averaged across multiple runs to ensure consistency and reproducibility.
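The first two metrics could be computed as in the following sketch; the cost figures and load profiles are hypothetical simulation outputs used only to show the calculation.

```python
import numpy as np

def cost_reduction_rate(cost_controlled, cost_uncontrolled):
    """Percentage cost reduction relative to uncontrolled charging."""
    return 100.0 * (cost_uncontrolled - cost_controlled) / cost_uncontrolled

def peak_valley_mitigation(load_baseline, load_controlled):
    """Percentage reduction of the peak-valley difference vs. the baseline."""
    pv = lambda load: float(np.max(load) - np.min(load))
    return 100.0 * (pv(load_baseline) - pv(load_controlled)) / pv(load_baseline)

baseline = np.array([3.0, 2.5, 6.0, 9.5, 8.0, 4.0])     # kW, baseline profile
controlled = np.array([4.0, 3.5, 5.5, 7.5, 6.5, 5.0])   # kW, with V2G control
print(cost_reduction_rate(42.0, 55.0), peak_valley_mitigation(baseline, controlled))
```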

Below is an exploration of the correlation between these methods:

Common goal: optimization of EV charging and grid management

All three approaches are designed to maximize energy distribution efficiency and minimize EV charging operational costs. Each solution targets different charging dimensions—timing, pricing, and real-time adjustment. AI-Based Predictive Charging anticipates the best times to charge based on past and real-time data to optimize charging schedules. Game Theory-Based Dynamic Pricing optimizes pricing strategies between grid operators and EV owners, influencing when and how much energy EV owners charge. Reinforcement Learning-Based Adaptive Control uses real-time feedback to learn optimal charging and discharging strategies.

Data-driven decision-making

All three methods rely on data-driven decision-making for optimizing charging. They use real-time data from the grid, energy prices, and vehicle statuses, enabling them to adjust charging behaviour dynamically. AI-Based Predictive Charging uses machine learning or AI algorithms to predict future charging patterns based on historical and current data. Game Theory-Based Dynamic Pricing relies on real-time information regarding grid load and pricing to optimize the charging decisions for both EV owners and grid operators. Reinforcement Learning-Based Adaptive Control continuously adapts charging strategies by learning from real-time feedback from the grid and charging behaviour.

Interaction between EV owners and grid operators

These approaches are structured around the interaction between EV drivers and grid operators to achieve optimal charging schedules and grid stability. The distinction lies in how these interactions are modelled and controlled. AI-based predictive charging primarily optimizes the charging behavior of individual EVs based on forecasted data, without explicitly modelling interactions with the grid. Game Theory-Based Dynamic Pricing explicitly simulates the non-cooperative or cooperative interactions among EV owners (as consumers) and grid operators (as service providers) to optimize pricing strategies. Reinforcement Learning-Based Adaptive Control employs an adaptive feedback mechanism in which the system (RL agent) adapts based on interactions between the grid and the charging station, thereby maximizing charging efficiency iteratively.

Balancing load and reducing peak demand

All three methods help alleviate grid congestion and peak demand, but they do so differently. AI-Based Predictive Charging forecasts when demand will peak and schedules EV charging for low-demand times, without adding further load to the grid. Game Theory-Based Dynamic Pricing employs dynamic pricing to encourage EV owners to charge their vehicles at low-demand times, balancing supply and demand and reducing grid load. Reinforcement Learning-Based Adaptive Control adapts the charging schedule in real time, learning when it is optimal to charge based on grid load and energy prices.

Adaptability and flexibility

AI-Based Predictive Charging is very effective at predicting charging patterns but is static, as it uses past data to anticipate future charging behaviour. Game Theory-Based Dynamic Pricing is more dynamic in its response, adapting faster to sudden grid status changes and rapidly changing conditions through pricing adjustments. Reinforcement Learning-Based Adaptive Control is the most proactive of the three, as it learns and adjusts continuously to real-time information and changes, tailoring the charging process to optimize based on ongoing interactions with the grid.

Real-time decision making vs. long-term predictions

AI-Based Predictive Charging is largely concerned with long-term forecasting based on past trends and models. Game Theory-Based Dynamic Pricing is more concerned with short-term, real-time pricing strategy decisions for grid management and customer behaviour. Reinforcement Learning-Based Adaptive Control concerns real-time decision-making and ongoing optimization, learning from feedback and adapting, as shown in Table 7.

Table 7 Correlation summary Table.

AI-Based Predictive Charging is best suited for situations where charging behaviour can be predicted based on historical data. It’s more effective in stable environments but lacks real-time adaptability. Game Theory-Based Dynamic Pricing is best for optimizing pricing and guiding consumer behaviour. It works well for balancing demand and incentivizing off-peak charging, and it is easier to integrate into existing systems. Reinforcement Learning-Based Adaptive Control is most suitable for highly dynamic and complex systems that require continuous, real-time adjustments to optimize charging behaviour. It is computationally expensive but offers the highest flexibility and adaptability, as shown in Table 7.

Table 8 Data considered for smart charging in electric vehicles.

The smart charging framework's decision-making considers grid demand, power price, battery state of charge (SOC), renewable energy availability, user flexibility, and normalized internal scores (Table 8). These scores are ordinal, derived by threshold-based standardization rather than raw physical measures. All values are expressed as multiples of five for readability, robust aggregation of varied parameters, and low sensitivity to minor swings. This level of discretization facilitates stable optimization and learning in the AI and RL modules while maintaining decision correctness. The sensitivity analysis shows that decision trends and comparison results are not affected by the resolution of the scores; hence, the results can be relied on.

Fig. 12 Comparison of charging algorithms based on correlation factors.

Figure 12 indicates that the normalized internal decision scores differ across conditions and operational time spans. The values shown are not real-world physical quantities but rather the relative influence of grid demand, power prices, and battery condition on charging decisions. The discrete scoring method is applied to examine the system's dynamics, as it emphasizes consistent trends and comparisons. The selected scale minimizes sensitivity to noise and enhances the stability of AI-based decision-making, though finer granularity may increase numerical accuracy. The findings are sound and reliable, irrespective of the level of discretization adopted, because the charging patterns and transitions observed are similar across all levels.

Fusion evaluation

The performance evaluation focuses on the coordinated operation of the proposed fusion framework rather than isolated optimization techniques.

Fig. 13 Fusion evaluation analysis.

Figure 13 compares the performance of individual optimization methods with the proposed fusion framework. Charging cost and peak load metrics are shown for AI-only, game-theory-only, reinforcement-learning-only, and fused approaches. The fusion framework consistently achieves lower charging costs and reduced peak demand, demonstrating the benefit of coordinated decision-making. The results confirm that integrating forecasting, pricing, and adaptive control yields superior system-level performance compared to applying each technique independently.

While public datasets provide valuable historical insights, they lack closed-loop interaction between the forecasting, pricing, and control layers. Therefore, simulation is employed to enable the dynamic coordination that is essential to the proposed framework. All parameters, weight ranges, and evaluation conditions are explicitly defined to ensure reproducibility, and the sensitivity analyses demonstrate robustness to parameter variation, supporting scientific rigor and repeatability.

Mechanistic interpretation and applicability analysis

Reinforcement learning behavior analysis

The strong real-time flexibility of the RL-based approach stems from its online policy update mechanism, which maps system states to actions without requiring global optimization. In contrast to optimization-based approaches, RL responds instantly to fluctuations in price and demand, adapting quickly to stochastic grid and user behavior. This feature makes RL especially well suited to real-time charging management in dynamic systems.

Economic interpretation of game-theoretic pricing

EV profit rises at the beginning of the game, when dynamic pricing encourages participation while demand and price are favorable. Nonetheless, greater participation intensifies competition among EVs, driving price convergence and lowering marginal profits. This is a classical equilibrium saturation effect, in which an increase in the number of participants dilutes individual gains, consistent with the laws of non-cooperative game theory.

Predictive model effect on system behavior

The AI-based predictive charging method enhances system efficiency by shifting demand to low-price, low-load periods. Its performance, however, depends on the accuracy and stability of forecasts over time. Prediction errors or unexpected changes in demand may degrade performance, underscoring the method's reliance on the past rather than the present.

Scenario-specific applicability

Table 9 summarizes the appropriateness of each approach for representative V2G deployment scenarios, such as urban distribution networks and highway fast-charging stations.

Table 9 Scenario-specific applicability analysis.

Predictive charging based on AI is most efficiently applied in settings where demand patterns remain constant, e.g., residential grids. Game-theoretic pricing is effective in market-oriented situations where the number of rational participants is large. RL-based control is effective in highly dynamic environments that require quick adaptation, as in highway charging stations. These limits emphasize that there are no general-purpose approaches that work in every situation and that a coordinated or hybrid strategy is needed.

Quantitative summary

RL is more adaptable, but its training is more complex. Game-theoretic approaches are economically interpretable but sensitive to equilibrium assumptions. AI-based predictive models minimize peak load, but this depends on forecast precision. These trade-offs reflect the functional strengths and shortcomings of each approach. The observed behaviors align with the engineering constraints of real V2G systems, as latency, user behavior variability, and infrastructure constraints directly affect the effectiveness of the methods. Quantitative evaluation shows that game-theoretic dynamic pricing achieves the greatest reduction in charging costs, highlighting its effectiveness in economic optimization. Reinforcement learning-based control achieves superior reduction of peak–valley differences and lower EV waiting times through its adaptive decision-making. AI-based predictive charging delivers stable but moderate performance, particularly under predictable demand conditions. These results confirm that each method exhibits distinct strengths aligned with specific optimization objectives rather than universal superiority.

Reproducibility and evaluation transparency

All simulation parameters, evaluation metrics, and comparison conditions are explicitly defined to support reproducibility. Each experiment is conducted under identical system settings, and reported results represent averaged outcomes over multiple simulation runs to reduce stochastic bias. Unified quantitative metrics are used across all compared methods to ensure fair and transparent performance evaluation. This design allows independent researchers to replicate the experimental setup and verify the reported conclusions.

Conclusion

This research integrates three key technologies, AI forecasting, game-theoretic pricing, and reinforcement learning control, into a unified V2G energy management system. Unlike existing approaches that use these methods separately, our framework allows them to share information across layers, making collective decisions more efficient. Our comparative evaluation demonstrates that the framework outperforms existing state-of-the-art methods. AI models predict grid conditions to enable better scheduling, game theory optimizes pricing to balance supply and demand, and reinforcement learning adjusts charging in real time based on these prices. Together, these three methods reduce costs, improve grid stability, and enable sustainable EV integration. As EV adoption increases, our hierarchical approach combines the strengths of game theory, reinforcement learning, and forecasting to create a practical, cost-effective solution for large-scale EV integration.

Although the proposed framework demonstrates effective performance under simulated V2G scenarios, several limitations remain. First, the evaluation relies on simulation-based data rather than real-world grid operation datasets, which may not capture all infrastructure constraints. Second, the reinforcement learning model does not explicitly model battery aging dynamics or long-term user behavior. Third, coordination among multiple charging stations is not considered. Future work will focus on real-world deployment, enhanced battery degradation modeling, and large-scale multi-station coordination.