Introduction

Extreme climatic events have become a major global concern because of their escalating frequency and magnitude under a changing climate. In recent decades, many regions have experienced severe socio-economic and environmental disruptions from hydro-meteorological extremes such as floods, droughts, and heatwaves1,2,3. Anticipated changes in global climate patterns suggest that the severity of such events is likely to increase under future warming scenarios, emphasizing the importance of reliable climate projections at regional and local scales4,5. Such projections require modelling frameworks capable of capturing large-scale climate dynamics. Global Climate Models (GCMs) developed under the framework of the World Climate Research Programme (WCRP) are a key tool for understanding the Earth’s climate system and its response to anthropogenic activities6,7,8. These models simulate complex atmospheric, oceanic, and land-surface processes based on physical and empirical relationships. The development and evaluation of GCMs are coordinated through the Coupled Model Intercomparison Project (CMIP), which provides standardized experimental protocols across multiple phases. The latest phase, CMIP6, released in 2020, incorporates improved model physics, higher spatial resolution, and updated socio-economic scenarios compared with earlier versions9. Despite these advancements, however, substantial uncertainties persist in reproducing regional climatic characteristics10,11,12. These discrepancies arise from the coarse spatial resolution, simplified assumptions, and computational limitations inherent to GCMs13,14. Consequently, direct use of GCM outputs for impact assessment at finer scales is discouraged, necessitating downscaling techniques to bridge the gap between global simulations and regional studies. 
To overcome these limitations, downscaling and bias-correction techniques are employed to translate large-scale GCM outputs into finer-resolution, region-specific data. Downscaling techniques are broadly categorized as dynamical or statistical. Dynamical downscaling uses regional climate models (RCMs) nested within GCMs to explicitly simulate local atmospheric processes15,16, whereas statistical downscaling develops empirical relationships between large-scale predictors and local observations to generate finer-scale projections17,18. Of the two, statistical downscaling is often preferred for precipitation analysis because of its lower computational demand and its ability to correct systematic model biases19,20. The downscaled and bias-corrected CMIP6 precipitation dataset21, available at 0.25° × 0.25° spatial resolution, has been widely used for hydrological and extreme-event analysis across India. Nevertheless, uncertainties remain even after bias correction and downscaling, because structural differences and boundary conditions among GCMs, together with emission-scenario uncertainty, lead to divergent regional projections22,23. To reduce these uncertainties, multi-model ensemble (MME) strategies are often adopted, integrating multiple GCMs to provide a more robust estimate of climate variables7,24. However, including all available models in an ensemble does not necessarily improve performance. Recent studies emphasize the need to identify an optimal subset of models, excluding underperforming ones, to enhance the reliability of projections25,9,26. Model selection and ranking are therefore essential for credible regional projections. Two strategies are typically used: the past-performance approach, which evaluates models on their ability to replicate observed historical conditions, and the envelope approach, which prioritizes the diversity of future projections27,28. 
To systematically assess model performance, researchers have applied a wide range of statistical indicators and multi-criteria decision-making (MCDM) methods that integrate multiple performance metrics into a single ranking framework. Techniques such as VIKOR, TOPSIS, Simple Additive Weighting (SAW), and Compromise Programming (CP) have been widely employed to rank climate models based on precipitation and temperature indices7,8,29. Among the various weighting strategies, the CRITIC method has shown strong potential for assigning objective weights by accounting for both contrast intensity and inter-criterion correlation9,8.

Research significance

The Kosi River Basin, located in the eastern Indo-Gangetic plains and fed by the Himalayas, is characterized by extreme rainfall variability and recurrent flooding30. The river’s dynamic morphology, coupled with intense monsoonal precipitation, has historically earned it the title “Sorrow of Bihar”2. The basin supports millions of inhabitants who rely on agriculture and is therefore highly vulnerable to even moderate changes in precipitation extremes31. Even a modest shift in rainfall intensity can trigger widespread inundation, sediment deposition, and socio-economic impacts, including loss of life, agricultural damage, and displacement of local communities; this heightens the importance of reliable projections of precipitation extremes for flood-risk management and climate-adaptation planning. Despite this high hydro-climatic vulnerability, very few studies have explicitly evaluated model performance on individual extreme climatic indices or explored how optimal ensemble subsets improve the representation of regional precipitation extremes. Furthermore, comprehensive comparisons integrating multiple MCDM approaches with diverse weighting schemes remain limited. Such integrative assessments are crucial for identifying the downscaled, bias-corrected CMIP6 models that most reliably capture precipitation variability at the basin scale. The present study therefore aims to (i) evaluate and rank downscaled CMIP6 GCMs over the Kosi Basin using a CRITIC-MCDM framework integrated with performance statistics, (ii) identify an optimal ensemble subset that minimises uncertainty and improves the simulation of precipitation extremes, and (iii) project future changes in precipitation extremes under the SSP245 and SSP585 scenarios using the optimal ensemble and evaluate their hydrological implications for the Kosi Basin.

Study area

The Kosi River Basin is one of the major transboundary basins of the Eastern Himalayas, shared between Nepal and India. It extends from the high-altitude regions of southern Tibet and central Nepal to the alluvial plains of north Bihar. The main Kosi River flows through the districts of Supaul, Saharsa, Madhepura, Khagaria, and Purnea in Bihar, eventually merging with the Ganga near Kursela32. The basin’s climate ranges from humid subtropical to alpine, transitioning from the snow-dominated Himalayan headwaters to the monsoon-dominated plains. The hydrological regime of the Kosi River is characterized by large seasonal variations, with extremely high discharge during the monsoon months and low flow during the dry season33,2. Sediment yield is exceptionally high, primarily because of the weak geological structure of the Himalayas, rapid erosion, and extensive deforestation in the upstream region. The average annual sediment load has been estimated to exceed 53 million tonnes, leading to rapid aggradation and shifting of the river course in the downstream floodplains31. These geomorphic and hydrological processes are responsible for the frequent embankment breaches and large-scale flooding that have historically affected millions of people in Bihar. For instance, the 2008 Kosi flood caused widespread displacement and severe economic losses across northern Bihar, underscoring the basin’s hydrological vulnerability34. Given its dependence on monsoon rainfall and its sensitivity to climate variability, the Kosi Basin serves as a natural laboratory for assessing the potential impacts of climate change on precipitation extremes and water-related hazards. In addition to its hydrological importance, the basin holds immense socio-economic significance, with the majority of the population dependent on agriculture35. The fertile floodplain supports intensive cultivation and sustains rural livelihoods, forming the backbone of the local economy. 
Consequently, even modest shifts in rainfall intensity or duration can have disproportionate impacts on food security, infrastructure, and community resilience in the region. The location of the Kosi River Basin, along with its grid points, is illustrated in Fig. 1.

Fig. 1

Location of the Kosi River Basin showing assessment grid points, river, and drainage network, generated using ArcGIS Desktop Version 10.3 (https://www.esri.com/en-us/arcgis/products/arcgis-desktop/overview).

Data and methods

Datasets

Observational dataset

The ERA5 reanalysis dataset, developed by the European Centre for Medium-Range Weather Forecasts (ECMWF), was used as the reference observational dataset to evaluate the performance of the historical CMIP6 simulations over 1985–2014. ERA5 provides hourly atmospheric variables at a spatial resolution of 0.25° × 0.25°, assimilating surface, satellite, and upper-air observations. ERA5 was selected owing to its high spatio-temporal resolution and its consistency with observed rainfall patterns over the Indian region36.

CMIP6 climate models

Historical and projected precipitation data were obtained from the Coupled Model Intercomparison Project Phase 6 (CMIP6) archive. Thirteen Global Climate Models (GCMs) were initially selected based on their data availability, spatial resolution, and ability to represent key monsoon processes over South Asia9. These models are ACCESS-CM2, ACCESS-ESM1-5, BCC-CSM2-MR, CanESM5, EC-Earth3, EC-Earth3-Veg, INM-CM4-8, INM-CM5-0, MPI-ESM1-2-HR, MPI-ESM1-2-LR, MRI-ESM2-0, NorESM2-LM, and NorESM2-MM. All CMIP6 datasets employed in this study were statistically downscaled and bias-corrected to a 0.25° × 0.25° spatial resolution in a previously published dataset21. The simulations were analysed under two Shared Socioeconomic Pathways (SSPs): SSP245, an intermediate stabilization scenario with moderate emission control, and SSP585, a high-emission pathway characterized by continued fossil-fuel dependence and strong radiative forcing. For each GCM, historical simulations (1985–2014) and future projections for two time horizons, the Near Future (2031–2060) and the Far Future (2061–2100), were considered. A single ensemble realization was used for each model to ensure comparability across models during ranking and ensemble generation. All models were evaluated over the same historical period and under identical future forcing scenarios, following a uniform statistical downscaling and bias-correction framework. This consistency ensures that inter-model differences reflect intrinsic model performance rather than variations in boundary conditions or ensemble-member selection.

Methodology

Data pre-processing

In this study, the hourly ERA5 precipitation data for 1985–2014 were aggregated to daily totals over the Kosi River Basin. Because the spatial grids of ERA5 and the bias-corrected CMIP6 models differ, the precipitation data were interpolated to ensure spatial consistency between the datasets. For this purpose, bilinear interpolation was adopted, which estimates rainfall at each target grid location as a weighted average of the four surrounding source grid points, with weights that vary linearly with distance along each axis37. This ensured that the ERA5 and CMIP6 datasets shared a common spatial framework across the basin. Pre-processing also involved the removal of missing or erroneous values and the synchronization of common time steps. ETCCDI precipitation indices were then computed from the daily series independently at each grid point, yielding annual index values for both ERA5 and the CMIP6 datasets. All model performance evaluations and ranking analyses were conducted using these grid-level indices, thereby preserving the spatial variability of precipitation extremes across the basin. The methodological workflow adopted in this study is illustrated in Fig. 2.
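The two pre-processing steps above, hourly-to-daily aggregation and bilinear regridding, can be sketched in Python with NumPy. This is a minimal illustration rather than the study's code: it assumes ERA5 total precipitation stored in metres per hourly step with day boundaries at 00 UTC, a regular source grid with ascending latitude and longitude, and target points lying inside that grid. The function names `hourly_to_daily_mm` and `bilinear` are hypothetical.

```python
import numpy as np


def hourly_to_daily_mm(tp_hourly_m):
    """Sum hourly precipitation (metres) into daily totals (mm).

    tp_hourly_m has shape (n_hours, ny, nx); n_hours is assumed to cover
    whole days starting at 00 UTC.
    """
    n_hours = tp_hourly_m.shape[0]
    if n_hours % 24 != 0:
        raise ValueError("expected a whole number of days of hourly data")
    days = tp_hourly_m.reshape(n_hours // 24, 24, *tp_hourly_m.shape[1:])
    return days.sum(axis=1) * 1000.0  # metres -> millimetres


def bilinear(field, src_lat, src_lon, tgt_lat, tgt_lon):
    """Bilinearly interpolate field[..., lat, lon] to target points.

    src_lat/src_lon are ascending 1-D coordinates of a regular grid;
    tgt_lat/tgt_lon are 1-D arrays of equal length (one entry per point).
    """
    # Lower-left corner of the source cell containing each target point
    i = np.clip(np.searchsorted(src_lat, tgt_lat) - 1, 0, len(src_lat) - 2)
    j = np.clip(np.searchsorted(src_lon, tgt_lon) - 1, 0, len(src_lon) - 2)
    # Fractional position of the target inside that cell (0 to 1 per axis)
    ty = (tgt_lat - src_lat[i]) / (src_lat[i + 1] - src_lat[i])
    tx = (tgt_lon - src_lon[j]) / (src_lon[j + 1] - src_lon[j])
    # Weighted average of the four surrounding grid values
    return (field[..., i, j] * (1 - ty) * (1 - tx)
            + field[..., i, j + 1] * (1 - ty) * tx
            + field[..., i + 1, j] * ty * (1 - tx)
            + field[..., i + 1, j + 1] * ty * tx)
```

Because bilinear interpolation is exact for a field that varies linearly in latitude and longitude, a plane such as 2·lat + 3·lon is a convenient sanity check; the leading `...` in the indexing lets the same function regrid a whole (time, lat, lon) stack at once.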

Fig. 2

Methodology Flowchart.

Extreme climatic indices

Following these temporal and spatial adjustments, a set of ETCCDI (Expert Team on Climate Change Detection and Indices) precipitation indices was computed for both the ERA5 and bias-corrected CMIP6 datasets. These indices are widely recognized and recommended by the World Meteorological Organization (WMO) and the Climate Variability and Predictability Project (CLIVAR) for assessing climatic extremes across different regions38,39. Previous studies indicate that precipitation extremes derived from ETCCDI indices respond more strongly to climate-change signals than mean rainfall statistics. Global analyses have shown that ETCCDI indices such as R95p_total, R99p_total, Rx1day, and Rx5day capture changes in rainfall intensity and frequency that are often obscured in seasonal or annual averages40,41,42. Climate model evaluations further demonstrate that extreme precipitation indices exhibit larger inter-model spread and stronger climate-change signals than mean precipitation, particularly in monsoon-dominated regions43,44. Over the Indian subcontinent, several studies have reported that flood-generating rainfall is increasingly driven by short-duration, high-intensity events rather than by changes in total seasonal rainfall45,46,47. These findings underscore the need to evaluate GCM performance using extreme-event-based indices. Accordingly, the present study adopts an ETCCDI-based, index-specific evaluation framework to assess CMIP6 model skill and ensemble reliability for precipitation extremes over the Kosi Basin. The selected indices capture key characteristics of precipitation extremes, including total rainfall, intensity, duration, and frequency of wet events. In total, eight key ETCCDI precipitation indices were selected to represent various aspects of rainfall extremes in the Kosi Basin. 
These indices were derived from the daily precipitation data of ERA5 and the bias-corrected CMIP6 models for both historical and future periods. Table 1 presents the list of indices, their definitions, computation methods, and their climatic significance.
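Table 1 is not reproduced here, so the following Python sketch assumes the standard ETCCDI definitions: a wet day has at least 1 mm; R95p_total and R99p_total are annual totals on wet days exceeding the 95th and 99th percentiles of wet-day precipitation in a base period (assumed here to be 1985–2014); and CWD is the longest run of wet days. It computes the eight indices for one grid point and one year of daily data. The function names are hypothetical.

```python
import numpy as np

WET_DAY_MM = 1.0  # ETCCDI wet-day threshold (mm/day)


def wet_day_percentiles(base_daily_mm):
    """95th and 99th percentiles of wet-day precipitation in the base period."""
    base = np.asarray(base_daily_mm, dtype=float)
    wet = base[base >= WET_DAY_MM]
    return np.percentile(wet, 95), np.percentile(wet, 99)


def longest_run(mask):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in mask:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def etccdi_indices(pr_mm, p95, p99):
    """Eight precipitation indices for one year of daily totals (mm/day)."""
    pr = np.asarray(pr_mm, dtype=float)
    wet = pr >= WET_DAY_MM
    return {
        "PRCPTOT": pr[wet].sum(),                                    # wet-day total
        "Rx1day": pr.max(),                                          # max 1-day amount
        "Rx5day": np.convolve(pr, np.ones(5), mode="valid").max(),  # max 5-day sum
        "R10mm": int((pr >= 10.0).sum()),                            # days >= 10 mm
        "R20mm": int((pr >= 20.0).sum()),                            # days >= 20 mm
        "R95p_total": pr[wet & (pr > p95)].sum(),                    # very wet day total
        "R99p_total": pr[wet & (pr > p99)].sum(),                    # extremely wet day total
        "CWD": longest_run(wet),                                     # longest wet spell
    }
```

In practice the percentiles would be computed once per grid point from the base-period ERA5 (or model) series and then reused for every year.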

Table 1 Description of ETCCDI precipitation-based indices utilised in this study.

Model performance evaluation

To assess the capability of each CMIP6 model to reproduce historical precipitation extremes, a set of complementary statistical performance metrics was employed. These metrics quantify how well the simulated indices agree with the ERA5 precipitation indices, capturing both the magnitude and the pattern of deviations9. The selected indicators are Normalized Root Mean Square Error (NRMSE), Correlation Coefficient (CC), Mean Absolute Error (MAE), Mean Deviation (MD), Variance Explained (VE), Percent Bias (PBIAS), Mean Percentage Error (MPE), Sum of Squared Error (SSE), and Symmetric Uncertainty (SU). The first eight indicators were used for CRITIC weighting and MCDM ranking to identify the best-performing model ensembles during the historical period (1985–2014). Each metric captures a distinct aspect of model behaviour: error magnitude (NRMSE, MAE, SSE), bias tendency (PBIAS, MD, MPE), variability (VE), and correlation strength (CC). Using multiple indicators provides a robust evaluation of CMIP6 model performance from different statistical perspectives. The SU metric was subsequently employed for post-assessment, to quantify the uncertainty between observed and simulated precipitation indices across the basin. Lower SU values indicate greater similarity between simulated and observed rainfall indices, highlighting more reliable ensemble combinations for the subsequent analysis of future projections. For a preliminary comparison of model performance across individual ETCCDI indices, a composite mean score (CMS) was computed by averaging normalized performance metrics. Each statistical indicator was first normalized to a 0–1 range using min-max normalization to ensure dimensional consistency, and the CMS of each model was then calculated as the arithmetic mean of the normalized metrics. This score provides an intuitive, unweighted summary of overall model performance for a given index. 
The mathematical formulations, performance ranges, and optimal values for each metric are summarized in Table 2.

Table 2 Performance Metrics for evaluating CMIP6 Models.

In the above equations, \({z}_{obs}\) and \({z}_{sim}\) represent the observed and simulated values, respectively, while \(n\) denotes the total number of observations. Var denotes the variance operator, \(m\) the number of performance metrics, and \({z}_{i,j}\) the normalized value of metric \(j\) for model \(i\). The terms \(\overline{{z}_{obs}}\) and \(\overline{{z}_{pred}}\) indicate the mean values of the observed and simulated datasets, respectively. In the expression for SU, \(IG\) denotes the information gain and \(H\) the entropy function.
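Because the Table 2 formulations are not reproduced here, the following Python sketch uses common textbook forms of the eight ranking metrics and of the composite mean score. Several details are assumptions to be checked against Table 2: NRMSE normalised by the observed mean, the sign convention sim − obs for MD, PBIAS, and MPE, VE taken as 1 − Var(z_obs − z_sim)/Var(z_obs), and the use of absolute values for the bias metrics when computing the CMS. The function names are hypothetical.

```python
import numpy as np

BENEFIT = {"CC", "VE"}  # higher is better; for all other metrics, closer to 0 is better


def performance_metrics(obs, sim):
    """Eight ranking metrics comparing simulated with observed index values."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs
    rmse = np.sqrt(np.mean(err ** 2))
    return {
        "NRMSE": rmse / np.mean(obs),       # RMSE normalised by the observed mean
        "CC": np.corrcoef(obs, sim)[0, 1],  # Pearson correlation
        "MAE": np.mean(np.abs(err)),
        "MD": np.mean(err),                 # mean deviation (sim - obs)
        "VE": 1.0 - np.var(obs - sim) / np.var(obs),
        "PBIAS": 100.0 * err.sum() / obs.sum(),
        "MPE": 100.0 * np.mean(err / obs),
        "SSE": np.sum(err ** 2),
    }


def composite_mean_score(metric_rows):
    """Unweighted CMS: mean of min-max normalised metrics (1 = best model)."""
    names = list(metric_rows[0])
    X = np.array([[row[name] for name in names] for row in metric_rows])
    scores = np.empty_like(X)
    for j, name in enumerate(names):
        # Orient so that larger is better; bias metrics penalised in either sign
        col = X[:, j] if name in BENEFIT else -np.abs(X[:, j])
        span = col.max() - col.min()
        scores[:, j] = (col - col.min()) / span if span > 0 else 1.0
    return scores.mean(axis=1)
```

Each row passed to `composite_mean_score` would hold one model's metrics for a single ETCCDI index, so the function returns one CMS per model for that index.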

CRITIC weighting technique

In multi-model evaluation, each statistical metric contributes differently to the overall assessment of model performance. To obtain objective weights free of subjective judgement, the Criteria Importance Through Inter-criteria Correlation (CRITIC) method was employed to quantify the relative importance of each performance criterion. This approach assigns weights based on both the variability of each criterion and its correlation with the other criteria6,7. CRITIC therefore considers both the amount of information a criterion carries and its redundancy with the others, yielding a balanced, data-driven weighting scheme.

The CRITIC method involves the following sequential steps:

  (i)

    Normalization: Each performance indicator \({X}_{ij}\) (where i denotes a model and j a criterion) is first normalized to make the metrics dimensionless and comparable. For beneficial criteria (e.g., CC, VE), higher values are preferred, while for non-beneficial criteria (e.g., NRMSE, MAE, PBIAS), lower values are desirable. The normalization uses the min-max transformation:

$${Z}_{ij}=\begin{cases}\dfrac{{X}_{ij}-{X}_{j}^{min}}{{X}_{j}^{max}-{X}_{j}^{min}}, & \text{for beneficial criteria}\\[6pt]\dfrac{{X}_{j}^{max}-{X}_{ij}}{{X}_{j}^{max}-{X}_{j}^{min}}, & \text{for non-beneficial criteria}\end{cases}$$
(1)
  (ii)

    Calculation of standard deviation (\({\sigma}_{j}\)): The standard deviation of each normalized criterion reflects its contrast intensity, i.e., how much the criterion varies across all models. A larger \({\sigma}_{j}\) indicates greater discriminating power.

  (iii)

    Computation of correlation coefficients: The pairwise correlation \({r}_{jk}\) between criteria j and k is computed to measure redundancy. Highly correlated metrics provide overlapping information, reducing their independent contribution to the overall assessment.

  (iv)

    Determination of information content (\({C}_{j}\)): The information content of each criterion is defined as:

$${C}_{j}={\sigma}_{j}\sum_{k=1}^{m}(1-{r}_{jk})$$
(2)

where \(m\) is the number of criteria. This term combines the contrast intensity of a criterion with its independence from the other metrics.

  (v)

    Derivation of CRITIC weights (\({W}_{j}\)): The normalized weight of each criterion is computed as:

$${W}_{j}=\frac{{C}_{j}}{{\sum}_{k=1}^{m}{C}_{k}}$$
(3)
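Steps (i)–(v) can be condensed into a short NumPy function. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the decision matrix has one row per model and one column per metric, no metric is constant across models (otherwise Eq. (1) divides by zero), and the name `critic_weights` is hypothetical.

```python
import numpy as np


def critic_weights(X, benefit):
    """CRITIC weights for a decision matrix X (n_models x m_criteria).

    benefit[j] is True when higher values of criterion j are better.
    """
    X = np.asarray(X, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    span = x_max - x_min  # assumed non-zero for every criterion
    # Step (i), Eq. (1): direction-aware min-max normalisation
    Z = np.where(benefit, (X - x_min) / span, (x_max - X) / span)
    # Step (ii): contrast intensity of each criterion
    sigma = Z.std(axis=0, ddof=1)
    # Step (iii): Pearson correlation between every pair of criteria
    r = np.corrcoef(Z, rowvar=False)
    # Step (iv), Eq. (2): information content C_j = sigma_j * sum_k (1 - r_jk)
    C = sigma * (1.0 - r).sum(axis=1)
    # Step (v), Eq. (3): normalise so that the weights sum to one
    return C / C.sum()
```

Since the choice of sample or population standard deviation scales every \(C_j\) by the same factor, it cancels in Eq. (3) and does not affect the weights.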

The CRITIC approach provides an objective, data-driven weighting scheme that avoids subjective judgement in assigning importance to metrics. Indicators that are strongly correlated with other metrics, or that vary little across models, contribute less independent information and are therefore assigned lower weights. This mechanism reduces the cumulative influence of redundant indicators without excluding them from the analysis, so that different statistical perspectives are retained. However, CRITIC primarily accounts for linear dependencies and cannot fully eliminate higher-order or nonlinear redundancy, particularly when the number of evaluated models is limited. In this study, the derived CRITIC weights were used as input to four MCDM methods (TOPSIS, VIKOR, EDAS, and PROMETHEE-II) to identify the optimal ensemble combinations for simulating precipitation extremes in the Kosi River Basin.

MCDM techniques

To comprehensively evaluate and rank the CMIP6 models on multiple statistical performance criteria, a multi-criteria decision-making (MCDM) framework was employed. MCDM techniques are widely applied in the evaluation of climate and hydrological models because they integrate several performance metrics into a single composite ranking, providing a balanced assessment that reflects both accuracy and consistency8–48. In this study, four MCDM techniques (TOPSIS, VIKOR, EDAS, and PROMETHEE-II) were employed to enhance the robustness of the evaluation and to avoid reliance on a single decision-making framework. Each method was applied using the objective weights derived from the CRITIC method, so that the final ranking was data-driven and free from subjective weighting.

  (a)

    TOPSIS.

The TOPSIS approach ranks alternatives by their Euclidean distances from two hypothetical solutions: a Positive Ideal Solution (PIS), representing the best possible performance, and a Negative Ideal Solution (NIS), representing the worst49,50,51. The fundamental concept is that the most preferred model should be closest to the PIS and farthest from the NIS. The TOPSIS algorithm comprises normalization, formation of the weighted matrix, identification of the PIS and NIS, computation of the distances \({D}_{i}^{+}\) and \({D}_{i}^{-}\), and derivation of the relative closeness coefficient \({C}_{i}\):

$${C}_{i}=\frac{{D}_{i}^{-}}{{D}_{i}^{+}+{D}_{i}^{-}},0\le{C}_{i}\le1$$
(4)

Higher \({C}_{i}\) values indicate better model performance.
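A compact TOPSIS sketch follows. It assumes the common vector normalisation, non-negative criterion values (bias metrics entered as absolute values, for example), and weights `w` such as those produced by CRITIC; the study does not specify these details, and the function name `topsis` is hypothetical.

```python
import numpy as np


def topsis(X, w, benefit):
    """Relative closeness C_i (Eq. 4) for each row of X; higher is better."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)
    # Vector normalisation, then weighting of each criterion
    V = w * X / np.linalg.norm(X, axis=0)
    # Positive and negative ideal solutions, per criterion direction
    pis = np.where(benefit, V.max(axis=0), V.min(axis=0))
    nis = np.where(benefit, V.min(axis=0), V.max(axis=0))
    # Euclidean distances of each model to the two ideals
    d_plus = np.linalg.norm(V - pis, axis=1)
    d_minus = np.linalg.norm(V - nis, axis=1)
    return d_minus / (d_plus + d_minus)
```

A model that is best on every criterion coincides with the PIS, so \(D_i^{+} = 0\) and \(C_i = 1\); one that is worst on every criterion coincides with the NIS and gets \(C_i = 0\).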

  (b)

    VIKOR.

The VIKOR method is a compromise ranking technique that identifies solutions closest to the ideal while minimizing individual regret52,53. It uses two major measures:

  (i)

    \({S}_{i}\): represents the group utility (overall performance across all criteria).

  (ii)

    \({R}_{i}\): represents the individual regret (the worst-case scenario among criteria).

The final ranking index \({Q}_{i}\) is computed as:

$${Q}_{i}=v\frac{{S}_{i}-{S}^{*}}{{S}^{-}-{S}^{*}}+(1-v)\frac{{R}_{i}-{R}^{*}}{{R}^{-}-{R}^{*}}$$
(5)

where \(v\) represents the weight of the group utility (commonly set to 0.5), \({S}^{*}\) and \({R}^{*}\) represent the minimum values of \({S}_{i}\) and \({R}_{i}\), and \({S}^{-}\) and \({R}^{-}\) their maximum values, respectively. The model with the lowest \({Q}_{i}\) is considered optimal.
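The VIKOR quantities can be sketched as follows. The group utility \(S_i\) and the regret \(R_i\) use the standard weighted, normalised distance of each criterion from its best value; `v = 0.5` is the conventional default rather than a value reported in this study, and the function name `vikor` is hypothetical.

```python
import numpy as np


def vikor(X, w, benefit, v=0.5):
    """Return S_i, R_i and Q_i (Eq. 5) for each row of X; lower Q is better."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)
    best = np.where(benefit, X.max(axis=0), X.min(axis=0))
    worst = np.where(benefit, X.min(axis=0), X.max(axis=0))
    # Weighted, normalised gap to the best value: 0 at best, w_j at worst
    gap = w * (best - X) / (best - worst)
    S = gap.sum(axis=1)  # group utility
    R = gap.max(axis=1)  # individual regret
    Q = (v * (S - S.min()) / (S.max() - S.min())
         + (1 - v) * (R - R.min()) / (R.max() - R.min()))
    return S, R, Q
```

Here `S.min()`, `S.max()`, `R.min()`, and `R.max()` play the roles of \(S^{*}\), \(S^{-}\), \(R^{*}\), and \(R^{-}\) in Eq. (5).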

  (c)

    EDAS.

The EDAS method assesses alternatives based on their deviation from the average performance across all models. It calculates the Positive Distance from Average (PDA) and Negative Distance from Average (NDA) for each criterion7,54. The final score is derived as:

$${AS}_{i}=\frac{1}{2}\left(\frac{{SP}_{i}}{\max_{k}{SP}_{k}}+\left[1-\frac{{SN}_{i}}{\max_{k}{SN}_{k}}\right]\right),\quad {SP}_{i}={\sum}_{j}{w}_{j}PD{A}_{ij},\quad {SN}_{i}={\sum}_{j}{w}_{j}ND{A}_{ij}$$
(6)

Higher \(A{S}_{i}\) values denote better-performing models. Because EDAS measures deviations from the average rather than from extreme ideal values, it is generally less sensitive to outliers and provides a balanced treatment of all performance metrics.
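An EDAS sketch consistent with Eq. (6) is given below. It assumes that the average of every criterion across models is strictly positive (bias metrics entered as absolute values, for example), because PDA and NDA are divided by that average; the function name `edas` is hypothetical.

```python
import numpy as np


def edas(X, w, benefit):
    """Appraisal score AS_i (Eq. 6) for each row of X; higher is better."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)
    av = X.mean(axis=0)  # average solution, assumed > 0
    # Signed deviation from the average, oriented so positive means "better"
    dev = np.where(benefit, X - av, av - X)
    pda = np.maximum(dev, 0.0) / av
    nda = np.maximum(-dev, 0.0) / av
    sp = (w * pda).sum(axis=1)  # weighted positive distance SP_i
    sn = (w * nda).sum(axis=1)  # weighted negative distance SN_i
    return 0.5 * (sp / sp.max() + 1.0 - sn / sn.max())
```

A model at or above the average on every criterion has \(SN_i = 0\), so its score depends only on how far it exceeds the average.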

  (d)

    PROMETHEE-II.

PROMETHEE-II is a pairwise-comparison-based outranking method that ranks models by evaluating the degree of preference of one alternative over another for each criterion55. It computes two preference flows:

  (i)

    Positive flow (\({{\Phi}}^{+}\)): extent to which a model outperforms others.

  (ii)

    Negative flow (\({{\Phi}}^{-}\)): extent to which a model is outperformed by others.

The net preference flow is then obtained as:

$${{\Phi}}_{i}={{\Phi}}_{i}^{+}-{{\Phi}}_{i}^{-}$$
(7)

A higher \({{\Phi}}_{i}\) indicates better model performance. PROMETHEE-II is particularly useful when multiple models perform comparably, as it can resolve small differences among alternatives.
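The preference flows can be sketched with the simplest ("usual") preference function, in which any positive difference on a criterion counts as full preference. The study does not state which preference function was used, so this choice, like the function name `promethee_ii`, is an assumption.

```python
import numpy as np


def promethee_ii(X, w, benefit):
    """Net outranking flow Phi_i (Eq. 7) for each row of X; higher is better."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)
    n = X.shape[0]
    Y = np.where(benefit, X, -X)          # orient every criterion so larger is better
    diff = Y[:, None, :] - Y[None, :, :]  # diff[a, b, j] = Y[a, j] - Y[b, j]
    P = (diff > 0).astype(float)          # "usual" preference function
    pi = (P * w).sum(axis=2)              # aggregated preference of a over b
    phi_plus = pi.sum(axis=1) / (n - 1)   # how strongly a outranks the others
    phi_minus = pi.sum(axis=0) / (n - 1)  # how strongly a is outranked
    return phi_plus - phi_minus
```

With weights summing to one, the net flows lie in [−1, 1] and always sum to zero across models, which is a quick consistency check.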

Comprehensive ranking of CMIP6 models

The final ranking of CMIP6 models was established by integrating the four MCDM techniques (TOPSIS, VIKOR, EDAS, and PROMETHEE-II), using the objective criterion weights derived from the CRITIC method. While each MCDM technique employs a distinct computational logic, their combined application provides a more balanced evaluation of model skill9. This integrated framework ensures that no single decision-making approach dominates the final outcome and that the resulting rankings reflect consistent model performance across multiple evaluation perspectives. For each ETCCDI precipitation index, all CMIP6 models were independently ranked by each of the four MCDM techniques, yielding ordinal ranks from 1 (best) to 13 (worst). Since all techniques produce rankings over an identical range, the ranks from each MCDM method were combined directly by averaging to derive a Composite Ranking Index (CRI).

The CRI for the i-th model was computed as:

$${CRI}_{i}=\frac{1}{4}\left({R}_{i}^{\text{TOPSIS}}+{R}_{i}^{\text{VIKOR}}+{R}_{i}^{\text{EDAS}}+{R}_{i}^{\text{PROMETHEE-II}}\right)$$
(8)

where \({R}_{i}^{(k)}\) denotes the rank of the i-th model obtained from MCDM technique \(k\in\{\text{TOPSIS},\text{VIKOR},\text{EDAS},\text{PROMETHEE-II}\}\). Models with lower CRI values perform consistently well across multiple decision-making frameworks and were therefore considered more reliable in simulating historical precipitation extremes. Based on the computed CRI values, all thirteen CMIP6 models were sorted in ascending order of CRI.
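Eq. (8) requires each method's scores to be turned into ordinal ranks first, keeping in mind that VIKOR ranks by ascending \(Q_i\) while the other three methods rank by descending scores. A minimal sketch, with hypothetical function names and ties broken simply by model order (the study does not say how ties were handled):

```python
import numpy as np


def ordinal_ranks(scores, higher_is_better=True):
    """Ranks 1 (best) to n (worst); ties keep their original order."""
    s = np.asarray(scores, dtype=float)
    order = np.argsort(-s if higher_is_better else s, kind="stable")
    ranks = np.empty(len(s), dtype=int)
    ranks[order] = np.arange(1, len(s) + 1)
    return ranks


def composite_ranking_index(topsis_c, vikor_q, edas_as, promethee_phi):
    """CRI (Eq. 8): mean of the four MCDM ranks for each model."""
    ranks = np.vstack([
        ordinal_ranks(topsis_c, higher_is_better=True),
        ordinal_ranks(vikor_q, higher_is_better=False),  # lower Q_i is better
        ordinal_ranks(edas_as, higher_is_better=True),
        ordinal_ranks(promethee_phi, higher_is_better=True),
    ])
    return ranks.mean(axis=0)
```

Sorting models by the returned CRI with `np.argsort` gives the ascending order from which the ensemble members are drawn.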

Model ensembles were then generated using the Arithmetic Multi-Model Ensemble (AMME) approach, in which precipitation indices from the selected CMIP6 models were averaged arithmetically9,56. Four ensemble configurations were developed: AMME3, AMME5, AMME8, and AMME13, representing the mean of the top 3, 5, 8, and all 13 models, respectively. ETCCDI precipitation indices were first computed independently for each statistically downscaled and bias-corrected CMIP6 model using precipitation data at each grid point. The resulting indices of the models selected by the CRITIC-MCDM assessment were then averaged to obtain AMME-based index estimates. This index-level ensemble generation preserves model-specific extreme precipitation characteristics and avoids the suppression of extremes that occurs when daily precipitation series are averaged across models before the indices are computed, since extreme events in different models rarely coincide in time.
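Index-level ensemble averaging, and the reason it preserves extremes, can be illustrated briefly. The sketch assumes a per-model array of one ETCCDI index (for example, years × grid points) and the CRI values from the ranking step; the function name `amme` and the two-model daily example are hypothetical.

```python
import numpy as np


def amme(index_by_model, cri, k):
    """Mean of one ETCCDI index over the k models with the lowest CRI.

    index_by_model: array of shape (n_models, ...), e.g. (13, years, grid points).
    Returns the AMMEk field and the indices of the selected models.
    """
    top = np.argsort(np.asarray(cri, dtype=float), kind="stable")[:k]
    return np.asarray(index_by_model, dtype=float)[top].mean(axis=0), top


# Why indices are averaged rather than daily series: two hypothetical models
# whose storms fall on different days.
daily = np.array([[50.0, 0.0, 0.0],
                  [0.0, 50.0, 0.0]])
rx1day_index_level = daily.max(axis=1).mean()   # mean of each model's Rx1day
rx1day_series_level = daily.mean(axis=0).max()  # Rx1day of the averaged series
```

In this toy case the index-level estimate keeps the 50 mm peak, whereas averaging the daily series first halves it to 25 mm.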

Results

The present study evaluates the performance of thirteen statistically downscaled and bias-corrected CMIP6 models to identify reliable subsets for precipitation-based climate projections over the Kosi River Basin. The assessment emphasizes the importance of systematic model ranking before constructing multi-model ensembles for simulating precipitation extremes under changing climatic conditions. Section 4.1 presents the detailed performance evaluation of CMIP6 models using a combination of MCDM techniques integrated with CRITIC-derived objective weights, and further explores the comparative skill of individual models and the performance of optimal ensemble subsets in reproducing observed precipitation variability. Section 4.2 examines whether ranking models on individual ETCCDI indices yields more robust ensemble configurations than rankings derived from aggregated annual precipitation characteristics. Section 4.3 then investigates projected changes in precipitation extremes for the near future (2031–2060) and far future (2061–2100) under two emission scenarios, SSP245 and SSP585, to assess potential hydro-climatic shifts across the Kosi Basin.

Performance evaluation of CMIP6 models for precipitation extreme indices

Assessment of CMIP6 models based on ERA5 observations

Model performance was assessed using the eight statistical indicators described in Table 2. The multi-metric performance scores of the CMIP6 models across the ETCCDI precipitation indices are shown in Fig. 3. Overall, model performance varied noticeably across indices. For PRCPTOT, representing total annual precipitation, MPI-ESM1-2-HR, BCC-CSM2-MR, and INM-CM5-0 showed strong agreement with ERA5. These models effectively reproduced both the mean climatology and the spatial variability of annual rainfall over the Kosi Basin. In contrast, ACCESS-CM2 showed the weakest correlation, with higher error metrics and bias. For Rx1day, which captures the annual maximum one-day rainfall, INM-CM4-8 achieved the best correspondence (score: 0.86), followed by INM-CM5-0 and ACCESS-CM2. The superior performance of the INM models may be attributed to their improved convective parameterization and better representation of localized precipitation extremes. For Rx5day, representing multi-day extreme precipitation events, CanESM5 (0.88) and INM-CM4-8 (0.83) outperformed the other models, indicating skill in capturing prolonged rainfall episodes. Conversely, EC-Earth3 and MRI-ESM2-0 exhibited lower skill, possibly because of the coarse native resolution of the parent models and an underestimation of monsoonal persistence. For R10mm and R20mm, which reflect moderate and heavy rainfall days, BCC-CSM2-MR, MPI-ESM1-2-LR, and MPI-ESM1-2-HR captured the frequency distribution of wet days more accurately. ACCESS-CM2 consistently ranked lower, indicating overestimation of light rainfall events. For R95p_total and R99p_total, which measure very wet and extremely wet days respectively, MPI-ESM1-2-HR (0.94) and INM-CM4-8 (0.94) emerged as the top performers. These models effectively simulated the upper percentiles of precipitation, demonstrating skill in capturing the extreme-tail behaviour that is crucial for flood-risk assessment. 
Finally, for CWD, MPI-ESM1-2-HR (0.72) and MRI-ESM2-0 (0.70) achieved the highest agreement, indicating their robustness in reproducing wet-spell duration during the monsoon.

Fig. 3

Multi-metric performance scores of CMIP6 models for precipitation extreme indices.

CRITIC weight analysis

To address potential subjectivity in metric importance, the CRITIC method was applied to evaluate the relative contribution of the statistical performance metrics (NRMSE, CC, MAE, MD, VE, PBIAS, MPE, and SSE) across the ETCCDI precipitation indices. The CRITIC method objectively quantifies the importance of each criterion by accounting for both the contrast intensity (standard deviation) and the degree of correlation among the evaluation indicators. Higher weights therefore represent metrics that provide more independent and informative contributions to the overall evaluation. As shown in Fig. 4, the computed CRITIC weights varied moderately across performance indicators and indices, signifying a balanced contribution of different statistical measures in evaluating CMIP6 model skill. Among the performance criteria, MD, MPE, and SSE received relatively higher weights for most ETCCDI indices, indicating their stronger power to discriminate between models. Specifically, for Rx1day and R99p_total, MD received the highest weights (0.154 and 0.202, respectively), reflecting the high sensitivity of these extreme precipitation indices to the mean deviation of simulated rainfall intensities. In contrast, CC and VE received comparatively lower CRITIC weights for most indices, such as PRCPTOT (0.102 and 0.114) and R95p_total (0.091 and 0.119), suggesting that although they capture linear association and variability, they add less information than error-based measures. NRMSE and MAE, both magnitude-based error measures, maintained moderate and consistent weights (ranging from 0.09 to 0.14), implying a stable contribution to assessing the performance of bias-corrected GCMs. 
Interestingly, R99p_total displayed the most uneven weight distribution among all indices, with MD and PBIAS contributing the largest shares (both 0.202), emphasizing that the rarest precipitation extremes are predominantly influenced by bias magnitude rather than by correlation-based statistics. Conversely, Rx5day exhibited more balanced weights, with CC (0.171), MD (0.155), and PBIAS (0.155) emerging as key parameters, indicating that model discrimination for multi-day maxima depends on both correlation and error-based deviations. For CWD (Consecutive Wet Days), higher weights were assigned to NRMSE (0.138), MAE (0.136), and SSE (0.136), suggesting that sustained wet periods are better characterized by cumulative error indicators than by correlation metrics. Meanwhile, the R20mm and R95p_total indices reflected the combined importance of MD and VE, highlighting the joint influence of bias and variability in simulating moderate to high rainfall intensities. Overall, the CRITIC analysis showed that error-based metrics (MD, MPE, SSE, and NRMSE) play a dominant role in evaluating the performance of the downscaled, bias-corrected CMIP6 models, particularly for precipitation extremes that are highly sensitive to systematic and random errors. These results suggest that future performance evaluation frameworks should assign proportionally higher importance to these indicators to achieve a balanced and objective assessment of GCM accuracy.
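The CRITIC weighting described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes min-max normalisation with each metric flagged as benefit-type (higher is better, e.g. CC) or cost-type (lower is better, e.g. NRMSE or an absolute bias), population standard deviation as the contrast intensity, and Pearson correlation between the normalised criteria. The `benefit` flags, helper names, and toy score matrix are hypothetical.

```python
from math import sqrt
from statistics import fmean, pstdev


def normalize(column, benefit):
    """Min-max normalise one criterion so that 1 is always the best value."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)  # no contrast: the criterion cannot discriminate
    if benefit:
        return [(x - lo) / (hi - lo) for x in column]
    return [(hi - x) / (hi - lo) for x in column]


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either series is constant."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def critic_weights(matrix, benefit):
    """matrix[i][j] is the score of model i on criterion j (hypothetical layout)."""
    n = len(benefit)
    cols = [normalize([row[j] for row in matrix], benefit[j]) for j in range(n)]
    sigma = [pstdev(c) for c in cols]  # contrast intensity of each criterion
    info = [
        # information content = contrast intensity * total conflict with others
        sigma[j] * sum(1 - pearson(cols[j], cols[k]) for k in range(n))
        for j in range(n)
    ]
    total = sum(info)
    return [c / total for c in info]


# Hypothetical scores for four models on three criteria:
# an error metric (cost), a correlation (benefit) and a constant column.
scores = [
    [0.1, 0.9, 5.0],
    [0.2, 0.7, 5.0],
    [0.4, 0.8, 5.0],
    [0.3, 0.6, 5.0],
]
weights = critic_weights(scores, benefit=[False, True, False])
print([round(w, 3) for w in weights])
```

A criterion with no spread across models (the constant third column) receives zero weight, which is the intended CRITIC behaviour. Switching from population to sample standard deviation rescales every criterion by the same factor, so the normalised weights do not change.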

Fig. 4

CRITIC-derived weight distribution of performance metrics across ETCCDI precipitation indices.

MCDM-based model performance assessment

Using the CRITIC-derived objective weights, the CMIP6 models were ranked with four complementary MCDM techniques (TOPSIS, VIKOR, EDAS, and PROMETHEE II) to ensure a robust evaluation. The results provide an integrated and objective ranking of model skill in simulating the observed precipitation extremes represented by the eight ETCCDI indices. Although the four techniques differ in their computational frameworks, they showed broadly consistent patterns in identifying high- and low-performing models. MPI-ESM1-2-HR ranked first under all four techniques, followed by INM-CM5-0 and BCC-CSM2-MR. These models exhibited superior capability in reproducing the spatial and temporal variability of precipitation extremes, as reflected by higher composite performance scores across multiple statistical indicators. In contrast, ACCESS-CM2, NorESM2-LM, and NorESM2-MM ranked lowest, suggesting a weaker representation of precipitation extremes in the Kosi basin. TOPSIS and VIKOR displayed similar ranking behaviour, which can be attributed to their comparable treatment of the ideal and anti-ideal solutions in the normalised decision space; both methods emphasise proximity to the ideal performance, leading to a strong correlation between their rankings. EDAS, which evaluates the positive and negative deviations from the average solution, yielded slightly more dispersed ranks. PROMETHEE II, which relies on pairwise preference analysis, produced consistent but slightly conservative rankings, particularly for mid-performing models, reducing potential bias from extreme values.
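Of the four techniques, TOPSIS is the simplest to illustrate. The sketch below follows the textbook formulation (vector normalisation, weighted ideal and anti-ideal solutions, relative closeness) and is not the study's implementation. It assumes all metric values are non-negative (for example, absolute values of bias metrics) and uses a hypothetical three-model, two-criterion matrix; VIKOR, EDAS, and PROMETHEE II are not shown.

```python
from math import sqrt


def topsis(matrix, weights, benefit):
    """Return TOPSIS closeness scores (1 = ideal, 0 = anti-ideal) for each row.

    matrix[i][j] is the non-negative value of model i on criterion j.
    """
    m, n = len(matrix), len(weights)
    # Vector normalisation of each criterion, then apply the criterion weights.
    norms = [sqrt(sum(matrix[i][j] ** 2 for i in range(m))) or 1.0 for j in range(n)]
    v = [[weights[j] * matrix[i][j] / norms[j] for j in range(n)] for i in range(m)]
    cols = list(zip(*v))
    # Ideal takes the best value per criterion, anti-ideal the worst.
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_plus = sqrt(sum((a - b) ** 2 for a, b in zip(row, ideal)))
        d_minus = sqrt(sum((a - b) ** 2 for a, b in zip(row, anti)))
        total = d_plus + d_minus
        scores.append(d_minus / total if total else 0.0)
    return scores


# Hypothetical example: three models, an error metric (cost) and CC (benefit).
matrix = [[0.1, 0.9], [0.3, 0.5], [0.2, 0.7]]
scores = topsis(matrix, weights=[0.5, 0.5], benefit=[False, True])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

In practice the `weights` argument would be the CRITIC weights for a given index. A model that is best on every criterion coincides with the ideal solution and scores exactly 1.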

The aggregated ranks across all MCDM techniques (Fig. 5a) indicated a strong consensus on the best-performing GCMs. MPI-ESM1-2-HR, INM-CM5-0, and BCC-CSM2-MR attained the top three composite positions, while ACCESS-CM2 ranked last, reaffirming its relatively weak statistical agreement with the reference dataset. This consistency among diverse MCDM techniques indicates that the ranking framework is robust and reinforces the reliability of the CRITIC-weighted multi-model evaluation. Across the individual ETCCDI indices (Fig. 5b), model performance varied considerably. MPI-ESM1-2-HR and INM-CM5-0 performed exceptionally well for the high-intensity indices (R95p_total, R99p_total, and Rx1day), indicating their ability to capture the magnitude of extreme rainfall events. Conversely, the moderate-threshold and duration indices (R10mm, R20mm, and CWD) showed higher inter-model variability, suggesting that model fidelity for prolonged or moderate-intensity rainfall events remains comparatively uncertain. Overall, the MCDM-based ranking analysis demonstrated that using multiple decision-making frameworks enhances the objectivity and stability of model evaluation. By integrating distance-based, deviation-based, and preference-based ranking philosophies, the approach mitigates biases inherent to any single technique. Consequently, the model rankings derived through CRITIC-MCDM integration offer a reliable foundation for selecting optimal CMIP6 model ensembles for subsequent regional-scale precipitation projections.
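The section does not specify how the four method-specific rankings were combined or exactly how the AMME subsets were built. One plausible reading, sketched below purely as an assumption, is to average each model's rank across the methods and then form an AMME-k ensemble as the arithmetic mean of the k best-ranked models (k = 3, 5, 8, 13 in the text). The helper names, model labels, ranks, and series are all hypothetical.

```python
from statistics import fmean


def aggregate_ranks(ranks_by_method):
    """Order models by mean rank across methods (1 = best); ties broken by name."""
    models = list(next(iter(ranks_by_method.values())))
    mean_rank = {m: fmean(r[m] for r in ranks_by_method.values()) for m in models}
    return sorted(models, key=lambda m: (mean_rank[m], m))


def top_k_ensemble(series_by_model, order, k):
    """Arithmetic mean, time step by time step, of the k best-ranked models."""
    members = order[:k]
    length = len(series_by_model[members[0]])
    return [fmean(series_by_model[m][t] for m in members) for t in range(length)]


# Hypothetical ranks (1 = best) from the four MCDM methods for three models.
ranks = {
    "TOPSIS": {"A": 1, "B": 2, "C": 3},
    "VIKOR": {"A": 1, "B": 3, "C": 2},
    "EDAS": {"A": 2, "B": 1, "C": 3},
    "PROMETHEE II": {"A": 1, "B": 2, "C": 3},
}
order = aggregate_ranks(ranks)
series = {"A": [1.0, 3.0], "B": [3.0, 5.0], "C": [10.0, 10.0]}
amme2 = top_k_ensemble(series, order, k=2)
print(order, amme2)
```

Mean-rank aggregation is only one option; Borda counts or median ranks would be equally defensible choices and may reorder mid-ranked models.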

Fig. 5

(a, b) Ranking of CMIP6 models across different MCDM techniques and ETCCDI precipitation indices.

Interdependence of precipitation extremes and ensemble-based performance

Dependency structure of ETCCDI indices across ensembles

Understanding the dependency among precipitation extreme indices provides valuable insight into their co-behaviour and the physical processes represented in climate models. Spearman rank correlation matrices were computed for the ERA5 reference data and for the four ensemble subsets AMME3, AMME5, AMME8, and AMME13 (Fig. 6). These matrices highlight both the internal coherence of each ensemble and its accuracy in reproducing the observed dependencies. ERA5 revealed a coherent interdependence among several indices. PRCPTOT showed strong correlations with R10mm (0.82), R20mm (0.82), and Rx5day (0.61), confirming that annual precipitation is closely linked to both moderate and extreme rainfall events in the Kosi basin. R99p_total correlated particularly strongly with Rx1day (0.86) and Rx5day (0.74), consistent with their shared representation of high-intensity precipitation extremes. The AMME3 ensemble captured some of these dependencies, but correlations involving the extreme indices were dampened. Although the PRCPTOT-R10mm and PRCPTOT-R20mm correlations were high (0.88 and 0.91), correlations involving R99p_total were substantially weaker, and negative correlations with CWD were observed. This suggests that an ensemble of only three models fails to reproduce the higher-order interactions governing extreme precipitation behaviour. AMME5 showed a noticeable improvement, particularly in the relationships among moderate extremes: the correlations for PRCPTOT-R10mm (0.91), PRCPTOT-R20mm (0.93), and R95p_total-R99p_total (0.83) strengthened, reflecting improved coherence among the contributing models. However, CWD remained weakly correlated with the high-intensity indices, suggesting persistent difficulty among smaller ensembles in capturing wet-spell dynamics. The best agreement with ERA5 was observed for AMME8.
This ensemble reproduced strong correlations across most index pairs: PRCPTOT-R10mm and PRCPTOT-R20mm (both 0.94) preserved the strong coupling seen in ERA5, albeit at somewhat higher magnitudes than the ERA5 values (0.82), while R95p_total-Rx5day (0.80) and R99p_total-Rx1day (0.78) were well aligned with the observed behaviour. The improved representation suggests that an eight-model ensemble balances diversity and structural similarity in a way that avoids the dominance of individual model biases. When expanded to AMME13, the ensemble maintained high positive correlations for moderate extremes but showed weaker coherence in relationships involving the extreme indices; for instance, R99p_total-PRCPTOT weakened to 0.47, compared with 0.51 in AMME8 and 0.52 in ERA5. This behaviour is plausible because larger ensembles tend to incorporate structurally dissimilar models, and averaging across them dilutes tail-based dependencies. Negative correlations with CWD also persisted, further indicating difficulty in representing duration-based indices. Overall, the Spearman analysis demonstrates that, while all ensembles replicate key relationships among moderate precipitation indices, AMME8 most closely reconstructs the observed dependency structure of extremes, whereas AMME3 and AMME5 underestimate these connections and AMME13 dampens them through excessive dispersion.
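The Spearman matrices can be reproduced with a rank transform followed by Pearson correlation, using average ranks for ties. The text compares each ensemble with ERA5 pair by pair; summarising that comparison as the mean absolute difference over the distinct index pairs is an assumption introduced here for illustration. Function names and the annual series are hypothetical.

```python
from math import sqrt
from statistics import fmean


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank of the tie group
        i = j + 1
    return ranks


def pearson(a, b):
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va and vb else 0.0


def spearman_matrix(indices):
    """indices maps an index name (e.g. 'PRCPTOT') to its annual series."""
    ranked = {name: average_ranks(s) for name, s in indices.items()}
    return {(p, q): pearson(ranked[p], ranked[q]) for p in ranked for q in ranked}


def mean_abs_difference(sim, ref, names):
    """Average |rho_sim - rho_ref| over the distinct index pairs (upper triangle)."""
    pairs = [(p, q) for i, p in enumerate(names) for q in names[i + 1:]]
    return fmean(abs(sim[pair] - ref[pair]) for pair in pairs)


# Hypothetical annual series for three indices: reference vs. one ensemble.
era5 = {"PRCPTOT": [900, 1100, 1000, 1300], "R10mm": [30, 38, 33, 45], "CWD": [20, 18, 25, 22]}
ens = {"PRCPTOT": [950, 1050, 1020, 1250], "R10mm": [31, 36, 35, 44], "CWD": [21, 22, 19, 24]}
names = list(era5)
gap = mean_abs_difference(spearman_matrix(ens), spearman_matrix(era5), names)
print(round(gap, 3))
```

A smaller `gap` would indicate an ensemble whose dependency structure is closer to the reference; the pair-by-pair reading in the text remains more informative about which relationships are lost.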

Fig. 6

Spearman rank correlation matrices of ETCCDI precipitation indices.

Symmetric uncertainty and ensemble reliability

A comparison of SU values for the ETCCDI indices across the four ensemble configurations is shown in Fig. 7. SU ranges from 0 to 1, with lower values indicating closer agreement with the reference data, so this evaluation provides direct insight into how effectively each ensemble reproduces the historical variability of precipitation extremes in the Kosi basin. Across all indices, SU declines consistently as ensemble size increases, indicating that ensemble expansion improves the stability and reliability of simulated precipitation extremes. The largest improvements occur from AMME3 to AMME5 and from AMME5 to AMME8, after which gains become marginal. R95p_total, the total rainfall on very wet days, exhibits SU values of about 0.20 in AMME3 and 0.17 in AMME13, with AMME8 achieving an intermediate but substantial improvement. This gradual reduction in uncertainty suggests that the models are reasonably proficient at capturing frequent but impactful rainfall extremes. R99p_total, representing extremely wet days, exhibits the highest uncertainty in all ensembles, with SU decreasing from approximately 0.48 (AMME3) to 0.41 (AMME13). This persistently elevated uncertainty highlights the inherent difficulty models face in capturing the frequency and magnitude of the rarest precipitation extremes, even when large ensembles are used. Nevertheless, AMME8 and AMME13 reduce uncertainty by nearly 15%, demonstrating the value of ensemble integration. Moderate extremes such as R10mm and R20mm show comparatively low uncertainty (around 0.10–0.12), with a gentle downward trend across ensemble sizes. Their stability reflects the relatively predictable nature of moderate rainfall events, which are governed by synoptic-scale dynamics that climate models generally simulate well.
Similarly, PRCPTOT exhibits low uncertainty (≈ 0.07–0.08), indicating good agreement between simulations and reference data for annual precipitation at all ensemble sizes. Rx1day and Rx5day, which are linked to short-duration, high-intensity events, show moderate SU values (0.15–0.18). These indices respond well to ensemble expansion, particularly between AMME3 and AMME8, where uncertainty decreases by 10–12%. Overall, the index-wise SU comparison underscores the advantage of expanding the ensemble, with AMME8 offering the best balance between reduced uncertainty and computational efficiency. Although AMME13 yields the lowest SU values, its marginal improvements over AMME8 indicate diminishing returns. Importantly, the consistently high uncertainty in R99p_total and CWD emphasizes that even large ensembles struggle to capture complex, high-intensity and persistence-driven precipitation extremes, areas where additional regional modelling or hybrid downscaling approaches may be required.
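The SU formula is not stated in this section. A common information-theoretic definition, sketched below as an assumption, is SU = 2 I(X;Y) / (H(X) + H(Y)), computed here on equal-width binned series. Under that definition SU = 1 means one series fully determines the other, so larger values indicate stronger dependence; the convention used in the text (lower SU = closer agreement) therefore implies a different or complementary formulation that is not given here, and this sketch should not be read as the authors' implementation. The bin count and data are hypothetical.

```python
from collections import Counter
from math import log2


def discretize(values, bins):
    """Equal-width binning of a continuous series into integer labels."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy (bits) of a sequence of hashable labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def symmetric_uncertainty(x, y, bins=5):
    """SU = 2 * I(X;Y) / (H(X) + H(Y)), with I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    bx, by = discretize(x, bins), discretize(y, bins)
    hx, hy = entropy(bx), entropy(by)
    hxy = entropy(list(zip(bx, by)))  # joint entropy of the paired labels
    if hx + hy == 0:
        return 0.0  # both series constant: no information to share
    return 2 * (hx + hy - hxy) / (hx + hy)


# Hypothetical annual index values: reference vs. an ensemble mean.
ref = [12.0, 30.5, 18.2, 25.1, 40.3, 22.7, 15.9, 35.0]
sim = [13.1, 28.0, 19.5, 24.0, 37.9, 21.0, 17.2, 33.8]
print(round(symmetric_uncertainty(ref, sim, bins=4), 3))
```

Results from binned estimators depend on the number of bins and on record length, so SU values are most comparable when every ensemble is evaluated with the same discretisation.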

Fig. 7

Comparison of SU values for ETCCDI precipitation indices across the four ensemble configurations.

Future projection assessment

This section examines projected changes in precipitation extremes for the early future (EF: 2031–2060) and far future (FF: 2061–2100) under the SSP245 and SSP585 scenarios, relative to the 1985–2014 baseline. The analysis uses the AMME8 ensemble, identified above as the best-performing ensemble for the indices considered. The boxplots in Fig. 8 show a consistent intensification of precipitation extremes across all indices from the historical period to the EF and FF under both SSP245 and SSP585. For CWD, the median and upper quartile shift gradually upward from the historical period to the future projections, with the highest values in the FF under SSP585. This indicates a tendency toward longer wet spells, although the increase remains modest relative to the other indices. The response of PRCPTOT is more prominent, with both SSP245 and SSP585 projecting a noticeable increase in annual rainfall. The EF shows a modest rise, whereas the FF, particularly under SSP585, exhibits a substantial expansion of the upper tail, indicating increased inter-annual variability and a higher probability of exceptionally wet years. The moderate rainfall indicators R10mm and R20mm display progressive increases across future scenarios, with medians shifting upward and a clear expansion in the spread of values. In the FF under SSP585, both indices cluster markedly toward higher counts, illustrating a substantial rise in the frequency of moderate-to-heavy rainfall days. This pattern, combined with a noticeably larger interquartile range, suggests that future monsoons may be dominated more by frequent high-intensity rainfall episodes than by steady moderate rainfall. The behaviour of the extreme precipitation indices further reinforces this intensification signal.
R95p_total and R99p_total shift consistently upward in both the EF and FF, but the rise becomes particularly pronounced under SSP585, where the FF exhibits a substantial increase in central tendency as well as a marked broadening of the upper extremes. The numerous high-value outliers in the FF under SSP585 highlight the likelihood of very intense rainfall events becoming more common. Rx1day and Rx5day follow a similar pattern, with future scenarios projecting substantial increases in the magnitude of both single-day and multi-day rainfall maxima. While the EF under both SSPs exhibits moderate intensification, the FF under SSP585 shows a much stronger response, with both the median and the upper quartile increasing substantially. The enhanced clustering toward higher extremes suggests a future climate characterized by more frequent and more intense rainfall bursts capable of producing severe flooding and rapid hydrological responses, especially in topographically sensitive regions.
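For reference, the eight indices discussed above follow the ETCCDI definitions (a wet day has daily precipitation of at least 1 mm). The sketch below computes them for one year of daily totals and is deliberately simplified: percentile thresholds are taken from the wet days of a baseline record using Python's `statistics.quantiles` with the inclusive method (ETCCDI software uses its own percentile estimator), and wet spells or 5-day windows crossing year boundaries are ignored. Helper names and data are hypothetical.

```python
from statistics import quantiles

WET_DAY_MM = 1.0  # ETCCDI wet-day threshold


def wet_day_percentiles(base_daily):
    """95th and 99th percentiles of wet-day precipitation in a baseline record."""
    wet = [p for p in base_daily if p >= WET_DAY_MM]
    cuts = quantiles(wet, n=100, method="inclusive")  # 99 cut points
    return cuts[94], cuts[98]


def longest_run(flags):
    """Length of the longest run of True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def etccdi_indices(daily, p95, p99):
    """Indices for one year of daily precipitation (mm); needs at least 5 days."""
    five_day = [sum(daily[i:i + 5]) for i in range(len(daily) - 4)]
    return {
        "PRCPTOT": sum(p for p in daily if p >= WET_DAY_MM),
        "R10mm": sum(1 for p in daily if p >= 10.0),
        "R20mm": sum(1 for p in daily if p >= 20.0),
        "CWD": longest_run(p >= WET_DAY_MM for p in daily),
        "Rx1day": max(daily),
        "Rx5day": max(five_day),
        "R95p_total": sum(p for p in daily if p > p95),
        "R99p_total": sum(p for p in daily if p > p99),
    }


# Hypothetical baseline record and a short hypothetical "year" of daily totals.
base = [0.0, 3.0, 8.0, 15.0, 22.0, 40.0, 0.0, 1.5, 60.0, 12.0]
p95, p99 = wet_day_percentiles(base)
year = [0, 2, 12, 25, 0, 0, 5, 1, 0.5, 30]
print(etccdi_indices(year, p95, p99))
```

In a real application the percentiles would be computed once from the 1985–2014 baseline and then applied unchanged to the historical and future years of each ensemble member.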

Fig. 8

Boxplots of historical and future precipitation extreme indices from the AMME8 ensemble under SSP245 and SSP585.

Projected percentage changes in precipitation extreme indices

The percentage change analysis of precipitation extremes (Fig. 9) shows a clear intensification of rainfall characteristics in the Kosi Basin across all future periods and scenarios, with the magnitude of increase strongly dependent on the emission pathway. Under SSP245, moderate increases are projected for the early future (2031–2060): annual precipitation (PRCPTOT) rises by about 11%, accompanied by proportional increases in R10mm (9%) and R20mm (11%). The heavy-precipitation contribution indices R95p_total and R99p_total increase by 11% and 43%, respectively, indicating the early emergence of more intense rainfall events. Short-duration extremes show comparatively smaller but notable changes, with Rx1day increasing by 6% and Rx5day by 12%, while CWD increases only marginally (3%), suggesting that wet-spell duration is less sensitive in the early future under the moderate-emission scenario. For the far future (2061–2100) under SSP245, the intensification becomes more pronounced. PRCPTOT increases by 21%, with substantial rises in R10mm (18%) and R20mm (23%), implying more frequent moderate to heavy rainfall days. The contribution from very wet and extremely wet days strengthens further, with R95p_total and R99p_total rising by 19% and 51%, respectively. The increases in Rx1day (11%) and Rx5day (18%) indicate stronger single-day and multi-day extremes developing later in the century. Although CWD increases by 10%, its lower sensitivity relative to the other indices suggests that the dominant changes arise not from longer wet spells but from more intense rainfall within those spells. Much stronger responses emerge under the high-emission pathway (SSP585). In the early future, PRCPTOT increases by 16%, and the moderate to heavy rainfall indices strengthen by 15% (R10mm) and 18% (R20mm), exceeding the SSP245 projections for the same period.
Extreme precipitation contributions also rise sharply, with R99p_total increasing by 48%, highlighting an early onset of high-intensity storm events. Rx1day (11%) and Rx5day (13%) also increase more than under SSP245 (6% and 12%), while CWD rises by 8%, again reflecting a modest extension of wet spells relative to the substantial rise in rainfall intensity. The largest changes occur in the far future (2061–2100) under SSP585, when precipitation extremes intensify substantially across all indices. PRCPTOT increases by nearly 47%, while R10mm and R20mm rise by 40% and 60%, respectively, indicating a fundamental shift toward a monsoon regime dominated by frequent heavy rainfall days. R95p_total increases by 43% and R99p_total by a remarkable 79%, the largest change of any index, suggesting that extremely wet days would contribute disproportionately to annual rainfall. Short-duration extremes also intensify substantially, with Rx1day increasing by 32% and Rx5day by 35%, reflecting a heightened potential for severe single-day and multi-day rainfall bursts capable of triggering flash floods and high-magnitude hydrological events. CWD increases by 17%, suggesting longer wet periods, but its growth remains moderate compared with the increase in rainfall intensity.
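The percentages above are relative changes of a future-period value against the 1985–2014 baseline. A minimal sketch, assuming each index is averaged over the baseline and the future window before the change is computed (the text does not state whether this averaging precedes or follows ensemble averaging); `PERIODS`, `period_values`, and the flat illustrative series are hypothetical. The same arithmetic also checks the SU reduction quoted earlier for R99p_total: going from 0.48 to 0.41 is a change of about -14.6%, consistent with "nearly 15%".

```python
from statistics import fmean

# Periods used in the text (inclusive years).
PERIODS = {"baseline": (1985, 2014), "EF": (2031, 2060), "FF": (2061, 2100)}


def period_values(series_by_year, period):
    """Values of an annual index series falling inside one named period."""
    start, end = PERIODS[period]
    return [v for year, v in sorted(series_by_year.items()) if start <= year <= end]


def percent_change(baseline, future):
    """Relative change (%) of the future-period mean against the baseline mean."""
    base = fmean(baseline)
    if base == 0:
        raise ValueError("baseline mean is zero; percent change is undefined")
    return 100.0 * (fmean(future) - base) / base


# Hypothetical annual PRCPTOT series (mm): flat baseline, wetter far future.
prcptot = {year: 1000.0 for year in range(1985, 2015)}
prcptot.update({year: 1470.0 for year in range(2061, 2101)})
print(percent_change(period_values(prcptot, "baseline"), period_values(prcptot, "FF")))

# The same arithmetic applied to the SU values quoted for R99p_total (0.48 -> 0.41).
print(round(percent_change([0.48], [0.41]), 1))
```

Note that the EF window spans 30 years and the FF window 40 years; using period means rather than sums keeps the two futures comparable with the 30-year baseline.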

Fig. 9

Percent change in precipitation extreme indices for EF and FF periods under SSP245 and SSP585.

Discussion

The intensification of precipitation extremes under future warming has been consistently highlighted in earlier assessments57,58. However, the reliability of such projections depends strongly on the performance of GCMs, which are inherently affected by structural uncertainties, parameterization choices, and differences in aerosol and greenhouse-gas forcing pathways13,59,11. Multi-model ensembles have been widely adopted to reduce model-specific uncertainties9, yet the degree of improvement varies across regions and indicators7. In this context, the present study applied a rigorous, multi-stage evaluation framework to identify optimal CMIP6 ensembles for precipitation extremes over the Kosi Basin. The assessment of 13 downscaled and bias-corrected CMIP6 models underscores the importance of using multiple performance indicators in conjunction with objective weighting schemes. The CRITIC-based weights offered a balanced quantification of indicator importance by capturing contrast intensity and inter-criterion correlation. Error-focused metrics such as MD, SSE, and NRMSE generally received higher weights across the ETCCDI indices, indicating that magnitude errors discriminate between models more strongly than correlation-based measures do. This reinforces the need for models to accurately capture the magnitude and variability of extreme events. The MCDM-based rankings revealed notable contrasts among GCMs, reaffirming the index-specific variability observed in earlier regional studies60,61,62. The consistent emergence of MPI-ESM1-2-HR, INM-CM5-0, and BCC-CSM2-MR as top performers highlights their robustness in simulating both moderate and extreme rainfall events. Conversely, ACCESS-CM2 and the NorESM2 variants ranked poorly for several indices, indicating persistent limitations in representing monsoonal precipitation behaviour, consistent with known issues reported across the Indian subcontinent9.
The large discrepancies in rankings across indices further justify the index-based evaluation approach adopted in this study, as reliance on annual precipitation metrics alone would have masked important model-specific strengths and weaknesses. The ensemble-based analysis showed that the coherence among precipitation extreme indices is best preserved by the medium-sized ensemble AMME8. Smaller ensembles (AMME3, AMME5) were unable to capture the coupling between high-percentile and short-duration extremes, whereas the largest ensemble (AMME13) tended to weaken the dependencies among extremes owing to the inclusion of structurally divergent models. Similar behaviour has been observed in other hydrological basins, where intermediate-sized ensembles produced more realistic correlation structures7,8,9. The reliability of AMME8 in reproducing ERA5-based correlations, particularly for the PRCPTOT-R10mm, R95p_total-Rx5day, and R99p_total-Rx1day pairs, suggests that ensemble diversity and structural similarity must be carefully balanced rather than simply maximizing ensemble size. The SU analysis further supports this finding. SU decreased systematically with increasing ensemble size, but the most substantial improvements occurred between AMME3 and AMME8; beyond eight models, improvements became marginal, indicating diminishing returns. This pattern is consistent with earlier research showing that uncertainty reduction in MMEs tends to plateau beyond a threshold number of contributing models9. Although AMME13 displayed the lowest SU, its slight improvement over AMME8, combined with reduced skill in reproducing inter-index dependencies, suggests that AMME8 is the optimal ensemble, because adding further models yields no meaningful improvement. Future projections using the optimal ensemble (AMME8) show a pervasive intensification of precipitation extremes under both SSP245 and SSP585, with the far-future period (2061–2100) exhibiting the strongest increases.
The projected growth in annual precipitation, moderate rainfall days (R10mm, R20mm), and short-duration extremes (Rx1day, Rx5day) signals a shift toward a more moisture-laden, convectively active monsoon regime, consistent with thermodynamic expectations for a warming climate. The magnitude of these increases in the Kosi Basin, particularly the 79% rise in R99p_total in the far future under SSP585, makes the region especially vulnerable to compound flooding and monsoon surges. These projected changes are likely to have major implications for flood management, agricultural resilience, design storm estimation, and regional water-resources planning. The projected intensification of precipitation extremes over the Kosi River Basin is broadly consistent with regional and global studies that report a strengthening of heavy rainfall under a warming climate. Large-scale assessments indicate that increases in atmospheric moisture associated with rising temperatures lead to a disproportionate increase in extreme precipitation, particularly under higher emission pathways63. Studies focusing on the Indian subcontinent and the Indo-Gangetic Plains have similarly documented increases in the frequency and magnitude of heavy rainfall events despite relatively modest changes in seasonal mean precipitation46,47. Recent basin-scale analyses over eastern India and Himalayan-fed river systems further suggest that extreme rainfall indices such as R95p, R99p, and short-duration maxima are especially sensitive to future warming, with implications for flood generation in low-gradient alluvial basins45,31. The present study extends these findings by demonstrating that objectively selected model subsets can better preserve the observed relationships among precipitation extremes while reducing ensemble uncertainty.
In this context, the optimal ensemble identified here reproduces both the magnitude and the inter-relationships of extremes more consistently than the full multi-model mean, reinforcing the importance of targeted ensemble selection for robust regional-scale climate impact assessments. Despite its comprehensive scope, the study has several limitations. The accuracy of the projections depends on the reliability of gridded datasets and on the inherent assumptions of statistical downscaling, both of which contribute to uncertainty in basins such as the Kosi. In this study, ERA5 reanalysis precipitation was adopted as the reference dataset for model evaluation. Although IMD gridded rainfall is widely regarded as the most authoritative observational dataset over India, ERA5 was selected for methodological reasons relevant to basin-scale, grid-based intercomparison. ERA5 provides spatially and temporally homogeneous precipitation fields that are less affected by uneven station density and by the interpolation assumptions inherent in gauge-based gridded products. Such homogeneity is particularly important for assessing relative model performance, ranking stability, and ensemble uncertainty. Previous studies over India have reported comparable skill of ERA5 in representing precipitation variability and extremes64,65,66,67. ERA5 also ensures methodological consistency with CMIP6 model outputs, which are likewise physically based representations of large-scale atmospheric processes. Nevertheless, IMD gridded rainfall could provide additional regional validation, and future research may incorporate parallel evaluations to further strengthen robustness. Some of the performance metrics used in this study describe related statistical properties (such as error magnitude or bias tendency); their joint inclusion, however, allows different expressions of model deviation to be evaluated across diverse ETCCDI indices.
The CRITIC method partially mitigates redundancy by down-weighting highly correlated or weakly discriminating indicators based on inter-criterion correlation and variance structure. However, correlation-based weighting cannot entirely eliminate the cumulative influence of multiple related metrics, particularly for small model ensembles. A systematic assessment of reduced or alternative metric subsets to enhance parsimony and interpretability is therefore a relevant direction for future research. Finally, the evaluation was restricted to 13 downscaled CMIP6 models; expanding the model pool could further enhance the robustness of ensemble selection. Overall, this study demonstrates that an index-specific, weighted evaluation approach improves the reliability of multi-model ensembles and their projections of precipitation extremes. The identification of AMME8 as the best-performing ensemble provides a strong foundation for future hydro-climatic studies, flood forecasting, and climate-informed planning in the Kosi Basin.

Conclusions

This study provides a comprehensive evaluation of thirteen statistically downscaled and bias-corrected CMIP6 models for simulating precipitation extremes over the Kosi River Basin. By integrating objective CRITIC-derived weights with four complementary MCDM techniques, the analysis delivers a robust and transparent framework for identifying the most reliable climate models and ensemble configurations for regional hydro-climatic applications. Based on the CRITIC-weighted MCDM rankings, MPI-ESM1-2-HR, INM-CM5-0, and BCC-CSM2-MR emerged as the most reliable models for simulating historical climate conditions, whereas ACCESS-CM2 and the NorESM2 family consistently ranked lower. The ensemble analysis further established that the eight-member ensemble (AMME8) offers the most balanced and reliable performance. AMME8 reproduced the inter-index relationships more accurately than AMME3 and AMME5. Although AMME13 yielded slightly lower SU values, the improvements were marginal and accompanied by reduced coherence among extreme indices, indicating diminishing benefits from adding structurally divergent models. AMME8 was therefore identified as the optimal ensemble for projecting future precipitation extremes in the basin. Future projections using AMME8 indicate a clear intensification of precipitation extremes under both SSP245 and SSP585, with the largest increases in the far future under SSP585 (about 47% for PRCPTOT and 79% for R99p_total). This points to a future hydrological regime that is not only wetter but also substantially more volatile. Such shifts imply heightened risks of extreme hydro-meteorological events, including flash floods, large-scale riverine flooding, and intense monsoon-driven hazards, emphasizing the need for proactive climate adaptation planning within the basin. While the analysis focused on annual extremes, future work may extend it to seasonal and sub-seasonal scales, incorporate additional performance metrics or weighting techniques, and examine hybrid downscaling approaches to further reduce uncertainty in extreme rainfall projections.