A systematic approach to modeling monthly maximum temperature and total rainfall in Kenya

Otieno, Kevin; Chaba, Linda; Odhiambo, Collins; Omolo, Bernard

doi:10.1038/s41598-025-12810-0

Article
Open access
Published: 28 August 2025

Scientific Reports volume 15, Article number: 31758 (2025)

Abstract

Goodness of fit (GOF) test approaches for selecting probability distributions of climatic variables are pervasive in the statistical literature. However, a combined approach of multiple tests remains underutilized despite evidence supporting its improved precision. Increasingly erratic climatic conditions pose severe threats to economic stability, necessitating robust statistical methods for climate modeling. To address this need, this study evaluates probability distributions for climatic variables using a comprehensive approach that combines multiple tests. A scoring system ranked each distribution’s performance across tests, with a composite score indicating the best fit. To assess robustness, sensitivity analysis on the best-performing distribution examined the influence of partitioning data into different segments (block sizes). The results show that the generalized extreme value (GEV) distribution consistently outperforms the other candidate distributions for both temperature and rainfall data across multiple metrics. Extended block sizes capture long-term climatic patterns but introduce greater uncertainty due to fewer data points, while shorter block sizes tend to overfit. Intermediate block sizes provide a balance, producing reliable parameter estimates and stable return levels. These findings underscore the importance of selecting suitable block sizes and confirm the robustness of the GEV distribution for climate modeling. The study contributes to improved methodologies for risk assessment and climate adaptation strategies, particularly in regions such as Kenya.

Introduction

Kenya’s increasing exposure to the effects of climate variability is a pressing issue, with erratic rainfall patterns and rising temperatures significantly affecting its key sectors. Agriculture, a backbone of Kenya’s economy^1,2, is particularly vulnerable, as unpredictable weather disrupts planting and harvesting cycles, reduces yields, and exacerbates food insecurity. Infrastructure, too, faces challenges, with extreme weather events such as floods and droughts causing damage to roads, bridges, and other critical systems. The cumulative effect of these climate-induced challenges undermines the country’s overall economic stability, highlighting the urgent need for robust mitigation and adaptation strategies.

The effects of climate variability are particularly evident in regions like Marsabit, where prolonged droughts and heavy rainfall lead to severe consequences. Droughts reduce water availability, hinder crop growth, and limit pastures, leading to crop failures and livestock losses, exacerbating food insecurity^3,4,5. In contrast, intense rainfall causes soil erosion, farmland flooding, and infrastructure damage, imposing significant financial burdens on the government for repairs and diverting resources from development projects.

These recurring events underscore the urgent need for sustainable strategies, such as climate-resilient agricultural practices, improved water management systems, and robust infrastructure design. Investments in early warning systems and community-based adaptation measures are also critical to mitigating the impacts on vulnerable populations.

A deeper understanding of climatic variables, such as rainfall and temperature, can be achieved through probability distributions, which provide valuable tools to analyze climate patterns⁶. Globally, researchers have identified region- and time-dependent distributions for these variables, with models such as GEV, Gamma, log-normal, and Weibull frequently recommended for climatic data. Notable studies include those by Sharma and Singh⁷, Dzupire et al.⁸, Athulya and James⁹, Ozonur et al.¹⁰, Ximenes et al.¹¹, Hussain et al.¹², Singirankabo and Iyamuremye¹³ and Agbonaye and Izinyon¹⁴. For example, Ximenes et al.¹¹ found Gamma and Weibull to be optimal for monthly precipitation in Northeast Brazil, while Douka and Karacostas¹⁵ identified GEV and log-normal as suitable for extreme precipitation in Thessaloniki, Greece. The differences in the probability distributions between Ximenes et al.¹¹ and Douka and Karacostas¹⁵ can be attributed to different geographical locations; Greece is located at \((40^\circ 37'\,\text{N}, 22^\circ 95'\,\text{E})\) and Northeast Brazil at \((34^\circ 47'\,\text{N}, 48^\circ 45'\,\text{W})\). The two studies also used different record periods and resolutions; Greece’s data comprised monthly precipitation records from 1988 to 2017, whereas the study on Northeast Brazil used hourly rainfall data from 1947 to 2003. These studies and the summary in Table 1 demonstrate the importance of selecting appropriate probability distributions for accurate climate modeling.

Table 1 Literature results of probability density functions (PDF) fitted to rainfall data.

Extensive research has also been conducted to identify the best-fitting probability distributions for temperature data. Key studies include those by Athulya and James⁹, Dzupire et al.⁸, Hasan²², Hossain²³, Hussain et al.¹² and Ozonur et al.¹⁰. These studies have explored various distributions, including the normal, log-normal, Gamma, and Weibull distributions. For instance, Hussain et al.¹² identified the Generalized Pareto (GP), Extreme Value (EV), and GEV models as suitable for modeling temperature data. Similarly, Hasan²² employed ten continuous distributions, including the exponential, Gamma, Log-Gamma, Beta, normal, log-normal, Erlang, power function, Rayleigh, and Weibull distributions, with the Beta distribution emerging as the best fit for the temperature data.

This study aims to identify the most appropriate probability distributions for modeling monthly maximum temperatures and total monthly rainfall in Kenya. The analysis is based on a comprehensive data set covering the last 73 years, capturing the impacts of recent climatic changes. By incorporating these extensive and up-to-date data, the study ensures that the models account for evolving climate patterns. For instance, accurate descriptions of climatic data provide a better understanding of the probability distributions of maximum temperatures and total rainfall, which helps capture the frequency and intensity of climatic events, such as heat waves and heavy downpours. These models also enhance predictive capabilities by leveraging historical trends and recent shifts, improving forecasting accuracy and facilitating better preparation for future climatic scenarios. Additionally, by identifying the underlying distributions, the study supports data-driven decision-making, providing a critical foundation for risk assessment and resource allocation in agriculture, water management, and disaster response sectors.

The study makes a significant contribution to modeling climatic events through three key focus areas. First, it provides a comprehensive theoretical framework for understanding and applying statistical distributions in hydrology and climate studies. The framework offers precise definitions of commonly used distributions, facilitating their identification and application to various climatic datasets. It also includes robust parameter estimation methodologies that ensure accurate modeling of climatic variables. Furthermore, the study outlines strategies for selecting extreme values tailored to specific extreme value distributions, enabling the precise focus on significant climatic events.

Second, the research emphasizes the application of GOF tests to identify the most suitable probability distributions for climatic data. Detailed discussions on the implementation of GOF tests enhance the accuracy and reliability of the models. This methodological rigor improves the alignment of models with observed data and bolsters their credibility for practical applications in risk assessment and decision-making.

Lastly, we emphasize the significance of temporal pattern analysis through block size selection, a crucial factor in statistical modeling that directly affects how well temporal patterns in climatic data are captured. We conduct a sensitivity analysis to assess the impact of varying block sizes on the GEV distribution. This analysis combines graphical methods, GOF tests, return level estimates for various periods, and confidence intervals. By examining the effect of block size on model performance and on forecasts of extremes, this part of the study provides valuable insights into the stability and reliability of the GEV distribution across varying temporal resolutions.

The paper is structured as follows. The “Methods” section provides a detailed description of the data, the procedure for selecting candidate probability distributions, parameter estimation methods, and the implementation of GOF tests, including the combined approach of multiple GOF tests. The “Results and discussion” section presents summary statistics, results from the selection of candidate distributions, findings from the GOF tests, and insights from the sensitivity analysis. Finally, the “Conclusion” section summarizes the key findings and their implications for climate modeling and risk assessment.

Methods

Data

The monthly maximum temperature (Tmax) and total precipitation (Prep) data for Kenya, covering the period 1950–2022, were sourced from the World Bank Climate Change Knowledge Portal²⁴. The precipitation data (Prep), measured in millimeters, represents the total accumulation of monthly rainfall. This provides a comprehensive measure of rainfall intensity and distribution across different months. The temperature data (Tmax), recorded in degrees Celsius, captures the highest daily maximum temperature observed each month, offering valuable insights into extreme temperature events.

Selection of candidate probability distributions

A review of existing literature identified probability distributions commonly applied in hydrological studies: exponential, Gamma, Weibull, log-normal, logistic, Gumbel, GPD, and GEV^7,8,9,10,11,12,14,16,17,18,20. For temperature data, the same distributions, together with the normal distribution, were identified as suitable candidates, supported by the findings of Hasan²² and other related studies. Table 2 describes each probability distribution function. These distributions were selected due to their suitability in modeling skewed, heavy-tailed, or extreme data characteristics commonly found in climatic datasets. The Cullen and Frey graph²⁵ was used to preliminarily assess the shape characteristics of the data, guiding the selection of appropriate distributions for further analysis.

Parameter estimation

In statistical modeling, parameter estimation is essential due to the typically unknown nature of most model parameters. Commonly employed methods include the Method of Moments, L-moments, Maximum Likelihood Estimation (MLE), and LH-moments, as noted in studies by Al Mamoon and Rahman⁶ and Haddad and Rahman²⁶. In this paper, we employ the MLE method for parameter estimation across the analyzed distributions, as it is one of the most widely applied and robust methods. MLE is favored for its consistency and efficiency, particularly in large samples, as it maximizes the likelihood of the observed data and often yields more reliable results compared to other methods such as Moments, L-moments, and LH-moments, particularly in terms of asymptotic properties. Research, including foundational studies by Fisher²⁷, Zong²⁸ and Naghettini²⁹, has demonstrated that MLE’s variance and bias are comparatively low, thereby enhancing its suitability across a broad range of distributions. These qualities render MLE exceptionally reliable for environmental datasets, including temperature and rainfall measurements, where precision and robustness are critical.
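
As an illustration of what MLE maximizes in the GEV case, the log-likelihood below uses the standard three-parameter form with location \(\mu\), scale \(\sigma > 0\), and shape \(\xi \ne 0\). The symbols \(z_1, \dots, z_n\) for the block maxima are an assumption made here for the sketch; the notation in Table 2 may differ.

\[
\ell(\mu, \sigma, \xi) = -n \log \sigma - \left(1 + \frac{1}{\xi}\right) \sum_{i=1}^{n} \log\left[1 + \xi \, \frac{z_i - \mu}{\sigma}\right] - \sum_{i=1}^{n} \left[1 + \xi \, \frac{z_i - \mu}{\sigma}\right]^{-1/\xi},
\]

valid when \(1 + \xi (z_i - \mu)/\sigma > 0\) for every \(i\). In the limit \(\xi \to 0\) this reduces to the Gumbel log-likelihood. The expression has no closed-form maximizer, so the estimates are normally obtained numerically.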

Goodness of fit tests

The suitability of each probability distribution was assessed using a suite of GOF tests, including the Kolmogorov-Smirnov (KS), Anderson-Darling (AD), Cramér-von Mises (CvM), and Chi-Square tests. These tests evaluate the agreement between the theoretical and empirical distributions, with the KS test focusing on overall distributional fit^15,30, AD and CvM emphasizing tail behavior^15,26,31,32,33, and Chi-Square examining frequency alignment¹⁹. Additional evaluation was performed using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) to balance model complexity and fit^10,12,22,26, along with Root Mean Square Error (RMSE) to quantify predictive accuracy¹⁴.
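
The information criteria and RMSE mentioned above follow standard definitions, sketched below in Python. The function names are illustrative, and the log-likelihood in the example is a hypothetical value back-solved from the AIC that the Results section reports for the GEV temperature fit; it is not a figure given in the paper.

```python
import math

def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_lik

def rmse(observed, predicted):
    """Root mean square error between paired observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Three GEV parameters and 876 monthly observations, as in the paper.
n_obs, n_params = 876, 3
log_lik = -1446.15  # hypothetical: back-solved from the reported AIC of 2898.30

print(round(aic(log_lik, n_params), 2))
print(round(bic(log_lik, n_params, n_obs), 2))
```

For models with the same number of parameters fitted to the same sample, AIC and BIC differ only by the constant \(k \ln n - 2k\), so they order such models identically.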

Comprehensive scoring methodology

The literature indicates a lack of suitable GOF tests designed to effectively distinguish between empirical and theoretical distributions³⁴. Numerous studies have shown that the best-fit probability distribution can vary significantly between different regions, even for the same variable³². In response to these challenges, we adopt a comprehensive scoring methodology, as outlined in previous studies^14,17,22,35. This method employs an integrated scoring approach that incorporates multiple GOF tests, information criteria, and graphical analyses to ensure a robust selection of the optimal probability distribution model. Each distribution model is subjected to several GOF tests, with a scoring system applied whereby the best-performing model in each test receives the top rank (rank 1). Each model’s rank is determined independently for each GOF test and then summed across all tests to produce a composite score, so that a lower total indicates a better overall fit. For graphical assessments, rankings are informed by visual inspection of density plots and quantile-quantile (Q–Q) plots, providing additional insight into the best-fitting model.
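
The rank-and-sum scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions: ties share the average rank, every criterion carries equal weight, and the toy numbers are hypothetical rather than taken from Tables 4 or 5. The paper does not specify its tie handling.

```python
def rank_models(scores, higher_is_better=False):
    """Rank models on one criterion; rank 1 is best, ties share the average rank."""
    names = sorted(scores, key=scores.get, reverse=higher_is_better)
    ranks = {}
    i = 0
    while i < len(names):
        j = i
        # Extend j over any run of models tied with the model at position i.
        while j + 1 < len(names) and scores[names[j + 1]] == scores[names[i]]:
            j += 1
        average_rank = (i + 1 + j + 1) / 2
        for name in names[i:j + 1]:
            ranks[name] = average_rank
        i = j + 1
    return ranks

def composite_scores(criteria):
    """Sum per-criterion ranks; the lowest total is the best overall fit."""
    totals = {}
    for scores, higher_is_better in criteria.values():
        for name, rank in rank_models(scores, higher_is_better).items():
            totals[name] = totals.get(name, 0) + rank
    return totals

# Hypothetical toy values for three candidate distributions.
toy_criteria = {
    "KS statistic": ({"GEV": 0.03, "Gamma": 0.04, "Uniform": 0.20}, False),
    "KS p-value": ({"GEV": 0.42, "Gamma": 0.30, "Uniform": 0.00}, True),
    "AIC": ({"GEV": 2898.0, "Gamma": 2905.0, "Uniform": 3100.0}, False),
}
totals = composite_scores(toy_criteria)
best = min(totals, key=totals.get)
print(totals, best)
```

Ranks from the visual assessments (density and Q–Q plots) would enter as additional criteria in the same dictionary.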

Table 2 Description of various probability distribution functions.

Results and discussion

This section presents the statistical results of the analysis. The modeling assumes that the observations are independent and identically distributed (iid). To verify these assumptions, we tested for stationarity using the Augmented Dickey-Fuller (ADF) test, for randomness using the Wald-Wolfowitz runs test, and for independence using the Ljung-Box test. All tests were performed at the \(5\%\) significance level. The results indicated that the data were stationary and random but exhibited autocorrelation; therefore, the data were aggregated using block analysis.
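
Of the three diagnostic tests, the Wald-Wolfowitz runs test is simple enough to sketch with the Python standard library. The version below dichotomizes the series about its median and uses the large-sample normal approximation; both choices are assumptions for illustration, since the paper does not state how its runs test was configured.

```python
import math
from statistics import median

def runs_test(series):
    """Wald-Wolfowitz runs test about the median.

    Returns (number of runs, z statistic, two-sided p-value).
    """
    med = median(series)
    # Values equal to the median carry no sign information and are dropped.
    above = [value > med for value in series if value != med]
    n1 = sum(above)
    n2 = len(above) - n1
    n = n1 + n2
    runs = 1 + sum(a != b for a, b in zip(above, above[1:]))
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return runs, z, p_value

# A strictly alternating toy series has far too many runs to be random.
alternating = [1, 2] * 10
print(runs_test(alternating))
```

A p-value above 0.05 would be read as no evidence against randomness at the significance level used in the paper.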

Summary statistics

Table 3 shows the descriptive statistics for the monthly maximum temperature and total monthly rainfall for Kenya.

Table 3 Summary statistics for the monthly maximum temperature (\(^\circ\)C) and total monthly rainfall (mm).

The maximum temperature (Tmax) series of 876 observations has a mean of \(26.23 ^\circ C\) with low variability (standard deviation = 1.27) and a range from \(23.16 ^\circ C\) to \(29.97 ^\circ C\). The interquartile range, \(25.32 ^\circ C\) to \(27.15 ^\circ C\), highlights a concentration around the median of \(26.23 ^\circ C\), with a near-symmetrical distribution (skewness = 0.12) and a relatively flat shape (kurtosis = 2.43). The findings are consistent with previous studies^1,2, which indicate that while temperature variability at the national level tends to be low due to data aggregation, an increase in temperature has been observed in most regions across the country.

In contrast, total rainfall (Prep) exhibits much higher variability, with a mean of 63.97 mm and a standard deviation of 42.72 mm, ranging from 2.46 mm to 280.32 mm. This wide range reflects the variability and extreme nature of rainfall. The quartiles (q25 = 35.90, q75 = 81.88) and a median of 50.90, well below the mean, indicate a right-skewed distribution (skewness = 1.46), while the kurtosis of 5.43 points to heavy tails, signifying extreme events. These findings also align with earlier evidence^1,2.

Choice of candidate distributions

For the temperature data in Fig. 1a, the Cullen and Frey graph shows that the distribution lies close to the normal region with a slightly platykurtic shape, identifying the normal, uniform, log-normal, Gamma, Weibull, and logistic distributions as potential candidates. Studies such as Hussain et al.¹² have shown that extreme value distributions are suitable for modeling temperature data; therefore, these distributions were also considered potential candidates. For the rainfall data in Fig. 1b, the distribution exhibits positive skewness and high kurtosis, suggesting alignment with distributions such as log-normal, Gamma, Weibull, and exponential. Given the presence of extreme values, models that account for extreme behavior, specifically the GPD and GEV distributions, were also included in the analysis.

Model fitting was conducted using MLE for parameter estimation. For extreme value distributions, the Block Maxima (BM) and Peaks Over Threshold (POT) approaches were used to obtain the block maxima and threshold exceedances required to fit the GEV and GPD distributions, respectively. The BM approach is widely used in extreme value analysis to capture maximum events within defined time intervals, such as annual maxima, and it is commonly applied to environmental and climate data^30,36,37. For the POT method, which is well-suited to modeling excesses over a specified threshold, the Mean Residual Life (MRL) plot was generated as shown in Fig. 2, and visual inspection was used to determine an appropriate threshold for each variable^13,37. The blue curve in Fig. 2 represents the observed mean excess values \(e(u) = E(x_i - u \mid x_i > u)\), the red lines denote the upper and lower \(95\%\) confidence limits, and the threshold \(u\) defines the limit for identifying extreme events \((x_i: x_i > u)\)³⁸. In Fig. 2a, a threshold in the range of 50 to 150 is suitable, as it provides a stable mean excess with narrower confidence intervals. This indicates that values above this threshold exhibit behavior suitable for modeling with a GPD. For temperature, the MRL plot in Fig. 2b did not suggest a clear threshold, so an initial threshold of about \(u=25\) was adopted, where the confidence intervals remain relatively narrow, indicating reliable estimates. Beyond approximately 28, the confidence intervals begin to widen slightly, indicating increased uncertainty in the mean excess values at higher thresholds. The GPD parameters were estimated based on observations exceeding this threshold.
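
The mean excess values behind an MRL plot can be computed directly from the definition \(E(x_i - u \mid x_i > u)\). The sketch below is illustrative only: the function names and the toy rainfall values are hypothetical, and the confidence bands drawn in Fig. 2 are omitted.

```python
def mean_excess(data, u):
    """Return (mean excess over threshold u, number of exceedances)."""
    excesses = [x - u for x in data if x > u]
    if not excesses:
        raise ValueError("no observations exceed the threshold")
    return sum(excesses) / len(excesses), len(excesses)

def mrl_curve(data, thresholds):
    """Mean residual life curve: one (u, e(u), count) triple per threshold."""
    return [(u,) + mean_excess(data, u) for u in thresholds]

# Hypothetical toy values in mm, not the Kenyan series.
toy_rain = [12.0, 35.5, 48.0, 60.2, 75.0, 90.4, 120.8, 150.3, 210.0, 280.3]
for u, e_u, count in mrl_curve(toy_rain, [50, 100, 150]):
    print(f"u={u}: mean excess={e_u:.2f} from {count} exceedances")
```

The exceedance count falls as the threshold rises, which is why the confidence intervals in an MRL plot widen toward the right.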

Graphical assessments and GOF tests results

Graphical assessments

Density and Q–Q plots were generated to compare the observed data with several fitted theoretical distributions. For temperature data, the density plot in Fig. 3 shows that the GEV, Gamma, and log-normal distributions provide the best fit, capturing both the central peak and tail behavior. The normal, Weibull, and logistic distributions also perform reasonably well but exhibit slight deviations in the tails. In contrast, the uniform distribution shows significant discrepancies, particularly in the extremes, suggesting its unsuitability for modeling extreme temperature events. The Q–Q plots in Fig. 4 reveal that most distributions demonstrate deviations in the tails, with the GEV and normal distributions showing the closest adherence to the theoretical quantiles. Among the fitted distributions, the GEV, normal, log-normal, and Gamma distributions provide the best fit in that order, followed by the logistic and Weibull distributions, which exhibit moderate deviations. In contrast, the GPD and uniform distributions exhibit a substantial lack of fit, particularly at the lower and upper tails. This visual approach to identifying the best-fitting distribution is inherently subjective and, therefore, cannot be relied upon solely. To enhance robustness, these results were complemented with findings from other GOF tests to improve the reliability of distribution selection.

Similarly, for the rainfall data in Fig. 5, the GEV, Gamma, and log-normal distributions show the closest alignment with the observed data, effectively capturing the shape and spread of the distribution. The Weibull distribution provides a moderate fit, performing well in the central range but diverging in the tails. In contrast, the exponential and GPD distributions exhibit substantial deviations, failing to represent the empirical distribution accurately, especially at the extremes. The Q–Q plots in Fig. 6 reinforce these findings, with the GEV and Gamma distributions displaying the best adherence to the theoretical quantile line, followed by the log-normal and Weibull distributions. The exponential and GPD distributions exhibit the weakest performance. These results are consistent with previous studies, such as Fadhilah et al.²¹, which identified the GEV distribution as the most appropriate model for extreme rainfall events.

GOF tests

The GOF analysis in Table 4 (a) identifies the GEV distribution as the most suitable model for the maximum temperature data. The GEV distribution achieves the lowest KS (0.0297), AD (0.8890), and CvM (0.1335) statistics, accompanied by high p-values (0.4206, 0.4211, and 0.4442), indicating close agreement with the observed data. It also produces the lowest Chi-square statistic (3.5969, p = 0.9637) and achieves the best AIC (2,898.30), BIC (2,912.63) and RMSE (1.5694) values, highlighting its precision and efficiency. Other distributions, such as the normal, log-normal, and Gamma, provide moderate fits, with non-significant GOF statistics but higher AIC and BIC values, along with RMSE values that reflect lower accuracy than the GEV. Conversely, the Weibull, uniform, logistic, and GPD distributions exhibit poor performance, with high test statistics, low p-values, and significant deviations from the observed data. The uniform and GPD distributions show extreme misalignment, as evidenced by infinite AD statistics, high Chi-square values, and elevated RMSE scores, confirming their unsuitability for modeling maximum temperature data.

Table 4 Goodness of fit test results for temperature and rainfall distributions.

For the rainfall data in Table 4 (b), the GEV distribution also emerges as the most robust model, as reflected in the highest p-values for the tests KS (0.3487), AD (0.2753), and CvM (0.2897), indicating minimal deviation from observed data. Furthermore, the GEV achieves among the lowest AIC (8713.87) and BIC (8728.19) values, highlighting its parsimony and suitability for modeling rainfall patterns. Its superior predictive accuracy is evident from the lowest RMSE value (58.86), reinforcing its reliability. Concerning chi-square tests, the log-normal distribution was found to have the lowest chi-square value, indicating a better fit. Yuan et al.¹⁷ also had a similar finding when they used Chi-square tests to evaluate the best fit for the frequency analysis of the annual maximum hourly precipitation. In contrast, the GPD and exponential distributions perform poorly, with significant p-values, high Chi-square statistics, and elevated RMSE values, indicating substantial deviation and limited applicability for modeling rainfall data.

A comprehensive scoring method was used to further evaluate the best-fitting distributions, with findings presented in Table 5. The analysis for temperature distributions in Table 5 (a) revealed that the GEV consistently outperformed the others, as also observed in ref.³⁹, achieving the top overall rank with a total score of 17 (lower totals indicate a better fit). This was supported by its superior performance in key tests, including the KS, AD, and CvM tests. The Gamma and log-normal distributions ranked second and third, respectively, demonstrating moderate fits across multiple metrics. However, distributions such as the Weibull, uniform, logistic, and GPD performed poorly, accumulating higher total scores and displaying suboptimal results in the density and Q–Q plots.

For rainfall distributions, the ranking analysis in Table 5 (b) shows that the GEV distribution again emerged as the top performer, ranking first with a total score of 16. These findings are supported by Agbonaye and Izinyon¹⁴, Al Mamoon and Rahman⁶, Alam et al.¹⁸, Coronado-Hernández et al.³⁶, Fadhilah et al.²¹, Ghosh et al.⁴⁰, Ng et al.³⁵ and Yuan et al.¹⁷. Its strength was evident across most GOF tests, where it outperformed or closely matched the best-performing distributions in each category. The Gamma distribution ranked second, showing a strong overall fit with balanced performance across metrics. The log-normal distribution followed in third place, excelling in certain tests but lagging in others, such as AIC and BIC. In contrast, the exponential and Weibull distributions demonstrated weaker fits, while the GPD distribution consistently ranked lowest.

Table 5 Goodness of fit rankings for temperature and rainfall distributions.

Sensitivity analysis

To evaluate the robustness of the GEV distribution’s fit to rainfall data, a sensitivity analysis was performed using various block sizes designed to capture diverse temporal patterns and extremes. Block size refers to the length of each group in a series of independent groups of observations³⁸. According to Coles³⁸, block sizes are often selected to capture a specific period. In this work, the block sizes included annual, seasonal, monthly, 5-year, 10-year, 12-month moving averages, 6-month intervals, and 4-month intervals. Annual blocks, where maximum values were extracted per year, followed the methodologies outlined in refs.^38,41. Seasonal blocks were based on quarterly aggregations, as indicated by refs.⁴² and⁴¹. Monthly blocks were used to capture monthly maxima, as discussed in refs.⁴³ and⁴². For longer-term patterns, multi-year blocks of 5-year and 10-year intervals were established, consistent with approaches adopted in studies such as ref.⁴⁴. A 12-month moving average window assessed rolling maxima, highlighting shifts in trends. Event-based blocks focused on the most extreme events by isolating total rainfall above the 95th percentile, following the techniques used in ref.⁴⁵. For intermediate seasonality, semi-annual blocks divided each year into January–June and July–December intervals, consistent with approaches used by refs.^42,43,46. Furthermore, a regional seasonal classification for Kenya was used to account for local climatic variations, with blocks corresponding to the “Hot and Dry”, “Long Rainy”, “Cool”, and “Short Rainy” seasons, building on the framework proposed by ref.⁴⁷. For each block length, maximum values were extracted and the GEV parameters were estimated; the estimates are presented in Table 6.
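
Extracting block maxima for fixed-length blocks can be sketched as below. The sketch assumes consecutive, non-overlapping blocks and drops an incomplete trailing block; the paper does not state how partial blocks were handled, although this choice reproduces the sample sizes of 14 and 7 that the text reports for the 5-year and 10-year blocks. The placeholder series is hypothetical.

```python
def block_maxima(series, block_size):
    """Maximum of each consecutive, non-overlapping block of block_size values.

    Assumption: an incomplete trailing block is discarded.
    """
    n_blocks = len(series) // block_size
    return [
        max(series[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]

# 876 monthly values (1950-2022); a hypothetical stand-in for the real series.
placeholder = list(range(876))
for label, months in [("annual", 12), ("5-year", 60), ("10-year", 120)]:
    print(label, len(block_maxima(placeholder, months)))
```

With 876 monthly observations this yields 73 annual, 14 five-year, and 7 ten-year maxima, which makes the loss of sample size at longer block lengths concrete.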

For both rainfall and temperature data, parameter estimates reveal notable differences between block sizes, particularly in the shape parameter, which defines tail behavior. For rainfall, annual, 5-year, and 10-year blocks exhibited non-significant negative shape parameters \((p > 0.05)\), pointing to a Weibull class of distribution, as reported in ref.³⁰, and to uncertainty in tail estimates for these broader temporal aggregations. In contrast, mid-range blocks, such as monthly, quarterly, event-based, and seasonal, yielded significant positive shape parameters, reflecting the heavy-tailed Fréchet class of distributions with well-defined extremal patterns. This is in agreement with Moccia et al.³³, although Onwuegbuche et al.⁴⁸ and Singirankabo et al.³⁷ found the Gumbel distribution to be optimal. The location and scale parameters were consistently significant \((p < 0.05)\) across all block sizes, indicating reliable estimation of central tendency and variability. The event-based block for rainfall, with a high shape estimate (0.3974), suggested a heavier tail and a higher propensity for extreme rainfall events compared to other blocks. For temperature data, location and scale parameters were also consistently significant across all blocks, confirming stable estimates of central tendency and variability. However, the shape parameter was not significant for the 5-year, 10-year, and event-based models, indicating uncertainty in tail estimates, which is likely due to the limited number of data points or the irregular occurrence of extreme events. In contrast, the quarterly, monthly, and seasonal models produced significant shape parameters, suggesting that they provide more robust and reliable tail estimates for predicting rare and extreme values in both temperature and rainfall.

Table 6 ML estimates and significance of location, scale, and shape parameter for temperature and rainfall distribution.

The model diagnostic tests in Table 7 reveal that the 10-year and 5-year blocks provide the best fit for both rainfall and temperature data, achieving the lowest AIC and BIC values (e.g., AIC = 74.406 and 146.985 for rainfall), indicating strong model parsimony and minimal information loss. These longer blocks effectively capture long-term extreme trends but rely on fewer data points (n = 7 and 14), which increases uncertainty in parameter estimates due to larger variances, as demonstrated by ref.⁴⁶. This finding aligns with studies^38,41 that emphasize the effectiveness of larger blocks in capturing long-term climatic trends by averaging out short-term fluctuations, thereby focusing on extreme patterns. Event-based and annual blocks also perform well for rainfall, with low AIC and BIC values, reflecting their stability in representing extreme events with adequate data, as supported by ref.⁴². In contrast, higher-frequency blocks, such as monthly and 12-month moving average models, exhibit much higher AIC and BIC values for both rainfall and temperature, suggesting potential overfitting and inefficiency in capturing extreme patterns, a limitation also noted by ref.⁴³. Mid-range blocks, including quarterly, semi-annual, and seasonal, achieve moderate AIC and BIC values for both datasets, offering a balanced approach that captures seasonal variability while maintaining sufficient stability for reliable parameter estimation. This perspective is supported by studies such as refs.^15,42,46, which highlight the value of intermediate temporal scales in balancing the trade-offs between long-term trend analysis and sufficient data representation.

Table 7 Model performance metrics for maximum temperature (\(^\circ\)C) and total rainfall (mm) across different blocks.

In addition, we computed the return levels for different return periods to determine how various models estimate the extremes. The return level represents the magnitude of an event expected to be equaled or exceeded, on average, once within a specified return period^38,48. The findings in Fig. 7 for temperature and rainfall data reveal distinct patterns across models when estimating extremes at various return periods. For temperature in Fig. 7a, the 10-year and 5-year models consistently produce the highest return levels, maintaining stability across increasing return periods as observed in ref.⁴⁸, indicating their robustness in estimating extreme values over longer intervals. In contrast, models with finer resolutions, such as monthly and 12-month moving averages, yield lower return levels with modest increases over time, suggesting a limited capacity to capture rare extremes. The quarterly and semi-annual models show moderate return levels, providing a balanced estimation that captures both seasonal variability and long-term trends. For rainfall in Fig. 7b, a similar pattern emerges, with the 10-year, 5-year, and seasonal models achieving the highest and most stable return levels, while finer models like monthly and 12-month moving averages display lower return levels and less pronounced growth across return periods. The event-based model exhibits high initial return levels but shows a plateau at more extended periods, indicating potential limitations in capturing prolonged extremes. Overall, the 10-year, 5-year, and seasonal models appear to be the most consistent for temperature and rainfall extremes.
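
For a fitted GEV, the return level for a return period of \(T\) blocks is the quantile exceeded with probability \(1/T\) in any one block. The sketch below uses the standard closed form, with the Gumbel limit when the shape is effectively zero. The function name and the example parameter values are hypothetical and are not taken from Table 6.

```python
import math

def gev_return_level(mu, sigma, xi, return_period):
    """Return level exceeded on average once every `return_period` blocks.

    mu, sigma, xi are the GEV location, scale (> 0), and shape parameters.
    """
    p = 1.0 / return_period           # exceedance probability per block
    y = -math.log(1.0 - p)
    if abs(xi) < 1e-9:                # Gumbel limit as the shape tends to zero
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

# Hypothetical parameters, chosen only to show how the shape affects the tail.
for xi in (-0.2, 0.0, 0.2):
    levels = [gev_return_level(100.0, 20.0, xi, T) for T in (10, 50, 100)]
    print(xi, [round(level, 1) for level in levels])
```

Because the return period is counted in blocks, the same \(T\) corresponds to different calendar spans for monthly, annual, and 10-year blocks, which is one reason return levels from different block sizes are not directly comparable without rescaling.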

Finally, we used density plots to check how each model captures the distribution of maximum temperature and total rainfall. In the temperature plot (Fig. 8a), the 10-year, 5-year, and event-based models display the most concentrated curves, suggesting a narrower range with more pronounced extremes. Models with higher temporal resolutions, such as the monthly and 12-month moving average models, exhibit wider density curves, indicating a broader distribution that captures more frequent fluctuations but is less focused on extremes. The quarterly and semi-annual models fall between these two groups, striking a balance between stability and variability. For rainfall data (Fig. 8b), a similar pattern emerges: the 10-year and 5-year models show steeper, more concentrated curves, indicating that they effectively capture rare, high-magnitude events. In contrast, finer-resolution models, such as the monthly and 12-month moving average models, have flatter curves, capturing a wider range of data with less emphasis on extremes.

Conclusion

In this study, we have assessed various probability distributions for modeling maximum temperature and total rainfall data using a systematic and comprehensive approach that combines several GOF tests and graphical tools. In addition, we have identified the optimal block size for the GEV distribution using return levels across different periods, as well as log-likelihood, AIC, and BIC. Insights from GOF tests highlighted that the GEV, Gamma, and log-normal distributions were well-suited for both maximum temperature and total rainfall datasets, as they consistently aligned with empirical data. On the other hand, distributions such as uniform, Weibull, and logistic showed a poor fit across multiple metrics, underscoring their limitations in capturing the complexities of climatic variables. The GEV distribution emerged as the optimal model for rainfall and temperature data, consistently outperforming others in key metrics such as the AIC, BIC, and RMSE. It also demonstrated superior performance in GOF tests, including the KS, AD, and CVM tests. This strong performance affirms the robustness of the GEV distribution in modeling climatic extremes and its capacity to provide reliable insights into long-term trends.
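The idea of combining several criteria into one ranking can be sketched as follows. This is only an illustration under stated assumptions, not the scoring system used in the study: the composite score here is a plain sum of ranks over the KS statistic, the CVM statistic, and the AIC (the AD test and RMSE are left out to keep the sketch short), the location of the positive-support families is fixed at zero, the data are simulated from a GEV, and the name `composite_ranking` is hypothetical.

```python
import numpy as np
from scipy import stats

# Candidate families: (SciPy distribution, extra fit arguments, free parameters).
# Fixing loc=0 for the positive-support families is a simplifying assumption.
CANDIDATES = {
    "GEV": (stats.genextreme, {}, 3),
    "gamma": (stats.gamma, {"floc": 0}, 2),
    "log-normal": (stats.lognorm, {"floc": 0}, 2),
    "Weibull": (stats.weibull_min, {"floc": 0}, 2),
    "logistic": (stats.logistic, {}, 2),
}


def composite_ranking(data):
    """Rank candidates by the sum of their ranks on KS, CVM and AIC (lower is better)."""
    names, ks, cvm, aic = [], [], [], []
    for name, (dist, fit_kwargs, k) in CANDIDATES.items():
        params = dist.fit(data, **fit_kwargs)  # maximum likelihood fit
        loglik = dist.logpdf(data, *params).sum()
        names.append(name)
        ks.append(stats.kstest(data, dist.cdf, args=params).statistic)
        cvm.append(stats.cramervonmises(data, dist.cdf, args=params).statistic)
        aic.append(2 * k - 2 * loglik)
    # Rank 1 is the best (smallest) value on each criterion.
    total = stats.rankdata(ks) + stats.rankdata(cvm) + stats.rankdata(aic)
    order = np.argsort(total)
    return [(names[i], float(total[i])) for i in order]


# Simulated positive-valued sample standing in for a rainfall series (assumption).
rng = np.random.default_rng(1)
sample = stats.genextreme.rvs(-0.1, loc=100, scale=20, size=300, random_state=rng)

ranking = composite_ranking(sample)
for name, score in ranking:
    print(f"{name:>10}: composite score {score:.1f}")
```

A sum of ranks weights every criterion equally; a weighted score or a different set of tests would be an equally reasonable choice and could change the ordering of closely matched distributions.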

Block size analysis revealed the effectiveness of longer temporal aggregations, such as 10-year and 5-year blocks, which produced stable and high return levels across return periods, effectively capturing long-term extreme trends. However, these longer blocks increased uncertainty in parameter estimates due to fewer data points. In contrast, intermediate blocks, such as quarterly and seasonal, struck a balance by capturing seasonal variations while maintaining stability and reliable parameter estimates with moderate AIC and BIC values. High-frequency blocks, such as monthly and 12-month moving averages, although rich in data, exhibited higher AIC and BIC values, suggesting potential overfitting and inefficiency in representing extreme values.

The results of this study are important for Kenya and the wider East African region, where the adopted methodology can also be applied. The comprehensive GOF testing also supports better forecasting of temperature and rainfall, which is crucial for risk assessment and the development of climate adaptation strategies. With this knowledge, prediction of and preparation for catastrophic events, such as floods, droughts, or rising temperatures, can be improved. With better forecasts, policymakers and the government can improve infrastructure for water catchment systems and strengthen agricultural activities through proper planning and disaster preparedness.

However, a key limitation of this study is its focus on individual probability distributions for temperature and rainfall without explicitly addressing the interdependence between these variables. Since temperature and rainfall are inherently related, accurate risk assessments and effective climate adaptation strategies require consideration of their associations. Extensive research has been conducted on the dependence between temperature and rainfall; therefore, future studies should prioritize exploring dependence structures within a multivariate framework using the fitted probability distributions identified in this study. Advanced approaches such as copula models or joint distribution analyses could provide deeper insights into the interactions between these variables, particularly under extreme climatic conditions. Such efforts would significantly enhance the reliability of climate models and their applicability to integrated risk assessment frameworks.

To build on this work, future research should focus on applying this methodology at finer spatial scales using real datasets from various regions in Kenya. Conducting probability distribution analyses at regional levels, incorporating block size analysis, and integrating data from multiple weather stations could yield region-specific insights into seasonal rainfall patterns, further informing targeted climate adaptation strategies. From a policy perspective, the results underscore the need for data-driven strategies that take into account both the individual and the joint variability of climatic variables. Policymakers should leverage these insights to design robust adaptation measures, such as strengthening agricultural planning, improving water resource management, and building infrastructure resilience tailored to Kenya’s specific climate challenges.

Data availability

The data that support the findings of this study are accessible to registered users (free registration) on the World Bank, Climate Change Knowledge Portal (https://climateknowledgeportal.worldbank.org/).

References

1. GOK. Kenya Climate Smart Agriculture Strategy, 2017–2026 (Ministry of Agriculture, Livestock and Fisheries, 2017).
2. Jalango, D. et al. Climate smart agriculture investment plan for Kenya. In Accelerating Impacts of CGIAR Climate Research for Africa (AICCRA) (2022).
3. Nyika, J. M. Climate change situation in Kenya and measures towards adaptive management in the water sector. In Research Anthology on Environmental and Societal Impacts of Climate Change, 1857–1872 (IGI Global, 2022).
4. Ngure, M. W., Wandiga, S. O., Olago, D. O. & Oriaso, S. O. Climate change stressors affecting household food security among Kimandi–Wanyaga smallholder farmers in Murang’a County, Kenya. Open Agric. 6, 587–608 (2021).
5. Mkonda, M. Y. & He, X. Are rainfall and temperature really changing? Farmer’s perceptions, meteorological data, and policy implications in the Tanzanian semi-arid zone. Sustainability 9, 1412 (2017).
6. Al Mamoon, A. & Rahman, A. Selection of the best fit probability distribution in rainfall frequency analysis for Qatar. Nat. Hazards 86, 281–296 (2017).
7. Sharma, M. A. & Singh, J. B. Use of probability distribution in rainfall analysis. N. Y. Sci. J. 3, 40–49 (2010).
8. Dzupire, N. C., Ngare, P. & Odongo, L. A copula based bi-variate model for temperature and rainfall processes. Sci. Afr. 8, e00365 (2020).
9. Athulya, P. & James, K. Best fit probability distributions for monthly radiosonde weather data. Int. J. Adv. Manag. Technol. Eng. Sci. 7, 24–31 (2017).
10. Ozonur, D., Pobocikova, I. & de Souza, A. Statistical analysis of monthly rainfall in central west Brazil using probability distributions. Model. Earth Syst. Environ. 7, 1979–1989 (2021).
11. Ximenes, P. S. M. P., Silva, A. S. A., Ashkar, F. & Stosic, T. Best-fit probability distribution models for monthly rainfall of northeastern Brazil. Water Sci. Technol. 84, 1541–1556 (2021).
12. Hussain, B. et al. Interdependence between temperature and precipitation: Modeling using copula method toward climate protection. Model. Earth Syst. Environ. 8, 2753–2766 (2022).
13. Singirankabo, E. & Iyamuremye, E. Modelling extreme rainfall events in Kigali city using generalized Pareto distribution. Meteorol. Appl. 29, e2076 (2022).
14. Agbonaye, A. & Izinyon, O. Best-fit probability distribution model for rainfall frequency analysis of three cities in south eastern Nigeria. Niger. J. Environ. Sci. Technol. (NIJEST) 1, 34–42 (2017).
15. Douka, M. & Karacostas, T. Statistical analyses of extreme rainfall events in Thessaloniki, Greece. Atmos. Res. 208, 60–77 (2018).
16. Oseni, B. A. & Ayoola, F. J. Fitting the statistical distribution for daily rainfall in Ibadan, based on chi-square and Kolmogorov–Smirnov goodness-of-fit tests. West Afr. J. Ind. Acad. Res. 7, 93–100 (2013).
17. Yuan, J., Emura, K., Farnham, C. & Alam, M. A. Frequency analysis of annual maximum hourly precipitation and determination of best fit probability distribution for regions in Japan. Urban Clim. 24, 276–286 (2018).
18. Alam, M. A., Farnham, C. & Emura, K. Best-fit probability models for maximum monthly rainfall in Bangladesh using Gaussian mixture distributions. Geosciences 8, 138 (2018).
19. Houessou-Dossou, E. A. Y., Mwangi Gathenya, J., Njuguna, M. & Abiero Gariy, Z. Flood frequency analysis using participatory GIS and rainfall data for two stations in Narok town, Kenya. Hydrology 6, 90 (2019).
20. Coronado-Hernández, Ó. E., Merlano-Sabalza, E., Díaz-Vergara, Z. & Coronado-Hernández, J. R. Selection of hydrological probability distributions for extreme rainfall events in the regions of Colombia. Water 12, 1397 (2020).
21. Fadhilah, Y. et al. Fitting the best-fit distribution for the hourly rainfall amount in the Wilayah Persekutuan. Jurnal Teknologi 46, 49–58 (2007).
22. Hasan, R. H. R. Estimating the best-fitted probability distribution for monthly maximum temperature at the Sylhet station in Bangladesh. J. Math. Stat. Stud. 2, 60–67 (2021).
23. Hossain, M. Fitting the probability distribution of monthly maximum temperature of some selected stations from the northern part of Bangladesh. Int. J. Ecol. Econ. Stat. 39, 80–91 (2018).
24. World Bank. Climate change knowledge portal (2024). Accessed 16 Sept 2023.
25. Cullen, A. C. & Frey, H. C. Probabilistic Techniques in Exposure Assessment (1999).
26. Haddad, K. & Rahman, A. Selection of the best fit flood frequency distribution and parameter estimation procedure: A case study for Tasmania in Australia. Stoch. Environ. Res. Risk Assess. 25, 415–428 (2011).
27. Fisher, R. A. On the mathematical foundations of theoretical statistics. In Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, vol. 222, 309–368 (1922).
28. Zong, Z. Information-Theoretic Methods for Estimating of Complicated Probability Distributions Vol. 207 (Elsevier, 2006).
29. Naghettini, M. Fundamentals of Statistical Hydrology (Springer, 2017).
30. Chikobvu, D. & Chifurira, R. Modelling of extreme minimum rainfall using generalised extreme value distribution for Zimbabwe. S. Afr. J. Sci. 111, 01–08 (2015).
31. Sukrutha, A., Dyuthi, S. R. & Desai, S. Probability distribution for monthly precipitation data in India. arXiv preprint arXiv:1708.03144 (2017).
32. Lima, A. O. et al. Extreme rainfall events over Rio de Janeiro state, Brazil: Characterization using probability distribution functions and clustering analysis. Atmos. Res. 247, 105221 (2021).
33. Moccia, B., Mineo, C., Ridolfi, E., Russo, F. & Napolitano, F. Probability distributions of daily rainfall extremes in Lazio and Sicily, Italy, and design rainfall inferences. J. Hydrol. Reg. Stud. 33, 100771 (2021).
34. Razali, N. M. et al. Power comparisons of Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors and Anderson–Darling tests. J. Stat. Model. Anal. 2, 21–33 (2011).
35. Ng, J. et al. Investigation of the best fit probability distribution for annual maximum rainfall in Kelantan river basin. In IOP Conference Series: Earth and Environmental Science, vol. 476, 012118 (IOP Publishing, 2020).
36. Coronado-Hernández, Ó. E., Merlano-Sabalza, E., Díaz-Vergara, Z. & Coronado-Hernández, J. R. Selection of hydrological probability distributions for extreme rainfall events in the regions of Colombia. Water 12, 1397 (2020).
37. Singirankabo, E., Iyamuremye, E., Habineza, A. & Nelson, Y. Statistical modelling of maximum temperature in Rwanda using extreme value analysis. Open J. Math. Sci. 7, 180–195 (2023).
38. Coles, S. Basics of statistical modeling. In An Introduction to Statistical Modeling of Extreme Values 18–44 (2001).
39. Ng, J. et al. Statistical modelling of extreme temperature in peninsular Malaysia. In IOP Conference Series: Earth and Environmental Science, vol. 1022, 012072 (IOP Publishing, 2022).
40. Ghosh, S., Roy, M. K. & Biswas, S. C. Determination of the best fit probability distribution for monthly rainfall data in Bangladesh. Am. J. Math. Stat. 6, 170–174 (2016).
41. Villarini, G., Smith, J. A., Serinaldi, F. & Ntelekos, A. A. Analyses of seasonal and annual maximum daily discharge records for central Europe. J. Hydrol. 399, 299–312 (2011).
42. Hasan, H., Radi, N. A. & Kassim, S. Modeling of extreme temperature using generalized extreme value (GEV) distribution: A case study of Penang. Proc. World Congr. Eng. 1, 181–186 (2012).
43. Ender, M. & Ma, T. Extreme value modeling of precipitation in case studies for China. Int. J. Sci. Innov. Math. Res. (IJSIMR) 2, 23–36 (2014).
44. Fowler, H. & Kilsby, C. A regional frequency analysis of United Kingdom extreme rainfall from 1961 to 2000. Int. J. Climatol. J. R. Meteorol. Soc. 23, 1313–1334 (2003).
45. Gilleland, E., Ribatet, M. & Stephenson, A. G. A software review for extreme value analysis. Extremes 16, 103–119 (2013).
46. Özari, Ç., Eren, Ö. & Saygin, H. A new methodology for the block maxima approach in selecting the optimal block size. Tehnički vjesnik 26, 1292–1296 (2019).
47. Musyoka, M. M. Spatial–Temporal Characteristics of Rainfall Events in Kenya. Ph.D. thesis, University of Nairobi (2020).
48. Onwuegbuche, F. C. et al. Application of extreme value theory in predicting climate change induced extreme rainfall in Kenya. Int. J. Stat. Probab. 8, 85–94 (2019).


Acknowledgements

The authors acknowledge with gratitude the support from Strathmore Institute of Mathematical Sciences, Strathmore University and the DAAD [ST32 - PKZ: 91789473] in the production of this manuscript.

Author information

Authors and Affiliations

Strathmore Institute of Mathematical Sciences, Strathmore University, Nairobi, Kenya
Kevin Otieno, Linda Chaba, Collins Odhiambo & Bernard Omolo
Department of Bioengineering and Therapeutic Sciences, School of Pharmacy, University of California, San Francisco, San Francisco, CA, 94258, USA
Linda Chaba
College of Medicine, University of Illinois, Peoria, IL, 61605, USA
Collins Odhiambo
Division of Mathematics and Computer Science, University of South Carolina Upstate, Spartanburg, SC, 29303, USA
Bernard Omolo

Authors

Kevin Otieno
Linda Chaba
Collins Odhiambo
Bernard Omolo

Contributions

K.O., B.O. and L.C. conceived the project. K.O. performed the analysis and drafted the manuscript with substantial contributions from B.O., L.C., and C.O. All authors have read and approved the final version of the manuscript.

Corresponding author

Correspondence to Kevin Otieno.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Otieno, K., Chaba, L., Odhiambo, C. et al. A systematic approach to modeling monthly maximum temperature and total rainfall in Kenya. Sci Rep 15, 31758 (2025). https://doi.org/10.1038/s41598-025-12810-0


Received: 15 January 2025
Accepted: 21 July 2025
Published: 28 August 2025
DOI: https://doi.org/10.1038/s41598-025-12810-0