Introduction

Global issues such as water scarcity, climate change, and rising food demand necessitate new approaches in agriculture. Water scarcity, defined as the insufficient availability of freshwater to meet demand, is a major constraint on food security, public health, and economic growth1. Excessive use of freshwater resources poses an increasing risk to agricultural productivity. It is estimated that nearly 2.3 billion people live in water-stressed countries, and that agriculture accounts for nearly 70% of the world's freshwater use. This reliance poses a significant risk to food security in regions such as sub-Saharan Africa and Asia. Approximately two-thirds of the world’s population experience severe water scarcity for at least one month of the year2. The population exposed to water scarcity increased from 0.24 billion in the 1900s to 3.8 billion in the 2000s. Climate change and urbanization are exacerbating this issue, with projections suggesting that 1.693–2.373 billion urban dwellers will face water scarcity by 2050. India is projected to be the most severely affected, with 153–422 million additional water-scarce urban dwellers. The number of major cities experiencing water scarcity is expected to rise from 193 to between 193 and 284 (ref. 3). Addressing water scarcity requires consideration of the uncertainties in water availability projections and the development of tailored interventions4. Potential solutions include infrastructure investment, improved water-use efficiency, and better resource sharing.

Water scarcity is driven by increasing water demand, dwindling resources, and contamination, and is aggravated by urbanization, climate change, and agriculture5,6. Seawater greenhouse (SWGH) technology, developed in the early 1990s by Charlie Paton, uses seawater to cool and humidify greenhouse air, which stimulates plant growth and produces freshwater via condensation. Successful trials in locations such as the Canary Islands and Australia have demonstrated its effectiveness, and it remains the subject of research and development activities7. The population affected by water scarcity increased from 0.24 billion in the 1900s to 3.8 billion in the 2000s, while water consumption rose fourfold8. Some attribute scarcity to arid environments or insufficient basin-scale water, while others blame poor water management9. Alleviating water shortage requires an integrated approach that combines pollution control, rainwater harvesting, desalination, aquifer replenishment, and water recycling. Public awareness and high-level cooperation are required for the successful management of water resources. Conventional agricultural practices that depend on freshwater resources are becoming increasingly unsustainable, especially in semi-arid and arid regions, hence the need for new-generation farming technologies10. One of the most encouraging innovations is the seawater greenhouse, which uses abundant seawater to create suitable conditions for crop cultivation11. Agriculture uses about 70–87% of the world’s freshwater resources, which poses major challenges for both the quantity and the quality of water12. Rising food demand, together with economic growth and population increase, has resulted in agricultural activities that depend heavily on fertilizers and pesticides13.

Agricultural runoff causes eutrophication and threatens biodiversity. Optimization of irrigation efficiency, conservation strategies, and control of soil erosion are recommended by experts14. Adaptation strategies include choosing crops and irrigations according to local circumstances with the aim of reducing the pressure generated by increased food demand on land and water resources. Without technological advances, significant price adjustments for land, water, and food may be necessary15. A novel solution for freshwater shortages in coastal areas involves solar-thermal desalination technology, driven sustainably and effectively by solar energy using a humidification-dehumidification cycle, and is environmentally friendly. Seawater greenhouses employ solar desalination and evaporative cooling for plants to grow optimally, decreasing the dependency on freshwater resources and fighting the severe desert and coastal climates. This technology is in harmony with international sustainability goals, serves both environmental and food security issues, and has shown higher crop yields than traditional agriculture, thus being promising for raising food output in dry lands.

One of the primary advantages of seawater greenhouses is that they can generate freshwater in arid environments, thereby minimizing the dependence on external water sources. Research confirms that seawater greenhouses can cut water consumption by up to 90% when compared to conventional open-field farming with the same climatic conditions16. Moreover, seawater greenhouses utilize renewable forms of energy such as solar energy, which minimizes carbon emissions and lowers the demand for fossil fuels. The combination of solar power and SWGH technology can enhance sustainability by lowering operational costs17. Greenhouses promote sustainable agriculture by saving freshwater and energy, reducing the environmental impact of traditional farming methods. They also offer a potential solution to food security in water-scarce regions18. The initial cost of constructing an SWGH can be high, particularly in remote locations. Seawater and construction material transport infrastructure can also contribute to the costs19. The seawater environment also presents some maintenance problems, such as corrosion of materials and biofouling of equipment and pipes. These can contribute to operating costs and complexity in terms of sustaining the system in the long run20. Scaling up SWGH technology to a level that can significantly impact global food production is still a challenge. Factors such as land availability, economic feasibility, and technical know-how can affect its scalability21. While the SWGH reduces freshwater consumption, the disposal of concentrated brine from the desalination process remains an environmental concern. There must be good brine management practices to mitigate the potential harm to marine ecosystems22.

Artificial intelligence (AI) is increasingly being utilized in seawater greenhouses to increase agricultural yields and optimize water usage. AI regulates and supervises the greenhouse climate, thereby ensuring ideal growth conditions and effective desalination23. Sensors monitor a range of parameters, including temperature, humidity, solar radiation, and plant health. AI algorithms then analyze the data to adjust ventilation, cooling, and irrigation, and hence create an ideal microclimate for the crops24. AI also regulates the evaporation and condensation of seawater based on external weather conditions, ensuring consistent freshwater production. For instance, Lawal et al.25 introduced an improved method for optimizing humidification-dehumidification desalination systems using a hybrid machine learning optimization method. The study examined the efficiency of a neuro-fuzzy model, a decision algorithm derived from a boosted tree, and a simple averaging ensemble in improving the performance of humidification-dehumidification systems. Data-driven methods have shown promise in improving seawater greenhouse performance and control. Machine learning approaches, such as multilayer perceptron models and support vector clustering, have been used to develop predictive models for greenhouse climate control26,27,28. These models, integrated with data-driven robust model predictive control (DDRMPC) frameworks, have demonstrated superior performance in maintaining optimal greenhouse conditions while reducing energy consumption29. DDRMPC approaches have been shown to outperform traditional rule-based control methods, resulting in lower total costs and fewer constraint violations. In the case of seawater greenhouses, empirical models have been constructed to predict condenser performance, which is very important for overall system effectiveness30. These data-driven methods have proven very effective for climate control in greenhouses that are subjected to extreme weather conditions, i.e., high temperature and humidity. In recent years, the rapid advancement of artificial intelligence technologies and data-driven modeling techniques has introduced new solutions to these issues.

Mahmood et al.31 created a data-driven, robust model predictive control framework that ensures precise temperature control and notable energy savings, even in the face of uncertainty. These findings indicate that data-driven approaches for predicting greenhouse temperatures warrant further investigation. Deep learning techniques are also receiving increasing attention for time series prediction32, since deep learning models can handle large-scale, high-dimensional, and multi-modal data, uncover intrinsic correlations, and significantly enhance prediction accuracy33. With the high level of industrialization of the current era, energy usage has surged to unprecedented levels, degrading the environment and increasing greenhouse gas emissions34. Huang et al.35 used a gated recurrent unit (GRU) to forecast the lowest temperature, achieving better performance than other models, even with exclusive input parameters. However, because greenhouse environments are complex and variable, current approaches struggle to deliver highly accurate forecasts that also transfer across settings and remain effective under extreme conditions. Therefore, it is crucial to explore combined deep learning models that leverage their individual strengths and address severe conditions in different locations.

Additionally, Subramaniam et al.36 explored an effective combination of deep learning (DL) and dimensionality reduction (DR) methods for crop yield prediction (CYP) of regional crops in India. The innovation of the suggested method lies in the integration of DL, DR, and WTDCNN methods for accurate crop productivity forecasting. Furthermore, Jun et al.37 introduced a temperature forecast model based on Informer, a variant of the Transformer architecture that is better suited to time series data. BiLSTM, a sequence modeling technique derived from recurrent neural networks that makes effective use of sequential data, has been combined with the Transformer and has improved results in various sectors38. Jiang et al.39 suggested an indoor temperature prediction model that combines attention mechanisms and LSTM, achieving high forecasting performance. Furthermore, to obtain a precise and comprehensive prediction of the short- and mid-term sea surface temperature (SST) field, Xiao et al.40 investigated a spatiotemporal deep learning model capable of capturing the correlations of SST across both spatial and temporal dimensions.

Golabi et al.41 propose an optimal real-time management method that uses reinforcement learning to minimize the total daily operation cost of an RO desalination plant with a storage tank system, optimizing energy use and water quality to meet varying freshwater demands. Reza et al.42 provide a comprehensive review of the use of the Multi-Layer Perceptron (MLP) in water treatment and desalination, covering its applications in automatic forecasting, resolving missing data issues, and comparing it with conventional modeling approaches. Al Ghamdi43 proposes a novel control strategy for seawater reverse osmosis desalination using an Interpolation and Exponential Function-centered Deep Learning Neural Network (IEF-DLNN) and multi-objective optimization, which demonstrates better performance than existing methodologies. Bueso et al.44 present a novel approach that uses an MLP to estimate evaporated water mass in cooling tower systems for Zero Liquid Discharge (ZLD) desalination, demonstrating improved performance over traditional linear regression and robustness in capturing key variables, with potential applications across different environmental contexts. Ashraf et al.45 use machine learning and optimization to improve the efficiency of Multi-Effect Desalination systems by identifying optimal operating conditions for maximum distillate production, thereby enhancing operational excellence and contributing to the circular economy in desalination.

Collectively, these studies establish a state-of-the-art framework for integrating AI, remote sensing, and predictive analytics to optimize agricultural productivity, resource use, and food-system resilience from production to distribution. Table 1 presents an overview of various models for seawater greenhouses.

Table 1 The overview of various models for seawater greenhouse.

Despite the promising potential of SWGH technology, existing studies have not thoroughly examined its incorporation into green building frameworks or the application of advanced deep learning models for optimizing freshwater production. Current research lacks predictive models that accurately estimate freshwater yield based on climatic parameters, which leaves a gap in optimizing SWGH efficiency for long-term water sustainability. This study aims to develop a deep learning-based predictive model for forecasting freshwater production in seawater greenhouses, particularly in the Makran region. The predictive models utilized in this research include BiGRU, BiLSTM, CNN-GRU, CNN-LSTM, and MLP. First, the most effective deep learning model among these is identified. That model is then used to forecast GHI, which in turn enables the estimation of freshwater productivity per unit surface area between 2024 and 2033 for seawater greenhouse technology in green buildings along the Makran coast. By integrating renewable energy and artificial intelligence, the system aims to improve the efficiency and reliability of SWGH operations within green building applications. Figure 1 shows the mind map of data-driven method innovation in seawater greenhouse.

Fig. 1

The mind map of data-driven method innovation in seawater greenhouse.

System description

The seawater greenhouse integrates readily into green architectural systems and provides an environmentally friendly way to address major global issues such as water shortage and energy use57. The method uses seawater evaporation to regulate the temperature in the greenhouse, establishing a humid climate for plant development while producing freshwater through condensation58. Furthermore, the passive seawater evaporative cooling system can reduce the load on conventional air conditioning systems, further promoting building energy efficiency. The approach combines the abundant resource of seawater with the renewable energy potential of solar radiation to create a sustainable mechanism that can generate freshwater and support agriculture in arid or water-stressed areas. The system is designed not only to desalinate seawater efficiently but also to achieve high energy efficiency, thereby substantially reducing the use of fossil fuels59. Because solar energy is used for heating and electricity, the seawater greenhouse reduces carbon emissions and aligns with global efforts to combat climate change. It also creates a controlled microclimate within the greenhouse, enabling plant growth and farm productivity without wasting resources. This dual function, the generation of drinking water and the promotion of agricultural sustainability, makes it an innovative component of green building designs. By integrating high-technology components with natural processes, the seawater greenhouse illustrates the ability of innovative engineering to resolve multifaceted environmental issues, thereby serving the larger objectives of sustainability and resource conservation. This is a significant step towards robust systems aligned with ecological principles and, hence, a sustainable future. Figure 2 represents the distribution of countries involved in seawater greenhouse projects.

Fig. 2

The distribution of countries involved in SWGH projects.

First, seawater is drawn from the ocean into a primary treatment system, where large particles are filtered out to protect downstream components, including the condenser, from contamination. This step also helps to keep salt out of the condenser. Second, the treated seawater flows into the greenhouse condenser, where it provides cooling and humidity control, creating optimal conditions for plant growth. The seawater is then directed into a parabolic solar collector, where it is heated to its saturation temperature before being conveyed through a three-way valve into the evaporator. In the evaporator, the seawater undergoes a phase transition to superheated steam. The parabolic solar collector collects a large amount of solar radiation during daylight hours; however, supplemental support is required at night. To sustain operation during these times, a thermal storage system is included in the cycle. It accumulates excess heat produced during the daytime and releases it at night, thereby enabling round-the-clock operation. The three-way valve connects the thermal energy storage device to the evaporator60. A blower drives the superheated steam into the condenser, where it is cooled to the dew point, causing distilled water droplets to form on the condenser surface. Seawater acts as the coolant: it is circulated through the condenser and thus enhances the efficiency of the distillation process. Finally, the collected distilled water is stored in a tank for use in irrigation or to meet the water requirements of buildings. Figure 3 illustrates a schematic diagram of the seawater greenhouse cycle integrated with a green building.

Fig. 3

The schematic of the seawater greenhouse cycle integrated with a green building61.

The new system greatly reduces environmental pollution and the use of fossil fuels, in line with the Climate Change Organization criteria for sustainability. By integrating solar panels, the system generates renewable energy to power its pumps and other components, which improves overall energy efficiency and keeps operations environmentally friendly. Coupling solar energy with seawater is a good example of harnessing abundant natural resources, thus reducing carbon footprints and preserving limited energy reserves. The seawater greenhouse is a model of sustainable building practice, highlighting the importance of resource efficiency and environmental equilibrium. The technology not only reduces the environmental impact associated with water production but also substantially reduces greenhouse gas emissions, thus helping to combat climate change. In addition, by fostering sustainable production of drinking water and providing favorable conditions for agricultural production, it supports sustainable agriculture in areas where water scarcity is a significant problem. As a key element of sustainable infrastructure, this system responds to environmental challenges on a global scale while enhancing the resilience of communities suffering from water shortages and climate stresses. Its pairing of renewable technologies with careful design makes it an innovative response aimed at greater sustainability and a climate-resilient future.

Methodology

Governing equations

The parameters used to assess the performance of the hybrid desalination system in this study include freshwater production, the inlet water flow rate to the desalination plant, \(\eta_{\text{pump}}\), \(\eta_{\text{collector}}\), \(\eta_{\text{evaporator}}\), and \(\eta_{\text{condenser}}\). After extracting the experimental results, each of the performance parameters of the desalination system was calculated using thermodynamic relations based on the laws of mass and energy conservation, which were applied to each component of the system. It is worth mentioning that the vapor flow rate (\(\dot{m}_{\text{w}}\)) was assumed to be constant throughout the cycle. The energy required for pumping seawater from the ocean into the primary treatment system and the pump efficiency are given by Eq. (1) and Eq. (2):

$$W_{\text{pump}}=\frac{\dot{m}_{\text{sw}}\,\Delta h}{\eta_{\text{pump}}\,\rho}$$
(1)
$$\eta_{\text{pump}}=\frac{\dot{m}_{\text{sw}}\,\Delta h}{W_{\text{pump}}\,\rho}$$
(2)
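
Eqs. (1) and (2) can be checked numerically with a few lines of Python. The sketch below is illustrative only: it assumes that \(\Delta h\) is the pressure rise across the pump in Pa (so that the result is in W), it takes a typical seawater density of 1025 kg/m³, and the function names and input values are hypothetical rather than taken from the study.

```python
def pump_power(m_dot_sw, delta_h, eta_pump, rho=1025.0):
    """Eq. (1): W_pump = m_dot_sw * delta_h / (eta_pump * rho).

    m_dot_sw : seawater mass flow rate, kg/s
    delta_h  : pressure rise across the pump, Pa (assumed interpretation)
    eta_pump : pump efficiency, between 0 and 1
    rho      : seawater density, kg/m^3 (assumed typical value)
    """
    if not 0.0 < eta_pump <= 1.0:
        raise ValueError("eta_pump must lie in (0, 1]")
    return m_dot_sw * delta_h / (eta_pump * rho)


def pump_efficiency(m_dot_sw, delta_h, w_pump, rho=1025.0):
    """Eq. (2): the same relation solved for the pump efficiency."""
    return m_dot_sw * delta_h / (w_pump * rho)


# Illustrative inputs only (not measurements from the study).
w = pump_power(m_dot_sw=2.05, delta_h=2.0e5, eta_pump=0.8)
eta = pump_efficiency(m_dot_sw=2.05, delta_h=2.0e5, w_pump=w)
print(f"pump power = {w:.1f} W, recovered efficiency = {eta:.2f}")
```

Feeding the power from Eq. (1) back into Eq. (2) recovers the efficiency that was assumed, which is a quick consistency check on the two expressions.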

The thermal power input into the water in the parabolic solar water heater is calculated using Eqs. (3) and (4)62.

$$\dot{Q}_{\text{collector}}=\dot{m}_{\text{sw}}C_{p}\left(T_{\text{o,h}}-T_{\text{i,c}}\right)=\eta_{\text{collector}}A_{\text{collector}}\,\text{GHI}_{\text{solar}}$$
(3)
$$\eta_{\text{collector}}=\frac{\dot{m}_{\text{sw}}C_{p}\left(T_{\text{o,h}}-T_{\text{i,c}}\right)}{A_{\text{collector}}\,\text{GHI}_{\text{solar}}}$$
(4)
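
As a rough numerical illustration of Eqs. (3) and (4), the following Python sketch computes the heat gained by the seawater and the resulting collector efficiency. All input values (flow rate, specific heat, temperatures, collector area, and GHI) are hypothetical, and the function names are not from the study.

```python
def collector_heat_rate(m_dot_sw, cp, t_out, t_in):
    """Left-hand side of Eq. (3): heat gained by the seawater, in W."""
    return m_dot_sw * cp * (t_out - t_in)


def collector_efficiency(m_dot_sw, cp, t_out, t_in, area, ghi):
    """Eq. (4): useful heat divided by the solar power on the collector."""
    solar_input = area * ghi  # W, with area in m^2 and GHI in W/m^2
    if solar_input <= 0.0:
        raise ValueError("area and ghi must be positive")
    return collector_heat_rate(m_dot_sw, cp, t_out, t_in) / solar_input


# Illustrative inputs only: 0.05 kg/s of seawater heated from 25 to 65 degC
# by a 20 m^2 collector under 800 W/m^2, with cp assumed to be 4000 J/(kg K).
q = collector_heat_rate(0.05, 4000.0, 65.0, 25.0)
eta = collector_efficiency(0.05, 4000.0, 65.0, 25.0, area=20.0, ghi=800.0)
print(f"Q_collector = {q:.0f} W, eta_collector = {eta:.2f}")
```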

The governing equations for the performance of the evaporator are given by Eqs. (5) to (7)63.

$$\dot{m}_{\text{sw,i,evap}}=\dot{m}_{\text{sw,o,evap}}+\dot{m}_{\text{air}}\left(\omega_{\text{o,evap}}-\omega_{\text{i,evap}}\right)$$
(5)
$$\dot{m}_{\text{sw,i,evap}}=\dot{m}_{\text{sw,o,evap}}=\dot{m}_{\text{w}}$$
(6)
$$\dot{Q}_{\text{loss,h}}=\dot{m}_{\text{w}}C_{p}\left(T_{\text{o,evap}}-T_{\text{i,evap}}\right)+\dot{m}_{\text{air,o,evap}}h_{\text{air,o,evap}}-\dot{m}_{\text{air,i,evap}}h_{\text{air,i,evap}}+\dot{m}_{\text{w}}\,\omega_{\text{o,evap}}h_{\text{v,o,evap}}$$
(7)

The evaporator efficiency, or the effectiveness of the humidification process (\(\eta_{\text{evaporator}}\)), is obtained from Eq. (8).

$$\eta_{\text{evaporator}}=\frac{\omega_{\text{o,evap}}-\omega_{\text{i,evap}}}{\omega_{\text{o,evap,s}}-\omega_{\text{i,evap}}}=\frac{\dot{m}_{\text{steam}}h_{\text{latent}}}{\eta_{\text{collector}}A_{\text{collector}}I_{\text{solar}}}$$
(8)

Here, \(\omega_{\text{o,evap,s}}\) is the absolute humidity corresponding to the condition in which the vapor exiting the evaporator is saturated.
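
The humidity-ratio form of the evaporator relations can be illustrated with a short Python sketch. It evaluates the water picked up by the air stream (the second term of Eq. (5)) and the humidification effectiveness (first ratio in Eq. (8)); the humidity ratios and the air flow rate are hypothetical values chosen only to show the arithmetic.

```python
def evaporated_water(m_dot_air, omega_in, omega_out):
    """Water taken up by the air stream, kg/s (second term of Eq. (5))."""
    return m_dot_air * (omega_out - omega_in)


def humidification_effectiveness(omega_in, omega_out, omega_out_sat):
    """First ratio in Eq. (8): actual over maximum possible humidity gain."""
    max_gain = omega_out_sat - omega_in
    if max_gain <= 0.0:
        raise ValueError("saturated outlet humidity must exceed inlet humidity")
    return (omega_out - omega_in) / max_gain


# Illustrative humidity ratios in kg water per kg dry air (assumed values).
m_evap = evaporated_water(m_dot_air=0.5, omega_in=0.010, omega_out=0.030)
eta_evap = humidification_effectiveness(0.010, 0.030, 0.035)
print(f"evaporated water = {m_evap:.4f} kg/s, eta_evaporator = {eta_evap:.2f}")
```

An effectiveness of 1 would mean that the air leaves the evaporator fully saturated.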

The governing equations for the performance of the condenser are given by Eqs. (9) to (11)64.

$$\dot{m}_{\text{fwp}}=\dot{m}_{\text{a}}\left(\omega_{\text{i,condenser}}-\omega_{\text{o,condenser}}\right)$$
(9)
$$\dot{m}_{\text{w,i,condenser}}=\dot{m}_{\text{w,o,condenser}}=\dot{m}_{\text{w}}$$
(10)
$$\dot{Q}_{\text{condenser}}=\dot{m}_{\text{fwp}}h_{\text{latent}}+\dot{m}_{\text{w}}\left(h_{\text{o,condenser}}-h_{\text{i,condenser}}\right)$$
(11)

The condenser efficiency, or the effectiveness of the dehumidification process (\(\eta_{\text{condenser}}\)), is obtained from Eq. (12).

$$\eta_{\text{condenser}}=\frac{\omega_{\text{i,condenser}}-\omega_{\text{o,condenser}}}{\omega_{\text{i,condenser}}-\omega_{\text{o,condenser,s}}}=\frac{\dot{m}_{\text{cool}}C_{p}\,\Delta T}{\dot{m}_{\text{steam}}h_{\text{latent}}}$$
(12)

Here, \(\omega_{\text{o,condenser,s}}\) is the absolute humidity corresponding to the condition in which the vapor exiting the condenser is saturated at the condenser temperature.
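
A minimal Python sketch of the condenser relations is given below. It evaluates the condensate rate of Eq. (9), the heat duty of Eq. (11), and the dehumidification effectiveness (first ratio in Eq. (12)). The numbers are hypothetical, and the latent heat of 2.4 × 10⁶ J/kg is an assumed round value for water at moderate temperature, not a value reported in the study.

```python
def condensate_rate(m_dot_air, omega_in, omega_out):
    """Eq. (9): freshwater condensed from the air stream, kg/s."""
    return m_dot_air * (omega_in - omega_out)


def condenser_duty(m_dot_fwp, h_latent, m_dot_w, h_out, h_in):
    """Eq. (11): latent heat released plus enthalpy change of the water, W."""
    return m_dot_fwp * h_latent + m_dot_w * (h_out - h_in)


def dehumidification_effectiveness(omega_in, omega_out, omega_out_sat):
    """First ratio in Eq. (12): actual over maximum possible humidity drop."""
    max_drop = omega_in - omega_out_sat
    if max_drop <= 0.0:
        raise ValueError("inlet humidity must exceed saturated outlet humidity")
    return (omega_in - omega_out) / max_drop


# Illustrative inputs only (assumed values, SI units).
m_fwp = condensate_rate(m_dot_air=0.5, omega_in=0.030, omega_out=0.014)
q_cond = condenser_duty(m_fwp, h_latent=2.4e6, m_dot_w=0.2,
                        h_out=140.0e3, h_in=100.0e3)
eta_cond = dehumidification_effectiveness(0.030, 0.014, 0.010)
print(f"m_fwp = {m_fwp * 3600:.1f} kg/h, Q = {q_cond:.0f} W, eta = {eta_cond:.2f}")
```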

The latent heat of vaporization (\(h_{fg}\)) is used for \(h_{\text{latent}}\). The final amount of water produced in the seawater greenhouse cycle is obtained from Eq. (13).

$$\dot{m}_{\text{fwp}}=\frac{\eta_{\text{pump}}\,\eta_{\text{evaporator}}\,\eta_{\text{condenser}}\,\eta_{\text{collector}}\,A_{\text{collector}}\,\text{GHI}_{\text{solar}}}{h_{\text{latent}}}$$
(13)
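
To show how Eq. (13) turns a GHI value into a freshwater estimate, the sketch below evaluates the instantaneous production rate and then sums it over an hourly GHI profile. Everything numerical here is an assumption made for illustration: the efficiencies, the latent heat of 2.4 × 10⁶ J/kg, and the 24-value GHI profile are hypothetical and are not results of the study.

```python
def freshwater_rate(ghi, area, eta_pump, eta_evap, eta_cond, eta_coll,
                    h_latent=2.4e6):
    """Eq. (13): freshwater mass flow in kg/s.

    ghi in W/m^2, area in m^2, h_latent in J/kg (assumed round value).
    """
    eta_total = eta_pump * eta_evap * eta_cond * eta_coll
    return eta_total * area * ghi / h_latent


def daily_freshwater(hourly_ghi, **system):
    """Sum Eq. (13) over hourly mean GHI values; returns kg per day."""
    return sum(freshwater_rate(g, **system) * 3600.0 for g in hourly_ghi)


# Hypothetical system parameters and clear-day GHI profile (W/m^2).
system = dict(area=1.0, eta_pump=0.8, eta_evap=0.8, eta_cond=0.8, eta_coll=0.5)
hourly_ghi = [0, 0, 0, 0, 0, 0, 100, 300, 500, 700, 850, 950,
              1000, 950, 850, 700, 500, 300, 100, 0, 0, 0, 0, 0]

rate_peak = freshwater_rate(800.0, 20.0, 0.8, 0.8, 0.8, 0.5)
per_day = daily_freshwater(hourly_ghi, **system)
print(f"rate at 800 W/m^2 on 20 m^2: {rate_peak * 3600:.2f} kg/h")
print(f"daily production per m^2:    {per_day:.2f} kg")
```

Because Eq. (13) is linear in GHI, a forecast GHI series can be passed through the same function to obtain a production forecast per unit collector area.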

Deep learning models

Bidirectional LSTM (BiLSTM)

LSTM is a particular kind of RNN specifically engineered for addressing the vanishing gradient issue. LSTM networks include an advanced architecture that enables them to efficiently track and learn long-term dependencies in sequential data. The BiLSTM developed in this study comprises two autonomous LSTMs: the forward LSTM and the backward LSTM. The final learning outcome is achieved by integrating the forward and reverse input sequences, weighed appropriately, while concurrently analyzing data from both past and future sequences65. The specific model configuration can be seen in Fig. 4.

Fig. 4

The architecture of the BiLSTM hybrid model.
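
The bidirectional idea can be sketched in plain Python with a scalar LSTM cell (one input feature, one hidden unit). This is a toy illustration under stated assumptions, not the network trained in this study: the weights are arbitrary fixed numbers, there is no training step, and the forward and backward states are simply paired at each time step rather than combined with learned weights.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_pass(xs, p):
    """Run a scalar LSTM over xs and return the hidden state at each step."""
    h, c, states = 0.0, 0.0, []
    for x in xs:
        f = sigmoid(p["wf"] * x + p["uf"] * h + p["bf"])    # forget gate
        i = sigmoid(p["wi"] * x + p["ui"] * h + p["bi"])    # input gate
        o = sigmoid(p["wo"] * x + p["uo"] * h + p["bo"])    # output gate
        g = math.tanh(p["wg"] * x + p["ug"] * h + p["bg"])  # candidate state
        c = f * c + i * g          # cell state carries long-term memory
        h = o * math.tanh(c)       # hidden state exposed to the next layer
        states.append(h)
    return states


def bilstm(xs, p_fwd, p_bwd):
    """Pair forward and backward hidden states for every time step."""
    fwd = lstm_pass(xs, p_fwd)
    bwd = lstm_pass(xs[::-1], p_bwd)[::-1]  # realign to original time order
    return list(zip(fwd, bwd))


# Arbitrary illustrative weights (hypothetical, untrained).
p_fwd = dict(wf=0.5, uf=0.1, bf=0.0, wi=0.6, ui=0.2, bi=0.0,
             wo=0.4, uo=0.3, bo=0.0, wg=0.9, ug=0.5, bg=0.0)
p_bwd = dict(wf=0.3, uf=0.2, bf=0.1, wi=0.7, ui=0.1, bi=0.0,
             wo=0.5, uo=0.2, bo=0.0, wg=0.8, ug=0.4, bg=0.0)

series = [0.2, 0.5, 0.9, 0.7, 0.3]   # e.g. normalized GHI values
features = bilstm(series, p_fwd, p_bwd)
print(features[2])  # middle step sees both earlier and later values
```

At the first time step the forward state has seen only the first value while the backward state has seen the whole series, which is the property that lets a bidirectional layer use both past and future context.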

CNN-LSTM

CNNs are a category of feedforward neural networks that incorporate convolutional operations and generally comprise convolutional, pooling, and fully connected layers. CNN-LSTM is a deep learning architecture that integrates CNN and LSTM networks and is frequently employed for the analysis of time-series data. This work uses a hybrid technique in which features are extracted by the CNN layer. As illustrated in Fig. 5, the time series data is first fed into the convolutional layer of the CNN for feature extraction. The extracted feature sequences are then input into the LSTM for further time-series modelling and forecasting66.

Fig. 5

The architecture of the CNN-LSTM hybrid model.

Bidirectional GRU (BiGRU)

The GRU neural network is an improved model of LSTM that decreases the number of gates while preserving long-term memory links to address the vanishing gradient problem. The BiGRU consists of two unidirectional GRUs functioning in opposing directions, hence creating an additional hidden layer. The key difference between BiGRU and GRU lies in the additional layer of hidden states67, as seen in Fig. 6.

Fig. 6

The architecture of the BiGRU hybrid model.

CNN-GRU

CNN-GRU is a hybrid neural network model that integrates CNN and GRU, with the basic framework shown in Fig. 7. The data serves as the model input, and the convolution operation is employed to extract features and capture correlations in the data. A pooling operation then reduces the data dimension and, with it, the number of parameters. In addition, a dropout layer randomly deactivates neurons in the network with a specified probability to reduce overfitting68. The GRU layer is employed to analyze the reduced data and identify the temporal patterns within it69. Finally, the fully connected layer transforms the data into a one-dimensional sequence, which gives the final results.

Fig. 7

The architecture of the CNN-GRU hybrid model.
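
The convolution, pooling, and dropout stages that precede the recurrent layer can be illustrated in plain Python. This is a simplified sketch with assumptions labelled in the comments: a single hand-picked kernel stands in for learned filters, the dropout shown is the common "inverted" variant, and the recurrent layer itself is omitted.

```python
import random


def conv1d(series, kernel):
    """Valid 1-D convolution (cross-correlation form used in deep learning)."""
    k = len(kernel)
    return [sum(series[t + j] * kernel[j] for j in range(k))
            for t in range(len(series) - k + 1)]


def relu(values):
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Non-overlapping max pooling; shortens the sequence by a factor of size."""
    return [max(values[t:t + size])
            for t in range(0, len(values) - size + 1, size)]


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability rate, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


# Hypothetical input series and a difference kernel that responds to increases.
series = [1, 2, 4, 7, 11, 16, 22, 29]
features = relu(conv1d(series, kernel=[-1, 1]))
pooled = max_pool1d(features, size=2)
dropped = dropout(pooled, rate=0.5, rng=random.Random(0))  # training time only
print(features, pooled)
```

In a CNN-GRU or CNN-LSTM, the pooled feature sequence, rather than the raw series, would be passed on to the recurrent layer.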

Multilayered perceptron (MLP)

MLP is one of the most widely used types of ANNs. Its structure comprises three kinds of fully connected layers: the input layer, where model inputs are introduced; the hidden layers, which serve as intermediate layers and may range in number from zero to several; and the output layer, which yields the results of the trained model, as illustrated in Fig. 8. The connections between layers are weighted, and the weight values are adjusted throughout model training70,72.

Fig. 8

The architecture of the MLP model.
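
The weighted, layer-by-layer structure of an MLP can be made concrete with a small forward pass in plain Python. The layer sizes, weights, biases, and input vector below are arbitrary illustrative values (hypothetical and untrained); in practice the weights would be fitted during training.

```python
import math


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one row per neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def mlp_forward(inputs, layers):
    """Pass the inputs through each (weights, biases, activation) layer in turn."""
    out = inputs
    for weights, biases, activation in layers:
        out = dense(out, weights, biases, activation)
    return out


def identity(z):
    return z


# Hypothetical network: 3 inputs -> 2 hidden neurons (tanh) -> 1 linear output.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1], math.tanh),
    ([[1.0, -1.0]], [0.2], identity),
]

x = [0.6, 0.4, 0.9]          # e.g. three normalized climatic inputs
y = mlp_forward(x, layers)
print(y)
```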

Model evaluation

Several error metrics can be used to evaluate the accuracy of prediction models. This study evaluates model accuracy using five metrics: mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R²), mean square error (MSE), and normalized root mean square error (NRMSE). The five metrics are determined with Eqs. (14) to (18)7376.

$$\text{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|x_{i}-y_{i}\right|$$
(14)
$$\text{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_{i}-y_{i}\right)^{2}}$$
(15)
$$R^{2}=1-\frac{\sum_{i=1}^{N}\left(x_{i}-y_{i}\right)^{2}}{\sum_{i=1}^{N}\left(y_{i}-\bar{y}\right)^{2}}$$
(16)
$$\text{MSE}=\frac{1}{N}\sum_{i=1}^{N}\left(x_{i}-y_{i}\right)^{2}$$
(17)
$$\text{NRMSE}=\frac{\text{RMSE}}{\text{mean}\left(\text{observation}\right)}$$
(18)

Here, \(x_{i}\) and \(y_{i}\) are taken to be the predicted and observed values, respectively, \(\bar{y}\) is the mean of the observed values, and \(N\) is the number of samples.
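
The five metrics can be computed together in a few lines of Python. The sketch below is illustrative: the function name and the sample values are hypothetical, R² is computed in its usual mean-centred form, and NRMSE is normalized by the mean of the observations as in Eq. (18).

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, RMSE, R2, MSE, and NRMSE for paired observations."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares
    return {
        "MAE": mae,
        "RMSE": rmse,
        "R2": 1.0 - ss_res / ss_tot,
        "MSE": mse,
        "NRMSE": rmse / mean_obs,
    }


# Hypothetical observed and predicted values.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Lower MAE, RMSE, MSE, and NRMSE and an R² closer to 1 indicate a better fit, which is how the candidate models would be ranked.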

Input data

Case study

The geographical area of this case study is the Makran coast, in the southeastern part of Iran along the northern coast of the Gulf of Oman. This area stretches between latitudes 25° and 26.5° N and longitudes 57° to 61° E and is characterized by an arid to semi-arid climate with high temperatures, low annual rainfall, and limited natural freshwater resources. The Makran coast is of great geographical importance, particularly in light of its increasing population and expansion of industrial activities, both of which occasion an increasing need for sustainable and reliable water resources. The geographical and climatic conditions of the Makran coast make the region exceedingly well-suited for projects relating to renewable energy solutions. Due to its proximity to the Tropic of Cancer, the area receives high solar radiation and extended sunshine hours all year round, making it an ideal location for the deployment of solar energy systems. Figure 9 represents the location map of the Makran coast.

Fig. 9

Location of the Makran coast in Sistan and Baluchestan, Iran, generated using base maps from Mapbox and OpenStreetMap (via Mapcarta, https://mapcarta.com/14666356, accessed 23 Jan. 2025), and edited with EdrawMax. Map data © OpenStreetMap contributors, licensed under ODbL77.

Direct Normal Irradiation (DNI) is an essential parameter for establishing the feasibility of solar energy because it quantifies the solar radiation received per square meter on a surface normal to the solar beam. This measure is particularly significant for solar technologies such as parabolic solar collectors, whose effectiveness relies on direct sunlight for the concentration and conversion of solar energy into heat. The DNI map of the Makran coast highlights the large solar energy potential of the region. Owing to dry climatic conditions, low cloud cover, and long sunshine duration, the Makran coast offers suitable conditions for the installation of parabolic solar collectors, which find extensive applications in thermal energy systems for heating, cooling, and other industrial processes. The DNI map for this area (Fig. 10) presents yearly solar irradiation averages as a color-coded spectrum, extending from blue (lower) to pink (higher). The Makran coastline is predominantly orange to pink, corresponding to DNI values from 2200 to 3700 kWh/m² annually. Such high values indicate the region's suitability for parabolic solar collector systems, which require persistent and intense sunlight to be effective. Exploiting the high solar energy potential of the Makran coast provides opportunities for developing sustainable energy plans, facilitating the production of green thermal energy for local industries and communities while decreasing the use of conventional fossil fuels. It also indicates the region's potential to become a center for renewable energy projects in Iran.

Fig. 10

Source: Global Solar Atlas (World Bank Group and Solargis, 2025), available at https://globalsolaratlas.info (accessed 23 Jan. 2025). Licensed under CC BY 4.0 (ref. 78).

Long-term average of annual totals of DNI of the Makran coast in Sistan and Baluchestan, Iran.

Figure 11 displays the monthly average values of DNI for the Makran coast of Sistan and Baluchestan, Iran, and these give considerable insight into the region’s solar energy potential. DNI quantifies the solar radiation that falls on one square meter of a surface oriented perpendicular to the direction of the solar rays and thus is an important variable in the evaluation of solar energy projects, especially for such technologies as parabolic solar collectors. The results indicate high DNI values all year round with maximum values registered during January, March, April, October, and December, ranging from 175 to 185 kWh/m² monthly on average. These high values reveal an excellent availability of solar resources in winter and transitional months.

Conversely, minimum DNI values range from 100 to 120 kWh/m² during July and August, most likely due to atmospheric conditions such as increased humidity, haze, or cloud cover that reduce direct sunlight intensity during these months. The findings show the excellent solar energy potential of the Makran coast year-round, particularly during cooler months when solar energy systems can be more efficient due to reduced thermal loss. The high and stable values of DNI make the region an appropriate place for the setup of solar energy systems, like parabolic solar collectors, ideal for the generation of thermal energy for industrial and household applications. The results indicate the Makran coast as a prospective site for the feasibility of renewable energy projects, in agreement with the transition to the utilization of sustainable energy systems in Iran.

Fig. 11

Monthly average of DNI of Makran coast in Sistan and Baluchestan, Iran78.

Figure 12 demonstrates the solar elevation and azimuth for the Makran coast in Sistan and Baluchestan, Iran, providing detailed information on the sun’s trajectory across the sky throughout the year.

Fig. 12

The solar elevation and azimuth for the Makran coast in Sistan and Baluchestan, Iran78.

Solar elevation, represented on the vertical axis, is the altitude of the sun above the horizon, whereas solar azimuth, represented on the horizontal axis, defines the direction of the sun relative to true north (0° or 360°) and south (180°). The yellow region represents the active area of solar energy, indicating the position of the sun throughout the seasons. Significant solar paths are also included on the chart, highlighting seasonal movement. The red curve is the June solstice, when the sun reaches its highest elevation, producing the strongest and longest solar radiation. The blue curve is the December solstice, with lower solar elevation and shorter daylight hours. The black curve is the sun's path during the equinoxes (spring and fall), when day and night are approximately equal. In addition, the terrain horizon, shown as the shaded black area at the bottom of the chart, accounts for potential obstructions, such as hills or mountains, that may block sunlight at low elevations. This analysis provides vital information for the optimal siting and orientation of solar energy systems, such as parabolic solar collectors, in the Makran coastal region. The high solar elevation for most of the year underscores the area's high solar energy potential. Orienting solar collectors to receive optimum exposure during peak solar hours can increase efficiency and facilitate the establishment of sustainable energy infrastructure in this high-irradiation coastal area.
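The seasonal spread between the solstice curves can be illustrated with the standard solar-noon relation, in which the noon elevation equals 90° minus the angular difference between site latitude and solar declination. The sketch below is not taken from the study: the latitude of 25.5° N is a hypothetical point inside the stated 25° to 26.5° N band, and the declinations of ±23.44° are the usual solstice values.

```python
def noon_elevation_deg(latitude_deg, declination_deg):
    """Solar elevation at local solar noon, in degrees."""
    return 90.0 - abs(latitude_deg - declination_deg)

# Hypothetical site latitude within the 25-26.5 N band quoted for the Makran coast
LATITUDE_DEG = 25.5

# Standard solar declinations at the solstices and equinoxes
DECLINATION_DEG = {
    "June solstice": 23.44,
    "Equinox": 0.0,
    "December solstice": -23.44,
}

for event, declination in DECLINATION_DEG.items():
    elevation = noon_elevation_deg(LATITUDE_DEG, declination)
    print(f"{event}: {elevation:.2f} deg")
```

Under these assumptions the noon sun stays more than 40° above the horizon even at the December solstice, which is consistent with the high solar elevation described for most of the year.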

Further, the Gulf of Oman provides a rich source of seawater, which can be effectively desalinated using the HDH process. This article discusses the key problem of freshwater shortage in the Makran region by incorporating a deep learning prediction model into the HDH approach. The predictive model offers an empirically derived approach crafted to improve resource management and facilitate the creation of sustainable water production systems uniquely adapted to surmount the specific difficulties that face arid coastal regions. Table 2 presents the geographical coordinates, solar radiation parameters, and climatic features of the Makran Coast.

Table 2 The geographical location, solar radiation levels, and climatic characteristics of the Makran Coast.

To forecast the freshwater yield on the Makran coast of Iran, a deep learning approach is established based on both historical and real-time data gathered from a renewable energy-driven Humidification-Dehumidification (HDH) desalination plant. All the influencing environmental and operational parameters, such as solar irradiance, air and seawater temperature, humidity, flow rates, and thermal storage-related considerations, are embedded in the model. An LSTM neural network is implemented for this task since it is capable of handling sequential data and learning temporal relations. The LSTM network is trained to map the provided input features to the desired output, which is the quantity of freshwater generated. A loss function such as Mean Squared Error (MSE) is minimized during training to reduce prediction errors, while model performance is measured using metrics such as Root Mean Squared Error (RMSE) and R-squared. After validation, the model has shown the ability to accurately predict freshwater production under varying environmental and operational conditions. It can therefore be used as a reliable tool for evaluating system effectiveness and for assisting proactive decision-making during water scarcity.

Data Pre-processing and feature selection

The study examined whether an increased data volume and multivariate observations enhance predictive performance, or whether a univariate data set is enough to generate an acceptable outcome. Consequently, the input data for the algorithms were multivariate, so that this question could be examined. This study used the direct technique for multistep-ahead prediction, utilizing prior time steps as input variables and defining future time steps as target variables. Table 3 provides a description of the features. The dataset is fully populated, with no missing values; however, it requires preprocessing. The preprocessing in this study encompasses feature engineering and data normalization.
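The direct multistep setup described above, with prior time steps as inputs and future time steps as targets, can be sketched as a sliding window over the series. The helper name `make_windows` and the toy series are hypothetical, and the window and horizon values are chosen only to keep the example small.

```python
def make_windows(series, window, horizon):
    """Pair each run of `window` past values with the `horizon` values that follow it."""
    inputs, targets = [], []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        split = start + window
        inputs.append(series[start:split])            # prior time steps (model input)
        targets.append(series[split:split + horizon])  # future time steps (model target)
    return inputs, targets

# Toy univariate series; a multivariate series would be a list of per-step feature rows
series = list(range(10))
X, Y = make_windows(series, window=4, horizon=2)
print(len(X), X[0], Y[0])
```

Each target contains all horizon steps at once, so a single model call yields the whole forecast instead of feeding predictions back in recursively.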

To minimize computational expense and enhance forecast accuracy, feature selection is performed to identify the most important features affecting the output value79. This study uses the Pearson correlation matrix as the mechanism for feature selection. Figure 13 shows the heatmap of the Pearson correlation matrix for global horizontal irradiation (GHI) on the Makran coast. The Pearson coefficient is suitable for continuous time-series input and target variables and quantifies linear correlation on a scale from −1 to 1. It is computed as the covariance between an input and the target variable divided by the product of their standard deviations, and its magnitude indicates the strength of the relationship. When the coefficient lies between 0 and 1, the target variable increases as the input variable rises. Conversely, when the coefficient lies between 0 and −1, the target variable decreases as the input variable increases80. Figure 13 illustrates the significant correlation of wet bulb temperature at 2 m (T2MWET), temperature at 2 m (T2M), specific humidity at 2 m (QV2M), surface temperature (TS), relative humidity at 2 m (RH2M), surface pressure (PS), and dew point temperature at 2 m (T2MDEW) with GHI for the Makran coast. Consequently, these attributes were taken as inputs for the multivariate model.
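A minimal sketch of correlation-based feature screening is given below. The Pearson coefficient is computed as the covariance divided by the product of the standard deviations. The selection threshold, the `noise` column, and all numeric values are hypothetical, since the text does not report the cut-off used to retain features.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def select_features(columns, target, threshold):
    """Keep the feature names whose absolute correlation with the target meets the threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]

# Made-up numbers for illustration; only the feature names T2M and RH2M come from the text
ghi = [1.0, 2.0, 3.0, 4.0, 5.0]
columns = {
    "T2M": [2.0, 4.1, 5.9, 8.2, 9.8],   # rises with the target
    "RH2M": [9.0, 7.2, 5.1, 2.9, 1.0],  # falls as the target rises
    "noise": [3.0, 1.0, 3.0, 1.0, 3.0], # hypothetical uncorrelated column
}
print(select_features(columns, ghi, threshold=0.5))
```

Using the absolute value keeps strongly negative correlations as well as strongly positive ones, since both carry predictive information.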

Fig. 13

Heatmap of the Pearson correlation coefficients for the Makran coast.

Table 3 Dataset’s feature description.

Data normalization

Features may differ in scale and range, which affects the effectiveness of deep learning models. Normalization addresses this issue by bringing all numeric columns to a common scale, preventing any particular feature from disproportionately influencing the model because of its range. The approach used here rescales each feature based on its minimum and maximum values, mapping the data to the range 0 to 1 and facilitating meaningful comparisons across data types81. The dataset is divided in an 80–20 ratio for training and testing, so that overfitting and underfitting can be identified on held-out data. Figure 14 represents the methodology used in this study for forecasting solar radiation. This methodology comprises five main steps: (a) meteorological data collection, (b) data preprocessing, (c) model training, (d) model evaluation, and (e) best model selection.
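The rescaling and the 80–20 division can be sketched as follows. The function names and sample values are illustrative. The sketch normalizes before splitting, following the step order in Appendix A, and assumes the split is chronological (first 80% for training), which the text does not state.

```python
def min_max_scale(values):
    """Rescale values to the range 0 to 1 using their minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def split_train_test(values, train_fraction=0.8):
    """Split an ordered sequence into leading training and trailing test portions."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]

# Illustrative feature column (made-up values)
raw = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
scaled = min_max_scale(raw)
train, test = split_train_test(scaled)
print(len(train), len(test), min(scaled), max(scaled))
```

A common alternative is to compute the minimum and maximum on the training portion only, so that no information from the test period influences the scaling.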

Fig. 14

The process of the present study for prediction.

Results

Deep learning models

Table 4 defines the hyperparameters of the proposed models. The models are designed to forecast GHI as the primary output. Note that the hyperparameters must be readjusted when the prediction target changes. A higher volume of data and multivariate observations enhance predictive performance. Table 5 presents the evaluation metrics of the models for multivariate multistep-ahead GHI forecasting within the specified time horizon. Overall, the CNN-LSTM model demonstrated the highest accuracy, with the lowest RMSE (0.0021), MSE (0.0022), and MAE (0.0363) for testing. Its R² value of 0.9727 further indicates a strong correlation between the predicted and actual values. The MLP model, while slightly less accurate, demonstrated consistent performance across target features, achieving an R² of 0.9707. The BiGRU and BiLSTM models also showed competitive performance, with LSTM slightly outperforming CNN-LSTM in terms of R² and MAE. However, their overall performance was inferior to that of the CNN-LSTM model. The close results nevertheless suggest that simpler models can suffice for GHI prediction on the Makran coast over a 10-year horizon.

To ensure clarity and reproducibility of the proposed hybrid forecasting framework, the full hybridization process is outlined in Appendix A, which provides a structured pseudo-code representation of the CNN-LSTM model. The algorithm begins with data preprocessing, including feature selection, aggregation, and normalization. Subsequently, the input–output pairs are generated through a sliding window approach, which enables the construction of supervised learning sequences. The core of the hybrid model is a CNN–LSTM architecture, where convolutional layers are employed to extract local temporal patterns, and stacked LSTM layers are used to capture long-term dependencies. Dense layers are then applied to map the learned representations to the forecast horizon. Finally, the model is trained and evaluated using multiple performance metrics.

Table 4 Hyperparameters of proposed models.
Table 5 Irradiance evaluation parameters of models.

The proposed hybrid CNN–LSTM model for time-series forecasting was designed with carefully tuned hyperparameters. A 12-step input window captures full seasonality, while a longer forecasting horizon tests both short- and long-term predictive ability. The CNN layer (kernel size 2, 64 filters) extracts local temporal features, and stacked LSTM layers (100 units each) capture long-term dependencies efficiently. A fully connected layer with ReLU activation enhances non-linear feature learning, while dropout was set to zero due to limited data but may be increased for larger datasets. The model is optimized with Adam for stable and fast convergence, using MSE loss to emphasize large errors. A small batch size improves generalization on small datasets, and 200 training epochs allow convergence without overfitting.

Figure 15 shows the loss function and the MAE as functions of the number of epochs. The MAE decreases as the number of epochs increases, indicating that the model is learning. An excessive number of epochs (exceeding 200) increases the probability of overfitting and reduces model efficiency; conversely, too few epochs lead to poor learning outcomes.

Fig. 15

The variation of the loss function and MAE versus epochs.

Figures 16 and 17 illustrate the performance of the CNN-LSTM algorithm, developed to predict GHI on the Makran coast over the next ten years, on the testing and training data, respectively. An alternative way to assess the algorithm's performance is to fit a regression line relating the actual and forecasted values. A distribution of points close to the line y = x demonstrates a robust relationship between the predicted and actual data. Figures 18 and 19 illustrate the regression plots for the training and testing datasets. A substantial correlation exists between the actual and predicted GHI data. The final simulation result, shown in Fig. 20, demonstrates that the proposed CNN-LSTM model can reliably and effectively forecast solar radiation over the next ten years, which was the goal of the study.
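The regression check can be sketched with an ordinary least-squares fit of predicted on actual values: a slope near 1 and an intercept near 0 mean the points lie close to the line y = x. The function name and the sample values below are hypothetical and are not results from the study.

```python
def fit_line(actual, predicted):
    """Least-squares slope and intercept of predicted values regressed on actual values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    sxy = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    sxx = sum((a - mean_a) ** 2 for a in actual)
    slope = sxy / sxx
    intercept = mean_p - slope * mean_a
    return slope, intercept

# Made-up normalized values for illustration
actual = [0.2, 0.4, 0.6, 0.8]
predicted = [0.22, 0.41, 0.58, 0.79]
slope, intercept = fit_line(actual, predicted)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

A slope below 1 with a positive intercept, as in this toy case, would indicate mild underestimation of high values and overestimation of low ones.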

Fig. 16

Predicted test data of solar radiation for CNN-LSTM.

Fig. 17

Predicted train data of solar radiation for CNN-LSTM.

Fig. 18

Regression plot of forecasted versus true GHI values for the training data of the CNN-LSTM model.

Fig. 19

Regression plot of forecasted versus true GHI values for the test data of the CNN-LSTM model.

Fig. 20

Monthly GHI from 1984 to 2033 obtained with the CNN-LSTM model.

Green building freshwater prediction

Figure 21 shows the performance of the CNN-LSTM model in forecasting freshwater production on the Makran coast over the coming decade. The figure plots freshwater production per unit surface area (L/m²) between 1984 and 2033, comprising historical data through 2023 and the 2024–2033 projection obtained with the CNN-LSTM model. The GHI value was forecasted with this model, and the freshwater production capacity of the seawater greenhouse system in green buildings can then be predicted according to Eq. (13). The average annual freshwater production for the decade 2024–2033 is predicted to be 1454.25 L/m². This combined approach enhances predictive accuracy by taking solar radiation into account as a significant parameter affecting freshwater production efficiency. The parameters describing freshwater production per unit area are given in Table 6.

Table 6 The parameters of quantities of freshwater production per unit area.

Historically, freshwater production has varied across the decades, with an overall rising trend from the 1990s to around 2020. Inflection points representing significant growth stages are found in the late 1990s and early 2000s, with a peak around 2020 followed by a downturn before stabilization. These oscillations can be attributed to changes in climatic conditions, improvements in operational efficiency, or external influences on the seawater greenhouse system. From 2024 onward, the model predicts stabilization in freshwater production, with minor fluctuations and a consistent increase up to 2033. The application of CNN-LSTM for GHI prediction is pivotal here, since solar radiation is a significant driver of the evaporation and condensation processes in seawater greenhouses. The framework first models GHI and then uses the results to improve the forecast of freshwater production. The method allows a more precise understanding of variations in water yield under different environmental conditions. This predictive model supports the development of sustainable seawater greenhouse technology, especially for green building applications, by making more efficient use of water resources given the available solar energy potential. The anticipated stabilization and gradual rise in freshwater output indicate that, with continually advancing predictive modeling and system optimization, seawater greenhouses have much to contribute to future water sustainability efforts.

Fig. 21

Monthly freshwater production from 1984 to 2033 obtained with the CNN-LSTM model.

Discussion

The predictive modeling framework developed in this study, centered on a CNN–LSTM hybrid architecture, achieved superior accuracy in forecasting both GHI and corresponding freshwater yield for seawater greenhouse (SWGH) systems in the Makran region. The achieved R² of 0.9727 and RMSE of 0.0021 for GHI prediction surpass the performance metrics reported in most recent SWGH modeling efforts. Panahi et al.48 employed an ANN–ALO approach for predicting freshwater production in Oman and reported RMSE values exceeding those obtained in this study, while Wu et al.47 using RBFNN also exhibited lower accuracy compared to CNN–LSTM results. Similarly, Essa et al.46 integrated a Random Vector Functional Link with Artificial Ecosystem Optimization for predicting water productivity in Egypt, achieving reliable performance but without addressing long-term, multistep-ahead forecasts as implemented here.

Several prior works have explored deep learning architectures for agricultural or desalination applications, but with different objectives and environmental contexts. Huang et al.35 and Jiang et al.39 applied attention-based CNN–LSTM and LSTM models for microclimate forecasting in controlled agricultural systems, showing the benefit of temporal–spatial feature extraction. However, these models primarily targeted short-term predictions (hours to days) and did not explicitly link irradiance prediction to desalination output. The two-stage approach used here, which first forecasts solar input and then uses it to estimate freshwater yield, extends the applicability of deep learning to long-term water production planning in arid coastal zones. From a technological integration perspective, previous studies have generally modeled SWGH systems in isolation from green building concepts. Al-Ismaili7,11 documented SWGH operational benefits but did not consider their synergy with building-scale resource management. In contrast, the present study explicitly evaluates SWGH freshwater yield within a green building framework, aligning with the food–energy–water nexus approach discussed by Valencia et al.58. This integration is crucial for improving overall system sustainability, as it allows co-optimization of building cooling loads, water supply, and agricultural production.

The forecasted average freshwater yield of 1454.25 L/m²/year for 2024–2033 compares favorably with production capacities reported in other high-solar-irradiance sites. Zarei et al.51 in Iran and Ehteram et al.49 in Oman demonstrated freshwater outputs of similar magnitude under optimal seasonal conditions but did not account for interannual variability. By leveraging CNN–LSTM, our approach captures both seasonal and decadal-scale fluctuations, offering a more robust planning tool for infrastructure investment. Importantly, while this study demonstrates high predictive performance, it also highlights the sensitivity of SWGH output to solar resource availability. This finding is consistent with the observations of Lawal et al.25, who showed that optimizing humidification–dehumidification parameters can significantly improve productivity under fluctuating weather conditions. The implication for policy and design is that predictive control systems, potentially combined with hybrid renewable inputs such as wind or biomass27, could further enhance year-round stability82. By embedding AI-driven prediction into SWGH operation, this research addresses a gap in the current literature noted by Ghiat et al.24 and Mahmood et al.26, who emphasize the need for adaptive, data-driven control under harsh climates. The results demonstrate that deep learning not only improves forecast accuracy but also provides actionable insights for operational planning, long-term water security, and integration into sustainable architecture.

Conclusion

This research demonstrates the viability of combining deep learning methods with SWGH technology to enhance freshwater generation for sustainable building use. Advanced machine learning methods were used to create a predictive model that can estimate freshwater yield as a function of both operational and environmental conditions with great reliability. Of the models compared, the CNN-LSTM showed the best prediction accuracy, with an R² of 0.9727 and the lowest RMSE (0.0021) and MSE (0.0022), making it the top-performing model for the prediction of solar irradiance and freshwater production. This model was used to predict GHI, from which the freshwater production of the seawater greenhouse system in green buildings can be derived, since freshwater production per unit area depends strongly on the GHI value. The analysis of historical trends in freshwater production between 1984 and 2023 indicated variability due to climatic conditions and system efficiency. However, the 2024–2033 forecasts predict stabilization with a gradual increase in freshwater yield, at an average annual value of approximately 1454.25 L/m². The trend is indicative of the potential of SWGH technology as a sustainable tool for water management in water-deficient regions, particularly when integrated into green building systems. The CNN-LSTM results imply that, with increased solar radiation, freshwater yield in SWGHs can be optimized, making this technology more reliable for sustainable water management. Beyond predictive accuracy, the findings underscore the broader implications for sustainable development:

  • Technical relevance: AI-enhanced SWGH systems can adapt to fluctuating climatic conditions, improving operational planning and reducing downtime.

  • Environmental sustainability: The reliance on solar-driven desalination minimizes fossil fuel use and greenhouse gas emissions while enabling brine management strategies to protect marine ecosystems.

  • Economic and social value: Stable freshwater production supports agricultural activities, reduces dependency on costly imported water, and strengthens the resilience of coastal communities.

The seawater greenhouse is a significant development in sustainable building systems, efficiently integrating renewable-energy-driven desalination with ecological infrastructure. It uses the evaporation of seawater to regulate greenhouse temperatures, thereby creating favorable conditions for plant growth while simultaneously generating freshwater through condensation. One example of advancement in this field is solar-powered desalination with an integrated thermal storage system, which makes continuous operation possible even at times of low solar irradiance. Additionally, passive cooling systems improve energy efficiency by reducing the reliance on conventional air conditioning within green building designs. The closed-loop nature of such systems enables efficient use of water and energy resources, making them a viable green solution for desert climates. Seawater greenhouses incorporated into green building designs present an environmentally friendly solution for water-scarce regions. These systems harness renewable solar energy, thus reducing carbon emissions and providing potable water for irrigation, cooling, and other building purposes.

Future work should focus on expanding model inputs to incorporate socio-economic factors and seasonal agricultural demand data, enabling more comprehensive integration within food–energy–water nexus planning. In addition, research should investigate the use of hybrid renewable energy systems, such as solar–wind or solar–biomass combinations, to ensure continuous seawater greenhouse (SWGH) operation and enhance resilience against fluctuations in solar availability. Finally, efforts should be directed toward scaling the proposed approach for multi-site deployments, coupled with an assessment of policy frameworks that can facilitate the adoption of SWGH-integrated green building solutions in water-scarce regions. The synergy between advanced deep learning methods and SWGH technology offers a practical, scalable pathway to sustainable water management. Applying artificial intelligence to solar-driven water generation systems is a significant step toward water sustainability and toward the development of green infrastructure in water-scarce regions. When deployed strategically within green building systems, this integrated approach has the potential to significantly mitigate water scarcity challenges in arid coastal zones worldwide.

Appendix A: Pseudo code for CNN–LSTM model

Input:

Dataset D with features (GHI, T2M, …).

Input window size W.

Forecast horizon H.

Output:

Predicted values Y_pred for next H steps.

1: Load dataset D.

2: Remove irrelevant columns.

3: Aggregate data into monthly means.

4: Normalize features using MinMaxScaler.

5: Create windowed dataset (X, Y) with size W and horizon H.

6: Split data into training set (X_train, Y_train) and test set (X_test, Y_test).

7: Initialize Sequential model.

8: Add Conv1D layer with filters = 64, kernel size = 2, activation = ReLU.

9: Add MaxPooling1D with pool size = 2.

10: Flatten output.

11: Reshape output for LSTM input.

12: Add LSTM layer with 100 units, return_sequences = True.

13: Add another LSTM layer with 100 units.

14: Add Dense layer with 200 units, activation = ReLU, regularizer = L2.

15: Add Dropout layer (rate = 0.0).

16: Add Dense layer with (H × 2) units, activation = Linear.

17: Reshape output to (H, 2).

18: Compile model with optimizer = Adam, loss = MSE, metrics = MAE.

19: Train model on (X_train, Y_train) with validation on (X_test, Y_test).

20: Predict Y_pred = model(X_test).

21: Evaluate using metrics {R², RMSE, MSE, MAE, NRMSE}.

22: Return Y_pred.
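The tensor shapes implied by steps 8 to 17 of the pseudo-code can be traced with simple arithmetic. This is a sketch under stated assumptions, not the authors' implementation: it assumes a convolution with no padding and unit stride, and a pooling stride equal to the pool size, none of which the pseudo-code specifies. The feature count of 8 and the horizon of 120 steps are hypothetical example values, and the reshape target in step 11 is not given, so the first LSTM layer's output shape is omitted.

```python
def conv1d_length(steps, kernel_size):
    # Assumption: no padding, stride 1
    return steps - kernel_size + 1

def max_pool_length(steps, pool_size):
    # Assumption: no padding, stride equal to the pool size
    return (steps - pool_size) // pool_size + 1

def trace_shapes(window, n_features, horizon,
                 filters=64, kernel_size=2, pool_size=2,
                 lstm_units=100, dense_units=200):
    """List (layer, per-sample output shape) pairs for the layers whose shape is determined."""
    shapes = [("input", (window, n_features))]
    steps = conv1d_length(window, kernel_size)
    shapes.append(("conv1d", (steps, filters)))        # step 8
    steps = max_pool_length(steps, pool_size)
    shapes.append(("max_pool", (steps, filters)))      # step 9
    shapes.append(("flatten", (steps * filters,)))     # step 10
    shapes.append(("lstm_2", (lstm_units,)))           # step 13, last hidden state only
    shapes.append(("dense", (dense_units,)))           # step 14
    shapes.append(("dense_out", (horizon * 2,)))       # step 16
    shapes.append(("reshape_out", (horizon, 2)))       # step 17
    return shapes

# window=12 comes from the text; n_features=8 and horizon=120 are hypothetical
for layer, shape in trace_shapes(window=12, n_features=8, horizon=120):
    print(f"{layer:12s} {shape}")
```

With a 12-step window, the convolution and pooling stages reduce the sequence to 5 steps of 64 channels under these assumptions, which is the representation passed on toward the LSTM layers.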