Introduction

Water scarcity has emerged as a significant challenge for agriculture in India. The excessive extraction of groundwater for agricultural, sugar mill, and domestic needs has resulted in declining water tables, intensifying the water shortage issue. Consequently, agriculture in these regions faces the pressing need to discover alternative irrigation water sources. Recently, reusing wastewater has gained popularity as a viable solution1,2,3. Wastewater contains not only beneficial nutrients and organic matter but also a significant amount of harmful contaminants. These include salts, toxic organic substances, pathogens, residual pharmaceuticals, endocrine-disrupting compounds, and various heavy metals4,5. The accumulation of these pollutants in soil can lead to long-term risks, negatively affecting soil quality and crop productivity6.

Sugar mill waste contributes significantly to environmental pollution, and the expansion of industries and their effluent discharges places immense pressure on soil and environmental health. Despite its global significance, this issue often remains overlooked. The problem is especially severe in developing countries such as India, where sugar mill pollution is a growing concern. In India, urban areas produce an estimated 26.4 cubic kilometers of wastewater annually, but only 28% of this wastewater undergoes treatment. Using this treated wastewater for irrigation could potentially cover approximately 2.1 million hectares of agricultural land. According to 2022 statistics, the generation of hazardous sugar mill waste rose by almost 30%, and it continues to grow at an exponential rate7,8.

As reported by the Directorate of Sugarcane Development, Uttar Pradesh (UP) is the leading producer of sugarcane in India, with nearly 45% of the state’s agricultural land dedicated to this crop. During the 2022-23 crushing season, the state had 118 operational sugar mills. The total area under sugarcane cultivation in Uttar Pradesh is 2.853 million hectares, with a productivity of 839 quintals per hectare. Globally, these industries produce a large amount of untreated waste, which in turn affects the mineral quality of water and soil sources8,9,10. In India, especially in major sugar-producing states such as UP, there is a notable lack of research on the impact of sugar mill waste and wastewater on agriculture and on their discharge onto agricultural land. Addressing this gap is crucial for mitigating the adverse effects on the environment and ensuring sustainable agricultural practices. Stringent monitoring and management practices can help ensure that the use of untreated wastewater remains safe and beneficial for soil and crop production11. By integrating advanced treatment methods and regular monitoring, the risks of soil and crop contamination can be effectively minimized. One effective approach to detecting contamination in soil is through electromagnetic methods based on physical parameters. Electromagnetic remote sensing techniques have proven effective in identifying contaminated soils and those with high salinity levels12.

Most research on soil properties has concentrated on the use of surface or groundwater for irrigation13,14. However, the effects of using non-conventional waters, such as wastewater, for irrigation have received comparatively little attention. In the present study, the effects of sugar mill wastewater (especially in sugarcane mill areas) on the properties of Indian agricultural soils were analyzed using a DAK-12d coaxial probe in the frequency range of 10 MHz to 3 GHz.

Area of analysis

The study was conducted in the key sugarcane-growing regions of western Uttar Pradesh, which are irrigated by the Hindon River. This river originates from the Shivalik Hills in the Western Himalayas and flows through northern Uttar Pradesh as a tributary of the Yamuna River. Covering a catchment area of 7,083 square kilometers, the Hindon River passes through districts including Saharanpur, Muzaffarnagar, Meerut, Baghpat, Ghaziabad, Noida, and Greater Noida before joining the Yamuna River at Greater Noida, as shown in Fig. 1. Throughout its journey, the river is heavily polluted by a mix of treated and untreated sewage and industrial effluents from approximately 45 major industrial facilities, such as sugar mills, distilleries, paper mills, and food processing units. The soil in this region is classified into two types, Khaddar and Bangar, each exhibiting distinct textures and varying susceptibility to contamination based on their formation conditions15. The river’s pollution problem is exacerbated by the discharge of untreated or inadequately treated wastewater and effluents from large-scale industries like ITC, Indian Potash Limited (IPL), Titawi Sugar Mill, Star Paper Mill in Saharanpur, and various smaller chemical manufacturers. On a daily basis, the Hindon River is burdened with about 50 million liters of wastewater, including sewage and sugar mill effluents, which are funneled through its tributaries like the Dhamola, Paondhoi, and Nagdehi rivers. For over 25 years, local farmers have utilized the water from the Hindon River for irrigation purposes. However, in the last decade, the quality of this water has deteriorated drastically due to rapid industrialization and urbanization16.

Fig. 1 Path of the Hindon River.

Sample collection

To collect water samples from the wastewater channel, a methodical approach was employed using a 350 ml dipper. Initially, the dipper was rinsed with the wastewater to eliminate any residual contaminants that could affect the results. Samples were collected from zones where water was actively flowing, avoiding stagnant areas, overgrown vegetation, and muddy regions, to ensure the samples were representative of the channel’s conditions. The water samples were then carefully transferred into pre-cleaned plastic bottles to prevent cross-contamination. Each sample was divided into multiple portions and clearly labeled for subsequent analysis.

To prepare samples for further analysis, portions of the water samples were acidified to a pH range of 1.5 to 2 using concentrated hydrochloric acid (HCl). This acidification step is critical because it stabilizes the metal ions in solution, preventing them from precipitating during storage and analysis, which could otherwise lead to inaccurate measurements. Additionally, the samples were stored in cool, dark conditions. This storage method minimizes the chemical and biological changes that could occur if the samples were exposed to light or varying temperatures, thereby preserving the analyte concentrations and ensuring the accuracy of the analytical results17,18.

Soil sample collection

The samples were gathered near the city of Muzaffarnagar in western Uttar Pradesh, around the coordinates 29.4727° N, 77.7085° E. Soil samples were collected using the “Cell Division Method”. This process involved defining a rectangular area of 25 × 40 m on the ground. Within this designated area, samples were extracted from 16 strategically placed diagonal sites to ensure comprehensive coverage. In total, 100 sampling points were identified, with six samples taken from each point. Samples were collected from a depth of 0–30 cm at regular intervals to ensure they were representative. These individual samples from within the cell were then thoroughly mixed to form a composite sample, providing a more uniform representation of the soil properties. The resulting composite samples were placed in air-tight bags to minimize external contamination. Each bag was carefully labeled for proper identification and documentation, ensuring that the samples could be accurately tracked and analyzed in subsequent studies19,20.

Physicochemical analysis

The collected soil samples were first air dried, then weighed, ground, and sieved before their physical and chemical analysis. A series of physicochemical analyses was then conducted on the soil and water samples. Key parameters such as pH, temperature, texture, mineral concentration, conductivity, and turbidity were measured using specialized instruments designed for accurate readings. The pH of both soil and water samples was measured using a microprocessor pH meter IG-10PH21. This pH meter offered a resolution of 0.01 pH and maintained an accuracy of ± 0.01 pH. Temperature analysis was performed using a thermocouple (DTM 3000-Spezial LKM Electronics, GmbH)22. Conductivity measurements for both soil and water were conducted using a digital conductivity meter (CDH-280-KIT)23. For turbidity assessments, the ISO 7027 Compliant Benchtop Turbidity Meter - HI88713 was utilized to evaluate the turbidity levels in both soil and water samples24. The selection of the soil-to-water extract ratio for measuring pH, electrical conductivity (EC), and turbidity is critical for obtaining accurate and representative values of these parameters. Commonly utilized extract ratios, apart from the saturated soil paste method, include 1:1, 1:2, and 1:5 soil-to-water mixtures. Further, the analysis of the micro- and macronutrients of the soil was carried out at a soil testing laboratory in Mohali, Punjab.

Analysis of soil samples

For further analysis, the samples were subjected to moisture content analysis to measure the bound volumetric moisture content of the soil. First, the samples were dried in a forced-air oven at 110 °C for 6 h, following the AOAC guidelines. The volumetric moisture content of these samples was then measured with the MOC63u electronic moisture analyzer shown in Fig. 2. The process was repeated in triplicate to ensure accuracy. The average moisture content of these samples was found to be approximately 1 to 3%. Once dried, the samples were allowed to cool inside a desiccator to prevent any contamination or moisture absorption.

Fig. 2 MOC63u electronic moisture analyzer.

Further, for the analysis of the dielectric properties of soil, samples with variable moisture content were prepared25. Using Eq. (1), triplicate samples with the desired volumetric moisture content were prepared by gradually adding a calculated amount of deionized water to dry soil samples.

$$M_{n} = \frac{W_{W} - W_{d}}{W_{W}} \times 100$$
(1)
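As a minimal sketch of how Eq. (1) can be rearranged to find the amount of water to add, the following Python snippet assumes that \(W_{W}\) and \(W_{d}\) denote the wet and dry sample masses (the symbols are not defined in the text) and that moisture is expressed on a wet-mass basis, exactly as Eq. (1) is written. The function names and the 100 g example mass are illustrative only. Converting to a strictly volumetric moisture content would additionally require the bulk density, which is not given here.

```python
def wet_basis_moisture(wet_mass_g: float, dry_mass_g: float) -> float:
    """Moisture content (%) following Eq. (1): (W_W - W_d) / W_W * 100.

    Assumption: W_W is the wet mass and W_d is the dry mass of the sample.
    """
    if wet_mass_g <= 0 or dry_mass_g < 0 or dry_mass_g > wet_mass_g:
        raise ValueError("expected 0 <= dry mass <= wet mass and wet mass > 0")
    return (wet_mass_g - dry_mass_g) / wet_mass_g * 100.0


def water_to_add(dry_mass_g: float, target_moisture_pct: float) -> float:
    """Mass of water (g) to add to dry soil to reach the target moisture.

    Rearranging Eq. (1) gives W_W = W_d / (1 - M_n / 100),
    so the water to add is W_W - W_d.
    """
    if not 0 <= target_moisture_pct < 100:
        raise ValueError("target moisture must be in [0, 100)")
    wet_mass_g = dry_mass_g / (1.0 - target_moisture_pct / 100.0)
    return wet_mass_g - dry_mass_g


if __name__ == "__main__":
    dry = 100.0  # g of oven-dried soil (illustrative value, not from the study)
    for target in (10.0, 20.0, 30.0):
        added = water_to_add(dry, target)
        check = wet_basis_moisture(dry + added, dry)
        print(f"target {target:.0f}% -> add {added:.2f} g (check: {check:.2f}%)")
```

The round trip through `wet_basis_moisture` is included only as a consistency check on the rearrangement.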

Open-ended coaxial probe method

The dielectric properties of soil were assessed using the open-ended coaxial probe method. In this experiment, an 85070E coaxial probe from Agilent Technologies was connected to an Agilent E5071C vector network analyzer (VNA). Controlled by Agilent software on an external computer, the VNA recorded the reflection coefficients \(S_{11}\) of electromagnetic waves interacting with the soil samples. These data were then used to calculate ε’ and ε’’26,27.

The samples were held at a consistent bulk density and temperature in a specially designed brass cylinder, as described by Palta et al. (2022). To ensure complete soil coverage and avoid air gaps, the probe was inserted from the bottom of the holder, with soil added from the top to avoid any type of errors and noise. Measurements covered 1601 points across a frequency range of 10 MHz to 3 GHz. The VNA was calibrated using air, short, and deionized water standards at 25°C, and calibration accuracy was confirmed by comparing the dielectric properties of deionized water with standard results. Soil samples were then placed in the holder, and their dielectric properties were measured over a temperature range from 0°C to 60°C, with each temperature point measured in triplicate. Temperature control was maintained using a water circulator connected to the sample holder with insulated pipes, increasing the temperature at a rate of 1°C per minute. The mean values of ε’ and ε’’ were computed, with relative measurement errors of ±1.2% and ±3.03%, respectively.
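The sweep and the triplicate averaging described above can be illustrated with a short Python sketch. Two points are assumptions rather than details given in the text: the 1601 points are taken to be linearly spaced, and the relative spread is taken as the sample standard deviation divided by the mean. The readings in the example are made-up values.

```python
from statistics import mean, stdev


def linear_sweep(start_hz: float, stop_hz: float, n_points: int) -> list[float]:
    """Frequency grid with n_points evenly spaced values (assumed linear)."""
    step = (stop_hz - start_hz) / (n_points - 1)
    return [start_hz + i * step for i in range(n_points)]


def summarize_triplicate(readings: list[float]) -> tuple[float, float]:
    """Return the mean and the relative spread (%) of repeated readings.

    Relative spread is defined here as sample standard deviation / |mean|;
    this definition is an assumption, not taken from the study.
    """
    m = mean(readings)
    return m, stdev(readings) / abs(m) * 100.0


if __name__ == "__main__":
    freqs = linear_sweep(10e6, 3e9, 1601)
    print(f"{len(freqs)} points, step = {freqs[1] - freqs[0]:.0f} Hz")

    # Three hypothetical readings of the real permittivity at one frequency.
    value, spread = summarize_triplicate([37.2, 37.5, 37.6])
    print(f"mean = {value:.2f}, relative spread = {spread:.2f}%")
```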

Statistical analysis

The obtained results had been modeled using Machine learning (ML) approaches. To establish baseline models and analyze linear relationships within the data, Linear Regression, Decision Tree Regression, and Support Vector Machine Regression were utilized. To improve predictive accuracy and effectively capture nonlinear patterns, advanced machine learning algorithms such as Random Forest Regression, XGBoost Regression, and Neural Network Regression were subsequently applied28,29. By integrating these complementary approaches, our analysis aimed not only to uncover nuanced correlations within the dataset but also to provide robust predictions and insights into the interplay of variables under study. This combined methodology enabled a thorough exploration of data dynamics, facilitating a deeper understanding of the experimental findings and their practical implications.

Results

Physicochemical properties of wastewater

Several analyses of the physical and chemical properties of sugar mill wastewater were conducted, and the results are summarized in Table 1. The results indicate significant contamination. The light brown to black colour and rotten egg odour suggest high levels of organic matter and hydrogen sulfide gas. The temperature range of 13 ± 2.22 to 29 ± 1.54 °C is within normal environmental limits. High suspended solids (40 to 45%) and turbidity (80 ± 1.54 to 88 ± 2.01 NTU) reflect substantial particulate matter. The pH values, ranging from 7.63 to 6.26, indicate slightly alkaline to mildly acidic conditions. Elevated conductivity (1222 ± 12.06 to 1313 ± 10.62 µS/cm) points to a high concentration of dissolved ions. These indicators confirm substantial pollution of the water, in agreement with earlier studies30,31.

Table 1 Physicochemical properties of wastewater.

Physicochemical properties of soil samples

The physicochemical analysis shown in Table 2 reveals several key characteristics and potential environmental concerns. The soil colour ranges from dark brown to black, indicating high organic content, and it has a sandy loam texture. The pH values suggest slightly acidic conditions. Soil moisture content is relatively low, while the electrical conductivity (EC) values indicate moderate salinity levels. Turbidity values reflect some level of suspended particles. The calcium content and chlorides are within normal ranges, suggesting moderate mineral content. High organic matter and organic carbon indicate significant decomposition of organic materials, contributing to soil fertility. The nitrogen, zinc, manganese, sulphate, and lead contents indicate a certain level of contamination and should be monitored because of their potential toxicity32,33.

Table 2 Physicochemical properties of soil affected by wastewater.

Dielectric properties

To assess the impact of sugar mill wastewater irrigation on the dielectric properties of soil, the analysis was conducted in three stages. First, the properties of the wastewater used for irrigation were measured. This was followed by an evaluation of the properties of the soil affected by this wastewater. Finally, the properties of the food crops grown in the irrigated area were examined. The dielectric properties of soil were analyzed in the frequency range of 10 MHz to 3 GHz.

Dielectric properties of sugar mill wastewater

The study indicates that effluent-contaminated water from sugar mills exhibits significantly higher dielectric properties compared to standard irrigation water. This increase in the dielectric constant is primarily due to the increased mineral content in the contaminated water, as evidenced by its physicochemical characteristics. Specifically, the ε’ value of the contaminated water decreases from 109.04 to 72.61 at 10 MHz and from 38.48 to 25.69 at 14 GHz as the temperature increases from 0 to 60°C, as shown in Fig. 3(A). This reduction in ε’ with rising temperature is due to enhanced molecular motion, which disrupts dipole alignment and reduces the water’s ability to polarize in response to an electric field34. Ionic polarization, where ions in the contaminated water align with the electric field, is more pronounced due to the higher concentration of dissolved ions and minerals. These ions contribute to a significant dielectric response, even at higher frequencies, indicating that the contaminated water has substantial ionic polarization characteristics35.

Fig. 3 Variation of ε’ and ε’’ in sugar mill wastewater as a function of frequency and temperature.

In addition to changes in the dielectric constant, the ε’’ value, which measures the energy dissipated as heat under an alternating electric field, also varies with temperature. At 0°C, ε’’ decreases from 114.34 at 10 MHz to 1.19 at 3 GHz. However, as the temperature increases, ε’’ rises significantly, reaching 978.58 at 10 MHz and 5.26 at 3 GHz at 60°C, as depicted in Fig. 3(B).

This increase is due to higher ionic mobility as temperature reduces the viscosity of water, allowing ions to move more freely and thereby increasing ionic conductivity. Elevated temperatures also enhance thermal agitation, which accelerates the reorientation of dipoles in response to the alternating electric field, resulting in greater energy dissipation and an increased dielectric loss factor36. Furthermore, as temperature increases, hydrogen bonds within the water molecules are weakened or disrupted, allowing for more rapid and easier reorientation of the molecules in the electric field. This effect not only increases dielectric loss but also reflects a complex interplay between molecular motion, ionic behavior, and the overall dielectric response of the contaminated water37.
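The link between ionic conductivity and dielectric loss described above can be made concrete with the standard relation for the conduction contribution to the loss factor, ε’’σ = σ / (2π f ε0). The Python sketch below is an illustration under stated assumptions, not the analysis used in this study: it takes the conductivity from the range reported for the wastewater (1222 µS/cm), treats it as frequency independent, and ignores the dipolar relaxation loss of water. The function names are hypothetical.

```python
import math

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity in F/m


def microsiemens_per_cm_to_siemens_per_m(value: float) -> float:
    """Convert conductivity from µS/cm to S/m (1 µS/cm = 1e-4 S/m)."""
    return value * 1e-4


def conduction_loss(conductivity_s_per_m: float, frequency_hz: float) -> float:
    """Conduction contribution to the loss factor: sigma / (2*pi*f*eps0)."""
    return conductivity_s_per_m / (2.0 * math.pi * frequency_hz * EPSILON_0)


if __name__ == "__main__":
    # Assumption: use the lower end of the reported conductivity range.
    sigma = microsiemens_per_cm_to_siemens_per_m(1222.0)
    for f in (10e6, 100e6, 1e9, 3e9):
        print(f"f = {f / 1e6:7.0f} MHz -> conduction loss = {conduction_loss(sigma, f):8.3f}")
```

Because this term scales as 1/f, it is roughly 300 times larger at 10 MHz than at 3 GHz, which is consistent in direction with the steep fall of ε’’ with frequency reported above. A rise in conductivity with temperature raises the low-frequency loss in the same way.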

Dielectric behavior of soil in frequency region of 10 MHz to 3 GHz

The results indicate that both soils of the region have a unique dielectric response under the effect of sugar mill contaminants.

Khaddar soil

Figure 4 shows that the dielectric properties (ε’ and ε’’) of Khaddar soil are influenced by its water content and the presence of contaminants. The ε’ value decreases from 37.45 at 10 MHz to 8.16 at 3 GHz at a temperature of 0°C. Similar trends are observed at other temperatures; as the temperature rises from 0°C to 60°C, the ε’ value decreases from 37.45 to 15.62 at 10 MHz and from 8.16 to 5.43 at 3 GHz.

In parallel, the ε’’ values increase significantly for soils continuously irrigated with water contaminated by sugar mill waste. Specifically, the ε’’ value increases from 149.24 to 232.52 at 10 MHz and varies from 9.33 to 5.67 at 3 GHz. This behavior is attributed to the elevated ion concentration in the soils, suggesting that the dielectric response of Khaddar soil across variable frequencies confirms the propagation of ionic polarization to higher values38.

Fig. 4 Variation of ε’ and ε’’ in Khaddar soil with temperature and frequency.

Bangar soil

The results in Fig. 5 show that the ε’ and ε’’ values of Bangar soil exhibit distinct variations with frequency and temperature. Specifically, the ε’ value decreases from 60.1 at 10 MHz to 49.74 at 3 GHz at a temperature of 0°C. This trend persists at other temperatures; as the temperature increases from 0°C to 60°C, the ε’ value declines from 60.1 to 49.74 at 10 MHz and from 41.43 to 34.35 at 3 GHz. In contrast, the ε’’ value increases significantly, rising from 110.55 to 523.88 at 10 MHz and varying from 1.15 to 3.11 at 3 GHz.

Fig. 5 Variation of ε’ and ε’’ values of Bangar soil.

These findings indicate that both dielectric parameters exhibit higher values compared to Khaddar soils. The observed variations and peak formations in the dielectric curves suggest that the soil in question possesses a sandier texture and lower water-holding capacity. Additionally, the reduced organic matter content in these soils is reflected in the dielectric properties, confirming that they are less fertile than Khaddar soils39,40.

These results emphasize the critical role of dielectric properties in assessing soil characteristics. The substantial decrease in ε’ with increasing frequency and temperature can be linked to the dispersion phenomena and relaxation processes inherent in soil structure. Meanwhile, the pronounced increase in ε’’ with temperature suggests enhanced ionic conductivity and dipole relaxation dynamics, indicative of the soil’s physicochemical state. These changes in dielectric properties emphasize the moderate texture and water retention capabilities of the soils, while also illustrating how temperature and contamination influence the soil’s dielectric response.

Dielectric properties of food materials

The dielectric properties of crop materials cultivated in fields irrigated with wastewater exhibit considerable variability, as shown in Table 3. This analysis was carried out at a frequency of 2.45 GHz. Both ε’ and ε’’ of these food items are generally higher because of the increased ion concentration resulting from sugar mill waste, and the higher ε’’ indicates substantial energy dissipation within the food materials. As the frequency increases, ε’’ values decrease, reflecting a transition from ionic to dipolar polarization dominance. Earlier studies of similar food materials have confirmed the obtained results41,42.

Table 3 Dielectric properties of crops grown in wastewater affected field.

Statistical modelling

Exploratory data analysis (EDA) for ε’ and ε’’

For ε’

The dataset contains a total of 336,210 observations and encompasses five variables: Size, Moisture, Temperature (Temp), Frequency, and Dielectric Constant. The Size variable ranges from 0.002 to 2.0, with a mean of 0.5179 and a standard deviation of 0.7522. Moisture values vary between 0.1 and 0.47, averaging 0.232 and showing a standard deviation of 0.0982. Temperature ranges widely from 5.0 to 60.0, with an average of 30.71 and a standard deviation of 18.98. Frequency ranges from 10 MHz to 3 GHz. Finally, the Dielectric Constant spans from 4.016 to 43.889, with a mean of 16.32 and a standard deviation of 8.02, as detailed in Table 4.

Table 4 Data analysis for ε’.

For ε’’

The dataset consists of 256,160 observations, with Size, Moisture, Temp (Temperature), Frequency, and DL (Dielectric Loss) as the variables of interest. Size ranges from 0.002 to 2.0 with a mean of 0.6154 and a standard deviation of 0.7886, while Moisture varies between 0.1 and 0.25, with a mean of 0.175 and a standard deviation of 0.0559. Temperature spans from 5.0 to 70.0, with a mean of 35.625 and a standard deviation of 21.9997. Frequency exhibits a wide range from 10 MHz to 3 GHz. Dielectric Loss ranges from 0.379892 to 9.861584, with a mean of 3.238827 and a standard deviation of 2.254475 as shown in Table 5.

Table 5 Data analysis for ε’’.

Heat map correlation for ε’ and ε’’

Figure 6 illustrates heat maps that display the Pearson correlation between the input features (Size, Moisture, Temp, Frequency) and the target variables, ε’ and ε’’. These heat maps indicate the strength and direction of the relationships between each input feature and the target variables. For ε’, a positive correlation indicates that an increase in the value of an input feature corresponds to an increase in ε’, while a negative correlation suggests that a rise in the input feature is linked to a decrease in ε’. Likewise, for ε’’, a positive correlation means that higher input feature values are associated with greater ε’’ values, whereas a negative correlation implies that increases in the input feature lead to lower ε’’ values. Understanding these correlations is essential for pinpointing the most significant features for predicting ε’ and ε’’, ultimately improving the accuracy and effectiveness of the predictive model.

Fig. 6 Heat map correlation for (A) dielectric constant (DC) and (B) dielectric loss (DL).
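For readers who wish to reproduce this kind of correlation analysis, the Pearson coefficient between each input feature and the target can be computed as in the following Python sketch. The helper names and the small data table are illustrative assumptions and do not reproduce the study's dataset.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlations_with_target(features: dict[str, list[float]],
                             target: list[float]) -> dict[str, float]:
    """Correlation of every feature column with the target column."""
    return {name: pearson(column, target) for name, column in features.items()}


if __name__ == "__main__":
    # Made-up toy values, only to show the call pattern.
    features = {
        "Moisture": [0.10, 0.15, 0.20, 0.25, 0.30],
        "Temp": [5.0, 20.0, 35.0, 50.0, 60.0],
    }
    dielectric_constant = [6.1, 9.8, 14.2, 19.5, 24.0]
    for name, r in correlations_with_target(features, dielectric_constant).items():
        print(f"{name}: r = {r:+.3f}")
```

A heat map such as Fig. 6 is a colour-coded display of these coefficients. Values near +1 or -1 indicate a strong linear relationship, while values near 0 indicate a weak one.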

Artificial neural networks (ANN) modelling

Artificial Neural Networks (ANNs) for regression tasks generate continuous outputs by emulating the way the human brain operates through layers of interconnected neurons. Each neuron processes incoming data by calculating weighted sums, applying an activation function to introduce non-linearity, and passing the output to the next layer. During the forward propagation phase, the input data is transformed as it moves through the hidden layers, culminating in a final prediction. The network’s effectiveness is assessed using error measures such as Mean Squared Error (MSE) and Mean Absolute Error (MAE), together with the coefficient of determination (R²), which quantify the discrepancies between the predicted and actual values. In the backpropagation phase, the gradients of the loss function with respect to the weights and biases are computed, enabling iterative adjustments through an optimization algorithm like Adam to reduce errors. This iterative process allows the ANN to capture complex patterns and enhance its prediction accuracy over time.

Fig. 7 ANN with features and target.

The ANN model for predicting the dielectric constant or dielectric loss from input parameters such as size, moisture content, temperature, and frequency consists of an input layer with four nodes corresponding to these input features. These nodes feed into one or more hidden layers, where each neuron processes the input data, applies an activation function, and passes the result to subsequent layers. Finally, the output layer has one node representing the predicted dielectric constant or dielectric loss, as shown in Fig. 7.
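The forward pass through such a network can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the single hidden layer of eight neurons, the random initial weights, and the example input are hypothetical, since the layer sizes of the trained model are not given in the text. Only the four-input, one-output shape and the ReLU hidden activation follow the description.

```python
import random


def relu(z: float) -> float:
    """Rectified linear unit activation."""
    return max(0.0, z)


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: weighted sum plus bias, then activation."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs


def init_layer(n_in: int, n_out: int, rng: random.Random):
    """Small random weights and zero biases (illustrative initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, layers) -> float:
    """Propagate one sample through all layers; the last layer is linear."""
    activations = features
    for index, (weights, biases) in enumerate(layers):
        is_output = index == len(layers) - 1
        activations = dense(activations, weights, biases, None if is_output else relu)
    return activations[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # 4 inputs -> 8 hidden neurons (assumed size) -> 1 output.
    layers = [init_layer(4, 8, rng), init_layer(8, 1, rng)]
    # Standardized size, moisture, temperature, frequency (made-up values).
    sample = [0.2, -1.1, 0.5, 1.3]
    print("untrained prediction:", forward(sample, layers))
```

Training adjusts the weights and biases so that this output approaches the measured ε’ or ε’’. The sketch covers only the prediction step.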

Table 6 details the procedure for developing and training an ANN regression model aimed at predicting the ε’ and ε’’ values. The process begins with loading the dataset and partitioning it into training and testing sets, using an 80-20 split. Following this, feature scaling is implemented to standardize the input variables. The ANN model is constructed using the Sequential model from Keras, incorporating an input layer, one or more hidden layers with ReLU activation functions, and an output layer. The model is compiled with the Adam optimizer and Mean Squared Error (MSE) as the loss function, and training spans 20 epochs with a batch size of 32. Once training is complete, the model generates predictions on the test set, and performance metrics such as MSE, MAE, RMSE, and R² score are computed to assess its accuracy. The pseudocode in Table 6 serves as a guide for constructing and evaluating the ANN regression model for predicting the dielectric constant and dielectric loss.

Table 6 Pseudo code of ANN model.
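The data preparation steps of this procedure, the 80-20 split followed by standardization, can be sketched in plain Python as below. The helper names are hypothetical and the rows are toy values. The sketch fits the scaling statistics on the training rows only and then reuses them for the test rows, which is a common convention assumed here, since the text does not state how the scaler was fitted.

```python
import random
from statistics import mean, pstdev


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out a fraction of rows for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


def fit_scaler(rows):
    """Column means and standard deviations, computed on training rows."""
    columns = list(zip(*rows))
    means = [mean(column) for column in columns]
    # Guard against constant columns to avoid division by zero.
    stds = [pstdev(column) or 1.0 for column in columns]
    return means, stds


def transform(rows, means, stds):
    """Standardize each column to zero mean and unit variance."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


if __name__ == "__main__":
    # Toy rows of (size, moisture, temperature, frequency in Hz).
    rows = [[0.002 * (i + 1), 0.1 + 0.01 * i, 5.0 + 5.0 * i, 1e7 * (i + 1)]
            for i in range(10)]
    train, test = train_test_split(rows)
    means, stds = fit_scaler(train)
    train_scaled = transform(train, means, stds)
    test_scaled = transform(test, means, stds)
    print(len(train_scaled), "training rows,", len(test_scaled), "test rows")
```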

ANN modelling result for ε’

The ANN modelling results for predicting ε’ over the 20 training epochs are shown in Table 7. Each epoch reports the loss value for both the training and validation sets. Initially, the loss decreases significantly from 63.4889 to 2.3699 during the first epoch and continues to decrease gradually in subsequent epochs. By the end of the training process, the loss converges to 0.8415 for the training set and 0.8402 for the validation set. This decreasing trend in loss values indicates that the model is learning from the data and improving its predictive capability over epochs. Figure 8 shows the loss curves of the ANN model for ε’.

Table 7 ANN modeling result for ε’.
Fig. 8 Loss curves (A) and actual vs. predicted values (B) of the ANN model for ε’.

The ANN model was initially trained with a learning rate of 0.005 and a mini-batch size of 2000 instances, followed by fine-tuning with a reduced learning rate of 0.0001. The training utilized a total of 336,210 experimental data instances. The proposed ANN-based prediction model yielded a Mean Absolute Error (MAE) of 0.59, a Mean Squared Error (MSE) of 0.84, and a validation Root Mean Square Error (RMSE) of 0.91 for ε’, indicating close agreement between the actual and predicted values. Additionally, the model achieved an R² score of 0.9961. Table 8 presents the actual and predicted results for ε’.

Table 8 Predicted results of developed model.
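The reported metrics follow standard definitions, which the short Python sketch below makes explicit. The function name and the sample values are illustrative and are not taken from Table 8.

```python
import math


def regression_metrics(actual: list[float], predicted: list[float]) -> dict[str, float]:
    """Compute MSE, MAE, RMSE and R² for paired actual/predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are equal")
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot
    return {"mse": mse, "mae": mae, "rmse": math.sqrt(mse), "r2": r2}


if __name__ == "__main__":
    # Made-up values, only to demonstrate the call.
    actual = [12.4, 18.9, 25.1, 31.7]
    predicted = [12.9, 18.2, 25.6, 30.9]
    for name, value in regression_metrics(actual, predicted).items():
        print(f"{name}: {value:.4f}")
```

Since RMSE is the square root of MSE, the reported MSE of 0.84 corresponds to an RMSE of about 0.92, close to the reported value of 0.91.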

ANN modelling result for ε’’

Similar to ε’, the ANN model for predicting ε’’ was trained over 20 epochs, with the loss dropping notably from 1.4133 in the first epoch and gradually converging to 0.0275 by the final epoch. Figure 9 displays the loss curves, indicating consistent learning and improvement. Trained on 336,210 instances at an initial learning rate of 0.005 and fine-tuned at 0.0001, the model achieved an MAE of 0.364, an MSE of 0.20, a validation RMSE of 0.45, and an R² score of 0.9594. Table 10 contrasts actual and predicted ε’’ values, demonstrating the model’s accuracy.

Table 9 ANN modelling result for ε’’.
Fig. 9 Loss curves (A) and actual vs. predicted values (B) of the ANN model for ε’’.

Table 10 Predicted results of developed model for ε’’.

Conclusion

Soil property alterations

There is a notable increase in dielectric properties due to heightened ion concentrations from wastewater infiltration. Additionally, the soil water holding capacity has increased by 22%, which is crucial for enhancing water retention and supporting crop growth.

Chemical changes

The wastewater has caused the soil pH to shift towards acidity by 0.8 units, impacting nutrient availability and microbial activity. Furthermore, the elevated levels of nitrogen (28%) and phosphorus (18%) indicate a fertilizing effect, though there are concerns about potential nutrient leaching into groundwater.

Differential soil responses

The study identified distinct differences in the response of Khaddar and Bangar soils to the wastewater. Khaddar soil, characterized by its recent alluvial deposits, exhibited a higher dielectric constant compared to the older, more weathered Bangar soil. This indicates varying levels of soil compaction and moisture content, influenced by the wastewater’s organic matter and chemical constituents.

Machine learning applications

Machine learning algorithms proved effective in analyzing and interpreting the complex dielectric properties of soils impacted by sugarcane mill wastewater, with R² scores of 0.9961 and 0.9594 for ε’ and ε’’, respectively. This approach enabled the identification of intricate patterns and correlations within the dielectric data, thereby enhancing our understanding of soil health dynamics under various environmental stresses.

Strategic implications

Integrating machine learning with dielectric property analysis not only advances scientific understanding but also facilitates proactive management practices. This integration is vital for devising targeted strategies to mitigate the adverse effects of wastewater discharge on agricultural productivity and environmental sustainability.

Future scope of analysis

This study underscores the critical need for effective wastewater management and soil treatment strategies to mitigate the adverse effects of sugar mill waste on agricultural productivity and soil health. Future research should focus on several key areas: conducting longitudinal studies over multiple growing seasons to assess chronic impacts on soil and crop health; exploring innovative remediation techniques like phytoremediation and biochar application to reduce contaminant levels; expanding crop-specific analysis to determine the impacts on various plant species and develop targeted agricultural practices; implementing advanced monitoring technologies using IoT and AI for real-time soil health data collection; investigating sustainable agricultural practices such as organic amendments, crop rotation, and conservation tillage to enhance soil fertility and resilience; and collaborating with policymakers to establish and enforce regulations that limit sugar mill pollution and ensure the safe use of wastewater in agriculture.

Disclosure of interests

The authors declare that there are no conflicts of interest that could have influenced the results of this study.