Introduction

Algae are crucial to the structure, function, and global biogeochemical cycles of lake ecosystems1,2. With their diverse morphological characteristics and ecological functions, algae are key components of lake ecosystems3,4,5,6. In recent years, eutrophication of lakes and reservoirs in China has intensified rapidly owing to global climate change and increasing human activity7,8,9,10. Eutrophication causes algal overgrowth, producing harmful algal blooms (HABs) that significantly disrupt local industries and threaten public safety11,12. Blooms formed by different algal species differ in their hazards to lakes, their patterns of occurrence and development, and the ways they are managed13. It is therefore imperative to identify algal species and quantify changes in algal biomass on large spatiotemporal scales14,15.

Current methods for identifying algal species include microscopy, flow cytometry, genome sequencing, optical imaging, and high-performance liquid chromatography (HPLC)16,17,18,19. These methods rely mainly on cell morphology and algal DNA for identification20. However, their high cost and labor-intensive nature make them impractical for real-time, large-scale monitoring. In contrast, remote sensing, with its low cost, frequent updates, and large-scale coverage, shows great potential for identifying algal species21,22. Detecting algal species from space depends on biomass or other indicators of abundance23 and uses the remote sensing reflectance (Rrs) measured by satellite sensors together with the absorption and backscattering spectral characteristics of algae24,25.

With continuing advances in optical measurement methods and satellite sensor technology, methods for identifying algal species have developed considerably26. Numerous optical sensors, including the Moderate Resolution Imaging Spectroradiometer (MODIS), Medium Resolution Imaging Spectrometer (MERIS), Sea-viewing Wide Field-of-view Sensor (SeaWiFS), and Sentinel-3 Ocean and Land Colour Instrument (OLCI), provide rich spectral information20,21,27,28. These sensors can distinguish algal species once relationships are established between in situ measurements and the Rrs obtained by the sensors. However, most studies of algal identification have focused on identifying a single species or distinguishing a single algal group in natural waters, predominantly in the ocean. The complex composition of algal communities and the distinctive optical properties of lakes place higher demands on, and present greater challenges for, remote sensing. Further research is therefore needed to identify algae in lake environments.

In recent years, advancements in machine learning and data processing methods have opened new avenues for the long-term remote sensing identification of algal species29,30. Various machine learning methods, such as random forest regression, neural networks, and boosted regression trees, have been applied in this area31,32,33. Machine learning algorithms, with their distinctive capabilities in managing large datasets and addressing complex nonlinear issues, often achieve high estimation accuracy and can integrate data from multiple sources. As a result, machine learning models present a promising and reliable approach for identifying algal species in optically complex inland waters.

Hulun Lake, a cold-region lake in a high-latitude inland area, has experienced ongoing water quality deterioration in recent years, marked by increasingly frequent and intense HABs34,35,36. To gain insight into the patterns of change in algal biomass abundance and the processes of succession, extinction, and recovery, this study focuses on: (1) establishing a remote sensing inversion model, based on machine learning algorithms, for the composition and biomass abundance of algal species in Hulun Lake; (2) analyzing the spatiotemporal evolution of different algal species in Hulun Lake from 2016 to 2023 using OLCI imagery; and (3) exploring the spatiotemporal variability of the dominant algal species and its climatic and water quality drivers. Through this work, we aim to understand the dynamics of algae in Hulun Lake comprehensively and to provide scientific support for managing and protecting lake ecosystems.

Materials and methodology

Study area

Hulun Lake (117°00′10″-117°41′40″E, 48°30′40″-49°20′40″N) is located in the Hulunbuir Grassland on the eastern Inner Mongolia Plateau (Fig. 1a). It is a typical large lake in the high-latitude semi-arid region of northern China, with a total area of 2,037.3 km² and an average depth of 5.7 m35,37. The regional climate is semi-arid continental monsoon, with a mean annual temperature of -0.24 °C and a winter ice-cover period of 170–180 days38. The algal community of Hulun Lake is composed primarily of Cyanophyta, Bacillariophyta, Chlorophyta, Cryptophyta, Dinophyta, Chrysophyta, and Euglenophyta, with Cyanophyta dominant36. Cyanobacterial blooms in Hulun Lake have increased annually in recent years, making it one of China's most severely affected lakes and placing significant environmental pressure on its aquatic ecosystem39.

Fig. 1

(a) Geographical location of Hulun Lake. (b) Distribution of sampling points in Hulun Lake. (This map was generated using ArcGIS 10.8, Esri, http://www.esri.com.)

Materials and methods

Field sampling and laboratory processing

Several field surveys were conducted between July 2022 and September 2023, collecting a total of 141 water samples from Hulun Lake (Fig. 1b) at sampling points distributed evenly across the lake. Water samples (0 to 50 cm depth) were collected in 1 L black high-density polyethylene (HDPE) bottles and fixed with Lugol's iodine solution. Algae were identified and counted under a light microscope, and genus names were recorded40,41. Species composition, cell density, and biomass were recorded for each sample. Phytoplankton biomass was determined using the cell volume conversion method: the biomass of each algal species was calculated by multiplying its cell density by its cell volume, and total algal biomass was obtained by summing the biomass of all species. In this study, the biomass abundance of an algal group (Cyanophyta, Chlorophyta, or Bacillariophyta) is defined as the ratio of that group's biomass to the total algal biomass.
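The cell volume conversion and the abundance ratio defined above reduce to a few lines of arithmetic. The Python sketch below is illustrative rather than the authors' processing code; it assumes densities in cells/L, cell volumes in µm³, and a wet-weight density of 1 g/cm³, and the function names and sample values are hypothetical.

```python
from collections import defaultdict

# Assumption: densities in cells/L, cell volumes in µm³, and a wet-weight
# density of 1 g/cm³, so 1 µm³ of cells weighs 1e-12 g = 1e-9 mg.
UM3_TO_MG = 1e-9


def biomass_by_group(records):
    """Sum biomass (mg/L) per algal group.

    records: iterable of (group, cells_per_litre, cell_volume_um3) tuples,
    one per identified taxon in a sample.
    """
    totals = defaultdict(float)
    for group, density, volume in records:
        totals[group] += density * volume * UM3_TO_MG  # biomass = density x volume
    return dict(totals)


def biomass_abundance(group_biomass):
    """Biomass of each group as a percentage of total algal biomass."""
    total = sum(group_biomass.values())
    return {group: 100.0 * b / total for group, b in group_biomass.items()}


# Hypothetical sample with four counted taxa (values are illustrative only).
sample = [
    ("Cyanophyta", 2e6, 50.0),
    ("Cyanophyta", 1e6, 100.0),
    ("Bacillariophyta", 1e5, 1000.0),
    ("Chlorophyta", 5e5, 200.0),
]
abundance = biomass_abundance(biomass_by_group(sample))
print(abundance)  # Cyanophyta ≈ 50%, Bacillariophyta ≈ 25%, Chlorophyta ≈ 25%
```

Because abundance is a ratio, the assumed density constant cancels out of the percentages; it matters only for the absolute biomass in mg/L.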

Remote sensing data and preprocessing

The Ocean and Land Colour Instrument (OLCI) onboard the Sentinel-3A/3B satellites is well suited to monitoring inland lakes. It has 21 spectral bands spanning visible to near-infrared wavelengths (400–1020 nm), a spatial resolution of 300 m, and a revisit time of 1.8 days. The remote sensing data used in this study are OLCI Level-1B top-of-atmosphere radiance images obtained from the National Aeronautics and Space Administration (NASA) Ocean Color website (https://oceancolor.gsfc.nasa.gov/). They comprise 409 high-quality, cloud-free scenes of the study area acquired from May 2016 to October 2023 (Table S1). To remove atmospheric effects and derive the Rrs of water pixels, we applied ACOLITE atmospheric correction to the OLCI data; the corrected reflectance agreed closely with in situ measured spectral reflectance (Figure S1)42,43,44.

After data quality checks, 78 samples were retained for further processing. The validation criteria were as follows. First, we checked whether samples originated from algal bloom outbreak points, so that anomalous conditions could be excluded. Second, we ensured that samples matched high-quality OLCI images, removing cloud, shadow, and land signals through quality assurance so that image quality met analytical requirements. Finally, we verified that each matched sample was collected within 24 h of image acquisition, to maintain temporal and spatial consistency and data accuracy.
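The 24 h matchup criterion can be expressed as a nearest-overpass search. This is a minimal sketch assuming the window is symmetric (±24 h) and that the scene list has already been screened for cloud, shadow, and land; `match_samples` and the timestamps are hypothetical, and extraction of the pixel at each sample's coordinates is omitted.

```python
from datetime import datetime, timedelta

# Assumption: "within 24 h" is read as |t_sample - t_scene| <= 24 h, and
# scene_times already contains only cloud-, shadow- and land-screened scenes.
MAX_TIME_DIFF = timedelta(hours=24)


def match_samples(samples, scene_times, max_diff=MAX_TIME_DIFF):
    """Pair each in situ sample with its nearest OLCI overpass.

    samples: list of (sample_id, sampling_datetime)
    scene_times: list of overpass datetimes
    Returns {sample_id: scene_time} for samples within max_diff of a scene.
    """
    matches = {}
    if not scene_times:
        return matches
    for sample_id, t in samples:
        nearest = min(scene_times, key=lambda s: abs(s - t))
        if abs(nearest - t) <= max_diff:
            matches[sample_id] = nearest
    return matches


# Hypothetical sampling and overpass times.
samples = [("S1", datetime(2022, 7, 15, 10, 0)), ("S2", datetime(2022, 7, 20, 9, 0))]
scenes = [datetime(2022, 7, 15, 3, 0), datetime(2022, 7, 18, 3, 0)]
print(match_samples(samples, scenes))  # only S1 is kept (7 h offset)
```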

Lake masks and environmental parameters

We employed the Normalized Difference Water Index (NDWI) thresholding method to delineate the water boundaries45. To mitigate the impact of algal blooms and aquatic vegetation on the inversion results, the Normalized Difference Vegetation Index (NDVI) was used to remove anomalous pixels46. Additionally, to account for mixed pixels and adjacency effects, the extracted lake boundary was inwardly buffered by 2 pixels (approximately 600 m) to define Hulun Lake’s water boundary.
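A minimal NumPy sketch of the masking sequence described above (NDWI water mask, 2-pixel inward buffer, NDVI screening). It assumes the green/NIR form of NDWI and the red/NIR form of NDVI, implements the inward buffer as a morphological erosion, and uses placeholder thresholds; none of these values are taken from the study.

```python
import numpy as np


def normalized_difference(a, b):
    """(a - b) / (a + b), NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom != 0, (a - b) / denom, np.nan)


def erode(mask, pixels):
    """Shrink a boolean mask inward by `pixels` pixels (8-connected neighbourhood)."""
    m = np.asarray(mask, dtype=bool)
    rows, cols = m.shape
    for _ in range(pixels):
        padded = np.pad(m, 1, constant_values=False)  # outside the image counts as land
        out = np.ones_like(m)
        for dy in (0, 1, 2):
            for dx in (0, 1, 2):
                out &= padded[dy:dy + rows, dx:dx + cols]
        m = out
    return m


def lake_mask(green, red, nir, ndwi_thr=0.0, ndvi_thr=0.1, buffer_px=2):
    """Water mask from NDWI, buffered inward, with NDVI-flagged pixels removed.

    Thresholds are hypothetical placeholders, not the values used in the study.
    """
    ndwi = normalized_difference(green, nir)   # green/NIR NDWI (assumed form)
    ndvi = normalized_difference(nir, red)
    water = erode(ndwi > ndwi_thr, buffer_px)  # 2 px x 300 m ≈ 600 m buffer
    return water & ~(ndvi > ndvi_thr)          # drop bloom/vegetation-like pixels
```

Erosion is a natural way to express "buffer the boundary inward by n pixels": a pixel survives only if every pixel within n steps of it is also water.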

Meteorological data for Hulun Lake were obtained from the Xinbaerhu Right Banner Meteorological Station (48.67°N, 116.82°E) via the National Centers for Environmental Information (NCEI, https://www.ncei.noaa.gov). The data comprise daily temperature (°C), wind speed and maximum wind speed (m/s), pressure (kPa), and precipitation (mm) from 2016 to 2023. Monthly averages were computed for temperature (TEMP), wind speed (WDSP), maximum wind speed (MXSPD), pressure (STP), and precipitation (PRCR).
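Aggregating daily station records to monthly means is a simple grouping step. The sketch below assumes GSOD-style raw values, in which TEMP is reported in °F and 9999.9 flags a missing value (verify this against the NCEI documentation for the files actually used); the °C conversion and the record values are illustrative.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Assumption: raw NCEI GSOD-style daily values, where TEMP is reported in °F
# and 9999.9 marks a missing value; check the product documentation.
MISSING_TEMP = 9999.9


def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32.0) * 5.0 / 9.0


def monthly_means(daily):
    """daily: iterable of (date, value). Returns {(year, month): mean}."""
    buckets = defaultdict(list)
    for day, value in daily:
        buckets[(day.year, day.month)].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}


# Hypothetical daily TEMP records (°F), one of them missing.
raw_temp = [(date(2016, 5, 1), 50.0), (date(2016, 5, 2), 9999.9), (date(2016, 5, 3), 59.0)]
daily_c = [(d, f_to_c(v)) for d, v in raw_temp if v != MISSING_TEMP]
print(monthly_means(daily_c))  # {(2016, 5): 12.5}
```

Filtering missing-value codes before averaging matters: a single unfiltered 9999.9 would dominate a monthly mean.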

Water quality data were collected from each sampling point approximately 0.5 m below the surface, with 2–4 L of water stored in HDPE bottles. Laboratory analysis provided data on chlorophyll a (Chla), total phosphorus (TP), total nitrogen (TN), chromophoric dissolved organic matter (CDOM), phycocyanin (PC), dissolved organic carbon (DOC), and total suspended matter (TSM)43,47,48,49,50,51.

Algae species identification algorithm

In this study, we used machine learning (ML) algorithms to build models for estimating the biomass abundance of different algal species. Six commonly used ML models were compared: extreme gradient boosting (XGBoost)52, support vector regression (SVR), backpropagation neural network (BP), gradient boosting decision tree (GBDT), random forest (RF), and categorical boosting (CatBoost).
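The comparison can be organized as a loop over regressors that share scikit-learn's `fit`/`predict` interface. This sketch covers only the four models available in scikit-learn itself (RF, GBDT, SVR, and an `MLPRegressor` standing in for the BP network); `xgboost.XGBRegressor` and `catboost.CatBoostRegressor` expose the same interface and could be added to the dictionary if those packages are installed. Hyperparameters and the synthetic data are placeholders, not the study's settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def candidate_models(seed=0):
    """Four of the six compared regressors; hyperparameters are placeholders."""
    return {
        "RF": RandomForestRegressor(n_estimators=300, random_state=seed),
        "GBDT": GradientBoostingRegressor(random_state=seed),
        # SVR and the BP-style MLP are scale-sensitive, so standardize inputs.
        "SVR": make_pipeline(StandardScaler(), SVR(C=10.0)),
        "BP": make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(32,), max_iter=5000, random_state=seed),
        ),
    }


def compare(X_train, y_train, X_val, y_val):
    """Fit every candidate on the training set and score R² on validation data."""
    scores = {}
    for name, model in candidate_models().items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_val, model.predict(X_val))
    return scores


# Synthetic stand-in for band features and biomass abundance (%).
rng = np.random.default_rng(42)
X = rng.uniform(0.005, 0.05, size=(78, 6))
y = 100.0 * X[:, 0] / (X[:, 0] + X[:, 1])
scores = compare(X[:47], y[:47], X[47:63], y[47:63])
print(scores)
```

Scoring on a held-out validation set, rather than the training set, is what exposes the overfitting behaviour reported for some models.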

Model development is a critical step in using remote sensing data to identify algal species and estimate their biomass abundance across Lake Hulun. We first analyzed correlations between the in situ measured biomass fractions of Cyanophyta, Chlorophyta, and Bacillariophyta and the corresponding OLCI reflectance, systematically examining the original band reflectances and all potential combinations of spectral bands. We then fitted single bands, band ratios, and other combinations (addition, subtraction, multiplication, and division) of OLCI Rrs against the biomass abundance of the three algal groups, retaining only combinations with significant Pearson correlations (p < 0.05, R ≥ 0.5). In total, 1360 band combinations were generated (Fig. 2). Each was evaluated by its correlation coefficient R, standard error, t statistic, p-value, and residuals, favoring combinations with R close to 1 and small standard errors, p-values, and residuals. On this basis, suitable spectral bands and band ratios were selected as candidate variables for algal species identification and biomass abundance modeling; they are listed in Table 1. The workflow for estimating phytoplankton biomass abundance in Lake Hulun from OLCI imagery is illustrated in Fig. 3. Statistical analyses were conducted using SPSS 27.0 and Python 3.9.
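The band-combination screening can be sketched as generating candidate features from Rrs and filtering them by Pearson correlation. This is an illustrative sketch, not the authors' workflow: it applies the threshold to |R| (an assumption, so that strong negative correlations are also kept), generates only pairwise combinations of the listed operations, and runs on synthetic reflectance. The OLCI band labels are real, but the chosen bands are arbitrary.

```python
from itertools import combinations

import numpy as np
from scipy.stats import pearsonr


def band_features(rrs, names):
    """Single bands plus pairwise +, -, x and both ratios of Rrs bands.

    rrs: array of shape (n_samples, n_bands); names: band labels.
    """
    feats = {name: rrs[:, i] for i, name in enumerate(names)}
    for i, j in combinations(range(len(names)), 2):
        a, b = rrs[:, i], rrs[:, j]
        ni, nj = names[i], names[j]
        feats[f"{ni}+{nj}"] = a + b
        feats[f"{ni}-{nj}"] = a - b
        feats[f"{ni}*{nj}"] = a * b
        feats[f"{ni}/{nj}"] = a / b
        feats[f"{nj}/{ni}"] = b / a
    return feats


def screen(feats, y, r_min=0.5, p_max=0.05):
    """Keep features with |r| >= r_min and p < p_max, strongest first."""
    kept = {}
    for name, x in feats.items():
        if not np.all(np.isfinite(x)) or np.ptp(x) == 0:
            continue  # skip divisions by zero and constant features
        r, p = pearsonr(x, y)
        if abs(r) >= r_min and p < p_max:
            kept[name] = (float(r), float(p))
    return dict(sorted(kept.items(), key=lambda kv: -abs(kv[1][0])))


# Synthetic example: abundance tied exactly to the Oa08/Oa06 ratio.
rng = np.random.default_rng(0)
names = ["Oa06", "Oa08", "Oa11"]
rrs = rng.uniform(0.005, 0.05, size=(40, 3))
y = rrs[:, 1] / rrs[:, 0]
kept = screen(band_features(rrs, names), y)
print(list(kept)[:3])
```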

Fig. 2

Correlation of 1360 band combinations.

Table 1 Basic bands and band combinations used for ML modeling.

Model construction, training, testing, and evaluation criteria

In this study, the coefficient of determination (R²), root mean square error (RMSE), and mean absolute percentage error (MAPE) were used to assess the predictive accuracy of the models, which were built and evaluated with the scikit-learn library in Python.

$$R^{2} = 1 - \frac{\sum_{i=1}^{N} \left( y_{i} - y_{i}^{\prime} \right)^{2}}{\sum_{i=1}^{N} \left( y_{i} - \bar{y} \right)^{2}}$$
(1)
$$RMSE = \sqrt{\frac{\sum_{i=1}^{N} \left( y_{i}^{\prime} - y_{i} \right)^{2}}{N}}$$
(2)
$$MAPE = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_{i}^{\prime} - y_{i}}{y_{i}} \right| \times 100\%$$
(3)

Where N is the sample size, \(y_{i}\) and \(y_{i}^{\prime}\) are the in situ measured value and the algorithm-estimated value, respectively, and \(\bar{y}\) is the mean of the measured values. R² is commonly used to assess the predictive performance of a model, while RMSE and MAPE evaluate the agreement between measured and predicted values.
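Equations (1) to (3) translate directly into NumPy. The sketch below is a plain transcription of the formulas with hypothetical measured and predicted values; in scikit-learn, `r2_score`, `mean_squared_error` (take the square root for RMSE), and `mean_absolute_percentage_error` (which returns a fraction rather than a percentage) give equivalent results.

```python
import numpy as np


def r2(y, y_hat):
    """Eq. (1): 1 - SS_res / SS_tot."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))


def rmse(y, y_hat):
    """Eq. (2): square root of the mean squared residual."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))


def mape(y, y_hat):
    """Eq. (3): mean absolute relative error in percent (undefined if any y_i == 0)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs((y_hat - y) / y)) * 100.0)


measured = [40.0, 50.0, 60.0]   # hypothetical in situ abundances (%)
predicted = [42.0, 48.0, 60.0]  # hypothetical model estimates (%)
print(r2(measured, predicted), rmse(measured, predicted), mape(measured, predicted))
# R² = 0.96, RMSE ≈ 1.633, MAPE = 3.0 (up to floating-point rounding)
```

Because MAPE divides by each measured value, it grows quickly for groups with small abundances, which is one reason MAPE and RMSE can rank models differently.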

Fig. 3

General framework employed in this study.

Results

In-situ measurement of algal

Field surveys identified a total of 180 algal species from 7 phyla in Lake Hulun: Cyanophyta, Bacillariophyta, Chlorophyta, Cryptophyta, Dinophyta, Chrysophyta, and Euglenophyta (Fig. 4). Chlorophyta is the most diverse phylum with 84 species (46.7%), followed by Cyanophyta with 39 species (21.7%), Bacillariophyta with 33 species (18.3%), and Euglenophyta with 10 species (5.6%). The other phyla have relatively few species: Dinophyta with 8 species (4.4%), Chrysophyta with 3 species (1.7%), and Cryptophyta with 2 species (1.1%). Cyanophyta had the highest average biomass (212.8 mg/L), followed by Bacillariophyta (1.05 mg/L), Cryptophyta (0.29 mg/L), and Chlorophyta (0.18 mg/L). The biomass of the other phyla was below 0.01 mg/L. Overall, Cyanophyta, Chlorophyta, and Bacillariophyta dominate the phytoplankton community structure in Lake Hulun and strongly influence its phytoplankton abundance.

Fig. 4

Average algal biomass from four field surveys. (A large cyanobacterial bloom in Lake Hulun in 2022 resulted in unusually high cyanobacterial biomass.)

Algal biomass abundance model

The feature bands selected in Sect. 2.3 were used as input variables to construct inversion models for algal species and biomass abundance. The data were divided into training (60%), validation (20%), and test (20%) sets. Table S2 lists the performance metrics for the different algal species.
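A 60/20/20 partition can be made by shuffling the sample indices once and slicing. This sketch assumes a simple random split (the text does not say whether the split was stratified) and a fixed seed for reproducibility; `split_60_20_20` is a hypothetical helper. With the 78 matched samples, rounding yields 47, 16, and 15 samples.

```python
import numpy as np


def split_60_20_20(n, seed=0):
    """Shuffle sample indices and split them 60/20/20 into train/validation/test."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(0.6 * n))
    n_val = int(round(0.2 * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = split_60_20_20(78)  # 78 matched samples
print(len(train_idx), len(val_idx), len(test_idx))  # 47 16 15
```

Assigning the remainder to the test set guarantees that every sample lands in exactly one subset, whatever the rounding.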

Model performance differed markedly among algal groups and ML algorithms (Fig. 5, Figure S2). Overall, GBDT, RF, and XGBoost performed well on both the calibration and validation datasets, whereas BP and CatBoost were slightly overfitted and SVR fitted poorly. XGBoost performed best for Cyanophyta biomass abundance (R² = 0.92, RMSE = 1.78%, MAPE = 9.96%), RF was most responsive to changes in Chlorophyta biomass abundance (R² = 0.72, RMSE = 6.57%, MAPE = 50.8%), and GBDT fitted Bacillariophyta biomass abundance well (R² = 0.9, RMSE = 4.66%, MAPE = 47.87%). Model predictions agreed well with in situ measurements for all three groups, allowing algal biomass abundance maps to be generated from satellite observations across Lake Hulun.

Fig. 5

Evaluation of RF (a), GBT (b), and XGB (c) performance for Cyanophyta, Chlorophyta, and Bacillariophyta.

Interannual variation of algal

Using the ML models above, we mapped the biomass abundance of three algal groups (Cyanophyta, Chlorophyta, and Bacillariophyta) in Lake Hulun from 2016 to 2023, averaging the retrieved biomass abundance at each pixel. As shown in Fig. 6a-c, the composition of algal species in Lake Hulun differed markedly in space and between years. Overall, Cyanophyta biomass abundance (44.62 ± 3.47%) predominated, followed by Bacillariophyta (36.35 ± 2.68%), with Chlorophyta (10.42 ± 1.08%) the lowest. Together, these three groups accounted for 91.4 ± 1.55% of all algae in Lake Hulun (Figure S3). Figure S4 shows that the frequency distributions of Cyanophyta and Bacillariophyta abundance are concentrated around 40%, and that of Chlorophyta around 10%.
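Per-pixel averaging over a time series of retrieved maps is a NaN-aware mean along the time axis. The sketch assumes masked or cloudy pixels are stored as NaN, and it reports a lake-wide mean and spatial standard deviation of the mean map; treating the ± values as a spatial standard deviation is an assumption, and the 2 × 2 maps are hypothetical.

```python
import warnings

import numpy as np


def pixel_mean(stack):
    """Per-pixel mean over scenes, ignoring NaN (masked or cloudy) values.

    stack: array of shape (n_scenes, rows, cols); pixels that are NaN in
    every scene stay NaN in the result.
    """
    stack = np.asarray(stack, dtype=float)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)  # all-NaN pixels
        return np.nanmean(stack, axis=0)


nan = np.nan
scenes = [
    [[40.0, 50.0], [nan, nan]],  # hypothetical Cyanophyta abundance map (%)
    [[44.0, nan], [nan, 30.0]],
]
mean_map = pixel_mean(scenes)
lake_mean, lake_std = np.nanmean(mean_map), np.nanstd(mean_map)
print(mean_map)
```

Averaging per pixel first, then across the lake, keeps frequently observed pixels from outweighing rarely observed ones in the lake-wide value.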

Spatially, Cyanophyta biomass abundance decreased from southwest to northeast, with relatively high values in the western, southern, and central parts of the lake. Chlorophyta biomass abundance was relatively evenly distributed, with higher proportions in the lake center than near the shore and lower values in the southeast and along the coastline. Bacillariophyta biomass abundance was higher in the eastern and northern parts of the lake, decreasing from northeast to southwest. The distribution of Bacillariophyta thus mirrored that of Cyanophyta: areas with high Cyanophyta biomass abundance tended to have low Bacillariophyta biomass abundance, and vice versa (Fig. 6).

From 2016 to 2023 (Figure S4), Cyanophyta biomass abundance varied notably, with a relatively large fluctuation range. Overall, it showed an increasing trend (Fig. 7), peaking in 2018 (51.24 ± 4.09%) and 2022 (49.38 ± 14.7%), when cyanobacterial blooms broke out and raised Cyanophyta biomass abundance above that of other years. In other years, Cyanophyta biomass abundance remained relatively stable, ranging from 40.56 ± 3.43% to 44.18 ± 3.32%; the overall average was 44 ± 3.5%.

Chlorophyta biomass abundance varied relatively little, with an average fraction of 10 ± 1%, reaching its highest value in 2021 (12.04 ± 3.83%) and its lowest in 2023 (8.5 ± 1.82%). Bacillariophyta showed a trend opposite to that of Cyanophyta, with an overall decrease in biomass abundance. In 2018 (33.61 ± 3%) and 2022 (30.51 ± 10.2%), Bacillariophyta biomass abundance was lower because the massive Cyanophyta outbreaks placed Bacillariophyta at a competitive disadvantage. In other years, its abundance remained relatively stable (37.05 ± 2.3% to 39.31 ± 1.6%); the overall average was 36 ± 2.7%.

Fig. 6

The distribution of biomass abundance of three different algal species, Cyanophyta (a), Chlorophyta (b), and Bacillariophyta (c), from 2016 to 2023.

Fig. 7

The distribution of biomass abundance of three different phytoplankton taxa, Cyanophyta (a), Chlorophyta (b), and Bacillariophyta (c), from 2016 to 2023.

Seasonal variation of algal

The distribution of Cyanophyta, Chlorophyta, and Bacillariophyta biomass abundance in Lake Hulun from 2016 to 2023 (Fig. 8a) and the monthly averages (Fig. 8b) reveal distinct seasonal trends. Cyanophyta biomass abundance stays above 46.5 ± 10.6% from May to July, dips to a minimum of 39.4 ± 4.34% in August, and rises gradually thereafter. Chlorophyta biomass abundance generally declines from May to August, apart from a slight increase in June (11.34 ± 2.04%), reaching its lowest proportion in August (7.01 ± 1.41%), and then rises to its highest value in October (13.04 ± 4.07%). In contrast, Bacillariophyta biomass abundance follows a trend opposite to that of Cyanophyta, with relatively low proportions (< 35.63 ± 7.92%) from May to July and a peak in September (40.09 ± 2.23%). On average, Cyanophyta ranks highest, followed by Bacillariophyta and Chlorophyta. Notably, in September the biomass abundances of Bacillariophyta and Cyanophyta are very close, at 40.12 ± 3.85% and 40.09 ± 2.23%, respectively, indicating that both groups dominate during this period.

Fig. 8

The distribution of biomass proportions of three different algal species (a), from 2016 to 2023. (b) display the monthly distribution of three different algal species.

This study also examined how algal biomass abundance varied across seasons (Fig. 9); the seasonal differences in Lake Hulun were significant. Cyanophyta and Chlorophyta had their highest average biomass fractions in spring (48.02 ± 10.6% and 11.2 ± 2.2%, respectively), with slight decreases in summer and autumn. Bacillariophyta biomass abundance peaked in autumn, at 38.29 ± 2%. Overall, Cyanophyta dominated the algal community in Lake Hulun, followed by Bacillariophyta and then Chlorophyta.

In some years, however, the biomass abundance of Bacillariophyta exceeded that of Cyanophyta, making Bacillariophyta the predominant group: in spring of 2017, 2019, and 2021, and in autumn of 2016. In the remaining years, Cyanophyta predominated.

Fig. 9

Seasonal variation in the biomass abundance of Cyanophyta, Chlorophyta, and Bacillariophyta in Lake Hulun from 2016 to 2023.

Discussion

Impact of meteorological factors on algal species

The biomass abundances of Cyanophyta, Chlorophyta, and Bacillariophyta are markedly heterogeneous in both time and space. To investigate how climatic factors may influence interannual variations in algal biomass in Lake Hulun, we calculated correlation coefficients and the corresponding significance-test p-values between each algal group and temperature, atmospheric pressure, wind speed, maximum wind speed, and precipitation (Figure S6). Chlorophyta and Bacillariophyta tend to increase or decrease together, in contrast to Cyanophyta. The contrasting variations in Cyanophyta and Bacillariophyta are consistent with previous findings53 and suggest that algal groups differ in their adaptive capacity and responses to environmental conditions.
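The correlation analysis can be sketched with SciPy, whose `pearsonr` and `spearmanr` return both the coefficient and its p-value. The helper below is illustrative: it assumes the abundance and driver series are already aligned in time, drops pairs with missing values, and supports Pearson or Spearman correlation (the latter is used for precipitation in this study). The series are synthetic.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def correlate(abundance, drivers, method="pearson"):
    """Correlate an abundance series with each driver; returns {name: (r, p)}.

    abundance and every driver series must be aligned in time; pairs with a
    NaN in either series are dropped before computing the coefficient.
    """
    func = pearsonr if method == "pearson" else spearmanr
    y = np.asarray(abundance, dtype=float)
    out = {}
    for name, series in drivers.items():
        x = np.asarray(series, dtype=float)
        ok = np.isfinite(x) & np.isfinite(y)
        r, p = func(x[ok], y[ok])
        out[name] = (float(r), float(p))
    return out


# Synthetic daily series (not the study's data).
rng = np.random.default_rng(1)
temp = 10.0 * np.sin(np.linspace(0.0, 6.0, 60))
cyano = 40.0 + 0.5 * temp + rng.normal(0.0, 0.5, 60)
drivers = {"TEMP": temp, "WDSP": rng.uniform(2.0, 8.0, 60)}
print(correlate(cyano, drivers))
print(correlate(cyano, drivers, method="spearman"))
```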

The results reveal significant correlations between temperature (TEMP) and the biomass abundance of the algal species (Fig. 10a). Specifically, temperature shows a significantly positive correlation with Cyanophyta biomass abundance (r = 0.23, p < 0.01) and negative correlations with Chlorophyta and Bacillariophyta biomass abundance (r = -0.47, p < 0.01; r = -0.13, p < 0.01) (Figure S6). Different algal species and taxonomic groups exhibit varied responses to temperature, influencing their growth rates and distributions54. Research indicates optimal growth temperatures ranging from 5 °C to 25 °C for Bacillariophyta55, 25 °C to 30 °C for Cyanophyta56,57, and up to 30 °C for Chlorophyta55. These temperature preferences among populations may lead to shifts in dominance towards species with greater adaptability within the ecosystem. The growth and reproduction of Cyanophyta are particularly influenced by local temperatures58, with warmer conditions promoting their proliferation at the expense of other algal species59,60. Given Lake Hulun’s location in the cold high-latitude Inner Mongolia Plateau, Bacillariophyta’s adaptation to low-temperature environments makes it more susceptible to temperature increases compared to Cyanophyta61, potentially placing Bacillariophyta at a competitive disadvantage62.

Furthermore, as Cyanophyta biomass abundance increases, Bacillariophyta biomass abundance decreases correspondingly. Chlorophyta biomass, which is relatively low in Lake Hulun, is strongly influenced by the dynamics of the other two groups: an increase in Cyanophyta biomass abundance leads to a decrease in Chlorophyta biomass abundance, which helps explain the significant negative correlation between TEMP and Chlorophyta biomass abundance.

Additionally, atmospheric pressure (STP) was negatively correlated with Cyanophyta biomass abundance (r = -0.11, p < 0.05), positively correlated with Chlorophyta biomass abundance (r = 0.21, p < 0.01), and not significantly correlated with Bacillariophyta biomass abundance (r = 0.08, p > 0.05) (Fig. 10b). This suggests that higher pressure may inhibit Cyanophyta growth12, possibly because high-pressure systems cause water column stratification and influence heat flux fluctuations at the lake surface63. Changes in atmospheric pressure affect air temperature through mechanisms such as adiabatic processes and weather systems, which in turn affect algal growth64,65. Because Cyanophyta biomass abundance is much higher than that of Chlorophyta, a decrease in Cyanophyta biomass abundance tends to increase Chlorophyta biomass abundance.

WDSP and MXSPD showed similar relationships with the biomass abundance of the three algal groups: both were negatively correlated with Cyanophyta biomass abundance (r = -0.27, p < 0.01; r = -0.28, p < 0.01) and positively correlated with Chlorophyta and Bacillariophyta (r = 0.16, p < 0.01; r = 0.19, p < 0.01; r = 0.19, p < 0.01; r = 0.18, p < 0.01) (Fig. 10c). Strong winds mix the water column and reduce surface stability. The predominant cyanobacteria in Lake Hulun, Dolichospermum circinale and Microcystis sp., have gas vesicles that allow them to regulate their buoyancy66; under strong winds they can sink to maintain stability, which reduces their biomass abundance at the water surface. Bacillariophyta and Chlorophyta cannot actively adjust their position in the water column, so they remain at the surface for a short time after strong winds, and their biomass proportion rises as cyanobacterial biomass falls. These results are consistent with the study by Moreno-Ostos67. Because daily precipitation in the Hulun Lake Basin is low, there were too few precipitation data to relate to algal biomass, and the Spearman correlation between PRCR and algal biomass abundance was not statistically significant (p > 0.05).

Fig. 10

Daily time series (May 1, 2016 to October 31, 2023) of Cyanophyta, Chlorophyta, and Bacillariophyta biomass share and the distribution of STP (a), WDSP (b), and TEMP (c) in Lake Hulun, derived from Sentinel-3A/B OLCI.

Impact of water quality parameters on phytoplankton

We also analyzed the relationships between the biomass abundance of the three algal groups and 9 water quality parameters (WQPs), calculating correlation coefficients and the corresponding significance-test p-values (Fig. 11). Nitrogen and phosphorus are essential for algal growth and reproduction68,69, and competition among algae intensifies as nutrient availability increases. This competitive pressure allows certain species to dominate others, altering the structure of algal communities70,71. However, the biomass abundances of the three groups were not strongly correlated with TP or TN. This may be because Lake Hulun is eutrophic, with consistently high TP, TN, and other nutrient levels that provide sufficient nutrients for the growth of most algal species.

Interestingly, Chla is significantly positively correlated with the Cyanophyta biomass proportion. High Chla concentrations indicate that Cyanophyta are proliferating, and under such conditions Chlorophyta and Bacillariophyta grow more slowly than Cyanophyta72. During algal blooms, Cyanophyta therefore dominate the biomass absolutely, suppressing the growth and dominance of other algal groups. In contrast, low Chla concentrations correspond to periods of weaker algal growth, typically from autumn to winter, when falling water temperatures limit algal growth58. Bacillariophyta, however, tolerate low temperatures and prefer habitats with good optical transparency73,74, which gives them a competitive advantage over Cyanophyta. Chlorophyta, with a wide range of suitable habitats and numerous species, can maintain a certain abundance even when water temperatures are low75, which may explain why Chlorophyta biomass abundance is not significantly related to Chla variations.

Fig. 11

Relationship between the biomass abundance of Cyanophyta, Chlorophyta, and Bacillariophyta and water quality parameters (WQPs): DOC, SPIM, SPOM, PC, CDOM, TP, Chla, and TN. Asterisks indicate significance levels: * p < 0.05; ** p < 0.01.

Applicability and uncertainty of model

ML algorithms were used to develop models estimating the biomass abundance of several algal groups, successfully retrieving the biomass of Cyanophyta, Chlorophyta, and Bacillariophyta relative to the total algal biomass in Lake Hulun. However, the performance of ML models depends heavily on the quality of the training dataset. Insufficient or unrepresentative samples may cause overfitting or underfitting, reducing model robustness30. It is therefore crucial that the training dataset contain enough representative samples covering the diverse conditions in the target area. In addition, this study addressed only the biomass proportions of three algal groups and did not consider less abundant groups such as cryptophytes and dinoflagellates or their potential influence on the retrievals. Accounting for these factors should improve the predictive accuracy and generalizability of the model in practical applications.

Furthermore, the relationship between the biomass of different algal groups and Rrs characteristics may be influenced by ecosystem structure. When the biomass of a given algal group is extremely low, its spectral signature may be masked by other constituents (e.g., CDOM, SPIM). Although Sentinel-3 OLCI has a higher signal-to-noise ratio and spectral resolution than earlier sensors, which increases its sensitivity to algal changes in lakes, the atmospheric correction algorithm applied can affect the accuracy of OLCI images. Selecting an appropriate atmospheric correction algorithm is therefore crucial to the reliability and accuracy of OLCI-based algal studies. Future advances in sensor technology are needed to address this challenge more effectively. Notably, on February 8, 2024, NASA launched the Plankton, Aerosol, Cloud, ocean Ecosystem (PACE) satellite, which will open new avenues for remote sensing of phytoplankton communities in polar lakes76. PACE's 5 nm spectral resolution allows it to capture fine spectral detail, which is particularly valuable for distinguishing algal types, particulates, and other water constituents that often have similar spectral characteristics. Resolving these subtle differences should improve the accuracy of algal classification and biomass estimation.

Implications for environmental management

Understanding the composition and temporal trends of algal species is crucial for comprehending inland aquatic ecosystems77. Using long-term observational data, this study analyzed in detail the biomass abundance of three algal groups in a cold-region lake, providing new insights and data support. This analytical approach has broad applicability in future research: (1) improving the precision of ecosystem health assessment through more accurate estimates of algal biomass; (2) exploring the spatiotemporal variability and driving mechanisms of dominant algal species to understand ecosystem change; and (3) predicting global distribution trends of algal communities under climate change, thereby supporting environmental protection and policy-making.

The impact, developmental patterns, and control methods of algal blooms caused by different phytoplankton vary significantly13. Therefore, understanding the dynamic changes in these diverse algal communities is crucial for lake management. Through long-term sequential observations of biomass abundance across different algal species and comprehensive analysis using remote sensing data, scientific decision support can be provided to lake managers. This approach reveals developmental trends and characteristics of algal communities, offering a solid scientific basis for formulating targeted control strategies and thereby enhancing the protection and management of lake ecosystems.

Conclusion

This study aimed to establish an efficient machine learning model for estimating the biomass abundance of different algal groups in optically complex inland cold-region lakes, and developed a biomass abundance inversion model based on OLCI images. XGBoost achieved the highest accuracy for Cyanophyta biomass abundance, RF for Chlorophyta, and GBDT for Bacillariophyta. Cyanophyta and Bacillariophyta were the predominant algal groups in Lake Hulun, often coexisting with other algal groups. Biomass abundance varied significantly between years and seasons: Cyanophyta peaked in 2018 and 2022 owing to cyanobacterial blooms, diatoms showed lower values in those years, and Chlorophyta remained relatively stable. Cyanophyta and Chlorophyta had higher biomass abundance in spring, whereas diatoms were more prevalent in autumn. Meteorological factors such as temperature, pressure, and wind speed influenced the biomass abundance of the different algal groups. Our model not only estimates algal biomass abundance accurately but also improves understanding of its spatiotemporal distribution and trends, providing crucial support for water quality monitoring and ecological protection. As new sensors are launched, this method may make it easier to identify algal groups in optically complex inland waters, significantly improving our ability to monitor different algal species.