Abstract
Climate change and human activities affect the biomass of different algal groups and the succession of dominant species. Previous work on retrieving phytoplankton phyla has focused on oceanic and continental shelf waters, whereas retrieval in inland lakes and reservoirs remains at an exploratory stage with relatively few results, and mid-to-high latitude lakes have received even less attention. This study therefore proposes a machine learning method based on OLCI/Sentinel-3 satellite imagery to retrieve algal biomass abundance. Remote sensing models were developed to estimate the biomass abundance of three major algal groups: Cyanophyta, Chlorophyta, and Bacillariophyta. Six commonly used machine learning models were compared and evaluated: extreme gradient boosting (XGBoost), support vector regression (SVR), backpropagation neural network (BP), gradient boosting decision tree (GBDT), random forest (RF), and categorical boosting (CatBoost). The results indicated that XGBoost exhibited the highest accuracy (R2 = 0.92, RMSE = 1.78%, MAPE = 9.96%) in estimating Cyanophyta’s biomass abundance. The RF model demonstrated the highest accuracy for estimating Chlorophyta’s biomass abundance (R2 = 0.72, RMSE = 6.57%, MAPE = 50.8%), while the GBDT model exhibited the highest accuracy for estimating Bacillariophyta’s biomass abundance (R2 = 0.9, RMSE = 4.66%, MAPE = 47.87%). The models were subsequently applied to all cloud-free OLCI images of Hulun Lake during the ice-free periods from 2016 to 2023, producing spatiotemporal distribution maps of the biomass abundance of the different phytoplankton groups. Cyanophyta dominated the biomass abundance (44.62 ± 3.47%), followed by Bacillariophyta (36.35 ± 2.68%), and Chlorophyta had the lowest proportion (10.42 ± 1.08%). Together, these three algal groups constituted 91.4 ± 1.55% of all phytoplankton in Hulun Lake. Significant annual variations in the biomass abundance of Cyanophyta and Bacillariophyta were observed, whereas that of Chlorophyta remained stable. Additionally, this study examined the effects of climatic factors and water quality parameters on the biomass abundance of algae. The findings suggest that temperature, wind speed, and atmospheric pressure are critical factors influencing the biomass abundance of the different algal groups. This study helps fill a gap in the algal monitoring of cold-region lakes and provides a new method for monitoring algae, supporting the sustainable management of water resources and ecological protection.
Introduction
Algae are crucial for the structure, function, and global biogeochemical cycles of lake ecosystems1,2. With their diverse morphological characteristics and ecological functions, algae are key components of lake ecosystems3,4,5,6. In recent years, the trend of eutrophication in lakes and reservoirs in China has rapidly intensified due to global climate change and increased human activities7,8,9,10. Eutrophication causes an overgrowth of algae, resulting in harmful algal blooms (HABs) that significantly disrupt local industries and threaten public safety11,12. Algal blooms formed by different algal species differ in the hazards they pose to lakes, in their patterns of occurrence and development, and in how they are managed13. Therefore, it is imperative to identify algal species and quantify algal biomass changes on large spatiotemporal scales14,15.
Current methods for identifying algal species include microscopy, flow cytometry, genome sequencing, optical imaging, and High-Performance Liquid Chromatography (HPLC)16,17,18,19. These methods primarily utilize cellular morphological characteristics and algal DNA for identification20. However, they are impractical for real-time and large-scale monitoring due to their high cost and labor-intensive nature. In contrast, remote sensing technology, characterized by its low cost, real-time updates, and large-scale monitoring capabilities, demonstrates tremendous potential for identifying algal species21,22. The detection of algal species from space depends on biomass or other indicators of abundance23, utilizing the sensors’ remote sensing reflectance (Rrs) data in conjunction with the absorption and backscattering spectral characteristics of algae for identification24,25.
With the continuous advancements in optical measurement methods and satellite sensor technologies, methods for identifying algal species have undergone significant development26. Numerous optical sensors, including the Moderate Resolution Imaging Spectroradiometer (MODIS), Medium Resolution Imaging Spectrometer (MERIS), Sea-viewing Wide Field-of-view Sensor (SeaWiFS), and Sentinel-3 Ocean and Land Colour Instrument (OLCI), provide rich spectral information20,21,27,28. These sensors are capable of distinguishing different algal species by establishing relationships between in situ measurement data and the Rrs information obtained by the sensors. However, most studies on algal identification have focused on identifying a single species or distinguishing a single algal group in natural waters, with a predominant emphasis on oceanic waters. The complex composition of algal communities and the distinctive optical characteristics of lake environments place higher demands on remote sensing technologies and present greater challenges. Consequently, further in-depth research and development are essential for identifying algae in lake environments.
In recent years, advancements in machine learning and data processing methods have opened new avenues for the long-term remote sensing identification of algal species29,30. Various machine learning methods, such as random forest regression, neural networks, and boosted regression trees, have been applied in this area31,32,33. Machine learning algorithms, with their distinctive capabilities in managing large datasets and addressing complex nonlinear issues, often achieve high estimation accuracy and can integrate data from multiple sources. As a result, machine learning models present a promising and reliable approach for identifying algal species in optically complex inland waters.
Hulun Lake, a cold-region lake located in a high-latitude inland area, has experienced ongoing water quality deterioration in recent years, marked by an increasing frequency and intensity of HABs34,35,36. To gain insights into the patterns of change in algal biomass abundance and the processes of succession, extinction, and recovery, this study focuses on: (1) establishing a remote sensing inversion model for determining the composition and biomass abundance of algal species in Hulun Lake using machine learning algorithms; (2) analyzing the spatiotemporal evolution of different algal species in Hulun Lake from 2016 to 2023, based on OLCI imagery; (3) exploring the spatiotemporal variability of dominant algal species and the climatic and water quality factors that drive it. Through this work, we aim to comprehensively understand the dynamic changes in the algae of Hulun Lake and provide scientific support for managing and protecting lake ecosystems.
Materials and methodology
Study area
Hulun Lake (117°00′10″-117°41′40″ E, 48°30′40″-49°20′40″ N) is located in the Hulunbuir Grassland on the eastern Inner Mongolia Plateau (Fig. 1a). It is a typical large lake in the high-latitude semi-arid region of northern China, with a total area of 2,037.3 km2 and an average depth of 5.7 meters35,37. The climate in this region is classified as semi-arid continental monsoon, with an annual average temperature of -0.24 °C and a winter ice-covered period lasting 170–180 days38. The algal community in Hulun Lake is primarily composed of Cyanophyta, Bacillariophyta, Chlorophyta, Cryptophyta, Dinophyta, Chrysophyta, and Euglenophyta, with Cyanophyta being the dominant group36. In recent years, cyanobacterial blooms in Hulun Lake have been increasing annually, making it one of China’s most severely affected lakes and imposing significant environmental pressure on the aquatic ecosystem39.
Fig. 1. (a) Geographical location of Hulun Lake. (b) Distribution of sample points in Hulun Lake. (This map was generated using ArcGIS 10.8, Esri, http://www.esri.com).
Materials and methods
Field sampling and laboratory processing
This study conducted several field surveys between July 2022 and September 2023, collecting a total of 141 water samples from Hulun Lake (Fig. 1b). The sampling points were evenly distributed across the entire lake. We collected water samples (0 to 50 cm depth) in 1 L black high-density polyethylene (HDPE) bottles and fixed them with Lugol’s iodine solution. Algal species were identified under a biological optical microscope, and the algal counts and genus names were determined40,41. Phytoplankton biomass was determined using the cell volume conversion method. The species composition, density, and biomass of the algae were recorded and calculated. The biomass of each algal species was determined by multiplying the cell density by the cell volume, and the total algal biomass was obtained by summing the biomass of all algal species. In this study, the biomass abundance of an algal group (Cyanophyta, Chlorophyta, or Bacillariophyta) is defined as the ratio of the biomass of that group to the total algal biomass.
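The cell volume conversion and the abundance ratio described above can be sketched in a few lines. This is a minimal illustration rather than the processing code used in the study: the function name `biomass_abundance`, the input layout, and the conversion factor (1 µm³ of cell volume taken as 1e-9 mg wet weight, i.e. an assumed cell density of 1 g/cm³) are all assumptions.

```python
def biomass_abundance(taxa):
    """Return the biomass share (%) of each algal group.

    taxa: list of (group, cell_density_cells_per_L, cell_volume_um3).
    Hypothetical input layout, chosen only for illustration.
    """
    # Assumption: cells have a density of 1 g/cm^3, so 1 um^3 = 1e-9 mg.
    um3_to_mg = 1e-9
    group_biomass = {}
    for group, density, volume in taxa:
        # Biomass of one taxon = cell density x cell volume (mg/L).
        biomass = density * volume * um3_to_mg
        group_biomass[group] = group_biomass.get(group, 0.0) + biomass
    total = sum(group_biomass.values())
    if total == 0:
        raise ValueError("total algal biomass is zero")
    # Biomass abundance = group biomass / total algal biomass.
    return {g: 100.0 * b / total for g, b in group_biomass.items()}


sample = [
    ("Cyanophyta", 1e6, 100.0),
    ("Bacillariophyta", 1e5, 1000.0),
    ("Chlorophyta", 1e5, 500.0),
]
shares = biomass_abundance(sample)
```

Because the result is a ratio, the choice of conversion factor cancels out as long as the same factor is applied to every taxon.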
Remote sensing data and preprocessing
The Ocean and Land Colour Instrument (OLCI) onboard the Sentinel-3A/3B satellites is a suitable satellite sensor for monitoring inland lakes. It features 21 spectral bands ranging from visible to near-infrared wavelengths (400–1020 nm), a spatial resolution of 300 m, and a revisit time of 1.8 days. In this study, the remote sensing data used are OLCI Level-1B top-of-atmosphere radiance images, obtained from the National Aeronautics and Space Administration (NASA) Ocean Color website (https://oceancolor.gsfc.nasa.gov/). These data encompass multiple high-quality, cloud-free images of the study area from May 2016 to October 2023, totaling 409 scenes (Table S1). To eliminate atmospheric effects and derive the true Rrs of water pixels, we employed ACOLITE for atmospheric correction of the OLCI data. The corrected reflectance values exhibited high consistency with the measured spectral reflectance (Figure S1)42,43,44.
Following data quality checks, we selected 78 samples for further processing. Criteria for data validation included: firstly, confirming samples originated from points of algal bloom outbreaks to exclude anomalous conditions; secondly, ensuring compatibility with high-quality OLCI images. We removed cloud, shadow, and land signals through quality assurance measures to ensure the image quality met analytical requirements. Finally, we verified that matched points were within 24 h of image acquisition time to maintain temporal and spatial consistency and data accuracy.
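The 24 h match-up rule can be illustrated as follows. This is a sketch under stated assumptions: the function name `match_samples_to_images` and the data layout are hypothetical, and pairing each sample with the single nearest image is one possible reading of the criterion.

```python
from datetime import datetime, timedelta


def match_samples_to_images(samples, image_times, max_gap=timedelta(hours=24)):
    """Pair each sample with the closest image acquired within max_gap.

    samples: list of (sample_id, sampling_datetime).
    image_times: list of image acquisition datetimes.
    """
    matches = []
    for sample_id, sample_time in samples:
        # Closest image in time, or None when no images are available.
        nearest = min(image_times, key=lambda t: abs(t - sample_time), default=None)
        if nearest is not None and abs(nearest - sample_time) <= max_gap:
            matches.append((sample_id, nearest))
    return matches


samples = [
    ("S1", datetime(2022, 7, 10, 10, 0)),
    ("S2", datetime(2022, 7, 15, 10, 0)),
]
images = [datetime(2022, 7, 10, 2, 30), datetime(2022, 7, 12, 2, 30)]
matched = match_samples_to_images(samples, images)
```

Here only `S1` is retained, because the nearest image to `S2` was acquired more than three days earlier.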
Lake masks and environmental parameters
We employed the Normalized Difference Water Index (NDWI) thresholding method to delineate the water boundaries45. To mitigate the impact of algal blooms and aquatic vegetation on the inversion results, the Normalized Difference Vegetation Index (NDVI) was used to remove anomalous pixels46. Additionally, to account for mixed pixels and adjacency effects, the extracted lake boundary was inwardly buffered by 2 pixels (approximately 600 m) to define Hulun Lake’s water boundary.
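A minimal sketch of this masking step is given below. The thresholds (`ndwi_min`, `ndvi_max`), the choice of green, red, and near-infrared bands, and the use of an 8-neighbour erosion for the inward buffer are assumptions for illustration; the text does not state the values or the buffering implementation actually used.

```python
def water_mask(green, red, nir, ndwi_min=0.0, ndvi_max=0.0):
    """Flag pixels that look like open water (thresholds are assumed)."""
    rows, cols = len(green), len(green[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            g, rd, n = green[r][c], red[r][c], nir[r][c]
            ndwi = (g - n) / (g + n) if (g + n) else 0.0
            ndvi = (n - rd) / (n + rd) if (n + rd) else 0.0
            # Water: high NDWI; bloom scum or vegetation: high NDVI.
            mask[r][c] = ndwi > ndwi_min and ndvi < ndvi_max
    return mask


def erode(mask, pixels=2):
    """Shrink the mask inward by the given number of pixels."""
    rows, cols = len(mask), len(mask[0])
    for _ in range(pixels):
        shrunk = [[False] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                # Keep a pixel only if it and all 8 neighbours are water;
                # positions outside the grid count as non-water.
                shrunk[r][c] = all(
                    0 <= r + dr < rows and 0 <= c + dc < cols and mask[r + dr][c + dc]
                    for dr in (-1, 0, 1)
                    for dc in (-1, 0, 1)
                )
        mask = shrunk
    return mask


all_water = [[True] * 7 for _ in range(7)]
buffered = erode(all_water, pixels=2)
```

With 300 m pixels, two erosion passes correspond to the inward buffer of approximately 600 m.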
Meteorological data for Hulun Lake were obtained from the Xinbaerhu Right Banner Meteorological Station (48.67°N, 116.82°E) and downloaded from the National Centers for Environmental Information (NCEI, https://www.ncei.noaa.gov). These data covered daily temperature (°C), wind speed (m/s), pressure (kPa), and precipitation (mm) from 2016 to 2023. Monthly averages were computed for temperature (TEMP), wind speed (WDSP), pressure (STP), and precipitation (PRCR).
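The aggregation of daily records into monthly averages could look like the sketch below. The function name `monthly_means` and the treatment of missing days (skipped rather than filled) are assumptions.

```python
from collections import defaultdict
from datetime import date


def monthly_means(daily):
    """Average daily values by calendar month.

    daily: iterable of (date, value); value may be None when missing.
    Returns {(year, month): mean of the available daily values}.
    """
    buckets = defaultdict(list)
    for day, value in daily:
        if value is not None:  # skip missing observations
            buckets[(day.year, day.month)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


temp = [
    (date(2016, 5, 1), 10.0),
    (date(2016, 5, 2), 14.0),
    (date(2016, 6, 1), 20.0),
    (date(2016, 6, 2), None),
]
monthly_temp = monthly_means(temp)
```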
Water quality data were collected from each sampling point approximately 0.5 m below the surface, with 2–4 L of water stored in HDPE bottles. Laboratory analysis provided data on chlorophyll a (Chla), total phosphorus (TP), total nitrogen (TN), chromophoric dissolved organic matter (CDOM), phycocyanin (PC), dissolved organic carbon (DOC), and total suspended matter (TSM)43,47,48,49,50,51.
Algae species identification algorithm
In this study, we employed machine learning (ML) algorithms to construct models for estimating the biomass abundance of different algal species. A total of 6 commonly used machine learning models were compared, including extreme gradient boosting (XGBoost)52, support vector regression (SVR), backpropagation neural network (BP), gradient boosting decision tree (GBDT), random forest (RF), and categorical boosting (CatBoost).
Model development is a critical step in effectively utilizing remote sensing data for the large-scale identification of algal species and estimating biomass abundance in Lake Hulun. Initially, we conducted correlation analyses between in-situ measured biomass fractions of Cyanophyta, Chlorophyta, and Bacillariophyta and the corresponding OLCI reflectance. We systematically examined all potential combinations of spectral bands and original band reflectances. Subsequently, we trained models using single bands, band ratios, and various combinations (including addition, subtraction, multiplication, and division) of OLCI Rrs against the biomass abundance of the three algal species, retaining only combinations with a significant Pearson correlation (p < 0.05, R ≥ 0.5). A total of 1360 band combinations were generated (Fig. 2), each evaluated for correlation coefficient R, standard error, t-test, p-value, and residuals (with R values close to 1, and standard error, t-test, p-value, and residuals close to 0 or negative values). Based on these metrics, suitable spectral bands and band ratios were identified as candidate variables for algal species identification and biomass abundance modeling. The selected bands and corresponding combinations are detailed in Table 1. The technical workflow for estimating phytoplankton biomass abundance in Lake Hulun based on OLCI imagery is illustrated in Fig. 3. Statistical analyses were conducted using SPSS 27.0 and Python 3.9.
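The screening of band combinations can be sketched as below. This is an illustrative simplification, not the study's procedure: it only forms pairwise sums, differences, products, and ratios, it filters on the absolute Pearson coefficient alone (the p-value test is omitted), and all function and band names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def candidate_features(bands):
    """bands: {band name: [Rrs per sample]} -> single bands plus pairwise combos."""
    feats = dict(bands)
    for a, b in combinations(bands, 2):
        xa, xb = bands[a], bands[b]
        feats[f"{a}+{b}"] = [p + q for p, q in zip(xa, xb)]
        feats[f"{a}-{b}"] = [p - q for p, q in zip(xa, xb)]
        feats[f"{a}*{b}"] = [p * q for p, q in zip(xa, xb)]
        if all(q != 0 for q in xb):  # avoid division by zero
            feats[f"{a}/{b}"] = [p / q for p, q in zip(xa, xb)]
    return feats


def select_features(bands, target, r_min=0.5):
    """Keep features whose |R| against the target reaches r_min."""
    selected = {}
    for name, values in candidate_features(bands).items():
        r = pearson_r(values, target)
        if abs(r) >= r_min:
            selected[name] = r
    return selected


bands = {"B1": [1, 2, 3, 4], "B2": [2, 1, 2, 1]}
abundance = [1, 2, 3, 4]
selected = select_features(bands, abundance)
```

In practice a library routine that also returns the p-value (for example a statistics package's Pearson test) would be needed to apply the p < 0.05 criterion.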
Model construction, training, testing, and evaluation criteria
In this study, we utilized the determination coefficient (R2), root mean square error (RMSE), and mean absolute percentage error (MAPE) to assess the predictive accuracy of the models. The models were constructed and evaluated for accuracy using the scikit-learn library in Python.
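\[ R^{2} = 1 - \frac{\sum_{i=1}^{N} \left( y_{i} - y_{i}' \right)^{2}}{\sum_{i=1}^{N} \left( y_{i} - \bar{y} \right)^{2}} \]
\[ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_{i} - y_{i}' \right)^{2}} \]
\[ \mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{y_{i} - y_{i}'}{y_{i}} \right| \]
where N represents the sample size, \(y_{i}\) and \(y_{i}'\) represent the in-situ measured values and the estimated values obtained through the algorithm, respectively, and \(\bar{y}\) is the mean of the measured values. The R2 is commonly used to assess the predictive performance of the model, while RMSE and MAPE are employed to evaluate the consistency between measured and predicted values.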
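For readers who want to reproduce the evaluation without scikit-learn, the three metrics can be written out directly. The sketch assumes the standard definitions of R2 (one minus the ratio of residual to total sum of squares), RMSE, and MAPE; the function names are illustrative and the numbers are made up for the example.

```python
from math import sqrt


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def rmse(y, y_hat):
    """Root mean square error, in the units of y."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mape(y, y_hat):
    """Mean absolute percentage error (%); measured values must be non-zero."""
    return 100.0 / len(y) * sum(abs((a - b) / a) for a, b in zip(y, y_hat))


measured = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 39.0]
scores = (r_squared(measured, estimated), rmse(measured, estimated), mape(measured, estimated))
```

Because MAPE divides by the measured value, it grows quickly when measured abundances are small, which may partly explain why a group with low abundance can show a high MAPE alongside a moderate RMSE.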
Results
In-situ measurement of algae
Field surveys identified a total of 180 algal species across 7 phyla in Lake Hulun. The predominant groups include Cyanophyta, Bacillariophyta, Chlorophyta, Cryptophyta, Dinophyta, Chrysophyta, and Euglenophyta (Fig. 4). Chlorophyta is the most diverse with 84 species (46.7%), followed by Cyanophyta with 39 species (21.7%), Bacillariophyta with 33 species (18.3%), and Euglenophyta with 10 species (5.6%). Other phyla have relatively fewer species: Dinophyta with 8 species (4.4%), Chrysophyta with 3 species (1.7%), and Cryptophyta with 2 species (1.1%). Cyanophyta had the highest average biomass (212.8 mg/L), followed by Bacillariophyta (1.05 mg/L), Cryptophyta (0.29 mg/L), and Chlorophyta (0.18 mg/L). Biomass for other algal classes is below 0.01 mg/L. Overall, Cyanophyta, Chlorophyta, and Bacillariophyta dominate the phytoplankton community structure in Lake Hulun, significantly influencing the abundance of phytoplankton in the lake.
Algal biomass abundance model
The selected feature bands from Sect. 2.3 were used as input variables to construct inversion models for algal species and biomass abundance. We divided the data into three subsets: 60% for training, 20% for validation, and 20% for testing. Table S2 lists the performance metrics estimated for different algal species.
Performance differed markedly among algal groups and among machine learning algorithms (Fig. 5, Figure S2). Overall, the GBDT, RF, and XGBoost regression algorithms performed better on the calibration and validation datasets, whereas the BP and CatBoost algorithms were slightly overfitted and the SVR algorithm fitted poorly. The results showed that XGBoost had a significant advantage in estimating the biomass abundance of Cyanophyta (R2 = 0.92, RMSE = 1.78%, MAPE = 9.96%), while RF was more responsive to changes in the biomass abundance of Chlorophyta (R2 = 0.72, RMSE = 6.57%, MAPE = 50.8%), and GBDT had a good fitting performance for Bacillariophyta biomass abundance (R2 = 0.9, RMSE = 4.66%, MAPE = 47.87%). In all three cases, the model predictions agreed well with the in situ measurements, so the models could be used to generate algal biomass abundance maps from large-scale satellite observations of Lake Hulun.
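A 60/20/20 partition of the matched samples can be sketched as follows. The random shuffling, the fixed seed, and the rounding rule are assumptions; the text does not say how samples were assigned to the three subsets.

```python
import random


def split_60_20_20(samples, seed=0):
    """Shuffle the samples and split them into train, validation, and test."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_train = round(0.6 * len(items))
    n_val = round(0.2 * len(items))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


train, val, test = split_60_20_20(range(78))
```

With the 78 matched samples reported above, this rounding yields 47 training, 16 validation, and 15 test samples.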
Interannual variation of algae
Based on the aforementioned ML models, we calculated the spatial distribution changes in biomass abundance of three algal species (Cyanophyta, Chlorophyta, and Bacillariophyta) from 2016 to 2023 in Lake Hulun. The results were derived by averaging the inversion results of biomass abundance for each pixel. As depicted in Fig. 6a-c, significant spatial and interannual distribution differences were observed in the composition of algal species in Lake Hulun. Overall, Cyanophyta biomass (44.62 ± 3.47%) predominated, followed by Bacillariophyta (36.35 ± 2.68%), with Chlorophyta biomass (10.42 ± 1.08%) being the lowest. These three algal classes collectively accounted for 91.4 ± 1.55% of all algae in Lake Hulun (Figure S3). From Figure S4 it can be seen that Cyanophyta and Bacillariophyta abundance frequencies are mainly concentrated at 40% and Chlorophyta at 10%.
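The per-pixel averaging of the retrieved maps can be illustrated with a small sketch. It assumes that each scene is a grid of equal shape in which masked pixels (cloud, land, bloom scum) are stored as `None`, and that such pixels are simply excluded from the mean; the function name is hypothetical.

```python
def pixel_mean(maps):
    """Average a stack of 2-D abundance grids pixel by pixel.

    maps: list of equally shaped grids; None marks a masked pixel.
    A pixel with no valid observation stays None in the output.
    """
    rows, cols = len(maps[0]), len(maps[0][0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            valid = [m[r][c] for m in maps if m[r][c] is not None]
            if valid:
                out[r][c] = sum(valid) / len(valid)
    return out


scenes = [
    [[40.0, None]],
    [[50.0, None]],
    [[None, None]],
]
annual_mean = pixel_mean(scenes)
```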
Regarding spatial distribution changes, Cyanophyta biomass abundance exhibited a southwest-to-northeast decreasing trend. Regions in the western, southern, and central parts of the lake displayed relatively high Cyanophyta biomass abundance. Chlorophyta biomass abundance was relatively evenly distributed throughout the lake, with higher proportions in the lake center than near the shores, and lower values observed in the southeast and along the lake’s coastline. Bacillariophyta biomass abundance was predominantly higher in the eastern and northern parts of the lake, showing a northeast-to-southwest decreasing trend. The distribution of Bacillariophyta exhibited a clear inverse relationship with that of Cyanophyta: areas with high Cyanophyta biomass abundance tended to have low Bacillariophyta biomass abundance, and vice versa (Fig. 6).
Regarding temporal distribution changes from 2016 to 2023 (Figure S4), notable variations were observed in Cyanophyta biomass abundance, with a relatively large fluctuation range. Overall, Cyanophyta biomass abundance showed an increasing trend (Fig. 7), with peaks observed in 2018 (51.24 ± 4.09%) and 2022 (49.38 ± 14.7%), coinciding with outbreaks of cyanobacterial blooms in these years, leading to higher Cyanophyta biomass abundance compared to other years. In other years, biomass abundance remained relatively stable, ranging from 40.56 ± 3.43% to 44.18 ± 3.32%, with an average of 44 ± 3.5%.
Chlorophyta biomass abundance exhibited relatively stable variations, with an average fraction of 10 ± 1%. Overall, slight fluctuations were observed, with the highest fraction occurring in 2021 (12.04 ± 3.83%) and the lowest in 2023 (8.5 ± 1.82%). Bacillariophyta showed an opposite trend to Cyanophyta, with an overall decreasing trend in biomass abundance. In 2018 (33.61 ± 3%) and 2022 (30.51 ± 10.2%), the biomass abundance of Bacillariophyta was lower because the massive outbreak of Cyanophyta placed Bacillariophyta at a competitive disadvantage. In other years, abundance remained relatively stable (37.05 ± 2.3% to 39.31 ± 1.6%), with an average abundance of 36 ± 2.7%.
Seasonal variation of algae
The distribution of Cyanophyta, Chlorophyta, and Bacillariophyta biomass abundance in Lake Hulun from 2016 to 2023 (Fig. 8a) and their monthly averages (Fig. 8b) reveal distinct trends in the biomass abundance of these three algal species. From May to July, Cyanophyta biomass abundance peaks at above 46.5 ± 10.6%, dips to a minimum of 39.4 ± 4.34% in August, and gradually rises thereafter. Chlorophyta biomass abundance shows an overall decreasing trend until October. From May to August, there is a decline, with a slight increase observed in June (11.34 ± 2.04%), reaching the lowest proportion in August (7.01 ± 1.41%). The highest value occurs in October (13.04 ± 4.07%). In contrast, Bacillariophyta biomass abundance exhibits a trend opposite to that of Cyanophyta, with relatively lower proportions (< 35.63 ± 7.92%) from May to July and the highest proportion in September (40.09 ± 2.23%). On average, Cyanophyta has the highest biomass abundance of the three algal species, followed by Bacillariophyta and Chlorophyta. It is noteworthy that in September, the biomass abundance of Cyanophyta and Bacillariophyta is very close, at 40.12 ± 3.85% and 40.09 ± 2.23%, respectively, indicating that both groups co-dominate during this period.
Additionally, this study investigated variations in algal biomass abundance across different seasons (Fig. 9). The results indicate significant variations in algal biomass abundance across seasons in Lake Hulun. Regarding the seasonal variations of individual algal species, Cyanophyta and Chlorophyta exhibit higher average biomass fractions in spring, reaching 48.02 ± 10.6% and 11.2 ± 2.2%, respectively, with slight decreases observed in summer and autumn. This suggests that spring has higher biomass abundance for Cyanophyta and Chlorophyta. In autumn, Bacillariophyta biomass abundance reaches the highest proportion, at 38.29 ± 2%, indicating that autumn is the season with higher biomass abundance for Bacillariophyta. Overall, the distribution pattern shows Cyanophyta predominating over Bacillariophyta and Chlorophyta, indicating Cyanophyta dominates the algal species in Lake Hulun, followed by Bacillariophyta and then Chlorophyta.
We observed that in some years, the biomass abundance of Bacillariophyta exceeded that of Cyanophyta, becoming the predominant algal group. Specifically, in spring, Bacillariophyta biomass abundance exceeded that of Cyanophyta in 2017, 2019, and 2021, while in autumn, Bacillariophyta biomass abundance surpassed Cyanophyta in 2016, making Bacillariophyta the dominant species. In the remaining years, Cyanophyta predominated.
Discussion
Impact of meteorological factors on algal species
The biomass abundance of Cyanophyta, Chlorophyta, and Bacillariophyta exhibits significant heterogeneity at both temporal and spatial scales. To investigate the potential influence of climatic factors on the interannual variations of algal biomass in Lake Hulun, we calculated correlation coefficients and corresponding significance test p-values between different algal species and temperature, atmospheric pressure, wind speed, maximum wind speed, and precipitation (Figure S6). Chlorophyta and Bacillariophyta tend to increase or decrease together, in contrast to the trends observed in Cyanophyta. These variations in Cyanophyta and Bacillariophyta are consistent with previous findings53, indicating potential differences in the adaptive capacity and response mechanisms of different algal species to environmental conditions.
The results reveal significant correlations between temperature (TEMP) and the biomass abundance of the algal species (Fig. 10a). Specifically, temperature shows a significantly positive correlation with Cyanophyta biomass abundance (r = 0.23, p < 0.01) and negative correlations with Chlorophyta and Bacillariophyta biomass abundance (r = -0.47, p < 0.01; r = -0.13, p < 0.01) (Figure S6). Different algal species and taxonomic groups exhibit varied responses to temperature, influencing their growth rates and distributions54. Research indicates optimal growth temperatures ranging from 5 °C to 25 °C for Bacillariophyta55, 25 °C to 30 °C for Cyanophyta56,57, and up to 30 °C for Chlorophyta55. These temperature preferences among populations may lead to shifts in dominance towards species with greater adaptability within the ecosystem. The growth and reproduction of Cyanophyta are particularly influenced by local temperatures58, with warmer conditions promoting their proliferation at the expense of other algal species59,60. Given Lake Hulun’s location in the cold high-latitude Inner Mongolia Plateau, Bacillariophyta’s adaptation to low-temperature environments makes it more susceptible to temperature increases compared to Cyanophyta61, potentially placing Bacillariophyta at a competitive disadvantage62.
Furthermore, as Cyanophyta biomass abundance increases, Bacillariophyta biomass abundance correspondingly decreases. Chlorophyta biomass, relatively low in Lake Hulun, is greatly influenced by the dynamics of the other two algal classes. An increase in Cyanophyta biomass abundance consequently leads to a decrease in Chlorophyta biomass abundance, which is consistent with the significant negative correlation between TEMP and Chlorophyta biomass abundance.
Additionally, the results indicate a negative correlation between atmospheric pressure (STP) and Cyanophyta biomass abundance (r = -0.11, p < 0.05), a positive correlation with Chlorophyta biomass abundance (r = 0.21, p < 0.01), and a non-significant positive correlation with Bacillariophyta biomass abundance (r = 0.08, p > 0.05) (Fig. 10b). This suggests that STP may inhibit the growth of Cyanophyta12, possibly due to high-pressure systems causing water column stratification and influencing heat flux fluctuations on the lake surface63. Changes in atmospheric pressure affect air temperature through mechanisms such as adiabatic processes and weather systems, which in turn impact algal growth64,65. Given Cyanophyta’s predominant biomass abundance compared to Chlorophyta, a decrease in Cyanophyta biomass abundance tends to increase Chlorophyta biomass abundance.
Wind speed (WDSP) and maximum wind speed (MXSPD) exhibited similar trends with respect to the biomass abundance of the three algal types. Both WDSP and MXSPD showed negative correlations with Cyanobacteria biomass (r = -0.27, p < 0.01; r = -0.28, p < 0.01) and positive correlations with Chlorophyta and Bacillariophyta (r = 0.16, p < 0.01; r = 0.19, p < 0.01; r = 0.19, p < 0.01; r = 0.18, p < 0.01) (Fig. 10c). Strong winds on the lake disrupt the water column and reduce surface stability. The predominant Cyanobacteria species in Lake Hulun are Dolichospermum circinale and Microcystis sp. These species are characterized by unique gas vesicles that enable them to adjust buoyancy66, allowing them to sink to maintain stability under strong winds, which consequently reduces their biomass abundance at the water surface. Bacillariophyta and Chlorophyta cannot actively adjust their position in the water column, which causes them to remain at the water surface for a short time after the lake experiences strong winds. Meanwhile, their biomass proportion increases due to the reduction in cyanobacteria biomass. Our results are consistent with the study by Moreno-Ostos67. Because daily precipitation in the Hulun Lake Basin is low, there are too few data to relate it to algal biomass; the impact of PRCR on algal biomass abundance is therefore not significant, as indicated by the non-significant Spearman correlation (p > 0.05).
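Since the precipitation analysis relies on the Spearman correlation, a minimal sketch of that coefficient is given here. It ranks both series (ties receive the average rank) and then computes the Pearson correlation of the ranks; the significance test behind the reported p-values is not reproduced, and the function names are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


rho_up = spearman_rho([1, 2, 3, 4], [1, 4, 9, 16])
rho_down = spearman_rho([1, 2, 3], [3, 2, 1])
```

Unlike the Pearson coefficient, this measure depends only on the ordering of the values, so a monotonic but nonlinear relationship still gives a coefficient of 1.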
Impact of water quality parameters on phytoplankton
In this study, we also analyzed the relationship between the biomass abundance of three algal species and 9 different Water Quality Parameters (WQP). In this respect, the correlation coefficients and corresponding significance test p-values were calculated (Fig. 11). Generally, nitrogen and phosphorus are essential elements for the growth and reproduction of algae68,69, and competition among algae intensifies with increasing availability of nutrients. This competitive pressure leads to the dominance of certain species over others, thereby altering the structure of algal communities70,71. However, we found that the correlations between the biomass abundance of three algal species and TP and TN were not strong. This may be attributed to Lake Hulun being a eutrophic lake, with consistently high levels of TP, TN, and other nutrients providing sufficient nutrients for the growth of most algal species.
Interestingly, Chla shows a significant positive correlation with Cyanophyta biomass proportion. A high Chla concentration indicates that Cyanophyta are in a flourishing reproductive state. Under such conditions, the growth rate of Chlorophyta and Bacillariophyta is lower than that of Cyanophyta72. Therefore, during algal blooms, Cyanophyta biomass levels are in absolute dominance, inhibiting the growth and dominance of other algal species. In contrast, under conditions of low Chla concentration, algae experience a less vigorous growth period, typically occurring from autumn to winter when decreased water temperatures limit algal growth to some extent58. However, Bacillariophyta are tolerant to low temperatures and prefer habitats with favorable optical transparency73,74, giving them a survival advantage over Cyanophyta in competition. With a wide range of suitable habitats and numerous species, Chlorophyta can maintain a certain abundance even during periods of low water temperature75, thus the biomass abundance of Chlorophyta does not significantly affect Chla variations.
Fig. 11 illustrates the relationship between biomass abundance of Cyanophyta, Chlorophyta, and Bacillariophyta and various water quality parameters (WQPs). The parameters include DOC, SPIM, SPOM, PC, CDOM, TP, Chla, and TN. Asterisks indicate levels of significance, with * representing p < 0.05 and ** representing p < 0.01.
Applicability and uncertainty of model
ML algorithms were utilized to develop models for estimating the biomass abundance of various algal categories. Our approach successfully inverted and estimated the abundance of Cyanobacteria, Chlorophyta, and Bacillariophyta biomass relative to the total algal biomass in Lake Hulun. However, the performance of an ML model relies heavily on the quality of the training dataset. Insufficient or unrepresentative samples in the dataset may lead to overfitting or underfitting of the model, resulting in reduced robustness30. Therefore, it is crucial to ensure that the training dataset contains an adequate number of representative samples covering diverse scenarios within the target area. Moreover, our study exclusively addressed the biomass proportions of three algal categories without considering less abundant groups such as Cryptophyta, Dinophyta, Chrysophyta, and Euglenophyta, or their potential influence on the inversion results. Thus, a comprehensive consideration of these factors is necessary to enhance the predictive accuracy and generalizability of the model, thereby optimizing its performance in practical applications.
Moreover, the relationship between the biomass of different algal groups and the Rrs characteristics may be influenced by ecosystem structure. When the biomass of a given algal group is extremely low, its spectral characteristics may be masked by other substances (e.g., CDOM, SPIM). Although Sentinel-3 OLCI boasts a higher signal-to-noise ratio and spectral resolution compared to previous sensors, enhancing its sensitivity to algal changes in lakes, atmospheric correction algorithms can impact the accuracy of OLCI images. Therefore, when utilizing OLCI data for algal studies, selecting appropriate atmospheric correction algorithms is crucial to ensure the reliability and accuracy of research findings. Future advancements in sensor technology are needed to address this challenge more effectively. Notably, on February 8, 2024, NASA successfully launched a new Earth observation satellite named the Plankton, Aerosol, Cloud, ocean Ecosystem (PACE), which will provide new avenues for remote sensing studies of phytoplankton communities in polar lakes76. PACE’s 5 nm spectral resolution allows it to capture very fine spectral details. This high resolution is particularly valuable for distinguishing between different types of algae, particulates, and other water constituents, which can often exhibit similar spectral characteristics. The ability to resolve these subtle differences enhances the accuracy of algae classification and biomass estimation.
Implications for environmental management
Understanding the composition and temporal trends of algal communities is crucial for comprehending inland aquatic ecosystems77. In this study, we analyzed in detail the biomass abundance of three algal groups in a cold-region lake (Lake Hulun) using long-term observational data, providing new insights and data support. This analytical approach has broad applicability in future research: (1) improving the precision of ecosystem health assessment through more accurate estimation of algal biomass, (2) exploring the spatiotemporal variability and driving mechanisms of dominant algal groups to understand ecosystem changes, and (3) predicting global distribution trends of algal communities under climate change, thereby supporting environmental protection and policy-making.
The impacts, developmental patterns, and control methods of algal blooms vary significantly with the phytoplankton that cause them13. Understanding the dynamics of these diverse algal communities is therefore crucial for lake management. Long-term observations of the biomass abundance of different algal groups, combined with comprehensive analysis of remote sensing data, can provide lake managers with scientific decision support. This approach reveals the developmental trends and characteristics of algal communities, offering a scientific basis for formulating targeted control strategies and thereby improving the protection and management of lake ecosystems.
Conclusion
This study aimed to establish an efficient machine learning model to estimate the biomass abundance of different algal groups in optically complex inland lakes in cold regions. Biomass abundance inversion models based on OLCI images were developed. The highest accuracy was achieved by XGBoost for Cyanophyta, RF for Chlorophyta, and GBDT for Bacillariophyta. Cyanophyta and Bacillariophyta were the predominant algal groups in Lake Hulun, often coexisting with other algal groups. Biomass abundance varied significantly across years and seasons, with Cyanophyta peaking in 2018 and 2022 due to cyanobacterial blooms, while Bacillariophyta (diatoms) exhibited lower values and Chlorophyta remained relatively stable. Cyanophyta and Chlorophyta had higher biomass abundance in spring, whereas Bacillariophyta were more prevalent in autumn. Meteorological factors such as temperature, atmospheric pressure, and wind speed influenced the biomass abundance of the different algal groups. Our models not only estimate algal biomass abundance accurately but also improve understanding of its spatiotemporal distribution and trends, providing crucial support for water quality monitoring and ecological protection. With the launch of new sensors, this method may make it easier to identify algal groups in optically complex inland waters, significantly improving our ability to monitor them.
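The model comparison summarized above rests on three accuracy metrics (R2, RMSE, and MAPE). As a minimal sketch, assuming the standard textbook definitions (the exact formulas used in this study are not restated here, and R2 is sometimes computed as a squared correlation instead), they can be written as follows. The observed and predicted values are made-up numbers for illustration only.

```python
from math import sqrt


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the target (here % abundance)."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error (%); undefined when an observed value is 0."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical biomass abundances (%) for a high-abundance group.
observed = [40.0, 45.0, 50.0, 35.0]
predicted = [42.0, 44.0, 47.0, 37.0]
print(r_squared(observed, predicted), rmse(observed, predicted), mape(observed, predicted))

# Hypothetical low-abundance group: every prediction is off by 3 percentage points.
low_observed = [10.0, 5.0, 8.0]
low_predicted = [13.0, 8.0, 11.0]
print(rmse(low_observed, low_predicted), mape(low_observed, low_predicted))
```

In the second example every prediction is off by 3 percentage points, so the RMSE is 3%, yet the MAPE is 42.5% because the observed values are small. This may help explain why a low-abundance group such as Chlorophyta can show a modest RMSE alongside a high MAPE.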
Data availability
The data used in this paper can be accessed at https://doi.org/10.57760/sciencedb.11306. Other data are available from the corresponding author on reasonable request.
References
Kramer, S. J., Siegel, D. A., Maritorena, S. & Catlett, D. Modeling surface ocean phytoplankton pigments from hyperspectral remote sensing reflectance on global scales. Remote Sens. Environ. 270, 112879 (2022).
Falkowski, P. G. & Oliver, M. J. Mix and match: how climate selects phytoplankton. Nat. Rev. Microbiol. 5, 813–819 (2007).
Shilei, Z. et al. Reservoir water stratification and mixing affects microbial community structure and functional community composition in a stratified drinking reservoir. J. Environ. Manage. 267, 110456 (2020).
Ren, Z., Qu, X. & Zhang, M. Distinct bacterial communities in wet and dry seasons during a seasonal water level fluctuation in the largest freshwater lake (Poyang Lake) in China. Front. Microbiol. 10, 453849 (2019).
Yang, J., Ma, L., Jiang, H., Wu, G. & Dong, H. Salinity shapes microbial diversity and community structure in surface sediments of the Qinghai-Tibetan Lakes. Sci. Rep. 6, 25078 (2016).
Bracher, A. et al. Obtaining phytoplankton diversity from ocean color: a scientific roadmap for future development. Front. Mar. Sci. 4, 55 (2017).
Cao, Z. et al. What water color parameters could be mapped using MODIS land reflectance products: a global evaluation over coastal and inland waters. Earth Sci. Rev. 232, 104154 (2022).
Hu, M. et al. Eutrophication state in the Eastern China based on Landsat 35-year observations. Remote Sens. Environ. 277, 113057 (2022).
Li, X., Wang, Y., Xue, B., Zhang, X. & Wang, G. Attribution of runoff and hydrological drought changes in an ecologically vulnerable basin in semi-arid regions of China. Hydrol. Process. 37, e15003 (2023).
Wang, M. et al. Interannual changes of coastal aquaculture ponds in China at 10-m spatial resolution during 2016–2021. Remote Sens. Environ. 284, 113347 (2023).
Zhao, C., Zhang, Y., Guo, W. & Fahad Baqa, M. Dynamics and drivers of water clarity derived from Landsat and in-situ measurement data in Hulun Lake from 2010 to 2020. Water 14, 1189 (2022).
Fang, C. et al. Global divergent trends of algal blooms detected by satellite during 1982–2018. Glob. Change Biol. 28, 2327–2340 (2022).
Pal, M., Yesankar, P. J., Dwivedi, A. & Qureshi, A. Biotic control of harmful algal blooms (HABs): a brief review. J. Environ. Manage. 268, 110687 (2020).
Li, X., Yang, Y., Ishizaka, J. & Li, X. Global estimation of phytoplankton pigment concentrations from satellite data using a deep-learning-based model. Remote Sens. Environ. 294, 113628 (2023).
Frieder, C. A. et al. A macroalgal cultivation modeling system (MACMODS): evaluating the role of physical-biological coupling on nutrients and farm yield. Front. Mar. Sci. 9, 752951 (2022).
Wolny, J. L. et al. Current and future remote sensing of harmful algal blooms in the Chesapeake Bay to support the shellfish industry. Front. Mar. Sci. 7, 337 (2020).
Kramer, S. J., Siegel, D. A. & Graff, J. R. Phytoplankton community composition determined from co-variability among phytoplankton pigments from the NAAMES field campaign. Front. Mar. Sci. 7, 215 (2020).
Chase, A., Boss, E., Cetinić, I. & Slade, W. Estimation of phytoplankton accessory pigments from hyperspectral reflectance spectra: toward a global algorithm. J. Geophys. Res. Oceans 122, 9725–9743 (2017).
Mousing, E. A., Richardson, K., Bendtsen, J., Cetinić, I. & Perry, M. J. Evidence of small-scale spatial structuring of phytoplankton alpha- and beta-diversity in the open ocean. J. Ecol. 104, 1682–1695 (2016).
Zhu, Y. et al. Spatial and temporal distribution analysis of dominant algae in Lake Taihu based on ocean and land color instrument data. Ecol. Ind. 155, 110959 (2023).
Shen, F., Tang, R., Sun, X. & Liu, D. Simple methods for satellite identification of algal blooms and species using 10-year time series data from the East China Sea. Remote Sens. Environ. 235, 111484 (2019).
Sun, X., Shen, F., Brewin, R. J., Li, M. & Zhu, Q. Light absorption spectra of naturally mixed phytoplankton assemblages for retrieval of phytoplankton group composition in coastal oceans. Limnol. Oceanogr. 67, 946–961 (2022).
Hirata, T. et al. Synoptic relationships between surface Chlorophyll-a and diagnostic pigments specific to phytoplankton functional types. Biogeosciences 8, 311–327 (2011).
Xi, H. et al. Global chlorophyll a concentrations of phytoplankton functional types with detailed uncertainty assessment using multisensor ocean color and sea surface temperature satellite products. J. Geophys. Res. Oceans 126, e2020JC017127 (2021).
Brewin, R. J. et al. An intercomparison of bio-optical techniques for detecting dominant phytoplankton size class from satellite remote sensing. Remote Sens. Environ. 115, 325–339 (2011).
Sathyendranath, S. et al. in Reports of the International Ocean-Colour Coordinating Group (IOCCG), No. 15, 1–156 (International Ocean-Colour Coordinating Group, 2014).
Sathyendranath, S. et al. Discrimination of diatoms from other phytoplankton using ocean-colour data. Mar. Ecol. Prog. Ser. 272, 59–68 (2004).
Tao, B. et al. A novel method for discriminating Prorocentrum donghaiense from diatom blooms in the East China Sea using MODIS measurements. Remote Sens. Environ. 158, 267–280 (2015).
Raitsos, D. E. et al. Identifying four phytoplankton functional types from space: an ecological approach. Limnol. Oceanogr. 53, 605–613 (2008).
Zhang, Y., Shen, F., Sun, X. & Tan, K. Marine big data-driven ensemble learning for estimating global phytoplankton group composition over two decades (1997–2020). Remote Sens. Environ. 294, 113596 (2023).
Flombaum, P. et al. Present and future global distributions of the marine Cyanobacteria Prochlorococcus and Synechococcus. Proc. Natl. Acad. Sci. 110, 9824–9829 (2013).
Stock, A. & Subramaniam, A. Accuracy of empirical satellite algorithms for mapping phytoplankton diagnostic pigments in the open ocean: a supervised learning perspective. Front. Mar. Sci. 7, 599 (2020).
Busseni, G. et al. Large scale patterns of marine diatom richness: drivers and trends in a changing ocean. Glob. Ecol. Biogeogr. 29, 1915–1928 (2020).
Fan, C. et al. Century-scale reconstruction of water storage changes of the largest lake in the Inner Mongolia Plateau using a machine learning approach. Water Resour. Res. 57, e2020WR028831 (2021).
Fang, C. et al. Remote sensing of harmful algal blooms variability for Lake Hulun using adjusted FAI (AFAI) algorithm. J. Environ. Inf. 34, 108–122 (2018).
Li, X. et al. Evolution characteristics and driving factors of cyanobacterial blooms in Hulun Lake from 2018 to 2022. Water 15, 3765 (2023).
Shang, Y. et al. Factors affecting seasonal variation of microbial community structure in Hulun Lake, China. Sci. Total Environ. 805, 150294 (2022).
Chen, J. et al. Common fate of sister lakes in Hulunbuir Grassland: long-term harmful algal bloom crisis from multi-source remote sensing insights. J. Hydrol. 594, 125970 (2021).
Song, T. et al. Lake cyanobacterial bloom color recognition and spatiotemporal monitoring with Google Earth Engine and the Forel-Ule Index. Remote Sens. 15, 3541 (2023).
Guo, S. et al. Seasonal variation in the phytoplankton community of a continental-shelf sea: the East China Sea. Mar. Ecol. Prog. Ser. 516, 103–126 (2014).
Utermöhl, H. Zur Vervollkommnung der quantitativen Phytoplankton-Methodik: mit 1 Tabelle und 15 Abbildungen im Text und auf 1 Tafel. Int. Ver. für Theoretische und Angewandte Limnologie: Mitteilungen 9, 1–38 (1958).
Vanhellemont, Q. & Ruddick, K. Atmospheric correction of Sentinel-3/OLCI data for mapping of suspended particulate matter and chlorophyll-a concentration in Belgian turbid coastal waters. Remote Sens. Environ. 256, 112284 (2021).
Wang, X. et al. Monitoring phycocyanin concentrations in high-latitude inland lakes using Sentinel-3 OLCI data: the case of Lake Hulun, China. Ecol. Ind. 155, 110960 (2023).
Li, Y. et al. Sentinel-3 OLCI observations of Chinese lake turbidity using machine learning algorithms. J. Hydrol. 622, 129668 (2023).
Gao, B. C. NDWI—A normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sens. Environ. 58, 257–266 (1996).
Rouse, J. W., Haas, R. H., Schell, J. A. & Deering, D. W. Monitoring vegetation systems in the Great Plains with ERTS. NASA Spec. Publ. 351, 309 (1974).
Li, S. et al. Quantification of chlorophyll-a in typical lakes across China using Sentinel-2 MSI imagery with machine learning algorithm. Sci. Total Environ. 778, 146271 (2021).
Song, K. et al. Quantification of lake clarity in China using Landsat OLI imagery data. Remote Sens. Environ. 243, 111800 (2020).
Lyu, L. et al. Remote estimation of phycocyanin concentration in inland waters based on optical classification. Sci. Total Environ. 899, 166363 (2023).
Fang, C. et al. A novel total phosphorus concentration retrieval method based on two-line classification in lakes and reservoirs across China. Sci. Total Environ. 906, 167522 (2024).
Tao, H. et al. Response of total suspended matter to natural and anthropogenic factors since 1990 in China’s large lakes. Sci. Total Environ. 892, 164474 (2023).
Chen, T. Xgboost: extreme gradient boosting. R package version 0.4-2 1 (2015).
Rousseaux, C. S. & Gregg, W. W. Climate variability and phytoplankton composition in the Pacific Ocean. J. Geophys. Res. Oceans 117 (2012).
Poloczanska, E. S. et al. Global imprint of climate change on marine life. Nat. Clim. Change. 3, 919–925 (2013).
Butterwick, C., Heaney, S. & Talling, J. Diversity in the influence of temperature on the growth rates of freshwater algae, and its ecological relevance. Freshw. Biol. 50, 291–300 (2005).
Joehnk, K. D. et al. Summer heatwaves promote blooms of harmful cyanobacteria. Glob. Change Biol. 14, 495–512 (2008).
Lürling, M., Eshetu, F., Faassen, E. J., Kosten, S. & Huszar, V. L. Comparison of cyanobacterial and green algal growth rates at different temperatures. Freshw. Biol. 58, 552–559 (2013).
Yang, Z., Zhang, M., Yu, Y. & Shi, X. Temperature triggers the annual cycle of Microcystis, comparable results from the laboratory and a large shallow lake. Chemosphere 260, 127543 (2020).
Kosten, S. et al. Warmer climates boost cyanobacterial dominance in shallow lakes. Glob. Change Biol. 18, 118–126 (2012).
Wang, Q., Yang, X., Hamilton, P. B. & Zhang, E. Linking spatial distributions of sediment diatom assemblages with hydrological depth profiles in a plateau deep-water lake system of subtropical China. Fottea 12, 59–73 (2012).
Alvain, S., Moulin, C., Dandonneau, Y. & Loisel, H. Seasonal distribution and succession of dominant phytoplankton groups in the global ocean: a satellite view. Glob. Biogeochem. Cycles 22 (2008).
Thomas, M. K., Kremer, C. T., Klausmeier, C. A. & Litchman, E. A global pattern of thermal adaptation in marine phytoplankton. Science 338, 1085–1088 (2012).
Thoppil, P. G. Enhanced phytoplankton bloom triggered by atmospheric high-pressure systems over the Northern Arabian Sea. Sci. Rep. 13, 769 (2023).
Winder, M. & Sommer, U. Phytoplankton response to a changing climate. Hydrobiologia 698, 5–16 (2012).
Henson, S. A., Cael, B., Allen, S. R. & Dutkiewicz, S. Future phytoplankton diversity in a changing climate. Nat. Commun. 12, 5372 (2021).
Huisman, J. et al. Cyanobacterial blooms. Nat. Rev. Microbiol. 16, 471–483 (2018).
Moreno-Ostos, E., Cruz-Pizarro, L., Basanta, A. & George, D. G. The influence of wind-induced mixing on the vertical distribution of buoyant and sinking phytoplankton species. Aquat. Ecol. 43, 271–284 (2009).
Xiong, J. et al. Development of remote sensing algorithm for total phosphorus concentration in eutrophic lakes: conventional or machine learning? Water Res. 215, 118213 (2022).
Deininger, A., Faithfull, C. L. & Bergström, A. K. Phytoplankton response to whole lake inorganic N fertilization along a gradient in dissolved organic carbon. Ecology 98, 982–994 (2017).
Chen, W., Wang, X. & Yang, S. Response of phytoplankton community structure to environmental changes in the coastal areas of northern China. Mar. Pollut. Bull. 195, 115300 (2023).
Elser, J. J. et al. Global analysis of nitrogen and phosphorus limitation of primary producers in freshwater, marine and terrestrial ecosystems. Ecol. Lett. 10, 1135–1142 (2007).
Reynolds, C. S. The Ecology of Phytoplankton (Cambridge University Press, 2006).
Mustapha, Z. B., Alvain, S., Jamet, C., Loisel, H. & Dessailly, D. Automatic classification of water-leaving radiance anomalies from global SeaWiFS imagery: application to the detection of phytoplankton groups in open ocean waters. Remote Sens. Environ. 146, 97–112 (2014).
Gregg, W. W. & Casey, N. W. Modeling coccolithophores in the global oceans. Deep Sea Res. Part II. 54, 447–477 (2007).
Chen, M., Li, J., Dai, X., Sun, Y. & Chen, F. Effect of phosphorus and temperature on chlorophyll a contents and cell sizes of Scenedesmus obliquus and Microcystis aeruginosa. Limnology 12, 187–192 (2011).
Cetinić, I. et al. Phytoplankton composition from sPACE: requirements, opportunities, and challenges. Remote Sens. Environ. 302, 113964 (2024).
Pan, X., Mannino, A., Marshall, H. G., Filippino, K. C. & Mulholland, M. R. Remote sensing of phytoplankton community composition along the northeast coast of the United States. Remote Sens. Environ. 115, 3731–3747 (2011).
Acknowledgements
The research was jointly supported by the National Natural Science Foundation of China (41971322, 42171374, 42101366, 42371390), the Natural Science Foundation of Jilin Province, China (YDZJ202401474ZYTS), and the Young Scientist Group Project of Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences (2023QNXZ01).
Author information
Contributions
Zhaojiang Yan: Conceptualization, Methodology, Writing - Original Draft. Chong Fang: Methodology, Conceptualization, Writing - Review & Editing. Kaishan Song: Funding acquisition, Resources. Xiangyu Wang: Software, Visualization. Zhidan Wen: Data curation, Validation. Yingxin Shang: Data curation, Validation. Hui Tao: Data curation, Validation. Yunfeng Lyu: Funding acquisition, Resources, Validation. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yan, Z., Fang, C., Song, K. et al. Spatiotemporal variation in biomass abundance of different algal species in Lake Hulun using machine learning and Sentinel-3 images. Sci Rep 15, 2739 (2025). https://doi.org/10.1038/s41598-025-87338-4