Introduction

The Southern Ocean plays a key role in regulating global climate through its influence on the Meridional Overturning Circulation (MOC). Buoyancy loss near the Antarctic shelf drives the formation of Dense Shelf Water (DSW) and Antarctic Bottom Water (AABW), ventilating the abyssal ocean, while wind- and eddy-driven upwelling supports the return flow of deep waters to the surface1. The upwelling of Circumpolar Deep Water (CDW) brings nutrients and carbon to the surface, fuelling biological productivity and influencing air-sea carbon fluxes2,3, while ventilation of the ocean through formation of AABW, Subantarctic Mode Water (SAMW), and Antarctic Intermediate Water (AAIW) regulates the uptake and storage of heat4,5,6,7.

Of particular importance to the future stability of Antarctic ice sheets is Circumpolar Deep Water (CDW), which is known to be the major source of oceanic heat near the Antarctic shelf region8. CDW forms at depth as a mixture between multiple water masses, including North Atlantic Deep Water (NADW), AABW, Pacific Deep Water, and Indian Deep Water. Its persistence and renewal on centennial timescales arise from the integrated effect of the global overturning circulation, including the slow return of NADW and Pacific/Indian deep waters to the Southern Ocean interior via isopycnal advection and diapycnal mixing9,10. At the Antarctic shelf break, CDW typically exhibits potential temperatures (θ) above 0°C, making it warm enough to contribute substantially to basal melting of ice shelves11. The processes governing CDW supply to the ice shelf are complex and often uncertain, although studies have shown eddies12, wind forcing13, bathymetry14 and buoyancy forcing15 all play important roles at a variety of timescales16.

The response of CDW to climate change may depend on changes in both the position and strength of the Antarctic Circumpolar Current (ACC) and the Southern Hemisphere westerlies. Observations suggest a poleward shift in the southern extent of the ACC in recent decades17,18, consistent with a contemporaneous poleward displacement of the westerly wind belt associated with a more positive Southern Annular Mode (SAM)19,20. Such shifts could relocate wind-driven CDW upwelling, promoting a poleward migration of warm CDW toward the Antarctic continental margin. Similar linkages between ACC migration and global climate have been inferred on palaeoclimatic timescales21,22, and there is emerging regional evidence for wind-driven poleward CDW migration in East Antarctica23. Consistently, coupled climate models project a continued poleward shift of the westerly winds under future warming scenarios24.

In addition to meridional shifts, strengthening of the westerly winds—observed in recent decades and projected by climate models24,25—may also influence the meridional structure of CDW. Although the ACC core is widely considered eddy-saturated, such that its transport remains relatively insensitive to wind intensification26,27, stronger winds may enhance the meridional eddy-driven residual circulation, facilitating a southward migration of CDW17,28. Observational evidence for a corresponding increase in eddy kinetic energy remains mixed, with some studies reporting increases29 and others finding little change outside specific regions30. Eddy-saturated behaviour is most clearly associated with wind forcing over the ACC belt itself; wind anomalies located closer to Antarctica can instead drive a largely barotropic response, with little associated change in eddy activity31.

Importantly, CDW responses to changes in the westerlies and the SAM are dynamically heterogeneous and occur on multiple timescales. Modelling studies demonstrate a two-timescale response to wind forcing, with a rapid Ekman and barotropically mediated adjustment that initially cools the Antarctic surface ocean, followed by a slower baroclinic response in which sustained wind forcing enhances upwelling of warm CDW, leading to longer-term warming31,32. This slower adjustment involves changes in stratification, residual overturning, and subpolar gyre dynamics, may evolve over years to decades, and exhibits zonal variability33,34.

AABW and its precursor DSW exert a strong control on the vertical and meridional structure of Southern Ocean water masses. In addition to contributing to the lower limb of the MOC, DSW on and near the continental shelf forms a dynamical barrier that limits CDW access to ice shelves15,35,36. DSW formation is highly localised, occurring primarily in the Weddell and Ross seas, Prydz Bay, and along the Adélie Coast37,38, regions that consequently experience some of the lowest observed ice shelf melt rates39. Downslope of the shelf, the thickness of the cascading AABW/DSW layer constrains the meridional extent of the CDW layer; freshwater-forced simulations indicate that contraction of this layer leads to a poleward migration of CDW40. The strength of DSW and AABW formation therefore regulates the amount of oceanic heat within the CDW layer that can reach Antarctic ice shelves.

Beyond their impact on Antarctic melt, AABW and CDW also play a key role in setting the global distribution and ventilation of oceanic tracers. CDW is brought to the surface in the Southern Ocean via wind-driven and eddy-mediated upwelling, where it can release preformed nutrients and dissolved carbon to the atmosphere, contributing to natural CO2 outgassing2,3,41. In contrast, AABW acts as a long-term reservoir for tracers, ventilating the abyssal ocean and sequestering heat and anthropogenic carbon on multi-century timescales42,43. Any changes in the volume, properties, or formation rate of these water masses may therefore alter global tracer inventories, impacting both the rate of heat uptake and the efficiency of carbon storage in the ocean interior.

Observations show that AABW has warmed, freshened, and contracted in recent decades44,45,46,47,48,49, with declining oxygen concentrations suggesting reduced abyssal ventilation50,51. The strongest warming occurs at high southern latitudes near DSW formation regions and may be linked to changes in AABW source water properties on or near the continental shelf15,52,53. Reduced AABW volume has been reported in the Weddell Sea, the Australian-Antarctic Basin, and as far north as the North Pacific44,51,54,55. Although separating the roles of reduced formation and altered source water properties remains challenging given sparse observations, climate models consistently project a future slowdown of AABW production under continued warming, despite known limitations in their representation of formation processes40,56,57.

Changes in Southern Ocean water mass structure have important implications for Antarctic glacial melt, sea level rise, and climate. While changes in AABW have been widely documented, comparatively little attention has been given to observable changes in CDW. Here we present observational evidence that CDW has migrated poleward across the upper 2000 m of the Southern Ocean over the past two decades. We apply an Optimum Multiparameter (OMP) water mass classification to repeat ship-based hydrographic sections and a gridded climatology to provide a baseline description of the mean Southern Ocean water mass structure (Section ‘Climatology’). We then use a machine learning approach to extend this ship-based water mass classification across the Southern Ocean by filling gaps between sparse ship observations using the Roemmich-Gilson (RG) Argo gridded monthly dataset58. This combined framework allows us to assess temporal changes in water mass distributions, with a primary focus on CDW and secondary consideration of changes in other water masses (Section ‘Temporal changes in water mass distribution’). We find a poleward migration of CDW in the upper 2000 m across almost all longitudes in both the ship sections and the Argo dataset. This signal is the leading-order mode of non-seasonal CDW variability according to our Empirical Orthogonal Function (EOF) analysis, and is accompanied by a contraction in the downslope cascade of AABW/DSW. Finally, the implications of the observed changes are discussed in Section ‘Discussion and conclusions’.

Results

To explore recent changes in the Southern Ocean water mass structure we use a multistep analysis. The method is set out schematically in Fig. 1. We begin by conducting an OMP water mass classification to describe the mean state of Southern Ocean water masses in the Global Ocean Data Analysis Project (GLODAP) climatological product (Section ‘Climatology’; step 0 in Fig. 1). Establishing the mean state is necessary in order to interpret subsequent variability. We then examine the changes in water mass distribution in two ways. For all time-varying analyses, we vary our chosen end members in time to control for multi-decadal trends in source water properties (Section ‘Time-varying end members’). The effects of these end-member property variations are explored in detail and associated uncertainties are quantified (see Sections ‘Machine learning model and application to Argo data’ and S2.1.1). All OMP analyses presented here are based on the framework of Tomczak59, in which end members (canonical, unmixed water masses with distinct physical-biogeochemical properties) are defined for a range of source waters, and water mass fractions are obtained by solving a system of linear equations subject to non-negativity constraints.
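The constrained linear solve at the heart of an OMP classification can be illustrated with a minimal sketch. It is not the implementation used in this study: the end-member values, the weights, the function name `omp_fractions`, and the standardisation step are all assumptions chosen for illustration, and the non-negative least-squares problem is solved by a brute-force search over active sets that is practical only for a handful of end members (a dedicated solver such as `scipy.optimize.nnls` would normally be preferred).

```python
import itertools

import numpy as np


def omp_fractions(end_members, sample, tracer_weights, mass_weight=1.0):
    """Return non-negative water mass fractions and the weighted residual norm.

    end_members    : (n_tracers, n_masses) end-member property matrix
    sample         : (n_tracers,) observed properties of one sample
    tracer_weights : (n_tracers,) relative weight of each tracer equation
    mass_weight    : weight of the mass-conservation equation (fractions sum to 1)
    """
    G = np.asarray(end_members, dtype=float)
    d = np.asarray(sample, dtype=float)
    w = np.asarray(tracer_weights, dtype=float)
    n_masses = G.shape[1]

    # Standardise each tracer across end members so that tracers with
    # different units contribute comparably, then apply the weights.
    mean = G.mean(axis=1, keepdims=True)
    std = G.std(axis=1, keepdims=True)
    A = w[:, None] * (G - mean) / std
    b = w * (d - mean[:, 0]) / std[:, 0]

    # Append mass conservation as one extra weighted equation.
    A = np.vstack([A, mass_weight * np.ones(n_masses)])
    b = np.append(b, mass_weight)

    # Brute-force active-set search: solve ordinary least squares on every
    # subset of end members and keep the best solution with no negative entry.
    best_x = np.zeros(n_masses)
    best_residual = np.linalg.norm(b)  # residual of the all-zero solution
    for size in range(1, n_masses + 1):
        for support in itertools.combinations(range(n_masses), size):
            cols = list(support)
            sol = np.linalg.lstsq(A[:, cols], b, rcond=None)[0]
            if np.any(sol < -1e-12):
                continue  # violates the non-negativity constraint
            x = np.zeros(n_masses)
            x[cols] = np.clip(sol, 0.0, None)
            residual = np.linalg.norm(A @ x - b)
            if residual < best_residual:
                best_x, best_residual = x, residual
    return best_x, best_residual


# Illustrative (not published) end members: columns are CDW, AABW, AAIW.
end_members = np.array([[1.5, -0.5, 4.0],        # potential temperature (deg C)
                        [34.70, 34.66, 34.20]])  # salinity
true_fractions = np.array([0.6, 0.3, 0.1])
sample = end_members @ true_fractions            # a perfectly mixed sample

fractions, residual = omp_fractions(end_members, sample, tracer_weights=[1.0, 1.0])
```

Because the synthetic sample is an exact mixture, the recovered `fractions` match `true_fractions` to numerical precision; for real observations the residual is non-zero and serves as a measure of classification quality.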

Fig. 1: Schematic overview of the analysis workflow used in this study, illustrating the main processing steps.

0) An OMP water mass classification framework is applied to the GLODAP gridded climatological product to derive the Southern Ocean water-mass mean state (Section ‘Climatology’). 1) Raw ship-based GO-SHIP hydrographic observations, including biogeochemical tracers, are then used to estimate relative water mass fractions along individual hydrographic sections using OMP analysis, yielding section-based distributions of CDW and other water masses. 2) A random forest model ensemble is trained on the section-based water-mass classifications in the absence of biogeochemical tracers and applied to gridded Argo float data. 3) This produces a basin-wide, monthly water-mass classification, providing a spatially and temporally continuous estimate of water mass variability across the Southern Ocean. The outputs of steps 1) and 3) are subsequently used to diagnose CDW variability along sections and at the basin scale, respectively (Section ‘Temporal changes in water mass distribution’). Each method identifies a poleward migration of CDW in the upper 2000 m in recent decades. Uncertainty in the basin-wide CDW estimates is quantified from the spread of the ensemble classification, with robustness assessed through sensitivity tests to source-water definitions, tracer end-member uncertainties, and OMP weighting choices.

To quantify temporal changes, we first calculate changes in water mass percentages along repeat Global Ocean Ship-based Hydrographic Investigations Program (GO-SHIP) cruise sections (lines shown in Fig. 2; method shown as Step 1 in Fig. 1). Second, we train a Random Forest (RF) model using the output of this point-wise water mass classification, and apply it to the Roemmich-Gilson gridded Argo dataset58 (Step 2 in Fig. 1) to produce a 20-year gridded classification of Southern Ocean water masses at high spatial and temporal resolution (Step 3 in Fig. 1). Using this gridded classification, we examine local monthly variability of all water masses in the upper 2000 m of the Southern Ocean. The results of both these analyses are presented in Section ‘Temporal changes in water mass distribution’.

Fig. 2: GO-SHIP/GLODAP data and associated water mass end members used in this study.

a Shows all individual cruise lines used throughout the analysis, coloured by CDW+NADW fraction at 2000 m according to our water mass classification (Section ‘OMP water mass classification’). b Shows the θ-salinity bivariate histogram of the gridded GLODAP climatology dataset from Lauvset et al.60, along with our chosen temperature and salinity end members for each water mass which are adapted from Pardo et al.65. c, d Show scatter plots of the upper and lower red boxes in (b), respectively. The scatter points are coloured by dissolved oxygen concentration.

Climatology

Here we describe the multidecadal mean Southern Ocean water-mass distribution derived from the gridded GLODAP climatology (1972–2013)60. These fields are used primarily to provide biogeochemical context for the large-scale water-mass structure, rather than to define a dynamical baseline, as the climatological θ-salinity fields are not optimised for physical analyses. Water masses are classified using a standard OMP approach (Section ‘OMP water mass classification’), with end members shown in θ-salinity space in Fig. 2. Applying OMP to gridded GLODAP data enables classification of AABW, CDW, AAIW, and SAMWs at 33 depth levels between 0 and 5500 m on a 1 × 1 degree grid.

AABW dominates abyssal depths, consistent with Southern Ocean observations61. At mid-to-upper levels, its downslope cascade appears as a narrow band of elevated concentrations along the Antarctic margin (Fig. 3). The AABW precursor, DSW, is confined to relatively shallow shelf depths (Fig. S1) and forms primarily in the Weddell and Ross seas, Prydz Bay, and along the Adélie Coast37,38. Spatial variability in DSW production strongly influences the downstream distribution of AABW, with peak concentrations in the Weddell Sea sector and persistently high values across much of East Antarctica and the Ross Sea, but markedly reduced concentrations in West Antarctica where DSW formation is absent (Fig. S2).

Fig. 3: Circumpolar distribution of Southern Ocean water mass fractions.

Climatological water mass classification at 500 m (top), 1000 m (middle) and 2000 m (bottom) for distinct water mass fractions (rows), computed from the mapped GLODAP climatology product (Section ‘GLODAP and GO-SHIP data’) using the OMP analysis described in Section ‘OMP water mass classification’. Fractions lower than 5% are removed. The background is NASA satellite-derived land-cover data from Natural Earth and ocean bottom data from CleanTOPO2.

We identify a core of CDW that upwells along tilted isopycnals associated with strong Southern Ocean westerlies. CDW is ubiquitous at mid-depths away from the continent, with concentrations increasing poleward as depth decreases (Fig. 3). Concentrations are lower in the Atlantic sector, where southward-penetrating NADW dominates at depth (Fig. S3). CDW access to the Antarctic continental shelf is strongly modulated by the presence of DSW and AABW. In the Weddell Sea and East Antarctic sectors, DSW occupies the near-shelf region and inhibits CDW intrusion, whereas in the warm-shelf West Antarctic sector, CDW extends onto the continental shelf (Fig. 4). Away from the shelf, the downslope cascade of AABW defines the southern boundary of upwelling CDW (Fig. 3).

Fig. 4: Meridional-vertical structure of Southern Ocean water mass fractions.

Water mass classification at three longitudes (rows), computed from the mapped GLODAP climatology product (Section ‘GLODAP and GO-SHIP data’) using the OMP analysis described in Section ‘OMP water mass classification’.

Antarctic Surface Water (AASW) occupies the near-continental surface layers south of the ACC, extending northward to the sub-Antarctic front (Figs. S4 and S1; Fig. S5 for the location of the major ACC fronts). North of this, SAMW appears as a shallow subsurface layer on the equatorward side of the sub-Antarctic front, deepening toward lower latitudes, consistent with wintertime subduction and northward advection4,5. AAIW forms a distinct layer beneath SAMW and above the upwelling CDW core north of the polar front, in line with established formation mechanisms involving surface subduction and mixing with CDW4,62,63. Subtropical Central Water (STCW) dominates the upper ocean at lower latitudes above the SAMW/AAIW layers, reflecting warm, saline surface waters characteristic of subtropical regions (Fig. S1).

We note several regions where the classification departs from expectations of the large-scale Southern Ocean circulation. First, a patch of elevated NADW fractions appears at mid-depths north of the Adélie Coast (Fig. S3), outside the depth and longitudinal range typically associated with NADW. This is likely due to misclassification of CDW as NADW, reflecting their similar end-member properties; accordingly, CDW and NADW are grouped in Figs. 3 and 4. This distinction is not fundamental to our analysis, as NADW is a precursor to CDW. Second, a shallow layer of high AABW fractions near the continental shelf (Fig. 4) likely reflects Winter Water formed from the previous winter’s mixed layer and subsequently capped by seasonal warming and freshening64. Finally, while two SAMW end members are included following Pardo et al.65, the SAMW2 end member lies close to AAIW in thermohaline space (Fig. 2), leading to a deeper-than-expected SAMW distribution (Fig. 4). Sensitivity tests omitting SAMW2 show that resulting changes are largely confined to AAIW (Fig. S6), indicating redistribution between neighbouring classes rather than a substantive change in overall structure.

Temporal changes in water mass distribution

To explore temporal changes in the volume and distribution of Southern Ocean water masses, we first classify water masses along individual GO-SHIP hydrographic sections and quantify changes between repeated occupations (for a more detailed overview of the OMP outputs, see Figs. S7, S8, S9, and S10). The method follows a standard OMP framework, with the addition of minor end member adjustments to account for source water variability (Section ‘OMP water mass classification’). Zonal-mean changes in water mass fraction are quantified following the mapping and differencing method outlined in Section ‘Time-varying end members’. Differences between the zonal-mean early (2005–2010) and late (post-2015) sections are computed by interpolating observations onto a common latitude-depth grid within each longitude bin and averaging across sectors (Fig. 5).

Fig. 5: Changes in CDW fraction between early (2005–2010) and late (post-2015) GO-SHIP sections of the Southern Ocean.

a Shows all data from GO-SHIP sections used in this study, coloured by relative CDW fraction. Individual sections are then interpolated onto the GLODAP climatology grid and smoothed with Gaussian filtering (σ_depth = 1.5, σ_latitude = 2.0), and the changes between the early and late sections are shown in (b–d). The East Antarctic sector is defined as the zonal mean of the interpolated 9° longitudinal bins within the range 0–180°E. The West Antarctic sector is defined as the zonal mean of the interpolated 9° longitudinal bins within the range 180–70°W, which includes the P18 line at ~105°W (see Fig. S9 for the original classification). Topography is estimated from the GLODAP climatology and overlaid in black, while missing data are shown in light grey. Only grid points for which observations are present in both the early and late datasets are included in the analysis.

While repeat GO-SHIP sections provide physically robust and tracer-rich estimates of water-mass composition, their spatial and temporal sampling is inherently limited. In contrast, the Argo programme offers near-global coverage of the upper 2000 m of the Southern Ocean with monthly sampling since the early 2000s, but is largely restricted to temperature and salinity observations. To bridge this trade-off between physical completeness and spatial coverage, we train a random-forest ensemble algorithm to estimate water-mass fractions using only potential temperature, salinity, pressure, latitude, and longitude (Section ‘Machine learning model and application to Argo data’ and Fig. 1). The model is trained on OMP-derived water-mass fractions from three decades of GO-SHIP observations and subsequently applied to a monthly 1 × 1 degree gridded Argo climatology spanning January 2004 to January 2024. This approach enables the projection of physically interpretable water-mass definitions into data-rich but tracer-sparse regions, substantially extending the spatial and temporal reach of the OMP framework and allowing the investigation of decadal variability at basin scale (Section ‘Machine learning model and application to Argo data’).

Zonal-mean redistribution of CDW in repeat hydrographic sections

Figure 5 shows that the strongest changes in CDW fraction are concentrated in the upper 2000 m, characterised by increased CDW occurrence at higher latitudes and reduced occurrence at lower latitudes. In the zonal mean, enhanced CDW occurrence is concentrated near the continental slope and shelf break, while reductions are apparent on the equatorward flank of the tilted upwelling CDW layer. The orientation of these anomalies broadly mirrors that of CDW upwelling, with positive anomalies preferentially located on the poleward side of the CDW tongue near the continental slope. These results indicate a poleward redistribution of CDW near the continental shelf of the Southern Ocean. Because the dominant signal is confined to the mid-to-upper ocean, this depth range is well suited to investigation using Argo observations, which provide dense sampling of the upper 2000 m.

Basin-scale CDW layer thickness trends from Argo

To quantify the magnitude and basin-scale structure of this redistribution, we next examine CDW variability diagnosed from our machine learning model applied to Argo observations. We first compute integrated layer thicknesses for each water mass at each time-step (Section S1.7). Figure 6 shows the linear trend in layer thickness for a selection of water masses over the Argo period. The trend in CDW layer thickness in the upper 2000 m across the two decades of Argo observations is consistent with the changes inferred from the repeat sections, with CDW undergoing a circumpolar layer expansion near Antarctica and contraction further north (Fig. 6). Integrating the thickness trends zonally and meridionally yields a volume expansion estimate for CDW of 8.7 × 10¹³ m³ (2.17% increase) in the Weddell Sea (50°W–20°E, 65–55°S) and 3.3 × 10¹³ m³ (0.55% increase) in East Antarctica (20–170°E, 65–60°S). The equivalent contractions observed to the north in these sectors (to 30°S) are 1.6 × 10¹⁴ m³ (1.71% decrease) and 3.0 × 10¹⁴ m³ (1.15% decrease), respectively.
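As a rough illustration of how a layer thickness and a regional volume change can be obtained from gridded water mass fractions, the sketch below assumes that the integrated layer thickness is the vertical integral of the fraction profile and that each grid cell is a spherical latitude-longitude box. The exact procedure is given in Section S1.7 and may differ; the function names, the Earth radius, and the sample profile are illustrative assumptions.

```python
import math

EARTH_RADIUS_M = 6.371e6  # mean Earth radius (assumed spherical Earth)


def layer_thickness(depths_m, fractions):
    """Trapezoidal integral of a water mass fraction profile over depth (m)."""
    total = 0.0
    for i in range(len(depths_m) - 1):
        dz = depths_m[i + 1] - depths_m[i]
        total += 0.5 * (fractions[i] + fractions[i + 1]) * dz
    return total


def cell_area_m2(lat_south, lat_north, dlon_deg=1.0):
    """Area of a latitude-longitude cell on a sphere (m^2)."""
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_M ** 2 * math.radians(dlon_deg) * band


def volume_change_m3(trend_m_per_yr, n_years, lat_south, lat_north, dlon_deg=1.0):
    """Volume change implied by a thickness trend acting over one grid cell."""
    return trend_m_per_yr * n_years * cell_area_m2(lat_south, lat_north, dlon_deg)


# Hypothetical CDW fraction profile at a single grid point.
depths = [0.0, 500.0, 1000.0, 2000.0]
cdw_fraction = [0.0, 0.5, 1.0, 1.0]
thickness = layer_thickness(depths, cdw_fraction)

# Area of a 1 x 1 degree cell centred on 60 S, and the volume change implied
# by a hypothetical thickening of 2 m per year sustained for 20 years.
area = cell_area_m2(-60.5, -59.5)
delta_volume = volume_change_m3(2.0, 20, -60.5, -59.5)
```

Summing `volume_change_m3` over all cells in a sector gives a regional estimate of the kind quoted above; the percentage changes additionally require the time-mean volume of the layer in the same sector.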

Fig. 6: Linear trend in the integrated layer thickness of each water mass for the period 2004–2023.

Trends are estimated using a heteroskedasticity and autocorrelation-consistent (HAC, Newey-West) regression. Stippling indicates regions where the trend is significant at the 99% confidence level after controlling the false discovery rate (5%). The integrated layer thickness of the four water mass classes is computed from the monthly 1 × 1 degree Argo θ-S gridded climatology using the RF-based water mass classification. The time-mean absolute layer thicknesses are shown in Fig. S11 for reference. The layer-thickness calculation method is described in Section S1.7.
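Controlling the false discovery rate across many grid points is commonly done with the Benjamini-Hochberg step-up procedure. The caption does not state which procedure is used here, so the sketch below should be read as one plausible choice rather than the method of this study; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans flagging tests that pass FDR control at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose ordered p-value satisfies p_(k) <= q * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            k_max = rank

    # Every test ranked at or below k_max is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            significant[idx] = True
    return significant


# Hypothetical p-values from trend tests at five grid points.
grid_p_values = [0.039, 0.001, 0.212, 0.008, 0.06]
stipple_mask = benjamini_hochberg(grid_p_values, q=0.05)
```

In a mapped product, the p-values from all ocean grid points would be flattened into one list, and the resulting mask reshaped to the grid to define the stippling.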

The pattern of CDW expansion near the continent appears to mirror the AABW/DSW contraction in Fig. 6: the CDW layer exhibits the greatest expansion in the Weddell Sea, with smaller peaks in East Antarctica near the Adélie Coast and Prydz Bay. In these regions, the increase in CDW volume is largely compensated by a reduction in AABW/DSW volume: AABW/DSW undergoes a volume reduction of 8.2 × 10¹³ m³ in the Weddell Sea and 4.1 × 10¹³ m³ in East Antarctica (compared with increases of 8.7 × 10¹³ m³ and 3.3 × 10¹³ m³ for CDW). This compensation is also visible in Fig. 7. Correlations between the AABW/DSW and CDW layer thickness time-series in regions of substantial climatological AABW/DSW concentrations are strongly negative and significant at the 99% level. Taking an average for both AABW/DSW and CDW within the red box shown and removing seasonal variability yields the time-series shown in the bottom row of Fig. 7. Both the trend of increasing CDW thickness and interannual variability are largely compensated by opposing changes in the thickness of the AABW/DSW layer. Preliminary analysis suggests that CDW expansion at the expense of AABW/DSW near the continent results primarily from a poleward movement in the region of highest mixing, rather than an increase in overall CDW-AABW/DSW mixing (Fig. S12).

Fig. 7: Correlations between CDW and other water masses.

The row of top panels shows the r-value of the correlation between the CDW layer thickness time-series and the equivalent time-series for AABW + DSW, AAIW and both SAMWs, with seasonal variability removed. Water mass fractions are obtained from the application of the water mass-classification algorithm to Argo data. These are filtered to only show regions in which the r-value of the correlation is significant at the 99% level, assuming 18 degrees of freedom (due to the removal of seasonal variability). We also mask any region in which the integrated layer thickness of the water mass is less than 50 m. The panels directly below show the time-series of each water mass layer thickness along with the CDW thickness time-series, averaged within the indicated red box. Linear trends in these time-series are estimated using HAC Newey-West regression, and are all significant at the 99% confidence level. Note that for ease of comparison, the negative of the time-series of the non-CDW water mass is shown.
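For orientation, the correlation magnitude needed for significance can be worked out from the usual t-test for a Pearson correlation. The calculation below assumes that the 18 degrees of freedom enter as ν in that test and that the test is two-sided; neither detail is stated in the caption.

```latex
t = r\sqrt{\frac{\nu}{1 - r^{2}}}
\quad\Longrightarrow\quad
r_{\mathrm{crit}} = \frac{t_{\mathrm{crit}}}{\sqrt{t_{\mathrm{crit}}^{2} + \nu}},
\qquad
\nu = 18,\;\; t_{0.995,\,18} \approx 2.878
\;\;\Longrightarrow\;\;
r_{\mathrm{crit}} \approx \frac{2.878}{\sqrt{2.878^{2} + 18}} \approx 0.56
```

Under these assumptions, only grid points with |r| above roughly 0.56 would be retained in the top panels.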

However, near-continent CDW layer expansion is also present in West Antarctica, where upper 2000 m concentrations of AABW/DSW are negligible (Section ‘Climatology’). Figure 3 shows that, at mid-depths between 70 and 160°W, AAIW extends as far south as 65°S (the southern extent of the gridded Argo dataset), such that, in this region, AAIW is the second most prevalent water mass (behind CDW). This is also visible in Fig. 4: the intermediate water tongue is located relatively far south and in close proximity to the continent at 120°W, but terminates around 60°S elsewhere. Figure 7 shows the correlation between the CDW and AAIW layer thicknesses (top middle panel). Correlations are negative in a circumpolar band which largely exists away from the continent to the north of the AABW/DSW sector, moving further south in the West Antarctic sector to encompass the region of CDW expansion (Fig. 6). The time-series of each water mass averaged within this near-continent sector (shown in the red box) are shown in the bottom row in Fig. 7. Both the trend and interannual variability of CDW are compensated by opposing changes in AAIW. This suggests that, in this region, the increase in the volume of CDW is primarily compensated by a reduction in the volume of AAIW. This is consistent with the results shown in Fig. 5c, in which a relatively shallow near-continent increase in CDW fraction is found outside the West Antarctic depth range of AABW.

In contrast, the contraction of CDW away from the continent is counteracted almost entirely by a circumpolar expansion of SAMW. SAMW is formed on the northern flank of the ACC5,66,67 and is visible in Fig. 3 as a band of high concentrations north of the sub-Antarctic front. Figure 7 shows strong negative correlations with CDW time-series away from the continent. Taking an average north of 55°S across all longitudes and removing seasonal variability yields the time-series shown in the bottom row of Fig. 7, which demonstrates that the consistent decline in CDW volume in this region occurs concurrently with an expansion of SAMW.

Despite differences in sampling and temporal averaging, the GO-SHIP and Argo analyses show broadly consistent CDW fraction anomalies, with enhanced CDW occurrence concentrated at higher latitudes in the upper ocean (Fig. S13). Quantitative differences are expected given the sparse spatial and temporal coverage of GO-SHIP sections, which motivates our use of a machine-learning framework applied to Argo observations to resolve the full, basin-scale pattern of variability.

Rates and dominant modes of CDW variability

To quantify changes in the upper 2000 m CDW boundary, we identify at each longitude the polewardmost latitude at which CDW layer thickness exceeds a fixed threshold chosen to represent the upper CDW core (1450 m, approximately the 95th percentile of CDW layer thickness), and fit a linear trend to this latitude through time. The slope of each regression gives the local migration rate. To account for temporal autocorrelation in the annual-mean time series, trends are estimated using ordinary least squares with Newey-West HAC standard errors. Circumpolar means are computed as cos(latitude)-weighted averages of the per-longitude trends, with 95% confidence intervals obtained from a 20° longitude block bootstrap (2000 resamples), which accounts for spatial correlation between neighbouring longitudes. Using this method, we find a circumpolar-mean poleward migration of 1.26 km yr⁻¹ (95% CI: 0.53–1.98 km yr⁻¹). Zonal poleward migration rates differ around Antarctica, with the largest values in the Weddell Sea (2.39 km yr⁻¹), intermediate rates in East Antarctica (1.31 km yr⁻¹), and weaker migration in West Antarctica (0.80 km yr⁻¹). Within the 60–65°S latitude band, the ocean heat content within the CDW layer increased at a rate of 2.81 TW (95% CI: 2.0–3.6 TW) based on the RG Argo product. The confidence interval reflects the uncertainty of the trend, calculated from the standard error of the slope estimated using a HAC regression. This indicates sustained warming along the Antarctic margin over the analysis period.
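The boundary-tracking and trend steps described above can be sketched for a single longitude as follows. This is a minimal illustration under stated assumptions: the Bartlett kernel with a maximum lag of 2, the conversion of about 111.2 km per degree of latitude, the function names, and the synthetic time series are all choices made here, not details taken from the study. The cos(latitude) weighting and the block bootstrap are not shown.

```python
import numpy as np

KM_PER_DEG_LAT = 111.2  # approximate length of one degree of latitude


def poleward_edge(lats, thickness, threshold=1450.0):
    """Southernmost latitude at which thickness exceeds the threshold (NaN if none)."""
    lats = np.asarray(lats, dtype=float)
    above = np.asarray(thickness, dtype=float) > threshold
    return lats[above].min() if above.any() else np.nan


def ols_trend_hac(t, y, max_lag=2):
    """OLS slope of y against t with a Newey-West (Bartlett kernel) standard error."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    t = t - t.mean()  # centring leaves the slope unchanged and improves conditioning
    X = np.column_stack([np.ones_like(t), t])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta

    # HAC "meat": lag-0 outer products plus down-weighted lagged cross terms.
    Xe = X * resid[:, None]
    S = Xe.T @ Xe
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / (max_lag + 1.0)
        gamma = Xe[lag:].T @ Xe[:-lag]
        S += weight * (gamma + gamma.T)

    bread = np.linalg.inv(X.T @ X)
    cov = bread @ S @ bread
    return beta[1], np.sqrt(cov[1, 1])


# Boundary latitude for one hypothetical meridional thickness profile.
edge = poleward_edge([-65, -64, -63, -62], [1000, 1500, 1600, 1400])

# Synthetic annual series: boundary drifting south by 0.01 degrees per year,
# with alternating noise standing in for interannual variability.
years = np.arange(2004, 2024)
edge_lat = -60.0 - 0.01 * (years - 2004) + 0.05 * (-1.0) ** np.arange(years.size)
slope_deg_per_yr, se_deg_per_yr = ols_trend_hac(years, edge_lat)

# Southward (negative) latitude trends correspond to poleward migration.
rate_km_per_yr = -slope_deg_per_yr * KM_PER_DEG_LAT
```

Repeating the fit at every longitude gives the per-longitude rates that are then averaged into sector and circumpolar means.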

To further explore the spatial and temporal correlations in these changes, an EOF analysis is used to assess the major components of spatial variability of CDW over the last two decades in the upper 2000 m. First, the time-series of CDW layer thickness at each grid point in the Argo dataset is computed. We then apply a seasonal-trend decomposition (using locally-estimated scatterplot smoothing) to remove seasonal variability from each time-series, before undertaking the EOF analysis. Figure 8 shows the leading three modes of variability according to the analysis, alongside their corresponding principal components and variance contributions. The third mode, which contributes 10% of the variance, shows a wave-like structure which is strongest in the Pacific sector and is correlated strongly with the Pacific Decadal Oscillation (r = 0.71). The second mode, which accounts for 16% of the variance, also shows an inter-decadal oscillation with a period of between 10 and 15 years. Most importantly, however, the leading mode of variability is the same pattern of CDW expansion (contraction) near (away from) the continent that has been noted in both the repeat GO-SHIP sections (Fig. 5) and layer thickness trends from Argo data (Fig. 6). The principal component of this mode is a quasi-linear trend and leads all other modes of variability with 22% of the total variance.
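An EOF decomposition of this kind can be sketched with a singular value decomposition of the space-time anomaly matrix. The example assumes that the input has already been deseasonalised and omits any area weighting; the function name and the synthetic field (a trend pattern plus a slower oscillation and noise) are illustrative and unrelated to the Argo data.

```python
import numpy as np


def eof_analysis(field, n_modes=3):
    """EOFs of a (n_time, n_space) field via SVD.

    Returns spatial patterns (n_modes, n_space), principal components
    (n_time, n_modes), and the fraction of variance explained by each mode.
    """
    anomalies = field - field.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    variance_fraction = s ** 2 / np.sum(s ** 2)
    eofs = vt[:n_modes]
    pcs = u[:, :n_modes] * s[:n_modes]
    return eofs, pcs, variance_fraction[:n_modes]


# Synthetic example: 240 monthly steps at 20 grid points.
rng = np.random.default_rng(0)
time_years = np.arange(240) / 12.0
x = np.linspace(-1.0, 1.0, 20)
trend_pattern = x                  # dipole: expansion on one side, contraction on the other
osc_pattern = np.cos(np.pi * x)    # pattern of a slower oscillation
field = (0.5 * np.outer(time_years, trend_pattern)
         + np.outer(np.sin(2.0 * np.pi * time_years / 12.0), osc_pattern)
         + 0.1 * rng.standard_normal((240, 20)))

eofs, pcs, variance_fraction = eof_analysis(field)
```

In this synthetic case the leading principal component follows the imposed trend, analogous to the quasi-linear leading mode described above. The sign of each EOF and principal component pair is arbitrary, so only their product is physically meaningful.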

Fig. 8: Leading modes of non-seasonal CDW variability.

The output of the EOF analysis on the CDW layer thickness time-series, computed from the application of the water mass-classification algorithm to Argo data. Seasonal variability is removed from the layer thickness time-series via locally-estimated scatter-plot smoothing. The r-value and associated principal component are shown for the leading mode of variability (left panel), the second leading mode (top right panel), and the third leading mode (bottom right panel).
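As an illustration of the EOF step, the leading modes can be obtained from a singular value decomposition of the anomaly matrix. The sketch below is a minimal example under assumptions: it expects a (time × grid point) array of layer thickness that has already been deseasonalised, it omits any area weighting, and the function name `eof_analysis` is hypothetical.

```python
import numpy as np

def eof_analysis(field, n_modes=3):
    """EOFs, principal components and explained-variance fractions via SVD.

    field: array of shape (n_time, n_points), assumed already deseasonalised.
    """
    anomalies = field - field.mean(axis=0)          # remove the time mean per point
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    var_frac = s**2 / np.sum(s**2)                  # variance fraction of each mode
    pcs = u[:, :n_modes] * s[:n_modes]              # time series of each mode
    eofs = vt[:n_modes]                             # spatial pattern of each mode
    return eofs, pcs, var_frac[:n_modes]
```

A principal component that is close to a straight line in time, as reported for the leading mode, would show up here as a high correlation between `pcs[:, 0]` and the time axis.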

Observed variability in AABW/DSW, AAIW, and SAMW distribution

Above 2000 m, AABW/DSW exhibits a clear contraction in layer thickness, concentrated primarily in the Weddell Sea and the Indian-sector margin south of 60°S (Fig. 6). Integrated over the Argo period, this corresponds to a reduction in AABW/DSW volume of 8.2 × 10¹³ m³ in the Weddell Sea and 4.1 × 10¹³ m³ in East Antarctica, including the Prydz Bay and Adélie Coast sectors. In contrast, little change is detected in the Ross Sea and West Antarctic sectors, likely reflecting that AABW there lies predominantly below the Argo sampling depth. The pronounced contraction in the Weddell Sea is consistent with recent observational evidence for a sustained decline in bottom water formation in this region68, while reductions in the Indian-sector margin align with documented weakening of DSW production at Prydz Bay and along the Adélie Coast in recent decades46,48,69,70.

SAMW exhibits a widespread layer expansion across most longitudes (Fig. 6), primarily north of the ACC and sub-Antarctic front where SAMW concentrations are highest. This pattern mirrors the contraction of CDW at lower latitudes. A notable exception occurs in the western Pacific, where SAMW shows a minor near-continent contraction that may make a small contribution to local CDW expansion alongside reductions in AAIW (Fig. 7). Overall, the SAMW thickening observed here is consistent with previous studies documenting increased SAMW layer thickness and subduction in recent decades6,71,72,73,74.

AAIW predominantly thins within the upper 2000 m, consistent with previous studies6,74. This signal is strongest in the Pacific sector, where near-continent CDW expansion is largely compensated by a reduction in AAIW volume (Fig. 7). AAIW thinning is also evident north of the ACC in the Atlantic and Indian sectors, occurring between regions of SAMW thickening to the north and CDW expansion to the south (Fig. S5). Overall, the spatial pattern of AAIW change closely matches earlier observational analyses74.

Discussion and conclusions

We present a comprehensive mapping of the climatological distribution of Southern Ocean water masses and assess their decadal-scale changes using repeat GO-SHIP sections and gridded Argo float data. We define end members for nine source waters, along with Δ values (adjustments applied to θ and salinity to account for multi-decadal shifts in source-water properties during the observation period). This correction is particularly important for AABW, which has experienced measurable warming in recent years. We train an RF model ensemble on the OMP-derived water mass fractions from GO-SHIP. Initially, the model was calibrated using a full suite of inputs (θ, salinity, coordinates, and biogeochemical tracers) and validated via cross-validation on the GO-SHIP dataset. Importantly, even when restricting the training to only θ, salinity, and coordinate information, the model maintained high skill. This enabled confident application of the model to the gridded Argo dataset, which provides only these essential parameters, to generate a 20-year, high-resolution water mass classification.

We find that there is a robust signal across both datasets of an increase in the upper 2000 m CDW volume near the Antarctic continent, accompanied by a reduction in CDW volume away from it. This signal exists across almost all longitudes, albeit with some regional differences in magnitude, and is the leading mode of non-seasonal variability according to our EOF analysis. In the Weddell Sea and across the East Antarctic sector, the CDW layer expansion south of the ACC is offset by a similar contraction in the AABW/DSW layer. However, in the West Antarctic sector, the upper 2000 m increase in CDW is compensated by a reduction in AAIW. Away from the continent and to the north of the ACC, SAMW expands at the expense of CDW.

There are a number of caveats to our work which must be considered. Firstly, we make the assumption that biogeochemical end members remain unchanged during the Argo period, and that the end members that we select from the GLODAP climatology are sufficiently similar to those in the early Argo period. We also assume that the mask we use to calculate Δ values (set out in Section ‘Time-varying end members’) is a reasonable spatial approximation of the source waters (Section S1.2). Secondly, we acknowledge that, whilst the OMP inversion is mathematically robust, the selection of all end member values necessarily has a subjective component. However, we have a strong rationale for our end member selection, and have extensively tested how our results would change in the face of a variety of sensitivity tests on both original and adjusted member values.

Importantly, our main findings relating to changes in water mass distribution are robust to all reasonable choices we could make (Section S2.1.1). Of particular significance is that the upper 2000 m poleward redistribution of CDW is largely unaffected by a much greater AABW end member warming (Figs. S14 and S15), at the extreme end of what is noted in observations (Section S2.1.1). This suggests that the apparent increase in CDW fractions near the continent (at the expense of AABW/DSW) is unlikely to result from a warming of the AABW end member that is outside of the range of our chosen AABW ΔT. Moreover, further analysis reveals that the presence of this signal is largely insensitive to the choice of Δ values (Section S2.1.1 and Fig. S16). We find that the poleward CDW migration signal is also retained when the RF model ensemble is trained only on the gridded GLODAP climatology water mass classification and applied to Argo, without any Δ value adjustment or use of the GO-SHIP section data (Section S2.1.2, Figs. S17 and S18).

We interpret this CDW signal as a poleward redistribution of CDW associated with a possible shift in the upwelling core of deep water toward the continental margin. The resulting pattern of CDW layer expansion to the south and contraction to the north is consistent with the concentration of anomalies in the upper 2000 m near Antarctica. We note that the increase in CDW near the continent is largely independent of whichever water mass occupies this space. CDW expands, accompanied by a reduction in AABW/DSW or AAIW, depending on which of these occurs in substantial concentrations near the continent. Likewise, the reduction in CDW north of the ACC is ubiquitous across all longitudes. A poleward redistribution of CDW implies a compensatory adjustment by neighbouring water masses as they thicken to occupy the layers CDW previously filled. This is consistent with our results. To the north of the sub-Antarctic front (Fig. S5 for frontal locations), the reduction in CDW is largely compensated by an increase in SAMWs, consistent with the fact that SAMW is the second most prevalent water mass in the top 2000 m of this region. In a narrow band in the Weddell Sea and East Antarctic sector between the sub-Antarctic and polar fronts, AAIW forms a meridional separation between SAMW and upwelling CDW (visible in the climatology—see Figs. 3 and 4). The increase in AAIW in this band shown in Fig. 6 is congruent with an expansion to fill the layers left by a southward shift in CDW.

There are a number of possible mechanisms that could account for this signal. Our work sets the context for future investigations of this nature. Firstly, it is possible that such a poleward migration in the upwelling core of CDW would be driven by a contraction in the volume occupied by the downslope cascade of upper 2000 m AABW/DSW. Abyssal contraction of AABW has already been well established from observations44,46,48,75. Future simulations from Li et al.40 suggest that, under strong meltwater forcing, a reduction in AABW formation and volume induces a poleward migration of CDW. Existing studies point to an already-initiated slowdown in AABW formation and transport51,68, such that AABW-induced poleward migration of CDW upwelling could account for the dominant signal observed here.

It may also be that the signal is a response to changes in wind forcing over the Southern Ocean in recent decades. A poleward migration in upwelling CDW is potentially consistent with the aforementioned strengthening in Southern Ocean westerly winds that has been shown in proxies, observational and model data24,25,76; theoretically, a wind-driven steepening in the isopycnals of the ACC could induce an increase in EKE and a strengthening in the eddy-driven residual circulation, increasing poleward eddy transport of CDW across the ACC. Alternatively, the observed poleward migration in the westerly wind belt is also consistent with a poleward reorganisation of CDW19,20. It follows that a southward shift in the location of mean wind stress could induce a similar shift in the ACC, thereby resulting in a poleward movement in the upwelling tongue of CDW.

The future climatic implications of a poleward shift in upper 2000 m CDW are substantial, given that the heat contained within CDW is the principal source of basal ice shelf melting. A warming of the waters near the continental slope from a southward shift in CDW, as is found at mid-century by Li et al.40, is likely to result in increased levels of ice shelf melting. Warmer shelf waters, coupled with increased freshwater input from melting, may also inhibit future AABW/DSW formation and associated overturning strength. These changes may have important consequences for the global carbon cycle, particularly given that the region south of the ACC is known to be a net source of carbon outgassing2,3. Altered upwelling pathways and stratification could influence both the ventilation of deep carbon-rich waters and the efficiency of biological carbon export, with potential feedbacks on atmospheric carbon levels.

Methods and data

This section begins by outlining the main data sources used in this study (Sections ‘GLODAP and GO-SHIP data’ and ‘Argo data’), before moving to explain the OMP-based water mass classification (Section ‘OMP water mass classification’). Initially, this method is used with the GLODAP climatology to provide a mean-state estimate of the Southern Ocean water mass configuration (Section ‘Climatology’), with the end member selection process described in section ‘Initial end member selection: Climatology’. The same method is used with varied end members (Section ‘Time-varying end members’) to compute the second water mass classification with GO-SHIP section data, which provides: (a) a direct assessment of water mass changes across individual cruises, and (b) the necessary training data for the RF model ensemble (described in Section ‘Machine learning model and application to Argo data’), which is then applied to the RG Argo data (Section ‘Temporal changes in water mass distribution’) to investigate variability and change in the top 2000 m at higher temporal resolution.

Whilst GLODAP/GO-SHIP data contains the necessary biogeochemical tracer data to properly constrain the OMP inversion, the gridded Argo data contains only θ and salinity. As we show in Section ‘Machine learning model and application to Argo data’, the machine learning approach enables a water mass classification with only θ, salinity, and positional data that exceeds the abilities of a traditional OMP analysis using just θ and salinity without an a priori (Fig. S19). Given that the resulting RF model ensemble does not require the specification of end members, this method has the added benefit of improved ease of application.

GLODAP and GO-SHIP data

GLODAP climatology

GLODAP is a synthesis activity for ocean surface to bottom biogeochemical data collected through chemical analysis of water samples. As such, it also includes all repeated hydrographic observations collected approximately every 10 years through the GO-SHIP program. The GLODAPv2 climatology is constructed by mapping ship-based data collected during the period 1972–2013 onto a 1 × 1 degree grid with 33 standard depth surfaces by using the Data-Interpolating Variational Analysis method. Further information, including estimates of error fields, quality control procedures and removal of anthropogenic trends, can be found in Lauvset et al.60. We use this product to provide a mean-state of Southern Ocean water masses (Section ‘Climatology’) for the first part of the OMP-based water mass classification analysis, as well as in the determination of adjusted end members (section ‘Time-varying end members’). For all GLODAP-related applications the following fields are used: θ, salinity, oxygen, nitrate, phosphate, and total alkalinity. The domain incorporates data from all longitudes within the latitudinal range 90°S–30°S. We acknowledge that the mapped θ and salinity from this product are not optimised for dynamical applications. However, these fields are used only to provide visual context in Section ‘Climatology’ and to maintain consistency with the biogeochemical end member definitions and with our use of GLODAP section data elsewhere. None of the key analyses of temporal change rely on the mapped climatology.

GO-SHIP cruise section data

The second part of the analysis focuses on observable changes to the Southern Ocean water mass configuration. For this we select individual GO-SHIP cruise sections from the merged and adjusted GLODAPv2.2023 synthesis dataset77, as presented in Lauvset et al.78. We select all measurements within the Southern Ocean domain which meet the highest quality control criteria and contain the full set of aforementioned required variables. We separate this data into a 2005–2010 dataset and post-2015 dataset (primarily 2015–2020) to examine changes in water mass distribution. 2005–2010 is selected to align with the start of the Argo dataset. These sub-sets are used in both the direct analysis of change along repeat sections and to train the RF model ensemble, which we later apply to Argo data. Both these analyses are presented in Section ‘Temporal changes in water mass distribution’. The GO-SHIP data provide high accuracy, quality controlled measurements with full biogeochemical coverage. The primary limitation is that it is necessarily temporally sparse and spatially limited to discrete lines. For full details on the data, including quality control procedures, bias adjustments and error estimates, see Lauvset et al.78.

Argo data

We use the mapped Roemmich-Gilson (RG) Argo product, which provides a monthly global objective mapping of all good Argo profile data. All Argo data receive additional quality control prior to mapping. The monthly maps are given at a 1 × 1 degree gridded resolution with 58 pressure levels over the depth range 0–2000 m between January 2004 and January 2024. Full details can be found in Roemmich & Gilson58. We apply the GO-SHIP-trained RF model ensemble to this data to determine a time-varying gridded water mass classification. The advantage of this core Argo data is that it provides high-resolution and near-global coverage with robust regular sampling. The principal disadvantage of this dataset is that the domain only extends as far as 65°S, which makes analysis of the near shelf water mass configuration extremely limited; for example, our water mass classification in the GLODAP climatology dataset shows high concentrations of DSW near the shelf (Fig. S2), but this is almost entirely missing in our Argo-based results. The core Argo floats also do not provide any biogeochemical tracer measurements. However, we contend that this is a necessary trade-off, given that, unlike other similar mapped Argo datasets, the RG product corrects for the well-known spurious salinity trend arising from drift, which has potential to impact the analysis of trends from the water mass classification method79,80,81.

OMP water mass classification

We use an Optimum Multi-Parameter (OMP) framework to provide a GLODAP-based Southern Ocean water mass classification, using the Pyompa implementation from Shrikumar et al.82, which is built on the original OMP package from Tomczak59 using the same fundamental non-negative least-squares machinery. Classical OMP analysis relies on the definition of end members of unmixed Source Water Types (SWTs), which are considered as water masses at the point of formation. SWTs are assigned characteristic variable values and it is assumed that these variables undergo the same mixing processes with identical mixing coefficients. Observed variables can therefore be considered to be a linear combination of each of the SWTs, such that it is possible to determine the spatial distribution of each SWT with a set of linear equations at each point in space. It is also assumed that all variables are conservative, although buoyancy fluxes at the surface and biogeochemical processes involving assimilation or remineralisation can violate this assumption. OMP analysis is a well-established method for identifying water-mass fractions and leveraging multiple tracers, but it is inherently sensitive to end member selection and is generally limited to relatively scarce bottle data. This provides our rationale for using a machine learning framework (Section ‘Machine learning model and application to Argo data’) to extend water mass classification to higher-resolution Argo float data. For the full set of equations, see Section S1.1.
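The linear mixing problem at the heart of OMP can be illustrated with a small non-negative least-squares solve. The sketch below is not the Pyompa implementation: the three end members and their θ and salinity values are illustrative placeholders rather than the values used in this study, only two tracers are included, `mass_weight` is an assumed weighting of the mass-conservation row, and a simple projected-gradient loop stands in for a dedicated non-negative least-squares solver.

```python
import numpy as np

# Illustrative end members only (columns: CDW, AABW, AAIW);
# rows are theta (degC) and salinity. These are NOT the study's values.
END_MEMBERS = np.array([[1.5, -0.5, 4.0],
                        [34.70, 34.66, 34.20]])

def omp_fractions(obs, end_members=END_MEMBERS, mass_weight=2.0, n_iter=5000):
    """Water mass fractions x >= 0 minimising ||A x - b||^2.

    Each tracer row is standardised across the end members, and a final row
    (scaled by mass_weight) asks the fractions to sum to one.
    """
    mean = end_members.mean(axis=1, keepdims=True)
    std = end_members.std(axis=1, keepdims=True)
    G = (end_members - mean) / std
    d = (np.asarray(obs, dtype=float)[:, None] - mean) / std
    A = np.vstack([G, mass_weight * np.ones(end_members.shape[1])])
    b = np.append(d.ravel(), mass_weight)
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / largest eigenvalue of A^T A
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # gradient step followed by projection onto the non-negative orthant
        x = np.maximum(0.0, x - step * (A.T @ (A @ x - b)))
    return x
```

For an observation built as a 60/30/10 mixture of the three placeholder end members, the solve recovers those fractions. With more tracers than water masses the same system becomes overdetermined and is solved in a least-squares sense.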

We use θ, salinity, dissolved oxygen, nitrate, phosphate, and total alkalinity as raw input variables in our analysis from both GLODAP climatology and GO-SHIP cruise section data. Given the non-conservative potential for biogeochemical tracers, we elect to model the ‘semi-conservative’ parameters NO and PO via flexible exchange parameters83. This is the extended OMP analysis described in Karstensen & Tomczak84. This allows deviations from strict conservativeness to be absorbed into an additional Δ term while retaining all three tracers in the mixing system. The full set of equations is set out in Section S1.1. For additional details of the Pyompa configuration, linear constraints and normalisation practices, including further details on adaptions made to the original implementation, see Shrikumar et al.82.

Initial end member selection: Climatology

We select end members for 9 Southern Ocean water masses: CDW, AABW, NADW, AAIW, DSW, AASW, STCW and two forms of SAMW. These are the same water masses used in Pardo et al.65, who defined end members based on ship section data. We begin with the θ and salinity end members from Pardo et al.65 and make fine-scale adjustments to these to fit the parameter space of our data (GLODAP climatology and pre-2010 GO-SHIP data). Most notably, these include a cooling/freshening of the NADW end member by 0.78 °C/0.08 psu, and a cooling/freshening of the AABW end member by 0.25 °C/0.02 psu. These adjustments ensure that the end member captures the θ-salinity extreme, whilst ensuring that no end members lie just outside the empirical property space. The chosen θ and salinity end members are shown in Fig. 2. To determine the end member for the remaining tracers, we firstly use a K-D tree to find the nearest 1000 data points to the θ and salinity end-member pair. We then select the median value of this sample as the end member for each tracer. A complete account of all end members used and the sensitivity tests can be found in Sections S1.2 and S1.4. Figs. S20, S21, S22, and S23 show the sensitivity of the OMP solution to a simultaneous perturbation of all end-member values by 1 standard deviation, according to our source water property mask. Fig. S24 shows the impact of perturbing the chosen weighting matrix on the OMP solution. Importantly, the main conclusion of this study—relating to a poleward reorganisation of upper 2000 m CDW—is robust to a variety of end member perturbations (Figs. S15, S16, and S18).
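The nearest-neighbour step for the remaining tracers can be sketched as below. This is an assumption-laden illustration: a brute-force distance search stands in for the K-D tree (the two return the same neighbours, the tree is simply faster on large datasets), the scaling of θ and salinity by their standard deviations before computing distances is an assumption made here, and the function name `tracer_end_member` is hypothetical.

```python
import numpy as np

def tracer_end_member(theta, salt, tracer, theta_em, salt_em, k=1000):
    """Median tracer value over the k samples nearest a (theta, salinity) end member.

    Distances are computed in standardised theta-salinity space (an assumption),
    so that neither variable dominates purely because of its units.
    """
    theta = np.asarray(theta, dtype=float)
    salt = np.asarray(salt, dtype=float)
    tracer = np.asarray(tracer, dtype=float)
    dt = (theta - theta_em) / np.std(theta)
    ds = (salt - salt_em) / np.std(salt)
    dist2 = dt**2 + ds**2
    k = min(k, dist2.size)
    nearest = np.argpartition(dist2, k - 1)[:k]   # indices of the k smallest distances
    return float(np.median(tracer[nearest]))
```

The same call would be repeated for each biogeochemical tracer (oxygen, nitrate, phosphate, total alkalinity) and each water mass.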

Time-varying end members

OMP analysis is often used to describe the time-mean water mass configuration, as we do initially in this study (Section ‘Climatology’). However, it is more challenging to use end member analysis to infer changes in water masses, considering that end members themselves may also change. Changes in the computed water mass fractions may therefore reflect both changes in water mass distribution and/or changes in the properties of the associated end members. This is particularly relevant given that observational data shows a warming trend and substantial salinity trends in the Southern Ocean in recent decades85,86,87,88. Here we outline the method that is used to address this, which involves the specification of two groups of end members to account for changes across end member properties in time.

We begin by separating the GO-SHIP ship section data (Section ‘GLODAP and GO-SHIP data’) into two distinct 2005–2010 and post-2015 datasets. We run an OMP analysis with the 2005–2010 dataset, using the climatological end members as described in Section ‘Initial end member selection: Climatology’.

Selection of Δ values

For the post-2015 dataset, we define additional θ and salinity change (ΔT, ΔS) values to be summed with the original end member values to form adjusted end members. Firstly, we train an RF model ensemble on the water mass output from the OMP run with the GLODAP climatology. We apply this model to RG Argo θ, salinity, and positional data, to give a 3-dimensional water mass fraction in every grid box for the monthly output between years 2004 and 2024. We use the same method as described in the following Section ‘Machine learning model and application to Argo data’, although this is a preliminary (climatology) analysis used solely for the determination of end member ΔT and ΔS values. We use this output to define the 3-dimensional spatial distribution of each water mass on the Argo grid for each timestep, and then average in time to define a mean gridded distribution for each water mass across the Argo period. Next, we identify the 1000 grid boxes with the highest water mass fraction, which we take to approximate the properties of the unmixed source waters associated with each water mass. We calculate the change in θ and salinity (2021–2024 mean, minus 2004–2007 mean) in this 3-D source water mask over the 20-year Argo period to yield ΔT and ΔS values for each water mass. These are added to the original climatological end members to give the final adjusted θ and salinity end members to be used with the post-2015 GO-SHIP dataset.
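The Δ calculation described above amounts to a masked late-minus-early difference. The sketch below illustrates it under assumptions: grid boxes are flattened to one axis, the averages are unweighted (no volume weighting), and the function name `delta_value` and its argument layout are hypothetical.

```python
import numpy as np

def delta_value(fraction_mean, prop, years, k=1000,
                early=(2004, 2007), late=(2021, 2024)):
    """Late-minus-early change of a property within a source-water mask.

    fraction_mean: time-mean water mass fraction per grid box, shape (n_boxes,)
    prop:          theta or salinity, shape (n_time, n_boxes)
    years:         calendar year of each time step, shape (n_time,)
    """
    years = np.asarray(years)
    frac = np.asarray(fraction_mean, dtype=float).ravel()
    frac = np.where(np.isfinite(frac), frac, -np.inf)    # never select empty boxes
    prop = np.asarray(prop, dtype=float).reshape(years.size, -1)
    k = min(k, frac.size)
    source = np.argpartition(frac, -k)[-k:]              # k boxes with highest fraction
    in_early = (years >= early[0]) & (years <= early[1])
    in_late = (years >= late[0]) & (years <= late[1])
    early_mean = np.nanmean(prop[in_early][:, source])
    late_mean = np.nanmean(prop[in_late][:, source])
    return float(late_mean - early_mean)
```

Calling this once with θ and once with salinity for a given water mass would give its ΔT and ΔS.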

This method is used for all water masses, except for CDW and NADW. We assume that the end members of these two remaining water masses do not change during this period. This is based on observations of CFCs in the Southern Ocean, which show that the age of the upwelling core of CDW/NADW exceeds 50 years89. We therefore assume that any change in the source water properties of these water masses has not propagated to the Southern Ocean during this time period. We also assume that the dominant trends in water mass properties are those of temperature and salinity, and thus assume that biogeochemical tracer end members remain constant. The complete set of adjusted end members can be found in Section S1.2 (Table 3).

The computed ΔT values are largely consistent with observational studies of water mass property change in the last two decades. All end members show at least some degree of warming, consistent with the Southern Ocean-wide warming trend that is apparent in the Argo float record (Fig. S25). Perhaps most importantly for the focus of this study, the chosen AABW ΔT of 0.077 °C is similar to observed rates of AABW warming. For example, Johnson et al.75 estimate an AABW warming trend of 2.8 m°C per year in West Antarctica, which corresponds to an approximate 20-year warming of 0.056 °C. Similarly, Purkey et al.90 find warming rates of 3.5 m°C per year in the Ross Sea, corresponding to an approximate 20-year warming of 0.07 °C. There is evidence that this warming is accelerating: Johnson et al.75 show that the 2016/2017–2023/2024 trend is nearly triple the 30-year trend. In order to ensure that we do not underestimate the warming of the AABW end member, we test the sensitivity of our conclusions to a large warming of 0.16 °C (discussed more in Sections ‘Discussion and conclusions’ and S2.1.1). Importantly, the poleward migration of CDW is largely insensitive to this change (Figs. S14 and S15).

There is more variability in the ΔS values. Surface water end members (AASW, STCW) and the near-surface mode water end members show an increase in salinity, with values ranging from 0.0039 psu (AASW) to 0.0217 psu (SAMW2). The AAIW end member shows a freshening of 0.0059 psu and DSW shows a very small freshening of 0.0001 psu. These trends are consistent with the spatial structure of the large-scale salinity changes that we note in the Argo dataset (Fig. S25). AABW also exhibits a 20-year increase in salinity of 0.0069 psu, which is seemingly inconsistent with recent studies that highlight a sustained freshening trend in AABW. This discrepancy arises due to the 2000 m maximum depth and maximum southern extent of 65°S of the Argo dataset; what is defined as the AABW source water by this method is actually a relatively northward portion of the downslope AABW cascade, which experiences minor increases in salinity during this period85 (shown in Fig. S25).

Assessing change along GO-SHIP sections

In order to assess changes in Southern Ocean water mass distribution, we compare the abundance of water masses along repeat GO-SHIP sections of the Southern Ocean (Section ‘GLODAP and GO-SHIP data’). Repeat hydrographic sections are sorted into early (2005–2010) and late (post-2015) transects, so as to align with the start of the Argo dataset and the associated changes analysed in Section ‘Temporal changes in water mass distribution’. The only exception to this is in the Weddell Sea sector, where we are unable to access any complete biogeochemical dataset post-2015. As such, in order to assess change in this region, we extend the dataset to incorporate the 1999 A23 section in the early data, and the 2010 A23 section in the late data. We classify water masses in both early and late datasets, using the original end members for the early GO-SHIP lines, and adjusted end members for the late GO-SHIP lines. For a sample of the water mass classification in GO-SHIP sections, see Figs. S7, S8, and S9. The outputs generally agree well with the water mass distribution of the climatology. For a full sample of the output of the classification, including comparison with GLODAP climatology output, see Sections S1.5 and S1.6.

We interpolate each set of observations on to a common grid as a means to compare early and late sections. To minimise spatial biases between the early and late datasets during interpolation and subsequent comparison, we split each dataset into 40 × 9° longitude bins and mask any grid cell in which the other time period is missing data. We map the data to the same grid as the GLODAP climatology from Lauvset et al.60.
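The binning and masking described above can be sketched as follows. This is a minimal illustration assuming that both periods have already been interpolated onto the same grid, with missing data stored as NaN; the function names `longitude_bin` and `masked_change` are hypothetical.

```python
import numpy as np

BIN_WIDTH_DEG = 9.0   # 40 bins x 9 degrees = 360 degrees

def longitude_bin(lon_deg):
    """Index (0 to 39) of the 9-degree longitude bin containing each longitude."""
    lon = np.asarray(lon_deg, dtype=float) % 360.0
    return np.floor(lon / BIN_WIDTH_DEG).astype(int)

def masked_change(early, late):
    """Late-minus-early difference, NaN wherever either period has no data."""
    early = np.asarray(early, dtype=float)
    late = np.asarray(late, dtype=float)
    both = np.isfinite(early) & np.isfinite(late)
    return np.where(both, late - early, np.nan)
```

Masking on the intersection of valid cells means that a change is only reported where both the early and late sections sampled the same grid cell.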

Machine learning model and application to Argo data

Overview of method

The final part of our analysis involves the training of a machine learning-based water mass classification model and the application of this to the gridded monthly RG Argo product. We selected an RF for this task, which is effectively a multi-output regression on tabular hydrographic data. Tree-based ensemble methods like RF are exceptionally well-suited for this class of problem, as they can capture the complex, non-linear, and potentially disjoint decision boundaries separating water masses in feature space more effectively than models with a strong continuity bias, such as neural networks. Furthermore, the native ability of RFs to handle multiple target variables (i.e. the water mass fractions) simultaneously makes them a more direct and robust choice than gradient boosting methods like XGBoost, which would require training a separate model for each water mass. It should be noted that the skill of the RF model depends on the quality of the solution of the OMP analysis upon which it is trained.

We use this to analyse trends in the Southern Ocean water mass distribution during the period 2004–2024. The model is trained on the combined OMP output from both the 2005–2010 and post-2015 GO-SHIP datasets, with the original and adjusted end members, respectively. This means that the RF model ensemble sees both the 2005–2010 and post-2015 versions of each water mass, such that changes in water mass properties over the Argo period are broadly constrained by the model ensemble.

We use randomised 5-fold cross-validation to train the model. The method is set out below:

1. The output of both the 2005–2010 and post-2015 GO-SHIP OMP runs are collated and combined into a single training dataset. The sine and cosine of longitude are used as features (instead of longitude) to prevent discontinuity at the 0/360° transition.

2. The training dataset is randomised by row and then divided into 5 equally sized partitions (folds). In each fold, one partition is held out as the test set while the remaining 4 serve as the training set.

3. For each of the 5 folds, a separate RF regressor is trained on its corresponding training partition (the RF models are ensembles of decision trees with a maximum depth of 16 levels; the ensemble size is set according to our hyperparameter tuning). Each trained model is then used to generate predictions on its own held-out test partition.

4. Combining the predictions from all 5 folds gives full coverage of the dataset. For each fold, we compute the R2 value on the test data, which collectively provides a robust cross-validated measure of model performance.

5. Ultimately, we obtain 5 independently trained RF models. These models are then applied to each time step and location in the RG Argo dataset. At every grid point, the final prediction is taken as the mean of the 5 model outputs, while the variance across these predictions is used as an uncertainty metric.
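The five steps above can be sketched as follows. To keep the example dependency-free, a linear least-squares regressor (`LinearStandIn`, a hypothetical class) stands in for the RF; in practice it would be replaced by a multi-output random forest with a maximum depth of 16. The feature layout, the function names and the pooled form of R2 used here are assumptions for illustration, not the study's code.

```python
import numpy as np

def build_features(theta, salt, lat, lon_deg, depth):
    """Feature matrix with longitude encoded as sine and cosine (continuous at 0/360)."""
    lon = np.deg2rad(lon_deg)
    return np.column_stack([theta, salt, lat, np.sin(lon), np.cos(lon), depth])

class LinearStandIn:
    """Least-squares multi-output regressor; a placeholder for the RF."""
    def fit(self, X, Y):
        Xb = np.column_stack([np.ones(len(X)), X])
        self.coef_, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([np.ones(len(X)), X]) @ self.coef_

def kfold_ensemble(X, Y, n_folds=5, seed=0, make_model=LinearStandIn):
    """Train one model per fold and score it on that fold's held-out partition."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)   # randomise rows, then split
    models, scores = [], []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = make_model().fit(X[train_idx], Y[train_idx])
        pred = model.predict(X[test_idx])
        ss_res = np.sum((Y[test_idx] - pred) ** 2)
        ss_tot = np.sum((Y[test_idx] - Y[test_idx].mean(axis=0)) ** 2)
        scores.append(1.0 - ss_res / ss_tot)    # R2 pooled over all outputs
        models.append(model)
    return models, scores

def ensemble_predict(models, X):
    """Mean and variance of the fold models' predictions at each row of X."""
    preds = np.stack([m.predict(X) for m in models])
    return preds.mean(axis=0), preds.var(axis=0)
```

Applied to the gridded product, `ensemble_predict` would be called on the features of every grid point and time step, with the variance across the five models serving as the uncertainty metric.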

Model validation

We employ an exclusion study method to provide geographically out-of-distribution validation (the randomised K-fold cross-validation approach above is an out-of-sample testing approach) of the output of the RF model. We group repeat sections by location and collate all unique cruises (expocodes) relating to these groups. For a full list of these groupings by expocode, see Table S4. Next, we loop through each repeat cruise section (region), excluding it sequentially from the RF model training and subsequently using it as a testing dataset. The resulting R2 values quantify the ability of the model to predict geographically out-of-distribution observations. We do this both for RF models trained on the full suite of biogeochemical data and for models trained on just θ, salinity, latitude, longitude and depth. This is shown in Table 1. It should be noted that, in some regions, we remove certain water mass contributions to the overall mean R2 value. This is typically where the relative fraction of this water mass tends toward zero, and the model shows no real predictive skill. For example, the I05 line is a zonal transect in the Indian Ocean near 30°S. The R2 breakdown by water mass for this section shows that the model displays high skill values (R2 > 0.90) for all water masses except for DSW and AASW, for which they are below 0.35. Given that DSW and AASW are functionally absent at this latitude and in this basin, we choose to exclude these skill scores from the overall R2 values in Table 1. A full account of where we do this and associated justification can be found in Section S1.5.
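The exclusion study is a leave-one-group-out loop over section groups. The sketch below illustrates the loop only: a linear least-squares learner (`lstsq_fit_predict`, hypothetical) stands in for the RF, the region labels are arbitrary, and the per-water-mass exclusions described above are not implemented.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination pooled over all outputs."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def lstsq_fit_predict(X_train, Y_train, X_test):
    """Placeholder learner: fit a linear model on the training rows, predict the test rows."""
    def add_bias(X):
        return np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(add_bias(X_train), Y_train, rcond=None)
    return add_bias(X_test) @ coef

def leave_one_region_out(X, Y, regions, fit_predict=lstsq_fit_predict):
    """Hold out each region in turn, train on the rest, and score on the held-out region."""
    regions = np.asarray(regions)
    scores = {}
    for region in np.unique(regions):
        held_out = regions == region
        pred = fit_predict(X[~held_out], Y[~held_out], X[held_out])
        scores[str(region)] = r2_score(Y[held_out], pred)
    return scores
```

Because every sample from the held-out region is removed from training, the resulting score measures spatial extrapolation rather than interpolation between neighbouring profiles of the same cruise.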

Table 1 Validation of the RF model via the exclusion of repeat sections

The RF model is able to predict geographically out-of-distribution water mass fractions with a relatively high degree of skill. R2 scores exceed 0.84 in all regions when all variables are used in training. For this case, the mean R2 value across all locations is 0.94. With the exception of the Drake Passage region, scores remain above 0.85 when biogeochemical variables are removed entirely from the training data set. For this case, the mean R2 value across all locations is 0.88 (or 0.90, excluding the Drake Passage). The Drake Passage hosts the lowest model skill in the exclusion studies across both train/test cases, with an anomalously low value of 0.62 when just θ, salinity, latitude, longitude and depth are used in training. Closer inspection of the R2 contributions from each water mass reveals that this results from particularly low model skill in predicting the fractions of SAMW and DSW (note that we already exclude NADW and STCW from the R2 calculation here; see Section S1.5). This may be partially related to substantial eddy variability and water mass mixing in this region, which likely reduce predictability in the model. Moreover, Fig. 3 reveals that, acting as a boundary between the Pacific and Atlantic sectors, the Drake Passage experiences a relatively strong zonal gradient in water mass fractions. Mode water fractions are high in the Pacific sector to the west of the passage, but decrease rapidly to the east. This pattern is reversed for DSW+AABW. We suggest that strong eddy-driven mixing of tracers (TS), coupled with anomalously strong zonal gradients in water mass fractions (i.e. such that longitude loses predictive power), likely limits the predictive capability of the model in the Drake Passage.

However, we retain high skill scores in all other regions when biogeochemical tracers are not used in model training. This has a number of implications that underpin our methodology, which are discussed in the following section. Note that an exhaustive list of expocodes and location groups used in the out-of-distribution validation can be found in Section S1.5. Further exclusion tests reveal that using only θ-salinity inputs reproduces most of the predictive skill, whereas positional predictors alone perform poorly (Table S5). Consistent with this, Shapley Additive Explanations (SHAP) analysis shows that model skill is driven primarily by θ and salinity, with a more modest contribution from spatial terms (Table S6).

Application to Argo data and uncertainty estimation

The water mass classification analysis described in section ‘OMP water mass classification’ requires biogeochemical tracers to produce physically meaningful and well-constrained solutions. When the OMP is run using only θ and salinity, the problem becomes greatly underdetermined: θ-salinity pairs alone cannot distinguish between source waters with similar density but different origins, and the resulting fields are diffuse and largely unable to separate NADW, CDW, and AABW, which share overlapping θ-salinity characteristics but distinct biogeochemical signatures. The inclusion of biogeochemical tracers therefore provides the essential constraints on the inversion and preserves realistic vertical and lateral gradients among these deep and bottom waters. This is shown in further detail in section S1.3 and in Fig. S19. Consequently, there is a fundamental limit on this type of analysis in the Southern Ocean: most biogeochemical measurements occur along repeat cruise sections, which are sparsely sampled in both space and time. Any analysis of the change in water mass distribution is hence limited to a handful of discrete repeat sections. In contrast, the vast majority of Argo floats do not measure these tracers, but provide some of the best temporal and spatial coverage in the mid-to-upper depths of the Southern Ocean.
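
The underdetermination can be made explicit with the generic form of an OMP inversion. The notation below is assumed for illustration and may differ from that used in section ‘OMP water mass classification’: the columns of A hold the tracer properties of the n source water masses, d holds the m tracers observed at one sample, W is a diagonal weighting matrix, and x holds the water mass fractions.

```latex
\begin{aligned}
\hat{\mathbf{x}} &= \arg\min_{\mathbf{x}}
  \left\lVert \mathbf{W}\left(\mathbf{A}\mathbf{x} - \mathbf{d}\right) \right\rVert_2^2
  \quad \text{subject to} \quad \sum_{j=1}^{n} x_j = 1, \qquad x_j \ge 0, \\
\mathbf{A} &\in \mathbb{R}^{m \times n}, \qquad
\mathbf{d} \in \mathbb{R}^{m}, \qquad
\mathbf{x} \in \mathbb{R}^{n}.
\end{aligned}
```

With only θ and salinity, m = 2, so the two tracer equations plus mass conservation give at most three independent linear equations for n unknown fractions. For n > 3 this generically leaves n − 3 degrees of freedom, bounded only by the non-negativity constraint, so many different mixtures fit the observation equally well. Each additional tracer adds a row to A and d and removes one of these degrees of freedom.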

However, the results of the machine learning verification demonstrate that our RF model can achieve high levels of skill even with just θ, salinity, latitude, longitude and depth. Importantly, this enables skilful water mass classifications to be undertaken without knowledge of biogeochemical tracers, and thus expands the possible coverage of classifiable data to include Argo floats which measure only θ and salinity. Our approach takes the OMP classification of the individual point measurements taken along GO-SHIP cruises (including all the tracer data) and, leveraging the skill of the RF model in the absence of tracers, translates this into a 3-dimensional Argo-based gridded water mass product at high spatial and temporal resolution. Figure S26 shows that the output of the RF model ensemble applied to Argo data exhibits good qualitative agreement with the output of the initial OMP water mass classification with the GLODAP climatology.

We also quantify uncertainty in the application of the model to the Argo data as the variance in predictions across the model ensemble (Fig. S27). The highest uncertainty values occur in CDW and NADW at intermediate depths toward the northern boundary of the Atlantic sector (i.e. in the space that NADW occupies in high fractions, see Fig. S3 and Fig. S27a, c). In this case, we suggest that this arises due to the difficulty the model has in differentiating between CDW and NADW, which are very similar in density space (Table 1). Elsewhere, variance across the fold ensemble appears in AAIW and SAMWs at mid-depths (Figs. S27f, i). Uncertainty associated with these water masses is particularly high in the Pacific sector of the Southern Ocean. Here, it appears that there is some uncertainty in the model’s differentiation between AAIW and SAMWs. We do not consider this to have an important impact on the main conclusion (of upper 2000 m poleward CDW redistribution) in this study, but it should be considered when assessing the relative change in these two water masses in Fig. 6. Most importantly, we find low uncertainty in the modelled fractions of CDW and AABW in the region near the shelf at all depths. This means that we are confident that the signal of poleward migration of CDW at the expense of AABW/DSW found in this study is unlikely to be substantially influenced by uncertainty in the model.