CODC-S: A quality-controlled global ocean salinity profiles dataset

Tan, Zhetao; Zhu, Yujing; Cheng, Lijing; Gouretski, Viktor; Pan, Yuying; Yuan, Huifeng; Wang, Zhankun; Li, Guancheng; Song, Xinyi; Zhang, Bin; Bao, Senliang; Li, Yuanlong; Zhu, Jiang

doi:10.1038/s41597-025-05172-9

Data Descriptor
Open access
Published: 30 May 2025

Zhetao Tan ORCID: orcid.org/0000-0003-4342-3356^1,2^na1,
Yujing Zhu^1,3^na1,
Lijing Cheng ORCID: orcid.org/0000-0002-9854-0392^1,3,
Viktor Gouretski¹,
Yuying Pan¹,
Huifeng Yuan^3,4,
Zhankun Wang ORCID: orcid.org/0000-0002-3727-1379⁵,
Guancheng Li^1,6,
Xinyi Song^1,3,
Bin Zhang^7,8,
Senliang Bao^1,9,
Yuanlong Li⁷ &
…
Jiang Zhu¹

Scientific Data volume 12, Article number: 917 (2025)

Abstract

Changes in the global ocean salinity reflect the evolution of the global hydrological cycle. These secular changes are assessed using seawater salinity profiles obtained during the past ~80 years. Here, we introduce a new global ocean salinity profiles database named CODC-S (the Chinese Academy of Science (CAS) Oceanography Data Center – Salinity component), which encompasses over 11 million in-situ salinity profiles from 1940 to 2023 obtained by means of several instrument types. These salinity profiles are quality-controlled (QC-ed) using a new automated salinity quality control system named CODC-QC-S (the CODC Quality Control system – Salinity component), consisting of 11 distinct quality checks. By applying time-varying, flow-dependent, and topographical-dependent 0.5% and 99.5% quantile thresholds, the CODC-QC-S defines local climatology salinity ranges without the assumption of Gaussian distribution. The CODC-S database, together with the newly proposed QC algorithm, has undergone extensive evaluations, including comparisons with the benchmark data and climatology, as well as analyses of global and basin-scale long-term salinity changes before and after QC. These validations demonstrate that the quality of salinity data in the CODC-S is time-, depth-, region- and instrument type-dependent. The eight-decade quality-homogeneous salinity profiles from the CODC-S database can support diverse oceanographic and climatic research such as monitoring water cycle changes and freshwater/overturning transports.

Background & Summary

Ocean salinity, an indicator of the global hydrological cycle, is one of the most important physical parameters of sea water^1,2,3. Ocean salinity changes can affect marine physical and biogeochemical conditions, including geostrophic currents⁴, density stratification⁵, mixed layer structure, and vertical movement of nutrients and marine organisms^6,7. Salinity is also one of the climatic impact-drivers (CIDs) linking the information from ocean physical changes to the climate change impact^8,9,10. For example, suitable salinity determines physiological levels of marine species, but excessive/limited salinity (freshening/salinization) may cause a redistribution of marine life¹¹. Our knowledge of ocean salinity changes relies on high-quality in-situ salinity observations. Since the beginning of hydrographic observations, over 10 million ocean salinity profiles have been collected by various instruments¹². Because of the heterogeneous quality of salinity data in the global archive, there is an increasing demand for a high-quality, error-free, quality-consistent salinity dataset to support relevant scientific research, governmental and non-governmental organizations, industry, fisheries, individuals, and policymakers^13,14,15.

In the past decade, salinity gridded products have been compiled by several research groups^1,3,16,17,18. These datasets have been widely used for different oceanographic and climatic studies such as the global hydrological cycle^19,20, ocean freshwater estimation^21,22,23, model evaluation^24,25, geostrophic current and freshwater transport estimation^26,27,28, as well as essential ocean variables (EOVs) and ocean indicators development^15,29. However, Li et al.³⁰ noted substantial differences between these datasets, with quality control (QC) procedures as one of the possible causes. Aiming to achieve a dataset of climate research quality, several research groups were committed to developing a comprehensive automated quality control procedure under the auspices of the International Quality-controlled Ocean Database (IQuOD)³¹.

The development of an automated quality control system (AutoQC) for salinity is closely linked to the quality control of temperature, as both parameters are almost always measured simultaneously¹². Significant progress has been achieved in the development of QC procedures for ocean temperature profile data^{17,32,33,34,35,36,37,38}. Similarly, over the past decades, progress has been made in developing QC systems for salinity profiles, each linked to a specific database (Table 1)^{12,17,18,20,39}. Examples include the salinity QC systems developed by the NOAA National Centers for Environmental Information (NCEI) (hereafter WOD-QC³⁷) and by the Met Office Hadley Centre (hereafter EN4-QC^17,39). These two QC systems combine automated checks (AutoQC) with expert manual checks (ExpertQC). The AutoQC procedures in these two systems consist of several checks that test the plausibility of metadata, reported parameter values, and derived quantities (date/time, geographical coordinates, platform speed, duplicate cast, depth level inversion, duplicate depth level, high-resolution pairs, variable range, spike, excessive vertical gradient, float track, static stability (density), and others). The manual checking typically involves the removal of outliers and anomalous data through individual profile assessments^17,37. The WOD ExpertQC also examines unrealistic patterns that appear when maps are constructed for the World Ocean Atlas (WOA) and subjectively flags the profiles causing them³⁷. ExpertQC improves the ability of the system to identify outliers, but is time-consuming and costly⁴⁰.

Table 1 A brief overview of the primary salinity QC systems used by various organizations.

Other QC systems for salinity profile observations include the one developed by the Argo Data Management Team, which comprises real-time QC (Argo-RTQC), real-time adjusted QC, and delayed-mode QC (DMQC) for the Argo products³⁴. Argo data flagged as ‘good’ (QC = 1) and ‘probably good’ (QC = 2) by the Argo Data Management Team are ingested into the WOD and EN4 databases^17,37. In addition, the ICDC-QC (Integrated Climate Data Center) system developed at the University of Hamburg²⁰ is mainly used to produce the World Ocean Circulation Experiment-Argo Global Hydrographic Climatology from quality-controlled salinity profiles. For the Coriolis Ocean dataset for ReAnalysis (CORA), Jerome et al.⁴² developed a hybrid QC system (CORA-QC⁴¹) in which AutoQC is followed by human validation (ExpertQC); it is mainly used for the Copernicus Marine near-real-time (NRT) and delayed-mode (DM) in-situ salinity datasets.

It has become standard practice in most QC systems to verify profiles against climatological data. For example, the WOD-QC procedure defines the local salinity range by applying criteria of 3 to 5 standard deviations within 5-degree boxes, assuming a Gaussian distribution⁴³. In contrast, the ICDC-QC system²⁰ employs an adjusted Tukey’s boxplot method that accounts for the skewness of the data distribution²⁰. The existing salinity QC algorithms thus use different thresholds for outlier detection, as well as different sets of quality checks. These differences affect the data products built on observations filtered by the different QC schemes^31,40, introducing methodological uncertainty into the estimation of climate indicators such as ocean heat content¹⁵. We therefore first develop a comprehensive automatic QC algorithm for salinity, based on recent developments and improvements in QC schemes for seawater parameters. This new AutoQC system, the Chinese Academy of Science (CAS) Oceanography Data Center Quality Control system – salinity component (CODC-QC-S), is also applied to the new database introduced in this paper. Such a QC scheme is essential for compiling a new high-quality and quality-consistent salinity dataset needed to support research activities in hydrology³ and marine/coastal ecosystems⁴⁴.

To develop a quality control procedure for the high-quality in-situ salinity profiles dataset, we should also understand how the salinity is measured and calibrated as the quality and accuracy of the salinity data depends on the method of salinity measurements. Salinity is a measure of the amount of dissolved matter in the seawater. Its concept and definition experienced several profound changes over time. Originally, the chlorinity of the seawater served as the measure of salinity based on a chemical titration of the water samples collected by means of bottles with bottle depth estimated by the wire length and angle. During the 1960s, the chemical titration was gradually substituted by electronic instruments (salinometers)⁴⁵. Chemical titration is regarded to be less accurate compared to high-precision electronic salinometers^46,47. The modern methods are based on measuring the electrical conductivity of the seawater, which strongly depends on both salinity and temperature and mildly on pressure, with the relationship between the conductivity and these three variables being non-linear. Since 1978, a proxy for true salinity called Practical Salinity has been introduced. All methods of salinity determinations require the reference standard with known salinity – the standard seawater. The IAPSO standard seawater is used to calibrate salinometers in the laboratory (http://www.soest.hawaii.edu/HOT_WOCE/sal-hist-report/3.1.html). Introduced into oceanographic practice in the late 1960s, different types of Conductivity-Temperature-Depth (CTD) profilers provide high-resolution salinity profiles¹². However, to achieve high accuracy, the CTD salinities need to be referenced to the salinity of water samples taken at a limited number of levels and analyzed using high-precision laboratory salinometers. 
Consequently, ship-based CTD salinity profiles are an order of magnitude more accurate than those from CTDs installed on autonomous Argo profiling floats or attached to marine mammals (APB), for which adjustment to simultaneously obtained bottle samples is impossible. Argo float salinity values are therefore prone to instrumental biases due to sensor time drift, fouling, etc.^48,49. In the deep ocean, the variability of salinity is generally very small²⁵, which imposes strict requirements on the accuracy of salinity measurements and thus challenges the performance of QC there. High-resolution shipboard CTD devices can measure salinity with an accuracy better than 0.005 g/kg and a resolution of ~0.001 (https://www.seabird.com/profiling/family?productCategoryId=54627473767). In practice, the accuracy of salinity measurements ranges from 0.002 to 0.08, depending on the instrumentation used and the quality standard applied during specific cruises^50,51,52,53.

In this study, we introduce a new in-situ ocean salinity dataset named CODC-S (the Chinese Academy of Science (CAS) Oceanography Data Center – Salinity component). Suggested for applications in climate research and operational use, this dataset benefits from the application of a new automatic quality control system for salinity (CODC-QC-S). Both the salinity profiles from the CODC-S database and the new QC algorithm have undergone extensive validation using a high-quality salinity benchmark dataset. We also conducted analyses of global and basin-scale long-term salinity changes using the salinity profiles before and after QC to highlight the robustness of the newly proposed dataset. Here, we note that the CODC-S dataset will adopt a data processing and QC framework similar to that used for the CODC temperature component⁵⁴ developed by the IAP/CAS (Institute of Atmospheric Physics, Chinese Academy of Sciences). This framework has been successfully applied in the IAP temperature in-situ data processing (i.e., the CODC-QC-T temperature AutoQC system³⁶) and ocean heat content estimation^55,56. Therefore, in this study, the proposed CODC-S dataset, generated with the new salinity AutoQC system (CODC-QC-S), will serve as an extension from the temperature component to the salinity component of the existing CODC dataset⁵⁴. The CODC-S dataset also complements existing salinity databases such as the WOD¹², Ishii¹⁶, and EN4¹⁷. The intercomparison between the salinity databases is vital for addressing the uncertainty sources in estimating ocean salinity changes and the global water cycle, which have not been well-quantified yet⁵⁷.

Methods

Data sources

The main data source of CODC-S is the in-situ salinity profile data in the World Ocean Database (WOD), downloaded in April 2024. We use salinity data from all instruments reporting salinity, including profiling floats (PFL; e.g., Argo profiling floats), Conductivity/Temperature/Depth (CTD) profilers, bottles (Ocean Station Data, OSD), moorings (MRB), gliders (GLD), Autonomous Pinniped Bathythermographs (APB), and others³⁷. In addition, 52,253 non-WOD salinity profiles are included to fill data-poor regions: Arctic in-situ CTD data for 1970 to 2005 from the Bedford Institute of Oceanography, and salinity profiles from the Alfred-Wegener-Institute (Bremerhaven, Germany), the Northwest Atlantic Fisheries Centre, the Department of Fisheries and Oceans of Canada, the Freshwater Institute, the Institute of Ocean Sciences, and the Maurice-Lamontagne Institute. Non-WOD salinity profiles from the seas around China (e.g., the South China Sea, the East China Sea, the Yellow Sea, and the Western Pacific Ocean) owned by several Chinese institutes are also included^{58,59,60,61,62,63}. In total, there are 11,093,341 salinity profiles with 2,234,827,427 measurements at observed depth levels, spanning the period from January 1940 to December 2023.

QC working flows

The CODC-QC-S (CAS-Ocean Data Center (CODC) Salinity Quality Control system) comprises a total of 11 individual quality checks (Table 2). These checks evaluate the acceptable salinity ranges, as well as the vertical structure (shape) of the salinity profile, considering vertical, temporal, and regional variations. The quality control flag for each check is set for salinity values at each observed depth level. The flags are binary: ‘0’ signifies an acceptable (good) value, and ‘1’ indicates a rejected (bad) value. Based on these distinct check flags, the overall quality flag for each observed level is derived. This flag is set to ‘1’ if a salinity value fails at least one distinct check. Otherwise, the overall flag is set to ‘0’ indicating good (accepted) salinity value. Users can rely on the overall quality flag or their own decision based on the individual check values.
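The flag logic described above can be illustrated with a minimal sketch. The function name and the check names in the example are hypothetical and are not taken from the CODC-QC-S code; only the rule itself (a level is rejected if it fails at least one check) comes from the text.

```python
def overall_flags(check_flags):
    """Combine per-check binary flags into one overall flag per depth level.

    check_flags maps a check name to a list of 0/1 flags, one per observed
    depth level (0 = accepted, 1 = rejected).
    """
    n_levels = len(next(iter(check_flags.values())))
    overall = [0] * n_levels
    for flags in check_flags.values():
        if len(flags) != n_levels:
            raise ValueError("all checks must cover the same depth levels")
        for i, flag in enumerate(flags):
            if flag == 1:
                overall[i] = 1  # failing any single check rejects the level
    return overall


# Hypothetical flags for a four-level profile.
example = {
    "spike": [0, 0, 1, 0],
    "crude_range": [0, 0, 0, 0],
    "density_inversion": [0, 1, 0, 0],
}
print(overall_flags(example))
```

Keeping the individual flags alongside the overall flag lets users build their own acceptance rule from a subset of checks.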

Table 2 The QC models of CODC-QC-S for salinity developed in this study.

The details and parameter settings for each QC check are briefly introduced as follows:

Basic information check

This check verifies the validity of each profile’s date, time, and location. For example, the latitude and longitude should be in the ranges [−83, 90] and [0, 360], respectively. The profile location should not be on land. All observations of the profile are flagged if the check fails.
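A minimal sketch of the date and coordinate part of this check is given below. The function name is hypothetical, and the land test is omitted because it needs a bathymetry or land mask that is not reproduced here.

```python
from datetime import datetime


def basic_info_ok(year, month, day, lat, lon):
    """Return True if the date is a valid calendar date and the position
    lies within latitude [-83, 90] and longitude [0, 360]."""
    try:
        datetime(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return -83.0 <= lat <= 90.0 and 0.0 <= lon <= 360.0


print(basic_info_ok(2001, 2, 28, 10.0, 200.0))
print(basic_info_ok(2001, 2, 30, 10.0, 200.0))
```

When the function returns False, every observation of the profile would be flagged, as described above.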

Sample level order check

This check verifies that the sampled levels are reported in order of increasing depth. If the depth at a level does not increase relative to the preceding level, the observation at that level is flagged.
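As an illustration, the sketch below flags levels that break the increasing order. Two details are assumptions, since the text does not specify them: each depth is compared with the last accepted depth, and a repeated depth counts as a failure.

```python
def level_order_flags(depths):
    """Flag (1) every level whose depth does not exceed the last accepted depth."""
    flags = []
    last_good = None
    for z in depths:
        if last_good is not None and z <= last_good:
            flags.append(1)  # out of order or repeated depth
        else:
            flags.append(0)
            last_good = z
    return flags


print(level_order_flags([0, 10, 20, 15, 30]))
```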

Local bottom depth check

This check tests whether the deepest sampled level is deeper than the local bottom depth, which is defined according to the latest version of the global 0.5 arc-second resolution digital General Bathymetric Chart of the Oceans (GEBCO)⁶⁴. Since the accuracy of the GEBCO bathymetry is not uniform, a tolerance is added following Tan et al.³⁶. Salinity values at levels deeper than the local bottom depth are flagged. An example of this check can be found in Supplementary Figure 1. However, caution is needed when applying this quality check, as a failure might be due to errors in the coordinates or in the digital bathymetry.

Instrument type depth check

Each instrument type is designed to operate within a certain depth/pressure range. If a sample depth falls outside the nominal depth range for the instrument type, the observations beyond the acceptable range are flagged. The ranges are: 0–8000 m for CTD, 0–9000 m for OSD and XCTD, 0–6050 m for PFL, and 0–1200 m for APB. An example of this check could be found in Supplementary Figure 2.
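The depth ranges listed above map naturally onto a lookup table, as in the following sketch. The names are hypothetical, and leaving instrument types without a listed range unflagged is an assumption of this example.

```python
# Nominal maximum depths in metres, taken from the ranges listed above.
MAX_DEPTH_M = {"CTD": 8000, "OSD": 9000, "XCTD": 9000, "PFL": 6050, "APB": 1200}


def instrument_depth_flags(instrument, depths):
    """Flag (1) levels outside the nominal 0..max depth range of the instrument."""
    limit = MAX_DEPTH_M.get(instrument)
    if limit is None:
        # Assumption: no range is listed for this type, so nothing is flagged.
        return [0] * len(depths)
    return [1 if (z < 0 or z > limit) else 0 for z in depths]


print(instrument_depth_flags("APB", [5, 1200, 1300]))
```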

Constant value check

This check identifies salinity profiles with “stuck values”. Such profiles exhibit constant salinity values throughout the whole depth or over an unrealistically thick layer of the water column. An example of this check could be found in Supplementary Figure 3.

Multiple salinity extrema check

This check aims to identify profiles with an excessive number of local salinity extrema. A salinity extremum at level k is defined as follows:

$$\begin{array}{l}({S}_{k+1}-{S}_{k})\ast ({S}_{k}-{S}_{k-1}) < 0\\ |{S}_{k}-{S}_{k+1}| > M\,and\,|{S}_{k}-{S}_{k-1}| > M\\ M=R+C/Z\end{array}$$

(1)

Here, M represents the threshold for the salinity extrema magnitude, with the choice based on the instrumental resolution. R is the mean extreme magnitude represented by the maximum allowed instrumental resolution (0.01 g/kg for high-resolution CTD and PFL, 0.05 g/kg for XCTD, low-resolution CTD, Bottle, and others). C/Z denotes the standard deviation of extreme magnitude (Z denotes the depth measurement; C is 25 for CTD, 40 for PFL, and 30 for XCTD, Bottle etc., which are empirical choices). If multiple salinity extremes are detected, all observations of a profile are flagged. This check is not performed in the upper 10 m as some large fluctuations are real features close to the surface, such as in the front water mass. An example of this check can be found in Supplementary Figure 4.
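A minimal sketch of the extremum definition in Eq. 1 follows. It only counts the extrema; how many of them make a profile fail is not stated in the text, so that decision is left to the caller. The function name is hypothetical, and R and C are passed in using the values quoted above.

```python
def count_salinity_extrema(depths, salinity, R, C, min_depth=10.0):
    """Count local salinity extrema following Eq. 1, with M = R + C / Z.

    Levels at or above min_depth (10 m) are skipped, as in the text.
    """
    count = 0
    for k in range(1, len(salinity) - 1):
        z = depths[k]
        if z <= min_depth:
            continue
        m = R + C / z  # depth-dependent magnitude threshold
        d_prev = salinity[k] - salinity[k - 1]
        d_next = salinity[k + 1] - salinity[k]
        if d_next * d_prev < 0 and abs(d_next) > m and abs(d_prev) > m:
            count += 1
    return count


# Synthetic zig-zag profile with the high-resolution CTD settings R = 0.01, C = 25.
depths = [50, 100, 150, 200, 250]
zigzag = [35.0, 36.0, 35.0, 36.0, 35.0]
print(count_salinity_extrema(depths, zigzag, R=0.01, C=25))
```

Because C/Z shrinks with depth, the same oscillation amplitude is more likely to count as an extremum in deep water than near the surface.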

Spike check

This check identifies salinity spikes, which typically result from the malfunction of electronic sensors. It assesses how far the salinity measurement at level k deviates from the average of its neighboring depth levels (k−1, k+1), adjusted by the absolute difference between those neighbors (S):

$$\begin{array}{l}{S}_{1}=|{S}_{k}-({S}_{k-1}+{S}_{k+1})\ast 0.5|\\ {S}_{2}=|({S}_{k+1}-{S}_{k-1})\ast 0.5|\\ S={S}_{1}-{S}_{2}\end{array}$$

(2)

Here, S is compared against a depth-dependent spike threshold (0.12 in the upper 1000 m and 0.10 between 1000 and 2000 m; see Supplementary Figure 5). Observations for which S exceeds the threshold are flagged. Note that once flagged data points are removed, the adjacent data points are reconnected to form a new profile, which can create new spikes at the junctions. The check is therefore repeated until no further spikes are detected in the QC-ed profile. Supplementary Figure 6 provides an example of this check.
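The iterative procedure can be sketched as follows. This is an illustration under stated assumptions, not the CODC-QC-S implementation: S2 is taken as half the absolute difference between the two neighbors, as the description of the check states, and the 0.10 threshold is also applied below 2000 m, where the text gives no value. Function names are hypothetical.

```python
def spike_threshold(depth_m):
    """0.12 in the upper 1000 m, 0.10 deeper (use below 2000 m is an assumption)."""
    return 0.12 if depth_m < 1000.0 else 0.10


def spike_flags(depths, salinity):
    """Flag spikes, repeating the test on the remaining levels until none are found."""
    flags = [0] * len(salinity)
    while True:
        good = [i for i, f in enumerate(flags) if f == 0]
        new_spikes = []
        for j in range(1, len(good) - 1):
            a, k, b = good[j - 1], good[j], good[j + 1]
            s1 = abs(salinity[k] - 0.5 * (salinity[a] + salinity[b]))
            s2 = abs(0.5 * (salinity[b] - salinity[a]))
            if s1 - s2 > spike_threshold(depths[k]):
                new_spikes.append(k)
        if not new_spikes:
            return flags
        for k in new_spikes:
            flags[k] = 1  # removed levels are skipped in the next pass


depths = [10, 20, 30, 40, 50]
profile = [35.0, 35.01, 36.0, 35.02, 35.03]
print(spike_flags(depths, profile))
```

Each pass flags at least one level or stops, so the loop terminates after at most as many passes as there are levels.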

Density inversion check

This check verifies that the water density increases with depth. The density is calculated from salinity and the accompanying temperature at the same level using the computationally efficient 75-term expression⁶⁵. If no temperature measurement is available, this check is not performed. Supplementary Figure 7 shows some examples of this check.

Global crude range check

This check tests whether a salinity measurement is grossly in error. The depth-dependent minimum and maximum thresholds are determined from all available salinity profiles from 1940 to 2023 and are set at the 0.5% and 99.5% quantiles, respectively. Any value outside this range is flagged. This check is not performed in the Red Sea, the Gulf of Mexico, the Persian Gulf, the Black Sea, the Baltic Sea, the Mediterranean Sea, or along coastlines, because their thermohaline structures differ notably from that of the open ocean. An example of this check can be found in Supplementary Figure 8.
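The quantile-based range can be sketched for a single depth level as shown below. The linear-interpolation quantile convention, the function names, and the synthetic reference sample are assumptions of this example; the real thresholds are depth-dependent and derived from the full 1940 to 2023 archive.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty sample (0 <= q <= 1)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    pos = q * (len(xs) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(xs) - 1)
    return xs[lower] + (pos - lower) * (xs[upper] - xs[lower])


def crude_range_flags(reference_values, profile_values):
    """Flag (1) values outside the 0.5% to 99.5% quantile range of the reference."""
    s_min = quantile(reference_values, 0.005)
    s_max = quantile(reference_values, 0.995)
    return [0 if s_min <= s <= s_max else 1 for s in profile_values]


# Synthetic reference sample spanning 30.00 to 40.00 at one depth level.
reference = [30.0 + 0.01 * i for i in range(1001)]
print(crude_range_flags(reference, [29.9, 35.0, 39.99]))
```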

Global vertical gradient check

This check identifies pairs of depth levels for which the vertical salinity gradient exceeds the overall depth-dependent gradient threshold, which is also determined from all available salinity profiles from 1940 to 2023. This check is similar to that in Gouretski²⁰, with minor parameter modifications. Like the spike check, this check is repeated until no further threshold-exceeding vertical salinity gradients are detected in the QC-ed profile. An example of this check can be found in Supplementary Figure 9.

Local salinity climatology range check

In addition to the global crude range check, each salinity measurement is checked against the acceptable local climatology range. This local climatology range (hereinafter IAP-S-range) is constructed following the suggestion of two-step thresholding by Yang et al.⁶⁶: we first perform the preliminary quality control, including checks (1) – (10) aiming to exclude observations which are grossly in error. During the second step, the IAP-S ranges are constructed based on the preliminary QCed data. According to Good et al.³¹ and Gouretski et al.⁶⁷, this check represents one of the most effective checks to identify outliers. The construction of the IAP-S local climatology ranges is described in the next section. An example of this check could be found in Supplementary Figure 10.

To evaluate the CODC-QC-S performance, we used the one-time hydrographic dataset obtained from the World Ocean Circulation Experiment (WOCE)⁶⁸. This dataset is characterized by outstanding data quality due to the strict and uniform quality requirements^68,69. The data from each WOCE cruise were subject to manual expert QC before dissemination to the respective data centers. The high quality and consistency of the WOCE dataset were confirmed through the analysis of differences between distinct cruise lines at cross-over points⁷⁰. This dataset includes 8,790 CTD salinity profiles and 8,793 Bottle salinity profiles located globally from 1985–1997 (hereinafter ‘WOCE CTD dataset’ and ‘WOCE Bottle dataset’), with 98.33% (WOCE CTD) and 96.39% (WOCE Bottle) of all salinity measurements in the upper 2000 m ranked as good after the manual expert QC. These high-quality WOCE one-time CTD datasets are used to benchmark and evaluate the CODC-QC-S performance. The True Negative Rate (TNR) is used to assess the ability of a QC algorithm to retain good data (the definition follows Good et al.³¹):

$${TNR}=100 \% \ast \frac{{N}_{{TN}}}{{N}_{{TN}}+{N}_{{FP}}}$$

(3)

where ${N}_{{TN}}$ is the number of true negatives, and ${N}_{{FP}}$ is the number of false positives. Here, TNR should be as high as possible. The missing values had been removed before the evaluation.
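Eq. 3 can be computed from paired flags as in the sketch below. It assumes that a "negative" is a measurement judged good (flag 0) by the benchmark, so a true negative is a good measurement that the QC retains and a false positive is a good measurement that the QC rejects. The function name and example flags are hypothetical.

```python
def true_negative_rate(reference_flags, qc_flags):
    """TNR in percent: share of benchmark-good values that the QC also accepts."""
    n_tn = 0
    n_fp = 0
    for ref, qc in zip(reference_flags, qc_flags):
        if ref == 0:  # good according to the benchmark
            if qc == 0:
                n_tn += 1  # correctly retained
            else:
                n_fp += 1  # wrongly rejected
    if n_tn + n_fp == 0:
        raise ValueError("no benchmark-good measurements to evaluate")
    return 100.0 * n_tn / (n_tn + n_fp)


benchmark = [0, 0, 0, 0, 1]
automatic = [0, 0, 1, 0, 1]
print(true_negative_rate(benchmark, automatic))
```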

Local salinity climatology range for CODC-QC-S

Developing a global ocean salinity profile dataset such as CODC-S depends on a robust climatology-based automatic QC. However, how to define the cut-off for the local climatological range of ocean variables is still debated in the community³¹. Acceptable ranges for observed variables are commonly established with box-plot methods⁷¹ or the mean ± 3-sigma rule (i.e., the PauTa criterion), with the latter assuming a Gaussian distribution. However, the local salinity distribution in the ocean is typically skewed, as illustrated by the skewness maps for two selected depth levels (Fig. 1a,b). Gouretski²⁰ therefore suggested using the modified adjusted boxplot method, which was first introduced by Hubert et al.⁷² and then improved by Adil et al.⁷³. The latter method defines the local lower fence (${Lf}$) and upper fence (${Uf}$) (i.e., the local salinity range) as a function of three parameters, the salinity interquartile range (IQR), the medcouple (MC), and the skewness (SK), with a coefficient C chosen to achieve the target outlier percentage⁷³:

$$\begin{array}{ccc}Lf & = & Q1-C\ast IQR\ast {e}^{-SK\ast |MC|}\\ Uf & = & Q3+C\ast IQR\ast {e}^{SK\ast |MC|}\end{array}$$

(4)

Here, C = 1.5 in Adil et al.⁷³, which is a subjective choice and corresponds to ~0.7% outliers for a normal distribution. Q1 and Q3 denote the 25% and 75% quantiles. In a deviation from the original Tukey technique, which multiplies the IQR by C = 1.5, the adjusted boxplot method extends or compresses the fences depending on the local skewness parameters SK and MC. Increasing the coefficient C from 1.0 to 2.25 extends the fences, which reduces the rejection rate and increases the True Negative Rate (TNR) according to the benchmark results (Supplementary Table 1). An alternative is to set the lower and upper fences at fixed quantiles: Tan et al.³⁶ use the 99% quantile range, which automatically results in 1% outliers in the data. A comparison of rejection rates and TNRs using the WOCE benchmark dataset also revealed that the quantile approach with the selected fixed quantile thresholds results in a low rejection rate and a high TNR (Supplementary Table 1).
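To make Eq. 4 concrete, the sketch below evaluates the two fences from precomputed statistics. Computing the medcouple itself is not shown; Q1, Q3, SK, and MC are passed in as inputs, and the function name and numbers are hypothetical.

```python
import math


def adjusted_fences(q1, q3, sk, mc, c=1.5):
    """Lower and upper fences of Eq. 4 for given quartiles, skewness and medcouple."""
    iqr = q3 - q1
    lower = q1 - c * iqr * math.exp(-sk * abs(mc))
    upper = q3 + c * iqr * math.exp(sk * abs(mc))
    return lower, upper


# With zero skewness the fences reduce to the classical Tukey fences.
print(adjusted_fences(34.0, 35.0, sk=0.0, mc=0.3))
# Positive skewness stretches the upper fence and pulls in the lower fence.
print(adjusted_fences(34.0, 35.0, sk=1.0, mc=0.3))
```

Raising `c` widens both fences, which is the mechanism behind the lower rejection rate and higher TNR reported for larger C.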

Figures 1 and 2 compare the different threshold methods: Tukey’s boxplot⁷¹, the modified adjusted boxplot⁷³, the mean ± 3-sigma rule (i.e., the PauTa criterion), and the 99% quantile³⁶. The mean ± 3-sigma method yields symmetrical fences because it assumes a Gaussian distribution of the ocean variables. In some cases (e.g., Fig. 2), the mean ± 3-sigma ranges seem too wide compared to the actual observations. In other cases (e.g., Fig. 2c), the modified adjusted boxplot can effectively identify bad outliers that the other methods mistakenly accept as good data. However, the modified adjusted boxplot might face limitations in data-poor regions, as its accuracy relies on the local skewness parameters SK and MC. In regions with highly skewed salinity distributions (such as regions A and D in Figs. 1, 2), the 99% quantile appears to retain more realistic-looking data than the other methods. Additionally, the benchmark evaluation using the WOCE dataset showed that, when the coefficient C in Eq. 4 is tuned to C = 2.0, the modified adjusted boxplot⁷³ yields a percentage of outliers similar to that of the 99% quantile method, with no significant difference in data rejection rate or TNR between the two approaches (Supplementary Table 1). Therefore, considering its capacity to deal with highly skewed data and data-poor regions, we currently use the 99% quantile approach in the CODC-QC-S system (i.e., the 99.5% quantile for the upper threshold and the 0.5% quantile for the lower threshold).

Within each one-degree box, the in-situ WOD salinity profiles collected between 1940 and 2023 are used to establish the local climatological range fields (hereinafter IAP-S-range). As detailed in the Local salinity climatology range check section above, the preliminarily quality-controlled salinity profiles are interpolated to 79 standard depths from the surface down to 2000 m following the method of Reiniger and Ross⁷⁴. Interpolation is not performed where the gap between two consecutive levels exceeds a threshold (the threshold follows Gouretski²⁰). We did not establish the monthly IAP-S-range below 2000 m because data in the deep ocean are limited.

At each standard level and for each grid point on a monthly basis, the surrounding profiles are selected within a 555 km radius (following Li et al.³⁰) to guarantee a sufficient number of profiles even in data-poor regions. The minimum number of profiles required within this influence bubble was determined empirically: at least 40 profiles above 250 m, 30 profiles from 250 to 450 m, 20 profiles from 450 to 1500 m, and 15 profiles from 1500 to 2000 m. If the number of profiles collected in a given month does not meet these criteria, additional profiles from neighboring months are included.
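The profile-count rule can be sketched as follows. The minimum counts come from the text; how neighboring months are added is not specified, so the symmetric widening by one month at a time (wrapping around the calendar year) is an assumption, as are the function names and the treatment of the layer boundaries.

```python
def min_profiles_required(depth_m):
    """Minimum number of profiles needed at a standard level (0 to 2000 m)."""
    if depth_m < 250:
        return 40
    if depth_m < 450:
        return 30
    if depth_m < 1500:
        return 20
    return 15


def month_window_half_width(counts_by_month, month, depth_m):
    """Smallest symmetric month window around `month` (1..12) holding enough profiles.

    counts_by_month lists the profile counts for January..December.
    Returns the half-width in months, or None if even all 12 months fall short.
    """
    need = min_profiles_required(depth_m)
    for half_width in range(0, 7):
        months = {(month - 1 + d) % 12 for d in range(-half_width, half_width + 1)}
        if sum(counts_by_month[i] for i in months) >= need:
            return half_width
    return None


counts = [10] * 12  # hypothetical: ten profiles in every calendar month
print(month_window_half_width(counts, month=6, depth_m=100))
print(month_window_half_width(counts, month=6, depth_m=1800))
```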

Due to the large initial size of the influence bubble, salinity profiles from different water masses might be ingested within the bubble, increasing the overall local salinity range. To more precisely select profiles with characteristics specific to the center of the bubble (i.e., the grid point for which the local salinity limits are calculated), the following procedure is implemented. For all 1-degree boxes whose centers fall within the influence bubble, the monthly mean salinity (M) and the salinity standard deviation (σ) are calculated. We then retain the data from the boxes whose mean salinity lies within the range $[{M}_{c}-0.8{\sigma }_{c},\,{M}_{c}+0.8{\sigma }_{c}]$, where ${M}_{c}$ and ${\sigma }_{c}$ are the mean salinity and salinity standard deviation of the central box within the bubble; the coefficient 0.8 was chosen after some experiments. The selection of profiles also takes into account topographic barriers, so that profiles isolated by a topographic barrier are not considered. The global 0.5 arc-second resolution digital General Bathymetric Chart of the Oceans (GEBCO; 2022 Version) is used to represent the bottom relief⁶⁴.
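The box selection step can be illustrated with a short sketch. Box identifiers, the function name, and the numbers are hypothetical; the topographic-barrier test is not included.

```python
def select_boxes(box_means, center_mean, center_std, coeff=0.8):
    """Keep boxes whose mean salinity lies within center_mean +/- coeff * center_std."""
    low = center_mean - coeff * center_std
    high = center_mean + coeff * center_std
    return [box for box, mean in box_means.items() if low <= mean <= high]


# Hypothetical monthly mean salinities of four 1-degree boxes inside the bubble.
box_means = {"A": 35.0, "B": 35.3, "C": 34.5, "D": 35.45}
print(select_boxes(box_means, center_mean=35.1, center_std=0.4))
```

With a central mean of 35.1 and a standard deviation of 0.4, the accepted interval is roughly 34.78 to 35.42, so boxes C and D are treated as belonging to a different water mass.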

Following the above strategies, the upper (S_max) and lower (S_min) climatological thresholds of the IAP-S-range are defined as the 99.5% and 0.5% quantiles of the data retained after the selection procedure described above. However, constant local thresholds may lead to the exclusion of good ‘extreme’ observations, since global salinity has exhibited a significant long-term change over the past 60 years, for example, salinization in the Atlantic Ocean and freshening in the Pacific Ocean because of the intensification of the global hydrological cycle^3,30,75. We therefore apply time-varying thresholds ${S}_{{\max }}{\prime} $ and ${S}_{{\min }}{\prime} $ instead, with the long-term threshold change represented by a linear trend:

$$\begin{array}{ccc}{S}_{{\max }}{\prime} & = & {S}_{{\max }}+{Year}\ast \left|{k}_{{mean}}\right|\\ {S}_{{\min }}{\prime} & = & {S}_{{\min }}-{Year}\ast \left|{k}_{{mean}}\right|\end{array}$$

where ${k}_{{mean}}$ is estimated by linearly fitting the IAP salinity monthly gridded product³ in each 1-degree box at each standard level, and ${Year}$ denotes the year of observation (ranging from 1940–2023). The Supplementary Figure 11 shows the spatial distribution of ${k}_{{mean}}$. The Atlantic Ocean shows the highest values, indicating a broad salinization trend, while the Pacific Ocean exhibits the lowest values, corresponding to a significant freshening trend. These patterns agree with the findings of Li et al.³⁰. The final local salinity climatological range field (i.e., IAP-S-range) was constructed on a 1° × 1° grid spanning 79 standard levels from the surface to 2000 m. For each 1-degree box, a spatial nine-point moving average filter was applied to smooth the data.
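The widening of the thresholds by the linear trend can be sketched as below. The text defines Year as the year of observation but does not state the reference point from which it is counted, so this example takes the number of elapsed years as an explicit, hypothetical input. The trend value is likewise made up for illustration.

```python
def time_varying_range(s_min, s_max, k_mean, years):
    """Widen the climatological range by years * |k_mean| on both sides.

    `years` is the elapsed time in years relative to an assumed reference epoch.
    """
    widen = years * abs(k_mean)
    return s_min - widen, s_max + widen


# A freshening trend of -0.002 per year widens the range just as a positive one would.
print(time_varying_range(34.0, 36.0, k_mean=-0.002, years=50))
```

Because the absolute value of the trend is used, the range only ever grows, which protects genuine extremes in both salinifying and freshening regions.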

Figure 3 shows the fields of IAP-S-range for four representative depth layers. The minimum and maximum salinity fields indicate the large-scale salinity patterns, mainly modulated by the surface forcing (i.e., Evaporation minus Precipitation and river runoff) and oceanic transports⁷⁶. For instance, the near-surface salinity distribution acts as a rain gauge for precipitation minus evaporation over ocean (Fig. 3a,b). For the subsurface ocean, the salinity at 360 m depth is mainly featured by the high-salinity waters of the subtropical gyres and by fresher waters of the North Pacific and Southern Oceans (Fig. 3d,e). Salinity at 1000 and 1500 m levels reveals the low-salinity waters subducting in low- and mid- latitudes^1,19 (Fig. 3g–k). The salinity contrast between the salty Atlantic and fresh Pacific is also well seen for all levels, and the saltier Atlantic for 1000 m and 1500 m levels can be attributed to the overflow of the Mediterranean waters⁷⁷.

The local salinity climatological ranges mirror the spatial variation of salinity (Fig. 3). The largest range of near-surface salinity is confined to coastal regions and the Arctic Ocean (Fig. 3c), with the latter mainly affected by strong terrestrial runoff and by ice melt and ice formation^78,79. The Bay of Bengal shows significant near-surface salinity variations, which are linked to large terrestrial runoff events and known ocean fronts¹. Other regions of high salinity variation at the 15 m and 360 m levels correspond to the western boundary currents, particularly the Gulf Stream, and are a manifestation of moving ocean fronts (Fig. 3c,f,i). The fields at 1,000 m and 1,500 m show a large salinity variability of >0.6 g kg⁻¹ in the North Atlantic Ocean, corresponding to the salty signature of Mediterranean Outflow Waters (MOW)⁸⁰ (Fig. 3i,l). To further illustrate the IAP-S-range, the local salinity climatological median field is shown in Supplementary Figure 12.

Data Records

The CODC-S dataset⁸¹, which contains global ocean QC-ed salinity data from Jan 1940 to Dec 2023 produced by applying the CODC-QC-S procedure described above, is freely available from the Chinese Academy of Sciences Ocean Data Repository at http://www.ocean.iap.ac.cn/ftp/cheng/CODCv2.1_Insitu_T_S_database/ or https://doi.org/10.12157/IOCAS.20241217.001 (for efficient reuse in the community, we put the CODC temperature (CODC-T) profiles⁵⁴ in the same folder). The dataset includes 1,008 ‘.mat’ (MATLAB format) and 1,008 ‘.nc’ (NetCDF format) monthly files (until Dec 2023), each corresponding to a specific year and month. The format description is attached as a ‘README’ document. Additionally, because the primary data source for this study is WOD, most of the relevant metadata from WOD, including WOD-QC flags and unique data IDs, have been retained, as in the CODC-T dataset⁵⁴. This is essential for enabling future comparisons between CODC-S and WOD (e.g., Section 4). Table 3 lists the data format and introduces the variables in the CODC-S data files.
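The file count follows from the monthly layout: 84 years (1940 to 2023) times 12 months. A minimal sketch that enumerates the year-month stamps is given here; the function name is hypothetical, and mapping each stamp to a file name requires the naming pattern documented in the README, which is not reproduced in this sketch.

```python
def monthly_stamps(first_year=1940, last_year=2023):
    """List (year, month) pairs, one per monthly file, in time order."""
    return [
        (year, month)
        for year in range(first_year, last_year + 1)
        for month in range(1, 13)
    ]


stamps = monthly_stamps()
print(len(stamps))            # 1008
print(stamps[0], stamps[-1])  # (1940, 1) (2023, 12)
```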

Table 3 A simple description of the salinity profile data, their QC flags, and metadata information stored in CODC-S (in MATLAB files).

Figures 4 and 5 show basic statistics on the data counts and the geographical locations of profiles in CODC-S. For the entire dataset, the total number of salinity observations increases gradually over time but decreases with depth (Fig. 4). The majority of the salinity profiles come from three main instrumentation types: 1) hydrographic bottle casts (e.g., the OSD instrumentation type), 2) CTD casts, and 3) autonomous Argo floats (PFL type). The OSD profiles contribute 23% of all salinity profiles but exhibit a strong geographical bias toward the Northern Hemisphere (Fig. 5b). A small fraction of the OSD profiles come from low-resolution CTD profiles⁴⁷. Since the end of the 1960s, when electronic profilers were introduced into oceanographic practice, CTD profiles have accumulated to ~1.32 million (13.07% of all profiles). Since the 2000s, the array of core-Argo autonomous floats has provided salinity profiles for the upper 0–2000 m, dramatically improving global salinity sampling (Figs. 4b, 5e). The Argo salinity profiles comprise 24.70% of the entire CODC-S dataset. The four other instrumentation types, APB, MRB, UOR, and DRB, contribute 6.16%, 6.38%, 1.74% and 1.17% of profiles, respectively. These instruments are characterized by a regional geographical scope. Below the core-Argo float maximum depth of 2000 m, the number of salinity observations drops significantly.

Technical Validation

CODC-QC-S systems validations

Here, we first validate the performance of the proposed QC system. Using the WOCE benchmark dataset, we find a TNR of 99.93% and 99.24% for the WOCE CTD and WOCE Bottle datasets, respectively. The flags are mainly attributed to the local salinity climatological range check. In comparison, the ICDC-QC scheme developed by Gouretski²⁰ (improved version of November 2024), in which the local salinity climatological ranges are defined by the modified adjusted box-plot method, has a similar TNR of 99.91% (WOCE CTD dataset) and 99.8% (WOCE Bottle dataset). The similarity arises because, by tuning the coefficient C of the modified adjusted box-plot⁷², ICDC-QC can yield a percentage of outliers similar to that of CODC-QC-S (see Supplementary Table 1). This result indicates that CODC-QC-S can effectively retain good data.

We also note that the WOCE Bottle and CTD benchmark datasets contain a tiny fraction (~1–3%) of salinity values with WOCE flags indicating bad or likely bad data. Only 1.09% (WOCE CTD dataset) and 5.34% (WOCE Bottle dataset) of these values were detected as outliers by the CODC-QC-S. Similarly, these values are 1.26% and 3.65% for ICDC-QC. The failure in detection is due to the dominant share (~65%) of hidden outliers that fall within the magnitude of natural variability: below the halocline, their salinity differences from adjacent depth layers are typically smaller than 0.05 on average, which is much smaller than the preset thresholds in all the QC checks. This result indicates a potential limitation of the CODC-QC-S. Nevertheless, because the benchmark contains only a tiny fraction of bad data, the evaluation of how well CODC-QC-S removes bad data would benefit from comparison with other benchmark datasets that contain a large amount of bad data and have undergone expert QC (e.g., similar to the QuOTA dataset⁶⁵ for temperature profiles). No other manually validated salinity datasets were available to us at present.
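The two benchmark scores discussed above can be sketched as follows, treating "flagged as bad" as the positive class. The true negative rate (TNR) is then the share of benchmark-good values that the QC keeps, and the detection rate is the share of benchmark-bad values that the QC flags. The function name and the toy flags are hypothetical; this is an illustration of the standard definitions, not the evaluation code used for the paper.

```python
def benchmark_scores(benchmark_bad, qc_bad):
    """Return (TNR, detection rate) in percent.

    benchmark_bad: True where the benchmark marks a value as bad.
    qc_bad:        True where the automated QC flags the same value.
    """
    tn = fp = tp = fn = 0
    for truth_bad, flagged in zip(benchmark_bad, qc_bad):
        if truth_bad:
            tp += flagged          # bad value correctly flagged
            fn += not flagged      # bad value missed by the QC
        else:
            fp += flagged          # good value wrongly flagged
            tn += not flagged      # good value correctly retained
    tnr = 100.0 * tn / (tn + fp) if (tn + fp) else float("nan")
    detection = 100.0 * tp / (tp + fn) if (tp + fn) else float("nan")
    return tnr, detection


# Toy example: four good values and two bad values.
truth = [False, False, False, False, True, True]
flags = [False, False, False, True, True, False]
print(benchmark_scores(truth, flags))  # (75.0, 50.0)
```

A high TNR with a low detection rate, as reported above, means the system rarely discards good data but misses bad values that lie within the range of natural variability.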

For further assessment of the CODC-QC-S performance, we checked its outlier detection ability using the Argo grey list (the grey list information is sourced from https://argo.ucsd.edu/data/) as an alternative reference. We compared the rejection rates of Argo grey list floats and non-grey list floats and found that the grey list floats exhibited systematic errors or other types of malfunctions, which contribute to their higher rejection rate (Fig. 6). This is also illustrated by Fig. 12c, which shows a higher rejection rate for some specific Argo floats. Similar results can be found for the QCed data in the WOCE benchmark dataset, where the rejected data can be attributed to some specific cruise lines (Supplementary Figure 13). We note that a minor fraction of non-grey list floats also exhibits a much higher rejection rate (for instance, the float with ID = 3902185 in Fig. 6). We believe that these examples show the ability of CODC-QC-S to serve as a valuable tool for data quality monitoring in the future.

We also used some randomly selected real-time PFL (Argo) salinity profiles to further evaluate the CODC-QC-S performance. The raw real-time Argo profiles have not undergone rigorous delayed-mode QC at Argo Data Assembly Centers (DACs) and so include a considerable fraction of outliers (Fig. 7a,b). Data with quality issues and gross errors (e.g., spikes, extreme values, constant values, unrealistic variability, etc.) are successfully identified as outliers by the CODC-QC-S, with similar performance across different periods and instrument types (Supplementary Figure 14). Accordingly, applying the CODC-QC-S reduces the overall salinity standard deviation at depth levels compared with the raw data (Fig. 7c).

Furthermore, we checked the regional CODC-QC-S performance in two randomly selected boxes. Figure 8 shows the mean and standard deviation of the salinity anomaly profiles (relative to the 2008–2012 climatology) within the two boxes before and after CODC-QC-S. Before CODC-QC-S, the raw data include a lot of unrealistic variation (see Fig. 8b,e), whereas the data after CODC-QC-S show significantly fewer anomalies (Fig. 8a,d). Several specific examples of data rejection by the CODC-QC-S are provided in Supplementary Figures 1–10. We conclude that applying the CODC-QC-S reduces the overall standard deviation and yields smoother profiles of the standard deviation over depth. In addition, we note that the quality control procedure has an impact on the mean salinity profile (Fig. 8a,d).

Dataset (CODC-Salinity) validations

Individual data points (profile) validation

Here, we validate the CODC-S dataset using outlier statistics based on over 11 million individual QCed salinity profiles from 1940 to 2023 (see Section 2.1 for the data source). The total rejection rate is defined as the percentage of observations flagged as bad relative to the total number of observations. Gross errors and missing values (e.g., 99999, −99999, 99, −99, etc.) were excluded from our statistics. For the entire CODC-S dataset, which includes the OSD, CTD, PFL, APB, MRB, DRB, and GLD instrumentation types, the overall rejection rate is 2.27% for 2,234,189,022 measurements. The yearly rejection rate decreased over time, with data collected before the 1990s generally exhibiting higher rejection rates than data obtained after the 2000s (red line in Fig. 9b). The decreasing rejection rate is primarily due to the general improvement of instrument accuracy/precision over time^46,53,82. The result shows a homogeneous overall rejection rate (~3.5%) with depth (black line in Fig. 9c), with only some regionally deployed instrument groups showing a higher-than-average rejection rate (Fig. 9c). Among the distinct instrument groups, the MRB, PFL, and Glider data exhibit the lowest rejection rates (less than 2%), indicating better data quality than APB, DRB and old Nansen casts. The lower rejection rate of PFL and Glider data is explained by the fact that only good data provided by the data originator were included in WOD (see the Introduction section).
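The rejection rate defined above can be computed per year with a short sketch such as the following. The record layout `(year, salinity, flagged_bad)`, the function name, and the exact set of fill values are assumptions for illustration; the fill values listed are only the examples named in the text.

```python
from collections import defaultdict

# Example missing-value codes named in the text (assumed not exhaustive).
FILL_VALUES = {99999.0, -99999.0, 99.0, -99.0}


def rejection_rate_by_year(records):
    """Percentage of observations flagged as bad, per year.

    records: iterable of (year, salinity, flagged_bad) tuples.
    Fill values are dropped before counting, as described in the text.
    """
    total = defaultdict(int)
    bad = defaultdict(int)
    for year, salinity, flagged_bad in records:
        if salinity in FILL_VALUES:
            continue
        total[year] += 1
        if flagged_bad:
            bad[year] += 1
    return {year: 100.0 * bad[year] / total[year] for year in sorted(total)}


records = [
    (1950, 35.1, False), (1950, 40.2, True), (1950, 34.9, False),
    (1950, 35.0, False), (1950, 99999.0, True),  # fill value, ignored
    (2010, 34.7, False), (2010, 34.8, False),
]
print(rejection_rate_by_year(records))  # {1950: 25.0, 2010: 0.0}
```

The same grouping key can be swapped for depth bin, instrument type, or float ID to obtain the other breakdowns discussed in this section.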

Specifically, the APB data have the highest overall rejection rate of 7.39% among all instrumental types. The percentage of outliers steadily increases with depth below 500 m (Fig. 9a,c). We note that mammals less frequently dive deeper than 500 m⁸², and the APB measurements can be strongly influenced by the behaviors of marine animals⁵³, the thermal mass errors in the tags⁸³ and the errors in geographical position⁸⁴. Our results are also consistent with Boehlert et al.⁸² who reported the high outlier percentage for APB data.

Figure 10 shows the rejection rate versus depth and time for four main instrumentation types. The progressive improvement in data quality is most clearly seen in OSD profiles. The highest percentage in the upper 500 m during the 1940s is mainly due to the erroneous positions of Nansen casts often reported during the Second World War (Fig. 11a). The CTD profiles are characterized by higher rejection rates before 1980 (Fig. 10b), partly due to gross salinity errors at the initial stage of CTD implementation, and below 2500 m, partly due to unrealistic vertical gradients and spikes at the final stage of the CTD downcast before it reaches the bottom (Fig. 11b). The quality of the PFL data improves significantly after 2005, indicating issues with salinity sensors during the initial stage of the Argo program (Figs. 10d, 11c). The increased rejection rate after ~2020 (Figs. 10d, 11c) is due to a larger fraction of real-time Argo data, which have not undergone delayed-mode quality control (DMQC) at the Data Assembly Centers. A higher rejection rate for APB data in 2004–2006 is linked to some gross errors (Fig. 11e). A high percentage of DRB outliers in 2013 is connected to wrong geographic coordinates located on land (Fig. 11f). The local salinity climatology range check results in the largest outlier percentages compared with the other checks (Fig. 11).

Finally, Fig. 12 provides spatial rejection rate maps for four main instrumentation types: CTD, OSD, PFL, and APB. The three most accurate instrumentation types (OSD, CTD, and PFL) are characterized by low outlier percentages (less than 2% on average) for most regions. Somewhat higher percentages are found within regions of coastal upwelling (tropical Pacific), marginal seas (e.g., the Japan Sea, the Red Sea), highly energetic zones (e.g., Kuroshio, Antarctic Circumpolar Current, Agulhas Return Current), and coastal regions, suggesting the need for further adjustment of the local salinity climatology range.

Additionally, Fig. 9a–b and Supplementary Figure 15 show the rejection rate for WOD-QC flags. According to the WOD user manual³⁷, two types of WOD-QC flags are provided: QC flags for each observed level (Salinity_WODflag) and QC flags for the entire cast (Salinity_WODprofileflag). The latter flag is based on both AutoQC and ExpertQC. The ExpertQC flag is set when the profiles are selected for NOAA climatology products³⁷ (e.g., WOA23). Our results indicate a significant difference in the impact of these two flags: the first flag is less strict than the second. Comparing with the results of the CODC-QC-S validation (red lines in Fig. 9b), we find that the rejection rate of CODC-QC-S falls between the rejection rates suggested by the two WOD-QC flags. The discrepancy in rejection rate between CODC-QC-S and WOD-QC might be due to differences in the threshold definitions of the QC checks, additional QC checks, and the strategy for flagging entire profiles (see Supplementary Table 2 in the Supplementary Information). However, intercomparing the performance of different salinity QC checks is complicated by the different QC standards and their multifaceted adoption in different cases. Currently, IQuOD (International Quality-Controlled Ocean Database)⁸⁵ has started a task team to identify best-practice salinity AutoQC checks, and we believe the above investigation could contribute to that effort in the future.

To conclude, the data quality of salinity in CODC-S is time-, depth-, instrument-, and region-dependent (Figs. 9–12), which is mostly linked to changes in measuring techniques (e.g., sensors and recording systems). Specifically, the bottle salinities of water samples (OSD instrumentation type), determined through chemical titration in early years⁴⁶, have been gradually replaced by salinities determined using salinometers⁴⁵, which still provide the most accurate salinity measurements and are often used as a reference. In addition, there has been a shift from manual data recording (for old bottle data) to automatic recording by shipboard CTDs or Argo floats. Compared with salinity titration for old Nansen casts, modern instrumentation determines salinity by measuring the electrical conductivity of seawater, markedly improving the data quality (as indicated in Section 1).

Climatology validation

Another way to illustrate the CODC-S data quality is to inspect spatial fields of the salinity standard deviation, calculated from all available observations in each grid box, which largely represents the local variability of ocean salinity. Fields based on unvalidated data typically exhibit spikes, “bullseyes” (red blobs), or other unphysical salinity variations. Figure 13 shows the 1-degree gridded fields of the salinity standard deviation at three selected levels (20 m, 700 m, and 1000 m). Generally, compared with the non-QCed data, the maps after applying the CODC-QC-S depict the well-known large-scale global salinity variation patterns, with areas of high standard deviation corresponding to the high-energy regions of the western boundary currents, the Antarctic Circumpolar Current, and the equatorial zone. These patterns also agree with the standard deviation map of the World Ocean Atlas 2023 (WOA23¹⁸), in which the original salinity data have undergone automatic WOD-QC and a rigorous manual QC check (Fig. 13c,f,i) and which can thus be regarded as a benchmark climatology dataset. We believe that the high degree of consistency with the WOA23 fields indicates the success of the CODC-QC-S scheme in removing outliers without any further manual/expert QC. These patterns can also be explained by previous studies^1,3 of ocean thermohaline variability. The same fields based on non-validated data exhibit numerous spots of high standard deviation (second column of Fig. 13). We also note that the fields based on the data retained after applying only the WOD observed-level flag (‘Salinity_WODflag’) still exhibit some unrealistic patterns in the upper layers and a larger standard deviation than CODC-QC-S (see Fig. 14).

Time series validation

Another validation method is to check the impact of QC on the estimation of salinity changes. As ocean salinity has changed at various spatial and temporal scales in response to natural variability and external forcings^13,86, erroneous salinity profiles, if not properly dealt with, will ultimately affect the estimation of salinity changes. To test the performance of the CODC-S dataset for long-term salinity change estimates, we compare the estimated sea surface salinity (SSS) and 0–2000 m averaged salinity (S2000) changes at the global scale and in different ocean basins using two datasets: CODC-S with CODC-QC-S applied, and the same in-situ data without CODC-QC-S applied (i.e., NoQC, although a crude QC is still applied to remove gross values, for example, salinity measurements less than 0 g/kg or larger than 45 g/kg). The two datasets are then processed separately with the same procedure as described in Cheng et al.³, e.g., the same vertical interpolation and gap-filling methods, as well as the removal of duplicated profiles following the definition by Song et al.⁸⁷.

The Global, Atlantic, Pacific, and Indian SSS and S2000 time series before and after QC are shown in Fig. 15. The most visible difference before and after QC is the salinity variability. After QC, the standard deviation of the detrended salinity time series from 1960 to 2023 is reduced by ~25% for global SSS (0.0191 g kg⁻¹ before QC, 0.0141 g kg⁻¹ after QC) and by ~75% for global S2000 (0.0059 g kg⁻¹ before QC, 0.0015 g kg⁻¹ after QC). For example, the ‘large jumps’ seen in the S2000 time series before QC during the early 1980s might be caused by gross salinity errors (e.g., limitations of sensors or conductivity probes) during the initial stage of CTD implementation^88,89,90; these ‘jumps’ disappear after applying the CODC-QC-S. The smaller salinity variability after QC is likely more physically tenable because salinity change at the global scale is associated with the surface net freshwater flux, which can be used as a constraint on salinity variability. A global change of 0.0059 g kg⁻¹ in S2000 corresponds to a sea-level change of about 300 mm (assuming that the freshwater inputs/outputs are confined to the 0–2000 m layer), which is far greater than the variation of total sea level revealed by altimetry data (from the University of Colorado, https://sealevel.colorado.edu/). Independent measurements of ocean mass change by the Gravity Recovery and Climate Experiment (GRACE; Watkins et al.⁹¹) suggest a water mass variation of 1.25 × 10¹² m³ (equal to a sea level variation of ~2.75 mm) from 2002 to 2023, corresponding to an S2000 variability of 0.00005 g kg⁻¹. For comparison, the S2000 variation after QC is 0.0011 g kg⁻¹ after 2002, closer to the GRACE result than before QC (0.0017 g kg⁻¹). These physical considerations suggest that QC can reduce spurious variability and improve the quality of salinity data.
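The order of magnitude of the salinity to sea-level conversion used above can be reproduced with a simple dilution scaling, Δh ≈ H · ΔS / S₀, where H is the layer thickness and S₀ a reference salinity. The sketch below assumes S₀ = 35 g kg⁻¹, a value that is not stated in the text and is used only for illustration; the function names are hypothetical, and only magnitudes (not signs) are considered.

```python
def sea_level_equivalent_mm(delta_s, layer_m=2000.0, s_ref=35.0):
    """Freshwater-equivalent sea-level change (mm) for a layer-mean
    salinity change delta_s (g/kg), using dh = H * dS / S_ref.
    s_ref = 35 g/kg is an assumed reference salinity."""
    return delta_s / s_ref * layer_m * 1000.0


def salinity_equivalent(delta_h_mm, layer_m=2000.0, s_ref=35.0):
    """Inverse relation: layer-mean salinity change (g/kg) implied by a
    freshwater-equivalent sea-level change delta_h_mm (mm)."""
    return s_ref * (delta_h_mm / 1000.0) / layer_m


print(round(sea_level_equivalent_mm(0.0059)))  # 337
print(salinity_equivalent(2.75))
```

Under this assumption, 0.0059 g kg⁻¹ maps to roughly 340 mm, in line with the "about 300 mm" quoted above, and 2.75 mm maps to about 0.00005 g kg⁻¹.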

The reduction of variability after QC is found in all ocean basins. For example, before QC, there were large Atlantic, Indian and Pacific S2000 salinity anomalies of >0.03 g kg⁻¹ in the early 1980s, a spike in Atlantic S2000 of ~0.02 g kg⁻¹ in 2005, and drops in Indian S2000 of 0.01–0.02 g kg⁻¹ after 2020, all of which are too large to be physical and result from erroneous in situ data. Such anomalous signals disappeared after QC/adjustment (Fig. 15). For basin means, the standard deviations are reduced from 0.0253 (Pacific SSS), 0.0068 (Pacific S2000), 0.0359 (Atlantic SSS), 0.0066 (Atlantic S2000), 0.0378 (Indian SSS), 0.0115 (Indian S2000) before QC to 0.0181 (Pacific SSS), 0.0015 (Pacific S2000), 0.0291 (Atlantic SSS), 0.0030 (Atlantic S2000), 0.0313 (Indian SSS), 0.0025 (Indian S2000) g kg⁻¹ after QC (Fig. 15). Note that the S2000 estimate always shows a larger impact of QC than SSS because data are sparser in the subsurface than near the surface, so erroneous measurements can affect broader areas in time and space through the spatial interpolation approach.

The long-term trends of salinity change can also be influenced by the QC method. Because of the amplification of the global water cycle, the Pacific Ocean has been getting fresher and the Atlantic Ocean saltier^1,3,19,92. Such contrasting trends are robust even in data without QC (Fig. 15). However, the magnitude of the long-term (1960–2023) trends can be strongly affected by QC, changing from 0.19 ± 0.05 (Atlantic SSS), 0.024 ± 0.019 (Atlantic S2000), −0.09 ± 0.04 (Pacific SSS), −0.018 ± 0.021 (Pacific S2000) g kg⁻¹ century⁻¹ before QC to 0.09 ± 0.04 (Atlantic SSS), 0.020 ± 0.009 (Atlantic S2000), −0.09 ± 0.02 (Pacific SSS), −0.015 ± 0.004 (Pacific S2000) g kg⁻¹ century⁻¹ after QC. The linear trend is calculated by ordinary least squares regression, with the 90% confidence interval shown (accounting for the reduction in degrees of freedom). On a global average, GRACE data imply a trend in ocean salinity of about −0.004 g kg⁻¹ century⁻¹ from 2002 to 2023 for the upper 2000 m, assuming that all the freshwater is added to this layer; this is nearly identical to the global S2000 trend from 1960 to 2023 after QC (−0.004 ± 0.002 g kg⁻¹ century⁻¹). However, the short-term global S2000 after 2000 shows an increasing trend, opposite to the expected impact of freshwater input into the ocean due to land ice melting, which is likely associated with Argo data drift^93,94.
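A trend estimate of the kind quoted above can be sketched as an ordinary least squares fit with a confidence interval widened for serial correlation. The exact degrees-of-freedom adjustment used for the paper is not described in this section, so the sketch assumes a common choice: an effective sample size based on the lag-1 autocorrelation of the residuals. It also uses a normal quantile instead of a Student-t quantile to stay within the Python standard library. Function and variable names are hypothetical.

```python
import math
from statistics import NormalDist, mean


def linear_trend(years, values, confidence=0.90):
    """OLS slope (per year) and the half-width of its confidence interval.

    Assumptions of this sketch: the effective sample size is
    n_eff = n * (1 - r1) / (1 + r1), with r1 the lag-1 autocorrelation of
    the residuals (negative r1 is treated as 0), and a normal quantile is
    used in place of a Student-t quantile.
    """
    n = len(years)
    x_mean, y_mean = mean(years), mean(values)
    sxx = sum((x - x_mean) ** 2 for x in years)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    resid = [y - (intercept + slope * x) for x, y in zip(years, values)]

    sse = sum(e * e for e in resid)
    r1 = sum(a * b for a, b in zip(resid[:-1], resid[1:])) / sse if sse > 0 else 0.0
    r1 = min(max(r1, 0.0), 0.99)
    n_eff = n * (1.0 - r1) / (1.0 + r1)
    dof = max(n_eff - 2.0, 1.0)

    std_err = math.sqrt(sse / dof / sxx)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return slope, z * std_err


# Synthetic annual series: a weak trend plus alternating noise (illustrative).
years = list(range(1960, 2024))
series = [0.0002 * (y - 1960) + (0.001 if y % 2 else -0.001) for y in years]
slope, half_width = linear_trend(years, series)
print(f"{100 * slope:.3f} ± {100 * half_width:.3f} per century")
```

Multiplying the per-year slope by 100 gives the per-century units used in the text. Positive residual autocorrelation shrinks the effective sample size and therefore widens the interval.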

In summary, this test suggests that a high-quality, outlier-minimized global ocean salinity profile dataset and a well-tuned salinity QC system are the basis for more accurate estimates of global and regional salinity changes. In turn, robust estimates of long-term salinity trends are the basis for quantifying water cycle amplification^1,3. However, even after our QC, the magnitude of salinity variability is still larger than that derived from GRACE data, indicating that either the QC scheme needs further refinement or there are other sources of error in the salinity data.

Usage Notes

The development of a quality-controlled global hydrographic database is the main goal of the international IQuOD initiative (International Quality-controlled Ocean Database)⁸⁵. The joint efforts of several international teams have resulted in the first version of this database, with quality flags for temperature profiles. This study may be considered a further contribution to the IQuOD joint effort, including contributing to the development of the IQuOD salinity database and identifying best practices in salinity QC. We also believe that the assessment methodologies used in this study could serve as a preliminary framework for evaluating salinity QC performance in the future.

Code availability

Scripts for loading the CODC-S dataset into MATLAB, as well as the codes to process the dataset for the manuscript figures, are provided in the README document via http://www.ocean.iap.ac.cn/ftp/cheng/CODCv2.1_Insitu_T_S_database/. We also provide access to the codes for interpolation methods for their specific purposes via http://www.ocean.iap.ac.cn/.

References

Durack, P. J. & Wijffels, S. E. Fifty-year trends in global ocean salinities and their relationship to broad-scale warming. J Climate 23, 4342–4362 (2010).
Levang, S. J. The response of ocean salinity patterns to climate change: implications for circulation, Massachusetts Institute of Technology (2019).
Cheng, L. et al. Improved estimates of changes in upper ocean salinity and the hydrological cycle. J Climate 33, 10357–10381 (2020).
Rabe, B., Johnson, H. L., Münchow, A. & Melling, H. Geostrophic ocean currents and freshwater fluxes across the Canadian polar shelf via Nares Strait. J Mar Res 70, 603–640 (2012).
Li, G. et al. Increasing ocean stratification over the past half-century. Nat Clim Change 10, 1116–1123 (2020).
Haumann, F. A. et al. Sea-ice transport driving Southern Ocean salinity and its recent trends. Nature 537, 89–92 (2016).
Freeland, H. J. Evidence of change in the winter mixed layer in the Northeast Pacific Ocean: a problem revisited. Atmos Ocean 51, 126–133 (2013).
Ranasinghe, R. et al. Climate change information for regional impact and for risk assessment. In climate change 2021: The physical science basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. (Cambridge Univ. Press, 2021).
Ruane, A. C. et al. The climatic impact‐driver framework for assessment of risk‐relevant climate information. Earths Future 10, e2022EF002803 (2022).
Tan, Z., von Schuckmann, K., Cheng, L. & Speich, S. in EGU General Assembly Conference Abstracts. EGU-3662.
Pecuchet, L., Törnroos, A. & Lindegren, M. Patterns and drivers of fish community assembly in a large marine ecosystem. Mar Ecol Prog Ser 546, 239–248 (2016).
Boyer, T. P. et al. World Ocean Database 2018. A. V. Mishonov, Technical Editor, NOAA Atlas NESDIS 87 (2018).
Bindoff, N. L. et al. Changing Ocean, Marine Ecosystems, and Dependent Communities. In: IPCC Special Report on the Ocean and Cryosphere in a Changing Climate. Report No. 1009157973 (2019).
IPCC. Climate change 2021: The physical science basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. (Cambridge Univ. Press 2021).
Cheng, L. et al. New record ocean temperatures and related climate indicators in 2023. Adv Atmos Sci, 1–15 (2024).
Ishii, M. et al. Accuracy of global upper ocean heat content estimation expected from present observational data sets. Sola 13, 163–167 (2017).
Good, S. A., Martin, M. J. & Rayner, N. A. EN4: Quality controlled ocean temperature and salinity profiles and monthly objective analyses with uncertainty estimates. J Geophys Res-Oceans 118, 6704–6716 (2013).
Reagan, J. R. et al. World Ocean Atlas 2023, Volume 2: Salinity. A. Mishonov, Technical Ed. NOAA Atlas NESDIS 90. https://doi.org/10.25923/70qt-9574 (2024).
Helm, K. P., Bindoff, N. L. & Church, J. A. Changes in the global hydrological‐cycle inferred from ocean salinity. Geophys Res Lett 37 (2010).
Gouretski, V. World Ocean Circulation Experiment – Argo global hydrographic climatology. Ocean Science 14, 1127–1146 (2018).
Fournier, S. et al. Sea surface salinity as a proxy for Arctic Ocean freshwater changes. J Geophys Res-Oceans 125, e2020JC016110 (2020).
Palmer, M. D. et al. Adequacy of the Ocean Observation System for Quantifying Regional Heat and Freshwater Storage and Change. Front Mar Sci 6 (2019).
McDonagh, E. L. & King, B. A. Oceanic fluxes in the South Atlantic. J Phys Oceanogr 35, 109–122 (2005).
Jensen, T. G. et al. Modeling salinity exchanges between the equatorial Indian Ocean and the Bay of Bengal. Oceanography 29, 92–101 (2016).
Liu, Y. et al. How well do CMIP6 and CMIP5 models simulate the climatological seasonal variations in ocean salinity? Adv Atmos Sci 39, 1650–1672 (2022).
Dong, S., Goni, G. & Bringas, F. Temporal variability of the South Atlantic meridional overturning circulation between 20 S and 35 S. Geophys Res Lett 42, 7655–7662 (2015).
Manta, G. et al. The South Atlantic meridional overturning circulation and mesoscale eddies in the first GO‐SHIP section at 34.5 S. J Geophys Res-Oceans 126, e2020JC016962 (2021).
Zheng, H. et al. An observation‐based estimate of Atlantic meridional freshwater transport. Geophys Res Lett 51, e2024GL110021 (2024).
Forster, P. M. et al. Indicators of global climate change 2022: Annual update of large-scale indicators of the state of the climate system and the human influence. Earth Syst. Sci. Data. 15, 2295–2327 (2023).
Li, G. et al. A global gridded ocean salinity dataset with 0.5° horizontal resolution since 1960 for the upper 2000 m. Front Mar Sci 10, 1108919 (2023).
Good, S. et al. Benchmarking of automatic quality control checks for ocean temperature profiles and recommendations for optimal sets. Front Mar Sci 9, 1075510 (2023).
Bushnell, M. Quality assurance/quality control of real-time oceanographic data. OCEANS 2016 MTS/IEEE Monterey. IEEE (2016).
Castelao, G. P. A framework to quality control oceanographic data. J Open Source Softw 5, 2063 (2020).
Wong, A., Keeley, R. & Carval, T. Argo quality control manual for CTD and trajectory data (2020).
Mieruch, S. et al. SalaciaML: A deep learning approach for supporting ocean data quality control. Front Mar Sci 8, 611742 (2021).
Tan, Z. et al. A new automatic quality control system for ocean profile observations and impact on ocean warming estimate. Deep Sea Res 1 Oceanogr Res Pap 194, 103961 (2023).
Garcia, H. E. et al. World Ocean Database 2023 User’s Manual. A.V. Mishonov, Technical Ed., NOAA Atlas NESDIS 98, pp 129. (2024).
Zhang, B. et al. Developing big ocean system in support of Sustainable Development Goals: challenges and countermeasures. Big Earth Data 5, 557–575 (2021).
Ingleby, B. & Huddleston, M. Quality control of ocean temperature and salinity profiles: historical and real-time data. J Marine Syst 65, 158–175 (2007).
Tan, Z. et al. Quality control for ocean observations: From present to future. Sci China Earth Sci 65, 215–233 (2022).
Szekely, T., Gourrion, J., Pouliquen, S. & Reverdin, G. The CORA 5.2 dataset for global in situ temperature and salinity measurements: data description and validation. Ocean Science 15, 1601–1614 (2019).
Gourrion, J. et al. Improved Statistical Method for Quality Control of Hydrographic Observations. J Atmos Ocean Tech 37, 789–806 (2020).
Garcia, H. E. et al. World Ocean Database 2018: User’s Manual. A.V. Mishonov, Technical Ed., NOAA, Silver Spring, MD (2018).
Van der Stocken, T. et al. Mangrove dispersal disrupted by projected changes in global seawater density. Nat Clim Change 12, 685–691 (2022).
Tabata, S. On the accuracy of sea‐surface temperatures and salinities observed in the northeast Pacific Ocean. Atmos Ocean 16, 237–247 (1978).
Warren, B. A. Nansen-bottle stations at the Woods Hole Oceanographic Institution. Deep Sea Res 1 Oceanogr Res Pap 55, 379–395 (2008).
Gouretski, V., Cheng, L. & Boyer, T. On the consistency of the bottle and CTD profile data. J Atmos Ocean Tech 39, 1869–1887 (2022).
Wong, A. P., Johnson, G. C. & Owens, W. B. Delayed-mode calibration of autonomous CTD profiling float salinity data by θ–S climatology. J Atmos Ocean Tech 20, 308–318 (2003).
Bordone, A. et al. XBT, ARGO float and ship-based CTD profiles intercompared under strict space-time conditions in the Mediterranean Sea: Assessment of metrological comparability. Journal of Marine Science and Engineering 8, 313 (2020).
Joyce, T. et al. Observations of the Antarctic polar front during FDRAKE 76: a cruise report. WHOI-76-74, 154pp (1976).
Mizuno, K. & Watanabe, T. Preliminary results of in-situ XCTD/CTD comparison test. Journal of Oceanography 54, 373–380 (1998).
Pellerano, F. A., Horgan, K. A., Wilson, W. J. & Tanner, A. B. in IGARSS 2004. 2004 IEEE International Geoscience and Remote Sensing Symposium. 774–776 (IEEE).
Siegelman, L. et al. Correction and Accuracy of High- and Low-Resolution CTD Data from Animal-Borne Instruments. J Atmos Ocean Tech 36, 745–760 (2019).
Zhang, B. et al. CODC-v1: a quality-controlled and bias-corrected ocean temperature profile database from 1940–2023. Scientific Data 11, 666 (2024).
Cheng, L. et al. IAPv4 ocean temperature and ocean heat content gridded dataset. Earth Syst Sci Data 16, 3517–3546 (2024).
Cheng, L. et al. Record High Temperatures in the Ocean in 2024. Adv Atmos Sci, 1–18 (2025).
Liu, C., Liang, X., Ponte, R. M. & Chambers, D. P. “Salty Drift” of Argo Floats Affects the Gridded Ocean Salinity Products. Journal of Geophysical Research: Oceans (2024).
Wang, X., Wang, C. & Liu, C. A dataset of profile observation on three-anchor buoy integrated observation platform of the East China Observation station in 2018–2019. Science Data Bank. https://doi.org/10.11922/sciencedb.926 (2019).
Jia, S., Liu, C. & Wang, C. A dataset of temperature, salinity and depth profile of sea water based on No.6 Buoy of the East China Observation Station during 2014–2015. Science Data Bank. https://doi.org/10.11922/sciencedb.931 (2019).
Meng, Z. et al. A dataset of benthic environmental parameters in the Yellow Sea (2007–2009). Science Data Bank. https://doi.org/10.11922/sciencedb.554 (2018).
Hu, Z. et al. Oceanographic data collected within the eastern equatorial Indian Ocean by JAMES during December 2019‒February 2020. Science Data Bank. https://doi.org/10.11922/sciencedb.01136 (2021).
Chang, Y. et al. The ocean dynamic datasets of seafloor observation network experiment system at the South China Sea. Science Data Bank. https://doi.org/10.11922/sciencedb.823 (2019).
Xu, C. et al. 2009-2012 South China Sea section scientific CTD CTD data sets. Science Data Bank. https://doi.org/10.11922/sciencedb.41 (2015).
Tozer, B. et al. Global bathymetry and topography at 15 arc sec: SRTM15+. Earth and Space Science 6 (2019).
Roquet, F., Madec, G., McDougall, T. J. & Barker, P. M. Accurate polynomial expressions for the density and specific volume of seawater using the TEOS-10 standard. Ocean Modelling 90, 29–43 (2015).
Yang, J., Rahardja, S. & Fränti, P. in Proceedings of the international conference on artificial intelligence, information processing and cloud computing. 1–6.
Gouretski, V. et al. A consistent ocean oxygen profile dataset with new quality control and bias assessment. Earth Syst Sci Data 16, 5503–5530 (2024).
King, B. A., Firing, E. & Joyce, T. M. Shipboard observations during WOCE. Vol. 77 (Elsevier, 2001).
Gouretski, V. & Koltermann, K. P. WOCE global hydrographic climatology. Berichte des BSH 35, 1–52 (2004).
Gouretski, V. & Jancke, K. Systematic errors as the cause for an apparent deep water property variability: global analysis of the WOCE and historical hydrographic data. Prog Oceanogr 48, 337–402 (2000).
McGill, R., Tukey, J. W. & Larsen, W. A. Variations of box plots. The american statistician 32, 12–16 (1978).
Hubert, M. & Vandervieren, E. An adjusted boxplot for skewed distributions. Computational Statistics and Data Analysis 52, 5186–5201 (2008).
Adil, I. H. & Irshad, A. R. A modified approach for detection of outliers. Pakistan Journal of Statistics and Operation Research 11, 91–102 (2015).
Reiniger, R. & Ross, C. A method of interpolation with application to oceanographic data. Deep Sea Research and Oceanographic Abstracts 15, 185–193 (1968).
Zhu, C. & Liu, Z. Weakening Atlantic overturning circulation causes South Atlantic salinity pile-up. Nat Clim Change 10, 998–1003 (2020).
Yu, L. A global relationship between the ocean water cycle and near‐surface salinity. J Geophys Res-Oceans 116 (2011).
Jordà, G. et al. The Mediterranean Sea heat and mass budgets: Estimates, uncertainties and perspectives. Prog Oceanogr 156, 174–208 (2017).
Li, H. & Fedorov, A. V. Persistent freshening of the Arctic Ocean and changes in the North Atlantic salinity caused by Arctic sea ice decline. Clim Dynam 57, 2995–3013 (2021).
Rudels, B. & Carmack, E. Arctic ocean water mass structure and circulation. Oceanography 35, 52–65 (2022).
Potter, R. A. & Lozier, M. S. On the warming and salinification of the Mediterranean outflow waters in the North Atlantic. Geophys Res Lett 31 (2004).
Zhu, Y. et al. CODC-v2 global ocean in-situ profile observational dataset. Chinese Academy of Sciences Oceanographic Science Data Center. https://doi.org/10.12157/IOCAS.20241217.001 (2024).
Boehlert, G. W. et al. Autonomous pinniped environmental samplers: using instrumented animals as oceanographic data collectors. J Atmos Ocean Tech 18, 1882–1893 (2001).
Mensah, V. et al. A correction for the thermal mass–induced errors of CTD tags mounted on marine mammals. J Atmos Ocean Tech 35, 1237–1252 (2018).
Welch, D. W. & Eveson, J. P. in Electronic Tagging and Tracking in Marine Fisheries: Proceedings of the Symposium on Tagging and Tracking Marine Fish with Electronic Devices, February 7–11, 2000, East-West Center, University of Hawaii. 369–383 (Springer).
Cowley, R. et al. International Quality-Controlled Ocean Database (IQuOD) v0.1: the temperature uncertainty specification. Front Mar Sci 8, 689–695 (2021).
Gulev, S. K. et al. Changing state of the climate system. In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. (Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 287–422, 2021).
Song, X. et al. DC_OCEAN: An open-source algorithm for identification of duplicates in ocean database. Frontiers in Marine Science 6 (2024).
Fofonoff, N. P., Hayes, S. & Millard, R. C. WHOI/Brown CTD microprofiler: methods of calibration and data handling. (1974).
Gregg, M. C. & Hess, W. C. Dynamic response calibration of Sea-Bird temperature and conductivity probes. J Atmos Ocean Tech 2, 304–313 (1985).
Johnson, G. C., Toole, J. M. & Larson, N. G. Sensor corrections for sea-bird SBE-41CP and SBE-41 CTDs. J Atmos Ocean Tech 24, 1117–1130 (2007).
Watkins, M. M. et al. Improved methods for observing Earth’s time variable mass distribution with GRACE using spherical cap mascons. J. Geophys. Res.- Solid Earth 120, 2648–2671 (2015).
Lu, Y. et al. North Atlantic–Pacific salinity contrast enhanced by wind and ocean warming. Nat. Clim. Change 14, 723–731 (2024).
Barnoud, A. et al. Revisiting the global mean ocean mass budget over 2005‐2020. Ocean Science 19, 321–334 (2023).
Mu, D. et al. Contrasting discrepancy in the sea level budget between the North and South Atlantic Ocean since 2016. Earth and Space Science 11, e2023EA003133 (2024).

Acknowledgements

The authors thank NOAA/NCEI scientists for long-term data preservation and for maintaining the WOD and WOA. This study is supported by the National Natural Science Foundation of China (Grant No. 42261134536), the National Key R&D Program of China (Grant No. 2023YFF0806500), the International Partnership Program of the Chinese Academy of Sciences (Grant No. 060GJHZ2024064MI), the Asia Cooperation Fund, the New Cornerstone Science Foundation through the XPLORER PRIZE, the National Key Scientific and Technological Infrastructure project “Earth System Science Numerical Simulator Facility” (EarthLab), the Young Talent Support Project of Guangzhou Association for Science and Technology, the Youth Independent Innovation Science Foundation (Grant No. ZK24-54), and the China Scholarship Council (Grant No. 202204910270). This work was also supported by the Oceanographic Data Center, Chinese Academy of Sciences. The calculations in this study were carried out on the ORISE Supercomputer. The Argo Program is part of the Global Ocean Observing System.

Author information

These authors contributed equally: Zhetao Tan, Yujing Zhu.

Authors and Affiliations

State Key Laboratory of Earth System Numerical Modeling and Application, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, 100029, China
Zhetao Tan, Yujing Zhu, Lijing Cheng, Viktor Gouretski, Yuying Pan, Guancheng Li, Xinyi Song, Senliang Bao & Jiang Zhu
Laboratoire de Météorologie Dynamique, Institut Pierre Simon Laplace, Ecole Normale Supérieure – Université PSL, Paris, 75231, France
Zhetao Tan
University of Chinese Academy of Sciences, Beijing, 100049, China
Yujing Zhu, Lijing Cheng, Huifeng Yuan & Xinyi Song
Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100039, China
Huifeng Yuan
NOAA National Centers for Environmental Information, Silver Spring, MD, 20910, USA
Zhankun Wang
Eco-Environmental Monitoring and Research Center, Pearl River Valley and South China Sea Ecology and Environment Administration, Ministry of Ecology and Environment, Guangzhou, 510611, China
Guancheng Li
Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
Bin Zhang & Yuanlong Li
Oceanographic Data Center, Chinese Academy of Sciences, Qingdao, 266071, China
Bin Zhang
College of Meteorology and Oceanography, National University of Defense Technology, Changsha, 410073, China
Senliang Bao

Contributions

L.C., Z.T. and V.G. designed the research and methods. Z.T. wrote the first manuscript and created the figures. Z.T., Y.Z., Y.P., H.Y., V.G. and B.Z. prepared the data. Y.Z. contributed to the data formatting. Z.T., V.G., L.C., Y.Z., G.L., Z.W. and B.Z. analyzed the data. All authors contributed to the writing and reviewing of the manuscript.

Corresponding author

Correspondence to Lijing Cheng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Tan, Z., Zhu, Y., Cheng, L. et al. CODC-S: A quality-controlled global ocean salinity profiles dataset. Sci Data 12, 917 (2025). https://doi.org/10.1038/s41597-025-05172-9

Received: 03 January 2025
Accepted: 08 May 2025
Published: 30 May 2025
Version of record: 30 May 2025
DOI: https://doi.org/10.1038/s41597-025-05172-9
