Introduction

Rapid urbanization in developing countries has significantly increased the amount of urban domestic sewage, which has become a major source of water pollution1,2. Domestic sewage accounts for about 70% of the total wastewater discharged in urban areas2, reaching 62.5 billion cubic meters in 2021. To address this challenge, governments have invested heavily in sewage management, enhancing infrastructure and increasing the capacity and number of wastewater treatment plants (WWTPs)3,4. With the significant progress in urban wastewater collection and treatment systems, sewage network leakage is becoming a limiting issue for clean water in many cities around the world5,6,7,8. The leakage, which is domestic sewage that enters the environment directly without treatment, consists primarily of uncollected sewage, sewage that is collected but exfiltrated, and sewage that is collected but discharged with stormwater in combined sewer overflows. Initial research on sewage leakage has been conducted predominantly in European countries and selected regions of Canada and the United States9,10,11,12, with a main focus on exfiltration and overflow issues. The enactment of the “Water Framework Directive” in 2002 marked a pivotal moment, with 16 European Union countries establishing national standards or guidelines (obtained from www.eureau.org) to regulate sewage overflows, delineating limits on overflow events, concentration levels, and maximum discharge volumes13,14. Still, sewage network leakage remains a challenging issue for many developing countries. Exfiltrated sewage from networks was found to be a major contributor to the shallow aquifer in Jordan and Ukraine, leading to groundwater quality deterioration or even structural soil collapse6,7. A study in Tehran found that exfiltrated sewage leads to larger graywater footprints15. In Wuhan, China, after heavy rainfall, large pollution loads from overflows were discharged into receiving waters within a short period of time, resulting in frequent fish kills16. In 2015, China launched the “Water Pollution and Control Action Plan”, which was the first to treat leakage from sewage networks as a main concern to be addressed in order to eliminate water pollution17,18,19.

Sewage network leakage arises from several critical factors: pipe defects that lead to exfiltrated sewage; combined sewer overflows caused by commingled stormwater and sewage; and unserved urban areas that leave sewage uncollected. In addition, blockages caused by sediment accumulation, tree branches, leaves, plastic bags, and other debris can also lead to overflows15,20,21,22,23. This leakage is considered to be closely related to the presence of ‘black and odorous water bodies’ in over 70% of China’s 295 cities19,22. Furthermore, the infiltration of rainwater and urban runoff into the sewage network has reduced pollutant concentrations at WWTPs, thereby impairing wastewater treatment efficiency24,25,26. Most importantly, leakage makes it extremely difficult to monitor domestic pollution loads to the water environment. For example, the Chinese Second National Pollution Source Census (CSNPSC) estimated urban domestic source loads by monitoring WWTPs and other outlets directly discharging domestic wastewater27. Since leakage, in particular net leakage from the sewage network (including exfiltrated and overflowed leakage), is hardly monitored, it is probably not accounted for by the census and related pollution control strategies. In this sense, quantitative assessment of leakage loads is crucial for understanding water pollution sources in a river basin and for developing more effective pollution management, treatment and mitigation strategies.

Despite the severity of the issue, studies that quantify sewage network leakage are quite limited. By monitoring changes in flow or concentration within the sewage network, several studies have estimated that 30%–500% of external water entered combined rainfall-domestic sewage systems in Germany and Mexico28,29,30. In Nottingham, estimates of exfiltration identified that 67% of the study area was susceptible to high exfiltration rates exceeding 500 liters per day due to pipe defects9. It has also been found that exfiltrated sewage in the Middle East contributes 30–64% of the heavily contaminated groundwater6, and globally, more than 80% of sewage is discharged into the environment without proper treatment31. In Germany, an estimated ~10% of nitrate pollution is attributed to exfiltrated sewage entering the environment10. However, most studies have focused on laboratory analyses or marker tracking in small areas32. Since precise and dynamic monitoring data are rarely available for large regions, large-scale quantification remains challenging, yet it is meaningful for developing countries where leakage is widespread. In China, by analyzing changes in sewage concentration, Xu et al. found that 67% of 219 Chinese cities exhibit structural defects in their sewage networks and/or improper connections of rainwater pipelines5. Cao et al. developed a mass balance method and studied several typical Chinese cities, finding that ~38% of the wastewater volume and 48% of the total nitrogen was lost from the sewage system due to exfiltration or overflow24. These pioneering studies provided initial information on the widespread leakage in China. Still, integrated large-scale quantification of sewage leakage and analysis of the factors influencing it are lacking, which hinders cost-effective management of sewage networks and the development of overall pollution control strategies.

In this study, we proposed a mathematical model based on mass balance theory, using city-scale data and uncertainty analysis, to provide a comprehensive estimate of sewage leakage, and we analyzed the natural and socio-economic factors influencing leakage by comparing cities. We took the Yangtze River basin (YRB), a region of significant environmental and economic importance in China, as a case study. The objective was to address two questions: (1) How much nitrogen pollution load is discharged to the environment from urban domestic sources without treatment, and what is the extent of gross leakage and net leakage? (2) Does the extent of network leakage differ between cities, and what are the drivers? Our study aims to shed light on the current state of sewage leakage, make recommendations for reducing leakage, and establish a scientific basis for future improvements in urban sewage network management and overall pollution control strategies in developing countries.

Results and discussion

Leakage indicated by load difference between domestic sources and WWTPs inflows

The study of the Yangtze River basin (YRB) found a notable disparity between the domestic source sewage concentration (Cr) and the wastewater treatment plant (WWTP) inflow concentration (Cin) (Fig. 1a). Compared to values in developed countries, such as Singapore and Germany (obtained from https://de.dwa.de/de/zahlen-fakten-umfragen.html), the Cin of the cities in the Yangtze River basin was significantly lower24,33,34,35,36,37. Additionally, the daily per capita pollution inflow load (PLin) was substantially lower than the daily per capita pollution discharge load (PLr) (Fig. 1b), indicating potential leakage in the sewage collection and transportation processes, in addition to load reductions in pipelines38. Compared to the average gross leakage rate of 65% for pollution load in the YRB (Fig. 1b), the leakage rate of water was significantly lower at 38% (Fig. 1c). These discrepancies suggested that low-concentration water infiltrated into the sewage networks while high-concentration sewage exfiltrated or overflowed out of them, leading to diluted wastewater concentrations and reduced pollution loads.

Fig. 1: Comparison of concentrations, loads and volumes between domestic source sewage and wastewater treatment plant inflows.

(a) Comparison of total nitrogen domestic source sewage concentration (Cr) and wastewater treatment plant inflow concentration (Cin), including values and ranges from other studies as reference values. Horizontal lines represent mean values from literature studies, while light yellow and light blue bands show the range of concentrations, indicating fluctuation observed within the studies. Note that the Corg for comparison to Cr is the concentration of effluent prior to septic treatment; (b) Comparison of daily per capita load and gross leakage rate across three sub-basins of the Yangtze River basin, where PLr is the daily per capita load generated by domestic sources (after septic treatment) and PLin is the daily per capita load of wastewater treatment plant inflow; (c) Comparison of sewage volume and leakage rate across three sub-basins of the Yangtze River basin, where Vr is the daily per capita sewage volume generated by domestic sources and Vin is the daily per capita sewage volume of wastewater treatment plant inflow.

The gross leakage rate (PLgross/PLr) of urban domestic sewage across the cities averaged 65% (Fig. 1b), with a standard deviation of 16%. Notably, the Middle Sub-basin exhibited a significantly higher mean gross leakage rate of 69%, compared to 62% in the Upper Sub-basin and 58% in the Lower Sub-basin. The high leakage rates in the Middle Sub-basin are likely due to its flat land, heavy rainfall and many converging tributaries, which make it prone to flooding that might overload the sewage systems. This is consistent with the relatively higher annual precipitation and more severe flooding in the Middle Sub-basin reported in the China Water and Drought Disaster Bulletin 2017 (obtained from http://www.mwr.gov.cn/sj/tjgb/zgshzhgb/)39.

Potential unaccounted load by the national census due to leakage effect

Leakage accounted for a high proportion of the urban domestic pollution load in the YRB. Of the 1173 Gg (range of 1146–1199 Gg) of nitrogen generated from domestic sources, only about 34% was processed at WWTPs. After treatment, 111 Gg was discharged into aquatic environments (Fig. 2). Notably, direct gross leakage amounted to 658 Gg (range of 611–706 Gg), or 56% (range of 53%–59%) of domestic nitrogen loads in the YRB, close to the 48% for China estimated by Cao et al.24. The high gross leakage demonstrated that the actual environmental load was nearly sevenfold the output from WWTPs, thus marking a substantial environmental challenge. Of the gross leakage, 224 Gg was uncollected and was monitored by the Chinese Second National Pollution Source Census (CSNPSC), and the remaining 434 Gg (range of 387–482 Gg) was net leakage out of the sewage network that was not accounted for by the CSNPSC, indicating that previously published official urban domestic sewage discharge loads may have been significantly underestimated. At the provincial and municipal scales, where data were available, similarly high gross leakage is shown in Supplementary Fig. 1. This indicated that sewage network leakage accounted for over half of the domestic nitrogen load into the environment.
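
The budget figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the central estimates quoted above; the function and variable names are illustrative, and the 10% in-network reduction rate is the reference value stated in the Methods, applied here at basin scale as an assumption.

```python
# Central estimates quoted in the text (Gg of nitrogen); names are illustrative.
L_R = 1173.0     # load generated by domestic sources
L_OUT = 111.0    # load discharged by WWTPs after treatment
L_UN = 224.0     # uncollected load
L_NET = 434.0    # net leakage out of the sewage network
CUT_RATE = 0.10  # in-network reduction rate (assumed reference value from the Methods)


def nitrogen_budget(l_r, l_out, l_un, l_net, cut_rate):
    """Return derived budget terms (loads in Gg, shares as fractions)."""
    l_gross = l_un + l_net        # gross leakage = uncollected + net leakage
    l_cut = cut_rate * l_r        # load removed inside the network
    l_in = l_r - l_cut - l_gross  # load reaching WWTPs, from the mass balance
    l_tot = l_out + l_gross       # total load entering the environment
    return {
        "l_gross": l_gross,
        "l_in": l_in,
        "l_tot": l_tot,
        "gross_share": l_gross / l_r,
        "wwtp_share": l_in / l_r,
        "tot_over_out": l_tot / l_out,
    }


budget = nitrogen_budget(L_R, L_OUT, L_UN, L_NET, CUT_RATE)
for key, value in budget.items():
    print(f"{key}: {value:.2f}")
```

With these inputs the gross leakage is 658 Gg (about 56% of the generated load), about 34% of the generated load reaches WWTPs, and the total environmental load is close to seven times the WWTP output, in line with the values reported in this section.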

Fig. 2

Structure of nitrogen loads in the Yangtze River basin, with upper and lower bounds of load estimates reflecting uncertainty ranges, as indicated in parentheses.

Effect of rainfall on the extent of leakage

The volumes of water infiltration (Vinf) and exfiltration (Vexf) in various cities (Supplementary Fig. 2) indicated that while the infiltration volume was similar across the three sub-basins, a markedly higher exfiltration volume was observed in the Middle Sub-basin than in the Upper Sub-basin (p < 0.05) (Supplementary Fig. 3). Compared to the variation between the three sub-basins, the fluctuation in infiltration and exfiltration across different cities was more substantial. Increased precipitation had a clear impact on domestic wastewater leakage, as shown in Fig. 3. Vinf increased significantly when rainfall was higher, i.e., at MIDDLE and HIGH precipitation levels. This indicates that combined sewage systems and sewage overflows during storm events are relatively common in the cities of the Yangtze River basin. In terms of Vexf, our findings indicated a significant increase in the HIGH precipitation group compared to the LOW and MIDDLE groups, underscoring the pressure that high precipitation exerts on the sewage network. However, the difference between MIDDLE and LOW precipitation levels was not statistically significant, suggesting that the network’s design generally coped with low to moderate rainfall levels. This is similar to the findings of Peche40. This pattern indicated that under both LOW and MIDDLE conditions, exfiltration from the sewage network was mainly due to structural weaknesses and defects. This resilience of the network, however, was compromised under heavy precipitation, particularly in flat terrain where slower water flow exacerbated the risk of overflows, leading to more significant exfiltration in cities experiencing HIGH precipitation.

Fig. 3

Comparison of the (a) infiltration volume, (b) exfiltration volume and (c) scatter plot of exfiltration versus infiltration water volumes in various cities under different precipitation levels: LOW, MIDDLE and HIGH. *p < 0.05, ** p < 0.01, ***p < 0.001. Due to challenges in distinguishing exfiltrated from overflowed sewage, “exfiltration water volume per capita” (Vexf) here encompasses both exfiltrated sewage in dry periods and overflowed sewage during high flow events—total water per capita discharged externally from sewage network before reaching treatment plants. Square markings with an internal “X” denote cities with uncollected wastewater loads requiring attention.

Figure 3c illustrates the complex relationship between infiltration (Vinf) and exfiltration (Vexf) water volumes in urban sewage networks. Due to monitoring challenges, the individual volumes of exfiltrated and overflowed sewage cannot be quantitatively distinguished and are thus collectively termed exfiltration volume in this study. The ratio of Vexf to Vinf offers some insight into whether net leakage is predominantly due to exfiltrated sewage in dry periods or to overflow during high flow events. For cities below the 1:1 line (Vinf > Vexf), infiltration exceeds exfiltration, indicating relatively efficient transport of both original sewage and infiltrated rainfall or runoff, with limited exfiltrated or overflowed sewage. This reflects that these cities have a prevalence of combined sewage systems, yet the infrastructure operates efficiently with minimal losses. Only about 30% of the cities fall within this range, and the majority of them have MIDDLE precipitation. Between the 1:1 and 2:1 lines (0 < Vexf-Vinf < Vinf), the volume of exfiltration water surpasses the volume of infiltration water, indicating more severe overflow issues. This reflects substantial external water ingress and inadequate network capacity to handle high flow conditions, resulting in overflows within these cities. Moreover, Vinf and Vexf tend to be higher in the HIGH precipitation cities than in LOW and MIDDLE precipitation cities, suggesting the influence of rainfall on overflow events. The points above the 2:1 line (Vexf-Vinf > Vinf) indicate that, despite limited infiltration of external water, substantial volumes of water are still discharged from the network in these cities. This may be attributed to significant structural defects that increase exfiltrated sewage or to blockages that impede sewage flow and cause overflows, necessitating further investigation and repairs to improve pipeline quality. For cities below the 1:1 line with an uncollected load per capita greater than or equal to the gross leakage per capita (marked by the square with an internal “X” in Fig. 3c), the primary concern should be uncollected loads, highlighting a need for substantial infrastructure development to enhance sewage collection rates.
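
The three regimes read off Fig. 3c can be expressed as a simple rule on the two volumes. The sketch below is illustrative only: the function names and example numbers are hypothetical, and assigning ties (Vexf exactly equal to Vinf or to 2·Vinf) to the lower regime is an assumption, since the text uses strict inequalities.

```python
def leakage_regime(v_inf, v_exf):
    """Place a city relative to the 1:1 and 2:1 lines of Fig. 3c.

    v_inf, v_exf: per capita infiltration and exfiltration volumes in the
    same units (e.g. L per capita per day). Ties go to the lower regime,
    which is an assumption of this sketch.
    """
    if v_inf < 0 or v_exf < 0:
        raise ValueError("volumes must be non-negative")
    if v_exf <= v_inf:
        return "below 1:1"            # infiltration exceeds exfiltration
    if v_exf <= 2 * v_inf:
        return "between 1:1 and 2:1"  # 0 < Vexf - Vinf <= Vinf
    return "above 2:1"                # Vexf - Vinf > Vinf


def uncollected_is_primary_concern(v_inf, v_exf, pl_un, pl_gross):
    """Flag cities marked with a boxed X: below 1:1 and PLun >= PLgross."""
    return leakage_regime(v_inf, v_exf) == "below 1:1" and pl_un >= pl_gross


# Hypothetical cities, for illustration only.
print(leakage_regime(60.0, 45.0))  # below 1:1
print(leakage_regime(40.0, 60.0))  # between 1:1 and 2:1
print(leakage_regime(10.0, 50.0))  # above 2:1
```

Keeping the regime rule separate from the uncollected-load flag mirrors the text, where the boxed "X" marking is an additional condition applied only to cities below the 1:1 line.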

With these insights, our findings confirm that combined sewage systems are prevalent across cities in the Yangtze River basin, with many experiencing poor network quality. This calls for a tailored approach to managing urban sewage infrastructure based on the dominant leakage processes in each city. For cities below the 1:1 line in Fig. 3c, where sewage with rainwater infiltration does not significantly exceed network capacity, sewage transmission losses are minimal, and the focus should be on maintaining system integrity through regular inspections. Cities above the 2:1 line often face structural defects and severe overflow due to blockages, and there is an urgent need to further investigate and improve pipeline quality with corresponding maintenance measures. To effectively monitor and manage aging and corrosion, which can lead to structural defects, Closed Circuit Television is one of the primary methods41, augmented by dye tests and other techniques15. Corrosion can be mitigated by using chemical agents to reduce hydrogen sulfide, adding specific additives to concrete, and applying nitrite sprays to affected pipelines42. Cities between the 1:1 and 2:1 lines, especially those with high precipitation, need to prioritize separating rainwater from sewage systems and enhancing network capacity to cope with severe overflow problems.

Impact of socio-economic development on leakage

Our analysis revealed that, in addition to natural meteorological factors, socio-economic factors significantly affected the daily per capita total nitrogen gross leakage load (PLgross). We observed a complex interplay between economic development, population pressure, and pollution management capacity, leading to distinct PLgross levels across different city sizes. Specifically, we observed significant differences in PLgross between Middle cities and Small cities (p < 0.05), as well as between Middle and Large cities on the one hand and Megacities on the other (p < 0.05), as illustrated in Fig. 4a. Interestingly, within cities of the same size, categorized by urban population (Pu), PLgross decreased with economic development, as evidenced by the negative slope (k) for each city size (Fig. 4b). The non-linear interaction between urban development and PLgross was also demonstrated by the correlation analysis results shown in Fig. 5. Overall, population increase tended to reduce PLgross, but in Small and Middle cities population increase instead increased leakage. This indicated that when urban development and sewage network construction are in their early stages, the growth of the urban population can overload the sewage collection, treatment, and transportation systems, increasing leakage. As cities develop and urban management capacity improves, population growth enhances the utilization rate and efficiency of this infrastructure, showing economies of scale. Unlike the complex relationship between urban population and leakage, economic development, indicated by nighttime lighting intensity (NTL), was always negatively related to sewage leakage for all city sizes (Fig. 5), indicating that the greater financial investment, technical improvement and enhanced management that come with economic development are important for reducing per capita pollution load leakage.

Fig. 4: Trends in per capita gross leakage load under population and economic growth.

(a) Boxplot of the value and comparison of daily per capita gross leakage load (PLgross) for Small cities, Middle cities, Large cities, and Megacities, with outliers defined as points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles; (b) Trend plot of daily per capita gross leakage load (PLgross) with increasing economic development, which is indicated by total nighttime lighting intensity (NTL) (nanoW cm−2 sr−1), for cities of different population sizes, with fitted lines, where k is the slope of the fitted line. City size categories are based on urban population (Pu): Megacities (>5 million), Large cities (3–5 million), Middle cities (1–3 million), Small cities (<1 million). The size of the circle indicates the urban population of the city.
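
The city size categories in this caption amount to a threshold rule on urban population. A minimal sketch follows; the function name is hypothetical, and the handling of cities sitting exactly on a boundary (1, 3 or 5 million) is an assumption, because the caption does not say which category they belong to.

```python
def city_size(urban_population_millions):
    """Map urban population Pu (in millions) to the categories of Fig. 4.

    Boundary handling is an assumption of this sketch: exactly 5 million
    counts as Large, exactly 3 million as Large, exactly 1 million as Middle.
    """
    if urban_population_millions < 0:
        raise ValueError("population must be non-negative")
    if urban_population_millions > 5:
        return "Megacity"
    if urban_population_millions >= 3:
        return "Large"
    if urban_population_millions >= 1:
        return "Middle"
    return "Small"


# Hypothetical populations, for illustration only.
for pu in (0.6, 2.2, 4.0, 8.5):
    print(pu, city_size(pu))
```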

Fig. 5: Correlation analysis between daily per capita gross leakage load (PLgross) and driving factors across cities of varying population sizes.

Population and economic concentration is represented by R10%, the ratio of the nighttime light intensity (NTL) (nanoW cm−2 sr−1) in the top 10% brightest urban areas to the total NTL of the city. *p < 0.05, **p < 0.01, ***p < 0.001.
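
The R10% indicator defined in this caption is a share of total brightness held by the brightest tenth of a city. The sketch below shows one way such a ratio could be computed; it assumes the city is represented as a list of per-pixel NTL values and that "top 10% brightest urban areas" means the top 10% of pixels by count, rounded to the nearest whole pixel with a minimum of one. Both are assumptions, and the function name is hypothetical.

```python
def r10(ntl_values, top_fraction=0.10):
    """Share of total NTL contributed by the brightest `top_fraction` of pixels."""
    values = sorted((float(v) for v in ntl_values), reverse=True)
    if not values:
        raise ValueError("at least one NTL value is required")
    total = sum(values)
    if total <= 0:
        raise ValueError("total NTL must be positive")
    # Number of pixels in the bright group (assumption: nearest integer, at least 1).
    n_top = max(1, round(top_fraction * len(values)))
    return sum(values[:n_top]) / total


# Hypothetical pixel values, for illustration only.
uniform_city = [1.0] * 20                     # evenly lit: bright group holds 10% of NTL
concentrated_city = list(range(1, 11))        # 1..10: brightest pixel holds 10/55
print(r10(uniform_city), r10(concentrated_city))
```

An evenly lit city gives a ratio equal to the chosen fraction, and the ratio rises as brightness concentrates in fewer pixels, which is the sense in which the indicator measures concentration.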

Although economies of scale were demonstrated by the population and economic development factors described above, population and economic concentration within a city was not always beneficial for the performance of sewage networks. The population and economic concentration (indicated by the ratio of the nighttime light intensity in the top 10% brightest urban areas to the total NTL of the city) was significantly positively related to leakage for Megacities (Fig. 5). This suggested that population and economic over-concentration in megacities can lead to pollution emissions that surpass the capacity of facilities in certain areas of the city. This was particularly pronounced in cities with older sewage networks, where it significantly exacerbated the risk of pollution load leakage. Previous studies have also found that measures such as over-centralized municipal WWTPs can be less efficient43,44,45.

Contrary to expectation, the construction of the drainage network, i.e., the density of drainage pipes in the built-up area, did not significantly affect PLgross at any city size. This might be because over 70% of the sewage network was constructed in the past 15 years. Drainage network construction could reduce uncollected domestic wastewater but did not necessarily guarantee less leakage out of the pipelines46. These findings underscored that merely increasing the density of drainage pipes was not sufficient. Ensuring high construction quality and effective management was crucial for the network’s performance in pollution control.

While some studies suggested that urbanization may increase environmental pollution, others proposed positive impacts due to the economies of scale47,48,49. Our research on urban wastewater leakage revealed a non-linear interaction with urban development, emphasizing the importance of considering the economies of scale at different stages of urban growth for sustainable city planning.

Further discussion

As extreme weather events become more frequent, the risk of sewage leakage in urban areas grows, according to the findings of this research (Fig. 3). This emphasizes the critical need to upgrade urban sewage networks to prevent rainfall and runoff infiltration50, since preventing sewage leakage is more cost-effective than dealing with pollution afterwards51. This study also revealed economies of scale: a certain degree of population and economic concentration enhanced the wastewater collection and transportation efficiency of sewage networks, but over-concentration in megacities placed heavy pressure on wastewater collection and transportation and raised leakage risks (Fig. 5). Centralized sewage systems have become the prevailing choice due to the economies of scale, which enhance pollution control capacity and efficiency while reducing per capita costs52. However, as cities develop, particularly in areas of population and economic over-concentration, increased human activity tends to produce sewage volumes and loads that exceed the capacities of existing sewage infrastructure. Faced with these challenges, options include expanding the existing sewage network to integrate additional areas and upgrading treatment plant capabilities, or constructing region-specific networks and treatment facilities to create a decentralized sewage system. Previous research has demonstrated that decentralized sewage systems offer competitive advantages over traditional centralized systems53. These advantages include better environmental outcomes in wastewater collection and treatment, as well as economic benefits in terms of network construction, service life, and maintenance costs. Our study also identifies that over-concentration of economic development may lead to higher risks of sewage leakage, especially in megacities with large populations. Thus, the selection between centralized and decentralized systems should be flexible, accommodating the rapid growth and specific needs of urban domestic wastewater treatment54.

Quantifying urban domestic pollution loads has always faced the challenge of monitoring and estimating sewage leakage. The mass balance model proposed in this study offers a new approach to improving the estimation of sewage leakage, which can be applied in other regions and other developing countries where data are available. It should also be noted that the parameters used in our methodology, namely the daily per capita pollution load produced from domestic sources (PLr), the daily per capita reduction load in the network (PLcut), the infiltration concentration (Cinf), and the exfiltration concentration (Cexf), vary in space and time due to differences in city terrain, atmospheric deposition, precipitation characteristics, and the construction and upgrade condition of septic systems55. Because these detailed data are not available for all cities, our estimation employed zonal-graded reference values from previous studies for these parameters, and an uncertainty analysis covering a plausible range of these parameter values was conducted to quantify the uncertainty in the leakage load and water volume calculations. For future studies, to ensure a thorough assessment of network quality and to minimize accounting discrepancies by taking regional variability into account, we recommend that more monitoring research on domestic discharge points, in-network nutrient change, and the concentrations of infiltrated, exfiltrated and overflowed sewage be conducted in more cities. In addition, incorporating factors related to sewage network quality, such as the age of pipelines, would enhance our understanding of exfiltration in dry periods. However, such data are not widely available in our large study area and vary significantly within a single city, which presents substantial challenges for differentiating quantitatively between overflow leakage and exfiltrated leakage during dry periods. Recognizing this gap, future research should aim to include more detailed data on infrastructure age to refine leakage assessments and improve urban sewage management strategies.

Additionally, we hope the quantitative assessment of leakage and the analysis of influencing factors in this study stimulate further scientific research and guide government action on urban sewage networks. This study examined how rainfall and economic development affect leakage. More in-depth investigations of specific cities are needed to identify the main local causes of leakage and effective ways of reducing it. The Chinese government is already addressing the sewage network issue. On February 16, 2024, the Ministry of Housing and Urban-Rural Development (MOHURD) announced a comprehensive plan to upgrade over 100,000 kilometers of underground pipelines each year, with about 26% of the existing pipelines being sewage networks. This initiative will launch a project aimed at enhancing urban drainage capabilities and flood prevention measures, while also advancing the development of critical urban infrastructure safety projects. Our study and future network leakage studies will provide valuable information for planning and carrying out this work and will support better decision-making to reduce pollution from urban domestic sources.

Methods

Study area

The study investigated nitrogen pollution leakage of urban domestic wastewater from municipal sewage networks in the YRB, which stretches from western to eastern China and covers 19 provinces and 129 cities. The total area of the YRB is about 1.8 million km², accounting for almost 20% of China’s total area, and the population is about 4.2 million. The vast territory brings multifaceted diversity to the YRB56. The YRB is divided into the Upper, Middle and Lower Sub-basins, with increasing areas of urban land from the Upper Sub-basin to the Lower Sub-basin.

Quantitative assessment of nitrogen leakage load

Total nitrogen pollution is expected to be generated from domestic sources, collected by the sewage network and transported to WWTPs for treatment before being discharged to surface waters, as indicated by the yellow arrows in Fig. 6. However, leakage may occur during the collection and transportation processes: uncollected sewage is present in unserved urban areas; pipeline defects lead to exfiltrated sewage; and mixing of rainwater and sewage, or blockages due to improper maintenance, often results in sewer overflows during high flow events. These processes change the volume and concentration of the sewage as it is transported from the sources to the WWTPs.

Fig. 6: Research framework.

The yellow arrow indicates the theoretical flow direction of urban domestic wastewater, while the green and orange dashed arrows represent the directions of infiltration and exfiltration, i.e., total nitrogen (TN) loads entering the network from the outside and loads discharged from the network into the external environment due to exfiltrated or overflowed sewage, respectively. The red dashed arrow indicates TN loads that are not collected into the sewage network and are discharged directly into the environment. The uncollected TN loads are accounted for by the Chinese Second National Pollution Source Census (CSNPSC), along with loads discharged from wastewater treatment plants after treatment, which are indicated by the purple arrows. Net leakage of TN is the difference between the exfiltration and infiltration loads, which is indicated by the red arrow. The uncollected load and the net leakage load together constitute the gross leakage. The unit of each TN concentration C is mg/L, the unit of volume V is L capita−1 day−1, and the unit of daily per capita TN load PL is g capita−1 day−1.

We constructed a mathematical model based on the principles of mass balance and water balance to estimate the amount of sewage leakage of total nitrogen (TN) in each city. We used PLr, PLcut, PLun, PLinf, PLexf, PLin and PLout to represent the daily per capita TN load produced from domestic sources, reduced in the network, uncollected by the network, infiltrated into the network, exfiltrated or overflowed from the network, transported to WWTPs and discharged from WWTPs, respectively38. PLgross, the daily per capita gross leakage load during the collection and transport process, comprises the uncollected load (PLun) and the net leakage load (PLnet) and is calculated as the difference between PLr and the sum of PLcut and PLin.

According to the mass balance and the water balance theory, the daily per capita load, the water volumes, and the total pollution load follow Eqs. (1)–(6):

$$\mathrm{PL}_{\mathrm{r}}-\mathrm{PL}_{\mathrm{cut}}-\mathrm{PL}_{\mathrm{un}}+\mathrm{PL}_{\mathrm{inf}}-\mathrm{PL}_{\mathrm{exf}}=\mathrm{PL}_{\mathrm{in}}\tag{1}$$
$$\mathrm{V}_{\mathrm{r}}-\mathrm{V}_{\mathrm{un}}+\mathrm{V}_{\mathrm{inf}}-\mathrm{V}_{\mathrm{exf}}=\mathrm{V}_{\mathrm{in}}\tag{2}$$
$$\mathrm{PL}_{\mathrm{gross}}=\mathrm{PL}_{\mathrm{un}}+\mathrm{PL}_{\mathrm{net}}=\mathrm{PL}_{\mathrm{r}}-\mathrm{PL}_{\mathrm{cut}}-\mathrm{PL}_{\mathrm{in}}\tag{3}$$
$$\mathrm{PL}_{\mathrm{net}}=\mathrm{PL}_{\mathrm{exf}}-\mathrm{PL}_{\mathrm{inf}}=\mathrm{PL}_{\mathrm{r}}-\mathrm{PL}_{\mathrm{cut}}-\mathrm{PL}_{\mathrm{in}}-\mathrm{PL}_{\mathrm{un}}\tag{4}$$
$$\mathrm{L}_{\mathrm{gross}}=\mathrm{L}_{\mathrm{r}}-\mathrm{L}_{\mathrm{cut}}-\mathrm{L}_{\mathrm{in}}=\mathrm{L}_{\mathrm{un}}+\mathrm{L}_{\mathrm{net}}\tag{5}$$
$$\mathrm{L}_{\mathrm{tot}}=\mathrm{L}_{\mathrm{out}}+\mathrm{L}_{\mathrm{gross}}\tag{6}$$

where the daily per capita loads (PLr, PLinf, PLexf, PLin and PLout), in g capita−1 day−1, are calculated by multiplying the concentrations at each phase (Cr / Cinf / Cexf / Cin / Cout), in mg L−1, by the corresponding water volumes (Vr / Vinf / Vexf / Vin / Vout), in L capita−1 day−1, and converting from mg to g. PLr represents the load after septic tank treatment, with a reduction rate of 13% in developed cities and 10% in general cities35,36,37,57. PLcut is calculated by multiplying PLr by the rate of reduction in the network. Referring to Shi’s study, we used 10% as a reference value38. With known variables (PLr, Vr, PLcut, PLun, Vun, PLin, Vin) and parameters (Cinf, Cexf) extracted or calculated from literature data, the leakage volumes (Vinf, Vexf) and loads (PLinf, PLexf) can be solved. The per capita load is multiplied by the urban population (Pu) to calculate the corresponding total load (Lr, Lcut, Lun, Linf, Lexf, Lin, Lout and Lgross). Among these loads, Lout and Lgross together make up the total pollution load (Ltot) from urban domestic sources that enters the environment.
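
Because PLinf = Cinf·Vinf and PLexf = Cexf·Vexf, Eqs. (1) and (2) form two linear equations in the two unknown volumes Vinf and Vexf. The sketch below shows one way to solve them in closed form. It is an illustration under stated assumptions rather than the authors' implementation: the function name, argument names and all input numbers are hypothetical, and no check is made that the solved volumes are physically plausible (non-negative).

```python
def solve_leakage(pl_r, v_r, pl_un, v_un, pl_in, v_in, c_inf, c_exf, cut_rate=0.10):
    """Solve Eqs. (1)-(2) for V_inf and V_exf, then derive the leakage loads.

    Loads in g/capita/day, volumes in L/capita/day, concentrations in mg/L.
    cut_rate is the in-network reduction rate (10% reference value in the text).
    """
    if c_exf == c_inf:
        raise ValueError("c_exf must differ from c_inf for a unique solution")
    pl_cut = cut_rate * pl_r
    # Eq. (2) rearranged: V_inf - V_exf = V_in - V_r + V_un
    water_residual = v_in - v_r + v_un
    # Eq. (1) rearranged, in mg: C_inf*V_inf - C_exf*V_exf = PL_in - PL_r + PL_cut + PL_un
    load_residual_mg = (pl_in - pl_r + pl_cut + pl_un) * 1000.0
    v_exf = (c_inf * water_residual - load_residual_mg) / (c_exf - c_inf)
    v_inf = v_exf + water_residual
    pl_inf = c_inf * v_inf / 1000.0   # mg -> g
    pl_exf = c_exf * v_exf / 1000.0   # mg -> g
    pl_net = pl_exf - pl_inf          # Eq. (4)
    pl_gross = pl_un + pl_net         # Eq. (3)
    return {"v_inf": v_inf, "v_exf": v_exf, "pl_inf": pl_inf,
            "pl_exf": pl_exf, "pl_net": pl_net, "pl_gross": pl_gross}


# Hypothetical city, for illustration only (not data from the study).
result = solve_leakage(pl_r=10.0, v_r=150.0, pl_un=1.0, v_un=15.0,
                       pl_in=6.5, v_in=150.0, c_inf=5.0, c_exf=40.0)
print(result)
```

For this made-up city the solution is Vinf = 60 and Vexf = 45 L capita−1 day−1, and the resulting PLgross of 2.5 g capita−1 day−1 equals PLr − PLcut − PLin, as Eq. (3) requires.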

In this study, Ltot was compared with the TN pollution loads from urban domestic sources reported by the CSNPSC (Lcen). Lcen includes pollution loads discharged from the WWTPs after treatment and pollution loads that enter water bodies directly through other outlets without being treated by WWTPs. Since infiltration and exfiltration are difficult to investigate and monitor, Lcen probably did not account for net leakage. Therefore, the difference between the census load (Lcen) and the load discharged by WWTPs (Lout) was defined as the uncollected load (Lun), and the difference between the total load (Ltot) and Lcen was used to estimate the net leakage load that is unaccounted for by the CSNPSC at the YRB, provincial and municipal levels (Supplementary Data 2, Supplementary Data 3).
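As a consistency check, substituting the definition Lun = Lcen − Lout into Eqs. (5) and (6) shows why the difference between Ltot and Lcen can be read as the net leakage load. Here Lnet denotes Lexf − Linf, by analogy with the per capita form in Eq. (4):

$${{\rm{L}}}_{{\rm{tot}}}-{{\rm{L}}}_{{\rm{cen}}}={{\rm{L}}}_{{\rm{out}}}+{{\rm{L}}}_{{\rm{un}}}+{{\rm{L}}}_{{\rm{net}}}-{{\rm{L}}}_{{\rm{cen}}}={{\rm{L}}}_{{\rm{out}}}+({{\rm{L}}}_{{\rm{cen}}}-{{\rm{L}}}_{{\rm{out}}})+{{\rm{L}}}_{{\rm{net}}}-{{\rm{L}}}_{{\rm{cen}}}={{\rm{L}}}_{{\rm{net}}}$$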

Data were collected from the following three sources (Supplementary Table 1, Supplementary Data 1): (1) Statistical data, including Pu, Vr, Vin, Vout, Cin, Cout and Lcen, were obtained from the Ministry of Housing and Urban-Rural Development of the People’s Republic of China (MOHURD), provincial Statistical Yearbooks, municipal Statistical Yearbooks, the CSNPSC and the Yearbook of China Urban Water Association3,58,59,60; (2) Data from other studies, including PLr and Cinf, were collected from the literature and were taken as parameters; (3) Cexf was set to the mean value of Cin and Cr, assuming that the pollutant concentration varies uniformly within the sewage network.

For PLr, there are few quantitative monitoring studies in China. PLr differs between areas with different levels of development, with slightly larger PLr in developed areas; however, PLr fluctuates little from year to year for the same area (obtained from https://de.dwa.de/de/zahlen-fakten-umfragen.html)33,34,61,62. The cities in our study were divided into two categories based on economic level, more developed cities and general cities, according to the CSNPSC. We used the systematic quantitative monitoring results from Changzhou City as the per capita TN load from domestic sources, before septic treatment, for the more developed cities in the YRB35,36,37,63,64. PLr for the general cities was calculated using the ratio of per capita pollution emissions between general and more developed cities, which was obtained from the CSNPSC regional values59.

For the parameter Cinf, regional concentrations of surface runoff were derived from 131 literature values reported in 34 studies on surface runoff concentrations in different cities (Supplementary Table 2). The values for all cities in the YRB are listed in Supplementary Table 3.

To analyze the uncertainty in PLr and PLcut arising from variations in the reduction rates in septic tanks and in pipelines, we applied a ±2% variation to the reduction rates. This variation produced the uncertainty range for the nitrogen load estimates. To assess how uncertainty in Cinf (infiltration concentration) and Cexf (exfiltration concentration) affects the calculated Vinf (infiltration volume) and Vexf (exfiltration volume), we defined the uncertainty bounds for Cinf as its mean value plus or minus the standard deviation (mean ± SD). For the range of Cexf, we used the possible maximum and minimum values, which were Cr and Cin, respectively. The uncertainty in Vinf and Vexf was quantified with 1000 Latin hypercube samples of Cinf and Cexf, using the first (Q1) and third (Q3) quartiles of the resulting Vinf and Vexf values to denote the uncertainty interval of the leakage volume.
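The sampling step can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions rather than the procedure used in the study: the source does not state the sampling distribution, so Cinf and Cexf are drawn uniformly within their bounds, and all function names and input values are hypothetical. The arguments `dv` and `dpl_mg` are the right-hand sides of Eqs. (2) and (1) after rearranging for Vinf − Vexf and PLinf − PLexf (the latter converted to mg).

```python
import random
import statistics


def latin_hypercube(n, bounds, seed=0):
    """Return n points; each dimension has exactly one draw per equal-width stratum."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each of the n strata of [0, 1), then shuffle
        # so that strata are paired at random across dimensions.
        u = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(u)
        columns.append([low + x * (high - low) for x in u])
    return list(zip(*columns))


def leakage_quartiles(dv, dpl_mg, c_inf_range, c_exf_range, n=1000, seed=0):
    """Q1 and Q3 of V_exf and V_inf over Latin hypercube samples of C_inf, C_exf.

    Assumes the two concentration ranges do not overlap, so c_inf != c_exf.
    """
    samples = latin_hypercube(n, [c_inf_range, c_exf_range], seed)
    v_exf = [(dpl_mg - c_inf * dv) / (c_inf - c_exf) for c_inf, c_exf in samples]
    v_inf = [v + dv for v in v_exf]
    q1_exf, _, q3_exf = statistics.quantiles(v_exf, n=4)
    q1_inf, _, q3_inf = statistics.quantiles(v_inf, n=4)
    return {"v_exf": (q1_exf, q3_exf), "v_inf": (q1_inf, q3_inf)}


# Hypothetical bounds: C_inf as mean ± SD, C_exf between C_in and C_r.
interval = leakage_quartiles(
    dv=20.0, dpl_mg=-950.0,
    c_inf_range=(3.0, 7.0), c_exf_range=(35.0, 50.0),
)
```

Stratifying each dimension guarantees that the full range of both concentrations is covered even with a modest number of samples, which is the usual motivation for Latin hypercube sampling over simple random sampling.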

Analysis of the impact of rainfall and socio-economic development on leakage

Annual precipitation data for 2017 were collected from the China Meteorological Data Service Center for 88 of the 129 cities to analyze the impact of rainfall on leakage (Supplementary Data 4). For comparative analysis, cities were classified into three categories (“LOW”, “MIDDLE”, and “HIGH”) according to whether their precipitation fell in the lower, middle, or upper third, with cut-off values of 1106 and 1360 mm. Tukey’s HSD method was used to compare the mean infiltration and exfiltration volumes across cities in the different precipitation categories65.
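The grouping step that precedes the comparison of means can be sketched as below. The cut-off values are those given above; how a city lying exactly on a cut-off is assigned is not stated in the text, so the boundary handling here is an assumption, and the function names and example records are hypothetical.

```python
from statistics import mean

LOW_CUTOFF_MM = 1106   # boundary between LOW and MIDDLE
HIGH_CUTOFF_MM = 1360  # boundary between MIDDLE and HIGH


def precipitation_class(annual_mm):
    """Assign a precipitation category (boundary values go to the upper class)."""
    if annual_mm < LOW_CUTOFF_MM:
        return "LOW"
    if annual_mm < HIGH_CUTOFF_MM:
        return "MIDDLE"
    return "HIGH"


def group_means(records):
    """Mean leakage volume per category.

    records: iterable of (annual precipitation in mm, leakage volume) pairs.
    Categories with no cities are omitted from the result.
    """
    groups = {"LOW": [], "MIDDLE": [], "HIGH": []}
    for annual_mm, volume in records:
        groups[precipitation_class(annual_mm)].append(volume)
    return {name: mean(values) for name, values in groups.items() if values}


# Hypothetical records, for illustration only.
means = group_means([(900, 10.0), (1000, 20.0), (1500, 40.0)])
```

The per-category lists built in `group_means` are the kind of input that a pairwise comparison procedure such as Tukey’s HSD would then take.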

In addition to natural meteorological factors, urban population (Pu), nighttime lighting (NTL) and pipe density data for 94 cities were used to analyze the impact of socio-economic development on the extent of leakage (Supplementary Data 5). Cities were classified into four sizes based on urban population (Pu): over 5 million as Megacities, 3–5 million as Large cities, 1–3 million as Middle cities, and <1 million as Small cities (obtained from https://www.gov.cn/xinwen/2014-11/20/content_2781156.htm)66. NTL data were employed as a proxy for economic development67. The NTL data, from the National Geographic Data Center (obtained from http://lake.geodata.cn), were cleaned to remove anomalies such as sudden light sources and background noise68,69. The dark background mask was set to 1 nanoW cm−2 sr−1. Pipe density in built-up areas, which represents the length of pipelines constructed per square kilometer, was extracted from MOHURD data. We then examined the correlation between socio-economic development and the daily per capita gross leakage load at the four city size levels.
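A minimal sketch of the size-stratified correlation is given below. Several points are assumptions rather than statements from the text: Large cities are taken to span 3–5 million, boundary values are assigned to the larger class, and the Pearson coefficient is used because the type of correlation is not specified. All names and example values are hypothetical.

```python
import math


def city_size(pu_million):
    """Classify a city by urban population, given in millions."""
    if pu_million > 5:
        return "Megacity"
    if pu_million >= 3:
        return "Large"
    if pu_million >= 1:
        return "Middle"
    return "Small"


def pearson_r(xs, ys):
    """Pearson correlation coefficient (undefined if either series is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_by_size(cities):
    """Correlate an indicator with gross leakage load within each size class.

    cities: iterable of (Pu in millions, indicator value, PL_gross) tuples,
    where the indicator could be NTL or pipe density.
    Classes with fewer than three cities are skipped.
    """
    grouped = {}
    for pu_million, indicator, pl_gross in cities:
        xs, ys = grouped.setdefault(city_size(pu_million), ([], []))
        xs.append(indicator)
        ys.append(pl_gross)
    return {
        size: pearson_r(xs, ys)
        for size, (xs, ys) in grouped.items()
        if len(xs) >= 3
    }
```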