Introduction

Birth weight is the most commonly used proxy for newborn growth and development. Newborns are routinely classified as small for gestational age (SGA) or large for gestational age (LGA) if their birth weight falls below the 10th percentile or above the 90th percentile of the growth reference, respectively. SGA and LGA newborns are often at high risk of perinatal mortality and morbidity as well as long-term health problems. One of the hallmarks of twin pregnancies is a slower rate of fetal growth and development compared with singleton pregnancies. As a consequence, the birth weight of twins is significantly lower than that of singletons beginning at around 30 to 32 weeks of gestational age (GA)1. Therefore, an unbiased classification for twin newborns is of great significance in neonatal clinical practice.

Existing evidence suggests that the relative smallness of twins is probably the result of benign adaptive mechanisms and that the normal deceleration of growth in twins cannot be attributed to pathological slowing. It is therefore likely preferable to use a twin-based chart of birth weight to avoid overdiagnosis of intrauterine growth restriction in twin pregnancies2. In pursuit of this goal, several countries have constructed their own twin-based charts of birth weight by sex and GA3,4,6. In recent years, a singleton-based chart of birth weight has been established and recommended for use in China7,8. A retrospective cohort study in China suggested that a twin-based chart of birth weight seemed more reasonable and may be preferred over a singleton chart when diagnosing SGA in twin fetuses9. Several twin-based charts of birth weight10,11,12,13,14 have been reported based on large nationwide, multi-city, or regional datasets in China, with survey periods ranging from 2006–2015 to 2010–2021. However, these twin-based charts showed inconsistent ranges across GA and their growth trajectories did not appear to be entirely robust, so further integration may be necessary and feasible for practical application.

We aimed to establish percentile values of twin birth weight by sex and GA based on recently published high-quality data on Chinese twin newborns, in order to produce more robust and reliable reference values for identifying intrauterine growth restriction or postnatal deviation from the normal growth trajectory in Chinese twin neonates.

Methods

Data source and search strategy

We searched PubMed, Scopus, Web of Science, China National Knowledge Infrastructure (CNKI), and Wanfang from their inception to April 30, 2025, with no language restrictions. The following search terms, expressed in Boolean form, were used: [(twin*) AND ((birth weight) OR (birthweight)) AND ((percentile*) OR (curve*) OR (reference*) OR (standard*)) AND ((China) OR (Chinese))]. The detailed search strategy is described in Table S1. The review was registered in the International Prospective Register of Systematic Reviews (PROSPERO) database (No. 477530).

Inclusion and exclusion criteria

Studies were included in the present study if they met all of the following criteria: (1) GA was determined from the last menstrual period and/or ultrasound examination; (2) the reference values of twin birth weight included the frequently used percentile values by sex and GA; (3) both the male and female sample sizes were greater than 1000; (4) the study subjects were Chinese neonates. Studies were excluded if any one of the following criteria was met: (1) repeated reports or identical data in different reports; (2) ultrasound-based fetal weight; (3) birth weight reported only for monochorionic and/or dichorionic twins; (4) birth weight reported only for twins of primiparous and/or multiparous mothers.

Data extraction and quality assessment

Two investigators (XNZ and CDY) extracted data independently. Titles and abstracts of identified articles were screened and irrelevant articles were removed. Where necessary, the full texts of articles were examined thoroughly for eligibility. A third investigator (WS) was consulted in case of disagreement, and full consensus was reached. Data were entered into Microsoft Excel 365, including the first author, publication year, survey year, survey location, sample size, GA range, method for assessing GA, method for establishing percentile curves, and the percentile values of twin birth weight by sex and GA.

Potentially eligible articles were assessed with the QualSyst quality assessment tool, in which each study is assessed for risk of bias on 14 criteria15. Each item was scored according to the degree to which the specific criterion was met or reported (“yes” = 2; “partial” = 1; “no” = 0). Items not applicable to a particular study were marked “N/A” and excluded from the calculation of the summary score. The final score was calculated by summing the scores across the relevant items and expressing the total as a percentage of the theoretical maximum for those items. The quality of the articles was categorized as strong (> 75%), moderate (55–75%), or weak (< 55%).
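The scoring rule described above can be illustrated with a short Python sketch. The function names and the example item scores are hypothetical and serve only to show the arithmetic; they are not taken from Table S2.

```python
from typing import Optional, Sequence


def qualsyst_percent(item_scores: Sequence[Optional[int]]) -> float:
    """Summary score as a percentage of the attainable maximum.

    Each item is 2 ("yes"), 1 ("partial"), 0 ("no"), or None ("N/A").
    N/A items are dropped from both the numerator and the denominator.
    """
    applicable = [s for s in item_scores if s is not None]
    if not applicable:
        raise ValueError("at least one applicable item is required")
    if any(s not in (0, 1, 2) for s in applicable):
        raise ValueError("item scores must be 0, 1, 2, or None")
    return 100.0 * sum(applicable) / (2 * len(applicable))


def quality_category(percent: float) -> str:
    """Map a summary percentage to the categories used in the text."""
    if percent > 75:
        return "strong"
    if percent >= 55:
        return "moderate"
    return "weak"


# Hypothetical study: 14 items, two of them not applicable.
example_items = [2, 2, 1, 2, None, None, 2, 2, 1, 2, 2, 2, 1, 2]
example_percent = qualsyst_percent(example_items)
print(example_percent, quality_category(example_percent))
```

In this hypothetical case, 21 of a possible 24 points gives 87.5%, which falls in the strong category.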

Data synthesis and curve modeling

Figure 1 presents the flowchart of inclusion/exclusion of the search records. Five eligible studies were ultimately included in the present study: one used the Nationwide Birth Defects Surveillance System data (GA of 28–42 weeks)10, one was a multicenter study (GA of 25–40 weeks)11, two were single-region studies (GA of 25–42 weeks and 26–42 weeks)12,13, and one used the National Vital Statistics System data (GA of 22–27 weeks)14.

Fig. 1 Flowchart of inclusion/exclusion of the search records.

Figure 2 shows the brief procedure of data synthesis and curve modeling, which is consistent with the procedure and steps of previous studies16,17. First, we computed synthesized values for each of the P3, P10, P25, P50, P75, P90, and P97 percentiles of twin birth weight by sex and GA using a two-stage proportionally weighted approach. In the first stage, the two single-region studies were combined with the multicenter study, with equal weight given to each center/region. In the second stage, the synthesized data from the first stage were combined with the two national studies, weighted in proportion to sample size by sex. As a sensitivity analysis, the data were also synthesized by giving equal weight to each of the five included studies irrespective of study location and sample size. Second, for curve modeling, the initial smoothed curves were produced by manually adjusting the initial integrated data from the two-stage proportionally weighted approach. These initial smoothed curves were then fitted with a quartic polynomial regression equation to generate the intermediate smoothed curves. After that, the LMS parameters were derived from these intermediate smoothed curves using the nonlinear equation of the LMS method18:

$$C_{100\alpha}(t) = M(t)\left[1 + L(t)\,S(t)\,z_{\alpha}\right]^{1/L(t)}$$

where C100α(t) is the centile curve plotted against age t, zα is the normal equivalent deviate for the centile (for example, α = 0.97 corresponds to P97 and zα = 1.88), and L(t), M(t) and S(t) are the fitted smoothed curves plotted against age. Third, the final smoothed curves and any percentile values by sex and GA were established from the LMS parameters according to the equation of the LMS method.

Fig. 2 Procedure of data synthesis and curve modeling.
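As a minimal illustration of the LMS equation in this section, the Python sketch below computes a centile from given L, M and S values. The function name and the numerical values are hypothetical and are not taken from Table 2. The branch for L = 0 uses the limiting form M·exp(S·z), which is an addition here and is not spelled out in the text.

```python
import math
from statistics import NormalDist


def lms_centile(L: float, M: float, S: float, alpha: float) -> float:
    """Centile C_{100*alpha} for one gestational age from its L, M, S values."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(alpha)  # normal equivalent deviate z_alpha
    if abs(L) < 1e-12:
        # Limiting case of the equation as L approaches 0 (assumption, see lead-in).
        return M * math.exp(S * z)
    base = 1.0 + L * S * z
    if base <= 0.0:
        raise ValueError("1 + L*S*z must be positive for this centile")
    return M * base ** (1.0 / L)


# Hypothetical LMS values for a single sex and gestational age (grams).
L_example, M_example, S_example = 1.0, 2500.0, 0.15
for label, alpha in [("P3", 0.03), ("P50", 0.50), ("P97", 0.97)]:
    print(label, round(lms_centile(L_example, M_example, S_example, alpha)))
```

With α = 0.50 the deviate is zero, so the function returns M, which is why M is read as the median curve.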

Statistical analysis

Two-stage proportionally weighted percentile values of twin birth weight were calculated as the initial integrated data, which were then manually adjusted to obtain the initial smoothed curves. A polynomial regression equation was fitted to the initial smoothed curves to generate the intermediate smoothed curves. The LMS method was then used to fit the intermediate smoothed curves and produce the LMS parameters, from which any percentile value by sex and GA could be calculated to form the final smoothed curves. R-squared and adjusted R-squared, as well as the Akaike information criterion (AIC) and Bayesian information criterion (BIC), were used to assess the goodness of fit of the growth curves using the GLMSELECT procedure. The P3, P10, P50, P90, and P97 values of our final smoothed curves of twin birth weight were compared with the corresponding percentiles of twin birth weight from the five included studies10,11,12,13,14 and from other countries (USA (1980–1990s), Germany (2007–2011), Australia (2001–2010), and Japan (1968–1990))3,4,5,6, as well as with the corresponding percentiles of singleton birth weight from a representative sample in China7,8. Data processing and statistical analyses were performed with SAS 9.4 (SAS Institute Inc., Cary, North Carolina).
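The two-stage weighting can be sketched in Python for a single percentile at one sex and GA. This is one possible reading of the approach, not the authors' SAS code. It assumes that stage one weights each study by the number of centers/regions it contributes, that the stage-one pool carries the summed sample size of its studies into stage two, and that studies without data at a given GA are simply omitted. All names and numbers are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def weighted_mean(pairs: Sequence[Tuple[Optional[float], float]]) -> Optional[float]:
    """Weighted mean of (value, weight) pairs, skipping missing values."""
    usable = [(v, w) for v, w in pairs if v is not None]
    total = sum(w for _, w in usable)
    if total <= 0:
        return None
    return sum(v * w for v, w in usable) / total


def two_stage_value(
    regional: List[Tuple[Optional[float], int, int]],
    national: List[Tuple[Optional[float], int]],
) -> Optional[float]:
    """regional: (value, n_centers, sample_size); national: (value, sample_size)."""
    # Stage 1: equal weight per center/region among regional and multicenter studies.
    stage1 = weighted_mean([(v, k) for v, k, _ in regional])
    stage1_n = sum(n for v, _, n in regional if v is not None)
    # Stage 2: weight the stage-1 pool and the national studies by sample size.
    return weighted_mean([(stage1, stage1_n)] + [(v, n) for v, n in national])


# Hypothetical P50 values (grams) at one sex and GA.
regional_studies = [(2400.0, 3, 6000), (2450.0, 1, 2000), (2350.0, 1, 2000)]
national_studies = [(2500.0, 30000), (2300.0, 10000)]
two_stage = two_stage_value(regional_studies, national_studies)

# Sensitivity analysis: equal weight for each of the five studies.
all_values = [v for v, _, _ in regional_studies] + [v for v, _ in national_studies]
equal_weight = weighted_mean([(v, 1.0) for v in all_values])
print(two_stage, equal_weight)
```

In this made-up example the two-stage value (2440 g) sits closer to the larger national study than the equal-weight value (2400 g), which is the kind of difference the sensitivity analysis is meant to expose.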

Results

The literature search identified 236 papers, of which 229 were excluded because they were not relevant to this study, reported no reference values, reported intrauterine fetal data, or were duplicate records (Fig. 1). The remaining seven studies were examined in detail, and two were further excluded because of incomplete or identical data. Finally, five studies that met the inclusion criteria were included in the present study (Table 1), all rated as strong quality according to the QualSyst quality assessment tool (Table S2). The original P3, P10, P25, P50, P75, P90, and P97 values of twin birth weight by sex and GA for these five studies were extracted and documented in Tables S3 to S7.

Table 1 Details of the included five studies.

Based on the five included studies, the two-stage proportionally weighted percentile values of twin birth weight were calculated and used as the initial integrated data for further curve modeling (Fig. 3). A sensitivity analysis showed that the curve shape from the two-stage proportionally weighted approach was similar to that from the equally weighted approach; however, the curves from the two-stage approach seemed more conservative and inclusive, especially at the lowest and highest GAs (Fig. S1).

Fig. 3 Comparisons of the 3rd, 10th, 50th, 90th, 97th percentile curves of twin birth weight among final smoothed curves, manual smoothed curves, and original weighted curves.

A polynomial regression equation was fitted to the initial manually smoothed percentile values, which were an optimization of the two-stage proportionally weighted curves; the polynomial-fitted smoothed curves showed good fit and tolerable differences (Fig. S2). Using the LMS method, the final smoothed percentile curves were produced from the fitted LMS values, which were generated from the intermediate polynomial-fitted smoothed percentile values. The 3rd, 10th, 50th, 90th, and 97th percentile curves showed robust fit and good consistency among the final smoothed curves, manually smoothed curves, and original weighted curves (Fig. 3). Table 2 presents the LMS parameters and frequently used percentile values of birth weight by sex and GA from 22 to 42 weeks in Chinese twins; other percentile values can be calculated from the LMS parameters according to the equation of the LMS method. Tables S8 and S9 further present the LMS parameters and percentile values of birth weight by sex and GA in exact weeks and days, allowing more precise growth evaluation and calculation of z-scores for individual values. To facilitate practical application, we drew a set of user-friendly growth charts for male and female twin newborns, each comprising the seven main percentile curves of twin birth weight (Figs. 4 and 5).
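For individual evaluation, the LMS equation can be inverted to obtain a z-score from an observed birth weight. The Python sketch below shows this under stated assumptions: the L, M and S values are hypothetical rather than taken from Table 2 or Tables S8 and S9, the function names are illustrative, and the label "AGA" (appropriate for gestational age) for newborns between the 10th and 90th percentiles is added here for convenience. The SGA and LGA cutoffs follow the definitions in the Introduction.

```python
import math
from statistics import NormalDist


def lms_zscore(weight: float, L: float, M: float, S: float) -> float:
    """z-score of an observed birth weight given L, M, S for that sex and GA."""
    if weight <= 0 or M <= 0 or S <= 0:
        raise ValueError("weight, M and S must be positive")
    if abs(L) < 1e-12:
        # Limiting case as L approaches 0.
        return math.log(weight / M) / S
    return ((weight / M) ** L - 1.0) / (L * S)


def classify(z: float) -> str:
    """Classify by percentile: below the 10th is SGA, above the 90th is LGA."""
    percentile = 100.0 * NormalDist().cdf(z)
    if percentile < 10.0:
        return "SGA"
    if percentile > 90.0:
        return "LGA"
    return "AGA"


# Hypothetical LMS values for one sex and gestational age (grams).
L_example, M_example, S_example = 1.0, 2500.0, 0.15
for weight in (2000.0, 2500.0, 3100.0):
    z = lms_zscore(weight, L_example, M_example, S_example)
    print(weight, round(z, 2), classify(z))
```

Inverting the equation in this way is what allows a z-score to be reported for any individual weight, rather than only a position relative to the seven tabulated percentiles.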

Table 2 The L, M and S parameters and percentile values of birth weight by sex and gestational age in Chinese twins.

Fig. 4 Twin birth weight growth charts for male newborns in China.

Fig. 5 Twin birth weight growth charts for female newborns in China.

Our final smoothed twin-based curves appeared to be a reliable representation of the cluster of original curves from the five included studies (Fig. 6). The growth trajectory of our twin-based curves of birth weight was generally consistent with the trajectories of twin birth weight from the USA, Germany, Australia, and Japan, but was lower than those from Germany and Australia at > 36 weeks of GA and higher than those from the USA and Japan at > 32 weeks (Fig. 7). In addition, our twin-based curves of birth weight were clearly lower than the singleton-based curves from a representative sample in China, with the difference becoming more pronounced after 32 weeks of GA (Fig. 8).

Fig. 6 Comparisons of the 3rd, 50th, 97th percentile curves of twin birth weight among final smoothed curves, and original smoothed curves from the included five studies.

Fig. 7 Comparisons of the 10th, 50th, 90th percentile curves of twin birth weight among China, USA, Germany, Australia and Japan.

Fig. 8 Comparisons of the 3rd, 10th, 50th, 90th, 97th percentile curves of birth weight among twin-based curves (this study) and singleton-based curves in China.

Discussion

Our twin-based percentile values of birth weight may be useful for identifying twin newborns at risk of adverse outcomes and may be preferred over singleton growth charts when monitoring the growth trajectories of twin newborns. Compared with singleton-based charts, twin-based charts have the potential to avoid overdiagnosis of intrauterine growth restriction in twin neonates and the consequences of such misdiagnosis.

The mechanisms of fetal growth deceleration in twins, and whether this phenomenon represents pathology or physiological adaptation, have been the subject of debate19. Some organizations have recommended the use of twin-based charts20, whereas others have recommended singleton charts21. Our comparisons demonstrated a distinct difference between twin-based and singleton-based charts in the Chinese population (Fig. 8), suggesting that the use of singleton-based charts may overestimate the number of twin neonates with suspected growth restriction, because these twins appear to fall off the singleton-based charts. Considering that the use of twin-based charts can lead to a considerable reduction in the proportion of twin neonates identified as SGA, and that such charts could safely reduce the burden of unnecessary medical interventions in twin pregnancies2, we propose the use of twin-based charts for the diagnosis of SGA or LGA in twin neonates in clinical practice.

Following the procedure of a systematic review, we selected eligible high-quality studies reporting percentile values of twin birth weight to contribute to the present study. We synthesized the percentile values of twin birth weight using a two-stage proportionally weighted approach and produced the final smoothed percentile curves through a series of curve modeling procedures: manually smoothing the weighted data, fitting a polynomial equation to the manually smoothed curves, fitting the LMS method to the polynomial-smoothed curves, and ultimately establishing a set of standardized growth curves of twin birth weight. This curve modeling procedure was consistent with that used to establish international and Chinese postnatal growth monitoring curves for preterm infants16,22.

Compared with the percentile curves of singleton-based birth weight in Chinese newborns7,8, our twin-based curves were clearly lower, especially for newborns at GA of ≥ 32 weeks, supporting the view that twin-based charts may be more appropriate and could be preferred for assessing and monitoring the growth of twin newborns. A model-fitting analysis showed that twin birth weight varies across racial groups23. A reference from one country may not be an optimal fit for other countries. Our comparison with other countries again confirmed this variation in twin birth weight, supporting the necessity of establishing percentile values of twin birth weight in China. We speculate that a flat or even reversed trajectory of twin birth weight in late term (e.g., 40–42 weeks) may arise because a proportion of pregnancies with genuinely smaller fetuses continue to this final stage. However, further in-depth studies are still needed. We also noticed a sharp increasing trend at 40 weeks in Huang’s study that seemed different from the other studies, which may be partially attributable to the right-edge effect in curve fitting24,25.

Our study has several strengths. First, we constructed robust and reliable percentile values based on all available high-quality data on Chinese twin newborns, identified according to the procedure of a systematic review. Second, we synthesized the data using a two-stage proportionally weighted approach and additionally conducted a sensitivity analysis using an equally weighted approach. However, our study also has several limitations. First, although we used twin-based references without disaggregating monochorionic and dichorionic twins, to allow more general use in practice, subtle differences may exist between monochorionic and dichorionic twin births26,27. However, a recent cohort study including 398 twin pregnancies indicated that the differences in fetal weight between monochorionic and dichorionic twins were clinically insignificant28. Second, the lack of data at extreme GAs in some studies (e.g., Dai’s study) may affect the combined reference values at those GAs (e.g., 26 weeks of GA); we therefore used a two-stage proportionally weighted approach and a sensitivity analysis to examine robustness. Third, it is still essential to acknowledge that ultrasound-derived growth trajectories hold greater clinical relevance than birth-weight-based references. This is particularly true for preterm deliveries, in which fetuses fail to achieve their full growth potential. Fourth, because the present study did not exclude pregnancies complicated by hypertensive disorders, fetal anomalies, or other confounders, the resulting curves should be regarded as a population-based reference rather than an optimal growth standard. Fifth, the reference values at 40, 41 and 42 weeks remain uncertain owing to selection bias and/or model fitting, so the charts should be used with caution beyond 40 weeks.

Conclusions

Our established twin-based percentile reference values of birth weight could be preferred over the use of singleton references when diagnosing SGA or LGA in twin newborns or monitoring the growth of twin newborns in China. Further studies are needed to test the performance and benefit of our established twin-based charts in Chinese twin newborns and provide a better understanding of the mechanisms responsible for the slower growth in twins.