Introduction

Photosynthesis stands as the preeminent large-scale energy-matter conversion and biological carbon sequestration process on our planet, constituting the fundamental basis for the survival and development of the overwhelming majority of life. Light serves as the foundational element for photosynthesis; thus, investigating plant responses to light becomes imperative1. It has been demonstrated that the photosynthetic light response curve of plants is one of the most crucial tools to quantify the changes of the photosynthetic rate2,3,4. Several essential photosynthetic variables that are regarded as indicators of the response of an organism to environmental changes, such as saturation light intensity, net light-saturated photosynthetic rate, light compensation point, and dark respiration rate, can be obtained by fitting the photosynthetic light response curves of plants using the light response model5,6,7. Therefore, determining the light response models of plant photosynthesis is of crucial significance for studying the photosynthetic productivity of plants.

Researchers have developed a variety of photosynthetic light response models, primarily intended for studying the rate of photosynthesis in response to the change of irradiance in algae and higher plants8,9,10,11,12,13,14,15. However, many mechanistic models feature numerous parameters and intricate relationships between photosynthetic rate and irradiance. Consequently, classical models continue to dominate in terms of widespread application, which include the Exponential Model15, Rectangular Hyperbola Model16, Nonrectangular Hyperbola Model17,18, and Modified Rectangular Hyperbola Model5,19,20. It has been shown that the Nonrectangular Hyperbola Model offers greater flexibility compared to the Rectangular Hyperbola Model, attributed to its inclusion of an additional curvature parameter. This enhanced flexibility is particularly advantageous when fitting photosynthetic light response curves for specific species of plants21. Additionally, the Modified Rectangular Hyperbola Model can reproduce the irradiance response trends of photosynthesis well for higher plants and can yield values close to the measured data for phytoplankton species, but the fitted curves exhibited some slight deviations under low irradiance22. Therefore, there is a need to test which photosynthetic light response model describes the relationship between light intensity and the rate of photosynthesis in plants most accurately. Prior research on model comparison has predominantly concentrated on assessing the goodness of fit (e.g., coefficient of determination) or exploring the trade-off between goodness of fit and model structural complexity (e.g., the Akaike information criterion)21,22. However, comparatively few studies have quantified and compared the nonlinearity of photosynthetic light response models using relative curvature measures of nonlinearity, despite the potential of such measures to provide valuable insights into photosynthetic mechanisms.

Relative curvature measures of nonlinearity can provide a more comprehensive assessment of the nonlinear behavior of different nonlinear models compared to traditional criteria like the coefficient of determination or Akaike information criterion. While these traditional measures focus primarily on how well a model fits the data, they do not account for the model’s inherent nonlinearity and how this might affect its performance across different datasets or conditions. In contrast, relative curvature measures assess the extent to which a nonlinear regression model approaches the behavior of a linear model23. This is crucial in the context of photosynthetic light-response curves, where nonlinearity is a fundamental characteristic. By focusing on these nonlinear measures, researchers can ensure that the chosen models not only fit the data well but also provide a deeper insight into the model’s structural properties, making these measures a superior choice for comparing different nonlinear models, especially in ecological and biological studies where nonlinearity is common.

To understand and compare the inherent nonlinearity of light response models, we used four photosynthetic light response models (i.e., the Exponential Model, the Rectangular Hyperbola Model, the Nonrectangular Hyperbola Model, and the Modified Rectangular Hyperbola Model) to fit the photosynthetic light response curves using 42 datasets from 21 plant species. The root-mean-square error and relative curvature measures of nonlinearity were used to determine which one of the four nonlinear models provided the best description of the photosynthetic rate in response to the change of irradiance. The aim of this work is to validate the effectiveness of relative curvature measures of nonlinearity in nonlinear regression analyses and to provide a new approach for the future evaluation of photosynthetic light response models. The study’s practical application mainly lies in enhancing the accuracy and efficiency of fitting photosynthetic light-response curves, which are critical for understanding plant productivity under varying light conditions.

Materials and methods

Data acquisition

We used 42 datasets from 21 species, including both herbaceous and woody plants, to assess the performance of the four photosynthetic light response models. Each dataset includes light intensity values and the corresponding net photosynthetic rates for the subject under varying light conditions, with sample sizes ranging from 7 to 13. Across all datasets, the light intensity varied from 0 to 2407 µmol m−2 s−1, and the net photosynthetic rate varied from −4.11 to 46.4 µmol CO2 m−2 s−1. Therefore, the photosynthetic light-response models can be readily fitted to these data. The original data are from Supplementary Table S4 of Chen et al.21.

Models

The observations of net photosynthetic rate were then plotted against irradiance and subsequently fitted using the following equations:

(i) The Exponential Model (denoted as EM)15,24 is

$$P={A_{\hbox{max} }}\left( {1 - \exp \left( {{{ - aI} \mathord{\left/ {\vphantom {{ - aI} {{A_{\hbox{max} }}}}} \right. \kern-0pt} {{A_{\hbox{max} }}}}} \right)} \right) - {R_d},$$
(1)

where P is the net photosynthetic rate, I is the irradiance, a is the initial quantum efficiency, Amax is the net light saturated photosynthetic rate, and Rd is the dark respiration rate (the same symbols have the same meanings in the following formulas). Note that there is a typographical error in the published paper of Chen et al.21, where the minus sign preceding the dark respiration rate in the model has been omitted.

(ii) The Rectangular Hyperbola Model (denoted as RHM)16 is

$$P=\frac{{aI{A_{\hbox{max} }}}}{{aI+{A_{\hbox{max} }}}} - {R_d}.$$
(2)

(iii) The Nonrectangular Hyperbola Model (denoted as NHM)17,18 is

$$P=\frac{{aI+{A_{\hbox{max} }} - \sqrt {{{\left( {aI+{A_{\hbox{max} }}} \right)}^2} - 4a{{{\uptheta}}}I{A_{\hbox{max} }}} }}{{2{{{\uptheta}}}}} - {R_d},$$
(3)

where θ is a curvature parameter.

(iv) The Modified Rectangular Hyperbola Model (denoted as MRHM)5,19,20 is

$$P=a\frac{{1 - {{{\upbeta}}}I}}{{1+{{{\upgamma}}}I}}I - {R_d},$$
(4)

where β and γ are additional parameters that need to be estimated along with the initial quantum efficiency a and the dark respiration rate Rd.
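
As an illustration, Eqs. (1) to (4) can be written as plain functions. This is a minimal Python sketch, not the code used in the study (which relied on the R package IPEC); the function and argument names (`em`, `rhm`, `nhm`, `mrhm`, `A_max`, `R_d`) are hypothetical labels chosen here for the symbols in the equations.

```python
import math

def em(I, a, A_max, R_d):
    """Exponential Model, Eq. (1)."""
    return A_max * (1.0 - math.exp(-a * I / A_max)) - R_d

def rhm(I, a, A_max, R_d):
    """Rectangular Hyperbola Model, Eq. (2)."""
    return a * I * A_max / (a * I + A_max) - R_d

def nhm(I, a, A_max, R_d, theta):
    """Nonrectangular Hyperbola Model, Eq. (3); theta is the curvature parameter."""
    s = a * I + A_max
    return (s - math.sqrt(s * s - 4.0 * a * theta * I * A_max)) / (2.0 * theta) - R_d

def mrhm(I, a, beta, gamma, R_d):
    """Modified Rectangular Hyperbola Model, Eq. (4)."""
    return a * (1.0 - beta * I) / (1.0 + gamma * I) * I - R_d
```

In all four forms the predicted rate at zero irradiance equals −Rd, which is a quick sanity check on any implementation.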

Model assessment

Least squares protocols are utilized to analyze a dataset encompassing responses collected at corresponding experimental settings. This method operates under the premise that the relationship between the responses and the experimental settings can be effectively captured and modeled by an equation of the following form23:

$${y_t}=f\left( {{x_t},{{{\uplambda}}}} \right)+{{{{\upvarepsilon}}}_t},$$
(5)

where yt (t = 1, 2, …, n) is the response, xt (t = 1, 2, …, n) is the experimental setting, εt (t = 1, 2, …, n) is an additive random error of measurement, and λ = (λ1, λ2, …, λp)T is an unknown parameter vector. Here, n represents the sample size, and p the number of model parameters.

When employing least squares protocols to fit a mathematical model, such as in the current study, it becomes imperative to adopt a stochastic assumption:

$$\begin{gathered} E\left( {{\upvarepsilon}_t} \right)=0, \\ {\text{Cov}}\left( {{\upvarepsilon}_t},{{\upvarepsilon}_s} \right)={{\upsigma}^2}{{\upkappa}_{ts}}, \end{gathered}$$
(6)

where E represents the expectation, and Cov represents the covariance; σ2 is the variance, and κts (t, s = 1, 2, …, n) is a constant. This assumption elucidates the variability of the error term, exemplified by the difference between the observed and fitted net photosynthetic rate, which fluctuates in tandem with changes in irradiance within the context of this study. Then the parameters can be estimated by minimizing the residual sum of squares (RSS):

$${\text{RSS}}=\sum\limits_{{t=1}}^{n} {{{\left( {{y_t} - f\left( {{x_t},{{{\uplambda}}}} \right)} \right)}^2}} .$$
(7)

The value \(\:\widehat{{\uplambda\:}}\) that minimizes RSS in Eq. (7) is called the least squares estimator of λ. For each xt, we consider the conditional expectation of response (ηt):

$${{{{\upeta}}}_t}({{{\uplambda}}})=E({y_t}|{{{\uplambda}}})=f({x_t},{{{\uplambda}}}).$$
(8)

Therefore, the RSS in Eq. (7) can be written in vector form:

$${\text{RSS}}={\left\| {Y - {{{\upeta}}}({{{\uplambda}}})} \right\|^2},$$
(9)

where Y = (y1, y2, …, yn)T, η(λ) = (η1(λ), η2(λ), …, ηn(λ))T, and \(\|\bullet \|\) represents the length of a vector. The vector η(λ) delineates a p-dimensional surface, referred to as the solution locus, within the n-dimensional sample space23,25.

In theory, if the error term is assumed to follow an independent and identically distributed normal distribution, then the least squares estimators of the parameters in a linear regression model are unbiased, jointly normally distributed, and possess minimum variance compared to other estimators within the class of regular estimators26. However, in the context of a nonlinear regression model, the estimators derived from the least squares methods lack the advantageous characteristics typically sought after, particularly when dealing with a small sample size. It is only upon the expansion of the sample size to a sufficiently large scale that these least squares estimators begin to approximate the esteemed asymptotic qualities, whereby they emerge as asymptotically unbiased, adhere to an asymptotic normal distribution, and achieve the status of estimators with the asymptotic minimal variance. Ratkowsky27 labeled as “close-to-linear” models those nonlinear regression models whose estimators closely approached the asymptotic qualities mentioned above even with relatively small sample sizes. In contrast, the estimators of nonlinear regression models labeled “far-from-linear” did not approximate these desirable asymptotic qualities. Therefore, it is of paramount importance to actively pursue models that yield close-to-linear estimators when dealing with nonlinear models27,28.

In general, the bedrock of algorithms for computing least squares estimates and the majority of inference methods for nonlinear models lies in a local linear approximation achieved through the first-order Taylor expansion at a fixed parameter value λ023,29:

$$f(x,{{{\uplambda}}}) \cong f(x,{{{{\uplambda}}}_0})+\sum\limits_{{i=1}}^{p} {\left( {{{{{\uplambda}}}_i} - {{{{\uplambda}}}_{i,0}}} \right){v_i}(x)} ,$$
(10)

where \({v_i}(x)={{\partial f(x,{{{\uplambda}}})} \mathord{\left/ {\vphantom {{\partial f(x,{{{\uplambda}}})} {\partial {{{{\uplambda}}}_i}}}} \right. \kern-0pt} {\partial {{{{\uplambda}}}_i}}}\left| {_{{{{{{\uplambda}}}_0}}}} \right.\). We express Eq. (10) in vector form as follows:

$${{{\upeta}}}({{{\uplambda}}}) \cong {{{\upeta}}}({{{{\uplambda}}}_0})+\sum\limits_{{i=1}}^{p} {\left( {{{{{\uplambda}}}_i} - {{{{\uplambda}}}_{i,0}}} \right){v_i}} ,$$
(11)

where \({v_i}={\left( {{v_i}({x_1}),{v_i}({x_2}), \cdots ,{v_i}({x_n})} \right)^{\text{T}}}={{\partial {{{\upeta}}}({{{\uplambda}}})} \mathord{\left/ {\vphantom {{\partial {{{\upeta}}}({{{\uplambda}}})} {\partial {{{{\uplambda}}}_i}}}} \right. \kern-0pt} {\partial {{{{\uplambda}}}_i}}}\left| {_{{{{{{\uplambda}}}_0}}}} \right.\).

The linear approximation’s impact is to substitute the solution locus with its tangent plane at \({{{\upeta}}}({{{{\uplambda}}}_0})\) while simultaneously enforcing a uniform coordinate system on that tangent plane, which corresponds to two distinct assumptions: the planar assumption and the uniform coordinate assumption23,29. Measures of nonlinearity serve to illuminate the adequacy or insufficiency of the linear approximation23,30,31,32. For example, Bates and Watts23 introduced curvature measures of nonlinearity based on differential geometry ideas, namely the intrinsic curvature and parameter-effects curvature, offering comprehensive evaluations to determine whether a nonlinear regression model aligns close-to-linear or far-from-linear. In their published literature, η(λ) serves as a mapping, projecting the p-dimensional parameter space onto a corresponding p-dimensional surface within the n-dimensional sample space. Therefore, each point λ within the parameter space corresponds to a specific point η(λ) situated on the solution locus. Furthermore, lines within the parameter space through the point λ correspond to curves on the solution locus through η(λ)23,29.

A general straight line passing through the fixed point λ0 within the parameter space can be parametrically represented by the geometric parameter g as follows:

$${{{\uplambda}}}(g)={{{{\uplambda}}}_0}+gh,$$
(12)

where h = (h1, h2, …, hp)T is any non-zero vector. Thus, the corresponding ηh(g) on the solution locus is as follows:

$${{{{\upeta}}}_h}(g)={{{\upeta}}}({{{{\uplambda}}}_0}+gh).$$
(13)

Considering the tangential direction to the curve ηh(g) at g = 0 yields the following:

$${\dot{\upeta}}_h=\left.\frac{d{\upeta}_h}{dg}\right|_{g=0}=\sum\limits_{i=1}^{p}\left.\frac{\partial {\upeta}}{\partial {\uplambda}_i}\right|_{{\uplambda}_0}\left.\frac{d{\uplambda}_i}{dg}\right|_{g=0}=\sum\limits_{i=1}^{p}{v_i}{h_i}={\mathbf{\dot{V}}}h,$$
(14)

where \({\mathbf{\dot {V}}}\) represents an n × p matrix whose i-th column is vi. Furthermore, the second partial derivatives of η with respect to the parameters, evaluated at λ0, are:

$${v_{ij}}=\left.\frac{\partial^2 {\upeta}}{\partial {\uplambda}_i\,\partial {\uplambda}_j}\right|_{{\uplambda}_0}.$$
(15)

Therefore, the second derivative of ηh with respect to g at g = 0 is

$$\begin{aligned} {\ddot{\upeta}}_h&=\left.\frac{d^2{\upeta}_h}{dg^2}\right|_{g=0} \\ &=\sum\limits_{j=1}^{p}\left.\frac{\partial \left( \sum\limits_{i=1}^{p}{v_i}{h_i} \right)}{\partial {\uplambda}_j}\right|_{{\uplambda}_0}\left.\frac{d{\uplambda}_j}{dg}\right|_{g=0} \\ &=\sum\limits_{i=1}^{p}\sum\limits_{j=1}^{p}{v_{ij}}{h_i}{h_j} \\ &={h^{\text{T}}}{\mathbf{\ddot{V}}}h, \end{aligned}$$
(16)

where \({\mathbf{\ddot {V}}}\) represents a p × p × n array, whose element in the i-th row and j-th column is vij (vij is an n-dimensional column vector if we treat \({\mathbf{\ddot {V}}}\) as a p × p matrix).

The vectors \({{{\dot {{\upeta}}}}_h}\) and \({{{\ddot {{\upeta}}}}_h}\) can be interpreted as the instantaneous velocity and instantaneous acceleration, respectively, of ηh at the time g = 0. The acceleration vector \({{{\ddot {{\upeta}}}}_h}\) can be decomposed into the sum of three components using vector decomposition23:

$${{{\ddot {{\upeta}}}}_h}={{{\ddot {{\upeta}}}}}_{h}^{N}+{{\ddot {{\upeta}}}}_{{{h}}}^{P}+{{{\ddot {{\upeta}}}}_{h}^{G}},$$
(17)

where \({{\ddot {{\upeta}}}}_{h}^{N}\) is normal to the tangent plane and determines the change in direction of the vector \({{{\dot {{\upeta}}}}_h}\) in the normal direction; \({{\ddot {{\upeta}}}}_{h}^{P}\) is parallel to \({{{\dot {{\upeta}}}}_h}\) and determines the change in speed of the moving point (ηh), and hence whether the point moves uniformly on the solution locus; and \({{\ddot {{\upeta}}}}_{h}^{G}\) is parallel to the tangent plane and normal to the vector \({{{\dot {{\upeta}}}}_h}\), and determines the change in direction of the vector \({{{\dot {{\upeta}}}}_h}\) within the tangent plane. These acceleration components can be converted to curvatures. The intrinsic curvature (\(K_{h}^{N}\)), associated with the acceleration component normal to the tangent plane, is defined as23:

$$K_{h}^{N}=\frac{{\left\| {{{\ddot {{\upeta}}}}_{h}^{N}} \right\|}}{{{{\left\| {{{{{\dot {{\upeta}}}}}_h}} \right\|}^2}}},$$
(18)

and the parameter-effects curvature (\(K_{h}^{T}\)), associated with the acceleration components parallel to the tangent plane, is defined as23:

$$K_{h}^{T}=\frac{{\left\| {{{\ddot {{\upeta}}}}_{h}^{T}} \right\|}}{{{{\left\| {{{{{\dot {{\upeta}}}}}_h}} \right\|}^2}}},$$
(19)

where \({{\ddot {{\upeta}}}}_{h}^{T}\) is the combination of \({{\ddot {{\upeta}}}}_{h}^{P}\) and \({{\ddot {{\upeta}}}}_{h}^{G}\), i.e., \({{\ddot {{\upeta}}}}_{h}^{T}={{\ddot {{\upeta}}}}_{h}^{P}+{{\ddot {{\upeta}}}}_{h}^{G}\).

The intrinsic curvature, derived from the solution locus25 at the least squares estimates of the parameters, stands as an inherent feature of the model, impervious to alterations through reparameterization. Conversely, parameter-effects curvature hinges upon the specific parameterization chosen, with the potential for substantial modification through nonlinear reparameterization approaches23,33. Moreover, the intrinsic curvature and parameter-effects curvature can effectively gauge the validity of the planar assumption and the uniform coordinate assumption, respectively. However, the intrinsic curvature and parameter-effects curvature are not useful measures of nonlinearity in part because they depend on the scaling of the data. Thus, Bates and Watts23 introduced a scaling factor (ρ) to avoid the dependence:

$${{{\uprho}}}=\sqrt {{s^2}p} ,$$
(20)

where \({s^2}={{{\text{RSS}}} \mathord{\left/ {\vphantom {{{\text{RSS}}} {(n - p)}}} \right. \kern-0pt} {(n - p)}}\), n is the sample size, and p is the number of parameters. Therefore, the relative intrinsic curvature (\(C_{h}^{{\text{I}}}\)) in the direction h is as follows:

$$C_{h}^{{\text{I}}}=K_{h}^{N}{{{\uprho}}},$$
(21)

and the relative parameter-effects curvature (\(C_{h}^{{\text{P}}}\)) in the direction h is

$$C_{h}^{{\text{P}}}=K_{h}^{T}{{{\uprho}}}.$$
(22)
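
The construction in Eqs. (12) to (22) can be sketched numerically for a single direction h. The following Python example is an illustrative assumption-laden sketch, not the IPEC implementation: it approximates the velocity and acceleration vectors by central finite differences along the line λ0 + gh, projects the acceleration onto the tangent plane with a QR decomposition of a finite-difference Jacobian, and scales by ρ. The function names, step sizes, parameter values, and the value of s are all hypothetical.

```python
import numpy as np

def rhm(I, lam):
    """Rectangular Hyperbola Model, Eq. (2); lam = (a, A_max, R_d)."""
    a, A_max, R_d = lam
    return a * I * A_max / (a * I + A_max) - R_d

def jacobian(f, I, lam, eps=1e-6):
    """Central-difference approximation of the n x p matrix of the v_i columns."""
    lam = np.asarray(lam, dtype=float)
    cols = []
    for i in range(lam.size):
        step = np.zeros_like(lam)
        step[i] = eps * max(1.0, abs(lam[i]))
        cols.append((f(I, lam + step) - f(I, lam - step)) / (2.0 * step[i]))
    return np.column_stack(cols)

def directional_curvatures(f, I, lam0, h, s, eps=1e-4):
    """Relative intrinsic and parameter-effects curvatures in direction h."""
    lam0 = np.asarray(lam0, dtype=float)
    h = np.asarray(h, dtype=float)
    eta0, eta_p, eta_m = f(I, lam0), f(I, lam0 + eps * h), f(I, lam0 - eps * h)
    vel = (eta_p - eta_m) / (2.0 * eps)            # velocity, Eq. (14)
    acc = (eta_p - 2.0 * eta0 + eta_m) / eps ** 2  # acceleration, Eq. (16)
    Q, _ = np.linalg.qr(jacobian(f, I, lam0))      # orthonormal basis of the tangent plane
    acc_T = Q @ (Q.T @ acc)                        # tangential part (P + G components)
    acc_N = acc - acc_T                            # normal part
    rho = s * np.sqrt(lam0.size)                   # scaling factor, Eq. (20)
    speed2 = vel @ vel
    c_I = rho * np.linalg.norm(acc_N) / speed2     # Eqs. (18) and (21)
    c_P = rho * np.linalg.norm(acc_T) / speed2     # Eqs. (19) and (22)
    return c_I, c_P

# Hypothetical irradiance levels, parameter values, and residual standard error s
I_obs = np.array([0, 50, 100, 200, 400, 800, 1200, 1600, 2000], dtype=float)
c_I, c_P = directional_curvatures(rhm, I_obs, [0.05, 20.0, 2.0], [1.0, 0.0, 0.0], s=0.3)
```

Because RHM is linear in Rd, both curvatures along the direction h = (0, 0, 1) are zero up to rounding error, whereas a direction along a gives a nonzero acceleration.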

The relative intrinsic curvature and the relative parameter-effects curvature are independent of data scaling. Nevertheless, calculating the relative intrinsic curvature and the relative parameter-effects curvature in all directions has proven to be challenging in practical applications. Therefore, the root-mean-square curvatures (\(\:{{\upgamma\:}}_{\text{RMS}}\)), which are defined as the square root of the average squared curvature over all directions and include the root-mean-square intrinsic curvature and the root-mean-square parameter-effects curvature, are used in our study to quantify the nonlinearity of the photosynthetic light response models on a global scale. The root-mean-square intrinsic curvature (\(\:{{\upgamma\:}}_{\text{RMS}}^{N}\)) is given by the equation23,29:

$${{{\upgamma}}}_{{{\text{RMS}}}}^{N}=\sqrt {\frac{1}{{A(p)}}\int_{{\left\| v \right\|=1}} {{{\left( {C_{v}^{{\text{I}}}} \right)}^2}dA} } ,$$
(23)

and the root-mean-square parameter-effects curvature (\(\:{{\upgamma\:}}_{\text{RMS}}^{T}\)) is given by the equation23,29:

$${{{\upgamma}}}_{{{\text{RMS}}}}^{T}=\sqrt {\frac{1}{{A(p)}}\int_{{\left\| v \right\|=1}} {{{\left( {C_{v}^{{\text{P}}}} \right)}^2}dA} } ,$$
(24)

where \(C_{v}^{{\text{I}}}\) and \(C_{v}^{{\text{P}}}\) represent the relative intrinsic curvature and relative parameter-effects curvature corresponding to the unit direction v in Eqs. (23) and (24), respectively. A(p) represents the surface area of the p-dimensional unit sphere.

The two root-mean-square curvatures \(\:{{\upgamma\:}}_{\text{RMS}}^{N}\) and \(\:{{\upgamma\:}}_{\text{RMS}}^{T}\) are assessed by the critical curvature (Kc), defined as:

$${K_c}=\frac{1}{{\sqrt {F(p,n - p;{{{\upalpha}}})} }},$$
(25)

where F represents the F-distribution, p the number of model parameters, n the sample size, and α the significance level29, usually chosen to be 0.05. A value of \(\:{{\upgamma\:}}_{\text{RMS}}^{N}\) not exceeding Kc suggests that the planar assumption can be confidently embraced. Similarly, if \(\:{{\upgamma\:}}_{\text{RMS}}^{T}\) is substantially smaller than Kc, then the uniform coordinate assumption holds true across the region of interest. Furthermore, the smaller the values of \(\:{{\upgamma\:}}_{\text{RMS}}^{N}\) and \(\:{{\upgamma\:}}_{\text{RMS}}^{T}\), the more close-to-linear the nonlinear regression model is23,29.
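
For reference, Eq. (25) is straightforward to evaluate. The sketch below uses SciPy and assumes that F(p, n − p; α) denotes the upper α quantile of the F-distribution, i.e., the (1 − α) quantile returned by `scipy.stats.f.ppf`; the function name `critical_curvature` is hypothetical.

```python
from scipy.stats import f

def critical_curvature(p, n, alpha=0.05):
    """Critical curvature K_c of Eq. (25) for p parameters and n observations."""
    if n <= p:
        raise ValueError("n must exceed p")
    # Assumption: F(p, n - p; alpha) is the upper-alpha quantile of the F-distribution
    return 1.0 / f.ppf(1.0 - alpha, p, n - p) ** 0.5

# Example call: a three-parameter model fitted to 13 observations
k_c = critical_curvature(p=3, n=13)
```

Under this reading, Kc decreases as the sample size shrinks toward p, so the same root-mean-square curvature is judged more strictly for the smaller datasets.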

In addition to global curvature measures of nonlinearity, various other metrics are available that target the nonlinear behavior of individual parameters within the model. These encompass assessments of bias31, skewness32, and kurtosis34 of the least squares estimators. To evaluate the nonlinear behavior of a particular parameter within a photosynthetic light response model, the skewness (Sk), which is the standardized third moment of the estimator, serves as a valuable metric in this study. The skewness of the estimator of λi is defined as28,32:

$${S_k}=\frac{{E{{\left( {{{\widehat {{{{\uplambda}}}}}_i} - E\left( {{{\widehat {{{{\uplambda}}}}}_i}} \right)} \right)}^3}}}{{{{\left( {{s^2}{{\mathbf{Q}}^{ii}}} \right)}^{3/2}}}},$$
(26)

where \({{{\hat {{\uplambda}}}}_i}\) is the estimator of parameter λi; \({s^2}={{{\text{RSS}}} \mathord{\left/ {\vphantom {{{\text{RSS}}} {(n - p)}}} \right. \kern-0pt} {(n - p)}}\), n is the sample size, and p is the number of parameters; \({\mathbf{Q}}={[{{\mathbf{J}}^{\text{T}}}(\widehat {{{{\uplambda}}}}){\mathbf{J}}(\widehat {{{{\uplambda}}}})]^{ - 1}}\), \({\mathbf{J}}(\widehat {{{{\uplambda}}}})\) represents the n × p Jacobian matrix, and Qii is the element of matrix Q in the i-th row and i-th column; and E represents the expectation.

As a rule of thumb: an absolute value of Sk not exceeding 0.2, i.e., |Sk| ≤ 0.2, indicates good close-to-linear behavior; an absolute value of Sk surpassing 0.5, i.e., |Sk| > 0.5, indicates bad close-to-linear behavior; and an absolute value of Sk falling between 0.2 and 0.5, i.e., 0.2 < |Sk| ≤ 0.5, demonstrates moderate close-to-linear behavior26.
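
The rule of thumb above maps directly onto a small classifier. The Python sketch below assumes that skewness values have already been computed elsewhere; the labels "G", "M", and "B" stand for good, moderate, and bad close-to-linear behavior, and the function names are hypothetical.

```python
def classify_skewness(sk):
    """Label close-to-linear behavior from the skewness S_k of an estimator."""
    magnitude = abs(sk)
    if magnitude <= 0.2:
        return "G"  # good close-to-linear behavior
    if magnitude <= 0.5:
        return "M"  # moderate close-to-linear behavior
    return "B"      # bad close-to-linear behavior

def label_percentages(labels):
    """Percentage of each label among a collection of datasets."""
    total = len(labels)
    return {key: 100.0 * labels.count(key) / total for key in ("G", "M", "B")}
```

Applying `classify_skewness` to the skewness of one parameter in each dataset and passing the resulting list to `label_percentages` gives the share of datasets in each category for that parameter.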

The four nonlinear photosynthetic light response models described above, namely EM, RHM, NHM, and MRHM, were used to fit photosynthetic light response curves based on the empirical data. The Nelder-Mead optimization algorithm35 was employed to minimize the fitting criterion of nonlinear regression. The parameters of the photosynthetic light response models were estimated by minimizing the RSS between empirical and predicted net photosynthetic rate values.
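
The fitting step can be sketched as follows. This Python example is a stand-in for illustration only (the study itself used the R package IPEC): it minimizes the RSS of Eq. (7) for the Exponential Model with SciPy's Nelder-Mead implementation. The irradiance levels, the "true" parameter values, and the starting values are hypothetical, and the data are synthetic and noise-free.

```python
import numpy as np
from scipy.optimize import minimize

def em(I, a, A_max, R_d):
    """Exponential Model, Eq. (1)."""
    return A_max * (1.0 - np.exp(-a * I / A_max)) - R_d

def rss(params, I, P):
    """Residual sum of squares, Eq. (7)."""
    return float(np.sum((P - em(I, *params)) ** 2))

# Synthetic, noise-free observations generated from hypothetical parameter values
I_obs = np.array([0, 50, 100, 200, 400, 800, 1200, 1600, 2000], dtype=float)
true_params = (0.05, 20.0, 2.0)  # a, A_max, R_d (illustrative only)
P_obs = em(I_obs, *true_params)

# Nelder-Mead needs starting values; these are arbitrary but plausible guesses
start = np.array([0.03, 15.0, 1.0])
fit = minimize(rss, start, args=(I_obs, P_obs), method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 5000, "maxfev": 5000})
a_hat, A_max_hat, R_d_hat = fit.x
```

Nelder-Mead is derivative-free, which is convenient for models such as NHM whose gradients are awkward, but the result can depend on the starting values, so trying several starts is a sensible precaution.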

The root-mean-square error (RMSE), which can be loosely regarded as an “average absolute deviation”, was then used to assess the goodness of fit of the nonlinear regression:

$${\text{RMSE}}=\sqrt {{\text{RSS}}/\left( {n - p} \right)} ,$$
(27)

where n represents the sample size for each dataset, and p represents the number of parameters for each photosynthetic light response model. The smaller the RMSE value, the better the model fits. Tukey's honestly significant difference (HSD) tests with a 0.05 significance level were employed to determine whether there were significant differences among RMSE values derived from different photosynthetic light response models.
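
Equation (27) divides the RSS by the residual degrees of freedom n − p rather than by n, which penalizes models with more parameters. A minimal Python sketch, with a hypothetical function name and argument layout, is:

```python
import math

def rmse(observed, predicted, p):
    """Root-mean-square error of Eq. (27) for a model with p parameters."""
    n = len(observed)
    if n != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if n <= p:
        raise ValueError("need more observations than parameters")
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return math.sqrt(rss / (n - p))
```

With datasets of only 7 to 13 observations, the difference between dividing by n and by n − p is not negligible for the four-parameter models.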

The R package “IPEC” (version 1.1.0)36 based on R (version 4.2.1)37 was used to fit the photosynthetic light response curves and to calculate the nonlinear curvature measures \(\:{{\upgamma\:}}_{\text{RMS}}^{N}\), \(\:{{\upgamma\:}}_{\text{RMS}}^{T}\), Kc, and Sk.

Results

Figure 1 illustrates the fitting of the photosynthetic light response curves using the four nonlinear models for a representative dataset. For the fitted results of all datasets, the RMSE values ranged from 0.117 to 0.991 with a median of 0.372 for EM, from 0.086 to 1.734 with a median of 0.354 for RHM, from 0.072 to 0.784 with a median of 0.292 for NHM, and from 0.038 to 1.356 with a median of 0.274 for MRHM (Tables S1–S4). The results of Tukey's HSD test revealed that the mean of the RMSE values of RHM was slightly higher than the means of the other three photosynthetic light response models, suggesting that RHM may provide a poorer fit to the data than the other three models. However, this result is strongly influenced by two outliers (RMSE values of 1.734 and 1.414) among the values of RHM (see Fig. 2). Broadly, the spread of the RMSE results displayed in Fig. 2 suggests that the four models provide comparable levels of goodness of fit for the data of the 21 species considered.

Fig. 1

The fitted results of the empirical data using the four photosynthetic light response models (A EM; B RHM; C NHM; D MRHM) for a representative dataset selected from 42 datasets. Data points represent observations; the curves signify predicted photosynthetic light response curves. Letters a, Amax, Rd, θ, β, and γ with hats represent the estimated least-squares values of the parameters of the corresponding photosynthetic light response model in each panel; n represents the number of observations; RMSE represents the root-mean-square error.

Fig. 2

Boxplots of the root-mean-square error for the four photosynthetic light response models (EM, RHM, NHM, and MRHM) for 42 datasets. The letters a-b at the top of each boxplot denote the significance of the difference in the means between any two models based on Tukey's HSD test. Means with different letters are significantly different at the 0.05 significance level. The horizontal solid lines represent the medians, the asterisks within the boxplots represent the means, and the small open circles represent the data points.

The overall nonlinearity of the photosynthetic light response models was evaluated by the root-mean-square curvatures, i.e., \(\:{{\upgamma\:}}_{\text{RMS}}^{N}\), \(\:{{\upgamma\:}}_{\text{RMS}}^{T}\), and Kc. For the EM, all of the \(\:{{\upgamma\:}}_{\text{RMS}}^{N}\) values and \(\:{{\upgamma\:}}_{\text{RMS}}^{T}\) values in 42 datasets were less than the corresponding Kc. For the RHM, the \(\:{{\upgamma\:}}_{\text{RMS}}^{N}\) value was smaller than the corresponding Kc for each of the 42 datasets, and 80.95% of the 42 datasets had \(\:{{\upgamma\:}}_{\text{RMS}}^{T}\) values less than the corresponding Kc. For the NHM, all of the \(\:{{\upgamma\:}}_{\text{RMS}}^{N}\) values were less than the corresponding Kc, but only 38.10% of the 42 datasets had \(\:{{\upgamma\:}}_{\text{RMS}}^{T}\) values less than the corresponding Kc. For the MRHM, all \(\:{{\upgamma\:}}_{\text{RMS}}^{N}\) values were less than the corresponding Kc, and 57.14% of 42 \(\:{{\upgamma\:}}_{\text{RMS}}^{T}\) values were less than the corresponding Kc (Fig. 3).

These results indicated that EM exhibited the best linear approximation among the four photosynthetic light response models, while the NHM showed the worst performance in linear approximation. Notably, all four models demonstrated exceptional adherence to the planar assumption, as evidenced by all \(\:{{\upgamma\:}}_{\text{RMS}}^{N}\) values being less than the corresponding Kc. Regarding the uniform coordinate assumption, the EM emerged as the most satisfactory among all photosynthetic light response models, as all of its \(\:{{\upgamma\:}}_{\text{RMS}}^{T}\) values were less than the corresponding Kc.

Fig. 3

Assessment of nonlinear behavior of the four photosynthetic light response models (EM, RHM, NHM, and MRHM) at the global level for 42 datasets. \(\:{{\upgamma\:}}_{\text{RMS}}^{N}\) represents root-mean-square intrinsic curvature, \(\:{{\upgamma\:}}_{\text{RMS}}^{T}\) represents root-mean-square parameter-effects curvature, and Kc represents critical curvature. For example, 80.95% indicates that, for RHM, 80.95% of the \(\:{{\upgamma\:}}_{\text{RMS}}^{T}\) values among the 42 datasets are smaller than the corresponding Kc.

When considering the individual parameter-level nonlinear behavior, analyzing the skewness (Sk) of parameters yields valuable insights. The results for the close-to-linear behavior of specific parameters are presented in Table S5, where G indicates good close-to-linear behavior (the absolute value of the corresponding skewness |Sk| ≤ 0.2), M indicates moderate close-to-linear behavior (0.2 < |Sk| ≤ 0.5), and B indicates bad close-to-linear behavior (|Sk| > 0.5). Figure 4 presents the proportions of G, M, and B across the 42 datasets for each parameter within the photosynthetic light response models. The best-behaving model was EM, which had a preponderance of good (G) scores for two out of three parameters, i.e., Amax and Rd, and had no bad (B) scores. For parameter a, about 40% of the 42 datasets showed G scores, and there were no B scores. Each parameter within the remaining three models, namely RHM, NHM, and MRHM, had at least some B scores (except Amax and Rd within RHM), indicating that the bad close-to-linear behavior was spread out among the majority of parameters. For RHM, the proportion of B scores for parameter a was 23.81%. For NHM, 90.48, 73.81, 16.67, and 4.76% of the 42 datasets had B scores for the parameters θ, a, Amax, and Rd, respectively. For MRHM, 16.67, 2.38, 33.33, and 16.67% of the 42 datasets had B scores for the parameters a, β, γ, and Rd, respectively (Fig. 4).

Fig. 4

Assessment of nonlinear behavior of individual parameters of the four photosynthetic light response models (A EM; B RHM; C NHM; D MRHM) for 42 datasets. G indicates good close-to-linear behavior (the absolute value of corresponding skewness |Sk| ≤ 0.2); M indicates a moderate close-to-linear behavior (0.2 < |Sk| ≤ 0.5); and B indicates bad close-to-linear behavior (|Sk| > 0.5). The numeric values in percentage represent the proportions of G (or M, B) among 42 datasets for each parameter. For example, 40.48% in panel (A) indicates that the proportion of G scores for parameter a in EM is 40.48% among the 42 datasets.

Discussion

When evaluating nonlinear regression models, goodness of fit is a commonly used criterion. However, it may prove inadequate when assessing complex nonlinear models with a multitude of parameters26. This becomes especially pertinent when comparing two models with closely matched levels of goodness of fit. In such cases, it becomes imperative to scrutinize other aspects of the models to determine the optimal choice, including considerations of model structure complexity38, and performance in linear approximation23,26,39. In addition to effectively fitting the data, a robust nonlinear model should ensure that each parameter exhibits close-to-linear behavior, thus guaranteeing that their least squares estimators are nearly unbiased, normally distributed, and asymptotically achieving minimum variance. Therefore, the curvature measures of nonlinearity hold paramount importance in the evaluation of models within nonlinear regression.

The four nonlinear models examined in this study exhibited comparable levels of goodness of fit overall, with RHM providing a marginally worse fit than the other three models (see Fig. 2). EM had the best linear approximation performance, as all of its \(\gamma_{\text{RMS}}^{N}\) values and \(\gamma_{\text{RMS}}^{T}\) values were smaller than the corresponding Kc (Fig. 3). In addition, two of the three EM parameters, Amax and Rd, were deemed close-to-linear based on their low skewness. All of the datasets had ‘not bad’ close-to-linear behavior (G or M scores) for the remaining parameter a (Fig. 4A). These results indicate that EM outperformed the other photosynthetic light response models in nonlinear regression in terms of the curvature measures of nonlinearity.

In general, an appropriate nonlinear reparameterization can mitigate the effects of a large parameter-effects curvature. Such a reparameterization can also improve the close-to-linear behavior of the model parameters23,26. We use RHM as an illustrative example. Notably, the uniform coordinate assumption of RHM cannot be accepted in approximately 20% of the datasets (Fig. 3), and the parameter a of RHM exhibited far-from-linear behavior in more than 20% of the datasets (Fig. 4B). To reduce the nonlinearity of the equation, we reparameterized the parameter a of RHM by substituting a with its exponential, that is,

$$P=\frac{\exp (a)\,I\,A_{\max}}{\exp (a)\,I+A_{\max}} - R_d.$$
(28)

It is worth noting that the exponential transformation guarantees that exp(a) is positive for any real value of the parameter a in Eq. (28). This is crucial because the original parameter a in RHM (Eq. 2) must be positive to represent the initial quantum efficiency. The transformation also helps to stabilize the optimization by reducing the likelihood that the estimate falls into biologically meaningless negative values during model fitting. The parameter estimates therefore remain within a biologically plausible range, which is critical for an accurate representation of the photosynthetic process.
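The relationship between the two parameterizations can be illustrated with a short Python sketch. It assumes that the original RHM (Eq. 2) has the form obtained from Eq. (28) by replacing exp(a) with a; the function names `rhm` and `repar_rhm` and all numerical values are hypothetical and chosen only for illustration.

```python
import math


def rhm(irradiance, a, a_max, r_d):
    """Assumed original RHM form: P = a*I*Amax / (a*I + Amax) - Rd."""
    return a * irradiance * a_max / (a * irradiance + a_max) - r_d


def repar_rhm(irradiance, a, a_max, r_d):
    """Repar-RHM (Eq. 28): the initial slope enters as exp(a)."""
    slope = math.exp(a)  # strictly positive for every real a
    return slope * irradiance * a_max / (slope * irradiance + a_max) - r_d


# Hypothetical parameter values, not estimates from the study.
initial_slope, a_max, r_d = 0.05, 20.0, 1.5
a_new = math.log(initial_slope)  # Repar-RHM parameter matching the same curve

for irradiance in (0.0, 100.0, 500.0, 2000.0):
    p_old = rhm(irradiance, initial_slope, a_max, r_d)
    p_new = repar_rhm(irradiance, a_new, a_max, r_d)
    # Both parameterizations describe the same curve.
    assert math.isclose(p_old, p_new)

# Even a negative value of the new parameter gives a positive initial slope.
print(math.exp(-3.0) > 0)
```

Because the two forms trace the same curve, the fitted values are unchanged; only the coordinate in which a is estimated differs, and that coordinate is what the parameter-effects curvature and the skewness of the estimator depend on.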

Equation (28) was denoted as Repar-RHM, and the empirical data were then fitted to it. The results revealed a marked improvement in the parameter-effects curvature and in the close-to-linear behavior of parameter a in Repar-RHM compared with parameter a in RHM. All of the \(\gamma_{\text{RMS}}^{T}\) values for the 42 datasets were smaller than the corresponding Kc (Table S6), and all of the 42 datasets had good (G) scores for parameter a in Repar-RHM (Fig. 5 and Table S7). Meanwhile, the intrinsic nonlinearity, as measured by the intrinsic curvature, and the nonlinear behavior of the other two parameters, Amax and Rd, remained unchanged in Repar-RHM compared with RHM (Tables S2, S5, S6, and S7), so these two parameters continued to exhibit desirable close-to-linear behavior. These results indicate that, after the reparameterization, the uniform coordinate assumption could be accepted for all of the datasets and the parameter a exhibited good close-to-linear behavior across all of the datasets.

However, identifying a suitable reparameterization for a given model remains a significant challenge, primarily because a nonlinear reparameterization can add structural complexity. Exploring effective reparameterization methods that reduce model nonlinearity could prove to be a valuable direction for future research, potentially leading to more accurate descriptions of the relationship between light intensity and the rate of photosynthesis.

In summary, four nonlinear models were fitted to photosynthetic light response curves and compared comprehensively based on goodness of fit and relative curvature measures of nonlinearity. This combined approach provides insights into the criteria that should guide model selection in nonlinear regression when describing the relationship between light intensity and the rate of photosynthesis. The study opens up several avenues for future research. Firstly, the usefulness of relative curvature measures of nonlinearity for model assessment suggests that this approach could be applied to other nonlinear regression problems in ecological modeling. Secondly, the focus on optimizing the fitting of photosynthetic light response curves can lead to more precise modeling of plant responses to environmental changes, which is increasingly important in the context of climate change and its impact on agriculture and ecosystems. One key limitation is the generalizability of the findings. The models and methods were tested on specific datasets and conditions, and their performance might vary when applied to different species, environmental conditions, or light regimes not covered in this study. Future investigations could apply these methods to a broader range of species and environmental conditions and integrate these models into larger ecological and agricultural prediction systems.

Fig. 5

Assessment of the nonlinear behavior of individual parameters of the re-parameterized RHM (Repar-RHM) for 42 datasets. G indicates good close-to-linear behavior (the absolute value of the corresponding skewness |Sk| ≤ 0.2); M indicates moderate close-to-linear behavior (0.2 < |Sk| ≤ 0.5); and B indicates bad close-to-linear behavior (|Sk| > 0.5). The percentages are the proportions of G, M, or B scores among the 42 datasets for each parameter. For example, 100% means that parameter a in Repar-RHM received a G score in all 42 datasets.