Introduction

Many current medical treatments and interventions have been developed and tested in clinical trials involving cohorts of individuals. Although the inter-individual variability of subjects included in clinical trials is typically well characterized, prescription of treatment often assumes that patients receiving it respond similarly to the average response observed in these trials1. However, this assumption inherently neglects physiological and environmental differences between people, such as genetic variants or acquired exposures, that may mediate disease risk or response to treatment. Furthermore, the observed inter-individual variation may not be due solely to natural variability, but may be indicative of disease progression. Specifically, in the development of type 2 diabetes mellitus (T2DM), the progressive decline of β-cell function, responsible for insulin release in response to glucose, is a key characteristic of disease development2. Furthermore, residual β-cell function is indicative of treatment response and can therefore aid in treatment selection3,4.

In recent years, personalized medicine, where treatments are tailored based on specific characteristics, such as genetics5 or body composition6, has emerged as a promising approach to improve health outcomes. In particular, in the field of oncology, machine learning is increasingly being used to map large datasets to clinical outputs, to identify improved treatment strategies based on genetic data7 or machine learning-assisted analysis of tumor biopsies8. Research suggests that these more personalized treatment regimens have the potential to improve the long-term prognosis of patients9. However, the direct application of machine learning methods that have shown success in precision oncology to other medical disciplines has been hampered by smaller sample sizes and a lack of publicly available clinical trial data10,11.

The advantage of the purely data-driven approaches that have been used successfully in precision oncology is that they allow flexible incorporation of various types and sources of data for accurate model output. However, a downside of this flexibility is that the volume of data required to train machine learning models is relatively large12. The comparatively small sample sizes collected in human clinical trials have greatly hampered the widespread deployment of machine learning to biomedical problems10. In addition, machine learning models can lack interpretability, particularly in the case of larger neural networks13. Interpretability can be retained by using inherently interpretable models, such as ordinary logistic regression, for structured data with meaningful features.

Alternatively, in cases where individual data is limited but biological knowledge is abundant, systems of differential equations are constructed to describe biological processes. These physiologically-based mathematical models (PBMMs) are powerful tools to disentangle the complexity of the physiological basis underlying medical measurements14,15. Previous research has demonstrated that the estimation of model parameters in PBMMs from individual measurements can yield an accurate and interpretable explanation of the inter-individual biological variation16. While PBMMs are beneficial for studying biological systems, building and validating accurate PBMMs requires a profound understanding of the underlying physiology and can be time-consuming. Consequently, these models typically have a limited scope15,17. Additionally, as PBMMs are constructed manually, unwanted bias can be introduced, especially when complex nonlinear biological behaviours may be approximated with comparatively simple terms.

A promising emerging area of research focuses on the combination of highly plastic machine learning approaches with physiological knowledge in the form of mechanistic models to produce a hybrid model that can be trained with fewer learning examples. In recent years, multiple hybrid frameworks have been proposed. Physics-informed neural networks (PINNs)18 are an example, where the loss function of a neural network is supplemented with a set of equations to ensure that the neural network not only fits the data well, but also adheres to known physical laws. In this work, we use the Universal Differential Equations (UDE) framework19, where the known components of a biological system are described by parameterized differential equations and a neural network is incorporated into the equations to account for the unknown components. These UDE models have been shown to be applicable to various biological systems, including inferring the glucose appearance of a meal20 and a STAT5 dimerization model21. Furthermore, the resulting trained neural networks can be reduced to analytical expressions using a technique called symbolic regression19.

The application of UDE models in biomedical contexts such as learning the average rate of glucose appearance from a meal has been explored. However, current conventional training of UDEs cannot directly accommodate inter-individual variability, which is ubiquitous in biomedical data. Although it is possible to train the model on the average data of a population, as has been done in the past, such models are not expected to generalize well to individual data. Alternatively, it is possible to train a separate model for each individual. However, this approach has drawbacks. First, often only limited measurements are available for each individual, making estimation of neural network parameters for individuals highly sensitive to measurement noise and increasing the risk of overfitting. Furthermore, the black-box nature of neural networks complicates the comparison of trained neural networks between individuals.

In this work, we propose an extension of the UDE framework, termed conditional UDEs (cUDEs), where trainable person-specific parameters are added as input to the neural network to account for between-subject variability, and the weights of the neural network are assumed to be common across the entire population. In this way, variability between subjects is forced into these conditional input parameters, while the neural network parameters learn the global behaviour of the system.

Here, we applied a cUDE model to characterize the insulin production capacity of pancreatic β-cells in individuals with normal glucose tolerance, impaired glucose tolerance and T2DM. Our results demonstrate that the conditional universal differential equation framework derives an accurate representation of the inter-individual variation in c-peptide production. Furthermore, we show that this subject-specific conditioning parameter is strongly correlated with the gold standard hyperglycemic clamp measure of insulin production capacity. We then derived an analytical expression from the conditionally trained network using symbolic regression and showed that the learned function not only described c-peptide production for people with normal glucose tolerance, impaired glucose tolerance, and T2DM, but also generalized to describe individual c-peptide production in an independent human trial.

Results

Conventionally trained UDE does not generalize across population

To investigate the ability of a conventionally trained UDE to generalize to meal responses in a population of individuals, a universal differential equation of c-peptide production and kinetics was initially trained on the average meal response. The UDE model is based on a two-compartment ordinary differential equation model describing c-peptide kinetics in the plasma and interstitial space by van Cauter et al.22. Here, the van Cauter model was extended by introducing a fully connected neural network to represent c-peptide production in the pancreas (Fig. 1a).

Fig. 1: Modelling c-peptide production with a conventionally trained universal differential equation.
figure 1

a Schematic overview of the van Cauter22 model of c-peptide kinetics, depicting the location of the neural network that describes the production of c-peptide (P(t)) depending on plasma glucose (Gpl). The blue circles indicate the c-peptide state variables (Cpl and Cint for the plasma and interstitial fluid compartments respectively). The green circle depicts the plasma glucose level. Solid arrows represent fluxes, and dashed arrows indicate stimulation. Each flux arrow is labeled with its respective kinetic parameter. b Mean squared error (MSE) distributions of the UDE model, trained on average response data, on each individual in the used dataset, split by train and test set, and grouped by glucose tolerance status. c–e Mean (circles) and standard deviations (error bars) of the data, and UDE model predictions (solid lines) given the mean data per glucose tolerance condition.

The change in plasma glucose concentration relative to the fasting value at time t is provided as input to the neural network, defined as

$$G(t)={G}^{{\rm{pl}}}(t)-{G}^{{\rm{pl}}}(0)$$

where Gpl(t) is the plasma glucose value at time t. The output of the neural network is the rate of c-peptide production P(t).
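The structure described above can be sketched in code. The following is a minimal illustration of a two-compartment c-peptide model with a pluggable production term, integrated with a forward-Euler scheme; the rate constants and the placeholder production function are hypothetical and are not the fitted values from this work.

```python
import numpy as np

def simulate_cpeptide(glucose, t, k01=0.06, k12=0.05, k21=0.05, production=None):
    """Forward-Euler sketch of a two-compartment c-peptide model.

    The plasma compartment (cpl) receives the production term P(t) and
    loses c-peptide by elimination (k01) and by exchange with the
    interstitial compartment (k21 out, k12 back). Parameter values are
    illustrative only.
    """
    if production is None:
        # placeholder production term driven by glucose above fasting
        production = lambda g, g0: max(g - g0, 0.0) * 0.1
    dt = t[1] - t[0]
    cpl = np.zeros_like(t, dtype=float)
    cint = np.zeros_like(t, dtype=float)
    for i in range(len(t) - 1):
        p = production(glucose[i], glucose[0])
        dcpl = p - (k01 + k21) * cpl[i] + k12 * cint[i]
        dcint = k21 * cpl[i] - k12 * cint[i]
        cpl[i + 1] = cpl[i] + dt * dcpl
        cint[i + 1] = cint[i] + dt * dcint
    return cpl, cint
```

In the UDE, the `production` argument is where the neural network is substituted, while the compartmental exchange terms remain fixed mechanistic knowledge.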

To train the model, demographic data and plasma glucose and c-peptide trajectories were used from 117 people from a study by Okuno et al.23, labeled the Ohashi dataset, as the data was retrieved from a paper by Ohashi et al.24. The dataset encompassed three distinct subgroups: people with normal glucose tolerance (NGT), impaired glucose tolerance (IGT), and type 2 diabetes mellitus (T2DM) (Supplementary Fig. 1). For estimating neural network weights and biases, average c-peptide measurements were used from a training set containing 70% of the individuals. The weights and biases from the neural network trained on the average response were then used in combination with the glucose values and kinetic parameters to predict postprandial c-peptide values. The simulation errors for the individuals in the train and test sets are shown in Fig. 1b, showing comparable performance for the normal glucose tolerance and impaired glucose tolerance groups, but a strong reduction in performance in the T2DM group.

Figure 1c–e shows the resulting UDE fits, using the average data from each glucose tolerance condition as input. From this figure, we can observe that the UDE model generally fits the mean data within one standard deviation, with the exception of the final two time points in the IGT group. However, the model underestimates c-peptide production in the NGT and IGT groups, while overestimating c-peptide production in the T2DM group. This indicates the inability of the single universal differential equation trained on the average response data to account for the progressive decline in β-cell function observed in the progression from NGT towards T2DM.

Conditional universal differential equation model of c-peptide kinetics

To capture the inter-individual variability, an additional input parameter was added to the neural network, resulting in a conditional UDE model (cUDE). Consequently, the neural network in this cUDE model has two inputs. The first input of the cUDE network is the relative plasma glucose concentration at time t, as in the conventional UDE. The second input is a trainable parameter βi that accounts for the variability between individuals in the production of c-peptide. The output of the neural network (P(t)) is the c-peptide production at time t (Fig. 2a).
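The conditioning mechanism can be sketched as a small feed-forward network whose weights are shared across the population while the scalar β_i differs per person. The layer sizes, random weights, and softplus output below are hypothetical choices for illustration, not the trained network of this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared parameters: in a cUDE these are estimated once for the whole
# population (sizes here are hypothetical).
W1 = rng.normal(size=(8, 2)) * 0.5   # input layer sees [ΔG(t), β_i]
b1 = np.zeros(8)
W2 = rng.normal(size=(1, 8)) * 0.5
b2 = np.zeros(1)

def production(delta_g, beta_i):
    """Conditional network: the same weights serve every individual;
    only the conditioning input β_i differs between subjects."""
    x = np.array([delta_g, beta_i])
    h = np.tanh(W1 @ x + b1)
    z = (W2 @ h + b2)[0]
    # softplus keeps the production rate positive
    return float(np.log1p(np.exp(z)))
```

Two individuals with different β values then yield different production rates for the same glucose excursion, which is exactly where the inter-individual variability is forced to reside.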

Fig. 2: Structure and training procedure of the conditional universal differential equation model used to infer postprandial c-peptide.
figure 2

a Schematic of the conditional UDE model, and the neural network used to estimate personalized c-peptide production. The time-dependent plasma glucose value and a person-specific parameter controlling for inter-individual variability in dose response are inputs to the neural network. The weights and biases of the neural network are estimated population-wide. b Illustration of the training procedure. The dataset is split into a train (49%), validation (21%), and test set (30%). In the train set, both population and individual parameters are estimated. In the validation and test sets, population parameters in the neural network are fixed to the trained values and only the person-specific conditioning parameters are estimated from data.

Figure 2b depicts the process of training the conditional UDE. Model selection and training are performed on a subset comprising 70% of the dataset, labeled the ‘train set’. The weights and biases of the neural network for the whole population are trained together with the individual parameters of the train set, obtaining 25 candidate models from 25 initializations of the optimization. A validation set is used, where only the individual parameters are estimated, to select the best-performing model from these 25 candidate models. The model is then evaluated on a separate test set, where, in the same way as in the validation set, the individual parameters are estimated, while the neural network parameters are kept constant.

cUDE derives generalizable c-peptide production across population

Figure 3a–c visualizes the cUDE simulation of plasma c-peptide for the individuals in the test set with the median error value for each glucose tolerance condition, showing a good concordance with the measured c-peptide data. This figure demonstrates that the same neural network weights and biases, in combination with a subject-specific conditional parameter, can simulate glucose-driven c-peptide production while accounting for a large part of the inter-individual variability of the c-peptide production. The confidence regions for the model simulations are computed from the likelihood profiles, shown in Supplementary Fig. 6. All data points for the individuals are contained within these confidence regions, with the exception of the final time point in the NGT case. Furthermore, the confidence regions for the NGT and IGT individuals are both larger than that for the T2DM individual. All test fits are shown in Supplementary Fig. 4. The empirical distributions of the conditional parameters were computed for each glucose tolerance condition and are included in Supplementary Fig. 5a, while Supplementary Fig. 5b–d contains simulated c-peptide curves for each glucose tolerance condition, showing that these curves closely match the c-peptide data of each glucose tolerance condition.

Fig. 3: Model fits of the conditional UDE (cUDE) model on the test data.
figure 3

a–c Model fit of the individuals with median error value within each glucose tolerance group. Visualization of all model fits for all individuals in the test set can be found in Supplementary Fig. 4. Circles indicate the measured c-peptide levels from each individual and solid lines represent the model fits. Dotted lines represent the 95% confidence intervals on the model fits based on the likelihood profiles, defined according to ref. 45. d Distribution of mean squared error values for model fits for all subjects in the test subset, separated by glucose tolerance status group.

Furthermore, the distribution of model error values across the three glucose tolerance groups is shown in Fig. 3d. Compared to the conventional UDE (Fig. 1b), the distributions are narrower, especially for the T2DM group. The resulting model fits and error distributions are comparable to the model fits and errors in the train set, which can be found in Supplementary Fig. 3. In addition, training the cUDE model on various fractions of the train set showed that a train set size of around 29 individuals is already sufficient for training a model, with a comparable mean test error to the current cUDE model (Supplementary Fig. 7).

In Supplementary Note 2, we demonstrate the ability of the cUDE architecture to learn a generalizable model from data with large systematic heterogeneity by applying the approach to a second example system simulated using a different mathematical model. As with the c-peptide model, the conditional parameter showed a strong correlation with the prescribed parameter values used to simulate the data (Supplementary Note 2).

Conditional training parameter captures inter-individual variation

To investigate the interpretability of the conditional parameter, personalized conditional parameters were compared with subject characteristics, including BMI, age, body weight, and clamp-based measurements of insulin sensitivity and insulin production capacity.

In Fig. 4, the strongest Spearman correlation of −0.805 is observed with the first phase of insulin production measured using the hyperglycemic clamp, the gold standard measure of insulin production (Fig. 4a). A moderate correlation is seen with age (b), while the insulin sensitivity index, measured using a hyperinsulinemic-euglycemic clamp (c), has the lowest correlation of the three.
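Since Spearman correlation is simply the Pearson correlation of the ranks, it can be computed directly; the sketch below assumes tie-free continuous measurements (adequate for clamp indices) and uses synthetic data, not the study values.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Ties are not handled here, which is adequate for continuous
    measurements such as clamp indices or BMI.
    """
    # double argsort turns values into 0-based ranks
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))
```

A strictly decreasing relationship, such as that reported here between the conditional parameter and first-phase insulin production, yields a coefficient approaching −1 regardless of the shape of the monotone trend.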

Fig. 4: Spearman correlation of conditional parameter βi with independent phenotypic measurements for the individuals.
figure 4

a Spearman correlation of the conditional parameter with the first-phase insulin production during a hyperglycemic clamp (Supplementary Fig. 2). b Correlation of the conditional parameter with age in years. c Correlation of the conditional parameter with the insulin sensitivity index measured from a hyperinsulinemic-euglycemic clamp test.

The correlations with body weight and body mass index are low, while the correlations with other measures of insulin production are high (Supplementary Fig. 10).

Symbolic regression derives a generalizable analytical expression of c-peptide production

As the neural network model remains a black box model, we also sought to replace the neural network with a more interpretable analytical expression. The symbolic regression approach proposed by Cranmer et al.25 was applied to data sampled from the trained neural network. Subsequently, the derived analytical expression was simplified manually, reducing several fixed constants to a single term (see Supplementary Note 1 for a detailed derivation). The resulting expression resembles Michaelis-Menten kinetics and is given as

$$P({G}^{{\rm{pl}}}(t)| {k}_{M})=\left\{\begin{array}{ll}1.78\cdot \frac{{G}^{{\rm{pl}}}(t)-{G}^{{\rm{pl}}}(0)}{{k}_{M}+{G}^{{\rm{pl}}}(t)-{G}^{{\rm{pl}}}(0)}\qquad{\rm{if}}\,{G}^{{\rm{pl}}}(t)\ge {G}^{{\rm{pl}}}(0)\\ 0\qquad\qquad\qquad\qquad\qquad{\rm{otherwise}}\end{array}\right.$$
(1)

Here, kM is a trainable parameter, qualitatively equivalent to the βi parameter learned by the cUDE (see Supplementary Note 1 for more details on the numerical relation between βi and kM). The dose-response curves for the neural network and the learned expression are depicted in Supplementary Fig. 12.

To evaluate the performance of this learned analytical expression for c-peptide production, the neural network of the cUDE was replaced with equation (1). The fully analytical model was then fit to the measured c-peptide data for all individuals by estimating a value for kM.
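The production term of equation (1) and the per-individual estimation of kM can be sketched as follows; the grid search is an illustrative stand-in for the gradient-based estimation used in this work, and the synthetic data in the usage example are not study measurements.

```python
import numpy as np

VMAX = 1.78  # saturation level from the symbolic-regression result, Eq. (1)

def production(g, g0, km):
    """Michaelis-Menten production term of Eq. (1).

    Production is zero whenever glucose is at or below the fasting
    value g0, and saturates at VMAX for large excursions.
    """
    dg = g - g0
    return np.where(dg >= 0.0, VMAX * dg / (km + dg), 0.0)

def fit_km(g, g0, observed, grid=np.linspace(0.1, 20.0, 400)):
    """Estimate a personal k_M by least squares over a coarse grid."""
    errors = [np.mean((production(g, g0, km) - observed) ** 2) for km in grid]
    return float(grid[int(np.argmin(errors))])
```

Because kM appears only in the denominator, a larger kM flattens the dose-response curve, consistent with the reduced insulin production capacity seen in the T2DM group.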

Figure 5a–c visualizes the model fits of the analytical model for the individuals corresponding to the median error values per glucose tolerance group. As seen with the cUDE model, the model derived via symbolic regression agrees well with the data across all three groups. The distribution of model fit errors per group (Fig. 5d) also shows comparable distributions to the model fit errors obtained for the cUDE model. Furthermore, the correlations of the estimated kM value with insulin production, age, and insulin sensitivity, as shown in Fig. 5e–g, are similar to the previous results obtained with the cUDE model and again display a high correlation with insulin production as measured with the hyperglycaemic clamp. Profile likelihood analysis was performed on the parameter kM for each individual to test whether it was identifiable from the data (Supplementary Fig. 6).

Fig. 5: Fit of the analytical model derived using symbolic regression to measured data.
figure 5

a–c Model fit for individuals with the median error value for each glucose tolerance condition. Model fits are shown with the solid line, measured c-peptide values are indicated with the circles. The model simulations for the 95% confidence intervals on the parameters are shown in dashed lines. d Model error value distributions split by glucose tolerance condition. e–g Correlation of personalized kM estimate with e the first-phase insulin production measured during the hyperglycaemic clamp, f age (years), and g the insulin sensitivity index measured using a hyperinsulinemic-euglycemic clamp test.

Finally, to demonstrate generalizability of the model derived from symbolic regression, the analytical model was fitted to glucose and c-peptide measurements collected during an OGTT from a previously unseen dataset.

The model fits for the individuals at the 25th, 50th and 75th percentiles of the mean squared error are shown in Fig. 6a–c respectively. In all three cases, the curve shows high concordance with the data. Moreover, despite the original cUDE model being trained on data up to 120 min, the learned analytical term can also reliably simulate plasma concentrations of c-peptide up to 240 min postprandially. The distribution of model errors is shown in Fig. 6d, indicating that high-quality model fits could be obtained for a large part of the twenty individuals.

Fig. 6: Model fits and errors for the c-peptide model derived using symbolic regression on the Fujita dataset43.
figure 6

a–c Model fits for the individuals at the 25th, 50th and 75th percentiles of the model error distribution respectively. Measured c-peptide values are indicated with the black circles, the simulated model fits are shown with the solid lines. Dashed lines indicate simulations with the 95% confidence intervals on the parameters. d Raincloud plot of the model errors for the entire Fujita dataset.

Discussion

In this work, we introduced conditional universal differential equations (cUDEs) as an extension of the universal differential equation framework that facilitates simultaneous data-driven model discovery and model personalization. We then applied this technique to uncover a novel index of inter-individual variation in c-peptide, and by extension, insulin production in a human population with diverse glucose tolerance status.

Our results show that cUDE models can accurately estimate a missing c-peptide production term from the data. More importantly, by accommodating the large inter-individual variation in plasma glucose and c-peptide levels, the cUDE learns a model that generalizes across individuals with different glucose tolerance status. In contrast, the classical UDE was unable to capture differences in β-cell capacity that are indicative of glucose tolerance status. Investigating individual model fits, the trained cUDE was unable to describe the c-peptide measurements of a single individual (Supplementary Fig. 4, individual 10) from the test set. However, this individual showed a strong discordance between the measured glucose and c-peptide data, with measured plasma glucose only increasing 60 minutes after ingestion of the glucose solution. This unexpected plasma glucose response may potentially be explained by the effect of incretin hormones such as GLP-1 or GIP. These incretin hormones are produced in response to an increase in glucose level in the intestine and activate insulin and c-peptide production26,27. In this study, these hormones were not measured and are a potential additional source of inter-individual variability in c-peptide production. Should time series of incretin hormones become available in the future, the cUDE framework could be reapplied without strong modifications to further learn the role of these incretin hormones in c-peptide production. However, in the current model, where only glucose is provided as the stimulus for c-peptide release, the majority of model fits showed a strong agreement with plasma measurements, suggesting that glucose is the primary driver of c-peptide production28.

Furthermore, by constraining the weights and biases of the neural network to be the same for the entire population, the free conditional parameters capture the inter-individual variation, which enables direct comparison between individuals. By comparing the conditional parameters resulting from the c-peptide model with a range of independent measures of metabolic health, we have shown that the conditional parameter strongly correlates with metrics of insulin secretion measured using the hyperglycemic clamp method, the current gold standard measure of insulin production capacity. Furthermore, the lack of a strong correlation with the insulin sensitivity index indicates that the conditional parameter specifically targets the c-peptide and insulin production capacity, and not just a general deterioration in metabolic resilience. The moderate correlation observed with age may have two causes. Firstly, the conditional parameter has been shown to describe the progressive decline in β-cell function, with higher values in people with T2DM. The age distribution was different between the glucose tolerance conditions, with the ages of the T2DM group being significantly different from those of the NGT individuals (Mann-Whitney U test, p < 10−10). Secondly, part of this correlation may also originate from the known natural decline of β-cell function with aging29. We have also trained a cUDE model including age as an additional input to investigate the effect of correcting for age on the correlation with the first-phase clamp indices and the curve-fitting performance. However, the correlation of the conditional parameter with the clamp index reduced slightly, and no notable improvement was observed in curve-fitting performance (Supplementary Fig. 11). We suspect that the bias of the dataset concerning the age of individuals in each subgroup may influence the ability of the neural network to correctly estimate the true age effect on insulin production capacity, and a larger dataset with a better representation of this natural age effect would be required to better separate the age effect from the diabetes progression effect. While the inclusion of age as an additional covariate did not improve the results in the model used in this work, this feature of cUDEs could also be used in different applications, for example to introduce relevant phenotypic characteristics, such as sex, smoking status, or family history of disease, into UDE models. This approach would produce a hybrid model that can integrate these features into a mechanistic model to improve the prediction of disease risk or treatment response, as proposed in the case of ventricular tachycardia in ref. 30.

In order to learn a generalizable model of c-peptide production, a sufficiently heterogeneous dataset is required. Here, we used data from the Ohashi dataset consisting of individuals with normal glucose tolerance, impaired glucose tolerance and T2DM. However, it is not essential to have very large datasets. In Supplementary Fig. 7 we show that using data from 29 individuals did not strongly increase the test error of the cUDE model, provided that the proportion of NGT, IGT and T2DM was maintained. Although the amount of data required also depends on the complexity of the model to be learned, and thereby the neural network size and the number of inputs and outputs, the cUDE is relatively data-efficient compared to fully data-driven methods, which typically require thousands of samples31,32. Additionally, we have shown the applicability of the cUDE model in a simulated example with just 37 individuals, showing that the conditional parameter strongly associates with the variability introduced in the simulated data (Supplementary Note 2).

Furthermore, we show that the interpretability of the cUDE model can be further increased through the use of symbolic regression. For symbolic regression, we have used a genetic algorithm, which is non-deterministic and may produce variable results upon repeated runs. This can be mitigated by letting the algorithm run through sufficient iterations, which will eventually lead to model convergence. However, this required number of iterations (25,000 in this work) is problem-dependent, and for larger problems, more iterations are required, which should be taken into account when applying symbolic regression based on genetic algorithms. In this work, the use of a limited number of allowed operators, based on knowledge of previously built ODE models in systems biology, greatly reduced the search space, allowing for the discovery of an interpretable model. However, some detail in the dose-response relationship is lost when comparing the analytical expression to the neural network (Supplementary Fig. 12). Despite this loss of detail, the derived analytical equation demonstrates generalizability beyond the original dataset, as shown by fitting the derived analytical model to normoglycemic individuals from a previously unseen dataset. Furthermore, we demonstrate that the derived model, originally trained on 120 minutes of data, can successfully simulate model behaviour over 240 minutes.

Several models of insulin and c-peptide production in response to glucose have been proposed in the literature, ranging from models such as that of Maas et al.33, which uses a complex PID controller, or the detailed model of exocytosis used by Ha et al.34, to the simple linear mass-action kinetics presented in Hovorka et al.35. The Michaelis-Menten term for c-peptide production derived from data in this study is similar to the insulin production term used in the model by Topp et al.36, which uses a Hill function with a Hill coefficient of 2. This acts as a form of validation that the model we derived from symbolic regression is a physiologically plausible model of c-peptide production.

A limitation of this study is that both the Ohashi and Fujita datasets contain only people of Japanese descent. Although previous work has provided evidence for similar β-cell responsiveness across all glucose tolerance states37, it is necessary to further validate the trained model on more diverse populations. In addition, the derived model has only been tested on OGTT responses. Especially considering the effect of amino acids on insulin and c-peptide production38, the model may not be able to accurately describe responses to more complex meals. However, despite these limitations in the learned c-peptide model, we demonstrate that the cUDE approach outperforms current UDE approaches in learning a generalizable model that incorporates biologically relevant inter-individual variation.

While we demonstrated that the conditional parameters were identifiable in almost all individuals (109 of 117 individuals, Supplementary Fig. 6), we also demonstrated that the conditional parameters are only identifiable when the neural network parameters are fixed, as can be seen from Supplementary Fig. 8. In this figure, we visualized β against the first phase of the hyperglycemic clamp for all models resulting from the various initializations of the neural network parameters during training. When comparing the conditional parameters trained in multiple initializations of cUDE training, we see that the neural network learns either a positive or negative association with the first-phase clamp index. However, a consistently strong association is derived across models. Furthermore, while a linear relationship between the conditional parameters of two models is not guaranteed, due to the nonlinearity of the neural network, comparing the parameters of two models does result in a high correlation. This high correlation, in combination with the narrow spread of points, suggests an algebraic relationship (Supplementary Fig. 9). This effect, however, does pose a challenge concerning the use of ensemble UDE models for increased robustness39. This challenge can potentially be remedied using dimensionality reduction techniques, such as principal component analysis, to align common patterns within conditional parameters, but this requires further investigation.

Furthermore, if multiple conditional parameters were used, the nonlinearity of the neural network could cause them to become correlated and mutually unidentifiable. In that case, regularization could be applied to penalize correlations between the conditional parameters and thereby encourage orthogonality. However, assessment of identifiability would still only be possible after fixing the neural network weights and biases.

In our current training regimen, we train the weights and biases of the neural network on the whole training dataset, while the conditional parameters are trained independently for each individual using a maximum likelihood approach. Nonlinear mixed effects (NLME) modelling is an alternative approach to model parameterization that simultaneously accounts for both inter- and intra-individual variability40. By representing inter-individual variability through random effects, NLME models enable scalable estimation via the population likelihood, integrating out individual-level parameters. Recent advances, such as neural network-based NLME extensions, incorporate random effects as neural inputs to capture population heterogeneity, typically under the assumption of normally distributed effects41. We show that NLME-based training of the cUDE model is equally possible and yields similar results to the original approach taken in this work. While the correlation with the hyperglycemic clamp remains strong when estimating the parameters using an NLME structure, the accuracy of the model fit is reduced in some individuals of the T2DM group, as their parameters regress towards the population mean (Supplementary Note 3). Due to this reduced accuracy, we used a traditional frequentist approach for estimating parameters; however, depending on the research question being addressed, NLME estimation combined with the cUDE model structure may in some cases provide more useful results.

In conclusion, we present cUDEs as an effective extension to the UDE framework that can be used to learn a generalizable representation of missing dynamics from a heterogeneous dataset. The cUDE works under the main assumption that the dynamic system underlying the data is common to all samples, while only a limited set of parameters is necessary to capture the differences between samples. This setup makes the cUDE especially suited to biological challenges, where inter-individual variability is both ubiquitous and often physiologically relevant. Here, we show that the conditional parameter in the cUDE model for c-peptide is interpretable as a physiologically relevant index, capturing the inter-individual variability in c-peptide production as validated by comparison with the hyperglycemic clamp. Although this study demonstrates the cUDE model in a single application, the cUDE framework is also usable in several other medical disciplines where mathematical models are abundant, such as cardiovascular medicine, neurology, and infectious diseases. The ability of the cUDE model to generalize, to capture relevant and interpretable inter-individual variation, and to be trained with a limited number of learning examples demonstrates its potential to support model- and data-driven precision healthcare.

Methods

Ohashi dataset

The Ohashi dataset was obtained from Ohashi et al.24,42, and originally collected by Okuno et al.23. The original study was approved by the ethics committee of the Kobe University Graduate School of Medicine and was registered with the University Hospital Medical Information Network (UMIN000002359). Written informed consent was obtained for all subjects.

As described in ref. 42, 50 subjects with normal glucose tolerance (NGT), 18 subjects with impaired glucose tolerance (IGT), and 53 subjects with type 2 diabetes (T2DM) participated in the study. The characteristics of the subjects for each group are shown in Table 1.

Table 1 Subject characteristics from the Ohashi dataset23,24,42 after exclusion of subjects with missing data, and the Fujita dataset43

All subjects underwent a 75-gram oral glucose tolerance test, as well as a consecutive hyperglycemic and hyperinsulinemic-euglycemic clamp test. Both tests were performed on separate mornings after an overnight fast.

In the 75g-OGTT, following an overnight fast, blood samples were collected before and 30, 60, 90, and 120 minutes after ingestion of the glucose solution. Plasma glucose and serum insulin and c-peptide concentrations were measured in each sample.

Hyperglycemic clamp and hyperinsulinemic-euglycemic clamp tests were performed consecutively. The hyperglycemic clamp began with an intravenous infusion of a glucose bolus of 9622 mg m−2 within 15 minutes, followed by a variable dose of glucose to keep plasma glucose levels at 200 mg dL−1 for 90 minutes. Blood samples were collected before and at 5, 10, 15, 60, 75, and 90 minutes after glucose infusion. In each blood sample, plasma glucose and serum insulin and c-peptide were measured. The hyperinsulinemic-euglycemic clamp test was then performed by intravenous infusion of regular human insulin at 1.46 \({\rm{mU}}\,{\rm{kg}}^{-1}\,{\rm{min}}^{-1}\) to obtain a serum insulin concentration of 600 pmol L−1. Plasma glucose concentration was kept at 90 mg dL−1 by variable glucose infusion for 120 minutes23.

Insulin secretion indices were defined as the incremental area under the insulin concentration curve during the hyperglycemic clamp:

$$S({T}_{1},{T}_{2})=\mathop{\int}\nolimits_{{T}_{1}}^{{T}_{2}}\left(I(t)-I(0)\right){\rm{d}}t$$
(2)

where first-phase insulin secretion is defined as S(0, 10), second-phase secretion as S(10, 90), and total insulin secretion as S(0, 90). The insulin sensitivity index (ISI) is calculated from the hyperinsulinemic-euglycemic clamp by dividing the mean glucose infusion rate measured during the last 30 minutes of the test by the product of the plasma glucose and serum insulin levels at the end of the clamp (t = 120).
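As an illustration, the incremental AUC of equation (2) can be approximated from the sampled clamp time points by trapezoidal integration. The sketch below is in Python (the study itself was implemented in Julia), with hypothetical serum insulin values; only the sampling grid follows the clamp protocol above.

```python
import numpy as np

def incremental_auc(t, conc, t1, t2):
    """S(T1, T2) = integral over [T1, T2] of (I(t) - I(0)) dt,
    approximated by trapezoidal integration on the sampled time points."""
    t = np.asarray(t, dtype=float)
    inc = np.asarray(conc, dtype=float) - conc[0]  # increment above basal I(0)
    mask = (t >= t1) & (t <= t2)
    ts, ys = t[mask], inc[mask]
    return float(np.sum((ys[1:] + ys[:-1]) * np.diff(ts) / 2.0))

# Hyperglycemic clamp sampling grid (min) and hypothetical insulin values
t = [0, 5, 10, 15, 60, 75, 90]
ins = [40.0, 120.0, 160.0, 150.0, 140.0, 135.0, 130.0]

first_phase = incremental_auc(t, ins, 0, 10)    # S(0, 10)
second_phase = incremental_auc(t, ins, 10, 90)  # S(10, 90)
total = incremental_auc(t, ins, 0, 90)          # S(0, 90)
```

Because t = 10 is a grid point, the trapezoidal rule makes the total secretion exactly the sum of the two phases.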

Fujita dataset

The Fujita dataset was obtained from Fujita et al.43. Written informed consent was obtained for all subjects.

As described in ref. 43, 20 subjects with normal glucose tolerance (NGT) participated in the study. Subject characteristics are shown in Table 1. All subjects underwent a 75g-oral glucose tolerance test (OGTT) in the morning after an overnight fast. Fasting blood samples were drawn twice before oral ingestion of glucose. Further blood samples were obtained at 10, 20, 30, 45, 60, 75, 90, 120, 150, 180, 210, and 240 min after ingestion. Subjects remained at rest throughout the test. Blood samples were rapidly centrifuged43.

Data preprocessing

Measurements of four subjects with missing values in the OGTT experiment were excluded from further analysis. The values reported in Table 1 were calculated on the data after exclusion. Unit conversions were performed to convert glucose from mg dL−1 to mM and c-peptide from ng mL−1 to nM. For the data from Fujita et al.43, no measurements were excluded, and the same unit conversions as for the Ohashi et al. data were applied.
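A minimal sketch of these unit conversions, assuming molar masses of approximately 180.16 g/mol for glucose and 3020 g/mol for human c-peptide (neither value is stated in the text):

```python
GLUCOSE_MW = 180.16    # g/mol; assumed value, not stated in the text
CPEPTIDE_MW = 3020.3   # g/mol, human c-peptide; assumed value

def glucose_mgdl_to_mm(g_mgdl: float) -> float:
    """mg/dL -> mmol/L: multiply by 10 to get mg/L, divide by molar mass."""
    return g_mgdl * 10.0 / GLUCOSE_MW

def cpeptide_ngml_to_nm(c_ngml: float) -> float:
    """ng/mL -> nmol/L: ng/mL equals ug/L; divide by molar mass, scale to nmol."""
    return c_ngml * 1000.0 / CPEPTIDE_MW
```

With these factors, a fasting glucose of 90 mg dL−1 converts to roughly 5 mM, and 2 ng mL−1 of c-peptide to roughly 0.66 nM.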

Differential equation model of c-peptide

The van Cauter model was used to describe the concentrations of c-peptide in the plasma and interstitial compartments22 (Fig. 2a). The original model, which describes intravenously administered c-peptide, was extended to include endogenous production of c-peptide by the pancreas. The model equations for both compartments are given by:

$$\frac{{\rm{d}}{C}^{{\rm{pl}}}}{{\rm{d}}t}=-({k}_{0}+{k}_{2}){C}^{{\rm{pl}}}+{k}_{1}{C}^{{\rm{int}}}+P(t)$$
(3)
$$\frac{{\rm{d}}{C}^{{\rm{int}}}}{{\rm{d}}t}={k}_{2}{C}^{{\rm{pl}}}-{k}_{1}{C}^{{\rm{int}}}$$
(4)

where Cpl represents the concentration of c-peptide in the plasma compartment and Cint the concentration of c-peptide in the interstitial compartment. Kinetic parameters k0–k2 were calculated for each individual based on age, using equations provided by van Cauter et al.22, which are given as:

$$\begin{array}{rcl}{k}_{1}&=&f\frac{\log (2)}{{\tau }_{L}}+(1-f)\frac{\log (2)}{{\tau }_{S}}\\ {k}_{0}&=&\frac{\log (2)}{{\tau }_{S}}\cdot \frac{\log (2)}{{\tau }_{L}\cdot {k}_{1}}\\ {k}_{2}&=&\frac{\log (2)}{{\tau }_{S}}+\frac{\log (2)}{{\tau }_{L}}-{k}_{0}-{k}_{1}\end{array}$$

For which the parameter values (f, τS, and τL) are given in Table 2 for the NGT, IGT, and T2DM groups.
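For illustration, these equations translate directly into code. The sketch below uses Python (the study itself was implemented in Julia), with hypothetical values for f, τS, and τL; the actual per-group values are listed in Table 2.

```python
import math

def van_cauter_rates(f: float, tau_s: float, tau_l: float):
    """Compute k0, k1, k2 from the fraction f and the short and long
    half-lives tau_s, tau_l (min), following the equations above."""
    a = math.log(2) / tau_s  # rate constant from the short half-life
    b = math.log(2) / tau_l  # rate constant from the long half-life
    k1 = f * b + (1 - f) * a
    k0 = a * b / k1
    k2 = a + b - k0 - k1
    return k0, k1, k2

# Hypothetical parameter values (f, tau_s, tau_l)
k0, k1, k2 = van_cauter_rates(0.76, 4.95, 29.2)
```

Note that, by construction, k0 + k1 + k2 equals the sum of the two half-life rate constants.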

Table 2 Parameter values for computing the kinetic parameters for the van Cauter c-peptide model for the NGT, IGT and T2DM groups
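Equations (3) and (4) can likewise be integrated numerically. The Python sketch below uses SciPy with hypothetical rate constants, a hypothetical fasting c-peptide value, and an arbitrary smooth production bump standing in for the glucose-driven production term P(t); the study itself used 'OrdinaryDiffEq.jl' in Julia.

```python
import numpy as np
from scipy.integrate import solve_ivp

def cpeptide_rhs(t, y, k0, k1, k2, production):
    """Right-hand side of the two-compartment c-peptide model (eqs. 3-4)."""
    c_pl, c_int = y
    dc_pl = -(k0 + k2) * c_pl + k1 * c_int + production(t)
    dc_int = k2 * c_pl - k1 * c_int
    return [dc_pl, dc_int]

# Hypothetical rate constants (1/min) and fasting plasma c-peptide (nM)
k0, k1, k2 = 0.064, 0.052, 0.048
c_pl0 = 0.6
y0 = [c_pl0, (k2 / k1) * c_pl0]  # interstitial value from steady state
p0 = k0 * c_pl0                   # basal production balancing elimination

# Basal production plus a smooth, transient bump mimicking a glucose response
production = lambda t: p0 + 0.05 * np.exp(-(((t - 30.0) / 15.0) ** 2))

sol = solve_ivp(cpeptide_rhs, (0.0, 120.0), y0,
                args=(k0, k1, k2, production), t_eval=[0, 30, 60, 90, 120])
```

Starting from the fasting steady state, the transient production bump raises plasma c-peptide, which then relaxes back towards its basal value.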

Neural network component

The production of c-peptide P(t) was modelled using a densely connected neural network with two inputs: the first was the difference between plasma glucose at time t and the fasting value (Gi(t) = Gpl(t) − Gpl(0)), and the second was a learnable parameter βi representing the inter-individual variability (Fig. 2b). Plasma glucose values were obtained directly from the measured data via a forcing function; for time points between measurements, glucose values were linearly interpolated.

The neural network contained two hidden layers, each consisting of 4 nodes with \(\tanh\) activation functions, and an output layer of size 1 with a softplus activation function, resulting in 37 trainable parameters (weights and biases).
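A minimal NumPy sketch of this architecture (2 inputs → 4 → 4 → 1, with tanh and softplus activations), confirming the parameter count; the actual model was implemented in Julia, and the random initialization scale here is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(x):
    """Softplus activation; keeps the production output non-negative."""
    return np.log1p(np.exp(x))

# Layer shapes: 2 inputs (incremental glucose, beta) -> 4 -> 4 -> 1
shapes = [(4, 2), (4,), (4, 4), (4,), (1, 4), (1,)]
params = [rng.normal(scale=0.1, size=s) for s in shapes]

def production_nn(g_inc, beta, p):
    """Evaluate the production network N(g_inc; beta)."""
    W1, b1, W2, b2, W3, b3 = p
    x = np.array([g_inc, beta])
    h = np.tanh(W1 @ x + b1)
    h = np.tanh(W2 @ h + b2)
    return softplus(W3 @ h + b3)[0]

n_params = sum(p.size for p in params)  # 37 trainable weights and biases
```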

The neural network architecture was selected through a grid search. Candidate architectures were obtained by varying the depth of the model between 1 and 2 hidden layers with layer sizes of 3–6 nodes, and 3 hidden layers with layer sizes of 3 or 4 nodes. All models were trained on 70% of the train set and evaluated on the remaining 30%. The model that gave the lowest error for the largest number of individuals was selected; in case of a tie, the model with the lowest median error across all individuals was selected.

Initial conditions

For simulation, the whole system was assumed to be in steady state at t = 0, as subjects were fasting prior to the oral glucose tolerance test. The initial condition for plasma c-peptide (Cpl) was set to the measured fasting value at t = 0. For interstitial c-peptide, the initial condition was calculated using the steady-state assumption to be

$${C}^{{\rm{int}}}(t=0)=\frac{{k}_{2}}{{k}_{1}}{C}^{{\rm{pl}}}(t=0)$$
(5)

Furthermore, to ensure that the plasma c-peptide compartment was in steady state at t = 0, the production term P(t), which includes the neural network N(G; βi) describing c-peptide production, was formulated as

$$P(t)={P}_{0}+N(G(t)-G(t=0);{\beta }_{i})-N(0;{\beta }_{i})$$
(6)

This results in a production value at t = 0 of P(t = 0) = P0, where P0 was set to

$${k}_{0}{C}^{{\rm{pl}}}(t=0)$$
(7)
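These steady-state relations can be checked directly: with Cint(0) and P0 defined as above, both derivatives in equations (3) and (4) vanish at t = 0. The rate constants and fasting value in this Python sketch are hypothetical.

```python
def steady_state_init(c_pl0, k0, k1, k2):
    """Steady-state initial conditions for the two-compartment model:
    interstitial c-peptide from equation (5), basal production from (7)."""
    c_int0 = (k2 / k1) * c_pl0
    p0 = k0 * c_pl0
    return c_int0, p0

# Hypothetical rate constants (1/min) and fasting plasma c-peptide (nM)
k0, k1, k2 = 0.064, 0.052, 0.048
c_pl0 = 0.6
c_int0, p0 = steady_state_init(c_pl0, k0, k1, k2)

# Evaluate the right-hand sides of equations (3) and (4) at t = 0,
# with P(0) = P0 by construction of equation (6)
dC_pl = -(k0 + k2) * c_pl0 + k1 * c_int0 + p0
dC_int = k2 * c_pl0 - k1 * c_int0
```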

Parameter estimation

The neural network parameters were estimated on a randomly selected training subset containing 70% of the total samples, stratified according to the glucose tolerance condition. This training set was further divided into a true train set of 70% and a validation set of 30% of samples. This resulted in a true training set containing 49% of the entire dataset (n = 57), and a validation set containing 21% of the entire dataset (n = 25). Parameters were estimated on the true train set using the following loss function:

$${{\rm{L}}}_{{\rm{train}}}({p}_{{\rm{NN}}},\beta )=\mathop{\sum }\limits_{i=1}^{{N}_{{\rm{train}}}}\mathop{\sum}\limits_{t\in {\mathcal{T}}}{\left({C}_{{\rm{model}}}^{{\rm{pl}}}(t| {p}_{{\rm{NN}}},{\beta }_{i})-{C}_{{\rm{data}},i}^{{\rm{pl}}}(t)\right)}^{2}$$
(8)

where pNN are the parameters of the neural network and β represents the vector of conditional parameters for all Ntrain individuals, with βi the parameter of individual i. Furthermore, \({\mathcal{T}}=\left\{0,30,60,90,120\right\}\) represents the set of timepoints contained in the data. To prevent sign changes of β between individuals, \(\log (\beta )\) was estimated, constraining β to the positive domain.

The parameter estimation for the universal differential equation models was then performed by sampling 25,000 initial candidate parameter sets and optimizing the 25 candidate sets that yielded the smallest initial objective function values. Optimization was performed using a two-stage optimizer, starting with Adam44 for 1000 iterations with a learning rate of 10−2. Starting from the endpoint of Adam, the LBFGS optimizer was run for a maximum of 1000 iterations or until convergence. Subsequently, for each of the 25 trained models, the neural network parameters were fixed and the conditional parameters were estimated on the validation set using the LBFGS optimizer, with the following loss function:

$${{\rm{L}}}_{{\rm{test}}}({\beta }_{i})=\frac{| {\mathcal{T}}| }{2}\ln \left({\sigma }^{2}\right)+\frac{1}{2{\sigma }^{2}}\mathop{\sum}\limits_{t\in {\mathcal{T}}}{\left({C}_{{\rm{model}}}^{{\rm{pl}}}(t| {p}_{{\rm{NN}}},{\beta }_{i})-{C}_{{\rm{data}},i}^{{\rm{pl}}}(t)\right)}^{2}$$
(9)
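The multi-start procedure can be sketched as follows. This Python illustration replaces the UDE training loss with a toy quadratic objective and uses a single L-BFGS-B stage in place of the Adam-then-LBFGS chain; only the sample-then-refine structure mirrors the text.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
target = np.array([1.0, -2.0])  # toy optimum standing in for the UDE fit

def objective(p):
    """Stand-in for the training loss L_train (equation 8)."""
    return float(np.sum((p - target) ** 2))

# Stage 1: sample 25,000 candidate parameter sets, retain the best 25
candidates = rng.normal(size=(25_000, 2))
losses = np.sum((candidates - target) ** 2, axis=1)
best = candidates[np.argsort(losses)[:25]]

# Stage 2: local optimization of each retained candidate
fits = [minimize(objective, p0, method="L-BFGS-B") for p0 in best]
winner = min(fits, key=lambda r: r.fun)
```

The screening stage is cheap because it only evaluates the objective; the expensive gradient-based refinement is reserved for the most promising starts.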

The model with the lowest average loss across the individuals in the validation set was then selected as the best performing model.

After selection of the best performing model, the conditional parameters were re-estimated on the full dataset, including the remaining 30% of the data that had not been used up to this point (n = 35). Furthermore, for each individual, the variance of the residuals (\({\sigma }_{i}^{2}\)) was estimated to enable the computation of confidence intervals on the conditional parameter (see “Identifiability analysis”). Estimation was performed using maximum likelihood assuming zero-mean residuals, as shown in equation (10).

$${\rm{NLL}}({\beta }_{i},{\sigma }_{i})=\frac{| {\mathcal{T}}| }{2}\ln \left({\sigma }_{i}^{2}\right)+\frac{1}{2{\sigma }_{i}^{2}}\mathop{\sum}\limits_{t\in {\mathcal{T}}}{\left({C}_{{\rm{model}}}^{{\rm{pl}}}(t| {p}_{{\rm{NN}}},{\beta }_{i})-{C}_{{\rm{data}},i}^{{\rm{pl}}}(t)\right)}^{2}$$
(10)

Identifiability analysis

To determine whether βi was identifiable for each individual, we inspected the negative log-likelihood values (equation (10)), with the estimated \({\sigma }_{i}^{2}\) fixed, while varying βi around its optimum. The 95% confidence interval of βi was determined by the boundary values for the change in likelihood, defined in ref. 45 as ΔNLL ≈ 7.16. For an individual, βi was defined as identifiable if both bounds were reached. If only one bound was reached, βi was defined as practically unidentifiable; if neither bound was reached, βi was defined as unidentifiable46.
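A sketch of this profile-based classification, using a generic NLL function and the ΔNLL ≈ 7.16 threshold from the text; the scan range and grid resolution here are arbitrary choices.

```python
import numpy as np

def classify_identifiability(nll, beta_hat, threshold=7.16,
                             span=4.0, n=201):
    """Scan the NLL around beta_hat and classify identifiability.

    Identifiable if the NLL rises above NLL(beta_hat) + threshold on
    both sides of the optimum, practically unidentifiable if only one
    side crosses, unidentifiable if neither does.
    """
    betas = np.linspace(beta_hat - span, beta_hat + span, n)
    profile = np.array([nll(b) for b in betas]) - nll(beta_hat)
    left = bool(np.any(profile[betas < beta_hat] > threshold))
    right = bool(np.any(profile[betas > beta_hat] > threshold))
    if left and right:
        return "identifiable"
    if left or right:
        return "practically unidentifiable"
    return "unidentifiable"

# A well-constrained toy NLL (quadratic in beta) versus a flat one
status = classify_identifiability(lambda b: 2.0 * (b - 1.0) ** 2, beta_hat=1.0)
```

A quadratic NLL crosses the threshold on both sides and is classified as identifiable, whereas a flat NLL never crosses it.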

Symbolic regression

For symbolic regression, 900 unique samples of the neural network output were initially created from combinations of 30 values of the conditional parameter β and 30 incremental glucose values. Negative incremental glucose values were clipped to zero to reduce the complexity of the problem. Symbolic regression was then performed using the PySR package25 with the settings listed in Table 3.

Table 3 Settings for the symbolic regression algorithm from PySR
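The construction of the sampling grid can be sketched as follows; the ranges chosen for β and the incremental glucose values are hypothetical, and the regression targets would come from evaluating the trained neural network on these inputs.

```python
import numpy as np

# 30 conditional-parameter values and 30 incremental glucose values
betas = np.linspace(0.1, 2.0, 30)     # hypothetical range for beta
g_inc = np.linspace(-2.0, 10.0, 30)   # incremental glucose (mM)
g_inc = np.maximum(g_inc, 0.0)        # negative increments clipped to zero

# Cartesian product gives the 900 input samples for symbolic regression
B, G = np.meshgrid(betas, g_inc)
X = np.column_stack([G.ravel(), B.ravel()])
# y would be obtained by evaluating the trained neural network on X,
# before fitting PySR's symbolic regressor to (X, y)
```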

From the resulting equations, the top equation was selected using the ‘best’ option from the PySR package. This option first selects all expressions with a loss at most 1.5 times that of the most accurate model. From these expressions, the equation with the highest score is selected, where the score is defined as the negated derivative of the loss with respect to complexity25.

The resulting equation was then simplified by amalgamating constants into a single learnable parameter. As the incremental glucose values used to train the symbolic equation were clipped at zero, production was set to zero when Gpl(t) < Gpl(0).

Programming

Both the ordinary differential equation models and the universal differential equation models used in this research were implemented in the Julia programming language, using the ‘OrdinaryDiffEq.jl’ package47.