Introduction

In the early 1940s, after meticulously gathering microbial growth data, Jacques Monod observed that bacterial growth rates systematically varied with nutrient concentration, mirroring patterns in enzyme kinetics1. This insight led him to formulate a mathematical model linking external conditions (nutrient availability) to biological responses (growth rates), fundamentally redefining our understanding of microbial growth. Monod demonstrated that even inherently complex phenomena become tractable—and testable—when expressed through appropriate variables, transforming microbial growth studies into quantitative tools for interrogating fundamental principles of microbial behavior.

Today, microbial data are collected at unprecedented scales and temporal resolutions, driven by urgent needs to understand and predict phenomena such as antibiotic resistance evolution, ecosystem resilience, and efficiency of microbial bioproduction2,3,4,5. Yet despite this wealth of data, turning observations into actionable interventions remains challenging. Doing so requires models that generate testable hypotheses about microbial behaviors under previously unobserved conditions6,7. Much like Monod’s approach, these models establish quantitative relationships between measurable phenotypes (e.g., growth rate, lag duration, or yield) and ecological conditions (e.g., nutrient availability, chemical stressors, or species interactions) or evolutionary factors (e.g., genetic mutations)—what we refer to as ecological or evolutionary responses. Before these models can be built, relevant phenotypic variables must be inferred from experimental data. But beyond the computational difficulties associated with the analysis of large datasets, biologically relevant observables are often not clearly defined or immediately known.

Available tools can assist with parameter inference for growth kinetics datasets8,9,10,11,12,13; some handle more complex cases such as diauxic growth13 and mixed cultures14, support user-defined functions and ordinary differential equations9,11, or fit specific dose-response models11,12. However, we lack resources that support exploratory data analysis, help identify observables for further investigation, and detect mathematical expressions of response patterns that can shed light on the biological processes underlying the observed responses.

To fill this methodological gap, we developed Kinbiont, a Julia package designed as an end-to-end pipeline for biological discovery that enables data-driven generation of hypotheses to be tested in targeted experiments. Leveraging Julia’s speed, flexibility, and growing popularity in biological sciences15, Kinbiont integrates advanced solvers for ordinary differential equations, non-linear optimization methods, signal processing, and interpretable machine learning algorithms. Kinbiont can fit virtually any system of differential equations or analytic function and, unlike existing tools, extends model-based parameter estimation to segmented fits. The resulting parameters can be further linked to experimental conditions via explainable machine learning methods: symbolic regression16,17 can help identify mathematical expressions (e.g., equations for dose-response curves), while decision trees18 yield graphical decision rules that map experimental conditions to growth responses.

Here, we introduce the Kinbiont framework and demonstrate parameter inference for microbial growth kinetics using publicly available datasets on diauxic growth19, bacterial cultures infected by phages20, and ethanol bioproduction21. To illustrate how Kinbiont’s machine learning methods reveal quantitative relationships in microbial responses, we revisit Monod’s classical nutrient-limitation experiment, analyze growth inhibition by a ribosome-targeting antibiotic, and explore a large-scale chemical screening study of ecotoxicological effects22.

Results

The Kinbiont framework

Kinbiont is distributed as an open-source library, freely available via the Julia package manager or at https://github.com/pinheiroGroup/Kinbiont.jl, together with its documentation23. The Kinbiont framework supports both single-dimensional (e.g., optical density measured by plate readers) and multi-dimensional datasets (e.g., simultaneous measurements of biomass, fluorescence, and oxygen concentration) collected under controlled experimental conditions. The pipeline is composed of three modules, which can be used independently or sequentially to perform model-based analyses and systematically map microbial growth parameters to experimental variables (see Fig. 1).

Fig. 1: From experimental design to response patterns with Kinbiont.

Microbial kinetics experiments generate time-resolved datasets under controlled conditions. These datasets serve as input to Kinbiont. The pipeline consists of three modules: (1) optional data preprocessing, including background subtraction, smoothing, or optical-density corrections tailored to microplate assays; (2) model-based parameter inference to characterize growth curves quantitatively (see Fig. 2 for examples of supported models and analyses); and (3) glass-box machine learning methods to perform feature selection and establish interpretable relationships between inferred growth parameters and experimental variables. The output from this module is either an empirical mathematical law (via symbolic regression) or a graphical decision model (e.g., a decision tree), describing how experimental conditions influence microbial growth parameters. All data shown in this figure are synthetic and intended solely for illustrative purposes.

In the first module (Data Preprocessing), raw time-series data are processed according to user-defined needs, including background subtraction, replicate averaging, correction for multiple scattering (especially relevant for plate-reader assays24), and smoothing (see “Methods” for data formats and preprocessing details). The second module (Model-Based Parameter Inference) fits mathematical models to the processed data, estimating microbial growth parameters such as growth rates, lag-phase duration, and total biomass production. Users may select models from a built-in library of nonlinear functions (NL) and ordinary differential equations (ODEs), or define a custom system of equations tailored to their datasets. Finally, the third module (Glass-Box Machine Learning Analyses) employs interpretable machine-learning (ML) techniques to identify mathematical relationships, which we refer to as empirical laws, and graphical decision rules linking inferred model parameters to variations in experimental conditions (e.g., antibiotic or nutrient concentrations). This analysis helps determine which experimental features most strongly influence growth dynamics, aiding interpretation and guiding follow-up investigation.
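Three of these preprocessing steps (background subtraction, replicate averaging, and smoothing) can be illustrated with a short sketch. It is written in Python rather than Julia and does not reproduce Kinbiont's functions or API; the helper names, the blank value, and the replicate readings are hypothetical. The multiple-scattering correction is omitted because it typically depends on an instrument-specific calibration.

```python
"""Minimal preprocessing sketch: blank subtraction, replicate averaging, smoothing.

Illustrative only; this is not Kinbiont's API and all names are hypothetical.
"""
from statistics import mean


def subtract_background(od, blank):
    """Subtract a blank (medium-only) reading and clip negative values to zero."""
    return [max(x - blank, 0.0) for x in od]


def average_replicates(replicates):
    """Point-wise mean across replicate time series of equal length."""
    return [mean(values) for values in zip(*replicates)]


def moving_average(series, window=3):
    """Centered moving average; the window shrinks at the edges of the series."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        smoothed.append(mean(series[lo:hi]))
    return smoothed


# Two hypothetical replicate OD curves and a hypothetical blank reading
rep1 = [0.06, 0.07, 0.10, 0.17, 0.30]
rep2 = [0.06, 0.08, 0.12, 0.19, 0.32]
blank = 0.05
corrected = [subtract_background(r, blank) for r in (rep1, rep2)]
curve = moving_average(average_replicates(corrected), window=3)
print([round(x, 4) for x in curve])
```

A centered moving average is only one possible smoothing choice, used here for brevity; the options actually offered by the package are described in Methods.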

Kinbiont—an ecosystem of numerical methods for microbial growth analysis

Kinbiont methods are organized into three distinct categories, as shown in Fig. 2:

  1.

    Mathematical models for microbial kinetics: Kinbiont can be used for both parameter estimation and simulation/synthetic data generation of microbial kinetics.

    Users can define custom models through analytic expressions or ODE systems, or select from Kinbiont’s library of classical microbial growth models available in both closed-form and differential-equation-based formats (e.g., logistic, Gompertz, Richards, and Heterogeneous Population Models25,26). Integration with Catalyst.jl27 extends these capabilities by enabling users to implement reaction networks (see Supplementary Information). In addition, Kinbiont supports cybernetic models for describing growth dynamics in multi-substrate environments28 (see Supplementary Information). Kinbiont also provides functions dedicated to analyzing the exponential growth phase, as well as microbial growth simulators based on deterministic ODEs or stochastic Poisson processes (Methods).

    Fig. 2: Overview of models and analyses supported by Kinbiont.

    Kinbiont includes multiple methods for modeling microbial growth kinetics, from A simple log-linear fits that automatically detect exponential growth phases, to B more complex nonlinear models and ODE-based systems. C Multidimensional datasets that include additional measurements beyond biomass (e.g., metabolites or fluorescent reporters) can be modeled using coupled ODE systems, chemical reaction networks via integrations with Catalyst.jl27, or cybernetic models describing single microbial populations utilizing multiple substrates. D Users can apply optimization methods to fit any supported model to their data. Additional capabilities include E systematic model selection and F sensitivity analysis (currently supported only for single-dimensional datasets). Non-parametric methods to estimate growth parameters directly from experimental data, such as G specific growth-rate calculations and H signal-processing algorithms for growth-phase detection, are also provided. I To handle more complex scenarios involving multiphase growth dynamics, Kinbiont implements a segmented fitting approach, first detecting growth-phase transitions via change-point algorithms and then performing model fitting on individual segments. Finally, if experimental conditions are known, interpretable machine learning techniques such as J decision trees and K symbolic regression can be applied to quantitatively link these conditions to inferred microbial growth parameters. All data shown in this figure are synthetic and intended solely for illustrative purposes.

    For reference, Supplementary Table 1 provides an overview of models available in Kinbiont that are commonly used in microbial time-series data analysis. Detailed mathematical descriptions of these and other models are included in Methods and Supplementary Information. A flowchart to help users select the appropriate fitting procedure based on their dataset characteristics is shown in Fig. 3, and illustrative case studies demonstrating practical applications are described in Kinbiont’s accompanying documentation23.

    Fig. 3: Flowchart guiding selection of fitting methods in Kinbiont.

    This schematic flowchart assists users in choosing suitable functions and methods available in Kinbiont for analyzing microbial kinetics data. Decisions on data dimensionality, experimental design specifics, and computational constraints help identify optimal workflows and analytical routines within the software.

  2.

    Growth kinetics data analysis: Kinbiont frames parameter estimation as a nonlinear optimization task, enabling users to fit any supported model to time-series data. To address common challenges encountered in model fitting, such as non-differentiable objective functions and rugged error landscapes with multiple local minima, Kinbiont integrates over 100 optimization algorithms, including global, mixed-integer, non-convex, constrained, and restart schemes29,30. By default, it employs a black-box differential evolution optimizer with box constraints, which keeps the fitted parameters biologically plausible even when the model itself does not restrict them.

    Together with optimization-based fitting routines, Kinbiont provides tools for model selection31, sensitivity analysis32, and confidence interval estimation for inferred parameters using bootstrap resampling33 (Methods). Kinbiont also includes non-parametric methods to compute specific growth rate curves directly from experimental data.

    For more complex kinetic scenarios, such as multiphase bacterial growth, Kinbiont introduces a new approach combining an offline change-point detection algorithm34, systematic model selection, and optimization-based parameter fitting. This method, currently implemented for single-dimensional datasets, detects growth-phase transitions, performs NL and ODE model fits to segmented time-series data, and evaluates alternative segmentation scenarios, automatically ranking models across a user-specified range of change points to determine the most informative representation of growth-phase dynamics (see “Methods” for details, and Fig. 4 for a schematic representation of the workflow steps involved in segmented fitting).

    Fig. 4: Segmented fitting workflow.

    Illustrative example demonstrating the direct search approach for optimal change point selection. The input is a synthetic time series, used here solely for illustrative purposes. Throughout the figure, black dots represent the data points, and blue lines indicate model fits. Initially, the user selects whether the change point detection algorithm should analyze the original data or its derivative. Subsequently, a dissimilarity curve is computed using the selected method (see “Methods”), where significant peaks represent potential change points signaling shifts in the system dynamics. These peaks are ranked by prominence, and a user-defined number of top-ranking peaks (in this example, three) are chosen for further analysis. If the direct search method is chosen, all combinations of the selected candidate change points are evaluated through segmented model fitting. These models are then ranked based on their AICc scores (Eq. (M13)). Kinbiont can output either only the best-performing model or all evaluated models along with their respective parameters and AICc scores.

    We benchmarked the accuracy and reliability of these features using synthetic datasets at varying noise levels and compared parameter inference across multiple ODE-based and nonlinear models (see Supplementary Information and Supplementary Figs. 1–4).

  3.

    Glass-box downstream machine learning analysis: Kinbiont incorporates interpretable machine learning methods such as symbolic regression and decision trees. Symbolic regression uses evolutionary algorithms to search iteratively for algebraic expressions, built from input variables (experimental features and growth parameters), mathematical operators, and constants, that fit the observations, thus capturing empirical relationships within the data16. Decision trees recursively partition data into groups according to experimental features, generating graphical decision rules and statistical measures (e.g., importance scores) that quantify the relative influence of different experimental variables35.

    Together, these methods help identify trends in relevant variables and select experimental features with significant effects on microbial responses, ultimately informing subsequent experimental design.
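The default fitting strategy described in item 2, box-constrained differential evolution, can be sketched in a few dozen lines. The example below is an illustrative Python implementation of the classic DE/rand/1/bin scheme, not Kinbiont's optimizer or API; it fits a closed-form logistic curve to hypothetical noise-free data, and the bounds, population size, and control parameters (F, CR) are assumptions chosen for the example. Trial vectors are clipped to the box, which is one simple way to keep every candidate parameter set within a plausible range.

```python
"""Box-constrained differential evolution (DE/rand/1/bin) fit of a logistic curve.

Illustrative sketch only; not Kinbiont's optimizer. All names are hypothetical.
"""
import math
import random


def logistic(t, n0, r, k):
    """Closed-form logistic curve with initial density n0, rate r, capacity k."""
    return k / (1.0 + (k / n0 - 1.0) * math.exp(-r * t))


def sse(params, times, data):
    """Sum of squared residuals between the model and the data."""
    return sum((logistic(t, *params) - y) ** 2 for t, y in zip(times, data))


def differential_evolution(cost, bounds, pop_size=30, generations=300,
                           f=0.7, cr=0.9, seed=0):
    """Minimize `cost` inside the box `bounds` using DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [cost(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct donors, all different from the target vector i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip to stay inside the box
                else:
                    v = pop[i][j]
                trial.append(v)
            s = cost(trial)
            if s <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, s
    best_idx = min(range(pop_size), key=scores.__getitem__)
    return pop[best_idx], scores[best_idx]


times = [0.5 * i for i in range(25)]  # 0 to 12 h
true_params = (0.01, 0.8, 1.2)  # hypothetical n0, r (1/h), k
data = [logistic(t, *true_params) for t in times]
bounds = [(1e-3, 0.1), (0.05, 3.0), (0.2, 3.0)]  # assumed plausible ranges
best, best_sse = differential_evolution(lambda p: sse(p, times, data), bounds)
print([round(x, 3) for x in best], best_sse)
```

Because differential evolution needs only cost evaluations, the same loop works unchanged for non-differentiable objectives; the price is many more model evaluations than a gradient-based local optimizer would use.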

Kinbiont enables model-based parameter inference in non-standard microbial growth kinetics

To showcase Kinbiont’s flexibility in analyzing non-standard growth kinetics, we applied it to three distinct datasets: bacterial kinetics in a culture infected with T4 phage20, ethanol bioproduction21, and diauxic growth in multi-resource environments19. Our primary goal here is to highlight the methodological capabilities of Kinbiont’s parameter inference module. Detailed biological interpretation and discussion of the inferred parameters are provided in subsequent sections, where datasets are better suited for response analysis.

In bacterial cultures infected by viruses, growth kinetics are characterized by an initial growth phase followed by a death phase due to phage-induced cell lysis. These dynamics differ qualitatively from the monotonic curves typical of standard bacterial cultures and thus require dedicated models. As an example, we used Kinbiont to fit the heterogeneous population model with inhibition and death (Eq. (M1)) to data from Ref. 20, obtaining accurate quantitative agreement with the experimental measurements (see Fig. 5A and Methods for details).

Fig. 5: Fitting non-standard microbial growth kinetics with Kinbiont.

Kinbiont can fit microbial growth dynamics with a variety of profiles. A Non-monotonic growth with cell death: Kinbiont fit to an E. coli culture infected by T4 phage20. B Multidimensional fit to biomass, substrate, and ethanol time series from a Kluyveromyces marxianus bioproduction experiment21. C Examples of segmented fits for two different alanine-to-glutamate concentration ratios (top: 4, bottom: 1/16). The optimal number of change points was determined using the AICc criterion (Eq. (M13)), evaluated via direct search across multiple segmentation possibilities. D Heatmaps of time-resolved optical density (OD) measurements from diauxic growth experiments (left, data from Ref. 19). The optimal number of change points for each experimental condition is annotated. Violin plots with embedded box plots (showing the 0.25, 0.50, and 0.75 quantiles; right) display the distribution of mean AICc values for fits with different segment numbers. Conditions that Kinbiont classified as having one change point (8 samples) are contrasted with those requiring two change points (17 samples). See Supplementary Fig. 6 for detailed AICc distributions across replicates. E Mean growth rates for each detected phase are plotted against nutrient composition. Each condition includes 2-4 replicates (see “Methods”); error bars show the replicate standard deviation. Green dots represent the first-phase rate, present in every curve, whereas the purple dots represent the second-phase rate, detected only when a diauxic shift occurs. The dashed black line marks the transition to diauxic behavior, where the change-point analysis (AICc) favors two segments over one.

Next, we analyzed multichannel kinetic data from ethanol bioproduction by different strains of Kluyveromyces marxianus, including simultaneous measurements of biomass accumulation, glucose consumption, and ethanol production21. Using the Monod-Ierusalimsky model28 (Eq. (M2)), Kinbiont successfully fitted this multivariate dataset, capturing the coupled dynamics of biomass and metabolite concentrations (Fig. 5B; see “Methods” for details of the fitting procedure and Supplementary Fig. 5 for the fits to all replicates).

Finally, we applied Kinbiont to analyze diauxic growth data in multi-resource environments19. Diauxie occurs when microbes sequentially utilize multiple nutrient sources. Unlike typical single-resource growth curves, diauxic growth features an additional lag phase followed by a second phase of exponential growth until the culture reaches saturation19. Quantitative estimation of parameters such as lag-phase durations, exponential growth rates, and saturation levels is important for understanding microbial responses to multi-nutrient environments. However, existing methods for extracting these parameters often rely on manual curation due to the lack of automated, model-based approaches11. To address this issue, we performed segmented fits allowing up to five change points on optical density (OD) data from Acinetobacter species grown under varying combinations of alanine and glutamate, with logarithmic concentration ratios Log([Ala]/[Glu]) ranging from -4 to 4 in increments of 0.333. We considered the ODEs from the exponential (Eq. (M3)), logistic (Eq. (M4)), heterogeneous population (HPM; Eq. (M5)), and exponential HPM (Eq. (M6)) models (Fig. 5C; see “Methods” for explicit equations and fitting details). As shown in Fig. 5D, Kinbiont reproduces the observed Acinetobacter growth curves from Ref. 19 across all tested conditions. For each curve, we retain the fit with the lowest corrected Akaike information criterion (AICc; Fig. 5D, left panel). Because AICc penalizes additional parameters, this criterion selects the model and segmentation combination that offers the best trade-off between goodness of fit and complexity, guarding against overfitting. By mapping the inferred growth rates from each segment onto their corresponding experimental conditions (Fig. 5E), we observed distinct nutrient-dependent patterns: the growth rate during the first phase increased with alanine concentration until saturation, whereas the second-phase growth rate decreased as glutamate concentration diminished. At low glutamate levels, the second growth phase exhibits a higher growth rate than the first, a counterintuitive result given the classical expectation that microbes initially metabolize the nutrients that support faster growth. These observations thus suggest a regulatory preference in this strain for the sequential utilization of alanine followed by glutamate, consistent with findings from the original study19.
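The model-selection logic behind these segmented fits can be sketched for the simplest case. The Python example below is a hypothetical illustration, not Kinbiont's algorithm: instead of ranking candidate change points from a dissimilarity curve and allowing several of them, it exhaustively tries every single change point on log-transformed OD, fits an exponential (log-linear) model to each segment, and compares the result with an unsegmented fit using the common least-squares form of AICc, which may differ in detail from Eq. (M13). The synthetic curve and the parameter count assigned to the change point are assumptions.

```python
"""Exhaustive single change-point search on log(OD), ranked by AICc.

Hypothetical sketch. AICc uses the least-squares form
n*ln(RSS/n) + 2k + 2k(k+1)/(n-k-1), which may differ from Kinbiont's Eq. (M13).
"""
import math


def linear_fit_rss(x, y):
    """Ordinary least-squares line y = a + b*x; returns (slope b, RSS)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b, rss


def aicc(rss, n, k):
    """Corrected Akaike information criterion for a least-squares fit."""
    return n * math.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)


def best_segmentation(t, log_od, min_points=4):
    """Compare one exponential fit with every split into two exponential fits."""
    n = len(t)
    slope, rss = linear_fit_rss(t, log_od)
    best = {"change_point": None, "rates": [slope], "aicc": aicc(rss, n, 2)}
    for cp in range(min_points, n - min_points + 1):
        r1, rss1 = linear_fit_rss(t[:cp], log_od[:cp])
        r2, rss2 = linear_fit_rss(t[cp:], log_od[cp:])
        # Two parameters per segment plus the change point itself (assumption)
        score = aicc(rss1 + rss2, n, 5)
        if score < best["aicc"]:
            best = {"change_point": t[cp], "rates": [r1, r2], "aicc": score}
    return best


# Hypothetical diauxic-like log(OD): rate 0.6 1/h until t = 5 h, then 0.2 1/h,
# plus a small deterministic ripple standing in for measurement noise.
t = [0.5 * i for i in range(21)]
log_od = [(0.6 * ti if ti <= 5 else 3.0 + 0.2 * (ti - 5)) + 0.01 * (-1) ** i
          for i, ti in enumerate(t)]
result = best_segmentation(t, log_od)
print(result["change_point"], [round(r, 2) for r in result["rates"]])
```

With several change points, the number of combinations grows quickly, which is why pre-selecting a few candidate change points (as in the direct-search workflow) keeps the search tractable.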

Fig. 6: Automated detection of empirical microbial responses using Kinbiont.

A Representative example illustrating the fitting procedure and parameter inference from an optical density (OD) time series of the methionine auxotroph strain at 10 μM methionine. All growth curves and optimal fits are provided in Supplementary Fig. 8. The exponential growth rate and maximum biomass concentration were extracted from the first and second segments, respectively. B Inferred exponential growth rates at varying methionine concentrations (green dots), plotted together with a subset of candidate equations from the symbolic regression hall-of-fame (Supplementary Table 2). The solid black line indicates the Monod equation. C Inferred maximum biomass concentrations at varying methionine concentrations (red dots), shown together with selected candidate equations from the symbolic regression hall-of-fame (Supplementary Table 3). The black line represents the linear substrate-yield relationship. D Inferred exponential growth rates at varying chloramphenicol concentrations (green dots), shown alongside selected candidate equations describing growth rate inhibition from symbolic regression (Supplementary Table 6). All growth curves and optimal fits are provided in Supplementary Fig. 9.

Kinbiont identifies mathematical models underlying microbial responses to environmental changes

We conducted two experiments to evaluate Kinbiont’s ability to identify empirical laws from time series data: a variation of Monod’s classical nutrient-limitation assay1, assessing auxotroph growth under amino acid limitation, and a dose-response assay measuring growth inhibition by the ribosome-targeting antibiotic chloramphenicol. In both cases, OD time-series data were collected using a microplate reader.

In the nutrient-limitation experiment, we used an E. coli strain incapable of synthesizing the amino acid methionine (ΔmetA knockout). Cells were cultured in minimal medium supplemented with methionine concentrations ranging from 0 to 80 μM (Methods). Data preprocessing consisted of background subtraction and multiple scattering corrections. These corrections are important because neglecting background subtraction can cause misestimation of exponential growth rates (Supplementary Fig. 7A). Similarly, failing to correct for multiple scattering leads to underestimated saturation OD values (Supplementary Fig. 7B). We applied segmented fitting with one change point, using logistic (Eq. (M4)) and exponential HPM (Eq. (M6)) models for each condition (Supplementary Fig. 8). From these fits, we extracted the exponential growth rate from the first segment and the saturation OD from the second segment, which we then used to quantify the maximum biomass concentration (Fig. 6A; Methods). Response profiles of growth rate and maximum biomass concentration as a function of methionine concentration are shown in Fig. 6B–C.

For the chloramphenicol dose-response assay, we cultured E. coli in minimal media with chloramphenicol concentrations ranging from 0 to 16 μM. We followed the previously described protocol for pre-processing and parameter inference, with a focus on the exponential growth rate (Methods, Supplementary Fig. 9). The resulting dose-response is shown in Fig. 6D.

We next used symbolic regression16 to identify mathematical models describing how growth rate and total growth depend on methionine concentration and how growth rate depends on chloramphenicol concentration. For both experiments, the resulting candidate models, their complexity scores, and mean squared errors are presented in Supplementary Tables 2, 3, and 6.

In the amino acid limitation experiment, symbolic regression identified the following equations among the top-ranked candidate models:

$$\lambda=\frac{{\alpha }_{1}A}{{\alpha }_{2}+A},\quad \Delta N={\alpha }_{3}A,$$
(1)

where α1, α2, and α3 are constants, A denotes the concentration of supplemented methionine, and ΔN is the culture’s total biomass growth. Recognizing α1 = λmax (maximum growth rate), α2 = KM (half-saturation constant), and α3 = YM (methionine yield coefficient), we recover the empirical relationships first reported by Monod1. This result was replicated using an independent dataset from another methionine auxotroph strain (Supplementary Tables 4 and 5).

Using the symbolic regression output, we estimated the specific methionine consumption coefficient χ as the reciprocal of the yield coefficient α3. Converting χ from μM/OD to μmol/g (dry weight), using a previously measured OD-to-dry-weight conversion factor of 0.5 g/OD (Methods) and accounting for culture volume, yielded χ = 145.05 μmol/g (dry weight). This closely matches the methionine content reported for E. coli biomass (146 μmol/g dry weight)36, suggesting that the ΔmetA auxotroph uses nearly all supplemented methionine for biomass synthesis, with negligible catabolism.
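Eq. (1) and the conversion described above can be written out as a short sketch. The Python helper names are hypothetical, and the conversion assumes that the 0.5 g/OD factor denotes grams of dry weight per litre per OD unit, in which case the culture volume cancels and (μmol L−1 per OD)/(g L−1 per OD) gives μmol/g. Under that assumption, the reported χ = 145.05 μmol/g corresponds to 1/α3 = 145.05 × 0.5 = 72.525 μM per OD unit; this back-calculated value is derived here for illustration and is not reported in the text.

```python
"""Monod growth law, linear yield law, and a consumption-coefficient conversion.

Sketch only; helper names are hypothetical. The conversion ASSUMES the
0.5 g/OD factor means grams dry weight per litre per OD unit.
"""


def monod_rate(a, lam_max, k_m):
    """Eq. (1), left: growth rate at methionine concentration a (uM)."""
    return lam_max * a / (k_m + a)


def biomass_yield(a, y_m):
    """Eq. (1), right: total biomass growth (OD units) for a uM methionine."""
    return y_m * a


def consumption_coefficient(y_m, od_to_dry_weight=0.5):
    """chi = 1/alpha3 (uM per OD) converted to umol per g dry weight."""
    return (1.0 / y_m) / od_to_dry_weight


# Back-calculated (illustrative) yield: 1/alpha3 = 72.525 uM per OD unit
print(consumption_coefficient(1 / 72.525))
# At a = K_M the Monod rate is half of lam_max (hypothetical values)
print(monod_rate(2.0, lam_max=1.0, k_m=2.0))
```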

In the chloramphenicol dose-response assay, symbolic regression proposed the following model:

$$\lambda=\frac{{\beta }_{1}}{1+{\beta }_{2}d},$$
(2)

where β1 and β2 are constants and d is the chloramphenicol concentration. Identifying β1 = λ0 (drug-free growth rate) and β2 = 1/d50 (inverse half-inhibition concentration), this expression corresponds to the Langmuir form recovered in the bacteriostatic limit of a mechanistic model for ribosome-targeting antibiotic inhibition37 (see derivation in Methods). The fitted parameters (β1 = 0.58 h−1, 1/β2 = 7.55 μM) are consistent with our experimental data (Fig. 6D).
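Substituting the fitted values into Eq. (2) makes the meaning of 1/β2 concrete: at d = 1/β2 = 7.55 μM the predicted growth rate is exactly half of λ0. The Python sketch below (hypothetical function name; parameter values taken from the text) evaluates the fitted law at a few concentrations within the tested 0–16 μM range.

```python
def inhibited_rate(d, lam0=0.58, d50=7.55):
    """Eq. (2): growth rate (1/h) at chloramphenicol concentration d (uM).

    lam0 = beta1 (drug-free rate) and d50 = 1/beta2, using the fitted values
    reported in the text.
    """
    return lam0 / (1.0 + d / d50)


# Evaluate the fitted law at zero drug, at d50, and at the highest tested dose
for d in (0.0, 7.55, 16.0):
    print(d, round(inhibited_rate(d), 3))
```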

These results demonstrate that symbolic regression can extract mathematical expressions with interpretable parameters directly from microbial growth data, providing a proof-of-concept for the automated discovery of empirical laws describing responses to ecological perturbations such as nutrient limitation and antibiotic exposure.

Kinbiont ranks the impact of stressors on growth-phase-specific parameters

A different class of problems in ecology concerns feature identification. Microbes are exposed to multiple environmental stressors, such as antibiotics and agrochemicals, but how do interactions among these stressors impact microbial physiology and ecosystem ecology?38

To address this question, we applied Kinbiont to the dataset from Ref. 22, which recently profiled the growth kinetics of 12 bacterial isolates (including strains of A. fischeri, N. soli, and E. coli, among others, plus a mixture of 10 strains) in response to combinations of eight freshwater pollutants. These included antibiotics (Amoxicillin, Oxytetracycline) and common agrochemicals (Tebuconazole, Metaldehyde, Chlorothalonil, Imidacloprid, Diflufenican, and Glyphosate), tested in a total of 255 unique combinations, resulting in 15120 growth curves. After background subtraction, we fitted the entire dataset with the Richards model (Eq. (M8), Methods) to extract the duration of the lag phase, the exponential growth rate, and the maximum biomass concentration. Approximately 10% of growth curves were discarded because their average relative error exceeded a threshold (Methods, Supplementary Fig. 12).

For each observable, we performed decision tree regression using the Gini impurity score. Impurity scores quantify how markedly a given stressor separates data into distinct groups, which we characterized by differences in the average growth parameters between conditions with and without the stressor (greater reductions in impurity indicate more pronounced effects of stressors on microbial growth dynamics). We assessed the robustness of these classifications using 10-fold cross-validation across different tree depths. Interestingly, the coefficients of determination (R2 values) indicate that the impact of a given chemical is not necessarily relevant across all observables of a given strain. For example, we obtained a cross-validation coefficient of determination R2 > 0.5 for all observables of N. soli, but only for the lag of A. fischeri (Methods, Supplementary Figs. 13–15).
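The notion of impurity reduction can be illustrated for a single split on the presence or absence of one stressor. The sketch below is not Kinbiont's implementation: it measures impurity by the variance of the growth parameter, the usual criterion for regression trees (the Gini index is its classification counterpart), and uses hypothetical growth rates. Impurity-based importance scores are typically obtained by summing such decreases, weighted by the fraction of samples reaching each node, over all nodes that split on a given feature.

```python
"""Impurity decrease for one presence/absence split in a regression tree.

Sketch assuming variance as the impurity measure; all data are hypothetical.
"""
from statistics import pvariance


def impurity_decrease(values, present):
    """Weighted variance reduction when splitting on a binary stressor flag."""
    with_s = [v for v, p in zip(values, present) if p]
    without_s = [v for v, p in zip(values, present) if not p]
    n = len(values)
    # Impurity of the two child nodes, weighted by their sample counts
    child = (len(with_s) * pvariance(with_s)
             + len(without_s) * pvariance(without_s)) / n
    return pvariance(values) - child


# Hypothetical growth rates (1/h) in eight conditions
rates = [0.80, 0.78, 0.82, 0.79, 0.41, 0.43, 0.40, 0.42]
stressor_a = [False, False, False, False, True, True, True, True]
stressor_b = [False, True, False, True, False, True, False, True]
print(impurity_decrease(rates, stressor_a), impurity_decrease(rates, stressor_b))
```

Here stressor A separates slow-growing from fast-growing conditions and yields a large decrease, whereas stressor B is spread evenly across both groups and yields almost none, which is the contrast the importance scores are meant to capture.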

Next, we ranked stressors for each strain and parameter by using the impurity importance score computed from decision-tree splits. As shown in Fig. 7, the score values indicate that the response to a chemical is highly species-specific. More surprisingly, they reveal an intra-strain differential response across the various phases of growth (Fig. 7A, C, E). For example, Imidacloprid significantly affects the growth rate in the species N. soli, while the maximum biomass concentration is mostly affected by Oxytetracycline, and the duration of the lag phase is mainly affected by Tebuconazole. Importantly, summary metrics like the commonly used area under the curve (AUC) cannot resolve growth-phase-specific stressor effects. As shown in Supplementary Fig. 10, the AUC primarily serves as a proxy for the culture’s saturation OD over a fixed time interval. However, a given AUC value can be highly degenerate with respect to growth rate or lag phase duration (Supplementary Fig. 10A, C).

Fig. 7: Characterization of chemical perturbations.

A, C, and E Heatmaps displaying the normalized impurity scores (scaled between 0 and 1) for each chemical, strain, and kinetic parameter from the decision trees evaluated at maximum depth (Methods). High values indicate that the growth property is perturbed by the chemical, while low values suggest that the strain’s kinetic parameter remains unaffected. B, D, and F Distribution splitting analysis for each strain and parameter, illustrating the effect of the presence or absence of the most relevant chemical (indicated by the highest impurity score). Each panel compares the growth-parameter distributions with the chemical identified as most relevant by the decision tree either present or absent. Violin plots include individual data points and an internal box-plot marking the 0.25, 0.50, and 0.75 quantiles. The accompanying two-sided Wilcoxon P value reports the statistical significance of the difference between conditions.

More broadly, by efficiently identifying which perturbations impact individual growth parameters, this framework can be used to guide targeted interventions that steer microbial behavior toward specific goals—for example, slowing the growth rate or reducing biomass density.

Kinbiont resolves interactions in combinations of stressors

Beyond ranking individual stressors, Kinbiont can also characterize the impact of interactions among stressors through decision tree analysis. To illustrate how decision trees graphically capture these interactions, we present a simplified example in Supplementary Fig. 11. Building on this intuitive picture, we analyzed decision trees constructed for N. soli and the mixture of strains from the previously described dataset, which display a consistent coefficient of determination R2 > 0.5 across all the different observables (Methods, Supplementary Figs. 13–15).

Figure 8 shows the first two levels of the maximum depth trees for each kinetic parameter, together with the distribution of parameter values at each leaf (Methods). We chose to limit the visualization to the first two tree levels because only a few features (two or three) have high importance scores for these strains. Thus, the top nodes effectively capture most of the variation distinguishing sample clusters (see Fig. 7). Notably, all chemicals displayed at this level significantly split the distributions (P values from the two-tailed Wilcoxon test), confirming the decision tree’s ability to resolve the chemical landscape beyond the single stressor with the most significant effect.

Fig. 8: Profiling stressor combinations with Kinbiont.

Graphical representation of the first two levels of the maximum depth decision trees for the different kinetic parameters inferred for the mixture of strains (A–C) and for the strain N. soli (D–F). Dashed branches mark leaves where the chemicals from the parent node are absent; solid branches mark leaves where they are present. For every leaf, violin plots (with individual points and an internal box-plot showing the 0.25, 0.50, and 0.75 quantiles) display the distribution of the corresponding growth parameter. Two-sided Wilcoxon P values accompanying each pair of violins quantify how significantly the split separates the two distributions. In addition to identifying the chemicals with the most significant statistical effects (e.g., the pronounced impact of Oxytetracycline on the maximum biomass concentration of the mixture in panel B), this graphical representation can also be used to explore chemical interactions. For instance, it can detect synergistic interactions between Oxytetracycline and Imidacloprid in E, and antagonistic interactions between Tebuconazole and Oxytetracycline in D and F.

Expanding on the graphical representation of decision trees, Kinbiont can be used to identify synergistic or antagonistic interactions. Starting with the isolates mixture (Fig. 8A–C), we observe a weak synergistic effect between Oxytetracycline and Metaldehyde in the distribution of growth rates, leading to a mean increase in generation time of more than 10% (mean doubling time gain of 20 min; Fig. 8A). In N. soli, the distribution of growth rates at the leaf with Tebuconazole but no Oxytetracycline is shifted to lower values compared to the leaf where both chemicals are present, indicating a small antagonistic effect between these chemicals on the growth rate (Fig. 8D). This combination of chemicals exhibits a similar antagonistic effect on the lag phase of this strain (Fig. 8F), whereas the maximum biomass concentration of N. soli is synergistically affected by Oxytetracycline and Imidacloprid, as the presence of both chemicals results in a distribution with a lower mean (Fig. 8E).

These results illustrate how Kinbiont can be used to generate testable hypotheses about chemical interactions from large datasets to inform combined interventions. As a concrete example, our decision-tree analysis ranks the pair Oxytetracycline and Imidacloprid as causing the largest drop in maximum biomass for N. soli (Fig. 8E), whereas the Oxytetracycline–Tebuconazole split most strongly separates growth rates, with the slowest growth observed for Tebuconazole in the absence of Oxytetracycline (Fig. 8D). Since the killing rate of many antibiotics scales linearly with growth rate39, an informed next step would be a three-condition time-kill assay: (i) Imidacloprid alone, (ii) Tebuconazole alone (strongest growth-rate suppression), and (iii) Oxytetracycline and Imidacloprid (largest biomass drop). Comparing these conditions would distinguish whether biomass suppression or growth-rate suppression accelerates bacterial clearance. In this way, Kinbiont narrows dozens of potential chemical combinations to a small set that merits immediate experimental validation.

Discussion

Kinbiont is an open-source package developed to enhance microbial data interpretation by combining dynamical modeling with interpretable ML methods. It provides a systematic way to characterize time-series data, enabling the generation of quantitative hypotheses about ecological and evolutionary responses that can subsequently be tested experimentally.

Kinbiont leverages Julia’s extensive ecosystem of numerical methods, providing computational advantages over traditional tools for intensive tasks like optimization and numerical integration15. In addition to built-in classical microbial models and ODE-based systems (see Supplementary Table 1 and Supplementary Information), Kinbiont supports user-defined models compatible with virtually any closed-form equation or ODE system. An important innovation of Kinbiont is the segmented fitting approach, which combines signal processing, model selection, and parameter optimization to automatically detect and characterize distinct growth phases in multiphase microbial time-series data (Fig. 4). Subsequent downstream ML analyses then link these inferred parameters quantitatively to experimental variables.

How can such an integrated framework translate microbial growth data into biological insights and testable hypotheses? First, by quantifying growth parameters from standard to more complex microbial kinetics, Kinbiont allows for systematic analyses of microbial behavior in broader contexts. For example, in bioproduction processes, the relationship between microbial growth rate and product titer often exhibits non-monotonic behavior40. By fitting multidimensional datasets using suitable models (e.g., Eq. (M2)), Kinbiont estimates variables relevant to process optimization, such as biomass growth and metabolic yields. These variables can directly inform strain selection and feeding strategies to implement optimal growth-rate regimes that maximize productivity per nutrient41. Similarly, by using Kinbiont to construct response profiles from multiphase datasets, users can identify behavioral transitions such as diauxic shifts and investigate large-scale datasets that probe ecological processes, for example microbial growth in multi-resource environments19.

With symbolic regression, Kinbiont automates the identification of mathematical relationships between microbial observables and experimental variables, extending the pioneering work of Monod to modern datasets. This approach proposes candidate empirical laws whose validity and generality can be assessed in different microbial systems, environments, and perturbations. Robust mathematical relationships identified in this way can serve as a starting point to reverse-engineer the biological mechanisms leading to observed responses—much like Monod’s insight that microbial growth is limited by nutrient uptake (an enzymatic process)1—and enable exploring their biological implications through testable hypotheses. By combining experimental data with symbolic regression within the Kinbiont framework, we introduce automated empirical law detection in microbiology. In addition to characterizing auxotrophic responses to amino acid limitation, we used this method to identify empirical dose-response relationships relevant to pharmacological studies.

In parallel, decision trees complement symbolic regression by highlighting environmental and genetic conditions that significantly affect microbial growth parameters. Our analysis of a recent ecotoxicological study22 illustrates how integrating model-based parameter inference with decision trees reveals species- and growth-phase-specific responses to chemical stressors. A similar procedure can be applied to other datasets to inform targeted interventions, such as selecting specific stressors or their combinations to modulate microbial growth rates and biomass accumulation in clinical or environmental contexts, or adapted to evolutionary studies by incorporating genotypes as predictive features and modeling phenotypic variations among microbial variants.

Despite these strengths, automated model-based data analysis tools face general challenges related to identifiability and model selection. In over-parametrized models, some data and noise combinations can leave specific parameters unresolved because multiple parameter sets fit equally well42, weakening robustness and limiting biological interpretation. Currently, Kinbiont employs a linear parsimony criterion balancing model fit and complexity, but integrating complementary approaches, like Bayesian posterior estimation methods over replicates, could improve robustness and accuracy in model selection43,44. Similarly, while symbolic regression in Kinbiont has been successful using basic arithmetic operators, extending it to include dimensional constraints and biologically informed priors could substantially improve interpretability, convergence, and biological relevance, particularly for complex microbial response patterns. Another specific challenge concerns segmented fitting: there is no universal rule for picking the optimal number and placement of segments across individual curves or replicate datasets, and supplying an overly large library of candidate segment models can drive the algorithm toward over-segmentation, producing more break-points than are biologically warranted. Establishing standardized approaches for these advanced features is an important direction for Kinbiont's future development. This goal will become progressively more achievable as the community produces datasets that probe a broader range of biological phenomena and capture non-standard growth profiles.

Beyond methodological refinements, applying Kinbiont to clinical and environmental datasets will require addressing practical challenges related to data availability, inherent biological variability, and measurement noise. Future iterations could integrate additional machine learning techniques, including explainable boosting machines45 and non-linear mixed-effects models46, both well-suited for high-dimensional datasets and capable of capturing nonlinear relationships between variables (interactions). Expanding these capabilities will broaden the applicability of Kinbiont to areas like metabolic engineering for predicting metabolic pathway perturbation effects, and pathogen-host studies for identifying genetic determinants of infection outcomes.

In summary, we establish a pipeline that streamlines data analysis and supports theoretical formulation in microbiology. Its modular, extensible design ensures adaptability to future computational advancements and emerging research needs, positioning Kinbiont as a long-term resource for microbial discovery.

Methods

Experimental methods

Bacterial culture conditions

Growth experiments were performed at 37 °C with 800 rpm orbital shaking in 48-well microplates, with 1 mL of culture per well. Optical density (OD, 600 nm) was measured every 6 min using a BioTek Synergy H1 microplate reader. Each experiment included at least two blank wells containing only growth medium. For all experiments, M9 Minimal Medium was prepared by diluting a 5X stock solution of M9 salts (Sigma Aldrich) 1:5 in deionized (milliQ) water. The medium was then supplemented with 0.24 g/L MgSO4, 0.011 g/L CaCl2 and 0.36 g/L D-glucose as the sole carbon source.

All strains used in the experiments were stored at −80 °C as glycerol stocks, streaked on Luria-Bertani (LB) agar plates, and grown overnight at 37 °C. The following day, a single colony was inoculated into 15 mL of LB broth in a 100 mL Erlenmeyer flask and grown overnight at 37 °C with 150 rpm orbital shaking. Once the culture reached saturation, its OD was measured using a ThermoFisher Genesys spectrophotometer, and the culture was diluted to an initial OD of 0.01 for microplate experiments.

Amino acid limitation experiment data generation

E. coli K-12 MG1655 ΔmetA knockouts (provided by Martin Ackermann’s Lab, ETH) were used as methionine auxotrophs. L-methionine (Sigma Aldrich) was added to M9 Minimal Medium at final concentrations ranging from 3.125 to 80 μM. Each concentration was assayed in biological duplicate, with three technical replicates per biological replicate.

Chloramphenicol dose response

E. coli K-12 BW25113 obtained from the DSM-Z library was used in the experiments. Chloramphenicol (Sigma Aldrich) was prepared as a stock solution in 96% ethanol and stored at −20 °C. Before each experiment, the stock solution was diluted to final concentrations ranging from 2 to 16 μM.

Calibration for optical density measurements

This calibration was performed to correct for multiple scattering effects, which can cause underestimation of the OD measurements by microplate readers24, and to enable comparison of OD measurements between the BioTek Synergy H1 microplate reader and the ThermoFisher Genesys spectrophotometer (used as the reference instrument).

An overnight culture of E. coli K-12 BW25113 grown in M9 medium supplemented with 0.9 g/L D-glucose was centrifuged and resuspended in phosphate-buffered saline (PBS) to achieve an initial OD of approximately 4.3. This suspension was then serially diluted in PBS to create nine additional suspensions with varying cell concentrations.

Each suspension (1 mL) was transferred to three wells of a 48-well microplate, and OD (600 nm) measurements were performed using the BioTek Synergy H1 microplate reader. For spectrophotometer measurements, each suspension was further diluted in PBS to obtain a final OD within the linear range of the instrument (0.01-0.5). Three cuvettes per suspension were measured at 600 nm using the ThermoFisher Genesys spectrophotometer. Calibration was performed by comparing the microplate reader measurements to the spectrophotometer measurements, adjusting for the dilution factor.

Dry weight estimation

Stationary phase bacterial cultures grown in 100 mL of M9 + 10 mM glucose in 500 mL flasks were harvested and centrifuged at 8000 × g for 10 min at 4 °C. The supernatant was removed, and the cell pellets were resuspended in milliQ water and centrifuged again with the same settings. The supernatant was again removed, and the pellets were resuspended in milliQ water. In each experiment, the resulting suspension was diluted into several suspensions with different bacterial concentrations. A 15 mL aliquot of each suspension was transferred to a glass vial that had been pre-warmed (at least 24 h at 90 °C) and pre-weighed on an analytical balance. The vials were then left to dry in an unsealed incubator at 90 °C for 48 h, after which each vial was weighed again on the same balance. After accounting for dilution factors, the coefficient of conversion from OD to mg/mL of dry weight was determined by linear regression with the intercept constrained to 0.
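The through-origin regression used for the OD-to-dry-weight coefficient has a closed form: minimizing Σ(wᵢ − c·ODᵢ)² over c gives c = Σ ODᵢwᵢ / Σ ODᵢ². The Python sketch below computes the dry weight of each suspension from the vial masses and then the coefficient; the function names and the numbers in the usage lines are hypothetical and serve only as an illustration.

```python
def dry_weight_mg_per_ml(vial_before_mg, vial_after_mg, volume_ml=15.0):
    """Dry biomass concentration of one suspension from the vial masses."""
    return (vial_after_mg - vial_before_mg) / volume_ml


def od_to_dry_weight_coefficient(od, dry_weight):
    """Least-squares slope of dry_weight = c * od with the intercept fixed at 0.

    Minimizing sum((w_i - c * od_i) ** 2) gives c = sum(od_i * w_i) / sum(od_i ** 2).
    """
    sxx = sum(x * x for x in od)
    if sxx == 0.0:
        raise ValueError("at least one non-zero OD value is required")
    return sum(x * w for x, w in zip(od, dry_weight)) / sxx


# Hypothetical calibration points: OD (already corrected for dilution) and mg/mL.
od = [0.5, 1.0, 2.0, 4.0]
dw = [0.2, 0.4, 0.8, 1.6]
coefficient = od_to_dry_weight_coefficient(od, dw)  # mg/mL per OD unit
```

Fixing the intercept at zero encodes the physical constraint that a suspension without cells has neither optical density nor dry mass.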

Computational analyses

Data input

Kinbiont can accept data directly through Julia notebooks/scripts or by working with external data files:

  • Julia notebook/script: Input data are provided as an m × n matrix, where n is the number of time points, and m is the number of measured quantities plus one. The first row contains time points, while subsequent rows represent measurements (e.g., optical density (OD), colony-forming units, or nutrient concentrations).

  • External .csv files: Users can specify paths to a primary data file and an optional annotation file. The primary data file must be formatted as a matrix, with the first row containing identifiers for each time series, and the first column containing time points. Subsequent columns hold numerical data for each measured case. Annotation files, if provided, are structured as two-column .csv files: the first column lists the names of time series, and the second column includes unique identifiers for biological replicates, optionally accompanied by an exclusion flag.

Note: Currently, external file handling is supported only for single-dimensional datasets (e.g., optical density measurements). Detailed formatting instructions and examples are provided in the GitHub repository23.
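To make the two input layouts concrete, the Python sketch below converts a .csv table in the external-file layout (time in the first column, one series per column) into the row-wise matrix layout described for notebooks (time points in the first row). It only illustrates the transposition under these assumptions. It is not Kinbiont's own parser, and the function name `csv_to_matrix` is hypothetical.

```python
import csv
import io


def csv_to_matrix(text):
    """Transpose a column-wise growth-curve table into row-wise form.

    Input: first row = [time label, series id 1, series id 2, ...];
    first column = time points; other columns = measurements.
    Output: (series ids, matrix) where matrix[0] holds the time points and
    matrix[j] the j-th time series, i.e. an m x n layout with
    m = number of series + 1 and n = number of time points.
    """
    rows = [r for r in csv.reader(io.StringIO(text)) if r]  # skip empty lines
    header, body = rows[0], rows[1:]
    values = [[float(v) for v in r] for r in body]
    matrix = [list(column) for column in zip(*values)]  # columns become rows
    return header[1:], matrix


table = "time,A1,A2\n0,0.010,0.020\n0.1,0.012,0.021\n0.2,0.015,0.023\n"
ids, matrix = csv_to_matrix(table)
```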

Preprocessing of bacterial growth curves

Kinbiont provides the following pre-processing options to prepare data before model fitting:

  • Background subtraction, correction for negative values, and averaging of replicates. If an annotation file is provided, Kinbiont automatically performs background subtraction, either by averaging blank measurements or applying a rolling average subtraction over time. Negative values resulting from background subtraction can be removed or replaced by values derived from a user-defined threshold or distribution of blank measurements. Additionally, replicate wells sharing a unique identifier can be automatically averaged.

  • Data smoothing. Users can smooth noisy data using either a rolling average with a user-specified window size or locally weighted scatterplot smoothing (LOWESS)47.

  • Correction for multiple scattering in OD measurements. Analyses of optical density data often assume that OD scales linearly with cell number, an assumption that only holds under specific conditions24. To correct deviations caused by multiple scattering, users can supply a calibration curve composed of raw OD readings paired with reference values obtained from measurements not affected by multiple scattering48. Kinbiont implements two methods for correcting raw OD data:

    • Interpolation using the Steffen monotonic interpolation49.

    • Extrapolation by fitting the exponential model proposed by Meyers et al.48:

      $${{\mbox{OD}}}_{{{\rm{corrected}}}}=a\cdot (1-{e}^{-b\cdot {{\mbox{OD}}}_{{{\rm{raw}}}}}),$$

      where a and b are fit parameters. Note: When using interpolation, the calibration dataset must contain the entire range of experimental OD measurements to ensure accurate corrections.
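The preprocessing steps above can be sketched in plain Python, assuming blank subtraction by the mean of the blank wells, replacement of non-positive values by a user-chosen floor, a centered rolling mean, and the exponential calibration model shown above. Fitting a and b by a grid search over b, with the closed-form least-squares a for each b, is our illustrative choice and not necessarily how Kinbiont fits the calibration. All function names are hypothetical, and LOWESS and Steffen interpolation are omitted.

```python
import math


def subtract_blank(signal, blanks, floor=None):
    """Subtract the mean of the blank wells; if `floor` is given,
    values that become zero or negative are replaced by it."""
    background = sum(blanks) / len(blanks)
    corrected = [v - background for v in signal]
    if floor is not None:
        corrected = [v if v > 0 else floor for v in corrected]
    return corrected


def rolling_average(signal, window):
    """Centered rolling mean over `window` points (an odd window is assumed);
    the window is truncated at both ends of the series."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        segment = signal[max(0, i - half):i + half + 1]
        out.append(sum(segment) / len(segment))
    return out


def fit_exponential_calibration(od_raw, od_ref, b_grid):
    """Fit od_ref ~ a * (1 - exp(-b * od_raw)) by grid search over b.

    For a fixed b the model is linear in a, so a has a closed-form
    least-squares value; the (a, b) pair with the smallest RSS is returned.
    """
    best = None
    for b in b_grid:
        f = [1.0 - math.exp(-b * x) for x in od_raw]
        sff = sum(v * v for v in f)
        if sff == 0.0:
            continue
        a = sum(v * y for v, y in zip(f, od_ref)) / sff
        rss = sum((y - a * v) ** 2 for v, y in zip(f, od_ref))
        if best is None or rss < best[0]:
            best = (rss, a, b)
    return best[1], best[2]


def correct_od(od_raw, a, b):
    """Apply the exponential calibration model to raw readings."""
    return [a * (1.0 - math.exp(-b * x)) for x in od_raw]
```

As noted for interpolation, any calibration is only trustworthy within the OD range covered by the calibration data.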

Mathematical models used in this study

The models listed below were used in the analyses presented in this study (in order of appearance) and are referenced throughout the text. A representative list of one-dimensional models implemented in Kinbiont is provided in Supplementary Table 1, with detailed parameter descriptions available in the Supplementary Information. Implementation details can also be found in the GitHub repository23.

ODE models:

  • Heterogeneous Population Model with Inhibition and Death:

    $$\left\{\begin{array}{l}N(t)={N}_{1}(t)+{N}_{2}(t)+{N}_{3}(t)\hfill \\ \frac{d{N}_{1}(t)}{dt}=-\!{r}_{{{\rm{L}}}}\,{N}_{1}(t)\hfill \\ \frac{d{N}_{2}(t)}{dt}=\,\,\,{r}_{{{\rm{L}}}}\,{N}_{1}(t)+\mu \,{N}_{2}(t)-{{\mbox{r}}}_{{{\rm{I}}}}\,{N}_{2}(t)\hfill \\ \frac{d{N}_{3}(t)}{dt}=-\!{r}_{{{\rm{D}}}}\,{N}_{3}(t)+{r}_{{{\rm{I}}}}\,{N}_{2}(t).\hfill \end{array}\right.$$
    (M1)

    Here, N1, N2, and N3 represent the populations of dormant, active, and inhibited cells, respectively. Unlike the actively dividing cell population represented by N2, cells in the inhibited state (N3) are metabolically active but do not divide. Model parameters include the growth rate μ, the lag rate for transition from dormant (N1) to active (N2) cells, rL, the inhibition rate governing the transition from active (N2) to inhibited cells (N3), rI, and the death rate for inhibited cells, rD. We assume all cells initially start in the dormant state (i.e., N1(t = 0) = OD(t = 0), N2(t = 0) = N3(t = 0) = 0).

  • Monod–Ierusalimsky Model for Batch Growth: We implemented a modified version of the Monod-Ierusalimsky model28 to describe microbial batch growth dynamics, specifically accounting for biomass growth and inhibitory byproduct (ethanol) production. The model uses biomass ([B]), substrate ([S]), and ethanol ([EtOH]) concentrations as state variables. The specific growth rate (μ) depends on substrate concentration following Monod kinetics:

    $$M([S])={\mu }_{\max }\frac{[S]}{{K}_{S}+[S]},$$

    where \({\mu }_{\max }\) denotes the maximum specific growth rate, and KS is the half-saturation constant. Ethanol inhibition is modeled as:

    $$I([\,{{\mbox{EtOH}}}\,])=\frac{{K}_{E}}{[\,{{\mbox{EtOH}}}\,]+{K}_{E}},$$

    where KE represents the ethanol inhibition constant. Additionally, we included a lag phase adjustment factor:

    $$\,{{\mbox{Lag}}}(t)=\frac{t}{t+{t}_{{{\rm{L}}}}},$$

    where tL characterizes the adaptation time during the initial lag phase. The effective growth rate is then computed as the product of these three factors:

    $$\mu=M([S])\cdot I([\,{{\mbox{EtOH}}}\,])\cdot \,{{\mbox{Lag}}}\,(t).$$

    The resulting differential equations governing the dynamics are:

    $$\left\{\begin{array}{l}\frac{d[B]}{dt}=\mu [B],\hfill \\ \frac{d[S]}{dt}=-\!\mu [B]{\beta }_{s},\hfill \\ \frac{d[\,{\mbox{EtOH}}\,]}{dt}=\mu [B]{\beta }_{e},\hfill \end{array}\right.$$
    (M2)

    where βs and βe represent the inverse yield coefficients for substrate consumption and ethanol production, respectively.

  • Exponential:

    $$\frac{dN(t)}{dt}=\mu \,N(t),$$
    (M3)

    where μ is the growth rate.

  • Logistic50:

    $$\frac{dN(t)}{dt}=\mu \left(1-\frac{N(t)}{{N}_{{{\rm{max}}}}}\right)N(t),$$
    (M4)

    where μ is the growth rate, and Nmax the maximum biomass concentration.

  • Heterogeneous Population Model (HPM)26:

    $$\left\{\begin{array}{l}N(t)={N}_{1}(t)+{N}_{2}(t)\hfill \\ \frac{d{N}_{1}(t)}{dt}=-\!{r}_{{{\rm{L}}}}\,{N}_{1}(t)\hfill \\ \frac{d{N}_{2}(t)}{dt}=\,\,\,{r}_{{{\rm{L}}}}\,{N}_{1}(t)+\mu \,{N}_{2}(t)\,\left(1-\frac{{N}_{1}(t)+{N}_{2}(t)}{{N}_{{{\rm{max}}}}}\right),\hfill \end{array}\right.$$
    (M5)

    where N1 is the population of dormant cells, N2 is the population of active cells capable of duplicating, μ is the growth rate, Nmax is the maximum biomass concentration, and rL is the lag rate, defined as the rate of transition between the N1 and N2 populations. Here, we assume that all cells are in the dormant state at the start (i.e., N1(t = 0) = OD(t = 0), and N2(t = 0) = 0).

  • Exponential HPM:

    $$\left\{\begin{array}{l}N(t)={N}_{1}(t)+{N}_{2}(t)\hfill \\ \frac{d{N}_{1}(t)}{dt}=-\!{r}_{{{\rm{L}}}}\,{N}_{1}(t)\hfill \\ \frac{d{N}_{2}(t)}{dt}=\,\,\,{r}_{{{\rm{L}}}}\,{N}_{1}(t)+\mu \,{N}_{2}(t),\hfill \end{array}\right.$$
    (M6)

    where, as in the HPM, N1 and N2 refer to the populations of dormant and active cells, respectively. μ is the growth rate, and the lag rate rL denotes the rate of transition between the N1 and N2 populations. Here, we also assume that all cells are in the dormant state at the start (i.e., N1(t = 0) = OD(t = 0), and N2(t = 0) = 0).

  • Adjusted Heterogeneous Population Model:

    $$\left\{\begin{array}{l}N(t)={N}_{1}(t)+{N}_{2}(t)\hfill \\ \frac{d{N}_{1}(t)}{dt}=-\!{r}_{{{\rm{L}}}}\,{N}_{1}(t)\hfill \\ \frac{d{N}_{2}(t)}{dt}={r}_{{{\rm{L}}}}\,{N}_{1}(t)+\mu \,{N}_{2}(t)\,\left(1-{\left(\frac{{N}_{1}(t)+{N}_{2}(t)}{{N}_{{{\rm{max}}}}}\right)}^{\!m}\right),\hfill \end{array}\right.$$
    (M7)

    where, similarly to all variants of the HPM model, N1 and N2 refer to the populations of dormant and active cells, respectively. μ is the growth rate, Nmax the maximum biomass concentration, rL is the lag rate (i.e., the rate of transition between N1(t) and N2(t)) and m a shape constant. Here, we also assume that all cells are in the dormant state at the start (i.e., N1(t = 0) = OD(t = 0), and N2(t = 0) = 0).
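Kinbiont solves these systems with Julia's ODE solvers. As a language-neutral illustration, the Python sketch below writes the right-hand sides of Eqs. (M1), (M2) and (M5)–(M7) and integrates them with a fixed-step fourth-order Runge–Kutta scheme. The step size, the parameter values in the usage line and the clamp that keeps the substrate non-negative are assumptions of this sketch, and the function names are hypothetical.

```python
import math


def rk4(rhs, y0, t0, t1, dt):
    """Integrate dy/dt = rhs(t, y) with fixed-step fourth-order Runge-Kutta.

    Returns a list of (t, state) pairs including the initial condition.
    """
    n_steps = int(round((t1 - t0) / dt))
    t, y = t0, list(y0)
    trajectory = [(t, list(y))]
    for step in range(n_steps):
        k1 = rhs(t, y)
        k2 = rhs(t + dt / 2, [v + dt / 2 * k for v, k in zip(y, k1)])
        k3 = rhs(t + dt / 2, [v + dt / 2 * k for v, k in zip(y, k2)])
        k4 = rhs(t + dt, [v + dt * k for v, k in zip(y, k3)])
        y = [v + dt / 6 * (a + 2 * b + 2 * c + d)
             for v, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t = t0 + (step + 1) * dt
        trajectory.append((t, list(y)))
    return trajectory


def hpm_inhibition_death(t, y, mu, r_l, r_i, r_d):
    """Eq. (M1): dormant (N1), active (N2) and inhibited (N3) cells."""
    n1, n2, n3 = y
    return [-r_l * n1,
            r_l * n1 + mu * n2 - r_i * n2,
            -r_d * n3 + r_i * n2]


def hpm_adjusted(t, y, mu, r_l, n_max, m=1.0):
    """Eq. (M7); m = 1 gives the HPM (M5), n_max = math.inf the exponential HPM (M6)."""
    n1, n2 = y
    crowding = ((n1 + n2) / n_max) ** m
    return [-r_l * n1, r_l * n1 + mu * n2 * (1.0 - crowding)]


def monod_ierusalimsky(t, y, mu_max, k_s, k_e, t_l, beta_s, beta_e):
    """Eq. (M2): biomass B, substrate S and ethanol E."""
    b, s, e = y
    s = max(s, 0.0)  # guard against a slightly negative substrate from the fixed step
    mu = mu_max * s / (k_s + s) * k_e / (e + k_e) * t / (t + t_l)
    return [mu * b, -mu * b * beta_s, mu * b * beta_e]


# Usage: HPM with all cells dormant at t = 0, as assumed in the text.
hpm = rk4(lambda t, y: hpm_adjusted(t, y, 0.5, 0.2, 1.0), [0.01, 0.0], 0.0, 100.0, 0.1)
```

Because Eq. (M2) consumes substrate and produces ethanol at rates proportional to the same factor μ[B], the combination βe[S] + βs[EtOH] is conserved, which gives a quick sanity check for any numerical solution.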

NL models:

  • Richards model51:

    $$N(t)=\frac{{N}_{{{\rm{max}}}}}{{[1+\eta {e}^{-\mu (t-{t}_{{{\rm{L}}}})}]}^{\frac{1}{\eta }}},$$
    (M8)

    where μ is the growth rate, Nmax is the maximum biomass concentration, tL the lag time, and η a shape constant.
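Unlike the ODE models, the Richards model (M8) is evaluated in closed form. A minimal Python version, with a hypothetical function name, follows. With η = 1 it reduces to a logistic curve whose midpoint lies at tL.

```python
import math


def richards(t, n_max, mu, t_lag, eta):
    """Eq. (M8): N(t) = Nmax / [1 + eta * exp(-mu * (t - tL))] ** (1 / eta)."""
    return n_max / (1.0 + eta * math.exp(-mu * (t - t_lag))) ** (1.0 / eta)


curve = [richards(0.5 * i, n_max=1.0, mu=0.5, t_lag=5.0, eta=1.0) for i in range(40)]
```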

Parameter inference and model fitting

In Kinbiont, parameter inference is framed as a nonlinear optimization problem. Consider a dataset consisting of n observations collected at time points ti,  i = 1, . . . , n, each observation having m measured dimensions. Let Dj(ti) denote the experimental data for dimension j, and \({\hat{N}}_{j}({t}_{i},P)\) represent the numerical solution of the corresponding ODE evaluated at time ti with parameters P. We define an objective function as:

$${{\mathcal{L}}}(P):=\frac{1}{n}{\sum }_{i=1}^{n}{\sum }_{j=1}^{m}{{\mathcal{D}}}({D}_{j}({t}_{i}),{\hat{N}}_{j}({t}_{i},P)),$$
(M9)

where \({{\mathcal{D}}}({D}_{j}({t}_{i}),{\hat{N}}_{j}({t}_{i},P))\) represents the distance metric (e.g., squared error) between the observed data dimension Dj(ti) and its corresponding numerical prediction \({\hat{N}}_{j}({t}_{i},P)\). Parameter inference thus becomes an optimization task aiming to identify the optimal set of parameters P* minimizing this objective:

$${{\mathcal{L}}}({P}^{*})=\min \left({{\mathcal{L}}}(P)\right).$$
(M10)

Kinbiont uses the package Optimization.jl29, which integrates more than 100 optimization schemes, including local and global optimization methods, derivative-free methods, gradient-based approaches, constrained optimization, and multi-start techniques. This selection enables users to choose the most appropriate method based on their time constraints, the complexity of the loss function, or specific performance requirements.

Various loss functions are implemented in Kinbiont, such as the L2 norm and the relative error. In the main text analyses of one-dimensional (e.g., OD) data, we used the latter, defined as

$${{\mathcal{L}}}(P)=\frac{1}{n}{\sum }_{i=1}^{n}\frac{1}{2}\,{\left(1-\frac{D({t}_{i})}{\hat{N}({t}_{i},P)}\right)}^{\!2},$$
(M11)

where D(ti) and \(\hat{N}({t}_{i},P)\) denote observed and predicted values, respectively, at each time point ti. A complete list of available loss functions is provided in the Supplementary Information and on the GitHub repository23.
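The two losses mentioned above differ mainly in how they weight small and large signals. In the Python sketch below (function names hypothetical), `objective` follows Eq. (M9) with a pluggable distance, and `relative_error_loss` follows Eq. (M11). Because the relative error divides by the prediction, the same absolute deviation of 0.01 costs far more at OD 0.02 than at OD 1.0, which tends to give early, low-density time points more weight than the L2 loss does.

```python
def objective(data, pred, distance):
    """Eq. (M9): data[i][j] and pred[i][j] hold dimension j at time t_i."""
    n = len(data)
    return sum(distance(d, p)
               for d_i, p_i in zip(data, pred)
               for d, p in zip(d_i, p_i)) / n


def relative_error_loss(data, pred):
    """Eq. (M11): mean over time points of 0.5 * (1 - D / N_hat) ** 2."""
    return sum(0.5 * (1.0 - d / p) ** 2 for d, p in zip(data, pred)) / len(data)


def l2_loss(data, pred):
    """Mean squared error for one-dimensional data."""
    return sum((d - p) ** 2 for d, p in zip(data, pred)) / len(data)


# Same absolute error (0.01) at low and high density.
low = relative_error_loss([0.01], [0.02])   # 0.125
high = relative_error_loss([0.99], [1.0])   # about 5e-5
```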

Model selection

Selecting an appropriate growth model for bacterial kinetics can involve trial and error due to the non-mechanistic nature of macroscopic models25. Currently, the automated model selection feature in Kinbiont is available only for single-dimensional datasets and uses the Akaike Information Criterion (AIC)31.

Specifically, Kinbiont fits all candidate models from a user-defined list and evaluates the AIC according to:

$$\,{{\mbox{AIC}}}=\beta k+n\,{{\mbox{Log}}}\left({{\mbox{RSS}}}({P}^{*})\right),$$
(M12)

where k is the number of model parameters, β is a user-defined penalty, n is the number of data points, and RSS is the residual sum of squares at the optimal parameters P*. Kinbiont returns the AIC scores for all tested models and identifies the model with the lowest AIC as the best-fitting model. Users also have the option of calculating the corrected AIC (AICc) for datasets with a small number of points, given by:

$$\,{{\mbox{AIC}}}c={{\mbox{AIC}}}\,+\frac{\beta (k+1)(k+2)}{(n-k-2)}.$$
(M13)
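The AIC-based selection of Eqs. (M12) and (M13) reduces to a few lines once each candidate model has been fitted. The sketch below assumes that the number of parameters k and the RSS at the optimum are already known for each model; the RSS values in the usage line are invented for illustration, and the function names are hypothetical.

```python
import math


def aic(k, n, rss, beta=2.0):
    """Eq. (M12): beta * k + n * log(RSS)."""
    return beta * k + n * math.log(rss)


def aicc(k, n, rss, beta=2.0):
    """Eq. (M13): AIC + beta * (k + 1) * (k + 2) / (n - k - 2); needs n > k + 2."""
    if n <= k + 2:
        raise ValueError("AICc requires n > k + 2")
    return aic(k, n, rss, beta) + beta * (k + 1) * (k + 2) / (n - k - 2)


def select_model(fits, n, beta=2.0, corrected=False):
    """fits maps model name -> (k, RSS at the optimum).

    Returns the name with the lowest score and the dictionary of all scores.
    """
    score = aicc if corrected else aic
    scores = {name: score(k, n, rss, beta) for name, (k, rss) in fits.items()}
    return min(scores, key=scores.get), scores


best, scores = select_model({"exponential": (1, 5.0), "logistic": (3, 0.5)}, n=50)
```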

Sensitivity analysis

Kinbiont includes an automatic sensitivity analysis feature to assess whether parameter inference is stable with respect to parameter initialization. Currently implemented only for single-dimensional data, this feature employs the Morris method for global sensitivity analysis32. The Morris method efficiently identifies inputs with significant impacts on model outputs by systematically varying the initial guesses, one at a time, and re-evaluating the optimization outcome. This approach highlights the inputs with the most influence on the inferred parameters, indicating which ones may require more detailed analysis.
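To show what the Morris method computes, the Python sketch below implements elementary-effects screening on the unit hypercube. Each trajectory starts at a random grid point and moves one input at a time by a fixed step Δ, and the mean absolute elementary effect μ* ranks the inputs. In Kinbiont's setting the inputs would be (rescaled) initial guesses and f the optimization outcome; here f is any Python function, and the design parameters (r trajectories, 4 grid levels, Δ = levels / (2(levels − 1))) follow common textbook choices rather than Kinbiont's defaults. The function name is hypothetical.

```python
import random


def morris_screening(f, k, r=20, levels=4, seed=0):
    """Morris elementary-effects screening of f on the unit hypercube [0, 1]^k.

    Returns (mu_star, sigma): mean absolute elementary effect and standard
    deviation of the elementary effects for each of the k inputs.
    """
    rng = random.Random(seed)
    delta = levels / (2.0 * (levels - 1))
    grid = [i / (levels - 1) for i in range(levels)]
    effects = [[] for _ in range(k)]
    for _ in range(r):
        x = [rng.choice(grid) for _ in range(k)]
        fx = f(x)
        order = list(range(k))
        rng.shuffle(order)
        for i in order:  # move one input at a time along the trajectory
            step = delta if x[i] + delta <= 1.0 + 1e-12 else -delta
            x_new = list(x)
            x_new[i] += step
            f_new = f(x_new)
            effects[i].append((f_new - fx) / step)
            x, fx = x_new, f_new
    mu_star = [sum(abs(e) for e in ee) / len(ee) for ee in effects]
    sigma = []
    for ee in effects:
        mean = sum(ee) / len(ee)
        sigma.append((sum((e - mean) ** 2 for e in ee) / max(len(ee) - 1, 1)) ** 0.5)
    return mu_star, sigma


# Toy response: input 0 matters most, input 2 not at all.
mu_star, sigma = morris_screening(lambda x: 3.0 * x[0] + 0.5 * x[1], k=3, r=10)
```

A large μ* flags an influential input, while a large σ relative to μ* points to nonlinear or interacting effects.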

Confidence interval estimation for non-linear fits

When fitting data with a non-linear function, confidence intervals for the inferred parameters, as well as their means, can be estimated using bootstrap or Monte Carlo methods. In the bootstrap method, the user specifies the number of fit repetitions and the portion of data used for each fit. The data are randomly sampled and fitted multiple times, and the 95% confidence interval and mean of each parameter are then calculated from the resulting distribution of fitted parameter values33. In the Monte Carlo approach, the user specifies the number of fit repetitions, and for each fit, random noise drawn from the empirical noise distribution of the blanks (with zero mean) is added to the data. The 95% confidence intervals and parameter means are then derived from the distribution of fit results52.
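The two resampling schemes can be sketched as follows, using the slope of log(OD) versus time as a stand-in for a nonlinear fit (any fitting function with the same signature can be passed instead). We assume that "portion of data" means drawing that fraction of time points without replacement, and we report the mean and the 2.5th and 97.5th percentiles of the fitted values. All names are hypothetical.

```python
import math
import random


def growth_rate(t, od):
    """Stand-in fit: least-squares slope of log(OD) versus time."""
    y = [math.log(v) for v in od]
    tm, ym = sum(t) / len(t), sum(y) / len(y)
    sxy = sum((a - tm) * (b - ym) for a, b in zip(t, y))
    sxx = sum((a - tm) ** 2 for a in t)
    return sxy / sxx


def percentile_summary(values, level=0.95):
    """Mean and percentile interval of a list of fitted values."""
    v = sorted(values)
    lo = v[int(round((1 - level) / 2 * (len(v) - 1)))]
    hi = v[int(round((1 + level) / 2 * (len(v) - 1)))]
    return sum(v) / len(v), (lo, hi)


def bootstrap_ci(t, od, fit, n_rep=200, fraction=0.8, seed=0):
    """Refit on random subsets containing `fraction` of the time points."""
    rng = random.Random(seed)
    m = max(2, int(fraction * len(t)))
    estimates = []
    for _ in range(n_rep):
        idx = sorted(rng.sample(range(len(t)), m))
        estimates.append(fit([t[i] for i in idx], [od[i] for i in idx]))
    return percentile_summary(estimates)


def monte_carlo_ci(t, od, blanks, fit, n_rep=200, seed=0):
    """Refit after adding noise drawn from the zero-mean blank residuals."""
    rng = random.Random(seed)
    blank_mean = sum(blanks) / len(blanks)
    noise = [b - blank_mean for b in blanks]
    estimates = [fit(t, [v + rng.choice(noise) for v in od]) for _ in range(n_rep)]
    return percentile_summary(estimates)
```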

Change point detection algorithm

To accurately analyze microbial kinetics exhibiting multiphase dynamics, Kinbiont implements a segmented fitting procedure that combines automated change point detection, systematic model fitting, and model selection. Kinbiont incorporates two offline change point detection methods34:

  • Linear sliding window method: Detects changes by minimizing the absolute deviation from the mean of data points within a segment.

  • Least Square Density Difference (LSDD) method: Implemented using the Julia package ChangePointDetection.jl, this method quantifies changes by comparing the probability densities estimated from data points in adjacent segments53.

Both methods evaluate a dissimilarity curve \(\Delta\) by comparing adjacent segments through a cost function C:

$${\Delta}(y(t\in (a,b)),y(t\in (b,c)))={C}(y(t\in (a,c)))-{C}(y(t\in (a,b)))-{C}(y(t\in (b,c))).$$
(M14)

For the linear sliding window method, the cost function is

$${C}(y(t\in (a,c)))={\sum}_{t\in (a,c)}| \bar{y}-y(t)|,$$
(M15)

where a < b < c are three time points, y(t) is the theoretical prediction of a linear fit in the considered segment, and \(\bar{y}\) is the average signal in the considered time interval. For the LSDD method, the package ChangePointDetection.jl evaluates the L2 norm between the estimated probability densities of adjacent segments53.

The change point detection process involves:

  • Dissimilarity curve construction. The algorithm constructs a dissimilarity curve using a sliding window, comparing adjacent windows along the time series (or its derivative, if specified by the user).

  • Peak detection. Potential change points correspond to peaks in the dissimilarity curve. The algorithm identifies and ranks these peaks by prominence and returns the nCP highest-ranked peaks as change points, where nCP is specified by the user.
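A minimal Python sketch of the dissimilarity curve of Eq. (M14) and of the peak selection step follows. As simplifying assumptions, the cost of a window is the sum of absolute deviations from its mean (as in the description of the linear sliding window method), and peaks are ranked by height rather than by prominence; Kinbiont's own implementation may differ on both points. The function names are hypothetical.

```python
def abs_dev_cost(segment):
    """Window cost: sum of absolute deviations from the window mean."""
    mean = sum(segment) / len(segment)
    return sum(abs(mean - v) for v in segment)


def dissimilarity_curve(y, w):
    """Eq. (M14) with adjacent windows y[t-w:t] and y[t:t+w] versus their union."""
    d = [0.0] * len(y)
    for t in range(w, len(y) - w + 1):
        left, right, both = y[t - w:t], y[t:t + w], y[t - w:t + w]
        d[t] = abs_dev_cost(both) - abs_dev_cost(left) - abs_dev_cost(right)
    return d


def change_points(y, w, n_cp):
    """Indices of the n_cp highest local maxima of the dissimilarity curve."""
    d = dissimilarity_curve(y, w)
    peaks = [t for t in range(1, len(d) - 1) if d[t] > d[t - 1] and d[t] >= d[t + 1]]
    return sorted(sorted(peaks, key=lambda t: d[t], reverse=True)[:n_cp])


# A step in the signal at index 20 produces the highest dissimilarity peak there.
step_signal = [0.0] * 20 + [1.0] * 20
```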

Fitting growth models with segmentation

Once change points are identified, Kinbiont fits segmented models to data using two different strategies:

  • ODE models: Kinbiont sequentially fits each segment, starting from the earliest segment and proceeding forward. At each segment, the model with the lowest Akaike Information Criterion (AIC) is selected from a user-defined set, and continuity is maintained by using the fitted endpoint of one segment as the initial condition of the next.

  • Non-linear models: Segments are fitted in reverse chronological order, ensuring continuity between segments by incorporating a penalty term into the loss function.

Automatic selection of the number of change points

When the precise number of change points (nCP) is unknown, Kinbiont provides an automatic method to determine it. Kinbiont uses a direct search algorithm to evaluate all combinations of change points up to a user-defined maximum \({n}_{\,{\mbox{CP}}}^{{\mbox{Max}}\,}\), selecting the best configuration based on the minimal AIC or AICc score:

$$\,{{\mbox{score}}}=(K+{n}_{{{\rm{CP}}}})\,\zeta +2\,{{\mbox{Log}}}\left({{\mbox{RSS}}}({P}^{*})\right),$$
(M16)

where K is the total number of model parameters across all segments, nCP is the number of change points, and ζ is a smoothing parameter as described in ref. 34. Lower values of ζ favor more complex models with additional segments, while higher values emphasize simpler, more parsimonious models.

This approach enables Kinbiont to automatically determine the optimal number of change points, ensuring data-driven and accurate segmentation of the time series. The flexibility in setting ζ allows users to balance model complexity with fit quality, adapting the analysis to the specifics of the data.
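The direct search over change-point configurations can be illustrated with a deliberately simple segment model: a straight line per segment (on log-OD data this corresponds to one exponential phase per segment) with two parameters each, scored with Eq. (M16). The candidate change points, the RSS floor that avoids log(0) on noise-free data, and the function names are assumptions of this sketch; Kinbiont instead fits its ODE or nonlinear models within each segment.

```python
import itertools
import math


def line_rss(t, y):
    """RSS of an ordinary least-squares line (needs at least two distinct t)."""
    n = len(t)
    tm, ym = sum(t) / n, sum(y) / n
    sxx = sum((a - tm) ** 2 for a in t)
    slope = sum((a - tm) * (b - ym) for a, b in zip(t, y)) / sxx
    return sum((b - ym - slope * (a - tm)) ** 2 for a, b in zip(t, y))


def segment_score(t, y, cps, zeta, params_per_segment=2, rss_floor=1e-12):
    """Eq. (M16) for a piecewise-linear model split at the indices in cps."""
    bounds = [0, *cps, len(t)]
    rss = sum(line_rss(t[a:b], y[a:b]) for a, b in zip(bounds, bounds[1:]))
    k = params_per_segment * (len(cps) + 1)  # total parameters across segments
    return (k + len(cps)) * zeta + 2.0 * math.log(max(rss, rss_floor))


def best_segmentation(t, y, candidates, n_cp_max, zeta):
    """Direct search over all combinations of up to n_cp_max candidate change points."""
    options = [c for n in range(n_cp_max + 1)
               for c in itertools.combinations(sorted(candidates), n)]
    return min(options, key=lambda c: segment_score(t, y, list(c), zeta))


t = [float(i) for i in range(20)]
y = [0.5 * min(i, 10) for i in range(20)]  # growth phase, then plateau at index 10
```

With these toy data, a small ζ recovers the single break at index 10, whereas a very large ζ makes the unsegmented line cheaper, mirroring the trade-off described above.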

Interpretable machine learning methods for downstream analyses

Kinbiont incorporates different interpretable machine learning methods. Below, we briefly describe these methods; specific application details are provided in relevant sections of this paper.

Decision Tree: DecisionTree.jl is a Julia implementation of classification and regression trees18. This method constructs interpretable graphical models (decision trees) by recursively partitioning data based on explanatory features. Within each partition, it applies a simple prediction model. Classification trees, suitable for categorical responses, optimize prediction accuracy by minimizing misclassification costs. Regression trees, appropriate for continuous responses, minimize the squared error between observed and predicted values. Decision trees have been previously employed to evaluate the impact of media composition on bacterial growth2.
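At each node, a regression tree of the kind described above chooses the split that most reduces the squared error. For presence/absence features such as the chemicals in the stressor dataset, one node can be sketched in Python as follows. The function names are hypothetical, and DecisionTree.jl additionally handles continuous features, recursion and stopping rules.

```python
def sse(values):
    """Sum of squared deviations from the mean (regression impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_binary_split(features, y):
    """Pick the presence/absence feature whose split most reduces squared error.

    features: dict name -> list of 0/1 flags (absent/present per sample).
    Returns (name, impurity decrease), or None if no feature splits the data.
    """
    total = sse(y)
    best = None
    for name, flags in features.items():
        present = [v for f, v in zip(flags, y) if f]
        absent = [v for f, v in zip(flags, y) if not f]
        if not present or not absent:
            continue
        gain = total - sse(present) - sse(absent)
        if best is None or gain > best[1]:
            best = (name, gain)
    return best


growth = [1.0, 1.0, 0.2, 0.2]
chemicals = {"A": [0, 0, 1, 1], "B": [0, 1, 0, 1]}
name, gain = best_binary_split(chemicals, growth)
```

Summing such impurity decreases over all nodes that use a feature is the usual basis for impurity-based importance scores.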

Symbolic Regression: SymbolicRegression.jl is a Julia implementation of symbolic regression16, which aims to identify interpretable mathematical expressions describing datasets. Symbolic regression represents candidate equations as expression trees, with leaves corresponding to constants or dataset features and internal nodes representing mathematical operations. The algorithm seeks the simplest expression (i.e., with the fewest nodes) that accurately fits the data, minimizing the mean squared error while penalizing complexity. The optimal expression is identified through a multi-population evolutionary algorithm, in which multiple populations (ensembles of candidate solutions) evolve asynchronously54.

Analysis of the datasets used in this study

Parameter inference in non-standard microbial kinetics

The example illustrating phage-bacteria interaction kinetics (from Ref. 20) consists of a single growth curve of E. coli, initially inoculated at \(10^{8}\) cfu/ml, in the presence of T4 phages inoculated at \(5\times 10^{5}\) pfu/ml. Data were preprocessed by subtracting blanks and smoothing the curves with LOWESS using default parameters. We then fitted the Heterogeneous Population Model with inhibition and death (Eq. (M1)) by minimizing the relative error loss function (Eq. (M11)).

The ethanol bioproduction dataset consists of eight time series showing biomass, sugar, and ethanol concentrations of Kluyveromyces marxianus batch fermentations21. We fitted these data using the Monod-Ierusalimsky model (Eq. (M2)), minimizing the Euclidean distance between the observed and modeled data points.

The Acinetobacter dataset from Ref. 19 consists of 96 background-corrected growth curves profiling 25 different combinations of alanine and glutamate concentrations, with the logarithmic concentration ratio \(\mathrm{Log}\left([\mathrm{Ala}]/[\mathrm{Glu}]\right)\) spanning −4 to 4 in increments of 0.333. Most conditions have four replicates; the two extreme conditions, \(\mathrm{Log}\left([\mathrm{Ala}]/[\mathrm{Glu}]\right)=-4\) and \(\mathrm{Log}\left([\mathrm{Ala}]/[\mathrm{Glu}]\right)=4\), have two replicates each.
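The condition and replicate counts above are mutually consistent, as a short Python check shows. The variable names are ours, and we assume the step is exactly 1/3, which is how we read the rounded value 0.333.

```python
# 25 equally spaced values of Log([Ala]/[Glu]) from -4 to 4 (step 1/3, about 0.333).
conditions = [-4 + k / 3 for k in range(25)]

# Two replicates at the two extreme ratios, four at every other ratio.
replicates = [2 if k in (0, len(conditions) - 1) else 4 for k in range(len(conditions))]

total_curves = sum(replicates)  # 23 * 4 + 2 * 2
```

With 23 conditions at four replicates and two at two replicates, `total_curves` equals the 96 curves in the dataset.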

In preparation for the change point detection, we applied a smoothing filter to the growth curves using a rolling average over three time points and then employed a linear change point detection algorithm on the specific growth rate, which was evaluated using a sliding window of six points. The change point detection was performed with a window size of ten points, and the optimal number of change points was determined through a direct search within the range of 0 to 5. For each segmented time series, we fitted the following ODE models by minimizing the relative error loss function (Eq. (M11)): the exponential model (Eq. (M3)), the logistic model (Eq. (M4)), the HPM (Eq. (M5)), and the exponential HPM (Eq. (M6)). Model selection was carried out using the corrected AIC score (Eq. (M13)), with a penalty parameter set to 2. Corrected AIC scores for all model fits are shown in Supplementary Fig. 6.
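The two preprocessing steps described above, a rolling average followed by a sliding-window estimate of the specific growth rate, can be sketched in Python. We assume, for illustration only, that the specific growth rate is the least-squares slope of ln(OD) against time within each window and that each smoothed point is placed at the mean time of its window; Kinbiont's implementation may differ in such details, and all function names are hypothetical.

```python
import math
from typing import List, Tuple


def rolling_average(t: List[float], y: List[float], window: int) -> Tuple[List[float], List[float]]:
    """Average each run of `window` consecutive points (times and values)."""
    n = len(y) - window + 1
    ts = [sum(t[i:i + window]) / window for i in range(n)]
    ys = [sum(y[i:i + window]) / window for i in range(n)]
    return ts, ys


def slope(xs: List[float], ys: List[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def specific_growth_rate(t: List[float], od: List[float], window: int) -> List[float]:
    """Slope of ln(OD) versus time over each sliding window of `window` points."""
    log_od = [math.log(v) for v in od]
    return [slope(t[i:i + window], log_od[i:i + window])
            for i in range(len(t) - window + 1)]


# Synthetic exponential curve with growth rate 0.8 per hour, sampled every 0.5 h.
t = [0.5 * i for i in range(20)]
od = [0.01 * math.exp(0.8 * x) for x in t]
ts, smooth = rolling_average(t, od, 3)        # 3-point rolling average
rates = specific_growth_rate(ts, smooth, 6)   # 6-point sliding window
```

For a pure exponential, every entry of `rates` equals 0.8: averaging equally spaced points of an exponential multiplies it by a constant factor, which leaves its slope in log space unchanged.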

Parameter inference on the growth curves used in symbolic regression

In the amino acid limitation and chloramphenicol inhibition experiments, data were baseline-subtracted and corrected for multiple scattering using an interpolation algorithm, followed by smoothing with a rolling average (10-point window for amino acid limitation, 12-point window for chloramphenicol inhibition). A linear change-point detection algorithm was applied to the specific growth rate (evaluated over a sliding window of 10 points) for each curve, searching for a single change point within a 16-point window. Curves were subsequently fitted using the logistic model (Eq. (M4)) and exponential HPM (Eq. (M6)), employing relative error as the loss function (Eq. (M11)). Model selection for each segment was performed using the corrected AIC (Eq. (M13)) with a penalty parameter set to 2.
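For intuition, a single change point can be located by trying every split of a signal (for example, the specific growth rate) into two segments and keeping the split with the smallest total squared error around each segment's mean. This piecewise-constant least-squares search is a generic stand-in chosen for illustration, not necessarily the linear change-point algorithm used by Kinbiont; `min_size` and the example trace are hypothetical.

```python
from typing import List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the segment mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def single_change_point(signal: List[float], min_size: int = 2) -> Tuple[Optional[int], float]:
    """Index k splitting signal into signal[:k] and signal[k:] with the lowest
    total squared error; returns (None, inf) if the signal is too short."""
    best_k: Optional[int] = None
    best_cost = float("inf")
    for k in range(min_size, len(signal) - min_size + 1):
        cost = sse(signal[:k]) + sse(signal[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


# Hypothetical growth-rate trace: a fast phase followed by a slower phase.
k, cost = single_change_point([0.8] * 10 + [0.2] * 8)
```

The search returns `k = 10`, the index where the rate drops. Each side of the change point could then be fitted separately with the candidate ODE models.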

Symbolic regression was performed using Kinbiont’s downstream_symbolic_regression function, adapted from the Julia implementation described in Ref. 16. We used the binary operators +, −, ×, and /, excluded unary operators, and left all other parameters at their default values.

Bacteriostatic (reversible-binding) limit of ribosome-targeting antibiotics

To interpret the results of symbolic regression obtained in the antibiotic inhibition experiment, we consider the mechanistic model of growth inhibition by ribosome-targeting antibiotics proposed in Ref. 37, which embeds the drug’s mode of action in the physiological context of the cell through constraints on bacterial growth55. In this framework, the drug-dependent growth rate λ is given by the solution of

$${\left(\frac{\lambda }{{\lambda }_{0}}\right)}^{3}-{\left(\frac{\lambda }{{\lambda }_{0}}\right)}^{2}+\left(\frac{\lambda }{{\lambda }_{0}}\right)\left[\frac{1}{4}{\left(\frac{{\lambda }^{*}}{{\lambda }_{0}}\right)}^{2}+\frac{d}{2{d}_{50}^{*}}{\left(\frac{{\lambda }^{*}}{{\lambda }_{0}}\right)}^{2}\right]-\frac{1}{4}{\left(\frac{{\lambda }^{*}}{{\lambda }_{0}}\right)}^{2}=0,$$
(M17)

where d is the antibiotic concentration, λ0 is the drug-free growth rate, and the model parameters \({\lambda }^{*}=2\sqrt{{\kappa }_{t}{P}_{{{\rm{out}}}}{K}_{D}}\) and \({d}_{50}^{*}=\Delta r\,{\lambda }^{*}/(2{P}_{{{\rm{in}}}})\) are linked to the biophysical processes underlying drug action. Here, κt is the ribosomal translational capacity, Pin and Pout are the membrane permeabilities for inward and outward transport, KD is the antibiotic-ribosome binding constant, and Δr is the ribosome’s dynamical range37,56.

For antibiotics like chloramphenicol, characterized by large values of λ*, Eq. (M17) simplifies to the Langmuir form:

$$\frac{\lambda }{{\lambda }_{0}}=\frac{1}{1+d/{d}_{50}},$$
(M18)

where the half-inhibition drug concentration d50 is the antibiotic concentration at which growth rate is reduced to half of its drug-free value (λ/λ0 = 1/2).
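The Langmuir limit can be checked directly. Writing \(x=\lambda/\lambda_0\) and \(a=\lambda^{*}/\lambda_0\), dividing Eq. (M17) by \(a^2\) and letting \(a\to\infty\) at fixed \(d/d_{50}^{*}\) leaves \(x\,(1/4 + d/(2d_{50}^{*})) = 1/4\), that is, \(x = 1/(1+2d/d_{50}^{*})\). For Eq. (M17) as written here, this is Eq. (M18) with \(d_{50}=d_{50}^{*}/2\). The Python sketch below solves Eq. (M17) by bisection on (0, 1], where the cubic changes sign (\(f(0)=-a^2/4<0\) and \(f(1)=d\,a^2/(2d_{50}^{*})\ge 0\)), and compares the root with the Langmuir form; the function names and parameter values are illustrative.

```python
def growth_ratio_m17(d: float, d50_star: float, a: float, tol: float = 1e-12) -> float:
    """Root x = lambda/lambda0 of Eq. (M17) in (0, 1], found by bisection.
    `a` is lambda*/lambda0 and `d50_star` is the parameter d*_50."""
    a2 = a * a

    def f(x: float) -> float:
        return x**3 - x**2 + x * (a2 / 4 + d / (2 * d50_star) * a2) - a2 / 4

    lo, hi = 0.0, 1.0  # f(lo) < 0 <= f(hi), so a root lies in between
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def langmuir(d: float, d50: float) -> float:
    """Eq. (M18): lambda/lambda0 = 1 / (1 + d/d50)."""
    return 1.0 / (1.0 + d / d50)


# Large lambda*/lambda0: the cubic's root approaches the Langmuir curve with d50 = d*_50 / 2.
x_cubic = growth_ratio_m17(d=1.0, d50_star=2.0, a=1000.0)
x_langmuir = langmuir(d=1.0, d50=1.0)
```

With \(\lambda^{*}/\lambda_0=1000\), the root of the cubic differs from the Langmuir value 0.5 by less than \(10^{-5}\). In the absence of drug (d = 0), the cubic factors as \((x-1)(x^2+a^2/4)\), so the only real root is x = 1.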

Decision tree on the ecotoxicological dataset

The dataset of Ref. 22 comprises 15120 growth curves profiling the effects of every possible combination of eight commonly used pollutants across 12 bacterial isolates and a mixture of some of them, as described in the main text. Each chemical was either present at a concentration of 0.1 mg l−1 or absent22.

After background subtraction, we fitted the entire dataset with the Richards model (Eq. (M8)), using the relative error as the loss function. A total of 1367 curves with an average relative error greater than 5% were excluded from further analysis (see Supplementary Fig. 12). Subsequent analyses were performed on the remaining 13753 growth curves and focused on three observables inferred from the data: the duration of the lag phase, the exponential growth rate, and the maximum biomass concentration.
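The quality filter can be sketched as follows. We assume, for illustration, that each curve's score is the mean absolute relative residual \(|y_{\rm obs}-y_{\rm fit}|/|y_{\rm obs}|\) over its time points; the exact definition of the relative error in Eq. (M11) may differ, and the curve IDs and values are hypothetical.

```python
from typing import Dict, List, Tuple


def mean_relative_error(observed: List[float], fitted: List[float]) -> float:
    """Mean of |observed - fitted| / |observed| over all time points."""
    return sum(abs((o - f) / o) for o, f in zip(observed, fitted)) / len(observed)


def filter_fits(curves: Dict[str, Tuple[List[float], List[float]]],
                threshold: float = 0.05) -> Tuple[List[str], List[str]]:
    """Split curve IDs into kept (error <= threshold) and excluded (error > threshold)."""
    kept, excluded = [], []
    for curve_id, (observed, fitted) in curves.items():
        if mean_relative_error(observed, fitted) > threshold:
            excluded.append(curve_id)
        else:
            kept.append(curve_id)
    return kept, excluded


curves = {
    "good_fit": ([0.10, 0.20, 0.40], [0.101, 0.198, 0.404]),  # about 1% error
    "poor_fit": ([0.10, 0.20, 0.40], [0.130, 0.150, 0.500]),  # about 27% error
}
kept, excluded = filter_fits(curves)
```

Applied to the full dataset, removing the 1367 curves above the 5% threshold from the 15120 fitted curves leaves the 13753 curves used downstream.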

For each of these observables, we performed decision tree analysis using Kinbiont’s downstream_decision_tree_regression, adapted from the implementation in Ref. 57, with default parameters. The decision tree algorithm employed splitting criteria based on the Gini impurity for the mean of the distributions.

To evaluate model performance, we tested different tree depths and performed a 10-fold cross-validation for each depth (Supplementary Figs. 13–15), randomly splitting the dataset into training and test sets. We did not hold out entire conditions during sampling, because the replicate curves of each condition came from independent biological replicates and separate experiments.
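The curve-level random splitting described above can be sketched as a k-fold index generator. This is a minimal illustration with hypothetical names, not the splitting code used in Kinbiont: individual curves are shuffled and dealt into k folds, so replicates of the same condition may land in different folds, consistent with not holding out entire conditions.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation over n curves."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)           # random assignment of curves to folds
    folds = [idx[i::k] for i in range(k)]      # deal shuffled indices into k folds
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


splits = list(k_fold_indices(n=103, k=10))
```

Each curve appears in exactly one test fold, and every training set is the complement of its test fold; the tree would be refitted on each training set and scored on the matching test set.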

Synthetic data generation and simulations of deterministic and stochastic microbial dynamics

Kinbiont generates synthetic data by solving the selected ODE models with SciML solvers58. As a microbial dynamics simulator, Kinbiont supports both deterministic and stochastic simulations, enabling hypothesis testing and controlled synthetic benchmarking of each model.

For deterministic simulations, users can choose any ODE model implemented in Kinbiont and specify the numerical integrator. Kinbiont is compatible with all numerical integrators available in SciML, with the Tsitouras 5/4 Runge-Kutta method (Tsit5) from DifferentialEquations.jl58 set as the default.
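As a self-contained illustration of a deterministic simulation, the Python sketch below integrates logistic growth, \(dN/dt=\mu(N)\,N\) with the Verhulst rate of Eq. (M27), and compares the result with the closed-form logistic solution. Note the swapped-in technique: it uses the classical fixed-step fourth-order Runge-Kutta method rather than the adaptive Tsit5 integrator that Kinbiont uses by default, and the function names and parameter values are hypothetical.

```python
import math
from typing import Callable


def verhulst_rhs(n: float, mu_max: float, n_max: float) -> float:
    """dN/dt = mu(N) * N with the Verhulst rate of Eq. (M27)."""
    return mu_max * (1.0 - n / n_max) * n


def rk4(rhs: Callable[..., float], n0: float, t_end: float, dt: float, *params: float) -> float:
    """Integrate the autonomous ODE dN/dt = rhs(N, *params) from t = 0 to t_end."""
    n = n0
    for _ in range(round(t_end / dt)):
        k1 = rhs(n, *params)
        k2 = rhs(n + 0.5 * dt * k1, *params)
        k3 = rhs(n + 0.5 * dt * k2, *params)
        k4 = rhs(n + dt * k3, *params)
        n += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return n


def logistic_exact(t: float, n0: float, mu_max: float, n_max: float) -> float:
    """Closed-form solution of the logistic equation, used as a reference."""
    return n_max / (1.0 + (n_max / n0 - 1.0) * math.exp(-mu_max * t))


numeric = rk4(verhulst_rhs, 0.01, 10.0, 0.01, 0.8, 1.0)
exact = logistic_exact(10.0, 0.01, 0.8, 1.0)
```

The same `rk4` function accepts any right-hand side with the signature `rhs(N, *params)`, which mirrors the idea of choosing an ODE model and an integrator independently.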

For stochastic simulations, Kinbiont models cell division as a discrete-time Poisson process. The simulation time span is divided into regular intervals of length Δt, and at each time step the population size is updated using the following discrete model:

$$N({t}_{i})=N({t}_{i-1})+{N}_{{{\rm{birth}}}},$$
(M19)

where \(N(t_i)\) is the number of cells at time \(t_i\), and \(N_{\rm birth}\) is the number of birth events in the interval Δt, sampled from a Poisson distribution with mean \(N(t_{i-1})\mu \Delta t\):

$${N}_{{{\rm{birth}}}} \sim {{\mbox{Pois}}}\,\left(N({t}_{i-1})\mu \Delta t\right).$$
(M20)
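Eqs. (M19) and (M20) can be simulated in a few lines of Python. In this sketch μ is held constant, whereas Kinbiont computes it from the limiting nutrient; the Poisson sampler (Knuth's multiplication method applied to chunks of mean at most 30, which stays exact because a sum of independent Poisson variables is Poisson) and all names are our own choices. Averaged over runs, the final population should approach \(N_0(1+\mu\Delta t)^{i}\), since each step multiplies the expected size by \(1+\mu\Delta t\).

```python
import math
import random
from typing import List


def poisson(lam: float, rng: random.Random) -> int:
    """Exact Poisson draw using Knuth's method on chunks of mean <= 30."""
    total = 0
    while lam > 0:
        chunk = min(lam, 30.0)
        limit, k, p = math.exp(-chunk), 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        total += k
        lam -= chunk
    return total


def simulate_births(n0: int, mu: float, dt: float, steps: int, seed: int = 0) -> List[int]:
    """Eqs. (M19)-(M20): N(t_i) = N(t_{i-1}) + Pois(N(t_{i-1}) * mu * dt).
    mu is held constant here (no nutrient dependence)."""
    rng = random.Random(seed)
    trajectory = [n0]
    for _ in range(steps):
        n = trajectory[-1]
        trajectory.append(n + poisson(n * mu * dt, rng))
    return trajectory


runs = [simulate_births(n0=100, mu=0.5, dt=0.02, steps=100, seed=s) for s in range(200)]
mean_final = sum(r[-1] for r in runs) / len(runs)
expected_final = 100 * (1 + 0.5 * 0.02) ** 100  # E[N(t_i)] = N0 * (1 + mu*dt)^i
```

Because only births are modeled, every trajectory is non-decreasing; individual runs fluctuate around the expected value, with relative fluctuations that shrink as the initial population grows.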

In this context, the growth rate μ of each individual in the population is computed from the concentration of the limiting nutrient, with the user specifying the initial nutrient amount and the culture volume. Various kinetic growth models can be used for this calculation (see ref. 59 for a review):

  • Monod:

    $$\mu (\nu ;{k}_{1},{\mu }_{{{\rm{max}}}})={\mu }_{{{\rm{max}}}}\frac{\nu }{{k}_{1}+\nu }.$$
    (M21)
  • Haldane:

    $$\mu (\nu ;{k}_{1},{k}_{2},{\mu }_{{{\rm{max}}}})={\mu }_{{{\rm{max}}}}\frac{\nu }{{k}_{1}+\nu+\frac{{\nu }^{2}}{{k}_{2}}}.$$
    (M22)
  • Blackman60:

    $$\mu (\nu ;{k}_{1},{\mu }_{{{\rm{max}}}})=\left\{\begin{array}{ll}{\mu }_{{{\rm{max}}}}\quad &\,{\mbox{if}}\,\nu \ge {k}_{1}\\ \frac{{\mu }_{{{\rm{max}}}}\,\nu }{{k}_{1}} \quad &\,{\mbox{if}}\,\nu < {k}_{1}\end{array}\right..$$
    (M23)
  • Teissier:

    $$\mu (\nu ;{k}_{1},{\mu }_{{{\rm{max}}}})={\mu }_{{{\rm{max}}}}(1-{e}^{-{k}_{1}\nu }).$$
    (M24)
  • Moser:

    $$\mu (\nu ;{k}_{1},{k}_{2},{\mu }_{{{\rm{max}}}})={\mu }_{{{\rm{max}}}}\frac{{\nu }^{{k}_{2}}}{{k}_{1}+{\nu }^{{k}_{2}}}.$$
    (M25)
  • Aiba-Edwards:

    $$\mu (\nu ;{k}_{1},{k}_{2},{\mu }_{{{\rm{max}}}})={\mu }_{{{\rm{max}}}}\frac{\nu }{{k}_{1}+\nu }{e}^{-\frac{\nu }{{k}_{2}}}.$$
    (M26)
  • Verhulst:

    $$\mu (N;{N}_{{{\rm{max}}}},{\mu }_{{{\rm{max}}}})={\mu }_{{{\rm{max}}}}\left(1-\frac{N}{{N}_{{{\rm{max}}}}}\right).$$
    (M27)

Throughout, ν denotes the limiting nutrient concentration, μmax is the maximum possible growth rate, \(k_i\) (for i = 1, 2) are numerical constants whose specific meaning depends on the model, N is the number of cells, and Nmax is the carrying capacity in the Verhulst model.
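For reference, the seven kinetic laws can be written as plain Python functions, which makes it easy to plug any of them into a simulation such as the Poisson scheme of Eqs. (M19) and (M20). Two forms are our assumptions: for Blackman kinetics we use the linear branch \(\mu_{\max}\nu/k_1\) below the threshold, which makes the function continuous at \(\nu=k_1\), and for Teissier kinetics we use the decaying exponential \(1-e^{-k_1\nu}\), so that μ stays between 0 and μmax for non-negative ν. Function names are hypothetical.

```python
import math


def monod(nu, k1, mu_max):
    return mu_max * nu / (k1 + nu)


def haldane(nu, k1, k2, mu_max):
    # Substrate inhibition through the nu**2 / k2 term.
    return mu_max * nu / (k1 + nu + nu**2 / k2)


def blackman(nu, k1, mu_max):
    # Linear below the threshold k1, saturated at mu_max above it.
    return mu_max if nu >= k1 else mu_max * nu / k1


def teissier(nu, k1, mu_max):
    # Decaying-exponential form, so that 0 <= mu < mu_max for nu >= 0.
    return mu_max * (1.0 - math.exp(-k1 * nu))


def moser(nu, k1, k2, mu_max):
    return mu_max * nu**k2 / (k1 + nu**k2)


def aiba_edwards(nu, k1, k2, mu_max):
    return mu_max * nu / (k1 + nu) * math.exp(-nu / k2)


def verhulst(n, n_max, mu_max):
    # Depends on cell number N rather than on nutrient concentration.
    return mu_max * (1.0 - n / n_max)
```

Moser kinetics with \(k_2=1\) reduces to Monod kinetics, and Aiba-Edwards kinetics approaches Monod kinetics as \(k_2\) grows, which gives simple consistency checks.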

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.