Introduction

Recent advances in experimental techniques have revolutionised our capacity to simultaneously acquire high-throughput data from the genome, epigenome, transcriptome, proteome, and metabolome1. Datasets that encompass multiple omic layers within a biological system are collectively referred to as multi-omic data. Thus far, omic layers within these datasets have often been analysed in isolation2,3,4,5, offering a partial understanding of the complex machinery of biological systems6. The integrative analysis of multi-omic data directly addresses this limitation, providing a holistic perspective of biological processes and cellular functions7. Such approaches have been successfully applied to associate omic entities with specific phenotypes8, identify biomarkers for drug therapies9, stratify patients10, and conduct functional analyses11. Inferring regulatory networks through multi-omic data can reveal complex genotype–phenotype relationships and uncover regulatory pathways overlooked in single-omic studies.

Biological interactions are commonly represented as regulatory networks, where nodes correspond to biological molecules associated with distinct omics (e.g., genes, proteins, or metabolites) and directed edges indicate causal effects between molecules. Inferring these causal relationships typically requires time-series data to capture the temporal order of events in the system12,13,14. In computational biology, considerable research has focused on network inference from time-series data15, with a particular emphasis on gene regulatory networks (GRNs)3,16,17,18,19. Traditionally, bulk RNA sequencing (RNA-seq) was the primary data source for GRN inference3,20,21,22, but more recently, single-cell RNA sequencing (scRNA-seq) data has received considerable attention for its ability to capture cellular heterogeneity16,17,23,24,25,26. Yet, these tools primarily focus on single-omic studies and do not consider the challenges associated with the integration of multi-omic data.

Multi-omic data exhibit significant sample heterogeneity and variability, especially when measured at single-cell resolution. Experimental protocols for data collection can be distinct for each omic layer, leading to multiple data modalities1. More importantly, these layers are regulated at different timescales ranging from seconds to hours27. These characteristics require the development of data-driven methods that can model the system dynamics over a wide range of molecular layers and temporal scales. In this context, several methods have been developed to improve GRN inference by integrating genomic, transcriptomic, and chromatin accessibility data28,29,30. However, these tools are limited to inferring gene-gene interactions and cannot predict heterogeneous, cross-omic interactions. One notable method that overcomes this limitation, KiMONo31, combines statistical models with prior knowledge of protein-protein interactions to infer regulatory networks from multi-omic data. Yet, these predictions strongly rely on human-curated knowledge, which is known to be sparse and incomplete, and the method is not designed for time-series data. Other existing approaches to multi-omic network inference are based on graph representation learning algorithms32,33. These methods also have limitations: they typically focus on statistical correlations between nodes, which hinders the identification of causal relationships, and they rely on pre-specified network topologies that may not accurately represent the true underlying biological network. While tools like TREM-Flux34 and scFEA35 integrate transcriptomic and metabolomic data, their primary goal is metabolic flux estimation rather than inferring causal regulatory relationships between omic layers. Despite these advances, computational methods that can build causal dynamical models from time-series multi-omic data are still needed.

This paper presents MINIE, a computational method for Multi-omIc Network Inference from timE-series data. Our approach follows a two-step pipeline for the inference of inter- and intra-layer interactions. First, we incorporate the timescale separation across omic layers using a model of differential-algebraic equations (DAEs). Second, we integrate the two most common data modalities available in multi-omic datasets—bulk and single-cell measurements—within a Bayesian regression framework, enabling the inference of the network topology. We validate our approach using case studies that integrate single-cell transcriptomic data (slow layer) with bulk metabolomic data (fast layer). These two omics were chosen due to the critical role of metabolites as both end products of gene expression and key regulators of cellular processes. Our results show that MINIE accurately infers regulatory networks using synthetic datasets generated from both linear and nonlinear dynamical models. When applied to experimental data from Parkinson’s disease (PD) studies, MINIE successfully identified high-confidence interactions reported in literature as well as novel links that are potentially relevant to PD (which could then be further validated). In benchmarking against state-of-the-art algorithms, MINIE outperformed single-omic methods, underscoring the importance of purpose-built algorithms for multi-omics integration. Additionally, when compared to existing GRN inference methods designed exclusively for scRNA-seq data, MINIE demonstrated superior performance in curated and synthetic networks. Overall, these findings highlight MINIE’s potential to advance our understanding of complex biological systems through comprehensive multi-omic integration.

Results

Method overview

MINIE is a data-driven network inference tool designed to identify causal interactions both within and across omic layers. Figure 1 summarises the method pipeline. This paper focuses on the regulatory role of the metabolome on the transcriptome. Accordingly, MINIE takes as inputs time-series of both transcriptomic and metabolomic data (Fig. 1a). Given the advances in sequencing technologies, transcriptomic data is now available at the single-cell level (scRNA-seq data), while metabolomic measurements typically remain at the bulk level. MINIE integrates these commonly used data modalities in multi-omic experiments.

Fig. 1: Method overview.

a The input of MINIE consists of time-series bulk metabolomics and scRNA-seq data. Note that time-series scRNA-seq data can be interpreted as time-dependent distributions of gene expression across measured cells, sampled at specific time points. b MINIE's algorithm is divided into two steps: transcriptome–metabolome mapping inference (top) and network inference using Bayesian regression (bottom). Step 1: Bulk metabolomic and transcriptomic data (obtained by averaging the scRNA-seq data across cells) are used as inputs in this step. The diagram illustrates the sparse regression problem, which consists of determining the coefficients of Θ from the metabolite concentrations \({\boldsymbol{m}}(t)\) and bulk gene expressions \({\boldsymbol{g}}(t)\). These coefficients enable the inference of the transcriptome–metabolome mapping Γ, which is further used to estimate the metabolomic trajectories at the single-cell level. Step 2: Three MCMC samplers are employed iteratively to estimate the parameters of the underlying regulatory model, including the network topology. First, the pseudotime values are estimated from the scRNA-seq data, assigning each sampled cell a unique time point reflecting its progression. Second, the network topology is sampled (by randomly adding/deleting edges) together with other model parameters. Third, the gene trajectories are sampled from the Bayesian model, fitting the data. c The final output is a confidence matrix with the probability of existence for all regulatory interactions in the transcriptome (i.e., gene-gene and metabolite-to-gene links).

A significant challenge for causal inference is the timescale separation in the regulation of different omics36. For instance, the turnover time of the metabolic pool in mammalian cells is approximately one minute, while the mRNA pool half-life is around ten hours27. MINIE captures this phenomenon using a dynamical model of DAEs: the slow transcriptomic dynamics are captured by differential equations that govern the evolution of mRNA concentrations over time, while the fast metabolic dynamics are encoded as algebraic constraints that assume instantaneous equilibration of metabolite concentrations. This allows DAEs to explicitly integrate processes that unfold on vastly different timescales within a single unified model. By contrast, when fast and slow processes coexist, ordinary differential equations (ODEs) require stiff numerical approximations that are unstable and computationally demanding, providing an inaccurate representation of the underlying biological system.

The DAE model is formalised as

$$\begin{array}{ll}\dot{{\boldsymbol{g}}}\,=\,{\boldsymbol{f}}({\boldsymbol{g}},{\boldsymbol{m}},{{\boldsymbol{b}}}_{{\rm{g}}};{\boldsymbol{\theta }})+{\boldsymbol{\rho }}({\boldsymbol{g}},{\boldsymbol{m}}){\boldsymbol{w}},\\ \dot{{\boldsymbol{m}}}\,=\,{\boldsymbol{h}}({\boldsymbol{g}},{\boldsymbol{m}},{{\boldsymbol{b}}}_{{\rm{m}}};{\boldsymbol{\theta }})\approx 0,\end{array}$$
(1)

where \({\boldsymbol{g}}\in {{\mathbb{R}}}_{\ge 0}^{{n}_{{\rm{g}}}}\) denotes a vector containing the expression levels of \({n}_{{\rm{g}}}\) genes, \({\boldsymbol{m}}\in {{\mathbb{R}}}_{\ge 0}^{{n}_{{\rm{m}}}}\) denotes the concentration levels of \({n}_{{\rm{m}}}\) metabolites, and \(n={n}_{{\rm{g}}}+{n}_{{\rm{m}}}\) is the total number of molecules. The nonlinear functions \({\boldsymbol{f}}:{{\mathbb{R}}}^{n}\mapsto {{\mathbb{R}}}^{{n}_{{\rm{g}}}}\) and \({\boldsymbol{h}}:{{\mathbb{R}}}^{n}\mapsto {{\mathbb{R}}}^{{n}_{{\rm{m}}}}\) describe the multi-layer interactions involved in gene and metabolite regulation, respectively. The algebraic equations arise from the quasi-steady-state approximation \(\dot{{\boldsymbol{m}}}(t)\approx 0\), which rests on the assumption that changes in \({\boldsymbol{m}}(t)\) occur much faster than those of other variables in the system, so that \({\boldsymbol{m}}(t)\) can be treated as instantaneously equilibrated with respect to the slower variables on the timescales of interest. Stochastic influences observed in biological processes, such as cellular noise, are accounted for using a multiplicative noise model, where \({\boldsymbol{\rho }}({\boldsymbol{g}},{\boldsymbol{m}})\) is a state-dependent function representing the noise amplitude and \({\boldsymbol{w}}\) is Gaussian white noise. External influences or baseline effects that are known a priori (e.g., gene knockdown) are represented by \({{\boldsymbol{b}}}_{{\rm{g}}}\in {{\mathbb{R}}}^{{n}_{{\rm{g}}}}\) and \({{\boldsymbol{b}}}_{{\rm{m}}}\in {{\mathbb{R}}}^{{n}_{{\rm{m}}}}\). Finally, we explicitly denote the parameters sought to be identified in our model from data as \({\boldsymbol{\theta }}\in {{\mathbb{R}}}^{{n}_{{\rm{p}}}}\).
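The quasi-steady-state approximation can be illustrated with a minimal numerical sketch. The toy system below is not part of MINIE: the two scalar equations, the timescale parameter `eps`, the step size, and the initial values are all hypothetical choices made only to show that a fast variable collapses onto its algebraic constraint after a short transient.

```python
import numpy as np

def simulate_fast_slow(eps=0.01, dt=0.001, t_end=2.0):
    """Explicit-Euler simulation of a hypothetical slow-fast toy system.

    Slow variable (gene):        dg/dt = 1 - g
    Fast variable (metabolite):  dm/dt = (g - m) / eps
    """
    n_steps = int(round(t_end / dt))
    t = np.arange(n_steps + 1) * dt
    g = np.empty(n_steps + 1)
    m = np.empty(n_steps + 1)
    g[0], m[0] = 0.0, 1.0  # the metabolite starts away from its quasi-steady state
    for k in range(n_steps):
        g[k + 1] = g[k] + dt * (1.0 - g[k])
        m[k + 1] = m[k] + dt * (g[k] - m[k]) / eps
    return t, g, m

t, g, m = simulate_fast_slow()
m_qssa = g  # algebraic constraint h(g, m) = g - m = 0 solved for m
after_transient = t >= 0.2
max_gap = np.max(np.abs(m[after_transient] - m_qssa[after_transient]))
print(f"max |m - m_qssa| after the transient: {max_gap:.4f}")
```

In this toy setting the gap between the simulated metabolite and its algebraic approximation is of the order of `eps` once the initial transient has decayed, which is the behaviour the algebraic part of Eq. (1) relies on.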

The method is divided into two steps described in Fig. 1b: (1) transcriptome–metabolome mapping inference and (2) regulatory network inference via Bayesian regression. The first step is grounded on the algebraic component of Eq. (1). Assuming h can be approximated by a linear function, we have that

$$0\approx {A}_{{\rm{mg}}}{\boldsymbol{g}}+{A}_{{\rm{mm}}}{\boldsymbol{m}}+{{\boldsymbol{b}}}_{{\rm{m}}},$$
(2a)
$${\boldsymbol{m}}\approx -{A}_{{\rm{mm}}}^{-1}{A}_{{\rm{mg}}}{\boldsymbol{g}}-{A}_{{\rm{mm}}}^{-1}{{\boldsymbol{b}}}_{{\rm{m}}},$$
(2b)

where \({A}_{{\rm{mg}}}\in {{\mathbb{R}}}^{{n}_{{\rm{m}}}\times {n}_{{\rm{g}}}}\) and \({A}_{{\rm{mm}}}\in {{\mathbb{R}}}^{{n}_{{\rm{m}}}\times {n}_{{\rm{m}}}}\) are matrices encoding the gene-metabolite and metabolite-metabolite interactions, respectively, and Eq. (2b) assumes that \({A}_{{\rm{mm}}}\) is invertible. Using time-series measurements of metabolite concentrations \({\boldsymbol{m}}(t)\) and gene expression \({\boldsymbol{g}}(t)\), this formulation allows \({A}_{{\rm{mg}}}\) and \({A}_{{\rm{mm}}}\) to be inferred through a sparse regression problem, circumventing the underdetermined nature of biological systems (characterised by high-dimensional data and limited sample sizes). To narrow down the number of interactions inferred in \({A}_{{\rm{mm}}}\) and \({A}_{{\rm{mg}}}\), we curated a list of human metabolic reactions documented in the literature37. This data was then used to identify metabolite-metabolite and gene-metabolite interactions, and to constrain the nonzero elements in \({A}_{{\rm{mm}}}\) and \({A}_{{\rm{mg}}}\) to only those interactions that are known a priori (see Fig. S1 for an illustration of the curated network). Consequently, the structure of these matrices is fixed, and the sparse regression infers only the corresponding interaction strengths. The transcriptome–metabolome mapping \(\Gamma =-{A}_{{\rm{mm}}}^{-1}{A}_{{\rm{mg}}}\) is then computed from the inferred \({A}_{{\rm{mm}}}\) and \({A}_{{\rm{mg}}}\) and captures the relationship between transcriptomic and metabolomic data.
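As a minimal sketch of how the mapping in Eq. (2b) could be evaluated once the interaction strengths are available, the snippet below computes Γ with a linear solve and maps a gene expression vector to metabolite concentrations. The matrix entries, the input vector, and the function name `metabolites_from_genes` are hypothetical illustrations, not values or code from MINIE.

```python
import numpy as np

# Hypothetical toy values: 2 metabolites, 3 genes.
A_mg = np.array([[0.5, 0.0, 0.2],
                 [0.0, 0.4, 0.0]])   # gene -> metabolite effects (n_m x n_g)
A_mm = np.array([[-1.0, 0.0],
                 [0.3, -1.0]])       # metabolite -> metabolite effects (n_m x n_m)
b_m = np.array([0.1, 0.0])           # known external input on the metabolites

# Gamma = -A_mm^{-1} A_mg, computed with a linear solve rather than an explicit inverse.
Gamma = -np.linalg.solve(A_mm, A_mg)
offset = -np.linalg.solve(A_mm, b_m)

def metabolites_from_genes(g):
    """Eq. (2b): m ≈ Gamma g - A_mm^{-1} b_m."""
    return Gamma @ g + offset

g = np.array([1.0, 2.0, 0.5])
m = metabolites_from_genes(g)
residual = A_mg @ g + A_mm @ m + b_m   # Eq. (2a); vanishes up to round-off
print(Gamma.shape, residual)
```

Using `np.linalg.solve` avoids forming the inverse of the metabolite block explicitly, which is usually the numerically safer choice when that block is poorly conditioned.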

The second step builds on the differential component of Eq. (1). To characterise the complexity and uncertainty of gene regulatory dynamics, we model f as a Gaussian process (GP). This GP framework describes the temporal evolution of gene expressions (i.e., gene trajectories) through a mean function and a covariance function encoding gene-gene and gene-metabolite relationships. Our approach to estimating the GP functions is based on BINGO3, a GRN inference tool developed for bulk RNA-seq data. This step of MINIE introduces two key modifications:

  1. scRNA-seq data integration via pseudotime estimates;

  2. metabolomic integration via the inferred transcriptome–metabolome mapping Γ: g ↦ m.

First, we introduce the latent variable τ, known as pseudotime, to model cellular progression within the biological system under study. Unlike actual time, pseudotime is inferred from data and represents the relative temporal progression of each measured cell state along a biological process. For example, cell differentiation is a dynamical process where two measured cells (even if sampled at the same time point) might be at different stages in the differentiation process. Pseudotime aims to infer the developmental trajectory followed by those cells. Given the inherent noise in scRNA-seq data, we sample gene trajectories (using the GP model) and identify those that best fit the scattered data points. This procedure ensures consistency between pseudotime estimates, single-cell gene expression data, and the underlying dynamical model.

Second, MINIE uses the mapping Γ to infer the single-cell metabolomic trajectories based on the sampled gene trajectories. These metabolic profiles are treated as external inputs to the GP model, integrating the regulation of metabolites on gene dynamics at the single-cell level. We note that perturbations, such as drugs or mutations, can also be incorporated as external inputs. As a result, the inferred dynamical model can be used to uncover the underlying network structure and identify potential perturbation targets (e.g., drug targets) that modulate gene regulation.

We employ Bayesian inference and three Markov Chain Monte Carlo (MCMC) samplers to infer the posterior distribution of gene trajectories, pseudotime, and model parameters (including the network topology). MINIE builds the posterior distribution \(p({\boldsymbol{\theta }}| G)\) for the model parameters θ given the measured (transcriptomic) data G. To make the distribution tractable, gene trajectories \({\boldsymbol{g}}(t)\) and pseudotime τ are introduced as latent variables, yielding

$$p({\boldsymbol{\theta }}| G)\propto p({\boldsymbol{\theta }})\iint \,p(G| {\boldsymbol{g}},{\boldsymbol{\tau }},{\boldsymbol{\theta }}\,)p(\,{\boldsymbol{g}}| {\boldsymbol{\theta }})p({\boldsymbol{\tau }}){\rm{d}}{\boldsymbol{g}}\,{\rm{d}}{\boldsymbol{\tau }},$$
(3)

where \(p({\boldsymbol{g}}| {\boldsymbol{\theta }})\) describes the gene trajectories \({\boldsymbol{g}}(t)\) over potential network interactions (encoded in θ).

MINIE’s output is a confidence matrix \(C\in {[0,1]}^{{n}_{{\rm{g}}}\times n}\), where each entry \({C}_{ij}\) corresponds to the probability that molecule j regulates gene i (Fig. 1c). By retaining only the links with \({C}_{ij}\ge \varepsilon\) for a chosen threshold ε, we can infer the underlying multi-omic regulatory network, identifying the specific genes and metabolites involved in gene expression regulation. See the “Methods” section for mathematical and algorithmic details on MINIE’s implementation.
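The relationship between sampled topologies, the confidence matrix, and the thresholded network can be sketched as follows. This is an illustrative assumption rather than MINIE's implementation: it supposes that posterior topology samples are available as binary arrays and that edge confidence is estimated as the fraction of samples containing the edge. The function names and the toy samples are hypothetical.

```python
import numpy as np

def confidence_matrix(topology_samples):
    """Fraction of sampled topologies in which each edge is present.

    topology_samples: array of shape (n_samples, n_g, n) with 0/1 entries,
    where entry [s, i, j] = 1 means molecule j regulates gene i in sample s.
    """
    return np.asarray(topology_samples, dtype=float).mean(axis=0)

def threshold_network(C, eps):
    """Return (regulator j, target gene i, confidence) for all C[i, j] >= eps."""
    targets, regulators = np.nonzero(C >= eps)
    return [(int(j), int(i), float(C[i, j])) for i, j in zip(targets, regulators)]

# Hypothetical samples: 4 topologies, 2 genes, 3 molecules (2 genes + 1 metabolite).
samples = np.array([
    [[0, 1, 1], [0, 0, 0]],
    [[0, 1, 1], [1, 0, 0]],
    [[0, 0, 1], [0, 0, 0]],
    [[0, 1, 1], [0, 0, 1]],
])
C = confidence_matrix(samples)
edges = threshold_network(C, eps=0.7)
print(edges)
```

Lowering `eps` trades precision for recall, which is why threshold-free metrics are useful for comparing methods.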

Proof-of-concept on a multi-layer network motif

To validate the fundamental design of MINIE, we conducted a proof-of-concept experiment using a multi-layer network motif. Figure 2 shows MINIE’s performance on this case study. The system used to generate synthetic data consists of 5 nodes (3 genes and 2 metabolites) modelled as in Eq. (1), where both f and h are linear functions encoding the network interactions (Fig. 2a). In this case, the DAE model is given by

$$\left[\begin{array}{c}\dot{{\boldsymbol{g}}}(t)\\ 0\end{array}\right]=\left[\begin{array}{cc}{A}_{{\rm{gg}}}&{A}_{{\rm{gm}}}\\ {A}_{{\rm{mg}}}&{A}_{{\rm{mm}}}\end{array}\right]\left[\begin{array}{c}{\boldsymbol{g}}(t)\\ {\boldsymbol{m}}(t)\end{array}\right]+\left[\begin{array}{c}{{\boldsymbol{b}}}_{{\rm{g}}}\\ {{\boldsymbol{b}}}_{{\rm{m}}}\end{array}\right],$$
(4)

where the block matrices have consistent dimensions and encode the intra- and inter-omic interactions. Details on model parameters and data generation can be found in the Supplementary Material, Section 2. MINIE’s performance was statistically evaluated using 100 synthetic datasets with different initial conditions and external inputs (e.g., overexpression or underexpression levels). Each dataset comprises 11 time points sampled under two experimental conditions, in which the concentration of either metabolite 1 or 2 is increased (e.g., \({{\boldsymbol{b}}}_{{\rm{m}}}=[1,0]\) or \([0,1]\)), mimicking control versus mutant/treated experiments in biological studies. Figure 2b illustrates the synthetic (transcriptomic and metabolomic) data for a representative realisation, together with the pseudotime and metabolic trajectories fitted by MINIE. As expected, given the linearity of the data (and our assumption that the structures of \({A}_{{\rm{mg}}}\) and \({A}_{{\rm{mm}}}\) are known), the metabolite trajectories inferred through the mapping Γ match the data almost perfectly.

Fig. 2: Performance on linear network motifs.

a Multi-layer network of genes (purple) and metabolites (green). Solid arrows depict activating (sharp) and inhibiting (blunt) interactions to be inferred by MINIE, while dashed arrows indicate known metabolic interactions. b Artificially-generated transcriptomic (left) and metabolomic (right) data by a linear SDE model. Left: Projection of the single-cell data using principal component analysis (PCA). Each data point represents the gene expression level of a measured cell, colour-coded by sampling time (ordered from blue to yellow). The arrow shows the pseudotime trajectory estimated by MINIE. Right: Bulk metabolic concentrations (dots) and inferred trajectories (dashed lines) using MINIE's transcriptome–metabolome mapping. c Histogram of predicted confidence values for the existence of each link, where true positive links are represented by a dark shade. d Reconstructed network based on predicted probabilities for ε = 0.7. The true links and non-existing links are respectively represented by green and grey arrows, weighted according to the confidence values predicted by MINIE. e Distribution of AUROC scores across 100 datasets with different initial conditions.

Figure 2c shows the confidence values predicted by MINIE for gene-gene and metabolite-to-gene interactions. By setting an appropriate threshold, ε = 0.7, the regulatory network can be perfectly reconstructed with high confidence (Fig. 2d). To measure the method’s ability to distinguish between classes (true vs. non-existing links) across all possible thresholds, we used the area under the receiver-operating characteristic curve (AUROC). MINIE achieved an average AUROC of 0.99 over all datasets (Fig. 2e). Overall, these results demonstrate that MINIE reliably infers the network topology in a linear, low-dimensional model.
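For readers unfamiliar with the metric, the AUROC can be computed directly from its pairwise interpretation: the probability that a randomly chosen true link receives a higher confidence than a randomly chosen non-existing link. The sketch below is a generic implementation of that definition, not the evaluation code used in the study; the example confidences and labels are hypothetical.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC as the fraction of (true link, non-link) pairs ranked correctly.

    Ties between a true link and a non-link count as half a correct pair.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one true link and one non-existing link")
    diff = pos[:, None] - neg[None, :]
    n_correct = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(n_correct / (pos.size * neg.size))

# Hypothetical confidences for six candidate links and their ground truth.
conf = [0.95, 0.80, 0.75, 0.40, 0.30, 0.10]
truth = [1, 1, 0, 1, 0, 0]
score = auroc(conf, truth)
print(round(score, 3))
```

The pairwise form is quadratic in the number of links, which is acceptable for small networks; rank-based formulas are preferable for large ones.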

We emphasise that the identifiability of multi-omic systems is crucial in our study. Identifiability refers to the ability to uniquely estimate model parameters from measurements. Linear DAE models, unlike their ODE counterparts, can be unidentifiable even when all variables are perfectly measured, depending on the network structure38. Two key reasons underlie this lack of identifiability. First, when \({A}_{{\rm{mm}}}\) is invertible, it follows from Eqs. (2b) and (4) that, up to constant input terms, the DAE can be reduced to the ODE \(\dot{{\boldsymbol{g}}}=({A}_{{\rm{gg}}}-{A}_{{\rm{gm}}}{A}_{{\rm{mm}}}^{-1}{A}_{{\rm{mg}}}){\boldsymbol{g}}\). The gene dynamics can thus be equivalently represented either as a DAE or an ODE, making the model non-unique. Yet, DAEs (representing multi-omic networks) are generically sparser than their ODE counterparts (representing single-omic networks). For example, the network in Fig. 2a has 3 edges in the DAE model compared to 4 edges in the ODE model (ignoring self-edges). MINIE favours the identification of DAE models by promoting network sparsity in the inference of the mapping Γ and the covariance matrix of the GP functions. Second, DAEs may display equivalent (proportional) dynamics at the metabolome layer, hindering the identification of the regulatory metabolites. Breaking this proportionality contributes to the system identification, which, together with the previous assumption, is essential for accurately assessing an algorithm’s predictive capacity when using synthetic networks. See Supplementary Material, Section 3, for further discussion.
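The sparsity argument can be checked numerically on a small example. Substituting Eq. (2b) into the first block row of Eq. (4) gives the gene-only matrix \({A}_{{\rm{gg}}}-{A}_{{\rm{gm}}}{A}_{{\rm{mm}}}^{-1}{A}_{{\rm{mg}}}\). The sketch below uses a hypothetical motif (not the network of Fig. 2a) in which two genes drive one metabolite that regulates a third gene, and compares the number of regulatory edges into genes in the two representations.

```python
import numpy as np

# Hypothetical motif: genes 0 and 1 -> metabolite 0 -> gene 2.
A_gg = np.diag([-1.0, -1.0, -1.0])        # only self-degradation among genes
A_gm = np.array([[0.0], [0.0], [0.8]])    # metabolite 0 regulates gene 2
A_mg = np.array([[0.5, 0.3, 0.0]])        # genes 0 and 1 drive metabolite 0 (known prior)
A_mm = np.array([[-1.0]])

# Reduced ODE matrix obtained by eliminating the metabolite layer.
A_red = A_gg - A_gm @ np.linalg.solve(A_mm, A_mg)

def count_offdiag(A, tol=1e-12):
    """Number of nonzero off-diagonal entries (edges, ignoring self-edges)."""
    off = A - np.diag(np.diag(A))
    return int(np.sum(np.abs(off) > tol))

# Edges into genes to be inferred in the DAE: gene-gene plus metabolite-to-gene.
dae_edges = count_offdiag(A_gg) + int(np.sum(np.abs(A_gm) > 1e-12))
ode_edges = count_offdiag(A_red)
print(dae_edges, ode_edges)
```

In this toy motif a single metabolite-to-gene edge in the DAE becomes two gene-gene edges after elimination, which illustrates why a sparsity-promoting prior tends to favour the multi-omic representation.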

Case studies on nonlinear multi-omic models

The linear network motifs are useful for initial validation, but lack the complexity inherent in biological systems. To address this challenge, we consider two nonlinear biological models: a curated multi-omic network described by Hill-type regulatory interactions and the canonical lac operon model.

Curated multi-omic network

Figure 3 shows MINIE’s performance on a nonlinear multi-omic regulatory network curated from the literature. Since experimentally validated ground truths for multi-omic networks remain scarce, we constructed a realistic network based on established biological principles. This design draws from the scientific review39, which focuses on the regulation of metabolism at the transcriptomic level. By integrating the GRNs and metabolic networks described therein (along with reported inter-omic interactions; see Supplementary Material, Section 4 for details), we built a multi-omic network consisting of 9 genes, 8 metabolites, and 26 interactions (Fig. 3a). We used the BoolODE algorithm40 to generate biologically relevant multi-omic time-series data under two experimental conditions: mutant and control (Supplementary Material, Section 2). The BoolODE algorithm accounts for the nonlinear dynamics of cellular processes by modelling the saturation of molecular concentrations using the Hill function. We adapted the original code to simulate the molecular regulation under different timescales (i.e., ×75 faster for metabolites) based on comprehensive studies across biological systems41,42,43; for completeness, a performance comparison under different scaling factors is also reported in Table S1. The synthetic data is represented in Fig. 3b, together with the fitted pseudotime and metabolic trajectories by MINIE (for illustration purposes, only Cholesterol and Ornithine trajectories are shown). Despite the nonlinearity of the data, the inferred transcriptome–metabolome mapping successfully reconstructs the metabolite trajectories, showing a good fit with the original data.
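The saturation behaviour mentioned above can be illustrated with the standard Hill function. This is a generic sketch of the textbook form only; the exact way BoolODE combines Hill terms is not reproduced here, and the threshold `K` and coefficient `n` are hypothetical defaults.

```python
import numpy as np

def hill_activation(x, K=1.0, n=2.0):
    """Saturating activation x^n / (K^n + x^n): 0 at x = 0, 0.5 at x = K, -> 1 for x >> K."""
    x = np.asarray(x, dtype=float)
    return x**n / (K**n + x**n)

def hill_repression(x, K=1.0, n=2.0):
    """Complementary repression term, 1 - activation."""
    return 1.0 - hill_activation(x, K, n)

concentrations = np.array([0.0, 0.5, 1.0, 2.0, 10.0])
print(hill_activation(concentrations))
```

Because the response flattens at high concentrations, a linear mapping can only approximate such dynamics locally, which makes the quality of the fit reported for the nonlinear case study informative.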

Fig. 3: MINIE’s performance on nonlinear multi-omic model.

a Multi-omic network of genes (purple) and metabolites (green). Solid arrows depict interactions to be inferred by MINIE, while dashed arrows indicate known metabolic interactions input as prior knowledge. b Artificially-generated transcriptomic (left) and metabolomic (right) data by the BoolODE algorithm. Left: Projection of the single-cell data using PCA. The arrow shows the pseudotime trajectory estimated by MINIE. Right: Bulk metabolic concentrations (dots) and inferred metabolomic trajectories (dashed lines) using the mapping Γ. c Histogram of predicted probability values for the existence of each link, where true positive links are represented by a dark shade. d Reconstructed network based on predicted probabilities for ε = 0.4. The true links (solid green), false positives (dotted grey), and false negatives (solid grey) are represented by arrows weighted according to the confidence value predicted by MINIE.

The confidence scores output by MINIE are indicated in Fig. 3c. The separation between true positive links and non-existing links enables us to set a threshold, ε = 0.4, where eight true positives (out of 10) and one false positive are identified (Fig. 3d). The false positive could be explained by the dynamic similarity between C/EBPα and PPARβ (Fig. S2); this led to a low confidence score of 0.33 for the true link from C/EBPα to HNF4α, which is excluded as a false negative for ε = 0.4. Despite these minor discrepancies, MINIE achieved a strong performance, with an AUROC of 0.93. Notably, the inter-omic link from Ornithine to SREBP1c was correctly inferred with moderate confidence (0.50), which further supports MINIE’s reliability in complex scenarios.

Lac operon model

Figure 4 shows MINIE’s performance on the well-characterised Escherichia coli lac operon. Unlike the previous example, this model provides both a well-known topology and experimentally measured kinetic parameters, enabling more stringent validation. We adopted the delay-differential equation model in ref. 44, which describes the coupled dynamics of five molecular species: lactose (L), allolactose (A), mRNA (M) transcribed from the lacZ gene, β-galactosidase (B), and membrane permease (P). These represent fast (metabolites: L and A) and slow (genes/proteins: M, B and P) variables (Fig. 4a), accounting for the timescale separation modelled in MINIE. The simulations were performed in a stochastic Langevin framework under two environmental scenarios: an induction phase with high extracellular lactose and increasing nutrition, and a nutrient-rich repression phase with decreasing extracellular lactose (Fig. 4b).

Fig. 4: MINIE’s performance on Escherichia coli lac operon.

a Multi-omic regulatory network of genes/proteins (purple) and metabolites (green) of the lac operon model. Solid arrows depict interactions to be inferred by MINIE, while dashed arrows indicate known metabolic interactions input as prior knowledge. b Artificially-generated transcriptomic (left) and metabolomic (right) data simulating a nutrient-rich repression phase, with decreasing levels of lactose. Left: Projection of the single-cell data using PCA. The arrow shows the pseudotime trajectory estimated by MINIE. Right: Bulk metabolic concentrations (dots) and inferred metabolomic trajectories (dashed lines) using the mapping Γ for lactose (blue) and allolactose (magenta). c Histogram of predicted probability values for the existence of each link, where true positive links are represented by a dark shade. d Reconstructed network based on predicted probabilities for ε = 0.8. The true links and non-existing links are respectively represented by green and grey arrows, weighted according to the confidence values predicted by MINIE.

Despite the system’s strong nonlinearity and time-delay regulation, MINIE accurately inferred the regulatory links, achieving an AUROC of 0.93 and AUPRC of 0.85 (Fig. 4d). Notably, the key inter-omic link from allolactose to mRNA was inferred with high confidence (0.88), demonstrating robust performance under mechanistically realistic conditions.

These two case studies—one inspired by biological principles and the other grounded in mechanistic data—demonstrate MINIE’s ability to infer complex regulatory networks under nonlinear dynamics, timescale separation, and biological variability.

Benchmarking MINIE against published algorithms

Benchmarking new algorithms is essential to characterise their pros and cons over existing methods. However, MINIE is the first network inference method designed specifically for time-series multi-omic data. Hence, we benchmarked MINIE against state-of-the-art GRN inference methods using two strategies to validate MINIE’s new features: (1) considering a synthetic multi-omic dataset for the evaluation of available GRN inference methods (even though they are only designed for single-omic studies and do not account for multiple timescales), and (2) comparing the GRN inference capabilities of MINIE solely on transcriptomic single-cell data using the BEELINE pipeline40.

Multi-omic dataset benchmarking

In the first strategy, we used the multi-omic model investigated in Fig. 3. We compared MINIE with BINGO3 and dynGENIE320, which are the top-performing methods in the DREAM4 in silico GRN inference challenge3,45,46,47. Since BINGO and dynGENIE3 were designed for bulk time-series data, we used as input data, for all three methods, the average values of molecular concentrations across all cells for each time point. For BINGO and dynGENIE3, we concatenated the gene expression and metabolomic data into a single input matrix (under the premise that these data were part of a single-omic layer). In contrast, MINIE used its two-step pipeline to integrate metabolomic and transcriptomic data. All methods output confidence matrices, which are evaluated using the AUROC and area under the precision-recall curve (AUPRC) metrics. Figure 5 shows that MINIE outperforms both BINGO and dynGENIE3 in terms of AUROC and AUPRC by at least 13 and 8 percentage points, respectively, with AUROC exceeding 90%. Although all methods face similar challenges, such as misidentifying regulators like C/EBPα and PPARβ, MINIE stands out as the only approach capable of uniquely inferring the regulatory links from metabolites to genes (Fig. S8). A comparison of running times for the three methods is also included in Table S4.

Fig. 5: Benchmark of MINIE against state-of-the-art GRN inference methods.

a Receiver-operating characteristic curves and b precision-recall curves of MINIE, BINGO and dynGENIE3. The results show that MINIE (AUROC = 0.93, AUPRC = 0.86) outperforms both BINGO (AUROC = 0.80, AUPRC = 0.78) and dynGENIE3 (AUROC = 0.79, AUPRC = 0.77).

Importantly, these results constitute a de facto ablation study, isolating the functional advantages of MINIE’s design choices. When applied to bulk RNA-seq data, BINGO and MINIE operate under equivalent conditions except for the inclusion of the metabolomic layer in MINIE through the transcriptome–metabolome mapping. The superior performance of MINIE in this setting directly reflects the added value of this mapping and its integration into the network inference step. Furthermore, MINIE achieves even higher predictive accuracy and confidence when applied to single-cell data (Fig. 3d) compared to its bulk counterpart (Fig. S8a), underscoring the informational value of single-cell data for network inference. Together, these results validate the impact of MINIE’s core design elements: the integration of multi-omic data modalities, the modelling of molecular layer dynamics, and the exploitation of single-cell variability.

Single-cell dataset benchmarking

In the second strategy, we benchmarked MINIE’s capacity to infer GRNs from scRNA-seq data using the BEELINE pipeline40. The analysis is based on synthetic, curated, and experimental datasets derived from single-cell transcriptomic data, and considers the following performance metrics: the AUPRC ratio, which compares the AUPRC score of a given method to that of a random predictor, and the early-precision ratio (EPR), which measures the fraction of true positives among the top-k predicted edges relative to a random predictor (accounting for the edge density of the ground truth). Figure 6 compares the performance of MINIE with other published methods16,17,23,24,48,49,50,51,52,53,54,55 included in the BEELINE study, and a comparison of the computational running times is included in Table S5. The synthetic networks were designed to evaluate each method’s ability to infer regulatory networks producing a variety of trajectories observed in differentiation and development studies. Figure 6a shows that MINIE achieved superior performance in three out of six network motifs (bifurcating, bifurcating converging, and trifurcating) and closely matched the best-performing methods in the remaining ones (linear, cycle, and long linear). Additionally, MINIE’s performance was consistent across different cell numbers (100, 200, 500, 2000, and 5000), with a robust average AUPRC close to 0.8 across multiple network topologies (Fig. S9). Figure 6b shows the benchmark using curated networks (mCAD, VSC, HSC, GSD) derived from validated Boolean models. MINIE achieved competitive AUPRC ratios, particularly excelling in the HSC and GSD networks. Although it performed poorly in the mCAD network (as did most other methods), MINIE stands among the top-performing methods. Notably, our method maintained consistently high performance across both curated and synthetic datasets, in contrast to other methods that typically excelled in one domain but underperformed in the other.
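The two ratio metrics described above can be sketched as follows. This is an illustrative reading of the definitions in the text, not the BEELINE code: AUPRC is estimated here by average precision, the random-predictor baseline is taken to be the edge density of the ground truth, and k is assumed to equal the number of true edges. The example scores and labels are hypothetical.

```python
import numpy as np

def average_precision(scores, labels):
    """AUPRC estimated as the mean precision at the rank of each true edge."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = np.asarray(labels, dtype=bool)[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision_at_k[hits].mean())

def auprc_ratio(scores, labels):
    """AUPRC divided by the edge density (expected AUPRC of a random predictor)."""
    density = float(np.mean(np.asarray(labels, dtype=bool)))
    return average_precision(scores, labels) / density

def early_precision_ratio(scores, labels):
    """Precision among the top-k edges (k = number of true edges) over the edge density."""
    labels = np.asarray(labels, dtype=bool)
    k = int(labels.sum())
    top_k = np.argsort(-np.asarray(scores, dtype=float), kind="stable")[:k]
    return float(labels[top_k].mean() / labels.mean())

# Hypothetical ranking of 8 candidate edges, 2 of which are true.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0, 0, 0, 0]
print(auprc_ratio(scores, labels), early_precision_ratio(scores, labels))
```

A ratio of 1 corresponds to random performance under these assumptions, so values well above 1 indicate that true edges are concentrated near the top of the ranking.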

Fig. 6: Benchmark of MINIE using the BEELINE pipeline.

a Performance on BEELINE synthetic networks. Each column corresponds to a different network motif, while rows represent GRN inference methods. The performance scores are averaged over the 20 datasets with 2000 and 5000 cells generated in the original study40. The GRN inference methods are ordered in descending order based on the median value of their median AUPRC ratios across all motifs. b Performance on BEELINE curated networks. The two sets of columns represent the EPR and median AUPRC ratios. The curated networks include mCAD (mammalian cortical area development), VSC (ventral spinal cord), HSC (hematopoietic stem cell differentiation), and GSD (gonadal sex determination). The performance scores are averaged over the 10 datasets without considering dropouts. The performance values included in this figure, except for MINIE's, are obtained from ref. 40.

Finally, we tested MINIE on experimental scRNA-seq datasets using ground truths reconstructed from several resources containing regulatory information, such as ENCODE56, DoRothEA57, and STRING58. MINIE’s performance, while strong on synthetic and curated datasets, was limited with these experimental data (Fig. S10). This difference raised two hypotheses: (i) biases in the ground-truth networks that favour correlation-based methods or (ii) limitations in MINIE’s scalability to large datasets. However, tests with smaller experimental datasets produced similar trends (Fig. S10; second row), discarding the second hypothesis. Regarding the first hypothesis, the BEELINE study itself demonstrated that GRN inference performance varies substantially depending on whether STRING or ChIP-seq-based networks are used as ground truths, with higher scores observed on STRING despite similar network densities40. Likewise, other independent studies have also reported that imposed ground truths may introduce biases and do not always capture direct regulatory interactions59,60. To further explore this, we applied a traditional correlation-based analysis to the experimental datasets, which achieved notably high performance—particularly when evaluated against nonspecific and functional ground truths (Fig. S10; first row). These findings suggest that the current ground-truth construction may systematically favour correlation-based rather than causal-based approaches, highlighting a potential area for improvement in the benchmarking of causal network inference methods.

Experimental validation on Parkinson’s disease data

We applied the pipeline to study PD61. The experimental procedure focused on the differentiation of induced Pluripotent Stem Cells (iPSCs) to dopaminergic neurons in cell lines derived from a PD patient with a PINK1 mutation and a control subject. Time-series bulk metabolomics and scRNA-seq data were collected at six stages of differentiation (days 0, 8, 18, 25, 32, and 37) for both healthy and PD cell lines. Figure 7a, b illustrates the dynamics of the experimental data together with the estimated pseudotime and metabolite trajectories generated by MINIE.

Fig. 7: Multi-omic network inference for the Parkinson’s disease datasets.

Time-series transcriptomic (upper) and metabolomic (lower) data for a PINK1 mutant and b healthy samples. Upper: Projection of the single-cell data using PCA. The arrow shows the pseudotime trajectory estimated by MINIE; the trajectory does not pass through the point clouds due to significant zero inflation in the data (which is accounted for by a weighting scheme described in “Methods” section). Lower: Bulk metabolic concentrations (dots) and inferred metabolomic trajectories (dashed lines) using MINIE's transcriptome–metabolome mapping. For illustration purposes, just two metabolites are shown: phenylalanine (purple) and glutamate (blue). c Histograms of predicted probability values for the existence of each link, considering gene-gene interactions (left panel), metabolite-to-gene interactions (middle panel), and perturbation targets (right panel). d Reconstructed network based on confidence values for ε2 = 0.06. Only genes (purple) and metabolites (green) with regulatory links are included (node size is proportional to the degree of each node). PINK1, the gene responsible for the mutation under study, is highlighted in bold. e Reconstructed network for ε1 = 0.04.

The PINK1 mutation was modelled as an external perturbation to identify genes with dynamic responses to this mutation. We analysed the results across three dimensions (Fig. 7c): predicted gene-gene interactions (left panel), regulatory role of metabolites (middle panel), and perturbation target candidates (right panel). To visualise the reconstructed networks based on MINIE’s predictions, we consider two choices of thresholds on the confidence values: ε = 0.06 (Fig. 7d) and ε = 0.04 (Fig. 7e). At the higher threshold, the reconstruction yields a small-scale network with 63 genes and 6 metabolites, interconnected by 274 links. The lower threshold leads to a single connected component network with 552 molecules and 3669 interactions. The small-scale network is particularly well-suited for biological interpretability, as it highlights only the most confident interactions. On the other hand, the large-scale nature of the latter network demonstrates the comprehensive scope of MINIE’s predictive capacity and offers a complete resource amenable to computational analysis within network medicine and other data science frameworks62,63,64. Such large-scale networks can enable, for example, the identification of key hubs, pathways, and relationships that may not be immediately apparent in raw data.
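The thresholding step can be sketched in a few lines of Python. This is an illustration only: it assumes that a link is kept when its confidence value is at least ε, and the node names and confidence values are invented for the example.

```python
def threshold_network(confidence, targets, regulators, eps):
    """Return (regulator, target, confidence) triples with confidence >= eps.

    confidence[i][j] is the confidence of a link from regulators[j] to targets[i].
    """
    links = []
    for i, target in enumerate(targets):
        for j, regulator in enumerate(regulators):
            if confidence[i][j] >= eps:
                links.append((regulator, target, confidence[i][j]))
    return links


def active_nodes(links):
    """Nodes that take part in at least one retained link."""
    return {node for regulator, target, _ in links for node in (regulator, target)}


# Toy confidence matrix: two target genes, regulated by two genes and one metabolite.
targets = ["g1", "g2"]
regulators = ["g1", "g2", "m1"]
confidence = [[0.00, 0.07, 0.05],
              [0.03, 0.00, 0.09]]

small = threshold_network(confidence, targets, regulators, eps=0.06)  # 2 links
large = threshold_network(confidence, targets, regulators, eps=0.04)  # 3 links
```

Under this convention, raising ε can only remove links, so the network obtained at the stricter threshold is always a subnetwork of the one obtained at the looser threshold.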

In what follows, we focus on the biological interpretation of the regulatory network inferred in Fig. 7d. A direct quantitative assessment of our predictions is not possible due to a lack of ground truth. Notwithstanding, we conducted a comprehensive literature review of reported biological interactions to qualitatively validate our findings, a common practice in network inference studies65,66,67. Several top-scoring genes predicted as perturbation targets are known to be linked to neurodegeneration, including PD. For instance, ATP5A1, involved in mitochondrial function and mitophagy, is upregulated in models with PINK1 mutations and has been linked to PD pathogenesis68. Similarly, RHOA, a key regulator of cytoskeletal dynamics, has been implicated in PD through its role in maintaining neuronal structure and function, with dysregulation observed in PD models69. Other identified genes (e.g., PHGDH and DNAJC7), though not directly linked to PINK1, have been associated with neuronal health and survival in PD70,71. We have also identified several high-confidence predictions for gene-gene interactions, including well-known interactions such as H1F0-MAP1B. Other links, like PCNA-EEF1B2, may represent indirect interactions that are predicted with higher confidence than the true direct regulations along the chain PCNA-EEF1A1-EEF1B2. Notably, we also identified novel potentially relevant links, including several interactions regulating ZFAS1, a gene known for its role in reducing neuronal damage and inhibiting inflammation and apoptosis, though its specific involvement in PD remains uncertain72. Finally, our network also highlights several PD-related hubs, including: NREP, involved in neural regeneration and plasticity; DLK1, a stress-response kinase implicated in dopaminergic neuron degeneration; and PDP1, a regulator of mitochondrial metabolism. These findings show the potential of our method to uncover both established and previously unreported interactions that may be relevant to PD development.

Validating metabolite-gene associations proved challenging due to the limited literature on metabolomic regulation. We therefore qualitatively examined our top predicted metabolites, including Glutathione (GSH/GSSG) and Glutamate, both well established in PD pathophysiology. Glutathione is depleted in early PD, perpetuating oxidative stress, mitochondrial dysfunction, and neuronal death73, while dysregulated glutamate contributes to excitotoxicity and progressive neuronal damage74. Although specific predicted metabolite-gene pairs like Glutamate-TAF7 and GSSG-CALM2 are not currently reported in the HMDB or STITCH databases, our results suggest biologically plausible links. In particular, the GSSG-CALM2 association is supported by mechanistic evidence, as glutathione redox status modulates calcium/calmodulin signalling through protein S-glutathionylation, a process strongly implicated in PD-related oxidative stress. These findings underscore the potential of our approach to generate novel, testable hypotheses that extend beyond existing biochemical annotations.

Discussion

This study presented MINIE, a two-step algorithm designed for multi-omic network inference from time-series data. Unlike other state-of-the-art algorithms designed for time-series (e.g., BINGO and dynGENIE3), which focus exclusively on single-omic data, MINIE models both transcriptomic and metabolomic dynamics within a DAE framework. Our results show that, when accounting for datasets with timescales differing by 100-fold or more, single-omic methods struggle to accurately infer regulatory cross-omic interactions, while MINIE successfully predicts these links with high confidence. These findings emphasise the importance of tailoring algorithms to the unique temporal and molecular characteristics of multi-omic data, rather than directly relying on single-omic methods. Furthermore, even in benchmarks comprising solely single-omic data, MINIE attains a highly competitive performance in network inference from scRNA-seq data, both in synthetic and curated data. Such performance stems from our Bayesian framework for pseudotime estimation, designed to model the cell variability intrinsic in single-cell data. MINIE’s performance was validated on both synthetic and experimental datasets. On synthetic data (including linear and nonlinear dynamics), MINIE accurately inferred the underlying network topologies, capturing both intra- and inter-omic interactions, and outperformed other published methods. A qualitative analysis on a PD study revealed that MINIE successfully identified known regulatory interactions (especially gene-gene regulations) while also uncovering novel links with potential relevance to the disease.

Despite these strengths, MINIE has limitations. Its performance depends on the quality and completeness of the input data, as well as prior knowledge of the metabolic network topology. Moreover, our analysis shows that while MINIE is robust across a wide range of cell counts (from a few hundred to thousands), its accuracy can substantially decrease when the number of time points or experimental conditions is limited. These challenges are expected given the high-dimensional nature of the data, and the fact that datasets comprising a single experiment often lack sufficient excitation—that is, the necessary variability in the input conditions to accurately identify system parameters. Moreover, our linear mapping approach, though beneficial for interpretability and computational efficiency, likely oversimplifies nonlinear regulatory mechanisms in metabolite-gene interactions. As we demonstrate in the Supplementary Material, Section 8 and Fig. S7, nonlinear methods for sparse model identification can become computationally intractable and are prone to overfitting in high-dimensional transcriptomic datasets. Future work should therefore focus on developing sophisticated nonlinear mapping methods that preserve biological interpretability while effectively handling the dimensionality challenges inherent to such data. Finally, although prior knowledge typically enhances inference accuracy, it may introduce biases when network structures are incomplete or inaccurate.

With current experiments, MINIE cannot infer the metabolic network. This is due to the lack of identifiability in the algebraic part (i.e., metabolite dynamics) of the DAE model, which directly restricts our ability to reconstruct the metabolic network without imposing additional constraints38. Given this intrinsic limitation, we focus on a critical subproblem: inferring how the metabolome regulates the transcriptome, an area that remains largely unexplored. Fortunately, the metabolome is one of the most well-characterised omic layers, providing a wealth of prior knowledge that can be effectively leveraged for network inference. This prior knowledge is tailored to the experimental conditions through the inferred transcriptome–metabolome mapping, enabling the integration of metabolomic data into our method. Future work could explore more flexible approaches to incorporate prior knowledge, which would further increase MINIE’s adaptability to novel biological contexts. Moreover, MINIE’s reliance on a pseudotime approach may not be ideal for capturing highly heterogeneous behaviour. The presented transcriptome–metabolome mapping could also be integrated in an approach based on modelling the propagation of the full single-cell distributions26. Finally, reliance on GP modelling and Bayesian inference poses scalability challenges for larger datasets. MCMC sampling, while effective, is computationally intensive and may experience slow convergence in high-dimensional settings (see Supplementary Material, Section 6, for experiments and discussion on MINIE’s computational efficiency). Despite these challenges, our results demonstrate that the method can be successfully applied to networks comprising around 500 nodes. In high-dimensional problems, we strongly recommend conducting several MCMC runs in parallel. Parallelisation not only accelerates the inference process but also improves exploration of the parameter and topology spaces.

Looking ahead, several promising directions could extend and refine MINIE. Integrating additional omic layers, such as proteomics and epigenomics, would provide a more holistic understanding of regulatory mechanisms, bringing MINIE closer to reconstructing the full interactome. The proteome, in particular, can be naturally incorporated in the DAE framework as protein translation operates on timescales comparable to mRNA transcription40. Additionally, the advent of single-cell proteomics offers a rich dataset that is compatible with our pseudotime approach75. To fully capitalise on these developments, there is an urgent need for comprehensive multi-omic benchmarks with validated ground truths to facilitate more rigorous evaluation and refinement of network inference methods.

Methods

MINIE’s algorithm

The pseudocode describing MINIE’s pipeline is included in the Supplementary Material (Algorithms 1 and 2). A MATLAB implementation of MINIE is available on GitLab (see “Data availability” section). Below, we detail the two main steps implemented in MINIE, namely the transcriptome–metabolome mapping inference and the network inference via Bayesian regression.

Notation

The scRNA-seq data comprises a sequence of samples collected at specific time points, with each sample capturing the gene expression levels of individual cells. For each time point k = 1, …, T, the number of cells measured is denoted by Nk. This dataset is represented by matrix G = [G1 …GY], where each column \({{\boldsymbol{G}}}_{i}\in {{\mathbb{R}}}_{\ge 0}^{{n}_{{\rm{g}}}}\) represents the gene expression levels of cell i. The matrix contains all cells measured over all time points, with the total number of cells being \(Y=\mathop{\sum }\nolimits_{k = 1}^{T}{N}_{k}\), and is sorted by time points such that \(G=[{{\boldsymbol{G}}}_{1}\,\ldots \,{{\boldsymbol{G}}}_{{N}_{1}}\,{{\boldsymbol{G}}}_{{N}_{1}+1}\,\ldots \,{{\boldsymbol{G}}}_{{N}_{1}+{N}_{2}}\,\ldots \,{{\boldsymbol{G}}}_{Y}]\).

The metabolomic data is represented by \(M\in {{\mathbb{R}}}_{\ge 0}^{{n}_{{\rm{m}}}\times T}\), a matrix of observed responses containing nm experimental metabolite concentrations measured in bulk over T time points.

Transcriptome–metabolome mapping inference

We infer the transcriptome–metabolome mapping Γ from metabolomic and transcriptomic data using a sparse regression formulation.

Data processing

The inherent differences in data acquisition methods for the transcriptome and metabolome result in different data modalities. While transcriptomic data is often available at single-cell resolution, metabolomic data is typically collected in bulk. To maintain consistency in data modality for the inference of Γ, we aggregate the scRNA-seq data by averaging the gene expression levels across cells measured at the same time point. This procedure provides a bulk-like gene expression matrix, denoted by \(\bar{G}=[{\bar{{\boldsymbol{G}}}}_{1},\ldots ,{\bar{{\boldsymbol{G}}}}_{T}]\in {{\mathbb{R}}}_{\ge 0}^{{n}_{{\rm{g}}}\times T}\), which is compatible with the metabolomic data matrix M in the regression problem. Another essential component of this step is data normalisation. Transcriptomic and metabolomic data often differ substantially in scale, which can cause numerical instability. To address this challenge, we normalise the dynamic range of each gene and metabolite to one (e.g., \(\mathop{\max }\nolimits_{j}{\bar{G}}_{ij}-\mathop{\min }\nolimits_{j}{\bar{G}}_{ij}=1\) for each gene i, where j runs over the time points).
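As an illustration of this processing step, the sketch below averages single-cell profiles per time point and rescales each gene to unit dynamic range. It is a minimal Python example with invented numbers; the function names are hypothetical, and it assumes that the normalisation is a pure rescaling (no shift) and that genes with zero range are left unchanged.

```python
def bulk_like(cells_by_time):
    """Average single-cell profiles per time point.

    cells_by_time[k] is the list of cells measured at time point k, and each
    cell is a list of n_g expression values. Returns an n_g x T matrix.
    """
    n_genes = len(cells_by_time[0][0])
    return [[sum(cell[i] for cell in cells) / len(cells) for cells in cells_by_time]
            for i in range(n_genes)]


def normalise_range(matrix):
    """Rescale each row (gene or metabolite) so that max - min over time equals one."""
    normalised = []
    for row in matrix:
        span = max(row) - min(row)
        # Assumption: rows without any variation are returned unchanged.
        normalised.append([value / span for value in row] if span > 0 else list(row))
    return normalised


# Toy data: two genes, three time points with 2, 1, and 2 cells.
cells_by_time = [[[0, 2], [2, 2]],
                 [[4, 2]],
                 [[3, 6], [5, 6]]]
G_bar = bulk_like(cells_by_time)        # [[1.0, 4.0, 4.0], [2.0, 2.0, 6.0]]
G_bar_norm = normalise_range(G_bar)
```

The same `normalise_range` helper could be applied to the rows of the metabolomic matrix M.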

Constructing the metabolic network

The regression problem for inferring matrices Amm and Amg is highly underdetermined: the number of possible molecular interactions far exceeds the number of experimental samples. To reduce the degree of underdetermination, a key assumption in the inference of Amm and Amg is that the topology of the metabolic network is known, but the interaction weights within this topology are not. To determine the topology of the metabolic network, we curated the Human 1 GEM37 comprising over 10,000 metabolites, 3500 genes, and 13,000 metabolic reactions among these molecules (see Supplementary Material, Section 1, for details). From this model, we determine the nonzero structure of the interaction matrices Amm and Amg. This procedure assigns the positions of the nonzero elements in both matrices, while the actual weights of these interactions are estimated by solving the regression problem. Sensitivity analysis of the network inference results with respect to gaps and errors in the prior metabolic network topology can be found in Supplementary Material, Section 7 and Fig. S6.

Sparse regression

We now infer the parameters of matrices Amm and Amg in the linear algebraic equation (2b). To prevent trivial solutions where all coefficients are zero (i.e., Amm = Amg = 0), we employ the decomposition \({A}_{{\rm{mm}}}=-{I}_{{n}_{{\rm{m}}}}+{\bar{A}}_{{\rm{mm}}}\), where \({\bar{A}}_{{\rm{mm}}}\) contains only off-diagonal elements and \({I}_{{n}_{{\rm{m}}}}\) is an identity matrix of size nm. Thus, Eq. (2b) is rewritten as

$${\boldsymbol{m}}={\bar{A}}_{{\rm{mm}}}{\boldsymbol{m}}+{A}_{{\rm{mg}}}{\boldsymbol{g}}+{{\boldsymbol{b}}}_{{\rm{m}}}.$$
(5)

Note that, due to the aforementioned decomposition, the coefficients in \({\bar{A}}_{{\rm{mm}}}\), Amg, and bm are appropriately scaled by the inverse of the diagonal entries of Amm. The nonzero elements of \({\bar{A}}_{{\rm{mm}}}\), Amg, and bm are then estimated through a linear regression approach, formulated as M = XΘ. Here, \(X=[M\,\,\,\bar{G}]\) is the matrix of predictor variables, and Θ is the vector of unknown regression coefficients corresponding to the previously assigned nonzero elements of \({\bar{A}}_{{\rm{mm}}}\), Amg, and bm.

Our approach assumes that the time-series sampling typically consists of experimental designs with two conditions (e.g., mutant and control) and 4–6 time points per experiment. Given the sparse nature of metabolic networks, it is expected that most metabolites are regulated by fewer than 12 interactions (which defines an upper bound based on the combination of 6 time points across 2 conditions). However, when metabolites have a large number of regulators or the number of samples is small, the regression problem can still be underdetermined. To mitigate this issue, we solve the following sparse regression problem

$$\mathop{\min }\limits_{\Theta }\parallel X\Theta -M{\parallel }_{2}^{2}+\lambda \parallel \Theta {\parallel }_{2}^{2},$$
(6)

where the regularisation term shrinks the regression coefficients towards zero, ensuring a unique sparse solution. This step yields the coefficients Θ, which are then integrated in their corresponding positions in Amm and Amg and used to derive \(\Gamma =-{A}_{{\rm{mm}}}^{-1}{A}_{{\rm{mg}}}\).
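For reference, the closed-form minimiser of Eq. (6) and the link between Eq. (5) and Γ can be written out explicitly. This is a sketch under two assumptions: the penalty is the squared 2-norm exactly as printed in Eq. (6), and the samples are arranged as the rows of X, restricted to the predictors allowed by the prior topology. The symbol \(\hat{\Theta }\) for the minimiser and the identity matrix I of matching size are notation introduced here. Setting the gradient of the objective to zero gives

$$2{X}^{\top }(X\Theta -M)+2\lambda \Theta =0\quad \Rightarrow \quad \hat{\Theta }={({X}^{\top }X+\lambda I)}^{-1}{X}^{\top }M,$$

which is unique for any λ > 0 because \({X}^{\top }X+\lambda I\) is positive definite. Rearranging Eq. (5) and using \({A}_{{\rm{mm}}}=-{I}_{{n}_{{\rm{m}}}}+{\bar{A}}_{{\rm{mm}}}\) then yields

$$({I}_{{n}_{{\rm{m}}}}-{\bar{A}}_{{\rm{mm}}}){\boldsymbol{m}}={A}_{{\rm{mg}}}{\boldsymbol{g}}+{{\boldsymbol{b}}}_{{\rm{m}}}\quad \Rightarrow \quad {\boldsymbol{m}}=-{A}_{{\rm{mm}}}^{-1}{A}_{{\rm{mg}}}{\boldsymbol{g}}-{A}_{{\rm{mm}}}^{-1}{{\boldsymbol{b}}}_{{\rm{m}}}=\Gamma {\boldsymbol{g}}-{A}_{{\rm{mm}}}^{-1}{{\boldsymbol{b}}}_{{\rm{m}}},$$

provided that Amm is invertible.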

We note that this linear formulation may overlook nonlinear metabolite-gene regulation. Yet, nonlinear methods proved computationally intractable and prone to overfitting in high-dimensional settings (as shown in Supplementary Material, Section 8 and Fig. S7).

Network inference via Bayesian regression

We infer the multi-omic regulatory network from the time-series scRNA-seq data and the previously inferred transcriptome–metabolome mapping. Following Eq. (1), the dynamics of the gene expression are modelled as a nonlinear stochastic differential equation

$$\dot{{\boldsymbol{g}}}={\boldsymbol{f}}({\boldsymbol{g}},{\boldsymbol{m}},{{\boldsymbol{b}}}_{{\rm{g}}};{\boldsymbol{\theta }})+{\boldsymbol{\rho }}({\boldsymbol{g}},{\boldsymbol{m}}){\boldsymbol{w}},$$

where f is a function of gene expression and metabolite concentrations, modelled as a collection of GPs, \({\boldsymbol{f}}={[{{\boldsymbol{f}}}_{1},\ldots ,{{\boldsymbol{f}}}_{{n}_{{\rm{g}}}}]}^{\top }\), and w is a white noise with covariance \(Q={\rm{diag}}({q}_{1},\ldots ,{q}_{{n}_{{\rm{g}}}})\).

MINIE constructs the posterior distribution \(p({\boldsymbol{\theta }}| G)\) for model parameters θ, given the observed scRNA-seq data G. To obtain a tractable distribution, latent variables in the form of the continuous gene expression trajectory g and the pseudotime variable τ are introduced and integrated out by Monte Carlo integration:

$$\begin{array}{rcl}p({\boldsymbol{\theta }}| G)&=&\iint \,p({\boldsymbol{\theta }},{\boldsymbol{g}},{\boldsymbol{\tau }}| G)\,{\rm{d}}{\boldsymbol{g}}\,{\rm{d}}{\boldsymbol{\tau }}\\ &\propto &p({\boldsymbol{\theta }})\iint \,p(G| {\boldsymbol{g}},{\boldsymbol{\tau }},{\boldsymbol{\theta }})\,p({\boldsymbol{g}}| {\boldsymbol{\theta }})\,p({\boldsymbol{\tau }})\,{\rm{d}}{\boldsymbol{g}}\,{\rm{d}}{\boldsymbol{\tau }}.\end{array}$$

Here, \(p(G| {\boldsymbol{g}},{\boldsymbol{\tau }},{\boldsymbol{\theta }})\) is the measurement model defined below; \(p({\boldsymbol{g}}| {\boldsymbol{\theta }})\) is the prior for gene expression trajectories, governed by the GP model in Eq. (1a); and \(p({\boldsymbol{\tau }})\) represents the prior distribution over the pseudotime variables, incorporating information from the experimental sampling times.

MINIE’s inference procedure iteratively alternates between sampling the pseudotime estimate, the network topology, and the continuous-time gene trajectories. Three distinct MCMC samplers are employed in this procedure, as described below, whereas the required priors are reported in the Supplementary Material, Section 5. The convergence of the MCMC sampling in the case of the nonlinear multi-omic network represented in Fig. 3 and for the PD case in Fig. 7 is reported in the Supplementary Material, Fig. S11. With high-dimensional problems (n > 50), the convergence is slower. To help in exploration, it is recommended to conduct several MCMC runs (in parallel) and combine the results by taking the average over the confidence matrices. For example, our results on the PD data were obtained from 28 parallel chains.

Data processing

To account for outliers and higher variability in single-cell data compared to bulk data, the normalisation used for single-cell data is slightly different. With single-cell data, the expression levels of all genes are normalised such that the difference between the 95th and 5th percentiles is one. In order to maintain correct scaling of the transcriptome–metabolome mapping, each column of Γ must be scaled by the inverse of the scaling factor for the corresponding gene. Finally, to scale inferred metabolite levels to transcriptomic levels, Γ is normalised row-wise such that the difference of the 95th and 5th percentiles of each row of the product ΓG is one.
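The first two scaling steps can be illustrated as follows. This is a minimal Python sketch with invented data: the function names are hypothetical, the percentile uses linear interpolation (other percentile conventions would give slightly different factors), and the final row-wise normalisation of Γ is not shown.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def gene_scales(G):
    """Scaling factor per gene so that (95th - 5th percentile) becomes one.

    G is an n_g x Y matrix (genes x cells). Genes with zero spread are not handled.
    """
    return [1.0 / (percentile(row, 95) - percentile(row, 5)) for row in G]


def rescale(G, Gamma):
    """Scale the genes and compensate in the columns of Gamma."""
    scales = gene_scales(G)
    G_scaled = [[s * value for value in row] for s, row in zip(scales, G)]
    # Column j of Gamma is multiplied by 1 / s_j so that Gamma @ g is unchanged.
    Gamma_scaled = [[entry / s for entry, s in zip(row, scales)] for row in Gamma]
    return G_scaled, Gamma_scaled


# Toy data: two genes measured in 21 cells, one metabolite.
G = [[float(j) for j in range(21)], [2.0 * j for j in range(21)]]
Gamma = [[1.0, 0.5]]
G_scaled, Gamma_scaled = rescale(G, Gamma)
```

Because each column of Γ is divided by the factor applied to the corresponding gene, the product of Γ with a cell's expression vector is the same before and after scaling.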

Pseudotime sampling

The integration of scRNA-seq data as input is based on the concept of pseudotime. Pseudotime represents a latent temporal dimension, capturing the progression of cell states along a biological process76. Unlike traditional GRN inference methods that require the computation of pseudotime prior to the network inference step40, MINIE integrates pseudotime as a parameter to be sampled concurrently with other model parameters. This procedure is detailed as follows.

MINIE models the single-cell gene expression data G as samples drawn from the continuous trajectory g, where the pseudotime τ serves as the sampling time:

$${{\boldsymbol{G}}}_{j}={\boldsymbol{g}}({{\boldsymbol{\tau }}}_{j})+{{\boldsymbol{v}}}_{j},$$
(7)

where Gj denotes the gene expression profile of cell j, τj is the associated pseudotime at which the gene expression state \({\boldsymbol{g}}(t)\) is sampled along the trajectory, \({{\boldsymbol{v}}}_{j} \sim {\mathcal{N}}({0}_{{n}_{{\rm{g}}}\times 1},R)\) accounts for measurement noise, and \(R=\,\text{diag}\,({r}_{1},\ldots ,{r}_{{n}_{{\rm{g}}}})\) denotes a diagonal covariance matrix. Accordingly, τ = {τ1, …, τY} is the collection of pseudotimes for Y cells. Pseudotimes are then sampled using a dedicated Metropolis–Hastings MCMC sampler with random walk proposals from the posterior distribution \(p({\boldsymbol{\tau }}| G,{\boldsymbol{g}},{\boldsymbol{\theta }})\propto p(G| {\boldsymbol{g}},{\boldsymbol{\tau }},{\boldsymbol{\theta }})p({\boldsymbol{\tau }})\). The measurement model is given by \(p(G| {\boldsymbol{g}},{\boldsymbol{\tau }},{\boldsymbol{\theta }})=\mathop{\prod }\nolimits_{j = 1}^{Y}{\mathcal{N}}({{\boldsymbol{G}}}_{j};{\boldsymbol{g}}({{\boldsymbol{\tau }}}_{j}),R)\). A Gaussian prior p(τ) is imposed on the pseudotimes, with a mean centred on the experimentally observed sampling time of each cell and a variance based on the measurement time intervals. Sensitivity analysis on the pseudotime variance can be found in Supplementary Material, Section 7 and Fig. S6. A weighting scheme for the measurement model to account for zero inflation in single-cell data is described in the Supplementary Material, Section 5.
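A random-walk Metropolis–Hastings update for the pseudotime of a single cell can be sketched as follows. This Python example is an illustration under simplifying assumptions, not MINIE's implementation: the trajectory g is held fixed (MINIE alternates this step with trajectory and topology sampling), the zero-inflation weighting is omitted, and the function names, step size, and toy values are hypothetical.

```python
import math
import random


def log_posterior(tau, G_j, trajectory, r, prior_mean, prior_var):
    """Unnormalised log of p(G_j | g(tau), R) * p(tau) for one cell."""
    predicted = trajectory(tau)
    log_likelihood = -0.5 * sum((obs - pred) ** 2 / var
                                for obs, pred, var in zip(G_j, predicted, r))
    log_prior = -0.5 * (tau - prior_mean) ** 2 / prior_var
    return log_likelihood + log_prior


def sample_pseudotime(G_j, trajectory, r, prior_mean, prior_var,
                      n_iter=5000, step=0.2, seed=0):
    """Random-walk Metropolis-Hastings chain for the pseudotime of one cell."""
    rng = random.Random(seed)
    tau = prior_mean                                  # start at the sampling time
    current = log_posterior(tau, G_j, trajectory, r, prior_mean, prior_var)
    samples = []
    for _ in range(n_iter):
        proposal = tau + rng.gauss(0.0, step)         # symmetric random-walk proposal
        candidate = log_posterior(proposal, G_j, trajectory, r, prior_mean, prior_var)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            tau, current = proposal, candidate        # accept
        samples.append(tau)
    return samples


# Toy example: one gene following g(t) = t, a cell observed with expression 5.0,
# measurement variance 0.01, and a Gaussian prior centred on sampling time 3.0.
samples = sample_pseudotime([5.0], lambda t: [t], [0.01], prior_mean=3.0, prior_var=4.0)
posterior_mean = sum(samples[1000:]) / len(samples[1000:])
```

In this toy case the exact posterior is Gaussian with mean close to 4.995, so the chain moves the cell away from its nominal sampling time towards the point on the trajectory that matches its expression profile.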

Network topology sampling

Network inference is performed using a Bayesian framework to estimate the model parameters θ, including the network topology. The mean function of the GP f in Eq. (1) is defined componentwise as

$${{\boldsymbol{\mu }}}_{i}({\boldsymbol{g}})={{\boldsymbol{b}}}_{i}-{{\boldsymbol{a}}}_{i}{{\boldsymbol{g}}}_{i},$$
(8)

where ai and bi represent the rates of mRNA degradation and basal transcription, respectively. The covariance function is given by

$${K}_{i}({\boldsymbol{x}},{\boldsymbol{z}})={\gamma }_{i}\exp \left\{-\mathop{\sum }\limits_{j=1}^{n}{\beta }_{ij}{({{\boldsymbol{x}}}_{j}-{{\boldsymbol{z}}}_{j})}^{2}\right\},$$
(9)

where x = [gx, mx] and z = [gz, mz]. If an external perturbation is considered, then x = [gx, mx, bg] (and z) concatenates the concentrations of all molecules and the possible perturbation to gene expression dynamics bg, in which case the summation index j goes up to n + 1. The mapping Γ is applied to gene trajectory samples to estimate metabolite trajectories at the single-cell level: m = Γg − Δbm. This approach integrates the bulk metabolomic data with single-cell transcriptomic data.
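The covariance function in Eq. (9) is straightforward to evaluate. The short Python sketch below (with a hypothetical function name and toy numbers) also shows the property that makes it useful for network inference: when the coefficient of input j is zero, the covariance for gene i no longer depends on molecule j.

```python
import math


def covariance(x, z, gamma_i, beta_i):
    """Squared-exponential covariance of Eq. (9) for one target gene i.

    x and z are input vectors (gene, metabolite, and optional perturbation
    values), gamma_i is the amplitude, and beta_i holds one coefficient per input.
    """
    return gamma_i * math.exp(-sum(b * (xj - zj) ** 2
                                   for b, xj, zj in zip(beta_i, x, z)))


# Toy example with three inputs; the third input has a zero coefficient.
x = [1.0, 2.0, 0.5]
z = [1.5, 2.0, 3.0]
k_xz = covariance(x, z, gamma_i=2.0, beta_i=[1.0, 0.3, 0.0])   # 2 * exp(-0.25)
```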

The parameters βij quantify the regulatory influence of molecule j on gene i. We factorise it as βij = SijHij, where Sij is the binary indicator for the existence of a link from molecule j to i and Hij ≥ 0 determines the weight of that link. The Bayesian estimation of parameters βij is performed by sampling the indicator variable Sij, the continuous parameters Hij, and other model parameters included in \({\boldsymbol{\theta }}={\{{S}_{ij},{H}_{ij},{\gamma }_{i},{{\boldsymbol{b}}}_{i},{{\boldsymbol{a}}}_{i},{q}_{i},{r}_{i}\}}_{i = 1,\ldots ,{n}_{{\rm{g}}},j = 1,\ldots ,n}\), using dedicated Metropolis–Hastings MCMC samplers from the posterior distribution, which is proportional to \(p(G| {\boldsymbol{g}},{\boldsymbol{\tau }},{\boldsymbol{\theta }})p({\boldsymbol{g}}| {\boldsymbol{\theta }})p({\boldsymbol{\theta }})\). This posterior distribution can be factorised into ng components such that each component depends only on the hyperparameters θ for one target gene at a time. Therefore, the topology sampling can be performed one target gene at a time. Random walk proposals are used for sampling the continuous parameters. For the topology, we randomly choose between two proposal moves. In the first proposal move, for target gene i, an element Sij is randomly chosen from the ith row of S, and it is changed to 1 − Sij (that is, from one to zero or zero to one). In the second proposal move, one existing link is replaced by another.
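The two topology proposal moves can be sketched as follows. This Python example is illustrative only: the function name is hypothetical, the two moves are assumed to be chosen with equal probability, the flip is used as a fallback when a swap is impossible, and the Metropolis–Hastings acceptance step is omitted.

```python
import random


def propose_row(row, rng):
    """Propose a new version of one row of S (links into one target gene).

    row is a list of 0/1 indicators. Returns a modified copy.
    """
    new_row = list(row)
    present = [j for j, s in enumerate(new_row) if s == 1]
    absent = [j for j, s in enumerate(new_row) if s == 0]
    if rng.random() < 0.5 or not present or not absent:
        # Move 1: flip one randomly chosen indicator (add or remove a link).
        j = rng.randrange(len(new_row))
        new_row[j] = 1 - new_row[j]
    else:
        # Move 2: replace one existing link by a link that is currently absent.
        new_row[rng.choice(present)] = 0
        new_row[rng.choice(absent)] = 1
    return new_row


rng = random.Random(1)
current_row = [1, 0, 0, 1, 0]
proposed_row = propose_row(current_row, rng)
```

The flip changes the number of regulators of the target gene by one, whereas the swap keeps it fixed, which lets the sampler explore alternative regulators without passing through a sparser or denser intermediate topology.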

The prior for the network topology can be used to adjust network sparsity and to incorporate prior knowledge on the network topology. Sensitivity analysis on the network sparsity level can also be found in Supplementary Material, Section 7 and Fig. S6.

Trajectory sampling

MINIE manages the high variability and noise levels characteristic of scRNA-seq data by sampling gene expression trajectories g from the posterior distribution \(p({\boldsymbol{g}}| {\boldsymbol{\theta }},G,{\boldsymbol{\tau }})\propto p(G| {\boldsymbol{g}},{\boldsymbol{\tau }},{\boldsymbol{\theta }})p({\boldsymbol{g}}| {\boldsymbol{\theta }})\). Here, \(p({\boldsymbol{g}}| {\boldsymbol{\theta }})\) is the prior probability distribution for the gene expression trajectories that depends on the model parameters, notably on the network topology. We note that this distribution can be analytically calculated for a discretised trajectory due to the GP formalism used [3, Supplementary Note 3]. Given the high cell variability in scRNA-seq data, a single trajectory cannot fit the whole data. Thus, the trajectory sampling is used to better explore plausible gene expression trajectories. Trajectories are sampled using a dedicated MCMC sampler based on Crank–Nicolson sampling77,78 as described in Ref. [3, Suppl. Note 7].
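For readers unfamiliar with this family of samplers, the sketch below shows a generic preconditioned Crank–Nicolson (pCN) update in Python. It assumes that "Crank–Nicolson sampling" refers to the pCN proposal and that the prior is a zero-mean Gaussian; it uses a scalar toy problem instead of a discretised GP trajectory, and all names and numerical settings are illustrative rather than taken from MINIE.

```python
import math
import random


def pcn_sample(log_likelihood, prior_draw, x0, n_iter=20000, beta=0.5, seed=0):
    """Preconditioned Crank-Nicolson sampler for a zero-mean Gaussian prior.

    prior_draw(rng) returns one draw from the prior, and log_likelihood(x)
    returns log p(data | x). The proposal leaves the prior invariant, so the
    acceptance ratio involves the likelihood only.
    """
    rng = random.Random(seed)
    x = list(x0)
    current = log_likelihood(x)
    samples = []
    for _ in range(n_iter):
        xi = prior_draw(rng)
        proposal = [math.sqrt(1.0 - beta ** 2) * xv + beta * xiv
                    for xv, xiv in zip(x, xi)]
        candidate = log_likelihood(proposal)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            x, current = proposal, candidate
        samples.append(list(x))
    return samples


# Toy problem: prior N(0, 1) on a single value, one observation y = 1 with unit
# noise variance. The exact posterior is N(0.5, 0.5).
samples = pcn_sample(lambda x: -0.5 * (1.0 - x[0]) ** 2,
                     lambda rng: [rng.gauss(0.0, 1.0)],
                     x0=[0.0])
posterior_mean = sum(s[0] for s in samples[2000:]) / len(samples[2000:])
```

In a trajectory-sampling setting, `prior_draw` would return a discretised trajectory drawn from the GP prior, and `log_likelihood` would evaluate the measurement model for the current pseudotimes.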

Output

The MCMC sampler (dedicated to the network topology) generates a series of network topologies, encoded by adjacency matrix samples S(l), over multiple iterations l = 1, …, Niter. These topologies are then averaged to produce the confidence matrix \(C=\frac{1}{{N}_{{\rm{iter}}}}\mathop{\sum }\nolimits_{l = 1}^{{N}_{{\rm{iter}}}}{S}^{(l)}\). The dimension of the confidence matrix is ng × n or ng × (n + 1) if a perturbation is included. The first ng columns correspond to gene-gene interactions, the next nm columns correspond to metabolite-to-gene interactions, and in case a perturbation is included, the last column of the matrix corresponds to direct perturbation targets.
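The construction of the confidence matrix and its column blocks can be sketched in Python as follows. The function names and the toy samples are hypothetical; the same element-wise mean could also be used to combine confidence matrices from several parallel chains.

```python
def confidence_matrix(samples):
    """Element-wise mean of adjacency-matrix samples S^(1), ..., S^(N_iter)."""
    n_iter = len(samples)
    n_rows, n_cols = len(samples[0]), len(samples[0][0])
    return [[sum(S[i][j] for S in samples) / n_iter for j in range(n_cols)]
            for i in range(n_rows)]


def split_columns(C, n_g, n_m, perturbation=False):
    """Split C into gene-gene, metabolite-to-gene, and perturbation-target parts."""
    gene_gene = [row[:n_g] for row in C]
    metabolite_gene = [row[n_g:n_g + n_m] for row in C]
    targets = [row[n_g + n_m] for row in C] if perturbation else None
    return gene_gene, metabolite_gene, targets


# Toy example: two genes, one metabolite, one perturbation column, four samples.
samples = [
    [[0, 1, 1, 0], [1, 0, 0, 1]],
    [[0, 1, 0, 0], [1, 0, 0, 1]],
    [[0, 0, 1, 0], [1, 0, 1, 1]],
    [[0, 1, 1, 0], [0, 0, 0, 1]],
]
C = confidence_matrix(samples)   # [[0.0, 0.75, 0.75, 0.0], [0.75, 0.0, 0.25, 1.0]]
gene_gene, metabolite_gene, targets = split_columns(C, n_g=2, n_m=1, perturbation=True)
```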

Experimental data preprocessing

Control and PINK1-mutant cell lines were studied through a cell differentiation process from iPSCs to dopaminergic neurons. The PINK1 dataset involved iPSCs carrying the patient-based homozygous mutation ILE368ASN in the PINK1 gene, whereas control cells were obtained from age- and sex-matched individuals. Measurements for both samples occurred on days 0, 8, 18, 25, 32 and 37, which were used to generate the (time-series) scRNA-seq data and bulk metabolomic data. For both control and mutant cases, three biological replicates for metabolomic data were collected.

scRNA-seq processing

The preprocessing steps for scRNA-seq data focused on retaining high-quality cells and genes while removing uninformative data that could affect downstream analyses. Cell quality was assessed using three criteria: (1) the number of expressed genes per cell had to exceed 200 and be more than 2 median absolute deviations (MAD) above the median, (2) the total number of counts per cell had to be within 2 MAD above or below the median, and (3) the percentage of mitochondrial gene counts had to be less than 1.5 MAD above the median. Cells that failed any of these criteria were classified as low-quality and excluded from further analysis. Genes expressed in fewer than 10 cells were also removed from the dataset. Doublets (i.e., cases where two cells are mistakenly processed as one) were identified using Scrublet79, a nearest-neighbour classifier that simulates transcriptomic profiles to predict doublets, and were removed to maintain the integrity of single-cell profiles.
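A MAD-based filter of this kind can be sketched in Python as shown below. The example covers only the fixed floor of 200 expressed genes and the mitochondrial criterion, read here as "value below median + 1.5 MAD"; the helper names and the per-cell values are invented, and the remaining MAD criteria could be expressed with the same helpers once their direction is fixed.

```python
from statistics import median


def mad(values):
    """Median absolute deviation around the median."""
    centre = median(values)
    return median([abs(v - centre) for v in values])


def below_upper_bound(values, n_mads):
    """True for each value that is below median + n_mads * MAD."""
    limit = median(values) + n_mads * mad(values)
    return [v < limit for v in values]


# Toy per-cell statistics (illustrative values only).
n_genes = [1500, 1800, 150, 2100, 1900, 1700]
pct_mito = [2.0, 3.0, 3.0, 4.0, 4.0, 20.0]

mito_ok = below_upper_bound(pct_mito, 1.5)
keep = [ok and n > 200 for ok, n in zip(mito_ok, n_genes)]
# keep == [True, True, False, True, True, False]
```

Here the third cell is dropped for expressing too few genes and the last cell for its high mitochondrial fraction.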

To reduce technical variation and ensure accurate comparisons between cells, gene expression counts were normalised by adjusting feature expression for each cell by the median of the total counts, followed by log transformation. This approach mitigates the effects of differences in sequencing depth while preserving the signal information. After normalisation, the top 500 most dynamically variable genes were identified by calculating the Wasserstein distances of the single-gene expression distributions between consecutive time points. The sum of Wasserstein distances across time points for both control and mutant samples was used to rank genes by their dynamic variability. Genes exhibiting the highest variability were selected for use as input data to MINIE. The PINK1 gene was also added to the list of genes due to its relevance to the study.
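The gene-ranking step can be illustrated with a small Python sketch. It assumes the one-dimensional Wasserstein distance of order 1 between empirical distributions, which the text does not specify, and it uses hypothetical function names and invented data.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """Order-1 Wasserstein distance between two empirical 1D distributions.

    Computed as the integral of the absolute difference between the two
    empirical cumulative distribution functions.
    """
    u, v = sorted(u), sorted(v)
    grid = sorted(u + v)
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        cdf_u = bisect_right(u, left) / len(u)
        cdf_v = bisect_right(v, left) / len(v)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def rank_genes(expression, top=500):
    """Rank genes by summed distances between consecutive time points.

    expression[condition][gene] is a list over time points, each entry being
    the list of expression values of that gene across cells.
    """
    totals = {}
    for per_gene in expression.values():
        for gene, series in per_gene.items():
            distance = sum(wasserstein_1d(a, b) for a, b in zip(series, series[1:]))
            totals[gene] = totals.get(gene, 0.0) + distance
    return sorted(totals, key=totals.get, reverse=True)[:top]


# Toy data: two conditions, two genes, three time points, two cells each.
expression = {
    "control": {"geneA": [[0, 1], [1, 2], [2, 3]], "geneB": [[0, 1], [0, 1], [0, 1]]},
    "mutant": {"geneA": [[0, 1], [0, 1], [1, 2]], "geneB": [[0, 1], [0, 2], [0, 1]]},
}
ranking = rank_genes(expression, top=2)   # ["geneA", "geneB"]
```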

Metabolomic data processing

The metabolomic data were processed independently by the metabolic platform at the Luxembourg Centre for Systems Biomedicine. The platform identified and quantified metabolites, producing concentration matrices for each condition. After data cleaning, the metabolomics dataset consisted of 111 identified metabolites, with four duplicates (L-tryptophan, 2-hydroxyglutarate, 2-oxoglutarate, and GABA) removed from the dataset. Given the small size of the metabolomic data relative to the transcriptomic data, no further filtering was applied, and all metabolites were input to MINIE.