Introduction

Diffusion magnetic resonance imaging (dMRI) is a non-invasive technique for in vivo measurement of water diffusion in tissue using magnetic field gradients. To extract biologically interpretable information, a common approach is to fit a microstructural tissue model to a set of signals acquired with different dMRI acquisition settings1,2,3,4. In the absence of diffusion time dependence, these typically include different combinations of gradient strengths (commonly quantified by the b-value), directions (b-vector), and B-tensor shape5. Microstructural parameters estimated by these models – including compartmental signal fractions and diffusivities – have been shown to be sensitive to changes in brain structure due to diseases like multiple sclerosis6, Alzheimer’s disease7 and Parkinson’s disease8, and can provide a more fundamental understanding of tissue microstructure in both healthy and pathological tissues9.

The Standard Model of white matter (SM), see ref. 4 for a review, describes the signal arising from white matter by a kernel consisting of three compartments (intra-axonal, extra-axonal, and free water (occasionally omitted)) convolved with a fiber orientation distribution (FOD)10. Compartmental signal fractions and diffusivities can be estimated, alongside the parameters that describe the FOD (usually in the form of a spherical harmonics (SH) series). Nevertheless, the high-dimensional parameter space of the SM complicates the estimation of its parameters, potentially leading to low accuracy, precision, and degeneracy of estimates11. These issues become even more prominent at high noise levels.

Multiple strategies have been employed to fit microstructure models to dMRI data. When the primary goal is to estimate the tissue’s directional structure, a common two-step approach involves first fixing the kernel using a global estimate, followed by solving a linear inverse problem to estimate the fiber orientation distribution (FOD)12,13. In contrast, when the focus is on estimating the kernel parameters, the orientational dependence is factored out by using rotational invariants of the signal14,15,16,17. This last approach is most common for SM parameter estimation.

Estimation of SM parameters has been improved by machine-learning based methods including Bayesian estimators16, neural networks18,19, and fitting cubic polynomials15,20. Importantly, these approaches are commonly supervised machine-learning methods, operating at a voxel level, that are fit using simulated ground truth parameters and their associated signals – the training dataset. While this can be very effective, the quality of the results depends on biases existing in the training and inference datasets21. Additionally, since these methods operate at a voxel level, they do not make any use of the spatial correlation that is naturally present in anatomy.

Recently, implicit neural representations (INRs) have been introduced to the dMRI domain as a novel self-supervised fitting method which, rather than operating at a voxel level, fits models on a continuous space of coordinates and is trained on the dMRI signal directly, without requiring realizations of ground truth parameters. INRs have been shown to create noise-robust continuous representations of dMRI datasets of individual subjects by using the spatial correlations present in the data. So far, they have been used to represent the diffusion signal using SH basis functions and to estimate the parameters of (multi-shell multi-tissue) constrained spherical deconvolution (MSMT-CSD)22,23,24. In this previous work, INRs demonstrated potential to improve on voxel-based parameter estimation methods, especially in noisier acquisitions. Since INRs are spatially regularized continuous representations of the dataset, they can potentially be beneficial for downstream tasks that require interpolation, such as microstructure-informed fiber tracking25,26.

Building upon our previous work22,24, we implement INRs to estimate the SM parameters alongside the FODs, and demonstrate the noise-robustness, continuous representation, and applicability of INRs for fitting on both synthetically generated and in vivo dMRI data. Synthetically generated datasets facilitate a quantitative analysis of the model outputs, while the in vivo data qualitatively shows the performance in a realistic setting. The INR is compared to two existing machine learning methods for fitting the SM (Standard Model Imaging Toolbox (SMI)15 and a supervised neural network (NN)18), as well as nonlinear least squares (NLLS). Additionally, moving beyond existing methods, the FOD SH-coefficients up to order eight are estimated directly alongside the SM kernel parameters (intra-axonal diffusivity (Di), extra-axonal axial diffusivity (\({D}_{e}^{\parallel }\)), extra-axonal perpendicular diffusivity (\({D}_{e}^{\perp }\)) and intra-axonal fraction (fi)). Thus, in contrast to earlier methods that focused either on the kernel or on the FOD, this approach performs a joint estimation, which can improve accuracy and ameliorate degeneracy27. Moreover, every INR is fit on and represents a dMRI dataset of a single subject and, therefore, does not rely on a large number of training datasets as supervised methods do. Furthermore, it is capable of explicitly correcting for gradient non-uniformities by inputting, for each coordinate, the effective acquisition protocol (B-tensor) computed with the spatially varying gradient coil tensor of the scanner28. The latter would become impractical for supervised methods, which would need prohibitively large sets of training data to capture voxel-wise protocol deviations. Altogether, the proposed method provides a flexible, noise-robust, spatially coherent way of fitting the SM, which is self-supervised and, therefore, not biased by training data.

Results

Quantitative comparison on simulated data

Results from experiment 1 are presented in Fig. 1 (signal-to-noise ratio (SNR) = 20). When Gaussian noise is added, the INR method shows superior performance for all SM parameters, with both Pearson’s correlation coefficient (ρ) and root mean squared error (RMSE) achieving the highest values. Brain parameter maps in Fig. 1b show the ability of the INR method to reproduce smooth parameter maps similar to the ground truth, whereas the voxel-wise fitting methods show noisy estimates due to their lack of spatial regularization. The INR method does exhibit minor overestimation of \({D}_{e}^{\parallel }\) in the splenium of the corpus callosum compared to the ground truth.

Fig. 1: Results of Standard Model fitting in experiment 1 (SNR 20).

a Scatter density plots of ground truth versus parameter estimations of all methods. The titles of the subplots indicate Pearson’s correlation coefficient (ρ) and root mean squared error (RMSE). Every column corresponds to a specific parameter, which is indicated above. b SM parameter maps corresponding to the results in a. Bottom row shows the ground truth (GT).

Parameter estimation on simulated data without noise and with SNR = 50 can be found in the supplementary information section 2 (Figs. S2 and S3), showing a less prominent, but still evident improvement in ρ and RMSE of the INR method compared to other estimation methods for all parameters. Fitting without noise shows that the ground truth generating process is not positively biased towards parameter estimates of the INR.

Rician noise bias

The correlation plots between the INR with mean squared error (MSE) or Rician loss and the ground truth parameters, alongside the parameter maps for both approaches, can be seen in Fig. 2. For MSE, the Rician bias is visible in the scatter plots and parameter maps as a general over- or underestimation across all voxels. The Rician loss is able to correct the bias, resulting in better correlation with the ground truth. The bias is most significant for Di and \({D}_{e}^{\parallel }\). The bias effect is less prominent when considering SNR = 50, see supplementary information section 3, Fig. S4.

Fig. 2: Effect of Rician Loss Likelihood function on Standard Model fitting (SNR = 20).

a Scatter plots for estimation with MSE and Rician loss. Pearson’s correlation coefficient (ρ) and root mean squared error (RMSE) are indicated in the subplot titles. Every column corresponds to a specific parameter, which is indicated above. b Brain parameter maps of the predictions from a. Difference maps are calculated with respect to the ground truth (GT) parameters.

Qualitative comparison on in vivo data

The parameter maps for the in vivo dataset for the different methods are seen in Fig. 3. The differences between methods are especially visible in the capsula interna and externa, and the splenium of the corpus callosum. The INR produces maps that are more spatially smooth and show a clear (and anatomically plausible) structure, while other methods display higher spatial variability.

Fig. 3: Standard Model fitting results for in vivo data for INR, SMI, Supervised NN and NLLS.

All SM parameters are plotted in a single row. Every row corresponds to the fitting method indicated at the start of that row. All maps have equal scaling.

Estimation of SH order up to lmax = 8

The INR outputs for the different SH order (lmax) FODs of the synthetic datasets are visualized in Fig. 4. Qualitative inspection of the FODs shows plausible FOD shapes and directions throughout all datasets and SH orders. Furthermore, there are no notable differences between the FOD estimates for the noiseless and SNR 50 synthetic datasets, indicating noise-robustness in the estimate. Small spurious peaks appear in the SNR 20 synthetic dataset, but the fiber orientations indicated by the larger peaks remain almost identical to those in the other two datasets. In the in vivo dataset the INR produces plausible FOD shapes and directions as well, as visualized in Fig. 5. The backgrounds in Figs. 4 and 5 show no large voxel-wise differences for fi across the different orders. This holds true for all kernel parameter estimates, which is shown in further detail in the supplementary information section 4, Tables S1 and S2.

Fig. 4: Visualization of the centrum semi-ovale for different spherical harmonics orders on the synthetic datasets.

Different combinations of lmax (rows) and datasets (columns) are shown, with parameter map fi as background. Fiber orientation distributions are scaled for visibility.

Fig. 5: Visualization of the centrum semi-ovale for different spherical harmonics orders (lmax) of the in vivo dataset.

Fiber orientation distributions (FODs) are shown for increasing lmax with parameter map f as background. FODs are scaled for visibility.

Effect of gradient non-uniformity correction on SM parameter estimation

Brain parameter maps with and without gradient non-uniformity correction on in vivo data are presented in Fig. 6. Difference maps show a significant effect on Di and \({D}_{e}^{\parallel }\). Corrected maps show the effect at the edges of the brain, mainly in the frontal lobe. This is to be expected as gradient non-uniformity is strongest there. Di and \({D}_{e}^{\parallel }\) show lower values after correction. The influence of the correction is least apparent on fi and the rotational invariant of lmax = 2 (p2). The effect of the gradient non-uniformity correction is similar for both INR and NLLS fitting. Lower diffusivity values at the front and back of the brain appear using both methods, as well as higher p2 values. The parameter f shows only small differences between the approaches: NLLS shows no effect, while INR shows small corrections throughout the brain. Results of combining Rician bias loss with gradient non-uniformity correction can be found in the supplementary information section 5, Fig. S5.

Fig. 6: Effect of gradient non-uniformity correction on the estimation of Standard Model parameters.

The top row shows parameter maps without correction. The middle rows show the effect of gradient non-uniformity correction using the INR method. Bottom rows show the effect of gradient non-uniformity correction on NLLS parameter estimation. The difference maps are computed relative to the parameter estimates obtained with the same method, but without applying gradient non-uniformity correction.

Implicit neural representation for spatial interpolation

The visualizations in Fig. 7 show the comparison between the p2 parameter maps for different upsampling methods. The linear interpolation maintains the pixelated appearance of the low-resolution data, especially visible in structures that are not aligned with the image grid (a 'staircase-like' effect). These artifacts are, although less prominent, still visible in cubic interpolation. The INR does not show these artifacts at this resolution, as the underlying continuous representation is less limited by the input resolution.
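The grid-based comparisons can be reproduced with standard tools; the snippet below is a minimal sketch that uses a random placeholder volume in place of the actual p2 map. The INR, by contrast, is simply evaluated at the new coordinates, without resampling a discrete grid.

```python
import numpy as np
from scipy.ndimage import zoom

p2 = np.random.rand(16, 16, 16)   # placeholder low-resolution p2 map (hypothetical)

p2_linear = zoom(p2, 8, order=1)  # trilinear interpolation, 8x per dimension
p2_cubic = zoom(p2, 8, order=3)   # tricubic interpolation, 8x per dimension
# An INR would instead be queried directly at the high-resolution coordinates,
# e.g. inr(coords_highres), since the representation is continuous in space.
```

Both grid-based calls produce a volume eight times larger in every dimension, but remain limited by the information in the original grid, which is what produces the staircase artifacts described above.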

Fig. 7: Comparisons between different upsampling methods.

A coronal slice (a) and a sagittal slice (b) of the p2 parameter map are shown at the original resolution, and upsampled 8x in every dimension using linear interpolation, cubic interpolation, and the INR.

Model fitting times

Model fitting times for all INR experiments are shown in Table 1. The main influences on the fitting time are the size of the dataset (number of voxels in the WM mask), the size of the hidden layers, the number of epochs, the number of outputs (determined by lmax), and whether the analytical or numerical integration solution is used. The analytical approach was used in the experiments with the simulated dataset and the numerical approach for the in vivo data. The simulated and in vivo data contain 60,800 and 11,266 white matter voxels, respectively, which shows that the analytical approach is considerably faster than the numerical approach. The addition of gradient non-uniformity correction also increases fitting time.

Table 1 Fitting times in seconds for the INR on different datasets, N = numerical integration, A = analytical integration

Discussion

Implications of the results

In this work, we show how INRs can be used to estimate continuous, noise-robust SM parameter maps of simulated and in vivo datasets, together with FODs of different SH orders. The self-supervised, subject-wise nature of the framework prevents training set bias, while the continuous representation allows spatial correlations to improve parameter estimates and reduce the impact of noise. For high SNR levels, the supervised NN method achieves performance metrics close to those of the INR approach; however, it can be significantly more time-consuming due to its reliance on NLLS for generating part of the training data and the need to retrain the model for each specific acquisition protocol (see Supplementary Fig. S3). At higher noise levels (SNR = 20), the INR method clearly outperforms all other methods (see Fig. 1). On in vivo data, the underlying representation shows a more structurally correlated appearance, without large inter-voxel variability, which is further illustrated by upsampling the INR at high resolution. Parameter estimates for FODs up to SH orders of at least eight can be provided alongside the other SM parameters without introducing bias in other parameters, as shown in the supplementary information section 4.

The proposed hyperparameters np = 5000 and nh = 2048 provide a robust setting that can provide good representations of dMRI datasets with different sizes, acquisition protocols, and levels of noise. An exploration of these hyperparameters can be found in the supplement of ref. 24. The hyperparameter σ2 provides a convenient way of tuning the model to provide stronger or weaker spatial regularization, as detailed in the supplementary information section 1, which shows results for σ2 = 1 (extremely smooth) up to σ2 = 8 (granular).

Fitting time is a critical factor when applying dMRI microstructure modeling, and various efforts have been made to speed up the computationally heavy nonlinear optimization and enable large-scale population studies19,29,30. INRs circumvent this through their inherent self-supervised multi-layer perceptron (MLP) structure, which allows for an efficient, continuous representation of the parameter space without requiring voxel-wise optimization. The INRs are fit on consumer-grade hardware in roughly 5 to at most 23 minutes for the scenarios tested, much faster than classic NLLS approaches. Using the analytical integration approach decreases training time considerably, which is possible when the acquisition protocol contains no negative bΔ values. Once the INR is fit to the dataset, the inference time is negligible. For example, an lmax = 2 model with nh = 2048 and np = 5000 can perform inference at one million coordinates in 2.7 seconds, and an lmax = 8 model in 2.8 seconds, including data transfer times to and from the GPU. However, since INRs require a model to be fit to every individual subject, a supervised learning approach could remain faster for large multi-subject datasets with identical acquisition protocols, despite the considerable amount of training time it requires initially (e.g., 83 minutes on GPU, excluding initial NLLS parameter estimations, for the supervised NN method18).

The method’s ability to incorporate gradient non-uniformity correction in the fitting process provides an advantage over typical supervised methods, for which the training set would be impractically large to capture the spatial variability. This correction is essential as even small non-uniformities can affect parameter maps28,31,32,33, and the availability of high-performance gradient coils suffering from significant gradient non-uniformities is increasing34. To our knowledge, SMI is the only framework that has incorporated gradient non-uniformity correction into the SM fitting process, in the form of PIPE35. This approach uses SVD and linear regression to approximate the exact acquisition settings in each voxel for linear tensor encoding (LTE) acquisitions.
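To illustrate what such a voxel-wise correction involves for an LTE acquisition, the nominal b-value and b-vector can be mapped through the local gradient coil tensor. The sketch below is a hedged illustration of this standard mapping (with `L` denoting the 3×3 effective coil tensor at a voxel), not the specific PIPE implementation or the paper's own code.

```python
import numpy as np

def corrected_lte_protocol(b, g_hat, L):
    """Map a nominal LTE acquisition (b-value `b`, unit b-vector `g_hat`)
    through the local gradient coil tensor `L` (3x3): the effective
    gradient direction is L @ g_hat, which rescales the b-value by
    |L g_hat|^2 and rotates the b-vector."""
    g_eff = L @ g_hat
    scale = np.linalg.norm(g_eff)
    return b * scale**2, g_eff / scale
```

With `L` equal to the identity (no non-uniformity) the protocol is unchanged; a 2% gain in gradient amplitude rescales the b-value by roughly 4%, which is why even small non-uniformities matter for diffusivity estimates.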

Limitations of the work

A limitation of using simulated data for evaluation is the variety of possible approaches to generating the ground truth. In this work, the intention was to create a ground truth with structurally smooth characteristics assumed to mimic real brain tissue. However, factors such as voxel size play a role and need to be further investigated. The ground truth generated for the synthetic experiments makes use of SMI for generating the underlying parameter maps. This could potentially bias the parameters to be in a range that favors estimation using SMI. We indeed observed that estimation on the noiseless signal showed optimal performance for SMI and NLLS (see supplementary information Fig. S2). The INR method exhibits lower performance on noiseless signals due to its inability to model voxels individually, indicating that the ground truth is not positively biased with respect to the outputs of the INR method. Nevertheless, we have attempted to reduce biases that would benefit a particular method by smoothing the parameter maps and using FODs from a different source (MSMT-CSD). Omitting the smoothing still resulted in the highest performance for the INR, see supplementary information section 6, Fig. S8. Another limitation related to the ground truth is that the MGH dataset used in this study contained only LTE acquisitions, which may have led to inaccurate parameter estimates (see discussion at the end of this section). We have investigated the impact of other possible sources of severe bias on creating ground truth parameter maps from the MGH dataset, such as the relatively short diffusion time, the noise estimation procedure, and the included b-values (see supplementary information section 6, Figs. S6–S8). We found similar overall distributions and linear voxel-wise correlations when using longer diffusion times, a noise estimate from repeated b = 0 s mm−2 images, and excluding the b = 200 s mm−2 and b > 10,000 s mm−2 images.
Nevertheless, creating a ground truth that balances capturing anatomical reality while exerting sufficient control remains an important avenue to further explore.

The comparison experiments across methods were conducted using Gaussian noise, which differs from the noise characteristics of in vivo magnitude MRI data, typically following Rician or non-central Chi distributions. However, with appropriate preprocessing and ideally the availability of phase data, the noise can be transformed to approximate a Gaussian distribution36,37. This makes the use of Gaussian noise still relevant and consistent with previous work in self-supervised learning for dMRI38. The Rician noise experiments reveal biases in the parameter estimates by the INR when using MSE, suggesting that this should be taken into consideration. In this work, we show the promise of correcting for this bias by using a loss function tailored specifically to Rician noise39.
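Such a loss can be written down directly from the Rician likelihood. The sketch below (assuming a known noise level `sigma`) uses the exponentially scaled Bessel function for numerical stability and is illustrative rather than a reproduction of the exact loss used in this work.

```python
import numpy as np
from scipy.special import ive

def rician_nll(signal, prediction, sigma):
    """Mean negative log-likelihood of measured magnitudes `signal` under a
    Rician distribution with location `prediction` and noise level `sigma`.
    Uses log I0(x) = log(ive(0, x)) + x for x >= 0 to avoid overflow of the
    modified Bessel function I0 at high SNR."""
    s = np.asarray(signal, dtype=float)
    nu = np.asarray(prediction, dtype=float)
    x = s * nu / sigma**2
    log_i0 = np.log(ive(0, x)) + x
    nll = -(np.log(s / sigma**2) - (s**2 + nu**2) / (2 * sigma**2) + log_i0)
    return nll.mean()
```

Minimizing this quantity over the predicted noise-free signal, instead of the MSE, removes the systematic overestimation of low magnitudes that produces the Rician bias discussed above.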

The INR shows a slight overestimation of \({D}_{e}^{\parallel }\) in the splenium of the corpus callosum in the synthetic experiments. This could be due to the ground truth exhibiting less structural coherence in this part, which is especially apparent in the \({D}_{e}^{\perp }\) parameter map. Since the parameters are estimated jointly, this might influence the estimation of \({D}_{e}^{\parallel }\). Potentially, SMI does not suffer from this because it fits the SM voxel-wise and is, therefore, able to produce these combinations of parameters.

The interpretation of the in vivo parameter maps is subjective, as there is no ground truth available. The INR produces more spatially coherent estimates than other methods, showing anatomically plausible structure and physiologically plausible parameter values. This could imply that they more closely resemble the actual underlying tissue, but conclusions should be drawn with caution. For example, compared to the other maps, the INR produces a slightly higher estimate for \({D}_{e}^{\parallel }\) and a slightly lower estimate for \({D}_{e}^{\perp }\). We cannot be certain which estimate is more accurate. To evaluate the suitability of the in vivo acquisition protocol for fitting the SM itself, ground truth simulations with this protocol were performed, which can be found in the supplementary information section 7 (Fig. S9).

Additionally, correcting for gradient non-uniformities has a significant impact on the parameter estimates, yet the accuracy remains to be evaluated, although comparison to NLLS in combination with gradient non-uniformity correction shows similar results. While changes in shape due to gradient non-uniformities are taken into account (bΔ), a limitation of the current implementation is that it assumes conservation of B-tensor axial symmetry, an assumption that generally does not hold when gradient non-uniformities and non-LTE encodings are considered35. The exact impact of this approximation would necessitate implementation of SO(3) convolutions and requires further investigation. Experiment 5 shows that these corrections result in lower estimates for \({D}_{e}^{\parallel }\), bringing \({D}_{e}^{\parallel }\) more in agreement with previous work showing \({D}_{i} > {D}_{e}^{\parallel }\) (in gadolinium-based contrast experiments40) and Di ~ 2.3 μm2 ms−1 (in experiments with elaborate acquisition protocols using high diffusion planar tensor encoding41). Any further inconsistency with values of \({D}_{e}^{\parallel }\) for the in vivo dataset could be caused by the inability of the acquisition protocol to discriminate solution branches, as a high b-shell (b > 5000 s mm−2) with LTE is lacking11,15,42. To gain more insight into the degeneracies of the INR's estimation and to enable error quantification, calculating the posterior distribution is required43,44.

Future work

The presented INR method can be extended in future work. This work has focused on including the minimal number of SM parameters to reduce the complexity of the fitting parameter space and to evaluate the method’s performance. Importantly, the framework is fully flexible to fit any biophysical model, by adjusting the forward equation to predict the signal (Fig. 9). For example, the SM implementation can be extended to include relaxation effects, which introduce compartmental T2 as fitting parameters. Further distinction can be made between intra-axonal compartmental T2 and extra-axonal compartmental T245,46, which adds two extra fitting parameters. The contribution of free water can also be introduced as a fitting parameter. However, previous work has shown that the impact of this parameter is small except for voxels around the ventricles15. Adding this parameter could resolve fitting issues around the ventricles for the in vivo data in experiment 3 where high \({D}_{e}^{\parallel }\) and \({D}_{e}^{\perp }\) values are found.

The spatial regularization inherent to INRs in the fitting process can be beneficial for other biophysical models. The presented method fits the Standard Model of white matter, which – as the name suggests – is only applicable to white matter. As a result, we applied a white matter mask, and only used the coordinates that lie inside this mask as input to the INR. This implies that for coordinates outside of the mask, and in different tissue types, the INR is not fit and its outputs are not meaningful. Implementation of gray matter models (e.g. as in ref. 47) using INRs avoids the need for fitting solely white matter and can provide whole brain parameter maps.

The combination of estimating SM parameters together with FOD SH coefficients up to high lmax values opens up the possibility of combining the microstructural information of the SM estimates with the directional information of the FOD to perform microstructure-informed tractography25,26,43,48. Future work could extend the model to estimate fiber-direction-specific kernels.

When applying this method to a large number of subjects, for example when doing group analysis, fitting times of the INR might become a limiting factor. The duration of the fitting process for self-supervised learning in combination with dMRI models can potentially be considerably shortened when applying transfer learning49, meta-learning50, continual learning51, or hash-encodings52, which could make on-the-fly fitting of INRs possible. Tractography can also benefit from the continuous representation of the INR in both interpolation computation time and accuracy53.

The application of microstructural information in clinical studies remains untested in this context, but its potential utility for the diagnosis and assessment of various pathophysiological processes could be explored. The INR's performance remains to be further evaluated in pathology, particularly the effect of spatial regularization on the quantification of small lesions. The encoding frequency variance σ2 can be tuned to accommodate higher frequency changes in the signal. The performance of INRs for sparse, clinically feasible acquisition protocols remains to be investigated and represents a direction for future research38.

Conclusion

Using INRs to fit the SM provides noise-robust, spatially regularized parameter estimates. FODs of SH orders up to at least eight can be estimated alongside the SM kernel parameters. The self-supervised nature of this approach has advantages over existing (supervised) methods, as it prevents training set bias and allows for explicit correction of gradient non-uniformities, within reasonable estimation times.

Methods

Standard Model of white matter

Multiple approaches have been suggested to model the white matter dMRI signal as a combination of sticks and anisotropic Gaussian diffusion compartments9,11,16,46,54,55,56. Generalization of this principle without introducing constraints on model parameters has led to a unified framework called the Standard Model of white matter4,14. The Standard Model assumes the measured signal S to be described by the convolution of a kernel \({{{\mathcal{K}}}}(b,{b}_{\Delta },{{{\boldsymbol{n}}}}\cdot {{{\boldsymbol{u}}}})\) – describing the signal arising from water diffusing within and around a coherent fiber bundle with direction n – with a distribution of fiber populations \({{{\mathcal{P}}}}({{{\boldsymbol{n}}}})\) on the unit sphere:

$$S(b,{b}_{\Delta },{{{\boldsymbol{u}}}})={S}_{0}{\int _{{S}^{2}}}{{{\mathcal{K}}}}(b,{b}_{\Delta },{{{\boldsymbol{n}}}}\cdot {{{\boldsymbol{u}}}})\,{{{\mathcal{P}}}}({{{\boldsymbol{n}}}})\,d{{{\boldsymbol{n}}}}.$$
(1)

where b is the b-value, bΔ is the B-tensor shape, u describes the first eigenvector of the B-tensor, and S0 is the signal without diffusion weighting. Our implementation of the SM assumes fiber bundles to consist of two compartments, intra-axonal and extra-axonal, with different diffusion characteristics. The signal from an axially symmetric tensor (zeppelin) compartment depends on its axial (\({D}^{\parallel }\)) and perpendicular (\({D}^{\perp }\)) diffusivity and is given by the following relation57:

$${{{{\mathcal{K}}}}}_{zep}(b,{b}_{\Delta },{{{\boldsymbol{n}}}}\cdot {{{\boldsymbol{u}}}})=\exp \left[\frac{1}{3}b{b}_{\Delta }({D}^{\parallel }-{D}^{\perp })-\frac{1}{3}b({D}^{\parallel }+2{D}^{\perp })-b{b}_{\Delta }{({{{\boldsymbol{n}}}}\cdot {{{\boldsymbol{u}}}})}^{2}({D}^{\parallel }-{D}^{\perp })\right]$$
(2)

The intra-axonal compartment is modeled as a zero-radius stick (i.e. \({D}^{\perp }=0\)) with \({D}^{\parallel }={D}_{i}\), while the extra-axonal compartment is modeled as a zeppelin with axial and perpendicular diffusivity \({D}^{\parallel }={D}_{e}^{\parallel }\) and \({D}^{\perp }={D}_{e}^{\perp }\), respectively. The fraction of the signal occupied by the intra-axonal compartment is given by fi (thus setting the fraction of the extra-axonal compartment to 1 − fi). Summing over the intra-axonal and extra-axonal signal contributions results in the following forward equation for the signal:

$$S(b,{b}_{\Delta },{{{\boldsymbol{u}}}}) = \; {S}_{0}\cdot \Bigg[{f}_{i}{\int _{{S}^{2}}}\exp (-b{b}_{\Delta }\,{({{{\boldsymbol{n}}}}\cdot {{{\boldsymbol{u}}}})}^{2}{D}_{i}){{{\mathcal{P}}}}({{{\boldsymbol{n}}}})\,d{{{\boldsymbol{n}}}}\cdot \exp \left(\frac{1}{3}b{b}_{\Delta }{D}_{i}-\frac{1}{3}b{D}_{i}\right) \\ +(1-{f}_{i}){\int _{{S}^{2}}}\exp (-b{b}_{\Delta }{({{{\boldsymbol{n}}}}\cdot {{{\boldsymbol{u}}}})}^{2}({D}_{e}^{\parallel }-{D}_{e}^{\perp })){{{\mathcal{P}}}}({{{\boldsymbol{n}}}})\,d{{{\boldsymbol{n}}}}\cdot \exp \left(\frac{1}{3}b{b}_{\Delta }\left({D}_{e}^{\parallel }-{D}_{e}^{\perp }\right)-\frac{1}{3}b\left({D}_{e}^{\parallel }+2{D}_{e}^{\perp }\right)\right)\Bigg].$$
(3)

Calculation of the integral can follow two approaches. The first approach leverages an analytical expression for the integral of a product of a Legendre polynomial and an exponential term, which arises when projecting \({{{\mathcal{P}}}}({{{\boldsymbol{n}}}})\) onto a SH basis. For a full derivation, see refs. 15,45,58. However, this analytical expression is only valid when bΔ ≥ 0 and \({D}_{e}^{\parallel } > {D}_{e}^{\perp }\) (see ref. 54 for the derivation of this analytical solution). The second approach uses numerical integration to calculate the integral. This approach is able to incorporate negative bΔ but is computationally more demanding than the analytical approach.
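As a minimal illustration of the numerical approach, the forward signal of Eq. (3) can be approximated by summing the two compartment kernels over a discrete set of FOD directions and weights. The function below is a sketch with assumed argument names, not the paper's implementation; in practice the FOD would be sampled from its SH representation on a dense spherical grid.

```python
import numpy as np

def sm_signal_numeric(b, b_delta, u, Di, De_par, De_perp, fi, S0,
                      fod_dirs, fod_weights):
    """Approximate Eq. (3): sum the stick and zeppelin kernels over a
    discrete FOD given by unit directions `fod_dirs` (n, 3) and
    nonnegative weights `fod_weights` that sum to one."""
    nu2 = (fod_dirs @ u) ** 2  # (n . u)^2 for each FOD direction

    # intra-axonal stick: D_perp = 0, D_par = Di
    intra = np.sum(fod_weights * np.exp(-b * b_delta * nu2 * Di))
    intra *= np.exp(b * b_delta * Di / 3 - b * Di / 3)

    # extra-axonal zeppelin with axial/perpendicular diffusivities
    dd = De_par - De_perp
    extra = np.sum(fod_weights * np.exp(-b * b_delta * nu2 * dd))
    extra *= np.exp(b * b_delta * dd / 3 - b * (De_par + 2 * De_perp) / 3)

    return S0 * (fi * intra + (1 - fi) * extra)
```

As a sanity check, for a single fiber aligned with u, fi = 1, and linear encoding (bΔ = 1), this reduces to the expected stick attenuation exp(−b Di).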

INR network architecture

The purpose of the INR is to map a coordinate vector x to a desired output vector k, which represents the underlying dataset at that coordinate, by passing the coordinate through a neural network \({{{{\mathcal{F}}}}}_{\Psi }:{{{\boldsymbol{x}}}}\to {{{\boldsymbol{k}}}}\) with weights Ψ. In our case, we map a 3D-coordinate \({{{\boldsymbol{x}}}}\in {{\mathbb{R}}}^{3}\) to a vector of parameters for the SM kernel and FOD, \({{{\boldsymbol{k}}}}=[{D}_{i},{D}_{e}^{\parallel },{D}_{e}^{\perp },{f}_{i},{S}_{0},{p}_{0}^{0},...{p}_{l}^{m}]\), where \({p}_{l}^{m}\) is the coefficient of the SH basis function of order l and phase m. The signal is hence projected onto real SH as in ref. 58. The end result is a representation of a (dMRI) dataset of a single subject by a neural network, from which the parameter maps can be inferred at any x. A ’dataset’ in this manuscript refers to all dMRI volumes in a single acquisition of a single subject, unless specified otherwise. The implicit neural representation consists of three parts: the spatial encoding, a small MLP, and a number of output layers (Fig. 8). Each of these parts is discussed in depth in the upcoming sections.

Fig. 8: INR network architecture.

a The input coordinates (x) are mapped to a higher-dimensional frequency space (γ). b These values are forwarded to the multi-layer perceptron (\({{{{\mathcal{M}}}}}_{{\Psi }_{m}}\)). c The output (z) of the MLP is converted to SM parameters (\(\widehat{{{{\boldsymbol{k}}}}}\)).

Spatial encoding

By encoding the input coordinates to a high-dimensional space before entering them into the model, we can greatly increase the representational power of the INR59. We use the Fourier features encoding described by Tancik et al.59, which was previously used to model MSMT-CSD with INRs24. First, we scale the coordinates x (maintaining aspect ratio) to lie in the range [−1, 1]3, and then apply the following transformation:

$$\gamma ({{{\boldsymbol{x}}}})=[\cos (2\pi {{{\boldsymbol{A}}}}{{{\boldsymbol{x}}}}),\sin (2\pi {{{\boldsymbol{A}}}}{{{\boldsymbol{x}}}})]$$
(4)

where γ( ⋅ ) is the Fourier feature encoding, and A is an np × 3 matrix with values sampled from \({{{\mathcal{N}}}}(0,{\sigma }^{2})\). The number of encodings np and the variance σ2 are hyperparameters that can be adapted to suit datasets of varying complexity and quality. This process results in an encoded coordinate vector \(\gamma ({{{\boldsymbol{x}}}})\in {[-1,1]}^{2{n}_{p}}\), concatenating the np cosine and np sine components.
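The encoding of Eq. (4) can be sketched in a few lines. This is an illustrative NumPy version (the paper's implementation is in PyTorch; note that A is sampled once at initialization and kept fixed during training):

```python
import numpy as np

def fourier_features(x, A):
    """Fourier feature encoding of Eq. (4): project the coordinate with a
    fixed Gaussian matrix A (n_p x 3) and return [cos(2*pi*A@x), sin(2*pi*A@x)]."""
    proj = 2.0 * np.pi * (A @ x)
    return np.concatenate([np.cos(proj), np.sin(proj)])

rng = np.random.default_rng(0)
n_p, sigma2 = 256, 3.5                                   # hyperparameters n_p and sigma^2
A = rng.normal(0.0, np.sqrt(sigma2), size=(n_p, 3))      # sampled once, then frozen
gamma = fourier_features(np.array([0.1, -0.5, 0.3]), A)  # 2*n_p features in [-1, 1]
```

Increasing σ² shifts the encoding toward higher spatial frequencies, which raises representational detail at the cost of noise sensitivity, consistent with the smaller σ² used for the noisier SNR 20 datasets.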

Multi-layer perceptron

The MLP \({{{{\mathcal{M}}}}}_{{\Psi }_{m}}\) with weights Ψm (the subscript m indicating the MLP) is the backbone of the INR and is largely responsible for representing the underlying dataset. It consists of four fully-connected layers of equal size nh, chosen according to the complexity of the represented dataset, with ReLU activation functions. The MLP maps the encoded coordinates to a latent vector \({{{{\mathcal{M}}}}}_{{\Psi }_{m}}:\gamma ({{{\boldsymbol{x}}}})\to {{{\boldsymbol{z}}}}\) with \({{{\boldsymbol{z}}}}\in {{\mathbb{R}}}_{+}^{{n}_{h}}\), which serves as the input to the output layers.
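In plain NumPy the MLP backbone amounts to the following sketch (layer widths here are illustrative toy sizes; the paper uses nh = 2048 in PyTorch):

```python
import numpy as np

def mlp(gamma, layers):
    """Four fully-connected layers with ReLU activations; the final ReLU is
    why the latent vector z lies in the non-negative orthant R_+^{n_h}."""
    h = gamma
    for W, b in layers:
        h = np.maximum(W @ h + b, 0.0)  # linear layer followed by ReLU
    return h

rng = np.random.default_rng(0)
n_in, n_h = 8, 32                       # toy sizes (paper: n_h = 2048)
sizes = [n_in, n_h, n_h, n_h, n_h]
layers = [(rng.normal(scale=sizes[i] ** -0.5, size=(sizes[i + 1], sizes[i])),
           np.zeros(sizes[i + 1])) for i in range(4)]
z = mlp(rng.normal(size=n_in), layers)
```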

Output layers

The final part of the INR architecture maps z to a parameter estimate \(\widehat{{{{\boldsymbol{k}}}}}\) using a separate fully-connected layer, called a ‘head’, for each parameter estimate. For the SM parameters \({\widehat{D}}_{i}\), \({\widehat{D}}_{e}^{\parallel }\), \({\widehat{D}}_{e}^{\perp }\), and \({\widehat{f}}_{i}\), the heads use a sigmoid activation function scaled to fit physiological ranges (Table 2). For \({\widehat{S}}_{0}\) a softplus activation function was used, which ensures positivity without an upper bound60. The SH-coefficients of the estimated FOD \(\widehat{{{{\mathcal{P}}}}}({{{\boldsymbol{n}}}})\) require both positive and negative outputs and, therefore, have no activation function. This results in the full output layer of the INR providing the mapping \({{{{\mathcal{Q}}}}}_{{\Psi }_{q}}:{{{\boldsymbol{z}}}}\to \widehat{{{{\boldsymbol{k}}}}}\), where \({{{{\mathcal{Q}}}}}_{{\Psi }_{q}}\) are the output layers with weights Ψq.
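A sketch of the output heads, one linear layer per parameter with the activations described above. The bounds used here are placeholders; the actual physiological ranges are those of Table 2:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder bounds; the real ranges are listed in Table 2 of the paper.
BOUNDS = {"D_i": (0.0, 3.0), "D_e_par": (0.0, 3.0),
          "D_e_perp": (0.0, 3.0), "f_i": (0.0, 1.0)}

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def output_heads(z, heads):
    """One linear 'head' per parameter: scaled sigmoid for the bounded SM
    parameters, softplus for S0 (positive, no upper bound), and no
    activation for the SH coefficients."""
    raw = {name: W @ z + b for name, (W, b) in heads.items()}
    out = {name: lo + (hi - lo) * sigmoid(raw[name])
           for name, (lo, hi) in BOUNDS.items()}
    out["S0"] = np.log1p(np.exp(raw["S0"]))   # softplus
    out["sh"] = raw["sh"]                     # SH coefficients: identity
    return out

n_h, n_sh = 16, 6                             # toy sizes (paper: n_h = 2048)
heads = {k: (rng.normal(size=(1, n_h)), rng.normal(size=1))
         for k in [*BOUNDS, "S0"]}
heads["sh"] = (rng.normal(size=(n_sh, n_h)), rng.normal(size=n_sh))
k_hat = output_heads(rng.normal(size=n_h), heads)
```

The scaled sigmoid guarantees that every SM estimate stays inside its bounds by construction, so no clipping or penalty is needed during optimization.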

Table 2 Table showing the lower (min.) and upper (max.) bounds of the estimated parameters, diffusivities shown in μm2ms−1

Model fitting

Given a set of Nm (capital N denoting a fixed number, as opposed to the lower-case hyperparameters np and nh) measurements \(\{({b}_{i},{({b}_{\Delta })}_{i},{{{{\boldsymbol{u}}}}}_{i})| i\in 1,...,{N}_{m}\}\), measured at coordinates \({{{{\boldsymbol{x}}}}}_{{{{\boldsymbol{j}}}}}\in {{{\boldsymbol{X}}}}\) with \({{{\boldsymbol{X}}}}\subset {{\mathbb{R}}}^{3}\) being the set of all measured coordinates in the dMRI dataset, the estimated signal \(\widehat{S}({b}_{i},{({b}_{\Delta })}_{i},{{{{\boldsymbol{u}}}}}_{i},{{{{\boldsymbol{x}}}}}_{{{{\boldsymbol{j}}}}})\) at coordinate xj is obtained from the model output \({{{{\mathcal{F}}}}}_{\Psi }:{{{\boldsymbol{x}}}}\to \widehat{{{{\boldsymbol{k}}}}}\) by calculating the estimated kernel \(\widehat{{{{\mathcal{K}}}}}(b,{b}_{\Delta },{{{\boldsymbol{n}}}}\cdot {{{\boldsymbol{u}}}})\) using (2) and convolving with \(\widehat{{{{\mathcal{P}}}}}({{{\boldsymbol{n}}}})\) as in (1). We approximate the desired INR \({{{{\mathcal{F}}}}}_{\Psi }\) with weights \(\Psi =\{{\Psi }_{m},{\Psi }_{q}\}\) by finding the weights Ψ* that minimize the error (MSE or Rician likelihood loss39, the latter allowing an explicit correction of the Rician noise bias) between the estimated signal \(\widehat{S}\) and the measured signal S:

$${\Psi }^{* }={{{\rm{argmin}}}}_{\Psi }\frac{1}{| {{{\boldsymbol{X}}}}| }\sum\limits_{{{{{\boldsymbol{x}}}}}_{{{{\boldsymbol{j}}}}}\in {{{\boldsymbol{X}}}}}\sum\limits_{i=1}^{{N}_{m}}{{{\mathcal{L}}}}(S({b}_{i},{({b}_{\Delta })}_{i},{{{{\boldsymbol{u}}}}}_{i},{{{{\boldsymbol{x}}}}}_{{{{\boldsymbol{j}}}}}),\widehat{S}({b}_{i},{({b}_{\Delta })}_{i},{{{{\boldsymbol{u}}}}}_{i},{{{{\boldsymbol{x}}}}}_{{{{\boldsymbol{j}}}}}))+{\Lambda }_{{{{{\boldsymbol{x}}}}}_{{{{\boldsymbol{j}}}}}}$$
(5)

where \({{{\mathcal{L}}}}\) is either the MSE or the Rician likelihood loss. We include an additional term \({\Lambda }_{{{{{\boldsymbol{x}}}}}_{{{{\boldsymbol{j}}}}}}\) which is a non-negativity constraint for the FOD at xj, as described by Tournier et al.10. The constraint is calculated by sampling the FOD across the spherical domain and adding any negative values as a loss. The full dMRI dataset is used, without a train/test split, as the goal is for the INR to represent the data, not to predict unseen data. The fitting process is shown in Fig. 9.
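The constraint Λ can be sketched as follows, assuming a precomputed real-SH design matrix B that evaluates the FOD at a set of directions sampled over the sphere (B itself is an assumption of this sketch, not reproduced from the paper):

```python
import numpy as np

def nonneg_penalty(sh_coeffs, B):
    """Non-negativity constraint Lambda: evaluate the FOD amplitudes at the
    sampled sphere directions (rows of B) and accumulate the magnitude of
    any negative amplitudes as an additional loss term."""
    amplitudes = B @ sh_coeffs               # FOD sampled on the sphere
    return np.sum(np.clip(-amplitudes, 0.0, None))
```

A toy design matrix illustrates the behavior: only directions where the reconstructed FOD dips below zero contribute to the penalty, so a strictly non-negative FOD incurs no loss.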

Fig. 9: Visualization of the INR fitting process.

The input coordinates x are fed into the INR (architecture shown in Fig. 8) and mapped to a parameter estimate \(\widehat{{{{\boldsymbol{k}}}}}\). Using \(\widehat{{{{\boldsymbol{k}}}}}\), the signal estimate \(\widehat{S}\) is reconstructed following (3). The loss between \(\widehat{S}\) and the measured signal S is used to update the INR weights.

Implementation

The INR is implemented in Python 3.10.10 using PyTorch 2.0.0. An Adam optimizer was used with a learning rate of 10−4, β1 = 0.9, β2 = 0.999, ϵ = 10−8, and no weight decay. The hyperparameters were set at np = 5000, nh = 2048, and σ2 = 3.5 for the SNR 50 synthetic datasets (see section ‘Generation of simulated ground truth data’) and the in vivo dataset (see section ‘In vivo data acquisition’), and σ2 = 2.5 for the SNR 20 synthetic datasets. More details about the choice of σ2 are given in the supplementary information section 1 (Fig. S1). Each INR was fit for 150 epochs on an NVIDIA RTX 4080 GPU with 16 GB of VRAM, with a batch size of 500. Visualizations of the model output were created using matplotlib 3.8.0 and MRtrix3 3.0.461. Numerical integration of the integral during training is implemented with the torchquad Simpson function62.

Comparisons

The performance of the INR model (referred to as the INR method) is compared to three other SM model fitting methods. The first is a supervised machine learning approach using the SMI toolbox15 (SMI method) with standard settings; the SMI method requires a noise map, which was determined using MP-PCA denoising63. The second is a supervised deep learning method (referred to as the supervised NN method, introduced in ref. 18), trained on a combination of synthetic and data-driven parameter samples. Specifically, the training data consisted of 500,000 samples, with 75% allocated for training and 25% for validation. Half of the training samples were generated by uniformly sampling model parameters, while the other half were derived by applying mutations to NLLS estimates obtained from the target dMRI data. The neural network architecture comprised three hidden layers with 150, 80, and 55 neurons, and Gaussian noise was added at SNR 50. For a comprehensive description of all training settings, see ref. 18. Finally, an NLLS approach (NLLS method) was implemented with the MATLAB (MathWorks, Natick, MA, USA) optimization toolbox: lsqnonlin with the Levenberg-Marquardt algorithm and a maximum of 1000 iterations. Two initializations were fitted, after which the solution with the lowest residual norm was chosen. Of the above methods, only the INR has a positivity constraint implemented for the FOD (see section ‘Model fitting’).

Fitting performance across the different methods was evaluated using ρ and RMSE on the kernel parameters and on the rotational invariant \({p}_{2}=\sqrt{\frac{4\pi }{5}}\sqrt{{\sum }_{m}| {p}_{2m}{| }^{2}}\), where \(| {p}_{2m}|\) is the absolute value of the second-order, m-th phase SH-coefficient.
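For reference, the invariant can be computed directly from the five l = 2 coefficients; a short sketch:

```python
import numpy as np

def p2_invariant(p2m):
    """Rotational invariant of the l = 2 SH coefficients:
    p2 = sqrt(4*pi/5) * sqrt(sum_m |p_{2m}|^2)."""
    p2m = np.asarray(p2m, dtype=float)
    return np.sqrt(4.0 * np.pi / 5.0) * np.sqrt(np.sum(np.abs(p2m) ** 2))
```

Because p2 depends only on the l = 2 energy, it is invariant to rotations of the FOD, which is what makes it suitable for comparing methods that may resolve fiber orientations differently.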

Generation of simulated ground truth data

In silico experiments were conducted on simulated data derived from one brain (subject 011) of the MGH Connectome Diffusion Microstructure Dataset64. The dMRI data were acquired on the 3T Connectome MRI scanner (Magnetom CONNECTOM, Siemens Healthineers) at 2 mm isotropic resolution. The acquisitions with b = [0, 50, 350, 800, 1500, 2400, 3450, 4750, 6000] s mm−2 and Δ = 19 ms were selected. The lowest 4 non-zero b-values were acquired with 32 uniformly distributed diffusion encoding directions, the highest 4 b-values with 64. The dataset also contained 50 b = 0 s mm−2 volumes. More details about the imaging parameters and processing can be found in ref. 65. The SM was fitted with the SMI toolbox15 to generate a set of realistic SM kernel parameters for fi, Di, \({D}_{e}^{\parallel }\), and \({D}_{e}^{\perp }\), using lmax = 4 and noise bias correction with a sigma map obtained through MP-PCA63. Further settings were 2 compartments (intra- and extra-axonal), \({10}^{6}\) training samples, and Nlevels = 1. To enhance the smoothness of the kernel maps, anisotropic diffusion filtering was performed using MATLAB’s imdiffusefilt function with three iterations (N = 3) and minimal connectivity. The SH-coefficients \({p}_{l}^{m}\) of the FODs were calculated using MSMT-CSD13 for lmax = [2, 4, 6, 8]. The simulated signals corresponding to these parameters were calculated from the SM signal equation with a published optimized acquisition protocol15: b = [0, 1000, 2000, 8000, 5000, 2000] s mm−2, number of directions [4, 20, 40, 40, 35, 15], and B-tensor shape bΔ = [1, 1, 1, 1, 0.8, 0]. Directions were optimized by minimizing the electric potential energy on a hemisphere66, after which half of the directions were flipped. The image resolution was kept identical to the original dataset at 2 mm isotropic. Finally, Gaussian or Rician noise was added.
The standard deviation of the noise distribution was determined by the mean of the b = 0 s mm−2 acquisitions and the SNR (20 or 50) on the b = 0 s mm−2 images, resulting in a spatially varying standard deviation. These SNR levels are comparable to (50), or below (20), those investigated in previous work15,18. Free water contributions and TE dependence were not considered. Non-white matter voxels were masked out, as their influence on the loss value would decrease the performance of parameter estimation in white matter voxels. A white matter mask was generated with the FreeSurfer67 segmentation, which is included in the MGH dataset. Voxels with \({D}_{e}^{\perp } > {D}_{e}^{\parallel }\) were also masked out, as these represent nonphysical behavior.
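The noise-addition step can be sketched as follows (a simplified version: b0_mean stands for the per-voxel mean of the b = 0 volumes, and the Rician sample is formed as the magnitude of a complex Gaussian perturbation):

```python
import numpy as np

def add_noise(signal, b0_mean, snr, kind="rician", rng=None):
    """Add Gaussian or Rician noise; sigma varies spatially because it is
    set per voxel from the mean b = 0 signal divided by the target SNR."""
    rng = np.random.default_rng(0) if rng is None else rng
    sigma = b0_mean / snr                    # per-voxel std, broadcastable
    n_real = rng.normal(0.0, 1.0, signal.shape) * sigma
    if kind == "gaussian":
        return signal + n_real
    n_imag = rng.normal(0.0, 1.0, signal.shape) * sigma
    # Rician magnitude of a complex Gaussian perturbation
    return np.sqrt((signal + n_real) ** 2 + n_imag ** 2)
```

The Rician variant is never negative and its mean slightly exceeds the true signal, which is exactly the bias that the Rician likelihood loss of Experiment 2 corrects for.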

In vivo data acquisition

The study was approved by the Cardiff University School of Psychology Ethics Committee and written informed consent was obtained from the participant in the study. All ethical regulations relevant to human research participants were followed. One healthy volunteer was scanned on a 3T, 300 mT/m Connectom scanner (Siemens Healthineers, Erlangen, Germany). Imaging parameters and the diffusion acquisition scheme can be found in Table 3. Gradient waveforms for spherical tensor encoding were optimized using the NOW toolbox68. The in vivo data were corrected for Gibbs ringing69, signal drift70, motion and eddy currents71, susceptibility-induced distortions72, and gradient non-uniformity image distortions with B-matrix correction28.

Table 3 In vivo diffusion acquisition scheme and imaging parameters

Experiment 1: Quantitative comparison on simulated data

All four fitting methods were compared on simulated data. To mimic realistic conditions, Gaussian noise was added during the simulation of the synthetic dataset (SNR = [20, 50]). For this experiment, only lmax = 2 was considered, which is the highest SH order the supervised NN method can fit. As the optimized acquisition protocol contains only positive bΔ values, the SM forward model was calculated following the analytical approach.

Experiment 2: Rician noise bias

The effect of Rician noise bias was investigated by introducing noise sampled from a Rician distribution (SNR = [20, 50]), rather than a Gaussian one. As in experiment 1, only lmax = 2 was considered. Parameter estimation was performed using both a standard MSE loss and a Rician likelihood loss39, and the resulting estimates were compared to the ground truth. The integral in the SM forward model was calculated with the analytical approach.

Experiment 3: Qualitative comparison on in vivo data

To test the INR on in vivo data, SM parameter estimation was executed on the dataset from section ‘In vivo data acquisition’ using the Rician likelihood loss. The MP-PCA noise map for the SMI fitting was estimated on the unprocessed data using b-values up to 1300 s mm−2. Only lmax = 2 was considered. As the acquisition protocol contained negative bΔ values, the SM forward model was calculated following the numerical integration approach. The results are compared to the estimates from the methods in section ‘Comparisons’.

Experiment 4: Estimation of SH order up to lmax = 8

The performance of the proposed method in modeling higher-order FODs is investigated by fitting the INR with SH orders lmax = [2, 4, 6, 8] on four different datasets. The noiseless synthetic dataset and the SNR 50 and 20 synthetic datasets provide insight into the accuracy of FOD estimation in noisy data, while the in vivo dataset qualitatively shows the capability of the INR to estimate higher-order FODs on realistic datasets.

Experiment 5: Effect of gradient non-uniformity correction on SM parameter estimation

The impact of gradient non-uniformities on SM parameter estimation was assessed using in vivo data. For each voxel, the b-value, bΔ, and b-vectors were recalculated to account for scanner-specific gradient deviations following ref. 28. These corrected effective acquisition parameters were then used to fit the model. This analysis was performed for lmax = 2. The differences between the corrected and uncorrected parameter estimates were subsequently evaluated. Gradient non-uniformity correction was implemented using the MSE loss function and compared to gradient non-uniformity correction with NLLS.
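The voxel-wise recalculation can be illustrated for linear tensor encoding: given a gradient deviation tensor L as in ref. 28, the effective gradient direction is g_eff = (I + L)g and the b-value scales with |g_eff|². This is a hypothetical helper for illustration only; it omits the recalculation of bΔ needed for general B-tensor shapes:

```python
import numpy as np

def correct_encoding(bval, bvec, L):
    """Recalculate the effective b-value and unit direction in one voxel
    with gradient deviation tensor L (3x3): g_eff = (I + L) @ g, and the
    b-value scales with |g_eff|^2. Illustrative sketch only."""
    g_eff = (np.eye(3) + L) @ bvec
    norm = np.linalg.norm(g_eff)
    return bval * norm ** 2, g_eff / norm
```

With L = 0 the acquisition parameters are unchanged, while a uniform 1% gradient gain inflates the effective b-value by a factor of 1.01².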

Experiment 6: Implicit neural representation for spatial interpolation

To obtain more detailed insight into the continuous spatial representation of the dataset provided by an INR, the INR fitted to the Gaussian-noise, SNR 50 synthetic dataset at lmax = 2 was sampled at 8× the original resolution in every dimension, resulting in 0.25 mm isotropic voxels. The parameter map of p2 was visualized in the coronal and sagittal planes for the original-resolution output of the model and for linear, cubic, and INR upsampling.
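Because the INR is a continuous function of x, ‘upsampling’ amounts to evaluating the network on a finer coordinate grid; a sketch of the grid construction (the network evaluation itself is omitted, and the grid size is a toy example):

```python
import numpy as np

def upsample_coords(shape, factor=8):
    """Coordinates of a grid `factor` times finer than the original voxel
    grid along every dimension, scaled to [-1, 1] as during fitting; the
    fitted INR is then simply evaluated at these coordinates."""
    axes = [np.linspace(-1.0, 1.0, s * factor) for s in shape]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    return grid.reshape(-1, len(shape))

coords = upsample_coords((4, 4, 4))   # toy 4x4x4 volume -> 32^3 query points
```

In contrast to linear or cubic interpolation of the finished parameter maps, this queries the underlying continuous representation directly, so no resampling kernel is imposed on the output.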

Statistics and reproducibility

All statistical analyses were conducted using custom Python scripts. Comparisons between ground truth and estimated SM parameters on simulated data, as well as on in vivo data, were performed using Pearson’s correlation coefficient and RMSE. The analysis workflow relied on SciPy (v1.16.1), scikit-learn (v1.7.1), and NumPy (v2.3.2). No inter-subject statistical analyses were carried out.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.