Real-time inference for binary neutron star mergers using machine learning

Dax, Maximilian; Green, Stephen R.; Gair, Jonathan; Gupte, Nihar; Pürrer, Michael; Raymond, Vivien; Wildberger, Jonas; Macke, Jakob H.; Buonanno, Alessandra; Schölkopf, Bernhard

doi:10.1038/s41586-025-08593-z

Download PDF

Article
Open access
Published: 05 March 2025

Real-time inference for binary neutron star mergers using machine learning

Nature volume 639, pages 49–53 (2025) Cite this article

71k Accesses
46 Citations
190 Altmetric
Metrics details

Subjects

Abstract

Mergers of binary neutron stars emit signals in both the gravitational-wave (GW) and electromagnetic spectra. Famously, the 2017 multi-messenger observation of GW170817 (refs. ^1,2) led to scientific discoveries across cosmology³, nuclear physics^4,5,6 and gravity⁷. Central to these results were the sky localization and distance obtained from the GW data, which, in the case of GW170817, helped to identify the associated electromagnetic transient, AT 2017gfo (ref. ⁸), 11 h after the GW signal. Fast analysis of GW data is critical for directing time-sensitive electromagnetic observations. However, owing to challenges arising from the length and complexity of signals, it is often necessary to make approximations that sacrifice accuracy. Here we present a machine-learning framework that performs complete binary neutron star inference in just 1 s without making any such approximations. Our approach enhances multi-messenger observations by providing: (1) accurate localization even before the merger; (2) improved localization precision by around 30% compared to approximate low-latency methods; and (3) detailed information on luminosity distance, inclination and masses, which can be used to prioritize expensive telescope time. Additionally, the flexibility and reduced cost of our method open new opportunities for equation-of-state studies. Finally, we demonstrate that our method scales to long signals, up to an hour in length, thus serving as a blueprint for data analysis for next-generation ground- and space-based detectors.

Constraining the equation of state in neutron-star cores via the long-ringdown signal

Article Open access 03 February 2025

End-to-end study of the host galaxy and genealogy of the first binary neutron star merger

Article 12 January 2023

An updated nuclear-physics and multi-messenger astrophysics framework for binary neutron star mergers

Article Open access 20 December 2023

Main

The fast and accurate inference of binary neutron stars (BNSs) from gravitational-wave (GW) data is a critical challenge facing multi-messenger astronomy. For a BNS, the GW signal is visible by the Laser Interferometer GW Observatory (LIGO)–Virgo GW Interferometer (Virgo)–Kamioka GW Detector (KAGRA) (collectively, LVK)^9,10,11 observatories minutes before any electromagnetic counterpart. The GW encodes information on the source characterization, distance, sky location and orientation necessary for pointing and prioritizing optical telescopes. However, the length of BNS signals makes conventional Bayesian inference techniques^12,13 too slow to be useful in low-latency applications. Instead, once a GW signal is identified by detection pipelines^14,15, approximate algorithms are used for providing initial alerts (for example, BAYESTAR¹⁶, which uses the signal-to-noise ratio (SNR) time series rather than the complete strain data and gives localization in seconds). Other methods focus on accelerating likelihood evaluations without incurring loss of precision (for example, using reduced-order quadratures), with the state-of-the-art delivering localization in 6 min and full inference in 2 h (ref. ¹⁷).

Simulation-based machine learning offers a powerful alternative for GW inference (see Methods for related work). With simulation-based inference (SBI)¹⁸, neural networks are trained to encode probabilistic estimates of astrophysical source parameters conditional on data. Trained networks then enable extremely fast analysis for new datasets, amortizing upfront training costs across observations. In past work, we developed the deep inference for GW observations (DINGO) framework for binary black holes (BBHs)^19,20, which performs accurate inference in seconds, including strong accuracy guarantees when coupled with importance sampling. However, when applied to BNSs, machine-learning approaches, such as DINGO, are beset by the same challenges facing traditional methods because of long signal durations. Indeed, DINGO becomes unreliable even for low-mass BBHs (chirp masses ≲15 M_⊙) with signals longer than roughly 16 s. A BNS lasts for hundreds of seconds for the LVK and will reach hours for next-generation detectors (for example, Cosmic Explorer²¹ and Einstein Telescope²²). From the neural-network perspective, this corresponds to time or frequency series input with up to tens of millions of dimensions—a thousand-fold increase over BBH.

In this study, we overcome these challenges by leveraging perturbative BNS physics information to simplify and compress the data. However, this simplification requires approximate knowledge of the source itself and is hence valid only over a small portion of the parameter space. We solve this problem using a new algorithm, called prior conditioning, which enables us to construct networks that can be adapted at inference time to subsets of the prior volume. Our new framework, called DINGO-BNS, makes no (practically relevant) approximation and takes just 1 s for accurate inference of all 17 BNS parameters (Fig. 1). Using DINGO-BNS, we can also infer all of these parameters minutes before the merger based on partial inspiral-only information—estimates that can be continuously updated as more data become available (Fig. 2a). Near-real-time or pre-merger alerts can then be provided to astronomers, facilitating potential discoveries of precursor and prompt electromagnetic counterparts^23,24,25.

**Fig. 1: Real-time inference of GW170817.**

Our results are faster and more complete than any existing low-latency algorithm, with the accuracy of offline parameter estimation codes. Compared to BAYESTAR, we achieve median reductions in the size of the 90% credible sky region of 30% (Fig. 2b). Finally, DINGO-BNS exhibits excellent scaling to longer signals (Methods), and we demonstrate next-generation detector pre-merger inference for signals up to an hour in length (Fig. 2c).

DINGO-BNS

For given GW data, d, we characterize the source in terms of the posterior probability distribution, p(θ|d), over BNS parameters, θ. Parameters include component masses (two), spins (six), orientation, sky position (two), luminosity distance, polarization, time and phase of coalescence and (in contrast to black holes) tidal deformabilities (two). Following our past work¹⁹, we use simulated GW datasets to train a density estimation neural network, q(θ|d) (a normalizing flow), to approximate p(θ|d). Once trained, the inference for new d simply requires sampling θ ~ q(θ|d). We obtain asymptotically exact results by augmenting samples with importance weights using the GW likelihood function²⁶. This framework, called DINGO-IS²⁰, has been successfully applied to black hole mergers. However, the length of BNS signals renders the naïve transfer of machine-learning methods impossible.

To tackle this challenge, DINGO-BNS makes several innovations (Fig. 3d), including using knowledge of specific BNS signal morphology to compress data in a non-lossy way, conditioning the network on the compressor using prior conditioning, frequency masking based on the pre-merger time and chirp mass, and conditioning on parameter subsets for incorporating multi-messenger information or expectations from nuclear models. The philosophy underlying our approach is that the full BNS problem is too hard for existing neural architectures, so we divide the parameter and data spaces into manageable portions based on known physical information. We then combine all of these variable design choices into a single network. By passing relevant control parameters at inference time, the network can be tuned to the context at hand.

**Fig. 3: Prior conditioning and method flowchart.**

Data compression and prior conditioning

We adapted two GW analysis techniques to the SBI context—heterodyning²⁷ to simplify the data and multibanding^28,29 to reduce the data dimension without loss of information. During the long inspiral period, a BNS signal exhibits a ‘chirp’, with phase evolution (to leading order in the post-Newtonian expansion³⁰),

$${\varphi }(f;{\mathcal{M}})=\frac{3}{128}{\left(\frac{{\rm{\pi }}G{\mathcal{M}}f}{{c}^{3}}\right)}^{-5/3},$$

(1)

where f is the frequency, c is the speed of light in vacuum and ${\mathcal{M}}$ = (m₁m₂)^3/5/(m₁ + m₂)^1/5 is the chirp mass of the system, with m₁ and m₂ being the component masses. Given an approximation, $\widetilde{{\mathcal{M}}}$, to the chirp mass, we heterodyne the (frequency-domain) data by multiplying by ${{\rm{e}}}^{{\rm{i}}{\boldsymbol{\varphi }}({\boldsymbol{f}};\widetilde{{\mathcal{M}}})}$, reducing the number of oscillations in the signal by several orders of magnitude (Fig. 4a). Given the heterodyned data, we apply multibanding by partitioning the domain into (empirically determined) frequency bands and coarsening the resolution in higher bands, such that the (heterodyned) signal is preserved.

Because the compression described requires $\widetilde{{\mathcal{M}}}$ to approximate the chirp mass, it cannot be done across the entire BNS prior volume using a single $\widetilde{{\mathcal{M}}}$ value. Therefore, DINGO-BNS uses prior conditioning to restrict to an event-specific prior over which data are compressed. The restricted volume additionally simplifies the density estimation task. By conditioning on the choice of restriction, prior conditioning trains a network that is tunable to this choice, but otherwise applicable over the whole volume (Fig. 3a). Inference requires an estimate, $\widetilde{{\mathcal{M}}}$, of the chirp mass, ${\mathcal{M}}$, which can be determined quickly by sweeping across the prior (Methods).

Frequency masking

In contrast to past work, DINGO-BNS also allows strain frequency series with varying minimum and maximum frequencies (f_min and f_max, respectively). For a given analysis, f_min is chosen based on $\widetilde{{\mathcal{M}}}$ and the segment duration as the minimum frequency present in the signal in a given GW-detector network. This masking is necessary for consistency with frequency-domain waveform models, which assume infinite duration. Choosing f_max, by contrast, determines the end time of the data stream analysed to enable pre-merger inference (Fig. 4b and Methods).

Conditioning on parameter subsets

The DINGO-BNS framework (and SBI in general) allows considerable flexibility in terms of quickly marginalizing over, and conditioning on, parameters. Conditioning on a parameter allows us to set it to a fixed value—for example, to incorporate knowledge of that parameter from other sources. In our study, we trained DINGO-BNS networks conditioned on the sky position—that is, we learned p(θ\{α,δ}|d,α,δ), where α and δ denote the right ascension and declination, respectively. Such a network allows us to incorporate precise multi-messenger localization to obtain tighter constraints on the remaining parameters, potentially enabling real-time feedback on whether optical candidates should be prioritized for detailed spectroscopy³¹. In this way, DINGO-BNS can enable new modes of interaction between GW and electromagnetic observers, potentially transforming how we prioritize and respond to multi-messenger events. We have also explored parameter-conditioning to accelerate offline nuclear equation-of-state (EOS) analyses (Methods).

Experiments

We generated training data using simulated BNS waveforms (including spin-precession and tidal contributions, but without higher angular multipoles³²) with additive stationary Gaussian detector noise. When relevant, networks are also trained with power spectral density (PSD)-conditioning to enable instant tuning to noise levels at the time of an event. At inference time, we validate and correct results using importance sampling, thus guaranteeing their accuracy, provided a sufficient effective sample size is obtained²⁰. We accelerate the importance sampling step using JAX waveform and likelihood implementations^33,34,35.

We performed four studies using DINGO-BNS: (1) a pre-merger analysis of the first BNS detected—GW170817—as well as equivalent injections (simulated datasets) at varying noise levels; (2) a pre-merger analysis of a range of injections in LVK design sensitivity noise; (3) an after-merger analysis of the two detected BNS events—GW170817 and GW190425—reproducing published LVK results; and (4) a pre-merger analysis of injections in Cosmic Explorer noise (with a minimum frequency of 6 Hz, corresponding to an hour-long signal). We use the importance sampling efficiency as a primary performance metric, finding average values of 63.3%, 47.0%, 31.0% and 35.6% in experiments (1), (2), (3) and (4), respectively. With these high efficiencies, inference for 10⁴ effective samples takes roughly 1 s on an H100 GPU (Methods). Efficiencies are generally higher for pre-merger, probably because the waveform morphology is simplest in the early inspiral.

Discussion

Prior conditioning works well for BNS inference, and it could be extended to address further challenges in GW astronomy (for example, the isolation of events from overlapping backgound signals in next-generation detectors) and other scientific domains. In the future, we would like to explore our prior-conditioning approach to data compression for black hole–neutron star systems and low-mass BBHs. This is non-trivial because such systems can emit GWs in higher angular radiation multipoles (that is, beyond the (l,m) = (2,2) mode that we assume here), which evolve according to integer multiples of (1), and so would require an improved heterodyning algorithm to factor out the chirp. Higher modes are not present in BNS signals, because the stars are very nearly equal in mass.

Another exciting prospect for SBI is a more realistic treatment of detector noise. Indeed, because BNS inspirals have long durations, noise non-stationarities and non-Gaussianities are more likely to manifest. Currently, DINGO-BNS assumes stationary Gaussian noise and is supplied with an estimate of the PSD (possibly resulting in additional latency for data preparation, see Methods). However, by training on realistic detector noise, our approach can, in principle, learn to fully characterize the noise jointly with the signal, including any deviations from stationarity and Gaussianity. This approach is akin to on-source PSD and glitch modelling³⁶, but allows more general noise and automatically marginalizes over uncertainties. Initial steps in this direction have already been taken for intermediate-mass BBHs³⁷. Improved noise treatments, such as those afforded by SBI, will become crucial for reducing systematic error as detectors become more sensitive³⁸.

Finally, although DINGO-BNS is intended to be used for parameter estimation following a trigger by dedicated search pipelines, its speed opens up the possibility of being run continuously on all data as they are taken. Either the SNR or Bayesian evidence time series generated by DINGO-BNS could then be used as a detection statistic, forming an end-to-end detection and parameter estimation pipeline. To implement this would require calibrating these statistics to determine false alarm rates, as well as careful comparisons against existing algorithms to establish efficacy.

Methods

Machine-learning framework

The Bayesian posterior, p(θ|d) = p(d|θ)p(θ)/p(d), is defined in terms of a prior p(θ) and a likelihood p(d|θ). For GW inference, the likelihood is constructed by combining models for waveforms and detector noise. The Bayesian evidence, p(d), corresponds to the normalization of the posterior and can be used for model comparison.

Our framework is based on neural posterior estimation (NPE)^40,41,42, which trains a density estimation neural network, q(θ|d), to estimate p(θ|d). We parameterize q(θ|d) with a conditional normalizing flow^43,44. Training minimizes the loss L = Ep_(θ,d)[−log q(θ|d)], where the expectation value is computed across a dataset (θ_i,d_i) of parameters, θ_i ~ p(θ), paired with corresponding likelihood simulations, d_i ~ p(d|θ_i). After training, q(θ|d) serves as a surrogate for p(θ|d) and inference for any observed data, d_o, can be performed by sampling θ ~ q(θ|d_o). With DINGO^19,20 using a group-equivariant formulation of NPE (GNPE^19,45), the GW data are simplified by aligning coalescence times in the different detectors. However, this comes at the cost of longer inference times, so we do not use GNPE for DINGO-BNS.

At inference, we correct for potential inaccuracies of q(θ|d) using importance sampling²⁰, by assigning weight, w_i = p(d|θ_i)p(θ_i)/q(θ_i|d), to each sample, θ_i ~ q(θ_i|d). A set of n weighted samples (w_i,θ_i) corresponds to ${n}_{\text{eff}}={\left({\sum }_{i}{w}_{i}\right)}^{2}\,/\,\left({\sum }_{i}{w}_{i}^{2}\right)$ effective samples from the posterior, p(θ|d). This reweighting enables asymptotically exact results and the sample efficiency, ϵ = n_eff/n, serves as a performance metric. The normalization of the weights further provides an unbiased estimate of the Bayesian evidence, $p({\bf{d}})=({\sum }_{i}{w}_{i})/n$.

Below, we describe in more detail the technical innovations of DINGO-BNS that enable scaling of this framework to BNS signals.

Prior conditioning

An NPE model, q(θ|d), estimates the posterior, p(θ|d), for a fixed prior, p(θ). Choosing a broad prior enhances the general applicability of the NPE model, but it also implies worse tuning to specific events (for which smaller priors may be sufficient). This is a general trade-off in NPE, but it is particularly notable for BNS inference, where typical events constrain the chirp mass to around 10⁻³ of the prior volume. Thus, for an individual BNS event, a tight chirp mass prior would have been sufficient (Extended Data Fig. 1b) and moreover would have enabled effective heterodyning^27,46,47. However, to cover generic BNS events, we need to train the NPE network with a large prior (Extended Data Table 2).

We resolve this trade-off with a new technique called prior conditioning. The key idea is to train an NPE model with multiple different (restricted) priors simultaneously. Training a prior-conditioned model requires hierarchical sampling:

$${\boldsymbol{\theta }} \sim {{p}}_{{\boldsymbol{\rho }}}({\boldsymbol{\theta }}),{\boldsymbol{\rho }} \sim \widehat{p}({\boldsymbol{\rho }}),$$

(2)

where p_ρ(θ) is a prior family parameterized by ρ and $\widehat{p}({\boldsymbol{\rho }})$ is a corresponding hyperprior. We additionally condition the NPE model, q(θ|d,ρ), on ρ. This model can then perform inference for any desired prior, p_ρ(θ), by simply providing the corresponding ρ. This effectively amortizes the training cost over different choices of the prior. On each of the restricted priors, we are furthermore allowed to transform the data in a ρ-dependent way. This is because, having been conditioned on the prior choice, the network has all the information necessary to properly interpret the transformed data. We use this freedom to heterodyne the GW strain with respect to the approximate chirp mass.

To apply prior conditioning for the chirp mass, ${\mathcal{M}}$, we use a set of priors, ${p}_{\widetilde{{\mathcal{M}}}}({\mathcal{M}})={U}_{{m}_{1},{m}_{2}}(\widetilde{{\mathcal{M}}}-\Delta {\mathcal{M}},\widetilde{{\mathcal{M}}}+\Delta {\mathcal{M}})$. Here, U_m1,m2(${\mathcal{M}}$_min, ${\mathcal{M}}$_max) denotes a distribution over ${\mathcal{M}}$ with support, [${\mathcal{M}}$_min, ${\mathcal{M}}$_max], within which component masses, m₁, m₂, are uniformly distributed. We use a fixed ∆${\mathcal{M}}$ = 0.005 M_⊙ and choose a hyperprior, $\hat{p}(\widetilde{{\mathcal{M}}})$, covering the expected range of ${\mathcal{M}}$ for LVK detections of BNS (Extended Data Table 2). Because ∆${\mathcal{M}}$ is small, $\widetilde{{\mathcal{M}}}$ is a good approximation for any ${\mathcal{M}}$ within the restricted prior, ${p}_{\widetilde{{\mathcal{M}}}}({\mathcal{M}})$, and we can thus use $\widetilde{{\mathcal{M}}}$ for heterodyning. The resulting model, $q({\boldsymbol{\theta }}| {{\bf{d}}}_{\widetilde{{\mathcal{M}}}},\widetilde{{\mathcal{M}}})$, can then perform inference with event-optimized heterodyning and priors (via the choice of appropriate $\widetilde{{\mathcal{M}}}$), but is nevertheless applicable to the entire range of the hyperprior.

Inference results are independent of $\widetilde{{\mathcal{M}}}$ as long as the posterior, p(${\mathcal{M}}$|d), is fully covered by [$\widetilde{{\mathcal{M}}}$ − ∆${\mathcal{M}}$, $\widetilde{{\mathcal{M}}}$ + ∆${\mathcal{M}}$]. For BNS, p(${\mathcal{M}}$|d) is typically tightly constrained and we can use a coarse estimate of ${\mathcal{M}}$ for $\widetilde{{\mathcal{M}}}$. This can either be taken from a GW search pipeline or rapidly computed from $q({\boldsymbol{\theta }}| {{\bf{d}}}_{\widetilde{{\mathcal{M}}}},\widetilde{{\mathcal{M}}})$ itself by sweeping the hyperprior (see below). Note that, for shorter GW signals from black hole mergers, p(${\mathcal{M}}$|d) is generally less well constrained. The transfer of prior conditioning would thus require larger (and potentially flexible) values of ∆${\mathcal{M}}$. Alternatively, the prior range can be extended at inference time by iterative Gibbs sampling of ${\mathcal{M}}$ and $\widetilde{{\mathcal{M}}}$, similar to the GNPE algorithm^19,45.

Prior conditioning is a general SBI technique that enables a choice of prior at inference time. This can also be achieved with sequential NPE^40,41,42,48. However, in contrast to prior conditioning, these techniques require simulations and retraining for each observation, resulting in more expensive and slower inference. We here use prior conditioning with priors of fixed width for the chirp mass, and optional additional conditioning on fixed values for other parameters (corresponding to Dirac delta priors). Extension to more complicated priors and hyperpriors is straightforward.

Independent estimation of chirp mass and merger times

Running DINGO-BNS requires an initial estimate of the chirp mass, ${\mathcal{M}}$ (to determine $\widetilde{{\mathcal{M}}}$ for the network), and the merger time, t_c (to trigger the analysis). Matched filter searches can identify the presence of a compact binary signal and its chirp mass and merger time in low latency^{14,15,49,50,51}. Specialized early warning searches are designed to produce output before the coalescence can further provide a rough indication of sky position and distance^52,53,54. When available, the output of such pipelines can be used to trigger a DINGO analysis and provide estimates for ${\mathcal{M}}$ and t_c.

We here describe an alternative independent approach of obtaining these parameters, using only the trained DINGO-BNS model. We compute $\widetilde{{\mathcal{M}}}$ by sweeping the entire hyperprior, $\hat{p}(\widetilde{{\mathcal{M}}})={U}_{{m}_{1},{m}_{2}}({\widetilde{{\mathcal{M}}}}_{\min },{\widetilde{{\mathcal{M}}}}_{\max })$. Specifically, we run DINGO-BNS with a set of prior centres,

$${\mathop{{\mathcal{M}}}\limits^{ \sim }}_{i}={\mathop{{\mathcal{M}}}\limits^{ \sim }}_{min}+i\Delta {\mathcal{M}},$$

(3)

where i takes integer values between 0 and ($\widetilde{{\mathcal{M}}}$_max − $\widetilde{{\mathcal{M}}}$_min)/∆${\mathcal{M}}$. The inference models in this study were trained with hyperprior ranges of up to [1.0,2.2] M_⊙. For ∆${\mathcal{M}}$ = 0.005 M_⊙, we can thus cover the entire global chirp mass range using 241 (overlapping) local priors. We run DINGO-BNS for all local priors, ${\widetilde{{\mathcal{M}}}}_{i}$, in parallel, with 10 samples per ${\widetilde{{\mathcal{M}}}}_{i}$. This requires a DINGO-BNS inference of only a few thousand samples, which takes less than 1 s. We use the chirp mass, ${\mathcal{M}}$, of the maximum likelihood sample as the prior centre, $\widetilde{{\mathcal{M}}}$, for the analysis (Extended Data Fig. 1a). Note that the exact choice of $\widetilde{{\mathcal{M}}}$ does not matter, as long as the inferred posterior is fully covered by [$\widetilde{{\mathcal{M}}}$ − ∆${\mathcal{M}}$, $\widetilde{{\mathcal{M}}}$ + ∆${\mathcal{M}}$] (Extended Data Fig. 1b).

The merger time, t_c, can be inferred by continuously running this $\widetilde{{\mathcal{M}}}$ scan on the input data stream, sliding the t_c prior in real time over the incoming data. With inference times of 1 s, continuous analysis can be achieved on just a few parallel computational nodes (or even a single node when also parallelizing over the t_c grid), constantly running on the input data stream. Event candidates can then be identified by analysing the SNR, triggering upon exceeding some defined threshold (Extended Data Fig. 1c). This scan can be performed at an arbitrary (but fixed) time before the merger.

This scan successfully estimates ${\mathcal{M}}$ and t_c for both real BNS events (Extended Data Fig. 1). However, we have not tested this at a large scale on detector noise to compute false alarm rates because DINGO-BNS is primarily intended for parameter estimation. Existing search and early warning pipelines are probably more robust for event identification, particularly in the presence of non-stationary detector noise.

Frequency multibanding

Although the native resolution of a frequency series is determined by the duration, T, of the corresponding time series, (∆f = 1/T), we can average adjacent frequency bins wherever the signal is roughly constant. This enables data compression with only negligible loss of information. Here we use frequency multibanding, which divides the frequency range, [f_min, f_max], into N bands of decreasing resolution. Frequency band i covers the range $[\,{\hat{f}}_{i}\,,\,{\hat{f}}_{i+1}\,]$ with ∆f_i = 2ⁱ∆f₀, where ${\hat{f}}_{0}\,=\,{f}_{\min }$, ${\hat{f}}_{N}\,=\,{f}_{\max }$ and ∆f₀ is the native resolution of the frequency series. Within band i, the multibanded domain thus compresses the data by a factor of 2ⁱ (Extended Data Fig. 2), which is achieved by averaging 2ⁱ sequential bins from the original frequency series (called decimation). To achieve optimal compression, we empirically choose the smallest possible nodes, ${\hat{f}}_{i}$, for which GW signals are still fully resolved. Specifically, we simulate a set of 10³ heterodyned GW signals and demand that every period of these signals is covered by at least 32 bins in the resulting multibanded frequency domain. This is done before generating the training dataset, and the multibanded domain then remains fixed during dataset generation and training. The optimized resolution achieves compression factors between 60 and 650 (Extended Data Fig. 2c).

Traditionally, multibanding has been used without additional heterodyning—an approach we could also apply to DINGO-BNS. However, this would lead to lower compression factors, ranging from 14 to 56. More importantly, using multibanding alone would result in substantially more complicated input data to the DINGO-BNS network. Indeed, heterodyning enables additional truncation of the waveform singular value decomposition used in initializing the first network layer¹⁹. For LVK data, this results in a reduction from roughly 1,800 to 200 basis elements, thereby simplifying the learning task.

Care needs to be taken that the approximations are valid in the presence of detector noise. We now investigate how multibanding affects data simulation (for training) and the likelihood (for importance sampling).

Data simulation

The GW data are simulated as the sum of a signal and detector noise, d = h(θ) + n. The detector noise in frequency bin j is given by

$${n}_{j}\sim {\mathcal{N}}(0,{\sigma }\sqrt{{S}_{j}}),{\sigma }=\sqrt{\frac{w}{4\Delta f}},$$

(4)

where S denotes the detector noise PSD, and σ takes into account the frequency resolution and the Tukey window factor, w. Note that n is a complex frequency series, which we ignore in our notation, as the considerations here hold for real and imaginary parts individually. It is conventional to work with whitened data,

$${d}_{j}^{\text{w}}={h}_{j}^{\text{w}}({\boldsymbol{\theta }})+{n}_{j}^{\text{w}}=\frac{{h}_{j}({\boldsymbol{\theta }})+{n}_{j}}{\sqrt{{S}_{j}}},$$

(5)

in which case ${n}_{j}^{\text{w}}\sim {\mathcal{N}}(0,{\sigma })$.

We convert to the multibanded frequency domain by averaging sets of N_i = 2ⁱ bins,

$$\bar{{d}_{j}^{\text{w}}}=\frac{1}{{N}_{i}}\mathop{\sum }\limits_{k={m}_{j}}^{{m}_{j}+{N}_{i}-1}({h}_{k}^{\text{w}}+{n}_{k}^{\text{w}})=\bar{{h}_{j}^{\text{w}}}+\bar{{n}_{j}^{\text{w}}},$$

(6)

where j denotes the bin in the multibanded domain, m_j denotes the starting index of the decimation window for j in the native domain and i indexes the frequency band associated with j. Because $\bar{{n}_{j}^{\text{w}}}$ is an average of N_i Gaussian random variables with standard deviation σ, it follows that ${n}_{j}^{\text{w}}$ is also Gaussian with standard deviation,

$${{\sigma }}_{i}={\sigma }\,/\,\sqrt{{N}_{i}}=\sqrt{\frac{w}{4\Delta f{N}_{i}}}=\sqrt{\frac{w}{4\Delta {f}_{i}}}.$$

(7)

We can thus simulate the detector noise directly in the multibanded domain by updating σ → σ_i, corresponding to ∆f → ∆f_i. For the whitened signal we find

$$\overline{{h}_{j}^{\text{w}}}=\frac{1}{{N}_{i}}\mathop{\sum }\limits_{k={m}_{j}}^{{m}_{j}+{N}_{i}-1}\frac{{h}_{k}}{\sqrt{{S}_{k}}}\approx \bar{{h}_{j}}\mathop{\sum }\limits_{k={m}_{j}}^{{m}_{j}+{N}_{i}-1}\frac{1}{\sqrt{{S}_{k}}},$$

(8)

assuming an approximately constant signal, h, within the decimation window, $\bar{{h}_{j}}$ ≈ h_k,∀k ∈ [m_j,m_j + N_i − 1]. For frequency-domain waveform models, we can thus directly compute the signal $\bar{{h}_{j}}$ in the multibanded domain by simply evaluating the model at frequencies $\bar{{f}_{j}}$.

In summary, we can directly generate BNS data in the multibanded frequency domain by: (1) updating the noise standard deviation according to the multibanded resolution; (2) appropriately decimating noise PSDs; and (3) computating signals and noise realizations in the compressed domain. These operations are carefully designed to be consistent with the data processing of real BNS observations, which for DINGO-BNS are first whitened in the native domain and then decimated to the multibanded domain. This process relies on the assumption that signals are constant within decimation windows, and we ensure that this is (approximately) fulfilled when determining the multibanded resolution. Indeed, for signals generated directly in the multibanded domain, we find mismatches of at most around 10⁻⁷ when comparing to signals that are properly decimated from the native domain.

Likelihood evaluations

We also use frequency multibanding to evaluate the likelihood for importance sampling. The standard Whittle likelihood used in GW astronomy²⁶ reads:

$$\log p({\bf{d}}| {\boldsymbol{\theta }})=-\frac{1}{2}\sum _{k}\frac{{| {d}_{k}^{\text{w}}-{h}_{k}^{\text{w}}({\boldsymbol{\theta }})| }^{2}}{{\sigma }^{2}},$$

(9)

up to a normalization constant. The sum extends over all bins, k, in the native frequency domain. Assuming a constant signal (as above) and PSD within each decimation window, we can directly compute the likelihood in the multibanded domain,

$$\log \,p({\bf{d}}| {\boldsymbol{\theta }})\approx -\frac{1}{2}\sum _{j}\frac{{| \bar{{d}_{j}^{\text{w}}}-\bar{{h}_{j}^{\text{w}}}({\boldsymbol{\theta }})| }^{2}}{{\sigma }_{i(j)}^{2}}.$$

(10)

The assumptions are not exactly fulfilled in practice—for additional corrections, see ref. ²⁹. For importance sampling, we can always evaluate the exact likelihood in the native frequency domain instead. In this case, the result is no longer subject to any approximations, even if the DINGO-BNS proposal is generated with a network using multibanded data. With the full likelihood for GW170817, we found a sample efficiency of 11.0% with an inference time of 13 s for 50,000 samples. The deviation from the result obtained with the multibanded likelihood is negligible (Jensen–Shannon divergence of less than 5 × 10⁻⁴ nat for all parameters). This demonstrates that use of the multibanded resolution has no practically relevant impact on the results.

Frequency masking

Because the GW likelihood (and our framework) uses the frequency domain, but data are taken in the time domain, it is necessary to convert data by windowing and Fourier transforming. However, frequency domain waveform models assume infinite time duration, leading to inconsistencies with finite time segments, [t_min, t_max]. Because the frequency evolution of the inspiral is tightly constrained by the chirp mass, ${\mathcal{M}}$, we can compute boundaries, f_min(t_min, ${\mathcal{M}}$) and f_max(t_max, ${\mathcal{M}}$), such that the signals are not corrupted by the finite-duration effects within [f_min, f_max] and are negligibly small outside of that range (Extended Data Fig. 3).

We approximate the lower bound, f_min(t_min, ${\mathcal{M}}$), using the leading order in the post-Newtonian relationship between time and frequency,

$${f}_{0\text{PN}}(t,{\mathcal{M}})=\frac{1}{8{\rm{\pi }}}{\left(\frac{-t}{5}\right)}^{-3/8}{\left(\frac{G{\mathcal{M}}}{{c}^{3}}\right)}^{-5/8}.$$

(11)

For a network designed for fixed data duration, T, we set f_min(T, ${\mathcal{M}}$) = f_0PN(−T, ${\mathcal{M}}$) + f_buffer (we use f_buffer = 1 Hz for LVK and f_buffer = 0.5 Hz for next-generation detector setups).

For the upper bound, we found that f_0PN(t, ${\mathcal{M}}$) is not sufficiently accurate. Instead, we determined f_max(t, ${\mathcal{M}}$) empirically by simulating a set of signals (with parameters θ ~ p(θ)) and computing mismatches between signals with and without truncation at t > t_max. For a given set of simulations, we choose f_max(t, ${\mathcal{M}}$) as the highest frequency at which all mismatches are at the most 10⁻³. To avoid additional computation at inference time, we cache the results in a lookup table for f_max(t, ${\mathcal{M}}$). The lookup table used here was generated with 20 waveforms per element. We verified for random elements that this matches the result obtained using 1,000 waveforms with an accuracy of approximately 0.1 Hz. For production use, the lookup table may need to be generated with more waveforms.

Both bounds depend on the chirp mass, ${\mathcal{M}}$, with the upper bound additionally depending on the pre-merger time. To enable inference for arbitrary configurations, we trained a single network with variable frequency bounds. During training, we computed f_min(T, $\widetilde{{\mathcal{M}}}$) with the centre $\widetilde{{\mathcal{M}}}$ of the local chirp mass prior. The upper frequency bound, f_max, is sampled randomly (uniform in frequency bins of the multibanded frequency domain) to allow arbitrary pre-merger times. Data outside of [f_min, f_max] are zero-masked.

Such masked networks can perform pre-merger inference. Given an alert (for example, from a detection pipeline) predicting a merger at time ${\widetilde{t}}_{{\rm{c}}}$ (in the future) with chirp mass $\widetilde{{\mathcal{M}}}$, DINGO-BNS uses the frequency range [f_min(T, $\widetilde{{\mathcal{M}}}$), f_max(−${\widetilde{t}}_{{\rm{c}}}$, $\widetilde{{\mathcal{M}}}$)]. The frequency domain data are further shifted by ${\widetilde{t}}_{{\rm{c}}}$, such that the prior p(t_c) is centred around ${\widetilde{t}}_{{\rm{c}}}$. The resulting pre-merger posterior can then be used to update the alert predictions ($\widetilde{{\mathcal{M}}}$, ${\widetilde{t}}_{{\rm{c}}}$) (for example, as the centre of the inferred posterior marginals) and to trigger a new DINGO-BNS analysis with the new strain data that became available in the meantime. Such iterative posterior updates allow continuous optimization of the prior centres ($\widetilde{{\mathcal{M}}}$, ${\widetilde{t}}_{{\rm{c}}}$). As a result, the t_c prior for the after-merger analysis (Extended Data Table 2) does not need to account for large trigger uncertainties and only needs to be large enough to capture expected posteriors, p(t_c|d).

In the absence of external alerts, the independent DINGO-BNS search described earlier (Extended Data Fig. 1) can also be run as a pre-merger scan, searching for mergers around a freely chosen (but fixed) time, ${\widetilde{t}}_{{\rm{c}}}$, in the future. By continuously running this scan over the incoming data stream, alerts are triggered at roughly time ${\widetilde{t}}_{{\rm{c}}}$ before potential BNS mergers. Note that, in the absence of chirp mass predictions, we need to apply a conservative frequency range, [f_min(T, ${\mathcal{M}}$_max),f_max(−$\widetilde{t}$_c, ${\mathcal{M}}$_min)], where ${\mathcal{M}}$_min/max are the prior bounds for ${\mathcal{M}}$.

EOS likelihood

A nuclear EOS implies a functional relationship between neutron star masses, m_i, and tidal deformabilities, λ_i. The likelihood, p(d|${\mathcal{E}}$), for a given EOS, ${\mathcal{E}}$, and data, d, can be computed by integrating the GW likelihood along the hyperplane defined by the EOS constraint, ${{\lambda }}_{i}={{\lambda }}_{i}^{{\mathcal{E}}}({m}_{i})$,

$$\begin{array}{l}p({\bf{d}}| {\mathcal{E}})=\int p({\bf{d}}| {\boldsymbol{\theta }})p({\boldsymbol{\theta }})\delta ({\lambda }_{i}-{\lambda }_{i}^{{\mathcal{E}}}({m}_{i})){\rm{d}}{\boldsymbol{\theta }}\\ \,=\int \int p({\bf{d}}| {m}_{i},{m}_{2},{\lambda }_{i}^{{\mathcal{E}}}({m}_{1}),{\lambda }_{2}^{{\mathcal{E}}}({m}_{2})){\rm{d}}{m}_{1}{\rm{d}}{m}_{2}\end{array}$$

(12)

Here p(d|m₁,m₂,λ₁,λ₂) is the Bayesian evidence of d conditional on (m₁, m₂, λ₁, λ₂). To calculate equation (12) using Monte Carlo integration, it is necessary to repeatedly evaluate the integrand, which is extremely expensive using traditional methods (for example, nested sampling).

With DINGO-BNS, there are two fast ways to evaluate the integrand, using either a conditional or a marginal network: (1) a marginal network, q(m₁,m₂,λ₁,λ₂|d), directly provides an unnormalized estimate of the conditional evidence, p(d|m₁,m₂,λ₁,λ₂) (sufficient for model comparison, but not subject to our usual accuracy guarantees); or (2) a conditional network, q(θ|d;m₁,m₂,λ₁,λ₂), provides the normalized conditional evidence via importance sampling (including accuracy guarantees). Option (1) allows 10⁵ evaluations per second, whereas option (2) only allows 10³, assuming 10² weighted samples per evaluation.

By combining (1) and (2), we can achieve speed and accuracy, using the marginal network (1) to define a proposal distribution for Monte Carlo integration with the integrand from (2). Specifically, the density of the marginal network, q(m₁,m₂,λ₁,λ₂|d), is evaluated on an (m₁,m₂) grid with ${\lambda }_{i}={\lambda }_{i}^{{\mathcal{E}}}({m}_{i})$. This provides a discretized estimate of the integrand, p(d|m₁, m₂, ${\lambda }_{1}^{{\mathcal{E}}}$(m₁), ${\lambda }_{1}^{{\mathcal{E}}}$(m₂)), which we use as a proposal distribution for the integration in equation (12) when computing the integrand with the more accurate method (2). We tested this on GW170817 data using two polynomial EOS constraints, $\lambda ={\lambda }^{{\mathcal{E}}}(m)$ (Extended Data Fig. 4), finding good sample efficiencies of around 50%, small uncertainties, ${{\sigma }}_{\log p({\bf{d}}|{\mathcal{E}})}\approx 0.01$, and computation times of 1−3 s for the integral equation (12). Alternatively, the proposal could also be generated using a network, q(m₁,m₂|d), which additionally marginalizes over λ_i. Finally, for a parametric EOS, a DINGO-BNS network could be conditioned on EOS parameters, allowing for direct EOS inference. This variety of approaches emphasizes the flexibility of SBI for EOS inference.

Related work

Machine learning for GW astronomy is an active area of research⁵⁵. Several studies have explored machine-learning inference for black hole mergers^{19,20,56,57,58,59,60,61,62,63,64,65}. There have also been applications specific to BNS inference (Extended Data Table 1). The GW-SkyLocator algorithm⁶⁶ estimates the sky position using the SNR time series (similar to BAYESTAR), whereas Jim^33,35 uses hardware acceleration and machine learning to speed up conventional samplers and achieve full inference in 21–33 min. The i-nessai framework³⁹ achieves BNS inference in 24 min by combining normalizing flows using importance nested sampling. Ref. ⁶⁷ explores pre-merger BNS detection and parameter estimation with normalizing flows, also reporting 1-s analysis times. However, it has not demonstrated accurate results on real data, and is subject to several other limitations. SBI has also been used for neutron star EOS inference from GWs⁶⁸ and electromagnetic data⁶⁹. Pre-merger localization with conventional techniques has been explored for ground-based third-generation detectors^70,71,72 and for space-based detectors^73,74.

Experimental details

For our experiments, we trained DINGO-BNS networks using the hyperparameters and neural architecture^44,75 from ref. ¹⁹, with a few modifications. The embedding network consisted of a sequence of 34 two-layer, fully-connected residual blocks with hidden dimensions of 2,048 (×8), 1,024 (×8), 512 (×6), 256 (×6) and 128 (×6) after the initial projection layer. Compared with ref. ¹⁹, this added ten new blocks, increasing the number of trainable parameters in this part of the embedding network from 17 million to 91 million. For the LVK experiments, we used a dataset with 3 × 10⁷ training samples and trained it for 200 epochs. For the Cosmic Explorer experiments, we used 6 × 10⁷ training samples and trained it for 100 epochs. Training took between 5 and 7 days on one H100 GPU. We used three detectors for LVK (LIGO-Hanford, LIGO-Livingston and Virgo) and two detectors for Cosmic Explorer (primary detector at the location of LIGO-Hanford, secondary detector at the location of LIGO-Livingston). The networks were trained with the priors displayed in Extended Data Table 2. The DINGO-BNS network marginalized over the phase of coalescence, ϕ_c. During importance sampling, we reconstructed ϕ_c (ref. ²⁰) (Extended Data Figs. 6a and 8) or used a phase-marginalized likelihood^76,77 (in all other experiments). The phase reconstruction used here made the same assumptions as conventional phase marginalization^76,77.

In the first experiment, we evaluated DINGO-BNS models on 200 simulated GW datasets, generated using a fixed GW signal with GW170817-like parameters and simulated LVK detector noise. We used noise PSDs from the second (O2) and third (O3) LVK, observing runs as well as LVK design sensitivity. For each noise level, we trained one pre-merger network (f ∈ [23,200] Hz) and one network for inference with the full signal, including the merger (f ∈ [23,1024]). The latter network was only used for after-merger inference because we found that separation into two networks improved the performance. The pre-merger network was trained with frequency masking, the masking bound, f_max, being sampled in the range [28,200] Hz, enabling inference up to 60 s before the merger.

In the second experiment, we analysed 10⁴ simulated GW datasets, with GW signal parameters randomly sampled from the prior (Extended Data Table 2), the ${\mathcal{M}}$ prior reduced to the range [1.0,1.5] M_⊙ and the d_L prior reweighted to a uniform distribution in the comoving volume, with design sensitivity noise PSDs. We again trained one pre-merger network (f ∈ [19.4,200] Hz) and one after-merger network (f ∈ [19.4, 1,024] Hz). The pre-merger network was trained with frequency masking with the masking bound, f_max, sampled in the range [25,200] Hz, enabling inference up to 60 s before the merger for ${\mathcal{M}}$ ≤ 1.5 M_⊙. Both networks were additionally trained with lower-frequency masking, with f_min($\widetilde{{\mathcal{M}}}$) determined as explained above, ensuring an optimal frequency range for any chirp mass. Following ref. ¹⁷, we only considered events with SNR ≥ 12. Before each analysis, we performed a t_c scan by generating 2,500 samples for four time-shifted copies of the strain, followed by joint importance sampling of the combined results. Specifically, for a network with a t_c prior U(−${\tau }$, $\tau $), we applied the time shifts (−3$\tau $, −$\tau $, $\tau $, 3$\tau $), effectively increasing the prior to U(−4$\tau $, 4$\tau $). For the subsequent analysis, we time shifted the data such that the t_c posterior was fully covered by the prior.

For each DINGO-BNS result, we generated a skymap using a kernel density estimator implemented by ligo.skymap⁷⁸. For the sky localization comparison between DINGO-BNS and BAYESTAR, we ran BAYESTAR based on the GW signal template generated with the maximum likelihood parameters from the DINGO-BNS analysis. We noted that BAYESTAR was designed as a low-latency pipeline and typically run with (coarser) parameter estimates from search templates. Therefore, the reported BAYESTAR runs may deviate slightly from the realistic LVK setup. However, our results are consistent with those of ref. ¹⁷, which also had an approximately 30% precision improvement over BAYESTAR localization (using LVK search triggers). Both DINGO-BNS and ref. ¹⁷ performed full Bayesian BNS inference and should therefore have had identical localization improvements over BAYESTAR (assuming ideal accuracy, which for DINGO-BNS was validated with consistently high importance sampling efficiency). Differences to the localization comparison in ref. ¹⁷ are thus primarily attributed to different configurations for BAYESTAR and slightly different injection priors. Additional results for the localization comparison are shown in Extended Data Fig. 5. A probability–probability (P–P) plot for the after-merger analysis is shown in Extended Data Fig. 6a, which shows no significant bias for any parameter.

In the third experiment, we reproduced the public LVK results for GW170817 (refs. ^1,5) and GW190425 (ref. ⁷⁹) with DINGO-BNS. We used the same priors and data settings as the LVK, but we did not marginalize over calibration uncertainty. The GW170817 after-merger analysis (see also Extended Data Fig. 7) was performed with a DINGO-BNS model conditioned on the sky position, {α,β}. Following the LVK analysis⁵, we used the localization α = 3.44616 rad and δ = −0.408084 rad from the electromagnetic counterpart AT 2017gfo (ref. ⁸) at inference. Note that the localization uncertainty (σ_α = 10⁻⁶ rad, σ_δ = 10⁻⁶ rad (ref. ⁸)) was negligible for GW parameter estimation, but such effects could, in principle, be integrated by convolving the conditional DINGO-BNS network with a distribution over {α,β}. We found good sample efficiencies for both events (10.8% for GW170817 and 51.3% for GW190425) and good agreement with the LVK results (Extended Data Fig. 6b). The LVK results used detector noise PSDs generated with BayesWave³⁶, which were not available before the merger. For our pre-merger analysis of GW170817 in the main part (sample efficiency 78.9%), we thus used a PSD generated using the Welch method. The GW170817 signal overlapped with a loud glitch in the LIGO-Livingston detector¹, and we used the glitch-subtracted data provided by the LVK in our analyses. Because such data would not be available before the merger, the pre-merger inference of BNS events overlapping with glitches would, in practice, also require fast glitch mitigation methods.

In the fourth experiment, we analysed simulated Cosmic Explorer data using the anticipated noise PSDs for the primary and secondary detectors. We trained a DINGO-BNS network for pre-merger inference using f ∈ [6,11] Hz, with the upper frequency masking bound, f_max, sampled in the range [7,11] Hz. This supported a signal length of 4,096 s, with a pre-merger inference between 45 and 15 min before the merger. We injected signals with GW170817-like parameters for distance, masses and inclination to investigate how well a GW170817-like event could be localized in the Cosmic Explorer detector. We also trained a network on the full frequency range, [6, 1,024] Hz, for after-merger inference, with a reduced distance prior to control the SNR (Extended Data Table 2).

Sample efficiencies

We report sample efficiencies for all injection studies in Extended Data Fig. 6. The importance-sampled DINGO-BNS results are accurate, even with low efficiency, provided that a sufficient absolute number of effective samples can be generated. The efficiency nevertheless is a valuable diagnostic for assessing the performance of the trained inference networks.

In the LVK experiments, we found consistently high efficiencies, comparable to or greater than those reported for BBHs²⁰. As a general trend, we observed that higher noise levels (Extended Data Fig. 6c) and earlier pre-merger times (Extended Data Fig. 6d) led to higher efficiencies. This is because low SNR events generally have broader posteriors, which are simpler to model for DINGO-BNS density estimators. Furthermore, the GW signal morphology is most complicated around the merger, making pre-merger inference much simpler than inference based on the full signal.

For Cosmic Explorer injections with GW170817-like parameters (Extended Data Fig. 6e), DINGO-BNS achieved extremely high efficiency for early pre-merger analyses, but the performance decreased substantially for later analysis times. This effect can again be attributed to the increase in SNR, which was O(10³) 15 min before the merger. Improving DINGO-BNS for such high SNR events will probably require improved density estimators⁶⁴ that can better deal with tighter posteriors. When limiting the SNR by increasing the distance prior (Extended Data Table 2), we found good sample efficiencies for an after-merger Cosmic Explorer analysis that used the full 4,096-s-long signal (Extended Data Fig. 6e).

Inference times

The computational cost of inference with DINGO-BNS is dominated by: (1) neural-network forward passes to sample from the approximate posterior, ${\boldsymbol{\theta }} \sim q({\boldsymbol{\theta }}| {{\bf{d}}}_{\widetilde{{\mathcal{M}}}},\widetilde{{\mathcal{M}}})$; and (2) likelihood evaluations, p(θ|d), used for importance sampling. For 50,000 samples on an H100 GPU, (1) takes around 0.370 s and (2) takes around 0.190 s, resulting in an inference time of less than 0.6 s. The speed of the likelihood evaluations is enabled by using JAX waveform and likelihood implementations^33,34,35, combined with the heterodyning and multibanding step that we also used to compress the data for the DINGO-BNS network. We extend the open-source implementations^34,35 by combining NRTidalv1 (refs. ^32,80) with IMRPhenomPv2 (refs. ^81,82,83), as well as re-implementing the DINGO likelihood functions in JAX. The JAX functions are usually just-in-time compiled to run efficiently. The input dimension of the likelihood (determined by likelihood batch size and number of frequency bins) is fixed, enabling compilation ahead of time on random data of the correct input dimension. After compilation, a running DINGO-BNS script can perform any number of analyses without recompiling. Thus, we can leave the compilation time (18 s) out of the timing estimate for importance sampling. Compilation could further be transferred between separate DINGO-BNS runs using the persistent compilation cache in JAX, although we have not implemented this option. Likelihood evaluations can also be done without JAX, which takes less than 10 s on a single node with 64 CPUs for 50,000 samples. For the vast majority of DINGO-BNS analyses in this study, the sample efficiency was sufficiently high such that 50,000 samples corresponded to several thousands of effective samples after importance sampling, enabling full importance sampling inference in less than 1 s.

Additional sources of latency

The inference times quoted above assume that the data have already been provided to DINGO-BNS. In practice, there are various additional sources of latency.

First, PSD estimation typically uses strain data taken during or after the merger. Obtaining low-latency PSDs represents a general challenge for low-latency analyses, encountered also in the field of GW searches^14,53. Indeed, most PSDs used in this work would not be available in very low latency—the GW170817 and GW190425 PSDs from the LVK analyses^5,79 use on-source noise estimation³⁶ based on data during the merger, whereas LVK design sensitivity and Cosmic Explorer PSDs are not based on real detector noise, but rather reflect anticipated future configurations. These PSDs were chosen for comparability with existing studies. To test DINGO-BNS in a more realistic low-latency setting, we performed inference with PSDs estimated using Welch’s method⁸⁴ and only data from before the merger (Extended Data Fig. 8). The resulting posterior for GW170817 only deviated slightly from the posterior obtained from the on-source PSDs. However, we note that, in general, such pre-merger Welch PSDs may be less reliable in the presence of non-stationary detector noise, which could lead to larger biases for parameter estimation.

Strain data can also be contaminated with instrumental or environmental glitches, which need to be removed for parameter estimation⁸⁵. This adds additional latency for events that overlap with glitches, as noted above in the case of GW170817. Finally, data transfer between detectors and computing facilities add some additional latency. Using DINGO-BNS to its full potential will therefore require careful integration with low-latency pipelines²³ and further acceleration of existing components.

PSD tuning

Although most of the networks used in this study were trained with only a single PSD per detector, in practice we would generally train DINGO-BNS with an entire distribution of PSDs to enable instant tuning to drifting detector noise¹⁹. (This is not relevant to tests involving, for example, design sensitivity noise.) Of the experiments in this study, only the pre-merger result of GW170817 and the result for the Welch PSD (Extended Data Fig. 8) were generated with a PSD-conditioned DINGO-BNS network. This network was trained with a PSD distribution covering the entire second LVK observing run (O2). Conditioning on the PSD makes the inference task more complicated and therefore leads to slightly reduced performance. For example, when repeating the first injection experiment (Extended Data Fig. 6c) with the PSD-conditioned DINGO-BNS network from above, the mean efficiency was reduced from 71% to 22%. Such networks can, in principle, also be trained before the start of an observing run, by training with a synthetic dataset designed to reflect the expected noise PSDs⁸⁶.

Data availability

Public LVK data is available at https://gwosc.org/events/GW170817/ for GW170817 and at https://dcc.ligo.org/LIGO-T1900685/public for GW190425.

Code availability

The code for DINGO-BNS is publicly available as an extension of the DINGO python package at https://github.com/dingo-gw/dingo. We provide a demo at https://github.com/dingo-gw/binary-neutron-star-demo and a trained network via Zenodo at https://zenodo.org/records/13321251 (ref. 88).

References

Abbott, B. P. et al. GW170817: observation of gravitational waves from a binary neutron star inspiral. Phys. Rev. Lett. 119, 161101 (2017).
Article ADS CAS PubMed MATH Google Scholar
Abbott, B. P. et al. Multi-messenger observations of a binary neutron star merger. Astrophys. J. Lett. 848, L12 (2017).
Article ADS MATH Google Scholar
Abbott, B. P. et al. A gravitational-wave standard siren measurement of the Hubble constant. Nature 551, 85–88 (2017).
Article ADS MATH Google Scholar
Abbott, B. P. et al. Gravitational waves and gamma-rays from a binary neutron star merger: GW170817 and GRB 170817 A. Astrophys. J. Lett. 848, L13 (2017).
Article ADS MATH Google Scholar
Abbott, B. P. et al. Properties of the binary neutron star merger GW170817. Phys. Rev. X 9, 011001 (2019).
Article MATH Google Scholar
Abbott, B. P. et al. GW170817: measurements of neutron star radii and equation of state. Phys. Rev. Lett. 121, 161101 (2018).
Article ADS CAS PubMed MATH Google Scholar
Abbott, B. P. et al. Tests of general relativity with GW170817. Phys. Rev. Lett. 123, 011102 (2019).
Article ADS CAS PubMed MATH Google Scholar
Coulter, D. A. et al. Swope supernova survey 2017a (SSS17a), the optical counterpart to a gravitational wave source. Science 358, 1556 (2017).
Article ADS CAS PubMed MATH Google Scholar
Aasi, J. et al. Advanced LIGO. Class. Quant. Grav. 32, 074001 (2015).
Article ADS Google Scholar
Acernese, F. et al. Advanced Virgo: a second-generation interferometric gravitational wave detector. Class. Quant. Grav. 32, 024001 (2015).
Article ADS MATH Google Scholar
Aso, Y. et al. Interferometer design of the KAGRA gravitational wave detector. Phys. Rev. D 88, 043007 (2013).
Article ADS MATH Google Scholar
Veitch, J. et al. Parameter estimation for compact binaries with ground-based gravitational-wave observations using the LALInference software library. Phys. Rev. D 91, 042003 (2015).
ADS MATH Google Scholar
Ashton, G. et al. BILBY: a user-friendly Bayesian inference library for gravitational-wave astronomy. Astrophys. J. Suppl. 241, 27 (2019).
Article ADS CAS MATH Google Scholar
Nitz, A. H., Dal Canton, T., Davis, D. & Reyes, S. Rapid detection of gravitational waves from compact binary mergers with PyCBC Live. Phys. Rev. D 98, 024050 (2018).
Article ADS CAS Google Scholar
Cannon, K. et al. GstLAL: a software framework for gravitational wave discovery. SoftwareX 14, 100680 (2021).
Singer, L. P. & Price, L. R. Rapid Bayesian position reconstruction for gravitational-wave transients. Phys. Rev. D 93, 024013 (2016).
Article ADS MathSciNet MATH Google Scholar
Morisaki, S. et al. Rapid localization and inference on compact binary coalescences with the advanced LIGO-Virgo-KAGRA gravitational-wave detector network. Phys. Rev. D 108, 123040 (2023).
Article ADS MathSciNet CAS MATH Google Scholar
Cranmer, K., Brehmer, J. & Louppe, G. The frontier of simulation-based inference. Proc. Natl Acad. Sci. USA 117, 30055–30062 (2020).
Article ADS MathSciNet CAS PubMed PubMed Central MATH Google Scholar
Dax, M. et al. Real-time gravitational wave science with neural posterior estimation. Phys. Rev. Lett. 127, 241103 (2021).
Article ADS CAS PubMed MATH Google Scholar
Dax, M. et al. Neural importance sampling for rapid and reliable gravitational-wave inference. Phys. Rev. Lett. 130, 171403 (2023).
Article ADS CAS PubMed MATH Google Scholar
Reitze, D. et al. Cosmic Explorer: The U.S. contribution to gravitational-wave astronomy beyond LIGO. Bull. Am. Astron. Soc. 51, 035 (2019).
MATH Google Scholar
Punturo, M. et al. The third generation of gravitational wave observatories and their science reach. Class. Quant. Grav. 27, 084007 (2010).
Article ADS MATH Google Scholar
Chaudhary, S. S. et al. Low-latency gravitational wave alert products and their performance at the time of the fourth LIGO-Virgo-KAGRA observing run. Proc. Natl Acad. Sci. USA 121, e2316474121 (2024).
Article CAS PubMed PubMed Central Google Scholar
Sridhar, N., Zrake, J., Metzger, B. D., Sironi, L. & Giannios, D. Shock-powered radio precursors of neutron star mergers from accelerating relativistic binary winds. Mon. Not. Roy. Astron. Soc. 501, 3184–3202 (2021).
Article ADS Google Scholar
Most, E. R. & Philippov, A. A. Electromagnetic precursor flares from the late inspiral of neutron star binaries. Mon. Not. Roy. Astron. Soc. 515, 2710–2724 (2022).
Article ADS CAS MATH Google Scholar
Abbott, B. P. et al. A guide to LIGO–Virgo detector noise and extraction of transient gravitational-wave signals. Class. Quant. Grav. 37, 055002 (2020).
Article ADS CAS MATH Google Scholar
Cornish, N. J. Fast Fisher matrices and lazy likelihoods. Preprint at https://arxiv.org/abs/1007.4820 (2010).
Vinciguerra, S., Veitch, J. & Mandel, I. Accelerating gravitational wave parameter estimation with multi-band template interpolation. Class. Quant. Grav. 34, 115006 (2017).
Article ADS MATH Google Scholar
Morisaki, S. Accelerating parameter estimation of gravitational waves from compact binary coalescence using adaptive frequency resolutions. Phys. Rev. D 104, 044062 (2021).
Article ADS MathSciNet CAS Google Scholar
Blanchet, L. Gravitational radiation from post-Newtonian sources and inspiralling compact binaries. Living Rev. Rel. 17, 2 (2014).
Abbott, B. P. et al. Low-latency gravitational-wave alerts for multimessenger astronomy during the second advanced LIGO and Virgo observing run. Astrophys. J. 875, 161 (2019).
Article ADS Google Scholar
Dietrich, T. et al. Matter imprints in waveform models for neutron star binaries: Tidal and self-spin effects. Phys. Rev. D 99, 024029 (2019).
Article ADS CAS Google Scholar
Wong, K. W. K., Isi, M. & Edwards, T. D. P. Fast gravitational-wave parameter estimation without compromises. Astrophys. J. 958, 129 (2023).
Article ADS MATH Google Scholar
Edwards, T. D. P. et al. Differentiable and hardware-accelerated waveforms for gravitational wave data analysis Phys. Rev. D 110, 064028 (2024).
Wouters, T., Pang, P. T. H., Dietrich, T. & Van Den Broeck, C. Robust parameter estimation within minutes on gravitational wave signals from binary neutron star inspirals. Phys. Rev. D 110, 083033 (2024).
Cornish, N. J. & Littenberg, T. B. BayesWave: Bayesian inference for gravitational wave bursts and instrument glitches. Class. Quant. Grav. 32, 135012 (2015).
Article ADS MATH Google Scholar
Raymond, V., Al-Shammari, S. & Göttel, A. Simulation-based inference for gravitational-waves from intermediate-mass binary black holes in real noise. Preprint at https://arxiv.org/abs/2406.03935 (2024).
Davis, D. et al. LIGO detector characterization in the second and third observing runs. Class. Quant. Grav. 38, 135014 (2021).
Article ADS CAS MATH Google Scholar
Williams, M. J., Veitch, J. & Messenger, C. Importance nested sampling with normalising flows. Mach. Learn. Sci. Tech. 4, 035011 (2023).
Article Google Scholar
Papamakarios, G. & Murray, I. Fast ε-free inference of simulation models with Bayesian conditional density estimation. In NIPS’16: Proceedings 30th International Conference on Neural Information Processing Systems (eds Lee, D. D. et al.) 1036–1044 (Curran Associates Inc., 2016).
Lueckmann, J.-M. et al. Flexible statistical inference for mechanistic models of neural dynamics. In NIPS’17: Proceedings 31st International Conference on Neural Information Processing Systems (eds von Luxburg, U. et al.) 1289–1299 (Curran Associates Inc., 2017).
Greenberg, D., Nonnenmacher, M. & Macke, J. Automatic posterior transformation for likelihood-free inference. Proc. Machine Learn. Res. 97, 2404–2414 (2019).
Rezende, D. & Mohamed, S. Variational inference with normalizing flows. Proc. Machine Learn. Res. 37, 1530–1538 (2015).
Durkan, C., Bekasov, A., Murray, I. & Papamakarios, G. Neural spline flows. In NeurIPS 2019: Proceedings 33rd Annual Conference on Neural Information Processing Systems (eds Wallach, H. et al.) 7509–7520 (Curran Associates Inc., 2019).
Dax, M. et al. Group equivariant neural posterior estimation. In Proceedings 10th International Conference on Learning Representations (2022).
Cornish, N. J. Heterodyned likelihood for rapid gravitational wave parameter inference. Phys. Rev. D 104, 104054 (2021).
Article ADS MathSciNet CAS MATH Google Scholar
Zackay, B., Dai, L. & Venumadhav, T. Relative binning and fast likelihood evaluation for gravitational wave parameter estimation. Preprint at https://arxiv.org/abs/1806.08792 (2018).
Deistler, M., Goncalves, P. J. & Macke, J. H. Truncated proposals for scalable and hassle-free simulation-based inference. In NeurIPS 2022: Proceedings 36th Annual Conference on Neural Information Processing Systems (eds Koyejo, S. et al.) 23135–23149 (Curran Associates Inc., 2022).
Finn, L. S. Detection, measurement and gravitational radiation. Phys. Rev. D 46, 5236–5249 (1992).
Article ADS CAS MATH Google Scholar
Adams, T. et al. Low-latency analysis pipeline for compact binary coalescences in the advanced gravitational wave detector era. Class. Quant. Grav. 33, 175012 (2016).
Article ADS MATH Google Scholar
Chu, Q. et al. SPIIR online coherent pipeline to search for gravitational waves from compact binary coalescences. Phys. Rev. D 105, 024023 (2022).
Article ADS CAS Google Scholar
Sachdev, S. et al. An early-warning system for electromagnetic follow-up of gravitational-wave events. Astrophys. J. Lett. 905, L25 (2020).
Article ADS CAS MATH Google Scholar
Nitz, A. H., Schäfer, M. & Dal Canton, T. Gravitational-wave merger forecasting: scenarios for the early detection and localization of compact-binary mergers with ground-based observatories. Astrophys. J. Lett. 902, L29 (2020).
Article ADS CAS MATH Google Scholar
Kovalam, M. et al. Early warnings of binary neutron star coalescence using the SPIIR search. Astrophys. J. Lett. 927, L9 (2022).
Article ADS MATH Google Scholar
Cuoco, E. et al. Enhancing gravitational-wave science with machine learning. Mach. Learn. Sci. Tech. 2, 011002 (2020).
Article MATH Google Scholar
Gabbard, H., Messenger, C., Heng, I. S., Tonolini, F. & Murray-Smith, R. Bayesian parameter estimation using conditional variational autoencoders for gravitational-wave astronomy. Nat. Phys. 18, 112–117 (2022).
Article CAS Google Scholar
Green, S. R., Simpson, C. & Gair, J. Gravitational-wave parameter estimation with autoregressive neural network flows. Phys. Rev. D 102, 104057 (2020).
Article ADS MathSciNet CAS MATH Google Scholar
Delaunoy, A. et al. Lightning-fast gravitational wave parameter inference through neural amortization. In Proceedings 3rd Workshop on Machine Learning and the Physical Sciences (2020).
Green, S. R. & Gair, J. Complete parameter inference for GW150914 using deep learning. Mach. Learn. Sci. Tech. 2, 03LT01 (2021).
Article MATH Google Scholar
Williams, M. J., Veitch, J. & Messenger, C. Nested sampling with normalizing flows for gravitational-wave inference. Phys. Rev. D 103, 103006 (2021).
Article ADS MathSciNet CAS MATH Google Scholar
Alvey, J., Bhardwaj, U., Nissanke, S. & Weniger, C. What to do when things get crowded? Scalable joint analysis of overlapping gravitational wave signals. Preprint at https://arxiv.org/abs/2308.06318 (2023).
Crisostomi, M., Dey, K., Barausse, E. & Trotta, R. Neural posterior estimation with guaranteed exact coverage: the ringdown of GW150914. Phys. Rev. D 108, 044029 (2023).
Article ADS MathSciNet CAS MATH Google Scholar
Bhardwaj, U., Alvey, J., Miller, B. K., Nissanke, S. & Weniger, C. Sequential simulation-based inference for gravitational wave signals. Phys. Rev. D 108, 042004 (2023).
Article ADS CAS Google Scholar
Wildberger, J. et al. Flow matching for scalable simulation-based inference. NeurIPS 2023 (2023).
Kolmus, A. et al. Tuning neural posterior estimation for gravitational wave inference. Preprint at https://arxiv.org/abs/2403.02443 (2024).
Chatterjee, C. & Wen, L. Premerger sky localization of gravitational waves from binary neutron star mergers using deep learning. Astrophys. J. 959, 76 (2023).
Article ADS CAS Google Scholar
van Straalen, W., Kolmus, A., Janquart, J. & Van Den Broeck, C. Pre-merger detection and characterization of inspiraling binary neutron stars derived from neural posterior estimation. Preprint at https://www.arxiv.org/abs/2407.10263 (2024).
McGinn, J. et al. Rapid neutron star equation of state inference with normalising flows. Preprint at https://arxiv.org/abs/2403.17462 (2024).
Brandes, L. et al. Neural simulation-based inference of the neutron star equation of state directly from telescope spectra. J. Cosmol. Astropart. Phys. 09, 009 (2024).
Article MATH Google Scholar
Chan, M. L., Messenger, C., Heng, I. S. & Hendry, M. Binary neutron star mergers and third generation detectors: localization and early warning. Phys. Rev. D 97, 123014 (2018).
Article ADS CAS Google Scholar
Nitz, A. H. & Dal Canton, T. Pre-merger localization of compact-binary mergers with third-generation observatories. Astrophys. J. Lett. 917, L27 (2021).
Article ADS MATH Google Scholar
Hu, Q. & Veitch, J. Rapid premerger localization of binary neutron stars in third-generation gravitational-wave detectors. Astrophys. J. Lett. 958, L43 (2023).
Article ADS MATH Google Scholar
Lops, G. et al. Galaxy fields of LISA massive black hole mergers in a simulated universe. Mon. Not. Roy. Astron. Soc. 519, 5962–5986 (2023).
Article ADS CAS MATH Google Scholar
Piro, L. et al. Chasing super-massive black hole merging events with Athena and LISA. Mon. Not. Roy. Astron. Soc. 521, 2577–2592 (2023).
Article ADS CAS MATH Google Scholar
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (IEEE, 2015).
Veitch, J. & Del Pozzo, W. Analytic marginalisation of phase parameter. Ligo DCC https://dcc.ligo.org/LIGO-T1300326/public (2013).
Thrane, E. & Talbot, C. An introduction to Bayesian inference in gravitational-wave astronomy: parameter estimation, model selection, and hierarchical models. Publ. Astron. Soc. Austral. 36, e010 (2019); erratum 37, e036 (2020).
Article ADS MATH Google Scholar
Singer, L. ligo.skymap. ligo.skymap:docs https://lscsoft.docs.ligo.org/ligo.skymap/ (2020).
Abbott, B. et al. GW190425: observation of a compact binary coalescence with total mass ~3.4 M_⊙. Astrophys. J. Lett. 892, L3 (2020).
Article ADS CAS MATH Google Scholar
Dietrich, T., Bernuzzi, S. & Tichy, W. Closed-form tidal approximants for binary neutron star gravitational waveforms constructed from high-resolution numerical relativity simulations. Phys. Rev. D 96, 121501 (2017).
Article ADS MATH Google Scholar
Hannam, M. et al. Simple model of complete precessing black-hole-binary gravitational waveforms. Phys. Rev. Lett. 113, 151101 (2014).
Article ADS PubMed MATH Google Scholar
Khan, S. et al. Frequency-domain gravitational waves from nonprecessing black-hole binaries. II. A phenomenological model for the advanced detector era. Phys. Rev. D 93, 044007 (2016).
MATH Google Scholar
Bohé, A. et al. PhenomPv2 – technical notes for the LAL implementation. LIGO Technical Document, LIGO-T1500602-v4 (2016).
Welch, P. The use of fast fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Trans. Audio Electroacoust. 15, 70–73 (1967).
Article ADS MATH Google Scholar
Pankow, C. et al. Mitigation of the instrumental noise transient in gravitational-wave data surrounding GW170817. Phys. Rev. D 98, 084016 (2018).
Article ADS CAS MATH Google Scholar
Wildberger, J. et al. Adapting to noise distribution shifts in flow-based gravitational-wave inference. Phys. Rev. D 107, 084046 (2023).
Article ADS CAS MATH Google Scholar
Hernandez Vivanco, F. et al. Measuring the neutron star equation of state with gravitational waves: the first forty binary neutron star merger observations. Phys. Rev. D 100, 103009 (2019).
Article ADS CAS MATH Google Scholar

Download references

Acknowledgements

We thank S. Buchholz, T. Dent, A. Kofler and S. Morisaki for useful discussions. This research made use of data and software obtained from the Gravitational Wave Open Science Center (gwosc.org), a service of the LIGO Scientific Collaboration, the Virgo Collaboration and KAGRA. This material is based upon work supported by the National Science Foundation’s (NSF’s) LIGO Laboratory—a major facility, fully funded by the NSF—as well as the Science and Technology Facilities Council (STFC) of the UK, the Max Planck Society (MPS) and the State of Niedersachsen, Germany, which supported the construction of the Advanced LIGO and construction and operation of the GEO600 detector. Additional support for the Advanced LIGO was provided by the Australian Research Council. Virgo is funded through the European Gravitational Observatory (EGO), by the French Centre National de Recherche Scientifique (CNRS), the Italian Istituto Nazionale di Fisica Nucleare (INFN) and the Dutch Nikhef, with contributions from institutions from Belgium, Germany, Greece, Hungary, Ireland, Japan, Monaco, Poland, Portugal and Spain. KAGRA is supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) and the Japan Society for the Promotion of Science (JSPS) in Japan; the National Research Foundation (NRF) and Ministry of Science and Information and Communications Technology (MSIT) in Korea; and Academia Sinica (AS) and the National Science and Technology Council (NSTC) in Taiwan. M.D. thanks the Hector Fellow Academy for support. This work was supported by the German Research Foundation (DFG) through Germany’s Excellence Strategy (EXC-Number 2064/1, project number 390727645) and the German Federal Ministry of Education and Research (BMBF) (Tübingen AI Center, FKZ: 01IS18039A). S.R.G. was supported by a UK Research and Innovation (UKRI) Future Leaders Fellowship (grant no. MR/Y018060/1). The computational work for this manuscript was carried out on the Atlas cluster at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, and the Lakshmi and Hypatia clusters at the Max Planck Institute for Gravitational Physics in Potsdam, Germany. V.R. is supported by the UK’s Science and Technology Facilities Council (grant no. ST/V005618/1).

Funding

Open access funding provided by Max Planck Society.

Author information

Authors and Affiliations

Max Planck Institute for Intelligent Systems, Tübingen, Germany
Maximilian Dax, Jakob H. Macke & Bernhard Schölkopf
ETH Zurich, Zurich, Switzerland
Maximilian Dax & Bernhard Schölkopf
ELLIS Institute Tübingen, Tübingen, Germany
Maximilian Dax, Jonas Wildberger & Bernhard Schölkopf
School of Mathematical Sciences, University of Nottingham, Nottingham, UK
Stephen R. Green
Max Planck Institute for Gravitational Physics (Albert Einstein Institute), Potsdam, Germany
Jonathan Gair, Nihar Gupte & Alessandra Buonanno
Department of Physics, University of Maryland, College Park, MD, USA
Nihar Gupte & Alessandra Buonanno
Department of Physics, University of Rhode Island, Kingston, RI, USA
Michael Pürrer
Center for Computational Research, University of Rhode Island, Kingston, RI, USA
Michael Pürrer
Gravity Exploration Institute, Cardiff University, Cardiff, UK
Vivien Raymond
Machine Learning in Science, University of Tübingen & Tübingen AI Center, Tübingen, Germany
Jakob H. Macke

Authors

Maximilian Dax
View author publications
Search author on:PubMed Google Scholar
Stephen R. Green
View author publications
Search author on:PubMed Google Scholar
Jonathan Gair
View author publications
Search author on:PubMed Google Scholar
Nihar Gupte
View author publications
Search author on:PubMed Google Scholar
Michael Pürrer
View author publications
Search author on:PubMed Google Scholar
Vivien Raymond
View author publications
Search author on:PubMed Google Scholar
Jonas Wildberger
View author publications
Search author on:PubMed Google Scholar
Jakob H. Macke
View author publications
Search author on:PubMed Google Scholar
Alessandra Buonanno
View author publications
Search author on:PubMed Google Scholar
Bernhard Schölkopf
View author publications
Search author on:PubMed Google Scholar

Contributions

M.D. led the research with input from S.R.G. and B.S. M.D. and S.R.G. conceived the main methodology. M.D. devised and carried out the implementation and experimental analysis. M.D. and S.R.G. developed the DINGO code, with contributions from N.G., M.P. and J.W. N.G. implemented the JAX waveform model. V.R. proposed the exploration of the EOS inference. M.D. and S.R.G. wrote the paper. M.D., S.R.G., J.G., N.G., M.P., V.R., J.W., J.H.M., A.B. and B.S. contributed to scientific discussions and paper editing.

Corresponding author

Correspondence to Maximilian Dax.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Rory Smith, Michael Williams and Kaze Wong for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 DINGO-BNS scan estimates the chirp mass and merger time.

(a) Log likelihoods generated from a scan over different values of $\widetilde{{\mathcal{M}}}$ with a DINGO-BNS network. The final $\widetilde{{\mathcal{M}}}$ is chosen as the maximum likelihood ${\mathcal{M}}$ (red line; $\widetilde{{\mathcal{M}}}$ = 1.1975 M_⊙ for GW170817, $\widetilde{{\mathcal{M}}}$ = 1.4868 M_⊙ for GW190425). (b) Posterior marginal p(${\mathcal{M}}$|d). The prior (dashed lines) determined by the scan from (a) fully covers the marginal. (c) A combined scan over ${\mathcal{M}}$ and t_c successfully identifies GW170817 (with ${\hat{t}}_{{\rm{c}}}$ = 1187008882.43) and GW190425 (with ${\hat{t}}_{{\rm{c}}}$ = 1240215503.04).

Extended Data Fig. 2 Frequency multibanding.

(a) The period of (heterodyned) GW signals decreases with increasing frequency. The native frequency resolution (blue) thus oversamples the signal at high frequencies. Frequency multibanding (band boundaries indicated by dotted red lines) adapts to the signal variation, decreasing the resolution at higher frequencies (orange). (b) The multibanded domain therefore requires fewer frequency bins, and the signal variation is more homogeneous across bins. (c) Multibanded frequency domain partitions for LVK (f_min = 19.4 Hz, compression factor ≈60) and CE (f_min = 5 Hz, compression factor ≈650) experiments. We use a smaller chirp mass prior for the CE experiments (Extended Data Table 2), which allows a slightly coarser resolution compared to LVK (corresponding to lower ${\hat{f}}_{i}$). The first two bands for CE are skipped entirely, which is a consequence of the reduced signal variation with heterodyning.

Extended Data Fig. 3 Time-domain truncation of BNS signals.

Truncation at time t_max (red dashed line) before the merger can be approximated by truncation at a corresponding maximum frequency (red solid line) in frequency domain. Below frequency f_max(t, ${\mathcal{M}}$), the truncated signal (blue in center panel) matches the original signal (gray). Above f_max(t, ${\mathcal{M}}$), the amplitude of the truncated signal quickly approaches zero. We determine f_max(t, ${\mathcal{M}}$) empirically, by allowing mismatches between truncated and original signals of at most 10⁻³ (lower panel). Analogously, truncation for t < t_min can be achieved by imposing a minimum frequency cutoff f_min(t, ${\mathcal{M}}$).

Extended Data Fig. 4 Neutron-star equation-of-state likelihood.

Neutron-star EOS imply a functional relation λ_i = λ_i(m_i) between tidal parameters λ_i and component masses m_i. The likelihood p(d|${\mathcal{E}}$) for an EOS ${\mathcal{E}}$ given the GW data d requires integrating the posterior p(θ|d) along the corresponding hyperplane. No posterior samples (gray) will be exactly on such hyperplanes (coloured lines), hence the standard Bayesian inference techniques are not directly applicable⁸⁷. DINGO-BNS provides various possibilities to directly compute this quantity, enabling comparison of different EOS in terms of the Bayes factors ${\mathcal{B}}$.

Extended Data Fig. 5 Localization comparison between BAYESTAR and DINGO-BNS.

We evaluate the 90% credible area and the searched area. The boxplots display the medians (percentage changes indicated), quartiles and 10th/90th percentiles. The comparisons according to SNR are based only on results after the merger.

Extended Data Fig. 6 DINGO-BNS performance tests.

(a) P-P plot for 1000 simulated GW datasets. For each simulation and parameter, we compute the percentile p of the injected parameter under the corresponding one-dimensional marginal posterior inferred with DINGO-BNS. The plot shows the cumulative distribution function (CDF) of p. For an unbiased sampler we expect CDF(p) = p within statistical uncertainties. Gray bands indicate confidence intervals corresponding to one, two and three standard deviations. The legend displays the p-value for each parameter, the combined p-value is 0.37. The GW datasets for this plot are generated with the settings from the after-merger analysis of the second experiment, except for the luminosity distance prior, which for this statistical test needs to be set to follow the uniform training prior (Extended Data Table 2). (b) Posterior for GW190425. 2D marginals are displayed in terms of 50% and 90% credible regions. For the 1D marginals, the deviation between the two results is quantified in terms of the Jensen–Shannon divergence (numbers on the diagonal). DINGO-BNS shows good agreement with the public LVK result⁷⁹. (c)–(e) Sample efficiencies for DINGO-BNS, for (c) GW170817-like injections using different detector noise levels, (d) injections using LVK design sensitivity PSDs and (e) injections using CE PSDs. Panel b adapted from ref. ⁷⁹ under a Creative Commons licence CC BY 4.0.

Extended Data Fig. 7 Extended posterior for GW170817.

The corner plot compares the DINGO-BNS result from the main text (orange) with the LVK result (black)⁵. 2D marginals are displayed in terms of 50% and 90% credible regions. For the 1D marginals, the deviation between the two results is quantified in terms of the Jensen–Shannon divergence (numbers on the diagonal). For the definition of the parameters see Extended Data Table 2. The small deviations between the results could be a consequence of the marignalization over detector calibration uncertainties in the LVK result, which is not applied with DINGO-BNS.

Extended Data Fig. 8 GW170817 posterior with different PSDs.

The corner plot displays DINGO-BNS inference results using the BayesWave PSD (orange) from the LVK analysis and a Welch PSD (blue). 2D marginals are displayed in terms of 50% and 90% credible regions. For the 1D marginals, the deviation between the two results is quantified in terms of the Jensen–Shannon divergence (numbers on the diagonal). For the definition of the parameters see Extended Data Table 2. The Welch PSD is estimated based on strain data taken 1184 to 160 s before the BNS merger (using 128 segments of length eight seconds and subsequent upsampling to the desired resolution ∆f = 1/128 Hz), and is thus available in low latency for pre-merger analysis. The DINGO-BNS network for the Welch analysis is trained with a distribution of Welch PSDs covering the entire second LVK observing run, and tuned to the GW170817 PSDs via network conditioning¹⁹.

Extended Data Table 1 Comparison of fast methods for BNS inference

Full size table

Extended Data Table 2 Training priors

Full size table

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Dax, M., Green, S.R., Gair, J. et al. Real-time inference for binary neutron star mergers using machine learning. Nature 639, 49–53 (2025). https://doi.org/10.1038/s41586-025-08593-z

Download citation

Received: 02 August 2024
Accepted: 03 January 2025
Published: 05 March 2025
Version of record: 05 March 2025
Issue date: 06 March 2025
DOI: https://doi.org/10.1038/s41586-025-08593-z

This article is cited by

Gravitational wave standard sirens: A brief review of cosmological parameter estimation
- Shang-Jie Jin
- Ji-Yu Song
- Xin Zhang
Science China Physics, Mechanics & Astronomy (2026)
How AI could let us watch epic star collisions in real time
- Davide Castelvecchi
Nature (2025)

Subjects

Abstract

Similar content being viewed by others

Main

DINGO-BNS

Data compression and prior conditioning

Frequency masking

Conditioning on parameter subsets

Experiments

Discussion

Methods

Machine-learning framework

Prior conditioning

Independent estimation of chirp mass and merger times

Frequency multibanding

Data simulation

Likelihood evaluations

Frequency masking

EOS likelihood

Related work

Experimental details

Sample efficiencies

Inference times

Additional sources of latency

PSD tuning

Data availability

Code availability

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Peer review

Peer review information

Additional information

Extended data figures and tables

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Quick links