Introduction

Exploring the interior of Venus could yield crucial insight into its evolution and current geodynamic regime, which remain unknown1. The global network of seismometers on Earth’s surface was instrumental in developing 1D models of its interior2,3,4, and now contributes to revealing 3D heterogeneities in the mantle and crust5,6. Beyond Earth, successful seismometer deployments on Mars and the Moon have provided invaluable information about their structure7,8 and new seismology missions are now planned to explore Titan9,10,11. However, surface deployment remains challenging on Venus due to the short lifespan of electronics at its high-temperature surface (~460 °C)12,13.

In recent years, key observations have demonstrated the potential of balloon-borne microbarometers to detect the acoustic signature of seismic waves14,15,16. These signals emerge from the mechanical coupling of seismic ground motion into infrasound—acoustic waves below ~20 Hz—resulting from stress and vertical displacement continuity at the surface17,18. Owing to the large velocity contrast between a planet and its atmosphere and low attenuation at low frequencies, seismic waves generate vertically-propagating acoustic waves with dispersion characteristics similar to those of their seismic counterparts19. Importantly, this seismo-acoustic coupling is expected to be two orders of magnitude stronger on Venus due to its dense atmosphere20,21, enabling the detection of converted seismic waves across a wide range of altitudes. Balloon platforms are therefore considered a realistic alternative to ground deployments to explore Venus’ interior12,13,22,23. They offer several advantages for subsurface monitoring, such as their mobility and ability to cover large areas. On Venus, balloons operate under acceptable pressure and temperature conditions between 50 and 60 km altitude and were successfully deployed during the Soviet VEGA missions24. They are also relatively inexpensive and benefit from recent advances that enable long-duration flights. Technology for controlled flights is also available25,26,27 and offers greater flexibility in mission design and planning; however, it may introduce unwanted noise into infrasound signals.

The recent recordings of earthquake infrasound on Earth, therefore, represent a unique opportunity to assess the use of balloon infrasound for seismic source localization and subsurface exploration. Krishnamoorthy et al.15 and Bowman and Krishnamoorthy28 detected and analyzed epicentral infrasound produced by artificial seismic sources and recorded by balloons at close range (<100 km). Although these epicentral signals can provide information on the source, such as its location or mechanism, they mainly propagate in a direct path in the atmosphere from the shallow source to the balloon, and hence carry little information on subsurface seismic properties. Brissaud et al.19 detected a magnitude 4.2 earthquake using free-floating balloons in Southern California. However, signals were recorded at only one balloon, did not show body wave arrivals, and the surface waves had a low Signal-to-Noise Ratio (SNR), which prevented the joint inversion of source and subsurface properties. A year later, Garcia et al.29 reported the detection of an Mw 7.5 earthquake in Peru and an Mw 7.3 earthquake in the Flores Sea using freely floating stratospheric balloons from the Strateole-2 campaign30. In particular, infrasound from the Flores Sea earthquake was recorded by four balloons at high SNR. Garcia et al.29 showed general agreement between balloon pressure signals and ground-based vertical velocity records. Recently, Gerier et al.31 modeled this event numerically, including the seismo-acoustic coupling, and demonstrated that major seismic phases—P and S body waves and Rayleigh surface waves (LR)—are identifiable in the balloon data. However, it is still largely unknown to what extent such waveforms can provide insight into seismic sources and seismic velocity models.

In the present contribution, we demonstrate that body- and Rayleigh-wave arrival times at various frequencies provide sufficient information to constrain these seismic parameters through a Bayesian inversion approach, even with a small number of balloon stations. We apply this inversion to jointly retrieve the hypocenter of the 2021 Mw 7.3 Flores Sea earthquake and a 1D model of subsurface seismic velocities in the region. The inversion is first tested using P, S, and LR arrivals identified at ground stations, and then using arrivals identified in Strateole-2 balloon recordings. We finally quantify the uncertainty in the retrieved source location and seismic velocities and discuss the sensitivity of the inversion method.

Results

A joint approach for the inversion of source and subsurface parameters

Earthquake-induced infrasound signals are scaled images of the vertical velocity of the ground surface below the balloon29,32. As low-frequency coupled waves propagate mostly vertically and up to relatively low stratospheric altitudes, they remain unaffected by dispersion or non-linear distortion through the Earth’s atmosphere. A forward model of arrival times at balloon platforms can thus be readily derived from classical seismological methods. Consequently, we use both body- and surface-wave arrival times in several frequency bands to retrieve the source and subsurface parameters. Relying on arrival times instead of full waveform modeling eliminates the need for an accurate source model and the reliance on low-frequency waveforms, which are typically contaminated by buoyancy oscillations and turbulence in balloon data29,33. For planetary exploration, a joint subsurface and source inversion is required due to our lack of prior knowledge of subsurface structures and source locations. Additionally, the sparsity of balloon networks on Earth, and possibly on Venus, calls for careful assessment of uncertainty in hypocenter coordinates34. To solve the ill-posed hypocenter-velocity problem35, we employ a Bayesian approach, which performs a global search through model space using a Markov chain Monte Carlo (McMC) method (see, e.g., the monograph by Tarantola36). This approach combines the misfit between predicted and observed arrival times (likelihood) with the provided a priori information for each inverted parameter (prior) to infer the probability distribution for these parameters (posterior). The present McMC inversion is adapted from the Ensemble Sampler37,38.
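As a minimal sketch of how such a posterior can be assembled, the following Python snippet combines a uniform prior with a Gaussian arrival-time likelihood. The function names, the toy two-parameter forward model (origin time and a single wave speed), and all numerical values are illustrative assumptions and are not taken from the inversion code used in this study.

```python
import math

def log_prior(params, bounds):
    """Uniform prior: 0 inside the allowed ranges, -inf outside."""
    for value, (low, high) in zip(params, bounds):
        if not (low <= value <= high):
            return -math.inf
    return 0.0

def log_likelihood(predicted, observed, sigma):
    """Gaussian log-likelihood of arrival-time residuals (up to a constant)."""
    return -0.5 * sum(((p - o) / s) ** 2
                      for p, o, s in zip(predicted, observed, sigma))

def log_posterior(params, bounds, forward, observed, sigma):
    """Log-posterior = log-prior + log-likelihood (up to a constant)."""
    lp = log_prior(params, bounds)
    if math.isinf(lp):
        return lp  # model rejected by the prior, no forward run needed
    return lp + log_likelihood(forward(params), observed, sigma)

# Toy problem (hypothetical values): params = (origin time [s], speed [km/s])
distances_km = [700.0, 1400.0, 2800.0]
forward = lambda p: [p[0] + d / p[1] for d in distances_km]
bounds = [(-60.0, 60.0), (5.0, 10.0)]
observed = forward((0.0, 8.0))          # synthetic "picks"
sigma = [10.0, 10.0, 10.0]              # pick uncertainties [s]

lp_true = log_posterior((0.0, 8.0), bounds, forward, observed, sigma)
lp_late = log_posterior((10.0, 8.0), bounds, forward, observed, sigma)
lp_out = log_posterior((0.0, 20.0), bounds, forward, observed, sigma)
```

A sampler such as an ensemble McMC only needs a function of this form; the model that reproduces the synthetic picks scores highest, a 10 s origin-time shift is penalized, and a speed outside the prior range is rejected outright.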

The inverted source variables are the origin time ts, the source longitude and latitude, and the source depth (lons, lats, hs). The subsurface is modeled as six homogeneous layers over a halfspace, and the shear wave velocity vS,i, Poisson’s ratio νi, and thickness Hi of each layer i are inverted for. Prior ranges for these variables are described in the Methods and Supplementary Table S3.
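Because each layer is described by vS and ν rather than by vP directly, a P-wave velocity has to be derived before body-wave travel times can be computed. The sketch below uses the standard relation for an isotropic elastic medium, vP = vS · sqrt(2(1 − ν)/(1 − 2ν)); whether the study's code performs the conversion in exactly this way is an assumption, and the function name is hypothetical.

```python
import math

def vp_from_vs(vs_km_s, nu):
    """P-wave velocity from S-wave velocity and Poisson's ratio
    for an isotropic elastic solid."""
    if not (-1.0 < nu < 0.5):
        raise ValueError("Poisson's ratio must lie in (-1, 0.5)")
    return vs_km_s * math.sqrt(2.0 * (1.0 - nu) / (1.0 - 2.0 * nu))

# Example: a Poisson solid (nu = 0.25) has vP / vS = sqrt(3)
vp_example = vp_from_vs(4.0, 0.25)
```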

Concept validation on Earth data: the 2021 Flores earthquake

The Flores Sea earthquake occurred on December 14, 2021, with a magnitude of Mw = 7.3. Following relocation, Supendi et al.39 associated the event with the Kalaotoa fault system, identifying a strike-slip mechanism at a depth of 12.2 km. This aligns with USGS estimates of 14.2 km depth, based on source location, and 17.5 km depth from moment tensor inversions40.

Few subsurface velocity models are available near the Flores Sea, a region characterized by high heterogeneity due to the presence of several subduction zones. To provide a meaningful reference for the interpretation of our inverted subsurface models, we define a Median Model based on the median of CRUST1.0 models41 and the LLNL-G3D-JPS tomographic models in the mantle42 below our stations (see Supplementary Fig. S4). In these comparisons, we consider 15 km depth, latitude −7.6°N and longitude 122.2°E as the reference hypocenter (USGS CMT solution40), and 03:20:23 UTC as the reference origin time.

At the time of the event, four Strateole-2 balloons, identified as TTL4-07, TTL4-15, TTL5-16, and TTL3-17, were located between 680 and 2800 km to the northeast of the event. The balloon inversion uses body and surface wave arrival times extracted from their pressure traces. The Strateole-2 pressure sensor has a flat frequency response below about 0.3 Hz29,43; however, the SNR is low at long periods due to the presence of the balloon buoyancy resonance33. These oscillations were partly corrected using a method inspired by Podglajen et al.43 (see Methods and Supplementary Fig. S3 for details). P-wave arrival times were picked for the four balloons with an estimated uncertainty between 7 and 35 s, and S-wave arrivals with uncertainties between 8 and 49 s. Due to the low-frequency noise, the LR arrival could only be identified with confidence for TTL3-17 and TTL5-16 between 0.005 and 0.1 Hz, with a mean uncertainty of around 50 s. The picks are shown in more detail in Supplementary Figs. S1, S2, and S33–S36.

To assess the robustness of this infrasound-based inversion, we also construct a reference source and subsurface model through the inversion of data recorded at 11 seismic stations, selected among the Global Seismograph Network, the Australian National Seismograph Network, and the German GEOFON seismic network. This separate inversion allows us to build confidence in the joint inversion technique, and to compare the resolution obtained from a small number of receivers at low SNR (the balloon case) to the one obtained from a typical dense ground network of receivers at high SNR (the seismic case). For consistency, we pick the seismic arrivals using vertical velocity signals from seismic stations in the Flores region, emulating a single-component infrasound signal. The 11 chosen stations are illustrated in Fig. 1a and detailed in Supplementary Table S1 and Supplementary Figs. S22–S32. For these stations, the uncertainty of the arrival times ranged from 1 to 2 s for P waves, 10 to 40 s for S waves, and 20 to 100 s for LRs between 0.002 and 0.2 Hz.

Fig. 1: Source origin inverted using 11 seismic stations.

a Map of chosen ground seismic stations for the inversion of the 2021 Mw 7.3 Flores earthquake. The four Strateole-2 balloons are marked with blue crosses for comparison. Plot b shows the marginal distribution of the source hypocenter, up to scale between horizontal and vertical slices. c The histogram of source origin time, centered around the true value of zero, with the MAP solution in green.

Source and subsurface as seen from seismic data

The reference joint inversion is performed using only seismic data picks, obtained from 11 stations. The McMC simulations return an ensemble of source and subsurface parameters forming the posterior probability distribution. To interpret these results, we reduce the dimensions of the posterior by calculating marginal distributions, and by estimating the Maximum A Posteriori (MAP) parameters, i.e., the solution maximizing the posterior distribution function (see Methods and Supplementary Table S4).
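The two reductions described above can be sketched in a few lines of Python: the MAP model is taken as the sample with the highest log-posterior, and a marginal distribution is a normalized histogram of a single parameter across the ensemble. The function names and the four toy samples below are hypothetical and only illustrate the bookkeeping.

```python
def map_estimate(samples, log_probs):
    """Return the sample with the highest log-posterior value."""
    best = max(range(len(samples)), key=lambda i: log_probs[i])
    return samples[best]

def marginal_histogram(samples, index, low, high, n_bins):
    """Normalized 1D histogram of parameter `index` over [low, high)."""
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for sample in samples:
        x = sample[index]
        if low <= x < high:
            k = min(int((x - low) / width), n_bins - 1)
            counts[k] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins

# Toy ensemble (hypothetical): (origin time shift [s], depth [km])
samples = [(-2.0, 40.0), (-1.0, 55.0), (0.0, 30.0), (-2.5, 60.0)]
log_probs = [-3.0, -1.0, -2.0, -4.0]

map_model = map_estimate(samples, log_probs)
origin_time_marginal = marginal_histogram(samples, 0, -3.0, 1.0, 4)
```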

The marginal distribution of the source parameters inverted from these arrival times is shown in Fig. 1b. The MAP source location is shifted 13 km south of the true epicenter, at a slightly greater distance to the majority of the stations, which are to the north. This longer travel time is accommodated by a slightly earlier source origin time, with the MAP value ~2 ± 2 s earlier than the reference time (Fig. 1c). The inversion also favors a source about 50 km deeper than the reference solution, with ~100 km uncertainty. The marginal distributions of source parameters follow Gaussian distributions with few trade-offs between variables.

Figure 2 displays the marginal posterior distributions for the shear wave velocity vS and the Poisson’s ratio ν as a function of depth. Both ν and vS appear constrained down to ~500 km depth. The MAP models are in good agreement with the Median profile, constructed from global tomographic Earth models, especially for the shear wave velocity. Posterior values of vS have a 1−σ uncertainty of ±0.1 km s−1 in the crust and upper mantle layers, and ±0.6 km s−1 in the top sediment layer. This thin, top sediment region has the least well-defined velocities, likely due to the high variability of LR dispersion above 0.1 Hz (Fig. 2a) and its limited sensitivity to such narrow features. The 1−σ uncertainty becomes 0.5 to 0.6 km s−1 in the lowermost layer and halfspace (Fig. 2c). The Poisson’s ratio takes values between 0.21 and 0.29, within the range expected for most minerals44. It is constrained with a large uncertainty of ±0.02 between 100 and 400 km depth, and is otherwise undefined.

Fig. 2: Subsurface models for shear wave velocity vS and Poisson’s ratio ν below the Flores Sea, inverted using 11 local seismic stations.

The marginal distributions of models are shown as normalized 2D histograms, down to 100 km in (a) and (b) and to 1000 km in (d) and (e). The Median literature model is shown in blue, the MAP in green, and the 1−σ probability region in dashed white lines. Red histograms in panels (c) and (f) show regions with a high probability of presenting an interface, or a strong gradient in subsurface properties, together with the MAP interfaces in green. See the Methods section for details on normalization and the Interface count metric.

The inversion method also returns a distribution of layer thicknesses, which can be converted to a more easily interpretable distribution of interface depths through cumulative summation. In Fig. 2c and f, we compare the posterior distribution of interfaces to the prior, thereby highlighting depth ranges with a higher probability of hosting a change in subsurface properties, independently from the prior model distribution. Three interfaces, or regions of strong velocity gradients, are strongly suggested in this inverted model: at 20 ± 4 km depth in the crust, and 150 ± 30 km, and 500 ± 70 km depth in the mantle. A very shallow interface is also suggested at 4 km depth.
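The conversion from layer thicknesses to interface depths, and the comparison of posterior to prior interface distributions, can be sketched as follows. This is only an illustration: the exact Interface count metric is defined in the Methods, and the simple ratio of normalized histograms used here, together with the function names, bin edges, and toy models, is an assumption.

```python
from bisect import bisect_right
from itertools import accumulate

def interface_depths(thicknesses_km):
    """Cumulative sum of layer thicknesses gives interface depths."""
    return list(accumulate(thicknesses_km))

def depth_fractions(models, edges_km):
    """Fraction of all interfaces falling in each depth bin."""
    counts = [0] * (len(edges_km) - 1)
    for thicknesses in models:
        for depth in interface_depths(thicknesses):
            k = bisect_right(edges_km, depth) - 1
            if 0 <= k < len(counts):
                counts[k] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]

def posterior_to_prior_ratio(posterior_models, prior_models, edges_km):
    """Ratio > 1 marks depths favored by the data relative to the prior."""
    post = depth_fractions(posterior_models, edges_km)
    prior = depth_fractions(prior_models, edges_km)
    return [p / q if q > 0 else float("nan") for p, q in zip(post, prior)]

# Toy models (hypothetical): each entry lists layer thicknesses in km
edges_km = [0.0, 10.0, 30.0, 100.0]
posterior_models = [[4.0, 16.0, 60.0], [5.0, 14.0, 70.0]]
prior_models = [[20.0, 30.0, 40.0], [2.0, 3.0, 50.0]]

ratio = posterior_to_prior_ratio(posterior_models, prior_models, edges_km)
```

In this toy case the 10 to 30 km bin holds twice as many interfaces in the posterior as in the prior, which is the kind of signature interpreted above as a likely change in subsurface properties.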

Source and subsurface inverted from a network of four balloons

The balloon inversion result fits the arrival times adequately, as evidenced by the strong match between the observed and posterior distribution of arrivals in Fig. 3. The low number of arrival times picked from the balloon data, combined with their large uncertainty, limits the precision of the source location. Indeed, Fig. 4a, b shows a larger uncertainty in the epicenter using the Strateole-2 balloons, compared to using a subset of four regional seismic stations at similar locations with more precise P, S, and LR picks. The Strateole-2 data inversion returns a MAP epicenter 35 km away from the true epicenter, at coordinates −7.5 ± 1.0° latitude and 122.5 ± 0.7° longitude, against 32 km distance with an uncertainty of ±0.6–0.8° in latitude and longitude using four seismic stations. This corresponds to an uncertainty of 200 km around the true epicenter. Still, despite the low SNR of balloon infrasound signals, the inversion framework enables an accurate characterization of the source location, a critical task when the network is sparse or poorly distributed in terms of source-station azimuth34.

Fig. 3: Infrasound signals and arrivals at balloons following the Mw 7.3 Flores earthquake, compared to arrivals predicted from the inverted models.

a Picked Rayleigh wave group velocities, derived from picked arrival times assuming the true location and time of the Mw 7.3 Flores earthquake, shown in blue. These measurements are compared to 400 group velocity curves constructed from a random selection of posterior models. The MAP model is shown in green. b Pressure waveforms used to pick arrivals, bandpass-filtered between 0.06 and 0.2 Hz, with picked times shown in blue and arrival times predicted from the MAP in green. c Zoom on TTL4-15 signal, showing the posterior distribution of arrival times for P and S waves compared to the picked value and its uncertainty in blue.

Fig. 4: Source origin inverted from four balloons (left) or four seismic stations (right).

Posterior distributions of source location (a), depth (c) and origin time (e) inverted from four Strateole-2 balloons and (resp. (b), (d), (f)) four seismic stations at similar locations. The true (resp. MAP) values are shown with blue (resp. green) vertical lines.

For the balloon inversion result, the origin time is about 1 ± 22 s earlier than the published value (Fig. 4e), while it is predicted to be 9 ± 16 s earlier using four local seismic stations (Fig. 4f). The published value lies within the posterior uncertainty, and the difference in MAP value could be due to the slight difference in distribution of balloon stations over azimuth and distance compared to the local ground stations, or to biased picks among the ground stations. In both cases, the source depth displays an almost uniform posterior distribution down to 200 km depth and cannot be constrained (Fig. 4c, d). Similarly, the 11-station inversion returned a MAP depth of ~40–50 km (Fig. 1b) rather than the 12.2 to 17.5 km previously published39,40. Inverting for the source depth without stations close to the source (less than a few source depths away) or without identified depth phases is notoriously difficult, which makes this result unsurprising45.

With only four P-wave picks, the Strateole-2 data do not sufficiently constrain the Poisson’s ratio in the subsurface: the posterior distributions of both ν and vP are hardly distinguishable from uniform priors (shown in Fig. S17 of the Supplementary Information). However, P, S, and LR picks provide constraints on the posterior distribution of vS, which is shown in Fig. 5a, c. The MAP and posterior models match the Median Model within one standard deviation down to around 600 km depth, and shear wave velocities are constrained with a 1−σ uncertainty of ±0.3 to ±0.6 km s−1 between 10 and 400 km depth.

Fig. 5: Subsurface velocity model inverted using Strateole-2 infrasound data.

Shear wave velocities inverted from balloon data, down to 100 km (a) and 1000 km depth (c), along with the associated ratio of posterior and prior interface depth distributions ((b) and (d), resp.). The MAP model and interface depths are shown in green, and the Median literature model in blue, along with the 1−σ probability region in dashed white lines.

Once again, the interface count metric evaluated from the posterior distribution favors changes in subsurface properties at specific depth ranges, in particular at 19 ± 6 km depth in the crust (see Fig. 5b), a value similar to the 11-station inversion. The CRUST1.0 model predicts high variability of crustal thickness in the Malay Archipelago, with values ranging from 10 km to about 40 km (see Supplementary Figs. S5 and S6). However, the median crustal thickness calculated using CRUST1.0 along the great circle path between the source and the receivers is 23 km for the seismic stations and 15 km for the balloons, which is consistent with the inverted interface depth (details in Supplementary Table S2). Thus, the distribution of inverted interfaces likely represents the typical Moho depth between the Flores Sea event and the receivers.

Three deep regions of velocity change are suggested at 420 ± 50 km depth, between 60 and 200 km, and below 800 km depth, although with little confidence (Fig. 5d). These depths are similar to those found in the 11-station inversion, at 150 ± 30 km and 500 ± 70 km. No global mantle interface is known between 60 and 200 km depth. The wide uncertainty in inverted depth suggests that the high interface probability may not indicate an abrupt change in thermochemical properties, but rather a smooth increase in velocity, as expected at the top of the mantle. Meanwhile, important velocity changes are known to occur in the mantle transition zone, such as at 410 km depth where the olivine-wadsleyite phase transition is found46. However, both inverted interface distributions show large uncertainties below 400 km, due to the low sensitivity of P, S, and LR travel times to changes at these depths (see Supplementary Figs. S7 and S8). Thus, this concentration of interfaces could rather indicate that a gradual increase in seismic velocity is necessary in this mantle region to fit LR arrival times in the 0.002–0.005 Hz range.

Sensitivity of the inversion method

We further examine the sensitivity of the joint inversion method to subsurface and source parameters, and the resolving power of the body wave and surface wave data employed.

There is no well-established method to pick body wave arrivals from single-component balloon recordings. In particular, the S arrivals were measured with large uncertainties due to the lack of clear impulsive arrivals in the balloon pressure data. We assess the sensitivity of our results to the picking of these arrivals by performing an inversion using only P and LR arrivals. The results, provided in Supplementary Figs. S12 and S13, show little difference compared to Figs. 4 and 5, with a ~4% difference in the MAP vS. The predominant effect is larger uncertainties in the posterior distribution of vS and source location (up to +50%). P and LR picks are thus found to be sufficient to constrain the source location and the main interfaces in the subsurface.

To further evaluate the sensitivity of our inversion results to the different types of seismic waves, we perform inversions using only LR arrivals, and using only P and S arrivals. This assessment uses the ground seismic data to ensure a higher resolution in the comparison. The results are summarized in Fig. 6. Body waves are shown to have the strongest sensitivity to source location, with an uncertainty ellipse almost identical to our complete inversion (Fig. 6a) and a slight sensitivity to depth (Fig. 6b). Both datasets have similar sensitivities to origin time (Fig. 6c), with the LR-based inversion yielding a delayed source. Indeed, the LR-based source location is shifted to the north, and is therefore closer to the majority of the seismic stations, thereby shortening the travel times and requiring a later origin time. Finally, Fig. 6d illustrates the subsurface resolution provided by each dataset, by comparing the 1−σ uncertainty of the inverted vS. The first 100 km, including the crust, are poorly resolved using the body wave information, contrary to the Rayleigh waves, which display high sensitivity in the frequency range considered (see Supplementary Figs. S7 and S8). Both datasets are sensitive to shear wave velocities between 100 and 400 km depth, and lose resolution at greater depths. Individual plots of vS for the P, S-based, and LR-based inversions can be found in Supplementary Fig. S14.

Fig. 6: Comparison of inversions using different datasets.

a The 1−σ contours of the PDFs for source location inverted using all available wave picks (black, same as in Fig. 1), or only using body waves (blue) and Rayleigh wave picks (red). The location obtained from body wave picks is almost superimposed on that of Fig. 1. b, c The PDFs of source depth and origin time for the respective datasets. d The 1−σ uncertainty of vS retrieved from the posterior distribution of each inversion, with a close-up on the first 50 km.

The McMC framework also allows for an analysis of trade-offs between inverted parameters in the context of the hypocenter-velocity problem. Figure 7 shows the marginal posterior probability densities of several inverted variables along one and two dimensions. Trade-offs are observed between the origin time ts and the source epicenter defined by (lons, lats) (Fig. 7a, b). Regarding the subsurface, there are complex, non-linear trade-offs between the thickness of layers and their seismic velocities (Fig. 7c, d). This is a known phenomenon, due to the fact that Rayleigh wave group velocities are sensitive to seismic velocities over a range of depths (see Supplementary Fig. S8). Finally, there are also trade-offs between the Poisson’s ratio, which for the balloon inversion is weakly resolved between 0.1 and 0.4, and the shear wave velocity in the same layer (Fig. 7e), and between shear wave velocities in adjacent layers (Fig. 7f). These trade-offs mean that a large number of solutions exist for the non-linear, ill-defined system of equations defining arrival times (Methods, Eq. (1)). Yet, our probabilistic inversion framework still highlights regions of higher probability for source location and subsurface properties.

Fig. 7: Marginal probability density distributions resulting from the inversion of balloon data.

Distributions of a origin time and latitude, b origin time and longitude, c first layer thickness and shear wave velocity, d third layer thickness and shear wave velocity, e Poisson’s ratio and shear wave velocity in the fifth layer, and f shear wave velocities in the first and second layer. A darker hue represents a higher density of models.

There are ways to mitigate the non-uniqueness of the solution space and improve the inversion convergence. One possibility is to fix some of the least resolved parameters to reduce the dimensionality of the inverse problem. In the balloon inversion case, fixing the Poisson’s ratio appears particularly relevant. Supplementary Figs. S15 and S16 show results of a test adopting ν ≈ 0.28, a value representative of the Median model. Location and subsurface solutions are within the uncertainties of the general problem, and posterior uncertainties differ only by a few percent.

Alternatively, the misfit function itself can be reshaped to eliminate some of the undetermined parameters. The time-difference of arrival (TDOA) method is a typical application of this concept in seismology, adapted from the seminal method of hyperbolas of Milne47, whereby the inversion seeks to minimize the misfit between observed and predicted travel time differences, instead of absolute travel times48. This formulation of the misfit effectively eliminates the origin time variable. We tested this inversion method for the balloon inversion, using the earliest P-wave as a reference for all travel-time differences. Supplementary Figs. S18 and S19 show results identical to the original formulation, confirming that ts is effectively ill-defined in this joint inversion problem. Finally, the misfit, or likelihood function, could alternatively be formulated using an L1-norm instead of an L2-norm of time differences (see “Methods” section for its definition), which has the advantage of being less sensitive to data outliers49. In the balloon case, this formulation nevertheless yields results similar to the original L2-norm, as shown in Supplementary Figs. S20 and S21, with posterior uncertainties differing by at most 30%.
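To illustrate why a TDOA misfit removes the origin time, and how the L1 and L2 variants differ, consider the following minimal Python sketch. The exact misfit used in this study is defined in the Methods; the function below, its weighting of each difference by the uncertainty of the non-reference pick, and the numerical values are simplifying assumptions.

```python
def tdoa_misfit(predicted_travel_s, observed_arrival_s, sigma_s, ref=0, norm=2):
    """Misfit on arrival-time differences relative to a reference pick.

    The origin time never appears: it cancels in every difference.
    norm=2 gives a sum of squared weighted residuals, norm=1 a sum of
    absolute weighted residuals (less sensitive to outliers).
    """
    total = 0.0
    for i, (pred, obs, sig) in enumerate(
            zip(predicted_travel_s, observed_arrival_s, sigma_s)):
        if i == ref:
            continue
        residual = (obs - observed_arrival_s[ref]) - (pred - predicted_travel_s[ref])
        total += abs(residual / sig) ** norm
    return total

# Hypothetical picks: the earliest P arrival (index 0) is the reference
predicted_travel_s = [90.0, 180.0, 400.0]
observed_arrival_s = [100.0, 195.0, 410.0]
sigma_s = [10.0, 10.0, 20.0]

misfit_l2 = tdoa_misfit(predicted_travel_s, observed_arrival_s, sigma_s, norm=2)
misfit_l1 = tdoa_misfit(predicted_travel_s, observed_arrival_s, sigma_s, norm=1)

# Shifting every observed arrival by a constant (an origin-time error)
shifted = [t + 7.0 for t in observed_arrival_s]
misfit_l2_shifted = tdoa_misfit(predicted_travel_s, shifted, sigma_s, norm=2)
```

The shifted and unshifted picks produce the same misfit, which is the sense in which the origin time is eliminated from the problem.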

Perspectives for balloon seismology

We achieve the inversion of a subsurface seismic velocity profile based on earthquake infrasound signals recorded at airborne balloon platforms. The distributions of subsurface profiles inverted using data from four balloons (Fig. 5) are consistent with the Median Model, a median representation of seismic velocities in the Malay Archipelago from the literature. We also capture a crustal interface at 19 km, consistent with the local Moho depth, with ±6 km uncertainty. The Bayesian approach enables an examination of parameter trade-offs and distributions in the simultaneous estimation of source location and subsurface velocity.

We identify the main challenge of this balloon-based inversion as the reliable picking of seismic phases in single-component data. Besides the lack of waveform polarization estimation, balloon signals suffer from lower SNR on Earth at low frequencies due to buoyant motion through atmospheric perturbations, and possibly to local turbulence induced by this motion31. Without knowledge of the source location, coda arrivals may also be misidentified: a broadband energy pulse from a wind burst or a secondary P-phase can be wrongly interpreted as an S-wave arrival, and higher-mode LR energy can obscure the fundamental mode at higher frequencies. These limitations could be mitigated in the future by improved signal processing methods, such as template matching or machine learning-based picking, as well as by additional instrumentation. Recent studies have proposed using Inertial Measurement Units (IMUs) onboard balloons to better characterize the polarization of the velocity perturbation associated with a pressure wave and derive its direction of arrival, providing additional information for the interpretation of the signal and the resolution of the location problem50.

In conclusion, our findings confirm the viability of using balloons for seismic exploration. Our results strengthen the case for balloon seismology on Venus, as we demonstrated the ability to address challenges related to unknown sources and subsurface properties through a joint inversion using balloon infrasound data. Beyond the arrival time analysis proposed by the present study, seismic analyses based on signal amplitude, or even on full waveforms, could be envisioned, potentially providing access to additional information on the source depth and mechanism, or on the subsurface properties on Earth or Venus. Consequently, balloon seismology could provide valuable insights into the planet’s current tectonic activity and internal structure.

On Earth, the free-floating nature of balloons enables them to reach remote regions, such as the oceans and poles, where seismic sensor deployment remains challenging. Balloons have proven their ability to capture epicentral infrasound at close range28 and volcanic infrasound at larger distances43, providing potential constraints on various types of remote seismo-acoustic sources. Earthquake infrasound data can, for instance, be used to determine the event magnitude32. Under certain conditions, signals recorded by balloons near the source may match the quality of those from distant seismic stations, and they could thus serve as a complement to conventional monitoring networks. Further investigations could help identify the regions and typical magnitudes for which balloons also fill an observational gap on Earth.

Methods

Markov chain Monte Carlo inversion

Sophisticated Monte Carlo sampling approaches, such as Ensemble Sampling37, Hamiltonian Monte Carlo51, or Parallel Tempering52, allow a thorough search through model space robust to the presence of multiple local minima.

In the current work, we employ the open-source implementation of an Ensemble Sampler written in Python named emcee38 as the basis for our inversion framework. Its use here is motivated by its simplicity of application and its efficiency when sampling highly correlated parameter spaces, which can be encountered in the hypocenter-velocity problem.

Each McMC simulation is run for 10^6 iterations on an ensemble of 50 chains, resulting in a total of about 50 × 10^6 samples. The simulations are run with 32 CPUs on a high-performance computing server.

Forward model and misfit

The inversion method is based on measurements of arrival times for different seismic phases, namely P, S, and LRs at a network of receivers. Considering a common and arbitrary reference time for the receivers and source of interest, the time of arrival of a wave W at receiver R can be written:

$${t}_{W,R}={t}_{{{{\rm{s}}}}}+\Delta {t}_{W,R}+\Delta {t}_{{{{\rm{air}}}},R},$$
(1)

where the earthquake occurs at time t = ts (s) since the reference time, W is the wave type among seismic or air-coupled P, S, and LR, ΔtW,R is the seismic travel time from the source to the piercing point at a ground station or below a floating balloon, and Δtair,R is the additional travel time from the surface to the floating balloon, if applicable. For simplicity, we use the origin time of the Flores earthquake published by the USGS (03:20:23 UTC on 14 December 2021) as our reference time40. In the case of LRs, ΔtW,R is frequency-dependent, which allows a range of arrival time measurements to be modeled. At a specific receiver R, the travel time Δtair,R is independent of the phase type, and can be estimated knowing the balloon altitude and the atmospheric state at the time of the event. The recording of multiple phases W at several receiver locations provides a system of equations of the form of Eq. (1).

Upon selection of a source location, origin time, and subsurface model by the Monte Carlo sampling algorithm, the forward model predicts the arrival time of waves at each station or balloon following Eq. (1). The travel times ΔtP and ΔtS of P and S waves are calculated using a ray-tracing method derived from the LAUFZE suite53. This Fortran routine takes a source-receiver distance and a layered subsurface model as input and predicts the arrival time of the fastest direct P and S body waves.

The travel times ΔtLR(f) of LRs are calculated using a NumPy-accelerated Python implementation of the surf96 code54, called disba55. The code is given a layered subsurface model input and outputs the group velocity vg(f) of the Rayleigh waves at the fundamental or higher modes. We obtain the travel time by the approximation:

$$\Delta {t}_{{{{\rm{LR}}}}}(f)=\frac{{d}_{s}}{{v}_{g}(f)},$$
(2)

where ds is the epicentral distance, considering only the fundamental mode.

The travel time Δtair from the ground to the balloon is calculated by integrating the vertical variation of sound speed cair(z) from z = 0 to the balloon altitude z = zb:

$$\Delta {t}_{{{{\rm{air}}}}}=\int_{z = 0}^{{z}_{b}}\frac{{{{\rm{d}}}}z}{{c}_{{{{\rm{air}}}}}(z)}.$$
(3)

The uncertainty of Δtair primarily depends on the uncertainty of the atmospheric profile between 0 and about 20 km altitude. Considering a spatial and temporal grid around the Flores event, the variation of sound speed profile obtained from the G2S atmospheric specification56—itself based on the MERRA257 reanalysis—leads to variations of Δtair smaller than 0.2 s (see Supplementary Fig. S9), therefore negligible compared to picking uncertainties.
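As a worked illustration of Eqs. (1) to (3), the following sketch assembles the arrival time of an air-coupled Rayleigh wave at a balloon. The sound-speed profile, epicentral distance, and group velocity are hypothetical values chosen for the example; in the actual forward model the group velocity would come from the dispersion code and the profile from the atmospheric specification.

```python
def air_travel_time(z_km, c_km_s):
    """Eq. (3): trapezoidal integral of dz / c_air(z) from the ground to the balloon."""
    total = 0.0
    for i in range(1, len(z_km)):
        dz = z_km[i] - z_km[i - 1]
        total += 0.5 * dz * (1.0 / c_km_s[i] + 1.0 / c_km_s[i - 1])
    return total


def rayleigh_travel_time(distance_km, group_velocity_km_s):
    """Eq. (2): epicentral distance divided by the group velocity at one frequency."""
    return distance_km / group_velocity_km_s


def arrival_time(t_source, dt_seismic, dt_air=0.0):
    """Eq. (1): origin time plus seismic travel time plus the optional air leg."""
    return t_source + dt_seismic + dt_air


# Hypothetical sound-speed profile (altitude in km, speed in km/s), balloon at 19 km.
z = [0.0, 5.0, 11.0, 19.0]
c = [0.340, 0.320, 0.295, 0.295]
dt_air = air_travel_time(z, c)

# Hypothetical Rayleigh wave: 1000 km epicentral distance, 3.5 km/s group velocity.
dt_lr = rayleigh_travel_time(1000.0, 3.5)
t_lr_balloon = arrival_time(t_source=0.0, dt_seismic=dt_lr, dt_air=dt_air)
```

For a ground station, `dt_air` is simply left at zero, so the same function covers both receiver types.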

Considering travel-time picks to have an uncorrelated Gaussian distribution with standard deviation σ (which is debatable, see, e.g., Husen and Hardebeck45), the log-likelihood function maximized by the Monte Carlo search is built from the following sum of L2-norm misfit terms:

$$\log P({{{\boldsymbol{d}}}}| {{{\boldsymbol{m}}}})=-\frac{1}{2}{\sum}_{R}{\sum}_{i}\frac{{({t}_{{W}_{i},R}({{{\boldsymbol{m}}}})-{t}_{{W}_{i},{{{\rm{obs}}}}})}^{2}}{{\sigma }_{{W}_{i},R}^{2}}-\frac{1}{2}{\sum}_{R}{\sum}_{i}\log (2\pi {\sigma }_{{W}_{i},R}^{2}),$$
(4)

where Wi represents the available phase picks among {P, S, LR(fi)} and R denotes the available receivers. The term \(2\pi {\sigma }_{{W}_{i},R}^{2}\) is the normalisation term for the Gaussian distribution of observations. An alternative formulation of the problem, which eliminates the origin time variable, seeks to minimize the misfit between the observed and predicted time-difference of arrival48. In this case, we use a reference P-wave arrival, \({t}_{{P{{{\rm{ref}}}}}}\), to calculate the arrival time differences. The log-likelihood function then becomes:

$$\log {P}_{{{{\rm{TDOA}}}}}({{{\boldsymbol{d}}}}| {{{\boldsymbol{m}}}})\\ =-\frac{1}{2}\sum\limits_{R}\sum\limits_{i}\dfrac{{(| {t}_{{W}_{i},R}({{{\boldsymbol{m}}}})-{t}_{{P{{{\rm{ref}}}}}}({{{\boldsymbol{m}}}})| -| {t}_{{W}_{i},{{{\rm{obs}}}}}-{t}_{{P{{{\rm{ref}}}}},{{{\rm{obs}}}}}| )}^{2}}{{\sigma }_{{W}_{i},R}^{2}+{\sigma }_{{P{{{\rm{ref}}}}}}^{2}}\\ \quad-\frac{1}{2}\sum\limits_{R}\sum\limits_{i}\log \left(2\pi ({\sigma }_{{W}_{i},R}^{2}+{\sigma }_{{P{{{\rm{ref}}}}}}^{2})\right).$$
(5)

If an L1-norm is preferred, Eq. (4) becomes:

$$\log {P}_{{{{\rm{L1}}}}}({{{\boldsymbol{d}}}}| {{{\boldsymbol{m}}}})=-{\sum}_{R}{\sum}_{i}\frac{| {t}_{{W}_{i},R}({{{\boldsymbol{m}}}})-{t}_{{W}_{i},{{{\rm{obs}}}}}| }{{\sigma }_{{W}_{i},R}}-{\sum}_{R}{\sum}_{i}\log (2{\sigma }_{{W}_{i},R}).$$
(6)

This formulation assumes that observations follow a Laplace distribution.
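The three likelihoods of Eqs. (4) to (6) can be sketched as follows. The function names and the flat list layout, with one entry per (phase, receiver) pick, are assumptions made for the example and do not reflect the structure of the actual inversion code.

```python
import math


def log_likelihood_l2(t_pred, t_obs, sigma):
    """Eq. (4): Gaussian log-likelihood for uncorrelated arrival-time picks."""
    misfit = sum(((p - o) / s) ** 2 for p, o, s in zip(t_pred, t_obs, sigma))
    norm = sum(math.log(2.0 * math.pi * s ** 2) for s in sigma)
    return -0.5 * misfit - 0.5 * norm


def log_likelihood_tdoa(t_pred, t_obs, sigma, t_ref_pred, t_ref_obs, sigma_ref):
    """Eq. (5): time differences relative to a reference P arrival."""
    total = 0.0
    for p, o, s in zip(t_pred, t_obs, sigma):
        var = s ** 2 + sigma_ref ** 2
        diff = abs(p - t_ref_pred) - abs(o - t_ref_obs)
        total += -0.5 * diff ** 2 / var - 0.5 * math.log(2.0 * math.pi * var)
    return total


def log_likelihood_l1(t_pred, t_obs, sigma):
    """Eq. (6): Laplace log-likelihood (L1-norm misfit)."""
    misfit = sum(abs(p - o) / s for p, o, s in zip(t_pred, t_obs, sigma))
    norm = sum(math.log(2.0 * s) for s in sigma)
    return -misfit - norm
```

Adding a constant to all predicted times, including the reference, leaves `log_likelihood_tdoa` unchanged, which is how this formulation removes the origin time from the problem.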

Effects of balloon motion on arrival times

Unlike a seismic station, a balloon is a non-stationary receiver, subject to horizontal motion due to jet winds and to an oscillatory vertical motion due to buoyancy. Because of the horizontal motion, the source-balloon distance is not constant over the duration of the earthquake signal, which can affect ΔtW (see Eq. (1)). Assuming that the balloon is located at a distance d0 from the source at time ts and that, in the worst-case scenario, it moves with velocity vb in the radial direction, away from or towards the source, and considering for simplicity a homogeneous medium of seismic velocity vW, the expression of ΔtW becomes:

$$\Delta {t}_{W}=\frac{{d}_{0}+{v}_{{{{\rm{b}}}}}\Delta {t}_{W}}{{v}_{W}},$$
(7)

or

$$\Delta {t}_{W}=\left(\frac{1}{1-{v}_{{{{\rm{b}}}}}/{v}_{W}}\right)\frac{{d}_{0}}{{v}_{W}}.$$
(8)

Strateole-2 balloons have a horizontal velocity of 5 to 8 m s−1. This corresponds to a ~0.1% change in travel time for P waves, and ~0.4% for S and LR waves, compared to a stationary receiver. This effect can thus be neglected compared to other sources of error in travel time estimation31.
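The order of magnitude of this effect can be checked directly from Eq. (8). In the sketch below, the balloon speed is the upper end of the range quoted above, while the seismic velocities of 8 km s−1 and 2 km s−1 are illustrative assumptions, not values stated in the text.

```python
def moving_receiver_factor(v_balloon, v_wave):
    """Factor multiplying d0 / v_W in Eq. (8); both velocities in the same units.

    A positive v_balloon means the balloon moves away from the source.
    """
    return 1.0 / (1.0 - v_balloon / v_wave)


v_b = 8.0  # m/s, upper end of the quoted Strateole-2 horizontal speed
assumed_velocities = {"fast phase": 8000.0, "slow phase": 2000.0}  # m/s, hypothetical

relative_change_percent = {
    name: 100.0 * (moving_receiver_factor(v_b, v_w) - 1.0)
    for name, v_w in assumed_velocities.items()
}
```

Since vb is three orders of magnitude smaller than vW, the relative change is close to vb/vW, so the correction scales inversely with the velocity of the phase considered.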

Similarly, constant-volume balloons like the Strateole-2 aerostat experience vertical motion caused by wind perturbations. The stratification of the atmosphere, with density decreasing with altitude, exerts a restoring force through the volume of air displaced by the balloon. This leads to buoyancy oscillations, whose period depends on the Brunt–Väisälä pulsation N at the balloon equilibrium altitude, namely \(2\pi {f}_{0}=N=\sqrt{-\frac{d\rho }{dz}\frac{g}{{\rho }_{e}}}\), with g the gravitational acceleration, ρ the atmospheric density, and ρe the density at the balloon equilibrium altitude33. In the case of Strateole-2 balloons, this oscillation has a period between 180 and 240 s and an amplitude of 10–100 m, corresponding to a vertical speed of ~0.5 m s−1. This speed is insufficient to produce any notable effect on arrival time or travel time estimations. However, it is responsible for strong low-frequency noise in the balloon pressure recordings, as a variation of 10–100 Pa is expected at each oscillation. In section “Balloon noise correction”, we describe the method we applied to mitigate this noise.

Inversion priors

Source location

We assume no prior information on the epicenter location or source depth. Thus, we set uniform prior bounds of [−90°, +90°] for the source latitude lats and [−180°, +180°] for the source longitude lons. For the practical examples of this article, for which the epicenter is known a priori from earthquake catalogs, we simply restrict the starting latitudes and longitudes of the McMC chains to a range of ±20° around the known epicenter, so as to avoid stuck chains and speed up the computation. The source depth is also considered unknown. We thus choose [0, 200] km as uniform prior bounds for the source depth hs (see Supplementary Table S3).

Source origin time

The choice of prior bounds for the origin time is strongly dependent on the choice of reference time for arrival time picking. In the practical examples of this article, the chosen reference time is the USGS published origin time40, and we set prior bounds for ts to [−200, +200] s. In practice, a rough approximation of the origin time can be calculated by estimating the minimum and maximum possible source distance using prior bounds for seismic velocities, leading to ranges for ts closer to thousands of seconds.

Layers and layer thickness

We choose to parameterize our subsurface as a succession of homogeneous layers, as this parameterization is best adapted to the numerical methods (disba, LAUFZE) used in our forward model. The maximum source-receiver distance in our two inversions is ~3000 km, a distance at which body waves have turning depths of ~600 km on Earth. It is therefore necessary to parameterize the subsurface down to mantle depths.

In this study, the number of layers is fixed to six, in addition to an underlying halfspace. Tests of the effect of the number of layers on the achieved misfit showed that the misfit does not decrease notably for a higher number of layers (see Supplementary Fig. S10). The last layer and the halfspace are intended to represent upper-mantle velocities, hence having large prior thickness between 100 and 400 km. The uppermost layers represent a possible sedimentary region with thickness of 0.2 to 5 km. The remaining layers have intermediate prior thicknesses, allowing for variations within the crust, and can be found in Supplementary Table S3.

Seismic velocities and Poisson’s ratio

The inversion covers seismic velocities from the upper crust to the upper mantle. The thin top layer allows for possible sedimentary deposits and has prior bounds for vs of [0.5, 4] km s−1. The following four layers correspond to crustal or upper mantle materials and have prior bounds for vs ranging from [1, 6] to [3, 6] km s−1. The last layer and the halfspace are mantle layers with vs within [4, 7] km s−1. vp is calculated from vs and Poisson’s ratio ν. Prior bounds for ν are uniform within a range of [0.1, 0.4], encompassing typical properties of crustal and mantle minerals44. In addition to these uniform bounds for Poisson’s ratio and shear wave velocity, we implement additional rules to restrict the acceptable prior models. Prior models must have no negative velocity gradient in the first three layers. Below that, negative changes in velocity are limited in magnitude to 1 km s−1 (Δvs, Δvp > −1 km s−1). An upper limit of vp < 12 km s−1 is set, which influences the prior distribution of ν. The distribution of prior models can be found in the Supplementary Information (Fig. S11).
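The conversion from vs and ν to vp, together with the acceptance rules on prior models, can be sketched as follows. The relation used is the standard isotropic elastic one, \({v}_{p}={v}_{s}\sqrt{(2-2\nu )/(1-2\nu )}\). The function names are hypothetical, and the limit on negative changes is read here as a maximum velocity drop of 1 km s−1 between consecutive layers, which is an interpretation of the rule rather than a confirmed implementation detail.

```python
import math


def vp_from_vs(vs, nu):
    """P-wave velocity from S-wave velocity and Poisson's ratio (isotropic medium)."""
    return vs * math.sqrt((2.0 - 2.0 * nu) / (1.0 - 2.0 * nu))


def prior_model_ok(vs_layers, nu_layers, max_drop=1.0, vp_max=12.0, n_monotonic=3):
    """Return True if a layered model satisfies the additional prior rules.

    Layers are ordered from top to bottom; velocities are in km/s.
    """
    vp_layers = [vp_from_vs(vs, nu) for vs, nu in zip(vs_layers, nu_layers)]
    if any(vp >= vp_max for vp in vp_layers):
        return False
    for velocities in (vs_layers, vp_layers):
        for i in range(1, len(velocities)):
            step = velocities[i] - velocities[i - 1]
            # No velocity decrease within the first n_monotonic layers.
            if i < n_monotonic and step < 0.0:
                return False
            # Deeper down, decreases must stay smaller than max_drop.
            if step <= -max_drop:
                return False
    return True


# Hypothetical model: six layers plus a halfspace, constant Poisson's ratio.
vs_model = [1.5, 3.0, 3.5, 3.8, 4.0, 4.5, 4.6]
nu_model = [0.25] * 7
accepted = prior_model_ok(vs_model, nu_model)
```

The vp ceiling explains why the prior on ν is not uniform in practice: for ν = 0.4 the ratio vp/vs is about 2.45, so mantle layers with vs near 7 km s−1 are only accepted for lower values of ν.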

The density, to which P, S, and LR travel times are less sensitive than seismic velocities, is not inverted but rather modeled using Birch’s empirical law58.

Balloon noise correction

To improve the low-frequency SNR of the balloon pressure trace, the balloon buoyancy resonance33 is corrected following a method similar to that of ref. 43. The GPS altitude trace Z of each Strateole-2 balloon is upsampled and interpolated to Zup, so as to match the sampling rate of 1 Hz and the exact timestamps of the pressure trace P, using a Hann taper in the frequency domain. Over small variations in altitude, the relation between pressure and altitude is quasi-linear. A sliding window of 500 s is run along the P and Zup traces, and a linear regression is applied to determine the coefficients of their relation, valid for the center point of the window. These are then used to produce an auxiliary pressure trace \({P}_{{{{\rm{mod}}}}}\), calculated from Zup. Finally, the corrected pressure trace Pcorr is obtained from \({P}_{{{{\rm{corr}}}}}=P-{P}_{{{{\rm{mod}}}}}\). The different traces and steps of the correction can be found in Supplementary Fig. S3. This processing step partially corrects the balloon buoyancy oscillations and improves the subsequent frequency-time analysis.
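The sliding-window regression at the core of this correction can be sketched in a few lines. The example assumes that the altitude trace has already been resampled to the pressure timestamps, and it uses a synthetic 1 Hz record with hypothetical values (a 50 m oscillation of 200 s period, a pressure gradient of −1.1 Pa m−1, and a short 0.1 Hz signal); the treatment of the window at the trace edges is also an assumption.

```python
import math


def correct_buoyancy(pressure, altitude, half_window):
    """Subtract the altitude-predicted pressure using a sliding linear fit.

    For each sample, P is regressed on Z over the surrounding window and the
    fitted value at the window center is removed (P_corr = P - P_mod).
    """
    n = len(pressure)
    corrected = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        z, p = altitude[lo:hi], pressure[lo:hi]
        z_mean, p_mean = sum(z) / len(z), sum(p) / len(p)
        szz = sum((a - z_mean) ** 2 for a in z)
        szp = sum((a - z_mean) * (b - p_mean) for a, b in zip(z, p))
        slope = szp / szz if szz > 0.0 else 0.0
        p_mod = p_mean + slope * (altitude[i] - z_mean)
        corrected.append(pressure[i] - p_mod)
    return corrected


# Synthetic 1 Hz record: buoyancy oscillation plus a short 0.1 Hz infrasound burst.
t = range(900)
altitude = [18500.0 + 50.0 * math.sin(2.0 * math.pi * k / 200.0) for k in t]
signal = [0.5 * math.sin(2.0 * math.pi * k / 10.0) if 400 <= k < 460 else 0.0 for k in t]
pressure = [7000.0 - 1.1 * (z - 18500.0) + s for z, s in zip(altitude, signal)]

# A 500 s window at 1 Hz corresponds to 250 samples on each side of the center.
p_corr = correct_buoyancy(pressure, altitude, half_window=250)
```

In this synthetic case the oscillation of several tens of Pa is removed while the sub-Pa burst is largely preserved, because the burst is short and nearly uncorrelated with the altitude trace over the window.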

Data processing and arrival picking

Key to the inversion framework is the proper identification and picking of seismic arrival times. An infrasound signal is by nature single-component. Hence, classical techniques for separating P, S, and LRs in 3-component seismic signals based on polarization cannot be used. Instead, we leverage other aspects of these arrivals, namely the impulsive nature of body waves and their envelopes and the dispersive nature of Rayleigh waves, distinguishable in the time-frequency domain.

To pick P and S waves, a two-step method is used. First, the signal is filtered in several frequency bands and its envelope is calculated using a Hilbert transform. For balloon signals, we use the envelope of the low-passed signal below 0.1 Hz, the high-passed signal above 0.05 Hz, and an intermediate signal band-passed between 0.03 and 0.1 Hz. For one-component seismic velocity signals, the signal is first low-passed or high-passed at 1 Hz, or band-passed between 0.02 and 0.8 Hz. Part of the scattering in the envelopes is smoothed by calculating a sliding median over the 5, 10, and 20 s preceding each considered point in time. Using multiple sliding window sizes helps rule out picks in the envelope that could be due to a local scattered arrival. Using this envelope method, a first hypothesis on the arrival of P and S waves and their uncertainty can be made by identifying the start of the P and the S energy envelope. Then, in a second step, these picks are assessed in narrower frequency bands. We construct a filter bank by bandpass-filtering the single-component signal in 10 narrow logarithmic intervals from ~0.001 Hz to the Nyquist frequency of the signal. This method presents advantages for refining the P wave pick, more clearly visible at high frequency, and for confirming the S-wave pick, by identifying a later impulsive arrival spanning multiple intervals of frequency. Despite these two steps, the S wave arrival is only identified with a very large uncertainty in some cases.
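Two building blocks of this picking procedure, the trailing median used to smooth the envelope and the logarithmic frequency intervals of the filter bank, can be sketched as below. The helper names are hypothetical, the envelope is assumed to have been computed beforehand, and including the current sample in the trailing window is an assumption.

```python
from statistics import median


def log_band_edges(f_min, f_max, n_bands):
    """Edges of n_bands contiguous, logarithmically spaced frequency intervals."""
    ratio = (f_max / f_min) ** (1.0 / n_bands)
    return [f_min * ratio ** k for k in range(n_bands + 1)]


def trailing_median(envelope, n_samples):
    """Median of the n_samples ending at (and including) each point."""
    return [median(envelope[max(0, i - n_samples + 1): i + 1])
            for i in range(len(envelope))]


# Ten intervals from 0.001 Hz to the Nyquist frequency of a 1 Hz record.
edges = log_band_edges(0.001, 0.5, 10)

# Hypothetical envelope sampled at 1 Hz: an isolated spike, then a sustained onset.
envelope = [0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 4.0, 5.0, 5.0, 6.0, 6.0, 6.0]
smoothed = {w: trailing_median(envelope, w) for w in (3, 5)}
```

The isolated spike disappears from the smoothed traces whereas the sustained onset survives, with a delay that grows with the window length; comparing several window lengths is what helps separate a genuine phase onset from a local scattered arrival.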

To pick the Rayleigh wave arrivals, we apply a Frequency-Time ANalysis (FTAN) to the single-component signal using the Stockwell transform (also named S-transform), an approach analogous to a Morlet wavelet transform. The LR is identified by its dispersion, and arrival times are picked at different frequencies around the maximum of the dispersed signal. Wavelet or S-transforms optimize the trade-off between time and frequency resolution in FTAN, but arrivals retain a frequency-dependent spread in time, which we interpret as our uncertainty in arrival time.

Post processing of inversion results

McMC inversions return a large number of model parameter samples, from which several statistically meaningful metrics should be extracted. In the Bayesian framework, we are interested in the most probable model given our data and prior, i.e., the model with the maximum posterior probability, referred to as MAP. Although a Monte Carlo inversion returns millions of samples, the curse of dimensionality means that the MAP is not necessarily among them. To estimate the MAP from all our samples, we use the Mean-shift method, through which a subset of \(2\times 1{0}^{4}\) samples is migrated towards one or more regions of high density in the posterior space. The MAP values calculated for the three main inversions in this study are reported in Supplementary Table S4.
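The principle of the Mean-shift method can be illustrated with a minimal implementation: a starting point is moved repeatedly to the kernel-weighted mean of the samples until it stops moving, which places it at a local maximum of the estimated density. The Gaussian kernel, the bandwidth, and the synthetic 2D samples below are assumptions made for the example; with real posterior samples, parameters of different units would first need to be rescaled.

```python
import math
import random


def mean_shift_mode(samples, start, bandwidth, n_iter=100, tol=1e-6):
    """Migrate `start` towards a local density maximum of `samples` (Gaussian kernel)."""
    x = list(start)
    for _ in range(n_iter):
        weights = [
            math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(s, x)) / bandwidth ** 2)
            for s in samples
        ]
        total = sum(weights)
        if total == 0.0:
            break  # start point too far from all samples for this bandwidth
        new_x = [sum(w * s[d] for w, s in zip(weights, samples)) / total
                 for d in range(len(x))]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift < tol:
            break
    return x


# Synthetic "posterior": 2000 samples around (2, -1) with standard deviation 0.5.
rng = random.Random(0)
samples = [(rng.gauss(2.0, 0.5), rng.gauss(-1.0, 0.5)) for _ in range(2000)]
mode = mean_shift_mode(samples, start=samples[0], bandwidth=0.5)
```

Starting the migration from many different samples, rather than from a single one as here, reveals whether the posterior has one or several high-density regions.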

The MAP yields the region of high density throughout all dimensions. We are also interested in the behavior of individual parameters or groups of parameters in lower dimensions. This is done by considering marginal distributions of parameters through histograms or density plots, as was done in the majority of figures throughout this article. In some cases, we apply additional processing to the marginal distribution in order to enhance visualization or enable easier interpretation. The marginal density distribution of subsurface parameters of Figs. 2 and 5 corresponds to a 2D histogram of subsurface models, normalized by a Mean value. This Mean value represents the histogram counts expected if subsurface models were uniformly distributed over the plot area. Thus, the Posterior/Mean scale makes it possible to compare how densely packed posterior models are between two distinct inversions, independently of the inversion parameters.

We apply a similar process to interpret the layer thickness distribution in the posterior models. The posterior distribution of layer thickness can be transformed into a posterior distribution of interface depth using a cumulative summation starting from the top layer. However, comparing this distribution of interface depth to a uniform distribution, for example using a simple histogram, can be misleading, as the sum of the uniform distributions from which the layer thicknesses were drawn is not itself uniform. Here, we instead calculate the ratio of the number of interfaces counted in one depth bin in the posterior distribution to the number predicted by a cumulative prior distribution. For each layer N, the cumulative prior is defined as the cumulative posterior distribution of interface depths for k < N, summed with the prior distribution of thickness for layer N:

$${{{\rm{Prior}}}}({d}_{N})={\sum }_{k=0}^{N-1}{{{\rm{Posterior}}}}({h}_{k})+{{{\rm{Prior}}}}({h}_{N}),$$
(9)

where dN is the depth of interface N. Mixing the prior and posterior distributions in this cumulative prior makes it possible to rule out the effect of overlying layers on the distribution of lower layers. The final interface count ratio is obtained from \( \sim 5\times 1{0}^{5}\) samples from the posterior and from each prior. An interface count ratio greater than one thus means that there is a higher probability that an interface is located at this depth than would be expected from the cumulative prior distribution.
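A possible way to evaluate the interface count ratio from Eq. (9) is sketched below. The function names, the bin edges, the prior bounds, and the degenerate two-layer "posterior" are hypothetical and serve only to show the mechanics: the cumulative prior is sampled by adding a uniform draw of the thickness of layer N to the posterior depth of the interfaces above it.

```python
import random


def histogram(values, edges):
    """Counts of values falling in each half-open bin [edges[b], edges[b + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for b in range(len(counts)):
            if edges[b] <= v < edges[b + 1]:
                counts[b] += 1
                break
    return counts


def interface_count_ratio(posterior_thickness, prior_bounds, layer, edges, rng):
    """Posterior over cumulative-prior counts for the bottom interface of `layer`."""
    lo, hi = prior_bounds[layer]
    post_depth, prior_depth = [], []
    for h in posterior_thickness:
        above = sum(h[:layer])  # posterior depth of the overlying interfaces
        post_depth.append(above + h[layer])
        prior_depth.append(above + rng.uniform(lo, hi))  # one draw of Eq. (9)
    post = histogram(post_depth, edges)
    prior = histogram(prior_depth, edges)
    return [p / q if q > 0 else float("nan") for p, q in zip(post, prior)]


# Hypothetical case: the data pin the second interface at exactly 22 km depth.
rng = random.Random(0)
prior_bounds = [(0.2, 5.0), (5.0, 30.0)]          # km, uniform thickness priors
posterior_thickness = [(2.0, 20.0)] * 5000        # (h_0, h_1) for each sample
edges = [5.0, 15.0, 25.0, 35.0]                   # km, depth bins
ratio = interface_count_ratio(posterior_thickness, prior_bounds, 1, edges, rng)
```

In this example the ratio exceeds one only in the bin containing 22 km and vanishes elsewhere, which is the signature expected for a well-resolved interface.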