Main

On subgalactic scales, observational evidence for the ΛCDM cosmological paradigm has been inconclusive due to the complex and highly nonlinear interplay between baryons, radiation and dark matter1. A major challenge for conducting robust observational tests of ΛCDM in the low-mass regime (108M) is the inherent difficulty in characterizing the luminosities, sizes and total masses of dim objects at large distances. Low-mass luminous objects, such as ultracompact dwarf galaxies and globular clusters, have so far been directly observed only within the local Universe out to z ≈ 0.4 (refs. 2,3,4,5,6), whereas studies leveraging strong lensing by galaxy clusters have detected dwarf galaxies7, star clusters8,9,10 and even individual stars11,12 up to z ≈ 10. However, such observations are sensitive to only the emitted light and any constraints on the total mass of such objects must be inferred indirectly. Other low-mass objects are intrinsically dark, with dark matter haloes13,14 and isolated intermediate-mass black holes15 expected to have no observable electromagnetic (EM) signatures.

Strong gravitational lensing offers a powerful alternative pathway for studying low-mass objects with little to no EM luminosity. A spatially extended source in a galaxy-scale strong lens system acts as a backlight for the gravitational landscape of its lens galaxy, revealing low-mass perturbers through their gravitational effects alone. Furthermore, lens galaxies typically lie in the redshift range 0.2 z 1.5 (ref. 16), which means that low-mass, low-luminosity objects can be detected and studied across cosmic time. To date, observations of galaxy-scale lenses with resolved arcs have been used to detect three low-mass perturbers: an ~109M detection in SDSS J0946+1006 (refs. 17,18,19,20,21,22,23), an ~108M detection in JVAS B1938+666 (refs. 20,23,24,25) (the lens studied in the present work) and an ~109M detection in SDP.81 (refs. 26,27). These were discovered in observations with the Hubble Space Telescope (HST; 120 mas full-width at half-maximum angular resolution), the W. M. Keck Observatory adaptive optics system (Keck AO; 70 mas) and the Atacama Large Millimeter Array (ALMA; 23 mas), respectively. Angular resolution is the limiting factor in detecting low-mass perturbers28, as the mass of a perturber sets a characteristic gravitational lensing length scale that must be resolved for a detection to be possible20,29. Therefore, expanding the mass range that we can robustly probe necessitates that we use strong lens observations at the highest possible angular resolution.

JVAS B1938+666 (ref. 30) is a strong lens system consisting of an early-type galaxy at zl = 0.881 (ref. 31) lensing both optical and radio emission from a source galaxy at zs = 2.059 (ref. 32) (Fig. 1). For this work, we analysed a radio observation of JVAS B1938+666 performed using a global very long baseline interferometry (VLBI) array at 1.7 GHz, with a resulting angular resolution of~5 mas (ref. 33). In the radio, JVAS B1938+666 contains a bright quadruply imaged feature, two images of which merge into an extremely thin arc, about 200 mas in length, as well as a fainter doubly imaged component.

Fig. 1: The strong gravitational lens system JVAS B1938+666.
figure 1

Left, our best surface brightness model of the 1.7-GHz global VLBI observation used here, which has been reconvolved with the main lobe of the interferometer’s point spread function and added to the residuals (34 μJy per beam r.m.s.). For reference, red contours show the surface brightness at 2.1 μm observed by the W. M. Keck Observatory adaptive optics system30. The positions of two low-mass perturbers are each marked with a black X. The 2 × 108M object first detected by ref. 24 is labelled \({\mathcal{A}}\), and the 1.13 × 106M detection reported here is labelled \({\mathcal{V}}\). The zoomed-in region shown in the right-hand two panels is indicated by the black square, which has a side length of 60 mas. Top right, detail of the bright arc around \({\mathcal{V}}\), with the colour scale modified to emphasize the gap in the arc produced by the gravitational perturbation of \({\mathcal{V}}\). Bottom right, GI corrections to the lensing convergence (expressed in units of lens-plane surface mass density), showing a compact, positive feature whose position and mass are consistent with the independent parametric modelling results for \({\mathcal{V}}\). The dashed black circle has a radius of 80 pc and the lensed emission is indicated by the black contours.

We used a visibility-plane Bayesian forward-modelling technique34,35,36,37,38 to jointly infer both the pixellated source surface-brightness distribution and the lens mass model. We initially modelled the lens mass distribution in a fully parametric fashion to infer the large-scale mass distribution of the lens galaxy (the ‘macromodel’). In addition, we included a truncated isothermal (pseudo-Jaffe (PJ)) parametric mass component for the previously detected ~108M perturber, which we label \({\mathcal{A}}\) hereafter. The VLBI observation of JVAS B1938+666 is sensitive to the presence of \({\mathcal{A}}\), with a Bayesian log-evidence of \(\Delta \log {\mathcal{E}}=+16\) over a completely smooth macromodel containing no low-mass perturbers.

To search for additional low-mass haloes in JVAS B1938+666, we next applied the gravitational imaging (GI) technique17,24,34,39, which is a pixellated, non-parametric method for finding local overdensities in strong lens systems. In addition to the parametric macromodel and perturber \({\mathcal{A}}\), we recovered a single strong, compact, positive correction to the surface mass density along the bright extended arc (Fig. 1). The presence of this feature is robust to the choice of GI pixel size and Bayesian prior. We hereafter label this new detection as \({\mathcal{V}}\).

Figure 2 shows a visual explanation of the detection mechanism. Examining a comparison of reconstructed sky models with and without detection \({\mathcal{V}}\), it is clear that \({\mathcal{V}}\) corrects a >5σ peak in the residuals map. Without perturber \({\mathcal{V}}\), the source model attempts (unsuccessfully) to absorb a defect in the lens model, resulting in a discontinuity in the reconstructed image of the bright radio lobe. When \({\mathcal{V}}\) is introduced in the lens model, the caustic curve folds back over onto the source, simultaneously removing the surface-brightness discontinuity and improving the model residuals. Hence, the drastic improvement in \(\Delta \log {\mathcal{E}}\) in the presence of perturber \({\mathcal{V}}\) is the result of both improved χ2 and a reduced regularization penalty due to a better-behaved source reconstruction.

Fig. 2: Visual comparison of models without and with perturber \({\mathcal{V}}\).
figure 2

Top row, pixellated source surface-brightness maps. In the model without \({\mathcal{V}}\), the source model attempts to fit the gap in the bright lensed arc, resulting in a sharp discontinuity in the surface brightness along the lensing caustic (left column). The gravitational effect of \({\mathcal{V}}\) folds the caustic over onto the bright radio lobe, correcting this discontinuity and recovering a smooth, contiguous image (right column). The inset region around the bright, quadruply imaged source component has a side length of 4 mas. Also note the presence of a much fainter, doubly imaged source component ~50 mas to the north of the bright component. Middle row, lens-plane surface-brightness maps (reconvolved with the main lobe of the point spread function). Note that the gap in the arc is still present in the model without \({\mathcal{V}}\); in this case, the source has attempted to absorb the discontinuity. Bottom row, residuals, which have been normalized in the visibility plane and Fourier transformed into the image plane for ease of visualization. The inclusion of \({\mathcal{V}}\) corrects a >5σ peak in the residuals (corresponding to a 2% change in surface brightness) near the intersection between the critical curve and the lensed arc. The zoomed-in region in the middle and bottom rows is the same as the inset region in Fig. 1.

To confirm detection of \({\mathcal{V}}\) using a complementary technique, we repeated the analysis using a fully parametric model for \({\mathcal{V}}\) (Table 1). We found that a PJ profile with free truncation radius is preferred over a lens model containing only the macromodel and \({\mathcal{A}}\) by \(\Delta \log {\mathcal{E}}=+348\), corresponding to a detection significance of 26σ. The inferred parameters for this model were \({m}_{{\rm{tot}},{\mathcal{V}}}=(2.8\pm 0.3)\times 1{0}^{6}\,{M}_{\odot }\) and a truncation radius \({r}_{\rm{t},{\mathcal{V}}}=149\pm 18\,{\rm{pc}}\). Because \({m}_{{\rm{tot}},{\mathcal{V}}}\) and \({r}_{\rm{t},{\mathcal{V}}}\) are highly correlated, for a more physically meaningful parametrization of the properties of \({\mathcal{V}}\) we define \({m}_{80,{\mathcal{V}}}\) to be the cylindrical mass of \({\mathcal{V}}\) enclosed within a projected radius of 80 pc on the lens plane. This is the radius at which the enclosed mass exhibits the smallest statistical uncertainty (equivalently, where the enclosed mass and \({r}_{\rm{t},{\mathcal{V}}}\) decorrelate completely), indicating that the VLBI observation of JVAS B1938+666 is primarily sensitive to \({m}_{80,{\mathcal{V}}}\). We found \({m}_{80,{\mathcal{V}}}=(1.13\pm 0.04)\times 1{0}^{6}\,{{\rm{M}}}_{\odot }\). This measurement was robust across various model choices for \({\mathcal{A}}\) and \({\mathcal{V}}\), and is also consistent with the gravitational imaging results: across six different GI runs, the projected mass within a radius of 80 pc was in the range \(8.3\times 1{0}^{5}\,{M}_{\odot }\le {m}_{80,{\mathcal{V}}}\le 1.8\times 1{0}^{6}\,{M}_{\odot }\) (Fig. 3). The inferred position of \({\mathcal{V}}\) was extremely precise, with an uncertainty of 194 μas in the y direction and 86 μas in the x direction, corresponding to 1.5 pc and 0.7 pc at the lens redshift, respectively.

Table 1 Properties of low-mass perturber \({\mathcal{V}}\) inferred from the fully parametric analysis and their 1σ uncertainties

If the perturber \({\mathcal{V}}\) is a dark matter halo, its presence is consistent with the expected number of detectable subhaloes in a ΛCDM cosmology. In a \(1.06\times 1{0}^{-2}\,{{\rm{arcsec}}}^{2}\) sensitive region around all lensed images (roughly one beam width), the probability of detecting at least one cold dark matter (CDM) subhalo in the [106, 107] M mass bin is p(n ≥ 1) = 0.65. This probability falls to P = 0.36 for 9.1 keV thermal relic warm dark matter (WDM) and P = 0.14 for 4.6 keV WDM. Although detection of \({\mathcal{V}}\) presents no strong tension with CDM or WDM models in terms of halo abundance, its density profile may be problematic. In a companion paper (Vegetti et al., manuscript in preparation), a range of different parametric density profiles beyond PJ are fitted to this detection and an analysis of its implications for cold, warm and self-interacting dark matter models is presented. Interestingly, the best-fit Navarro–Frenk–White (NFW) subhalo model from that analysis has a concentration that is many σ higher than the expectation based on simulated CDM subhaloes40, which is an intriguing result similar to that found for object \({\mathcal{A}}\) in JVAS B1938+666 and for the ~109M object in SDSS J0946+1006 (refs. 20,23).

When considering profiles for objects other than dark matter haloes, Vegetti et al. (manuscript in preparation) find that parametric profiles for intermediate-mass black holes15 and globular clusters41 are disfavoured at high statistical significance, although a relatively extended ultracompact dwarf galaxy is not ruled out42. A more definitive statement on what type of object \({\mathcal{V}}\) is will require deep optical/infrared observations to detect any potential EM emission, a task made more challenging by the lensed optical emission seen in the Keck AO image of JVAS B1938+666 (Fig. 1).

Given the robustness of detection of \({\mathcal{V}}\) in the gravitational imaging analysis, in tandem with its extremely high significance in independent parametric modelling, we believe \({\mathcal{V}}\) to be a 106M object that has been directly detected at a cosmological distance using only its gravitational lensing effect. The precision measurement of its mass, size and position is unprecedented for an object in this mass range at this distance. This result was made possible by milli-arcsecond-resolution global VLBI observation, analysed using state-of-the-art lens modelling software. This work establishes VLBI strong lens observations as the only currently viable tool for discovering and precisely measuring properties of low-luminosity objects in the 106M mass regime at cosmological distances.

Methods

Observation

The VLBI observation of JVAS B1938+666 used for this work was performed using a global VLBI array combining antennas from the Very Long Baseline Array (VLBA) and European VLBI Network (EVN) at 1.7 GHz for 14 h at a data recording rate of 512 Mbits s−1 (ID: GM068; Principal Investigator: McKean). The observing strategy followed a standard phase referencing mode only for the VLBA part (phase calibrator J1933+654). Throughout the observations, scans on the fringe finder 3C454.3 were also included every ~4 h for the bandpass calibration. The correlation of the data from the 19 antennas was performed at the Joint Institute for VLBI-European Research Infrastructure Consortium (JIV-ERIC) and resulted in a dataset with eight intermediate frequencies, each with 8-MHz bandwidth and divided into 32 channels for two circular polarizations (RR and LL).

The data were processed within the Astronomical Image Processing System (aips)43 package following a typical calibration procedure for phase-referenced observations. The resulting deconvolved image was obtained with a restoring beam of 7.4 mas × 4.7 mas at a position angle of 32.1 degrees east of north33. The noise, assumed to be Gaussian and uncorrelated, was measured directly in the Fourier domain by subtracting time-adjacent visibilities to remove the source signal, then computing the root mean square (r.m.s.) within 30-minute time intervals. The triangle of highly sensitive baselines between Effelsberg, Jodrell Bank and Westerbork were flagged to prevent them from dominating the model. The baselines Green Bank–Hancock, Green Bank–Owens Valley, Green Bank–Pie Town and Los Alamos–Pie Town were also flagged for several 1- to 2-h time intervals owing to strong radio frequency interference.

The surface-brightness distribution of the background radio source consists of two radio lobes (hot spots), the brighter of which is lensed into an extended gravitational arc, whereas the fainter one is doubly imaged (Fig. 1). The physical properties of the source are derived and discussed in ref. 33.

Bayesian inference

We use the visibility-plane gravitational lens modelling software PRONTO34,35,36,37,38 to jointly infer the pixellated source surface-brightness vector s and lens parameters ηH from the observed visibilities d, as well as to obtain the Bayesian log-evidence for each model parametrization H. At each likelihood evaluation (the first level of inference), we obtain the maximum a posteriori source sMP for a given ηH and source regularization weight λs by solving

$${A}\,{{\mathbf{s}}}_{{\rm{MP}}}=(DL)^{\rm{T}}{C}^{-1}{\mathbf{d}},$$
(1)

where

$$A\equiv \left[{(DL)}^{\rm{T}}C^{-1}DL+{\lambda }_{{\mathbf{s}}}R_{{\mathbf{s}}}\right].$$
(2)

Here \(L({{\boldsymbol{\eta }}}_{H})\) is the lens operator, which maps light from the source plane to the lens plane, \({\lambda }_{{\boldsymbol{s}}}R_{{\boldsymbol{s}}}\) is the source prior covariance, which enforces the lens equation by penalizing strong surface-brightness gradients and \(C^{-1}\) is the noise covariance of the data. We solve equation (1) using a preconditioned conjugate gradient solver, where the Fourier operator \(D\) is implemented using a non-uniform fast Fourier transform. We refer the reader to ref. 37 for further details on the method.

In the second level of inference, we sample the lens parameters ηH and source regularization weight λs. The posterior is

$$P({{\boldsymbol{\eta }}}_{H},{\lambda }_{{\mathbf{s}}}| {\mathbf{d}})=\frac{P({\mathbf{d}}| {{\boldsymbol{\eta }}}_{{H}},{\lambda }_{{\mathbf{s}}})\,P({{\boldsymbol{\eta }}}_{{H}})\,P({\lambda }_{{\mathbf{s}}})}{P({\mathbf{d}}| {H}\;)}.$$
(3)

The likelihood (which is the evidence from the source-inversion step) is

$$\begin{array}{rcl}2\log P({\mathbf{d}}| {{\boldsymbol{\eta }}}_{{\rm{H}}},{\lambda }_{{\mathbf{s}}})&=&-{\chi }^{2}-{\lambda }_{{\mathbf{s}}}{{\mathbf{s}}}_{{\rm{MP}}}^{\rm{T}}R_{{\mathbf{s}}}\,{{\mathbf{s}}}_{{\rm{MP}}}-\log \det A\\ &&+\log \det ({\lambda }_{{\mathbf{s}}}R_{{\mathbf{s}}})+\log \det (2\uppi C^{-1}).\end{array}$$
(4)

This expression follows from the marginalization over all possible sources s when the noise and source prior are both Gaussian. As \({{{{\boldsymbol{R}}}}}_{{\mathbf{s}}}\) and \({{{{\boldsymbol{C}}}}}^{-1}\) are sparse, the terms containing them are straightforward to evaluate. Computing \(\log \det A\) is non-trivial; we approximate it using the preconditioner from the inference on sMP as described in ref. 37. The χ2 term is

$${\chi }^{2}={(DL{{\mathbf{s}}}_{{\rm{MP}}}-{\mathbf{d}})}^{\rm{T}}C^{-1}(DL{{\mathbf{s}}}_{{\rm{MP}}}-{\mathbf{d}}).$$
(5)

We optimize its evaluation using the fast-χ2 technique from ref. 38.

We compared different model parametrizations, H, using the log-evidence \(\log {{\mathcal{E}}}_{{\rm{H}}}\equiv \log P({\boldsymbol{d}}| {\rm{H}})\), which was obtained by marginalizing over all parameters. We used the nested sampling algorithm MULTINEST44 to compute the log-evidence and to sample the posterior distributions of ηH and λs.

Our method takes both noise and priors to be Gaussian. From a practical standpoint, these assumptions make our analysis computationally tractable by allowing the marginalization over all possible configurations of the source and pixellated potential corrections (‘Gravitational imaging’) using a single linear solution. Although it is the case that non-Gaussian noise statistics can, in principle, arise from calibration errors or correlated instrumental effects, we do not expect this to be a major issue given the high signal-to-noise ratio of the observation. Nevertheless, we tested our analysis on several independent sets of calibration solutions, including on individual spectral windows, and found our results to be robust across all cases. Most reassuringly, we found that the gap in the bright arc induced by object \({\mathcal{V}}\) cannot be suppressed by the self-calibration process.

Similarly, although there exist techniques for applying highly informative generative machine-learning models as non-Gaussian priors for ‘realistic’ sources and potentials (for example, ref. 45), these are susceptible to systematic biases learned from the training data (which are themselves taken from numerical simulations). False-positive subhalo detections are a particularly relevant concern for this work. Instead, we opt for highly uninformative Gaussian priors, which are agnostic to the shapes of the source surface brightness and the linearized potential corrections, which thus maximizes the ability of the data to drive the inference.

Parametric lens model components

We now introduce the surface mass density profiles used for parametric modelling in this work. All quantities labelled κ are in units of the critical surface mass density for strong lensing,

$${\varSigma }_{{\rm{cr}}}=\frac{{c}^{2}\,{D}_{\rm{s}}}{4\uppi \,G\,{D}_{\rm{ls}}\,{D}_{\rm{l}}},$$
(6)

where Ds, Dl and Dls are the angular diameter distances from observer to source, from observer to lens and from lens to source, respectively. Assuming a Planck 2015 (ref. 46) cosmology, \({\varSigma }_{{\rm{cr}}}=1.50\times 1{0}^{11}\,{M}_{\odot }\,{{\rm{arcsec}}}^{-2}\) for JVAS B1938+666 (zs = 2.059, zl = 0.881).

We model the smooth galaxy-scale mass distribution (the macromodel) as an elliptical power-law mass distribution (for example, ref. 47), with a projected surface mass density of

$$\kappa (\xi\;)=\frac{3-\gamma }{2}{\left(\frac{{R}_{{\rm{E}}}}{\xi }\right)}^{\gamma -1},$$
(7)

which we define in terms of an effective Einstein radius RE and the elliptical radius ξ2 ≡ x2q + y2/q, where q is the axis ratio, and γ is the three-dimensional logarithmic slope, where γ = 2 is isothermal. The additional parameters x0, y0 and θ0 set the position and orientation of the profile. We use FASTELL48 to compute the corresponding deflection angles.

To account for non-ellipticity in the lens galaxy (for example, diskiness or boxiness), we include multipole perturbations of orders m = 3 and 4, parametrized as

$${\kappa }_{m}(r,\theta )={\left(\frac{r}{1{\rm{arcsec}}}\right)}^{-(\gamma -1)}\left[{a}_{m}\sin (m\theta )+{b}_{m}\cos (m\theta )\right],$$
(8)

which we write in circular polar coordinates here for readability. The coefficients am and bm encode the amplitude and orientation of each multipole term and γ is fixed to that of the underlying elliptical power-law model (equation (7)). We use Gaussian priors for am and bm with μ = 0 and σ = 0.01, a choice motivated by the results of numerical simulations of massive elliptical galaxies (for example, ref. 49), which is also consistent with previous lens modelling results23,38,50. In addition, we include an external shear term in the macromodel, with strength Γ and direction θΓ.

For parametric modelling of the low-mass perturbers \({\mathcal{A}}\) and \({\mathcal{V}}\) (Table 1), we use the following spherically symmetric mass profiles. Models PJ_free and PJ_tidal are truncated isothermal profiles (PJ; for example, refs. 47,51) with a projected surface mass density of

$$\kappa (r)=\frac{{m}_{{\rm{tot}}}}{2\pi \,{r}_{t}^{2}{\Sigma }_{{\rm{cr}}}}\,\left(\frac{1}{x}-\frac{1}{\sqrt{{x}^{2}+1}}\right),$$
(9)

where rt is the truncation radius, x = r/rt and mtot is the total mass. We leave rt as a free parameter for PJ_free, whereas for PJ_tidal we set rt to be the tidal radius

$${r}_{t}=\frac{2R}{3}{\left(\frac{{m}_{{\rm{tot}}}}{2{M}_{{\rm{lens}}}( < R)}\right)}^{\frac{1}{3}}.$$
(10)

Here R is the three-dimensional radius to the centre of the lens and Mlens (<R) is the mass of the main lens enclosed within R; see ref. 20. Because we have no knowledge of the three-dimensional position of a low-mass perturber within the lens, we take R to be the projected distance to the perturber in the plane of the lens.

In the case of NFW_sub, we model perturber \({\mathcal{A}}\) as a spherical NFW profile47,52, with a surface mass density of

$$\kappa (x)=2{\kappa }_{s}\frac{1-F(x)}{{x}^{2}-1},$$
(11)

where x = r/rs, κs = ρsrs/Σcr and

$$F(x)=\left\{\begin{array}{ll}\frac{1}{\sqrt{{x}^{2}-1}}\,{\tan }^{-1}\sqrt{{x}^{2}-1}\quad &x > 1\\ \frac{1}{\sqrt{1-{x}^{2}}}\,{\tanh }^{-1}\sqrt{1-{x}^{2}}\quad &x < 1\\ 1\quad &x=1.\end{array}\right.\,$$
(12)

We found that using a mass-concentration-redshift relationship for CDM haloes (for example, ref. 53) leads to virial concentrations that are much too low to produce the required lensing effect for a halo of a given virial mass (see also refs. 20,23). Therefore, we leave the concentration of NFW_sub as a free parameter in this work. The redshift is fixed to that of the main lens.

GI

GI (refs. 17,24,34,39,54) is a technique for recovering pixellated (non-parametric) corrections to the lensing convergence. GI is well suited for finding local overdensities that have not been accounted for in the parametric mass model, ηH, which remains fixed during the GI procedure. Our GI formulation augments the lens operator (for example, equation (1)) with an extra block column representing linearized degrees of freedom in the lensing potential:

$$L_{{\rm{GI}}}\equiv \left[L\,| \,-LG_{{\mathbf{s}}}G_{{\boldsymbol{\psi }}}\right],$$
(13)

where

$${G}_{{\mathbf{s}}}\equiv \frac{\partial {\mathbf{s}}}{\partial {\mathbf{x}}}$$
(14)

is the matrix of source gradients and

$${G}_{{\boldsymbol{\psi }}}\equiv \frac{\partial {\boldsymbol{\alpha }}}{\partial {\boldsymbol{\psi }}},$$
(15)

is the matrix of derivatives of the deflection angles α with respect to the pixellated lensing potential ψ. Operator \(L_{{\rm{GI}}}\) acts on the concatenated vector of the pixellated source and linearized potential corrections

$${\mathbf{r}}\equiv \left(\begin{array}{c}{\mathbf{s}}\\ {\boldsymbol{\psi }}\end{array}\right),\,\,\Delta {\mathbf{r}}\equiv \left(\begin{array}{c}\Delta {\mathbf{s}}\\ \Delta {\boldsymbol{\psi }}\end{array}\right)$$
(16)

Similarly, we define the block regularization matrix

$${R}_{{\rm{GI}}}\equiv \left(\begin{array}{cc}{\lambda }_{{\mathbf{s}}}{R}_{{\mathbf{s}}}&0\\ 0&{\lambda }_{{\boldsymbol{\psi }}}{R}_{{\boldsymbol{\psi }}}\end{array}\right).$$
(17)

For this work, we apply three different forms for the potential regularization operator \({R}_{{\boldsymbol{\psi }}}\), checking them against one another to ensure the robustness of our results. These penalize total mass, surface density gradients or surface density curvature. In each case, \(R_{{\boldsymbol{\psi }}}\) is assembled using standard finite-difference operators for the gradient and Laplacian on a Cartesian grid.

It can be shown that the proper linearization over ψ yields a linear system analogous to equation (1):

$$\left[{(DL_{{\rm{GI}}})}^{\rm{T}}C^{-1}(DL_{{\rm{GI}}})+R_{{\rm{GI}}}\right]\,\Delta {{\mathbf{r}}}_{{\rm{MP}}}=-{(DL_{{\rm{GI}}})}^{\rm{T}}C^{-1}(DL{\mathbf{s}}-{\mathbf{d}})-{R}_{{\rm{GI}}}\,{\mathbf{r}}.$$
(18)

Formally, equation (18) represents a single Newton iteration in the space of s and ψ. After iterating until convergence, it can be shown that the likelihood with respect to the regularization weights, \(\log P({\mathbf{d}}| {\lambda }_{{\mathbf{s}}},{\lambda }_{{\boldsymbol{\psi }}})\) is obtained by substituting s → r, \({\lambda }_{{\mathbf{s}}}{R}_{{\mathbf{s}}}\to {R}_{{\rm{GI}}}\) and \({L}\to {R}_{{\rm{GI}}}\) in equation (4). A single likelihood evaluation in our GI procedure thus consists of the following.

  1. (1)

    Initialize r = 0.

  2. (2)

    Solve for ΔrMP.

  3. (3)

    Update r = r + ΔrMP.

  4. (4)

    Repeat steps 2 and 3 until convergence. We find that 10 iterations are generally sufficient for convergence; in practice, we use 20.

  5. (5)

    Evaluate \(\log P({\mathbf{d}}| {\lambda }_{{\mathbf{s}}},{\lambda }_{{\boldsymbol{\psi }}})\) using equation (4).

Note that our GI implementation differs from that of ref. 54, who used a single iteration per likelihood evaluation. Our approach of iterating to convergence allows the GI algorithm to more easily recover compact features in the lensing potential.

We use a standard simplex optimization algorithm to maximize the likelihood with respect to the regularization weights λs and λψ, where each likelihood evaluation requires 20 subiterations as described above. We used starting values for λs and λψ of 109 and 1013, respectively, with initial logarithmic step sizes of 0.5 dex. After optimizing for λs and λψ, we convert ψ into corrections to the lensing convergence (surface mass density) by means of \({\kappa }_{{\rm{GI}}}=-\frac{1}{2}{\nabla }^{2}{\boldsymbol{\psi }}\).

Our use of Gaussian priors for the linearized potential corrections was motivated by the need for an uninformative and computationally fast regularization term. A natural consequence of this choice is that ψ and hence κGI, prefer to resemble Gaussian random fields in regions where the data are uninformative. To aid in our interpretation of the GI results, we apply a 3σGI threshold to the convergence maps, treating any convergence over the threshold as a real feature. We compute σGI from the r.m.s. of the convergence corrections within the mask used for modelling the lensed images. The value of σGI must be empirically estimated in this way, as the operator-based iterative solver framework (‘Bayesian inference’) precludes direct manipulation of the posterior covariance matrix.

Expected number counts of detectable subhaloes

We express the differential mass function for WDM subhaloes in terms of the ‘half-mode mass’ Mhm, as

$$\frac{{\rm{d}}n}{{\rm{d}}m}={m}^{\alpha }{\left[1+{\left({\alpha }_{2}\frac{{M}_{{\rm{hm}}}}{m}\right)}^{\beta }\right]}^{\gamma }$$
(19)

with α = −1.9, α2 = 1.1, β = 1.0 and γ = −0.5 (refs. 55,56). The mass m is the total mass of a PJ subhalo. The expected number of subhaloes in the mass range \({M}_{\min }=1{0}^{6}\,{{M}}_{\odot }\) to \({M}_{\max }=1{0}^{7}\,{{M}}_{\odot }\) is then

$${\mu }_{{\rm{sub}}}={A}_{{\rm{sens}}}\,{f}_{{\rm{sub}}}\,\frac{{M}_{{\rm{lens}}}( < 2{R}_{{\rm{E}}})}{4\uppi {R}_{{\rm{E}}}^{2}}\frac{\mathop{\int}\nolimits_{\ln {M}_{\min }}^{\ln {M}_{\max }}m\frac{{\rm{d}}n}{{\rm{d}}m}\,{\rm{d}}\ln m}{\mathop{\int}\nolimits_{\ln {M}_{\min }}^{\ln {M}_{\max }}{m}^{\alpha +2}\,{\rm{d}}\ln m},$$
(20)

where RE is the Einstein radius of the lens and Mlens(<2RE) is the total projected mass of the lens within twice the Einstein radius. The variable fsub is the fraction of dark matter (normalized with respect to the CDM mass function) contained in subhaloes; we use fsub = 0.012 (ref. 57). The variable Asens is the area around the lensed arcs that we expect to be sensitive to the presence of subhaloes, which we obtain by thresholding the deconvolved image at 5σ above the noise for a total area of \({A}_{{\rm{sens}}}=1.06\times 1{0}^{-2}\,{{\rm{arcsec}}}^{2}\). This area roughly corresponds to one primary beam width (~5 mas) along each bright arc. Defining Asens in such a tight region around the brightest parts of the lensed arcs is an intentionally conservative choice made to ensure that all subhaloes 106M within this region have been detected. Our use of a constant sensitivity within Asens is a rough but conservative approximation to more sophisticated sensitivity mapping techniques (for example, refs. 28,56).

Parametric modelling: macromodel and \({\mathcal{A}}\) only

The results of the fully parametric modelling are summarized in Table 1. We first consider those fully parametric models consisting of the macromodel and the previously detected perturber \({\mathcal{A}}\) (‘Parametric lens model components’). Because \({\mathcal{A}}\) lies ~50 mas away from the nearest lensed radio emission, its redshift cannot be robustly constrained by our VLBI observation alone. For this work, we therefore assume that \({\mathcal{A}}\) lies at the redshift of the main lens.

The best parametric models for \({\mathcal{A}}\), in terms of Bayesian log-evidence \(\Delta \log {\mathcal{E}}\), are truncated isothermal subhalo models (PJ), with either free or fixed (tidal) truncation radii being equally preferred (\({\mathcal{A}}=\) PJ_free or \({\mathcal{A}}=\) PJ_tidal; \(\Delta \log {\mathcal{E}}=16\)). We additionally tested an NFW profile with free virial concentration (\({\mathcal{A}}=\)NFW_sub; \(\Delta \log {\mathcal{E}}=13\)) as well as a completely smooth macromodel (\({\mathcal{A}}=\)None; \(\Delta \log {\mathcal{E}}\equiv 0\)). For simplicity, we chose the best model with fewer free parameters, \({\mathcal{A}}=\) PJ_tidal, as our fiducial model for the GI procedure. We found that the choice of profile \({\mathcal{A}}\) does not affect the inferred properties of \({\mathcal{V}}\).

GI detection of \({\mathcal{V}}\)

The results of the GI modelling, applied to the parametric model with \({\mathcal{A}}=\) PJ_tidal, are shown in Fig. 1 and Supplementary Fig. 1. For three different regularization types (penalizing the convergence, gradient of convergence or curvature of convergence) and two different grid resolutions (NGI = 512 and NGI = 1,024, corresponding to pixel sizes of ΔxGI = 3.5 mas and ΔxGI = 1.8 mas, respectively), we consistently recover a compact density correction well above the 3σGI threshold used to identify bona fide features in the surface mass density that were unaccounted for during the initial parametric modelling. We label this feature \({\mathcal{V}}\), as it was discovered using a VLBI observation of JVAS B1938+666.

Defining \({m}_{80,{\mathcal{V}}}\) to be the cylindrical mass contained within a projected radius of 80 pc on the lens plane (‘Parametric modelling: macromodel, \({\mathcal{A}}\) and \({\mathcal{V}}\)’), we found \(8.3\times 1{0}^{5}\,{{M}}_{\odot }\le {m}_{80,{\mathcal{V}}}\le 1.8\times 1{0}^{6}\,{{M}}_{\odot }\) for the six GI runs. The scatter in GI masses is due to the large number of degrees of freedom in the pixellated convergence map, combined with the GI technique’s lack of prior knowledge on the approximate sphericity of gravitationally bound astrophysical objects. GI is an important technique for identifying and estimating the masses of positive density corrections to a parametric lens model; however, it is necessary to verify detection \({\mathcal{V}}\) with independent parametric modelling.

Parametric modelling: macromodel, \({\mathcal{A}}\) and \({\mathcal{V}}\)

We repeat the fully parametric modelling procedure, this time also including a parametric profile for perturber \({\mathcal{V}}\). The results are summarized in Table 1 and Supplementary Fig. 3. We assume, for this work, that perturber \({\mathcal{V}}\) lies at the lens redshift of z = 0.881, using a truncated isothermal profile (PJ; ‘Parametric lens model components’). We find that, regardless of the choice of profile for detection \({\mathcal{A}}\), the inclusion of a parametric model for \({\mathcal{V}}\) increases the log-evidence by at least \(\Delta \log {\mathcal{E}} > 332\) (>25σ) in all cases. Models \({\mathcal{A}}=\) PJ_free, \({\mathcal{V}}=\) PJ_free and \({\mathcal{A}}=\) PJ_tidal, \({\mathcal{V}}=\) PJ_free are equally preferred. As before, we chose \({\mathcal{A}}=\) PJ_tidal, \({\mathcal{V}}=\) PJ_free with fewer free parameters for a more detailed discussion, and treat it as our ‘best model’, with a significance of \(\Delta \log {\mathcal{E}}=364\) over the smooth macromodel and \(\Delta \log {\mathcal{E}}=348\) (26σ) over \({\mathcal{A}}=\) PJ_tidal, \({\mathcal{V}}=\)None. For the rest of the discussion, it is implied that \({\mathcal{A}}=\) PJ_tidal.

In the \({\mathcal{V}}=\) PJ_free parametrization, \({\mathcal{V}}\) has a total mass of \({m}_{{\mathcal{V}}}=(2.82\pm 0.26)\times 1{0}^{6}\,{M}_{\odot }\) and a truncation radius \({r}_{\rm{t},{\mathcal{V}}}=149\pm 18\,{\rm{pc}}\). To express the inferred mass in a way that is independent of \({r}_{\rm{t},{\mathcal{V}}}\), we found 80 pc to be the radius at which the enclosed (projected) mass \({m}_{80,{\mathcal{V}}}\) is uncorrelated with \({r}_{\rm{t},{\mathcal{V}}}\). For \({\mathcal{V}}=\) PJ_free, \({m}_{80,{\mathcal{V}}}=(1.13\pm 0.04)\times 1{0}^{6}\,{M}_{\odot }\), which is consistent with the gravitational imaging results (Fig. 3). For \({\mathcal{V}}=\) PJ_tidal, \({m}_{80,{\mathcal{V}}}=(1.07\pm 0.04)\) agrees with \({\mathcal{V}}=\) PJ_free, which indicates that the enclosed mass at 80 pc is indeed particularly well constrained by the data. The position of \({\mathcal{V}}\) is constrained to 194 μas precision in the y direction and 86 μas in the x direction. For all models, \({m}_{80,{\mathcal{V}}}\), \({x}_{{\mathcal{V}}}\) and \({y}_{{\mathcal{V}}}\) were consistent within the 1σ uncertainties regardless of the choice of profile \({\mathcal{A}}\).

Fig. 3: Comparison of mass profiles of detection \({\mathcal{V}}\), in terms of cylindrical enclosed mass Mcyl (<r), for both gravitational imaging and parametric modelling procedures.
figure 3

For the gravitational imaging models, we compared three different regularization types (penalizing the convergence, gradient of convergence or curvature of convergence) and two different grid resolutions (NGI = 512 and NGI = 1,024, corresponding to pixel sizes of 3.5 mas and 1.8 mas, respectively). The profiles were consistent to within ~50% at a radius of 80 pc (vertical dashed line). The discrete steps in the GI curves are due to the pixellated nature of the convergence corrections. Shaded regions denote the 1σ uncertainties in the enclosed mass profiles.

The tidal truncation radius of \({\mathcal{V}}=\) PJ_tidal (equation (10)) is \({r}_{\rm{t},{\mathcal{V}}}=53\pm 1\,{\rm{pc}}\), which is a factor of three smaller than that inferred using PJ_free. Relative to \({\mathcal{V}}=\) PJ_free, \({\mathcal{V}}=\) PJ_tidal is disfavoured by \(\Delta \log {\mathcal{E}}=-16\), which suggests that this smaller value of \({r}_{\rm{t},{\mathcal{V}}}\) is too compact to produce the required lensing effect. This is likely to be a result of the use of the projected two-dimensional separation as a proxy for the three-dimensional distance between the lens galaxy and the perturber; any offset of \({\mathcal{V}}\) along the line of sight would increase the three-dimensional distance and hence rt, alleviating this issue.

Unlike the Keck AO observation of JVAS B1938+666, in which perturber \({\mathcal{A}}\) lies on top of the infrared arc, the projected lens-plane distance between \({\mathcal{A}}\) and the nearest lensed radio emission is 400 pc; therefore, we expect only the cylindrical mass out to a projected radius of 400 pc, which we define as \({m}_{400,{\mathcal{A}}}\), to be well constrained by the VLBI data. For \({\mathcal{A}}=\) PJ_tidal, we find \({m}_{400,{\mathcal{A}}}=(5.0\pm 0.8)\times 1{0}^{7}\,{M}_{\odot }\) and \({r}_{\rm{t},{\mathcal{A}}}=243\pm 20\,{\rm{pc}}\). For \({\mathcal{A}}=\) PJ_free, \({m}_{400,{\mathcal{A}}}=(5.2\pm 0.7)\times 1{0}^{7}\,{M}_{\odot }\) and \({r}_{\rm{t},{\mathcal{A}}}=15\pm 20\,{\rm{pc}}\). Hence, \({m}_{400,{\mathcal{A}}}\) is well constrained and consistent between the two, whereas \({r}_{\rm{t},{\mathcal{A}}}\) is unconstrained and simply reflects the tidal radius relationship and the log-uniform prior (respectively) used for these profiles. For comparison, Despali et al. (ref. 20 and subsequent private communication) find a mass of \({m}_{400,{\mathcal{A}}}=(7.7\pm 0.1)\times 1{0}^{7}\,{{M}}_{\odot }\) for their best subhalo model of the Keck AO observation.