Introduction

Understanding chemical stoichiometry, distributions, and material structure in 3D is crucial for the future of nanomaterial engineering1. Layered organic photovoltaics, semiconductors, and superalloys are just a few of the recent material classes that have been better understood in 3D using chemical tomography2,3,4. For electron tomography, resolution, fluence (e⁻ per unit area), and specimen size are all inextricably linked: 3D resolution is set by fluence and sampling, and not by the microscope’s resolving power5,6. Traditional STEM tomography routinely achieves sub-nanometer 3D resolution using more plentiful elastically scattered electron signals7,8,9. Chemical tomography relies on core-excitation spectroscopy, which demands high electron fluences that typically surpass specimen fluence limits, with commonly reported fluences of >10⁸–10⁹ e⁻/nm² for sufficient signal10,11. The specific fluence requirements depend on detector solid angle12, specimen, thickness, chemical cross sections, and the desired SNR. Even for radiation-resistant specimens, the resolution of chemical tomography is poor (often around 3–10 nm) and resultant tomograms have high noise and artifacts that limit interpretation. Despite these shortcomings, chemical tomography is still favored when chemical specificity is prioritized over resolution.

To achieve chemical specificity and high-resolution simultaneously, fused multimodal electron tomography (MM-ET) was developed13. This technique is inspired by data fusion from the satellite community—where separate signals are linked to improve measurement accuracy—and prior efforts in electron tomography that reduce noise through linked modalities14,15,16,17,18,19,20. MM-ET leverages the physical prior of non-linear Rutherford scattering to fuse low-fluence, high-quantity elastic HAADF projections with a limited number of low-SNR chemical maps. This data fusion yields higher resolution from the new measurement constraints; when correctly tuned, multimodal electron tomography may enable high-resolution chemical tomography associated with a >90% reduction in fluence by reducing the total number of chemical projections from several dozen to just a few. When both experimental and algorithmic parameters are appropriately tuned, MM-ET can deliver high-resolution 3D chemistry mapping at sub-nanometer length scales across a wide range of material systems, including radiation-sensitive samples.

Here, the performance of fused multimodal chemical tomography is validated across sampling (i.e., number of specimen projections), electron fluence, signal-to-noise ratios (SNR), the weight of data fusion, and limited specimen rotation range (i.e., the missing wedge). Multimodal electron tomography consistently outperforms traditional chemical tomography; under optimal conditions, MM-ET can reduce fluence by 1–2 orders of magnitude at equivalent resolution or increase resolution by 1–2 orders of magnitude at equivalent fluence. Furthermore, we demonstrate the impact of data-fusion parameter weighting (Eq. (1)) on final reconstruction accuracy using experimentally-inspired simulations. The reconstruction parameters are autonomously tuned using Bayesian optimization (BO). Longstanding physical limitations, such as the missing wedge and inherent SNR differences between elastic and inelastic modalities, are explored quantitatively for chemical tomography. Optimization of these experimental and computational factors for MM-ET enables unprecedented voxel-level chemical accuracy.

Results

Background

Chemical electron tomography seeks the complete volumetric distribution of the sample chemistry in 3D space. Chemical specificity is greatest using inelastic core-loss signals from energy-dispersive X-ray spectroscopy (EDX) or electron energy loss spectroscopy (EELS). However, these inelastic scattering events are rare and require considerable electron fluence (~10⁹ e⁻/nm²) to achieve a statistically significant projection image. The Rose criterion establishes a target SNR above 5 for each projection to ensure adequate contrast for feature detection21,22. This threshold reflects the empirically determined point when features are distinguishable from background noise in imaging systems. Nonetheless, fluence limits restrict the number of projections that can be collected before sample degradation. Ultimately, the 3D resolution and quality of chemical tomography are limited by this small number of projections.

More plentiful elastic scattering signals can also correlate with chemistry; e.g., high-angle scattering off an atom is proportional to the atomic number squared23,24,25. Elastic scattering onto a high-angle annular dark-field (HAADF) detector produces projection images of nanomaterials at lower fluences (~10⁵ e⁻/nm²) with intensity that scales with atomic number as roughly \(Z^{\gamma}\). This γ factor is bounded by 4/3 as described by Lenz-Wentzel and 2 for Rutherford scattering26. A γ value of 1.7 was chosen based on prior simulation work showing that for HAADF, a value between 1.6 and 2 fits Z contrast27. This γ must be estimated and could be a source of model mismatch. Supplementary Fig. 9 illustrates that variation in γ does not significantly affect the final reconstruction. When nanomaterials have complex structure, density variation, and multiple elemental compounds, distinguishing chemistry becomes difficult or intractable. Structurally and chemically precise 3D reconstructions can be obtained by fusing morphological detail from HAADF tomography with the elemental specificity found in chemical maps (EDX, EELS).

Fused multimodal electron tomography offers a departure from traditional chemical electron tomography: (1) By linking the physics of elastic Rutherford scattering and inelastic core-loss scattering, fluence can be reduced by two orders of magnitude. (2) Mixed inelastic and elastic signals can be sampled independently to surpass traditional 3D resolution limits for chemical electron tomography. By leveraging multiple detector modalities (the elastic and spectroscopic inelastic signals), MM-ET benefits from both the inelastic and elastic scattering cross-sections. Many light elements have salient core-excitation edges for detection. Consequently, data fusion works for most low and high Z-number elements and has a broader range of atomic sensitivity compared to traditional elastic or inelastic tomography. MM-ET is an optimization that solves a fully 3D inverse problem and finds consistency with all available projection images across modalities. Fused MM-ET does not interpolate or overlay but rather extracts meaningful chemical information residing in the elastic modality to solve 3D chemical distributions that satisfy the measurement constraints.

3D resolution is limited by the number of projections

For electron tomography, higher resolution requires more specimen projections. Tomography is limited by incomplete 3D information; the projection-slice theorem describes how each projection corresponds to a tilted plane of information in frequency (k) space28. The Crowther criterion uses this interpretation to state that 3D resolution is limited by the number of projections and the size of the specimen: higher resolution, d (a finer detectable feature size), necessitates more projections, N, or a smaller specimen, D (\(d\approx\pi \frac{D}{N}\))6,29. Doubling the number of measured specimen projections will double the resolving power.
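
The Crowther relation above can be explored numerically. The following is a minimal sketch, assuming equally spaced projections and the approximate form \(d\approx\pi D/N\) quoted in the text; the function names and the 50 nm specimen size are illustrative choices, not values from this work.

```python
import math

def crowther_resolution(diameter_nm, n_projections):
    """Crowther estimate d ~ pi * D / N for N equally spaced projections."""
    return math.pi * diameter_nm / n_projections

def projections_needed(diameter_nm, target_resolution_nm):
    """Smallest N for which the Crowther estimate reaches the target d."""
    return math.ceil(math.pi * diameter_nm / target_resolution_nm)

D = 50.0  # hypothetical specimen diameter in nm
for n in (14, 28, 56):
    print(f"N = {n:3d} projections -> d ~ {crowther_resolution(D, n):.2f} nm")
print("Projections for d = 1 nm:", projections_needed(D, 1.0))
```

Under this estimate, going from 28 to 56 projections halves d, which is the sense in which doubling the number of projections doubles the resolving power.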

Chemical tomography is severely undersampled to avoid specimen destruction. With 10° or 15° specimen tilt increments, chemical tomography will have 10–30× worse resolution than a traditional HAADF reconstruction performed at 0.5° or 1° increments. Traditional elastic tomography faces less severe sampling issues; elastic signals require a lower electron fluence to achieve sufficient SNR, allowing many projections to be captured prior to sample degradation. More projections improve k-space sampling by reducing the angular gap between measured planes (< 2°), resulting in more accurate, higher-resolution reconstructions of nanostructure but with poor chemical specificity (Fig. 1a). For traditional chemical tomography, the information gaps are much larger, and the quality and resolution of reconstruction are poor (Fig. 1b).

Fig. 1: Diagrams illustrate sampling and resolution for HAADF, chemical, and multimodal tomography.

3D reconstructions of CoO/CuO nanocube cluster decorated with Au nanoparticles are shown in (a–c). a HAADF reconstruction: High-resolution morphology but poor chemical discernment. b Chemical reconstruction: Easily discernible chemistry but poor resolution. c Fused MM-ET reconstruction: High-resolution morphology for each chemistry. d HAADF sampling: High-count elastic signals allow for many projections. High-resolution reconstructions possible due to fine sampling. Experimental limitations create a missing wedge of information. e Chemical sampling: Sparse inelastic signals lead to fewer projections and low resolution. f Fused MM-ET sampling: Combines elastic (HAADF) and chemical projections for high sampling, improved chemical specificity, and superior accuracy. Simulated dataset modified from Padgett et al.44.

The Crowther resolution of multimodal electron tomography is determined by the elastic modality (i.e., the number of HAADF projections), and the inelastic modality adds chemical specificity (Fig. 1c). The more elastic projections acquired, the higher the chemical resolution achieved. Previous work showed that 40+ HAADF projections and 9 chemical projections can produce chemical reconstructions with around 1 nm resolution in 3D13. Fused multimodal tomography achieves this resolution at an overall fluence that is 1–2 orders of magnitude less than would be required by traditional chemical tomography. For example, instead of taking 14 chemical projections alone, fusing 7 chemical projections with 56 HAADF projections can provide a fluence reduction of ~50% and a four-fold increase in resolution (e.g., with a 20 μs dwell time for HAADF, 3 ms dwell time for EELS or simultaneous EDX, and 115 pA beam current).
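
The fluence comparison in the example above can be checked with simple arithmetic. This is a rough sketch that assumes both modalities are scanned over the same pixel grid at the same beam current, so that exposure scales with dwell time times the number of projections; the function names are hypothetical, and the resolution gain is taken as the ratio of projection counts under the Crowther estimate.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in coulombs

def electrons_per_pixel(beam_current_a, dwell_s):
    """Electrons delivered to one scan position during one dwell."""
    return beam_current_a * dwell_s / E_CHARGE

def total_exposure(n_chem, n_haadf, current_a=115e-12,
                   chem_dwell_s=3e-3, haadf_dwell_s=20e-6):
    """Electrons per pixel summed over all projections of both modalities."""
    return (n_chem * electrons_per_pixel(current_a, chem_dwell_s)
            + n_haadf * electrons_per_pixel(current_a, haadf_dwell_s))

traditional = total_exposure(n_chem=14, n_haadf=0)
fused = total_exposure(n_chem=7, n_haadf=56)
reduction = 1 - fused / traditional
resolution_gain = 56 / 14  # Crowther: d scales as 1/N

print(f"Fluence reduction: {100 * reduction:.1f}%")
print(f"Crowther resolution gain: {resolution_gain:.0f}x")
```

With these numbers the 56 HAADF projections cost less than half of one chemical projection, which is why the total exposure is dominated by the chemical maps.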

3D resolution is limited by electron fluence

Fluence also limits 3D resolution, even with an unlimited number of projection images. That is, after the sampling requirements are met (i.e., a sufficient number of projections), the signal-to-noise ratio of each projection will determine the 3D resolution. Achieving better resolution in 3D requires balancing the fluence to improve the SNR while protecting the sample from excessive exposure to the beam. Prior studies, including early work by Hegerl and Hoppe, have shown that the fluence-limited 3D resolution scales inversely with the fourth root of the fluence under a weak contrast approximation5,30,31,32. The tradeoff between fluence and resolution is shown in Fig. 2a—higher resolution requires substantially higher fluences. On a log-log plot, the fluence-resolution limit appears as a linear boundary for chemical tomography that becomes more favorable with multimodal data fusion.
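
The fourth-root scaling has a steep practical consequence, which the short sketch below makes explicit. It assumes only the proportionality d ∝ F^(-1/4) stated in the text (weak contrast approximation); the function names are illustrative.

```python
def fluence_multiplier_for(resolution_gain):
    """Fluence increase needed to sharpen resolution by `resolution_gain`."""
    return resolution_gain ** 4

def resolution_gain_from(fluence_multiplier):
    """Resolution improvement bought by multiplying the fluence."""
    return fluence_multiplier ** 0.25

print("2x finer resolution needs", fluence_multiplier_for(2), "x the fluence")
print("100x more fluence gives", round(resolution_gain_from(100), 2), "x finer resolution")
```

A slope of -1/4 on a log-log plot is what produces the linear boundary described above.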

Fig. 2: MM-ET results in higher fluence efficiency, allowing for sub-nanometer chemical recovery in 3D.

a Resolution and fluence relationship for a range of materials36,37,38,39,40 is shown assuming an image contrast of 80% as described in ref. 31. b Experimental MM-ET fluences and resolutions compared to traditional chemical tomography experiments with reported fluence2,3,4,17,41,42,43. The green stars mark MM-ET experiments, the red squares mark traditional chemical tomography experiments, and the black squares mark unique chemical tomography experiments boosted by a 180° tilt range or basic HAADF/EDX correlation.

The electron fluence tolerance of a material dictates the highest achievable 3D resolution for electron tomography. Figure 2 compares the theoretical resolution and fluence tradeoff of MM-ET and traditional chemical tomography across several materials with previously reported fluence limits. Note, the fluence limit of a material may vary with temperature and flux33,34,35. Highlighted materials include highly fluence-resistant minerals such as ZSM-5 zeolite and NaCl36,37, moderately fluence-resistant ionic compounds such as MgF₂, LiF, LiF(AlF₃)37,38, and less fluence-resistant polymers and soft materials such as PVC and valine39,40. Using an image contrast of 80%, the regimes in Fig. 2a are defined by the boundaries of fluence-limited resolution in 3D. Higher resolution requires higher fluences, which are inaccessible for certain material classes.

For any material fluence limit, fused multimodal electron tomography enables an order of magnitude resolution improvement by including elastic signals that provide information about structure and chemistry at significantly lower fluences. Notably, for fluence-sensitive materials like lithium compounds, metal-organic frameworks, and soft matter, traditional methods are theoretically estimated to be unable to resolve features finer than 10 nanometers. While better detectors and more fluence-efficient imaging techniques can push the boundaries slightly, multimodal approaches open a new frontier for 3D imaging. For many materials, multimodal electron tomography reaches resolutions of 1 to 2 nanometers and surpasses the 1-nanometer barrier for fluence-resilient materials. Moderately beam-sensitive materials will benefit from future efforts to integrate additional image modalities into fused multimodal electron tomography.

Experimentally, MM-ET has demonstrated higher resolutions at lower fluences compared to prior traditional chemical tomography experiments2,3,4,17,41,42,43. A literature review of past chemical tomogram total fluence and resolution across various materials highlights the fluence-specific advantages of MM-ET over traditional approaches. The squares in Fig. 2b report the total fluence and resolution achieved in prior traditional chemical tomography studies. Note, past experiments that did not report fluence are not included. The black squares highlight innovative strategies to push the resolution-fluence tradeoff, such as milling a needle-shaped sample to enable 180° rotation2, and directly correlating high-resolution HAADF projections to lower resolution EDX signals3. It is difficult to assess resolution without access to the data and rigorous analysis; here, the optimistic Nyquist-limited resolution is reported (equal to twice the projection image pixel size). However, the resolution for multimodal reconstructions was assessed directly in real and reciprocal space in previous work13. MM-ET detected low-Z elements such as carbon (Z = 6), oxygen (Z = 8), silicon (Z = 14), and sulfur (Z = 16). MM-ET also enabled the first sub-nanometer chemical reconstruction, which included the light element oxygen13. The theoretical higher resolution and lower fluence results in Fig. 2a are validated in Fig. 2b.

The missing wedge affects chemical tomography

In electron tomography, imaging the specimen at high tilt angles is often not possible because the specimen, the TEM grid, or the TEM holder blocks the electron beam. Achieving a full tilt range of 180° from needle-shaped geometries is challenging2,44. This typically restricts the specimen tilt range to ±75° or smaller, i.e., a missing range of tilts of 30° or greater. The incomplete set of specimen projections corresponds to a missing wedge of information in reciprocal space (Fig. 3a); this missing wedge creates a noticeable distortion in one direction of the 3D reconstruction.

Fig. 3: Missing wedge of information degrades chemical tomography.

a 2D tilt range illustration of missing wedge regions (pink). b Tilt range versus NRMSE plot charting how traditional chemical tomography (red line with diamonds), and chemical tomography with total variation (green line with circles) always result in greater error compared to MM-ET (blue line with squares) at a given missing wedge size. c 2D XY slices of 3D reconstructed volume for various missing wedge sizes on synthetic CoO/CuO nanocube dataset. Elongation along one direction grows as missing wedge size increases. CuO appears yellow, CoO appears red. Tilt range is labeled for each 2D slice.

MM-ET consistently improves reconstruction accuracy, even with a missing wedge. A simulated CoO/CuO nanocube dataset was reconstructed using traditional chemical tomography, total variation minimization, and MM-ET across missing wedge sizes, ranging from no missing wedge to a 160° missing wedge. For all missing wedge sizes tested, 11 chemical maps were used in all methods. In the case of MM-ET, an additional 21 HAADF projections were also incorporated. MM-ET outperforms traditional techniques at every missing wedge size, and the gap in accuracy widens as the missing wedge diminishes (Fig. 3b). The normalized root mean square error (NRMSE) halves for a common 40° missing wedge. Peak SNR (PSNR) and Fourier shell correlation (FSC) show similar behavior in Supplementary Figures 2, 5 and 7.

Missing wedge artifacts for multimodal chemical tomography are familiar to traditional electron tomography. Slices through the tomographic reconstructions (Fig. 3c) provide a better assessment of the effects of a missing wedge on chemical tomography. The 3D reconstruction slices display an increased smearing of structure along one direction as the wedge of missing information increases. Qualitatively, we see that multimodal tomography has negligible improvements to missing wedge artifacts: chemical objects still appear blurred along one direction. Rather, the improvement of data fusion is a reduction of noise and overall sharpness of features. When utilizing just TV, NRMSE may decrease, but features such as small internal pores become lost (green circle vs. green square). A tilt range of ±70° (missing wedge of 40°) offers a reasonably high-quality reconstruction and should be a minimum tilt range for high-quality electron tomography.

Optimizing the fusion function for multimodal electron tomography

To reconstruct three-dimensional chemistry using fused multimodal electron tomography, an optimal solution is sought that is consistent with (1) the high SNR HAADF modality, (2) the chemical maps from EELS and/or EDX, and (3) sparsity in the gradient domain (i.e., reduced spatial variation).

$$\mathop{\arg\min}\limits_{\boldsymbol{x}_i \geq 0}\quad \underbrace{\frac{\lambda_1}{2} \Big\| \boldsymbol{A}_h \sum\limits_{i} (Z_i \boldsymbol{x}_i)^{\gamma} - \boldsymbol{b}_H \Big\|_2^2}_{\text{Fusion Model (Linking of Modalities)}} + \underbrace{\lambda_2 \sum\limits_{i} \left( \mathbf{1}^T \boldsymbol{A}_c \boldsymbol{x}_i - \boldsymbol{b}_i^T \log(\boldsymbol{A}_c \boldsymbol{x}_i + \varepsilon) \right)}_{\text{Data Consistency (Poisson Limited)}} + \underbrace{\lambda_3 \sum\limits_{i} \|\boldsymbol{x}_i\|_{\mathrm{TV}}}_{\text{Regularization (Total Variation)}} \tag{1}$$

Here, \(\boldsymbol{x}_i\) is the reconstructed 3D chemical distribution for element i, \(\boldsymbol{b}_i\) is the set of measured 2D chemical maps for element i, \(\boldsymbol{b}_H\) is the set of measured elastic (HAADF) micrographs, \(\boldsymbol{A}_h\) and \(\boldsymbol{A}_c\) are forward projection operators for the elastic and inelastic chemical modalities, \(Z_i\) is the atomic number of element i, \(\lambda_1\) is the elastic interaction weight, \(\lambda_2\) is the data consistency weight, \(\lambda_3\) is the total variation (TV) weight, \(\varepsilon\) herein prevents log(0) issues but can also account for background, the \(\log\) is applied element-wise to its arguments, superscript T denotes vector transpose, and \(\mathbf{1}\) denotes the vector of \({N}_{{\rm{chem}}}^{{\rm{proj}}}{n}_{y}{n}_{i}\) ones, where \(n_y\) is the number of pixels, \(n_i\) is the number of elements present, and \({N}_{{\rm{chem}}}^{{\rm{proj}}}\) is the number of projections for the chemical modality.
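
To make the roles of the three terms in Eq. (1) concrete, the sketch below evaluates them on a tiny hand-built example. Everything specific here is an assumption for illustration: the four-voxel "volumes", the small dense matrices standing in for the projection operators, the 1D total variation, the normalization of the atomic numbers, and the weights. It is not the reconstruction code used in this work.

```python
import math

def matvec(A, x):
    """Apply a small dense projection matrix A to a flattened volume x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def haadf_model(xs, Z, gamma):
    """Voxel-wise sum over elements of (Z_i * x_i)^gamma."""
    return [sum((z * x[k]) ** gamma for z, x in zip(Z, xs))
            for k in range(len(xs[0]))]

def tv_1d(x):
    """Anisotropic total variation of a 1D signal (stand-in for 3D TV)."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))

def mmet_cost_terms(xs, Z, A_h, A_c, b_H, bs, weights, gamma=1.7, eps=1e-6):
    """Return the (fusion, data consistency, TV) terms of Eq. (1)."""
    lam1, lam2, lam3 = weights
    resid = [p - b for p, b in zip(matvec(A_h, haadf_model(xs, Z, gamma)), b_H)]
    fusion = 0.5 * lam1 * sum(r * r for r in resid)
    data = 0.0
    for x, b in zip(xs, bs):
        proj = matvec(A_c, x)
        # 1^T A_c x - b^T log(A_c x + eps): Poisson negative log-likelihood
        data += sum(p - bj * math.log(p + eps) for p, bj in zip(proj, b))
    tv = lam3 * sum(tv_1d(x) for x in xs)
    return fusion, lam2 * data, tv

Z = [27 / 29, 1.0]  # Co and Cu, normalized by the larger Z (assumption)
truth = [[0.0, 1.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0]]
A_h = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]  # many HAADF views
A_c = [[1, 1, 0, 0], [0, 0, 1, 1]]                              # few chemical views
b_H = matvec(A_h, haadf_model(truth, Z, 1.7))  # noise-free measurements
bs = [matvec(A_c, x) for x in truth]
weights = (1.0, 1.0, 0.1)

at_truth = mmet_cost_terms(truth, Z, A_h, A_c, b_H, bs, weights)
blurred = [[0.5] * 4, [0.5] * 4]
at_blur = mmet_cost_terms(blurred, Z, A_h, A_c, b_H, bs, weights)
print("truth  :", at_truth)
print("blurred:", at_blur)
```

In this toy case the blurred guess matches the sparse chemical projections exactly, so the data consistency term cannot tell it apart from the truth; only the fusion term penalizes it. That is the sense in which the elastic modality adds new constraints.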

For the highest quality reconstruction of chemistry, all three terms in Eq. (1) are needed. To illustrate how each term in the cost function influences reconstruction quality, different combinations of the MM-ET terms are reconstructed in Fig. 4. Using a synthetic dataset with a known ground truth, the reconstruction can be compared to the true values at each voxel (Fig. 4a). Conventional chemical tomography, equivalent to isolating the data consistency term (Ψ2), uses low SNR chemical projections with large tilt increments (≈ 10°) resulting in poor reconstruction quality (Fig. 4b). Advancements in compressed sensing electron tomography (e.g., TV minimization) have been shown to reduce artifacts and noise45,46; TV minimization applied to traditional chemical tomography, i.e., using only the data consistency and TV terms, cleans up reconstruction noise but does not improve resolution (Figs. 3, 4c, Supplementary Figures 2, 3, 4, 6). Newer compressed sensing reconstruction algorithms such as CS-DART have also shown high reconstruction quality and should be compatible with fused MM-ET47. Data fusion without the TV term provides a resolution enhancement and improves SNR (Fig. 4d), but introducing TV minimization into the MM-ET framework improves the reconstruction, further reducing noise and sharpening features. Note that the particle’s shape and internal structure are well defined for each chemistry in Fig. 4e, demonstrating that MM-ET requires all three terms in the data fusion cost function to produce a dramatically improved result. Extensive simulations of multiple synthetic datasets quantitatively validated that accuracy is greatest when combining all three cost terms13.

Fig. 4: Simulated reconstruction illustrates the contribution of each term in multimodal electron tomography.

a Ground truth of synthetic CuO/CoO/Au nanocubes. b Nanocubes reconstructed with only Ψ2 (data consistency), λ1, λ3 = 0. The recovered tomograms are noisy. c Reconstructed with Ψ2 + TV, λ1 = 0. The expression Ψ2 + TV is equivalent to a denoising problem. d Reconstructed with Ψ1 and Ψ2 (model + data consistency), λ3 = 0. The tomograms remain noisy but structure has higher fidelity. e Reconstructed with MM-ET. Scale bar, 75 nm.

Cost function weight selection is critical for ensuring accurate reconstruction and improving convergence. Here, λ1, λ2, and λ3 are hyperparameters that serve as cost function weights for fused multimodal electron tomography. The hyperparameters create a unique cost landscape (Eq. (1)); however, the global minimizer is determined by the ratio between the three hyperparameters. In practice, λ3 is often held constant while λ1 and λ2 are hand-tuned; we find that TV (weighted by λ3) typically requires minimal tuning since its value deviates modestly between different material systems and experimental configurations. Fortunately, optimal weights do not change dramatically between normalized datasets, and even sub-optimal terms outperform traditional tomography.

Experimental conditions can influence the optimal cost function weights (Fig. 5); to determine the relationship between the SNR and the cost function weights, λ1 and λ2 values were tuned using Bayesian Optimization across the full range of elastic (HAADF) and inelastic (chemical) projection SNRs. Optimal weights generally do not vary by more than a factor of 2 in the experimental regime. The yellow circle in Fig. 5b represents typical experimental parameters for high-fidelity 3D chemical reconstruction.

Fig. 5: Exploring the parameter landscape with Bayesian optimization.

a Illustration of Bayesian optimization for fused multimodal tomography simulations. Model certainty shown by the surface’s varying edge width; red points represent assessed reconstructions. b The weights between elastic and inelastic modalities (λ1 and λ2 respectively) change with the number of projections but not substantially. The yellow circle on each plot corresponds to a target MM-ET experiment with a HAADF projection SNR of 10 and chemical projection SNR of 5. c 3D visualization of the ground truth CoO/NiO nanotube. Scale cube, 15 nm. d Bayesian optimization searching for the cost function weighting that minimizes NRMSE. Each black dot indicates a full MM-ET reconstruction using 141 HAADF, 11 chemical projections. e The three cost function components show smooth asymptotic convergence. Multimodal tomography—as with all iterative tomographic reconstruction methods—should be assessed for proper convergence.

MM-ET reconstructs chemistry using an iterative optimization framework that should properly converge. Assessing convergence is done by inspecting the progression of the cost function and its constituents to ensure asymptotic decay to a constant minimum or near-minimum value; smooth and asymptotic decay of all three terms in Eq. (1) is an indicator of reliable reconstruction (Fig. 5e). Appropriate step sizes for convergence of Eq. (1) can be estimated using the Lipschitz constant of the measurement matrices (Ah, Ac) with the power method48. The Lipschitz constant provides an upper bound on a function’s rate of change.
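
As a hedged illustration of the step-size estimate mentioned above, the sketch below runs the power method on a small dense matrix. It relies on the standard result that the gradient of a quadratic term ½‖Ax − b‖² is Lipschitz with constant equal to the largest eigenvalue of AᵀA, so a step of 1/L is safe for that term. The function names and toy matrix are assumptions, and because the fusion term in Eq. (1) is nonlinear in x through γ, the value should be read as an estimate rather than a guarantee.

```python
import math

def matvec(A, x):
    """Compute A x for a small dense matrix."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def rmatvec(A, y):
    """Compute A^T y for a small dense matrix."""
    return [sum(A[r][c] * y[r] for r in range(len(A)))
            for c in range(len(A[0]))]

def lipschitz_estimate(A, n_iter=200):
    """Power iteration on A^T A; returns its largest eigenvalue."""
    n = len(A[0])
    x = [1.0 / math.sqrt(n)] * n  # all-ones start, suited to nonnegative A
    L = 0.0
    for _ in range(n_iter):
        y = rmatvec(A, matvec(A, x))
        L = math.sqrt(sum(v * v for v in y))
        if L == 0.0:
            return 0.0
        x = [v / L for v in y]
    return L

A_toy = [[2.0, 0.0], [0.0, 1.0]]  # hypothetical stand-in for A_h or A_c
L = lipschitz_estimate(A_toy)
step = 1.0 / L
print("Lipschitz estimate:", L, "-> step size:", step)
```

For real projection operators only the forward and back projections are needed, so the same loop applies without ever forming the matrix explicitly.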

SNR affects multimodal electron tomography

MM-ET performs reliably across signal-to-noise ratios. The SNR of the elastic and chemical projections (simulated with Poisson noise) was varied from 1 to 30, and each combination was reconstructed using MM-ET to assess the final NRMSE (Fig. 6a). The ground truth CoNiO nanotube is shown in 3D (Fig. 6b). With very low signal-to-noise ratios, the chemical maps are difficult to interpret; however, even with moderate SNR, MM-ET sharply increases quality. The map of SNR performance is examined at three points (Fig. 6c). MM-ET's performance is approximately equivalent when the SNR of either modality is higher than that of the other. Moderate SNR (~5) for both modalities gives a vastly improved reconstruction with accurate chemistry and well-resolved features. Note, NRMSE is not a definitive indicator of improved quality and should be combined with direct assessment of the final reconstruction. Evaluating the resolution is best done using multiple metrics, including edge analysis, assessing the Fourier transfer limit, and Fourier shell correlation49.

Fig. 6: The signal-to-noise ratios of elastic and inelastic modalities affect reconstruction quality.

a Average reconstruction error across SNR of each modality (elastic or inelastic) when 11 chemical maps (Δθ = 14°) and 141 elastic (HAADF) projections (Δθ = 1°) are available. b Synthetic CoNiO nanotube ground truth in 3D (rendered in Tomviz, scalebar 50 nm). c Three combinations of SNR levels: low inelastic and high elastic SNR, high inelastic and low elastic SNR, high inelastic and elastic SNR. Higher SNR in both modalities improves quality; however, experimentally it is desirable to prioritize the SNR of the lower-fluence modality.

The high-quality regime of MM-ET requires a moderate fidelity signal, elastic or inelastic (Fig. 6). MM-ET affords the choice of which modality to measure more. It takes 50–100 times more fluence to increase the SNR of chemical projections when compared to HAADF; therefore, an ideal experimental regime would be high HAADF SNR (≥10) and a moderate chemical SNR (~5), with elastic fluence accounting for roughly 10% of the total.

Automated hyper-parameters using Bayesian optimization

Selection of optimal cost function weights (i.e., λ1, λ2, and λ3) improves reconstruction quality but is time intensive. A full reconstruction is required to verify the performance of any given weighting, and exploration of the parameter space quickly becomes expensive. For large computational studies of MM-ET, selection of optimal weights is intractable. When experimental parameters change dramatically (e.g., number of projections, missing-wedge, SNR), optimizing the cost function can become onerous.

By leveraging Bayesian optimization with Gaussian processes (BO-GP)50,51, 3D multimodal chemical tomography cost function weights can be tuned efficiently and autonomously. Bayesian optimization is a machine learning algorithm known for finding global optimizers of expensive unknown landscapes with minimal evaluations. The framework consists of two core components: (1) developing a posterior probability distribution of the parameter space with Gaussian process (GP) regression and (2) controlling the exploration of future measurements with an acquisition function52. GPs estimate the landscape and quantify uncertainty in unsampled parts of the domain, informing the next exploration point. In previous work, BO-GP was used to tune one hyperparameter for compressed sensing electron tomography53; here, BO-GP is extended to two dimensions for MM-ET.
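
The two components described above (a GP posterior and an acquisition function) can be sketched in a few dozen lines. The example below is a toy under stated assumptions, not the pipeline used in this work: the squared-exponential kernel with unit amplitude and fixed length scale, the lower-confidence-bound acquisition, the 11 × 11 candidate grid in normalized weight coordinates, and the smooth bowl standing in for the NRMSE of a full reconstruction are all illustrative choices.

```python
import math

def rbf(p, q, length=0.3):
    """Squared-exponential kernel with unit amplitude."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-0.5 * d2 / length ** 2)

def invert(M):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(M)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0.0:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def gp_posterior(X, y, candidates, jitter=1e-6):
    """Posterior mean and standard deviation at each candidate point."""
    n = len(X)
    mean_y = sum(y) / n
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    K_inv = invert(K)
    alpha = [sum(K_inv[i][j] * (y[j] - mean_y) for j in range(n))
             for i in range(n)]
    out = []
    for c in candidates:
        k = [rbf(c, x) for x in X]
        mu = mean_y + sum(ki * ai for ki, ai in zip(k, alpha))
        Kk = [sum(K_inv[i][j] * k[j] for j in range(n)) for i in range(n)]
        var = max(0.0, 1.0 - sum(ki * v for ki, v in zip(k, Kk)))
        out.append((mu, math.sqrt(var)))
    return out

def bayes_opt(objective, candidates, n_init=4, budget=20, kappa=2.0):
    """Minimize `objective` over `candidates` with a GP and an LCB rule."""
    step = max(1, len(candidates) // n_init)
    X = candidates[::step][:n_init]
    y = [objective(p) for p in X]
    while len(X) < budget:
        pool = [c for c in candidates if c not in X]
        post = gp_posterior(X, y, pool)
        scores = [mu - kappa * sd for mu, sd in post]  # lower confidence bound
        nxt = pool[scores.index(min(scores))]
        X.append(nxt)
        y.append(objective(nxt))
    best = y.index(min(y))
    return X[best], y[best], len(X)

def toy_nrmse(p):
    """Hypothetical smooth landscape standing in for a full reconstruction."""
    return 0.2 + (p[0] - 0.3) ** 2 + 2.0 * (p[1] - 0.7) ** 2

grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
best_point, best_value, n_evals = bayes_opt(toy_nrmse, grid)
print("best weights (normalized):", best_point, "NRMSE:", round(best_value, 4))
print("evaluations:", n_evals, "of", len(grid), "grid points")
```

Each call to the objective corresponds to one full reconstruction in the real setting, so the quantity to watch is the number of evaluations relative to the size of the grid.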

In Fig. 5, BO-GP is used to model the NRMSE landscape. The graphical rendering in Fig. 5a presents a Bayesian optimization landscape in 3D, offering uncertainty measurements above and below the optimizer’s estimate of the objective function, visible on the edges of the surface. Observed data points (red spheres) pinch the uncertainty to zero. Figure 5d depicts an experimental landscape: in electron tomography and MM-ET specifically, we have observed convex cost landscapes that are smooth near the global optimizer. In Fig. 5a,d each circle and sphere indicates a full MM-ET reconstruction that BO performed when exploring the landscape; here, the global minimizer is identified within 30 iterations, reducing parameter exploration time by approximately 90% when compared to a rudimentary grid search (e.g., a 20 × 20 grid). This BO-GP framework enables quick and autonomous creation of atlases of balanced hyperparameters for Eq. (1), as with the CoNiO and CoO/CuO synthetic datasets (Figs. 5 and 6). Experimental MM-ET may be fully automated using BO against a projection excluded from the reconstruction (Supplementary Fig. 10).

Discussion

Chemical tomography is an inverse problem that seeks to find the volumetric density of every chemistry throughout 3D space. Every chemical projection constrains the solution; however, there are never enough projections to solve this underdetermined problem. Fused multimodal electron tomography succeeds by adding substantial new constraints from elastic imaging. Framing the entire volume as a single inverse problem allows for each modality to be independently sampled. This unlocks access to higher 3D chemical resolution than previously thought possible. Fusing modalities in tomography requires a physical link between modalities that can be modeled. Here HAADF was used for its experimental simplicity and well-described Z-contrast behavior. Future work may use more complex elastic imaging modes within a 4D-STEM acquisition or ptychographic reconstruction.

In this work, MM-ET has been validated across thousands of synthetic simulations, and experiments have shown consistent resolution and fluence improvement over traditional chemical tomography. Fused multimodal chemical tomography appears more robust to low SNR and large missing wedges; however, it is most accurate when SNR is above 5 and the missing wedge is less than 40°. Optimizing experimental and computational parameters resulted in 100-fold reductions in fluence and enabled sub-nanometer resolution with MM-ET. Although parameter tuning is required for optimal reconstruction, general trends have been highlighted to streamline the process. Bayesian optimization of reconstruction parameters allows for automated and efficient large-scale simulations across expansive parameter spaces.

For MM-ET experiments, the ideal imaging conditions balance sample survival with resolution and chemical SNR. Based on this work, we generally recommend a tilt range of ±70° or greater while acquiring 40 equally spaced HAADF projections and 7 EELS/EDX maps or more. The SNR should be above 10 on the HAADF and 4 for EELS/EDX modalities. All chemistries should be mapped. Cryogenic electron tomography can improve specimen resilience to the beam54,55.

Methods

Phantom design and simulation framework

A multi-channel phantom specimen inspired by an experimental system of SrTiO₃ from Padgett et al. was synthesized for these simulations44. The phantom in Figs. 1, 3, and 4 consists of four channels, each attributed to a single element of Cu, Co, O, and Au with a volume of 256³ voxels (Au is excluded from Fig. 3). Each nanocube was hand-segmented by iterating over each layer and choosing which pixels corresponded to each element. CuO and CoO attributions were then assigned randomly to each nanocube, while the external nanoparticles were assigned as gold. The nanotube phantom was created by designating the tube core and shell, generating two independently rotated sets of 15 striates and of 1512 core–shell nanoparticles, and assigning these to Co, Ni, and O volumes. The nanotube is skewered by rotated striates and peppered with spherical nanoparticles. By design, Co overlaps with O and Ni overlaps with O.

Here, we use linear models and an incoherent linear imaging approximation, assuming only single scattering events. This assumption is valid for thin samples. Note that experimentally, thicker samples introduce multiple scattering and absorption of low-energy x-rays, which has been addressed with alternative imaging modes or computational methods56,57,58,59,60. The HAADF intensity is proportional to \({\sum }_{i}{({Z}_{i}{x}_{i})}^{\gamma }\), where the sum runs over the elements i, Zi is the atomic number, and xi reflects the elemental stoichiometry. The background (vacuum) was set to roughly 15% of the maximum intensity, and Poisson noise was applied to meet the desired SNR. For a Poisson-limited signal, each synthetic image has an SNR of \(\frac{{\mu }_{s}+{\mu }_{s}^{2}}{{\sigma }_{N}^{2}}\), where μs is the mean signal and \({\sigma }_{N}^{2}\) is the variance of the noise29. Prior to measuring the NRMSE of the reconstructed volumes, the chemical distributions were normalized to zero mean and unit standard deviation. The NRMSE expresses a normalized measure of volumetric agreement between the reconstruction (x) and ground truth (y): \(\sqrt{\frac{{\sum }_{i,j,k}{({{\boldsymbol{y}}}_{i,j,k}-{{\boldsymbol{x}}}_{i,j,k})}^{2}}{{\sum }_{i,j,k}{({{\boldsymbol{y}}}_{i,j,k})}^{2}}}\). 3D elastic, inelastic, and MM-ET volumes were rendered using the Tomviz platform (tomviz.org61).
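The forward model and error metric described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the simulation code used in this work: the function names are hypothetical, the exponent γ is left as a required argument because its value is not stated in this section, and the sketch assumes that both the reconstruction and the ground truth are standardized before the NRMSE is taken.

```python
import numpy as np


def haadf_from_chemistry(chem, atomic_numbers, gamma):
    """Noise-free HAADF intensity per voxel: sum_i (Z_i * x_i)**gamma.

    chem has shape (n_elements, ...); atomic_numbers has length n_elements.
    """
    chem = np.asarray(chem, dtype=float)
    z = np.asarray(atomic_numbers, dtype=float)
    z = z.reshape((-1,) + (1,) * (chem.ndim - 1))  # broadcast Z_i over voxels
    return np.sum((z * chem) ** gamma, axis=0)


def standardize(volume):
    """Shift and scale a volume to zero mean and unit standard deviation."""
    volume = np.asarray(volume, dtype=float)
    std = volume.std()
    if std == 0:
        raise ValueError("a constant volume cannot be standardized")
    return (volume - volume.mean()) / std


def nrmse(reconstruction, ground_truth):
    """sqrt( sum (y - x)^2 / sum y^2 ) on standardized volumes (assumption: both)."""
    x = standardize(reconstruction)
    y = standardize(ground_truth)
    return float(np.sqrt(np.sum((y - x) ** 2) / np.sum(y ** 2)))


# Toy two-element volume (Cu, Z = 29; O, Z = 8) with an illustrative gamma.
rng = np.random.default_rng(0)
toy_chem = rng.random((2, 8, 8, 8))
toy_haadf = haadf_from_chemistry(toy_chem, [29, 8], gamma=1.7)
```

Because both volumes are standardized, this metric is insensitive to a positive affine rescaling of the reconstruction, so it compares spatial distributions rather than absolute concentrations.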

Optimization framework and computational pipeline

Successful application of MM-ET requires adequate experimental conditions and cost-function weighting. In this work, all reconstructions were performed assuming all elements in the specimen were measured. When implemented experimentally, traditional best practices, such as choosing thinner samples, using high voltage, and correcting for aberrations, will benefit the final reconstruction. Additionally, care must be taken to properly extract chemical maps from the raw EELS and EDX spectra without artifacts from improper background subtraction, filtering, or re-weighting the data. As with all tomography, the depth of focus should not be smaller than the object size62. The MM-ET algorithm is compatible with alternative sampling regimes, including the revised Saxton method63,64, which samples at lower tilt angles (Supplementary Fig. 11).

In this work, BO-GP is used in Python with the scikit-optimize library (scikit-optimize.github.io/stable) using the Matern kernel and the GP Hedge acquisition strategy65. Asynchronous parallel BO on supercomputing resources allowed us to efficiently run several reconstructions simultaneously on a single node. Determining optimal parameters for the maps shown in Figs. 5 and 6 is computationally expensive due to the variability across sampling conditions. A grid search would have required approximately 100 days of computation on a single GPU. Parallel computing provided a further speed-up, as experimental parameters (e.g., SNR or sampling) were explored on several GPUs. The computation time to generate a parameter map was reduced by 98%, to two days. In total, 2760 GPU hours were used to complete the simulations, including 2274 GPU hours on the Great Lakes Cluster and 486 GPU hours on a local computer.
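To make the idea of Gaussian-process Bayesian optimization concrete, the following is a self-contained toy sketch of a sequential search over a single parameter. It is not the pipeline used in this work: it replaces scikit-optimize with a hand-written GP that uses a Matern ν = 5/2 kernel with a fixed length scale, and it substitutes a plain expected-improvement acquisition for GP Hedge. The objective `toy_reconstruction_error`, its minimum, and the search range are hypothetical stand-ins for a reconstruction error evaluated as a function of one cost-function weight.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def matern52(a, b, length_scale=0.2):
    """Matern nu=5/2 kernel between 1D point sets a (n,) and b (m,)."""
    s = math.sqrt(5.0) * np.abs(a[:, None] - b[None, :]) / length_scale
    return (1.0 + s + s ** 2 / 3.0) * np.exp(-s)


def gp_posterior(u_train, y_train, u_query, jitter=1e-6):
    """GP posterior mean and std at u_query, fitted to standardized targets."""
    y_mean, y_std = y_train.mean(), y_train.std() + 1e-12
    y = (y_train - y_mean) / y_std
    K = matern52(u_train, u_train) + jitter * np.eye(len(u_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = matern52(u_train, u_query)
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)
    return Ks.T @ alpha * y_std + y_mean, np.sqrt(var) * y_std


def expected_improvement(mu, sigma, best):
    """Expected improvement for minimization."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf


def bayes_opt_1d(objective, bounds, n_initial=5, n_iter=15, seed=0):
    """Minimize objective on [lo, hi]; the GP works on the unit interval."""
    lo, hi = bounds
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)          # candidate points
    u = rng.uniform(0.0, 1.0, size=n_initial)  # random initial design
    y = np.array([objective(lo + ui * (hi - lo)) for ui in u])
    for _ in range(n_iter):
        mu, sigma = gp_posterior(u, y, grid)
        u_next = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
        u = np.append(u, u_next)
        y = np.append(y, objective(lo + u_next * (hi - lo)))
    i = int(np.argmin(y))
    return lo + u[i] * (hi - lo), y[i]


def toy_reconstruction_error(log10_weight):
    """Hypothetical stand-in for an NRMSE curve with a minimum at -1.5."""
    return 0.1 + (log10_weight + 1.5) ** 2


best_log10_weight, best_error = bayes_opt_1d(toy_reconstruction_error, (-3.0, 0.0))
```

Each iteration spends one objective evaluation where the surrogate predicts the largest expected gain, which is why a search of this kind can need far fewer reconstructions than an exhaustive grid. In a real setting the objective would be a full reconstruction, and several candidates could be evaluated in parallel.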