Introduction

Pathological features serve as the gold standard for tumor grading, forming the cornerstone of treatment decisions1. During surgical interventions, the choice of surgical strategy is often guided by the findings from pathological examinations2. This involves striking a delicate balance between aggressive and conservative tumor resection strategies based on the malignancy level of the tumor, especially in critical body parts like the central nervous system (CNS). An excessively aggressive resection strategy can compromise the patient’s normal functions, causing additional suffering. Conversely, a too-conservative strategy risks incomplete tumor excision and high tumor recurrence rates3. The surgical goal is to remove as much of the tumor tissues as possible while preserving functional tissues. Hence, the ability to obtain high-quality pathological images in a timely and convenient manner during surgery is crucial for making intraoperative decisions and ensuring clean surgical margins3. However, the standard histopathological biopsy process is laborious and time-consuming, involving formalin fixation and paraffin embedding (FFPE), followed by thin sectioning, staining, and mounting on glass slides, often extending over several days4. Frozen sectioning provides a rapid intraoperative diagnosis method, but this technique can introduce severe artifacts, such as freezing artifacts, poor-quality sectioning, swollen cell morphologies, and poor staining. These effects are especially pronounced in tissues that do not freeze well, like those in the breast and brain, resulting in reduced diagnosis accuracy5,6.

In order to overcome these drawbacks of standard and frozen histology and provide rapid intraoperative diagnosis, high-resolution and high-contrast “optical biopsy” techniques that reduce or obviate the need for biological tissue processing are currently being explored. Among these novel imaging methods, one category utilizes point-by-point scanning for image acquisition, such as stimulated Raman scattering7,8, confocal microscopy9,10, multiphoton microscopy11,12,13,14, second-harmonic imaging microscopy15, and photoacoustic microscopy16,17. Most of these imaging modalities utilize a highly focused laser beam to induce linear and nonlinear light-matter interactions, allowing for high-resolution, label-free imaging. However, the requirement to scan across three dimensions and the time needed to linger on each pixel for data acquisition means that assembling a three-dimensional image is a very time-consuming process18. In addition, these approaches can lead to phototoxicity and photodamage. Another category including wide-field structured illumination microscopy (SIM) and light-sheet microscopy like open-top light-sheet (OTLS) microscopy and MediSCAPE system offers significant advantages in imaging speed as they can capture an entire plane in each acquisition18,19,20. However, these techniques involve complicated geometric configurations, which add to the complexity and cost of the system. In addition, they typically rely on exogenous fluorochromes to enhance their signal-to-noise ratio (SNR) and enrich their contrast, which increases the complexity of tissue processing and potentially affects the subsequent tissue reutilization.

In recent years, dynamic full-field optical coherence tomography (D-FFOCT) has been developed, utilizing subcellular dynamics as an intrinsic contrast source to enable high-contrast imaging21,22. This technique measures variations in the optical path length of backscattered light by capturing the interference signal with a camera, thus responding sensitively to the nanoscale movements of subcellular particles across the entire field of view. By extracting statistical features of the interference signal, optical histology can be achieved without any tissue processing. Offering improved image contrast over traditional FFOCT23,24,25, D-FFOCT has demonstrated high accuracy in the diagnosis of breast cancer and nodal metastasis24,26. However, axial displacements of the sample caused by environmental vibration often mix with intracellular motions, reducing the SNR of D-FFOCT images27. Moreover, because environmental noise is random, consecutive D-FFOCT images of adjacent tissue regions often exhibit random shifts in hue and brightness24. Since the hue and brightness of D-FFOCT images reflect metabolic indexes of tissues, this instability can confuse pathologists and reduce the interpretability of the images. In addition, pathologists are typically trained to diagnose from standard H&E histology and are therefore unfamiliar with color-coded D-FFOCT images. As a result, D-FFOCT may steepen the learning curve for pathologists and impede fast decision-making.

To overcome these barriers, pseudo-H&E images have been developed based on linear28 and nonlinear mapping29 for other optical imaging techniques. However, these methods usually require fluorescent dyes and produce unnatural-looking images. Recently, deep learning30,31 has been applied to generate virtual H&E histology from various imaging modalities32,33,34,35,36,37,38,39,40,41,42,43. However, the performance of virtual H&E histology depends strongly on the quality and characteristics of the original images used for training. For example, imaging techniques such as autofluorescence imaging32,38, quantitative phase imaging (QPI)33, reflectance confocal microscopy (RCM)34, quantitative oblique back-illumination microscopy (qOBM)39, stimulated Raman scattering (SRS) microscopy40, and point-scanning OCT41 fail to provide sufficient nuclear contrast and, in particular, lack intranuclear details. Consequently, nuclei can be generated erroneously or omitted after deep learning transformations. Microscopy with UV surface excitation (MUSE)42,43 and UV photoacoustic microscopy (PAM)16,17,35 can provide nuclear contrast within tissues. However, MUSE requires dye staining of the sample and lacks depth-resolving capability, while PAM cannot resolve individual cell nuclei within densely packed cells because of its limited axial resolution. In fact, D-FFOCT can provide high-contrast, high-resolution nuclear structures without labeling, but its instability in hue and brightness, as mentioned above, prevents the correct generation of virtual H&E histology.

In this study, we first utilize a novel technique, called APMD-FFOCT that adopts active phase modulation to eliminate the influence of the random environmental vibration. We demonstrate that this method significantly enhances the stability of dynamic images and achieves continuity and consistency for image stitching. Meanwhile, this method also offers good contrast to static tissues, such as tumor-associated collagen fibers and calcified tissues, while maintaining high sensitivity to dynamic tissues like tumor cells. Therefore, the APMD-FFOCT images provide a much more solid foundation than conventional ones for virtual H&E-staining. We demonstrate, for the first time, that APMD-FFOCT images can be converted into virtual H&E-stained images by unsupervised deep learning, achieving three-dimensional virtual H&E-stained images at a scanning rate of 1 frame per second. Furthermore, we also show that this novel technique has been successfully applied in cancer diagnosis for the human central nervous system and breast. These results demonstrate that our method has the potential for scanning large, thick tumor tissues intraoperatively, making it an ideal tool in intraoperative histology.

Results

APMD-FFOCT system and imaging principle

As shown in Fig. 1a, the APMD-FFOCT configuration utilizes a Linnik interferometer with twin 20x water immersion objectives and a low-coherence light source. The light returning from the sample layer interferes with the light reflected back from the reference mirror and is subsequently projected onto the camera. More system specifics are given in the Methods section. APMD-FFOCT detects the subcellular movements by capturing changes in the optical length of back-scattered light from actively metabolizing, freshly excised tissues. The fluctuation of light intensity on each pixel of the camera is recorded as a time-domain intensity trace, as shown in Fig. 1b. These traces then undergo Fourier transformation to generate their corresponding frequency spectra. We then employed the Hue-Saturation-Value (HSV) color space to visualize the dynamic characteristics of the metabolism activities. As shown in Fig. 1c, hue is determined by the spectral centroid of the frequency spectrum, thus reflecting the fluctuation speed. The brightness of the image is determined by the integration of the frequency spectrum without the zero-frequency portion, which links to the fluctuation amplitude. As shown in Fig. 1a, a sinusoidal signal can be added to the PZT under the reference mirror to generate active phase modulation. With a chosen frequency of 25 Hz and amplitude of 23 nm, this active phase modulation forms a sharp and stable intensity peak in the frequency spectrum, as shown in Fig. 1c.
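The mapping from per-pixel intensity traces to a color image can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' processing pipeline: the function name `dynamic_hsv`, the default band limits, the linear hue scale, the normalization by the maximum, and the constant saturation are hypothetical choices. The text specifies only that hue follows the spectral centroid and that brightness follows the spectrum integral without the zero-frequency portion.

```python
import numpy as np


def dynamic_hsv(stack, fps, f_lo=0.4, f_hi=None):
    """Turn a (T, H, W) raw-image stack into an (H, W, 3) HSV image.

    Hue   <- spectral centroid of each pixel's trace (fluctuation speed).
    Value <- spectrum integral without the zero-frequency term (amplitude).
    f_lo, f_hi, the linear hue scale and S = 1 are illustrative assumptions.
    """
    stack = np.asarray(stack, dtype=float)
    n_frames = stack.shape[0]
    freqs = np.fft.rfftfreq(n_frames, d=1.0 / fps)        # (F,) in Hz
    spectrum = np.abs(np.fft.rfft(stack, axis=0))         # (F, H, W)

    f_hi = freqs[-1] if f_hi is None else f_hi
    band = (freqs >= f_lo) & (freqs <= f_hi)              # f_lo > 0 drops the DC bin
    s = spectrum[band]
    f = freqs[band][:, None, None]

    integral = s.sum(axis=0)                              # fluctuation amplitude
    centroid = (f * s).sum(axis=0) / np.maximum(integral, 1e-12)

    hue = np.clip((centroid - f_lo) / (f_hi - f_lo), 0.0, 1.0)
    value = integral / max(integral.max(), 1e-12)
    saturation = np.ones_like(hue)                        # assumption: fixed saturation
    hsv = np.stack([hue, saturation, value], axis=-1)
    return hsv, centroid, integral
```

For a stack of 500 frames recorded at 100 frames per second, `dynamic_hsv(stack, fps=100)` returns the HSV image together with the centroid map (in Hz) and the integral map, so the color scale can be adjusted afterwards without recomputing the spectra.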

Fig. 1: APMD-FFOCT system designs and image generation.

a Schematic of active phase modulation-assisted D-FFOCT. BS, Beam splitter; OB, Objectives; PZT, piezoelectric translation; YAG, Yttrium Aluminum Garnet. The signal generator provides an active modulation signal to the PZT in the reference arm. b An intensity trace that records the temporal fluctuations of back-scattered light from a pixel of the camera. c Frequency spectrum obtained by Fourier transform of the intensity trace.

Instability in D-FFOCT images and the implementation of active phase modulation correction

Figure 2a displays a portion of the stitched D-FFOCT image of human psammomatous meningioma, highlighting abrupt changes in both hue and brightness of consecutive D-FFOCT images. To elucidate this phenomenon, the averaged frequency spectra were extracted from the selected areas, as depicted in Fig. 2b. The spectrum integrals and centroids of ROI 1, ROI 2, and ROI 3 in Fig. 2a were 64 and 25.6 Hz, 54 and 26.4 Hz, and 67 and 24.8 Hz, respectively. The instability of these dynamic indexes is thus mainly due to the random distribution of intensity peaks in the high-frequency range of the spectra in Fig. 2b. Moreover, Fig. 2c shows the stitched D-FFOCT image with both collagen fibers and tumor cells. In the two adjacent images, while the hue and brightness of tumor cells remain consistent, a sudden change is observed in the collagen fibers. Figure 2d exhibits the averaged frequency spectra of tumor cells and collagen fibers in solid boxes of Fig. 2c. The results reveal that the spectral intensity peaks for both collagen fibers and tumor cells occur at identical frequencies but with significant variations in intensity, as marked by the arrows. This implies that the intensity peaks stem from the bulk vibration of the sample rather than intrinsic subcellular motions. Furthermore, the variation in peak intensities suggests that collagen fibers have much higher reflectivity than tumor cells, making them more susceptible to ambient vibrations. Therefore, the random distribution of environmental vibrations results in the instability of hue and intensity in the highly reflective sections of tissues, like collagen fibers. Notably, the discontinuity and instability in stitched images cannot be resolved by merely adjusting the mapping relationship of hue and brightness. Efforts to make the fiber regions appear continuous would inevitably compromise the continuity of originally seamless regions, such as cells, as shown in Supplementary Fig. 1.

Fig. 2: Instability in D-FFOCT images and the implementation of active phase modulation correction.

a Stitched D-FFOCT image of psammomatous meningioma with collagen fibers. b Averaged frequency spectra corresponding to the selected regions in a. c Stitched D-FFOCT image with both collagen fibers and tumor cells (enlargement of the purple dotted box in a). d Averaged frequency spectra corresponding to the selected solid boxes of collagen fibers and tumor cells in c. e Stitched APMD-FFOCT image of the same area as in a. f–h Images generated from the low-frequency part (0.4–24.8 Hz, green area of i) (f), the active phase modulation part (24.8–25.2 Hz, blue area of i) (g), and the combination of f and g (h). i Averaged frequency spectra corresponding to the collagen fibers and tumor cells in h. j–l APMD-FFOCT images with PZT modulation amplitudes of 0 nm (j), 23 nm (k), and 115 nm (l). m Frequency spectra of tumor cell areas in j–l with increasing PZT modulation amplitude.

A straightforward method to mitigate these abrupt changes in hue and brightness is to exclude the high-frequency portion of the frequency spectrum, as marked by the red segment in Fig. 2i. The green segment, denoting the metabolic frequency region, was used to generate Fig. 2f, in which only the tumor cells are visible. However, collagen fibers, which are not evident in this representation, are an important tumor feature and are crucial for providing accurate pathological information. Unfortunately, the intrinsic motion of the collagen fibers is too faint to be detected effectively. Therefore, to enhance the contrast of collagen fibers, we apply a 25 Hz sinusoidal signal to the PZT to actively modulate the reference mirror while capturing intracellular motions. This active phase modulation produces sharp but stable intensity peaks at 25 Hz in the frequency spectra, as shown in the blue area of Fig. 2i. The integral of the blue area of the frequency spectrum is used to generate Fig. 2g, where collagen fibers are effectively preserved owing to their high reflectivity. Figure 2h presents the combination of Fig. 2f and g. In this figure, the hue and brightness of the collagen fibers can be adjusted flexibly and independently for contrast optimization and remain stable across different images. Note that the collagen fibers are hardly visible in Fig. 2f because of their low dynamics in traditional D-FFOCT but are clearly visible in Fig. 2h owing to the active phase modulation in APMD-FFOCT. Furthermore, stitched images of the same tissue region without and with active phase modulation (Fig. 2a and e) show that the stability of hue and brightness is significantly improved in the APMD-FFOCT image. Additionally, APMD-FFOCT improved the SNR of collagen fibers compared with traditional D-FFOCT, from 10 to 18, as illustrated in Supplementary Fig. 2.
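The band separation described in this paragraph can be illustrated with a short NumPy sketch. The band limits are taken from the caption of Fig. 2 (0.4–24.8 Hz and 24.8–25.2 Hz); everything else is an assumption made for illustration, including the function names `band_images` and `composite`, the handling of the shared 24.8 Hz edge, and the display choice of overlaying the static map in blue on a gray dynamic map.

```python
import numpy as np

LOW_BAND_HZ = (0.4, 24.8)    # metabolic band (green area of Fig. 2i)
MOD_BAND_HZ = (24.8, 25.2)   # active-modulation band (blue area of Fig. 2i)


def band_images(stack, fps, tol=1e-9):
    """Integrate each pixel's amplitude spectrum over the two bands.

    stack: (T, H, W) raw images. Frequencies above the modulation band
    (the red segment) are simply left out of both integrals.
    """
    stack = np.asarray(stack, dtype=float)
    freqs = np.fft.rfftfreq(stack.shape[0], d=1.0 / fps)
    spectrum = np.abs(np.fft.rfft(stack, axis=0))
    # Assumption: the shared 24.8 Hz edge is assigned to the modulation band.
    low = (freqs >= LOW_BAND_HZ[0] - tol) & (freqs < LOW_BAND_HZ[1] - tol)
    mod = (freqs >= MOD_BAND_HZ[0] - tol) & (freqs <= MOD_BAND_HZ[1] + tol)
    return spectrum[low].sum(axis=0), spectrum[mod].sum(axis=0)


def composite(dynamic, static, static_gain=1.0):
    """Overlay the static (fiber) map in blue on a gray dynamic map.

    The color choice and gain are display assumptions; static_gain can be
    tuned independently of the dynamic channel.
    """
    d = dynamic / max(dynamic.max(), 1e-12)
    s = np.clip(static_gain * static / max(static.max(), 1e-12), 0.0, 1.0)
    return np.stack([d, d, np.maximum(d, s)], axis=-1)
```

Because the static map comes from a fixed, externally driven frequency rather than from ambient vibration, its scaling (`static_gain` here) can be chosen once and reused across tiles, which is the property the paragraph above relies on for consistent stitching.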

To optimize image quality, we conducted a comparative study to assess the impact of the PZT modulation amplitude when imaging the same tissue region. Figure 2j–l show APMD-FFOCT images with PZT modulation amplitudes of 0 nm (no modulation), 23 nm (active phase modulation), and 115 nm (traditional FFOCT phase modulation). The corresponding frequency spectra for varying PZT modulation amplitudes are summarized in Fig. 2m. Comparing Fig. 2j and k, it is evident that the latter offers clearer collagen fiber structures. The frequency spectrum curves for 0 nm and 23 nm modulation amplitudes are nearly identical in the low-frequency range (relative difference of spectrum integrals: ~1%), indicating that a 23 nm modulation has a negligible impact on the detection of cellular dynamics. The frequency spectrum curve for 23 nm modulation exhibits a sharp intensity peak near 25 Hz. When the modulation amplitude is increased further to 115 nm, as shown in Fig. 2l, the image contrast is reduced, and the signals from active phase modulation gradually overwhelm the signals from cellular dynamics. The frequency spectrum curves for modulation amplitudes from 46 nm to 115 nm also reveal an increasing suppression of cellular dynamics in the low-frequency range. Therefore, we chose a PZT modulation amplitude of 23 nm for our APMD-FFOCT.

Comparison of APMD-FFOCT and H&E-stained images in CNS tumors

CNS tumors encompass various pathological entities, each displaying unique histological features. Figure 3a presents APMD-FFOCT images of psammomatous meningioma. Its histological signature includes numerous psammoma bodies—concentrically calcified structures within the tumor tissue. These bodies, being relatively static, are either not visible or appear unstable in traditional D-FFOCT images. However, active phase modulation makes these structures clear and distinct in APMD-FFOCT images, ensuring high consistency and continuity in the stitched images. Figure 3d displays corresponding H&E-stained images from the same sample, showing similar psammoma body structures. Yet, in H&E-stained images, most psammoma bodies appear fragmented and incomplete. Figure 3j zooms into the area indicated by the blue box in Fig. 3d, highlighting the out-of-focus blur of remaining psammoma bodies. These issues may arise from the inherent hardness of the psammoma bodies, which, when subjected to microtome slicing, can lead to fragmentation and detachment from the tissue section. Additionally, this process can also result in the unevenness of the remaining psammoma bodies and lead to the out-of-focus blur. These problems are completely avoided in slide-free APMD-FFOCT imaging. Moreover, nuclei from round to oval in shape in APMD-FFOCT image are clearly discernible in Fig. 3g, consistent with H&E-stained images in Fig. 3j. APMD-FFOCT and H&E-stained images of ependymoma are shown in Fig. 3b and e, respectively. The collagen fibers in both images are distinctly visible, while highlighted in blue in the APMD-FFOCT image, weaving through the regions of tumor cells. The tumor nuclei, from round to slightly oval with moderate density, are visible in Fig. 3h and k. Figure 3c and f reveal APMD-FFOCT and H&E-stained images of schwannoma, where Antoni A areas can be identified by densely packed spindle-shaped cells with elongated nuclei (Fig. 3i and l), which are marked features of schwannomas44. 
In conclusion, despite arising from a different contrast mechanism, APMD-FFOCT reveals tissue and cellular details comparable to those seen in H&E-stained images.

Fig. 3: Comparison of APMD-FFOCT and H&E staining images in CNS tumors.

a–c Stitched APMD-FFOCT images of psammomatous meningioma (a), ependymoma (b), and schwannoma (c). d–f H&E-stained images of the same sample correspond to a–c. g–i Enlargement of the red boxes in a–c. j–l Enlargement of the blue boxes in d–f.

Virtual staining of APMD-FFOCT images of diffuse midline glioma

Although APMD-FFOCT offers rapid access to high-contrast pathological images, it concurrently elevates the learning curve for pathologists and hampers rapid decision-making. This challenge arises because pathologists are traditionally trained to diagnose through standard histological images. After establishing the capability to capture stable dynamic images with histological details comparable to H&E-stained images, we applied a CycleGAN-based deep learning approach to perform virtual H&E staining on APMD-FFOCT images. More detailed procedures are described in Methods. Figure 4a shows the APMD-FFOCT image of diffuse midline glioma (DMG), where tumor cells are densely arranged, with significant nuclear pleomorphism. Figure 4b displays virtual H&E-stained images derived from Fig. 4a, in which the histological features are accurately converted into a representation resembling traditional H&E staining (Fig. 4c). Figure 4d–f show enlargements of corresponding boxes in Fig. 4a–c. The morphology of the tumor cells is accurately translated, as exemplified by the cells to which the red arrows are pointing. Meanwhile, the tumor cells in the virtual H&E-stained images demonstrate excellent concordance with the style and appearance of traditional H&E images (Fig. 4f). Moreover, the dark corners in stitched APMD-FFOCT images (Fig. 4d), caused by uneven illumination and interference, are largely corrected (Fig. 4e).

Fig. 4: Conversion of APMD-FFOCT images to virtual H&E-stained images of diffuse midline glioma.

a–c Stitched APMD-FFOCT images, virtual H&E-stained images, and standard H&E-stained images of diffuse midline glioma. d–f Enlargement of the selected regions in a–c.

Each APMD-FFOCT image in Fig. 4 was calculated from 500 raw images obtained in 5 seconds. The acquisition time was set to 5 seconds, as extending it further was deemed unlikely to notably enhance the image SNR and quality. However, moderately reducing the acquisition time and the number of raw images can still produce APMD-FFOCT images with sufficient quality for further deep-learning transformations. Figure 5a–c present APMD-FFOCT images generated from 100, 300, and 500 raw images, captured over durations of 1, 3, and 5 seconds, respectively. As the acquisition time and number of raw images are reduced, there is an increase in speckle noise and a decrease in the SNR. Figure 5d–f display virtual H&E-stained images derived from Fig. 5a–c. Through the conversion process, speckle noise is effectively removed, leading to virtual H&E-stained images in Fig. 5d–f that show uniform quality without obvious differences.
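The acquisition settings quoted above imply a fixed frame rate and a frequency resolution that depends on the acquisition time. The following sketch works through that arithmetic using the standard discrete Fourier transform relations (bin spacing equals one over the acquisition time, Nyquist frequency equals half the frame rate). The function name `acquisition_summary` is hypothetical, and the assumption that frames are evenly spaced in time is ours.

```python
def acquisition_summary(n_frames, duration_s, f_mod_hz=25.0):
    """Sampling parameters implied by n_frames evenly spaced over duration_s."""
    fps = n_frames / duration_s
    return {
        "fps": fps,                          # camera frame rate
        "nyquist_hz": fps / 2.0,             # highest resolvable frequency
        "bin_hz": 1.0 / duration_s,          # spacing between spectrum bins
        "mod_bin": f_mod_hz * duration_s,    # bin index of the modulation peak
    }


for n_frames, duration_s in [(100, 1), (300, 3), (500, 5)]:
    print(n_frames, duration_s, acquisition_summary(n_frames, duration_s))
```

Under these assumptions, all three settings correspond to 100 frames per second, and the 25 Hz modulation frequency falls on an integer bin index in each case (25, 75, and 125), so the modulation peak stays confined to a narrow band. The bin spacing grows from 0.2 Hz at 5 s to 1 Hz at 1 s, which means the shortest acquisition samples the low-frequency metabolic band more coarsely.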

Fig. 5: Comparison of virtual H&E staining results based on APMD-FFOCT images generated from different numbers of raw images and a comparison between virtual H&E staining images derived from APMD-FFOCT and D-FFOCT.

a–c APMD-FFOCT images reconstructed using 100, 300, and 500 raw images, respectively. d–f Virtual H&E-stained images corresponding to the APMD-FFOCT images in panels a–c. g Stitched image obtained using D-FFOCT. h Stitched image obtained using APMD-FFOCT. i Virtual H&E-stained image derived from the stitched D-FFOCT image (g). j Virtual H&E-stained image derived from the stitched APMD-FFOCT image (h).

To demonstrate the role of active phase modulation for the correct generation of virtual H&E-stained images, we compared APMD-FFOCT and D-FFOCT images of the same tissue region and their generated virtual H&E staining images. Figure 5g shows the stitched D-FFOCT image, where the continuous collagen fiber regions exhibit significant differences in hue and brightness between consecutive imaging sessions. Figure 5i presents the virtual H&E staining image generated from Fig. 5g, where erroneous cell generation is observed in the red box. In contrast, Fig. 5h shows the stitched APMD-FFOCT image, which appears more continuous. Figure 5j shows the virtual H&E staining image derived from Fig. 5h, effectively avoiding the erroneous cell generation observed in Fig. 5i. The results prove that the image stabilization achieved by APMD-FFOCT is crucial for virtual staining.

APMD-FFOCT to virtual H&E-stained images conversion of invasive ductal carcinoma (IDC)

Having confirmed the capability to convert APMD-FFOCT images of CNS tumors into virtual H&E-stained images, we now proceed to apply this approach to breast tumors. Breast cancer is not only one of the most common cancers affecting women worldwide but also a leading cause of cancer-related deaths in females45. IDC is the most common type of breast cancer, whose diagnosis and grading heavily depend on H&E staining. The Nottingham histological grading system for breast cancer emphasizes nuclear size and pleomorphism as critical factors for tumor grading46. Furthermore, the collagen fibers serve as a prognostic indicator for survival in breast carcinoma47, offering a more comprehensive understanding in diagnosis. We show that APMD-FFOCT images of IDC samples in grade III, grade II, and grade I can be accurately translated into virtual H&E-stained images, providing rich pathological features like morphology of nuclei and distribution of collagen fibers.

As illustrated in Fig. 6a, the APMD-FFOCT image of grade III IDC shows large nuclei with pronounced nuclear pleomorphism. Figure 6d displays the virtual H&E-stained image derived from Fig. 6a, in which the size and morphology of the nuclei are faithfully mapped. In particular, subnuclear details are also resolved and mapped, as indicated by the arrows. The second row of Fig. 6 depicts a grade II IDC sample, showing densely packed tumor cells embedded within collagen fibers. Figure 6b captures the intricate details of the collagen fibers, rendered in blue. The distribution and orientation of the collagen fibers are faithfully mapped in the virtual H&E-stained image in Fig. 6e. The virtual and standard H&E staining show high consistency (Fig. 6e, h). Furthermore, Fig. 6c shows the APMD-FFOCT image of a grade I IDC sample, further supporting the feasibility of our method. The tumor cell nuclei are bright and easily distinguishable, surrounded by a comparatively dimmer cytoplasm, both exhibiting yellow-green hues. Correspondingly, the virtual H&E-stained image in Fig. 6f renders the nuclei of tumor cells in shades of purple and the cytoplasm in lighter shades. The accuracy of the transformation for each of the above samples is supported by the standard H&E-stained images shown in Fig. 6g–i.

Fig. 6: APMD-FFOCT to virtual H&E-stained images conversion of IDC.

a–c APMD-FFOCT images from IDC samples of grade III, grade II, and grade I. d–f Virtual H&E-stained images converted from a–c. g–i Standard H&E-stained images of the same samples, corresponding to a–c.

To quantitatively evaluate the fidelity of virtual H&E images, we extracted the nuclear area, nuclear perimeter-to-area ratio, and internuclear distance (mean values and standard deviations (SD), n represents the number of cells counted) from the APMD-FFOCT images, virtual staining images, and H&E images shown in Fig. 6. While the nuclear area reflects size similarity, the perimeter-to-area ratio indicates nuclear heterogeneity, and internuclear distance reflects nuclear density. The results in Table 1 show that the virtual H&E staining images faithfully reflect the size, heterogeneity, and density of nuclei observed in the APMD-FFOCT images. Furthermore, for Grade II and Grade I tumors, the virtual H&E staining images correspond well with the actual H&E images. For Grade III tumors, the nuclear areas in APMD-FFOCT images are slightly larger than those in H&E images. This discrepancy may be due to the use of a coverslip to flatten the tissue surface during imaging, as over-compression of the tissue can cause nuclei to flatten, increasing their apparent area. Additionally, tumor heterogeneity and sampling differences could also contribute to this observation.
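The three nuclear metrics can be computed from a segmented image along the following lines. This is only a sketch under stated assumptions: the text does not describe how nuclei were segmented or how the metrics were measured, so the label-image input, the edge-counting perimeter estimate, the use of nearest-neighbor centroid distance as the "internuclear distance", and the names `nuclear_features` and `mean_sd` are all hypothetical.

```python
import numpy as np


def nuclear_features(labels, pixel_um=1.0):
    """Per-nucleus area, perimeter-to-area ratio and nearest-neighbor distance.

    labels: 2D integer array, 0 = background, k > 0 = pixels of nucleus k.
    At least two nuclei are needed for a finite nearest-neighbor distance.
    """
    labels = np.asarray(labels)
    ids = [k for k in np.unique(labels) if k != 0]
    areas, ratios, centroids = [], [], []
    for k in ids:
        # Pad with background so nuclei touching the border get closed outlines.
        mask = np.pad(labels == k, 1).astype(int)
        area = mask.sum() * pixel_um ** 2
        # Perimeter estimate: number of exposed pixel edges times the pixel size.
        edges = np.abs(np.diff(mask, axis=0)).sum() + np.abs(np.diff(mask, axis=1)).sum()
        areas.append(area)
        ratios.append(edges * pixel_um / area)
        centroids.append(np.argwhere(mask).mean(axis=0) * pixel_um)

    centroids = np.array(centroids)
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)           # ignore each nucleus's distance to itself
    return {
        "area_um2": np.array(areas),
        "perimeter_to_area_per_um": np.array(ratios),
        "nearest_neighbor_um": dist.min(axis=1),
    }


def mean_sd(values):
    """Mean, sample standard deviation and count n, as reported in a table."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1), values.size
```

Applying the same function to label images derived from the APMD-FFOCT, virtual H&E, and standard H&E images would give directly comparable mean and SD values for each metric.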

Table 1 Quantitative analysis of cellular features in IDC tumor grades (I–III) using APMD-FFOCT, virtual H&E, and conventional H&E methods

To further assess the diagnostic value of virtual staining images, three pathologists, blinded to the staining techniques, evaluated the quality of the stains shown in Fig. 6. They graded the images in terms of tumor cells and fibers on a scale from 1 to 4: 4 for perfect, 3 for very good, 2 for acceptable, and 1 for unacceptable. The results, summarized in Table 2, indicate that the pathologists successfully identified histopathological features using both staining techniques. Additionally, there was a high level of agreement between the techniques, with no clear preference for either virtual staining or traditional histological staining.

Table 2 Pathologists’ evaluation of tumor cells and fibers across three types of images

3D Virtual H&E Staining of APMD-FFOCT

Owing to low-coherence gating, APMD-FFOCT enables label-free tomographic imaging with high axial resolution and without the need for slicing, so three-dimensional tomographic images can be acquired without damaging the sample. Figure 7a displays a 3D APMD-FFOCT volume of a grade III IDC sample, composed of 50 two-dimensional slices separated axially by 0.3 μm and covering a volume of 350 μm × 500 μm × 15 μm. Here, our primary goal is to demonstrate the three-dimensional morphology of cell nuclei, and this depth is sufficient to capture their complete 3D structure. The imaging depth of APMD-FFOCT in biological tissue is about 75 μm, as shown in Supplementary Fig. 3. In addition, deeper penetration can be achieved using longer wavelengths of light. This high-resolution stack not only allows for an in-depth study of tissue morphology but also enables detailed analysis of nuclear structures in cross-sections at arbitrary angles. For example, the cross-section in Fig. 7a shows a clear longitudinal profile with rich intranuclear structures, which provides a unique perspective for doctors analyzing pathological details. Figure 7b shows APMD-FFOCT image slices and the corresponding virtual H&E image slices at various depths. As APMD-FFOCT enables high-contrast, high-SNR imaging at various tissue depths, all APMD-FFOCT images are accurately transformed into virtual H&E images. The intranuclear details that appear dark in the APMD-FFOCT images (arrows) are translated into the white areas of the nuclei in the virtual H&E-stained images. We thus achieve the first 3D virtual H&E staining of APMD-FFOCT images, which may play a unique and important role in intraoperative histology.

Fig. 7: Virtual H&E Staining of 3D tomographic images.

a 3D APMD-FFOCT volume to 3D virtual H&E-stained volume conversion of IDC sample. b Comparison of APMD-FFOCT images and corresponding virtual H&E-stained images at various depths from a.

Discussion

We have demonstrated the effectiveness of combining APMD-FFOCT with a CycleGAN-based deep learning approach for performing 3D virtual H&E staining on unprocessed specimens, enabling rapid intraoperative diagnosis. This novel technique has been applied to human CNS and breast tumors, which are typically not suitable for conventional frozen section analysis but necessitate intraoperative diagnosis to inform surgical decision-making.

We first address the instability of D-FFOCT images, which is caused by random environmental vibrations. This issue is most prominent in structures with high reflectivity but low dynamics. The modulation of the interference phase is required to distinguish the light scattered within the coherence gate from background scattering. In D-FFOCT, this phase modulation arises from the tissue’s intrinsic metabolic dynamics; however, static components such as collagen fibers do not exhibit detectable phase changes. Therefore, by applying a 25 Hz, 23 nm active modulation to the reference mirror, we shift the originally static collagen fiber signals into a stable, distinguishable frequency band (indicated in blue in Fig. 2i), creating a sharp peak in the spectrum. This enables collagen fiber signals to be distinctly separated from both background scattering signals outside the coherence gate (zero-frequency component) and signals of metabolic dynamics (low-frequency component). The moving reference mirror introduces a reliable phase modulation for static structures, as it is used in FFOCT without dynamic contrast, which actively shifts the mirror by fractions of a wavelength. The difference is that the amplitude of active phase modulation is small enough so that it does not degrade the signals of intrinsic metabolic dynamics. Therefore, this technique combines the dynamic contrast of D-FFOCT and the backscatter contrast of FFOCT into a single, integrated image—without requiring additional acquisition time. Notably, this technique does not cancel the phase noise caused by environmental vibrations; instead, by selecting the appropriate portion of the frequency spectrum, we replace random, uncontrolled environmental vibrations with stable, controllable active phase modulation. As a result, it improves imaging stability and ensures continuity and consistency in image stitching, while enhancing the contrast of static tissues.
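The effect of the active modulation on a static scatterer can be illustrated with a simple two-beam interference simulation. This is a toy model under stated assumptions, not the authors' analysis: the center wavelength (`wavelength_nm`) and the static phase offset (`phase0`) are hypothetical values chosen only for illustration, the refractive index of the immersion medium and the camera integration time are ignored, and the function names are ours. Only the 25 Hz frequency, the 23 nm amplitude, and the 500-frame, 5 s acquisition come from the text.

```python
import numpy as np


def modulated_trace(reflectivity, amp_nm, f_mod_hz=25.0, fps=100.0, n_frames=500,
                    wavelength_nm=600.0, phase0=np.pi / 3):
    """Interference signal of a static scatterer with an oscillating reference mirror.

    The mirror moves by amp_nm * sin(2*pi*f_mod*t). wavelength_nm and phase0
    are hypothetical illustration values, not taken from the paper.
    """
    t = np.arange(n_frames) / fps
    # A mirror displacement d changes the round-trip path by 2d,
    # i.e. the interference phase by 4*pi*d / wavelength.
    phase = phase0 + 4.0 * np.pi * amp_nm * np.sin(2.0 * np.pi * f_mod_hz * t) / wavelength_nm
    return 1.0 + reflectivity + 2.0 * np.sqrt(reflectivity) * np.cos(phase)


def amplitude_spectrum(trace, fps=100.0):
    """Frequencies (Hz) and amplitude spectrum of a mean-subtracted trace."""
    freqs = np.fft.rfftfreq(trace.size, d=1.0 / fps)
    return freqs, np.abs(np.fft.rfft(trace - trace.mean()))


freqs, fiber = amplitude_spectrum(modulated_trace(reflectivity=0.01, amp_nm=23.0))
_, unmodulated = amplitude_spectrum(modulated_trace(reflectivity=0.01, amp_nm=0.0))
print("peak frequency with modulation (Hz):", freqs[np.argmax(fiber)])
print("largest spectral amplitude without modulation:", unmodulated.max())
```

In this model, a perfectly static scatterer contributes nothing outside the zero-frequency term when the mirror is at rest, while the small sinusoidal modulation places its signal at the modulation frequency with a height that grows with reflectivity. The height also depends on the static phase offset, which in real tissue would vary from pixel to pixel.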

We then compare APMD-FFOCT images and H&E-stained images across various types of CNS tumors. Standard H&E staining is a fundamental histopathological technique that highlights tissue structures and cellular details. Hematoxylin stains cell nuclei blue or purple, highlighting DNA and RNA, while eosin colors the cytoplasm and extracellular matrix in shades of pink and red. This differentiation allows pathologists to examine tissue morphology, discern normal from abnormal structures, and assess cellular features crucial for diagnosing diseases. Correspondingly, APMD-FFOCT, a label-free imaging technique, renders nuclei the brightest owing to their high metabolic dynamics, followed by the cytoplasm, whose lower brightness reflects its weaker metabolic activity. The extracellular matrix, typically more metabolically inert, can still be visualized distinctly when enhanced through active phase modulation. Therefore, although arising from different contrast mechanisms, APMD-FFOCT provides pathological information comparable to standard H&E-stained images. Particularly for calcified structures, such as psammoma bodies, APMD-FFOCT offers images of superior quality compared to H&E staining, as it avoids the need for intricate tissue processing procedures.

While APMD-FFOCT provides quick acquisition of high-contrast pathological images, it also introduces an extra learning cost for pathologists, impeding swift diagnostic decisions. This cost stems from the fact that pathologists are traditionally accustomed to making diagnoses from conventional histological stains. To overcome this challenge, we have implemented CycleGAN-based deep learning to perform virtual H&E staining on the APMD-FFOCT images. It is worth noting that the image stabilization achieved by APMD-FFOCT is crucial for virtual staining. Without it, the instability in traditional D-FFOCT might lead to the generation of erroneous information during virtual staining. The performance of our virtual H&E staining method has been validated on CNS tumor and IDC samples. The virtual H&E images, on the one hand, faithfully reflect the details presented in the original APMD-FFOCT images, such as the morphology and distribution of cell nuclei, cytoplasm, and extracellular matrix. On the other hand, their staining style is highly consistent with that of real H&E staining. Moreover, generating each APMD-FFOCT image typically involves capturing 500 raw images within a 5-s window to mitigate speckle noise and ensure high image quality. Nonetheless, thanks to the robustness of our model, reducing the acquisition time and the number of raw images to 1 s and 100 frames still yields almost the same virtual H&E images. As the processing procedures, which include the generation of APMD-FFOCT images and their conversion into virtual H&E-stained images, can occur concurrently with the acquisition of raw images, the overall imaging duration required to produce virtual H&E-stained images of wide areas is substantially shortened.

Because nuclei of tumor cells exhibit significantly higher metabolic activity, their contrast is more pronounced in APMD-FFOCT images than in images from other label-free imaging techniques32,33,34,35. This advantage helps mitigate issues of missing or erroneously generated nuclei in deep learning transformations, thereby enhancing the reliability of the resulting virtual H&E-stained images. Moreover, nuclear pleomorphism is a critical but challenging parameter for assessing histological grade, exhibiting low reproducibility even among expert pathologists48. This uncertainty is primarily attributed to the limitations of 2D histological specimen analysis49, as current histology only provides information on a single section through the cells and lacks three-dimensional images of the natural living cells. Owing to low-coherence gating, APMD-FFOCT is able to acquire label-free tomographic images with high axial resolution without the need for slicing. The deep learning algorithm can then transform these into 3D virtual H&E-stained volumes and provide a clearer and more intuitive visualization of nuclear pleomorphism and subnuclear structures. Therefore, this technique has the potential to improve the accuracy and reliability of this assessment.

From a cost and size perspective, APMD-FFOCT requires only an LED as the light source, eliminating the need for expensive and bulky pulsed lasers. Moreover, its adoption of full-field imaging negates the necessity for a complex galvanometer scanning system, further simplifying the system’s architecture. Importantly, this method does not require any labeling or fixation of tissues, and the light power irradiated onto biological tissues is only 1.2 mW, ensuring no damage to the tissue and allowing for its reuse. Compared to traditional D-FFOCT, no additional time is required in APMD-FFOCT imaging. Further improvements and exploration should be made toward better clinical use. For example, APMD-FFOCT remains susceptible to low-frequency disturbances, like airflow or human activity. Future efforts could focus on reducing the impact of low-frequency vibrations by improving the instrument’s anti-interference capabilities and implementing denoising algorithms.

It should be noted that in some APMD-FFOCT images, the nuclear areas appear larger than those in H&E images. This discrepancy may arise because a coverslip is used to flatten the tissue surface for imaging. When softer tissues are over-compressed, the nuclei become flattened, increasing their apparent area and decreasing their density. To address this issue, a smoother, flatter tissue surface can be prepared by cutting the tissue with a blade before imaging, minimizing tissue deformation. Moreover, the correlation between subnuclear structures observed in APMD-FFOCT and those in H&E-stained images requires additional investigation. Finally, with a custom large-FOV, high-NA objective and a camera with a higher pixel count, the imaging range can be expanded without compromising resolution, thereby reducing the overall imaging time.

In summary, the development of APMD-FFOCT marks a significant advancement in stabilizing traditional D-FFOCT and enhancing the contrast of biological tissues with low metabolic activity. This lays the groundwork for utilizing deep learning to create 3D virtual H&E-stained images. Furthermore, the use of deep learning significantly reduces overall imaging time by decreasing the number of raw images required. We have also demonstrated the effectiveness of the combination of APMD-FFOCT and virtual H&E staining on human CNS and breast tumors for fast diagnosis in intraoperative histology. We believe this approach could greatly assist in pathological diagnosis and provide rapid feedback for intraoperative decision-making.

Materials and methods

Experimental setup

The APMD-FFOCT configuration utilizes a Linnik interferometer with twin 20× water immersion objectives and a low-coherence light source. Light from an LED (Thorlabs M565L3, with a central wavelength of 565 nm and a full-width at half maximum of 104 nm) is divided equally into the sample and reference arms via a 50/50 non-polarizing beam splitter (BS013, Thorlabs). The light source is focused onto the back focal plane of the objective lens using a lens with a focal length of 40 mm. After passing through the objective, the light becomes collimated, enabling Köhler illumination and providing uniform illumination across the entire field of view. Identical high-magnification water immersion objectives (Nikon NIR APO 20×, 0.5 NA) are employed to attain a lateral resolution of ~0.7 μm over a 500 × 350 μm² field of view. Using water as a medium allows for refractive index matching to reduce surface reflections from the objective lens and cover slides. Simultaneously, it minimizes the separation between the focal plane and the coherence plane when adjusting imaging depth. The axial resolution is ~1 μm, determined by the light source’s coherence length.

In the sample arm, freshly excised tissues are placed on a custom sample holder, with a cover glass to flatten the tissue. The height of the cover glass can be finely adjusted using a threaded ring, which is then locked in place by a clasp to ensure stability. The holder is set on a five-axis control stage, with three axes for translational adjustments and two for angular corrections, ensuring the cover glass remains parallel to the focal plane for consistent imaging depth during image stitching. In the reference arm, a YAG (Yttrium Aluminum Garnet) reference mirror, modulated by an underlying PZT (TA0505D024W, Thorlabs), induces a 25 Hz phase modulation during imaging. This arm is attached to a high-precision electric translation stage (M-VP-25XL, Newport) to fine-tune the optical path difference between the arms, aligning the objective’s focal plane with the coherence plane for optimal imaging.

Scattered light from subcellular particles and the reference mirror recombines at the beam splitter, and the merged beam is then focused onto a camera (MV4-D1600-S01-GT, Photon Focus) through a tube lens. The entire apparatus is stationed on an active vibration isolation platform (VCM-S400, Jiangxi Shengsheng) to mitigate environmental disturbances.

Data acquisition, processing, and image generation

In our study, we employed custom-designed C++ software to collect all APMD-FFOCT datasets, facilitating three-dimensional scanning capabilities. The acquisition time of a dynamic image is set at 5 seconds (except Fig. 5), with a frame rate of 100 fps, to capture a wide range of intracellular movements. Each acquisition produces a tensor sized (550, 800, 500), where 550 × 800 represents the sensor pixels after pixel binning (2, 2), and 500 denotes the count of frames captured. Binning reduces pixel count, conserves storage, and quadruples the quantum well depth, enhancing the SNR. Each pixel yields a time-domain intensity trace of 500 points across 5 seconds, which is then Fourier-transformed to derive the frequency spectrum. The spectrum’s lowest and highest detectable motion frequencies are 0.2 Hz and 50 Hz, determined by the total acquisition duration and the camera’s frame rate. Capturing the image shown in Fig. 2e requires a total of 120 s.
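
The sampling arithmetic above can be illustrated with a minimal sketch. It assumes a single hypothetical pixel trace (a static offset plus a 10 Hz fluctuation, not data from this study) and uses a direct discrete Fourier sum from the Python standard library rather than the authors' C++ software; the function names are illustrative only.

```python
import cmath
import math

FRAME_RATE_HZ = 100.0  # camera frame rate given in the text
N_FRAMES = 500         # frames per 5 s acquisition


def frequency_axis(n_frames, frame_rate_hz):
    """One-sided DFT bin frequencies in Hz, from 0 up to the Nyquist limit."""
    df = frame_rate_hz / n_frames  # resolution = 1 / acquisition duration
    return [k * df for k in range(n_frames // 2 + 1)]


def amplitude_spectrum(trace):
    """One-sided DFT magnitude of one pixel's intensity trace (direct O(N^2) sum)."""
    n = len(trace)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                  for i, v in enumerate(trace))
        spectrum.append(abs(acc) / n)
    return spectrum


freqs = frequency_axis(N_FRAMES, FRAME_RATE_HZ)

# Hypothetical pixel trace: static offset plus a 10 Hz fluctuation.
trace = [1.0 + 0.1 * math.sin(2 * math.pi * 10.0 * i / FRAME_RATE_HZ)
         for i in range(N_FRAMES)]
spectrum = amplitude_spectrum(trace)

# Bin 0 holds the static term, so the search for the dynamic peak skips it.
peak_bin = max(range(1, len(spectrum)), key=spectrum.__getitem__)
```

With 500 frames at 100 fps, the first non-zero bin sits at 0.2 Hz and the last at 50 Hz, matching the limits quoted above. For full image stacks, a fast Fourier transform applied along the time axis would replace the direct sum.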

The reflectivity of scatterers located within the coherence slice of the sample at a given depth is denoted as \(R(x,y)\). The remaining light backscattered from the sample, which does not interfere and is collected by the microscope objective, is represented by an equivalent reflectivity coefficient, \({R}_{{Inc}}\). The reflectivity of the reference mirror is uniform and is denoted as \({R}_{{ref}}\). The axial positions of the scatterers and the reference mirror are \({P}_{s}\left(x,y,t\right)\) and \({P}_{r}\left(t\right)\), respectively. The phase of the scatterers is represented by \(\varphi (x,y)\). The intensity of light incident onto the CMOS sensor can be written as:

$$\begin{array}{l}I\left(x,y,t\right)=\frac{{I}_{0}\left(x,y\right)}{4}\,{{\cdot }}\,\left({R}_{{Inc}}+{R}_{{ref}}+R\left(x,y\right)\right.\\\qquad\qquad\quad\left.+\,2\sqrt{R\left(x,y\right)\,{{\cdot }}\,{R}_{{ref}}}\,{\cdot}\,\cos \left(2\pi \,{{\cdot }}\,\frac{{P}_{s}\left(x,y,t\right)-{P}_{r}\left(t\right)}{\lambda }+\varphi (x,y)\right)\right)\end{array}$$
(1)

where \({I}_{0}\) is the intensity of light incident on the interferometer beam splitter. We applied a sinusoidal voltage with a frequency of 25 Hz and an amplitude of 0.5 V to the PZT to generate a sinusoidal motion in the reference mirror with a frequency of 25 Hz and an amplitude of ~23 nm.

$${P}_{r}\left(t\right)={A}_{{pzt}}\,{{\cdot }}\,{e}^{i2\pi \,{{\cdot }}\,{f_{0}}\,{{\cdot }}\,t}$$
(2)
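
To see why the mirror modulation moves a static scatterer into the 25 Hz band, the following sketch evaluates Eq. (1) for a motionless scatterer with and without the sinusoidal mirror motion defined above. It is a minimal illustration under stated assumptions: the physical displacement is taken as a real sinusoid of amplitude \({A}_{{pzt}}\), the scatterer position is fixed at zero, and the reflectivities and the phase \(\varphi =\pi /3\) are arbitrary illustrative values rather than measured ones.

```python
import cmath
import math

WAVELENGTH_NM = 565.0  # LED central wavelength from the setup description
F0_HZ = 25.0           # modulation frequency
A_PZT_NM = 23.0        # modulation amplitude
FRAME_RATE_HZ = 100.0
N_FRAMES = 500


def intensity_trace(r_sample, r_ref=0.02, r_inc=0.05, phi=math.pi / 3,
                    modulate=True, i0=1.0):
    """Eq. (1) for a static scatterer (P_s = 0); reflectivities are illustrative."""
    trace = []
    for n in range(N_FRAMES):
        t = n / FRAME_RATE_HZ
        # Assumed real mirror displacement: sinusoid of amplitude A_pzt.
        p_r = A_PZT_NM * math.sin(2 * math.pi * F0_HZ * t) if modulate else 0.0
        phase = 2 * math.pi * (0.0 - p_r) / WAVELENGTH_NM + phi
        trace.append(i0 / 4 * (r_inc + r_ref + r_sample
                               + 2 * math.sqrt(r_sample * r_ref) * math.cos(phase)))
    return trace


def amplitude_at(trace, freq_hz):
    """Single-bin DFT magnitude of the trace at freq_hz."""
    n = len(trace)
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * i / FRAME_RATE_HZ)
              for i, v in enumerate(trace))
    return abs(acc) / n


with_mod = amplitude_at(intensity_trace(0.01, modulate=True), F0_HZ)
without_mod = amplitude_at(intensity_trace(0.01, modulate=False), F0_HZ)
```

In this toy case the unmodulated trace is constant, so it contributes only to the zero-frequency bin, whereas the modulated trace produces a clear component at 25 Hz. The size of that component depends on \(\varphi\): it scales with \(\sin \varphi\) and vanishes for \(\varphi =0\).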

To visualize the dynamic characteristics of metabolic activities, we utilized the Hue-Saturation-Value (HSV) color model. The hue, indicating the average fluctuation velocity, is calculated from the spectral centroid.

$${\rm{Hue}}\left({\rm{x}},{\rm{y}}\right)={\alpha }_{H}\,{{\cdot }}\,\frac{\int S\left(x,y,f\right)\,{{\cdot }}\,f{df}}{\int S\left(x,y,f\right){df}}+{\beta }_{H}$$
(3)

where \(f\) is the frequency and \(S\left(x,y,f\right)\) is the spectral intensity, obtained by Fourier transform of the light intensity trace. This formula calculates the intensity-weighted mean frequency of the spectrum. \({\alpha }_{H}\) and \({\beta }_{H}\) are the coefficient and constant of the linear transformation.

Saturation is inversely related to the standard deviation of the frequencies. A wider spectrum indicates lower saturation, which, to an extent, mirrors the complexity of motion patterns. Saturation is quantified as follows, where \({sc}\) denotes the spectral centroid.

$${\rm{Saturation}}\left({\rm{x}},\,{\rm{y}}\right)={\alpha }_{S}\,{{\cdot }}\,\frac{1}{\sqrt{\frac{\int {\left(f-{sc}\right)}^{2}\,{{\cdot }}\,S\left(x,\,y,\,f\right){df}}{\int S\left(x,\,y,\,f\right){df}}}}+{\beta }_{S}$$
(4)

The \(V{alue}\) (brightness) of the image is determined by the integration of the frequency spectrum, which represents fluctuation amplitude.

$$V{alue}\left(x,y\right)={\alpha }_{V}\,{{\cdot }}\,\int S\left(x,y,f\right){df}+{\beta }_{V}$$
(5)
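
A discrete version of Eqs. (3)-(5) can be sketched as follows. This is an illustrative reading rather than the authors' implementation: integrals are replaced by sums over frequency bins (the constant bin width is absorbed into the \(\alpha\) coefficients), the linear scaling is left out, and the function name and band arguments are hypothetical.

```python
import math


def hsv_metrics(freqs, spectrum, f_lo, f_hi):
    """Unscaled hue, saturation and value terms for one pixel.

    Returns (spectral centroid, inverse spectral bandwidth, spectral sum),
    i.e. the quantities of Eqs. (3)-(5) before alpha and beta are applied.
    """
    band = [(f, s) for f, s in zip(freqs, spectrum) if f_lo <= f <= f_hi]
    total = sum(s for _, s in band)  # discrete integral of S -> Value
    if total == 0:
        return 0.0, 0.0, 0.0         # no signal in the band
    centroid = sum(f * s for f, s in band) / total  # -> Hue
    variance = sum((f - centroid) ** 2 * s for f, s in band) / total
    bandwidth = math.sqrt(variance)
    # A single-bin spectrum has zero bandwidth; report infinity in that case.
    inv_bandwidth = 1.0 / bandwidth if bandwidth > 0 else float("inf")  # -> Saturation
    return centroid, inv_bandwidth, total
```

For example, a spectrum with two equal peaks at 5 Hz and 15 Hz has a centroid of 10 Hz and a bandwidth of 5 Hz, so a narrower spectrum around the same centroid would yield a higher saturation.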

In generating the APMD-FFOCT images, we set the integration ranges of Eqs. (3)–(5) from 0.4 Hz to 24.8 Hz, excluding the zero-frequency component that represents the static background, to produce images representing metabolic dynamics (primarily containing tumor cells). Additionally, the integration range is set to 24.8 Hz to 25.2 Hz to generate reflectance images (primarily containing collagen fibers). Since the reflectance of collagen fibers is significantly higher than that of tumor cells, we apply a threshold to the reflectance image, setting pixels below this threshold to zero to retain only collagen fibers. Finally, we combine the highly metabolic tumor cells with the high-reflectance collagen fibers to generate the APMD-FFOCT images.
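
The band selection described above can be sketched per pixel as follows. Two points are assumptions of this sketch: the shared 24.8 Hz boundary bin is assigned to the reflectance band only, and the threshold value is a free parameter, since neither is specified in the text. The function names are hypothetical.

```python
DF_HZ = 0.2  # frequency resolution of a 5 s acquisition


def hz_to_bin(f_hz):
    """Index of the DFT bin closest to f_hz."""
    return round(f_hz / DF_HZ)


def split_bands(spectrum, threshold):
    """Split one pixel's one-sided spectrum into dynamic and reflectance terms.

    spectrum[k] is the magnitude at k * 0.2 Hz. Bin 0 (static background)
    and bin 1 (0.2 Hz) fall outside both bands.
    """
    lo, edge, hi = hz_to_bin(0.4), hz_to_bin(24.8), hz_to_bin(25.2)
    dynamic = sum(spectrum[lo:edge])          # 0.4 Hz up to, not including, 24.8 Hz
    reflectance = sum(spectrum[edge:hi + 1])  # 24.8 Hz to 25.2 Hz inclusive
    if reflectance < threshold:
        reflectance = 0.0                     # keep only strongly reflecting pixels
    return dynamic, reflectance
```

A pixel with a large zero-frequency term, a modest 10 Hz component and a strong 25 Hz component would therefore report the 10 Hz energy as dynamic signal and the 25 Hz energy as reflectance, with the static background discarded.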

The above parameters for cell images are scaled using the same set of linear transformation coefficients, and the parameters for collagen fiber images are likewise scaled using their own consistent coefficients (except in Fig. 5). The APMD-FFOCT images in Fig. 5 are generated from different numbers of raw images; therefore, the linear transformation coefficients are adjusted accordingly. Finally, the HSV image is transformed into the RGB color space for display. All data processing is executed via our custom software. Nuclear parameters in Table 1 were quantified by manually selecting the boundaries of the nuclei using Fiji.
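
The final scaling and color conversion can be sketched with the Python standard library. The coefficient values below are placeholders and not those used in this work, and clipping the scaled values to the range [0, 1] is an assumption made so that they are valid HSV inputs.

```python
import colorsys


def scale(x, alpha, beta):
    """Linear transform alpha * x + beta, clipped to the [0, 1] range used by HSV."""
    return min(1.0, max(0.0, alpha * x + beta))


def pixel_to_rgb(centroid, inv_bandwidth, integral, coeffs):
    """Map the three spectral quantities of one pixel to an RGB triple.

    coeffs maps 'H', 'S' and 'V' to (alpha, beta) pairs; values are placeholders.
    """
    h = scale(centroid, *coeffs["H"])
    s = scale(inv_bandwidth, *coeffs["S"])
    v = scale(integral, *coeffs["V"])
    return colorsys.hsv_to_rgb(h, s, v)


# Placeholder coefficients, not taken from the paper.
coeffs = {"H": (0.02, 0.0), "S": (1.0, 0.0), "V": (0.5, 0.0)}
rgb = pixel_to_rgb(0.0, 2.0, 2.0, coeffs)
```

Using one fixed set of coefficients for all images of a given type, as described above, keeps colors comparable between fields of view.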

SNR analysis for the APMD-FFOCT

Since the contrast in APMD-FFOCT images is derived from metabolic dynamics, the SNR is influenced not only by the imaging instrument but also by the biological tissue being imaged. Nevertheless, we can quantify the SNR of the particular APMD-FFOCT images. The SNR of images is defined as:

$${SNR}=\frac{{\mu }_{I}}{{\sigma }_{N}}$$
(6)

Here, \({\mu }_{I}\) represents the mean pixel intensity in the corresponding region, and \({\sigma }_{N}\) denotes the standard deviation of the noise region. The noise region in APMD-FFOCT images is defined as an area without reflection (such as water).
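
Eq. (6) can be evaluated in a few lines. The sketch below assumes the pixel values of the two regions have already been extracted as flat sequences and uses the population standard deviation, since the text does not state which estimator was applied.

```python
import statistics


def snr(signal_pixels, noise_pixels):
    """Eq. (6): mean intensity of the signal region over the noise standard deviation."""
    mu_i = statistics.fmean(signal_pixels)
    sigma_n = statistics.pstdev(noise_pixels)  # population estimator (assumption)
    return mu_i / sigma_n
```

For instance, a signal region with a mean intensity of 10 and a noise region with values 1 and 3 (standard deviation 1) gives an SNR of 10.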

Converting APMD-FFOCT images into H&E-stained images with CycleGAN

The raw APMD-FFOCT datasets were first processed using the above-mentioned Fourier-transform method. Frequency spectra at 25 Hz were isolated to enable flexible adjustment of hue and brightness for static tissues, such as collagen fibers and calcified tissues. These tissues were normalized to a moderate intensity. Furthermore, since the nuclei are brighter than the cytoplasm, the images were then converted to grayscale and inverted to match standard H&E-stained images. Then, we used a trained deep neural network to convert the grayscale APMD-FFOCT images into H&E-stained images.

We utilized CycleGAN, a type of Generative Adversarial Network (GAN) known for its effectiveness in image-to-image translation tasks without the need for paired images. The architecture of CycleGAN is composed of two generators (G: OCT → HE, F: HE → OCT) and two discriminators (\({D}_{{OCT}}\) and \({D}_{{HE}}\)). The purpose of virtual H&E staining is to determine the most suitable generator G: OCT → HE. The success of CycleGAN is mainly attributed to its cycle-consistency loss, which ensures that an image translated from one domain to the other and back again closely matches the original. As a result, the generators produce images that mimic the statistical properties of the target domain. The cycle-consistency loss function was defined as:

$${L}_{{cycle}}=\left|F\left(G\left({OCT}\right)\right)-{OCT}\right|+\left|G\left(F\left({HE}\right)\right)-{HE}\right|$$
(7)

The least-squares adversarial losses were combined with cycle-consistency loss to train the generators, so the total loss function was defined as:

$${L}_{G}={\left(1-{D}_{{HE}}\left(G\left({OCT}\right)\right)\right)}^{2}+{\left(1-{D}_{{OCT}}\left(F\left({HE}\right)\right)\right)}^{2}+\lambda {L}_{{cycle}}$$
(8)

where \(\lambda\) is the regularization parameter which was set to 10.

In contrast to the generators, the discriminators were used to distinguish between the real and the generated images, and the loss functions for \({D}_{{OCT}}\) and \({D}_{{HE}}\) were defined as:

$${L}_{{D}_{{HE}}}={D}_{{HE}}{\left(G\left({\rm{OCT}}\right)\right)}^{2}+{\left(1-{D}_{{HE}}\left({HE}\right)\right)}^{2}$$
(9)
$${L}_{{D}_{{OCT}}}={D}_{{OCT}}{\left(F\left({\rm{HE}}\right)\right)}^{2}+{\left(1-{D}_{{OCT}}\left({OCT}\right)\right)}^{2}$$
(10)
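
The structure of these losses can be sketched with toy stand-ins. In the sketch below, images are flat lists of floats, the generators and discriminators are arbitrary callables, and the absolute difference is averaged over pixels; all of these are simplifications for illustration, not the network or framework code used in this work. The cycle term follows the standard CycleGAN form, \(\left|F\left(G\left({OCT}\right)\right)-{OCT}\right|+\left|G\left(F\left({HE}\right)\right)-{HE}\right|\), and each discriminator is scored on a generated image and a real image from its own domain.

```python
def l1(a, b):
    """Mean absolute difference between two equally sized images (flat lists)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_loss(G, F, oct_img, he_img):
    """Round-trip error: OCT -> HE -> OCT plus HE -> OCT -> HE."""
    return l1(F(G(oct_img)), oct_img) + l1(G(F(he_img)), he_img)


def generator_loss(G, F, D_he, D_oct, oct_img, he_img, lam=10.0):
    """Least-squares adversarial terms plus the weighted cycle-consistency term."""
    adversarial = (1 - D_he(G(oct_img))) ** 2 + (1 - D_oct(F(he_img))) ** 2
    return adversarial + lam * cycle_loss(G, F, oct_img, he_img)


def discriminator_loss(D, fake_img, real_img):
    """Least-squares loss: generated images pushed toward 0, real images toward 1."""
    return D(fake_img) ** 2 + (1 - D(real_img)) ** 2
```

As a sanity check, if both generators are exact inverses of each other (for example, intensity inversion) the cycle term is zero, and a discriminator that outputs 0.5 for every input incurs a loss of 0.5.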

The detailed architecture of CycleGAN was described by Zhu et al.31. The training dataset contains 3600 APMD-FFOCT images and 3450 H&E tiles of size 256 × 256 pixels, including human CNS tumor and breast tumor image patches. In addition, 400 APMD-FFOCT images of size 800 × 550 pixels were used for testing. Training was performed on an NVIDIA GeForce RTX 3080 GPU (10 GB). The loss function was optimized using the Adam solver, with an initial learning rate of 0.0001 and a batch size of 8. After 200 training epochs, the weights of the resulting generator G were used to convert APMD-FFOCT images into H&E-stained images.

Sample preparation

The study procedures for CNS tumors received approval from the Ethics Committee of Beijing Tsinghua Changgung Hospital, and the study procedures for breast tumors were approved by the Ethics Committee of Peking University People’s Hospital. Informed consent was obtained from each patient undergoing cancer surgery before imaging. Fresh tumor tissue was promptly collected from the tumor bed of the excised surgical specimen, ensuring the presence of at least one smooth and flat surface suitable for APMD-FFOCT imaging. The excised tissue was kept moist with saline and the container was stored on ice until imaging. All data were obtained within 2 h after excision at room temperature. During the imaging time, the contrast of the APMD-FFOCT images changed very little, as shown in Supplementary Fig. 4. Moreover, in APMD-FFOCT imaging, the brightness of cell nuclei shows no significant variation within the temperature range of 28–40 °C. Beyond 40 °C, the brightness of the cell nuclei begins to decrease, as shown in Supplementary Fig. 5. This may be because enzyme activity decreases at excessively high temperatures. After APMD-FFOCT imaging, the tissue was fixed in formalin for paraffin H&E pathology. Pathologists with expertise in histology provided a pathological diagnosis for each tissue based on the corresponding H&E slide.