Adaptive compression framework for giga-pixel whole slide images

Lee, Jonghyun; Takemaru, Lina; Bappy, D. M.; Jeong, Ye Sul; Jeong, Won-Ki; Oldridge, Derek; Kim, Dokyoon; Ahn, Sangjeong; Lee, Sung Hak

doi:10.1038/s41467-025-66889-0

Published: 03 December 2025

Nature Communications volume 17, Article number: 207 (2026)

Abstract

Digital pathology generates gigapixel whole-slide images that require extensive storage, limiting scalability and increasing costs. Traditional compression methods apply uniform ratios, disregarding variations in diagnostic importance across regions. Here we show Adaptive compression for gigapixel whole-slide images (AdaSlide), a framework that balances compression efficiency and diagnostic integrity. AdaSlide integrates a reinforcement learning-based Compression Decision Agent (CDA) that determines the optimal compression ratio per region, and a Foundational Image Enhancer (FIE) that restores visual fidelity after compression. In evaluations, pathologists achieved near-chance accuracy (55%) in distinguishing original from restored images, confirming high perceptual fidelity. Across 13 downstream tasks, AdaSlide maintained diagnostic performance in most tasks while reducing storage to 10-35% of the original size. By aligning compression strategies with clinical relevance, AdaSlide enables efficient, scalable, and reliable storage for digital pathology and future AI development.

Introduction

Recently, the transition from glass-slide pathology to the digital pathology system (DPS) has led to the emergence of machine learning-integrated DPS, known as computational pathology (CPath)^{1,2,3,4}. CPath has demonstrated strong performance in various pathological tasks, including cancer diagnosis^{5,6,7,8,9,10,11}, survival prediction¹², lymph node invasion detection^{13,14}, and cell segmentation^{15,16}. CPath performance has improved significantly through the use of large-scale digitized slides, with robust foundation models recently proposed^{17,18,19,20,21,22,23}. However, unlike natural images, whole-slide images (WSIs) are very large (3 or 4 GB), resulting in significant storage burdens for data archiving. For example, Linköping Hospital collects digital slides totaling 130 TB every six months²⁴. Due to storage limitations, slides older than six months are deleted, which is a temporary, impractical, and invasive solution. Simply expanding storage is also impractical, as it requires physical space and increases maintenance costs. Additionally, cloud servers may not be an optimal solution, as storage costs escalate cumulatively (see Supplementary Note 1 and Supplementary Fig. 1). Nevertheless, most CPath studies remain focused on prediction accuracy, leaving the issue of efficient data storage unresolved.

To tackle this large-scale storage issue, several methods have been proposed, including neural image compression (NIC), which compresses the original high-dimensional image into a low-dimensional vector using neural networks^{25,26,27,28,29}. These encoder-focused NIC methods use supervised learning to maximize task-specific prediction performance. However, encoder-focused NICs are unsuitable for image data storage because, without a decoder, the compressed vectors cannot be reconstructed into the original images. Encoder-decoder-based NICs are more suitable for efficient storage, but few such methods are available for digital pathology applications^{30,31,32,33}. A notable effort by Tellez et al.³⁰ involved training Variational Autoencoders (VAE)³⁴ and Bidirectional Generative Adversarial Networks (BiGAN)³⁵ to encode and decode digital pathology images. They demonstrated the effectiveness of NIC for digital pathology image compression and restoration, with minimal performance degradation compared to the original image-based methods. Nasr et al.³³ focused on the compression ratio (CR) using the VAE model and reported a state-of-the-art (SOTA) CR (1/512) on cancer imaging data. Previous studies tessellated WSIs into patches of 32 × 32 or 64 × 64 pixels, whereas Keighley et al.³¹ employed patches of 512 × 512 pixels and demonstrated successful compression and restoration using a Vector Quantized VAE (VQVAE)-2³⁶ model. With the matched CR (1/19.2), the outputs of VQVAE-2 (namely, VQVAE) showed high structural similarity (SSIM) to the JPEG-compressed images (0.8); however, the peak signal-to-noise ratio (PSNR) degraded (20 vs. 29 for VQVAE and JPEG, respectively). In contrast to NICs, Afshari et al.³² applied super-resolution (SR) methods to digital pathology images. The difference between NICs and SR is that NICs use compressed vectors, whereas SR relies on human-interpretable low-resolution images. In their empirical study, the SR models successfully reconstructed low-resolution images into their original high-resolution forms.

Previous studies^{30,31,32,33} have focused on compressing digital pathology images using specific models to achieve a higher CR with minimal information loss. However, even promising models suffer information loss during the encoding and decoding steps due to structural limitations, and this loss worsens at higher compression levels. Consequently, there is a risk of losing clinically significant information while aiming to maximize the CR. To address this concern, we focused on the intra-slide information variance in gigapixel WSIs, which we refer to as information disequilibrium (see “Information Disequilibrium” section for more details). Each WSI comprises numerous regions of interest (ROIs), some containing diagnostic information while others do not. By enabling the model to distinguish between informative and non-informative ROIs, we can optimize compression by heavily compressing only clinically uninformative ROIs, thereby minimizing critical information loss.

One recent work³⁷ introduced a multiple instance learning (MIL)-based method for identifying informative regions. MIL learns the relationship between the labels provided during training and the instances within a bag, enabling the model to distinguish between more and less important instances. Based on this capability, the authors proposed applying different compression rates to important and less important regions, as determined by a trained MIL classifier. However, this approach has several limitations. Because the MIL classifier is trained in a supervised manner tailored to a specific task, (1) it learns to distinguish instances only in the context of that task and (2) is therefore difficult to generalize to other tasks with different objectives. Moreover, (3) the method is highly dependent on the performance of the MIL classifier itself and requires an appropriate dataset for MIL training, which may not always be available. Consequently, while this approach can be effective in certain controlled settings, its general applicability and scalability are limited.

To address these issues, we propose an Adaptive compression framework for gigapixel whole-slide images, namely AdaSlide, to account for information disequilibrium. AdaSlide is designed to meet the following constraints: (1) the lost information should be minimized, (2) each patch image must be maximally compressed, (3) the compression decision should be object-dependent, and (4) the compression decision and image enhancement should be agnostic to backbone-model architecture, organ, magnification, and downstream tasks.

AdaSlide consists of two core modules: a Compression Decision Agent (CDA) and a Foundational Image Enhancer (FIE). The CDA autonomously decides whether to compress an image based on its informational content. The images chosen for compression are downscaled from the D-dimension to the d-dimension (D > d), and later restored to the D-dimension using the FIE. The CDA is trained using reinforcement learning, eliminating the need for human annotation and allowing it to consider key factors in compression decisions based on the design of the reward function. The FIE is trained to handle multi-organ, multi-magnification scenarios and can be easily substituted with multiple backbones. In summary, our main contributions are as follows:

AdaSlide is an organ- and task-agnostic compression method.
AdaSlide provides flexible control over compression tendency and quantifies clinically relevant information.
AdaSlide applies information disequilibrium in a scalable, task-agnostic compression framework.
AdaSlide offers an optimal solution for mid- and long-term archiving, achieving 65–90% storage reduction with minimal impact on clinical diagnosis.

Results

Overview of AdaSlide

AdaSlide is trained in two stages. First, the FIE, designed to handle diverse magnifications and cancer types, is trained. Subsequently, the CDA is trained using reinforcement learning to determine the optimal compression level for each patch image based on its information content. AdaSlide’s inference process also consists of two stages: encoding and decoding. During encoding, the CDA selects an action (keep or compress) for each patch, and the original patch is compressed accordingly. During decoding, compressed patches are restored with the FIE. The entire AdaSlide pipeline is shown in Fig. 1B and Box 1.

Box 1 AdaSlide adaptive compression algorithm

Input: Patch images P

Output: Reconstructed whole-slide image R

Initialize empty sets for compressed (S_P), non-compressed (S_N), and enhanced (S_E) patches.

Encoding:

For each patch p ∈ P:

Obtain compression decision $\hat{h}=\mathrm{CDA}(p)$.

If $\hat{h}\ge \tau$, set p_c = FIE_encoder(p) and add to S_P; otherwise add p to S_N.

Decoding:

For each p_c ∈ S_P, compute p_e = FIE_decoder(p_c) and add to S_E.

Reconstruct R from S_N ∪ S_E: R ← Reconstruct(S_N ∪ S_E).
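The control flow of Box 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `adaslide_encode` and `adaslide_decode`, the dictionary keyed by patch position, and the toy stand-ins for the CDA and FIE are assumptions made for the example, not the authors' implementation.

```python
from typing import Any, Callable, Dict, Hashable, Tuple


def adaslide_encode(
    patches: Dict[Hashable, Any],
    cda: Callable[[Any], float],
    fie_encoder: Callable[[Any], Any],
    tau: float,
) -> Tuple[Dict[Hashable, Any], Dict[Hashable, Any]]:
    """Encoding stage: split patches into kept (S_N) and compressed (S_P)."""
    kept, compressed = {}, {}
    for position, patch in patches.items():
        h_hat = cda(patch)  # compression decision for this patch
        if h_hat >= tau:
            compressed[position] = fie_encoder(patch)
        else:
            kept[position] = patch
    return kept, compressed


def adaslide_decode(
    kept: Dict[Hashable, Any],
    compressed: Dict[Hashable, Any],
    fie_decoder: Callable[[Any], Any],
) -> Dict[Hashable, Any]:
    """Decoding stage: restore compressed patches (S_E) and merge with S_N."""
    enhanced = {pos: fie_decoder(code) for pos, code in compressed.items()}
    return {**kept, **enhanced}  # R, keyed by patch position


# Toy stand-ins (hypothetical): a "patch" is a flat list of intensities,
# the CDA compresses perfectly flat patches, the encoder keeps every second
# value, and the decoder repeats each value twice.
patches = {(0, 0): [9, 9, 9, 9], (0, 1): [1, 5, 2, 7]}
toy_cda = lambda p: 1.0 if max(p) == min(p) else 0.0
toy_encoder = lambda p: p[::2]
toy_decoder = lambda code: [v for v in code for _ in range(2)]

kept, compressed = adaslide_encode(patches, toy_cda, toy_encoder, tau=0.5)
restored = adaslide_decode(kept, compressed, toy_decoder)
```

In the toy run only the flat patch is compressed, so the restored dictionary equals the input. With a real FIE the restored patches would only approximate the originals.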

Information disequilibrium

Information disequilibrium arises from the gigapixel scale of WSIs, where not all pixels contribute equally to clinical diagnosis. For instance, in identifying tumor regions within a WSI, areas densely packed with cells are more relevant than background, adipose tissue, or bone. Similarly, detecting lymphovascular invasion or perineural invasion requires focusing on specific tube-like structures, such as lymph nodes, vascular tubes, or perineural regions. This suggests that clinically important regions vary dynamically with diagnostic tasks and goals, rendering equal attention across all regions unnecessary.

MIL offers one example: it divides a WSI into multiple patch instances and applies attention mapping to selectively identify regions that contribute significantly to label prediction^{37,38,39}. Advanced MIL models incorporate locality and global patterns^{40,41}, prioritize key instances⁴², or leverage clustering loss to effectively differentiate label-relevant instances³⁸. This study defines this dynamic allocation of importance across regions as information disequilibrium. Li et al.³⁷, for example, proposed an adaptive decompression method that leverages MIL-based attention maps to assign different decompression depths to image patches. While that paper does not explicitly define the concept of information disequilibrium, its strategy effectively leverages the inherent information imbalance across regions to prioritize the decompression of diagnostically relevant areas.

Hypothesis of AdaSlide

Information disequilibrium is the cornerstone of this study because traditional uniform compression models fail to address it, leading to unavoidable information loss by compressing all regions equally. To design an information content-aware compression framework that accounts for information disequilibrium, we propose the following hypotheses:

(1) Tumor-related diagnosis is one of the primary tasks.
(2) Cellular information is clinically critical for tumor diagnosis.
(3) Certain patch instances are more challenging to reconstruct than others.

While tumor-related diagnosis is not the sole clinical priority, we focused on it due to our training data’s characteristics and the prominence of tumor-related research in CPath. Accordingly, we designed a CDA reward function tailored to tumor-related diagnosis. In this study, clinically informative regions are defined as tumor-related areas. However, AdaSlide is flexible and can redefine clinically informative regions by modifying the reward function.

To quantify regions contributing to tumor-related diagnosis, we utilized cellular information, recognizing that the morphological features and distribution of cells are crucial to tumor diagnosis. Furthermore, we incorporated the compression difficulty of cellular information to balance the information loss and compression performance. Guided by hypotheses (2) and (3), we constructed a 2 × 2 conditional matrix (Fig. 2).

Clinically informative and easy to restore (Zone A).
Clinically informative and hard to restore (Zone B).
Clinically uninformative and easy to restore (Zone C).
Clinically uninformative and hard to restore (Zone D).

**Fig. 2: Example of the hypothesis field of AdaSlide.**

In this hypothesis field, the CDA is designed to prioritize Zone B by maintaining its original information as much as possible, while employing compression strategies for the remaining zones. This is achieved through a combination of compression rewards and information penalties.

Datasets

The PanCancer dataset, derived from 31 projects from The Cancer Genome Atlas (TCGA) dataset, has 930 WSIs, with 30 WSIs extracted from each project. It also includes 1.8 million patch images extracted at 20× and 40× magnifications (Fig. 1A). The PanCancer dataset was divided into training (94.5%), validation (5%), and test (0.5%) sets, corresponding to 1,766,502, 93,307, and 9,393 patch images, respectively. To further evaluate the FIEs’ reconstruction performance, we utilized datasets from the Clinical Proteomic Tumor Analysis Consortium (CPTAC). A total of 110 WSIs were selected, with 10 WSIs per project across 11 projects. Additionally, 29,861 patch images were extracted using the same patch generation pipeline and were used for performance evaluation. Details of the TCGA and CPTAC projects are summarized in Supplementary Note 2.
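As a quick consistency check, the reported patch counts can be converted back into split percentages. The snippet below uses only the numbers quoted above; the variable names are illustrative.

```python
# Patch counts quoted in the text for the PanCancer dataset.
splits = {"train": 1_766_502, "validation": 93_307, "test": 9_393}

total_patches = sum(splits.values())  # about 1.87 million patches
percentages = {name: 100 * n / total_patches for name, n in splits.items()}

# Slide counts: 31 TCGA projects x 30 WSIs, and 11 CPTAC projects x 10 WSIs.
tcga_slides = 31 * 30
cptac_slides = 11 * 10

for name, pct in percentages.items():
    print(f"{name}: {pct:.2f}%")
```

The computed shares round to 94.5%, 5.0%, and 0.5%, in line with the stated split.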

We evaluated the performance of AdaSlide on patch-level tasks (classification and segmentation) and slide-level tasks (classification) using 13 benchmark datasets, as summarized in Table 1. Detailed information, including preprocessing steps, is provided in the “Methods” section.

Table 1 Summarization of downstream task datasets

For the patch-level classification tasks, we used five datasets: NCT-CRC (9 classes; colon)⁴³, MHIST (binary class; colon)⁴⁴, LI (binary class; stomach)¹⁴, SICAPv2 (4 classes; prostate)⁴⁵, and WSSS4LUAD (binary class; lung)⁴⁶ datasets.

For patch-level segmentation tasks, we used the SNOW (binary class; breast)⁴⁷, NuInsSeg (binary class; various organs)⁴⁸, and PanNuke (binary class; various organs)⁴⁹ datasets. The NuInsSeg and PanNuke datasets include multiple classes; however, for simplicity, we converted the multi-class masks into binary masks. Moreover, the NuInsSeg dataset contains mouse-derived H&E images and mask pairs, which we excluded from the analysis.

For slide-level classification, we used the TCGA-RCC, TCGA-NSCLC, TCGA-BRCA, Camelyon16, and Children’s Brain Tumor Network (CBTN)⁵⁰ datasets. The TCGA-RCC dataset combines multiple renal cell carcinoma (RCC) projects: TCGA-KICH, TCGA-KIRC, and TCGA-KIRP. Similarly, the TCGA-NSCLC dataset integrates non-small cell lung cancer (NSCLC) subtypes, including TCGA-LUAD and TCGA-LUSC. The TCGA-BRCA dataset was used for binary subtype classification, while the Camelyon16 dataset was used for tumor vs. non-tumor slide classification. Lastly, the CBTN dataset was used for pediatric brain tumor subtype classification, covering two pediatric-specific brain tumor types: medulloblastoma and ependymoma.

Foundational Image Enhancer (FIE)

The image restoration performances of the FIEs are shown in Fig. 3A, and sample output images are illustrated in Fig. 3B and Supplementary Figs. 2 and 3. We used the VAE³⁴, VQVAE³⁶, ESRGAN⁵¹, Swin Image Restoration (SwinIR)⁵², and Latent Diffusion Model (LDM)⁵³ as FIE backbones, as these models have been widely adopted for digital pathology image compression tasks in previous studies^30,31,33. In addition, we included Transformer-based and diffusion-based models that have demonstrated strong performance in image restoration tasks to provide a comprehensive comparison across different model families. Three evaluation metrics were used: Structural Similarity Index Measure (SSIM), Peak Signal-to-Noise Ratio (PSNR), and Learned Perceptual Image Patch Similarity (LPIPS)⁵⁴.
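Of the three metrics, PSNR has the simplest closed form: 10·log10(MAX² / MSE), where MSE is the mean squared error between the reference and restored pixels. A minimal sketch follows, assuming 8-bit intensities (MAX = 255) and images passed as flattened pixel sequences. The function name and signature are illustrative, not taken from the paper's code.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         data_range: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two flattened images of equal size."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


# Toy example: every pixel is off by exactly one intensity level, so MSE = 1.
original = [10, 20, 30, 40]
off_by_one = [11, 21, 31, 41]
example_psnr = psnr(original, off_by_one)
```

Higher PSNR means a smaller pixel-wise error. SSIM and LPIPS capture structural and perceptual similarity and are usually computed with dedicated libraries.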

The overall performance ranking was as follows: VQVAE, SwinIR, LDM, ESRGAN, and VAE. VQVAE demonstrated consistently strong reconstruction performance across both internal and external datasets. However, it occasionally produced artificial noise artifacts during reconstruction, as shown in Fig. 3B. SwinIR and LDM exhibited excellent performance on the internal dataset; however, their performance substantially degraded on the external dataset, indicating potential limitations in generalizability. In contrast, ESRGAN exhibited lower internal performance compared to SwinIR and LDM, but its performance degradation on the external dataset was less pronounced.

Although AdaSlide is designed to be agnostic to the choice of FIE model, it is necessary to fix the FIE during CDA training. This is because the reward function explicitly incorporates the quality of the enhanced images reconstructed from low-resolution inputs. Based on these considerations, we selected the FIE backbone by evaluating two key factors: (1) compatibility with compression, and (2) efficiency in training and inference.

VAE achieved the highest compression ratio, but the resulting image degradation was so severe that the reconstructed images were difficult to interpret visually. Therefore, this model was excluded from the candidate list. VQVAE demonstrated excellent compression and reconstruction performance. However, although the feature dimension of VQVAE ($\mathbb{R}^{128\times 128\times 2}+\mathbb{R}^{64\times 64\times 2}$) is theoretically smaller than that of the original image ($\mathbb{R}^{512\times 512\times 3}$), this only applies to the raw, uncompressed feature space. In practical applications, where image formats such as JPEG or PNG are used, the compressed features of VQVAE require larger storage space than image-based formats. Therefore, to support real-world applications such as demo systems, we focused on models that enable compression of low-resolution image files and subsequent reconstruction. As a result, the candidate models were SwinIR, LDM, and ESRGAN.
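The raw element counts behind this comparison are easy to work out. The short calculation below counts elements only and ignores bit depth and file-format encoding, which is precisely the caveat raised in the paragraph above.

```python
# Raw element counts of the VQVAE feature maps versus the original patch.
vqvae_elements = 128 * 128 * 2 + 64 * 64 * 2  # two latent maps
image_elements = 512 * 512 * 3                # RGB patch

raw_ratio = image_elements / vqvae_elements

print(vqvae_elements, image_elements, raw_ratio)
```

The raw ratio is 19.2, which appears to correspond to the 1/19.2 CR quoted for VQVAE-2 in the Introduction. It says nothing about on-disk size once the patch is stored as JPEG or PNG.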

While LDM and SwinIR demonstrated strong performance, both required substantial computational resources and long processing times for training and inference. In particular, LDM, due to the inherent characteristics of diffusion models, was especially time-consuming. This makes LDM and SwinIR less practical for processing WSIs, which typically contain thousands to tens of thousands of image patches, as they would demand excessive computing resources and time. In contrast, ESRGAN offers the advantages of lower computational demands and the ability to leverage larger batch sizes during training, making it a more practical choice.

Based on these considerations, ESRGAN was selected as the FIE backbone for CDA training and for demonstration experiments, as it provided a good balance between performance and efficiency. However, it is important to note that this does not imply that AdaSlide is dependent on a specific FIE. To further assess the feasibility of FIE, we conducted a reader study involving a Visual Turing Test (VTT) with five pathology experts. The VTT evaluated whether the FIE-generated images could be distinguished from real images, using the internal dataset. The VTT results are summarized in Supplementary Table 1. VQVAE was excluded from this analysis due to its characteristic artificial noise (Fig. 3B), which made it visually easy to differentiate. The VTT results indicated that FIE-generated images were challenging to distinguish from real ones, demonstrating the practicality of FIE-based reconstruction (56%; Z = 6, p = n.s.).

Compression Decision Agent (CDA)

To determine the compression level of an image, we employ a reinforcement learning-based agent that assesses the information content, reconstruction difficulty, and importance of each image. A conceptual diagram of the CDA is shown in Fig. 4A. We designed a reward function to compensate for the gain from compression and the loss of information (Equation (3)). This simplifies training by removing the need for human-annotated data and enhances model stability by eliminating inter-observer variation. To discourage the shortcut where the agent tries to compress every ROI, the agent receives a penalty for the amount of significant information lost during restoration. We defined the penalty as the difference between the cell segmentation output of the original image and the restored image, using HoverNet¹⁵. The concept of penalty based on HoverNet outputs is illustrated in Fig. 4B.

To control the tendency of CDA, we include a λ parameter. A higher λ value strengthens the penalty term, making the agent more conservative and less likely to compress images. We used grid search to find the optimal combination of λ and learning rate, observing that as λ increases, the learning rate and CR decrease (Fig. 4C). This indicates that a more conservative agent (higher λ) achieves a lower CR while likely preserving more image information.
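The interplay between the compression reward and the λ-weighted penalty can be illustrated with a toy reward. Equation (3) is not reproduced in this section, so the form below (a fixed gain for compressing minus λ times the fraction of pixels whose cell label changes after restoration) is an assumption chosen for illustration. The names `mask_difference` and `cda_reward` are hypothetical, and the binary masks stand in for HoverNet outputs.

```python
from typing import Sequence


def mask_difference(mask_original: Sequence[int],
                    mask_restored: Sequence[int]) -> float:
    """Fraction of pixels whose cell/background label differs (0 = identical)."""
    if len(mask_original) != len(mask_restored) or len(mask_original) == 0:
        raise ValueError("masks must be non-empty and of equal size")
    changed = sum(1 for a, b in zip(mask_original, mask_restored) if a != b)
    return changed / len(mask_original)


def cda_reward(compress: bool, mask_original: Sequence[int],
               mask_restored: Sequence[int], lam: float,
               compression_gain: float = 1.0) -> float:
    """Assumed reward shape: gain from compressing minus a weighted penalty."""
    if not compress:
        return 0.0  # keeping the patch: no storage gain, no information loss
    penalty = mask_difference(mask_original, mask_restored)
    return compression_gain - lam * penalty


# Toy masks (flattened): one of four pixels changes after restoration.
original_mask = [1, 1, 0, 0]
restored_mask = [1, 0, 0, 0]
reward_low_lambda = cda_reward(True, original_mask, restored_mask, lam=1.0)
reward_high_lambda = cda_reward(True, original_mask, restored_mask, lam=4.0)
```

Under this assumed form, raising λ from 1 to 4 lowers the reward for compressing the same patch from 0.75 to 0.0, so compressing no longer beats keeping. This mirrors the more conservative behavior described for higher λ.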

To determine the backbone architecture for the CDA, we experimented with ResNet-18, ResNet-50, and Vision Transformer (ViT) models. Considering the alignment between the λ parameter and CR, as well as the efficiency of each model, ResNet-18 was selected as the final backbone for the CDA. The results are summarized in Supplementary Table 2.

Evaluation of information disequilibrium

We quantitatively evaluate the core assumption of information disequilibrium in CDA, based on its actual inference results. Specifically, we assess whether CDA effectively recognizes information imbalance across image regions through two complementary analyses: (1) whether cell-dense regions are appropriately prioritized, and (2) whether reconstruction difficulty is properly accounted for during compression decisions. Those results are summarized in Table 2.

Table 2 Joint evaluation of semantic relevance and enhancement complexity across compression levels

To examine the first aspect, we leveraged cell segmentation results to implicitly guide CDA toward recognizing the importance of cell-rich regions. To validate this behavior, we utilized the zero-shot inference capability of the PLIP pathology vision-language model²³. PLIP is a CLIP-based model trained on pathology image-caption pairs sourced from Twitter and is designed to capture the semantic relationships between images and textual descriptions within the pathology domain. For this analysis, we employed the following two query prompts to evaluate the relative similarity of each image:

A photo of densely packed cells.
A photo showing adipose tissue, stroma, or acellular background.

For each λ condition and compression decision, we measured the similarity of images to the first prompt (“A photo of densely packed cells.”) to assess whether CDA preferentially retained cell-rich regions. The findings indicate that in regions where compression was applied, the similarity to the cell-rich prompt was relatively lower, suggesting that CDA effectively deprioritizes semantically less informative regions during compression.
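Zero-shot scoring with a CLIP-style model generally reduces to comparing one image embedding against the text embeddings of the prompts. The sketch below shows that comparison in plain Python. The embeddings are made-up two-dimensional vectors standing in for PLIP outputs, and the `logit_scale` of 100 is an assumed value in the range CLIP-style models commonly use. Neither is taken from the paper.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prompt_probabilities(image_embedding: Sequence[float],
                         prompt_embeddings: Sequence[Sequence[float]],
                         logit_scale: float = 100.0) -> List[float]:
    """Softmax over scaled image-to-prompt cosine similarities."""
    logits = [logit_scale * cosine_similarity(image_embedding, p)
              for p in prompt_embeddings]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical embeddings: index 0 = "densely packed cells" prompt,
# index 1 = "adipose tissue, stroma, or acellular background" prompt.
cell_rich_prompt = [1.0, 0.0]
background_prompt = [0.0, 1.0]
image_embedding = [0.9, 0.1]

probs = prompt_probabilities(image_embedding,
                             [cell_rich_prompt, background_prompt])
```

Averaging the first probability separately over kept and compressed patches would give the kind of per-decision comparison described above.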

The second analysis aimed to verify whether the penalty design in CDA appropriately discourages indiscriminate compression by considering the difficulty of reconstructing compressed images. Specifically, we compared the pre- and post-reconstruction SSIM scores between images where CDA selected compression versus those where it selected retention (keep). The results show that as λ increases, images selected for compression exhibit higher SSIM scores compared to retained images. This trend suggests that CDA adapts its compression decisions to avoid high-penalty scenarios, demonstrating sensitivity to reconstruction difficulty and supporting the intended information-aware behavior.

Downstream tasks

We assessed AdaSlide across 13 downstream tasks, encompassing patch-level classification, patch-level segmentation, and slide-level classification. Information loss was quantified as the reduction in performance compared to results achieved with high-resolution image-based analysis. CR is a ratio of compression compared to the original high-resolution images. Hence, a lower CR value indicates a smaller file size (i.e., higher compression). Except for the training dataset, where images were compressed and enhanced using FIEs, all other conditions remained consistent. This ensured that any performance gap between original image-based analysis and processed image-based analysis was solely due to image information degradation during the image processing stages.
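Under this definition, the slide-level CR of an adaptive scheme depends on how many patches are compressed and how strongly. A simple sketch follows, assuming all patches have equal original size and every compressed patch shrinks by the same factor. The function name and the example numbers are hypothetical.

```python
def overall_cr(fraction_compressed: float, per_patch_cr: float) -> float:
    """Stored size divided by original size (lower means smaller files).

    fraction_compressed: share of patches the agent chooses to compress.
    per_patch_cr: stored/original size ratio of one compressed patch.
    """
    if not 0.0 <= fraction_compressed <= 1.0:
        raise ValueError("fraction_compressed must lie in [0, 1]")
    kept_share = 1.0 - fraction_compressed
    return kept_share + fraction_compressed * per_patch_cr


# Hypothetical example: 80% of patches compressed to 1/16 of their size.
example_cr = overall_cr(fraction_compressed=0.8, per_patch_cr=1 / 16)
```

With these illustrative inputs the slide is stored at about 25% of its original size. Compressing nothing gives a CR of 1.0, and compressing everything gives the per-patch ratio.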

AdaSlide includes several variants depending on the backbone models (ESRGAN, VQVAE, SwinIR, LDM) and the compression tendency parameter λ (0.1, 0.25, 0.5, 0.75, 1.0). The average performance, best performance, and best CR of AdaSlide are reported in Fig. 5, while the detailed performances are summarized in Supplementary Tables 3–15. Since VAE’s performance was substantially lower, its results are omitted from Fig. 5 for clarity.

**Fig. 5: Overview of various downstream tasks.**

Supplementary Tables 3–7 summarize the patch-level classification results. Overall, the uniform compression models (VAE, ESRGAN, VQVAE, SwinIR, LDM) achieved the strongest compression (smallest file sizes) but tended to degrade performance more than AdaSlide. Specifically, VAE exhibited a severe performance drop in every dataset (e.g., MHIST AUROC: 0.5552 vs. baseline 0.8437; LI AUROC: 0.5696 vs. baseline 0.8321), presumably due to the blurred reconstructions (Fig. 3B and Supplementary Fig. 2).

In contrast, AdaSlide outperformed the baseline on NCT-CRC, MHIST, LI, and SICAPv2, although it slightly underperformed in WSSS4LUAD. For instance, on NCT-CRC, VQVAE-based AdaSlide with a λ parameter of 0.5 ($\mathrm{AdaSlide}_{\mathrm{VQVAE}}^{0.5}$) achieved the highest AUROC of 0.9943 (vs. baseline 0.9895). On MHIST, $\mathrm{AdaSlide}_{\mathrm{ESRGAN}}^{0.5}$ reached an AUROC of 0.8509, outperforming the baseline of 0.8437. The optimal λ and backbone model varied across datasets, indicating that compression preferences are dataset-specific.

Supplementary Tables 8–10 present the segmentation results (example images are in Supplementary Fig. 3). In contrast to classification tasks, segmentation requires high pixel-level fidelity for accurate mask delineation. As anticipated, VAE reconstructions often appeared too blurry, causing significant performance deterioration. Notably, in the NuInsSeg dataset, uniform compression models (VAE, ESRGAN, VQVAE without adaptive compressions) showed near-zero Dice scores, underscoring their difficulty in preserving crucial details.

Meanwhile, AdaSlide substantially mitigated information loss. For example, on the NuInsSeg dataset, $\mathrm{AdaSlide}_{\mathrm{SwinIR}}^{1.00}$ achieved a Dice score of 0.7068, with only a 2.12% drop from the baseline (0.7221). Conversely, in the same dataset, uniform compression models showed severely decreased performances (Dice: 0.0000). Interestingly, AdaSlide outperformed the baseline on certain datasets, including SNOW and PanNuke. For instance, on SNOW, $\mathrm{AdaSlide}_{\mathrm{LDM}}^{0.25}$ achieved a Dice of 0.9434, higher than the baseline (0.9334). Similarly, on PanNuke, $\mathrm{AdaSlide}_{\mathrm{VQVAE}}^{0.50}$ attained a Dice of 0.7304 (vs. baseline 0.7158).
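For reference, the Dice score and the relative drop quoted above can be reproduced with a few lines. The sketch assumes binary masks given as flattened sequences and treats two empty masks as a perfect match. That convention and the helper names are assumptions for this example.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same size")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if size_sum == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / size_sum


def relative_drop_percent(baseline: float, score: float) -> float:
    """Performance drop relative to the baseline, in percent."""
    return 100.0 * (baseline - score) / baseline


toy_dice = dice([1, 1, 0, 0], [1, 0, 0, 0])            # one shared pixel
nuinsseg_drop = relative_drop_percent(0.7221, 0.7068)  # values from the text
```

The relative drop for the NuInsSeg figures evaluates to roughly 2.12%, matching the reported value. This suggests information loss is measured relative to the baseline rather than in absolute Dice points.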

For slide-level analysis, we employed the CLAM framework³⁸, a widely used MIL method. As summarized in Supplementary Tables 11–15, AdaSlide slightly underperformed compared to the baseline in TCGA-RCC, Camelyon16, and CBTN but outperformed it in TCGA-NSCLC and TCGA-BRCA. For instance, in Camelyon16, AdaSlide ranked third (AUROC = 0.8676), slightly below the uniform compression VQVAE model (AUROC = 0.8679), and both were lower than the baseline (0.8852). Conversely, in TCGA-NSCLC, $\mathrm{AdaSlide}_{\mathrm{SwinIR}}^{1.00}$ achieved an AUROC of 0.8970, outperforming the baseline of 0.8774.

AdaSlide’s performance, information loss, optimal λ parameter, and FIE type vary based on the characteristics of the downstream datasets. This section reviews AdaSlide’s performance from the following perspectives: (1) regularization effects, (2) global and local patterns, (3) in-domain and out-of-domain datasets, and (4) dataset difficulty.

(1) Regularization Effects. A noteworthy observation across most datasets was that applying AdaSlide frequently improved performance compared to the baseline. As performance degradation relative to the baseline was defined as information loss, this improvement can be considered an information benefit. We hypothesize that this improvement stems from a regularization effect. Unlike uniform compression models, AdaSlide determines whether to pass an image through the FIE or use the original based on CDA’s decision. The FIE processes images only when the CDA predicts that it will not significantly degrade quality or affect performance. This selective approach likely enhances performance by reducing noise and normalizing colors in images, resulting in better outcomes for downstream tasks.

Key regions of high importance, where the FIE could potentially harm image quality, are left uncompressed by the CDA. This rationale explains why AdaSlide outperformed uniform compression models in most datasets, with the exception of Camelyon16. However, not all datasets exhibited positive effects, with some experiencing information loss, highlighting the limitations of the regularization hypothesis.

(2) Global and Local Patterns. The VAE model’s performance is constrained by its severe output blurriness. Nevertheless, this limitation offers insights into whether specific tasks rely more on global patterns or local details. Tasks that achieve acceptable performance despite blurry outputs suggest that while detailed information adds value, the primary directionality remains discernible.

For example, datasets such as NCT-CRC (patch-level classification) and slide-level classification datasets predominantly rely on global patterns, whereas datasets like NuInsSeg and other segmentation tasks depend heavily on local patterns. Tasks involving cell boundary segmentation or tumor infiltration detection rely heavily on fine-grained details (local patterns), while tasks focused on overall distribution and coloration (e.g., NCT-CRC) can be adequately performed using global patterns.

In tasks primarily driven by global patterns, even uniform compression models delivered satisfactory results. Conversely, for tasks dependent on local patterns, AdaSlide outperformed uniform compression models, effectively addressing information disequilibrium.

(3) In-domain and Out-of-domain Problem. Out-of-domain (OOD) challenges in AdaSlide arise from two primary sources of risk: (1) the image reconstruction performance of the FIE, and (2) the compression decision accuracy of the CDA. Since image reconstruction directly affects the amount of retained information, an FIE trained on in-domain data may fail to accurately reconstruct images from out-of-domain distributions, potentially introducing artifacts or blur. In parallel, the CDA, trained on cell information and image reconstruction difficulty, may misclassify important regions or overestimate reconstruction difficulty on unfamiliar data, resulting in information loss.

Among the 13 downstream tasks evaluated, AdaSlide exhibited notable performance degradation on three datasets: NuInsSeg, CBTN, and Camelyon16. These issues can be interpreted as follows: NuInsSeg represents a dataset shift in image brightness; CBTN introduces a domain shift to pediatric pathology; and Camelyon16 introduces a domain shift via lymph node sections, which differ from the tissue sections used in TCGA.

For NuInsSeg, performance degradation is primarily attributable to the FIE, as all FIE-based segmentation models failed (Dice score near zero), and the LDM-based model failed to generate adequate cell structures (Supplementary Fig. 3). This indicates that segmentation performance is highly dependent on the fidelity of the original high-resolution information, that is, on the compression ratio.

For CBTN, although performance degradation was less severe than for NuInsSeg, applying FIE alone resulted in noticeable performance drops. This is likely due to the pediatric-specific disease distribution, which diverges from the training domain of the FIE. However, when AdaSlide was applied, performance improved relative to using FIE alone, suggesting that the CDA mitigated some of the domain-induced reconstruction errors and that the primary issue lay with the FIE.

Camelyon16 exhibited a distinct pattern: it was the only case where FIE alone outperformed the full AdaSlide pipeline. Specifically, VQVAE, previously shown to generalize robustly, performed well, but adding the CDA reduced performance. The CDA likely failed to operate effectively in this setting, as evidenced by the elevated CR and performance drop. Camelyon16 lymph node sections contain substantially higher cell densities than typical TCGA tissue sections. The CDA, trained to prioritize regions with high cell content, likely responded too aggressively in this context.

Despite these challenges, AdaSlide, composed of both FIE and CDA, outperformed FIE alone in most cases, suggesting that the CDA can compensate for FIE-induced information loss. Nevertheless, since both modules carry inherent OOD risks, careful experimental design and additional robustness strategies will be important considerations for future applications.

(4) Dataset Difficulty. The difficulty of each task played a significant role in determining the optimal λ parameter. For most datasets, λ = 0.50 produced the best results. For example, in patch-level segmentation tasks, dataset difficulty increased in the order SNOW < PanNuke < NuInsSeg. This ranking considered both VAE performance and dataset characteristics, such as noise levels, with SNOW being a synthetic dataset with minimal noise. The optimal λ values were 0.25 for SNOW, 0.50 for PanNuke, and 1.00 for NuInsSeg. Higher λ values were more appropriate for out-of-domain datasets where the FIE struggled to reconstruct images accurately.

Slide-level classification showed relatively smaller performance differences compared to patch-level classification when selecting FIE and λ values. This was likely due to the reduced influence of individual patch instances in slide-level tasks, where classification relies more on shared global patterns. Although restoring fine-grained patterns remains important, the preservation of macroscopic patterns had a greater impact on slide-level classification. As a result, mid-range λ values typically achieved the best performance across most datasets, except for the CBTN dataset.

Best practice for selecting the optimal λ parameter

AdaSlide is not explicitly trained for any particular downstream task. Instead, its compression behavior is implicitly controlled through the choice of the λ parameter. Selecting an appropriate λ value is therefore important to balance compression efficiency and downstream task performance. Our experiments show that no single λ value performs optimally across all datasets and tasks. The optimal choice depends on several factors, including the target task’s sensitivity to fine-grained details, the relative importance of preserving subtle information, and the potential benefits of noise reduction introduced by the FIE. Because of this variability, the most reliable strategy is to conduct small-scale validation experiments that reflect the characteristics of the target task, as demonstrated in our downstream evaluations. Such experiments provide empirical guidance on selecting the λ value that best suits the intended application.

When performing dedicated validation is not feasible, either due to a lack of labeled data or limited computational resources, a practical heuristic can be used. In our experiments, we observed that λ = 0.50 generally provided a good balance between information preservation and compression across a wide range of datasets. Therefore, λ = 0.50 can be recommended as a reasonable default choice for general applications. For use cases where preserving fine-grained information is particularly important, such as tasks that are highly sensitive to local image patterns or applications where domain shift is expected between training and deployment data (for example, between TCGA and external cohorts), a more conservative setting such as λ = 1.00 is advisable. This choice reduces the risk of losing important information due to excessive compression. On the other hand, if storage efficiency is the primary concern and some level of information loss is acceptable, smaller λ values can be explored to maximize compression. More details can be found in Supplementary Note 4, and Supplementary Fig. 4.
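
The heuristic described above can be written down as a small decision helper. This is only an illustrative sketch: the function name `suggest_lambda` and its boolean flags are hypothetical, and the value 0.25 for the storage-first case is simply one of the smaller grid values, not a setting recommended by the experiments.

```python
def suggest_lambda(detail_sensitive: bool = False,
                   domain_shift_expected: bool = False,
                   storage_first: bool = False) -> float:
    """Return a starting lambda when task-specific validation is not feasible."""
    # Conservative setting: protect fine-grained or out-of-domain content.
    if detail_sensitive or domain_shift_expected:
        return 1.00
    # Storage-driven setting: explore smaller values (0.25 is illustrative only).
    if storage_first:
        return 0.25
    # General-purpose default that balanced preservation and compression.
    return 0.50
```

When labeled data are available, the returned value is best treated as a starting point for a small validation sweep rather than a final choice.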

Application example of AdaSlide

To evaluate compression performance at the WSI level, we implemented a real-world application of AdaSlide pipelines and validated both compression and enhancement. We selected one sample each from the TCGA-BRCA, TCGA-KIRC, and Camelyon16 datasets. The compression results are depicted in Fig. 6 and Supplementary Figs. 5 and 6.

**Fig. 6: Example output of AdaSlide on a sample from the TCGA-BRCA dataset.**

Tumor regions exhibited a higher tendency for the Keep action, and the AUROC for distinguishing tumor versus non-tumor regions based on CDA action probabilities was 0.6288, indicating performance above chance level (Fig. 6B). The table in Fig. 6D summarizes the effective compression performance when storing the compressed images. While JPEG format achieved the smallest storage footprint due to its inherent lossy compression, we observed notable image quality degradation during ESRGAN-based reconstruction, rendering it suboptimal for practical use (Supplementary Fig. 7). Conversely, PNG format provided the best reconstruction quality owing to its lossless nature, but resulted in substantially larger storage requirements. To balance these trade-offs, we adopted a hybrid strategy wherein high-resolution (Keep) images were stored as JPEG to minimize storage demands, while compressed (Compress) images were stored as PNG to preserve reconstruction quality. This approach provided an effective balance between storage efficiency and image fidelity.
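
For readers who wish to reproduce this kind of analysis, an AUROC over per-patch Keep probabilities can be computed with the pairwise (Mann-Whitney) formulation. The sketch below is not the code used in the study; the function name `auroc` and the six example values are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison of positives and negatives; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative patch")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical Keep probabilities for six patches (1 = tumor, 0 = non-tumor).
keep_prob = [0.9, 0.4, 0.7, 0.3, 0.6, 0.2]
is_tumor = [1, 1, 1, 0, 0, 0]
print(auroc(keep_prob, is_tumor))
```

A value of 0.5 corresponds to chance, so scores above 0.5 indicate that tumor patches tend to receive higher Keep probabilities.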

Discussion

In this study, we developed AdaSlide, an adaptive compression framework that addresses the trade-offs between CR and information loss when compressing gigapixel WSIs. We assembled a PanCancer dataset from 31 TCGA projects, comprised of 1.8 million patches. Using this dataset, we trained two key modules: the FIE and the CDA. The FIE enhances the compressed images through encoding and decoding steps. The CDA, trained to handle information disequilibrium, autonomously determines the compression level based on the information content of the image. We evaluated the performance of AdaSlide across various downstream tasks, including patch-level classification, segmentation, and slide-level classification. The results indicated minimal information loss between the baseline and AdaSlide method, demonstrating its effectiveness in maintaining prediction accuracy while reducing the image storage capacity. AdaSlide defines information disequilibrium in a data-driven manner, which offers a key advantage over prior MIL-based approaches³⁷ that rely on attention maps. In contrast to these methods, AdaSlide is task-agnostic and not reliant on task-specific datasets, enabling broader applicability across diverse clinical and research settings.

Notably, AdaSlide outperformed uniform compression models (i.e., ESRGAN, VQVAE, SwinIR, and LDM) in all downstream tasks except for Camelyon16. Furthermore, AdaSlide demonstrated performance comparable to that of baseline models. Specifically, on the NuInsSeg dataset, uniform compression models failed to properly segment cell masks (Dice: 0.0000), whereas AdaSlide preserved information more effectively than the baseline methods. Additionally, AdaSlide showed adaptability to out-of-domain datasets such as Camelyon16 and CBTN. Camelyon16 comprises lymph node section slides, which differ slightly from the TCGA PanCancer dataset. Meanwhile, CBTN focuses on pediatric tumor types, which are distinct from the adult cancer data in TCGA. While AdaSlide’s performance was lower than the baseline on these datasets, it still exceeded that of the uniform compression models. These findings suggest that AdaSlide effectively mitigates information loss caused by FIE limitations and addresses out-of-domain challenges to some extent. These findings highlight that improving the performance of the FIE is crucial for enhancing AdaSlide’s overall effectiveness. In the future, we plan to incorporate recent diffusion-based SOTA models⁵⁵ to further improve performance.

The AdaSlide framework offers flexible options to accommodate various research objectives. For instance, researchers can regulate the compression tendency using the CDA’s λ parameter. Additionally, the reward function is easily modifiable. We used a reinforcement learning method to train the CDA, eliminating the need for human annotations. This process is scalable to larger training datasets and adaptable to different information criteria. By defining the penalty term, the CDA can learn the information contained in patch images. For example, we used cell segmentation masks as pseudo-ground truths to quantify information loss. However, diagnosing lipogenic neoplasms originating from adipocytic cells does not align with the current version of AdaSlide. Fat tissue, which generally exhibits low cellularity and scarce nuclei, is prone to being recognized as less informative by the CDA. Adjustments to the CDA’s reward function would be necessary to handle lipogenic neoplasms more effectively.

Faghani et al.⁵⁶ introduced a lossless compression technique that identifies the tissue-containing foreground using an object detection-style binarization process and removes the background. Since this method does not alter the foreground (tissue) regions, it preserves the diagnostic information and does not degrade downstream performance. This represents an effective compression approach for certain scenarios. However, their method reconstructs the image by cropping the foreground regions into bounding boxes and reassembling them using a rectangle-packing strategy. As a result, it performs optimally when (1) the image contains repetitive patterns of similar size, (2) a large proportion of background is present, and (3) the tissue is predominantly organized along horizontal or vertical axes. In contrast, the method is less effective when the tissue exhibits diagonal structures or when the background is minimal, as occurs in many tumor-rich slides. Thus, the ideal application scenario for this method would be tissue microarrays (TMAs). Moreover, because the final reconstructed image disrupts the original spatial layout, direct spatial correspondence with the original WSI is lost, posing challenges for certain applications such as spatial omics alignment or visual inspection. In contrast, AdaSlide preserves the original spatial grid of the WSI and is not subject to the aforementioned constraints, enabling more flexible and generalizable compression.

This study had several limitations. First, we only considered a binary compression decision, determining whether to compress each image region. This approach could be extended to multi-class compression, where the CDA determines different compression levels for improved efficiency. Additionally, the CDA occasionally failed to converge stably during training. Future research will address this by exploring advanced policy-based reinforcement learning algorithms, such as proximal policy optimization (PPO)⁵⁷, and conducting more sophisticated hyperparameter tuning. The λ parameter and the achieved CR were not always perfectly aligned with downstream task requirements, suggesting that the optimal λ value may vary depending on the dataset. Furthermore, the final compressed outputs could not be explicitly controlled, necessitating empirical experiments to identify the optimal λ value. To address this, we plan to modify the reward functions by incorporating explicit CR constraints to better control slide- or group-level CR. In addition, SSIM scores for tumor and non-tumor areas were relatively similar under the current CDA interpretation (Table 2). To address this, we plan to further refine the penalty design to more explicitly account for reconstruction difficulty. Incorporating semantic information from VLMs²³ represents a promising extension to enhance this capability.

AdaSlide is currently limited to an ROI-level single-patch image-based model, although the final output is expected to operate at the slide level. As such, AdaSlide does not collectively consider related ROI information. To address this, we plan to enhance both the CDA and FIE to handle multiple ROIs in future studies. The CR of AdaSlide is based on the size of raw pixel images, which differs from the file sizes of compressed formats such as JPG or PNG. For example, in our demo using ESRGAN, tessellated patch instances saved in formats like JPG enabled compression to a smaller size than the original SVS image. However, using VQVAE, which operates with 32-bit floating-point values, resulted in file sizes exceeding those of typical image files. Future research should consider practical image formats and introduce techniques like quantization to address this issue. Currently, ESRGAN achieves optimal enhancement performance when applied to PNG-based images. As a result, in the current demonstration, a hybrid strategy was used, with Keep images stored in JPEG format and Compress images stored in PNG format. In future work, enabling ESRGAN to operate effectively on JPEG-compressed images would allow for maximal storage efficiency. This capability could potentially be combined with the foreground cropping method proposed by Faghani et al.⁵⁶, further enhancing the overall compression effectiveness.
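
The distinction between a raw-pixel ratio and on-disk file size can be illustrated with a short calculation. The sketch below counts pixels only, assuming that Keep patches remain 512 × 512 and Compress patches are stored at 128 × 128 (the sizes used for FIE training). The function name `stored_pixel_fraction` is hypothetical, and the quantity it returns may not match the exact CR definition used in the experiments.

```python
def stored_pixel_fraction(actions, keep_side=512, compressed_side=128):
    """Fraction of raw pixels retained, given per-patch actions (1 = Compress, 0 = Keep)."""
    if not actions:
        raise ValueError("actions must be non-empty")
    original = len(actions) * keep_side ** 2
    stored = sum(compressed_side ** 2 if a == 1 else keep_side ** 2 for a in actions)
    return stored / original


# Three of four patches compressed: 3/4 * 1/16 + 1/4 of the raw pixels remain.
print(stored_pixel_fraction([1, 1, 1, 0]))
```

Because JPEG and PNG encoders change the bytes needed per pixel, the on-disk ratio can differ substantially from this pixel-count ratio.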

In summary, AdaSlide is a framework that adjusts the compression levels based on the image’s information content to effectively compress digital pathology images. AdaSlide demonstrated balanced performance between CR and information preservation in various downstream tasks. Additionally, AdaSlide offers flexibility in terms of backbone architecture, allows for adjustable CR, and includes a reward function design to define information penalties. AdaSlide stores digital pathology images more effectively, thereby reducing costs and contributing to large-scale digital pathology archiving for long-term storage.

Methods

PanCancer datasets

The main objective of the FIE is to achieve robustness across multiple domains, including different organs, scanner variations, and scanned image quality. To address these challenges, we collected data from TCGA, which encompasses 31 projects. Thirty diagnostic slides were randomly sampled from each project, which yielded a training set of 930 WSIs. Although the entire TCGA collection contained over 20,000 WSIs, we sampled 5% to reduce computational costs and facilitate efficient training. Each WSI was tessellated into multiple patch images. We considered two key objectives to generate diverse patches:

Multi-magnification representation: Patches were extracted from both 40× (0.25 μm/pixel) and 20× (0.5 μm/pixel) magnifications.
Spatial diversity: To promote adequate coverage of different WSI regions, we sampled 10% of the patches at 40× and 40% at 20×. The WSIs exhibited strong spatial correlations, implying that nearby patches often shared similar patterns. We employed a sampling-based patch selection instead of a sliding-window-based selection to encourage diverse patch selection.
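
The sampling-based selection described above can be sketched as follows. This is a minimal illustration, not the pipeline used to build the dataset: the function name `sample_patch_coords`, the non-overlapping grid, and the fixed seed are assumptions.

```python
import random


def sample_patch_coords(width, height, patch=512, fraction=0.10, seed=0):
    """Randomly sample a fraction of non-overlapping grid positions (top-left corners)."""
    grid = [(x, y)
            for y in range(0, height - patch + 1, patch)
            for x in range(0, width - patch + 1, patch)]
    if not grid:
        return []
    k = max(1, round(len(grid) * fraction))
    # Sampling without replacement spreads patches across the slide instead of
    # taking spatially adjacent, highly correlated neighbors.
    return random.Random(seed).sample(grid, k)


coords_40x = sample_patch_coords(5120, 5120, fraction=0.10)  # 10% at 40x
coords_20x = sample_patch_coords(5120, 5120, fraction=0.40)  # 40% at 20x
```

Since a 40× slide contains roughly four times as many 512 × 512 patches as its 20× counterpart, sampling 10% and 40% respectively yields similar patch counts at both magnifications.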

To exclude non-tissue areas, we generated binary tissue masks using Otsu thresholding with a validity threshold of 70%. Each patch was 512 × 512 pixels in size. We generated 1.8 million patches, ensuring equal representation at 40× and 20× magnifications. The PanCancer dataset encompasses diverse organs and cancer types. We divided PanCancer into training (94.5%), validation (5%), and test (0.5%) sets.
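
A tissue filter of this kind can be sketched in plain Python. The example below assumes 8-bit grayscale input in which tissue is darker than the glass background, and it interprets the 70% validity threshold as the minimum tissue fraction per patch; both points, along with the function names, are assumptions rather than details confirmed by the pipeline.

```python
def otsu_threshold(gray):
    """Otsu threshold for a flat list of 8-bit grayscale values."""
    hist = [0] * 256
    for v in gray:
        hist[v] += 1
    total = len(gray)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        if w_dark == 0:
            continue
        w_bright = total - w_dark
        if w_bright == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / w_dark
        mean_bright = (sum_all - sum_dark) / w_bright
        # Between-class variance (up to a constant factor); Otsu maximizes it.
        var_between = w_dark * w_bright * (mean_dark - mean_bright) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def is_valid_patch(gray_patch, threshold, min_tissue=0.70):
    """Accept a patch when at least min_tissue of its pixels are at or below the threshold."""
    tissue = sum(1 for v in gray_patch if v <= threshold)
    return tissue / len(gray_patch) >= min_tissue
```

In practice the threshold would typically be computed once on a low-resolution thumbnail of the slide and then applied to each candidate patch.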

Datasets: Foundation Image Enhancer (FIE)

To train and validate the FIE, we started with high-resolution images (512 × 512 pixels) from the PanCancer dataset and artificially downscaled them to 128 × 128 pixels using bilinear interpolation. This simplified setting focused solely on the challenge of upscaling without the additional noise complexities encountered in real-world low-resolution images. Patch images were generated at both 40× and 20× magnifications to train and validate robustness across varying scales. The same preprocessing pipeline was applied to generate an external validation dataset sourced from CPTAC.

Datasets: Compression Decision Agent (CDA)

A validation split of the PanCancer dataset was used to train the CDA. For the pseudo-ground truth, the HoverNet¹⁵ outputs were acquired from both the original image and the compressed-and-restored image.

Downstream datasets

To validate the performance of the trained AdaSlide model, we leveraged several publicly available downstream datasets commonly used for slide-level and patch-level tasks. These datasets were specifically selected to evaluate the performance of AdaSlide in multi-magnification, multi-organ environments. A summary of each dataset is provided in Table 1.

The NCT-CRC (National Center for Tumor Diseases Colorectal Cancer Type Classification) dataset consists of 100,000 training and 7180 test images extracted from H&E-stained WSIs of human CRC and normal tissue⁴³. The dataset contains nine classes: adipose tissue, background, debris, lymphocytes, mucus, smooth muscle, normal colon mucosa, cancer-associated stroma, and colorectal adenocarcinoma epithelium. All images were color-normalized using the Macenko method⁵⁸. We split the original training set into a training and a validation set at a 7:3 ratio. The test dataset was then used according to a publicly available split set.

The MHIST (Minimalist Histopathology Image Analysis Dataset) dataset contains 2175 training and 977 test patch images of H&E-stained colorectal polyps (extracted from 328 FFPE WSIs) from Dartmouth-Hitchcock⁴⁴. This dataset focused on the binary classification of benign hyperplastic polyps (HPs) vs. precancerous sessile serrated adenomas (SSAs). We split the original training set into a training and validation set at a 7:3 ratio. The test dataset was then used according to the publicly available split set.

The LI (Lymphatic Invasion Classification) dataset has 974 images extracted from H&E-stained WSIs of human stomach cancers¹⁴. It specifically targets the binary classification of LI regions versus normal regions. Unlike other datasets, the LI dataset concentrates on small regions containing lymphocytes and the presence of tumor invasions within those lymphocyte areas. Although the original LI dataset includes weakly-labeled negative cases, we excluded them from this study. The LI dataset was divided into training, validation, and test datasets at a ratio of 5:3:2.

The SICAPv2 (Sistema de Interpretación de Imágenes Histopatológicas para la Detección de Cáncer de Próstata) dataset comprises 155 biopsies from 95 patients, with labels based on primary Gleason grades ranging from non-cancerous (NC) to Grade 5 (GG5). The primary grade was used as the target label for classification. The public test split was utilized for validation, while the remaining data were divided into training and validation sets with a ratio of 0.8:0.2 based on the public training split set configuration.

The WSSS4LUAD (Weakly-supervised Semantic Segmentation Techniques for Histopathology Images of Lung Adenocarcinoma) dataset comprises LUAD image tiles ranging from 200 to 500 pixels in size, each annotated with labels such as tumor, tumor-associated stroma, or normal. For our analysis, we adopted the approach outlined in the UNI¹⁹, selecting only samples with a single ground-truth label. This resulted in a subset of 4693 images from the official training split. In this study, the dataset was further divided into training, validation, and test sets in a ratio of 0.6:0.2:0.2.

The SNOW (Synthetic Dataset for Human Breast Cancer Nuclei Segmentation) dataset contains 20,000 breast cancer patch images⁴⁷. The patch images and nuclei annotation masks were generated by deep neural networks. We selected the first 5000 images and their corresponding masks to simplify the experiments. These 5000 image-mask pairs were then divided into training, validation, and test datasets at a ratio of 9:0.5:0.5.

The NuInsSeg (Nuclei Instance Segmentation) dataset is a fully manually annotated dataset for nuclei segmentation in H&E images. It includes 665 image patches with over 30,000 manually segmented nuclei from 31 human and mouse organs. For this study, only the human H&E images (472 images) were used. The dataset was split into training, validation, and testing sets with a ratio of 0.6:0.2:0.2, respectively. Detailed instructions for generating the related segmentation masks are available in the public repository associated with the dataset.

The PanNuke (Pan-cancer Histology Dataset for Nuclei Instance Segmentation and Classification) dataset contains nuclei-level annotations derived from histopathology images spanning 19 tissue types. The dataset includes approximately 200,000 nuclei categorized into five clinically significant classes: neoplastic, inflammatory, connective tissue, necrotic/dead, and epithelial. In this study, we converted the dataset to binary segmentation masks for simplicity. The PanNuke dataset fold 1 was split into training, validation, and test datasets at a ratio of 0.6:0.2:0.2.

The TCGA-RCC dataset comprises 884 WSIs from TCGA’s kidney chromophobe (TCGA-KICH), kidney renal clear cell carcinoma (TCGA-KIRC), and kidney renal papillary cell carcinoma (TCGA-KIRP) projects. TCGA-KICH contains 111 images from 99 patients, TCGA-KIRC contains 489 images from 483 patients, and TCGA-KIRP contains 284 images from 264 patients. Patches were generated at 20× magnification with a size of 512 × 512 pixels. The average number of patches per slide was approximately 802.66, with a standard deviation of 435.81. The maximum and minimum numbers of patch images per slide were 4001 and 12, respectively.

The TCGA-NSCLC dataset contains 993 WSIs from TCGA, comprising 507 lung adenocarcinoma (TCGA-LUAD) and 486 lung squamous cell carcinoma (TCGA-LUSC) cases across 896 patients. Patches were generated at 20× magnification and are 512 × 512 pixels in size. The average number of patches per slide was approximately 694.63, with a standard deviation of 490.80. The maximum and minimum numbers of patch images per slide were 2933 and 10, respectively.

The TCGA-BRCA dataset includes various cancer types; however, for this analysis, we focused on invasive ductal carcinoma (IDC) and invasive lobular carcinoma (ILC) subtypes. A total of 1007 cases across 948 patients were employed. Patches were generated at 20× magnification and are 512 × 512 pixels in size. The average number of patches per slide was approximately 712.82, with a standard deviation of 452.17. The maximum and minimum patch images per slide were 4252 and 12, respectively.

The Camelyon16 dataset (Cancer Metastases in Lymph Nodes Challenge 2016) contains 268 WSIs, including 111 slides from patients with lymph node metastases and 160 slides from normal lymph nodes. The dataset focuses on the binary classification of metastatic versus non-metastatic lymph nodes. Patches were extracted at 40× magnification with a size of 512 × 512 pixels. On average, each slide generated approximately 6021.86 patches, with a standard deviation of 4130.37. The maximum and minimum patch images per slide were 26,926 and 127, respectively.

The Children’s Brain Tumor Network (CBTN)⁵⁰ contains 3607 WSIs across 41 diagnostic categories derived from pediatric samples. For this study, we narrowed the classification task to Medulloblastoma and Ependymoma diagnoses only. We selected 150 WSIs from each class, covering 212 patients. Patches were generated at 20× magnification and are 512 × 512 pixels in size without overlaps. The average number of patches per slide was about 173, with a standard deviation of 127.12. The minimum and maximum numbers of patch images per slide were 4 and 721, respectively.

Model Developments: Foundation Image Enhancer (FIE)

The FIE was trained using five backbone architectures: VAE³⁴, VQVAE³⁶, ESRGAN⁵¹, SwinIR⁵², and LDM⁵³. The FIE requires two modules: an encoder f( ⋅ ) and a decoder g( ⋅ ). The encoder maps the original images x to latent vectors, and the decoder maps the latent vectors back to the image space. Therefore, the FIE works as follows:

$$\hat{x}=g(f(x)),$$

(1)

where the FIE was trained to minimize the difference between x and $\hat{x}$.

For the VAE³⁴, the original study used 64 × 64 pixel images. We trained it with images of size 224 × 224 and 512 × 512 pixels. The architecture consisted of hidden dimensions with various sizes depending on the inputs: 224 × 224 (32, 64, 128, 256, 256, 512, 512) and 512 × 512 (32, 64, 128, 256, 512, 512, 512, 1024, 1024, 1024). The batch size was set to 128, and training used the Adam optimizer with a learning rate of 1 × 10⁻⁴. The model was trained on the training split of the PanCancer dataset for five epochs. The checkpoints were selected based on the validation loss.

The VQVAE³⁶ was proposed to replace the latent dimensions represented by a Gaussian distribution in VAEs. Instead, VQVAE employs quantized vector embeddings to represent the latent dimensions. In addition, VQVAE, particularly VQVAE-2, attempts to enhance the images in a two-stage manner. Similar to VAE, VQVAE exhibits low flexibility with respect to the input image shape. Therefore, we trained the VQVAE model twice, once for 224 × 224 pixels and again for 512 × 512 pixels. The architecture and hyperparameters are shared. The model hyperparameters and training parameters followed those of a previously proposed VQVAE model for histopathological image compression³¹. The batch size was set to 128 and the learning rate to 1 × 10⁻⁴, using the Adam optimizer with five epochs. Minimum validation loss was set as the checkpoint selection criterion.

Unlike the VQVAE models, the ESRGAN⁵¹ does not require trained encoders. The ESRGAN’s encoder f( ⋅ ) is a simple linear downscaling method. Unlike other methods, ESRGAN attempts to minimize perceptual differences. As numerical metrics such as SSIM and PSNR diverged from human expert selection, the optimal checkpoint was determined through visual inspection in collaboration with a pathologist. To train the ESRGAN model, a batch size of 256 was used, and the model was initialized with a pretrained ESRGAN model. The Adam optimizer with a learning rate of 1 × 10⁻⁴ was used, and training lasted for a total of 50,000 steps, with the first 2500 as a warm-up phase. All other hyperparameters were maintained at their default settings, as defined in the original ESRGAN model⁵¹.

The SwinIR⁵² is a Transformer-based image restoration model that utilizes the Swin Transformer architecture⁵⁹. Compared to CNN-based models such as ESRGAN, SwinIR provides enhanced capability for modeling long-range dependencies and contextual information, which is particularly useful in reconstructing complex textures in histopathology images. In our implementation, we trained the SwinIR model to handle input sizes of 512 × 512 pixels. The architecture was initialized with a pretrained SwinIR model and fine-tuned using the PanCancer training split. A batch size of 12 and the Adam optimizer with a learning rate of 2 × 10⁻⁴ were used. The training was performed for 50,000 steps, with the multiple step scheduler with 5 milestones: 25,000, 40,000, 45,000, 47,500, 50,000. The checkpoint was selected based on the lowest validation loss. All other hyperparameters followed the original SwinIR implementation⁵². For training SwinIR, we employed the KAIR library (v1.1, https://github.com/cszn/KAIR).

The LDM⁵³ is a generative diffusion model that performs denoising in a lower-dimensional latent space, enabling both efficient training and high-quality image reconstruction. Unlike other models in this study, LDM consists of three main components: a first-stage VQVAE to map images to latent space, a UNet-based denoising network that operates in latent space, and a conditioning mechanism to guide reconstruction. In our implementation, we adopted an LDM architecture customized for histopathology image reconstruction. The pretrained first-stage VQVAE was employed. The denoising network was trained with a noise schedule following the cosine beta schedule, with 1000 diffusion steps. We used a batch size of 8 and the Adam optimizer with a learning rate of 2 × 10⁻⁶. The training was conducted for 7 epochs. The best checkpoint was selected based on the lowest validation loss computed on reconstructed images. Training the LDM required approximately 10 GPU days.
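
For reference, a cosine noise schedule with 1000 steps can be generated as in the sketch below. It follows the commonly used formulation in which the cumulative signal level decays as a squared cosine; the offset `s = 0.008` and the clipping value `0.999` are the customary defaults of that formulation and are assumptions here, not settings reported for this model.

```python
import math


def cosine_beta_schedule(num_steps=1000, s=0.008, max_beta=0.999):
    """Per-step noise variances (betas) derived from a squared-cosine alpha-bar curve."""
    def alpha_bar(t):
        # Cumulative signal retention after t steps.
        return math.cos((t / num_steps + s) / (1 + s) * math.pi / 2) ** 2

    betas = []
    for t in range(num_steps):
        beta = 1 - alpha_bar(t + 1) / alpha_bar(t)
        betas.append(min(beta, max_beta))  # clip to avoid a singular final step
    return betas


betas = cosine_beta_schedule()
```

The schedule adds noise slowly at the beginning and end of the forward process, which tends to preserve image structure over more steps than a linear schedule.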

The VAE and VQVAE depended on a fixed input size; therefore, these models were trained twice to handle the input sizes of 224 × 224 and 512 × 512 pixels, respectively. The ESRGAN, SwinIR, and LDM models were trained once to handle an input size of 128 × 128. Python (v3.8), PyTorch (v2.0.1), and a single RTX A6000 GPU were used to implement AdaSlide.

Model Developments: Compression Decision Agent (CDA)

The CDA takes a single image patch as input and makes a binary compression decision a ∈ {0, 1}, where a = 1 denotes Compress and a = 0 denotes Keep. The agent employs pretrained image encoders (ResNet-18, ResNet-50, ViT) and is optimized by policy-based learning using the REINFORCE algorithm⁶⁰ (see Supplementary Note 3). Unlike conventional multi-step REINFORCE, we adopt a single-step setting with one patch per decision, eliminating the computational overhead of episode sampling.
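
The single-step setting can be illustrated with a toy example. The sketch below replaces the pretrained image encoder with a single hypothetical scalar feature per patch and uses a made-up reward (`toy_reward`), so it demonstrates only the update rule, not the CDA itself. All names and hyperparameters are assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def toy_reward(x, a):
    """Hypothetical reward: compressing low-information patches (x < 0.5) is good."""
    if a == 0:
        return 0.0
    return 1.0 if x < 0.5 else -1.0


def train_single_step_reinforce(reward_fn, steps=3000, lr=0.5, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        x = rng.random()                   # hypothetical scalar patch feature
        p = sigmoid(w * x + b)             # P(a = 1 | x), i.e. Compress
        a = 1 if rng.random() < p else 0   # one sampled action per patch
        r = reward_fn(x, a)
        # REINFORCE ascends r * grad log pi(a | x); for a Bernoulli policy the
        # gradient of log pi with respect to the logit is (a - p).
        w += lr * r * (a - p) * x
        b += lr * r * (a - p)
    return lambda x: sigmoid(w * x + b)


policy = train_single_step_reinforce(toy_reward)
```

Because each decision is a complete episode, the return equals the immediate reward and no trajectory needs to be sampled or discounted.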

We balance compression efficiency and information preservation. Let S_Dice ∈ [0, 1] be the Dice similarity between the segmentation mask derived from the ESRGAN-reconstructed (compressed) image and the reference mask obtained from the original image. The information penalty is

$$S_{\mathrm{info}} = 1 - S_{\mathrm{Dice}}.$$

(2)

To avoid slow convergence, we introduce a quality threshold τ ∈ (0, 1) on S_Dice. The final reward is

$$S_{\mathrm{Reward}} = \begin{cases} -1, & \text{if } a = 1 \text{ and } S_{\mathrm{Dice}} < \tau, \\ 0, & \text{if } a = 0, \\ 1 - \lambda S_{\mathrm{info}}, & \text{if } a = 1 \text{ and } S_{\mathrm{Dice}} \ge \tau, \end{cases}$$

(3)

where λ > 0 controls the strength of the information-preservation term.
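
Equations (2) and (3) translate directly into code. The sketch below is illustrative only: the function names are hypothetical, masks are represented as flat binary lists, the default `tau = 0.5` is a placeholder (no value of τ is assumed from the experiments), and returning a Dice of 1.0 for two empty masks is a convention chosen here.

```python
def dice(mask_a, mask_b):
    """Dice similarity between two flat binary masks."""
    inter = sum(1 for p, q in zip(mask_a, mask_b) if p and q)
    size = sum(1 for p in mask_a if p) + sum(1 for q in mask_b if q)
    return 1.0 if size == 0 else 2.0 * inter / size


def cda_reward(action, s_dice, lam=0.5, tau=0.5):
    """Reward of Eq. (3); action 1 = Compress, action 0 = Keep."""
    if action == 0:
        return 0.0                      # keeping a patch is neutral
    if s_dice < tau:
        return -1.0                     # compressing a poorly restored patch
    s_info = 1.0 - s_dice               # information penalty, Eq. (2)
    return 1.0 - lam * s_info


# Compressing a well-restored patch (Dice 0.9) with lambda = 0.5.
print(cda_reward(1, 0.9, lam=0.5, tau=0.5))
```

Larger λ shrinks the reward for compressing imperfectly restored patches, which pushes the agent toward the Keep action.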

We tune λ ∈ {0.1, 0.25, 0.5, 0.75, 1.0} and the learning rate ∈ {1 × 10⁻⁴, 5 × 10⁻⁵, 1 × 10⁻⁵, 5 × 10⁻⁶, 1 × 10⁻⁶} via grid search. To avoid over-/under-compression, models with CR outside 20–90% are discarded.
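
The search space and the CR filter can be expressed compactly. In the sketch below, the helper names and the `results` mapping (configuration to achieved CR, expressed as a fraction) are hypothetical, and treating the 20–90% bounds as inclusive is an assumption.

```python
from itertools import product

LAMBDAS = [0.1, 0.25, 0.5, 0.75, 1.0]
LEARNING_RATES = [1e-4, 5e-5, 1e-5, 5e-6, 1e-6]


def grid():
    """All (lambda, learning rate) pairs explored by the grid search."""
    return list(product(LAMBDAS, LEARNING_RATES))


def filter_by_cr(results, low=0.20, high=0.90):
    """Keep runs whose achieved CR lies within [low, high]."""
    return {cfg: cr for cfg, cr in results.items() if low <= cr <= high}


# Hypothetical CR values for three trained agents.
results = {(0.5, 1e-4): 0.50, (0.1, 1e-4): 0.95, (1.0, 1e-6): 0.10}
print(filter_by_cr(results))
```

The grid contains 25 configurations, so discarding degenerate agents before downstream evaluation keeps the number of candidate models manageable.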

Downstream tasks

To compare performance, we evaluated the 13 downstream tasks under three conditions: supervised training on the original images, training on images enhanced by the foundation image enhancer alone (when λ = 0), and training on images produced by adaptive compression with the AdaSlide method.

Patch-level downstream tasks are divided into classification and segmentation. The classification tasks include five datasets (NCT-CRC, MHIST, LI, SICAPv2, WSSS4LUAD), while the segmentation tasks include three datasets (SNOW, NuInsSeg, PanNuke). All experiments were repeated using five random seeds ({42, 43, 44, 45, 46}). For consistency, the same split sets were used across experiments, and public split sets were utilized whenever available. The checkpoints were selected based on the lowest validation loss. The Adam optimizer, paired with a cosine annealing learning rate scheduler, was applied throughout.

For patch-level classification tasks, a class-balanced sampler was employed to address class imbalance, and only resize and normalization augmentations were applied. In contrast, patch-level segmentation tasks, which generally involve smaller datasets, utilized various image augmentations, including flipping, random cropping, and color jittering. Detailed configurations for each downstream task are summarized in Table 3.

Table 3 Training configurations for patch-level tasks

The slide-level classification datasets included TCGA-RCC, TCGA-NSCLC, TCGA-BRCA, Camelyon16, and CBTN. We utilized the CLAM pipeline³⁸ for preprocessing and training the MIL models, with the CLAM-SB backbone model. Feature extraction for each patch image was performed using the ResNet-50 model. The batch size was set to 1, with training conducted over 50 epochs at a learning rate of 1 × 10⁻⁴. Early stopping was applied to prevent overfitting. The Adam optimizer was used without a learning rate scheduler.

Visual Turing test (VTT)

Five expert pathologists participated in the VTT. In each trial, participants were presented with two images: one original and the other restored using ESRGAN. They were instructed to identify which image appeared to be enhanced. A total of 30 questions were presented, and the chance level was set at 50%. The resulting accuracy scores were compared against the chance level using the Wilcoxon signed-rank test.
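The comparison against chance can be sketched with an exact two-sided Wilcoxon signed-rank test written in plain Python. The sketch works on the number of correct answers out of 30, so the chance level of 50% corresponds to 15 correct; the counts below are hypothetical and are not the study's data, and the function names are illustrative.

```python
from itertools import product


def average_ranks(values):
    """1-based ranks of values, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(samples, null_value):
    """Exact two-sided Wilcoxon signed-rank test of samples against null_value.

    Returns (W_plus, p_value). Zero differences are dropped.
    """
    diffs = [s - null_value for s in samples if s != null_value]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2
    observed = abs(w_plus - centre)
    # Enumerate every sign assignment, which is feasible for small samples.
    extreme = 0
    total = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - centre) >= observed - 1e-12:
            extreme += 1
        total += 1
    return w_plus, extreme / total


# Hypothetical numbers of correct answers (out of 30) for five raters.
correct_counts = [16, 17, 15, 18, 14]
w_plus, p_value = wilcoxon_signed_rank(correct_counts, null_value=15)
```

Enumerating all sign assignments gives the exact null distribution, which is practical here because only a handful of raters are involved.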

Application example of AdaSlide

Box 1 presents the AdaSlide pipeline. In this application, patch images were saved as JPEG for kept images and PNG for compressed images. The threshold τ was set to the optimal λ determined by downstream performance.

Using the CLAM toolbox³⁸, we preprocessed the WSIs by selecting foreground tissue areas using binarized images and applying median blurring to smooth the edges. For the Camelyon16 sample, patches were generated at level 0 (0.25 μm/pixel), with a size of 512 × 512 pixels. For the TCGA-BRCA and TCGA-KIRC samples, the patch size was initially 1024 × 1024 at level 0, which was then resized to 512 × 512 pixels. The numbers of patches for Camelyon16, TCGA-BRCA, and TCGA-KIRC were 24,063, 9278, and 3720, respectively. Using the CDA, we made a patch-wise compression decision. Based on this decision, the patch images were either compressed to 128 × 128 pixels (compress) or left uncompressed (keep). In the decoding step, the compressed images were restored to their original pixel size and enhanced by the FIE. After the encoding-decoding step, the patch images were restored to the original WSI file format. The pyvips framework was used to generate a TIFF-format WSI file with JPEG compression at 75% quality. During reconstruction, the background images were not considered; therefore, the final size of the WSIs was reduced.
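The patch-wise encoding and decoding steps can be sketched in plain Python. This is a toy illustration, not the AdaSlide implementation: average pooling, nearest-neighbour upsampling, the `decide` rule, and the identity `enhance` function are all assumptions standing in for the actual resizing method, the CDA, and the FIE.

```python
def downsample(patch, factor=4):
    """Average-pool a square 2D patch (list of rows) by `factor` per side."""
    n = len(patch) // factor
    return [
        [
            sum(patch[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) / factor ** 2
            for c in range(n)
        ]
        for r in range(n)
    ]


def upsample(patch, factor=4):
    """Nearest-neighbour upsampling back to the original patch size."""
    return [[v for v in row for _ in range(factor)]
            for row in patch for _ in range(factor)]


def encode(patches, decide):
    """Store each patch either downsampled (compress) or unchanged (keep)."""
    return [(True, downsample(p)) if decide(p) == 1 else (False, p)
            for p in patches]


def decode(encoded, enhance):
    """Restore compressed patches to full size and pass them to the enhancer."""
    return [enhance(upsample(data)) if compressed else data
            for compressed, data in encoded]


# Toy 8 x 8 single-channel patches stand in for 512 x 512 RGB patches.
flat = [[10] * 8 for _ in range(8)]
textured = [[(r * 8 + c) % 7 for c in range(8)] for r in range(8)]


def decide(patch):
    """Hypothetical rule: compress only perfectly uniform patches."""
    return 1 if len({v for row in patch for v in row}) == 1 else 0


def enhance(patch):
    """Identity placeholder for the FIE."""
    return patch


encoded = encode([flat, textured], decide)
restored = decode(encoded, enhance)
```

A factor of 4 per side matches the 512 × 512 to 128 × 128 reduction described above, so each compressed patch stores one sixteenth of the original pixels.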

Evaluation Metrics: Foundation Image Enhancer (FIE)

The PSNR measures the extrinsic image quality based on the pixel values between the original and enhanced images. The PSNR is computed as follows:

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right), \tag{4}$$

where MAX_I is the maximum pixel value of the image, and MSE is the mean squared error between the original and the enhanced images. Similarly, the SSIM measures the differences in structures and pixel values between the original and enhanced images. The SSIM is computed as follows:

$$\mathrm{SSIM} = \frac{(2\mu_x \mu_y + \epsilon)(2\sigma_{xy} + \epsilon)}{(\mu_x^2 + \mu_y^2 + \epsilon)(\sigma_x^2 + \sigma_y^2 + \epsilon)}, \tag{5}$$

where μ_x and μ_y are the average pixel values of the original and enhanced images, respectively. σ_x and σ_y are the standard deviations of the original and enhanced images, respectively. σ_xy is the covariance between the original and enhanced images. ϵ is a small constant that avoids a zero denominator.
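Equations (4) and (5) can be evaluated directly on flattened pixel sequences, as in the sketch below. It follows the formulas as written, with whole-image statistics and a single constant ϵ; common SSIM implementations instead use sliding windows and two separate stabilizing constants, so values may differ from those of standard libraries. The function names and the default `eps` are assumptions.

```python
import math


def psnr(x, y, max_value=255.0):
    """PSNR of Eq. (4) for two equally sized flat pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def ssim_global(x, y, eps=1e-8):
    """SSIM of Eq. (5) from whole-image statistics, with one constant eps."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + eps) * (2 * cov_xy + eps)
    denominator = (mu_x ** 2 + mu_y ** 2 + eps) * (var_x + var_y + eps)
    return numerator / denominator
```

Identical inputs give an infinite PSNR and an SSIM of 1, which is a quick sanity check before applying either function to real patches.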

The LPIPS measures the dissimilarity between the original and enhanced images in feature space using a pretrained image encoder, such as VGG or ResNet-50. The LPIPS is computed as follows:

$$\mathrm{LPIPS} = L_1(\phi(x), \phi(y)), \tag{6}$$

where L₁ represents the mean absolute difference, and ϕ(x) and ϕ(y) are the feature vectors extracted from the original image (x) and the enhanced image (y) by a pretrained deep learning model designed to capture perceptual similarity. In this study, we used the ImageNet-pretrained VGG-16 model.

Evaluation metrics: Compression Decision Agent (CDA)

The compression ratio (CR) refers to the theoretical amount of compression achieved through image compression, expressed as the size of the stored representation relative to the original image. When the original image has an input size of 512 × 512, it is represented in $\mathbb{R}^{512 \times 512 \times 3}$. In contrast, the VAE₅₁₂ representation lies in $\mathbb{R}^{1000}$, the VQVAE₅₁₂ representation in $\mathbb{R}^{128 \times 128 \times 2} + \mathbb{R}^{64 \times 64 \times 2}$, and the ESRGAN representation in $\mathbb{R}^{128 \times 128 \times 3}$. The CR of each method is the ratio of the dimension it requires to the dimension of the original image. For adaptive compression, the CR was calculated by taking the condition of no compression as the baseline, set to a value of one. Therefore, the CR was 100% if only original images were used and 6.25% if all images were compressed using ESRGAN, which is equivalent to the compression ratio achieved by SwinIR and LDM.
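The adaptive CR can be computed from the per-patch decisions, as in the sketch below. It assumes that the adaptive CR is the mean over patches, with a kept patch counted as 1 and a compressed patch as the 128 × 128 to 512 × 512 dimension ratio; the function name `adaptive_cr` is hypothetical.

```python
def adaptive_cr(decisions, compressed_ratio=(128 * 128 * 3) / (512 * 512 * 3)):
    """Adaptive compression ratio for a list of patch decisions.

    decisions: 1 = Compress, 0 = Keep. A kept patch counts as 1.0 (baseline)
    and a compressed patch as `compressed_ratio` (6.25% for 512 -> 128 pixels).
    """
    n_compress = sum(decisions)
    n_keep = len(decisions) - n_compress
    return (n_keep + n_compress * compressed_ratio) / len(decisions)


print(adaptive_cr([0, 0, 0, 0]))  # every patch kept
print(adaptive_cr([1, 1, 1, 1]))  # every patch compressed
print(adaptive_cr([1, 0]))        # half of the patches compressed
```

Under this assumption the adaptive CR moves linearly between 6.25% and 100% as the fraction of kept patches increases.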

Evaluation metrics: classification performance

The performance of the classification models was evaluated using various widely accepted metrics, including AUROC, AUPRC, accuracy, and F1 score.

AUROC: The AUROC evaluates a classifier’s ability to differentiate between positive and negative classes across varying decision thresholds. This provides a comprehensive measure of the discriminatory ability of the model.
AUPRC: The AUPRC summarizes the trade-off between precision and recall across different threshold values. This is particularly useful for imbalanced datasets where positive samples are scarce.
Accuracy: The proportion of correctly classified instances out of the total number of instances. Although it offers a straightforward assessment of the overall model performance, it may not be suitable for imbalanced datasets.
F1 score: The F1 score considers precision and recall, providing a balanced evaluation of the classifier’s performance. It is calculated as the harmonic mean of these two metrics and is particularly valuable when precision and recall are equally important.
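For the binary case, three of these metrics can be sketched in plain Python. The AUROC is computed from its rank interpretation, the probability that a positive sample receives a higher score than a negative one. AUPRC and multi-class averaging are omitted, the function names are illustrative, and the inputs are assumed to contain at least one sample of each class.

```python
def auroc(labels, scores):
    """AUROC as P(positive score > negative score), counting ties as 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, preds):
    """Proportion of correctly classified instances."""
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)


def f1_score(labels, preds):
    """Harmonic mean of precision and recall, as 2TP / (2TP + FP + FN)."""
    tp = sum(l == 1 and p == 1 for l, p in zip(labels, preds))
    fp = sum(l == 0 and p == 1 for l, p in zip(labels, preds))
    fn = sum(l == 1 and p == 0 for l, p in zip(labels, preds))
    return 2 * tp / (2 * tp + fp + fn)


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
preds = [1 if s >= 0.4 else 0 for s in scores]  # illustrative threshold
```

The AUROC depends only on the ordering of the scores, whereas accuracy and the F1 score depend on the chosen decision threshold.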

Evaluation metrics: segmentation performance

The performance of the segmentation models was assessed using commonly employed metrics, namely the Dice coefficient and the Jaccard index.

Dice score: The Dice score measures the spatial overlap between the predicted segmentation mask and the ground-truth mask, expressed as the harmonic mean of precision and recall (Equation (7)).
Jaccard index: The Jaccard index evaluates the similarity between two sets by computing the ratio of intersections to unions, and is also known as the Intersection over Union (IoU) (Equation (8)).

The Dice coefficient was computed as:

$$\mathrm{Dice} = \frac{2 \cdot \mathrm{TP}}{2 \cdot \mathrm{TP} + \mathrm{FP} + \mathrm{FN}}. \tag{7}$$

The Jaccard index was computed as:

$$\mathrm{Jaccard} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}, \tag{8}$$

where TP, FP, and FN denote true positives, false positives, and false negatives, respectively.
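Equations (7) and (8) can be evaluated on flattened binary masks as follows. This is a minimal sketch; the function names are illustrative, and masks with no positive pixels in either input would need separate handling to avoid division by zero.

```python
def confusion_counts(pred, truth):
    """Pixel-wise TP, FP, and FN for two flat binary masks."""
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, truth))
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, truth))
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, truth))
    return tp, fp, fn


def dice(pred, truth):
    """Dice coefficient of Eq. (7)."""
    tp, fp, fn = confusion_counts(pred, truth)
    return 2 * tp / (2 * tp + fp + fn)


def jaccard(pred, truth):
    """Jaccard index (IoU) of Eq. (8)."""
    tp, fp, fn = confusion_counts(pred, truth)
    return tp / (tp + fp + fn)


pred = [1, 1, 0, 0]
truth = [1, 0, 1, 0]
d = dice(pred, truth)
j = jaccard(pred, truth)
```

The two metrics are monotonically related through Jaccard = Dice / (2 − Dice), so they always rank segmentations in the same order.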

Statistics and reproducibility

For pretraining the FIEs, 30 WSIs were randomly sampled from each TCGA project. During patch extraction, patches were also randomly selected within tissue regions to ensure representative sampling.

For the reconstruction performance of the FIE, SSIM, PSNR, and LPIPS were computed for each patch and then averaged across the evaluation set to obtain global scores. In the VTT, the pathologists' accuracy was statistically compared against chance-level accuracy (0.5) using a two-sided Wilcoxon signed-rank test, with P < 0.05 considered statistically significant. No other formal statistical tests were applied.

For downstream classification and segmentation tasks, model performance was evaluated using AUROC, AUPRC, Dice score, and Jaccard index as the primary evaluation metrics. For patch-level analyses, these metrics were calculated per patch and averaged within slides, whereas for slide-level analyses, patch-level features were aggregated using attention weights to generate slide-level representations for classification. All downstream experiments were repeated five times with different random seeds (42, 43, 44, 45, and 46), and mean performance values were reported. All experiments were conducted under identical hyperparameter settings, except for the specific comparison variable being tested.

No statistical method was used to predetermine sample size, and no data were excluded from the analyses. The experiments were not randomized, and the investigators were not blinded to allocation during experiments or outcome assessment. All datasets used in this study are publicly available, and all code, model configurations, and trained weights will be released to ensure reproducibility.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

This study used publicly accessible WSI datasets. TCGA data (including TCGA-RCC, TCGA-NSCLC, TCGA-BRCA) can be accessed via the NIH Genomic Data Commons (https://portal.gdc.cancer.gov). CPTAC datasets are available through The Cancer Imaging Archive (https://www.cancerimagingarchive.net/). Additional publicly available datasets used in this work include Camelyon16 (https://camelyon16.grand-challenge.org/), NCT-CRC (https://zenodo.org/record/1214456), MHIST (https://bmirds.github.io/MHIST/), LI (https://zenodo.org/records/10020633), SICAPv2 (https://data.mendeley.com/datasets/9xxm58dvs3/1), WSSS4LUAD (https://wsss4luad.grand-challenge.org/), SNOW (https://zenodo.org/records/6633721), NuInsSeg (https://www.kaggle.com/datasets/ipateam/nuinsseg), and PanNuke (https://warwick.ac.uk/fac/cross_fac/tia/data/pannuke). The CBTN dataset is available under controlled access and can be requested from the Children’s Brain Tumor Network (https://cbtn.org/). Source data are provided with this paper.

Code availability

A demo version of the AdaSlide code, including example usage, is available for academic research through GitHub (https://github.com/PathfinderLab/AdaSlide_demo) and Zenodo (https://doi.org/10.5281/zenodo.17445388), and the corresponding demo outputs can be accessed at https://zenodo.org/records/15665900. The full AdaSlide training and evaluation code is provided at https://github.com/PathfinderLab/AdaSlide and also archived in Zenodo (https://doi.org/10.5281/zenodo.17445388). The trained FIE and CDA model weights are available via Zenodo (https://zenodo.org/record/11069591). Additional publicly available resources used in this study include HoverNet (https://github.com/vqdang/hover_net), ESRGAN (https://github.com/XPixelGroup/BasicSR), VQVAE (https://github.com/rosinality/vq-vae-2-pytorch), KAIR (https://github.com/cszn/KAIR), LDM (https://github.com/CompVis/latent-diffusion), and CLAM (https://github.com/mahmoodlab/CLAM).

References

1. Abels, E. et al. Computational pathology definitions, best practices, and recommendations for regulatory guidance: a white paper from the digital pathology association. J. Pathol. 249, 286–294 (2019).
2. Bera, K., Schalper, K. A., Rimm, D. L., Velcheti, V. & Madabhushi, A. Artificial intelligence in digital pathology-new tools for diagnosis and precision oncology. Nat. Rev. Clin. Oncol. 16, 703–715 (2019).
3. Lipkova, J. et al. Artificial intelligence for multimodal data integration in oncology. Cancer Cell 40, 1095–1110 (2022).
4. Heinz, C. N., Echle, A., Foersch, S., Bychkov, A. & Kather, J. N. The future of artificial intelligence in digital pathology–results of a survey across stakeholder groups. Histopathology 80, 1121–1127 (2022).
5. Echle, A. et al. Deep learning in cancer pathology: a new generation of clinical biomarkers. Br. J. Cancer 124, 686–696 (2021).
6. Mandair, D., Reis-Filho, J. S. & Ashworth, A. Biological insights and novel biomarker discovery through deep learning approaches in breast cancer histopathology. NPJ Breast Cancer 9, 21 (2023).
7. Tran, K. A. et al. Deep learning in cancer diagnosis, prognosis and treatment selection. Genome Med. 13, 1–17 (2021).
8. Baxi, V., Edwards, R., Montalto, M. & Saha, S. Digital pathology and artificial intelligence in translational medicine and clinical practice. Mod. Pathol. 35, 23–32 (2022).
9. Tolkach, Y., Dohmgörgen, T., Toma, M. & Kristiansen, G. High-accuracy prostate cancer pathology using deep learning. Nat. Mach. Intell. 2, 411–418 (2020).
10. Steyaert, S. et al. Multimodal data fusion for cancer biomarker discovery with deep learning. Nat. Mach. Intell. 5, 351–362 (2023).
11. Zhang, Z. et al. Pathologist-level interpretable whole-slide cancer diagnosis with deep learning. Nat. Mach. Intell. 1, 236–245 (2019).
12. Chen, R. J. et al. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell 40, 865–878 (2022).
13. Ghosh, A. et al. The potential of artificial intelligence to detect lymphovascular invasion in testicular cancer. Cancers 13, 1325 (2021).
14. Lee, J., Ahn, S., Kim, H.-S., An, J. & Sim, J. A robust model training strategy using hard negative mining in a weakly labeled dataset for lymphatic invasion in gastric cancer. J. Pathol. Clin. Res. 10, e355 (2024).
15. Graham, S. et al. Hover-Net: simultaneous segmentation and classification of nuclei in multi-tissue histology images. Med. Image Anal. 58, 101563 (2019).
16. Israel, U. et al. A foundation model for cell segmentation. Preprint at bioRxiv https://doi.org/10.1101/2023.11.17.567630 (2023).
17. Wang, X. et al. Transformer-based unsupervised contrastive learning for histopathological image classification. Med. Image Anal. 81, 102559 (2022).
18. Azizi, S. et al. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat. Biomed. Eng. 7, 756–779 (2023).
19. Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 30, 850–862 (2024).
20. Xiang, J. et al. A vision–language foundation model for precision oncology. Nature 638, 769–778 (2025).
21. Wang, X. et al. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature 634, 970–978 (2024).
22. Xu, H. et al. A whole-slide foundation model for digital pathology from real-world data. Nature 630, 181–188 (2024).
23. Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T. J. & Zou, J. A visual–language foundation model for pathology image analysis using medical Twitter. Nat. Med. 29, 2307–2316 (2023).
24. Thorstenson, S., Molin, J. & Lundström, C. Implementation of large-scale routine diagnostics using whole slide imaging in Sweden: digital pathology experiences 2006-2013. J. Pathol. Inform. 5, 14 (2014).
25. Aswolinskiy, W. et al. Neural image compression for non-small cell lung cancer subtype classification in H&E stained whole-slide images. In Medical Imaging 2021: Digital Pathology Vol. 11603 1160304 (SPIE, 2021).
26. Chikontwe, P. et al. Weakly supervised segmentation on neural compressed histopathology with self-equivariant regularization. Med. Image Anal. 80, 102482 (2022).
27. Jewsbury, R., Bhalerao, A. & Rajpoot, N. M. A quadtree image representation for computational pathology. In Proc. IEEE/CVF International Conference on Computer Vision 648–656 (2021).
28. Wang, Y. et al. CWC-transformer: a visual transformer approach for compressed whole slide image classification. Neural Comput. Appl. 37, 7485–7497 (2025).
29. Bidgoli, A. A. et al. Evolutionary deep feature selection for compact representation of gigapixel images in digital pathology. Artif. Intell. Med. 132, 102368 (2022).
30. Tellez, D., Litjens, G., Van der Laak, J. & Ciompi, F. Neural image compression for gigapixel histopathology image analysis. IEEE Trans. Pattern Anal. Mach. Intell. 43, 567–578 (2019).
31. Keighley, J., de Kamps, M., Wright, A. & Treanor, D. Digital pathology whole slide image compression with vector quantized variational autoencoders. In Medical Imaging 2023: Digital and Computational Pathology Vol. 12471 344–353 (SPIE, 2023).
32. Afshari, M. et al. Single patch super-resolution of histopathology whole slide images: a comparative study. J. Med. Imaging 10, 017501–017501 (2023).
33. Nasr, M. S. et al. Clinically relevant latent space embedding of cancer histopathology slides through variational autoencoder based image compression. In 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI) 1–5 (IEEE, 2023).
34. Kingma, D. P. & Welling, M. Auto-encoding variational bayes. Preprint at https://arxiv.org/abs/1312.6114 (2013).
35. Donahue, J., Krähenbühl, P. & Darrell, T. Adversarial feature learning. Preprint at https://arXiv.org/abs/1605.09782 (2016).
36. Razavi, A., Van den Oord, A. & Vinyals, O. Generating diverse high-fidelity images with VQ-VAE-2. Adv. Neural Inf. Process. Syst. 32, 14866–14876 (2019).
37. Li, Z., Li, B., Eliceiri, K. W. & Narayanan, V. Computationally efficient adaptive decompression for whole slide image processing. Biomed. Opt. Express 14, 667–686 (2023).
38. Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).
39. Ilse, M., Tomczak, J. & Welling, M. Attention-based deep multiple instance learning. In International Conference on Machine Learning 2127–2136 (PMLR, 2018).
40. Shi, J., Li, C., Gong, T., Zheng, Y. & Fu, H. Vila-mil: Dual-scale vision-language multiple instance learning for whole slide image classification. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 11248–11258 (2024).
41. Li, B., Li, Y. & Eliceiri, K. W. Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 14318–14328 (2021).
42. Chen, Y.-C. & Lu, C.-S. RankMix: Data augmentation for weakly supervised learning of classifying whole slide images with diverse sizes and imbalanced categories. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 23936–23945 (2023).
43. Kather, J. N., Halama, N. & Marx, A. 100,000 histological images of human colorectal cancer and healthy tissue. Zenodo https://doi.org/10.5281/zenodo.1214456 (2018).
44. Wei, J. et al. A petri dish for histopathology image analysis. In Artificial Intelligence in Medicine: 19th International Conference on Artificial Intelligence in Medicine, AIME 2021, Virtual Event, June 15–18, 2021, Proceedings 11–24 (Springer, 2021).
45. Silva-Rodríguez, J., Colomer, A., Sales, M. A., Molina, R. & Naranjo, V. Going deeper through the Gleason scoring scale: An automatic end-to-end system for histology prostate grading and cribriform pattern detection. Comput. Methods Prog. Biomed. 195, 105637 (2020).
46. Han, C. et al. Wsss4luad: Grand challenge on weakly-supervised tissue semantic segmentation for lung adenocarcinoma. Preprint at https://arXiv.org/abs/2204.06455 (2022).
47. Ding, K. et al. A large-scale synthetic pathological dataset for deep learning-enabled segmentation of breast cancer. Sci. Data 10, 231 (2023).
48. Mahbod, A. et al. Nuinsseg: A fully annotated dataset for nuclei instance segmentation in H&E-stained histological images. Sci. Data 11, 295 (2024).
49. Gamper, J., Alemi Koohbanani, N., Benet, K., Khuram, A. & Rajpoot, N. Pannuke: an open pan-cancer histology dataset for nuclei instance segmentation and classification. In Digital Pathology: 15th European Congress, ECDP 2019, Warwick, UK, April 10–13, 2019, Proceedings 15 11–19 (Springer, 2019).
50. Lilly, J. V. et al. The children’s brain tumor network (CBTN)-accelerating research in pediatric central nervous system tumors through collaboration and open science. Neoplasia 35, 100846 (2023).
51. Wang, X. et al. Esrgan: Enhanced super-resolution generative adversarial networks. In Proc. European Conference on Computer Vision (ECCV) Workshops (2018).
52. Liang, J. et al. SwinIR: Image restoration using swin transformer. In Proc. IEEE/CVF International Conference on Computer Vision 1833–1844 (2021).
53. Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 10684–10695 (2022).
54. Zhang, R., Isola, P., Efros, A. A., Shechtman, E. & Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 586–595 (2018).
55. Xu, X., Kapse, S. & Prasanna, P. Histo-diffusion: A diffusion super-resolution method for digital pathology with comprehensive quality assessment. Preprint at https://arXiv.org/abs/2408.15218 (2024).
56. Faghani, S., Codipilly, D. C., Moassefi, M., Iyer, P. G. & Erickson, B. J. Optimizing storage and computational efficiency: an efficient algorithm for whole slide image size reduction. Mayo Clin. Proc. Digital Health 1, 419–424 (2023).
57. Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at https://arXiv.org/abs/1707.06347 (2017).
58. Macenko, M. et al. A method for normalizing histology slides for quantitative analysis. In 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro 1107–1110 (IEEE, 2009).
59. Liu, Z. et al. Swin transformer: hierarchical vision transformer using shifted windows. In Proc. IEEE/CVF International Conference on Computer Vision 10012–10022 (2021).
60. Sutton, R. S., McAllester, D., Singh, S. & Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. Adv. Neural Inf. Process. Syst. 12, 1057–1063 (1999).

Acknowledgements

This research was supported by the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health and Welfare, Republic of Korea (grant number RS-2021-KH113146; S.A., S.H.L.). It was also supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (grant number RS-2025-02215813; S.A.), as well as the Digital-Bio AI + X Global Innovative Talent Nurturing Project with Hands-on Experience of the NRF, funded by the Korean government (MSIT) (grant number RS-2024-00441029; S.A.). Additional support was provided by the High-Performance Computing Support Project, funded by the Government of the Republic of Korea (Ministry of Science and ICT) (grant number RQT-25-090105; S.A.).

Author information

Authors and Affiliations

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Jonghyun Lee, Lina Takemaru & Dokyoon Kim
Department of Pathology, College of Medicine, Korea University, Seoul, Republic of Korea
Jonghyun Lee, D. M. Bappy & Sangjeong Ahn
Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
Lina Takemaru & Derek Oldridge
Center for Computational and Genomic Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
Lina Takemaru & Derek Oldridge
Department of Hospital Pathology, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
Ye Sul Jeong & Sung Hak Lee
Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
Won-Ki Jeong
Department of Biomedical Informatics, College of Medicine, Korea University, Seoul, Republic of Korea
Sangjeong Ahn

Contributions

J.L. and L.T. carried out the experiments. J.L. wrote the manuscript with support from L.T., D.M.B., D.O., D.K., W.J., S.A. and S.H.L. conceived the original idea. D.O., D.K., S.A. and S.H.L. supervised the project.

Corresponding authors

Correspondence to Sangjeong Ahn or Sung Hak Lee.

Ethics declarations

Competing interests

The authors declare a related patent application covering the methodological aspects of this work (Applicants: Korea University and The Catholic University of Korea Industry-Academic Cooperation Foundations; Inventors: S.A., S.H.L., and J.L.; PCT/KR2025/007948; filed and pending). The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Transparent Peer Review file

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Lee, J., Takemaru, L., Bappy, D.M. et al. Adaptive compression framework for giga-pixel whole slide images. Nat Commun 17, 207 (2026). https://doi.org/10.1038/s41467-025-66889-0

Received: 18 July 2024
Accepted: 18 November 2025
Published: 03 December 2025
Version of record: 07 January 2026
DOI: https://doi.org/10.1038/s41467-025-66889-0
