Artificial intelligence-assisted prostate cancer diagnosis for reduced use of immunohistochemistry

Blilie, Anders; Mulliqi, Nita; Ji, Xiaoyi; Szolnoky, Kelvin; Boman, Sol Erika; Titus, Matteo; Gonzalez, Geraldine Martinez; Asenjo, José; Gambacorta, Marcello; Libretti, Paolo; Gudlaugsson, Einar; Kjosavik, Svein R.; Egevad, Lars; Janssen, Emiel A. M.; Eklund, Martin; Kartasalo, Kimmo

doi:10.1038/s43856-025-01185-y

Article
Open access
Published: 15 October 2025

Communications Medicine volume 5, Article number: 425 (2025)

Abstract

Background:

Prostate cancer diagnosis heavily relies on histopathological evaluation, which is subject to variability. While immunohistochemical staining (IHC) assists in distinguishing benign from malignant tissue, it increases workload and costs and leads to diagnostic delays. Artificial intelligence (AI) presents a promising solution to reduce reliance on IHC by accurately classifying atypical glands and borderline morphologies in hematoxylin and eosin (H&E) stained tissue sections.

Methods:

In this study, we evaluated an AI model’s ability to minimize IHC use without compromising diagnostic accuracy. We retrospectively analyzed prostate core needle biopsies from routine diagnostics at three different pathology sites. These cohorts consisted exclusively of diagnostically challenging cases where pathologists had required IHC to finalize the diagnosis.

Results:

We show that the AI model achieves high performance, with area under the curve values ranging from 0.951 to 0.993 for detecting cancer in routine H&E-stained slides. When applying sensitivity-prioritized diagnostic thresholds, the model reduces the need for IHC staining by 44.4%, 42.0%, and 20.7% across the three cohorts, without a single false negative prediction. Among slides with a benign ground truth label, IHC use is reduced by up to 80.6%.

Conclusions:

This AI model shows promise for reducing unnecessary IHC use in difficult prostate biopsy cases while maintaining diagnostic safety. Its integration into clinical workflows could streamline decision-making in prostate pathology and alleviate resource burdens.

Plain language summary

Diagnosing prostate cancer typically involves examining tissue samples under a microscope. In challenging cases, doctors often use a special test called immunohistochemistry (IHC) to help confirm whether cancer is present. However, IHC adds time, cost, and extra work to the diagnostic process. In this study, we tested an artificial intelligence (AI) tool to see if it could accurately identify prostate cancer using only standard tissue images—without needing IHC. We analyzed especially difficult biopsy cases from three different hospitals, where pathologists had originally needed IHC to make a diagnosis. The AI tool was highly accurate and, when using a safety-first approach, it could reduce the use of IHC by 20% to 44% depending on the site without missing any cancers. Importantly, when focusing only on slides that were ultimately benign, the AI could reduce IHC use by up to 80.6%. This suggests AI could help pathologists make faster and more efficient decisions while maintaining diagnostic safety.

Introduction

Histopathological evaluation of prostate biopsies using the Gleason grading system is a cornerstone in the diagnosis and management of prostate cancer^1,2,3. However, Gleason grading is notoriously subjective, showing high inter- and intraobserver variability, resulting in over- and underdiagnosis^4,5,6,7. To standardize diagnostics, the International Society of Urological Pathology (ISUP) updated grading guidelines for prostate cancer to convert Gleason scores into ISUP grades (also called ‘grade groups’) from 1 to 5⁸. Pathological assessment can be aided by immunohistochemical staining (IHC), which in prostate pathology is mainly used for the identification of prostatic basal cells. These cells are present around the periphery of benign glandular structures but are lost in the development of prostatic adenocarcinoma (with rare exceptions), making absent basal-cell IHC staining strongly suggestive of malignancy^9,10,11,12. ISUP recommends using basal-cell IHC markers to confirm cancer when encountering small foci of atypical glands where a definitive malignant diagnosis cannot be rendered based on hematoxylin & eosin (H&E) staining⁹. The IHC markers (antibodies) most commonly used to identify prostatic basal cells are high-molecular-weight cytokeratin (HMWCK) and p63, often used together to increase sensitivity⁹. In rare cases, non-cancerous morphological variants such as adenosis, atrophy, or intraepithelial neoplasia can have areas of absent basal cell staining, or conversely, prostate cancer can paradoxically exhibit positive IHC staining for basal cell markers⁹. Thus, IHC expression must be interpreted carefully and correlated with H&E morphology, which can be challenging.

The decision to order IHC for a given tissue block is inherently subjective, depending on the judgment of the pathologist. Differences in uropathology experience, combined with high observer variability, naturally lead to varying practices for ordering IHC¹³. Personal preferences also play a role, as some pathologists rely on IHC as a safety net to minimize misdiagnosing malignancy even when morphological suspicion is low. This variation extends across pathology laboratories, where many IHC investigations ultimately result in a benign diagnosis¹⁴. Furthermore, some laboratories are known to preemptively order IHC for all prostate biopsies, anticipating a high likelihood of IHC requests from pathologists. The use of IHC incurs costs in both time and resources. Each antibody reagent has a per-use price, and every tissue block requiring IHC must be re-cut, stained, and further processed. This places additional strain on the pathology lab and extends turnaround times, ultimately delaying the final diagnosis^15,16.

Transitioning from glass slides to digital whole-slide images (WSI) is widely considered the third revolution in modern pathology, following the introduction of IHC and the inception of genomic medicine using molecular-based methods¹⁷. Artificial intelligence (AI) has shown potential in standardizing histopathological grading of prostate cancer^6,18,19,20, as well as in predicting treatment response²¹ and patient outcomes²². Recently, pathology foundation models (FM) have shown promise in pan-cancer detection^23,24,25,26. Despite the growing role of AI in pathology and its potential to enhance diagnostic consistency, there remains a critical gap in leveraging AI models to systematically standardize and minimize unnecessary IHC usage in routine prostate cancer diagnostics. A previous study proposed an AI solution for identifying tissue blocks that are likely to require IHC and preemptively order IHC prior to pathologist evaluation¹⁶. Another study found that retrospective evaluation of prostate biopsies using an AI model led to a reduced need for IHC compared to the traditional diagnostic approach²⁷. IHC staining has also been used as the reference standard in a study developing an AI model for prostate tissue segmentation²⁸. While these studies have aimed to replicate pathologists’ IHC-ordering patterns for workflow optimization, assess IHC frequency in a research setting, or improve tissue segmentation, to our knowledge, no study has used AI models to standardize IHC usage in prostate pathology or minimize IHC requests for benign slides in routine clinical practice.

We utilize an AI model trained on prostate core needle biopsies for prostate cancer diagnosis and Gleason grading, which demonstrates robust performance in handling challenging tissue morphologies²⁹. We hypothesize that such a model, capable of accurately diagnosing small foci with atypical glands and borderline morphology, can reduce reliance on IHC in routine practice. This study follows a pre-specified protocol, detailing study design and patient cohorts³⁰. In this study, we apply the model to WSIs of H&E-stained prostate biopsies from multiple international pathology sites. These WSIs represent slides where the diagnosing pathologist required IHC, in addition to H&E staining, to render a final diagnosis. We apply sensitivity-prioritized diagnostic thresholds to minimize false negative predictions—critical for ensuring no cancers are overlooked—while maintaining the high specificity required to effectively reduce IHC usage for benign slides. We show that for true negative slides, where the pathologist’s suspicion of malignancy is low, AI-based support can potentially eliminate a substantial number of IHC investigations traditionally used for rule-out purposes.

Methods

Study design

The full dataset underlying the AI models of this study comprises biopsy samples from 7243 patients across 15 clinical sites in 11 countries, encompassing 58,744 physical glass slides containing ~100,000 biopsy cores. Slides were digitized using 14 scanners (nine different models from five manufacturers), producing a total of 82,584 WSIs. For this study, we exclusively included slides where IHC staining targeting basal cells was performed in the diagnostic process, considering this to be a surrogate marker for the pathologist not being able to establish infiltration status by H&E-staining alone. To avoid data leakage and ensure robust generalization, only held-out test data from patients who were not part of AI model training or hyperparameter tuning were included in the final analysis. The patient sampling strategy for internal and external validation was pre-specified in the study protocol, ensuring a structured and reproducible selection process³⁰. Among the 15 patient cohorts represented in the dataset, only three cohorts had reliable information regarding IHC staining status: Stavanger University Hospital (SUH), Synlab France (SFR), and Synlab Switzerland (SCH). The clinical characteristics of the included patients are summarized in Table 1 and the CONSORT diagram outlining the sample inclusion process is provided in Supplementary Fig. 1.

Table 1 Clinical characteristics of patient cohorts

Data cohorts

Cohort 1: Stavanger University Hospital

The SUH samples represent consecutive cases collected from routine diagnostics at the Department of Pathology, Stavanger University Hospital, Norway, between December 2016 and March 2018. Biopsies were obtained at the Department of Urology, Stavanger University Hospital, as well as private urological clinics within the same geographic region. Most biopsies were transrectal and systematic, with a minority involving MRI-guided targeted biopsy collection. The slides were digitized with a Hamamatsu S60 scanner (40×, pixel size 0.2199 μm).

Tabulated slide-level information from the SUH cohort contained IHC status, including the type of stain used, as well as Gleason scores and cancer length per slide. As portions of this cohort were used in the development of the AI model, only slides reserved for internal validation were considered for inclusion. From this subset, all slides where a basal-cell IHC marker was used, almost invariably HMWCK (CK903/34βE12), were included in the study (n = 234; 129 benign, 105 malignant). Twelve different diagnosing pathologists were represented in this subset of samples. Detailed information on the IHC and H&E staining protocols, including antibody clones, equipment, and site-specific procedures, was not available.

Cohort 2: Synlab France

The SFR cohort comprises consecutive cases collected from routine diagnostics at the Technipath-Synlab Medical Laboratory in Dommartin, Rhône, France, between September 2020 and December 2020. This cohort was entirely external, i.e., not used in AI model development. The samples were a mixture of systematic transrectal biopsies and MRI-guided targeted biopsies. The slides were digitized with a Philips IntelliSite Ultra Fast Scanner (40×, pixel size 0.2500 μm, the same device as for the cohort SCH).

From the SFR cohort, we also had slide-level information regarding the use of IHC. Gleason scores and cancer length for individual slides were available; however, there was no tabulated specification of the type of IHC stain(s) performed. To determine this, we manually investigated de-identified pathology reports to extract the missing information. All slides where a basal-cell marker was used, almost invariably p63 in combination with P504S/AMACR, were included in our study (n = 112; 66 benign, 46 malignant). The pathologists’ names were redacted in the reports, and thus, we could not determine the number of pathologists represented in this subset of samples. Detailed information on the IHC and H&E staining protocols, including antibody clones, equipment, and site-specific procedures, was not available.

Cohort 3: Synlab Switzerland

The SCH samples represent consecutive cases collected from routine diagnostics at the Argot Laboratory in Lausanne, Switzerland, between January 2020 and December 2020. This dataset was entirely external, i.e., not used in AI-model development. Biopsies were a mixture of systematic transrectal biopsies and MRI-guided targeted biopsies. The slides were digitized with a Philips IntelliSite Ultra Fast Scanner (40×, pixel size 0.2500 μm, the same device as for the cohort SFR).

The SCH cohort differs from the other cohorts in that the diagnoses were reported in a pooled manner for each anatomical location of the prostate sampled with multiple biopsy cores. The combined Gleason score per location covered by multiple slides was reported. Consequently, no tabulated slide-level information regarding Gleason score or cancer length was available for this cohort. The data tables contained location-level information regarding IHC use but did not specify which particular slide(s) had IHC requested or the type of stain used. Getting this information required reading the de-identified pathology reports. For cases involving IHC, the reports detailed the specific slides where IHC was requested and whether they were benign or malignant, enabling us to include only relevant slides in our study. To ensure validity, it was necessary to confirm a systematic order linking scanned slides to their corresponding location-level report information. Approximately 150 WSIs were evaluated by our study pathologist (A.B.), verifying that such a systematic order existed (e.g., that “Slide 2 C” in a report corresponded to “Scan 3” from “Location 2”). The reports also specified the type of IHC used, which was almost invariably p63 (often in combination with P504S/AMACR). After this filtering process, all IHC slides were included in our study (n = 164; 65 benign, 99 malignant). Five different diagnosing pathologists were represented in this subset of samples. Detailed information on the IHC and H&E staining protocols, including antibody clones, equipment, and site-specific procedures, was not available.

Tissue detection and tiling

Tissue detection from WSIs was performed using a custom-built tissue segmentation model based on a UNet++ architecture, incorporating a ResNeXt-101 (32 × 4d) encoder³¹. Initially, 512 × 512 px patches were extracted across the entire WSI at 8.0 μm/px resolution, with a 128 px overlap, followed by pixel-wise segmentation to identify tissue regions. These segmented regions were then combined into a single binary tissue mask per WSI. Next, 256 × 256 px high-resolution tissue patches were extracted at 1.0 µm/px resolution, using the segmentation masks to retain only those patches where at least 10% of pixels contained tissue. During model training, patches were extracted without overlap to reduce GPU memory usage, whereas, for model prediction, a 128 px overlap was used to enhance diagnostic accuracy. To achieve a resolution of 1.0 µm/px, patches were downsampled from the nearest higher resolution level in the WSI resolution pyramid using Lanczos resampling. Extracted patches were stored in the TFRecord format for efficient disk storage, with each WSI saved as a separate file.
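The patch selection step described above can be illustrated with a minimal sketch. It assumes, for simplicity, that the binary tissue mask is a list of 0/1 rows at the same resolution as the patches (in the study the mask is produced at 8.0 μm/px and the patches at 1.0 µm/px, so a coordinate rescaling would be needed in practice). The function names and the handling of image borders are hypothetical and not taken from the authors' pipeline.

```python
from typing import List, Sequence, Tuple


def tile_origins(width: int, height: int, tile: int = 256,
                 overlap: int = 0) -> List[Tuple[int, int]]:
    """Top-left (x, y) corners of a regular tile grid.

    Tiles that would cross the right or bottom border are skipped
    (an assumption; the source does not describe border handling).
    """
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    return [(x, y)
            for y in range(0, height - tile + 1, stride)
            for x in range(0, width - tile + 1, stride)]


def tissue_fraction(mask: Sequence[Sequence[int]], x: int, y: int,
                    tile: int) -> float:
    """Fraction of pixels flagged as tissue inside one tile."""
    covered = sum(sum(row[x:x + tile]) for row in mask[y:y + tile])
    return covered / (tile * tile)


def select_tiles(mask: Sequence[Sequence[int]], tile: int = 256,
                 overlap: int = 0,
                 min_tissue: float = 0.10) -> List[Tuple[int, int]]:
    """Keep only tiles in which at least `min_tissue` of pixels are tissue."""
    height, width = len(mask), len(mask[0])
    return [(x, y)
            for x, y in tile_origins(width, height, tile, overlap)
            if tissue_fraction(mask, x, y, tile) >= min_tissue]
```

With `overlap=0` this mirrors the non-overlapping extraction used during training, and with `overlap=128` the denser grid used for prediction.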

AI model

The task-specific AI model used for evaluation in this study was trained on digitized prostate core needle biopsies for prostate cancer diagnosis and Gleason grading²⁹. The model was built using an attention-based multiple instance learning (ABMIL) architecture with weakly supervised learning, leveraging only slide-level labels. The model uses an EfficientNet-V2-S encoder³² to extract patch-level feature embeddings, which the ABMIL layers aggregate into slide-level representations. Classification layers then predict the two Gleason patterns of the slide (primary and secondary, each 3, 4, or 5), which are further translated into a Gleason score and ISUP grade. The grading model was trained in an end-to-end fashion where all model parameters were jointly optimized for cross-entropy loss using the AdamW optimizer³³ with a base learning rate of 0.0001. Further details regarding design choices, hyperparameters, and validation results can be found in the original publication²⁹. UNI²⁵ and Virchow2³⁴ foundation models were used within the same training pipeline; however, the weights of the encoders were kept frozen, and only the ABMIL and subsequent classification layers were trained, identically to the task-specific model. The model was trained on 10 cross-validation folds stratified by patient and ISUP grade. During model predictions, test-time augmentation (TTA) was applied for three iterations per model, and the final prediction was obtained as the majority vote of the 30 predicted Gleason scores (10 models × 3 TTA runs) and further translated into an ISUP grade. Cancer probability was obtained as the median over the ensemble. Mean attention scores from the ABMIL models were used for each tile to highlight regions of interest that the AI focused on for the final diagnosis.
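The ensemble aggregation described above (majority vote over 30 Gleason scores, median cancer probability, translation to ISUP grade) can be sketched as follows. The encoding of a benign slide as `(0, 0)`, the function names, and the tie-breaking rule (first-encountered score wins) are assumptions made for illustration; the source does not specify them. The Gleason-to-ISUP mapping follows the standard grade group definitions.

```python
from collections import Counter
from statistics import median
from typing import Sequence, Tuple

# (primary, secondary) Gleason patterns; (0, 0) encodes benign (assumed).
GleasonScore = Tuple[int, int]


def isup_grade(score: GleasonScore) -> int:
    """Map a Gleason score to an ISUP grade (0 is used here for benign)."""
    if score == (0, 0):
        return 0
    primary, secondary = score
    total = primary + secondary
    if total <= 6:
        return 1
    if total == 7:
        return 2 if primary == 3 else 3  # 3+4 -> 2, 4+3 -> 3
    if total == 8:
        return 4
    return 5  # Gleason score 9 or 10


def aggregate(scores: Sequence[GleasonScore],
              cancer_probs: Sequence[float]) -> Tuple[GleasonScore, int, float]:
    """Majority-vote Gleason score, its ISUP grade, and the median probability.

    Ties are resolved in favour of the score seen first (an assumption).
    """
    winner, _ = Counter(scores).most_common(1)[0]
    return winner, isup_grade(winner), median(cancer_probs)
```

In this sketch, `scores` and `cancer_probs` would each hold 30 entries, one per combination of cross-validation model and TTA run.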

Statistics and reproducibility

Using the numerical value for AI-predicted cancer probability of a given WSI, we analyze the results for each cohort at different model operating points (i.e., different thresholds for a positive prediction), allowing us to prioritize either sensitivity or specificity for cancer detection. We analyzed the data using thresholds ranging from 0.5 to 0.01. To quantify the concordance of negative/positive diagnoses for prostate cancer with the reference standard, we used sensitivity (true positive rate), specificity (true negative rate), and AUC. All reported values are point estimates. The statistical calculations were conducted using the Python modules NumPy (v1.24.0), scikit-learn (v1.2.2), and Pandas (v1.5.3). All computational analyses were verified to be deterministic and, as such, fully repeatable. The analyzed material consisted of routine clinical samples without biological replicates.
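As an illustration of the operating-point analysis described above, the following sketch computes sensitivity and specificity at a chosen threshold and a rank-based AUC in plain Python. It is a stand-in for the scikit-learn routines the authors report using, not their code. Treating a slide as positive when its probability strictly exceeds the threshold is an assumption about boundary handling, and all function names are hypothetical.

```python
from typing import Dict, Sequence


def confusion_at(probs: Sequence[float], labels: Sequence[int],
                 threshold: float) -> Dict[str, int]:
    """Confusion counts when 'positive' means probability > threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for prob, label in zip(probs, labels):
        predicted_positive = prob > threshold
        if label == 1:
            counts["tp" if predicted_positive else "fn"] += 1
        else:
            counts["fp" if predicted_positive else "tn"] += 1
    return counts


def sensitivity(counts: Dict[str, int]) -> float:
    """True positive rate."""
    return counts["tp"] / (counts["tp"] + counts["fn"])


def specificity(counts: Dict[str, int]) -> float:
    """True negative rate."""
    return counts["tn"] / (counts["tn"] + counts["fp"])


def rank_auc(probs: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the share of (cancer, benign) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Lowering the threshold from 0.5 towards 0.01 can only move slides from the negative to the positive side, so sensitivity is non-decreasing and specificity non-increasing along the sweep.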

Hardware and software

Model training and predictions were run as described earlier²⁹. We used Python (v3.8.10), PyTorch (v2.0.0, CUDA 12.2) (https://pytorch.org), and PyTorch DDP for multi-GPU training for all experiments across all models. We used the pre-trained weights for UNI and Virchow2 FMs from their official releases on the HuggingFace hub (https://huggingface.co/MahmoodLab/UNI; https://huggingface.co/paige-ai/Virchow2) and integrated them with the ViT implementations provided by the timm library (v0.9.8). All experiments were done on two high-performance clusters: Alvis (part of the National Academic Infrastructure for Supercomputing in Sweden) and Berzelius (part of the National Supercomputer Centre). On Alvis, training was done on 4 × 80 GB NVIDIA A100 GPUs (256 GB system memory, 16 CPU cores per GPU). On Berzelius, training was done on 8 × 80 GB NVIDIA A100 GPUs (128 GB system memory, 16 CPU cores per GPU). Predictions were run on the clusters on a single 40 GB NVIDIA A100 GPU. Docker (v20.10.21) was used locally, while Singularity and Apptainer were used on the computing clusters. OpenSlide (v4.0.0), openslide-python (v1.3.1), and OpenPhi (v2.1.0) were used to access WSIs. DareBlopy (v0.0.5) was used for compatibility between the TFRecord data format (.tfrecord) and PyTorch. Albumentations (v1.3.1) and Stainlib (v0.6.1) were used for image augmentations. The tissue segmentation model was implemented with the segmentation_models_pytorch library (v0.3.3). NumPy (v1.24.0), scikit-learn (v1.2.2), and Pandas (v1.5.3) were used for numerical operations, model evaluation, and data management. Pillow (v9.4.0) and OpenCV-python were used for basic image processing tasks. Matplotlib (v3.7.1) and Seaborn (v0.12.2) were used for plots and figures, and Biorender was used to assemble figure panels. Pathologists' reviews of false negative cases were done using QuPath (v0.4.3)³⁵.

Ethical considerations

This study included data gathered in one or more collection rounds at participating international sites between 2012 and 2024. All datasets were de-identified at their respective sites and subsequently transferred to Karolinska Institutet in an anonymized format. This study complies with the Helsinki Declaration. The patient sample collection was approved by the Stockholm Regional Ethics Committee (permits 2012/572-31/1, 2012/438-31/3, and 2018/845-32), the Swedish Ethical Review Authority (permit 2019-05220), and the Regional Committee for Medical and Health Research Ethics in Western Norway (permits REC/Vest 80924, REK 2017/71). Informed consent was obtained from patients in the Swedish dataset and was waived for other data cohorts due to the use of de-identified prostate specimens in a retrospective setting. Patient involvement in this study was supported by the Swedish Prostate Cancer Society.

Results

Interpretation of AI predictions and rationale for reduction in IHC use

We employed an in-house, task-specific AI model trained for prostate cancer grading²⁹ to assess its performance in retrospective cases requiring basal-cell IHC staining as part of routine clinical diagnostics. A prediction of “positive” would in this setting translate to “IHC analysis recommended”, indicating that the model is not confident that the WSI is benign relative to the applied threshold. Conversely, a “negative” prediction should be interpreted as “IHC analysis not recommended”, indicating high AI confidence in benign morphology (i.e., a high negative predictive value) even at a sensitivity-prioritized threshold. In a scenario where the pathologist would have absolute trust in the thresholded AI predictions, i.e., only ordering IHC on positive-predicted WSIs, the number of negative-predicted WSIs would represent IHC investigations saved compared to current diagnostic practice (Fig. 1). We evaluated the model by measuring sensitivity and specificity for prostate cancer detection at different sensitivity-prioritized thresholds, along with the area under the receiver operating characteristic curve (AUC) (Fig. 2). Diagnostic performance was assessed across three validation cohorts (Table 1) representing only slides where pathologists ordered IHC staining for basal-cell markers during routine diagnostics. These cohorts included WSIs from Stavanger University Hospital, Norway (SUH, n = 234 WSIs), Synlab Laboratory, France (SFR, n = 112 WSIs), and Synlab Laboratory, Switzerland (SCH, n = 164 WSIs).

**Fig. 1: Integration of the AI model into the diagnostic workflow.**

**Fig. 2: Model performance across three cohorts using sensitivity-prioritized thresholding.**

With respect to the patient population, laboratory, and whole-slide scanner used for the digitization of biopsies, the SUH cohort represents an internal validation set (different patients but from the same scanner and lab as the AI training data), while the SFR and SCH cohorts represent entirely external validation sets (different patients, scanners, and laboratories than the training data). A detailed description of the data cohorts is provided in the predefined study protocol³⁰. The AI model’s performance was evaluated across varying sensitivity-prioritized thresholds for cancer probability (Table 2). In addition, we evaluated the performance of two foundation models: UNI (UFM) and Virchow2 (VFM) in this task (Supplementary Table 1).

Table 2 AI model performance across sensitivity-prioritized thresholds

Diagnostic performance: internal validation

For the SUH internal validation cohort, the AI model achieved an AUC of 0.980 on IHC-validated WSIs. At the baseline threshold of 0.50, sensitivity was 0.914 and specificity was 0.930, yielding 120 true negatives and 9 false negatives out of 234 WSIs. Therefore, if IHC staining had only been ordered for positive AI labels, this threshold would have saved IHC for 129 out of 234 slides (55.1%), though 9 out of 105 cancer slides (8.6%) would have been missed. Using a highly sensitive threshold of 0.01 improved sensitivity to 1.0 while specificity dropped to 0.806, resulting in 104 true negatives and no false negatives. This adjustment would have saved IHC for 104 out of 234 slides (44.4%) without missing any cancers.

Diagnostic performance: external validation

For the SFR external validation cohort, the model demonstrated an AUC of 0.993. At the 0.50 threshold, sensitivity was 0.935 and specificity was 0.955, with 63 true negatives and 3 false negatives among 112 WSIs. This would have saved IHC for 66 out of 112 slides (58.9%) while missing 3 out of 46 cancers (6.5%). Lowering the threshold to 0.4 eliminated all false negatives without losing any true negatives, resulting in 63 out of 112 IHC stains saved (56.3%). At the most sensitivity-prioritized threshold of 0.01, true negatives decreased to 47, reducing IHC savings to 47 out of 112 slides (42.0%).

For the SCH external validation cohort, the model achieved an AUC of 0.951. At a threshold of 0.50, sensitivity was 0.899 and specificity was 0.831, with 54 true negatives and 10 false negatives among 164 WSIs. This would have saved IHC for 64 out of 164 slides (39.0%) but missed 10 out of 99 cancers (10.1%). Using the highly sensitivity-prioritized threshold of 0.01 increased sensitivity to 1.0, while specificity dropped to 0.523, resulting in 34 true negatives and reducing IHC savings to 34 out of 164 slides (20.7%).
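The reported figures at the 0.01 threshold can be recomputed directly from the slide counts given in the text. The sketch below does this for all three cohorts. The class and attribute names are hypothetical, while the counts are those stated above (benign and malignant slides per cohort, true negatives, and zero false negatives at this threshold).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """Slide counts for one cohort at one decision threshold."""
    cohort: str
    n_benign: int
    n_malignant: int
    true_neg: int
    false_neg: int

    @property
    def sensitivity(self) -> float:
        return (self.n_malignant - self.false_neg) / self.n_malignant

    @property
    def specificity(self) -> float:
        # Also the share of benign slides for which IHC would be skipped.
        return self.true_neg / self.n_benign

    @property
    def ihc_saved(self) -> float:
        # Every negative prediction (true or false) means no IHC order.
        total = self.n_benign + self.n_malignant
        return (self.true_neg + self.false_neg) / total


# Counts reported in the text for the 0.01 threshold.
AT_THRESHOLD_001 = [
    OperatingPoint("SUH", n_benign=129, n_malignant=105, true_neg=104, false_neg=0),
    OperatingPoint("SFR", n_benign=66, n_malignant=46, true_neg=47, false_neg=0),
    OperatingPoint("SCH", n_benign=65, n_malignant=99, true_neg=34, false_neg=0),
]

for point in AT_THRESHOLD_001:
    print(f"{point.cohort}: sensitivity={point.sensitivity:.3f}, "
          f"specificity={point.specificity:.3f}, "
          f"IHC saved={100 * point.ihc_saved:.1f}%")
```

Because negative predictions include false negatives, the "IHC saved" figure at a less sensitive threshold also counts slides on which cancer would have been missed, which is why the savings shrink as the threshold is lowered.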

Pathologist review of false negative cases

At the unadjusted threshold of 0.50, false negative predictions were observed for a total of 22 WSIs across the SUH (9), SFR (3), and SCH (10) cohorts. Slide-level label data for ISUP grade and cancer length were available for SUH and SFR but not for SCH. In the SUH cohort, the nine false negatives had a mean cancer length of 2.8 mm (median: 1 mm, range: 0.2–11.0 mm) with the following ISUP distribution: ISUP 1: six slides, ISUP 4: one slide, and ISUP 5: two slides. For the SFR cohort, all three false negatives were ISUP 1 slides with cancer lengths of 2 mm, 4 mm, and 4 mm.

All 22 false negative WSIs were re-evaluated by the study pathologist (A.B.) in a blinded review. To maintain blinding, 12 additional external slides with a balanced distribution of all ISUP grades were included. The purpose of adding these slides was not to balance the dataset, but to mask the fact that the original cases were exclusively IHC-validated false negatives. The pathologist assessed only H&E-stained WSIs, providing a diagnosis for each case and indicating whether IHC would be required in a clinical setting. One WSI was diagnosed as benign, with no need for IHC. Sixteen WSIs were assigned non-definitive diagnoses of atypia (of uncertain significance) or suspicious for cancer (SFC), with IHC recommended for all. One WSI was diagnosed as ISUP 1 (3 + 3) cancer, which did not require IHC. Three WSIs were identified as high-grade cancers, including one ISUP 4 (4 + 4) and two ISUP 5 (5 + 5 and 5 + 4). Additionally, one WSI was deemed suspicious for ductal adenocarcinoma, necessitating IHC for a definitive diagnosis. Overall, IHC was recommended for 20 of the 22 cases.

Following this reassessment, a second pathologist (L.E.) participated in a review meeting where WSIs of IHC-stained slides, when available (SUH cohort), were presented alongside the WSIs of H&E-stained slides. L.E. is an experienced uropathology specialist and has been shown to be highly concordant with other specialists in earlier studies^6,18. The consensus was that most false negative WSIs contained only minimal foci with ambiguous morphology, indeed warranting further IHC investigation. After evaluating the IHC stains, the pathologists agreed that in the majority of cases, the findings still did not meet the qualitative and quantitative criteria for a definitive cancer diagnosis (Fig. 3).

**Fig. 3: False negative predictions with low-grade morphologies from the SUH cohort.**

In 18 of the 22 cases, the suspicious areas displayed low-grade morphology, with at most minimal ISUP 1 cancer. One case had a consensus diagnosis of probable ductal adenocarcinoma. The remaining three cases, classified as high-grade cancers, were also independently assessed by the second pathologist (L.E.) in a blinded review. Both pathologists confirmed ISUP grades consistent with the original reports. However, all three WSIs exhibited significant crush artifacts and tissue folds, partially obscuring cancer morphology. Notably, in these cases, the pathologists who made the original diagnoses had requested PSA immunostaining alongside basal-cell stains to confirm the prostatic origin of the malignancies. This finding aligns with the meeting consensus that these false negatives represented true high-grade cancers with atypical features for acinar carcinoma. Figure 4 provides representative images of the high-grade areas.

**Fig. 4: False negative predictions with high-grade morphologies from the SUH cohort.**

Importantly, the AI model can provide attention maps along with prostate cancer diagnosis predictions. Attention heatmaps for these false negative slides reveal that, although the model ultimately predicted them as benign, the AI correctly localized and highlighted suspicious areas within the tissue. One pathologist (A.B.) independently reviewed all false negative WSIs in full and annotated the regions deemed most suspicious for malignancy. These initial assessments were then discussed in a consensus meeting with a second pathologist (L.E.), during which the annotated regions were compared to both the corresponding IHC staining patterns (where available) and the AI model’s attention maps. This review confirmed a strong correspondence between the pathologist-identified areas of concern, IHC-confirmed regions, and the model’s high-attention zones. This suggests that even when an AI model does not flag a case for an IHC order, the attention heatmaps could serve as an additional layer of decision support, helping pathologists focus on diagnostically challenging regions.

Discussion

Our results show that the AI model retains high diagnostic performance even for morphologies deemed ambiguous by pathologists (i.e., slides where the pathologists required IHC to make the final diagnosis of benign vs. cancer). By thresholding predictions in a sensitivity-prioritized fashion, we demonstrate the potential of using AI as a decision-support system for deciding when IHC staining is truly necessary. Ordering IHC for every ambiguous WSI with a predicted cancer probability exceeding 1% (sensitivity-maximized threshold of 0.01) eliminated all false negatives (sensitivity = 1.0), while still substantially reducing IHC staining performed on benign slides (44.4%, 42.0%, and 20.7% total IHC reduction for cohorts SUH, SFR, and SCH, respectively). The observed differences in IHC reduction across cohorts can be partially explained by cohort composition, specifically the proportion of benign slides. The SUH and SFR cohorts included a higher percentage of benign cases (55.1% and 58.9%, respectively) compared to the SCH cohort (39.6%). Since our approach targets IHC savings exclusively for benign slides, a smaller overall impact in the SCH cohort is expected. However, even when considering only slides with a benign ground truth, the IHC savings vary substantially across sites (80.6% for SUH, 71.2% for SFR, and 52.3% for SCH), indicating that other factors contribute as well. We believe this variation likely reflects differences in institutional and individual IHC-ordering practices. For example, some sites may have a lower threshold for initiating IHC, applying it even for mildly suspicious morphologies, while others may reserve IHC for cases with more overt atypia. Such differences in diagnostic thresholds and practice patterns can meaningfully influence the potential for IHC reduction. The performance of the task-specific model was similar to that of the FMs, supporting the general applicability of different AI models for this purpose. The FMs exhibited slightly higher sensitivity compared to the task-specific model but at the cost of lower specificity, consistent with previous findings²⁹.
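The thresholding rule and the cohort-composition argument above can be illustrated with a minimal Python sketch. It is not the study's code: the function name `summarize_triage`, the `TriageSummary` fields, and the toy probabilities are hypothetical, and only the percentages in `REPORTED` are taken from the text. The sketch flags a slide for IHC when its predicted cancer probability exceeds the threshold and reports sensitivity alongside the total and benign-only IHC reduction.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TriageSummary:
    sensitivity: float           # cancer slides flagged for IHC / all cancer slides
    total_ihc_reduction: float   # slides not flagged / all slides
    benign_ihc_reduction: float  # benign slides not flagged / all benign slides
    benign_fraction: float       # benign slides / all slides


def summarize_triage(probs: Sequence[float], is_cancer: Sequence[bool],
                     threshold: float = 0.01) -> TriageSummary:
    """Order IHC for a slide when its predicted cancer probability exceeds `threshold`."""
    if len(probs) != len(is_cancer) or len(probs) == 0:
        raise ValueError("probs and is_cancer must be non-empty and equally long")
    n_total = len(probs)
    n_cancer = sum(1 for c in is_cancer if c)
    n_benign = n_total - n_cancer
    if n_cancer == 0 or n_benign == 0:
        raise ValueError("need at least one cancer slide and one benign slide")

    flagged = [p > threshold for p in probs]
    cancer_flagged = sum(1 for f, c in zip(flagged, is_cancer) if f and c)
    benign_spared = sum(1 for f, c in zip(flagged, is_cancer) if not f and not c)
    all_spared = sum(1 for f in flagged if not f)

    return TriageSummary(
        sensitivity=cancer_flagged / n_cancer,
        total_ihc_reduction=all_spared / n_total,
        benign_ihc_reduction=benign_spared / n_benign,
        benign_fraction=n_benign / n_total,
    )


# Hypothetical toy data: five benign slides followed by five cancer slides.
TOY_PROBS = [0.002, 0.004, 0.008, 0.030, 0.005, 0.200, 0.650, 0.970, 0.999, 0.400]
TOY_IS_CANCER = [False] * 5 + [True] * 5

# Percentages reported in the text, as fractions:
# (benign fraction, benign-only IHC reduction, total IHC reduction).
REPORTED = {
    "SUH": (0.551, 0.806, 0.444),
    "SFR": (0.589, 0.712, 0.420),
    "SCH": (0.396, 0.523, 0.207),
}

if __name__ == "__main__":
    print(summarize_triage(TOY_PROBS, TOY_IS_CANCER, threshold=0.01))
    for cohort, (benign_frac, benign_red, total_red) in REPORTED.items():
        # With sensitivity = 1.0 only benign slides are spared, so the total
        # reduction should equal benign fraction x benign-only reduction.
        print(cohort, round(benign_frac * benign_red, 3), "vs reported", total_red)
```

When no cancer slide falls below the threshold, the total reduction equals the benign fraction multiplied by the benign-only reduction. Applied to the reported figures, the products agree with the reported total reductions to within rounding, which is the sense in which cohort composition accounts for part of the difference between sites.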

Pathologists’ reassessment of the false negative WSIs observed at higher thresholds revealed that the vast majority of these slides contained only minimal foci of low-grade morphologies, warranting diagnoses of atypia or SFC rather than definitive malignant classification. Such diagnoses are applied when the morphological features are insufficient for a conclusive cancer diagnosis, yet there remains some degree of uncertainty, and malignancy cannot be ruled out. This category of indeterminate diagnoses also includes “atypical small acinar proliferation (ASAP)”, although the use of this term is discouraged by the International Society of Urological Pathology (ISUP)³⁶,³⁷.

One of the false negative predictions involved a case of ductal adenocarcinoma. While not excluded from the datasets, this is a cancer subtype our AI model is not specifically trained or validated to detect. Although this is the second most common subtype of prostate cancer after acinar adenocarcinoma, it remains rare, comprising only 0.17% of cases³⁸. Due to its low prevalence, acquiring sufficient training data for robust AI model development and validation remains a significant challenge. This case also highlights the broader issue of detecting and differentiating various intraductal proliferations such as high-grade prostatic intraepithelial neoplasia (HGPIN), atypical intraductal proliferation (AIP), and intraductal carcinoma (IDC). Our current model is not validated to distinguish these entities, and they fall outside the scope of this study. While IHC—particularly basal-cell markers—can aid in distinguishing IDC from invasive cribriform (Gleason pattern 4) or comedonecrotic (Gleason pattern 5) cancers, current guidelines recommend IHC primarily in cases lacking definitive invasive cancer, which are relatively rare (0.06–0.26% of biopsies)³⁹. Moreover, the diagnostic value of basal-cell IHC in differentiating HGPIN, AIP, and IDC is limited; such assessments continue to rely heavily on expert interpretation of H&E morphology, occasionally supplemented with non-basal-cell markers such as AMACR. As we do not currently report separate performance metrics for ductal adenocarcinoma or intraductal lesions, these limitations further underscore the importance of human oversight and the current role of AI as a decision-support tool rather than a stand-alone diagnostic system. Nevertheless, we aim to expand our dataset to include more such cases in future iterations of the model.

The three false negative cases representing high-grade cancers (one ISUP 4 and two ISUP 5) were confirmed as such by both study pathologists during reassessment, consistent with the original reports. Importantly, the infiltrative nature of these lesions was readily apparent, making it highly unlikely for these cancers to be missed in clinical practice. This emphasizes the role of AI models as diagnostic aids for pathologists, with the final decision remaining under human oversight. It is also worth noting that while the evaluation presented in this study was conducted on individual slides, several slides are typically assessed per prostate. As multiple WSIs are screened per patient, the probability of a false-negative cancer diagnosis is further reduced. Still, a pathologist’s assessment remains crucial to ensure accurate diagnoses when encountering technical artifacts or rare morphological variants that the AI model has not been sufficiently exposed to during development⁴⁰. While stain variation and tissue artifacts can affect AI performance, our primary validation study²⁹ demonstrated model robustness against stain variation across multiple fully external datasets. Additionally, several color calibration techniques have been proposed to further enhance model robustness to staining variability⁴¹. In contrast, tissue artifacts that obscure key morphological features remain a significant challenge. We have previously investigated this issue and plan to implement a conformal prediction framework to flag uncertain predictions—such as those arising from obscured or artifact-laden tissue—in future versions of the model⁴⁰. Regarding thresholding, the review process highlights the importance of prioritizing sensitivity. Although many of the false negative WSIs encountered at threshold 0.5 should likely be diagnosed as atypia/SFC rather than definitive malignancy, their need for IHC staining suggests that the baseline threshold of 0.5 is insufficient for this use case.
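The point about several slides being assessed per patient can be made concrete with a small sketch. It rests on a simplifying assumption that is not from the study: each cancer-containing slide of a patient is missed independently with the same per-slide sensitivity. Slides from one patient are likely correlated in practice, so the numbers are optimistic and purely illustrative. The function name and the example sensitivity of 0.9 are hypothetical.

```python
def patient_miss_probability(slide_sensitivity: float, n_cancer_slides: int) -> float:
    """Probability that every cancer-containing slide of one patient is called benign.

    ASSUMPTION (illustrative only): slides are missed independently of each
    other, each with probability 1 - slide_sensitivity.
    """
    if not 0.0 <= slide_sensitivity <= 1.0:
        raise ValueError("slide_sensitivity must lie in [0, 1]")
    if n_cancer_slides < 1:
        raise ValueError("n_cancer_slides must be at least 1")
    return (1.0 - slide_sensitivity) ** n_cancer_slides


if __name__ == "__main__":
    # Hypothetical per-slide sensitivity of 0.9, for illustration only.
    for k in (1, 2, 3):
        print(k, patient_miss_probability(0.9, k))
```

Under this assumption the chance of missing a patient's cancer entirely shrinks geometrically with the number of cancer-containing slides, which is the direction of the effect described above. The true magnitude depends on how strongly slide-level errors are correlated within a patient.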

A critical factor influencing the adoption of AI in pathology is whether pathologists will trust its predictions. Our proposed scenario, where the AI model would suggest omitting IHC for morphologies the pathologist perceives as ambiguous, is no exception: the thought of misclassifying cancer as benign due to AI advice is naturally frightening. However, we must understand that the pathologists’ uncertainty spans a continuous spectrum. Sometimes the cancer suspicion is very low, but using IHC validation as a safety net is an easy way of eliminating lingering doubt. The fear of missing malignancies, combined with the fact that most doctors are not involved in departmental financial governance, could explain overly cautious approaches, where pathologists prioritize ensuring accurate diagnoses over the institution’s financial considerations. This tendency is reflected in our data, with 55.1%, 58.9%, and 39.6% of IHC-validated WSIs ultimately yielding benign diagnoses in the SUH, SFR, and SCH cohorts, respectively. We believe that for cases where initial cancer suspicion is low, the addition of an AI model proven to be highly adept at correctly classifying cancer vs. benign tissue in similar situations could give pathologists the extra assurance needed to sign out benign samples without IHC validation. The potential impact of reducing IHC usage depends on institutional practices, which vary. However, our data suggest significant potential for reduction: IHC was requested for 56%, 58%, and 38% of patients in the SUH, SFR, and SCH cohorts, respectively, and for 20%, 22%, and 7% of all slides in those same cohorts. In the likely situation of early resistance from pathologists, trust could develop over time as they use the AI model and observe its consistent accuracy. Confidence may be gained by pathologists initially sticking to their individual IHC-ordering patterns while cross-verifying AI predictions with subsequent IHC results. This trust-building process would be essential for encouraging widespread acceptance and integration of AI in routine diagnostics.

To date, there are very few publications focusing on the utilization of AI models in IHC-related tasks within prostate pathology. A study by Chatrian et al.¹⁶ aimed to pre-order IHC for presumed difficult cases in order to save diagnostic time for the investigating pathologist, using an AI model trained on cases from routine diagnostics where IHC staining had been performed. That is, the aim of the AI model was to mimic the IHC requesting pattern of pathologists. Our work is fundamentally different as we aim to provide pathologists with an AI model that will modify these patterns, reducing the number of IHCs requested for truly benign cases where pathologists’ suspicion of cancer is low. While the approach of Chatrian et al. enhances workflow efficiency, it does not address the overuse of IHC in benign cases, which represents a tangible resource burden. By using a sensitivity-prioritized thresholding framework, our AI model offers a novel solution to this issue, allowing pathologists to confidently forgo IHC in benign cases while maintaining diagnostic accuracy for malignant cases.

Eloy et al. demonstrated how using an AI model in the evaluation of prostate biopsies reduced the reliance on IHC workup compared to the traditional diagnostic approach²⁷. The study design involved four pathologists assessing the same set of slides in two phases, with a washout period of a minimum of 2 weeks between assessments. All slides in the set were presumably difficult cases, having had IHC requested during the routine diagnostic process. Phase 1 involved assessment with no aid from AI, while in Phase 2, the AI model was introduced. Even though the results showed a reduction of pathologist IHC requests when slide assessment was assisted by AI, there are reasons to question the relevance and transferability of these findings to routine diagnostic practice. Firstly, the pathologists were aware that all slides in the set had IHC staining performed during primary diagnostics. Secondly, Phase 1 allowed pathologists to view IHC stains, given that they would have requested it in a diagnostic situation. Knowing that other pathologists had requested IHC, and having these stains readily available in a research setting without real-world consequences in terms of time or resources if choosing to look at them, risks introducing bias. Furthermore, giving the pathologists the option of seeing the IHC stains in Phase 1, i.e., letting them know the true nature of the tissue, could potentially have introduced bias in Phase 2.

Our study highlights the potential of a sensitivity-prioritized AI framework for reducing IHC use for benign prostate biopsies, alleviating resource burdens, reducing costs, and improving diagnostic efficiency in pathology laboratories. The AI model demonstrates state-of-the-art performance, maintaining high sensitivity and specificity even in challenging cases where pathologists traditionally rely on IHC. In a significant proportion of these cases, the AI model shows overwhelming confidence in its predictions, underscoring its potential to reduce IHC staining for benign slides even when thresholds are applied. However, it must be noted that the present analysis is limited to IHC data from three cohorts. While our previous large-scale validation study demonstrated that the AI model generalizes well across 12 external cohorts from 11 countries, those evaluations did not specifically focus on diagnostically challenging cases requiring IHC. This highlights the need for further validation in broader and more diverse clinical settings, particularly for difficult cases. By standardizing decision-making across pathologists with varying experience levels, AI has the potential to mitigate subjectivity in IHC usage and enhance diagnostic consistency.

Integration of AI into clinical workflows requires careful consideration of laboratory protocols, workflow dynamics, and user interactions, and prospective studies in real-world settings will be crucial for validating the clinical and economic benefits suggested by our retrospective analysis. While our study focuses on diagnostic performance and potential IHC savings, broader implementation of AI in pathology will also require careful consideration of integration costs. This includes infrastructure requirements, image processing times, IT maintenance, and long-term operating expenses. We advocate for future cost-effectiveness studies that comprehensively evaluate these factors alongside diagnostic benefits. Such analyses will be critical in assessing the real-world value of AI-assisted pathology workflows, and we plan to pursue this in collaboration with health economics experts. Ultimately, AI-driven pathology represents a transformative opportunity to improve diagnostic precision, streamline workflows, and optimize resource utilization, contributing to better patient outcomes worldwide.

Data availability

All relevant data are available upon request, but cannot be shared publicly. For any requests to access these sources, inquiries should be directed to M.E. at Karolinska Institutet (martin.eklund@ki.se). Requests will be evaluated on a case-by-case basis, with approvals granted if they comply with data privacy regulations and intellectual property policies. A subset of the data used for model training (STHLM3 and RUMC cohorts) is available for non-commercial purposes, subject to a CC BY-SA-NC 4.0 license as part of the PANDA challenge dataset and is freely downloadable after registration at https://www.kaggle.com/c/prostate-cancer-grade-assessment. Source data are provided with this paper (Supplementary Data Set 1).

Code availability

No significant custom code was developed for this study. The AI models were implemented as described in the previous study²⁹. The PyTorch library (https://github.com/pytorch/pytorch) was used for obtaining model predictions. For the foundation models, we used the publicly available models from https://huggingface.co/paige-ai/Virchow2 and https://huggingface.co/MahmoodLab/UNI.

References

1. Epstein, J. I., Allsbrook, W. C., Jr, Amin, M. B., Egevad, L. L. & ISUP Grading Committee. The 2005 International Society of Urological Pathology (ISUP) Consensus Conference on Gleason Grading of Prostatic Carcinoma. Am. J. Surg. Pathol. 29, 1228–1242 (2005).
2. Gleason, D. F. Histologic grading of prostate cancer: a perspective. Hum. Pathol. 23, 273–279 (1992).
3. Egevad, L., Granfors, T., Karlberg, L., Bergh, A. & Stattin, P. Prognostic value of the Gleason score in prostate cancer. BJU Int. 89, 538–542 (2002).
4. Ozkan, T. A. et al. Interobserver variability in Gleason histological grading of prostate cancer. Scand. J. Urol. 50, 420–424 (2016).
5. Melia, J. et al. A UK-based investigation of inter- and intra-observer reproducibility of Gleason grading of prostatic biopsies. Histopathology 48, 644–654 (2006).
6. Ström, P. et al. Artificial intelligence for diagnosis and grading of prostate cancer in biopsies: a population-based, diagnostic study. Lancet Oncol. 21, 222–232 (2020).
7. Egevad, L. et al. Standardization of Gleason grading among 337 European pathologists: Gleason grading in Europe. Histopathology 62, 247–256 (2013).
8. Epstein, J. I. et al. The 2014 international society of urological pathology (ISUP) consensus conference on Gleason grading of prostatic carcinoma: Definition of grading patterns and proposal for a new grading system. Am. J. Surg. Pathol. 40, 244–252 (2016).
9. Epstein, J. I., Egevad, L., Humphrey, P. A. & Montironi, R. & Members of the ISUP Immunohistochemistry in Diagnostic Urologic Pathology Group. Best practices recommendations in the application of immunohistochemistry in the prostate: report from the International Society of Urologic Pathology consensus conference. Am. J. Surg. Pathol. 38, e6–e19 (2014).
10. Magi-Galluzzi, C. Prostate cancer: diagnostic criteria and role of immunohistochemistry. Mod. Pathol. 31, S12–S21 (2018).
11. Varma, M. & Jasani, B. Diagnostic utility of immunohistochemistry in morphologically difficult prostate cancer: review of current literature. Histopathology 47, 1–16 (2005).
12. Paner, G. P., Luthringer, D. J. & Amin, M. B. Best practice in diagnostic immunohistochemistry: Prostate carcinoma and its mimics in needle core biopsies. Arch. Pathol. Lab. Med. 132, 1388–1396 (2008).
13. Al Diffalha, S. et al. Immunohistochemistry in the workup of prostate biopsies: Frequency, variation and appropriateness of use among pathologists practicing at an academic center. Ann. Diagn. Pathol. 27, 34–42 (2017).
14. Mandel, P. et al. Immunohistochemistry for prostate biopsy-impact on histological prostate cancer diagnoses and clinical decision making. Curr. Oncol. 28, 2123–2133 (2021).
15. Watson, K., Wang, C., Yilmaz, A., Bismar, T. A. & Trpkov, K. Use of immunohistochemistry in routine workup of prostate needle biopsies: a tertiary academic institution experience. Arch. Pathol. Lab. Med. 137, 541–545 (2013).
16. Chatrian, A. et al. Artificial intelligence for advance requesting of immunohistochemistry in diagnostically uncertain prostate biopsies. Mod. Pathol. 34, 1780–1794 (2021).
17. Salto-Tellez, M., Maxwell, P. & Hamilton, P. Artificial intelligence-the third revolution in pathology. Histopathology 74, 372–376 (2019).
18. Bulten, W. et al. Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge. Nat. Med. 28, 154–163 (2022).
19. Bulten, W. et al. Automated deep-learning system for Gleason grading of prostate cancer using biopsies: a diagnostic study. Lancet Oncol. 21, 233–241 (2020).
20. Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).
21. Esteva, A. et al. Prostate cancer therapy personalization via multi-modal deep learning on randomized phase III clinical trials. NPJ Digit. Med. 5, 71 (2022).
22. Wulczyn, E. et al. Predicting prostate cancer specific-mortality with artificial intelligence-based Gleason grading. Commun. Med. 1, 10 (2021).
23. Vorontsov, E. et al. A foundation model for clinical-grade computational pathology and rare cancers detection. Nat. Med. 30, 2924–2935 (2024).
24. Xu, H. et al. A whole-slide foundation model for digital pathology from real-world data. Nature https://doi.org/10.1038/s41586-024-07441-w (2024).
25. Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 30, 850–862 (2024).
26. Lu, M. Y. et al. A visual-language foundation model for computational pathology. Nat. Med. 30, 863–874 (2024).
27. Eloy, C. et al. Artificial intelligence-assisted cancer diagnosis improves the efficiency of pathologists in prostatic biopsies. Virchows Arch. 482, 595–604 (2023).
28. Bulten, W. et al. Epithelium segmentation using deep learning in H&E-stained prostate specimens with immunohistochemistry as reference standard. Sci. Rep. 9, 864 (2019).
29. Mulliqi, N. et al. Foundation models—a panacea for artificial intelligence in pathology? arXiv https://arxiv.org/abs/2502.21264 (2025).
30. Mulliqi, N. et al. Development and retrospective validation of an artificial intelligence system for diagnostic assessment of prostate biopsies: study protocol. BMJ Open 15, e097591 (2025).
31. Xie, S., Girshick, R., Dollár, P., Tu, Z. & He, K. Aggregated residual transformations for deep neural networks. arXiv https://arxiv.org/abs/1611.05431 (2016).
32. Tan, M. & Le, Q. V. EfficientNet: rethinking model scaling for convolutional neural networks. arXiv https://arxiv.org/abs/1905.11946 (2019).
33. Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. arXiv https://arxiv.org/abs/1711.05101 (2017).
34. Zimmermann, E. et al. Virchow2: scaling self-supervised mixed magnification models in pathology. arXiv https://arxiv.org/abs/2408.00738 (2024).
35. Bankhead, P. et al. QuPath: Open source software for digital pathology image analysis. Sci. Rep. 7, 16878 (2017).
36. Egevad, L., Allsbrook, W. C. & Epstein, J. I. Current practice of diagnosis and reporting of prostatic intraepithelial neoplasia and glandular atypia among genitourinary pathologists. Mod. Pathol. 19, 180–185 (2006).
37. Amin, M. et al. Prognostic and predictive factors and reporting of prostate carcinoma in prostate needle biopsy specimens. Scand. J. Urol. Nephrol. Suppl. 39, 20–33 (2005).
38. Ranasinha, N. et al. Ductal adenocarcinoma of the prostate: a systematic review and meta-analysis of incidence, presentation, prognosis, and management. BJUI Compass 2, 13–23 (2021).
39. van Leenders, G. J. L. H. et al. The 2019 International Society of Urological Pathology (ISUP) consensus conference on grading of prostatic carcinoma. Am. J. Surg. Pathol. 44, e87–e99 (2020).
40. Olsson, H. et al. Estimating diagnostic uncertainty in artificial intelligence assisted pathology using conformal prediction. Nat. Commun. 13, 7761 (2022).
41. Ji, X. et al. Physical color calibration of digital pathology scanners for robust artificial intelligence-assisted cancer diagnosis. Mod. Pathol. 38, 100715 (2025).

Acknowledgements

A.B. received a grant from the Health Faculty at the University of Stavanger, Norway. M.E. received funding from the Swedish Research Council, Swedish Cancer Society, Swedish Prostate Cancer Society, Nordic Cancer Union, Karolinska Institutet, and Region Stockholm. K.K. received funding from the SciLifeLab & Wallenberg Data Driven Life Science Program (KAW 2024.0159), David and Astrid Hägelen Foundation, Instrumentarium Science Foundation, KAUTE Foundation, Karolinska Institute Research Foundation, Orion Research Foundation, and Oskar Huttunen Foundation. We thank Silja Kavlie Fykse and Desmond Mfua Abono for scanning in Stavanger. We would like to acknowledge the patients who contributed the clinical information that made this study possible. Computations were enabled by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) and the Swedish National Infrastructure for Computing (SNIC) at C3SE, partially funded by the Swedish Research Council through grant agreement no. 2022-06725 and no. 2018-05973, and by the supercomputing resource Berzelius provided by the National Supercomputer Centre at Linköping University and the Knut and Alice Wallenberg Foundation.

Funding

Open access funding provided by Karolinska Institute.

Author information

These authors contributed equally: Anders Blilie, Nita Mulliqi.

Authors and Affiliations

Department of Pathology, Stavanger University Hospital, Stavanger, Norway
Anders Blilie, Einar Gudlaugsson & Emiel A. M. Janssen
Faculty of Health Sciences, University of Stavanger, Stavanger, Norway
Anders Blilie
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
Nita Mulliqi, Xiaoyi Ji, Kelvin Szolnoky, Sol Erika Boman, Matteo Titus, Geraldine Martinez Gonzalez & Martin Eklund
Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
Sol Erika Boman
Department of Pathology, Synlab, Madrid, Spain
José Asenjo
Department of Pathology, Synlab, Brescia, Italy
Marcello Gambacorta & Paolo Libretti
The General Practice and Care Coordination Research Group, Stavanger University Hospital, Stavanger, Norway
Svein R. Kjosavik
Department of Global Public Health and Primary Care, Faculty of Medicine, University of Bergen, Bergen, Norway
Svein R. Kjosavik
Department of Oncology and Pathology, Karolinska Institutet, Stockholm, Sweden
Lars Egevad
Faculty of Science and Technology, University of Stavanger, Stavanger, Norway
Emiel A. M. Janssen
Institute for Biomedicine and Glycomics, Griffith University, Queensland, Australia
Emiel A. M. Janssen
Department of Medical Epidemiology and Biostatistics, SciLifeLab, Karolinska Institutet, Stockholm, Sweden
Kimmo Kartasalo

Authors

Anders Blilie
Nita Mulliqi
Xiaoyi Ji
Kelvin Szolnoky
Sol Erika Boman
Matteo Titus
Geraldine Martinez Gonzalez
José Asenjo
Marcello Gambacorta
Paolo Libretti
Einar Gudlaugsson
Svein R. Kjosavik
Lars Egevad
Emiel A. M. Janssen
Martin Eklund
Kimmo Kartasalo

Contributions

A.B., M.T., G.M.G., J.A., M.G., P.L., E.G., S.R.K., and E.A.M.J. collected, assessed, and curated clinical datasets. A.B., N.M., X.J., K.S., S.E.B., M.T., and K.K. contributed to the digitization, pre-processing, and management of WSI data. N.M., X.J., K.S., S.E.B., and K.K. developed the AI models. A.B. and N.M. conducted the statistical analyses. A.B. and L.E. conducted an in-depth review of false negative cases. A.B., N.M., X.J., S.E.B., L.E., E.A.M.J., M.E., and K.K. analyzed and interpreted the study results. A.B., M.E., and K.K. acquired funding. K.K. conceived of the study and takes responsibility for its integrity and accuracy. A.B., N.M., M.E., and K.K. drafted the manuscript. All authors reviewed, edited, and approved the manuscript.

Corresponding author

Correspondence to Kimmo Kartasalo.

Ethics declarations

Competing interests

N.M., L.E., K.K., and M.E. are shareholders of Clinsight AB. All other authors declare no competing interests.

Peer review

Peer review information

Communications Medicine thanks Mónica Curado and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. [A peer review file is available].

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Transparent Peer Review file

Supplemental material

Description of Additional Supplementary Files

Supplementary Data Set 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Blilie, A., Mulliqi, N., Ji, X. et al. Artificial intelligence-assisted prostate cancer diagnosis for reduced use of immunohistochemistry. Commun Med 5, 425 (2025). https://doi.org/10.1038/s43856-025-01185-y

Received: 24 April 2025
Accepted: 02 October 2025
Published: 15 October 2025
Version of record: 15 October 2025
DOI: https://doi.org/10.1038/s43856-025-01185-y