Table 1 Sample CTA algorithms from the published literature.

From: Report on computational assessment of Tumor Infiltrating Lymphocytes from the International Immuno-Oncology Biomarker Working Group

| Stain | Approach | Ref | Data set | Method | Ground truth | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| H&E | Patch classification | 24 | Multiple sites; TCGA data set | CNN | Labeled patches (yes/no TILs); annotations are open-access | Strengths: large-scale study with investigation of spatial TIL maps. AV includes molecular correlates. Limitations: does not distinguish sTIL and iTIL; does not classify individual TILs*. Other: we defined CTA TIL score as fraction of patches that contain TILs, and found this to be correlated with VTA (R = 0.659, p = 2e-35). |
| H&E | Semantic segmentation | 16 | Breast; TCGA data set | FCN | Traced region boundaries (exhaustive); annotations are open-access | Strengths: large sample size and regions; investigates inter-rater variability at different experience levels; delineation of tumor, stroma and necrosis regions. Limitations: only detects dense TIL infiltrates*; does not classify individual TILs*. |
| H&E | Semantic segmentation + Object detection | 25 | Breast; private data set | Seeding + FCN | Traced region boundaries (exhaustive); labeled & segmented nuclei within labeled region | Strengths: mostly follows TIL-WG VTA guidelines. AV includes correlation with consensus VTA scores and inter-pathologist variability. Limitations: heavy ground truth requirement*; underpowered CV; and limited manually annotated slides. |
| H&E | Object detection | 26 | Breast; METABRIC data set | SVM using morphology features | Labeled nuclei; qualitative density scores | Strengths: robust analysis and exploration of molecular TIL correlates. Limitations: individual labeled nuclei are limited; does not distinguish TILs in different histologic regions*. |
| H&E | Object detection | 27 | Breast; private data set | RG and MRF | Labeled patches (low-medium-high density) | Strengths: explainable model and modular pipeline. Limitations: does not distinguish sTIL and iTIL; does not classify individual TILs. Limited AV sample size. |
| H&E | Object detection | 28 | NSCLC; private data sets | Watershed + SVM classifier | Labeled nuclei | Strengths: explainable model; robust CV; captures spatial TIL clustering. Limitations: limited AV; does not distinguish sTIL and iTIL. |
| H&E | Object detection + inferred TIL localization | 31 | Breast; METABRIC + private data sets | SVM classifier using morphology features | Labeled nuclei; qualitative density scores | Strengths: infers TIL localization using spatial localization. Robust CV. Investigation of spatial TIL patterns. Limitations: individual labeled nuclei are limited. Not clear if spatial clustering has 1:1 correspondence with regions. |
| IHC | Object detection + manual regions | 29 | Colon; private data set | Complex pipeline (non-DL) | Overall density estimates | Strengths: CTA within manual regions, including invasive margin. Limitations: unpublished AV. |
| IHC | Object detection | 30 | Multiple; private data set | Multiple DL pipelines | Labeled nuclei within FOV (exhaustive) | Strengths: large-scale, robust AV. Systematic benchmarking. Limitations: no CV; does not distinguish TILs in different regions*. |

  1. This non-exhaustive list has been restricted to H&E and chromogenic IHC, although excellent works exist showing CTA based on other approaches such as multiplexed immunofluorescence (refs. 21, 22, 23). Published CTA algorithms vary markedly in their approach to TIL scoring, the robustness of their validation, their interpretability, and their consistency with published VTA guidelines. Strengths and limitations of each publication are highlighted; general limitations (related to the broad approach used, not the specific paper) are marked with an asterisk (*). Going forward, nuanced approaches are needed, ideally incorporating workflows for robust quantification and validation as presented in this paper. Different approaches have different ground truth requirements (illustrated in Fig. 1, panel f), hence the need for large-scale ground truth data sets. We encourage all future CTA publications to make their data sets open-access whenever possible. Of note are two major efforts: (1) a group of scientists, including the US FDA and the TIL-WG, is collaborating to crowdsource pathologists and collect images and pathologist annotations that can be qualified by the FDA medical device development tool program; (2) the TIL-WG is organizing a challenge to validate CTA algorithms against clinical trial outcome data (CV).
  2. AV, analytical validation; CNN, convolutional neural network; DL, deep learning; FCN, fully convolutional network; FOV, field of view; MRF, Markov random field; NSCLC, non-small cell lung cancer; RG, region growing; SVM, support vector machine.
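
The note for ref. 24 in Table 1 defines a CTA TIL score as the fraction of patches that contain TILs and reports its correlation with VTA. The following is a minimal sketch of that calculation, not the authors' code. The function names, slide identifiers, patch labels and VTA values are all hypothetical, and Pearson's r is assumed purely for illustration because the table does not state which correlation coefficient was used. The p-value is not computed here.

```python
from math import sqrt


def cta_til_score(patch_labels):
    """Fraction of patches on one slide labeled as containing TILs (1 = yes, 0 = no)."""
    if not patch_labels:
        raise ValueError("slide has no patches")
    return sum(1 for label in patch_labels if label) / len(patch_labels)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: per-slide patch predictions (1 = patch contains TILs)
# and a made-up visual (VTA) score in percent for the same slides.
patch_predictions = {
    "slide_a": [1, 0, 0, 1],
    "slide_b": [0, 0, 0, 1],
    "slide_c": [1, 1, 1, 0],
}
vta_scores = {"slide_a": 40.0, "slide_b": 10.0, "slide_c": 70.0}

slide_ids = sorted(patch_predictions)
cta_scores = [cta_til_score(patch_predictions[s]) for s in slide_ids]
vta_values = [vta_scores[s] for s in slide_ids]
r = pearson_r(cta_scores, vta_values)
print(f"CTA scores: {cta_scores}, Pearson r vs VTA: {r:.3f}")
```

Because the score is a simple patch fraction, it inherits the limitations listed for the patch classification approach: it cannot separate sTIL from iTIL and does not count individual cells.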