Fig. 1: LATTICe-M dataset overview and HPL pipeline for mesothelioma analysis.

a Overview of the clinical information in the LATTICe-M dataset, highlighting key demographic and pathological variables. The forest plot displays log hazard ratios (centre) with 95% confidence intervals (error bars), derived from a Cox proportional hazards model for clinical variables including age, subtype, and TNM stage. Kaplan-Meier survival curves are stratified by mesothelioma subtype, with time displayed in months. Shaded areas represent 95% confidence intervals around each survival curve. Risk groups were defined by the median predicted hazard from the Cox model. The overview plots are based on n = 512 patients (biological replicates), each representing an independent clinical record. The unit of study is the patient, and no technical replicates were used. Other clinical variables are presented with their corresponding frequencies and percentages in two summary tables. b HPL pipeline workflow: each WSI is divided into 224 × 224 pixel tiles. After data augmentation, these tiles serve as input to the Barlow Twins self-supervised learning model. Once the model is trained, the ResNet backbone generates a 128-dimensional feature vector per tile, representing prominent histopathological features. These vectors are then grouped using the Leiden community detection algorithm to identify morphologically distinct patterns. At the patient level, the proportion of each HPC within each WSI is quantified. This quantitative information is subsequently used to predict mesothelioma subtypes and patient outcomes. Source data are provided as a Source Data file.
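
The final quantification step in panel b, turning tile-level cluster assignments into a per-WSI vector of HPC proportions, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `hpc_proportions`, the slide identifiers, and the cluster labels below are hypothetical, and the actual pipeline may apply further transformations before modelling.

```python
from collections import Counter


def hpc_proportions(tile_clusters, n_clusters):
    """Return the fraction of a slide's tiles assigned to each HPC.

    tile_clusters: list of integer cluster labels, one per tile (assumed input format).
    n_clusters: total number of HPCs, so absent clusters still get a 0.0 entry.
    """
    if not tile_clusters:
        raise ValueError("slide has no tiles")
    counts = Counter(tile_clusters)
    total = len(tile_clusters)
    return [counts.get(k, 0) / total for k in range(n_clusters)]


# Hypothetical tile-level cluster labels for two slides (illustrative values only).
slides = {
    "slide_A": [0, 0, 1, 2, 2, 2, 2, 0],
    "slide_B": [1, 1, 1, 3],
}

# One fixed-length feature vector per slide; each vector sums to 1.
features = {sid: hpc_proportions(labels, n_clusters=4) for sid, labels in slides.items()}
```

Each resulting vector has one entry per HPC and sums to 1, so slides with different tile counts become directly comparable inputs for a downstream subtype classifier or survival model.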