Fig. 1: Examples from the four evaluation datasets (LIDC-IDRI, LNDb, MosMedData, and NSCLC-Radiomics).

The significant visual differences in lesion appearance, ranging from small, well-defined nodules to large, complex tumors and diffuse infectious lesions, highlight the domain gap across the datasets, providing a robust benchmark for evaluating the model’s cross-domain generalization performance.