Abstract
Thyroid cancer (THCA) shows rising incidence and aggressiveness in young-onset cases. Through integrated single-cell, transcriptomic, and proteomic analyses, we identified age-specific tumor-associated macrophage (TAM) genes, including OLR1 and SIGLEC1, linked to metastasis and immune dysfunction. These TAM biomarkers were validated at the protein level, highlighting their prognostic and therapeutic potential. Our findings reveal key TAM-driven mechanisms in young-onset THCA progression, warranting further clinical investigation.
Introduction
Thyroid cancer (THCA) incidence has risen in recent decades, notably among adolescents and young adults1. Young-onset THCA exhibits distinct molecular and clinical features versus older patients, often presenting with aggressive, advanced-stage disease2. The tumor microenvironment (TME), particularly tumor-associated macrophages (TAMs), critically influences tumor progression and therapy resistance3. In thyroid cancer, high TAM density correlates with lymph node metastasis and poor survival4. However, TAM roles in young-onset THCA remain uncharacterized. This study integrates single-cell RNA sequencing (scRNA-seq) data from young-onset THCA to identify TAM-specific markers, validated across RNA-seq and proteomic datasets. We aim to elucidate TAM-related biomarkers in young-onset THCA aggressiveness, informing targeted therapeutic strategies.
We accessed and re-analyzed a published scRNA-seq dataset (GSE193581) comprising 38,224 cells (Fig. 1A, B). Nine major cell types were identified using canonical markers (Fig. 1C). Macrophage abundance was significantly increased in the Young group (Fig. 1D). Differential expression analysis identified 47 differentially expressed genes (DEGs) with age- and macrophage-specific changes, linked to immune regulation and inflammatory signaling (Fig. 1E, F). Among the 47 DEGs, 45 were protein-coding genes. We examined the expression of these 45 DEGs across age groups and pathological subtypes and found that age contributes substantially to their expression variability; however, because age group and subtype are highly collinear, these estimates should be interpreted with caution (Supplementary Fig. 1). The 45 DEGs were further validated in the GSE153659 dataset, showing higher expression in the Young group (Fig. 1G), with eight genes exhibiting significant differential expression between age groups (Fig. 1H). A similar trend was observed in the GSE53157 dataset (Fig. 1I), in which ten genes showed significant expression differences (Fig. 1J). This trend persisted after stratifying by pathological subtypes (Supplementary Fig. 2). Notably, OLR1 and SIGLEC1 were significantly upregulated in the Young group in both datasets, suggesting relevance to age-related changes in TAMs.
A UMAP visualization of 9 major cell types in the GSE193581 dataset. B UMAP visualization of two age groups. C Dotplot of marker genes. D Box plot of the cell type percentage in two age groups. All boxes are centered at the median and bounded by the first (Q1) and third (Q3) quartiles. Upper whiskers extend to the smaller of the maximum and Q3 + 1.5 × IQR, and lower whiskers extend to the larger of the minimum and Q1 − 1.5 × IQR. E Filter criteria to obtain age-specific and macrophage-specific genes. F Bar plot exhibiting enrichment results of upregulated and downregulated genes on GOBP terms. Bar length represents the −log10(q-value) of enrichment, reflecting the statistical significance of the enrichment. G Gene expression heatmap in the Young and Old groups from GSE153659, based on Z-scores of normalized gene expression levels. H Boxplot of significant DEGs in GSE153659, based on FPKM values. I Gene expression heatmap in the Young and Old groups from GSE53157, based on Z-scores of normalized gene expression levels. J Boxplot of significant DEGs in GSE53157, based on signal intensity. Asterisks indicate the level of statistical significance: ns, non-significant, *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001.
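The whisker rule stated for panel D can be made concrete with a short sketch. It is illustrative only: the function name and the quartile convention (`method="inclusive"`) are assumptions, and plotting libraries may compute quartiles slightly differently.

```python
import statistics


def box_stats(values):
    """Return the median, quartiles, and whisker ends for one box.

    Whiskers follow the rule in the caption: the upper whisker is the
    smaller of the maximum and Q3 + 1.5 * IQR, and the lower whisker is
    the larger of the minimum and Q1 - 1.5 * IQR.
    """
    data = sorted(values)
    # Quartile convention is an assumption; other conventions exist.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_whisker": min(max(data), q3 + 1.5 * iqr),
        "lower_whisker": max(min(data), q1 - 1.5 * iqr),
    }
```

With this rule, a single extreme value does not stretch the whisker beyond the 1.5 × IQR fence; it would be drawn as an outlier instead.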
The 45 DEGs were then validated in The Cancer Genome Atlas (TCGA)-THCA dataset to examine the association with age, lymph node metastasis and tumor staging. Although the differences between age groups were not highly significant (Supplementary Fig. 3), in the metastasis subgroup comparison, 21 genes showed significant differential expression, with 14 upregulated and 7 downregulated in the metastasis group (Fig. 2A), and the top 8 upregulated genes were visualized (Fig. 2B). Additionally, 18 DEGs differed in T staging, with 11 upregulated and 7 downregulated in the T3_4 group (Fig. 2C), and the top 8 upregulated genes were visualized (Fig. 2D). To evaluate the relationship between these 45 DEGs and immune status, we assessed their T-cell dysfunction scores using the TIDE tool, where a higher score indicates that samples with high expression tend to be enriched in the T-cell dysfunction phenotype. KCNMA1, SIGLEC1, and FOLR2 exhibited the highest dysfunction scores (Fig. 2E). Specifically, in the TCGA Melanoma dataset, these genes were grouped by High and Low expression, with survival analysis showing opposite trends between the cytotoxic T cell (CTL) Top and Bottom groups for KCNMA1 (Fig. 2F), SIGLEC1 (Fig. 2G), and FOLR2 (Fig. 2H). In other datasets, KCNMA1 and OLR1 (TCGA Endometrial; Fig. 2I-J) and P2RY13 (METABRIC; Fig. 2K) also showed significant dysfunction. Pearson correlation analysis revealed significant correlations with CTL levels (Fig. 2L), and survival risk scores were consistent across five datasets (Fig. 2M).
A DEGs associated with lymph node metastasis. B Significant DEGs between N0 and N1 groups, represented by gene read count. C DEGs associated with tumor T staging. D Significant DEGs between T1_2 and T3_4 groups, represented by gene read count. E Heatmap of T cell dysfunction scores for the 45 genes across five datasets. F Kaplan–Meier (K–M) curves for KCNMA1 gene High and Low expression groups with different CTL proportions in the TCGA Melanoma dataset. G K–M curves for SIGLEC1 gene High and Low expression groups with different CTL proportions in the TCGA Melanoma dataset. H K–M curves for FOLR2 gene High and Low expression groups with different CTL proportions in the TCGA Melanoma dataset. I K–M curves for KCNMA1 gene High and Low expression groups with different CTL proportions in the TCGA Endometrial dataset. J K–M curves for OLR1 gene High and Low expression groups with different CTL proportions in the TCGA Endometrial dataset. K K–M curves for P2RY13 gene High and Low expression groups with different CTL proportions in the METABRIC dataset. L Pearson correlation coefficient (Pearson’s r) heatmap between the 45 genes and CTL levels across five datasets. M Survival risk scores heatmap for the 45 genes across five datasets. Asterisks indicate the level of statistical significance: ns non-significant, *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001.
The 45 DEGs were further validated at the proteomic level. The coefficient of variation across the 12 pooled samples mainly ranged from 0 to 0.2 (Supplementary Fig. 4A), with a similar distribution after missing value imputation (Supplementary Fig. 4B). Principal component analysis after batch effect correction with ComBat showed no apparent batch effects (Supplementary Fig. 4C). A total of 412 differentially expressed proteins were shared between the pediatric malignant (PM) vs. pediatric benign (PB) and PM vs. adult malignant (AM) comparisons (Supplementary Fig. 4D). Four and two of the 45 selected genes overlapped with the differentially expressed proteins of the PM vs. PB and PM vs. AM comparisons, respectively (Supplementary Fig. 4E): ALOX5, IL4I1, MNDA, and HPGDS (upregulated in PM vs. PB), MERTK (upregulated in PM vs. AM), and OLFML3 (downregulated in PM vs. AM). However, OLR1 was not detected in this dataset, and SIGLEC1 did not differ significantly between groups.
Our analysis revealed increased macrophage infiltration in young-onset THCA patients and identified 45 TAM-associated DEGs with age-specific expression patterns, validated across multiple datasets. These DEGs were significantly associated with mast cell-mediated immunity and activation, upregulation of vascular endothelial growth factor production, nitric oxide biosynthesis, and calcium-mediated signaling, which are potential mechanisms for tumor dissemination5,6,7,8. Two markers, OLR1 and SIGLEC1, showed consistent upregulation, aligning with our previous findings. Comprehensive transcriptomic analysis and protein staining confirmed OLR1’s specific expression on TAMs within the TME, with OLR1 levels serving as a reliable biomarker for macrophage infiltration and correlating with poor clinical outcomes in head and neck squamous cell carcinoma9. Our prior research also highlighted the widespread presence of Siglec family members in TAMs across various cancers, suggesting a role in tumor progression through immunomodulation and TAM polarization within the TME10. The co-expression of OLR1 and SIGLEC1 suggests synergistic TAM polarization toward immunosuppressive phenotypes. Importantly, high marker expression reversed the protective association of CTL infiltration with survival, indicating TAM-mediated impairment of cytotoxic T cells through direct suppression or checkpoint modulation11,12. This is consistent with the view that interactions between tumor cells and immune cells in the TME significantly influence tumor progression13. Because TIDE does not include a thyroid cancer dataset, we could not validate this hypothesis directly in thyroid cancer samples.
In conclusion, our study highlights the critical role of TAMs in the prognosis of young-onset THCA. The identified TAM-specific genes serve as important biomarkers for tumor metastasis, staging, and immune dysfunction, as well as potential therapeutic targets. Further prospective cohort studies and experimental validation are warranted to confirm these findings and explore their clinical implications.
Methods
Thyroid cancer scRNA-seq data analysis
The single-cell RNA sequencing (scRNA-seq) dataset was obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE193581 (ref. 14). After excluding normal thyroid tissues, cell line data, and tumor samples treated with drugs, seven anaplastic thyroid cancer (ATC) samples and five papillary thyroid cancer (PTC) samples were selected for analysis. The raw gene expression matrix and corresponding metadata were processed and analyzed using Seurat (v5)15 in R (v4.4.2).
The gene expression matrix was first normalized using the normalize_total and log1p functions. Highly variable genes were identified using the highly_variable_genes function and selected for downstream analysis. Principal component analysis (PCA) and Harmony were applied for dimensionality reduction and batch effect correction, with the parameters “batch_key = ‘SampleID’, n_pcs = 20”. Uniform Manifold Approximation and Projection (UMAP) was then used for visualization.
Samples were divided into two age groups using an age threshold of 35 years: the Old group (n = 8) and the Young group (n = 4). To identify differentially expressed genes (DEGs) in macrophages, we used the FindMarkers function with the parameters “ident.1 = young, ident.2 = old, logfc.threshold = 0.5” and an adjusted p-value threshold of p.adj < 0.01. Finally, Gene Ontology (GO) enrichment analysis was performed using the clusterProfiler (v4.12.6) package16 in R.
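The grouping and filtering thresholds above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the record type, function names, and gene labels are hypothetical, and the assignment of exactly 35 years to the Young group is an assumption borrowed from the bulk RNA-seq cohort definition (Young ≤35 years).

```python
from dataclasses import dataclass

AGE_THRESHOLD_YEARS = 35  # assumption: age <= 35 counts as Young


def age_group(age_years):
    """Assign a sample to the Young or Old group by the age threshold."""
    return "Young" if age_years <= AGE_THRESHOLD_YEARS else "Old"


@dataclass
class MarkerResult:
    """One row of a differential expression table (hypothetical layout)."""
    gene: str
    avg_log2fc: float  # Young vs Old, log2 scale
    p_adj: float       # multiple-testing adjusted p-value


def filter_degs(results, min_abs_log2fc=0.5, max_p_adj=0.01):
    """Keep genes passing both the fold-change and adjusted p-value cutoffs."""
    return [
        r for r in results
        if abs(r.avg_log2fc) >= min_abs_log2fc and r.p_adj < max_p_adj
    ]


# Placeholder values for illustration only; these are not study results.
example = [
    MarkerResult("GENE_A", 1.20, 0.0004),
    MarkerResult("GENE_B", 0.30, 0.0001),   # fails the fold-change cutoff
    MarkerResult("GENE_C", -0.80, 0.0300),  # fails the adjusted p-value cutoff
    MarkerResult("GENE_D", -0.90, 0.0020),
]
selected = [r.gene for r in filter_degs(example)]
```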
To assess the confounding effect of pathological subtype on target gene expression, we examined the distribution of pathological types within the two age groups. The expression values of these 45 genes were aggregated into pseudobulk for each sample, scaled, and used to generate a gene expression matrix. We then compared gene expression across age and pathological subgroups and performed ANOVA to quantify the contributions of age and disease to expression variance. To better visualize their contributions, we calculated the proportion of the sum of squares for each factor relative to the total sum of squares.
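The sum-of-squares proportion can be illustrated with a simplified sketch. It treats one factor at a time (between-group sum of squares divided by total sum of squares), which is an assumption made for brevity; the ANOVA described above may have fitted age and subtype jointly. The function name is hypothetical.

```python
def ss_proportion(values, labels):
    """Fraction of the total sum of squares explained by one grouping factor."""
    if len(values) != len(labels) or not values:
        raise ValueError("values and labels must be non-empty and equal length")
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # no variance to explain
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    # Between-group sum of squares: group size times squared mean offset.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total
```

When two factors split the samples almost identically, as with collinear age group and subtype, each factor taken alone explains nearly the same share of variance, which is why such estimates call for cautious interpretation.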
Bulk RNA-seq data acquisition and processing
The GSE153659 (ref. 17) and GSE53157 (ref. 18) datasets were downloaded from the GEO database. GSE153659 contains data from 24 PTC samples, while GSE53157 includes data from 24 thyroid carcinoma samples. In GSE153659, patients older than 50 years were excluded, and the remaining samples were classified into two age groups: Young (≤35 years) and Old (>35 years). After removing two samples with abnormal values, 15 PTC samples were retained for analysis. Similarly, in GSE53157, samples with the pathological type “follicular variant of papillary carcinoma” were excluded, leaving 5 poorly differentiated thyroid carcinomas (PDTC), 7 PTC, and 4 follicular thyroid carcinomas (FTC). Following the removal of one outlier, 15 samples remained, which were also stratified into Young and Old groups using the 35-year threshold.
To assess differential gene expression between the Young and Old groups, the Wilcoxon rank-sum test was performed. In addition, we examined the expression levels of the differentially expressed target genes across the pathological subgroups in the GSE53157 dataset.
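For readers unfamiliar with the rank-sum test, the following sketch shows its mechanics. It uses tie-aware midranks and a normal approximation with continuity correction, which is an assumption made for brevity; statistical packages may instead compute an exact p-value for small samples without ties, so results can differ slightly. Function names are hypothetical.

```python
import math


def midranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = midranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_counts = {}
    for v in combined:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0  # all values identical: no evidence of a shift
    # Continuity correction of 0.5, floored at zero.
    z = max(abs(u1 - n1 * n2 / 2) - 0.5, 0.0) / math.sqrt(variance)
    return u1, math.erfc(z / math.sqrt(2))
```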
The Cancer Genome Atlas (TCGA)-THCA data analysis
The TCGA-THCA dataset and the corresponding clinical data were downloaded from the Xena platform, comprising 508 THCA patients, of which 500 are PTC samples.
Differential expression analyses were conducted between the Young and Old subgroups, between the non-lymph node metastasis and metastasis subgroups, and between the T1_2 (combined T1 and T2 stages) and T3_4 (combined T3 and T4 stages) groups. Differentially expressed genes (DEGs) between groups were identified using the DESeq2 R package (v1.46.0)19, with significance defined as an adjusted p-value < 0.05 and an absolute fold change greater than 1.2.
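The significance rule above can be expressed as a small classifier. This sketch assumes that "absolute fold change greater than 1.2" means a change of more than 1.2-fold in either direction, that is |log2FC| > log2(1.2) ≈ 0.263; that reading, and the function name, are assumptions. Missing adjusted p-values, which differential expression tools can report for filtered genes, are treated as not significant.

```python
import math

LOG2FC_CUTOFF = math.log2(1.2)  # about 0.263 on the log2 scale
PADJ_CUTOFF = 0.05


def classify_gene(log2_fold_change, p_adj):
    """Label a gene 'up', 'down', or 'ns' (not significant)."""
    if p_adj is None or p_adj >= PADJ_CUTOFF:
        return "ns"
    if log2_fold_change > LOG2FC_CUTOFF:
        return "up"
    if log2_fold_change < -LOG2FC_CUTOFF:
        return "down"
    return "ns"
```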
TIDE-based evaluation of selected genes, immune infiltration, and tumor prognosis
The tumor immune dysfunction and exclusion (TIDE)20,21 tool was used to evaluate the association between the selected genes, immune infiltration in the TME, and tumor prognosis.
T-cell dysfunction scores for 45 genes were computed across five public datasets. Gene expression levels were classified into High and Low groups and further stratified by cytotoxic T lymphocyte (CTL) levels into CTL Top and CTL Bottom groups. The interaction effect between gene expression and CTL levels was assessed using the Cox proportional hazards (CoxPH) model, with z-scores and p-values calculated to determine statistical significance. A higher T-cell dysfunction score indicates that samples with high expression tend to be enriched in the T-cell dysfunction phenotype, while a lower score indicates that a sample with a low expression level tends to be enriched in the T-cell functional phenotype.
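The interaction model described above can be written compactly. The notation below is a sketch based on our reading of the TIDE publication21; the coefficient names a, b, and d are illustrative labels, and P denotes the expression level of the candidate gene.

```latex
% h(t): hazard at time t; h_0(t): baseline hazard
% CTL: cytotoxic T lymphocyte level; P: expression of the candidate gene
h(t) = h_0(t)\,\exp\bigl(a\,\mathrm{CTL} + b\,P + d\,\mathrm{CTL}\times P\bigr),
\qquad
z_{\text{dysfunction}} = \frac{d}{\mathrm{SE}(d)}
```

Under this formulation, a positive interaction coefficient d means that the survival benefit associated with higher CTL levels weakens as expression of the gene increases, which is the pattern interpreted as T-cell dysfunction.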
To further explore immune interactions, Pearson correlation analysis was performed to examine the relationship between selected genes and CTL levels, assessing their impact on T cell activity within the tumor microenvironment.
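As a reminder of what the correlation coefficient measures, Pearson's r between a gene's expression and CTL levels can be computed as below. This is a generic sketch with a hypothetical function name, not code from the TIDE tool.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(ss_x * ss_y)
```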
Finally, the survival risk score was determined by calculating the z-score of each gene’s effect on death risk using the CoxPH model.
Proteomics level validation
Proteomics data from thyroid cancer samples22, including 83 pediatric benign (PB), 85 pediatric malignant (PM), and 66 adult malignant (AM) nodules, were used for further validation. All malignant samples were of the PTC pathological type. To reduce statistical bias, 1272 proteins with a missing value rate greater than 85% were excluded, resulting in a final dataset containing 9154 proteins.
Data quality was assessed by evaluating the coefficient of variation (CV) across pooled samples and technical replicates. Missing values were excluded, and the protein abundance was log2-transformed for further analysis. Missing value imputation was performed using the NAguideR R package with its robust sequential imputation method (impseqrob). The batch effect in the protein matrix was corrected using ComBat, an empirical Bayes method implemented in the sva R package23. After imputation and correction, non-positive values were substituted with half of the minimum positive abundance for the corresponding protein. Each pair of technical replicates was averaged to create a single sample representing the mean protein abundance.
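Three of the simpler steps above (CV calculation, substitution of non-positive values, and replicate averaging) can be sketched as follows. The function names are hypothetical, missing values are represented as None by assumption, and imputation and ComBat correction are deliberately left out because they rely on the R packages cited above.

```python
import statistics


def coefficient_of_variation(abundances):
    """Sample standard deviation divided by the mean, ignoring missing values."""
    observed = [v for v in abundances if v is not None]
    return statistics.stdev(observed) / statistics.fmean(observed)


def replace_non_positive(abundances):
    """Replace values <= 0 with half of the protein's minimum positive value."""
    positives = [v for v in abundances if v > 0]
    if not positives:
        raise ValueError("protein has no positive abundance to derive a floor")
    floor = min(positives) / 2
    return [v if v > 0 else floor for v in abundances]


def average_replicates(replicate_a, replicate_b):
    """Average a pair of technical replicates protein by protein."""
    if len(replicate_a) != len(replicate_b):
        raise ValueError("replicates must cover the same proteins")
    return [(a + b) / 2 for a, b in zip(replicate_a, replicate_b)]
```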
Differentially expressed proteins (DEPs) were identified with a fold change (FC) > 1.2.
Data availability
The scRNA-seq and bulk RNA-seq datasets of thyroid cancer samples can be accessed from NCBI’s Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo/) under accession numbers GSE193581, GSE153659, and GSE53157. The TCGA-THCA patient cohort used for the current study is publicly available and can be accessed through the TCGA database (https://portal.gdc.cancer.gov/). The proteomic data used in this study can be accessed from the ProteomeXchange Consortium via the iProX partner repository under accession identifier IPX0006407000 (subproject ID: IPX0006407001). Other processed data are available from the corresponding author upon reasonable request.
Code availability
The code used for analysis is available from the corresponding author upon reasonable request.
References
1. Kitahara, C. M. & Schneider, A. B. Epidemiology of thyroid cancer. Cancer Epidemiol. Biomark. Prev. 31, 1284–1297 (2022).
2. Parisi, M. T. & Mankoff, D. Differentiated pediatric thyroid cancer: correlates with adult disease, controversies in treatment. Semin. Nucl. Med. 37, 340–356 (2007).
3. Christofides, A. et al. The complex role of tumor-infiltrating macrophages. Nat. Immunol. 23, 1148–1156 (2022).
4. Qing, W. et al. Density of tumor-associated macrophages correlates with lymph node metastasis in papillary thyroid carcinoma. Thyroid 22, 905–910 (2012).
5. Eissmann, M. F. et al. IL-33-mediated mast cell activation promotes gastric cancer through macrophage mobilization. Nat. Commun. 10, 2735 (2019).
6. Umansky, V., Blattner, C., Gebhardt, C. & Utikal, J. The role of myeloid-derived suppressor cells (MDSC) in cancer progression. Vaccines (Basel) 4, 36 (2016).
7. Weber, R. et al. Myeloid-derived suppressor cells hinder the anti-cancer activity of immune checkpoint inhibitors. Front. Immunol. 9, 1310 (2018).
8. Wu, L., Lian, W. & Zhao, L. Calcium signaling in cancer progression and therapy. FEBS J. 288, 6187–6205 (2021).
9. Zhang, P. et al. Expression of OLR1 gene on tumor-associated macrophages of head and neck squamous cell carcinoma, and its correlation with clinical outcome. Oncoimmunology 12, 2203073 (2023).
10. Mei, S., Huang, Y., Zhao, Y., Zhang, X. & Zhang, P. A pan-cancer blueprint of genomics alterations and transcriptional regulation of Siglecs, and implications in prognosis and immunotherapy responsiveness. Clin. Transl. Med. 13, e1262 (2023).
11. Liu, C. et al. Treg cells promote the SREBP1-dependent metabolic fitness of tumor-promoting macrophages via repression of CD8+ T cell-derived interferon-γ. Immunity 51, 381–397.e6 (2019).
12. Lu, D. et al. Beyond T cells: understanding the role of PD-1/PD-L1 in tumor-associated macrophages. J. Immunol. Res. 2019, 1919082 (2019).
13. Mei, S. et al. Resolving the spatial and cellular architecture of intra-tumor heterogeneity by multi-region dissection of lung adenocarcinoma. J. Genet. Genomics 52, 1121–1132 (2025).
14. Lu, L. et al. Anaplastic transformation in thyroid cancer revealed by single-cell transcriptomics. J. Clin. Invest. 133, e169653 (2023).
15. Hao, Y. et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat. Biotechnol. 42, 293–304 (2024).
16. Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284–287 (2012).
17. Yang, F. et al. Identification of key genes associated with papillary thyroid microcarcinoma characteristics by integrating transcriptome sequencing and weighted gene co-expression network analysis. Gene 811, 146086 (2022).
18. Pita, J. M., Banito, A., Cavaco, B. M. & Leite, V. Gene expression profiling associated with the progression to poorly differentiated thyroid carcinomas. Br. J. Cancer 101, 1782–1791 (2009).
19. Hong, K. et al. Identification and validation of a novel senescence-related biomarker for thyroid cancer to predict the prognosis and immunotherapy. Front. Immunol. 14, 1128390 (2023).
20. Fu, J. et al. Large-scale public data reuse to model immunotherapy response and resistance. Genome Med. 12, 21 (2020).
21. Jiang, P. et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat. Med. 24, 1550–1558 (2018).
22. Wang, Z. et al. An individualized protein-based prognostic model to stratify pediatric patients with papillary thyroid carcinoma. Nat. Commun. 15, 3560 (2024).
23. Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).
Acknowledgements
This work was supported by the Beijing Nova Program (Z211100002121044), Beijing Hospitals Authority (QML20211205), and Funding for Reform and Development of Beijing Municipal Health Commission.
Author information
Contributions
T.C., S.X., and H.C. performed the analysis and prepared the paper. P.Z. and Y.Z. supervised the studies, designed the analysis, and revised the paper with the help of X.Z.
Ethics declarations
Competing interests
H.C. and X.Z. are employees of Beijing ClouDNA Co. The remaining authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Chu, T., Xu, S., Chen, H. et al. Age-specific tumor-associated macrophage biomarkers underlie metastasis and immune dysregulation in thyroid cancer. npj Aging 11, 82 (2025). https://doi.org/10.1038/s41514-025-00270-9
DOI: https://doi.org/10.1038/s41514-025-00270-9

