Introduction

Breast cancer is the most prevalent cancer globally and accounts for 23.8% of all cancer cases in women [1]. Triple-negative breast cancer (TNBC) accounts for 15–20% of all breast cancer cases. Because TNBC lacks estrogen, progesterone, and HER2 receptors, the standard of care for non-surgical TNBC remains non-specific chemotherapy [2, 3]. The 5-year overall survival rate of TNBC patients is 77%, but once metastasis occurs, this rate drops to 12% [4]. Although several biomarkers (including EGFR, VEGF, PARP1, and C-kit) [5, 6] are associated with TNBC metastasis, clinical treatment remains challenging, as no substantial improvements in outcomes have been achieved. Recently, three micropeptides (CIP2A-BP, ASRPS, and XBP1SBM) [7,8,9] were found to be involved in TNBC development, whereas the functions and mechanisms of other novel micropeptides remain largely unknown. These and other discoveries stem from the detection of unannotated small open reading frames (sORFs), previously hidden in non-coding regions (such as lncRNAs, circRNAs, pri-miRNAs, 5’-UTRs, and 3’-UTRs), that can in fact encode micropeptides [10,11,12]. Emerging evidence suggests that these micropeptides can act as proto-oncogenes or oncogenes in different tumors, regulating cell proliferation, metastasis, angiogenesis, energy metabolism, and drug resistance [13,14,15,16].

By combining Ribo-seq, mass spectrometry (MS), machine learning, and gene editing technology, our team established a micropeptide discovery platform and found that MIAC, encoded by the lncRNA AC025154.2, plays an important role in head and neck squamous cell carcinoma and kidney cancer [17, 18]. In the present study, using the same platform, we discovered a novel 36-amino-acid micropeptide encoded by the lncRNA C5orf66-AS1, which we named XLH-36. By combining analysis of clinical samples with cell biological and molecular studies, we explored the binding proteins and molecular mechanisms of XLH-36 and investigated the relationship between XLH-36 and TNBC tumorigenesis, with the aim of providing potential biomarkers and therapeutic targets for TNBC diagnosis and treatment.

Materials and methods

Mice and animal housing

BALB/c female nude mice were purchased from Jiangsu Gempharmatech. All mice were housed in the specific pathogen-free (SPF) barrier facility of the Pharmaceutical Animal Center, China Pharmaceutical University.

Xenograft studies

Five-week-old female nude mice obtained from Jiangsu Gempharmatech were maintained under specific pathogen-free conditions. MDA-MB-231 cells from each experimental group (6.5 × 10⁶ cells in 100 μl) were injected subcutaneously into the left para-mammary region of each mouse. Tumor growth was measured every 3 days, and tumor volume was calculated as volume = (length × width²)/2. Mice were euthanized according to their health status, and tumors were dissected and weighed for analysis.
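The volume formula above can be sketched in Python as follows. This is a minimal illustration, assuming length is the longer caliper diameter and width the shorter perpendicular diameter, both in millimetres; the readings are hypothetical, not data from this study.

```python
def tumor_volume(length_mm: float, width_mm: float) -> float:
    """Modified ellipsoid formula used above: V = (length x width^2) / 2, in mm^3."""
    # Only the shorter (width) diameter is squared.
    return length_mm * width_mm ** 2 / 2


# Hypothetical caliper readings (length, width) in mm, taken every 3 days for one mouse.
readings = [(4.0, 3.0), (5.2, 3.8), (6.5, 4.6)]
volumes = [tumor_volume(length, width) for length, width in readings]

for day, volume in zip(range(0, 3 * len(readings), 3), volumes):
    print(f"day {day}: {volume:.1f} mm^3")
```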

For the in vivo metastasis model, MDA-MB-231 cells (3 × 10⁶) with different genetic manipulations (XLH-36 overexpression or knockout) were injected into the tail vein of mice, and metastasis was monitored twice a week for 4 weeks using the IVIS Spectrum in vivo imaging system (PerkinElmer). At the experimental endpoint, mice were humanely euthanized. All animal experiments were conducted under Institutional Animal Care and Use Committee (IACUC)-approved protocols at China Pharmaceutical University (IACUC-202312016) in accordance with NIH and institutional guidelines.

Cell lines and cell culture

MCF10A, BT474, MCF7, MDA-MB-231, MDA-MB-468, and 293T cells were cultured in DMEM (GIBCO) supplemented with 10% FBS (Bioind). T47D cells were cultured in RPMI 1640 (GIBCO) supplemented with 10% FBS (Bioind). Cells were kept at 37 °C and 5% CO2.

RNA isolation and qRT-PCR

Total RNA was extracted using TRIzol (TIANGEN Biotech #G1012). Two micrograms of RNA (quantified with a NanoDrop spectrophotometer) was reverse transcribed into cDNA using 5X All-In-One MasterMix (with AccuRT Genomic DNA Removal Kit, Applied Biological Materials #G492) in a 20 μl reaction. qRT-PCR was performed using Blastaq 2X qPCR MasterMix (Applied Biological Materials #G891). mRNA expression was normalized to GAPDH, and relative expression levels were calculated using the 2^−ΔΔCt method. Primer sequences are listed in Supplementary Table 2.
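A minimal sketch of the 2^−ΔΔCt calculation described above, with GAPDH as the reference gene. The function name and Ct values are hypothetical; the method assumes roughly equal amplification efficiency for target and reference.

```python
from statistics import mean


def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the 2^(-ddCt) method from replicate Ct values.

    The reference gene plays the role of GAPDH; the method assumes roughly
    equal (~100%) amplification efficiency for target and reference.
    """
    d_ct_test = mean(ct_target_test) - mean(ct_ref_test)  # dCt, test sample
    d_ct_ctrl = mean(ct_target_ctrl) - mean(ct_ref_ctrl)  # dCt, control sample
    dd_ct = d_ct_test - d_ct_ctrl
    return 2 ** (-dd_ct)


# Hypothetical technical-replicate Ct values (not data from this study).
fold = fold_change_ddct(
    ct_target_test=[24.0, 24.25, 23.75], ct_ref_test=[18.0, 18.125, 17.875],
    ct_target_ctrl=[26.0, 26.0, 26.0], ct_ref_ctrl=[18.0, 18.0, 18.0],
)
print(fold)  # ddCt = -2, so a 4-fold increase: 4.0
```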

Cell transfection and lentivirus infection

The siRNA sequences targeting ICAM1, S100A4, and XLH-36 and the non-targeting control siRNA are listed in Supplementary Table 3. siRNAs were transfected using Lipofectamine 8000 (Beyotime Biotechnology) according to the manufacturer's instructions. Lentiviruses were produced in HEK293T cells with the packaging vectors pMD2.G and psPAX2. The virus was collected to transduce target cells, followed by selection with puromycin (2 μg/mL).

CRISPR/Cas9-mediated knock-in and knockout

For XLH-36 knock-in (KI), a single-guide RNA (sgRNA) was designed to target the stop codon region of the XLH-36 ORF to insert a 1× FLAG epitope at its C-terminus. The sgRNA was synthesized (GenScript) and cloned into the pSpCas9(BB)-2A-GFP vector (PX458, Addgene). A single-stranded oligodeoxynucleotide (ssODN) with homology arms specific to the XLH-36 locus was used as the DNA donor: CGGACCCCAGCCTACGCCCCG AGGGAGGCCAGGACCCCGACTACAAGGACGACGATGACAAGTAGCCGGCGGGACTGCGCGCCGCCCCTCTCCCCGCAGGTCCC. PX458 was co-transfected with the ssODN into cells using Lipofectamine 3000 (Thermo), and cells were sorted using the FITC channel on a BD FACSAria II SORP. KI editing was validated by PCR, DNA sequencing, immunofluorescence, and IP followed by MS identification.
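Because a C-terminal tag must stay in frame, a quick sanity check can confirm that the donor above encodes the FLAG epitope (DYKDDDDK) immediately followed by a stop codon. This is a hypothetical helper, not part of the published workflow; it assumes the space in the printed ssODN is a typesetting artifact and strips it before analysis.

```python
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # standard genetic code, '*' = stop


def translate(nt: str) -> str:
    """Translate the complete codons of a DNA string in frame 0."""
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt) - 2, 3))


# ssODN donor as printed in the Methods; whitespace is removed before analysis.
DONOR_AS_PRINTED = (
    "CGGACCCCAGCCTACGCCCCG AGGGAGGCCAGGACCCCGACTACAAGGACGACGATGACAAG"
    "TAGCCGGCGGGACTGCGCGCCGCCCCTCTCCCCGCAGGTCCC"
)
FLAG_NT = "GACTACAAGGACGACGATGACAAG"  # one coding sequence for DYKDDDDK

donor = "".join(DONOR_AS_PRINTED.split())
start = donor.find(FLAG_NT)
end = start + len(FLAG_NT)
tag_and_stop = translate(donor[start:end + 3])  # FLAG codons plus the next codon
left_arm, right_arm = donor[:start], donor[end:]  # right flank includes the stop codon

print(tag_and_stop, len(left_arm), len(right_arm))  # DYKDDDDK* 38 42
```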

For XLH-36 knockout (KO), two sgRNAs were designed to target the start and stop codon regions of the XLH-36 ORF, respectively. The sgRNAs were synthesized and cloned into pYSY-U6-sgRNA-EF1a-mCherry and pYSY-U6-sgRNA-EF1a-eGFP (YSY Biotech, Nanjing, China). Lentiviral particles expressing Cas9 (Hanbio, Shanghai, China) were used to infect MDA-MB-231 cells, generating MDA-MB-231-Cas9 cells. The pYSY-U6-sgRNA-EF1a-mCherry and pYSY-U6-sgRNA-EF1a-eGFP plasmids were co-transfected into MDA-MB-231-Cas9 cells, and cells were sorted using the FITC and PE channels on a BD FACSAria II SORP. The resulting monoclonal XLH-36-KO populations were verified by DNA sequencing and Western blot. The sgRNAs for XLH-36 KI and KO are listed in Supplementary Table 3.

Preparation of XLH-36 monoclonal antibody

The XLH-36 peptide, synthesized by Abmart, was used as the immunogen. Each antigen was used to immunize 12 BALB/c mice (8–12 weeks old), and serum titers were monitored to determine the optimal number of immunizations. The optimized adjuvants and immunization protocol used typically yield high-affinity IgG antibodies against most antigenic peptides. The initial immunization was followed by three to four boosts, after which mouse sera were collected for titer testing by ELISA (with recombinant protein coated as the antigen). A serum titer greater than 10 K was required; otherwise, immunization was continued. The whole spleen and half of the lymph nodes were harvested and fused with the SP2/0 myeloma cell line. The fused cells were plated onto four 384-well plates (10²–10⁴ cells per well) and cultured. Supernatants from all wells were screened by ELISA for reactivity against the XLH-36 peptide, and positive wells containing microscopically visible cells were transferred to 96-well plates for further culture. After a few days of growth, supernatants from all wells were again tested for reactivity with the XLH-36 peptide by ELISA. Monoclonal hybridoma cells were then plated in a 96-well plate and cultured until about 1/6 of the well bottom was covered. The supernatant of each well was tested against the XLH-36 peptide by ELISA, and the two wells with high OD values and good cell status were selected for the next round of subcloning. These steps were repeated until 100% of wells were positive. After the last round of subcloning, all positive cells were immediately expanded, and the resulting ascites was purified with Protein A/G and used in subsequent assays.

Protein extraction and Western blot analysis

Cells were collected, and total protein was extracted using RIPA lysis buffer (Beyotime Biotechnology) supplemented with Halt Protease Inhibitor Cocktail (Thermo Fisher Scientific). Supernatants were obtained by centrifugation at 12,000 × g for 10 min at 4 °C. Lysates were separated by 12% SDS-PAGE and transferred to PVDF membranes (Millipore Sigma). Membranes were incubated with primary antibodies overnight, followed by HRP-linked goat anti-rabbit/mouse IgG secondary antibodies. GAPDH was used as the loading control. Blots were visualized by ECL, and images were acquired using a Tanon imaging system (TANON-5200, Shanghai, PR China).

Immunoprecipitation (IP)

Cells were collected and lysed. Two microliters of primary antibody was added to 1.5 mg of protein lysate, and 0.4 μl of control IgG was added to 400 μg of lysate. After incubation at 4 °C overnight, 35 μl of protein A/G magnetic beads was added to the lysates, followed by incubation for 3 h. The beads were then washed three times with 500 μl of RIPA lysis buffer at 4 °C (6000 × g, 1 min each). Bead-bound proteins were detected by Western blot analysis.

Subcellular fractionation

Subcellular fractions were prepared using the Subcellular Fractionation Isolation Kit (Thermo Fisher Scientific) according to the manufacturer's instructions. Cells were collected and lysed, and the cytoplasmic, membrane, nuclear, chromatin-bound, and cytoskeletal fractions were sequentially extracted with the CEB, MEB, NEB, and PEB reagents.

Endoplasmic reticulum and mitochondrial subfractionation

The ER fraction was isolated using the Endoplasmic Reticulum Isolation Kit (Sigma-Aldrich) according to the manufacturer's instructions. A total of 2 × 10⁸ cells were harvested, and hypotonic solution at three times the pellet volume was added before incubation at 4 °C for 20 min. After centrifugation at 600 × g for 5 min, two volumes of 1× isotonic solution were added, and the cells were homogenized for 20 min and centrifuged at 1000 × g for 25 min. Then, 7.5 volumes of CaCl₂ solution were slowly added to the supernatant, and the mixture was stirred at 4 °C for 15 min and centrifuged at 8000 × g for 15 min to obtain the pellet. The pellet was resuspended in 300 μl of 1× isotonic solution and homogenized for 15 min to obtain the ER fraction.

Mitochondria were isolated using the Mitochondrial Isolation Kit (Thermo Fisher Scientific) according to the manufacturer's instructions. A total of 2 × 10⁷ cells were harvested. Then, 800 μl of reagent A was added, and the sample was vortexed for 5 s at medium speed and incubated on ice for 2 min. Next, 10 μl of reagent B was added, and the sample was vortexed for 5 s at high speed and incubated on ice for 5 min with vortexing once per minute, after which 800 μl of reagent C was added and mixed gently. After centrifugation at 700 × g for 10 min at 4 °C, the supernatant was centrifuged at 1200 × g for 15 min at 4 °C. The pellet was washed with 500 μl of reagent C and centrifuged at 12,000 × g for 5 min at 4 °C to obtain the mitochondrial pellet.

Immunofluorescence and confocal microscopy

Cells were fixed with β-galactosidase staining fixative (Beyotime Biotechnology #C0602) for 15 min at RT. After two washes with PBS, cells were permeabilized with Triton X-100 (Beyotime Biotechnology #P0096) for 15 min and washed twice with PBS. Cells were blocked with 5% BSA at RT for 30 min, washed with PBS, and incubated with primary antibodies at 4 °C overnight. After two washes with PBS, cells were incubated with secondary antibodies for 2 h at RT in the dark. After two further washes with PBS, 10 μl of DAPI (Servicebio Biotechnology #GDP1024) was added, and cells were incubated in the dark for 10 min and observed under a fluorescence microscope.

Cell proliferation assay (CCK-8)

Cell proliferation was evaluated using Cell Counting Kit-8 (CCK-8; Yeasen Biotechnology, #40203ES76). Cells were seeded into 96-well plates at 2000 cells/well. At the indicated time points, 10 μl of CCK-8 reagent was added to the medium, and the optical density (OD) at 450 nm was measured using a spectrophotometer (Thermo Scientific). All experiments were performed in triplicate.
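The source does not state how OD450 values were normalized; one common approach, shown here only as an assumption, subtracts a cell-free blank (medium plus CCK-8) and expresses each time point relative to the first. The readings and function name are hypothetical.

```python
from statistics import mean


def relative_growth(od450_by_timepoint, blank_od450):
    """Blank-subtract the mean OD450 at each time point and scale to the first time point."""
    corrected = [mean(replicates) - blank_od450 for replicates in od450_by_timepoint]
    return [value / corrected[0] for value in corrected]


# Hypothetical triplicate OD450 readings at 0, 24 and 48 h; blank = medium + CCK-8, no cells.
od_readings = [[0.50, 0.50, 0.50], [0.75, 0.75, 0.75], [1.25, 1.25, 1.25]]
growth = relative_growth(od_readings, blank_od450=0.25)
print(growth)  # [1.0, 2.0, 4.0]
```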

Flow cytometric cell cycle analysis

The cell cycle was analyzed using the Cell Cycle Assay Kit-PI/RNase Staining (CCS012, MULTI SCIENCES (LIANKE) BIOTECH CO., LTD) according to the manufacturer's instructions. A total of 5 × 10⁵ cells were harvested, resuspended in precooled 70% ethanol, and fixed overnight at –20 °C. After centrifugation at 2000 rpm for 5 min, the pellet was incubated with 100 μl of RNase solution at 37 °C for 30 min, followed by addition of 900 μl of PI solution and incubation at 37 °C for 30 min. Samples were analyzed using a CytoFLEX flow cytometer (Beckman Coulter) and FlowJo software.

Cell migration and cell invasion assay

For the migration assay, a total of 1.5 × 10⁴ cells in 200 μl of serum-free DMEM were seeded into the upper chamber, and 500 μl of DMEM containing 10% FBS was added to the lower chamber. After 48 h of incubation, non-migrated cells on the upper surface of the membrane were removed, and cells that had migrated through the membrane were fixed in precooled formaldehyde for 30 min and stained with 0.1% crystal violet for 15 min. Five randomly selected fields (×10) per chamber were photographed and counted.

For the invasion assay, OrganoGel Matrigel (CELLada #OM-1) was removed from –20 °C and thawed at 4 °C until liquid. All materials in contact with Matrigel were pre-cooled on ice, and the entire procedure was performed on ice. Matrigel was diluted 1:7 with pre-cooled serum-free DMEM, and 10 μl of the diluted Matrigel was spread evenly in each Transwell chamber, which was then incubated at 37 °C for 30 min. Cells were trypsinized and resuspended, 3 × 10⁴ cells were added to each chamber, and subsequent steps were identical to those of the migration assay.

Cell adhesion assay

Recombinant fibrin was dissolved in sterile PBS to 10 μg/ml, and 50 μl per well was added to a 96-well plate and coated at 37 °C for 1 h. The wells were washed twice with serum-free medium. A total of 1 × 10⁴ cells were harvested, resuspended in serum-free medium, added to the coated wells, and incubated at 37 °C for 1 h. After three washes with PBS to remove non-adherent cells, CCK-8 was added to quantify the adherent cells.

Cell wound scratch assay

Cells were seeded in 24-well plates and grown to 90% confluence. Scratches were made across the monolayer with a sterile pipette tip, and the medium was then replaced with DMEM containing 2% FBS. The wound was photographed under an inverted microscope at the same position at 0, 12, 24, and 48 h, and the cell migration distance was measured and calculated using ImageJ-win64 software.
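The ImageJ measurements taken at 0, 12, 24, and 48 h can be summarized as percent wound closure and migration distance. The exact formula used in the study is not stated; this sketch assumes closure is computed from wound width relative to 0 h and that each wound edge advances by half the width reduction. Widths are hypothetical.

```python
def wound_metrics(width_um_by_hour):
    """Return {hour: (percent closure, migration distance per wound edge in um)}.

    Closure = (W0 - Wt) / W0 * 100; each edge is assumed to advance by half
    the reduction in wound width.
    """
    w0 = width_um_by_hour[0]
    return {
        hour: (100 * (w0 - width) / w0, (w0 - width) / 2)
        for hour, width in sorted(width_um_by_hour.items())
    }


# Hypothetical mean wound widths (um) measured in ImageJ at the same field.
widths = {0: 400.0, 12: 300.0, 24: 200.0, 48: 80.0}
for hour, (closure, distance) in wound_metrics(widths).items():
    print(f"{hour:>2} h: {closure:.0f}% closed, {distance:.0f} um per edge")
```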

Bioinformatic analysis

Homologous sequences of the C5orf66-AS1 ORF1 were identified using NCBI-BLAST or BLAT searches of the UCSC Genome Browser (https://genome.ucsc.edu/). XLH-36 structure and domains were further evaluated using the following online servers: C-I-TASSER (https://zhanglab.ccmb.med.umich.edu/C-I-TASSER/), TMHMM (http://www.cbs.dtu.dk/services/TMHMM/), and HDOCK (http://hdock.phys.hust.edu.cn/).

Molecular docking

The 3D structure of the XLH-36 peptide was generated in HDOCK, and the 3D structure of Gemin4 was downloaded from the AlphaFold Protein Structure Database (AF-P57678-F1). Molecular docking of XLH-36 and Gemin4 was performed with the ClusPro server [19], and the docked structures were analyzed using MOE 2019.1.

MicroScale thermophoresis (MST)

Human Gemin4 protein was expressed in HEK293 cells and labeled with Cy5 fluorescent dye via cysteine sulfhydryl groups (Xi’an ruixi Biological Technology Co., Ltd). The labeled Gemin4 protein and chemically synthesized XLH-36 peptide (Shanghai GL Biochem Ltd) were incubated for 5 min at room temperature, and the reaction mixtures were loaded into premium coated glass capillaries and measured on a Monolith NT.115 instrument (NanoTemper, Germany). Kd values were determined using the NanoTemper MO.Affinity Analysis software.

Hematoxylin-eosin staining (H&E)

Mouse tissues were fixed in 4% formalin for 48 h, embedded in Tissue-Tek O.C.T. Compound (SAKURA #4583), and cryosectioned at 9 µm. Sections were fixed in 4% formalin for 5 min at RT and immersed in 70%, 85%, and 100% ethanol for 1 min each. Sections were stained with hematoxylin staining solution (BBI #E607317-0100) for 3 min, rinsed under gently running water for 5 min, dried, stained with eosin staining solution (BBI #E607321-0100) for 3 min, and rinsed under gently running water for 4 min. Sections were then passed through the following solutions for the indicated times: 70% ethanol (5 s), 85% ethanol (5 s), 100% ethanol (2 min), xylene (4 min), and xylene (4 min). Finally, the slides were air-dried and mounted with neutral balsam (Solarbio #G8590).

LC-MS/MS analysis

LC-MS/MS analysis of the complex immunoprecipitated with the XLH-36 antibody from MDA-MB-231 cells was performed as previously described [17]. Total proteins were separated by SDS-PAGE on 10–17% Bis-Tris acrylamide gels and silver-stained. Samples were dissolved in 0.1% TFA and 2% acetonitrile, and the resulting peptides were analyzed on a Q Exactive mass spectrometer (Thermo Fisher Scientific) coupled to an HPLC system (Dionex NCS3500). To identify XLH-36 interactors, the resulting spectra were searched against the UniProt database, allowing a maximum of two missed cleavages. The top 10 candidate XLH-36 interactors are listed in Supplementary Table 4.
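The missed-cleavage limit above constrains which peptides a database search considers. The source does not name the protease, so this sketch assumes trypsin with the classic rule (cleave after K or R, but not before P) and enumerates peptides with up to two missed cleavages; the sequence is a toy example.

```python
def tryptic_peptides(protein: str, max_missed: int = 2) -> list:
    """In silico digest: cleave after K/R unless followed by P; allow up to max_missed missed cleavages."""
    fragments, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])  # C-terminal fragment not ending in K/R

    peptides = []
    for i in range(len(fragments)):
        for missed in range(max_missed + 1):
            if i + missed < len(fragments):
                peptides.append("".join(fragments[i:i + missed + 1]))
    return peptides


# Toy sequence: the R followed by P is not cleaved.
print(tryptic_peptides("AAKGGRPLLKMM"))
# ['AAK', 'AAKGGRPLLK', 'AAKGGRPLLKMM', 'GGRPLLK', 'GGRPLLKMM', 'MM']
```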

Sample and library preparation for RNA-seq

One million MDA-MB-231/WT and MDA-MB-231/XLH-36-KO cells were harvested and washed with cold PBS (500 × g for 5 min at 4 °C) to remove trypsin. RNA was purified using Invitrogen TRIzol Reagent, and quality was assessed with an Agilent 2100 Bioanalyzer (RIN ≥ 7.0 and rRNA ratio ≥ 1.5). Sequencing libraries were prepared using the NEBNext Ultra RNA Library Prep Kit for Illumina and sequenced on an Illumina NextSeq 500 with 1 × 100-bp single-end reads. Quality control and library preparation for RNA from the MDA-MB-231/WT and MDA-MB-231/XLH-36-KO cell lines were performed at CapitalBio Technology following their standard protocols. Replicates for specific RNA-seq experiments are indicated in Fig. 7B.

Statistical methods

Data are presented as means ± standard deviation (SD) of at least three biological replicates. Comparisons between two groups were made using unpaired, two-tailed Student’s t tests, and multiple comparisons were made using one-way ANOVA. Overall survival (OS) curves were generated with the Kaplan–Meier method and compared using the log-rank test. Statistical tests were conducted using GraphPad Prism (version 8; La Jolla, CA, USA) unless otherwise indicated. NS, non-significant; *P < 0.05, **P < 0.01, ***P < 0.001.

Analysis of RNA-seq data

Reads were aligned with HISAT2, and gene expression was quantified as gene-level read counts. Expression data were normalized to counts per million (CPM), and differential expression between experimental groups was evaluated using LIMMA [20] and DESeq with P < 0.05 and |log2(fold change)| ≥ 1 as cut-offs. GO enrichment analysis was performed using PANTHER to determine the molecular and biological functions enriched among differentially expressed genes (DEGs) in XLH-36-KO cells. The input gene lists comprised upregulated and downregulated genes (adjusted P < 0.05, |log2(fold change)| ≥ 1; MDA-MB-231/WT compared with MDA-MB-231/XLH-36-KO). In total, 640 DEGs were identified. A cut-off of FDR < 0.05 was used to select the top enriched GO terms. RNA-seq data have been deposited in the GSA database (accession number HRA008378).
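The normalization and cut-offs above can be illustrated as follows. This minimal sketch assumes the statistics (log2 fold change, adjusted P) have already been produced by LIMMA or DESeq; it only shows CPM scaling and the threshold-based split into up- and downregulated genes. Gene names and values are hypothetical.

```python
def counts_per_million(counts):
    """Scale raw gene-level read counts so the library sums to one million."""
    total = sum(counts.values())
    return {gene: n * 1e6 / total for gene, n in counts.items()}


def split_degs(results, padj_max=0.05, min_abs_log2fc=1.0):
    """results: (gene, log2_fold_change, adjusted_p) tuples from a DE tool."""
    up = [g for g, lfc, padj in results if padj < padj_max and lfc >= min_abs_log2fc]
    down = [g for g, lfc, padj in results if padj < padj_max and lfc <= -min_abs_log2fc]
    return up, down


library = {"GENE_A": 600, "GENE_B": 300, "GENE_C": 100}
cpm = counts_per_million(library)

results = [("GENE_A", 2.3, 0.001), ("GENE_B", -1.5, 0.02),
           ("GENE_C", 0.8, 0.0001), ("GENE_D", 3.0, 0.2)]
up, down = split_degs(results)
print(cpm, up, down)  # GENE_C fails the fold change cut-off, GENE_D the P cut-off
```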

Results

Differentially expressed lncRNAs and potential coding sORFs were selected by combining 1090 BRCA samples and 165 TNBC samples

Emerging evidence shows that functional peptides can be encoded by sORFs within lncRNAs [21], which share a similar structure and transcription process with mRNAs [22]. To identify lncRNAs with coding potential that are highly expressed in TNBC, we analyzed lncRNA data from 1090 BRCA and 113 non-neoplastic breast tissues from the TCGA database (Supplementary Table 1), as well as RNA-seq data from 165 TNBC samples from the GEO database (GSE76250). Overall, 1017 lncRNAs were highly expressed in BRCA tissues (log(fold change) ≥ 2, P ≤ 0.01; Supplementary Fig. 1A). Candidates were then selected based on the following inclusion criteria: (1) high expression in breast cancer and TNBC samples, with a significant correlation with clinical prognosis; (2) good coding probability based on ribosome profiling and the m6A score in the TransLnc database; and (3) a potential sORF length of <300 nt, with a unique and previously undiscovered peptide sequence (Fig. 1A). Four lncRNAs (C5orf66-AS1, LINC01614, LINC00511, and LINC00460) met all three criteria (Fig. 1B), and C5orf66-AS1 had the highest overall score among them (Table 1 and Fig. 1C). Furthermore, C5orf66-AS1 expression was significantly higher in TNBC clinical samples (Fig. 1D) and in three TNBC cell lines (MDA-MB-231, MDA-MB-468, and BT549; Fig. 1E). We therefore focused on the role of this candidate micropeptide-encoding lncRNA in TNBC progression.
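The screen combines expression cut-offs with three inclusion criteria, all of which must hold. A minimal sketch of that logic follows; the record type, field names, and candidates are hypothetical, not the actual screening code or data.

```python
from dataclasses import dataclass


@dataclass
class LncCandidate:
    name: str
    log_fc: float          # tumor vs normal, on the log scale used for the screen
    p_value: float
    up_in_tnbc: bool       # also elevated in the TNBC cohort
    prognostic: bool       # significant association with clinical outcome
    coding_support: bool   # ribosome-profiling / m6A evidence of translation
    sorf_nt: int           # length of the candidate sORF in nucleotides
    novel_peptide: bool    # encoded peptide is unique and not previously reported


def passes_screen(c: LncCandidate) -> bool:
    """Apply the expression cut-offs and the three inclusion criteria together."""
    expressed = c.log_fc >= 2 and c.p_value <= 0.01
    criterion_1 = c.up_in_tnbc and c.prognostic
    criterion_2 = c.coding_support
    criterion_3 = c.sorf_nt < 300 and c.novel_peptide
    return expressed and criterion_1 and criterion_2 and criterion_3


# Hypothetical candidates; only lncA satisfies every condition.
candidates = [
    LncCandidate("lncA", 2.8, 0.001, True, True, True, 111, True),
    LncCandidate("lncB", 3.1, 0.004, True, False, True, 150, True),  # not prognostic
    LncCandidate("lncC", 2.5, 0.002, True, True, True, 420, True),   # sORF too long
    LncCandidate("lncD", 1.4, 0.0005, True, True, True, 90, True),   # below fold-change cut-off
]
selected = [c.name for c in candidates if passes_screen(c)]
print(selected)  # ['lncA']
```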

Fig. 1: Large-scale screening identified the novel micropeptide XLH-36 in triple-negative breast cancer.

A Schematic of the screen for differentially expressed lncRNAs with coding potential using TCGA and bioinformatic databases. B Venn diagrams showing the overlap of differentially expressed lncRNAs among breast cancer samples, TNBC samples, and coding-potential analysis. C C5orf66-AS1, encoding XLH-36, was significantly upregulated in BRCA patients (n = 1090) compared with adjacent normal tissues (n = 113) from TCGA. D The GSE76250 dataset indicated that C5orf66-AS1 was significantly upregulated in TNBC patients (n = 165) compared with adjacent normal tissues (n = 33). E Expression of XLH-36-encoding C5orf66-AS1 in breast cancer cells and normal cells, normalized to GAPDH. F Diagram of the location of the XLH-36-encoding region in C5orf66-AS1. The XLH-36-Flag fusion protein expressed in MDA-MB-231 cells was detected by immunoblot (G) and immunofluorescence analysis (H); nuclei were stained with DAPI (blue). I XLH-36 expression in breast cancer and normal cells detected by immunoblot with the prepared anti-XLH-36 antibody. J Amino acid alignment of XLH-36 from different vertebrates. Data are presented as mean ± SD from n = 3 biologically independent experiments. *P < 0.05, **P < 0.01, ***P < 0.001.

Table 1 Summary of lncRNAs with high coding potential that are differentially overexpressed in breast cancer and TNBC patients.

Discovery of the highly conserved novel endogenous micropeptide XLH-36 in primates

To analyze the coding potential of C5orf66-AS1, we used PhyloCSF, the Coding Potential Calculator (CPC), and ORF finder analysis, and found that an sORF spanning nucleotides 21–131 of exon 1 of C5orf66-AS1 could encode a 36-amino-acid micropeptide (Fig. 1F and Supplementary Fig. 1B), which we termed XLH-36. To confirm that XLH-36 is endogenously expressed, we knocked in a Flag tag (Flag KI) upstream of the XLH-36 stop codon (Supplementary Fig. 1C). Immunoblotting (IB) and immunofluorescence (IF) analysis with an anti-Flag antibody indicated that XLH-36 is indeed endogenously translated (Fig. 1G, H). Moreover, we generated a monoclonal anti-XLH-36 antibody (Supplementary Fig. 1D, E) and performed IB in different BRCA cell lines (Fig. 1I). XLH-36-specific peptide sequences were also identified by immunoprecipitation (IP) from Flag-KI cell lysates followed by MS analysis (Supplementary Fig. 1F, G).
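The coordinates above are internally consistent: nucleotides 21–131 span 131 - 21 + 1 = 111 nt, or 37 codons, i.e. 36 amino acids plus a stop codon. A simple forward-strand ORF scan in the spirit of ORF finder can be sketched on a toy sequence (PhyloCSF and CPC work differently, scoring conservation and coding features). The toy sequence and the 10-aa minimum are assumptions, not the real C5orf66-AS1 sequence or the tool's settings.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq: str, min_aa: int = 10, max_nt: int = 300):
    """Scan the forward strand for ATG-initiated ORFs shorter than max_nt.

    Returns (start, end, aa_length) with 1-based inclusive coordinates; the
    ORF length includes the stop codon, so aa_length = nt_length / 3 - 1.
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG":
            continue
        for j in range(i, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                nt_length = j + 3 - i
                aa_length = nt_length // 3 - 1
                if aa_length >= min_aa and nt_length < max_nt:
                    hits.append((i + 1, j + 3, aa_length))
                break  # the first in-frame stop ends this ORF
    return hits


# Toy transcript: 20 nt of leader, then ATG + 35 sense codons + TAA,
# so the ORF occupies nucleotides 21-131.
toy = "C" * 20 + "ATG" + "GCT" * 35 + "TAA" + "C" * 10
print(find_sorfs(toy))  # [(21, 131, 36)]
```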

As evolutionary conservation is considered a hallmark of functional ORFs, we investigated the conservation of XLH-36. Amino acid alignment showed that XLH-36 is relatively conserved in primates (Fig. 1J), suggesting that XLH-36 may be a functional micropeptide. Collectively, these data indicate that XLH-36 is naturally and endogenously translated from the sORF located in lncRNA C5orf66-AS1 (Chr.5 135,040,047-135,039,596).

XLH-36 is an oncogenic micropeptide that serves as a prognostic biomarker for TNBC

To assess the clinical relevance of XLH-36, we divided patients into XLH-36-low and XLH-36-high groups based on the median expression level of C5orf66-AS1, the XLH-36-encoding transcript. Kaplan–Meier survival analysis showed that patients with higher XLH-36 expression had poorer prognosis (Fig. 2A), and XLH-36 was upregulated in patients with advanced-stage BRCA (Fig. 2B). In an independent TNBC patient cohort, Kaplan–Meier analysis likewise revealed that high XLH-36 expression was associated with poor prognosis (Fig. 2C), whereas XLH-36 expression did not strongly correlate with the prognosis of ER-, PR-, or HER2-positive patients (Supplementary Fig. 2A–C). We further examined XLH-36 protein levels by immunohistochemistry using the anti-XLH-36 antibody. Consistent with the RNA data, XLH-36 protein expression was significantly higher in BRCA samples than in normal breast tissue samples (Fig. 2D). Notably, XLH-36 was significantly upregulated in TNBC tissues compared with control tissues (Fig. 2E), and the high-XLH-36 group was associated with more advanced clinical stage (Supplementary Fig. 2D). Taken together, these results suggest that high XLH-36 expression is correlated with poor clinical outcomes in TNBC patients.
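The median split and survival estimate described above can be sketched as follows. This minimal example uses hypothetical patient records and computes only the Kaplan–Meier curves; the log-rank comparison used in the study is omitted.

```python
from itertools import groupby
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier estimate; events: 1 = death, 0 = censored. Returns [(time, S(t))]."""
    data = sorted(zip(times, events))
    at_risk, survival, curve = len(data), 1.0, []
    for time, group in groupby(data, key=lambda row: row[0]):
        group = list(group)
        deaths = sum(event for _, event in group)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((time, survival))
        at_risk -= len(group)  # deaths and censored cases leave the risk set
    return curve


# Hypothetical patients: (expression, follow-up in months, event flag).
patients = [(0.5, 10, 1), (1.2, 20, 0), (0.8, 30, 1), (3.1, 5, 1),
            (2.7, 8, 1), (0.3, 40, 0), (4.0, 12, 1), (2.2, 15, 0)]
cutoff = median(expr for expr, _, _ in patients)
high = [(t, e) for expr, t, e in patients if expr > cutoff]
low = [(t, e) for expr, t, e in patients if expr <= cutoff]

high_curve = kaplan_meier(*zip(*high))
low_curve = kaplan_meier(*zip(*low))
print("high:", high_curve)
print("low:", low_curve)
```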

Fig. 2: XLH-36 is highly expressed in TNBC samples and is a significant predictor of poor outcomes.

A Overall survival curves of BRCA patients grouped according to XLH-36 expression. B XLH-36-encoding C5orf66-AS1 was significantly associated with the clinical stage of BRCA patients. C Low XLH-36 expression was associated with a high overall survival rate of TNBC patients. D XLH-36 levels were significantly increased in BRCA tissues by IHC assay with XLH-36 antibodies. Scale bar, 200 μm. E XLH-36 levels were significantly increased in TNBC tissues by IF assay with XLH-36 antibodies. Scale bar, 50 μm. The overall survival curve was generated with Kaplan–Meier analysis and the log-rank test. Lines show the means ± SD. *P < 0.05, **P < 0.01, ***P < 0.001.

XLH-36 promotes TNBC metastasis in vitro and in vivo

To investigate the biological function of XLH-36 in TNBC progression, we performed knockout and overexpression experiments in two TNBC cell lines (MDA-MB-231 and BT549; Fig. 3A and Supplementary Fig. 3A). XLH-36 knockout (XLH-36 KO) in MDA-MB-231 and BT549 cells markedly inhibited cell proliferation, cell cycle progression, migration, invasion, and wound healing (Fig. 3B–G), whereas XLH-36 overexpression (XLH-36 OE) significantly promoted cell proliferation (Supplementary Fig. 3B), cell cycle progression (Supplementary Fig. 3C–G), and migration, invasion, and wound healing (Supplementary Fig. 4A–C). These results suggest that XLH-36 functions as an oncomicropeptide that promotes TNBC progression.

Fig. 3: XLH-36 functions as an oncomicropeptide to promote TNBC metastasis in vitro and in vivo.

A Upper: Schematic diagram of the CRISPR/Cas9-based strategy to produce XLH-36-KO MDA-MB-231 and BT549 cells. Lower: Immunoblot of XLH-36 in wild-type and XLH-36-KO MDA-MB-231 and BT549 cells. B Cell proliferation was evaluated by the CCK-8 assay in the indicated cells. C Flow cytometry analysis of the cell cycle distribution in the indicated cells. D qPCR analysis of cell cycle markers in WT and XLH-36-KO MDA-MB-231 and BT549 cells. Cell metastatic properties were assessed by the Transwell migration assay (E), invasion assay (F), and wound scratch assay (G). H Representative images and statistical analysis of lung metastases of tumor-bearing mice after i.v. injection of the indicated MDA-MB-231 cells. In vivo-generated tumors and analyses of tumor volume (I) are shown. Data are presented as mean ± SD from n = 5 mice per group. Two-way ANOVA. Lines show the means ± SD. *P < 0.05, **P < 0.01, ***P < 0.001.

We next evaluated the tumor-promoting role of XLH-36 in vivo. Injection of XLH-36-KO cells substantially reduced the incidence of pulmonary metastasis (Fig. 3H) and xenograft tumor growth (Fig. 3I and Supplementary Fig. 5A), with decreased cell proliferation (indicated by Ki-67 staining; Supplementary Fig. 5B). Furthermore, hematoxylin and eosin (H&E) staining showed that XLH-36 knockout decreased the number and size of metastatic foci in the lung (Supplementary Fig. 5C, D). In contrast, XLH-36 overexpression resulted in the opposite phenotypes (Supplementary Fig. 5E–I). Collectively, these data corroborate the oncogenic role of XLH-36 in TNBC metastasis and suggest that targeting XLH-36 may elicit anti-tumor effects.

XLH-36, but not lncRNA C5orf66-AS1, exerts the pro-cancer effect

To distinguish the potential role of C5orf66-AS1 from that of XLH-36, we separately overexpressed three constructs in MDA-MB-231 and BT549 cells (Fig. 4A): the XLH-36 ORF (XLH-36-OE); the C5orf66-AS1-derived sORF with a mutated ATG start codon (XLH-36-OE-ATG-Mut); and the XLH-36 ORF carrying synonymous nucleotide substitutions outside the start and stop codons (XLH-36-OE-SNS), which encodes the same peptide. Importantly, XLH-36-OE and XLH-36-OE-SNS, but not XLH-36-OE-ATG-Mut, increased XLH-36 expression (Fig. 4B), cell proliferation, and cell migration (Fig. 4C–E). Moreover, the proliferation and migration defects caused by XLH-36 KO were rescued by overexpression of XLH-36 or XLH-36-OE-SNS, whereas XLH-36-OE-ATG-Mut failed to rescue these phenotypes (Fig. 4F–I), implying that XLH-36, rather than C5orf66-AS1, is the main executor of the pro-cancer functions in TNBC.

Fig. 4: XLH-36 but not C5orf66-AS1 promotes TNBC progression.

A The minimal free energy of RNA folding of wild-type and mutant (Mut) XLH-36 ORF, as predicted using the RNA-fold web server. B Immunoblot analysis of the expression levels of indicated XLH-36 variants in lentivirus-infected MDA-MB-231 and BT549 cells. Cell proliferation (C) and migration ability (D, E) analysis in the MDA-MB-231 and BT549 cells with the indicated constructs. Cell proliferation (F) and migration ability (G) assay was performed on WT and XLH-36 KO MDA-MB-231 cells with overexpression of XLH-36, XLH-36-OE-SNS, or XLH-36-OE-ATG-Mut. Cell proliferation (H) and migration ability (I) assay was performed on WT and XLH-36 KO BT549 cells with overexpression of XLH-36, XLH-36-OE-SNS, or XLH-36-OE-ATG-Mut. Data are presented as mean ± SD from n = 3 biologically independent experiments. ns non-significant, *P < 0.05, **P < 0.01, ***P < 0.001.

XLH-36 is localized in the endoplasmic reticulum

Because the subcellular localization of a protein is closely related to its function, we analyzed the intracellular distribution of XLH-36 to understand how it contributes to TNBC progression. Subcellular fractionation and structure prediction (using the TMHMM and HDOCK servers) showed that XLH-36 is predominantly distributed in the cytoplasm (Fig. 5A and Supplementary Fig. 6A). Notably, XLH-36 colocalized with the endoplasmic reticulum but not with mitochondria (Fig. 5B, C and Supplementary Fig. 6B).

Fig. 5: XLH-36 is localized in the endoplasmic reticulum and directly binds to Gemin4.

A, B Immunoblot of XLH-36 in endoplasmic reticulum/ cytosolic fractionated MDA-MB-231 cells. ATPase: cell membrane protein; SP1: nuclear protein; GAPDH: cytosolic protein; Calreticulin: endoplasmic reticulum protein. C Immunofluorescence of XLH-36 (Green) colocalized with endoplasmic reticulum (Red) in MDA-MB-231 cells. D Schematic diagram showing the identification of XLH-36-interacting proteins using MS analysis following immunoprecipitation. E Co-IP assays showing the endogenous interaction between XLH-36 and Gemin4. F Representative images of MDA-MB-231 cells co-stained with XLH-36, Gemin4 and DAPI. G The direct interaction of Gemin4 and XLH-36 was analyzed by the yeast two-hybrid system. DDO medium, SD/Trp-/Leu-; TDO medium, SD/Trp-/Leu-/HIS-. H Molecular docking analysis of XLH-36 with Gemin4. I Tables showing the potential contact sites of XLH-36 binding with Gemin4. J MicroScale Thermophoresis (MST) analysis of the interaction between XLH-36 and Gemin4. Dose–response analysis of the MST traces allows for the determination of the XLH-36 KD values for its interactions with Gemin4.

XLH-36 directly binds to Gemin4

To identify potential interactors of XLH-36, we performed a co-IP assay followed by MS analysis (Fig. 5D). In total, 140 proteins pulled down with XLH-36 were identified (Supplementary Fig. 6C). Of the three top-scoring candidates (Gemin4, EEF2, and ACLY) tested by Co-IP, only Gemin4 was confirmed to interact with XLH-36 (Fig. 5E and Supplementary Fig. 6D). IF analysis in MDA-MB-231 cells further showed that XLH-36 co-localized with Gemin4 (Fig. 5F).

To further demonstrate the interaction between XLH-36 and Gemin4, we conducted a yeast two-hybrid assay, which supported a direct interaction between the two proteins (Fig. 5G). We then performed molecular docking simulations (Fig. 5H) using the predicted structures of XLH-36 and Gemin4 (Supplementary Fig. 7A, D). The docking model indicated that residues R12, P16, and R17 of XLH-36 form salt bridges and hydrogen bonds with residues K608, V897, H925, L931, H994, S1037, E1040, I894, and R1033 of Gemin4 (Fig. 5I). Next, we performed microscale thermophoresis (MST) using chemically synthesized XLH-36 (Supplementary Fig. 7B, C) and purified Gemin4 protein (Supplementary Fig. 7E, F). The KD of XLH-36 binding to Gemin4 was 4.76 × 10⁻⁶ M (Fig. 5J), indicating that XLH-36 binds Gemin4 with micromolar affinity. To identify the key binding residues, we individually mutated R12, P16, and R17 to alanine, as guided by the docking analysis, and repeated the MST assay. All three mutant peptides showed decreased affinity for Gemin4 compared with wild-type XLH-36, and the largest decrease was observed for the P16 mutant (Supplementary Fig. 7G), indicating that this residue plays a crucial role in the XLH-36/Gemin4 interaction.
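A KD from MST is obtained by fitting the dose–response curve to a binding model. As a hedged illustration only (not the authors' analysis software or settings), the sketch below assumes the 1:1 law-of-mass-action model written in total concentrations (the quadratic form, which accounts for depletion of the titrated partner) and recovers KD from a synthetic titration by a simple log-spaced grid search. The 50 nM labeled-partner concentration, the 16-point 1:1 dilution series from 100 µM, the noise-free data, and all function names are hypothetical; 4.76 µM is used only to generate the data, assuming the reported value is in molar units. The last helper converts a KD fold change, such as one from an alanine mutant, into a binding free-energy penalty, ΔΔG = RT ln(KD,mut/KD,wt).

```python
import math


def fraction_bound(c_titrant, c_fixed, kd):
    """Fraction of the labeled partner in complex for 1:1 binding.

    c_titrant: total concentration of the titrated partner (M)
    c_fixed:   total concentration of the labeled partner, held constant (M)
    kd:        dissociation constant (M)
    """
    s = c_titrant + c_fixed + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * c_titrant * c_fixed)) / 2.0
    return complex_conc / c_fixed


def fit_kd(concs, fractions, c_fixed, log10_lo=-9.0, log10_hi=-3.0, steps=3000):
    """Grid search over log-spaced KD values minimizing the sum of squared errors."""
    best_kd, best_sse = None, float("inf")
    for i in range(steps + 1):
        kd = 10 ** (log10_lo + (log10_hi - log10_lo) * i / steps)
        sse = sum((fraction_bound(c, c_fixed, kd) - f) ** 2
                  for c, f in zip(concs, fractions))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


def ddg_kj_per_mol(kd_mut, kd_wt, temp_k=298.15):
    """Binding free-energy penalty of a mutation: RT ln(KD_mut / KD_wt), in kJ/mol."""
    r_gas = 8.314  # J mol^-1 K^-1
    return r_gas * temp_k * math.log(kd_mut / kd_wt) / 1000.0


# Hypothetical, noise-free titration: 16-point 1:1 dilution starting at 100 uM.
C_FIXED = 50e-9    # assumed labeled-partner concentration (50 nM)
KD_TRUE = 4.76e-6  # used only to generate the synthetic curve
concs = [100e-6 / 2 ** i for i in range(16)]
fractions = [fraction_bound(c, C_FIXED, KD_TRUE) for c in concs]

kd_fit = fit_kd(concs, fractions, C_FIXED)
print(f"fitted KD = {kd_fit:.3e} M")
```

With real MST data the normalized fluorescence would first be fitted with free baseline and amplitude terms, and a grid search would normally be replaced by a nonlinear least-squares routine; the grid keeps the sketch dependency-free.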

XLH-36 promotes TNBC progression by binding to Gemin4

Previous studies have shown that Gemin4 is an essential component of the survival motor neuron (SMN) complex [23], which also includes the SMN protein and GEMIN2, 3, 5, 6, 7, and 8. Within this complex, Gemin4 works together with the other components to bind small nuclear RNAs (snRNAs), a step crucial for the structural integrity of small nuclear ribonucleoproteins (snRNPs), which assemble into the spliceosome and carry out pre-mRNA splicing [24]. To our knowledge, the relationship between Gemin4 and TNBC pathogenesis has not been reported. Investigating the role of Gemin4 in TNBC is therefore important for a comprehensive understanding of its biological functions.

Analysis of TCGA data revealed that Gemin4 expression was significantly lower in BRCA tissues than in normal tissues (Fig. 6A), and low Gemin4 levels were significantly associated with advanced clinical stage in BRCA patients (Fig. 6B). Kaplan–Meier survival analysis showed that patients with higher Gemin4 expression had better overall survival (Fig. 6C). We then evaluated the function of Gemin4 in TNBC progression: Gemin4 knockdown (Fig. 6D) significantly promoted the proliferation and migration of MDA-MB-231 cells (Fig. 6E, F). Moreover, Gemin4 knockdown reversed the inhibitory effect of XLH-36 knockout on TNBC cell proliferation and migration (Fig. 6G, H). Given that XLH-36 directly interacts with Gemin4, we next asked whether XLH-36 regulates Gemin4 expression or vice versa. XLH-36 overexpression did not affect Gemin4 expression (Supplementary Fig. 8A, B), and Gemin4 knockdown had no effect on XLH-36 expression (Supplementary Fig. 8C, D), indicating that neither protein regulates the other's expression. Overall, these data suggest that XLH-36 exerts its oncogenic function by binding to Gemin4, which acts as a novel tumor suppressor in TNBC.
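The survival comparison in Fig. 6C rests on two standard tools: the Kaplan–Meier product-limit estimator and the two-group log-rank test. The sketch below is a generic textbook implementation in plain Python, not the authors' pipeline; it assumes one record per patient (follow-up time plus a 1/0 death/censoring flag) and that patients have already been split into high- and low-expression groups. The cutoff used for that split, the function names, and the toy data are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate S(t) at each distinct death time.

    times:  follow-up time for each patient (e.g. months)
    events: 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Collect all patients who share this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            censored += 1 - data[i][1]
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Both deaths and censored patients leave the risk set after time t.
        n_at_risk -= deaths + censored
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in pooled if e})
    obs_minus_exp, variance = 0.0, 0.0
    for t in death_times:
        n = sum(1 for tt, _, _ in pooled if tt >= t)
        n_a = sum(1 for tt, _, g in pooled if tt >= t and g == 0)
        d = sum(e for tt, e, _ in pooled if tt == t)
        d_a = sum(e for tt, e, g in pooled if tt == t and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy cohorts (hypothetical follow-up times in months; 1 = death, 0 = censored).
low_group = ([3, 5, 8, 12, 15, 20], [1, 1, 1, 0, 1, 0])
high_group = ([10, 18, 24, 30, 36, 40], [0, 1, 0, 1, 0, 0])
print(kaplan_meier(*low_group))
print(log_rank(*low_group, *high_group))
```

For real cohorts, an established survival library would normally be preferred; the point here is only to show what the curves and the log-rank P value summarize.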

Fig. 6: Gemin4 serves as a tumor suppressor gene and is involved in XLH-36-mediated regulation of TNBC progression.

A Gemin4 is significantly downregulated in BRCA patients (n = 1090) versus adjacent normal tissues (n = 113) from TCGA. B Gemin4 expression is significantly associated with the clinical stage of BRCA patients. C Overall survival curves of BRCA patients grouped according to Gemin4 expression, generated by Kaplan–Meier analysis and compared using the log-rank test. Gemin4 knockdown (D) promotes MDA-MB-231 cell proliferation (E) and migration (F). Cell proliferation (G) and migration (H) assays were performed on WT and XLH-36 KO MDA-MB-231 cells with Gemin4 knockdown. Data are presented as mean ± SD from n = 3 biologically independent experiments. *P < 0.05, **P < 0.01, ***P < 0.001.

XLH-36 regulates TNBC metastasis primarily through ICAM1 and S100A4

We next performed RNA-seq analysis in XLH-36-KO cells to identify the downstream pathways through which XLH-36 promotes TNBC metastasis. A total of 640 differentially expressed genes (DEGs; fold change ≥ 2 in either direction and P < 0.05) were identified, including 310 upregulated and 330 downregulated genes (Supplementary Fig. 9A). KEGG pathway enrichment and GSEA revealed that these DEGs were enriched in ECM–receptor interaction, focal adhesion, and cell adhesion molecules (CAMs), which are functionally involved in cell migration and invasion (Fig. 7A and Supplementary Fig. 9B). The top 10 upregulated and top 10 downregulated genes from the RNA-seq data (Fig. 7B) were selected for qRT-PCR validation (Fig. 7C). Among these, ICAM1 and S100A4 showed the most pronounced XLH-36-dependent changes in mRNA or protein expression (Fig. 7D and Supplementary Fig. 9C, D).
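The DEG thresholds stated above amount to a simple filter on each gene's fold change and P value: "fold change ≥ 2 in either direction" is equivalent to |log2 FC| ≥ 1. The sketch below assumes a per-gene KO/WT table is already available from the RNA-seq pipeline. The record type, the placeholder gene names, and ranking "top" genes by absolute log2 fold change are illustrative assumptions, since the text does not state how the top 10 genes were ranked or whether P values were adjusted.

```python
import math
from dataclasses import dataclass


@dataclass
class GeneResult:
    """One row of a hypothetical KO-vs-WT differential expression table."""
    gene: str
    fold_change: float  # KO / WT expression ratio on a linear scale
    p_value: float


def classify_degs(results, min_fold=2.0, alpha=0.05):
    """Split genes into up- and downregulated lists.

    A gene counts as a DEG when P < alpha and |log2 FC| >= log2(min_fold).
    """
    cutoff = math.log2(min_fold)
    up, down = [], []
    for r in results:
        if r.p_value >= alpha:
            continue
        lfc = math.log2(r.fold_change)
        if lfc >= cutoff:
            up.append(r)
        elif lfc <= -cutoff:
            down.append(r)
    return up, down


def top_n(genes, n=10):
    """Rank DEGs by absolute log2 fold change (an assumed ranking criterion)."""
    return sorted(genes, key=lambda r: abs(math.log2(r.fold_change)), reverse=True)[:n]


# Toy records with placeholder gene names (not data from this study).
results = [
    GeneResult("GENE_A", 3.0, 0.01),    # upregulated
    GeneResult("GENE_B", 0.25, 0.001),  # downregulated, 4-fold
    GeneResult("GENE_C", 1.5, 0.001),   # below the fold-change cutoff
    GeneResult("GENE_D", 4.0, 0.20),    # not significant
    GeneResult("GENE_E", 0.5, 0.04),    # downregulated, exactly 2-fold
]
up, down = classify_degs(results)
print("up:", [r.gene for r in up])
print("down:", [r.gene for r in top_n(down)])
```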

Fig. 7: XLH-36 regulates TNBC metastasis primarily through ICAM1 and S100A4.

A GSEA of the cell migration signature between XLH-36-KO and WT MDA-MB-231 cells. B Heatmap of the top 10 up- and downregulated differentially expressed genes between XLH-36-KO and WT MDA-MB-231 cells. C Left, qRT-PCR validation of the top 10 differentially upregulated genes between XLH-36-KO and WT MDA-MB-231 cells. Right, qRT-PCR validation of the top 10 differentially downregulated genes between XLH-36-KO and WT MDA-MB-231 cells. D IB of S100A4 and ICAM1 expression in WT versus XLH-36-KO MDA-MB-231 cells. E S100A4 is significantly downregulated in BRCA patients (n = 1090) versus adjacent normal tissues (n = 113) from TCGA. Cell proliferation (F) and migration capacity (G) were evaluated in MDA-MB-231 cells transfected with si-NC or si-S100A4. Cell migration assays (H) were performed on WT or XLH-36 KO MDA-MB-231 cells with S100A4 knockdown. I ICAM1 is significantly upregulated in BRCA patients (n = 1090) versus adjacent normal tissues (n = 113) from TCGA. J Data from GSE76250 indicate that ICAM1 is significantly upregulated in TNBC patients (n = 165) versus adjacent normal tissues (n = 33). Cell proliferation (K) and migration capacity (L) were evaluated in MDA-MB-231 cells transfected with si-NC or si-ICAM1. Cell migration assays (M) were performed on control or XLH-36-overexpressing MDA-MB-231 cells with ICAM1 knockdown. Data are presented as mean ± SD from n = 3 biologically independent experiments. *P < 0.05, **P < 0.01, ***P < 0.001.

ICAM1 (intercellular adhesion molecule 1), also known as CD54, is a member of the immunoglobulin superfamily (IGSF) of adhesion molecules [25]. It regulates various biological processes, including cell adhesion, angiogenesis, tumor evolution, apoptosis, and metastasis [26]. S100A4, also known as fibroblast-specific protein 1 (FSP1) or secreted calcium-binding protein S100A4, is a member of the S100 protein family and plays critical roles in cell migration, invasion, angiogenesis, and epithelial-mesenchymal transition (EMT) [27, 28]. However, the mechanisms by which S100A4 promotes cancer progression are complex and not fully understood. To investigate the roles of ICAM1 and S100A4 in XLH-36-mediated TNBC progression and metastasis, we first analyzed clinical databases (TCGA and GTEx) and performed loss-of-function experiments. S100A4 expression was markedly lower in BRCA tissues (Fig. 7E), and S100A4 knockdown increased cell proliferation and migration (Fig. 7F, G). Moreover, S100A4 knockdown counteracted the inhibitory effects of XLH-36 knockout (Fig. 7H and Supplementary Fig. 9E). Conversely, ICAM1 was highly expressed in BRCA tissues and TNBC samples (Fig. 7I, J), and ICAM1 knockdown reduced proliferation and migration (Fig. 7K, L), effects that were reversed by XLH-36 overexpression (Fig. 7M and Supplementary Fig. 9F). These data indicate that ICAM1 and S100A4 are key genes regulated by XLH-36 in TNBC development.

XLH-36/Gemin4 interaction promotes ICAM1 expression and EMT by regulating S100A4 mRNA splicing

We next assessed the effect of the XLH-36/Gemin4 interaction on S100A4 and ICAM1 expression and investigated the regulatory relationship between ICAM1 and S100A4. Gemin4 knockdown decreased S100A4 levels but had no effect on ICAM1 levels (Fig. 8A and Supplementary Fig. 10A). S100A4 knockdown increased ICAM1 expression without affecting XLH-36 or Gemin4 expression (Fig. 8B and Supplementary Fig. 10B), whereas ICAM1 knockdown did not affect the expression of XLH-36, Gemin4, or S100A4 (Fig. 8C and Supplementary Fig. 10C). Because Gemin4 plays an important role in pre-mRNA splicing, which is crucial for proper mRNA maturation, we postulated that Gemin4 participates in S100A4 pre-mRNA splicing and thereby contributes to S100A4 mRNA stability. To test this, wild-type (WT) and Gemin4-knockdown (KD) cells were treated with α-amanitin to block RNA polymerase II transcription, RNA was isolated at 0, 2, 4, 6, and 8 h after treatment, and the decay of S100A4 transcripts was measured by quantitative RT-PCR (qRT-PCR). S100A4 mRNA levels were significantly decreased in Gemin4 knockdown cells (Fig. 8D, E). In addition, cells were treated with cycloheximide to block new protein synthesis, and endogenous S100A4 protein was almost completely suppressed in Gemin4 knockdown cells from 0 to 6 h (Fig. 8F). This is consistent with the marked reduction in S100A4 mRNA levels caused by Gemin4 knockdown.

Fig. 8: The XLH-36/Gemin4 interaction promotes ICAM1 expression and EMT by regulating S100A4 mRNA splicing.

IB of XLH-36, Gemin4, S100A4, and ICAM1 expression in MDA-MB-231 cells transfected with si-NC, si-Gemin4 (A), si-S100A4 (B), or si-ICAM1 (C). D, E The decay rates of total mRNAs were assessed by quantitative RT-PCR in si-NC and si-Gemin4 cells after transcription inhibition with α-amanitin. F The indicated MDA-MB-231 cells were treated with CHX (100 μg/mL), and Gemin4 and S100A4 protein levels were examined at the indicated time points. G Immunofluorescence staining of Gemin4 (red) in WT and XLH-36-KO MDA-MB-231 cells. Scale bar: 10 μm. H IB of TWIST1 and SNAI1 expression in the indicated MDA-MB-231 cells. I Left, representative IF images of TWIST1 in WT and XLH-36-KO MDA-MB-231 cells. Right, representative IF images of TWIST1 in frozen tumor sections from mice subcutaneously injected with WT or XLH-36-KO MDA-MB-231 cells. Data are presented as mean ± SD from n = 3 biologically independent experiments. *P < 0.05, **P < 0.01, ***P < 0.001.

The SMN complex, including Gemin4, localizes primarily to the cytoplasm but shuttles to the nucleus, where it participates in the biogenesis and maintenance of spliceosomal snRNPs [29]. Because XLH-36 is a cytoplasmic micropeptide, we hypothesized that it binds Gemin4 and prevents its nuclear import, thereby reducing S100A4 expression. Immunofluorescence showed that XLH-36 knockout decreased cytoplasmic Gemin4 and increased nuclear Gemin4 (Fig. 8G), whereas XLH-36 overexpression had the opposite effect (Supplementary Fig. 10D, E). Collectively, these data suggest that XLH-36 binds Gemin4 and retains it in the cytoplasm, thereby preventing Gemin4 from promoting S100A4 mRNA splicing and stability.

Notably, the regulatory roles of S100A4 and ICAM1 in tumor metastasis are associated with epithelial-mesenchymal transition (EMT) [30, 31]. Based on the above results, we infer that decreased S100A4 expression triggers a compensatory increase in ICAM1 levels in TNBC cells, although the underlying mechanism requires further investigation. We next explored how XLH-36, Gemin4, S100A4, and ICAM1 relate to EMT. The expression of the EMT activators TWIST1 and SNAI1 was significantly reduced in XLH-36-KO and ICAM1-KD cells, whereas Gemin4 or S100A4 knockdown enhanced TWIST1 and SNAI1 expression (Fig. 8H). In addition, IF experiments in MDA-MB-231 cells and xenograft tumor tissues showed that XLH-36 knockout inhibited TWIST1 expression (Fig. 8I), whereas XLH-36 overexpression promoted it (Supplementary Fig. 10F, G). Taken together, these data indicate that XLH-36 is a Gemin4-binding micropeptide that promotes EMT by regulating ICAM1 and S100A4 expression.

Discussion

Here, we present evidence that the lncRNA C5orf66-AS1 encodes a previously uncharacterized micropeptide, XLH-36, which mediates TNBC progression and metastasis. Specifically, XLH-36 acts as an oncopeptide that directly binds Gemin4 to suppress S100A4 and induce ICAM1 expression, thereby promoting EMT and cancer metastasis (Fig. 9). Together, these data provide new insight into the molecular regulatory network underlying EMT and metastasis in TNBC.

Fig. 9: The proposed model for the role of C5orf66-AS1-encoded XLH-36 in TNBC metastasis.

In this study, we found that C5orf66-AS1 encodes a novel endogenous micropeptide, XLH-36. Conservation analysis across 101 species showed that XLH-36 is highly conserved among primates, suggesting that it may have important biological functions. Furthermore, analysis of 1295 clinical BRCA and TNBC samples from different sources showed that XLH-36 is highly expressed in BRCA and TNBC tissues. Importantly, XLH-36 expression was significantly lower in early-stage patients than in advanced-stage patients. In addition, RNA-seq analysis of 32 cancer types from TCGA cohorts showed that XLH-36 expression differs among tumor types (Supplementary Fig. 11), including lung squamous cell carcinoma (LUSC), head and neck squamous cell carcinoma (HNSC), bladder urothelial carcinoma (BLCA), colon adenocarcinoma (COAD), sarcoma (SARC), stomach adenocarcinoma (STAD), and glioblastoma multiforme (GBM); its roles in these tumors warrant further research.

Recent studies have demonstrated that C5orf66-AS1, which is annotated as “non-coding RNA”, has distinct biological functions in multiple cancers, including pituitary null cell adenoma, cervical cancer, and gastric cancer [32,33,34]. In this study, we showed that the micropeptide XLH-36, but not the lncRNA C5orf66-AS1, exerts specific pro-cancer effects in TNBC. Whether XLH-36 exerts similar effects in other tumors remains to be determined.

Moreover, TNBC is an exceptionally heterogeneous disease, and treatment options beyond conventional chemotherapy are limited [35]. Although several oncogenic factors (such as MMPs [36], CDK4/6 [37], SET [38], and mutant BRCA1 [39]) have been identified in TNBC, few micropeptides have been found to play oncogenic roles. Recent advances in immunotherapy show promise in TNBC [40]. In the KEYNOTE-355 study [41], treatment of PD-L1-positive metastatic TNBC with immune checkpoint inhibitors (ICIs) combined with chemotherapy reduced the risk of mortality by 9% at a 50-month follow-up. Here, we found that TNBC patients with low XLH-36 expression had an overall survival rate 20 percentage points higher than those with high XLH-36 expression (85% vs. 65%) at the same follow-up time, suggesting that XLH-36 may serve as a prognostic biomarker. Future studies with larger clinical cohorts, including liquid biopsies, will be needed to determine the value of XLH-36 in the diagnosis and prognosis of TNBC.
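The survival difference cited above, 85% versus 65%, is an absolute gap between two rates. Expressed relative to the high-expression group it is larger, so the two ways of stating it should be kept distinct:

```latex
\Delta_{\text{abs}} = 85\% - 65\% = 20 \text{ percentage points},
\qquad
\Delta_{\text{rel}} = \frac{85 - 65}{65} \approx 30.8\%.
```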

Recent studies have reported that Gemin4 gene polymorphisms are associated with the progression of colon, bladder, liver, and kidney cancers [42, 43], and that high Gemin4 expression is associated with poor prognosis in patients with basal-like breast cancer (BLBC) [44]; however, its specific role in TNBC tumorigenesis remains unclear. In the present study, we showed that low Gemin4 expression is associated with lower overall survival in BRCA patients and that Gemin4 inhibits the proliferation and migration of TNBC cells. Furthermore, our Co-IP, yeast two-hybrid, molecular docking, and MST affinity assays showed that XLH-36 binds directly to Gemin4. Of note, although XLH-36 and Gemin4 interact, they do not affect each other's expression in TNBC cells. In the future, structural methods such as cryo-EM or co-crystallization could be used to determine all binding sites between the two molecules precisely. Here, we document the critical role of Gemin4 in TNBC, and our results suggest that Gemin4 might serve as a target for TNBC treatment, although this needs to be validated in more clinical samples.

Gemin4 specifically interacts with a DEAD-box protein and several core spliceosomal proteins, suggesting that it is directly involved in the assembly and regeneration of the spliceosomes required for pre-mRNA splicing in the nucleus. S100A4 is a member of the S100 family of calcium-binding proteins and has been shown to play an important role in tumor metastasis by regulating adhesion, extracellular matrix remodeling, and cell motility.

Our data show that XLH-36 localizes to the cytoplasm, binds directly to Gemin4, and inhibits its entry into the nucleus, thereby preventing Gemin4 from promoting S100A4 mRNA splicing. ICAM1 is an intercellular adhesion molecule involved in the development of a wide range of tumors through the regulation of cell adhesion, angiogenesis, tumor evolution, apoptosis, metastasis, and other biological processes [45]. Whether ICAM1 and S100A4 regulate each other has been unclear. We observed significant upregulation of ICAM1 and the EMT markers TWIST1 and SNAI1 together with decreased S100A4 expression, suggesting that TNBC cells may activate compensatory mechanisms to promote survival and metastasis.

In summary, we reveal that the novel micropeptide XLH-36 is significantly upregulated in TNBC and that its expression is significantly correlated with tumor stage and patient overall survival. XLH-36 may therefore serve as a therapeutic target and as a diagnostic and prognostic biomarker in TNBC. Our study not only represents a new step in elucidating the functions of micropeptides and translating them to the clinic, but also adds a new protein (peptide) to the human proteome.