Introduction

Breast cancer is the leading cause of cancer-related deaths among women globally. Dysfunctional signaling pathways, genetic alterations in regulatory sequences or chromatin regulators, and changes in gene expression regulation at the transcription level are hallmarks of cancerous cells1. In eukaryotes, RNA polymerase II (RNAP II) transcribes all protein-coding genes, as well as miRNAs and lncRNAs2. The transcription initiation of protein-coding genes requires the basal transcription factors TFIID, TFIIA, TFIIB, TFIIE, and TFIIH. These factors are sequentially assembled on the core promoter to recruit the TFIIF-RNAP II complex, forming the transcription preinitiation complex (PIC)3,4. The core promoter spans about 100 bp around the transcription start site (TSS) and includes motifs such as the TATA-box, BRE, Inr, and DPE5.

The TATA-box binding protein (TBP) plays a role in RNA polymerase I, II, and III transcription in eukaryotes6. TBP is the founding member of a small eukaryotic family of TBP paralogs6,7. TBP-associated factors (TAFs), together with TBP, assemble into the TFIID complex, which appears in various forms depending on the tissue and cell type8. TBP consists of two domains, the C-terminal (core) and the N-terminal domains. The C-terminal domain is highly conserved, has a saddle-shaped structure, and carries both a convex and a concave surface9. The concave surface binds to the TATA-box through the minor groove by an induced-fit mechanism9. Intriguingly, the N-terminal domain varies in sequence and size among species yet remains conserved among vertebrates10.

TBPL2 (TBP2 or TRF3) is the most recently described member of the TBP family11,12. It has only been found in vertebrates and shows 95% similarity to TBP’s core domain7,10. It stimulates basal transcription from TATA-box promoters and interacts with TFIIA and TFIIB in vitro7,10. Some plants also have two TBP genes that are very similar in sequence, including Zea mays12 and Arabidopsis thaliana13,14. TBPL2 is expressed in oocytes, early embryos, and embryonic stem cells, with conserved patterns in frogs, zebrafish, and mice10,15. The absence of TBP and the abundance of TBPL2 indicate that TBPL2 functions as a TBP replacement factor in several biological systems12,15,16,17.

The TATA-box binding protein-like 1 (TBPL1, TLF, or TRF2) is a less conserved, distant metazoan paralog10,18,19,20,21,22. It shows only 40% similarity with the TBP core domain23. TBPL1 is unique in that it cannot bind directly to the TATA-box10,18,19,20,21,22. However, it does interact with TFIIA and TFIIB10,18,19,20,21,22,23. TBPL1 forms a complex with TFIIA in HeLa cells24 and also interacts with the TFIIA-like factor (ALF), a germ cell-specific paralog of TFIIA25. Notably, TBPL1 negatively modulates transcription by sequestering TFIIA26,27. In contrast, the TBPL1 interaction with TFIIA activates the transcription of the TATA-less promoter of the NF1 gene26. In chickens, TBPL1 inhibits transcription from TATA-box-containing promoters while activating transcription from TATA-less promoters27. The role of TBPL1 in worms, frogs, and fishes has been examined using knockdown and dominant-negative strategies28,29,30,31,32. In these organisms, loss of TBPL1 function leads to an early halt in embryonic development and to downregulation of certain gene subsets10. Numerous transcripts in frogs require TBPL1 but not TBP or TBPL230. TBPL1 does not affect normal embryogenesis in mice, but it is crucial for proper spermatogenesis33,34. This might suggest a functional specialization in different species10,18,19,20,21,22,33,35,36,37,38.

TBPL1 has recently been linked to the development and progression of cancer. The downregulation of miR-18a39 and miR-133b40 was demonstrated in colorectal cancer cells and tissues. These miRNAs regulate the expression of TBPL1, and their overexpression drives a significant decrease in the invasion and proliferation of cancer cells39,40. In addition, a study of salivary gland tumors revealed that the TBPL1 gene was part of a fusion product with the Nuclear Factor I B (NFIB) gene, suggesting its participation in oncogenic chromosome rearrangements41. The TBPL1 gene was significantly mutated in samples of pulmonary carcinoid tumors42. Another study on the pathways affected by genomic changes revealed that TBPL1 is associated with the NF-κB signaling pathway, which is linked to cell proliferation, metastasis, and epithelial-mesenchymal transition42. TBPL1 has also been implicated in cancer in other studies; for example, a Mendelian randomization analysis identified this transcription factor as one of five genes whose expression showed a significant causal relationship with prostate cancer risk43.

The existence of TBP paralogs raises the question of why more than one TBP-related factor is required in metazoans, including vertebrates, when a single such protein could in principle carry out this function in transcription. Global gene expression analysis supports one plausible hypothesis: additional components of the basal transcription machinery are required to accommodate the more elaborate gene regulatory networks of these organisms21. Switching of TBP family members during development and differentiation is another intriguing explanation for multiple TBP paralogs10.

The aim of this study was to characterize the transcriptome associated with the TBPL1 gene in breast cancer cell lines and the effect of knocking out this gene on breast cancer regulation. For this purpose, the TBPL1 gene was mutated using CRISPR/Cas9 technology, and wild-type and knocked-out cell lines were compared by RNA-seq. We obtained the transcriptomes of the healthy MCF-12F breast cell line and the T47D, SK-BR-3, and MDA-MB-231 breast cancer cell lines, which revealed differential expression patterns of downstream genes, including the upregulation of several novel transcripts. The current study suggests a potential role of the TBPL1 gene in cancer aggressiveness, cell migration, invasion, metastasis, and the EMT pathways, as well as the involvement of different genes in regulating apoptosis in breast cancer cell lines. Together, these findings indicate the involvement of the TBPL1 gene in cancer regulation.

Results

TBP and TBPL1 genes show differential gene expression behavior among different breast cancer cell lines

Human cells possess three members of the TBP family: TBP, TBPL1, and TBPL27. Initially, we determined the mRNA gene expression levels of these genes via end-point RT-PCR, using the HCC-1937, MCF-7, T47D, SK-BR-3, and MDA-MB-231 breast cancer cell lines, with the MCF-12F and MCF-10A healthy breast cell lines selected as controls (Fig. 1). TBP and TBPL1 are expressed in all cell lines at varied levels, but TBPL2 is not (Fig. 1B–D). As controls, we also assessed the expression of the B2M gene, which encodes the Beta-2-Microglobulin (Fig. 1E), as well as the integrity of RNA28S and RNA18S ribosomal transcripts in total RNA samples (Fig. 1A).

Fig. 1

TBP and TBPL1 genes, but not TBPL2, are expressed in healthy and breast cancer cell lines. Total RNA was isolated with Trizol, and end-point RT-PCR was performed with the oligonucleotides shown in Table 2. (A) Electrophoresis on a 1% bleach agarose gel showing the integrity of the RNA28S and RNA18S ribosomal transcripts in MCF-12F and MCF-10A healthy cell lines (lanes 1 and 2, respectively), and MCF-7, T47D, SK-BR-3, and MDA-MB-231 breast cancer cell lines (lanes 3 to 7, respectively). (B) Amplification of a 232 bp DNA fragment corresponding to the TBP gene in all breast cell lines. (C) Amplification of a 289 bp DNA fragment corresponding to the TBPL1 gene in all cell lines. (D) The TBPL2 gene was not expressed in any of the healthy or breast cancer cell lines. (E) As an internal control, a 298 bp DNA fragment corresponding to the B2M gene was amplified in all the breast cell lines studied.

The quantitative analysis of the mRNA levels of the TBP and TBPL1 genes via RT-qPCR showed distinct expression patterns for each cell line studied (Fig. 2). TBP showed consistent expression levels in the healthy cell lines (MCF-10A and MCF-12F). However, its expression differed among the breast cancer cell lines. TBP was underexpressed in the HCC1937, MCF-7, and T47D breast cancer cell lines (Fig. 2A). Conversely, it was overexpressed in the more aggressive SK-BR-3 and MDA-MB-231 breast cancer cell lines. Interestingly, there may be a direct correlation between the expression level of this gene and the molecular classification of the breast cancer cell lines. MCF-7 and T47D, which belong to the Luminal A subtype and are neither highly aggressive nor invasive, display similarly low levels of TBP mRNA expression.

Fig. 2

TBP and TBPL1 genes are differentially expressed in breast cancer cell lines according to the stage and molecular classification. (A) Relative TBP mRNA expression levels were evaluated by RT-qPCR using the RNA18SN1 gene as a housekeeping gene and normalized to the values obtained for MCF-10A. (B) Relative TBPL1 mRNA expression levels were evaluated by RT-qPCR. The results shown are from three independent experiments performed in triplicate, and the data were analyzed with the 2^(-ΔΔCt) method. Statistical analysis was carried out with an unpaired Student’s t-test with Welch’s correction, and the asterisks show the p-value for significance.

In contrast, SK-BR-3, which is classified as HER2-enriched, showed a 400% overexpression of TBP. This level could be tied to the aggressiveness, invasiveness, and ability of SK-BR-3 to form undifferentiated tumors in immunocompromised mice44. Notably, MDA-MB-231, a cell line that is triple-negative for estrogen (ER), progesterone (PR), and HER2 receptors, showed nearly 750% TBP gene overexpression compared to healthy cell lines. This is in line with its characteristics: the ability to invade, proliferate, and form tumors in mice, as well as the poorest patient survival prognosis, that of triple-negative cancer. Conversely, HCC1937, despite being triple-negative, exhibited a lower TBP gene expression level than MDA-MB-231, possibly due to its stage IIB, grade 3 designation per the TNM classification system.

RT-qPCR analysis of the TBPL1 gene in healthy control cell lines revealed consistent expression levels. In the HCC1937 and MCF-7 cell lines, we observed a reduction in TBPL1 expression of 60% and 50%, respectively, compared to healthy cell lines. Conversely, T47D, SK-BR-3, and MDA-MB-231 showed increases in TBPL1 expression of 30%, 90%, and 120%, respectively (Fig. 2B). The increase in TBPL1 expression in the latter two cell lines potentially correlates with their greater aggressiveness. Notably, in HCC1937 the expression pattern of the TBPL1 gene mirrors that of TBP, which is expressed at relatively low levels compared to MDA-MB-231. Therefore, the low TBP and TBPL1 expression levels in HCC1937 might be linked to the more favorable prognosis of stage IIB breast cancer, as opposed to the more severe prognosis of the stage IV MDA-MB-231 line, even though both are triple-negative.
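The relative expression values discussed above come from the 2^(-ΔΔCt) method. The following is a minimal sketch of that calculation, assuming mean Ct values per replicate set; the function names and the Ct values in the usage lines are hypothetical and are not taken from this study. The conversion from fold change to a percent change is also an assumption about how the percentages in the text were derived.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change of a target gene by the 2^(-ddCt) method.

    ct_target / ct_reference: replicate Ct values of the target gene and the
    housekeeping gene in the sample of interest.
    ct_target_cal / ct_reference_cal: the same two genes in the calibrator
    sample (for example, a healthy cell line).
    """
    d_ct_sample = mean(ct_target) - mean(ct_reference)
    d_ct_calibrator = mean(ct_target_cal) - mean(ct_reference_cal)
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


def percent_change(fold_change):
    """Express a fold change as a percent increase (+) or decrease (-)."""
    return (fold_change - 1.0) * 100.0


# Hypothetical Ct values: the target amplifies one cycle earlier in the
# sample than in the calibrator, with an unchanged housekeeping gene.
fold = relative_expression(
    ct_target=[24.0, 24.0, 24.0],
    ct_reference=[10.0, 10.0, 10.0],
    ct_target_cal=[25.0, 25.0, 25.0],
    ct_reference_cal=[10.0, 10.0, 10.0],
)
change = percent_change(fold)
```

Under this convention a one-cycle difference corresponds to a two-fold change, that is, a 100% increase over the calibrator.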

Silencing of the TBPL1 gene in breast cancer cell lines

Given that TBP is known to be vital for cell survival in several organisms due to its crucial role in transcription conducted by the three RNA polymerases, we concentrated our study on the role of TBPL1 in these cell lines. Initially, we assessed the quantity of TBPL1 expressed in cells via immunofluorescence and confocal microscopy (Fig. 3). Our focus was particularly on T47D, SK-BR-3, and MDA-MB-231 as these cell lines exhibited overexpression of the TBPL1 mRNA (Fig. 2). The TBPL1 protein was observed to be overexpressed, primarily localizing in the cytoplasm in all three breast cancer cell lines, compared with the MCF-12F healthy cell line. This observation corroborated the overexpression determined by RT-qPCR.

Fig. 3

Intracellular localization of TBPL1 protein by immunofluorescence and confocal microscopy in healthy and breast cancer cell lines. Cells were fixed and incubated with antibodies against TBPL1 (1:200 dilution) and stained with Rhodamine-Phalloidin (1:100 dilution) for the actin cytoskeleton and with Hoechst 33342 for the nuclei.

To ascertain the potential function of TBPL1 in breast cancer cell lines, we employed the CRISPR/Cas9 method to knock out this gene, using the U6-gRNA Cas-92A + GFP-RM CRISPR plasmid. We confirmed the knockout by RT-PCR in transfected MCF-12F, T47D, SK-BR-3, and MDA-MB-231 cell lines (Fig. 4B,D,F,H, respectively), compared with untransfected cell lines (Fig. 4A,C,E,G). Notably, no TBPL1 gene expression was observed in any of the transfected cell lines. We also documented the integrity of the RNA28S and RNA18S ribosomal transcripts in total RNA samples from all four cell lines (Fig. 4I,J). High-quality RNA samples were prepared in triplicate, yielding 24 samples (Suppl. Table 1), which were used for next-generation sequencing (NGS) under both wild-type and mutated conditions.

Fig. 4

Determination of the TBPL1 gene expression levels in untransfected healthy and breast cancer cell lines and in the same lines transfected with the U6-gRNA Cas-92A + GFP-RM CRISPR plasmid. The TBPL1 mRNA levels were measured by end-point RT-PCR in all cell lines. As a control, the B2M mRNA expression level was also determined in all samples. RNA samples of the three independent experimental replicates for nontransfected and transfected cell lines were used as templates for RT-PCR assays before sending them for RNA-seq analyses. (A, C, E, G) MCF-12F, T47D, SK-BR-3, and MDA-MB-231 nontransfected (NT) cell lines. (B, D, F, H) MCF-12F, T47D, SK-BR-3, and MDA-MB-231 transfected (T) cell lines. (I) RNA gel electrophoresis shows the integrity of RNA18S and RNA28S ribosomal transcripts for all healthy and breast cancer cell lines.

Western blot assays were used to determine the TBPL1 protein levels in both wild-type and knocked-out cells. Wild-type T47D, SK-BR-3, and MDA-MB-231 cells expressed the TBPL1 protein (Fig. 5, lanes 4, 6, and 8), whereas MCF-12F cells did not (Fig. 5, lane 2). Knocked-out T47D and MDA-MB-231 cells exhibited a significant decrease in TBPL1 (Fig. 5, lanes 5 and 9, respectively), while no TBPL1 was detected in knocked-out SK-BR-3 and MCF-12F cells (Fig. 5, lanes 7 and 3, respectively). Tubulin was detected in both wild-type and mutated cells. Although TBPL1 mRNA was not detected by RT-PCR in the knocked-out cells, the residual amounts of protein varied. This is potentially due to a longer half-life of the protein in T47D and MDA-MB-231 cells. Alternatively, it could be attributed to the cells not being incubated with G-418 immediately following transfection with the plasmid.

Fig. 5

Determination of the TBPL1 protein expression level by Western blot. Protein extracts from nontransfected cell lines and from cell lines transfected with the U6-gRNA Cas-92A + GFP-RM CRISPR plasmid were analyzed by Western blot. Membranes were probed with anti-TBPL1 and anti-Tubulin antibodies and developed by chemiluminescence as described in Methods.

Transcriptomic analyses using next-generation sequencing technology

The RNA-Seq analyses are divided into two sections: the first part shows the differential gene expression found in healthy vs. breast cancer cell lines, and the second displays the comparison between the wild-type and the corresponding TBPL1 mutated cell line.

Differential expression analysis of healthy vs breast cancer cell lines

The differential gene expression analysis was conducted between the MCF-12F healthy cell line and the SK-BR-3, T47D, and MDA-MB-231 breast cancer cell lines. We identified 3860 genes that exhibited significant changes in their expression levels based on false discovery rate (FDR) values for overexpressed and underexpressed genes. The volcano plots display the relationship between statistical significance (log10FDR) and the log2 fold change in gene expression for T47D (Fig. 6A), SK-BR-3 (Fig. 6B), and MDA-MB-231 (Fig. 6C), each compared to MCF-12F.
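As an illustration of how genes can be called overexpressed or underexpressed from FDR values and log2 fold changes, the sketch below applies the Benjamini-Hochberg adjustment to a list of raw p-values and then classifies each gene. The choice of Benjamini-Hochberg, the cutoffs (FDR < 0.05, |log2 fold change| >= 1), and all function names are assumptions made for the example; the thresholds actually used in this study are those given in Methods.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_gene(log2_fc, fdr, fdr_cutoff=0.05, log2_fc_cutoff=1.0):
    """Label one gene using hypothetical significance cutoffs."""
    if fdr >= fdr_cutoff or abs(log2_fc) < log2_fc_cutoff:
        return "not significant"
    return "overexpressed" if log2_fc > 0 else "underexpressed"


def volcano_point(log2_fc, fdr):
    """Coordinates of a gene on a volcano plot: (log2 FC, -log10 FDR)."""
    return (log2_fc, -math.log10(fdr))


# Hypothetical raw p-values and log2 fold changes for four genes.
raw_p = [0.01, 0.04, 0.03, 0.005]
log2_fcs = [2.0, -1.5, 0.3, 3.1]
fdrs = benjamini_hochberg(raw_p)
calls = [classify_gene(fc, q) for fc, q in zip(log2_fcs, fdrs)]
```

Each gene then maps to one point of a volcano plot through `volcano_point`, with the significant genes falling in the upper left and upper right regions.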

Fig. 6

Gene expression profiles of healthy and breast cancer cell lines. RNAseq data for each cell line were obtained as described in Methods. Comparisons were then carried out between each breast cancer cell line and the healthy breast cell line. Volcano plot of the genes identified in T47D (A), SK-BR-3 (B), and MDA-MB-231 (C). (D) Principal Component Analysis (PCA) of the data obtained for the three replicates for each cell line. (E) Hierarchical clustering of the genes that were found up-regulated and down-regulated in the healthy and breast cancer cell lines.

The Principal Component Analysis (PCA) plot shows that the three replicates of each cell line group together and apart from those of the other cell lines (Fig. 6D). The three breast cancer cell lines are distinctly separated from the healthy cell line. As anticipated from their molecular classifications, T47D (Luminal A) appears more closely related to SK-BR-3 (HER2 overexpressing), and both are distant from MDA-MB-231 (Triple Negative).

The genes that altered their expression level across all studied breast cell lines were categorized using the k-means algorithm. The heatmaps show that distinctive expression patterns are present in each cell line (Fig. 6E). For instance, MCF-12F presents an exclusive cluster of 1,684 upregulated genes, only seen in this healthy cell line (Fig. 6E, cluster 1). Notably, the same signature appears downregulated in breast cancer cell lines. SK-BR-3 possesses a unique cluster of 780 genes that are specifically overexpressed (Fig. 6E, cluster 2). Similarly, T47D exhibits 795 upregulated genes (Fig. 6E, cluster 3), while the more aggressive cell line, MDA-MB-231, also presents a distinct signature of 582 overexpressed genes (Fig. 6E, cluster 4). Intriguingly, the genes seen in clusters 2, 3, and 4 are downregulated in the healthy cell line, hinting at their potential role in breast cancer.
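To make the clustering step concrete, the following is a minimal, dependency-free sketch of k-means applied to per-gene expression profiles. It assumes that each gene is standardized (z-scored) across samples before clustering and uses a deterministic farthest-point initialization; both choices, the function names, and the toy expression values are illustrative assumptions rather than the procedure used for Fig. 6E.

```python
from statistics import mean, pstdev


def zscore(row):
    """Standardize one gene's expression values across samples."""
    mu, sd = mean(row), pstdev(row)
    return [0.0] * len(row) if sd == 0 else [(x - mu) / sd for x in row]


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic start: first point, then repeatedly the farthest point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


# Toy matrix: rows are genes, columns are four samples (hypothetical values).
expression = [
    [10.0, 1.0, 1.0, 1.0],   # high only in sample 1
    [12.0, 2.0, 1.0, 2.0],   # high only in sample 1
    [1.0, 1.0, 9.0, 1.0],    # high only in sample 3
    [2.0, 1.0, 11.0, 2.0],   # high only in sample 3
]
labels = kmeans([zscore(row) for row in expression], k=2)
```

Cluster numbers returned by k-means are arbitrary, so only co-membership is meaningful: here the first two genes share one cluster and the last two share the other.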

The main pathways affected in breast cancer cell lines compared to healthy cell lines are displayed in Fig. 7. The Luminal A T47D breast cell line shows alterations in the expression levels of genes related to ribosome function, including the mitochondrial ribosome (Fig. 7A). These genes, which are part of the large or small subunits, were primarily overexpressed. Some of these genes have also been implicated in coronavirus disease. Other significant pathways involve cancer-related proteoglycans, which are chiefly overexpressed. Many of these are connected to the cytoskeleton and focal adhesion, including Laminin (LAMC2, LAMB3), Actinin (ACTN1, ACTN4), Collagen (COL4A1, COL4A6), Tenascin C (TNC), and Integrin (ITGA7, ITGA1). Transcription factors like MYC, ESR1, and MDM2 were also overexpressed, but others, such as HIF1A, were downregulated. Additionally, factors related to cell cycle control, such as CDKN1A, and other signaling molecules, including RRAS2, WNT4, WNT5A, and ERBB4, were affected.

Fig. 7

Enriched pathways (KEGG). Interactomes of the genes affected in breast cancer cell lines compared to the healthy MCF-12F breast cell line. The overexpressed or underexpressed genes in breast cancer cell lines compared to the healthy breast cells are shown and classified according to the main pathways involved. (A) T47D. (B) SK-BR-3. (C) MDA-MB-231.

In the case of SK-BR-3, some molecules involved in nucleotide metabolism were found to be either upregulated or downregulated (Fig. 7B). The PI3K-AKT signaling pathway was also altered. Other affected pathways in this cell line included focal adhesion, human papillomavirus infection, the TNF signaling pathway, and efferocytosis, the latter being involved in the clearance of apoptotic cells. Finally, a few pathways were impacted in MDA-MB-231 (Fig. 7C), including axon guidance, coronavirus disease, and ribosome assembly.
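Pathway enrichment of this kind is commonly assessed with an over-representation test. The sketch below computes a one-sided hypergeometric p-value for the overlap between a list of differentially expressed genes and one pathway. That this particular test underlies Fig. 7 is an assumption, and the function name and the counts in the usage lines are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value, P(overlap >= n_overlap).

    n_universe: number of genes tested in total.
    n_pathway:  genes of the universe annotated to the pathway.
    n_selected: differentially expressed genes.
    n_overlap:  differentially expressed genes that are in the pathway.
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 genes tested, 3 in the pathway, 3 selected,
# and all 3 selected genes belong to the pathway.
p_all_in_pathway = enrichment_p_value(10, 3, 3, 3)
# With no required overlap the tail covers every outcome.
p_no_requirement = enrichment_p_value(10, 3, 3, 0)
```

In practice one such p-value is computed per pathway and the resulting set is corrected for multiple testing before pathways are reported as enriched.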

Differential expression analysis of the wild-type and TBPL1 knocked-out gene in the healthy breast and breast cancer cell lines

The effect of knocking out the TBPL1 gene using CRISPR/Cas9 technology on the expression profile of both healthy and breast cancer cell lines is depicted in Fig. 8. Volcano plots reveal limited changes in the expression profiles of the wild-type compared to each mutant breast cell line. The MCF-12F healthy breast cell line exhibited the most pronounced changes (Fig. 8A), followed by the breast cancer cell lines SK-BR-3 (Fig. 8C), MDA-MB-231 (Fig. 8D), and T47D (Fig. 8B). Wild-type and mutant MCF-12F cell replicates were differentially grouped (Fig. 8E). Similar behavior was observed for T47D cells and SK-BR-3 cells. Conversely, the wild-type and mutant cells corresponding to MDA-MB-231 appeared closer, suggesting minimal effects of TBPL1 gene knockout on the gene expression profile. Hierarchical clustering revealed a set of genes (Fig. 8F, cluster 1) that showed decreased or unchanged expression in all TBPL1 knockout cell lines relative to their wild-type counterparts. Cluster 4 was upregulated in wild-type MCF-12F cells; however, the silencing of the TBPL1 gene led to its downregulation. Cluster 3 was upregulated in wild-type SK-BR-3 cells and became downregulated after the TBPL1 gene was silenced. Cluster 1 remained unaffected by TBPL1 knockout in all cells.

Fig. 8

Gene expression profiles of wild-type and TBPL1 knocked-out healthy and breast cancer cell lines. Knock-out cell lines were obtained by transfection of the U6-gRNA Cas-92A + GFP-RM CRISPR plasmid with Lipofectamine as described in Materials and Methods. Comparisons of RNA-seq data were then carried out between non-transfected and transfected cell lines. Volcano plot of the genes identified in MCF-12F (A), T47D (B), SK-BR-3 (C), and MDA-MB-231 (D). (E) Principal Component Analysis (PCA) of the data obtained for the three replicates for each cell line. (F) Hierarchical clustering of the genes found overexpressed and underexpressed in the healthy and breast cancer cell lines.

In cluster 2, which was downregulated in all wild-type cell lines, specific gene groups altered significantly following TBPL1 gene knockout. For instance, cluster 5 was upregulated in T47D knockout cells. Similarly, the knockout of the TBPL1 gene in MDA-MB-231 cells led to the overexpression of clusters 6 and 7, whereas in SK-BR-3, cluster 8 was upregulated. For the healthy cell line, clusters 9 and 10 were also observed to be upregulated after the mutation of the TBPL1 gene.

In the case of MCF-12F, the most impacted pathways include those involved in infections such as Influenza A, Epstein-Barr virus, Measles, and SARS-CoV-2, as well as ribosome function, the Tricarboxylic acid cycle, Glyoxylate and dicarboxylate metabolism, Antigen processing and presentation, the mRNA surveillance pathway, and Mitophagy. Several affected genes are also connected to diseases including Parkinson’s, Huntington’s, and diabetic cardiomyopathy, as well as multiple disease neurodegeneration pathways (Fig. 9A,B). The TBPL1 gene knockout in T47D impacted pathways associated with various viral infections, including human papillomavirus, SARS-CoV-2, Measles, Hepatitis C, Influenza A, and Epstein-Barr virus (Fig. 9C,D). The influenced pathways in SK-BR-3 featured cellular senescence, Lysosome function, and nucleocytoplasmic transport (Fig. 9E,F). Lastly, the affected pathways in MDA-MB-231 involved primary immunodeficiency, antigen processing and presentation, ferroptosis, glutathione metabolism, central carbon metabolism, glycolysis/gluconeogenesis, and fructose and mannose metabolism (Fig. 9G,H).

Fig. 9

Enriched pathways (KEGG) and gene expression levels affected by the knocking out of the TBPL1 gene. Pathways and genes affected by the TBPL1 gene knockout in relation to the wild-type cells: MCF-12F (A, B), T47D (C, D), SK-BR-3 (E, F), MDA-MB-231 (G, H).

Thus, the effect of knocking out the TBPL1 gene varies in each cell line. Even among breast cancer cells, the effects significantly differ, reflecting the heterogeneity of gene expression patterns and likely the differences in their genomes.

Discussion

TBPL1 is a member of the TBP family. Prior studies have demonstrated that TBPL1 binds to active promoters in haploid testis cells in mice, alongside TBP and TAF732; however, unique regulatory mechanisms might exist in the germline. Species-specific differences in TBPL1 functions could potentially be explained by variations in the TBPL1 core domain. While the core domain of TBP is over 80% conserved across species, TBPL1’s core domain only maintains 40–45% amino acid conservation among metazoans10. TBPL1 controls specific subsets of genes that are distinct from those controlled by TBP. Additionally, our findings indicate that the TBPL1 gene regulates different subsets of genes in healthy versus breast cancer cell lines, and in wild-type versus mutated conditions. It is noteworthy that although the contributions of TBPL1 described thus far in healthy cells are primarily related to transcriptional regulation, genetic alterations have also been identified in cancer cells. For example, in a study analyzing salivary gland tumors, a novel gene fusion between the NFIB and TBPL1 genes was identified41, suggesting functions beyond the control of gene expression for this transcription factor. Likewise, in a study of pulmonary carcinoid tumors, the TBPL1 gene was identified as one of the significantly mutated genes42. These studies underscore the significance of investigating this transcription factor as a key target in cancer biology.

TBPL1 and TBP are the only TBP family genes transcribed in the examined cell lines, while TBPL2 remained unexpressed (Fig. 1). The TBP gene was detected at low levels in HCC1937, MCF-7, and T47D but was overexpressed in the more aggressive, poor-prognosis SK-BR-3 and MDA-MB-231 cell lines. The TBPL1 gene was also overexpressed in SK-BR-3 and MDA-MB-231, whereas it was underexpressed in HCC1937 and MCF-7. This pattern of overexpression has also been observed in colorectal cancer, where increased TBPL1 expression has been linked to an increased capacity for cancer cell invasion and proliferation39,40. Although the mechanisms by which TBPL1 expression is controlled in breast cancer have not been investigated, similar regulatory mechanisms are likely to exist, especially in more aggressive cell lines such as SK-BR-3 and MDA-MB-231. T47D exhibited a mixed pattern in which TBP was underexpressed and TBPL1 was moderately overexpressed (Fig. 2). These patterns suggest a distinct gene expression profile for each cell line. TBPL1 expression was validated by immunofluorescence and confocal microscopy (Fig. 3), RT-PCR (Fig. 4), and Western blot (Fig. 5). MCF-12F showed lower expression than the other cell lines.

In the first part of this work, the transcriptomes of the T47D, SK-BR-3, and MDA-MB-231 breast cancer cell lines were compared with that of the healthy MCF-12F breast cell line (Fig. 6). T47D is a Luminal A, PR+ cell line that displays chemoresistance. It shows differentially expressed genes across various pathways, such as amyotrophic lateral sclerosis, chemical carcinogenesis (reactive oxygen species), coronavirus disease 2019 (COVID-19), diabetic cardiomyopathy, protein processing in the endoplasmic reticulum, ribosome, and thermogenesis (Fig. 7A). The downregulated genes SERPINA6, IGFBP5, CXCL12, KRT19, and ARG2, along with the upregulated GPX1, CAV2, KRT6A, AKR1B1, and COL17A1 genes, were selected for this cell line. SERPINA6, found in both liver and breast cancer cell lines, encodes an alpha-globulin protein with corticosteroid-binding capacity. IGFBP5 inhibits both the growth and metastatic abilities of cancer cells45, which is consistent with its downregulation in T47D. The chemokine CXCL12, which is abundant at typical sites of tumor metastasis such as the lungs, lymph nodes, bone marrow, and liver, including in animal models46, attracts T-lymphocytes and monocytes. Its underexpression potentially implies a negative regulation of the immune response.

SK-BR-3 is a poorly differentiated, HER2-overexpressing cell line isolated from the pleural effusion of a patient with metastasis. It showed downregulation of the KRT19, BMP7, MAGEA3, TDRD1, and CTAG2 genes. KRT19 has been reported to regulate breast cancer properties by activating the AKT signaling pathway47. However, recent studies have shown that modulation of KRT19 expression can lead to contrasting effects on cell proliferation, survival, invasion, migration, and apoptosis, and that these effects vary across different cancer cell types48.

BMPs, or Bone Morphogenetic Proteins, are cytokines, extracellular signaling molecules that belong to the TGF-beta protein superfamily. These proteins regulate gene transcription by signaling via specific serine-threonine kinase receptors and intracellular SMAD proteins49,50. This signaling cascade is regulated extensively at various levels51. Moreover, BMPs are involved in the activation of, and interaction with, other signaling pathways, such as the MAPK pathway52. BMPs influence a range of cellular processes, including proliferation, differentiation, and apoptosis, in assorted cell types53.

BMP7 also plays a significant role in cell invasion, proliferation, and migration, as demonstrated in MDA-MB-231 cells54. It has also been shown to inhibit cellular proliferation in several other cell lines55. In an in vivo xenograft mouse model using MDA-MB-231 cells, BMP7 displayed seemingly contradictory effects, such as reducing tumor growth56. Furthermore, BMPs have been implicated in tumor angiogenesis and metastasis57.

The MAGEA12 gene is found to be downregulated in the SK-BR-3 cell line compared to the healthy MCF-12F cell line. It is significantly associated with aggressiveness in breast cancer, regardless of hormone receptor subtype status. This suggests that the TBPL1 gene may be involved in the invasive properties of the SK-BR-3 cell line, which could contribute to its heightened aggressiveness. Furthermore, genes with a downregulated MAGEA12 signature are implicated in the migration and invasion of breast cancer cells. The regulation of MAGEA12 expression could, therefore, play an integral role in determining the morphology of aggressive breast cancer cells58.

The CTAG2 gene (Cancer/Testis Antigen 2 gene) has been linked with diseases such as melanoma and other forms of cancer, including bladder and prostate cancers, as well as epileptic encephalopathy. In the case of breast cancer, members of the CTA family have been implicated in promoting invasion59. The Cancer-Testis antigens SPANX-A/C/D and CTAG2 further contribute to the invasiveness of breast cancer60.

The LAMC2, CDH13, KRT6A, IL1A, and VIM genes were upregulated in the SK-BR-3 cell line. Aberrant LAMC2 (Laminin) expression strongly correlates with metastasis and poor prognosis in lung cancer61. Existing research reports a positive correlation between LAMC2 and EGFR expressions in bladder cancer, oral squamous carcinoma, cholangiocarcinoma, and multiple human cancer cell lines, including those of breast, neoplastic, and thyroid cancers62. LAMC2 has also been seen to bind to EGFR in breast carcinoma and stimulate MAPK activation63. Methylation of the CDH13 gene in breast cancer has emerged as a potent diagnostic biomarker64. Keratin 6A (KRT6A), a type II keratin involved in the epidermalization of squamous epithelium65,66, plays a crucial role in cell migration, specifically keratinocyte migration67. It further promotes the growth and metastasis of lung adenocarcinoma via the induction of the epithelial-mesenchymal transition67. IL1A (interleukin-1 alpha) is implicated in cell proliferation and invasion68. VIM (Vimentin), a cytoskeletal protein forming type III intermediate filaments, participates in cell migration, motility, and adhesion. In solid cancers, overexpression of vimentin is linked with EMT and metastasis69. In conclusion, the overexpression of these genes in the SK-BR-3 breast cancer cell line is consistent with their known metastatic and cancerous properties at the molecular level.

MDA-MB-231 is the most aggressive subtype with a poor prognosis, characterized by triple-negative status for the ER, PR, and HER2 receptors. One of the significantly impacted pathways is related to ribosomes; this includes the protein-coding genes of the two subunits (Fig. 7C). These genes are overexpressed and their expression levels correlate with the invasiveness and tumor formation capacity of this cell line. Another pathway influenced is the so-called “Axon Guidance”, an embryonic pathway typically activated only during development. Key molecules in this process include the NFATC2 gene, part of the nuclear factors of the activated T cells family. This transcription factor, localized in the cytoplasm, stimulates the transcription of immune-related genes post-activation by the T-cell receptor. Additionally, NFATC2 is stimulated through the TGF-β1 pathway and can activate the MMP13 gene in MDA-MB-231 through interaction with ATF3. This activation is crucial for invasiveness70.

Finally, there are some shared pathways between COVID-19 and cancer. For instance, both conditions involve a pro-inflammatory environment and the NF-kB pathway. IL-6 is associated with the JAK-STAT pathway. Moreover, there is significant engagement of the same type of immune response through IFN1A. The common pathways between these two conditions include cytokines, IFN1A, Androgen Receptors, and immune checkpoint pathways71.

In MDA-MB-231 cells, downregulated genes include SPANXB1, LINC03040, ADGRF5, KRT19, and XAGE1B. Conversely, the genes H1-3, DSC3, KRT6A, MME, and KRT14 were upregulated. The human sperm protein associated with the nucleus on the X chromosome (SPANX) gene family comprises five known members (SPANX-A1, -A2, -B, -C, and -D). These genes encode cancer/testis-specific antigens, which are potential cancer immunotherapy targets. SPANXB1 promotes metastasis of triple-negative breast cancer72. LINC00460 expression is a potential marker for aggressive phenotypes in distinct tumors, including locally advanced lung cancer and basal-like BRCA73. Notably, overexpression of ADGRF5 has been observed in breast cancer. Breast cancer patients with high ADGRF5 expression have significantly shorter recurrence-free survival and distant metastasis-free survival than those with low ADGRF5 expression74. Recent studies have shown that KRT19 can regulate cancer properties by modulating signaling pathways such as EGR1/PTEN/AKT and Wnt/β-catenin/Notch, among others. Knocking down KRT19 promotes breast cancer cell proliferation, migration, and sphere formation through NUMB-dependent crosstalk in the Wnt/Notch signaling pathway50. The activation of the human cancer/testis antigen gene, XAGE-1, in tumor cells correlates with CpG island hypomethylation, and the gene is highly expressed in lung adenocarcinoma75. The DSC3 gene, which encodes a desmosome adhesion molecule responsible for normal physiological integrity and cell adhesion, is upregulated in the MDA-MB-231 cell line. Its expression changes during carcinogenesis, and DSC3 is a p53-responsive gene. Like other adhesion molecules, it reportedly provides a barrier to the immune infiltration of tumors76. Lastly, the expression of membrane metalloendopeptidase (MME) has been found in various cancer types. MME is overexpressed in pancreatic endocrine tumors and colorectal carcinoma but reduced in lung and ovarian cancer, suggesting cell type-specific effects of MME77. In MDA-MB-231, this marker gene is upregulated.

The second part involved determining the effect of the TBPL1 gene knockout. We found different signatures depending on whether the cells carry the intact or the mutated TBPL1 gene (Fig. 8), with the healthy cell line being the most affected. In the TBPL1-knockout MCF-12F cell line, a set of genes including NPIPB3, ENSG00000281383, ZNF252P, TAF4, VP53, IFNL3, RSAD2, SPATASL1, and DLX2 was upregulated, whereas these genes were turned off in the wild-type MCF-12F cell line. Conversely, other genes, including RAP1, TNFRSF10A, ZNF432, MYDGF, VSP39, CMTR, ZNF815P, and ENSG00000270066, were downregulated in the TBPL1-mutated MCF-12F cell line.

The nuclear pore complex interacting protein, NPIPB3, exhibits high expression, which contributes to better survival in normal cells and is regulated by estrogen receptors. RSAD2, a protein featuring the S-adenosyl-L-methionine (SAM) domain, impedes the replication of DNA or RNA viruses. It plays a detrimental role in endometrial cancer and also functions as a protein marker indicating a poor prognosis in cancer cells. Moreover, MCF-12F, a non-cancerous cell line, typically displays minimal expression of RSAD2 (Viperin) under wild-type conditions. ENSG00000281383, a newly identified gene product, was discovered during the knockout of the TBPL1 gene in various cell lines, yet its function remains undefined. Although IFNL3 is expressed in BRCA, its presence is somewhat affected in normal cells. This gene influences cytokine, antitumor, and antiviral regulation via the JAK-STAT pathway.

Conversely, the genes downregulated in TBPL1-mutated cells include RAP1A. This gene is part of the Ras oncogene family. Diseases associated with RAP1A include Kabuki syndrome78 and tuberous sclerosis79. Rap1 plays a varied role in tumor initiation and progression. For instance, the activation of Rap1 induces tumor initiation and epithelial-mesenchymal transition (EMT) through Notch signaling. When overexpressed, activated Rap1 may stimulate EGFR and Src/FAK, leading to integrin-mediated cell adhesion in cancer80. However, low levels of the RAP1 gene in normal cells can cause cell senescence and growth inhibition81.

TNFRSF10A encodes a member of the TNF-receptor superfamily. This receptor is activated by the tumor necrosis factor-related apoptosis-inducing ligand (TNFSF10/TRAIL) which transduces a cell death signal, inducing apoptosis82. This gene is downregulated under TBPL1 mutated conditions, highlighting the importance of the TBPL1 gene in the survival of normal cells and its potential contributions to cell signaling pathways. ZNF432 is a newly identified HR repressor involved in the cellular response to DNA damage and acts as a PARP inhibitor83.

The MDA-MB-231 cell line showed upregulation of the ENSG00000285238, HKDC1, RASA4, HSPH1, THAP10, TEX30, AVPI1, and RASGRP3 genes in TBPL1-mutated cells. However, the genes MUCSB, CYP3A5, KCNE3, LINC01934, JAK3, GPC2, LCDR, and ADGRA2 were downregulated in the TBPL1 KO cell lines, despite their overexpression in wild-type cells. ENSG00000285238 is a novel transcript of unknown function. that signifies the principal component of the nucleosome remodeling and deacetylase complex, playing a crucial role in epigenetic transcriptional repression. HKDC1 encodes a member of the hexokinase protein family. The encoded protein is involved in glucose metabolism, and reduced expression may correlate with gestational diabetes mellitus. Notably, elevated expression might be linked with an unfavorable prognosis in hepatocarcinoma84. RASA4 encodes a member of the GAP1 family of GTPase-activating proteins, which suppresses the Ras/mitogen-activated protein kinase pathway in response to Ca2+. It does so by stimulating the GTPase activity of Ras.

Consequently, Ras shifts from the active GTP-bound state to the inactive GDP-bound state, no longer initiating downstream pathways that regulate gene expression, cell growth, and differentiation85. The HSPH1 gene encodes a member of the heat shock protein 70 family. The encoded protein functions as a nucleotide exchange factor for the molecular chaperone heat shock cognate 71 kDa protein (Hsc70). Furthermore, this protein has a separate though related role as a holdase, preventing the aggregation of misfolded proteins, including the cystic fibrosis transmembrane conductance regulator (CFTR) protein. Elevated expression of this protein has been observed in various human cancers86.

The CYP3A5 gene is overexpressed in prostate cancer86. It is likewise upregulated in the wild-type MDA-MB-231 cell line, whereas the TBPL1 KO cell line showed significant downregulation. LINC01934 denotes long intergenic non-protein-coding RNA 1934, a non-annotated gene that was upregulated in the wild-type MDA-MB-231 cell line and significantly downregulated in the KO cell line. Many novel transcripts were found in the NGS data; most are long noncoding RNAs, so their functions remain unknown. Significantly, JAK3 (Janus kinase 3) contributes to the pathogenesis of natural killer/T-cell lymphoma (NKTCL)87, suggesting that it could be involved in immunogenic pathways in this cell line.

The SK-BR-3 cell line exhibited different signatures. In the TBPL1 KO cell line, IFI27, CMPK2, RSAD2, HSPA7, MRC1, IL6, IFNL3, LINC01087, and RSC1A1 were upregulated. Conversely, the GCNT3, LINC01484, LINC01293, CELF3, ART1, FOXJ1, TMEM92, RENBP, HBE1, and LEMD1 genes were downregulated in the same cells. These genes showed the opposite pattern in the wild-type TBPL1 SK-BR-3 cells.

The T47D TBPL1 KO cell line exhibited upregulation of the following genes: APOL1, LINC02574, SLC2A3P1, IGHG4, MIR210HG, and TRIM38. Conversely, FIBIN, ENSG00000264668, ENSG00000270066, ENSG00000144785, DUSP13, ASNSP1, and RPGP genes were downregulated in the same cells.

The heatmap comparison of TBPL1 mutant cells with wild-type cell lines reveals distinct signatures for each cell line (Fig. 8F). These differences could be attributable to the disparate origins and genomic backgrounds of the cells under scrutiny. A comprehensive examination of our data may elucidate the key roles that the TBPL1 gene potentially plays in the biology of both healthy and breast cancer cells. We also searched the GEPIA2 database for TBP, TBPL1, and TBPL2 expression levels in breast cancer samples compared to normal breast tissue. Only TBP and TBPL1 were expressed, which is consistent with our results. We also examined the survival curves (Kaplan–Meier plots) of the three genes (Supplementary Fig. 1). When TBP expression is low, survival is 40% at 7,500 days; in comparison, when TBP expression is high, survival is 0% at 7,500 days. For the TBPL1 gene, at low expression, the survival rate is between 25 and 50% at 7,500 days, compared to approximately 12.5% when the TBPL1 gene is overexpressed. TBPL2 shows no statistically significant difference. Thus, this analysis points to an important role of the TBP and TBPL1 genes in breast cancer. Further, the roles of the lncRNAs and the miRNAs not yet evaluated in this study should also be closely examined.

Moreover, the knockout of the TBPL1 gene reduces cell viability by 75% in the healthy breast cell line (MCF-12F) and by 37.5% in the T47D Luminal A breast cancer cell line, while in the SK-BR-3 (Her2 overexpressing) and MDA-MB-231 (triple negative) breast cancer cell lines, the viability was reduced to 50%, revealing that it is an essential gene (Supplementary Fig. 2). Notably, our data show that TBPL1 depletion affects both healthy and cancerous cells, suggesting that its role in cancer is context-dependent. For instance, a recent Mendelian randomization study observed that a high level of TBPL1 expression is associated with a lower risk of developing prostate cancer43, which would position this transcription factor as a tumor suppressor, in contrast to studies showing that increased expression of this protein promotes cancer progression39,40.

Methods

Cell cultures

Breast cancer cell lines (HCC-1937, MCF7, T47D, SK-BR-3, MDA-MB-231) and control cell lines MCF-12F and MCF-10A (Table 1) were obtained from the American Type Culture Collection (ATCC) and cultured as recommended. The MCF-12F and MCF-10A cell lines were cultivated in Dulbecco’s Modified Eagle Medium (DMEM)/ F12 medium (Gibco BRL), supplemented with 10% fetal bovine serum (FBS), 10 ng/µL epidermal growth factor (EGF), 8.36 µg/mL insulin, 0.5 µg/mL hydrocortisone, 1 × L-glutamine, and 1 × penicillin–streptomycin. MDA-MB-231 and MCF-7 cell lines were cultured in DMEM, enriched with 10% FBS, 1 × sodium pyruvate, 1 × L-glutamine, 1 × non-essential amino acids (NEA), and 1 × penicillin–streptomycin solution. An additional 4.18 µg/mL of insulin was added to MCF-7 cells. T47D and HCC1937 cell lines were cultivated in RPMI 1640 medium enriched with 10% FBS, 1 × sodium pyruvate, 1 × L-glutamine, and 1 × penicillin–streptomycin. Lastly, the SK-BR-3 cell line was cultured in McCoy’s medium with 10% FBS, 1 × L-glutamine, and 1 × penicillin–streptomycin. All cell lines were incubated at 37 °C in a Shell Lab incubator within a humidified atmosphere containing 5% CO2 (v/v) until they achieved 80–90% confluence.

Table 1 Subtype and characteristics of healthy and breast cancer cell lines used.

mRNA gene expression level of target genes by end-point RT-PCR and RT-qPCR

Total RNA was isolated from cells using the Trizol reagent (Invitrogen, Carlsbad, CA, USA), following the recommended protocol. Its quantity was determined using a Nanodrop 2000c system (Thermo Fisher Scientific). Oligonucleotides, as shown in Table 2, were designed using the PrimerQuest and OligoAnalyzer Tools from Integrated DNA Technologies (https://www.idtdna.com/pages) and the sequences were taken from the Reference Sequences (RefSeq) database at the National Center for Biotechnology Information (NCBI), National Library of Medicine of the National Institutes of Health (https://www.ncbi.nlm.nih.gov/). The Blast program at NCBI was employed to verify the oligonucleotides’ specificity for the target genes, using the RNA18SN1 gene as a reference gene88. The integrity of RNA was checked by conducting electrophoresis in a 1% agarose gel in 1 × Tris Acetate-EDTA (TAE) buffer as described89. The gel was stained using 0.5 μg/mL ethidium bromide and band visibility was achieved using a Gel Doc Gel Imager transilluminator (Bio-Rad).

Five micrograms of total RNA from each cell line were digested using 5 units of DNase I RNase-free (Roche, Mexico) at 37 °C for 1 h, after which the enzyme was inactivated at 70 °C for 15 min. The samples were then incubated in a solution containing 250 ng oligo dT18 (SIGMA-ALDRICH), 1 × SuperScript II RT buffer (New England BioLabs, Ipswich, Massachusetts), 0.5 mM dNTP mix (Roche), 10 mM DTT (New England BioLabs), and 40 U RNasin Ribonuclease Inhibitor (Promega) at 65 °C for 10 min. Following this, 200 units of SuperScript II Reverse Transcriptase (Invitrogen, Thermo Fisher Scientific) were added, and the mixture was incubated at 50 °C for 1 h. RNA removal was achieved by incubating the solution with 1 µL 0.2 M NaOH at room temperature for 10 min and then adding 2 µL TE at pH 8.0 to neutralize the pH.

The expression of target genes was verified by end-point RT-PCR, using 2 µL of cDNA solution for PCR amplification. Each reaction contained One Taq DNA polymerase (New England Biolabs) at 2 units per 100 µL of total reaction volume, 200 nM of each oligonucleotide (Table 2), 1 × Taq DNA polymerase buffer, and 2 mM MgCl2 in a 10 µL final reaction volume. The PCR conditions were 94 °C for 5 min, followed by 35 cycles of denaturation at 94 °C for 1 min, annealing at 58 °C for 1 min, and extension at 72 °C for 1 min, run on an Eppendorf MasterCycler Personal PCR system. Amplicons were then analyzed by electrophoresis in a 1% agarose gel, stained with 0.5 µg/mL ethidium bromide, and visualized on a Bio-Rad transilluminator.

Table 2 Oligonucleotides used in RT-PCR and RT-qPCR assays.

Relative expression of mRNA by RT-qPCR

For RT-qPCR assays, samples (100 ng) of total RNA were used with the KAPA SYBR FAST One-Step kit (Kapa Biosystems, Mexico), following the recommended protocol, on an Eco Real-Time PCR system (Illumina). The PCR conditions were 50 °C for 30 min for reverse transcription, followed by 95 °C for 10 min, and 40 cycles, each consisting of 95 °C for 10 s, 62 °C for 30 s, and 72 °C for 30 s. The expression level of target genes was normalized to the RNA18SN1 housekeeping gene. Relative gene expression was calculated using the 2^(−ΔΔCt) method with the EcoStudy program v4.02 (Illumina). Appropriate controls (no reverse transcriptase and no template reactions) were included in each 48-well plate. GraphPad Prism version 6.0 was used for all statistical analyses, applying an unpaired Student’s t-test with Welch’s correction. Values are presented as mean ± standard deviation for three biological repeats, each in triplicate.
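The 2^(−ΔΔCt) calculation described above can be sketched as follows. This is a minimal illustration, not the EcoStudy implementation; the function name and the Ct values are hypothetical and are not data from this study. It assumes that replicate Ct values are averaged before the differences are taken.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change by the 2^(-ddCt) method from lists of replicate Ct values.

    The reference gene plays the role of RNA18SN1 in the text; the control
    condition plays the role of the calibrator (e.g. the healthy cell line).
    """
    # dCt: target Ct minus reference Ct, within each condition
    delta_ct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    delta_ct_control = mean(ct_target_control) - mean(ct_ref_control)
    # ddCt: sample dCt minus control dCt
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values, chosen only to make the arithmetic easy to follow
fold = relative_expression(
    ct_target_sample=[24.0, 24.0, 24.0],
    ct_ref_sample=[10.0, 10.0, 10.0],
    ct_target_control=[26.0, 26.0, 26.0],
    ct_ref_control=[10.0, 10.0, 10.0],
)
print(fold)  # dCt = 14 vs 16, ddCt = -2, so the fold change is 4.0
```

A fold change above 1 indicates higher expression in the sample than in the calibrator, and a value below 1 indicates lower expression.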

Subcellular localization of TBPL1 by immunofluorescence and confocal microscopy

Half a million cells for each cell line studied were cultured in 6-well cell culture plates containing sterile glass coverslips (Corning) overnight at 37 °C in an incubation oven. Cells were washed twice with phosphate buffer saline (PBS) 1 × for 5 min, then fixed with 2 mL of 4% paraformaldehyde (PFA) for 30 min at room temperature. They were washed three times with PBS 1 × for 5 min with gentle shaking on a Rocker 35 EZ shaker. Following this, cells were permeabilized with 0.05% Triton X-100 for 30 min at room temperature, washed three times with PBS 1 × , and incubated in 2 mL blocking solution (3% bovine serum albumin in PBS 1 ×) for 1 h at room temperature.

After washing once with PBS 1 × , cells were incubated with a rabbit polyclonal anti-TBPL1 antibody (1:200 dilution) overnight at 4 °C. This antibody was a gift from Dr. László Tora from IGBMC, Strasbourg, France. Cells were then washed five times with PBS 1 × , each for 10 min, and incubated with the Fluorescein-conjugated goat anti-rabbit IgG Antibody (1:200 dilution, Vector). Staining was done with Rhodamine-Phalloidin (1:100 dilution, Invitrogen) for 1 h at room temperature.

After incubation, the cells were again washed five times with PBS 1 × , for 5 min each, and incubated with Hoechst 33342 reagent (1.16 µg/mL, Thermo Fisher Scientific) for 3 min at room temperature. After two washes with PBS 1 × , each for 5 min, cells were mounted on glass slides with Gelvatol 20/30 mounting medium (Sigma-Aldrich). Finally, cells were examined using a Leica TCS SP8 AOBS multiphoton confocal microscope (Leica Microsystems) at our department.

TBPL1 gene knockout

The CRISPR/Cas9 system was employed to knockout the TBPL1 gene using a CRISPR all-in-one plasmid, specifically designed for this project by Sigma-Aldrich (Merck) and named the U6-gRNA Cas-9-2A + GFP-RM CRISPR plasmid. This plasmid encodes the target guide RNA under the control of the U6 gene promoter and the Cas9 gene, which is fused at its 3’-end with an SV40 nuclear localization signal. The GFP gene and the bGH polyadenylation signal are also embedded, controlled by a human cytomegalovirus (CMV) promoter fused to a CMV enhancer, with the NeoR/KanR gene serving as an antibiotic selection marker. The short guide RNA sequence was chosen via a proprietary Sigma-Aldrich algorithm based on the following criteria: the single-guide RNA (sgRNA) sequence should have a high on-target score90, contain a high specificity score (indicating low off-target effects), be complementary to a sequence located on a target-specific exon, and be located close to the N-terminus of the target gene86. The target sequence of the ID/Clone name HS00003-0,009,519, corresponding to the TBPL1 gene, was 5′-GCAGACAGTGATGTTGCAT, and the PAM sequence was TGG, exhibiting a calculated efficiency of 56.23 and specificity of 86.21, according to the program utilized. We confirmed its specificity for the TBPL1 gene at the RefSeq database using the BLASTN 2.14.1 + program with the non-redundant nucleotide human genome (taxid:9605) database at the BLAST-NCBI web page (https://blast.ncbi.nlm.nih.gov/Blast.cgi). The Non-Homologous End Joining (NHEJ) repair mechanism was chosen due to the error-prone nature of NHEJ, which enables the highly efficient creation of knockout-causing mutations in the target gene. These mutations appear at the site of the targeted double-stranded break more often than at off-target sites. Generally, off-target cutting constitutes a meager 1% of total activity, while on-target cutting comprises the remaining 99%91,92.
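The basic sequence checks on the guide described above can be illustrated with a short sketch. The target and PAM sequences are those reported in the text; everything else (function names, the toy genomic string, and the assumption of an SpCas9-type 5′-NGG-3′ PAM) is illustrative. It does not reproduce the proprietary Sigma-Aldrich scoring algorithm, and it scans the forward strand only.

```python
import re

TARGET = "GCAGACAGTGATGTTGCAT"  # target sequence reported in the text
PAM = "TGG"                     # PAM reported in the text


def has_ngg_pam(pam):
    """True if the PAM matches the 5'-NGG-3' consensus (assumed SpCas9)."""
    return re.fullmatch(r"[ACGT]GG", pam) is not None


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_sites(genomic, target, pam_pattern=r"[ACGT]GG"):
    """0-based start positions where target is immediately followed by a PAM.

    Only the given strand is scanned; a lookahead is used so that
    overlapping matches are also reported.
    """
    pattern = re.compile("(?=" + re.escape(target) + pam_pattern + ")")
    return [m.start() for m in pattern.finditer(genomic)]


# Toy sequence for illustration only (not a real genomic context)
toy_locus = "AAA" + TARGET + PAM + "CCC"
print(len(TARGET), has_ngg_pam(PAM), round(gc_content(TARGET), 3))
print(find_sites(toy_locus, TARGET))
```

In practice, specificity is assessed against the whole genome, as done here with BLASTN; the sketch only shows the local target-plus-PAM logic.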

The plasmid was transformed into competent Escherichia coli TOP10 cells and was purified using the QIAGEN Plasmid Mega purification kit, as recommended. Fifty thousand cells from each cell line, set in a 24-well plate with sterile glass coverslips (Corning), were transfected with 2.5 µg of the plasmid using Lipofectamine 3000 (Thermo Fisher Scientific), in accordance with the manufacturer’s protocol.

All cell lines were incubated at 37 °C in a humidified air atmosphere with 5% (v/v) CO2 in a Sheldon Shel Lab 3517 Water Jacket CO2 incubator. After transfection with Lipofectamine 3000, cells were incubated with the antibiotic G-418 for 48 h to enrich the population of transfected cells in each cell line. The antibiotic dose had previously been determined by performing an independent kill curve for each cell line. The MCF-12F, MDA-MB-231, T47D, and SK-BR-3 cell lines were incubated with 800 µg/mL, 1.2 mg/mL, 500 µg/mL, and 450 µg/mL of G-418, respectively. Plasmid transfection and expression of the Cas9-GFP fusion protein in cells were confirmed by GFP fluorescence 48 h post-transfection, observed with a Leica TCS SP8 AOBS Multiphoton Confocal Microscope (Leica Microsystems). Total RNA was extracted using the Trizol method as previously described. The mRNA expression level was determined by end-point RT-PCR in transfected and untransfected cells, with the expression of the B2M gene, which encodes the β-2-microglobulin protein, used as a control. Amplicons were separated by electrophoresis on 1% agarose gels and visualized as described above.

Detection of TBPL1 by western blot

Fifty thousand cells per well were cultured in 1 mL of supplemented culture medium in a 24-well plate and transfected with 2.5 μg of plasmid per well for 48 h, as previously described. Cells were treated with 1 mL of 0.25% trypsin and 0.02% EDTA in 1 × PBS for 1 min at room temperature until they detached from the culture plate. The cells were collected by centrifugation in 1.5 mL Eppendorf tubes (Axygen) using a 5410 Eppendorf centrifuge for 1 min at room temperature, and the resulting pellet was resuspended in 30 μL of 1 × Laemmli buffer. Protein samples (20 μg per lane) were boiled for 2 min before electrophoresis in a 10% sodium dodecyl sulfate–polyacrylamide gel (SDS-PAGE) using a Mini Protean Tetra Cell system (Bio-Rad). Proteins were then transferred to a nitrocellulose membrane (0.22 µm pore size, Amersham) using a Mini Trans-Blot Cell (Bio-Rad) at 4 °C overnight. The membrane was blocked with 5% skim milk in PBS containing 0.05% Tween 20 (PBS-T) overnight at 4 °C under gentle shaking, followed by overnight incubation at 4 °C with a 1:1,000 dilution each of the anti-human TBPL1 rabbit monoclonal antibody (Boster Bio-Tech) and the anti-tubulin mouse monoclonal primary antibody (Invitrogen) in PBS-T with 5% skim milk under gentle shaking. The membrane was then washed ten times in PBS-T for 10 min each under constant agitation, after which it was incubated for 1 h at room temperature in 5% skim milk in PBS-T containing a 1:20,000 dilution of the peroxidase-linked anti-rabbit IgG antibody (Vector), together with streptactin-peroxidase (Bio-Rad) to visualize the protein markers. This was followed by a 1 h incubation at room temperature with the goat HRP-linked anti-mouse IgG antibody (Thermo Fisher Scientific) at a 1:20,000 dilution. After a final series of ten washes in PBS-T, each lasting 10 min under gentle agitation, immunoreactivity was detected through chemiluminescence using the Pierce ECL Western Blotting Substrate Kit (Thermo Fisher Scientific) at a 1:8 substrate-to-buffer ratio. Images were captured with an Odyssey FC Imaging System Bioanalyzer (Licor).

RNA sequencing experimental methods and workflow

Whole transcriptome sequencing was conducted to obtain the gene expression profiles of the studied cell lines. All 24 samples were successfully analyzed by paired-end sequencing of total RNA, and single-end sequencing was performed on 24 miRNA samples. It is noteworthy that three independent biological experiments were performed, each in triplicate, for RNA-seq, RT-qPCR, and RT-PCR. The samples were frozen and shipped on dry ice to Macrogen (Seoul, South Korea), where RNA-seq was carried out following the company's standard protocol with the TruSeq Stranded Total RNA Library Prep Gold Kit and the Illumina NovaSeq 6000 platform, at a read length of 151 nucleotides. The data for all 24 samples were received on a USB drive and used for downstream analyses.

RNA-seq analysis involved several steps. First, raw paired-end reads were cleaned using Trimmomatic93. Cleaned reads were then mapped to the GRCh38 version of the human genome using the STAR aligner94. Counting, normalization, and gene expression quantification were conducted using RSEM95. Genes were considered differentially expressed when they had an FDR below 0.05 and a fold change either below 0.5 or above 2.
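The differential-expression criterion stated above can be written as a small filter. This is a sketch of the thresholding step only; the gene names and values below are hypothetical placeholders, and the fold changes and FDR values themselves would come from the upstream quantification and testing.

```python
def is_differentially_expressed(fold_change, fdr,
                                fdr_cutoff=0.05, fc_down=0.5, fc_up=2.0):
    """Thresholds from the text: FDR < 0.05 and fold change < 0.5 or > 2."""
    return fdr < fdr_cutoff and (fold_change < fc_down or fold_change > fc_up)


def split_by_direction(results):
    """Split (gene, fold_change, fdr) records into up- and downregulated lists."""
    up, down = [], []
    for gene, fold_change, fdr in results:
        if not is_differentially_expressed(fold_change, fdr):
            continue
        # Any gene passing the filter has fold change > 2 or < 0.5
        (up if fold_change > 1 else down).append(gene)
    return up, down


# Hypothetical records: (gene, fold change, FDR)
example = [
    ("GENE_A", 4.0, 0.001),   # up: passes both thresholds
    ("GENE_B", 0.25, 0.01),   # down: passes both thresholds
    ("GENE_C", 3.0, 0.20),    # large change but FDR too high
    ("GENE_D", 1.5, 0.001),   # significant but change too small
]
up, down = split_by_direction(example)
print(up, down)  # ['GENE_A'] ['GENE_B']
```

Note that the boundaries are exclusive: a fold change of exactly 2 or 0.5, or an FDR of exactly 0.05, does not pass.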

Downstream analysis for generating plots and illustrating each comparison was conducted using R packages (R 4.2.1). PCA, Venn diagrams, and volcano plots were created using ggplot2 (v.3.4.3), while heatmaps were produced with pheatmap (v.1.0.12) utilizing the K-means algorithm. Functional annotation and analysis were conducted using ClusterProfiler (v.4.6.2) with the Kyoto Encyclopedia of Genes and Genomes (KEGG) serving as the pathway reference (RefSeq assembly GCF_000001405.40), with a significance threshold of p < 0.05. Protein–protein interaction networks were built with STRING, using stringApp (v.2.0.1) within Cytoscape (v.3.10.0).
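As an illustration of the statistics behind pathway annotation, the sketch below computes a hypergeometric over-representation p-value for one pathway. It assumes that an over-representation analysis was the mode used, which the text does not state explicitly; the function name and the numbers are hypothetical, and no multiple-testing adjustment is shown.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background, K: background genes annotated to the pathway,
    n: genes in the differentially expressed list, k: list genes in the pathway.
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible terms drop out of the sum automatically.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical numbers: 4 genes in the background, 2 in the pathway,
# 2 genes in the list, both of them in the pathway.
p = hypergeom_enrichment_p(k=2, n=2, K=2, N=4)
print(p < 0.05)  # 1/6 is about 0.167, so this toy pathway is not significant
```

The p-value is the probability of seeing at least k pathway genes in a list of n genes drawn at random from the background.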

The RNA18SN1 gene served as the housekeeping gene to normalize the expression of the TBP and TBPL1 genes in RT-qPCR experiments. Each experiment was performed in triplicate, under identical conditions and incubation times. The total RNA from these samples was collected at the same time. All PCR assays were conducted concurrently, also under consistent conditions. The B2M gene served as an internal control in the PCR experiments confirming the knockout. Immunofluorescence and confocal microscopy experiments were performed concurrently and under identical conditions. All samples were in triplicate, and both mutated and wild-type cells were incubated for 48 h. For western blot (WB) experiments, each lane was loaded with 20 μg of protein on average (the protein content of 50,000 cells), as determined by the Bradford method96, and the experiments were performed in triplicate using alpha-actin as an internal control. High-quality RNA was isolated for NGS experiments in triplicate for each cell line, giving 24 samples in total. RNA quality was verified by Macrogen in Korea, where the samples were sent on dry ice for RNA-seq analysis.

All images in Figs. 1 and 4 were prepared for publication using PowerPoint (Microsoft) by cutting and pasting them into the final figure and adjusting only their size. No other software was used to manipulate images.

MTT cell viability assay

Cell viability was determined with the 3-[4,5-dimethylthiazol-2-yl]-2,5 diphenyl tetrazolium bromide (MTT) assay97. Fifteen thousand cells in 100 μL of the corresponding culture medium containing 10% FBS were seeded per well in a 96-well plate (Corning) and incubated for 24 h at 37 °C in an atmosphere of air containing 5% CO2. The culture medium was then withdrawn from the wells, and cells were transfected with 500 ng of the U6-gRNA Cas-9-2A + GFP-RM CRISPR plasmid following the Lipofectamine 3000 protocol (Invitrogen). Proper controls of untransfected cells were also prepared. After transfection, the culture medium was withdrawn from transfected and non-transfected cells and replaced with 100 μL of MTT dilution [5 mg/mL in culture medium without FBS, sterilized by filtration through a sterile polyethersulfone filtration device (0.22 µm pore size, 33 mm diameter; Millipore)], mixed by pipetting to distribute the solution evenly, and incubated for four hours at 37 °C. The MTT solution was then withdrawn, and the cells were washed once with 100 μL PBS 1 × . Formazan crystals formed by living cells were solubilized with 200 μL isopropanol containing 40 mM HCl for 10 min with shaking at 30 rpm at room temperature. Finally, absorbance at 570 nm was immediately measured using a BioTek Epoch 2 Microplate Spectrophotometer (Agilent). As a reference, absorbance at 630 nm was also determined. As a negative control, the same amount of culture medium without serum was added to a well.
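One common way to turn these readings into a viability percentage is sketched below. The text gives the wavelengths and the controls but not the formula, so the calculation itself is an assumption: the 630 nm reference is subtracted from the 570 nm reading, the cell-free well is subtracted as a blank, and transfected cells are expressed relative to untransfected cells. Function names and absorbance values are hypothetical.

```python
from statistics import mean


def corrected(a570, a630):
    """Reference-corrected absorbance: signal at 570 nm minus 630 nm."""
    return a570 - a630


def percent_viability(treated, control, blank):
    """Viability of treated wells relative to control wells, in percent.

    Each argument is a list of (a570, a630) tuples, one per well:
    treated = transfected cells, control = untransfected cells,
    blank = medium-only wells (the negative control in the text).
    """
    blank_mean = mean(corrected(a, b) for a, b in blank)
    treated_signal = mean(corrected(a, b) for a, b in treated) - blank_mean
    control_signal = mean(corrected(a, b) for a, b in control) - blank_mean
    return 100.0 * treated_signal / control_signal


# Hypothetical absorbance readings (not data from this study)
viability = percent_viability(
    treated=[(0.40, 0.05), (0.40, 0.05)],
    control=[(0.70, 0.05), (0.70, 0.05)],
    blank=[(0.10, 0.05)],
)
print(round(viability, 1))  # (0.35 - 0.05) / (0.65 - 0.05) = 50.0
```

With this convention, a value of 50 means the transfected wells carry half the blank-corrected formazan signal of the untransfected wells.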