Abstract
The orthopedia homeobox (OTP) gene encodes a homeodomain-containing transcription factor involved in brain development. OTP is mapped to human chromosome 5q14.1. Earlier we described transcription in the second intron of this gene in a wide variety of tumors, but among normal tissues only in testis. In GenBank these transcripts are represented by several 300–400 nucleotide long AI267901-like ESTs. We assumed that the AI267901-like ESTs belonged to longer transcript(s). We used the Rapid Amplification of cDNA Ends (RACE) approach and other methods to find the full-length transcript. The transcript we found was a 2436 nucleotide polyadenylated sequence in antisense to the OTP gene. The corresponding gene consisted of two exons separated by an intron of 2961 bp. The first exon was found to be 91 bp long and located in the third exon of OTP. The second exon was 2345 bp long and located in the second intron of OTP. We have shown the expression of this gene in many human tumors but, among normal tissues, only in a single sample of testis. The transcript lacked significant ORFs, suggesting that we discovered a new antisense cancer/testis (CT) sequence, OTP-AS1 (OTP antisense RNA 1), which belongs to the class of long noncoding RNAs (lncRNAs). According to our findings we assume that the OTP-AS1 and OTP genes may be a CT-coding gene/CT-ncRNA pair, or a sense-antisense gene pair involved in regulatory interactions.
Introduction
Non-coding RNAs are divided into small non-coding RNAs (20–200 nucleotides) and long non-coding RNAs (lncRNAs; 200–100,000 nucleotides). Genes encoding lncRNAs often overlap or are adjacent to protein-coding genes, and this localization favors regulation of the transcription of the neighboring genes. Studies have shown that lncRNAs play many roles in the regulation of gene expression. New evidence indicates that dysfunctions of lncRNAs are associated with human diseases and cancer.
Earlier we performed in silico analysis of the UniGene transcribed sequences database, which includes human transcribed sequences and found that the AI267901 sequence exhibited tumor-specific expression1. Subsequently we confirmed tumor-specific expression of AI267901 experimentally2,3.
We mapped the sequence AI267901 to the human genome (build GRCh38/hg38) using BLAT within the UCSC Genome Browser. The transcript was localized to 5q14.1 in the second intron of the human Orthopedia homeobox (OTP) gene encoding a homeodomain-containing transcription factor involved in brain development3.
We suggested that AI267901 and other similar ESTs were parts of a longer RNA. In order to verify our hypothesis and to obtain complete nucleotide sequence of the putative full-length RNA we used Rapid Amplification of cDNA Ends (RACE).
Here, we present a newly discovered cancer/testis long noncoding RNA, named OTP-AS1, that is expressed specifically in tumors. We also show that several transcription factors seem to participate in the regulation of OTP-AS1.
Results
We assumed that AI267901 and similar ESTs belong to one long transcript, so we used Rapid Amplification of cDNA Ends (RACE) to find its 5’ and 3’ ends. Results of the two-round amplification of the 5’ end of the transcript are presented in Fig. 1a. The figure shows a PCR product of 443 bp. This fragment was further cloned, propagated in E. coli and sequenced (S1 Sequence). The resulting sequence was aligned to build GRCh38/hg38 of the human genome. We mapped the 443 bp fragment to chromosome 5 and found that it consisted of two exons (91 and 352 bp) separated by a 2961 bp intron (Figs. 2 and 3).
Two-round amplification of the studied gene 5’ and 3’ ends using gene specific and adaptor primers. (a) Two-round amplification of the studied gene 5’ end: 1—adaptor primer and rev(N), 2—negative control, first round of PCR, no template added, 3—negative control, second round of PCR, no template added. M1—GeneRuler™ 100 bp DNA ladder (Fermentas), M2—GeneRuler™ 1 kb DNA ladder (Fermentas). (b) Two-round amplification of the studied gene 3’ end: 1—adaptor primer and forv1(N), 2—negative control, first round of PCR, no template added, 3—adaptor primer and forv2(N), 4—negative control, second round of PCR, no template added, M—GeneRuler™ 1 kb DNA ladder (Fermentas).
Genome localization of the OTP gene, its regulatory region and the sequenced fragments. Schematic alignment of the sequenced fragments (5’ end fragment—yellow, 3’ end fragment №1—green, 3’ end fragment №3—cyan, assembled full-length sequence—red) with chromosome 5 (black) and the OTP gene (blue), regulatory region (purple), core promoter (lilac).
Analysis of the exon/intron borders (Fig. 2) demonstrated that the 443 bp sequence was encoded by the “plus” DNA strand of chromosome 5, i.e. located partially in antisense to the OTP gene. The first exon was located in the third exon of OTP, and the second in the second intron of OTP.
We were unable to further extend the 3’ end of the 443-nucleotide transcript using RACE. Therefore, to determine the 3’ end of the longer transcript we performed first-strand cDNA synthesis with RT and an oligo(dT) adapter primer followed by two rounds of PCR. The results are presented in Fig. 1b. We obtained three fragments of different sizes, cloned them in E. coli, and sequenced the clones. We found that fragment №1 (see sequence in S2 Sequence) was 2311 bp long, polyadenylated, and located on chromosome 5 as an extension of the 443 bp fragment (5’ end fragment) found earlier, overlapping it by 318 bp (Fig. 3). Fragment №2 was found to reside on chromosome 3 (chr3:14247460–14247871, data not shown). Finally, fragment №3 (see sequence in S3 Sequence) was found to be polyadenylated, 940 bp long and also located on chromosome 5, fully matching the 3’ end of fragment №1 (Fig. 3).
Using the BioEdit (v.7.2.5) software and the UCSC Genome Browser we determined the sequence of this previously unknown gene located on chromosome 5 in antisense to the OTP gene. The corresponding transcript was 2436 bp long and polyadenylated. The gene consisted of two exons: the first exon was 91 bp long and located in the third exon of OTP, and the second was 2345 bp long and located in the second intron of OTP. The exons were separated by an intron 2961 bp long (Fig. 3). We called this gene OTP-AS1. The OTP-AS1 gene sequence is available in the GenBank repository, BankIt2699884 OTP-AS1 OQ938547.
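The fragment lengths reported above can be cross-checked with simple arithmetic. The following minimal sketch uses only numbers stated in the text (fragment sizes, overlap, exon and intron lengths, and the genomic coordinates chr5:77629671–77635067 reported for OTP-AS1). The function name `merged_length` and the assumption that the coordinates are 1-based and inclusive are ours, introduced for illustration only.

```python
# Consistency check of the reported OTP-AS1 lengths (all values in bp).
# Assumption: genomic coordinates are 1-based and inclusive.

FIVE_PRIME_FRAGMENT = 443   # 5' RACE product (91 bp + 352 bp)
FRAGMENT_1 = 2311           # polyadenylated 3' fragment No. 1
OVERLAP = 318               # overlap between the two fragments
EXON_1, INTRON, EXON_2 = 91, 2961, 2345
GENE_START, GENE_END = 77_629_671, 77_635_067  # chr5, plus strand


def merged_length(left: int, right: int, overlap: int) -> int:
    """Length of two overlapping fragments after merging them."""
    return left + right - overlap


transcript_from_fragments = merged_length(FIVE_PRIME_FRAGMENT, FRAGMENT_1, OVERLAP)
transcript_from_exons = EXON_1 + EXON_2
genomic_span_from_parts = EXON_1 + INTRON + EXON_2
genomic_span_from_coords = GENE_END - GENE_START + 1

print(transcript_from_fragments)   # 2436
print(transcript_from_exons)       # 2436
print(genomic_span_from_parts)     # 5397
print(genomic_span_from_coords)    # 5397
```

Under these assumptions the assembled transcript length, the sum of the exon lengths, and the genomic span all agree with one another.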
To obtain the full-length sequence of OTP-AS1 experimentally, two rounds of RT-PCR were performed. RNA isolated from 293T cells or uterine endothelial adenocarcinoma was used as the template. RT was performed with an oligo(dT) primer, and PCR with the full-forv and full-rev primers. The resulting fragment was 2436 bp long (Fig. 4) as predicted by in silico analysis (Fig. 3). This full-length transcript was cloned using a TA cloning vector, propagated in E. coli and sequenced. Its sequence is presented in S4 Sequence.
Two-round amplification of the full transcript on the cDNA from: 1—293 T cells, 2—uterus endothelium adenocarcinoma, 3—negative control, first round of PCR, no template added, 4—negative control, second round of PCR, no template added, M1—GeneRuler™ 100 bp DNA ladder (Fermentas), M2—GeneRuler™ 1 kb DNA ladder (Fermentas).
We next searched for promoter regions of the OTP-AS1 gene. As mentioned above, OTP-AS1 is located on human chromosome 5q14.1; its exact genomic coordinates are chr5:77629671–77635067 (+) based on the reference sequence (GRCh38.p13 Primary Assembly). Promoters (at least their core parts) are commonly located near the transcription start site (TSS) of genes. We used FANTOM5 “cap analysis of gene expression” (CAGE) data to locate the transcription start sites4. One CAGE tag per million (tpm) is considered robust expression, and we observed up to 4.2 tpm at chr5:77,629,642–77,629,670, adjacent to our transcript (beginning at 77,629,671) for OTP-AS1. The characteristic activity of the OTP-AS1 promoter was comparable to that of the OTP gene (max CAGE signal 7.3 tpm).
FANTOM5 CAGE data indicated that the OTP-AS1 promoter is active in the MCF7 breast cancer cell line as well as in mesenchymal stem cells of adipose tissue. A CpG island is located about 994–1608 nucleotides upstream of the transcription start site, also indicating the possible location of a TSS.
Transcription factor (TF) binding sites (TFBS) are often located near promoters and thus implicitly validate their locations. In addition, a TF may sometimes give information on the function of the gene; for instance, genes dependent on the HIF1A TF are likely to be related to the hypoxia response. A TFBS can be manifested as a DNA sequence motif or a ChIP-seq peak; the latter is cell type specific. Peaks identified in any cell type compose the human cistrome. To identify TFBS we used human cistrome data in the vicinity of the OTP-AS1 promoter5. The regulatory region (from 2000 bp upstream to 200 bp downstream of the TSS) contained 12 TFBS peaks for 9 TFs (Table S5). Among the TFs displaying DNA binding in this region were chromatin packing TFs (CTCF and Sp1, probably USF1), the oncogenes ERG and MYC, and the chromatin remodelling factor FOXA2. As the cistrome does not contain data on particular cell types, we took this information from the GTRD database of ChIP-seq peaks. In total, GTRD gave 228 TFBS peaks in the OTP-AS1 regulatory region, but only 28 of them were associated with the 9 transcription factors of AB categories identified in the cistrome dataset. Interestingly, 20 out of the 28 confirmed TFBS peaks were identified in cancer cell lines (Table S6).
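The selection of peaks falling into the regulatory region can be illustrated with a short interval calculation. This is a minimal sketch, not the procedure used to query the cistrome or GTRD: it assumes 1-based inclusive coordinates, takes the first transcript base (chr5:77,629,671, plus strand) as the TSS, and uses two made-up peaks (`peak_A`, `peak_B`) whose coordinates are hypothetical and serve only to demonstrate the overlap test.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: int  # 1-based, inclusive (assumption)
    end: int    # 1-based, inclusive (assumption)


def regulatory_window(tss: int, strand: str,
                      upstream: int = 2000, downstream: int = 200) -> Interval:
    """Window from `upstream` bp before to `downstream` bp after the TSS."""
    if strand == "+":
        return Interval(tss - upstream, tss + downstream)
    if strand == "-":
        return Interval(tss - downstream, tss + upstream)
    raise ValueError("strand must be '+' or '-'")


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one position."""
    return a.start <= b.end and b.start <= a.end


TSS = 77_629_671  # first base of the OTP-AS1 transcript, plus strand
window = regulatory_window(TSS, "+")

# Hypothetical peaks, for illustration only (not real ChIP-seq data).
peaks = {
    "peak_A": Interval(77_629_500, 77_629_700),
    "peak_B": Interval(77_640_000, 77_640_300),
}
kept = [name for name, peak in peaks.items() if overlaps(window, peak)]

print(window)  # Interval(start=77627671, end=77629871)
print(kept)    # ['peak_A']
```

A peak is retained if it overlaps the window by at least one base; stricter criteria (for example, requiring the peak summit to lie inside the window) would be an equally reasonable choice.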
To predict the transcription factor binding sites in the regulatory region based on sequence analysis we used MoLoTool6. We were able to find 50 possible binding sites for only 8 of the 9 TFs with high categories, as the JARID2 binding motif is not known (Table S7). There were 14 TF binding DNA motifs supported by GTRD and cistrome peaks, of which 13 related to cancer cell lines. Additionally, motifs for 4 TFs were supported by GTRD peaks not included in the cistrome. All of these peaks were obtained for cancer cell lines. Interestingly, for the MCF7 breast cancer cell line, the FANTOM5 data report OTP-AS1 promoter activity, and GTRD simultaneously shows TF binding nearby. In addition, 11 binding sites (of which 3 did not overlap any other binding site) were found among the cistrome peaks but had no support from GTRD. It should be noted that counting composite elements consisting of strongly overlapping binding sites resulted in 7 sites supported by both databases, 6 of them found in cancer cells (Fig. S8).
To assess the tumor specificity of expression of the newly identified gene we used commercial Clontech (USA) and BioChain (USA) cDNA panels corresponding to normal and tumor tissues. We also used a cDNA panel made in our laboratory using clinical samples of tumors from various locations and at different stages of progression (Human tumor cDNA panel, Biomedical Center, St Petersburg, Russia). Gene expression was determined by PCR with primers specific to the most conserved part of the newly discovered gene, from nucleotide position 1012 to 1452 (Fig. 5, in yellow). In the normal and fetal tissue cDNA panels, a minor specific signal corresponding to this fragment was detected in only one testis sample (Fig. 6).
Lack of expression of the conserved region of the newly discovered gene in normal human tissues. (a) 1—normal brain, 2—normal heart, 3—normal kidney, 4—normal liver, 5—normal lung, 6—normal pancreas, 7—normal placenta, 8—normal skeletal muscle, K−—PCR with no template, K+—PCR with human DNA (Full-length gel Fig. S16). (b) 1—normal colon, 2—normal ovary, 3—normal peripheral blood leukocytes, 4—normal prostate, 5—normal small intestine, 6—normal spleen, 7—normal testis, 8—normal thymus, K−—PCR with no template, K+—PCR with human DNA (Full-length gel Fig. S17). (c) 1—normal bone marrow, 2—fetal liver, 3—normal lymph node, 4—normal peripheral blood leukocyte, 5—normal spleen, 6—normal thymus, 7—normal tonsil, K−—PCR with no template, K+—PCR with human DNA (Full-length gel Fig. S18). (d) 1—fetal brain, 2—fetal heart, 3—fetal kidney, 4—fetal liver, 5—fetal lung, 6—fetal skeletal muscle, 7—fetal spleen, 8—fetal thymus, K−—PCR with no template, K+—PCR with human DNA (Full-length gel Fig. S19).
In the BioChain tumor cDNA panel, the expression of the newly discovered OTP-AS1 gene was detected in the following tumors: carcinomas of lung and bladder, adenocarcinoma of esophagus, small intestine, colon and ovary. The fragment was not found in astrocytoma, testis seminoma or in carcinomas of breast, liver, kidney, fallopian tube and ureter, stomach, or in uterine adenocarcinoma (Fig. 7a).
Expression of the conserved region of the newly discovered gene in human tumors. (a) 1—brain astrocytoma, 2—breast invasive ductal carcinoma, 3—lung squamous cell carcinoma, 4—esophagus adenocarcinoma, 5—stomach adenocarcinoma, 6—small intestine adenocarcinoma, 7—colon adenocarcinoma, 8—hepatocellular carcinoma, 9—kidney clear cell carcinoma, 10—bladder transitional cell carcinoma, 11—uterus adenocarcinoma, 12—fallopian tube medullary carcinoma, 13—ovary mucinous adenocarcinoma, 14—testis seminoma, 15—ureter papillary transitional cell carcinoma, K−—PCR with no template, K+—PCR with human DNA (Full-length gel Fig. S20). (b) 19—stage III mammary gland adenocarcinoma, 246, 250, 251, 252—stage II–III invasive duct mammary gland cancer; 2—stage IV squamous cell cervical carcinoma and its metastases into uterus (2a-1), left (2a-3) and right ovary (2a-4), 13—cervical myosarcoma, stage II–III, 6—ovary cancer, 156—stage II moderately differentiated endometrial adenocarcinoma, 270—stage III moderately differentiated endometrial adenocarcinoma with metastases, 7—seminoma, K−—PCR with no template, K+—PCR with human DNA (Full-length gel Fig. S21). 45, 63—meningiomas, 140—hypophyseal adenoma, 12, 14—squamous cell lung cancer, 17—stage III bronchus cancer, 108—stomach cancer, 30—stage IV chronic lymphocytic leukemia, 31—stage IV non-Hodgkin T-cell lymphoma, 67—lymphadenopathy of unclear pathogenesis, 82—stage II non-Hodgkin lymphoma, 92—stage IV Hodgkin’s lymphoma, 94—hemolytic anaemia of unclear pathogenesis, 102—stage II non-Hodgkin lymphoma, 113T—stage IV non-Hodgkin lymphoma, K−—PCR with no template, K+—PCR with human DNA (Full-length gel Fig. S22).
In the Biomedical Center human tumor cDNA panel the expression of the 1012–1452 fragment was detected in five samples of mammary adenocarcinoma (19, 246, 250, 251, 252); in one sample of ovarian cancer (6), hypophyseal adenoma (140) and bronchus cancer (17); in two samples of endometrial adenocarcinoma (156, 270), meningiomas (45, 63); and in all lymphoma samples (31, 67, 82, 92, 102, 94, 113) (Fig. 7b). Weak signals corresponding to the fragment were also found in samples of metastasis of squamous cell cervical carcinoma from uterus (2a-1), left (2a-3) and right ovaries (2a-4), and in one of two samples of squamous cell lung cancer (14). No signals were found in cervical carcinoma (2), cervical myosarcoma (13), seminoma (7), stomach cancer (108), leukemia (30), and in one of two samples of squamous cell lung cancer (12) (Fig. 7b).
We also studied expression of the extended transcript of the newly identified gene in RNA isolated from different tumors (non-Hodgkin’s lymphoma at stages II and IV, lymphadenopathy of unknown origin, invasive ductal breast cancer at stage II). cDNA was synthesized using an oligo(dT) primer. We conducted two-round PCR with primers to the ends of the newly identified transcript (as-forv, as-rev). Results are presented in Fig. 8. A fragment of 2378 nucleotides was found in all studied samples, thus demonstrating the expression of the previously unknown gene in different tumors.
Two-round amplification of the studied transcript on the cDNA from different clinical tumor subjects. 1—lymphadenopathy of unknown origin (67), 2—non-Hodgkin’s lymphoma at stage II (82), 3—non-Hodgkin’s lymphoma at stage IV (113), 4—invasive ductal breast cancer at stage II (246), 6—negative control, first round of PCR, no template added, 7—negative control, second round of PCR, no template added, 8—positive control, first round of PCR with plasmid containing full transcript of newly identified gene, 9—positive control, second round of PCR with plasmid containing full transcript of newly identified gene, M1—GeneRuler™ 1 kb DNA ladder (Fermentas).
The analysis of data provided by TCGA for OTP-AS1 gene expression confirmed results obtained in silico. The FPKM values for breast invasive ductal carcinoma, lung squamous cell carcinoma, colon adenocarcinoma and esophageal adenocarcinoma were in the range 16–28 FPKM (Fig. S9). We did not find any detectable expression of OTP-AS1 gene in normal breast, lungs, colon and esophagus, but in testis the expression level was 15 FPKM. The expression of OTP in normal hypothalamus is 9.4 TPM and in testis—1 TPM (Fig. 2, Fig. S10), according to GTEx data. The OTP expression values for breast invasive ductal carcinoma, lung squamous cell carcinoma, colon adenocarcinoma and esophageal adenocarcinoma and other tumors were in the range of 4.3–10.8 TPM (Fig. S11).
Using the ORF Finder web tool we searched for possible open reading frames (ORFs) and identified 10 ORFs coding for peptides from 20 to 62 amino acids long (S12 List).
The amino acid sequences of the identified ORFs were compared to the known proteins using Blastp7. Homologous proteins were not found in humans or in other organisms.
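The kind of ORF search described above can be illustrated with a simplified scan. This is only a sketch and not the ORF Finder algorithm: it assumes ORFs start at ATG and end at the first in-frame stop codon, scans only the three frames of the given strand, and uses a toy sequence (`toy`) rather than the OTP-AS1 sequence. The function name `find_orfs` and the `min_aa` threshold are our own illustrative choices.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_aa: int = 20) -> list:
    """Return (start, end, frame) for ATG-initiated ORFs on one strand.

    `start` is 0-based, `end` is exclusive and includes the stop codon.
    Only ORFs encoding at least `min_aa` amino acids are reported.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a new ORF
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3        # codons before the stop
                if n_aa >= min_aa:
                    orfs.append((start, i + 3, frame))
                start = None                   # close the ORF
    return orfs


# Toy sequence: ATG followed by 20 GCT codons and a TAA stop, in frame 2.
toy = "CC" + "ATG" + "GCT" * 20 + "TAA" + "G"
print(find_orfs(toy))             # [(2, 68, 2)]
print(find_orfs(toy, min_aa=22))  # []
```

To cover both strands, the same function could be applied to the reverse complement of the sequence.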
Using the bidirectional best hits method, we found that different parts of the sequence had different evolutionary novelty. The sequence consists of an evolutionarily old part (nucleotides 1–91 and 600–2436), which originated in Tetrapoda, and an evolutionarily novel part (nucleotides 92–600), which originated in Eutheria. Phylogenetic analysis of the whole sequence revealed that its eutherian homologs form a detached clade on the phylogram (S13, in red), reflecting the insertion of the novel sequence into the older part. The phylogenetic analysis of the 92–600 nucleotide fragment confirmed its evolutionary novelty in eutherians (S14). Using Phylop8 we found that the sequence between nucleotides 92 and 600 had low conservation in all genomes used for the analysis (Fig. 5, in blue).
Discussion
In our previous papers2,3 we demonstrated highly tumor-specific expression of an AI267901-like EST (earlier corresponding to cluster Hs.202247 from UNIGENE Build 129). This locus was expressed in 49 of 59 tumor samples of different cancer types and only in one sample of normal testis. We mapped the AI267901 sequence to the human genome. The AI267901 sequence was found to be located at 5q14.1 in the second intron of the human Orthopedia homeobox (OTP) gene (Figs. 3 and 5).
The human OTP gene is the homologue of the murine Orthopedia gene and belongs to the homeodomain gene family. Its function in humans is not fully characterized as yet. Expression of the OTP gene is found in the brain of 17-week-old human embryo, suggesting a potential role of this gene in brain development9. The gene is encoded by the “minus” strand of chromosome 5 and consists of 3 exons and 2 introns.
At the present time the OTP gene is under active study as a prognostic marker for carcinoid lung tumors by several research groups. There are data showing the OTP gene expression only in lung tumors10,11 and bladder11. We have shown that OTP is expressed in a wide range of tumors of almost all organs: it was expressed in 23 of 29 tumor samples of different organs and only in one sample of normal testis3.
AI267901 is located in the intronic region of the OTP gene and must be absent in the mature Orthopedia homeobox mRNA. At the same time, our results show that AI267901 and OTP have similar expression profiles both in normal tissues and in tumors3. Therefore, the AI267901 might possibly be interpreted either as an OTP transcript alternatively processed in tumors or as a separate transcript.
To find the complete nucleotide sequence of AI267901 we used the Rapid Amplification of cDNA Ends (RACE) and other approaches. As a result, we have obtained a 2436 nucleotide sequence of a previously unknown gene located on the strand opposite of OTP. The mRNA of this gene is polyadenylated and has two exons. The first short exon (91 bp) maps to the antisense strand of the 3’-UTR of OTP. It is known that initiation sites of noncoding RNA transcripts are frequently located in the 3’-UTRs of protein-coding genes12. The second exon (2345 bp) mapped to the opposite strand of intron 2 of OTP. The first and second exons are separated by 2961 bp of intron (Figs. 3 and 5). The sequence of the full length transcript is presented in S4 Sequence.
CAGE tags for OTP-AS1 fall into a narrow region (30 bp, chr5: 77629641–77629670) suggesting that the transcription is activated from a single promoter cluster. Close inspection shows that the CAGE tags are aligned to three closely located sites, the transcription initiation activity of which display some preference to a particular cell type. The strongest initiation site chr5: 77629658–77629659 displays the almost exclusive initiation (4.15 tpm) in the adipose derived mesenchymal stem cells (on adipogenic induction). Two other sites chr5: 77629641–77629642 and chr5: 77629650–77629651 show transcription initiation activity in MCF7 cells (1.815 and 1.105 tpm, respectively).
The OTP-AS1 promoter belongs to the class of multimodal promoters, according to the promoter architecture classification of Carninci13, see also Danino Y.M.14. Such promoters typically contain a number of TSS in a region around 100–150 bp15,16, thus we consider the region from 100 bp upstream to 50 bp downstream of the TSS as containing the core promoter of OTP-AS1.
We also consider the regions that are 2000 bp upstream and 200 bp downstream of TSS as the regulatory sequence. The downstream part is included in order not to miss any regulatory elements as some promoters are known to be partly located downstream (e.g., human IRF1 gene contains the downstream promoter element17) and also includes the beginning of the first intron which is known as a frequent participant of transcription regulation (see Fig. 3). The length of upstream part is selected to include a nearby CpG island, which is often present in human promoters.
We were able to identify the transcription factors that seem to participate in the regulation of OTP-AS1 gene transcription and their binding sites in the upstream regulatory region. For many of the TFBS, the experimental ChIP-seq data from the public databases agree with the sequence analysis. We did not conduct ChIP-seq ourselves, but used binding information from databases of processed experimental data, GTRD18 and ReMap19, which contain, at a minimum, ChIP-seq peaks for many TFs binding in different cell types. In both databases we found a large number of binding sites for a number of transcription factors. We used a cistrome5 to filter out peaks not supported by TFBS predictions, but any prediction is based on an experimental ChIP-seq peak.
Among transcription factors displaying ChIP-seq peaks5 near OTP-AS1 TSS are ERG and MYC, the known oncogenes, and MAX, which is not considered as an oncogene but can form homodimers, or heterodimers with MYC (its name MAX stands for MYC Associated Factor X). The majority of TF binding events was found in cancer cell lines18. For example, highly reliable binding sites for transcription factors CTCF and USF1 have been identified in lung carcinoma A549 cells, which is consistent with the OTP-AS1 expression results obtained in this paper. Thus, the OTP-AS1 gene seems to be up-regulated mostly in cancer.
To further explore the regulatory potential of the OTP-AS1 lncRNA we used RBPmap to predict possible sites for RNA binding proteins. A region at pos. 26–33 (uguguguu) in the first exon is likely to bind the proteins BRUNOL4, BRUNOL5, RBM24 and RBM38, all of which appear to be splicing related, which agrees with the nearby OTP-AS1 splicing site. In the second exon at pos. 1791–1798 (uauauac) an RBMS1 binding site is found. The RBMS1 protein has been shown to bind DNA upstream of the MYC gene and to promote gastric cancer metastases20, so this binding may be functional and related to the regulatory function of OTP-AS1. Finally, at pos. 1126–1040 there is a segment ‘agagaaaagagaga’, predicted to bind SRSF10, which binds to purine-rich RNA sites.
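Locating a motif such as ‘uguguguu’ within an RNA sequence and reporting its 1-based positions can be sketched as follows. This exact-match scan is a simplified stand-in and not the scoring approach used by RBPmap; the function name `find_motif` and the toy sequence are hypothetical and do not come from the OTP-AS1 transcript.

```python
def find_motif(rna: str, motif: str) -> list:
    """Return 1-based inclusive (start, end) positions of exact motif matches.

    Overlapping matches are reported; T is treated as U.
    """
    rna = rna.lower().replace("t", "u")
    motif = motif.lower().replace("t", "u")
    hits = []
    pos = rna.find(motif)
    while pos != -1:
        hits.append((pos + 1, pos + len(motif)))
        pos = rna.find(motif, pos + 1)  # continue one base further on
    return hits


toy_rna = "gcauguguguuacc"  # toy sequence, for illustration only
print(find_motif(toy_rna, "uguguguu"))  # [(4, 11)]
```

An 8-nucleotide motif reported at positions 26–33 corresponds to the same inclusive convention, since 33 − 26 + 1 = 8.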
We also tried to locate miRNA response elements using R package microRNA but failed to find any within OTP-AS1.
We have shown the expression of the full-size newly discovered gene in the human embryonic kidney 293 (HEK293) cell line and in uterine endothelial adenocarcinoma (Fig. 4), and of the 2378 bp fragment of the gene in the following human tumors: non-Hodgkin lymphoma stage II and stage IV, invasive ductal breast cancer stage II and lymphadenopathy of unknown origin (Fig. 8). Interestingly, several shorter sequences can be seen along with the expected fragment (2378 bp) in Fig. 8. These fragments might be alternatively spliced variants of the newly discovered gene in tumors, and their sequences are being determined.
The study of the expression of the conserved region of the newly discovered gene in human tumors and normal tissues showed that it was expressed in majority of the tumor samples we studied, including tumors of brain, lung, esophagus, intestines, breast, bladder, uterus, ovary and in lymphomas (Fig. 7), but among normal tissues only in testis (Fig. 6).
We also did not detect the expression of OTP-AS1 in fetal tissue samples (Fig. 6). However, it should be taken into account that the tissue samples were obtained from spontaneously aborted fetuses at 18 to 36 weeks of gestation. Since genes active early in development tend to function in many organs, and gene expression profiles may change dramatically as organs differentiate and mature21, it would be interesting to analyze OTP-AS1 gene expression at different stages of embryonic development separately.
The results of the computational analysis of TCGA data for OTP-AS1 gene expression in breast invasive ductal carcinoma, lung squamous cell carcinoma, colon adenocarcinoma and esophageal adenocarcinoma are in accordance with the data obtained by in silico analysis (Fig. S9). Nevertheless, the FPKM values for the OTP-AS1 transcript do not exceed 28, which indicates a low expression level of OTP-AS1 in the tumors we studied. The absence of OTP-AS1 gene expression, according to TCGA data, in normal tissues (with the exception of testis) confirms its CT nature.
Amino acid sequences of the ORFs were short in comparison with the known proteins. The Blastp algorithm did not show homologous proteins in humans or in other organisms.
The lack of significant ORFs suggests that we discovered a new tumor-specific long noncoding RNA (lncRNA). Thus, OTP-antisense RNA 1 gene appears to be a CT lncRNA gene. According to human non-protein coding RNA (ncRNA) gene nomenclature22,23 we assigned a symbol OTP-AS1 (OTP-antisense RNA 1) to this gene. This gene symbol is approved by HUGO Gene Nomenclature Committee (HGNC).
We found that, despite the fact that most of the gene (~ 1900 nucleotides) appeared in Tetrapoda, the insertion of an evolutionarily novel part occurred after Eutheria speciation (Fig. S14). This evolutionarily novel part locates between nucleotide 92 and 600 on the OTP-AS1 sequence.
Most lncRNAs exhibit weak or untraceable primary sequence conservation24,25,26. Nevertheless, conserved lncRNAs are described in the literature, e.g. the MALAT1 gene27. Phylop analysis (Fig. 5) shows that the older part of OTP-AS1 is also conserved. The evolutionarily younger part (~ 500 nucleotides) of the OTP-AS1 gene demonstrates a much lower level of conservation according to Phylop (Fig. 5, in blue). Thus, the whole gene is novel for Eutheria. Interestingly, all AI267901-like tumor-specific ESTs map to the non-conserved Eutheria-specific region of the gene. These data link the OTP-AS1 gene to the so-called TSEEN (tumor-specifically expressed, evolutionarily novel) genes described previously28,29,30,31.
Different exons can evolve at different evolutionary rates32. Both conserved and rapidly evolving regions have been described in the BRCA1 gene33, the ASPM gene34 and many other genes, reflecting a mosaic of positive and negative selection.
Although the functions of most lncRNAs are unknown, the number of characterized lncRNAs is growing and many publications suggest they play roles in regulation of gene expression in development, differentiation and human disease. lncRNAs may regulate protein-coding gene expression on both transcriptional and posttranscriptional levels (reviewed in35).
lncRNA loci can function in cis, and their lncRNA transcripts can function in trans36,37. lncRNA can regulate the expression of a protein coding gene neighbor38. lncRNA often overlap with coding genes, both in sense and antisense directions39. lncRNAs may interact with proteins, DNA, mRNAs, and micro RNAs, and participate in multimodal interaction networks40.
CT genes demonstrate similarity between processes of spermatogenesis and tumorigenesis. The large group of CT-ncRNA was recently described by in silico methods41. Wang and co-authors described cancer-specific CT-coding gene/CT-ncRNA pairs (where the distance between the CT-coding gene and CT-ncRNA was < 100 kb). The authors suggest that these pairs may be involved in self-regulatory interactions. For example, it was demonstrated that meiosis-related extremely highly expressed CT genes (MEIOB) and their companion testis-specific ncRNAs (TS-ncRNA; LINC00254) play crucial roles in carcinogenesis in lung adenocarcinoma41.
According to our findings herein and elsewhere3 the OTP-AS1 and OTP genes may be CT-genes. Moreover, OTP-AS1 gene is on the opposite strand to the OTP gene and they share complementary sequence in at least one exon. Sense-antisense gene pairs may affect regulatory cascades through established mechanisms42. Thus, OTP-AS1 and OTP genes may be a CT-coding gene/CT-ncRNA pair, or sense-antisense gene pair involved in regulatory interactions. This is supported by similar tissue distribution of their expression.
Thus, we have discovered a new CT lncRNA, which may have regulatory function.
Part of this data was presented at the 2nd International Conference on the Long and the Short of Non-Coding RNAs (09.06.2017–14.06.2017, Heraklion, Crete, Greece).
Materials and methods
cDNA panels
MTC™ Panels. For studies of gene expression, we used commercial cDNA panels. The panels (Clontech, USA) contained a set of normalized single-strand cDNA, produced from poly(A)+ RNA from various normal human tissues. We used the following panels: Human MTC™ Panel I (Catalog no. 636742), Human MTC™ Panel 2 (Catalog no. 637643), Human Immune System MTC™ Panel (Catalog no. 636748) and Human Fetal MTC™ Panel (Catalog no. 636747). According to the manufacturer, the panels were free from genomic DNA and were normalized to expression levels of four housekeeping genes. Each cDNA sample came from a pool of tissue samples obtained from donors of different age and sex, with 2–550 donors in each pool, and the fetal tissue samples were obtained from spontaneously aborted fetuses at 18 to 36 weeks of gestation. We assessed the quality of all samples by PCR using primers for the housekeeping gene GAPDH (data not shown).
Tumor cDNA panel
A cDNA panel containing a total of 15 cDNA samples was obtained from BioChain Institute, USA (Catalog nos.: C8235544, C8235545, C82355546, C8235549). The samples were produced by the manufacturer from various human tumors obtained by surgical resection. Each sample came from one patient and was histologically characterized. cDNA was produced from poly(A)+ mRNA that was free from genomic DNA and normalized by β-actin gene expression level. We assessed the quality of all samples by PCR using primers for the housekeeping gene GAPDH.
Clinical material
Samples of surgically excised tumors of various origins were obtained from the Kirov Military Medical Academy (St. Petersburg, Russia) with the written informed consent of all participating patients. The participants were all older than 21. The use of the samples for gene expression studies was approved by the Ethical Committee of the Kirov Military Medical Academy and the Biomedical Centre (St. Petersburg, Russia). All procedures were carried out in accordance with existing guidelines and regulations. The tumors were histologically characterized. We studied the following 29 tumor samples: stage II–III invasive ductal mammary adenocarcinoma (3 samples, patient codes: 250, 251, 252), stage III mammary adenocarcinoma (patient code 19), squamous cell cervical carcinoma, stage IV (patient code 2) and its metastases into the uterus (patient code 2a-1), left ovary (patient code 2a-3) and right ovary (patient code 2a-4), cervical myosarcoma, stage II–III (patient code 13), ovarian cancer (patient code 6), moderately differentiated endometrial adenocarcinoma, stage II (patient code 156), moderately differentiated endometrial adenocarcinoma with metastases, stage III (patient code 270), seminoma (patient code 7), meningioma (patient codes 45, 63), hypophyseal adenoma (patient code 140), squamous cell lung cancer (patient codes 12, 14), bronchus cancer, stage III (patient code 17), stomach cancer (patient code 108), chronic lymphocytic leukemia, stage IV (patient code 30), non-Hodgkin T-cell lymphoma, stage IV (patient code 31), lymphadenopathy of unclear pathogenesis (patient code 67), non-Hodgkin lymphoma, stage II (patient code 82), Hodgkin’s lymphoma, relapse, stage IV (patient code 92), hemolytic anaemia of unclear pathogenesis (patient code 94), non-Hodgkin lymphoma, stage II (patient code 102), non-Hodgkin lymphoma, stage IV (patient code 113), invasive ductal breast cancer, stage II (patient code 246).
Additionally, we used the human embryonic kidney 293 (HEK293) cell line, obtained from the Institute of Cytology of the Russian Academy of Sciences.
RNA isolation and quality control
Total RNA from clinical material of human tumors was isolated using guanidine isothiocyanate as described elsewhere43. RNA samples were treated with DNase I (RNase-free, Sigma, USA) for 10 min at 25 °C in order to remove any contaminating genomic DNA.
The concentration of isolated RNA was measured using an Ultrospec® 3100 pro spectrophotometer. RNA quality was assessed spectrally by the A260/A280 ratio and visually, following agarose gel electrophoresis, by the band intensity ratio of 28S rRNA to 18S rRNA44.
The absence of DNA in the RNA samples was verified by PCR using primers for the housekeeping gene GAPDH (forward 5ʹ-TGAAGGTCGGAGTCAACGGATTTGGT-3ʹ, reverse 5ʹ-CATGTGGGCCATGAGGTCCACCAC-3ʹ). Conditions for PCR amplification were as follows: 3 min of denaturation at 94 °C; 40 cycles of 30 s at 94 °C, 30 s at 68 °C, 30 s at 72 °C; followed by a final extension for 5 min at 72 °C. The resulting PCR products were resolved by electrophoresis in 2% agarose gel and stained with ethidium bromide. The absence of DNA contamination in the RNA samples was indicated by the lack of the 983 bp amplification product of GAPDH. The gels were photographed under UV illumination.
RACE (rapid amplification of cDNA ends)
We used the Marathon™ cDNA Amplification Kit (Clontech) to obtain cDNA from uterine adenocarcinoma RNA samples according to the manufacturer’s protocol. Double-stranded cDNA was subjected to 5ʹ- and 3ʹ-RACE PCR.
5ʹ-RACE PCR was a two-round amplification using the gene-specific primers forv1 (5ʹ-CGATGGATAAACAGGTCTCGTCTCTTCC-3ʹ, Tm = 62 °C) and forv1N (5ʹ-AGGTCTCGTCTCTTCCCAGTTGCAG-3ʹ, Tm = 61 °C) and the adaptor primer (5ʹ-GGCCAGGCGTCGACTAGTAC-3ʹ).
3ʹ-RACE PCR was a two-round amplification using the gene-specific primers rev1 (5ʹ-TGCAGGTTGTTAGGAACCGGTCTTG-3ʹ, Tm = 62 °C) and rev1N (5ʹ-TTAGGAACCGGTCTTGATTTTATAAGAC-3ʹ, Tm = 56 °C) and the adaptor primer.
Gene-specific primers (forv1, rev1) were designed to generate overlapping PCR products.
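As a rough cross-check of the primer properties quoted above, the following minimal sketch computes length, GC fraction and a composition-only melting temperature for the four gene-specific RACE primers. The formula used, Tm = 64.9 + 41 × (G + C − 16.4) / N, is a common approximation for oligonucleotides longer than 13 nt; it is an assumption here and not necessarily the method used to obtain the Tm values reported in the text. The function names are illustrative.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer sequence."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(primer: str) -> float:
    """Rough melting temperature (deg C) from base composition only.

    Approximation for oligos longer than 13 nt:
    Tm = 64.9 + 41 * (G + C - 16.4) / N
    Salt and primer concentrations are ignored (an assumption of this sketch).
    """
    seq = primer.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


# Gene-specific RACE primers quoted in the text above
primers = {
    "forv1": "CGATGGATAAACAGGTCTCGTCTCTTCC",
    "forv1N": "AGGTCTCGTCTCTTCCCAGTTGCAG",
    "rev1": "TGCAGGTTGTTAGGAACCGGTCTTG",
    "rev1N": "TTAGGAACCGGTCTTGATTTTATAAGAC",
}

for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.2f}, Tm ~{basic_tm(seq):.1f} C")
```

Values from this approximation can differ by a few degrees from those produced by nearest-neighbor models or vendor calculators, so they should be read only as a sanity check.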
The PCR mixture contained 2 μl of 1:50 cDNA dilution, PCR-buffer (Qiagen, Germany), 100 μM (each) dATP, dGTP, dTTP and dCTP, 5 pmol of each primer, and 1 unit of Hot Taq DNA polymerase (Qiagen, Germany) in a total of 25-µl reaction volume.
Conditions of the reactions were as follows: 15 min of denaturation at 94 °C; 10 pre-cycles of 30 s at 94 °C, 30 s at 68 °C, 4 min at 72 °C. Then 5 pmol of the adaptor primer was added to the reaction mix and amplification was continued under the following conditions: 1 min of denaturation at 94 °C; 5 cycles of 30 s at 94 °C and 4 min at 72 °C, 5 cycles of 30 s at 94 °C and 4 min at 70 °C, 25 cycles of 30 s at 94 °C and 4 min at 68 °C; followed by a final extension for 5 min at 72 °C.
A 50-fold diluted 1 μl aliquot from the 1st round of amplification was used for the 2nd round of amplification with nested primers under the same conditions, but without the 10 pre-cycles. The resulting products were resolved by electrophoresis in 2% agarose gel and stained with ethidium bromide.
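The first-round RACE program described above has several stages with different annealing/extension temperatures, so it can help to write it down as data and total it up. The sketch below is only a bookkeeping aid under the assumption that ramping time between steps is ignored; the `Stage` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One block of a thermocycler program: `cycles` repeats of the given holds."""
    label: str
    cycles: int
    holds_s: tuple  # hold durations in seconds within one cycle


# First-round RACE program as described in the text
race_round1 = [
    Stage("initial denaturation, 94 C", 1, (15 * 60,)),
    Stage("pre-cycles, 94/68/72 C", 10, (30, 30, 4 * 60)),
    Stage("denaturation after adding adaptor primer, 94 C", 1, (60,)),
    Stage("94 C then 72 C", 5, (30, 4 * 60)),
    Stage("94 C then 70 C", 5, (30, 4 * 60)),
    Stage("94 C then 68 C", 25, (30, 4 * 60)),
    Stage("final extension, 72 C", 1, (5 * 60,)),
]


def amplification_cycles(program):
    """Count cycles in multi-cycle stages (single holds are not cycles)."""
    return sum(stage.cycles for stage in program if stage.cycles > 1)


def hold_time_s(program):
    """Total programmed hold time in seconds, ignoring ramping between steps."""
    return sum(stage.cycles * sum(stage.holds_s) for stage in program)


print(amplification_cycles(race_round1), "cycles,",
      hold_time_s(race_round1) / 60, "min of programmed holds")
```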
Determination of the 3ʹ end of the transcript
To determine the 3ʹ end of the transcript, reverse transcription (RT) followed by two-round PCR was performed. cDNA was obtained from 293T cell RNA using SuperScript™ III Reverse Transcriptase (Invitrogen) with an oligo(dT) adapter primer (5ʹ-GGCCAGGCGTCGACTAGTACTTTTTTTTTTTTTTTTT-3ʹ) according to the manufacturer’s protocol. The cDNA was subjected to two-round amplification with the adapter primer and the gene-specific primers forv1 and forv2 (5ʹ-GTGCAGAAGTTATTTTACTGATTTG-3ʹ, Tm = 63 °C). The PCR mixture contained 1 μl of cDNA, PCR Taq buffer (Invitrogen), 3 mM MgCl2, 100 μM (each) dNTP, 5 pmol of each primer, and 1 unit of Platinum Taq DNA Polymerase (Invitrogen) in a total reaction volume of 25 µl. Conditions for the reactions were as follows: 2 min of denaturation at 94 °C; 5 cycles of 15 s at 94 °C, 15 s at 60 °C, 5 min at 72 °C; 35 cycles of 15 s at 94 °C, 15 s at 55 °C, 5 min at 72 °C; and a final extension for 5 min at 72 °C.
A 2 μl aliquot from the 1st round was used for the 2nd round of amplification with the nested primers forv1N and forv2N (5ʹ-GTTATTTTACTGATTTGGTTTTTATG-3ʹ, Tm = 63 °C) and the adapter primer under the same conditions, but with the annealing temperatures increased by 2 °C (to 62 °C and 57 °C, respectively). The resulting products were resolved by electrophoresis in 2% agarose gel and stained with ethidium bromide.
cDNA synthesis
To obtain the full-length PCR product of the transcript we used cDNA prepared with SuperScript™ III Reverse Transcriptase (Invitrogen) and an oligo(dT) primer on RNA from 293T cells and human tumors (patient codes 67, 82, 113, 156 and 246). cDNA was prepared as recommended by the manufacturer.
To obtain the Biomedical Center human tumor cDNA panel we used the RevertAid® First Strand cDNA Synthesis Kit (Fermentas, Lithuania) with random hexamer primers on RNAs from different human tumors, following the manufacturer’s guidelines.
cDNA samples were stored at −20 °C. The quality of the samples was assessed by PCR using primers for the housekeeping gene GAPDH (data not shown).
Two-round amplification of the full-length transcript
Two rounds of PCR were performed with the primers for the 5ʹ end and the 3ʹ end: as-forv (5ʹ-TGCACAGCATGCCCTAGAC-3ʹ, Tm = 60 °C), as-rev (5ʹ-TTTTACTGATTTGGTCATTATG-3ʹ, Tm = 61 °C) and full-forv (5ʹ-GTCTGAGCGTGAGCGAGAG-3ʹ, Tm = 62 °C), full-rev (5ʹ-ATGAAAAAAGAAAACGAGGTCTATT-3ʹ, Tm = 58 °C).
The PCR mixture contained 1 μl of cDNA, PCR-buffer High Fidelity (Invitrogen), 2 mM MgSO4, 100 μM (each) dNTP, 5 pmol of each primer, and 1 unit of Platinum DNA Polymerase High Fidelity (Invitrogen) in a total reaction volume of 25 µl. The first round of amplification was performed under the following conditions: 2 min of denaturation at 94 °C; 40 cycles of 15 s at 94 °C, 20 s at 55 °C, 3.5 min at 68 °C; and a final extension for 5 min at 68 °C. A 1 μl aliquot from the 1st round was used for the 2nd round of amplification under the same conditions. The resulting products were resolved by electrophoresis in 2% agarose gel and stained with ethidium bromide.
PCR
PCR primers targeting the conserved region of OTP-AS1 were designed based on our sequence of OTP-AS1 (S1 Sequence): forward primer 1012forv, 5ʹ-CACTTTCATGATATCTGCTGTTAC-3ʹ, and reverse primer 1452rev, 5ʹ-ATAGTGTGCTGTAATTCCATTG-3ʹ. The expected size of the amplicon was 440 bp (from nucleotide 1012 to nucleotide 1452 of the OTP-AS1 sequence).
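The expected amplicon size for a primer pair can be checked in silico by locating the forward primer and the reverse complement of the reverse primer on the template. The sketch below assumes exact primer matches and uses a short made-up template rather than the OTP-AS1 sequence, which is not reproduced here; `amplicon_length` and `toy_template` are hypothetical names.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon_length(template: str, forward: str, reverse: str):
    """Length of the product spanning the first exact forward-primer match and
    the first downstream match of the reverse primer's reverse complement.
    Returns None if either primer has no exact match."""
    template = template.upper()
    start = template.find(forward.upper())
    if start < 0:
        return None
    rev_site = reverse_complement(reverse)
    rev_start = template.find(rev_site, start + len(forward))
    if rev_start < 0:
        return None
    return rev_start + len(rev_site) - start


FWD_1012 = "CACTTTCATGATATCTGCTGTTAC"  # primer 1012forv from the text
REV_1452 = "ATAGTGTGCTGTAATTCCATTG"    # primer 1452rev from the text

# Hypothetical toy template, NOT the OTP-AS1 sequence:
# flank + forward site + 20-nt spacer + reverse site + flank
toy_template = ("A" * 10 + FWD_1012 + "ACGT" * 5
                + reverse_complement(REV_1452) + "T" * 10)

print(amplicon_length(toy_template, FWD_1012, REV_1452))
```

Run against the real transcript sequence, the same function would be expected to return a value close to the 440 bp stated above, provided both primers match exactly.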
The PCR mixture contained 2.5 µl of cDNA, PCR-buffer (67 mM Tris–HCl, pH 8.9, 4 mM MgCl2, 16 mM (NH4)2SO4, 10 mM 2-mercaptoethanol), 200 µM (each) dNTP, 1 unit of Taq DNA polymerase (Fermentas, Lithuania), and 5 pmol of each primer in a total reaction volume of 25 µl. Amplification was performed under the following conditions: 1 min at 95 °C; 35 cycles consisting of 30 s at 95 °C, 30 s at 60 °C, and 60 s at 72 °C; and final elongation at 72 °C for 5 min.
All PCR products were analyzed by electrophoresis in 2% agarose gel and detected by staining with ethidium bromide.
Sequencing
PCR products were extracted from the agarose gel, cloned into the pGEM-T Easy Vector (Promega) or the TA cloning vector (Invitrogen), propagated in E. coli and sequenced using conventional techniques.
Software and databases
We used BioEdit software for basic manipulation of nucleic acid and amino acid sequences. Resources of the US National Center for Biotechnology Information (NCBI) databases (http://www.ncbi.nlm.nih.gov/) and the UCSC Genome Browser (GB) (http://genome.ucsc.edu/) were used extensively.
An ORF search was performed with the ORF Finder webtool (http://www.bioinformatics.org/sms2/orf_find.html).
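For readers unfamiliar with ORF searching, the idea can be sketched as a scan of all six reading frames for stretches that begin with ATG and end at the first in-frame stop codon. The code below is an illustrative stand-in, not the ORF Finder webtool itself; the `min_codons` threshold and the coordinate convention are assumptions of this sketch.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=30):
    """Return ORFs as (strand, frame, start, end) tuples.

    Coordinates are 0-based and end-exclusive on the scanned strand. An ORF
    runs from an ATG to the first in-frame stop codon (stop included) and
    must contain at least `min_codons` codons before the stop.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


# Hypothetical toy sequence with one short ORF in forward frame 2
toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
print(find_orfs(toy, min_codons=3))
```

A transcript for which such a scan returns only short ORFs, under whatever length threshold is chosen, is a candidate non-coding RNA; the threshold itself is a judgment call.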
To analyze the evolutionary age of the gene we used the bidirectional best hits (BBH) method. Two genes, e.g. g and h, form bidirectional best hits if the similarity (i.e., highest alignment score) between g and h is greater than that between g and any other gene (h is the best hit for g) and vice versa45. The orthologs were searched in 40 completely sequenced eukaryotic genomes, which were retrieved from the “Genome” resource of the NCBI (http://www.ncbi.nlm.nih.gov/genome/). The genomes of representatives of major taxa of the human lineage, i.e. Bilateria, Deuterostomia, Chordata, Euteleostomi, Tetrapoda, Amniota, Mammalia, Theria, Eutheria, Euarchontoglires, Catarrhini, Homininae, were chosen. The genomes we used are listed in S15. The hidden Markov model–based nHMMER tool46 and an original shell script were used to perform the homology search. For nHMMER an e-value threshold of 1e-10 was specified as a more widely accepted threshold for homology based on DNA:DNA searches47. The first hit for the query sequence within the program output was considered the best hit. The very same procedure was performed in the opposite direction, i.e. with the subject genome used as the query and the query genome used as the subject. All homologous sequences were collected in a single FASTA file.
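The bidirectional best hits criterion described above can be expressed compactly once the search results in both directions are available as tables of hits. The following sketch assumes each hit carries an e-value and a score and takes the highest-scoring hit under the 1e-10 threshold as the best hit; the `Hit` record, the function names and the example identifiers are all hypothetical, and parsing of the actual search output is not shown.

```python
from collections import namedtuple

Hit = namedtuple("Hit", "query subject evalue score")


def best_hits(hits, max_evalue=1e-10):
    """Map each query to its best subject: the highest-scoring hit among
    those whose e-value passes the threshold."""
    best = {}
    for h in hits:
        if h.evalue > max_evalue:
            continue
        if h.query not in best or h.score > best[h.query].score:
            best[h.query] = h
    return {q: h.subject for q, h in best.items()}


def bidirectional_best_hits(forward_hits, reverse_hits, max_evalue=1e-10):
    """Pairs (g, h) where h is the best hit of g and g is the best hit of h."""
    fwd = best_hits(forward_hits, max_evalue)
    rev = best_hits(reverse_hits, max_evalue)
    return sorted((g, h) for g, h in fwd.items() if rev.get(h) == g)


# Hypothetical toy hits: query genome -> subject genome, and the reverse search
forward = [
    Hit("human_g", "mouse_h", 1e-50, 300.0),
    Hit("human_g", "mouse_x", 1e-12, 80.0),
    Hit("human_k", "mouse_h", 1e-20, 120.0),
    Hit("human_w", "mouse_z", 1e-5, 40.0),   # fails the e-value threshold
]
reverse = [
    Hit("mouse_h", "human_g", 1e-48, 295.0),
    Hit("mouse_h", "human_k", 1e-19, 118.0),
    Hit("mouse_x", "human_q", 1e-30, 150.0),
]

print(bidirectional_best_hits(forward, reverse))
```

In the toy data, `human_k` also finds `mouse_h` as its best hit, but the reverse search prefers `human_g`, so only the reciprocal pair is reported.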
Multiple alignments of homologs were generated using the MAFFT alignment software package48. The L-INS-i algorithm was chosen as the most accurate for datasets with 200 or fewer sequences49. The output FASTA file was converted to nexus format by the Alignment Converter web tool (http://www.ibi.vu.nl/programs/convertalignwww/) for further utilization.
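The FASTA-to-nexus conversion step is simple enough to illustrate directly. The sketch below is a minimal stand-in for the web tool named above, not a reproduction of it: it assumes an aligned DNA FASTA with equal-length sequences and whitespace-free names, and writes a basic NEXUS data block. Function names are illustrative.

```python
def read_fasta(text):
    """Parse FASTA text into an ordered list of (name, sequence) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the taxon name
            records.append([line[1:].split()[0], ""])
        elif records:
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def fasta_to_nexus(text):
    """Render an aligned DNA FASTA as a simple NEXUS data block."""
    records = read_fasta(text)
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    nchar = lengths.pop()
    width = max(len(name) for name, _ in records) + 2
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(records)} nchar={nchar};",
        "  format datatype=dna missing=? gap=-;",
        "  matrix",
    ]
    lines += [f"  {name.ljust(width)}{seq}" for name, seq in records]
    lines += ["  ;", "end;"]
    return "\n".join(lines)


# Hypothetical two-sequence alignment
aligned = ">human\nACG-T\n>mouse\nACGTT\n"
print(fasta_to_nexus(aligned))
```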
A phylogram and cladogram of the complete transcript and of its Eutheria-specific part were obtained with the MrBayes (v3.2.3)50,51 and FigTree (github.com/rambaut/figtree/) tools. Bayesian reconstruction of phylogeny was conducted using the MrBayes software package for 20,000 “generations” and the nexus file obtained at the previous step. The number of “generations” was chosen as the optimum for our dataset according to previously conducted computational experiments: after 20,000 generations the standard deviation of split frequencies fell below 0.01. This standard deviation was chosen according to Ronquist et al.51. The cladogram with the posterior probabilities for each split and a phylogram with mean branch lengths were generated and written to nexus files. The visualization and editing of the trees were performed with FigTree.
PhyloP, integrated into the UCSC Genome Browser, was used for conservation analysis.
The search for potential regulatory elements
To find the potential promoter region of the OTP-AS1 gene we analyzed the atlas of the human promoterome by the FANTOM5 Consortium52 as represented in the Zenbu genome browser (https://fantom.gsc.riken.jp/zenbu/gLyphs/).
For identification of transcription factor binding sites (TFBS) the human cistrome5 was used. It is a map of transcription factor binding sites obtained by integration of ChIP-Seq peaks from different experiments. The data are classified into four reliability categories based on their experimental and technical reproducibility (A for both experimental and technical reproducibility, B for experimental reproducibility only, C for technical reproducibility only, D for all others). We considered only TFBS belonging to categories A and B. To additionally verify the peaks we used binding information from the databases of processed experimental data GTRD18 and ReMap19, which contain minimally processed ChIP-Seq data for TF binding in different cell types. We used the cistrome5 to filter out peaks not supported by TFBS predictions; every prediction is nevertheless based on an experimental ChIP-Seq peak.
For information on TF binding in particular cell types we used the GTRD database18. To annotate occurrences of TF binding motifs we used MoLoTool (Motif Location Toolbox, http://molotool.autosome.ru)6.
We searched for miRNA response elements with the R package ‘microRNA’53.
To predict RNA-binding protein sites we used the RBPmap server54; only high-stringency predictions with p-values below 1E-5 were selected.
The analysis of data provided by TCGA and GTEx
The computational analysis of TCGA data for OTP-AS1 gene expression was performed using breast invasive ductal carcinoma, lung squamous cell carcinoma, colon adenocarcinoma and esophageal adenocarcinoma transcriptomes. Paired-end FASTQ files of RNA-Seq data from tumor tissues were accessed from the Genomic Data Commons (GDC) legacy archive (https://gdc-portal.nci.nih.gov/legacy-archive). The GTF file was obtained from ENCODE. The raw paired-end reads in FASTQ format were aligned to the human reference genome, GRCh38.p12 (obtained from UCSC), with the STAR aligner. The resulting BAM files were used for read counting at the next step. Reads mapped to the OTP-AS1 transcript were counted using HTSeq and normalized to FPKM (fragments per kilobase of transcript per million mapped reads). The FPKM cutoff was set at 1.
The expression analysis of the OTP gene in normal tissues was performed with software integrated into the GTEx database. The GTEx database includes RNA-Seq analysis results from 54 tissues obtained from 1000 individuals who died as a result of an accident and had no serious pathologies during their life. The expression analysis in tumor tissues was performed on the TCGA database using software available at the cBioPortal website (www.cbioportal.org).
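The FPKM normalization and cutoff mentioned above amount to a one-line formula. The sketch below applies it to made-up counts; the 2436-nt transcript length is the one reported in this article, while the sample names, fragment counts and library sizes are hypothetical, and treating a value equal to the cutoff as expressed is an assumption.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


OTP_AS1_LENGTH_BP = 2436  # transcript length reported in this article
FPKM_CUTOFF = 1.0         # cutoff stated in the text

# Hypothetical samples: (fragments counted on OTP-AS1, total mapped fragments)
samples = {
    "tumor_A": (100, 20_000_000),
    "tumor_B": (12, 30_000_000),
}

for name, (count, library_size) in samples.items():
    value = fpkm(count, OTP_AS1_LENGTH_BP, library_size)
    status = "expressed" if value >= FPKM_CUTOFF else "below cutoff"
    print(f"{name}: FPKM {value:.3f} ({status})")
```

Because the transcript length sits in the denominator, the same raw count yields a lower FPKM for a longer transcript, which is why counts are not compared directly across genes.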
Data availability
The datasets generated and analyzed during the current study are available in the GenBank repository, BankIt2699884 OTP-AS1 OQ938547. Data is also provided within the manuscript or supplementary information files.
Change history
23 April 2025
A Correction to this paper has been published: https://doi.org/10.1038/s41598-025-98337-w
References
Baranova, A. V. et al. In silico screening for tumour-specific expressed sequences in human genome. FEBS Lett. 508(1), 143–148 (2001).
Krukovskaja, L. L., Baranova, A., Tyezelova, T., Polev, D. E. & Kozlov, A. P. Experimental study of human expressed sequences newly identified in silico as tumor specific. Tumour Biol. 26(1), 17–24 (2005).
Karnaukhova, Y. K., Polev, D. E., Krukovskaya, L. L. & Kozlov, A. P. The study of Orthopedia homeobox gene expression in different normal and tumor human tissues. Vopr Onkol. 63(1), 128–34 (2017).
Noguchi, S. et al. Data Descriptor: FANTOM5 CAGE profiles of human and mouse samples. Sci. Data 4, 170112. https://doi.org/10.1038/sdata.2017.112 (2017).
Vorontsov, I. E. et al. Genome-wide map of human and mouse transcription factor binding sites aggregated from ChIP-Seq data. BMC Res. Notes 11, 756 (2018).
Kulakovskiy, I. V. et al. HOCOMOCO: Towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis. Nucleic Acids Res. 46(D1), 252–259 (2018).
Camacho, C. et al. BLAST+: Architecture and applications. BMC Bioinform. 10, 421 (2008).
Pollard, K. S., Hubisz, M. J. & Siepel, A. Detection of non-neutral substitution rates on mammalian phylogenies. Genome Res. 20(1), 110–121 (2010).
Lin, X. et al. Identification, chromosomal assignment, and expression analysis of the human homeodomain-containing gene Orthopedia (OTP). Genomics 60(1), 96–104 (1999).
Swarts, D. R. et al. CD44 and OTP are strong prognostic markers for pulmonary carcinoids. Clin. Cancer Res. 19(8), 2197–2207 (2013).
Nonaka, D., Papaxoinis, G. & Mansoor, W. Diagnostic utility of orthopedia homeobox (OTP) in pulmonary carcinoid tumors. Am. J. Surg. Pathol. 40(6), 738–744 (2016).
The FANTOM Consortium and RIKEN Genome Exploration Research Group and Genome Science Group. The transcriptional landscape of the mammalian genome. Science 309, 1559–63 (2005).
Carninci, P. et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nat. Genet. 38, 626–635 (2006).
Danino, Y. M., Even, D., Ideses, D. & Juven-Gershon, T. The core promoter: At the heart of gene expression. BBA 1849(8), 1116–1131 (2015).
Forrest, A. R. R. et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).
Smale, S. T. & Kadonaga, J. T. The RNA polymerase II core promoter. Annu. Rev. Biochem. 72, 449–479 (2003).
Burke, T. W. & Kadonaga, J. T. The downstream core promoter element, DPE, is conserved from Drosophila to humans and is recognized by TAFII60 of Drosophila. Genes Dev. 11(22), 3020–31 (1997).
Yevshin, I., Sharipov, R., Kolmykov, S., Kondrakhin, Y. & Kolpakov, F. GTRD: A database on gene transcription regulation-2019 update. Nucleic Acids Res. 47(D1), D100–D105. https://doi.org/10.1093/nar/gky1128 (2019).
Hammal, F., de Langen, P., Bergon, A., Lopez, F. & Ballester, B. ReMap 2022: A database of Human, Mouse, Drosophila and Arabidopsis regulatory regions from an integrative analysis of DNA-binding sequencing experiments. Nucleic Acids Res. 7, D316–D325 (2022).
Liu, M. et al. RBMS1 promotes gastric cancer metastasis through autocrine IL-6/JAK2/STAT3 signaling. Cell Death Dis. 13(3), 287 (2022).
Cardoso-Moreira, M. et al. Gene expression across mammalian organ development. Nature 571(7766), 505–509 (2019).
Wright, M. W. & Bruford, E. A. Naming “junk”: Human non-protein coding RNA (ncRNA) gene nomenclature. Hum. Genom. 5(2), 90–98 (2011).
Wright, M. W. A short guide to long non-coding RNA gene nomenclature. Hum. Genom. 8(1), 7 (2014).
Nitsche, A. & Stadler, P. F. Evolutionary clues in lncRNAs. Wiley Interdiscip. Rev. RNA. 8(1), 1376 (2017).
Hutchinson, J. N. et al. A screen for nuclear transcripts identifies two linked noncoding RNAs associated with SC35 splicing domains. BMC Genom. 8, 39 (2007).
Basu, S., Müller, F. & Sanges, R. Examples of sequence conservation analyses capture a subset of mouse long non-coding RNAs sharing homology with fish conserved genomic elements. BMC Bioinform. 14(Suppl 7), 14 (2013).
Johnsson, P., Lipovich, L., Grandér, D. & Morris, K. V. Evolutionary conservation of long noncoding RNAs; sequence, structure, function. Biochim. Biophys. Acta 1840(3), 1063–1071 (2014).
Samusik, N., Krukovskaya, L., Meln, I., Shilov, E. & Kozlov, A. P. PBOV1 is a human de novo gene with tumor-specific expression that is associated with a positive clinical outcome of cancer. PLoS ONE 8(2), e56162 (2013).
Polev, D. E., Karnaukhova, I. K., Krukovskaya, L. L. & Kozlov, A. P. ELFN1-AS1: A novel primate gene with possible MicroRNA function expressed predominantly in human tumors. Biomed. Res. Int. 2014, 398097 (2014).
Dobrynin, P. V., Matyunina, E. A., Malov, S. V. & Kozlov, A. P. The novelty of human cancer/testis antigen encoding genes in evolution. Int. J. Genom. 2013, 105–108 (2013).
Kozlov, A. P. Expression of evolutionarily novel genes in tumors. Infect. Agent Cancer 11, 34 (2016).
Zhang, X.H.-F. & Chasin, L. A. Comparison of multiple vertebrate genomes reveals the birth and evolution of human exons. Proc. Natl. Acad. Sci. USA 103(36), 13427–13432 (2006).
Pavlicek, A. et al. Evolution of the tumor suppressor BRCA1 locus in primates: Implications for cancer predisposition. Hum. Mol. Genet. 13, 2737–2751 (2004).
Kouprina, N. et al. Accelerated evolution of the ASPM gene controlling brain size begins prior to human brain expansion. PLoS Biol. 2, 0653–0663 (2004).
Kornienko, A. E., Guenzl, P. M., Barlow, D. P. & Pauler, F. M. Gene regulation by the act of long non-coding RNA transcription. BMC Biol. 11, 59 (2013).
Liu, S. J. & Lim, D. A. Modulating the expression of long non-coding RNAs for functional studies. EMBO Rep. 19, e46955 (2018).
Kopp, F. & Mendell, J. T. Functional classification and experimental dissection of long noncoding RNAs. Cell 172(3), 393–407. https://doi.org/10.1016/j.cell.2018.01.011 (2018).
Engreitz, J. M. et al. Local regulation of gene expression by lncRNA promoters, transcription and splicing. Nature 539, 452–455 (2016).
Derrien, T. et al. The GENCODE v7 catalog of human long noncoding RNAs: Analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775–1789 (2012).
Smith, K. N., Miller, S. C., Varani, G., Calabrese, J. M. & Magnuson, T. Multimodal long noncoding RNA interaction networks: Control panels for cell fate specification. Genetics 213(4), 1093–1110 (2019).
Wang, C. et al. Systematic identification of genes with a cancer-testis expression pattern in 19 cancer types. Nat. Commun. 7, 10499 (2016).
Wood, E. J., Chin-Inmanu, K., Jia, H. & Lipovich, L. Sense-antisense gene pairs: Sequence, transcription, and structure are not conserved between human and mouse. Front. Genet. 4, 183 (2013).
Chirgwin, J. M., Przybyla, A. E., MacDonald, R. J. & Rutter, W. J. Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease. Biochemistry 18(24), 5294–5299 (1979).
Sambrook, J. & Russell, D. W. Molecular Cloning: A Laboratory Manual, 3rd edn (Cold Spring Harbor Laboratory Press, 2001).
Zhang, M. & Leong, H. W. BBH-LS: An algorithm for computing positional homologs using sequence and gene context similarity. BMC Syst. Biol. 6(Suppl1), 22 (2012).
Wheeler, T. J. & Eddy, S. R. nhmmer: DNA homology search with profile HMMs. Bioinformatics 29(19), 2487–2489 (2013).
Pearson, W. R. An introduction to sequence similarity (“Homology”) searching. Curr. Protoc. Bioinform. (2013).
Katoh, K., Asimenos, G. & Toh, H. Multiple alignment of DNA sequences with MAFFT. Methods Mol. Biol. 537, 39–64 (2009).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 30(4), 772–780 (2013).
Huelsenbeck, J. P. & Ronquist, F. Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755 (2001).
Ronquist, F. & Huelsenbeck, J. P. Mrbayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574 (2003).
Lizio, M. et al. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol. 16, 22. https://doi.org/10.1186/s13059-014-0560-6 (2015).
Gentleman, R. & Falcon, S. MicroRNA: Data and Functions for Dealing with MicroRNAs. R Package Version 1.62.0 (2024).
Paz, I., Kosti, I., Ares, M., Cline, M. & Mandel-Gutfreund, Y. RBPmap: A web server for mapping binding sites of RNA-binding proteins. Nucleic Acids Res. 42, W361–W367 (2014).
Acknowledgements
The results published here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.
Author information
Contributions
Iu.K., D.P., L.K. and Al.M. performed the experiments on full-length transcript finding and expression analysis. Iu.K., O.N. and E.A. wrote the main manuscript text and prepared the figures. An.M., I.P. and V.M. performed the bioinformatic analysis. E.A. revised the main manuscript text and prepared the point-by-point response to the reviewers’ and editors’ questions. A.K. provided project management and supervision, and reviewed and edited the manuscript text.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this Article was revised: The Acknowledgements section in the original version of this Article was omitted. The Acknowledgements section now reads: “The results published here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.”
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Karnaukhova, I.K., Polev, D.E., Krukovskaya, L.L. et al. A new cancer/testis long noncoding RNA, the OTP-AS1 RNA. Sci Rep 14, 28676 (2024). https://doi.org/10.1038/s41598-024-80065-2
DOI: https://doi.org/10.1038/s41598-024-80065-2










