Abstract
Recent advancements in experimental and computational methods for RNA secondary structure detection have revealed the crucial role of RNA structural elements in diverse molecular processes within living cells. It has been demonstrated that the secondary structure of the entire viral genome is often responsible for performing crucial functions in the viral life cycle and also influences virus evolution. To investigate the role of viral RNA secondary structure, alongside experimental techniques, the use of bioinformatics tools is important for analyzing various secondary structure patterns, including hairpin loops, internal loops, multifurcations, external loops, bulges, stems, and pseudoknots. Here, we have introduced a Python package for analyzing RNA secondary structure elements in viral genomes, which includes the recognition of common secondary structure patterns, the generation of descriptive statistics for these structural elements, and the provision of their basic properties. We applied the developed package to analyze the secondary structures of complete viral genomes collected from the literature, aiming to gain insights into viral function and evolution. Both the package and the collection of secondary structures of viral genomes are available at http://github.com/KazanovLab/RNAsselem.
Introduction
It is well-recognized that RNA is a versatile molecule capable of performing various functions, including storing genetic information, catalyzing chemical reactions, regulating gene expression, and even self-replicating1. RNA molecules can fulfill their functions as stand-alone molecules (mRNA, tRNA), in complexes with proteins (rRNA), or as structural elements integrated within another RNA molecule (riboswitches)2. For most types of RNA molecules, the ability to form specific secondary structure, and subsequently, tertiary structure, plays a crucial role in performing their functions3. Recent advances in experimental and computational methods for detecting RNA secondary structures have unveiled fascinating examples of molecular processes involving RNA molecules, with many studies being associated with RNA viruses, including HIV, influenza, dengue, Zika, and SARS/SARS-CoV-2 viruses4.
RNA viruses pose a serious threat to public health and have significant pandemic potential, as evidenced by the recent COVID-19 pandemic caused by the SARS-CoV-2 RNA virus5,6. RNA viruses have a higher mutation rate compared to DNA viruses, are typically highly contagious, and, in many cases, effective vaccines and treatments for RNA viruses do not exist7. Exploring the intricacies of RNA virus life cycles and the role of their genome’s secondary structure is essential for our ability to manage and control these viruses. The significance of viral RNA secondary structure has been demonstrated at various stages of the viral life cycle, including virus replication, protein synthesis, packaging, evasion of the host immune system, and the hijacking of host cellular factors8.
Recent studies have identified numerous important functional structural elements within viral RNA that interact with proteins, other RNA molecules, or small ligands. Thus, it was found that the secondary structure of HIV-1 Rev response element (RRE) provides the basis for selection of HIV-1 mRNA by Rev protein for nucleocytoplasmic transport9. Another example of an important RNA element with an established secondary structure is the SARS-CoV-2 programmed ribosomal frameshifting stimulation element (FSE)10, which induces a one-nucleotide backward shift of the ribosome into an overlapping reading frame at a specific frequency. This shift enables the ribosome to bypass a stop codon and translate ORF1b containing five additional proteins. One more example is the flavivirus cis-acting 5’-flanking element (UFS), characterized by a hairpin secondary structure with a U-rich stem. This element regulates the recruitment of the flavivirus replicase through genome cyclization11.
It has come to light that the secondary structure also plays a role in host-mediated RNA editing of RNA viruses, thereby influencing the viral life cycle and the direction of viral evolution. Two families of enzymes – adenosine deaminases (ADAR) and apolipoprotein B mRNA editing catalytic polypeptide family (APOBEC) – were implicated in this process12,13. The ADAR family members (ADAR1-3) deaminate adenines residing in double-stranded RNA (dsRNA), converting them to inosines (A-to-I)14, while APOBEC family members (APOBEC1-2, 3A-H, 4, and AID) deaminate cytosine into uracil (C-to-U) on single-stranded RNA (ssRNA)15,16. The relative mutational impact between RdRp (RNA-dependent RNA polymerase), which introduces errors during replication, and RNA editing enzymes, remains unclear17,18,19. Recent studies revealed an enrichment of C-to-U substitutions and, to a lesser extent, A-to-G substitutions in the SARS-CoV-2 genome, offering evidence for RNA editing by APOBEC and potentially by ADAR enzymes20,21,22,23,24,25. However, the full extent of this editing can only be conclusively established through experiments using ADAR and APOBEC knock-out cell lines26,27. As mentioned earlier, the locations of mutations induced by ADAR and APOBEC depend on secondary structure. Besides APOBEC enzymes’ activity toward single-stranded DNA, Buisson et al.28,29 discovered additional APOBEC preferences that depend on secondary structure. They identified hotspots of APOBEC-induced mutations in cytosines located at the 3’ end of hairpin loops, formed by single-stranded DNA/RNA. A study conducted by Nakata et al.30 reported an increased number of C-to-T mutations at the tips of bulge or loop regions within the viral RNA secondary structure. These studies have demonstrated that the secondary structure of the viral genome can significantly influence its evolutionary trajectory31.
Recognizing the significance of describing diverse secondary structural patterns within viral genomes, it’s worth noting that the predominant formats used for representing secondary structures in bioinformatics studies remain focused on nucleotide pairing only (e.g., dot-bracket and connectivity table (CT) formats)32. Thus, these formats lack the capability to label higher-order secondary structure elements, such as hairpins, bulges, internal loops, multifurcated loops, and pseudoknots (Fig. 1a). While there is a specialized file format known as Washington University Secondary Structure (WUSS)33 designed for describing high-order secondary structure elements, it has not gained common usage thus far. For example, to the best of our knowledge, there are currently no tools available for converting dot-bracket or CT formats to the WUSS format or any packages that offer functionality for conducting a descriptive analysis of secondary structure patterns. To address the gap in this rapidly evolving field, we have developed a Python package for gathering descriptive statistics of high-order secondary structure elements in viral genomes, which also includes the preliminary conversion of conventional secondary structure formats into the WUSS representation. We have also assembled a collection of the currently available secondary structures of viral genomes and used our package to gain insights into the range of secondary structure patterns in these viruses. The package and the collection of secondary structures of viral genomes have been made publicly available for the scientific community.
Fig. 1. (a) Illustration of different RNA secondary structure patterns. (b) Proportions of paired/unpaired nucleotides in the genomes of RNA viruses. (c) Percentage of the viral genome covered by RNA structural elements (excluding multifurcation loops). (d) Percentage of the viral genome covered by RNA structural elements (including multifurcation loops).
Methods
Python package for descriptive analysis of RNA secondary structure elements
The algorithm for converting from the Connectivity Table (CT) format to the Washington University Secondary Structure (WUSS) format was adopted from the C library Easel34. Specifically, we ported the algorithm implemented in the esl_ct2wuss() function from the C file esl_wuss.c into the ct2ss() function of the RNAsselem package by analyzing the C code and reimplementing the algorithm in Python. Upon applying this algorithm, paired nucleotides in stem regions were designated using different types of brackets ('<>', '{}', '[]', '()') based on their nesting level. Loop regions were classified into external loops (':'), hairpin loops ('_'), bulges and internal loops ('-'), and multifurcation loops (','). The conversion output could optionally be generated in one of two formats: either in WUSS notation or in a CT-like extended format, where the WUSS string is added as an additional column.
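The loop classification described above can be illustrated with a simplified sketch. This is not the ct2ss() implementation and not a port of esl_ct2wuss(): the function names pair_table and label_loops are hypothetical, the input is assumed to be a pseudoknot-free dot-bracket string, and all base pairs are written as '<' and '>' rather than being assigned different bracket types by nesting level. Only the loop symbols follow the WUSS convention named in the text.

```python
def pair_table(db):
    """Return a list pt where pt[i] is the partner of position i, or -1 if unpaired."""
    pt = [-1] * len(db)
    stack = []
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            pt[i], pt[j] = j, i
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pt


def label_loops(db):
    """Label unpaired positions with WUSS-like loop symbols (simplified sketch)."""
    pt = pair_table(db)
    out = [""] * len(db)
    # Each work item is one loop: the half-open range [start, end) and a flag
    # telling whether the loop is closed by a base pair (False = external loop).
    todo = [(0, len(db), False)]
    while todo:
        start, end, enclosed = todo.pop()
        unpaired, children = [], []
        i = start
        while i < end:
            if pt[i] == -1:
                unpaired.append(i)
                i += 1
            else:
                children.append(i)   # 5' base of a stem branching off this loop
                i = pt[i] + 1        # jump past the 3' partner
        if not enclosed:
            symbol = ":"             # external loop
        elif len(children) == 0:
            symbol = "_"             # hairpin loop
        elif len(children) == 1:
            symbol = "-"             # bulge or internal loop
        else:
            symbol = ","             # multifurcation loop
        for u in unpaired:
            out[u] = symbol
        for c in children:
            out[c], out[pt[c]] = "<", ">"
            todo.append((c + 1, pt[c], True))
    return "".join(out)
```

For example, label_loops("((.((...))))") yields a one-nucleotide bulge followed by a three-nucleotide hairpin loop. An explicit work list is used instead of recursion so that deeply nested genome-length structures do not hit Python's recursion limit.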
The logic of enumerating RNA structural elements was specific to each element type. For hairpin loops and bulges, consecutive labels of a particular type were logically concatenated and treated as a single RNA element. In the case of internal loops, we interpreted this structural element as two distinct loops: one on the direct strand of the stem and the other on the complementary strand. The combination of these two loops was considered a single structural element. Similarly, multifurcation labels were initially concatenated into the arcs of multifurcation loops, and then these arcs were organized into the components of multifurcation loops based on the topological analysis of adjacent stems. Stems interrupted by bulges or internal loops were treated as components of a single, integrated stem. A comprehensive overview of the package’s functionality is provided in its documentation. The original esl_ct2wuss() function from the Easel C library was extracted and implemented as a standalone program to benchmark RNAsselem’s ct2ss() function, which is intended for converting CT to WUSS formats. Package documentation and source code are available at: http://github.com/KazanovLab/RNAsselem.
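The first step of this enumeration, concatenating consecutive labels of one type into a single element, can be sketched as follows. The function name runs is hypothetical and is not part of the RNAsselem API. The sketch covers only the simple case (for example, hairpin loops); it does not join the two strands of an internal loop, distinguish bulges from internal loops, or assemble multifurcation arcs, all of which need the stem topology described above.

```python
from itertools import groupby


def runs(wuss, symbol):
    """Return (start, length) for each maximal run of `symbol` in a WUSS string.

    Start positions are 1-based, matching the numbering used in CT files.
    """
    found = []
    pos = 0
    for ch, group in groupby(wuss):
        length = len(list(group))
        if ch == symbol:
            found.append((pos + 1, length))
        pos += length
    return found
```

Calling runs("::<<___>>:", "_") reports one hairpin loop that starts at position 5 and spans 3 nucleotides.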
Collection of RNA secondary structures of viral genomes
Secondary structures of RNA viruses were retrieved from publications through a PubMed search using the keywords 'RNA secondary structure' in combination with the respective RNA virus names. The search was performed for poliovirus, dengue, Zika, coronavirus (specific type, e.g., SARS-CoV-2), hepatitis, and HIV viruses. Among the retrieved publications, we selected those that presented genome secondary structures in dot-bracket or connectivity table (CT) format files. In total, seven CT files describing the structures of three RNA viruses, namely Dengue virus serotype 2 (DENV-2), Hepatitis C virus (HCV), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), were obtained. Detailed descriptions of the obtained RNA secondary structures are provided in Supplemental Table 1. For each CT file in the collected dataset of RNA viral secondary structures, we used the developed package to generate both the WUSS-format file and the modified CT-format file, which contains an additional column representing the WUSS notation of the secondary structure. The compiled collection is accessible to the scientific community via the GitHub repository at http://github.com/KazanovLab/RNAsselem. The repository is organized as follows: secondary structure files of the RNA viruses are located in separate folders, each named according to the short name of the respective virus. Each folder contains the original CT files, named according to the virus and concatenated with the PubMed ID of the publication from which these secondary structures were selected. If the original publication contained several secondary structures, the PubMed ID label was concatenated with a suffix explaining the origin of the structure mentioned in the publication. For example, the name of the cell line was included if several cell lines were used in the study.
The generated WUSS files and CT-modified files, which include an additional column with WUSS notation, were created with the same names as the original CT files and with the extensions '.wuss' and '.ctwuss', respectively.
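For readers unfamiliar with the CT files referred to above, the following minimal sketch reads the commonly used CT layout: a header line whose first field is the sequence length, followed by one line per nucleotide with six columns (index, base, previous index, next index, pairing partner or 0, and natural numbering). The function names read_ct and to_dot_bracket and the toy record EXAMPLE_CT are hypothetical and are not taken from the RNAsselem package or from the collected dataset. The dot-bracket conversion assumes a pseudoknot-free structure.

```python
def read_ct(lines):
    """Parse CT-format lines into (sequence, partner list).

    partner[i] is the 1-based pairing partner of nucleotide i + 1, or 0 if unpaired.
    """
    rows = [ln.split() for ln in lines if ln.strip()]
    n = int(rows[0][0])                # header: sequence length, then a free-text title
    seq, partner = [], []
    for fields in rows[1:n + 1]:
        seq.append(fields[1])          # column 2: base
        partner.append(int(fields[4])) # column 5: partner index (0 = unpaired)
    return "".join(seq), partner


def to_dot_bracket(partner):
    """Convert a partner list to dot-bracket notation (assumes no pseudoknots)."""
    chars = []
    for i, j in enumerate(partner, start=1):
        if j == 0:
            chars.append(".")
        elif j > i:
            chars.append("(")
        else:
            chars.append(")")
    return "".join(chars)


# Toy hairpin used only for illustration; not a record from the collection.
EXAMPLE_CT = [
    "9 toy hairpin",
    "1 G 0 2 9 1",
    "2 G 1 3 8 2",
    "3 G 2 4 7 3",
    "4 A 3 5 0 4",
    "5 A 4 6 0 5",
    "6 A 5 7 0 6",
    "7 C 6 8 3 7",
    "8 C 7 9 2 8",
    "9 C 8 0 1 9",
]
```

With a real file, the lines would typically come from open(path).read().splitlines(); multi-structure CT files would need the records to be split first.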
Results
RNAsselem – a tool for descriptive analysis of high-order RNA secondary structure patterns in viral genomes
The main goals of this study were to develop a software package for conducting a descriptive analysis of RNA secondary structure patterns in viral genomes and to compile a collection of known secondary structures of RNA viral genomes. The package was designed to perform the following functions: (i) converting from dot-bracket or CT format to WUSS format; (ii) calculating statistics on the pairing and unpairing of nucleotides; (iii) calculating statistics on genome coverage by different types of structural elements; (iv) creating a list of structural elements of a particular type with information on position and size, including hairpin loops, internal loops, bulges, stems, and multifurcation loops; and (v) calculating statistics on structural elements of a given type, providing the frequency of occurrence in the genome, the mean, median, and standard deviation of the element size, and the total length of the elements in the genome.
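The per-type statistics listed in item (v) can be sketched with the Python standard library. The function name element_stats and the dictionary keys are hypothetical and do not reflect the RNAsselem API; the sample standard deviation is used here as an assumption, since the text does not state which estimator the package applies.

```python
from statistics import mean, median, stdev


def element_stats(sizes, genome_length):
    """Summarize the elements of one type given their sizes in nucleotides."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    count = len(sizes)
    return {
        "count": count,
        # frequency normalized per 1,000 nucleotides of genome
        "per_1k_nt": 1000 * count / genome_length,
        "mean_size": mean(sizes) if count else 0.0,
        "median_size": median(sizes) if count else 0.0,
        # sample standard deviation needs at least two values
        "sd_size": stdev(sizes) if count > 1 else 0.0,
        "total_length": sum(sizes),
        # nucleotides covered per 1,000 nucleotides of genome
        "coverage_per_1k_nt": 1000 * sum(sizes) / genome_length,
    }
```

For three hairpin loops of sizes 3, 4, and 5 nt in a 600-nt sequence, this gives a frequency of 5 elements and a coverage of 20 nt per 1,000 nucleotides.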
We have also compiled a collection of secondary structure annotations available in the literature for RNA viruses. A total of seven CT files were obtained from previous studies35,36,37,38, describing the secondary structures of three RNA viruses: Dengue virus serotype 2 (DENV-2), Hepatitis C virus (HCV), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Using the developed RNAsselem package, the CT files were converted to the WUSS format, which annotates high-order RNA secondary structure patterns. The obtained annotations have been deposited in a GitHub repository to make them freely available to the scientific community. Building on these WUSS annotations, we further conducted a detailed descriptive analysis of the structural elements in the collected RNA viruses.
Abundance of RNA secondary structure elements in the genomes of DENV-2, HCV, and SARS-CoV-2 viruses
Using the developed programming package RNAsselem, we first analyzed the proportion of paired and unpaired nucleotides within the secondary structure of RNA viruses from our dataset (Fig. 1b). Among the secondary structures of viral genomes considered, the DENV-2 virus exhibited the lowest fraction of stem regions (paired nucleotides), accounting for 46%. In comparison, the mean fraction of paired nucleotides in the RNA secondary structures of the HCV virus was 58%, with a standard deviation of 5%, and in the SARS-CoV-2 virus, it was 54% with a standard deviation of 4%. Thus, our findings indicate that approximately half of the nucleotides in the considered RNA viruses are paired.
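The paired-nucleotide fraction reported above is straightforward to reproduce from a structure string. The sketch below is an illustration under stated assumptions: paired_fraction and summarize are hypothetical names, only bracket characters are counted as paired (so letter-annotated pseudoknot pairs in WUSS would be missed), and the spread across several structures of one virus is computed as a sample standard deviation.

```python
from statistics import mean, stdev

# Bracket characters used for base pairs in dot-bracket and WUSS strings.
PAIR_CHARS = set("<>()[]{}")


def paired_fraction(structure):
    """Fraction of positions that are base-paired in a structure string."""
    if not structure:
        raise ValueError("empty structure string")
    paired = sum(ch in PAIR_CHARS for ch in structure)
    return paired / len(structure)


def summarize(fractions):
    """Mean and sample standard deviation over several structures of one virus."""
    sd = stdev(fractions) if len(fractions) > 1 else 0.0
    return mean(fractions), sd
```

For the toy string "::<<___>>:", four of ten positions are paired, so paired_fraction returns 0.4.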
Next, we analyzed the fractions of the viral genomes occupied by the RNA structural elements. Considering that multifurcation loops are conditionally treated as RNA structural elements, we calculated the fractions of the genomes occupied by RNA elements without (Fig. 1c) and with (Fig. 1d) the inclusion of multifurcation loops. We observed a notable similarity in these fractions across all the analyzed RNA secondary structures of viruses. The fraction excluding multifurcation loops for the DENV-2 virus, 77%, fell within the error margins of the RNA secondary structure versions of the HCV and SARS-CoV-2 viruses, which were 81.7% ± 6% and 79.8% ± 5%, respectively. When multifurcation loops were included in the count of RNA structural elements, the respective proportions of the genome occupied by these elements were 93.7% for DENV-2, 87.9% ± 7% for HCV, and 87.2% ± 5.7% for SARS-CoV-2. Thus, our observations revealed that a significant portion of the RNA genomes in the studied viruses is characterized by the presence of RNA secondary structure elements.
Diversity of RNA secondary structure patterns in DENV-2, HCV, and SARS-CoV-2 viruses
We further analyzed the distribution of various RNA structural elements within our collection of the secondary structures of RNA viruses (Figs. 2, 3). With our developed RNAsselem package, we calculated the number and size of RNA structural elements, such as hairpin loops, internal loops, bulges, and multifurcation loops, for each version of the studied RNA viruses’ secondary structures. The most prevalent RNA structural elements were found to be hairpin loops and internal loops, occurring on average 19.7 and 20.5 times per 1K nucleotides, respectively. The average frequency of bulges was slightly lower at 14.8 occurrences per 1,000 nucleotides. The least frequently observed RNA structural element was multifurcation loops, appearing on average 4.4 times per 1K nt.
Fig. 2. Frequencies of hairpin loops (a), internal loops (b), bulges (c), and multifurcation loops (d) in the genomes of RNA viruses.
Fig. 3. Average size of hairpin loops (a), internal loops (b), bulges (c), and multifurcation loops (d) in the genomes of RNA viruses. The statistical significance of the differences was estimated using the Wilcoxon test. Significant differences are indicated by stars: p < 0.05 (*), p < 0.01 (**), and p < 0.001 (***). Statistical tests were conducted separately for each virus type.
Given the variability in the size of RNA structural elements, we calculated the total fraction of the genome covered by each type of RNA element for every RNA virus. The highest coverage was observed for hairpin loops, which covered on average 126.7 nucleotides per 1,000 nucleotides. The second most prominent RNA element, internal loops, covered on average 104.8 nucleotides per 1,000, and the third most prevalent were multifurcation loops, with an average of 82.3 nucleotides per 1,000. The lowest coverage, 27.8 nucleotides per 1,000, was observed for bulges. It should be noted that, despite occurring almost as frequently as hairpin loops and internal loops, bulges occupy much less space in RNA viral genomes due to their smaller size.
We have also analyzed variations among the considered viruses in the number and size of RNA elements of the same type. We observed that the frequency of hairpin loops, calculated as the number of structures per 1K nucleotides, in the HCV virus is slightly lower than in the DENV-2 and SARS-CoV-2 viruses: 17.28 ± 0.23 compared to 21.54 and 20.73 ± 0.75, respectively (Fig. 2a). While the frequency of hairpin loops is similar in DENV-2 and SARS-CoV-2, the size, and consequently the fraction of the viral genome occupied by these structures normalized per 1K nucleotides, is higher in the DENV-2 virus compared to SARS-CoV-2: 182.6 nt versus 131.7 ± 2.4 nt (Fig. S1a). As observed in Fig. 2b, the frequencies of internal loops are quite similar across the considered viral genomes, while the size of internal loops is higher in the DENV-2 virus compared to other viruses (Fig. 3b): 6.7 nt versus 4.8 ± 0.2 nt. Due to a higher frequency of bulges in HCV structures, the total coverage was greater in HCV viruses (Fig. S1c), despite the sizes of bulges being approximately similar across all considered genomes (Fig. 3c). The genome coverage by multifurcation loops was similar in HCV and SARS-CoV-2 viruses, as the higher frequency of these structures in HCV (Fig. 2d) was compensated by their larger size in SARS-CoV-2 (Fig. 3d). However, the DENV-2 virus showed higher coverage compared to HCV and SARS-CoV-2, with the number of multifurcation loops similar to that in HCV and a size comparable to the size of these elements in the SARS-CoV-2 virus.
To compare the locations of RNA secondary structure elements in structures generated in different studies (SARS-CoV-2) or for different virus genotypes (HCV), we used the RNAsselem package to create RNA secondary structure maps for all considered secondary structures (Fig. S3), relying on the package functions that retrieve the list of structural elements of a particular type for a given RNA secondary structure. We analyzed the intersection of the positions of RNA secondary structure elements separately for each type of element (Fig. S4). As expected, we found larger differences between different virus genotypes (HCV) and smaller differences between structures of the same virus (SARS-CoV-2) generated in different studies. However, structures generated in different studies using different experimental conditions or different experimental techniques were not completely identical, and there was no clear consensus among them, as was noted earlier39. To examine these differences more closely, we compared the 5’ and 3’ UTR regions of the SARS-CoV-2 virus, which are known for their stable RNA secondary structures. We found that most differences between structures arose when a particular region was designated as a region without structure (a so-called “external loop”) in one study but was detected as hairpin loops or multifurcation loops in other studies, or even within a single study under different experimental conditions. Thus, … We speculate that these differences correspond to very unstable RNA secondary structures that can form under specific conditions.
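One simple way to quantify the positional intersection of elements of one type between two structures is the Jaccard index over covered nucleotide positions. This measure is an assumption chosen for illustration; the text does not state which measure was used for Fig. S4. The function names covered_positions and position_overlap are hypothetical, and elements are assumed to be given as (start, length) pairs with 1-based start positions.

```python
def covered_positions(elements):
    """Set of 1-based nucleotide positions covered by (start, length) elements."""
    positions = set()
    for start, length in elements:
        positions.update(range(start, start + length))
    return positions


def position_overlap(elements_a, elements_b):
    """Jaccard index of the positions covered in two structures.

    Returns 1.0 when neither structure contains any element of this type,
    treating two empty annotations as identical.
    """
    a = covered_positions(elements_a)
    b = covered_positions(elements_b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)
```

For instance, a hairpin loop at positions 5 to 7 in one structure and at positions 6 to 8 in another share two of four covered positions, giving an overlap of 0.5.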
Discussion
The increasing evidence highlighting the crucial importance of RNA secondary structure in numerous cellular processes has heightened interest in research within this field. Bioinformatics analysis of RNA structures can offer valuable insights into molecular processes where the structure of the RNA molecule plays a crucial role. Despite the availability of fundamental bioinformatics tools for handling RNA secondary structures, in our opinion, there is a lack of tools for more sophisticated analysis of higher-order RNA secondary structure elements. Here, we introduce a Python package specifically designed for conducting descriptive analyses of RNA secondary structure patterns in genomes of RNA viruses, along with an assembled collection of available secondary structures of viral genomes.
We collected available descriptions of RNA secondary structures for the genomes of DENV-2, HCV, and SARS-CoV-2 viruses and applied our package to compare the content of RNA secondary structure elements within these genomes. First, we compared the fractions of the genome occupied by paired nucleotides and observed a similarity across all considered viruses, with approximately half of the genome being paired. Second, we found that the fractions of the genome occupied by RNA structural elements are consistently similar in all genomes, amounting to approximately 80% when multifurcation loops are excluded and approximately 90% when they are included. Third, we compared the statistics for each type of RNA secondary structure element, including the number, size, and genome coverage, and identified variations among viruses. Hairpin loops, identified as the most common structural RNA element, displayed a larger count and mean size in DENV-2 compared to other viruses. The average size of the internal loops was found to be maximal in the DENV-2 virus. The HCV virus surpassed other viruses in the frequency of bulges, while SARS-CoV-2 and DENV-2 exhibited larger multifurcation loops, approximately twice as large as those in the HCV virus.
In summary, this study illustrates how our developed bioinformatic package facilitated a comparative descriptive analysis of RNA structural elements across diverse RNA viruses. In general, bioinformatics tools are indispensable for studying the RNA secondary structure in viruses. They enable researchers to analyze and interpret the role of RNA secondary structure, providing insights into its functions and the mechanisms involved in the viral life cycle. Investigating the role of viral RNA secondary structures is crucial for understanding the mechanisms of viral replication and evolution. It could have practical applications in vaccine development and drug design, making it a critical area of research for both basic science and public health.
Data availability
Both the RNAsselem package and the collection of secondary structures of viral genomes are available at http://github.com/KazanovLab/RNAsselem.
References
Caprara, M. G. & Nilsen, T. W. RNA: Versatility in form and function. Nat. Struct. Biol. 7, 831–833 (2000).
Strobel, E. J., Watters, K. E., Loughrey, D. & Lucks, J. B. RNA systems biology: Uniting functional discoveries and structural tools to understand global roles of RNAs. Curr. Opin. Biotechnol. 39, 182–191 (2016).
Ganser, L. R., Kelly, M. L., Herschlag, D. & Al-Hashimi, H. M. The roles of structural dynamics in the cellular functions of RNAs. Nat. Rev. Mol. Cell Biol. 20, 474–489 (2019).
Spitale, R. C. & Incarnato, D. Probing the dynamic RNA structurome and its functions. Nat. Rev. Genet. 24, 178–196 (2023).
Jamison, D. A. et al. A comprehensive SARS-CoV-2 and COVID-19 review, Part 1: Intracellular overdrive for SARS-CoV-2 infection. Eur. J. Hum. Genet. 30, 889–898 (2022).
Narayanan, S. A. et al. A comprehensive SARS-CoV-2 and COVID-19 review, Part 2: Host extracellular to systemic effects of SARS-CoV-2 infection. Eur. J. Hum. Genet. https://doi.org/10.1038/s41431-023-01462-1 (2023).
Villa, T. G., Abril, A. G., Sánchez, S., de Miguel, T. & Sánchez-Pérez, A. Animal and human RNA viruses: Genetic variability and ability to overcome vaccines. Arch. Microbiol. 203, 443–464 (2021).
Boerneke, M. A., Ehrhardt, J. E. & Weeks, K. M. Physical and functional analysis of viral RNA genomes by SHAPE. Annu. Rev. Virol. 6, 93–117 (2019).
Fang, X. et al. An unusual topological structure of the HIV-1 rev response element. Cell 155, 594 (2013).
Hill, C. H. & Brierley, I. Structural and functional insights into viral programmed ribosomal frameshifting. Annu. Rev. Virol. 10 (2023).
Liu, Z. Y. et al. Viral RNA switch mediates the dynamic control of flavivirus replicase recruitment by genome cyclization. Elife 5, 1–27 (2016).
Kockler, Z. W. & Gordenin, D. A. From RNA world to SARS-CoV-2: The edited story of RNA viral evolution. Cells 10, 1557 (2021).
Zhu, T. et al. Host-mediated RNA editing in viruses. Biol. Direct 18, 1–12 (2023).
Piontkivska, H., Wales-McGrath, B., Miyamoto, M. & Wayne, M. L. ADAR editing in viruses: An evolutionary force to reckon with. Genome Biol. Evol. 13, 1–21 (2021).
Klimczak, L. J., Randall, T. A., Saini, N., Li, J. L. & Gordenin, D. A. Similarity between mutation spectra in hypermutated genomes of rubella virus and in SARS-CoV-2 genomes accumulated during the COVID-19 pandemic. PLoS One 15, 1–21 (2020).
Kim, K. et al. The roles of APOBEC-mediated RNA editing in SARS-CoV-2 mutations, replication and fitness. Sci. Rep. 12, 1–15 (2022).
Zong, J. et al. Poor evidence for host-dependent regular RNA editing in the transcriptome of SARS-CoV-2. J. Appl. Genet. 63, 413–421 (2022).
Wei, L. Retrospect of the two-year debate: What fuels the evolution of SARS-CoV-2: RNA editing or replication error?. Curr. Microbiol. 80, 1–4 (2023).
Martignano, F., Di Giorgio, S., Mattiuz, G. & Conticello, S. G. Commentary on “Poor evidence for host-dependent regular RNA editing in the transcriptome of SARS-CoV-2”. J. Appl. Genet. 63, 423–428 (2022).
Di Giorgio, S., Martignano, F., Torcia, M. G., Mattiuz, G. & Conticello, S. G. Evidence for host-dependent RNA editing in the transcriptome of SARS-CoV-2. Sci. Adv. 6, 1–9 (2020).
Simmonds, P. & Azim Ansari, M. Extensive C->U transition biases in the genomes of a wide range of mammalian RNA viruses; potential associations with transcriptional mutations, damage- or host-mediated editing of viral RNA. PLoS Pathog. 17, 1–25 (2021).
Azgari, C., Kilinc, Z., Turhan, B., Circi, D. & Adebali, O. The mutation profile of SARS-CoV-2 is primarily shaped by the host antiviral defense. Viruses 13 (2021).
Pu, X., Xu, Q., Wang, J. & Liu, B. The continuing discovery on the evidence for RNA editing in SARS-CoV-2. RNA Biol. 20, 219–222 (2023).
Liu, X. et al. Rampant C-to-U deamination accounts for the intrinsically high mutation rate in SARS-CoV-2 spike gene. RNA 28, 917–926 (2022).
Wang, J., Wu, L., Pu, X., Liu, B. & Cao, M. Evidence supporting that C-to-U RNA editing is the major force that drives SARS-CoV-2 evolution. J. Mol. Evol. https://doi.org/10.1007/s00239-023-10097-1 (2023).
Wei, L. Reconciling the debate on deamination on viral RNA. J. Appl. Genet. 63, 583–585 (2022).
Cai, H., Liu, X. & Zheng, X. RNA editing detection in SARS-CoV-2 transcriptome should be different from traditional SNV identification. J. Appl. Genet. 63, 587–594 (2022).
Buisson, R. et al. Passenger hotspot mutations in cancer driven by APOBEC3A and mesoscale genomic features. Science 364, eaaw2872 (2019).
Langenbucher, A. et al. An extended APOBEC3A mutation signature in cancer. Nat. Commun. 12, 1–11 (2021).
Nakata, Y. et al. Cellular APOBEC3A deaminase drives mutations in the SARS-CoV-2 genome. Nucleic Acids Res 51, 783–795 (2023).
Ratcliff, J. & Simmonds, P. Potential APOBEC-mediated RNA editing of the genomes of SARS-CoV-2 and other coronaviruses and its impact on their longer term evolution. Virology 556, 62–72 (2021).
Mathews, D. H. RNA secondary structure analysis using RNAstructure. Curr. Protoc. Bioinform. https://doi.org/10.1002/0471250953.bi1206s46 (2014).
Nawrocki, E. & Eddy, S. RNA Secondary Structures: WUSS Notation, INFERNAL User’s Guide. 107–108 http://eddylab.org/infernal/Userguide.pdf (2023).
Eddy, S. et al. Easel - a C Library for Biological Sequence Analysis. https://github.com/EddyRivasLab/easel (2023).
Dethoff, E. A. et al. Pervasive tertiary structure in the dengue virus RNA genome. Proc. Natl. Acad. Sci. U. S. A. 115, 11513–11518 (2018).
Mauger, D. M. et al. Functionally conserved architecture of hepatitis C virus RNA genomes. Proc. Natl. Acad. Sci. U. S. A. 112, 3692–3697 (2015).
Huston, N. C. et al. Comprehensive in vivo secondary structure of the SARS-CoV-2 genome reveals novel regulatory motifs and mechanisms. Mol. Cell 81, 584-598.e5 (2021).
Lan, T. C. T. et al. Secondary structural ensembles of the SARS-CoV-2 RNA genome in infected cells. Nat. Commun. 13, 1–14 (2022).
Id, A. Z. & Jabbari, H. Unveiling hidden structural patterns in the SARS-CoV-2 genome: Computational insights and comparative analysis. Plos One https://doi.org/10.5281/zenodo.8298680 (2024).
Acknowledgements
This study was supported by the RSF, grant #22-14-00132 (to D.N.I.). We thank Irina Ponomareva for the RNA secondary structure artwork.
Author information
Contributions
Conceptualization, M.D.K.; methodology, M.D.K.; software, F.M.K., G.V.P., E.V.M. and M.D.K.; validation, F.M.K., G.V.P. and E.V.M.; investigation, F.M.K., E.V.M., G.V.P., D.N.I. and M.D.K.; data curation, F.M.K.; writing – original draft preparation, M.D.K.; writing – review and editing, M.D.K.; visualization, E.V.M.; supervision, M.D.K.; project administration, M.D.K.; funding acquisition, D.N.I.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Kazanov, F.M., Matveev, E.V., Ponomarev, G.V. et al. Analysis of the abundance and diversity of RNA secondary structure elements in RNA viruses using the RNAsselem Python package. Sci Rep 14, 28587 (2024). https://doi.org/10.1038/s41598-024-80240-5
DOI: https://doi.org/10.1038/s41598-024-80240-5





