Abstract
Current methods for determining RNA structure with short-read sequencing cannot capture most differences between distinct transcript isoforms. Here we present RNA structure analysis using nanopore sequencing (PORE-cupine), which combines structure probing using chemical modifications with direct long-read RNA sequencing and machine learning to detect secondary structures in cellular RNAs. PORE-cupine also captures global structural features, such as RNA-binding-protein binding sites and reactivity differences at single-nucleotide variants. We show that shared sequences in different transcript isoforms of the same gene can fold into different structures, highlighting the importance of long-read sequencing for obtaining phase information. We also demonstrate that structural differences between transcript isoforms of the same gene lead to differences in translation efficiency. By revealing isoform-specific RNA structure, PORE-cupine will deepen understanding of the role of structures in controlling gene regulation.
Main
RNAs can fold into complex secondary and tertiary structures to regulate every step of their life cycle1. The ability to assign correct structures to the right transcript is key to understanding RNA-based gene regulation. Recently, enzymatic and chemical probes have been coupled with high-throughput sequencing to generate large-scale RNA structure information across transcriptomes in vitro and in vivo2,3,4,5,6,7,8,9,10. This has yielded insights into the pervasive regulatory roles of structure in diverse organisms and cellular conditions1,11. Although powerful, current high-throughput structure mapping approaches suffer from drawbacks, such as complex library preparation protocols and the lack of structural information for full-length transcripts owing to short-read sequencing. As short reads cannot distinguish structures in shared regions between isoforms, this poses a challenge in the ability to correctly assign RNA structural information in individual gene-linked isoforms, limiting understanding of the role of structure in gene regulation.
Recent developments in high-throughput sequencing have enabled long-read, amplification-free complementary DNA (cDNA) sequencing on both the Pacific Biosciences and Oxford Nanopore platforms12,13,14, enabling the phasing of alternative exons15. In nanopore sequencing, RNA and DNA molecules can be directly sequenced by measuring the current as the molecules are threaded through a biological pore16,17. Additionally, natural modifications along DNA or RNA can result in current perturbations, leading to decodable signal anomalies that reveal both the position and identity of modifications along the genome18,19,20. In principle, artificial RNA modifications caused by structure-probing chemicals are also decodable, but their use for studying full-length RNA structure has yet to be explored.
In this study, we coupled chemical modifications with direct RNA sequencing on nanopores to identify structural patterns in the transcriptome of human embryonic stem cells (hESCs). Our method, which we term PORE-cupine, identifies single-stranded bases along an RNA by detecting current changes induced by structure-dependent modifications (Fig. 1a). As the method involves only a simple two-step ligation protocol (with a preparation time of 2 h before sequencing) without the need for polymerase chain reaction (PCR) amplification, PORE-cupine captures structural information in a transcriptome rapidly and directly. The nature of long-read sequencing through nanopores also allows one to accurately assign and capture structures and their connectivity along individual gene-linked isoforms, deepening one’s understanding of how the complex and extensively spliced transcriptome could take on different structures to regulate isoform-specific gene expression.
a, Schematic of direct RNA sequencing followed by signal processing. RNAs were structure probed and nanopore sequenced, yielding characteristic voltage signals. The non-structure-probed signal was used as a training set to predict modifications from the structure-probed data set. b, Normalized current mean and s.d. distributions for single-stranded regions on Tetrahymena RNA modified with NAI-N3. With footprinting gels as a guide, the top 10% of single-stranded regions (24 positions) on the Tetrahymena RNA were chosen for these plots. P values for comparison between the two distributions were calculated using the two-sided Wilcoxon rank-sum test. c, Upper panel, schematic of the secondary structure of Tetrahymena RNA and the location of a representative single- and double-stranded base. Lower panel, current mean and log10 (s.d. of current) of the highlighted bases in the upper panel. Each data point is from a base in a single RNA strand. Plots on the left show the distribution for unmodified (blue) and modified (red) bases before SVM classification. Plots on the right show the distributions for unmodified and modified bases but with the SVM boundary drawn (dotted lines). Outliers are in red; points within the boundary are in black. d, NAI-N3 modification rates along the entire length of the Tetrahymena RNA (red). The modification rate of a randomly modified denatured RNA (brown) and another unmodified replicate (blue) is also shown. The y axis indicates the modification rate per base, whereas the x axis indicates the position along the RNA. Inset, ROC curves for unmodified, modified and denatured Tetrahymena RNA sequences. e, f, Comparison of NAI-N3 modification reactivities between PORE-cupine (teal) and the average SAFA footprinting signals from n = 2 biological replicates (black). The footprinting is from base 189 to 269 along the Tetrahymena RNA (e) and from base 117 to 186 along the lysine riboswitch (f). 
Lane 1 is G ladder, lane 2 is unmodified RNA and lane 3 is NAI-N3 modified RNA. Correlations were quantified using the Pearson correlation coefficient.
Results
Chemical modifications on RNAs can result in detectable errors in direct RNA sequencing
Many chemical probes can modify single-stranded bases in folded RNAs21. To determine which chemical probes can result in a detectable signal change during direct RNA sequencing, we tested five different structure-probing compounds that modify single-stranded bases (Extended Data Fig. 1a). These include SHAPE reagents that acylate 2′ hydroxyl (OH) groups of flexible bases: N-methylisatoic anhydride (NMIA), 2-methylnicotinic acid imidazolide (NAI) and 2-methylnicotinic acid imidazolide azide (NAI-N3), as well as base-specific chemical probes: dimethyl sulfate (DMS) and 1-cyclohexyl-3-(2-morpholinoethyl) carbodiimide metho-p-toluenesulfonate (CMCT) (Extended Data Fig. 1b)6,21,22,23. DMS alkylates single-stranded bases, specifically at N1 of adenines, N3 of cytosines and N7 of guanines, whereas CMCT primarily reacts with single-stranded uracils at N3 and guanines at N1 positions (Extended Data Fig. 1b). We first performed in vitro structure probing using each of these chemicals on Tetrahymena RNA, which has a well-defined secondary structure24. Modified and unmodified Tetrahymena RNAs were ligated to an adapter and attached to a motor protein before being directly sequenced on the nanopore MinION system16 (Fig. 1a). We sequenced 19,000–68,000 modified Tetrahymena RNA reads for each of the compounds individually and 20,000 unmodified Tetrahymena RNA sequences (Supplementary Table 1).
Tetrahymena RNAs that were modified by the different compounds individually showed similar lengths upon mapping to the sequence, although DMS-modified mapped sequences were slightly shorter (Extended Data Fig. 1c). As bases with modifications can result in errors during base-calling along the sequence, we calculated the proportion of mismatches, insertions and deletions in our mapped modified versus unmodified Tetrahymena RNA reads. Notably, we observed a higher mismatch rate (modified: 6.5%–11.7%, unmodified: 5.3%), deletion rate (modified: 8.7–13.2%, unmodified: 8.7%) and insertion rate (modified: 3.4–4.1%, unmodified: 3%) in modified Tetrahymena RNA sequences for all of the five chemical compounds (Extended Data Fig. 1d–f). Aligning the error rates for the five compounds along the sequence and structure of the Tetrahymena RNA showed that there are peaks that are shared by multiple compounds (Extended Data Figs. 2 and 3a). We further observed that modification-induced mismatches show base-specific changes such that NAI-N3 and DMS modifications on cytosines tend to be miscalled as uracils, whereas NAI-N3 and CMCT modifications on uracils tend to be miscalled as cytosines (Extended Data Fig. 3b). This suggests that there are systematic errors in base-calling when bases are modified, and this could be leveraged for detecting them.
To determine whether the mismatches, insertions and deletions caused by the compounds reside in expected single-stranded positions according to the Tetrahymena RNA secondary structure, we calculated the performance of these errors using area under the receiver operating characteristic curve (AUC-ROC) analysis on footprinting signals for the Tetrahymena RNA. NAI-N3-induced mismatches resulted in the best performance for detecting single-stranded bases in the secondary structure of the Tetrahymena RNA (Extended Data Fig. 1g–i), suggesting that it is a promising structure-probing compound for further optimization with direct RNA sequencing signals.
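The AUC-ROC comparison described above can be illustrated with a minimal sketch. It assumes a per-base error signal (for example, a mismatch rate) and a binary label derived from footprinting; the toy values and the function name `auc_roc` are hypothetical and are not taken from the study.

```python
def auc_roc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    scores: per-base signal, e.g. mismatch rate at each position.
    labels: 1 for bases treated as single-stranded, 0 for paired bases.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one base of each class")
    # Count (single-stranded, paired) pairs in which the single-stranded
    # base has the higher score; ties count as half a win.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six bases, three of them single-stranded.
mismatch_rate = [0.02, 0.15, 0.03, 0.20, 0.04, 0.12]
single_stranded = [0, 1, 0, 1, 0, 1]
print(auc_roc(mismatch_rate, single_stranded))
```

An AUC of 0.5 corresponds to a signal that carries no information about pairing status, whereas values approaching 1 indicate that high-error positions coincide with single-stranded bases.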
PORE-cupine accurately identifies NAI-N3 modifications using machine learning
To detect NAI-N3 signals with higher accuracy, we next used a machine learning strategy known as support vector machine (SVM) to perform anomaly detection on our modified RNA. Upon sequencing and mapping, we used the program Nanopolish, which was first developed to align DNA signals to detect 5-mC18, to align the current signals to the RNA sequence (Extended Data Fig. 4a). We extracted three features from the current that flows through the channel of the nanopores during sequencing—the current mean, s.d. and dwell time—and determined their distribution in modified versus unmodified bases along the Tetrahymena RNA. Modified single-stranded bases undergo current changes in their s.d. and mean, but not in their dwell time, as compared to unmodified bases, suggesting that we could distinguish modification status based on the above two features (Fig. 1b and Extended Data Fig. 4b–e). By using footprinting data from the Tetrahymena RNA, we then optimized one-class SVM parameters with these two features to best distinguish signals from modified versus unmodified bases (Methods). The extent of modified outliers per base could be calculated as a ‘reactivity score’, whereby, the higher the score, the more single-stranded a base is predicted to be. For example, the double-stranded base 182 in the Tetrahymena RNA is not modified by NAI-N3 and does not show current changes with or without chemical modification upon sequencing (Fig. 1c). However, the single-stranded base 129 is modified upon structure probing, and this is reflected by the ‘comet tail’, indicating deviations in current for the modified base (Fig. 1c).
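The idea of a per-base reactivity score can be sketched as follows. This is not the one-class SVM used in the study: as a deliberately simpler stand-in, the boundary here is a fixed distance from the centre of the unmodified events after scaling each feature by its s.d. The function names, the cutoff of 3 and the toy events are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def fit_boundary(unmodified):
    """Summarise unmodified (current_mean, log10_sd) events for one position."""
    means = [m for m, _ in unmodified]
    sds = [s for _, s in unmodified]
    return (mean(means), stdev(means)), (mean(sds), stdev(sds))


def is_outlier(event, boundary, cutoff=3.0):
    """Flag an event that lies far from the unmodified centre.

    Stand-in for the SVM boundary: each feature is scaled by the s.d. of the
    unmodified events and the Euclidean distance is compared to the cutoff.
    """
    (mu_m, sd_m), (mu_s, sd_s) = boundary
    z_m = (event[0] - mu_m) / sd_m
    z_s = (event[1] - mu_s) / sd_s
    return math.hypot(z_m, z_s) > cutoff


def reactivity_score(modified, boundary, cutoff=3.0):
    """Fraction of modified-sample reads flagged as outliers at this position."""
    flagged = sum(is_outlier(e, boundary, cutoff) for e in modified)
    return flagged / len(modified)


# Hypothetical events for one base: (current mean, log10 of current s.d.).
unmodified = [(100.0, 0.30), (101.0, 0.32), (99.0, 0.28), (100.5, 0.31), (99.5, 0.29)]
modified = [(100.2, 0.30), (108.0, 0.55), (99.8, 0.31), (106.0, 0.50)]

boundary = fit_boundary(unmodified)
print(reactivity_score(modified, boundary))
```

In this toy case, two of the four modified-sample reads fall outside the boundary, mimicking the 'comet tail' of a single-stranded base; a paired base would yield a score near zero.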
To determine the effect of different extents of modification on direct RNA sequencing, we modified the Tetrahymena RNA in vitro under two conditions, a standard structure-probing condition (5 min) and an over-modified condition (25 min) (Methods), and sequenced 10,000–51,000 reads for each (Supplementary Table 1). We observed that over-modifying the RNA did not cause RNA degradation (Extended Data Fig. 4f) but resulted in much poorer mappability rates (mappability rates of unmodified, 5 min and 25 min were 75.2%, 81.4% and 16.5%, respectively; Extended Data Fig. 4g), indicating that the high error rates make it difficult to align reads accurately to known sequences. Over-modified RNAs are also shorter in length (median length of 25-min modified RNA = 348 bases, 5-min modified = 378 bases and unmodified = 379 bases; Extended Data Fig. 4h), suggesting that over-modified reads could be prematurely ejected from pores during sequencing. Plotting the coverage of the mapped reads along the length of the Tetrahymena RNA sequence showed the largest decrease in the first 50 bases of the 5′ end for unmodified and 5-min modified samples and in the first 100 bases for the over-modified samples (Extended Data Fig. 4i). Based on these results, we continued with the standard 5-min modification protocol for all subsequent structure experiments.
We observed that reactivity signals from two replicates of the Tetrahymena RNA were highly correlated, indicating that our data are reproducible (R = 0.97; Extended Data Fig. 5a). Bases with high reactivity scores were not observed in additional replicates of unmodified Tetrahymena RNA and were evenly distributed in modified denatured RNA in a non-structure-specific manner, further indicating that the reactivity scores represent real structure modifications (Fig. 1d). In addition to unimodal current profiles for each k-mer, we observed that 2.9% of Tetrahymena RNA k-mers show bimodal current profiles for both mean and s.d. (Extended Data Fig. 5b). Comparing PORE-cupine’s reactivity profile to footprinting signals showed that PORE-cupine has a two-base frameshift relative to footprinting (Extended Data Fig. 5c) and that correcting for this frameshift results in a high Pearson correlation coefficient (Fig. 1e, Extended Data Fig. 5d,e and Supplementary Data 1). As five bases occupy the nanopore channel at a time, this two-base shift indicates that modifications on the third base, which is at the center of the channel, result in the largest current difference in our study. We performed this two-base shift for all of our downstream analysis.
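One way to find such an offset is to scan a small range of shifts and keep the one that maximizes the Pearson correlation with footprinting. The sketch below assumes both profiles are indexed by position along the same RNA; the sign convention of the shift, the scan range and the toy profiles are assumptions rather than details from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def shifted_correlation(nanopore, footprint, shift):
    """Correlate nanopore[i + shift] with footprint[i] over overlapping positions."""
    pairs = [(nanopore[i + shift], footprint[i])
             for i in range(len(footprint))
             if 0 <= i + shift < len(nanopore)]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


def best_shift(nanopore, footprint, max_shift=4):
    """Return the shift in [-max_shift, max_shift] with the highest correlation."""
    return max(range(-max_shift, max_shift + 1),
               key=lambda s: shifted_correlation(nanopore, footprint, s))


# Hypothetical profiles: the nanopore signal is the footprint delayed by two bases.
footprint = [0.1, 0.9, 0.2, 0.1, 0.8, 0.7, 0.1, 0.2, 0.9, 0.1, 0.3, 0.6]
nanopore = [0.0, 0.0] + footprint[:-2]

print(best_shift(nanopore, footprint))
```

Once the offset is known, it can be applied uniformly to every transcript, as is done for the downstream analyses described here.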
To optimize SVM’s ability to distinguish modified from unmodified bases accurately in different RNAs, we expanded our training and test set to 14 RNAs (2,663 bases, including two human messenger RNAs (mRNAs)) and used an 80%/20% train-test split (at the RNA level) to refine model parameters (Fig. 1e,f, Extended Data Fig. 5d–f and Supplementary Data 1). Except for the 16S ribosomal RNA (rRNA), for which we performed in vivo structure probing, the remaining 13 RNAs were structure probed in vitro. All RNAs were sequenced as a pool to obtain 0.5–2 million reads using direct RNA sequencing (Supplementary Tables 1 and 2). PORE-cupine analysis showed high reproducibility in reactivity between different biological replicates for RNAs in the test set (Extended Data Fig. 5g–j). Our revised SVM parameters performed similarly to the initial SVM parameters based on the Tetrahymena RNA but exhibited a slightly higher median AUC-ROC score of 0.79 on the test set (Extended Data Fig. 5k). The overall performance of the model was also robust across various train-test splits of the 14 RNAs (Extended Data Fig. 5l and Methods). Separate training based on unimodal and bimodal k-mers did not improve the performance of the SVM for them (Extended Data Fig. 5m). Comparing PORE-cupine’s reactivity with footprints on randomly selected regions of test RNAs (full-length 16S rRNA, RPS29 and AdoCbl riboswitch) demonstrated good correlation, similar to what is observed with biological replicates for footprinting (Supplementary Table 3, Fig. 2a–c, Extended Data Fig. 6 and Supplementary Data 1).
a–c, Graphs of z-score-normalized NAI-N3 modification profiles against SAFA footprinting signals from in vivo modified 16S rRNA (a), in vitro modified AdoCbl riboswitch (b) and in vitro modified RPS29 (c). The Pearson correlation was used to quantify the similarity between the two signals. SAFA quantification is shown as a bar chart in teal; PORE-cupine is shown as a line plot in black. d,e, PORE-cupine captures TPP riboswitch dynamics. d, Top, line plots showing PORE-cupine reactivities along TPP riboswitch in the presence of different concentrations of TPP (0, 250 nM, 750 nM and 10 μM). The pink box marks the aptamer region of the riboswitch, which shows the greatest change. Below, expanded view of the reactivities in the aptamer region. e, Pearson correlation between the reactivities of TPP when structure probed in water versus in the presence of 250 nM TPP, 750 nM TPP and 10 μM TPP. Pearson correlation values were computed by taking 30-nucleotide sliding windows across the transcript.
As RNA structures can be dynamic, we next tested whether structural changes in a riboswitch (with and without ligand) can be detected using PORE-cupine25. Applying PORE-cupine on the in vitro modified thiamine pyrophosphate (TPP) riboswitch resulted in 5,000–64,000 sequenced reads (Supplementary Table 1). We observed that corresponding reactivity signals were robust (Extended Data Fig. 7) and that the binding of TPP results in structure differences in the aptamer region (R = 0.3 between water and 10 μM TPP in the aptamer region versus R = 0.9 in non-aptamer regions; Fig. 2d). In addition, we also observed a gradated change in reactivity under different concentrations of TPP in vitro, indicating that PORE-cupine could detect gradual changes in RNA secondary structure (Fig. 2d,e).
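The windowed comparison between conditions can be sketched as below. The figure legend specifies 30-nucleotide sliding windows; the step size of one base, the handling of zero-variance windows and the toy profiles are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def sliding_pearson(a, b, window=30):
    """Correlation of two reactivity profiles in each window, stepping one base."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same positions")
    return [pearson(a[i:i + window], b[i:i + window])
            for i in range(len(a) - window + 1)]


# Hypothetical 20-base profiles; a window of 5 keeps the toy example small.
no_ligand = [float((i * 7) % 5) for i in range(20)]
with_ligand = list(no_ligand)
with_ligand[8:13] = reversed(no_ligand[8:13])  # local rearrangement, bases 8-12

r = sliding_pearson(no_ligand, with_ligand, window=5)
print([round(v, 2) for v in r])
```

Windows that do not touch the altered region stay close to 1, whereas windows overlapping it drop, which is the pattern expected for a ligand-responsive aptamer embedded in an otherwise unchanged transcript.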
Genome-wide analysis of RNA structures in hESCs using PORE-cupine
Groups of transcripts can share similar structures and perform related functions in cells26. We applied PORE-cupine to study the RNA structural landscape in hESCs by sequencing four biological replicates of NAI-N3-modified and two biological replicates of unmodified hESC transcriptomes, totaling 10 million sequenced reads in each condition (Supplementary Table 4 and Extended Data Fig. 8a,b). The mappability of unmodified and modified reads was 86.1% and 59.6%, respectively (Extended Data Fig. 8c), with most reads having a modification rate of 1–2% (Extended Data Fig. 8d), providing good performance for detecting secondary structures in terms of AUC-ROC (Extended Data Fig. 8e). We did not observe a decrease in mapping rate around exon–exon junctions for our modified libraries (Extended Data Fig. 8f). We observed that 0.29% of the k-mers in the hESC transcriptome showed bimodal current profiles for both the current mean and s.d., and that this bimodality is enriched in specific bases along a k-mer (Extended Data Fig. 8g–j). We also observed a drop in read coverage at the extreme 5′ and 3′ ends of transcripts, indicating that these could be blind spots in calculating reactivities (Extended Data Fig. 8k).
To determine the number of reads needed for accurate structure determination, we subsampled the number of unmodified and modified reads of the Tetrahymena RNA and compared the reactivity information obtained from the subsampled set to that of the full data set. As expected, the correlation increased with the number of reads used and began to plateau at around R = 0.8, with 200 reads of unmodified RNA and 100 reads of modified RNA (Fig. 3a). At this threshold, we observed that transcript abundances and reactivity profiles of hESC RNAs were highly correlated across biological replicates, independent of modification status (Fig. 3b), indicating that our data are reproducible (Extended Data Fig. 9a). We obtained structural information for 1,582 coding genes, 98 noncoding genes, 67 pseudogenes and four rRNAs across the hESC transcriptome after filters (Fig. 3c and Extended Data Fig. 9b). The median length of mapped reads was 772 and 752 bases for unmodified and modified libraries, respectively (Extended Data Fig. 9c), with 37.9% and 42.8% of the unmodified and modified transcripts having more than 90% of the annotated gene length, respectively (Extended Data Fig. 9d). We observed that the reactivity profiles of our transcripts were highly consistent with those for near full-length transcripts (>99% of annotated length, n = 83, median Pearson correlation = 0.93; Extended Data Fig. 9e), indicating that our structure information is robust and captures what is found in vivo.
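The subsampling analysis can be approximated with a short simulation. As a simplification, this sketch treats each read as a list of per-base 0/1 modification calls and subsamples a single pool, whereas the study subsamples unmodified and modified reads separately; the simulated rates, depths and function names are assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def profile(reads):
    """Per-base modification rate: fraction of reads flagged at each position."""
    n = len(reads)
    return [sum(col) / n for col in zip(*reads)]


def subsample_correlation(reads, depth, repeats, rng):
    """Mean correlation between subsampled profiles and the full-data profile."""
    full = profile(reads)
    values = []
    for _ in range(repeats):
        subset = rng.sample(reads, depth)  # sampling without replacement
        r = pearson(profile(subset), full)
        if not math.isnan(r):
            values.append(r)
    return sum(values) / len(values)


# Simulated data: 500 reads over 40 bases, every fourth base highly reactive.
rng = random.Random(0)
true_rate = [0.5 if i % 4 == 0 else 0.05 for i in range(40)]
reads = [[1 if rng.random() < p else 0 for p in true_rate] for _ in range(500)]

for depth in (5, 50, 200, 500):
    print(depth, round(subsample_correlation(reads, depth, 20, rng), 3))
```

Plotting the mean correlation against depth gives a saturation curve of the kind used here to choose minimum read thresholds.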
a, Top, Pearson correlation of Tetrahymena RNA structural profiles obtained via subsampling of unmodified reads in comparison to the structural profile obtained using the full data set. The minimum number of unmodified reads used for downstream analysis was set to 200 (dotted line in gray). Bottom, Pearson correlation of Tetrahymena RNA structural profiles obtained via subsampling of modified reads in comparison to the structural profile obtained using the full data set. The minimum number of modified reads used for downstream analysis was set to 100 (dotted line in gray); we subsampled 100× for each abundance. b, Scatter plot showing normalized base reactivity between two biological replicates of structure-probed hESC libraries. Pearson correlation was used to calculate the similarity between the two signals; P = 0 and CI95% = (0.73, 0.73) using two-tailed Student’s t-test. n = 2 biological replicates were performed. c, Pie chart showing the number of genes belonging to different classes of transcripts captured in the hESC data set. d, AUC-ROC curves showing the performance of PORE-cupine, icSHAPE and SHAPE-MaP on the Tetrahymena RNA (top) and 16S rRNA (bottom) based on footprinting. e, Comparison of PORE-cupine to icSHAPE and SHAPE-MaP in the hESC transcriptome. Venn diagram showing the overlap of high-reactivity sites identified by PORE-cupine, SHAPE-MaP and icSHAPE. f, Box plot showing the fraction of high-reactivity bases that were identified in PORE-cupine, SHAPE-MaP and icSHAPE and that also had high signals in at least one other method (3,037 genes and 1,617,397 positions). All signals were taken from the average of n = 2 biological replicates. The middle line of the box plot indicates the median, whereas the lower and upper boundary of the box plot correspond to the first and third quartiles. 
The upper whisker extends from the hinge to the largest value no further than 1.5× IQR from the hinge, and the lower whisker extends from the hinge to the smallest value at most 1.5× IQR of the hinge. Outliers are shown as dots. g, Top, metagene analysis of reactivity profiles for transcripts with (orange) and without (blue) HITS-CLIP evidence for Lin28 binding, centered at the binding motifs. Bottom, log10(P value) of the difference in reactivity between the profiles shown above. P value was calculated using two-sided Student’s t-test. h, Top, metagene analysis of correlation of reactivity profiles (± 100 bases centered on SNV position) for reads from different alleles from the same biological replicate (red) and reads from the same allele but different biological replicates (green). SNVs result in local reactivity differences up to 25 bases upstream and downstream of the SNV site. Bottom, log10(P value) of the difference in metagene profiles between red and green lines on top. Pearson correlation values were computed by taking 30-nucleotide sliding windows across each transcript, and P values were calculated using two-sided Fisher’s exact test. i, An example of the reactivity difference due to A-to-C allele change in ARC21 transcript. The black and red lines represent normalized reactivity of the A and C allele, respectively, around the SNV. The dotted brown line on the line plot represents the location of the SNV; red bars above the reactivity profiles mark the positions that differ significantly between the two profiles (Methods). IQR, interquartile range.
To benchmark the accuracy of PORE-cupine with other widely used high-throughput, structure-probing methods, such as icSHAPE and SHAPE-MaP5,6, we performed icSHAPE and SHAPE-MaP on the hESC transcriptome and on known structural RNAs, such as 16S rRNA and the Tetrahymena RNA. We observed that PORE-cupine performed similarly to icSHAPE and SHAPE-MaP on the 16S rRNA and Tetrahymena RNA (AUC-ROC for Tetrahymena RNA: 0.93 for SHAPE-MaP and PORE-cupine and 0.91 for icSHAPE; AUC-ROC for 16S rRNA: 0.8 for icSHAPE and PORE-cupine and 0.77 for SHAPE-MaP; Fig. 3d). To determine whether high-reactivity sites observed in one method were also seen in another independent method, we overlapped PORE-cupine signals with icSHAPE and SHAPE-MaP signals. We observed that 38% of PORE-cupine’s high-reactivity positions overlapped with icSHAPE or SHAPE-MaP sites, whereas 36% of SHAPE-MaP high-reactivity positions overlapped with icSHAPE or PORE-cupine sites, and 39% of icSHAPE high-reactivity positions overlapped with SHAPE-MaP or PORE-cupine sites (Fig. 3e,f and Methods). Although these overlap rates are low, they are consistent with previous observations on read-through versus reverse transcription (RT) stop methods27 and point to the complementary range of various genome-wide, structure-probing methods in capturing different populations of single-stranded bases.
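The overlap statistic reported above, the fraction of one method's high-reactivity positions that are also called by at least one other method, reduces to set arithmetic. The sketch below assumes that high-reactivity positions have already been called for each method (the calling criteria are described in Methods and are not reproduced here); the position sets are hypothetical.

```python
def overlap_fraction(sites):
    """For each method, fraction of its sites found in at least one other method.

    sites: dict mapping method name to a set of high-reactivity positions.
    """
    fractions = {}
    for name, own in sites.items():
        # Union of the positions called by every other method.
        others = set().union(*(s for n, s in sites.items() if n != name))
        fractions[name] = len(own & others) / len(own) if own else float("nan")
    return fractions


# Hypothetical high-reactivity positions along one transcript.
sites = {
    "PORE-cupine": {1, 2, 3, 4},
    "icSHAPE": {3, 4, 5, 6, 7},
    "SHAPE-MaP": {1, 7, 8, 9},
}

for method, fraction in overlap_fraction(sites).items():
    print(method, fraction)
```

Because each fraction is normalized by the size of the method's own site set, the three values need not agree even though they describe the same pairwise intersections.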
To determine whether PORE-cupine could capture global structural properties seen in other high-throughput, structure-probing data sets, we calculated the average reactivity signal in different RNA classes. As expected, we observed that rRNAs are the most highly structured, followed by long noncoding RNAs (lncRNAs) and mRNAs, in agreement with the importance of structure for noncoding RNAs3 (Extended Data Fig. 9f). Metagene analysis of mRNAs aligned by their translational start and stop sites showed the classic three-nucleotide structural periodicity in their coding sequences and not in their 5′ and 3′ untranslated regions (UTRs)2,10,28 (Extended Data Fig. 9g), highlighting PORE-cupine’s ability to recapitulate known structural patterns in other data sets.
As icSHAPE and SHAPE-MaP can identify RNA-binding protein (RBP) sites by detecting different SHAPE reactivities in bound versus unbound positions29, we evaluated whether PORE-cupine could also detect RBP binding sites in our hESC data. We examined the reactivity profiles of Lin28 binding sites with and without high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS-CLIP) binding evidence in hESCs30 (Methods). We observed that Lin28 binding sites with HITS-CLIP evidence showed an increase in reactivity in the bases flanking the binding motif and a decrease in reactivity within the binding motif (Fig. 3g). This indicates that real Lin28 binding sites are more structurally accessible around the motif and that Lin28 binding likely prevents NAI-N3 from modifying the RNA sequence.
Besides RBP binding, previous studies also showed that single-nucleotide variants (SNVs) can result in structural changes along an RNA3. To determine whether PORE-cupine could identify structural changes due to point mutations, we first identified mutations in the hESC transcriptome using Illumina RNA sequencing data (Methods). We then separated mapped direct RNA sequencing reads based on the different alleles observed. We identified 90 transcripts with two or more SNVs and sufficient coverage for reactivity analysis: 10/90 SNVs were observed to result in statistically significant reactivity changes (11.1%, Fisher’s exact test; Methods). Metagene analysis of the reactivity profiles across alleles showed that the largest reactivity differences occur locally and extend up to 25 bases upstream and downstream of the SNV location (Fig. 3h,i).
Detecting structural differences in shared exons from alternative isoforms
The human transcriptome is extensively spliced31,32. As most short reads fall in sequences shared between transcript isoforms, short-read sequencing cannot reveal structural differences between isoforms, which can have considerable functional consequences33. To analyze RNA structure across isoforms, we mapped direct RNA sequencing data to transcripts present in the Ensembl database (Methods) and obtained 104 genes (corresponding to 204 pairwise transcript comparisons) that had two or more isoforms with sufficient coverage for accurate structure characterization for downstream analysis (Fig. 4a and Extended Data Fig. 10a).
a, Histogram showing the distribution of structure-probed genes according to the number of isoforms present; values for each group are shown above. b, Top, schematic showing pairwise structure comparisons between 1) biological replicates of the same transcript and 2) different isoforms of the same gene. Bottom, box plot showing distribution of structure similarity for the same isoform across biological replicates (salmon) or between different isoforms within the same biological replicate (teal). In total, 204 transcripts were compared. Structure similarity was calculated using the Pearson correlation. P values for comparison between the two distributions were calculated using the two-sided Wilcoxon rank-sum test. c, Scatter plot of structure similarity between shared exons of different isoforms versus their sequence similarity from both sides of the alternative splice site to the 5′ and 3′ ends of the transcript. Pearson’s R is shown (1 alt P = 0.003 using two-tailed Student’s t-test, CI95% = (0.074, 0.35); >2 alt P = 0.089 using two-tailed Student’s t-test, CI95% = (−0.06, 0.68)). In total, 182 transcripts with one alternate splice site and 22 transcripts with more than two alternate splice sites are compared to each other. d, Top, metagene analysis of reactivity similarity between alternatively spliced isoforms centered at the alternative splice site (blue), as well as reactivity similarity between biological replicates of the same transcripts centered at the same splice sites (red). Bottom, log10(P value) of the difference in the blue and red lines above. P values were calculated using two-sided Fisher’s exact test. e, Box plot showing distribution of reactivity similarity for the same isoform across biological replicates (salmon) or between different isoforms within the same biological replicate (teal), for local contexts (left) and for global excluding local contexts (right). 
Global excluding local contexts are defined as regions extending from both sides of the alternative splice site to the 5′ and 3′ ends of the transcript, excluding 200 bp to the right and left of the alternative splice site. In total, 204 transcripts were compared globally. Local contexts are defined as 50 bp to the left and right of the alternative splice site; 226 transcripts were compared at the local level. P values were calculated using two-sided Wilcoxon rank-sum test. f, Structural information from two different isoforms of RPS8. Top, exon and intron organization displayed with their respective Ensembl spliced transcript IDs. An alternative exon seen in our structural data is colored in brown. The three shared exons near the 3′ end of the gene are boxed. Bottom, normalized reactivity profiles for the aggregate signal of the isoforms in the three shared exons. The two structure-changing regions (A and B) between the isoforms identified by PORE-cupine are boxed. g, Normalized reactivity profiles across the entire length of the different isoforms. PORE-cupine could assign the correct structures to the different isoforms due to long-read sequencing. Red bars above the reactivity profiles mark the positions that differ significantly between the two profiles (Methods). The gray filled-in boxes indicate the boxed structure regions (A and B) in f. The beige filled-in bar indicates the location of the retained intron. In b and e, the middle line of the box plot indicates the median, whereas the lower and upper boundary of the box plot correspond to the first and third quartiles. The upper whisker extends from the hinge to the largest value no further than 1.5× IQR from the hinge, and the lower whisker extends from the hinge to the smallest value at most 1.5× IQR of the hinge. Outliers are shown as dots. IQR, interquartile range.
We observed that most gene-linked isoforms (178 of 204, 87%) showed reactivity differences in shared regions (Methods). Globally, this is reflected in lower reactivity similarities in shared sequences across gene-linked isoforms as compared to biological replicates of the same transcript (Fig. 4b). In general, isoform pairs with greater sequence similarities are more structurally similar in shared regions (Fig. 4c). This correlation is stronger when there are two or more alternative splice sites along a transcript, resulting in widespread reactivity changes across the entire transcript (Fig. 4c). Although the largest reactivity differences appear to occur immediately around the alternative splice site, 70% of isoforms contain both local and distal (>200 bp away) reactivity changes relative to splice sites (Fig. 4d). We confirmed this distal effect on structures by showing that identical sequences that are far away from an alternative splice site show lower reactivity correlation between gene-linked isoforms in the same replicate than between identical transcripts across biological replicates (Fig. 4e).
PORE-cupine can phase structures along long isoforms
The presence of two or more alternative structures that reside in and span identical sequences makes it particularly challenging for short-read sequencing to determine which combinations of RNA structures coexist in an isoform (Extended Data Fig. 10b). An example of this is RPS8, which is alternatively spliced into two isoforms that share identical sequences for three exons near the 3′ end but are alternatively spliced near the 5′ end (Fig. 4f). PORE-cupine analysis shows that the two isoforms contain different structures (A1 versus A2 and B1 versus B2) that are separated by ~400 bases from each other in the shared sequences. In short-read sequencing, the lack of connectivity between structures A and B makes it difficult to know whether A1 is linked to B1 or B2 in the blue isoform and vice versa. However, PORE-cupine enables us to link and correctly assign structural information to their individual isoforms in shared regions (Fig. 4g). Globally, 36.4% of the transcripts contained two structure-changing regions that were more than 200 bases apart (Extended Data Fig. 10c), demonstrating the importance of PORE-cupine for phasing structures along long isoforms by providing connectivity in RNA structure information across the transcriptome.
Isoforms with structural differences show differences in translation efficiency
Different RNA structures are used to regulate gene expression, including translation, splicing and decay1. To determine whether structural differences between isoforms could regulate translation, we performed transcript isoform in polysomes sequencing (TrIP-seq) on hESCs to analyze the distribution of gene-linked isoforms across a polysome gradient34. Isoforms that are found predominantly in higher polysome fractions are typically associated with more ribosomes and have higher translation rates, although RNAs could also be associated with other high-molecular-weight complexes in high polysome fractions (Extended Data Fig. 10d,e). We obtained a high degree of correlation between two biological replicates across polysome fractions (Extended Data Fig. 10f) and observed that highly translated transcripts are found in high polysome fractions, whereas poorly translated transcripts are found in low polysome fractions, as expected (Extended Data Fig. 10g), indicating that our TrIP-seq data provide an accurate reflection of mRNA–polysome association.
Of 178 structure-changing isoform pairs, 153 pairs had polysome fractionation data. We observed that 28 pairs showed changes in translation efficiency by TrIP-seq (18.3%, Fisher’s exact test; Methods). Gene-linked isoforms that are structurally similar are translated at similar rates, whereas structurally more divergent pairs show greater differences in their translation, suggesting that isoform-specific structures could affect isoform-specific translation (Fig. 5a). We observed that one of the isoform pairs of RPL17 showed reactivity differences in shared regions based on PORE-cupine analysis, as well as translation efficiency differences. The transcript ENST00000618619.4 (RPL17_1) is highly translated, whereas ENST00000579408.5 (RPL17_2) is poorly translated and contains a retained intron of 161 bases in the 5′ UTR (Fig. 5b,c). To study how this retained intron resulted in structural changes in the 5′ UTR and in translation repression, we examined pairwise RNA interactions in this region from a previously published data set that uses proximity ligation sequencing (Sequencing of Psoralen crosslinked, Ligated and Selected Hybrids (SPLASH))35. SPLASH reads showed strong interactions between the retained intron and sequences upstream and downstream to it, resulting in an extensively structured environment around the start codon (Fig. 5d). In the absence of the retained intron (RPL17_1), the isoform folds into a simpler structure, allowing the start codon to be more accessible for translation.
a, Box plot showing that gene-linked isoforms with greater structural similarity show smaller differences in translation efficiency. P value was calculated using the two-sided Wilcoxon rank-sum test (P = 0.001). In total, 43 transcripts with low structure similarity and 48 transcripts with high structure similarity were used for the comparison. b, Structural information from a pair of isoforms from RPL17 that show structure and translational differences. Top, exon and intron organization displayed with their respective Ensembl spliced transcript IDs. The alternative exon seen in our structural data is highlighted in red. Bottom, normalized reactivity profiles for the two gene-linked isoforms. The red bars on top of the line plots indicate positions of significant structure changes (P < 0.05, Fisher’s exact test; Methods). c, The normalized log(TPM) expression level for the two isoforms in fractions 6–10, P = 4 × 10−5 (Fisher’s exact test; Methods). d, Left, the upper structure (blue and green) is based on RNAcofold-calculated interactions for 10–400 bp, whereas the lower structure (red) is based on the RNAcofold-calculated interactions between regions 10–96 bp and 257–300 bp. Right, RNAcofold-derived structures for the region around the alternative exon. The SPLASH read counts are indicated below the structures. The start codon is highlighted (red) on the structures. e, Top, design of the fusion RNAs used in the luciferase assay. Bottom, expanded view of the predicted secondary structure of RPL17_2, with the mutations shown next to the original base. We performed mutations along three different stems (helixes 1–3) of the structure, first on each side of the structure (1.1/1.2, 2.1/2.2 and 3.1/3.2) to disrupt the helixes and then on both sides of the structure to restore the helixes (compensatory mutations 1.1 + 1.2, 2.1 + 2.2 and 3.1 + 3.2).
f, Box plots showing the luciferase activity of fusion RNAs containing the 5′ ends of RPL17_1 and RPL17_2 and the mutants in front of the luciferase gene, 8 or 16 h after transfection in 293T cells. We performed four biological replicates for each experiment. Red dots represent the average of technical replicates for each biological replicate. P values were calculated using the two-sided Student’s t-test with n = 4 for each condition. RPL17_1 shows >10-fold higher luciferase activity as compared to RPL17_2, and structure mutations that disrupt the helical stems of RPL17_2 increase luciferase activity, whereas compensatory mutations that restore the structure partially rescue the low luciferase activity. In a and f, the middle line of the box plot indicates the median, whereas the lower and upper boundaries of the box plot correspond to the first and third quartiles. The upper whisker extends from the hinge to the largest value no further than 1.5× IQR from the hinge, and the lower whisker extends from the hinge to the smallest value no further than 1.5× IQR from the hinge. Outliers are shown as dots. IQR, interquartile range; NS, not significant.
To experimentally validate that the poor translatability of RPL17_2 is indeed due to extensive structures formed by the retained intron around the start codon, we cloned the 5′ ends of RPL17_1 and RPL17_2 in front of a luciferase reporter and performed mutagenesis experiments on RPL17_2 (Fig. 5e and Methods). We confirmed that RPL17_1 indeed translates much better than RPL17_2, as shown by >10-fold higher luciferase units upon RNA transfection (Fig. 5f). Mutations that disrupt the pairwise interactions of three different stems (1.1 or 1.2, 2.1 or 2.2 and 3.1 or 3.2; Fig. 5e) open the structures around the start codon and increase the translatability of RPL17_2, whereas compensatory mutations that restore the helical structures partially rescue the poor translatability of RPL17_2 (Fig. 5f). These results confirm that structure plays an important role in regulating isoform-specific translation of RPL17.
Discussion
The human transcriptome is tightly regulated by sequence and structural features along each transcript32,36. Assigning correct structural information to individual transcripts is the first step in understanding structure-based gene regulation. In this study, we coupled RNA structure probing with direct RNA sequencing on nanopores to better understand structure–function relationships in the cell. We initially tested five different structure-probing compounds, as it is unclear how the size, charge and location of the modifications (on the base or sugar) along RNA could perturb the current flowing through the nanopore. Although we observed that DMS modifications resulted in the highest amount of errors upon mapping, these errors were occurring in double-stranded as well as single-stranded bases, probably due to the ability of DMS to modify guanines in a structure-independent manner. In contrast, the SHAPE compound NAI-N3 modifies RNAs to result in errors that are more enriched in single-stranded bases.
Although errors, such as mismatches, insertions and deletions, detected during nanopore sequencing are a convenient way to determine whether a modification has an effect on direct RNA sequencing, machine learning strategies, such as SVMs, enable the identification of NAI-N3-modified bases with high accuracy. In addition, the frequency of outliers at each base can provide a reactivity score that serves as a proxy for single-strandedness at that base. Similarly to other direct RNA sequencing experiments, we observed that RNAs longer than 200 bases are sequenced better by direct RNA sequencing16, and that there is a 5′ end decay in signal, particularly in the first 10% of bases of the transcript, which could be due to degradation during poly(A) selection or when the RNA is being sequenced. We also observed a signal decay at the very 3′ end of the transcripts, which could indicate incorrect annotations.
Compared to other short-read, high-throughput sequencing methods2,6,8,10, PORE-cupine provides a fast, direct and complementary way to assay for RNA structure and dynamics genome wide. Low-reactivity regions could indicate either double-strandedness or protection from the compound, such as upon RBP binding. PORE-cupine also requires a minimum number of reads (200 reads of modified RNA and 100 reads of unmodified RNA) for accurate reactivity analysis, as most of the reads contain only 1–2% modifications and hence require us to aggregate signals across the strands to be able to detect structure. As such, deep sequencing is needed to be able to obtain RNA structure information using direct RNA sequencing. We obtained structure information for a total of 1,751 transcripts with ~40 million sequencing reads and estimate that at least 26 million more reads are needed to double the amount of structure information transcriptome wide. As we are currently mapping reads to transcripts in the Ensembl database, we are detecting structures on transcripts that have been annotated. Further efforts in de novo assembly of RNA transcripts would enable us to identify structural information on novel transcripts.
As PORE-cupine obtains structural information along long stretches of RNA, we can determine and phase RNA reactivities in gene-linked isoforms. However, alternative splicing at the extreme 5′ or 3′ ends of the transcripts limits the ability to uniquely map reads to individual gene-linked isoforms due to 5′ and 3′ end decay. We focused our analysis on mRNA gene-linked isoforms due to controversies on whether lncRNAs could be translated. We observed that many gene-linked isoforms exhibit reactivity differences in shared regions, and that this is associated with changes in translational efficiency using polysome profiling. Although polysome profiling data are not a perfect proxy for translation, as RNAs could also be associated with other RBPs that reside in different polysome fractions, we do observe that highly translated RNAs, such as ACTB, and poorly translated RNAs, such as ATF4, are in high and low polysome fractions, respectively, in our data. Furthermore, our luciferase experiments on a poorly translated versus highly translated isoform validated the polysome fractionation results, suggesting that transcripts were largely translated according to their presence in the polysome fractions. Lastly, our mutational experiments on RPL17 further demonstrate the importance of isoform-specific structure in regulating translation.
PORE-cupine expands the current repertoire of RNA structure probing strategies to deepen our understanding of the role of RNA structure in isoform-specific gene regulation2,6,9,10,37. Although the initial version still requires an aggregate of signals across many strands to obtain accurate structure reactivity information, further developments of the method, by increasing the modification frequency per strand and improving sequencing and analytical techniques, could enable the study of RNA structures at a single-molecule level in the future.
Methods
RNA-modifying reagents
CMCT, NAI, NMIA and DMS were purchased from Sigma-Aldrich. NAI-N3 was synthesized from ethyl 2-methylnicotinate in four steps, as previously described by Spitale et al.6.
In vitro transcription, folding and in vitro structure probing
RNA was transcribed from PCR-amplified inserts using the Hiscribe T7 High Yield Synthesis Kit (NEB). The RNA of interest was folded and structure probed in the presence or absence of ligand (TPP). Depending on the solvent for the RNA-modifying chemical, DMSO or water was added to the negative control. CMCT, NAI and NAI-N3 were added to final concentrations of 100 mM, whereas NMIA and DMS were used at final concentrations of 20 mM and 5% (vol/vol), respectively. DMS reactions were quenched with 30% β-mercaptoethanol in 0.3 M sodium acetate. Reactions were column purified (Zymo Research) and resuspended in nuclease-free water.
RNA footprint analysis
To determine sites of modifications along an RNA, an RT primer (IDT) was designed around 20 bp downstream of the region of interest. Primers were radiolabeled with 32P and purified on a 15% TBE-urea PAGE gel. The purified labeled primer (1 µl) was then incubated with 500 ng of RNA (in 5.5 µl) for RT and run on an 8% TBE-urea PAGE sequencing gel. Gels were dried for 2 h on a vacuum gel drier before being exposed to a phosphorimager plate for 24 h. The phosphorimager plates were imaged on an Amersham Typhoon 5 Biomolecular Imager (General Electric). Gel images were quantified using semi-automated footprinting analysis (SAFA) software38.
Human and bacterial cell culture with in vivo SHAPE modification
hESCs (H9) were obtained from labs in the Genome Institute of Singapore and cultivated in feeder-free conditions with mTESR basal media supplemented with mTESR supplement (STEMCELL Technologies).
For in vivo SHAPE modification, cells were rinsed once on the plate with room temperature PBS (Thermo Fisher Scientific) before being incubated with Accutase (STEMCELL Technologies) for 10 min at 37 °C to dissociate cells. The cells were washed with PBS and spun down at 400g for 5 min before being resuspended in 950 µl of PBS in a 1.7-ml Eppendorf tube. NAI-N3 (2 M in 50 µl) in DMSO+ or DMSO− was then added to two separate cell suspensions and immediately mixed by inversion, and the cells were incubated at 37 °C at 10 r.p.m. (Model 400 hybridization incubator, SciGene) for 5 min. After the incubation period, the cell suspensions were immediately spun down at 400g for 5 min at 4 °C. The supernatant was removed before total RNA was isolated with the addition of TRIzol reagent (Thermo Fisher Scientific).
Bacillus subtilis strain 168 was grown in LB media at 37 °C to an OD600 = 0.6. Cells were harvested by centrifugation at 3,000 r.p.m. for 5 min and pelleted and washed in PBS, before being treated with 100 mM NAI-N3 as in the H9 example above for 5 min at 37 °C. The cells were pelleted after incubation, resuspended in bacterial lysis buffer (1% SDS, 8 mM EDTA and 100 mM NaCl) and lysozyme (final concentration, 15 mg ml−1), and incubated for 15 min at 37 °C. Total RNA was then isolated with the addition of TRIzol reagent.
Direct RNA sequencing library preparation
Direct RNA sequencing libraries were constructed using an input of 500 ng of RNA for single templates or 1 µg of mRNA isolated from H9 hESC total RNA. For the training and test set of RNAs, a total of 14 RNAs were structure probed individually in vitro and then pooled together and sequenced as a mix. Of these 14 RNAs, 11 were used as the training set for the SVM, and the remaining three were used as a test set.
H9 mRNA was isolated using the Poly(A) Purist MAG Kit (Thermo Fisher Scientific). In this study, the direct RNA library preparation kit SQK-RNA001 from Oxford Nanopore Technologies was used for all sequencing runs. All preparation steps were followed according to the manufacturer’s specifications, except for the omission of a single RT step. The libraries were loaded onto R9.4.1 or R9.5 flow cells and sequenced on a MinION device. For the in vitro and 16S rRNA templates, the RTA DNA adapter from the sequencing kit in the first step was replaced with a DNA adapter complementary to the 3′ end of the RNA. The sequences of these adapters are detailed in Supplementary Table 1. The sequencing parameters were modified for all runs, and the specific changes are described in the document ‘Modified Minkown parameters’ found in our GitHub repository: https://github.com/awjga/PORE-cupine.
Polysome fractionation
H9 cells from a 15-cm plate were treated for 10 min with 100 µg ml−1 cycloheximide at 37 °C. The cells were next washed with warm PBS, dissociated with trypsin and neutralized with ice-cold media containing FBS (all supplemented with 100 µg ml−1 cycloheximide). Next, they were pelleted before being resuspended in 1× RSB buffer (10 mM Tris-HCl (pH 7.4), 150 mM NaCl and 15 mM MgCl2) with 200 µg ml−1 cycloheximide, lysed in lysis buffer (10 mM Tris-HCl (pH 7.4), 150 mM NaCl, 15 mM MgCl2, 1% Triton-X, 2% Tween-20 and 1% deoxycholate) and incubated on ice for 10 min. Centrifugation at 12,000g for 3 min removed the nuclei, and the supernatant was removed for a subsequent centrifugation.
Equal OD units were loaded onto a linear 10–50% sucrose gradient that was made using the 107 Gradient Master Ip (BioComp Instruments). The gradients containing the cell lysate were centrifuged in SW41 bucket rotors (Beckman Coulter) at 36,000 r.p.m. at 8 °C for 2 h. Twelve fractions were separated and collected from the top of the gradient using the PGF Ip Piston Gradient Fractionator (BioComp Instruments) and the Fraction Collector (FC-203B, Gilson). The absorbance readings were collected at 260 nm with an Econo UV Monitor (EM-1 220 V, Bio-Rad). After fractionation, 110 µl of 20% SDS and 12 µl of proteinase K (Thermo Fisher Scientific) were added to each fraction for a 30-min incubation at 42 °C, after which 10 µl of GeneChip Eukaryotic Poly-A RNA controls (Affymetrix; final concentration, 1:120,000) were added to each fraction. Total RNA from each fraction was extracted using phenol-chloroform-isoamyl alcohol (25:24:1, Sigma-Aldrich), poly(A) selected and made into a cDNA library using Ultra Directional RNA Library Prep Kit (NEB), following the manufacturer’s instructions.
Data analysis
Processing and quantification of TrIP-seq data
Raw paired-end reads were first trimmed and then mapped to the human transcriptome (Ensembl version GRCh38.93) using Salmon39 (options: -l A --seqBias --gcBias --posBias). Relative abundances of each isoform were estimated as transcripts per million (TPM) by Salmon, and corresponding values were used for downstream analysis.
Base-calling and mapping of nanopore reads
Reads were base-called with Albacore version 2.3.3 or Guppy version 3.1.5 without filtering. Base-called sequences were aligned with GraphMap version 0.5.2 (ref. 40). For single-gene mapping, references for the individual genes were used. For H9 transcriptome mapping, cDNA and noncoding reference sequences obtained from Ensembl were used (GRCh38).
Determination of single-stranded positions
To evaluate the performance of various structure-probing methods, single-stranded positions were determined as those having a value greater than 1 s.d. above the median of the SAFA value within a gel. Single-stranded positions were then used to evaluate various methods and determine ROC curves.
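The thresholding rule above can be illustrated with a short Python sketch. This is not taken from the PORE-cupine scripts: the function name is hypothetical, and the use of the sample standard deviation is an assumption, because the text does not state which estimator was used.

```python
from statistics import median, stdev


def single_stranded_positions(safa_values):
    """Return 0-based indices whose SAFA value exceeds median + 1 s.d.

    safa_values: band intensities for one gel, one value per position.
    Assumption: the sample standard deviation (n - 1) is used.
    """
    threshold = median(safa_values) + stdev(safa_values)
    return [i for i, value in enumerate(safa_values) if value > threshold]


# Illustrative (made-up) band intensities for eight positions.
example = [0.1, 0.2, 0.15, 0.9, 0.12, 0.18, 1.1, 0.2]
print(single_stranded_positions(example))
```

Positions returned by such a function would then serve as the positive class when computing ROC curves for each probing method.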
Calculation and comparison of error rates
Mismatch, deletion and insertion rates were calculated using custom Python scripts from aligned BAM files. The Wilcoxon rank sum test was used to calculate significance of error rates between modified and unmodified samples. Fold changes in mismatch, deletion and insertion rates per position were calculated by dividing the error rate in modified samples by the rate in unmodified samples. The fold change was Winsorized: values ≥99th percentile were set to the value at the 99th percentile, and values <1 were set to 1. AUC-ROC values for mismatch, deletion and insertion rate-based prediction of single-stranded bases were calculated and compared between modified and unmodified libraries using the Wilcoxon test.
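A minimal Python sketch of the fold-change and Winsorization step follows. It is illustrative only: the function names, the linear-interpolation percentile and the pseudocount that guards against division by zero are assumptions of this sketch and are not described in the text.

```python
def percentile(values, q):
    """Percentile (q in 0..100) by linear interpolation between ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def winsorized_fold_change(modified_rates, unmodified_rates, pseudocount=1e-6):
    """Per-position error-rate fold change (modified / unmodified), Winsorized.

    Values at or above the 99th percentile are capped at that percentile,
    and values below 1 are raised to 1, as described in the text.
    Assumption: a small pseudocount avoids division by zero.
    """
    raw = [
        (m + pseudocount) / (u + pseudocount)
        for m, u in zip(modified_rates, unmodified_rates)
    ]
    cap = percentile(raw, 99)
    return [max(min(fold, cap), 1.0) for fold in raw]


# Made-up mismatch rates at four positions.
print(winsorized_fold_change([0.25, 0.5, 0.125, 1.0],
                             [0.25, 0.25, 0.25, 0.125],
                             pseudocount=0.0))
```

The floor at 1 means that positions where the modified library has fewer errors than the unmodified control are treated as showing no enrichment.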
Alignment of nanopore signals
Current measurements above 200 µA and below 0 µA were considered as outlier values and were removed from the raw nanopore sequencing files. The current signal was aligned using Nanopolish (version 0.10.2). As the current mean drifts with increasing sequencing time, we normalized the current per strand to that of the expected model current mean in Nanopolish. Multiple events from the same read and position were collapsed into a single value by calculating the weighted average of event mean and event s.d. and taking the sum of event lengths.
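The event-collapsing step can be sketched in Python as below. The tuple layout and function name are hypothetical, and weighting by event length is an assumption, because the text specifies a weighted average without naming the weights.

```python
from collections import defaultdict


def collapse_events(events):
    """Collapse multiple events per (read, position) into one record.

    events: iterable of (read_id, position, mean, sd, length) tuples.
    Returns {(read_id, position): (mean, sd, total_length)}.
    Assumption: event mean and s.d. are averaged with event length as weight.
    """
    grouped = defaultdict(list)
    for read_id, position, mean, sd, length in events:
        grouped[(read_id, position)].append((mean, sd, length))

    collapsed = {}
    for key, rows in grouped.items():
        total_length = sum(length for _, _, length in rows)
        weighted_mean = sum(mean * length for mean, _, length in rows) / total_length
        weighted_sd = sum(sd * length for _, sd, length in rows) / total_length
        collapsed[key] = (weighted_mean, weighted_sd, total_length)
    return collapsed


# Two events on read "r1" at position 10 are merged; position 11 is unchanged.
events = [
    ("r1", 10, 100.0, 2.0, 1.0),
    ("r1", 10, 110.0, 4.0, 3.0),
    ("r1", 11, 90.0, 1.0, 2.0),
]
print(collapse_events(events))
```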
Training of parameters, determination of thresholds and calculation of reactivity profiles
A one-class SVM was used to determine the percentage of modifications per position. Specifically, the current mean and current s.d. were used as features for each base.
To determine the number of unmodified and modified reads needed for robust analysis, we subsampled reads to various depths (25–500, 100 iterations) and compared the reactivities of the subsampled strands to that of the full data set using Pearson correlation. Reactivity scores were determined per position by calculating the percentage of modified bases detected using one-class SVM. For the hESC H9 transcriptome, transcripts that were present in both replicates of the modified libraries with a minimum of 100× coverage and transcript length >200 nucleotides were retained for analysis. To compare the reactivity across isoforms, a five-nucleotide moving average was applied to the transcript reactivity profile followed by z-score normalization.
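The reactivity score and the smoothing applied before isoform comparison can be sketched as follows. This is an illustration under stated assumptions rather than the published implementation: the function names are hypothetical, the moving average is assumed to be centered with windows truncated at the transcript ends, and the z-score is assumed to use the population standard deviation.

```python
from statistics import mean, pstdev


def reactivity(modified_counts, coverage):
    """Percentage of reads called modified at each position."""
    return [100.0 * m / c for m, c in zip(modified_counts, coverage)]


def moving_average(values, window=5):
    """Centered moving average; windows are truncated at the ends (assumption)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def zscore(values):
    """Z-score normalization using the population s.d. (assumption)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("flat profile: z-score is undefined")
    return [(v - mu) / sigma for v in values]


# Made-up counts of modified reads at six positions, each with 200x coverage.
raw = reactivity([2, 10, 4, 0, 6, 8], [200] * 6)
print(zscore(moving_average(raw)))
```

Normalizing each profile in this way places isoforms with different overall modification rates on a common scale before correlations are calculated.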
Reactivity near RBP binding sites
HITS-CLIP libraries for the RBP Lin28 (SRR531463 and SRR531464) were downloaded from the Sequence Read Archive and mapped to the human genome (Homo_sapiens.GRCh38) using BWA41. Binding peaks were enriched using the CLIP Tool Kit package42, and binding motifs were detected using HOMER43. We analyzed the reactivities −50 bp to +50 bp around Lin28 motifs (G{GU}AG{C}A) that have CLIP binding peaks and randomly selected the same number of motifs outside of CLIP peaks as controls (compared using a t-test). In total, 552 HITS-CLIP binding sites on 316 transcripts were used for the analysis of Lin28 binding in our data. We randomly sampled 552 sites with the same motif sequence on 351 transcripts as control.
SNV structure analysis
Illumina RNA sequencing reads from the libraries RHN1291 and RHN1295 were mapped to the human genome (Homo_sapiens.GRCh38) using BWA. SNVs were called using bcftools with default parameters44. From the identified SNV positions, nanopore mapped reads were separated into three categories, corresponding to matching (reads that match the annotation), mutated (mutated reads based on the variant calling results) and unclassified (reads that are not found in the previous two groups) by Biostar214299 from Jvarkit. Matching and mutated reads were used for further SNV analysis.
For each of the SNV pairs, both transcripts were filtered for having >200 unmodified reads and >100 modified reads. SNV pairs with at least one changing region were considered as having a change in structure.
Determination of structure changing regions
To determine whether a base changes structure significantly between two reactivity profiles, we used Fisher’s exact test to compare the number of unmodified and modified reads at each position between two transcripts. A five-nucleotide sliding window was applied across the transcripts, and regions containing two or more positions with P < 0.05 between the transcript pair, but no significant difference between biological replicates, were identified as structure changing. Hommel’s method was used for false discovery rate (FDR) correction of the P values. Because a genuine structure-changing region should not differ across biological replicates, this filter removes regions whose coverage fluctuates between replicates, reducing the amount of noise that is called structurally significant.
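The per-position test and window scan can be sketched in Python as shown below. The sketch makes several assumptions that are not in the text: the function names are hypothetical, the replicate filter is applied per position, and multiple-testing correction (Hommel’s method in the text) is omitted for brevity.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Here a, b are modified and unmodified reads in one transcript and
    c, d are the same counts in the other transcript at one position.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(low, high + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


def changing_window_starts(p_pair, p_replicate, window=5, min_hits=2, alpha=0.05):
    """Start indices of windows with >= min_hits positions that differ
    between the transcript pair (P < alpha) but not between replicates."""
    hits = [pp < alpha and pr >= alpha for pp, pr in zip(p_pair, p_replicate)]
    return [
        start for start in range(len(hits) - window + 1)
        if sum(hits[start:start + window]) >= min_hits
    ]


print(fisher_exact_two_sided(8, 2, 1, 5))
print(changing_window_starts([0.5, 0.01, 0.2, 0.03, 0.6, 0.7, 0.8, 0.9], [0.5] * 8))
```

Overlapping windows returned by the scan would be merged into contiguous regions in a complete pipeline.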
Analysis of isoform pairs
Transcript coordinates were converted into genomic coordinates to allow ease of comparison across isoforms. Two transcripts were considered to be a gene-linked isoform pair if they had overlapping genomic positions and >100 bp of unique positions. Reactivity values from shared positions for each isoform pair were retained for comparisons. For global analysis, all shared positions were used to calculate the Pearson correlation. For local analysis, 100 nucleotides to the left and right of differential splice sites were used (sites with fewer adjacent bases were excluded).
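The pairing rule and the global comparison can be sketched in Python as follows. The data layout (a dictionary from genomic position to reactivity) and the function names are hypothetical, and counting unique positions across both isoforms together is an assumption of this sketch.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def is_isoform_pair(positions_a, positions_b, min_unique=100):
    """True if two transcripts overlap in genomic positions and have more
    than min_unique positions found in only one of them (assumption:
    unique positions are counted across both isoforms together)."""
    a, b = set(positions_a), set(positions_b)
    return bool(a & b) and len(a ^ b) > min_unique


def shared_reactivity_correlation(profile_a, profile_b):
    """Pearson correlation of reactivity over shared genomic positions.

    profile_a, profile_b: dicts mapping genomic position -> reactivity.
    """
    shared = sorted(profile_a.keys() & profile_b.keys())
    return pearson([profile_a[p] for p in shared], [profile_b[p] for p in shared])


# Toy profiles that share three genomic positions.
iso_a = {100: 1.0, 101: 2.0, 102: 3.0, 500: 9.0}
iso_b = {100: 2.0, 101: 4.0, 102: 6.0, 900: 1.0}
print(shared_reactivity_correlation(iso_a, iso_b))
```

Working in genomic coordinates means that the same shared exon is compared position by position even when it sits at different offsets in the two transcripts.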
Translation efficiency (TE) for each transcript was determined by TrIP-seq. To calculate the significance in TE differences between two isoform pairs, the raw counts for alleles across both low polysome fractions (sum of fractions 6 and 7) and high polysome fractions (sum of fractions 9 and 10) were compared using Fisher’s exact test to assess if the reads derived from the reference allele are significantly enriched or depleted in high polysome fractions (P < 0.05). Hommel’s method was used for FDR correction of P values.
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
Raw sequencing data and reactivity profiles can be downloaded from the Gene Expression Omnibus under accession number GSE133361. Source data are provided with this paper.
Code availability
Source code for all scripts (R version 3.4.1) and commands used for analysis can be found at http://github.com/awjga/PORE-cupine.
Change history
12 November 2020
A Correction to this paper has been published: https://doi.org/10.1038/s41587-020-00755-w
23 March 2021
A Correction to this paper has been published: https://doi.org/10.1038/s41587-021-00889-5
References
Wan, Y., Kertesz, M., Spitale, R. C., Segal, E. & Chang, H. Y. Understanding the transcriptome through RNA structure. Nat. Rev. Genet. 12, 641–655 (2011).
Kertesz, M. et al. Genome-wide measurement of RNA secondary structure in yeast. Nature 467, 103–107 (2010).
Wan, Y. et al. Landscape and variation of RNA secondary structure across the human transcriptome. Nature 505, 706–709 (2014).
Wan, Y. et al. Genome-wide measurement of RNA folding energies. Mol. Cell 48, 169–181 (2012).
Siegfried, N. A., Busan, S., Rice, G. M., Nelson, J. A. & Weeks, K. M. RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat. Methods 11, 959–965 (2014).
Spitale, R. C. et al. Structural imprints in vivo decode RNA regulatory mechanisms. Nature 519, 486–490 (2015).
Lucks, J. B. et al. Multiplexed RNA structure characterization with selective 2′-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq). Proc. Natl Acad. Sci. USA 108, 11063–11068 (2011).
Rouskin, S., Zubradt, M., Washietl, S., Kellis, M. & Weissman, J. S. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505, 701–705 (2014).
Zubradt, M. et al. DMS-MaPseq for genome-wide or targeted RNA structure probing in vivo. Nat. Methods 14, 75–82 (2017).
Ding, Y. et al. In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature 505, 696–700 (2013).
Strobel, E. J., Yu, A. M. & Lucks, J. B. High-throughput determination of RNA structures. Nat. Rev. Genet. 19, 615–634 (2018).
Sharon, D., Tilgner, H., Grubert, F. & Snyder, M. A single-molecule long-read survey of the human transcriptome. Nat. Biotechnol. 31, 1009–1014 (2013).
Au, K. F. et al. Characterization of the human ESC transcriptome by hybrid sequencing. Proc. Natl Acad. Sci. USA 110, E4821–E4830 (2013).
Koren, S. et al. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat. Biotechnol. 30, 693–700 (2012).
Tilgner, H. et al. Comprehensive transcriptome analysis using synthetic long-read sequencing reveals molecular co-association of distant splicing events. Nat. Biotechnol. 33, 736–742 (2015).
Garalde, D. R. et al. Highly parallel direct RNA sequencing on an array of nanopores. Nat. Methods 15, 201–206 (2018).
Oikonomopoulos, S., Wang, Y. C., Djambazian, H., Badescu, D. & Ragoussis, J. Benchmarking of the Oxford Nanopore MinION sequencing for quantitative and qualitative assessment of cDNA populations. Sci. Rep. 6, 31602 (2016).
Simpson, J. T. et al. Detecting DNA cytosine methylation using nanopore sequencing. Nat. Methods 14, 407–410 (2017).
Parker, M. T. et al. Nanopore direct RNA sequencing maps the complexity of Arabidopsis mRNA processing and m6A modification. eLife 9, e49658 (2020).
Liu, H. et al. Accurate detection of m6A RNA modifications in native RNA sequences. Nat. Commun. 10, 4079 (2019).
Weeks, K. M. Advances in RNA structure analysis by chemical probing. Curr. Opin. Struct. Biol. 20, 295–304 (2010).
Spitale, R. C. et al. RNA SHAPE analysis in living cells. Nat. Chem. Biol. 9, 18–20 (2013).
Sachsenmaier, N., Handl, S., Debeljak, F. & Waldsich, C. Mapping RNA structure in vitro using nucleobase-specific probes. Methods Mol. Biol. 1086, 79–94 (2014).
Guo, F., Gooding, A. R. & Cech, T. R. Structure of the Tetrahymena ribozyme: base triple sandwich and metal ion at the active site. Mol. Cell 16, 351–362 (2004).
Winkler, W., Nahvi, A. & Breaker, R. R. Thiamine derivatives bind messenger RNAs directly to regulate bacterial gene expression. Nature 419, 952–956 (2002).
Jambhekar, A. et al. Unbiased selection of localization elements reveals cis-acting determinants of mRNA bud localization in Saccharomyces cerevisiae. Proc. Natl Acad. Sci. USA 102, 18005–18010 (2005).
Sexton, A. N., Wang, P. Y., Rutenberg-Schoenberg, M. & Simon, M. D. Interpreting reverse transcriptase termination and mutation events for greater insight into the chemical probing of RNA. Biochemistry 56, 4713–4721 (2017).
Li, F. et al. Global analysis of RNA secondary structure in two metazoans. Cell Rep. 1, 69–82 (2012).
Sun, L. et al. RNA structure maps across mammalian cellular compartments. Nat. Struct. Mol. Biol. 26, 322–330 (2019).
Wilbert, M. L. et al. LIN28 binds messenger RNAs at GGAGA motifs and regulates splicing factor abundance. Mol. Cell 48, 195–206 (2012).
Wang, E. T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).
Pan, Q., Shai, O., Lee, L. J., Frey, B. J. & Blencowe, B. J. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415 (2008).
Moqtaderi, Z., Geisberg, J. V. & Struhl, K. Extensive structural differences of closely related 3′ mRNA isoforms: links to Pab1 binding and mRNA stability. Mol. Cell 72, 849–861 (2018).
Floor, S. N. & Doudna, J. A. Tunable protein synthesis by transcript isoforms in human cells. eLife 5, e10921 (2016).
Aw, J. G. et al. In vivo mapping of eukaryotic RNA interactomes reveals principles of higher-order organization and regulation. Mol. Cell 62, 603–617 (2016).
Mustoe, A. M. et al. Pervasive regulatory functions of mRNA structure revealed by high-resolution SHAPE probing. Cell 173, 181–195 (2018).
Das, R., Laederach, A., Pearlman, S. M., Herschlag, D. & Altman, R. B. SAFA: semi-automated footprinting analysis software for high-throughput quantification of nucleic acid footprinting experiments. RNA 11, 344–354 (2005).
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
Sovic, I. et al. Fast and sensitive mapping of nanopore sequencing reads with GraphMap. Nat. Commun. 7, 11307 (2016).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595 (2010).
Shah, A., Qian, Y., Weyn-Vanhentenryck, S. M. & Zhang, C. CLIP Tool Kit (CTK): a flexible and robust pipeline to analyze CLIP sequencing data. Bioinformatics 33, 566–567 (2017).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
Acknowledgements
We thank members of the Wan and Tan labs and F. Yao, H. M. Loh, C. C. Khor and M. Sikic for helpful discussions. Y.W. is supported by funding from A*STAR (A*STAR investigatorship 1630700155), the National Research Foundation Singapore (NRF2019-NRF-ISF003-2970 and CRP21-2018-0101), the EMBO Young Investigatorship and the CIFAR global scholarship. F.R.P.L. is supported by a doctoral scholarship from the Warwick-A*STAR research attachment programme.
Author information
Contributions
Y.W. conceived the project. Y.W., N.N., M.H.T., B.S.N. and L.V. designed the experiments and analysis. S.W.L. and J.X.W. performed the experiments with help from P.K. J.G.A.A. and Y.S. performed the computational analysis with help from C.L. and E.P.K. Y.W. organized and wrote the paper with J.G.A.A., S.W.L. and all other authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Chemical structures of RNA structure probing compounds, associated reaction products, mapped length and statistics of error rates.
a, Chemical structures of RNA structure probing compounds. Side chains for the carbodiimide of CMCT are highlighted and abbreviated as R’ and R’ for part (b). b, RNA nucleotide triphosphates with chemical adducts formed from reaction with structure probing compounds. Adducts are highlighted in green. c, Median lengths of mapped nanopore reads for unmodified and modified Tetrahymena RNA with different structure probing compounds. d–f, Boxplots showing the frequency of mismatch (d), deletion (e) and insertion (f) rates for different structure probing chemicals on Tetrahymena RNA, as compared to unmodified RNA. P-values were calculated using a two-sided Wilcoxon rank-sum test. h–j, Boxplots showing the AUC-ROC performance of mismatch (h), deletion (i) and insertion (j) rates for the different compounds on the Tetrahymena RNA secondary structure. P-values were calculated using a two-sided Wilcoxon rank-sum test. c–j, 6,962–42,107 reads from different libraries were used for comparisons (Supplementary Table 1). The middle, lower and upper boundary lines in the boxplot correspond to the median, first and third quartiles. The upper whisker extends to the largest value no further than 1.5 × IQR from the hinge (where IQR is the inter-quartile range) and the lower whisker extends to the smallest value at most 1.5 × IQR from the hinge.
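As a rough illustration of the AUC-ROC comparison described in panels h–j, the sketch below scores how well a per-base error rate separates single-stranded from paired bases. It assumes that single-stranded bases are the positive class and that one error rate per base is available; the function name `auc_roc` and the input layout are hypothetical and are not taken from the PORE-cupine code.

```python
def auc_roc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: per-base error rates (e.g. mismatch rate at each position).
    labels: truthy for single-stranded bases, falsy for paired bases
            (assumed labelling, taken from a known secondary structure).
    """
    positives = [s for s, l in zip(scores, labels) if l]
    negatives = [s for s, l in zip(scores, labels) if not l]
    if not positives or not negatives:
        raise ValueError("need at least one base in each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to no separation between the two classes, and 1.0 to perfect separation. The quadratic loop is adequate for a single transcript; a rank-based implementation would be preferable for transcriptome-scale inputs.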
Extended Data Fig. 2 Distribution of mismatches, insertions, deletions along Tetrahymena RNA sequence.
Line plots of normalized number of mismatches (a), deletions (b) and insertions (c) caused by the different compounds and unmodified, along the length of the Tetrahymena RNA sequence. The red bars on top of the plots indicate the location of single-stranded bases in the secondary structure.
Extended Data Fig. 3 Error characterization of the modifications along the secondary structure of the Tetrahymena RNA.
a, Positions and intensity of mismatches (red), deletions (green) and insertions (purple) caused by the different chemical compounds are mapped along the secondary structure of the Tetrahymena RNA. b, Percentage of observed bases (upper number) and corresponding P-values (lower number) are shown for each observation. P-values were calculated using a two-sided chi-square test for all modified versus unmodified comparisons.
Extended Data Fig. 4 Schematic of the bioinformatic workflow of PORE-cupine and characteristics of direct RNA sequencing signal.
a, Sequenced reads were basecalled using Albacore or Guppy, and mapped to the reference sequences using GraphMap. We used Nanopolish to align the raw signals and to extract current features, which were used to train an SVM on the unmodified data. We then filtered for reads that are longer than 50% of annotated lengths and for transcripts that have at least 100 reads in the modified library and 200 reads in the unmodified library for downstream analysis. b, Normalized current dwell time for single-stranded regions on Tetrahymena RNA modified with NAI-N3. With footprinting gels as a guide, the top 10% of the single-stranded regions on Tetrahymena RNA were selected for these plots. c–e, Normalized current mean (c), standard deviation (d) and dwell time (e) distributions for all positions on unmodified Tetrahymena RNA and RNA modified with NAI-N3. f, Bioanalyzer traces of in vitro transcribed, full-length, unmodified, and NAI-N3 (100 mM for 5 min or 25 min) modified Tetrahymena RNA. g, Mapping rates for modified versus unmodified Tetrahymena RNA. The total numbers of sequenced reads for unmodified, NAI-N3 (5 min) and NAI-N3 (25 min) are 20,149, 51,760 and 22,155 reads, respectively. The percentages of mapped reads for unmodified, NAI-N3 (5 min) and NAI-N3 (25 min) are 75%, 81% and 17%, respectively. h, Density plots showing the distribution of lengths of sequenced unmodified and modified Tetrahymena RNA. Top: unmodified and NAI-N3 modified (100 mM, 5 min) RNA. Bottom: unmodified and NAI-N3 modified (100 mM, 25 min) RNA. i, Coverage of reads mapping to Tetrahymena RNA along its length, for unmodified (top), NAI-N3 modified (100 mM, 5 min) (middle) and extended NAI-N3 modified (100 mM, 25 min) RNA (bottom).
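The read and transcript filter in panel a can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that reads are available as (transcript ID, mapped length) pairs and that read counts are taken after the length filter, which the caption does not state. All function and parameter names are hypothetical.

```python
def count_long_reads(reads, annotated_lengths, min_fraction=0.5):
    """Count reads per transcript that are longer than min_fraction of the
    annotated transcript length.

    reads: iterable of (transcript_id, mapped_length) pairs (assumed layout).
    annotated_lengths: dict mapping transcript_id -> annotated length.
    """
    counts = {}
    for transcript_id, mapped_length in reads:
        annotated = annotated_lengths.get(transcript_id)
        if annotated is None:
            continue  # skip reads mapping to unannotated transcripts
        if mapped_length > min_fraction * annotated:
            counts[transcript_id] = counts.get(transcript_id, 0) + 1
    return counts


def transcripts_for_analysis(modified_reads, unmodified_reads,
                             annotated_lengths,
                             min_modified=100, min_unmodified=200):
    """Return transcripts passing both coverage thresholds from the caption:
    at least 100 reads in the modified library and 200 in the unmodified one.
    Counting reads after the length filter is an assumption."""
    modified = count_long_reads(modified_reads, annotated_lengths)
    unmodified = count_long_reads(unmodified_reads, annotated_lengths)
    return {
        t for t in modified
        if modified[t] >= min_modified
        and unmodified.get(t, 0) >= min_unmodified
    }
```

Keeping the thresholds as keyword arguments makes it easy to check how sensitive the retained transcript set is to the 50%, 100-read and 200-read cut-offs.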
Extended Data Fig. 5 Optimization of PORE-cupine using 11 RNAs as training set.
a, Scatterplot showing the distribution of normalized base reactivity between N = 2 biological replicates of modified Tetrahymena RNA. R = 0.97, CI95% = [0.97,0.98] (Pearson correlation). P-value = 2.5 × 10⁻²⁶², two-tailed Student’s t-test. b, Distribution of current mean and standard deviation for a unimodal (left) and bimodal (right) position in two biological replicates. c, AUC-ROC performance of the correlation of NAI-N3 reactivities of the training set based on PORE-cupine versus footprinting from 11 transcripts. d,e, Comparison of PORE-cupine reactivity and traditional footprinting. Two replicates of gels are shown for Tetrahymena RNA (d, R = 0.80) and lysine riboswitch (e, R = 0.74). Lane 1 of the footprinting gels shows the A (left, Tetrahymena) or G (right, Tetrahymena and lysine) ladder. Lane 2 shows unmodified RNA, and lane 3 shows NAI-N3 modified RNA. Quantification of the bands on the gels was done using SAFA. Pearson correlation was used to compare SAFA and PORE-cupine signals. f, List of RNAs used for training and testing. g, Scatter plot of per-base reactivity in two biological replicates of the three test RNAs. P-value = 0 using two-tailed Student’s t-test. R = 0.877, CI95% = [0.87, 0.89], by Pearson correlation. h–j, Line plots showing the per-base reactivity along the length of three test RNAs, for two biological replicates. R ≥ 0.89, using Pearson correlation. k, Boxplot showing the performance of the SVM parameters on the 3 test RNAs, based on training on the Tetrahymena RNA (left) or on 11 RNAs (right, footprinting gels). l, AUC-ROC performance of SVM parameters on 3 test RNAs (red, based on our current 11 training RNAs) versus test RNAs after random selection of 11/14 RNAs as training, repeated 20 times. m, Boxplot showing the performance of all, unimodal and bimodal positions on test RNAs using AUC-ROC based on footprinting gels from 3 transcripts.
In c, k-m, the middle, lower and upper boundary lines in the boxplot correspond to median, first and third quartiles. The upper whisker extends to the largest value no further than 1.5 × IQR from the hinge (where IQR is the inter-quartile range) and the lower whisker extends to the smallest value at most 1.5 × IQR of the hinge. Outliers are shown as dots.
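The boxplot convention described above (hinges at the first and third quartiles, whiskers reaching the most extreme values within 1.5 × IQR of the hinges, remaining points drawn as outliers) can be written out as a short sketch. The quartile interpolation method is an assumption, since the captions do not specify it; the function name `box_stats` is hypothetical.

```python
from statistics import quantiles


def box_stats(values, whisker=1.5):
    """Summarize values using the boxplot convention in the caption.

    Quartiles use the 'inclusive' interpolation of statistics.quantiles
    (an assumption; plotting libraries differ slightly in this choice).
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    upper_fence = q3 + whisker * iqr
    lower_fence = q1 - whisker * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "lower_whisker": min(inside),  # smallest value within the fence
        "upper_whisker": max(inside),  # largest value within the fence
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }
```

For the values 1 to 9 plus a single value of 100, the upper whisker stops at 9 and 100 is reported as an outlier, which matches the dots drawn beyond the whiskers in the figures.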
Extended Data Fig. 6 Comparison between PORE-cupine and footprinting signals.
a, Bioanalyzer traces of unmodified and in vivo NAI-N3 modified (100 mM, 5 min) total B. subtilis RNA. b, Secondary structure model of B. subtilis 16S rRNA. The structure probed regions are boxed in pink, green and blue. c–e, Comparisons between PORE-cupine and footprinting. Two replicates of gels are shown for each of the three regions along B. subtilis RNA. The gels show G ladder (lane 1), unmodified RNA (lane 2) and NAI-N3 modified RNA (lane 3) and a correlation of R (Pearson) = 0.91 (c), 0.74 (d) and 0.24 (e) between the gels. Quantification of the bands on the gels was done using SAFA. Comparison between SAFA quantification and PORE-cupine for each of the regions is shown as a line plot to the right of the gels. R = 0.52 (c), 0.76 (d), 0.62 (e) by Pearson correlation. f,g, Bioanalyzer traces of unmodified and in vitro NAI-N3 modified RPS29 (100 mM, 5 min) (f) and Adocbl riboswitch (g). h,i, Comparisons between PORE-cupine and footprinting. Two replicates of gels are shown for RPS29 and Adocbl riboswitch RNA, R (Pearson) = 0.93 (h) and 0.73 (i). The gels show G ladder (lane 1), unmodified RNA (lane 2) and NAI-N3 modified RNA (lane 3). Quantification of the bands on the gels was done using SAFA. Comparison between SAFA quantification and PORE-cupine for each of the regions is shown to the right of the gels. R (Pearson) = 0.68 (h) and 0.69 (i).
Extended Data Fig. 7 PORE-cupine reactivity signals on TPP.
Line plots showing 2 replicates of PORE-cupine reactivities along TPP riboswitch in the presence of water (R = 0.86), 250 nM TPP (R = 0.87), 750 nM TPP (R = 0.83), and 10 μM TPP (R = 0.94). Pearson correlation is used to calculate the similarities between the reactivities of the replicates.
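Replicate agreement of the kind reported here can be computed as a Pearson correlation between two per-base reactivity profiles. The sketch below also returns an approximate 95% confidence interval using the Fisher z-transformation; that interval method is an assumption, since the captions report CI95% values without naming how they were derived. The function name `pearson_with_ci` is hypothetical.

```python
import math


def pearson_with_ci(x, y, z_crit=1.96):
    """Pearson correlation of two equal-length profiles, with an approximate
    95% confidence interval from the Fisher z-transformation (assumed method).
    """
    n = len(x)
    if n != len(y):
        raise ValueError("profiles must have the same length")
    if n < 4:
        raise ValueError("need at least 4 positions for the interval")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    r = sxy / math.sqrt(sxx * syy)

    if abs(r) >= 1.0:
        return r, (r, r)  # atanh is undefined at exactly +/-1

    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return r, (math.tanh(z - z_crit * se), math.tanh(z + z_crit * se))
```

With hundreds of positions per transcript the interval becomes narrow, which is consistent with the tight CI95% ranges quoted for long RNAs in the other Extended Data captions.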
Extended Data Fig. 8 PORE-cupine results on the hESC transcriptome.
a, b, Bioanalyzer traces of unmodified and modified total (a) and polyA(+) selected (b) hESC RNA. c, Barplots showing the number of reads after basecalling (12,007,032 and 10,118,432) and mapping (86% and 60%) in unmodified and modified hESC samples, respectively. d, Histogram showing the distribution of reads with different amounts of modification in hESC. e, Boxplots showing the performance of reads with different amounts of modification, calculated using AUC-ROC on the test set of 3 RNAs, based on 10 footprinting regions. Reads were grouped into different classes: all reads (current, 147,670 reads), with only 1 modification for the strand (only 1, 15,461 reads), with 0–1% modification (68,025 reads), with 1–2% modification (9,771 reads), with 2–3% modification (803 reads), and with 3–4% modification (76 reads). P-values were calculated using a two-sided Wilcoxon rank-sum test. The middle, lower and upper boundaries of the boxplot correspond to the median, first and third quartiles, while the upper and lower whiskers extend from the hinge to the largest and smallest value at most 1.5 × IQR from the hinge, respectively. f, Top: Fraction of modified reads mapped across exon-exon junctions (position 0) in hESC (black) and across artificial junctions with 50 base insertions (red). Bottom: Difference between mapping rates across normal versus artificial exon-exon junctions at each base; P-value was calculated using a two-tailed Wilcoxon rank-sum test. g, Line graph showing the percentage of bimodal positions observed (for both current standard deviation and mean) for all 1024 kmers. The orange line indicates the top 1% of bimodal signals across all kmers and the identities of the corresponding kmers are labelled above. h–j, Base composition along positions 1–5 of unimodal (left) and bimodal kmers (right). These include kmers that show bimodal current mean (h), bimodal current standard deviation (i), and both bimodal current mean and current standard deviation (j).
k, Coverage of unmodified (left) and modified (right) reads along the hESC transcriptome using direct RNA sequencing.
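The figure of 1024 kmers in panel g corresponds to all 5-mers over a four-letter alphabet (4⁵ = 1024), and panels h–j summarize base composition at each of the five positions. A minimal sketch of both calculations follows. The use of the RNA alphabet ACGU is an assumption (signal-level tools often report T instead of U), and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGU"  # assumed alphabet; swap U for T if the k-mer table uses DNA letters


def all_kmers(k=5):
    """Enumerate every k-mer over BASES; k=5 gives 4**5 = 1024 k-mers."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def base_composition(kmers):
    """Fraction of each base at each position across a set of k-mers.

    Returns a list with one dict per position, e.g. [{'A': 0.5, ...}, ...].
    """
    if not kmers:
        raise ValueError("need at least one k-mer")
    k = len(kmers[0])
    composition = []
    for position in range(k):
        counts = Counter(kmer[position] for kmer in kmers)
        composition.append({b: counts[b] / len(kmers) for b in BASES})
    return composition
```

Applying `base_composition` separately to the k-mers called unimodal and bimodal would give the two columns compared in panels h–j; over the full set of 1024 k-mers every base has a fraction of 0.25 at every position, which serves as the background expectation.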
Extended Data Fig. 9 Structural properties of the hESC transcriptome.
a, Scatterplot showing the distribution of total reads per position for each hESC transcript, for N = 2 biologically independent replicates, of unmodified (left, P-value = 0 using two-tailed Student’s t-test, CI95% = [0.98,0.98], 1613 transcripts) and modified transcripts (right, 1751 transcripts, P-value = 0 using two-tailed Student’s t-test, CI95% = [0.97,0.97]). R (Pearson) = 0.98 (left) and 0.97 (right). b, Barplot showing the number of transcripts left after abundance and length filters. The number of transcripts in each group is shown above the plot. c, Boxplots showing the distribution of median mapped lengths of unmodified (left) and NAI-N3 modified (middle) hESC mRNAs (1751 transcripts). Annotated refers to the distribution of expected lengths for each transcript based on ENSEMBL GRCh38 annotation (right). d, Histogram showing the distribution of transcripts having different fractions of annotated length in unmodified and modified samples. e, Distribution of Pearson correlations between full-length (>99% of known length) and partial transcripts in hESC (83 transcripts from N = 2 biological replicates were used). The Y-axis shows the fraction of transcripts with a particular correlation. The X-axis depicts Pearson correlation coefficients. f, Boxplot showing PORE-cupine reactivity of different classes of transcripts. P-values were calculated using a two-sided Wilcoxon rank-sum test. 1584 coding genes, 67 pseudogenes, 81 non-coding genes and 4 rRNAs were used. g, Top, Metagene analysis of PORE-cupine-derived mean reactivities aligned according to start (Upper) and stop (Lower) codons for all 559 transcripts. Bottom, Metagene autocorrelation function (ACF) plot for the 5’ UTR, CDS and 3’ UTR. In c and f, the middle, lower and upper boundaries of the boxplot correspond to the median, first and third quartiles. The upper and lower whiskers extend from the hinge to the largest and smallest value at most 1.5 × IQR from the hinge. Outliers are shown as dots.
Extended Data Fig. 10 RNA structures in gene-linked isoforms.
a, Upper: Transcript organization of different RPLP0 isoforms. Alternative exons seen in our structural data are highlighted in red. Lower, normalized reactivity profiles for the different isoforms and their aggregate signal. b, Upper, Transcript organization of different RACK1 isoforms. Alternative exon is shown in red (also in inset). Lower, Line plots for the aggregate reactivity signal between the two isoforms are shown (Top). Middle, Line plots showing the expanded view of the reactivity difference between the isoforms. Bottom, Line plots showing the individual reactivity information for each isoform along its length. c, No. of transcripts with two structure changing regions that are more than 100, 200, 300, 400 or 500 bases apart. The value for each group is shown above the bar. d, Schematic of the TrIP-seq workflow. e, Line plot showing the absorbance A260 of each fraction (2-12) after polysome fractionation. f, Pair-wise correlations (Spearman correlation) of the read-counts/transcript for each fraction between two biological replicates. Fractions and batches are denoted as F2-12 and B1-2 respectively. g, Distribution of read-counts across different polysome fractions for two biological replicates of Actin B (left) and Activating transcription factor 4 (right).
Supplementary information
Supplementary Data 1
Less edited version of the gel images from Fig. 1 and Extended Data Figs. 5 and 6. The gel images are cropped to show the full lanes of the samples, and rotated to straighten the lanes.
Supplementary Table 1
Direct RNA sequencing statistics of individual and pooled RNAs
Supplementary Table 2
Numbers of reads after mapping
Supplementary Table 3
Primers for direct RNA sequencing and structure probing
Supplementary Table 4
Sequencing statistics for hESC H9 cells
Cite this article
Aw, J.G.A., Lim, S.W., Wang, J.X. et al. Determination of isoform-specific RNA structure with nanopore long reads. Nat Biotechnol 39, 336–346 (2021). https://doi.org/10.1038/s41587-020-0712-z
DOI: https://doi.org/10.1038/s41587-020-0712-z