Introduction

Rice (Oryza sativa L. subsp. Geng (Japonica) cv. Nipponbare) is a staple crop feeding more than half the world’s population1 as well as a model plant for genetic studies and molecular breeding2,3. The development of rice endosperm, rich in starch granules and protein bodies4, determines grain production and quality5. Rice endosperm development is roughly divided into four stages: (i) coenocyte (1–2 days after flowering (DAF) or pollination; the pollination occurs at or slightly before flowering6,7), (ii) cellularization (3–5 DAF), (iii) storage product accumulation (6–20/21 DAF), and (iv) maturation (21/22–30 DAF)4,8. Among these stages, it is at stage III that accumulation of storage products and programmed cell death (PCD) simultaneously occur and play crucial roles in endosperm development8,9.

Plant mitochondria provide energy for endosperm development and are related to PCD10,11. The rice mitochondrial genome contains 53 protein-coding genes, which fall into four categories according to their products: (i) complexes composing the mitochondrial electron transport chain (mETC), including complex I (NADH dehydrogenase), complex III (cytochrome c reductase), complex IV (cytochrome c oxidase) and complex V (adenosine triphosphate (ATP) synthase), (ii) cytochrome (Cyt) c biogenesis proteins, (iii) ribosomal proteins, and (iv) other proteins. The mETC is tightly coupled with ATP synthesis; in addition, complexes I and III in the mETC produce reactive oxygen species (ROS) related to PCD12,13. In mitochondria, two c-type cytochromes, namely, Cyt c and c1, participate in multiple biological processes; Cyt c transfers electrons from complex III to IV14 and likely regulates PCD15,16, and Cyt c1 is integrated into complex III14. In addition, MatR, a maturase-related protein, is assigned to the “other proteins” group and is required for the splicing of Group II introns in mitochondria17,18. Ribosomal proteins are responsible for translating mRNAs encoded by mitochondrial genomes and for regulating the synthesis of proteins such as the complexes in the mETC19,20.

RNA editing, one of the most important post-transcriptional modifications, creates RNA products that differ from their DNA templates. In land plants, RNA editing presents a unique pattern of editing types, consisting mainly of cytidine-to-uridine (C-to-U) alterations in RNA transcribed from mitochondrial and plastid DNA, termed mitochondrial RNA editing21,22,23,24 and plastid RNA editing25,26, respectively. Importantly, pentatricopeptide repeat (PPR) proteins, including P-type and PLS-type (the latter comprising four classes, viz., PLS, E (E1 and E2), E+, and DYW), specifically bind RNA depending on PPR codes and mediate C-to-U RNA editing27, playing a key role as RNA editing factors28. Specifically, P-type PPR proteins can mediate RNA editing by participating in the formation of the E/E+PPR editosome29,30,31,32,33,34, or by directly binding RNA35,36,37,38. Among PLS-type PPR proteins, the PLS class (which does not contain E, E+ and DYW domains) mediates RNA editing39,40 through regulatory mechanisms that remain unclear, the E and E+ classes (which do not contain the DYW domain) mediate RNA editing by recruiting DYW domain-containing proteins41, and the DYW class mediates RNA editing by using the DYW domain to deaminate cytidine to uridine42. Over the past decades, valuable efforts have been devoted to investigating RNA editing via PPR proteins during plant endosperm development. Evidence has accumulated that RNA editing in mitochondrial transcripts is important for endosperm development43,44,45,46,47,48,49,50,51 by affecting the functions of complexes composing the mETC, Cyt c biogenesis proteins, ribosomal proteins and MatR. Specifically, RNA editing in mitochondrial genes regulated by PPR proteins is related to rice endosperm development, such as editing in nad7 mediated by SMK152; in atp6, cob, cox3, nad1, nad9, rpl16 and rps12 by EMP553; in cox2, cox3, ccmC, nad2 and nad4 by OGR154; and in multiple mitochondrial genes (such as nad9) by cooperation of P-type FLO22 and DYW-class DYW335. As a result, RNA editing in mitochondrial genes as well as PPR editing factors has been identified in wild-type and mutant rice endosperm at specific developmental stages (for details see the Plant Editosome Database55), yet a comprehensive analysis of the RNA editome during rice endosperm development is still lacking.

Toward this end, we collect rice endosperm across five successive developmental stages, mainly covering the process of grain filling (stage III of rice endosperm development), and perform whole-genome re-sequencing and strand-specific RNA sequencing (RNA-seq). Based on these sequencing data, we systematically profile the RNA editome, analyze RNA editing features in terms of location, editing frequency and variability, evolutionary conservation and structure, identify PPR genes in the rice genome and construct PPR-RNA binding profiles, which collectively provide clues for better understanding rice endosperm development from the perspective of the RNA editing machinery.

Results

Profiling RNA editome during rice endosperm development

To systematically investigate RNA editing events during rice endosperm development, we obtain both genome re-sequencing and strand-specific RNA-seq data from rice endosperm at five successive developmental stages (3, 6, 9, 12, and 15 DAF). After strict quality control, sequence alignment, and filtration (for details see Methods), we identify a total of 369 RNA editing sites across the five developmental stages (Supplementary Data 1). Specifically, the primary RNA editing type in rice mitochondria is C-to-U (298), and the remaining types are G-to-A (65) and U-to-C (1), whereas in plastids the only type is C-to-U (5), revealing the dominance of C-to-U editing during rice endosperm development (Fig. 1A and Supplementary Data 2). Furthermore, we find that the C-to-U editing sites identified in endosperm differ from those in seedling, suggesting tissue specificity of RNA editing (for details see Supplementary Fig. 1A, B). In addition, compared with a previous study56, our procedure for identifying RNA editing sites is more effective at obtaining high-confidence editing sites (for details see Supplementary Fig. 1C, D).

Fig. 1: Characteristics of RNA editome during rice endosperm development.

A RNA editing sites in rice mitochondria and plastids. 369 RNA editing sites in mitochondria (364) and plastids (5) are categorized by editing types, which are color-coded, with pink for C-to-U (mitochondria), blue for C-to-U (plastids), green for G-to-A (mitochondria) and purple for U-to-C (mitochondria), respectively. B Distribution of 369 RNA editing sites at 3, 6, 9, 12, and 15 DAF (n = 5). Bars with pink, green, blue, and purple denote C-to-U (mitochondria), G-to-A (mitochondria), C-to-U (plastids), and U-to-C (mitochondria) RNA editing types, respectively. C Distribution of RNA editing events and their corresponding RNA-seq coverage in rice mitochondria. The outer circle denotes rice mitochondrial genes; genes outside the circle are on the positive strand and oriented clockwise, while genes inside the circle are on the negative strand and oriented anticlockwise. The gray circles, from outer to inner denote RNA-seq coverage (Bar plot, C: Coverage) and frequency (Heatmap, E: Editing) of RNA editing sites at 3, 6, 9, 12, and 15 DAF. D Distribution of editing frequency of 298 C-to-U editing sites. Components of the box plot are: center line, median value; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; error bars, the highest and lowest values excluding outliers. E Distribution of 298 C-to-U editing sites in different sequence regions. Mit mitochondria, Plt plastids.

For each rice endosperm developmental stage, the number of C-to-U editing sites in mitochondria is consistently larger than that of the other editing types (Fig. 1B). Furthermore, we capture a global view of RNA editing events in the rice mitochondrial genome across the five developmental stages (Fig. 1C). We find that the average editing frequency of the 298 C-to-U sites in mitochondria varies from 2% to 99.8% across the five developmental stages, with a median of 69.8% (Fig. 1D). Besides, the distribution of the distance between two neighboring editing sites shows that C-to-U editing sites tend to be located within 50 bp of each other (Supplementary Fig. 2). Among these 298 C-to-U editing sites in rice mitochondria, 253 are located in coding sequence (CDS) regions (214 CDS-recoding and 39 CDS-synonymous), which are more prevalent than those in intronic (73), intergenic (12) and pseudogenic (8) regions (Fig. 1E). Notably, because genes may overlap in the plant mitochondrial genome, one site can belong to more than one category: among the 73 intronic editing sites, 43 are also located in CDS regions (35 CDS-recoding and 8 CDS-synonymous) and 5 in pseudogenic regions (Fig. 1E and Supplementary Data 1). Unlike the diverse distribution of C-to-U editing sites, the 65 G-to-A editing sites in mitochondria are located only in intronic (23, 35.38%) and intergenic (42, 64.62%) regions, and the 5 C-to-U editing sites in plastids are located only in CDS regions (2 CDS-recoding and 3 CDS-synonymous) (Supplementary Data 1). Taken together, CDS-recoding sites are the most prevalent class of editing sites in mitochondria during rice endosperm development.
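The neighboring-site distance summarized above can be illustrated with a minimal Python sketch. It assumes that editing sites are given as integer coordinates on a single genome and ignores strand; the function names and the example coordinates are hypothetical and are not taken from the study.

```python
def neighbor_distances(positions):
    """Return distances (bp) between consecutive editing sites on one genome."""
    ordered = sorted(set(positions))
    return [b - a for a, b in zip(ordered, ordered[1:])]


def fraction_within(distances, max_bp=50):
    """Fraction of neighbor distances that are at most max_bp."""
    if not distances:
        return 0.0
    return sum(d <= max_bp for d in distances) / len(distances)


# Hypothetical genomic coordinates, for illustration only.
sites = [1000, 1012, 1040, 1500, 1530, 9000]
gaps = neighbor_distances(sites)
print(gaps, fraction_within(gaps))
```

A histogram of such gaps would correspond to the kind of distribution reported in Supplementary Fig. 2.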

Amino acid biochemical properties changed by RNA editing

Studies have shown that RNA editing is functionally important because it affects amino acid biochemical properties57,58. In our study, among the 214 CDS-recoding sites, 88 (41.12%) and 126 (58.88%) are located at the first and second codon positions, respectively. In contrast, among the 39 CDS-synonymous sites, most (36, 92.31%) occur at the third codon position, producing many U-ending codons; the exceptions are 3 sites (7.69%; Leu-to-Leu, 2 for CUG-to-UUG and 1 for CUA-to-UUA) at the first codon position (Supplementary Data 3 and 4). Accordingly, we identify 8 codon alteration types with more than 10 editing events: UCA-to-UUA (Ser-to-Leu), CGG-to-UGG (Arg-to-Trp), UCG-to-UUG (Ser-to-Leu), UCU-to-UUU (Ser-to-Phe), CCA-to-CUA (Pro-to-Leu), UCC-to-UUC (Ser-to-Phe), CGU-to-UGU (Arg-to-Cys), and UUC-to-UUU (Phe-to-Phe) (Fig. 2A and Supplementary Data 4). Considering the relative frequency of amino acids corresponding to these 253 sites in CDS regions (214 CDS-recoding and 39 CDS-synonymous) after RNA editing, Leu (94, 37.30%) and Phe (55, 21.83%) are dominant (Fig. 2A and Supplementary Data 4). Regarding amino acid biochemical properties, nearly all amino acids after RNA editing at these 253 sites in mitochondria are hydrophobic, except Ser and Thr (Fig. 2A); the percentage of hydrophobic amino acids increases markedly from 37.70% to 86.90%, driven by most CDS-recoding editing sites (188/214) (Supplementary Data 4).
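The distinction between CDS-recoding and CDS-synonymous events can be sketched in Python as follows. The sketch applies a single C-to-U change to a codon and translates both versions with the standard genetic code; the helper name `edit_codon` and the one-letter amino acid output are illustrative choices, not the pipeline used in the study.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def edit_codon(codon, codon_pos):
    """Apply a C-to-U edit at codon_pos (1, 2 or 3) and classify the outcome."""
    i = codon_pos - 1
    if codon[i] != "C":
        raise ValueError("no cytidine at the requested codon position")
    edited = codon[:i] + "U" + codon[i + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[edited]
    kind = "CDS-synonymous" if before == after else "CDS-recoding"
    return edited, before, after, kind


# Codon alterations named in the text.
for codon, pos in [("UCA", 2), ("CGG", 1), ("UUC", 3), ("CUG", 1)]:
    print(codon, pos, edit_codon(codon, pos))
```

Third-position edits such as UUC-to-UUU leave the amino acid unchanged, whereas first- and second-position edits usually recode it, consistent with the codon-position split reported above.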

Fig. 2: Codon alterations caused by editing in CDS regions, and the distributions of CDS-recoding sites and the corresponding recoded amino acids.

A Amino acid and codon alterations of 253 editing sites in CDS regions. Red and gray arrows denote codon alterations with more and less than 10 editing events, respectively. Edited RNA bases are underlined in codons. Amino acids in membrane structures and water drops denote hydrophobic and hydrophilic amino acids, respectively. Asterisks denote stop codons. Ala Alanine, Arg Arginine, Cys Cysteine, Gln Glutamine, His Histidine, Ile Isoleucine, Leu Leucine, Met Methionine, Phe Phenylalanine, Pro Proline, Ser Serine, Thr Threonine, Trp Tryptophan, Tyr Tyrosine, Val Valine. B Normalized CDS-recoding site counts in different grouped genes/proteins/complexes as well as their corresponding hydrophobic amino acid counts before and after editing. Purple bars denote normalized CDS-recoding site counts in protein-coding genes. Numbers in the left bars denote total normalized CDS-recoding site counts in different types of proteins or complexes (complex I, III, IV, V, Cyt c biogenesis proteins, ribosomal proteins and other proteins from up to down). Normalized counts of hydrophobic amino acids corresponding to editing sites before and after editing in protein-coding genes are shown as blue and red bars, respectively. Numbers in the right bars denote normalized hydrophobic amino acid counts after editing in different types of proteins or complexes. Normalized editing site counts (or normalized hydrophobic amino acid counts) in protein-coding genes (or proteins, or complexes) is denoted as “CDS-recoding site counts (or hydrophobic amino acid counts)/total length of CDS in the corresponding protein-coding gene (or protein, or complex) × 1000”. C Recoded amino acids caused by RNA editing with mapped positions in the 3D structure of apocytochrome b (COB). The abbreviation used is: IMM, inner mitochondrial membrane.

For the 53 rice mitochondrial protein-coding genes, assigned to 7 groups encoding complex I, complex III, complex IV, complex V, Cyt c biogenesis proteins, ribosomal proteins, and other proteins, we investigate the distribution of the 214 CDS-recoding sites as well as the corresponding amino acid hydrophobicity (Fig. 2B). Notably, editing at 64 of these 214 CDS-recoding sites has been reported to be greatly affected in mutant endosperm of O. sativa, Zea mays and A. thaliana43,44,45,48,50,51,52,54,59 (for details see Supplementary Data 5), suggesting that the remaining CDS-recoding sites may likewise be important for endosperm development. Among these 53 genes, nearly half (25 genes) harbor at least one CDS-recoding site (Fig. 2B). Furthermore, we detect RNA editing hotspots, defined as gene groups that feature more CDS-recoding sites and more hydrophobic amino acids after editing. We find that three gene groups, encoding Cyt c biogenesis proteins (normalized CDS-recoding sites: 19.16, normalized hydrophobic amino acids caused by CDS-recoding editing: 14.87), complex III (normalized CDS-recoding sites: 12.56, normalized hydrophobic amino acids caused by CDS-recoding editing: 10.89) and complex IV (normalized CDS-recoding sites: 7.66, normalized hydrophobic amino acids caused by CDS-recoding editing: 7.66) (Fig. 2B), are RNA editing hotspots, which include a number of previously reported editing sites that can be affected in endosperm mutants43,47,48,49,50,52,54,60,61 (for details see Supplementary Data 5). In plastids, atpA-1148 is the only CDS-recoding site that presents a high editing frequency (0.838 on average) (Supplementary Data 1 and 6), and it has been reported to be critical for ATP synthesis62.
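The normalization used to compare gene groups follows the definition in the Fig. 2 legend: a count divided by the total CDS length of the gene, protein or complex, multiplied by 1000. A minimal Python sketch is given below; the function name and the example numbers are hypothetical and do not reproduce values from Supplementary Data.

```python
def normalized_count(count, total_cds_length):
    """Count per 1000 nt of CDS, following the definition in the Fig. 2 legend."""
    if total_cds_length <= 0:
        raise ValueError("total CDS length must be positive")
    return count * 1000 / total_cds_length


# Hypothetical gene group: 12 CDS-recoding sites, 10 of which yield
# hydrophobic amino acids, over a combined CDS length of 1200 nt.
recoding = normalized_count(12, 1200)
hydrophobic = normalized_count(10, 1200)
print(recoding, round(hydrophobic, 2))
```

Because both quantities share the same denominator, a group in which every CDS-recoding site yields a hydrophobic residue has two equal normalized values, as reported for complex IV (7.66 and 7.66).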

Structural effects provoked by RNA editing

Considering the possible effect of RNA editing on protein structure58,63, we map these 214 CDS-recoding sites to three-dimensional (3D) structures of edited gene products. Notably, we observe that within mitochondria, most CDS-recoding sites (167/188) yielding hydrophobic amino acids are situated in/near the inner membrane and/or the interior of proteins, which are characterized by a hydrophobic lipid environment (Supplementary Fig. 3 and Supplementary Data 7), suggesting that editing at these sites plays a crucial role in preserving the insertion of membrane proteins, ensuring protein stability, and subsequently influencing the assembly and activity of mitochondrial complexes/proteins, as previously elucidated58. In particular, cytochrome b (cob), encoding apocytochrome b (COB), a subunit of complex III in mitochondria and one of the RNA editing hotspots, possesses 15 CDS-recoding sites, among which 13 produce hydrophobic amino acids, conforming well with its hydrophobic environment (Fig. 2C and Supplementary Data 7). Likewise, in plastids, atpA-1148 and ndhG-347 causing hydrophobic amino acids are in/near the inner mitochondrial membrane (IMM) and/or buried in the interior of proteins (Supplementary Fig. 3 and Supplementary Data 7).

To further investigate the effect of these CDS-recoding sites on protein 3D structure, we align 3D structures of mitochondrial proteins before and after editing (for details see Methods and Data availability). We detect a significant difference (Root Mean Square Deviation, RMSD > 2.5 Å64) in the 3D structures of proteins encoded by atp6, ccmB, ccmC, ccmFc, ccmFn, cob, cox2, mat-r, nad5, nad6, orf183, orf288, orfX, rpl2, rps1, rps4 and rps19 (Supplementary Fig. 4), indicating that CDS-recoding editing in these genes greatly affects protein 3D structures. Notably, the RNA editing hotspots identified above all present significant changes in 3D structures, especially ccmB (RMSD = 30.90 Å) and ccmFc (RMSD = 25.06 Å) (Supplementary Fig. 4). In contrast, the proteins encoded by the remaining mitochondrial genes (atp9, nad1, nad2, nad4, nad4L, nad7, rps7, and rps13) change their 3D structures only subtly, with RMSD < 2.5 Å (Supplementary Fig. 4), implying that CDS-recoding editing probably fine-tunes these proteins’ structures.
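The RMSD criterion can be made concrete with a small Python sketch. It assumes that the two structures have already been superimposed and that atoms are paired by index; the structural alignment step itself, described in Methods, is not reproduced here, and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (same unit as the input) between paired (x, y, z) coordinates.

    Assumes both structures are already superimposed and paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def differs_significantly(value, threshold=2.5):
    """Apply the RMSD > 2.5 Å cutoff used in the text."""
    return value > threshold


# Hypothetical coordinates (Å): every atom is displaced by 3 Å along z.
unedited = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
edited = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
value = rmsd(unedited, edited)
print(value, differs_significantly(value))
```

With this cutoff, values such as 30.90 Å (ccmB) fall in the significantly changed class, whereas values below 2.5 Å fall in the fine-tuned class.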

We also find that two intronic editing sites, nad7-i3-D5-22 (position 22 of domain 5 (D5) in the 3rd intron of nad7, genomic position 87,186) and ccmFc-i-D5-26 (position 26 of D5 in the only intron of ccmFc, genomic position 332,291), affect the structure of D5 (Supplementary Fig. 5), which is essential for intron splicing65. Specifically, the two RNA editing sites are likely to produce a GAAA sequence, which is located in the apical loops of the RNA secondary structure models of edited nad7 intron 3 D5 and ccmFc intron D5 (Supplementary Fig. 5) and is important for the interaction activities of D566,67.

RNA editing frequency, variability and evolutionary conservation

We next investigate the editing frequency and variability of these 298 C-to-U sites in mitochondria across the five developmental stages. At each stage, CDS-recoding sites consistently present significantly higher editing frequencies than CDS-synonymous, intronic and intergenic sites (Fig. 3A). Strikingly, all CDS-recoding and CDS-synonymous sites in cob, which encodes a subunit of complex III and belongs to the editing hotspots, feature higher editing frequencies at each developmental stage (Supplementary Fig. 6). As one site may present variable editing frequencies at different developmental stages, we further examine the editing variability of these sites. We find that CDS-recoding sites also possess lower editing variability than other sites (Fig. 3B), indicating the significant role of their relatively constant editing in rice endosperm development. Specifically, we notice that most CDS-recoding sites in atp6, ccmC, cox2, nad4L, nad6, orf183 as well as ribosomal protein genes present not only higher editing frequencies but also lower editing variability across the five developmental stages (Supplementary Fig. 6), pointing to the importance of RNA editing in altering amino acids to yield functional proteins for rice endosperm development.

Fig. 3: Developmental and evolutionary features of CDS-recoding events in mitochondria.

A Editing frequencies of 247 C-to-U sites (outside of overlapped genes) in CDS (179 for CDS-recoding and 31 for CDS-synonymous), intronic (25) and intergenic (12) regions during five rice endosperm development stages. The editing frequencies of three sites in pseudogenic regions are not compared with those of CDS-recoding sites because there are too few sites. The Wilcoxon test was used to calculate the statistical significance (p-value, with ***<0.001, and ****<0.0001). Components of the box plot are: center line, median value; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; error bars: the highest and lowest values excluding outliers. B Editing variabilities of these 247 C-to-U sites across five stages during rice endosperm development. The Wilcoxon test was used to calculate the statistical significance (p-value, with **<0.01, and ***<0.001). Components of the box plot are: center line, median value; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; error bars: the highest and lowest values excluding outliers. C Correlation between editing frequency of CDS-recoding sites and evolutionary conservation of amino acids corresponding to these CDS-recoding sites. Spearman’s correlation coefficient is calculated for the correlation. The curve is fitted with a generalized linear model. Points in different colors denote CDS-recoding sites in mitochondrial genes encoding different proteins or complexes. Based on the density distribution of conservation of amino acids (Supplementary Fig. 7), thresholds of high and low conservation were set at 0.50 and 0.15, respectively. Similarly, thresholds of high and low editing frequencies were set at 0.89 and 0.10, respectively, according to the density distribution of editing frequencies of these CDS-recoding sites (Supplementary Fig. 8).

RNA editing is believed to restore evolutionarily conserved amino acids68. In other words, for a given CDS-recoding site, the amino acid recoded by RNA editing may match the amino acid that is relatively conserved across multiple species. To explore the conservation of amino acids corresponding to these 214 CDS-recoding sites, we retrieve a large collection of protein sequences covering these 214 sites from 122 land plants (for details see Methods), falling into 6 different clades, namely, liverworts (12 species; 189 CDS-recoding sites), mosses (25 species; 192 CDS-recoding sites), hornworts (5 species; 82 CDS-recoding sites), lycophytes (3 species; 109 CDS-recoding sites), ferns (2 species; 107 CDS-recoding sites), and seed plants (75 species; 198 CDS-recoding sites) (Supplementary Data 8 and 9). Accordingly, we find that these CDS-recoding sites present a positive correlation between amino acid conservation and editing frequency (R = 0.31, p = 9.4e−06) (Fig. 3C; for details see Supplementary Fig. 9), which, although not strong, is observed in nearly all clades except hornworts and ferns (Supplementary Fig. 10).
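For readers unfamiliar with the statistic, Spearman's correlation coefficient is the Pearson correlation computed on ranks. The following Python sketch is a plain standard-library implementation with average ranks for ties; it is illustrative only, the input values are hypothetical, and it does not compute the p-value reported above.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-site values: amino acid conservation and editing frequency.
conservation = [0.10, 0.35, 0.60, 0.80, 0.95]
frequency = [0.20, 0.15, 0.70, 0.90, 0.85]
print(round(spearman(conservation, frequency), 2))
```

Because only ranks are used, the coefficient captures any monotonic trend between conservation and editing frequency, not just a linear one.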

We further classify these editing sites into 7 groups in terms of gene products, as mentioned above. We find that editing sites in genes encoding Cyt c biogenesis proteins exhibit a positive correlation between amino acid conservation and editing frequency (Supplementary Fig. 11). In particular, cob (encoding a subunit of complex III), cox2 (encoding a subunit of complex IV) and nad4L (encoding a subunit of complex I) tend to possess more conserved amino acids and higher editing frequencies (Fig. 3C). On the other hand, it is also believed that RNA editing enriches proteomic diversity69. Notably, ccmC, ccmFn, cox2, rps1, and rps7 harbor a few CDS-recoding sites with higher editing frequencies but less conserved amino acids, indicating enriched proteomic diversity in these genes (Fig. 3C).

It is worth noting that RNA editing can create initiation and stop codons. Here we find that these CDS-recoding editing events lead to one initiation codon (nad4L-2, ACG to AUG) and three stop codons (atp9-223, CGA to UGA; ccmFc-1306, CGA to UGA; and rps1-505, CAA to UAA) (Supplementary Data 6). Specifically, nad4L-2, which converts ACG into the canonical initiation codon AUG, has a lower editing frequency during rice endosperm development and higher conservation of its encoded amino acid across 117 land plants (Fig. 3C and Supplementary Data 10). In addition, atp9-223 and ccmFc-1306, which lead to stop codons at the end of the corresponding gene products, present higher conservation across land plants (Fig. 3C; Supplementary Data 6 and 10). In contrast, RNA editing at rps1-505, which results in a stop codon, is observed only in rice and is absent in other land plants, accordingly reducing the length of the rps1 product by 4 amino acids.

RNA editing pattern and classification of mitochondrial genes

Since CDS-recoding editing sites predominate and have important effects on biochemical properties and protein structures during rice endosperm development, we further investigate their editing patterns across the five developmental stages. Based on unsupervised clustering, we group the 214 CDS-recoding sites into five clusters, namely, C1~C5, containing 26, 17, 25, 43, and 103 sites, respectively, with each cluster presenting a distinctive RNA editing pattern (Fig. 4A). We find that the average editing frequency increases from C1 to C5 (0.08 in C1, 0.28 in C2, 0.50 in C3, 0.74 in C4, and 0.92 in C5), whereas the corresponding editing variability gradually decreases (τ values of 0.85, 0.44, 0.25, 0.15 and 0.03, respectively; a lower τ value indicates more constant editing frequencies). Clearly, C5 features higher editing frequency and lower editing variability, whereas C1 features lower editing frequency and higher editing variability.
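The definition of τ is given in Methods and is not restated in this section. As an illustration only, the Python sketch below assumes a τ index of the kind widely used for tissue specificity, τ = Σ(1 − x_i/x_max)/(n − 1), applied to the editing frequencies of one site across the five stages; this formula and the example frequencies are assumptions, not a statement of the exact procedure used here.

```python
def tau_index(frequencies):
    """Variability index in [0, 1]; 0 means constant editing across stages.

    Assumed form: sum(1 - x_i / x_max) / (n - 1).
    """
    if len(frequencies) < 2:
        raise ValueError("at least two stages are required")
    peak = max(frequencies)
    if peak == 0:
        return 0.0  # never edited: treated here as non-variable
    return sum(1 - f / peak for f in frequencies) / (len(frequencies) - 1)


# Hypothetical editing frequencies at 3, 6, 9, 12 and 15 DAF.
constant_site = [0.9, 0.9, 0.9, 0.9, 0.9]
stage_specific_site = [0.0, 0.0, 0.0, 0.0, 0.4]
print(tau_index(constant_site), tau_index(stage_specific_site))
```

Under this assumed form, a site edited at the same level at every stage scores 0 and a site edited at a single stage scores 1, which matches the direction of the trend described for C5 and C1.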

Fig. 4: Classification of mitochondrial genes by editing frequency of 214 CDS-recoding sites.

A Clustering of 214 CDS-recoding sites in five developmental stages of rice endosperm, leading to five clusters (C1~C5) that show distinct patterns by factoring both editing frequency (“f”) and variability (“τ”). In each cluster, the red triangle denotes the average editing frequency in each development stage. Components of the box plot are: center line, median value; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; error bars: the highest and lowest values excluding outliers; points, outliers. B Classification of edited mitochondrial protein-coding genes based on the clustering patterns of these 214 CDS-recoding sites.

To further investigate the 25 mitochondrial genes containing CDS-recoding sites, we classify these genes, according to C1~C5, into Groups I, II, and III, which contain 3, 9, and 13 genes, respectively. Specifically, Group I harbors editing events present in C1~C4 yet absent in C5, Group II features C5-specific editing events, and Group III contains editing events spanning broadly across C1~C5 (Fig. 4B). In particular, in Group I, nad4, encoding a subunit of complex I, contains CDS-recoding events only from C4. Of note, cob, encoding a component of complex III, and cox2, encoding a component of complex IV, both belong to Group II and are RNA editing hotspots. Similarly, four genes (ccmB, ccmC, ccmFc and ccmFn) encoding Cyt c biogenesis proteins in Group III are also RNA editing hotspots, with diverse CDS-recoding events covering all five clusters (C1~C5), except for ccmC. Furthermore, since editing frequencies are increased or decreased in mutant endosperms (Supplementary Data 5), regulating RNA editing in genes from Groups I and III has the potential to affect the functions of mitochondrial complexes/proteins and further improve rice endosperm development.
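The grouping rule described above can be written as a short Python sketch. It reflects one reading of the text, namely that Group III genes contain C5 sites together with sites from at least one other cluster; the function name and the example inputs are hypothetical.

```python
def classify_gene(site_clusters):
    """Assign a gene to Group I, II or III from the clusters of its sites.

    site_clusters: cluster labels ("C1".."C5") of the gene's CDS-recoding sites.
    """
    clusters = set(site_clusters)
    if not clusters:
        raise ValueError("gene has no CDS-recoding sites")
    if "C5" not in clusters:
        return "Group I"    # events in C1~C4 only
    if clusters == {"C5"}:
        return "Group II"   # C5-specific events
    return "Group III"      # C5 plus at least one other cluster


# Hypothetical cluster assignments for three genes.
print(classify_gene(["C4", "C4"]))
print(classify_gene(["C5", "C5", "C5"]))
print(classify_gene(["C1", "C3", "C5"]))
```

A gene whose sites all fall in C4, as described for nad4, is assigned to Group I under this rule.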

Construction of PPR-RNA binding profiles

To explore which PLS-type PPR proteins regulate C-to-U RNA editing during rice endosperm development according to the distinctive mechanism of PPR-RNA binding, we identify 480 PPR genes in the rice genome (Fig. 5A and Supplementary Data 11). To further identify potential RNA editing factors, we predict the subcellular localization of PPR proteins and PPR-RNA binding (for details see Methods) and obtain a total of 156 PLS-type PPR proteins as potential editing factors, namely, 149 in mitochondria and 7 in plastids (Supplementary Data 12 and 13). To further verify the confidence of these 156 predicted PPR proteins, we perform protein sequence alignments and find that 49 present higher similarity with extant PPR editing factors experimentally discovered in A. thaliana, Glycine max, Gossypium hirsutum, Hordeum vulgare and Z. mays (for details see Supplementary Data 14), indicating that the predicted PPR proteins are valuable candidates for experimental validation.

Fig. 5: Identification of PPR genes and prediction of PPR-RNA binding.

A A total of 480 PPR genes were identified in the rice genome, containing P-type PPR and PLS-type PPR (including PLS, E (E1 and E2), E+, and DYW class PPR) genes. B, C Motif arrangement of LOC_Os12g17080 and LOC_Os11g10740 and their binding to the upstream sequence of cox2-167 and nad7-836, respectively. The combination of the fifth and last amino acids in each motif (P1, L1, S1, P2, L2 and S2) denotes a PPR code, specifically binding a RNA base. The cytidine (C) in red is the edited RNA base.

Meanwhile, we construct PLS-type PPR-RNA binding profiles for these 156 predicted PPR proteins (Supplementary Data 13 and 15). Based on binding affinities of PPR codes for RNA bases, we find that LOC_Os12g17080 and LOC_Os11g10740 tend to bind RNA sequences upstream of the top 10 editing sites (Supplementary Data 13). Among these top 10 sites, notably, cox2-167 (ranking 3rd) (Fig. 5B) and nad7-836 (ranking 6th) (Fig. 5C) have been experimentally validated to affect endosperm development in rice52,54, implying the great potential of our predicted PPR-RNA binding profiles in studying RNA editing machinery during rice endosperm development. Furthermore, we investigate PPR gene expression during rice endosperm development and find that PPR (especially PLS-type PPR) genes extensively exhibit low expression (Supplementary Data 16) in conformity with previous findings70,71,72, indicating that expression is not a proper indicator for screening PPR RNA editing factors.
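The idea of matching PPR codes to RNA bases can be illustrated with a simplified Python sketch. It assumes a small lookup table of commonly cited code pairs (the fifth and last amino acids of a motif) and counts how many motifs prefer the aligned base; the table, the scoring by simple match fraction, and the example motif array are assumptions for illustration and do not reproduce the affinity-based procedure described in Methods.

```python
# Assumed subset of PPR codes: (fifth, last) residue pair -> preferred base(s).
PPR_CODE = {
    ("T", "D"): "G",
    ("T", "N"): "A",
    ("N", "D"): "U",
    ("N", "S"): "C",
    ("N", "N"): "CU",
}


def match_fraction(motif_codes, rna):
    """Fraction of motifs whose code pair prefers the aligned RNA base.

    motif_codes: list of (fifth, last) residue pairs, ordered 5' to 3'.
    rna: RNA string of the same length, one base per motif.
    """
    if not rna or len(motif_codes) != len(rna):
        raise ValueError("need one RNA base per motif")
    hits = sum(base in PPR_CODE.get(code, "") for code, base in zip(motif_codes, rna))
    return hits / len(rna)


# Hypothetical motif array and upstream RNA segment.
motifs = [("T", "D"), ("N", "S"), ("N", "N"), ("V", "G")]
print(match_fraction(motifs, "GCUA"))
```

Code pairs absent from the table, such as the hypothetical ("V", "G") above, are treated as non-matching; an affinity-weighted score would replace the simple hit count in a fuller model.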

Discussion

Here, we profiled RNA editome in mitochondria during rice endosperm development and performed systematic analyses to characterize RNA editing sites, demonstrating that CDS-recoding editing is conserved in evolution and development, increases the proportion of hydrophobic amino acids located in/near the mitochondrial inner membrane and/or interior of proteins, and changes 3D structures of mitochondrial proteins. Moreover, we classified mitochondrial protein-coding genes containing CDS-recoding editing events into three groups based on editing frequency and variability, and constructed PPR-RNA binding profiles to yield a high-confidence collection of candidate PPR editing factors. Based on these results, we present a model of RNA editing in mitochondria for rice endosperm development (Fig. 6): RNA editing mediated by one PPR protein changes RNA and amino acid sequences and affects complex/protein formation to maintain ATP synthesis, protein synthesis, and PCD involving genes that encode complexes (complex I, III, IV and V) in the mETC, Cyt c biogenesis proteins and ribosomal proteins, which together sustain rice endosperm development.

Fig. 6: A model of CDS-recoding editing in mitochondria for rice endosperm development.

Editing regulated by one PPR protein changes RNA and amino acid sequences, regulates formations and functions of complexes (complex I, III, IV and V), Cyt c biogenesis proteins and ribosomal proteins to ensure ATP synthesis, PCD and protein synthesis, which together maintain rice endosperm development. Different gene categories containing CDS-recoding editing events are color-coded, with complexes (complex I, III, IV and V) for brown, Cyt c biogenesis proteins for pink, ribosomal proteins for gray, and other proteins for green. The yellow dashed line denotes CcmB, CcmFc and CcmFn are related to Cyt c1 biogenesis, while the question mark means that the role of CcmC in Cyt c1 biogenesis is uncertain. Complex II is encoded by two nuclear genes and is not found to contain RNA editing events in rice. AA amino acid, PCD programmed cell death, IMS intermembrane space, IMM inner mitochondrial membrane, CCBPs cytochrome c biogenesis proteins, ROS reactive oxygen species, Cyt cytochrome.

CDS-recoding editing in genes encoding complexes (complex I, III, IV, and V) in the mETC and in genes encoding Cyt c biogenesis proteins is important for ATP synthesis during rice endosperm development. For instance, cox2-167, an editing site in a gene encoding a subunit of complex IV, which is the last electron acceptor in the mETC and crucial for ATP formation, has been documented to present aberrant RNA editing leading to abnormal rice endosperm54. Strikingly, there is a body of related evidence from several studies on maize endosperm development: (i) absence of editing at cob-908 (also identified in this study) results in deficiency of complex III, decrease of ATP production and reduction of maize endosperm growth43, (ii) loss of editing at ccmB-43 (also identified in this study) affects the maturation of Cyt c and c1, and the biogenesis of complex III48, and (iii) abolishment of editing at ccmFn-1553 in maize (corresponding to ccmFn-1514 identified in this study) results in the loss of CcmFN protein (a subunit of the heme lyase complex) and affects cytochrome c maturation and mitochondrial oxidative phosphorylation47, which, as a result, affects maize endosperm development.

In addition, attention should be paid to CDS-recoding editing in genes that encode other mETC complexes, such as complexes I and V, because, besides its role in ATP synthesis, complex I is related to the formation of ROS, crucial signaling molecules that regulate PCD12,13. Consistently, abolishment of editing at nad2-26 and nad5-1916 (also identified in this study) affects PCD and maize endosperm development50. Moreover, CDS-recoding editing events in nad4L (nad4L-110 and nad4L-179), encoding a subunit of complex I, and in atp9 (atp9-191 and atp9-212), encoding a subunit of complex V (also identified in this study), are closely associated with mutant maize endosperm31,49. In addition, abolishment of editing at nad7-836 affects the structural stability and activity of NAD7 (a subunit of complex I) and further influences respiration rate, starch biosynthesis, and maize endosperm development52.

CDS-recoding editing in genes encoding ribosomal proteins is important for protein synthesis during rice endosperm development. In particular, it has been reported that editing at rps4-335 (in maize, A. thaliana, and rice in this study), which converts Pro into Leu, is related to the maturation of Cyt c and the assembly of complexes I, III, and V, and affects protein translation and endosperm development48. In addition, ribosomal proteins S1, S7 and S13 are essential for protein synthesis73,74,75, and CDS-recoding editing events in rps1 (rps1-107 and rps1-377), rps7 (rps7-277 and rps7-332) and rps13 (rps13-56, rps13-100, rps13-256 and rps13-287) (also identified in this study) are affected in mutant maize endosperm31,47,51,59,76, implying that these editing events play an important role in protein synthesis during rice endosperm development.

CDS-recoding editing in mitochondrial genes encoding other proteins may also be critical for rice endosperm development. Notably, the predicted 3D structures of the proteins encoded by mat-r, orf183, orf288 and orfX change significantly after RNA editing, especially for orf288 (RMSD = 24.57 Å) and mat-r (RMSD = 16.77 Å) (Supplementary Fig. 4), implying the importance of editing in these genes for rice endosperm development. Specifically, it has been reported that abnormal editing at mat-r-86, mat-r-92, mat-r-295, and mat-r-296 (also identified in this study) is potentially associated with mutant endosperm of maize31,44. In addition, it has been found that abolishment of editing at one site in mat-r can reduce the splicing efficiency of nad4 introns 1 and 3 and further affect maize endosperm development46. However, orfX, orf183, and orf288, similar to a chimeric cytoplasmic male sterility-associated gene orf31277, have not been well studied, and further investigation is required to uncover the molecular mechanisms and functions of RNA editing in these genes.

Methods

Whole-genome re-sequencing and RNA-seq

DNA was extracted from leaves and endosperm (at 3 DAF) of the wild-type Nipponbare (Japonica) using the DNA extraction kit (DP305) from HUAYUEYANG Biotechnology, BEIJING CO., LTD. RNA was extracted from endosperms at 3, 6, 9, 12, and 15 DAF using the RNA extraction kit (GK0416) from HUAYUEYANG Biotechnology, BEIJING CO., LTD. Sequencing libraries were prepared in parallel with the KAPA Stranded mRNA-seq Library Preparation kit (KK8421) and the Epicentre Ribo-Zero Magnetic Kit (Plant Seed/Root, MRZSR116), and were sequenced on the Illumina HiSeq 2500.

Read mapping and gene expression quantification

Trimmomatic78 was used to eliminate low-quality reads and remove adapters. Cleaned reads were mapped to the mitochondrial genome (NC_011033.1)56, the chloroplast genome (NC_001320.1)79 and the nuclear genome (Os-Nipponbare-Reference-IRGSP-1.0)80 by HISAT281 with the parameter “--rna-strandness FR” to infer strand-specific information. The mapped reads were used to quantify the expression of genes and transcripts as FPKM (Fragments Per Kilobase of exon per Million fragments mapped) by StringTie82.
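The FPKM unit named above can be written out as a short calculation. The following is a minimal sketch of the definition only, not of StringTie's internal estimation; the function name `fpkm` and its argument names are illustrative assumptions.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of exon per Million fragments mapped.

    fragments: fragments assigned to the gene or transcript
    exon_length_bp: summed exon length in base pairs
    total_mapped_fragments: all mapped fragments in the library
    (Hypothetical helper; names are not taken from the original study.)
    """
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # Divide by length in kilobases and by library size in millions.
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


# 500 fragments on 2 kb of exon in a library of 10 million mapped fragments.
example_value = fpkm(500, 2000, 10_000_000)
```

The normalization by both length and library size is what makes values comparable between genes and between the five developmental stages.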

Identification of RNA editing sites and estimation of editing frequency

To detect editing sites, matched RNA and genomic DNA were analyzed by the REDItoolDnaRna.py script in REDItools83. Single nucleotide polymorphisms (SNPs) were identified by comparing the re-sequenced DNA sequence to the reference genome and were excluded. The parameters “-e -E -d -D” were used to remove multiple mapped reads caused by gene transfers as well as duplicated reads. Low-quality bases (quality score <25) and sites with low mapping quality (quality score <25) were removed. DNA-RNA mismatches featuring multiple types were excluded, and only sites supported by both RNA and DNA sequencing with base coverage greater than 30 and variations supported by at least 3 reads were kept. RNA editing sites identified at the five developmental stages were mutually confirmed. RNA editing sites in cox1, nad2, nad7 and rpl2 were corrected according to annotations from NCBI RefSeq under accession number DQ167400.1.
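The coverage and read-support criteria above can be illustrated with a small filter. This is a sketch under stated assumptions: the `Site` record, its field names, and the definition of editing frequency as edited reads divided by RNA coverage are illustrative and are not taken from REDItools output or from the original pipeline.

```python
from dataclasses import dataclass

MIN_COVERAGE = 30   # coverage must be greater than 30 (from the text)
MIN_ALT_READS = 3   # variation supported by at least 3 reads (from the text)


@dataclass
class Site:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    rna_coverage: int
    dna_coverage: int
    alt_counts: dict  # substitution type -> supporting RNA reads, e.g. {"CT": 40}


def passes_filters(site):
    """Apply the coverage, single-mismatch-type and read-support criteria."""
    if site.rna_coverage <= MIN_COVERAGE or site.dna_coverage <= MIN_COVERAGE:
        return False
    if len(site.alt_counts) != 1:  # no mismatch, or multiple mismatch types
        return False
    (supporting_reads,) = site.alt_counts.values()
    return supporting_reads >= MIN_ALT_READS


def editing_frequency(site):
    """Assumed definition: edited RNA reads / RNA coverage at the site."""
    (supporting_reads,) = site.alt_counts.values()
    return supporting_reads / site.rna_coverage


candidates = [
    Site("site_a", 100, 50, {"CT": 40}),
    Site("site_b", 30, 50, {"CT": 10}),           # coverage not greater than 30
    Site("site_c", 80, 60, {"CT": 5, "CA": 4}),   # multiple mismatch types
    Site("site_d", 90, 45, {"CT": 2}),            # too few supporting reads
]
kept = [s.name for s in candidates if passes_filters(s)]
```

Only `site_a` survives here, with an assumed editing frequency of 0.4.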

Detecting the variability of editing frequency during rice endosperm development

The τ value84 was adopted to estimate the variability of editing frequency across developmental stages. For an editing site, a smaller τ value indicates relatively constant editing frequencies across stages, whereas a larger τ value indicates variable editing frequencies.
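The text does not spell out the formula for τ. The sketch below assumes the widely used specificity index, τ = Σ(1 − x_i / x_max) / (n − 1), applied to the editing frequencies x_i of one site across n stages; whether the cited reference uses exactly this form is an assumption, and the function name `tau` is illustrative.

```python
def tau(frequencies):
    """Variability index in [0, 1] for one editing site across stages.

    Assumed formula: sum(1 - x_i / max(x)) / (n - 1).
    0 means identical frequencies at every stage; 1 means editing is
    detected at a single stage only.
    """
    n = len(frequencies)
    if n < 2:
        raise ValueError("at least two stages are needed")
    peak = max(frequencies)
    if peak == 0:
        return 0.0  # never edited: treated here as constant (a convention)
    return sum(1 - f / peak for f in frequencies) / (n - 1)


constant_site = tau([0.9, 0.9, 0.9, 0.9, 0.9])   # five stages, no change
variable_site = tau([0.0, 0.0, 0.0, 0.0, 1.0])   # edited at one stage only
```

Because each frequency is scaled by the site's own maximum, τ reflects the shape of the profile across 3 to 15 DAF rather than the absolute editing level.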

Correlation between editing frequency and evolutionary conservation

To investigate the correlation between the editing frequency of CDS-recoding sites and the evolutionary conservation of the corresponding recoded amino acids, amino acid alterations caused by editing were obtained from mitochondrial genes among land plants in RefSeq85 and GenBank86. orf183 and orf288 were excluded from the correlation analysis because few plants contain RNA editing sites in these two genes. Amino acids recoded by ccmFn-37, ccmFn-38, mat-r-295, mat-r-296, rps19-163, and rps19-164 were excluded because the corresponding codons contain more than one editing site. Multiple sequence alignment conducted by ClustalW87 in MEGA1188 was used to calculate the evolutionary conservation of amino acids corresponding to RNA editing sites. Because of gaps in the multiple sequence alignment, only amino acids present in more than 80% of species were kept. The correlation between the editing frequency and the conservation of amino acids was analyzed by Spearman’s correlation and fitted by the generalized linear model (GLM).
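The two quantities being correlated can be sketched as follows. This is an illustration under assumptions, not the original analysis: conservation is taken here as the share of non-gap residues in an alignment column that match the recoded amino acid, occupancy as the share of species without a gap, and Spearman's coefficient is computed as the Pearson correlation of average ranks. All function names are hypothetical.

```python
def column_stats(column, residue):
    """Return (occupancy, conservation) for one alignment column.

    column: one character per species, '-' marking a gap.
    """
    non_gap = [aa for aa in column if aa != "-"]
    occupancy = len(non_gap) / len(column)
    conservation = non_gap.count(residue) / len(non_gap) if non_gap else 0.0
    return occupancy, conservation


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


occupancy, conservation = column_stats("LLLLP-LLLL", "L")
rho = spearman([0.10, 0.50, 0.90], [0.20, 0.60, 0.95])
```

In this toy column, nine of ten species carry a residue (occupancy 0.9, above the 80% cut-off) and eight of those nine carry Leu.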

RNA secondary structure prediction

To functionally investigate editing sites in intronic regions of mitochondrial genes, editing sites in the sequence-conserved domain 5 (D5) of introns were selected. D5 in rice mitochondrial genes was identified by alignment with D5 of Pylaiella littoralis89, and D5 sequences containing editing sites were kept. The secondary structure of D5 before and after RNA editing was predicted by MFold90 (http://www.unafold.org/Dinamelt/applications/quickfold.php).

Mapping CDS-recoding sites to protein 3D structures

To investigate the locations of CDS-recoding sites on protein 3D structures, experimentally verified protein 3D structures in PDB91 (https://www.rcsb.org) were used with priority. Protein sequences were aligned with those of other plants with 3D annotations in PDB, and CDS-recoding sites were mapped to the 3D structures. Where annotations of protein 3D structures were not available in PDB, CDS-recoding sites were mapped to protein 3D structures based on annotations in UniProt92 (https://www.uniprot.org).

Protein 3D structure prediction and alignment

To investigate the effects of CDS-recoding editing on protein 3D structures, ColabFold93 (https://colab.research.google.com/github/sokrypton/ColabFold) was used to predict the 3D structure of each protein before and after RNA editing, and the predicted 3D structures were aligned and visualized by SuperPose94 (http://superpose.wishartlab.com). A Root Mean Square Deviation (RMSD) greater than 2.5 Å was considered a significant difference between protein 3D structures64.
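The RMSD criterion can be made concrete with a short calculation. The sketch below assumes that the two structures have already been superposed (the step performed by SuperPose) and that atoms are paired one to one; it does not perform the superposition itself, and the function names and toy coordinates are illustrative.

```python
import math

RMSD_THRESHOLD = 2.5  # Å, the cut-off stated in the text


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired (x, y, z) coordinates.

    Assumes both coordinate lists are already superposed and list the
    same atoms in the same order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def is_significant(value):
    return value > RMSD_THRESHOLD


# Toy example: every atom displaced by 3 Å along z after "editing".
unedited = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
edited = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
deviation = rmsd(unedited, edited)
```

A uniform 3 Å displacement gives an RMSD of 3 Å, which exceeds the 2.5 Å threshold.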

Prediction of candidate PLS-type PPR editing factors and PPR-RNA binding

PPRFinder95,96 (https://ppr.plantenergy.uwa.edu.au) was used to search for PLS-type PPR genes and motifs. PPR codes were predicted by PPRCODE97 (https://colab.research.google.com/github/YaoYinYing/PPRCODE_Guideline/blob/master/PPRCODE.ipynb). Motif gaps in PLS-type PPR genes were filled with “inferred motifs”95, and “##” was taken as the pseudo-PPR code of inferred motifs. PPR proteins with fewer than six motifs were excluded. Furthermore, the subcellular localization of PPR proteins was predicted by TargetP 2.098 (https://services.healthtech.dtu.dk/services/TargetP-2.0), WoLF PSORT99 (https://wolfpsort.hgc.jp), iPSORT100 (https://ipsort.hgc.jp), Plant-mPLoc101 (http://www.csbio.sjtu.edu.cn/bioinf/plant-multi) and Predotar102 (https://urgi.versailles.inra.fr/predotar), and was complemented by the reported subcellular localization predicted by TargetP 1.1103,104. For each PPR protein, the top 10% predictions generated by TargetP 2.0 and WoLF PSORT, and the top 20% predictions generated by Predotar were retained, respectively. Only predictions in which at least two of the aforementioned tools gave the same subcellular localization were kept for subsequent analysis. PPR-RNA binding was predicted and scored by PPRmatcher105. Binding scores greater than or equal to 1 in mitochondria and 2 in plastids were retained. BLASTP (specific parameters: -evalue 1e-5 -max_hsps 1) was used to detect the similarity between PPR genes and RNA editing factors106.
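The agreement rule for localization and the organelle-specific score thresholds can be sketched as below. The input format (a mapping from tool name to predicted compartment), the compartment labels, and the function names are assumptions for illustration; none of the tools is called here, and the per-tool top-percentage filtering is not modeled.

```python
from collections import Counter

MIN_AGREEING_TOOLS = 2                     # at least two tools must agree
MIN_BINDING_SCORE = {"mitochondrion": 1.0,  # thresholds stated in the text
                     "plastid": 2.0}


def consensus_localizations(predictions, min_tools=MIN_AGREEING_TOOLS):
    """Return compartments predicted by at least `min_tools` tools.

    predictions: dict of tool name -> predicted compartment
    (hypothetical format, one prediction per tool).
    """
    votes = Counter(predictions.values())
    return {compartment for compartment, n in votes.items() if n >= min_tools}


def keep_binding(score, compartment):
    """Keep a PPR-RNA pair if its score meets the organelle threshold."""
    return score >= MIN_BINDING_SCORE[compartment]


example_predictions = {
    "TargetP 2.0": "mitochondrion",
    "WoLF PSORT": "mitochondrion",
    "Predotar": "plastid",
}
agreed = consensus_localizations(example_predictions)
```

In this example only the mitochondrial assignment is supported by two tools, so a binding score of 1 or higher would be retained for this protein.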

Statistics and reproducibility

The statistical significance of differences between RNA editing frequencies/variabilities of sites in different genomic regions was evaluated by the two-sided Wilcoxon’s rank-sum test. The p value was adjusted by the Bonferroni correction. A p value less than 0.05 was considered statistically significant. The correlation between the editing frequency of CDS-recoding sites and the evolutionary conservation of amino acids was calculated by Spearman’s correlation coefficient, and the curve was fitted by a generalized linear model. A total of 214 CDS-recoding sites in five developmental stages of rice endosperm were clustered by the K-means clustering algorithm. Five rice endosperm samples were collected across five successive developmental stages (3, 6, 9, 12, and 15 DAF). The sample at each stage was used to conduct RNA-seq (polyA and Ribo-Zero). In addition, RNA editing sites identified at the five developmental stages were mutually confirmed.
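The multiple-testing step can be illustrated in a few lines. This sketch shows only the Bonferroni adjustment and the 0.05 cut-off; the input p values are made-up numbers standing in for the results of the rank-sum tests, and the function name is illustrative.

```python
ALPHA = 0.05  # significance level stated in the text


def bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up raw p values for three pairwise comparisons (illustration only).
raw = [0.01, 0.04, 0.5]
adjusted = bonferroni(raw)
significant = [p < ALPHA for p in adjusted]
```

With three comparisons, a raw p value of 0.04 becomes 0.12 after adjustment and is no longer significant, whereas 0.01 becomes 0.03 and remains so.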

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.