Introduction

Mitochondria are double-membrane, semi-autonomous organelles that are widely present in eukaryotes and are the site of cellular oxidative metabolism1. They are involved in the regulation of cell growth, division, and apoptosis, and in the synthesis and metabolism of some compounds, and they play a vital role in plant development2. The mitochondrial (mt) genomes of most plants consist of circular double-stranded DNA, and their lengths range from several thousand to several million base pairs3,4. Among the plant mt genomes studied to date, that of Brassica napus is the smallest at 221 kb, while that of Silene conoidea is the largest at 11.3 Mb5. Although plant mitochondria display great diversity in genome size, most of the protein-coding genes (PCGs) are highly conserved. They mainly comprise 24 core conserved genes and 17 variable genes, and can be divided into complex I (nad), complex II (sdh), complex III (cob), complex IV (cox), complex V (atp), cytochrome c biogenesis (ccm), transfer RNAs, and others6. Except for complex II, ribosomal protein, and tRNA genes, the other genes are relatively conserved in the mt genomes of higher plants7,8. Unlike chloroplast (cp) genomes, which use their own unique genetic codes, plant mt genomes share a genetic code that is universal across species. In addition to being directly inherited from ancestral mitochondria, tRNA genes also originate from the migration of sequences from the plant's own cp genome9,10. The mt genomes of higher plants also contain abundant genetic variation, which is widely used as a source of potential molecular markers for studying the origin and evolution of species and population genetic diversity11. The mt genome is characterized by fast evolution and a low recombination rate, and it has the advantages of small size and ease of sequencing. It has therefore become an ideal tool for comparative genetics and systematics research among different plants12,13.

Lettuce is an important raw-eaten vegetable worldwide. It belongs to the genus Lactuca in the family Asteraceae and originates from the Mediterranean coast14. Lettuce is one of the main vegetables cultivated in plant factories and is favored by consumers for its abundant vitamin C content15. Lactuca is a large genus of about 100 species, which is divided into four groups: the cultivated species, L. sativa, and three wild species, L. serriola, L. virosa, and L. saligna. L. serriola is characterized by harder prickles on its stem and lobed leaves16, whereas L. saligna has long, narrow leaves. L. virosa shows many phenotypes, with leaves that may or may not be lobed and may or may not bear prickles, but that are always wide16. L. sativa var. ramosa Hort has long obovate leaves that pack densely into cabbage-like heads; it is eaten raw and is crisp, refreshing, and slightly sweet. Morphological differences can distinguish the cultivated lettuce, L. sativa, from the wild types. Four types of lettuce have strong competitive advantages in the market: Butterhead lettuces (var. capitata L. nidus tenerrima), Crisphead lettuces (var. capitata L. salinas), leaf lettuces (var. acephala Alef.), and Cos lettuces (var. longifolia)17. In addition, L. sativa var. ramosa Hort is a representative lettuce variety that is widely planted in China because of its short growth cycle and high nutritional value.

The process that led to the domestication of L. sativa is still unclear. L. serriola has been confirmed as the direct ancestor and one of the closest relatives of L. sativa18,19,20. Progress in sequencing technologies and the publication of sequenced genomes have helped to explain the relationships among Lactuca spp. The trnL-F and ndhF genes were developed as cp markers for Lactuca species21, and Lactuca spp. were classified into several clades, namely L. sativa, L. serriola, L. virosa, and L. saligna21. In addition, the mt genomes of several Lactuca species have been reported, including those of L. sativa (363,324 bp, MK642355), L. serriola (363,328 bp, MK820672), and L. saligna (368,269 bp, MK759657), which makes it possible to develop a new set of markers22. Characterizing the mt genome of L. sativa var. ramosa and comparing it with those of wild and cultivated Lactuca species can help identify genetic or structural variations in the evolutionary history of Lactuca cultivars. Therefore, the assembly and analysis of this mt genome are of great significance for better understanding its genetic features and for molecular marker research.

In this study, we sequenced and assembled the complete mtDNA sequence of L. sativa var. ramosa using the Illumina and Nanopore sequencing platforms and described its genome features. We then compared its genome characteristics and evolutionary relationships with those of related Lactuca species. The findings provide genetic information for future work on species identification, genetic variation, and genetic relationships in Lactuca.

Materials and methods

Plant materials, genome sequencing and assembly

The L. sativa var. ramosa Hort plants were cultivated in a greenhouse at the Loudi Ziyuan Agricultural Science and Technology Development Co., Ltd. (Tongzi Village, Shanshan Town, Louxing District, Loudi, Hunan, China, 27°47ʹN, 112°1ʹE). We collected approximately 5 g of 30-day-old leaves, transported them on dry ice, and sent them to Genepioneer Biotechnologies (Nanjing, China). Total genomic DNA was isolated from the young leaves using the HiPure Universal DNA kit D301 (Genepioneer Biotechnologies). DNA purity was checked on a 1.0% agarose gel, and the DNA was then sequenced on the Illumina NovaSeq 6000 and Oxford Nanopore PromethION platforms. To obtain high-quality reads for the mt genome, Fastp v0.23.4 (https://github.com/OpenGene/fastp) was used to filter the raw Illumina data: sequencing adaptors and primer sequences were removed from the reads, reads with an average quality value below Q5 were filtered out, and reads containing more than 5 N bases were discarded. The raw Nanopore data were then filtered using Filtlong v0.2.1 with the parameters --min_length 1000 and --min_mean_q 7. The Nanopore data were processed with Minimap223, aligning the reads against the plant mt gene database (https://github.com/xul962464/plant_mt_ref_gene) to identify mt sequences. Sequences longer than 50 bp that contained multiple core genes were selected as seed sequences according to their alignments. Minimap2 was then used to align the original Nanopore reads to the seed sequences, and reads with an overlap > 1 kb were added to the seed sequences. This alignment was repeated iteratively until all sequences of the L. sativa var. ramosa Hort mt genome were obtained.
All Nanopore reads were self-corrected using Canu24, and Bowtie2 (v2.3.5.1) was used to align the Illumina data to the corrected sequences. The Illumina data were then assembled together with the corrected Nanopore data using Unicycler (v0.4.8) with default parameters. The assembly was visualized and manually adjusted using Bandage (v0.8.1) to obtain the final mt genome sequence of L. sativa var. ramosa Hort.

Genome annotation

The PCGs and rRNA genes of the L. sativa var. ramosa Hort mt genome were annotated using Mitofy5, and the tRNA genes were identified using tRNAscan-SE 2.025. The annotation results were then manually adjusted and corrected by reference to related species. Open Reading Frame Finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) was used to identify ORFs with a length ≥ 102 bp, after which redundant sequences and sequences overlapping known genes were removed. The circular map of the L. sativa var. ramosa Hort mt genome was drawn with OGDRAW26.

Repeat sequence analysis

Interspersed repeats, comprising forward, palindromic, reverse, and complementary repeats, were identified using blastn v2.10.1 with the parameters -word_size 7 and -evalue 1e-5, after removing redundant hits and tandem repeats. The interspersed repeats were then visualized using Circos v0.69-5. Tandem repeats were analyzed using the online tool Tandem Repeats Finder (http://tandem.bu.edu/trf/trf.basic.submit.html) with default parameters. Simple sequence repeats (SSRs) were detected using MISA v1.0, with minimum repeat numbers of 10, 5, 4, 3, 3, and 3 for motifs of one to six bases, respectively.
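The SSR thresholds above can be illustrated with a short Python sketch. This is not the MISA implementation; it is a minimal regular-expression search for perfect repeats, assuming the same minimum repeat numbers (10, 5, 4, 3, 3, 3). The function names `find_ssrs` and `is_primitive` are hypothetical and introduced only for illustration.

```python
import re

# Minimum repeat counts for motif lengths 1-6 bp, as stated in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """Return True if the motif is not a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # A lookahead lets us test every start position, so a rejected
        # non-primitive match (e.g. "AA" repeats) cannot hide a real one.
        pattern = re.compile(r"(?=(([ACGT]{%d})\2{%d,}))" % (size, min_rep - 1))
        covered_until = 0
        for m in pattern.finditer(seq):
            if m.start() < covered_until:
                continue  # still inside the previously reported repeat
            motif = m.group(2)
            if not is_primitive(motif):
                continue  # reported under its shorter motif instead
            run = m.group(1)
            hits.append((m.start(), motif, len(run) // size))
            covered_until = m.start() + len(run)
    return sorted(hits)
```

Unlike MISA, this sketch does not merge compound SSRs or report imperfect repeats, so counts on a real genome may differ from those reported here.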

RNA-editing analysis in PCGs and Pi analysis

The RNA-editing sites of 31 PCGs of L. sativa var. ramosa Hort and five other mt genomes (L. saligna, L. sativa, L. sativa var. capitata, L. serriola, and L. virosa) were identified using the PREP-Mt online tool (http://prep.unl.edu/) with a cutoff value of 0.227. We calculated the nucleotide diversity (Pi) of each PCG across L. sativa var. ramosa Hort, L. saligna, L. sativa, L. sativa var. capitata, L. serriola, and L. virosa. The homologous gene sequences from the six Lactuca genomes were globally aligned using MAFFT v7.427 in auto mode, and the Pi value of each PCG was determined using DnaSP5.
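For readers unfamiliar with Pi, the following Python sketch shows the usual definition: the average number of pairwise nucleotide differences per site across aligned sequences. It is an illustration only, not the DnaSP implementation; it assumes that alignment columns containing gaps or ambiguous bases are excluded, and the function name `nucleotide_diversity` is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Average pairwise differences per site for equal-length aligned sequences."""
    if len(aligned) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must have equal aligned length")
    seqs = [s.upper() for s in aligned]
    # Assumption: keep only columns where every sequence has A, C, G or T.
    cols = [i for i in range(length) if all(s[i] in "ACGT" for s in seqs)]
    if not cols:
        return 0.0
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a[i] != b[i] for i in cols) for a, b in pairs)
    return diffs / (len(pairs) * len(cols))
```

With six genomes there are 15 pairwise comparisons per gene, so a gene that is identical in all six yields Pi = 0.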

Phylogenetic analyses

A total of 28 complete mt genomes, comprising 27 representative Asteraceae species and one Ginkgoaceae species, were used to determine the phylogenetic position of L. sativa var. ramosa Hort. The 31 mt PCGs conserved across the 28 species (atp1, atp4, atp6, atp8, atp9, ccmB, ccmC, ccmFc, ccmFn, cob, cox1, cox2, cox3, matR, mttB, nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad7, nad9, rpl10, rpl16, rpl5, rps12, rps13, rps3, and rps4) were aligned with MAFFT v7.427 in --auto mode. The aligned sequences were concatenated end to end and trimmed using trimAl (v1.4.rev15) before model selection with ModelFinder28,29. A Bayesian phylogenetic tree was constructed using MrBayes v3.2.7 with Markov chain Monte Carlo (MCMC) runs of 1 million generations, sampling every 100 generations. The first 25% of sampled trees were discarded as burn-in, and the majority-rule consensus tree was then obtained.
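Two steps of this workflow lend themselves to a small Python sketch: joining per-gene alignments end to end into one supermatrix, and working out how many MCMC samples remain after burn-in. The helper names `concatenate_alignments` and `retained_samples` are hypothetical, and the sample count ignores the initial (generation 0) sample that MCMC software may also record.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments ({taxon: aligned_seq}) end to end per taxon."""
    taxa = set(gene_alignments[0])
    parts = {taxon: [] for taxon in sorted(taxa)}
    for aln in gene_alignments:
        if set(aln) != taxa:
            raise ValueError("every gene must contain the same taxa")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError("sequences within a gene must be aligned")
        for taxon in parts:
            parts[taxon].append(aln[taxon])
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


def retained_samples(generations, sample_every, burnin_fraction):
    """Number of sampled trees kept after discarding the burn-in fraction."""
    total = generations // sample_every
    return total - int(total * burnin_fraction)


# Settings from the text: 1 million generations, sampled every 100,
# with the first 25% discarded.
kept = retained_samples(1_000_000, 100, 0.25)
```

Under these settings, 10,000 trees are sampled and 7,500 contribute to the consensus tree.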

Identification of homologous fragments from cp genome to mt genome

To identify homologous fragments transferred from the cp genome to the mt genome, BLASTN was used to compare the L. sativa var. ramosa Hort mt genome with its cp genome (PP999684). The parameters were set as follows: matching rate ≥ 70%, E-value ≤ 1e-5, and minimum length = 30 bp30.
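The three thresholds can be applied to BLASTN results as in the sketch below. It assumes the hits were written in BLAST tabular format (`-outfmt 6`), which the text does not state, and the function name `filter_hits` is hypothetical.

```python
def filter_hits(lines, min_identity=70.0, max_evalue=1e-5, min_length=30):
    """Keep tabular BLAST hits that pass identity, E-value and length cutoffs.

    Assumes the default 12-column `-outfmt 6` layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        pident = float(fields[2])
        length = int(fields[3])
        evalue = float(fields[10])
        if pident >= min_identity and evalue <= max_evalue and length >= min_length:
            kept.append(
                {
                    "query": fields[0],
                    "subject": fields[1],
                    "identity": pident,
                    "length": length,
                    "evalue": evalue,
                }
            )
    return kept
```

Summing the `length` field over the retained hits gives only a rough estimate of the transferred sequence, because overlapping hits would be counted more than once.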

Synteny analysis

Using L. sativa var. ramosa Hort as the reference, the other Lactuca mt genome sequences were aligned to it using nucmer (4.0.0beta2) with the maxmatch parameter to produce dot plots.

Results

Features of the L. sativa var. ramosa Hort mt genome

Illumina sequencing of L. sativa var. ramosa Hort produced raw data of 16,089,057,852 and clean data of 53,275,026 bp (Q20 = 98.71% and Q30 = 96.34%) (Table S1). Nanopore sequencing yielded a total of 17,652,114,316 bases in 1,711,468 reads, with a mean read length of 10,314 bp and a subread N50 of 24,620 bp (Table S2). The L. sativa var. ramosa Hort mt genome had a typical circular structure with a full length of 363,324 bp (Fig. 1). Its nucleotide composition was 27.33% A, 27.31% T, 22.65% C, and 22.70% G, giving a GC content of 45.35% (Table S3). PCGs and cis-spliced introns accounted for 9.46% and 6.42% of the entire mt genome, respectively, while tRNA and rRNA genes occupied only 0.57% and 3.12%. A total of 71 annotated genes were detected, consisting of 35 PCGs, 6 rRNAs, 28 tRNAs, and 2 pseudogenes (Table 1). Six genes (ccmFc, cox2, nad4, rps3, trnS-GCT(2), and trnT-TGT(3)) had one intron, whereas four genes (nad1, nad2, nad5(2), and nad7) contained four introns. Eleven genes (atp1, ccmB, nad5, rpl10, rrn18, rrn26, rrn5, trnD-GTC, trnK-TTT, trnQ-TTG, and trnS-GCT) were present in two copies, while trnT-TGT and trnM-CAT were detected in three and five copies, respectively.
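The base composition and GC content reported above follow from simple counting. A minimal Python sketch is given below; the function name `base_composition` is hypothetical, and ambiguous bases such as N are assumed to be excluded from the total.

```python
from collections import Counter


def base_composition(seq):
    """Return percentages of A, C, G, T and the combined GC content."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")  # ambiguous bases are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    pct = {base: 100 * counts[base] / total for base in "ACGT"}
    pct["GC"] = pct["G"] + pct["C"]
    return pct
```

As a consistency check on the reported values, 22.65% C plus 22.70% G gives the stated GC content of 45.35%.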

Fig. 1

Circular map of the L. sativa var. ramosa Hort mt genome.

Table 1 List of encoding genes in the L. sativa var. ramosa Hort mt genome.

Genome size and gene content vary from species to species31,32. Five representative Lactuca mt genomes were compared with that of L. sativa var. ramosa Hort to identify variation in genome features (Table 2). Genome lengths ranged from 363,324 bp (L. sativa var. ramosa Hort, L. sativa, and L. sativa var. capitata) to 373,019 bp (L. virosa). The lowest number of genes (69) was identified in L. sativa and L. serriola, and the highest (79) in L. virosa. The number of PCGs ranged from 35 in L. sativa var. ramosa Hort to 43 in L. virosa, and the number of tRNAs ranged from 25 to 29. Except for L. sativa var. ramosa Hort, all the Lactuca species had the same numbers of rRNAs (6) and introns (24). The AT and GC contents differed only slightly among the species. Overall, L. sativa var. ramosa differed only slightly from the other Lactuca species in these characteristics.

Table 2 Comparison of gene content among Lactuca mt genomes.

Codon usage analysis of PCGs

All PCGs used ATG as the start codon except cox1 (ACG) and mttB (ATT), which corresponded to C-to-U RNA editing at the second site and G-to-U RNA editing at the third site, respectively (Table 1). The relative synonymous codon usage (RSCU) values of the 35 PCGs in the L. sativa var. ramosa Hort mt genome were calculated with an in-house Perl script (Fig. 2). Excluding stop codons, the 35 PCGs comprised 9,868 codons with a total length of 34,353 bp. The most frequent amino acid was leucine (Leu; encoded by CUA, CUC, CUG, CUU, UUA, and UUG) with 1,051 codons, followed by serine (Ser; encoded by AGC, AGU, UCA, UCC, UCG, and UCU) with 936 codons, while cysteine (Cys; encoded by UGC and UGU) was the least frequent with 134 codons. A total of 29 codons had RSCU > 1, of which 27 (93.10%) ended with A or U and two (6.90%) ended with C or G. In addition, methionine (Met) and tryptophan (Trp), each with RSCU = 1, showed no codon preference (Table S4).
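RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The Python sketch below illustrates this calculation; it is not the authors' Perl script, it assumes the standard genetic code, and it writes codons with T rather than U. The names `CODON_TABLE` and `rscu` are hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code, built in the conventional T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def rscu(cds_list):
    """Return {codon: RSCU} for sense codons of amino acids that occur."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1
    families = defaultdict(list)
    for codon, amino in CODON_TABLE.items():
        if amino != "*":  # stop codons are excluded, as in the text
            families[amino].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values
```

Because Met and Trp are each encoded by a single codon under this code, their RSCU is always 1, which is why they show no preference.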

Fig. 2

RSCU analysis of the L. sativa var. ramosa Hort mt genome.

Prediction of RNA-editing sites

RNA editing helps maintain the normal biological function of chloroplasts and mitochondria and exists widely across eukaryotes33. In this work, 500 RNA-editing sites in 35 PCGs (Table 3) were predicted in the L. sativa var. ramosa Hort mt genome using the PREP-Mt online tool. The atp8 gene had the fewest RNA-editing sites (3), while ccmFn had the most (37) (Figure S1). Among the 500 sites, 64.80% (324 sites) occurred at the second codon position and 33.60% (168 sites) at the first codon position, while 1.6% (8 sites) involved both the first and second positions, changing the amino acid from proline (CCC) to phenylalanine (TTC). Additionally, 48% (240 sites) changed the amino acid from hydrophilic to hydrophobic and 31.4% (157 sites) from hydrophobic to hydrophobic, while changes from hydrophilic to stop were the rarest at 0.40% (2 sites). Furthermore, 113 sites (about 22.6%) changed serine (S) to leucine (L), and 110 sites (about 22%) changed proline (P) to leucine (L).
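The way a C-to-U edit is classified by codon position and amino acid change can be sketched in Python as follows. This is an illustration under the standard genetic code, not the PREP-Mt prediction method; codons are written with T in place of U, and the function name `describe_edit` is hypothetical.

```python
# Standard genetic code, built in the conventional T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def describe_edit(codon_before, codon_after):
    """Return (edited positions, amino acid before, amino acid after)."""
    before = codon_before.upper().replace("U", "T")
    after = codon_after.upper().replace("U", "T")
    if len(before) != 3 or len(after) != 3:
        raise ValueError("codons must be three bases long")
    positions = [i + 1 for i in range(3) if before[i] != after[i]]
    for pos in positions:
        # Only C-to-U (written C-to-T) edits are expected in this analysis.
        if before[pos - 1] != "C" or after[pos - 1] != "T":
            raise ValueError("not a C-to-U edit")
    return positions, CODON_TABLE[before], CODON_TABLE[after]
```

For example, `describe_edit("CCC", "TTC")` reports edits at positions 1 and 2 and a change from proline to phenylalanine, matching the double-edit case described in the text.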

Table 3 RNA-editing prediction in the L. sativa var. ramosa Hort mt genome.

Furthermore, we compared the RNA-editing sites of L. sativa var. ramosa Hort with those of L. saligna, L. sativa, L. sativa var. capitata, L. serriola, and L. virosa (Fig. 3). The most heavily edited transcripts were ccmB and ccmFn, both with 36 RNA-editing sites, in L. saligna and L. sativa var. capitata, and ccmFn, with 38–39 RNA-editing sites, in the other species (38 for L. sativa var. ramosa Hort, L. serriola, and L. virosa; 39 for L. sativa). Across the six Lactuca species, the number of RNA-editing sites in ccmB showed no interspecific differences.

Fig. 3

Comparison of RNA-editing sites among six Lactuca species.

Repeat sequence analysis

Repeat sequences, including SSRs, tandem repeats, and interspersed repeats, are widely distributed in plant mt genomes and play a critical role in genome rearrangement34,35. SSRs, which are DNA fragments composed of short repeat units of 1–6 base pairs, are efficient molecular markers36. A total of 110 SSRs were discovered in the L. sativa var. ramosa Hort mt genome, consisting of 24 monomers (21.82%), 23 dimers (20.91%), 10 trimers (9.09%), 49 tetramers (44.55%), and 4 pentamers (3.63%) (Table 4). Monomer, dimer, and tetramer SSRs together accounted for 87.28% of all identified SSRs. The monomers comprised 11 adenine (A) and 13 thymine (T) repeats. TA was the most abundant dimer motif, accounting for 30.43% of all dimers (Table S5). No hexamers were found in this genome.

Table 4 SSR motif analysis of the L. sativa var. ramosa Hort mt genome.

Tandem repeats, also called satellite DNA, are widely present in eukaryotic genomes and in some prokaryotes37. In L. sativa var. ramosa Hort, 15 tandem repeats with a matching degree above 76% were identified, ranging from 12 to 39 bp in size (Table 5). Interspersed repeats are another kind of repetitive sequence, dispersed throughout the genome. A total of 120 interspersed repeats of at least 30 bp were identified, comprising 76 palindromic repeats (about 63.33%) and 44 forward repeats (36.67%); no reverse or complementary repeats were detected in this mt genome (Fig. 4). The total length of these interspersed repeats was 58,124 bp, which accounted for 16% of the mt genome. Most interspersed repeats were between 30 and 50 bp, and the longest repeat was 34,696 bp (Table S6).

Table 5 The identified tandem repeats in the L. sativa var. ramosa Hort mt genome.
Fig. 4

Interspersed repeats identified in the L. sativa var. ramosa Hort mt genome.

Pi analysis

The Pi values of the 35 PCGs in the L. sativa var. ramosa Hort mt genome ranged from 0 to 0.01032 (Fig. 5 and Table S7). The highest Pi value was found in atp1 (gene16.atp1; 0.01032), followed by nad2 (gene20.nad2; 0.00082), nad6 (gene3.nad6; 0.00046), cox2 (gene2.cox2; 0.0004), and cob (gene8.cob; 0.00028). These variable genes (atp1, nad2, nad6, cox2, and cob) might be selected as molecular markers for Lactuca species in the future. The low Pi values of most PCGs indicated that the mt genome of L. sativa var. ramosa Hort is relatively conserved.

Fig. 5

Nucleotide diversity analysis of the L. sativa var. ramosa Hort mt genome.

Phylogenetic analysis

To confirm the phylogenetic position of L. sativa var. ramosa Hort, a Bayesian phylogenetic tree was constructed from a set of 31 conserved PCGs from all 28 mt genomes (Fig. 6). The tree was divided into eight groups, namely Lactuca, Chrysanthemum, Diplostephium, Aster, Helianthus, Ageratum, Arctium, and Ginkgo. L. sativa var. ramosa Hort clustered well with the Lactuca species in the first group and formed sister branches with the other related Lactuca species within the Asteraceae clade. The mt genome of L. sativa var. ramosa Hort was closely related not only to L. sativa var. capitata (MZ159953) but also, to the same extent, to the other mt genomes of L. sativa var. capitata and L. virosa. Overall, these results provide useful information for future research on the evolutionary relationships of Lactuca plants.

Fig. 6

A Bayesian phylogenetic tree was created based on 31 conserved PCGs among 27 Asteraceae species. Ginkgo biloba (NC_027976) was selected as an outgroup.

Homologous fragments transferred from cp to mt

The cp-like sequences in the mt genome were identified by comparison with the complete cp genome sequence of L. sativa var. ramosa Hort obtained from NCBI GenBank (PP999684). The homologous sequences had a length of 5,511 bp in the cp genome, occupying 3.61% of the entire cp genome, whereas the homologous sequences in the mt genome were 5,553 bp in length, accounting for 1.53% of the entire mt genome (Table S8). A total of 15 fragments were observed in the L. sativa var. ramosa Hort mt genome, varying in length from 79 bp to 1,219 bp (Table 6). The cp-like sequences were 7,547 bp in length, accounting for 2.08% of the mt genome. Six complete tRNA genes (trnW-CCA, trnQ-TTG, trnD-GTC, trnH-GTG, trnN-GTT, and trnM-CAT) were identified, together with some homologous fragments of rrn18 genes. We also found 15 insertion regions in the cp genome of L. sativa var. ramosa Hort, comprising eight complete genes, including two PCGs (petL and petG) and five tRNA genes (trnW, trnP, trnD-GUC, trnN, and trnM), together with some homologous fragments of the rpoC1, rrn16, rbcL, infA, rps8, and ycf3 genes. Taken together, these findings suggest that the tRNA genes are more conserved than the PCGs and rRNAs in the mt genome of L. sativa var. ramosa Hort.

Table 6 Fragments transferred from cp to mt in L. sativa var. ramosa Hort.

Synteny analysis of mt genome sequences

As shown in Fig. 7, the dot-plot analysis identified longer syntenic sequences with higher similarity between L. sativa var. ramosa Hort and L. sativa var. capitata than between L. sativa var. ramosa Hort and the other Lactuca species, indicating that L. sativa var. ramosa Hort has a structure similar to that of L. sativa var. capitata. The off-diagonal signals in L. serriola were due to common repeat sequences. Furthermore, sequence rearrangement events were found in L. sativa, L. saligna, and L. virosa.

Fig. 7

Dot-plot graphs showing synteny sequences between mt genomes in Lactuca species compared to L. sativa var. ramosa Hort as the reference.

Discussion

Mitochondria are indispensable organelles in plants and are an important site of respiration and energy conversion. Mt genomes are characterized by slow evolution and high conservation, which has made them an ideal tool for evolutionary analysis of species38,39. In this study, we characterized the L. sativa var. ramosa Hort mt genome and compared it with those of related Lactuca species. The L. sativa var. ramosa Hort mt genome has a circular structure with a full length of 363,324 bp and a GC content of 45.35%, showing high similarity to L. sativa and L. sativa var. capitata (Table 2). GC content is an important indicator for evaluating species. The GC content of the L. sativa var. ramosa Hort mt genome (45.35%) was comparable to that of other reported Lactuca mt genomes (L. serriola, 45.36%; L. virosa, 45.27%; L. sativa var Salinas, 43.43%; L. saligna, 42.54%)17,40, and was higher than that of the L. sativa var. ramosa Hort cp genome (PP999684, 37.55%) sequenced by our research team. Non-coding sequences made up 81.62% of the complete L. sativa var. ramosa Hort mt genome, which is consistent with Brassica rapa var. Purpuraria41, Taraxacum mongolicum42, and Clematis acerifolia43. In addition, the PCGs generally ran from a start codon (ATG) to a stop codon (TGA, TAG, or TAA) and accounted for 9.46% of the whole mt genome. This agreed with Mesona chinensis Benth44 and Luffa cylindrica45 and might be related to an increase in repetitive sequences during evolution. The use of ACG as the initiation codon of cox1, as also reported in Diospyros oleifera, might be caused by RNA editing46.

Different codons encoding the same amino acid are used at different frequencies, which is interpreted as codon preference47. RSCU is an important index for evaluating the codon usage pattern of plant mt genomes48. Codon preference has been widely applied in studies of the genetics, domestication, and systematic evolution of plant taxa49,50,51. In L. sativa var. ramosa Hort, 29 high-frequency codons with RSCU > 1 were identified, of which 27 (93.10%) ended with A or U, in agreement with previous studies52,53,54. In addition, leucine was the most frequently used amino acid in the L. sativa var. ramosa Hort mt genome, and similar results were found in the Conopomorpha sinensis55 and Perilla frutescens56 mt genomes.

RNA editing is widespread in plant mt genomes and is involved in plant development and stress response57. A total of 500 RNA-editing sites within all 35 PCGs were predicted in the L. sativa var. ramosa Hort mt genome, which is much higher than the numbers in Welwitschia (226)58, Garcinia mangostana L. variety Mesta (333)59, and Abelmoschus esculentus (281)60, and lower than those in Hypopitys monotropa (545)61 and Pulsatilla patens (902)8. Most RNA-editing sites in plant mt genomes have been found to be C-to-U conversions62. All 500 sites observed in the 35 PCGs were C-to-U edits, and no U-to-C sites were found in the L. sativa var. ramosa Hort mt genome, similar to the Cycas63 and Ginkgo58 mt genomes. The RNA-editing sites occurred at the first or second codon position, and none were observed at the third codon position in the L. sativa var. ramosa Hort mt genome. Similar results were obtained in Suaeda glauca32, Macadamia integrifolia34, and L. cylindrica45. These RNA-editing sites provide necessary clues for exploring evolution and predicting the function of genes with new codons, which could help us better understand gene expression in plant mt genomes.

Repetitive sequences, including tandem, short, and large repeats, are abundant in the mt genomes of higher plants; they vary from a few bp to tens of kb and account for 6.84%–58.34% of the entire mt genome64,65,66. Repetitive sequences are essential for intermolecular recombination, which can produce extreme mt genome sizes and structural variations5. SSRs are widely used in studies of population diversity, genetic stability, species identification, and phylogenetic analysis67. In L. sativa var. ramosa Hort, 110 SSRs were observed; 100% of the monomers were A or T and 30.43% of the dimers were TA, contributing to the AT richness (54.65%) of the L. sativa var. ramosa Hort mt genome. A high AT content was also found in the Ilex metabaptista mt genome68. Furthermore, the proportion of interspersed repeats in the L. sativa var. ramosa Hort mt genome (16%) was lower than that in Acer yangbiense (17.20%) and A. truncatum (18.24%), and the largest interspersed repeats in these three genomes were 34,696 bp, 27,124 bp, and 28,452 bp, respectively69,70. In addition, 15 tandem repeats were found in L. sativa var. ramosa Hort, far fewer than in Selenicereus monacanthus (94)71 and Cyperus esculentus L. (82)72. The repeats identified in this study provide valuable information for the future development of molecular markers and for studies of genetic evolution in Lactuca species.

Genetic diversity refers to the variation of genes within organisms, including genetic variation between distinct populations of the same species and between different species73. Studying the genetic diversity of crop populations helps us to better understand their genetic structure, highly variable regions, and genetic background74. Previous studies reported that highly variable regions could be developed into potential molecular markers for population genetics68,75. In L. sativa var. ramosa Hort, atp1 had the highest Pi value of all the PCGs, suggesting that it might be developed as a molecular marker for Lactuca species. The atp1 gene has been widely identified in plant mt genomes and is involved in ATP synthase76,77. In Ilex metabaptista, by contrast, atp9 (Pi = 0.114), which also plays an important role in ATP synthase, showed the largest variability68. In our study, five hotspots (atp1, nad2, nad6, cox2, and cob) were found that could serve as potential molecular markers. Three highly variable regions (atp9, sdh3, and cox2) were selected as molecular markers in the Ilex metabaptista mt genome68, while four hotspots (rpl5, atp8, rps3, and nad1) that might be used as molecular markers were obtained in the Piophila casei mt genome75. The low Pi values of most PCGs indicated that the L. sativa var. ramosa Hort mt genome is highly conserved.

Genome-wide data have been widely used to analyze evolutionary relationships among species68,78,79. It is not clear which lettuce species were involved in the domestication and/or diversification of L. sativa. From the perspective of the nuclear genome, L. serriola is considered one of the direct ancestors of L. sativa and its closest relative20,80. The rapid development of sequencing technologies and the recent increase in sequenced genomes have helped to clarify the relationships among Lactuca species. The Lactuca species clustered well and were subdivided into several clades, including L. sativa, L. serriola, L. virosa, and L. saligna81. In this study, a Bayesian phylogenetic tree was constructed from the mt genomes of 27 Asteraceae species and one outgroup. L. sativa var. ramosa Hort clustered well with the species of the genus Lactuca and was closely related to L. sativa var. capitata and L. virosa, indicating that L. sativa var. ramosa Hort belongs to the genus Lactuca in the family Asteraceae. Similar results were obtained from whole-genome resequencing of 445 Lactuca species81. Additionally, the synteny analysis showed that L. sativa var. ramosa Hort has a structure similar to that of L. sativa var. capitata, whereas sequence rearrangement events were observed in L. sativa, L. saligna, and L. virosa relative to L. sativa var. ramosa Hort. Although the L. sativa var. ramosa Hort mt genome is the same size as that of L. sativa var. capitata, it shows minor differences in gene content (Table 2). These differences might be caused by gene mutation, homologous sequence interference, or sequencing artifacts, or by the different genetic characteristics of the Lactuca varieties.
Intraspecific variations in mt genome sequence and gene content were identified among the six Lactuca varieties, which helped to distinguish this genome from previously sequenced L. sativa mt genomes.

Sequences that have migrated from the cp genome can be found in plant mt genomes, usually accounting for 1–12% of the whole mt genome82. About 33.33% of tRNA genes originated from the cp genome and gradually migrated during evolution83. The total length of migrated sequences varies with the plant species, from 50 kb (Arabidopsis thaliana) to 1.1 Mb (Oryza sativa subsp. japonica)84. In our study, 15 fragments with a total length of 7,547 bp (2.08% of the L. sativa var. ramosa Hort mt genome) had migrated from the cp genome to the mt genome, implying that these transferred fragments might play an important role in evolution. Seven genes, comprising six tRNA genes (trnW-CCA, trnQ-TTG, trnD-GTC, trnH-GTG, trnN-GTT, and trnM-CAT) and rrn18, had migrated between the cp and mt genomes. According to previous studies on higher plants, about 42% of the cp genome fragments were integrated into the Vitis vinifera mt genome, which is 773,279 bp long, including more than 30 cp PCGs and 17 tRNA genes85. In addition, more than 113 kb of cp-derived sequences were found in the Cucurbita pepo mt genome, and most of the transferred genes were tRNA genes5. Taken together, these findings suggest that tRNA genes are more conserved than PCGs in the mt genome of L. sativa var. ramosa Hort, which might be a characteristic of mt genome evolution in Lactuca species.

Conclusion

In this study, we sequenced the L. sativa var. ramosa Hort mt genome and assembled it into a typical circular structure. The genome is 363,324 bp long, with a GC content of 45.35%, and contains 71 genes: 35 PCGs, 6 rRNAs, 28 tRNAs, and 2 pseudogenes. We then analyzed codon preference, SSRs, tandem repeats, and interspersed repeats in the L. sativa var. ramosa Hort mt genome. Additionally, 500 RNA-editing sites were detected in 35 PCGs, which is helpful for predicting the function of genes with new codons. Gene migration analysis showed that a total of 15 fragments, including six complete tRNA genes, had migrated from the cp genome to the mt genome. The low Pi values of most PCGs showed that the mt genome of L. sativa var. ramosa Hort is conserved. Phylogenetic analysis confirmed that L. sativa var. ramosa Hort, which belongs to the genus Lactuca in the family Asteraceae, is genetically close to L. sativa var. capitata and L. virosa. In summary, L. sativa var. ramosa Hort has a structure similar to that of L. sativa var. capitata but displays minor differences in gene content.