  • Article

The complete genome assembly of Nicotiana benthamiana reveals the genetic and epigenetic landscape of centromeres

Abstract

Nicotiana benthamiana is a model organism widely adopted in plant biology. Its complete assembly remains unavailable despite several recent improvements. To further improve its usefulness, we generate and phase the complete 2.85 Gb genome assembly of allotetraploid N. benthamiana. We find that although Solanaceae centromeres are widely dominated by Ty3/Gypsy retrotransposons, satellite-based centromeres are surprisingly common in N. benthamiana, with 11 of 19 centromeres featuring megabase-scale satellite arrays. Interestingly, both the satellite-enriched and the satellite-free centromeres are extensively invaded by distinct Gypsy retrotransposons that the CENH3 protein preferentially occupies, suggesting that these elements play crucial roles in centromere function. We demonstrate that ribosomal DNA is a major origin of centromeric satellites, and that mitochondrial DNA can be employed as a core component of the centromere. Subgenome analysis indicates that the emergence of satellite arrays probably drives new centromere formation. Altogether, we propose that N. benthamiana centromeres evolved via neocentromere formation, satellite expansion, retrotransposon enrichment and mtDNA integration.

Fig. 1: Whole-genome landscape of the N. benthamiana complete genome.
Fig. 2: Subgenome assignments of the allotetraploid N. benthamiana and tracing the potential diploid ancestors.
Fig. 3: Genetic and epigenetic landscape of centromeres in N. benthamiana.
Fig. 4: Satellites drive the formation of neocentromere in N. benthamiana.
Fig. 5: Precise kinetochore assembly sites and epigenetic centromere landscapes.
Fig. 6: Scenario for the palaeohistory of N. benthamiana genome evolution.

Data availability

The raw sequencing data and the genome assembly have been deposited at the China National Center for Bioinformation (https://ngdc.cncb.ac.cn/) under project number PRJCA022857. The nuclear genome assembly and annotation are also available in Zenodo (ref. 93). The mitochondrial and chloroplast genomes have been deposited at the China National Center for Bioinformation under accession numbers C_AA066595 and C_AA066594, respectively.

Code availability

This manuscript does not report original code.

References

  1. Jiang, J., Birchler, J. A., Parrott, W. A. & Dawe, R. K. A molecular view of plant centromeres. Trends Plant Sci. 8, 570–575 (2003).

  2. Zhang, H. et al. Boom-bust turnovers of megabase-sized centromeric DNA in Solanum species: rapid evolution of DNA sequences associated with centromeres. Plant Cell 26, 1436–1447 (2014).

  3. Wlodzimierz, P. et al. Cycles of satellite and transposon evolution in Arabidopsis centromeres. Nature 618, 557–565 (2023).

  4. Altemose, N. et al. Complete genomic and epigenetic maps of human centromeres. Science 376, eabl4178 (2022).

  5. Shang, L. et al. A complete assembly of the rice Nipponbare reference genome. Mol. Plant 16, 1232–1236 (2023).

  6. Naish, M. et al. The genetic and epigenetic landscape of the Arabidopsis centromeres. Science 374, eabi7489 (2021).

  7. Zhao, J. et al. Centromere repositioning and shifts in wheat evolution. Plant Commun. 4, 100556 (2023).

  8. Ahmed, H. I. et al. Einkorn genomics sheds light on history of the oldest domesticated wheat. Nature 620, 830–838 (2023).

  9. Liu, Q. et al. Non–B-form DNA tends to form in centromeric regions and has undergone changes in polyploid oat subgenomes. Proc. Natl Acad. Sci. USA 120, e2211683120 (2023).

  10. Chen, J. et al. A complete telomere-to-telomere assembly of the maize genome. Nat. Genet. 55, 1221–1231 (2023).

  11. Yang, X. et al. The gap-free potato genome assembly reveals large tandem gene clusters of agronomical importance in highly repeated genomic regions. Mol. Plant 16, 314–317 (2023).

  12. Chang, S. B. et al. FISH mapping and molecular organization of the major repetitive sequences of tomato. Chromosome Res. 16, 919–933 (2008).

  13. Nagaki, K. et al. Coexistence of NtCENH3 and two retrotransposons in tobacco centromeres. Chromosome Res. 19, 591–605 (2011).

  14. Chen, W. et al. Two telomere-to-telomere gapless genomes reveal insights into Capsicum evolution and capsaicinoid biosynthesis. Nat. Commun. 15, 4295 (2024).

  15. Ranawaka, B. et al. A multi-omic Nicotiana benthamiana resource for fundamental research and biotechnology. Nat. Plants 9, 1558–1571 (2023).

  16. Bombarely, A. et al. A draft genome sequence of Nicotiana benthamiana to enhance molecular plant–microbe biology research. Mol. Plant Microbe Interact. 25, 1523–1530 (2012).

  17. Kurotani, K. I. et al. Genome sequence and analysis of Nicotiana benthamiana, the model plant for interactions between organisms. Plant Cell Physiol. 64, 248–257 (2023).

  18. Wu, Y. et al. Phylogenomic discovery of deleterious mutations facilitates hybrid potato breeding. Cell 186, 2313–2328 (2023).

  19. Wang, J. et al. High-quality assembled and annotated genomes of Nicotiana tabacum and Nicotiana benthamiana reveal chromosome evolution and changes in defense arsenals. Mol. Plant 17, 423–437 (2024).

  20. Ko, S. R. et al. High-quality chromosome-level genome assembly of Nicotiana benthamiana. Sci. Data 11, 386 (2024).

  21. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).

  22. Matyasek, R., Fulnecek, J., Leitch, A. R. & Kovarik, A. Analysis of two abundant, highly related satellites in the allotetraploid Nicotiana arentsii using double-strand conformation polymorphism analysis and sequencing. New Phytol. 192, 747–759 (2011).

  23. Chen, C. M. et al. Two tandemly repeated telomere-associated sequences in Nicotiana plumbaginifolia. Chromosome Res. 5, 561–568 (1997).

  24. D’Andrea, L. et al. Polyploid Nicotiana section Suaveolentes originated by hybridization of two ancestral Nicotiana clades. Front. Plant Sci. 14, 999887 (2023).

  25. Jia, K. H. et al. SubPhaser: a robust allopolyploid subgenome phasing method based on subgenome-specific k-mers. New Phytol. 235, 801–809 (2022).

  26. Chase, M. W. et al. Molecular systematics, GISH and the origin of hybrid taxa in Nicotiana (Solanaceae). Ann. Bot. 92, 107–127 (2003).

  27. Wang, L. et al. A telomere-to-telomere gap-free assembly of soybean genome. Mol. Plant 16, 1711–1714 (2023).

  28. de Castro Nunes, R. et al. Structure and distribution of centromeric retrotransposons at diploid and allotetraploid Coffea centromeric and pericentromeric regions. Front. Plant Sci. 9, 175 (2018).

  29. Cauz-Santos, L. A. et al. Genomic insights into recent species divergence in Nicotiana benthamiana and natural variation in Rdr1 gene controlling viral susceptibility. Plant J. 111, 7–18 (2022).

  30. Yang, X. et al. Amplification and adaptation of centromeric repeats in polyploid switchgrass species. New Phytol. 218, 1645–1657 (2018).

  31. Puertas, M. J. & González-Sánchez, M. Insertions of mitochondrial DNA into the nucleus—effects and role in cell evolution. Genome 63, 365–374 (2020).

  32. Matsuo, M., Ito, Y., Yamauchi, R. & Obokata, J. The rice nuclear genome continuously integrates, shuffles, and eliminates the chloroplast genome to cause chloroplast-nuclear DNA flux. Plant Cell 17, 665–675 (2005).

  33. Schiavinato, M., Marcet‐Houben, M., Dohm, J. C., Gabaldón, T. & Himmelbauer, H. Parental origin of the allotetraploid tobacco Nicotiana benthamiana. Plant J. 102, 541–554 (2020).

  34. Ni, P. et al. Genome-wide detection of cytosine methylations in plant from Nanopore data using deep learning. Nat. Commun. 12, 5976 (2021).

  35. Wang, S. et al. Phylotranscriptomics supports numerous polyploidization events and phylogenetic relationships in Nicotiana. Front. Plant Sci. 14, 1205683 (2023).

  36. Clarkson, J. J., Dodsworth, S. & Chase, M. W. Time-calibrated phylogenetic trees establish a lag between polyploidisation and diversification in Nicotiana (Solanaceae). Plant Syst. Evol. 303, 1001–1012 (2017).

  37. Lim, K. Y. et al. Sequence of events leading to near-complete genome turnover in allopolyploid Nicotiana within five million years. New Phytol. 175, 756–763 (2007).

  38. Koukalova, B. et al. Fall and rise of satellite repeats in allopolyploids of Nicotiana over c. 5 million years. New Phytol. 186, 148–160 (2010).

  39. Gong, Z. et al. Repeatless and repeat-based centromeres in potato: implications for centromere evolution. Plant Cell 24, 3559–3574 (2012).

  40. Song, J. et al. Two gap-free reference genomes and a global view of the centromere architecture in rice. Mol. Plant 14, 1757–1767 (2021).

  41. Malik, H. S. & Henikoff, S. Major evolutionary transitions in centromere complexity. Cell 138, 1067–1082 (2009).

  42. Naish, M. & Henderson, I. R. The structure, function, and evolution of plant centromeres. Genome Res. 34, 161–178 (2024).

  43. Wei, W. et al. Nuclear-embedded mitochondrial DNA sequences in 66,083 human genomes. Nature 611, 105–114 (2022).

  44. Michalovová, M., Vyskot, B. & Kejnovsky, E. Analysis of plastid and mitochondrial DNA insertions in the nucleus (NUPTs and NUMTs) of six plant species: size, relative age and chromosomal localization. Heredity 111, 314–320 (2013).

  45. Zhang, M. et al. Preparation of megabase-sized DNA from a variety of organisms using the nuclei method for advanced genomics research. Nat. Protoc. 7, 467–478 (2012).

  46. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).

  47. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).

  48. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  49. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).

  50. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).

  51. Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016).

  52. Goel, M., Sun, H., Jiao, W. B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).

  53. Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).

  54. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).

  55. Ou, S., Chen, J. & Jiang, N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 46, e126 (2018).

  56. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics 5, 4–10 (2009).

  57. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).

  58. Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18 (2008).

  59. Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 176, 1410–1422 (2018).

  60. Zhang, R. G. et al. TEsorter: an accurate and fast method to classify LTR-retrotransposons in plant genomes. Hortic. Res. 9, uhac017 (2022).

  61. Cantarel, B. L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008).

  62. Ray, R. et al. A persistent major mutation in canonical jasmonate signaling is embedded in an herbivory-elicited gene network. Proc. Natl Acad. Sci. USA 120, e2308500120 (2023).

  63. Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).

  64. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).

  65. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).

  66. Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).

  67. Lomsadze, A., Ter-Hovhannisyan, V., Chernoff, Y. O. & Borodovsky, M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 33, 6494–6506 (2005).

  68. Stanke, M., Tzvetkova, A. & Morgenstern, B. AUGUSTUS at EGASP: using EST, protein and genomic alignments for improved gene prediction in the human genome. Genome Biol. 7, S11 (2006).

  69. Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom. Bioinformatics 3, lqaa108 (2021).

  70. Hu, J. et al. NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads. Genome Biol. 25, 107 (2024).

  71. Bi, C. et al. PMAT: an efficient plant mitogenome assembly toolkit using low-coverage HiFi sequencing data. Hortic. Res. 11, uhae023 (2024).

  72. Wick, R. R., Schultz, M. B., Zobel, J. & Holt, K. E. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics 31, 3350–3352 (2015).

  73. Tillich, M. et al. GeSeq – versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 45, W6–W11 (2017).

  74. Hao, Z. et al. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput. Sci. 6, e251 (2020).

  75. Nagaki, K., Kashihara, K. & Murata, M. A centromeric DNA sequence colocalized with a centromere-specific histone H3 in tobacco. Chromosoma 118, 249–257 (2009).

  76. Shi, X. et al. The complete reference genome for grapevine (Vitis vinifera L.) genetics and breeding. Hortic. Res. 10, uhad061 (2023).

  77. Wang, Y. H. et al. Telomere-to-telomere carrot (Daucus carota) genome assembly reveals carotenoid characteristics. Hortic. Res. 10, uhad103 (2023).

  78. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).

  79. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie2. Nat. Methods 9, 357–359 (2012).

  80. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  81. Ramirez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).

  82. Zhang, Y. et al. Model-based analysis of ChIP-seq (MACS). Genome Biol. 9, R137 (2008).

  83. Vollger, M. R., Kerpedjiev, P., Phillippy, A. M. & Eichler, E. E. StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics 38, 2049–2051 (2022).

  84. Simpson, J. T. et al. Detecting DNA cytosine methylation using nanopore sequencing. Nat. Methods 14, 407–410 (2017).

  85. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  86. Tang, H., Krishnakumar, V. & Li, J. jcvi: JCVI utility libraries. Zenodo https://doi.org/10.5281/ZENODO.31631 (2015).

  87. Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012).

  88. Langdon, Q. K., Peris, D., Kyle, B. & Hittinger, C. T. sppIDer: a species identification tool to investigate hybrid genomes with high-throughput sequencing. Mol. Biol. Evol. 35, 2835–2849 (2018).

  89. Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).

  90. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).

  91. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).

  92. Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).

  93. Chen, W. & Guo, L. Data used in ‘The complete genome assembly of Nicotiana benthamiana reveals genetic and epigenetic landscape of centromeres’. Zenodo https://doi.org/10.5281/zenodo.14010728 (2024).

Acknowledgements

We thank the Bioinformatics Platform at Peking University Institute of Advanced Agricultural Sciences for providing the high-performance computing resources. This work was supported by the Key R&D Program of Shandong Province (ZR202211070163 to L.G.), the Taishan Scholars Program and Natural Science Foundation for Distinguished Young Scholars (ZR2023JQ010 to L.G.) of Shandong Province.

Author information

Contributions

L.G. conceived and supervised the project. W.C. performed genome assembly and analysis of centromeric sequences. M.Y. prepared figures and tables. S.C., J.S. and J.W. performed epigenetic analysis. D.M. conducted epigenome sequencing. J.L. and L.Z. generated genome sequencing data. W.C. and L.G. wrote the paper. All authors read and approved the final version of the paper.

Corresponding author

Correspondence to Li Guo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Plants thanks Feng Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Whole-genome coverage of HiFi reads across the NB.T2T assembly.

Regions with locally anomalous coverage are shown as black lines. The rDNA arrays, satellites, NUPTs and NUMTs are marked on the bottom track. The red triangles and black asterisks represent the gaps in the HiFi assembly and in the HiFi & ONT assembly before gap closure, respectively.

Extended Data Fig. 2 Genome structure of the telomeres and subtelomeres.

a, Alignment of the subtelomeric regions between the two assemblies. Collinear regions are linked by gray lines, and assembly gaps in NB.PCP are marked by black lines. The accompanying tracks show HiFi coverage, ONT coverage, 5mC level, gene number and TE distribution, calculated in 20-kb bins. The heatmaps show the pairwise sequence identity (%) between 5-kb bins of the subtelomeric regions in Chr01. The histogram shows the satellite repeat length across all subtelomeric regions. Tsate181 and Tsate164 denote the 181-bp and 164-bp subtelomeric satellites, respectively. b, Lengths of telomeres and subtelomeres in the 19 chromosomes. c, Maximum-likelihood phylogenetic tree of Tsate181 sampled from 16 subtelomeres (L, left terminal; R, right terminal) and three interstitial subtelomeric sequences (marked as M), rooted using Tsate164. Colors indicate the chromosomal location from which each satellite originates.

Extended Data Fig. 3 Phased subgenomes of allotetraploid N. benthamiana.

a, Self-alignment of N. benthamiana chromosomes. b, Circos plot of the subgenome partitions of N. benthamiana. Tracks from outer to inner: subgenome assignments by a k-means algorithm, significant enrichment of subgenome-specific k-mers, normalized proportion of subgenome-specific k-mers, count of B-subgenome-specific k-mers, count of A-subgenome-specific k-mers, count of subgenome-specific LTR-RTs (in yellow and blue), and homoeologous blocks of each homoeologous chromosome set. c,d, Alignments of N. benthamiana chromosomes with its potential diploid ancestors N. sylvestris (c) and N. attenuata (d).

Extended Data Fig. 4 CENH3 protein alignments and CENH3 ChIP-seq values across whole genome.

a, Two antibodies against the N-terminal peptide (ARTKHLALRKQSRPPSRPTA) or the whole CENH3 protein were synthesized and used for the ChIP-seq experiments. b, CENH3 log2(ChIP/control) enrichment level across the 19 chromosomes. Diamonds mark the positions of the identified centromeres.

Extended Data Fig. 5 CENH3 enrichment, repeat distribution and StainedGlass heatmap of the 19 centromeres.

The CENH3 panels show the log2(ChIP/control) ratio calculated in 20-kb bins, plotted in 4-Mb windows for centromeres shorter than 3 Mb and in 9-Mb windows for centromeres longer than 3 Mb. The heatmaps show the pairwise sequence identity (%) between 5-kb sequences.
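
As an illustration of the binned log2(ChIP/control) quantity named in this legend, the following is a minimal Python sketch. It is not the authors' pipeline (the article reports no original code); the function names, the library-size scaling and the pseudocount of 1 are assumptions made only for this example, and read positions are supplied as plain lists rather than parsed from alignment files.

```python
import math
from collections import Counter

BIN_SIZE = 20_000  # 20-kb bins, as stated in the legend


def bin_counts(read_starts, bin_size=BIN_SIZE):
    """Count reads per fixed-width bin from 0-based read start positions."""
    return Counter(pos // bin_size for pos in read_starts)


def log2_enrichment(chip_starts, control_starts, chrom_length,
                    bin_size=BIN_SIZE, pseudocount=1.0):
    """Return one log2(ChIP/control) value per bin along a chromosome.

    Assumptions (not taken from the article): counts are scaled by the
    total number of reads in each library, and a pseudocount is added
    so that empty bins do not cause division by zero.
    """
    chip = bin_counts(chip_starts, bin_size)
    ctrl = bin_counts(control_starts, bin_size)
    chip_total = max(len(chip_starts), 1)
    ctrl_total = max(len(control_starts), 1)
    n_bins = math.ceil(chrom_length / bin_size)
    values = []
    for b in range(n_bins):
        chip_norm = (chip[b] + pseudocount) / chip_total
        ctrl_norm = (ctrl[b] + pseudocount) / ctrl_total
        values.append(math.log2(chip_norm / ctrl_norm))
    return values


if __name__ == "__main__":
    # Toy data: the second bin carries more ChIP reads than control reads.
    chip_reads = [5, 10, 20_005, 20_010, 20_020, 20_030]
    control_reads = [1, 2, 3, 20_001, 20_002, 20_003]
    print(log2_enrichment(chip_reads, control_reads, chrom_length=40_000))
```

Bins with positive values hold proportionally more ChIP reads than control reads, which is the kind of signal the legend uses to locate CENH3-occupied regions.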

Extended Data Fig. 6 Genetic and epigenetic landscape of NUMTs-type and satellite-type centromeres.

a,b, Schematic representation of the CENH3 enrichment level, GC content and distribution of different repeats in 20-kb bins across Chr02: 15.00-17.58 Mb (a) and Chr08: 118.74-123.12 Mb (b). The bottom panels show a conserved CEN02-spanning synteny block and a StainedGlass sequence identity heatmap of CEN08.

Extended Data Fig. 7 Identification of centromeric satellite repeats and phylogenetic analysis of Gypsy retrotransposons.

a, Circos plot showing the distribution of the different satellite families. b, Pairwise sequence alignments of the 19 centromeres. The red and blue lines indicate forward- and reverse-strand similarity, respectively. c, Maximum-likelihood phylogenetic tree of intact Gypsy retrotransposons, colored according to seven subfamilies. Asterisks on the branches indicate elements within the centromeres.

Extended Data Fig. 8 Satellites drive the formation of neocentromere (CEN10) in N. benthamiana.

a, A synteny block spanning the centromere CEN10 of N. benthamiana is conserved among N. benthamiana, tobacco, potato and pepper. b, CENH3 enrichment and Gypsy density in 20-kb windows within the CEN10-spanning synteny block between Chr09 and Chr10. Full descriptions of the panels are also available in the main Fig. 4 legend. StainedGlass sequence identity heatmaps are shown beneath.

Supplementary information

Supplementary Information

Supplementary Note 1, Methods and Figs. 1–20.

Reporting Summary

Supplementary Table

Supplementary Tables 1–13.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, W., Yan, M., Chen, S. et al. The complete genome assembly of Nicotiana benthamiana reveals the genetic and epigenetic landscape of centromeres. Nat. Plants 10, 1928–1943 (2024). https://doi.org/10.1038/s41477-024-01849-y

  • DOI: https://doi.org/10.1038/s41477-024-01849-y
