Abstract
Nicotiana benthamiana is a model organism widely adopted in plant biology. Despite several recent improvements, a complete assembly of its genome has remained unavailable. To further improve its usefulness, we generate and phase the complete 2.85 Gb genome assembly of allotetraploid N. benthamiana. We find that, although Solanaceae centromeres are widely dominated by Ty3/Gypsy retrotransposons, satellite-based centromeres are surprisingly common in N. benthamiana, with 11 of 19 centromeres featuring megabase-scale satellite arrays. Interestingly, the satellite-enriched and satellite-free centromeres are extensively invaded by distinct Gypsy retrotransposons that are preferentially occupied by CENH3 protein, suggesting that these elements have crucial roles in centromere function. We demonstrate that ribosomal DNA is a major origin of centromeric satellites and that mitochondrial DNA could be employed as a core component of the centromere. Subgenome analysis indicates that the emergence of satellite arrays probably drives new centromere formation. Altogether, we propose that N. benthamiana centromeres evolved via neocentromere formation, satellite expansion, retrotransposon enrichment and mtDNA integration.
Data availability
The raw sequencing data and the genome assembly have been deposited at the China National Center for Bioinformation (https://ngdc.cncb.ac.cn/) under project number PRJCA022857. The nuclear genome assembly and annotation are also available in Zenodo (ref. 93; https://doi.org/10.5281/zenodo.14010728). The mitochondrial and chloroplast genomes have been deposited at the China National Center for Bioinformation under accession numbers C_AA066595 and C_AA066594, respectively.
Code availability
This manuscript does not report original code.
References
Jiang, J., Birchler, J. A., Parrott, W. A. & Dawe, R. K. A molecular view of plant centromeres. Trends Plant Sci. 8, 570–575 (2003).
Zhang, H. et al. Boom-bust turnovers of megabase-sized centromeric DNA in Solanum species: rapid evolution of DNA sequences associated with centromeres. Plant Cell 26, 1436–1447 (2014).
Wlodzimierz, P. et al. Cycles of satellite and transposon evolution in Arabidopsis centromeres. Nature 618, 557–565 (2023).
Altemose, N. et al. Complete genomic and epigenetic maps of human centromeres. Science 376, eabl4178 (2022).
Shang, L. et al. A complete assembly of the rice Nipponbare reference genome. Mol. Plant 16, 1232–1236 (2023).
Naish, M. et al. The genetic and epigenetic landscape of the Arabidopsis centromeres. Science 374, eabi7489 (2021).
Zhao, J. et al. Centromere repositioning and shifts in wheat evolution. Plant Commun. 4, 100556 (2023).
Ahmed, H. I. et al. Einkorn genomics sheds light on history of the oldest domesticated wheat. Nature 620, 830–838 (2023).
Liu, Q. et al. Non–B-form DNA tends to form in centromeric regions and has undergone changes in polyploid oat subgenomes. Proc. Natl Acad. Sci. USA 120, e2211683120 (2023).
Chen, J. et al. A complete telomere-to-telomere assembly of the maize genome. Nat. Genet. 55, 1221–1231 (2023).
Yang, X. et al. The gap-free potato genome assembly reveals large tandem gene clusters of agronomical importance in highly repeated genomic regions. Mol. Plant 16, 314–317 (2023).
Chang, S. B. et al. FISH mapping and molecular organization of the major repetitive sequences of tomato. Chromosome Res. 16, 919–933 (2008).
Nagaki, K. et al. Coexistence of NtCENH3 and two retrotransposons in tobacco centromeres. Chromosome Res. 19, 591–605 (2011).
Chen, W. et al. Two telomere-to-telomere gapless genomes reveal insights into Capsicum evolution and capsaicinoid biosynthesis. Nat. Commun. 15, 4295 (2024).
Ranawaka, B. et al. A multi-omic Nicotiana benthamiana resource for fundamental research and biotechnology. Nat. Plants 9, 1558–1571 (2023).
Bombarely, A. et al. A draft genome sequence of Nicotiana benthamiana to enhance molecular plant–microbe biology research. Mol. Plant Microbe Interact. 25, 1523–1530 (2012).
Kurotani, K. I. et al. Genome sequence and analysis of Nicotiana benthamiana, the model plant for interactions between organisms. Plant Cell Physiol. 64, 248–257 (2023).
Wu, Y. et al. Phylogenomic discovery of deleterious mutations facilitates hybrid potato breeding. Cell 186, 2313–2328 (2023).
Wang, J. et al. High-quality assembled and annotated genomes of Nicotiana tabacum and Nicotiana benthamiana reveal chromosome evolution and changes in defense arsenals. Mol. Plant 17, 423–437 (2024).
Ko, S. R. et al. High-quality chromosome-level genome assembly of Nicotiana benthamiana. Sci. Data 11, 386 (2024).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Matyasek, R., Fulnecek, J., Leitch, A. R. & Kovarik, A. Analysis of two abundant, highly related satellites in the allotetraploid Nicotiana arentsii using double-strand conformation polymorphism analysis and sequencing. New Phytol. 192, 747–759 (2011).
Chen, C. M. et al. Two tandemly repeated telomere-associated sequences in Nicotiana plumbaginifolia. Chromosome Res. 5, 561–568 (1997).
D’Andrea, L. et al. Polyploid Nicotiana section Suaveolentes originated by hybridization of two ancestral Nicotiana clades. Front. Plant Sci. 14, 999887 (2023).
Jia, K. H. et al. SubPhaser: a robust allopolyploid subgenome phasing method based on subgenome-specific k-mers. New Phytol. 235, 801–809 (2022).
Chase, M. W. et al. Molecular systematics, GISH and the origin of hybrid taxa in Nicotiana (Solanaceae). Ann. Bot. 92, 107–127 (2003).
Wang, L. et al. A telomere-to-telomere gap-free assembly of soybean genome. Mol. Plant 16, 1711–1714 (2023).
de Castro Nunes, R. et al. Structure and distribution of centromeric retrotransposons at diploid and allotetraploid Coffea centromeric and pericentromeric regions. Front. Plant Sci. 9, 175 (2018).
Cauz-Santos, L. A. et al. Genomic insights into recent species divergence in Nicotiana benthamiana and natural variation in Rdr1 gene controlling viral susceptibility. Plant J. 111, 7–18 (2022).
Yang, X. et al. Amplification and adaptation of centromeric repeats in polyploid switchgrass species. New Phytol. 218, 1645–1657 (2018).
Puertas, M. J. & González-Sánchez, M. Insertions of mitochondrial DNA into the nucleus—effects and role in cell evolution. Genome 63, 365–374 (2020).
Matsuo, M., Ito, Y., Yamauchi, R. & Obokata, J. The rice nuclear genome continuously integrates, shuffles, and eliminates the chloroplast genome to cause chloroplast-nuclear DNA flux. Plant Cell 17, 665–675 (2005).
Schiavinato, M., Marcet‐Houben, M., Dohm, J. C., Gabaldón, T. & Himmelbauer, H. Parental origin of the allotetraploid tobacco Nicotiana benthamiana. Plant J. 102, 541–554 (2020).
Ni, P. et al. Genome-wide detection of cytosine methylations in plant from Nanopore data using deep learning. Nat. Commun. 12, 5976 (2021).
Wang, S. et al. Phylotranscriptomics supports numerous polyploidization events and phylogenetic relationships in Nicotiana. Front. Plant Sci. 14, 1205683 (2023).
Clarkson, J. J., Dodsworth, S. & Chase, M. W. Time-calibrated phylogenetic trees establish a lag between polyploidisation and diversification in Nicotiana (Solanaceae). Plant Syst. Evol. 303, 1001–1012 (2017).
Lim, K. Y. et al. Sequence of events leading to near-complete genome turnover in allopolyploid Nicotiana within five million years. New Phytol. 175, 756–763 (2007).
Koukalova, B. et al. Fall and rise of satellite repeats in allopolyploids of Nicotiana over c. 5 million years. New Phytol. 186, 148–160 (2010).
Gong, Z. et al. Repeatless and repeat-based centromeres in potato: implications for centromere evolution. Plant Cell 24, 3559–3574 (2012).
Song, J. et al. Two gap-free reference genomes and a global view of the centromere architecture in rice. Mol. Plant 14, 1757–1767 (2021).
Malik, H. S. & Henikoff, S. Major evolutionary transitions in centromere complexity. Cell 138, 1067–1082 (2009).
Naish, M. & Henderson, I. R. The structure, function, and evolution of plant centromeres. Genome Res. 34, 161–178 (2024).
Wei, W. et al. Nuclear-embedded mitochondrial DNA sequences in 66,083 human genomes. Nature 611, 105–114 (2022).
Michalovová, M., Vyskot, B. & Kejnovsky, E. Analysis of plastid and mitochondrial DNA insertions in the nucleus (NUPTs and NUMTs) of six plant species: size, relative age and chromosomal localization. Heredity 111, 314–320 (2013).
Zhang, M. et al. Preparation of megabase-sized DNA from a variety of organisms using the nuclei method for advanced genomics research. Nat. Protoc. 7, 467–478 (2012).
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016).
Goel, M., Sun, H., Jiao, W. B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).
Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Ou, S., Chen, J. & Jiang, N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 46, e126 (2018).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics 5, 4–10 (2009).
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18 (2008).
Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 176, 1410–1422 (2018).
Zhang, R. G. et al. TEsorter: an accurate and fast method to classify LTR-retrotransposons in plant genomes. Hortic. Res. 9, uhac017 (2022).
Cantarel, B. L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008).
Ray, R. et al. A persistent major mutation in canonical jasmonate signaling is embedded in an herbivory-elicited gene network. Proc. Natl Acad. Sci. USA 120, e2308500120 (2023).
Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
Lomsadze, A., Ter-Hovhannisyan, V., Chernoff, Y. O. & Borodovsky, M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 33, 6494–6506 (2005).
Stanke, M., Tzvetkova, A. & Morgenstern, B. AUGUSTUS at EGASP: using EST, protein and genomic alignments for improved gene prediction in the human genome. Genome Biol. 7, S11 (2006).
Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom. Bioinformatics 3, lqaa108 (2021).
Hu, J. et al. NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads. Genome Biol. 25, 107 (2024).
Bi, C. et al. PMAT: an efficient plant mitogenome assembly toolkit using low-coverage HiFi sequencing data. Hortic. Res. 11, uhae023 (2024).
Wick, R. R., Schultz, M. B., Zobel, J. & Holt, K. E. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics 31, 3350–3352 (2015).
Tillich, M. et al. GeSeq – versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 45, W6–W11 (2017).
Hao, Z. et al. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput. Sci. 6, e251 (2020).
Nagaki, K., Kashihara, K. & Murata, M. A centromeric DNA sequence colocalized with a centromere-specific histone H3 in tobacco. Chromosoma 118, 249–257 (2009).
Shi, X. et al. The complete reference genome for grapevine (Vitis vinifera L.) genetics and breeding. Hortic. Res. 10, uhad061 (2023).
Wang, Y. H. et al. Telomere-to-telomere carrot (Daucus carota) genome assembly reveals carotenoid characteristics. Hortic. Res. 10, uhad103 (2023).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie2. Nat. Methods 9, 357–359 (2012).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Ramirez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
Zhang, Y. et al. Model-based analysis of ChIP-seq (MACS). Genome Biol. 9, R137 (2008).
Vollger, M. R., Kerpedjiev, P., Phillippy, A. M. & Eichler, E. E. StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics 38, 2049–2051 (2022).
Simpson, J. T. et al. Detecting DNA cytosine methylation using nanopore sequencing. Nat. Methods 14, 407–410 (2017).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Tang, H., Krishnakumar, V. & Li, J. jcvi: JCVI utility libraries. Zenodo https://doi.org/10.5281/ZENODO.31631 (2015).
Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012).
Langdon, Q. K., Peris, D., Kyle, B. & Hittinger, C. T. sppIDer: a species identification tool to investigate hybrid genomes with high-throughput sequencing. Mol. Biol. Evol. 35, 2835–2849 (2018).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Chen, W. & Guo, L. Data used in ‘The complete genome assembly of Nicotiana benthamiana reveals genetic and epigenetic landscape of centromeres’. Zenodo https://doi.org/10.5281/zenodo.14010728 (2024).
Acknowledgements
We thank the Bioinformatics Platform at Peking University Institute of Advanced Agricultural Sciences for providing the high-performance computing resources. This work was supported by the Key R&D Program of Shandong Province (ZR202211070163 to L.G.), the Taishan Scholars Program and Natural Science Foundation for Distinguished Young Scholars (ZR2023JQ010 to L.G.) of Shandong Province.
Author information
Contributions
L.G. conceived and supervised the project. W.C. performed genome assembly and analysis of centromeric sequences. M.Y. prepared figures and tables. S.C., J.S. and J.W. performed epigenetic analysis. D.M. conducted epigenome sequencing. J.L. and L.Z. generated genome sequencing data. W.C. and L.G. wrote the paper. All authors read and approved the final version of the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Plants thanks Feng Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Whole-genome coverage of HiFi reads across the NB.T2T assembly.
Regions with anomalous local coverage are shown as black lines. The rDNA arrays, satellites, NUPTs and NUMTs are marked on the bottom track. Red triangles and black asterisks represent gaps in the HiFi assembly and in the HiFi & ONT assembly before gap closure, respectively.
Extended Data Fig. 2 Genome structure of the telomeres and subtelomeres.
a, Subtelomere alignment between two assemblies. Collinear regions are linked by gray lines. Assembly gaps in NB.PCP are marked by black lines. The tracks below show HiFi coverage, ONT coverage, 5mC level, gene number and TE distribution calculated in 20-kb bins. The heatmaps show the pairwise sequence identity (%) between 5-kb bins of the subtelomeric regions in Chr01. The histogram shows the satellite repeat length across all subtelomeric regions. Tsate181 and Tsate164 represent the 181-bp and 164-bp subtelomeric satellites, respectively. b, Length of telomeres and subtelomeres in the 19 chromosomes. c, Maximum-likelihood phylogenetic tree of Tsate181 sampled from 16 subtelomeres (L, left terminal; R, right terminal) and three interstitial subtelomeric sequences (marked as M), rooted using Tsate164. Colors indicate the chromosomal location from which each satellite was sampled.
Extended Data Fig. 3 Phased subgenomes of allotetraploid N. benthamiana.
a, Self-alignment of N. benthamiana chromosomes. b, Circos plot of subgenome partitions of N. benthamiana. Tracks from outer to inner: subgenome assignments by a k-means algorithm, significant enrichment of subgenome-specific k-mers, normalized proportion of subgenome-specific k-mers, count of B subgenome-specific k-mers, count of A subgenome-specific k-mers, count of subgenome-specific LTR-RTs (in yellow and blue), and homoeologous blocks of each homoeologous chromosome set. c,d, Alignments of N. benthamiana chromosomes with those of its potential diploid ancestors, N. sylvestris (c) and N. attenuata (d).
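The k-means assignment named in panel b can be illustrated with a minimal sketch. It is not the pipeline used for the figure: the input format, the function names (`two_means_1d`, `assign_subgenomes`) and the example counts are hypothetical, and the clustering is a plain one-dimensional Lloyd's algorithm with two clusters. Each chromosome is described by the proportion of its differential k-mers that fall into the second of two k-mer groups, and chromosomes are split into two subgenomes on that proportion.

```python
def two_means_1d(values, max_iter=100):
    """Cluster scalars into two groups with Lloyd's algorithm; return labels 0/1."""
    centers = [min(values), max(values)]  # start from the two extremes
    labels = None
    for _ in range(max_iter):
        # Assign each value to the nearer center (ties go to cluster 0).
        new_labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Move each center to the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels


def assign_subgenomes(kmer_counts):
    """Map {chrom: (count_group1, count_group2)} to {chrom: 'A' or 'B'}.

    Assumption: the two k-mer groups are already known; label 'B' is given
    to the cluster with the higher proportion of group-2 k-mers.
    """
    chroms = sorted(kmer_counts)
    proportions = []
    for chrom in chroms:
        group1, group2 = kmer_counts[chrom]
        total = group1 + group2
        proportions.append(group2 / total if total else 0.5)
    labels = two_means_1d(proportions)
    return {c: ("B" if lab == 1 else "A") for c, lab in zip(chroms, labels)}
```

Calling `assign_subgenomes({"Chr01": (900, 100), "Chr02": (120, 880)})` on these made-up counts places Chr01 in subgenome A and Chr02 in subgenome B. Dedicated subgenome-phasing tools add statistical tests for k-mer enrichment, which this sketch omits.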
Extended Data Fig. 4 CENH3 protein alignments and CENH3 ChIP-seq values across whole genome.
a, Two antibodies, raised against the N-terminal peptide (ARTKHLALRKQSRPPSRPTA) or the whole CENH3 protein, were generated and used for the ChIP-seq experiments. b, CENH3 log2(ChIP/control) enrichment level across the 19 chromosomes. Diamonds mark the positions of the identified centromeres.
Extended Data Fig. 5 CENH3 enrichment, repeat distribution and StainedGlass heatmap of the 19 centromeres.
The CENH3 panels show log2(ChIP/control) calculated in 20-kb bins within 4-Mb or 9-Mb windows for centromeres shorter or longer than 3 Mb, respectively. The heatmaps show the pairwise sequence identity (%) between 5-kb sequences.
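As an illustration of the binned enrichment value described in this legend, the following sketch computes log2(ChIP/control) in fixed 20-kb bins from read start positions. It is a simplified stand-in, not the authors' procedure: the function name `binned_log2_ratio`, the library-size scaling and the pseudocount of 1 are assumptions, because the legend does not state how the ratio was normalized.

```python
import math
from collections import Counter


def binned_log2_ratio(chip_positions, control_positions, chrom_length,
                      bin_size=20_000, pseudocount=1.0):
    """Return log2(ChIP/control) for each fixed-size bin along one chromosome.

    Inputs are 0-based read start positions. Counts are scaled by the total
    number of reads in each library (an assumed normalization), and a
    pseudocount avoids division by zero in empty bins.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    chip = Counter(p // bin_size for p in chip_positions)
    control = Counter(p // bin_size for p in control_positions)
    chip_total = max(len(chip_positions), 1)
    control_total = max(len(control_positions), 1)

    ratios = []
    for b in range(n_bins):
        chip_norm = (chip[b] + pseudocount) / chip_total
        control_norm = (control[b] + pseudocount) / control_total
        ratios.append(math.log2(chip_norm / control_norm))
    return ratios
```

Bins with values well above zero would correspond to CENH3-enriched intervals. In practice the positions would come from aligned reads rather than hand-written lists.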
Extended Data Fig. 6 Genetic and epigenetic landscape of NUMTs-type and satellite-type centromeres.
a,b, Schematic representation of the CENH3 enrichment level, GC content and distribution of different repeats in 20-kb bins across Chr02: 15.00-17.58 Mb (a) and Chr08: 118.74-123.12 Mb (b). The bottom panels show a conserved CEN02-spanning synteny block and a StainedGlass sequence identity heatmap of CEN08.
Extended Data Fig. 7 Identification of centromeric satellite repeats and phylogenetic analysis of Gypsy retrotransposons.
a, Circos plot showing the distribution of the different satellite families. b, Pairwise sequence alignments of the 19 centromeres. The red and blue lines indicate forward- and reverse-strand similarity, respectively. c, Maximum-likelihood phylogenetic tree of intact Gypsy retrotransposons, colored according to seven subfamilies. Asterisks at the branches indicate elements within the centromeres.
Extended Data Fig. 8 Satellites drive the formation of neocentromere (CEN10) in N. benthamiana.
a, A synteny block spanning the centromere CEN10 of N. benthamiana is conserved among N. benthamiana, tobacco, potato and pepper. b, CENH3 enrichment and Gypsy density in 20-kb windows within the CEN10-spanning synteny block between Chr09 and Chr10. Full descriptions for the panels are also available in the main Fig. 4 legend. StainedGlass sequence identity heatmaps are shown beneath.
Supplementary information
Supplementary Information
Supplementary Note 1, Methods and Figs. 1–20.
Supplementary Tables
Supplementary Tables 1–13.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, W., Yan, M., Chen, S. et al. The complete genome assembly of Nicotiana benthamiana reveals the genetic and epigenetic landscape of centromeres. Nat. Plants 10, 1928–1943 (2024). https://doi.org/10.1038/s41477-024-01849-y
DOI: https://doi.org/10.1038/s41477-024-01849-y