Abstract
Understanding the driving force of centromere dynamics is crucial for deciphering the complexity of eukaryotic evolution and speciation. Here we assembled 67 rice genomes from the Oryza AA group and analyzed >800 nearly complete centromeres. Through de novo annotation of centromeric satellite CEN155 sequences and a progressive compression strategy, we quantified the local homogenization and multilayer structures of rice satellite arrays. Our results indicate that genetic innovations in rice centromeres arise primarily from structural variations and centrophilic retrotransposon insertions. The single-base substitution rate in rice centromeres appears to be lower than that in chromosome arms. Comparisons of CEN155 arrays, retrotransposons and functional centromeres highlight their dynamic but correlated interplay. In contrast to the KARMA model of Arabidopsis centromere evolution, we propose the hypothesis that retrotransposon invasion probably contributes to the decline of progenitor centromeric satellite arrays and promotes centromere repositioning, as evidenced by CENH3 chromatin immunoprecipitation sequencing (ChIP–seq) enrichment extending beyond the native satellite arrays.
Data availability
Raw PacBio HiFi reads for 46 rice accessions, raw ONT sequencing reads for 10 accessions and CENH3 ChIP–seq NGS reads for 10 accessions generated in this study have been deposited in the National Genomics Data Center under BioProject, accession no. PRJCA025388 (https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA025388), with the Genome Sequence Archive nos. CRA016014 (https://ngdc.cncb.ac.cn/gsa/browse/CRA016014), CRA017638 (https://ngdc.cncb.ac.cn/gsa/browse/CRA017638) and CRA017653 (https://ngdc.cncb.ac.cn/gsa/browse/CRA017653), respectively. The newly generated genome assemblies in this study are available at NCBI (BioProject, accession no. PRJNA1276249) and via Zenodo at https://doi.org/10.5281/zenodo.12770803 (ref. 81). The genome assemblies of NIP, MH63 and ZS97 are available at the RiceSuperPIRdb (http://ricesuperpir.com/) and the Rice Information GateWay (http://rice.hzau.edu.cn/). TE and gene annotation files of 70 rice genomes are available via Zenodo at https://doi.org/10.5281/zenodo.12698984 (ref. 82). Supporting materials for centromere assembly quality control for each sample are available via Zenodo at https://doi.org/10.5281/zenodo.14286880 (ref. 83), including read-mapping coverage plots, NucFreq plots, GCI coverage plots and VerityMap plots. Rice centromere annotation and comparison plots for all accessions and chromosomes are available via Zenodo at https://doi.org/10.5281/zenodo.12702715 (ref. 84), including similarity heatmap plots generated by StainedGlass for each centromere, whole-genome synteny to the NIP reference assembly, centromere synteny against the NIP and NJ11 assemblies and centromere composition for all chromosomes. Source data are provided with this paper.
Code availability
The SynPan-CEN code is available via GitHub at https://github.com/Darlene1997/SynPan-CEN (ref. 85) and the scripts for the progressive compression strategy in deciphering the satellite organization and additional in-house codes associated with this study (including assembly, annotation and visualization) are available via GitHub at https://github.com/dongyawu/CenTools (ref. 86). The code and scripts are also available via Zenodo at https://doi.org/10.5281/zenodo.16990314 (ref. 87). The visualization of centromere annotation and synteny tracks was performed using ggplot2 in R (v.4.3.1, https://www.r-project.org/).
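The progressive compression strategy is implemented in the CenTools repository, and its details are not reproduced in this section. As a rough illustration only, the sketch below assumes that each CEN155 monomer has already been assigned a cluster label and that one compression step collapses tandem copies of any k-label unit into a single copy, applied successively for k = 1, 2, 3. The function names `compress_tandem` and `progressive_compress` are hypothetical and are not taken from CenTools. The greedy left-to-right scan is one simple choice among several.

```python
from typing import List, Sequence


def compress_tandem(tokens: Sequence[str], k: int) -> List[str]:
    """Collapse back-to-back copies of any k-label unit into one copy.

    Hypothetical sketch: a greedy left-to-right scan. At each position, if the
    next k labels are immediately repeated, the whole tandem run is replaced by
    a single copy of the unit; otherwise one label is emitted and the scan moves on.
    """
    seq = list(tokens)
    out: List[str] = []
    i = 0
    while i < len(seq):
        unit = seq[i:i + k]
        copies = 1
        # Count how many times the unit repeats without interruption.
        while len(unit) == k and seq[i + copies * k:i + (copies + 1) * k] == unit:
            copies += 1
        if copies >= 2:
            out.extend(unit)
            i += copies * k
        else:
            out.append(seq[i])
            i += 1
    return out


def progressive_compress(tokens: Sequence[str], max_k: int = 3) -> List[List[str]]:
    """Return the input followed by one compressed layer per unit length 1..max_k."""
    layers = [list(tokens)]
    for k in range(1, max_k + 1):
        layers.append(compress_tandem(layers[-1], k))
    return layers


if __name__ == "__main__":
    satellites = list("AAABABABCC")  # toy labels, not real CEN155 data
    for k, layer in enumerate(progressive_compress(satellites)):
        print(k, "".join(layer), f"{len(layer) / len(satellites):.2f}")
```

On the toy string AAABABABCC, the monomer step yields ABABABC and the dimer step yields ABC. The shrinking length of each layer relative to the input could serve as one crude indicator of how locally homogenized an array is.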
References
Barra, V. & Fachinetti, D. The dark side of centromeres: types, causes and consequences of structural abnormalities implicating centromeric DNA. Nat. Commun. 9, 4340 (2018).
Naish, M. & Henderson, I. R. The structure, function, and evolution of plant centromeres. Genome Res. 34, 161–178 (2024).
Melters, D. P. et al. Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution. Genome Biol. 14, R10 (2013).
Ahmed, H. I. et al. Einkorn genomics sheds light on history of the oldest domesticated wheat. Nature 620, 830–838 (2023).
Naish, M. et al. The genetic and epigenetic landscape of the Arabidopsis centromeres. Science 374, eabi7489 (2021).
Altemose, N. et al. Complete genomic and epigenetic maps of human centromeres. Science 376, eabl4178 (2022).
Huang, Z. et al. Evolutionary analysis of a complete chicken genome. Proc. Natl Acad. Sci. USA 120, e2216641120 (2023).
Wang, T. et al. A complete gap-free diploid genome in Saccharum complex and the genomic footprints of evolution in the highly polyploid Saccharum genus. Nat. Plants 9, 554–571 (2023).
Logsdon, G. A. et al. The variation and evolution of complete human centromeres. Nature 629, 136–145 (2024).
Cheng, Z. et al. Functional rice centromeres are marked by a satellite repeat and a centromere-specific retrotransposon. Plant Cell 14, 1691–1704 (2002).
Lv, Y. et al. A centromere map based on super pan-genome highlights the structure and function of rice centromeres. J. Integr. Plant Biol. 66, 196–207 (2024).
Malik, H. S. & Henikoff, S. Major evolutionary transitions in centromere complexity. Cell 138, 1067–1082 (2009).
Kursel, L. E. & Malik, H. S. Centromeres. Curr. Biol. 26, R487–R490 (2016).
Gent, J. I., Wang, N. & Dawe, R. K. Stable centromere positioning in diverse sequence contexts of complex and satellite centromeres of maize and wild relatives. Genome Biol. 18, 121 (2017).
Liu, Y. et al. Pan-centromere reveals widespread centromere repositioning of soybean genomes. Proc. Natl Acad. Sci. USA 120, e2310177120 (2023).
Zhang, T. et al. The CentO satellite confers translational and rotational phasing on CENH3 nucleosomes in rice centromeres. Proc. Natl Acad. Sci. USA 110, E4875–E4883 (2013).
Chen, J. et al. A complete telomere-to-telomere assembly of the maize genome. Nat. Genet. 55, 1221–1231 (2023).
Stein, J. C. et al. Genomes of 13 domesticated and wild rice relatives highlight genetic conservation, turnover and innovation across the genus Oryza. Nat. Genet. 50, 1618 (2018).
Song, J. et al. Two gap-free reference genomes and a global view of the centromere architecture in rice. Mol. Plant 14, 1757–1767 (2021).
Shang, L. et al. A complete assembly of the rice Nipponbare reference genome. Mol. Plant 16, 1232–1236 (2023).
Cheng, H. et al. Haplotype-resolved assembly of diploid genomes without parental data. Nat. Biotechnol. 40, 1332–1335 (2022).
Rautiainen, M. et al. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat. Biotechnol. 41, 1474–1482 (2023).
Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
Mikheenko, A., Bzikadze, A. V., Gurevich, A., Miga, K. H. & Pevzner, P. A. TandemTools: mapping long reads and assessing/improving assembly quality in extra-long tandem repeats. Bioinformatics 36, i75–i83 (2020).
Vollger, M. R. et al. Long-read sequence and assembly of segmental duplications. Nat. Methods 16, 88–94 (2019).
Cheng, Z., Buell, C. R., Wing, R. A., Gu, M. & Jiang, J. Toward a cytological characterization of the rice genome. Genome Res. 11, 2133–2141 (2001).
Lian, Q. et al. A pan-genome of 69 Arabidopsis thaliana accessions reveals a conserved genome structure throughout the global species range. Nat. Genet. 56, 982–991 (2024).
Wu, D. et al. A syntelog-based pan-genome provides insights into rice domestication and de-domestication. Genome Biol. 24, 179 (2023).
Gong, H. & Han, B. Genetic introgression between different groups reveals the differential process of Asian cultivated rice. Sci. Rep. 12, 17662 (2022).
Rosandić, M. et al. CENP-B box and pJα sequence distribution in human alpha satellite higher-order repeats (HOR). Chromosome Res. 14, 735–753 (2006).
Rice, W. R. A game of thrones at human centromeres I. Multifarious structure necessitates a new molecular/evolutionary model. Preprint at bioRxiv https://doi.org/10.1101/731430 (2020).
Masumoto, H., Masukata, H., Muro, Y., Nozaki, N. & Okazaki, T. A human centromere antigen (CENP-B) interacts with a short specific sequence in alphoid DNA, a human centromeric satellite. J. Cell Biol. 109, 1963–1973 (1989).
Kipling, D., Wilson, H. E., Mitchell, A. R., Taylor, B. A. & Cooke, H. J. Mouse centromere mapping using oligonucleotide probes that detect variants of the minor satellite. Chromosoma 103, 46–55 (1994).
Kugou, K., Hirai, H., Masumoto, H. & Koga, A. Formation of functional CENP-B boxes at diverse locations in repeat units of centromeric DNA in New World monkeys. Sci. Rep. 6, 27833 (2016).
Cappelletti, E. et al. The localization of centromere protein A is conserved among tissues. Commun. Biol. 6, 963 (2023).
Gaff, C. et al. A novel nuclear protein binds centromeric alpha satellite DNA. Hum. Mol. Genet. 3, 711–716 (1994).
Wlodzimierz, P. et al. Cycles of satellite and transposon evolution in Arabidopsis centromeres. Nature 618, 557–565 (2023).
Kubo, T. & Yoshimura, A. Genetic basis of hybrid breakdown in a Japonica/Indica cross of rice, Oryza sativa L. Theor. Appl. Genet. 105, 906–911 (2002).
Bensasson, D. Evidence for a high mutation rate at rapidly evolving yeast centromeres. BMC Evol. Biol. 11, 211 (2011).
Minton, K. Tandem repeat variation of human centromeres. Nat. Rev. Genet. 25, 455 (2024).
Schneider, K. L., Xie, Z., Wolfgruber, T. K. & Presting, G. G. Inbreeding drives maize centromere evolution. Proc. Natl Acad. Sci. USA 113, E987–E996 (2016).
Irvine, D. V. et al. Chromosome size and origin as determinants of the level of CENP-A incorporation into human centromeres. Chromosome Res. 12, 805–815 (2004).
Plačková, K., Bureš, P. & Zedek, F. Centromere size scales with genome size across eukaryotes. Sci. Rep. 11, 19811 (2021).
Wang, N., Liu, J., Ricci, W. A., Gent, J. I. & Dawe, R. K. Maize centromeric chromatin scales with changes in genome size. Genetics 217, iyab020 (2021).
Bilinski, P. et al. Diversity and evolution of centromere repeats in the maize genome. Chromosoma 124, 57–65 (2015).
Rice, W. R. A game of thrones at human centromeres II. A new molecular/evolutionary model. Preprint at bioRxiv https://doi.org/10.1101/731471 (2019).
Talbert, P. B. & Henikoff, S. Centromeres organize (epi)genome architecture. Cell 185, 3083–3085 (2022).
Wu, Z. et al. De novo genome assembly of Oryza granulata reveals rapid genome expansion and adaptive evolution. Commun. Biol. 1, 84 (2018).
Zhang, Y. et al. The telomere-to-telomere gap-free genome of four rice parents reveals SV and PAV patterns in hybrid rice breeding. Plant Biotechnol. J. 20, 1642–1644 (2022).
Sedeek, K. et al. Multi-omics resources for targeted agronomic improvement of pigmented rice. Nat. Food 4, 366–371 (2023).
Shang, L. et al. A super pan-genomic landscape of rice. Cell Res. 32, 878–896 (2022).
Nagaki, K. et al. Chromatin immunoprecipitation reveals that the 180-bp satellite repeat is the key functional DNA element of Arabidopsis thaliana centromeres. Genetics 163, 1221–1225 (2003).
Sim, S. B., Corpuz, R. L., Simmonds, T. J. & Geib, S. M. HiFiAdapterFilt, a memory efficient read processing pipeline, prevents occurrence of adapter sequence in PacBio HiFi reads and their negative impacts on genome assembly. BMC Genom. 23, 157 (2022).
Cheng, H., Asri, M., Lucas, J., Koren, S. & Li, H. Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph. Nat. Methods 21, 967–970 (2024).
Alonge, M. et al. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol. 23, 258 (2022).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Jain, C., Rhie, A., Hansen, N. F., Koren, S. & Phillippy, A. M. Long-read mapping to repetitive reference sequences using Winnowmap2. Nat. Methods 19, 705–710 (2022).
Hu, J. et al. NextPolish2: a repeat-aware polishing tool for genomes assembled using HiFi long reads. Genom. Proteom. Bioinform. 22, qzad009 (2024).
Bzikadze, A. V., Mikheenko, A. & Pevzner, P. A. Fast and accurate mapping of long reads to complete genome assemblies with VerityMap. Genome Res. 32, 2107–2118 (2022).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26, 1641–1650 (2009).
Vollger, M. R., Kerpedjiev, P., Phillippy, A. M. & Eichler, E. E. StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics 38, 2049–2051 (2022).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Lin, J. et al. SVision: a deep learning approach to resolve complex structural variants. Nat. Methods 19, 1230–1233 (2022).
Edgar, R. C. Muscle5: high-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny. Nat. Commun. 13, 6968 (2022).
Bailey, T. L., Johnson, J., Grant, C. E. & Noble, W. S. The MEME suite. Nucleic Acids Res. 43, W39–W49 (2015).
Haas, B. J., Delcher, A. L., Wortman, J. R. & Salzberg, S. L. DAGchainer: a tool for mining segmental genome duplications and synteny. Bioinformatics 20, 3643–3646 (2004).
Rice, P., Longden, L. & Bleasby, A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000).
Mistry, J. et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).
Katoh, K. & Standley, D. M. MAFFT Multiple Sequence Alignment Software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 49, W293–W296 (2021).
Mayor, C. et al. VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics 16, 1046–1047 (2000).
Paradis, E. & Schliep, K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35, 526–528 (2019).
Thompson, J. D., Gibson, T. J. & Higgins, D. G. Multiple sequence alignment using ClustalW and ClustalX. Curr. Protoc. Bioinform. Chapter 2, Unit 2.3 (2002).
Chen, S. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta 2, e107 (2023).
Langmead, B., Wilks, C., Antonescu, V. & Charles, R. Scaling read aligners to hundreds of threads on general-purpose processors. Bioinformatics 35, 421–432 (2019).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
Xie, L. The newly-generated genome assemblies. Zenodo https://doi.org/10.5281/zenodo.12770803 (2025).
Xie, L. TE and gene annotation files. Zenodo https://doi.org/10.5281/zenodo.12698984 (2025).
Xie, L. Centromere assembly quality plots. Zenodo https://doi.org/10.5281/zenodo.14286880 (2025).
Xie, L. Rice centromere annotation and comparison plots. Zenodo https://doi.org/10.5281/zenodo.12702715 (2025).
Xie, L. The SynPan-CEN code. GitHub https://github.com/Darlene1997/SynPan-CEN (2025).
Wu, D. The CenTools code. GitHub https://github.com/dongyawu/CenTools (2025).
Xie, L. The SynPan-CEN code. Zenodo https://doi.org/10.5281/zenodo.16990314 (2025).
Acknowledgements
This work was supported by Biological Breeding-Major Projects (grant no. 2023ZD04076), China National Postdoctoral Program for Innovative Talents (grant no. BX20220269), China Postdoctoral Science Foundation (grant no. 2023M743045), Young Scientists Fund of the National Natural Science Foundation of China (grant no. 32300490), National Key Research and Development Program of China (grant no. 2019YFA0903904) and CIC MIC. We thank G. Zhang (Zhejiang University), Y. Mao (Shanghai Jiao Tong University), B. Wu (Sun Yat-sen University) and K. Wu (Zhejiang University) for constructive suggestions.
Author information
Contributions
D.W. conceived and initiated this study. L.S. collected the samples. D.W., Q.C. and M.S. performed the sequencing data quality control, centromere assembly and quality evaluation. L.X., D.W. and Y.S. performed the analysis of satellite sequence identification and clustering and satellite array organization. Y.H. and D.W. performed the annotation and centromeric insertion analysis of TEs. W.H., L.X. and S.B. conducted the CENH3 ChIP experiments. L.X., D.W. and S.Z. processed the ChIP–seq data and analyzed the epigenetic profiling of rice centromeres. L.F. and D.W. supervised all analyses. Q.Q., W.J., C.Y., L.S. and X.Z. provided suggestions on analysis, organization and writing. D.W., L.X. and Y.H. wrote the manuscript with input from all coauthors. All authors discussed the results and commented on the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Variation in rice centromere positioning.
a, Chromosome length and the ratio of long-arm length to short-arm length for each chromosome. b, Relationships between chromosome size and CEN155 satellite array size across different taxonomic groups. Linear regression analyses were performed to evaluate the relationships between variables, with P values from two-sided t-tests. Shaded areas represent 95% confidence intervals of the fitted regression lines. c, Two megabase-scale inversions in the centromere region of chromosome Chr06.
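For readers who want to see what a two-sided t-test on a regression slope computes, the following is a minimal, dependency-free sketch of ordinary least squares with a slope test, using the identity that the two-sided P value for Student's t with ν degrees of freedom equals the regularized incomplete beta function I evaluated at ν/(ν + t²) with parameters ν/2 and 1/2. In practice a library routine such as SciPy's `scipy.stats.linregress` reports the same slope P value. The function names and toy numbers are illustrative assumptions, not the analysis code or data behind panel b.

```python
import math
from typing import Sequence, Tuple


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: int) -> float:
    """P(|T| >= |t|) for Student's t distribution with df degrees of freedom."""
    return betai(df / 2.0, 0.5, df / (df + t * t))


def ols_slope_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (slope, intercept, t statistic, two-sided P) for y = intercept + slope * x.

    Assumes at least three points and residuals that are not all zero.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se = math.sqrt(rss / df / sxx)  # standard error of the slope
    t = slope / se
    return slope, intercept, t, t_two_sided_p(t, df)
```

With toy values x = [1, 2, 3, 4, 5] and y = [2.1, 3.9, 6.2, 7.8, 10.1], the fitted slope is 1.99 with three residual degrees of freedom, and the slope P value falls well below 0.001.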
Extended Data Fig. 2 StainedGlass sequence identity heat maps of centromeres from chromosomes Chr02, Chr05 and Chr04.
Representative centromere haplotypes for each chromosome are shown.
Extended Data Fig. 3 Centromere divergence and fission on rice chromosome Chr12.
a, Centromere similarity and structural variations relative to NIP on chromosome Chr12, showing divergent centromere haplotypes (CenHaps) and putative centromere introgression events (for example, CW15 and MH63). Left, a maximum-likelihood phylogenetic tree of rice accessions based on chromosome Chr12. b, StainedGlass sequence similarity heat maps within and between CEN155 arrays on chromosome Chr12 from SL044, CW09 and NJ11, and their synteny, indicating fission of the SL044-type centromere into the CW09 (X3) and NJ11 (X4) types. TEs (blue) and gene-like elements (red) are shown. The retrotransposon RETROSAT-2C shared by all three accessions near the junction is highlighted. c, Comparison of phylogenetic trees built from SNPs in the 500-kb regions upstream and downstream of the Chr12 CEN155 array. Taxonomic information is represented by colored circles, and the position of SL044 is highlighted by a dashed line. d, Schematic representation of centromeric structural alterations, including introgression, duplication and fission (splitting).
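Panel c contrasts trees built from SNPs in the 500-kb flanks on either side of the array; an accession that groups with different relatives on the two sides can indicate introgression or rearrangement near the centromere. The sketch below, which uses hypothetical function names, extracts the two flanking windows and reports each accession's nearest neighbour by p-distance as a crude stand-in for comparing full tree topologies. It is not the tree-building procedure used for the figure; a full analysis would instead build one tree per window.

```python
from typing import Dict, Sequence, Tuple

Window = Tuple[int, int]


def flank_windows(array_start: int, array_end: int, flank: int = 500_000) -> Tuple[Window, Window]:
    """Half-open [start, end) windows immediately upstream and downstream of an array."""
    return (max(0, array_start - flank), array_start), (array_end, array_end + flank)


def nearest_neighbours(genotypes: Dict[str, str], positions: Sequence[int],
                       window: Window) -> Dict[str, str]:
    """For each accession, the other accession with the smallest p-distance in the window.

    genotypes maps accession name -> one allele character per SNP listed in positions.
    Ties are resolved in favour of the alphabetically first accession.
    """
    cols = [i for i, pos in enumerate(positions) if window[0] <= pos < window[1]]
    if not cols:
        raise ValueError("no SNPs in window")
    names = sorted(genotypes)
    result: Dict[str, str] = {}
    for a in names:
        best, best_d = "", float("inf")
        for b in names:
            if a == b:
                continue
            # p-distance: fraction of SNP sites in the window where alleles differ.
            d = sum(genotypes[a][i] != genotypes[b][i] for i in cols) / len(cols)
            if d < best_d:
                best, best_d = b, d
        result[a] = best
    return result
```

With invented genotypes in which one accession matches a GJ-like accession upstream of the array and an XI-like accession downstream, the two windows return different nearest neighbours for it, which is the kind of discordance a pair of flanking trees would display.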
Extended Data Fig. 4 Divergence sites between CEN155 superfamilies.
The CENP-B box-like and pJα-like motif regions are shown.
Extended Data Fig. 5 Inference of multimers and muHRs in rice centromeres.
The de Bruijn graphs are constructed based on the dimer-compressed satellite string of each centromere.
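The caption does not specify how the graphs are built. One common formulation, assumed here purely for illustration, treats each satellite label in the already compressed string as a symbol, uses runs of k−1 labels as nodes and each run of k labels as a weighted edge; a tandemly repeated multimer then appears as a heavily weighted cycle. The function names `de_bruijn` and `dominant_cycle` are hypothetical and unrelated to CenTools.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Node = Tuple[str, ...]


def de_bruijn(tokens: Sequence[str], k: int = 3) -> Counter:
    """Count edges (prefix (k-1)-mer -> suffix (k-1)-mer) for every k-mer of labels."""
    edges: Counter = Counter()
    for i in range(len(tokens) - k + 1):
        kmer = tuple(tokens[i:i + k])
        edges[(kmer[:-1], kmer[1:])] += 1
    return edges


def dominant_cycle(edges: Counter, start: Node) -> List[Node]:
    """Follow the heaviest outgoing edge from start until a node repeats; return that cycle."""
    best_out: Dict[Node, Tuple[Node, int]] = {}
    for (u, v), w in edges.items():
        if u not in best_out or w > best_out[u][1]:
            best_out[u] = (v, w)
    path: List[Node] = [start]
    seen = {start: 0}
    node = start
    while node in best_out:
        node = best_out[node][0]
        if node in seen:
            return path[seen[node]:]
        seen[node] = len(path)
        path.append(node)
    return []  # the walk reached a dead end without closing a cycle
```

For the toy string ABCABCABC with k = 2, the graph has three edges (A→B and B→C each seen three times, C→A twice), and the heaviest walk from A closes the cycle A, B, C, which corresponds to a trimer unit repeated in tandem.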
Extended Data Fig. 6 Structural variations in centromere regions.
a, Schematic diagram of structural variations in satellite arrays (SaSVs). A query satellite array is aligned against a reference array using satellites and TEs as markers. Based on syntenic pairing of CEN155 satellites and TEs, SVs involving more than 50 CEN155 satellite copies are defined as large expansions (LEs) or large contractions (LCs). Regions with continuously poor synteny to the reference array are referred to as divergent blocks. b, Number of SaSVs and size of the CEN155 satellites involved (top), and SaSV size distribution (bottom), in GJ and XI satellite arrays compared with their corresponding reference assemblies, NIP and NJ11, respectively. c, SaSVs in individual genomes in relation to phylogenetic kinship.
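The 50-copy threshold in panel a can be made concrete with a small sketch. It assumes, as a simplification, that syntenic pairing has already produced an ordered list of anchor pairs (query satellite index, reference satellite index) and that an SV is called from the net difference in unpaired satellites between consecutive anchors. TE markers and divergent blocks are not modelled, and the function name `call_large_svs` is hypothetical.

```python
from typing import List, Sequence, Tuple

Anchor = Tuple[int, int]  # (query satellite index, reference satellite index)


def call_large_svs(anchors: Sequence[Anchor], min_copies: int = 50) -> List[Tuple[str, int, int, int]]:
    """Call large expansions (LE) and contractions (LC) between consecutive syntenic anchors.

    anchors must be sorted and strictly increasing in both query and reference indices.
    Each call is (type, left index, right index, net copy difference); for LE the indices
    are query positions, for LC they are reference positions. Only differences of MORE
    than min_copies satellites are reported.
    """
    calls: List[Tuple[str, int, int, int]] = []
    for (q0, r0), (q1, r1) in zip(anchors, anchors[1:]):
        q_gap = q1 - q0 - 1  # unpaired query satellites between the two anchors
        r_gap = r1 - r0 - 1  # unpaired reference satellites between the two anchors
        net = q_gap - r_gap
        if net > min_copies:
            calls.append(("LE", q0, q1, net))
        elif -net > min_copies:
            calls.append(("LC", r0, r1, -net))
    return calls
```

For anchors [(0, 0), (1, 1), (60, 2), (61, 3), (62, 80)], the sketch reports an expansion of 58 query satellites between query indices 1 and 60 and a contraction of 76 reference satellites between reference indices 3 and 80; a gap of exactly 50 copies is not reported.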
Extended Data Fig. 7 CENH3 ChIP-seq enrichment and element annotation across Chr05 centromeres.
Top, CENH3 ChIP–seq enrichment (log2(ChIP/input), two replicates). Below, CEN155 superfamily, TE and sati annotations along each centromere.
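The enrichment tracks are reported as log2(ChIP/input). A minimal sketch of that calculation on binned read counts is shown below. It assumes per-library total-count scaling, a pseudocount of 1 and simple averaging of the two replicates; these are illustrative assumptions, since the normalization used for the figure is not stated in the caption. Tools such as deepTools bamCompare perform this kind of comparison directly on BAM files.

```python
import math
from typing import List, Sequence


def log2_enrichment(chip: Sequence[int], control: Sequence[int],
                    pseudocount: float = 1.0) -> List[float]:
    """Per-bin log2(ChIP/input) after scaling each library by its total read count.

    The pseudocount avoids log(0) in bins with no reads; bins must align between libraries.
    """
    chip_total, control_total = sum(chip), sum(control)
    return [
        math.log2(((c + pseudocount) / chip_total) / ((i + pseudocount) / control_total))
        for c, i in zip(chip, control)
    ]


def mean_of_replicates(tracks: Sequence[Sequence[float]]) -> List[float]:
    """Average per-bin values across replicate tracks."""
    return [sum(values) / len(values) for values in zip(*tracks)]
```

With toy counts of [10, 10, 80] ChIP reads and [20, 40, 40] input reads per bin and no pseudocount, the per-bin values are −1, −2 and 1, so only the third bin is enriched relative to input.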
Extended Data Fig. 8 A retrotransposon-induced centromere evolution model, summarized from the rice centromere analysis.
This model highlights the multilayer structure of rice satellite arrays generated by local homogenization and emphasizes the role of LTR retrotransposon invasion in triggering satellite array degeneration and centromere repositioning, as determined by CENH3 occupancy.
Supplementary information
Supplementary Information (PDF)
Supplementary Figs. 1–52 and Notes 1–8.
Supplementary Tables (XLSX)
Supplementary Tables 1–11.
Source data
Source Data Fig. 1 (XLSX)
Statistical source data.
Source Data Fig. 5 (XLSX)
Statistical source data.
Source Data Fig. 6 (XLSX)
Statistical source data.
Source Data Extended Data Fig. 1 (XLSX)
Statistical source data.
Statistical source data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xie, L., Huang, Y., Huang, W. et al. Genetic diversity and evolution of rice centromeres. Nat Genet 57, 2808–2818 (2025). https://doi.org/10.1038/s41588-025-02365-1