Abstract
Water lilies are among the most basal groups of angiosperms and retain many morphological and physiological traits of early angiosperms, making them invaluable for studying angiosperm evolution, particularly floral organ development. Here we present the most comprehensive phylogeny of the genus Nymphaea to date, alongside gap-free genome assemblies for three species (Nymphaea colorata, Nymphaea thermarum and Nymphaea caerulea). Our analyses resolve two major clades, day-flowering (section A) and night-flowering (section B), which diverged approximately 50 million years ago. Comparative genomics reveals an angiosperm-exclusive pectin lyase gene specifically expressed during pollen tube elongation. Regarding floral traits, we identify the transcription factor NcolMYB75-like as a master regulator of blue anthocyanin biosynthesis. Furthermore, the expansion and diversification of the O-methyltransferase gene family drive the synthesis of species-specific floral scent volatiles. These findings deepen our understanding of early angiosperm innovations and provide a genomic framework for plant breeding and ecological conservation.
Data availability
All raw sequencing data have been deposited in the National Genomics Data Center. Data used for genome assembly and gene expression quantification are available under project accession numbers PRJCA040347 (N. caerulea), PRJCA021382 (N. thermarum), PRJCA027228 (N. colorata) and PRJCA033981 (Nymphaea sp. Peru–Puerto Maldonado). Data used for flower colour-regulatory analysis are available under project accession number PRJCA023065. Transcriptomic data used for phylogenetic reconstruction of Nymphaea are available under project accession number PRJCA030030. Source data are provided with this paper.
Acknowledgements
F.C. was supported by the National Natural Science Foundation of China (32172614) and a grant from Hainan University (XTCX2022NYB04). J.-Y.X. acknowledges funding from the National Natural Science Foundation of China (32570265), the Basic Research Program of Jiangsu (BK20252062), the Fundamental Research Funds for the Central Universities (RENCAI2025034 and KYCXJC2025002), the National Administration of Traditional Chinese Medicine High-level Key Discipline Construction Project (zyyzdxk-2023293) and the Traditional Chinese Medicine Interdisciplinary Cultivation Project. We thank J. Qiu for the myb75-c mutant seeds.
Author information
Contributions
F.C. conceived this study. J.-Y.X. and Y.J. participated in and coordinated the research. J.Z., Y.L., G.L., Yang Bai, X.-C.H., Yibo Bai, P.G.-C., T.Z., J.F., H.Z., H.C., W.W. and L.Z. conducted the genomic analyses and laboratory experiments. J.Z., Y.L., F.C., J.-Y.X. and Y.J. drafted the paper. F.C. and J.-Y.X. edited the paper. The authors read and approved the final paper.
Ethics declarations
Competing interests
Patent applications related to this work have been submitted by J.Z., Yang Bai and F.C. The other authors declare no competing interests.
Peer review
Peer review information
Nature Plants thanks Yuannian Jiao and Jianquan Liu for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–29 and Tables 1–3.
Source data
Source Data Fig. 1
Data for tree construction.
Source Data Fig. 2
Whole genomic details for circos plots.
Source Data Fig. 4
Data of gene family expansion and contraction.
Source Data Fig. 5
List of lost genes and their GO annotation.
Source Data Fig. 6
Details for coexpression gene network and anthocyanin quantification data.
Source Data Fig. 7
Expression pattern of OMTs from three water lilies.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, J., Liang, Y., Liu, G. et al. Water lily complete genomes illuminate the innovations of water lilies and early angiosperms. Nat. Plants (2026). https://doi.org/10.1038/s41477-026-02281-0
DOI: https://doi.org/10.1038/s41477-026-02281-0


