Background & Summary

Biodiversity is undergoing a rapid global decline, underscoring the urgent need for effective conservation measures to safeguard plant diversity. Botanical gardens worldwide are pivotal in this effort, conserving an estimated 100,000 plant species ex situ1. Beyond their roles in recreation, education, and public engagement, these institutions serve as critical hubs for conservation and research, advancing integrated programs aimed at preventing plant extinctions2.

Monumental trees (large, ancient, or historically significant plants that are notable for their size, age, rarity, or cultural importance3) serve as powerful flagship species for conservation initiatives. These trees are not merely ecological relics; they are also invaluable cultural and educational resources3,4 and represent unique models for studying long-term adaptation responses.

In the field of conservation genomics, these ancient plants can play crucial roles in the new era of reference genomes5, bridging the gap between modern genomic techniques and traditional conservation efforts. This allows researchers to uncover how monumental trees have adapted to survive through centuries of environmental changes. For instance, the genome sequencing of the Napoleon Oak (Quercus robur L.) in Switzerland revealed remarkable genetic stability, with low levels of somatic mutations even in an ancient tree6. This suggests robust mechanisms for maintaining genetic integrity – a likely contributor to the oak’s long lifespan. Similarly, the Vouves Olive tree (Olea europaea L.) in Greece, estimated to be between 3,000 and 5,000 years old, provided critical genetic insights through genome sequencing7. Researchers identified different genetic origins in the tree’s rootstock and scion, indicating the use of ancient grafting techniques.

From a scientific and biogeographical perspective, the palm species Chamaerops humilis L. (Arecaceae) is crucial for understanding the adaptive traits that allow a palm to thrive in Mediterranean environments. The Mediterranean habitat to which it is endemic differs significantly from the tropical settings typically associated with palms, making C. humilis the northernmost naturally occurring palm species8. Although currently listed as ‘Least Concern’ by the IUCN Red List9, C. humilis is increasingly under threat from habitat loss, urbanization, and agricultural expansion, contributing to population declines in the wild10,11. Because the species is a Pleistocene relict and a thermo-Mediterranean bioindicator12, conserving it is essential for maintaining the genetic diversity and ecological legacy of Mediterranean ecosystems.

From a historical perspective, Goethe’s Palm represents both a cultural icon and a scientific landmark. During his visit to Padua in 1786, the polymath and writer Johann Wolfgang von Goethe was inspired by this palm to develop the foundational botanical concepts featured in his scientific book, The Metamorphosis of Plants13. Sequencing the genome of Goethe’s Palm may help clarify its provenance, as the original source of the specimen cultivated in 1585 remains unknown.

In this study, we present the first chromosome-level genome assembly of C. humilis, generated from Goethe’s Palm (Fig. 1), the oldest known individual of the species housed at the UNESCO Botanical Garden of the University of Padua (Italy)14. The assembled genome spans 4.41 Gbp, with a scaffold N50 of 195 Mbp, and BUSCO completeness of 99.3%. Mapping evaluation shows that 99.98% of HiFi reads aligned to the primary assembly, achieving a total coverage of 17x. The assembly has a quality value (QV) of 59 and an LTR Assembly Index (LAI) of 8.01. Approximately 88% of the genome is composed of repeats, with LTR retrotransposons being the most prevalent. A total of 28,321 protein-coding genes are predicted, 98.5% of which are functionally annotated. This genome provides a comprehensive resource for molecular research on C. humilis and offers novel insights into both the evolutionary history and cultural legacy of this emblematic specimen.

Fig. 1

Goethe’s Palm (Chamaerops humilis) at the Botanical Garden of Padua (Italy). Top left: location of the Botanical Garden of Padua; bottom left: Goethe’s Palm in 1895; right: the plant as it appears today.

Methods and Results

PacBio and Hi-C sequencing

One fresh leaf of Goethe’s Palm was collected at the Botanical Garden of the University of Padua in September 2022. After 5 days in the dark, DNA was extracted from approximately 30 mg of leaf tissue according to a CTAB-based protocol15. DNA concentration and DNA fragment length were assessed using the Qubit dsDNA BR Assay kit on the Qubit Fluorometer (Thermo Fisher Scientific, Waltham, USA) and the Genomic DNA Screen Tape on the Agilent 2200 TapeStation system (Agilent Technologies, Santa Clara, USA), respectively. One SMRTbell library was constructed following the instructions of the SMRTbell Express Prep kit v.2.0 (Pacific Biosciences of California, Inc., Menlo Park, USA). The same library was loaded twice, with two SMRT Cell sequencing runs performed on the Sequel IIe System in CCS mode. This resulted in an output of 114 Gb of PacBio raw reads, with a read N50 of 14,450 bp and a mean length of 12,812 bp.

We then employed a Hi-C approach to obtain high-resolution data on the spatial organization of the genome, which is essential for precise genome scaffolding. The Hi-C library was prepared from approximately 250 mg of leaf tissue (from the same individual) using the Arima High Coverage Hi-C Kit v.01 (Arima Genomics, Carlsbad, USA) according to the Animal Tissue User Guide. This kit captures the organizational structure of chromatin in three dimensions by first fixing the chromatin structure using formaldehyde, then digesting the crosslinked chromatin using a restriction enzyme cocktail optimized for coverage uniformity across a wide range of genomic sequence compositions, and finally ligating the ends of the molecules in close proximity using a biotin-labeled bridge. Library preparation of the proximally ligated DNA was then performed according to the Swift Biosciences Accel-NGS 2S Plus DNA Library Kit protocol (Swift Biosciences, Ann Arbor, USA). Fragment size distribution and concentration of the Arima High Coverage Hi-C library were assessed using the TapeStation 2200 (Agilent Technologies, Santa Clara, USA) and the Qubit Fluorometer with the Qubit dsDNA HS Reagents Assay Kit (Thermo Fisher Scientific, Waltham, USA), respectively. The library was sequenced on an Illumina NovaSeq 6000 platform at Novogene (Cambridge, UK) using a 150 bp paired-end sequencing strategy with an insert size of 350 bp, resulting in 97.4 Gb of data.

RNA extraction and sequencing

Four different tissues (young and old leaves, marrow, fruits, and roots) of C. humilis were collected for RNA extraction. Total RNA (including microRNA and mRNA) was extracted using CTAB extraction buffer16 and subsequently quantified with an Implen NanoPhotometer (Implen GmbH, Munich, Germany). Only samples with 260/280 and 260/230 absorbance ratios above 1.8 were kept. For each sample, 10 µg of total RNA was treated with DNase I (New England Biolabs, Ipswich, USA) to remove contaminant DNA and purified with an RNA Clean & Concentrator-5 kit (Zymo Research, Irvine, USA).

The RNA extracts were then pooled, and the library preparation for both microRNAs and mRNA, as well as the sequencing, were performed at Novogene (Cambridge, UK). Sequencing was performed on the NovaSeq 6000 platform (Illumina, Inc., San Diego, USA), generating 150 bp paired-end reads with an insert size of 350 bp. The sequencing generated 11 Gb of paired-end mRNA raw data, and 1.2 Gb of microRNA data.

Nuclear genome assembly and scaffolding

High-quality circular consensus sequences based on the PacBio subreads were generated using a workflow containing DeepConsensus (v 0.2.0)17. Briefly, all CCS reads were obtained from subreads using PacBio’s ccs tool (v 6.4.0, https://github.com/PacificBiosciences/ccs). Both the CCS reads and subreads were then aligned with actc (v 0.3.1, https://github.com/PacificBiosciences/actc). DeepConsensus was run using CCS reads and alignments. The long high-fidelity (HiFi) fragments generated were first trimmed from potential adapters with HiFiAdapterFilt (v 2.0.1)18 and subsequently assembled at the contig level with Hifiasm (v 0.18.7)19.

We scaffolded the primary and haplotype contig-level assemblies with Hi-C data to generate chromosome-level assemblies. This was carried out using the Arima Hi-C mapping pipeline20. In brief, the Hi-C reads were first aligned to the contigs using BWA-MEM (v 0.7.17)21. The 3′ ends of reads identified as chimeric were then removed with an in-house Perl script from Arima (part of the Arima Hi-C mapping pipeline). Next, duplicated reads were removed with Picard (v 3.0.0)22. Finally, the assembly was scaffolded to chromosome level using YaHS (v 1.1)23 and visualized for quality assessment with Juicebox (v 1.11.08)24. To fill gaps, TGS-GapCloser (v 1.0.1)25 was run with Racon (v 1.4.3)26 to enhance the base-level accuracy of the merged sequences. The chromosome-level genome assembly and the HiFi reads were used as input for TGS-GapCloser.

To enhance the quality of the genome assemblies, the Arima Hi-C mapping pipeline was run once more, using the output from the initial run as input, followed again by gap closing and quality assessment. After two iterations of the Arima Hi-C mapping pipeline, the primary assembly and the two haplotype assemblies each had fewer scaffolds and a higher scaffold N90 (Table S1, Table S2, Table S3), and each converged into 18 main scaffolds. These 18 scaffolds, displayed on the contact maps (Fig. S1), are consistent with the findings of Röser27, who identified 18 chromosome pairs in C. humilis. The GC content remained the same throughout all the iterations, being 43.8% in all the assemblies (Table S1, Table S2, Table S3). The primary assembly had a total length of 4.41 Gbp, with 1,986 scaffolds, scaffold N50 of 195 Mbp, and BUSCO completeness of 99.3% (Table 1, Table S1, Fig. 2). The statistics for both haplotypes can be found in Tables S2 and S3.
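The scaffold N50 and N90 values quoted above follow the usual Nx definition: the length L such that scaffolds of length L or longer contain at least x% of the assembled bases. The sketch below illustrates that definition on hypothetical toy lengths; it is not the code used by Quast or Assemblathon2, and the function name `nx` is an illustrative choice.

```python
def nx(lengths, fraction):
    """Return the Nx statistic: the length L such that sequences of
    length >= L together contain at least `fraction` of all bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    threshold = fraction * sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Hypothetical toy scaffold lengths (not the C. humilis scaffolds).
toy_lengths = [100, 80, 60, 40, 20]
n50 = nx(toy_lengths, 0.5)
n90 = nx(toy_lengths, 0.9)
```

A rising N90 across scaffolding iterations means that the shortest scaffolds needed to cover 90% of the assembly are getting longer, which is the improvement reported in Tables S1 to S3.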

Table 1 Summary of the main statistics of the primary assembly of C. humilis.
Fig. 2

(a) Ideogram showing the telomeres (red), centromeres (black), and annotated microRNAs (orange, light blue, and dark blue). Chromosomes in the ideogram are sorted from the longest (chr 1) to the shortest (chr 18). (b) Proteomap displaying the KEGG functional annotation groups. (c) SnailPlot showing the primary assembly statistics. (d) JupiterPlots showing the synteny between the primary assembly and each haplotype assembly, and between the two haplotypes.

We then used Blobtools (v 1.1.1)28 to identify potential contamination. To do so, the high-quality HiFi reads were mapped against each assembly using Minimap2 (v 2.26)29, and the scaffolds of the assembly were blasted against the NCBI Nucleotide database with Blastn (v 2.14.0)30. No evident contamination was found in any of the assemblies (Fig. S2). Finally, the statistics of the primary genome assembly were summarized in a snail plot (Fig. 2), which was obtained using Blobtoolkit (v 4.2.1)31.

We identified telomeres and centromeres of the primary assembly with quarTeT (v 1.2.5)32 and CentrIER (v 2.0)33, respectively. The default parameters were used for all functions except for telomere identification, where the ‘-m 10’ parameter was employed. The default motif for telomere identification established by quarTeT was TTTAGGG. The results were then visualized in an ideogram plot using the ggplot234 R package. In the primary assembly, all telomeres were identified (≥10 repeats), except for chromosome 7, which was missing one telomere. Likewise, all putative centromeres were detected, except for chromosome 10 (Fig. 2).
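As a rough illustration of the telomere criterion above (at least 10 tandem copies of TTTAGGG), the following sketch scans a chromosome-end sequence for the motif on either strand. It is a simplified stand-in and not quarTeT's algorithm; the function names and toy sequences are hypothetical, and reading ‘-m 10’ as a minimum repeat count is an assumption based on the "≥10 repeats" wording.

```python
import re


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_telomere(end_seq, motif="TTTAGGG", min_repeats=10):
    """True if end_seq holds >= min_repeats consecutive motif copies,
    on the forward or the reverse-complement strand."""
    for unit in (motif, revcomp(motif)):
        pattern = "(?:%s){%d,}" % (unit, min_repeats)
        if re.search(pattern, end_seq):
            return True
    return False


# Hypothetical chromosome ends, for illustration only.
left_end = "CCCTAAA" * 12 + "ACGTACGT"   # 12 copies on the reverse strand
right_end = "ACGTACGT" + "TTTAGGG" * 4   # only 4 copies: below threshold
```

Under this criterion, `left_end` would count as a telomere and `right_end` would not, which mirrors how a chromosome can be reported as missing one telomere.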

Transcriptome assembly and quality assessment

Raw reads from the RNA sequencing data (microRNA and mRNA) were first trimmed with Fastp (v 0.23.4)35. A total of 97.7% of the paired-end mRNA reads and 99.8% of the microRNA reads passed length and quality filtering. They were subsequently aligned to the primary chromosome-level assembly with HISAT2 (v 2.2.1)36. Each type of RNA was assembled with StringTie (v 2.1.2)37.

Additionally, the microRNAs were aligned and annotated using ShortStack (v 4.0.3)38 with microRNAanno as a reference dataset39. The results were then plotted in an ideogram generated with the ggplot234 R package. The annotation identified 13 known miRNAs, 47 putative miRNAs, and 4 different siRNAs (Table S4). The identified miRNAs were miR156, miR159, miR160, miR164, miR166, miR167, miR168, miR172, miR395, miR396, miR528, miR535, and miR5179, and the siRNAs were siRNA21, siRNA22, siRNA23, and siRNA24.

Repeat annotation of the primary assembly

To enhance the accuracy of gene prediction, repeat regions in the primary assembly were first masked using both the de novo Viridiplantae repeat library in RepeatModeler (v 2.0.5)40 and the Liliopsida repeat library from RepBase41 using RepeatMasker (v 4.1.6)42. The Liliopsida library was selected because C. humilis belongs to this taxonomic group, making it the closest available clade. Given the significant impact of masking on gene prediction, two different masking procedures were applied using RepeatMasker (v 4.1.6)42: (i) simple repeats were soft-masked while interspersed repeats were hard-masked, and (ii) all repeats were soft-masked.
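The difference between the two masking procedures can be shown on a toy sequence: soft-masking lowercases repeat bases, so the sequence is still readable by downstream tools, whereas hard-masking replaces them with N. This is only a sketch of the concept, not RepeatMasker's implementation; the function name, sequence, and intervals are hypothetical.

```python
def mask(seq, intervals, hard=False):
    """Mask 0-based, half-open intervals of seq.
    Soft-masking lowercases the bases; hard-masking replaces them with N."""
    chars = list(seq)
    for start, end in intervals:
        for i in range(start, end):
            chars[i] = "N" if hard else chars[i].lower()
    return "".join(chars)


# Hypothetical toy sequence and repeat coordinates.
toy_seq = "ACGTACGTACGTACGT"
simple_repeats = [(0, 4)]
interspersed_repeats = [(8, 12)]

# Procedure (i): simple repeats soft-masked, interspersed repeats hard-masked.
procedure_i = mask(mask(toy_seq, simple_repeats), interspersed_repeats, hard=True)
# Procedure (ii): all repeats soft-masked.
procedure_ii = mask(toy_seq, simple_repeats + interspersed_repeats)
```

Hard-masking discards the underlying sequence, so it suits tools that must not align into repeats, while soft-masking leaves that choice to the gene predictor.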

Genome masking revealed that 88% of the genome assembly consists of repeats (Table S5). LTR retrotransposons represented most of the transposable elements in C. humilis (63%), primarily composed of two superfamilies: Ty1/Copia (38%) and Gypsy (25%). Other repeat classes, such as DNA transposons, small RNA, simple repeats, and others, collectively accounted for less than 10% (Table 1, Table S5).

Gene annotation of the primary assembly

Gene annotation was performed with both BRAKER343 and GeMoMa (v 1.9)44 using (i) the mRNA sequences mapped to the primary assembly, (ii) the reference genomes of Elaeis guineensis Jacq. (GCA_000442705.1, African oil palm) and Phoenix dactylifera L. (GCA_009389715.1, date palm), and (iii) the masked primary assembly. To run BRAKER3, we used the soft-masked genome, whereas for GeMoMa, we used the hard-masked genome, as required by each tool. The output of both tools was then combined with previously generated transcriptomic data to obtain a consensus gene annotation using EVidenceModeler (v 2.1.0)45. The primary genome assembly resulted in 28,321 annotated genes.

Additionally, the functional annotation of the predicted proteins was conducted by (i) a Blastp (v 2.14.0)30 search against the Swiss-Prot database using an E-value cutoff of 10^-6, and (ii) InterProScan (v 5.64-96.0)46. A total of 24,116 (85%) proteins were identified in the Swiss-Prot database, while 27,905 (98.5%) were identified by InterProScan, of which 21,386 (76%) had an associated gene ontology (GO) term.

To search for homology between the proteins, the sequence similarity between all protein-coding genes was compared using DIAMOND (v 0.9.30)47. For each protein sequence, the five best hits that met an E-value threshold of 10^-5 were reported. The resulting file was used together with the annotation file to run MCScanX (v 1.0)48 and generate pairwise synteny blocks of proteins. To identify the number and type of duplicated proteins, we ran the function ‘Duplicate_gene_classifier’ from MCScanX (v 1.0)48. This analysis classified the proteins as follows: 3,128 singletons, 1,709 proximal duplicates (located in nearby chromosomal regions but not adjacent), 3,986 tandem duplicates (consecutive repeats), 12,330 whole-genome or segmental duplicates (matching genes in synteny blocks), and 7,169 dispersed duplicates (any mode other than segmental, tandem, and proximal).
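The hit-selection rule described above (keep, for each protein, the five best hits passing the E-value threshold) can be sketched as follows. This is an illustration of the filtering logic only, not DIAMOND's internals; the tuple layout, the ranking by bit score, and the gene names are assumptions made for the example.

```python
from collections import defaultdict


def best_hits(rows, max_evalue=1e-5, top_n=5):
    """rows: iterable of (query, subject, evalue, bitscore) tuples.
    Returns {query: [subjects]} with at most top_n subjects per query,
    ranked by descending bit score (ranking criterion is an assumption)."""
    by_query = defaultdict(list)
    for query, subject, evalue, bitscore in rows:
        if evalue <= max_evalue:
            by_query[query].append((bitscore, subject))
    return {
        query: [s for _, s in sorted(hits, key=lambda h: -h[0])[:top_n]]
        for query, hits in by_query.items()
    }


# Hypothetical alignment records, for illustration only.
toy_rows = [
    ("g1", "g2", 1e-50, 300.0), ("g1", "g3", 1e-40, 250.0),
    ("g1", "g4", 1e-30, 200.0), ("g1", "g5", 1e-20, 150.0),
    ("g1", "g6", 1e-10, 100.0), ("g1", "g7", 1e-6, 60.0),
    ("g1", "g8", 1e-3, 40.0),   ("g2", "g1", 1e-50, 300.0),
]
selected = best_hits(toy_rows)
```

In this toy input, `g8` is dropped by the E-value threshold and `g7` passes the threshold but falls outside the top five.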

Additionally, the script ‘group_collinear_genes.pl’ from MCScanX (v 1.0)48 was applied to cluster the proteins by connecting collinear proteins until no protein in each group had any collinear proteins outside the group. A total of 4,017 collinear groups were found, which were visualized with SynVisio49 software (Fig. S3).
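Grouping collinear proteins until no member has a collinear partner outside its group amounts to finding connected components in a graph whose edges are collinear pairs. A minimal union-find sketch of that idea is given below; it is not the ‘group_collinear_genes.pl’ script itself, and the gene identifiers are hypothetical.

```python
def collinear_groups(pairs):
    """Cluster genes so that any two genes linked by a chain of
    collinear pairs end up in the same group (connected components)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for gene in list(parent):
        groups.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical collinear gene pairs.
toy_pairs = [("a1", "b1"), ("b1", "c1"), ("a2", "b2")]
toy_groups = collinear_groups(toy_pairs)
```

Here `a1` and `c1` share a group even though they are not directly paired, because both are collinear with `b1`.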

Organelle assembly and annotation

The most suitable reference genome for the assembly of the mitochondria and the chloroplast was identified using the script ‘findMitoReference.py’50. After assembling the genomes (as described above), all contigs belonging to the organelles were identified in the primary chromosome-level assembly using Blastn (v 2.14.0)30 and removed with Seqtk (v 1.3)51. Both organelle assemblies were annotated using GeSeq (v 2.03)52, setting the sequence identity for the proteins, rRNA, and tRNA to 95%. In addition, we used all NCBI RefSeq assemblies from the Arecaceae family as references for the GeSeq annotation of the organelles. The results were visualized with OGDRAW (v 1.3.1)53.

Mitochondrial genome assembly

The mitochondrial genome was assembled with MitoHiFi (v 3.2)50, using the most closely related reference genome, Phoenix dactylifera L. (Arecaceae, NC_016740), and the primary chromosome-level assembly. We obtained a circular mitochondrial genome with a total length of 522.6 Kbp and GC content of 45.71% (Table S6). The mitochondrial assembly was annotated using MitoHiFi (v 3.2)50 and GeSeq (v 2.03)52. The GeSeq annotation was performed using the BLAT annotation engine (BLATX and BLATN). MitoHiFi annotated 53 unique genes (Fig. S4), while GeSeq annotated between 37 and 41 genes (Table S7, Fig. S5). Because no other mitochondrial genomes for C. humilis are available in NCBI, those from other members of the Arecaceae family were downloaded for comparison (Table S7). The statistics generated by Quast (v 5.0.2)54 showed that C. humilis has a GC content comparable to that of related species. In contrast, GeSeq (v 2.03)52 revealed notable differences in gene content across species (Table S7).

Chloroplast genome assembly

The chloroplast genome was assembled using ptGAUL (v 1.0.5)55, with C. humilis (Arecaceae, ON248747) as the most closely related reference genome, and PacBio HiFi reads as input. GeSeq (v 2.03)52 with the Chloë engine was used for the gene annotation. We obtained a chloroplast assembly with a total length of 159 Kbp and a GC content of 37.17% (Table S8). It displays the typical quadripartite structure, comprising a large single-copy region (LSC; 86,302 bp), a small single-copy region (SSC; 17,924 bp) and two inverted repeats (IRs; 27,241 bp each). The genome contains 112 unique genes according to the GeSeq annotation (Fig. S5, Table S8). The statistics from Quast (v 5.0.2)54 indicate that both genome length and GC content are in line with other C. humilis plastome assemblies available in NCBI. The number of genes predicted using PGA (Plastid Genome Annotator)56 is likewise consistent across assemblies (Table S8).
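The reported region lengths can be checked against the total plastome length with simple arithmetic, since a quadripartite plastome consists of the LSC, the SSC, and two copies of the IR. The short sketch below uses only the values given above; the variable names are illustrative.

```python
# Region lengths (bp) as reported in the text.
lsc_bp = 86_302   # large single-copy region
ssc_bp = 17_924   # small single-copy region
ir_bp = 27_241    # one inverted repeat; the plastome carries two copies

# Quadripartite structure: LSC + SSC + 2 x IR.
plastome_bp = lsc_bp + ssc_bp + 2 * ir_bp
plastome_kbp = round(plastome_bp / 1000)
```

The sum is 158,708 bp, which rounds to the 159 Kbp stated for the assembly.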

Comparison to other published palm genomes

The genomes of 15 palm specimens belonging to eight different species were downloaded from NCBI, and statistics for each of them (assembly length, N50, and GC content) were calculated with Quast (v 5.0.2)54 (Table S9). BUSCO (v 5.2.2)57 with the ‘Liliopsida_odb10’ dataset was used to quantify complete and single-copy BUSCO genes of these palm genomes. Only genomes with more than 50% completeness were selected for the BUSCO-based phylogenomic analysis.

The phylogenomic position of C. humilis with respect to the other related palm species was determined with ‘BUSCO_phylogenomics.py’ (https://github.com/jamiemcg/BUSCO_phylogenomics). In short, this pipeline aligned the BUSCO sequences with MUSCLE (v 5.1)58, trimmed them with TrimAl (v 1.4.1)59, and constructed a maximum-likelihood consensus tree with IQ-TREE (v 2.2.6)60. The genomes of Aegilops tauschii subsp. strangulata (GCF_002575655.2) and Zea mays (GCF_902167145.1) were used as outgroups. All genomes from the same species were monophyletic with high clade support (bootstrap support of 100, Fig. S6). Chamaerops humilis (18 chromosomes) is sister to the genus Phoenix (with 18-19 chromosomes), which includes two species, P. dactylifera and P. roebelenii. There are two other clades: one comprising the genera Calamus and Metroxylon, and another comprising the genera Areca, Elaeis and Cocos (each with 16 chromosomes). The phylogenetic reconstruction showed a strong phylogenetic signal for chromosome number, but there was no correlation between assembly size and chromosome number61 (Table S9, Fig. S6).

To assess transposable element (TE) activity, we calculated the percentage of LTRs for each palm species. For species with multiple genome assemblies, the most complete assembly was selected based on BUSCO scores (Table S9). The proportion of LTR elements was determined using RepeatMasker (v 4.1.6)42 as described above. This method involved the integration of the de novo Viridiplantae repeat library from RepeatModeler (v 2.0.5)40 and the Liliopsida repeat library from RepBase41. The proportion of LTR elements from the RepeatMasker-generated table was used to build a linear regression model to examine its linear relationship with the assembly length. Repeat divergence landscapes were then created following the methodology of Rodriguez and Arkhipova (2023)62, using TwoBit (v 2.0.9, weng-lab.github.io/TwoBit/), and two Perl scripts (calcDivergenceFromAlign.pl and createRepeatLandscape.pl) from RepeatMasker (v 4.1.6)42. The results were presented as a histogram created using the ggplot234 R package. Small genomes such as those from Phoenix or Metroxylon have lower proportions of LTR elements than larger ones like those from Chamaerops or Areca (Table S9, Fig. 3). The repeat divergence landscape plots (Fig. 3, Fig. S7) show that the elevated proportion of TE sequences in C. humilis is associated with low sequence divergence, particularly within the LTR family.

Fig. 3

(a) Linear regression between assembly length and proportion of LTR elements present in the genome. The linear regression shows an adjusted R² of 0.74 (p-value = 1.08e-5), with a residual standard error of 0.54. (b) Repeat divergence landscapes of LTR elements in the main crops of the family Arecaceae.
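For readers unfamiliar with the statistic in panel (a), the sketch below fits a simple least-squares line and computes R² and adjusted R² for one predictor. The data points are hypothetical and are not the values from Table S9; the original analysis was presumably run in R, so this pure-Python version is only an illustration of the calculation.

```python
def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x.
    Returns (slope, intercept, r2, adjusted_r2) for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R^2 penalizes for the number of predictors (here, one).
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, intercept, r2, adjusted_r2


# Hypothetical toy data (not assembly lengths or LTR proportions).
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]
slope, intercept, r2, adjusted_r2 = linear_fit(toy_x, toy_y)
```

With few data points, adjusted R² can sit well below R², which is why the adjusted value is the more conservative one to report for a small set of genomes.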

Geographical origin of Goethe’s Palm

The ten highly polymorphic microsatellites identified by Giovino et al.11 were used in this study to investigate the geographic origin of Goethe’s Palm. Microsatellite loci were extracted from the haplotype 1 and haplotype 2 chromosome-level assemblies of C. humilis using SeqKit (v 0.10.2)63 and Blastn (v 2.14.0)30. Eight loci were detected in Goethe’s Palm (locus 15, locus 16, locus 19, locus 23, locus 25, locus 27, locus 37 and locus 44). Locus 25 was only found in haplotype 1, while locus 37 was only present in haplotype 2.

The identified loci were combined with the Giovino et al.11 dataset, which includes approximately 300 individuals spanning the native range of C. humilis. A genind object was constructed using the adegenet64 R package, and duplicate genotypes were removed with poppr65. The resulting dataset was analyzed using Discriminant Analysis of Principal Components (DAPC), which explained 77% of the genetic variation and revealed a strong geographic structure, with axis 1 separating western and eastern Mediterranean populations (Fig. 4). The results were then visualized using the ‘compoplot’ function from adegenet64 and ggplot234. Goethe’s Palm grouped with the Western genetic lineage, predominantly found in Spain and Portugal (Fig. 4).

Fig. 4

(a) DAPC showing population structure in populations of the Mediterranean dwarf palm and the potential geographic origin of Goethe’s Palm. All analyzed samples are represented by filled circles colored according to their country of origin. (b) Compoplot showing the most likely assignment of Goethe’s Palm to the evaluated countries. Both analyses indicate that Goethe’s Palm belongs to the Western genetic lineage of C. humilis.

Data Records

The C. humilis genome assembly was deposited in NCBI GenBank under accession numbers GCA_042465325.1 (ref. 66) for haplotype 1, GCA_042465335.1 (ref. 67) for haplotype 2, and GCA_042465385.1 (ref. 68) for the primary assembly. The RNA sequencing data (ref. 69) were submitted to NCBI GenBank under the accession numbers SRR29081397 and SRR29081396 for mRNA and microRNA, respectively. The mitochondrial and plastid assemblies were deposited in the Zenodo repository (https://doi.org/10.5281/zenodo.14872859)70. The primary assembly annotation was deposited in the Zenodo repository (https://doi.org/10.5281/zenodo.15518854)71.

Technical Validation

The statistics for the nuclear (Table S1, Table S2, Table S3) and organelle (Table S6, Table S7, Table S8) assemblies were obtained using Quast (v 5.0.2)54 and Assemblathon272.

The completeness of the genome assemblies was analyzed using BUSCO (v 5.2.2)57 with the ‘Viridiplantae_odb10’ dataset. The primary assembly had a total BUSCO completeness of 99.3% (Table S1, Fig. 2). The BUSCO completeness values of haplotype 1 and haplotype 2 are shown in Table S2 and Table S3, respectively. The completeness of the predicted proteins was also evaluated with BUSCO, yielding 98.3% completeness (C: 98.3% [S: 88.7% and D: 9.6%], F: 0.7%, M: 1.0%). Finally, the BUSCO completeness of the transcripts obtained after assembling the mRNA data with StringTie (v 2.1.2)37 reached 91.8% (Table S10).
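BUSCO summaries follow a fixed arithmetic: complete (C) is the sum of single-copy (S) and duplicated (D), and C, fragmented (F), and missing (M) together account for 100%. The sketch below parses the protein-level summary quoted above and checks both identities; the parsing function is a hypothetical helper, not part of BUSCO.

```python
import re


def parse_busco(summary):
    """Extract the C/S/D/F/M percentages from a BUSCO-style summary string."""
    pairs = re.findall(r"([CSDFM]):\s*([\d.]+)%", summary)
    return {key: float(value) for key, value in pairs}


# Summary string as reported for the predicted proteins.
protein_summary = "C: 98.3% [S: 88.7% and D: 9.6%], F: 0.7%, M: 1.0%"
scores = parse_busco(protein_summary)

# Consistency checks (small tolerance for floating-point rounding).
complete_matches = abs(scores["S"] + scores["D"] - scores["C"]) < 1e-6
sums_to_100 = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) < 1e-6
```

Both checks hold for the reported values (88.7 + 9.6 = 98.3 and 98.3 + 0.7 + 1.0 = 100).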

For the primary assembly, mapping rate and coverage depth were assessed using Qualimap (v 2.3)73. The analysis showed that 99.98% of HiFi reads mapped to the primary genome assembly with an average coverage of 17x. Furthermore, the LTR Assembly Index (LAI), calculated using LTR_retriever (v 3.0.1)74, yielded a value of 8.01.
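Average coverage is, to a first approximation, the number of mapped bases divided by the assembly length. The back-of-the-envelope sketch below inverts that relation to estimate the volume of mapped HiFi bases implied by 17x coverage of a 4.41 Gbp assembly. The HiFi yield itself is not stated in the text, so the result is only an implied figure, and the function names are illustrative.

```python
def mean_depth(mapped_bases, assembly_length):
    """Approximate mean coverage depth: mapped bases per assembly base."""
    return mapped_bases / assembly_length


def implied_bases(depth, assembly_length):
    """Mapped bases implied by a given mean depth (inverse of mean_depth)."""
    return depth * assembly_length


assembly_bp = 4.41e9   # primary assembly length reported in the text
coverage = 17          # average coverage reported by Qualimap

implied_gb = implied_bases(coverage, assembly_bp) / 1e9
```

This gives roughly 75 Gb of mapped HiFi bases, which is lower than the 114 Gb of raw PacBio output, as would be expected after consensus calling and filtering.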

To evaluate annotation quality, mRNA reads were mapped to the primary genome using BBMap (v 39.06)75, resulting in a 98% mapping rate.

To evaluate synteny between the primary assembly and each haplotype, as well as between the two haplotypes, we ran JupiterPlot (v 3.8.1)76. The JupiterPlots showed high levels of synteny between the primary and haplotype 1 assemblies, and between the primary and haplotype 2 assemblies, as expected given the identical chromosome numbers (Fig. 2). However, the synteny between the two haplotypes revealed that scaffolds 17 and 18 were incomplete in haplotype 2.

In addition, the haplotype assemblies were assessed using Merqury (v 1.3)77. For this analysis, Meryl was first run with the HiFi reads using a k-mer size of 21. Subsequently, Merqury was executed with default parameters. The k-mer completeness corroborated the results of the synteny analysis: it was slightly higher for haplotype 1 (83%) than for haplotype 2 (82%). For the primary assembly, k-mer completeness reached 87% and the quality value (QV) reached 59.
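A QV is a Phred-scaled score rather than a percentage: QV = -10 · log10(P), where P is the estimated per-base error probability. The sketch below converts between the two to show what a QV of 59 implies; it illustrates the Phred scale only and does not reproduce Merqury's k-mer-based error estimate.

```python
import math


def qv_to_error(qv):
    """Per-base error probability implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def error_to_qv(error_probability):
    """Phred-scaled quality value for a per-base error probability."""
    return -10 * math.log10(error_probability)


primary_qv = 59  # QV reported for the primary assembly
primary_error = qv_to_error(primary_qv)
bases_per_error = 1 / primary_error
```

A QV of 59 corresponds to an error probability of about 1.26e-6, or roughly one consensus error per 794,000 bases.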