Abstract
This study presents the first chromosome-level genome assembly of the Korean long-tailed chicken (KLC), a unique breed of Gallus gallus known as Ginkkoridak. Our assembly achieved a super contig N50 of 5.7 Mbp and a scaffold N50 exceeding 90 Mb, with a genome completeness of 96.3% as assessed by BUSCO using the aves_odb10 set. We also constructed a comprehensive pangenome graph, incorporating 40 Gallus gallus assemblies, including the KLC genome. This graph comprises 87,934,214 nodes, 121,720,974 edges, and a total sequence length of 1,709,850,352 bp. Notably, our KLC assembly contributed 1,919,925 bp of new sequences to the pangenome, underscoring the unique genetic makeup of this breed. Furthermore, in comparison with the pangenome, we identified 36,818 structural variants in KLC, which included 2,529 insertions, 27,743 deletions, and 6,546 of either insertions or deletions shorter than 1 kb. We also successfully identified pan-genome wide non-reference sequences. Our KLC assembly and pangenome graph provide valuable genomic resources for studying G. gallus populations.
Similar content being viewed by others
Background & Summary
The Korean long-tailed chicken (KLC), commonly referred to as “Ginkkoridak” in Korean, a breed distinguished by its notably long tail, represents a crucial element of Korea’s avian genetic legacy and occupies a significant role in worldwide biodiversity1,2. Remarkably, roosters at the age of three can extend their tail feathers up to 1.5 meters annually, while hens, although their tails are not as long as the roosters’, can still achieve a growth of up to 40 cm per year, which is relatively lengthy compared to other breeds3,4. Unfortunately, the KLC population experienced a significant reduction, especially during and subsequent to the Korean War. This decline was primarily due to the influx of foreign poultry breeds and changes in agricultural methods2,3,4. As of 2018, the KLC population had dwindled to approximately 450 individuals. This decline has highlighted the urgent need for detailed genetic research at the assembly level to understand and preserve the KLC’s unique genetic makeup, which is vital for maintaining biodiversity and has potential applications in various scientific fields.
Here, we constructed a first chromosome-level genome assembly of KLC using a combination of PacBio long reads, 10x Genomic reads and Illumina short reads (Fig. 1). The assembly features a super contig N50 of 5.7 Mb and a scaffold N50 of over 90 Mb. The genome’s completeness, assessed using the aves_odb10 set, was confirmed with a BUSCO5 score of 96.3%, reflecting the thoroughness and accuracy of the assembly.
Circos plot of KLC. The circos plot was plotted using TBtools-II v.2.070.22 From the outer most track, each track represents: chromosomes including 39 autosomal chromosomes and sex chromosome Z, DNA TEs, LTRs, LINEs, SINEs, gaps and GC ratio, respectively. Each tick of the chromosome track represents 20,000 bp. Value of each element was calculated for every 10,000 bp window of the genome. For DNA TEs, LTRs, LINEs, SINEs and gaps, bar was plotted by taking the mean value of all 10,000 bp windows on every 100,000 bp and the global minimum and maximum values for every windows of genome were used for minimum and maximum height of the tracks (DNA TEs: 0, 1,936; LTRs: 0, 756; LINEs: 30,451, 0; SINEs: 0, 1,263; gaps: 0%, 1%). Chicken image is from Youm et al.2.
Then, complementing this genome assembly, a comprehensive pangenome graph was constructed, integrating 40 Gallus gallus genome assemblies, including that of the KLC. This pangenome graph consisted of 87,934,214 nodes, 121,720,974 edges, sequence length of 1,709,850,352 and a mean degree (number of edges attached to a node) of 1.4. In total of 1,709,850,352 bp of sequence, the reference genome (GRCg7b)6 covered 1,041,122,857 bp of sequence, while our KLC assembly covered 1,919,925 bp of new sequences. (Fig. 2).
Methods
Sample collection and sequencing
A rooster of KLC was used in this study. Blood samples were gathered in compliance with the guidelines set by the National Institute of Animal Science (NIAS) in South Korea. All animal-related experimental procedures received approval from NIAS (approval number NIAS2018268). This study was conducted adhering to the ARRIVE guidelines.
Genomic sequencing was performed using multiple platforms to ensure comprehensive coverage. High-quality genomic DNA was extracted and sequenced using PacBio long-read technology, Illumina short-read sequencing with insert sizes of 280 bp and 550 bp, and 10x Genomics reads. The Illumina short-read sequencing achieved a depth of 19.661x for the 280 bp insert library and 5.909x for the 550 bp insert library. The 10x Genomics read sequencing provided a depth of 37.546x while the PacBio long-read sequencing generated a coverage depth of 88.183x. To summarize the sequencing data obtained from each platform, refer to Table 1, which provides detailed statistics including the platform used, tissue type, total number of reads, total bases, and SRA accession.
Short read(PE) sequencing
The quality of genomic DNA was assessed by agarose gel electrophoresis, and quantification was carried out using the Quant-IT PicoGreen assay (Invitrogen). Sequencing libraries were constructed following the guidelines of the TruSeq DNA Nano Library Prep Kit (Illumina, Inc., San Diego, CA, USA). In summary, the genomic DNA was fragmented using adaptive focused acoustic technology (AFA; Covaris). The resulting fragments were end-repaired to generate 5′-phosphorylated, blunt-ended double-stranded DNA. After end-repair, the DNA was size-selected through a bead-based approach. An ‘A’ base was then added to the 3′ ends, followed by the ligation of TruSeq indexing adapters. The final libraries were quantified via qPCR (using the KAPA Library Quantification Kit for Illumina platforms) and checked for quality using the Agilent Technologies 4200 TapeStation (Agilent Technologies). Paired-end sequencing was carried out on the HiSeq X platform (Illumina, San Diego, USA) at Macrogen (Seoul, South Korea).
Pacbio sequencing
High-quality, high-molecular-weight DNA was required to generate size-selected SMRTbell templates of approximately 10 kb. DNA concentration was measured using a NanoDrop spectrophotometer (Thermo Scientific) and PicoGreen assay. All samples met the QC screening criteria. For PacBio Sequel sequencing, 5 µg of input genomic DNA was used to prepare the 10 kb library. For genomic DNA with a size distribution below 17 kb, the actual size distribution was determined using the Bioanalyzer 2100 (Agilent). If the apparent size exceeded 40 kb, the DNA was sheared using g-TUBE (Covaris Inc., Woburn, MA, USA) and purified with AMPurePB magnetic beads (Beckman Coulter Inc., Brea, CA, USA). A total of 10 µL of library was prepared using the PacBio DNA Template Prep Kit 1.0. SMRTbell templates were annealed using the Sequel Binding and Internal Ctrl Kit 3.0. Sequencing was carried out with the Sequel Sequencing Kit 3.0 and SMRT Cells 1 M v3 Tray. Data were captured from each SMRT cell using 600-minute movies on the PacBio Sequel sequencing platform (Pacific Biosciences) by Macrogen (Seoul, South Korea). Subsequent steps followed the PacBio Sample Net-Shared Protocol, available at http://pacificbiosciences.com/.
10x Genomics
The sequencing libraries were prepared according to the manufacturer’s instructions of Chromium Genome Library Kit (10x Genomics). Briefly, 40 kb or more of HMW genomic DNA were size selected using BluePippin(Sage Science) and 1 μg was used as input. Size selected HMW template gDNA was combined with 10x barcoded Gel bead to barcode DNA. The barcoded DNA underwent end-repair to generate 5′-phosphorylated, blunt-ended double-stranded DNA molecules. Afterward, a single ‘A’ base was added, followed by adapter ligation. The adapter-ligated DNA is amplified with an indexed primer to complete the chromium genome libraries with full-length adapters. Finally, the product is size selected with a bead-based method. The libraries were quantified following the qPCR Quantification Protocol Guide (KAPA Library Quantification kits for Illumina Sequencing platforms) and their quality was evaluated using the Agilent Technologies 4200 TapeStation D1000 ScreenTape (Agilent Technologies). Then we sequenced using the HiSeq (Illumina).
Genome assembly
We initiated the process by filtering the raw Illumina short reads using NGSQCToolkit’s7 IlluQC module. Reads were retained only if at least 70% of their length exhibited a Phred quality score (Q score) of 20 or higher, which corresponds to a base call accuracy of 99%. Next, we employed Trimmomatic8 to remove adaptor sequences from the reads. After this step, we utilized fastQC9 to assess the quality of the trimmed reads, ensuring removal of any residual artifacts. Then SOAPec’s10 Corrector module was utilized to correct sequencing errors in paired-end reads with insert sizes of 280 and 550 base pairs. Finally, we utilized LoRDEC11 algorithm to correct sequencing errors specific to PacBio sequences. The genomic assembly was conducted in a hybrid fashion, combining short reads and long PacBio reads. Falcon12 generated primary contigs, Falcon-Unzip leveraged PacBio long reads and 10x Genomic Reads to create haplotigs, and GapCloser (Gap-DOES) utilized mate-pair reads to fill gaps. This integrated approach yielded a high-quality assembly with super contig N50 of 5.7 megabase pairs (Mbp) and a total of 1162 super contigs (Table 2: Statistics of contig-assembly before scaffolding). Purge_dups (v1.2.6)13 was used to remove haplotypic duplication and contig overlaps in the draft assembly.
After the removal of haplotypic duplication, our assembly was scaffolded using a reference-guided approach with RagTag (v2.1.0)14. Given the absence of the W chromosome in male chickens (Gallus gallus), which have ZZ sex chromosomes (females are ZW), we used GRCg7b6 (excluding the W chromosome) as a reference. The RagTag process was divided into two steps: ‘ragtag correct’ and ‘ragtag scaffold’ which corrects potential misassemblies by aligning our contig assembly and scaffolds the query contigs guided by the reference genome respectively. This resulted in 304 scaffolds including 39 chromosome-level scaffolds encompassing the Z chromosome, and 264 unplaced scaffolds (Table 3: KLC genome assembly statistics).
Then the PacBio long reads we employed for our assembly were aligned using minimap2 within the TGS-GapCloser15 framework for gap filling. The completed assembly of 39 chromosome-level scaffolds amounted to a total size of 995.29 Mb. The 39 chromosome-level scaffolds constituted 98.54% of the entire assembly, while the 14.98 Mb of the genome remains to be further studied (Table 4: Length of chromosome-level scaffolds).
Repeat annotation of genome assembly
Repeat elements were identified with RepeatMasker v4.1.516 using Dfam v3.7 library and RepBase (v 10/26/2018) using RMBlast. Approximately 10.28% of the genome was composed of repetitive elements. Retroelements, primarily Long Interspersed Nuclear Elements (LINEs), emerged as the dominant class, accounting for 7.14% of the genome. Short Interspersed Nuclear Elements (SINEs) were also observed, although they represent a smaller fraction. A detailed breakdown of repeat elements and their genomic proportions is provided in Table 5: Statistics of repetitive elements.
Assessment of the chromosome-level genome assembly
The quality and completeness of the genome assembly were assessed using the Benchmarking Universal Single-Copy Orthologs (BUSCO)5 v5.5.0 software docker image, which evaluates the presence of evolutionarily informed expected gene content. Our analysis with the aves_odb10 lineage dataset revealed that 96.33% of the assessed orthologs were complete, indicating a high level of completeness in the assembly. Additionally, 0.008% were identified as fragmented, and 0.027% were missing, providing a clear metric of assembly quality. Furthermore, the assembly was evaluated with QUAST v5.2.017 docker image for its contiguity and accuracy. Key metrics derived from QUAST included a contig N50 of 90.512 Mb, which is similar to the current reference genome, GRCg7b6. The largest contig assembled measured 195.43 Mb. The total number of contigs was 304, with 302 contigs being longer than 1 kb. The genome assembly also had a GC content of 42.04%, which is within the expected range for this species. These statistics underscore the robustness of our assembly process and the high quality of the genomic resource produced.
Pangenome graph construction
To construct a comprehensive pangenome graph, we aggregated 40 available Gallus gallus genomes from the NCBI GenBank database, as of June 3rd, 2024. This collection encompassed a diverse range of assembly levels, featuring 21 chromosome-level assemblies, including the notable KLC assembly, along with 15 scaffold-level and 4 contig-level assemblies. For the establishment of a reference genome within this pangenome, we selected “GRCg7b”6, also recognized as bGalGal1b. This decision was informed by its status as the current species reference assembly in NCBI RefSeq, notable for its contig N50 value of 18.8 Mb. The GRCg7b assembly, renowned for its high-quality, fully annotated sequence, offers an extensive insight into the genomic architecture of the domestic chicken.
Minigraph–cactus pangenome pipeline18 with Docker image cactus v2.7.0 was used to build a pangenome because it enabled to incorporate not only chromosome-level assemblies, but also scaffold and contig-level assemblies to the pangenome. Due to different chromosomal structures among assemblies (Table 6: Chromosome structures of 40 assemblies), we had to manually run each step of the pipeline to retrieve all the macro and microchromosomes of Gallus gallus.
Structural variants and genetic variation in KLC
To extract structural variants within the pangenome of Korean Long-Tailed (KLC) chicken, we employed the Minigraph-Cactus pangenome pipeline. This pipeline generates multi-sample VCF files containing genotypes relative to the pangenome graph. To isolate variants specific to the KLC sample, we adopted a targeted approach as outlined in Liao et al.19. Variants were filtered using a stringent criterion: those with a length difference between the reference and alternate alleles exceeding 50 base pairs were classified as deletions, while those where the alternate allele length surpassed the reference by 50 base pairs or more were classified as insertions.
Additionally, we used the Variant Effect Predictor (VEP) from Ensembl20 to understand the genetic variations in KLC chicken. VEP identified the distribution of variant classes, showing a predominance of deletions (27,743), followed by indels (6,546) and insertions (2,529). In this analysis, insertions or deletions less than 1 kb were categorized as indels. VEP also provided the chromosomal distribution of these variants, as presented in Fig. 3.
Variants by Chromosome in Korean Long-Tailed Chicken. This figure illustrates the distribution of genetic variants across the chromosomes of the Korean Long-Tailed Chicken (KLC). It highlights the frequency and types of variants, including insertions, deletions, and indels, across different chromosomes.
Pangenome graph and non-reference sequences
To quantify the unique sequences each sample added to the pangenome graph, we used a Python script that adapts the iterative approach described Rice and colleagues21. This script removes sequences covered by the reference assembly and iteratively eliminates sequences from the largest non-reference contributor until all samples are evaluated, effectively isolating novel sequences introduced by each genome. Additionally, to extract non-reference sequences from the pangenome graph, we converted the HAL output file from the Minigraph-Cactus pipeline18 into a MAF file using cactus-hal2maf tool. We filtered out blocks containing Ancestral, synthetic, or reference genome sequences (i.e., ‘Anc0’, ‘MINIGRAPH’, and ‘GRCg7b’) to extract non-reference alignments. The MAF files were used to determine the number of assemblies present and the total base pair counts in genomic blocks that do not align with reference genomes (non-reference blocks). The data regarding the base pairs in each assembly and the frequency of each genomic block appearance is presented in Figs. 4, 5, respectively. The script used for this analysis is available in the associated code repository.
Pangenome-wide non-reference sequences. This figure presents a log-scale comparison of the total base pairs contributed by each assembly to the pangenome. It underscores the diversity and richness of the genomic content across different Gallus gallus assemblies, including the unique contributions from the KLC genome, thereby demonstrating the breadth and depth of the pangenome.
Data Records
The Korean long-tailed chicken assembly project has been deposited at DDBJ/ENA/GenBank under the accession JBBEWE0122.
This whole genome shotgun sequencing of Illumina short reads of 280 bp and 550 bp have been deposited at DDBJ/ENA/GenBank under the accession SRR2944572923 and SRR2944573023, respectively.
The PacBio sequencing data was deposited in the SRA at NCBI SRR2944573223.
The 10x Genomics read data was deposited in the SRA at NCBI SRR2944573123.
The variant data, which is including the structural variation (SV) from the pangenome analysis, have been deposited in the European Variation Archive (EVA) at EMBL-EBI under the project accession number PRJEB8121924,25.
The pangenome graph output (GFA file)26 is also uploaded on figshare
Technical Validation
The integrity and purity of the KLC genomic DNA were rigorously evaluated to ensure high-quality sequencing. DNA degradation was monitored using agarose gel electrophoresis, and the purity of DNA samples was assessed using the NanoDrop spectrophotometer (Thermo Scientific). We accepted only DNA samples with an optimal OD260/280 ratio of 1.8–2.0 and an OD260/230 ratio greater than 2.0, ensuring minimal contamination and degradation.
The completeness of the KLC genome assembly was critically assessed using BUSCO v5.5.0 with the aves_odb10 data set. This evaluation demonstrated a high completeness score of 96.3%, indicating that the majority of avian core genes were successfully captured in our assembly. Furthermore, the assembly was subjected to additional quality checks using tools like QUAST v5.2.017, which confirmed the contiguity and accuracy of our assembly process. A contig N50 value of 90.512 Mb, comparable to the current reference genome, was achieved, reflecting the robustness of our assembly.
Code availability
The following section details the versions, settings, and parameters of the software utilized:
NGSQCToolkitv2.3: https://github.com/mjain-lab/NGSQCToolkit
ragtag: https://github.com/malonge/RagTag
vg stats -N -E -l -p 20 <gfa file>> <output txt> bcftools: view -a -l -s <sample name> cactus-hal2maf: <jobstore> <hal file> <maf file> --refGenome Anc0 --chunkSize 500000
QUAST v5.2.0; python quast.py <genome> BUSCO v5.5.0; busco -I <genome> -l aves_odb10 -m genome
purge_dups v1.2.6: https://github.com/dfguan/purge_dups
RepeatMasker: https://www.repeatmasker.org/
Change history
28 May 2025
In this article Hanshin D. Shin and Wonchoul Park should have been denoted as equally contributing authors. The original article has been corrected.
References
Suh, S. et al. Genetic Diversity and Relationships of Korean Chicken Breeds Based on 30 Microsatellite Markers. Asian-Australas. J. Anim. Sci. 27, 1399–1405 (2014).
Youm, D.-J. et al. The idiosyncratic genome of Korean long-tailed chicken as a valuable genetic resource. iScience 26, 106236 (2023).
Charton, C. et al. The transcriptomic blueprint of molt in rooster using various tissues from Ginkkoridak (Korean long-tailed chicken). BMC Genomics 22, 594 (2021).
Yeon, S., Cho, C., Kim, J., Jin, H. & Kim, Y. Phylogenetic systematics presumption of Korean long-tailed chickens. Korean Soc. Poult. Sci. 11, 84–85 (2006).
Seppey, M., Manni, M. & Zdobnov, E. M. BUSCO: Assessing Genome Assembly and Annotation Completeness. in Gene Prediction: Methods and Protocols (ed. Kollmar, M.) 227–245 https://doi.org/10.1007/978-1-4939-9173-0_14 (Springer, New York, NY, 2019).
Genome Reference Consortium. Gallus gallus reference genome (GRCg7b). https://doi.org/10.1093/gigascience/giab080.
Patel, R. K. & Jain, M. NGS QC Toolkit: A Toolkit for Quality Control of Next Generation Sequencing Data. PLoS ONE 7, e30619 (2012).
Trimmomatic: a flexible trimmer for Illumina sequence data | Bioinformatics | Oxford Academic. https://academic.oup.com/bioinformatics/article/30/15/2114/2390096.
Andrews, S. s-andrews/FastQC. (2024).
SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler | GigaScience | Full Text. https://gigascience.biomedcentral.com/articles/10.1186/2047-217X-1-18.
Salmela, L. & Rivals, E. LoRDEC: accurate and efficient long read error correction. Bioinformatics 30, 3506–3514 (2014).
Phased diploid genome assembly with single-molecule real-time sequencing | Nature Methods. https://www.nature.com/articles/nmeth.4035.
Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinforma. Oxf. Engl. 36, 2896–2898 (2020).
Alonge, M. et al. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol. 23, 258 (2022).
TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads | GigaScience | Oxford Academic. https://academic.oup.com/gigascience/article/9/9/giaa094/5902284.
Chen, N. Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences. Curr. Protoc. Bioinforma. 5, 4.10.1–4.10.14 (2004).
Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinforma. Oxf. Engl. 29, 1072–1075 (2013).
Pangenome graph construction from genome alignments with Minigraph-Cactus | Nature Biotechnology. https://www.nature.com/articles/s41587-023-01793-w.
Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
Rice, E. S. et al. A pangenome graph reference of 30 chicken genomes allows genotyping of large and complex structural variants. BMC Biol. 21, 267 (2023).
Shin, H. et al. Gallus gallus breed Ginkkoridak isolate https://identifiers.org/ncbi/insdc:JBBEWE000000000.1 (2024).
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRP514511 (2024).
Structural Variations of Ginkkoridak extracted from 40 breed of Gallus gallus, EBI European Nucleotide Archive. https://identifiers.org/ena.embl:PRJEB81219 (2024).
Structural Variations of Ginkkoridak extracted from 40 breed of Gallus gallus, EBI European Variant Archive. http://www.ebi.ac.uk/eva/?eva-study=PRJEB81219 (2024).
40 chicken pangenome GFA, Figshare, https://doi.org/10.6084/m9.figshare.27240009
Rhie, A. et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746 (2021).
Huang, Z. et al. Evolutionary analysis of a complete chicken genome. Proc. Natl. Acad. Sci. 120, e2216641120 (2023).
Wu, S. et al. High quality assemblies of four indigenous chicken genomes and related functional data resources. Sci. Data 11, 300 (2024).
Sohn, J.-I. et al. Whole genome and transcriptome maps of the entirely black native Korean chicken breed Yeonsan Ogye. GigaScience 7, (2018).
Li, M. et al. De Novo Assembly of 20 Chicken Genomes Reveals the Undetectable Phenomenon for Thousands of Core Genes on Microchromosomes and Subtelomeric Regions. Mol. Biol. Evol. 39, (2022).
Bellott, D. W. et al. Avian W and mammalian Y chromosomes convergently retained dosage-sensitive regulators. Nat. Genet. 49, 387–394 (2017).
Acknowledgements
This research was carried out with the support of the “Cooperative Research Program for Agriculture Science & Technology Development (Project title: Discovery of phenotype specific genes from the reference genome of Ginkkoridak, Project No. PJ01334102)” Rural Development Administration, Republic of Korea.
Author information
Authors and Affiliations
Contributions
Byung June Ko, Jaehoon Jung and Heebal Kim conceived the study; Wonchoul Park and Han-ha Chai collected the samples; Hanshin D. Shin, Han-ha Chai assembled the genome; Youngho Lee assessed the genome quality; Hanshin Shin assembled the pangenome; Jaehoon Jung, Byung June Ko and Hanshin D. Shin analyzed non-reference sequences of pangenome. Hanshin D. Shin, Wonchoul Park and Heebal Kim wrote the manuscript. Also, all authors reviewed, revised and approved the final version of manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Shin, H.D., Park, W., Chai, Hh. et al. Chromosome-level Genome Assembly of Korean Long-tailed Chicken and Pangenome of 40 Gallus gallus Assemblies. Sci Data 12, 51 (2025). https://doi.org/10.1038/s41597-024-04287-9
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41597-024-04287-9







