Chromosome-level genome assembly of the white-spotted sawyer Monochamus scutellatus (Coleoptera: Cerambycidae)

Kim, Sangil; Farrell, Brian D.

doi:10.1038/s41597-025-06258-0


Data Descriptor
Open access
Published: 10 December 2025


Scientific Data volume 12, Article number: 2009 (2025)


Abstract

The white-spotted sawyer, Monochamus scutellatus (Say) (Coleoptera: Cerambycidae), is an important vector of pinewood nematode (PWN), Bursaphelenchus xylophilus (Steiner and Buhrer) Nickle, in North America. While Monochamus species from the Palearctic region have been extensively studied for their role in transmitting PWN, which causes pine wilt disease in Asia and Europe, the genetic mechanisms underlying Monochamus-PWN interactions in their native range remain largely unknown. Here, we present the first chromosome-level genome assembly of the North American M. scutellatus, constructed using PacBio HiFi long-read, Pore-C chromatin conformation capture, and Illumina RNA sequencing. The assembled genome spans 830.9 Mbp, with a scaffold N50 of 87.9 Mbp, and 97.9% of the assembly was anchored to 10 chromosome-level scaffolds. The X chromosome was identified through synteny analysis. Repeat elements constitute 70.7% of the genome, and 13,684 protein-coding genes were functionally annotated. This reference-quality genome of M. scutellatus provides a valuable comparative resource for elucidating the genomic basis of Monochamus-PWN interactions, and offers a foundation for devising targeted management strategies against PWN and its vectors.


Background & Summary

Longhorned beetles (family Cerambycidae) represent one of the most species-rich families of beetles, with over 36,000 described species in 4,100 genera worldwide^1,2. As a major lineage of phytophagous beetles, most longhorned beetles attack and feed on plant tissues, and their diversification is often explained by a coevolutionary radiation with diversifying angiosperms^3,4,5. Among them, the Asian longhorned beetle, Anoplophora glabripennis (Motschulsky), was one of the first species to be investigated for the genetic basis of plant-feeding evolution, due to its broad host range and invasive pest status in the United States^6,7,8. Recent transcriptomic and genomic studies of longhorned beetles, including a reference genome of A. glabripennis, have revealed the presence of horizontally acquired plant cell wall-degrading enzymes in the glycoside hydrolase families, which likely facilitate nutrient acquisition from nutrient-poor, recalcitrant woody tissues^6,9,10,11. More recently, chromosome-level genome assemblies have also been generated for two other important xylophagous longhorned beetles, Monochamus alternatus (Hope) and M. saltuarius (Gebler, 1830)^12,13—major vectors of pinewood nematodes in East Asia—providing an unparalleled opportunity to study the genomic basis of conifer-feeding and adaptive traits associated with life in temperate forests.

Monochamus longhorned beetles are distributed worldwide, except Australasia, and include a monophyletic clade of 18 conifer-feeding species restricted to temperate forests of the Holarctic region. These conifer specialists are inferred to have evolved within a predominantly angiosperm-feeding lineage of Monochamus at the Miocene-Pliocene boundary around 5 million years ago (Mya)¹⁴. In fact, most of these conifer-feeding species are known to transmit the pinewood nematode (PWN), Bursaphelenchus xylophilus (Steiner and Buhrer) Nickle (Nematoda: Aphelenchoididae), the causal agent of pine wilt disease in the Palearctic region^15,16,17 and a nematode species native to North America. While the biology and pest control measures for Palearctic Monochamus species, such as Monochamus alternatus and M. saltuarius, have been extensively studied, the genetic mechanisms underlying Monochamus-PWN interactions in their native North American range remain largely unexplored.

In this study, we present the first chromosome-level genome assembly of Monochamus scutellatus (Say), a major vector of PWN in North America¹⁸, generated from PacBio HiFi long reads, Pore-C chromatin conformation capture, and Illumina RNA-seq data. The genome spans 830.9 Mbp and comprises 10 pseudo-chromosomes (Fig. 2A,B; Table 2), consistent with previous cytological evidence¹⁹. Chromosome 10 was identified as the X chromosome based on synteny analysis, which revealed extensive conservation of the X chromosome across Coleoptera (Fig. 2C). With a genome size comparable to those of the two Palearctic congeners, M. alternatus (792.1 Mbp) and M. saltuarius (682.2 Mbp), the M. scutellatus genome demonstrates exceptional contiguity, reflected by fewer scaffolds and a higher N50 (Table 2). As the first genomic resource for a North American Monochamus species, the M. scutellatus genome provides a valuable foundation for investigating the genomic basis of Monochamus-PWN interactions in their region of origin and offers a key comparative framework for testing evolutionary hypotheses on the origin of these interactions, as well as their role in the beetles’ adaptation to utilizing the vast resource of coniferous forests across the Northern Hemisphere.

Methods

Sample collection

Adult specimens of Monochamus scutellatus were collected in July 2023 from Eastern white pine, Pinus strobus Linnaeus (Pinaceae), at Blue Hills Reservation, Milton, Massachusetts, U.S.A. (42°13.237′N, 71°7.037′W; elev. 81 m), using panel traps equipped with monochamol pheromone lures (Fig. 1). To minimize contamination from gut contents, all specimens were starved for several days, flash-frozen in liquid nitrogen, and subsequently cryo-preserved at −80 °C until used for extraction. Two female specimens were used: one for PacBio sequencing and the other for Pore-C and Illumina transcriptome sequencing. The voucher specimen for PacBio sequencing (voucher no.: MCZ-SK1313) has been deposited in the Entomology Research Collection at the Museum of Comparative Zoology, Harvard University.

Nucleic acid extraction and sequencing

High molecular weight (HMW) genomic DNA (gDNA) was extracted from the thoracic muscle tissue of an individual adult specimen using the Qiagen MagAttract HMW DNA Kit (Qiagen, Hilden, Germany). The integrity of the extracted gDNA was evaluated via gel electrophoresis on a 1% agarose gel with a lambda DNA marker, while concentration and purity were assessed using a Quantus Fluorometer (Promega, Madison, WI, USA) and a Nanodrop Spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). Purified HMW gDNA was treated with the Short Read Eliminator (SRE) XL Kit (Pacific Biosciences, Menlo Park, CA, USA) to remove DNA fragments below 40 kbp, and sheared into 20 kbp fragments using the Megaruptor 2 (Diagenode, Liège, Belgium). A PacBio SMRT library was constructed using the SMRTbell Prep Kit 3.0, and sequenced on a single SMRT HiFi cell of the PacBio Sequel IIe system at the National Instrumentation Center for Environmental Management (NICEM), Seoul National University (Seoul, Republic of Korea), generating 34.0 Gbp of HiFi reads (Table 1).

Table 1 Summary statistics of raw sequencing data for Monochamus scutellatus used in genome assembly.


Chromatin conformation capture sequencing was performed on half of a longitudinally bisected specimen (voucher no.: MCZ-SK1314) following the Pore-C protocol²⁰. Briefly, chromatin was fixed in situ within intact nuclei using formaldehyde to preserve native 3-D interactions. Following permeabilization of the nuclei, chromatin was denatured to expose accessible regions and digested with the restriction enzyme NlaIII (New England Biolabs, Ipswich, MA, USA). Proximally crosslinked DNA fragments were then ligated, and purified via phenol:chloroform extraction. The final Pore-C library was prepared using the Genomic DNA by Ligation Protocol [SQK-LSK114; Oxford Nanopore Technologies (ONT), Oxford, UK], and sequenced on a single flowcell of the PromethION system at NICEM (Seoul, Republic of Korea), yielding 22.5 Gbp of Pore-C reads with Phred Q-score ≥10 (Table 1).

Total RNA was extracted from the remaining half of the second specimen using the mirVana miRNA Isolation Kit (Invitrogen, Waltham, MA, USA). RNA concentration and integrity were evaluated using a Nanodrop Spectrophotometer and an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). An mRNA library was constructed using the NEBNext Ultra II RNA Library Prep Kit (New England Biolabs, Ipswich, MA, USA) and sequenced on a 150-bp paired-end S4 flowcell of the NovaSeq 6000 platform (Illumina, San Diego, CA, USA) at Novogene (Sacramento, CA, USA), producing 17.2 Gbp of Illumina RNA-seq reads (Table 1).

Genome assembly and scaffolding

To assemble the genome of M. scutellatus, genome size and heterozygosity were first estimated from the raw PacBio reads using Jellyfish v2.3.0²¹ and GenomeScope v2.0²² with a k-mer size of 35, yielding an estimated genome size of 774.11 Mbp and a heterozygosity of 1.21%. Primary contigs were assembled using Hifiasm v0.16.1²³, and haplotypic duplications were resolved by reassigning allelic contigs using Purge Haplotigs v1.1.2²⁴. The primary contigs were then screened for potential contamination using BlobTools v1.1.1²⁵; two of the 33 primary contigs, identified as prokaryotic or as having GC content atypical for arthropods, were removed.
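The idea behind k-mer-based genome size estimation can be illustrated with a deliberately simplified sketch. This is not the GenomeScope model, which fits a mixture model to the whole k-mer spectrum; it only divides the total number of k-mer occurrences by the depth of the main peak. The function name, the error cutoff, and the toy histogram are hypothetical and are not taken from the study.

```python
def estimate_genome_size(histogram, min_multiplicity=5):
    """Approximate genome size from a k-mer histogram.

    histogram: iterable of (multiplicity, distinct_kmer_count) pairs,
        i.e. the kind of table a k-mer counter can export.
    min_multiplicity: assumed cutoff; k-mers seen fewer times are treated
        as sequencing errors and ignored (hypothetical default).
    Returns (estimated_size_in_kmers, peak_depth).
    """
    kept = [(m, c) for m, c in histogram if m >= min_multiplicity]
    if not kept:
        raise ValueError("no k-mers above the error cutoff")
    # Peak depth: the multiplicity shared by the most distinct k-mers.
    # In a heterozygous genome this may be the heterozygous peak, which
    # is one reason real tools fit a model instead of taking a maximum.
    peak_depth = max(kept, key=lambda pair: pair[1])[0]
    # Total k-mer occurrences above the cutoff.
    total_kmers = sum(m * c for m, c in kept)
    return total_kmers / peak_depth, peak_depth


# Toy histogram (made-up numbers): an error tail plus one peak at depth 20.
toy_histogram = [(1, 1000), (2, 200), (18, 50), (19, 80),
                 (20, 100), (21, 80), (22, 50)]
size, depth = estimate_genome_size(toy_histogram)
print(size, depth)
```

With real data the histogram would hold millions of k-mers, and a heterozygosity near 1% would produce a second peak at roughly half the main depth, so a model-based estimate is preferable to this shortcut.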

Chromosome-level scaffolds were constructed from the primary contigs and Pore-C data using the Pore-C Snakemake workflow²⁰ and the 3D-DNA pipeline v180992²⁶. The Hi-C contact map was visualized and manually curated in Juicebox v2.20.00²⁷, and scaffolds were finalized with 3D-DNA. Scaffolding was further refined with RagTag v2.1.0²⁸ using the primary contigs as reference, increasing the mean scaffold length from 16.6 Mbp to 27.7 Mbp. Error correction was performed with Inspector v1.3.1²⁹ using the raw PacBio HiFi reads. The final genome assembly was 830.9 Mbp in total length, slightly larger than the estimated genome size, and 97.9% of it was assembled into 10 chromosome-scale scaffolds ranging from 28.4 Mbp to 152.6 Mbp (Table 2). A high-resolution Hi-C contact frequency heatmap was generated using HiGlass v0.8.0³⁰ to visualize chromosomal architecture (Fig. 2A). The mitochondrial genome was assembled using MitoHiFi v3.2³¹, guided by the mitochondrial genome of Anoplophora glabripennis (GenBank accession: NC_008221.1) as a reference, and the final mitochondrial contig was selected based on annotations from MitoFinder v1.4.1³².
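For readers unfamiliar with the contiguity metrics reported here, the following minimal sketch shows how a scaffold N50 and the fraction of an assembly anchored to the largest scaffolds are commonly computed. The function names and the toy lengths are hypothetical; the individual scaffold lengths of the M. scutellatus assembly are not reproduced here.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("empty length list")


def anchored_fraction(lengths, n_chromosomes):
    """Fraction of total length held by the n largest sequences."""
    ordered = sorted(lengths, reverse=True)
    return sum(ordered[:n_chromosomes]) / sum(ordered)


# Toy scaffold lengths in Mbp (made-up values for illustration only).
toy_lengths = [100, 80, 60, 40, 20]
print(n50(toy_lengths))                   # N50 of the toy assembly
print(anchored_fraction(toy_lengths, 3))  # share held by the 3 largest
```

Treating the largest scaffolds as the chromosome-level ones is an assumption of this sketch; in a curated assembly the chromosome set is defined during manual review of the contact map.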

Table 2 Summary statistics for genome assemblies and annotations of eight coleopteran genomes analyzed.


Repeat elements and gene annotations

Repeat regions and transposable elements (TEs) in the M. scutellatus genome were predicted and annotated using both homology-based and de novo prediction approaches within the Earl Grey pipeline v4.4.0³³. RepeatMasker v4.1.5³⁴ was used to annotate repeats based on the Dfam v3.8³⁵ TE database for Coleoptera, and RepeatModeler v2.0.5³⁶ was employed to generate a species-specific de novo repeat library for M. scutellatus. De novo consensus TE sequences were curated through the “BLAST, Extract, Extend and Trim” (BEAT) process³⁷, and long terminal repeat (LTR) retrotransposons were further annotated using LTR_Finder v1.07³⁸. Repetitive elements accounted for 70.7% of the genome, with unknown repeats and DNA transposons comprising 41.1% and 30.4% of all repeats, respectively (Table 3).
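Because the class percentages above are expressed relative to all repeats rather than to the genome, it can help to convert them into genome-wide shares. The short calculation below does this using only the rounded figures quoted in the text (830.9 Mbp, 70.7%, 41.1%, 30.4%); the results are therefore approximate, and Table 3 remains the authoritative source. The function and variable names are hypothetical.

```python
GENOME_MBP = 830.9          # total assembly length from the text
REPEAT_FRACTION = 0.707     # repeats as a fraction of the genome
SHARE_OF_REPEATS = {        # class shares as fractions of all repeats
    "unknown": 0.411,
    "DNA transposons": 0.304,
}


def repeat_breakdown(genome_mbp, repeat_fraction, share_of_repeats):
    """Convert per-class shares of repeats into Mbp and genome fractions."""
    repeat_mbp = genome_mbp * repeat_fraction
    return {
        name: {
            "mbp": repeat_mbp * share,
            "genome_fraction": repeat_fraction * share,
        }
        for name, share in share_of_repeats.items()
    }


breakdown = repeat_breakdown(GENOME_MBP, REPEAT_FRACTION, SHARE_OF_REPEATS)
for name, values in breakdown.items():
    print(f"{name}: {values['mbp']:.1f} Mbp, "
          f"{100 * values['genome_fraction']:.1f}% of genome")
```

Under these rounded inputs, unknown repeats and DNA transposons correspond to roughly 29% and 21% of the genome, respectively.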

Table 3 Summary statistics of repeat elements across eight coleopteran genomes analyzed.


Gene prediction was performed on the repeat-masked genome assembly using BRAKER v3.0.7³⁹, integrating species-specific transcriptomic and protein data, the Arthropoda OrthoDB, and reference protein datasets from Anoplophora glabripennis (GCF_000390285.2), Tribolium castaneum (GCF_000002335.3), Drosophila melanogaster (GCF_000001215.4) and Bombyx mori (GCF_030269925.1). A species-specific protein dataset for M. scutellatus was generated by assembling Illumina RNA-seq reads with rnaSPAdes v3.13.0⁴⁰, and translating RNA contigs into amino acid sequences with TransDecoder v5.7.0⁴¹. Prior to assembly, raw Illumina RNA-seq reads were adapter-trimmed and quality-filtered to a minimum Phred Q-score of 33 using Trimmomatic v0.39⁴². Ab initio gene prediction was conducted using AUGUSTUS v3.5.5⁴³, trained with transcriptome-based evidence from GeneMark-ET v4.72⁴⁴ and protein-based evidence from GeneMark-EP+ v4.72⁴⁵. Consensus gene models were generated by merging the outputs of seven gene prediction runs using TSEBRA v1.1.1⁴⁶, retaining only the longest isoform per gene. The final gene annotation was formatted into GFF using gFACs v1.1.2⁴⁷, resulting in a total of 21,110 predicted protein-coding genes (Table 2). Functional annotation of the predicted gene models was subsequently performed using eggNOG-mapper v2.1.12⁴⁸ based on the Insecta eggNOG database, yielding 13,684 functionally annotated genes (Tables 2, 4), with a BUSCO (Benchmarking Universal Single-Copy Orthologs) protein completeness score of 97.6% (Table 5).
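The "longest isoform per gene" rule mentioned above can be sketched generically as follows. This is an illustration of the selection logic only, not the procedure used by TSEBRA or gFACs; the record type, field names, and identifiers are hypothetical, and ties are resolved here by keeping the first transcript encountered.

```python
from collections import namedtuple

# Hypothetical minimal transcript record for illustration.
Transcript = namedtuple("Transcript",
                        ["gene_id", "transcript_id", "cds_length"])


def longest_isoform_per_gene(transcripts):
    """Keep, for each gene, the transcript with the longest CDS.

    Ties keep the transcript seen first (an arbitrary choice made for
    this sketch).
    """
    best = {}
    for tx in transcripts:
        current = best.get(tx.gene_id)
        if current is None or tx.cds_length > current.cds_length:
            best[tx.gene_id] = tx
    return best


toy_transcripts = [
    Transcript("g1", "g1.t1", 900),
    Transcript("g1", "g1.t2", 1500),
    Transcript("g2", "g2.t1", 600),
    Transcript("g1", "g1.t3", 1500),  # tie with g1.t2; g1.t2 is kept
]
selected = longest_isoform_per_gene(toy_transcripts)
for gene_id, tx in selected.items():
    print(gene_id, tx.transcript_id, tx.cds_length)
```

Reducing each gene to one representative protein in this way keeps downstream steps such as ortholog detection and BUSCO scoring from counting alternative isoforms as duplicated genes.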

Table 4 Summary statistics of functional annotations for protein-coding genes in the M. scutellatus genome.


Table 5 BUSCO assessment for the genome assembly, annotated proteins, and transcriptome assembly for M. scutellatus against the Insecta OrthoDB v10 (n = 1,367).


Synteny-based identification of the X chromosome

Genome synteny was analyzed across chromosome-level genome assemblies of seven Cerambycidae species, with Tribolium castaneum (Tenebrionidae) as an outgroup. Orthologous genes were identified via reciprocal-best BLASTp⁴⁹ hits among annotated proteins using DIAMOND v2.0.13⁵⁰. Chromosomal homology was evaluated through Bonferroni-corrected one-sided Fisher’s exact tests implemented in odp v0.3.3⁵¹, based on the reciprocal-best BLASTp results. Conserved macrosyntenic blocks were visualized as ribbon diagrams. The analysis revealed extensive conservation of the X chromosome across all seven Cerambycidae species and T. castaneum, consistent with previous reports of high X chromosome conservation in Coleoptera¹², and permitted the identification of chromosome 10 in the M. scutellatus genome as the X chromosome.
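The two statistical ingredients of this analysis, reciprocal-best hits and a one-sided Fisher's exact test with Bonferroni correction, can be sketched with the standard library alone. The code below is a generic illustration under stated assumptions and does not reproduce the odp implementation: hit tables are assumed to be dictionaries of (subject, bitscore) lists, and all function names and toy values are hypothetical.

```python
from math import comb


def reciprocal_best_hits(hits_a_to_b, hits_b_to_a):
    """Return sorted (a, b) pairs that are each other's best hit.

    Each argument maps a query ID to a list of (subject_id, bitscore).
    """
    def best(hits):
        return {q: max(subjects, key=lambda s: s[1])[0]
                for q, subjects in hits.items() if subjects}

    best_ab = best(hits_a_to_b)
    best_ba = best(hits_b_to_a)
    return sorted((a, b) for a, b in best_ab.items()
                  if best_ba.get(b) == a)


def fisher_one_sided(k, n_a, n_b, total):
    """P(overlap >= k) under the hypergeometric null.

    total: all ortholog pairs between two genomes
    n_a:   pairs whose gene lies on chromosome A in species 1
    n_b:   pairs whose gene lies on chromosome B in species 2
    k:     pairs located on both A and B
    """
    upper = min(n_a, n_b)
    tail = sum(comb(n_b, x) * comb(total - n_b, n_a - x)
               for x in range(k, upper + 1))
    return tail / comb(total, n_a)


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


toy_ab = {"a1": [("b1", 200.0), ("b2", 50.0)],
          "a2": [("b2", 90.0)],
          "a3": [("b1", 80.0)]}
toy_ba = {"b1": [("a1", 195.0), ("a3", 80.0)],
          "b2": [("a1", 50.0), ("a2", 88.0)]}
pairs = reciprocal_best_hits(toy_ab, toy_ba)
p = fisher_one_sided(k=3, n_a=3, n_b=3, total=10)
print(pairs, p, bonferroni(p, n_tests=100))
```

In a genome-wide comparison, the number of tests passed to the correction would be the number of chromosome pairs compared between the two species.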

Orthologous gene identification and phylogenomic inference

Orthologous genes and orthogroups across the seven Cerambycidae and T. castaneum genomes were identified using OrthoFinder v2.5.5⁵², with DIAMOND for sequence alignment and FastTree v2.1.11⁵³ for maximum likelihood (ML) tree inference. A total of 3,708 single-copy orthologs were identified, aligned with MAFFT v7.5.26⁵⁴, and trimmed using trimAl v1.4⁵⁵ with the gappyout algorithm. ML gene trees were inferred using IQ-TREE v2.2.2.6⁵⁶ with the MFP+MERGE option. A species tree was reconstructed under the multispecies coalescent model in ASTRAL v5.7.8⁵⁷, with T. castaneum as the outgroup. Divergence times were estimated within a Bayesian framework in MCMCtree, implemented in PAML v4.10.7⁵⁸, employing the approximate likelihood calculation method and two calibration points: (1) a fossil calibration for the crown-group Cerambycidae, based on the age of †Cretoprionus liutiaogouensis Wang et al. from the Lower Cretaceous circa 122.5–124.0 Mya⁵⁹; and (2) a secondary calibration for the Cerambycidae-Tenebrionidae divergence at approximately 220.2 Mya [95% highest posterior density (HPD): 188.1–237.6 Mya]⁵. The resulting time-calibrated phylogeny supports a sister-group relationship between M. scutellatus and the Palearctic clade comprising M. alternatus and M. saltuarius, which diverged approximately 11.8 Mya (95% HPD: 8.0–16.3 Mya) (Fig. 2C), providing robust evidence for the systematic placement and divergence history of M. scutellatus within Cerambycidae.
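As a small illustration of what "single-copy ortholog" means operationally, the sketch below filters a table of per-species gene counts for orthogroups that contain exactly one gene in every species. The input layout, function name, and identifiers are hypothetical; OrthoFinder reports single-copy orthogroups itself, so this only mirrors the selection criterion.

```python
def single_copy_orthogroups(gene_counts, species):
    """Return orthogroup IDs with exactly one gene in every species.

    gene_counts: dict mapping orthogroup ID -> {species: gene count}.
    species: the species that must each contribute exactly one gene.
    """
    return sorted(
        og for og, counts in gene_counts.items()
        if all(counts.get(sp, 0) == 1 for sp in species)
    )


# Toy counts for three hypothetical species.
toy_species = ["sp1", "sp2", "sp3"]
toy_counts = {
    "OG0001": {"sp1": 1, "sp2": 1, "sp3": 1},  # single-copy everywhere
    "OG0002": {"sp1": 2, "sp2": 1, "sp3": 1},  # duplicated in sp1
    "OG0003": {"sp1": 1, "sp2": 1},            # missing from sp3
    "OG0004": {"sp1": 1, "sp2": 1, "sp3": 1},
}
single_copy = single_copy_orthogroups(toy_counts, toy_species)
print(single_copy)
```

Restricting the dataset to such orthogroups avoids paralogy in the gene trees that feed the coalescent species-tree step.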

Data Records

All raw sequencing data (PacBio HiFi, ONT Pore-C, and Illumina RNA-seq) used for genome assembly and annotation for M. scutellatus have been deposited in NCBI BioProject PRJNA1289024. PacBio HiFi sequencing data, ONT Pore-C sequencing data and Illumina RNA-seq data are available within the NCBI Sequence Read Archive (SRA) under accession numbers SRR34444379⁶⁰, SRR34444378⁶¹, and SRR34444377⁶², respectively. The final chromosome-level genome assembly has been deposited in GenBank under accession number GCA_052862855.1⁶³. The genome assembly and annotation files are available from the Figshare Repository (https://doi.org/10.6084/m9.figshare.29575361⁶⁴).

Technical Validation

The final genome assembly, constructed using PacBio HiFi and Pore-C sequencing data along with transcriptome data, comprises 10 chromosome-level scaffolds. Genome completeness was assessed with BUSCO v5.8.0⁶⁵ against the Insecta OrthoDB v10⁶⁶ dataset, which recovered 99.0% of core single-copy orthologs, with only 0.8% duplicated (Table 5), exceeding the 90% BUSCO threshold recommended for reference genomes⁶⁷. To further evaluate assembly quality, raw PacBio HiFi reads were mapped back to the final genome assembly using Minimap2 v2.21⁶⁸ within Inspector, resulting in a mapping rate of 99.6%, an average alignment depth of 40.7×, and an assembly quality value (QV) of 36.9 (Table 6), indicating a highly complete and contiguous assembly.
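The QV and depth figures above can be put in more intuitive terms with a short calculation. The sketch assumes the usual Phred-style definition, QV = −10·log10(error rate), and treats the mapping rate as if it applied to bases rather than reads when approximating depth; both are simplifying assumptions of this example, and Inspector derives its values from the actual alignments. Function names are hypothetical.

```python
import math


def qv_to_error_rate(qv):
    """Per-base error rate implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(rate):
    """Phred-scaled quality value implied by a per-base error rate."""
    return -10 * math.log10(rate)


def approximate_depth(total_bases, mapping_rate, assembly_length):
    """Rough mean depth: mapped bases divided by assembly length."""
    return total_bases * mapping_rate / assembly_length


error_rate = qv_to_error_rate(36.9)
depth = approximate_depth(total_bases=34.0e9,      # HiFi yield from the text
                          mapping_rate=0.996,      # mapping rate from the text
                          assembly_length=830.9e6) # assembly size from the text
print(f"error rate ~ {error_rate:.2e} per base")
print(f"approximate depth ~ {depth:.1f}x")
```

A QV of 36.9 corresponds to roughly two errors per 10 kbp, and the approximate depth obtained from the rounded inputs is close to the reported 40.7×.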

Table 6 Summary statistics for raw long-read sequencing data mapped to the genome assembly of M. scutellatus.


Data availability

All datasets are available through the NCBI SRA (https://www.ncbi.nlm.nih.gov/sra) under accession numbers SRR34444377, SRR34444378 and SRR34444379; the NCBI GenBank (https://identifiers.org/ncbi/insdc.gca:GCA_052862855.1); and the Figshare Repository (https://doi.org/10.6084/m9.figshare.29575361).

Code availability

All software and pipelines were executed according to the manuals provided by the published bioinformatics tools. The version of each program is provided in the Methods section, and default parameters were used unless otherwise stated. No custom scripts were used.

References

Svacha, P. & Lawrence, J. F. Cerambycidae Latreille, 1802. in Handbook of Zoology, Arthropoda: Insecta; Coleoptera, Beetles, Volume 3: Morphology and systematics (Phytophaga) (eds Richard A. B. Leschen & Rolf G. Beutel) 77–177 (Walter de Gruyter, 2014).
Monné, M. L., Monné, M. A. & Wang, Q. General morphology, classification, and biology of Cerambycidae. in Cerambycidae of the world: Biology and pest management (ed Qiao Wang) 1–70 (CRC Press, 2017).
Farrell, B. D. “Inordinate fondness” explained: Why are there so many beetles? Science 281, 555–559 (1998).
Farrell, B. D. & Mitter, C. The timing of insect/plant diversification: Might Tetraopes (Coleoptera: Cerambycidae) and Asclepias (Asclepiadaceae) have co-evolved? Biological Journal of the Linnean Society 63, 553–577 (1998).
McKenna, D. D. et al. The evolution and genomic basis of beetle diversity. Proceedings of the National Academy of Sciences of the United States of America 116, 24729–24737 (2019).
McKenna, D. D. et al. Genome of the Asian longhorned beetle (Anoplophora glabripennis), a globally significant invasive species, reveals key functional and evolutionary innovations at the beetle–plant interface. Genome Biology 17, 227–227 (2016).
Scully, E. D. et al. Functional genomics and microbiome profiling of the Asian longhorned beetle (Anoplophora glabripennis) reveal insights into the digestive physiology and nutritional ecology of wood feeding beetles. BMC genomics 15, 1096–1096 (2014).
Scully, E. D. et al. Metagenomic profiling reveals lignocellulose degrading system in a microbial community associated with a wood-feeding beetle. PLoS ONE 8, 1–22 (2013).
Kirsch, R. et al. Horizontal gene transfer and functional diversification of plant cell wall degrading polygalacturonases: Key events in the evolution of herbivory in beetles. Insect Biochemistry and Molecular Biology 52, 33–50 (2014).
Shin, N. R. et al. Larvae of longhorned beetles (Coleoptera; Cerambycidae) have evolved a diverse and phylogenetically conserved array of plant cell wall degrading enzymes. Systematic Entomology 46, 784–797 (2021).
Shin, N. R., Doucet, D. & Pauchet, Y. Duplication of horizontally acquired GH5-2 enzymes played a central role in the evolution of longhorned Beetles. Molecular Biology and Evolution 39, 1–14 (2022).
Fu, N. et al. Chromosome-level genome assembly of Monochamus saltuarius reveals its adaptation and interaction mechanism with pine wood nematode. International Journal of Biological Macromolecules 222, 325–336 (2022).
Gao, Y. F. et al. Chromosome-level genome assembly of the Japanese sawyer beetle Monochamus alternatus. Scientific Data 11, 199 (2024).
Gorring, P. S. & Farrell, B. D. Evaluating species boundaries using coalescent delimitation in pine-killing Monochamus (Coleoptera: Cerambycidae) sawyer beetles. Molecular Phylogenetics and Evolution 184, 107777 (2023).
Tóth, Á. Bursaphelenchus xylophilus, the pinewood nematode: Its significance and a historical review. Acta Biologica Szegediensis 55, 213–217 (2011).
Akbulut, S. & Stamps, W. T. Insect vectors of the pinewood nematode: A review of the biology and ecology of Monochamus species. Forest Pathology 42, 89–99 (2012).
Vicente, C., Espada, M., Vieira, P. & Mota, M. Pine wilt disease: A threat to European forestry. European Journal of Plant Pathology 133, 89–99 (2012).
Wingfield, M. J. & Blanchette, R. A. The pine-wood nematode, Bursaphelenchus xylophilus, in Minnesota and Wisconsin: Insect associates and transmission studies. Canadian Journal of Forest Research 13, 1068–1076 (1983).
Smith, S. G. Chromosome numbers of Coleoptera. Heredity 7, 31–48 (1953).
Deshpande, A. S. et al. Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology 40, 1488–1499 (2022).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nature Communications 11, 1432 (2020).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18, 170–175 (2021).
Roach, M. J., Schmidt, S. A. & Borneman, A. R. Purge Haplotigs: Allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19, 460 (2018).
Laetsch, D. R. & Blaxter, M. L. BlobTools: Interrogation of genome assemblies. F1000Research 6, 1287–1287 (2017).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Systems 3, 99–101 (2016).
Alonge, M. et al. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biology 23, 1–19 (2022).
Chen, Y., Zhang, Y., Wang, A. Y., Gao, M. & Chong, Z. Accurate long-read de novo assembly evaluation with Inspector. Genome Biology 22, 321 (2021).
Kerpedjiev, P. et al. HiGlass: Web-based visual exploration and analysis of genome interaction maps. Genome Biology 19, 125 (2018).
Uliano-Silva, M. et al. MitoHiFi: A python pipeline for mitochondrial genome assembly from PacBio high fidelity reads. BMC Bioinformatics 24, 288 (2023).
Allio, R. et al. MitoFinder: Efficient automated large-scale extraction of mitogenomic data in target enrichment phylogenomics. Molecular Ecology Resources 20, 892–905 (2020).
Baril, T., Galbraith, J. & Hayward, A. Earl Grey: A fully automated user-friendly transposable element annotation and analysis pipeline. Molecular Biology and Evolution 41, 1–18 (2024).
Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0. Retrieved from https://www.repeatmasker.org (2023).
Storer, J., Hubley, R., Rosen, J., Wheeler, T. J. & Smit, A. F. The Dfam community resource of transposable element families, sequence models, and genome annotations. Mobile DNA 12, 1–14 (2021).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences of the United States of America 117, 9451–9457 (2020).
Platt, R. N., Blanco-Berdugo, L. & Ray, D. A. Accurate transposable element annotation is vital when analyzing new genome assemblies. Genome Biology and Evolution 8, 403–410 (2016).
Xu, Z. & Wang, H. LTR_FINDER: An efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Research 35, W265–W268 (2007).
Gabriel, L. et al. BRAKER3: Fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS, and TSEBRA. Genome Research 34, 769–777 (2024).
Bushmanova, E., Antipov, D., Lapidus, A. & Prjibelski, A. D. RnaSPAdes: A de novo transcriptome assembler and its application to RNA-Seq data. GigaScience 8, 1–13 (2019).
Haas, B. J. TransDecoder (version 5.7.0) Retrieved from https://github.com/TransDecoder/TransDecoder (2023).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008).
Lomsadze, A., Ter-Hovhannisyan, V., Chernoff, Y. O. & Borodovsky, M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Research 33, 6494–6506 (2005).
Brůna, T., Lomsadze, A. & Borodovsky, M. GeneMark-EP+: Eukaryotic gene prediction with self-training in the space of genes and proteins. NAR Genomics and Bioinformatics 2, 1–14 (2020).
Gabriel, L., Hoff, K. J., Brůna, T., Borodovsky, M. & Stanke, M. TSEBRA: Transcript selector for BRAKER. BMC Bioinformatics 22, 1–12 (2021).
Caballero, M. & Wegrzyn, J. gFACs: Gene filtering, analysis, and conversion to unify genome annotations across alignment and gene prediction frameworks. Genomics, Proteomics & Bioinformatics 17, 305–310 (2019).
Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: Functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Molecular Biology and Evolution 38, 5825–5829 (2021).
Altschul, S. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research 25, 3389–3402 (1997).
Buchfink, B., Reuter, K. & Drost, H. G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nature Methods 18, 366–368 (2021).
Schultz, D. T. et al. Ancient gene linkages support ctenophores as sister to other animals. Nature 618, 110–117 (2023).
Emms, D. M. & Kelly, S. OrthoFinder: Phylogenetic orthology inference for comparative genomics. Genome Biology 20, 1–14 (2019).
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2 – Approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Molecular Biology and Evolution 30, 772–780 (2013).
Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
Minh, B. Q. et al. IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Molecular Biology and Evolution 37, 1530–1534 (2020).
Zhang, C., Rabiee, M., Sayyari, E. & Mirarab, S. ASTRAL-III: Polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics 19, 15–30 (2018).
Yang, Z. PAML 4: Phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24, 1586–1591 (2007).
Wang, B. et al. The earliest known longhorn beetle (Cerambycidae: Prioninae) and implications for the early evolution of Chrysomeloidea. Journal of Systematic Palaeontology 12, 565–574 (2013).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR34444379 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR34444378 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR34444377 (2025).
Kim, S. & Farrell, B. D. Chromosome-level genome assembly of the white-spotted sawyer beetle Monochamus scutellatus (Coleoptera: Cerambycidae). GenBank https://identifiers.org/ncbi/insdc.gca:GCA_052862855.1 (2025).
Kim, S. & Farrell, B. D. Chromosome-level genome assembly of the white-spotted sawyer Monochamus scutellatus (Coleoptera: Cerambycidae). figshare https://doi.org/10.6084/m9.figshare.29575361 (2025).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Kriventseva, E. V. et al. OrthoDB v10: Sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Research 47, D807–D811 (2019).
Lewin, H. A. et al. Earth BioGenome Project: Sequencing life for the future of life. Proceedings of the National Academy of Sciences of the United States of America 115, 4325–4333 (2018).
Li, H. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).


Acknowledgements

Computational analyses were performed on the FASRC Cannon cluster supported by the FAS Division of Science Research Computing Group, Harvard University. This work was supported by the Graduate Research Fund of the Department of Organismic and Evolutionary Biology and the Dean’s Competitive Fund for Promising Scholarship at Harvard University; and grants from the National Institute of Biological Resources, the Ministry of Environment (MOE), Republic of Korea (No. NIBR202405101) and the National Research Foundation of Korea, funded by the Ministry of Education, Science and Technology (MEST) (No. 2019R1A6A1A10073437). Fieldwork was supported by the Putnam Expedition Grant, and publication costs were covered by the Wetmore Colles Fund of the Museum of Comparative Zoology, Harvard University.

Author information

Authors and Affiliations

Museum of Comparative Zoology and Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
Sangil Kim & Brian D. Farrell
Research Institute of Basic Sciences and School of Biological Sciences, Seoul National University, Seoul, 08826, Republic of Korea
Sangil Kim


Contributions

Sangil Kim: Conceptualization; data curation; formal analysis; funding acquisition; investigation; methodology; project administration; resources; validation; visualization; writing – original draft; writing – review and editing. Brian D. Farrell: Conceptualization; data curation; funding acquisition; investigation; methodology; project administration; resources; supervision; validation; writing – review and editing.

Corresponding author

Correspondence to Sangil Kim.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Kim, S., Farrell, B.D. Chromosome-level genome assembly of the white-spotted sawyer Monochamus scutellatus (Coleoptera: Cerambycidae). Sci Data 12, 2009 (2025). https://doi.org/10.1038/s41597-025-06258-0


Received: 18 July 2025
Accepted: 03 November 2025
Published: 10 December 2025
Version of record: 29 December 2025
DOI: https://doi.org/10.1038/s41597-025-06258-0