Background & Summary

Trichophyton is a genus of dermatophytes, a group of fungi that are parasitic on the skin, hair, and nails of humans and animals. These fungi are renowned for their keratolytic abilities, allowing them to break down keratin, a key structural protein found in skin, hair, and nails1. They can cause a variety of clinical manifestations, including tinea capitis (scalp ringworm), tinea corporis (body ringworm), tinea pedis (athlete’s foot), and onychomycosis (nail infection). The infections are generally acquired through direct contact with an infected individual or animal, or indirectly through contaminated surfaces or objects2.

Trichophyton belongs to the family Arthrodermataceae and is classified within the phylum Ascomycota. Some of the most well-known species in this genus include Trichophyton rubrum, Trichophyton mentagrophytes, Trichophyton tonsurans, and Trichophyton verrucosum3,4. T. mentagrophytes and T. tonsurans are two species that have significant medical importance5. T. mentagrophytes is known to cause a wide range of dermatophytoses, including tinea corporis and tinea pedis. It is also a common cause of ringworm in small mammals, which can act as a reservoir for human infections6,7. T. tonsurans is a leading cause of tinea capitis, particularly in children, and has become increasingly resistant to conventional antifungal treatments8.

To date, nearly all available Trichophyton genomes are fragmented into contigs or scaffolds, with none achieving chromosome-level resolution. For instance, prior assemblies of T. mentagrophytes and T. tonsurans exhibited contig N50 values below 400 kb and lacked telomere-to-telomere (T2T) continuity (Table 1), which hindered precise gene synteny analysis, repeat element characterization, and haplotype-specific investigations. These limitations impede the identification of virulence factors, antifungal resistance markers, and species-specific genomic features critical for diagnostics and therapeutics.

Table 1 Comparison of assembly quality with the existing genomes of T. mentagrophytes and T. tonsurans.

To address this gap, we employed PacBio HiFi circular consensus sequencing (CCS) technology to generate long-read data (average read length ~20 kb, N50 > 19 kb) for T. mentagrophytes and T. tonsurans, implying the reliability of sequencing data. Using Hifiasm, we achieved gapless, haplotype-resolved assemblies for both species. The genome sizes of T. mentagrophytes and T. tonsurans span 24.48 Mb and 25.24 Mb, respectively, with contig N50 values of 6.47 Mb and 12.65 Mb. Each assembly comprises three pseudochromosomes (accounting for 95.58–96.67% genome coverage), two of which are fully T2T with canonical telomeric repeats (TAACCC/GGGTTA) at both ends. BUSCO analysis confirmed exceptional completeness (99% for fungi_odb10), while HiFi read remapping rates (>99.99%) validated assembly accuracy. Notably, our annotations revealed 7,577 (T. mentagrophytes) and 7,519 (T. tonsurans) protein-coding genes, with >99% functionally annotated through GO, KEGG, and Swiss-Prot databases. We further identified lineage-specific repeat expansions: T. tonsurans harbors abundant LINE retrotransposons (2.29% of the genome), absent in T. mentagrophytes, suggesting post-speciation transposable element activity. These chromosome-scale assemblies provide the first comprehensive view of Trichophyton genome architecture, enabling precise comparative genomics, and pathogenicity island mapping. This resource is pivotal for developing targeted antifungals, molecular diagnostics, and understanding mechanisms underlying drug resistance and host adaptation.

Methods

Sample collection and fungal cultivation

The clinical isolates of T. mentagrophytes and T. tonsurans were harvested from School of Medicine, Henan Polytechnic University, Jiaozuo city, Henan Province, P. R. China. For fungal cultivation, the isolates were cultured in Sabouraud Dextrose Broth (SDB) prepared using (40 g Dextrose (Glucose), 10 g Peptone, 1000 mL Distilled water and pH adjusted to 5.6) for 5–7 days of incubation at 28 °C in a shaking incubator with steady speed of 100 rpm. The mycelium was filtered out using a sterilized cheesecloth and rinse off culture media with sterilized distilled water.

DNA extraction and sequencing

High-quality genomic DNA of T. mentagrophytes and T. tonsurans was extracted from hyphae using the modified CTAB9. Specifically, about 0.2–0.5 g of harvested mycelium was weighed and frozen in liquid nitrogen. The frozen samples were transferred into pre-chilled mortar containing nitrogen and ground with a pestle into a fine powder. The ground samples were transferred into 1.5 mL Eppendorf tubes and 500 µL of CTAB extraction buffer (100 mM Tris-HCl (pH 8.0), 20 mM EDTA, 1.4 M, 1% (w/v) PVP, and 0.2% (v/v) β-mercaptoethanol) and homogenized by gently vertexing the tube for 5–10 minutes to allow proper mixing and phase separation. The content was centrifuged at 12,000 rpm for 10–15 minutes at room temperature. 300 μL of the upper aqueous phase supernatant (containing the DNA) was pipetted into a new sterilized EP tube and 0.6 volumes of cold isopropanol added mix gently and incubated in −20 °C for 30 minutes to precipitate the DNA.

The content was centrifuged at 12,000 rpm for 10 minutes at 4 °C to pellet the DNA. The supernatant was discarded and 500 µL of 70% ethanol was added to the DNA pellet, gently inverted to wash the DNA and then centrifuged again at 12,000 rpm for 5 minutes at 4 °C. The ethanol wash was discarded and the DNA pellet air dried for 10–15 minutes to remove residual ethanol. The extracted DNA was eluted with a pre-warmed 50 μL RNAse-free water (the RNAse-free water was prewarmed to 55–60 °C to improve the elution) and stored −80 °C.

The quality and quantity of the extracted DNA were examined using a NanoDrop 2000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA), Qubit dsDNA HS Assay Kit on a Qubit 3.0 Fluorometer (Life Technologies, Carlsbad, CA, USA) and electrophoresis on a 0.8% agarose gel, respectively. The SMRTbell library was constructed using the SMRTbell Express Template Prep kit 2.0 (Pacific Biosciences). Briefly, 15 μg of the genomic DNA was carried into the first enzymatic reaction to remove single-stranded overhangs followed by treatment with repair enzymes to repair any damage that may be present on the DNA backbone. After DNA damage repair, ends of the double-stranded fragments were polished and subsequently tailed with an A-overhang. Ligation with T-overhang SMRTbell adapters was performed at 20 °C for 60 minutes. Following ligation, the SMRTbell library was purified with 1X AMPure PB beads. The size distribution and concentration of the library were assessed using the FEMTO Pulse automated pulsed-field capillary electrophoresis instrument (Agilent Technologies, Wilmington, DE) and the Qubit 3.0 Fluorometer (Life Technologies, Carlsbad, CA, USA). Following library characterization, 3 μg was subjected to a size selection step using the BluePippin system (Sage Science, Beverly, MA) to remove SMRTbells ≤15 kb. After size selection, the library was purified with 1X AMPure PB beads. Library size and quantity were assessed using the FEMTO Pulse and the Qubit dsDNA HS reagents Assay kit. Sequencing primer and Sequel II DNA Polymerase were annealed and bound, respectively, to the final SMRTbell library. The library was loaded at an on-plate concentration of 120 pM using diffusion loading. SMRT sequencing was performed using a single 8 M SMRT Cell on the Sequel II System with Sequel II Sequencing Kit, 1800-minute movies by Frasergen. Finally, we obtained 5.51 Gb and 8.33 Gb high fidelity (HiFi) reads for T. mentagrophytes and T. tonsurans, with an average depth of 225X and 330X, respectively (Table 2). Both the average length and N50 length of the HiFi data were close to 20 kb.

Table 2 Statistics of HiFi reads of two Trichophyton species.

De novo genome assembly

A total of 269 thousand and 450 thousand HiFi reads for T. mentagrophytes and T. tonsurans were assembled using the Hifiasm software (version 0.19.5-r587) with parameters of ‘–telo-m CCCTAAA’10. The genomes of T. mentagrophytes and T. tonsurans showed no gap, with the total sizes of 24.48 Mb and 25.24 Mb, respectively (Table 3). The contig numbers were 19 and 24, with the contig N50 values of 6.47 Mb and 12.65 Mb, respectively. Notably, we found three super-contigs that accounted for 95.58% and 96.67% of the T. mentagrophytes and T. tonsurans genomes, respectively. Moreover, two super-contigs of the two assemblies exhibited the canonical telomere repeat sequences in both ends (Fig. 1A,B, Table 4). Together, these results suggested that the three super-contigs should represent pseudo-chromosomes of the genomes and two of them had achieved telomere-to-telomere (T2T) level. Moreover, we used the previously published mitochondrion sequences of T. mentagrophytes11 to search the mitochondrion genome in the draft genomes of two species. Finally, we obtained two mitochondrion genomes for T. mentagrophytes and T. tonsurans, with the total sizes of 24,299 bp and 24,306 bp, respectively. The genome assembly was visualized using Circos (version 0.69-6)12. The genomic synteny was analyzed and visualized using JCVI (version 1.4.22) software with default parameters13.

Table 3 Genomic characteristics of two Trichophyton species.
Fig. 1
figure 1

(A,B) The Circos plots displayed the chromosome (circle A), gene density (circle B), repeat density (circle C) and lncRNA density (circle D), and the synteny lines represented the paralogous genes. The triangles in circle A represented the telomere regions.

Table 4 Information of telomere in the genomes of two Trichophyton species.

The completeness of the genome was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO, version 5.5.0) with fungi_odb10, ascomycota_odb10 databases14. The results showed that the two genomes yielded over 99% of genomic completeness (Fig. 2, Table 3). The HiFi reads were realigned to the assembly using Minimap2 (version 2.26-r1175, with parameters of ‘-ax map-hifi’) and Samtools (version 1.17) software15,16, revealing that more than 99.99% of the HiFi reads can be mapped to the corresponding assembly. These results indicated the high quality, accuracy, continuity and completeness of the T. mentagrophytes and T. tonsurans genome assemblies.

Fig. 2
figure 2

BUSCO analysis in genome mode for T. mentagrophytes and T. tonsurans by using fungi_odb10 and ascomycota_odb10 databases.

Two haploid genomes of each species were assembled using Hifiasm (version 0.19.5-r587) with parameters of ‘–telo-m CCCTAAA’, and they showed similar genome sizes and the contig N90 values (Table 3). Crucially, all of the haploid genomes contained three pseudo-chromosomes that accounted for more than 95% of the genomes, and we also observed the canonical telomere repeats the ends of chromosomes (Table 4), highlighting the high quality and continuity of the haploid genomes. Moreover, BUSCO analysis for each haploid genome showed a high completeness (Table 4).

Genome annotation

Gene structure prediction was carried out using the GETA software (version 2.5.7), which is accessible at its GitHub repository (https://github.com/chenlianfu/GETA). GETA integrates three different gene prediction approaches: the de novo method, homology-based prediction, and transcriptome-based prediction. Throughout this process, a suite of tools was applied, including PASA17, genewise18, AUGUSTUS19, Hisat220, hmmsearch21, BGM2AT, exonerate, and others. In total, 7577 and 7519 genes were identified in the T. mentagrophytes and T. tonsurans genomes, with 1965 and 1826 genes containing alternative isoforms (Table 3). Additionally, we discovered 503 and 421 long-noncoding RNAs (lncRNAs) in the two genomes.

To infer gene functions, interpro-scan (version 5.65–97.0) and eggNOG-Mapper (version 2.1.12) were used, and a range of gene function databases were consulted for detailed functional annotations. These databases included Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam, Clusters of Orthologous Groups (COG), and the Swiss-Prot protein database22,23,24,25,26,27,28. Interestingly, nearly all the genes (7497/7577 in T. mentagrophytes and 7449/7519 in T. tonsurans) were well-annotated, implying the accuracy and completeness of the gene prediction results (Fig. 3A,B).

Fig. 3
figure 3

Numbers of genes in T. mentagrophytes (A) and T. tonsurans (B) that were annotated in the five gene function databases for the two species.

For repeat annotation in the two genomes, the de novo identification of repeat elements was conducted with RepeatModeler (version 2.0.3)29. The repeat consensus sequences identified were then used as seeds for RepeatMasker (version 4.1.0)30 to locate and analyze all repeat regions within the genome. To predict full-length LTR retrotransposons (LTR-RTs), a combination of tools was used, including LTR_finder (version 1.07, with parameters of ‘-D 15000 -d 1000 -L 7000 -l 100 -p 20 -C -M 0.8’)31, LTRharvest (version 1.6.2, with parameters of ‘-similar 80 -vic 10 -seed 20 -seqids yes -minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6’)32, and LTR_retriever (version 2.9.0, with default parameters)33. The simple repeat with the repeat unit of either ‘TAACCC’ or ‘GGGTTA’ was considered as the telomeric repeat34. A total of 8.89% and 9.99% of the T. mentagrophytes and T. tonsurans genomes were covered by repeat elements (Table 3). Among these repeats, LTR retrotransposon (LTR-RT) Ty3/Gypsy elements were predominant in both genomes (Table 5). Notably, we discovered a substantial amount of non-LTR-RT Long interspersed nuclear elements (LINEs) exclusively in the T. tonsurans genome, suggesting that these TEs were amplified after the divergence between the two species.

Table 5 Repeat classification and content in the two Trichophyton genomes.

Data Records

The raw HiFi reads of T. tonsurans and T. mentagrophytes were deposited in NCBI under the accessions of SRP53618035 and SRP53614536, respectively. The genome assemblies have been deposited at GeneBank with accession numbers of GCA_047301375.137 (T. tonsurans) and GCA_047301425.138 (T. mentagrophytes) Additionally, we have deposited the genome assemblies, including haploid genomes, and gene annotation results in Figshare39,40.

Technical Validation

Several methods were employed to assess the quality of the genomes. At first, BUSCO analysis with two databases, including fungi_odb10, ascomycota_odb10, was used to evaluate the completeness of genome. The results showed that the two assemblies exhibited over 99% of genome completeness according to fungi_odb10 dataset and 97.3% of genome completeness according to ascomycota_odb10. Then, the genomes of two species contained no gap, and the three pseudo-chromosomes covered over 95% of the genomes, and two of the three pseudo-chromosomes exhibited the telomere repeat sequences in the both ends. Moreover, over 99.99% of the HiFi reads can be realigned to the genome assemblies. In aggregate, these results indicated the high quality, accuracy, continuity and completeness of the T. mentagrophytes and T. tonsurans genome assemblies.