Introduction

Fascioliasis is the most widespread foodborne parasitic zoonosis and is recognized as a neglected tropical disease by the World Health Organization. It poses the greatest burden in countries across South America, the Middle East, and South/Southeast Asia1. The highest prevalence in humans is observed in the Andean regions of South America, where prevalence in children can reach 20–70%2. Fasciola hepatica, the parasite responsible for fascioliasis, has a complex lifecycle that requires snail intermediate hosts and mammalian definitive hosts. Livestock are the most common definitive hosts for fascioliasis, though humans also contribute to the cycle in areas of high prevalence. Livestock fascioliasis causes significant economic impact, with estimated annual losses of €635 million in 18 European countries3. This economic burden is associated with reduced production of milk, meat, and wool, decreased fertility, and liver condemnation3,4,5,6,7. In developing countries, economic burden estimates are often based on limited regional data and abattoir registries, which may not fully capture the extent of losses. These economic impacts exacerbate cycles of poverty and food insecurity in affected regions. F. hepatica infection in humans leads to a biphasic illness depending on the parasite’s lifecycle stage. Acute fascioliasis, resulting from the migration of juvenile parasites through the liver, manifests as debilitating fever, abdominal pain, and weight loss. Chronic fascioliasis, caused by adult parasites residing in the biliary tree, presents with nonspecific symptoms and may lead to bile duct obstruction. This disease disproportionately affects children, contributing to the burden of anemia and malnutrition in an already vulnerable population8,9,10. Although less well-characterized, fascioliasis also contributes to liver and biliary tree disease, causing biliary obstruction, ascending cholangitis, and possibly liver/biliary fibrosis11,12.

The number of drugs to treat livestock and human fascioliasis is limited by effectiveness, activity on juvenile stages, toxicity, and withdrawal periods in livestock13. Clorsulon, closantel, and nitroxynil are veterinary drugs with limited efficacy against early juvenile stages14. Bithionol and dehydroemetine, previously used in human treatments, were discontinued due to toxicity and drug interactions15. Triclabendazole remains the only drug considered highly effective against both juvenile and adult stages of Fasciola in livestock and humans13,15. The treatment and control of fascioliasis in these groups have heavily relied on triclabendazole and mass drug administration. However, resistance has become widespread in livestock16,17,18, and small case series have documented treatment failure in humans with acute and chronic infection after repeated courses of triclabendazole19,20,21,22,23. For instance, in a cohort of 146 children with chronic fascioliasis in Cusco, Peru, only 55% achieved parasitological cure after the first round of triclabendazole treatment24, compared to a 95% efficacy rate with a single dose in 200525. Furthermore, 12% of children in this cohort did not achieve parasitological cure despite undergoing more than four treatment rounds with high doses24. Poor quality control of drug formulations and underdosing are likely contributing to the diminishing effectiveness and the emergence of resistance14.

The mechanisms underlying triclabendazole resistance (TCBZ-R) in Fasciola hepatica are not well understood14. Proposed mechanisms include enhanced activity of the P-glycoprotein (P-gp) drug efflux pump and increased conversion of the active sulfoxide metabolite to the less active sulfone26,27,28. Additionally, higher expression levels of the detoxifying enzyme glutathione S-transferase (GST) have been observed in resistant Fasciola29. However, subsequent evaluations of these pathways have produced inconsistent results. A specific single nucleotide polymorphism (SNP), T687G, which results in an amino acid substitution in P-gp, was linked to TCBZ-R in one study30, but follow-up research in Australia and Latin America did not corroborate this association31,32. Transcriptomic analyses of Latin American laboratory isolates revealed downregulation of adenylate cyclase transcription and upregulation of GST mu isoforms in parasites resistant to albendazole and triclabendazole33. Nevertheless, distinguishing between the signals of albendazole and triclabendazole resistance was challenging due to the small sample size and experimental conditions. These findings underscore the complexity of TCBZ-R mechanisms and the limitations inherent in studies that focus on a narrow range of metabolic pathways, use few parasites, or rely on laboratory-bred isolates. In a 2023 study, Beesley and colleagues used bulked segregant analysis to compare allele frequencies before and after triclabendazole (TCBZ) treatment in F2 mapping populations from experimental genetic crosses and egg samples from natural infections in Cumbria, UK34. They identified a ~3.2 Mb locus associated with TCBZ-R, characterized by dominant inheritance, although the specific causal gene(s) could not be pinpointed due to extensive linkage disequilibrium34.

In this study, we analyzed a large number of field isolates from Cusco, Peru, to perform a genome-wide selection signature analysis, identifying candidate loci associated with TCBZ-R and demonstrating the potential of SNP-based phenotype classification. Additionally, transcriptomic analysis provided deeper insights into the mechanisms of drug action and resistance. A comparison of resistance-associated loci from geographically diverse F. hepatica populations in Peru and the UK revealed distinct selection signatures, suggesting independent genetic origins of TCBZ resistance in each population.

Results

Population sequencing of adult Fasciola hepatica with divergent triclabendazole susceptibility

In preparation for this study, we collected adult F. hepatica specimens from naturally infected livestock in Peru and determined the sensitivity of the isolates to triclabendazole sulfoxide in vitro, the most active metabolite of TCBZ35. To identify individuals at opposite ends of the phenotype distribution, we titrated the drug concentration and exposure time in our motility assay. This allowed us to classify parasites approximately in the upper and lower quartiles of the susceptibility distribution as sensitive (TCBZ-S) and resistant (TCBZ-R), respectively35. Out of the 3348 parasites exposed to triclabendazole, we whole-genome sequenced 99 TCBZ-S and 210 TCBZ-R parasites collected from 146 bovid livers (Supplementary Data 1). To avoid oversampling of closely related individuals or genetically identical parasites resulting from clumped transmission of clonemates arising from asexual reproduction in the molluscan intermediate host, we limited the number of parasites sequenced per liver to a maximum of 5, with an average of 2.1 (Fig. 1a). On average, we generated 17 Gb of sequence data per sample (11.6× mean coverage of the 1.2 Gb reference genome), resulting in the identification of 42.5 million single nucleotide variants (SNPs) across all samples. We identified and excluded 17 closely related individuals, including clonemates (kinship coefficient Φ > 0.1875; Supplementary Table 1), 2 samples with high levels of genotype missingness (>7%; Supplementary Fig. 1a), and 5 samples displaying excessive heterozygosity indicative of possible sample contamination (Supplementary Fig. 1b), leaving 91 TCBZ-S and 194 TCBZ-R parasites and 42.1 million variants (1 SNP every 29 bp on average) for subsequent analyses.

Fig. 1: Population sequencing of adult Fasciola hepatica with divergent triclabendazole sensitivity.

a Frequency distribution of the number of parasites sequenced per host liver. Dotted lines indicate mean (black) and median (red). b, c Multidimensional scaling analysis of genetic relationship based on identity-by-state of ~9 million LD-pruned biallelic SNPs. b Specimens collected from various countries, including samples from previous studies. c Peru samples with divergent triclabendazole sensitivity. d Genome-wide estimates of nucleotide diversity. e Genome-wide estimates of Tajima’s D. f Linkage disequilibrium (LD) decay pattern in Peru population based on mean LD calculated in 10 longest scaffolds (range: 5.5–8.5 Mb). g Runs of homozygosity (ROH) regions in 194 triclabendazole-resistant and 91 triclabendazole-sensitive Peru samples. h ROH regions in 285 Peru, 5 UK, and 5 Uruguay samples. Box plots display the median (center line), upper and lower quartiles (box limits), 1.5x interquartile range (whiskers), and outliers (points). P values were calculated using a two-sided Wilcoxon rank-sum test without adjustments for multiple comparisons.

Fasciola populations in the Cusco region of Peru do not exhibit significant genetic structuring with respect to their TCBZ sensitivity phenotype

We compared our Peruvian samples to previously published individual genome sequencing data from the UK, US, and Uruguay (n = 13) to contextualize genetic variation among geographically diverse F. hepatica populations36,37,38. Genome-wide allele sharing patterns indicated that the F. hepatica population in Peru is distinct from specimens collected in the UK and Uruguay, consistent with the expected limited gene flow between these locations (Fig. 1b). A lower level of genetic differentiation was observed between samples from Peru and an isolate from Oregon, USA, as previously noted36. Among the Peruvian samples, there was little genetic structuring between flukes with divergent TCBZ sensitivity phenotype (Fig. 1c). By comparing between-group and in-group pairwise identity-by-state (IBS) distances (based on 1,267,844 LD-pruned SNPs with a minor allele frequency >5%), we asked whether pairs of flukes drawn from different phenotype groups are, on average, less similar than pairs drawn from the same phenotype group, beyond what would be expected by chance. The test was not significant (p = 0.090; 91 TCBZ-S and 194 TCBZ-R samples, mean and standard deviation of 0.8010 ± 0.0023, 0.8009 ± 0.0024, and 0.8012 ± 0.0024, for between-group IBS, in-group (TCBZ-S) IBS, and in-group (TCBZ-R) IBS, respectively; permutation test randomizing phenotype labels, 10⁶ permutations), suggesting that our specimens were sampled from a local interbreeding population without strong population stratification between TCBZ-S and -R flukes. Such stratification, had it been present, could have confounded the analysis of selection signatures. Genome-wide estimates of nucleotide diversity (π) were 0.0065 regardless of TCBZ sensitivity (Fig. 1d). Genome-wide mean Tajima’s D estimates were positive (>0.6) in both TCBZ-S and -R populations (Fig. 1e), suggesting a possible recent population contraction in the region.
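The label-permutation logic behind the IBS comparison can be sketched as follows. This is a minimal illustration, not the software used in the study: the function names, the toy distance matrix, the one-sided test direction, and the use of 1000 permutations are all assumptions made for the example.

```python
import random
from itertools import combinations
from statistics import mean

def ibs_gap(dist, labels):
    """Mean between-group distance minus mean within-group distance."""
    between, within = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist[i][j])
    return mean(between) - mean(within)

def permutation_p(dist, labels, n_perm=1000, seed=0):
    """One-sided p value: how often do shuffled labels give a gap at least
    as large as the observed one? (hypothetical helper, not from the paper)"""
    rng = random.Random(seed)
    observed = ibs_gap(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if ibs_gap(dist, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy data: eight samples on a line, strongly separated by label.
coords = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
labels = ["S"] * 4 + ["R"] * 4
dist = [[abs(a - b) for b in coords] for a in coords]
observed, p_value = permutation_p(dist, labels)
```

In the toy data the two groups are clearly separated, so the p value is small. In the Peruvian data the between-group and in-group means were nearly identical, which is why the reported test was not significant.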

F. hepatica is a hermaphrodite with a mixed mating system involving both inbreeding and outcrossing. In inbreeders, closely linked sites often show haplotype structure detectable as high linkage disequilibrium (LD). In our study population, LD decayed to a value of r² = 0.1 for SNPs separated by 20 kb and approached background levels at distances greater than 250 kb (Fig. 1f), a pattern comparable to those in other selfing species with mixed mating systems39,40. Inbreeding changes genotype frequencies by increasing homozygosity, largely in the form of runs of homozygosity (ROH), thereby reducing effective recombination frequency throughout the genome41. We inferred an average of 265.4 Mb and 268.1 Mb of total ROH regions per fluke in TCBZ-S and TCBZ-R parasites, respectively (Fig. 1g). These estimates were lower than those observed in the UK (mean and standard deviation of 267.2 ± 20.9 Mb and 398.3 ± 25.7 Mb in Peru and the UK, respectively; n = 285 and n = 5 in Peru and the UK, respectively; p = 1.3 × 10⁻⁴; two-sided Wilcoxon rank-sum test W statistic = 3.83, 95% confidence interval = [−1.96: 1.96], effect size = 0.22) and Uruguay (331.4 ± 9.9 Mb; n = 5 in Uruguay; p = 1.7 × 10⁻⁴; two-sided Wilcoxon rank-sum test W statistic = 3.77, 95% confidence interval = [−1.96: 1.96], effect size = 0.22) specimens (Fig. 1h), suggesting a comparatively higher level of outcrossing in the Peru population.

Candidate loci under TCBZ selection include EGFR/PI3K-mTOR-S6K pathway genes and genes involved in microtubule function

To identify loci under TCBZ selection, we performed a genome-wide fixation index (FST) analysis. While the majority of genomic diversity is expected to be neutral with respect to the focal trait (i.e., TCBZ sensitivity), adaptive alleles that differ in their selection history are likely to be associated with regions genetically differentiated between TCBZ-S and TCBZ-R populations. We scanned the F. hepatica genome for FST outlier loci using a sliding window approach, where window boundaries were determined based on the inflection points of a smoothing spline fitted to raw FST values estimated for individual SNPs42 (Supplementary Fig. 2). To compare genomic windows of varying sizes, we used the t-test-like W statistic42. Genomic windows were ranked based on their W statistic values (Supplementary Data 2), leading to the identification of outlier genomic regions above the 99.9th percentile (1.2 Mb), encompassing a total of nine protein-coding genes located across nine scaffolds (Fig. 2a, Supplementary Fig. 3 and Table 1). Each candidate gene was located in a distinct genomic region on a separate scaffold. Due to the limited contiguity of the F. hepatica reference genome assembly34 (GenBank Accession: GCA_900302435; N50: 1.9 Mb; N90: 519 kb), determining the long-range spatial distribution of our FST outlier loci was not possible. However, gene synteny analysis, based on the closely related sister species F. gigantica (Supplementary Data 3), suggested that these outlier loci are likely distributed across multiple chromosomes (linkage groups) rather than in a single locus. Furthermore, LD among the outlier genes was observed to be low (r² < 0.05) in both TCBZ-S and -R populations (Fig. 2b, c).
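As a rough illustration of the two ingredients of such a scan (a per-SNP differentiation estimate and a top-percentile cutoff), the sketch below uses Hudson's FST estimator and a simple rank-based outlier selection. Both are stand-ins chosen for brevity: the text does not state which FST estimator was used, and the spline-based windowing and W statistic of ref. 42 are not reproduced here. The function names and allele frequencies are hypothetical.

```python
def hudson_fst(p1, n1, p2, n2):
    """Hudson's F_ST estimator for one biallelic SNP.

    p1, p2: alternate-allele frequency in each group.
    n1, n2: number of sampled chromosomes in each group.
    """
    numerator = ((p1 - p2) ** 2
                 - p1 * (1 - p1) / (n1 - 1)
                 - p2 * (1 - p2) / (n2 - 1))
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator if denominator > 0 else float("nan")

def top_fraction(scores, fraction=0.001):
    """Indices of the highest-scoring items, e.g. above the 99.9th percentile."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Diploid flukes: 91 TCBZ-S and 194 TCBZ-R samples give 182 and 388 chromosomes.
n_s, n_r = 2 * 91, 2 * 194

# Made-up frequencies: 999 undifferentiated SNPs and one differentiated SNP.
scores = [hudson_fst(0.30, n_s, 0.31, n_r)] * 999
scores.append(hudson_fst(0.10, n_s, 0.60, n_r))
outliers = top_fraction(scores)
```

With 1000 scores and a fraction of 0.001, exactly one item is returned, and here it is the single differentiated SNP. Note that the estimator can be slightly negative when the two groups have nearly identical frequencies, which is expected behavior rather than an error.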

Fig. 2: Genome-wide scan of fixation index (FST) between triclabendazole-sensitive and -resistant Fasciola hepatica populations.

a Outlier genomic intervals (99.9th percentile, dotted line) were determined based on the W statistic described in ref. 42. Scaffolds on the x-axis were ordered by ID; the order does not imply physical proximity. Candidate genes overlapping the outlier regions were indicated using their gene symbols. b, c Linkage disequilibrium (LD) matrix among the 9 genes that overlap FST outlier regions. From each gene locus, 10 SNPs were selected randomly for all-vs-all pairwise LD calculation (90 × 90 SNPs). Red borders indicate within-gene SNP pairs. b LD in TCBZ sensitive flukes. c LD in TCBZ resistant flukes. d–f Maximum test statistic observed on each scaffold in the present study (y-axis) and in Beesley et al. (x-axis; d: genetic cross experiment 1, e: genetic cross experiment 2, and f: field isolate 1)34. Scaffolds containing the outlier candidate genes were colored blue (present study) and red (Beesley et al.).

Table 1 Candidate genes overlapping the 99.9th percentile FST outlier regions comparing triclabendazole-sensitive and -resistant Fasciola hepatica populations in Peru

The top 9 outlier genes from the FST genome scan can be grouped into those linked to the PI3K/AKT-mTOR-S6K pathway (CYPA, EGFR, SIK3, S6K, and GALNT) and those implicated in microtubule function, either as physical binding partners (DNAH) or as modification enzymes (KTNA1). Of these, five genes contained one or more non-synonymous variants showing a significant difference in frequency between TCBZ-S and -R populations (p < 0.01 in all comparisons, Fisher’s exact test; see Supplementary Data 4 for the exact p values for each variant). We did not observe a selection signal in previously reported candidate genes and gene families, such as T687G in P-gp30 and T143S in GST mu genes43 (Supplementary Data 5). Furthermore, our candidate loci did not overlap with the TCBZ-R associated QTL regions identified from the bulked segregant analysis of Fasciola in Cumbria, UK (a 0.3 Mb region of scaffold 1853 and a 2.9 Mb region of scaffold 157)34 (Fig. 2d–f). We searched for the co-occurrence of selection signals from the two studies on the same scaffold by comparing the maximum values of the statistic (i.e., W statistic and median LRT values) observed for each scaffold in each study. The candidate loci from each study were located on mutually exclusive sets of scaffolds, with no overlap found.
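For readers unfamiliar with the allele-frequency comparison, a two-sided Fisher’s exact test on a 2 × 2 table of allele counts can be written with the standard library alone. This is an illustrative sketch; the function name and the example allele counts are hypothetical and are not taken from Supplementary Data 4.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(x):
        # Unnormalized hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(x) for x in range(lowest, highest + 1)
                  if weight(x) <= observed)
    return extreme / total

# Hypothetical counts of a non-synonymous allele versus the reference allele
# among 182 TCBZ-S and 388 TCBZ-R chromosomes.
p_variant = fisher_exact_two_sided(20, 162, 110, 278)
```

Working with integer weights avoids floating-point ties when deciding which tables count as at least as extreme as the observed one.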

Analysis of transcriptional profiles identified expression difference in microtubule-related genes in TCBZ-S and -R flukes

To understand the baseline transcriptional differences between TCBZ-S and -R flukes and to investigate how drug response differs between the flukes with different phenotypes, we characterized the transcriptional profiles of adult parasites derived from parental flukes with known drug sensitivity (Supplementary Fig. 4 and Supplementary Data 6). RNA-seq was performed on drug-treated and untreated whole fluke samples, focusing on (i) the expression differences between TCBZ-S and -R flukes without treatment and (ii) the TCBZ treatment effects within each phenotypic group44. To ensure statistically robust results, we accounted for variability associated with time and treatment concentrations (see Methods for details on the experimental approach, including snail and rabbit infections). The statistical significance of the pairwise differential expressions was calculated by the negative binomial test-based algorithm44, and the FDR-adjusted p values for each comparison are provided in Supplementary Data 7.

In the absence of treatment, we identified 562 overexpressed and 142 underexpressed genes in TCBZ-R flukes compared to TCBZ-S flukes (p ≤ 0.001 and >2 fold-change; Fig. 3a). Among the underexpressed genes, we did not observe significantly enriched functional categories (p ≤ 0.01; FDR-adjusted conditional hypergeometric tests45; Fig. 3e). The functional categories enriched among the overexpressed genes included microtubule-based processes (GO:0007017, p = 6.5 × 10⁻¹⁹), microtubule-based movement (GO:0007018, p = 2.0 × 10⁻¹³) and cilium or flagellum-dependent cell motility (GO:0001539, p = 4.9 × 10⁻⁷) (Fig. 3e). In particular, multiple tubulin genes (9 alpha- and 1 beta-tubulin) and genes encoding structural and motor proteins found in cilia and flagella, such as kinesin, axonemal dynein, tektin, and various members of the cilia- and flagella-associated protein family, displayed higher transcript levels in TCBZ-R flukes (Supplementary Data 7). Additionally, higher transcript abundance was observed in genes involved in regulating microtubules and cytoskeleton (Table 2), such as vasohibin with tubulin detyrosination activity46, EB1 protein modulating microtubule dynamics47, and tubulin glycylase modifying axonemal microtubules in cilia and flagella48. Tubulin polyglutamylases and tubulin deglutamylase (Cytosolic carboxypeptidase-like protein 5), which are involved in microtubule glutamylation and abundant in neurons and cilia, were also expressed at higher levels in TCBZ-R flukes. Notably, the most significantly overexpressed gene was an IQ motif–containing GTPase-activating protein (IQGAP) (p = 1.2 × 10⁻³⁷), a scaffolding protein integrating EGFR signaling to PI3K/AKT-mTOR signaling49. Additional overexpressed genes of interest included three EGF-like domain-containing proteins. Two of these proteins were identified as TCBZ-R associated candidate genes by ref. 34.

Fig. 3: Transcript expression analysis of triclabendazole-sensitive and -resistant Fasciola hepatica, both without and in response to triclabendazole treatment.

a Differential expression between sensitive and resistant flukes without treatment. b Differential expression between drug-treated and untreated sensitive flukes. c Differential expression between drug-treated and untreated resistant flukes. Genes with an FDR-adjusted p value of ≤0.001 and a fold-change of >2 were deemed to have significantly increased (red) or decreased (blue) expression and were grouped based on expression patterns (Set 1 through 6, with gene counts in parentheses). d Distribution of differentially expressed genes across Sets 1 through 4. e Overrepresented biological process Gene Ontology (GO) terms of the differentially expressed genes. GO terms with an FDR-adjusted p value of ≤0.01 were deemed significant (black border).

Table 2 Genes of interest that are differentially expressed between triclabendazole-sensitive and -resistant Fasciola hepatica without and in response to triclabendazole treatment

In TCBZ-S flukes, 62 genes showed a significant increase, and 131 genes showed a significant decrease in response to TCBZ treatment; however, no significantly differentially expressed genes were detected in TCBZ-R flukes (p ≤ 0.001 and >2 fold-change; Fig. 3b, c). The most significantly upregulated gene was a mitogen-activated protein kinase (ortholog of human MAPK11) (p = 2.2 × 10⁻⁶), which belongs to a class of kinases (p38 MAPKs) that are responsive to stress stimuli and are involved in cell differentiation, apoptosis, and autophagy50 (Table 2). We did not observe any significantly enriched functional categories (p ≤ 0.01) among the 62 upregulated genes (Fig. 3e). The 131 downregulated genes were significantly enriched for microtubule-based processes (GO:0007017, p = 8.7 × 10⁻¹³), microtubule-based movement (GO:0007018, p = 3.9 × 10⁻⁸), and cell projection assembly (GO:0030031, p = 4.8 × 10⁻⁷) (Fig. 3e). Notably, 87 of the 131 downregulated genes overlapped with the 562 genes that exhibited higher transcript abundance levels in TCBZ-R flukes relative to TCBZ-S flukes (p ≤ 10⁻¹⁵), resulting in similar functional category enrichment results (Fig. 3d, e). Among the TCBZ treatment-responsive genes in TCBZ-S flukes were multiple tubulin genes (3 alpha- and 1 delta-tubulins) (Table 2). This coordinated suppression of tubulin transcripts was reminiscent of the transcriptional signatures of drug-induced microtubule destabilization involving tubulin autoregulation, a co-translational degradation of tubulin mRNAs51. To further characterize the differentially expressed genes, we performed a homology-based cell-type association analysis using Schistosoma mansoni single-cell RNA-seq data52. We observed a significant association of specific cell types including flame cells (p = 1.3 × 10⁻¹⁵) and late male germ cells (p = 1.0 × 10⁻¹³) that contain axonemal structure53,54, as well as several neuron types, among the differentially expressed genes that showed increased transcript abundance in TCBZ-R flukes and/or decreased transcript levels in drug-treated TCBZ-S flukes (Supplementary Data 8).
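To give a sense of why an overlap of 87 genes between a set of 131 and a set of 562 is so unlikely by chance, the sketch below computes an upper-tail hypergeometric probability. The test used for the reported overlap p value is not specified in the text, so this is one plausible formulation rather than the authors' method. The function name is hypothetical, and the background of 10,000 genes is a placeholder, not the number of genes in the F. hepatica annotation.

```python
from math import comb

def overlap_p_value(n_background, n_set_a, n_set_b, n_overlap):
    """P(overlap >= n_overlap) when set B is drawn at random from the background.

    Exact integer arithmetic is used for the tail sum; only the final
    ratio is converted to a float.
    """
    total = comb(n_background, n_set_b)
    upper = min(n_set_a, n_set_b)
    tail = sum(comb(n_set_a, k) * comb(n_background - n_set_a, n_set_b - k)
               for k in range(n_overlap, upper + 1))
    return tail / total

# 562 genes higher in TCBZ-R flukes, 131 genes downregulated after treatment
# in TCBZ-S flukes, 87 shared; the background size is an assumed placeholder.
p_overlap = overlap_p_value(10_000, 562, 131, 87)
expected_overlap = 131 * 562 / 10_000
```

Under this placeholder background only about 7 shared genes would be expected by chance, so an observed overlap of 87 lies far in the tail.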

Differentiating TCBZ-S and -R parasites is possible using a limited number of informative SNPs

The identification of genetic markers that can predict clinical failure of TCBZ among infected individuals is a prerequisite for the development of cost-effective targeted genotyping approaches that can be deployed in the field. We investigated whether a limited number of SNP markers could provide discriminatory power to classify flukes as TCBZ-S or TCBZ-R. To identify informative loci, we compared allele frequencies in TCBZ-S and TCBZ-R populations using Fisher’s exact test and used a clumping approach to select the top 300 lead/index SNPs with the lowest p values that were not in strong LD with each other (r² < 0.5) (Supplementary Data 9). Using these loci, we performed a discriminant analysis of principal components (DAPC)55 that can probabilistically assign individual samples to TCBZ-S or TCBZ-R groups (Fig. 4a). Although the classification accuracy decreased when an incrementally smaller set of SNPs was used for DAPC modeling, results indicated that it could be possible to differentiate TCBZ-S and TCBZ-R parasites with >95% accuracy using ~100 SNPs. To address the issue of overfitting, which leads to overly optimistic performance that fails to generalize to new samples, we evaluated the DAPC classification using an independent set of 152 adult fluke samples (78 TCBZ-S and 74 TCBZ-R) genotyped with a custom multiplex amplicon panel. It was possible to design primers for 293 of the 300 lead/index SNPs, and after removing four primer pairs that produced higher levels of off-target products, the final design (IDT xGen custom amplicon panel cp727) included primer sets targeting a total of 289 loci. To assess genotyping accuracy, we sequenced an additional 10 samples using this panel, which were technical repeats of the same DNA samples that were whole-genome sequenced (WGS) for our FST analysis.
The amplicon-seq resulted in a 94% mean target coverage (minimum 10×) and a 75% coverage uniformity (number of target SNPs with coverage >20% of mean) (Supplementary Data 10). Multiallelic loci (n = 2) and loci with >10% missing genotype calls (n = 26) were excluded from further analysis. Genotype calls from the 10 amplicon-seq samples were compared with those from the 10 matched WGS samples to assess the genotyping accuracy of amplicon-seq at individual target loci. Only loci with identical genotype calls in at least 9 out of 10 sample pairs were retained for downstream analysis (n = 194). Feature selection was then performed using the Random Forest machine learning method56 to prioritize informative SNPs (n = 30), ranked by mean decrease accuracy (MDA) values (Supplementary Data 10). A DAPC discriminant function was constructed using these 30 most informative SNPs for binary classification of the 152 amplicon-seq samples. Performance was assessed through receiver operating characteristic (ROC) analysis, comparing the true positive rate (sensitivity) versus the false positive rate (1-specificity) at different classification thresholds (Fig. 4b). The area under the ROC curve (AUC), which provides an aggregate measure of performance across all possible classification thresholds, was 0.86 (non-parametric stratified bootstrapping 95% confidence interval = 0.82, 0.92). The classification accuracy was 79.6% (exact binomial 95% confidence interval = 72.3%, 85.7%), with a binomial test p value of 4.8 × 10⁻¹³. The classification sensitivity and specificity were 83.8% (exact binomial 95% confidence interval = 73.4%, 91.3%) and 75.6% (exact binomial 95% confidence interval = 64.6%, 84.7%), respectively. Finally, we performed leave-pair-out cross-validation (with 1000 stratified resamplings) to estimate the classification accuracy when the model is used to make predictions on data not included in the training set. The estimated prediction accuracy using cross-validation was 75.5%.
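The reported accuracy, sensitivity, and specificity can be related to a single confusion matrix. The counts used below (62 of 74 TCBZ-R and 59 of 78 TCBZ-S flukes correctly classified) are not stated in the text; they are the integer counts that reproduce the reported percentages, under the assumption that TCBZ-R is treated as the positive class. The exact (Clopper-Pearson) interval is computed by bisection on the binomial tail, and the function names are hypothetical.

```python
from math import comb

def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact binomial confidence interval for k successes out of n trials."""
    def solve(func, target, increasing):
        low, high = 0.0, 1.0
        for _ in range(60):  # bisection; 60 halvings is ample for double precision
            mid = (low + high) / 2
            if (func(mid) < target) == increasing:
                low = mid
            else:
                high = mid
        return (low + high) / 2

    lower = 0.0 if k == 0 else solve(
        lambda p: binom_tail_ge(k, n, p), alpha / 2, True)
    upper = 1.0 if k == n else solve(
        lambda p: 1 - binom_tail_ge(k + 1, n, p), alpha / 2, False)
    return lower, upper

# Inferred counts (assumption): TCBZ-R is the positive class.
tp, fn = 62, 12   # 74 TCBZ-R flukes
tn, fp = 59, 19   # 78 TCBZ-S flukes

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + fn + tn + fp)
accuracy_ci = clopper_pearson(tp + tn, tp + fn + tn + fp)
```

Rounded to one decimal place, these counts give the reported 83.8% sensitivity, 75.6% specificity, and 79.6% accuracy.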

Fig. 4: Differentiation of triclabendazole-sensitive and -resistant Fasciola hepatica based on SNP profiles.

Group classification by Discriminant Analysis of Principal Components (DAPC). a The top 300 SNPs, exhibiting significant between-group allele frequency differences and not in strong linkage disequilibrium with each other, were used to classify 91 TCBZ-S and 194 TCBZ-R WGS samples. Classification accuracy was evaluated incrementally by reducing the number of SNPs in the analysis. b Receiver operating characteristic (ROC) curve of the DAPC classifier using 30 informative SNPs and an independent sample set of 78 TCBZ-S and 74 TCBZ-R flukes genotyped by amplicon sequencing. The ROC curve is presented as a bold line, with the 95% confidence intervals as a light blue area.

Discussion

Triclabendazole is the first-line anthelmintic used to treat F. hepatica infections, due to its effectiveness against both adult and early immature parasites. Despite the emergence of widespread resistance in livestock and the rise in human cases with resistant infections, our understanding of the genetics of triclabendazole resistance remains limited14,57. Previous efforts to identify genetic markers associated with resistance have produced inconsistent results across different strains and isolates, highlighting the importance of broad sampling of phenotypically well-characterized specimens57. We determined the triclabendazole sensitivity of a large number of individual adult flukes from natural infections in cattle in the Cusco region of Peru using a standardized in vitro motility assay35. Building on this collection of field isolates, we undertook genome-wide selection signature and transcript expression analyses to understand the genetic basis of TCBZ-R in F. hepatica and identified resistance-associated genetic markers to facilitate development of molecular surveillance tools for improved interventions and sustainable control.

Genotype-phenotype comparisons between F. hepatica isolates from ancestrally diverse populations are likely to suffer from the confounding effects of population structure. The fluke samples from the Cusco region showed limited population structure, making them suitable for genome-wide selection scan analysis. Moreover, the total length of ROH regions per worm was shorter than in previously sequenced UK and Uruguay specimens, consistent with an interbreeding population with higher outcrossing rates. Positive Tajima’s D values were observed, which might indicate recent or ongoing population contractions.

We identified genomic regions of high differentiation (above the 99.9th percentile) between populations with divergent TCBZ sensitivity, using 210 TCBZ-R and 99 TCBZ-S adult flukes, each group approximately representing the upper and lower quartiles of the distribution. These candidate loci under selection contained a total of 9 protein-coding genes, including those in the EGFR/PI3K/AKT-mTOR-S6K pathway and genes involved in microtubule function. We propose a model in which modifications in EGFR/PI3K/AKT-mTOR-S6K signaling contribute to TCBZ-R by altering microtubule stability and promoting survival. Studies have shown that the S6K pathway enhances cell survival under stress and promotes tubulin acetylation, which can protect microtubules from depolymerizing drugs such as colchicine and nocodazole58,59,60. In cells resistant to a tubulin-targeting drug (paclitaxel), tubulin acetylation was shown to enhance the anti-apoptotic phenotype by stabilizing Mcl-1 and protecting it from ubiquitin–proteasome-mediated degradation61. In Caenorhabditis elegans, S6K regulates stress granule dynamics, and its loss of function sensitizes the nematode to stress-induced death62.

It is not possible to determine, based solely on our selection scan analysis, whether any of the outlier genes directly contribute to the resistance phenotype. However, it is notable that multiple outlier genes converged on the same pathway. An outlier gene may contribute directly to the resistance phenotype or play a compensatory role in rebalancing the stability and dynamics of microtubules in resistant parasites. Considering that alterations in cellular signaling pathway activity often impact multiple biological processes, including those with fitness costs, some of our candidate loci may reflect compensatory mechanisms that restore fitness. In the following sections, we discuss the individual candidate genes in greater detail within the context of our proposed model.

We identified five candidate genes implicated in the EGFR/PI3K/AKT-mTOR-S6K signaling pathway: cyclophilin (CYPA), a receptor protein-tyrosine kinase (orthologous to C. elegans let-23 and human EGFR and ERBB4), S6K, SIK3 (a positive regulator of mTOR signaling63), and GALNT (involved in the activation of PI3K/AKT pathway through the O-glycosylation of EGFR64). Cyclophilin, a chaperone/signaling molecule with peptidyl-prolyl isomerase activity, promotes cell proliferation and an anti-apoptotic phenotype in various cancer cell types65. The anti-apoptotic effects are associated with the activation of the PI3K/AKT-mTOR signaling pathway, modulation of the Bcl-2 family, and inhibition of caspase cascades66,67. In F. hepatica, a significant increase in the level of cyclophilin A protein was observed following exposure to high-concentration TCBZ (50 μg/ml, 6 h) in both Sligo (TCBZ-R) and Cullompton (TCBZ-S) isolates. However, the change in protein expression level was substantially greater in the TCBZ-R Sligo compared to TCBZ-S Cullompton flukes (a 32.7- and 6.5-fold increase relative to the control, respectively)68. The epidermal growth factor receptor (EGFR/ERBB) is a receptor tyrosine kinase that activates PI3K upon ligand binding and dimerization and regulates a wide range of biological processes, including cell proliferation and differentiation69. Transcriptomic and proteomic analyses have identified PI3K-AKT signaling as a key pathway associated with growth and development in the immature liver-stage F. hepatica70. Interestingly, it has been noted that TCBZ-R flukes reach patency faster than TCBZ-S flukes both in rats (Oberon vs. Fairhurst isolates)71 and in sheep (Sligo vs. Cullompton isolates)72. It remains to be determined whether a common pathway contributes to TCBZ-R in these isolates and if variation in PI3K-AKT activity impacts both parasite development and TCBZ sensitivity (pleiotropy).

Our analysis identified candidate genes involved in microtubule and cytoskeletal function, such as axonemal dynein (DNAH) and katanin (KTNA1). Dyneins function as motors to generate sliding between ciliary doublet microtubules. Katanin is a microtubule-severing enzyme that can modulate microtubule stability both positively and negatively73. While severing can destabilize the microtubule, katanin can also stabilize newly generated microtubule ends, contributing to microtubule amplification.

While we hypothesize that differential regulation of the PI3K/AKT-mTOR-S6K pathway is associated with TCBZ-R, further study will be required to determine the exact nature and effects of individual pathway modifications under TCBZ selection and to ascertain the causal allele(s) that are necessary and sufficient for the resistant phenotype. Although it remains to be determined whether TCBZ-R in F. hepatica is a polygenic trait controlled by multiple genes, it appears that TCBZ-R in our study population is mediated by a limited number of effector mechanisms that can be affected by multiple (upstream) regulatory processes. We did not observe strong genetic linkage between the outlier genes (located on different scaffolds), suggesting that different combinations of adaptive alleles may confer the resistance phenotype in different individual flukes (i.e., genetic redundancy).

We profiled the transcriptomes of TCBZ-S and -R flukes to better understand the drug action and develop insights into the resistance mechanism. TCBZ is a member of the benzimidazole family of anthelmintics that bind to beta-tubulin, thereby inhibiting the polymerization of microtubules74. Alpha- and beta-tubulins form obligate heterodimers that polymerize into microtubules, and TCBZ is thought to interfere with the formation of heterodimers by locking the beta-tubulin moieties in the open conformation75. TCBZ binding inhibits colchicine binding to microtubular protein purified from adult F. hepatica, suggesting a common binding site76. Although a mechanistic understanding of TCBZ interaction with F. hepatica beta-tubulins is lacking, morphological and histological changes observed in TCBZ-S flukes following drug treatment are consistent with microtubule disruption, strongly suggesting involvement of beta-tubulin72,77.

We identified an overrepresentation of microtubule-related genes, including multiple tubulin genes, among those that exhibited differential transcript abundance (i) following TCBZ treatment in the drug-sensitive flukes and (ii) between TCBZ-S and -R flukes in the absence of treatment. In particular, the functional category and cell type enrichment analysis indicated differences in transcript abundance in genes involved in axonemal microtubule function in cilia and flagella between TCBZ-S and -R flukes. Because whole-worm bulk RNA sequencing was used, we could not determine if these transcript abundance differences are due to changes in gene expression and/or cell type composition between TCBZ-S and -R flukes. Considering that axonemal microtubules are found in spermatozoa and excretory flame cells53,54, the transcriptional differences we observed could be indicative of differences in organ development or function between TCBZ-S and -R flukes. Further histological studies or single-cell RNA sequencing analysis could provide additional information to support this interpretation. Another critical consideration when comparing the transcriptomes of untreated TCBZ-S and TCBZ-R flukes is the potential influence of genetic background variation. Differences in genetic background, unrelated to the resistance phenotype, could confound the transcriptional differences observed between TCBZ-S and TCBZ-R flukes. To address this, conducting experiments with additional flukes from a range of genetic backgrounds will be essential for pinpointing transcriptional differences that are specifically associated with the resistance phenotype.

A substantial difference was observed between TCBZ-S and -R flukes in the number of differentially expressed genes in response to drug treatment (190 genes in TCBZ-S and 0 in TCBZ-R flukes). This mirrors the differential effects of TCBZ on TCBZ-S and TCBZ-R flukes reported in previous histological studies77 and indicates that our TCBZ sensitivity assay effectively captured the variation in resistance phenotype among flukes collected from the sympatric population. A notable feature of the drug response profile in TCBZ-S flukes was the concerted suppression of tubulin transcripts. This pattern has been observed during microtubule destabilization induced by combretastatin A-4 (CA4), which binds tubulin at the colchicine site51. The changes in tubulin transcript abundance induced by CA4 were shown to be mediated by tubulin autoregulation (occurring at the level of mRNA stability), which was regulated by PI3K activity via changes in microtubule stability and concentration of soluble tubulin dimer51. In addition, the most significantly overexpressed gene in TCBZ-R flukes compared to TCBZ-S flukes was an IQGAP that integrates EGFR signaling and links EGFR to PI3K/AKT-mTOR signaling49, suggesting that TCBZ-R is likely associated with differences in EGFR-PI3K pathway activity. It has been shown that changes in PI3K activity can affect tubulin dynamics and confer drug resistance to microtubule-targeting agents51,78,79.

We compared our candidate loci to those identified in a recent UK population study of F. hepatica by Beesley and colleagues34. Genetic signatures of TCBZ selection were distinct in each population with no overlap in the selection target, suggesting that TCBZ-R evolved independently in these populations using different sets of adaptive alleles. When a species with a wide distribution range, such as F. hepatica, experiences a common novel selection pressure, adaptation across the whole range may require independent origins of the adaptive allele in different geographic regions if genetic exchange between them is not expected80. Since the first report of TCBZ-R in F. hepatica in Australia81, TCBZ-R has been demonstrated on at least 30 properties in 11 countries or regions worldwide14. Previous reports of genetic differentiation between susceptible and resistant flukes have failed to be replicated in subsequent studies in geographically or genetically diverse strains. Although these inconsistencies may have been due to the limitations of each individual study (e.g., limited sample size, lack of robust phenotyping, etc.), our data suggest that TCBZ-R can arise independently in different populations using different adaptive alleles, thereby making it more difficult to identify TCBZ-R associated genetic markers that perform consistently across populations. Multiple independent origins of benzimidazole anthelmintic resistance have also been described in parasitic nematodes, although they are characterized by recurrent mutations in a limited set of resistance alleles in beta-tubulin82.

Our data suggest that different genes and alleles likely contribute to TCBZ-R in Peru and the UK F. hepatica populations. However, there is a possibility that a common pathway underpins the resistance phenotype in both populations. Beesley et al. highlighted genes in the Ras superfamily, Ras-related protein 1 (RAP1; maker-scaffold10x_157_pilon-snap-gene-0.182), and a class II ADP-ribosylation factor (ARF4/5; maker-scaffold10x_157_pilon-snap-gene-0.197) as prime candidates under selection, although the primary large-effect causal gene could not be determined due to extended LD among the genes located on the locus. ARF4 has been shown to interact with EGFR, mediating the activation of phospholipase D283, to regulate microtubule posttranslational modifications (tubulin acetylation and detyrosination) and stability84, as well as to suppress apoptosis85. The small GTPase Rap1 can bind and control PI3K and TORC2 activity86,87, and activate the AKT-mTOR pathway88. This suggests that modifications in the EGFR-PI3K/AKT-mTOR pathway may play a role in the resistance phenotype in both the UK and Peru F. hepatica populations, despite the heterogeneity among selected loci, an observation that warrants further investigation. Mechanistic studies using broader sampling of parasites across a larger spatial range, both within and beyond the two countries, will help clarify whether a common pathway-level mechanism is driving TCBZ-R in F. hepatica. Furthermore, these follow-up studies will be valuable in strengthening our conclusions and demonstrating that the observed differences in TCBZ selection signatures are not merely artifacts of the differing experimental approaches used in each study, which vary significantly in their strengths and limitations. Beesley et al. identified a TCBZ-R-associated locus using bulked segregant analysis of pooled DNA samples collected before and after treatment.
While this approach enabled and benefitted from in vivo phenotyping (in contrast to the in vitro assay used in our study), it restricted the detection of alleles contributing to TCBZ-R to those present in a single resistant fluke, selected as the parental line for the F2 mapping population, and to three replicate pools of eggs (n = 500 each) from a single farm.

Using a panel of informative SNP markers that show different allele frequencies between TCBZ-S and TCBZ-R flukes, we tested whether it would be possible to predict the TCBZ sensitivity phenotype based on the parasite’s genotype. Using DAPC with the WGS data, a classification accuracy of over 95% was obtained with ~100 SNPs, indicating that it would be possible to differentiate between TCBZ-S and -R parasites using an SNP panel based on targeted genotyping approaches, such as amplicon-seq or other cost-effective methods. To further test the feasibility of SNP-based phenotype classification, we constructed a DAPC model using an independent sample set of 152 adult fluke samples (78 TCBZ-S and 74 TCBZ-R) genotyped by amplicon-seq. Using the 30 most informative SNPs (identified independently by Random Forest machine learning), it was possible to achieve a prediction accuracy of 75.5%. Although the panel of SNP markers would benefit from further optimization for maximum informativeness and needs to be validated by testing on additional independent sample sets, including those with intermediate levels of TCBZ-R to confirm its predictive power, our work demonstrates the potential to develop a genetics-based surveillance tool for TCBZ-R. However, considering that the loci under TCBZ selection may vary between populations, as has been observed between the Peru and UK populations, population-specific SNP panels may need to be developed for accurate prediction. Nonetheless, molecular diagnostic tools for TCBZ-R would provide valuable insights into the factors driving the emergence and spread of resistance, supporting the creation of effective TCBZ stewardship strategies and enabling more informed drug policy decisions.

In conclusion, our genome-wide analysis of field F. hepatica populations provides evidence that TCBZ-R has independent origins in different geographic populations, supports efforts to develop genetics-based surveillance tools, and lays the foundation for future studies to investigate the causal relationship between EGFR/PI3K/AKT-mTOR-S6K pathway activity, posttranslational modifications/stability of microtubules, and the survival of the parasite following TCBZ exposure.

Methods

Parasite procurement and whole-genome sequencing

All animal work was approved by the Institutional Ethics Committee for Animal Use at Universidad Peruana Cayetano Heredia (CIEA Protocol 104472) and the Institutional Animal Care and Use Committee at the University of Texas Medical Branch (IACUC protocol 1907062). Adult F. hepatica parasites were collected from naturally infected cattle at slaughterhouses in the Cusco region of Peru (elevation 11,152 ft). Parasites were exposed to triclabendazole sulfoxide (TCBZ-SO) (Sigma–Aldrich, St. Louis, MO) to characterize their susceptibility to the drug as previously described35. Briefly, parasites measuring ≥10 mm were collected from the biliary tree of cattle livers condemned at slaughterhouses. Parasites were transported in warm RPMI 1640 (Sigma–Aldrich, St. Louis, MO) supplemented with antibiotic-antimycotic (100 units/ml of penicillin, 100 μg/ml of streptomycin, and 0.25 μg/ml of amphotericin B, Gibco, Carlsbad, CA), and washed 5 times with normal saline at 37 °C upon arrival at the laboratory. Up to 24 fully motile parasites from the same liver were randomly selected for a 48-h incubation in 12-well plates, each well containing 3 ml of RPMI 1640 supplemented with antibiotic-antimycotic and 5% fetal bovine serum (Biowest, Riverside, MO) at 37 °C/5% CO2. The eggs produced by these parasites during the incubation period were collected and stored individually for snail infection experiments. After the incubation period, twelve flukes from each liver deemed fully viable using a motility score were selected randomly for TCBZ-SO exposure89,90. Four of these flukes were exposed to susceptible conditions, four were exposed to resistant conditions, and four were used as controls without exposure. To define the susceptible phenotype, parasites were exposed to TCBZ-SO at 15 μg/ml (33.2 μM) for 12 h and observed for 48 h to evaluate their motility. Those that developed a motility score of 0 or 1+ within the observation period were considered susceptible89,90.
To define the resistant phenotype, parasites were incubated in TCBZ-SO at 15 μg/ml (33.2 μM) for 24 h and observed for 72 h to evaluate their motility. Those with a motility score of 2 or 3 after the observation period were considered resistant. Phenotypically characterized parasites were washed three times and frozen at −80 °C until further processing. All other parasites were considered to have an indeterminate susceptibility to TCBZ-SO and were discarded. Genomic DNA from frozen F. hepatica parasites was isolated using the phenol-chloroform method91. Whole parasite tissue was manually disrupted and homogenized in lysis buffer (10 mM Tris-HCl pH 7.5, 10 mM EDTA pH 8, 50 mM NaCl, 2% SDS), β-mercaptoethanol (Millipore–Sigma, Burlington, MA), and 0.1 mg/ml proteinase K at a final volume of 500 µl. The homogenate was incubated at 56 °C overnight. After incubation, 500 µl of phenol-chloroform-isoamyl alcohol (25:24:1) (Millipore–Sigma, Burlington, MA) solution was added to the homogenate, which was then centrifuged at 16,873 × g for 5 min. The aqueous phase was transferred to a new tube, 600 µl of isopropanol was added, and the sample was mixed and then centrifuged at 16,873 × g for 15 min to precipitate the DNA. The DNA pellet was washed three times with 70% molecular-grade ethanol and then resuspended in 100 µl of DNase-free water. DNA quality and concentration were assessed by spectrophotometry using NanoDrop 2000 (Wilmington, DE, US), and specimens were stored at −80 °C. DNA was precipitated in sodium acetate and absolute alcohol, air dried, and shipped to the Washington University in St. Louis McDonnell Genome Institute for genome sequencing92. To remove salt, the DNA pellet was resuspended in 500 µl of 70% ethanol, then centrifuged at 16,873 × g at 4 °C for at least 5 min. After the supernatant was removed, the DNA pellet was air-dried and resuspended in Buffer EB.
A Kapa Hyper PCR-free library was generated from the DNA samples and sequenced on Illumina’s NovaSeq platform using 2 × 150 bp paired-end reads.
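The phenotype definitions used in the exposure assay above can be written as a small decision rule. The following is a minimal sketch, assuming motility scores are encoded as integers from 0 to 3 and recorded in chronological order; the function name and input layout are hypothetical and are not taken from the study's own records or software.

```python
from typing import Sequence

def classify_fluke(exposure_h: int, scores: Sequence[int]) -> str:
    """Assign a phenotype label to one fluke from its motility scores.

    exposure_h: 12 for the susceptible-condition arm, 24 for the
                resistant-condition arm (TCBZ-SO at 15 ug/ml in both).
    scores:     motility scores (assumed 0-3) over the observation period,
                in chronological order.
    """
    # Susceptible: score of 0 or 1 reached at any point within the observation period.
    if exposure_h == 12 and any(s <= 1 for s in scores):
        return "TCBZ-S"
    # Resistant: score of 2 or 3 at the end of the observation period.
    if exposure_h == 24 and len(scores) > 0 and scores[-1] >= 2:
        return "TCBZ-R"
    # Everything else was treated as indeterminate and discarded.
    return "indeterminate"
```

Under this encoding, a fluke from the 24-h arm whose final score is 1 is labeled indeterminate rather than susceptible, which mirrors the conservative exclusion described above.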

Genome-wide variant and selection scan analyses

Sequencing reads were adapter/quality trimmed using trimmomatic v0.3993 and aligned against the combined reference assembly of the F. hepatica nuclear, mitochondrial, and Neorickettsia genomes34,38 (GenBank accession: GCA_900302435, NC_002546 and NZ_LNGI01000001) using BWA v0.7.1794. Duplicate reads were removed, and single-nucleotide variants (SNPs) were called using GATK v4.2.295. To obtain high-confidence genotype calls, variants meeting any of the following criteria were filtered out in GATK: QD < 2; QUAL < 30; SOR > 3; FS > 60; MQ < 40; MQRankSum < −12.5; ReadPosRankSum < −8; DP > (2× median depth)96. Variants were annotated according to their genomic locations and predicted coding effects using SnpEff v5.0c97. The gene models of candidate genes were manually curated to ensure accurate variant annotation and predicted coding effects (Supplementary Note 1). Genomic regions exhibiting copy number variations (CNVs) were identified using CNVcaller98, and SNPs overlapping duplicated CNV regions with copy number >2 were excluded from downstream analysis. To further remove false positive calls, Hardy–Weinberg exact test-based filtering was applied with a p value cutoff of 10^−25. Sites with >10% missing genotypes or <5% minor allele frequency were removed prior to multidimensional scaling (MDS) and association analysis. MDS was performed using a subset of SNPs after pruning based on LD using PLINK v1.9 with the following parameters: a window size of 100 variants, step size of 5, and a variance inflation factor of 299. Kinship analysis was performed to identify closely related samples, including clonemates, using the KING method100 implemented in AKT v0.3.3101. Inbreeding coefficients (FIS) were calculated in PLINK v1.999 to identify samples with excessive heterozygosity, likely indicative of sample contamination or non-diploid karyotypes. Linkage disequilibrium decay analysis was performed using PopLDdecay v3.42102.
Wright’s fixation index (FST) between TCBZ-S and TCBZ-R populations was estimated for each individual SNP in PLINK v1.999, and a sliding-window method was applied to identify genomic regions with elevated levels of differentiation using GenWin package v0.1 in R42. Window boundaries were determined based on the inflection points of a cubic smoothing spline that were fitted to the FST estimates for individual variants. Analysis was run with a smoothing parameter of 1000, which controls how aggressively information is combined over windows of adjacent markers. Wstatistic value for each genomic window was calculated based on both the magnitude of the FST estimates and the number of linked variants (i.e., haplotype block length) and was used to identify outlier genomic regions. The null distribution of the Wstatistic was estimated by permuting the phenotype labels 20,000 times, and empirical p values were derived for each genomic window to provide statistical support (Supplementary Note 2). To help interpret the results, protein-coding genes were annotated using results from InterProScan v5.59-91.0103 to identify Gene Ontology104 classifications and InterPro functional domains105, and BlastKOALA v2.3106 to assign KEGG107 annotations. Additional annotation was performed using PANNZER2108, Sma3s v2109, eggNOG6.0110, and the STRING database v12.0111.
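The permutation step described above, in which phenotype labels are shuffled to build a null distribution and derive empirical p values, can be illustrated as follows. This is a minimal sketch and not the GenWin procedure itself: the absolute allele-frequency difference between groups is a stand-in for the window-level W statistic, the function names are hypothetical, and the add-one correction in the p value is an assumption not stated in the text.

```python
import random

def allele_freq_diff(genotypes, labels):
    """Absolute difference in alternate-allele frequency between 'S' and 'R' groups.

    genotypes: alternate-allele counts (0, 1, or 2) per diploid sample.
    labels:    'S' or 'R' per sample, in the same order.
    """
    def freq(group):
        counts = [g for g, lab in zip(genotypes, labels) if lab == group]
        return sum(counts) / (2 * len(counts))
    return abs(freq("S") - freq("R"))

def empirical_p(genotypes, labels, n_perm=20000, seed=0):
    """Empirical p value from permuting phenotype labels n_perm times."""
    rng = random.Random(seed)
    observed = allele_freq_diff(genotypes, labels)
    shuffled = list(labels)  # shuffling preserves the group sizes
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if allele_freq_diff(genotypes, shuffled) >= observed:
            hits += 1
    # Add-one correction (assumption) keeps the estimate above zero.
    return (hits + 1) / (n_perm + 1)
```

With 20,000 permutations, the smallest p value this estimator can return is roughly 5 × 10^−5, which sets the resolution available for ranking outlier windows.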

Generation of TCBZ-S and -R metacercariae

Fasciola eggs produced by adult parasites during the 48 h of incubation before drug exposure were collected and classified as TCBZ-S or TCBZ-R according to the results of the drug exposure experiments on the adult parents. Groups of eggs from the same parasite were washed in normal saline, resuspended in distilled water, and incubated together in complete darkness at 25 °C for 2 weeks as described previously112,113. After incubation, the eggs were exposed to a direct light source for one hour to induce hatching. Miracidia that emerged from the same group of eggs were used to infect second-generation snail colonies kept in the laboratory for these experiments. Two miracidia emerging from the same phenotypically characterized group of eggs were placed in a well of a 96-well plate in the presence of a single snail. Snail infections were confirmed by microscopy, and snail cohorts infected with miracidia emerging from the same group of eggs were created. Each snail cohort was kept in the same container in the laboratory, fed ad libitum, and exposed to 12 h cycles of light and darkness. After 55 days, groups of 10 snails from the same cohort were placed inside a plastic bag containing chilled water and exposed to a direct light source for a total of 60 min to induce the release of cercariae. Metacercariae from the same snail cohort and of known TCBZ susceptibility that adhered to the inside of the plastic bag were collected and kept in water at 4 °C until used.

Generation of TCBZ-S and -R adult F. hepatica parasites

We used the rabbit model of Fasciola infection to obtain F1 adult parasites from parental flukes with a well-characterized TCBZ susceptibility phenotype114,115,116. We purchased Californian rabbits weighing ~4 kg from a local vendor in a non-endemic area of Peru. Rabbits were brought to the animal care facility at Universidad Peruana Cayetano Heredia and kept in individual collector cages with filtered water and dry feed ad libitum. On arrival, all animals tested negative for Fasciola infection three times using microscopy of stool samples collected on consecutive days and were treated with toltrazuril/fenbendazole/praziquantel to eliminate other potential helminth infections117. Two weeks later, rabbits were infected orally with groups of 40 TCBZ-S or TCBZ-R metacercariae. Each group of metacercariae was formed by collecting 20 metacercariae from each of two different snail cohorts with the same TCBZ susceptibility profile. For this study, one male rabbit was infected with one group of TCBZ-S metacercariae, and one male rabbit was infected with one group of TCBZ-R metacercariae generated from eggs and snails collected in the Cusco region. We tested individual rabbit stool samples daily for Fasciola eggs using microscopy starting at day 15 post-infection and continued testing until both rabbits were passing eggs and the three-day mean egg count reached a steady state. Two weeks after steady-state mean egg counts were reached (day 84 post-infection), we euthanized the rabbits and immediately dissected their livers to obtain adult F. hepatica parasites from the bile ducts. Parasites were immediately washed in warm normal saline and individually placed in a well of a 12-well plate containing RPMI 1640 supplemented with antibiotic-antimycotic and 5% fetal bovine serum and incubated for 48 h at 37 °C/5% CO2.
The newly obtained adult flukes were grouped according to the TCBZ susceptibility of their phenotypically characterized progenitors and used for TCBZ exposure experiments at varying times and concentrations.

Triclabendazole exposure experiments and RNA-seq transcriptional profiling

Fully motile adult parasites obtained from the rabbit infections were separated into two groups according to their TCBZ-S (n = 12) or TCBZ-R (n = 18) characteristics (Supplementary Fig. 4). Parasites were exposed to sublethal concentrations of TCBZ-SO for different periods of time to evaluate transcriptional responses associated with drug exposure. Parasites in each group were exposed to TCBZ-SO concentrations of 16.6 μM (7.5 μg/ml) or 33.2 μM (15 μg/ml) for 2 or 6 h. Two (TCBZ-S) or three (TCBZ-R) controls with no TCBZ-SO exposure depending on parasite availability were included with each exposure time and collected at times zero (pre-exposure), 2, and 6 h. When the number of parasites allowed it, each experimental condition was replicated three times. Parasites were collected separately at each time point, washed in PBS (Corning, Manassas, VA), preserved in RNAlater® (Invitrogen, Carlsbad, CA), and frozen at −80 °C. RNA extraction from parasite tissue was performed using a protocol combining TRIzol (Invitrogen, Carlsbad, CA) and HiBind® RNA Spin Columns (Omega Bio-Tek, Norcross, GA) as described previously with the following modifications118. Each parasite was placed in a 50 ml Falcon tube containing 7.5 ml TRIzol and 1 ml of molecular grade β-mercaptoethanol, mechanically disrupted, and incubated for 5 min at room temperature. At the end of incubation, 1.5 ml of molecular grade chloroform (Sigma–Aldrich, St. Louis, MO) was added, the tube was shaken vigorously for 15 s, and centrifuged at 3800 × g at 4 °C for 15 min. The aqueous phase formed in the tube was transferred into a 5 ml RNAse-free tube, 2.5 ml of chilled 70% ethanol was added to the sample and shaken vigorously for 15 s. Seven hundred microliters of this solution, including any precipitate, were transferred into the HiBind® RNA Spin Columns and centrifuged at 5510 × g for 5 min. This step was repeated until all the solution was passed through the column. 
Then, the RNA was washed by adding 700 μl of chilled 70% ethanol to the column and centrifuging it at 5510 × g for 5 min for a total of three cycles. The silica columns were dried by centrifugation at 16,873 × g for 1 min, and RNA was eluted in 100 μl of nuclease-free water and stored at −80 °C. Before shipment to the Washington University in St. Louis McDonnell Genome Institute for RNA sequencing, RNA specimens were transferred to GenTegra-RNA Screw Cap Tubes (GenTegra, Pleasanton, CA) following the manufacturer’s instructions. RNA was reconstituted in nuclease-free water, and quality was assessed using a Bioanalyzer. A Clontech SMARTer universal low-input RNA library was constructed and sequenced on Illumina’s NovaSeq platform using 2 × 150 bp paired-end reads. Sequencing reads were adapter/quality trimmed using trimmomatic v0.39 and then mapped to the F. hepatica genome (GenBank accession: GCA_900302435) using HISAT2 v2.2.1119. Mapped fragments (read pairs) were quantified using featureCounts v2.0.3120, and relative gene expression values (fragments per kilobase per million reads mapped to genes, FPKM) were calculated. DESeq2 v1.38.344 was used to perform differential gene expression analysis using the raw fragment counts per gene per sample, with an FDR-adjusted p value threshold of 0.001 and a minimum fold-change of 2. Untreated TCBZ-S and -R flukes were compared to each other to ascertain baseline transcriptional differences. In addition, TCBZ-treated flukes were compared to untreated controls, independently for TCBZ-S and -R flukes to investigate the treatment response. Exposure time and drug concentration were treated as batch effects. Gene set enrichment analysis was performed using the Over-Representation Analysis (ORA) statistical tool provided in WebGestalt v2019121, using (i) KEGG107 annotations, (ii) InterPro domain105 annotations, and (iii) tissue assignments based on S. 
mansoni reciprocal best hit results (BLAST 2.12.0+) and previous scRNA-seq analysis52. GOstats v2.64.045 was also used to perform functional enrichment analysis based on Gene Ontology annotations104. All significant results were FDR-corrected for the number of tests run and required a minimum of 3 genes representing a category/pathway. REVIGO122 was used to summarize the lists of Gene Ontology terms for visualization.
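The expression quantification and differential expression thresholds described above reduce to two short formulas. The sketch below is illustrative only: the function names are hypothetical, the FPKM denominator is taken to be the total number of fragments mapped to genes, and whether the fold-change cutoff is inclusive is an assumption. It does not reproduce the DESeq2 model, only the thresholding applied to its output.

```python
import math

def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of gene per million fragments mapped to genes."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)

def is_differentially_expressed(log2_fold_change, padj,
                                min_fold=2.0, alpha=0.001):
    """Apply the FDR-adjusted p value (< 0.001) and fold-change (>= 2) cutoffs.

    log2_fold_change: signed log2 fold change from the DE analysis.
    padj:             FDR-adjusted p value for the same gene.
    """
    return padj < alpha and abs(log2_fold_change) >= math.log2(min_fold)
```

For example, a 2,000-bp gene with 500 fragments in a library of 10 million gene-mapped fragments has an FPKM of 25.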

Amplicon panel design and targeted sequencing (amplicon-seq)

The xGen Custom Amplicon Panel (IDT, Coralville, IA) was developed based on the top 300 index variants (identified as described in the next section) (Supplementary Data 9). When it was not possible to design primers for an index SNP, primers were designed to target a proxy SNP in LD with the index SNP. The xGen Amplicon Core Kit (IDT, Coralville, IA) was used to prepare amplicon libraries for Illumina sequencing, following the manufacturer’s protocol. The libraries were sequenced on Illumina’s NovaSeq platform using 2 × 150 bp paired-end reads. Using eight Fasciola genomic DNA samples, the performance of the amplicon panel was assessed, and primer pairs that resulted in higher amounts of off-target products were removed from the panel (Supplementary Data 10). The final revised amplicon panel (IDT xGen custom amplicon panel cp727), targeting 289 SNP loci, was used for subsequent genotyping experiments with an independent set of 80 TCBZ-S and 80 TCBZ-R samples. Adaptor trimming, reference alignment, and variant calling were performed as described for whole-genome sequencing samples, except that GATK v4.2.295 was run only over the genomic intervals corresponding to the amplicon target loci, and no maximum depth filter (DP) was applied during the variant filtration step. Coverage statistics were calculated for each sample using Picard v2.26.2 (CollectTargetedPcrMetrics)95. Genotyping accuracy of amplicon-seq was assessed for each target locus by comparing the genotype calls to those from WGS data using 10 replicate samples (i.e., the same DNA samples genotyped using both WGS and amplicon-seq). Loci with ≥90% genotype concordance and <10% missing call rates (n = 194) were used for discriminant analysis.
To identify and exclude clonemates and closely related individuals among the amplicon-seq samples, PLINK v1.999 (--rel-cutoff) was used for relationship-based pruning with a maximum cutoff value of 0.7, resulting in a final set of 152 fluke samples (78 TCBZ-S and 74 TCBZ-R), with 8 samples removed.
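The per-locus quality filter applied to the amplicon panel (genotype concordance with WGS of at least 90% and a missing call rate below 10%) can be sketched as below. The function name and data layout are hypothetical, and it is an assumption here that concordance is computed only over replicate pairs in which both platforms produced a call, while missingness is computed over all amplicon-seq samples.

```python
def passes_locus_qc(wgs_rep, amp_rep, amp_all,
                    min_concordance=0.90, max_missing=0.10):
    """Decide whether one SNP locus is kept for discriminant analysis.

    wgs_rep: WGS genotype calls for the replicate samples (None if missing).
    amp_rep: amplicon-seq calls for the same replicates, in the same order.
    amp_all: amplicon-seq calls at this locus for all genotyped samples.
    """
    # Concordance over replicate pairs with a call on both platforms.
    pairs = [(w, a) for w, a in zip(wgs_rep, amp_rep)
             if w is not None and a is not None]
    if not pairs:
        return False
    concordance = sum(w == a for w, a in pairs) / len(pairs)

    # Missing call rate over all amplicon-seq samples.
    missing_rate = sum(g is None for g in amp_all) / len(amp_all)

    return concordance >= min_concordance and missing_rate < max_missing
```

With 10 replicate samples, the concordance threshold tolerates at most one discordant genotype per locus.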

Discriminating TCBZ sensitivity phenotype using a panel of SNP markers

Discriminant analysis of principal components (DAPC; adegenet v2.1.10)55 was used to test if phenotypic classification is possible using a set of SNP markers. To identify informative markers, genotype-phenotype allelic association analysis (Fisher’s exact test with Lancaster’s mid-p adjustment) and LD-based clumping were conducted using PLINK v1.999. The clumping approach grouped phenotype-associated variants such that significant sites (p value < 0.0001) within 250 kb of a representative (most significant) index variant were assigned to that index variant’s clump, provided that their r² with the index variant was larger than 0.5 (Supplementary Data 9). A ranked list of the top 300 index variants was used for the DAPC analysis of the WGS samples, and the classification accuracy (successful re-assignment of individuals to their phenotype group) was assessed as a function of the number of top SNPs used. DAPC classification of the amplicon-seq samples (n = 152) was performed using the 30 most informative SNPs, selected based on the Mean Decrease Accuracy (MDA) values from a Random Forest machine learning method56 implemented in randomForest package v4.7-1.156, with 10,000 trees and Out-Of-Bag (OOB) evaluation (Supplementary Data 10). A cross-validation analysis was conducted to estimate the classification accuracy when the DAPC model was used to make predictions on data not included in the training set. Using the xvalDapc function of the adegenet package v2.1.1055, we performed leave-pair-out cross-validation with 1000 stratified resamplings, ensuring that each test set included both TCBZ-S and -R samples. This leave-p-out cross-validation technique (p = 2) is particularly effective for small datasets and provides a nearly unbiased estimate of model performance123. Classification accuracy and its statistical significance based on a binomial test were calculated using the caret package v6.0-94124.
The 95% confidence intervals for sensitivity and specificity were estimated using the epiR package v2.0.74125. ROC and area under curve (AUC) analyses were performed using the pROC package v1.18.5126.
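The stratified leave-pair-out scheme described above can be illustrated independently of the classifier. The following is a minimal sketch, not the adegenet xvalDapc implementation: a nearest-centroid classifier stands in for DAPC, all function names are hypothetical, and each resampling holds out one randomly chosen TCBZ-S and one TCBZ-R sample. It requires at least two samples per class.

```python
import random

def centroid(rows):
    """Column-wise mean of a list of equal-length genotype vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def predict(x, cent_s, cent_r):
    """Assign x to the class with the nearer centroid (squared Euclidean distance)."""
    d_s = sum((a - b) ** 2 for a, b in zip(x, cent_s))
    d_r = sum((a - b) ** 2 for a, b in zip(x, cent_r))
    return "S" if d_s <= d_r else "R"

def leave_pair_out_accuracy(X, y, n_resamples=1000, seed=1):
    """Mean accuracy over stratified leave-pair-out resamplings.

    X: genotype vectors (alt-allele counts per SNP), one per sample.
    y: 'S' or 'R' label per sample.
    """
    rng = random.Random(seed)
    idx_s = [i for i, lab in enumerate(y) if lab == "S"]
    idx_r = [i for i, lab in enumerate(y) if lab == "R"]
    correct = 0
    for _ in range(n_resamples):
        # Hold out one sample from each class so every test set is stratified.
        held = {rng.choice(idx_s), rng.choice(idx_r)}
        cent_s = centroid([X[i] for i in idx_s if i not in held])
        cent_r = centroid([X[i] for i in idx_r if i not in held])
        correct += sum(predict(X[i], cent_s, cent_r) == y[i] for i in held)
    return correct / (2 * n_resamples)
```

Because the held-out pair never contributes to the centroids, the returned accuracy reflects prediction on unseen samples rather than re-assignment of training samples.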

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.